Boost | Firedancer v0.1 34564 - [Blockchain/DLT - Medium] shred tile overflow

Submitted on Thu Aug 15 2024 21:52:58 GMT-0400 (Atlantic Standard Time) by @gln for Boost | Firedancer v0.1

Report ID: #34564

Report type: Blockchain/DLT

Report severity: Medium

Target: https://github.com/firedancer-io/firedancer/tree/e60d9a6206efaceac65a5a2c3a9e387a79d1d096

Impacts:

  • Any bug leading to loss of funds or acceptance of forged / invalid signatures

Description

Brief/Intro

To process shreds arriving from the network, the shred tile calls fd_fec_resolver_add_shred(), which is vulnerable to a heap overflow.

Vulnerability Details

Let's look at the code https://github.com/firedancer-io/firedancer/blob/main/src/app/fdctl/run/tiles/fd_shred.c#L298


static void
during_frag( void * _ctx,
             ulong  in_idx,
             ulong  seq,
             ulong  sig,
             ulong  chunk,
             ulong  sz,
             int *  opt_filter ) {
  (void)seq;

  fd_shred_ctx_t * ctx = (fd_shred_ctx_t *)_ctx;
  ...
   } else { /* the common case, from the netmux tile */
    /* The FEC resolver API does not present a prepare/commit model. If we
       get overrun between when the FEC resolver verifies the signature
       and when it stores the local copy, we could end up storing and
       retransmitting garbage.  Instead we copy it locally, sadly, and
       only give it to the FEC resolver when we know it won't be overrun
       anymore. */
1.    if( FD_UNLIKELY( chunk<ctx->net_in_chunk0 || chunk>ctx->net_in_wmark || sz>FD_NET_MTU ) )
      FD_LOG_ERR(( "chunk %lu %lu corrupt, not in range [%lu,%lu]", chunk, sz, ctx->net_in_chunk0, ctx->net_in_wmark ));
    uchar const * dcache_entry = fd_chunk_to_laddr_const( ctx->net_in_mem, chunk );
    ulong hdr_sz = fd_disco_netmux_sig_hdr_sz( sig );
    FD_TEST( hdr_sz <= sz ); /* Should be ensured by the net tile */
    fd_shred_t const * shred = fd_shred_parse( dcache_entry+hdr_sz, sz-hdr_sz );
    if( FD_UNLIKELY( !shred ) ) {
      *opt_filter = 1;
      return;
    };
    ...
    fd_memcpy( ctx->shred_buffer, dcache_entry+hdr_sz, sz-hdr_sz );
    ctx->shred_buffer_sz = sz-hdr_sz;
  }
}
  1. The only size check here is that the packet size must not exceed FD_NET_MTU, which is 2048.

Now let's look at the function after_frag(), which processes incoming shreds:

  1. The shred is parsed by calling fd_shred_parse().

Note that fd_shred_parse() does not enforce an upper bound on the size of incoming shreds.

As a result, an accepted shred can be anywhere between FD_SHRED_MAX_SZ (1228 bytes) and FD_NET_MTU in size.
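To make this size window concrete, the sketch below computes how many bytes of an accepted payload would not fit in a single 1228-byte shred slot. The constants mirror the values quoted in this report (2048 and 1228). The SKETCH_ names and the helper function are hypothetical and are not Firedancer code, and the 42-byte header used in the usage note (Ethernet 14 + IPv4 20 + UDP 8, no options) is an assumption for illustration.

```c
#include <assert.h>

/* Values quoted in the report. The SKETCH_ names are hypothetical
   stand-ins for FD_NET_MTU and FD_SHRED_MAX_SZ. */
#define SKETCH_NET_MTU      2048UL
#define SKETCH_SHRED_MAX_SZ 1228UL

/* Returns how many bytes of the payload would spill past one shred
   slot, given the frame size sz and the header size hdr_sz.
   Returns 0 if the payload fits. */
static unsigned long
sketch_overflow_bytes( unsigned long sz, unsigned long hdr_sz ) {
  /* The two conditions the report says are enforced in during_frag(). */
  assert( sz<=SKETCH_NET_MTU && hdr_sz<=sz );
  unsigned long payload_sz = sz - hdr_sz;
  return payload_sz>SKETCH_SHRED_MAX_SZ ? payload_sz-SKETCH_SHRED_MAX_SZ : 0UL;
}
```

Under the assumed 42-byte header, sketch_overflow_bytes( 2048UL, 42UL ) gives a 2006-byte payload, which is 778 bytes more than one slot can hold.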

  1. To add the parsed shred to the FEC set, fd_fec_resolver_add_shred() is called.

Let's look at this function:

Note fd_memcpy() on line #3.

We also need to see how ctx->set->data_shreds and ctx->set->parity_shreds are allocated:

We also need the declaration of the fd_shred34_t structure:

So the ctx->set->data_shreds entries are adjacent to each other in the FEC set.

Thus fd_memcpy() on line #3 copies the incoming shred into a data_shreds[] entry, which is 1228 bytes in size.

If the incoming shred is larger than 1228 bytes, the next shred in the FEC set is overwritten.

Shreds can also arrive out of order: if the first shred arrives with in_type_idx 1 and the second with in_type_idx 0, the second can overwrite part of the first shred in the FEC set.

Such an overflow corrupts the first (already added) shred in the FEC set after it has been validated and its signature checked.
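The out-of-order case can be modelled in isolation. The sketch below is a simplified stand-in, not the Firedancer data structures: it uses a flat buffer of two adjacent 1228-byte slots, and every name in it (sketch_fec_set_t, sketch_add_shred, sketch_out_of_order_demo) is hypothetical. The copy deliberately has no per-slot size check, as described above. The asserts only keep the demo inside its own buffer so that it has defined behavior.

```c
#include <assert.h>
#include <string.h>

#define SKETCH_SLOT_SZ  1228UL /* FD_SHRED_MAX_SZ as quoted in the report */
#define SKETCH_SLOT_CNT 2UL    /* hypothetical: two adjacent slots */

/* A flat buffer standing in for adjacent shred slots. */
typedef struct {
  unsigned char slots[ SKETCH_SLOT_CNT*SKETCH_SLOT_SZ ];
} sketch_fec_set_t;

/* Copies a payload into slot idx without checking it against the slot
   size. The asserts bound the copy to the whole buffer only. */
static void
sketch_add_shred( sketch_fec_set_t *    set,
                  unsigned long         idx,
                  unsigned char const * payload,
                  unsigned long         payload_sz ) {
  assert( idx<SKETCH_SLOT_CNT );
  assert( idx*SKETCH_SLOT_SZ + payload_sz <= sizeof(set->slots) );
  memcpy( set->slots + idx*SKETCH_SLOT_SZ, payload, payload_sz );
}

/* A shred for slot 1 arrives first, then a shred of second_sz bytes
   arrives for slot 0. Returns how many bytes of slot 1 were changed. */
static unsigned long
sketch_out_of_order_demo( unsigned long second_sz ) {
  static sketch_fec_set_t set;
  static unsigned char    first [ SKETCH_SLOT_SZ ];
  static unsigned char    second[ SKETCH_SLOT_CNT*SKETCH_SLOT_SZ ];
  memset( &set,   0,    sizeof(set)    );
  memset( first,  0xAA, sizeof(first)  );
  memset( second, 0xBB, sizeof(second) );

  sketch_add_shred( &set, 1UL, first,  SKETCH_SLOT_SZ ); /* fits its slot */
  sketch_add_shred( &set, 0UL, second, second_sz      ); /* may spill into slot 1 */

  unsigned long clobbered = 0UL;
  for( unsigned long i=0UL; i<SKETCH_SLOT_SZ; i++ )
    if( set.slots[ SKETCH_SLOT_SZ+i ]!=0xAA ) clobbered++;
  return clobbered;
}
```

In this model, a second shred of exactly 1228 bytes leaves slot 1 intact, while any larger size overwrites the leading bytes of the shred that was stored first.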

Note that Agave apparently discards such malformed shreds.
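A receiver-side size check that discards oversized shreds could look like the sketch below. This is not Agave's or Firedancer's actual validation logic; the function name and the SKETCH_ constant are hypothetical, and the 1228-byte limit is the FD_SHRED_MAX_SZ value quoted above.

```c
#include <assert.h>
#include <string.h>

#define SKETCH_SHRED_MAX_SZ 1228UL /* FD_SHRED_MAX_SZ as quoted in the report */

/* Copies payload into dst (one slot of SKETCH_SHRED_MAX_SZ bytes) only
   if it fits. Returns 1 if the shred was copied, 0 if it was discarded. */
static int
sketch_copy_shred_checked( unsigned char *       dst,
                           unsigned char const * payload,
                           unsigned long         payload_sz ) {
  if( payload_sz>SKETCH_SHRED_MAX_SZ ) return 0; /* oversized: drop it */
  memcpy( dst, payload, payload_sz );
  return 1;
}
```

The point of the sketch is the ordering: the size is compared against the destination slot before any bytes are copied, so a rejected shred leaves the slot untouched.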

I see the following scenarios how it could be exploited:

  1. slashing of an FD node for producing bad blocks

  2. consensus split between FD and Agave nodes, as FD nodes will accept and parse such shreds while Agave nodes will not

  3. heap overflow, if the incoming shred is the last shred in the pkts[] array

This could potentially be a remote code execution vulnerability, as shreds come from the network.

Currently, the RCE vector does not look exploitable, as the fd_shred34 structure lies in the middle of a huge mapped region of 3GB in size.

Impact Details

Consensus split between FD and Agave nodes. Possibility of RCE.

Proof of concept

How to reproduce:

  1. get the archive using the provided gist link

  2. unpack it:

  3. copy the provided test_fec_resolver.c over src/disco/shred/test_fec_resolver.c

  4. build FD with:

  5. run the test_fec_resolver unit test:

  6. the proof of concept script t1.py should be tested against a live FD, but first a few modifications to the code are needed (to simplify the testing):

6.1) comment out lines 548-552 https://github.com/firedancer-io/firedancer/blob/main/src/app/fdctl/run/tiles/fd_shred.c#L548

6.2) comment out lines 439-442 https://github.com/firedancer-io/firedancer/blob/main/src/disco/shred/fd_fec_resolver.c#L439

6.3) after fd_memcpy() https://github.com/firedancer-io/firedancer/blob/main/src/disco/shred/fd_fec_resolver.c#L497 add the following code (we are checking if next shred in FEC set has been overwritten):

  7. run FD:

  8. run t1.py:

  9. notice that the shred tile crashes with the following message, which means that the adjacent shred in the FEC set has been overwritten:
