Any bug leading to loss of funds or acceptance of forged / invalid signatures
Description
Brief/Intro
To process incoming shreds from the network, the shred tile calls fd_fec_resolver_add_shred(), which is vulnerable to a heap overflow.
Vulnerability Details
Let's look at the code: https://github.com/firedancer-io/firedancer/blob/main/src/app/fdctl/run/tiles/fd_shred.c#L298
static void
during_frag( void * _ctx,
             ulong  in_idx,
             ulong  seq,
             ulong  sig,
             ulong  chunk,
             ulong  sz,
             int *  opt_filter ) {
  (void)seq;
  fd_shred_ctx_t * ctx = (fd_shred_ctx_t *)_ctx;
  ...
  } else { /* the common case, from the netmux tile */
    /* The FEC resolver API does not present a prepare/commit model. If we
       get overrun between when the FEC resolver verifies the signature
       and when it stores the local copy, we could end up storing and
       retransmitting garbage.  Instead we copy it locally, sadly, and
       only give it to the FEC resolver when we know it won't be overrun
       anymore. */
1.  if( FD_UNLIKELY( chunk<ctx->net_in_chunk0 || chunk>ctx->net_in_wmark || sz>FD_NET_MTU ) )
      FD_LOG_ERR(( "chunk %lu %lu corrupt, not in range [%lu,%lu]", chunk, sz, ctx->net_in_chunk0, ctx->net_in_wmark ));
    uchar const * dcache_entry = fd_chunk_to_laddr_const( ctx->net_in_mem, chunk );
    ulong hdr_sz = fd_disco_netmux_sig_hdr_sz( sig );
    FD_TEST( hdr_sz <= sz ); /* Should be ensured by the net tile */
    fd_shred_t const * shred = fd_shred_parse( dcache_entry+hdr_sz, sz-hdr_sz );
    if( FD_UNLIKELY( !shred ) ) {
      *opt_filter = 1;
      return;
    };
    ...
    fd_memcpy( ctx->shred_buffer, dcache_entry+hdr_sz, sz-hdr_sz );
    ctx->shred_buffer_sz = sz-hdr_sz;
  }
}
The only size check here (line #1) is that the packet must not be larger than FD_NET_MTU, which is 2048.
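To make the gap concrete, here is a minimal sketch that compares the bound enforced in during_frag() with the 1228-byte shred slot size discussed in this report. The names NET_MTU_ASSUMED, SHRED_SLOT_SZ_ASSUMED, passes_mtu_check() and overflow_bytes() are hypothetical, the values are the ones quoted in this report rather than copied from the Firedancer headers, and the network header bytes are ignored for simplicity.

```c
#include <assert.h>

/* Values quoted in this report; assumptions, not copied from Firedancer headers. */
#define NET_MTU_ASSUMED       2048UL /* upper bound checked in during_frag() */
#define SHRED_SLOT_SZ_ASSUMED 1228UL /* size of one shred slot in the FEC set */

/* Hypothetical helper: returns 1 if a payload of sz bytes passes an
   sz<=MTU style check like the one on line #1. */
static int passes_mtu_check( unsigned long sz ) {
  return sz<=NET_MTU_ASSUMED;
}

/* Hypothetical helper: number of bytes written past the end of a single
   slot when sz bytes are copied to the start of that slot. */
static unsigned long overflow_bytes( unsigned long sz ) {
  return sz>SHRED_SLOT_SZ_ASSUMED ? sz-SHRED_SLOT_SZ_ASSUMED : 0UL;
}

/* Example usage: the largest payload that passes the MTU check is still
   820 bytes larger than one slot. */
static void example_gap( void ) {
  assert( passes_mtu_check( NET_MTU_ASSUMED ) );
  assert( overflow_bytes( NET_MTU_ASSUMED )==820UL );
}
```

Under these assumptions, any payload between 1229 and 2048 bytes passes the MTU check but does not fit in one slot.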
Now let's look at the after_frag() function, which processes incoming shreds:
So the ctx->set->data_shreds entries are adjacent to each other within the FEC set.
Thus the fd_memcpy() on line #3 copies the incoming shred into a data_shreds[] entry, which is 1228 bytes in size.
If the incoming shred is larger than 1228 bytes, the next shred in the FEC set will be overwritten.
Also, if shreds arrive out of order, that is, the first shred arrives with in_type_idx 1 and then a second shred arrives with in_type_idx 0, it is possible to overwrite parts of the first shred in the FEC set.
Such an overflow invalidates the first shred (already added) in the FEC set, even though it was validated and its signature was checked earlier.
Note that Agave apparently discards such malformed shreds.
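The out-of-order overwrite described above can be illustrated with a small standalone sketch. It does not use any Firedancer code: the flat slots buffer is a hypothetical stand-in for two adjacent shred slots, and store_unchecked(), slot1_intact() and demo_out_of_order() are names invented for this illustration. The oversized copy stays inside one buffer, so the demo itself is well-defined C.

```c
#include <assert.h>
#include <string.h>

#define SLOT_SZ  1228UL /* per-shred slot size quoted in this report */
#define SLOT_CNT 2UL    /* two adjacent slots are enough for the demo */

/* Hypothetical stand-in for adjacent shred slots: one flat buffer, so the
   oversized copy below never leaves this object. */
static unsigned char slots[ SLOT_CNT*SLOT_SZ ];

/* Unchecked store, mimicking a copy of sz bytes to the start of slot idx.
   The assert only keeps the demo inside the flat buffer; it does NOT
   enforce the per-slot limit. */
static void store_unchecked( unsigned long idx, unsigned char const * src, unsigned long sz ) {
  assert( idx*SLOT_SZ+sz<=sizeof(slots) );
  memcpy( slots+idx*SLOT_SZ, src, sz );
}

/* Returns 1 if slot 1 still holds the 0xAA pattern in every byte. */
static int slot1_intact( void ) {
  for( unsigned long i=0UL; i<SLOT_SZ; i++ ) if( slots[ SLOT_SZ+i ]!=0xAA ) return 0;
  return 1;
}

/* Out-of-order arrival: slot 1 is filled first (think "already validated"),
   then a shred of second_sz bytes arrives for slot 0.
   Returns 1 if slot 1 was clobbered by the second copy. */
static int demo_out_of_order( unsigned long second_sz ) {
  static unsigned char first [ SLOT_SZ ];
  static unsigned char second[ SLOT_CNT*SLOT_SZ ];
  memset( first,  0xAA, sizeof(first)  );
  memset( second, 0x55, sizeof(second) );
  store_unchecked( 1UL, first,  SLOT_SZ   ); /* shred with index 1 arrives first */
  store_unchecked( 0UL, second, second_sz ); /* shred with index 0 arrives later */
  return !slot1_intact();
}
```

In this model, demo_out_of_order( 1228UL ) leaves slot 1 untouched, while any larger size up to 2456 bytes overwrites the start of the slot that was stored earlier.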
I see the following scenarios in which this could be exploited:
- slashing of an FD node for producing bad blocks
- a consensus split between FD and Agave nodes, as FD nodes will accept and parse such shreds while Agave will not
If the incoming shred is the last shred in the pkts[] array, a heap overflow will occur.
This is potentially a promising remote code execution vulnerability, as shreds come from the network.
Currently, it looks like the RCE vector is not exploitable, as the fd_shred34 structure lies in the middle of a huge mapped region 3GB in size.
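The last-slot case can be sketched as simple arithmetic. The helper bytes_past_array_end() is hypothetical, the slot count of 34 used in the usage note is only an assumption inferred from the fd_shred34 name, and the 1648-byte size is borrowed as an example value from the AddressSanitizer trace in the proof of concept.

```c
#include <assert.h>

#define SLOT_SZ 1228UL /* per-shred slot size quoted in this report */

/* Hypothetical helper: number of bytes that land beyond the end of an array
   of slot_cnt slots when sz bytes are copied to the start of slot idx. */
static unsigned long bytes_past_array_end( unsigned long idx, unsigned long slot_cnt, unsigned long sz ) {
  assert( idx<slot_cnt );
  unsigned long room = (slot_cnt-idx)*SLOT_SZ; /* bytes from slot start to array end */
  return sz>room ? sz-room : 0UL;
}
```

With an assumed 34 slots, bytes_past_array_end( 33UL, 34UL, 1648UL ) gives 420 bytes written outside the array, while the same copy into slot 32 stays inside the array and only corrupts the neighbouring slot.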
Impact Details
Consensus split between FD and Agave nodes. Possibility of RCE.
Proof of concept
How to reproduce:
1. Get the archive using the provided gist link.
2. Unpack it:
$ base64 -d arch.txt > arch.tgz
$ tar zxf arch.tgz
3. Copy the provided test_fec_resolver.c over src/disco/shred/test_fec_resolver.c.
4. Build FD with:
$ EXTRAS="asan" make -j unit-test
5. Run the test_fec_resolver unit test:
$ ...test_fec_resolver test1.bin
=================================================================
==162381==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x61a00000054c at pc 0x562f4677584a bp 0x7ffde97060b0 sp 0x7ffde9705880
WRITE of size 1648 at 0x61a00000054c thread T0
#0 0x562f46775849 in __asan_memcpy (/build/linux/clang/x86_64/unit-test/test_fec_resolver+0xb6849) (BuildId: 741307849f3df20bb7c98e537e880c65c37056cd)
#1 0x562f467b8b80 in memcpy /usr/include/x86_64-linux-gnu/bits/string_fortified.h:29:10
#2 0x562f467b8b80 in fd_memcpy /src/disco/shred/../../ballet/shred/../bmtree/../../util/fd_util_base.h:1011:10
#3 0x562f467b8b80 in fd_fec_resolver_add_shred /src/disco/shred/fd_fec_resolver.c:519:2
#4 0x562f467b43c8 in test_one_batch /src/disco/shred/test_fec_resolver.c:106:8
#5 0x562f467b43c8 in main /src/disco/shred/test_fec_resolver.c:135:4
#6 0x710c23e29d8f in __libc_start_call_main csu/../sysdeps/nptl/libc_start_call_main.h:58:16
#7 0x710c23e29e3f in __libc_start_main csu/../csu/libc-start.c:392:3
#8 0x562f466f3624 in _start (/build/linux/clang/x86_64/unit-test/test_fec_resolver+0x34624) (BuildId: 741307849f3df20bb7c98e537e880c65c37056cd)
...
SUMMARY: AddressSanitizer: heap-buffer-overflow (/build/linux/clang/x86_64/unit-test/test_fec_resolver+0xb6849) (BuildId: 741307849f3df20bb7c98e537e880c65c37056cd) in __asan_memcpy
6. The proof-of-concept script t1.py should be tested against a live FD, but first a few modifications to the code are needed (to simplify the testing):
6.1) comment out lines 548-552 https://github.com/firedancer-io/firedancer/blob/main/src/app/fdctl/run/tiles/fd_shred.c#L548
6.2) comment out lines 439-442 https://github.com/firedancer-io/firedancer/blob/main/src/disco/shred/fd_fec_resolver.c#L439
6.3) after fd_memcpy() https://github.com/firedancer-io/firedancer/blob/main/src/disco/shred/fd_fec_resolver.c#L497 add the following code (it checks whether the next shred in the FEC set has been overwritten):