Network not being able to confirm new transactions (total network shutdown)
RPC API crash affecting projects with greater than or equal to 25% of the market capitalization on top of the respective layer
Description
Brief/Intro
Archivers can join the network without any staking. The network enforces a maximum number of archivers, but shardus-core has a bug that allows more archivers than this maximum to join.
This bug can harm the network in several ways. First, once the limit has been exceeded, no other archiver can join the network. Second, when a node wants to join or leave the network, it picks a random archiver and requests some data from it. Because a malicious actor can register more archivers than the limit allows, it becomes likely that the randomly selected archiver is one of the malicious ones. That archiver can then return invalid data and break the network. Third, and this is the case I provide a POC for, the bug can completely disable the archivers' ability to save cycle data, so the blockchain's history would be lost forever.
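The following sketch makes the second example concrete. It assumes that a node picks an archiver uniformly at random; the exact selection logic is not shown in this report. Under that assumption, the probability of landing on an attacker-controlled archiver is m / (m + h), where m is the number of malicious archivers and h is the number of honest ones. The function names and counts are hypothetical.

```typescript
// Probability that one uniformly random pick lands on an attacker-controlled archiver.
// Assumption: archivers are selected uniformly at random; counts are illustrative only.
function maliciousPickProbability(malicious: number, honest: number): number {
  if (malicious < 0 || honest < 0 || malicious + honest === 0) {
    throw new Error("counts must be non-negative and not both zero");
  }
  return malicious / (malicious + honest);
}

// Probability that `picks` independent uniform picks are all attacker-controlled.
function allPicksMalicious(malicious: number, honest: number, picks: number): number {
  return Math.pow(maliciousPickProbability(malicious, honest), picks);
}

console.log(maliciousPickProbability(1000, 1)); // about 0.999
console.log(allPicksMalicious(1000, 1, 10));
```

With 1000 fake archivers and a single honest one, nearly every join or leave operation would be served by the attacker.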
I first explain the problem and then provide a POC.
Vulnerability Details
To join the network, an archiver sends an HTTP request to a node. The node handles the request here:
The handler then calls the addArchiverJoinRequest function. This function performs some validations, adds the join request to a list, and gossips it to other nodes.
src/shardus/index.ts
export function addArchiverJoinRequest(joinRequest: P2P.ArchiversTypes.Request, tracker?, gossip = true) {
  // validate input
  let err = validateTypes(joinRequest, { nodeInfo: 'o', requestType: 's', requestTimestamp: 'n', sign: 'o' })
  if (err) {
    warn('addJoinRequest: bad joinRequest ' + err)
    return { success: false, reason: 'bad joinRequest ' + err }
  }
  err = validateTypes(joinRequest.nodeInfo, {
    curvePk: 's',
    ip: 's',
    port: 'n',
    publicKey: 's',
  })
  if (err) {
    warn('addJoinRequest: bad joinRequest.nodeInfo ' + err)
    return { success: false, reason: 'bad joinRequest ' + err }
  }
  if (joinRequest.requestType !== P2P.ArchiversTypes.RequestTypes.JOIN) {
    warn('addJoinRequest: invalid joinRequest.requestType')
    return { success: false, reason: 'invalid joinRequest.requestType' }
  }
  err = validateTypes(joinRequest.sign, { owner: 's', sig: 's' })
  if (err) {
    warn('addJoinRequest: bad joinRequest.sign ' + err)
    return { success: false, reason: 'bad joinRequest.sign ' + err }
  }
  if (!crypto.verify(joinRequest, joinRequest.nodeInfo.publicKey)) {
    warn('addJoinRequest: bad signature')
    return { success: false, reason: 'bad signature ' }
  }
  if (archivers.get(joinRequest.nodeInfo.publicKey)) {
    warn('addJoinRequest: This archiver is already in the active archiver list')
    return { success: false, reason: 'This archiver is already in the active archiver list' }
  }
  const existingJoinRequest = joinRequests.find(
    (j) => j.nodeInfo.publicKey === joinRequest.nodeInfo.publicKey
  )
  if (existingJoinRequest) {
    warn('addJoinRequest: This archiver join request already exists')
    return { success: false, reason: 'This archiver join request already exists' }
  }
  if (Context.config.p2p.forceBogonFilteringOn) {
    if (isBogonIP(joinRequest.nodeInfo.ip)) {
      warn('addJoinRequest: This archiver join request uses a bogon IP')
      return { success: false, reason: 'This archiver join request is a bogon IP' }
    }
  }
  if (archivers.size > 0) {
    // Check the archiver version from dapp
    if (Context.config.p2p.validateArchiverAppData) {
      const validationResponse = validateArchiverAppData(joinRequest)
      if (validationResponse && !validationResponse.success) return validationResponse
    }
    // Check if the archiver request timestamp is within the acceptable timestamp range (after current cycle, before next cycle)
    const requestTimestamp = joinRequest.requestTimestamp
    const cycleDuration = newest.duration
    const cycleStart = newest.start
    const currentCycleStartTime = (cycleStart + cycleDuration) * 1000
    const nextCycleStartTime = (cycleStart + 2 * cycleDuration) * 1000
    if (requestTimestamp < currentCycleStartTime) {
      warn('addJoinRequest: This archiver join request timestamp is earlier than acceptable timestamp range')
      return {
        success: false,
        reason: 'This archiver join request timestamp is earlier than acceptable timestamp range',
      }
    }
    if (requestTimestamp > nextCycleStartTime) {
      warn('addJoinRequest: This archiver join request timestamp exceeds acceptable timestamp range')
      return {
        success: false,
        reason: 'This archiver join request timestamp exceeds acceptable timestamp range',
      }
    }
    // Get the consensus radius of the network
    try {
      const {
        shardGlobals: { consensusRadius },
      } = Context.stateManager.getCurrentCycleShardData()
      if (archivers.size >= consensusRadius * config.p2p.maxArchiversSubscriptionPerNode) {
        warn('addJoinRequest: This archiver cannot join as max archivers limit has been reached')
        return { success: false, reason: 'Max number of archivers limit reached' }
      }
    } catch (e) {
      warn('addJoinRequest: Failed to get consensus radius', e)
      return { success: false, reason: 'This node is not ready to accept this request!' }
    }
  }
  joinRequests.push(joinRequest)
  if (logFlags.console)
    console.log(
      `Join request added in cycle ${CycleCreator.currentCycle}, quarter ${CycleCreator.currentQuarter}`,
      joinRequest
    )
  if (gossip === true) {
    Comms.sendGossip('joinarchiver', joinRequest, tracker, null, NodeList.byIdOrder, true)
  }
  return { success: true }
}
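The timestamp check above accepts only requests whose requestTimestamp lies between currentCycleStartTime and nextCycleStartTime. Both values are derived from the newest cycle record, whose start and duration are in seconds and are converted to milliseconds. The sketch below reproduces only that arithmetic. CycleTiming and the helper names are hypothetical, and the example values are made up.

```typescript
// Simplified model of the timestamp window enforced by addArchiverJoinRequest.
// `start` and `duration` are in seconds (as on the newest cycle record); timestamps are in milliseconds.
interface CycleTiming {
  start: number;
  duration: number;
}

function acceptableJoinWindow(newest: CycleTiming): { from: number; to: number } {
  const from = (newest.start + newest.duration) * 1000; // currentCycleStartTime in the source
  const to = (newest.start + 2 * newest.duration) * 1000; // nextCycleStartTime in the source
  return { from, to };
}

function isTimestampAcceptable(requestTimestamp: number, newest: CycleTiming): boolean {
  const { from, to } = acceptableJoinWindow(newest);
  // Mirrors the two checks: reject if earlier than `from` or later than `to` (both bounds inclusive).
  return requestTimestamp >= from && requestTimestamp <= to;
}

const newest: CycleTiming = { start: 1700000000, duration: 60 };
console.log(acceptableJoinWindow(newest)); // from: 1700000060000, to: 1700000120000
```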
The addArchiverJoinRequest function checks that the number of active archivers has not reached the maximum allowed value:
const {
  shardGlobals: { consensusRadius },
} = Context.stateManager.getCurrentCycleShardData()
if (archivers.size >= consensusRadius * config.p2p.maxArchiversSubscriptionPerNode) {
  warn('addJoinRequest: This archiver cannot join as max archivers limit has been reached')
  return { success: false, reason: 'Max number of archivers limit reached' }
}
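This check is where the bug lies. Every accepted join request is later appended to the active archiver list, as I show below. However, the check counts only archivers.size, which does not change while join requests accumulate within a cycle. As a result, any number of requests submitted in the same cycle pass the check. The check should instead ensure that archivers.size + joinRequests.length does not exceed the maximum value.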
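The effect of counting only archivers.size can be simulated in isolation. The sketch below is a simplified model, not shardus-core code. It assumes, as described above, that archivers.size stays constant until the cycle record is applied. It also uses a hypothetical cap of 16 × 2; the real value of maxArchiversSubscriptionPerNode may differ.

```typescript
// Simplified model of the archiver cap check in addArchiverJoinRequest.
// `active` stands in for archivers.size and `pending` for joinRequests.length.
type JoinCheck = (active: number, pending: number, cap: number) => boolean;

// Current behaviour: only already-active archivers are counted.
const buggyCheck: JoinCheck = (active, _pending, cap) => active < cap;

// Proposed fix: also count join requests already accepted in this cycle.
const fixedCheck: JoinCheck = (active, pending, cap) => active + pending < cap;

// Simulates one cycle: `active` does not change until the cycle record is applied,
// so every request is checked against the same value.
function acceptedInOneCycle(check: JoinCheck, active: number, cap: number, attempts: number): number {
  let pending = 0;
  for (let i = 0; i < attempts; i++) {
    if (check(active, pending, cap)) pending++;
  }
  return pending;
}

const cap = 16 * 2; // hypothetical: consensusRadius = 16, maxArchiversSubscriptionPerNode = 2
console.log(acceptedInOneCycle(buggyCheck, 1, cap, 1000)); // 1000
console.log(acceptedInOneCycle(fixedCheck, 1, cap, 1000)); // 31, so 1 + 31 = cap
```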
Our join request is therefore appended to the joinRequests array. Next, let's see how this list is used. In every cycle, a node calls the getTxs() function of every submodule to collect the transactions it will process and add to the block (cycle record).
shardus-core/src/p2p/CycleCreator.ts
function collectCycleTxs(): P2P.CycleCreatorTypes.CycleTxs {
  /* prettier-ignore */ if (logFlags.p2pNonFatal) console.log('collectCycleTxs: inside collectCycleTxs')
  // Collect cycle txs from all submodules
  const txs = submodules.map((submodule) => submodule.getTxs())
  return Object.assign({}, ...txs)
}
Archivers.ts, which we saw earlier, is one of these submodules. It returns its transactions as follows:
shardus-core/src/p2p/Archivers.ts
export function getTxs(): P2P.ArchiversTypes.Txs {
  // [IMPORTANT] Must return a copy to avoid mutation
  const requestsCopy = deepmerge({}, [...joinRequests, ...leaveRequests])
  if (logFlags.console)
    console.log(`getTxs: Cycle ${CycleCreator.currentCycle}, Quarter: ${CycleCreator.currentQuarter}`, {
      archivers: requestsCopy,
    })
  return {
    archivers: requestsCopy,
  }
}
It returns all pending joinRequests and leaveRequests without checking the archiver limit again. CycleCreator then calls makeCycleData to create a block:
shardus-core/src/p2p/CycleCreator.ts
async function runQ3() {
  currentQuarter = 3
  Self.emitter.emit('cycle_q3_start')
  if (logFlags.p2pNonFatal) info(`C${currentCycle} Q${currentQuarter}`)
  profilerInstance.profileSectionStart('CycleCreator-runQ3')
  // Get txs and create this cycle's record, marker, and cert
  txs = collectCycleTxs()
  ;({ record, marker, cert } = makeCycleData(txs, CycleChain.newest))
}
This record is then parsed by the nodes and archivers in the network, and they add the new archivers to their active archiver lists.
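The sketch below summarizes the path from pending join requests to the active archiver list. It is a hypothetical, heavily simplified model. JoinRequest, CycleRecordSketch, makeRecord, and applyRecord are illustrative names, and the joinedArchivers field is an assumption, not the real cycle record schema. In this model, as in the code shown above, every pending request reaches the active list, regardless of how many there are.

```typescript
// Hypothetical, simplified model of how pending archiver join requests end up in the active list.
interface JoinRequest {
  publicKey: string;
  ip: string;
  port: number;
}

interface CycleRecordSketch {
  counter: number;
  joinedArchivers: JoinRequest[]; // assumed field name for this sketch only
}

// Like getTxs(): copy every pending request into the record, with no further cap check.
function makeRecord(counter: number, pending: JoinRequest[]): CycleRecordSketch {
  return { counter, joinedArchivers: pending.map((r) => ({ ...r })) };
}

// Every node and archiver parsing the record adds all listed archivers.
function applyRecord(active: Map<string, JoinRequest>, record: CycleRecordSketch): void {
  for (const a of record.joinedArchivers) active.set(a.publicKey, a);
}

const active = new Map<string, JoinRequest>([
  ["honest", { publicKey: "honest", ip: "127.0.0.1", port: 4000 }],
]);
const pending: JoinRequest[] = [];
for (let i = 0; i < 10; i++) {
  pending.push({ publicKey: `fake-${i}`, ip: "127.0.0.1", port: 23500 + i });
}
applyRecord(active, makeRecord(2, pending));
console.log(active.size); // 11
```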
Next, I provide a POC that adds more archivers than allowed. I then show one consequence of this bug: blocking archivers from persisting new blocks.
Impact Details
This bug could affect all validators and archivers. It also affects the collectors that gather historical data and the explorer that displays it.
Proof of Concept
We need a network with at least 17 nodes, because consensusRadius is 16 for a small network and the next part of the POC requires more nodes than that. Also set forceBogonFilteringOn: false in src/config/index.ts, because we are running all nodes on one machine; if you run the network on multiple machines, no config change is needed. Then start a network with, for example, 18 nodes. One way to do this is to follow the README.md file in the shardeum repository and execute shardus start 18.
After all nodes become active, run cd archive-server to switch to that repository, then run npm install && npm run prepare
Create a file named sign.js and add the following code to it.
Then create a file named join.js and copy the following code into it. This script sends at most 1000 join requests to the node listening on nodePort, which is 9004 (change it if your nodes run on different ports). For each join request, it also tells the node that our archiver's port is a number between myArchiverPortStart and myArchiverPortEnd. Archivers with different publicKeys but the same ip:port are allowed to join the network. That is also a bug, but it is outside the scope of this report. We still want our new archivers to have distinct ip:port pairs, because we need them later to maliciously disable the archivers' functionality.
By default, the network does not remove an archiver that is down or not responding. However, we assume this functionality is enabled, so we want our new archivers to respond to network requests. One option is to actually run 1000 archivers, but that is not necessary: we can simply fool the network by proxying every request to a real archiver. For this I used nginx. Install nginx on your machine (sudo apt install nginx) and append the text below to /etc/nginx/nginx.conf. It acts as a port mapping from our archivers' ports to the real archiver's port, which is 4000. Every request to one of our archivers is therefore answered by the archiver at 127.0.0.1:4000.
stream {
    upstream stream_archiver {
        server 127.0.0.1:4000;
    }
    server {
        listen 23500;
        proxy_pass stream_archiver;
    }
    server {
        listen 23501;
        proxy_pass stream_archiver;
    }
    server {
        listen 23502;
        proxy_pass stream_archiver;
    }
    server {
        listen 23503;
        proxy_pass stream_archiver;
    }
    server {
        listen 23504;
        proxy_pass stream_archiver;
    }
    server {
        listen 23505;
        proxy_pass stream_archiver;
    }
    server {
        listen 23506;
        proxy_pass stream_archiver;
    }
    server {
        listen 23507;
        proxy_pass stream_archiver;
    }
    server {
        listen 23508;
        proxy_pass stream_archiver;
    }
    server {
        listen 23509;
        proxy_pass stream_archiver;
    }
}
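Writing one server block per port by hand becomes tedious for hundreds of fake archivers. The hypothetical helper below generates an equivalent stream block for any port range; the range should match the one join.js uses (myArchiverPortStart to myArchiverPortEnd). The output can be appended to /etc/nginx/nginx.conf in the same way.

```typescript
// Hypothetical helper: builds an nginx `stream` block that proxies every port in
// [firstPort, lastPort] to a single real archiver (127.0.0.1:4000 by default).
function nginxStreamConfig(firstPort: number, lastPort: number, upstream = "127.0.0.1:4000"): string {
  if (lastPort < firstPort) throw new Error("lastPort must be >= firstPort");
  const lines: string[] = ["stream {", "    upstream stream_archiver {", `        server ${upstream};`, "    }"];
  for (let port = firstPort; port <= lastPort; port++) {
    // One listener per fake archiver port, all forwarded to the same upstream.
    lines.push("    server {", `        listen ${port};`, "        proxy_pass stream_archiver;", "    }");
  }
  lines.push("}");
  return lines.join("\n");
}

console.log(nginxStreamConfig(23500, 23509));
```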
Now we have our fake archivers. Execute node join.js to generate some public/private key pairs and send join requests to the network. When you see "max limit reached" in the console, you can press Ctrl+C to cancel the remaining requests.
Now, if you open http://localhost:4000/archivers in your browser, you can see that many archivers have joined the network as active archivers.
So far, we have shown that the archiver join limit validation fails to stop extra archivers from joining the network. Next, we use this bug to make all archivers useless.
Open http://localhost:4000/archivers in your browser and copy the publicKeys of two of our fake archivers that have different port numbers. Then create a file named gossipdata.js with the following content, and replace the items of the pkList array with those two publicKeys. Also open http://localhost:4000/cycleinfo/1 in your browser, copy the first item of the cycleInfo array, and use it to replace the default value of the cycle object in the file.
This script sends a fake cycle to the archiver, signed with the keys of two archivers. Two signatures are needed because the archiver uses consensusRadius and the number of active nodes to calculate how many archivers must sign a cycle before it is persisted; since consensusRadius is 16 and we have 18 nodes, two archiver signatures are required. The script also changes cycle.counter to a large number, for example 9999999. From then on, the genuine cycles that nodes send in each new cycle have a smaller counter than this value, so they are discarded.
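The resulting denial of service can be illustrated with a hypothetical model of counter-based ordering. This is not archive-server code, and CycleStoreSketch is an illustrative name. The model relies only on the behavior stated above: a cycle whose counter is not greater than the latest stored counter is discarded. Treating an equal counter as already seen is an additional assumption, and the signature quorum is not modeled.

```typescript
// Hypothetical model: once a forged cycle with an inflated counter is stored,
// genuine cycles with smaller counters are discarded. Not archive-server code.
interface CycleSketch {
  counter: number;
  origin: "genuine" | "forged";
}

class CycleStoreSketch {
  private latestCounter = -1;
  readonly stored: number[] = [];

  // Returns true if the cycle is persisted, false if it is discarded.
  receive(cycle: CycleSketch): boolean {
    if (cycle.counter <= this.latestCounter) return false; // not newer than what is already stored
    this.latestCounter = cycle.counter;
    this.stored.push(cycle.counter);
    return true;
  }
}

const store = new CycleStoreSketch();
store.receive({ counter: 1, origin: "genuine" });
store.receive({ counter: 9999999, origin: "forged" }); // signed by two fake archivers (quorum not modeled)
store.receive({ counter: 2, origin: "genuine" }); // discarded: 2 is not greater than 9999999
console.log(store.stored); // only 1 and 9999999 are stored
```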