(Specifications) A bug in specifications with no direct impact on client implementations
Description
Brief/Intro
The Client.BatchCallContext in the rpc package of Ethereum client Erigon ( https://github.com/erigontech/erigon ) could be stuck due to the missing check of the HTTP batch response length in the sendBatchHTTP().
In the rpc package, the Client.BatchCall() is used to send all requests in a single batch, which calls the Client.BatchCallContext() to perform the batched calls, in which the Client.sendBatchHTTP() will be invoked in case that it’s an HTTP request:
// BatchCall sends all given requests as a single batch and waits for the server
// to return a response for all of them.
//
// In contrast to Call, BatchCall only returns I/O errors. Any error specific to
// a request is reported through the Error field of the corresponding BatchElem.
//
// Note that batch calls may not be executed atomically on the server side.
func (c *Client) BatchCall(b []BatchElem) error {
ctx := context.Background()
return c.BatchCallContext(ctx, b)
}
// BatchCall sends all given requests as a single batch and waits for the server
// to return a response for all of them. The wait duration is bounded by the
// context's deadline.
//
// In contrast to CallContext, BatchCallContext only returns errors that have occurred
// while sending the request. Any error specific to a request is reported through the
// Error field of the corresponding BatchElem.
//
// Note that batch calls may not be executed atomically on the server side.
func (c *Client) BatchCallContext(ctx context.Context, b []BatchElem) error {
msgs := make([]*jsonrpcMessage, len(b))
op := &requestOp{
ids: make([]json.RawMessage, len(b)),
resp: make(chan *jsonrpcMessage, len(b)),
}
for i, elem := range b {
msg, err := c.newMessage(elem.Method, elem.Args...)
if err != nil {
return err
}
msgs[i] = msg
op.ids[i] = msg.ID
}
var err error
if c.isHTTP {
err = c.sendBatchHTTP(ctx, op, msgs)
} else {
err = c.send(ctx, op, msgs)
}
// Wait for all responses to come back.
for n := 0; n < len(b) && err == nil; n++ {
var resp *jsonrpcMessage
resp, err = op.wait(ctx, c)
if err != nil {
break
}
// Find the element corresponding to this response.
// The element is guaranteed to be present because dispatch
// only sends valid IDs to our channel.
var elem *BatchElem
for i := range msgs {
if bytes.Equal(msgs[i].ID, resp.ID) {
elem = &b[i]
break
}
}
if resp.Error != nil {
elem.Error = resp.Error
continue
}
if len(resp.Result) == 0 {
elem.Error = ErrNoResult
continue
}
elem.Error = json.Unmarshal(resp.Result, elem.Result)
}
return err
}
As mentioned in the PR: https://github.com/ethereum/go-ethereum/pull/26064
It turns out that Client.BatchCallContext relies on the number or response messages exactly matching the number of requests. If too many responses are received, sendBatchHTTP blocks trying to send more than will fit in the channel buffer, and never yields to timeout. If too few responses are received, BatchCallContext waits for the missing responses from the empty channel, but eventually yields to timeout.
func (c *Client) sendBatchHTTP(ctx context.Context, op *requestOp, msgs []*jsonrpcMessage) error {
hc := c.writeConn.(*httpConn)
respBody, err := hc.doRequest(ctx, msgs)
if err != nil {
return err
}
var respmsgs []jsonrpcMessage
if err := json.Unmarshal(respBody, &respmsgs); err != nil {
return err
}
for i := 0; i < len(respmsgs); i++ {
op.resp <- &respmsgs[i]
}
return nil
}
It is worth noted a similar issue has been fixed in go-ethereum by checking the message and response have same size : https://github.com/ethereum/go-ethereum/pull/26064
func (c *Client) sendBatchHTTP(ctx context.Context, op *requestOp, msgs []*jsonrpcMessage) error {
hc := c.writeConn.(*httpConn)
respBody, err := hc.doRequest(ctx, msgs)
if err != nil {
return err
}
var respmsgs []jsonrpcMessage
if err := json.Unmarshal(respBody, &respmsgs); err != nil {
return err
}
if len(respmsgs) != len(msgs) {
return fmt.Errorf("batch has %d requests but response has %d: %w", len(msgs), len(respmsgs), ErrBadResult)
}
for i := 0; i < len(respmsgs); i++ {
op.resp <- &respmsgs[i]
}
return nil
}
Impact Details
Client.BatchCall() could be stuck in case two many or too few responses are received.