Make range sync peer loadbalancing PeerDAS-friendly #6864
Conversation
I didn't review this, but just wanted to throw in my 2 cents that I think bringing the logic up into the network context is a good idea.
There are still some TODOs in here; I guess these can be addressed later.
}
// TODO(das): previously here we de-prioritized peers that had failed to download or
// process a batch
self.send_batch(network, batch_id)
It looks like this function does nothing now and can just be replaced with send_batch? (Unless you're implementing the TODO.)
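The TODO in the diff above refers to de-prioritizing peers that previously failed to download or process a batch. A minimal sketch of that idea, with hypothetical names and a placeholder `PeerId` type (not Lighthouse's actual API):

```rust
use std::collections::HashSet;

// Placeholder for the real libp2p PeerId type (assumption for this sketch).
type PeerId = u64;

/// Order candidate peers so that peers without a prior failure for this
/// batch come first; failed peers are used only as a fallback.
fn prioritize(candidates: &[PeerId], failed: &HashSet<PeerId>) -> Vec<PeerId> {
    let (clean, dirty): (Vec<PeerId>, Vec<PeerId>) =
        candidates.iter().copied().partition(|p| !failed.contains(p));
    clean.into_iter().chain(dirty).collect()
}

fn main() {
    let failed: HashSet<PeerId> = [2].into_iter().collect();
    // Peer 2 failed earlier, so it is ordered last.
    assert_eq!(prioritize(&[1, 2, 3], &failed), vec![1, 3, 2]);
    println!("ok");
}
```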
This pull request has merge conflicts. Could you please resolve them @dapplion? 🙏
Blocked by
@dapplion this PR got closed due to the base branch being deleted; would you mind re-opening this against
Re-opens #6864 targeting unstable.

Issue Addressed

Range sync and backfill sync still assume that each batch request is served by a single peer. This assumption breaks with PeerDAS, where we request custody columns from N peers.

Issues with the current unstable:

- Peer prioritization counts batch requests per peer. This accounting is now broken: data columns by range requests are not accounted for.
- Peer selection for data columns by range ignores the set of peers on a syncing chain and instead draws from the global pool of peers.
- The implementation is very strict when we have no peers to request from. After PeerDAS this case is very common, and we want to handle it more flexibly than hard-failing everything.

Proposed Changes

- [x] Upstream peer prioritization to the network context, which knows exactly how many active requests a peer has (including columns by range).
- [x] Upstream peer selection to the network context: `block_components_by_range_request` now gets a set of peers to choose from instead of a single peer. If it can't find a peer, it returns the error `RpcRequestSendError::NoPeer`.
- [ ] Range sync and backfill sync handle `RpcRequestSendError::NoPeer` explicitly:
  - [ ] Range sync: leaves the batch in the `AwaitingDownload` state and does nothing. **TODO**: we should have some mechanism to fail the chain if it's stale for too long. **EDIT**: not done in this PR.
  - [x] Backfill sync: pauses the sync until another peer joins. **EDIT**: same logic as unstable.

TODOs

- [ ] Add tests :)
- [x] Manually test backfill sync

Note: this touches the mainnet path!
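The two upstreamed pieces can be illustrated together: the network context tracks active requests per peer (including columns by range) and selects the least-loaded peer from a given candidate set, returning `NoPeer` when the set is empty. This is a sketch with hypothetical names and a placeholder `PeerId`, not Lighthouse's actual implementation:

```rust
use std::collections::HashMap;

// Placeholder for the real libp2p PeerId type (assumption for this sketch).
type PeerId = u64;

// Hypothetical error type; the PR names a variant `RpcRequestSendError::NoPeer`.
#[derive(Debug, PartialEq)]
enum RpcRequestSendError {
    NoPeer,
}

/// Pick the peer with the fewest active requests from a candidate set.
/// `active_requests` counts all in-flight requests per peer, including
/// data columns by range, which is the accounting the PR moves into the
/// network context.
fn select_peer(
    candidates: &[PeerId],
    active_requests: &HashMap<PeerId, usize>,
) -> Result<PeerId, RpcRequestSendError> {
    candidates
        .iter()
        .copied()
        .min_by_key(|peer| active_requests.get(peer).copied().unwrap_or(0))
        .ok_or(RpcRequestSendError::NoPeer)
}

fn main() {
    let mut load = HashMap::new();
    load.insert(1, 3);
    load.insert(2, 1);
    // Peer 2 is the least loaded, so it is selected.
    assert_eq!(select_peer(&[1, 2], &load), Ok(2));
    // An empty candidate set (e.g. no peers on the syncing chain) yields NoPeer,
    // which range sync would handle by leaving the batch in AwaitingDownload.
    assert_eq!(select_peer(&[], &load), Err(RpcRequestSendError::NoPeer));
    println!("ok");
}
```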