Retry custody requests after peer metadata updates #6975
Conversation
Overall looks good to me! I think we need to attempt to make progress on range sync as well?
@@ -483,6 +486,13 @@ impl<T: BeaconChainTypes> SyncManager<T> {
        }
    }

    fn updated_peer_cgc(&mut self, _peer_id: PeerId) {
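The diff only shows the new handler's signature. As a rough, self-contained sketch of the intended behavior (toy types, not Lighthouse's actual SyncManager), the idea is that a CGC update retries stalled custody work instead of waiting minutes for an unrelated trigger such as a new finalized chain:

```rust
// Toy model of the discussed behavior. Field and type names are illustrative,
// not Lighthouse's real internals.

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct PeerId(u64);

#[derive(Default)]
struct SyncManager {
    /// Lookup requests stalled waiting for custody column peers.
    stalled_lookups: Vec<u64>,
    /// Range-sync chains stalled waiting for custody column peers.
    stalled_chains: Vec<u64>,
    resumed_lookups: usize,
    resumed_chains: usize,
}

impl SyncManager {
    /// Called when a metadata response raises a peer's CGC: the peer may now
    /// custody column subnets that previously had zero peers, so both lookup
    /// and range-sync work is retried immediately.
    fn updated_peer_cgc(&mut self, _peer_id: PeerId) {
        self.resumed_lookups += self.stalled_lookups.drain(..).count();
        self.resumed_chains += self.stalled_chains.drain(..).count();
    }
}

fn main() {
    let mut sync = SyncManager::default();
    sync.stalled_lookups.push(1);
    sync.stalled_chains.push(7);
    sync.updated_peer_cgc(PeerId(42));
    assert_eq!(sync.resumed_lookups, 1);
    assert_eq!(sync.resumed_chains, 1);
}
```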
I think we want to resume by-range requests as well?
Fixed, I resume range sync as well.
Nice, I'll run some tests today to confirm this fixes the issue.
Doesn't look like this fixes the issue. I still don't see range requests until a few minutes later, when we add a new finalized chain.
Right after startup, waiting for custody peers
Feb 12 04:33:07.181 DEBG Waiting for peers to be available on sampling column subnets, chain: 1, service: range_sync, service: sync, module: network::sync::range_sync::chain:1057
Got peer metadata response after 15s
Feb 12 04:33:22.271 DEBG Obtained peer's metadata, new_seq_no: 6, peer_id: 16Uiu2HAmJyaVGkRGR9ACqompkoge8T2x4KFH4KbzDkh7zz6uN2JX, service: libp2p, module: lighthouse_network::peer_manager:732
No range requests until ~5 mins later
Feb 12 04:38:07.303 DEBG Finalization sync peer joined, peer_id: 16Uiu2HAmJyaVGkRGR9ACqompkoge8T2x4KFH4KbzDkh7zz6uN2JX, service: range_sync, service: sync, module: network::sync::range_sync::range:143
Feb 12 04:38:07.305 DEBG New chain added to sync, id: 2, from: 38, to: 1071, end_root: 0x3be00d7ce6e52f7938fd588d909055f72469d0f09ce545d7b23077f2d6b40e8a, current_target: 38, batches: 0, peers: 1, state: Stopped, sync_type: Finalized, peer_id: 16Uiu2HAmJyaVGkRGR9ACqompkoge8T2x4KFH4KbzDkh7zz6uN2JX, service: range_sync, service: sync, module: network::sync::range_sync::chain_collection:506
Feb 12 04:38:07.306 DEBG Sync RPC request sent, id: 4/3/RangeSync/39/1, peer: 16Uiu2HAmAAZ5wP6fvpe1b9tWNmgA2Wn8MsrNYVhkw5WohcAoaHKR, epoch: 39, slots: 32, method: BlocksByRange, service: sync, module: network::sync::network_context:788
Feb 12 04:38:07.307 DEBG Sync RPC request sent, id: 5/3/RangeSync/39/1, peer: 16Uiu2HAm9PijSZpm5QUphXRoBtkhUZPkGJ4Rgxk4Bny91oZPYZLG, columns: [74, 30, 39, 19, 63, 41, 52, 47, 58], epoch: 39, slots: 32, method: DataColumnsByRange, service: sync, module: network::sync::network_context:870
Did you see this log: "Received updated peer CGC"?
I don't recall seeing this. I think I was using the right locally-built image, but it could be worth re-testing to confirm if you have time.
@dapplion could you retest this change?
I've retested this, and it doesn't seem to trigger a retry after obtaining the peers' metadata; the "Received updated peer CGC" log was not observed.
May 06 07:16:46.854 DEBUG Waiting for peers to be available on custody column subnets chain: 1service: "range_sync"
...
# obtained peers metadata
May 06 07:17:01.389 DEBUG Obtained peer's metadata peer_id: 16Uiu2HAkz7SLRbDFNscs6RFDrD37oB5yCR9UaBcVpAiy1G54BuQW, new_seq_no: 6, service: "network"
May 06 07:17:01.393 DEBUG Obtained peer's metadata peer_id: 16Uiu2HAmKjMmG6VJG2oi5Gd7x4iszsv4Fm9JZWeEad62kTVF9dke, new_seq_no: 132, service: "network"
...
# no range request until ~5 mins later when a new chain is added
May 06 07:21:46.397 DEBUG New chain added to sync peer_id: "16Uiu2HAkz7SLRbDFNscs6RFDrD37oB5yCR9UaBcVpAiy1G54BuQW", sync_type: Finalized, id: 5, start_epoch: 0, target_head_slot: 193, target_head_root: 0xc1802494a0935c8bee449a412f982cc1903ec791565d9cf9eaab233f2cd2bc90, component: "range_sync"
May 06 07:21:46.397 DEBUG Sync RPC request sent method: "DataColumnsByRange", slots: 32, epoch: 1, columns: [8, 69, 99, 90, 9, 84, 24, 71, 12, 121, 50, 68, 127, 66, 36, 26, 41, 107, 47, 52, 108, 98, 70, 100, 54, 35, 21, 60, 49, 120, 10, 72, 44, 93, 67, 0, 122, 19, 43, 62, 97, 115, 59, 95, 48, 11, 101, 116, 34, 20, 25, 18, 3, 124, 110, 57, 46, 83, 77, 105, 81, 106, 1, 45, 104, 22, 29, 53, 6, 32, 13, 119, 4, 78, 63, 86, 92, 7, 114, 73, 16, 17, 109, 28, 75, 102, 80, 40, 79, 51, 125, 38, 85, 89, 42, 39, 126, 113, 87, 96, 37, 88, 64, 112, 14, 118, 76, 117, 58, 82, 27, 94, 74, 23, 30, 111, 15, 2, 123, 33, 55, 5, 65, 91, 31, 56], peer: 16Uiu2HAmKjMmG6VJG2oi5Gd7x4iszsv4Fm9JZWeEad62kTVF9dke, id: 8/6/RangeSync/1/1, chain: 1, service: "range_sync"
May 06 07:21:46.397 DEBUG Sync RPC request sent method: "DataColumnsByRange", slots: 32, epoch: 1, columns: [61, 103], peer: 16Uiu2HAkz7SLRbDFNscs6RFDrD37oB5yCR9UaBcVpAiy1G54BuQW, id: 9/6/RangeSync/1/1, chain: 1, service: "range_sync"
See #6975 (comment)
This pull request has merge conflicts. Could you please resolve them @dapplion? 🙏
From what I can see here, you only care whether the custody columns get updated.
I think we should try to maintain all our internal consistency, i.e. internal requests stay internal, otherwise it can get very confusing.
The solution you have here has redundant cases: if the CGC is not updated, we still send the event up to the router, which then throws it away.
To keep everything consistent, I think we should do the following:
- Add a new event to NetworkEvent, i.e. NetworkEvent::PeerUpdatedCustodyGroupCount(PeerId)
- In the handling of a metadata response, if the CGC is updated, return the above event
- Handle this event and send it to sync
This has the benefits of:
- Not sending redundant messages with redundant information (i.e. the original metadata)
- Maintaining the logic that internal messages stay internal
- Making the purpose of the message, and why it's there, a bit clearer
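The suggested flow can be sketched with simplified stand-in types. Only NetworkEvent::PeerUpdatedCustodyGroupCount(PeerId) is named in the comment; everything else here (the function name, the metadata shape) is illustrative:

```rust
// Sketch of the proposed event flow: the network layer keeps its bookkeeping
// internal and surfaces an event only when the CGC actually changes.

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct PeerId(u64);

// Stand-in for the relevant part of a metadata response.
struct MetaData {
    custody_group_count: u64,
}

#[derive(Debug, PartialEq)]
enum NetworkEvent {
    // Emitted only on a CGC change, so no redundant metadata travels upward.
    PeerUpdatedCustodyGroupCount(PeerId),
}

/// Hypothetical handler for a metadata response: update the known CGC and
/// return an event only if it changed.
fn on_metadata_response(
    known_cgc: &mut u64,
    peer: PeerId,
    meta: &MetaData,
) -> Option<NetworkEvent> {
    if meta.custody_group_count != *known_cgc {
        *known_cgc = meta.custody_group_count;
        Some(NetworkEvent::PeerUpdatedCustodyGroupCount(peer))
    } else {
        None // unchanged CGC: nothing to tell the router or sync
    }
}

fn main() {
    let peer = PeerId(1);
    let mut known = 4;
    // CGC raised from 4 to 8: event is emitted and forwarded to sync.
    let ev = on_metadata_response(&mut known, peer, &MetaData { custody_group_count: 8 });
    assert_eq!(ev, Some(NetworkEvent::PeerUpdatedCustodyGroupCount(peer)));
    // Same CGC again: no redundant event.
    let ev = on_metadata_response(&mut known, peer, &MetaData { custody_group_count: 8 });
    assert_eq!(ev, None);
}
```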
Thanks @pawanjay176 @AgeManning! Yeah, that makes sense. I've updated the code per your suggestions; I think it's much cleaner.
✅ This approach looks much cleaner! Thanks @AgeManning
Nice!
This pull request has been removed from the queue for the following reason: the merge conditions cannot be satisfied due to failing checks. You may have to fix your CI before adding the pull request to the queue again.
@mergify requeue
✅ The queue state of this pull request has been cleaned. It can be re-embarked automatically.
Issue Addressed
Closes #6895
We need sync to retry custody requests when a peer's CGC updates. A higher CGC can result in a data column subnet's peer count increasing from 0 to 1, allowing requests to proceed.
Proposed Changes
Add a new sync event, SyncMessage::UpdatedPeerCgc. It's sent by the router when a metadata response updates the known CGC.
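The wiring described above can be sketched as follows. Only SyncMessage::UpdatedPeerCgc and NetworkEvent::PeerUpdatedCustodyGroupCount are named in this thread; the router function is a hypothetical stand-in:

```rust
// Illustrative wiring: the router translates the network-layer event into a
// sync message, keeping the metadata details out of the sync layer.

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct PeerId(u64);

enum NetworkEvent {
    PeerUpdatedCustodyGroupCount(PeerId),
}

#[derive(Debug, PartialEq)]
enum SyncMessage {
    UpdatedPeerCgc(PeerId),
}

/// Hypothetical router step: map the network event to the sync message.
fn route(event: NetworkEvent) -> SyncMessage {
    match event {
        NetworkEvent::PeerUpdatedCustodyGroupCount(peer) => {
            SyncMessage::UpdatedPeerCgc(peer)
        }
    }
}

fn main() {
    let msg = route(NetworkEvent::PeerUpdatedCustodyGroupCount(PeerId(3)));
    assert_eq!(msg, SyncMessage::UpdatedPeerCgc(PeerId(3)));
}
```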