
Conversation

Teebor-Choka
Contributor

@Teebor-Choka Teebor-Choka commented Feb 14, 2025

As identified by @tolbrino, the swarm voids the address cache for a peer ID that failed to dial, leaving the multiaddress unknown for subsequent redials and causing an incremental decline in peer connectability.

To solve this, the node now holds the entire announced peer list and, on a failed dial, reinserts the multiaddress of the peer that could not be dialed.
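The mechanism can be sketched as a minimal model (this is not the actual hoprd code; `DiscoveryCache`, `on_dial_failure`, and the `String` stand-ins for `PeerId`/`Multiaddr` are illustrative, and the real fix emits a `NewExternalAddrOfPeer` event to the swarm rather than mutating a map directly):

```rust
use std::collections::HashMap;

type PeerId = String; // stand-in for libp2p::PeerId
type Multiaddr = String; // stand-in for libp2p::Multiaddr

#[derive(Default)]
struct DiscoveryCache {
    // Mirrors the new `all_peers` field: every announced address is retained here.
    all_peers: HashMap<PeerId, Multiaddr>,
    // Addresses currently known to the swarm (the part that gets voided on dial failure).
    swarm_addresses: HashMap<PeerId, Multiaddr>,
}

impl DiscoveryCache {
    fn on_announcement(&mut self, peer: PeerId, addr: Multiaddr) {
        self.all_peers.insert(peer.clone(), addr.clone());
        self.swarm_addresses.insert(peer, addr);
    }

    // Before the fix, a failed dial removed the address with no way back.
    // After the fix, the cached entry is reinserted so redials can still succeed.
    fn on_dial_failure(&mut self, peer: &PeerId) {
        self.swarm_addresses.remove(peer); // the swarm voids its cache
        if let Some(addr) = self.all_peers.get(peer) {
            self.swarm_addresses.insert(peer.clone(), addr.clone());
        }
    }

    fn dialable(&self, peer: &PeerId) -> Option<&Multiaddr> {
        self.swarm_addresses.get(peer)
    }
}

fn main() {
    let mut cache = DiscoveryCache::default();
    cache.on_announcement("peer-a".to_string(), "/ip4/1.2.3.4/tcp/9091".to_string());
    cache.on_dial_failure(&"peer-a".to_string());
    // The address survives the failed dial and remains available for redials.
    assert!(cache.dialable(&"peer-a".to_string()).is_some());
}
```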

Closes #6869

@Teebor-Choka Teebor-Choka added this to the 2.2.3 milestone Feb 14, 2025
Contributor

coderabbitai bot commented Feb 14, 2025

📝 Walkthrough

Walkthrough

This pull request modifies how peer discovery events are handled. In the discovery module, the NewPeerMultiddress variant is removed from the Event enum, and a new all_peers field (a HashMap<PeerId, Multiaddr>) is added to the Behaviour struct. The initialization and event processing methods (new, on_swarm_event, and poll) are updated accordingly to store peer addresses and enqueue a NewExternalAddrOfPeer event upon failures or announcements. In the swarm module, discovery event handling logic for dialing new peers has been removed, effectively making the swarm ignore these events.

Changes

File(s) Change Summary
transport/p2p/.../discovery.rs Removed the NewPeerMultiddress variant from the Event enum; added an all_peers: HashMap<PeerId, Multiaddr> field to the Behaviour struct; updated new, on_swarm_event, and poll for new peer address handling and dial failure logic.
transport/p2p/.../swarm.rs Removed the logic that handled discovery events for new peer multiaddresses, resulting in discovery events being matched to an empty case and ignored during event processing.
transport/p2p/.../lib.rs Moved the discovery field to the end of the HoprNetworkBehavior struct definition, maintaining its type and functionality.
transport/p2p/Cargo.toml Updated package version from "0.5.4" to "0.5.5".

Possibly related PRs

  • Dependency updates 2024-08-14 #6454: Modifications to the Behaviour struct and event enum in discovery.rs share similar code-level changes, indicating a direct connection between the changes in both pull requests.

Suggested labels

bug, component:p2p

Suggested reviewers

  • tolbrino
  • NumberFour8

📜 Recent review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b1f9345 and f6ba53b.

⛔ Files ignored due to path filters (1)
  • Cargo.lock is excluded by !**/*.lock
📒 Files selected for processing (2)
  • transport/p2p/Cargo.toml (1 hunks)
  • transport/p2p/src/swarm.rs (3 hunks)
✅ Files skipped from review due to trivial changes (1)
  • transport/p2p/Cargo.toml
⏰ Context from checks skipped due to timeout of 90000ms (8)
  • GitHub Check: hoprd / docker
  • GitHub Check: hopli / docker
  • GitHub Check: Docs / Rust docs
  • GitHub Check: tests-unit-nightly
  • GitHub Check: tests-smoke-websocket
  • GitHub Check: tests-unit
  • GitHub Check: Linter
  • GitHub Check: tests-smart-contracts
🔇 Additional comments (3)
transport/p2p/src/swarm.rs (3)

624-638: LGTM! New external address event handling improves peer discovery.

The new match arms for external address events (NewExternalAddrCandidate, ExternalAddrConfirmed, ExternalAddrExpired, NewExternalAddrOfPeer) ensure that peer multiaddresses are properly managed. This change directly addresses the issue by retaining peer addresses even after failed dials.


610-611: LGTM! Enhanced logging for better observability.

The logging changes in ListenerClosed and Dialing events improve the observability of the P2P network's behavior.

Also applies to: 622-622


500-500:

❓ Verification inconclusive

Verify the impact of ignoring Discovery events.

The Discovery events are now being ignored, which aligns with the PR objective to prevent voiding the cache for failed dials. However, we should verify that this doesn't break other functionality that might have depended on these events.


🏁 Script executed:

#!/bin/bash
# Description: Search for any code that might be affected by ignoring Discovery events

# Test: Search for references to Discovery events in the codebase
rg -A 5 "HoprNetworkBehaviorEvent::Discovery"

# Test: Search for any code that might handle peer discovery
ast-grep --pattern 'fn $_(Discovery $_) {
  $$$
}'

Length of output: 247


Action Required: Manually Validate the Impact of Ignoring Discovery Events

The change to leave the Discovery event match arm empty aligns with the goal to prevent unwanted cache voiding on failed dials. However, our automated search did not identify other consumers or handlers of Discovery events, likely due to filtering issues with the initial commands. Please perform a manual verification of the following in your codebase:

  • Locate references in the affected file:
    Confirm that in transport/p2p/src/swarm.rs the empty match arm:

                        SwarmEvent::Behaviour(HoprNetworkBehaviorEvent::Discovery(_)) => {}

    is safe and does not break any downstream logic expecting discovery handling.

  • Check for dependencies:
    Look for any parts of the code (or tests) that might expect non-empty processing of Discovery events. For instance, running:

    rg -A 10 "HoprNetworkBehaviorEvent::Discovery" transport/p2p/src/swarm.rs

    manually should help ensure there are no overlooked side effects.

Based on the current output (or lack thereof from automated scripts), additional manual confirmation is recommended to ensure that ignoring these events will not disrupt peer discovery or related functionalities.


@Teebor-Choka Teebor-Choka changed the base branch from master to release/singapore February 14, 2025 15:41
@Teebor-Choka Teebor-Choka requested a review from a team February 14, 2025 15:41
@Teebor-Choka Teebor-Choka self-assigned this Feb 14, 2025
@Teebor-Choka Teebor-Choka marked this pull request as ready for review February 14, 2025 15:42
@Teebor-Choka Teebor-Choka force-pushed the kauki/libp2p/bug/fix-the-cache-cleaning-of-multiaddresses-on-peer-dial branch from a822d66 to 71d86ae Compare February 14, 2025 15:43
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (5)
transport/p2p/src/behavior/discovery.rs (5)

23-23: Consider removing or repurposing this empty enum.

Having an empty public enum may cause confusion and might be unnecessary if no variants are planned. You could remove it entirely or add a placeholder variant for clarity.


34-34: Reevaluate using a single address per peer.

Storing only one multiaddress in all_peers could discard potentially useful addresses. Consider storing a collection (e.g., HashSet<Multiaddr>) to handle multiple valid addresses per peer.


116-128: Potentially re-adding stale addresses on dial failures.

Reinserting the multiaddress into the swarm is useful for reliability; however, consider how to detect and retire stale addresses if multiple attempts fail. Otherwise, you may repeatedly retry dead endpoints.
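One way to retire stale addresses, per the concern above, is to cap reinsertion with a failure counter. A minimal sketch under assumed names (`AddressBook`, `MAX_DIAL_FAILURES`); the threshold and eviction policy are illustrative, not from the codebase:

```rust
use std::collections::HashMap;

const MAX_DIAL_FAILURES: u32 = 3; // illustrative threshold

#[derive(Default)]
struct AddressBook {
    // peer -> (multiaddress, consecutive dial failures)
    addresses: HashMap<String, (String, u32)>,
}

impl AddressBook {
    fn insert(&mut self, peer: &str, addr: &str) {
        self.addresses.insert(peer.to_string(), (addr.to_string(), 0));
    }

    // Reinsert the address on failure, but retire it once it has failed too often,
    // so the node stops repeatedly retrying dead endpoints.
    fn on_dial_failure(&mut self, peer: &str) -> Option<String> {
        let retire = match self.addresses.get_mut(peer) {
            Some((_, failures)) => {
                *failures += 1;
                *failures >= MAX_DIAL_FAILURES
            }
            None => return None,
        };
        if retire {
            self.addresses.remove(peer);
            None
        } else {
            self.addresses.get(peer).map(|(addr, _)| addr.clone())
        }
    }
}

fn main() {
    let mut book = AddressBook::default();
    book.insert("peer-a", "/ip4/10.0.0.1/tcp/9091");
    assert!(book.on_dial_failure("peer-a").is_some()); // failure 1: re-added
    assert!(book.on_dial_failure("peer-a").is_some()); // failure 2: re-added
    assert!(book.on_dial_failure("peer-a").is_none()); // failure 3: retired
}
```

A successful dial would reset the counter to zero, which this sketch omits for brevity.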


168-173: Address might be outdated or incomplete.

Re-pushing the multiaddress on PeerDiscovery::Allow is practical but could lead to attempting outdated addresses. More granular logic (e.g., verifying success before re-appending) may be beneficial.


190-197: Consider handling all announced addresses instead of just the last.

Relying solely on multiaddresses.last() discards earlier addresses, which might still be valid. You could iterate through all addresses or store them in a more comprehensive data structure.
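The multi-address alternative suggested above could look like the following sketch (names such as `MultiAddrBook` are hypothetical; the actual field is `all_peers: HashMap<PeerId, Multiaddr>`, a single address per peer):

```rust
use std::collections::{HashMap, HashSet};

#[derive(Default)]
struct MultiAddrBook {
    // Store every announced address per peer instead of only the last one.
    all_peers: HashMap<String, HashSet<String>>,
}

impl MultiAddrBook {
    fn announce(&mut self, peer: &str, multiaddresses: &[&str]) {
        let entry = self.all_peers.entry(peer.to_string()).or_default();
        for addr in multiaddresses {
            entry.insert(addr.to_string());
        }
    }

    fn addresses(&self, peer: &str) -> Vec<String> {
        let mut addrs: Vec<String> = self
            .all_peers
            .get(peer)
            .map(|set| set.iter().cloned().collect())
            .unwrap_or_default();
        addrs.sort(); // deterministic order for callers
        addrs
    }
}

fn main() {
    let mut book = MultiAddrBook::default();
    book.announce("peer-a", &["/ip4/10.0.0.1/tcp/9091", "/ip4/10.0.0.2/tcp/9091"]);
    book.announce("peer-a", &["/ip4/10.0.0.2/tcp/9091"]); // duplicate is deduplicated
    assert_eq!(book.addresses("peer-a").len(), 2);
}
```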

📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 77721bf and 71d86ae.

📒 Files selected for processing (2)
  • transport/p2p/src/behavior/discovery.rs (6 hunks)
  • transport/p2p/src/swarm.rs (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (8)
  • GitHub Check: hoprd / docker
  • GitHub Check: hopli / docker
  • GitHub Check: Docs / Rust docs
  • GitHub Check: tests-smoke-websocket
  • GitHub Check: tests-unit-nightly
  • GitHub Check: tests-unit
  • GitHub Check: tests-smart-contracts
  • GitHub Check: Linter
🔇 Additional comments (2)
transport/p2p/src/behavior/discovery.rs (1)

55-55: Same concern regarding multiple addresses.

This initialization aligns with the current single-address-per-peer approach. However, if you decide to store multiple addresses per peer, update this initial value accordingly.

transport/p2p/src/swarm.rs (1)

501-501: Verify the impact of ignoring discovery events.

Dropping all Discovery events may limit visibility into newly announced peers or addresses. Ensure upstream logic sufficiently covers these cases and that no critical network behavior is lost.

Contributor

The latest dAppNode package for DAppNodePackage-Hopr has been built and uploaded with IPFS hash: /ipfs/QmTrntCxz63TpcQtK2fLaZ8yxhZ4iBn62UR7qCXJhJdAAT

@Teebor-Choka Teebor-Choka force-pushed the kauki/libp2p/bug/fix-the-cache-cleaning-of-multiaddresses-on-peer-dial branch from cbcd1ad to f360243 Compare February 14, 2025 16:47
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (1)
transport/p2p/src/lib.rs (1)

87-89: LGTM! Field order is crucial for correct peer address handling.

The placement of the discovery field at the end of the struct is significant. Rust drops struct fields in declaration order (not reverse order), so placing discovery last means it is dropped last — after the request_response components — allowing it to handle peer addresses that those components might have removed.

This ordering dependency should be documented in the module-level documentation to ensure it's not accidentally changed in future refactors.
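The declaration-order drop semantics can be verified with a small standalone program (struct and field names here only mirror the shape of HoprNetworkBehavior for illustration):

```rust
use std::sync::Mutex;

// Records the order in which fields are dropped.
static DROP_LOG: Mutex<Vec<&'static str>> = Mutex::new(Vec::new());

struct Tracked(&'static str);

impl Drop for Tracked {
    fn drop(&mut self) {
        DROP_LOG.lock().unwrap().push(self.0);
    }
}

// Field order mirrors the struct under review: `discovery` is declared last.
struct Behaviour {
    request_response: Tracked,
    discovery: Tracked,
}

fn observed_drop_order() -> Vec<&'static str> {
    DROP_LOG.lock().unwrap().clear();
    {
        let _b = Behaviour {
            request_response: Tracked("request_response"),
            discovery: Tracked("discovery"),
        };
    } // `_b` goes out of scope here; fields drop in declaration order.
    DROP_LOG.lock().unwrap().clone()
}

fn main() {
    // Struct fields drop in declaration order, so `discovery` drops last.
    assert_eq!(observed_drop_order(), ["request_response", "discovery"]);
}
```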

📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 71d86ae and f360243.

📒 Files selected for processing (1)
  • transport/p2p/src/lib.rs (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (8)
  • GitHub Check: hoprd / docker
  • GitHub Check: hopli / docker
  • GitHub Check: tests-smoke-websocket
  • GitHub Check: tests-smart-contracts
  • GitHub Check: tests-unit-nightly
  • GitHub Check: tests-unit
  • GitHub Check: Docs / Rust docs
  • GitHub Check: Linter
🔇 Additional comments (2)
transport/p2p/src/lib.rs (2)

115-164: LGTM! Initialization order is correct.

The initialization order in the new method is correct, with discovery being initialized first. This maintains proper functionality while working with the new field order.


76-90:

❓ Verification inconclusive

Verify thread safety of peer address handling.

The field ordering ensures correct drop order, but we should verify that peer address updates are properly synchronized between request_response components and discovery.

Let's check for any potential race conditions in the discovery module:


🏁 Script executed:

#!/bin/bash
# Description: Check for synchronization mechanisms in discovery module
# Look for mutex/atomic usage around peer address handling

rg -A 5 'Mutex|RwLock|atomic|sync' transport/p2p/src/behavior/discovery.rs

Length of output: 76


Manual verification needed for thread safety in peer address updates.

The discovery module does not appear to use explicit synchronization mechanisms (e.g. Mutex, RwLock, atomics) per our search. While the field ordering in HoprNetworkBehavior ensures a deterministic drop order (with discovery being dropped last), it's unclear if the design sufficiently protects concurrent peer address modifications when request_response components remove addresses and discovery re-adds them. Please manually verify that the peer address handling is thread safe under concurrent scenarios.

Contributor

The latest dAppNode package for DAppNodePackage-Hopr has been built and uploaded with IPFS hash: /ipfs/QmZdkrQzTrrVKvEpjgNJy8Z9yWVBDwDZWBRxnFKGSXUTSQ

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

🧹 Nitpick comments (2)
transport/p2p/src/behavior/discovery.rs (2)

23-23: Consider documenting or removing the empty enum.

The Event enum is now empty after removing the NewPeerMultiddress variant. If it's no longer needed, consider removing it entirely. Otherwise, add documentation explaining why it's kept empty.


34-34: Add documentation for the all_peers field.

This field is crucial for the new caching behavior. Please add documentation explaining its purpose in retaining peer addresses across dial attempts.

📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f360243 and e5c6280.

📒 Files selected for processing (1)
  • transport/p2p/src/behavior/discovery.rs (7 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (8)
  • GitHub Check: hopli / docker
  • GitHub Check: hoprd / docker
  • GitHub Check: Docs / Rust docs
  • GitHub Check: tests-unit
  • GitHub Check: tests-smart-contracts
  • GitHub Check: tests-smoke-websocket
  • GitHub Check: tests-unit-nightly
  • GitHub Check: Linter

Contributor

The latest dAppNode package for DAppNodePackage-Hopr has been built and uploaded with IPFS hash: /ipfs/QmSfcXY3kCERLfioZBHwdddeeqKw5diQSyrReDpeahgyo1

Contributor

The latest dAppNode package for DAppNodePackage-Hopr has been built and uploaded with IPFS hash: /ipfs/QmcMUsCFmSkhJCXCVMVjJTwwW5EocHTrnqRZhQobEvCDdz

Contributor

@tolbrino tolbrino left a comment


crate version bump missing

@github-actions github-actions bot added the dependencies Pull requests that update a dependency file label Feb 17, 2025
Contributor

The latest dAppNode package for DAppNodePackage-Hopr has been built and uploaded with IPFS hash: /ipfs/QmU8SCyvDZw8oPZwdamWK5GWazxDuNcqHK88NcnqpsnzwA

@Teebor-Choka Teebor-Choka merged commit 4d5244b into release/singapore Feb 17, 2025
27 of 28 checks passed
@Teebor-Choka Teebor-Choka deleted the kauki/libp2p/bug/fix-the-cache-cleaning-of-multiaddresses-on-peer-dial branch February 17, 2025 08:18
Labels
crate:core-p2p dependencies Pull requests that update a dependency file DO NOT MERGE docker-production-build package:DAppNodePackage-Hopr Build dAppNode package for dufour
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants