
Conversation

Teebor-Choka
Contributor

@Teebor-Choka Teebor-Choka commented Feb 14, 2025

As identified by @tolbrino, the swarm voids the address cache for a peer ID that failed to dial, leaving the multiaddress unknown for subsequent redials and causing an incremental decline in peer connectability.

To solve this, the node now holds the entire announced peer list and, on a failed dial, reinserts the multiaddress of the peer that could not be dialed.
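The mechanism can be sketched as a minimal model (this is not the actual hoprd code; `DiscoveryCache`, `on_dial_failure`, and the `String` stand-ins for `PeerId`/`Multiaddr` are illustrative, and the real fix emits a `NewExternalAddrOfPeer` event to the swarm rather than mutating a map directly):

```rust
use std::collections::HashMap;

type PeerId = String; // stand-in for libp2p::PeerId
type Multiaddr = String; // stand-in for libp2p::Multiaddr

#[derive(Default)]
struct DiscoveryCache {
    // Mirrors the new `all_peers` field: every announced address is retained here.
    all_peers: HashMap<PeerId, Multiaddr>,
    // Addresses currently known to the swarm (the part that gets voided on dial failure).
    swarm_addresses: HashMap<PeerId, Multiaddr>,
}

impl DiscoveryCache {
    fn on_announcement(&mut self, peer: PeerId, addr: Multiaddr) {
        self.all_peers.insert(peer.clone(), addr.clone());
        self.swarm_addresses.insert(peer, addr);
    }

    // Before the fix, a failed dial removed the address with no way back.
    // After the fix, the cached entry is reinserted so redials can still succeed.
    fn on_dial_failure(&mut self, peer: &PeerId) {
        self.swarm_addresses.remove(peer); // the swarm voids its cache
        if let Some(addr) = self.all_peers.get(peer) {
            self.swarm_addresses.insert(peer.clone(), addr.clone());
        }
    }

    fn dialable(&self, peer: &PeerId) -> Option<&Multiaddr> {
        self.swarm_addresses.get(peer)
    }
}

fn main() {
    let mut cache = DiscoveryCache::default();
    cache.on_announcement("peer-a".to_string(), "/ip4/1.2.3.4/tcp/9091".to_string());
    cache.on_dial_failure(&"peer-a".to_string());
    // The address survives the failed dial and remains available for redials.
    assert!(cache.dialable(&"peer-a".to_string()).is_some());
}
```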

Closes #6869

@Teebor-Choka Teebor-Choka added this to the 2.2.3 milestone Feb 14, 2025
Contributor

coderabbitai bot commented Feb 14, 2025

📝 Walkthrough

Walkthrough

This pull request modifies how peer discovery events are handled. In the discovery module, the NewPeerMultiddress variant is removed from the Event enum, and a new all_peers field (a HashMap<PeerId, Multiaddr>) is added to the Behaviour struct. The initialization and event processing methods (new, on_swarm_event, and poll) are updated accordingly to store peer addresses and enqueue a NewExternalAddrOfPeer event upon failures or announcements. In the swarm module, discovery event handling logic for dialing new peers has been removed, effectively making the swarm ignore these events.

Changes

File(s) Change Summary
transport/p2p/.../discovery.rs Removed the NewPeerMultiddress variant from the Event enum; added an all_peers: HashMap<PeerId, Multiaddr> field to the Behaviour struct; updated new, on_swarm_event, and poll for new peer address handling and dial failure logic.
transport/p2p/.../swarm.rs Removed the logic that handled discovery events for new peer multiaddresses, resulting in discovery events being matched to an empty case and ignored during event processing.
transport/p2p/.../lib.rs Moved the discovery field to the end of the HoprNetworkBehavior struct definition, maintaining its type and functionality.
transport/p2p/Cargo.toml Updated package version from "0.5.4" to "0.5.5".

Possibly related PRs

  • Dependency updates 2024-08-14 #6454: Modifications to the Behaviour struct and event enum in discovery.rs share similar code-level changes, indicating a direct connection between the changes in both pull requests.

Suggested labels

bug, component:p2p

Suggested reviewers

  • tolbrino
  • NumberFour8

📜 Recent review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b1f9345 and f6ba53b.

⛔ Files ignored due to path filters (1)
  • Cargo.lock is excluded by !**/*.lock
📒 Files selected for processing (2)
  • transport/p2p/Cargo.toml (1 hunks)
  • transport/p2p/src/swarm.rs (3 hunks)
✅ Files skipped from review due to trivial changes (1)
  • transport/p2p/Cargo.toml
⏰ Context from checks skipped due to timeout of 90000ms (8)
  • GitHub Check: hoprd / docker
  • GitHub Check: hopli / docker
  • GitHub Check: Docs / Rust docs
  • GitHub Check: tests-unit-nightly
  • GitHub Check: tests-smoke-websocket
  • GitHub Check: tests-unit
  • GitHub Check: Linter
  • GitHub Check: tests-smart-contracts
🔇 Additional comments (3)
transport/p2p/src/swarm.rs (3)

624-638: LGTM! New external address event handling improves peer discovery.

The new match arms for external address events (NewExternalAddrCandidate, ExternalAddrConfirmed, ExternalAddrExpired, NewExternalAddrOfPeer) ensure that peer multiaddresses are properly managed. This change directly addresses the issue by retaining peer addresses even after failed dials.


610-611: LGTM! Enhanced logging for better observability.

The logging changes in ListenerClosed and Dialing events improve the observability of the P2P network's behavior.

Also applies to: 622-622


500-500:

❓ Verification inconclusive

Verify the impact of ignoring Discovery events.

The Discovery events are now being ignored, which aligns with the PR objective to prevent voiding the cache for failed dials. However, we should verify that this doesn't break other functionality that might have depended on these events.


🏁 Script executed:

#!/bin/bash
# Description: Search for any code that might be affected by ignoring Discovery events

# Test: Search for references to Discovery events in the codebase
rg -A 5 "HoprNetworkBehaviorEvent::Discovery"

# Test: Search for any code that might handle peer discovery
ast-grep --pattern 'fn $_(Discovery $_) {
  $$$
}'

Length of output: 247


Action Required: Manually Validate the Impact of Ignoring Discovery Events

The change to leave the Discovery event match arm empty aligns with the goal to prevent unwanted cache voiding on failed dials. However, our automated search did not identify other consumers or handlers of Discovery events, likely due to filtering issues with the initial commands. Please perform a manual verification of the following in your codebase:

  • Locate references in the affected file:
    Confirm that in transport/p2p/src/swarm.rs the empty match arm:

                        SwarmEvent::Behaviour(HoprNetworkBehaviorEvent::Discovery(_)) => {}

    is safe and does not break any downstream logic expecting discovery handling.

  • Check for dependencies:
    Look for any parts of the code (or tests) that might expect non-empty processing of Discovery events. For instance, running:

    rg -A 10 "HoprNetworkBehaviorEvent::Discovery" transport/p2p/src/swarm.rs

    manually should help ensure there are no overlooked side effects.

Based on the current output (or lack thereof from automated scripts), additional manual confirmation is recommended to ensure that ignoring these events will not disrupt peer discovery or related functionalities.


@Teebor-Choka Teebor-Choka changed the base branch from master to release/singapore February 14, 2025 15:41
@Teebor-Choka Teebor-Choka requested a review from a team February 14, 2025 15:41
@Teebor-Choka Teebor-Choka self-assigned this Feb 14, 2025
@Teebor-Choka Teebor-Choka marked this pull request as ready for review February 14, 2025 15:42
@Teebor-Choka Teebor-Choka force-pushed the kauki/libp2p/bug/fix-the-cache-cleaning-of-multiaddresses-on-peer-dial branch from a822d66 to 71d86ae Compare February 14, 2025 15:43
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (5)
transport/p2p/src/behavior/discovery.rs (5)

23-23: Consider removing or repurposing this empty enum.

Having an empty public enum may cause confusion and might be unnecessary if no variants are planned. You could remove it entirely or add a placeholder variant for clarity.


34-34: Reevaluate using a single address per peer.

Storing only one multiaddress in all_peers could discard potentially useful addresses. Consider storing a collection (e.g., HashSet<Multiaddr>) to handle multiple valid addresses per peer.


116-128: Potentially re-adding stale addresses on dial failures.

Reinserting the multiaddress into the swarm is useful for reliability; however, consider how to detect and retire stale addresses if multiple attempts fail. Otherwise, you may repeatedly retry dead endpoints.
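One way to retire stale addresses, per the concern above, is to cap reinsertion with a failure counter. A minimal sketch under assumed names (`AddressBook`, `MAX_DIAL_FAILURES`); the threshold and eviction policy are illustrative, not from the codebase:

```rust
use std::collections::HashMap;

const MAX_DIAL_FAILURES: u32 = 3; // illustrative threshold

#[derive(Default)]
struct AddressBook {
    // peer -> (multiaddress, consecutive dial failures)
    addresses: HashMap<String, (String, u32)>,
}

impl AddressBook {
    fn insert(&mut self, peer: &str, addr: &str) {
        self.addresses.insert(peer.to_string(), (addr.to_string(), 0));
    }

    // Reinsert the address on failure, but retire it once it has failed too often,
    // so the node stops repeatedly retrying dead endpoints.
    fn on_dial_failure(&mut self, peer: &str) -> Option<String> {
        let retire = match self.addresses.get_mut(peer) {
            Some((_, failures)) => {
                *failures += 1;
                *failures >= MAX_DIAL_FAILURES
            }
            None => return None,
        };
        if retire {
            self.addresses.remove(peer);
            None
        } else {
            self.addresses.get(peer).map(|(addr, _)| addr.clone())
        }
    }
}

fn main() {
    let mut book = AddressBook::default();
    book.insert("peer-a", "/ip4/10.0.0.1/tcp/9091");
    assert!(book.on_dial_failure("peer-a").is_some()); // failure 1: re-added
    assert!(book.on_dial_failure("peer-a").is_some()); // failure 2: re-added
    assert!(book.on_dial_failure("peer-a").is_none()); // failure 3: retired
}
```

A successful dial would reset the counter to zero, which this sketch omits for brevity.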


168-173: Address might be outdated or incomplete.

Re-pushing the multiaddress on PeerDiscovery::Allow is practical but could lead to attempting outdated addresses. More granular logic (e.g., verifying success before re-appending) may be beneficial.


190-197: Consider handling all announced addresses instead of just the last.

Relying solely on multiaddresses.last() discards earlier addresses, which might still be valid. You could iterate through all addresses or store them in a more comprehensive data structure.
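The multi-address alternative suggested above could look like the following sketch (names such as `MultiAddrBook` are hypothetical; the actual field is `all_peers: HashMap<PeerId, Multiaddr>`, a single address per peer):

```rust
use std::collections::{HashMap, HashSet};

#[derive(Default)]
struct MultiAddrBook {
    // Store every announced address per peer instead of only the last one.
    all_peers: HashMap<String, HashSet<String>>,
}

impl MultiAddrBook {
    fn announce(&mut self, peer: &str, multiaddresses: &[&str]) {
        let entry = self.all_peers.entry(peer.to_string()).or_default();
        for addr in multiaddresses {
            entry.insert(addr.to_string());
        }
    }

    fn addresses(&self, peer: &str) -> Vec<String> {
        let mut addrs: Vec<String> = self
            .all_peers
            .get(peer)
            .map(|set| set.iter().cloned().collect())
            .unwrap_or_default();
        addrs.sort(); // deterministic order for callers
        addrs
    }
}

fn main() {
    let mut book = MultiAddrBook::default();
    book.announce("peer-a", &["/ip4/10.0.0.1/tcp/9091", "/ip4/10.0.0.2/tcp/9091"]);
    book.announce("peer-a", &["/ip4/10.0.0.2/tcp/9091"]); // duplicate is deduplicated
    assert_eq!(book.addresses("peer-a").len(), 2);
}
```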

📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 77721bf and 71d86ae.

📒 Files selected for processing (2)
  • transport/p2p/src/behavior/discovery.rs (6 hunks)
  • transport/p2p/src/swarm.rs (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (8)
  • GitHub Check: hoprd / docker
  • GitHub Check: hopli / docker
  • GitHub Check: Docs / Rust docs
  • GitHub Check: tests-smoke-websocket
  • GitHub Check: tests-unit-nightly
  • GitHub Check: tests-unit
  • GitHub Check: tests-smart-contracts
  • GitHub Check: Linter
🔇 Additional comments (2)
transport/p2p/src/behavior/discovery.rs (1)

55-55: Same concern regarding multiple addresses.

This initialization aligns with the current single-address-per-peer approach. However, if you decide to store multiple addresses per peer, update this initial value accordingly.

transport/p2p/src/swarm.rs (1)

501-501: Verify the impact of ignoring discovery events.

Dropping all Discovery events may limit visibility into newly announced peers or addresses. Ensure upstream logic sufficiently covers these cases and that no critical network behavior is lost.

Contributor

The latest dAppNode package for DAppNodePackage-Hopr has been built and uploaded with IPFS hash: /ipfs/QmTrntCxz63TpcQtK2fLaZ8yxhZ4iBn62UR7qCXJhJdAAT

@Teebor-Choka Teebor-Choka force-pushed the kauki/libp2p/bug/fix-the-cache-cleaning-of-multiaddresses-on-peer-dial branch from cbcd1ad to f360243 Compare February 14, 2025 16:47
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (1)
transport/p2p/src/lib.rs (1)

87-89: LGTM! Field order is crucial for correct peer address handling.

The placement of the discovery field at the end of the struct is significant. Rust drops struct fields in declaration order (not reverse order), so placing discovery last means it is dropped last — after the request_response components — allowing it to handle peer addresses that those components might have removed.

This ordering dependency should be documented in the module-level documentation to ensure it's not accidentally changed in future refactors.
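The declaration-order drop semantics can be verified with a small standalone program (struct and field names here only mirror the shape of HoprNetworkBehavior for illustration):

```rust
use std::sync::Mutex;

// Records the order in which fields are dropped.
static DROP_LOG: Mutex<Vec<&'static str>> = Mutex::new(Vec::new());

struct Tracked(&'static str);

impl Drop for Tracked {
    fn drop(&mut self) {
        DROP_LOG.lock().unwrap().push(self.0);
    }
}

// Field order mirrors the struct under review: `discovery` is declared last.
struct Behaviour {
    request_response: Tracked,
    discovery: Tracked,
}

fn observed_drop_order() -> Vec<&'static str> {
    DROP_LOG.lock().unwrap().clear();
    {
        let _b = Behaviour {
            request_response: Tracked("request_response"),
            discovery: Tracked("discovery"),
        };
    } // `_b` goes out of scope here; fields drop in declaration order.
    DROP_LOG.lock().unwrap().clone()
}

fn main() {
    // Struct fields drop in declaration order, so `discovery` drops last.
    assert_eq!(observed_drop_order(), ["request_response", "discovery"]);
}
```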

📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 71d86ae and f360243.

📒 Files selected for processing (1)
  • transport/p2p/src/lib.rs (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (8)
  • GitHub Check: hoprd / docker
  • GitHub Check: hopli / docker
  • GitHub Check: tests-smoke-websocket
  • GitHub Check: tests-smart-contracts
  • GitHub Check: tests-unit-nightly
  • GitHub Check: tests-unit
  • GitHub Check: Docs / Rust docs
  • GitHub Check: Linter
🔇 Additional comments (2)
transport/p2p/src/lib.rs (2)

115-164: LGTM! Initialization order is correct.

The initialization order in the new method is correct, with discovery being initialized first. This maintains proper functionality while working with the new field order.


76-90:

❓ Verification inconclusive

Verify thread safety of peer address handling.

The field ordering ensures correct drop order, but we should verify that peer address updates are properly synchronized between request_response components and discovery.

Let's check for any potential race conditions in the discovery module:


🏁 Script executed:

#!/bin/bash
# Description: Check for synchronization mechanisms in discovery module
# Look for mutex/atomic usage around peer address handling

rg -A 5 'Mutex|RwLock|atomic|sync' transport/p2p/src/behavior/discovery.rs

Length of output: 76


Manual verification needed for thread safety in peer address updates.

The discovery module does not appear to use explicit synchronization mechanisms (e.g. Mutex, RwLock, atomics) per our search. While the field ordering in HoprNetworkBehavior ensures a deterministic drop order (with discovery being dropped last), it's unclear if the design sufficiently protects concurrent peer address modifications when request_response components remove addresses and discovery re-adds them. Please manually verify that the peer address handling is thread safe under concurrent scenarios.

Contributor

The latest dAppNode package for DAppNodePackage-Hopr has been built and uploaded with IPFS hash: /ipfs/QmZdkrQzTrrVKvEpjgNJy8Z9yWVBDwDZWBRxnFKGSXUTSQ

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

🧹 Nitpick comments (2)
transport/p2p/src/behavior/discovery.rs (2)

23-23: Consider documenting or removing the empty enum.

The Event enum is now empty after removing the NewPeerMultiddress variant. If it's no longer needed, consider removing it entirely. Otherwise, add documentation explaining why it's kept empty.


34-34: Add documentation for the all_peers field.

This field is crucial for the new caching behavior. Please add documentation explaining its purpose in retaining peer addresses across dial attempts.

📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f360243 and e5c6280.

📒 Files selected for processing (1)
  • transport/p2p/src/behavior/discovery.rs (7 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (8)
  • GitHub Check: hopli / docker
  • GitHub Check: hoprd / docker
  • GitHub Check: Docs / Rust docs
  • GitHub Check: tests-unit
  • GitHub Check: tests-smart-contracts
  • GitHub Check: tests-smoke-websocket
  • GitHub Check: tests-unit-nightly
  • GitHub Check: Linter

Contributor

The latest dAppNode package for DAppNodePackage-Hopr has been built and uploaded with IPFS hash: /ipfs/QmSfcXY3kCERLfioZBHwdddeeqKw5diQSyrReDpeahgyo1

Contributor

The latest dAppNode package for DAppNodePackage-Hopr has been built and uploaded with IPFS hash: /ipfs/QmcMUsCFmSkhJCXCVMVjJTwwW5EocHTrnqRZhQobEvCDdz

Contributor

@tolbrino tolbrino left a comment


crate version bump missing

@github-actions github-actions bot added the dependencies Pull requests that update a dependency file label Feb 17, 2025
Contributor

The latest dAppNode package for DAppNodePackage-Hopr has been built and uploaded with IPFS hash: /ipfs/QmU8SCyvDZw8oPZwdamWK5GWazxDuNcqHK88NcnqpsnzwA

@Teebor-Choka Teebor-Choka merged commit 4d5244b into release/singapore Feb 17, 2025
27 of 28 checks passed
@Teebor-Choka Teebor-Choka deleted the kauki/libp2p/bug/fix-the-cache-cleaning-of-multiaddresses-on-peer-dial branch February 17, 2025 08:18
Labels
crate:core-p2p dependencies Pull requests that update a dependency file DO NOT MERGE docker-production-build package:DAppNodePackage-Hopr Build dAppNode package for dufour
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants