-
Notifications
You must be signed in to change notification settings - Fork 97
heartbeat: Replace the libp2p protocol based heartbeat with in-HOPR protocol based one #7142
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
heartbeat: Replace the libp2p protocol based heartbeat with in-HOPR protocol based one #7142
Conversation
📝 WalkthroughWalkthroughThis change removes the heartbeat and ping mechanisms from the transport and P2P layers and introduces a new modular probing subsystem for network peer liveness and latency detection. It adds new transport crates ( Changes
Sequence Diagram(s)sequenceDiagram
participant Node as Local Node
participant Probe as Probe Subsystem
participant PeerStore as Peer Store
participant Network as Network Layer
participant Peer as Remote Peer
Node->>Probe: Start continuous probing (with config)
loop Probing Round
Probe->>PeerStore: Fetch peers to probe
PeerStore-->>Probe: List of eligible peers
loop For each peer
Probe->>Network: Send probe (ping) message
Network->>Peer: Deliver probe
Peer->>Network: Respond with probe (pong)
Network->>Probe: Deliver probe response
Probe->>PeerStore: Update peer status (success/failure)
end
end
sequenceDiagram
participant CLI as CLI/Config
participant App as Application
participant Probe as Probe Subsystem
CLI->>App: Parse --probeRecheckThreshold and probe config
App->>Probe: Initialize with probe config
App->>Probe: Run probe subsystem
Probe->>App: Report probe results (liveness, latency)
Possibly related PRs
Suggested labels
Suggested reviewers
✨ Finishing Touches
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
SupportNeed help? Create a ticket on our support page for assistance with any issues or questions. Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
Documentation and Community
|
a79722e
to
c890e9f
Compare
ad9d6dd
to
9320064
Compare
ae755a4
to
df5345c
Compare
f2b5b51
to
57c6986
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
♻️ Duplicate comments (1)
transport/probe/src/ping.rs (1)
104-106
: Return early when ping dispatch fails.When
send_ping.unbounded_send()
fails, the code logs a warning but continues to wait for a response, which will inevitably timeout. Since thisping
method returns aResult<std::time::Duration>
, it should return early with an error when dispatch fails.- if let Err(error) = self.send_ping.clone().unbounded_send((peer, replier)) { - warn!(%peer, %error, "Failed to initiate a ping request"); - } + if let Err(error) = self.send_ping.unbounded_send((peer, replier)) { + warn!(%peer, %error, "Failed to initiate a ping request"); + return Err(ProbeError::PingerError(peer, "dispatch failed".into())); + }This also removes the unnecessary
clone()
onsend_ping
.
🧹 Nitpick comments (1)
transport/probe/src/ping.rs (1)
18-26
: Rename type alias to reflect probe functionality.The type alias
HeartbeatSendPingTx
still references "Heartbeat" even though this code is part of the probe subsystem that replaces heartbeat functionality. Consider renaming it toProbeSendPingTx
orPingChannelTx
for clarity.-/// Heartbeat send ping TX type +/// Probe send ping TX type -pub type HeartbeatSendPingTx = UnboundedSender<(PeerId, PingQueryReplier)>; +pub type ProbeSendPingTx = UnboundedSender<(PeerId, PingQueryReplier)>;
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
Cargo.lock
is excluded by!**/*.lock
📒 Files selected for processing (102)
.github/files/pr-python-sdk.md
(0 hunks).github/labeler.yml
(3 hunks).github/renovate.json
(0 hunks)Cargo.toml
(6 hunks)METRICS.md
(1 hunks)README.md
(3 hunks)chain/indexer/src/handlers.rs
(12 hunks)common/internal-types/Cargo.toml
(2 hunks)common/internal-types/src/protocol.rs
(1 hunks)common/network-types/Cargo.toml
(1 hunks)common/network-types/src/session/state.rs
(2 hunks)common/network-types/src/types.rs
(1 hunks)common/parallelize/Cargo.toml
(1 hunks)common/primitive-types/Cargo.toml
(1 hunks)common/primitive-types/src/primitives.rs
(2 hunks)crypto/packet/Cargo.toml
(1 hunks)crypto/sphinx/Cargo.toml
(2 hunks)crypto/sphinx/src/packet.rs
(2 hunks)crypto/types/Cargo.toml
(2 hunks)db/api/src/protocol.rs
(1 hunks)db/migration/src/lib.rs
(3 hunks)db/migration/src/m20250528_000023_peers_reset.rs
(1 hunks)db/sql/Cargo.toml
(1 hunks)db/sql/src/peers.rs
(1 hunks)ethereum/contracts/contracts-addresses.json
(2 hunks)hopli/src/key_pair.rs
(2 hunks)hopr/hopr-lib/src/config.rs
(2 hunks)hopr/hopr-lib/src/lib.rs
(3 hunks)hopr/hopr-lib/tests/session_integration_tests.rs
(3 hunks)hoprd/hoprd/Cargo.toml
(0 hunks)hoprd/hoprd/example_cfg.yaml
(1 hunks)hoprd/hoprd/src/cli.rs
(2 hunks)hoprd/hoprd/src/config.rs
(1 hunks)hoprd/hoprd/src/lib.rs
(0 hunks)hoprd/keypair/src/key_pair.rs
(3 hunks)hoprd/rest-api/Cargo.toml
(0 hunks)hoprd/rest-api/src/peers.rs
(1 hunks)hoprd/rest-api/src/session.rs
(2 hunks)sdk/barebone-lower-win-prob.cfg.yaml
(1 hunks)sdk/barebone.cfg.yaml
(1 hunks)sdk/default.cfg.yaml
(1 hunks)sdk/python/localcluster/cluster.py
(1 hunks)tests/conftest.py
(1 hunks)transport/api/Cargo.toml
(4 hunks)transport/api/src/config.rs
(3 hunks)transport/api/src/constants.rs
(0 hunks)transport/api/src/errors.rs
(2 hunks)transport/api/src/helpers.rs
(1 hunks)transport/api/src/lib.rs
(16 hunks)transport/api/src/network_notifier.rs
(6 hunks)transport/bloom/.gitignore
(1 hunks)transport/bloom/Cargo.toml
(1 hunks)transport/bloom/benches/bloom_filter.rs
(1 hunks)transport/bloom/src/lib.rs
(1 hunks)transport/bloom/src/persistent.rs
(1 hunks)transport/bloom/src/raw.rs
(1 hunks)transport/mixer/benches/mixer_throughput.rs
(1 hunks)transport/mixer/src/channel.rs
(2 hunks)transport/network/Cargo.toml
(0 hunks)transport/network/src/config.rs
(1 hunks)transport/network/src/constants.rs
(0 hunks)transport/network/src/errors.rs
(0 hunks)transport/network/src/heartbeat.rs
(0 hunks)transport/network/src/lib.rs
(0 hunks)transport/network/src/messaging.rs
(0 hunks)transport/network/src/network.rs
(11 hunks)transport/network/src/ping.rs
(0 hunks)transport/p2p/Cargo.toml
(2 hunks)transport/p2p/src/behavior/discovery.rs
(5 hunks)transport/p2p/src/behavior/heartbeat.rs
(0 hunks)transport/p2p/src/behavior/mod.rs
(0 hunks)transport/p2p/src/constants.rs
(0 hunks)transport/p2p/src/lib.rs
(3 hunks)transport/p2p/src/swarm.rs
(6 hunks)transport/p2p/tests/p2p_transport_test.rs
(5 hunks)transport/packet/Cargo.toml
(2 hunks)transport/packet/src/errors.rs
(1 hunks)transport/packet/src/lib.rs
(1 hunks)transport/packet/src/v1.rs
(1 hunks)transport/probe/.gitignore
(1 hunks)transport/probe/Cargo.toml
(1 hunks)transport/probe/src/config.rs
(1 hunks)transport/probe/src/content.rs
(1 hunks)transport/probe/src/db_proxy.rs
(1 hunks)transport/probe/src/errors.rs
(1 hunks)transport/probe/src/lib.rs
(1 hunks)transport/probe/src/neighbors.rs
(1 hunks)transport/probe/src/ping.rs
(1 hunks)transport/probe/src/probe.rs
(1 hunks)transport/probe/src/traits.rs
(1 hunks)transport/protocol/Cargo.toml
(1 hunks)transport/protocol/benches/protocol_throughput_emulated.rs
(3 hunks)transport/protocol/src/config.rs
(0 hunks)transport/protocol/src/lib.rs
(3 hunks)transport/protocol/src/processor.rs
(3 hunks)transport/protocol/src/stream.rs
(6 hunks)transport/protocol/tests/common/mod.rs
(6 hunks)transport/session/Cargo.toml
(1 hunks)transport/session/src/initiation.rs
(12 hunks)transport/session/src/manager.rs
(16 hunks)transport/session/src/traits.rs
(1 hunks)transport/session/src/types.rs
(9 hunks)
💤 Files with no reviewable changes (17)
- hoprd/rest-api/Cargo.toml
- .github/renovate.json
- hoprd/hoprd/src/lib.rs
- hoprd/hoprd/Cargo.toml
- transport/p2p/src/constants.rs
- transport/protocol/src/config.rs
- .github/files/pr-python-sdk.md
- transport/network/src/lib.rs
- transport/p2p/src/behavior/mod.rs
- transport/network/Cargo.toml
- transport/api/src/constants.rs
- transport/network/src/errors.rs
- transport/network/src/constants.rs
- transport/network/src/heartbeat.rs
- transport/p2p/src/behavior/heartbeat.rs
- transport/network/src/messaging.rs
- transport/network/src/ping.rs
✅ Files skipped from review due to trivial changes (2)
- sdk/python/localcluster/cluster.py
- transport/protocol/benches/protocol_throughput_emulated.rs
🚧 Files skipped from review as they are similar to previous changes (79)
- crypto/types/Cargo.toml
- common/network-types/Cargo.toml
- transport/bloom/benches/bloom_filter.rs
- METRICS.md
- ethereum/contracts/contracts-addresses.json
- transport/api/src/helpers.rs
- tests/conftest.py
- transport/session/src/traits.rs
- common/parallelize/Cargo.toml
- hopr/hopr-lib/src/config.rs
- transport/bloom/src/persistent.rs
- db/api/src/protocol.rs
- transport/session/Cargo.toml
- transport/network/src/config.rs
- hopli/src/key_pair.rs
- transport/probe/.gitignore
- common/network-types/src/session/state.rs
- hopr/hopr-lib/tests/session_integration_tests.rs
- db/sql/Cargo.toml
- hoprd/rest-api/src/peers.rs
- crypto/packet/Cargo.toml
- db/migration/src/lib.rs
- transport/bloom/.gitignore
- common/primitive-types/Cargo.toml
- transport/packet/Cargo.toml
- db/sql/src/peers.rs
- hoprd/rest-api/src/session.rs
- common/internal-types/Cargo.toml
- transport/packet/src/errors.rs
- sdk/barebone.cfg.yaml
- hoprd/hoprd/src/config.rs
- transport/protocol/Cargo.toml
- sdk/barebone-lower-win-prob.cfg.yaml
- README.md
- transport/protocol/src/processor.rs
- common/network-types/src/types.rs
- transport/mixer/src/channel.rs
- db/migration/src/m20250528_000023_peers_reset.rs
- transport/session/src/types.rs
- transport/p2p/Cargo.toml
- hoprd/hoprd/example_cfg.yaml
- crypto/sphinx/Cargo.toml
- transport/bloom/src/lib.rs
- hoprd/keypair/src/key_pair.rs
- transport/probe/src/errors.rs
- sdk/default.cfg.yaml
- transport/protocol/src/lib.rs
- transport/api/Cargo.toml
- transport/api/src/errors.rs
- transport/p2p/src/behavior/discovery.rs
- transport/api/src/config.rs
- hopr/hopr-lib/src/lib.rs
- transport/p2p/tests/p2p_transport_test.rs
- transport/probe/src/lib.rs
- transport/bloom/Cargo.toml
- common/primitive-types/src/primitives.rs
- crypto/sphinx/src/packet.rs
- transport/packet/src/lib.rs
- transport/mixer/benches/mixer_throughput.rs
- transport/probe/src/neighbors.rs
- transport/protocol/src/stream.rs
- transport/probe/src/db_proxy.rs
- hoprd/hoprd/src/cli.rs
- transport/network/src/network.rs
- transport/probe/src/config.rs
- chain/indexer/src/handlers.rs
- common/internal-types/src/protocol.rs
- transport/p2p/src/lib.rs
- transport/probe/Cargo.toml
- transport/probe/src/content.rs
- transport/session/src/initiation.rs
- transport/bloom/src/raw.rs
- transport/session/src/manager.rs
- transport/probe/src/traits.rs
- transport/protocol/tests/common/mod.rs
- transport/packet/src/v1.rs
- transport/probe/src/probe.rs
- Cargo.toml
- transport/api/src/lib.rs
🧰 Additional context used
🧠 Learnings (1)
.github/labeler.yml (1)
Learnt from: Teebor-Choka
PR: hoprnet/hoprnet#7142
File: .github/labeler.yml:95-106
Timestamp: 2025-05-26T22:04:22.750Z
Learning: In .github/labeler.yml, both legacy crate labels (like hopr-path, hopr-platform) and new crate labels (like hopr-chain-actions, hopr-chain-api) can coexist in the same file. When new crate labels are added, it doesn't necessarily mean old ones are deprecated or should be removed.
⏰ Context from checks skipped due to timeout of 90000ms (7)
- GitHub Check: hoprd / docker
- GitHub Check: hopli / docker
- GitHub Check: tests-unit
- GitHub Check: tests-smoke-hopli
- GitHub Check: tests-unit-nightly
- GitHub Check: zizmor
- GitHub Check: Linter
🔇 Additional comments (12)
transport/p2p/src/swarm.rs (3)
11-12
: LGTM! Import changes align with heartbeat removal.The imports correctly reflect the transition from heartbeat-based functionality to network-triggered events and peer discovery mechanisms.
Also applies to: 21-21
34-41
: Function signatures properly simplified for heartbeat removal.The removal of heartbeat-related parameters from
build_p2p_network
,HoprNetworkBehavior::new
,HoprSwarm::new
, and therun
method appropriately simplifies the API and aligns with the architectural transition to probe-based functionality.Also applies to: 62-62, 105-117, 174-174
303-303
: Good addition for peer address management.Adding
swarm.add_peer_address()
when handlingSwarmEvent::NewExternalAddrOfPeer
is a logical enhancement for maintaining peer address information in the swarm.transport/probe/src/ping.rs (3)
46-47
: Good! Box allocation removed as suggested.The challenge field is now a tuple instead of
Box<(u64, NeighborProbe)>
, which removes the unnecessary heap allocation mentioned in previous reviews. The use ofcurrent_time()
is maintained for platform compatibility with WASM.Also applies to: 53-57
69-78
: Latency calculation logic is correct.The implementation properly validates the pong response against the challenge using
is_complement_to()
and correctly calculates unidirectional latency by halving the round-trip time. The error handling for invalid responses is appropriate.
135-262
: Excellent test coverage for ping functionality.The test suite comprehensively covers all important scenarios:
- Correct pong response validation
- Incorrect nonce handling
- Latency measurement accuracy
- Timeout behavior
- Integration with channel-based communication
The tests use realistic async simulation and provide good confidence in the implementation.
.github/labeler.yml (3)
19-19
: Good addition of uv.lock to dependency tracking.Adding
"uv.lock"
to the dependency file patterns is appropriate as it's a Python package lock file, similar to how"Cargo.lock"
is tracked for Rust dependencies.
32-46
: Well-organized crate label restructuring.The systematic reorganization of crate labels reflects a more modular project structure. The addition of chain-specific crates (
hopr-chain-actions
,hopr-chain-api
, etc.) and transport-specific crates (hopr-transport-bloom
,hopr-transport-probe
, etc.) provides better granularity for automated labeling.Also applies to: 74-139
53-64
:✅ Verification successful
Verify crypto crate directory structure.
The labeler patterns reference specific crypto subdirectories (
crypto/packet/**
,crypto/random/**
, etc.). Please ensure these directories exist in the repository structure, as previous reviews indicated they might be missing.
🏁 Script executed:
#!/bin/bash # Verify that crypto subdirectories exist to match labeler patterns echo "Checking crypto directory structure:" fd -t d . crypto/ || echo "No subdirectories found in crypto/" echo -e "\nListing all crypto-related paths:" fd . crypto/ | head -20Length of output: 858
Labeler patterns for crypto crates verified
I confirmed that the following directories exist, so the rules in
.github/labeler.yml
will match as intended:
- crypto/packet/**
- crypto/random/**
- crypto/sphinx/**
- crypto/types/**
No changes required.
transport/api/src/network_notifier.rs (3)
40-69
: Well-executed transition to probe-based network interactions.The refactoring from
PingExternalInteractions
toProbeNetworkInteractions
with the newPeerDiscoveryFetch
andProbeStatusUpdate
trait implementations properly abstracts the network interaction logic. The error handling and logging are appropriate, and the metrics integration is maintained.Also applies to: 72-89, 92-153
17-30
: Metrics properly updated for probe system.The metrics definitions and usage have been appropriately updated to reflect the probe-based architecture while maintaining backward compatibility with existing monitoring. The histogram for ping times and counter for probe results provide good observability.
Also applies to: 104-108, 113-114
80-88
: Improved error handling and logging.The error handling has been enhanced with:
- Graceful fallback to empty peer list on discovery failures
- More specific error messages for peer resolution issues
- Appropriate logging levels for different error conditions
The changes provide better observability and robustness.
Also applies to: 128-132, 134-151
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some public types are still missing docs + some minor issues.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
♻️ Duplicate comments (1)
transport/probe/src/probe.rs (1)
360-369
:⚠️ Potential issueAddress the unsafe transmute in test code.
This is a duplicate of a previous review concern. The
random_memory_violating_surb
function uses unsafetransmute
which could lead to undefined behavior.The function generates potentially invalid SURB structures for testing. Consider using a safer approach that ensures valid test data structure.
🧹 Nitpick comments (2)
transport/probe/src/probe.rs (2)
48-75
: Consider improving error handling specificity.The
send_message
method has good instrumentation and error handling flow, but all error variants are converted to the sameProbeError::SendError
type, which may lose important diagnostic information.Consider preserving more specific error information:
- .map_err(|e: anyhow::Error| ProbeError::SendError(e.to_string()))?; + .map_err(|e: anyhow::Error| ProbeError::SerializationError(e.to_string()))?;- } else { - Err(ProbeError::SendError("transport error".to_string())) + } else { + Err(ProbeError::TransportError("downstream sink closed".to_string()))
246-246
: Consider logging SURB response failures.The pong response sending ignores errors with
let _ = ...
. While this may be intentional to avoid blocking the probe flow, consider adding debug logging for diagnostic purposes.- let _ = push_to_network.send_message(path, message).await; + if let Err(e) = push_to_network.send_message(path, message).await { + tracing::debug!(%pseudonym, %e, "failed to send pong response"); + }
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
transport/api/src/lib.rs
(16 hunks)transport/packet/src/v1.rs
(1 hunks)transport/probe/src/db_proxy.rs
(1 hunks)transport/probe/src/probe.rs
(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (3)
- transport/probe/src/db_proxy.rs
- transport/packet/src/v1.rs
- transport/api/src/lib.rs
⏰ Context from checks skipped due to timeout of 90000ms (6)
- GitHub Check: hoprd / docker
- GitHub Check: zizmor
- GitHub Check: Linter
- GitHub Check: tests-smoke-hopli
- GitHub Check: tests-unit
- GitHub Check: tests-unit-nightly
🔇 Additional comments (7)
transport/probe/src/probe.rs (7)
26-33
: Utility function looks correct.The
to_nonce
function properly handles different message types and provides appropriate string representations for logging/debugging purposes.
35-41
: Pseudonym extraction logic is sound.The function correctly extracts pseudonyms from both forward and return routing paths, handling the different variants appropriately.
77-91
: Probe struct definition is clean and well-documented.The struct holds the necessary state for probe operations with clear field documentation.
115-156
: Cache configuration and eviction handling are well-designed.The routing cache TTL of 600 seconds and capacity of 100,000 entries are reasonable. The async eviction listener properly handles probe timeouts by notifying the store of failures.
164-218
: Probe emission logic handles routing resolution robustly.The emission task properly:
- Merges automatic neighbor discovery with manual probe requests
- Uses caching to avoid repeated routing resolution
- Handles routing resolution failures gracefully with appropriate logging
- Tracks active probes for timeout management
220-279
: Message processing handles all probe types correctly.The processing task appropriately:
- Forwards non-probe messages upstream unchanged
- Handles ping requests by responding with pong messages using SURBs
- Processes pong responses by calculating latency and updating the store
- Logs errors for invalid messages and SURB lookup failures
The bounded concurrency using
max_parallel_probes
addresses the previous review concern about resource exhaustion.
285-687
: Test coverage is comprehensive and well-structured.The test suite covers:
- Manual probe requests with success and timeout scenarios
- Automatic probe generation and result recording
- Message pass-through for non-probe traffic
- Concurrent probe processing with configurable pass rates
The test infrastructure with mock stores and channels provides good isolation and control.
26d54f2
to
fbae516
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice! Looking forward to testing it
The PR in nutshell:
More details:
This pull request introduces several updates across multiple files, focusing on dependency management, configuration options, and code enhancements. Key changes include updates to
Cargo.toml
dependencies, restructuring labels in.github/labeler.yml
, and improvements to theREADME.md
configuration options. Additionally, minor code adjustments and logging enhancements are made inhandlers.rs
.Dependency Updates:
Cargo.toml
: Pinnedalloy
to version1.0.4
due to breaking changes in patch releases and updated multiple dependencies, includingbackon
(to1.5.1
) andclap
(to4.5.39
). Switchedlibp2p
andlibp2p-stream
to specific Git revisions for enhanced configurability. Updatedprimitive-types
to version0.13.1
for compatibility withethers.rs
. [1] [2] [3] [4]Configuration and Labeling Enhancements:
.github/labeler.yml
: Added new crates and updated paths for better organization, includinghopr-chain-actions
,hopr-chain-api
, andhopr-transport-*
modules. Removed outdated labels and added new ones for recently structured directories. [1] [2]README.md
: Added new options like--defaultSessionListenHost
and--probeRecheckThreshold
. Removed deprecated options such as--heartbeatInterval
and--heartbeatThreshold
. Introduced environment variables likeENV_WORKER_THREADS
for tokio executor configuration. [1] [2] [3]Code Enhancements:
chain/indexer/src/handlers.rs
: Improved logging by adding structured fields and updated tests to useHoprBalance
for better type safety. Adjusted informational messages for node activation processes. [1] [2] [3] [4]Documentation Updates:
METRICS.md
: Renamed metrichopr_heartbeat_pings_count
tohopr_probe_count
for consistency with new terminology.Minor Updates:
.github/renovate.json
: Removedserde_cbor
andprimitive-types
from package matching due to compatibility issues.chain/indexer/Cargo.toml
: Updated package version from0.7.0
to0.7.1
.