Overview
This issue tracks the integration of the "New KAD-DHT Provide system" with "Reprovide Sweep" strategy from go-libp2p-kad-dht into Kubo 0.38+.
The modernized Provider system and the related interface/architecture refactor significantly improve DHT content republishing efficiency by exploring "keyspace regions" instead of providing keys one-by-one, and by spreading reprovide operations evenly over time to avoid performance bursts.
Key Benefits:
- More efficient key republishing: the current system can take up to 10 seconds per key in the worst case, which caps the default DHT client at roughly 8,000 keys per 22-hour reprovide interval. The new system raises that ceiling.
- Enables backend optimizations: concurrent provide operations with error handling and retry mechanisms, dynamic prefix length estimation for keyspace exploration, etc.
More details in libp2p/go-libp2p-kad-dht#1082 and libp2p/go-libp2p-kad-dht#1095
Dependencies
The following must be completed before this integration:
- boxo integration:
- go-libp2p-kad-dht consolidation:
  - Changes from issues linked in Reprovide Sweep libp2p/go-libp2p-kad-dht#1095 need to be consolidated in the provider branch
- Kubo dependencies:
  - Update Kubo to depend on latest boxo and kad-dht with necessary changes
  - Provide newly received blocks according to reprovide strategy #10837
  - MFS provides strategy: Plan and implement solution for handling MFS provides to not break when Reprovider.Strategy=mfs|pinned+mfs
Implementation Tasks
Core Integration
- Update go.mod dependencies to include latest boxo and kad-dht versions
- Add new Internal.DHTProviderSweepSystem (name tbd) Flag configuration option (mark as experimental, opt-in, disabled by default for now; in the future it will flip to true by default, and eventually we will remove it)
  - purge (re)provider queues when flag state changes, similar to how we do it for Reprovider.Strategy
- Document in changelog and docs/config.md
- Add forced ipfs provide clear when switching to/from new system (similar to existing Reprovider.Strategy changes)
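The flag semantics and the purge-on-change rule can be sketched as follows. This is a minimal, self-contained illustration, not Kubo's implementation: the Flag type, its constants, and needsQueuePurge are hypothetical stand-ins, and the option name is still marked "tbd" above.

```go
package main

import "fmt"

// Flag is a hypothetical tri-state option: when unset, the built-in
// default applies, so the default can later flip without touching configs.
type Flag int8

const (
	Default Flag = iota // user did not set the option
	True                // user explicitly opted in
	False               // user explicitly opted out
)

// WithDefault resolves the flag to a bool, using def when it is unset.
func (f Flag) WithDefault(def bool) bool {
	switch f {
	case True:
		return true
	case False:
		return false
	default:
		return def
	}
}

// needsQueuePurge reports whether the resolved flag state differs from the
// state recorded on the previous run, in which case the (re)provider queues
// should be cleared. Hypothetical helper, for illustration only.
func needsQueuePurge(previous bool, current Flag, def bool) bool {
	return previous != current.WithDefault(def)
}

func main() {
	const sweepDefault = false // disabled by default for now

	fmt.Println(needsQueuePurge(false, Default, sweepDefault)) // false: nothing changed
	fmt.Println(needsQueuePurge(false, True, sweepDefault))    // true: user opted in
	fmt.Println(needsQueuePurge(true, False, sweepDefault))    // true: user opted out
}
```

Comparing against the resolved value rather than the raw flag means a future change of the built-in default would also trigger a purge for users who never set the option.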
Metrics and Observability
- Implement and document DHT provider record puts metric (ideally in go-libp2p-kad-dht - perhaps we already have one?) that counts raw DHT publish events
  - @guillaumemichel : we currently have this metric
- Expose metrics via Kubo's existing metrics endpoints (/debug/metrics/prometheus) and document in changelog
- Ensure metric works for both old and new provide systems for performance comparison
  - @guillaumemichel : the new system metric corresponds to the sum of both old system counters
- Add metric to collab cluster grafana board to visualize average provide rate when provide system is working (for A/B test)
- provider: display stats for new provide system #10900
RPC/CLI Command Updates
TBD, provisionally we want to move everything related to provide/reprovide under the ipfs provide namespace.
- Update ipfs routing provide|reprovide commands to work with new system (if possible/feasible; if not, return informative error to use new commands in ipfs provide)
- Similar for ipfs stats provide and ipfs stats reprovide (wire up, or update to return error until properly wired up for new system)
- Ensure ipfs provide clear works correctly with new system, and that queue is automatically purged when the flag state changes
- Direct users to use modern ipfs provide commands (update --help of deprecated commands)
Configuration
- Add configuration option in Kubo config (opt-in initially)
- Update
docs/config.md
- Include migration guidance and performance comparison information (anecdotal A/B from collab cluster) in changelog
- Document breaking changes and command deprecations in changelog
Testing Requirements
End-to-End Regression Tests
TBD, we may not have tests for the different Reprovider.Strategy values. If not, we need to add them to catch regressions. Avoid Sharness if possible; prefer E2E Go tests in test/cli
- Test all existing Reprovider.Strategy options continue to work
  - With defaults (old backend)
  - With new provide system (opt-in to new backend)
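The regression coverage described above is a strategy-by-backend matrix. A minimal sketch of generating that matrix for table-driven tests follows. The mfs and pinned+mfs values appear in this issue; the other strategy names are assumed from Kubo's documented Reprovider.Strategy options and should be checked against docs/config.md for the target release. The type and function names are hypothetical.

```go
package main

import "fmt"

// matrixCase is one combination of reprovide strategy and provide backend.
type matrixCase struct {
	Strategy string
	Sweep    bool // true = opt-in to the new provide system
}

// Name returns a subtest-friendly label for the case.
func (c matrixCase) Name() string {
	backend := "old-backend"
	if c.Sweep {
		backend = "new-backend"
	}
	return fmt.Sprintf("%s/%s", c.Strategy, backend)
}

// buildMatrix pairs every strategy with both backends.
func buildMatrix(strategies []string) []matrixCase {
	cases := make([]matrixCase, 0, 2*len(strategies))
	for _, s := range strategies {
		cases = append(cases,
			matrixCase{Strategy: s, Sweep: false},
			matrixCase{Strategy: s, Sweep: true},
		)
	}
	return cases
}

func main() {
	// Assumed list of strategy values; verify against docs/config.md.
	strategies := []string{"all", "pinned", "roots", "mfs", "pinned+mfs"}
	for _, c := range buildMatrix(strategies) {
		fmt.Println(c.Name())
	}
}
```

In a test under test/cli, each case would become a t.Run subtest that starts a node with the given strategy and backend and checks which CIDs end up provided.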
Performance Validation
- Kubo PR with boxo and kad-dht + config wired up
- Deploy staging image to 2 collab cluster boxes for A/B, opt-in provide on one of them
- write down results on the Kubo PR with integration / changelog
Breaking Changes
- Provider Queue Reset: Switching to/from the new reprovide sweep system will force ipfs provide clear to ensure provider queues are reset
- Command Deprecation: when opted in to the new system, ipfs routing provide|reprovide commands may return errors directing users to the modern ipfs provide commands (TBD, maybe we can keep them working)
- Stats Commands: when opted in to the new system, ipfs stats provide and ipfs stats reprovide will return errors until properly implemented (TBD, maybe we can keep them working, but we should mark them as deprecated)
- Opt-in Required: Existing users will continue using the old system; new system requires explicit configuration change
Success Criteria
- new provide system available as opt-in configuration option
- existing functionality preserved for users not opting in
- metrics show improved provide performance (higher average provide rate / shorter provide window)
- end-to-end tests pass for
  - all reprovider strategies with old backend
  - all reprovider strategies with new backend
- new provide system stats RPC/CLI wired up in ipfs stats provide|reprovide, ipfs provide stat
- new provide system RPC/CLI for manual provide <cid>/reprovide
- documentation and changelog
Related Issues and PRs
- Source PRs
  - feat: Reprovide Sweep libp2p/go-libp2p-kad-dht#1082 (Reprovide Sweep implementation)
  - Reprovide Sweep libp2p/go-libp2p-kad-dht#1095 (Reprovide Sweep review)
- Dependencies
  - Remove providing Exchange. Call Provide() from relevant places. boxo#976 (boxo integration)
  - Provide newly received blocks according to reprovide strategy #10837 (MFS provides issue)
- Background