Integrate Modernized Provider System from go-libp2p-kad-dht #10881

@lidel

Overview

This issue tracks the integration of the "New KAD-DHT Provide system" with the "Reprovide Sweep" strategy from go-libp2p-kad-dht into Kubo 0.38+.

The modernized Provider system and the related interface/architecture refactor significantly improve DHT content republishing efficiency. Instead of providing keys one-by-one, the new system explores "keyspace regions" and spreads reprovide operations evenly over time to avoid performance bursts.

Key Benefits:

  • More efficient key republishing: the current system can take up to 10 seconds per key in the worst case, which caps the default DHT client at roughly 8,000 keys per 22-hour reprovide interval. The new system raises that ceiling
  • Enables backend optimizations, concurrent provide operations with error handling and retry mechanisms, dynamic prefix length estimation for keyspace exploration, etc.

More details in libp2p/go-libp2p-kad-dht#1082 and libp2p/go-libp2p-kad-dht#1095

Dependencies

The following must be completed before this integration:

Implementation Tasks

Core Integration

  • Update go.mod dependencies to include latest boxo and kad-dht versions
  • Add new Internal.DHTProviderSweepSystem (name TBD) Flag configuration option. Mark it as experimental, opt-in, and disabled by default for now; in the future it will flip to true by default, and eventually we will remove it
    • purge (re)provider queues when flag state changes, similar to how we do it for Reprovider.Strategy
  • Document in changelog and docs/config.md
  • Add forced ipfs provide clear when switching to/from new system (similar to existing Reprovider.Strategy changes)
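The purge-on-change behavior described in this list could look roughly like the following. This is a sketch under assumptions, not Kubo code: `ProvideQueue` and `reconcileSweepFlag` are hypothetical names, and it assumes the previously seen flag value is persisted somewhere between runs.

```go
package main

import "fmt"

// ProvideQueue is a hypothetical stand-in for the (re)provider queue.
type ProvideQueue struct {
	cids []string
}

// Clear drops all queued items and reports how many were removed.
func (q *ProvideQueue) Clear() int {
	n := len(q.cids)
	q.cids = nil
	return n
}

// reconcileSweepFlag compares the flag value seen on the previous run with the
// currently configured one and purges the queue only when they differ.
func reconcileSweepFlag(lastSeen, configured bool, q *ProvideQueue) (purged int, changed bool) {
	if lastSeen == configured {
		return 0, false
	}
	return q.Clear(), true
}

func main() {
	q := &ProvideQueue{cids: []string{"cid-a", "cid-b", "cid-c"}}

	// The user just opted in: the old queue is discarded.
	purged, changed := reconcileSweepFlag(false, true, q)
	fmt.Println(purged, changed)

	// Next start with the same setting: nothing to do.
	purged, changed = reconcileSweepFlag(true, true, q)
	fmt.Println(purged, changed)
}
```

Comparing against a persisted value, rather than clearing on every start, keeps restarts cheap for users who never touch the flag.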

Metrics and Observability

RPC/CLI Command Updates

TBD. Provisionally, we want to move everything related to provide/reprovide under the ipfs provide namespace.

  • Update ipfs routing provide|reprovide commands to work with the new system (if feasible; if not, return an informative error pointing to the new commands under ipfs provide)
  • Same for ipfs stats provide and ipfs stats reprovide (wire them up, or return an error until they are properly wired up for the new system)
  • Ensure ipfs provide clear works correctly with the new system, and that the queue is automatically purged when switching to/from the new system
  • Direct users to the modern ipfs provide commands (update --help of deprecated commands)

Configuration

  • Add configuration option in Kubo config (opt-in initially)
  • Update docs/config.md
  • Include migration guidance and performance comparison information (anecdotal A/B from collab cluster) in changelog
  • Document breaking changes and command deprecations in changelog

Testing Requirements

End-to-End Regression Tests

TBD. We may not have tests for the different Reprovider.Strategy values. If not, we need to add them to catch regressions. Avoid Sharness if possible; prefer E2E Go tests in test/cli.

  • Test all existing Reprovider.Strategy options continue to work
    • With defaults (old backend)
    • With new provide system (opt-in to new backend)
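The strategy × backend coverage above is a natural fit for a table-driven test. Below is a self-contained sketch of how such a matrix could be generated; in test/cli each case would presumably become a `t.Run` subtest that starts a node with the given settings. The type and function names are hypothetical, and the strategy values passed in `main` ("all", "pinned", "roots") are examples, not an exhaustive list.

```go
package main

import "fmt"

// regressionCase describes one combination to exercise in an E2E test.
type regressionCase struct {
	Name       string
	Strategy   string // value for Reprovider.Strategy
	NewBackend bool   // true when the node opts in to the new provide system
}

// regressionMatrix returns one case per strategy for the old backend and one
// for the new backend, so no combination is silently skipped.
func regressionMatrix(strategies []string) []regressionCase {
	cases := make([]regressionCase, 0, 2*len(strategies))
	for _, s := range strategies {
		cases = append(cases,
			regressionCase{Name: s + "/old-backend", Strategy: s, NewBackend: false},
			regressionCase{Name: s + "/new-backend", Strategy: s, NewBackend: true},
		)
	}
	return cases
}

func main() {
	for _, c := range regressionMatrix([]string{"all", "pinned", "roots"}) {
		fmt.Println(c.Name)
	}
}
```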

Performance Validation

  • Kubo PR with boxo and kad-dht + config wired up
  • Deploy staging image to 2 collab cluster boxes for A/B, opt-in provide on one of them
  • write down results on the Kubo PR with integration / changelog

Breaking Changes

⚠️ Important behavioral changes:

  1. Provider Queue Reset: Switching to/from the new reprovide sweep system will force ipfs provide clear to ensure provider queues are reset
  2. Command Deprecation: once a user opts in, ipfs routing provide|reprovide commands may return errors directing them to the modern ipfs provide commands (TBD, maybe we can keep them working)
  3. Stats Commands: once a user opts in, ipfs stats provide and ipfs stats reprovide will return errors until they are properly implemented for the new system (TBD, maybe we can keep them working, but we should mark them as deprecated)
  4. Opt-in Required: Existing users will continue using the old system; new system requires explicit configuration change

Success Criteria

  • new provide system available as opt-in configuration option
  • existing functionality preserved for users not opting in
  • metrics show improved provide performance (higher average provide rate / shorter provide window)
  • end-to-end tests pass for
    • all reprovider strategies with old backend
    • all reprovider strategies with new backend
  • new provide system stats RPC/CLI wired up (ipfs stats provide|reprovide, ipfs provide stat)
  • new provide system RPC/CLI for manual provide <cid>/reprovide
  • documentation and changelog

Related Issues and PRs

Labels

  • P1: High: Likely tackled by core team if no one steps up
  • dif/hard
  • effort/weeks: Estimated to take multiple weeks
  • epic
  • exp/expert: Having worked on the specific codebase is important
  • kind/feature: A new feature
  • topic/dht: Topic dht
  • topic/provider: Topic provider
