chlins

@chlins chlins commented Jul 28, 2025

This pull request reworks seed peer management and preheating in the scheduler to make seed peer selection more flexible and the code simpler. The main changes replace the centralized SeedPeerClient with a distributed client selection approach, introduce a hash ring for seed peer selection, and extend HostManager so that it can distinguish normal hosts from non-normal hosts.

Seed Peer Management Enhancements:

  • Replaced the SeedPeerClient with a distributed client selection mechanism, removing the centralized client implementation (scheduler/resource/standard/seed_peer_client.go removed entirely).
  • Introduced a hash ring for seed peer selection based on task IDs, enabling consistent and balanced seed peer assignment (scheduler/resource/standard/seed_peer.go).
  • Updated the SeedPeer interface to include a SelectSeedPeer method, replacing the old Client method for selecting seed peers dynamically (scheduler/resource/standard/seed_peer.go).
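
As an illustration of hash-ring selection keyed by task ID, here is a minimal, self-contained sketch. It is not the code from this PR: the `ring` type, `newRing`, `selectSeedPeer`, the FNV hash, and the replica count are all hypothetical choices made for the example, and the real `SelectSeedPeer` may differ in signature and behavior.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"sort"
	"strconv"
)

// ring is a minimal consistent hash ring (hypothetical, for illustration only).
type ring struct {
	points []uint32          // sorted hash points on the ring
	owners map[uint32]string // hash point -> member key
}

// hashKey maps a string to a position on the ring using FNV-1a.
func hashKey(s string) uint32 {
	h := fnv.New32a()
	h.Write([]byte(s))
	return h.Sum32()
}

// newRing places `replicas` virtual nodes on the ring for every member.
// On the rare hash collision, the member that was added first keeps the point.
func newRing(replicas int, members ...string) *ring {
	r := &ring{owners: make(map[uint32]string)}
	for _, m := range members {
		for i := 0; i < replicas; i++ {
			p := hashKey(m + "#" + strconv.Itoa(i))
			if _, taken := r.owners[p]; taken {
				continue
			}
			r.owners[p] = m
			r.points = append(r.points, p)
		}
	}
	sort.Slice(r.points, func(i, j int) bool { return r.points[i] < r.points[j] })
	return r
}

// selectSeedPeer returns the member that owns the first point at or after
// the hash of taskID, wrapping around to the start of the ring if needed.
func (r *ring) selectSeedPeer(taskID string) (string, bool) {
	if len(r.points) == 0 {
		return "", false
	}
	h := hashKey(taskID)
	i := sort.Search(len(r.points), func(i int) bool { return r.points[i] >= h })
	if i == len(r.points) {
		i = 0
	}
	return r.owners[r.points[i]], true
}

func main() {
	// Member keys are made-up addresses.
	r := newRing(100, "10.0.0.1:4000", "10.0.0.2:4000", "10.0.0.3:4000")
	for _, id := range []string{"task-a", "task-b", "task-c"} {
		peer, _ := r.selectSeedPeer(id)
		fmt.Println(id, "->", peer)
	}
}
```

The same task ID always maps to the same member, and removing a member only reassigns the tasks that member owned, which is the property that makes the assignment consistent.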

Host Management Updates:

  • Added a new method LoadAllNonNormals to HostManager to retrieve non-normal hosts (used to identify seed peers), and implemented this method in scheduler/resource/standard/host_manager.go.
  • Updated the mock implementation of HostManager to support the new LoadAllNonNormals method (scheduler/resource/standard/host_manager_mock.go).
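
A rough sketch of what filtering for non-normal hosts can look like is shown below. The `Host`, `HostType`, and `hostManager` types are simplified stand-ins invented for this example; only the method name `LoadAllNonNormals` comes from the description above, and the real signature may differ.

```go
package main

import (
	"fmt"
	"sync"
)

// HostType is a simplified, hypothetical host classification.
type HostType int

const (
	HostTypeNormal HostType = iota // regular peer host
	HostTypeSuperSeed              // seed peer variants (assumed names)
	HostTypeStrongSeed
	HostTypeWeakSeed
)

// Host is a stripped-down stand-in for the scheduler's host resource.
type Host struct {
	ID   string
	Type HostType
}

// hostManager keeps hosts in a concurrent map keyed by host ID.
type hostManager struct {
	hosts sync.Map // host ID -> *Host
}

// Store adds or replaces a host.
func (m *hostManager) Store(h *Host) { m.hosts.Store(h.ID, h) }

// LoadAllNonNormals returns every host whose type is not HostTypeNormal,
// which in this sketch is how seed peers are identified.
func (m *hostManager) LoadAllNonNormals() []*Host {
	var out []*Host
	m.hosts.Range(func(_, v any) bool {
		if h, ok := v.(*Host); ok && h.Type != HostTypeNormal {
			out = append(out, h)
		}
		return true // keep iterating over all hosts
	})
	return out
}

func main() {
	m := &hostManager{}
	m.Store(&Host{ID: "peer-1", Type: HostTypeNormal})
	m.Store(&Host{ID: "seed-1", Type: HostTypeSuperSeed})
	for _, h := range m.LoadAllNonNormals() {
		fmt.Println("seed peer candidate:", h.ID)
	}
}
```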

Preheating Logic Improvements:

  • Updated the preheating logic to use the newly introduced SelectSeedPeer method for selecting seed peers dynamically, replacing the previous approach of relying on a static client (scheduler/job/job.go).
  • Modified the preheatV1SingleSeedPeer and preheatV2SingleSeedPeerByURL methods to create gRPC clients dynamically for the selected seed peer, with a note to reuse clients in the future if performance issues arise (scheduler/job/job.go).
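
The flow described above (select a seed peer for the task, create a client for it, preheat, close the client) can be sketched as follows. This is an assumption-laden outline, not the code in scheduler/job/job.go: `seedPeer`, `preheatClient`, `dialFunc`, `preheatOnSelected`, and the in-memory `fakeClient` are hypothetical, and the dial step is abstracted so the example runs with the standard library only instead of a real gRPC connection.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"net"
	"strconv"
)

// seedPeer holds the address of a selected seed peer (hypothetical type).
type seedPeer struct {
	IP   string
	Port int
}

// preheatClient is the minimal surface this sketch needs from a client.
type preheatClient interface {
	Preheat(ctx context.Context, url string) error
	Close() error
}

// dialFunc creates a client for one address; a real one would dial gRPC.
type dialFunc func(ctx context.Context, addr string) (preheatClient, error)

// preheatOnSelected picks a seed peer for the task, creates a short-lived
// client for it, runs the preheat, and closes the client afterwards.
func preheatOnSelected(ctx context.Context, taskID, url string,
	sel func(taskID string) (seedPeer, bool), dial dialFunc) (err error) {
	peer, ok := sel(taskID)
	if !ok {
		return errors.New("no seed peer available")
	}
	addr := net.JoinHostPort(peer.IP, strconv.Itoa(peer.Port))
	c, err := dial(ctx, addr)
	if err != nil {
		return fmt.Errorf("dial %s: %w", addr, err)
	}
	defer func() {
		// Report a close failure only if the preheat itself succeeded.
		if cerr := c.Close(); err == nil {
			err = cerr
		}
	}()
	return c.Preheat(ctx, url)
}

// fakeClient is an in-memory stand-in used to exercise the flow.
type fakeClient struct {
	addr   string
	closed bool
}

func (f *fakeClient) Preheat(_ context.Context, url string) error {
	fmt.Printf("preheat %s via %s\n", url, f.addr)
	return nil
}

func (f *fakeClient) Close() error { f.closed = true; return nil }

// runDemo wires the sketch to the fake client and reports what happened.
func runDemo(taskID string) (string, bool, error) {
	var fc *fakeClient
	sel := func(string) (seedPeer, bool) { return seedPeer{IP: "10.0.0.2", Port: 4000}, true }
	dial := func(_ context.Context, addr string) (preheatClient, error) {
		fc = &fakeClient{addr: addr}
		return fc, nil
	}
	if err := preheatOnSelected(context.Background(), taskID, "https://example.com/blob", sel, dial); err != nil {
		return "", false, err
	}
	return fc.addr, fc.closed, nil
}

func main() {
	dialed, closed, err := runDemo("task-1")
	fmt.Println(dialed, closed, err)
}
```

Creating a client per call keeps the code simple at the cost of a connection setup for every preheat; caching clients by address would be the natural follow-up if that cost becomes noticeable.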

Codebase Simplification:

  • Removed the SeedPeerClient implementation and its associated methods, significantly simplifying the codebase (scheduler/resource/standard/seed_peer_client.go removed entirely).
  • Updated the Resource initialization to directly create the SeedPeer without the intermediate SeedPeerClient (scheduler/resource/standard/resource.go).

These changes collectively improve the flexibility and maintainability of the seed peer management system, while also laying the groundwork for better scalability and performance in future iterations.

Description

Related Issue

#4217

Motivation and Context

Screenshots (if appropriate)

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Documentation Update (if none of the other choices apply)

Checklist

  • My change requires a change to the documentation.
  • I have updated the documentation accordingly.
  • I have read the CONTRIBUTING document.
  • I have added tests to cover my changes.

@chlins chlins requested a review from a team as a code owner July 28, 2025 04:13
@chlins chlins added the enhancement New feature or request label Jul 28, 2025
@chlins chlins added this to the v2.4.0 milestone Jul 28, 2025
@chlins chlins force-pushed the refactor/seed-peer-pick branch from a092c3d to 1743e7d Compare July 29, 2025 02:42
codecov bot commented Jul 29, 2025

Codecov Report

❌ Patch coverage is 10.75269% with 166 lines in your changes missing coverage. Please review.
✅ Project coverage is 32.41%. Comparing base (88dc8e1) to head (4ac99eb).
⚠️ Report is 2 commits behind head on main.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| scheduler/resource/standard/seed_peer.go | 20.93% | 68 Missing ⚠️ |
| scheduler/job/job.go | 0.00% | 38 Missing ⚠️ |
| scheduler/resource/standard/host_manager.go | 0.00% | 14 Missing ⚠️ |
| scheduler/resource/standard/seed_peer_mock.go | 0.00% | 14 Missing ⚠️ |
| scheduler/scheduler.go | 0.00% | 11 Missing ⚠️ |
| scheduler/resource/standard/host_manager_mock.go | 0.00% | 8 Missing ⚠️ |
| scheduler/resource/standard/resource_mock.go | 0.00% | 8 Missing ⚠️ |
| scheduler/resource/standard/resource.go | 28.57% | 5 Missing ⚠️ |

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #4235      +/-   ##
==========================================
- Coverage   32.91%   32.41%   -0.51%     
==========================================
  Files         352      350       -2     
  Lines       41781    41630     -151     
==========================================
- Hits        13754    13493     -261     
- Misses      27137    27259     +122     
+ Partials      890      878      -12     

| Flag | Coverage Δ | |
| --- | --- | --- |
| unittests | 32.41% <10.75%> (-0.51%) | ⬇️ |

| Files with missing lines | Coverage Δ | |
| --- | --- | --- |
| scheduler/resource/standard/resource.go | 62.50% <28.57%> (-9.24%) | ⬇️ |
| scheduler/resource/standard/host_manager_mock.go | 4.87% <0.00%> (-23.51%) | ⬇️ |
| scheduler/resource/standard/resource_mock.go | 0.00% <0.00%> (ø) | |
| scheduler/scheduler.go | 0.00% <0.00%> (ø) | |
| scheduler/resource/standard/host_manager.go | 52.88% <0.00%> (-8.23%) | ⬇️ |
| scheduler/resource/standard/seed_peer_mock.go | 0.00% <0.00%> (ø) | |
| scheduler/job/job.go | 0.00% <0.00%> (ø) | |
| scheduler/resource/standard/seed_peer.go | 16.88% <20.93%> (-6.40%) | ⬇️ |

... and 9 files with indirect coverage changes

@chlins chlins force-pushed the refactor/seed-peer-pick branch 2 times, most recently from db1dddc to 773b5ba Compare August 6, 2025 10:13
@chlins chlins force-pushed the refactor/seed-peer-pick branch 3 times, most recently from ffbb3c5 to 246e836 Compare August 6, 2025 12:09
@chlins chlins force-pushed the refactor/seed-peer-pick branch from 246e836 to 7079d0f Compare August 7, 2025 06:38
@chlins chlins enabled auto-merge (squash) August 7, 2025 06:43
@gaius-qi

gaius-qi commented Aug 7, 2025

  1. Remove unused func.
  2. Remove announcer in the client.
  3. Client uses IP:Port to build consistent hash.

… self pick

Signed-off-by: chlins <chlins.zhang@gmail.com>
@chlins chlins force-pushed the refactor/seed-peer-pick branch from 7079d0f to 4ac99eb Compare August 7, 2025 07:26

@gaius-qi gaius-qi left a comment


LGTM

@BraveY BraveY left a comment


LGTM

@chlins chlins merged commit bd2ae63 into main Aug 8, 2025
16 checks passed
@chlins chlins deleted the refactor/seed-peer-pick branch August 8, 2025 02:37
5 participants