ecosysbin
Contributor

What this PR does / why we need it:
Add NetworkTopology plugin score doc

@volcano-sh-bot volcano-sh-bot added the size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. label Apr 23, 2025
1. If it is the first scheduling of a job, all HyperNodes that need to be scored will be given a score of 0 and returned. The HyperNode that is successfully scheduled in the end will be recorded as the `JobAllocatedHyperNode` attribute of the job.
2. If it is not the first scheduling of a job, calculate the LCAHyperNode (Lowest Common Ancestor HyperNode) between all HyperNodes that need to be scored and the `JobAllocatedHyperNode` of the job. The lower the tier of the calculated LCAHyperNode, the higher the score. If there is only one highest score, return the scoring result.
3. If more than one HyperNode shares the highest score in step 2, count how many of the job's already-scheduled tasks run on each of these HyperNodes. The more tasks a HyperNode already hosts, the higher its score.
4. The HyperNode that is successfully scheduled in the end in steps 2 and 3 will also be recorded as the `JobAllocatedHyperNode` attribute of the job.
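
The four steps above can be sketched in Go as follows. This is a minimal illustration, not the plugin's actual code: the `HyperNode` struct, the `lca` and `scoreHyperNodes` functions, the `maxTier` parameter, and the concrete score values (`maxTier - tier + 1` for the tier score, a fractional bonus for the tie-break) are all assumptions made for the example. The description only states the ordering, that is, a lower LCA tier scores higher and more already-scheduled tasks breaks ties.

```go
package main

// HyperNode is a hypothetical, simplified stand-in for a HyperNode in the
// topology tree. Tier 1 is the lowest tier; Parent is nil at the root.
type HyperNode struct {
	Name   string
	Tier   int
	Parent *HyperNode
}

// lca returns the lowest common ancestor of a and b by walking parent
// pointers. A HyperNode counts as its own ancestor. It returns nil when the
// two HyperNodes share no ancestor.
func lca(a, b *HyperNode) *HyperNode {
	seen := map[*HyperNode]bool{}
	for n := a; n != nil; n = n.Parent {
		seen[n] = true
	}
	for n := b; n != nil; n = n.Parent {
		if seen[n] {
			return n
		}
	}
	return nil
}

// scoreHyperNodes scores candidates against the job's recorded
// JobAllocatedHyperNode (allocated). taskCount maps a HyperNode name to the
// number of the job's tasks already scheduled there. maxTier is the highest
// tier in the tree. All score values are illustrative assumptions.
func scoreHyperNodes(candidates []*HyperNode, allocated *HyperNode,
	taskCount map[string]int, maxTier int) map[string]float64 {
	scores := make(map[string]float64, len(candidates))

	// Step 1: first scheduling of the job, every candidate gets 0.
	if allocated == nil {
		for _, c := range candidates {
			scores[c.Name] = 0
		}
		return scores
	}

	// Step 2: the lower the tier of the LCA, the higher the score.
	best := -1.0
	var top []*HyperNode
	for _, c := range candidates {
		s := 0.0
		if anc := lca(c, allocated); anc != nil {
			s = float64(maxTier - anc.Tier + 1)
		}
		scores[c.Name] = s
		switch {
		case s > best:
			best, top = s, []*HyperNode{c}
		case s == best:
			top = append(top, c)
		}
	}

	// Step 3: break ties among the top candidates by the share of the job's
	// already-scheduled tasks. The bonus is at most 1, so tied candidates
	// stay ahead of every lower-scored candidate.
	if len(top) > 1 {
		total := 0
		for _, c := range top {
			total += taskCount[c.Name]
		}
		if total > 0 {
			for _, c := range top {
				scores[c.Name] += float64(taskCount[c.Name]) / float64(total)
			}
		}
	}
	return scores
}
```

Step 4 (recording the chosen HyperNode as `JobAllocatedHyperNode`) happens after allocation succeeds and is left out of the sketch; a caller would pass the recorded HyperNode back in as `allocated` on the next scheduling cycle.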

- AddNodeOrderFn: score for nodes.(take effect in soft limit,take effect in )
Member

take effect in?

Contributor Author

Ok, I have modified it, you can check again.


- AddNodeOrderFn: score for nodes.(take effect in soft limit,take effect in )
1. To score all nodes, you need to first obtain the HyperNode to which the node belongs and the `JobAllocatedHyperNode` of the job to which the task belongs.
Member

you need -> we need

Contributor Author

Ok, done

@@ -570,8 +570,14 @@ Allocate resources for queue-\> hyperJob \-\> Job \-\> Task.
- AddJobGroupReadyFn: check whether hyperJob minAvailable is met.(phase 2)

- AddHyperNodeOrderFn: score for hyperNodes.(take effect in hard limit, closest tiers have higher score)
1. If it is the first scheduling of a job, all HyperNodes that need to be scored will be given a score of 0 and returned. The HyperNode that is successfully scheduled in the end will be recorded as the `JobAllocatedHyperNode` attribute of the job.
Member

Suggested change
1. If it is the first scheduling of a job, all HyperNodes that need to be scored will be given a score of 0 and returned. The HyperNode that is successfully scheduled in the end will be recorded as the `JobAllocatedHyperNode` attribute of the job.
1. If a Job is being scheduled for the very first time, all HyperNodes that need to be scored will get a score of 0 and then return right away. The name of the HyperNode where the Job eventually gets scheduled successfully will be recorded in the Job's annotations under the key JobAllocatedHyperNode

Member

Please also change to "plugin: network-topology-aware" in line 568.

Contributor Author

Ok, Done

@volcano-sh-bot volcano-sh-bot added size/S Denotes a PR that changes 10-29 lines, ignoring generated files. and removed size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. labels Apr 24, 2025
Signed-off-by: wangbin <994903808@qq.com>
@Monokaix
Member

/approve

@volcano-sh-bot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: Monokaix

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@volcano-sh-bot volcano-sh-bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Apr 25, 2025
@JesseStutler
Member

/lgtm

@volcano-sh-bot volcano-sh-bot added the lgtm Indicates that a PR is ready to be merged. label Apr 27, 2025
@volcano-sh-bot volcano-sh-bot merged commit 0b0024d into volcano-sh:network-topology Apr 27, 2025
10 of 11 checks passed
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged. retest-not-required-docs-only size/S Denotes a PR that changes 10-29 lines, ignoring generated files.