
proposal: Node Resource Balance Rescheduling (#2332) #2341


Open
wants to merge 1 commit into base: main
Conversation


@JBinin JBinin commented Feb 17, 2025

Ⅰ. Describe what this PR does

Propose Node Resource Balance Rescheduling

Ⅱ. Does this pull request fix one issue?

fixes #2332

Ⅲ. Describe how to verify it

Ⅳ. Special notes for reviews

Ⅴ. Checklist


codecov bot commented Feb 17, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 65.94%. Comparing base (a12423b) to head (e8dafb3).
Report is 112 commits behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #2341   +/-   ##
=======================================
  Coverage   65.94%   65.94%           
=======================================
  Files         466      466           
  Lines       54879    54879           
=======================================
  Hits        36190    36190           
  Misses      16074    16074           
  Partials     2615     2615           
Flag       | Coverage Δ
unittests  | 65.94% <ø> (ø)

Flags with carried forward coverage won't be shown.


It is consistent with the scheduler's **NodeResourcesBalancedAllocation** strategy and provides extensibility for incorporating additional resource types in the future.

A node is considered to have excessive fragmentation when its fragmentation rate exceeds a threshold, which may adversely affect future pod scheduling on that node.
To simplify user configuration, the plugin autonomously computes the mean (μ) and standard deviation (σ) of fragmentation rates across the cluster, dynamically setting the threshold at μ + σ.
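A minimal sketch of this dynamic threshold computation, assuming per-node fragmentation rates are already available as plain float64 values (the function and package names are illustrative, not from the proposal):

```go
package plugin // illustrative package name

import "math"

// dynamicThreshold returns μ + σ over the cluster's per-node
// fragmentation rates; a node whose rate exceeds this value is
// treated as excessively fragmented.
func dynamicThreshold(fragRates []float64) float64 {
	n := float64(len(fragRates))
	if n == 0 {
		return 0
	}
	var sum float64
	for _, r := range fragRates {
		sum += r
	}
	mean := sum / n

	var sqDiff float64
	for _, r := range fragRates {
		sqDiff += (r - mean) * (r - mean)
	}
	return mean + math.Sqrt(sqDiff/n) // μ + σ
}
```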

Is there a theoretical basis for setting μ + σ as the default, or is it just an empirical value?

- KoordinatorQoSClass
- PodDeletionCost
- EvictionCost
- NodeFragmentationRate (the node fragmentation rate is calculated under the hypothetical scenario of pod eviction)
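A sketch of how these criteria could compose into a single eviction ordering. The struct fields and sort directions below are assumptions for illustration, anticipating the review comment below that the Pod producing the largest decrease in fragmentation rate should go first:

```go
package plugin // illustrative package name

import "sort"

// candidate bundles the sort keys for one evictable Pod. Field names
// and orderings are illustrative assumptions, not the proposal's code.
type candidate struct {
	qosRank           int     // lower rank = lower-priority QoS class, evicted first
	podDeletionCost   int64   // lower deletion cost evicted first
	evictionCost      int64   // lower eviction cost evicted first
	fragRateIfEvicted float64 // node fragmentation rate assuming this Pod is evicted
}

// sortCandidates orders Pods so that the best eviction victim comes first.
func sortCandidates(cs []candidate) {
	sort.Slice(cs, func(i, j int) bool {
		a, b := cs[i], cs[j]
		if a.qosRank != b.qosRank {
			return a.qosRank < b.qosRank
		}
		if a.podDeletionCost != b.podDeletionCost {
			return a.podDeletionCost < b.podDeletionCost
		}
		if a.evictionCost != b.evictionCost {
			return a.evictionCost < b.evictionCost
		}
		// The Pod whose eviction lowers the node's fragmentation
		// rate the most comes first.
		return a.fragRateIfEvicted < b.fragRateIfEvicted
	})
}
```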

For this criterion, we need more details. For example, the Pod whose eviction results in the largest decrease in the node's fragmentation rate should be evicted first.


#### Pod Selection
The plugin receives an externally provided filter to evaluate and classify the Pods on a node into two distinct categories: removable Pods and non-removable Pods.
Subsequently, a secondary filtering pass is applied to the removable Pods, eliminating those whose eviction would increase the node's fragmentation rate.
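A minimal sketch of this two-stage selection, assuming the external filter is a predicate and fragRateWithout is a hypothetical helper that recomputes the node's fragmentation rate with one Pod removed:

```go
package plugin // illustrative package name

import corev1 "k8s.io/api/core/v1"

// selectEvictable applies the externally supplied filter, then drops any
// removable Pod whose eviction would raise the node's fragmentation rate.
// The filter and fragRateWithout signatures are assumptions.
func selectEvictable(
	pods []*corev1.Pod,
	filter func(*corev1.Pod) bool,
	currentFragRate float64,
	fragRateWithout func(*corev1.Pod) float64,
) []*corev1.Pod {
	var removable []*corev1.Pod
	for _, p := range pods {
		if !filter(p) {
			continue // non-removable
		}
		if fragRateWithout(p) > currentFragRate {
			continue // eviction would worsen the node's fragmentation
		}
		removable = append(removable, p)
	}
	return removable
}
```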

Besides, we need to judge whether there is a suitable target node to place the evicted Pod. First, the target node must have enough available CPU and memory to accommodate the evicted Pod; this can follow the approach of the LowNodeLoad plugin inside koord-descheduler. Second, after placing the Pod on the target node, the node's resource allocation should not become fragmented.

I feel this is a little bit complicated, or maybe I'm not very clear-minded. We can discuss further.
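A sketch of the two checks described in the comment above, with resource usage simplified to fractions of node capacity; fragRateAfter and threshold are assumptions, and a real implementation would work with resource.Quantity values instead:

```go
package plugin // illustrative package name

// fitsTarget reports whether a target node can host the evicted Pod:
// (1) enough free CPU and memory, and (2) placing the Pod does not push
// the node's fragmentation rate past the cluster threshold (μ + σ).
// All quantities are fractions of node capacity; names are illustrative.
func fitsTarget(
	cpuUsed, memUsed float64, // current allocation ratios on the target
	cpuReq, memReq float64, // the evicted Pod's requests
	threshold float64, // cluster-wide fragmentation threshold
	fragRateAfter func(cpu, mem float64) float64,
) bool {
	if cpuUsed+cpuReq > 1.0 || memUsed+memReq > 1.0 {
		return false // not enough free resources
	}
	return fragRateAfter(cpuUsed+cpuReq, memUsed+memReq) <= threshold
}
```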


@songtao98 songtao98 Feb 18, 2025


Inside this proposal, let's discuss all the details clearly. Then we can implement only the most important features in the first version of the code.

For example, suppose node A has a CPU allocation rate of 90% and a memory allocation rate of 50%, while node B has a CPU allocation rate of 50% and a memory allocation rate of 90%.
If a pod requests 15% of a node's CPU and 15% of its memory, the pod may fail to be scheduled on either node, even though the total free resources across node A and node B are sufficient.
Such situations should be avoided.
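A small runnable illustration of this example (all numbers are fractions of node capacity; the fits helper is for illustration only, not proposal code):

```go
package main

import "fmt"

type node struct{ cpuAlloc, memAlloc float64 } // current allocation ratios

// fits reports whether a pod's requests still fit on the node.
func fits(n node, cpuReq, memReq float64) bool {
	return n.cpuAlloc+cpuReq <= 1.0 && n.memAlloc+memReq <= 1.0
}

func main() {
	a := node{cpuAlloc: 0.90, memAlloc: 0.50}
	b := node{cpuAlloc: 0.50, memAlloc: 0.90}
	fmt.Println(fits(a, 0.15, 0.15)) // false: only 10% CPU left on node A
	fmt.Println(fits(b, 0.15, 0.15)) // false: only 10% memory left on node B
	// Yet in aggregate the two nodes still have 60% CPU and 60% memory free.
}
```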


Maybe adding a figure here would make this clearer. I'll fix the one I commented with in the issue before and try to add it here later.

@@ -0,0 +1,87 @@
---
title: Node Resource Balance Rescheduling

Instead of rescheduling, using descheduling here would be better.


stale bot commented Jul 17, 2025

This issue has been automatically marked as stale because it has not had recent activity.
This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, the issue is closed

You can:
  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Close this issue or PR with /close

Thank you for your contributions.

Successfully merging this pull request may close these issues:
  • [proposal] Rescheduling to address the imbalance of different types of resources on a single node (#2332)