Fix allocation of dead head nodes #794

javiermtorres · 2025-02-04T12:23:24Z

What's changing

When Ray state is stored in Redis, the cluster information is stored along with it, including the dead head nodes from previous Ray executions. In some situations, e.g. when a job has no node requirements, the scheduler places the job on the head node, but it may pick one of these dead head nodes instead of the live one.
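To illustrate the failure mode, here is a minimal sketch of how a caller could discard stale head-node entries before choosing a target. It is not the change merged in this PR. It assumes node records shaped like the output of `ray.nodes()` (an `"Alive"` flag and a `"Resources"` mapping per node) and that the head node advertises a `node:__internal_head__` resource, as recent Ray releases do. `alive_head_nodes` and `HEAD_NODE_RESOURCE` are hypothetical names, and the sample cluster state is made up.

```python
from typing import Any

# Assumption: recent Ray releases attach this marker resource only to the head node.
HEAD_NODE_RESOURCE = "node:__internal_head__"


def alive_head_nodes(nodes: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Return the head-node records that are still alive.

    `nodes` is assumed to look like `ray.nodes()` output: one dict per node
    with an "Alive" flag and a "Resources" mapping. Stale entries left over
    from previous Ray executions have "Alive" set to False and are skipped,
    so they can never be returned as a scheduling target.
    """
    return [
        node
        for node in nodes
        if node.get("Alive", False)
        and HEAD_NODE_RESOURCE in node.get("Resources", {})
    ]


if __name__ == "__main__":
    # Hypothetical cluster state: a dead head node from an earlier run,
    # the current live head node, and a live worker node.
    nodes = [
        {"NodeID": "old-head", "Alive": False,
         "Resources": {HEAD_NODE_RESOURCE: 1.0, "CPU": 4.0}},
        {"NodeID": "new-head", "Alive": True,
         "Resources": {HEAD_NODE_RESOURCE: 1.0, "CPU": 4.0}},
        {"NodeID": "worker-1", "Alive": True, "Resources": {"CPU": 8.0}},
    ]
    print([n["NodeID"] for n in alive_head_nodes(nodes)])  # ['new-head']
```

Filtering on liveness first means a stale record is rejected even if it still carries the head-node marker, which is exactly the case described above.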

How to test it

After this change, jobs should no longer fail with error messages stating that the node assigned to them is "infeasible".

Additional notes for reviewers

N/A

I already...

Tested the changes in a working environment to ensure they work as expected
Added some tests for any new functionality
Updated the documentation (both comments in code and product documentation under /docs)
Checked if a (backend) DB migration step was required and included it if required

njbrake

Would it be possible to leave some kind of comment in the code explaining what that environment variable is?

javiermtorres · 2025-02-05T08:02:08Z

Tracking this issue in https://discuss.ray.io/t/dead-head-nodes-selected-in-scheduling/21686

Fix allocation of dead head nodes

44ced29

javiermtorres requested review from dpoulopoulos, chainlink, veekaybee, aittalam, njbrake and peteski22 February 4, 2025 12:23

javiermtorres marked this pull request as ready for review February 4, 2025 13:44

njbrake approved these changes Feb 4, 2025

javiermtorres added 2 commits February 4, 2025 16:50

Merge branch 'main' into javiermtorres/fix-dead-head-scheduling

0102ed5

Merge branch 'main' into javiermtorres/fix-dead-head-scheduling

0c07b38

javiermtorres added 3 commits February 5, 2025 09:12

Add comment

966a89a

Merge branch 'main' into javiermtorres/fix-dead-head-scheduling

82a3f57

Merge branch 'main' into javiermtorres/fix-dead-head-scheduling

ad11ddd

javiermtorres merged commit 745701c into main Feb 5, 2025
15 checks passed

javiermtorres deleted the javiermtorres/fix-dead-head-scheduling branch February 5, 2025 20:06
