javiermtorres
Contributor

What's changing

When the Ray data is stored in Redis, the cluster information is stored as well, including the dead head nodes left over from previous Ray executions. In some situations, e.g. when a job has no node requirements, the scheduler places it on the head node, but the node it retrieves may be one of those dead head nodes.
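
To illustrate the symptom (this is not the fix applied in this PR), here is a minimal sketch that separates live from dead entries in a node list. It assumes records shaped like the dictionaries returned by `ray.nodes()`, using only the `NodeID` and `Alive` keys. The helper name `split_by_liveness` and the sample node IDs are hypothetical.

```python
from typing import Any


def split_by_liveness(nodes: list[dict[str, Any]]) -> tuple[list[str], list[str]]:
    """Return (alive_ids, dead_ids) for node records with "NodeID" and "Alive" keys."""
    alive: list[str] = []
    dead: list[str] = []
    for node in nodes:
        # A record without a truthy "Alive" flag is treated as dead.
        (alive if node.get("Alive") else dead).append(node["NodeID"])
    return alive, dead


# Hypothetical records: one stale head node from a previous run, one live head node.
sample_nodes = [
    {"NodeID": "head-old", "Alive": False},
    {"NodeID": "head-new", "Alive": True},
]

alive_ids, dead_ids = split_by_liveness(sample_nodes)
print("alive:", alive_ids)
print("dead:", dead_ids)
```

On a running cluster, passing the result of `ray.nodes()` instead of `sample_nodes` should show whether stale entries from earlier executions are still present in the stored cluster information.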

How to test it

After this change, submitting a job should no longer produce error messages reporting that the node assigned to the job is "infeasible".

Additional notes for reviewers

N/A

I already...

  • Tested the changes in a working environment to ensure they work as expected
  • Added some tests for any new functionality
  • Updated the documentation (both comments in code and product documentation under /docs)
  • Checked if a (backend) DB migration step was required and included it if required

@javiermtorres javiermtorres marked this pull request as ready for review February 4, 2025 13:44
Contributor

@njbrake njbrake left a comment

Would it be possible to leave some kind of comment in the code explaining what that environment variable is?

@javiermtorres javiermtorres merged commit 745701c into main Feb 5, 2025
15 checks passed
@javiermtorres javiermtorres deleted the javiermtorres/fix-dead-head-scheduling branch February 5, 2025 20:06
2 participants