[Feature] Add e2e tests for inconsistency between worker group's replicas and the number of Pods #2575

@kevin85421

Search before asking

  • I have searched the issues and found no similar feature request.

Description

ray-project/ray#48909 fixes an issue in Ray Autoscaler V2. The bug occurs when many Ray tasks are submitted simultaneously, triggering the Autoscaler to create multiple Ray nodes. According to the PR description, the Autoscaler should create 10 Ray Pods. However, because the actual number of Pods lags behind the replicas value in the RayCluster CR spec, only 5 Pods (or some other number fewer than 10) are created.

Add an e2e test:

  • Submit many Ray tasks at the same time.
  • Make sure the cluster can scale up to maxReplicas and all tasks can finish successfully.
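
A rough sketch of the two steps above, written in Python for illustration only (the actual e2e test may be structured differently). The names `wait_for_replicas`, `run_task_burst`, and `get_pod_count` are hypothetical helpers, not KubeRay or Ray APIs. The sketch assumes the driver runs inside the RayCluster, that each worker Pod fits exactly one 1-CPU task, and that the caller supplies its own way of counting worker Pods.

```python
import time
from typing import Callable, List


def wait_for_replicas(
    get_pod_count: Callable[[], int],
    max_replicas: int,
    timeout_s: float = 300.0,
    interval_s: float = 2.0,
) -> bool:
    """Poll until the worker Pod count equals max_replicas or the timeout expires.

    get_pod_count is supplied by the caller (hypothetical), for example a
    function that lists the worker Pods of the RayCluster and returns how many exist.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        if get_pod_count() == max_replicas:
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)


def run_task_burst(num_tasks: int, task_seconds: float = 30.0) -> List[int]:
    """Submit num_tasks Ray tasks at the same time and wait for all of them.

    Ray is imported lazily so that wait_for_replicas can be used without Ray installed.
    Assumption: with one 1-CPU task per worker Pod, num_tasks == maxReplicas
    forces the Autoscaler to scale the worker group all the way up.
    """
    import ray

    ray.init(address="auto")  # connect to the already running cluster

    @ray.remote(num_cpus=1)
    def busy(i: int) -> int:
        # Hold the CPU slot long enough for the Autoscaler to react.
        time.sleep(task_seconds)
        return i

    # Launch every task before waiting on any of them.
    refs = [busy.remote(i) for i in range(num_tasks)]
    # ray.get blocks until all tasks finish and raises if any task failed.
    return ray.get(refs)
```

The test would then assert that `wait_for_replicas` returns `True` for the worker group's maxReplicas and that `run_task_burst` returns one result per submitted task.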

ray-project/ray#48909 will be included in Ray 2.41. If the release has not been made yet, we can test V1 first and then add a follow-up PR for V2 later.

Use case

No response

Related issues

No response

Are you willing to submit a PR?

  • Yes I am willing to submit a PR!
