[Bug] [CI regression] [AMD] TestNoOverlapScheduler #7703

@michael-amd

Checklist

  • 1. I have searched related issues but cannot get the expected help.
  • 2. The bug has not been fixed in the latest version.
  • 3. Please note that if the bug-related issue you submitted lacks corresponding environment info and a minimal reproducible demo, it will be challenging for us to reproduce and resolve the issue, reducing the likelihood of receiving feedback.
  • 4. If the issue you raised is not a bug but a question, please raise a discussion at https://github.com/sgl-project/sglang/discussions/new/choose. Otherwise, it will be closed.
  • 5. Please use English, otherwise it will be closed.

Describe the bug

The CI job unit-test-backend-1-gpu-amd failed when running test/srt/test_no_overlap_scheduler.py. It exits with a GPU memory access fault on node-2.

Error snippet:

...
batch. #new-seq: 1, #new-token: 32, #cached-token: 0, token usage: 0.00, #running-req: 8, #queue-req: 119
[2025-07-02 03:22:31] Prefill batch. #new-seq: 1, #new-token: 32, #cached-token: 0, token usage: 0.00, #running-req: 8, #queue-req: 119
...
[2025-07-02 03:22:34] Prefill batch. #new-seq: 2, #new-token: 32, #cached-token: 0, token usage: 0.01, #running-req: 45, #queue-req: 81
Memory access fault by GPU node-2 (Agent handle: 0xdedb180) on address 0x7f57d9a00000. Reason: Unknown.

@hubertlu-tw suggests temporarily disabling the test in AMD CI to avoid blocking other PRs.
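
A minimal sketch of what temporarily disabling the test could look like, assuming the AMD CI job can be detected through the SGLANG_AMD_CI=1 environment variable used in the reproduction command. The helper `running_in_amd_ci` and the class name `TestNoOverlapSchedulerSketch` are hypothetical and only illustrate the pattern; the real test class and any existing CI helper in the repository may differ.

```python
import os
import unittest


def running_in_amd_ci() -> bool:
    # Hypothetical helper: treats SGLANG_AMD_CI=1 as the marker for the AMD CI job.
    return os.environ.get("SGLANG_AMD_CI", "0") == "1"


# The condition is evaluated once, when the class is defined (at import time).
@unittest.skipIf(running_in_amd_ci(), "Temporarily disabled on AMD CI, see #7703")
class TestNoOverlapSchedulerSketch(unittest.TestCase):
    # Stand-in for the real test body; the actual test launches a server.
    def test_placeholder(self):
        self.assertTrue(True)
```

With this pattern the test still runs on other CI jobs and locally, and the skip reason shows up in the unittest report, so the test is easy to re-enable once the memory access fault is resolved.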

Reproduction

Sample failure run: https://github.com/sgl-project/sglang/actions/runs/15965626491/job/45029188833

SGLANG_AMD_CI=1 SGLANG_IS_IN_CI=1 SGLANG_USE_AITER=1 python3 -m unittest test_no_overlap_scheduler.py

Environment

  • Docker image: lmsysorg/sglang:v0.4.8.post1-rocm630

CC: @saienduri @HaiShaw @hubertlu-tw
