Skip to content

[Bug] Llama 4 CUDA assertion error on long input length with fa3 backend #5170

@KCFindstr

Description

@KCFindstr

Checklist

  • 1. I have searched related issues but cannot get the expected help.
  • 2. The bug has not been fixed in the latest version.
  • 3. Please note that if the bug-related issue you submitted lacks corresponding environment info and a minimal reproducible demo, it will be challenging for us to reproduce and resolve the issue, reducing the likelihood of receiving feedback.
  • 4. If the issue you raised is not a bug but a question, please raise a discussion at https://github.com/sgl-project/sglang/discussions/new/choose Otherwise, it will be closed.
  • 5. Please use English, otherwise it will be closed.

Describe the bug

I built the latest SGLang docker image from head and used it to serve the meta-llama/Llama-4-Scout-17B-16E-Instruct model on an 8xH100 machine, sending requests to the /chat/completions endpoint. It handles requests of 3k input tokens without any problems, but once I send a request of ~10k input tokens, it crashes with a CUDA assertion error. Once I remove the --attention-backend=fa3 flag, the long requests can be served successfully.

[2025-04-08 23:08:18 TP0] Prefill batch. #new-seq: 1, #new-token: 3296, #cached-token: 1, token usage: 0.00, #running-req: 0, #queue-req: 0, 
[2025-04-08 23:08:22 TP0] Decode batch. #running-req: 1, #token: 3330, token usage: 0.00, gen throughput (token/s): 1.28, #queue-req: 0, 
[2025-04-08 23:08:46 TP0] Prefill batch. #new-seq: 1, #new-token: 6958, #cached-token: 3292, token usage: 0.00, #running-req: 0, #queue-req: 0, 
../aten/src/ATen/native/cuda/IndexKernel.cu:93: operator(): block: [20,0,0], thread: [32,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:93: operator(): block: [20,0,0], thread: [33,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:93: operator(): block: [20,0,0], thread: [34,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:93: operator(): block: [20,0,0], thread: [35,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:93: operator(): block: [20,0,0], thread: [36,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:93: operator(): block: [20,0,0], thread: [37,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:93: operator(): block: [20,0,0], thread: [38,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:93: operator(): block: [20,0,0], thread: [39,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:93: operator(): block: [20,0,0], thread: [40,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:93: operator(): block: [20,0,0], thread: [41,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:93: operator(): block: [20,0,0], thread: [42,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:93: operator(): block: [20,0,0], thread: [43,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:93: operator(): block: [20,0,0], thread: [44,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:93: operator(): block: [20,0,0], thread: [45,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:93: operator(): block: [20,0,0], thread: [46,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:93: operator(): block: [20,0,0], thread: [47,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:93: operator(): block: [20,0,0], thread: [48,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:93: operator(): block: [20,0,0], thread: [49,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:93: operator(): block: [20,0,0], thread: [50,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:93: operator(): block: [20,0,0], thread: [51,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:93: operator(): block: [20,0,0], thread: [52,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:93: operator(): block: [20,0,0], thread: [53,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:93: operator(): block: [20,0,0], thread: [54,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:93: operator(): block: [20,0,0], thread: [55,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:93: operator(): block: [20,0,0], thread: [56,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:93../aten/src/ATen/native/cuda/IndexKernel.cu: operator():93: operator(): block: [20: block: [20,0,0,0,0], thread: [10], thread: [57,0,0,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
` failed.
../aten/src/ATen/native/cuda/IndexKernel.cu../aten/src/ATen/native/cuda/IndexKernel.cu:93:93: operator(): operator(): block: [20: block: [20,0,0,0,0], thread: [58], thread: [11,0,0,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
` failed.
../aten/src/ATen/native/cuda/IndexKernel.cu../aten/src/ATen/native/cuda/IndexKernel.cu:93:93: operator(): operator(): block: [20: block: [20,0,0,0,0], thread: [59], thread: [12,0,0,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
` failed.
../aten/src/ATen/native/cuda/IndexKernel.cu../aten/src/ATen/native/cuda/IndexKernel.cu:93:93: operator(): operator(): block: [20: block: [20,0,0,0,0], thread: [60], thread: [13,0,0,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
` failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:93../aten/src/ATen/native/cuda/IndexKernel.cu: operator():93: block: [20: operator(),0: block: [20,0,0], thread: [14,0,0,0], thread: [61] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds",0` failed.
,0../aten/src/ATen/native/cuda/IndexKernel.cu] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds":93` failed.
: operator()../aten/src/ATen/native/cuda/IndexKernel.cu: block: [20:93,0: operator(),0: block: [20], thread: [15,0,0,0,0], thread: [62] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds",0,0` failed.
] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"../aten/src/ATen/native/cuda/IndexKernel.cu` failed.
:93../aten/src/ATen/native/cuda/IndexKernel.cu: operator():93: block: [20: operator(): block: [20,0,0,0,0], thread: [16], thread: [63,0,0,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
../aten/src/ATen/native/cuda/IndexKernel.cu` failed.
:93../aten/src/ATen/native/cuda/IndexKernel.cu: operator():93: block: [20: operator(),0: block: [20,0,0,0], thread: [64], thread: [17,0,0,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
` failed.
../aten/src/ATen/native/cuda/IndexKernel.cu../aten/src/ATen/native/cuda/IndexKernel.cu:93:93: operator(): operator(): block: [20: block: [20,0,0,0,0], thread: [65], thread: [18,0,0,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
` failed.
../aten/src/ATen/native/cuda/IndexKernel.cu../aten/src/ATen/native/cuda/IndexKernel.cu:93:93: operator(): operator(): block: [20: block: [20,0,0,0,0], thread: [66], thread: [19,0,0,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
` failed.
../aten/src/ATen/native/cuda/IndexKernel.cu../aten/src/ATen/native/cuda/IndexKernel.cu:93:93: operator(): operator(): block: [20: block: [20,0,0,0,0], thread: [67], thread: [20,0,0,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
` failed.
../aten/src/ATen/native/cuda/IndexKernel.cu../aten/src/ATen/native/cuda/IndexKernel.cu:93:93: operator(): operator(): block: [20: block: [20,0,0,0,0], thread: [21], thread: [68,0,0,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
` failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:93../aten/src/ATen/native/cuda/IndexKernel.cu: operator():93: block: [20: operator(),0: block: [20,0,0], thread: [22,0,0], thread: [69,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds",0` failed.
] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"../aten/src/ATen/native/cuda/IndexKernel.cu` failed.
:93../aten/src/ATen/native/cuda/IndexKernel.cu: operator():93: block: [20: operator(),0: block: [20,0,0], thread: [23,0,0], thread: [70,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds",0` failed.
] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"../aten/src/ATen/native/cuda/IndexKernel.cu` failed.
:93../aten/src/ATen/native/cuda/IndexKernel.cu: operator():93: block: [20: operator(),0,0: block: [20], thread: [24,0,0,0,0], thread: [71] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds",0` failed.
,0../aten/src/ATen/native/cuda/IndexKernel.cu] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds":93` failed.
: operator()../aten/src/ATen/native/cuda/IndexKernel.cu: block: [20:93,0: operator(),0: block: [20], thread: [25,0,0,0], thread: [72,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds",0` failed.
] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"../aten/src/ATen/native/cuda/IndexKernel.cu` failed.
:93../aten/src/ATen/native/cuda/IndexKernel.cu: operator():93: block: [20: operator(),0: block: [20,0,0], thread: [26,0,0], thread: [73,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds",0` failed.
] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"../aten/src/ATen/native/cuda/IndexKernel.cu` failed.
:93../aten/src/ATen/native/cuda/IndexKernel.cu: operator():93: block: [20: operator(),0: block: [20,0,0], thread: [27,0,0], thread: [74,0,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
` failed.
../aten/src/ATen/native/cuda/IndexKernel.cu../aten/src/ATen/native/cuda/IndexKernel.cu:93:93: operator(): operator(): block: [20: block: [20,0,0,0,0], thread: [28], thread: [75,0,0,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
` failed.
../aten/src/ATen/native/cuda/IndexKernel.cu../aten/src/ATen/native/cuda/IndexKernel.cu:93:93: operator(): operator(): block: [20: block: [20,0,0,0,0], thread: [29], thread: [76,0,0,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
` failed.
../aten/src/ATen/native/cuda/IndexKernel.cu../aten/src/ATen/native/cuda/IndexKernel.cu:93:93: operator(): operator(): block: [20: block: [20,0,0,0,0], thread: [30], thread: [77,0,0,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
` failed.
../aten/src/ATen/native/cuda/IndexKernel.cu../aten/src/ATen/native/cuda/IndexKernel.cu:93:93: operator(): operator(): block: [20: block: [20,0,0,0,0], thread: [31], thread: [78,0,0,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
` failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:93../aten/src/ATen/native/cuda/IndexKernel.cu: operator():93: block: [21: operator(),0: block: [20,0,0], thread: [64,0,0], thread: [79,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds",0` failed.
,0../aten/src/ATen/native/cuda/IndexKernel.cu] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds":93` failed.
: operator()../aten/src/ATen/native/cuda/IndexKernel.cu: block: [21:93,0: operator(),0], thread: [65: block: [20,0,0,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"], thread: [80` failed.
,0,0../aten/src/ATen/native/cuda/IndexKernel.cu] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds":93` failed.
: operator()../aten/src/ATen/native/cuda/IndexKernel.cu: block: [21:93,0: operator(),0: block: [20], thread: [66,0,0,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"], thread: [81` failed.
,0,0../aten/src/ATen/native/cuda/IndexKernel.cu] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds":93` failed.
: operator()../aten/src/ATen/native/cuda/IndexKernel.cu: block: [21:93,0: operator(),0], thread: [67: block: [20,0,0,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"], thread: [82` failed.
,0../aten/src/ATen/native/cuda/IndexKernel.cu,0:93] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds": operator()` failed.
: block: [21../aten/src/ATen/native/cuda/IndexKernel.cu,0:93,0: operator()], thread: [68: block: [20,0,0,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"], thread: [83` failed.
,0../aten/src/ATen/native/cuda/IndexKernel.cu,0:93] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds": operator()` failed.
: block: [21../aten/src/ATen/native/cuda/IndexKernel.cu,0:93,0: operator()], thread: [69: block: [20,0,0,0,0], thread: [84] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds",0` failed.
,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"../aten/src/ATen/native/cuda/IndexKernel.cu` failed.
:93../aten/src/ATen/native/cuda/IndexKernel.cu: operator():93: block: [21,0: operator(),0: block: [20], thread: [70,0,0,0,0], thread: [85] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds",0` failed.
,0../aten/src/ATen/native/cuda/IndexKernel.cu] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds":93` failed.
: operator()../aten/src/ATen/native/cuda/IndexKernel.cu: block: [21:93,0: operator(),0: block: [20], thread: [71,0,0,0,0], thread: [86] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds",0` failed.
,0../aten/src/ATen/native/cuda/IndexKernel.cu] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds":93` failed.
: operator()../aten/src/ATen/native/cuda/IndexKernel.cu: block: [21:93,0: operator(),0: block: [20], thread: [72,0,0,0,0], thread: [87] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds",0` failed.
,0../aten/src/ATen/native/cuda/IndexKernel.cu] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds":93` failed.
: operator()../aten/src/ATen/native/cuda/IndexKernel.cu: block: [21:93,0: operator(),0], thread: [73: block: [20,0,0,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"], thread: [88` failed.
,0../aten/src/ATen/native/cuda/IndexKernel.cu,0:93] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds": operator()` failed.
: block: [21../aten/src/ATen/native/cuda/IndexKernel.cu,0:93,0: operator()], thread: [74: block: [20,0,0,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"], thread: [89` failed.
,0../aten/src/ATen/native/cuda/IndexKernel.cu,0:93] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds": operator()` failed.
: block: [21../aten/src/ATen/native/cuda/IndexKernel.cu,0:93,0: operator()], thread: [75: block: [20,0,0,0,0], thread: [90] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds",0` failed.
,0../aten/src/ATen/native/cuda/IndexKernel.cu] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds":93` failed.
: operator()../aten/src/ATen/native/cuda/IndexKernel.cu: block: [21:93,0: operator(),0: block: [20], thread: [76,0,0,0,0], thread: [91] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds",0` failed.
,0../aten/src/ATen/native/cuda/IndexKernel.cu] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
:93../aten/src/ATen/native/cuda/IndexKernel.cu: operator():93: block: [21: operator(),0: block: [20,0,0], thread: [77,0,0], thread: [92,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds",0` failed.
] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"../aten/src/ATen/native/cuda/IndexKernel.cu` failed.
:93../aten/src/ATen/native/cuda/IndexKernel.cu:93: operator(): operator(): block: [21: block: [20,0,0,0,0], thread: [78,0], thread: [93,0,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
` failed.
../aten/src/ATen/native/cuda/IndexKernel.cu../aten/src/ATen/native/cuda/IndexKernel.cu:93:93: operator(): operator(): block: [21: block: [20,0,0,0,0], thread: [94], thread: [79,0,0,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
` failed.
../aten/src/ATen/native/cuda/IndexKernel.cu../aten/src/ATen/native/cuda/IndexKernel.cu:93:93: operator(): operator(): block: [20: block: [21,0,0,0,0], thread: [95], thread: [80,0,0,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
` failed.
../aten/src/ATen/native/cuda/IndexKernel.cu../aten/src/ATen/native/cuda/IndexKernel.cu:93: operator():93: block: [21: operator(),0: block: [23,0,0], thread: [81,0,0], thread: [96,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds",0` failed.
] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"../aten/src/ATen/native/cuda/IndexKernel.cu` failed.
:93../aten/src/ATen/native/cuda/IndexKernel.cu: operator():93: block: [21: operator(),0: block: [23,0,0], thread: [82,0,0], thread: [97,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds",0` failed.
] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"../aten/src/ATen/native/cuda/IndexKernel.cu` failed.
:93../aten/src/ATen/native/cuda/IndexKernel.cu: operator():93: block: [21: operator(): block: [23,0,0,0,0], thread: [83], thread: [98,0,0,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
../aten/src/ATen/native/cuda/IndexKernel.cu` failed.
:93../aten/src/ATen/native/cuda/IndexKernel.cu: operator():93: operator(): block: [21: block: [23,0,0,0,0], thread: [84], thread: [99,0,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds",0` failed.
] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"../aten/src/ATen/native/cuda/IndexKernel.cu` failed.
:93../aten/src/ATen/native/cuda/IndexKernel.cu: operator():93: block: [21: operator(),0: block: [23,0,0], thread: [85,0,0], thread: [100,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds",0` failed.
] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"../aten/src/ATen/native/cuda/IndexKernel.cu` failed.
:93../aten/src/ATen/native/cuda/IndexKernel.cu: operator():93: block: [21: operator(),0: block: [23,0,0], thread: [86,0,0], thread: [101,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds",0` failed.
] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"../aten/src/ATen/native/cuda/IndexKernel.cu` failed.
:93../aten/src/ATen/native/cuda/IndexKernel.cu: operator():93: block: [21: operator(),0: block: [23,0,0], thread: [87,0,0], thread: [102,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds",0` failed.
] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"../aten/src/ATen/native/cuda/IndexKernel.cu` failed.
:93../aten/src/ATen/native/cuda/IndexKernel.cu: operator():93: block: [21: operator(),0: block: [23,0,0], thread: [88,0,0], thread: [103,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds",0` failed.
] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"../aten/src/ATen/native/cuda/IndexKernel.cu` failed.
:93../aten/src/ATen/native/cuda/IndexKernel.cu: operator():93: block: [21: operator(),0: block: [23,0,0], thread: [89,0,0], thread: [104,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds",0` failed.
] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"../aten/src/ATen/native/cuda/IndexKernel.cu:93` failed.
: operator()../aten/src/ATen/native/cuda/IndexKernel.cu: block: [21:93,0: operator(),0: block: [23], thread: [90,0,0,0,0], thread: [105] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds",0` failed.
,0../aten/src/ATen/native/cuda/IndexKernel.cu] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds":93` failed.
: operator()../aten/src/ATen/native/cuda/IndexKernel.cu: block: [21:93,0: operator(),0: block: [23], thread: [91,0,0,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"], thread: [106` failed.
,0../aten/src/ATen/native/cuda/IndexKernel.cu,0:93] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds": operator()` failed.
: block: [21../aten/src/ATen/native/cuda/IndexKernel.cu,0:93,0: operator()], thread: [92: block: [23,0,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds",0` failed.
], thread: [107,0../aten/src/ATen/native/cuda/IndexKernel.cu,0:93] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds": operator()` failed.
: block: [21../aten/src/ATen/native/cuda/IndexKernel.cu,0:93,0: operator()], thread: [93: block: [23,0,0,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"], thread: [108` failed.
,0../aten/src/ATen/native/cuda/IndexKernel.cu,0:93] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds": operator()` failed.
...

Error log is too long so I just included the first few lines. Let me know if more info is needed!

Reproduction

Build a docker from head and run the server:

#!/bin/bash

docker run --gpus all \
  --shm-size 32g \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  -e HF_TOKEN=$HF_TOKEN \
  -e NVIDIA_VISIBLE_DEVICES=all \
  --ipc=host --network=host --privileged \
  <docker uri> \
    --host=0.0.0.0 \
    --port=30000 \
    --model=meta-llama/Llama-4-Scout-17B-16E-Instruct \
    --context-length=1000000 \
    --enable-torch-compile \
    --disable-cuda-graph \
    --attention-backend=fa3 \
    --tp 8

This server can handle short prompts around 3k tokens without any problems, but once I send a request of 10k input tokens, it crashes immediately.

Environment

Python: 3.10.12 (main, Feb  4 2025, 14:57:36) [GCC 11.4.0]
CUDA available: True
GPU 0,1,2,3,4,5,6,7: NVIDIA H100 80GB HBM3
GPU 0,1,2,3,4,5,6,7 Compute Capability: 9.0
CUDA_HOME: /usr/local/cuda
NVCC: Cuda compilation tools, release 12.4, V12.4.131
CUDA Driver Version: 550.90.07
PyTorch: 2.5.1+cu124
sglang: 0.4.5
sgl_kernel: 0.0.8
flashinfer: Module Not Found
triton: 3.1.0
transformers: 4.51.0
torchao: 0.10.0
numpy: 2.2.4
aiohttp: 3.11.16
fastapi: 0.115.12
hf_transfer: 0.1.9
huggingface_hub: 0.30.2
interegular: 0.3.3
modelscope: 1.24.1
orjson: 3.10.16
outlines: 0.1.11
packaging: 24.2
psutil: 7.0.0
pydantic: 2.11.3
multipart: Module Not Found
zmq: Module Not Found
uvicorn: 0.34.0
uvloop: 0.21.0
vllm: Module Not Found
xgrammar: 0.1.17
openai: 1.71.0
tiktoken: 0.9.0
anthropic: 0.49.0
litellm: 1.65.4.post1
decord: 0.6.0
NVIDIA Topology: 
        GPU0    GPU1    GPU2    GPU3    GPU4    GPU5    GPU6    GPU7    CPU Affinity    NUMA Affinity   GPU NUMA ID
GPU0     X      NV18    NV18    NV18    NV18    NV18    NV18    NV18    0-51,104-155    0               N/A
GPU1    NV18     X      NV18    NV18    NV18    NV18    NV18    NV18    0-51,104-155    0               N/A
GPU2    NV18    NV18     X      NV18    NV18    NV18    NV18    NV18    0-51,104-155    0               N/A
GPU3    NV18    NV18    NV18     X      NV18    NV18    NV18    NV18    0-51,104-155    0               N/A
GPU4    NV18    NV18    NV18    NV18     X      NV18    NV18    NV18    52-103,156-207  1               N/A
GPU5    NV18    NV18    NV18    NV18    NV18     X      NV18    NV18    52-103,156-207  1               N/A
GPU6    NV18    NV18    NV18    NV18    NV18    NV18     X      NV18    52-103,156-207  1               N/A
GPU7    NV18    NV18    NV18    NV18    NV18    NV18    NV18     X      52-103,156-207  1               N/A

Legend:

  X    = Self
  SYS  = Connection traversing PCIe as well as the SMP interconnect between NUMA nodes (e.g., QPI/UPI)
  NODE = Connection traversing PCIe as well as the interconnect between PCIe Host Bridges within a NUMA node
  PHB  = Connection traversing PCIe as well as a PCIe Host Bridge (typically the CPU)
  PXB  = Connection traversing multiple PCIe bridges (without traversing the PCIe Host Bridge)
  PIX  = Connection traversing at most a single PCIe bridge
  NV#  = Connection traversing a bonded set of # NVLinks

Hypervisor vendor: KVM
ulimit soft: 1048576

Metadata

Metadata

Labels

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions