
[Bug] TypeError: GroupCoordinator.all_gather() got an unexpected keyword argument 'tensor_list' #7417

@whybeyoung

Description

Checklist

  • 1. I have searched related issues but cannot get the expected help.
  • 2. The bug has not been fixed in the latest version.
  • 3. Please note that if the bug-related issue you submitted lacks corresponding environment info and a minimal reproducible demo, it will be challenging for us to reproduce and resolve the issue, reducing the likelihood of receiving feedback.
  • 4. If the issue you raised is not a bug but a question, please raise a discussion at https://github.com/sgl-project/sglang/discussions/new/choose. Otherwise, it will be closed.
  • 5. Please use English, otherwise it will be closed.

Describe the bug

```
[2025-06-21 19:55:30 DP0 TP7] TpModelWorkerClient hit an exception: Traceback (most recent call last):
  File "/usr/local/src/sglang/python/sglang/srt/managers/tp_worker_overlap_thread.py", line 127, in forward_thread_func
    self.forward_thread_func_()
  File "/usr/local/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "/usr/local/src/sglang/python/sglang/srt/managers/tp_worker_overlap_thread.py", line 162, in forward_thread_func_
    self.worker.forward_batch_generation(
  File "/usr/local/src/sglang/python/sglang/srt/managers/tp_worker.py", line 211, in forward_batch_generation
    logits_output, can_run_cuda_graph = self.model_runner.forward(
  File "/usr/local/src/sglang/python/sglang/srt/model_executor/model_runner.py", line 1221, in forward
    output = self._forward_raw(
  File "/usr/local/src/sglang/python/sglang/srt/model_executor/model_runner.py", line 1250, in _forward_raw
    ret = self.forward_extend(
  File "/usr/local/src/sglang/python/sglang/srt/model_executor/model_runner.py", line 1189, in forward_extend
    return self.model.forward(
  File "/usr/local/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "/usr/local/src/sglang/python/sglang/srt/models/deepseek_v2.py", line 1764, in forward
    hidden_states = self.model(input_ids, positions, forward_batch, input_embeds)
  File "/usr/local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1762, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/src/sglang/python/sglang/srt/models/deepseek_v2.py", line 1657, in forward
    hidden_states, residual = layer(
  File "/usr/local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1762, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/src/sglang/python/sglang/srt/models/deepseek_v2.py", line 1491, in forward
    hidden_states, residual = self.layer_communicator.prepare_attn(
  File "/usr/local/src/sglang/python/sglang/srt/layers/communicator.py", line 191, in prepare_attn
    hidden_states = self._communicate_simple_fn(
  File "/usr/local/src/sglang/python/sglang/srt/layers/communicator.py", line 294, in _scattered_to_tp_attn_full
    attn_tp_all_gather(
  File "/usr/local/src/sglang/python/sglang/srt/layers/dp_attention.py", line 313, in attn_tp_all_gather
    return get_attention_tp_group().all_gather(input_, tensor_list=output_list)
TypeError: GroupCoordinator.all_gather() got an unexpected keyword argument 'tensor_list'
```
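
For context, the failure is a plain Python signature mismatch: `attn_tp_all_gather` in `dp_attention.py` (line 313 in the traceback) still passes `tensor_list=output_list` as a keyword, but `GroupCoordinator.all_gather` no longer declares a `tensor_list` parameter, so either the caller or the callee was changed without updating the other. A minimal sketch of the same mechanism (the class and signatures below are illustrative assumptions, not the actual sglang code):

```python
# Minimal sketch, assuming illustrative signatures (not the real sglang code).

class GroupCoordinator:
    def all_gather(self, input_, dim: int = -1):
        # Hypothetical current signature: no `tensor_list` parameter.
        return input_


def attn_tp_all_gather(output_list, input_):
    # Hypothetical caller mirroring dp_attention.py line 313: it still passes
    # the keyword argument that the callee no longer accepts.
    return GroupCoordinator().all_gather(input_, tensor_list=output_list)


attn_tp_all_gather([], object())
# Raises: TypeError: GroupCoordinator.all_gather() got an unexpected keyword
# argument 'tensor_list'
```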
### Reproduction

#### Start command

```
python -m sglang.launch_server \
  --model-path /work/models/ \
  --port 30000 \
  --trust-remote \
  --host 0.0.0.0 \
  --disable-radix-cache \
  --init-expert-location /home/aiges/tuned/attachment_ep_statistics/prefill_in1024.json \
  --ep-dispatch-algorithm dynamic \
  --eplb-algorithm deepseek \
  --deepep-config /home/aiges/tuned/tuned_8sms.json \
  --enable-dp-lm-head \
  --chunked-prefill-size 262144 \
  --max-prefill-tokens 32768 \
  --tp 16 \
  --dp-size 2 \
  --page-size 64 \
  --enable-dp-attention \
  --context-length 32768 \
  --max-running-requests 1024 \
  --mem-fraction-static 0.83 \
  --enable-deepep-moe \
  --deepep-mode normal \
  --ep-num-redundant-experts 32 \
  --moe-dense-tp-size 1 \
  --disaggregation-ib-device mlx5_bond_0,mlx5_bond_1,mlx5_bond_2,mlx5_bond3 \
  --enable-metrics \
  --disaggregation-mode prefill \
  --nnodes 2 \
  --dist-init-addr xdeepseekv3-lws-mtp-main-prefill-0.xdeepseekv3-lws-mtp-main-prefill.aiservice:20102 \
  --node-rank 0
```

### Environment

main branch
