Closed
Labels: good first issue, help wanted
Description
The issue is the same as #2556, but for Llama models. We should be able to fix it with a similar approach.
The following command crashes:
python3 -m sglang.bench_one_batch --model unsloth/llama-3-8b-bnb-4bit --load-format bitsandbytes
Loading safetensors checkpoint shards: 0% Completed | 0/1 [00:00<?, ?it/s]
[rank0]: Traceback (most recent call last):
[rank0]: File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
[rank0]: return _run_code(code, main_globals, None,
[rank0]: File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
[rank0]: exec(code, run_globals)
[rank0]: File "/root/sglang/python/sglang/bench_one_batch.py", line 470, in <module>
[rank0]: main(server_args, bench_args)
[rank0]: File "/root/sglang/python/sglang/bench_one_batch.py", line 434, in main
[rank0]: work_func(server_args, port_args, bench_args, 0)
[rank0]: File "/root/sglang/python/sglang/bench_one_batch.py", line 369, in latency_test
[rank0]: model_runner, tokenizer = load_model(server_args, port_args, tp_rank)
[rank0]: File "/root/sglang/python/sglang/bench_one_batch.py", line 121, in load_model
[rank0]: model_runner = ModelRunner(
[rank0]: File "/root/sglang/python/sglang/srt/model_executor/model_runner.py", line 158, in __init__
[rank0]: self.load_model()
[rank0]: File "/root/sglang/python/sglang/srt/model_executor/model_runner.py", line 258, in load_model
[rank0]: self.model = get_model(
[rank0]: File "/root/sglang/python/sglang/srt/model_loader/__init__.py", line 22, in get_model
[rank0]: return loader.load_model(
[rank0]: File "/root/sglang/python/sglang/srt/model_loader/loader.py", line 1029, in load_model
[rank0]: self._load_weights(model_config, model)
[rank0]: File "/root/sglang/python/sglang/srt/model_loader/loader.py", line 960, in _load_weights
[rank0]: model.load_weights(qweight_iterator)
[rank0]: File "/root/sglang/python/sglang/srt/models/llama.py", line 442, in load_weights
[rank0]: param = params_dict[name]
[rank0]: KeyError: 'model.layers.0.mlp.down_proj.qweight'
Loading safetensors checkpoint shards: 0% Completed | 0/1 [00:01<?, ?it/s]
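The KeyError suggests the bitsandbytes loader yields quantized tensors under a `.qweight` suffix that `load_weights` in `llama.py` does not have in its `params_dict`. A minimal sketch of one possible direction, assuming the fix mirrors #2556 by remapping the quantized name back to the registered parameter name (the helper name `remap_bnb_name` is hypothetical, not the actual sglang patch):

```python
# Hypothetical sketch: map a bitsandbytes-emitted name like
# "model.layers.0.mlp.down_proj.qweight" back to the ".weight"
# key that the model registers in params_dict. The real fix may
# differ; see the patch for #2556 for the approach actually used.
def remap_bnb_name(name: str) -> str:
    if name.endswith(".qweight"):
        return name[: -len(".qweight")] + ".weight"
    return name


# Inside load_weights, the lookup would then become something like:
#     param = params_dict[remap_bnb_name(name)]
# instead of the bare params_dict[name] that raises the KeyError above.
print(remap_bnb_name("model.layers.0.mlp.down_proj.qweight"))
```
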