
Conversation


@zhyncs zhyncs commented Aug 17, 2024

Motivation

Modification

Checklist

  • Before submitting a PR for review, make sure it has at least passed verification in your local development environment.
  • Ensure pre-commit (pre-commit run --all-files) or other linting tools are used to fix potential lint issues.
  • Confirm that modifications are covered by complete unit tests. If not, please add more unit tests to ensure correctness.
  • Modify documentation as needed, such as docstrings or example tutorials.

@zhyncs zhyncs marked this pull request as draft August 17, 2024 13:07
@zhyncs zhyncs added the wip label Aug 17, 2024
@zhyncs zhyncs removed the wip label Aug 17, 2024
@zhyncs zhyncs marked this pull request as ready for review August 17, 2024 14:27
@zhyncs (Member Author) commented Aug 17, 2024

Tested with a GCP T4:

(base) root@hostname:/home/me/sglang# python3 -m sglang.launch_server --model Qwen/Qwen1.5-1.8B-Chat --disable-flashinfer-sampling --mem-frac 0.7
server_args=ServerArgs(model_path='Qwen/Qwen1.5-1.8B-Chat', tokenizer_path='Qwen/Qwen1.5-1.8B-Chat', tokenizer_mode='auto', skip_tokenizer_init=False, load_format='auto', dtype='auto', trust_remote_code=False, context_length=None, quantization=None, served_model_name='Qwen/Qwen1.5-1.8B-Chat', chat_template=None, host='127.0.0.1', port=30000, additional_ports=[30001, 30002, 30003, 30004], mem_fraction_static=0.7, max_running_requests=None, max_num_reqs=None, max_total_tokens=None, chunked_prefill_size=8192, max_prefill_tokens=16384, schedule_policy='lpm', schedule_conservativeness=1.0, tp_size=1, stream_interval=1, random_seed=593901843, log_level='info', log_level_http=None, log_requests=False, show_time_cost=False, api_key=None, file_storage_pth='SGLang_storage', dp_size=1, load_balance_method='round_robin', disable_flashinfer=False, disable_flashinfer_sampling=True, disable_radix_cache=False, disable_regex_jump_forward=False, disable_cuda_graph=False, disable_disk_cache=False, enable_mixed_chunk=False, enable_torch_compile=False, enable_p2p_check=False, enable_mla=False, attention_reduce_in_fp32=False, efficient_weight_load=False, nccl_init_addr=None, nnodes=1, node_rank=None)
[gpu=0] Init nccl begin.
[gpu=0] Load weight begin. avail mem=14.47 GB
Compute capability below sm80 use float16 due to lack of bfloat16 support.
INFO 08-17 14:35:09 weight_utils.py:225] Using model weights format ['*.safetensors']
INFO 08-17 14:35:09 weight_utils.py:269] No model.safetensors.index.json found in remote.
Loading safetensors checkpoint shards:   0% Completed | 0/1 [00:00<?, ?it/s]
Loading safetensors checkpoint shards: 100% Completed | 1/1 [00:01<00:00,  1.79s/it]

[gpu=0] Load weight end. type=Qwen2ForCausalLM, dtype=torch.float16, avail mem=10.93 GB
[gpu=0] Memory pool end. avail mem=4.02 GB
[gpu=0] Capture cuda graph begin. This can take up to several minutes.
[gpu=0] max_total_num_tokens=35991, max_prefill_tokens=16384, max_running_requests=2047, context_len=32768
INFO:     Started server process [226684]
INFO:     Waiting for application startup.
INFO:     Application startup complete.
INFO:     Uvicorn running on http://127.0.0.1:30000 (Press CTRL+C to quit)
INFO:     127.0.0.1:57388 - "GET /get_model_info HTTP/1.1" 200 OK
[gpu=0] Prefill batch. #new-seq: 1, #new-token: 6, #cached-token: 0, cache hit rate: 0.00%, #running-req: 0, #queue-req: 0
INFO:     127.0.0.1:57398 - "POST /generate HTTP/1.1" 200 OK
The server is fired up and ready to roll!
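For reference, the POST /generate request seen in the log above can be reproduced with a short client sketch. The payload field names follow sglang's native generation API at the time of this PR and the host/port match the defaults in the log; treat both as assumptions if your version differs, and the helper names here are illustrative only:

```python
import json
import urllib.request

def build_generate_request(prompt, max_new_tokens=64, temperature=0.7):
    # Build a payload for the /generate endpoint exercised in the log.
    # Field names ("text", "sampling_params") follow sglang's native
    # generation API; treat them as assumptions for other versions.
    return {
        "text": prompt,
        "sampling_params": {
            "max_new_tokens": max_new_tokens,
            "temperature": temperature,
        },
    }

def send_generate(payload, url="http://127.0.0.1:30000/generate"):
    # POST the JSON payload to the locally running server.
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())

# Example, assuming the server launched above is still running:
# print(send_generate(build_generate_request("Hello from a T4")))
```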

@zhyncs zhyncs merged commit 9208591 into sgl-project:main Aug 17, 2024
5 checks passed
@zhyncs zhyncs deleted the sm75 branch August 17, 2024 14:45
@zhyncs (Member Author) commented Aug 17, 2024

This check is not added in check_server_args because it would trigger a "Cannot re-initialize CUDA in forked subprocess" error.
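The fork constraint can be illustrated with a small sketch. The helper names and the sm80 threshold for the sampling kernels are assumptions for illustration, not sglang internals; the general pattern is that any CUDA-touching probe must run in a process started with the "spawn" method, since a fork()ed child inheriting an initialized CUDA context raises this error:

```python
import multiprocessing as mp

def supports_flashinfer_sampling(capability):
    # Hypothetical helper: assume the flashinfer sampling kernels need
    # sm80+, so a T4 at sm75 (7, 5) takes the fallback path.
    major, _minor = capability
    return major >= 8

def probe_gpu(queue):
    # In a real server this child would call
    # torch.cuda.get_device_capability(), which initializes CUDA.
    # That is only safe under the "spawn" start method: a fork()ed
    # child that inherits an already-initialized CUDA context fails
    # with "Cannot re-initialize CUDA in forked subprocess".
    queue.put(supports_flashinfer_sampling((7, 5)))  # pretend we probed a T4

if __name__ == "__main__":
    # "spawn" starts a fresh interpreter, so CUDA can initialize cleanly.
    ctx = mp.get_context("spawn")
    q = ctx.Queue()
    p = ctx.Process(target=probe_gpu, args=(q,))
    p.start()
    p.join()
    print("flashinfer sampling supported:", q.get())
```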

@zhyncs zhyncs mentioned this pull request Aug 17, 2024
timethink pushed a commit to timethink/sglang that referenced this pull request Mar 9, 2025