@zhyncs zhyncs commented Sep 3, 2024

Motivation

Modifications

Checklist

  • Format your code according to the Contributor Guide.
  • Add unit tests as outlined in the Contributor Guide.
  • Update documentation as needed, including docstrings or example tutorials.

Co-authored-by: ispobock <ISPObaoke@163.com>

zhyncs commented Sep 3, 2024

python3 benchmark/gsm8k/bench_sglang.py

Latency: 92.746
Invalid: 0.000
Accuracy: 0.935

python3 -m sglang.bench_serving --backend sglang --num-prompts 5000

============ Serving Benchmark Result ============
Backend:                                 sglang
Traffic request rate:                    inf
Successful requests:                     5000
Benchmark duration (s):                  346.41
Total input tokens:                      1224620
Total generated tokens:                  1061203
Total generated tokens (retokenized):    1055493
Request throughput (req/s):              14.43
Input token throughput (tok/s):          3535.22
Output token throughput (tok/s):         3063.47
----------------End-to-End Latency----------------
Mean E2E Latency (ms):                   145289.76
Median E2E Latency (ms):                 143439.03
---------------Time to First Token----------------
Mean TTFT (ms):                          60141.13
Median TTFT (ms):                        55335.66
P99 TTFT (ms):                           131151.83
-----Time per Output Token (excl. 1st token)------
Mean TPOT (ms):                          742.79
Median TPOT (ms):                        549.99
P99 TPOT (ms):                           4744.62
---------------Inter-token Latency----------------
Mean ITL (ms):                           424.83
Median ITL (ms):                         237.24
P99 ITL (ms):                            1699.88
=================================================
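
As a sanity check on the table above, the throughput rows can be re-derived from the totals and the benchmark duration. This is a minimal sketch, assuming each throughput is simply the corresponding total divided by the wall-clock duration; the helper name `throughput` is hypothetical and not part of `sglang.bench_serving`. Because the reported duration is rounded to two decimals, the recomputed token throughputs only agree with the reported ones to within a few hundredths.

```python
# Values copied from the Serving Benchmark Result above.
successful_requests = 5000
duration_s = 346.41
total_input_tokens = 1224620
total_generated_tokens = 1061203


def throughput(total: float, seconds: float) -> float:
    """Hypothetical helper: units processed per second of wall-clock time."""
    return total / seconds


# Assumption: each reported throughput is total / benchmark duration.
request_tput = throughput(successful_requests, duration_s)
input_tput = throughput(total_input_tokens, duration_s)
output_tput = throughput(total_generated_tokens, duration_s)

# Average prompt and completion length per request, for context.
mean_input_len = total_input_tokens / successful_requests
mean_output_len = total_generated_tokens / successful_requests

print(f"Request throughput (req/s):      {request_tput:.2f}")
print(f"Input token throughput (tok/s):  {input_tput:.2f}")
print(f"Output token throughput (tok/s): {output_tput:.2f}")
print(f"Mean input tokens per request:   {mean_input_len:.1f}")
print(f"Mean output tokens per request:  {mean_output_len:.1f}")
```

The small gap between the recomputed and reported token throughputs is consistent with the duration being printed at reduced precision, not with a discrepancy in the run itself.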

@zhyncs zhyncs merged commit dc67d97 into main Sep 3, 2024
8 of 9 checks passed
@zhyncs zhyncs deleted the speedup branch September 3, 2024 18:29
timethink pushed a commit to timethink/sglang that referenced this pull request Mar 9, 2025
Co-authored-by: ispobock <ISPObaoke@163.com>