
Conversation

@ByronHsu (Collaborator) commented on Nov 11, 2024

Motivation

Add a new dataset, gen-shared-prefix, to bench_serving to evaluate engine performance on the shared-prefix generation task. Each request has the format <system prompt> <question>. This benchmark is useful for evaluating cache-aware optimizations such as cache-aware DP.

Modifications

  1. Add a function to generate shared-prefix requests (a rough sketch of the idea follows this list)
  2. Add the related command-line arguments
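
As a rough sketch of the idea (function and argument names here are hypothetical, not the exact code added by this PR; it reuses the gen_prompt helper shown in the diff further down), the generator builds a few long system prompts and pairs each with many short questions, so requests in the same group share a long prefix:

import random

def gen_shared_prefix_requests(
    tokenizer,
    num_groups=8,            # number of distinct system prompts (illustrative default)
    prompts_per_group=128,   # questions that share each system prompt
    system_prompt_len=2048,  # token length of the shared prefix
    question_len=128,        # token length of each unique question
    output_len=256,          # requested generation length
):
    """Build <system prompt> <question> requests where all requests in a group
    share a long prefix. Sketch only; parameter names are hypothetical."""
    requests = []
    for _ in range(num_groups):
        system_prompt = gen_prompt(tokenizer, system_prompt_len)
        for _ in range(prompts_per_group):
            question = gen_prompt(tokenizer, question_len)
            prompt = system_prompt + " " + question
            prompt_len = len(tokenizer.encode(prompt))
            requests.append((prompt, prompt_len, output_len))
    random.shuffle(requests)  # interleave groups so cache-aware routing is exercised
    return requests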

Tests

Python --dp 4 (round robin)

============ Serving Benchmark Result ============
Backend:                                 sglang    
Traffic request rate:                    inf       
Successful requests:                     1024      
Benchmark duration (s):                  54.54     
Total input tokens:                      2326152   
Total generated tokens:                  262144    
Total generated tokens (retokenized):    253772    
Request throughput (req/s):              18.78     
Input token throughput (tok/s):          42651.75  
Output token throughput (tok/s):         4806.61   
----------------End-to-End Latency----------------
Mean E2E Latency (ms):                   53560.65  
Median E2E Latency (ms):                 53362.96  
---------------Time to First Token----------------
Mean TTFT (ms):                          33504.17  
Median TTFT (ms):                        35524.99  
P99 TTFT (ms):                           36763.44  
-----Time per Output Token (excl. 1st token)------
Mean TPOT (ms):                          78.65     
Median TPOT (ms):                        70.23     
P99 TPOT (ms):                           180.28    
---------------Inter-token Latency----------------
Mean ITL (ms):                           78.92     
Median ITL (ms):                         68.45     
P99 ITL (ms):                            527.79    
==================================================

[2024-11-11 00:06:15 DP3 TP0] Prefill batch. #new-seq: 2, #new-token: 699, #cached-token: 3827, cache hit rate: 21.75%, token usage: 0.40, #running-req: 254, #queue-req: 1

Rust Approx Tree --dp 4

============ Serving Benchmark Result ============
Backend:                                 sglang    
Traffic request rate:                    inf       
Successful requests:                     1024      
Benchmark duration (s):                  28.27     
Total input tokens:                      2327209   
Total generated tokens:                  262144    
Total generated tokens (retokenized):    256144    
Request throughput (req/s):              36.23     
Input token throughput (tok/s):          82328.91  
Output token throughput (tok/s):         9273.78   
----------------End-to-End Latency----------------
Mean E2E Latency (ms):                   26073.70  
Median E2E Latency (ms):                 27724.49  
---------------Time to First Token----------------
Mean TTFT (ms):                          7005.06   
Median TTFT (ms):                        7061.64   
P99 TTFT (ms):                           7995.75   
-----Time per Output Token (excl. 1st token)------
Mean TPOT (ms):                          74.78     
Median TPOT (ms):                        79.34     
P99 TPOT (ms):                           99.97     
---------------Inter-token Latency----------------
Mean ITL (ms):                           75.60     
Median ITL (ms):                         74.43     
P99 ITL (ms):                            94.90     
==================================================

[2024-11-11 00:12:58 TP0] Prefill batch. #new-seq: 27, #new-token: 3517, #cached-token: 57525, cache hit rate: 84.86%, token usage: 0.16, #running-req: 245, #queue-req: 1

Checklist

  • Format your code according to the Contributor Guide.
  • Add unit tests as outlined in the Contributor Guide.
  • Update documentation as needed, including docstrings or example tutorials.

@zhyncs (Member) reviewed the changes and left a comment on the following diff:

@@ -627,6 +627,61 @@ def sample_random_requests(
return input_requests


def gen_prompt(tokenizer, token_num):
    """Generate a random prompt of specified token length using tokenizer vocabulary."""
    all_available_tokens = list(tokenizer.get_vocab().values())

Nit: Using tokenizer.get_vocab().values() may include special tokens, which could cause issues with the generated text.
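
One way this nit could be addressed (a sketch assuming a Hugging Face-style tokenizer that exposes all_special_ids; the function name below is hypothetical and not code from this PR) is to drop the special token IDs before sampling:

import random

def gen_prompt_without_special_tokens(tokenizer, token_num):
    """Variant of gen_prompt (name is hypothetical) that samples only from
    non-special vocabulary entries of a Hugging Face-style tokenizer."""
    special_ids = set(tokenizer.all_special_ids)
    candidate_tokens = [
        tok_id
        for tok_id in tokenizer.get_vocab().values()
        if tok_id not in special_ids
    ]
    selected_tokens = random.choices(candidate_tokens, k=token_num)
    return tokenizer.decode(selected_tokens)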

@zhyncs (Member) commented on Nov 11, 2024:

The new benchmark dataset is very interesting. I think it can be merged soon, and perhaps Yichuan @yichuan520030910320 will also be interested.

@zhyncs (Member) commented on Nov 11, 2024:

This change does not affect other modules, so there is no need to run the full perf/accuracy/unit-test suite.

@zhyncs merged commit 8169c6f into sgl-project:main on Nov 11, 2024 (1 of 12 checks passed).
@ByronHsu (Collaborator, Author) commented:

Thanks @zhyncs! Note that the Total input tokens count may fluctuate a bit because len(tokenize(prompt + question)) != len(tokenize(prompt)) + len(tokenize(question)).
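
A minimal illustration of that non-additivity (the model name and strings below are arbitrary examples, not taken from the benchmark):

from transformers import AutoTokenizer

# Any subword tokenizer works for the illustration; "gpt2" is just an example.
tokenizer = AutoTokenizer.from_pretrained("gpt2")

system_prompt = "You are a helpful assistant."
question = "What is the capital of France?"

joined = tokenizer.encode(system_prompt + " " + question)
separate = tokenizer.encode(system_prompt) + tokenizer.encode(" " + question)

# Subword merges across the boundary (and any special-token handling) mean the
# two counts are not guaranteed to match.
print(len(joined), len(separate))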
