
@cermeng (Contributor) commented Nov 23, 2024

Motivation

In a real deployment, a load balancer may sit in front of the server to control request traffic, because the server itself cannot reject requests. This PR lets the benchmark simulate that setup by capping the number of concurrent requests. The code is borrowed from vllm-project/vllm#9390.

Modifications

  • Add a --max-concurrency option to bench_serving.py.
  • Ensure that no more than max-concurrency requests are in flight to the server at any one time.
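A common way to enforce a cap like this in an asyncio-based benchmark client is an `asyncio.Semaphore` around each request. The sketch below illustrates that technique only; it is not the code merged in this PR. `send_request`, `run_benchmark`, and `--num-prompts` are hypothetical names, and the "request" is simulated with `asyncio.sleep` instead of a real HTTP call. The stats dict records the peak number of in-flight requests so the cap can be checked.

```python
import argparse
import asyncio
from typing import Optional


async def send_request(i: int, stats: dict) -> int:
    # Hypothetical stand-in for one HTTP request to the server.
    stats["in_flight"] += 1
    stats["peak"] = max(stats["peak"], stats["in_flight"])
    await asyncio.sleep(0.01)  # simulate server latency
    stats["in_flight"] -= 1
    return i


async def run_benchmark(num_requests: int, max_concurrency: Optional[int]) -> dict:
    # None means "no limit", i.e. the behaviour without the flag.
    semaphore = asyncio.Semaphore(max_concurrency) if max_concurrency is not None else None
    stats = {"in_flight": 0, "peak": 0}

    async def limited_request(i: int) -> int:
        if semaphore is None:
            return await send_request(i, stats)
        # At most max_concurrency coroutines can hold the semaphore at once;
        # the rest wait here, as if queued behind a load balancer.
        async with semaphore:
            return await send_request(i, stats)

    results = await asyncio.gather(*(limited_request(i) for i in range(num_requests)))
    stats["completed"] = len(results)
    return stats


def main(argv=None) -> dict:
    parser = argparse.ArgumentParser()
    parser.add_argument("--num-prompts", type=int, default=20)
    parser.add_argument(
        "--max-concurrency",
        type=int,
        default=None,
        help="Maximum number of requests in flight at once (default: unlimited).",
    )
    args = parser.parse_args(argv)
    return asyncio.run(run_benchmark(args.num_prompts, args.max_concurrency))


if __name__ == "__main__":
    print(main(["--num-prompts", "20", "--max-concurrency", "4"]))
```

With the cap set to 4, all 20 simulated requests still complete, but the recorded peak never exceeds 4. Without the flag, every request is dispatched at once. Note that a value of 0 would block forever in this sketch, so a real option would likely want to validate it.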

Checklist

  • Format your code according to the Contributor Guide.
  • Add unit tests as outlined in the Contributor Guide.
  • Update documentation as needed, including docstrings or example tutorials.

@zhyncs requested a review from @merrymercy on Nov 23, 2024 08:38
@zhyncs merged commit 60769be into sgl-project:main on Nov 23, 2024
@zhyncs (Member) commented Nov 23, 2024

I'll fix the SimpleNamespace issue. cc @merrymercy @cermeng

@zhyncs mentioned this pull request on Nov 23, 2024
timethink pushed a commit to timethink/sglang that referenced this pull request Mar 9, 2025