Closed
Description
Checklist
- 1. If the issue you raised is not a feature but a question, please raise a discussion at https://github.com/sgl-project/sglang/discussions/new/choose instead. Otherwise, it will be closed.
- 2. Please use English, otherwise it will be closed.
Motivation
In the current SGLang router implementation (written in Rust), we support:
- Regular routing strategies: cache-aware, random, and round-robin
- Prefill-decode (PD) disaggregated routing: random and power-of-two (Po2) based
Previously, incoming requests were deserialized from raw bytes into dictionaries (maps) to extract minimal fields (e.g., stream). However, with the addition of PD routing requirements, fields like bootstrap_port and bootstrap_room need to be injected into the request object. As a result, the router now deserializes the full request into a fully typed struct.
This shift raises performance concerns regarding deserialization overhead, especially under high QPS.
Goal
Evaluate and implement an optimized solution that balances:
- Performance overhead
- Code maintainability
- Flexibility for routing logic extensions
Task
- Benchmark and compare the following approaches:
  - Full deserialization of typed request objects
  - Partial deserialization (extract only required fields)
  - Byte-based routing (minimal/no deserialization)
- Profile latency and CPU cost in each scenario (especially under load)
- Propose and implement a best-practice design based on the findings, e.g., use partial deserialization for the fast path (stream detection, method detection) and fall back to full deserialization only when needed (e.g., bootstrap injection)
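To make the byte-based option concrete, here is a minimal std-only sketch of a fast path that looks for a top-level `stream` boolean without building a JSON tree. The function name `detect_stream` is hypothetical and not part of the router; the scanner assumes the body is well-formed JSON and returns `None` whenever it cannot decide, so the caller can fall back to full deserialization.

```rust
/// Hypothetical fast path: returns Some(true/false) if a top-level
/// `"stream"` key with a boolean literal is found, and None otherwise
/// (missing key, non-boolean value, or truncated input).
/// Assumes the body is otherwise well-formed JSON; it does not validate it.
fn detect_stream(body: &[u8]) -> Option<bool> {
    let mut depth = 0usize; // nesting level of objects and arrays
    let mut i = 0;
    while i < body.len() {
        match body[i] {
            b'"' => {
                // Scan to the closing quote, skipping escaped characters.
                let start = i + 1;
                let mut j = start;
                while j < body.len() && body[j] != b'"' {
                    if body[j] == b'\\' {
                        j += 1;
                    }
                    j += 1;
                }
                if j >= body.len() {
                    return None; // unterminated string
                }
                // Only a string at depth 1 that is followed by ':' is a top-level key.
                if depth == 1 && &body[start..j] == b"stream" {
                    let mut k = j + 1;
                    while k < body.len() && body[k].is_ascii_whitespace() {
                        k += 1;
                    }
                    if k < body.len() && body[k] == b':' {
                        k += 1;
                        while k < body.len() && body[k].is_ascii_whitespace() {
                            k += 1;
                        }
                        let rest = &body[k..];
                        if rest.starts_with(b"true") {
                            return Some(true);
                        }
                        if rest.starts_with(b"false") {
                            return Some(false);
                        }
                        return None; // present but not a boolean literal
                    }
                }
                i = j + 1;
            }
            b'{' | b'[' => {
                depth += 1;
                i += 1;
            }
            b'}' | b']' => {
                depth = depth.saturating_sub(1);
                i += 1;
            }
            _ => i += 1,
        }
    }
    None
}
```

A caller could treat `Some(flag)` as the fast path and run the existing typed deserialization only on `None`. Whether a single pass like this actually beats partial deserialization is exactly what the benchmark in this task would need to show.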
Related resources
sample bootstrap injection
```rust
fn inject_bootstrap_fields(
    &self,
    json: &mut serde_json::Value,
    prefill: &EngineInfo,
    batch_size: Option<usize>,
) -> Result<(), String> {
    let obj = json
        .as_object_mut()
        .ok_or("Request body is not a JSON object")?;

    // Generate bootstrap room
    let room_id = rand::random::<u64>();

    match batch_size {
        Some(n) => {
            // Batch format
            obj.insert(
                "bootstrap_host".to_string(),
                serde_json::json!(vec![prefill.url.as_str(); n]),
            );
            obj.insert(
                "bootstrap_port".to_string(),
                serde_json::json!(vec![prefill.bootstrap_port; n]),
            );
            obj.insert(
                "bootstrap_room".to_string(),
                serde_json::json!(vec![room_id; n]),
            );
        }
        None => {
            // Single format
            obj.insert(
                "bootstrap_host".to_string(),
                serde_json::json!(prefill.url.as_str()),
            );
            obj.insert(
                "bootstrap_port".to_string(),
                serde_json::json!(prefill.bootstrap_port),
            );
            obj.insert("bootstrap_room".to_string(), serde_json::json!(room_id));
        }
    }
    Ok(())
}
```
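For comparison with the `serde_json::Value` version above, one candidate worth benchmarking is injecting the bootstrap fields by splicing bytes into the raw body, which avoids deserializing the request at all. The sketch below uses only the standard library. The name `splice_fields` and its signature are hypothetical, and it assumes each value is already rendered as valid JSON text (quoted and escaped by the caller). Unlike `obj.insert`, it does not overwrite an existing key, so a body that already contains a `bootstrap_*` field would end up with duplicate keys.

```rust
/// Hypothetical helper: appends pre-rendered `"key":value` pairs to a JSON
/// object body just before its closing brace, without parsing the body.
/// Only the outermost braces are checked; the rest of the body is assumed
/// to be valid JSON.
fn splice_fields(body: &[u8], fields: &[(&str, String)]) -> Result<Vec<u8>, String> {
    // Locate the first and last non-whitespace bytes.
    let start = body
        .iter()
        .position(|b| !b.is_ascii_whitespace())
        .ok_or("Request body is empty")?;
    let end = body
        .iter()
        .rposition(|b| !b.is_ascii_whitespace())
        .ok_or("Request body is empty")?;
    if body[start] != b'{' || body[end] != b'}' {
        return Err("Request body is not a JSON object".to_string());
    }

    // In valid JSON, '{' directly before the final '}' means the object is empty,
    // so no separating comma is needed before the first injected field.
    let is_empty = body[..end]
        .iter()
        .rposition(|b| !b.is_ascii_whitespace())
        .map_or(false, |p| body[p] == b'{');

    let mut out = Vec::with_capacity(body.len() + 64);
    out.extend_from_slice(&body[..end]); // everything up to the final '}'
    for (n, (key, value)) in fields.iter().enumerate() {
        if n > 0 || !is_empty {
            out.push(b',');
        }
        out.push(b'"');
        out.extend_from_slice(key.as_bytes());
        out.extend_from_slice(b"\":");
        out.extend_from_slice(value.as_bytes());
    }
    out.push(b'}');
    Ok(out)
}
```

For the single-request format, a caller would pass three pairs such as `("bootstrap_room", room_id.to_string())`; for the batch format the values would be pre-rendered JSON arrays. The trade-off is maintainability: the typed or `Value`-based path validates and deduplicates fields for free, while the splice path relies on the caller to guarantee both.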
slin1237