
Basic Tutorial: Adding a New LLM Inference/Serving Backend #21

@PeterSH6

Description

  1. Prerequisite: Make sure the LLM inference framework can be launched in SPMD style. For example, the offline inference script can be launched with `torchrun --standalone --nproc_per_node=8 offline_inference.py`.
  2. A Rollout class: Build an `xxx_rollout.py` script similar to `vllm_rollout.py`. In this file, define an `xxxRollout` class that inherits from `BaseRollout`.
    1. This class should expose a `generate_sequences` API that accepts a batch of `input_ids`, `response_masks`, and `position_ids` from the `DataProto` as input. The `self.inference_engine` (e.g., `LLMEngine` in vLLM) performs auto-regressive generation and outputs a batch of responses. These responses should then be concatenated with `input_ids`, and the `response_masks` and `position_ids` should be extended accordingly to cover the generated tokens.
  3. ShardingManager classes for weight synchronization with training frameworks: Create files named `fsdp_xxx.py` and `megatron_xxx.py`, similar to `fsdp_vllm.py` and `megatron_vllm.py`. These files should define `XXXShardingManager` classes (i.e., the HybridEngine) that handle weight resharding between the training and inference frameworks.
    1. In `megatron_vllm.py`, we define an `AllGatherPPModel` class to collect weights across the pipeline-parallel dimension. The parameters stored in the `memory_buffers` of `AllGatherPPModel` are used to synchronize the weights with the models in the vLLM rollout.
  4. Weight loading: It may be necessary to provide model-specific weight loaders for transferring weights between each pair of LLM inference and training backends. This is similar to the `dtensor_weight_loader.py` and `megatron_weight_loader.py` files used with vLLM.
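The rollout flow in step 2 can be sketched as follows. This is a minimal, self-contained illustration, not the real verl code: `DataProto` is reduced to a plain batch dict, and the inference engine is a hypothetical stand-in with a `generate` method; the actual interfaces carry more fields and configuration.

```python
from dataclasses import dataclass, field

import torch


@dataclass
class DataProto:
    """Simplified stand-in for verl's DataProto batch container."""
    batch: dict = field(default_factory=dict)


class BaseRollout:
    def generate_sequences(self, prompts: DataProto) -> DataProto:
        raise NotImplementedError


class XXXRollout(BaseRollout):
    """Sketch of a rollout wrapping a hypothetical inference engine."""

    def __init__(self, inference_engine, pad_token_id: int = 0):
        self.inference_engine = inference_engine
        self.pad_token_id = pad_token_id

    def generate_sequences(self, prompts: DataProto) -> DataProto:
        input_ids = prompts.batch["input_ids"]            # (bsz, prompt_len)
        attention_mask = prompts.batch["attention_mask"]  # (bsz, prompt_len)
        position_ids = prompts.batch["position_ids"]      # (bsz, prompt_len)

        # The engine performs auto-regressive generation and returns
        # a batch of response token ids, shape (bsz, resp_len).
        responses = self.inference_engine.generate(input_ids)

        # Concatenate prompts with responses.
        seq = torch.cat([input_ids, responses], dim=-1)

        # Extend the mask and position ids to cover the generated tokens.
        resp_len = responses.size(1)
        response_mask = (responses != self.pad_token_id).long()
        attention_mask = torch.cat([attention_mask, response_mask], dim=-1)
        delta = torch.arange(1, resp_len + 1, device=position_ids.device)
        resp_pos = position_ids[:, -1:] + delta.unsqueeze(0)
        position_ids = torch.cat([position_ids, resp_pos], dim=-1)

        return DataProto(batch={
            "input_ids": seq,
            "responses": responses,
            "attention_mask": attention_mask,
            "position_ids": position_ids,
        })
```

The key invariant is that after generation, `input_ids`, `attention_mask`, and `position_ids` all share the same sequence length, so the training side can consume the batch directly.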
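A sharding manager (step 3) is typically used as a context manager around rollout: on entry it gathers the training-side weights and loads them into the inference engine, and on exit it frees the inference-side copies so training can reclaim memory. A minimal sketch, assuming hypothetical `load_weights`/`free_weights` engine hooks (the real resharding logic is framework-specific):

```python
class XXXShardingManager:
    """Sketch of a sharding manager bridging a training framework and an
    inference engine. Interfaces here are assumptions for illustration."""

    def __init__(self, train_module, inference_engine):
        self.train_module = train_module
        self.inference_engine = inference_engine

    def __enter__(self):
        # Gather the (possibly sharded) training weights into full tensors,
        # then push them into the inference engine before rollout begins.
        state_dict = self.train_module.state_dict()
        self.inference_engine.load_weights(state_dict)
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # Release the inference-side weight copies after rollout.
        self.inference_engine.free_weights()
```

Wrapping rollout in `with XXXShardingManager(...):` keeps weight synchronization and cleanup symmetric even when generation raises an exception.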
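A per-model weight loader (step 4) mostly renames training-side parameter keys to the inference model's layout and copies tensors in place. Below is a sketch under an assumed, purely illustrative naming scheme (`transformer.` → `model.`); real loaders such as those in `dtensor_weight_loader.py` also handle fused, transposed, or sharded parameter layouts per architecture.

```python
import torch


def xxx_weight_loader(actor_state_dict, inference_model):
    """Copy training-side weights into an inference model.

    The name map below is a hypothetical example; each model architecture
    needs its own mapping between training and inference parameter names.
    """
    name_map = {"transformer.": "model."}
    params = dict(inference_model.named_parameters())
    for name, tensor in actor_state_dict.items():
        # Rewrite the training-side prefix to the inference-side one.
        for src, dst in name_map.items():
            if name.startswith(src):
                name = dst + name[len(src):]
        if name in params:
            # Copy in place without tracking gradients.
            with torch.no_grad():
                params[name].copy_(tensor)
```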
