
[Question] Is vLLMRollout.generate_sequences the right place to implement tool calling? #176

@accupham

Description


Hi, I am trying to understand the code. I would like to try RL training on tool calling in an interactive environment.

As I understand it, the reward is calculated by some custom reward function for a particular dataset. In other words, the flow of data during PPO is like this:

```mermaid
graph TD
    DatasetExample --> InferenceRollout --> RewardFunction --> UpdateGradients
```
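For example, the reward function here is conceptually just a per-sample scorer (illustrative sketch only; `extract_answer` is a hypothetical helper, not verl's actual interface):

```python
def reward_fn(prompt: str, response: str, ground_truth: str) -> float:
    # Illustrative only: score one rollout against the dataset's label.
    return 1.0 if extract_answer(response) == ground_truth else 0.0
```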

But the rollout's inference step here is a one-shot input/output function. If online tool calling were desired, we'd have to hook the llm.generate call here, right?

https://github.com/volcengine/verl/blob/main/verl/workers/rollout/vllm_rollout/vllm_rollout.py#L181
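Roughly, this is the kind of multi-turn loop I have in mind (a sketch only: `parse_tool_call`, `call_tool`, and the `</tool_call>` stop-string convention are made up; only `LLM.generate` and `SamplingParams` are real vLLM APIs):

```python
from vllm import LLM, SamplingParams

def generate_with_tools(llm: LLM, prompt: str, max_turns: int = 4) -> str:
    # Stop whenever the model emits a (hypothetical) tool-call closing tag.
    params = SamplingParams(max_tokens=512, stop=["</tool_call>"])
    transcript = prompt
    for _ in range(max_turns):
        completion = llm.generate([transcript], params)[0].outputs[0]
        transcript += completion.text
        if completion.stop_reason != "</tool_call>":
            break  # model finished without requesting a tool
        # Execute the requested tool and splice its result back in.
        result = call_tool(parse_tool_call(completion.text))  # hypothetical
        transcript += f"</tool_call>\n<tool_result>{result}</tool_result>\n"
    return transcript
```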

A loop like that would let us inject function calling. But I'm confused, because the inference engine is not an ordinary vLLM LLM class; it's a subclass that monkey-patches the output to return tensors instead of the normal vLLM output format.

So what would be the best way to add dynamic function calling? Hook the generate method of vLLM's LLM class, then call LLM._post_process_output to convert the token IDs and logprobs from vLLM into torch tensors at the very end?
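In other words, something like this wrapper (a rough sketch: `PatchedLLM` stands in for verl's monkey-patched subclass and `tool_calling_loop` for a driver like the loop above; only `_post_process_output` comes from the linked file):

```python
class ToolCallingLLM(PatchedLLM):  # placeholder name for verl's subclass
    def generate(self, prompts, sampling_params, **kwargs):
        # Run the multi-turn tool loop over raw vLLM outputs...
        raw_outputs = tool_calling_loop(self, prompts, sampling_params)
        # ...and convert token IDs/logprobs to torch tensors only once,
        # at the very end.
        return self._post_process_output(raw_outputs)
```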

Or is there a more obvious place to add this feature?
