Hi, I am trying to understand the code. I would like to try RL training on tool calling in an interactive environment.
As I understand it, the reward is calculated by some custom reward function for a particular dataset. In other words, the flow of data during PPO is like this:
```mermaid
graph TD
    DatasetExample --> InferenceRollout --> RewardFunction --> UpdateGradients
```
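For context, the kind of per-dataset reward function I have in mind looks roughly like this; the signature and the answer extraction are my own approximation, not necessarily verl's exact interface:

```python
# Rough sketch of a per-dataset reward function as I understand the flow;
# the signature and the answer extraction here are my own assumptions.
import re

def compute_score(solution_str: str, ground_truth: str) -> float:
    """Score a completed rollout against the dataset's ground truth."""
    # Placeholder extraction: take the last number in the model's output.
    matches = re.findall(r"-?\d+\.?\d*", solution_str)
    if not matches:
        return 0.0
    return 1.0 if matches[-1] == ground_truth else 0.0
```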
But the rollout inference step here is a one-shot input/output function. If online tool calling were desired, we'd have to hook the llm.generate function here, right?
https://github.com/volcengine/verl/blob/main/verl/workers/rollout/vllm_rollout/vllm_rollout.py#L181
Then we could inject function calling. But I'm confused because the inference engine is not an ordinary vLLM LLM class, but a subclass that monkey-patches the output to return tensors instead of the normal vLLM output format.
So what would be the best way to add dynamic function calling? Hook the generate method of vLLM's LLM class, then call LLM._post_process_output to convert the token_ids and logprobs from vLLM into torch tensors at the very end?
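Concretely, I'm imagining something along these lines, just as a rough sketch (parse_tool_call and run_tool are placeholders I made up, and I'm calling vLLM's plain LLM.generate here rather than verl's patched engine):

```python
# Rough sketch of the multi-turn loop I'm imagining. `parse_tool_call` and
# `run_tool` are placeholders, not existing verl/vLLM APIs.
import re
from typing import Optional

from vllm import LLM, SamplingParams

def parse_tool_call(text: str) -> Optional[str]:
    """Placeholder parser: look for a <tool_call>...</tool_call> span."""
    m = re.search(r"<tool_call>(.*?)</tool_call>", text, re.DOTALL)
    return m.group(1) if m else None

def run_tool(call: str) -> str:
    """Placeholder tool executor for the interactive environment."""
    return f"(result of {call})"

def multi_turn_generate(llm: LLM, prompt: str, params: SamplingParams,
                        max_turns: int = 4) -> str:
    """Generate, execute any requested tool, append its result, and continue
    until the model stops asking for tools or max_turns is reached."""
    conversation = prompt
    for _ in range(max_turns):
        output = llm.generate([conversation], params)[0]
        text = output.outputs[0].text
        conversation += text
        call = parse_tool_call(text)
        if call is None:  # no tool requested -> rollout is finished
            break
        conversation += f"\n<tool_result>{run_tool(call)}</tool_result>\n"
    # Only after the full trajectory is built would the token_ids / logprobs
    # be converted to torch tensors (e.g. via something like _post_process_output).
    return conversation
```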
Or is there a more obvious place to add this feature?