
Using FP8 inference in long-response vLLM rollouts  #1803

@vadimkantorov

Description


Hi! In our experience, vLLM rollouts take about 70% of the GRPO iteration time (when performed in bf16).

Has anyone tried more aggressive precision reduction for rollouts (e.g. FP8) to speed them up? I wonder whether any reliable methods exist for online FP8 inference in this rollout phase.

Thanks!
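
For reference, vLLM does expose an online FP8 path that can be enabled per engine instance. Below is a minimal sketch, assuming a GPU with native FP8 compute (Hopper/Ada); the model name is a placeholder, and actual speedups will depend on hardware and response length. `quantization="fp8"` quantizes weights dynamically at load time, and `kv_cache_dtype="fp8"` halves the KV cache footprint, which should matter most for long responses.

```python
# Minimal sketch of enabling vLLM's online FP8 path for a rollout engine.
# The model name is a placeholder. FP8 matmuls require Hopper/Ada hardware;
# on older GPUs vLLM falls back to weight-only FP8 (memory savings, less speedup).
from vllm import LLM, SamplingParams

llm = LLM(
    model="meta-llama/Llama-3.1-8B-Instruct",  # placeholder policy model
    quantization="fp8",     # dynamic FP8 weight quantization at load time
    kv_cache_dtype="fp8",   # FP8 KV cache; biggest win for long responses
)

sampling = SamplingParams(temperature=1.0, top_p=1.0, max_tokens=4096)
outputs = llm.generate(["<rollout prompt>"], sampling)
print(outputs[0].outputs[0].text)
```

Whether the quantization noise is acceptable for GRPO rollouts (i.e. how far the FP8 sampling distribution drifts from the bf16 policy used for gradient updates) is exactly the open question here, so any experience reports would be appreciated.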
