[RFC] User Simulation For Multi-Turn Rollout #4

@JiwenJ

User Simulation for Multi-Turn RL

We propose adding a user simulator to the RL framework so that multi-turn rollouts can be trained and tested against realistic, varied user interactions instead of live users.

Goals

Simulate realistic user behaviors (dialogue, feedback, navigation).

Provide a simple API for interaction steps and episode control.

Support different user types and random behaviors.

Plug in as the RL environment, returning observations and rewards to the agent at each turn.

Log interactions and performance data.
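To make the goals on user types, random behaviors, and logging more concrete, here is a minimal sketch. Every name in it (`UserProfile`, `PROFILES`, `EpisodeLog`, `sample_profile`) and every field is a hypothetical placeholder, not an existing API of any framework. It only illustrates one way seeded sampling could keep episodes varied but reproducible.

```python
import random
from dataclasses import dataclass, field


@dataclass(frozen=True)
class UserProfile:
    """Hypothetical user type: a name plus a few behavior knobs."""
    name: str
    patience: int      # max turns the user tolerates before giving up
    typo_rate: float   # probability that an utterance is noisy


# Illustrative profiles only; real ones might come from config or data.
PROFILES = [
    UserProfile("patient_novice", patience=8, typo_rate=0.2),
    UserProfile("impatient_expert", patience=3, typo_rate=0.0),
]


@dataclass
class EpisodeLog:
    """Collects per-turn records for later analysis."""
    profile: str
    turns: list = field(default_factory=list)

    def record(self, agent_msg: str, user_msg: str, reward: float) -> None:
        self.turns.append(
            {"agent": agent_msg, "user": user_msg, "reward": reward}
        )


def sample_profile(seed: int) -> UserProfile:
    """Pick a profile with a private RNG so the same seed gives the same user."""
    return random.Random(seed).choice(PROFILES)
```

Using a per-episode `random.Random(seed)` rather than the global RNG is one possible design choice: it lets a failed rollout be replayed with the same simulated user.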

Design

Use rule-based or learned models for user actions.

RL agent acts → simulator responds with user action, reward, and state → repeat until done.

Rewards reflect user satisfaction; states capture context and history.
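The interaction loop described above could look roughly like the following sketch. It assumes a rule-based simulator with a `reset`/`step` interface. The class `RuleBasedUser`, the `rollout` helper, the keyword-matching rule, and the reward values are all illustrative assumptions, not part of the proposal.

```python
from dataclasses import dataclass, field


@dataclass
class RuleBasedUser:
    """Hypothetical rule-based simulator: satisfied once the agent mentions the goal."""
    goal: str = "refund"
    max_turns: int = 5
    history: list = field(default_factory=list)

    def reset(self) -> str:
        """Start a new episode and return the user's opening message."""
        self.history = []
        return f"Hi, I need help with a {self.goal}."

    def step(self, agent_msg: str):
        """Consume one agent message; return (user_msg, reward, state, done)."""
        self.history.append(agent_msg)
        solved = self.goal in agent_msg.lower()
        done = solved or len(self.history) >= self.max_turns
        # Reward stands in for user satisfaction: bonus on success,
        # small penalty for every unhelpful turn (values are assumptions).
        reward = 1.0 if solved else -0.1
        user_msg = "Thanks, that works." if solved else "That does not help."
        # State captures context and history, as the design describes.
        state = {"turn": len(self.history), "history": list(self.history)}
        return user_msg, reward, state, done


def rollout(agent, user):
    """Agent acts, simulator responds, repeat until done."""
    obs = user.reset()
    total, done, state = 0.0, False, {}
    while not done:
        action = agent(obs)
        obs, reward, state, done = user.step(action)
        total += reward
    return total, state
```

For example, `rollout(lambda obs: "I have issued your refund.", RuleBasedUser())` ends after a single turn, while an agent that never mentions the goal runs until `max_turns` and accumulates the per-turn penalty. A learned user model could be swapped in behind the same `reset`/`step` interface.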

Applications

Train dialogue and recommendation systems.

Test interactive systems automatically.

Create safe, cost-effective RL training without real users.

Benefits

Faster, repeatable RL training.

Less reliance on real user data, protecting privacy.

More robust RL policies via diverse simulations.

Enables quick experiments and comparisons.

Metadata

Labels: enhancement (New feature or request)
