User Simulation for Multi-Turn RL
We propose adding a user simulator to an RL framework to create realistic, varied user interactions for better training and testing.
Goals
Simulate real user behaviors (dialogue, feedback, navigation).
Provide a simple API for interaction steps and episode control.
Support different user types and random behaviors.
Act as the RL environment, returning observations and rewards to the agent.
Log interactions and performance data.
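One possible shape for the API these goals describe is sketched below. Every name here (`UserPersona`, `StepResult`, `UserSimulator`, `reset`, `step`) is hypothetical and not taken from any existing framework, and the user behavior is a placeholder. The sketch only illustrates episode control, user types, seeded randomness, and per-step logging.

```python
import random
from dataclasses import dataclass, field


@dataclass(frozen=True)
class UserPersona:
    """Hypothetical user type: controls session length and reply noise."""
    name: str
    max_turns: int   # the user ends the session after this many turns
    noise: float     # probability of an off-topic reply, in [0, 1]


@dataclass
class StepResult:
    observation: str  # the simulated user's reply
    reward: float
    done: bool


@dataclass
class UserSimulator:
    persona: UserPersona
    seed: int = 0
    log: list = field(default_factory=list)  # one dict per interaction step

    def __post_init__(self) -> None:
        self.reset()

    def reset(self) -> str:
        """Start a new episode and return the first user message."""
        self._rng = random.Random(self.seed)  # reseeding makes episodes repeatable
        self._turn = 0
        self.log.clear()
        return "hello"

    def step(self, agent_action: str) -> StepResult:
        """Advance one turn: take the agent's action, return the user's reply."""
        self._turn += 1
        off_topic = self._rng.random() < self.persona.noise
        reply = "never mind, something else" if off_topic else "ok, go on"
        done = self._turn >= self.persona.max_turns
        # Placeholder reward; a real simulator would score user satisfaction.
        result = StepResult(observation=reply, reward=0.0, done=done)
        self.log.append(
            {"turn": self._turn, "agent": agent_action, "user": reply, "done": done}
        )
        return result


if __name__ == "__main__":
    sim = UserSimulator(UserPersona("impatient", max_turns=3, noise=0.2), seed=42)
    obs, done = sim.reset(), False
    while not done:
        out = sim.step(f"reply to: {obs}")
        obs, done = out.observation, out.done
    print(f"logged {len(sim.log)} steps")
```

Keeping the persona as a separate, immutable object means different user types can be swapped in without touching the simulator, and a fixed seed gives the same episode on every run.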
Design
Use rule-based or learned models for user actions.
The RL agent acts → the simulator responds with a user action, a reward, and the next state → repeat until the episode is done.
Rewards reflect user satisfaction; states capture context and history.
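The loop above can be sketched with a small rule-based user. This is an illustration under stated assumptions, not a proposed implementation: `DialogueState`, `RuleBasedUser`, and `run_episode` are hypothetical names, the keyword-matching rule is a stand-in for a real user model, and the reward is defined here as the change in a satisfaction score.

```python
from dataclasses import dataclass, field


@dataclass
class DialogueState:
    goal: str                                    # what the simulated user wants
    history: list = field(default_factory=list)  # (agent_action, user_action) pairs
    satisfaction: float = 0.5                    # kept in [0, 1]


class RuleBasedUser:
    """Hypothetical rule: the user is happier when the agent mentions the goal."""

    def __init__(self, goal: str, max_turns: int = 5):
        self.goal = goal
        self.max_turns = max_turns

    def reset(self) -> DialogueState:
        self.state = DialogueState(goal=self.goal)
        return self.state

    def step(self, agent_action: str):
        s = self.state
        on_goal = self.goal in agent_action.lower()
        new_sat = min(1.0, max(0.0, s.satisfaction + (0.25 if on_goal else -0.25)))
        reward = new_sat - s.satisfaction  # reward = change in satisfaction
        s.satisfaction = new_sat

        if s.satisfaction >= 1.0:
            user_action = "thanks, that is what I wanted"
        elif on_goal:
            user_action = "tell me more"
        else:
            user_action = "that is not what I asked for"

        s.history.append((agent_action, user_action))
        # The episode ends when the user is fully satisfied, gives up, or times out.
        done = (
            s.satisfaction >= 1.0
            or s.satisfaction <= 0.0
            or len(s.history) >= self.max_turns
        )
        return user_action, reward, s, done


def run_episode(policy, user):
    """Agent acts, simulator responds, repeat until done."""
    state = user.reset()
    total, done = 0.0, False
    while not done:
        agent_action = policy(state)
        _user_action, reward, state, done = user.step(agent_action)
        total += reward
    return total, state


if __name__ == "__main__":
    helpful = lambda state: f"Here is a {state.goal} option"
    total, final = run_episode(helpful, RuleBasedUser(goal="pizza"))
    print(total, len(final.history))
```

A learned user model could replace `RuleBasedUser` as long as it keeps the same `reset` and `step` contract, so the loop in `run_episode` would not need to change.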
Applications
Train dialogue and recommendation systems.
Test interactive systems automatically.
Create safe, cost-effective RL training without real users.
Benefits
Faster, repeatable RL training.
Less reliance on real user data, protecting privacy.
More robust RL policies via diverse simulations.
Enables quick experiments and comparisons.