This issue is a follow-up task from https://github.com/volcengine/verl/pull/1698. Performance is poor because every request needs to call `broadcast_pyobj` together with `async_generate`. We need to find a way to call `async_generate` without `broadcast_pyobj`.
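To make the cost pattern concrete, here is a minimal, self-contained sketch. It uses hypothetical stand-ins (`fake_broadcast_pyobj`, `fake_async_generate`, `Counters`, `per_request`, `batched`); these are not the real verl or SGLang APIs. It counts how many synchronizing broadcasts happen when each request pays for its own broadcast, and compares that with one possible direction in which a batch of requests is broadcast once and then generated concurrently. This direction only amortizes the broadcast and does not remove it. It also assumes requests can be grouped into a batch, whereas a real async rollout may receive them independently.

```python
import asyncio


class Counters:
    """Tracks how many (simulated) broadcasts were issued."""

    def __init__(self):
        self.broadcasts = 0


async def fake_broadcast_pyobj(obj, counters):
    # Hypothetical stand-in for a cross-rank broadcast; the real call is a
    # collective that every rank must join, which is what makes it costly.
    counters.broadcasts += 1
    await asyncio.sleep(0)
    return obj


async def fake_async_generate(prompt):
    # Hypothetical stand-in for the engine's async generation call.
    await asyncio.sleep(0)
    return f"out:{prompt}"


async def per_request(prompts, counters):
    # Pattern described in the issue: one broadcast per request.
    results = []
    for p in prompts:
        p = await fake_broadcast_pyobj(p, counters)
        results.append(await fake_async_generate(p))
    return results


async def batched(prompts, counters):
    # Hypothetical alternative: broadcast the whole batch once, then issue
    # the generate calls concurrently with no further broadcasts.
    batch = await fake_broadcast_pyobj(list(prompts), counters)
    return list(await asyncio.gather(*(fake_async_generate(p) for p in batch)))
```

Running both on the same four prompts gives identical outputs (`asyncio.gather` preserves input order), but `per_request` issues four broadcasts while `batched` issues one.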