
@chenhaiq (Collaborator) commented May 29, 2025

Checklist Before Starting

  • [done] Search for similar PR(s).

What does this PR do?

Fix a bug when registering an async method on an FSDP worker.

When an async method is used in an FSDP worker, calling it fails with:

>                       raise value.as_instanceof_cause()
E                       ray.exceptions.RayTaskError(TypeError): ray::WorkerDict.critic_sub() (pid=232160, ip=192.168.111.50, actor_id=ca29f2b51caa8e56243d6b8e01000000, repr=<verl.single_controller.ray.base.WorkerDict object at 0x7f8c50729270>)
E                         File "/usr/local/lib/python3.10/dist-packages/ray/cloudpickle/cloudpickle.py", line 1479, in dumps
E                           cp.dump(obj)
E                         File "/usr/local/lib/python3.10/dist-packages/ray/cloudpickle/cloudpickle.py", line 1245, in dump
E                           return super().dump(obj)
E                       TypeError: cannot pickle 'coroutine' object

/usr/local/lib/python3.10/dist-packages/ray/_private/worker.py:919: RayTaskError(TypeError)

You can reproduce this error in tests/ray_gpu/test_colocated_workers.py by using an async method.
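The failure can be reproduced without Ray. This is a minimal sketch that assumes the method generated on the parent is a plain synchronous forwarding wrapper. `CriticWorker` and `sync_forward` are hypothetical names, not verl code. Calling an `async def` method without awaiting it only creates a coroutine object, and serializing that object as a return value then fails. The standard `pickle` module raises the same `TypeError` that cloudpickle reports in the traceback above.

```python
import pickle


class CriticWorker:
    """Hypothetical stand-in for an FSDP worker that exposes an async method."""

    async def sub(self, x: int, y: int) -> int:
        return x - y


def sync_forward(worker, name, *args, **kwargs):
    # Synchronous forwarding wrapper: invoking an async method this way
    # only creates a coroutine object; nothing ever awaits it.
    return getattr(worker, name)(*args, **kwargs)


result = sync_forward(CriticWorker(), "sub", 5, 3)
result_type = type(result).__name__
print(result_type)  # coroutine

try:
    # Ray serialized the return value with cloudpickle (see the traceback);
    # the standard pickle module rejects coroutine objects the same way.
    pickle.dumps(result)
    error_message = None
except TypeError as exc:
    error_message = str(exc)
finally:
    result.close()  # close the never-awaited coroutine to avoid a RuntimeWarning

print(error_message)  # cannot pickle 'coroutine' object
```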

High-Level Design

Wrap the bound method in an async wrapper when the original method is a coroutine function.

Specific Changes

Changed _bind_workers_method_to_parent so that coroutine methods are bound to the parent through an async wrapper.
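Here is a minimal sketch of the binding idea. `bind_worker_methods`, `CriticWorker`, and the plain `WorkerDict` class below are hypothetical simplifications, not verl's actual `_bind_workers_method_to_parent` or `WorkerDict`. Only the `{key}_{name}` naming is taken from the source, where it mirrors `critic_sub` in the traceback. The sketch detects coroutine functions with `inspect.iscoroutinefunction` and forwards them through an `async def` wrapper that awaits the worker's method. Every other method keeps a synchronous wrapper.

```python
import asyncio
import inspect


def bind_worker_methods(parent_cls, key, worker_cls, method_names):
    """Hypothetical simplification: attach f"{key}_{name}" forwarders to parent_cls."""
    for name in method_names:
        original = getattr(worker_cls, name)

        # Default arguments freeze `name` and the async check for this iteration.
        def make_wrapper(name=name, is_async=inspect.iscoroutinefunction(original)):
            if is_async:
                # Async wrapper: await the worker's coroutine so callers get the result.
                async def async_func(self, *args, **kwargs):
                    return await getattr(self.worker_dict[key], name)(*args, **kwargs)

                return async_func

            # Sync wrapper: unchanged behavior for regular methods.
            def func(self, *args, **kwargs):
                return getattr(self.worker_dict[key], name)(*args, **kwargs)

            return func

        setattr(parent_cls, f"{key}_{name}", make_wrapper())


class CriticWorker:
    def add(self, x, y):
        return x + y

    async def sub(self, x, y):
        await asyncio.sleep(0)
        return x - y


class WorkerDict:
    """Hypothetical colocated parent holding one worker per role key."""

    def __init__(self):
        self.worker_dict = {"critic": CriticWorker()}


bind_worker_methods(WorkerDict, "critic", CriticWorker, ["add", "sub"])

parent = WorkerDict()
print(parent.critic_add(5, 3))  # 8
print(inspect.iscoroutinefunction(WorkerDict.critic_sub))  # True
print(asyncio.run(parent.critic_sub(5, 3)))  # 2
```

The generated `critic_sub` is itself a coroutine function, so an async Ray actor can await it like any other async method. The caller therefore receives the result, not a coroutine object that would have to be pickled.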

API

N/A

Usage Example

tests/ray_gpu/test_colocated_workers.py

Test

tests/ray_gpu/test_colocated_workers.py

Additional Info.

  • Issue Number: required by volcengine#1721

Checklist Before Submitting

  • [done] Read the Contribute Guide.
  • [done] Apply pre-commit checks.
  • [done] Add [BREAKING] to the PR title if it breaks any API.
  • [done] Update the documentation about your changes in the docs.
  • [done] Add CI test(s) if necessary.

@chenhaiq requested a review from wuxibin89 May 29, 2025 01:58
@wuxibin89 changed the title from "fix error when bind async method in create_colocated_worker" to "[ray] fix: error when bind async method in create_colocated_worker" May 29, 2025
@wuxibin89 merged commit bb4f97b into volcengine:main May 29, 2025
34 of 35 checks passed
@chenhaiq deleted the fix_ray_bind_async_method branch May 29, 2025 05:51
wwwjn pushed a commit to wwwjn/verl that referenced this pull request Jun 10, 2025
…olcengine#1745)

### Checklist Before Starting

- [done] Search for similar PR(s).

### What does this PR do?

Fix a bug when registering an async method on an FSDP worker.

When an async method is used in an FSDP worker, calling it fails with:
```
>                       raise value.as_instanceof_cause()
E                       ray.exceptions.RayTaskError(TypeError): ray::WorkerDict.critic_sub() (pid=232160, ip=192.168.111.50, actor_id=ca29f2b51caa8e56243d6b8e01000000, repr=<verl.single_controller.ray.base.WorkerDict object at 0x7f8c50729270>)
E                         File "/usr/local/lib/python3.10/dist-packages/ray/cloudpickle/cloudpickle.py", line 1479, in dumps
E                           cp.dump(obj)
E                         File "/usr/local/lib/python3.10/dist-packages/ray/cloudpickle/cloudpickle.py", line 1245, in dump
E                           return super().dump(obj)
E                       TypeError: cannot pickle 'coroutine' object

/usr/local/lib/python3.10/dist-packages/ray/_private/worker.py:919: RayTaskError(TypeError)
```

You can reproduce this error in `tests/ray_gpu/test_colocated_workers.py`
by using an async method.

### High-Level Design

Wrap the bound method in an async wrapper when the original method is a
coroutine function.

### Specific Changes

Changed `_bind_workers_method_to_parent` so that coroutine methods are
bound to the parent through an async wrapper.

### API

N/A

### Usage Example

tests/ray_gpu/test_colocated_workers.py


### Test

tests/ray_gpu/test_colocated_workers.py

### Additional Info.

- **Issue Number**: required by
volcengine#1721

### Checklist Before Submitting

- [done] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [done] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [done] Add `[BREAKING]` to the PR title if it breaks any API.
- [done] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [done] Add CI test(s) if necessary.