Closed
Labels
P1 (issue that should be fixed within a few weeks), bug (something that is supposed to be working, but isn't), community-backlog, core (issues that should be addressed in Ray Core)
Description
What happened + What you expected to happen
What happened
Ray throws an opaque error in place of the actual one. This is the error from Ray:
File "/usr/local/lib/python3.11/dist-packages/ray/exceptions.py", line 45, in from_bytes
return RayError.from_ray_exception(ray_exception)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/dist-packages/ray/exceptions.py", line 54, in from_ray_exception
raise RuntimeError(msg) from e
RuntimeError: Failed to unpickle serialized exception
What should have happened
- I expect Ray to deserialize the exception.
- If Ray can't deserialize it, I expect Ray to still dump it to the logs in some form; otherwise I have no way to diagnose the issue.
Using the Ray debugger, I caught the exception before Ray tried to deserialize it. It is of type tensorflow.python.autograph.pyct.error_utils.MultilineMessageKeyError:
Traceback (most recent call last):
File "python/ray/_raylet.pyx", line 1974, in ray._raylet.execute_task
File "python/ray/_raylet.pyx", line 1879, in ray._raylet.execute_task
File "python/ray/_raylet.pyx", line 1820, in ray._raylet.execute_task.function_executor
File "/usr/local/lib/python3.11/dist-packages/ray/_private/function_manager.py", line 696, in actor_method_executor
return method(__ray_actor, *args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/dist-packages/ray/util/tracing/tracing_helper.py", line 467, in _resume_span
return method(self, *_args, **_kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
<user frames>
File "/usr/local/lib/python3.11/dist-packages/tensorflow/python/util/traceback_utils.py", line 153, in error_handler
raise e.with_traceback(filtered_tb) from None
File "/tmp/__autograph_generated_filey6_d4pwe.py", line 76, in tf__compute_gradients
ag__.if_stmt(ag__.ld(is_design_module), if_body_1, else_body_1, get_state_1, set_state_1, ('loss_value', 'preds'), 2)
File "/tmp/__autograph_generated_filey6_d4pwe.py", line 58, in if_body_1
loss_value = ag__.converted_call(ag__.ld(loss), (), dict(preds=ag__.ld(preds), inputs=ag__.ld(full_sample)), fscope)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/tmp/__autograph_generated_fileg_9kc5kn.py", line 31, in tf____call__
raise
File "/tmp/__autograph_generated_file6ahn6zy_.py", line 16, in tf__call
raise
tensorflow.python.autograph.pyct.error_utils.MultilineMessageKeyError: in user code:
<user error>
Proposed fix
- When deserialization fails, dump a base64-encoded copy of the pickled exception to the logs
- Why don't we use cloudpickle to deserialize more errors?
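A minimal sketch of the first proposal, using only the standard library. The helper name `load_exception_or_dump` and the log message are hypothetical; this is not Ray's code, only an illustration of what a base64 fallback could look like.

```python
import base64
import logging
import pickle

logger = logging.getLogger(__name__)


def load_exception_or_dump(payload: bytes):
    """Hypothetical helper: unpickle the payload, or log it as base64 and re-raise."""
    try:
        return pickle.loads(payload)
    except Exception as err:
        # Keep the raw bytes recoverable so the user can inspect them offline.
        encoded = base64.b64encode(payload).decode("ascii")
        logger.error(
            "Failed to unpickle serialized exception; base64 payload: %s", encoded
        )
        raise RuntimeError("Failed to unpickle serialized exception") from err
```

With the payload in the logs, the user can `base64.b64decode` it and try to load it in an environment where the exception class is importable.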
Similar issues
- [Core] ray raises a "Failed to unpickle serialized exception" error when an OpenAI Authentication Error is raised in task #43428
- [Core] Please provide better message where 'RuntimeError: Failed to unpickle serialized exception' #49885
Versions / Dependencies
- Ray 2.40.0
- Python 3.11.11
Reproduction script
Not sure how to produce an unserializable exception without TensorFlow.
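One TensorFlow-free candidate, as a sketch: an exception whose `__init__` takes more arguments than it forwards to `Exception.__init__` pickles fine but fails on unpickling, because pickle reconstructs it by calling the class with `self.args`. The class `TwoArgError` is hypothetical, and whether the TensorFlow error above fails for the same reason, or whether Ray reports this case with the same message, is an assumption that has not been verified here.

```python
import pickle


class TwoArgError(Exception):
    """Hypothetical exception whose __init__ signature does not match self.args."""

    def __init__(self, message, code):
        # Only `message` is forwarded, so self.args == (message,).
        super().__init__(message)
        self.code = code


def roundtrip(exc):
    """Pickle and unpickle an exception; return the restored object or the failure."""
    payload = pickle.dumps(exc)  # pickling itself succeeds
    try:
        return pickle.loads(payload)
    except TypeError as err:
        # Unpickling calls TwoArgError("boom") and is missing the `code` argument.
        return err


result = roundtrip(TwoArgError("boom", 42))
print(type(result).__name__, result)
```

Raising such an exception inside a remote task or actor method would be the natural next step for a full reproduction.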
Issue Severity
Medium: It is a significant difficulty but I can work around it.