[Core] Deserialize tensorflow MultilineMessageKeyError #50138

@Oblynx

What happened + What you expected to happen

What happened

Ray throws an opaque error in place of the actual one. This is the error from Ray:

  File "/usr/local/lib/python3.11/dist-packages/ray/exceptions.py", line 45, in from_bytes
    return RayError.from_ray_exception(ray_exception)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/ray/exceptions.py", line 54, in from_ray_exception
    raise RuntimeError(msg) from e
RuntimeError: Failed to unpickle serialized exception

What should have happened

Digging in with the Ray debugger, I can catch the original exception before Ray tries to deserialize it. My expectations:

  1. I expect Ray to deserialize it.
  2. If Ray can't deserialize it, I expect it to still dump it in some way in the logs -- otherwise I have no means of fixing the issue!

The exception is of type tensorflow.python.autograph.pyct.error_utils.MultilineMessageKeyError:

Traceback (most recent call last):
  File "python/ray/_raylet.pyx", line 1974, in ray._raylet.execute_task
  File "python/ray/_raylet.pyx", line 1879, in ray._raylet.execute_task
  File "python/ray/_raylet.pyx", line 1820, in ray._raylet.execute_task.function_executor
  File "/usr/local/lib/python3.11/dist-packages/ray/_private/function_manager.py", line 696, in actor_method_executor
    return method(__ray_actor, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/ray/util/tracing/tracing_helper.py", line 467, in _resume_span
    return method(self, *_args, **_kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  <user frames>
  File "/usr/local/lib/python3.11/dist-packages/tensorflow/python/util/traceback_utils.py", line 153, in error_handler
    raise e.with_traceback(filtered_tb) from None
  File "/tmp/__autograph_generated_filey6_d4pwe.py", line 76, in tf__compute_gradients
    ag__.if_stmt(ag__.ld(is_design_module), if_body_1, else_body_1, get_state_1, set_state_1, ('loss_value', 'preds'), 2)
  File "/tmp/__autograph_generated_filey6_d4pwe.py", line 58, in if_body_1
    loss_value = ag__.converted_call(ag__.ld(loss), (), dict(preds=ag__.ld(preds), inputs=ag__.ld(full_sample)), fscope)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/tmp/__autograph_generated_fileg_9kc5kn.py", line 31, in tf____call__
    raise
  File "/tmp/__autograph_generated_file6ahn6zy_.py", line 16, in tf__call
    raise
tensorflow.python.autograph.pyct.error_utils.MultilineMessageKeyError: in user code:
  <user error>

Proposed fix

  1. When deserialization fails, dump a base64-encoded copy of the pickled error to the logs so it can be recovered and inspected.
  2. Why don't we use cloudpickle to deserialize more errors?
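
A minimal sketch of what the first proposal could look like, using only the standard library. The helper name `load_exception_or_dump` and the logger are hypothetical and are not part of Ray; the real code path in `ray/exceptions.py` also unwraps the exception from Ray's own message format first, which is omitted here.

```python
import base64
import logging
import pickle

logger = logging.getLogger(__name__)


def load_exception_or_dump(payload: bytes) -> BaseException:
    """Hypothetical helper (not Ray's API): unpickle an exception, and if
    that fails, keep the raw payload recoverable as base64."""
    try:
        return pickle.loads(payload)
    except Exception as e:
        # Base64 keeps the binary pickle safe to copy out of text logs.
        encoded = base64.b64encode(payload).decode("ascii")
        logger.error(
            "Failed to unpickle serialized exception; base64 payload: %s",
            encoded,
        )
        raise RuntimeError(
            f"Failed to unpickle serialized exception (base64 payload: {encoded})"
        ) from e
```

With something like this in place, the logged string could be decoded offline with `pickle.loads(base64.b64decode(encoded))` in an environment where the exception class is importable, or inspected with `pickletools.dis` without executing it.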

Similar issues

Versions / Dependencies

  • Ray 2.40.0
  • Python 3.11.11

Reproduction script

Not sure how to produce an unserializable exception without TensorFlow.
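
One possible TensorFlow-free reproduction, as a sketch under an assumption: that the failure comes from an exception whose `__init__` requires more arguments than it passes on to `Exception.__init__`. Such an exception pickles fine but fails on unpickling, because pickle rebuilds it by calling the class with `self.args` only. The class `TwoArgError` below is a hypothetical stand-in; whether `MultilineMessageKeyError` fails for exactly this reason has not been verified here.

```python
import pickle


class TwoArgError(KeyError):
    """Hypothetical stand-in: __init__ takes two arguments but only one
    of them ends up in self.args."""

    def __init__(self, message, original_key):
        super().__init__(original_key)  # self.args == (original_key,)
        self.message = message


err = TwoArgError("in user code: ...", "some_key")

# Serialization succeeds: pickle records the class, err.args and err.__dict__.
payload = pickle.dumps(err)

failure = None
try:
    # Deserialization calls TwoArgError(*args) with a single argument,
    # so the required second argument is missing.
    pickle.loads(payload)
except TypeError as e:
    failure = e

print(type(failure).__name__, failure)
```

Raising an exception like this from a Ray task or actor method would presumably hit the same "Failed to unpickle serialized exception" path, though that step is not shown here.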

Issue Severity

Medium: It is a significant difficulty but I can work around it.

Metadata

Assignees

No one assigned

    Labels

      • P1: Issue that should be fixed within a few weeks
      • bug: Something that is supposed to be working; but isn't
      • community-backlog
      • core: Issues that should be addressed in Ray Core
