
tf Softmax input dimension error #11

@AlfredMoore


System version: Gcloud debian 11
Cpu: C3 8vCPU
Memory: 64 GB

Software version 1:

Python 3.9.2
numpy                         1.26.0
tensorflow                    2.14.0
open3d                        0.17.0
opencv-python-headless        4.8.1.78

Traceback:

Traceback (most recent call last):
  File "/home/47800/SocialNavigation_v2/human-scene-transformer/human_scene_transformer/train.py", line 141, in <module>
    app.run(main)
  File "/home/47800/.local/lib/python3.9/site-packages/absl/app.py", line 308, in run
    _run_main(main, args)
  File "/home/47800/.local/lib/python3.9/site-packages/absl/app.py", line 254, in _run_main
    sys.exit(main(argv))
  File "/home/47800/SocialNavigation_v2/human-scene-transformer/human_scene_transformer/train.py", line 124, in main
    train_model.train_model(
  File "/home/47800/SocialNavigation_v2/human-scene-transformer/human_scene_transformer/train_model.py", line 252, in train_model
    train_step(train_iter)
  File "/home/47800/.local/lib/python3.9/site-packages/tensorflow/python/util/traceback_utils.py", line 153, in error_handler
    raise e.with_traceback(filtered_tb) from None
  File "/home/47800/.local/lib/python3.9/site-packages/tensorflow/python/eager/execute.py", line 60, in quick_execute
    tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
tensorflow.python.framework.errors_impl.AbortedError: Graph execution error:

Detected at node while/body/_1/while/human_trajectory_scene_transformer/feature_attn_agent_encoder_learned_layer/multi_head_attention/softmax/Softmax defined at (most recent call last):
<stack traces unavailable>
Input dims must be <= 5 and >=1
         [[{{node while/body/_1/while/human_trajectory_scene_transformer/feature_attn_agent_encoder_learned_layer/multi_head_attention/softmax/Softmax}}]] [Op:__inference_train_step_67952]

This error occurs when I run 'train.py'. I suspect it is caused by the data loading or by the preprocessed data itself.
I am still trying to fix it.
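One thing that may be worth ruling out, separately from the data pipeline: the message "Input dims must be <= 5 and >=1" says the Softmax kernel rejected the rank of its input, and the run is on CPU. The sketch below assumes (this is not confirmed for this setup) that the check comes from TensorFlow's oneDNN-optimized CPU kernels, which can be switched off with the `TF_ENABLE_ONEDNN_OPTS` environment variable. It only helps if that assumption holds.

```python
import os

# Assumption: the Softmax rank check is raised by the oneDNN CPU kernel.
# Setting this variable to "0" asks TensorFlow to use its default kernels.
os.environ["TF_ENABLE_ONEDNN_OPTS"] = "0"

# The variable must be set before TensorFlow is imported, for example at the
# very top of train.py:
# import tensorflow as tf
```

The same effect can be had by exporting the variable in the shell before launching 'train.py'. If the error persists with oneDNN disabled, the input shapes produced by the data pipeline remain the more likely cause.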

Software version 2:

Python == 3.8
tensorflow == 2.13

With these versions, the behavior changes:

WARNING:tensorflow:From /home/47800/SocialNavigation_v2/human-scene-transformer/human_scene_transformer/jrdb/input_fn.py:555: load (from tensorflow.python.data.experimental.ops.io) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.data.Dataset.load(...)` instead.
W1016 02:08:06.098888 140408121303680 deprecation.py:364] From /home/47800/SocialNavigation_v2/human-scene-transformer/human_scene_transformer/jrdb/input_fn.py:555: load (from tensorflow.python.data.experimental.ops.io) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.data.Dataset.load(...)` instead.
[libprotobuf ERROR external/com_google_protobuf/src/google/protobuf/text_format.cc:337] Error parsing text-format tensorflow.data.experimental.DistributedSnapshotMetadata: 1:1: Invalid control characters encountered in text.
[libprotobuf ERROR external/com_google_protobuf/src/google/protobuf/text_format.cc:337] Error parsing text-format tensorflow.data.experimental.DistributedSnapshotMetadata: 1:3: Expected identifier, got: 4645555079012573616
[libprotobuf ERROR external/com_google_protobuf/src/google/protobuf/text_format.cc:337] Error parsing text-format tensorflow.data.experimental.DistributedSnapshotMetadata: 1:1: Invalid control characters encountered in text.
......
......
[libprotobuf ERROR external/com_google_protobuf/src/google/protobuf/text_format.cc:337] Error parsing text-format tensorflow.data.experimental.DistributedSnapshotMetadata: 1:3: Expected identifier, got: 6459351093901513222
I1016 02:08:23.444825 140408121303680 train_model.py:151] Model created on device.
2023-10-16 02:08:23.611301: W tensorflow/core/framework/dataset.cc:956] Input of GeneratorDatasetOp::Dataset will not be optimized because the dataset does not implement the AsGraphDefInternal() method needed to apply optimizations.
I1016 02:08:24.023096 140408121303680 train_model.py:245] Beginning training.
Traceback (most recent call last):
  File "train.py", line 141, in <module>
    app.run(main)
  File "/home/47800/miniconda3/envs/hstpy38/lib/python3.8/site-packages/absl/app.py", line 308, in run
    _run_main(main, args)
  File "/home/47800/miniconda3/envs/hstpy38/lib/python3.8/site-packages/absl/app.py", line 254, in _run_main
    sys.exit(main(argv))
  File "train.py", line 124, in main
    train_model.train_model(
  File "/home/47800/SocialNavigation_v2/human-scene-transformer/human_scene_transformer/train_model.py", line 252, in train_model
    train_step(train_iter)
  File "/home/47800/miniconda3/envs/hstpy38/lib/python3.8/site-packages/tensorflow/python/util/traceback_utils.py", line 153, in error_handler
    raise e.with_traceback(filtered_tb) from None
  File "/tmp/__autograph_generated_fileq49vjt2f.py", line 93, in tf__train_step
    ag__.for_stmt(ag__.converted_call(ag__.ld(tf).range, (ag__.converted_call(ag__.ld(tf).constant, (ag__.ld(train_params).batches_per_train_step,), None, fscope),), None, fscope), None, loop_body_1, get_state_4, set_state_4, (), {'iterate_names': '_'})
  File "/tmp/__autograph_generated_fileq49vjt2f.py", line 91, in loop_body_1
    ag__.converted_call(ag__.ld(strategy).run, (ag__.ld(step_fn),), dict(args=(ag__.converted_call(ag__.ld(next), (ag__.ld(iterator),), None, fscope),), options=ag__.converted_call(ag__.ld(tf).distribute.RunOptions, (), dict(experimental_enable_dynamic_batch_size=False), fscope)), fscope)
  File "/tmp/__autograph_generated_fileq49vjt2f.py", line 21, in step_fn
    loss_dict = ag__.converted_call(ag__.ld(loss_obj), (ag__.ld(output_batch), ag__.ld(predictions)), None, fscope_1)
  File "/tmp/__autograph_generated_file4bviil8d.py", line 12, in tf____call__
    retval_ = ag__.converted_call(ag__.ld(self).call, (ag__.ld(input_batch), ag__.ld(predictions)), None, fscope)
  File "/tmp/__autograph_generated_filevzzr7lrl.py", line 13, in tf__call
    loss_dict = (ag__.ld(position_loss) | ag__.ld(mixture_loss))
TypeError: in user code:

    File "/home/47800/SocialNavigation_v2/human-scene-transformer/human_scene_transformer/train_model.py", line 184, in step_fn  *
        loss_dict = loss_obj(output_batch, predictions)
    File "/home/47800/SocialNavigation_v2/human-scene-transformer/human_scene_transformer/losses.py", line 37, in __call__  *
        return self.call(input_batch, predictions)
    File "/home/47800/SocialNavigation_v2/human-scene-transformer/human_scene_transformer/losses.py", line 456, in call  *
        loss_dict = position_loss | mixture_loss

    TypeError: unsupported operand type(s) for |: 'dict' and 'dict'
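This second error is a Python version issue rather than a data problem: the dict union operator `|` was added in Python 3.9 (PEP 584), so `position_loss | mixture_loss` fails under Python 3.8. A minimal sketch of a 3.8-compatible merge follows; the helper name `merge_loss_dicts` and the example keys and values are hypothetical, not taken from `losses.py`.

```python
def merge_loss_dicts(position_loss, mixture_loss):
    """Merge two dicts without the `|` operator (works on Python 3.5+).

    On duplicate keys the right-hand dict wins, matching `a | b` on 3.9+.
    """
    return {**position_loss, **mixture_loss}


# Hypothetical placeholder values, only to show the merge behaviour.
position_loss = {"position_loss": 0.5}
mixture_loss = {"mixture_loss": 0.25}
loss_dict = merge_loss_dicts(position_loss, mixture_loss)
```

Alternatively, running under Python 3.9 or newer avoids the need to change that line at all.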
