tf Softmax input dimension error

System version: Google Cloud, Debian 11
CPU: C3, 8 vCPUs
Memory: 64 GB

Software version 1:
------
```bash
Python 3.9.2
numpy                         1.26.0
tensorflow                    2.14.0
open3d                        0.17.0
opencv-python-headless        4.8.1.78
```
Traceback:
```bash
Traceback (most recent call last):
  File "/home/47800/SocialNavigation_v2/human-scene-transformer/human_scene_transformer/train.py", line 141, in <module>
    app.run(main)
  File "/home/47800/.local/lib/python3.9/site-packages/absl/app.py", line 308, in run
    _run_main(main, args)
  File "/home/47800/.local/lib/python3.9/site-packages/absl/app.py", line 254, in _run_main
    sys.exit(main(argv))
  File "/home/47800/SocialNavigation_v2/human-scene-transformer/human_scene_transformer/train.py", line 124, in main
    train_model.train_model(
  File "/home/47800/SocialNavigation_v2/human-scene-transformer/human_scene_transformer/train_model.py", line 252, in train_model
    train_step(train_iter)
  File "/home/47800/.local/lib/python3.9/site-packages/tensorflow/python/util/traceback_utils.py", line 153, in error_handler
    raise e.with_traceback(filtered_tb) from None
  File "/home/47800/.local/lib/python3.9/site-packages/tensorflow/python/eager/execute.py", line 60, in quick_execute
    tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
tensorflow.python.framework.errors_impl.AbortedError: Graph execution error:

Detected at node while/body/_1/while/human_trajectory_scene_transformer/feature_attn_agent_encoder_learned_layer/multi_head_attention/softmax/Softmax defined at (most recent call last):
<stack traces unavailable>
Input dims must be <= 5 and >=1
         [[{{node while/body/_1/while/human_trajectory_scene_transformer/feature_attn_agent_encoder_learned_layer/multi_head_attention/softmax/Softmax}}]] [Op:__inference_train_step_67952]
```

This error occurs when I run `train.py`. I suspect it is caused by the data loading or by the preprocessed data itself, but I have not confirmed this.
I am still trying to fix it.
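
One thing that may be worth ruling out, as an unverified assumption: the message `Input dims must be <= 5 and >=1` appears to come from TensorFlow's oneDNN (MKL) CPU softmax kernel rather than from the data pipeline, which would mean the attention scores reaching the softmax have more than 5 dimensions. TensorFlow reads the `TF_ENABLE_ONEDNN_OPTS` environment variable when it is loaded, so a minimal sketch for turning the oneDNN kernels off looks like this (the helper name `disable_onednn_opts` is hypothetical, and whether this removes the error here is untested):

```python
import os


def disable_onednn_opts(env=os.environ):
    """Request the non-oneDNN CPU kernels (hypothetical helper).

    Must run before `import tensorflow`, because the variable is read
    when TensorFlow is first loaded.
    """
    env["TF_ENABLE_ONEDNN_OPTS"] = "0"
    return env["TF_ENABLE_ONEDNN_OPTS"]


flag = disable_onednn_opts()
# import tensorflow as tf  # import only after the flag is set
```

The same flag can be set from the shell with `TF_ENABLE_ONEDNN_OPTS=0 python train.py`.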


Software version 2:
------
```bash
Python == 3.8
tensorflow == 2.13
```
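With these versions the behavior changes: training now fails with a `TypeError` in `losses.py` instead of the Softmax error.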
```bash
WARNING:tensorflow:From /home/47800/SocialNavigation_v2/human-scene-transformer/human_scene_transformer/jrdb/input_fn.py:555: load (from tensorflow.python.data.experimental.ops.io) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.data.Dataset.load(...)` instead.
W1016 02:08:06.098888 140408121303680 deprecation.py:364] From /home/47800/SocialNavigation_v2/human-scene-transformer/human_scene_transformer/jrdb/input_fn.py:555: load (from tensorflow.python.data.experimental.ops.io) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.data.Dataset.load(...)` instead.
[libprotobuf ERROR external/com_google_protobuf/src/google/protobuf/text_format.cc:337] Error parsing text-format tensorflow.data.experimental.DistributedSnapshotMetadata: 1:1: Invalid control characters encountered in text.
[libprotobuf ERROR external/com_google_protobuf/src/google/protobuf/text_format.cc:337] Error parsing text-format tensorflow.data.experimental.DistributedSnapshotMetadata: 1:3: Expected identifier, got: 4645555079012573616
[libprotobuf ERROR external/com_google_protobuf/src/google/protobuf/text_format.cc:337] Error parsing text-format tensorflow.data.experimental.DistributedSnapshotMetadata: 1:1: Invalid control characters encountered in text.
......
......
[libprotobuf ERROR external/com_google_protobuf/src/google/protobuf/text_format.cc:337] Error parsing text-format tensorflow.data.experimental.DistributedSnapshotMetadata: 1:3: Expected identifier, got: 6459351093901513222
I1016 02:08:23.444825 140408121303680 train_model.py:151] Model created on device.
2023-10-16 02:08:23.611301: W tensorflow/core/framework/dataset.cc:956] Input of GeneratorDatasetOp::Dataset will not be optimized because the dataset does not implement the AsGraphDefInternal() method needed to apply optimizations.
I1016 02:08:24.023096 140408121303680 train_model.py:245] Beginning training.
Traceback (most recent call last):
  File "train.py", line 141, in <module>
    app.run(main)
  File "/home/47800/miniconda3/envs/hstpy38/lib/python3.8/site-packages/absl/app.py", line 308, in run
    _run_main(main, args)
  File "/home/47800/miniconda3/envs/hstpy38/lib/python3.8/site-packages/absl/app.py", line 254, in _run_main
    sys.exit(main(argv))
  File "train.py", line 124, in main
    train_model.train_model(
  File "/home/47800/SocialNavigation_v2/human-scene-transformer/human_scene_transformer/train_model.py", line 252, in train_model
    train_step(train_iter)
  File "/home/47800/miniconda3/envs/hstpy38/lib/python3.8/site-packages/tensorflow/python/util/traceback_utils.py", line 153, in error_handler
    raise e.with_traceback(filtered_tb) from None
  File "/tmp/__autograph_generated_fileq49vjt2f.py", line 93, in tf__train_step
    ag__.for_stmt(ag__.converted_call(ag__.ld(tf).range, (ag__.converted_call(ag__.ld(tf).constant, (ag__.ld(train_params).batches_per_train_step,), None, fscope),), None, fscope), None, loop_body_1, get_state_4, set_state_4, (), {'iterate_names': '_'})
  File "/tmp/__autograph_generated_fileq49vjt2f.py", line 91, in loop_body_1
    ag__.converted_call(ag__.ld(strategy).run, (ag__.ld(step_fn),), dict(args=(ag__.converted_call(ag__.ld(next), (ag__.ld(iterator),), None, fscope),), options=ag__.converted_call(ag__.ld(tf).distribute.RunOptions, (), dict(experimental_enable_dynamic_batch_size=False), fscope)), fscope)
  File "/tmp/__autograph_generated_fileq49vjt2f.py", line 21, in step_fn
    loss_dict = ag__.converted_call(ag__.ld(loss_obj), (ag__.ld(output_batch), ag__.ld(predictions)), None, fscope_1)
  File "/tmp/__autograph_generated_file4bviil8d.py", line 12, in tf____call__
    retval_ = ag__.converted_call(ag__.ld(self).call, (ag__.ld(input_batch), ag__.ld(predictions)), None, fscope)
  File "/tmp/__autograph_generated_filevzzr7lrl.py", line 13, in tf__call
    loss_dict = (ag__.ld(position_loss) | ag__.ld(mixture_loss))
TypeError: in user code:

    File "/home/47800/SocialNavigation_v2/human-scene-transformer/human_scene_transformer/train_model.py", line 184, in step_fn  *
        loss_dict = loss_obj(output_batch, predictions)
    File "/home/47800/SocialNavigation_v2/human-scene-transformer/human_scene_transformer/losses.py", line 37, in __call__  *
        return self.call(input_batch, predictions)
    File "/home/47800/SocialNavigation_v2/human-scene-transformer/human_scene_transformer/losses.py", line 456, in call  *
        loss_dict = position_loss | mixture_loss

    TypeError: unsupported operand type(s) for |: 'dict' and 'dict'

```
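
The `|` operator for merging two dicts was added in Python 3.9 (PEP 584), which is why `position_loss | mixture_loss` fails under Python 3.8. A minimal sketch of a merge that works on both versions, assuming both losses are plain dicts; the keys and values below are hypothetical placeholders, not the real contents of `losses.py`:

```python
def merge_loss_dicts(*loss_dicts):
    """Merge dicts left to right; later keys override earlier ones."""
    merged = {}
    for loss_dict in loss_dicts:
        merged.update(loss_dict)
    return merged


# Hypothetical placeholder values, for illustration only.
position_loss = {"position_loss": 1.5}
mixture_loss = {"mixture_loss": 0.25}

# Dict unpacking works on Python 3.5+ and matches `a | b` on 3.9+.
loss_dict = {**position_loss, **mixture_loss}
same_result = merge_loss_dicts(position_loss, mixture_loss)
```

Either form could replace the `|` expression on line 456 of `losses.py` if staying on Python 3.8 is required; otherwise using Python 3.9 or newer avoids the change.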
