Closed
Description
System version: Gcloud debian 11
Cpu: C3 8vCPU
Memory: 64 GB
Software version 1:
Python 3.9.2
numpy 1.26.0
tensorflow 2.14.0
open3d 0.17.0
opencv-python-headless 4.8.1.78
Traceback:
Traceback (most recent call last):
File "/home/47800/SocialNavigation_v2/human-scene-transformer/human_scene_transformer/train.py", line 141, in <module>
app.run(main)
File "/home/47800/.local/lib/python3.9/site-packages/absl/app.py", line 308, in run
_run_main(main, args)
File "/home/47800/.local/lib/python3.9/site-packages/absl/app.py", line 254, in _run_main
sys.exit(main(argv))
File "/home/47800/SocialNavigation_v2/human-scene-transformer/human_scene_transformer/train.py", line 124, in main
train_model.train_model(
File "/home/47800/SocialNavigation_v2/human-scene-transformer/human_scene_transformer/train_model.py", line 252, in train_model
train_step(train_iter)
File "/home/47800/.local/lib/python3.9/site-packages/tensorflow/python/util/traceback_utils.py", line 153, in error_handler
raise e.with_traceback(filtered_tb) from None
File "/home/47800/.local/lib/python3.9/site-packages/tensorflow/python/eager/execute.py", line 60, in quick_execute
tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
tensorflow.python.framework.errors_impl.AbortedError: Graph execution error:
Detected at node while/body/_1/while/human_trajectory_scene_transformer/feature_attn_agent_encoder_learned_layer/multi_head_attention/softmax/Softmax defined at (most recent call last):
<stack traces unavailable>
Input dims must be <= 5 and >=1
[[{{node while/body/_1/while/human_trajectory_scene_transformer/feature_attn_agent_encoder_learned_layer/multi_head_attention/softmax/Softmax}}]] [Op:__inference_train_step_67952]
This happens when I run 'train.py'. I think it might be caused by the data loading or by the preprocessed data itself.
I am still trying to fix it.
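One thing that may be worth ruling out: the message "Input dims must be <= 5 and >=1" looks like it comes from the oneDNN-backed CPU softmax kernel rather than from the data pipeline, which would mean the attention logits reaching softmax have more than 5 dimensions. This is an assumption, not something confirmed for this repository. A minimal sketch of switching off the oneDNN custom ops via the `TF_ENABLE_ONEDNN_OPTS` environment variable, which has to be set before TensorFlow is imported:

```python
import os

# Assumption: the failing Softmax runs on the oneDNN CPU kernel, which
# rejects inputs with more than 5 dimensions. Disabling the oneDNN custom
# ops makes TensorFlow fall back to its default CPU kernels.
# The variable must be set before `import tensorflow` is executed.
os.environ["TF_ENABLE_ONEDNN_OPTS"] = "0"

# import tensorflow as tf  # import TensorFlow only after the line above

onednn_setting = os.environ["TF_ENABLE_ONEDNN_OPTS"]
```

The same effect can be had by exporting the variable in the shell before launching 'train.py'. If the error persists with oneDNN disabled, the cause is more likely elsewhere.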
Software version 2:
Python 3.8
tensorflow 2.13
With these versions the behavior changes:
WARNING:tensorflow:From /home/47800/SocialNavigation_v2/human-scene-transformer/human_scene_transformer/jrdb/input_fn.py:555: load (from tensorflow.python.data.experimental.ops.io) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.data.Dataset.load(...)` instead.
W1016 02:08:06.098888 140408121303680 deprecation.py:364] From /home/47800/SocialNavigation_v2/human-scene-transformer/human_scene_transformer/jrdb/input_fn.py:555: load (from tensorflow.python.data.experimental.ops.io) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.data.Dataset.load(...)` instead.
[libprotobuf ERROR external/com_google_protobuf/src/google/protobuf/text_format.cc:337] Error parsing text-format tensorflow.data.experimental.DistributedSnapshotMetadata: 1:1: Invalid control characters encountered in text.
[libprotobuf ERROR external/com_google_protobuf/src/google/protobuf/text_format.cc:337] Error parsing text-format tensorflow.data.experimental.DistributedSnapshotMetadata: 1:3: Expected identifier, got: 4645555079012573616
[libprotobuf ERROR external/com_google_protobuf/src/google/protobuf/text_format.cc:337] Error parsing text-format tensorflow.data.experimental.DistributedSnapshotMetadata: 1:1: Invalid control characters encountered in text.
......
......
[libprotobuf ERROR external/com_google_protobuf/src/google/protobuf/text_format.cc:337] Error parsing text-format tensorflow.data.experimental.DistributedSnapshotMetadata: 1:3: Expected identifier, got: 6459351093901513222
I1016 02:08:23.444825 140408121303680 train_model.py:151] Model created on device.
2023-10-16 02:08:23.611301: W tensorflow/core/framework/dataset.cc:956] Input of GeneratorDatasetOp::Dataset will not be optimized because the dataset does not implement the AsGraphDefInternal() method needed to apply optimizations.
I1016 02:08:24.023096 140408121303680 train_model.py:245] Beginning training.
Traceback (most recent call last):
File "train.py", line 141, in <module>
app.run(main)
File "/home/47800/miniconda3/envs/hstpy38/lib/python3.8/site-packages/absl/app.py", line 308, in run
_run_main(main, args)
File "/home/47800/miniconda3/envs/hstpy38/lib/python3.8/site-packages/absl/app.py", line 254, in _run_main
sys.exit(main(argv))
File "train.py", line 124, in main
train_model.train_model(
File "/home/47800/SocialNavigation_v2/human-scene-transformer/human_scene_transformer/train_model.py", line 252, in train_model
train_step(train_iter)
File "/home/47800/miniconda3/envs/hstpy38/lib/python3.8/site-packages/tensorflow/python/util/traceback_utils.py", line 153, in error_handler
raise e.with_traceback(filtered_tb) from None
File "/tmp/__autograph_generated_fileq49vjt2f.py", line 93, in tf__train_step
ag__.for_stmt(ag__.converted_call(ag__.ld(tf).range, (ag__.converted_call(ag__.ld(tf).constant, (ag__.ld(train_params).batches_per_train_step,), None, fscope),), None, fscope), None, loop_body_1, get_state_4, set_state_4, (), {'iterate_names': '_'})
File "/tmp/__autograph_generated_fileq49vjt2f.py", line 91, in loop_body_1
ag__.converted_call(ag__.ld(strategy).run, (ag__.ld(step_fn),), dict(args=(ag__.converted_call(ag__.ld(next), (ag__.ld(iterator),), None, fscope),), options=ag__.converted_call(ag__.ld(tf).distribute.RunOptions, (), dict(experimental_enable_dynamic_batch_size=False), fscope)), fscope)
File "/tmp/__autograph_generated_fileq49vjt2f.py", line 21, in step_fn
loss_dict = ag__.converted_call(ag__.ld(loss_obj), (ag__.ld(output_batch), ag__.ld(predictions)), None, fscope_1)
File "/tmp/__autograph_generated_file4bviil8d.py", line 12, in tf____call__
retval_ = ag__.converted_call(ag__.ld(self).call, (ag__.ld(input_batch), ag__.ld(predictions)), None, fscope)
File "/tmp/__autograph_generated_filevzzr7lrl.py", line 13, in tf__call
loss_dict = (ag__.ld(position_loss) | ag__.ld(mixture_loss))
TypeError: in user code:
File "/home/47800/SocialNavigation_v2/human-scene-transformer/human_scene_transformer/train_model.py", line 184, in step_fn *
loss_dict = loss_obj(output_batch, predictions)
File "/home/47800/SocialNavigation_v2/human-scene-transformer/human_scene_transformer/losses.py", line 37, in __call__ *
return self.call(input_batch, predictions)
File "/home/47800/SocialNavigation_v2/human-scene-transformer/human_scene_transformer/losses.py", line 456, in call *
loss_dict = position_loss | mixture_loss
TypeError: unsupported operand type(s) for |: 'dict' and 'dict'
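This second error is a Python version issue rather than a data problem: the dict union operator `|` was added in Python 3.9 (PEP 584), so `position_loss | mixture_loss` cannot work under Python 3.8. A minimal sketch of a merge that behaves the same way on both versions is below. The helper name `merge_dicts` and the dictionary keys and values are hypothetical, used for illustration only; they are not taken from `losses.py`.

```python
def merge_dicts(*dicts):
    """Merge dicts left to right; later keys win, matching `a | b` on 3.9+."""
    merged = {}
    for d in dicts:
        merged.update(d)
    return merged


# Hypothetical stand-ins for the two loss dictionaries built in losses.py.
position_loss = {"position_loss": 0.5}
mixture_loss = {"mixture_loss": 0.1}

# Works on Python 3.8 and on 3.9+.
loss_dict = merge_dicts(position_loss, mixture_loss)
```

The inline form `{**position_loss, **mixture_loss}` is equivalent for two dictionaries. Alternatively, running the code with Python 3.9 or newer avoids the change entirely.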