Description
Bug report
Required Info:
- Operating System:
  - Ubuntu 20.04
- Installation type:
  - Binaries, from source
- Version or commit hash:
  - Binaries: 1.2.1-1focal.20201007.210239
  - Source: f54c74b
- DDS implementation:
  - rmw_fastrtps_cpp
- Client library (if applicable):
  - rclcpp
Steps to reproduce issue
- Check out the rmf_core repository into a workspace: https://github.com/osrf/rmf_core
- Switch to the fastdds_segfaults branch.
- Compile the rmf_fleet_adapter package:
    rosdep install --from-paths src --ignore-src -yr
    colcon build --packages-up-to rmf_fleet_adapter
- Execute the small program that reliably triggers the segmentation fault:
    source install/setup.bash
    ./build/rmf_fleet_adapter/segfaulter
Expected behavior
The sample program completes successfully without any errors.
Actual behavior
In most iterations after the first few, the sample program either fails to destroy a wait set or crashes with a segmentation fault in rmw_fastrtps_cpp code.
Example output:
$ ./build/rmf_fleet_adapter/segfaulter
0
[INFO] [1604462249.727237249] [test_node_0]: Added a robot named [test_robot] with participant ID [0]
1
[INFO] [1604462251.764308128] [test_node_1]: Added a robot named [test_robot] with participant ID [0]
2
[INFO] [1604462253.797161121] [test_node_2]: Added a robot named [test_robot] with participant ID [0]
"/home/geoff/src/workspaces/ros2_foxy_debug/src/ros2/rmw_fastrtps/rmw_fastrtps_shared_cpp/src/listener_thread.cpp":__function__:150"failed to destroy wait set": ros discovery info listener thread will shutdown ...
"/home/geoff/src/workspaces/ros2_foxy_debug/src/ros2/rmw_fastrtps/rmw_fastrtps_shared_cpp/src/listener_thread.cpp":__function__:150"failed to destroy wait set": ros discovery info listener thread will shutdown ...
3
[INFO] [1604462255.833644587] [test_node_3]: Added a robot named [test_robot] with participant ID [0]
"/home/geoff/src/workspaces/ros2_foxy_debug/src/ros2/rmw_fastrtps/rmw_fastrtps_shared_cpp/src/listener_thread.cpp":__function__:150"failed to destroy wait set": ros discovery info listener thread will shutdown ...
4
[INFO] [1604462257.876494584] [test_node_4]: Added a robot named [test_robot] with participant ID [0]
"/home/geoff/src/workspaces/ros2_foxy_debug/src/ros2/rmw_fastrtps/rmw_fastrtps_shared_cpp/src/listener_thread.cpp":__function__:150"failed to destroy wait set": ros discovery info listener thread will shutdown ...
5
[INFO] [1604462259.924830354] [test_node_5]: Added a robot named [test_robot] with participant ID [0]
"/home/geoff/src/workspaces/ros2_foxy_debug/src/ros2/rmw_fastrtps/rmw_fastrtps_shared_cpp/src/listener_thread.cpp":__function__:150"failed to destroy wait set": ros discovery info listener thread will shutdown ...
6
[INFO] [1604462261.974901964] [test_node_6]: Added a robot named [test_robot] with participant ID [0]
7
[INFO] [1604462264.032900604] [test_node_7]: Added a robot named [test_robot] with participant ID [0]
8
[INFO] [1604462266.085180435] [test_node_8]: Added a robot named [test_robot] with participant ID [0]
"/home/geoff/src/workspaces/ros2_foxy_debug/src/ros2/rmw_fastrtps/rmw_fastrtps_shared_cpp/src/listener_thread.cpp":__function__:150"failed to destroy wait set": ros discovery info listener thread will shutdown ...
"/home/geoff/src/workspaces/ros2_foxy_debug/src/ros2/rmw_fastrtps/rmw_fastrtps_shared_cpp/src/listener_thread.cpp":__function__:150"failed to destroy wait set": ros discovery info listener thread will shutdown ...
"/home/geoff/src/workspaces/ros2_foxy_debug/src/ros2/rmw_fastrtps/rmw_fastrtps_shared_cpp/src/listener_thread.cpp":__function__:150"failed to destroy wait set": ros discovery info listener thread will shutdown ...
zsh: segmentation fault (core dumped) ./build/rmf_fleet_adapter/segfaulter
Additional information
We have traced both errors to the node_listener function in listener_thread.cpp.
The wait set destruction failure occurs when the context is deallocated and a new one is allocated at the same address before the node_listener function returns. The function then tries to destroy a wait set through a pointer that is now null; the null pointer check in rmw_fastrtps_shared_cpp::__rmw_destroy_wait_set catches it and returns an error, which produces the "failed to destroy wait set" message.
The segmentation fault has a similar cause: the context is deallocated and a new one is allocated at the same address. This time node_listener uses a member of the new, zero-initialised context; that member is a null pointer, so the access triggers a segmentation fault.
In both cases, we have not been able to trace where the context is being overwritten. Both errors appear to be race conditions, and as far as we can tell they occur inside the rmw_fastrtps_cpp code.
The sample program is a cut-down version of one of our tests. That test worked with the version of Fast RTPS shipped in Eloquent and started failing with the switch to Fast DDS in Foxy. The program starts several threads to handle ROS messages at the rclcpp level, and the test itself hammers the ROS initialisation and finalisation machinery by creating and destroying contexts rapidly and continuously.