Operating System:
Ubuntu 22.04
ROS version or commit hash:
humble
Steps to reproduce issue
- Run the Autoware with rmw_zenoh on https://autowarefoundation.github.io/autoware-documentation/main/tutorials/ad-hoc-simulation/planning-simulation/
- After giving an initial pose, we can set the goal successfully once
- However, it's stuck if we set it twice.
More detail log is here
Expected behavior
Able to plan the route several times
Actual behavior
Failed to plan the route
Additional information
After some investigation, we found that the issue is related to thread-safe issue in GuardCondition.
If we make the composable node in Autoware running with only 1 thread, it works fine.
If we don't release the guard_condition->data, it also works fine.
|
allocator->deallocate(guard_condition->data, allocator->state); |
I think it's use-after-free error. When guard_condition->data is still using inside rmw_wait, another thread calls rmw_destroy_guard_condition and caused the problem.
I've tried both ways to solve the issues:
- Wait until rmw_wait exits to destroy the GuardCondition
humble...evshary:rmw_zenoh:wait_fix
- Record all the addresses and ignore them while using the invalid one in rmw_wait
humble...evshary:rmw_zenoh:ignore_invalid_address
Both approaches can solve the issue, but the second might run into trouble sometimes. (about once every ten times)
[component_container_mt-34] terminate called after throwing an instance of 'rclcpp::exceptions::RCLError'
[component_container_mt-34] what(): failed to add guard condition to wait set: guard condition implementation is invalid, at ./src/rcl/guard_condition.c:172, at ./src/rcl/wait.c:460
I suppose that guard condition might be used even after rmw_wait. I would like to know more about your opinion on how to deal with this issue.
Operating System:
Ubuntu 22.04
ROS version or commit hash:
humble
Steps to reproduce issue
More detail log is here
Expected behavior
Able to plan the route several times
Actual behavior
Failed to plan the route
Additional information
After some investigation, we found that the issue is related to thread-safe issue in GuardCondition.
If we make the composable node in Autoware running with only 1 thread, it works fine.
If we don't release the
guard_condition->data, it also works fine.rmw_zenoh/rmw_zenoh_cpp/src/rmw_zenoh.cpp
Line 1994 in 2c91032
I think it's use-after-free error. When
guard_condition->datais still using inside rmw_wait, another thread callsrmw_destroy_guard_conditionand caused the problem.I've tried both ways to solve the issues:
humble...evshary:rmw_zenoh:wait_fix
humble...evshary:rmw_zenoh:ignore_invalid_address
Both approaches can solve the issue, but the second might run into trouble sometimes. (about once every ten times)
I suppose that guard condition might be used even after rmw_wait. I would like to know more about your opinion on how to deal with this issue.