Skip to content

Thread-safe issue on GuardCondition #608

@evshary

Description

@evshary

Operating System:

Ubuntu 22.04

ROS version or commit hash:

humble

Steps to reproduce issue

  1. Run the Autoware with rmw_zenoh on https://autowarefoundation.github.io/autoware-documentation/main/tutorials/ad-hoc-simulation/planning-simulation/
  2. After giving an initial pose, we can set the goal successfully once
  3. However, it's stuck if we set it twice.

More detail log is here

Expected behavior

Able to plan the route several times

Actual behavior

Failed to plan the route

Additional information

After some investigation, we found that the issue is related to thread-safe issue in GuardCondition.
If we make the composable node in Autoware running with only 1 thread, it works fine.
If we don't release the guard_condition->data, it also works fine.

allocator->deallocate(guard_condition->data, allocator->state);

I think it's use-after-free error. When guard_condition->data is still using inside rmw_wait, another thread calls rmw_destroy_guard_condition and caused the problem.
I've tried both ways to solve the issues:

  1. Wait until rmw_wait exits to destroy the GuardCondition
    humble...evshary:rmw_zenoh:wait_fix
  2. Record all the addresses and ignore them while using the invalid one in rmw_wait
    humble...evshary:rmw_zenoh:ignore_invalid_address

Both approaches can solve the issue, but the second might run into trouble sometimes. (about once every ten times)

[component_container_mt-34] terminate called after throwing an instance of 'rclcpp::exceptions::RCLError'                                                                                                   
[component_container_mt-34]   what():  failed to add guard condition to wait set: guard condition implementation is invalid, at ./src/rcl/guard_condition.c:172, at ./src/rcl/wait.c:460

I suppose that guard condition might be used even after rmw_wait. I would like to know more about your opinion on how to deal with this issue.

Metadata

Metadata

Assignees

No one assigned

    Labels

    bugSomething isn't working

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions