Bug report
Steps to reproduce
- Launch a gateway with at least two synthetic / runtime-discovered components (any deployment without a manifest declaring the components is enough).
- Trigger a fault on an app hosted by component A (for example by injecting a fault from a sensor node).
- Query the same fault code on a different synthetic component B that does not host that app:
curl -s "localhost:8080/api/v1/components/$COMP_B_ID/faults/$FAULT_CODE"
Expected behavior
404 Resource Not Found (or similar) - the fault was not reported by any app hosted by component B, so the request should not return a body containing it.
Actual behavior
200 OK with the fault body that was reported by component A. Cross-entity scope leak: any client with read access to component B can pull faults that were reported by entirely different components, as long as the fault_code matches.
Root cause
FaultHandlers::handle_get_fault (src/ros2_medkit_gateway/src/http/handlers/fault_handlers.cpp:631) passes entity_info.namespace_path as source_id to FaultManager::get_fault_with_env. The match logic in fault_manager.cpp:280 is:
if (!source_id.empty()) {
bool matches = false;
for (const auto & src : result.fault.reporting_sources) {
if (src.rfind(source_id, 0) == 0) { matches = true; break; }
}
if (!matches) {
result.success = false;
result.error_message = "Fault not found for source: " + source_id;
...
}
}
Synthetic components have empty fqn and empty namespace_path. The guard if (!source_id.empty()) short-circuits, the reporting_sources check is skipped entirely, and the manager returns the raw fault without any scope filter. The handler then forwards it to the client.
Manifest components with a non-empty namespace_path work correctly because the prefix match against reporting_sources still runs.
Severity / class
Different class from #393 - that one is a false negative (synthetic components return empty results from /logs and /bulk-data). This one is a false positive / scope leak: a synthetic component returns faults that belong to other entities. Potentially a security-relevant boundary issue if any RBAC role allows viewer access to specific components but not others.
handle_clear_fault (DELETE per-entity) takes the same path and is presumably leaky in the same way - clearing a fault scoped to component B may delete a fault reported by component A. Worth verifying once the fix is in flight.
Why the simple "iterate hosted apps" pattern from #393 does not work here
FaultManager::get_fault_with_env(fault_code, source_id) accepts a single source_id string and runs prefix-match against fault.reporting_sources. To fix this properly the manager (and likely the underlying GetFault ROS service contract in ros2_medkit_msgs) needs to accept a list of allowed source FQNs and match against any of them, with exact match instead of prefix match. That is a service-contract change with downstream impact on every generated client (Python MCP, Rust selfpatch CLI, etc.).
Related
Environment
- ros2_medkit version: main
- ROS 2 distro: any (Jazzy / Humble / Rolling)
- OS: any
Additional information
Suggested approach (subject to design discussion on #380):
- Extend
GetFault.srv request with an optional string[] allowed_sources field (backward-compatible default empty = no scope filter).
- Change
FaultManager::get_fault_with_env signature to accept std::vector<std::string> allowed_sources.
- In
handle_get_fault and handle_clear_fault, build the list via cache.get_apps_for_component(entity_id) + effective_fqn() and pass exact-match FQNs.
- Drop the
if (!source_id.empty()) guard - if the caller passes an empty list, treat that as the explicit "no scope filter" choice (only valid for global handlers, never for per-entity routes).
Bug report
Steps to reproduce
Expected behavior
404 Resource Not Found(or similar) - the fault was not reported by any app hosted by component B, so the request should not return a body containing it.Actual behavior
200 OKwith the fault body that was reported by component A. Cross-entity scope leak: any client with read access to component B can pull faults that were reported by entirely different components, as long as the fault_code matches.Root cause
FaultHandlers::handle_get_fault(src/ros2_medkit_gateway/src/http/handlers/fault_handlers.cpp:631) passesentity_info.namespace_pathassource_idtoFaultManager::get_fault_with_env. The match logic infault_manager.cpp:280is:Synthetic components have empty
fqnand emptynamespace_path. The guardif (!source_id.empty())short-circuits, thereporting_sourcescheck is skipped entirely, and the manager returns the raw fault without any scope filter. The handler then forwards it to the client.Manifest components with a non-empty
namespace_pathwork correctly because the prefix match againstreporting_sourcesstill runs.Severity / class
Different class from #393 - that one is a false negative (synthetic components return empty results from
/logsand/bulk-data). This one is a false positive / scope leak: a synthetic component returns faults that belong to other entities. Potentially a security-relevant boundary issue if any RBAC role allowsvieweraccess to specific components but not others.handle_clear_fault(DELETE per-entity) takes the same path and is presumably leaky in the same way - clearing a fault scoped to component B may delete a fault reported by component A. Worth verifying once the fix is in flight.Why the simple "iterate hosted apps" pattern from #393 does not work here
FaultManager::get_fault_with_env(fault_code, source_id)accepts a singlesource_idstring and runs prefix-match againstfault.reporting_sources. To fix this properly the manager (and likely the underlyingGetFaultROS service contract inros2_medkit_msgs) needs to accept a list of allowed source FQNs and match against any of them, with exact match instead of prefix match. That is a service-contract change with downstream impact on every generated client (Python MCP, Rust selfpatch CLI, etc.).Related
reporting_sourcesinstead of explicit entity id) - see comment on #380 for the cross-link. Whichever option lands on Global /faults/stream SSE events lack entity context for downstream consumers #380 should also unblock the proper fix here./logsand/bulk-data, same synthetic-component root cause but different fix).Environment
Additional information
Suggested approach (subject to design discussion on #380):
GetFault.srvrequest with an optionalstring[] allowed_sourcesfield (backward-compatible default empty = no scope filter).FaultManager::get_fault_with_envsignature to acceptstd::vector<std::string> allowed_sources.handle_get_faultandhandle_clear_fault, build the list viacache.get_apps_for_component(entity_id)+effective_fqn()and pass exact-match FQNs.if (!source_id.empty())guard - if the caller passes an empty list, treat that as the explicit "no scope filter" choice (only valid for global handlers, never for per-entity routes).