Skip to content

Issue #1670: Integrate BT pick/place with GraspGen-first grasping#1975

Open
KrishnaH96 wants to merge 1 commit intodimensionalOS:mainfrom
KrishnaH96:feature/issue-1670-bt-graspgen-pick-place
Open

Issue #1670: Integrate BT pick/place with GraspGen-first grasping#1975
KrishnaH96 wants to merge 1 commit intodimensionalOS:mainfrom
KrishnaH96:feature/issue-1670-bt-graspgen-pick-place

Conversation

@KrishnaH96
Copy link
Copy Markdown

Issue #1670 PR Draft: Behavior-Tree Pick/Place Integration

Problem Statement

The current pick/place path in PickAndPlaceModule is usable, but the control
flow is mostly monolithic. Perception, grasp selection, planning, execution,
gripper checks, retry behavior, and recovery are folded into one procedural
skill path.

That makes real-robot debugging harder. When a pick fails, the final return
string often does not tell us cleanly which stage failed:

  • Was the object missing from perception?
  • Did GraspGen or pointcloud lookup fail?
  • Did heuristic fallback run?
  • Were all candidates filtered by workspace constraints?
  • Did IK fail?
  • Did path planning fail?
  • Did execution fail?
  • Did the gripper close empty?

Issue #1670 asks for a stronger manipulation pipeline by combining:

  • behavior-tree orchestration,
  • GraspGen-first grasp generation,
  • heuristic fallback,
  • retry/recovery structure,
  • and the current agent-facing PickAndPlaceModule interface.

The goal of this PR is not to introduce a second manipulation stack. The goal is
to keep the existing public skills and make the internals more structured,
testable, and debuggable.

What This PR Does

This PR integrates py_trees behavior-tree orchestration inside the existing
PickAndPlaceModule.

The public agent-facing API remains unchanged:

  • scan_objects
  • pick
  • place
  • place_back
  • pick_and_place
  • get_scene_info

Internally:

  • pick() builds and ticks a pick behavior tree.
  • place() builds and ticks a place behavior tree.
  • GraspGen is attempted first by default.
  • Heuristic top-down grasps remain as fallback.
  • bt_enable_graspgen=False can be used locally for hardware bring-up to
    isolate BT motion/retry/gripper behavior from Docker/model/pointcloud grasp
    generation.
  • The tree reports success/failure through the same string-style skill contract
    expected by existing agents.

Default behavior is GraspGen-first:

bt_enable_graspgen: bool = True

This matters because the normal PR behavior still matches the issue goal:

GraspGen first -> heuristic fallback -> same pick/place skill API

Why This Architecture

The previous BT direction introduced a separate public BT manipulation module
and separate blueprint path. That made the review and runtime story larger:

  • two public manipulation surfaces,
  • more blueprint wiring,
  • more RPC indirection,
  • more opportunity for drift from the current PickAndPlaceModule.

This PR intentionally avoids that split.

The behavior tree is an internal scheduler. BT leaves call the existing module
primitives directly:

  • refresh_obstacles
  • _find_object_in_detections
  • generate_grasps
  • plan_to_pose
  • _preview_execute_wait
  • _set_gripper_position
  • cancel
  • reset

This keeps the surface area stable for agents while making the internal pick and
place phases explicit.

How It Works

Files Added

dimos/manipulation/bt/__init__.py
dimos/manipulation/bt/actions.py
dimos/manipulation/bt/conditions.py
dimos/manipulation/bt/trees.py
dimos/manipulation/bt/test_pick_place_bt.py

Files Modified

dimos/manipulation/pick_and_place_module.py
dimos/conftest.py
dimos/msgs/sensor_msgs/PointCloud2.py
dimos/msgs/sensor_msgs/test_PointCloud2.py
dimos/protocol/pubsub/encoders.py
dimos/manipulation/blueprints.py
pyproject.toml
uv.lock

Pick Skill Flow

At the public skill level, the shape is still simple:

pick(object_name, object_id, robot_name)
  -> validate robot
  -> build_pick_tree(...)
  -> _tick_bt_tree(root)
  -> return result string or error string

The behavior tree expands that into named phases:

Pick [Sequence]
├── EnsureReady [Selector]
│   ├── RobotIsHealthy
│   └── ResetAndVerify
│       ├── ResetRobot
│       └── RobotIsHealthyAfterReset
├── PickWithRescan [RetryOnFailure]
│   └── ScanAndGrasp [Sequence]
│       ├── ClearGraspState
│       ├── PrepareDetections [Selector]
│       │   ├── HasTargetDetection
│       │   └── RefreshTargetSnapshot
│       │       ├── ClearPerceptionObstacles
│       │       └── EnsureScanned
│       │           ├── HasDetections
│       │           └── ScanObjects
│       ├── FindObject
│       ├── GenerateGraspCandidates
│       │   ├── GraspGenCandidates
│       │   │   ├── GetObjectPointcloud
│       │   │   ├── GetScenePointcloud
│       │   │   └── GenerateGrasps
│       │   └── HeuristicGrasps
│       ├── FilterGraspWorkspace
│       └── GraspWithRetry [RetryOnFailure]
│           └── AttemptOrRecover [Selector]
│               ├── GraspAttempt
│               │   ├── SelectNextGrasp
│               │   ├── ComputePreGrasp
│               │   ├── OpenGripper
│               │   ├── PlanToPreGrasp
│               │   ├── ExecuteApproach
│               │   ├── PlanToGrasp
│               │   ├── ExecuteGrasp
│               │   ├── CloseGripper
│               │   ├── VerifyGrasp
│               │   ├── ComputeLiftPose
│               │   ├── PlanLift
│               │   ├── ExecuteLift
│               │   └── VerifyHoldAfterLift
│               └── RecoverThenFail
│                   ├── CancelMotion
│                   ├── ResetRobot
│                   └── RecoveryComplete
├── StorePickPosition
└── SetResult

Grasp Generation

When bt_enable_graspgen=True, the tree first tries:

GetObjectPointcloud -> GetScenePointcloud -> GenerateGrasps

If that branch fails or returns no poses, the selector falls back to:

GenerateHeuristicGrasps

When bt_enable_graspgen=False, the tree skips the pointcloud and GraspGen
nodes and goes directly to heuristic candidate generation. This is useful for
hardware bring-up because it lets us validate the BT scan/filter/plan/execute
and gripper-verification path without blocking on Docker/model/runtime GraspGen
issues.

Place Skill Flow

The place tree is smaller:

Place [Sequence]
├── ComputePlacePose
├── PlanToPrePlace
├── ExecutePrePlace
├── PlanToPlace
├── ExecutePlace
├── OpenGripper
├── PlanRetract
├── ExecuteRetract
└── SetResult

This keeps place behavior explicit while still using the existing module
planning and execution helpers.

Blackboard Contract

The BT uses the py_trees blackboard for per-run state:

  • target_object
  • object_pointcloud
  • scene_pointcloud
  • grasp_candidates
  • grasp_index
  • current_grasp
  • pre_grasp_pose
  • lift_pose
  • place_pose
  • pre_place_pose
  • grasp_source
  • error_message
  • result_message

_tick_bt_tree resets these keys before each run so stale state does not leak
between pick/place calls.

Supporting Runtime Fixes

While exercising the real xArm + perception path, several small blockers showed
up. They are included because they directly affect whether reviewers/operators
can run the stack.

PointCloud2 pickling

PointCloud2.__getstate__ now drops cached Open3D bounding boxes before
pickling:

state.pop("axis_aligned_bounding_box", None)
state.pop("oriented_bounding_box", None)

This fixes ObjectDB/pLCM publication after a pointcloud has lazily computed
Open3D bounding boxes.

Rerun ImageAnnotations decode hardening

LCM decode now converts only the known broken
foxglove_msgs.ImageAnnotations generated-decoder AttributeError into
DecodingError. That lets the subscriber skip bad annotation frames instead of
killing the Rerun session.

Rerun throttling

xarm_perception now uses:

rerun_bridge(min_interval_sec=0.5)

This caps heavy Image and PointCloud2 streams to roughly 2 Hz per entity
path, which made Rerun stable enough for real xArm perception bring-up.

Focused test support

DIMOS_SKIP_AUTOCONF=1 lets focused unit tests run in environments where
LCM/multicast autoconfiguration is not available.

Validation

Automated Validation

The main BT validation is in:

dimos/manipulation/bt/test_pick_place_bt.py

These tests intentionally use fake module objects, so they do not require robot
hardware, Docker, GraspGen models, pointcloud RPCs, or LCM.

They validate:

  • RetryOnFailure retries a failed child and succeeds on the next tick.
  • The pick tree builds with the expected top-level structure:
    EnsureReady, PickWithRescan, StorePickPosition, SetResult.
  • The place tree builds with the expected place flow.
  • bt_enable_graspgen=False removes GraspGen and pointcloud nodes while keeping
    heuristic candidate generation.
  • GraspGen/pointcloud failure falls back to heuristic grasps and records
    grasp_source="heuristic".
  • Gripper verification accepts partial closure and rejects fully open or
    empty-closed gripper states.
  • The optional minimum-closure threshold is respected when an open-reference
    value is configured.

This gives reviewers a fast way to validate the BT control-flow behavior
without needing the real xArm stack.

Focused validation after cleanup:

.venv/bin/ruff check \
  dimos/manipulation/bt \
  dimos/manipulation/pick_and_place_module.py \
  dimos/manipulation/blueprints.py \
  dimos/msgs/sensor_msgs/PointCloud2.py \
  dimos/msgs/sensor_msgs/test_PointCloud2.py \
  dimos/protocol/pubsub/encoders.py \
  dimos/conftest.py

Result:

All checks passed

Compile:

.venv/bin/python -m compileall \
  dimos/manipulation/bt \
  dimos/manipulation/pick_and_place_module.py \
  dimos/manipulation/blueprints.py \
  dimos/msgs/sensor_msgs/PointCloud2.py \
  dimos/protocol/pubsub/encoders.py \
  dimos/conftest.py

Result:

passed

Focused tests:

DIMOS_SKIP_AUTOCONF=1 .venv/bin/python -m pytest -q \
  dimos/manipulation/bt/test_pick_place_bt.py \
  dimos/msgs/sensor_msgs/test_PointCloud2.py::test_pickle_after_cached_open3d_bounding_boxes

Result:

10 passed

Real Hardware Validation

The BT-backed pick path was exercised on the real xArm stack through grasp
candidate generation, pre-grasp planning/execution, current-grasp
planning/execution, and gripper verification. The physical grasp failed, but
the BT correctly localized the failure at VerifyGrasp. Remaining work is grasp
calibration and reliability tuning, not basic BT integration.

Best log excerpt from the xArm run:

[GenerateHeuristicGrasps] 1 top-down grasps above (0.741, 0.026, -0.012); z=[0.160]
[ComputePreGrasp] offset=0.100 pose=(0.741, 0.026, 0.260)
[PlanToPose] Planning to pre_grasp_pose: (0.741, 0.026, 0.260)
IK solved, error: 0.0001m
Path: 2 waypoints
Trajectory: 3.642s
[PlanToPose] Planning to pre_grasp_pose succeeded
Executing trajectory...
Trajectory completed
[PlanToPose] Planning to current_grasp: (0.741, 0.026, 0.160)
IK solved, error: 0.0008m
Path: 2 waypoints
Trajectory: 0.797s
[PlanToPose] Planning to current_grasp succeeded
Executing trajectory...
Trajectory completed
[VerifyGrasp] gripper opening=-0.0020m expected range=(0.0050, 0.8100)m
Error: Grasp verification failed - gripper empty (-0.0020m <= 0.0050m)

What this proves:

  • BT pick path started and selected a grasp candidate.
  • GenerateHeuristicGrasps BT leaf ran.
  • ComputePreGrasp BT leaf ran.
  • PlanToPose ran for pre_grasp_pose.
  • IK solved.
  • Path planning produced waypoints.
  • Trajectory execution completed.
  • PlanToPose ran again for current_grasp.
  • IK solved again.
  • Trajectory execution completed again.
  • VerifyGrasp ran and correctly failed the pick because the gripper was empty.

Interpretation:

The BT integration is real and reaches live planning/execution/gripper
verification on hardware. The remaining issue is not BT wiring; it is physical
grasp calibration, TCP/gripper geometry, perception/grasp pose tuning, and
reliability work.

How BT Helped During Hardware Testing

The hardware logs show the main debugging benefit of BT: failures are localized
to named stages.

1. Missing OSR pointcloud RPC and workspace filter failure

[GetObjectPointcloud] unavailable: RPC methods not found. Class: PickAndPlaceModule, RPC methods: ObjectSceneRegistrationModule.get_object_pointcloud_by_object_id
[GenerateHeuristicGrasps] Top-down grasp at (0.752, -0.002, 0.006)
[PickWithRescan] attempt 1/3 failed
Error: No grasp candidates passed workspace filtering

What this told us:

  • Pointcloud lookup failed at GetObjectPointcloud.
  • The tree correctly fell back to GenerateHeuristicGrasps.
  • The final blocker was FilterGraspWorkspace, not perception or planning.

2. IK failure at pre-grasp

[GenerateHeuristicGrasps] Top-down grasp at (0.732, 0.037, 0.037)
[FilterGraspWorkspace] 1/1 passed
[SelectNextGrasp] Candidate 1/1
[ComputePreGrasp] offset=0.100 pose=(0.732, 0.037, 0.137)
[PlanToPose] Planning to pre_grasp_pose: (0.732, 0.037, 0.137)
IK failed: NO_SOLUTION
[PlanToPose] Planning to pre_grasp_pose failed
[GraspWithRetry] attempt 1/5 failed

What this told us:

  • Candidate generation and workspace filtering were fine.
  • The failure was specifically PlanToPose for pre_grasp_pose.
  • The next tuning target was reachability/IK/grasp pose, not object detection.

3. Collision-at-start during planning

[ClearPerceptionObstacles] Cleared 4 perception obstacle(s) from planning world
[GenerateHeuristicGrasps] 3 top-down grasps above (0.682, 0.022, 0.035); z=[0.140, 0.155, 0.195]
[FilterGraspWorkspace] 3/3 passed
[SelectNextGrasp] Candidate 1/3
[ComputePreGrasp] offset=0.100 pose=(0.682, 0.022, 0.240)
[PlanToPose] Planning to pre_grasp_pose: (0.682, 0.022, 0.240)
IK solved, error: 0.0006m
Planning failed: COLLISION_AT_START

What this told us:

  • The obstacle-clear BT action ran.
  • IK solved, so this was not an IK problem.
  • The remaining blocker was planning-world/collision state.

4. Workspace distance guardrail prevented unsafe motion

[ClearPerceptionObstacles] Cleared 6 perception obstacle(s) from planning world
Detection snapshot: ['cup', 'person', 'handbag', 'hassock', 'printer', 'storage box']
[GenerateHeuristicGrasps] 3 top-down grasps above (1.132, -0.054, 0.084); z=[0.164, 0.204, 0.244]
[FilterGraspWorkspace] rejected by distance: distance=1.145 max=0.900
[FilterGraspWorkspace] rejected by distance: distance=1.152 max=0.900
[FilterGraspWorkspace] rejected by distance: distance=1.160 max=0.900
Error: No grasp candidates passed workspace filtering

What this told us:

  • The robot did not move.
  • The candidates were rejected by a specific workspace rule.
  • This pointed to perception/target geometry being outside the configured
    reachable workspace.

5. Object label/snapshot mismatch

Starting BT pick for 'contain'
[ClearPerceptionObstacles] Cleared 2 perception obstacle(s) from planning world
Detection snapshot: ['gift basket', 'office chair', 'storage box']
Object 'contain' not found in snapshot. Available: ['gift basket', 'office chair', 'storage box']
Error: Object 'contain' not found

What this told us:

  • The failure happened before robot motion.
  • The problem was target lookup / detection snapshot stability.
  • This directly motivated the HasTargetDetection path that prefers an
    existing target snapshot instead of refreshing away a just-scanned label.

6. Physical grasp failure localized to gripper verification

[PlanToPose] Planning to pre_grasp_pose succeeded
[PlanToPose] Planning to current_grasp: (0.765, 0.012, 0.160)
[PlanToPose] Planning to current_grasp succeeded
[VerifyGrasp] gripper opening=0.8460m expected range=(0.0050, 0.8100)m
Error: Grasp verification failed - gripper still open (0.8460m >= 0.8100m)

What this told us:

  • Planning and execution reached the grasp pose.
  • The failure was not BT wiring or planning.
  • The remaining issue was physical object capture / gripper geometry / grasp
    calibration.

Summary:

The hardware logs show the BT made failures phase-local: GetObjectPointcloud for
missing OSR binding, FilterGraspWorkspace for invalid candidates, PlanToPose for
IK/planner failures, FindObject for label/snapshot mismatch, and VerifyGrasp for
physical grasp failure. This is the core debugging advantage over the old
monolithic pick path.

What Is Left

This PR lands the architecture and the first real-stack validation. The next
work should focus on reliability.

1. Tune BT branches for reliable pick/place

Needed work:

  • Tune grasp candidate heights and offsets.
  • Tune approach orientation for the side/gantry-mounted xArm setup.
  • Validate TCP and gripper geometry.
  • Improve candidate generation for table-height detections.
  • Tune retry counts and BT timeout relative to agent RPC timeout.
  • Add better branch-specific recovery where needed.
  • Validate GraspGen runtime and pointcloud grasp output separately from
    heuristic-only hardware bring-up.

Goal:

Make the BT reliably pick and place known objects on the real xArm setup.

2. Benchmark against current main

After the hardware path is stable enough, compare:

main monolithic pick/place
vs
BT + GraspGen-first + heuristic fallback

Suggested benchmark:

  • Choose 2 to 4 representative objects.
  • Keep table, lighting, camera, start pose, and object placement consistent.
  • Run repeated trials for each object and each method.

Metrics:

  • Successful picks of the same object.
  • Successful places of the same object.
  • Full pick-and-place success rate.
  • Failure phase.
  • Grasp source used: GraspGen or heuristic.
  • Number of retry attempts.
  • Recovery attempts.
  • Total completion time.
  • Manual intervention count.
  • Object dropped after lift: yes/no.

Example table:

Method Object Trial Pick Success Place Success Failure Phase Grasp Source Retries Recovery Attempts Time Manual Intervention
main monolithic cup 1 yes/no yes/no planning/grasp/lift/place heuristic 0 0 0.0s yes/no
BT + fallback cup 1 yes/no yes/no PlanToPose/VerifyGrasp/etc. graspgen/heuristic 0 0 0.0s yes/no

3. Add structured error reporting

BT gives named phases now. A future PR can add stable machine-readable fields:

error_code
error_phase
last_failed_leaf
attempt_index
grasp_source

Examples:

  • OBJECT_NOT_FOUND
  • NO_POINTCLOUD
  • GRASPGEN_EMPTY
  • ALL_GRASPS_FILTERED
  • PLAN_FAILED_PRE_GRASP
  • EXEC_FAILED
  • GRASP_VERIFY_FAILED
  • BT_TIMEOUT

This should be added next to the existing human-readable error_message, not
instead of it.

Final Reviewer Summary

This PR is the first clean integration slice for issue #1670.

It does:

  • keep the current public PickAndPlaceModule skill API,
  • add internal behavior-tree orchestration for pick/place,
  • use GraspGen first by default,
  • keep heuristic fallback,
  • add retry/recovery structure,
  • add gripper verification and hold verification,
  • add focused tests,
  • include narrow runtime fixes needed for real-stack validation,
  • and show real xArm logs proving the BT path reached live planning, execution,
    and gripper verification.

The remaining work is tuning and benchmarking, BT is successfully integrated in the workflow.

@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps Bot commented May 4, 2026

Greptile Summary

This PR integrates py_trees behavior-tree orchestration inside PickAndPlaceModule, keeping the public skill API unchanged while making the internal pick/place phases explicit, named, and retryable. GraspGen is attempted first by default with heuristic top-down grasps as fallback, and several narrow runtime fixes (PointCloud2 pickling, Rerun throttling, LCM decode hardening) are bundled in.

  • P1 — bt_execution_timeout does not bound hung synchronous leaves: _tick_bt_tree checks the timeout only between ticks, but every BT leaf (PlanToPose, ExecuteTrajectory, SetGripper) is a blocking synchronous call. A stalled planner or executor will block the loop indefinitely inside the leaf's update(), making the timeout check unreachable until the call returns.
  • P2 — FilterGraspWorkspace registers grasp_candidates twice (READ then WRITE) instead of READ_WRITE; should be a single READ_WRITE registration.
  • P2 — SetResultMessage reads grasp_source without registering it, relying on a broad except (AttributeError, KeyError) catch to swallow the py_trees access-control violation silently.

Confidence Score: 3/5

Safe to merge for architecture, but the P1 hang risk in _tick_bt_tree should be addressed before production use with real hardware.

One P1 finding (synchronous leaf hangs bypass bt_execution_timeout) caps the score. The public API is stable, the BT structure and retry logic are sound, and the tests cover key control-flow paths. The P1 is a real operational risk on hardware but does not break the BT wiring itself.

dimos/manipulation/pick_and_place_module.py (_tick_bt_tree timeout gap) and dimos/manipulation/bt/actions.py (key registration issues).

Important Files Changed

Filename Overview
dimos/manipulation/pick_and_place_module.py Main public skill module refactored to use BT internally; public API unchanged, but _tick_bt_tree has a P1 gap where synchronous leaf hangs are not bounded by bt_execution_timeout.
dimos/manipulation/bt/actions.py All BT leaf action nodes; contains two code-quality issues: duplicate grasp_candidates key registration in FilterGraspWorkspace, and unregistered grasp_source access in SetResultMessage.
dimos/manipulation/bt/trees.py Defines build_pick_tree and build_place_tree plus the RetryOnFailure decorator; tree structure matches PR description and retry/recovery logic is sound.
dimos/manipulation/bt/conditions.py Condition nodes (RobotIsHealthy, HasTargetDetection, GripperHasObject); gripper verification logic is correct and the min-closure threshold derivation is well-tested.
dimos/manipulation/bt/test_pick_place_bt.py Ten unit tests covering retry logic, tree structure, graspgen/heuristic fallback, and gripper verification; all use fake module objects and run without hardware.
dimos/protocol/pubsub/encoders.py Adds narrow AttributeError hardening for foxglove_msgs.ImageAnnotations decode failures; intentionally scoped but may miss future broken generated decoders.
dimos/msgs/sensor_msgs/PointCloud2.py Adds __getstate__ cleanup for cached Open3D bounding boxes to fix pickling after lazy property access; covered by a new test.
dimos/manipulation/blueprints.py Migrates xarm_perception from foxglove_bridge to rerun_bridge with a 2 Hz throttle (min_interval_sec=0.5) to stabilize heavy perception streams.

Sequence Diagram

sequenceDiagram
    participant Agent
    participant PickAndPlaceModule
    participant _tick_bt_tree
    participant BehaviourTree
    participant EnsureReady
    participant PickWithRescan
    participant ScanAndGrasp
    participant GraspGen as GraspGen / Heuristic
    participant FilterGraspWorkspace
    participant GraspWithRetry
    participant GraspAttempt
    participant RecoverThenFail

    Agent->>PickAndPlaceModule: pick(object_name, object_id, robot_name)
    PickAndPlaceModule->>_tick_bt_tree: build_pick_tree() -> root
    _tick_bt_tree->>BehaviourTree: setup()
    loop tick loop (bt_tick_rate_hz)
        BehaviourTree->>EnsureReady: tick
        EnsureReady-->>BehaviourTree: SUCCESS / FAILURE
        BehaviourTree->>PickWithRescan: tick (RetryOnFailure, max_rescan_attempts)
        PickWithRescan->>ScanAndGrasp: tick
        ScanAndGrasp->>GraspGen: GetObjectPointcloud -> GetScenePointcloud -> GenerateGrasps
        alt GraspGen succeeds
            GraspGen-->>ScanAndGrasp: grasp_candidates (source=graspgen)
        else GraspGen fails
            GraspGen-->>ScanAndGrasp: FAILURE
            ScanAndGrasp->>GraspGen: GenerateHeuristicGrasps (fallback)
            GraspGen-->>ScanAndGrasp: grasp_candidates (source=heuristic)
        end
        ScanAndGrasp->>FilterGraspWorkspace: filter by z / distance / angle
        FilterGraspWorkspace-->>ScanAndGrasp: filtered candidates or FAILURE
        ScanAndGrasp->>GraspWithRetry: tick (RetryOnFailure, max_pick_attempts)
        GraspWithRetry->>GraspAttempt: SelectNextGrasp -> PlanToPreGrasp -> Execute -> CloseGripper -> VerifyGrasp -> Lift
        alt GraspAttempt succeeds
            GraspAttempt-->>GraspWithRetry: SUCCESS
        else GraspAttempt fails
            GraspAttempt-->>GraspWithRetry: FAILURE
            GraspWithRetry->>RecoverThenFail: CancelMotion -> ResetRobot -> Failure
            RecoverThenFail-->>GraspWithRetry: FAILURE
            GraspWithRetry-->>ScanAndGrasp: RUNNING (retry)
        end
        BehaviourTree-->>_tick_bt_tree: SUCCESS / FAILURE / RUNNING
    end
    _tick_bt_tree-->>PickAndPlaceModule: result_message or error_message
    PickAndPlaceModule-->>Agent: string result
Loading

Reviews (1): Last reviewed commit: "Issue #1670: Integrate BT pick/place wit..." | Re-trigger Greptile

Comment on lines +401 to +425
start = time.time()
period = 1.0 / self.config.bt_tick_rate_hz
while True:
tree.tick()

if root.status == py_trees.common.Status.SUCCESS:
return self._read_bt_blackboard(
"result_message", "Operation completed successfully"
)

if root.status == py_trees.common.Status.FAILURE:
return self._read_bt_blackboard("error_message", "Error: Operation failed")

if (
self.config.bt_execution_timeout is not None
and time.time() - start > self.config.bt_execution_timeout
):
root.stop(py_trees.common.Status.INVALID)
try:
self.cancel()
except Exception:
logger.warning("Failed to cancel timed-out BT operation", exc_info=True)
return "Error: Operation timed out"

time.sleep(period)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 BT timeout does not protect against hung synchronous leaves

bt_execution_timeout is checked between ticks (time.sleep(period) then check), but every BT leaf (PlanToPose, ExecuteTrajectory, SetGripper, etc.) is a synchronous blocking call. If plan_to_pose() or _preview_execute_wait() stalls indefinitely, the entire while True loop blocks inside the leaf's update() and the timeout branch is never reached. The default bt_execution_timeout=None means no protection at all in production. A blocked planning or execution call will hang the pick() or place() skill call with no recovery path.

Comment on lines +282 to +283
self.bb.register_key(key="grasp_candidates", access=py_trees.common.Access.READ)
self.bb.register_key(key="grasp_candidates", access=py_trees.common.Access.WRITE)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 grasp_candidates is registered twice — once as READ and once as WRITE — on the same blackboard client. The proper way to express both access levels in py_trees is a single READ_WRITE registration. Duplicate registrations of the same key may raise in stricter py_trees configurations and make the access intent unclear.

Suggested change
self.bb.register_key(key="grasp_candidates", access=py_trees.common.Access.READ)
self.bb.register_key(key="grasp_candidates", access=py_trees.common.Access.WRITE)
self.bb.register_key(key="grasp_candidates", access=py_trees.common.Access.READ_WRITE)

Comment on lines +498 to +506
def update(self) -> Status:
source = ""
try:
if self.bb.grasp_source:
source = f" using {self.bb.grasp_source} grasp"
except (AttributeError, KeyError):
pass
self.bb.result_message = self.message.format(grasp_source=source)
return Status.SUCCESS
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 grasp_source read without blackboard registration

SetResultMessage reads self.bb.grasp_source inside a broad except (AttributeError, KeyError) catch without registering the key via the blackboard client. In py_trees, accessing an unregistered key raises AttributeError, so the catch silently swallows the access-control violation instead of surfacing it. In the place tree this produces a silent no-op; in future strict-mode configurations it could hide misconfiguration. The key should be registered with READ access in __init__, matching the pattern used by every other node in this module.

Comment on lines 113 to +121
def decode(self, msg: bytes, topic: LCMTopicProto) -> DimosMsg:
if topic.lcm_type is None:
raise DecodingError(f"Cannot decode: topic {topic.topic!r} has no lcm_type")
return topic.lcm_type.lcm_decode(msg)
try:
return topic.lcm_type.lcm_decode(msg)
except AttributeError as exc:
if getattr(topic.lcm_type, "msg_name", "") == "foxglove_msgs.ImageAnnotations":
raise DecodingError("Skipping undecodable Foxglove ImageAnnotations") from exc
raise
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Narrow AttributeError catch tied to a single msg_name literal

The guard catches AttributeError only when msg_name is exactly "foxglove_msgs.ImageAnnotations". Any other LCM type whose generated decoder raises AttributeError (e.g., a future broken Foxglove or third-party type) will still propagate and crash the subscriber session — defeating the stated goal of "hardening." Consider whether the pattern should apply to all AttributeError from lcm_decode, or at least document the deliberate scope to avoid future maintainers widening it incorrectly.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant