Issue #1670: Integrate BT pick/place with GraspGen-first grasping#1975
Issue #1670: Integrate BT pick/place with GraspGen-first grasping#1975KrishnaH96 wants to merge 1 commit intodimensionalOS:mainfrom
Conversation
Greptile SummaryThis PR integrates
Confidence Score: 3/5Safe to merge for architecture, but the P1 hang risk in One P1 finding (synchronous leaf hangs bypass
Important Files Changed
Sequence DiagramsequenceDiagram
participant Agent
participant PickAndPlaceModule
participant _tick_bt_tree
participant BehaviourTree
participant EnsureReady
participant PickWithRescan
participant ScanAndGrasp
participant GraspGen as GraspGen / Heuristic
participant FilterGraspWorkspace
participant GraspWithRetry
participant GraspAttempt
participant RecoverThenFail
Agent->>PickAndPlaceModule: pick(object_name, object_id, robot_name)
PickAndPlaceModule->>_tick_bt_tree: build_pick_tree() -> root
_tick_bt_tree->>BehaviourTree: setup()
loop tick loop (bt_tick_rate_hz)
BehaviourTree->>EnsureReady: tick
EnsureReady-->>BehaviourTree: SUCCESS / FAILURE
BehaviourTree->>PickWithRescan: tick (RetryOnFailure, max_rescan_attempts)
PickWithRescan->>ScanAndGrasp: tick
ScanAndGrasp->>GraspGen: GetObjectPointcloud -> GetScenePointcloud -> GenerateGrasps
alt GraspGen succeeds
GraspGen-->>ScanAndGrasp: grasp_candidates (source=graspgen)
else GraspGen fails
GraspGen-->>ScanAndGrasp: FAILURE
ScanAndGrasp->>GraspGen: GenerateHeuristicGrasps (fallback)
GraspGen-->>ScanAndGrasp: grasp_candidates (source=heuristic)
end
ScanAndGrasp->>FilterGraspWorkspace: filter by z / distance / angle
FilterGraspWorkspace-->>ScanAndGrasp: filtered candidates or FAILURE
ScanAndGrasp->>GraspWithRetry: tick (RetryOnFailure, max_pick_attempts)
GraspWithRetry->>GraspAttempt: SelectNextGrasp -> PlanToPreGrasp -> Execute -> CloseGripper -> VerifyGrasp -> Lift
alt GraspAttempt succeeds
GraspAttempt-->>GraspWithRetry: SUCCESS
else GraspAttempt fails
GraspAttempt-->>GraspWithRetry: FAILURE
GraspWithRetry->>RecoverThenFail: CancelMotion -> ResetRobot -> Failure
RecoverThenFail-->>GraspWithRetry: FAILURE
GraspWithRetry-->>ScanAndGrasp: RUNNING (retry)
end
BehaviourTree-->>_tick_bt_tree: SUCCESS / FAILURE / RUNNING
end
_tick_bt_tree-->>PickAndPlaceModule: result_message or error_message
PickAndPlaceModule-->>Agent: string result
Reviews (1): Last reviewed commit: "Issue #1670: Integrate BT pick/place wit..." | Re-trigger Greptile |
| start = time.time() | ||
| period = 1.0 / self.config.bt_tick_rate_hz | ||
| while True: | ||
| tree.tick() | ||
|
|
||
| if root.status == py_trees.common.Status.SUCCESS: | ||
| return self._read_bt_blackboard( | ||
| "result_message", "Operation completed successfully" | ||
| ) | ||
|
|
||
| if root.status == py_trees.common.Status.FAILURE: | ||
| return self._read_bt_blackboard("error_message", "Error: Operation failed") | ||
|
|
||
| if ( | ||
| self.config.bt_execution_timeout is not None | ||
| and time.time() - start > self.config.bt_execution_timeout | ||
| ): | ||
| root.stop(py_trees.common.Status.INVALID) | ||
| try: | ||
| self.cancel() | ||
| except Exception: | ||
| logger.warning("Failed to cancel timed-out BT operation", exc_info=True) | ||
| return "Error: Operation timed out" | ||
|
|
||
| time.sleep(period) |
There was a problem hiding this comment.
BT timeout does not protect against hung synchronous leaves
bt_execution_timeout is checked between ticks (time.sleep(period) then check), but every BT leaf (PlanToPose, ExecuteTrajectory, SetGripper, etc.) is a synchronous blocking call. If plan_to_pose() or _preview_execute_wait() stalls indefinitely, the entire while True loop blocks inside the leaf's update() and the timeout branch is never reached. The default bt_execution_timeout=None means no protection at all in production. A blocked planning or execution call will hang the pick() or place() skill call with no recovery path.
| self.bb.register_key(key="grasp_candidates", access=py_trees.common.Access.READ) | ||
| self.bb.register_key(key="grasp_candidates", access=py_trees.common.Access.WRITE) |
There was a problem hiding this comment.
grasp_candidates is registered twice — once as READ and once as WRITE — on the same blackboard client. The proper way to express both access levels in py_trees is a single READ_WRITE registration. Duplicate registrations of the same key may raise in stricter py_trees configurations and make the access intent unclear.
| self.bb.register_key(key="grasp_candidates", access=py_trees.common.Access.READ) | |
| self.bb.register_key(key="grasp_candidates", access=py_trees.common.Access.WRITE) | |
| self.bb.register_key(key="grasp_candidates", access=py_trees.common.Access.READ_WRITE) |
| def update(self) -> Status: | ||
| source = "" | ||
| try: | ||
| if self.bb.grasp_source: | ||
| source = f" using {self.bb.grasp_source} grasp" | ||
| except (AttributeError, KeyError): | ||
| pass | ||
| self.bb.result_message = self.message.format(grasp_source=source) | ||
| return Status.SUCCESS |
There was a problem hiding this comment.
grasp_source read without blackboard registration
SetResultMessage reads self.bb.grasp_source inside a broad except (AttributeError, KeyError) catch without registering the key via the blackboard client. In py_trees, accessing an unregistered key raises AttributeError, so the catch silently swallows the access-control violation instead of surfacing it. In the place tree this produces a silent no-op; in future strict-mode configurations it could hide misconfiguration. The key should be registered with READ access in __init__, matching the pattern used by every other node in this module.
| def decode(self, msg: bytes, topic: LCMTopicProto) -> DimosMsg: | ||
| if topic.lcm_type is None: | ||
| raise DecodingError(f"Cannot decode: topic {topic.topic!r} has no lcm_type") | ||
| return topic.lcm_type.lcm_decode(msg) | ||
| try: | ||
| return topic.lcm_type.lcm_decode(msg) | ||
| except AttributeError as exc: | ||
| if getattr(topic.lcm_type, "msg_name", "") == "foxglove_msgs.ImageAnnotations": | ||
| raise DecodingError("Skipping undecodable Foxglove ImageAnnotations") from exc | ||
| raise |
There was a problem hiding this comment.
Narrow
AttributeError catch tied to a single msg_name literal
The guard catches AttributeError only when msg_name is exactly "foxglove_msgs.ImageAnnotations". Any other LCM type whose generated decoder raises AttributeError (e.g., a future broken Foxglove or third-party type) will still propagate and crash the subscriber session — defeating the stated goal of "hardening." Consider whether the pattern should apply to all AttributeError from lcm_decode, or at least document the deliberate scope to avoid future maintainers widening it incorrectly.
Issue #1670 PR Draft: Behavior-Tree Pick/Place Integration
Problem Statement
The current pick/place path in
PickAndPlaceModuleis usable, but the controlflow is mostly monolithic. Perception, grasp selection, planning, execution,
gripper checks, retry behavior, and recovery are folded into one procedural
skill path.
That makes real-robot debugging harder. When a pick fails, the final return
string often does not tell us cleanly which stage failed:
Issue #1670 asks for a stronger manipulation pipeline by combining:
PickAndPlaceModuleinterface.The goal of this PR is not to introduce a second manipulation stack. The goal is
to keep the existing public skills and make the internals more structured,
testable, and debuggable.
What This PR Does
This PR integrates
py_treesbehavior-tree orchestration inside the existingPickAndPlaceModule.The public agent-facing API remains unchanged:
scan_objectspickplaceplace_backpick_and_placeget_scene_infoInternally:
pick()builds and ticks a pick behavior tree.place()builds and ticks a place behavior tree.bt_enable_graspgen=Falsecan be used locally for hardware bring-up toisolate BT motion/retry/gripper behavior from Docker/model/pointcloud grasp
generation.
expected by existing agents.
Default behavior is GraspGen-first:
This matters because the normal PR behavior still matches the issue goal:
Why This Architecture
The previous BT direction introduced a separate public BT manipulation module
and separate blueprint path. That made the review and runtime story larger:
PickAndPlaceModule.This PR intentionally avoids that split.
The behavior tree is an internal scheduler. BT leaves call the existing module
primitives directly:
refresh_obstacles_find_object_in_detectionsgenerate_graspsplan_to_pose_preview_execute_wait_set_gripper_positioncancelresetThis keeps the surface area stable for agents while making the internal pick and
place phases explicit.
How It Works
Files Added
Files Modified
Pick Skill Flow
At the public skill level, the shape is still simple:
The behavior tree expands that into named phases:
Grasp Generation
When
bt_enable_graspgen=True, the tree first tries:If that branch fails or returns no poses, the selector falls back to:
When
bt_enable_graspgen=False, the tree skips the pointcloud and GraspGennodes and goes directly to heuristic candidate generation. This is useful for
hardware bring-up because it lets us validate the BT scan/filter/plan/execute
and gripper-verification path without blocking on Docker/model/runtime GraspGen
issues.
Place Skill Flow
The place tree is smaller:
This keeps place behavior explicit while still using the existing module
planning and execution helpers.
Blackboard Contract
The BT uses the py_trees blackboard for per-run state:
target_objectobject_pointcloudscene_pointcloudgrasp_candidatesgrasp_indexcurrent_grasppre_grasp_poselift_poseplace_posepre_place_posegrasp_sourceerror_messageresult_message_tick_bt_treeresets these keys before each run so stale state does not leakbetween pick/place calls.
Supporting Runtime Fixes
While exercising the real xArm + perception path, several small blockers showed
up. They are included because they directly affect whether reviewers/operators
can run the stack.
PointCloud2 pickling
PointCloud2.__getstate__now drops cached Open3D bounding boxes beforepickling:
This fixes ObjectDB/pLCM publication after a pointcloud has lazily computed
Open3D bounding boxes.
Rerun ImageAnnotations decode hardening
LCM decode now converts only the known broken
foxglove_msgs.ImageAnnotationsgenerated-decoderAttributeErrorintoDecodingError. That lets the subscriber skip bad annotation frames instead ofkilling the Rerun session.
Rerun throttling
xarm_perceptionnow uses:This caps heavy
ImageandPointCloud2streams to roughly 2 Hz per entitypath, which made Rerun stable enough for real xArm perception bring-up.
Focused test support
DIMOS_SKIP_AUTOCONF=1lets focused unit tests run in environments whereLCM/multicast autoconfiguration is not available.
Validation
Automated Validation
The main BT validation is in:
These tests intentionally use fake module objects, so they do not require robot
hardware, Docker, GraspGen models, pointcloud RPCs, or LCM.
They validate:
RetryOnFailureretries a failed child and succeeds on the next tick.EnsureReady,PickWithRescan,StorePickPosition,SetResult.bt_enable_graspgen=Falseremoves GraspGen and pointcloud nodes while keepingheuristic candidate generation.
grasp_source="heuristic".empty-closed gripper states.
value is configured.
This gives reviewers a fast way to validate the BT control-flow behavior
without needing the real xArm stack.
Focused validation after cleanup:
Result:
Compile:
Result:
Focused tests:
Result:
Real Hardware Validation
The BT-backed pick path was exercised on the real xArm stack through grasp
candidate generation, pre-grasp planning/execution, current-grasp
planning/execution, and gripper verification. The physical grasp failed, but
the BT correctly localized the failure at VerifyGrasp. Remaining work is grasp
calibration and reliability tuning, not basic BT integration.
Best log excerpt from the xArm run:
What this proves:
GenerateHeuristicGraspsBT leaf ran.ComputePreGraspBT leaf ran.PlanToPoseran forpre_grasp_pose.PlanToPoseran again forcurrent_grasp.VerifyGraspran and correctly failed the pick because the gripper was empty.Interpretation:
How BT Helped During Hardware Testing
The hardware logs show the main debugging benefit of BT: failures are localized
to named stages.
1. Missing OSR pointcloud RPC and workspace filter failure
What this told us:
GetObjectPointcloud.GenerateHeuristicGrasps.FilterGraspWorkspace, not perception or planning.2. IK failure at pre-grasp
What this told us:
PlanToPoseforpre_grasp_pose.3. Collision-at-start during planning
What this told us:
4. Workspace distance guardrail prevented unsafe motion
What this told us:
reachable workspace.
5. Object label/snapshot mismatch
What this told us:
HasTargetDetectionpath that prefers anexisting target snapshot instead of refreshing away a just-scanned label.
6. Physical grasp failure localized to gripper verification
What this told us:
calibration.
Summary:
What Is Left
This PR lands the architecture and the first real-stack validation. The next
work should focus on reliability.
1. Tune BT branches for reliable pick/place
Needed work:
heuristic-only hardware bring-up.
Goal:
2. Benchmark against current main
After the hardware path is stable enough, compare:
Suggested benchmark:
Metrics:
Example table:
3. Add structured error reporting
BT gives named phases now. A future PR can add stable machine-readable fields:
Examples:
OBJECT_NOT_FOUNDNO_POINTCLOUDGRASPGEN_EMPTYALL_GRASPS_FILTEREDPLAN_FAILED_PRE_GRASPEXEC_FAILEDGRASP_VERIFY_FAILEDBT_TIMEOUTThis should be added next to the existing human-readable
error_message, notinstead of it.
Final Reviewer Summary
This PR is the first clean integration slice for issue #1670.
It does:
PickAndPlaceModuleskill API,and gripper verification.
The remaining work is tuning and benchmarking, BT is successfully integrated in the workflow.