Skip to content

Stop importing object_library from calling the Lightwheel API#741

Open
cvolkcvolk wants to merge 5 commits into
mainfrom
cvolk/fix/lightwheel-lazy-asset-paths
Open

Stop importing object_library from calling the Lightwheel API#741
cvolkcvolk wants to merge 5 commits into
mainfrom
cvolk/fix/lightwheel-lazy-asset-paths

Conversation

@cvolkcvolk
Copy link
Copy Markdown
Collaborator

@cvolkcvolk cvolkcvolk commented May 29, 2026

Summary

Importing isaaclab_arena.assets.object_library — anyone instantiating an Arena asset, Lightwheel or not — made six unconditional HTTPS calls to the Lightwheel API at module-import time.

This PR introduces LightwheelLazyPath: a class-attribute descriptor that defers the SDK call to first access.

Copy link
Copy Markdown
Contributor

@isaaclab-review-bot isaaclab-review-bot Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IsaacLab Review Bot Summary

PR #741: Defer Lightwheel SDK calls in asset classes to first-access

Overview

This PR introduces a LightwheelLazyPath descriptor to defer Lightwheel SDK calls from module import time to first attribute access. This is a well-motivated fix for a production issue where transient SDK timeouts during module reload would leave asset classes unregistered.

What I Reviewed

  • New file: isaaclab_arena/assets/lightwheel_lazy.py (117 lines)
  • Modified file: isaaclab_arena/assets/object_library.py (6 asset classes updated)

Findings

✅ Strengths

  1. Clean descriptor pattern: The LightwheelLazyPath descriptor correctly implements __get__ for lazy evaluation and caching.

  2. Good error handling: SDK failures are wrapped with contextual information (asset identifier, registry type), making debugging easier than the previous bare SDK exceptions.

  3. Minimal behavioral change: Existing users see no change — paths still resolve via the SDK, just at first-access instead of import-time.

  4. Well-documented: Excellent docstrings explaining the problem, solution, and usage patterns.

  5. Correct use of both SDK signatures: Properly supports both file_name= and registry_name= variants as used by the different asset classes.

📝 Minor Observations (Not Blocking)

  1. Thread safety: The lazy initialization in __get__ has a benign race condition — concurrent first-accesses could both call the SDK before _cached_path is set. This is acceptable here since:

    • Both calls would return identical results
    • Worst case is duplicate network calls, not incorrect state
    • Adding thread locks would increase complexity for minimal benefit in this context

    If thread safety ever becomes a concern (e.g., heavily parallel asset loading), a simple threading.Lock could be added.

  2. Type annotation: The return type of __get__ could be explicitly annotated as str for clarity, though this is stylistic.

Verdict

Looks good to merge. The implementation is clean, well-documented, and correctly addresses the stated problem. The architectural decision to use a descriptor is appropriate and Pythonic.


🤖 Automated review by IsaacLab Review Bot


Update (f2c94bc): Reviewed incremental changes between 35f0a18..f2c94bc.

Additional Changes Since Last Review

This update significantly expands the PR scope beyond the original Lightwheel lazy loading:

1. New Lightwheel Utilities (Related to PR Title)

  • lightwheel_utils.py: Adds acquire_lightwheel_asset() with retry logic and timeout handling for SDK calls. Good defensive programming for network-based asset fetching.
  • background_library.py: Updated RobocasaKitchen to use the new acquire_lightwheel_asset() wrapper.

2. Composite Task System (Major Addition)

  • composite_task_base.py (~400 lines): New base class for unordered composite tasks where subtask completion order does not matter.
  • sequential_task_base.py: Refactored to extend CompositeTaskBase, now enforces ordering constraints by overriding the success function.
  • Renames internal state variable _subtask_success_state_subtask_ever_succeeded for clarity.
  • Adds support for desired_subtask_success_state with None entries as "do not care" wildcards.

3. Relation Solver Refactoring

  • relation_solver_interface.py (~166 lines): Extracts relation-solving logic from ArenaEnvBuilder into a standalone module for reusability.
  • arena_env_builder.py: Significantly simplified by delegating to the new interface.

4. Documentation Updates

  • Multiple .rst files updated for XR/CloudXR teleoperation workflows.
  • New images for second viewport monitoring.

5. Test Coverage

  • New tests for CompositeTaskBase, SequentialTaskBase, LightwheelLazyPath, and relation_solver_interface.
  • Tests renamed from "pooled_placer" to "pooled_object_placer" for consistency.

Observations

  1. Scope creep: The PR title says "Defer Lightwheel SDK calls" but now includes a substantial task system refactor and documentation updates. Consider splitting into separate PRs for cleaner history.

  2. The Lightwheel changes are solid: The acquire_lightwheel_asset() wrapper with retry/timeout logic is a good addition that complements the lazy descriptor.

  3. CompositeTaskBase looks well-designed: Clean separation between ordered (sequential) and unordered composite task semantics.

Verdict

Changes look good technically. The code is well-tested and the refactoring improves modularity. The main concern is scope — this PR now contains three distinct features. If the team is okay with the combined scope, it is ready to merge.


🤖 Automated review by IsaacLab Review Bot


Update (47c6d74): Reviewed incremental changes between f2c94bc..47c6d74.

Changes in This Update

This commit simplifies the LightwheelLazyPath descriptor API and adds explicit file_type parameters:

1. API Simplification in lightwheel_lazy.py

The descriptor constructor is now a simple **kwargs passthrough:

  • Before: Explicit registry_type, file_name, registry_name, file_type parameters with validation
  • After: __init__(self, **acquire_by_registry_kwargs) — just stores and forwards kwargs directly

This is a nice simplification that:

  • Removes coupling between the descriptor and SDK parameter names
  • Makes adding new SDK parameters easier (no descriptor changes needed)
  • Trades compile-time argument checking for runtime SDK validation

Trade-off note: The previous version had an assert ensuring exactly one of file_name or registry_name was provided. This validation is now deferred to the SDK call. Both approaches are valid.

2. Explicit file_type="USD" in object_library.py

All LightwheelLazyPath usages now explicitly specify file_type="USD":

  • Microwave, CoffeeMachine, Broccoli, SweetPotato, Jug, BeerBottle

Previously file_type="USD" was the default in the descriptor; now that defaults are removed, it must be explicit. This is cleaner — no hidden defaults.

3. Import Ordering Fix

Sorted imports in object_library.pyLightwheelLazyPath import moved to alphabetical position.

Verdict

Good incremental improvement. The kwargs-passthrough pattern is cleaner and more maintainable. Explicit file_type parameters improve readability. No concerns with this update.


🤖 Automated review by IsaacLab Review Bot

Six asset classes in ``object_library`` (Microwave, CoffeeMachine, Broccoli,
SweetPotato, Jug, BeerBottle) resolved their USD paths via
``object_loader.acquire_by_registry(...)`` in the class body. Class bodies
are evaluated at module import time, so importing or reloading
``object_library`` made blocking HTTP calls to the Lightwheel API.

The SDK has a 10-second read timeout. If any one call timed out (or hit
any other transient error), the class body raised, the module import or
reload failed halfway, and every asset class defined later in the same
file was left undefined and unregistered. We hit this in a long
multi-job eval sweep: a transient Lightwheel stall during
``importlib.reload(object_library)`` broke ``RubiksCubeHot3DRobolab``'s
registration alongside the actual culprit, and subsequent jobs crashed
with a misleading "component rubiks_cube_hot3d_robolab not found" error.

Introduce ``LightwheelLazyPath``: a class-attribute descriptor that
defers the SDK call from class-body evaluation to first attribute
access. Resolved paths are cached per descriptor instance, so the SDK is
called at most once per asset per process. Failures are re-raised at
use-time with the asset's identifier, so a future broken asset surfaces
where it's being used rather than corrupting unrelated registrations
elsewhere.

After this change, ``object_library`` imports without network access,
all ``@register_asset`` decorators run, and the registry is fully
populated regardless of Lightwheel API health. Existing users of these
six assets see no behavioral change — first access still resolves the
path, just slightly later in the lifecycle.

Signed-off-by: Clemens Volk <cvolk@nvidia.com>
PR #731 added a shared ``acquire_lightwheel_asset`` helper that wraps
``object_loader.acquire_by_registry`` calls with a longer scoped timeout
(60 s vs. the SDK default 10 s) and a small retry budget for transient
timeout failures. After the lazy descriptor lands, the moment Lightwheel
USD paths are actually fetched moves from class-body evaluation to first
attribute access, but the call itself still benefits from the resilience
work. Route ``LightwheelLazyPath.__get__`` through the helper so the two
fixes compose: zero network at import, retry-on-timeout on actual use.

The descriptor still wraps the resolved call in a ``RuntimeError`` that
names the failing asset, so caller-facing error messages remain specific
even when retries have been exhausted.

Signed-off-by: Clemens Volk <cvolk@nvidia.com>
@cvolkcvolk cvolkcvolk force-pushed the cvolk/fix/lightwheel-lazy-asset-paths branch from 35f0a18 to 4152e55 Compare May 29, 2026 13:26
Move the query-kwargs assembly and retry-helper description string from
``__get__`` into ``__init__`` — they only depend on constructor arguments
and don't need to be recomputed on every attribute access. Drop the
``try / except → RuntimeError`` wrapper: the SDK already includes the
registry parameters in its own error message, and ``acquire_lightwheel_asset``
already logs the asset description in its retry-attempt lines, so the
wrap added no information that wasn't already in the failure-path logs.

``__get__`` is now just a cache check, the lazy SDK import, and the
retry-helper call.

Signed-off-by: Clemens Volk <cvolk@nvidia.com>
Drop the module-level docstring (history, before/after example, rationale)
and reduce the class docstring to a one-liner. The PR description carries
the bug-discovery write-up and the rationale; the code itself is small
enough that the new one-line summary is sufficient.

Signed-off-by: Clemens Volk <cvolk@nvidia.com>
Drop the named ``registry_type``/``file_name``/``registry_name``/
``file_type`` parameters on ``LightwheelLazyPath`` along with the
assert that enforced exactly one of file_name/registry_name. The
descriptor now accepts ``**acquire_by_registry_kwargs`` and forwards
them verbatim to the SDK; mis-use surfaces from the SDK at first
access rather than from a parallel guard.

Restore the explicit ``file_type="USD"`` argument at the six call sites
in ``object_library.py`` so the descriptor no longer needs a default to
keep them short.

Signed-off-by: Clemens Volk <cvolk@nvidia.com>
@cvolkcvolk cvolkcvolk changed the title Defer Lightwheel SDK calls in asset classes to first-access Stop importing object_library from calling the Lightwheel API May 29, 2026
@cvolkcvolk cvolkcvolk marked this pull request as ready for review May 29, 2026 13:52
@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps Bot commented May 29, 2026

Greptile Summary

This PR removes six unconditional Lightwheel HTTPS calls that were triggered on object_library import by introducing LightwheelLazyPath, a non-data descriptor that defers each SDK call to the first attribute access and caches the result.

  • lightwheel_lazy.py adds the new descriptor; it imports lightwheel_sdk lazily inside __get__ so the SDK is not required at import time.
  • object_library.py replaces the six eager acquire_lightwheel_asset class-body calls with usd_path = LightwheelLazyPath(...) assignments, and drops the now-unused from lightwheel_sdk.loader import object_loader class-level imports.

Confidence Score: 4/5

Safe to merge for the primary goal of eliminating import-time API calls; the main instantiation path works correctly.

The core logic is sound: the descriptor fires on first self.usd_path access in LibraryObject.__init__, the resolved string is then stored as an instance attribute by Object.__init__, and the class-level cache avoids repeat calls across instances. Two edge cases in LightwheelLazyPath are worth addressing: class-level attribute access (Microwave.usd_path) unexpectedly triggers an API call because instance is None is not guarded, and using None as the cache sentinel creates ambiguity if the SDK ever returns None without raising.

isaaclab_arena/assets/lightwheel_lazy.py — the descriptor's __get__ method needs the instance is None guard and a more robust cache sentinel.

Important Files Changed

Filename Overview
isaaclab_arena/assets/lightwheel_lazy.py New non-data descriptor for lazy Lightwheel path resolution; missing instance is None guard causes class-level access to trigger API calls, and None sentinel for caching could mask an unresolved state.
isaaclab_arena/assets/object_library.py Six Lightwheel-backed assets now use LightwheelLazyPath descriptor instead of eagerly resolving at class-definition time; object_name and metadata from the old tuple-unpack are silently discarded but appear unused in the codebase.

Sequence Diagram

sequenceDiagram
    participant C as Caller
    participant M as Microwave.__init__
    participant L as LibraryObject.__init__
    participant D as LightwheelLazyPath.__get__
    participant SDK as lightwheel_sdk
    participant O as Object.__init__

    C->>M: Microwave()
    M->>L: super().__init__(...)
    L->>D: self.usd_path (descriptor access)
    alt _cached_path is set
        D-->>L: return cached string
    else first access
        D->>SDK: acquire_lightwheel_asset(...)
        SDK-->>D: (file_path, _, _)
        D->>D: "self._cached_path = file_path"
        D-->>L: return file_path
    end
    L->>O: "super().__init__(usd_path=file_path, ...)"
    O->>O: "self.usd_path = file_path (instance attr)"
    O-->>C: "instance ready (instance.usd_path = string)"
Loading

Reviews (1): Last reviewed commit: "Make LightwheelLazyPath a pure kwargs pa..." | Re-trigger Greptile

Comment on lines +18 to +19
def __get__(self, instance, owner):
if self._cached_path is not None:
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Missing instance is None guard in __get__

Without a check for instance is None, accessing usd_path directly on the class (Microwave.usd_path) triggers a live Lightwheel API call. This breaks the lazy-loading contract for any code that introspects the class attribute — e.g., registry tooling, serialization, or a simple cls.usd_path is not None guard — all of which will now make network requests. The standard descriptor convention is to return self when instance is None.

Comment on lines +14 to +16
def __init__(self, **acquire_by_registry_kwargs):
self._kwargs = acquire_by_registry_kwargs
self._cached_path: str | None = None
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 None used as cache sentinel is fragile

_cached_path is initialized to None and the guard is if self._cached_path is not None. If the SDK ever returns None for a valid (but empty/missing) path without raising, the value would be cached as None but the guard would miss it on every subsequent access, causing repeated API calls. lightwheel_utils.py already uses a _MISSING = object() sentinel for exactly this pattern — applying the same approach here avoids the ambiguity between "not yet resolved" and a genuinely None result.

Suggested change
def __init__(self, **acquire_by_registry_kwargs):
self._kwargs = acquire_by_registry_kwargs
self._cached_path: str | None = None
_UNSET = object() # sentinel distinct from None
def __init__(self, **acquire_by_registry_kwargs):
self._kwargs = acquire_by_registry_kwargs
self._cached_path: str | object = LightwheelLazyPath._UNSET

Comment on lines +18 to +20
def __get__(self, instance, owner):
if self._cached_path is not None:
return self._cached_path
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Cache guard must also be updated to use the sentinel

This is the companion change to replacing None with _UNSET as the sentinel (see the __init__ comment above).

Suggested change
def __get__(self, instance, owner):
if self._cached_path is not None:
return self._cached_path
def __get__(self, instance, owner):
if instance is None:
return self
if self._cached_path is not LightwheelLazyPath._UNSET:
return self._cached_path

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants