v0.3 Add EnvGenAgent for agentic env gen#718
Conversation
Resolver matches object/robot proposal from LLM (based on name/tag) to the exact entries in the arena regestry. It outputs an ArenaEnvGraphSpec with the nodes, tasks and initial state scene graph filled in based on LLM output + resolver match.
… always called when loading from yaml/dict
There was a problem hiding this comment.
Review Summary
Thanks for adding the LLM-based environment generation pipeline! The modular design with clear separation between schema definition, LLM interaction, and deterministic resolution is well thought out. The trace-based debugging will be valuable for understanding resolution decisions.
Suggestions
1. Replace assertions with explicit exceptions for runtime validation
File: isaaclab_arena/llm_env_gen/llm_agent.py (lines 57-58)
The API key validation uses assert, which can be stripped with Python's -O flag:
assert self.api_key, "API key required: set NV_API_KEY or pass api_key."Consider using an explicit exception:
if not self.api_key:
raise ValueError("API key required: set NV_API_KEY or pass api_key.")2. Improve JSON extraction error handling
File: isaaclab_arena/llm_env_gen/llm_agent.py (lines 148-161)
The _extract_json method uses contextlib.suppress(json.JSONDecodeError) before attempting manual parsing. If both paths fail, it raises AssertionError. Consider:
- Raising a more descriptive custom exception (e.g.,
LLMResponseParseError) - Including the raw response in error messages for debugging
class LLMResponseParseError(ValueError):
"""Raised when the LLM response cannot be parsed as JSON."""
@staticmethod
def _extract_json(content: str) -> dict:
content = content.strip()
# ... extraction logic ...
raise LLMResponseParseError(f"Could not extract JSON from LLM response: {content[:200]}...")3. Consider adding retry logic for transient API failures
File: isaaclab_arena/llm_env_gen/llm_agent.py
LLM API calls can fail transiently due to rate limits or network issues. Consider adding retry logic with exponential backoff, or at minimum documenting that callers should implement retries:
from tenacity import retry, stop_after_attempt, wait_exponential
@retry(stop=stop_after_attempt(3), wait=wait_exponential(multiplier=1, min=2, max=10))
def generate_spec(self, ...):
...4. Add test coverage for LLMAgent
File: isaaclab_arena/tests/test_resolver.py
The resolver tests are thorough with the FakeAssetRegistry pattern. Consider adding similar tests for LLMAgent:
- Test
_extract_jsonwith various edge cases (markdown fences, nested JSON, malformed responses) - Test schema validation rejections
- Mock the OpenAI client to test the overall flow without making real API calls
5. Document the silent failure behavior in resolver
File: isaaclab_arena/llm_env_gen/resolver.py
The design choice to never raise on LLM mistakes (recording trace events instead) is documented in the class docstring, which is good. Consider adding a helper method or property to check if resolution was "clean":
@property
def has_resolution_errors(self) -> bool:
"""Returns True if any trace events indicate failed resolution."""
error_stages = {"item.miss", "name.miss", "task.unknown_subject", "task.unknown_target"}
return any(e.stage in error_stages for e in self.trace)Minor Notes
- The
try_schema.pyscript is useful for manual testing but consider adding it to.gitignoreor documenting it as a development-only tool - Good use of
__post_init__inArenaEnvGraphSpecto centralize invariant checking
Overall this is a solid foundation for the LLM environment generation feature.
Update (commit 3336f26): The PR has been substantially refactored since my initial review.
Status of Previous Suggestions
-
✅ Replace assertions with explicit exceptions — Fixed. The new
EnvGenAgentuses explicitraise ValueErrorinstead ofassertfor the API key check. -
✅ Improve JSON extraction error handling — Fixed. The new
AgentResponseParseError(ValueError)custom exception is exactly what I suggested, including raw response preview (capped at 500 chars for readability). -
ℹ️ Retry logic — Not implemented, but this was a nice-to-have suggestion. Callers can wrap as needed.
-
✅ Add test coverage for LLMAgent/EnvGenAgent — Fixed. The new
test_env_gen_agent.py(399 lines) comprehensively covers:__init__argument/env-var precedence and missing-key guard_extract_jsonedge cases (fenced, prose, malformed, unbalanced braces, truncated error messages)generate_spec/pingwith mocked OpenAI client- System prompt coverage for RelationKind/TaskKind literals
-
⚪ Resolver silent failure documentation — N/A. The entire resolver module was removed in this refactor. The PR now focuses on the agent layer; resolution is presumably handled elsewhere.
New Changes Observed
- Module renamed:
llm_env_gen→agentic_env_gen - Schema renamed:
SceneSpec→EnvIntentSpec(with newreasoningfield for chain-of-thought) - Added CI job
test_agent_remote_e2ewith proper secret validation - Added
pydantic>=2.0as hard dependency in setup.py - Removed
to_dict()/to_yaml()fromArenaEnvGraphSpec— validation now runs only infrom_dict() - Added
at_posetoRelationKindliteral
No new issues found in the incremental changes. The refactor improves the codebase significantly.
Update (commit fb42596): Major refactor to OpenAI-compatible structured outputs.
Changes Observed
-
Structured outputs migration: The agent now uses
response_format={"type": "json_schema", ...}with strict mode, eliminating the need for prose-parsing fallback. This is a significant reliability improvement — the wire guarantees a valid JSON envelope. -
New
structured_output_utils.pymodule: Well-designed extraction of reusable utilities:build_strict_schema()/apply_strict_constraints()— munges pydantic schemas for OpenAI/Bedrock strict mode (addsadditionalProperties: false, populatesrequired, stripsdefaultkeys)extract_response_text()— handles NVIDIA DeepSeek'sreasoning_contentchannel quirkcheck_structured_output_support()— deployment validator probe with rich diagnostics
-
Removed code:
_extract_json()andAgentResponseParseErrorremoved — no longer needed since structured outputs guarantee the JSON envelope. Clean deletion. -
Constructor-time fail-fast: Two-stage validation on
EnvGenAgent.__init__:- First: cheap
ping()to verify connectivity/auth - Then: full
check_structured_output_support()probe with the real schema
This surfaces capability failures at construction time rather than mid-pipeline.
- First: cheap
-
Test coverage: Comprehensive —
test_structured_output_utils.py(421 lines) covers:- Schema munging for strict mode (``, nested objects, idempotency)
- Channel extraction fallback (
content→reasoning_content→empty) - Probe failure modes (4xx API error, empty envelope, parse error, validation error)
- Live endpoint test gated by
@pytest.mark.agent_remote_e2e
Notes
- The
json.loads(text, strict=False)call ingenerate_specis a good workaround for DeepSeek-v4-flash's unescaped control characters — documented via inline comment. - Good defensive caching of
self._spec_schemato avoid re-walking the schema on every call. - The system prompt was correctly trimmed to remove the now-redundant "emit ONLY JSON" instruction.
No new issues found. This is a clean, well-tested migration to structured outputs.
Update (commit f92fa73): Two defensive improvements.
-
Allow empty tasks list in
EnvIntentSpec: Removed the@model_validatorthat enforced non-empty tasks. Empty tasks now legitimately maps toNoTaskfor static sandbox environments. The field description clearly documents this behavior. -
Handle empty
choicesarray in structured output validation: Added defensive check incheck_structured_output_support()for when providers (Azure content-filter, Bedrock guardrails) return HTTP 200 with an emptychoiceslist — preventingIndexErrorand surfacing it cleanly as aparse_errorwithresponse_route="empty".
Both changes include test updates. No issues found.
Update (commit bd4293b): Simplification of the probe API + test flakiness mitigation.
Changes Observed
-
Simplified
check_structured_output_supportAPI: Replaced theStructuredOutputSupportdataclass with a clean bool-or-raise contract:- Returns
Trueon successful round-trip - Raises
RuntimeErrorwith multi-line diagnostic on any failure mode - The diagnostic includes:
response_route,finish_reason,cause, andsample_payload
This is a cleaner design — callers no longer need to inspect dataclass fields, and the probe's exception message is immediately grep-able from CI logs.
- Returns
-
Exception chaining (
raise ... from exc): The original SDK / parser exception is preserved on__cause__, so tracebacks retain full context (HTTP status, validation errors, etc.) while the outermostRuntimeErrorprovides the structured diagnostic. -
_format_failure_messagehelper: Centralizes the multi-line diagnostic format. Good separation — keeps the main function clean. -
Flaky test markers added:
@pytest.mark.flaky(max_runs=3, min_passes=1)on both live tests to absorb transient wire-level hiccups (429s, blank content from DeepSeek, etc.) without failing CI. Real breakage will still fail all 3 attempts. -
TODO comments added: Three TODOs documenting the need for exponential backoff + jitter retry logic for transient errors. Good breadcrumb for the follow-up work.
Notes
- The constructor-side wrapping in
EnvGenAgent.__init__is now cleaner — the probe raises the diagnostic directly, no caller-side message construction needed. - Test coverage updated to match the new contract: asserts
is True, checks forRuntimeError+ model name in message +__cause__chaining. - The
StructuredOutputSupportdataclass and its import are cleanly removed.
No issues found. This is a good simplification that improves the developer experience without losing diagnostic fidelity.
Update (commit 0a3c30a): Two minor follow-up improvements.
-
TODO breadcrumb in
env_gen_agent.py: Added comment noting that optional lights and HDR images need handling in the catalog builder. Good housekeeping to track future work. -
Defensive
ping()guard instructured_output_utils.py: Extended the same empty-choicesguard fromcheck_structured_output_support()to the simplerping()function. Nowping()raises a clearRuntimeError(with model name baked in) instead of an opaqueIndexErrorwhen providers return HTTP 200 with an emptychoiceslist. Docstring updated to document the new exception. Test added (test_raises_runtime_error_when_choices_list_is_empty) verifying the guard and that the model name surfaces in the error message.
No issues found. Consistent defensive coding across both probe functions.
Update (commit 6dbeda4): CI simplification — agent tests consolidated into main job.
Changes Observed
-
Removed separate
test_agent_remote_e2eCI job: The dedicated job that ran agent E2E tests has been removed. These tests now run as part of the maintest_runner_and_envjob. -
NV_API_KEYmoved to main job: The secret is now set at the top of the main test job, making it available to all tests. -
Removed
agent_remote_e2emarker: The pytest marker has been removed from bothtest_generate_spec_against_live_endpoint()andtest_default_model_supports_structured_output(). These tests now run with the standard in-process test suite (still gated by the@pytest.mark.flakydecorator for transient failure tolerance). -
pytest.ini cleanup: Removed the
agent_remote_e2emarker definition.
Assessment
This is a sensible simplification that reduces CI complexity without losing coverage. The agent tests still run (now integrated into the main job), and the flaky markers continue to handle transient API failures. No new issues found.
Update (commit f9ecfdf): Complete reintroduction of agentic_env_gen module with structured outputs.
Changes Observed
This commit appears to be a rebased/squashed version of the iterative development, introducing the full agentic environment generation feature:
-
New
agentic_env_genmodule with four files:env_gen_agent.py—EnvGenAgentclass that parses NL prompts intoEnvIntentSpecusing OpenAI-compatible structured outputsenv_intent_spec.py— Pydantic schema for structured env intent (reasoning, background, embodiment, items, relations, tasks)structured_output_utils.py— Reusable utilities for strict-mode schema munging,ping(),check_structured_output_support()try_schema.py— CLI tool for manual testing
-
CI improvements:
- Added
NV_API_KEYenvironment variable fromARENA_NV_API_KEYsecret to the main test job - Added explicit verification step that fails fast with
::error::if the secret is not configured - Updated
docker/run_docker.shto pass throughNV_API_KEYwhen set on the host
- Added
-
Comprehensive test coverage (3 new test files):
test_env_gen_agent.py— Tests forEnvGenAgentinit, generate_spec, and live endpointtest_env_intent_spec.py— ValidatesRelationKindis a subset of engine spatial constraintstest_structured_output_utils.py— Tests for schema munging, ping, and probe failure modes
-
Dependencies: Added
openaiandpydantic>=2.0tosetup.py
Design Notes
-
Chain-of-thought via
reasoningfield: TheEnvIntentSpecschema places areasoningfield first, forcing the LLM to emit its analysis before committing to structured fields. Good technique for improving output quality. -
Strict-mode schema munging:
build_strict_schema()correctly handles OpenAI's requirements (additionalProperties: false, all fields required, no defaults). -
NVIDIA DeepSeek quirk handling:
extract_response_text()checks bothcontentandreasoning_contentchannels — documented workaround for DeepSeek-v4-flash's non-standard response location. -
Lenient JSON parsing: Uses
json.loads(text, strict=False)to tolerate unescaped control characters (tabs/newlines) that DeepSeek-v4-flash sometimes emits. -
Flaky markers on live tests:
@pytest.mark.flaky(max_runs=3, min_passes=1)appropriately handles transient API failures without hiding real breakage.
Potential Considerations
-
params: dictinRelationmodel: The TODO comment notes this may cause issues with strict-mode endpoints that rejectadditionalProperties: true. Currently works with NVIDIA DeepSeek but is a portability landmine for OpenAI/Bedrock. Consider making this adict[str, str | float | int | bool | None]with explicit typing. -
No retry logic on API calls: The
TODO(qianl)comments note this is planned but not yet implemented. For production use, consider addingtenacityor similar retry wrapper. -
try_schema.pydevelopment tool: Consider adding to.gitignoreor documenting it as development-only in a README.
No blocking issues found. The implementation is well-structured with good defensive coding and test coverage.
Greptile SummaryThis PR introduces the first stage of the agentic environment-generation pipeline:
Confidence Score: 4/5Safe to merge for internal use; one edge-case in generate_spec (empty choices from the API) was acknowledged in prior review discussion as out-of-scope for this iteration. The core LLM pipeline is well-structured: strict-schema munging is correct and idempotent, response routing handles NVIDIA DeepSeek's non-standard channel, and empty-choices guards are present in ping and check_structured_output_support. The one remaining fragility — resp.choices[0] in generate_spec is still unchecked — was explicitly carried forward from a prior review round as a known, acknowledged edge case rather than an oversight. The CI verification step hard-exits the whole job when the secret is absent, which blocks even the mocked unit tests on any runner not yet configured with the key.
Important Files Changed
Sequence DiagramsequenceDiagram
participant User
participant EnvGenAgent
participant OpenAI_Endpoint as OpenAI-compatible Endpoint
participant EnvIntentSpec
User->>EnvGenAgent: __init__(api_key, model, base_url)
EnvGenAgent->>OpenAI_Endpoint: ping() — smoke test
OpenAI_Endpoint-->>EnvGenAgent: "OK"
EnvGenAgent->>OpenAI_Endpoint: check_structured_output_support(EnvIntentSpec schema)
OpenAI_Endpoint-->>EnvGenAgent: probe JSON response
EnvGenAgent->>EnvIntentSpec: model_validate(probe)
EnvIntentSpec-->>EnvGenAgent: validation OK
User->>EnvGenAgent: generate_spec(prompt, catalog_text)
EnvGenAgent->>EnvGenAgent: build_catalog_text() / use provided catalog
EnvGenAgent->>OpenAI_Endpoint: "chat.completions.create(response_format=json_schema)"
OpenAI_Endpoint-->>EnvGenAgent: structured JSON envelope
EnvGenAgent->>EnvGenAgent: extract_response_text() — content or reasoning_content
EnvGenAgent->>EnvIntentSpec: json.loads + model_validate
EnvIntentSpec-->>EnvGenAgent: EnvIntentSpec instance
EnvGenAgent-->>User: (EnvIntentSpec, raw_text)
Reviews (14): Last reviewed commit: "Test for env_intent relation is subset o..." | Re-trigger Greptile |
| from .env_intent_spec import EnvIntentSpec | ||
| from .structured_output_utils import build_strict_schema, check_structured_output_support, extract_response_text, ping | ||
|
|
||
| DEFAULT_BASE_URL = "https://inference-api.nvidia.com" |
There was a problem hiding this comment.
Just curious, is this public? AFAIK, it's not. And what's the plan for external users to link to their own hub? I'm assuming as long as with OpenAI client protocol, it shall be safe right?
| (wrong model), ``APIConnectionError`` (unreachable | ||
| endpoint), or ``RateLimitError`` (quota exhausted). | ||
| """ | ||
| from openai import OpenAI |
There was a problem hiding this comment.
Can we avoid lazy import which Claude likes it sadly?
There was a problem hiding this comment.
the lazy import here is actually intentional. otherwise the mock openai in isaaclab_arena/tests/test_env_gen_agent.py using
with patch("openai.OpenAI") as mock_cls
would break.
There was a problem hiding this comment.
Ah, makes sense for the test mocking. Thanks for clarifying! ✅
| return choices[0].message.content or "" | ||
|
|
||
|
|
||
| def extract_response_text(message: Any) -> tuple[str, str]: |
There was a problem hiding this comment.
Why not the Agent's member func?
There was a problem hiding this comment.
it's used be check_structured_output_support below which I'd also like to keep as an common util function between (potential) agent classes.
There was a problem hiding this comment.
Got it — keeping it as a standalone util makes sense. 👍
| # "in" has no In class in isaaclab_arena.relations.relations yet — see the | ||
| # TODO there. The downstream env builder materializes goal-state "in" | ||
| # relations as the task's success predicate. | ||
| RelationKind = Literal["on", "in", "next_to", "at_position", "at_pose", "is_anchor"] |
There was a problem hiding this comment.
I'm adding a object relation registry, maybe you can leverage its subset. https://github.com/isaac-sim/IsaacLab-Arena/pull/736/changes
There was a problem hiding this comment.
Ideally we'd not maintain a list that is constrained to match another list somewhere else in the code base.
But perhaps this is a sub-set, and therefore we're required to make another list? Is that accurate?
There was a problem hiding this comment.
This is a subset. Certain task/constraint from the existing registry might not be desirable for the agent to output (atm they are simply untested with the agent/model).
I added a test to ensure RelationType is subset of ArenaEnvGraphSpatialConstraintType (which is what the EnvIntentSpec will be resolved into).
I'll let MR736 handle the consistency between ArenaEnvGraphSpatialConstraintType and the relations registry.
There was a problem hiding this comment.
Makes sense, thanks for clarifying. 👍
| ItemRole = Literal["foreground", "distractor", "anchor"] | ||
|
|
||
| # Task kinds the agent can propose as an atomic task. | ||
| TaskKind = Literal["pick_and_place", "open_door", "close_door"] |
There was a problem hiding this comment.
I would say we select this subset thru the task registry https://github.com/isaac-sim/IsaacLab-Arena/pull/736/changes
There was a problem hiding this comment.
Even more than Relation, for task we are strictly limiting to these three at least in v0.3 scope, so I'd keep this as a hard-coded list.
There was a problem hiding this comment.
Understood — keeping it constrained for now makes sense. ✅
|
|
||
| The functions here are the building blocks the env-gen agent uses to | ||
| send strict-mode-compatible schemas, handle provider-specific response | ||
| routing (NVIDIA DeepSeek's ``reasoning_content`` quirk), and probe a |
There was a problem hiding this comment.
Is chain-of-thought only available thru Deepseek's API reasoning_content payload? How about OpenAI which we are using here? Is it model specific or client API specific?
I'm confused.
There was a problem hiding this comment.
it seems to be model specific (and looks like deepseek-specific bug). It's not just COT, deepseek put the whole structued output in reasoning_content instead of content which is the standard OpenAI compatible (claude/openai/nemotron models all put it in .content).
Seems we should switch the default to avoid these deepseek perks?
There was a problem hiding this comment.
Got it — model-specific quirks are tricky. Thanks for clarifying! ✅
xyao-nv
left a comment
There was a problem hiding this comment.
Nit: can we fix the tone of the docstring? It feels very agentic and robotic right now, with the use of overly complicated terms and fancy structures.
Can we make it easy for human to read, and build upon?
|
|
||
| @pytest.fixture | ||
| def stub_openai(): | ||
| """Patch ``openai.OpenAI`` so ``EnvGenAgent()`` never hits the wire. |
There was a problem hiding this comment.
Why stubbing? Shouldn't CI always have the real agent in the loop?
There was a problem hiding this comment.
it's easy to inject server-side error/exceptions with a mock to test coverage for failure paths then using the actual server
the test file includes one "good path" test invoking the actual inference API in the loop
There was a problem hiding this comment.
Good point — easy mock injection for error testing. 👍
|
|
||
|
|
||
| class TestInit: | ||
| def test_explicit_api_key_overrides_env(self, monkeypatch, stub_openai): |
There was a problem hiding this comment.
sorry what's the monkeypatch?
There was a problem hiding this comment.
from claude :)
monkeypatch is a built-in pytest fixture — pytest auto-injects it whenever a test names it as a parameter.
What it gives you is scoped, auto-reverting mutations of process-global state (env vars, attributes, dict items, sys.path, etc.). Every change made through it is undone when the test finishes, even on failure or exception. That's what makes it safe to mutate os.environ from inside a test.
There was a problem hiding this comment.
Hah, TIL! Thanks for the pytest tip. ✅
| @@ -0,0 +1,429 @@ | |||
| # Copyright (c) 2026, The Isaac Lab Arena Project Developers (https://github.com/isaac-sim/IsaacLab-Arena/blob/main/CONTRIBUTORS.md). | |||
There was a problem hiding this comment.
This test is so agentic, and I'm sorry I have to give up reviewing it.
Can we make it less agentic (e.g. simplify to minimal set of tests, add human-friendly docstring, reduce the overly-tested scopes etc.), since we will be maintaining it as well?
There was a problem hiding this comment.
good point. fixed.
|
|
||
| Construction runs two fail-fast wire checks in order: | ||
|
|
||
| 1. :func:`.structured_output_utils.ping` — cheap liveness |
There was a problem hiding this comment.
Why is ping not a class method?
There was a problem hiding this comment.
We might have EnvGenAgent, EnvCriticAgent, ect. ping is a common util function for all agents.
There was a problem hiding this comment.
Makes sense — good call keeping it as a shared util. 👍
| @pytest.mark.flaky(max_runs=3, min_passes=1) | ||
| def test_generate_spec_against_live_endpoint(): |
There was a problem hiding this comment.
Live test fails with
ValueError instead of skipping when NV_API_KEY is absent
EnvGenAgent() raises ValueError("API key required") before any test assertion when NV_API_KEY is unset, so a developer running the suite without the key sees a hard test error rather than a skip. The same issue was explicitly fixed in test_llm_agent.py (previous thread). test_default_model_supports_structured_output in test_structured_output_utils.py has the same gap — it uses assert api_key, ... which is a no-op under python -O and raises AssertionError rather than skipping.
| @pytest.mark.flaky(max_runs=3, min_passes=1) | |
| def test_generate_spec_against_live_endpoint(): | |
| @pytest.mark.flaky(max_runs=3, min_passes=1) | |
| @pytest.mark.skipif(not os.environ.get("NV_API_KEY"), reason="NV_API_KEY not set") | |
| def test_generate_spec_against_live_endpoint(): |
There was a problem hiding this comment.
Looks basically good. Amazing work.
I left a whole lot of smaller comments.
My main concern is that we're building up multiple (>3) representations of the same thing in the codebase. We have multiple representations of Objects, Relations, Embodiments etc. We have the original run-time objects, their counter parts in the graph_spec and now were introducing a third representation here. This is going to be a bit of a nightmare to maintain because there are unwritten and unenforced contracts between these representations. For example, already, the list of available relations is different, depending on where you look. Shall we get on a call and discuss if there's some way to reduce the number here? What do you think?
| - name: Verify ARENA_NV_API_KEY is configured | ||
| run: | | ||
| if [ -z "${NV_API_KEY}" ]; then | ||
| echo "::error::ARENA_NV_API_KEY repo secret is not set." | ||
| exit 1 | ||
| fi |
There was a problem hiding this comment.
Do we need this check in production?
I feel like a potential failure is if the API key gets outdated, and gets rejected, but this test doesn't test that.
Suggestion to remove.
There was a problem hiding this comment.
Suggestion to move this out of the environments folder. I feel that this will be substantial enough to warrent it's own folder. Also suggestion to avoid the abbreviations (we kind of failed this elsewhere... this is because Isaac Lab calls environments envs, but I feel that the code that's not interacting with Isaac Lab should avoid abbreviation)
isaaclab_arena/agentic_environment_generation/environment_generation_agent.py
| from .env_intent_spec import EnvIntentSpec | ||
| from .structured_output_utils import build_strict_schema, check_structured_output_support, extract_response_text, ping |
There was a problem hiding this comment.
We haven't used relative imports elsewhere in the code. Suggestion to go for absolute imports everywhere (unless there's a specific reason)?
| # TODO(qianl): handle optional lights and hdr images. | ||
| for name in registry.get_all_keys(): | ||
| cls = registry.get_asset_by_name(name) | ||
| tags = list(getattr(cls, "tags", [])) | ||
| if "embodiment" in tags: | ||
| embodiments.append(name) | ||
| elif "background" in tags: | ||
| backgrounds.append(name) | ||
| elif "object" in tags: | ||
| objects.append({"name": name, "tags": [t for t in tags if t != "object"]}) |
There was a problem hiding this comment.
What do you think about switching instead on the base class type.
Here we rely on people adding correct tags to things in the registry. Is we use, for example, issubclass(cls, EmbodimentBase), this is sort of guarunteed to work for all valid embodiments. No reliance on the user.
| elif "object" in tags: | ||
| objects.append({"name": name, "tags": [t for t in tags if t != "object"]}) | ||
|
|
||
| obj_lines = "\n".join(f"- {o['name']} tags={o['tags']}" for o in sorted(objects, key=lambda o: o["name"])) |
There was a problem hiding this comment.
nit: obj_lines -> object_lines.
| print(json.dumps(EnvIntentSpec.model_json_schema(), indent=2)) | ||
| return | ||
|
|
||
| from isaaclab_arena.environments.agentic_env_gen.env_gen_agent import EnvGenAgent, build_catalog_text |
| kwargs = {"model": args.model} if args.model else {} | ||
| agent = EnvGenAgent(**kwargs) |
There was a problem hiding this comment.
Bit of an odd construction.
Suggestion to move DEFAULT_MODEL in the constructor args of EnvGenAgent into the body, and make the constructor, therefore robust to taking None. Then we don't need this weird kwargs stuff.
| if args.background and args.background != spec.background: | ||
| # Swap the background name wherever it appears so downstream code | ||
| # (resolver, proposer) sees a consistent scene. Rewrite both | ||
| # ``rel.target`` (binary relations like ``on(bowl, table)``) AND | ||
| # ``rel.subject`` (unary relations like ``is_anchor(table)``); | ||
| # missing the subject case would leave the unary constraint | ||
| # pointing at the old background name, after which the resolver | ||
| # would emit a ``relation.initial.unknown_subject`` trace and | ||
| # silently drop the constraint. | ||
| old_bg = spec.background | ||
| new_bg = args.background | ||
| for rel in spec.initial_scene_graph: | ||
| if rel.subject == old_bg: | ||
| rel.subject = new_bg | ||
| if rel.target == old_bg: | ||
| rel.target = new_bg | ||
| # Note: tasks don't directly reference background in target (typically None or items), | ||
| # so no background substitution needed in task.target | ||
| spec.background = new_bg | ||
| print(f"\n=== background override applied: {old_bg!r} -> {new_bg!r} ===") |
There was a problem hiding this comment.
See comment above about marking backgrounds.
We could then remove all of this.
| "vuer[all]", | ||
| "lightwheel-sdk", | ||
| "pytest", | ||
| # Used lazily by isaaclab_arena/environments/agentic_env_gen/* for NV_API_KEY-based agent calls. |
| "pytest", | ||
| # Used lazily by isaaclab_arena/environments/agentic_env_gen/* for NV_API_KEY-based agent calls. | ||
| "openai", | ||
| # Hard dependency of isaaclab_arena/environments/agentic_env_gen/env_intent_spec.py (BaseModel / Field / |
There was a problem hiding this comment.
remove comment. Easily goes outdated because it refers to very specific implimentation detail (like where exactly a module is loaded).
Summary
First part of the LLM env gen pipeline: LLMAgent going from prompt to LLMEnvSchema (JSON)
Detailed description
Changed: Add new LLMAgent class & output shema (defines subset of supported task/constraints ect).
To run: python -m isaaclab_arena.environments.agentic_env_gen.try_schema