fix(setup): default to --service-group all --controller-group all#95
Open
aleckhoury wants to merge 6 commits into
Open
fix(setup): default to --service-group all --controller-group all#95aleckhoury wants to merge 6 commits into
aleckhoury wants to merge 6 commits into
Conversation
Make the agent spec a typed, server-side-visible artifact instead of an unstructured local markdown file, and rebuild the onboarding skill chain around it. What changes - New AgentSpec Pydantic model (nemo_agents_plugin.spec) with hard validators on Job (vague-phrase rejection, min length) and Framework (LangGraph + NAT vs. needs-wrapper enum). 12 body sections + 2 front-matter fields, sourced from the POR's existing + proposed sections table. - New spec_render module with bidirectional AgentSpec <-> markdown round-trip. Strict parser: rejects unknown sections, duplicate sections, and malformed labeled bullets before the value reaches the Pydantic model. - spec_file_ref: FilesetRef | None added to the Agent entity and CreateAgentRequest, with a --spec-file-ref flag on `nemo agents create`. Lets `nemo agents list` answer "which agents have an onboarded spec?" with a single field check. - nemo-explore SKILL.md rewritten to be explore-first and schema-driven: scans the codebase, fills every inferable field, presents the full spec in a single review block, and only asks the user for gaps. Replaces the hardcoded 12-question Q&A loop. - nemo-spec SKILL.md rewritten to validate against AgentSpec, refuse to write when Job or Framework is unresolved, upload the canonical copy to a per-agent Filesets fileset (Fileset is canonical, local file is a write-through cache), and hand SPEC_FILE_REF to nemo-build-agent. - nemo-build-agent SKILL.md updated to thread SPEC_FILE_REF through all three `nemo agents create` invocations (initial deploy, RAG redeploy, guardrails redeploy). - Spec template (references/templates/agent-spec.md) rewritten to match the new schema + renderer format. Drops the Known Issues section (handled by the Insights plugin's first-class entity instead). - Plugin OpenAPI regenerated; SDK skill copies re-vendored. Tests - 45 new unit tests across test_spec.py and test_spec_render.py covering the schema, validators, JSON-schema export, render output shape, round-trip identity, and structural parse errors. - test_entities.py / test_cli.py extended for spec_file_ref on Agent, CreateAgentRequest, and the new CLI flag. - 102 tests pass across the touched files. ruff + ruff format + ty clean on changed files (two ty diagnostics pre-existing in setup.py, unrelated).
ae73bba to
1a8e831
Compare
Address PR #94 review feedback. The spec_file_ref field on Agent (and the matching --spec-file-ref CLI flag, API request field, and SPEC_FILE_REF shell ceremony in the skills) is redundant under the project's stated constraints: spec and agent always live in the same workspace, and the spec filename is the industry-standard AGENTSpec.md. The location is fully derivable from (workspace, agent_name). - Remove spec_file_ref from Agent entity, CreateAgentRequest schema, POST /agents handler, --spec-file-ref CLI option, and openapi.yaml. - Add canonical helpers in nemo_agents_plugin.entities: AGENT_SPEC_FILENAME, agent_spec_fileset_name, agent_spec_file_ref. Downstream consumers (analyst, Studio, optimization loop) call these rather than reconstructing the path inline; layout changes are then a one-function migration. - Canonicalize the filename to AGENTSpec.md across nemo-spec and nemo-build-agent skills and template; drop the SPEC_FILE_REF shell variable and the --spec-file-ref flag from skill bash blocks. - Mirror skill changes to the vendored sdk/python copies. Also picks up CodeRabbit review fixes: - spec.py: re-enforce _MIN_JOB_LENGTH after strip (Pydantic min_length runs before @field_validator), with vague-phrase check first so the more specific diagnosis wins. Replace EN DASH with hyphen for RUF001. - test_spec_render.py: raw regex string for RUF043. - agent-spec.md template: move parse-rule guidance out of the strict Constraints / Allowed Changes / Open Questions sections into the top-level prose block. Test suite trimmed from 80 to 62 tests by removing assertions that re-state Field declarations or test Pydantic itself. Remaining tests lock validators, the markdown round-trip, parser invariants, and the new convention helpers.
c91baf3 to
4cfda64
Compare
1a8e831 to
4b09433
Compare
Second-pass audit of the spec/entity test additions, looking for tests that lock Pydantic, Field declarations, or constants against themselves rather than behavior we wrote. Cuts: - test_no_spec_file_ref_field, test_spec_file_ref_field_rejected: assert pydantic silently drops unknown keys without ``extra="forbid"``. - test_canonical_filename: ``assert AGENT_SPEC_FILENAME == "AGENTSpec.md"`` locks a constant against itself. - test_file_ref_uses_active_workspace: redundant with test_file_ref_combines_workspace_fileset_and_filename; both exercise the same one-line helper. - test_missing_framework_rejected: tests pydantic's required-field handling — framework being required is in the type signature. - test_category_count_out_of_range_rejected: tests ``Field(min_length=3, max_length=6)`` against itself. - test_validation_error_propagates: tests that parse_spec calls AgentSpec(...), which is the next line in the function. 53 tests remain; all genuinely lock validator behavior, the markdown round-trip contract, parser invariants, or the convention helpers.
4b09433 to
7e5f74f
Compare
The CLI's create handler builds its payload from a fixed dict literal with no source for spec_file_ref — asserting the key is absent locks nothing the code could plausibly violate.
7e5f74f to
1f9da5f
Compare
The lsof + curl /v1/models block in nemo-build-agent's pre-flight was a byte-for-byte duplicate of nemo-status's Step 1. Replace it with a one-line reference so platform-health probing has a single owner and future probe changes (new component, port change) land in one place. Service-specific endpoint checks (Files API, Agents API) stay local to the skills that need them — nemo-status doesn't probe those today, and expanding it would change its character from user-facing dashboard to programmatic health-check service.
2f068f0 to
f79325c
Compare
`nemo services run` (both manual and via `nemo setup --start-services`) was starting only a subset of services and a single controller, which silently broke the agent-onboarding skill chain: nemo-spec couldn't upload to Filesets, nemo-build-agent's `nemo agents deploy` wrote AgentDeployment entities that never reconciled past `pending` because the AgentDeploymentController plugin controller wasn't running. What changes - SETUP.md: manual `nemo services run` examples switched to `--service-group all --controller-group all` with a rationale block explaining what each group provides and when to use a narrower set. - nemo setup --start-services: `_start_services_background()` now passes `service_group="all"` and `controller_group="all"` to the underlying process launcher, so the streamlined setup path produces a platform that can handle every documented workflow without follow-up restarts. - nemo-spec SKILL.md: added a Files-service pre-flight that fails fast with the exact remediation if a user manually overrides defaults. - nemo-build-agent SKILL.md: strengthened the existing agents-CLI check to also probe the agents service endpoint, with the same remediation hint. Catches the failure mode where the CLI exists but the service backend was excluded from --services. - Vendored SDK copies re-synced. Tests - test_start_services_background_uses_all_groups asserts both group kwargs are passed through. - Existing data_dir passthrough test still passes.
acdbad5 to
695a420
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
nemo services run— both the manual command inSETUP.mdand the implicit call fromnemo setup --start-services— was starting only a subset of services and a single controller. That silently broke the agent-onboarding skill chain:nemo-speccouldn't upload to Filesets (Files service wasn't started)nemo-build-agent'snemo agents create404'd (agents service wasn't started)nemo agents deploywroteAgentDeploymententities that never reconciled pastpendingbecauseAgentDeploymentController(a plugin controller) wasn't runningSwitching both groups to
allis the documented intent — every existing user-facing workflow needs at minimum the core + agents services and the agents plugin controller.SETUP.md: manualnemo services runexamples switched to--service-group all --controller-group allwith a rationale block explaining what each group provides and when to use a narrower set (inference-only setups).nemo setup --start-services:_start_services_background()now passesservice_group="all"andcontroller_group="all"to the underlying process launcher.nemo-specSKILL.md: added a Files-service pre-flight that fails fast with the exact remediation if a user manually overrides defaults.nemo-build-agentSKILL.md: strengthened the existing agents-CLI check to also probe the agents service endpoint, with the same remediation hint. Catches the failure mode where the CLI exists but the service backend was excluded.Test plan
test_start_services_background_uses_all_groupsasserts both group kwargs are passed throughdata_dirpassthrough test still passesruff check+ruff format --checkclean on changed filesty checkclean (two pre-existing diagnostics insetup.pyunrelated, verified via stash)pre-commit run -apasses all hooks exceptstudio-lint-staged(pre-existing local node-version mismatch, only fires under-a)entities,files,agents,modelsall return 200;AgentDeploymentController started.confirmed in startup logsStacked on
#94 — the pre-flight checks in this PR amend skill files introduced there.
When #94 merges, rebase this onto
main.