feat(studio): API to expose test case row level data in experiments#198

Open
shanaiabuggy wants to merge 2 commits into main from sbuggy/fp-201
@shanaiabuggy
Contributor

@shanaiabuggy shanaiabuggy commented Jun 4, 2026

Summary by CodeRabbit

  • New Features

    • Added endpoint to list experiment sessions with filtering by root-span status and test case ID
    • Session results show timing, token counts, costs, and evaluator scores
    • Paginated session listings with standard page controls
    • Returns appropriate error codes for missing or unavailable resources
  • Tests

    • Added integration tests covering listing, filtering, pagination, and unknown-experiment 404 handling
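For readers who want to try the new listing endpoint, here is a minimal sketch of how a request URL could be assembled. The path is the one named in the CodeRabbit walkthrough, and `status` and `test_case_id` are the filters described above. The helper name `sessions_url`, the base URL, and the `page` / `page_size` parameter names are assumptions for illustration and are not confirmed by this PR.

```python
from typing import Optional
from urllib.parse import quote, urlencode


def sessions_url(
    base_url: str,
    workspace: str,
    name: str,
    *,
    status: Optional[str] = None,
    test_case_id: Optional[str] = None,
    page: Optional[int] = None,        # assumed name for the page control
    page_size: Optional[int] = None,   # assumed name for the page control
) -> str:
    """Build a URL for the list-experiment-sessions endpoint (hypothetical helper)."""
    path = (
        f"/apis/intake/v2/workspaces/{quote(workspace, safe='')}"
        f"/experiments/{quote(name, safe='')}/sessions"
    )
    params = {
        "status": status,
        "test_case_id": test_case_id,
        "page": page,
        "page_size": page_size,
    }
    # Only send the filters the caller actually set.
    query = urlencode({key: value for key, value in params.items() if value is not None})
    return base_url.rstrip("/") + path + (f"?{query}" if query else "")
```

Omitting every optional argument yields the bare listing URL, which mirrors how the filters are described as optional.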

Signed-off-by: shanaiabuggy <59746633+shanaiabuggy@users.noreply.github.com>
@shanaiabuggy shanaiabuggy requested review from a team as code owners June 4, 2026 23:00
@shanaiabuggy shanaiabuggy changed the title API to expose test case row level data in experiments feat(studio): API to expose test case row level data in experiments Jun 4, 2026
@github-actions
Contributor

github-actions Bot commented Jun 4, 2026

Documentation preview is ready

Preview: https://nvidia-nemo.github.io/nemo-platform/pr-preview/pr-198/pr-198/

Built from 7a8b472 in workflow run.

This preview is deployed from this PR branch, updates when docs changes are pushed, and will be removed when the PR closes.

@coderabbitai

coderabbitai Bot commented Jun 4, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: bc6dfaf8-4827-49d4-aad8-b015ed70ae60

📥 Commits

Reviewing files that changed from the base of the PR and between a5e5c12 and 7a8b472.

📒 Files selected for processing (2)
  • services/intake/src/nmp/intake/api/v2/experiments/endpoints.py
  • services/intake/tests/integration/spans/test_experiment_sessions.py
🚧 Files skipped from review as they are similar to previous changes (2)
  • services/intake/tests/integration/spans/test_experiment_sessions.py
  • services/intake/src/nmp/intake/api/v2/experiments/endpoints.py

📝 Walkthrough


This PR adds a complete list-experiment-sessions endpoint across the intake API. It defines OpenAPI contracts, authorizes access, implements a ClickHouse-backed repository with aggregation and scoring, wires the endpoint handler with dependency injection, and validates end-to-end with integration tests covering pagination, filtering, and error cases.

Changes

Experiment Sessions Listing

| Layer | File(s) | Summary |
|---|---|---|
| OpenAPI contract and schemas | `openapi/ga/individual/platform.openapi.yaml`, `openapi/ga/openapi.yaml`, `openapi/openapi.yaml` | Defined new `GET /apis/intake/v2/workspaces/{workspace}/experiments/{name}/sessions` endpoint with path/query parameters and response contracts in all three specs. Introduced `ExperimentSessionResponse` (session metadata, tracing, timing, token/cost/evaluator aggregates) and `ExperimentSessionResponsesPage` (paginated wrapper). |
| Authorization mapping | `services/core/auth/src/nmp/core/auth/assets/static-authz.yaml` | Configured endpoint to require `intake.experiments.read` permission with `intake:read` and `platform:read` scopes. |
| Pydantic response schema | `services/intake/src/nmp/intake/api/v2/experiments/schemas.py` | Added `ExperimentSessionResponse` model with fields for identifiers, timing, root-span status, aggregated tokens/cost, and evaluator scores. Implemented `from_row` classmethod to hydrate from repository data. |
| ClickHouse repository | `services/intake/src/nmp/intake/spans/experiment_session_repository.py` | Implemented `ExperimentSessionRepository` with a `list_sessions` method that builds a hydrated CTE joining sessions and spans, aggregates token/cost metrics, extracts root-span status/input, applies optional filtering, paginates results, and batch-fetches evaluator mean scores. Added `ExperimentSessionRow` and `ExperimentSessionPage` data models plus SQL helpers for filtering, scoping, aggregation, and row mapping. |
| Endpoint handler | `services/intake/src/nmp/intake/api/v2/experiments/endpoints.py` | Added `list_experiment_sessions` handler that verifies the experiment exists, injects an optional repository via `get_experiment_session_repository`, returns 503 when ClickHouse is unavailable, calls the repository with filters/pagination, converts rows to response objects, and wraps them with pagination. |
| Integration tests | `services/intake/tests/integration/spans/test_experiment_sessions.py` | Added test suite that creates an experiment, ingests multiple ATIF payloads, lists sessions with pagination validation, tests `test_case_id` filtering, and verifies 404 for unknown experiments. Includes ATIF payload builder and timestamp helpers. |
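The handler flow summarized in the walkthrough (existence check, optional repository, filtering, pagination) can be sketched in plain Python. This is an illustration under stated assumptions, not the PR's implementation: `ApiError`, `SessionRow`, `InMemorySessionRepository`, the `page` / `page_size` parameters, and the `data` / `pagination` response keys are hypothetical stand-ins, and an in-memory list replaces ClickHouse.

```python
from dataclasses import dataclass
from typing import List, Optional, Set, Tuple


class ApiError(Exception):
    """Hypothetical stand-in for an HTTP error carrying a status code."""

    def __init__(self, status_code: int, detail: str) -> None:
        super().__init__(detail)
        self.status_code = status_code
        self.detail = detail


@dataclass(frozen=True)
class SessionRow:
    # Assumed field names; the real ExperimentSessionRow carries more columns.
    session_id: str
    status: str
    test_case_id: Optional[str]


class InMemorySessionRepository:
    """In-memory stand-in for the ClickHouse-backed repository (assumption)."""

    def __init__(self, rows: List[SessionRow]) -> None:
        self._rows = rows

    def list_sessions(
        self,
        status: Optional[str] = None,
        test_case_id: Optional[str] = None,
        page: int = 1,
        page_size: int = 10,
    ) -> Tuple[List[SessionRow], int]:
        # Apply the optional filters, then slice out the requested page.
        matched = [
            row
            for row in self._rows
            if (status is None or row.status == status)
            and (test_case_id is None or row.test_case_id == test_case_id)
        ]
        start = (page - 1) * page_size
        return matched[start:start + page_size], len(matched)


def list_experiment_sessions(
    known_experiments: Set[str],
    repository: Optional[InMemorySessionRepository],
    name: str,
    *,
    status: Optional[str] = None,
    test_case_id: Optional[str] = None,
    page: int = 1,
    page_size: int = 10,
) -> dict:
    if name not in known_experiments:
        raise ApiError(404, f"Experiment {name!r} not found")
    if repository is None:
        # The repository is optional; a missing one means storage is unavailable.
        raise ApiError(503, "Telemetry store unavailable")
    rows, total = repository.list_sessions(status, test_case_id, page, page_size)
    return {
        "data": [
            {"session_id": r.session_id, "status": r.status, "test_case_id": r.test_case_id}
            for r in rows
        ],
        "pagination": {"page": page, "page_size": page_size, "total_results": total},
    }
```

In this sketch `total_results` counts every row matching the filters rather than only the rows on the current page, which is the usual contract for paginated listings. Whether the PR follows the same convention is not stated here.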

Suggested reviewers

  • asutermo
🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | Title accurately describes the main change: adding a new API endpoint to expose experiment session data at the row level, supporting pagination and filtering. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Comment @coderabbitai help to get the list of available commands and usage tips.


@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 5

🧹 Nitpick comments (1)
services/intake/tests/integration/spans/test_experiment_sessions.py (1)

86-133: ⚡ Quick win

Add integration coverage for status filtering and the 503 path.

Current tests miss two new endpoint behaviors: status query filtering and deterministic 503 when telemetry storage is unavailable. Add both cases to lock the contract.
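As a rough sketch of what the status-filter assertion might look like, the helper below checks a decoded JSON response body. The `data` and `pagination.total_results` keys follow the wording of the prompt below. The per-row `status` key and the helper name `assert_status_filtered` are assumptions that would need checking against the actual `ExperimentSessionResponse` schema.

```python
def assert_status_filtered(body: dict, expected_status: str, expected_total: int) -> None:
    """Check that a sessions page only contains rows with the requested status.

    Hypothetical test helper: `body` is the decoded JSON of a
    GET .../sessions?status=<value> response.
    """
    assert body["pagination"]["total_results"] == expected_total
    unexpected = [row for row in body["data"] if row["status"] != expected_status]
    assert not unexpected, f"rows with unexpected status: {unexpected}"
```

An integration test could call it once per status value after ingesting sessions with mixed statuses, so the filter is exercised in both directions.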

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@services/intake/tests/integration/spans/test_experiment_sessions.py` around
lines 86 - 133, Add two integration tests: one to cover status filtering and one
to cover deterministic 503 when telemetry storage is unavailable. For status
filtering, create sessions (using the same pattern in
test_list_experiment_sessions_filter_by_test_case via client.post to ATIF_INGEST
and EXPERIMENTS) with different "status" values, then call GET
f"{EXPERIMENTS}/{experiment_name}/sessions" with params={"status":"<value>"} and
assert pagination total_results and returned data match the filtered status. For
the 503 path, add a test (similar setup to
test_list_experiment_sessions_returns_404_for_unknown_experiment) that simulates
telemetry storage being unavailable (mock or configure the telemetry/storage
client used by the service to raise or be None) before calling the sessions
endpoint and assert the response.status_code == 503 and appropriate error shape;
reference the TestClient, EXPERIMENTS, ATIF_INGEST, and the existing test
patterns for setup and assertions.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@openapi/ga/individual/platform.openapi.yaml`:
- Around line 15425-15429: The schema for the test_case_id property currently
disallows nulls but the description says it may be null; update the OpenAPI
schema for the test_case_id property (the YAML block defining test_case_id) to
include nullable: true so null values are accepted, keeping the existing title,
description, and type entries unchanged.

In `@openapi/ga/openapi.yaml`:
- Around line 15425-15429: The OpenAPI schema for the property test_case_id
declares it can be null in the description but omits the nullable marker; update
the test_case_id property in openapi.yaml to allow nulls by adding nullable:
true beneath the test_case_id definition (keeping title, description and type
intact) so clients will accept null values for test_case_id.
- Around line 15441-15443: Update the ExperimentSessionResponse schema in
openapi.yaml to mark fields that the backend can return as null as nullable:
true; specifically add nullable: true for ended_at, latency_ms, input,
input_tokens, output_tokens, cached_tokens, and cost_total_usd in the
ExperimentSessionResponse object so the OpenAPI spec accepts nulls consistent
with services/intake/src/nmp/intake/spans/experiment_session_repository.py.

In `@openapi/openapi.yaml`:
- Around line 15425-15430: The schema for the property test_case_id currently
lists only type: string but the description states it can be null; update the
test_case_id schema to allow nulls consistently with other fields by adding
nullable: true (or changing the type to a string/null union if your style
prefers) under the test_case_id entry in openapi/openapi.yaml so the spec and
description match.

In `@services/intake/src/nmp/intake/api/v2/experiments/endpoints.py`:
- Around line 432-439: Wrap the call to session_repository.list_sessions in a
try/except that catches ClickHouse-related backend exceptions and converts them
into an HTTP 503 response; specifically, around the block calling
session_repository.list_sessions capture exceptions from the ClickHouse client
(e.g., clickhouse_driver.errors.Error / connection-related exceptions) and raise
fastapi.HTTPException(status_code=503, detail="Telemetry store unavailable")
while logging the original exception for debugging. Ensure you import
fastapi.HTTPException (or use the existing FastAPI HTTPException) and only map
backend/unavailability errors to 503, re-raising other exceptions unchanged.


ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: c63875b3-6333-457d-a1e8-fca92fe0229f

📥 Commits

Reviewing files that changed from the base of the PR and between a9469fb and a5e5c12.

⛔ Files ignored due to path filters (12)
  • sdk/python/nemo-platform/.nmpcontext/openapi.yaml is excluded by !sdk/**
  • sdk/python/nemo-platform/.nmpcontext/stainless.yaml is excluded by !sdk/**
  • sdk/python/nemo-platform/src/nemo_platform/resources/experiments/__init__.py is excluded by !sdk/**
  • sdk/python/nemo-platform/src/nemo_platform/resources/experiments/api.md is excluded by !sdk/**
  • sdk/python/nemo-platform/src/nemo_platform/resources/experiments/experiments.py is excluded by !sdk/**
  • sdk/python/nemo-platform/src/nemo_platform/resources/experiments/sessions.py is excluded by !sdk/**
  • sdk/python/nemo-platform/src/nemo_platform/types/experiments/__init__.py is excluded by !sdk/**
  • sdk/python/nemo-platform/src/nemo_platform/types/experiments/experiment_session_response.py is excluded by !sdk/**
  • sdk/python/nemo-platform/src/nemo_platform/types/experiments/experiment_session_responses_page.py is excluded by !sdk/**
  • sdk/python/nemo-platform/src/nemo_platform/types/experiments/session_list_params.py is excluded by !sdk/**
  • sdk/python/nemo-platform/tests/api_resources/experiments/test_sessions.py is excluded by !sdk/**
  • sdk/stainless.yaml is excluded by !sdk/**
📒 Files selected for processing (8)
  • openapi/ga/individual/platform.openapi.yaml
  • openapi/ga/openapi.yaml
  • openapi/openapi.yaml
  • services/core/auth/src/nmp/core/auth/assets/static-authz.yaml
  • services/intake/src/nmp/intake/api/v2/experiments/endpoints.py
  • services/intake/src/nmp/intake/api/v2/experiments/schemas.py
  • services/intake/src/nmp/intake/spans/experiment_session_repository.py
  • services/intake/tests/integration/spans/test_experiment_sessions.py

Comment thread openapi/ga/individual/platform.openapi.yaml
Comment thread openapi/ga/openapi.yaml
Comment thread openapi/ga/openapi.yaml
Comment thread openapi/openapi.yaml
Comment thread services/intake/src/nmp/intake/api/v2/experiments/endpoints.py Outdated
@github-actions
Contributor

github-actions Bot commented Jun 4, 2026

| Suite | Lines Covered | Line Rate | Branch Rate |
|---|---|---|---|
| Unit Tests | 18717/24765 | 75.6% | 62.0% |
| Integration Tests | 11995/23529 | 51.0% | 26.2% |

Signed-off-by: shanaiabuggy <59746633+shanaiabuggy@users.noreply.github.com>
# Sessions are the response payload (not enrichment), so we can't silently degrade like
# _hydrate_rollups does. Convert backend failures (ClickHouse connection drop, query
# timeout, etc.) to a deterministic 503 instead of letting them bubble as 500s.
logger.exception("Per-session read failed for workspace=%s experiment=%s", workspace, name)
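The comment above describes converting backend failures into a deterministic 503. A minimal stdlib-only sketch of that pattern follows. `BackendUnavailableError`, `ServiceUnavailable`, and `read_sessions_or_503` are hypothetical names standing in for the real ClickHouse client exceptions and the framework's HTTP error type, neither of which is shown in this excerpt.

```python
import logging
from typing import Callable, TypeVar

logger = logging.getLogger(__name__)
T = TypeVar("T")


class BackendUnavailableError(Exception):
    """Hypothetical marker for a ClickHouse connection drop or query timeout."""


class ServiceUnavailable(Exception):
    """Hypothetical stand-in for an HTTP 503 error response."""

    status_code = 503


def read_sessions_or_503(list_sessions: Callable[[], T], workspace: str, name: str) -> T:
    """Run the repository read, mapping only backend failures to a 503."""
    try:
        return list_sessions()
    except BackendUnavailableError as exc:
        # Keep the original traceback in the logs for debugging.
        logger.exception(
            "Per-session read failed for workspace=%s experiment=%s", workspace, name
        )
        raise ServiceUnavailable("Telemetry store unavailable") from exc
```

Catching only the backend-specific exception leaves programming errors to surface as 500s, in line with the review request to re-raise other exceptions unchanged.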

2 participants