Add GET /runs/trace/{run_id} endpoint #272
saathviksheerla wants to merge 2 commits into openml:main from
Conversation
Walkthrough: Adds two RFC 9457 problem-detail errors (RunNotFoundError, RunTraceNotFoundError), database helpers to fetch a run and its trace, Pydantic schemas (TraceIteration, RunTrace), a FastAPI router with a GET /runs/trace/{run_id} endpoint and registration of that router in the app, and tests covering successful retrieval, missing trace, and missing run. The endpoint validates run existence, queries trace rows, and returns a structured RunTrace or raises the new errors.

🚥 Pre-merge checks: 5 passed.
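The row-to-schema mapping the walkthrough describes can be sketched with stdlib dataclasses. The names `TraceIteration` and `RunTrace` come from the walkthrough itself, but the field layout and the `build_run_trace` helper are assumptions for illustration; the actual project uses Pydantic models.

```python
from __future__ import annotations

from dataclasses import asdict, dataclass


@dataclass
class TraceIteration:
    """One row of the `trace` table, as exposed by the API."""

    repeat: int
    fold: int
    iteration: int
    setup_string: str
    evaluation: float | None
    selected: str  # literal "true" / "false", preserved from the DB


@dataclass
class RunTrace:
    run_id: int
    trace: list[TraceIteration]


def build_run_trace(run_id: int, rows: list[tuple]) -> dict:
    """Map raw DB rows into the structured response body."""
    iterations = [TraceIteration(*row) for row in rows]
    return asdict(RunTrace(run_id=run_id, trace=iterations))


# Example rows shaped like the trace query's column order.
rows = [(0, 0, 1, "max_depth=3", 0.81, "true")]
body = build_run_trace(34, rows)
```

`asdict` recurses into the nested list, so the result matches the JSON shape the tests below assert on (`run_id` plus a `trace` list of per-iteration objects).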
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #272 +/- ##
=======================================
Coverage ? 55.16%
=======================================
Files ? 37
Lines ? 1490
Branches ? 118
=======================================
Hits ? 822
Misses ? 666
Partials ? 2

☔ View full report in Codecov by Sentry.
Hey - I've found 2 issues, and left some high-level feedback:

- In `database.runs.get`, since the result is only used for existence checking, consider selecting a constant (e.g. `SELECT 1`) and using `scalar_one_or_none()` to reduce overhead and avoid returning an unused row object.
- The `RunTrace.run_id` field uses `Field(serialization_alias="run_id")` even though the attribute name matches the alias; you can likely drop the alias for simplicity unless you specifically need different input/output names.
- The `selected=row.selected == "true"` mapping assumes the database stores the value as the exact string "true"; consider normalizing/handling boolean or other truthy representations to make this more robust against schema or data differences.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- In `database.runs.get`, since the result is only used for existence checking, consider selecting a constant (e.g. `SELECT 1`) and using `scalar_one_or_none()` to reduce overhead and avoid returning an unused row object.
- The `RunTrace.run_id` field uses `Field(serialization_alias="run_id")` even though the attribute name matches the alias; you can likely drop the alias for simplicity unless you specifically need different input/output names.
- The `selected=row.selected == "true"` mapping assumes the database stores the value as the exact string "true"; consider normalizing/handling boolean or other truthy representations to make this more robust against schema or data differences.
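The `SELECT 1` existence-check suggestion above can be illustrated with the stdlib `sqlite3` module. The real code would use SQLAlchemy's `scalar_one_or_none()`; the table name and schema here are made up for the demo.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE run (run_id INTEGER PRIMARY KEY)")
conn.execute("INSERT INTO run (run_id) VALUES (34)")


def run_exists(conn: sqlite3.Connection, run_id: int) -> bool:
    # Select a constant instead of a full row: the result is only used
    # as an existence check, so no row object needs to be materialized.
    row = conn.execute(
        "SELECT 1 FROM run WHERE run_id = ? LIMIT 1", (run_id,)
    ).fetchone()
    return row is not None
```

With SQLAlchemy, the equivalent would be `session.execute(text("SELECT 1 FROM run WHERE run_id = :id LIMIT 1"), ...).scalar_one_or_none() is not None`, which avoids returning an unused `Row`.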
## Individual Comments
### Comment 1
<location path="tests/routers/openml/runs_test.py" line_range="10-19" />
<code_context>
+
+
+@pytest.mark.parametrize("run_id", [34])
+async def test_get_run_trace_success(run_id: int, py_api: httpx.AsyncClient) -> None:
+ """Test that trace data is returned for a run that has trace entries."""
+ response = await py_api.get(f"/runs/trace/{run_id}")
+ assert response.status_code == HTTPStatus.OK
+ body = response.json()
+ assert body["run_id"] == run_id
+ assert isinstance(body["trace"], list)
+ assert len(body["trace"]) > 0
+ first = body["trace"][0]
+ assert "repeat" in first
+ assert "fold" in first
+ assert "iteration" in first
+ assert "setup_string" in first
+ assert "evaluation" in first
+ assert "selected" in first
+
+
</code_context>
<issue_to_address>
**suggestion (testing):** Strengthen success test by asserting types/values of key fields, not only their presence
Since the test only verifies key presence, it won’t catch type regressions from the DB layer to the API schema. Please also assert that `selected` is a boolean and that `evaluation` is either `None` or numeric, e.g.:
```python
assert isinstance(first["selected"], bool)
assert (first["evaluation"] is None) or isinstance(first["evaluation"], (int, float))
```
This helps detect accidental changes in the response shape earlier.
</issue_to_address>
### Comment 2
<location path="tests/routers/openml/runs_test.py" line_range="28-33" />
<code_context>
+
+
+@pytest.mark.parametrize("run_id", [24])
+async def test_get_run_trace_no_trace(run_id: int, py_api: httpx.AsyncClient) -> None:
+ """Test that 412 is returned for a run that exists but has no trace."""
+ response = await py_api.get(f"/runs/trace/{run_id}")
+ assert response.status_code == HTTPStatus.PRECONDITION_FAILED
+ body = response.json()
+ assert body["code"] == "572"
+
+
</code_context>
<issue_to_address>
**suggestion (testing):** Extend `no_trace` test to validate the full problem details shape, not just the code
In addition to status and error `code`, please assert key fields from the problem details payload (e.g. `type`/`title`/`status`) so the error contract is fully covered and accidental changes are caught earlier. For example:
```python
assert body["status"] == HTTPStatus.PRECONDITION_FAILED
assert body["type"].endswith("/problems/run-trace-not-found")
assert body["title"] == "Run Trace Not Found"
```
It would also be helpful to mirror this in the `run_not_found` test using the corresponding `RunNotFoundError` values.
</issue_to_address>
Actionable comments posted: 2
🧹 Nitpick comments (2)
src/database/runs.py (1)
30-33: Add explicit trace ordering for deterministic responses. The trace query currently relies on DB default row order. Please sort explicitly so client output is stable and reproducible.
Proposed fix:

SELECT `repeat`, `fold`, `iteration`, `setup_string`, `evaluation`, `selected`
FROM `trace`
WHERE `run_id` = :run_id
+ ORDER BY `repeat`, `fold`, `iteration`

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/database/runs.py` around lines 30-33: The trace SELECT in src/database/runs.py relies on unspecified DB ordering; update the query that selects `repeat`, `fold`, `iteration`, `setup_string`, `evaluation`, `selected` for a given `run_id` to include an explicit ORDER BY (e.g., ORDER BY `repeat`, `fold`, `iteration`, `selected`) so results are deterministic and reproducible; locate the SQL string used to fetch traces and append the ORDER BY clause to it (keeping the WHERE `run_id` = :run_id filter intact).

tests/routers/openml/runs_test.py (1)
18-25: Strengthen success-path assertions for `selected`. Please assert the accepted `selected` values, not just key presence, so contract regressions are caught.

Proposed fix:

first = body["trace"][0]
@@
assert "selected" in first
+ assert first["selected"] in {"true", "false"}

Based on learnings: enum-like values in this API should be exposed as literal strings ("true"/"false"), so tests should verify that explicitly.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/routers/openml/runs_test.py` around lines 18 - 25, The test currently only checks presence of the "selected" key on the first trace entry; update the assertion to verify the value is one of the literal strings "true" or "false" so regressions are caught—locate the variable first = body["trace"][0] in tests/routers/openml/runs_test.py and replace the key-presence assert for "selected" with an assertion that first["selected"] is in the allowed set {"true", "false"} (i.e., explicitly check the string value, not just existence).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/routers/openml/runs.py`:
- Line 40: The response mapping currently coerces the DB enum-like literal by
doing selected=row.selected == "true"; instead, preserve the original DB value
and pass row.selected through unchanged (replace the boolean coercion with the
literal), updating the mapping where "selected" is assigned (the expression
using row.selected) so the API shape remains the original string
("true"/"false").
In `@src/schemas/runs.py`:
- Line 14: The `selected` field in the runs schema is currently typed as bool
but must preserve literal string values "true"/"false"; change its type to
typing.Literal["true", "false"] (or str with a validator enforcing those exact
values) in the schema definition (the `selected` field in the model declared in
src/schemas/runs.py) and add a Pydantic validator if needed to ensure only
"true"/"false" are accepted and returned as strings so the response contract is
preserved.
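The `Literal["true", "false"]` typing suggested above can be sketched without Pydantic; this stdlib version shows the validator idea only (in the real schema, a Pydantic `field_validator` on the `selected` field would play this role — the function name here is hypothetical):

```python
from typing import Literal, get_args

Selected = Literal["true", "false"]


def validate_selected(value: str) -> Selected:
    """Accept only the literal strings the DB stores; reject everything else."""
    if value not in get_args(Selected):
        raise ValueError(f"selected must be 'true' or 'false', got {value!r}")
    return value
```

This preserves the response contract: values pass through as literal strings, and anything else (including Python booleans or "True") is rejected rather than silently coerced.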
---
Nitpick comments:
In `@src/database/runs.py`:
- Around line 30-33: The trace SELECT in src/database/runs.py relies on
unspecified DB ordering; update the query that selects `repeat`, `fold`,
`iteration`, `setup_string`, `evaluation`, `selected` for a given `run_id` to
include an explicit ORDER BY (e.g., ORDER BY `repeat`, `fold`, `iteration`,
`selected`) so results are deterministic and reproducible; locate the SQL string
used to fetch traces and append the ORDER BY clause to it (keeping the WHERE
`run_id` = :run_id filter intact).
In `@tests/routers/openml/runs_test.py`:
- Around line 18-25: The test currently only checks presence of the "selected"
key on the first trace entry; update the assertion to verify the value is one of
the literal strings "true" or "false" so regressions are caught—locate the
variable first = body["trace"][0] in tests/routers/openml/runs_test.py and
replace the key-presence assert for "selected" with an assertion that
first["selected"] is in the allowed set {"true", "false"} (i.e., explicitly
check the string value, not just existence).
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: 2084608b-d09f-4e37-af42-5c84b9ffd054
📒 Files selected for processing (6)
- src/core/errors.py
- src/database/runs.py
- src/main.py
- src/routers/openml/runs.py
- src/schemas/runs.py
- tests/routers/openml/runs_test.py
Hey - I've left some high-level feedback:

- In `database.runs.get`, consider returning a simple boolean (or using `scalar_one_or_none()` with `SELECT 1 ... LIMIT 1`) instead of a `Row | None`, since the query is only used as an existence check and this would make the intent and typing clearer.
- The `get_trace` function currently returns a generic `Sequence[Row]`; if you add a typed `Protocol`/`TypedDict` or a small data class for the expected columns, you can get better type safety and make the mapping in `get_run_trace` more robust to schema changes.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- In `database.runs.get`, consider returning a simple boolean (or using `scalar_one_or_none()` with `SELECT 1 ... LIMIT 1`) instead of a `Row | None`, since the query is only used as an existence check and this would make the intent and typing clearer.
- The `get_trace` function currently returns a generic `Sequence[Row]`; if you add a typed `Protocol`/`TypedDict` or a small data class for the expected columns, you can get better type safety and make the mapping in `get_run_trace` more robust to schema changes.
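The typed-row suggestion might look like this with a `TypedDict`; the field names mirror the trace columns from the proposed query, but the `to_trace_row` helper is illustrative, not the project's actual code:

```python
from typing import Optional, TypedDict


class TraceRow(TypedDict):
    """Expected columns of one `trace` row, for type-checking the mapping."""

    repeat: int
    fold: int
    iteration: int
    setup_string: str
    evaluation: Optional[float]
    selected: str


def to_trace_row(raw: tuple) -> TraceRow:
    """Unpack a raw DB tuple into a named, typed structure."""
    repeat, fold, iteration, setup_string, evaluation, selected = raw
    return TraceRow(
        repeat=repeat,
        fold=fold,
        iteration=iteration,
        setup_string=setup_string,
        evaluation=evaluation,
        selected=selected,
    )


row = to_trace_row((0, 1, 2, "C=1.0", None, "false"))
```

A static checker can then flag a schema drift (a renamed or retyped column) at the mapping site instead of it surfacing as a runtime `KeyError` in `get_run_trace`.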
🧹 Nitpick comments (1)
tests/routers/openml/runs_test.py (1)
11-41: Reduce seed-data coupling in endpoint tests. These tests depend on fixed seeded IDs (Line 11, Line 28, Line 40). Consider arranging run/trace rows inside the test transaction (or via fixtures) so they stay deterministic if seed SQL changes.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/routers/openml/runs_test.py` around lines 11 - 41, The tests test_get_run_trace_success, test_get_run_trace_no_trace (and the parametrized case in test_get_run_trace_run_not_found) are coupled to seeded run IDs (34, 24, 999999); instead insert or create the required run and trace rows inside the test or via a fixture (transaction-scoped or test-scoped factory) and use the created run_id in the GET call to /runs/trace/{run_id}, or create a run without traces for the PRECONDITION_FAILED case; ensure cleanup by using the test DB transaction/fixture so tests remain deterministic if seed SQL changes.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@tests/routers/openml/runs_test.py`:
- Around line 11-41: The tests test_get_run_trace_success,
test_get_run_trace_no_trace (and the parametrized case in
test_get_run_trace_run_not_found) are coupled to seeded run IDs (34, 24,
999999); instead insert or create the required run and trace rows inside the
test or via a fixture (transaction-scoped or test-scoped factory) and use the
created run_id in the GET call to /runs/trace/{run_id}, or create a run without
traces for the PRECONDITION_FAILED case; ensure cleanup by using the test DB
transaction/fixture so tests remain deterministic if seed SQL changes.
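A transaction-scoped arrangement like the one suggested can be sketched with stdlib `sqlite3`. In the real suite this would be a pytest fixture using the project's test DB connection; the schema and helper name below are illustrative assumptions.

```python
import sqlite3
from contextlib import contextmanager


@contextmanager
def run_with_trace():
    """Create a run plus one trace row, yield the run_id, then discard."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE run (run_id INTEGER PRIMARY KEY)")
    conn.execute(
        "CREATE TABLE trace (run_id INTEGER, repeat INTEGER, fold INTEGER,"
        " iteration INTEGER, setup_string TEXT, evaluation REAL, selected TEXT)"
    )
    cur = conn.execute("INSERT INTO run DEFAULT VALUES")
    run_id = cur.lastrowid
    conn.execute(
        "INSERT INTO trace VALUES (?, 0, 0, 1, 'max_depth=3', 0.9, 'true')",
        (run_id,),
    )
    try:
        yield conn, run_id  # test body runs here against the created data
    finally:
        conn.rollback()  # never commit, so other tests stay deterministic
        conn.close()


with run_with_trace() as (conn, run_id):
    n = conn.execute(
        "SELECT COUNT(*) FROM trace WHERE run_id = ?", (run_id,)
    ).fetchone()[0]
```

The test then calls `GET /runs/trace/{run_id}` with the freshly created id, so it no longer breaks if the seeded IDs 34/24 change; the "run without trace" case is the same fixture minus the trace insert.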
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: 74f66776-f5ed-43cc-ad45-a5f9b1a40f25
📒 Files selected for processing (4)
- src/database/runs.py
- src/routers/openml/runs.py
- src/schemas/runs.py
- tests/routers/openml/runs_test.py
🚧 Files skipped from review as they are similar to previous changes (2)
- src/database/runs.py
- src/schemas/runs.py
Description
Implements `GET /runs/trace/{run_id}` as part of the run endpoints.

Fixes: #40
Related: #36
Matches PHP API behavior for error codes (571/572) and response shape.
Returns 412 with code 571 if run does not exist, 412 with code 572 if run
exists but has no trace, and trace rows on success.
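The error contract above follows the RFC 9457 problem-details shape that the new error classes produce. A rough sketch of such a payload for the no-trace case — the `type` URI and `title` strings are assumptions, not the project's actual values:

```python
from http import HTTPStatus


def problem_detail(code: str, title: str, detail: str) -> dict:
    """Build an RFC 9457-style problem-details body for a 412 response."""
    slug = title.lower().replace(" ", "-")
    return {
        "type": f"https://example.org/problems/{slug}",  # assumed URI base
        "title": title,
        "status": int(HTTPStatus.PRECONDITION_FAILED),
        "detail": detail,
        "code": code,  # legacy PHP API error code, kept for compatibility
    }


body = problem_detail("572", "Run Trace Not Found", "Run exists but has no trace.")
```

Tests can then assert on `status`, `type`, and `title` in addition to `code`, as the review above suggests, so the full error contract is covered.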
Checklist
Please check all that apply. You can mark items as N/A if they don't apply to your change.
Always:
Required for code changes:
If applicable:
/docs)

Extra context: