fix(flows): replace GET /flows/exists with POST and keep deprecated GET alias#275
fix(flows): replace GET /flows/exists with POST and keep deprecated GET alias#275Jayant-kernel wants to merge 5 commits intoopenml:mainfrom
Conversation
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Repository UI Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (1)
🚧 Files skipped from review as they are similar to previous changes (1)
WalkthroughThis pull request adds dataset untagging: two async DB helpers ( Possibly related PRs
Suggested labels
🚥 Pre-merge checks | ✅ 3 | ❌ 2❌ Failed checks (2 warnings)
✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
📝 Coding Plan
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
@PGijsbers |
There was a problem hiding this comment.
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/routers/openml/datasets.py`:
- Around line 85-98: The current flow fetches dataset_tags and authorizes based
on matched_tag_row, then calls database.datasets.untag separately, allowing a
race where both requests pass the precheck; make the untag operation atomic by
moving authorization into the delete: modify database.datasets.untag (or add a
new method) to accept the deleting user's id (user.user_id) and perform a single
SQL DELETE ... WHERE data_id = ? AND tag = ? AND (uploader = ? OR EXISTS(SELECT
1 FROM user_groups WHERE user_id = ? AND group = 'ADMIN')), returning the
rowcount; in the route replace the precheck+untag sequence with a single call to
that atomic method and raise TagNotFoundError or TagNotOwnedError when rowcount
== 0 so the request fails when nothing was removed.
In `@src/routers/openml/flows.py`:
- Around line 34-41: The GET handler flow_exists_get currently accepts raw path
params and constructs FlowExistsBody(name=..., external_version=...), causing
Pydantic validation errors to become 500s; update the function signature to
apply the same Path constraints as FlowExistsBody (name: str = Path(...,
min_length=1, max_length=1024), external_version: str = Path(..., min_length=1,
max_length=128)) so FastAPI validates inputs and returns 422 on bad input, and
add the required Path import from fastapi.
In `@tests/routers/openml/migration/datasets_migration_test.py`:
- Around line 253-257: The compensating retag call currently posts to php_api
but doesn't verify success; change the block that calls php_api.post (the call
using api_key, tag, data_id/dataset_id) to capture the response (e.g., response
= await php_api.post(...)) and assert its status is successful (compare
response.status_code to HTTPStatus.OK or similar) before allowing the test to
continue so failures in the restore step fail fast and prevent downstream noise.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: 3abc5e77-b1ae-472c-a0ab-fa6910897c4e
📒 Files selected for processing (7)
src/database/datasets.pysrc/routers/openml/datasets.pysrc/routers/openml/flows.pysrc/schemas/flows.pytests/routers/openml/dataset_tag_test.pytests/routers/openml/flows_test.pytests/routers/openml/migration/datasets_migration_test.py
| dataset_tags = await database.datasets.get_tags(data_id, expdb_db) | ||
| matched_tag_row = next((t for t in dataset_tags if t.tag.casefold() == tag.casefold()), None) | ||
| if matched_tag_row is None: | ||
| msg = f"Dataset {data_id} does not have tag {tag!r}." | ||
| raise TagNotFoundError(msg) | ||
|
|
||
| if matched_tag_row.uploader != user.user_id and UserGroup.ADMIN not in await user.get_groups(): | ||
| msg = ( | ||
| f"You may not remove tag {tag!r} of dataset {data_id} " | ||
| "because it was not created by you." | ||
| ) | ||
| raise TagNotOwnedError(msg) | ||
|
|
||
| await database.datasets.untag(data_id, matched_tag_row.tag, connection=expdb_db) |
There was a problem hiding this comment.
Make the untag operation atomic.
This authorizes against matched_tag_row in one query and then does a separate delete in another. Two concurrent untag requests can both pass the precheck; the loser deletes 0 rows but still returns 200 OK. Please collapse this into a single authorized delete, or have database.datasets.untag(...) return rowcount and fail when nothing was actually removed.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/routers/openml/datasets.py` around lines 85 - 98, The current flow
fetches dataset_tags and authorizes based on matched_tag_row, then calls
database.datasets.untag separately, allowing a race where both requests pass the
precheck; make the untag operation atomic by moving authorization into the
delete: modify database.datasets.untag (or add a new method) to accept the
deleting user's id (user.user_id) and perform a single SQL DELETE ... WHERE
data_id = ? AND tag = ? AND (uploader = ? OR EXISTS(SELECT 1 FROM user_groups
WHERE user_id = ? AND group = 'ADMIN')), returning the rowcount; in the route
replace the precheck+untag sequence with a single call to that atomic method and
raise TagNotFoundError or TagNotOwnedError when rowcount == 0 so the request
fails when nothing was removed.
| @router.get("/exists/{name}/{external_version}", deprecated=True) | ||
| async def flow_exists_get( | ||
| name: str, | ||
| external_version: str, | ||
| expdb: Annotated[AsyncConnection, Depends(expdb_connection)], | ||
| ) -> dict[Literal["flow_id"], int]: | ||
| """Use POST /flows/exists instead.""" | ||
| return await flow_exists(FlowExistsBody(name=name, external_version=external_version), expdb) |
There was a problem hiding this comment.
🧩 Analysis chain
🌐 Web query:
In FastAPI with Pydantic v2, if a route handler manually instantiates a Pydantic model and that constructor raises ValidationError, is the response converted to 422 like request validation, or does it surface as a 500 unless explicitly handled?
💡 Result:
If you manually instantiate/validate a Pydantic model inside a FastAPI route handler and it raises pydantic.ValidationError, FastAPI will treat that as an internal error and the client will get HTTP 500, not an automatic 422.
FastAPI only auto-converts request parsing/validation failures into RequestValidationError (which has a default handler returning 422). A plain Pydantic ValidationError “in your response or anywhere in your code (not in the client’s request)” is considered a bug and results in 500 unless you catch/handle it yourself. [1]
So, if you want a 422, you must explicitly handle it, e.g. catch ValidationError and raise HTTPException(status_code=422, detail=e.errors(...)). [2]
Sources: [1] [2]
🏁 Script executed:
cd src/routers/openml && head -100 flows.py | cat -nRepository: openml/server-api
Length of output: 3829
🏁 Script executed:
rg "FlowExistsBody" --type py -A 5 -B 2Repository: openml/server-api
Length of output: 4002
🏁 Script executed:
rg "def flow_exists" --type py -A 15Repository: openml/server-api
Length of output: 2056
Add Path constraints to the GET alias to match POST validation and ensure consistent 422 responses.
The GET route accepts name and external_version as bare path parameters with no validation, then manually instantiates FlowExistsBody(...) inside the handler. When Pydantic validation fails in a manually instantiated model, FastAPI treats it as an unhandled error returning 500, while the POST route returns 422 for the same invalid input. Apply the same constraints to path parameters as defined in FlowExistsBody (min_length=1, max_length=1024 for name; min_length=1, max_length=128 for external_version).
🛠️ Suggested fix
-from fastapi import APIRouter, Depends
+from fastapi import APIRouter, Depends, Path
@@
async def flow_exists_get(
- name: str,
- external_version: str,
+ name: Annotated[str, Path(min_length=1, max_length=1024)],
+ external_version: Annotated[str, Path(min_length=1, max_length=128)],
expdb: Annotated[AsyncConnection, Depends(expdb_connection)],
) -> dict[Literal["flow_id"], int]:📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| @router.get("/exists/{name}/{external_version}", deprecated=True) | |
| async def flow_exists_get( | |
| name: str, | |
| external_version: str, | |
| expdb: Annotated[AsyncConnection, Depends(expdb_connection)], | |
| ) -> dict[Literal["flow_id"], int]: | |
| """Use POST /flows/exists instead.""" | |
| return await flow_exists(FlowExistsBody(name=name, external_version=external_version), expdb) | |
| `@router.get`("/exists/{name}/{external_version}", deprecated=True) | |
| async def flow_exists_get( | |
| name: Annotated[str, Path(min_length=1, max_length=1024)], | |
| external_version: Annotated[str, Path(min_length=1, max_length=128)], | |
| expdb: Annotated[AsyncConnection, Depends(expdb_connection)], | |
| ) -> dict[Literal["flow_id"], int]: | |
| """Use POST /flows/exists instead.""" | |
| return await flow_exists(FlowExistsBody(name=name, external_version=external_version), expdb) |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/routers/openml/flows.py` around lines 34 - 41, The GET handler
flow_exists_get currently accepts raw path params and constructs
FlowExistsBody(name=..., external_version=...), causing Pydantic validation
errors to become 500s; update the function signature to apply the same Path
constraints as FlowExistsBody (name: str = Path(..., min_length=1,
max_length=1024), external_version: str = Path(..., min_length=1,
max_length=128)) so FastAPI validates inputs and returns 422 on bad input, and
add the required Path import from fastapi.
| if original.status_code == HTTPStatus.OK: | ||
| await php_api.post( | ||
| "/data/tag", | ||
| data={"api_key": api_key, "tag": tag, "data_id": dataset_id}, | ||
| ) |
There was a problem hiding this comment.
Check the compensating retag call.
If this restore step fails, the Python request runs against the wrong starting state and any mismatch later is just noise. Please assert the /data/tag response before continuing.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/routers/openml/migration/datasets_migration_test.py` around lines 253 -
257, The compensating retag call currently posts to php_api but doesn't verify
success; change the block that calls php_api.post (the call using api_key, tag,
data_id/dataset_id) to capture the response (e.g., response = await
php_api.post(...)) and assert its status is successful (compare
response.status_code to HTTPStatus.OK or similar) before allowing the test to
continue so failures in the restore step fail fast and prevent downstream noise.
Description
This PR updates flow existence lookup to support URI-unsafe flow names by adding a POST endpoint that accepts request JSON instead of path parameters.
Changes included:
POST /flows/existswith body{ "name", "external_version" }GET /flows/exists/{name}/{external_version}as a deprecated compatibility alias delegating to POST behaviorFlowExistsBodyschema with basic field validation422)Fixes: #166
Checklist
Please check all that apply. You can mark items as N/A if they don't apply to your change.
Always:
Required for code changes:
If applicable:
/docs)Extra context:
nb. If tests pass locally but fail on CI, please try to investigate the cause. If you are unable to resolve the issue, please share your findings.