Skip to content

fix(flows): replace GET /flows/exists with POST and keep deprecated GET alias#275

Open
Jayant-kernel wants to merge 5 commits intoopenml:mainfrom
Jayant-kernel:feature/flows-exists-post
Open

fix(flows): replace GET /flows/exists with POST and keep deprecated GET alias#275
Jayant-kernel wants to merge 5 commits intoopenml:mainfrom
Jayant-kernel:feature/flows-exists-post

Conversation

@Jayant-kernel
Copy link

Description

This PR updates flow existence lookup to support URI-unsafe flow names by adding a POST endpoint that accepts request JSON instead of path parameters.

Changes included:

  • Adds POST /flows/exists with body { "name", "external_version" }
  • Keeps GET /flows/exists/{name}/{external_version} as a deprecated compatibility alias delegating to POST behavior
  • Adds FlowExistsBody schema with basic field validation
  • Updates flow endpoint tests to use POST and adds coverage for:
    • empty field validation (422)
    • URI-unsafe flow names
    • deprecated GET alias behavior

Fixes: #166

Checklist

Please check all that apply. You can mark items as N/A if they don't apply to your change.

Always:

  • I have performed a self-review of my own pull request to ensure it contains all relevant information, and the proposed changes are minimal but sufficient to accomplish their task.

Required for code changes:

  • Tests pass locally
  • I have commented my code in hard-to-understand areas, and provided or updated docstrings as needed
  • Changes are already covered under existing tests
  • I have added tests that cover the changes (only required if not already under coverage)

If applicable:

  • I have made corresponding changes to the documentation pages (/docs)

Extra context:

  • This PR and the commits have been created autonomously by a bot/agent.

nb. If tests pass locally but fail on CI, please try to investigate the cause. If you are unable to resolve the issue, please share your findings.

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Mar 15, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: bbea2800-3e28-4dd9-bbef-b128a565f9b9

📥 Commits

Reviewing files that changed from the base of the PR and between 72675aa and 841c16e.

📒 Files selected for processing (1)
  • src/routers/openml/flows.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/routers/openml/flows.py

Walkthrough

This pull request adds dataset untagging: two async DB helpers (get_tags, untag) and a POST /untag API that validates dataset existence, tag presence, and ownership/admin permissions before removing a tag and returning the dataset id. It expands dataset status filters to include IN_PREPARATION and ALL. It also refactors the flow existence check to accept a POST body (FlowExistsBody) with a deprecated GET wrapper retained for compatibility. Tests were added/updated for dataset untagging, flow existence, and migration parity between PHP and Python APIs.

Possibly related PRs

  • [ENH] RFC9457 compliant errors #238: Adds RFC9457-style problem-detail error classes and conventions used by the new TagNotFoundError and TagNotOwnedError in the untagging flow.

Suggested labels

enhancement

🚥 Pre-merge checks | ✅ 3 | ❌ 2

❌ Failed checks (2 warnings)

Check name Status Explanation Resolution
Out of Scope Changes check ⚠️ Warning The PR includes dataset tagging functionality (untag operation and tests) which is unrelated to the flow existence endpoint changes required by issue #166. Remove dataset tagging changes (datasets.py, routers/openml/datasets.py, dataset_tag_test.py, datasets_migration_test.py) as they are out of scope for the flow existence fix.
Docstring Coverage ⚠️ Warning Docstring coverage is 11.54% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (3 passed)
Check name Status Explanation
Title check ✅ Passed The PR title accurately describes the main change: adding POST /flows/exists endpoint while keeping the deprecated GET endpoint as an alias.
Description check ✅ Passed The PR description is relevant to the changeset, detailing the POST endpoint addition, deprecated GET alias, schema validation, and test coverage.
Linked Issues check ✅ Passed The PR addresses issue #166 by replacing GET /flows/exists with POST and keeping deprecated GET as alias, supporting URI-unsafe flow names and maintaining backward compatibility.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
📝 Coding Plan
  • Generate coding plan for human review comments

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Contributor

@sourcery-ai sourcery-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey - I've reviewed your changes and they look great!


Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.

@Jayant-kernel
Copy link
Author

@PGijsbers
please review my pr

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/routers/openml/datasets.py`:
- Around line 85-98: The current flow fetches dataset_tags and authorizes based
on matched_tag_row, then calls database.datasets.untag separately, allowing a
race where both requests pass the precheck; make the untag operation atomic by
moving authorization into the delete: modify database.datasets.untag (or add a
new method) to accept the deleting user's id (user.user_id) and perform a single
SQL DELETE ... WHERE data_id = ? AND tag = ? AND (uploader = ? OR EXISTS(SELECT
1 FROM user_groups WHERE user_id = ? AND group = 'ADMIN')), returning the
rowcount; in the route replace the precheck+untag sequence with a single call to
that atomic method and raise TagNotFoundError or TagNotOwnedError when rowcount
== 0 so the request fails when nothing was removed.

In `@src/routers/openml/flows.py`:
- Around line 34-41: The GET handler flow_exists_get currently accepts raw path
params and constructs FlowExistsBody(name=..., external_version=...), causing
Pydantic validation errors to become 500s; update the function signature to
apply the same Path constraints as FlowExistsBody (name: str = Path(...,
min_length=1, max_length=1024), external_version: str = Path(..., min_length=1,
max_length=128)) so FastAPI validates inputs and returns 422 on bad input, and
add the required Path import from fastapi.

In `@tests/routers/openml/migration/datasets_migration_test.py`:
- Around line 253-257: The compensating retag call currently posts to php_api
but doesn't verify success; change the block that calls php_api.post (the call
using api_key, tag, data_id/dataset_id) to capture the response (e.g., response
= await php_api.post(...)) and assert its status is successful (compare
response.status_code to HTTPStatus.OK or similar) before allowing the test to
continue so failures in the restore step fail fast and prevent downstream noise.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 3abc5e77-b1ae-472c-a0ab-fa6910897c4e

📥 Commits

Reviewing files that changed from the base of the PR and between f94808c and 72675aa.

📒 Files selected for processing (7)
  • src/database/datasets.py
  • src/routers/openml/datasets.py
  • src/routers/openml/flows.py
  • src/schemas/flows.py
  • tests/routers/openml/dataset_tag_test.py
  • tests/routers/openml/flows_test.py
  • tests/routers/openml/migration/datasets_migration_test.py

Comment on lines +85 to +98
dataset_tags = await database.datasets.get_tags(data_id, expdb_db)
matched_tag_row = next((t for t in dataset_tags if t.tag.casefold() == tag.casefold()), None)
if matched_tag_row is None:
msg = f"Dataset {data_id} does not have tag {tag!r}."
raise TagNotFoundError(msg)

if matched_tag_row.uploader != user.user_id and UserGroup.ADMIN not in await user.get_groups():
msg = (
f"You may not remove tag {tag!r} of dataset {data_id} "
"because it was not created by you."
)
raise TagNotOwnedError(msg)

await database.datasets.untag(data_id, matched_tag_row.tag, connection=expdb_db)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

Make the untag operation atomic.

This authorizes against matched_tag_row in one query and then does a separate delete in another. Two concurrent untag requests can both pass the precheck; the loser deletes 0 rows but still returns 200 OK. Please collapse this into a single authorized delete, or have database.datasets.untag(...) return rowcount and fail when nothing was actually removed.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/routers/openml/datasets.py` around lines 85 - 98, The current flow
fetches dataset_tags and authorizes based on matched_tag_row, then calls
database.datasets.untag separately, allowing a race where both requests pass the
precheck; make the untag operation atomic by moving authorization into the
delete: modify database.datasets.untag (or add a new method) to accept the
deleting user's id (user.user_id) and perform a single SQL DELETE ... WHERE
data_id = ? AND tag = ? AND (uploader = ? OR EXISTS(SELECT 1 FROM user_groups
WHERE user_id = ? AND group = 'ADMIN')), returning the rowcount; in the route
replace the precheck+untag sequence with a single call to that atomic method and
raise TagNotFoundError or TagNotOwnedError when rowcount == 0 so the request
fails when nothing was removed.

Comment on lines +34 to +41
@router.get("/exists/{name}/{external_version}", deprecated=True)
async def flow_exists_get(
name: str,
external_version: str,
expdb: Annotated[AsyncConnection, Depends(expdb_connection)],
) -> dict[Literal["flow_id"], int]:
"""Use POST /flows/exists instead."""
return await flow_exists(FlowExistsBody(name=name, external_version=external_version), expdb)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🌐 Web query:

In FastAPI with Pydantic v2, if a route handler manually instantiates a Pydantic model and that constructor raises ValidationError, is the response converted to 422 like request validation, or does it surface as a 500 unless explicitly handled?

💡 Result:

If you manually instantiate/validate a Pydantic model inside a FastAPI route handler and it raises pydantic.ValidationError, FastAPI will treat that as an internal error and the client will get HTTP 500, not an automatic 422.

FastAPI only auto-converts request parsing/validation failures into RequestValidationError (which has a default handler returning 422). A plain Pydantic ValidationError “in your response or anywhere in your code (not in the client’s request)” is considered a bug and results in 500 unless you catch/handle it yourself. [1]

So, if you want a 422, you must explicitly handle it, e.g. catch ValidationError and raise HTTPException(status_code=422, detail=e.errors(...)). [2]

Sources: [1] [2]


🏁 Script executed:

cd src/routers/openml && head -100 flows.py | cat -n

Repository: openml/server-api

Length of output: 3829


🏁 Script executed:

rg "FlowExistsBody" --type py -A 5 -B 2

Repository: openml/server-api

Length of output: 4002


🏁 Script executed:

rg "def flow_exists" --type py -A 15

Repository: openml/server-api

Length of output: 2056


Add Path constraints to the GET alias to match POST validation and ensure consistent 422 responses.

The GET route accepts name and external_version as bare path parameters with no validation, then manually instantiates FlowExistsBody(...) inside the handler. When Pydantic validation fails in a manually instantiated model, FastAPI treats it as an unhandled error returning 500, while the POST route returns 422 for the same invalid input. Apply the same constraints to path parameters as defined in FlowExistsBody (min_length=1, max_length=1024 for name; min_length=1, max_length=128 for external_version).

🛠️ Suggested fix
-from fastapi import APIRouter, Depends
+from fastapi import APIRouter, Depends, Path
@@
 async def flow_exists_get(
-    name: str,
-    external_version: str,
+    name: Annotated[str, Path(min_length=1, max_length=1024)],
+    external_version: Annotated[str, Path(min_length=1, max_length=128)],
     expdb: Annotated[AsyncConnection, Depends(expdb_connection)],
 ) -> dict[Literal["flow_id"], int]:
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
@router.get("/exists/{name}/{external_version}", deprecated=True)
async def flow_exists_get(
name: str,
external_version: str,
expdb: Annotated[AsyncConnection, Depends(expdb_connection)],
) -> dict[Literal["flow_id"], int]:
"""Use POST /flows/exists instead."""
return await flow_exists(FlowExistsBody(name=name, external_version=external_version), expdb)
`@router.get`("/exists/{name}/{external_version}", deprecated=True)
async def flow_exists_get(
name: Annotated[str, Path(min_length=1, max_length=1024)],
external_version: Annotated[str, Path(min_length=1, max_length=128)],
expdb: Annotated[AsyncConnection, Depends(expdb_connection)],
) -> dict[Literal["flow_id"], int]:
"""Use POST /flows/exists instead."""
return await flow_exists(FlowExistsBody(name=name, external_version=external_version), expdb)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/routers/openml/flows.py` around lines 34 - 41, The GET handler
flow_exists_get currently accepts raw path params and constructs
FlowExistsBody(name=..., external_version=...), causing Pydantic validation
errors to become 500s; update the function signature to apply the same Path
constraints as FlowExistsBody (name: str = Path(..., min_length=1,
max_length=1024), external_version: str = Path(..., min_length=1,
max_length=128)) so FastAPI validates inputs and returns 422 on bad input, and
add the required Path import from fastapi.

Comment on lines +253 to +257
if original.status_code == HTTPStatus.OK:
await php_api.post(
"/data/tag",
data={"api_key": api_key, "tag": tag, "data_id": dataset_id},
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Check the compensating retag call.

If this restore step fails, the Python request runs against the wrong starting state and any mismatch later is just noise. Please assert the /data/tag response before continuing.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/routers/openml/migration/datasets_migration_test.py` around lines 253 -
257, The compensating retag call currently posts to php_api but doesn't verify
success; change the block that calls php_api.post (the call using api_key, tag,
data_id/dataset_id) to capture the response (e.g., response = await
php_api.post(...)) and assert its status is successful (compare
response.status_code to HTTPStatus.OK or similar) before allowing the test to
continue so failures in the restore step fail fast and prevent downstream noise.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

On flow names, external versions and URI-safe characters

1 participant