Fix tasktype api_constraints parsing for malformed JSON and dict inputs#274
Fix tasktype api_constraints parsing for malformed JSON and dict inputs#274aayush-0131 wants to merge 2 commits intoopenml:mainfrom
Conversation
WalkthroughAdded a helper 🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
📝 Coding Plan
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment Tip CodeRabbit can generate a title for your PR based on the changes with custom instructions.Set the |
There was a problem hiding this comment.
Hey - I've found 2 issues, and left some high level feedback:
- Consider extracting the
api_constraintsparsing logic into a small helper function to keep the endpoint handler focused on assembling the response and to make the parsing behavior reusable elsewhere. - Right now malformed JSON in
api_constraintsis silently ignored; adding at least debug-level logging for theJSONDecodeErrorcase would make it easier to trace bad data in production. - You might want to add explicit type hints for
constraintandac(e.g.,Mapping[str, Any] | str | None) to make the expected shapes clearer and let static type checkers catch future regressions.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- Consider extracting the `api_constraints` parsing logic into a small helper function to keep the endpoint handler focused on assembling the response and to make the parsing behavior reusable elsewhere.
- Right now malformed JSON in `api_constraints` is silently ignored; adding at least debug-level logging for the `JSONDecodeError` case would make it easier to trace bad data in production.
- You might want to add explicit type hints for `constraint` and `ac` (e.g., `Mapping[str, Any] | str | None`) to make the expected shapes clearer and let static type checkers catch future regressions.
## Individual Comments
### Comment 1
<location path="src/routers/openml/tasktype.py" line_range="72-77" />
<code_context>
- input_["data_type"] = constraint["data_type"]
+
+ # Accept either legacy JSON string or already parsed dict
+ constraint = None
+ ac = task_type_input.api_constraints
+ if isinstance(ac, str):
+ try:
+ constraint = json.loads(ac)
+ except json.JSONDecodeError:
+ # Keep response stable for malformed legacy values
+ constraint = None
</code_context>
<issue_to_address>
**suggestion (bug_risk):** Silently swallowing JSONDecodeError could hide data issues; consider at least lightweight logging or metrics.
Malformed legacy `api_constraints` are currently dropped, which preserves the response shape but makes data issues hard to trace. Please consider adding a minimal log or metric in the `JSONDecodeError` branch so we can detect and investigate unexpected values without changing external behavior.
Suggested implementation:
```python
import json
import logging
logger = logging.getLogger(__name__)
```
```python
# Accept either legacy JSON string or already parsed dict
constraint = None
ac = task_type_input.api_constraints
if isinstance(ac, str):
try:
constraint = json.loads(ac)
except json.JSONDecodeError:
# Keep response stable for malformed legacy values, but surface the issue via logs
logger.warning(
"Failed to decode legacy api_constraints JSON for task_type_input '%s'; value=%r",
getattr(task_type_input, "name", "<unknown>"),
ac,
exc_info=True,
)
constraint = None
elif isinstance(ac, dict):
constraint = ac
```
</issue_to_address>
### Comment 2
<location path="src/routers/openml/tasktype.py" line_range="80-81" />
<code_context>
+ except json.JSONDecodeError:
+ # Keep response stable for malformed legacy values
+ constraint = None
+ elif isinstance(ac, dict):
+ constraint = ac
+
+ if isinstance(constraint, dict):
</code_context>
<issue_to_address>
**suggestion:** The type check for dict is quite strict; consider accepting Mapping to be more flexible with ORM / JSON field types.
If `api_constraints` comes from a JSON/JSONB or other ORM field, it may be a custom mapping rather than a `dict`. Using `isinstance(ac, Mapping)` from `collections.abc` would keep behavior the same while working with those mapping types as well.
Suggested implementation:
```python
from collections.abc import Mapping
import json
```
```python
elif isinstance(ac, Mapping):
constraint = ac
if isinstance(constraint, Mapping):
```
</issue_to_address>Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
There was a problem hiding this comment.
Pull request overview
Updates the OpenML /tasktype/{task_type_id} endpoint response normalization to be more tolerant of legacy/variant api_constraints formats when building task type input metadata.
Changes:
- Accept
api_constraintsas either a legacy JSON string or an already-parsed dict. - Gracefully handle malformed legacy JSON by omitting
data_typerather than erroring.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
You can also share your feedback on Copilot code review. Take the survey.
src/routers/openml/tasktype.py
Outdated
| if isinstance(ac, str): | ||
| try: | ||
| constraint = json.loads(ac) | ||
| except json.JSONDecodeError: | ||
| # Keep response stable for malformed legacy values | ||
| constraint = None |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/routers/openml/tasktype.py`:
- Around line 84-86: The code currently assigns constraint.get("data_type")
directly into input_ which can be non-string; change this to only set
input_["data_type"] when the retrieved data_type is a string (e.g., check
isinstance(data_type, str) or equivalent) so non-string JSON values are not
emitted; locate the assignment around the variables data_type, constraint, and
input_ (within the function handling task type parsing) and add the type guard
before setting input_["data_type"].
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: 380cd0cd-ef05-49a7-9399-954efaa24034
📒 Files selected for processing (1)
src/routers/openml/tasktype.py
There was a problem hiding this comment.
🧹 Nitpick comments (2)
src/routers/openml/tasktype.py (2)
43-47: Avoid logging rawapi_constraintspayloads.On Line 44-Line 47, logging the full malformed value can leak unexpected data and create noisy logs. Prefer bounded metadata (e.g., length/type) instead of the full payload.
Proposed adjustment
except json.JSONDecodeError: logger.warning( - "Failed to decode legacy api_constraints JSON for task_type_input '%s'; value=%r", + "Failed to decode legacy api_constraints JSON for task_type_input '%s' (type=%s, len=%s)", input_name, - api_constraints, + type(api_constraints).__name__, + len(api_constraints), exc_info=True, ) return None🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/routers/openml/tasktype.py` around lines 43 - 47, Replace the direct logging of the raw api_constraints payload in the logger.warning call around the task type decoding code: do not include api_constraints value itself; instead log safe metadata such as type(api_constraints), a length (len(api_constraints) if it’s a sequence/string), and an optional short truncated preview or deterministic fingerprint (e.g., first N characters or a hash) together with input_name and exc_info=True; update the logger.warning invocation that currently references api_constraints and input_name so it emits only this bounded metadata for api_constraints.
32-60: Add focused tests for new parsing branches.Please add endpoint/helper coverage for malformed JSON strings, dict inputs, non-object JSON values, and non-string
data_typevalues to lock in this robustness fix.Also applies to: 104-109
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/routers/openml/tasktype.py` around lines 32 - 60, Add focused unit tests for _extract_data_type_from_api_constraints to cover all parsing branches: pass a malformed JSON string and assert it returns None (and logs a warning), pass a Mapping/dict and assert it extracts string data_type correctly, pass valid JSON that decodes to a non-object value (e.g., JSON "null" or "42") and assert it returns None, and pass a constraint object with a non-string data_type (e.g., number or list) and assert it returns None; place tests against the helper used by the endpoint so both direct helper and the endpoint path exercising this logic are covered.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@src/routers/openml/tasktype.py`:
- Around line 43-47: Replace the direct logging of the raw api_constraints
payload in the logger.warning call around the task type decoding code: do not
include api_constraints value itself; instead log safe metadata such as
type(api_constraints), a length (len(api_constraints) if it’s a
sequence/string), and an optional short truncated preview or deterministic
fingerprint (e.g., first N characters or a hash) together with input_name and
exc_info=True; update the logger.warning invocation that currently references
api_constraints and input_name so it emits only this bounded metadata for
api_constraints.
- Around line 32-60: Add focused unit tests for
_extract_data_type_from_api_constraints to cover all parsing branches: pass a
malformed JSON string and assert it returns None (and logs a warning), pass a
Mapping/dict and assert it extracts string data_type correctly, pass valid JSON
that decodes to a non-object value (e.g., JSON "null" or "42") and assert it
returns None, and pass a constraint object with a non-string data_type (e.g.,
number or list) and assert it returns None; place tests against the helper used
by the endpoint so both direct helper and the endpoint path exercising this
logic are covered.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: 58219706-ad23-46b9-b018-6654d895b8f9
📒 Files selected for processing (1)
src/routers/openml/tasktype.py
Description
This PR improves robustness in the
/tasktype/{task_type_id}endpoint when processingapi_constraints.What changed
In
src/routers/openml/tasktype.py, parsing oftask_type_input.api_constraintsnow:json.JSONDecodeError) without crashing,data_typein the response when it is present.Motivation / context
Previously, malformed legacy values or unexpected shapes in
api_constraintscould lead to unstable behavior during response construction. This change makes the endpoint defensive and stable for those cases.Fixes: #273
Checklist
Please check all that apply. You can mark items as N/A if they don't apply to your change.
Always:
Required for code changes:
If applicable:
/docs)Extra context:
This PR and the commits have been created autonomously by a bot/agent.
ruff check .✅ruff format .✅pytest -k task_type❌ (blocked by local test DB hostopenml-test-databasenot available)