Skip to content

Fix tasktype api_constraints parsing for malformed JSON and dict inputs#274

Open
aayush-0131 wants to merge 2 commits intoopenml:mainfrom
aayush-0131:fix/tasktype-api-constraints-robustness
Open

Fix tasktype api_constraints parsing for malformed JSON and dict inputs#274
aayush-0131 wants to merge 2 commits intoopenml:mainfrom
aayush-0131:fix/tasktype-api-constraints-robustness

Conversation

@aayush-0131
Copy link

@aayush-0131 aayush-0131 commented Mar 14, 2026

Description

This PR improves robustness in the /tasktype/{task_type_id} endpoint when processing api_constraints.

What changed

In src/routers/openml/tasktype.py, parsing of task_type_input.api_constraints now:

  • accepts both legacy JSON strings and already-parsed dictionaries,
  • safely handles malformed JSON (json.JSONDecodeError) without crashing,
  • only includes data_type in the response when it is present.

Motivation / context

Previously, malformed legacy values or unexpected shapes in api_constraints could lead to unstable behavior during response construction. This change makes the endpoint defensive and stable for those cases.

Fixes: #273

Checklist

Please check all that apply. You can mark items as N/A if they don't apply to your change.

Always:

  • I have performed a self-review of my own pull request to ensure it contains all relevant information, and the proposed changes are minimal but sufficient to accomplish their task.

Required for code changes:

  • Tests pass locally
  • I have commented my code in hard-to-understand areas, and provided or updated docstrings as needed
  • Changes are already covered under existing tests
  • I have added tests that cover the changes (only required if not already under coverage)

If applicable:

  • I have made corresponding changes to the documentation pages (/docs)

Extra context:

  • This PR and the commits have been created autonomously by a bot/agent.

  • ruff check .

  • ruff format .

  • pytest -k task_type ❌ (blocked by local test DB host openml-test-database not available)

Copilot AI review requested due to automatic review settings March 14, 2026 05:44
@coderabbitai
Copy link
Contributor

coderabbitai bot commented Mar 14, 2026

Walkthrough

Added a helper _extract_data_type_from_api_constraints and a module-level logger in src/routers/openml/tasktype.py. The code now accepts api_constraints as either a JSON string or a Mapping, attempts json.loads for strings with JSONDecodeError handling and logging, and returns data_type when present. Inline JSON parsing was replaced with this helper; input["data_type"] is populated only if the helper returns a value. A safe fallback path is used when parsing fails or data_type is absent.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 25.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly and concisely describes the main fix: improving robustness of api_constraints parsing to handle both malformed JSON and dict inputs.
Description check ✅ Passed The description is well-related to the changeset, clearly explaining what changed, why, and referencing the linked issue #273.
Linked Issues check ✅ Passed The PR fully addresses issue #273 requirements: handles malformed JSON via try/except, accepts both strings and dicts, includes data_type conditionally, and implements defensive parsing.
Out of Scope Changes check ✅ Passed All changes in tasktype.py are focused on the api_constraints parsing robustness, directly addressing issue #273 with no unrelated modifications.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
📝 Coding Plan
  • Generate coding plan for human review comments

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Tip

CodeRabbit can generate a title for your PR based on the changes with custom instructions.

Set the reviews.auto_title_instructions setting to generate a title for your PR based on the changes in the PR with custom instructions.

Copy link
Contributor

@sourcery-ai sourcery-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey - I've found 2 issues, and left some high level feedback:

  • Consider extracting the api_constraints parsing logic into a small helper function to keep the endpoint handler focused on assembling the response and to make the parsing behavior reusable elsewhere.
  • Right now malformed JSON in api_constraints is silently ignored; adding at least debug-level logging for the JSONDecodeError case would make it easier to trace bad data in production.
  • You might want to add explicit type hints for constraint and ac (e.g., Mapping[str, Any] | str | None) to make the expected shapes clearer and let static type checkers catch future regressions.
Prompt for AI Agents
Please address the comments from this code review:

## Overall Comments
- Consider extracting the `api_constraints` parsing logic into a small helper function to keep the endpoint handler focused on assembling the response and to make the parsing behavior reusable elsewhere.
- Right now malformed JSON in `api_constraints` is silently ignored; adding at least debug-level logging for the `JSONDecodeError` case would make it easier to trace bad data in production.
- You might want to add explicit type hints for `constraint` and `ac` (e.g., `Mapping[str, Any] | str | None`) to make the expected shapes clearer and let static type checkers catch future regressions.

## Individual Comments

### Comment 1
<location path="src/routers/openml/tasktype.py" line_range="72-77" />
<code_context>
-            input_["data_type"] = constraint["data_type"]
+
+        # Accept either legacy JSON string or already parsed dict
+        constraint = None
+        ac = task_type_input.api_constraints
+        if isinstance(ac, str):
+            try:
+                constraint = json.loads(ac)
+            except json.JSONDecodeError:
+                # Keep response stable for malformed legacy values
+                constraint = None
</code_context>
<issue_to_address>
**suggestion (bug_risk):** Silently swallowing JSONDecodeError could hide data issues; consider at least lightweight logging or metrics.

Malformed legacy `api_constraints` are currently dropped, which preserves the response shape but makes data issues hard to trace. Please consider adding a minimal log or metric in the `JSONDecodeError` branch so we can detect and investigate unexpected values without changing external behavior.

Suggested implementation:

```python
import json
import logging

logger = logging.getLogger(__name__)

```

```python
        # Accept either legacy JSON string or already parsed dict
        constraint = None
        ac = task_type_input.api_constraints
        if isinstance(ac, str):
            try:
                constraint = json.loads(ac)
            except json.JSONDecodeError:
                # Keep response stable for malformed legacy values, but surface the issue via logs
                logger.warning(
                    "Failed to decode legacy api_constraints JSON for task_type_input '%s'; value=%r",
                    getattr(task_type_input, "name", "<unknown>"),
                    ac,
                    exc_info=True,
                )
                constraint = None
        elif isinstance(ac, dict):
            constraint = ac

```
</issue_to_address>

### Comment 2
<location path="src/routers/openml/tasktype.py" line_range="80-81" />
<code_context>
+            except json.JSONDecodeError:
+                # Keep response stable for malformed legacy values
+                constraint = None
+        elif isinstance(ac, dict):
+            constraint = ac
+
+        if isinstance(constraint, dict):
</code_context>
<issue_to_address>
**suggestion:** The type check for dict is quite strict; consider accepting Mapping to be more flexible with ORM / JSON field types.

If `api_constraints` comes from a JSON/JSONB or other ORM field, it may be a custom mapping rather than a `dict`. Using `isinstance(ac, Mapping)` from `collections.abc` would keep behavior the same while working with those mapping types as well.

Suggested implementation:

```python
from collections.abc import Mapping
import json

```

```python
        elif isinstance(ac, Mapping):
            constraint = ac

        if isinstance(constraint, Mapping):

```
</issue_to_address>

Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.

Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Updates the OpenML /tasktype/{task_type_id} endpoint response normalization to be more tolerant of legacy/variant api_constraints formats when building task type input metadata.

Changes:

  • Accept api_constraints as either a legacy JSON string or an already-parsed dict.
  • Gracefully handle malformed legacy JSON by omitting data_type rather than erroring.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

You can also share your feedback on Copilot code review. Take the survey.

Comment on lines +74 to +79
if isinstance(ac, str):
try:
constraint = json.loads(ac)
except json.JSONDecodeError:
# Keep response stable for malformed legacy values
constraint = None
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/routers/openml/tasktype.py`:
- Around line 84-86: The code currently assigns constraint.get("data_type")
directly into input_ which can be non-string; change this to only set
input_["data_type"] when the retrieved data_type is a string (e.g., check
isinstance(data_type, str) or equivalent) so non-string JSON values are not
emitted; locate the assignment around the variables data_type, constraint, and
input_ (within the function handling task type parsing) and add the type guard
before setting input_["data_type"].

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 380cd0cd-ef05-49a7-9399-954efaa24034

📥 Commits

Reviewing files that changed from the base of the PR and between f94808c and b6cee6c.

📒 Files selected for processing (1)
  • src/routers/openml/tasktype.py

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (2)
src/routers/openml/tasktype.py (2)

43-47: Avoid logging raw api_constraints payloads.

On Line 44-Line 47, logging the full malformed value can leak unexpected data and create noisy logs. Prefer bounded metadata (e.g., length/type) instead of the full payload.

Proposed adjustment
         except json.JSONDecodeError:
             logger.warning(
-                "Failed to decode legacy api_constraints JSON for task_type_input '%s'; value=%r",
+                "Failed to decode legacy api_constraints JSON for task_type_input '%s' (type=%s, len=%s)",
                 input_name,
-                api_constraints,
+                type(api_constraints).__name__,
+                len(api_constraints),
                 exc_info=True,
             )
             return None
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/routers/openml/tasktype.py` around lines 43 - 47, Replace the direct
logging of the raw api_constraints payload in the logger.warning call around the
task type decoding code: do not include api_constraints value itself; instead
log safe metadata such as type(api_constraints), a length (len(api_constraints)
if it’s a sequence/string), and an optional short truncated preview or
deterministic fingerprint (e.g., first N characters or a hash) together with
input_name and exc_info=True; update the logger.warning invocation that
currently references api_constraints and input_name so it emits only this
bounded metadata for api_constraints.

32-60: Add focused tests for new parsing branches.

Please add endpoint/helper coverage for malformed JSON strings, dict inputs, non-object JSON values, and non-string data_type values to lock in this robustness fix.

Also applies to: 104-109

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/routers/openml/tasktype.py` around lines 32 - 60, Add focused unit tests
for _extract_data_type_from_api_constraints to cover all parsing branches: pass
a malformed JSON string and assert it returns None (and logs a warning), pass a
Mapping/dict and assert it extracts string data_type correctly, pass valid JSON
that decodes to a non-object value (e.g., JSON "null" or "42") and assert it
returns None, and pass a constraint object with a non-string data_type (e.g.,
number or list) and assert it returns None; place tests against the helper used
by the endpoint so both direct helper and the endpoint path exercising this
logic are covered.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@src/routers/openml/tasktype.py`:
- Around line 43-47: Replace the direct logging of the raw api_constraints
payload in the logger.warning call around the task type decoding code: do not
include api_constraints value itself; instead log safe metadata such as
type(api_constraints), a length (len(api_constraints) if it’s a
sequence/string), and an optional short truncated preview or deterministic
fingerprint (e.g., first N characters or a hash) together with input_name and
exc_info=True; update the logger.warning invocation that currently references
api_constraints and input_name so it emits only this bounded metadata for
api_constraints.
- Around line 32-60: Add focused unit tests for
_extract_data_type_from_api_constraints to cover all parsing branches: pass a
malformed JSON string and assert it returns None (and logs a warning), pass a
Mapping/dict and assert it extracts string data_type correctly, pass valid JSON
that decodes to a non-object value (e.g., JSON "null" or "42") and assert it
returns None, and pass a constraint object with a non-string data_type (e.g.,
number or list) and assert it returns None; place tests against the helper used
by the endpoint so both direct helper and the endpoint path exercising this
logic are covered.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 58219706-ad23-46b9-b018-6654d895b8f9

📥 Commits

Reviewing files that changed from the base of the PR and between b6cee6c and ac5dc6f.

📒 Files selected for processing (1)
  • src/routers/openml/tasktype.py

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Tasktype endpoint should gracefully handle malformed api_constraints values

2 participants