Skip to content

feat(filter): add FilterOperation.matches() and use it for jobs status filtering#191

Open
maxdubrinsky wants to merge 3 commits into
mainfrom
aircore-388-filter-matches/md
Open

feat(filter): add FilterOperation.matches() and use it for jobs status filtering#191
maxdubrinsky wants to merge 3 commits into
mainfrom
aircore-388-filter-matches/md

Conversation

@maxdubrinsky
Copy link
Copy Markdown
Contributor

@maxdubrinsky maxdubrinsky commented Jun 4, 2026

What

Adds FilterOperation.matches(entity) -> bool — a Python-side evaluator of the parsed filter tree that mirrors the SQL/repository layer — and makes the jobs dispatcher use it.

Primitive (AIRCORE-388): matches() on ComparisonOperation / LogicalOperation in nemo_platform_plugin/filter_ops.py, covering $eq / $like / $in / $nin / $lt / $lte / $gt / $gte / $and / $or / $not. Semantics mirror SQLAlchemyFilterRepository exactly — case-insensitive substring $like (not regex), data.* JSON-path resolution, and SQLite text/bool/null coercion. It's a concrete (non-abstract) default that raises NotImplementedError, so existing subclasses with no in-memory evaluation (e.g. relationship / $exists) are unaffected.

Consumer (AIRCORE-324): list_jobs now evaluates the full filter tree in-memory via matches() against a virtual job entity carrying the attempt status, replacing the hand-rolled status split. This fixes AIRCORE-324: $or / $not / $nin status expressions (and status mixed with other fields) now return correct results instead of being rejected or inverted. The store pre-query uses a provably-superset transform (_status_free_superset) so it never drops a row matches() would accept; matches() then narrows the page exactly.

Compatibility

Additive. matches() is a new method (no signature/field/enum changes). For the status filters the old code already supported ($eq / $in under $and), results are unchanged. One minor delta: an invalid status value (e.g. status=zzzz) previously returned ([], 0) and now returns ([], <superset count>) — the data list is empty in both cases.

Tests

  • 56 unit tests for matches() — every operator, JSON-path vs plain attribute, null/missing, bool/numeric/datetime coercion, and the SQLite-mirror edge cases.
  • A SQL-parity test (33 cases) that runs each filter through the real SQLAlchemyFilterRepository and asserts matches() selects the identical row set — the safety net against coercion/null drift.
  • list_jobs dispatcher tests updated to assert correct results for the previously-rejected $or / $not / $nin status trees.
  • filter_ops.py line coverage 96% (remaining lines are defensive/unreachable).

Not in scope

Status-filtered pagination and total_count are inaccurate — the dispatcher pages a store superset then filters in-memory, so a page can under-fill and total_count reflects the superset. This is pre-existing (the old hand-rolled implementation had the identical structure) and is unchanged by this PR. Tracked separately in AIRCORE-738.

AIRCORE-388, AIRCORE-324

Summary by CodeRabbit

  • New Features

    • Added in-memory filter evaluation with SQL-equivalent behavior
    • Expanded job status filtering to accept complex filter expressions including NOT, NIN, and OR combinations with multiple fields
  • Tests

    • Added comprehensive test coverage validating in-memory filter evaluation matches SQL semantics

… evaluation (AIRCORE-388)

Signed-off-by: Max Dubrinsky <mdubrinsky@nvidia.com>
…ches() (AIRCORE-324)

list_jobs now evaluates the full filter tree in-memory with
parsed.operation.matches() against a per-job "virtual entity" that carries
the current attempt's status as data.status (the attempt status enum is
injected as its JSON string value to mirror the SQL JSON-text comparison).

The store pre-query receives only a status-free SUPERSET of the filter
(via _status_free_superset), so it never drops a row that matches() would
accept; matches() then narrows the candidate page exactly. The superset
rules are sound per branch: status comparisons relax to "accept all", $and
drops status conjuncts, $or with any status branch relaxes to None (an
unconstrained branch makes the union unconstrained), and $not over a
status subtree relaxes to None (negation can invert sub/superset).

This fixes AIRCORE-324: status inside $or / $not / mixed-with-other-fields
trees that the old _validate_status_filter REJECTED now return correct
results. The hand-rolled status split (_validate_status_filter,
_extract_status_values, _strip_status, _subtree_* helpers) is removed.

Behavior for the common top-level $and status case is unchanged: the same
status-free filter is pushed to the store and the same rows are narrowed.

Tests: the AIRCORE-324 rejection tests now assert correct results, and new
tests cover status under $or, under $not, and status mixed with a
non-status field under $or.

Signed-off-by: Max Dubrinsky <mdubrinsky@nvidia.com>
Signed-off-by: Max Dubrinsky <mdubrinsky@nvidia.com>
@maxdubrinsky maxdubrinsky requested review from a team as code owners June 4, 2026 21:19
@maxdubrinsky maxdubrinsky changed the title feat(filter): add FilterOperation.matches() and use it for jobs status filtering (AIRCORE-388, AIRCORE-324) feat(filter): add FilterOperation.matches() and use it for jobs status filtering Jun 4, 2026
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Jun 4, 2026

Review Change Stack

📝 Walkthrough

Walkthrough

This PR implements Python-side in-memory evaluation of filter operations with SQL-parity semantics, then refactors job status filtering to eliminate special-case validation by leveraging the new capability. The core adds matches() methods to FilterOperation, ComparisonOperation, and LogicalOperation with field resolution, operator dispatch, and null/missing/type handling. Job listing no longer validates or strips status filters; instead, it builds virtual entities that inject attempt status and evaluates the full filter tree in-memory.

Changes

Filter In-Memory Evaluation

Layer / File(s) Summary
Core filter evaluation implementation
packages/nemo_platform_plugin/src/nemo_platform_plugin/filter_ops.py
Adds _MISSING sentinel and implements FilterOperation.matches() base method (raises by default), ComparisonOperation.matches() with field resolution for plain attributes vs data.* JSON paths and operator dispatch (eq/like/in/nin/lt/lte/gt/gte), and LogicalOperation.matches() with AND/OR/NOT semantics plus validation. Includes internal helpers for dotted JSON traversal, SQL/SQLite-compatible null/text rendering, float casting, and operator-specific predicates.
Filter evaluation unit tests
packages/nemo_platform_plugin/tests/test_filter_matches.py
Covers equality, LIKE, IN/NIN, ordered comparisons (GT/GTE/LT/LTE), and logical operators (AND/OR/NOT) on plain attributes and JSON data paths. Validates null/missing handling, type coercion, SQLite float casting, datetime ISO ordering, and error cases (EXISTS unsupported, unknown fields, operator/operand mismatches).
SQL parity verification
services/core/entities/tests/test_filter_matches_sql_parity.py
Parametrized test asserting in-memory matches() evaluation selects identical row IDs as SQL evaluation via SQLAlchemyFilterRepository across comparison, LIKE, IN/NIN, ordered, and composed logical filter cases.
Job dispatcher status filtering refactor
services/core/jobs/src/nmp/core/jobs/app/dispatcher.py
Removes status filter validation/extraction/stripping; introduces _status_free_superset() to relax status constraints for store query and _build_virtual_job_entity() to inject attempt status as data.status. In-memory join now evaluates the full filter tree via matches(virtual_entity) instead of manual status comparison.
Job dispatcher test updates
services/core/jobs/tests/test_dispatcher.py
Changes test expectations from rejection (ValueError) to acceptance and correct evaluation for status filters using $not/$eq, $nin, and composite boolean expressions ($or, $not+$and) mixing status and non-status fields via in-memory matching.
🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 26.60% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed Title accurately describes the main change: adding FilterOperation.matches() for in-memory filter evaluation and applying it to job status filtering.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch aircore-388-filter-matches/md

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@packages/nemo_platform_plugin/src/nemo_platform_plugin/filter_ops.py`:
- Around line 343-359: _like_matches currently does a literal substring search
but must match SQL ilike("%value%") semantics where % is any sequence and _ is
any single character; update _like_matches to translate the input value into a
case-insensitive pattern (escape all regex metacharacters except treat '%' ->
'.*' and '_' -> '.') and use a regex search (re.search) with re.IGNORECASE and
re.DOTALL against either _json_text(field_value) (when is_json) or
str(field_value) after the existing None/_MISSING checks; keep the behavior that
JSON null/missing is rendered via _json_text and that plain NULL/_MISSING does
not match, and ensure the pattern is used unanchored to mirror the surrounding
"%" in ilike.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 9b3db696-a047-4648-8d12-d56c334926d6

📥 Commits

Reviewing files that changed from the base of the PR and between f029675 and 3a03e90.

📒 Files selected for processing (5)
  • packages/nemo_platform_plugin/src/nemo_platform_plugin/filter_ops.py
  • packages/nemo_platform_plugin/tests/test_filter_matches.py
  • services/core/entities/tests/test_filter_matches_sql_parity.py
  • services/core/jobs/src/nmp/core/jobs/app/dispatcher.py
  • services/core/jobs/tests/test_dispatcher.py

Comment on lines +343 to +359
def _like_matches(field_value: Any, value: Any, is_json: bool) -> bool:
"""Mirror ``SQLAlchemyFilterRepository.like``: case-insensitive substring.

`$like` is NOT regex: SQL does ``column.ilike("%" + value + "%")``, an
unanchored case-insensitive contains. `%`/`_` in the value are NOT treated
as wildcards here.

For a plain column a NULL value never matches (``NULL ILIKE x`` is NULL).
For a JSON field, null/missing renders as the literal text ``"null"`` and
is matched as such.
"""
needle = str(value).lower()
if is_json:
return needle in _json_text(field_value).lower()
if field_value is _MISSING or field_value is None:
return False
return needle in str(field_value).lower()
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

$like semantics diverge from SQL for %/_ values.

_like_matches does literal substring checks, but SQL uses ilike("%<value>%"), where % and _ are wildcards. This breaks the parity contract for these inputs.

Proposed fix
 def _like_matches(field_value: Any, value: Any, is_json: bool) -> bool:
-    """Mirror ``SQLAlchemyFilterRepository.like``: case-insensitive substring.
+    """Mirror ``SQLAlchemyFilterRepository.like``: SQL ILIKE pattern match.

-    `$like` is NOT regex: SQL does ``column.ilike("%" + value + "%")``, an
-    unanchored case-insensitive contains. `%`/`_` in the value are NOT treated
-    as wildcards here.
+    SQL does ``column.ilike("%" + value + "%")``. `%` and `_` inside ``value``
+    remain SQL wildcards.
@@
-    needle = str(value).lower()
+    pattern = re.escape(str(value).lower()).replace(r"\%", ".*").replace(r"\_", ".")
     if is_json:
-        return needle in _json_text(field_value).lower()
+        return re.search(pattern, _json_text(field_value).lower()) is not None
     if field_value is _MISSING or field_value is None:
         return False
-    return needle in str(field_value).lower()
+    return re.search(pattern, str(field_value).lower()) is not None
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/nemo_platform_plugin/src/nemo_platform_plugin/filter_ops.py` around
lines 343 - 359, _like_matches currently does a literal substring search but
must match SQL ilike("%value%") semantics where % is any sequence and _ is any
single character; update _like_matches to translate the input value into a
case-insensitive pattern (escape all regex metacharacters except treat '%' ->
'.*' and '_' -> '.') and use a regex search (re.search) with re.IGNORECASE and
re.DOTALL against either _json_text(field_value) (when is_json) or
str(field_value) after the existing None/_MISSING checks; keep the behavior that
JSON null/missing is rendered via _json_text and that plain NULL/_MISSING does
not match, and ensure the pattern is used unanchored to mirror the surrounding
"%" in ilike.

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Jun 4, 2026

Suite Lines Covered Line Rate Branch Rate
Unit Tests 18841/24890 75.7% 62.4%
Integration Tests 12013/23654 50.8% 25.9%

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant