feat(filter): add FilterOperation.matches() and use it for jobs status filtering by maxdubrinsky · Pull Request #191 · NVIDIA-NeMo/nemo-platform

maxdubrinsky · 2026-06-04T21:19:34Z

What

Adds FilterOperation.matches(entity) -> bool — a Python-side evaluator of the parsed filter tree that mirrors the SQL/repository layer — and makes the jobs dispatcher use it.

Primitive (AIRCORE-388): matches() on ComparisonOperation / LogicalOperation in nemo_platform_plugin/filter_ops.py, covering $eq / $like / $in / $nin / $lt / $lte / $gt / $gte / $and / $or / $not. Semantics mirror SQLAlchemyFilterRepository exactly — case-insensitive substring $like (not regex), data.* JSON-path resolution, and SQLite text/bool/null coercion. It's a concrete (non-abstract) default that raises NotImplementedError, so existing subclasses with no in-memory evaluation (e.g. relationship / $exists) are unaffected.

Consumer (AIRCORE-324): list_jobs now evaluates the full filter tree in-memory via matches() against a virtual job entity carrying the attempt status, replacing the hand-rolled status split. This fixes AIRCORE-324: $or / $not / $nin status expressions (and status mixed with other fields) now return correct results instead of being rejected or inverted. The store pre-query uses a provably-superset transform (_status_free_superset) so it never drops a row matches() would accept; matches() then narrows the page exactly.

Compatibility

Additive. matches() is a new method (no signature/field/enum changes). For the status filters the old code already supported ($eq / $in under $and), results are unchanged. One minor delta: an invalid status value (e.g. status=zzzz) previously returned ([], 0) and now returns ([], <superset count>) — the data list is empty in both cases.

Tests

56 unit tests for matches() — every operator, JSON-path vs plain attribute, null/missing, bool/numeric/datetime coercion, and the SQLite-mirror edge cases.
A SQL-parity test (33 cases) that runs each filter through the real SQLAlchemyFilterRepository and asserts matches() selects the identical row set — the safety net against coercion/null drift.
list_jobs dispatcher tests updated to assert correct results for the previously-rejected $or / $not / $nin status trees.
filter_ops.py line coverage 96% (remaining lines are defensive/unreachable).

Not in scope

Status-filtered pagination and total_count are inaccurate — the dispatcher pages a store superset then filters in-memory, so a page can under-fill and total_count reflects the superset. This is pre-existing (the old hand-rolled implementation had the identical structure) and is unchanged by this PR. Tracked separately in AIRCORE-738.

AIRCORE-388, AIRCORE-324

Summary by CodeRabbit

New Features
- Added in-memory filter evaluation with SQL-equivalent behavior
- Expanded job status filtering to accept complex filter expressions including NOT, NIN, and OR combinations with multiple fields
Tests
- Added comprehensive test coverage validating in-memory filter evaluation matches SQL semantics

… evaluation (AIRCORE-388) Signed-off-by: Max Dubrinsky <mdubrinsky@nvidia.com>

…ches() (AIRCORE-324) list_jobs now evaluates the full filter tree in-memory with parsed.operation.matches() against a per-job "virtual entity" that carries the current attempt's status as data.status (the attempt status enum is injected as its JSON string value to mirror the SQL JSON-text comparison). The store pre-query receives only a status-free SUPERSET of the filter (via _status_free_superset), so it never drops a row that matches() would accept; matches() then narrows the candidate page exactly. The superset rules are sound per branch: status comparisons relax to "accept all", $and drops status conjuncts, $or with any status branch relaxes to None (an unconstrained branch makes the union unconstrained), and $not over a status subtree relaxes to None (negation can invert sub/superset). This fixes AIRCORE-324: status inside $or / $not / mixed-with-other-fields trees that the old _validate_status_filter REJECTED now return correct results. The hand-rolled status split (_validate_status_filter, _extract_status_values, _strip_status, _subtree_* helpers) is removed. Behavior for the common top-level $and status case is unchanged: the same status-free filter is pushed to the store and the same rows are narrowed. Tests: the AIRCORE-324 rejection tests now assert correct results, and new tests cover status under $or, under $not, and status mixed with a non-status field under $or. Signed-off-by: Max Dubrinsky <mdubrinsky@nvidia.com>

Signed-off-by: Max Dubrinsky <mdubrinsky@nvidia.com>

coderabbitai · 2026-06-04T21:25:02Z

📝 Walkthrough

Walkthrough

This PR implements Python-side in-memory evaluation of filter operations with SQL-parity semantics, then refactors job status filtering to eliminate special-case validation by leveraging the new capability. The core adds matches() methods to FilterOperation, ComparisonOperation, and LogicalOperation with field resolution, operator dispatch, and null/missing/type handling. Job listing no longer validates or strips status filters; instead, it builds virtual entities that inject attempt status and evaluates the full filter tree in-memory.

Changes

Filter In-Memory Evaluation

Layer / File(s)	Summary
Core filter evaluation implementation `packages/nemo_platform_plugin/src/nemo_platform_plugin/filter_ops.py`	Adds `_MISSING` sentinel and implements `FilterOperation.matches()` base method (raises by default), `ComparisonOperation.matches()` with field resolution for plain attributes vs `data.*` JSON paths and operator dispatch (eq/like/in/nin/lt/lte/gt/gte), and `LogicalOperation.matches()` with AND/OR/NOT semantics plus validation. Includes internal helpers for dotted JSON traversal, SQL/SQLite-compatible null/text rendering, float casting, and operator-specific predicates.
Filter evaluation unit tests `packages/nemo_platform_plugin/tests/test_filter_matches.py`	Covers equality, LIKE, IN/NIN, ordered comparisons (GT/GTE/LT/LTE), and logical operators (AND/OR/NOT) on plain attributes and JSON data paths. Validates null/missing handling, type coercion, SQLite float casting, datetime ISO ordering, and error cases (EXISTS unsupported, unknown fields, operator/operand mismatches).
SQL parity verification `services/core/entities/tests/test_filter_matches_sql_parity.py`	Parametrized test asserting in-memory `matches()` evaluation selects identical row IDs as SQL evaluation via `SQLAlchemyFilterRepository` across comparison, LIKE, IN/NIN, ordered, and composed logical filter cases.
Job dispatcher status filtering refactor `services/core/jobs/src/nmp/core/jobs/app/dispatcher.py`	Removes status filter validation/extraction/stripping; introduces `_status_free_superset()` to relax status constraints for store query and `_build_virtual_job_entity()` to inject attempt status as `data.status`. In-memory join now evaluates the full filter tree via `matches(virtual_entity)` instead of manual status comparison.
Job dispatcher test updates `services/core/jobs/tests/test_dispatcher.py`	Changes test expectations from rejection (ValueError) to acceptance and correct evaluation for status filters using `$not/$eq`, `$nin`, and composite boolean expressions (`$or`, `$not`+`$and`) mixing status and non-status fields via in-memory matching.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 26.60% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	Title accurately describes the main change: adding FilterOperation.matches() for in-memory filter evaluation and applying it to job status filtering.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

📝 Generate docstrings

Create stacked PR
Commit on current branch

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch aircore-388-filter-matches/md

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@packages/nemo_platform_plugin/src/nemo_platform_plugin/filter_ops.py`:
- Around line 343-359: _like_matches currently does a literal substring search
but must match SQL ilike("%value%") semantics where % is any sequence and _ is
any single character; update _like_matches to translate the input value into a
case-insensitive pattern (escape all regex metacharacters except treat '%' ->
'.*' and '_' -> '.') and use a regex search (re.search) with re.IGNORECASE and
re.DOTALL against either _json_text(field_value) (when is_json) or
str(field_value) after the existing None/_MISSING checks; keep the behavior that
JSON null/missing is rendered via _json_text and that plain NULL/_MISSING does
not match, and ensure the pattern is used unanchored to mirror the surrounding
"%" in ilike.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 9b3db696-a047-4648-8d12-d56c334926d6

📥 Commits

Reviewing files that changed from the base of the PR and between f029675 and 3a03e90.

📒 Files selected for processing (5)

packages/nemo_platform_plugin/src/nemo_platform_plugin/filter_ops.py
packages/nemo_platform_plugin/tests/test_filter_matches.py
services/core/entities/tests/test_filter_matches_sql_parity.py
services/core/jobs/src/nmp/core/jobs/app/dispatcher.py
services/core/jobs/tests/test_dispatcher.py

coderabbitai · 2026-06-04T21:25:05Z

+def _like_matches(field_value: Any, value: Any, is_json: bool) -> bool:
+    """Mirror ``SQLAlchemyFilterRepository.like``: case-insensitive substring.
+
+    `$like` is NOT regex: SQL does ``column.ilike("%" + value + "%")``, an
+    unanchored case-insensitive contains. `%`/`_` in the value are NOT treated
+    as wildcards here.
+
+    For a plain column a NULL value never matches (``NULL ILIKE x`` is NULL).
+    For a JSON field, null/missing renders as the literal text ``"null"`` and
+    is matched as such.
+    """
+    needle = str(value).lower()
+    if is_json:
+        return needle in _json_text(field_value).lower()
+    if field_value is _MISSING or field_value is None:
+        return False
+    return needle in str(field_value).lower()


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

$like semantics diverge from SQL for %/_ values.

_like_matches does literal substring checks, but SQL uses ilike("%<value>%"), where % and _ are wildcards. This breaks the parity contract for these inputs.

Proposed fix

def _like_matches(field_value: Any, value: Any, is_json: bool) -> bool: - """Mirror ``SQLAlchemyFilterRepository.like``: case-insensitive substring. + """Mirror ``SQLAlchemyFilterRepository.like``: SQL ILIKE pattern match. - `$like` is NOT regex: SQL does ``column.ilike("%" + value + "%")``, an - unanchored case-insensitive contains. `%`/`_` in the value are NOT treated - as wildcards here. + SQL does ``column.ilike("%" + value + "%")``. `%` and `_` inside ``value`` + remain SQL wildcards. @@ - needle = str(value).lower() + pattern = re.escape(str(value).lower()).replace(r"\%", ".*").replace(r"\_", ".") if is_json: - return needle in _json_text(field_value).lower() + return re.search(pattern, _json_text(field_value).lower()) is not None if field_value is _MISSING or field_value is None: return False - return needle in str(field_value).lower() + return re.search(pattern, str(field_value).lower()) is not None

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@packages/nemo_platform_plugin/src/nemo_platform_plugin/filter_ops.py` around lines 343 - 359, _like_matches currently does a literal substring search but must match SQL ilike("%value%") semantics where % is any sequence and _ is any single character; update _like_matches to translate the input value into a case-insensitive pattern (escape all regex metacharacters except treat '%' -> '.*' and '_' -> '.') and use a regex search (re.search) with re.IGNORECASE and re.DOTALL against either _json_text(field_value) (when is_json) or str(field_value) after the existing None/_MISSING checks; keep the behavior that JSON null/missing is rendered via _json_text and that plain NULL/_MISSING does not match, and ensure the pattern is used unanchored to mirror the surrounding "%" in ilike.

github-actions · 2026-06-04T21:33:02Z

Suite	Lines Covered	Line Rate	Branch Rate
Unit Tests	18841/24890	75.7%	62.4%
Integration Tests	12013/23654	50.8%	25.9%

maxdubrinsky added 3 commits June 4, 2026 17:18

feat(filter): add FilterOperation.matches() for in-memory filter-tree…

75dfca7

… evaluation (AIRCORE-388) Signed-off-by: Max Dubrinsky <mdubrinsky@nvidia.com>

test(filter): close matches() SQLite-edge coverage gaps (AIRCORE-388)

3a03e90

Signed-off-by: Max Dubrinsky <mdubrinsky@nvidia.com>

maxdubrinsky requested review from a team as code owners June 4, 2026 21:19

maxdubrinsky changed the title ~~feat(filter): add FilterOperation.matches() and use it for jobs status filtering (AIRCORE-388, AIRCORE-324)~~ feat(filter): add FilterOperation.matches() and use it for jobs status filtering Jun 4, 2026

coderabbitai Bot reviewed Jun 4, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(filter): add FilterOperation.matches() and use it for jobs status filtering#191

feat(filter): add FilterOperation.matches() and use it for jobs status filtering#191
maxdubrinsky wants to merge 3 commits into
mainfrom
aircore-388-filter-matches/md

maxdubrinsky commented Jun 4, 2026 •

edited by coderabbitai Bot

Loading

Uh oh!

coderabbitai Bot commented Jun 4, 2026

Walkthrough

Changes

❌ Failed checks (1 warning)

Uh oh!

coderabbitai Bot left a comment

Uh oh!

coderabbitai Bot Jun 4, 2026

Uh oh!

github-actions Bot commented Jun 4, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

maxdubrinsky commented Jun 4, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

What

Compatibility

Tests

Not in scope

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented Jun 4, 2026

Walkthrough

Changes

❌ Failed checks (1 warning)

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 4, 2026

Choose a reason for hiding this comment

Uh oh!

github-actions Bot commented Jun 4, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

maxdubrinsky commented Jun 4, 2026 •

edited by coderabbitai Bot

Loading