feat(filter): add FilterOperation.matches() and use it for jobs status filtering#191
feat(filter): add FilterOperation.matches() and use it for jobs status filtering#191maxdubrinsky wants to merge 3 commits into
Conversation
… evaluation (AIRCORE-388) Signed-off-by: Max Dubrinsky <mdubrinsky@nvidia.com>
…ches() (AIRCORE-324) list_jobs now evaluates the full filter tree in-memory with parsed.operation.matches() against a per-job "virtual entity" that carries the current attempt's status as data.status (the attempt status enum is injected as its JSON string value to mirror the SQL JSON-text comparison). The store pre-query receives only a status-free SUPERSET of the filter (via _status_free_superset), so it never drops a row that matches() would accept; matches() then narrows the candidate page exactly. The superset rules are sound per branch: status comparisons relax to "accept all", $and drops status conjuncts, $or with any status branch relaxes to None (an unconstrained branch makes the union unconstrained), and $not over a status subtree relaxes to None (negation can invert sub/superset). This fixes AIRCORE-324: status inside $or / $not / mixed-with-other-fields trees that the old _validate_status_filter REJECTED now return correct results. The hand-rolled status split (_validate_status_filter, _extract_status_values, _strip_status, _subtree_* helpers) is removed. Behavior for the common top-level $and status case is unchanged: the same status-free filter is pushed to the store and the same rows are narrowed. Tests: the AIRCORE-324 rejection tests now assert correct results, and new tests cover status under $or, under $not, and status mixed with a non-status field under $or. Signed-off-by: Max Dubrinsky <mdubrinsky@nvidia.com>
Signed-off-by: Max Dubrinsky <mdubrinsky@nvidia.com>
📝 WalkthroughWalkthroughThis PR implements Python-side in-memory evaluation of filter operations with SQL-parity semantics, then refactors job status filtering to eliminate special-case validation by leveraging the new capability. The core adds ChangesFilter In-Memory Evaluation
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@packages/nemo_platform_plugin/src/nemo_platform_plugin/filter_ops.py`:
- Around line 343-359: _like_matches currently does a literal substring search
but must match SQL ilike("%value%") semantics where % is any sequence and _ is
any single character; update _like_matches to translate the input value into a
case-insensitive pattern (escape all regex metacharacters except treat '%' ->
'.*' and '_' -> '.') and use a regex search (re.search) with re.IGNORECASE and
re.DOTALL against either _json_text(field_value) (when is_json) or
str(field_value) after the existing None/_MISSING checks; keep the behavior that
JSON null/missing is rendered via _json_text and that plain NULL/_MISSING does
not match, and ensure the pattern is used unanchored to mirror the surrounding
"%" in ilike.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 9b3db696-a047-4648-8d12-d56c334926d6
📒 Files selected for processing (5)
packages/nemo_platform_plugin/src/nemo_platform_plugin/filter_ops.pypackages/nemo_platform_plugin/tests/test_filter_matches.pyservices/core/entities/tests/test_filter_matches_sql_parity.pyservices/core/jobs/src/nmp/core/jobs/app/dispatcher.pyservices/core/jobs/tests/test_dispatcher.py
| def _like_matches(field_value: Any, value: Any, is_json: bool) -> bool: | ||
| """Mirror ``SQLAlchemyFilterRepository.like``: case-insensitive substring. | ||
|
|
||
| `$like` is NOT regex: SQL does ``column.ilike("%" + value + "%")``, an | ||
| unanchored case-insensitive contains. `%`/`_` in the value are NOT treated | ||
| as wildcards here. | ||
|
|
||
| For a plain column a NULL value never matches (``NULL ILIKE x`` is NULL). | ||
| For a JSON field, null/missing renders as the literal text ``"null"`` and | ||
| is matched as such. | ||
| """ | ||
| needle = str(value).lower() | ||
| if is_json: | ||
| return needle in _json_text(field_value).lower() | ||
| if field_value is _MISSING or field_value is None: | ||
| return False | ||
| return needle in str(field_value).lower() |
There was a problem hiding this comment.
$like semantics diverge from SQL for %/_ values.
_like_matches does literal substring checks, but SQL uses ilike("%<value>%"), where % and _ are wildcards. This breaks the parity contract for these inputs.
Proposed fix
def _like_matches(field_value: Any, value: Any, is_json: bool) -> bool:
- """Mirror ``SQLAlchemyFilterRepository.like``: case-insensitive substring.
+ """Mirror ``SQLAlchemyFilterRepository.like``: SQL ILIKE pattern match.
- `$like` is NOT regex: SQL does ``column.ilike("%" + value + "%")``, an
- unanchored case-insensitive contains. `%`/`_` in the value are NOT treated
- as wildcards here.
+ SQL does ``column.ilike("%" + value + "%")``. `%` and `_` inside ``value``
+ remain SQL wildcards.
@@
- needle = str(value).lower()
+ pattern = re.escape(str(value).lower()).replace(r"\%", ".*").replace(r"\_", ".")
if is_json:
- return needle in _json_text(field_value).lower()
+ return re.search(pattern, _json_text(field_value).lower()) is not None
if field_value is _MISSING or field_value is None:
return False
- return needle in str(field_value).lower()
+ return re.search(pattern, str(field_value).lower()) is not None🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@packages/nemo_platform_plugin/src/nemo_platform_plugin/filter_ops.py` around
lines 343 - 359, _like_matches currently does a literal substring search but
must match SQL ilike("%value%") semantics where % is any sequence and _ is any
single character; update _like_matches to translate the input value into a
case-insensitive pattern (escape all regex metacharacters except treat '%' ->
'.*' and '_' -> '.') and use a regex search (re.search) with re.IGNORECASE and
re.DOTALL against either _json_text(field_value) (when is_json) or
str(field_value) after the existing None/_MISSING checks; keep the behavior that
JSON null/missing is rendered via _json_text and that plain NULL/_MISSING does
not match, and ensure the pattern is used unanchored to mirror the surrounding
"%" in ilike.
|
What
Adds
FilterOperation.matches(entity) -> bool— a Python-side evaluator of the parsed filter tree that mirrors the SQL/repository layer — and makes the jobs dispatcher use it.Primitive (AIRCORE-388):
matches()onComparisonOperation/LogicalOperationinnemo_platform_plugin/filter_ops.py, covering$eq/$like/$in/$nin/$lt/$lte/$gt/$gte/$and/$or/$not. Semantics mirrorSQLAlchemyFilterRepositoryexactly — case-insensitive substring$like(not regex),data.*JSON-path resolution, and SQLite text/bool/null coercion. It's a concrete (non-abstract) default that raisesNotImplementedError, so existing subclasses with no in-memory evaluation (e.g. relationship /$exists) are unaffected.Consumer (AIRCORE-324):
list_jobsnow evaluates the full filter tree in-memory viamatches()against a virtual job entity carrying the attempt status, replacing the hand-rolled status split. This fixes AIRCORE-324:$or/$not/$ninstatus expressions (and status mixed with other fields) now return correct results instead of being rejected or inverted. The store pre-query uses a provably-superset transform (_status_free_superset) so it never drops a rowmatches()would accept;matches()then narrows the page exactly.Compatibility
Additive.
matches()is a new method (no signature/field/enum changes). For the status filters the old code already supported ($eq/$inunder$and), results are unchanged. One minor delta: an invalid status value (e.g.status=zzzz) previously returned([], 0)and now returns([], <superset count>)— the data list is empty in both cases.Tests
matches()— every operator, JSON-path vs plain attribute, null/missing, bool/numeric/datetime coercion, and the SQLite-mirror edge cases.SQLAlchemyFilterRepositoryand assertsmatches()selects the identical row set — the safety net against coercion/null drift.list_jobsdispatcher tests updated to assert correct results for the previously-rejected$or/$not/$ninstatus trees.filter_ops.pyline coverage 96% (remaining lines are defensive/unreachable).Not in scope
Status-filtered pagination and
total_countare inaccurate — the dispatcher pages a store superset then filters in-memory, so a page can under-fill andtotal_countreflects the superset. This is pre-existing (the old hand-rolled implementation had the identical structure) and is unchanged by this PR. Tracked separately in AIRCORE-738.AIRCORE-388, AIRCORE-324
Summary by CodeRabbit
New Features
Tests