GFQL: Cypher-style row pipeline + parser-backed where_rows hardening #931

Merged: lmeyerov merged 99 commits into master from feat/gfql-cypher-return-pipeline on Mar 7, 2026

@lmeyerov lmeyerov commented Mar 5, 2026

Summary

This PR adds a Cypher-style GFQL row pipeline to g.gfql(...), expands/guards expression support, and hardens validation/runtime behavior for pure-vector execution.

Primary outcomes:

  • New row-table pipeline primitives (rows, select/with_/return_, where_rows, order_by, skip, limit, distinct, unwind, group_by).
  • Parser-backed expression validation/eval path for where_rows(expr=...).
  • Strict fail-fast checks for malformed/unsupported expression shapes.
  • Security hardening for CodeQL findings (removed ReDoS-prone regex + removed ast.literal_eval sinks in new GFQL parser paths).
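
The PR does not show the code that replaced the ast.literal_eval and regex paths, so the following is only a minimal sketch of the general technique. It assumes a hypothetical helper, parse_literal, that accepts scalar literals through explicit branches and a linear character scan, so nothing is evaluated as code and no backtracking regex is involved.

```python
def parse_literal(text: str):
    """Parse a scalar literal: null, boolean, integer, float, or quoted string.

    Hypothetical helper (not from the PR): explicit branches stand in for
    ast.literal_eval, and a single left-to-right scan stands in for a regex.
    """
    s = text.strip()
    lowered = s.lower()
    if lowered == "null":
        return None
    if lowered in ("true", "false"):
        return lowered == "true"
    # Quoted string: matching outer quotes, backslash escapes, no bare inner quote.
    if len(s) >= 2 and s[0] in "'\"" and s[-1] == s[0]:
        quote, body, out, i = s[0], s[1:-1], [], 0
        while i < len(body):
            ch = body[i]
            if ch == "\\":
                if i + 1 >= len(body):
                    raise ValueError(f"dangling escape in {text!r}")
                out.append(body[i + 1])
                i += 2
                continue
            if ch == quote:
                raise ValueError(f"unescaped quote in {text!r}")
            out.append(ch)
            i += 1
        return "".join(out)
    # Numbers: int() and float() only convert numeric text; anything else fails.
    try:
        return int(s)
    except ValueError:
        pass
    try:
        return float(s)
    except ValueError:
        raise ValueError(f"unsupported literal: {text!r}") from None
```

With this shape, an input such as `__import__('os')` is rejected with a ValueError instead of reaching an evaluator.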

Complete GFQL Example 1: Pattern Matching + where=[...]

This is the graph-pattern flow (MATCH ... WHERE alias.prop ...) using same-path constraints:

import pandas as pd
from graphistry.tests.test_compute import CGFull
from graphistry import n, e_forward, col, compare

nodes_df = pd.DataFrame({
    "id": ["u1", "u2", "a1", "a2"],
    "kind": ["user", "user", "acct", "acct"],
    "owner_id": ["o1", "o2", "o1", "o2"],
})
edges_df = pd.DataFrame({
    "s": ["u1", "u2"],
    "d": ["a1", "a2"],
    "rel": ["owns", "owns"],
})

g = CGFull().nodes(nodes_df, "id").edges(edges_df, "s", "d")

# Pattern + same-path alias constraints
result = g.gfql(
    [
        n({"kind": "user"}, name="u"),
        e_forward({"rel": "owns"}, name="e"),
        n({"kind": "acct"}, name="a"),
    ],
    where=[
        compare(col("u", "owner_id"), "==", col("a", "owner_id")),
    ],
)

print(result._nodes)
print(result._edges)

Complete GFQL Example 2: Row Pipeline (MATCH ... RETURN ... ORDER BY ... LIMIT)

This is the new row-table pipeline flow for projection/filter/sort/paging:

import pandas as pd
from graphistry.tests.test_compute import CGFull
from graphistry.compute.ast import rows, where_rows, return_, order_by, limit

nodes_df = pd.DataFrame({
    "id": ["a", "b", "c", "d"],
    "name": ["alice", "bob", "carol", "dave"],
    "score": [1, 4, 3, 2],
    "vals": [[1], [1, 2], [1, 2, 3], []],
})
edges_df = pd.DataFrame({"s": ["a", "b"], "d": ["b", "c"]})

g = CGFull().nodes(nodes_df, "id").edges(edges_df, "s", "d")

result = g.gfql([
    rows(),
    where_rows(expr="score > 1 AND size(vals) > 1"),
    return_([("id", "id"), ("name", "name"), ("score", "score")]),
    order_by([("score", "desc"), ("name", "asc")]),
    limit(2),
])

# table-shaped output
#   id   name  score
# 0  b    bob      4
# 1  c  carol      3
print(result._nodes)

New/Expanded GFQL Forms

Row-pipeline calls in this PR:

  • rows(table='nodes'|'edges', source=...)
  • select(items=[(alias, expr), ...])
  • with_(items=[...])
  • return_(...) (row projection alias)
  • where_rows(filter_dict=..., expr=...)
  • order_by(keys=[(expr, 'asc'|'desc'), ...])
  • skip(value=...)
  • limit(value=...)
  • distinct()
  • unwind(expr=..., as_=...)
  • group_by(keys=[...], aggregations=[...])
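
Example 2 exercises rows, where_rows, return_, order_by, and limit. For the remaining calls, here is a plain-Python model over lists of dicts. It is a conceptual sketch only: the function names mirror the list above, but the bodies assume Cypher-like semantics (UNWIND, DISTINCT, SKIP, grouped count) and are not the graphistry implementation, which operates on dataframes. The group_count helper and its count column are hypothetical stand-ins for group_by with a count aggregation.

```python
from collections import defaultdict

def unwind(data, expr, as_):
    # One output row per list element; rows with empty lists produce nothing.
    return [{**row, as_: value} for row in data for value in row[expr]]

def select(data, items):
    # items is [(alias, column), ...]; only plain column references are modeled.
    return [{alias: row[column] for alias, column in items} for row in data]

def distinct(data):
    # Keep the first occurrence of each row, preserving order (values must be hashable).
    seen, out = set(), []
    for row in data:
        key = tuple(sorted(row.items()))
        if key not in seen:
            seen.add(key)
            out.append(row)
    return out

def group_count(data, keys):
    # Hypothetical stand-in for group_by with a single count aggregation.
    counts = defaultdict(int)
    for row in data:
        counts[tuple(row[k] for k in keys)] += 1
    return [dict(zip(keys, key), count=n) for key, n in counts.items()]

def skip(data, value):
    return data[value:]

table = [
    {"id": "a", "team": "x", "tags": ["p", "q"]},
    {"id": "b", "team": "x", "tags": ["q"]},
    {"id": "c", "team": "y", "tags": []},
]

unwound = unwind(table, "tags", "tag")                          # 3 rows: a/p, a/q, b/q
projected = select(unwound, [("team", "team"), ("tag", "tag")])
unique = distinct(projected)                                    # (x, p), (x, q)
per_tag = group_count(projected, ["tag"])                       # p: 1, q: 2
paged = skip(per_tag, 1)                                        # drops the first group
```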

Expression coverage (validator-gated):

  • Boolean composition: NOT, AND, OR, precedence/parentheses
  • Comparators + null tests: =, !=, <>, <, <=, >, >=, IS NULL, IS NOT NULL
  • String predicates: CONTAINS, STARTS WITH, ENDS WITH
  • Quantifiers: ANY, ALL, NONE, SINGLE
  • List comprehensions: [x IN list WHERE ... | ...]
  • CASE WHEN ... THEN ... ELSE ... END
  • Literal/list/map handling, with fail-fast on malformed shapes
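
To illustrate what parser-backed precedence handling and fail-fast validation mean in practice, the sketch below implements a tiny recursive-descent evaluator for a small subset of the grammar above: integer literals, column names, comparators, NOT/AND/OR, and parentheses. It is an assumption-laden toy, not the PR's expr_parser: it evaluates one row dict at a time instead of vectorized columns, and it omits functions such as size(...), string predicates, quantifiers, and CASE.

```python
import operator

COMPARATORS = {"=": operator.eq, "<>": operator.ne, "!=": operator.ne,
               "<": operator.lt, "<=": operator.le,
               ">": operator.gt, ">=": operator.ge}

def tokenize(expr):
    # Linear hand-written scan; any unknown character fails immediately.
    tokens, i = [], 0
    while i < len(expr):
        ch = expr[i]
        if ch.isspace():
            i += 1
        elif ch in "()":
            tokens.append(ch)
            i += 1
        elif expr[i:i + 2] in ("<=", ">=", "<>", "!="):
            tokens.append(expr[i:i + 2])
            i += 2
        elif ch in "<>=":
            tokens.append(ch)
            i += 1
        elif ch.isalnum() or ch == "_":
            j = i
            while j < len(expr) and (expr[j].isalnum() or expr[j] == "_"):
                j += 1
            tokens.append(expr[i:j])
            i = j
        else:
            raise ValueError(f"unexpected character {ch!r} at position {i}")
    return tokens

class Parser:
    def __init__(self, tokens, row):
        self.tokens, self.pos, self.row = tokens, 0, row

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self):
        tok = self.peek()
        if tok is None:
            raise ValueError("unexpected end of expression")
        self.pos += 1
        return tok

    def at_keyword(self, word):
        tok = self.peek()
        return tok is not None and tok.upper() == word

    def parse_or(self):  # lowest precedence
        value = self.parse_and()
        while self.at_keyword("OR"):
            self.take()
            rhs = self.parse_and()  # always parse the right side, then combine
            value = value or rhs
        return value

    def parse_and(self):
        value = self.parse_not()
        while self.at_keyword("AND"):
            self.take()
            rhs = self.parse_not()
            value = value and rhs
        return value

    def parse_not(self):  # binds tighter than AND
        if self.at_keyword("NOT"):
            self.take()
            return not self.parse_not()
        return self.parse_comparison()

    def parse_comparison(self):
        if self.peek() == "(":
            self.take()
            value = self.parse_or()
            if self.take() != ")":
                raise ValueError("expected ')'")
            return value
        left = self.operand()
        op = self.take()
        if op not in COMPARATORS:
            raise ValueError(f"expected comparator, got {op!r}")
        return COMPARATORS[op](left, self.operand())

    def operand(self):
        tok = self.take()
        if tok.isascii() and tok.isdigit():
            return int(tok)
        if tok in self.row:
            return self.row[tok]
        raise ValueError(f"unknown identifier or misplaced token: {tok!r}")

def evaluate(expr, row):
    parser = Parser(tokenize(expr), row)
    value = parser.parse_or()
    if parser.peek() is not None:
        raise ValueError(f"trailing token: {parser.peek()!r}")
    return value
```

For a row `{"score": 4, "n": 2}`, `evaluate("score = 4 OR n = 2 AND n = 0", row)` is true because AND binds tighter than OR, while the parenthesized form `(score = 4 OR n = 2) AND n = 0` is false. A truncated input such as `score >` raises ValueError instead of being silently accepted.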

Validation / Safety

  • Runtime path remains pure-vector oriented (no row-wise iterrows / itertuples / .apply additions in these changes).
  • Parser/runtime gates support off/shadow/strict behavior.
  • Added security hardening for CodeQL-reported issues in changed code.
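
The PR text does not define the off/shadow/strict modes, so the sketch below shows only one common reading of such a gate, under stated assumptions: "off" uses the existing path alone, "shadow" also runs the new parser-backed path and logs failures or mismatches while still returning the existing result, and "strict" uses the new path and lets its errors propagate. The gated_filter function, its parameters, and the logger name are hypothetical.

```python
import logging

logger = logging.getLogger("gfql_gate_sketch")  # hypothetical logger name

def gated_filter(values, legacy_pred, parser_pred, mode="off"):
    """Filter values through a legacy predicate, a new one, or both (hypothetical)."""
    if mode not in ("off", "shadow", "strict"):
        raise ValueError(f"unknown gate mode: {mode!r}")
    if mode == "strict":
        # New path only; any validation or evaluation error fails fast.
        return [v for v in values if parser_pred(v)]
    legacy = [v for v in values if legacy_pred(v)]
    if mode == "off":
        return legacy
    # Shadow: exercise the new path, report problems, keep the legacy result.
    try:
        candidate = [v for v in values if parser_pred(v)]
    except Exception as exc:
        logger.warning("shadow path failed: %s", exc)
        return legacy
    if candidate != legacy:
        logger.warning("shadow mismatch: legacy=%r new=%r", legacy, candidate)
    return legacy
```

A rollout under this reading would start in shadow mode to collect mismatches without changing results, then switch to strict once the logs are clean.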

Tests

Main test coverage additions/expansions:

  • graphistry/tests/compute/gfql/test_row_pipeline_ops.py
  • graphistry/tests/compute/gfql/test_expr_parser.py
  • graphistry/tests/compute/test_gfql.py

Local validation on current branch:

  • PYTHONPATH=. pytest -q graphistry/tests/compute/gfql/test_expr_parser.py graphistry/tests/compute/gfql/test_row_pipeline_ops.py graphistry/tests/compute/test_gfql.py graphistry/tests/compute/test_call_operations.py
    • 152 passed, 26 skipped
  • py_compile on touched GFQL files: pass
  • ruff check on touched GFQL files: pass
  • mypy --ignore-missing-imports --follow-imports=skip on touched GFQL files: pass

Comment thread graphistry/compute/gfql/call_safelist.py Fixed
Comment thread graphistry/compute/gfql/call_safelist.py Fixed
lmeyerov (author) commented: move to standalone PR

Comment thread graphistry/compute/gfql/call/validation.py
lmeyerov added 26 commits March 5, 2026 21:30
Comment thread graphistry/compute/gfql/expr_parser.py
lmeyerov added 27 commits March 6, 2026 20:55
@lmeyerov lmeyerov merged commit 852a8ce into master Mar 7, 2026
66 checks passed
@lmeyerov lmeyerov deleted the feat/gfql-cypher-return-pipeline branch March 7, 2026 23:34

2 participants