Fix UNWIND [null] causing count() to miscount null rows (#2383)#2411
Fix UNWIND [null] causing count() to miscount null rows (#2383)#2411crprashant wants to merge 1 commit intoapache:masterfrom
Conversation
33d6650 to
290105d
Compare
|
Heads-up for reviewers — just noticed @MuhammadTahaNaveed's #2406 fixes the same root cause (agtype Happy to close #2411 in favor of #2406 once it lands. The 7 regression cases in Thanks for the fast work on the null-propagation family of bugs. |
There was a problem hiding this comment.
Pull request overview
Fixes Apache AGE’s UNWIND null-row semantics so count(expr) and other null-strict SQL operators correctly ignore top-level JSON null elements produced by UNWIND, aligning behavior with Neo4j/openCypher (issue #2383).
Changes:
- Added a dedicated SRF
age_unwind()that emits SQL NULL rows for top-levelAGTV_NULLelements while preserving nested agtype nulls. - Updated Cypher UNWIND transformation to call
age_unwindinstead of routing throughage_unnest. - Added regression coverage for UNWIND/count/IS NULL behavior and updated expected outputs; added SQL declarations for installation/upgrade.
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
| src/backend/utils/adt/agtype.c | Introduces age_unwind() SRF with UNWIND-specific top-level null → SQL NULL behavior. |
| src/backend/parser/cypher_clause.c | Switches UNWIND clause transformation to invoke age_unwind. |
| sql/agtype_typecast.sql | Declares ag_catalog.age_unwind(agtype) RETURNS SETOF agtype. |
| age--1.7.0--y.y.y.sql | Adds age_unwind to the upgrade template for in-place upgrades. |
| regress/sql/cypher_unwind.sql | Adds regression queries validating UNWIND null semantics and strict-null operator behavior. |
| regress/expected/cypher_unwind.out | Updates expected results for the new regression coverage. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Cypher's count() aggregate follows Neo4j/openCypher semantics and must
ignore nulls. Previously,
UNWIND [null] AS x RETURN count(x)
returned 1 instead of 0 because age_unnest() materialized an agtype
JSON null as a non-NULL datum wrapping AGTV_NULL, so the count()
strictness check never skipped the row. The reference paths
RETURN count(null)
WITH null AS x RETURN count(x)
already returned 0.
Rather than change age_unnest() -- which is also reached by list
comprehensions and the predicate functions all/any/none/single via the
construct_age_function_name('unnest') -> 'age_unnest' rewrite and whose
callers deliberately rely on agtype-null flowing through -- introduce
a dedicated SRF age_unwind() with UNWIND-specific null semantics:
* top-level AGTV_NULL elements are emitted as SQL NULL rows so
null-strict operators (count, IS NULL, WHERE x IS NOT NULL, ...)
behave like they do for WITH null AS x,
* non-null elements are returned unchanged, and
* nested nulls inside arrays/objects are preserved as agtype-null
so container semantics are unchanged.
transform_cypher_unwind() now calls age_unwind(); list comprehensions
and predicate functions continue to go through age_unnest(), so their
existing AGE-specific null behavior is untouched.
Adds regression tests covering:
* UNWIND [null] RETURN count(x) = 0
* UNWIND [1, null, 2, null, 3] RETURN count(x) = 3
* UNWIND [null] RETURN x IS NULL = true
* UNWIND [null] RETURN x produces an SQL NULL row value
* WHERE x IS NOT NULL filtering after UNWIND
* count(*) still sees the row produced by UNWIND [null]
* nested nulls inside containers survive unchanged
Fixes apache#2383.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
290105d to
50b4253
Compare
Fixes #2383.
Problem
Per Neo4j / openCypher semantics,
count()must ignore nulls. On Apache AGE:The control cases already returned the correct
0:Root cause
age_unnest()insrc/backend/utils/adt/agtype.cmaterialized every top-level array element into a heap tuple withnulls[0] = false, even when the element'sagtype_value.type == AGTV_NULL. The row that reachedcount()therefore carried a non-NULL datum wrapping an agtype JSON null, so the aggregate's strict null check never skipped it.Fix
A naïve fix inside
age_unnest()breaks other callers.construct_age_function_name()insrc/backend/parser/cypher_expr.crewritesunnest→age_unnest, so list comprehensions and the predicate functions (all/any/none/single) also route throughage_unnest. Those callers deliberately rely on the quirk that AGE'sagtype_null > agtype_integerevaluates toagtype_null(truthy), not SQL NULL. Changing the SRF broke fivepredicate_functionsexpected results.Introduce a dedicated SRF
age_unwind()with UNWIND-specific null semantics:AGTV_NULLelements → SQL NULL rows so null-strict operators (count,IS NULL,WHERE x IS NOT NULL, …) matchWITH null AS x.transform_cypher_unwind()is the only call site changed fromage_unnesttoage_unwind. List comprehensions and predicate functions continue to useage_unnestand their behavior is untouched.Before / after
Regression tests (
regress/sql/cypher_unwind.sql)UNWIND [null] RETURN count(x) = 0UNWIND [1, null, 2, null, 3] RETURN count(x) = 3UNWIND [null] RETURN x IS NULLUNWIND [1, null, 2] WITH x WHERE x IS NOT NULL RETURN count(x) = 2UNWIND [null] RETURN count(*) = 1(row still produced)UNWIND [[null], {k: null}] RETURN count(x) = 2(nested nulls preserved)UNWIND [null] RETURN x— asserts the row value itself is SQL NULLFiles changed
src/backend/utils/adt/agtype.c— newage_unwindSRFsrc/backend/parser/cypher_clause.c— UNWIND callsage_unwindsql/agtype_typecast.sql— declareage_unwindage--1.7.0--y.y.y.sql— addage_unwindfor in-place upgradesregress/sql/cypher_unwind.sql+regress/expected/cypher_unwind.outTest results
make installcheckwith PostgreSQL 18 on Ubuntu 24.04:age_upgradefails identically onapache/agemasterHEAD(pre-existing, unrelated; sameage_prepare_pg_upgrade already existserror).cypher_unwind,predicate_functions,list_comprehensionall pass.