
Fix: revive subquery decorrelator and fix failing test #19577

Draft
duongcongtoai wants to merge 241 commits into apache:main from duongcongtoai:temp-decorrelate-revive

Conversation


@duongcongtoai duongcongtoai commented Dec 31, 2025

Which issue does this PR close?

#16059 has been completed, but the result was never persisted into DataFusion.

Some sqllogictest cases are also failing => this PR brings back all the changes from the GSoC work as a complete POC.

I'll try to break it down into smaller components and bring them into DataFusion:

Prerequisites

  • logical/physical operators for delim get
  • LEFT single join support

Major part

  • implement DependentJoinRewriter
  • implement DependentJoinDecorrelator
  • implement Deliminator

Rationale for this change

DependentJoin

As per the paper [2]:

We split the algorithm into three parts: First, a preparatory phase that identifies all non-trivial
dependent joins and annotates them with information that the main algorithm needs. Second,
the logic to eliminate dependent joins, which will be called for all non-trivial dependent
joins in top-to-bottom order and which is the main algorithm, and third, the unnesting rules
for individual operators. Note that we do not include a formalization of this approach due to
space constraints, formal definitions and a proof of correctness can be found in a technical
report [Ne24].

There is a need to detect non-trivial dependent joins (i.e. dependent joins where the RHS accesses columns provided by the LHS) and annotate metadata before decorrelation begins.
The paper suggests using an indexed algebra; however, for now we go forward without such a data structure:

"Using an indexed algebra is ideal for this phase because it can do every LCA computation in 𝑂(log 𝑛) without any additional data structures. If the DBMS does not support that functionality, the same information can be computed with worse asymptotic complexity by keeping track of the column sets that are available in the different parts of the tree."

Given this query

SELECT *
FROM customer
WHERE c_mktsegment='AUTOMOBILE' AND
    (SELECT COUNT(*) FROM orders
        WHERE o_custkey=c_custkey AND
            (SELECT SUM(l_extendedprice) FROM lineitem
                WHERE l_orderkey=o_orderkey
            )>300000
    )>5

According to the paper, this query is translated into the following tree with some additional annotations:
[figure: annotated operator tree from the paper]

"we consider every column access and compute the lowest common ancestor (LCA) of the operator o1 that accesses the column and the operator o2 that provides the column. If the LCA is not o1 , it must be a dependent join 3 and we annotate 3 with the fact that o1 is accessing the left-hand side of 3"
Explanation:

  • Node 9 is a filter with the expression T3.a = T1.a, where T1.a is a column provided by some operator/relation outside its current context (in DataFusion we call them OuterRefColumn). We need to annotate this access with extra information:
    • where the dependent join node for this access should be (i.e. Node 5, Node 4, or Node 1)
    • assuming we have already detected D as the dependent join node,
      which descendant of this node "provides" the column T1.a; in this case Node 2 (not Node 3) is the provider of column T1.a

We introduce a new struct in DataFusion to contain these annotations:

pub struct CorrelatedColumnInfo {
    pub col: Column,
    // TODO: is data_type necessary?
    pub field: FieldRef,
    pub depth: usize,
    // reference into the delim scan node map;
    // useful for constructing the delim scan operator later
    pub delim_scan_node_id: usize,
}

To implement this annotation similarly to the paper, in DataFusion we use the tree traversal API on the root LogicalPlan node, specifically the method rewrite_with_subqueries. This method ensures that the RHS of any potential dependent join node is visited first:

macro_rules! handle_transform_recursion {
    ($F_DOWN:expr, $F_CHILD:expr, $F_UP:expr) => {{
        $F_DOWN?
            .transform_children(|n| {
                n.map_subqueries($F_CHILD)?
                    .transform_sibling(|n| n.map_children($F_CHILD))
            })?
            .transform_parent($F_UP)
    }};
}
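The ordering the macro enforces can be illustrated outside DataFusion with a toy tree. This is a minimal sketch (not DataFusion's actual TreeNode API): on the way down, a node's subqueries are visited before its regular children, mirroring map_subqueries running before map_children.

```rust
// Hypothetical simplified plan node: `subqueries` stands in for the
// subquery expressions of a node, `children` for its ordinary inputs.
struct Node {
    id: usize,
    subqueries: Vec<Node>,
    children: Vec<Node>,
}

// Collect the f_down visit order: subqueries first, then children,
// so the RHS of any potential dependent join is fully explored
// before the side that provides the outer columns.
fn visit(node: &Node, order: &mut Vec<usize>) {
    order.push(node.id); // f_down
    for sq in &node.subqueries {
        visit(sq, order); // like map_subqueries
    }
    for child in &node.children {
        visit(child, order); // like map_children
    }
}

fn main() {
    // Filter node 2 has a scalar subquery (node 3) and a table scan child (node 12),
    // matching the walkthrough below.
    let plan = Node {
        id: 1,
        subqueries: vec![],
        children: vec![Node {
            id: 2,
            subqueries: vec![Node { id: 3, subqueries: vec![], children: vec![] }],
            children: vec![Node { id: 12, subqueries: vec![], children: vec![] }],
        }],
    };
    let mut order = Vec::new();
    visit(&plan, &mut order);
    assert_eq!(order, vec![1, 2, 3, 12]);
}
```

Note how the table scan (node 12) is only reached after the whole subquery side, which is exactly what lets the provider node resolve previously recorded column accesses.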

The goal of this traversal is to link accessors with their providers. Intermediate state is persisted during the traversal:

pub struct DependentJoinRewriter {
    // each logical plan visited during traversal is assigned an integer id
    current_id: usize,

    subquery_depth: usize,
    // each newly visited `LogicalPlan` is inserted into this map for tracking
    nodes: IndexMap<usize, Node>,
    // all the node ids from the root to the current node;
    // this is mutated during traversal
    stack: Vec<usize>,
    // tracks, for each column, the nodes/logical plans that reference it
    // within the tree but are not yet resolved;
    // during the tree traversal these entries are resolved
    // by matching the column provider with the accessor
    unresolved_outer_ref_columns: IndexMap<Column, Vec<ColumnAccess>>,
    // used to generate unique aliases for subqueries appearing in the logical plan
    alias_generator: Arc<AliasGenerator>,
    // used later by the decorrelation optimizer
    // to construct the delim scan node
    pub domain_columns_provider_nodes: IndexMap<usize, LogicalPlan>,
}
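The role of unresolved_outer_ref_columns can be sketched with a simplified, hypothetical shape (std HashMap stands in for IndexMap, and ColumnAccess is reduced to just the accessor's stack): an accessor records a pending entry, and a provider later drains all pending entries for its columns.

```rust
use std::collections::HashMap;

// Hypothetical simplified ColumnAccess: only the stack recorded at
// the moment the outer column was accessed.
#[derive(Debug, PartialEq)]
struct ColumnAccess {
    accessor_stack: Vec<usize>,
}

#[derive(Default)]
struct Rewriter {
    // column name -> accesses that have not found their provider yet
    unresolved: HashMap<String, Vec<ColumnAccess>>,
}

impl Rewriter {
    // Called when a node references an outer column (e.g. f_down(6) below):
    // remember the access together with the current traversal stack.
    fn mark_outer_column_access(&mut self, col: &str, stack: Vec<usize>) {
        self.unresolved
            .entry(col.to_string())
            .or_default()
            .push(ColumnAccess { accessor_stack: stack });
    }

    // Called when a node's schema provides `col` (e.g. f_down(2) below):
    // drain every pending access so the LCA can be computed for each.
    fn resolve(&mut self, col: &str) -> Vec<ColumnAccess> {
        self.unresolved.remove(col).unwrap_or_default()
    }
}

fn main() {
    let mut rw = Rewriter::default();
    // filter references outer_ref(customer.c_custkey) with stack [1,2,3,4,5,6]
    rw.mark_outer_column_access("customer.c_custkey", vec![1, 2, 3, 4, 5, 6]);
    // a later node provides customer.c_custkey and resolves the access
    let resolved = rw.resolve("customer.c_custkey");
    assert_eq!(resolved.len(), 1);
    assert_eq!(resolved[0].accessor_stack, vec![1, 2, 3, 4, 5, 6]);
    // nothing left pending for that column
    assert!(rw.resolve("customer.c_custkey").is_empty());
}
```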

Let's walk through the logical plan from the paper above. The traversal happens in the sequence below:

flowchart TD


A["(1)Projection: ..."]
-->|"f_down(1)"| B["(2): customer.c_mktsegment = 'AUTOMOBILE' AND (subquery) > 5"]
B -->|"f_up(1)"| A
B --> |"f_down(2)"| D["(3)Subquery #1"]
D --> |"f_up(2)"| B

D --> E["(4)Projection: count(1)"]
E --> D

E --> F["(5)Aggregate: count(*)"]
F --> G["(6)Filter: orders.o_custkey = outer_ref(customer.c_custkey) AND (subquery) > 300000"]

G --> |"f_down(6): mark_outer_column_access(customer.c_custkey)"| I["(7)Subquery #2"]

I --> J["(8)Projection: sum(lineitem.l_extendedprice)"]
J --> K["(9)Aggregate: sum(lineitem.l_extendedprice)"]

K --> L["(10)Filter: lineitem.l_orderkey = outer_ref(orders.o_orderkey)"]
L --> |"f_down(10): mark_outer_column_access(orders.o_orderkey)"| M["(11)TableScan: lineitem"]
B --> |"f_down(2):check_matching_column_provider -> resolve column access for customer.c_custkey"| C["(12)TableScan: customer"]
G --> |"f_down(13):check_matching_column_provider -> resolve column access for orders.o_orderkey"| H["(13)TableScan: orders"]


Now pay attention to f_down(6) and f_down(2). f_down(6) marks an appearance of outer_ref(customer.c_custkey); the access stack at that point is [1,2,3,4,5,6]. f_down(2) marks the first logical plan that knows about the expression customer.c_custkey and resolves the previous column access; when this happens the traversal stack is [1,2,12]. The LCA (lowest common ancestor) of the two stacks according to the algorithm is node 2, so node 2 should later be converted into a dependent join logical plan. The same applies to the pair f_down(10) and f_down(13).
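The LCA step above can be sketched independently of DataFusion: given the stack recorded when the outer column was accessed and the stack at the moment the provider is found, the LCA is the deepest node id on the shared prefix of the two stacks (lowest_common_ancestor here is a hypothetical helper, not a DataFusion API).

```rust
/// Given two root-to-node stacks, return the deepest node id on their
/// shared prefix. That node is the LCA and becomes the dependent join.
fn lowest_common_ancestor(access_stack: &[usize], provider_stack: &[usize]) -> Option<usize> {
    access_stack
        .iter()
        .zip(provider_stack.iter())
        .take_while(|(a, b)| a == b)
        .map(|(a, _)| *a)
        .last()
}

fn main() {
    // Stacks from the walkthrough: access recorded at f_down(6),
    // provider found at f_down(2).
    let access = [1, 2, 3, 4, 5, 6];
    let provider = [1, 2, 12];
    // Shared prefix is [1, 2], so node 2 is the dependent join.
    assert_eq!(lowest_common_ancestor(&access, &provider), Some(2));
}
```

Because both stacks are root-to-node paths in the same tree, comparing prefixes element by element is enough; this is the "worse asymptotic complexity" alternative the paper mentions when no indexed algebra is available.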

Correlated subqueries are rewritten into dependent join nodes as follows:

flowchart TD

A["Projection: customer.t3_id, customer.c_mktsegment, customer.c_custkey"]
--> B["Projection: customer.t3_id, customer.c_mktsegment, customer.c_custkey"]

B --> C["Filter: customer.c_mktsegment = 'AUTOMOBILE' AND __scalar_sq_2 > 5"]

C --> D["DependentJoin depth=1 on customer.c_custkey"]

D --> E["TableScan: customer"]
D --> F["Projection: count(1)"]

F --> G["Aggregate: count(*)"]
G --> H["Projection: orders.t2_id, orders.o_custkey, orders.o_orderkey"]

H --> I["Filter: orders.o_custkey = outer_ref(customer.c_custkey) AND __scalar_sq_1 > 300000"]

I --> J["DependentJoin depth=2 on orders.o_orderkey"]

J --> K["TableScan: orders"]
J --> L["Projection: sum(lineitem.l_extendedprice)"]

L --> M["Aggregate: sum(lineitem.l_extendedprice)"]
M --> N["Filter: lineitem.l_orderkey = outer_ref(orders.o_orderkey)"]

N --> O["TableScan: lineitem"]

[NOTE TO SELF]: it looks like the provider of the column c_custkey was not correctly detected (per the paper it should be the filter node above the table scan of customer), but in the current implementation it is the table scan. This difference will significantly affect the performance of the delim scan later on.

The collection of nodes that provide the columns will be persisted and passed to the next decorrelation optimizer pass (to construct delim_get). More details on this [TBU]

@github-actions bot added labels on Dec 31, 2025: sql (SQL Planner), logical-expr (Logical plan and expressions), physical-expr (Changes to the physical-expr crates), optimizer (Optimizer rules), core (Core DataFusion crate), sqllogictest (SQL Logic Tests (.slt)), substrait (Changes to the substrait crate), common (Related to common crate), proto (Related to proto crate), physical-plan (Changes to the physical-plan crate)

kosiew commented Jan 14, 2026

hi @duongcongtoai

It'll help reviewers if you explain more about your PR.

@duongcongtoai duongcongtoai force-pushed the temp-decorrelate-revive branch from a4b660e to 5a0c64b Compare February 25, 2026 07:23