feat(rust/sedona-spatial-join): Automatic query-side filter pushdown for KNN joins #641

Merged
paleolimbot merged 4 commits into apache:main from Kontinuation:feat/knn-query-side-filter-pushdown on Feb 20, 2026

@Kontinuation
Member

Summary

  • Adds a KnnQuerySideFilterPushdown optimizer rule that automatically pushes query-side-only filters below the SpatialJoinPlanNode extension node for KNN inner joins
  • Only handles INNER JOIN (conservative start); outer join support can be added later
  • Updates docs to document the automatic pushdown behavior and clarify when barrier() is still needed

Background

Previously, KNN joins blocked ALL filter pushdown (both query-side and object-side), because the default prevent_predicate_push_down_columns() of the SpatialJoinPlanNode extension node returns all columns. Object-side pushdown must remain blocked: filtering the object side before the join changes the candidate set from which the k nearest neighbors are chosen. Query-side pushdown, by contrast, is safe and should happen automatically.
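The difference between the two sides can be illustrated with a minimal, self-contained Rust sketch over plain points instead of SedonaDB plan nodes. The `Obj` type, its `rating` field, and the brute-force `knn` function are hypothetical and exist only to illustrate the claim above. With k = 2, applying an object-side filter after the KNN search keeps only object 2, while applying it before the search returns objects 2 and 3. A query-side filter gives the same rows whether it runs before or after the join, because each query row's neighbors depend only on that row.

```rust
/// Hypothetical object row: a point plus an object-side `rating` attribute.
#[derive(Clone, Debug)]
struct Obj {
    id: u32,
    x: f64,
    y: f64,
    rating: u8,
}

/// Brute-force KNN: ids of the `k` objects closest to (qx, qy), nearest first.
fn knn(objects: &[Obj], qx: f64, qy: f64, k: usize) -> Vec<u32> {
    let mut ranked: Vec<(f64, u32)> = objects
        .iter()
        .map(|o| ((o.x - qx).powi(2) + (o.y - qy).powi(2), o.id))
        .collect();
    // Squared distance preserves the ordering, so no sqrt is needed.
    ranked.sort_by(|a, b| a.0.partial_cmp(&b.0).unwrap());
    ranked.into_iter().take(k).map(|(_, id)| id).collect()
}

fn main() {
    let objects = vec![
        Obj { id: 1, x: 1.0, y: 0.0, rating: 2 },
        Obj { id: 2, x: 2.0, y: 0.0, rating: 5 },
        Obj { id: 3, x: 3.0, y: 0.0, rating: 4 },
    ];

    // Object-side filter (rating >= 4) kept ABOVE the join: KNN first, then filter.
    let above: Vec<u32> = knn(&objects, 0.0, 0.0, 2)
        .into_iter()
        .filter(|id| objects.iter().any(|o| o.id == *id && o.rating >= 4))
        .collect();
    // The same filter pushed BELOW the join on the object side: filter first, then KNN.
    let good: Vec<Obj> = objects.iter().filter(|o| o.rating >= 4).cloned().collect();
    let below = knn(&good, 0.0, 0.0, 2);
    assert_ne!(above, below); // different answers, so this pushdown is unsafe

    // Query-side filter: dropping query rows before or after the join gives the same result.
    let queries = [(0.0, 0.0, true), (10.0, 0.0, false)]; // (x, y, passes_filter)
    let join_then_filter: Vec<Vec<u32>> = queries
        .iter()
        .map(|&(x, y, keep)| (keep, knn(&objects, x, y, 2)))
        .filter(|(keep, _)| *keep)
        .map(|(_, ids)| ids)
        .collect();
    let filter_then_join: Vec<Vec<u32>> = queries
        .iter()
        .filter(|q| q.2)
        .map(|&(x, y, _)| knn(&objects, x, y, 2))
        .collect();
    assert_eq!(join_then_filter, filter_then_join);
    println!("object side: above={:?} below={:?}; query-side results match", above, below);
}
```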

DataFusion's built-in PushDownFilter pushes the same predicate to ALL children of an extension node, so a query-side filter such as h.stars >= 4 would fail when applied to the object-side child, which has no h.stars column. Pushing a filter to the query side only therefore requires a custom optimizer rule.
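The failure mode can be modeled without DataFusion by reducing each child to the set of column names it produces and checking whether a predicate's columns are all available there. The `Child` type and both functions below are hypothetical illustrations, not DataFusion's PushDownFilter code, and the column names are only examples. Sending one predicate to every child fails on the object side, while routing it by available columns selects only the query side.

```rust
use std::collections::HashSet;

/// Hypothetical model of a join child: only the qualified column names it produces.
struct Child {
    name: &'static str,
    columns: HashSet<String>,
}

impl Child {
    fn new(name: &'static str, cols: &[&str]) -> Self {
        Child { name, columns: cols.iter().map(|c| c.to_string()).collect() }
    }

    /// A predicate is evaluable here only if every column it references exists here.
    fn can_evaluate(&self, pred_cols: &[&str]) -> bool {
        pred_cols.iter().all(|c| self.columns.contains(*c))
    }
}

/// Models the uniform behavior: one predicate goes to every child,
/// so it fails as soon as any child lacks a referenced column.
fn push_to_all_children(pred_cols: &[&str], children: &[Child]) -> Result<(), String> {
    for child in children {
        if !child.can_evaluate(pred_cols) {
            return Err(format!("{} cannot evaluate a predicate on {:?}", child.name, pred_cols));
        }
    }
    Ok(())
}

/// A side-aware alternative: route the predicate only to children that can evaluate it.
fn eligible_children(pred_cols: &[&str], children: &[Child]) -> Vec<&'static str> {
    children.iter().filter(|c| c.can_evaluate(pred_cols)).map(|c| c.name).collect()
}

fn main() {
    let children = vec![
        Child::new("query side", &["h.geom", "h.stars"]),
        Child::new("object side", &["r.geom", "r.rating"]),
    ];
    let pred_cols = ["h.stars"]; // columns referenced by `h.stars >= 4`
    println!("push to all children: {:?}", push_to_all_children(&pred_cols, &children));
    println!("eligible children:    {:?}", eligible_children(&pred_cols, &children));
}
```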

Implementation

The KnnQuerySideFilterPushdown rule:

  1. Pattern-matches Filter(predicate, Extension(SpatialJoinPlanNode)) where the join filter contains ST_KNN
  2. Uses find_knn_query_side() to determine which child is the query side (from the first argument of ST_KNN)
  3. Splits the filter predicate into conjuncts; pushes query-side-only conjuncts below the extension node; keeps the rest above
  4. Runs before DataFusion's PushDownFilter so the pushed-down filters are further optimized into scan nodes in the same pass
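Steps 1 to 3 can be sketched on a toy expression tree. This is a minimal sketch under stated assumptions, not the code merged in this PR: `Expr`, `Side`, and every helper (including this toy `find_knn_query_side`, whose signature is made up) are hypothetical stand-ins, since DataFusion's real `Expr`, extension-node, and optimizer-rule APIs are much richer. The join type is reduced to a boolean, and step 4 (ordering relative to PushDownFilter) is out of scope.

```rust
use std::collections::HashSet;

/// Hypothetical stand-in for a logical expression; DataFusion's `Expr` is far richer.
#[derive(Clone, Debug, PartialEq)]
enum Expr {
    Column(String),
    Literal(String),
    Gte(Box<Expr>, Box<Expr>),
    And(Box<Expr>, Box<Expr>),
    Call(String, Vec<Expr>),
}

#[derive(Clone, Copy, Debug, PartialEq)]
enum Side { Left, Right }

// Small constructors to keep the example readable.
fn col(name: &str) -> Expr { Expr::Column(name.to_string()) }
fn lit(v: &str) -> Expr { Expr::Literal(v.to_string()) }
fn gte(a: Expr, b: Expr) -> Expr { Expr::Gte(Box::new(a), Box::new(b)) }
fn and(a: Expr, b: Expr) -> Expr { Expr::And(Box::new(a), Box::new(b)) }
fn set(cols: &[&str]) -> HashSet<String> { cols.iter().map(|c| c.to_string()).collect() }

/// Collect every column name referenced by `e`.
fn columns(e: &Expr, out: &mut HashSet<String>) {
    match e {
        Expr::Column(c) => { out.insert(c.clone()); }
        Expr::Literal(_) => {}
        Expr::Gte(a, b) | Expr::And(a, b) => { columns(a, out); columns(b, out); }
        Expr::Call(_, args) => { for a in args { columns(a, out); } }
    }
}

/// Flatten nested ANDs into a list of conjuncts.
fn split_conjunction(e: &Expr, out: &mut Vec<Expr>) {
    match e {
        Expr::And(a, b) => { split_conjunction(a, out); split_conjunction(b, out); }
        other => out.push(other.clone()),
    }
}

/// Steps 1-2: find ST_KNN in the join filter; the child owning the columns of
/// its first argument is the query side.
fn find_knn_query_side(filter: &Expr, left: &HashSet<String>, right: &HashSet<String>) -> Option<Side> {
    match filter {
        Expr::Call(name, args) if name.eq_ignore_ascii_case("st_knn") => {
            let mut first = HashSet::new();
            columns(args.first()?, &mut first);
            if first.is_empty() { None }
            else if first.is_subset(left) { Some(Side::Left) }
            else if first.is_subset(right) { Some(Side::Right) }
            else { None }
        }
        Expr::And(a, b) => find_knn_query_side(a, left, right).or_else(|| find_knn_query_side(b, left, right)),
        _ => None,
    }
}

/// Step 3: conjuncts that reference only query-side columns move below the join;
/// object-side, mixed, and column-free conjuncts stay above it.
fn partition_conjuncts(predicate: &Expr, query_cols: &HashSet<String>) -> (Vec<Expr>, Vec<Expr>) {
    let mut conjuncts = Vec::new();
    split_conjunction(predicate, &mut conjuncts);
    conjuncts.into_iter().partition(|c| {
        let mut cols = HashSet::new();
        columns(c, &mut cols);
        !cols.is_empty() && cols.is_subset(query_cols)
    })
}

/// The rule on the toy model: Some((pushed_below, kept_above)), or None when it does
/// not apply (not an inner join, or no ST_KNN call in the join filter).
fn knn_query_side_pushdown(
    is_inner_join: bool,
    join_filter: &Expr,
    predicate: &Expr,
    left: &HashSet<String>,
    right: &HashSet<String>,
) -> Option<(Vec<Expr>, Vec<Expr>)> {
    if !is_inner_join {
        return None;
    }
    let query_cols = match find_knn_query_side(join_filter, left, right)? {
        Side::Left => left,
        Side::Right => right,
    };
    Some(partition_conjuncts(predicate, query_cols))
}

fn main() {
    let left = set(&["h.geom", "h.stars"]);
    let right = set(&["r.geom", "r.rating"]);
    // ST_KNN(h.geom, r.geom, 5, false): the first argument makes `h` the query side.
    let join_filter = Expr::Call("st_knn".into(), vec![col("h.geom"), col("r.geom"), lit("5"), lit("false")]);
    // WHERE h.stars >= 4 AND r.rating >= 3
    let predicate = and(gte(col("h.stars"), lit("4")), gte(col("r.rating"), lit("3")));
    let (pushed, kept) = knn_query_side_pushdown(true, &join_filter, &predicate, &left, &right).unwrap();
    println!("pushed below join: {:?}", pushed);
    println!("kept above join:   {:?}", kept);
}
```

In this toy run, `h.stars >= 4` is pushed below the join and `r.rating >= 3` stays above it; a conjunct mixing columns from both sides would also stay above. Passing `is_inner_join = false` returns `None`, mirroring the PR's choice to handle only inner joins for now.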

Testing

  • 12 unit tests for find_st_knn_call and find_knn_query_side
  • 3 integration tests verifying correct plan structure (filter pushed into query-side child, object-side filters stay above)

Depends on #635

@Kontinuation Kontinuation marked this pull request as ready for review February 19, 2026 17:36
@Kontinuation Kontinuation changed the title feat(spatial-join): automatic query-side filter pushdown for KNN joins feat(rust/sedona-spatial-join): automatic query-side filter pushdown for KNN joins Feb 20, 2026
@Kontinuation Kontinuation changed the title feat(rust/sedona-spatial-join): automatic query-side filter pushdown for KNN joins feat(rust/sedona-spatial-join): Automatic query-side filter pushdown for KNN joins Feb 20, 2026
Member

@paleolimbot paleolimbot left a comment

Thank you!

Comment on lines 2372 to 2375
let expr = Expr::ScalarFunction(ScalarFunction {
func: st_knn_udf,
args: vec![col("l.geom"), col("r.geom"), lit(5i32), lit(false)],
});
Member

Optional, but if you already have datafusion as a dev dependency, you can also use ctx.parse_sql_expr() to create these (which might make it easier to spot errors or to write more complex test cases in the future).

Member Author

Using parse_sql_expr() would require registering the UDFs with a SessionContext, which adds complexity to what are currently straightforward unit tests. I have replaced these constructions with st_knn_udf.call() instead.

Comment on lines 2385 to 2388
let knn_expr = Expr::ScalarFunction(ScalarFunction {
func: st_knn_udf,
args: vec![col("l.geom"), col("r.geom"), lit(5i32), lit(false)],
});
Member

.call() is probably a bit more compact, since there are a lot of these:

Suggested change
-let knn_expr = Expr::ScalarFunction(ScalarFunction {
-    func: st_knn_udf,
-    args: vec![col("l.geom"), col("r.geom"), lit(5i32), lit(false)],
-});
+let knn_expr = st_knn_udf.call(vec![col("l.geom"), col("r.geom"), lit(5i32), lit(false)]);

Member Author

Fixed.

@paleolimbot
Member

Merging to keep all the PRs in sync since we have a lot open!

@paleolimbot paleolimbot merged commit bed9151 into apache:main Feb 20, 2026
17 checks passed
@paleolimbot paleolimbot added this to the 0.3.0 milestone Feb 21, 2026
2 participants