feat(rust/sedona-spatial-join): Automatic query-side filter pushdown for KNN joins #641
Merged
paleolimbot merged 4 commits into apache:main on Feb 20, 2026
Conversation
paleolimbot
approved these changes
Feb 20, 2026
Comment on lines 2372 to 2375
    let expr = Expr::ScalarFunction(ScalarFunction {
        func: st_knn_udf,
        args: vec![col("l.geom"), col("r.geom"), lit(5i32), lit(false)],
    });
Member
Optional, but if you have datafusion as a dev dependency already you can also use ctx.parse_sql_expr() to create these (might make it easier to spot errors or create more complex test cases in the future).
Member
Author
It would require registering UDFs with a SessionContext, which adds complexity for what are currently straightforward unit tests. I have replaced these using st_knn_udf.call.
Comment on lines 2385 to 2388
    let knn_expr = Expr::ScalarFunction(ScalarFunction {
        func: st_knn_udf,
        args: vec![col("l.geom"), col("r.geom"), lit(5i32), lit(false)],
    });
Member
.call() is probably a bit more compact since there are a lot of these:
Suggested change
- let knn_expr = Expr::ScalarFunction(ScalarFunction {
-     func: st_knn_udf,
-     args: vec![col("l.geom"), col("r.geom"), lit(5i32), lit(false)],
- });
+ let knn_expr = st_knn_udf.call(vec![col("l.geom"), col("r.geom"), lit(5i32), lit(false)]);
paleolimbot
approved these changes
Feb 20, 2026
Member
Merging to keep all the PRs in sync since we have a lot open!
Summary
- Adds a KnnQuerySideFilterPushdown optimizer rule that automatically pushes query-side-only filters below the SpatialJoinPlanNode extension node for KNN inner joins
- Applies only to INNER JOIN (conservative start); outer join support can be added later
- [...] barrier() is still needed

Background
Previously, KNN joins blocked ALL filter pushdown (both query-side and object-side) because the SpatialJoinPlanNode extension node's default prevent_predicate_push_down_columns() returns all columns. Object-side pushdown must remain blocked (it changes KNN candidate sets), but query-side pushdown is safe and should be automatic.

DataFusion's built-in PushDownFilter pushes the same predicate to ALL children of an extension node, so a query-side filter like h.stars >= 4 would fail when applied to the object-side child, which doesn't have column h.stars. A custom optimizer rule is required instead.

Implementation
The KnnQuerySideFilterPushdown rule:
- Matches Filter(predicate, Extension(SpatialJoinPlanNode)) where the join filter contains ST_KNN
- Uses find_knn_query_side() to determine which child is the query side (from the first argument of ST_KNN)
- Runs before PushDownFilter so the pushed-down filters are further optimized into scan nodes in the same pass

Testing
- Unit tests for find_st_knn_call and find_knn_query_side

Depends on #635
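
To make the query-side/object-side distinction concrete, here is a minimal, dependency-free sketch of how a filter sitting above a KNN join might be split. It is not the code from this PR: the `Pred` type is a toy stand-in for DataFusion's `Expr`, and `split_conjuncts`, `references_only`, and `partition_for_pushdown` are hypothetical names. It also assumes the predicate is split on AND and that a conjunct is pushable only when every column it references belongs to the query-side child; the actual rule may be more or less conservative.

```rust
use std::collections::HashSet;

/// Toy predicate tree standing in for DataFusion's `Expr`
/// (hypothetical, for illustration only).
#[derive(Debug, Clone, PartialEq)]
enum Pred {
    /// A leaf condition plus the qualified columns it references.
    Leaf { text: &'static str, columns: Vec<&'static str> },
    /// Logical AND of two predicates.
    And(Box<Pred>, Box<Pred>),
}

/// Convenience constructor for a leaf condition.
fn leaf(text: &'static str, columns: &[&'static str]) -> Pred {
    Pred::Leaf { text, columns: columns.to_vec() }
}

/// Convenience constructor for a conjunction.
fn and(left: Pred, right: Pred) -> Pred {
    Pred::And(Box::new(left), Box::new(right))
}

/// Flatten nested ANDs into a flat list of conjuncts, left to right.
fn split_conjuncts(pred: Pred, out: &mut Vec<Pred>) {
    match pred {
        Pred::And(left, right) => {
            split_conjuncts(*left, out);
            split_conjuncts(*right, out);
        }
        other => out.push(other),
    }
}

/// True if every column referenced by `pred` is in `allowed`.
fn references_only(pred: &Pred, allowed: &HashSet<&'static str>) -> bool {
    match pred {
        Pred::Leaf { columns, .. } => columns.iter().all(|c| allowed.contains(c)),
        Pred::And(left, right) => {
            references_only(left, allowed) && references_only(right, allowed)
        }
    }
}

/// Split a filter above a KNN join into
/// (conjuncts that may move below the join onto the query side,
///  conjuncts that must stay above the join).
fn partition_for_pushdown(
    pred: Pred,
    query_side_columns: &HashSet<&'static str>,
) -> (Vec<Pred>, Vec<Pred>) {
    let mut conjuncts = Vec::new();
    split_conjuncts(pred, &mut conjuncts);
    conjuncts
        .into_iter()
        .partition(|c| references_only(c, query_side_columns))
}
```

With query-side columns `h.stars` and `h.geom`, a filter such as `h.stars >= 4 AND r.rating > 3` (where `r.rating` is a made-up object-side column) would split into one pushable conjunct, `h.stars >= 4`, and one that stays above the join. Keeping anything that touches an object-side column above the join reflects the constraint stated in the Background: filtering the object side first would change the KNN candidate sets.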