docs: replace barrier() with KNN join behavior documentation by Kontinuation · Pull Request #635 · apache/sedona-db

Kontinuation · 2026-02-18T14:20:06Z

Summary

Remove the barrier() UDF function, which was an optimization barrier workaround for KNN joins. It had no external consumers (no Python bindings, no integration tests, no doc references) and is no longer needed since KNN joins inherently block filter pushdown through extension node semantics.
Replace the "Optimization Barrier" docs section in sql-joins.md with a "KNN Join Caveats" section that accurately documents:
- No Filter Pushdown: KNN joins do not push filters into input tables; all predicates are post-filters. Notes that query-side pushdown is a valid future optimization.
- ST_KNN Predicate Precedence: ST_KNN is always extracted first when combined with other predicates via AND; equivalent examples shown for ON ... AND vs WHERE placement.
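To make the post-filter semantics concrete, here is a minimal sketch in plain Python. It is a brute-force model, not SedonaDB code: the `knn_join` helper, the toy coordinates, and the tuple layout are all hypothetical, and the first table is assumed to be the query side as in the `ST_KNN(h.geometry, r.geometry, ...)` examples in this thread.

```python
from math import hypot

# Hypothetical toy rows: (name, x, y, attribute)
hotels = [("H1", 0.0, 0.0, 5), ("H2", 10.0, 0.0, 3)]          # attribute = stars
restaurants = [
    ("R1", 1.0, 0.0, 3.5),                                    # attribute = rating
    ("R2", 2.0, 0.0, 4.5),
    ("R3", 3.0, 0.0, 4.8),
    ("R4", 9.0, 0.0, 4.9),
]


def knn_join(queries, objects, k):
    """Pair every query row with its k nearest object rows (brute force)."""
    pairs = []
    for q in queries:
        nearest = sorted(objects, key=lambda o: hypot(q[1] - o[1], q[2] - o[2]))[:k]
        pairs.extend((q, o) for o in nearest)
    return pairs


# Step 1: KNN candidates are chosen from the unfiltered inputs.
candidates = knn_join(hotels, restaurants, k=2)

# Step 2: every other predicate runs afterwards, whether it was written
# in ON ... AND or in WHERE.
result = [(h[0], r[0]) for h, r in candidates if r[3] > 4.0 and h[3] >= 4]

print(result)
```

In this model a hotel can end up with fewer than k matches, because the predicates only discard candidates that the KNN step has already chosen.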

Changes

docs/reference/sql-joins.md — Replaced "Optimization Barrier" section with "KNN Join Caveats"
rust/sedona-functions/src/barrier.rs — Deleted (649 lines)
rust/sedona-functions/src/lib.rs — Removed mod barrier;
rust/sedona-functions/src/register.rs — Removed barrier_udf registration

Testing

cargo test -p sedona-functions — 344 tests pass
cargo test -p sedona-spatial-join — 171 tests pass
All doc claims verified experimentally via Python SedonaContext

jiayuasu · 2026-02-18T18:50:36Z

Can you clarify why we need to remove the barrier function?

This function lets users state their intention explicitly, because pushing the filter down through a KNN join is not wrong behavior. I don't think simply blocking all filter pushdown will work unless we can achieve something similar via a CTE.

In addition, SedonaSpark also has the barrier function: https://sedona.apache.org/latest/api/sql/NearestNeighbourSearching/

paleolimbot · 2026-02-18T19:16:29Z

While it's not hurting anybody for it to continue to exist, we should definitely recommend more explicit syntax now that we have it available. You have to be a database expert familiar with the concept of barrier() to know what will happen here, and you have to have read the documentation very closely to know to use:

SELECT h.name AS hotel, r.name AS restaurant, r.rating
FROM hotels AS h
INNER JOIN restaurants AS r ON ST_KNN(h.geometry, r.geometry, 3, false)
WHERE barrier('rating > 4.0 AND stars >= 4', 'rating', r.rating, 'stars', h.stars)

Since we can now type this:

SELECT h.name AS hotel, r.name AS restaurant, r.rating
FROM hotels AS h
INNER JOIN restaurants AS r ON ST_KNN(h.geometry, r.geometry, 3, false)
WHERE rating > 4.0 AND stars >= 4

...we may as well recommend it and remove the hack before it becomes widely used. We can always add it back if it is requested.

paleolimbot · 2026-02-18T19:22:10Z

As I understand it, we also can optimize rating > 4.0 AND stars >= 4 by pushing stars >= 4 through one side of the join (whereas we can't do that with barrier()), in addition to other built-in optimizations that DataFusion does (e.g., constant folding, common subexpression elimination).
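A small brute-force sketch of why the two sides differ. This is a plain Python model under assumed toy data, not the DataFusion optimizer; `knn_join`, `star_ok`, and `rating_ok` are hypothetical helpers, and hotels are assumed to be the query side.

```python
from math import hypot

# Hypothetical toy rows: (name, x, y, attribute)
hotels = [("H1", 0.0, 0.0, 5), ("H2", 10.0, 0.0, 3)]          # attribute = stars
restaurants = [
    ("R1", 1.0, 0.0, 3.5),                                    # attribute = rating
    ("R2", 2.0, 0.0, 4.5),
    ("R3", 3.0, 0.0, 4.8),
    ("R4", 9.0, 0.0, 4.9),
]


def knn_join(queries, objects, k):
    """Pair every query row with its k nearest object rows (brute force)."""
    pairs = []
    for q in queries:
        nearest = sorted(objects, key=lambda o: hypot(q[1] - o[1], q[2] - o[2]))[:k]
        pairs.extend((q, o) for o in nearest)
    return pairs


def star_ok(h):
    return h[3] >= 4


def rating_ok(r):
    return r[3] > 4.0


# Both predicates applied after the join (post-filter).
post = [(h[0], r[0]) for h, r in knn_join(hotels, restaurants, 2)
        if star_ok(h) and rating_ok(r)]

# stars >= 4 pushed into the query side: the surviving hotels still see
# every restaurant, so their neighbor sets are unchanged.
query_pushed = [(h[0], r[0])
                for h, r in knn_join([h for h in hotels if star_ok(h)], restaurants, 2)
                if rating_ok(r)]

# rating > 4.0 pushed into the object side: the neighbor sets themselves change.
object_pushed = [(h[0], r[0])
                 for h, r in knn_join(hotels, [r for r in restaurants if rating_ok(r)], 2)
                 if star_ok(h)]

print(post, query_pushed, object_pushed)
```

Filtering the query side only removes whole groups of output rows, so it is safe to do before or after the join. Filtering the object side changes which rows count as nearest neighbors, so it is a different query.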

jiayuasu · 2026-02-18T20:20:45Z

I am fine removing the barrier function. I agree it is ugly.

But is there a way for users to state their intention clearly, i.e., whether they want the filter first or the join first? I think we discussed this before and the suggestion was to use a CTE?

SELECT h.name AS hotel, r.name AS restaurant, r.rating
FROM hotels AS h
INNER JOIN restaurants AS r ON ST_KNN(h.geometry, r.geometry, 3, false)
WHERE rating > 4.0 AND stars >= 4

paleolimbot · 2026-02-18T20:41:07Z

Yes, I think either a CTE or a subquery will work if the filter should be applied first.

jiayuasu · 2026-02-18T22:48:39Z

OK. As long as we document the CTE approach, I am fine with removing the function
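As an illustration of the two intentions, here is a sketch that runs on Python's built-in `sqlite3` module. SQLite has no `ST_KNN`, so the KNN join is emulated with `ROW_NUMBER()` over squared distance (this needs SQLite 3.25 or newer); the tables, coordinates, and k = 2 are assumptions made for the example. The trailing comment shows how the filter-first form might look with the `ST_KNN` syntax used in this thread; it is not executed here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE hotels (name TEXT, x REAL, y REAL, stars INTEGER);
CREATE TABLE restaurants (name TEXT, x REAL, y REAL, rating REAL);
INSERT INTO hotels VALUES ('H1', 0, 0, 5), ('H2', 10, 0, 3);
INSERT INTO restaurants VALUES
    ('R1', 1, 0, 3.5), ('R2', 2, 0, 4.5), ('R3', 3, 0, 4.8), ('R4', 9, 0, 4.9);
""")

# Join first: rank all restaurants per hotel, keep the 2 nearest, then filter.
JOIN_FIRST = """
WITH knn AS (
    SELECT h.name AS hotel, h.stars AS stars,
           r.name AS restaurant, r.rating AS rating,
           ROW_NUMBER() OVER (
               PARTITION BY h.name
               ORDER BY (h.x - r.x) * (h.x - r.x) + (h.y - r.y) * (h.y - r.y)
           ) AS rn
    FROM hotels AS h CROSS JOIN restaurants AS r
)
SELECT hotel, restaurant FROM knn
WHERE rn <= 2 AND rating > 4.0 AND stars >= 4
ORDER BY hotel, restaurant
"""

# Filter first: a CTE restricts the object side before neighbors are ranked.
FILTER_FIRST = """
WITH good_restaurants AS (
    SELECT * FROM restaurants WHERE rating > 4.0
),
knn AS (
    SELECT h.name AS hotel, h.stars AS stars, r.name AS restaurant,
           ROW_NUMBER() OVER (
               PARTITION BY h.name
               ORDER BY (h.x - r.x) * (h.x - r.x) + (h.y - r.y) * (h.y - r.y)
           ) AS rn
    FROM hotels AS h CROSS JOIN good_restaurants AS r
)
SELECT hotel, restaurant FROM knn
WHERE rn <= 2 AND stars >= 4
ORDER BY hotel, restaurant
"""

join_first = conn.execute(JOIN_FIRST).fetchall()
filter_first = conn.execute(FILTER_FIRST).fetchall()
print(join_first, filter_first)

# Assumed shape of the filter-first form with ST_KNN (not run here):
#   WITH good_restaurants AS (SELECT * FROM restaurants WHERE rating > 4.0)
#   SELECT h.name AS hotel, r.name AS restaurant, r.rating
#   FROM hotels AS h
#   INNER JOIN good_restaurants AS r ON ST_KNN(h.geometry, r.geometry, 3, false)
#   WHERE h.stars >= 4
```

The two queries return different rows for the same data, which is why the choice between them has to be visible in the SQL itself.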

KNN joins now block all filter pushdown automatically, so the barrier() function is no longer needed. Replace the Optimization Barrier section with a KNN Join Behavior section that documents:
- No filter pushdown: WHERE predicates are evaluated after KNN candidate selection, not pushed into input tables
- ST_KNN predicate precedence: ST_KNN is always extracted first when combined with other predicates via AND

The barrier() function was a workaround to prevent filter pushdown past KNN joins by evaluating boolean expressions as opaque strings at runtime. KNN joins now block all filter pushdown automatically via the KnnJoinEarlyRewrite optimizer rule, making barrier() unnecessary. The function had no external consumers: no Python bindings, no integration tests, no documentation references, and no other Rust modules importing it.

Kontinuation · 2026-02-19T02:32:04Z

I have updated the doc to include subquery and CTE examples for manually pushing down the filters. This can serve as a workaround for now. We should definitely implement query-side predicate pushdown for KNN joins in future patches.

petern48 · 2026-02-19T03:51:25Z

For what it's worth, the other day I stumbled across LanceDB handling this exact scenario. They offer a prefilter parameter for their approximate k-nearest-neighbors vector search functionality. It's not quite SQL, but it's worth noting that another project has encountered this and supports both cases.

results_post_filtered = (
    table.search(query_embed)
    .where("label > 1", prefilter=False)  # prefilter parameter allows user to choose
    .select(["text", "keywords", "label"])
    .limit(5)
    .to_pandas()
)

https://docs.lancedb.com/search/vector-search#vector-search-with-postfiltering

Kontinuation · 2026-02-19T10:48:21Z

LanceDB's fluent API does not allow something like table.where(...).search(embedding). where can only appear after search, so it takes an optional parameter to distinguish whether it is a pre-filter or a post-filter.

SQL is more flexible than LanceDB's query builder API, and there are unambiguous ways to express pre- and post-filtering in SQL, so I don't think we need barrier-like annotations. We only need to faithfully carry out the semantics of the SQL.

paleolimbot

Thank you!

Kontinuation force-pushed the docs/remove-barrier-update-knn-docs branch from 199e083 to 19fce52 Compare February 18, 2026 15:17

Kontinuation added 6 commits February 19, 2026 09:52

docs: refine KNN join behavior section with pushdown note and precede…

ac156e8

…nce example

Refine the doc

94b8a27

Remove qmd barrier

a7509b5

docs: add subquery/CTE examples for manual query-side filter pushdown…

b30eb2f

… in KNN joins

Kontinuation force-pushed the docs/remove-barrier-update-knn-docs branch from 19fce52 to b30eb2f Compare February 19, 2026 02:30

Kontinuation marked this pull request as ready for review February 19, 2026 10:48

paleolimbot approved these changes Feb 19, 2026

paleolimbot merged commit 6b08cf2 into apache:main Feb 19, 2026
17 checks passed

Kontinuation mentioned this pull request Feb 19, 2026

feat(rust/sedona-spatial-join): Automatic query-side filter pushdown for KNN joins #641

Merged

paleolimbot added this to the 0.3.0 milestone Feb 19, 2026
