docs: replace barrier() with KNN join behavior documentation #635
199e083 to
19fce52
|
Can you clarify why we need to remove the barrier function? It gives users a way to state their intention, because pushing the filter down through a KNN join is not wrong behavior. I don't think simply blocking all filter pushdown will work unless we can achieve something similar via a CTE. In addition, SedonaSpark also has the barrier function: https://sedona.apache.org/latest/api/sql/NearestNeighbourSearching/ |
|
While it's not hurting anybody for it to continue to exist, we should definitely recommend the more explicit syntax now that we have it available. You have to be a database expert familiar with the concept of an optimization barrier to write this:

    SELECT h.name AS hotel, r.name AS restaurant, r.rating
    FROM hotels AS h
    INNER JOIN restaurants AS r ON ST_KNN(h.geometry, r.geometry, 3, false)
    WHERE barrier('rating > 4.0 AND stars >= 4', 'rating', r.rating, 'stars', h.stars)

Since we can now type this:

    SELECT h.name AS hotel, r.name AS restaurant, r.rating
    FROM hotels AS h
    INNER JOIN restaurants AS r ON ST_KNN(h.geometry, r.geometry, 3, false)
    WHERE rating > 4.0 AND stars >= 4

...we may as well recommend it and remove the hack before it becomes widely used. We can always add it back if it is requested. |
|
As I understand it, we also can optimize |
|
I am fine with removing the barrier function. I agree it is ugly. But is there a way to let users clearly state their intention, i.e., whether they want the filter first or the join first? I think we discussed this before and the suggestion was to use a CTE? |
|
Yes, I think either a CTE or a subquery will work if the filter should be applied first. |
|
OK. As long as we document the CTE approach, I am fine with removing the function |
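To make the filter-first versus join-first distinction concrete, here is a minimal sketch in plain Python. It is a toy model, not SedonaDB code: the data, the `knn_join` helper, and both query functions are hypothetical and only mimic the two orderings discussed above (a WHERE clause evaluated after the KNN join, and a CTE or subquery that filters restaurants before the join).

```python
import math

# Hypothetical toy data: (name, x, y, attribute).
hotels = [("H1", 0.0, 0.0, 5)]  # attribute = stars
restaurants = [                 # attribute = rating
    ("R1", 1.0, 0.0, 3.5),
    ("R2", 2.0, 0.0, 4.5),
    ("R3", 3.0, 0.0, 3.0),
    ("R4", 4.0, 0.0, 4.8),
    ("R5", 5.0, 0.0, 4.2),
]


def knn_join(left, right, k):
    """Pair every left row with its k nearest right rows (brute force)."""
    pairs = []
    for l in left:
        nearest = sorted(right, key=lambda r: math.dist(l[1:3], r[1:3]))[:k]
        pairs.extend((l, r) for r in nearest)
    return pairs


def join_then_filter(k, min_rating):
    # Models a WHERE clause on a KNN join with no filter pushdown:
    # pick the k nearest candidates first, then drop low-rated ones.
    return [(h[0], r[0]) for h, r in knn_join(hotels, restaurants, k)
            if r[3] > min_rating]


def filter_then_join(k, min_rating):
    # Models a CTE or subquery that filters restaurants before the join:
    # the k nearest are chosen among well-rated restaurants only.
    good = [r for r in restaurants if r[3] > min_rating]
    return [(h[0], r[0]) for h, r in knn_join(hotels, good, k)]


post = join_then_filter(3, 4.0)
pre = filter_then_join(3, 4.0)
print(post)  # [('H1', 'R2')]
print(pre)   # [('H1', 'R2'), ('H1', 'R4'), ('H1', 'R5')]
```

The two orderings return different rows for the same predicate, which is why the docs need to state which one a plain WHERE clause gets and how to request the other.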
KNN joins now block all filter pushdown automatically, so the barrier() function is no longer needed. Replace the Optimization Barrier section with a KNN Join Behavior section that documents:
- No filter pushdown: WHERE predicates are evaluated after KNN candidate selection, not pushed into input tables
- ST_KNN predicate precedence: ST_KNN is always extracted first when combined with other predicates via AND
The barrier() function was a workaround to prevent filter pushdown past KNN joins by evaluating boolean expressions as opaque strings at runtime. KNN joins now block all filter pushdown automatically via the KnnJoinEarlyRewrite optimizer rule, making barrier() unnecessary. The function had no external consumers: no Python bindings, no integration tests, no documentation references, and no other Rust modules importing it.
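The predicate-precedence behavior described above can be pictured with a small Python sketch. It is only an illustration of the idea and makes no claim about how the KnnJoinEarlyRewrite rule is implemented: `Pred`, `split_conjunction`, and `plan` are hypothetical names, and predicates are modeled as plain strings.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pred:
    """A single conjunct of an AND-ed condition (hypothetical model)."""
    text: str
    is_knn: bool = False


def split_conjunction(conjuncts):
    """Pull the ST_KNN conjunct out first; everything else is residual."""
    knn = [p for p in conjuncts if p.is_knn]
    if len(knn) != 1:
        raise ValueError("expected exactly one ST_KNN predicate")
    residual = [p for p in conjuncts if not p.is_knn]
    return knn[0], residual


def plan(on_conjuncts, where_conjuncts):
    # For an inner join, conjuncts from ON and WHERE are treated alike:
    # the KNN predicate drives the join, the rest run as a post-filter.
    knn, residual = split_conjunction(on_conjuncts + where_conjuncts)
    return {"knn_join": knn.text, "post_filter": [p.text for p in residual]}


knn = Pred("ST_KNN(h.geometry, r.geometry, 3, false)", is_knn=True)
rating = Pred("r.rating > 4.0")

plan_on = plan([knn, rating], [])      # ON ST_KNN(...) AND r.rating > 4.0
plan_where = plan([knn], [rating])     # ON ST_KNN(...) WHERE r.rating > 4.0
print(plan_on == plan_where)
```

In this model, writing the extra predicate in ON ... AND or in WHERE yields the same plan, with the rating filter applied only after KNN candidate selection.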
19fce52 to
b30eb2f
|
I have updated the doc to include subquery and CTE examples for manually pushing down the filters. This can serve as a workaround for now. We should definitely implement query-side predicate pushdown optimization for KNN in future patches. |
|
For what it's worth, the other day I stumbled across lancedb handling this exact scenario. They offer a prefilter parameter:

    results_post_filtered = (
        table.search(query_embed)
        .where("label > 1", prefilter=False)  # prefilter parameter allows user to choose
        .select(["text", "keywords", "label"])
        .limit(5)
        .to_pandas()
    )

https://docs.lancedb.com/search/vector-search#vector-search-with-postfiltering |
|
LanceDB's fluent API does not allow something like ... SQL is more flexible than LanceDB's query builder API, and there are unambiguous ways to express pre- and post-filtering in SQL, so I don't think we need barrier-like annotations. We only need to faithfully carry out the semantics of the SQL. |
Summary
- Remove the barrier() UDF function, which was an optimization barrier workaround for KNN joins. It had no external consumers (no Python bindings, no integration tests, no doc references) and is no longer needed since KNN joins inherently block filter pushdown through extension node semantics.
- Replace the "Optimization Barrier" section in sql-joins.md with a "KNN Join Caveats" section that accurately documents:
  - ST_KNN is always extracted first when combined with other predicates via AND; equivalent examples shown for ON ... AND vs WHERE placement.

Changes
- docs/reference/sql-joins.md: Replaced "Optimization Barrier" section with "KNN Join Caveats"
- rust/sedona-functions/src/barrier.rs: Deleted (649 lines)
- rust/sedona-functions/src/lib.rs: Removed mod barrier;
- rust/sedona-functions/src/register.rs: Removed barrier_udf registration

Testing
- cargo test -p sedona-functions: 344 tests pass
- cargo test -p sedona-spatial-join: 171 tests pass