-
Notifications
You must be signed in to change notification settings - Fork 3k
API, Core: Add geospatial predicates and bounding box literals #14101
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
|
This pull request has been marked as stale due to 30 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pull request requires a review, please simply write any comment. If closed, you can revive the PR at any time and @mention a reviewer or discuss it on the dev@iceberg.apache.org list. Thank you for your contributions. |
|
This pull request has been closed due to lack of activity. This is not a judgement on the merit of the PR in any way. It is just a way of keeping the PR queue manageable. If you think that is incorrect, or the pull request requires review, you can revive the PR at any time. |
560a29d to
fa54b84
Compare
|
@stevenzwu @szehon-ho @huaxingao @wgtmac @hsiang-c Would you please take a look at this PR? |
|
|
||
| @Override | ||
| protected Type.TypeID typeId() { | ||
| return Type.TypeID.GEOMETRY; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this seems a bit weird to me. bounding box literal can only be GEOMETRY type?
Maybe we should revisit a previous discussion.
#12667 (comment)
Maybe the BoundingBox needs to include the geo type?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, this is indeed strange, although this implementation works in this case since typeId() is only used by BaseLiteral.toByteBuffer for serializing the value of the literal.
I found 2 ways to fix this weirdness:
- Making
BoundingBoxLiteraldirectly implementinginterface Literal, but not extends fromBaseLiteral. TheLiteralinterface does not have thetypeIdmethod we don't have to come up with a confusing implementation. - Adding geo type to
BoundingBox. Check that the arguments passed to spatial predicates must have the same geo type.
I am working towards approach 2 and see if everything fits perfectly after revising the BoundingBox definition.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I decided to switch to approach 1 and remove the typeId() method from BoundingBoxLiteral. Here are the reasons:
- If
BoundingBoxcarries typeId, it should also carry the full fledged type information including CRS and algorithm. Otherwise it would be strange that only the geospatial type ID is explicitly specified while other type attributes inherit from the type of the field. - Iceberg needs to check that the type of the field and the bounding box matches with each other when binding and evaluating spatial predicates. The CRS and algorithm also need to match.
- Once the query engine checked that the geospatial field is supported by the engine and decided to generate a bounding box for pushing down a spatial predicate, it needs extract the geospatial type attributes from the Iceberg schema to construct the
BoundingBox. The CRS attribute is tightly coupled with the iceberg table since it is the name of the table property defining the full CRS in PROJJSON, so there's a high chance that the engine will extract field type in iceberg table schema directly use it to construct the bounding box. Inferring the type of BoundingBox according to the context would eliminate the need for this extra step. - The untyped BoundingBox looks concise in the REST Catalog SPEC: REST: Add spatial expressions to REST Catalog OpenAPI Spec #14856. Adding type information to the bounding box would make the data model more cluttered.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with the reasoning. Let's go with approach 1 for now.
api/src/main/java/org/apache/iceberg/expressions/ExpressionUtil.java
Outdated
Show resolved
Hide resolved
api/src/main/java/org/apache/iceberg/expressions/ExpressionVisitors.java
Show resolved
Hide resolved
core/src/test/java/org/apache/iceberg/expressions/TestExpressionParser.java
Outdated
Show resolved
Hide resolved
c320c60 to
5bb1a6b
Compare
api/src/main/java/org/apache/iceberg/expressions/UnboundPredicate.java
Outdated
Show resolved
Hide resolved
5bb1a6b to
2a56c2b
Compare
2a56c2b to
6a4a1b3
Compare
api/src/main/java/org/apache/iceberg/expressions/UnboundPredicate.java
Outdated
Show resolved
Hide resolved
api/src/main/java/org/apache/iceberg/expressions/Evaluator.java
Outdated
Show resolved
Hide resolved
api/src/test/java/org/apache/iceberg/types/TestConversions.java
Outdated
Show resolved
Hide resolved
api/src/test/java/org/apache/iceberg/expressions/TestEvaluator.java
Outdated
Show resolved
Hide resolved
api/src/test/java/org/apache/iceberg/expressions/TestExpressionUtil.java
Outdated
Show resolved
Hide resolved
api/src/test/java/org/apache/iceberg/expressions/TestExpressionUtil.java
Outdated
Show resolved
Hide resolved
api/src/test/java/org/apache/iceberg/expressions/TestPredicateBinding.java
Outdated
Show resolved
Hide resolved
core/src/main/java/org/apache/iceberg/expressions/ExpressionParser.java
Outdated
Show resolved
Hide resolved
core/src/main/java/org/apache/iceberg/expressions/ExpressionParser.java
Outdated
Show resolved
Hide resolved
api/src/main/java/org/apache/iceberg/expressions/UnboundPredicate.java
Outdated
Show resolved
Hide resolved
api/src/test/java/org/apache/iceberg/expressions/TestExpressionUtil.java
Outdated
Show resolved
Hide resolved
4bb3e84 to
fc86d0c
Compare
|
@stevenzwu @huaxingao @wgtmac Hi folks, happy new year! Can you please take a look at this PR again? |
huaxingao
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
api/src/main/java/org/apache/iceberg/expressions/ExpressionUtil.java
Outdated
Show resolved
Hide resolved
api/src/test/java/org/apache/iceberg/expressions/TestExpressionBinding.java
Outdated
Show resolved
Hide resolved
api/src/test/java/org/apache/iceberg/expressions/TestExpressionBinding.java
Outdated
Show resolved
Hide resolved
api/src/test/java/org/apache/iceberg/expressions/TestExpressionBinding.java
Outdated
Show resolved
Hide resolved
api/src/test/java/org/apache/iceberg/expressions/TestExpressionUtil.java
Show resolved
Hide resolved
c86b436 to
c6698fb
Compare
stevenzwu
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
@Kontinuation @jiayuasu we would need more eyes on this from both Iceberg and geo community since it mostly touches api module. I will bring it up in the next Iceberg community sync too.
also @szehon-ho not sure if you also want to take a look.
szehon-ho
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lgtm , thanks @Kontinuation for pr, and @stevenzwu @huaxingao @wgtmac for detailed reviews
|
Sorry—I didn’t notice this PR is scheduled for discussion in the next community sync. Let’s wait and follow up after the sync. |
|
@Kontinuation please resolve the merge conflict |
c6698fb to
f55e6e2
Compare
This PR is based on #12667. 2 spatial predicates were added to operate on geospatial bounds:
ST_INTERSECTS: Test if 2 geospatial bounds intersects with each otherST_DISJOINT: The negation ofST_INTERSECTS.A new literal type
BoundingBoxLiteralis added for representing a geospatial query window or geospatial bounds in metrics or Parquet stats.