Skip to content

Conversation

@Kontinuation
Copy link
Member

This PR is based on #12667. 2 spatial predicates were added to operate on geospatial bounds:

  • ST_INTERSECTS: Test if 2 geospatial bounds intersects with each other
  • ST_DISJOINT: The negation of ST_INTERSECTS.

A new literal type BoundingBoxLiteral is added for representing a geospatial query window or geospatial bounds in metrics or Parquet stats.

@github-actions
Copy link

This pull request has been marked as stale due to 30 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pull request requires a review, please simply write any comment. If closed, you can revive the PR at any time and @mention a reviewer or discuss it on the dev@iceberg.apache.org list. Thank you for your contributions.

@github-actions github-actions bot added the stale label Oct 19, 2025
@github-actions
Copy link

This pull request has been closed due to lack of activity. This is not a judgement on the merit of the PR in any way. It is just a way of keeping the PR queue manageable. If you think that is incorrect, or the pull request requires review, you can revive the PR at any time.

@github-actions github-actions bot closed this Oct 27, 2025
@stevenzwu stevenzwu reopened this Nov 10, 2025
@github-actions github-actions bot removed the stale label Nov 11, 2025
@Kontinuation Kontinuation force-pushed the pr-2-geo-expr branch 2 times, most recently from 560a29d to fa54b84 Compare November 18, 2025 08:55
@Kontinuation Kontinuation changed the title [WIP] API, Core: Add geospatial predicates and bounding box literals API, Core: Add geospatial predicates and bounding box literals Nov 18, 2025
@Kontinuation Kontinuation marked this pull request as ready for review November 18, 2025 16:20
@jiayuasu
Copy link
Member

@stevenzwu @szehon-ho @huaxingao @wgtmac @hsiang-c Would you please take a look at this PR?


@Override
protected Type.TypeID typeId() {
return Type.TypeID.GEOMETRY;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this seems a bit weird to me. bounding box literal can only be GEOMETRY type?

Maybe we should revisit a previous discussion.
#12667 (comment)

Maybe the BoundingBox needs to include the geo type?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this is indeed strange, although this implementation works in this case since typeId() is only used by BaseLiteral.toByteBuffer for serializing the value of the literal.

I found 2 ways to fix this weirdness:

  1. Making BoundingBoxLiteral directly implementing interface Literal, but not extends from BaseLiteral. The Literal interface does not have the typeId method we don't have to come up with a confusing implementation.
  2. Adding geo type to BoundingBox. Check that the arguments passed to spatial predicates must have the same geo type.

I am working towards approach 2 and see if everything fits perfectly after revising the BoundingBox definition.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I decided to switch to approach 1 and remove the typeId() method from BoundingBoxLiteral. Here are the reasons:

  1. If BoundingBox carries typeId, it should also carry the full fledged type information including CRS and algorithm. Otherwise it would be strange that only the geospatial type ID is explicitly specified while other type attributes inherit from the type of the field.
  2. Iceberg needs to check that the type of the field and the bounding box matches with each other when binding and evaluating spatial predicates. The CRS and algorithm also need to match.
  3. Once the query engine checked that the geospatial field is supported by the engine and decided to generate a bounding box for pushing down a spatial predicate, it needs extract the geospatial type attributes from the Iceberg schema to construct the BoundingBox. The CRS attribute is tightly coupled with the iceberg table since it is the name of the table property defining the full CRS in PROJJSON, so there's a high chance that the engine will extract field type in iceberg table schema directly use it to construct the bounding box. Inferring the type of BoundingBox according to the context would eliminate the need for this extra step.
  4. The untyped BoundingBox looks concise in the REST Catalog SPEC: REST: Add spatial expressions to REST Catalog OpenAPI Spec #14856. Adding type information to the bounding box would make the data model more cluttered.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with the reasoning. Let's go with approach 1 for now.

@Kontinuation Kontinuation force-pushed the pr-2-geo-expr branch 2 times, most recently from c320c60 to 5bb1a6b Compare December 11, 2025 13:28
@jiayuasu
Copy link
Member

@stevenzwu @huaxingao @wgtmac Hi folks, happy new year! Can you please take a look at this PR again?

Copy link
Contributor

@huaxingao huaxingao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Contributor

@stevenzwu stevenzwu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

@Kontinuation @jiayuasu we would need more eyes on this from both Iceberg and geo community since it mostly touches api module. I will bring it up in the next Iceberg community sync too.

also @szehon-ho not sure if you also want to take a look.

@stevenzwu
Copy link
Contributor

cc @hsiang-c @wgtmac @rdblue for reviews

Copy link
Member

@szehon-ho szehon-ho left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lgtm , thanks @Kontinuation for pr, and @stevenzwu @huaxingao @wgtmac for detailed reviews

@huaxingao
Copy link
Contributor

Thanks all. This PR looks in good shape and I’d like to merge it soon. I’ll wait until Wednesday for any additional feedback from @hsiang-c @wgtmac @rdblue; if there are no further comments by then, I’ll go ahead and merge.

@huaxingao
Copy link
Contributor

Sorry—I didn’t notice this PR is scheduled for discussion in the next community sync. Let’s wait and follow up after the sync.

@stevenzwu
Copy link
Contributor

@Kontinuation please resolve the merge conflict

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants