Contributor

@uros-db uros-db commented Jan 28, 2026

What changes were proposed in this pull request?

Enable writing Geometry and Geography data to Parquet files.

Why are the changes needed?

To let users persist geospatial data in Parquet format.

Does this PR introduce any user-facing change?

Yes, geo data can now be written to Parquet.

How was this patch tested?

Added tests for writing GEOMETRY and GEOGRAPHY to Parquet.

Was this patch authored or co-authored using generative AI tooling?

No.

@github-actions

JIRA Issue Information

=== Sub-task SPARK-55260 ===
Summary: Implement Parquet write support for Geo types
Assignee: None
Status: Open
Affected: ["4.2"]


This comment was automatically generated by GitHub Actions

@github-actions github-actions bot added the SQL label Jan 28, 2026
@uros-db uros-db marked this pull request as draft January 28, 2026 18:08
@uros-db uros-db marked this pull request as ready for review January 29, 2026 15:55
Contributor Author

@uros-db uros-db left a comment

There are some failures which don't look related to my changes (subquery/in-subquery/in-set-operations.sql and OracleJoinPushdownIntegrationSuite - Docker integration), so @cloud-fan please review.

@uros-db
Contributor Author

uros-db commented Jan 30, 2026

There are some failures again, but I don't think any are related to these changes. @cloud-fan

Member

@szehon-ho szehon-ho left a comment

A few nit comments, but looks good.

@uros-db
Contributor Author

uros-db commented Jan 30, 2026

Linter is failing with python3.11: not found, so I think this should be good to go.


override def supportsDataType(dataType: DataType): Boolean = dataType match {
  // GeoSpatial data types in Parquet are limited only to types with supported SRIDs.
  case g: GeometryType => GeometryType.isSridSupported(g.srid)
Contributor

A dumb question: GeometryType.isSridSupported sounds like a check for Spark itself rather than for Parquet. So is this a safeguard so that, if the input data contains geo values that are not supported by Spark (not sure how that can happen), we don't write them to Parquet?

Contributor Author

Yes, this is a safeguard for Spark, but the same code is used for both reads and writes. Please see the comment in the base class (sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/FileTable.scala):

  /**
   * Returns whether this format supports the given [[DataType]] in read/write path.
   * By default all data types are supported.
   */
  def supportsDataType(dataType: DataType): Boolean = true
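
For readers following the thread, here is a minimal, self-contained sketch of the pattern being discussed: a base hook that accepts every type by default, and a format-specific override that accepts geo types only when their SRID is supported. The types below are simplified stand-ins rather than Spark's actual classes, and `SridRegistry` (including its SRID list), `FileFormatTable`, and `ParquetLikeTable` are hypothetical names used only for illustration.

```scala
// Simplified stand-ins for a data type hierarchy; these are NOT Spark's classes.
sealed trait DataType
case object IntegerType extends DataType
final case class GeometryType(srid: Int) extends DataType
final case class GeographyType(srid: Int) extends DataType
final case class ArrayType(elementType: DataType) extends DataType

object SridRegistry {
  // Hypothetical allow-list, chosen for illustration only.
  private val supported: Set[Int] = Set(0, 4326, 3857)
  def isSupported(srid: Int): Boolean = supported.contains(srid)
}

// Mirrors the shape of the base-class hook: every type is supported by default,
// and the same answer applies to both the read and the write path.
trait FileFormatTable {
  def supportsDataType(dataType: DataType): Boolean = true
}

// A format that narrows the default: geo types pass only with a supported SRID.
object ParquetLikeTable extends FileFormatTable {
  override def supportsDataType(dataType: DataType): Boolean = dataType match {
    case GeometryType(srid)  => SridRegistry.isSupported(srid)
    case GeographyType(srid) => SridRegistry.isSupported(srid)
    // Nested types are supported only if their element type is.
    case ArrayType(element)  => supportsDataType(element)
    // Everything else falls back to the permissive default.
    case other               => super.supportsDataType(other)
  }
}
```

Because the check sits on the type rather than on a read- or write-specific code path, one override is enough to guard both directions, which is the point made in the reply above.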

@cloud-fan
Contributor

The linter failure is unrelated. Thanks, merging to master!

@cloud-fan cloud-fan closed this in 7792122 Feb 2, 2026