[SPARK-56160][SQL] Add DataType classes for nanosecond timestamp types by xiaoxuandev · Pull Request #54966 · apache/spark

xiaoxuandev · 2026-03-23T22:16:07Z

What changes were proposed in this pull request?

This PR adds two new DataType classes for nanosecond-precision timestamps:

TimestampNSType (with local timezone semantics)
TimestampNTZNSType (without timezone semantics)

Both are singleton types following the same pattern as TimestampNTZType (SPARK-35662). They are stored internally as a Long representing nanoseconds since the Unix epoch, with a default size of 8 bytes. The representable range is approximately 1677-09-21 to 2262-04-11.
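The quoted range follows directly from storing the value in a signed 64-bit Long. The sketch below is illustrative only and is not code from this PR: the class `NanoTimestampRange` and the helper `fromEpochNanos` are hypothetical names, and it uses only `java.time` to show where the two boundary dates come from.

```java
import java.time.Instant;

// Hypothetical helper, not part of Spark: shows the range implied by a
// signed 64-bit count of nanoseconds since the Unix epoch.
final class NanoTimestampRange {

    /** Interprets a signed 64-bit nanosecond count since the Unix epoch as an Instant. */
    static Instant fromEpochNanos(long epochNanos) {
        // ofEpochSecond normalizes the nanosecond adjustment with floor division,
        // so negative (pre-1970) values are handled correctly.
        return Instant.ofEpochSecond(0L, epochNanos);
    }

    public static void main(String[] args) {
        // Earliest representable instant: 1677-09-21T00:12:43.145224192Z
        System.out.println(fromEpochNanos(Long.MIN_VALUE));
        // Latest representable instant: 2262-04-11T23:47:16.854775807Z
        System.out.println(fromEpochNanos(Long.MAX_VALUE));
        // Width of the backing value in bytes: 8
        System.out.println(Long.BYTES);
    }
}
```

For comparison, the same 8 bytes at microsecond precision cover a range roughly 1000 times wider, which is the trade-off the nanosecond types accept.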

This PR also registers the new types in DataTypes.java (Java API) and DataType.scala (type name registry for JSON/DDL parsing).
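As a rough illustration of the singleton-plus-registry shape described above, here is a self-contained sketch. It does not reproduce Spark's actual classes: every `Sketch*` class is a hypothetical stand-in, and the type-name strings `timestamp_ns` and `timestamp_ntz_ns` are assumptions made for the example, not names confirmed by this PR.

```java
import java.util.Locale;
import java.util.Map;
import java.util.Optional;

// Hypothetical stand-in for a data type base class; not Spark's DataType.
abstract class SketchDataType {
    abstract int defaultSize();
    abstract String typeName();
}

// Singleton: the private constructor means INSTANCE is the only value of this type.
final class SketchTimestampNSType extends SketchDataType {
    static final SketchTimestampNSType INSTANCE = new SketchTimestampNSType();
    private SketchTimestampNSType() {}
    @Override int defaultSize() { return Long.BYTES; }        // 8 bytes, backed by a long
    @Override String typeName() { return "timestamp_ns"; }    // assumed name
}

final class SketchTimestampNTZNSType extends SketchDataType {
    static final SketchTimestampNTZNSType INSTANCE = new SketchTimestampNTZNSType();
    private SketchTimestampNTZNSType() {}
    @Override int defaultSize() { return Long.BYTES; }
    @Override String typeName() { return "timestamp_ntz_ns"; } // assumed name
}

// Name registry: maps a type name (as it might appear in JSON or DDL) to its singleton.
final class SketchTypeRegistry {
    private static final Map<String, SketchDataType> BY_NAME = Map.of(
        SketchTimestampNSType.INSTANCE.typeName(), SketchTimestampNSType.INSTANCE,
        SketchTimestampNTZNSType.INSTANCE.typeName(), SketchTimestampNTZNSType.INSTANCE);

    /** Case-insensitive lookup; returns empty for names that are not registered. */
    static Optional<SketchDataType> lookup(String name) {
        return Optional.ofNullable(BY_NAME.get(name.toLowerCase(Locale.ROOT)));
    }

    public static void main(String[] args) {
        SketchDataType t = lookup("TIMESTAMP_NS").orElseThrow(IllegalStateException::new);
        System.out.println(t.typeName() + " defaultSize=" + t.defaultSize());
    }
}
```

Because each type has exactly one instance, a registry lookup can be compared by reference, which is one reason the singleton pattern suits parameterless types.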

Why are the changes needed?

Microsecond precision is insufficient for a growing number of workloads:

Parquet files written by Pandas/PyArrow default to TIMESTAMP(NANOS)
Iceberg V3 adds timestamp_ns / timestamptz_ns types
Financial exchange data (NYSE, NASDAQ, CME) uses nanosecond timestamps
OpenTelemetry traces use nanosecond timestamps
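To make the precision gap concrete, the sketch below shows two distinct nanosecond timestamps collapsing to the same value once truncated to microseconds. The class `MicrosTruncation` and the method `nanosToMicros` are hypothetical names chosen for this example, and the sample values are arbitrary.

```java
// Hypothetical illustration, not Spark code: truncating epoch nanoseconds
// to epoch microseconds discards the last three decimal digits.
final class MicrosTruncation {
    static final long NANOS_PER_MICRO = 1_000L;

    /** Floors a nanosecond count to whole microseconds (rounds toward negative infinity). */
    static long nanosToMicros(long epochNanos) {
        return Math.floorDiv(epochNanos, NANOS_PER_MICRO);
    }

    public static void main(String[] args) {
        long first = 1_700_000_000_123_456_001L;  // two events 998 ns apart
        long second = 1_700_000_000_123_456_999L;

        System.out.println(first != second);                                 // true
        System.out.println(nanosToMicros(first) == nanosToMicros(second));   // true
    }
}
```

Once both events map to the same microsecond, their relative order can no longer be recovered from the timestamp alone.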

Without native nanosecond types, Spark either throws AnalysisException on nanosecond Parquet columns or reads them as raw LongType via spark.sql.legacy.parquet.nanosAsLong, losing all timestamp semantics.

This is the first step of native nanosecond timestamp support. Subsequent PRs will add SQL parser keywords, Cast rules, Parquet read/write, and Arrow integration.

Does this PR introduce any user-facing change?

No. The types are defined but not yet wired into the SQL parser or any data source.

How was this patch tested?

Added checkDefaultSize tests in DataTypeSuite for both new types.

Was this patch authored or co-authored using generative AI tooling?

Yes, co-authored with Kiro.


xiaoxuandev force-pushed the add-timesatmp-nano-types branch from 8e16374 to b295070 Compare March 23, 2026 22:58

xiaoxuandev wants to merge 1 commit into apache:master from xiaoxuandev:add-timesatmp-nano-types
