Description
The date_format expression was added in PR #3201, but it is currently fully compatible with Spark only when the session timezone is UTC. Non-UTC timezones are marked as Incompatible and fall back to Spark by default.
This issue tracks adding proper timezone conversion support so that date_format can be fully compatible with Spark for all timezones.
Current Behavior
- UTC timezone: Compatible() - runs natively in Comet
- Non-UTC timezones: Incompatible() - falls back to Spark by default
- Users can enable non-UTC with spark.comet.expr.DateFormatClass.allowIncompatible=true, but results may differ from Spark
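For readers skimming the issue, the three cases above can be summarized as a small decision function. This is only a toy model of the behavior described in this issue, not Comet's actual code; the function name runs_natively and its parameters are hypothetical, and it assumes the session timezone is compared as the literal string "UTC".

```python
def runs_natively(session_tz: str, allow_incompatible: bool = False) -> bool:
    """Toy model of the current behavior described in this issue.

    Hypothetical helper for illustration only; not part of Comet.
    """
    if session_tz == "UTC":
        # Compatible(): runs natively in Comet.
        return True
    # Non-UTC is Incompatible(): it only runs natively when the user opts in
    # (spark.comet.expr.DateFormatClass.allowIncompatible=true), and the
    # results may then differ from Spark.
    return allow_incompatible


print(runs_natively("UTC"))
print(runs_natively("America/Los_Angeles"))
print(runs_natively("America/Los_Angeles", allow_incompatible=True))
```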
Desired Behavior
All timezones should be Compatible() and produce results identical to Spark.
Technical Details
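The current implementation uses DataFusion's to_char function, which formats timestamps without timezone conversion. Spark's date_format, by contrast, applies the session timezone when formatting, so the two can produce different strings for the same timestamp whenever the session timezone is not UTC.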
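To make the mismatch concrete, here is a minimal sketch in plain Python (3.9+, standard library only). It does not call DataFusion or Spark. It formats the same UTC instant once without any conversion and once after converting to a session timezone. The strftime pattern is a stand-in for a Spark format pattern, and the chosen timezone and instant are arbitrary examples.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

# An arbitrary instant expressed in UTC.
instant = datetime(2024, 1, 1, 2, 30, 0, tzinfo=timezone.utc)

# strftime pattern used here as a stand-in for a Spark date_format pattern.
pattern = "%Y-%m-%d %H:%M:%S"

# Formatting without timezone conversion (analogous to the current behavior).
no_conversion = instant.strftime(pattern)

# Formatting after converting to the session timezone
# (analogous to what Spark's date_format does).
session_tz = ZoneInfo("America/Los_Angeles")  # example session timezone
with_conversion = instant.astimezone(session_tz).strftime(pattern)

print(no_conversion)
print(with_conversion)
```

The two printed strings differ in both the date and the hour, which is the kind of discrepancy that currently forces the fallback for non-UTC sessions.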
Possible approaches:
- Convert the timestamp to the target timezone before calling to_char
- Use a timezone-aware formatting function if available in DataFusion
- Implement custom Rust logic to handle timezone conversion
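The first approach (convert, then format) can be sketched as follows. This is illustrative Python rather than the Rust that a real fix would need, and it assumes the timestamp arrives as microseconds since the Unix epoch in UTC. The helper name format_in_timezone is hypothetical, and strftime patterns again stand in for Spark patterns.

```python
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def format_in_timezone(micros: int, tz_name: str, pattern: str) -> str:
    """Hypothetical helper: shift a UTC instant into tz_name, then format it.

    micros is assumed to be microseconds since the Unix epoch (UTC).
    """
    utc_instant = EPOCH + timedelta(microseconds=micros)
    # astimezone applies the zone's offset for that instant, including DST.
    local = utc_instant.astimezone(ZoneInfo(tz_name))
    return local.strftime(pattern)


def to_micros(dt: datetime) -> int:
    """Convert an aware datetime with whole seconds to epoch microseconds."""
    return int(dt.timestamp()) * 1_000_000


summer = to_micros(datetime(2024, 7, 1, 12, 0, tzinfo=timezone.utc))
winter = to_micros(datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc))

print(format_in_timezone(summer, "UTC", "%Y-%m-%d %H:%M"))
print(format_in_timezone(summer, "Asia/Kolkata", "%Y-%m-%d %H:%M"))
print(format_in_timezone(summer, "America/New_York", "%Y-%m-%d %H:%M"))
print(format_in_timezone(winter, "America/New_York", "%Y-%m-%d %H:%M"))
```

The two New York calls show why a fixed offset is not enough: the same wall-clock UTC time maps to a different local hour in summer and winter, so the conversion has to use the zone's rules for each individual timestamp.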
Related
Note: This issue was generated with AI assistance.