From 69b3a25160dd931104947e39afec56a4ad83a9bb Mon Sep 17 00:00:00 2001 From: Andy Grove Date: Thu, 12 Mar 2026 13:23:15 -0600 Subject: [PATCH 1/5] docs: update supported expressions and operators for 0.14.0 release Add 9 missing expressions (StringSplit, Right, MakeDate, NextDay, Size, MapContainsKey, MapFromEntries, StructsToCsv, Crc32), add 3 missing operators (CoalesceExec, CollectLimitExec, TakeOrderedAndProjectExec), remove 5 expressions not actually registered (BRound, TryAdd, TryDivide, TryMultiply, TrySubtract), and clarify decomposed expressions (DatePart, Extract, BoolAnd, BoolOr). --- docs/source/user-guide/latest/expressions.md | 40 +++++++++++--------- docs/source/user-guide/latest/operators.md | 3 ++ 2 files changed, 25 insertions(+), 18 deletions(-) diff --git a/docs/source/user-guide/latest/expressions.md b/docs/source/user-guide/latest/expressions.md index 0339cd2a3e..220bd981f1 100644 --- a/docs/source/user-guide/latest/expressions.md +++ b/docs/source/user-guide/latest/expressions.md @@ -74,6 +74,7 @@ Expressions that are not Spark-compatible will fall back to Spark by default and | Lower | No | Results can vary depending on locale and character set. Requires `spark.comet.caseConversion.enabled=true` | | OctetLength | Yes | | | Reverse | Yes | | +| Right | Yes | Length argument must be a literal value | | RLike | No | Uses Rust regexp engine, which has different behavior to Java regexp engine | | StartsWith | Yes | | | StringInstr | Yes | | @@ -82,6 +83,7 @@ Expressions that are not Spark-compatible will fall back to Spark by default and | StringLPad | Yes | | | StringRPad | Yes | | | StringSpace | Yes | | +| StringSplit | No | Regex engine differences between Java and Rust | | StringTranslate | Yes | | | StringTrim | Yes | | | StringTrimBoth | Yes | | @@ -98,12 +100,14 @@ Expressions that are not Spark-compatible will fall back to Spark by default and | DateDiff | `datediff` | Yes | | | DateFormat | `date_format` | Yes | Partial support. 
Only specific format patterns are supported. | | DateSub | `date_sub` | Yes | | -| DatePart | `date_part(field, source)` | Yes | Supported values of `field`: `year`/`month`/`week`/`day`/`dayofweek`/`dayofweek_iso`/`doy`/`quarter`/`hour`/`minute` | -| Extract | `extract(field FROM source)` | Yes | Supported values of `field`: `year`/`month`/`week`/`day`/`dayofweek`/`dayofweek_iso`/`doy`/`quarter`/`hour`/`minute` | +| DatePart | `date_part(field, source)` | Yes | Spark decomposes into individual functions. Supported values of `field`: `year`/`month`/`week`/`day`/`dayofweek`/`dayofweek_iso`/`doy`/`quarter`/`hour`/`minute` | +| Extract | `extract(field FROM source)` | Yes | Spark decomposes into individual functions. Supported values of `field`: `year`/`month`/`week`/`day`/`dayofweek`/`dayofweek_iso`/`doy`/`quarter`/`hour`/`minute` | | FromUnixTime | `from_unixtime` | No | Does not support format, supports only -8334601211038 <= sec <= 8210266876799 | | Hour | `hour` | Yes | | | LastDay | `last_day` | Yes | | +| MakeDate | `make_date` | Yes | | | Minute | `minute` | Yes | | +| NextDay | `next_day` | Yes | | | Second | `second` | Yes | | | TruncDate | `trunc` | Yes | | | TruncTimestamp | `date_trunc` | Yes | | @@ -128,7 +132,6 @@ Expressions that are not Spark-compatible will fall back to Spark by default and | Asin | `asin` | Yes | | | Atan | `atan` | Yes | | | Atan2 | `atan2` | Yes | | -| BRound | `bround` | Yes | | | Ceil | `ceil` | Yes | | | Cos | `cos` | Yes | | | Cosh | `cosh` | Yes | | @@ -156,10 +159,6 @@ Expressions that are not Spark-compatible will fall back to Spark by default and | Subtract | `-` | Yes | | | Tan | `tan` | Yes | | | Tanh | `tanh` | Yes | | -| TryAdd | `try_add` | Yes | Only integer inputs are supported | -| TryDivide | `try_div` | Yes | Only integer inputs are supported | -| TryMultiply | `try_mul` | Yes | Only integer inputs are supported | -| TrySubtract | `try_sub` | Yes | Only integer inputs are supported | | UnaryMinus | `-` | Yes | | | 
Unhex | `unhex` | Yes | | @@ -167,6 +166,7 @@ Expressions that are not Spark-compatible will fall back to Spark by default and | Expression | Spark-Compatible? | | ----------- | ----------------- | +| Crc32 | Yes | | Md5 | Yes | | Murmur3Hash | Yes | | Sha1 | Yes | @@ -194,8 +194,8 @@ Expressions that are not Spark-compatible will fall back to Spark by default and | BitAndAgg | | Yes | | | BitOrAgg | | Yes | | | BitXorAgg | | Yes | | -| BoolAnd | `bool_and` | Yes | | -| BoolOr | `bool_or` | Yes | | +| BoolAnd | `bool_and` | Yes | Spark decomposes to Min/Max on boolean columns | +| BoolOr | `bool_or` | Yes | Spark decomposes to Min/Max on boolean columns | | Corr | | Yes | | | Count | | Yes | | | CovPopulation | | Yes | | @@ -250,16 +250,19 @@ Comet supports using the following aggregate functions within window contexts wi | ElementAt | Yes | Input must be an array. Map inputs are not supported. | | Flatten | Yes | | | GetArrayItem | Yes | | +| Size | Yes | Only array inputs are supported. Map inputs are not supported. | ## Map Expressions -| Expression | Spark-Compatible? | -| ------------- | ----------------- | -| GetMapValue | Yes | -| MapKeys | Yes | -| MapEntries | Yes | -| MapValues | Yes | -| MapFromArrays | Yes | +| Expression | Spark-Compatible? | Compatibility Notes | +| -------------- | ----------------- | ------------------------------------------- | +| GetMapValue | Yes | | +| MapContainsKey | Yes | | +| MapEntries | Yes | | +| MapFromArrays | Yes | | +| MapFromEntries | No | Binary key or value types are not supported | +| MapKeys | Yes | | +| MapValues | Yes | | ## Struct Expressions @@ -268,8 +271,9 @@ Comet supports using the following aggregate functions within window contexts wi | CreateNamedStruct | Yes | | | GetArrayStructFields | Yes | | | GetStructField | Yes | | -| JsonToStructs | No | Partial support. Requires explicit schema. | -| StructsToJson | Yes | | +| JsonToStructs | No | Partial support. Requires explicit schema. 
| +| StructsToCsv | No | Complex, Date, Timestamp, and Binary types may produce differences | +| StructsToJson | Yes | | ## Conversion Expressions diff --git a/docs/source/user-guide/latest/operators.md b/docs/source/user-guide/latest/operators.md index 77ba84e4f7..753631fe03 100644 --- a/docs/source/user-guide/latest/operators.md +++ b/docs/source/user-guide/latest/operators.md @@ -27,6 +27,8 @@ not supported by Comet will fall back to regular Spark execution. | BatchScanExec | Yes | Supports Parquet files and Apache Iceberg Parquet scans. See the [Comet Compatibility Guide] for more information. | | BroadcastExchangeExec | Yes | | | BroadcastHashJoinExec | Yes | | +| CoalesceExec | Yes | | +| CollectLimitExec | Yes | | | ExpandExec | Yes | | | FileSourceScanExec | Yes | Supports Parquet files. See the [Comet Compatibility Guide] for more information. | | FilterExec | Yes | | @@ -42,6 +44,7 @@ not supported by Comet will fall back to regular Spark execution. | ShuffledHashJoinExec | Yes | | | SortExec | Yes | | | SortMergeJoinExec | Yes | | +| TakeOrderedAndProjectExec | Yes | | | UnionExec | Yes | | | WindowExec | No | Disabled by default due to known correctness issues. | From cc625bf8b9d405ccb425f30989235a599279d053 Mon Sep 17 00:00:00 2001 From: Andy Grove Date: Thu, 12 Mar 2026 13:27:19 -0600 Subject: [PATCH 2/5] style: run prettier on expressions.md --- docs/source/user-guide/latest/expressions.md | 134 +++++++++---------- 1 file changed, 67 insertions(+), 67 deletions(-) diff --git a/docs/source/user-guide/latest/expressions.md b/docs/source/user-guide/latest/expressions.md index 220bd981f1..93d78a02ed 100644 --- a/docs/source/user-guide/latest/expressions.md +++ b/docs/source/user-guide/latest/expressions.md @@ -94,73 +94,73 @@ Expressions that are not Spark-compatible will fall back to Spark by default and ## Date/Time Functions -| Expression | SQL | Spark-Compatible? 
| Compatibility Notes | -| -------------- | ---------------------------- | ----------------- | -------------------------------------------------------------------------------------------------------------------- | -| DateAdd | `date_add` | Yes | | -| DateDiff | `datediff` | Yes | | -| DateFormat | `date_format` | Yes | Partial support. Only specific format patterns are supported. | -| DateSub | `date_sub` | Yes | | +| Expression | SQL | Spark-Compatible? | Compatibility Notes | +| -------------- | ---------------------------- | ----------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------- | +| DateAdd | `date_add` | Yes | | +| DateDiff | `datediff` | Yes | | +| DateFormat | `date_format` | Yes | Partial support. Only specific format patterns are supported. | +| DateSub | `date_sub` | Yes | | | DatePart | `date_part(field, source)` | Yes | Spark decomposes into individual functions. Supported values of `field`: `year`/`month`/`week`/`day`/`dayofweek`/`dayofweek_iso`/`doy`/`quarter`/`hour`/`minute` | | Extract | `extract(field FROM source)` | Yes | Spark decomposes into individual functions. 
Supported values of `field`: `year`/`month`/`week`/`day`/`dayofweek`/`dayofweek_iso`/`doy`/`quarter`/`hour`/`minute` | -| FromUnixTime | `from_unixtime` | No | Does not support format, supports only -8334601211038 <= sec <= 8210266876799 | -| Hour | `hour` | Yes | | -| LastDay | `last_day` | Yes | | -| MakeDate | `make_date` | Yes | | -| Minute | `minute` | Yes | | -| NextDay | `next_day` | Yes | | -| Second | `second` | Yes | | -| TruncDate | `trunc` | Yes | | -| TruncTimestamp | `date_trunc` | Yes | | -| UnixDate | `unix_date` | Yes | | -| UnixTimestamp | `unix_timestamp` | Yes | | -| Year | `year` | Yes | | -| Month | `month` | Yes | | -| DayOfMonth | `day`/`dayofmonth` | Yes | | -| DayOfWeek | `dayofweek` | Yes | | -| WeekDay | `weekday` | Yes | | -| DayOfYear | `dayofyear` | Yes | | -| WeekOfYear | `weekofyear` | Yes | | -| Quarter | `quarter` | Yes | | +| FromUnixTime | `from_unixtime` | No | Does not support format, supports only -8334601211038 <= sec <= 8210266876799 | +| Hour | `hour` | Yes | | +| LastDay | `last_day` | Yes | | +| MakeDate | `make_date` | Yes | | +| Minute | `minute` | Yes | | +| NextDay | `next_day` | Yes | | +| Second | `second` | Yes | | +| TruncDate | `trunc` | Yes | | +| TruncTimestamp | `date_trunc` | Yes | | +| UnixDate | `unix_date` | Yes | | +| UnixTimestamp | `unix_timestamp` | Yes | | +| Year | `year` | Yes | | +| Month | `month` | Yes | | +| DayOfMonth | `day`/`dayofmonth` | Yes | | +| DayOfWeek | `dayofweek` | Yes | | +| WeekDay | `weekday` | Yes | | +| DayOfYear | `dayofyear` | Yes | | +| WeekOfYear | `weekofyear` | Yes | | +| Quarter | `quarter` | Yes | | ## Math Expressions -| Expression | SQL | Spark-Compatible? 
| Compatibility Notes | -| -------------- | --------- | ----------------- | --------------------------------- | -| Abs | `abs` | Yes | | -| Acos | `acos` | Yes | | -| Add | `+` | Yes | | -| Asin | `asin` | Yes | | -| Atan | `atan` | Yes | | -| Atan2 | `atan2` | Yes | | -| Ceil | `ceil` | Yes | | -| Cos | `cos` | Yes | | -| Cosh | `cosh` | Yes | | -| Cot | `cot` | Yes | | -| Divide | `/` | Yes | | -| Exp | `exp` | Yes | | -| Expm1 | `expm1` | Yes | | -| Floor | `floor` | Yes | | -| Hex | `hex` | Yes | | -| IntegralDivide | `div` | Yes | | -| IsNaN | `isnan` | Yes | | -| Log | `log` | Yes | | -| Log2 | `log2` | Yes | | -| Log10 | `log10` | Yes | | -| Multiply | `*` | Yes | | -| Pow | `power` | Yes | | -| Rand | `rand` | Yes | | -| Randn | `randn` | Yes | | -| Remainder | `%` | Yes | | -| Round | `round` | Yes | | -| Signum | `signum` | Yes | | -| Sin | `sin` | Yes | | -| Sinh | `sinh` | Yes | | -| Sqrt | `sqrt` | Yes | | -| Subtract | `-` | Yes | | -| Tan | `tan` | Yes | | -| Tanh | `tanh` | Yes | | -| UnaryMinus | `-` | Yes | | -| Unhex | `unhex` | Yes | | +| Expression | SQL | Spark-Compatible? 
| Compatibility Notes | +| -------------- | -------- | ----------------- | ------------------- | +| Abs | `abs` | Yes | | +| Acos | `acos` | Yes | | +| Add | `+` | Yes | | +| Asin | `asin` | Yes | | +| Atan | `atan` | Yes | | +| Atan2 | `atan2` | Yes | | +| Ceil | `ceil` | Yes | | +| Cos | `cos` | Yes | | +| Cosh | `cosh` | Yes | | +| Cot | `cot` | Yes | | +| Divide | `/` | Yes | | +| Exp | `exp` | Yes | | +| Expm1 | `expm1` | Yes | | +| Floor | `floor` | Yes | | +| Hex | `hex` | Yes | | +| IntegralDivide | `div` | Yes | | +| IsNaN | `isnan` | Yes | | +| Log | `log` | Yes | | +| Log2 | `log2` | Yes | | +| Log10 | `log10` | Yes | | +| Multiply | `*` | Yes | | +| Pow | `power` | Yes | | +| Rand | `rand` | Yes | | +| Randn | `randn` | Yes | | +| Remainder | `%` | Yes | | +| Round | `round` | Yes | | +| Signum | `signum` | Yes | | +| Sin | `sin` | Yes | | +| Sinh | `sinh` | Yes | | +| Sqrt | `sqrt` | Yes | | +| Subtract | `-` | Yes | | +| Tan | `tan` | Yes | | +| Tanh | `tanh` | Yes | | +| UnaryMinus | `-` | Yes | | +| Unhex | `unhex` | Yes | | ## Hashing Functions @@ -266,11 +266,11 @@ Comet supports using the following aggregate functions within window contexts wi ## Struct Expressions -| Expression | Spark-Compatible? | Compatibility Notes | -| -------------------- | ----------------- | ------------------------------------------ | -| CreateNamedStruct | Yes | | -| GetArrayStructFields | Yes | | -| GetStructField | Yes | | +| Expression | Spark-Compatible? | Compatibility Notes | +| -------------------- | ----------------- | ------------------------------------------------------------------ | +| CreateNamedStruct | Yes | | +| GetArrayStructFields | Yes | | +| GetStructField | Yes | | | JsonToStructs | No | Partial support. Requires explicit schema. 
|
| StructsToCsv | No | Complex, Date, Timestamp, and Binary types may produce differences |
| StructsToJson | Yes | |

From ccdd7de5998e4d1d795dfcc8e0da0dc6c5ce0c29 Mon Sep 17 00:00:00 2001
From: Andy Grove
Date: Thu, 12 Mar 2026 17:05:14 -0600
Subject: [PATCH 3/5] docs: remove Spark-Compatible column from expressions page

Fold compatibility info into the Notes column instead. Expressions are
Spark-compatible unless the notes say otherwise.
---
 docs/source/user-guide/latest/expressions.md | 473 +++++++++----------
 1 file changed, 235 insertions(+), 238 deletions(-)

diff --git a/docs/source/user-guide/latest/expressions.md b/docs/source/user-guide/latest/expressions.md
index dbc544a7ad..ebdf33e3f2 100644
--- a/docs/source/user-guide/latest/expressions.md
+++ b/docs/source/user-guide/latest/expressions.md
@@ -19,201 +19,198 @@

 # Supported Spark Expressions

-Comet supports the following Spark expressions. Expressions that are marked as Spark-compatible will either run
-natively in Comet and provide the same results as Spark, or will fall back to Spark for cases that would not
-be compatible.
+Comet supports the following Spark expressions. Unless noted otherwise, all expressions are Spark-compatible and will
+produce the same results as Spark. For expressions with known incompatibilities, Comet will fall back to Spark by
+default. These can be forced to run natively by setting `spark.comet.expression.EXPRNAME.allowIncompatible=true`.

 All expressions are enabled by default, but most can be disabled by setting
 `spark.comet.expression.EXPRNAME.enabled=false`, where `EXPRNAME` is the expression name as specified in the
 following tables, such as `Length` or `StartsWith`. See the [Comet Configuration Guide] for a full list of
 expressions that can be disabled.

-Expressions that are not Spark-compatible will fall back to Spark by default and can be enabled by setting
-`spark.comet.expression.EXPRNAME.allowIncompatible=true`.
- ## Conditional Expressions -| Expression | SQL | Spark-Compatible? | -| ---------- | ------------------------------------------- | ----------------- | -| CaseWhen | `CASE WHEN expr THEN expr ELSE expr END` | Yes | -| If | `IF(predicate_expr, true_expr, false_expr)` | Yes | +| Expression | SQL | +| ---------- | ------------------------------------------- | +| CaseWhen | `CASE WHEN expr THEN expr ELSE expr END` | +| If | `IF(predicate_expr, true_expr, false_expr)` | ## Predicate Expressions -| Expression | SQL | Spark-Compatible? | -| ------------------ | ------------- | ----------------- | -| And | `AND` | Yes | -| EqualTo | `=` | Yes | -| EqualNullSafe | `<=>` | Yes | -| GreaterThan | `>` | Yes | -| GreaterThanOrEqual | `>=` | Yes | -| LessThan | `<` | Yes | -| LessThanOrEqual | `<=` | Yes | -| In | `IN` | Yes | -| IsNotNull | `IS NOT NULL` | Yes | -| IsNull | `IS NULL` | Yes | -| InSet | `IN (...)` | Yes | -| Not | `NOT` | Yes | -| Or | `OR` | Yes | +| Expression | SQL | +| ------------------ | ------------- | +| And | `AND` | +| EqualTo | `=` | +| EqualNullSafe | `<=>` | +| GreaterThan | `>` | +| GreaterThanOrEqual | `>=` | +| LessThan | `<` | +| LessThanOrEqual | `<=` | +| In | `IN` | +| IsNotNull | `IS NOT NULL` | +| IsNull | `IS NULL` | +| InSet | `IN (...)` | +| Not | `NOT` | +| Or | `OR` | ## String Functions -| Expression | Spark-Compatible? | Compatibility Notes | -| --------------- | ----------------- | ---------------------------------------------------------------------------------------------------------- | -| Ascii | Yes | | -| BitLength | Yes | | -| Chr | Yes | | -| Concat | Yes | Only string inputs are supported | -| ConcatWs | Yes | | -| Contains | Yes | | -| EndsWith | Yes | | -| InitCap | No | Behavior is different in some cases, such as hyphenated names. | -| Left | Yes | Length argument must be a literal value | -| Length | Yes | | -| Like | Yes | | -| Lower | No | Results can vary depending on locale and character set. 
Requires `spark.comet.caseConversion.enabled=true` | -| OctetLength | Yes | | -| Reverse | Yes | | -| Right | Yes | Length argument must be a literal value | -| RLike | No | Uses Rust regexp engine, which has different behavior to Java regexp engine | -| StartsWith | Yes | | -| StringInstr | Yes | | -| StringRepeat | Yes | Negative argument for number of times to repeat causes exception | -| StringReplace | Yes | | -| StringLPad | Yes | | -| StringRPad | Yes | | -| StringSpace | Yes | | -| StringSplit | No | Regex engine differences between Java and Rust | -| StringTranslate | Yes | | -| StringTrim | Yes | | -| StringTrimBoth | Yes | | -| StringTrimLeft | Yes | | -| StringTrimRight | Yes | | -| Substring | Yes | | -| Upper | No | Results can vary depending on locale and character set. Requires `spark.comet.caseConversion.enabled=true` | +| Expression | Notes | +| --------------- | -------------------------------------------------------------------------------------------------------------------------------- | +| Ascii | | +| BitLength | | +| Chr | | +| Concat | Only string inputs are supported | +| ConcatWs | | +| Contains | | +| EndsWith | | +| InitCap | Not Spark compatible. Behavior is different in some cases, such as hyphenated names. | +| Left | Length argument must be a literal value | +| Length | | +| Like | | +| Lower | Not Spark compatible. Results can vary depending on locale and character set. Requires `spark.comet.caseConversion.enabled=true` | +| OctetLength | | +| Reverse | | +| Right | Length argument must be a literal value | +| RLike | Not Spark compatible. Uses Rust regexp engine, which has different behavior to Java regexp engine | +| StartsWith | | +| StringInstr | | +| StringRepeat | Negative argument for number of times to repeat causes exception | +| StringReplace | | +| StringLPad | | +| StringRPad | | +| StringSpace | | +| StringSplit | Not Spark compatible. 
Regex engine differences between Java and Rust | +| StringTranslate | | +| StringTrim | | +| StringTrimBoth | | +| StringTrimLeft | | +| StringTrimRight | | +| Substring | | +| Upper | Not Spark compatible. Results can vary depending on locale and character set. Requires `spark.comet.caseConversion.enabled=true` | ## Date/Time Functions -| Expression | SQL | Spark-Compatible? | Compatibility Notes | -| -------------- | ---------------------------- | ----------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------- | -| DateAdd | `date_add` | Yes | | -| DateDiff | `datediff` | Yes | | -| DateFormat | `date_format` | Yes | Partial support. Only specific format patterns are supported. | -| DateSub | `date_sub` | Yes | | -| DatePart | `date_part(field, source)` | Yes | Spark decomposes into individual functions. Supported values of `field`: `year`/`month`/`week`/`day`/`dayofweek`/`dayofweek_iso`/`doy`/`quarter`/`hour`/`minute` | -| Extract | `extract(field FROM source)` | Yes | Spark decomposes into individual functions. 
Supported values of `field`: `year`/`month`/`week`/`day`/`dayofweek`/`dayofweek_iso`/`doy`/`quarter`/`hour`/`minute` | -| FromUnixTime | `from_unixtime` | No | Does not support format, supports only -8334601211038 <= sec <= 8210266876799 | -| Hour | `hour` | Yes | | -| LastDay | `last_day` | Yes | | -| MakeDate | `make_date` | Yes | | -| Minute | `minute` | Yes | | -| NextDay | `next_day` | Yes | | -| Second | `second` | Yes | | -| TruncDate | `trunc` | Yes | | -| TruncTimestamp | `date_trunc` | Yes | | -| UnixDate | `unix_date` | Yes | | -| UnixTimestamp | `unix_timestamp` | Yes | | -| Year | `year` | Yes | | -| Month | `month` | Yes | | -| DayOfMonth | `day`/`dayofmonth` | Yes | | -| DayOfWeek | `dayofweek` | Yes | | -| WeekDay | `weekday` | Yes | | -| DayOfYear | `dayofyear` | Yes | | -| WeekOfYear | `weekofyear` | Yes | | -| Quarter | `quarter` | Yes | | +| Expression | SQL | Notes | +| -------------- | ---------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------- | +| DateAdd | `date_add` | | +| DateDiff | `datediff` | | +| DateFormat | `date_format` | Partial support. Only specific format patterns are supported. | +| DateSub | `date_sub` | | +| DatePart | `date_part(field, source)` | Spark decomposes into individual functions. Supported values of `field`: `year`/`month`/`week`/`day`/`dayofweek`/`dayofweek_iso`/`doy`/`quarter`/`hour`/`minute` | +| Extract | `extract(field FROM source)` | Spark decomposes into individual functions. Supported values of `field`: `year`/`month`/`week`/`day`/`dayofweek`/`dayofweek_iso`/`doy`/`quarter`/`hour`/`minute` | +| FromUnixTime | `from_unixtime` | Not Spark compatible. 
Does not support format, supports only -8334601211038 <= sec <= 8210266876799 |
+| Hour | `hour` | |
+| LastDay | `last_day` | |
+| MakeDate | `make_date` | |
+| Minute | `minute` | |
+| NextDay | `next_day` | |
+| Second | `second` | |
+| TruncDate | `trunc` | |
+| TruncTimestamp | `date_trunc` | |
+| UnixDate | `unix_date` | |
+| UnixTimestamp | `unix_timestamp` | |
+| Year | `year` | |
+| Month | `month` | |
+| DayOfMonth | `day`/`dayofmonth` | |
+| DayOfWeek | `dayofweek` | |
+| WeekDay | `weekday` | |
+| DayOfYear | `dayofyear` | |
+| WeekOfYear | `weekofyear` | |
+| Quarter | `quarter` | |

 ## Math Expressions

-| Expression | SQL | Spark-Compatible? | Compatibility Notes |
-| -------------- | -------- | ----------------- | ------------------- |
-| Abs | `abs` | Yes | |
-| Acos | `acos` | Yes | |
-| Add | `+` | Yes | |
-| Asin | `asin` | Yes | |
-| Atan | `atan` | Yes | |
-| Atan2 | `atan2` | Yes | |
-| Ceil | `ceil` | Yes | |
-| Cos | `cos` | Yes | |
-| Cosh | `cosh` | Yes | |
-| Cot | `cot` | Yes | |
-| Divide | `/` | Yes | |
-| Exp | `exp` | Yes | |
-| Expm1 | `expm1` | Yes | |
-| Floor | `floor` | Yes | |
-| Hex | `hex` | Yes | |
-| IntegralDivide | `div` | Yes | |
-| IsNaN | `isnan` | Yes | |
-| Log | `log` | Yes | |
-| Log2 | `log2` | Yes | |
-| Log10 | `log10` | Yes | |
-| Multiply | `*` | Yes | |
-| Pow | `power` | Yes | |
-| Rand | `rand` | Yes | |
-| Randn | `randn` | Yes | |
-| Remainder | `%` | Yes | |
-| Round | `round` | Yes | |
-| Signum | `signum` | Yes | |
-| Sin | `sin` | Yes | |
-| Sinh | `sinh` | Yes | |
-| Sqrt | `sqrt` | Yes | |
-| Subtract | `-` | Yes | |
-| Tan | `tan` | Yes | |
-| Tanh | `tanh` | Yes | |
-| UnaryMinus | `-` | Yes | |
-| Unhex | `unhex` | Yes | |
+| Expression | SQL | Notes |
+| -------------- | -------- | ----- |
+| Abs | `abs` | |
+| Acos | `acos` | |
+| Add | `+` | |
+| Asin | `asin` | |
+| Atan | `atan` | |
+| Atan2 | `atan2` | |
+| Ceil | `ceil` | |
+| Cos | `cos` | |
+| Cosh | `cosh` | |
+| Cot | `cot` | |
+| Divide | `/` | |
+| Exp | `exp` | |
+| Expm1 | `expm1` | |
+| Floor | `floor` | |
+| Hex | `hex` | |
+| IntegralDivide | `div` | |
+| IsNaN | `isnan` | |
+| Log | `log` | |
+| Log2 | `log2` | |
+| Log10 | `log10` | |
+| Multiply | `*` | |
+| Pow | `power` | |
+| Rand | `rand` | |
+| Randn | `randn` | |
+| Remainder | `%` | |
+| Round | `round` | |
+| Signum | `signum` | |
+| Sin | `sin` | |
+| Sinh | `sinh` | |
+| Sqrt | `sqrt` | |
+| Subtract | `-` | |
+| Tan | `tan` | |
+| Tanh | `tanh` | |
+| UnaryMinus | `-` | |
+| Unhex | `unhex` | |

 ## Hashing Functions

-| Expression | Spark-Compatible? |
-| ----------- | ----------------- |
-| Crc32 | Yes |
-| Md5 | Yes |
-| Murmur3Hash | Yes |
-| Sha1 | Yes |
-| Sha2 | Yes |
-| XxHash64 | Yes |
+| Expression |
+| ----------- |
+| Crc32 |
+| Md5 |
+| Murmur3Hash |
+| Sha1 |
+| Sha2 |
+| XxHash64 |

 ## Bitwise Expressions

-| Expression | SQL | Spark-Compatible? 
| -| ------------ | ---- | ----------------- | -| BitwiseAnd | `&` | Yes | -| BitwiseCount | | Yes | -| BitwiseGet | | Yes | -| BitwiseOr | `\|` | Yes | -| BitwiseNot | `~` | Yes | -| BitwiseXor | `^` | Yes | -| ShiftLeft | `<<` | Yes | -| ShiftRight | `>>` | Yes | +| Expression | SQL | +| ------------ | ---- | +| BitwiseAnd | `&` | +| BitwiseCount | | +| BitwiseGet | | +| BitwiseOr | `\|` | +| BitwiseNot | `~` | +| BitwiseXor | `^` | +| ShiftLeft | `<<` | +| ShiftRight | `>>` | ## Aggregate Expressions -| Expression | SQL | Spark-Compatible? | Compatibility Notes | -| ------------- | ---------- | ------------------------- | ---------------------------------------------------------------- | -| Average | | Yes, except for ANSI mode | | -| BitAndAgg | | Yes | | -| BitOrAgg | | Yes | | -| BitXorAgg | | Yes | | -| BoolAnd | `bool_and` | Yes | Spark decomposes to Min/Max on boolean columns | -| BoolOr | `bool_or` | Yes | Spark decomposes to Min/Max on boolean columns | -| Corr | | Yes | | -| Count | | Yes | | -| CovPopulation | | Yes | | -| CovSample | | Yes | | -| First | | No | This function is not deterministic. Results may not match Spark. | -| Last | | No | This function is not deterministic. Results may not match Spark. | -| Max | | Yes | | -| Min | | Yes | | -| StddevPop | | Yes | | -| StddevSamp | | Yes | | -| Sum | | Yes, except for ANSI mode | | -| VariancePop | | Yes | | -| VarianceSamp | | Yes | | +| Expression | SQL | Notes | +| ------------- | ---------- | -------------------------------------------------------------------------------------- | +| Average | | Not supported in ANSI mode | +| BitAndAgg | | | +| BitOrAgg | | | +| BitXorAgg | | | +| BoolAnd | `bool_and` | Spark decomposes to Min/Max on boolean columns | +| BoolOr | `bool_or` | Spark decomposes to Min/Max on boolean columns | +| Corr | | | +| Count | | | +| CovPopulation | | | +| CovSample | | | +| First | | Not Spark compatible. This function is not deterministic. Results may not match Spark. 
| +| Last | | Not Spark compatible. This function is not deterministic. Results may not match Spark. | +| Max | | | +| Min | | | +| StddevPop | | | +| StddevSamp | | | +| Sum | | Not supported in ANSI mode | +| VariancePop | | | +| VarianceSamp | | | ## Window Functions @@ -223,98 +220,98 @@ Window support is disabled by default due to known correctness issues. Tracking Comet supports using the following aggregate functions within window contexts with PARTITION BY and ORDER BY clauses. -| Expression | Spark-Compatible? | Compatibility Notes | -| ---------- | ----------------- | ------------------- | -| Count | Yes | | -| Max | Yes | | -| Min | Yes | | -| Sum | Yes | | +| Expression | +| ---------- | +| Count | +| Max | +| Min | +| Sum | **Note:** Dedicated window functions such as `rank`, `dense_rank`, `row_number`, `lag`, `lead`, `ntile`, `cume_dist`, `percent_rank`, and `nth_value` are not currently supported and will fall back to Spark. ## Array Expressions -| Expression | Spark-Compatible? | Compatibility Notes | -| -------------- | ----------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | -| ArrayAppend | No | | -| ArrayCompact | No | | -| ArrayContains | No | Returns null instead of false for empty arrays with literal values ([#3346](https://github.com/apache/datafusion-comet/issues/3346)) | -| ArrayDistinct | No | Behaves differently than spark. Comet first sorts then removes duplicates while Spark preserves the original order. 
| -| ArrayExcept | No | | -| ArrayFilter | Yes | Only supports case where function is `IsNotNull` | -| ArrayInsert | No | | -| ArrayIntersect | No | | -| ArrayJoin | No | | -| ArrayMax | Yes | | -| ArrayMin | Yes | | -| ArrayRemove | No | Returns null when element is null instead of removing null elements ([#3173](https://github.com/apache/datafusion-comet/issues/3173)) | -| ArrayRepeat | No | | -| ArrayUnion | No | Behaves differently than spark. Comet sorts the input arrays before performing the union, while Spark preserves the order of the first array and appends unique elements from the second. | -| ArraysOverlap | No | | -| CreateArray | Yes | | -| ElementAt | Yes | Input must be an array. Map inputs are not supported. | -| Flatten | Yes | | -| GetArrayItem | Yes | | -| Size | Yes | Only array inputs are supported. Map inputs are not supported. | +| Expression | Notes | +| -------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | +| ArrayAppend | Not Spark compatible | +| ArrayCompact | Not Spark compatible | +| ArrayContains | Not Spark compatible. Returns null instead of false for empty arrays with literal values ([#3346](https://github.com/apache/datafusion-comet/issues/3346)) | +| ArrayDistinct | Not Spark compatible. Comet first sorts then removes duplicates while Spark preserves the original order. | +| ArrayExcept | Not Spark compatible | +| ArrayFilter | Only supports case where function is `IsNotNull` | +| ArrayInsert | Not Spark compatible | +| ArrayIntersect | Not Spark compatible | +| ArrayJoin | Not Spark compatible | +| ArrayMax | | +| ArrayMin | | +| ArrayRemove | Not Spark compatible. Returns null when element is null instead of removing null elements ([#3173](https://github.com/apache/datafusion-comet/issues/3173)) | +| ArrayRepeat | Not Spark compatible | +| ArrayUnion | Not Spark compatible. 
Comet sorts the input arrays before performing the union, while Spark preserves the order of the first array and appends unique elements from the second. | +| ArraysOverlap | Not Spark compatible | +| CreateArray | | +| ElementAt | Input must be an array. Map inputs are not supported. | +| Flatten | | +| GetArrayItem | | +| Size | Only array inputs are supported. Map inputs are not supported. | ## Map Expressions -| Expression | Spark-Compatible? | Compatibility Notes | -| -------------- | ----------------- | ------------------------------------------- | -| GetMapValue | Yes | | -| MapContainsKey | Yes | | -| MapEntries | Yes | | -| MapFromArrays | Yes | | -| MapFromEntries | No | Binary key or value types are not supported | -| MapKeys | Yes | | -| MapValues | Yes | | +| Expression | Notes | +| -------------- | ----------------------------------------------------------------- | +| GetMapValue | | +| MapContainsKey | | +| MapEntries | | +| MapFromArrays | | +| MapFromEntries | Not Spark compatible. Binary key or value types are not supported | +| MapKeys | | +| MapValues | | ## Struct Expressions -| Expression | Spark-Compatible? | Compatibility Notes | -| -------------------- | ----------------- | ------------------------------------------------------------------ | -| CreateNamedStruct | Yes | | -| GetArrayStructFields | Yes | | -| GetStructField | Yes | | -| JsonToStructs | No | Partial support. Requires explicit schema. | -| StructsToCsv | No | Complex, Date, Timestamp, and Binary types may produce differences | -| StructsToJson | Yes | | +| Expression | Notes | +| -------------------- | ---------------------------------------------------------------------------------------- | +| CreateNamedStruct | | +| GetArrayStructFields | | +| GetStructField | | +| JsonToStructs | Not Spark compatible. Partial support. Requires explicit schema. | +| StructsToCsv | Not Spark compatible. 
Complex, Date, Timestamp, and Binary types may produce differences |
+| StructsToJson | |

## Conversion Expressions

-| Expression | Spark-Compatible | Compatibility Notes |
-| ---------- | ------------------------ | ------------------------------------------------------------------------------------------- |
-| Cast | Depends on specific cast | See the [Comet Compatibility Guide] for list of supported cast expressions and known issues |
+| Expression | Notes |
+| ---------- | --------------------------------------------------------------------------------------------------------------------- |
+| Cast | Depends on the specific cast. See the [Comet Compatibility Guide] for a list of supported cast expressions and known issues |

## SortOrder

-| Expression | Spark-Compatible? | Compatibility Notes |
-| ---------- | ----------------- | ------------------- |
-| NullsFirst | Yes | |
-| NullsLast | Yes | |
-| Ascending | Yes | |
-| Descending | Yes | |
+| Expression |
+| ---------- |
+| NullsFirst |
+| NullsLast |
+| Ascending |
+| Descending |

## Other

-| Expression | Spark-Compatible? 
| Compatibility Notes | -| ---------------------------- | ----------------- | --------------------------------------------------------------------------- | -| Alias | Yes | | -| AttributeReference | Yes | | -| BloomFilterMightContain | Yes | | -| Coalesce | Yes | | -| CheckOverflow | Yes | | -| KnownFloatingPointNormalized | Yes | | -| Literal | Yes | | -| MakeDecimal | Yes | | -| MonotonicallyIncreasingID | Yes | | -| NormalizeNaNAndZero | Yes | | -| PromotePrecision | Yes | | -| RegExpReplace | No | Uses Rust regexp engine, which has different behavior to Java regexp engine | -| ScalarSubquery | Yes | | -| SparkPartitionID | Yes | | -| ToPrettyString | Yes | | -| UnscaledValue | Yes | | +| Expression | Notes | +| ---------------------------- | ------------------------------------------------------------------------------------------------- | +| Alias | | +| AttributeReference | | +| BloomFilterMightContain | | +| Coalesce | | +| CheckOverflow | | +| KnownFloatingPointNormalized | | +| Literal | | +| MakeDecimal | | +| MonotonicallyIncreasingID | | +| NormalizeNaNAndZero | | +| PromotePrecision | | +| RegExpReplace | Not Spark compatible. 
Uses the Rust regexp engine, which behaves differently from the Java regexp engine |
+| ScalarSubquery | |
+| SparkPartitionID | |
+| ToPrettyString | |
+| UnscaledValue | |

[Comet Configuration Guide]: configs.md
[Comet Compatibility Guide]: compatibility.md

From bebd33e76deb0f5e95f60bb3389c3be6ad521a7a Mon Sep 17 00:00:00 2001
From: Andy Grove
Date: Thu, 12 Mar 2026 17:05:58 -0600
Subject: [PATCH 4/5] docs: remove Spark-Compatible column from operators page

---
 docs/source/user-guide/latest/operators.md | 50 +++++++++++-----------
 1 file changed, 25 insertions(+), 25 deletions(-)

diff --git a/docs/source/user-guide/latest/operators.md b/docs/source/user-guide/latest/operators.md
index 753631fe03..12ec4821ae 100644
--- a/docs/source/user-guide/latest/operators.md
+++ b/docs/source/user-guide/latest/operators.md
@@ -22,30 +22,30 @@ The following Spark operators are currently replaced with native versions. Query
stages that contain any operators not supported by Comet will fall back to
regular Spark execution.

-| Operator | Spark-Compatible? | Compatibility Notes |
-| --------------------------------- | ----------------- | ------------------------------------------------------------------------------------------------------------------ |
-| BatchScanExec | Yes | Supports Parquet files and Apache Iceberg Parquet scans. See the [Comet Compatibility Guide] for more information. |
-| BroadcastExchangeExec | Yes | |
-| BroadcastHashJoinExec | Yes | |
-| CoalesceExec | Yes | |
-| CollectLimitExec | Yes | |
-| ExpandExec | Yes | |
-| FileSourceScanExec | Yes | Supports Parquet files. See the [Comet Compatibility Guide] for more information. |
-| FilterExec | Yes | |
-| GenerateExec | Yes | Supports `explode` generator only. |
-| GlobalLimitExec | Yes | |
-| HashAggregateExec | Yes | |
-| InsertIntoHadoopFsRelationCommand | No | Experimental support for native Parquet writes. Disabled by default. 
| -| LocalLimitExec | Yes | | -| LocalTableScanExec | No | Experimental and disabled by default. | -| ObjectHashAggregateExec | Yes | Supports a limited number of aggregates, such as `bloom_filter_agg`. | -| ProjectExec | Yes | | -| ShuffleExchangeExec | Yes | | -| ShuffledHashJoinExec | Yes | | -| SortExec | Yes | | -| SortMergeJoinExec | Yes | | -| TakeOrderedAndProjectExec | Yes | | -| UnionExec | Yes | | -| WindowExec | No | Disabled by default due to known correctness issues. | +| Operator | Notes | +| --------------------------------- | ------------------------------------------------------------------------------------------------------------------ | +| BatchScanExec | Supports Parquet files and Apache Iceberg Parquet scans. See the [Comet Compatibility Guide] for more information. | +| BroadcastExchangeExec | | +| BroadcastHashJoinExec | | +| CoalesceExec | | +| CollectLimitExec | | +| ExpandExec | | +| FileSourceScanExec | Supports Parquet files. See the [Comet Compatibility Guide] for more information. | +| FilterExec | | +| GenerateExec | Supports `explode` generator only. | +| GlobalLimitExec | | +| HashAggregateExec | | +| InsertIntoHadoopFsRelationCommand | Not Spark compatible. Experimental support for native Parquet writes. Disabled by default. | +| LocalLimitExec | | +| LocalTableScanExec | Not Spark compatible. Experimental and disabled by default. | +| ObjectHashAggregateExec | Supports a limited number of aggregates, such as `bloom_filter_agg`. | +| ProjectExec | | +| ShuffleExchangeExec | | +| ShuffledHashJoinExec | | +| SortExec | | +| SortMergeJoinExec | | +| TakeOrderedAndProjectExec | | +| UnionExec | | +| WindowExec | Not Spark compatible. Disabled by default due to known correctness issues. 
| [Comet Compatibility Guide]: compatibility.md From f4a189061ba16428a77831d2cbb8aa6cd47999be Mon Sep 17 00:00:00 2001 From: Andy Grove Date: Thu, 12 Mar 2026 17:13:06 -0600 Subject: [PATCH 5/5] docs: update version references for 0.14.0 release --- docs/generate-versions.py | 4 ++-- docs/source/user-guide/latest/iceberg.md | 6 +++--- 2 files changed, 5 insertions(+), 5 deletions(-) diff --git a/docs/generate-versions.py b/docs/generate-versions.py index 4e4334578b..57357ec71e 100644 --- a/docs/generate-versions.py +++ b/docs/generate-versions.py @@ -104,6 +104,6 @@ def generate_docs(snapshot_version: str, latest_released_version: str, previous_ if __name__ == "__main__": print("Generating versioned user guide docs...") snapshot_version = get_version_from_pom() - latest_released_version = "0.13.0" - previous_versions = ["0.10.1", "0.11.0", "0.12.0"] + latest_released_version = "0.14.0" + previous_versions = ["0.11.0", "0.12.0", "0.13.0"] generate_docs(snapshot_version, latest_released_version, previous_versions) \ No newline at end of file diff --git a/docs/source/user-guide/latest/iceberg.md b/docs/source/user-guide/latest/iceberg.md index ad6e5b2433..e6b626eefc 100644 --- a/docs/source/user-guide/latest/iceberg.md +++ b/docs/source/user-guide/latest/iceberg.md @@ -157,7 +157,7 @@ code. Instead, Comet relies on reflection to extract `FileScanTask`s from Iceber then serialized to Comet's native execution engine (see [PR #2528](https://github.com/apache/datafusion-comet/pull/2528)). -The example below uses Spark's package downloader to retrieve Comet 0.12.0 and Iceberg +The example below uses Spark's package downloader to retrieve Comet 0.14.0 and Iceberg 1.8.1, but Comet has been tested with Iceberg 1.5, 1.7, 1.8, and 1.10. The key configuration to enable fully-native Iceberg is `spark.comet.scan.icebergNative.enabled=true`. 
This configuration should **not** be used with the hybrid Iceberg configuration @@ -165,7 +165,7 @@ configuration should **not** be used with the hybrid Iceberg configuration ```shell $SPARK_HOME/bin/spark-shell \ - --packages org.apache.datafusion:comet-spark-spark3.5_2.12:0.12.0,org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:1.8.1,org.apache.iceberg:iceberg-core:1.8.1 \ + --packages org.apache.datafusion:comet-spark-spark3.5_2.12:0.14.0,org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:1.8.1,org.apache.iceberg:iceberg-core:1.8.1 \ --repositories https://repo1.maven.org/maven2/ \ --conf spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions \ --conf spark.sql.catalog.spark_catalog=org.apache.iceberg.spark.SparkCatalog \ @@ -237,7 +237,7 @@ configure Spark to use a REST catalog with Comet's native Iceberg scan: ```shell $SPARK_HOME/bin/spark-shell \ - --packages org.apache.datafusion:comet-spark-spark3.5_2.12:0.12.0,org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:1.8.1,org.apache.iceberg:iceberg-core:1.8.1 \ + --packages org.apache.datafusion:comet-spark-spark3.5_2.12:0.14.0,org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:1.8.1,org.apache.iceberg:iceberg-core:1.8.1 \ --repositories https://repo1.maven.org/maven2/ \ --conf spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions \ --conf spark.sql.catalog.rest_cat=org.apache.iceberg.spark.SparkCatalog \