What is the problem the feature request solves?
Note: This issue was generated with AI assistance. The specification details have been extracted from Spark documentation and may need verification.
Comet does not currently support the Spark aes_encrypt function, causing queries using this function to fall back to Spark's JVM execution instead of running natively on DataFusion.
The AesEncrypt expression provides AES (Advanced Encryption Standard) encryption functionality in Spark SQL. It encrypts binary input data using a specified key, encryption mode, padding scheme, initialization vector (IV), and optional additional authenticated data (AAD). In Spark, this expression is runtime replaceable: it delegates to static implementation methods (in ExpressionImplUtils) for performance.
Supporting this expression would allow more Spark workloads to benefit from Comet's native acceleration.
Describe the potential solution
Spark Specification
Syntax:

```sql
aes_encrypt(input, key [, mode [, padding [, iv [, aad]]]])
```

```scala
// DataFrame API
import org.apache.spark.sql.catalyst.expressions.AesEncrypt
AesEncrypt(inputExpr, keyExpr, modeExpr, paddingExpr, ivExpr, aadExpr)
```
Arguments:

| Argument | Type | Description |
|----------|------|-------------|
| input | BinaryType | The binary data to encrypt |
| key | BinaryType | The encryption key as binary data |
| mode | StringType | The encryption mode (defaults to "GCM") |
| padding | StringType | The padding scheme (defaults to "DEFAULT") |
| iv | BinaryType | The initialization vector (defaults to empty) |
| aad | BinaryType | Additional authenticated data for GCM mode (defaults to empty) |
Return Type: Returns BinaryType - the encrypted data as a binary array.
Supported Data Types:
- Input data: Binary type only
- Key: Binary type only
- Mode: String type with collation support (trim collation supported)
- Padding: String type with collation support (trim collation supported)
- IV: Binary type only
- AAD: Binary type only
Edge Cases:
- Null inputs: Follows standard Spark null propagation - any null input produces null output
- Empty AAD: When AAD parameter is omitted, defaults to empty binary literal
- Empty IV: When IV parameter is omitted, defaults to empty binary literal
- Invalid key sizes: Behavior depends on underlying AES implementation in ExpressionImplUtils
- Mode/padding combinations: Some mode and padding combinations may not be supported
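To make the invalid-key-size edge case concrete, here is a minimal sketch using javax.crypto (the JCE layer that backs Spark's ExpressionImplUtils). It assumes the standard JCE behavior that AES accepts only 16-, 24-, or 32-byte keys; anything else is rejected at cipher initialization:

```scala
import javax.crypto.Cipher
import javax.crypto.spec.SecretKeySpec

// A 10-byte key is not a valid AES key size (valid sizes: 16, 24, 32 bytes),
// so Cipher.init rejects it with InvalidKeyException.
val badKey = new SecretKeySpec(Array.fill[Byte](10)(0), "AES")
val cipher = Cipher.getInstance("AES/GCM/NoPadding")
val failed =
  try { cipher.init(Cipher.ENCRYPT_MODE, badKey); false }
  catch { case _: java.security.InvalidKeyException => true }
println(failed) // true
```

A native Rust implementation would need to surface an equivalent error for these key sizes to stay compatible with Spark's behavior.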
Examples:

```sql
-- Basic encryption with default GCM mode
SELECT base64(aes_encrypt('Spark', 'abcdefghijklmnop12345678ABCDEFGH'));

-- Full specification with all parameters
SELECT base64(aes_encrypt(
  'Spark',
  'abcdefghijklmnop12345678ABCDEFGH',
  'GCM',
  'DEFAULT',
  unhex('000000000000000000000000'),
  'This is an AAD mixed into the input'
));
```

```scala
// DataFrame API usage
import org.apache.spark.sql.functions._
df.select(base64(expr("aes_encrypt(data, key, 'GCM', 'DEFAULT', iv, aad)")))

// Using the expression directly
import org.apache.spark.sql.catalyst.expressions._
val encrypted = AesEncrypt(col("data").expr, col("key").expr)
```
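The full-specification SQL example above can be reproduced outside Spark with javax.crypto. This sketch assumes Spark's GCM output layout (the 12-byte IV prepended to the ciphertext, with a 16-byte authentication tag appended by GCM); verify that layout against Spark's ExpressionImplUtils before relying on it:

```scala
import java.nio.charset.StandardCharsets.UTF_8
import javax.crypto.Cipher
import javax.crypto.spec.{GCMParameterSpec, SecretKeySpec}

val key = new SecretKeySpec("abcdefghijklmnop12345678ABCDEFGH".getBytes(UTF_8), "AES")
val iv  = Array.fill[Byte](12)(0) // unhex('000000000000000000000000')
val aad = "This is an AAD mixed into the input".getBytes(UTF_8)

// Encrypt "Spark" with AES-GCM and a 128-bit tag, then prepend the IV
// (the assumed Spark layout for GCM output).
val enc = Cipher.getInstance("AES/GCM/NoPadding")
enc.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(128, iv))
enc.updateAAD(aad)
val output = iv ++ enc.doFinal("Spark".getBytes(UTF_8))

// Round trip: strip the IV, decrypt the remainder with the same AAD.
val dec = Cipher.getInstance("AES/GCM/NoPadding")
dec.init(Cipher.DECRYPT_MODE, key, new GCMParameterSpec(128, output.take(12)))
dec.updateAAD(aad)
val roundTrip = new String(dec.doFinal(output.drop(12)), UTF_8)
println(roundTrip)     // Spark
println(output.length) // 33 = 12 (IV) + 5 (plaintext) + 16 (GCM tag)
```

A Comet implementation should produce byte-identical output for the same inputs, since users may decrypt Comet-encrypted data with Spark's aes_decrypt (and vice versa).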
Implementation Approach
See the Comet guide on adding new expressions for detailed instructions.
- Scala Serde: Add expression handler in `spark/src/main/scala/org/apache/comet/serde/`
- Register: Add to the appropriate map in `QueryPlanSerde.scala`
- Protobuf: Add message type in `native/proto/src/proto/expr.proto` if needed
- Rust: Implement in `native/spark-expr/src/` (check if DataFusion has built-in support first)
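If a dedicated protobuf message turns out to be needed, it might look like the sketch below. This is illustrative only: the message and field names are hypothetical, not the actual Comet schema, and the final shape should follow the existing conventions in `expr.proto`:

```proto
// Hypothetical message; field names are illustrative, not Comet's definition.
message AesEncrypt {
  Expr input = 1;   // binary data to encrypt
  Expr key = 2;     // 16-, 24-, or 32-byte AES key
  Expr mode = 3;    // "GCM", etc.
  Expr padding = 4; // "DEFAULT", etc.
  Expr iv = 5;      // initialization vector (may be empty)
  Expr aad = 6;     // additional authenticated data (may be empty)
}
```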
Additional context
Difficulty: Large
Spark Expression Class: org.apache.spark.sql.catalyst.expressions.AesEncrypt
Related:
AesDecrypt - corresponding decryption function
base64/unbase64 - commonly used for encoding encrypted binary output
unhex/hex - for converting hexadecimal strings to binary data
This issue was auto-generated from Spark reference documentation.