Skip to content

[parquet] Allow more encryption algorithms#9203

Draft
hsiang-c wants to merge 8 commits intoapache:mainfrom
hsiang-c:allow_more_enc_algo
Draft

[parquet] Allow more encryption algorithms#9203
hsiang-c wants to merge 8 commits intoapache:mainfrom
hsiang-c:allow_more_enc_algo

Conversation

@hsiang-c
Copy link

@hsiang-c hsiang-c commented Jan 16, 2026

Which issue does this PR close?

Rationale for this change

  • Iceberg spec supports AES key sizes of 128, 192 and 256 bits. Iceberg Rust depends on arrow-rs for Parquet I/O, I'd like to start supporting AES 256 with this PR.

What changes are included in this PR?

  • RingGcmBlockEncryptor and RingGcmBlockDecryptor will pick AES-128 or AES-256 based on key size

Are these changes tested?

Yes, unit test and on AES-256 encrypted Parquet files.

Are there any user-facing changes?

No

Copy link
Contributor

@xanderbailey xanderbailey left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When I exposed quote style here I exported the underlying QuoteStyle is that something we might want to do here also for Algorithm?

@hsiang-c hsiang-c marked this pull request as ready for review January 23, 2026 07:27
@hsiang-c
Copy link
Author

@xanderbailey

Thank you for the review, I took a look at your PR. You made QuoteStyle part of Writer and expose as part of the API.

Do you mean you'd like to make Algorithm a field of RingGcmBlockDecryptor and RingGcmBlockEncryptor and expose them instead of importing it from ring::aead?

use crate::errors::Result;
use crate::file::metadata::HeapSize;
use ring::aead::{AES_128_GCM, Aad, LessSafeKey, NonceSequence, UnboundKey};
use ring::aead::{AES_128_GCM, Aad, LessSafeKey, NonceSequence, UnboundKey, Algorithm};
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did this have cargo fmt run on it? The order looks off to me.

keys: DecryptionKeys,
aad_prefix: Option<Vec<u8>>,
footer_signature_verification: bool,
algorithm: &'static Algorithm
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why static?

@alamb
Copy link
Contributor

alamb commented Feb 3, 2026

Thanks @hsiang-c and @mbutrovich -- looks like there are some CI failures on this PR.

Also, it seems like we should have some tests for the new features

@alamb alamb changed the title Allow more encryption algorithms [parquet] Allow more encryption algorithms Feb 11, 2026
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @hsiang-c

FYI @adamreeve and @rok as the original authors of the parquet encryption feature

Perhaps you have some time to review this PR?

}

impl RingGcmBlockDecryptor {
#[allow(dead_code)]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why do we need to allow dead code? Maybe we should remove it instead?


#[cfg(feature = "encryption")]
let expected_size_with_decryptor = 3080;
#[cfg(not(feature = "encryption"))]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this needed? The whole test is gated by

    #[cfg(feature = "encryption")]
    fn test_memory_size_with_decryptor() {

it seems like you just need to update the 3072 to 3080 to reflect the additional size?

#[test]
fn test_round_trip_with_incorrect_key_length() {
let key = [0u8; 16];
assert!(RingGcmBlockEncryptor::new_with_algorithm(&CHACHA20_POLY1305, &key).is_err());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It isn't clear to me that CHACHA is a valid Parquet encryption: https://github.com/apache/parquet-format/blob/master/Encryption.md :

}

/// The AEAD decryption algorithm to be used.
pub fn with_algorithm(mut self, algorithm: &'static Algorithm) -> Self {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this effectively makes Algorithm a part of the public API I think -- this might be ok, but it also means we would not be able to change the underlying crypto library without breaking the API in the future

Copy link
Contributor

@adamreeve adamreeve Feb 11, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes I don't like this and I don't think it's necessary, see my comment on new_with_algorithm. This also makes it possible for users to easily create Parquet files that aren't compliant with the spec by using a non-standard algorithm.

Changing the crypto library is something we might potentially want to do to be able to add AES_GCM_CTR support for example (#7258 (comment)).

I think if we want to support new algorithms like AES_GCM_CTR in the future the user should provide an enum value from a supported set of algorithms like is done in the C++ implementation, but we can infer the key length from the provided key so this isn't necessary for this change.

Copy link
Contributor

@adamreeve adamreeve left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for addressing this @hsiang-c! I've left a few comments

Comment on lines 49 to 52
pub(crate) fn new_with_algorithm(
algorithm: &'static Algorithm,
key_bytes: &[u8],
) -> Result<Self> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think there's any need to add this new constructor, we can reuse the existing new method above and change the algorithm based on the length of the key.

This is also a bit semantically confusing as you could create a new "RingGcmBlockDecryptor" with a non-GCM algorithm.

Comment on lines 158 to 161
pub(crate) fn new_with_algorithm(
algorithm: &'static Algorithm,
key_bytes: &[u8],
) -> Result<Self> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As above, I think we should reuse the existing new method.

}

/// The AEAD decryption algorithm to be used.
pub fn with_algorithm(mut self, algorithm: &'static Algorithm) -> Self {
Copy link
Contributor

@adamreeve adamreeve Feb 11, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes I don't like this and I don't think it's necessary, see my comment on new_with_algorithm. This also makes it possible for users to easily create Parquet files that aren't compliant with the spec by using a non-standard algorithm.

Changing the crypto library is something we might potentially want to do to be able to add AES_GCM_CTR support for example (#7258 (comment)).

I think if we want to support new algorithms like AES_GCM_CTR in the future the user should provide an enum value from a supported set of algorithms like is done in the C++ implementation, but we can infer the key length from the provided key so this isn't necessary for this change.

@@ -115,7 +115,7 @@ async fn test_misspecified_encryption_keys() {

// Too short footer key
check_for_error(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you please add some higher level tests for round-tripping files with larger key sizes? It might be beneficial to other Parquet implementations too if you could add files that use larger keys to the parquet-testing repository.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@adamreeve Will do, thanks!

@hsiang-c hsiang-c marked this pull request as draft February 14, 2026 00:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

parquet Changes to the parquet crate

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Parquet] Support other encryption/decryption key size

5 participants