[parquet] Allow more encryption algorithms#9203
[parquet] Allow more encryption algorithms#9203hsiang-c wants to merge 8 commits intoapache:mainfrom
Conversation
xanderbailey
left a comment
There was a problem hiding this comment.
When I exposed quote style here I exported the underlying QuoteStyle is that something we might want to do here also for Algorithm?
|
Thank you for the review, I took a look at your PR. You made Do you mean you'd like to make |
parquet/src/encryption/ciphers.rs
Outdated
| use crate::errors::Result; | ||
| use crate::file::metadata::HeapSize; | ||
| use ring::aead::{AES_128_GCM, Aad, LessSafeKey, NonceSequence, UnboundKey}; | ||
| use ring::aead::{AES_128_GCM, Aad, LessSafeKey, NonceSequence, UnboundKey, Algorithm}; |
There was a problem hiding this comment.
Did this have cargo fmt run on it? The order looks off to me.
parquet/src/encryption/decrypt.rs
Outdated
| keys: DecryptionKeys, | ||
| aad_prefix: Option<Vec<u8>>, | ||
| footer_signature_verification: bool, | ||
| algorithm: &'static Algorithm |
|
Thanks @hsiang-c and @mbutrovich -- looks like there are some CI failures on this PR. Also, it seems like we should have some tests for the new features |
alamb
left a comment
There was a problem hiding this comment.
Thanks @hsiang-c
FYI @adamreeve and @rok as the original authors of the parquet encryption feature
Perhaps you have some time to review this PR?
parquet/src/encryption/ciphers.rs
Outdated
| } | ||
|
|
||
| impl RingGcmBlockDecryptor { | ||
| #[allow(dead_code)] |
There was a problem hiding this comment.
why do we need to allow dead code? Maybe we should remove it instead?
parquet/src/file/metadata/mod.rs
Outdated
|
|
||
| #[cfg(feature = "encryption")] | ||
| let expected_size_with_decryptor = 3080; | ||
| #[cfg(not(feature = "encryption"))] |
There was a problem hiding this comment.
Why is this needed? The whole test is gated by
#[cfg(feature = "encryption")]
fn test_memory_size_with_decryptor() {it seems like you just need to update the 3072 to 3080 to reflect the additional size?
parquet/src/encryption/ciphers.rs
Outdated
| #[test] | ||
| fn test_round_trip_with_incorrect_key_length() { | ||
| let key = [0u8; 16]; | ||
| assert!(RingGcmBlockEncryptor::new_with_algorithm(&CHACHA20_POLY1305, &key).is_err()); |
There was a problem hiding this comment.
It isn't clear to me that CHACHA is a valid Parquet encryption: https://github.com/apache/parquet-format/blob/master/Encryption.md :
parquet/src/encryption/decrypt.rs
Outdated
| } | ||
|
|
||
| /// The AEAD decryption algorithm to be used. | ||
| pub fn with_algorithm(mut self, algorithm: &'static Algorithm) -> Self { |
There was a problem hiding this comment.
this effectively makes Algorithm a part of the public API I think -- this might be ok, but it also means we would not be able to change the underlying crypto library without breaking the API in the future
There was a problem hiding this comment.
Yes I don't like this and I don't think it's necessary, see my comment on new_with_algorithm. This also makes it possible for users to easily create Parquet files that aren't compliant with the spec by using a non-standard algorithm.
Changing the crypto library is something we might potentially want to do to be able to add AES_GCM_CTR support for example (#7258 (comment)).
I think if we want to support new algorithms like AES_GCM_CTR in the future the user should provide an enum value from a supported set of algorithms like is done in the C++ implementation, but we can infer the key length from the provided key so this isn't necessary for this change.
parquet/src/encryption/ciphers.rs
Outdated
| pub(crate) fn new_with_algorithm( | ||
| algorithm: &'static Algorithm, | ||
| key_bytes: &[u8], | ||
| ) -> Result<Self> { |
There was a problem hiding this comment.
I don't think there's any need to add this new constructor, we can reuse the existing new method above and change the algorithm based on the length of the key.
This is also a bit semantically confusing as you could create a new "RingGcmBlockDecryptor" with a non-GCM algorithm.
parquet/src/encryption/ciphers.rs
Outdated
| pub(crate) fn new_with_algorithm( | ||
| algorithm: &'static Algorithm, | ||
| key_bytes: &[u8], | ||
| ) -> Result<Self> { |
There was a problem hiding this comment.
As above, I think we should reuse the existing new method.
parquet/src/encryption/decrypt.rs
Outdated
| } | ||
|
|
||
| /// The AEAD decryption algorithm to be used. | ||
| pub fn with_algorithm(mut self, algorithm: &'static Algorithm) -> Self { |
There was a problem hiding this comment.
Yes I don't like this and I don't think it's necessary, see my comment on new_with_algorithm. This also makes it possible for users to easily create Parquet files that aren't compliant with the spec by using a non-standard algorithm.
Changing the crypto library is something we might potentially want to do to be able to add AES_GCM_CTR support for example (#7258 (comment)).
I think if we want to support new algorithms like AES_GCM_CTR in the future the user should provide an enum value from a supported set of algorithms like is done in the C++ implementation, but we can infer the key length from the provided key so this isn't necessary for this change.
| @@ -115,7 +115,7 @@ async fn test_misspecified_encryption_keys() { | |||
|
|
|||
| // Too short footer key | |||
| check_for_error( | |||
There was a problem hiding this comment.
Could you please add some higher level tests for round-tripping files with larger key sizes? It might be beneficial to other Parquet implementations too if you could add files that use larger keys to the parquet-testing repository.
Which issue does this PR close?
Rationale for this change
arrow-rsfor Parquet I/O, I'd like to start supporting AES 256 with this PR.What changes are included in this PR?
RingGcmBlockEncryptorandRingGcmBlockDecryptorwill pick AES-128 or AES-256 based on key sizeAre these changes tested?
Yes, unit test and on AES-256 encrypted Parquet files.
Are there any user-facing changes?
No