fix(parquet): validate INT96 column metadata statistics#10003
fix(parquet): validate INT96 column metadata statistics#10003fallintoplace wants to merge 1 commit into
Conversation
etseidl
left a comment
There was a problem hiding this comment.
looks good, thanks @fallintoplace
| } | ||
|
|
||
| #[test] | ||
| fn test_convert_stats_returns_error_for_overlong_int96_statistics() { |
There was a problem hiding this comment.
Verified this panics without the fix.
|
run benchmark metadata |
|
🤖 Arrow criterion benchmark running (GKE) | trigger CPU Details (lscpu)Comparing parquet-int96-stats-error (a10a041) to d48c305 (merge-base) diff File an issue against this benchmark runner |
|
🤖 Arrow criterion benchmark completed (GKE) | trigger Instance: CPU Details (lscpu)Details
Resource Usagebase (merge-base)
branch
File an issue against this benchmark runner |
|
run benchmark metadata |
|
🤖 Arrow criterion benchmark running (GKE) | trigger CPU Details (lscpu)Comparing parquet-int96-stats-error (a10a041) to d48c305 (merge-base) diff File an issue against this benchmark runner |
|
🤖 Arrow criterion benchmark completed (GKE) | trigger Instance: CPU Details (lscpu)Details
Resource Usagebase (merge-base)
branch
File an issue against this benchmark runner |
Which issue does this PR close?
Closes #10002.
Rationale for this change
Malformed Parquet footer metadata can contain INT96 statistics whose encoded min or max value is longer than 12 bytes. The footer metadata conversion path checked that INT96 statistics were at least 12 bytes, but then asserted they were exactly 12 bytes. That allowed malformed input to panic instead of returning an error.
The page-statistics path already returns an error for non-12-byte INT96 statistics, so this change makes the footer metadata path behave consistently.
What changes are included in this PR?
This PR replaces the INT96 min/max length assertions in footer metadata statistics conversion with explicit
ParquetErrorreturns.It also adds a regression test covering overlong INT96 min and max values in column metadata statistics.
Are these changes tested?
Yes. I ran:
cargo fmt --allcargo +stable fmt --all -- --checkcargo fmt -p parquet -- --check --config skip_children=true $(find ./parquet -name "*.rs" ! -name format.rs)cargo test -p parquet --lib file::metadata::thrift::tests::test_convert_stats_returns_error_for_overlong_int96_statisticscargo test -p parquet --lib file::metadata::thrift::testscargo test -p parquetcargo check -p parquet --all-targetscargo clippy -p parquet --all-targets --all-features -- -D warningsAre there any user-facing changes?
Malformed INT96 column metadata statistics now return an error instead of panicking.