Commit f0c75b9

[T2] Wide column metadata improvements
1. Make `ColumnMetaData.type` optional.
2. Make `ColumnMetaData.path_in_schema` optional.
3. Add `ColumnMetaData.schema_index`: the ordinal in `FileMetaData.schema` of the element this column corresponds to. This allows a sparse representation of columns in a rowgroup.
1 parent 384bedd commit f0c75b9
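
The sparse representation mentioned in the description can be illustrated with a small sketch. It assumes a reader keys each column chunk by its `schema_index` and treats an absent key as "no data for this column in this rowgroup"; the commit does not prescribe reader behaviour, and `ColumnChunk`, `chunks_by_schema_index` and `lookup` are hypothetical names, not part of any Parquet library.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class ColumnChunk:
    """Hypothetical stand-in for one column chunk's metadata."""
    schema_index: int  # ordinal into FileMetaData.schema
    num_values: int


def chunks_by_schema_index(row_group: List[ColumnChunk]) -> Dict[int, ColumnChunk]:
    """Index the chunks of a rowgroup by the schema ordinal they refer to."""
    index: Dict[int, ColumnChunk] = {}
    for chunk in row_group:
        if chunk.schema_index in index:
            raise ValueError(f"duplicate schema_index {chunk.schema_index}")
        index[chunk.schema_index] = chunk
    return index


def lookup(row_group: List[ColumnChunk], schema_index: int) -> Optional[ColumnChunk]:
    """Return the chunk for a schema ordinal, or None if the rowgroup omits it."""
    return chunks_by_schema_index(row_group).get(schema_index)


# A rowgroup that only carries chunks for schema ordinals 1 and 4.
sparse_row_group = [ColumnChunk(schema_index=1, num_values=100),
                    ColumnChunk(schema_index=4, num_values=100)]
present = lookup(sparse_row_group, 4)  # the chunk for ordinal 4
absent = lookup(sparse_row_group, 3)   # None: ordinal 3 has no chunk here
```

Without `schema_index`, a reader has to match chunks to schema leaves by position or by `path_in_schema`, which is why a rowgroup could not simply leave a column out.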

1 file changed

Lines changed: 22 additions & 5 deletions

src/main/thrift/parquet.thrift

@@ -490,7 +490,7 @@ enum Encoding {
   // GROUP_VAR_INT = 1;
 
   /**
-   * Deprecated: Dictionary encoding. The values in the dictionary are encoded in the
+   * DEPRECATED: Dictionary encoding. The values in the dictionary are encoded in the
    * plain type.
    * in a data page use RLE_DICTIONARY instead.
    * in a Dictionary page use PLAIN instead
@@ -772,15 +772,25 @@ struct PageEncodingStats {
  * Description for column metadata
  */
 struct ColumnMetaData {
-  /** Type of this column **/
-  1: required Type type
+  /**
+   * DEPRECATED: can be found in SchemaElement
+   *
+   * Writers MUST NOT omit this field until 2025-10-01.
+   * Readers MUST ignore this field before 2025-10-01.
+   */
+  1: optional Type type
 
   /** Set of all encodings used for this column. The purpose is to validate
    * whether we can decode those pages. **/
   2: required list<Encoding> encodings
 
-  /** Path in schema **/
-  3: required list<string> path_in_schema
+  /**
+   * DEPRECATED: can be found in SchemaElement
+   *
+   * Writers MUST NOT omit this field until 2025-10-01.
+   * Readers MUST ignore this field before 2025-10-01.
+   */
+  3: optional list<string> path_in_schema
 
   /** Compression codec **/
   4: required CompressionCodec codec
@@ -833,6 +843,13 @@ struct ColumnMetaData {
    * filter pushdown.
    */
   16: optional SizeStatistics size_statistics;
+
+  /**
+   * The index into FileMetadata.schema (list<SchemaElement>) for this column.
+   * This implies that ColumnMetaData can be sparse in a rowgroup, if for example
+   * a column does not have any data pages in a rowgroup.
+   */
+  17: optional i32 schema_index;
 }
 
 struct EncryptionWithFooterKey {
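
Since the diff marks `type` and `path_in_schema` as recoverable from `SchemaElement`, a reader could derive both from `schema_index`. The sketch below is one possible approach, not the reference implementation. It assumes that `schema_index` counts the root element as ordinal 0, that the schema list is the depth-first flattening Parquet uses (each group carrying `num_children`), and that the root name is left out of the path. The Python classes are hypothetical stand-ins for the Thrift structs, and types are plain strings for brevity.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class SchemaElement:
    """Hypothetical stand-in for the Thrift SchemaElement struct."""
    name: str
    type: Optional[str] = None  # None for group nodes
    num_children: int = 0


@dataclass
class ColumnMetaData:
    """Hypothetical stand-in carrying only the fields this commit touches."""
    schema_index: Optional[int] = None
    type: Optional[str] = None
    path_in_schema: Optional[List[str]] = None


def paths_by_schema_index(schema: List[SchemaElement]) -> Dict[int, List[str]]:
    """Map every schema ordinal to its path, excluding the root's name."""
    paths: Dict[int, List[str]] = {}

    def walk(index: int, prefix: List[str]) -> int:
        element = schema[index]
        path = [] if index == 0 else prefix + [element.name]
        paths[index] = path
        next_index = index + 1
        for _ in range(element.num_children):
            next_index = walk(next_index, path)
        return next_index  # ordinal of the first element after this subtree

    walk(0, [])
    return paths


def resolve(column: ColumnMetaData, schema: List[SchemaElement],
            paths: Dict[int, List[str]]) -> Tuple[Optional[str], Optional[List[str]]]:
    """Prefer the schema when schema_index is set; else use the legacy fields."""
    if column.schema_index is None:
        return column.type, column.path_in_schema
    return schema[column.schema_index].type, paths[column.schema_index]


# Schema tree: root { a: INT32, b { c: BYTE_ARRAY } }
schema = [SchemaElement("schema", num_children=2),
          SchemaElement("a", "INT32"),
          SchemaElement("b", num_children=1),
          SchemaElement("c", "BYTE_ARRAY")]
paths = paths_by_schema_index(schema)
new_style = resolve(ColumnMetaData(schema_index=3), schema, paths)
legacy = resolve(ColumnMetaData(type="INT32", path_in_schema=["a"]), schema, paths)
```

Falling back to the legacy fields when `schema_index` is absent keeps files written before this change readable.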
