
Commit 9621f8c

GH-541: Document status of file_path (#542)
1 parent 4b1c72c commit 9621f8c

1 file changed

Lines changed: 15 additions & 0 deletions

src/main/thrift/parquet.thrift

@@ -963,6 +963,21 @@ union ColumnCryptoMetaData {
 struct ColumnChunk {
   /** File where column data is stored. If not set, assumed to be same file as
     * metadata. This path is relative to the current file.
+    *
+    * As of December 2025, the only known use-case for this field is writing summary
+    * parquet files (i.e. "_metadata" files). These files consolidate footers from
+    * multiple parquet files to allow for efficient reading of footers to avoid file
+    * listing costs and prune out files that do not need to be read based on statistics.
+    *
+    * These files do not appear to have ever been formally specified in the specification
+    * and are potentially problematic from a correctness perspective [1].
+    *
+    * [1] https://lists.apache.org/thread/ootf2kmyg3p01b1bvplpvp4ftd1bt72d
+    *
+    * There is no other known usage of this field. Specifically, there are no known
+    * reference implementations that will read externally stored column data if this field is populated
+    * within a standard parquet file. Making use of the field for this purpose is
+    * not considered part of the Parquet specification.
     **/
   1: optional string file_path
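
The resolution rule stated in the comment (unset means "same file as the metadata", otherwise a path relative to the current file) can be sketched as below. This is a minimal illustration only: `resolve_chunk_path` is a hypothetical helper, not part of any Parquet implementation, and it assumes that "relative to the current file" means relative to the directory containing the file whose footer was read, and that an empty string is treated like an unset field. Per the comment, this is only meaningful for summary "_metadata" files.

```python
import posixpath
from typing import Optional


def resolve_chunk_path(metadata_path: str, file_path: Optional[str]) -> str:
    """Return the file assumed to hold a column chunk's data.

    Hypothetical helper. `metadata_path` is the file whose footer was read;
    `file_path` is the value of ColumnChunk.file_path (None if not set).
    """
    if not file_path:
        # Field not set (or empty, by assumption): data is in the same file
        # as the metadata.
        return metadata_path
    # Field set: interpret it relative to the directory of the metadata file
    # (an assumption about what "relative to the current file" means).
    base_dir = posixpath.dirname(metadata_path)
    return posixpath.normpath(posixpath.join(base_dir, file_path))
```

For example, under these assumptions a "_metadata" file at `data/table/_metadata` whose column chunk carries `file_path = "year=2025/part-0.parquet"` would point at `data/table/year=2025/part-0.parquet`.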
