
MINOR: Add summary table of encodings and supported types (in Encodings.md) (#550) #552

Merged
alamb merged 6 commits into apache:master from nkaki:master on Feb 9, 2026

Conversation

@nkaki
Contributor

@nkaki nkaki commented Jan 29, 2026

Rationale for this change

To make it easier for readers to get an overall picture of Parquet encodings.

What changes are included in this PR?

Adds a summary table to Encodings.md that lists each encoding type (linked to its description), its enum, and its encoding targets for different Parquet format versions.

See rendered format here: https://github.com/nkaki/parquet-format/blob/master/Encodings.md

Do these changes have PoC implementations?

No, documentation change only.

Contributor

@alamb alamb left a comment


Thanks @nkaki -- this is a great start.

Encodings.md Outdated

### Supported Encodings

| Encoding type | Encoding enum | Encoding Targets <br> (Parquet 2.0.0+) | Encoding Targets <br> (Parquet 1.0.0+) |
Contributor


I think we have been trying to avoid the nomenclature of "parquet 2.0" as its definition is not universally agreed upon.

I recommend we remove the separate columns and instead focus on helping people navigate the current version of the spec.

I am also not sure about the differences in different encoding targets (e.g. PLAIN_DICTIONARY) --- maybe we can simply not include that in the table as it has been deprecated?

Contributor Author


@alamb
Thank you for the review!

> I think we have been trying to avoid the nomenclature of "parquet 2.0" as its definition is not universally agreed upon.
> I recommend we remove the separate columns and instead focus on helping people navigate the current version of the spec

I agree on focusing on current versions spec. At some point it would be great to make the parquet site able to see the previous versions easily. For the table, I will remove the last column and rename the third one.

Just a question: would "Data Page V2" (header?) be a better term in this case?

> I am also not sure about the differences in different encoding targets (e.g. PLAIN_DICTIONARY) --- maybe we can simply not include that in the table as it has been deprecated?

For PLAIN_DICTIONARY and RLE_DICTIONARY, I will merge the rows and mark PLAIN_DICTIONARY enum as deprecated.

For BIT_PACKED, since deprecated encodings are still explained in the document and BIT_PACKED is linked from other encodings, I thought it should be in the table and linked to its details. I think there are a few options:

  1. Remove BIT_PACKED encoding from the table (your suggestion)
  2. Remove BIT_PACKED encoding description from the page and from the table (this may break links).
  3. Separate currently supported and deprecated encodings into separate tables, and change the layout of the page:
  • Layout A:
    supported encodings table
    deprecated encodings table (only BIT_PACKED)
    supported + deprecated encodings descriptions (current order)
  • Layout B:
    supported encodings table
    supported encodings descriptions (current order without BIT_PACKED)
    deprecated encodings table (only BIT_PACKED)
    deprecated encodings descriptions (only BIT_PACKED)
  • Layout C:
    supported encodings table
    deprecated encodings table (only BIT_PACKED)
    supported encodings descriptions (current order without BIT_PACKED)
    deprecated encodings descriptions (only BIT_PACKED)

Also, for the Encoding Targets column, should I just list the physical types and remove the other encoding targets (e.g. repetition and definition levels)?

Contributor Author


I removed the v1 column and separated the table into supported and deprecated encodings. If the deprecated encodings table is not needed, I will remove it.

Link to the rendered page: https://github.com/nkaki/parquet-format/blob/master/Encodings.md

Contributor


> I agree on focusing on current versions spec. At some point it would be great to make the parquet site able to see the previous versions easily.

I do not think there is consensus on what constitutes a "version" of the spec -- so unfortunately I think adding versions will be blocked until we can agree on what they mean. There are a bunch of discussions on the parquet mailing list if you want more of the backstory.

…gs.md) (apache#550) - remove v1 related column, and separate tables for supported and deprecated encodings
Contributor

@alamb alamb left a comment


Thank you @nkaki -- this looks like a great improvement to the documentation to me.

FYI @emkornfield @wgtmac @julienledem in case you would also like to review.

Add Dictionary indices to encoding targets

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
@alamb
Contributor

alamb commented Feb 6, 2026

Thanks @nkaki -- I plan to leave this open for another few days in case anyone else would like a chance to comment.

I plan to merge it sometime next week.

used with any page type.

### Supported Encodings

Contributor


It might be good to add a note/link to the implementation status page so readers can see the current support for each encoding.

Contributor Author


@emkornfield
Thank you for the review!
I added note/link to the implementation status page.

nkaki and others added 2 commits February 9, 2026 13:20
fix typo

Co-authored-by: Gang Wu <ustcwg@gmail.com>
@alamb alamb merged commit 38818fa into apache:master Feb 9, 2026
4 checks passed
@alamb
Contributor

alamb commented Feb 9, 2026

Thank you so much @nkaki and thanks to @emkornfield and @wgtmac for the review -- I think this makes the encodings page significantly easier to navigate.

We can continue to improve the documentation in follow-on PRs if needed.

@alamb
Contributor

alamb commented Feb 13, 2026

I wanted to follow up here and point out this change is now live on the parquet website:

https://parquet.apache.org/docs/file-format/data-pages/encodings/

I think it looks quite nice. Thanks again @nkaki!


Development

Successfully merging this pull request may close these issues.

Add summary table of encodings and supported types (in Encodings.md)

4 participants