MINOR: Add summary table of encodings and supported types (in Encodings.md) (#550) (#552)
* MINOR: Add summary table of encodings and supported types (in Encodings.md) (#550)
* MINOR: Add summary table of encodings and supported types (in Encodings.md) (#550);
* MINOR: Add summary table of encodings and supported types (in Encodings.md) (#550) - remove v1 related column, and separate tables for supported and deprecated encodings
* Update Encodings.md
Add Dictionary indices to encoding targets
Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
* Update Encodings.md
fix typo
Co-authored-by: Gang Wu <ustcwg@gmail.com>
* added note/link to the implementation status page
---------
Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
Co-authored-by: Gang Wu <ustcwg@gmail.com>
Encodings.md: 25 additions & 0 deletions
@@ -25,6 +25,27 @@ This file contains the specification of all supported encodings.
 Unless otherwise stated in page or encoding documentation, any encoding can be
 used with any page type.
 
+### Supported Encodings
+
+For details on current implementation status, see the [Implementation Status](https://parquet.apache.org/docs/file-format/implementationstatus/#encodings) page.
+
+| Encoding type | Encoding enum | Supported Types |
@@ -50,6 +71,7 @@ For native types, this outputs the data as little endian. Floating
 For the byte array type, it encodes the length as a 4 byte little
 endian, followed by the bytes.
 
+<a name="DICTIONARY"></a>
 ### Dictionary Encoding (PLAIN_DICTIONARY = 2 and RLE_DICTIONARY = 8)
 The dictionary encoding builds a dictionary of values encountered in a given column. The
 dictionary will be stored in a dictionary page per column chunk. The values are stored as integers
@@ -295,6 +317,7 @@ The encoded data is
 This encoding is similar to the [RLE/bit-packing](#RLE) encoding. However the [RLE/bit-packing](#RLE) encoding is specifically used when the range of ints is small over the entire page, as is true of repetition and definition levels. It uses a single bit width for the whole page.
 The delta encoding algorithm described above stores a bit width per miniblock and is less sensitive to variations in the size of encoded integers. It is also somewhat doing RLE encoding as a block containing all the same values will be bit packed to a zero bit width thus being only a header.
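
As a rough illustration of the zero-bit-width remark in the last context line of this hunk, the following Python sketch computes the bit width each miniblock would need for a run of integers. It is not taken from any Parquet implementation: the function name `miniblock_bit_widths`, the miniblock size of 32, and the choice to treat the whole input as a single block with one minimum delta are assumptions made for brevity.

```python
def miniblock_bit_widths(values, miniblock_size=32):
    """Return the bit width needed per miniblock of deltas (illustrative only).

    Assumption: the entire input is one block, so a single minimum delta
    is subtracted from every delta before measuring bit widths.
    """
    # Deltas between consecutive values; the first value would live in a header.
    deltas = [b - a for a, b in zip(values, values[1:])]
    if not deltas:
        return []

    # Subtracting the minimum delta makes every stored delta non-negative.
    min_delta = min(deltas)
    adjusted = [d - min_delta for d in deltas]

    widths = []
    for start in range(0, len(adjusted), miniblock_size):
        chunk = adjusted[start:start + miniblock_size]
        # int.bit_length() is 0 for the value 0, so a constant chunk needs 0 bits.
        widths.append(max(chunk).bit_length())
    return widths


if __name__ == "__main__":
    print(miniblock_bit_widths([7] * 65))       # identical values
    print(miniblock_bit_widths([0, 1, 5, 6]))   # varying deltas
```

Under these assumptions, a run of identical values produces deltas that are all zero, so every miniblock needs zero bits and only header information remains, which is the RLE-like behavior the text describes.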