@@ -7,7 +7,7 @@ handling various current and proposed floating point types:
 
 - [IEEE 754](https://en.wikipedia.org/wiki/IEEE_754): Binary16, Binary32
 - [OCP Float8](https://www.opencompute.org/documents/ocp-8-bit-floating-point-specification-ofp8-revision-1-0-2023-06-20-pdf): E5M2, E4M3
-- [IEEE WG P3109](https://github.com/P3109/Public/blob/main/Shared%20Reports/IEEE%20WG%20P3109%20Interim%20Report.pdf): P3109_{K}p{P} for K > 2, and 1 <= P < K.
+- [IEEE WG P3109](https://github.com/P3109/Public/blob/main/IEEE%20WG%20P3109%20Interim%20Report%20(latest).pdf): P3109_{K}p{P} for K > 2, and 1 <= P < K.
 - [OCP MX Formats](https://www.opencompute.org/documents/ocp-microscaling-formats-mx-v1-0-spec-final-pdf): E2M1, M2M3, E3M2, E8M0, INT8, and the MX block formats.
 
 The library favours readability and extensibility over speed (although the *_ndarray functions are reasonably fast for large arrays, see the [benchmarking notebook](docs/source/04-benchmark.ipynb)).
@@ -19,33 +19,34 @@ See https://gfloat.readthedocs.io for documentation, or dive into the notebooks
 
 For example, here's a table from the [02-value-stats](docs/source/02-value-stats.ipynb) notebook:
 
-| name| B: Bits in the format| P: Precision in bits| E: Exponent field width in bits| Exact in float16?| Exact in float32?| 0<x<1| 1<x<Inf| minSubnormal| maxSubnormal| minNormal| maxNormal| |
-| --------------| -----| -----| -----| --------| --------| -------| -------| ----------------| ----------------| -------------| ---------------|
-| p3109_k3p2sf | 3 | 2 | 1 | True | True | 1 | 1 | 0.5 | 0.5 | 1 | 1.5 |
-| ocp_e2m1 | 4 | 2 | 2 | True | True | 1 | 5 | 0.5 | 0.5 | 1 | 6 |
-| p3109_k4p2sf | 4 | 2 | 2 | True | True | 3 | 3 | 0.25 | 0.25 | 0.5 | 3 |
-| ocp_e2m3 | 6 | 4 | 2 | True | True | 7 | 23 | 0.125 | 0.875 | 1 | 7.5 |
-| ocp_e3m2 | 6 | 3 | 3 | True | True | 11 | 19 | 0.0625 | 0.1875 | 0.25 | 28 |
-| p3109_k6p3sf | 6 | 3 | 3 | True | True | 15 | 15 | 0.03125 | 0.09375 | 0.125 | 14 |
-| p3109_k6p4sf | 6 | 4 | 2 | True | True | 15 | 15 | 0.0625 | 0.4375 | 0.5 | 3.75 |
-| ocp_e4m3 | 8 | 4 | 4 | True | True | 55 | 70 | 2^-9 | 7/4*2^-7 | 0.015625 | 448 |
-| ocp_e5m2 | 8 | 3 | 5 | True | True | 59 | 63 | 2^-16 | 3/2*2^-15 | 2^-14 | 57344 |
-| p3109_k8p1se | 8 | 1 | 7 | False | True | 63 | 62 | n/a | n/a | 2^-63 | 2^62 |
-| p3109_k8p1ue | 8 | 1 | 8 | False | True | 127 | 125 | n/a | n/a | 2^-127 | 2^125 |
-| p3109_k8p3se | 8 | 3 | 5 | True | True | 63 | 62 | 2^-17 | 3/2*2^-16 | 2^-15 | 49152 |
-| p3109_k8p3sf | 8 | 3 | 5 | True | True | 63 | 63 | 2^-17 | 3/2*2^-16 | 2^-15 | 57344 |
-| p3109_k8p3ue | 8 | 3 | 6 | False | True | 127 | 125 | 2^-33 | 3/2*2^-32 | 2^-31 | 5/4*2^31 |
-| p3109_k8p3uf | 8 | 3 | 6 | False | True | 127 | 126 | 2^-33 | 3/2*2^-32 | 2^-31 | 3/2*2^31 |
-| p3109_k8p4se | 8 | 4 | 4 | True | True | 63 | 62 | 2^-10 | 7/4*2^-8 | 0.0078125 | 224 |
-| p3109_k8p4sf | 8 | 4 | 4 | True | True | 63 | 63 | 2^-10 | 7/4*2^-8 | 0.0078125 | 240 |
-| p3109_k8p4ue | 8 | 4 | 5 | True | True | 127 | 125 | 2^-18 | 7/4*2^-16 | 2^-15 | 53248 |
-| p3109_k8p4uf | 8 | 4 | 5 | True | True | 127 | 126 | 2^-18 | 7/4*2^-16 | 2^-15 | 57344 |
-| p3109_k8p7sf | 8 | 7 | 1 | True | True | 63 | 63 | 0.015625 | 63/32*2^-1 | 1 | 127/64*2^0 |
-| p3109_k8p8uf | 8 | 8 | 1 | True | True | 127 | 126 | 0.0078125 | 127/64*2^-1 | 1 | 127/64*2^0 |
-| binary16 | 16 | 11 | 5 | True | True | 15359 | 16383 | 2^-24 | 1023/512*2^-15 | 2^-14 | 65504 |
-| bfloat16 | 16 | 8 | 8 | False | True | 16255 | 16383 | 2^-133 | 127/64*2^-127 | 2^-126 | 255/128*2^127 |
-| ocp_e8m0 | 8 | 1 | 8 | False | True | 127 | 127 | n/a | n/a | 2^-127 | 2^127 |
-| ocp_int8 | 8 | 8 | 0 | True | True | 63 | 63 | 0.015625 | 127/64*2^0 | n/a | n/a |
+| name| B: Bits in the format| P: Precision in bits| E: Exponent field width in bits| 0<x<1| 1<x<Inf| minSubnormal| maxSubnormal| minNormal| maxNormal| Exact in float16?| Exact in float32?|
+| --------------| -----| -----| -----| -------| -------| ----------------| ----------------| -------------| ---------------| --------| --------|
+| p3109_k3p2sf | 3 | 2 | 1 | 1 | 1 | 0.5 | 0.5 | 1 | 1.5 | True | True |
+| ocp_e2m1 | 4 | 2 | 2 | 1 | 5 | 0.5 | 0.5 | 1 | 6 | True | True |
+| p3109_k4p2sf | 4 | 2 | 2 | 3 | 3 | 0.25 | 0.25 | 0.5 | 3 | True | True |
+| ocp_e2m3 | 6 | 4 | 2 | 7 | 23 | 0.125 | 0.875 | 1 | 7.5 | True | True |
+| ocp_e3m2 | 6 | 3 | 3 | 11 | 19 | 0.0625 | 0.1875 | 0.25 | 28 | True | True |
+| p3109_k6p3sf | 6 | 3 | 3 | 15 | 15 | 0.03125 | 0.09375 | 0.125 | 14 | True | True |
+| p3109_k6p4sf | 6 | 4 | 2 | 15 | 15 | 0.0625 | 0.4375 | 0.5 | 3.75 | True | True |
+| ocp_e4m3 | 8 | 4 | 4 | 55 | 70 | 2^-9 | 7/4*2^-7 | 0.015625 | 448 | True | True |
+| ocp_e5m2 | 8 | 3 | 5 | 59 | 63 | 2^-16 | 3/2*2^-15 | 2^-14 | 57344 | True | True |
+| p3109_k8p1se | 8 | 1 | 7 | 63 | 62 | n/a | n/a | 2^-63 | 2^62 | False | True |
+| p3109_k8p1ue | 8 | 1 | 8 | 127 | 125 | n/a | n/a | 2^-127 | 2^125 | False | True |
+| p3109_k8p3se | 8 | 3 | 5 | 63 | 62 | 2^-17 | 3/2*2^-16 | 2^-15 | 49152 | True | True |
+| p3109_k8p3sf | 8 | 3 | 5 | 63 | 63 | 2^-17 | 3/2*2^-16 | 2^-15 | 57344 | True | True |
+| p3109_k8p3ue | 8 | 3 | 6 | 127 | 125 | 2^-33 | 3/2*2^-32 | 2^-31 | 5/4*2^31 | False | True |
+| p3109_k8p3uf | 8 | 3 | 6 | 127 | 126 | 2^-33 | 3/2*2^-32 | 2^-31 | 3/2*2^31 | False | True |
+| p3109_k8p4se | 8 | 4 | 4 | 63 | 62 | 2^-10 | 7/4*2^-8 | 0.0078125 | 224 | True | True |
+| p3109_k8p4sf | 8 | 4 | 4 | 63 | 63 | 2^-10 | 7/4*2^-8 | 0.0078125 | 240 | True | True |
+| p3109_k8p4ue | 8 | 4 | 5 | 127 | 125 | 2^-18 | 7/4*2^-16 | 2^-15 | 53248 | True | True |
+| p3109_k8p4uf | 8 | 4 | 5 | 127 | 126 | 2^-18 | 7/4*2^-16 | 2^-15 | 57344 | True | True |
+| p3109_k8p7sf | 8 | 7 | 1 | 63 | 63 | 0.015625 | 63/32*2^-1 | 1 | 127/64*2^0 | True | True |
+| p3109_k8p8uf | 8 | 8 | 1 | 127 | 126 | 0.0078125 | 127/64*2^-1 | 1 | 127/64*2^0 | True | True |
+| binary16 | 16 | 11 | 5 | 15359 | 16383 | 2^-24 | 1023/512*2^-15 | 2^-14 | 65504 | True | True |
+| bfloat16 | 16 | 8 | 8 | 16255 | 16383 | 2^-133 | 127/64*2^-127 | 2^-126 | 255/128*2^127 | False | True |
+| ocp_e8m0 | 8 | 1 | 8 | 127 | 127 | n/a | n/a | 2^-127 | 2^127 | False | True |
+| ocp_int8 | 8 | 8 | 0 | 63 | 63 | 0.015625 | 127/64*2^0 | n/a | n/a | True | True |
+
 #### Notes
 
 All NaNs are the same, with no distinction between signalling or quiet,