Source: zarr-developers/zarr-extensions#43
Two array-to-array codecs for zarr v3, designed to work together for the common pattern of storing floating-point data as compressed integers.
Type: array -> array (does NOT change dtype)
Encode: out = (in - offset) * scale
Decode: out = (in / scale) + offset
offset (optional): scalar subtracted during encoding. Default: 0 (additive identity). Serialized in JSON using the zarr v3 fill-value encoding for the array's dtype.
scale (optional): scalar multiplied during encoding (after offset subtraction). Default: 1 (multiplicative identity). Same JSON encoding as offset.
- Arithmetic MUST use the input array's own data type semantics (no implicit promotion).
- If any intermediate or final value is unrepresentable in that dtype, error.
- If neither scale nor offset is given, configuration may be omitted (codec is a no-op).
- Fill value MUST be transformed through the codec (encode direction).
- Only valid for real-number data types (int/uint/float families).
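The encode/decode formulas and the same-dtype rule above can be illustrated with a minimal NumPy sketch. This is not the reference implementation: the function names are hypothetical, the sketch only covers floating-point arrays, and it assumes that floating-point overflow is what counts as an "unrepresentable" value. Integer dtypes, which the codec also allows, would need their own overflow checks.

```python
import numpy as np


def _as_dtype_scalar(value, dtype):
    # Interpret a codec parameter in the array's own dtype (no promotion).
    return dtype.type(value)


def scale_offset_encode(arr, offset=0, scale=1):
    """Sketch of encode: out = (in - offset) * scale, computed in arr.dtype."""
    if arr.dtype.kind != "f":
        raise TypeError("this sketch only covers floating-point dtypes")
    off = _as_dtype_scalar(offset, arr.dtype)
    sc = _as_dtype_scalar(scale, arr.dtype)
    # Turn silent overflow to infinity into an error.
    with np.errstate(over="raise"):
        return (arr - off) * sc


def scale_offset_decode(arr, offset=0, scale=1):
    """Sketch of decode: out = (in / scale) + offset, computed in arr.dtype."""
    if arr.dtype.kind != "f":
        raise TypeError("this sketch only covers floating-point dtypes")
    off = _as_dtype_scalar(offset, arr.dtype)
    sc = _as_dtype_scalar(scale, arr.dtype)
    with np.errstate(over="raise", divide="raise"):
        return arr / sc + off


x = np.array([5.0, 15.0, 25.0], dtype=np.float64)
encoded = scale_offset_encode(x, offset=5, scale=0.5)
decoded = scale_offset_decode(encoded, offset=5, scale=0.5)
```

Because the parameters are converted to NumPy scalars of the array's dtype before any arithmetic, a float32 input stays float32 throughout.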
{"name": "scale_offset", "configuration": {"offset": 5, "scale": 0.1}}

Type: array -> array (CHANGES dtype)
Purpose: Value-convert (not binary-reinterpret) array elements to a new data type.
data_type (required): target zarr v3 data type.
rounding (optional): how to round when exact representation is impossible. Values: "nearest-even" (default), "towards-zero", "towards-positive", "towards-negative", "nearest-away".
out_of_range (optional): what to do when a value is outside the target's range. Values: "clamp", "wrap". If absent, out-of-range MUST error. "wrap" is only valid for integral two's-complement types.
scalar_map (optional): explicit value overrides, given as {"encode": [[input, output], ...], "decode": [[input, output], ...]}. Evaluated BEFORE rounding/out_of_range.
- Check scalar_map — if input matches a key, use mapped value.
- Check exact representability — if yes, use directly.
- Apply rounding and out_of_range rules.
- If none apply, MUST error.
- NaN propagates between IEEE 754 types unless scalar_map overrides.
- Signed zero preserved between IEEE 754 types.
- If target doesn't support NaN/infinity and input has them, MUST error unless scalar_map provides a mapping.
- Fill value MUST be cast using the same semantics as array elements.
- Implementations SHOULD validate fill value survives round-trip at metadata construction time.
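The casting steps above (scalar_map first, then exact values, then rounding and out_of_range, otherwise an error) can be sketched for the float-to-integer encode direction as follows. This is an illustrative NumPy sketch under stated assumptions, not the codec's actual implementation: the names cast_float_to_int and encode_map are hypothetical, and it assumes the target integer range is exactly representable in the input float type (true for 8-, 16- and 32-bit integers cast from float64).

```python
import math
import numpy as np


def _round(x, mode):
    if mode == "nearest-even":
        return np.rint(x)
    if mode == "towards-zero":
        return np.trunc(x)
    if mode == "towards-positive":
        return np.ceil(x)
    if mode == "towards-negative":
        return np.floor(x)
    if mode == "nearest-away":
        t = np.trunc(x)
        # Ties (fraction exactly 0.5) move away from zero.
        return np.where(np.abs(x - t) >= 0.5, t + np.copysign(1.0, x), t)
    raise ValueError(f"unknown rounding mode: {mode}")


def cast_float_to_int(arr, target, rounding="nearest-even",
                      out_of_range=None, encode_map=()):
    target = np.dtype(target)
    info = np.iinfo(target)
    out = np.zeros(arr.shape, dtype=target)
    mapped = np.zeros(arr.shape, dtype=bool)

    # Step 1: scalar_map entries take priority over everything else.
    for src, dst in encode_map:
        src = float(src)  # accepts "NaN", "+Infinity", "-Infinity"
        hit = (np.isnan(arr) if math.isnan(src) else arr == src) & ~mapped
        out[hit] = dst
        mapped |= hit

    rest = arr[~mapped]
    if not np.all(np.isfinite(rest)):
        raise ValueError("NaN/infinity has no mapping in the integer target")

    # Steps 2-3: exactly representable values pass through rounding unchanged.
    rounded = _round(rest, rounding)
    outside = (rounded < info.min) | (rounded > info.max)
    if outside.any():
        if out_of_range == "clamp":
            rounded = np.clip(rounded, info.min, info.max)
        elif out_of_range == "wrap":
            span = 2.0 ** (8 * target.itemsize)
            rounded = np.mod(rounded - info.min, span) + info.min
        else:
            # Step 4: no rule applies, so this is an error.
            raise ValueError("value out of range and out_of_range is not set")
    out[~mapped] = rounded.astype(target)
    return out


values = np.array([np.nan, np.inf, -3.0, 2.5, 3.5, 300.0])
special = [["NaN", 0], ["+Infinity", 0], ["-Infinity", 0]]
result = cast_float_to_int(values, "uint8", out_of_range="clamp",
                           encode_map=special)
```

With "nearest-even" rounding, 2.5 becomes 2 and 3.5 becomes 4; with "clamp", -3 and 300 are pinned to the uint8 limits 0 and 255.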
{
"name": "cast_value",
"configuration": {
"data_type": "uint8",
"rounding": "nearest-even",
"out_of_range": "clamp",
"scalar_map": {
"encode": [["NaN", 0], ["+Infinity", 0], ["-Infinity", 0]],
"decode": [[0, "NaN"]]
}
}
}

{
"data_type": "float64",
"fill_value": "NaN",
"codecs": [
{"name": "scale_offset", "configuration": {"offset": -10, "scale": 0.1}},
{"name": "cast_value", "configuration": {
"data_type": "uint8",
"rounding": "nearest-even",
"scalar_map": {"encode": [["NaN", 0]], "decode": [[0, "NaN"]]}
}},
"bytes"
]
}

- Subclass ArrayArrayCodec.
- resolve_metadata: transform fill_value via (fill - offset) * scale, keep dtype.
- _encode_single: (array - offset) * scale using numpy with same dtype.
- _decode_single: (array / scale) + offset using numpy with same dtype.
- is_fixed_size = True.
- Subclass ArrayArrayCodec.
- resolve_metadata: change dtype to target dtype, cast fill_value.
- _encode_single: cast array from input dtype to target dtype.
- _decode_single: cast array from target dtype back to input dtype.
- Needs the input dtype stored (from evolve_from_array_spec or resolve_metadata).
- is_fixed_size = True (for fixed-size types).
- Initial implementation: support rounding and out_of_range for common cases. scalar_map adds complexity but is needed for NaN handling.
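The point about storing the input dtype matters because decoding has to cast back to a type that the codec configuration itself does not record. The sketch below illustrates the idea with a plain dataclass. It is a hypothetical stand-in: CastValueSketch and with_input_dtype are invented names, and the class deliberately does not subclass zarr-python's ArrayArrayCodec or reproduce its method signatures. It also assumes an integer-to-float decode, where the cast is exact and only the decode side of scalar_map needs handling.

```python
from dataclasses import dataclass, replace
from typing import Optional

import numpy as np


@dataclass(frozen=True)
class CastValueSketch:
    """Hypothetical stand-in for a cast_value codec (decode side only)."""

    target_dtype: np.dtype
    decode_map: tuple = ()
    input_dtype: Optional[np.dtype] = None

    def with_input_dtype(self, input_dtype):
        # Return a copy that remembers the dtype seen before encoding.
        return replace(self, input_dtype=np.dtype(input_dtype))

    def decode(self, stored):
        if self.input_dtype is None:
            raise RuntimeError("input dtype unknown; cannot cast back")
        out = stored.astype(self.input_dtype)  # astype returns a copy
        # Apply the decode overrides against the stored values.
        for src, dst in self.decode_map:
            out[stored == src] = float(dst)
        return out


codec = CastValueSketch(np.dtype("uint8"), decode_map=((0, "NaN"),))
codec = codec.with_input_dtype("float64")
decoded = codec.decode(np.array([0, 5, 255], dtype=np.uint8))
```

Here the stored value 0 decodes to NaN through the decode map, while 5 and 255 are cast back to float64 unchanged.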
- Encode = (in - offset) * scale (subtract, not add), which matches HDF5 and numcodecs.
- No implicit precision promotion: arithmetic stays in the input dtype.
- out_of_range defaults to error (not clamp).
- scalar_map was added specifically to handle NaN-to-integer mappings.
- Fill value must round-trip exactly through the codec chain.
- Name uses underscore: scale_offset, cast_value.
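
The fill-value round-trip requirement can be checked on a single scalar. The following sketch hard-codes the combined configuration shown earlier (float64 input, offset -10, scale 0.1, uint8 target, nearest-even rounding, NaN mapped to 0 and back, no out_of_range). The helper names encode_fill, decode_fill and fill_round_trips are hypothetical, and the code is an illustration of the check rather than how any implementation performs it.

```python
import math

import numpy as np

OFFSET, SCALE = np.float64(-10), np.float64(0.1)
ENCODE_MAP = [["NaN", 0]]
DECODE_MAP = [[0, "NaN"]]


def encode_fill(fill):
    # scale_offset in float64, then cast_value to uint8.
    x = (np.float64(fill) - OFFSET) * SCALE
    for src, dst in ENCODE_MAP:
        src = float(src)
        if (math.isnan(src) and math.isnan(x)) or x == src:
            return np.uint8(dst)
    r = np.rint(x)  # nearest-even
    if not (0 <= r <= 255):  # also rejects an unmapped NaN
        raise ValueError("fill value is out of range for uint8")
    return np.uint8(int(r))


def decode_fill(stored):
    # cast_value back to float64, then scale_offset decode.
    y = np.float64(stored)
    for src, dst in DECODE_MAP:
        if float(stored) == float(src):
            y = np.float64(float(dst))
    return y / SCALE + OFFSET


def fill_round_trips(fill):
    back = decode_fill(encode_fill(fill))
    return bool(back == fill or (math.isnan(back) and math.isnan(fill)))


nan_ok = fill_round_trips(float("nan"))
zero_ok = fill_round_trips(0.0)
minus_ten_ok = fill_round_trips(-10.0)
```

Under this configuration NaN round-trips, but -10.0 does not: it encodes to the stored value 0, which the decode map turns into NaN. A check like this at metadata construction time would flag such a fill value.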