scale_offset and cast_value codecs

Source: zarr-developers/zarr-extensions#43

Overview

Two array-to-array codecs for zarr v3, designed to work together for the common pattern of storing floating-point data as compressed integers.

scale_offset

Type: array -> array (does NOT change dtype)

Encode: out = (in - offset) * scale Decode: out = (in / scale) + offset

Parameters

offset (optional): scalar subtracted during encoding. Default: 0 (additive identity). Serialized in JSON using the zarr v3 fill-value encoding for the array's dtype.
scale (optional): scalar multiplied during encoding (after offset subtraction). Default: 1 (multiplicative identity). Same JSON encoding as offset.

Key rules

Arithmetic MUST use the input array's own data type semantics (no implicit promotion).
If any intermediate or final value is unrepresentable in that dtype, error.
If neither scale nor offset is given, configuration may be omitted (codec is a no-op).
Fill value MUST be transformed through the codec (encode direction).
Only valid for real-number data types (int/uint/float families).

JSON

{"name": "scale_offset", "configuration": {"offset": 5, "scale": 0.1}}

cast_value

Type: array -> array (CHANGES dtype)

Purpose: Value-convert (not binary-reinterpret) array elements to a new data type.

Parameters

data_type (required): target zarr v3 data type.
rounding (optional): how to round when exact representation is impossible. Values: "nearest-even" (default), "towards-zero", "towards-positive", "towards-negative", "nearest-away".
out_of_range (optional): what to do when a value is outside the target's range. Values: "clamp", "wrap". If absent, out-of-range MUST error. "wrap" only valid for integral two's-complement types.
scalar_map (optional): explicit value overrides. {"encode": [[input, output], ...], "decode": [[input, output], ...]}. Evaluated BEFORE rounding/out_of_range.

Casting procedure (same for encode and decode, swapping source/target)

Check scalar_map — if input matches a key, use mapped value.
Check exact representability — if yes, use directly.
Apply rounding and out_of_range rules.
If none apply, MUST error.

Special values

NaN propagates between IEEE 754 types unless scalar_map overrides.
Signed zero preserved between IEEE 754 types.
If target doesn't support NaN/infinity and input has them, MUST error unless scalar_map provides a mapping.

Fill value

MUST be cast using same semantics as elements.
Implementations SHOULD validate fill value survives round-trip at metadata construction time.

JSON

{
  "name": "cast_value",
  "configuration": {
    "data_type": "uint8",
    "rounding": "nearest-even",
    "out_of_range": "clamp",
    "scalar_map": {
      "encode": [["NaN", 0], ["+Infinity", 0], ["-Infinity", 0]],
      "decode": [[0, "NaN"]]
    }
  }
}

Typical combined usage

{
  "data_type": "float64",
  "fill_value": "NaN",
  "codecs": [
    {"name": "scale_offset", "configuration": {"offset": -10, "scale": 0.1}},
    {"name": "cast_value", "configuration": {
      "data_type": "uint8",
      "rounding": "nearest-even",
      "scalar_map": {"encode": [["NaN", 0]], "decode": [[0, "NaN"]]}
    }},
    "bytes"
  ]
}

Implementation notes for zarr-python

scale_offset

Subclass ArrayArrayCodec.
resolve_metadata: transform fill_value via (fill - offset) * scale, keep dtype.
_encode_single: (array - offset) * scale using numpy with same dtype.
_decode_single: (array / scale) + offset using numpy with same dtype.
is_fixed_size = True.

cast_value

Subclass ArrayArrayCodec.
resolve_metadata: change dtype to target dtype, cast fill_value.
_encode_single: cast array from input dtype to target dtype.
_decode_single: cast array from target dtype back to input dtype.
Needs the input dtype stored (from evolve_from_array_spec or resolve_metadata).
is_fixed_size = True (for fixed-size types).
Initial implementation: support rounding and out_of_range for common cases. scalar_map adds complexity but is needed for NaN handling.

Key design decisions from PR review

Encode = (in - offset) * scale (subtract, not add) — matches HDF5 and numcodecs.
No implicit precision promotion — arithmetic stays in the input dtype.
out_of_range defaults to error (not clamp).
scalar_map was added specifically to handle NaN-to-integer mappings.
Fill value must round-trip exactly through the codec chain.
Name uses underscore: scale_offset, cast_value.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

scale_offset and cast_value codecs

Overview

scale_offset

Parameters

Key rules

JSON

cast_value

Parameters

Casting procedure (same for encode and decode, swapping source/target)

Special values

Fill value

JSON

Typical combined usage

Implementation notes for zarr-python

scale_offset

cast_value

Key design decisions from PR review

Uh oh!

FilesExpand file tree

feat-v3-scale-offset-cast.md

Latest commit

History

feat-v3-scale-offset-cast.md

File metadata and controls

scale_offset and cast_value codecs

Overview

scale_offset

Parameters

Key rules

JSON

cast_value

Parameters

Casting procedure (same for encode and decode, swapping source/target)

Special values

Fill value

JSON

Typical combined usage

Implementation notes for zarr-python

scale_offset

cast_value

Key design decisions from PR review