I2_S Quantization Format

This note documents the I2_S weight layout used by bitnet.cpp so alternative runtimes and tooling can parse the format without reverse-engineering the kernels.

The description below is derived from the packing logic in src/ggml-bitnet-mad.cpp, especially quantize_i2_s().

What `I2_S` stores

I2_S is the ternary weight format used by the CPU inference path.

- Logical weight values are ternary: -1, 0, or +1.
- Each logical value is encoded in 2 bits.
- Each tensor also stores one trailing float32 scale value.
- The serialized buffer reserves 32 extra bytes after the packed payload; the scale is written at the start of that region and the remainder is alignment padding.

At packing time the implementation first maps floating-point values into 2-bit symbols:

- 0 means -1
- 1 means 0
- 2 means +1

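A value whose magnitude falls below a small epsilon is treated as zero (symbol 1). For any other value only the sign is kept: negative values become symbol 0 and positive values become symbol 2.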
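The mapping above can be sketched as a pair of helper functions. This is a minimal illustration rather than the code in quantize_i2_s(): the function names are hypothetical, and the threshold `kZeroEpsilon = 1e-6` is an assumption, since this note only says "a small epsilon".

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Assumed threshold; check the packing source for the real value.
constexpr double kZeroEpsilon = 1e-6;

// Map a floating-point weight to its 2-bit symbol (0 -> -1, 1 -> 0, 2 -> +1).
inline uint8_t encode_ternary_symbol(float w) {
    if (std::fabs(static_cast<double>(w)) < kZeroEpsilon) {
        return 1;  // near-zero values are stored as "0"
    }
    return w > 0.0f ? 2 : 0;  // otherwise only the sign is kept
}

// Inverse mapping from a 2-bit symbol back to the ternary value.
inline int decode_ternary_symbol(uint8_t symbol) {
    assert(symbol <= 2);  // the mapping above never produces symbol 3
    return static_cast<int>(symbol) - 1;
}
```

Because the symbols are 0, 1 and 2, decoding reduces to subtracting 1.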

CPU-dependent packing granularity

QK_I2_S depends on the active CPU backend in src/ggml-bitnet-mad.cpp:

- x86 / AVX / SSSE3 paths use QK_I2_S = 128
- ARM NEON paths use QK_I2_S = 64

The packing pattern is the same on both backends: each output byte stores four 2-bit symbols, one from each of the block's four groups. The only difference is whether the groups are 32-wide (128 = 4 x 32) or 16-wide (64 = 4 x 16).

x86 layout (`QK_I2_S = 128`)

For x86, one 32-byte block stores 128 ternary values split into 4 groups of 32.

For logical index j inside a 128-value block:

group_idx = j / 32
group_pos = j % 32

The packer writes:

byte_index = block_base + group_pos
shift = 6 - 2 * group_idx
packed_byte |= value << shift

So the byte at block offset group_pos holds the elements at block-local indices:

[group_pos, 32 + group_pos, 64 + group_pos, 96 + group_pos]

with the bit layout:

bits[7:6] -> element at group_pos + 0
bits[5:4] -> element at group_pos + 32
bits[3:2] -> element at group_pos + 64
bits[1:0] -> element at group_pos + 96
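As a sketch of the x86 packing rule above, the following function packs one 128-symbol block into 32 bytes. The name `pack_block_x86` and the use of `std::array` are illustrative choices, not taken from the source file; the index arithmetic follows the formulas in this section.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Pack 128 symbols (each 0, 1 or 2) into one 32-byte block, x86 layout.
inline std::array<uint8_t, 32> pack_block_x86(const std::array<uint8_t, 128>& symbols) {
    std::array<uint8_t, 32> block{};  // zero-initialised so OR-ing is safe
    for (std::size_t j = 0; j < symbols.size(); ++j) {
        assert(symbols[j] <= 2);
        const std::size_t group_idx = j / 32;  // selects the 2-bit field
        const std::size_t group_pos = j % 32;  // selects the byte in the block
        const unsigned shift = 6 - 2 * static_cast<unsigned>(group_idx);
        block[group_pos] |= static_cast<uint8_t>(symbols[j] << shift);
    }
    return block;
}
```

For example, if the symbols at block-local indices 5, 37, 69 and 101 are 2, 0, 1 and 2, byte 5 of the block becomes 0b10000110 (0x86).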

ARM layout (`QK_I2_S = 64`)

For ARM NEON, one 16-byte block stores 64 ternary values split into 4 groups of 16.

For logical index j inside a 64-value block:

group_idx = j / 16
group_pos = j % 16

The packer writes:

byte_index = block_base + group_pos
shift = 6 - 2 * group_idx
packed_byte |= value << shift

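So the byte at block offset group_pos holds the elements at block-local indices:

[group_pos, 16 + group_pos, 32 + group_pos, 48 + group_pos]

with the same bit ordering:

bits[7:6] -> element at group_pos + 0
bits[5:4] -> element at group_pos + 16
bits[3:2] -> element at group_pos + 32
bits[1:0] -> element at group_pos + 48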
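Reading the ARM layout in the opposite direction, a 16-byte block can be expanded back into 64 symbols in logical order. This is a minimal sketch with a hypothetical function name; it assumes `block` points at exactly one 16-byte block.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Expand one 16-byte ARM-layout block into 64 symbols in logical order.
inline std::array<uint8_t, 64> unpack_block_arm(const uint8_t* block) {
    assert(block != nullptr);
    std::array<uint8_t, 64> symbols{};
    for (std::size_t group_pos = 0; group_pos < 16; ++group_pos) {
        for (std::size_t group_idx = 0; group_idx < 4; ++group_idx) {
            const unsigned shift = 6 - 2 * static_cast<unsigned>(group_idx);
            // Field group_idx of byte group_pos is logical element
            // group_idx * 16 + group_pos.
            symbols[group_idx * 16 + group_pos] =
                static_cast<uint8_t>((block[group_pos] >> shift) & 0x3);
        }
    }
    return symbols;
}
```

A byte value of 0x86 at offset 3 therefore yields symbols 2, 0, 1 and 2 at logical indices 3, 19, 35 and 51.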

Scale storage

After the packed 2-bit payload, quantize_i2_s() stores one float32 scale:

scale_ptr = (float *)((char *)packed_weights + n / 4)
scale_ptr[0] = i2_scale

The function then returns:

nrow * row_size / 4 + 32

The trailing 32 bytes leave room for the 4-byte scale plus padding that keeps the serialized tensor aligned. If you are building a parser, treat the first n / 4 bytes as the packed payload (n is the total number of logical elements, at 2 bits each) and read the float32 scale at byte offset n / 4, immediately after the payload. The rest of the reserved region is padding.
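A parser might split a serialized tensor into payload and scale along these lines. The `I2SView` struct and `read_i2s_tensor` function are hypothetical names introduced for illustration. The sketch assumes the buffer was written on a host with the same float byte order, and it only checks that the scale itself fits, not the full 32-byte reservation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical read-only view over one serialized I2_S tensor.
struct I2SView {
    const uint8_t* payload;     // packed 2-bit symbols
    std::size_t payload_bytes;  // n / 4
    float scale;                // single tensor-wide scale
};

// n is the total number of logical weights in the tensor.
inline I2SView read_i2s_tensor(const uint8_t* buf, std::size_t buf_bytes, std::size_t n) {
    assert(buf != nullptr);
    assert(n % 4 == 0);
    const std::size_t payload_bytes = n / 4;
    assert(buf_bytes >= payload_bytes + sizeof(float));
    float scale = 0.0f;
    // memcpy sidesteps alignment and strict-aliasing concerns that a
    // direct float* cast would raise in a standalone parser.
    std::memcpy(&scale, buf + payload_bytes, sizeof(float));
    return I2SView{buf, payload_bytes, scale};
}
```

For a 128-element tensor this reads 32 payload bytes and then the scale at byte offset 32.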

Practical decoding recipe

To decode a logical element:

1. Choose the backend block size (128 for x86, 64 for ARM NEON).
2. Compute the block-local group and position.
3. Read the corresponding byte.
4. Extract the 2-bit symbol: shift the byte right by the computed shift and mask with 0x3.
5. Map the symbol back to ternary:
   - 0 -> -1
   - 1 -> 0
   - 2 -> +1

For x86:

block = k / 128
pos = k % 128
group = pos / 32
lane = pos % 32
byte_offset = block * 32 + lane
shift = 6 - 2 * group

For ARM:

block = k / 64
pos = k % 64
group = pos / 16
lane = pos % 16
byte_offset = block * 16 + lane
shift = 6 - 2 * group
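Both recipes differ only in the group width, so they can be folded into one function parameterised by block size. The sketch below uses a hypothetical name, `decode_i2s_element`, and assumes `payload` points at the start of the packed data and that k is within range.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Decode logical element k to -1, 0 or +1.
// block_size is 128 for the x86 layout or 64 for the ARM NEON layout.
inline int decode_i2s_element(const uint8_t* payload, std::size_t k, std::size_t block_size) {
    assert(payload != nullptr);
    assert(block_size == 128 || block_size == 64);
    // Group width (32 or 16) is also the number of bytes per block.
    const std::size_t group_width = block_size / 4;
    const std::size_t block = k / block_size;
    const std::size_t pos = k % block_size;
    const std::size_t group = pos / group_width;
    const std::size_t lane = pos % group_width;
    const std::size_t byte_offset = block * group_width + lane;
    const unsigned shift = 6 - 2 * static_cast<unsigned>(group);
    const unsigned symbol = (payload[byte_offset] >> shift) & 0x3u;
    assert(symbol <= 2);  // symbol 3 is not part of the mapping
    return static_cast<int>(symbol) - 1;  // 0 -> -1, 1 -> 0, 2 -> +1
}
```

The same byte is interpreted differently depending on `block_size`, so a parser has to know which backend produced the file before it decodes anything.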

Related source files

- src/ggml-bitnet-mad.cpp
- include/ggml-bitnet.h
- utils/convert-hf-to-gguf-bitnet.py

If this format changes, update this document alongside the packing implementation.
