This document explains the complete journey from your input data to a finished Aztec barcode.
Input Text/Data
│
▼
┌─────────────────┐
│ Data Encoding │ Convert text to bits using modes
└────────┬────────┘
│
▼
┌─────────────────┐
│ Configuration │ Choose symbol size based on data length
│ Selection │
└────────┬────────┘
│
▼
┌─────────────────┐
│ Codeword │ Pack bits into fixed-width codewords
│ Packing │ with "stuff bits"
└────────┬────────┘
│
▼
┌─────────────────┐
│ Reed-Solomon │ Calculate error correction codewords
│ Encoding │
└────────┬────────┘
│
▼
┌─────────────────┐
│ Matrix Building │ Arrange everything into the symbol
└────────┬────────┘
│
▼
AztecSymbol
The first step converts your input into a stream of bits.
Text is encoded using "modes" that assign short bit patterns to common characters. For example:
- Uppercase letters (A-Z): 5 bits each in "Upper" mode
- Lowercase letters (a-z): 5 bits each in "Lower" mode
- Digits (0-9): 4 bits each in "Digit" mode
The encoder automatically switches between modes to minimize the total bits. See Mode Encoding for details.
Example: "ABC" in Upper mode
A = 2 → 00010 (5 bits)
B = 3 → 00011 (5 bits)
C = 4 → 00100 (5 bits)
─────────────────────
Total: 15 bits
Raw bytes are encoded in "Byte" mode:
[Byte mode switch] + [length] + [raw bytes]
5 bits 5 or 16 8 bits each
Based on how many bits your data needs, the library picks the smallest symbol that will fit.
The selection considers:
- Payload size - How many bits of data you have
- Error correction level - How much redundancy you want
- Symbol preference - Compact vs. full symbols
Each symbol size has a fixed total capacity in codewords. Those codewords are split between:
- Data codewords: Your actual information
- Parity codewords: Error correction
Example: Compact 1-layer symbol
Total capacity: 17 codewords × 6 bits = 102 bits
With 23% EC: ~13 data + ~4 parity codewords
If your data doesn't fit, the next larger symbol is tried.
The bit stream is divided into fixed-width codewords (6, 8, 10, or 12 bits depending on symbol size).
Here's a clever trick: Aztec codes add an extra bit to each codeword to avoid problematic patterns.
Why? A codeword of all 0s or all 1s could be confused with empty space or a solid block. So:
- If the data bits are all 0s → add a 1 at the end
- If the data bits are all 1s → add a 0 at the end
- Otherwise → add the next data bit
Example with 6-bit codewords (5 data bits + 1 stuff bit):
Data bits: 00000 → Codeword: 000001 (stuffed 1)
Data bits: 11111 → Codeword: 111110 (stuffed 0)
Data bits: 10101 + next bit 1 → Codeword: 101011 (normal)
This means each codeword carries width - 1 bits of actual data.
Now comes the "magic" part—adding error correction.
Reed-Solomon encoding takes your data codewords and calculates additional "parity" codewords. If any codewords get damaged later, the math lets us reconstruct the originals.
Example:
Data codewords: [D1, D2, D3, D4, D5, D6, D7, D8, D9, D10]
+
Reed-Solomon calculation
↓
Parity codewords: [P1, P2, P3]
Final codewords: [D1, D2, D3, D4, D5, D6, D7, D8, D9, D10, P1, P2, P3]
The number of parity codewords determines how many errors can be corrected:
- Can correct up to
floor(parity_count / 2)codeword errors - Can detect up to
parity_counterrors
See Reed-Solomon for the mathematical details.
The final step assembles all the pieces into the 2D symbol.
- Finder Pattern: The central bull's eye
- Orientation Marks: Direction indicators
- Mode Message: Symbol metadata with its own error correction
- Reference Grid: Alignment lines (full symbols only)
- Data Layers: The codewords in a spiral pattern
The mode message tells scanners about the symbol structure:
Compact (28 bits total):
[2 bits: layers-1] [6 bits: data_codewords-1] [20 bits: RS parity]
Full (40 bits total):
[5 bits: layers-1] [11 bits: data_codewords-1] [24 bits: RS parity]
Codewords are placed in a counter-clockwise spiral pattern starting just outside the mode message ring:
← ← ← ← ←
↓ ↑
→ → ┌───────┐ ↑
↑ │Finder │ ↑
↑ │Pattern│ ↑
↑ └───────┘ →
↑ → → →
Each "layer" of the spiral is 2 modules wide, with codeword bits placed MSB-first (most significant bit first).
The completed matrix is a grid of boolean values (true = black, false = white).
For output, rows are packed into bytes:
LSB-first (default):
Modules: [0][1][2][3][4][5][6][7]
Byte: bit0 bit1 bit2 bit3 bit4 bit5 bit6 bit7
MSB-first (PNG-compatible):
Modules: [0][1][2][3][4][5][6][7]
Byte: bit7 bit6 bit5 bit4 bit3 bit2 bit1 bit0
Let's trace "Hi" through the pipeline:
Input: "Hi"
Step 1 - Data Encoding:
H in Upper mode = 9 → 01001 (5 bits)
Switch to Lower mode → 11100 (5 bits)
i in Lower mode = 10 → 01010 (5 bits)
Total: 15 bits
Step 2 - Configuration:
15 bits + error correction → Compact 1-layer (15×15)
Codeword width: 6 bits
Step 3 - Codeword Packing:
Bits: 010011110001010
Groups of 5 + stuff bit:
01001 + 1 → 010011
11100 + 0 → 111000
01010 + padding → stuffed
Step 4 - Reed-Solomon:
Data codewords + parity codewords calculated
Step 5 - Matrix Building:
Place finder (9×9 bull's eye)
Place mode message (28 bits around finder)
Place data in layer 1
Step 6 - Export:
15×15 matrix → 2 bytes per row × 15 rows = 30 bytes
- Mode Encoding - Deep dive into text compression
- Galois Fields - The math foundation
- Reed-Solomon - Error correction details
- Symbol Layout - Matrix structure