From fa5c9bfe6a1e92031ddb2efcd6333e075e88517d Mon Sep 17 00:00:00 2001 From: Frederic Charette Date: Thu, 1 Jan 2026 20:08:46 -0500 Subject: [PATCH 01/14] First draft --- README.md | 231 +++++++++++++++++++++++++++++++++++++++++++++++++++++- 1 file changed, 229 insertions(+), 2 deletions(-) diff --git a/README.md b/README.md index b0027c7..4ce48ff 100644 --- a/README.md +++ b/README.md @@ -1,2 +1,229 @@ -# protocol -Description of the Compactr binary protocol +--- +title: Compactr Format Specification v1.0 +author: +- name: Frederic Charette + role: maintainer + email: fredericcharette@gmail.com +date: 2026-01-01 +area: API +workgroup: Compactr +keyword: +- serialization +- open-api +--- + +## Abstract + +This document specifies the compactr format, a schema-based serialization protocol which aims to reuse existing [OpenAPI](https://spec.openapis.org/oas/v3.1.2.html) specifications as shemas. + +## Status + +The specification is Stable as of this publication's release. + +## Table of Contents + +[1. Background](#1_Background) + +[2. Design decisions](#2_Design_decisions) + +[3. Schemas](#3_Schemas) + +[4. Primitive types](#4_Primitive_types) + +[5. Complex schemas](#5_Complex_schemas) + +[6. Variants](#6_Variants) + +[7. Implementation considerations](#7_Implementation_considerations) + +[8. Security considerations](#8_Security_considerations) + +[9. References](#9_References) + +--- + +## 1. Background + +## 2. Design decisions + +## 3. Schemas + +## 4. Primitive types + +## 5. Complex schemas + +## 6. Variants + +## 7. Implementation considerations + +## 8. Security considerations + +## 9. References + +[OAS] OpenAPI Specification, The OpenAPI initiative, + + + + + + +## Key Characteristics + +1. **Big-endian byte order** for all multi-byte integers +2. **UTF-8 encoding** for strings (2-byte length prefix + UTF-8 bytes) +3. **Interleaved structure** for objects (index, size, value, index, size, value, ...) +4. 
**Alphabetical property indexing** for deterministic encoding +5. **Value insertion order** preserved in encoded output + +## Primitive Types + +### Boolean (1 byte) +- `true` → `0x01` +- `false` → `0x00` + +### Int32 (4 bytes, big-endian signed) +- `0` → `0x00 0x00 0x00 0x00` +- `42` → `0x00 0x00 0x00 0x2a` +- `-1` → `0xff 0xff 0xff 0xff` + +### Int64 (8 bytes, IEEE 754 double) +**Note:** Due to JavaScript limitations, int64 values are encoded as IEEE 754 double (f64). +- `0` → `0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00` +- `42` → `0x40 0x45 0x00 0x00 0x00 0x00 0x00 0x00` +- `9007199254740991` (MAX_SAFE_INTEGER) → `0x43 0x3f 0xff 0xff 0xff 0xff 0xff 0xff` + +### Float (4 bytes, big-endian IEEE 754) +- `0.0` → `0x00 0x00 0x00 0x00` +- `1.0` → `0x3f 0x80 0x00 0x00` +- `3.14` → `0x40 0x48 0xf5 0xc3` + +### Double (8 bytes, big-endian IEEE 754) +- `0.0` → `0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00` +- `1.0` → `0x3f 0xf0 0x00 0x00 0x00 0x00 0x00 0x00` + +### String (2 bytes length + UTF-8 bytes) +- Empty string `""` → `0x00 0x00` +- `"A"` → `0x00 0x01 0x41` (length=1, UTF-8 'A'=0x41) +- `"Hello"` → `0x00 0x05 0x48 0x65 0x6c 0x6c 0x6f` + +**In object properties:** Strings are encoded as raw UTF-8 bytes (no length prefix). + +## Special Formats + +### UUID (16 bytes, raw bytes) +Standard UUID format, no hyphens: +- `550e8400-e29b-41d4-a716-446655440000` → 16 raw bytes + +### DateTime (9 bytes: component format) +- 2 bytes: year (u16 big-endian) +- 1 byte: month (1-12) +- 1 byte: day (1-31) +- 1 byte: hour (0-23) +- 1 byte: minute (0-59) +- 1 byte: second (0-59) +- 2 bytes: milliseconds (u16 big-endian, 0-999) + +Example: `2024-01-15T10:30:00.000Z` → `0x07 0xe8 0x01 0x0f 0x0a 0x1e 0x00 0x00 0x00` + +### Date (4 bytes: days since Unix epoch, i32 big-endian) +- `1970-01-01` → `0x00 0x00 0x00 0x00` +- `2024-01-01` → `0x00 0x00 0x4e 0x94` + +### IPv4 (4 bytes, network order) +- `192.168.1.1` → `0xc0 0xa8 0x01 0x01` + +### IPv6 (16 bytes, network order) +- `::1` → `0x00 0x00 ... 
0x00 0x01` (15 zeros + 1) + +### Binary (4 bytes length + raw data) +- 4 bytes: length (u32 big-endian) +- N bytes: raw binary data + +## Array Format + +Arrays encode each element with a 1-byte size prefix: + +``` +[element1_size, element1_data, element2_size, element2_data, ...] +``` + +### Example: `[1, 2, 3]` (int32 array) +``` +0x04 0x00 0x00 0x00 0x01 // size=4, value=1 +0x04 0x00 0x00 0x00 0x02 // size=4, value=2 +0x04 0x00 0x00 0x00 0x03 // size=4, value=3 +``` + +## Object Format + +Objects use an **interleaved structure** where each property is encoded as: +`[index, size, value]` + +### Structure +``` +[num_props, index0, size0, value0, index1, size1, value1, ...] +``` + +- **num_props** (1 byte): Number of properties present +- **index** (1 byte): Alphabetical index of property in schema +- **size** (variable): Size encoding depends on type +- **value**: Encoded property value + +### Property Indexing + +Properties are indexed **alphabetically by name** (not schema insertion order): +- Schema `{id: ..., name: ..., email: ...}` → alphabetically: `email=0, id=1, name=2` + +### Size Encoding + +Different types use different size encodings: + +**Compound types (Array, Object):** +- Always use `0x00` prefix +- Then: single byte if size < 256, else u16 big-endian + +**Primitive types:** +- Single byte if size < 256 +- Else: `0x00` prefix + u16 big-endian + +**Strings in objects:** +- Raw UTF-8 bytes (no length prefix) +- Size field indicates byte count + +### Example: `{x: 10, y: 20}` with schema `{x: int32, y: int32}` + +``` +0x02 // 2 properties +0x00 0x04 // x (index 0), size 4 +0x00 0x00 0x00 0x0a // x = 10 +0x01 0x04 // y (index 1), size 4 +0x00 0x00 0x00 0x14 // y = 20 +``` + +### Property Order + +**Important:** Properties are encoded in the order they appear in the **value object** (insertion order), but use alphabetical indices from the schema. 
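Here is a minimal JavaScript sketch of the per-element array layout described above. It assumes int32 elements and a one-byte size prefix per element, as in this draft. `encodeInt32Array` and `hex` are hypothetical helper names, not part of compactr.js.

```javascript
// Hypothetical helper: encodes an int32 array as a sequence of
// [size][data] pairs, with a 1-byte size (always 4) before each element.
function encodeInt32Array(values) {
  const out = new Uint8Array(values.length * 5);
  const view = new DataView(out.buffer);
  values.forEach((value, i) => {
    const offset = i * 5;
    view.setUint8(offset, 4);                // element size in bytes
    view.setInt32(offset + 1, value, false); // big-endian int32 payload
  });
  return out;
}

// Formats bytes in the "0x.." style used by the examples in this document.
function hex(bytes) {
  return Array.from(bytes, (b) => "0x" + b.toString(16).padStart(2, "0")).join(" ");
}

console.log(hex(encodeInt32Array([1, 2, 3])));
```

Running it prints the same fifteen bytes as the `[1, 2, 3]` example. `DataView` is used because its setters take an explicit endianness flag, which keeps the big-endian requirement visible in the code.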
+ +Example: +```javascript +// Value: {email: "a@b.com", id: 1, name: "Alice"} +// Alphabetical indices: email=0, id=1, name=2 +// Encoded order: email (idx 0), id (idx 1), name (idx 2) - follows value insertion +``` + +### Optional Properties + +Missing optional properties are simply omitted from the encoding. Only present properties are encoded. + +## Wrapper Format (v3.x) + +In compactr.js v3.x, all top-level values are wrapped in objects: + +```javascript +// To encode the number 42: +schema({ value: { type: 'int32' } }).write({ value: 42 }) + +// Result: Object with one property 'value' = 42 +``` + +This is different from v2.x which allowed standalone primitives. From 9f45b417ee4be1fc7bb60af006d62e6b54861717 Mon Sep 17 00:00:00 2001 From: Frederic Charette Date: Fri, 2 Jan 2026 12:03:38 -0500 Subject: [PATCH 02/14] Update README.md --- README.md | 50 +++++++++++++++++++++++++++++++------------------- 1 file changed, 31 insertions(+), 19 deletions(-) diff --git a/README.md b/README.md index 4ce48ff..444f09d 100644 --- a/README.md +++ b/README.md @@ -1,15 +1,16 @@ ---- -title: Compactr Format Specification v1.0 -author: -- name: Frederic Charette - role: maintainer - email: fredericcharette@gmail.com -date: 2026-01-01 -area: API -workgroup: Compactr -keyword: +# Compactr Format Specification v1.0 + +Authors: +- Frederic Charette + +Date published: 2026-01-01 + +Last update: 2026-01-02 + +Keywords: - serialization - open-api + --- ## Abstract @@ -22,28 +23,39 @@ The specification is Stable as of this publication's release. ## Table of Contents -[1. Background](#1_Background) +[1. Background](#1-Background) -[2. Design decisions](#2_Design_decisions) +[2. Design decisions](#2-Design-decisions) -[3. Schemas](#3_Schemas) +[3. Schemas](#3-Schemas) -[4. Primitive types](#4_Primitive_types) +[4. Primitive types](#4-Primitive-types) -[5. Complex schemas](#5_Complex_schemas) +[5. Complex schemas](#5-Complex-schemas) -[6. Variants](#6_Variants) +[6. 
Variants](#6-Variants) -[7. Implementation considerations](#7_Implementation_considerations) +[7. Implementation considerations](#7-Implementation-considerations) -[8. Security considerations](#8_Security_considerations) +[8. Security considerations](#8-Security-considerations) -[9. References](#9_References) +[9. References](#9-References) --- ## 1. Background +Serialization in the context of Web APIs refers to the process of converting data structures into a format that can be easily transmitted over a network, typically in formats such as TEXT (ex: JSON, XML), or BINARY (ex: Files, [Protobuf](https://protobuf.dev/), so that they can be understood and reconstructed by other systems. + +A schema-based serialization approach enforces a predefined structure for data, ensuring consistency and validation, while a schema-less approach allows for more flexible and dynamic data representation, with fewer constraints on how data is organized. + +Schema-based serialization protocols generally yield much smaller outputs, which is desirable to limit bandwidth and costs. The caveat to schema-based serialization is the cost of creating and maintaining schemas across multiple systems. + +The initial concept for the compactr protocol was drafted in [2016](https://www.npmjs.com/package/compactr/v/0.0.1) with the goal of creating a schema-based serialization protocol that outputs minimal binary while using first party markdown or code structures as schemas. + +While functional, the early versions would still require the knowledge of writing "compactr-style" schemas as Javascript Objects or JSON and limited adoption for languages outside of Javascript. As of compactr.js 3.0, release in 2025, the protocol moved to adopt OpenAPI 3.x as the base format for compactr schemas. + + ## 2. Design decisions ## 3. 
Schemas From 4f1bef4c6511572d5aa42f46fc65ae9dae515ed8 Mon Sep 17 00:00:00 2001 From: Frederic Charette Date: Fri, 2 Jan 2026 14:37:24 -0500 Subject: [PATCH 03/14] Update README.md --- README.md | 96 ++++++++++++++++++++++++++++++++++++++++++++++++------- 1 file changed, 84 insertions(+), 12 deletions(-) diff --git a/README.md b/README.md index 444f09d..5afcbd0 100644 --- a/README.md +++ b/README.md @@ -15,7 +15,7 @@ Keywords: ## Abstract -This document specifies the compactr format, a schema-based serialization protocol which aims to reuse existing [OpenAPI](https://spec.openapis.org/oas/v3.1.2.html) specifications as shemas. +This document specifies the compactr format, a schema-based serialization protocol which aims to reuse existing [[OAS]](https://spec.openapis.org/oas/v3.1.2.html)OpenAPI specifications as schemas. ## Status @@ -23,29 +23,39 @@ The specification is Stable as of this publication's release. ## Table of Contents -[1. Background](#1-Background) +- [1. Background](#1-Background) -[2. Design decisions](#2-Design-decisions) +- [2. Design decisions](#2-Design-decisions) -[3. Schemas](#3-Schemas) + - [2.1 Byte-order](#2-1-Byte-order) -[4. Primitive types](#4-Primitive-types) + - [2.2 Key limits](#2-2-Key-limits) -[5. Complex schemas](#5-Complex-schemas) + - [2.3 Size limits](#2-3-Size-limits) -[6. Variants](#6-Variants) + - [2.4 Schema properties and Encoding order](#2-4-Schema-properties-and-Encoding-order) -[7. Implementation considerations](#7-Implementation-considerations) + - [2.5 Versionning](#2-5-Versionning) -[8. Security considerations](#8-Security-considerations) +- [3. Schemas](#3-Schemas) -[9. References](#9-References) +- [4. Primitive types](#4-Primitive-types) + +- [5. Complex schemas](#5-Complex-schemas) + +- [6. Variants](#6-Variants) + +- [7. Implementation considerations](#7-Implementation-considerations) + +- [8. Security considerations](#8-Security-considerations) + +- [9. References](#9-References) --- ## 1. 
Background -Serialization in the context of Web APIs refers to the process of converting data structures into a format that can be easily transmitted over a network, typically in formats such as TEXT (ex: JSON, XML), or BINARY (ex: Files, [Protobuf](https://protobuf.dev/), so that they can be understood and reconstructed by other systems. +Serialization in the context of Web APIs refers to the process of converting data structures into a format that can be easily transmitted over a network, typically in formats such as TEXT (ex: JSON, XML), or BINARY (ex: Files, [Protobuf](https://protobuf.dev/)), so that they can be understood and reconstructed by other systems. A schema-based serialization approach enforces a predefined structure for data, ensuring consistency and validation, while a schema-less approach allows for more flexible and dynamic data representation, with fewer constraints on how data is organized. @@ -55,9 +65,67 @@ The initial concept for the compactr protocol was drafted in [2016](https://www. While functional, the early versions would still require the knowledge of writing "compactr-style" schemas as Javascript Objects or JSON and limited adoption for languages outside of Javascript. As of compactr.js 3.0, release in 2025, the protocol moved to adopt OpenAPI 3.x as the base format for compactr schemas. +--- ## 2. Design decisions +The primary objectives of the Compactr protocols are: + +- First-party schema definitions, using [[OAS]](https://spec.openapis.org/oas/v3.1.2.html)OpenAPI specifications as base schemas. +- Optimized binary output +- Compatibility across runtimes +- Type safety + +In order to meet these objectives, some key design decisions were made: + +### 2.1 Byte-order + +Compactr binary follows Network byte order (NBO) big-endian format. + +### 2.2 Key limits + +Indices for properties are assigned a numeric value which is stored as an unsigned 8bit integer. Thus limiting the number of properties per object to 255. 
+ +### 2.3 Size limits + +Some primitive types (ex: `Boolean`) have static sizes, which are not encoded, while others (ex: `String`) have dynamic sizes. + +Dynamically-sized properties have varying size limits, which are described in the [primitives](#4-primitives) section of this document. + + +### 2.4 Schema properties and Encoding order + +To maintain consistency across systems, the field index for each schema properties is based on it's alphabetical order, starting from 1. + +The implementation of this sort function must be based on the numerical sorting of Unicode (UTF-16) character code values of the property name. + +Example: + +```json + +{ + "type": "object", + "properties": { + "a": { "type": "boolean" }, + "c": { "type": "boolean" }, + "b": { "type": "boolean" } + } +} + +``` +Will attribute index 0 to field `a`, index 1 to field `b` and 2 for `c`. Implementations of this protocol must follow this sorting rule to maintain consistency, even if properties are listed in differring orders across systems. + +Encoding of values to generate the binary output simply follows the order in which the properties are listed in the structure or object. + +For example, serializing `{ c: true, a: true, b: true }` with the previous schema will output: `0x03 0x01 0x01 0x01 0x02 0x01`. + + +## 2.5 Versionning + +Compactr binaries do not include version flags and the protocol does not include versionning mechanisms. + +--- + ## 3. Schemas ## 4. 
Primitive types @@ -76,7 +144,11 @@ While functional, the early versions would still require the knowledge of writin - +Type | Variant | Bytes | Limit +--- +String | - | 2 | 0xFFFF +String | Binary | 4 | 0xFFFFFFFF +Array | - | 2 | ## Key Characteristics From 85548a1c1380e73a53baf0785d7074abee450221 Mon Sep 17 00:00:00 2001 From: Frederic Charette Date: Fri, 2 Jan 2026 15:41:04 -0500 Subject: [PATCH 04/14] Update README.md --- README.md | 24 ++++++++++++------------ 1 file changed, 12 insertions(+), 12 deletions(-) diff --git a/README.md b/README.md index 5afcbd0..8a8d5b3 100644 --- a/README.md +++ b/README.md @@ -15,7 +15,7 @@ Keywords: ## Abstract -This document specifies the compactr format, a schema-based serialization protocol which aims to reuse existing [[OAS]](https://spec.openapis.org/oas/v3.1.2.html)OpenAPI specifications as schemas. +This document specifies the compactr format, a schema-based serialization protocol that reuses existing [[OAS]](https://spec.openapis.org/oas/v3.1.2.html)OpenAPI specifications as schemas. ## Status @@ -35,7 +35,7 @@ The specification is Stable as of this publication's release. - [2.4 Schema properties and Encoding order](#2-4-Schema-properties-and-Encoding-order) - - [2.5 Versionning](#2-5-Versionning) + - [2.5 Versioning](#2-5-Versioning) - [3. Schemas](#3-Schemas) @@ -55,9 +55,9 @@ The specification is Stable as of this publication's release. ## 1. Background -Serialization in the context of Web APIs refers to the process of converting data structures into a format that can be easily transmitted over a network, typically in formats such as TEXT (ex: JSON, XML), or BINARY (ex: Files, [Protobuf](https://protobuf.dev/)), so that they can be understood and reconstructed by other systems. 
+Serialization in the context of Web APIs refers to the process of converting data structures into a format that can be easily transmitted over a network, typically in text-based formats (e.g., JSON, XML), or binary formats (e.g., files, [Protobuf](https://protobuf.dev/)), so that they can be understood and reconstructed by other systems. -A schema-based serialization approach enforces a predefined structure for data, ensuring consistency and validation, while a schema-less approach allows for more flexible and dynamic data representation, with fewer constraints on how data is organized. +A schema-based serialization approach enforces a predefined structure for data, ensuring consistency and validation, whereas a schema-less approach allows for more flexible and dynamic data representation, with fewer constraints on how data is organized. Schema-based serialization protocols generally yield much smaller outputs, which is desirable to limit bandwidth and costs. The caveat to schema-based serialization is the cost of creating and maintaining schemas across multiple systems. @@ -80,24 +80,24 @@ In order to meet these objectives, some key design decisions were made: ### 2.1 Byte-order -Compactr binary follows Network byte order (NBO) big-endian format. +Compactr binary follows Network Byte Order (NBO) big-endian format. ### 2.2 Key limits -Indices for properties are assigned a numeric value which is stored as an unsigned 8bit integer. Thus limiting the number of properties per object to 255. +Indices are assigned for properties and stored as an unsigned 8-bit integer. Thus limiting the number of properties per object to 255. ### 2.3 Size limits -Some primitive types (ex: `Boolean`) have static sizes, which are not encoded, while others (ex: `String`) have dynamic sizes. +Some primitive types (e.g., `Boolean`) have fixed sizes, thus not requiring size bytes to be encoded, while others (e.g.,: `String`) have dynamic sizes. 
-Dynamically-sized properties have varying size limits, which are described in the [primitives](#4-primitives) section of this document. +Dynamically-sized properties have size limits represented by unsigned integers of varying sizes, which are described in the [primitives](#4-primitives) section of this document. ### 2.4 Schema properties and Encoding order -To maintain consistency across systems, the field index for each schema properties is based on it's alphabetical order, starting from 1. +To maintain consistency across systems, the field index for each schema property is based on its alphabetical order, starting from 1. -The implementation of this sort function must be based on the numerical sorting of Unicode (UTF-16) character code values of the property name. +The sorting function must be based on the numerical order of Unicode (UTF-16) character code values of the property names. Example: @@ -120,9 +120,9 @@ Encoding of values to generate the binary output simply follows the order in whi For example, serializing `{ c: true, a: true, b: true }` with the previous schema will output: `0x03 0x01 0x01 0x01 0x02 0x01`. -## 2.5 Versionning +## 2.5 Versioning -Compactr binaries do not include version flags and the protocol does not include versionning mechanisms. +Compactr binaries do not include version flags and the protocol does not include versioning mechanisms. --- From 9bb75137ef7862bad30a74fa0f72f66bbf4fec64 Mon Sep 17 00:00:00 2001 From: Frederic Charette Date: Fri, 2 Jan 2026 20:15:40 -0500 Subject: [PATCH 05/14] Update README.md --- README.md | 323 +++++++++++++++++++++++++++++++++--------------------- 1 file changed, 197 insertions(+), 126 deletions(-) diff --git a/README.md b/README.md index 8a8d5b3..f60e5e9 100644 --- a/README.md +++ b/README.md @@ -128,186 +128,257 @@ Compactr binaries do not include version flags and the protocol does not include ## 3. 
Schemas +### 3.1 Schema Source + +Compactr schemas are derived from OpenAPI 3.0+ Schema Objects, as defined in [[OAS]]. + +Only the following schema keywords are normative for Compactr encoding: + +- `type` +- `format` +- `properties` +- `required` +- `items` +- `oneOf` +- `anyOf` +- `allOf` +- `nullable` +- `$ref` + +All other OpenAPI keywords (e.g., description, example, deprecated) are ignored for encoding purposes. + +### 3.2 Supported OpenAPI Types + +| Type | Format | Bytes | Description | +| --- | --- | --- | --- | +| boolean | - | 1 | Boolean value | +| integer | int32 | 4 | 32-bit integer | +| integer | int64 | 8 | 64-bit integer | +| number | float | 4 | 32-bit floating point | +| number | double | 8 | 64-bit floating point | +| string | - | variable | UTF-8 variable size encoding | +| string | uuid | 16 | UUID (compressed) | +| string | ipv4 | 4 | IPv4 address | +| string | ipv6 | 16 | IPv6 address | +| string | date | 4 | Date (YYYY-MM-DD) | +| string | date-time | 8 | ISO 8601 date-time | +| string | binary | variable | Base64 binary data | +| array | - | variable | Array of items | +| object | - | variable | Nested object | + +### 3.3 Required vs Optional Properties + +Properties listed in required MUST be present during encoding. + +Optional properties MAY be omitted. + +Missing optional properties are not encoded and do not occupy space. + +Decoders MUST treat omitted optional properties as undefined (or language equivalent). + ## 4. Primitive types -## 5. Complex schemas +### 4.1 Boolean -## 6. Variants +Size: 1 byte -## 7. Implementation considerations +Encoding: -## 8. Security considerations +0x00 → false -## 9. References +0x01 → true -[OAS] OpenAPI Specification, The OpenAPI initiative, +No size prefix is used. 
+### 4.2 Integers +Size: 4 bytes -Type | Variant | Bytes | Limit ---- -String | - | 2 | 0xFFFF -String | Binary | 4 | 0xFFFFFFFF -Array | - | 2 | +Encoding: Signed two’s complement, big-endian +Valid range: −2³¹ to 2³¹−1 -## Key Characteristics +### 4.3 Numbers -1. **Big-endian byte order** for all multi-byte integers -2. **UTF-8 encoding** for strings (2-byte length prefix + UTF-8 bytes) -3. **Interleaved structure** for objects (index, size, value, index, size, value, ...) -4. **Alphabetical property indexing** for deterministic encoding -5. **Value insertion order** preserved in encoded output +Size: 4 bytes -## Primitive Types +Encoding: IEEE 754 single-precision, big-endian -### Boolean (1 byte) -- `true` → `0x01` -- `false` → `0x00` +Mapped from OpenAPI number when format: float. -### Int32 (4 bytes, big-endian signed) -- `0` → `0x00 0x00 0x00 0x00` -- `42` → `0x00 0x00 0x00 0x2a` -- `-1` → `0xff 0xff 0xff 0xff` +4.5 String -### Int64 (8 bytes, IEEE 754 double) -**Note:** Due to JavaScript limitations, int64 values are encoded as IEEE 754 double (f64). -- `0` → `0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00` -- `42` → `0x40 0x45 0x00 0x00 0x00 0x00 0x00 0x00` -- `9007199254740991` (MAX_SAFE_INTEGER) → `0x43 0x3f 0xff 0xff 0xff 0xff 0xff 0xff` +Two encoding modes exist. -### Float (4 bytes, big-endian IEEE 754) -- `0.0` → `0x00 0x00 0x00 0x00` -- `1.0` → `0x3f 0x80 0x00 0x00` -- `3.14` → `0x40 0x48 0xf5 0xc3` +Standalone / array strings +[u16 length][UTF-8 bytes] -### Double (8 bytes, big-endian IEEE 754) -- `0.0` → `0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00` -- `1.0` → `0x3f 0xf0 0x00 0x00 0x00 0x00 0x00 0x00` -### String (2 bytes length + UTF-8 bytes) -- Empty string `""` → `0x00 0x00` -- `"A"` → `0x00 0x01 0x41` (length=1, UTF-8 'A'=0x41) -- `"Hello"` → `0x00 0x05 0x48 0x65 0x6c 0x6c 0x6f` +Maximum length: 65,535 bytes. -**In object properties:** Strings are encoded as raw UTF-8 bytes (no length prefix). 
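The fixed-size primitives in sections 4.1 to 4.3 can be sketched in JavaScript with `DataView`. The sketch follows the assumptions stated in those sections: no size prefix, big-endian byte order, and two's complement for integers. The function names are hypothetical.

```javascript
// Boolean (section 4.1): a single byte, no size prefix.
function encodeBoolean(value) {
  return Uint8Array.of(value ? 0x01 : 0x00);
}

// Integer (section 4.2): 4 bytes, signed two's complement, big-endian.
function encodeInt32(value) {
  if (!Number.isInteger(value) || value < -(2 ** 31) || value > 2 ** 31 - 1) {
    throw new RangeError(`int32 out of range: ${value}`);
  }
  const out = new Uint8Array(4);
  new DataView(out.buffer).setInt32(0, value, false); // false = big-endian
  return out;
}

// Number with format: float (section 4.3): IEEE 754 single precision, big-endian.
function encodeFloat(value) {
  const out = new Uint8Array(4);
  new DataView(out.buffer).setFloat32(0, value, false);
  return out;
}
```

The explicit range check matters because `DataView.prototype.setInt32` silently wraps out-of-range values modulo 2³² instead of failing.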
+Object property strings +[size][UTF-8 bytes] -## Special Formats -### UUID (16 bytes, raw bytes) -Standard UUID format, no hyphens: -- `550e8400-e29b-41d4-a716-446655440000` → 16 raw bytes +The size field is supplied by the enclosing object encoding. -### DateTime (9 bytes: component format) -- 2 bytes: year (u16 big-endian) -- 1 byte: month (1-12) -- 1 byte: day (1-31) -- 1 byte: hour (0-23) -- 1 byte: minute (0-59) -- 1 byte: second (0-59) -- 2 bytes: milliseconds (u16 big-endian, 0-999) +Strings MUST be valid UTF-8. -Example: `2024-01-15T10:30:00.000Z` → `0x07 0xe8 0x01 0x0f 0x0a 0x1e 0x00 0x00 0x00` +4.6 Binary +[u32 length][raw bytes] -### Date (4 bytes: days since Unix epoch, i32 big-endian) -- `1970-01-01` → `0x00 0x00 0x00 0x00` -- `2024-01-01` → `0x00 0x00 0x4e 0x94` -### IPv4 (4 bytes, network order) -- `192.168.1.1` → `0xc0 0xa8 0x01 0x01` +Mapped from OpenAPI: -### IPv6 (16 bytes, network order) -- `::1` → `0x00 0x00 ... 0x00 0x01` (15 zeros + 1) +type: string +format: binary -### Binary (4 bytes length + raw data) -- 4 bytes: length (u32 big-endian) -- N bytes: raw binary data -## Array Format +Maximum length: 4,294,967,295 bytes. -Arrays encode each element with a 1-byte size prefix: +4.7 UUID -``` -[element1_size, element1_data, element2_size, element2_data, ...] -``` +Size: 16 bytes -### Example: `[1, 2, 3]` (int32 array) -``` -0x04 0x00 0x00 0x00 0x01 // size=4, value=1 -0x04 0x00 0x00 0x00 0x02 // size=4, value=2 -0x04 0x00 0x00 0x00 0x03 // size=4, value=3 -``` +Encoding: Raw UUID bytes, network order -## Object Format +Mapped from: -Objects use an **interleaved structure** where each property is encoded as: -`[index, size, value]` +type: string +format: uuid -### Structure -``` -[num_props, index0, size0, value0, index1, size1, value1, ...] 
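The following sketch covers three layouts:

* the length-prefixed string and binary layouts (sections 4.5 and 4.6);
* the 16-byte UUID layout (section 4.7).

It assumes the standalone string form with a u16 prefix. The helper names are hypothetical.

```javascript
// Standalone / array string (section 4.5): [u16 length][UTF-8 bytes].
// TextEncoder always emits valid UTF-8; lone surrogates become U+FFFD.
function encodeString(value) {
  const utf8 = new TextEncoder().encode(value);
  if (utf8.length > 0xffff) throw new RangeError("string exceeds 65,535 bytes");
  const out = new Uint8Array(2 + utf8.length);
  new DataView(out.buffer).setUint16(0, utf8.length, false);
  out.set(utf8, 2);
  return out;
}

// Binary (section 4.6): [u32 length][raw bytes].
function encodeBinary(bytes) {
  if (bytes.length > 0xffffffff) throw new RangeError("binary exceeds 4,294,967,295 bytes");
  const out = new Uint8Array(4 + bytes.length);
  new DataView(out.buffer).setUint32(0, bytes.length, false);
  out.set(bytes, 4);
  return out;
}

// UUID (section 4.7): the 32 hex digits as 16 raw bytes, hyphens dropped.
function encodeUuid(value) {
  const digits = value.replace(/-/g, "");
  if (!/^[0-9a-fA-F]{32}$/.test(digits)) throw new TypeError(`invalid UUID: ${value}`);
  const out = new Uint8Array(16);
  for (let i = 0; i < 16; i++) {
    out[i] = parseInt(digits.slice(2 * i, 2 * i + 2), 16);
  }
  return out;
}
```

In the object-property form of section 4.5, the u16 prefix would be dropped, since the enclosing object supplies the size.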
-``` +4.8 Date -- **num_props** (1 byte): Number of properties present -- **index** (1 byte): Alphabetical index of property in schema -- **size** (variable): Size encoding depends on type -- **value**: Encoded property value +Size: 4 bytes -### Property Indexing +Encoding: Signed int32, days since Unix epoch (UTC) -Properties are indexed **alphabetically by name** (not schema insertion order): -- Schema `{id: ..., name: ..., email: ...}` → alphabetically: `email=0, id=1, name=2` +Mapped from: -### Size Encoding +type: string +format: date -Different types use different size encodings: +4.9 DateTime -**Compound types (Array, Object):** -- Always use `0x00` prefix -- Then: single byte if size < 256, else u16 big-endian +Size: 9 bytes -**Primitive types:** -- Single byte if size < 256 -- Else: `0x00` prefix + u16 big-endian +Encoding: Component-based UTC timestamp -**Strings in objects:** -- Raw UTF-8 bytes (no length prefix) -- Size field indicates byte count +Mapped from: -### Example: `{x: 10, y: 20}` with schema `{x: int32, y: int32}` +type: string +format: date-time + +## 5. Complex schemas + +5.1 Object Encoding + +Objects are encoded as: + +[num_properties][property...] + + +Where each property is: + +[index][size][value] + + +num_properties: u8 + +index: u8 (alphabetical index) + +size: variable (see below) + +5.3 Arrays + +Arrays are encoded as a sequence of elements, without a global count: + +[element_size][element_value]... -``` -0x02 // 2 properties -0x00 0x04 // x (index 0), size 4 -0x00 0x00 0x00 0x0a // x = 10 -0x01 0x04 // y (index 1), size 4 -0x00 0x00 0x00 0x14 // y = 20 -``` -### Property Order +The end of the array is determined by the enclosing object’s size field. -**Important:** Properties are encoded in the order they appear in the **value object** (insertion order), but use alphabetical indices from the schema. +This design allows streaming decoding. + +5.4 Nested Objects + +Nested objects follow the same encoding rules recursively. 
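Section 4.8 defines dates as a signed int32 count of days since the Unix epoch. Section 4.9 only says that date-times are a 9-byte component-based UTC timestamp. The sketch below therefore assumes the component layout from the first draft: a u16 year, one byte each for month, day, hour, minute and second, then u16 milliseconds. The function names are hypothetical.

```javascript
const MS_PER_DAY = 86400000;

// Date (section 4.8): signed int32 days since 1970-01-01, UTC, big-endian.
function encodeDate(isoDate) { // expects "YYYY-MM-DD"
  const ms = Date.parse(isoDate + "T00:00:00Z");
  if (Number.isNaN(ms)) throw new TypeError(`invalid date: ${isoDate}`);
  const out = new Uint8Array(4);
  new DataView(out.buffer).setInt32(0, Math.floor(ms / MS_PER_DAY), false);
  return out;
}

// DateTime (section 4.9), assuming the first draft's 9-byte component layout.
function encodeDateTime(isoDateTime) {
  const d = new Date(isoDateTime);
  if (Number.isNaN(d.getTime())) throw new TypeError(`invalid date-time: ${isoDateTime}`);
  const out = new Uint8Array(9);
  const view = new DataView(out.buffer);
  view.setUint16(0, d.getUTCFullYear(), false);
  view.setUint8(2, d.getUTCMonth() + 1); // getUTCMonth() is 0-based
  view.setUint8(3, d.getUTCDate());
  view.setUint8(4, d.getUTCHours());
  view.setUint8(5, d.getUTCMinutes());
  view.setUint8(6, d.getUTCSeconds());
  view.setUint16(7, d.getUTCMilliseconds(), false);
  return out;
}
```

By this computation, `2024-01-01` is day 19,723, so it encodes as `0x00 0x00 0x4d 0x0b`. The date-time `2024-01-15T10:30:00.000Z` produces the nine bytes given in the first draft's example.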
+ +There is no depth limit imposed by the protocol; implementations SHOULD impose practical limits. + +## 6. Variants + +6.1 Union Types (oneOf, anyOf) + +Variants are encoded as: + +[variant_index][value] + + +variant_index: u8, based on schema order + +value: encoded according to the selected schema Example: -```javascript -// Value: {email: "a@b.com", id: 1, name: "Alice"} -// Alphabetical indices: email=0, id=1, name=2 -// Encoded order: email (idx 0), id (idx 1), name (idx 2) - follows value insertion -``` -### Optional Properties +oneOf: + - type: string + - type: int32 -Missing optional properties are simply omitted from the encoding. Only present properties are encoded. -## Wrapper Format (v3.x) +Encoding "abc": -In compactr.js v3.x, all top-level values are wrapped in objects: +0x00 [string encoding] -```javascript -// To encode the number 42: -schema({ value: { type: 'int32' } }).write({ value: 42 }) +6.2 Nullable Values -// Result: Object with one property 'value' = 42 -``` +If nullable: true, a null value is encoded as: + +0xff + + +No further bytes follow. + +This sentinel value is reserved and MUST NOT collide with valid indices. + +### 6.3 Custom variants + +## 7. Implementation considerations + +### 7.1 Determinism -This is different from v2.x which allowed standalone primitives. +Encoders MUST: + +Alphabetically sort schema properties + +Preserve value insertion order + +Use canonical size encodings + +Failure to do so breaks binary compatibility. + +## 8. Security considerations + +Implementations MUST guard against: + +Oversized length prefixes + +Deeply nested schemas + +Malformed UTF-8 + +Integer overflow during size calculations + +Decoders SHOULD impose: + +Maximum object size + +Maximum recursion depth + +Maximum array element count + +Compactr does not provide encryption, authentication, or integrity guarantees. + +## 9. 
References + +[OAS] OpenAPI Specification, The OpenAPI initiative, From 626ff9d1098dc2c7c7f020a0054d1647424c9d22 Mon Sep 17 00:00:00 2001 From: Frederic Charette Date: Sat, 3 Jan 2026 16:18:01 -0500 Subject: [PATCH 06/14] Update README.md --- README.md | 161 +++++++++++++++++++++++++++++++++++------------------- 1 file changed, 104 insertions(+), 57 deletions(-) diff --git a/README.md b/README.md index f60e5e9..1686e32 100644 --- a/README.md +++ b/README.md @@ -39,7 +39,19 @@ The specification is Stable as of this publication's release. - [3. Schemas](#3-Schemas) -- [4. Primitive types](#4-Primitive-types) + - [3.1 Schema Source](#3-1-Schema-source) + + - [3.2 Required vs Optional Properties](#3-2-Required-vs-Optional-Properties) + + - [3.3 Walkthrough properties](#3-3-Walkthrough-properties) + +- [4. Encoding](#4-Encoding) + + - [4.1 Variants](#4-1-Variants) + + - [4.2 Primitive types](#4-2-Primitive-types) + + - - [5. Complex schemas](#5-Complex-schemas) @@ -53,6 +65,8 @@ The specification is Stable as of this publication's release. --- +The key words “MUST”, “MUST NOT”, “REQUIRED”, “SHALL”, “SHALL NOT”, “SHOULD”, “SHOULD NOT”, “RECOMMENDED”, “NOT RECOMMENDED”, “MAY”, and “OPTIONAL” in this document are to be interpreted as described in [[BCP 14]](https://tools.ietf.org/html/bcp14) [[RFC2119]](https://spec.openapis.org/oas/v3.1.2.html#bib-rfc2119) [[RFC8174]](https://spec.openapis.org/oas/v3.1.2.html#bib-rfc8174) when, and only when, they appear in all capitals, as shown here. + ## 1. Background Serialization in the context of Web APIs refers to the process of converting data structures into a format that can be easily transmitted over a network, typically in text-based formats (e.g., JSON, XML), or binary formats (e.g., files, [Protobuf](https://protobuf.dev/)), so that they can be understood and reconstructed by other systems. 
@@ -80,24 +94,24 @@ In order to meet these objectives, some key design decisions were made: ### 2.1 Byte-order -Compactr binary follows Network Byte Order (NBO) big-endian format. +Compactr binary MUST follow Network Byte Order (NBO) big-endian format. ### 2.2 Key limits -Indices are assigned for properties and stored as an unsigned 8-bit integer. Thus limiting the number of properties per object to 255. +Indices SHALL be assigned to properties and stored as unsigned 8-bit integers, which limits the number of properties per object to 255. ### 2.3 Size limits -Some primitive types (e.g., `Boolean`) have fixed sizes, thus not requiring size bytes to be encoded, while others (e.g.,: `String`) have dynamic sizes. +Some primitive types (e.g., `Boolean`) have fixed sizes and therefore MUST NOT encode size bytes, while others (e.g., `String`) have dynamic sizes and MUST include between one and four size bytes. -Dynamically-sized properties have size limits represented by unsigned integers of varying sizes, which are described in the [primitives](#4-primitives) section of this document. +Size bytes are represented by unsigned integers of varying sizes, which are described in the [primitive types](#4-2-Primitive-types) section of this document. ### 2.4 Schema properties and Encoding order To maintain consistency across systems, the field index for each schema property is based on its alphabetical order, starting from 1. -The sorting function must be based on the numerical order of Unicode (UTF-16) character code values of the property names. +The sorting function MUST be based on the numerical order of Unicode (UTF-16) character code values of the property names. Example: @@ -113,24 +127,36 @@ Example: } ``` -Will attribute index 0 to field `a`, index 1 to field `b` and 2 for `c`. Implementations of this protocol must follow this sorting rule to maintain consistency, even if properties are listed in differring orders across systems.
+Will attribute index 0 to field `a`, index 1 to field `b` and 2 for `c`. Implementations of this protocol MUST follow this sorting rule to maintain consistency, even if properties are listed in differring orders across systems. -Encoding of values to generate the binary output simply follows the order in which the properties are listed in the structure or object. +Encoding of values to generate the binary output SHOULD simply follow the order in which the properties are listed in the structure or object. For example, serializing `{ c: true, a: true, b: true }` with the previous schema will output: `0x03 0x01 0x01 0x01 0x02 0x01`. -## 2.5 Versioning +## 2.5 Unsupported features + +### 2.5.1 References + +`$ref` references are supported, with constraints which MUST be enforced in client implementations: + +- Circular references MUST be detected. +- Recursive schemas MAY be supported but implementations SHOULD impose depth limits. +- External $ref targets (remote URLs) MAY be supported but MUST be resolved prior to encoding. + +### 2.5.2 Versioning Compactr binaries do not include version flags and the protocol does not include versioning mechanisms. +Client implementations MAY elect to include integrity or versioning checks provided that the final encoded binary remains compatible with the Compactr protocol. + --- ## 3. Schemas ### 3.1 Schema Source -Compactr schemas are derived from OpenAPI 3.0+ Schema Objects, as defined in [[OAS]]. +Compactr schemas are derived from OpenAPI 3.0+ Schema Objects, as defined in [[OAS]](https://spec.openapis.org/oas/v3.1.2.html)OpenAPI specifications. Only the following schema keywords are normative for Compactr encoding: @@ -147,28 +173,9 @@ Only the following schema keywords are normative for Compactr encoding: All other OpenAPI keywords (e.g., description, example, deprecated) are ignored for encoding purposes. 
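The `$ref` constraints in Section 2.5.1 can be illustrated with a minimal JavaScript sketch. It inlines local references before encoding and rejects cycles. The helper names `resolvePointer` and `inlineRefs` are hypothetical, and only local JSON Pointer references (`#/...`) are handled. The sketch also rejects recursion outright instead of imposing a depth limit, which is a stricter choice than the text requires, since recursive schemas MAY be supported.

```javascript
// Sketch only: inline local $ref targets before encoding and reject cycles.
// Hypothetical helpers; external refs must already be resolved (Section 2.5.1).

function resolvePointer(root, ref) {
  if (!ref.startsWith("#/")) {
    throw new Error(`external $ref must be resolved before encoding: ${ref}`);
  }
  return ref
    .slice(2)
    .split("/")
    .reduce((node, token) => {
      // Undo JSON Pointer escaping (RFC 6901): "~1" -> "/", then "~0" -> "~".
      const key = token.replace(/~1/g, "/").replace(/~0/g, "~");
      if (node === null || typeof node !== "object" || !(key in node)) {
        throw new Error(`unresolvable $ref: ${ref}`);
      }
      return node[key];
    }, root);
}

// Returns a copy of `schema` with every $ref replaced by its target.
// `chain` holds only the refs on the current path, so using the same
// definition in two sibling properties is not treated as circular.
function inlineRefs(root, schema, chain = []) {
  if (Array.isArray(schema)) return schema.map((item) => inlineRefs(root, item, chain));
  if (schema === null || typeof schema !== "object") return schema;
  if (typeof schema.$ref === "string") {
    if (chain.includes(schema.$ref)) {
      throw new Error(`circular $ref: ${[...chain, schema.$ref].join(" -> ")}`);
    }
    return inlineRefs(root, resolvePointer(root, schema.$ref), [...chain, schema.$ref]);
  }
  const copy = {};
  for (const [key, value] of Object.entries(schema)) {
    copy[key] = inlineRefs(root, value, chain);
  }
  return copy;
}
```

An implementation that chooses to support recursive schemas would replace the `chain.includes` check with a depth counter and a configurable limit.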
-### 3.2 Supported OpenAPI Types - -| Type | Format | Bytes | Description | -| --- | --- | --- | --- | -| boolean | - | 1 | Boolean value | -| integer | int32 | 4 | 32-bit integer | -| integer | int64 | 8 | 64-bit integer | -| number | float | 4 | 32-bit floating point | -| number | double | 8 | 64-bit floating point | -| string | - | variable | UTF-8 variable size encoding | -| string | uuid | 16 | UUID (compressed) | -| string | ipv4 | 4 | IPv4 address | -| string | ipv6 | 16 | IPv6 address | -| string | date | 4 | Date (YYYY-MM-DD) | -| string | date-time | 8 | ISO 8601 date-time | -| string | binary | variable | Base64 binary data | -| array | - | variable | Array of items | -| object | - | variable | Nested object | - -### 3.3 Required vs Optional Properties +### 3.2 Required vs Optional Properties -Properties listed in required MUST be present during encoding. +Properties listed in required MUST be present during encoding. Missing required properties MUST throw an encoding error. Optional properties MAY be omitted. @@ -176,57 +183,73 @@ Missing optional properties are not encoded and do not occupy space. Decoders MUST treat omitted optional properties as undefined (or language equivalent). -## 4. Primitive types +### 3.3 Walkthrough properties -### 4.1 Boolean +Compactr walks through composition keywords `$ref`, `schema` `oneOf`, `allOf`, `anyOf` and only creates internal models for primitives. -Size: 1 byte +--- -Encoding: +## 4. Encoding -0x00 → false +Properties are encoded with the matching schema index first (`i`), then an optional variant byte (`v`), optional size byte(s) (`s`), then the encoded value (`d`). -0x01 → true +`[i][v?][s?...][d...]` -No size prefix is used. +### 4.1 Variants -### 4.2 Integers +Encoded fields which have the `nullable` schema property and a `null` value have an extra byte that indicates the variant. 
-Size: 4 bytes +- `0x00` For null values +- `0x01` For non-null values -Encoding: Signed two’s complement, big-endian +If the `nullable` property is not present in the schema, the variant byte is not encoded and `null` values are not encoded. -Valid range: −2³¹ to 2³¹−1 +Fields with multiple definitions, as described in the schema with the `oneOf` or `anyOf` keywords use the variant byte to indicate which definition to use, starting with `0x01` for the first definition, and incrementing by `0x01` for each subsequent one. -### 4.3 Numbers -Size: 4 bytes +### 4.2 Primitive types + +Types are based on JSON Schema Validation Specification Draft 2020-12: `array`, `boolean`, `integer`, `number`, `object` or `string`. -Encoding: IEEE 754 single-precision, big-endian +#### 4.2.1 Array -Mapped from OpenAPI number when format: float. +#### 4.2.2 Boolean -4.5 String +Fixed size of 1 byte, either 0x00 for false or 0x01 for true. -Two encoding modes exist. +#### 4.2.3 Integers -Standalone / array strings -[u16 length][UTF-8 bytes] +Variable size based on the `format` attribute defined in the schema. Size byte SHOULD NOT be encoded. Decoding should take in account the `format` attribute to determine the size. +- `(null, undefined or language equivalent)`: unsigned 32-bit integer +- `int32`: unsigned 32-bit integer +- `int64`: unsigned 64-bit integer -Maximum length: 65,535 bytes. +#### 4.2.4 Numbers -Object property strings -[size][UTF-8 bytes] +Variable size based on the `format` attribute defined in the schema. Size byte SHOULD NOT be encoded. Decoding should take in account the `format` attribute to determine the size. +- `(null, undefined or language equivalent)`: 32-bit floating point +- `float`: 32-bit floating point +- `double`: 64-bit floating point -The size field is supplied by the enclosing object encoding. +#### 4.2.5 Objects -Strings MUST be valid UTF-8. 
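A minimal JavaScript sketch of the field layout `[i][v?][s?...][d...]` for the variant rules and scalar primitives above. Several details are assumptions not stated in this revision: the helper names `encodeScalar`, `matches` and `encodeField` are hypothetical, strings carry a single size byte as in the worked examples elsewhere in this series, integers are two's complement, `number` requires an explicit `float` or `double` format, and `null` on a non-nullable field is rejected (the stricter rule adopted by the later "Editorial fixes" patch) instead of being silently skipped.

```javascript
// Sketch only: [index][variant?][size?][data] for scalar types (Section 4).
// Assumptions: hypothetical helper names, one string size byte, two's-complement
// integers, IEEE 754 floats. DataView writes big-endian unless told otherwise.

function encodeScalar(schema, value) {
  switch (schema.type) {
    case "boolean":
      return [value ? 0x01 : 0x00];
    case "integer": {
      const wide = schema.format === "int64";
      const view = new DataView(new ArrayBuffer(wide ? 8 : 4));
      if (wide) view.setBigInt64(0, BigInt(value));
      else view.setInt32(0, value);
      return Array.from(new Uint8Array(view.buffer));
    }
    case "number": {
      // The default width is left out of this sketch; require an explicit format.
      if (schema.format !== "float" && schema.format !== "double") {
        throw new Error("sketch requires format: float or double");
      }
      const wide = schema.format === "double";
      const view = new DataView(new ArrayBuffer(wide ? 8 : 4));
      if (wide) view.setFloat64(0, value);
      else view.setFloat32(0, value);
      return Array.from(new Uint8Array(view.buffer));
    }
    case "string": {
      const utf8 = new TextEncoder().encode(value);
      if (utf8.length > 0xff) throw new RangeError("sketch supports 1-byte string sizes only");
      return [utf8.length, ...utf8];
    }
    default:
      throw new Error(`type not covered by this sketch: ${schema.type}`);
  }
}

// Naive oneOf/anyOf branch selection: first definition whose type fits the JS value.
function matches(schema, value) {
  switch (schema.type) {
    case "string": return typeof value === "string";
    case "boolean": return typeof value === "boolean";
    case "integer": return Number.isInteger(value);
    case "number": return typeof value === "number";
    default: return false;
  }
}

function encodeField(index, schema, value) {
  const choices = schema.oneOf ?? schema.anyOf;
  if (choices) {
    const position = choices.findIndex((choice) => matches(choice, value));
    if (position < 0) throw new Error("value matches no oneOf/anyOf definition");
    // Variant byte is 1-based: 0x01 for the first definition, 0x02 for the second, ...
    return Uint8Array.from([index, position + 1, ...encodeScalar(choices[position], value)]);
  }
  if (value === null) {
    if (!schema.nullable) throw new Error("null value for a non-nullable field");
    return Uint8Array.from([index, 0x00]); // null variant: no size or data bytes follow
  }
  const variant = schema.nullable ? [0x01] : []; // non-null variant only for nullable fields
  return Uint8Array.from([index, ...variant, ...encodeScalar(schema, value)]);
}
```

For instance, `encodeField(1, { type: "string", nullable: true }, "test")` yields `0x01 0x01 0x04 0x74 0x65 0x73 0x74`, and a `oneOf` of `string` and `int32` encodes `42` as `0x01 0x02 0x00 0x00 0x00 0x2a`, matching the variant examples later in this series.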
-4.6 Binary -[u32 length][raw bytes] +#### 4.2.6 Strings + +Strings are encoded as UTF-8 Multi-byte Unicode characters. Most languages provide a UTF-8 encoding utility, which SHOULD be used to determine the size and generate the bytes to be appended. + + +### 4.3 Special formats + +Compactr supports encoding of special formats to improve efficiency. Additional special encoding formats MAY be added. + +#### 4.3.1 Binary + + +### 4. Mapped from OpenAPI: @@ -382,3 +405,27 @@ Compactr does not provide encryption, authentication, or integrity guarantees. ## 9. References [OAS] OpenAPI Specification, The OpenAPI initiative, + + + + + + +### 3.2 Supported OpenAPI Types + +| Type | Format | Bytes | Description | +| --- | --- | --- | --- | +| boolean | - | 1 | Boolean value | +| integer | int32 | 4 | 32-bit integer | +| integer | int64 | 8 | 64-bit integer | +| number | float | 4 | 32-bit floating point | +| number | double | 8 | 64-bit floating point | +| string | - | variable | UTF-8 variable size encoding | +| string | uuid | 16 | UUID (compressed) | +| string | ipv4 | 4 | IPv4 address | +| string | ipv6 | 16 | IPv6 address | +| string | date | 4 | Date (YYYY-MM-DD) | +| string | date-time | 8 | ISO 8601 date-time | +| string | binary | variable | Base64 binary data | +| array | - | variable | Array of items | +| object | - | variable | Nested object | From 7aa55d0e2ffe6ab26ebd313535b3f6eeb9e18d27 Mon Sep 17 00:00:00 2001 From: Frederic Charette Date: Sat, 3 Jan 2026 16:43:48 -0500 Subject: [PATCH 07/14] Update README.md --- README.md | 112 +++++++++++------------------------------------------- 1 file changed, 22 insertions(+), 90 deletions(-) diff --git a/README.md b/README.md index 1686e32..69ffd75 100644 --- a/README.md +++ b/README.md @@ -102,7 +102,7 @@ Indices SHALL be assigned for properties and stored as an unsigned 8-bit integer ### 2.3 Size limits -Some primitive types (e.g., `Boolean`) have fixed sizes and therefore MUST NOT encode size bytes, while others 
(e.g.,: `String`) have dynamic sizes and MUST include between one and four size bytes. +Some primitive types (e.g., `Boolean`) have fixed sizes and therefore MUST NOT encode size bytes, while others (e.g.,: `String`) have variable sizes and MUST include between one and four size bytes. Size bytes are represented by unsigned integers of varying sizes, which are described in the [primitives](#4-primitives) section of this document. @@ -213,7 +213,27 @@ Types are based on JSON Schema Validation Specification Draft 2020-12: `array`, #### 4.2.1 Array -#### 4.2.2 Boolean +Arrays MUST include an unsigned 32-bit integer to represent the whole size of the array. Individual elements are treated sequentially as their primitives defined in the schema. + +Example: + +``` +// Schema +{ + type: 'object', + properties: { + foo: { + type: 'array', + items: { type: 'string' } + } + } +} + +// Data +{ foo: [ 'hello', 'bye', 'bye' ] } +```# +Results in this buffer: `0x01 0x00 0x00 0x0e 0x05 0x68 0x65 0x6c 0x6c 0x6f 0x03 0x62 0x79 0x65 0x03 0x62 0x79 0x65`. +## 4.2.2 Boolean Fixed size of 1 byte, either 0x00 for false or 0x01 for true. @@ -292,94 +312,6 @@ Mapped from: type: string format: date-time -## 5. Complex schemas - -5.1 Object Encoding - -Objects are encoded as: - -[num_properties][property...] - - -Where each property is: - -[index][size][value] - - -num_properties: u8 - -index: u8 (alphabetical index) - -size: variable (see below) - -5.3 Arrays - -Arrays are encoded as a sequence of elements, without a global count: - -[element_size][element_value]... - - -The end of the array is determined by the enclosing object’s size field. - -This design allows streaming decoding. - -5.4 Nested Objects - -Nested objects follow the same encoding rules recursively. - -There is no depth limit imposed by the protocol; implementations SHOULD impose practical limits. - -## 6. 
Variants - -6.1 Union Types (oneOf, anyOf) - -Variants are encoded as: - -[variant_index][value] - - -variant_index: u8, based on schema order - -value: encoded according to the selected schema - -Example: - -oneOf: - - type: string - - type: int32 - - -Encoding "abc": - -0x00 [string encoding] - -6.2 Nullable Values - -If nullable: true, a null value is encoded as: - -0xff - - -No further bytes follow. - -This sentinel value is reserved and MUST NOT collide with valid indices. - -### 6.3 Custom variants - -## 7. Implementation considerations - -### 7.1 Determinism - -Encoders MUST: - -Alphabetically sort schema properties - -Preserve value insertion order - -Use canonical size encodings - -Failure to do so breaks binary compatibility. - ## 8. Security considerations Implementations MUST guard against: From 3ad5c5cb322703cd20f70a3f115cca8e50459d35 Mon Sep 17 00:00:00 2001 From: Frederic Charette Date: Sat, 3 Jan 2026 16:51:36 -0500 Subject: [PATCH 08/14] Update README.md --- README.md | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/README.md b/README.md index 69ffd75..d937dfd 100644 --- a/README.md +++ b/README.md @@ -231,8 +231,10 @@ Example: // Data { foo: [ 'hello', 'bye', 'bye' ] } -```# +``` + Results in this buffer: `0x01 0x00 0x00 0x0e 0x05 0x68 0x65 0x6c 0x6c 0x6f 0x03 0x62 0x79 0x65 0x03 0x62 0x79 0x65`. + ## 4.2.2 Boolean Fixed size of 1 byte, either 0x00 for false or 0x01 for true. 
From 662393cf5381706aec6b5478f1f5f570d00a22c8 Mon Sep 17 00:00:00 2001 From: Frederic Charette Date: Sun, 4 Jan 2026 20:15:29 -0500 Subject: [PATCH 09/14] Update README.md --- README.md | 151 ++++++++++++++++++++++++------------------------------ 1 file changed, 66 insertions(+), 85 deletions(-) diff --git a/README.md b/README.md index d937dfd..daf3053 100644 --- a/README.md +++ b/README.md @@ -15,7 +15,7 @@ Keywords: ## Abstract -This document specifies the compactr format, a schema-based serialization protocol that reuses existing [[OAS]](https://spec.openapis.org/oas/v3.1.2.html)OpenAPI specifications as schemas. +This document specifies the compactr format, a schema-based serialization protocol that reuses existing [[OAS]](#6-References)OpenAPI specifications as schemas. ## Status @@ -51,21 +51,35 @@ The specification is Stable as of this publication's release. - [4.2 Primitive types](#4-2-Primitive-types) - - - -- [5. Complex schemas](#5-Complex-schemas) - -- [6. Variants](#6-Variants) - -- [7. Implementation considerations](#7-Implementation-considerations) - -- [8. Security considerations](#8-Security-considerations) - -- [9. References](#9-References) + - [4.2.1 Arrays](#4-2-1-arrays) + + - [4.2.2 Boolean](#4-2-2-boolean) + + - [4.2.3 Integers](#4-2-3-integers) + + - [4.2.4 Numbers](#4-2-4-numbers) + + - [4.2.5 Objects](#4-2-5-objects) + + - [4.2.6 Strings](#4-2-6-strings) + + - [4.3 Special formats](#4-3-special-formats) + + - [4.3.1 Binary](#4-3-1-binary) + + - [4.3.2 Date and DateTime](#4-3-2-date-and-datetime) + + - [4.3.3 IPV4 and IPV6](#4-3-3-ipv4-and-ipv6) + + - [4.3.4 UUID](#4-3-4-uuid) + +- [5. Security considerations](#5-Security-considerations) + +- [6. 
References](#6-References) --- -The key words “MUST”, “MUST NOT”, “REQUIRED”, “SHALL”, “SHALL NOT”, “SHOULD”, “SHOULD NOT”, “RECOMMENDED”, “NOT RECOMMENDED”, “MAY”, and “OPTIONAL” in this document are to be interpreted as described in [[BCP 14]](https://tools.ietf.org/html/bcp14) [[RFC2119]](https://spec.openapis.org/oas/v3.1.2.html#bib-rfc2119) [[RFC8174]](https://spec.openapis.org/oas/v3.1.2.html#bib-rfc8174) when, and only when, they appear in all capitals, as shown here. +The key words “MUST”, “MUST NOT”, “REQUIRED”, “SHALL”, “SHALL NOT”, “SHOULD”, “SHOULD NOT”, “RECOMMENDED”, “NOT RECOMMENDED”, “MAY”, and “OPTIONAL” in this document are to be interpreted as described in [[RFC2119]](#6-References) [[RFC8174]](#6-References) when, and only when, they appear in all capitals, as shown here. ## 1. Background @@ -85,7 +99,7 @@ While functional, the early versions would still require the knowledge of writin The primary objectives of the Compactr protocols are: -- First-party schema definitions, using [[OAS]](https://spec.openapis.org/oas/v3.1.2.html)OpenAPI specifications as base schemas. +- First-party schema definitions, using [[OAS]](#6-References)OpenAPI specifications as base schemas. - Optimized binary output - Compatibility across runtimes - Type safety @@ -104,7 +118,7 @@ Indices SHALL be assigned for properties and stored as an unsigned 8-bit integer Some primitive types (e.g., `Boolean`) have fixed sizes and therefore MUST NOT encode size bytes, while others (e.g.,: `String`) have variable sizes and MUST include between one and four size bytes. -Size bytes are represented by unsigned integers of varying sizes, which are described in the [primitives](#4-primitives) section of this document. +Size bytes are represented by unsigned integers of varying sizes, which are described in the [primitives](#4-2-primitive-types) section of this document. 
### 2.4 Schema properties and Encoding order @@ -251,115 +265,82 @@ Variable size based on the `format` attribute defined in the schema. Size byte S Variable size based on the `format` attribute defined in the schema. Size byte SHOULD NOT be encoded. Decoding should take in account the `format` attribute to determine the size. +All floating-point arithmetic MUST adhere to [[IEEE 754-2019]](#6-References) + - `(null, undefined or language equivalent)`: 32-bit floating point - `float`: 32-bit floating point - `double`: 64-bit floating point #### 4.2.5 Objects - +Objects are encoded recursively using the same scheme: `[i][v?][s?...][d...]`. #### 4.2.6 Strings Strings are encoded as UTF-8 Multi-byte Unicode characters. Most languages provide a UTF-8 encoding utility, which SHOULD be used to determine the size and generate the bytes to be appended. - ### 4.3 Special formats Compactr supports encoding of special formats to improve efficiency. Additional special encoding formats MAY be added. #### 4.3.1 Binary +Variable length `string` format with 32-bit size bytes. -### 4. - -Mapped from OpenAPI: - -type: string -format: binary - - -Maximum length: 4,294,967,295 bytes. - -4.7 UUID - -Size: 16 bytes +Buffers and UInt8Arrays MAY be encoded as-is, while `strings` MUST be Base64 encoded. -Encoding: Raw UUID bytes, network order +#### 4.3.2 Date and DateTime -Mapped from: +Fixed length `string` formats with no size bytes. -type: string -format: uuid +`date` is represented as `[uint32][uint8][uint8]` to encode YYYY-MM-DD values. -4.8 Date +`date-time` is represented as `[uint32][uint8][uint8][uint8][uint8][uint8][uint32]` to encode YYYY-MM-DDTHH:mm:ss.sssZ date strings with UTC time. -Size: 4 bytes +Values MUST be reconstructed as such by the decoder to fit [[ISO 8601]](#6-References) extended date. 
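A minimal JavaScript sketch of the `date` and `date-time` layouts as written in this revision: a 6-byte `date` (`[uint32][uint8][uint8]`) and a 13-byte `date-time` (`[uint32][uint8][uint8][uint8][uint8][uint8][uint32]`). The helper names `encodeDate` and `decodeDate` are hypothetical, only UTC input is accepted, and missing time components default to 0 as the text recommends.

```javascript
// Sketch only: date = [uint32 year][uint8 month][uint8 day] (6 bytes);
// date-time adds [uint8 hour][uint8 minute][uint8 second][uint32 ms] (13 bytes).
// Assumptions: hypothetical helper names; UTC input only.

const ISO_UTC = /^(\d{4})-(\d{2})-(\d{2})(?:T(\d{2}):(\d{2})(?::(\d{2})(?:\.(\d{1,3}))?)?Z?)?$/;

function encodeDate(text, withTime) {
  const m = ISO_UTC.exec(text);
  if (!m) throw new Error(`not a UTC ISO 8601 date: ${text}`);
  // Missing time components default to 0.
  const [year, month, day, hour, minute, second] = m.slice(1, 7).map((part) => Number(part ?? 0));
  const millis = Number((m[7] ?? "0").padEnd(3, "0"));
  if (month < 1 || month > 12 || day < 1 || day > 31 || hour > 23 || minute > 59 || second > 59) {
    throw new RangeError(`date component out of range: ${text}`);
  }
  const view = new DataView(new ArrayBuffer(withTime ? 13 : 6));
  view.setUint32(0, year);
  view.setUint8(4, month);
  view.setUint8(5, day);
  if (withTime) {
    view.setUint8(6, hour);
    view.setUint8(7, minute);
    view.setUint8(8, second);
    view.setUint32(9, millis);
  }
  return new Uint8Array(view.buffer);
}

// Rebuilds an ISO 8601 extended string (YYYY-MM-DD or YYYY-MM-DDTHH:mm:ss.sssZ).
function decodeDate(bytes, withTime) {
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  const pad = (n, width = 2) => String(n).padStart(width, "0");
  const date = `${pad(view.getUint32(0), 4)}-${pad(view.getUint8(4))}-${pad(view.getUint8(5))}`;
  if (!withTime) return date;
  const time = `${pad(view.getUint8(6))}:${pad(view.getUint8(7))}:${pad(view.getUint8(8))}`;
  return `${date}T${time}.${pad(view.getUint32(9), 3)}Z`;
}
```

Under this layout, `2024-01-15T10:30:00.000Z` encodes as `0x00 0x00 0x07 0xe8 0x01 0x0f 0x0a 0x1e 0x00 0x00 0x00 0x00 0x00`.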
-Encoding: Signed int32, days since Unix epoch (UTC) +Implementations SHOULD validate that the input string is a valid date string and SHOULD set time bytes to 0 if not explicitly set. -Mapped from: +#### 4.3.3 IPV4 and IPV6 -type: string -format: date +Fixed length `string` formats with no size bytes. -4.9 DateTime +`ipv4` is represented as [uint8][uint8][uint8][uint8] and must be decoded to match [[RFC791]](#6-References) IPV4 format. -Size: 9 bytes +`ipv6` is represented as [uint32][uint32][uint32][uint32] and must be decoded to match [[RFC8200]](#6-References) IPV6 format. -Encoding: Component-based UTC timestamp +#### 4.3.4 UUID -Mapped from: +Fixed sized 16 bytes using raw UUID bytes (network-order) and must be decoded as standard [[RFC9562]](#6-References) UUID. -type: string -format: date-time +--- -## 8. Security considerations +## 5. Security considerations Implementations MUST guard against: -Oversized length prefixes - -Deeply nested schemas - -Malformed UTF-8 - -Integer overflow during size calculations - -Decoders SHOULD impose: - -Maximum object size - -Maximum recursion depth - -Maximum array element count +- Circular schemas +- Malformed UTF-8 +- Integer overflow +- Maximum object keys +- Maximum object recursion depth +- Array total byte size +- Ensure schema formats match the appropriate schema type Compactr does not provide encryption, authentication, or integrity guarantees. -## 9. References - -[OAS] OpenAPI Specification, The OpenAPI initiative, - - - +--- +## 6. References +- [OAS] OpenAPI Specification v3.1.2. The Linux foundation (2025). +- [RFC791] Internet protocol. DARPA Internet Program Protocol Specification. (1981). +- [RFC2119] Key words for use in RFCs to Indicate Requirement Levels. S. Bradner. IETF. (1997). +- [RFC8174] Ambiguity of Uppercase vs Lowercase in RFC 2119 Key Words. B. Leiba. IETF. (2017). +- [RFC8200] Internet Protocol, Version 6 (IPv6) Specification. S. Deering. IETF. (2017). 
+- [RFC9562] Universally Unique IDentifiers (UUIDs). K. Davis. IETF. (2024) +- [IEEE 754-2019] IEEE 754-2019: IEEE Standard for Floating-Point Arithmetic. Institute of Electrical and Electronic Engineers. (2019). -### 3.2 Supported OpenAPI Types +--- -| Type | Format | Bytes | Description | -| --- | --- | --- | --- | -| boolean | - | 1 | Boolean value | -| integer | int32 | 4 | 32-bit integer | -| integer | int64 | 8 | 64-bit integer | -| number | float | 4 | 32-bit floating point | -| number | double | 8 | 64-bit floating point | -| string | - | variable | UTF-8 variable size encoding | -| string | uuid | 16 | UUID (compressed) | -| string | ipv4 | 4 | IPv4 address | -| string | ipv6 | 16 | IPv6 address | -| string | date | 4 | Date (YYYY-MM-DD) | -| string | date-time | 8 | ISO 8601 date-time | -| string | binary | variable | Base64 binary data | -| array | - | variable | Array of items | -| object | - | variable | Nested object | +Licensed under Apache 2.0, 2026, Compactr, Frederic Charette From 8ecc8522c9af71c1c123d9c8625657574cd0c948 Mon Sep 17 00:00:00 2001 From: Frederic Charette Date: Sun, 4 Jan 2026 20:17:20 -0500 Subject: [PATCH 10/14] Update README.md --- README.md | 42 +++++++++++++++++++++--------------------- 1 file changed, 21 insertions(+), 21 deletions(-) diff --git a/README.md b/README.md index daf3053..6a719ec 100644 --- a/README.md +++ b/README.md @@ -27,51 +27,51 @@ The specification is Stable as of this publication's release. - [2. 
Design decisions](#2-Design-decisions) - - [2.1 Byte-order](#2-1-Byte-order) + - [2.1 Byte-order](#21-Byte-order) - - [2.2 Key limits](#2-2-Key-limits) + - [2.2 Key limits](#22-Key-limits) - - [2.3 Size limits](#2-3-Size-limits) + - [2.3 Size limits](#23-Size-limits) - - [2.4 Schema properties and Encoding order](#2-4-Schema-properties-and-Encoding-order) + - [2.4 Schema properties and Encoding order](#24-Schema-properties-and-Encoding-order) - - [2.5 Versioning](#2-5-Versioning) + - [2.5 Versioning](#25-Versioning) - [3. Schemas](#3-Schemas) - - [3.1 Schema Source](#3-1-Schema-source) + - [3.1 Schema Source](#31-Schema-source) - - [3.2 Required vs Optional Properties](#3-2-Required-vs-Optional-Properties) + - [3.2 Required vs Optional Properties](#32-Required-vs-Optional-Properties) - - [3.3 Walkthrough properties](#3-3-Walkthrough-properties) + - [3.3 Walkthrough properties](#33-Walkthrough-properties) - [4. Encoding](#4-Encoding) - - [4.1 Variants](#4-1-Variants) + - [4.1 Variants](#41-Variants) - - [4.2 Primitive types](#4-2-Primitive-types) + - [4.2 Primitive types](#42-Primitive-types) - - [4.2.1 Arrays](#4-2-1-arrays) + - [4.2.1 Arrays](#421-arrays) - - [4.2.2 Boolean](#4-2-2-boolean) + - [4.2.2 Boolean](#422-boolean) - - [4.2.3 Integers](#4-2-3-integers) + - [4.2.3 Integers](#423-integers) - - [4.2.4 Numbers](#4-2-4-numbers) + - [4.2.4 Numbers](#424-numbers) - - [4.2.5 Objects](#4-2-5-objects) + - [4.2.5 Objects](#425-objects) - - [4.2.6 Strings](#4-2-6-strings) + - [4.2.6 Strings](#426-strings) - - [4.3 Special formats](#4-3-special-formats) + - [4.3 Special formats](#43-special-formats) - - [4.3.1 Binary](#4-3-1-binary) + - [4.3.1 Binary](#431-binary) - - [4.3.2 Date and DateTime](#4-3-2-date-and-datetime) + - [4.3.2 Date and DateTime](#432-date-and-datetime) - - [4.3.3 IPV4 and IPV6](#4-3-3-ipv4-and-ipv6) + - [4.3.3 IPV4 and IPV6](#433-ipv4-and-ipv6) - - [4.3.4 UUID](#4-3-4-uuid) + - [4.3.4 UUID](#434-uuid) - [5. 
Security considerations](#5-Security-considerations) From 9a36739031abf8b90807f9cd5670a2e68ac2b867 Mon Sep 17 00:00:00 2001 From: Frederic Charette Date: Sun, 4 Jan 2026 20:18:13 -0500 Subject: [PATCH 11/14] Update README.md --- README.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/README.md b/README.md index 6a719ec..5a9fa27 100644 --- a/README.md +++ b/README.md @@ -225,7 +225,7 @@ Fields with multiple definitions, as described in the schema with the `oneOf` or Types are based on JSON Schema Validation Specification Draft 2020-12: `array`, `boolean`, `integer`, `number`, `object` or `string`. -#### 4.2.1 Array +#### 4.2.1 Arrays Arrays MUST include an unsigned 32-bit integer to represent the whole size of the array. Individual elements are treated sequentially as their primitives defined in the schema. @@ -249,7 +249,7 @@ Example: Results in this buffer: `0x01 0x00 0x00 0x0e 0x05 0x68 0x65 0x6c 0x6c 0x6f 0x03 0x62 0x79 0x65 0x03 0x62 0x79 0x65`. -## 4.2.2 Boolean +#### 4.2.2 Boolean Fixed size of 1 byte, either 0x00 for false or 0x01 for true. From 5268d9d47a9de3568956c1147c07c4a4c8e4a1ef Mon Sep 17 00:00:00 2001 From: Frederic Charette Date: Tue, 6 Jan 2026 09:54:19 -0500 Subject: [PATCH 12/14] Update README.md --- README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README.md b/README.md index 5a9fa27..c3a95f9 100644 --- a/README.md +++ b/README.md @@ -267,7 +267,7 @@ Variable size based on the `format` attribute defined in the schema. 
Size byte S All floating-point arithmetic MUST adhere to [[IEEE 754-2019]](#6-References) -- `(null, undefined or language equivalent)`: 32-bit floating point +- `(null, undefined or language equivalent)`: 64-bit floating point - `float`: 32-bit floating point - `double`: 64-bit floating point From a465dbe3031637f093f3904d48237d093bdbacb5 Mon Sep 17 00:00:00 2001 From: Frederic Charette Date: Tue, 6 Jan 2026 11:00:29 -0500 Subject: [PATCH 13/14] Editorial fixes --- README.md | 400 ++++++++++++++++++++++++++++++++++++++++++++---------- 1 file changed, 331 insertions(+), 69 deletions(-) diff --git a/README.md b/README.md index c3a95f9..d5f1a7c 100644 --- a/README.md +++ b/README.md @@ -15,7 +15,7 @@ Keywords: ## Abstract -This document specifies the compactr format, a schema-based serialization protocol that reuses existing [[OAS]](#6-References)OpenAPI specifications as schemas. +This document specifies the compactr format, a schema-based serialization protocol that reuses existing [[OAS]](#6-References) OpenAPI specifications as schemas. ## Status @@ -35,43 +35,47 @@ The specification is Stable as of this publication's release. - [2.4 Schema properties and Encoding order](#24-Schema-properties-and-Encoding-order) - - [2.5 Versioning](#25-Versioning) + - [2.5 Schema References and Versioning](#25-Schema-References-and-Versioning) - [3. Schemas](#3-Schemas) - [3.1 Schema Source](#31-Schema-source) - [3.2 Required vs Optional Properties](#32-Required-vs-Optional-Properties) - - - [3.3 Walkthrough properties](#33-Walkthrough-properties) + + - [3.3 Composition Keywords](#33-Composition-Keywords) - [4. 
Encoding](#4-Encoding) - [4.1 Variants](#41-Variants) - [4.2 Primitive types](#42-Primitive-types) - + - [4.2.1 Arrays](#421-arrays) - + - [4.2.2 Boolean](#422-boolean) - + - [4.2.3 Integers](#423-integers) - + - [4.2.4 Numbers](#424-numbers) - + - [4.2.5 Objects](#425-objects) - + - [4.2.6 Strings](#426-strings) - - - [4.3 Special formats](#43-special-formats) - - [4.3.1 Binary](#431-binary) - - - [4.3.2 Date and DateTime](#432-date-and-datetime) - - - [4.3.3 IPV4 and IPV6](#433-ipv4-and-ipv6) - - - [4.3.4 UUID](#434-uuid) + - [4.3 Size Encoding](#43-size-encoding) + + - [4.4 Edge Cases](#44-edge-cases) + + - [4.5 Special formats](#45-special-formats) + + - [4.5.1 Binary](#451-binary) + + - [4.5.2 Date and DateTime](#452-date-and-datetime) + + - [4.5.3 IPV4 and IPV6](#453-ipv4-and-ipv6) + + - [4.5.4 UUID](#454-uuid) - [5. Security considerations](#5-Security-considerations) @@ -91,7 +95,7 @@ Schema-based serialization protocols generally yield much smaller outputs, which The initial concept for the compactr protocol was drafted in [2016](https://www.npmjs.com/package/compactr/v/0.0.1) with the goal of creating a schema-based serialization protocol that outputs minimal binary while using first party markdown or code structures as schemas. -While functional, the early versions would still require the knowledge of writing "compactr-style" schemas as Javascript Objects or JSON and limited adoption for languages outside of Javascript. As of compactr.js 3.0, release in 2025, the protocol moved to adopt OpenAPI 3.x as the base format for compactr schemas. +While functional, the early versions would still require the knowledge of writing "compactr-style" schemas as Javascript Objects or JSON and limited adoption for languages outside of Javascript. As of compactr.js 3.0, released in 2025, the protocol moved to adopt OpenAPI 3.x as the base format for compactr schemas. 
--- @@ -99,7 +103,7 @@ While functional, the early versions would still require the knowledge of writin The primary objectives of the Compactr protocols are: -- First-party schema definitions, using [[OAS]](#6-References)OpenAPI specifications as base schemas. +- First-party schema definitions, using [[OAS]](#6-References) OpenAPI specifications as base schemas. - Optimized binary output - Compatibility across runtimes - Type safety @@ -112,13 +116,13 @@ Compactr binary MUST follow Network Byte Order (NBO) big-endian format. ### 2.2 Key limits -Indices SHALL be assigned for properties and stored as an unsigned 8-bit integer. Thus limiting the number of properties per object to 255. +Indices SHALL be assigned for properties and stored as an unsigned 8-bit integer (range 0-255). Since indices start from 1 (as per Section 2.4), the maximum property index is 255, thus limiting the number of properties per object to 255. ### 2.3 Size limits -Some primitive types (e.g., `Boolean`) have fixed sizes and therefore MUST NOT encode size bytes, while others (e.g.,: `String`) have variable sizes and MUST include between one and four size bytes. +Some primitive types (e.g., `Boolean`) have fixed sizes and therefore MUST NOT encode size bytes, while others (e.g., `String`, `Array`) have variable sizes and MUST include size bytes. -Size bytes are represented by unsigned integers of varying sizes, which are described in the [primitives](#4-2-primitive-types) section of this document. +Size bytes are represented by unsigned integers. The specific size encoding for each type is described in the [Size Encoding](#43-size-encoding) section of this document. ### 2.4 Schema properties and Encoding order @@ -141,14 +145,14 @@ Example: } ``` -Will attribute index 0 to field `a`, index 1 to field `b` and 2 for `c`. Implementations of this protocol MUST follow this sorting rule to maintain consistency, even if properties are listed in differring orders across systems. 
+Will attribute index 1 to field `a`, index 2 to field `b` and index 3 to field `c`. Implementations of this protocol MUST follow this sorting rule to maintain consistency, even if properties are listed in differing orders across systems. Encoding of values to generate the binary output SHOULD simply follow the order in which the properties are listed in the structure or object. For example, serializing `{ c: true, a: true, b: true }` with the previous schema will output: `0x03 0x01 0x01 0x01 0x02 0x01`. -## 2.5 Unsupported features +## 2.5 Schema References and Versioning ### 2.5.1 References @@ -170,7 +174,7 @@ Client implementations MAY elect to include integrity or versioning checks provi ### 3.1 Schema Source -Compactr schemas are derived from OpenAPI 3.0+ Schema Objects, as defined in [[OAS]](https://spec.openapis.org/oas/v3.1.2.html)OpenAPI specifications. +Compactr schemas are derived from OpenAPI 3.0+ Schema Objects, as defined in [[OAS]](https://spec.openapis.org/oas/v3.1.2.html) OpenAPI specifications. Only the following schema keywords are normative for Compactr encoding: @@ -189,17 +193,51 @@ All other OpenAPI keywords (e.g., description, example, deprecated) are ignored ### 3.2 Required vs Optional Properties -Properties listed in required MUST be present during encoding. Missing required properties MUST throw an encoding error. +Properties listed in the schema's `required` array MUST be present during encoding. Missing required properties MUST throw an encoding error. -Optional properties MAY be omitted. +Optional properties (not listed in `required`) MAY be omitted from the data being encoded. -Missing optional properties are not encoded and do not occupy space. +Missing optional properties are not encoded and do not occupy space in the binary output. Decoders MUST treat omitted optional properties as undefined (or language equivalent). 
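The indexing and ordering rules of Sections 2.4 and 3.2 fit in a short JavaScript sketch. The helper names `assignIndices` and `encodeBooleanObject` are hypothetical, and the schema is restricted to boolean properties so that each field is just `[index][value]`. JavaScript's default `Array.prototype.sort()` compares strings by UTF-16 code unit values, which is the ordering Section 2.4 asks for.

```javascript
// Sketch only: 1-based alphabetical indices (Section 2.4) and required checks (Section 3.2).
// Assumptions: hypothetical helpers; boolean-only properties.

// Default sort() orders strings by UTF-16 code unit values.
function assignIndices(schema) {
  const names = Object.keys(schema.properties).sort();
  if (names.length > 255) throw new RangeError("more than 255 properties");
  const indices = new Map();
  names.forEach((name, i) => indices.set(name, i + 1));
  return indices;
}

// Emits [index][value] pairs in the data object's own key order.
function encodeBooleanObject(schema, data) {
  for (const name of schema.required ?? []) {
    if (!(name in data)) throw new Error(`missing required property: ${name}`);
  }
  const indices = assignIndices(schema);
  const bytes = [];
  for (const [name, value] of Object.entries(data)) {
    const index = indices.get(name);
    if (index === undefined) throw new Error(`unknown property: ${name}`);
    bytes.push(index, value ? 0x01 : 0x00);
  }
  return Uint8Array.from(bytes);
}

const schema = {
  type: "object",
  properties: { c: { type: "boolean" }, a: { type: "boolean" }, b: { type: "boolean" } },
};
const out = encodeBooleanObject(schema, { c: true, a: true, b: true });
console.log(Array.from(out, (b) => "0x" + b.toString(16).padStart(2, "0")).join(" "));
// 0x03 0x01 0x01 0x01 0x02 0x01
```

Indices come from the sorted schema while bytes follow the data object's key order, which is why `c` (index 3) is emitted first.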
-### 3.3 Walkthrough properties +Example: + +``` +// Schema +{ + type: 'object', + properties: { + name: { type: 'string' }, + age: { type: 'integer', format: 'int32' } + }, + required: ['name'] +} + +// Data with both properties +{ name: 'Alice', age: 30 } +// Output: Both properties encoded + +// Data with only required property +{ name: 'Alice' } +// Output: Only 'name' encoded, 'age' omitted entirely + +// Data missing required property +{ age: 30 } +// Result: Encoding error +``` + +### 3.3 Composition Keywords + +Compactr walks through composition and walkthrough keywords `$ref`, `schema`, `oneOf`, `allOf`, `anyOf` and only creates internal models for primitives. + +**`allOf`**: Merges all schemas in the array. All properties from all schemas MUST be encoded. The merged schema is treated as a single object schema. -Compactr walks through composition keywords `$ref`, `schema` `oneOf`, `allOf`, `anyOf` and only creates internal models for primitives. +**`oneOf` and `anyOf`**: Require a variant byte (see Section 4.1) to indicate which schema definition is being encoded. The variant byte starts at `0x01` for the first schema in the array and increments for each subsequent schema. + +**`$ref`**: Resolves to the referenced schema and encodes according to that schema's type. + +**`schema`**: A walkthrough keyword that wraps a schema definition without affecting encoding. --- @@ -211,15 +249,68 @@ Properties are encoded with the matching schema index first (`i`), then an optio ### 4.1 Variants -Encoded fields which have the `nullable` schema property and a `null` value have an extra byte that indicates the variant. +Encoded fields which have the `nullable` schema property MUST include a variant byte that indicates whether the value is null or not. 
-- `0x00` For null values
-- `0x01` For non-null values
+- `0x00` For null values (no data bytes follow)
+- `0x01` For non-null values (data bytes follow as per the type specification)

-If the `nullable` property is not present in the schema, the variant byte is not encoded and `null` values are not encoded.
+If the `nullable` property is not present in the schema, the variant byte MUST NOT be encoded. Attempts to encode `null` values for non-nullable fields MUST result in an encoding error.

 Fields with multiple definitions, as described in the schema with the `oneOf` or `anyOf` keywords use the variant byte to indicate which definition to use, starting with `0x01` for the first definition, and incrementing by `0x01` for each subsequent one.

+Example of nullable field:
+
+```
+// Schema
+{
+  type: 'object',
+  properties: {
+    optionalValue: { type: 'string', nullable: true }
+  }
+}
+
+// Data with null value
+{ optionalValue: null }
+
+// Output: 0x01 0x00
+// Breakdown: [index: 1][variant: null]
+
+// Data with non-null value
+{ optionalValue: 'test' }
+
+// Output: 0x01 0x01 0x04 0x74 0x65 0x73 0x74
+// Breakdown: [index: 1][variant: non-null][size: 4]['test']
+```
+
+Example of oneOf variant:
+
+```
+// Schema
+{
+  type: 'object',
+  properties: {
+    value: {
+      oneOf: [
+        { type: 'string' },
+        { type: 'integer', format: 'int32' }
+      ]
+    }
+  }
+}
+
+// Data with first variant (string)
+{ value: 'hello' }
+
+// Output: 0x01 0x01 0x05 0x68 0x65 0x6c 0x6c 0x6f
+// Breakdown: [index: 1][variant: 1][size: 5]['hello']
+
+// Data with second variant (integer)
+{ value: 42 }
+
+// Output: 0x01 0x02 0x00 0x00 0x00 0x2a
+// Breakdown: [index: 1][variant: 2][int32: 42]
+```
+
 ### 4.2 Primitive types

@@ -227,7 +318,7 @@ Types are based on JSON Schema Validation Specification Draft 2020-12: `array`,

 #### 4.2.1 Arrays

-Arrays MUST include an unsigned 32-bit integer to represent the whole size of the array. Individual elements are treated sequentially as their primitives defined in the schema.
+Arrays MUST include an unsigned 32-bit integer to represent the total byte size of all encoded array elements combined. Individual elements are then treated sequentially as their primitives defined in the schema.

 Example:

@@ -247,87 +338,257 @@ Example:
 { foo: [ 'hello', 'bye', 'bye' ] }
 ```

-Results in this buffer: `0x01 0x00 0x00 0x0e 0x05 0x68 0x65 0x6c 0x6c 0x6f 0x03 0x62 0x79 0x65 0x03 0x62 0x79 0x65`.
+Results in this buffer: `0x01 0x00 0x00 0x00 0x0e 0x05 0x68 0x65 0x6c 0x6c 0x6f 0x03 0x62 0x79 0x65 0x03 0x62 0x79 0x65`.
+
+Breakdown: `[index: 1][array size: 14 bytes as 32-bit int][string 'hello': size 5 + 5 bytes][string 'bye': size 3 + 3 bytes][string 'bye': size 3 + 3 bytes]`

 #### 4.2.2 Boolean

 Fixed size of 1 byte, either 0x00 for false or 0x01 for true.

+Example:
+
+```
+// Schema
+{
+  type: 'object',
+  properties: {
+    isActive: { type: 'boolean' }
+  }
+}
+
+// Data
+{ isActive: true }
+
+// Output: 0x01 0x01
+// Breakdown: [index: 1][value: true]
+```
+
 #### 4.2.3 Integers

-Variable size based on the `format` attribute defined in the schema. Size byte SHOULD NOT be encoded. Decoding should take in account the `format` attribute to determine the size.
+Variable size based on the `format` attribute defined in the schema. Size bytes MUST NOT be encoded. Decoders MUST use the `format` attribute to determine the byte size.

 - `(null, undefined or language equivalent)`: unsigned 32-bit integer
-- `int32`: unsigned 32-bit integer
-- `int64`: unsigned 64-bit integer
+- `int32`: signed 32-bit integer
+- `int64`: signed 64-bit integer
+
+Example:
+
+```
+// Schema
+{
+  type: 'object',
+  properties: {
+    count: { type: 'integer', format: 'int32' }
+  }
+}
+
+// Data
+{ count: 42 }
+
+// Output: 0x01 0x00 0x00 0x00 0x2a
+// Breakdown: [index: 1][int32 value: 42 in big-endian]
+```

 #### 4.2.4 Numbers

-Variable size based on the `format` attribute defined in the schema. Size byte SHOULD NOT be encoded. Decoding should take in account the `format` attribute to determine the size.
+Variable size based on the `format` attribute defined in the schema. Size bytes MUST NOT be encoded. Decoders MUST use the `format` attribute to determine the byte size.
+
+All floating-point arithmetic MUST adhere to [[IEEE 754-2019]](#6-References).
+
+- `(null, undefined or language equivalent)`: 64-bit floating point (double precision)
+- `float`: 32-bit floating point (single precision)
+- `double`: 64-bit floating point (double precision)
+
+Example:
+
+```
+// Schema
+{
+  type: 'object',
+  properties: {
+    price: { type: 'number', format: 'float' }
+  }
+}

-All floating-point arithmetic MUST adhere to [[IEEE 754-2019]](#6-References)
+// Data
+{ price: 19.99 }

-- `(null, undefined or language equivalent)`: 64-bit floating point
-- `float`: 32-bit floating point
-- `double`: 64-bit floating point
+// Output: 0x01 0x41 0x9f 0xeb 0x85
+// Breakdown: [index: 1][IEEE 754 single-precision float nearest to 19.99]
+```

 #### 4.2.5 Objects

 Objects are encoded recursively using the same scheme: `[i][v?][s?...][d...]`.

+Example:
+
+```
+// Schema
+{
+  type: 'object',
+  properties: {
+    user: {
+      type: 'object',
+      properties: {
+        name: { type: 'string' },
+        age: { type: 'integer', format: 'int32' }
+      }
+    }
+  }
+}
+
+// Data
+{ user: { age: 30, name: 'Alice' } }
+
+// Output: 0x01 0x01 0x00 0x00 0x00 0x1e 0x02 0x05 0x41 0x6c 0x69 0x63 0x65
+// Breakdown: [user index: 1][nested age index: 1][age value: 30][nested name index: 2][name size: 5]['Alice']
+```
+
 #### 4.2.6 Strings

-Strings are encoded as UTF-8 Multi-byte Unicode characters. Most languages provide a UTF-8 encoding utility, which SHOULD be used to determine the size and generate the bytes to be appended.
+Strings are encoded as UTF-8 multi-byte Unicode characters. The byte size is encoded as an unsigned 8-bit integer (1 byte, supporting sizes 0-255) followed by the UTF-8 encoded bytes. Most languages provide a UTF-8 encoding utility, which SHOULD be used to determine the size and generate the bytes to be appended.
+
+For strings exceeding 255 bytes, see Section 4.3 Size Encoding for implementation requirements.

-### 4.3 Special formats
+Example:
+
+```
+// Schema
+{
+  type: 'object',
+  properties: {
+    message: { type: 'string' }
+  }
+}
+
+// Data
+{ message: 'Hello' }
+
+// Output: 0x01 0x05 0x48 0x65 0x6c 0x6c 0x6f
+// Breakdown: [index: 1][size: 5]['Hello' as UTF-8 bytes]
+```
+
+### 4.3 Size Encoding
+
+Variable-length types (strings, arrays, binary) encode their size using unsigned integers. The number of bytes used for size encoding depends on the type:
+
+- **Strings**: unsigned 8-bit integer (1 byte) for sizes 0-255
+- **Arrays**: unsigned 32-bit integer (4 bytes) for total byte size
+- **Binary**: unsigned 32-bit integer (4 bytes) for byte size
+
+For strings, if the UTF-8 encoded byte size exceeds 255 bytes, implementations MUST either use a larger integer type or throw an encoding error. Future versions of this specification MAY support variable-length integer encoding for sizes.
+
+### 4.4 Edge Cases
+
+#### Empty Arrays and Empty Strings
+
+Empty arrays MUST encode a size of `0x00 0x00 0x00 0x00` (4 bytes) followed by no element bytes.
+
+Empty strings MUST encode a size of `0x00` (1 byte) followed by no character bytes.
+
+#### Unknown Schema Keywords
+
+Implementations MUST ignore schema keywords that are not normative for Compactr encoding (as listed in Section 3.1). Non-normative keywords (e.g., `description`, `example`, `deprecated`) MUST NOT affect encoding or decoding behavior.
+
+#### Missing Required Properties
+
+Encoders MUST throw an error if a required property (listed in the schema's `required` array) is missing from the data being encoded.
+
+Decoders MUST throw an error if a property listed in the schema's `required` array is absent from the encoded binary.
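The byte layouts in Sections 4.1 to 4.4 can be illustrated with a minimal JavaScript sketch of the primitive encoders. It makes four assumptions:

- Multi-byte values are big-endian, as in the integer example.
- Strings carry a 1-byte size, and oversized strings throw.
- Arrays carry a 4-byte unsigned total byte size.
- Property indexes are written by the caller.

All function names (`encodeString`, `encodeInt32`, `encodeFloat32`, `encodeNullable`, `encodeArray`, `concat`, `hex`) are hypothetical. Only the `string`, `int32`, `float`, nullable and array cases are covered.

```javascript
// Sketch of the primitive encodings in Sections 4.1 to 4.4.
// All helper names are hypothetical. Each helper returns the bytes for one
// value WITHOUT the leading property index, which the object encoder adds.

const utf8 = new TextEncoder();

// Concatenate several Uint8Arrays into one.
function concat(parts) {
  const out = new Uint8Array(parts.reduce((total, p) => total + p.length, 0));
  let offset = 0;
  for (const p of parts) {
    out.set(p, offset);
    offset += p.length;
  }
  return out;
}

// string: 1-byte unsigned size, then the UTF-8 bytes.
function encodeString(value) {
  const bytes = utf8.encode(value);
  if (bytes.length > 255) {
    // Of the two options allowed for oversized strings, this sketch throws.
    throw new RangeError(`string is ${bytes.length} bytes; the 1-byte size allows 255`);
  }
  return concat([Uint8Array.of(bytes.length), bytes]);
}

// integer/int32: signed, 4 bytes, big-endian, no size byte.
function encodeInt32(value) {
  if (!Number.isInteger(value) || value < -(2 ** 31) || value > 2 ** 31 - 1) {
    throw new RangeError(`${value} does not fit in int32`);
  }
  const out = new Uint8Array(4);
  new DataView(out.buffer).setInt32(0, value, false); // false = big-endian
  return out;
}

// number/float: IEEE 754 single precision, 4 bytes, big-endian.
function encodeFloat32(value) {
  const out = new Uint8Array(4);
  new DataView(out.buffer).setFloat32(0, value, false);
  return out;
}

// nullable field: variant 0x00 (no data follows) or 0x01 followed by the data.
function encodeNullable(value, encodeValue) {
  return value === null
    ? Uint8Array.of(0x00)
    : concat([Uint8Array.of(0x01), encodeValue(value)]);
}

// array: 4-byte unsigned big-endian total byte size of the encoded elements.
function encodeArray(items, encodeItem) {
  const body = concat(items.map((item) => encodeItem(item)));
  const size = new Uint8Array(4);
  new DataView(size.buffer).setUint32(0, body.length, false);
  return concat([size, body]);
}

const hex = (bytes) =>
  Array.from(bytes, (b) => '0x' + b.toString(16).padStart(2, '0')).join(' ');

// Reproduce two examples from the text, prepending property index 1 by hand.
console.log(hex(concat([Uint8Array.of(1), encodeNullable('test', encodeString)])));
// 0x01 0x01 0x04 0x74 0x65 0x73 0x74
console.log(hex(concat([Uint8Array.of(1), encodeArray(['hello', 'bye', 'bye'], encodeString)])));
// 0x01 0x00 0x00 0x00 0x0e 0x05 0x68 0x65 0x6c 0x6c 0x6f 0x03 0x62 0x79 0x65 0x03 0x62 0x79 0x65
```

`DataView` is used because its setters take an explicit endianness flag, which keeps the output independent of the host's native byte order.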
+
+### 4.5 Special formats

 Compactr supports encoding of special formats to improve efficiency. Additional special encoding formats MAY be added.

-#### 4.3.1 Binary
+#### 4.5.1 Binary

 Variable length `string` format with 32-bit size bytes.

-Buffers and UInt8Arrays MAY be encoded as-is, while `strings` MUST be Base64 encoded.
+Binary data in Buffers and Uint8Arrays MAY be encoded as-is (raw bytes), while binary data represented as `strings` MUST be encoded to Base64 before binary encoding.

-#### 4.3.2 Date and DateTime
+#### 4.5.2 Date and DateTime

 Fixed length `string` formats with no size bytes.

-`date` is represented as `[uint32][uint8][uint8]` to encode YYYY-MM-DD values.
+`date` is represented as 6 bytes encoding YYYY-MM-DD:
+- `[uint32]` Year (0-9999)
+- `[uint8]` Month (1-12)
+- `[uint8]` Day (1-31)

-`date-time` is represented as `[uint32][uint8][uint8][uint8][uint8][uint8][uint32]` to encode YYYY-MM-DDTHH:mm:ss.sssZ date strings with UTC time.
+`date-time` is represented as 13 bytes encoding YYYY-MM-DDTHH:mm:ss.sssZ (UTC time):
+- `[uint32]` Year (0-9999)
+- `[uint8]` Month (1-12)
+- `[uint8]` Day (1-31)
+- `[uint8]` Hour (0-23)
+- `[uint8]` Minute (0-59)
+- `[uint8]` Second (0-59)
+- `[uint32]` Milliseconds (0-999)

-Values MUST be reconstructed as such by the decoder to fit [[ISO 8601]](#6-References) extended date.
+Decoders MUST reconstruct values to conform to the [[ISO 8601]](#6-References) extended date format.

-Implementations SHOULD validate that the input string is a valid date string and SHOULD set time bytes to 0 if not explicitly set.
+Encoders SHOULD validate that input strings are valid dates and SHOULD set time components to 0 when not explicitly specified.

-#### 4.3.3 IPV4 and IPV6
+#### 4.5.3 IPv4 and IPv6

 Fixed length `string` formats with no size bytes.

-`ipv4` is represented as [uint8][uint8][uint8][uint8] and must be decoded to match [[RFC791]](#6-References) IPV4 format.
+`ipv4` is represented as 4 bytes encoding dotted-decimal notation (e.g., "192.168.1.1"):
+- `[uint8]` First octet (0-255)
+- `[uint8]` Second octet (0-255)
+- `[uint8]` Third octet (0-255)
+- `[uint8]` Fourth octet (0-255)
+
+Decoders MUST reconstruct the dotted-decimal string format as specified in [[RFC791]](#6-References).

-`ipv6` is represented as [uint32][uint32][uint32][uint32] and must be decoded to match [[RFC8200]](#6-References) IPV6 format.
+`ipv6` is represented as 16 bytes encoding the 128-bit IPv6 address (e.g., "2001:0db8:85a3:0000:0000:8a2e:0370:7334"):
+- `[uint32]` First 32 bits
+- `[uint32]` Second 32 bits
+- `[uint32]` Third 32 bits
+- `[uint32]` Fourth 32 bits

-#### 4.3.4 UUID
+Decoders MUST reconstruct the standard IPv6 string format as specified in [[RFC8200]](#6-References).

-Fixed sized 16 bytes using raw UUID bytes (network-order) and must be decoded as standard [[RFC9562]](#6-References) UUID.
+#### 4.5.4 UUID
+
+Fixed size of 16 bytes using raw UUID bytes (network order), which MUST be decoded as a standard [[RFC9562]](#6-References) UUID.

 ---

 ## 5. Security considerations

-Implementations MUST guard against:
+Implementations MUST guard against the following security vulnerabilities:
+
+### 5.1 Schema Validation
+
+**Circular schemas**: Detect and reject schemas with circular `$ref` references to prevent infinite loops during encoding/decoding.
+
+**Format-type mismatches**: Validate that schema `format` attributes match their declared `type`. For example, a schema declaring `{ type: 'number', format: 'uuid' }` is invalid and MUST be rejected, as `uuid` format only applies to `string` types.
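One way to implement the format-type check is a lookup table from each `format` to the `type` it applies to. The JavaScript sketch below is hypothetical throughout:

- `FORMAT_TYPES` and `checkFormats` are illustrative names.
- The table lists only the formats this specification gives a Compactr encoding for.
- Any other format (for example OpenAPI's `email`) is ignored. That is an assumption, not a rule from the text.
- `$ref` resolution and circular-reference detection are left out.

```javascript
// Sketch of the format/type consistency check (hypothetical names).
// Maps each Compactr-specific format to the only `type` it may appear on.
const FORMAT_TYPES = new Map([
  ['int32', 'integer'],
  ['int64', 'integer'],
  ['float', 'number'],
  ['double', 'number'],
  ['binary', 'string'],
  ['date', 'string'],
  ['date-time', 'string'],
  ['ipv4', 'string'],
  ['ipv6', 'string'],
  ['uuid', 'string'],
]);

// Recursively walk a schema and throw on the first format/type mismatch.
// `path` only serves to build a readable error message.
function checkFormats(schema, path = '#') {
  const expected = FORMAT_TYPES.get(schema.format);
  if (expected !== undefined && schema.type !== expected) {
    throw new TypeError(
      `${path}: format '${schema.format}' requires type '${expected}', got '${schema.type}'`,
    );
  }
  for (const [name, sub] of Object.entries(schema.properties ?? {})) {
    checkFormats(sub, `${path}/properties/${name}`);
  }
  if (schema.items) checkFormats(schema.items, `${path}/items`);
  for (const keyword of ['allOf', 'oneOf', 'anyOf']) {
    (schema[keyword] ?? []).forEach((sub, i) => checkFormats(sub, `${path}/${keyword}/${i}`));
  }
}

checkFormats({ type: 'string', format: 'uuid' }); // passes silently
try {
  checkFormats({ type: 'object', properties: { id: { type: 'number', format: 'uuid' } } });
} catch (err) {
  console.log(err.message); // #/properties/id: format 'uuid' requires type 'string', got 'number'
}
```

A `Map` is used instead of a plain object so that a format named like an inherited property (e.g. `constructor`) is not mistaken for a known format.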
+
+Examples of valid and invalid format-type combinations:
+- `{ type: 'string', format: 'date' }` - Valid
+- `{ type: 'string', format: 'uuid' }` - Valid
+- `{ type: 'integer', format: 'int32' }` - Valid
+- `{ type: 'integer', format: 'date' }` - Invalid (must reject)
+
+### 5.2 Input Validation
+
+**Malformed UTF-8**: Validate that all string inputs are valid UTF-8 before encoding. Reject or sanitize malformed UTF-8 sequences.
+
+**Integer overflow**: Validate that integer values fit within their declared format's range (e.g., int32 values must be between -2,147,483,648 and 2,147,483,647).
+
+### 5.3 Resource Limits
+
+**Maximum object keys**: Enforce a maximum limit on the number of properties per object (255 as per Section 2.2) to prevent resource exhaustion.
+
+**Maximum recursion depth**: Implement a maximum depth limit for nested objects to prevent stack overflow attacks. A reasonable limit is 100 levels of nesting.
+
+**Array byte size limits**: Enforce maximum array byte sizes to prevent memory exhaustion attacks. Implementations SHOULD reject arrays exceeding a configured size limit (e.g., 100MB).

-- Circular schemas
-- Malformed UTF-8
-- Integer overflow
-- Maximum object keys
-- Maximum object recursion depth
-- Array total byte size
-- Ensure schema formats match the appropriate schema type
+### 5.4 Cryptographic Considerations

-Compactr does not provide encryption, authentication, or integrity guarantees.
+Compactr does not provide encryption, authentication, or integrity guarantees. Applications requiring these properties MUST implement them at the transport or application layer.

 ---

@@ -340,6 +601,7 @@ Compactr does not provide encryption, authentication, or integrity guarantees.
 - [RFC8200] Internet Protocol, Version 6 (IPv6) Specification. S. Deering. IETF. (2017).
 - [RFC9562] Universally Unique IDentifiers (UUIDs). K. Davis. IETF. (2024)
 - [IEEE 754-2019] IEEE 754-2019: IEEE Standard for Floating-Point Arithmetic. Institute of Electrical and Electronic Engineers. (2019).
+- [ISO 8601] Date and time format. International Organization for Standardization.

 ---


From 9c60efe364c6d6c3ad6b6034330d6542b05e8a35 Mon Sep 17 00:00:00 2001
From: Frederic Charette
Date: Tue, 6 Jan 2026 11:20:08 -0500
Subject: [PATCH 14/14] Update README.md

---
 README.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/README.md b/README.md
index d5f1a7c..98c5465 100644
--- a/README.md
+++ b/README.md
@@ -5,7 +5,7 @@ Authors:

 Date published: 2026-01-01

-Last update: 2026-01-02
+Last update: 2026-01-06

 Keywords:
 - serialization
@@ -229,7 +229,7 @@ Example:

 ### 3.3 Composition Keywords

-Compactr walks through composition and walkthrough keywords `$ref`, `schema`, `oneOf`, `allOf`, `anyOf` and only creates internal models for primitives.
+Compactr walks through the composition keywords `$ref`, `schema`, `oneOf`, `allOf` and `anyOf`, and only creates internal models for primitives.

 **`allOf`**: Merges all schemas in the array. All properties from all schemas MUST be encoded. The merged schema is treated as a single object schema.
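The `allOf` rule can be read as a schema merge performed before property indexes are assigned. The JavaScript sketch below is illustrative only and rests on these assumptions:

- `mergeAllOf` is a hypothetical name.
- Every member is an inline object schema, with any `$ref` already resolved.
- `required` lists are unioned.
- A property declared by more than one member keeps its first definition. The text does not say how such conflicts are resolved, so this choice is arbitrary.

```javascript
// Sketch of flattening `allOf` into a single object schema before property
// indexes are assigned. mergeAllOf is a hypothetical name; it assumes every
// member is an inline object schema (`$ref` members already resolved).
function mergeAllOf(members, merged = { type: 'object', properties: {}, required: [] }) {
  for (const member of members) {
    // Nested allOf lists are flattened into the same result first.
    if (member.allOf) mergeAllOf(member.allOf, merged);
    for (const [name, sub] of Object.entries(member.properties ?? {})) {
      // Assumption: if two members declare the same property, the first wins.
      if (!Object.prototype.hasOwnProperty.call(merged.properties, name)) {
        merged.properties[name] = sub;
      }
    }
    // `required` lists are unioned without duplicates.
    for (const name of member.required ?? []) {
      if (!merged.required.includes(name)) merged.required.push(name);
    }
  }
  return merged;
}

const merged = mergeAllOf([
  { type: 'object', properties: { id: { type: 'string', format: 'uuid' } }, required: ['id'] },
  { type: 'object', properties: { name: { type: 'string' } } },
]);
console.log(Object.keys(merged.properties), merged.required); // [ 'id', 'name' ] [ 'id' ]
```

The merged schema then gets alphabetical indexes like any other object schema: `id` becomes index 1 and `name` index 2.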