Section: Core Specification
Version: 0.1
A Codex document is packaged as a ZIP archive with the file extension .cdx. This approach provides:
- Familiar tooling and broad platform support
- Built-in compression at the container level
- Random access to individual components
- Easy inspection and debugging
Codex documents MUST use the file extension .cdx.
| Form | MIME Type | Use |
|---|---|---|
| Canonical (JSON) | application/vnd.codex+json | Primary format |
| Binary | application/vnd.codex | Future optimization |
Implementations SHOULD register these MIME types with the operating system for proper file association.
Codex documents MUST be valid ZIP archives conforming to APPNOTE.TXT version 6.3.3 or later.
The following ZIP features are REQUIRED:
- ZIP64 extensions for documents larger than 4GB
- UTF-8 encoding for file names (Language Encoding Flag set)
The following ZIP features MUST NOT be used:
- ZIP encryption (use Codex security extension instead)
- Multi-volume archives
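As a minimal sketch, a reader can check these feature rules from the central directory using Python's standard `zipfile` module. The helper name `check_zip_features` is hypothetical; the masks are the APPNOTE.TXT general purpose bits (bit 0 marks an encrypted entry, bit 11 is the Language Encoding Flag). The sketch only demands the flag for non-ASCII names, because writers such as Python's `zipfile` set bit 11 only when a name is non-ASCII; a stricter reading of the rule above would require it on every entry.

```python
import zipfile
from collections.abc import Iterable

ENCRYPTED_FLAG = 0x0001  # general purpose bit 0: entry is encrypted
UTF8_FLAG = 0x0800       # general purpose bit 11: Language Encoding Flag

def check_zip_features(infos: Iterable[zipfile.ZipInfo]) -> list[str]:
    """Report entries that use forbidden ZIP features or lack UTF-8 names."""
    problems = []
    for info in infos:
        if info.flag_bits & ENCRYPTED_FLAG:
            problems.append(f"{info.filename}: ZIP encryption is not allowed")
        if info.volume != 0:
            # A non-zero disk number means the entry lives on another volume.
            problems.append(f"{info.filename}: multi-volume archives are not allowed")
        if not info.filename.isascii() and not info.flag_bits & UTF8_FLAG:
            problems.append(f"{info.filename}: non-ASCII name without UTF-8 flag")
    return problems
```

Call it as `check_zip_features(zf.infolist())` on an open archive. ZIP64 needs no separate check here: Python's `zipfile` writes ZIP64 records automatically when `allowZip64` is left at its default of `True`.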
Individual files within the archive MAY use the following compression methods:
| Method | Code | Use Case |
|---|---|---|
| Store | 0 | Pre-compressed assets (AVIF, WebP) |
| Deflate | 8 | General content, wide compatibility |
| Zstandard | 93 | Optimized compression (recommended) |
Implementations MUST support Deflate (method 8). Support for Zstandard (method 93) is RECOMMENDED.
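The method table can be turned into a per-entry choice when writing and a matching check when reading. This is a sketch under stated assumptions: Zstandard is treated as available only where `zipfile.ZIP_ZSTANDARD` and the `compression.zstd` module exist (Python 3.14+ builds with zstd); otherwise it falls back to Deflate, the required baseline. `choose_method`, `unsupported_entries`, and the extension list are illustrative names.

```python
import zipfile

# Assumption: Zstandard (method 93) needs Python 3.14+ built with zstd.
try:
    from compression import zstd as _zstd  # noqa: F401
    ZSTD = zipfile.ZIP_ZSTANDARD
except (ImportError, AttributeError):
    ZSTD = None

# Pre-compressed formats from the table above; extend as needed.
PRECOMPRESSED = (".avif", ".webp")

def choose_method(name: str) -> int:
    """Pick a compression method code for a new archive entry."""
    if name.lower().endswith(PRECOMPRESSED):
        return zipfile.ZIP_STORED      # method 0: do not recompress
    if ZSTD is not None:
        return ZSTD                    # method 93: recommended
    return zipfile.ZIP_DEFLATED        # method 8: required baseline

def unsupported_entries(infos: list[zipfile.ZipInfo]) -> list[str]:
    """List entries whose compression method this reader cannot decode."""
    readable = {zipfile.ZIP_STORED, zipfile.ZIP_DEFLATED}
    if ZSTD is not None:
        readable.add(ZSTD)
    return [i.filename for i in infos if i.compress_type not in readable]
```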
The archive MUST contain the following structure:
/
├── manifest.json # REQUIRED
├── content/
│ └── document.json # REQUIRED
├── presentation/ # OPTIONAL
│ ├── paginated.json
│ └── continuous.json
├── assets/ # OPTIONAL
│ ├── images/
│ ├── fonts/
│ └── embeds/
├── security/ # OPTIONAL
│ └── signatures.json
└── metadata/
└── dublin-core.json # REQUIRED
The following files are REQUIRED:
| Path | Description |
|---|---|
| /manifest.json | Document manifest with version, state, and structure |
| /content/document.json | Semantic content blocks |
| /metadata/dublin-core.json | Dublin Core metadata |
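A reader might check for the required files like this. It is a minimal sketch with a hypothetical helper name; archive member names are relative to the root, so the leading `/` used in the table is dropped.

```python
from collections.abc import Iterable

# Required entries, written as archive member names (no leading "/").
REQUIRED_PATHS = ("manifest.json", "content/document.json", "metadata/dublin-core.json")

def missing_required(names: Iterable[str]) -> list[str]:
    """Return the required paths that are absent from the archive listing."""
    present = set(names)
    return [p for p in REQUIRED_PATHS if p not in present]
```

With Python's `zipfile`, pass `zf.namelist()`; an empty result means all required files are present.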
The following directories are OPTIONAL:
| Path | Description |
|---|---|
| /presentation/ | Presentation layer files |
| /assets/ | Embedded resources |
| /security/ | Signatures and encryption metadata |
| /phantoms/ | Off-page annotation clusters (Phantom Extension) |
All file and directory names within the archive:
- MUST be encoded as UTF-8
- MUST use forward slash (`/`) as the path separator
- MUST NOT contain backslash (`\`)
- MUST NOT begin with `/` (paths are relative to the archive root)
- SHOULD use lowercase for standard paths
- SHOULD use URL-safe characters for asset names
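The naming rules split naturally into MUST violations (errors) and SHOULD violations (warnings). In this sketch, "URL-safe" is read as the RFC 3986 unreserved characters plus `/`; that character set is an assumption, as is the helper name. The UTF-8 rule is not checked here, since it concerns how names are encoded in the archive rather than the decoded string.

```python
import re

# Assumed meaning of "URL-safe": RFC 3986 unreserved characters plus "/".
URL_SAFE = re.compile(r"[A-Za-z0-9._~/-]+")
STANDARD_PREFIXES = ("manifest.json", "content/", "presentation/", "assets/",
                     "security/", "metadata/", "phantoms/")

def check_entry_name(name: str) -> tuple[list[str], list[str]]:
    """Return (errors, warnings) for one entry name: MUST rules vs. SHOULD rules."""
    errors: list[str] = []
    warnings: list[str] = []
    if "\\" in name:
        errors.append("contains a backslash")
    if name.startswith("/"):
        errors.append("begins with '/'")
    lowered = name.lower()
    if any(lowered.startswith(p) and not name.startswith(p) for p in STANDARD_PREFIXES):
        warnings.append("standard path is not lowercase")
    if lowered.startswith("assets/") and not URL_SAFE.fullmatch(name):
        warnings.append("asset name is not URL-safe")
    return errors, warnings
```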
The archive MAY include a ZIP comment containing:
Codex Document Format v0.1
This enables format identification without extracting content.
The first file in the archive MUST be manifest.json. This enables:
- Quick format validation
- Streaming access to document metadata
- Efficient partial loading
| Component | Recommended Limit | Rationale |
|---|---|---|
| Total archive size | 2 GB | Practical processing |
| Individual file size | 500 MB | Memory efficiency |
| Number of files | 10,000 | File system compatibility |
| Path length | 255 characters | Cross-platform compatibility |
Implementations MAY support larger documents but SHOULD warn users about potential compatibility issues.
Conforming implementations MUST support:
- Archives up to 100 MB
- Individual files up to 50 MB
- At least 1,000 files
- Paths up to 200 characters
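One way to implement the "SHOULD warn" behavior is to compare an archive against the recommended limits and collect user-facing warnings. This sketch assumes "GB" and "MB" mean binary units and that "individual file size" means the uncompressed size; the spec does not settle either point, and the names are illustrative.

```python
import zipfile

# Recommended limits from the table above (binary units assumed).
RECOMMENDED_ARCHIVE_BYTES = 2 * 1024**3
RECOMMENDED_FILE_BYTES = 500 * 1024**2
RECOMMENDED_FILE_COUNT = 10_000
RECOMMENDED_PATH_CHARS = 255

def limit_warnings(archive_bytes: int, infos: list[zipfile.ZipInfo]) -> list[str]:
    """Return warnings for documents beyond the recommended limits."""
    warnings = []
    if archive_bytes > RECOMMENDED_ARCHIVE_BYTES:
        warnings.append(f"archive size {archive_bytes} B exceeds the recommended 2 GB")
    if len(infos) > RECOMMENDED_FILE_COUNT:
        warnings.append(f"{len(infos)} files exceed the recommended 10,000")
    for info in infos:
        if info.file_size > RECOMMENDED_FILE_BYTES:
            warnings.append(f"{info.filename}: larger than the recommended 500 MB")
        if len(info.filename) > RECOMMENDED_PATH_CHARS:
            warnings.append(f"{info.filename[:40]}...: path longer than 255 characters")
    return warnings
```

A caller would pass the on-disk size (for example from `os.path.getsize`) together with `zf.infolist()`.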
Standard ZIP CRC-32 checksums MUST be present for all files in the archive.
The document's content-addressable hash (see Document Hashing specification) provides integrity verification at the semantic level, independent of container-level checksums.
Extensions MAY define additional directories under the root. Custom directories:
- MUST NOT conflict with standard paths
- SHOULD use a namespace prefix (e.g., /x-myextension/)
- MUST be documented in the manifest
Implementations MUST ignore unrecognized files and directories. This enables:
- Future specification extensions
- Application-specific metadata
- Gradual migration between versions
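A reader can sort entries into standard paths, declared extension directories, and everything else, which it then ignores rather than rejects. This section does not say which manifest field documents custom directories, so `declared` below stands in for that list and is hypothetical, as are the helper names. `extension_problems` flags a conflict with a standard path and a missing `x-` prefix.

```python
from collections.abc import Iterable

STANDARD_FILES = {"manifest.json"}
STANDARD_DIRS = {"content", "presentation", "assets", "security", "metadata", "phantoms"}

def classify_entries(names: Iterable[str], declared: set[str]) -> dict[str, list[str]]:
    """Split entries into standard, declared-extension, and ignored buckets."""
    buckets: dict[str, list[str]] = {"standard": [], "extension": [], "ignored": []}
    for name in names:
        top, sep, _ = name.partition("/")
        if name in STANDARD_FILES or (sep and top in STANDARD_DIRS):
            buckets["standard"].append(name)
        elif sep and top in declared:
            buckets["extension"].append(name)
        else:
            # Unrecognized entries are skipped, never treated as errors.
            buckets["ignored"].append(name)
    return buckets

def extension_problems(declared: set[str]) -> list[str]:
    """Check declared extension directories against the naming rules above."""
    problems = []
    for d in sorted(declared):
        if d in STANDARD_DIRS:
            problems.append(f"{d}: conflicts with a standard path")
        elif not d.startswith("x-"):
            problems.append(f"{d}: missing 'x-' namespace prefix")
    return problems
```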
When creating a Codex document:
- Write manifest.json as the first entry
- Add required content and metadata files
- Add presentation layers (if any)
- Add assets (if any)
- Add security files (if any)
- Use Zstandard compression where supported, Deflate otherwise
- Store pre-compressed images without additional compression
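The creation steps can be sketched with Python's `zipfile`. `build_codex`, the ordering tuple, and the extension list are illustrative; the Zstandard detection assumes Python 3.14+ with the `compression.zstd` module and falls back to Deflate otherwise. The optional identification comment is set as well.

```python
import io
import zipfile

# Assumption: Zstandard (method 93) needs Python 3.14+ built with zstd.
try:
    from compression import zstd as _zstd  # noqa: F401
    ZSTD = zipfile.ZIP_ZSTANDARD
except (ImportError, AttributeError):
    ZSTD = None

PRECOMPRESSED = (".avif", ".webp")
# Write order following the steps above; unrecognized paths go last.
ORDER = ("content/", "metadata/", "presentation/", "assets/", "security/")

def _method(name: str) -> int:
    if name.lower().endswith(PRECOMPRESSED):
        return zipfile.ZIP_STORED  # already compressed: store as-is
    return ZSTD if ZSTD is not None else zipfile.ZIP_DEFLATED

def _rank(name: str) -> int:
    return next((i for i, p in enumerate(ORDER) if name.startswith(p)), len(ORDER))

def build_codex(manifest: bytes, files: dict[str, bytes]) -> bytes:
    """Return the bytes of a .cdx archive with manifest.json as its first entry."""
    if "manifest.json" in files:
        raise ValueError("pass the manifest separately so it is written first")
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.comment = b"Codex Document Format v0.1"
        zf.writestr("manifest.json", manifest, compress_type=_method("manifest.json"))
        # sorted() is stable, so caller order is kept within each group.
        for name in sorted(files, key=_rank):
            zf.writestr(name, files[name], compress_type=_method(name))
    return buf.getvalue()
```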
When reading a Codex document:
- Verify the archive is a valid ZIP
- Read manifest.json first to determine version and structure
- Validate required files exist
- Load content lazily where possible (especially assets)
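The reading steps map onto a small lazy reader. In this sketch (`CodexReader` and its methods are hypothetical), the manifest is parsed on open, while every other entry is decompressed only when requested; listing assets touches only the central directory.

```python
import io
import json
import zipfile

REQUIRED = ("manifest.json", "content/document.json", "metadata/dublin-core.json")

class CodexReader:
    """Sketch of a lazy .cdx reader: manifest parsed up front, the rest on demand."""

    def __init__(self, source):
        # source: a filesystem path or a seekable binary file object.
        if not zipfile.is_zipfile(source):
            raise ValueError("not a valid ZIP archive")
        self._zf = zipfile.ZipFile(source)
        try:
            names = set(self._zf.namelist())
            if "manifest.json" not in names:
                raise ValueError("manifest.json is missing")
            # Read the manifest first: it carries the version and structure.
            self.manifest = json.loads(self._zf.read("manifest.json"))
            missing = [p for p in REQUIRED if p not in names]
            if missing:
                raise ValueError(f"missing required files: {missing}")
        except Exception:
            self._zf.close()
            raise

    @classmethod
    def from_bytes(cls, data: bytes) -> "CodexReader":
        return cls(io.BytesIO(data))

    def asset_names(self) -> list[str]:
        # Names come from the central directory; no asset data is read here.
        return [n for n in self._zf.namelist() if n.startswith("assets/") and not n.endswith("/")]

    def read(self, name: str) -> bytes:
        # Decompresses a single entry only when it is actually requested.
        return self._zf.read(name)

    def close(self) -> None:
        self._zf.close()

    def __enter__(self) -> "CodexReader":
        return self

    def __exit__(self, *exc) -> None:
        self.close()
```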
For large documents, implementations SHOULD support:
- Streaming extraction without loading entire archive
- Random access to specific files via ZIP central directory
- Progressive loading of content blocks
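With Python's `zipfile`, random access and streaming come almost for free: the central directory is parsed when the archive is opened, and `ZipFile.open` returns a file-like object that decompresses incrementally. The sketch below (hypothetical helper names, arbitrary default chunk size) copies one entry to a destination without holding it in memory.

```python
import io
import zipfile
from collections.abc import Iterator

def iter_member(zf: zipfile.ZipFile, name: str, chunk_size: int = 64 * 1024) -> Iterator[bytes]:
    """Yield one entry's decompressed bytes chunk by chunk."""
    # getinfo() is a lookup in the already-parsed central directory;
    # only this entry's data is then read from the archive.
    info = zf.getinfo(name)
    with zf.open(info) as member:
        while chunk := member.read(chunk_size):
            yield chunk

def copy_member(zf: zipfile.ZipFile, name: str, dst: io.BufferedIOBase,
                chunk_size: int = 64 * 1024) -> int:
    """Stream one entry into dst; return the number of bytes copied."""
    copied = 0
    for chunk in iter_member(zf, name, chunk_size):
        dst.write(chunk)
        copied += len(chunk)
    return copied
```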
Implementations MUST validate that extracted paths do not traverse outside the intended directory (zip slip vulnerability). Paths containing .. segments MUST be rejected.
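A minimal zip-slip guard, assuming extraction into a local directory with `pathlib`: reject `..` segments and absolute or backslash paths outright, then confirm the resolved target still lies under the resolved extraction root. The helper name `safe_target` is hypothetical.

```python
from pathlib import Path

def safe_target(dest_dir: Path, name: str) -> Path:
    """Resolve an entry name to a path inside dest_dir, rejecting zip-slip paths."""
    if "\\" in name or name.startswith("/"):
        raise ValueError(f"illegal entry name: {name!r}")
    if ".." in name.split("/"):
        raise ValueError(f"'..' segment in entry name: {name!r}")
    root = dest_dir.resolve()
    # resolve() also follows symlinks already present under dest_dir, so a
    # pre-existing link cannot redirect the write outside the root.
    target = (root / name).resolve()
    if not target.is_relative_to(root):
        raise ValueError(f"entry escapes extraction directory: {name!r}")
    return target
```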
Implementations SHOULD impose limits on:
- Compression ratio (reject suspiciously high ratios)
- Decompressed size relative to compressed size
- Total extraction size
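These limits can be enforced in two places: against the sizes declared in the central directory before extraction, and against the bytes actually produced while reading. The thresholds below are hypothetical, since the specification asks for limits without fixing values; the ratio check skips small entries, which can legitimately compress very well.

```python
import zipfile

# Hypothetical thresholds; the specification sets no concrete values.
MAX_RATIO = 100                 # decompressed / compressed, per entry
MIN_BYTES_FOR_RATIO = 1024**2   # ignore the ratio for entries under 1 MiB
MAX_TOTAL_BYTES = 4 * 1024**3   # total decompressed bytes for the archive

def check_declared_sizes(infos: list[zipfile.ZipInfo]) -> None:
    """Reject archives whose declared sizes look like a decompression bomb."""
    total = 0
    for info in infos:
        if info.file_size >= MIN_BYTES_FOR_RATIO:
            ratio = info.file_size / max(info.compress_size, 1)
            if ratio > MAX_RATIO:
                raise ValueError(f"{info.filename}: compression ratio {ratio:.0f}:1")
        total += info.file_size
        if total > MAX_TOTAL_BYTES:
            raise ValueError("archive exceeds the total extraction limit")

def read_capped(zf: zipfile.ZipFile, name: str, limit: int, chunk: int = 64 * 1024) -> bytes:
    """Read one entry, counting bytes actually produced against a hard cap."""
    out = bytearray()
    with zf.open(name) as member:
        while block := member.read(chunk):
            out += block
            if len(out) > limit:
                raise ValueError(f"{name}: more than {limit} bytes after decompression")
    return bytes(out)
```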
ZIP archives MAY contain symbolic links. Implementations MUST NOT follow symbolic links that point outside the archive or extraction directory.
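Symbolic links are usually recorded by UNIX-based writers as entries whose mode bits, stored in the high 16 bits of the external attributes, carry the symlink file type; by the common Info-ZIP convention the link target is stored as the entry's data. This sketch (hypothetical helper names) detects such entries and checks whether a target stays inside the archive; a simpler conservative policy is to never materialize symlinks at all.

```python
import posixpath
import stat
import zipfile

UNIX = 3  # "version made by" host system code for UNIX

def is_symlink(info: zipfile.ZipInfo) -> bool:
    """True if a UNIX-created entry carries the symlink file type in its mode bits."""
    return info.create_system == UNIX and stat.S_ISLNK(info.external_attr >> 16)

def link_stays_inside(name: str, target: str) -> bool:
    """True if a link stored at `name` and pointing to `target` stays in the archive."""
    if target.startswith("/") or "\\" in target:
        return False
    resolved = posixpath.normpath(posixpath.join(posixpath.dirname(name), target))
    return resolved != ".." and not resolved.startswith("../")
```

For a detected link, the target would be read with `zf.read(info).decode("utf-8")` and passed to `link_stays_inside`.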