File Format

Pocket DB files are append-only operation logs.

All multi-byte integers in the current format are encoded as big-endian unless explicitly stated otherwise.

File Header

Every database file starts with a 12-byte file header:

8 bytes   magic string "pocketdb" (ASCII)
1 byte    file format major version
1 byte    file format minor version
1 byte    serialization format ('j' = JSON, ASCII)
1 byte    serialization format version

The magic string is used to reject files that are not Pocket DB files. The format major version gates breaking changes; any mismatch causes open() to throw. The serialization format byte identifies the document encoding used throughout the file; all operation payloads that embed documents use the same format. Mixing serialization formats inside one file is not supported.

Current values:

Field                   Value
magic                   pocketdb
format major            0
format minor            1
serialization format    j (0x6a)
serialization version   0

Operation records start at byte offset 12 (immediately after the file header).
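A minimal TypeScript sketch of writing and validating the 12-byte header, using the layout and current values listed above. The names `writeHeader`, `readHeader`, and the `supportedMajor` parameter are hypothetical, not Pocket DB's actual API. Only the major version is checked, because the text does not say how minor-version differences are handled.

```typescript
// Hypothetical helpers for the 12-byte Pocket DB file header.
const MAGIC = "pocketdb";
const HEADER_SIZE = 12;
const FIRST_RECORD_OFFSET = HEADER_SIZE; // operation records start right after the header

interface FileHeader {
  major: number;
  minor: number;
  serialization: string; // one ASCII character, "j" = JSON
  serializationVersion: number;
}

function writeHeader(h: FileHeader): Uint8Array {
  const out = new Uint8Array(HEADER_SIZE);
  for (let i = 0; i < MAGIC.length; i++) out[i] = MAGIC.charCodeAt(i);
  out[8] = h.major;
  out[9] = h.minor;
  out[10] = h.serialization.charCodeAt(0);
  out[11] = h.serializationVersion;
  return out;
}

// Rejects foreign files and any major-version mismatch, mirroring the
// open() behaviour described above.
function readHeader(buf: Uint8Array, supportedMajor = 0): FileHeader {
  if (buf.length < HEADER_SIZE) throw new Error("file too short for a Pocket DB header");
  for (let i = 0; i < MAGIC.length; i++) {
    if (buf[i] !== MAGIC.charCodeAt(i)) throw new Error("not a Pocket DB file");
  }
  if (buf[8] !== supportedMajor) {
    throw new Error(`unsupported format major version ${buf[8]}`);
  }
  return {
    major: buf[8],
    minor: buf[9],
    serialization: String.fromCharCode(buf[10]),
    serializationVersion: buf[11],
  };
}
```

With the current values, `writeHeader` produces the ASCII bytes of `pocketdb` followed by 0x00 0x01 0x6a 0x00.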

Operation Record

Every operation record has the same outer layout:

4 bytes   operation identifier (4 ASCII bytes)
4 bytes   payload length, uint32 big-endian
N bytes   payload (padded to 4-byte alignment, length stored is padded length)
4 bytes   crc32(identifier + length_field + payload)

The operation identifier also acts as a per-operation magic value. The CRC32 is computed over the identifier, the length field, and the full padded payload. A checksum mismatch causes the read to throw immediately.

Total record size: 4 + 4 + N + 4 bytes, where N is the padded payload length.
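The record framing can be sketched as follows. This assumes the widely used IEEE 802.3 / zlib CRC-32 (reflected polynomial 0xEDB88320) and a big-endian checksum field like the other integers; the text only says "crc32", so both are assumptions. `encodeRecord` and `decodeRecord` are hypothetical names.

```typescript
// CRC-32 lookup table for the reflected polynomial 0xEDB88320 (assumed variant).
const CRC_TABLE = new Uint32Array(256);
for (let n = 0; n < 256; n++) {
  let c = n;
  for (let k = 0; k < 8; k++) c = c & 1 ? 0xedb88320 ^ (c >>> 1) : c >>> 1;
  CRC_TABLE[n] = c >>> 0;
}

function crc32(data: Uint8Array): number {
  let crc = 0xffffffff;
  for (let i = 0; i < data.length; i++) {
    crc = CRC_TABLE[(crc ^ data[i]) & 0xff] ^ (crc >>> 8);
  }
  return (crc ^ 0xffffffff) >>> 0;
}

// identifier (4) | length (4, BE) | padded payload (N) | crc32 (4, BE)
function encodeRecord(id: string, payload: Uint8Array): Uint8Array {
  if (id.length !== 4) throw new Error("identifier must be 4 ASCII bytes");
  if (payload.length % 4 !== 0) throw new Error("payload must already be padded to 4 bytes");
  const out = new Uint8Array(4 + 4 + payload.length + 4);
  const view = new DataView(out.buffer);
  for (let i = 0; i < 4; i++) out[i] = id.charCodeAt(i);
  view.setUint32(4, payload.length, false); // stored length is the padded length
  out.set(payload, 8);
  // Checksum covers identifier + length field + full padded payload.
  view.setUint32(8 + payload.length, crc32(out.subarray(0, 8 + payload.length)), false);
  return out;
}

interface DecodedRecord {
  id: string;
  payload: Uint8Array;
  size: number; // total bytes consumed: 12 + N
}

function decodeRecord(buf: Uint8Array, offset: number): DecodedRecord {
  if (offset + 8 > buf.length) throw new Error("truncated record header");
  const view = new DataView(buf.buffer, buf.byteOffset, buf.byteLength);
  const id = String.fromCharCode(buf[offset], buf[offset + 1], buf[offset + 2], buf[offset + 3]);
  const len = view.getUint32(offset + 4, false);
  const end = offset + 8 + len; // end of the padded payload
  if (end + 4 > buf.length) throw new Error("truncated record");
  if (crc32(buf.subarray(offset, end)) !== view.getUint32(end, false)) {
    throw new Error(`checksum mismatch in ${id} record at offset ${offset}`);
  }
  return { id, payload: buf.subarray(offset + 8, end), size: 12 + len };
}
```

A reader would start at offset 12 and advance by `size` after each decoded record.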

U29

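U29 encodes an unsigned 29-bit integer in 1 to 4 bytes. Like UTF-8, it is a variable-length encoding in which small values take fewer bytes, but the length is signalled differently: in each of the first three bytes, the high bit is a continuation flag that signals whether another byte follows and the low 7 bits carry data, while a fourth byte, when present, carries a full 8 bits of data. This gives 7 + 7 + 7 + 8 = 29 bits.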

Range                      Bytes used
0x00 – 0x7F                1
0x80 – 0x3FFF              2
0x4000 – 0x1FFFFF          3
0x200000 – 0x1FFFFFFF      4

Pocket DB uses U29 for UTF-8 string and JSON byte lengths inside payloads.
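A sketch of U29 encoding and decoding. It assumes the AMF3-style layout implied by the range table: the most significant 7-bit group is written first, bytes 1 to 3 carry a continuation flag in their high bit, and a fourth byte contributes a full 8 bits. The byte order in particular is an assumption based on the big-endian convention stated at the top of this document; `encodeU29` and `decodeU29` are hypothetical names.

```typescript
// Assumed layout: most significant group first, flags in bytes 1-3, 8 data bits in byte 4.
function encodeU29(value: number): Uint8Array {
  if (!Number.isInteger(value) || value < 0 || value > 0x1fffffff) {
    throw new RangeError(`${value} does not fit in U29`);
  }
  if (value < 0x80) return new Uint8Array([value]);
  if (value < 0x4000) return new Uint8Array([(value >> 7) | 0x80, value & 0x7f]);
  if (value < 0x200000) {
    return new Uint8Array([(value >> 14) | 0x80, ((value >> 7) & 0x7f) | 0x80, value & 0x7f]);
  }
  return new Uint8Array([
    (value >> 22) | 0x80,
    ((value >> 15) & 0x7f) | 0x80,
    ((value >> 8) & 0x7f) | 0x80,
    value & 0xff, // fourth byte: all 8 bits are data
  ]);
}

function decodeU29(buf: Uint8Array, offset: number): { value: number; length: number } {
  let value = 0;
  for (let i = 0; i < 3; i++) {
    if (offset + i >= buf.length) throw new Error("truncated U29");
    const b = buf[offset + i];
    value = (value << 7) | (b & 0x7f);
    if ((b & 0x80) === 0) return { value, length: i + 1 }; // no continuation flag
  }
  if (offset + 3 >= buf.length) throw new Error("truncated U29");
  return { value: (value << 8) | buf[offset + 3], length: 4 };
}
```

For example, under these assumptions a 200-byte JSON document (200 = 0xC8) gets the 2-byte length prefix 0x81 0x48.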

Operations

ncl1 — New Collection

4 bytes   collection id
U29       collection name byte length
N bytes   collection name, UTF-8
padding   zero bytes until payload is aligned to 4 bytes

The collection id is a 4-byte binary identifier chosen randomly at creation time. Collection names must be unique within a database. An ncl1 record is written whenever db.collection(name) is called for a name not yet in the database.
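A sketch of building an ncl1 payload. All helper names are hypothetical. `encodeU29` repeats the AMF3-style layout assumed for U29 so the block stands alone, and `randomCollectionId` uses `Math.random()` purely for illustration; the text only says the id is chosen randomly, not which random source is used.

```typescript
// U29 length prefix, assuming the AMF3-style layout (most significant group first).
function encodeU29(v: number): Uint8Array {
  if (v < 0 || v > 0x1fffffff) throw new RangeError("value does not fit in U29");
  if (v < 0x80) return new Uint8Array([v]);
  if (v < 0x4000) return new Uint8Array([(v >> 7) | 0x80, v & 0x7f]);
  if (v < 0x200000) return new Uint8Array([(v >> 14) | 0x80, ((v >> 7) & 0x7f) | 0x80, v & 0x7f]);
  return new Uint8Array([(v >> 22) | 0x80, ((v >> 15) & 0x7f) | 0x80, ((v >> 8) & 0x7f) | 0x80, v & 0xff]);
}

// Concatenates the parts and zero-pads the result to a multiple of 4 bytes.
function concatPadded(parts: Uint8Array[]): Uint8Array {
  let len = 0;
  for (const p of parts) len += p.length;
  const out = new Uint8Array((len + 3) & ~3); // a new Uint8Array is zero-filled
  let off = 0;
  for (const p of parts) {
    out.set(p, off);
    off += p.length;
  }
  return out;
}

// Illustrative only: Math.random() is not a cryptographic source.
function randomCollectionId(): Uint8Array {
  const id = new Uint8Array(4);
  for (let i = 0; i < 4; i++) id[i] = Math.floor(Math.random() * 256);
  return id;
}

// collection id (4) | U29 name length | UTF-8 name | zero padding
function encodeNcl1(collectionId: Uint8Array, collectionName: string): Uint8Array {
  if (collectionId.length !== 4) throw new Error("collection id must be 4 bytes");
  const nameBytes = new TextEncoder().encode(collectionName);
  return concatPadded([collectionId, encodeU29(nameBytes.length), nameBytes]);
}
```

For a collection named `users`, the payload is 4 + 1 + 5 = 10 bytes of content followed by 2 zero bytes of padding.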

dco1 — Drop Collection

4 bytes   collection id

Marks the collection and all its documents as deleted. During replay, dco1 removes the collection from the in-memory registry and clears its primary index. Compaction discards all ncl1, idx1, and put1 records that belonged to the dropped collection.
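The replay effect of dco1 can be sketched against a simple in-memory registry. The `Collection` and `Registry` shapes, the hex keys, and the choice to ignore a dco1 for an unknown collection id are all assumptions for illustration; the text does not specify the in-memory structures or the unknown-id case.

```typescript
// Hypothetical in-memory state rebuilt during replay.
interface Collection {
  name: string;
  primary: Map<string, number>; // document id (hex) -> offset of its latest put1
}
type Registry = Map<string, Collection>; // keyed by collection id (hex)

function toHex(bytes: Uint8Array): string {
  let s = "";
  for (let i = 0; i < bytes.length; i++) s += ("0" + bytes[i].toString(16)).slice(-2);
  return s;
}

// ncl1: register a new, empty collection.
function applyNcl1(registry: Registry, collectionId: Uint8Array, collectionName: string): void {
  registry.set(toHex(collectionId), { name: collectionName, primary: new Map<string, number>() });
}

// dco1: clear the collection's primary index and remove it from the registry.
function applyDco1(registry: Registry, payload: Uint8Array): void {
  if (payload.length !== 4) throw new Error("dco1 payload must be exactly 4 bytes");
  const key = toHex(payload);
  const coll = registry.get(key);
  if (coll === undefined) return; // unknown id: behaviour unspecified, ignored in this sketch
  coll.primary.clear();
  registry.delete(key);
}
```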

idx1 — Create Index

4 bytes   collection id
1 byte    index type (1 = string, 2 = number)
U29       field name byte length
N bytes   field name, UTF-8
padding   zero bytes until payload is aligned to 4 bytes

Index definitions are persisted in the log; index contents are not, and are rebuilt in memory when the database opens. An idx1 record is written the first time createIndex() is called for a given field and type; calling createIndex() again for the same field and type is a no-op at the log level and writes no record.
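A sketch of the idx1 payload and the "no-op on repeat" rule. `IndexLogWriter` is a hypothetical stand-in for whatever tracks existing definitions, and keying on collection id, field, and type is a literal reading of the rule above. The U29 and padding helpers repeat the assumed AMF3-style layout so the block stands alone.

```typescript
type IndexType = 1 | 2; // 1 = string, 2 = number

// U29 length prefix, assuming the AMF3-style layout (most significant group first).
function encodeU29(v: number): Uint8Array {
  if (v < 0 || v > 0x1fffffff) throw new RangeError("value does not fit in U29");
  if (v < 0x80) return new Uint8Array([v]);
  if (v < 0x4000) return new Uint8Array([(v >> 7) | 0x80, v & 0x7f]);
  if (v < 0x200000) return new Uint8Array([(v >> 14) | 0x80, ((v >> 7) & 0x7f) | 0x80, v & 0x7f]);
  return new Uint8Array([(v >> 22) | 0x80, ((v >> 15) & 0x7f) | 0x80, ((v >> 8) & 0x7f) | 0x80, v & 0xff]);
}

// Concatenates the parts and zero-pads the result to a multiple of 4 bytes.
function concatPadded(parts: Uint8Array[]): Uint8Array {
  let len = 0;
  for (const p of parts) len += p.length;
  const out = new Uint8Array((len + 3) & ~3);
  let off = 0;
  for (const p of parts) {
    out.set(p, off);
    off += p.length;
  }
  return out;
}

// collection id (4) | index type (1) | U29 field length | UTF-8 field | padding
function encodeIdx1(collectionId: Uint8Array, type: IndexType, field: string): Uint8Array {
  const fieldBytes = new TextEncoder().encode(field);
  return concatPadded([collectionId, new Uint8Array([type]), encodeU29(fieldBytes.length), fieldBytes]);
}

class IndexLogWriter {
  private readonly defined = new Set<string>();
  readonly records: Uint8Array[] = []; // idx1 payloads that would be appended to the log

  createIndex(collectionId: Uint8Array, field: string, type: IndexType): void {
    const key = `${collectionId.join(",")}|${type}|${field}`;
    if (this.defined.has(key)) return; // same field and type again: nothing is logged
    this.defined.add(key);
    this.records.push(encodeIdx1(collectionId, type, field));
  }
}
```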

dix1 — Drop Index

4 bytes   collection id
U29       field name byte length
N bytes   field name, UTF-8
padding   zero bytes until payload is aligned to 4 bytes

During replay, dix1 removes the named index from the in-memory index manager of its collection.

put1 — Put Document

4 bytes    collection id
12 bytes   document id
U29        JSON document byte length
N bytes    JSON.stringify(document), UTF-8
padding    zero bytes until payload is aligned to 4 bytes

put1 is used for inserts, replacements, and updates. The document is stored as its full JSON representation. Replay updates the primary index to point to the latest put1 offset for each document id; earlier versions of the same document are superseded and become dead records until the next compaction.
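A sketch of encoding a put1 payload and of the replay rule that the latest put1 wins. `encodePut1` and `buildPrimaryIndex` are hypothetical names, and the replay input is simplified to (offset, document id) pairs in log order. The helpers repeat the assumed AMF3-style U29 layout.

```typescript
// U29 length prefix, assuming the AMF3-style layout (most significant group first).
function encodeU29(v: number): Uint8Array {
  if (v < 0 || v > 0x1fffffff) throw new RangeError("value does not fit in U29");
  if (v < 0x80) return new Uint8Array([v]);
  if (v < 0x4000) return new Uint8Array([(v >> 7) | 0x80, v & 0x7f]);
  if (v < 0x200000) return new Uint8Array([(v >> 14) | 0x80, ((v >> 7) & 0x7f) | 0x80, v & 0x7f]);
  return new Uint8Array([(v >> 22) | 0x80, ((v >> 15) & 0x7f) | 0x80, ((v >> 8) & 0x7f) | 0x80, v & 0xff]);
}

// Concatenates the parts and zero-pads the result to a multiple of 4 bytes.
function concatPadded(parts: Uint8Array[]): Uint8Array {
  let len = 0;
  for (const p of parts) len += p.length;
  const out = new Uint8Array((len + 3) & ~3);
  let off = 0;
  for (const p of parts) {
    out.set(p, off);
    off += p.length;
  }
  return out;
}

// collection id (4) | document id (12) | U29 JSON length | JSON bytes | padding
function encodePut1(collectionId: Uint8Array, docId: Uint8Array, doc: unknown): Uint8Array {
  if (collectionId.length !== 4 || docId.length !== 12) throw new Error("bad id length");
  const json = new TextEncoder().encode(JSON.stringify(doc));
  return concatPadded([collectionId, docId, encodeU29(json.length), json]);
}

// Replay: keep only the latest put1 offset per document id; every
// superseded version is a dead record until the next compaction.
function buildPrimaryIndex(puts: { offset: number; docId: string }[]): { index: Map<string, number>; dead: number } {
  const index = new Map<string, number>();
  let dead = 0;
  for (const p of puts) {
    if (index.has(p.docId)) dead++;
    index.set(p.docId, p.offset);
  }
  return { index, dead };
}
```

For example, `{"a":1}` is 7 bytes of JSON, so its payload is 4 + 12 + 1 + 7 = 24 bytes and needs no padding.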

del1 — Delete Document

4 bytes    collection id
12 bytes   document id

Replay removes the document id from the collection's primary index and all secondary indexes. There is no padding: the payload is exactly 16 bytes and is already 4-byte aligned.
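A sketch of the fixed 16-byte del1 payload and its replay effect. The secondary-index shape (indexed value mapped to a set of document ids) and the names `encodeDel1` and `applyDel1` are assumptions for illustration, not the library's actual structures.

```typescript
// collection id (4) | document id (12): 16 bytes, already 4-byte aligned.
function encodeDel1(collectionId: Uint8Array, docId: Uint8Array): Uint8Array {
  if (collectionId.length !== 4 || docId.length !== 12) throw new Error("bad id length");
  const out = new Uint8Array(16);
  out.set(collectionId, 0);
  out.set(docId, 4);
  return out;
}

// Hypothetical secondary index: indexed value -> ids of documents holding that value.
type SecondaryIndex = Map<string | number, Set<string>>;

// Replay: drop the id from the primary index and from every secondary index.
function applyDel1(primary: Map<string, number>, secondaries: SecondaryIndex[], docIdHex: string): void {
  primary.delete(docIdHex);
  for (const idx of secondaries) {
    idx.forEach((ids) => {
      ids.delete(docIdHex);
    });
  }
}
```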

txnb — Transaction Begin

empty payload

Starts a transaction group. Operations after txnb are staged during replay until a matching txnc is found. Nested txnb inside an open transaction is an error.

txnc — Transaction Commit

empty payload

Commits the staged transaction operations all at once. If the log ends after txnb without a txnc (e.g. the process crashed mid-batch), the staged operations are silently discarded during replay. txnc without a preceding txnb is an error.
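The txnb/txnc rules can be sketched as a replay loop that stages operations while a transaction is open. The `Op` shape and the `apply` callback are hypothetical simplifications of a decoded record and of whatever applies it to in-memory state.

```typescript
// Minimal record shape for this sketch: identifier plus file offset.
interface Op {
  id: string;
  offset: number;
}

function replayWithTransactions(ops: Op[], apply: (op: Op) => void): void {
  let staged: Op[] | null = null; // non-null while a transaction is open
  for (const op of ops) {
    if (op.id === "txnb") {
      if (staged !== null) throw new Error(`nested txnb at offset ${op.offset}`);
      staged = [];
    } else if (op.id === "txnc") {
      if (staged === null) throw new Error(`txnc without txnb at offset ${op.offset}`);
      for (const s of staged) apply(s); // staged ops are applied only once txnc is seen
      staged = null;
    } else if (staged !== null) {
      staged.push(op); // inside a transaction: hold until txnc
    } else {
      apply(op);
    }
  }
  // If staged is still non-null here, the log ended mid-transaction
  // (e.g. a crash); the staged operations are silently dropped.
}
```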

hol0 — Hole

empty payload (or arbitrary payload)

A no-op placeholder. Replay skips hol0 records entirely. Compaction discards them. Reserved for future use: potential uses include reserving space for in-place rewriting within a single record slot, or acting as a tombstone in scenarios where an operation is logically cancelled before the file is compacted.

Document Identifiers

Document ids are 12-byte ObjectId-style values:

4 bytes   unix timestamp, seconds, big-endian
5 bytes   process-random (fixed per process lifetime)
3 bytes   monotonic counter, big-endian, wraps at 0xFFFFFF

Ids are exposed to callers as 24-character lowercase hexadecimal strings (two hex digits per byte).

The process-random component differentiates ids generated by different processes; the counter differentiates ids generated within the same second by the same process. The combination makes collisions extremely unlikely without requiring global coordination.
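A sketch of generating an id with this 4 + 5 + 3 byte layout. `newDocumentId` is a hypothetical name; using `Math.random()` for the process-random bytes and starting the counter at 0 are assumptions, since the text specifies neither the random source nor the counter's initial value.

```typescript
// Process-random bytes, fixed for the lifetime of this process.
// Math.random() stands in for a real random source (assumption).
const PROCESS_RANDOM = new Uint8Array(5);
for (let i = 0; i < PROCESS_RANDOM.length; i++) {
  PROCESS_RANDOM[i] = Math.floor(Math.random() * 256);
}
let idCounter = 0; // starting value is an assumption

function newDocumentId(nowMs: number = Date.now()): string {
  const bytes = new Uint8Array(12);
  // 4-byte big-endian unix timestamp in seconds
  new DataView(bytes.buffer).setUint32(0, Math.floor(nowMs / 1000), false);
  bytes.set(PROCESS_RANDOM, 4);
  idCounter = (idCounter + 1) & 0xffffff; // 3-byte counter, wraps after 0xFFFFFF
  bytes[9] = idCounter >>> 16;
  bytes[10] = (idCounter >>> 8) & 0xff;
  bytes[11] = idCounter & 0xff;
  let hex = "";
  for (let i = 0; i < bytes.length; i++) hex += ("0" + bytes[i].toString(16)).slice(-2);
  return hex;
}
```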

Payload Alignment

All payloads with variable-length content are zero-padded to 4-byte alignment. The stored payload length includes the padding. Decoders verify that padding bytes are zero and reject payloads with non-zero padding.
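The alignment rule and the decoder-side check can be sketched as two small helpers. `contentLength` is the number of bytes a decoder actually consumed (ids, U29 prefix, and string or JSON bytes). Treating any padding longer than the minimal 0 to 3 bytes as invalid is an assumption drawn from the "until aligned" wording of the payload layouts.

```typescript
// Rounds a content length up to the next multiple of 4.
function paddedLength(n: number): number {
  return (n + 3) & ~3;
}

// Rejects payloads whose padding is not the minimal zero-filled tail.
function verifyPadding(payload: Uint8Array, contentLength: number): void {
  const expected = paddedLength(contentLength);
  if (payload.length !== expected) {
    throw new Error(`expected ${expected} payload bytes, got ${payload.length}`);
  }
  for (let i = contentLength; i < payload.length; i++) {
    if (payload[i] !== 0) throw new Error(`non-zero padding byte at payload offset ${i}`);
  }
}
```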

Summary Table

Identifier   Name                  Payload size
ncl1         New collection        variable (aligned)
dco1         Drop collection       4 bytes
idx1         Create index          variable (aligned)
dix1         Drop index            variable (aligned)
put1         Put document          variable (aligned)
del1         Delete document       16 bytes
txnb         Transaction begin     0 bytes
txnc         Transaction commit    0 bytes
hol0         Hole                  0 bytes or arbitrary (contents ignored)