Commit 4a0d290

Merge pull request #6 from git-stunts/release/v2.0.0
release: v2.0.0 — compression, KDF, Merkle manifests
2 parents 933e33e + 2db4bbd commit 4a0d290

23 files changed

Lines changed: 2346 additions & 62 deletions

CHANGELOG.md

Lines changed: 23 additions & 0 deletions

@@ -7,6 +7,29 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

 ## [Unreleased]

+## [2.0.0] — M7 Horizon (2026-02-07)
+
+### Added
+- **Compression support** (Task 7.1): Optional gzip compression pipeline via `compression: { algorithm: 'gzip' }` option on `store()`. Compression is applied before encryption when both are enabled. Manifests include a new optional `compression` field. Decompression on `restore()` is automatic.
+- **KDF support** (Task 7.2): Passphrase-based encryption using PBKDF2 or scrypt via `deriveKey()` method and `passphrase` option on `store()`/`restore()`. KDF parameters are stored in `manifest.encryption.kdf` for deterministic re-derivation. All three crypto adapters (Node, Bun, Web) implement `deriveKey()`.
+- **Merkle tree manifests** (Task 7.3): Large manifests (chunk count exceeding `merkleThreshold`, default 1000) are automatically split into sub-manifests stored as separate blobs. Root manifest uses `version: 2` with `subManifests` references. `readManifest()` transparently reconstitutes v2 manifests into flat chunk lists. Full backward compatibility with v1 manifests.
+- New schema fields: `version`, `compression`, `subManifests` on `ManifestSchema`; `kdf` on `EncryptionSchema`.
+- 52 new unit tests across three new test suites (compression, KDF, Merkle).
+- Updated API reference (`docs/API.md`), guide (`GUIDE.md`), and README with v2.0.0 feature documentation.
+
+### Changed
+- **BREAKING**: Manifest schema now includes `version` field (defaults to 1). Existing v1 manifests are fully backward-compatible.
+- `CasService` constructor accepts new `merkleThreshold` option.
+- `ContentAddressableStore` constructor now accepts and forwards `merkleThreshold` to `CasService`.
+- `store()` and `storeFile()` accept `passphrase`, `kdfOptions`, and `compression` options.
+- `restore()` accepts `passphrase` option.
+
+### Fixed
+- `storeFile()` now forwards `passphrase`, `kdfOptions`, and `compression` options to `store()` (previously silently dropped).
+- `NodeCryptoAdapter.deriveKey()` uses `Buffer.from(salt)` for base64 encoding, preventing corrupt output when salt is a `Uint8Array`.
+- `WebCryptoAdapter.deriveKey()` now validates KDF algorithm and throws for unsupported values instead of silently falling through to scrypt.
+- `WebCryptoAdapter` scrypt derivation now throws a descriptive error when `node:crypto` is unavailable (e.g. in browsers).
+
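
The salt-encoding fix above is easy to reproduce on its own. The following is a minimal sketch that assumes Node.js with ES modules. `saltToBase64` is a hypothetical helper, not the adapter's actual code. A `Uint8Array` uses the same `toString` as a plain array, so it silently ignores the `'base64'` argument. A `Buffer` honours the encoding.

```js
import { randomBytes } from 'node:crypto';

// Hypothetical helper mirroring the fix: wrap the salt in a Buffer before encoding.
function saltToBase64(salt) {
  return Buffer.from(salt).toString('base64');
}

const salt = new Uint8Array([1, 2, 3]);

// Uint8Array shares Array's toString: the 'base64' argument is ignored
// and the bytes are joined with commas.
const broken = salt.toString('base64');
const fixed = saltToBase64(salt);

console.log(broken); // 1,2,3
console.log(fixed); // AQID

// Random salts (Buffers in Node) round-trip through the same helper.
const randomSalt = randomBytes(16);
console.log(Buffer.from(saltToBase64(randomSalt), 'base64').equals(randomSalt)); // true
```
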
 ## [1.6.2] — OIDC publishing + JSR docs coverage (2026-02-07)

 ### Added

GUIDE.md

Lines changed: 214 additions & 9 deletions

@@ -18,10 +18,13 @@ along from first principles to full mastery.
 7. [The CLI](#7-the-cli)
 8. [Lifecycle Management](#8-lifecycle-management)
 9. [Observability](#9-observability)
-10. [Architecture](#10-architecture)
-11. [Codec System](#11-codec-system)
-12. [Error Handling](#12-error-handling)
-13. [FAQ / Troubleshooting](#13-faq--troubleshooting)
+10. [Compression](#10-compression)
+11. [Passphrase Encryption (KDF)](#11-passphrase-encryption-kdf)
+12. [Merkle Manifests](#12-merkle-manifests)
+13. [Architecture](#13-architecture)
+14. [Codec System](#14-codec-system)
+15. [Error Handling](#15-error-handling)
+16. [FAQ / Troubleshooting](#16-faq--troubleshooting)

 ---

@@ -756,7 +759,208 @@ await cas.verifyIntegrity(manifest);

 ---

-## 10. Architecture
+## 10. Compression
+
+*New in v2.0.0.*
+
+`git-cas` supports optional gzip compression. When enabled, file content is
+compressed before encryption (if any) and before chunking. This reduces storage
+size for compressible data without changing the round-trip contract.
+
+### Storing with Compression
+
+Pass the `compression` option when storing:
+
+```js
+const manifest = await cas.storeFile({
+  filePath: './server.log',
+  slug: 'logs/server',
+  compression: { algorithm: 'gzip' },
+});
+
+console.log(manifest.compression);
+// { algorithm: 'gzip' }
+```
+
+The manifest gains an optional `compression` field recording the algorithm used.
+
+### Compression + Encryption
+
+Compression and encryption compose naturally. Compression runs first (on
+plaintext), then encryption runs on the compressed bytes:
+
+```js
+const manifest = await cas.storeFile({
+  filePath: './data.csv',
+  slug: 'reports/q4',
+  compression: { algorithm: 'gzip' },
+  encryptionKey,
+});
+```
+
+### Restoring Compressed Content
+
+Decompression on `restore()` is automatic. If the manifest includes a
+`compression` field, the restored bytes are decompressed after decryption
+(if encrypted) and after chunk reassembly:
+
+```js
+await cas.restoreFile({
+  manifest,
+  encryptionKey, // same key used to store data.csv
+  outputPath: './restored.csv',
+}); // restored.csv is byte-identical to the original data.csv
+```
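
The ordering described above can be sketched with Node's built-in `node:zlib` and `node:crypto`. On store, gzip runs on the plaintext and encryption follows. On restore, decryption runs first and gunzip follows. This is an illustration under stated assumptions, not `git-cas`'s internal pipeline. `packBytes`, `unpackBytes`, the 12-byte IV, and the `{ iv, tag }` metadata shape are all hypothetical, and chunking is left out.

```js
import { gzipSync, gunzipSync } from 'node:zlib';
import { createCipheriv, createDecipheriv, randomBytes } from 'node:crypto';

// Hypothetical store-side step: compress the plaintext first, then encrypt it.
function packBytes(plaintext, key) {
  const compressed = gzipSync(plaintext);
  const iv = randomBytes(12); // 96-bit nonce, the usual size for AES-GCM
  const cipher = createCipheriv('aes-256-gcm', key, iv);
  const ciphertext = Buffer.concat([cipher.update(compressed), cipher.final()]);
  return { ciphertext, meta: { iv, tag: cipher.getAuthTag() } };
}

// Hypothetical restore-side step: undo the layers in reverse order.
function unpackBytes({ ciphertext, meta }, key) {
  const decipher = createDecipheriv('aes-256-gcm', key, meta.iv);
  decipher.setAuthTag(meta.tag);
  const compressed = Buffer.concat([decipher.update(ciphertext), decipher.final()]);
  return gunzipSync(compressed);
}

const key = randomBytes(32);
const original = Buffer.from('date,region,total\n'.repeat(1000));
const packed = packBytes(original, key);
const restored = unpackBytes(packed, key);

console.log(packed.ciphertext.length < original.length); // true
console.log(restored.equals(original)); // true
```

The order matters because AES-GCM output looks like random bytes. If gzip ran after encryption, it would find almost nothing to compress.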
+
+### When to Use Compression
+
+Compression is most useful for text, CSV, JSON, XML, and other compressible
+formats. For already-compressed data (JPEG, PNG, MP4, ZIP), compression adds
+CPU cost without meaningful size reduction. Use your judgement.
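
One way to make that call is to gzip a representative sample and check the size ratio before opting in. The sketch below uses `node:zlib`. `compressionRatio`, `worthIt`, and the 0.9 cut-off are illustrative assumptions, not `git-cas` behaviour. Random bytes stand in for already-compressed media.

```js
import { gzipSync } from 'node:zlib';
import { randomBytes } from 'node:crypto';

// Hypothetical helper: compressed size divided by original size (lower is better).
function compressionRatio(sample) {
  return gzipSync(sample).length / sample.length;
}

// Repetitive text such as CSV compresses well.
const csv = Buffer.from('id,name,score\n1,alice,90\n2,bob,85\n'.repeat(500));
// Random bytes behave like already-compressed formats (JPEG, ZIP, ...).
const noise = randomBytes(csv.length);

const csvRatio = compressionRatio(csv);
const noiseRatio = compressionRatio(noise);

// Illustrative policy: only enable gzip when it saves at least ~10%.
const worthIt = (ratio) => ratio < 0.9;

console.log(csvRatio.toFixed(3), worthIt(csvRatio)); // a ratio well below 1, then true
console.log(noiseRatio.toFixed(3), worthIt(noiseRatio)); // a ratio just above 1, then false
```
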
+
+---
+
+## 11. Passphrase Encryption (KDF)
+
+*New in v2.0.0.*
+
+Instead of managing raw 32-byte encryption keys, you can derive keys from
+passphrases using standard key derivation functions (KDFs). `git-cas` supports
+PBKDF2 (default) and scrypt.
+
+### Storing with a Passphrase
+
+Pass `passphrase` instead of `encryptionKey`:
+
+```js
+const manifest = await cas.storeFile({
+  filePath: './vacation.jpg',
+  slug: 'photos/vacation',
+  passphrase: 'my secret passphrase',
+});
+
+console.log(manifest.encryption.kdf);
+// {
+//   algorithm: 'pbkdf2',
+//   salt: 'base64-encoded-salt',
+//   iterations: 100000,
+//   keyLength: 32
+// }
+```
+
+KDF parameters (salt, iterations, algorithm) are stored in the manifest's
+`encryption.kdf` field. The salt is generated randomly for each store
+operation.
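
Recording the parameters is what makes re-derivation deterministic. The same passphrase, salt, iteration count, and key length always produce the same key. This minimal sketch uses Node's `pbkdf2Sync` and assumes SHA-512 (as listed in this section's defaults table) and a 16-byte salt. `deriveFromPassphrase` is a hypothetical helper, not the library's `deriveKey()`. The `kdf` object only mirrors the manifest example above.

```js
import { pbkdf2Sync, randomBytes } from 'node:crypto';

// Hypothetical helper: derive a key from a passphrase plus recorded KDF params.
function deriveFromPassphrase(passphrase, kdf) {
  const salt = Buffer.from(kdf.salt, 'base64');
  return pbkdf2Sync(passphrase, salt, kdf.iterations, kdf.keyLength, 'sha512');
}

// Store time: a fresh random salt, recorded alongside the other parameters.
const kdf = {
  algorithm: 'pbkdf2',
  salt: randomBytes(16).toString('base64'),
  iterations: 100000,
  keyLength: 32,
};
const storeKey = deriveFromPassphrase('my secret passphrase', kdf);

// Restore time: only the passphrase is supplied; everything else comes from the record.
const restoreKey = deriveFromPassphrase('my secret passphrase', kdf);

console.log(storeKey.length); // 32
console.log(storeKey.equals(restoreKey)); // true
```
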
+
+### Restoring with a Passphrase
+
+Provide the same passphrase on restore. The KDF parameters in the manifest
+are used to re-derive the key:
+
+```js
+await cas.restoreFile({
+  manifest,
+  passphrase: 'my secret passphrase',
+  outputPath: './restored.jpg',
+});
+```
+
+A wrong passphrase derives a different key, so AES-256-GCM authentication
+fails and `restoreFile()` throws `INTEGRITY_ERROR`.
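
The failure comes from GCM's authentication tag. A key derived from the wrong passphrase cannot verify the tag, so decryption throws instead of returning garbage bytes. This minimal sketch uses `node:crypto`. The helper names are illustrative, and so is the low iteration count, which only keeps the demo fast. The sketch does not show how `git-cas` wraps the error.

```js
import { createCipheriv, createDecipheriv, pbkdf2Sync, randomBytes } from 'node:crypto';

// Illustrative parameters; a low iteration count keeps the demo fast.
const salt = randomBytes(16);
const deriveDemoKey = (passphrase) => pbkdf2Sync(passphrase, salt, 1000, 32, 'sha512');

const iv = randomBytes(12);
const cipher = createCipheriv('aes-256-gcm', deriveDemoKey('correct horse'), iv);
const ciphertext = Buffer.concat([cipher.update('secret payload'), cipher.final()]);
const tag = cipher.getAuthTag();

// Decrypt with a key derived from the given passphrase; null means authentication failed.
function tryDecrypt(passphrase) {
  const decipher = createDecipheriv('aes-256-gcm', deriveDemoKey(passphrase), iv);
  decipher.setAuthTag(tag);
  try {
    return Buffer.concat([decipher.update(ciphertext), decipher.final()]).toString('utf8');
  } catch {
    // final() throws when the GCM tag does not verify.
    return null;
  }
}

console.log(tryDecrypt('correct horse')); // secret payload
console.log(tryDecrypt('wrong passphrase')); // null
```
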
+
+### Using scrypt
+
+Pass `kdfOptions` to select scrypt:
+
+```js
+const manifest = await cas.storeFile({
+  filePath: './secret.bin',
+  slug: 'vault',
+  passphrase: 'strong passphrase',
+  kdfOptions: { algorithm: 'scrypt', cost: 16384 },
+});
+```
+
+### Manual Key Derivation
+
+For advanced workflows, derive the key yourself:
+
+```js
+const { key, salt, params } = await cas.deriveKey({
+  passphrase: 'my secret passphrase',
+  algorithm: 'pbkdf2',
+  iterations: 200000,
+});
+
+// Use the derived key directly; keep salt and params to re-derive it later
+const manifest = await cas.storeFile({
+  filePath: './vacation.jpg',
+  slug: 'photos/vacation',
+  encryptionKey: key,
+});
+```
+
+### Supported KDF Algorithms
+
+| Algorithm | Default Params | Notes |
+|-----------|---------------|-------|
+| `pbkdf2` (default) | 100,000 iterations, SHA-512 | Widely supported, good baseline |
+| `scrypt` | N=16384, r=8, p=1 | Memory-hard, stronger against GPU attacks |
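
Here is a rough worked figure for the "memory-hard" note. scrypt's core loop keeps N blocks of 128·r bytes in memory. The default parameters in the table therefore imply about 16 MiB per derivation. This ignores small fixed overheads and assumes p = 1.

$$
\text{memory} \approx 128 \cdot r \cdot N = 128 \times 8 \times 16384 = 16{,}777{,}216\ \text{bytes} = 16\ \text{MiB}
$$

The figure scales linearly with N. Judging by the `kdfOptions: { cost: 16384 }` example above, `cost` appears to be the knob that sets N.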
909+
910+
---
911+
912+
## 12. Merkle Manifests
913+
914+
*New in v2.0.0.*
915+
916+
When storing very large files, the manifest (which lists every chunk) can
917+
itself become large. Merkle manifests solve this by splitting the chunk list
918+
into sub-manifests, each stored as a separate Git blob. The root manifest
919+
references sub-manifests by OID.
920+
921+
### How It Works
922+
923+
When the chunk count exceeds `merkleThreshold` (default: 1000), `git-cas`
924+
automatically:
925+
926+
1. Groups chunks into sub-manifests (each containing up to `merkleThreshold`
927+
chunks).
928+
2. Stores each sub-manifest as a Git blob.
929+
3. Writes a root manifest with `version: 2` and a `subManifests` array
930+
referencing the sub-manifest blob OIDs.
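
The three steps above can be sketched as a single function. This is a minimal illustration, not the `git-cas` implementation. `writeBlob` is a hypothetical in-memory stand-in that hashes the serialized text with SHA-1; real Git OIDs also hash a `blob <size>` header. The `chunkCount` field and the sub-manifest JSON layout are assumptions. Only `version: 2`, `subManifests`, and the rule that splitting starts once the count exceeds the threshold come from the text.

```js
import { createHash } from 'node:crypto';

// Hypothetical in-memory blob store; OID = SHA-1 of the serialized text (not a real Git OID).
const blobs = new Map();
function writeBlob(content) {
  const oid = createHash('sha1').update(content).digest('hex');
  blobs.set(oid, content);
  return oid;
}

// Split a flat chunk list into sub-manifests once it exceeds the threshold.
function buildManifest(chunks, merkleThreshold = 1000) {
  if (chunks.length <= merkleThreshold) {
    return { version: 1, chunks }; // small enough: keep the flat v1 format
  }
  const subManifests = [];
  for (let start = 0; start < chunks.length; start += merkleThreshold) {
    const group = chunks.slice(start, start + merkleThreshold);
    const oid = writeBlob(JSON.stringify({ chunks: group }));
    subManifests.push({ oid, chunkCount: group.length });
  }
  return { version: 2, subManifests };
}

const chunks = Array.from({ length: 2500 }, (_, index) => ({ index, blob: `chunk-${index}` }));
const root = buildManifest(chunks, 1000);

console.log(root.version); // 2
console.log(root.subManifests.map((sub) => sub.chunkCount)); // [ 1000, 1000, 500 ]
```
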
931+
932+
### Configuring the Threshold
933+
934+
Set `merkleThreshold` at construction time:
935+
936+
```js
937+
const cas = new ContentAddressableStore({
938+
plumbing: git,
939+
merkleThreshold: 500, // Split at 500 chunks instead of 1000
940+
});
941+
```
942+
943+
### Transparent Reconstitution
944+
945+
`readManifest()` transparently handles both v1 (flat) and v2 (Merkle)
946+
manifests. When it encounters a v2 manifest, it reads all sub-manifests,
947+
concatenates their chunk lists, and returns a flat `Manifest` object:
948+
949+
```js
950+
const manifest = await cas.readManifest({ treeOid });
951+
// Works identically whether the manifest is v1 or v2
952+
console.log(manifest.chunks.length); // Full chunk list, regardless of structure
953+
```
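
The reverse direction can be sketched the same way. Fetch each sub-manifest, concatenate the chunk lists in order, and pass v1 manifests through untouched. A missing `version` is treated as `1`, matching the compatibility notes in this section. The in-memory `blobs` map, `readBlob`, `reconstitute`, and the sub-manifest JSON layout are hypothetical stand-ins, not the library's API.

```js
// Hypothetical blob store holding two sub-manifests under made-up OIDs.
const blobs = new Map([
  ['oid-a', JSON.stringify({ chunks: [{ index: 0 }, { index: 1 }] })],
  ['oid-b', JSON.stringify({ chunks: [{ index: 2 }] })],
]);

// Stand-in for reading a Git blob by OID.
async function readBlob(oid) {
  const content = blobs.get(oid);
  if (content === undefined) throw new Error(`missing blob ${oid}`);
  return content;
}

// Flatten a v2 manifest; v1 manifests (or ones without a version) pass through.
async function reconstitute(manifest) {
  if ((manifest.version ?? 1) === 1) return manifest;
  const chunkLists = await Promise.all(
    manifest.subManifests.map(async ({ oid }) => JSON.parse(await readBlob(oid)).chunks),
  );
  const { subManifests, ...rest } = manifest; // drop the references from the flat result
  return { ...rest, chunks: chunkLists.flat() };
}

const v2 = { version: 2, slug: 'big-file', subManifests: [{ oid: 'oid-a' }, { oid: 'oid-b' }] };
const legacy = { slug: 'old-file', chunks: [{ index: 0 }] };

const flat = await reconstitute(v2);
console.log(flat.chunks.map((chunk) => chunk.index)); // [ 0, 1, 2 ]
console.log((await reconstitute(legacy)) === legacy); // true
```

`Promise.all` preserves input order, so the chunk order matches the order of the `subManifests` array even though the reads run concurrently.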
954+
955+
### Backward Compatibility
956+
957+
- v2 code reads v1 manifests without any changes.
958+
- v1 manifests (chunk count below threshold) continue to use the flat format.
959+
- The `version` field defaults to `1` for existing manifests.
960+
961+
---
962+
963+
## 13. Architecture
760964

761965
`git-cas` follows a hexagonal (ports and adapters) architecture. The domain
762966
logic in `CasService` has zero direct dependencies on Node.js, Git, or any
@@ -824,6 +1028,7 @@ class CryptoPort {
   encryptBuffer(buffer, key) {} // Returns { buf, meta }
   decryptBuffer(buffer, key, meta) {} // Returns Buffer
   createEncryptionStream(key) {} // Returns { encrypt, finalize }
+  deriveKey(options) {} // Returns { key, salt, params } (v2.0.0)
 }
 ```
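
To make the new port method concrete, here is a Node-only adapter sketch. It is not the shipped `NodeCryptoAdapter`. The following are all assumptions loosely based on this guide's KDF examples and defaults: the class name, the option names other than `algorithm`, `iterations`, and `cost`, the 16-byte salt, and the `params` shape. Node's `scrypt` accepts `cost`, `blockSize`, and `parallelization` as aliases for N, r, and p. Like the `WebCryptoAdapter` fix in the CHANGELOG, the sketch rejects unknown algorithms instead of falling through to scrypt.

```js
import { pbkdf2, scrypt, randomBytes } from 'node:crypto';
import { promisify } from 'node:util';

const pbkdf2Async = promisify(pbkdf2);
const scryptAsync = promisify(scrypt);

// Hypothetical adapter sketch; the real adapters may differ.
class SketchCryptoAdapter {
  // Returns { key, salt, params }, matching the CryptoPort comment.
  async deriveKey({
    passphrase,
    algorithm = 'pbkdf2',
    salt = randomBytes(16),
    keyLength = 32,
    iterations = 100000,
    cost = 16384,
    blockSize = 8,
    parallelization = 1,
  }) {
    if (algorithm === 'pbkdf2') {
      const key = await pbkdf2Async(passphrase, salt, iterations, keyLength, 'sha512');
      return { key, salt, params: { algorithm, iterations, keyLength } };
    }
    if (algorithm === 'scrypt') {
      const key = await scryptAsync(passphrase, salt, keyLength, { cost, blockSize, parallelization });
      return { key, salt, params: { algorithm, cost, blockSize, parallelization, keyLength } };
    }
    // Fail loudly instead of silently falling back to another KDF.
    throw new Error(`Unsupported KDF algorithm: ${algorithm}`);
  }
}

const adapter = new SketchCryptoAdapter();
const a = await adapter.deriveKey({ passphrase: 'pw', algorithm: 'scrypt' });
const b = await adapter.deriveKey({ passphrase: 'pw', algorithm: 'scrypt', salt: a.salt });
console.log(a.key.equals(b.key)); // true
await adapter.deriveKey({ passphrase: 'pw', algorithm: 'argon2' }).catch((err) => console.log(err.message));
// Unsupported KDF algorithm: argon2
```
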

@@ -889,7 +1094,7 @@ const cas = new ContentAddressableStore({

 ---

-## 11. Codec System
+## 14. Codec System

 ### JSON Codec

@@ -978,7 +1183,7 @@ The manifest will be stored in the tree as `manifest.msgpack`.

 ---

-## 12. Error Handling
+## 15. Error Handling

 All errors thrown by `git-cas` are instances of `CasError`, which extends
 `Error` with two additional properties:
@@ -1061,7 +1266,7 @@ try {

 ---

-## 13. FAQ / Troubleshooting
+## 16. FAQ / Troubleshooting

 ### Q: Does this work with bare repositories?

@@ -1175,7 +1380,7 @@ Every Git plumbing command is wrapped in a policy from `@git-stunts/alfred`.
 The default policy applies a 30-second timeout and retries up to 2 times with
 exponential backoff (100ms, then up to 2s). This handles transient filesystem
 errors and lock contention gracefully. You can override the policy at
-construction time (see Section 10).
+construction time (see Section 13).

 ---

README.md

Lines changed: 21 additions & 0 deletions

@@ -21,13 +21,26 @@ We use the object database.
 - **Dedupe for free** Git already hashes objects. We just lean into it.
 - **Chunked storage** big files become stable, reusable blobs.
 - **Optional AES-256-GCM encryption** store secrets without leaking plaintext into the ODB.
+- **Compression** gzip before encryption — smaller blobs, same round-trip.
+- **Passphrase encryption** derive keys from passphrases via PBKDF2 or scrypt — no raw key management.
+- **Merkle manifests** large files auto-split into sub-manifests for scalability.
 - **Manifests** a tiny explicit index of chunks + metadata (JSON/CBOR).
 - **Tree output** generates standard Git trees so assets snap into commits cleanly.
 - **Full round-trip** store, tree, and restore — get your bytes back, verified.
 - **Lifecycle management** `readManifest`, `deleteAsset`, `findOrphanedChunks` — inspect trees, plan deletions, audit storage.

 **Use it for:** binary assets, build artifacts, model weights, data packs, secret bundles, weird experiments, etc.

+## What's new in v2.0.0
+
+**Compression**: `compression: { algorithm: 'gzip' }` on `store()`. Compression runs before encryption. Decompression on `restore()` is automatic.
+
+**Passphrase-based encryption** — Pass `passphrase` instead of `encryptionKey`. Keys are derived via PBKDF2 (default) or scrypt. KDF parameters are stored in the manifest for deterministic re-derivation. Use `deriveKey()` directly for manual control.
+
+**Merkle tree manifests** — When chunk count exceeds `merkleThreshold` (default: 1000), manifests are automatically split into sub-manifests stored as separate blobs. `readManifest()` transparently reconstitutes them. Full backward compatibility with v1 manifests.
+
+See [CHANGELOG.md](./CHANGELOG.md) for the full list of changes.
+
 ## Usage (Node API)

 ```js
@@ -56,6 +69,14 @@ const m = await cas.readManifest({ treeOid });
 // Lifecycle: inspect deletion impact, find orphaned chunks
 const { slug, chunksOrphaned } = await cas.deleteAsset({ treeOid });
 const { referenced, total } = await cas.findOrphanedChunks({ treeOids: [treeOid] });
+
+// v2.0.0: Compressed + passphrase-encrypted store
+const manifest2 = await cas.storeFile({
+  filePath: './data.json',
+  slug: 'my-data',
+  passphrase: 'my secret passphrase',
+  compression: { algorithm: 'gzip' },
+});
 ```

 ## CLI (git plugin)

ROADMAP.md

Lines changed: 2 additions & 2 deletions

@@ -134,7 +134,7 @@ Return and throw semantics for every public method (current and planned).
 | v1.4.0 | M4 | Compass | Lifecycle management ||
 | v1.5.0 | M5 | Sonar | Observability ||
 | v1.6.0 | M6 | Cartographer | Documentation ||
-| v2.0.0 | M7 | Horizon | Advanced features | |
+| v2.0.0 | M7 | Horizon | Advanced features | |

 ---

@@ -1461,7 +1461,7 @@ As a new user, I want runnable examples so I can integrate quickly and correctly

 ---

-# M7 — Horizon (v2.0.0)
+# M7 — Horizon (v2.0.0)
 **Theme:** Advanced capabilities that may change manifest format; major version bump.

 ---
