
Commit b925907

Prepare for v0.1.0 release: README, CHANGELOG, CITATION, runbook
Release prep that doesn't itself ship a tag; the actual cut follows RELEASING.md and bumps the version fields in pyproject.toml and rust/Cargo.toml when ready to publish.

README.md
  Add Rust quick-start alongside Python so the byte-compatible-port story is visible from the front page; pull in the Zenodo DOI badge and a citation block pointing at the companion preprint (10.5281/zenodo.20058256); fix two GitHub org links that pointed at thirdkey/ rather than ThirdKeyAI/; soften the Isolation-Forest-as-defense paragraph to match the paper's tighter framing (catches distribution-shifting attacks but not rotation; brittle against adaptive attackers).

CHANGELOG.md
  New file. Documents v0.1.0 in the Keep-a-Changelog format.

CITATION.cff
  CFF v1.2.0 with the preprint DOI in the references block. GitHub auto-renders this in the right-rail "Cite this repository" widget.

RELEASING.md
  New runbook with the pre-release checklist, version-bump locations (pyproject.toml, rust/Cargo.toml, CITATION.cff), tagging steps, PyPI/crates.io publish commands, and a yank procedure. Includes the test-vector drift check as a release gate.

scripts/generate_test_vectors.py
  Lint cleanup (timezone import path, line length). Output is byte-identical to the committed fixtures.
1 parent f69492a commit b925907

5 files changed

Lines changed: 298 additions & 12 deletions

CHANGELOG.md

Lines changed: 77 additions & 0 deletions
@@ -0,0 +1,77 @@
1+
# Changelog
2+
3+
All notable changes to VectorPin will be documented in this file.
4+
5+
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/),
6+
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
7+
8+
## [0.1.0] — 2026-05-06
9+
10+
Initial public release. Protocol version: 1.
11+
12+
### Added
13+
14+
#### Core protocol
15+
- `Pin` and `PinHeader` attestation format with sorted-key, no-whitespace
16+
canonical JSON encoding for deterministic signing.
17+
- SHA-256 over UTF-8 NFC-normalized source text.
18+
- SHA-256 over canonical little-endian f32/f64 vector bytes.
19+
- Ed25519 signing and verification.
20+
- URL-safe base64 (no padding) wire encoding for signatures.
21+
- Wire-format specification at [`docs/spec.md`](docs/spec.md), self-contained
22+
for cross-language reimplementation.
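
The encoding rules listed under "Core protocol" can be illustrated with the Python standard library alone. The following is a minimal sketch, not the VectorPin implementation: the helper names (`canonical_json`, `hash_source`, `hash_vector_f32`, `b64url_nopad`) are hypothetical, the handling of non-ASCII characters in JSON is an assumption that `docs/spec.md` may define differently, and Ed25519 signing is left out because it needs a third-party library.

```python
import base64
import hashlib
import json
import struct
import unicodedata


def canonical_json(obj: dict) -> bytes:
    """Serialize with sorted keys and no insignificant whitespace."""
    # Assumption: non-ASCII characters are emitted as raw UTF-8.
    # The real spec may require escaping instead.
    text = json.dumps(obj, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return text.encode("utf-8")


def hash_source(text: str) -> str:
    """SHA-256 over the UTF-8 bytes of the NFC-normalized text."""
    normalized = unicodedata.normalize("NFC", text)
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def hash_vector_f32(values: list[float]) -> str:
    """SHA-256 over the values packed as little-endian 32-bit floats."""
    raw = struct.pack(f"<{len(values)}f", *values)
    return hashlib.sha256(raw).hexdigest()


def b64url_nopad(data: bytes) -> str:
    """URL-safe base64 with trailing '=' padding removed."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")
```

Normalizing before hashing means that a precomposed "é" and the sequence "e" plus a combining acute accent hash to the same value, which is what makes the source hash stable across platforms that store text differently.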
23+
24+
#### Python implementation (`src/vectorpin/`)
25+
- `Signer.generate(key_id)` and `Signer.from_private_bytes(raw, key_id)`.
26+
- `Signer.pin(source, model, vector)` returning a signed `Pin`.
27+
- `Verifier(public_keys)` with structured `VerificationResult` outcomes:
28+
`OK`, `UNSUPPORTED_VERSION`, `UNKNOWN_KEY`, `SIGNATURE_INVALID`,
29+
`VECTOR_TAMPERED`, `SOURCE_MISMATCH`, `MODEL_MISMATCH`, `SHAPE_MISMATCH`.
30+
- Multi-key registry for rotation support.
31+
- `Pin.to_json()` / `Pin.from_json()` round-trip.
32+
33+
#### Rust implementation (`rust/vectorpin/`)
34+
- Byte-for-byte compatible with the Python reference.
35+
- Same canonical bytes, same Ed25519 signatures.
36+
- `Signer`, `Verifier`, `Pin`, `PinHeader` types with the same
37+
failure-mode taxonomy.
38+
- `cargo test` passes 23 unit tests + 2 cross-language tests + 1 doctest.
39+
40+
#### Cross-language test vectors (`testvectors/`)
41+
- `v1.json`: positive fixtures with deterministic seed, consumed by both
42+
Python and Rust test suites.
43+
- `negative_v1.json`: tamper-detection fixture.
44+
- CI workflow regenerates fixtures on every Python-side change and
45+
fails on byte drift, preventing silent compatibility breakage.
46+
47+
#### Adapters and detectors
48+
- `QdrantAdapter`: production Qdrant integration via `qdrant-client`.
49+
Lazily imported; install with `pip install 'vectorpin[qdrant]'`.
50+
- `IsolationForestDetector` and `OneClassSVMDetector`: defensive
51+
baselines from sklearn. Lazily imported; install with
52+
`pip install 'vectorpin[detectors]'`.
53+
54+
#### CLI (`vectorpin`)
55+
- `keygen`: generate Ed25519 key pairs.
56+
- `pin`: sign a (text, vector) pair.
57+
- `verify-pin`: verify a pin against ground-truth source/vector.
58+
- `audit-qdrant`: walk a Qdrant collection and report on every record.
59+
60+
#### Documentation
61+
- README with Python and Rust quick-start.
62+
- `docs/spec.md` — protocol v1 specification.
63+
- `examples/basic_usage.py` and `examples/basic_usage.rs`.
64+
- Companion preprint (Zenodo DOI
65+
[10.5281/zenodo.20058256](https://doi.org/10.5281/zenodo.20058256))
66+
documenting the threat model and defended attack class.
67+
68+
### Known limitations
69+
70+
- Adapter coverage is partial: Qdrant only. FAISS, Pinecone, Chroma,
71+
and pgvector adapters are planned for v0.2.
72+
- TypeScript and Go ports are planned but not yet shipped.
73+
- Record-id and collection-id binding currently lives under the
74+
`extra` field; promotion to top-level fields is a candidate for
75+
protocol v1.1.
76+
77+
[0.1.0]: https://github.com/ThirdKeyAI/VectorPin/releases/tag/v0.1.0

CITATION.cff

Lines changed: 40 additions & 0 deletions
@@ -0,0 +1,40 @@
1+
cff-version: 1.2.0
2+
message: "If you use VectorPin in your work, please cite both the software and the companion preprint."
3+
type: software
4+
title: "VectorPin: Verifiable integrity for AI embedding stores"
5+
authors:
6+
- family-names: Wanger
7+
given-names: Jascha
8+
affiliation: "ThirdKey / Tarnover, LLC"
9+
email: jascha@thirdkey.ai
10+
abstract: >-
11+
VectorPin is a cryptographic provenance protocol for embeddings stored in
12+
vector databases. Each embedding is bound to its source content and producing
13+
model via an Ed25519 signature over a canonical byte representation, and any
14+
post-embedding modification breaks signature verification on read. Reference
15+
implementations in Python and Rust are byte-for-byte compatible, locked
16+
together by shared test vectors. Part of the ThirdKey Trust Stack.
17+
version: "0.1.0"
18+
date-released: 2026-05-06
19+
keywords:
20+
- vector database
21+
- embedding store
22+
- retrieval-augmented generation
23+
- cryptographic provenance
24+
- Ed25519
25+
- integrity
26+
- tamper evidence
27+
- AI security
28+
license: Apache-2.0
29+
repository-code: "https://github.com/ThirdKeyAI/VectorPin"
30+
url: "https://thirdkey.ai"
31+
references:
32+
- type: article
33+
title: "VectorSmuggle: Steganographic Exfiltration in Embedding Stores and a Cryptographic Provenance Defense"
34+
authors:
35+
- family-names: Wanger
36+
given-names: Jascha
37+
year: 2026
38+
doi: "10.5281/zenodo.20058256"
39+
url: "https://doi.org/10.5281/zenodo.20058256"
40+
notes: "Companion preprint documenting the threat model and the empirical evaluation that motivates VectorPin."

README.md

Lines changed: 61 additions & 8 deletions
@@ -4,11 +4,13 @@
44

55
[![License: Apache 2.0](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](https://www.apache.org/licenses/LICENSE-2.0)
66
[![Python 3.11+](https://img.shields.io/badge/python-3.11+-blue.svg)](https://www.python.org/downloads/)
7+
[![Rust stable](https://img.shields.io/badge/rust-stable-orange.svg)](https://www.rust-lang.org/)
78
[![Status: alpha](https://img.shields.io/badge/status-alpha-orange.svg)](#status)
9+
[![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.20058256.svg)](https://doi.org/10.5281/zenodo.20058256)
810

911
Vector databases are the new soft underbelly of the AI stack. Models trust them. Agents query them. Compliance audits don't yet ask about them. VectorPin pins every embedding to its source content and the model that produced it, then continuously verifies the store has not been tampered with — including covert steganographic modifications invisible to traditional DLP.
1012

11-
Part of the [ThirdKey](https://thirdkey.ai) Trust Stack, alongside [Symbiont](https://github.com/thirdkey/symbiont) (policy-governed agent runtime) and [SchemaPin](https://github.com/thirdkey/schemapin) (cryptographic tool verification).
13+
Part of the [ThirdKey](https://thirdkey.ai) Trust Stack, alongside [Symbiont](https://github.com/ThirdKeyAI/Symbiont) (policy-governed agent runtime) and [SchemaPin](https://github.com/ThirdKeyAI/SchemaPin) (cryptographic tool verification).
1214

1315
## Why this matters
1416

@@ -18,7 +20,7 @@ Modern RAG systems convert sensitive content into high-dimensional vectors and s
1820
- Don't verify integrity on read
1921
- Treat embeddings as opaque numerical artifacts
2022

21-
That's a giant attack surface. The [VectorSmuggle](https://github.com/jaschadub/VectorSmuggle) research project demonstrates that an attacker with write access to a vector pipeline can hide arbitrary data inside embeddings using techniques that pass standard observability:
23+
That's a giant attack surface. The companion [VectorSmuggle](https://github.com/jaschadub/VectorSmuggle) research project demonstrates that an attacker with write access to a vector pipeline can hide arbitrary data inside embeddings using techniques that pass standard observability:
2224

2325
- Noise injection, rotation, scaling, and offset perturbations
2426
- Cross-model fragmentation
@@ -28,6 +30,8 @@ Cryptographic pinning is the kill shot for these attacks. Every steganographic t
2830

2931
## Quick start
3032

33+
### Python
34+
3135
```bash
3236
pip install vectorpin
3337
```
@@ -53,6 +57,38 @@ if not result.ok:
5357
print(f"INTEGRITY FAILURE: {result.error.value}{result.detail}")
5458
```
5559

60+
### Rust
61+
62+
```toml
63+
[dependencies]
64+
vectorpin = "0.1"
65+
```
66+
67+
```rust
68+
use vectorpin::{Signer, Verifier};
69+
70+
let signer = Signer::generate("prod-2026-05".to_string());
71+
let embedding: Vec<f32> = my_model_embed("The quick brown fox.");
72+
let pin = signer.pin(
73+
"The quick brown fox.",
74+
"text-embedding-3-large",
75+
embedding.as_slice(),
76+
)?;
77+
78+
let mut verifier = Verifier::new();
79+
verifier.add_key(signer.key_id(), signer.public_key_bytes());
80+
81+
let result = verifier.verify_full::<&[f32]>(
82+
&pin,
83+
Some("The quick brown fox."),
84+
Some(embedding.as_slice()),
85+
None,
86+
);
87+
assert!(result.is_ok());
88+
```
89+
90+
The Python and Rust implementations are byte-for-byte compatible. A pin produced by either side verifies on both, enforced by shared test vectors at [`testvectors/v1.json`](testvectors/) consumed in both test suites.
91+
5692
## What VectorPin guarantees
5793

5894
Each Pin commits to:
@@ -128,7 +164,7 @@ detector = IsolationForestDetector().fit(clean_embeddings)
128164
flagged = detector.decide(suspect_embeddings)
129165
```
130166

131-
In the VectorSmuggle empirical study, this single line of defense flagged every operating point of every steganographic technique that hides a non-trivial amount of data, with TPR@1%FPR ≥ 0.79 for all noise-based attacks.
167+
In the VectorSmuggle empirical study, this single line of defense flagged every operating point of every distribution-shifting steganographic technique that hides a non-trivial amount of data — but it does not catch orthogonal rotation (which preserves every density feature the detector fits on) and is brittle against attackers who know the detector. Cryptographic pinning is the durable layer; statistical detection is defense-in-depth.
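
The contrast drawn in this paragraph, that a rotation leaves distance-based statistics untouched while still changing the stored bytes, can be seen in two dimensions. The sketch below is illustrative only and uses nothing from VectorPin or scikit-learn: it rotates a few points by a fixed angle, checks that norms and pairwise distances are unchanged, and shows that the SHA-256 of the little-endian f32 bytes differs. All helper names are hypothetical.

```python
import hashlib
import math
import struct


def rotate(point, theta):
    """Apply a 2-D rotation, the simplest orthogonal transform."""
    x, y = point
    c, s = math.cos(theta), math.sin(theta)
    return (c * x - s * y, s * x + c * y)


def vector_digest(point):
    """SHA-256 over the point packed as little-endian 32-bit floats."""
    return hashlib.sha256(struct.pack(f"<{len(point)}f", *point)).hexdigest()


points = [(1.0, 0.0), (0.5, 2.0), (-1.5, 0.25)]
rotated = [rotate(p, math.pi / 7) for p in points]

# Norms and pairwise distances survive the rotation (up to rounding).
norms_preserved = all(
    math.isclose(math.hypot(*p), math.hypot(*q), rel_tol=1e-9)
    for p, q in zip(points, rotated)
)
distances_preserved = all(
    math.isclose(
        math.dist(points[i], points[j]),
        math.dist(rotated[i], rotated[j]),
        rel_tol=1e-9,
    )
    for i in range(len(points))
    for j in range(i + 1, len(points))
)

# The byte-level digest of every point changes, so a hash-based pin would not match.
digests_differ = all(vector_digest(p) != vector_digest(q) for p, q in zip(points, rotated))
```

Any check built only on norms or pairwise distances sees the rotated set as identical to the original, whereas a commitment to the exact vector bytes does not.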
132168

133169
## Threat model
134170

@@ -146,15 +182,32 @@ VectorPin does **not** defend against:
146182

147183
## Status
148184

149-
Alpha. Core protocol (`Pin`, `Signer`, `Verifier`) is stable and tested. Adapter coverage is partial. Hosted attestation service is not yet available.
185+
Alpha (`v0.1`). Core protocol (`Pin`, `Signer`, `Verifier`) is stable and tested. Python and Rust ports are byte-for-byte compatible and locked together by shared test vectors in CI. Adapter coverage is partial. Hosted attestation service is not yet available.
186+
187+
The protocol version field (`v: 1`) lets future revisions break compatibility cleanly. We will not break existing pins without bumping the major version. See [`docs/spec.md`](docs/spec.md) for the wire-format specification.
188+
189+
## Citation
150190

151-
The protocol version field (`v: 1`) lets future revisions break compatibility cleanly. We will not break existing pins without bumping the major version.
191+
If you reference VectorPin or the threat model it defends against, please cite the companion preprint:
192+
193+
> Wanger, J. (2026). *VectorSmuggle: Steganographic Exfiltration in Embedding Stores and a Cryptographic Provenance Defense*. Zenodo. <https://doi.org/10.5281/zenodo.20058256>
194+
195+
```bibtex
196+
@misc{wanger2026vectorsmuggle,
197+
title = {{VectorSmuggle}: Steganographic Exfiltration in Embedding Stores and a Cryptographic Provenance Defense},
198+
author = {Wanger, Jascha},
199+
year = {2026},
200+
publisher = {Zenodo},
201+
doi = {10.5281/zenodo.20058256},
202+
url = {https://doi.org/10.5281/zenodo.20058256}
203+
}
204+
```
152205

153206
## Related work
154207

155-
- [VectorSmuggle](https://github.com/jaschadub/VectorSmuggle) — companion threat-research project demonstrating the attacks VectorPin defends against.
156-
- [Symbiont](https://github.com/thirdkey/symbiont) — policy-governed agent runtime; consumes VectorPin attestations to enforce "agents may only retrieve from verified vector stores."
157-
- [SchemaPin](https://github.com/thirdkey/schemapin) — sister project doing the same kind of cryptographic provenance for tool schemas in MCP.
208+
- [VectorSmuggle](https://github.com/jaschadub/VectorSmuggle) — companion threat-research project demonstrating the attacks VectorPin defends against. Empirical results in the linked Zenodo preprint.
209+
- [Symbiont](https://github.com/ThirdKeyAI/Symbiont) — policy-governed agent runtime; consumes VectorPin attestations to enforce "agents may only retrieve from verified vector stores."
210+
- [SchemaPin](https://github.com/ThirdKeyAI/SchemaPin) — sister project doing the same kind of cryptographic provenance for tool schemas in MCP.
158211
- [sigstore](https://www.sigstore.dev/) — inspired our approach to OSS-friendly cryptographic provenance.
159212

160213
## Contributing

RELEASING.md

Lines changed: 114 additions & 0 deletions
@@ -0,0 +1,114 @@
1+
# Releasing VectorPin
2+
3+
Cutting a release ships an updated Python wheel to PyPI and an updated
4+
Rust crate to crates.io. Both ports must remain byte-for-byte compatible
5+
with the published `testvectors/v1.json`, so the release is gated on
6+
the cross-language test suite passing.
7+
8+
## Versioning
9+
10+
We follow semver. The protocol-version field (`v: 1` in the wire
11+
format) is independent from the package version:
12+
13+
- **Protocol major bump** — incompatible wire format. v1 verifiers
14+
must reject v2 pins. Triggers a `vectorpin` major-version bump.
15+
- **Protocol minor bump** — additive changes (new optional fields,
16+
new dtype identifiers, new signature algorithms with new
17+
identifiers). Old verifiers continue to verify old pins. Triggers
18+
a `vectorpin` minor-version bump.
19+
- **Package patch bump** — bug fixes, dependency updates, doc-only
20+
changes. No protocol change.
21+
22+
## Pre-release checklist
23+
24+
Run all of these and only proceed when each is clean.
25+
26+
```bash
27+
# 1. Python: lint + tests
28+
source venv/bin/activate
29+
ruff check .
30+
pytest -v
31+
32+
# 2. Rust: fmt + clippy + tests
33+
cd rust
34+
cargo fmt --all -- --check
35+
cargo clippy -j2 --all-targets -- -D warnings
36+
cargo test -j2 --workspace
37+
cd ..
38+
39+
# 3. Regenerate cross-language test vectors and confirm no drift
40+
python scripts/generate_test_vectors.py
41+
git diff --exit-code testvectors/ # must exit 0 and print no diff
42+
```
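
Step 3 treats any byte-level change under `testvectors/` as a failure. Where Git is not convenient (for example inside a test harness), the same gate could be approximated by comparing digests taken before and after regeneration. This is a sketch under assumptions: `snapshot` and `drifted` are hypothetical helpers that do not exist in the repository, and the temporary directory stands in for `testvectors/`.

```python
import hashlib
import tempfile
from pathlib import Path


def snapshot(directory: Path) -> dict[str, str]:
    """Map each file's relative path to the SHA-256 of its bytes."""
    return {
        str(p.relative_to(directory)): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(directory.rglob("*"))
        if p.is_file()
    }


def drifted(before: dict[str, str], after: dict[str, str]) -> list[str]:
    """Return paths that were added, removed, or changed between snapshots."""
    return sorted(k for k in before.keys() | after.keys() if before.get(k) != after.get(k))


# Demonstration on a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "v1.json").write_bytes(b'{"fixtures":[]}')
    before = snapshot(root)
    unchanged = drifted(before, snapshot(root))   # nothing regenerated differently
    (root / "v1.json").write_bytes(b'{"fixtures":[1]}')
    changed = drifted(before, snapshot(root))     # one file now differs
```

In a release script, the snapshot would be taken before running `scripts/generate_test_vectors.py` and compared afterwards, aborting if `drifted` returns a non-empty list.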
43+
44+
## Cutting a release
45+
46+
1. **Update the version field in three places.** Bump
47+
`pyproject.toml` `[project] version`,
48+
`rust/Cargo.toml` `[workspace.package] version`, and the
49+
`version:` field in `CITATION.cff`. They must match.
50+
51+
2. **Update `CHANGELOG.md`.** Add a section for the new version
52+
describing what changed since the previous release. Include the
53+
release date in `YYYY-MM-DD` form.
54+
55+
3. **Commit the version bump as a single commit.**
56+
```
57+
git commit -am "Release vX.Y.Z"
58+
```
59+
60+
4. **Tag the commit.**
61+
```
62+
git tag -a vX.Y.Z -m "VectorPin vX.Y.Z"
63+
git push origin main vX.Y.Z
64+
```
65+
66+
5. **Build and publish the Python package.**
67+
```
68+
pip install --upgrade build twine
69+
python -m build # produces dist/vectorpin-X.Y.Z-*.whl and *.tar.gz
70+
twine check dist/*
71+
twine upload dist/*
72+
```
73+
74+
6. **Publish the Rust crate.**
75+
```
76+
cd rust/vectorpin
77+
cargo publish --dry-run # verify it would publish cleanly
78+
cargo publish
79+
cd ../..
80+
```
81+
82+
7. **Create the GitHub release.** The tag from step 4 will appear in
83+
the GitHub UI; convert it to a release with the changelog entry as
84+
the release notes. Attach `dist/vectorpin-X.Y.Z.tar.gz` for users
85+
who want a self-contained source archive.
86+
87+
8. **Update the companion preprint's `refs.bib`** to reference the
88+
tagged release if the paper is being revised.
89+
90+
## Post-release
91+
92+
- Watch for PyPI / crates.io install issues for ~24 hours.
93+
- Open follow-up issues for any planned next-version work that this
94+
release deferred.
95+
- If the protocol changed, tag the corresponding `testvectors/`
96+
release on the same git SHA so external implementations can fetch
97+
the correct fixtures.
98+
99+
## Yanking a release
100+
101+
If a published version contains a security or correctness bug:
102+
103+
```
104+
# PyPI
105+
# twine has no yank subcommand; yank X.Y.Z from the project's release
# management page on pypi.org and enter a short reason there.
106+
107+
# crates.io
108+
cargo yank --version X.Y.Z vectorpin
109+
```
110+
111+
Yanked versions remain installable via exact pin (so existing
112+
deployments don't break), but new resolutions skip them. Always
113+
release a fixed patch version (`X.Y.(Z+1)`) immediately and update the changelog with
114+
the yank notice.

scripts/generate_test_vectors.py

Lines changed: 6 additions & 4 deletions
@@ -26,11 +26,12 @@
2626

2727
import base64
2828
import json
29+
from datetime import UTC
2930
from pathlib import Path
3031

3132
import numpy as np
3233

33-
from vectorpin import Pin, Signer
34+
from vectorpin import Signer
3435

3536
OUT_DIR = Path(__file__).resolve().parent.parent / "testvectors"
3637

@@ -65,22 +66,23 @@ def main() -> None:
6566
vec = make_vector(seed=i, dim=dim, dtype=dtype)
6667
# Use a fixed timestamp so the pin (and therefore the signature) is
6768
# bit-for-bit reproducible across runs.
68-
from datetime import datetime, timezone
69-
ts = datetime(2026, 5, 5, 12, 0, 0, tzinfo=timezone.utc)
69+
from datetime import datetime
70+
ts = datetime(2026, 5, 5, 12, 0, 0, tzinfo=UTC)
7071
pin = signer.pin(
7172
source=text,
7273
model=model,
7374
vector=vec,
7475
vec_dtype=dtype,
7576
timestamp=ts,
7677
)
78+
np_dtype = "<f4" if dtype == "f32" else "<f8"
7779
fixtures.append(
7880
{
7981
"name": f"vector_{i}",
8082
"input": {
8183
"source": text,
8284
"model": model,
83-
"vector_b64": b64url(vec.astype(f"<{'f4' if dtype == 'f32' else 'f8'}").tobytes()),
85+
"vector_b64": b64url(vec.astype(np_dtype).tobytes()),
8486
"vec_dtype": dtype,
8587
"vec_dim": dim,
8688
"timestamp": "2026-05-05T12:00:00Z",

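A consumer of these fixtures has to reverse the `vector_b64` encoding shown in the hunk above. The sketch below assumes the field holds URL-safe base64 of the little-endian vector bytes, as the generator code suggests. Whether the `b64url` helper strips padding is not visible in this hunk, so the decoder accepts both forms. `decode_vector` is a hypothetical helper and not part of the script.

```python
import base64
import struct


def decode_vector(vector_b64: str, vec_dtype: str, vec_dim: int) -> list[float]:
    """Decode a fixture's vector_b64 field back into Python floats."""
    # "f32" and "f64" are the dtype labels used by the generator script.
    fmt_char, width = {"f32": ("f", 4), "f64": ("d", 8)}[vec_dtype]
    # Restore any '=' padding that was stripped before decoding.
    padded = vector_b64 + "=" * (-len(vector_b64) % 4)
    raw = base64.urlsafe_b64decode(padded)
    if len(raw) != vec_dim * width:
        raise ValueError(f"expected {vec_dim * width} bytes, got {len(raw)}")
    # '<' forces little-endian regardless of the host byte order.
    return list(struct.unpack(f"<{vec_dim}{fmt_char}", raw))


# Round trip on a two-element f32 vector whose values are exact in binary.
example_raw = struct.pack("<2f", 1.0, -2.5)
example_b64 = base64.urlsafe_b64encode(example_raw).rstrip(b"=").decode("ascii")
decoded = decode_vector(example_b64, "f32", 2)
```

Checking the byte length against `vec_dim` before unpacking catches a truncated or mislabeled fixture early, instead of surfacing later as a confusing hash mismatch.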