Prepare for v0.1.0 release: README, CHANGELOG, CITATION, runbook

jaschadub · jaschadub · commit b92590738eb7 · 2026-05-06T12:36:01.000-07:00
Release prep only: this commit does not itself ship a tag. The actual
cut follows RELEASING.md and bumps the version fields in pyproject.toml,
rust/Cargo.toml, and CITATION.cff when ready to publish.

README.md
  Add Rust quick-start alongside Python so the byte-compatible-port
  story is visible from the front page; pull in the Zenodo DOI badge
  and a citation block pointing at the companion preprint
  (10.5281/zenodo.20058256); fix two GitHub org links that pointed
  at thirdkey/ rather than ThirdKeyAI/; soften the
  Isolation-Forest-as-defense paragraph to match the paper's tighter
  framing (catches distribution-shifting attacks but not rotation;
  brittle against adaptive attackers).

CHANGELOG.md
  New file. Documents v0.1.0 in the Keep-a-Changelog format.

CITATION.cff
  CFF v1.2.0 with the preprint DOI in the references block. GitHub
  auto-renders this in the right-rail "Cite this repository" widget.

RELEASING.md
  New runbook with the pre-release checklist, version-bump locations
  (pyproject.toml, rust/Cargo.toml, CITATION.cff), tagging steps,
  PyPI/crates.io publish commands, and a yank procedure. Includes
  the test-vector drift check as a release gate.

scripts/generate_test_vectors.py
  Lint cleanup (timezone import path, line length). Output is
  byte-identical to the committed fixtures.
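
The byte-identical claim for the fixtures can be checked mechanically. The sketch below is one way to do it and is not part of this commit; `snapshot` and `drifted` are hypothetical helper names, and it assumes the fixtures directory is in its committed state before regeneration.

```python
import hashlib
from pathlib import Path


def snapshot(directory: Path) -> dict[str, str]:
    """Map each file's path (relative to `directory`) to the SHA-256 of its bytes."""
    return {
        str(p.relative_to(directory)): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(directory.rglob("*"))
        if p.is_file()
    }


def drifted(before: dict[str, str], after: dict[str, str]) -> list[str]:
    """Return the paths that were added, removed, or changed between two snapshots."""
    return sorted(
        path
        for path in before.keys() | after.keys()
        if before.get(path) != after.get(path)
    )
```

Take a snapshot of `testvectors/`, run `scripts/generate_test_vectors.py`, and take a second snapshot. An empty list from `drifted` means the regenerated output is byte-identical.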
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -0,0 +1,77 @@
+# Changelog
+
+All notable changes to VectorPin will be documented in this file.
+
+The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/),
+and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
+
+## [0.1.0] — 2026-05-06
+
+Initial public release. Protocol version: 1.
+
+### Added
+
+#### Core protocol
+- `Pin` and `PinHeader` attestation format with sorted-key, no-whitespace
+  canonical JSON encoding for deterministic signing.
+- SHA-256 over UTF-8 NFC-normalized source text.
+- SHA-256 over canonical little-endian f32/f64 vector bytes.
+- Ed25519 signing and verification.
+- URL-safe base64 (no padding) wire encoding for signatures.
+- Wire-format specification at [`docs/spec.md`](docs/spec.md), self-contained
+  for cross-language reimplementation.
+
+#### Python implementation (`src/vectorpin/`)
+- `Signer.generate(key_id)` and `Signer.from_private_bytes(raw, key_id)`.
+- `Signer.pin(source, model, vector)` returning a signed `Pin`.
+- `Verifier(public_keys)` with structured `VerificationResult` outcomes:
+  `OK`, `UNSUPPORTED_VERSION`, `UNKNOWN_KEY`, `SIGNATURE_INVALID`,
+  `VECTOR_TAMPERED`, `SOURCE_MISMATCH`, `MODEL_MISMATCH`, `SHAPE_MISMATCH`.
+- Multi-key registry for rotation support.
+- `Pin.to_json()` / `Pin.from_json()` round-trip.
+
+#### Rust implementation (`rust/vectorpin/`)
+- Byte-for-byte compatible with the Python reference.
+- Same canonical bytes, same Ed25519 signatures.
+- `Signer`, `Verifier`, `Pin`, `PinHeader` types with the same
+  failure-mode taxonomy.
+- `cargo test` passes 23 unit tests + 2 cross-language tests + 1 doctest.
+
+#### Cross-language test vectors (`testvectors/`)
+- `v1.json`: positive fixtures with deterministic seed, consumed by both
+  Python and Rust test suites.
+- `negative_v1.json`: tamper-detection fixture.
+- CI workflow regenerates fixtures on every Python-side change and
+  fails on byte drift, preventing silent compatibility breakage.
+
+#### Adapters and detectors
+- `QdrantAdapter`: production Qdrant integration via `qdrant-client`.
+  Lazily imported; install with `pip install 'vectorpin[qdrant]'`.
+- `IsolationForestDetector` and `OneClassSVMDetector`: defensive
+  baselines from sklearn. Lazily imported; install with
+  `pip install 'vectorpin[detectors]'`.
+
+#### CLI (`vectorpin`)
+- `keygen`: generate Ed25519 key pairs.
+- `pin`: sign a (text, vector) pair.
+- `verify-pin`: verify a pin against ground-truth source/vector.
+- `audit-qdrant`: walk a Qdrant collection and report on every record.
+
+#### Documentation
+- README with Python and Rust quick-start.
+- `docs/spec.md` — protocol v1 specification.
+- `examples/basic_usage.py` and `examples/basic_usage.rs`.
+- Companion preprint (Zenodo DOI
+  [10.5281/zenodo.20058256](https://doi.org/10.5281/zenodo.20058256))
+  documenting the threat model and defended attack class.
+
+### Known limitations
+
+- Adapter coverage is partial: Qdrant only. FAISS, Pinecone, Chroma,
+  and pgvector adapters are planned for v0.2.
+- TypeScript and Go ports are planned but not yet shipped.
+- Record-id and collection-id binding currently lives under the
+  `extra` field; promotion to top-level fields is a candidate for
+  protocol v1.1.
+
+[0.1.0]: https://github.com/ThirdKeyAI/VectorPin/releases/tag/v0.1.0
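
The "Core protocol" entries above name four encoding primitives. The following standard-library sketch shows what each one could look like in Python. It is an illustration under stated assumptions, not VectorPin's implementation: every function name is hypothetical, the `ensure_ascii=False` choice is a guess, and `docs/spec.md` remains the authority on the exact canonical form. Ed25519 signing is left out because it needs a third-party library.

```python
import base64
import hashlib
import json
import struct
import unicodedata


def canonical_json(obj: dict) -> bytes:
    """Sorted-key, no-whitespace JSON (ensure_ascii=False is an assumption)."""
    text = json.dumps(obj, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return text.encode("utf-8")


def hash_source(text: str) -> str:
    """SHA-256 over the UTF-8 bytes of the NFC-normalized source text."""
    normalized = unicodedata.normalize("NFC", text)
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def hash_vector_f32(values: list[float]) -> str:
    """SHA-256 over the vector packed as little-endian 32-bit floats."""
    raw = struct.pack(f"<{len(values)}f", *values)
    return hashlib.sha256(raw).hexdigest()


def b64url_nopad(data: bytes) -> str:
    """URL-safe base64 with the trailing '=' padding stripped."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")
```

NFC normalization is what makes a precomposed character and its combining-mark spelling hash to the same value, so visually identical source texts cannot produce different pins.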
diff --git a/CITATION.cff b/CITATION.cff
@@ -0,0 +1,40 @@
+cff-version: 1.2.0
+message: "If you use VectorPin in your work, please cite both the software and the companion preprint."
+type: software
+title: "VectorPin: Verifiable integrity for AI embedding stores"
+authors:
+  - family-names: Wanger
+    given-names: Jascha
+    affiliation: "ThirdKey / Tarnover, LLC"
+    email: jascha@thirdkey.ai
+abstract: >-
+  VectorPin is a cryptographic provenance protocol for embeddings stored in
+  vector databases. Each embedding is bound to its source content and producing
+  model via an Ed25519 signature over a canonical byte representation, and any
+  post-embedding modification breaks signature verification on read. Reference
+  implementations in Python and Rust are byte-for-byte compatible, locked
+  together by shared test vectors. Part of the ThirdKey Trust Stack.
+version: "0.1.0"
+date-released: 2026-05-06
+keywords:
+  - vector database
+  - embedding store
+  - retrieval-augmented generation
+  - cryptographic provenance
+  - Ed25519
+  - integrity
+  - tamper evidence
+  - AI security
+license: Apache-2.0
+repository-code: "https://github.com/ThirdKeyAI/VectorPin"
+url: "https://thirdkey.ai"
+references:
+  - type: article
+    title: "VectorSmuggle: Steganographic Exfiltration in Embedding Stores and a Cryptographic Provenance Defense"
+    authors:
+      - family-names: Wanger
+        given-names: Jascha
+    year: 2026
+    doi: "10.5281/zenodo.20058256"
+    url: "https://doi.org/10.5281/zenodo.20058256"
+    notes: "Companion preprint documenting the threat model and the empirical evaluation that motivates VectorPin."
diff --git a/README.md b/README.md
@@ -4,11 +4,13 @@
 
 [![License: Apache 2.0](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](https://www.apache.org/licenses/LICENSE-2.0)
 [![Python 3.11+](https://img.shields.io/badge/python-3.11+-blue.svg)](https://www.python.org/downloads/)
+[![Rust stable](https://img.shields.io/badge/rust-stable-orange.svg)](https://www.rust-lang.org/)
 [![Status: alpha](https://img.shields.io/badge/status-alpha-orange.svg)](#status)
+[![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.20058256.svg)](https://doi.org/10.5281/zenodo.20058256)
 
 Vector databases are the new soft underbelly of the AI stack. Models trust them. Agents query them. Compliance audits don't yet ask about them. VectorPin pins every embedding to its source content and the model that produced it, then continuously verifies the store has not been tampered with — including covert steganographic modifications invisible to traditional DLP.
 
-Part of the [ThirdKey](https://thirdkey.ai) Trust Stack, alongside [Symbiont](https://github.com/thirdkey/symbiont) (policy-governed agent runtime) and [SchemaPin](https://github.com/thirdkey/schemapin) (cryptographic tool verification).
+Part of the [ThirdKey](https://thirdkey.ai) Trust Stack, alongside [Symbiont](https://github.com/ThirdKeyAI/Symbiont) (policy-governed agent runtime) and [SchemaPin](https://github.com/ThirdKeyAI/SchemaPin) (cryptographic tool verification).
 
 ## Why this matters
 
@@ -18,7 +20,7 @@ Modern RAG systems convert sensitive content into high-dimensional vectors and s
 - Don't verify integrity on read
 - Treat embeddings as opaque numerical artifacts
 
-That's a giant attack surface. The [VectorSmuggle](https://github.com/jaschadub/VectorSmuggle) research project demonstrates that an attacker with write access to a vector pipeline can hide arbitrary data inside embeddings using techniques that pass standard observability:
+That's a giant attack surface. The companion [VectorSmuggle](https://github.com/jaschadub/VectorSmuggle) research project demonstrates that an attacker with write access to a vector pipeline can hide arbitrary data inside embeddings using techniques that pass standard observability:
 
 - Noise injection, rotation, scaling, and offset perturbations
 - Cross-model fragmentation
@@ -28,6 +30,8 @@ Cryptographic pinning is the kill shot for these attacks. Every steganographic t
 
 ## Quick start
 
+### Python
+
 ```bash
 pip install vectorpin
 ```
@@ -53,6 +57,38 @@ if not result.ok:
     print(f"INTEGRITY FAILURE: {result.error.value} — {result.detail}")
 ```
 
+### Rust
+
+```toml
+[dependencies]
+vectorpin = "0.1"
+```
+
+```rust
+use vectorpin::{Signer, Verifier};
+
+let signer = Signer::generate("prod-2026-05".to_string());
+let embedding: Vec<f32> = my_model_embed("The quick brown fox.");
+let pin = signer.pin(
+    "The quick brown fox.",
+    "text-embedding-3-large",
+    embedding.as_slice(),
+)?;
+
+let mut verifier = Verifier::new();
+verifier.add_key(signer.key_id(), signer.public_key_bytes());
+
+let result = verifier.verify_full::<&[f32]>(
+    &pin,
+    Some("The quick brown fox."),
+    Some(embedding.as_slice()),
+    None,
+);
+assert!(result.is_ok());
+```
+
+The Python and Rust implementations are byte-for-byte compatible. A pin produced by either side verifies on both, enforced by shared test vectors at [`testvectors/v1.json`](testvectors/) consumed in both test suites.
+
 ## What VectorPin guarantees
 
 Each Pin commits to:
@@ -128,7 +164,7 @@ detector = IsolationForestDetector().fit(clean_embeddings)
 flagged = detector.decide(suspect_embeddings)
 ```
 
-In the VectorSmuggle empirical study, this single line of defense flagged every operating point of every steganographic technique that hides a non-trivial amount of data, with TPR@1%FPR ≥ 0.79 for all noise-based attacks.
+In the VectorSmuggle empirical study, this single line of defense flagged every operating point of every distribution-shifting steganographic technique that hides a non-trivial amount of data — but it does not catch orthogonal rotation (which preserves every density feature the detector fits on) and is brittle against attackers who know the detector. Cryptographic pinning is the durable layer; statistical detection is defense-in-depth.
 
 ## Threat model
 
@@ -146,15 +182,32 @@ VectorPin does **not** defend against:
 
 ## Status
 
-Alpha. Core protocol (`Pin`, `Signer`, `Verifier`) is stable and tested. Adapter coverage is partial. Hosted attestation service is not yet available.
+Alpha (`v0.1`). Core protocol (`Pin`, `Signer`, `Verifier`) is stable and tested. Python and Rust ports are byte-for-byte compatible and locked together by shared test vectors in CI. Adapter coverage is partial. Hosted attestation service is not yet available.
+
+The protocol version field (`v: 1`) lets future revisions break compatibility cleanly. We will not break existing pins without bumping the protocol version, which in turn forces a major version bump of the package. See [`docs/spec.md`](docs/spec.md) for the wire-format specification.
+
+## Citation
 
-The protocol version field (`v: 1`) lets future revisions break compatibility cleanly. We will not break existing pins without bumping the major version.
+If you reference VectorPin or the threat model it defends against, please cite the companion preprint:
+
+> Wanger, J. (2026). *VectorSmuggle: Steganographic Exfiltration in Embedding Stores and a Cryptographic Provenance Defense*. Zenodo. <https://doi.org/10.5281/zenodo.20058256>
+
+```bibtex
+@misc{wanger2026vectorsmuggle,
+  title  = {{VectorSmuggle}: Steganographic Exfiltration in Embedding Stores and a Cryptographic Provenance Defense},
+  author = {Wanger, Jascha},
+  year   = {2026},
+  publisher = {Zenodo},
+  doi    = {10.5281/zenodo.20058256},
+  url    = {https://doi.org/10.5281/zenodo.20058256}
+}
+```
 
 ## Related work
 
-- [VectorSmuggle](https://github.com/jaschadub/VectorSmuggle) — companion threat-research project demonstrating the attacks VectorPin defends against.
-- [Symbiont](https://github.com/thirdkey/symbiont) — policy-governed agent runtime; consumes VectorPin attestations to enforce "agents may only retrieve from verified vector stores."
-- [SchemaPin](https://github.com/thirdkey/schemapin) — sister project doing the same kind of cryptographic provenance for tool schemas in MCP.
+- [VectorSmuggle](https://github.com/jaschadub/VectorSmuggle) — companion threat-research project demonstrating the attacks VectorPin defends against. Empirical results in the linked Zenodo preprint.
+- [Symbiont](https://github.com/ThirdKeyAI/Symbiont) — policy-governed agent runtime; consumes VectorPin attestations to enforce "agents may only retrieve from verified vector stores."
+- [SchemaPin](https://github.com/ThirdKeyAI/SchemaPin) — sister project doing the same kind of cryptographic provenance for tool schemas in MCP.
 - [sigstore](https://www.sigstore.dev/) — inspired our approach to OSS-friendly cryptographic provenance.
 
 ## Contributing
diff --git a/RELEASING.md b/RELEASING.md
@@ -0,0 +1,114 @@
+# Releasing VectorPin
+
+Cutting a release ships an updated Python wheel to PyPI and an updated
+Rust crate to crates.io. Both ports must remain byte-for-byte compatible
+with the published `testvectors/v1.json`, so the release is gated on
+the cross-language test suite passing.
+
+## Versioning
+
+We follow semver. The protocol version (`v: 1` in the wire format) is numbered
+separately from the package version, but a protocol change dictates the package bump:
+
+- **Protocol major bump** — incompatible wire format. v1 verifiers
+  must reject v2 pins. Triggers a `vectorpin` major-version bump.
+- **Protocol minor bump** — additive changes (new optional fields,
+  new dtype identifiers, new signature algorithms with new
+  identifiers). Old verifiers continue to verify old pins. Triggers
+  a `vectorpin` minor-version bump.
+- **Package patch bump** — bug fixes, dependency updates, doc-only
+  changes. No protocol change.
+
+## Pre-release checklist
+
+Run all of these and only proceed when each is clean.
+
+```bash
+# 1. Python: lint + tests
+source venv/bin/activate
+ruff check .
+pytest -v
+
+# 2. Rust: fmt + clippy + tests
+cd rust
+cargo fmt --all -- --check
+cargo clippy -j2 --all-targets -- -D warnings
+cargo test -j2 --workspace
+cd ..
+
+# 3. Regenerate cross-language test vectors and confirm no drift
+python scripts/generate_test_vectors.py
+git diff --quiet testvectors/  # must exit 0; a non-zero exit means the fixtures drifted
+```
+
+## Cutting a release
+
+1. **Update the version field in three places.** Bump
+   `pyproject.toml` `[project] version`,
+   `rust/Cargo.toml` `[workspace.package] version`, and the
+   `version:` field in `CITATION.cff`. They must match.
+
+2. **Update `CHANGELOG.md`.** Add a section for the new version
+   describing what changed since the previous release. Include the
+   release date in `YYYY-MM-DD` form.
+
+3. **Commit the version bump as a single commit.**
+   ```
+   git commit -am "Release vX.Y.Z"
+   ```
+
+4. **Tag the commit.**
+   ```
+   git tag -a vX.Y.Z -m "VectorPin vX.Y.Z"
+   git push origin main vX.Y.Z
+   ```
+
+5. **Build and publish the Python package.**
+   ```
+   pip install --upgrade build twine
+   python -m build           # produces dist/vectorpin-X.Y.Z-*.whl and *.tar.gz
+   twine check dist/*
+   twine upload dist/*
+   ```
+
+6. **Publish the Rust crate.**
+   ```
+   cd rust/vectorpin
+   cargo publish --dry-run   # verify it would publish cleanly
+   cargo publish
+   cd ../..
+   ```
+
+7. **Create the GitHub release.** The tag from step 4 will appear in
+   the GitHub UI; convert it to a release with the changelog entry as
+   the release notes. Attach `dist/vectorpin-X.Y.Z.tar.gz` for users
+   who want a self-contained source archive.
+
+8. **Update the companion preprint's `refs.bib`** to reference the
+   tagged release if the paper is being revised.
+
+## Post-release
+
+- Watch for PyPI / crates.io install issues for ~24 hours.
+- Open follow-up issues for any planned next-version work that this
+  release deferred.
+- If the protocol changed, tag the corresponding `testvectors/`
+  release on the same git SHA so external implementations can fetch
+  the correct fixtures.
+
+## Yanking a release
+
+If a published version contains a security or correctness bug:
+
+```
+# PyPI: twine has no yank command. Yank the release from the project's
+# release management page on pypi.org and enter a short reason there.
+
+# crates.io
+cargo yank vectorpin --version X.Y.Z
+```
+
+Yanked versions remain installable via exact pin (so existing
+deployments don't break), but new resolutions skip them. Always
+release a fixed patch version (`X.Y.Z` with `Z` incremented) immediately
+and update the changelog with the yank notice.
diff --git a/scripts/generate_test_vectors.py b/scripts/generate_test_vectors.py
@@ -26,11 +26,12 @@
 
 import base64
 import json
+from datetime import UTC
 from pathlib import Path
 
 import numpy as np
 
-from vectorpin import Pin, Signer
+from vectorpin import Signer
 
 OUT_DIR = Path(__file__).resolve().parent.parent / "testvectors"
 
@@ -65,22 +66,23 @@ def main() -> None:
         vec = make_vector(seed=i, dim=dim, dtype=dtype)
         # Use a fixed timestamp so the pin (and therefore the signature) is
         # bit-for-bit reproducible across runs.
-        from datetime import datetime, timezone
-        ts = datetime(2026, 5, 5, 12, 0, 0, tzinfo=timezone.utc)
+        from datetime import datetime
+        ts = datetime(2026, 5, 5, 12, 0, 0, tzinfo=UTC)
         pin = signer.pin(
             source=text,
             model=model,
             vector=vec,
             vec_dtype=dtype,
             timestamp=ts,
         )
+        np_dtype = "<f4" if dtype == "f32" else "<f8"
         fixtures.append(
             {
                 "name": f"vector_{i}",
                 "input": {
                     "source": text,
                     "model": model,
-                    "vector_b64": b64url(vec.astype(f"<{'f4' if dtype == 'f32' else 'f8'}").tobytes()),
+                    "vector_b64": b64url(vec.astype(np_dtype).tobytes()),
                     "vec_dtype": dtype,
                     "vec_dim": dim,
                     "timestamp": "2026-05-05T12:00:00Z",