scout-diff-cache

Diff-aware context-caching proxy for Llama 4 Scout (10M context).

A local, OpenAI-compatible proxy that turns a 10M-token context window into a cheap, fast one. It watches your git repository, splits the codebase into an immutable snapshot (stable per commit) and the recent working diff (volatile), and reassembles every prompt so the big stable part sits at the front — exactly where an inference server's prefix / KV cache can reuse it.

client (OpenAI SDK)
      │  POST /v1/chat/completions
      ▼
┌─────────────────────────────────────────────────────────┐
│  scout-diff-cache                                         │
│                                                           │
│   git HEAD ──▶ snapshot (cached per commit)  ─┐           │
│   git diff ──▶ working diff (recomputed)      ├─▶ rebuilt │
│   client messages ────────────────────────────┘   prompt │
└─────────────────────────────────────────────────────────┘
      │  POST {TARGET_API_URL}/chat/completions
      ▼
upstream model server (vLLM / llama.cpp / TGI …)

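Because the snapshot block is byte-identical for a given commit, an upstream server with prefix caching enabled can match it as a cached prefix and only has to process the small diff plus the live conversation. On a codebase that approaches the 10M-token window, this avoids re-processing the bulk of the prompt on every request, which cuts both latency and cost.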
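To make "byte-identical for a given commit" concrete, here is a minimal sketch of deterministic rendering. The `SnapshotFile` type, the `renderSnapshot` name, and the delimiter format are assumptions for illustration, not the proxy's actual output format. The point is only that sorting paths and avoiding timestamps or other volatile data keeps the bytes stable.

```typescript
// Hypothetical file record; the proxy's real types are not shown in this README.
interface SnapshotFile {
  path: string;
  content: string;
}

// Render files in a fixed (sorted) order with fixed delimiters so that the
// same commit always yields the same bytes, whatever order the files arrive in.
function renderSnapshot(commit: string, files: SnapshotFile[]): string {
  const sorted = [...files].sort((x, y) =>
    x.path < y.path ? -1 : x.path > y.path ? 1 : 0,
  );
  const body = sorted
    .map((f) => `--- ${f.path} ---\n${f.content}`)
    .join("\n");
  // No timestamps or counters here: anything volatile would break the prefix match.
  return `Codebase snapshot @${commit}\n${body}`;
}

const firstRender = renderSnapshot("abc123", [
  { path: "src/b.ts", content: "export const b = 2;" },
  { path: "src/a.ts", content: "export const a = 1;" },
]);
const secondRender = renderSnapshot("abc123", [
  { path: "src/a.ts", content: "export const a = 1;" },
  { path: "src/b.ts", content: "export const b = 2;" },
]);
console.log(firstRender === secondRender); // true
```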

Why prefix ordering matters

Prefix caches match the longest identical leading span of tokens. The proxy therefore emits messages in this order:

| # | Message | Volatility | Cacheable? |
|---|---------|------------|------------|
| 1 | `system`: codebase snapshot `@<commit>` | changes only on commit | ✅ yes |
| 2 | `system`: working diff | changes on every edit | ❌ no |
| 3… | original client messages | the live conversation | ❌ no |

Put the diff before the snapshot and you'd invalidate the cache on every keystroke. Order is the whole trick.
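The ordering can be sketched as follows. This is an illustration under assumptions, not the code in `promptBuilder.ts`: `buildMessages` and `sharedPrefixLength` are hypothetical names, and real prefix caches compare tokens rather than whole messages. Comparing messages is enough to show why the snapshot has to come first.

```typescript
type Role = "system" | "user" | "assistant";

interface ChatMessage {
  role: Role;
  content: string;
}

// Stable snapshot first, volatile diff second, live conversation last.
function buildMessages(
  snapshot: string,
  diff: string,
  clientMessages: ChatMessage[],
): ChatMessage[] {
  return [
    { role: "system", content: snapshot },
    { role: "system", content: diff },
    ...clientMessages,
  ];
}

// Count how many leading messages two prompts have in common.
function sharedPrefixLength(x: ChatMessage[], y: ChatMessage[]): number {
  let i = 0;
  while (
    i < x.length &&
    i < y.length &&
    x[i].role === y[i].role &&
    x[i].content === y[i].content
  ) {
    i++;
  }
  return i;
}

const question: ChatMessage[] = [{ role: "user", content: "Where is the cache invalidated?" }];
const beforeEdit = buildMessages("SNAPSHOT", "diff v1", question);
const afterEdit = buildMessages("SNAPSHOT", "diff v2", question);
console.log(sharedPrefixLength(beforeEdit, afterEdit)); // 1 (the snapshot is reusable)

// With the diff placed first, nothing at the front survives an edit.
const wrongBefore: ChatMessage[] = [
  { role: "system", content: "diff v1" },
  { role: "system", content: "SNAPSHOT" },
];
const wrongAfter: ChatMessage[] = [
  { role: "system", content: "diff v2" },
  { role: "system", content: "SNAPSHOT" },
];
console.log(sharedPrefixLength(wrongBefore, wrongAfter)); // 0
```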

Quick start

Requirements: Node.js ≥ 20, a running OpenAI-compatible model server, and a git repository to watch.

npm install
cp .env.example .env          # then edit TARGET_API_URL / GIT_REPO_PATH
npm run dev                   # hot-reload dev server (tsx)
# or
npm run build && npm start    # production

Point any OpenAI client at the proxy:

curl http://127.0.0.1:8787/v1/chat/completions \
  -H 'content-type: application/json' \
  -d '{
    "model": "llama-4-scout",
    "stream": true,
    "messages": [{ "role": "user", "content": "Where is the cache invalidated?" }]
  }'

The proxy injects the snapshot + diff automatically — your client only sends the actual question.
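The same request from TypeScript, as a minimal sketch using the built-in `fetch` of Node.js ≥ 20. `buildChatRequest` and `ask` are hypothetical helper names, the URL assumes the default `HOST` and `PORT`, and the response shape assumes the upstream returns a standard non-streaming OpenAI chat completion.

```typescript
// Assumes the proxy is listening on its default host and port.
const PROXY_URL = "http://127.0.0.1:8787/v1/chat/completions";

interface ChatRequestInit {
  method: string;
  headers: Record<string, string>;
  body: string;
}

// Build the request; only the question is sent, the proxy adds snapshot + diff.
function buildChatRequest(question: string): ChatRequestInit {
  return {
    method: "POST",
    headers: { "content-type": "application/json" },
    body: JSON.stringify({
      model: "llama-4-scout",
      stream: false,
      messages: [{ role: "user", content: question }],
    }),
  };
}

// Send the question and return the assistant's reply text.
async function ask(question: string): Promise<string> {
  const res = await fetch(PROXY_URL, buildChatRequest(question));
  if (!res.ok) {
    throw new Error(`proxy returned HTTP ${res.status}`);
  }
  const data = (await res.json()) as {
    choices: { message: { content: string } }[];
  };
  return data.choices[0]?.message.content ?? "";
}
```

With the proxy running, `ask("Where is the cache invalidated?").then(console.log)` prints the reply.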

Docker

A multi-stage Dockerfile (git included for simple-git, runs as a non-root user, with a /health healthcheck) and an example docker-compose.yml are provided.

Prebuilt image (GHCR)

Every push to main publishes a multi-tag image to the GitHub Container Registry via CI:

docker pull ghcr.io/nagayu/scout-diff-cache:latest
docker run --init -p 8787:8787 \
  -e TARGET_API_URL=http://host.docker.internal:8000/v1 \
  -e GIT_REPO_PATH=/repo \
  -v "$PWD:/repo:ro" \
  ghcr.io/nagayu/scout-diff-cache:latest

Build locally

docker compose up --build

Mount the repository you want injected at /repo and point TARGET_API_URL at your model server (use host.docker.internal to reach the host). Or run the image directly:

docker build -t scout-diff-cache .
docker run --init -p 8787:8787 \
  -e TARGET_API_URL=http://host.docker.internal:8000/v1 \
  -e GIT_REPO_PATH=/repo \
  -v "$PWD:/repo:ro" \
  scout-diff-cache

⚠️ Security & privacy

This proxy reads every tracked file in the repository and sends it to the upstream model server. Before pointing it at a repo, understand the implications:

Secrets in the repo leak. If your repository tracks a .env, private keys, credentials, or customer data, those bytes are embedded in the prompt and transmitted upstream. Add such paths to EXCLUDE_PATH_PATTERNS (it already excludes lockfiles and node_modules/), and prefer a model server you control. The proxy honors your exclude list but does not scan for secrets — that is your responsibility.
No authentication. The proxy itself is unauthenticated and binds to 127.0.0.1 by default. Do not bind to 0.0.0.0 on an untrusted network without putting an authenticating reverse proxy in front of it.
Trust the upstream. TARGET_API_KEY and your entire codebase are sent to TARGET_API_URL. Only use an endpoint you trust.
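To show the kind of filtering an exclude list performs, here is a small sketch. The matching rules are an assumption made for this example (comma-separated patterns, a trailing `/` meaning "this directory at any depth", anything else matching a full path or a file name). Check `.env.example` and the source for the semantics `EXCLUDE_PATH_PATTERNS` actually uses.

```typescript
// Split a comma-separated pattern list, dropping empty entries.
function parsePatterns(raw: string): string[] {
  return raw
    .split(",")
    .map((p) => p.trim())
    .filter((p) => p.length > 0);
}

// Hypothetical matching rules, for illustration only.
function isExcluded(path: string, patterns: string[]): boolean {
  const normalized = path.replace(/\\/g, "/");
  return patterns.some((p) => {
    if (p.endsWith("/")) {
      // Directory pattern: matches at the repo root or at any depth.
      return normalized.startsWith(p) || normalized.includes(`/${p}`);
    }
    // File pattern: matches the whole path or just the file name.
    const base = normalized.slice(normalized.lastIndexOf("/") + 1);
    return normalized === p || base === p;
  });
}

const patterns = parsePatterns("node_modules/, dist/, .env, secrets/");
console.log(isExcluded("config/.env", patterns)); // true
console.log(isExcluded("src/env.ts", patterns)); // false
```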

Known limitations

Snapshot reads the working tree, not HEAD. Clean files match HEAD exactly; uncommitted edits to a file appear in both the snapshot and the diff. This is a deliberate trade-off for speed and simplicity.
git diff HEAD excludes untracked file contents. New (untracked) files are listed under "changed files" but their contents are not in the patch until staged. Run git add to include them.
No timeout once a stream has started. UPSTREAM_TIMEOUT_MS bounds the time-to-first-byte; a stream that stalls mid-response relies on the client to disconnect (which the proxy detects and propagates upstream).
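The time-to-first-byte limitation follows from a common timeout pattern, sketched below. This is not the code in `proxy.ts`; `withStartTimeout` and `fetchUpstream` are hypothetical names. The timer is cleared as soon as the start promise settles (for `fetch`, when response headers arrive), so nothing bounds a body stream that stalls afterwards.

```typescript
// Run `start` with an AbortSignal that fires if the promise has not settled
// within `timeoutMs`. After it settles the timer is cleared, so a response
// body that stalls later is NOT covered by this timeout.
async function withStartTimeout<T>(
  start: (signal: AbortSignal) => Promise<T>,
  timeoutMs: number,
): Promise<T> {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), timeoutMs);
  try {
    return await start(controller.signal);
  } finally {
    clearTimeout(timer);
  }
}

// Hypothetical usage: abort the upstream call if headers take too long.
function fetchUpstream(url: string, body: string, timeoutMs: number) {
  return withStartTimeout(
    (signal) =>
      fetch(url, {
        method: "POST",
        headers: { "content-type": "application/json" },
        body,
        signal,
      }),
    timeoutMs,
  );
}
```

Covering mid-stream stalls would need a second, idle timer that is reset on every chunk, which is a different trade-off from the one described above.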

Configuration

All configuration is via environment variables (validated at startup — the process refuses to boot on a bad value). See .env.example for the full list with defaults. Key ones:

| Variable | Default | Purpose |
|----------|---------|---------|
| `PORT` / `HOST` | `8787` / `127.0.0.1` | where the proxy listens |
| `TARGET_API_URL` | `http://127.0.0.1:8000/v1` | upstream OpenAI-compatible base URL |
| `TARGET_API_KEY` | (none) | optional bearer token forwarded upstream |
| `GIT_REPO_PATH` | `cwd` | repository to inject as context |
| `CACHE_TTL_MS` | `300000` | snapshot freshness window, in milliseconds |
| `MAX_FILE_BYTES` | `524288` | per-file embed ceiling (512 KiB) |
| `MAX_SNAPSHOT_BYTES` | `67108864` | total embed memory guard (64 MiB) |
| `INJECT_CONTEXT` | `true` | set `false` for transparent pass-through |
| `EXCLUDE_PATH_PATTERNS` | `node_modules/,dist/,…` | paths to omit from the snapshot |
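Fail-fast validation at startup can be sketched like this. The project itself uses zod (see `config/index.ts`); the version below is a dependency-free approximation with hypothetical names (`ProxyConfig`, `loadConfig`) that covers only a few of the variables, using the defaults from the table above.

```typescript
interface ProxyConfig {
  port: number;
  host: string;
  targetApiUrl: string;
  cacheTtlMs: number;
  injectContext: boolean;
}

type Env = Record<string, string | undefined>;

function parsePositiveInt(name: string, raw: string | undefined, fallback: number): number {
  if (raw === undefined || raw === "") return fallback;
  const n = Number(raw);
  if (!Number.isInteger(n) || n <= 0) {
    throw new Error(`${name} must be a positive integer, got "${raw}"`);
  }
  return n;
}

function parseBool(name: string, raw: string | undefined, fallback: boolean): boolean {
  if (raw === undefined || raw === "") return fallback;
  if (raw === "true") return true;
  if (raw === "false") return false;
  throw new Error(`${name} must be "true" or "false", got "${raw}"`);
}

// Throws on the first bad value, so the process never starts half-configured.
function loadConfig(env: Env): ProxyConfig {
  const targetApiUrl = env["TARGET_API_URL"] ?? "http://127.0.0.1:8000/v1";
  new URL(targetApiUrl); // throws a TypeError on a malformed URL
  return {
    port: parsePositiveInt("PORT", env["PORT"], 8787),
    host: env["HOST"] ?? "127.0.0.1",
    targetApiUrl,
    cacheTtlMs: parsePositiveInt("CACHE_TTL_MS", env["CACHE_TTL_MS"], 300000),
    injectContext: parseBool("INJECT_CONTEXT", env["INJECT_CONTEXT"], true),
  };
}

console.log(loadConfig({ PORT: "9000" }).port); // 9000
```

In a real entry point this would be called once with `process.env`, letting any thrown error abort startup.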

API

| Method & path | Description |
|---------------|-------------|
| `POST /v1/chat/completions` | OpenAI-compatible; streaming + non-streaming |
| `GET /v1/models` | advertises `llama-4-scout` |
| `GET /health` | liveness probe |

Errors use the OpenAI error envelope ({ "error": { message, type, code } }) with precise status codes: 400 invalid request, 500 git failure, 502 upstream error, 503 upstream unreachable, 504 upstream timeout.
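A sketch of how the status codes above could map onto the envelope. The status codes come from this README; the `ErrorKind` names and the `type` / `code` strings are hypothetical placeholders, since the actual values the proxy emits are not listed here.

```typescript
// Hypothetical failure categories, one per documented status code.
type ErrorKind =
  | "invalid_request"
  | "git_failure"
  | "upstream_error"
  | "upstream_unreachable"
  | "upstream_timeout";

const STATUS_BY_KIND: Record<ErrorKind, number> = {
  invalid_request: 400,
  git_failure: 500,
  upstream_error: 502,
  upstream_unreachable: 503,
  upstream_timeout: 504,
};

interface ErrorEnvelope {
  error: { message: string; type: string; code: string };
}

// Build the HTTP status and OpenAI-style body for a failure.
function toErrorResponse(
  kind: ErrorKind,
  message: string,
): { status: number; body: ErrorEnvelope } {
  return {
    status: STATUS_BY_KIND[kind],
    // `type` and `code` values are placeholders chosen for this example.
    body: { error: { message, type: "proxy_error", code: kind } },
  };
}

console.log(toErrorResponse("upstream_timeout", "upstream did not respond in time").status); // 504
```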

Project layout

src/
  index.ts              entry point + graceful shutdown
  server.ts             Fastify app, routes, SSE streaming, error handler
  config/index.ts       env loading & validation (zod)
  types/index.ts        all shared type definitions
  utils/
    git.ts              repo inspection, snapshot & diff extraction
    logger.ts           pino structured logging
    errors.ts           typed AppError hierarchy
  services/
    cache.ts            single-slot snapshot cache (commit + TTL)
    context.ts          orchestrates git + cache (build coalescing)
    promptBuilder.ts    message reconstruction
    proxy.ts            upstream forwarding (timeout/abort aware)
    validation.ts       request validation (zod)
tests/                  vitest unit tests
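The "single-slot snapshot cache (commit + TTL)" listed for `services/cache.ts` can be pictured roughly as below. This is a sketch under assumptions, not the file's contents: the class and method names are hypothetical, and the injected clock exists only to make expiry easy to demonstrate.

```typescript
interface Slot {
  commit: string;
  snapshot: string;
  storedAt: number;
}

// Holds at most one snapshot. A lookup hits only if the commit matches
// and the entry is younger than the TTL.
class SnapshotCache {
  private slot: Slot | null = null;
  private readonly ttlMs: number;
  private readonly now: () => number;

  constructor(ttlMs: number, now: () => number = () => Date.now()) {
    this.ttlMs = ttlMs;
    this.now = now;
  }

  get(commit: string): string | null {
    const s = this.slot;
    if (s === null) return null;
    if (s.commit !== commit) return null; // new commit: old snapshot is stale
    if (this.now() - s.storedAt >= this.ttlMs) return null; // expired
    return s.snapshot;
  }

  set(commit: string, snapshot: string): void {
    // Overwrites whatever was stored: there is only one slot.
    this.slot = { commit, snapshot, storedAt: this.now() };
  }
}

let fakeTime = 0;
const cache = new SnapshotCache(300000, () => fakeTime);
cache.set("abc123", "SNAPSHOT");
console.log(cache.get("abc123")); // SNAPSHOT
console.log(cache.get("def456")); // null
fakeTime = 300000;
console.log(cache.get("abc123")); // null
```

A single slot is enough here because only the current commit's snapshot is ever useful as a prompt prefix.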

Development

npm run typecheck   # strict tsc, no emit
npm run test        # vitest
npm run lint        # eslint

License

MIT
