
feat(indexer): watchfiles-based auto-reindex of VAULT_PATH #4

Open

tricamtech wants to merge 1 commit into thebackpackdevorg:main from tricamtech:feat/watchfiles-auto-reindex

Conversation

@tricamtech

Problem

vault-mcp's incremental indexing only runs in two situations:

  1. At server startup: index_all does an mtime scan and re-embeds anything that changed since the last run.
  2. When the server's own write tools modify a file: vault_write and vault_edit call indexer.index_file() inline.

External changes are invisible:

  • git pull bringing in new content from a synced repo
  • Manual edits in Obsidian, VS Code, vim, etc.
  • Sync scripts (rsync, syncthing, nextcloud, etc.)
  • Another process editing the markdown directly

For any deployment where the vault directory is shared with anything other than vault-mcp itself — which is most of them, since markdown vaults are usually edited directly by humans or version-controlled with git — the index goes stale fast. The only fix today is systemctl restart, which is disruptive (especially for the OAuth-enabled deployment shape, where restart wipes the in-memory token store and forces re-auth on every connected client).

Fix

Add a background asyncio task that uses watchfiles to subscribe to filesystem events under VAULT_PATH and react in real time:

| Event | Action |
| --- | --- |
| .md file added | indexer.index_file(rel_path) |
| .md file modified | indexer.index_file(rel_path) |
| .md file deleted | indexer._delete_files_chunks([rel_path]) |
| Non-.md event | ignored |
| Per-file watcher exception | logged and skipped; watcher continues |
| Watcher crashes globally | logged via logger.exception; server keeps running with auto-reindex disabled |

The watcher starts immediately after index_all completes (at the end of indexer.start()), so the server is fully ready before any filesystem events are processed.
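A minimal sketch of the watcher loop, assuming the indexer methods named in the table above; the `_watch_vault` name and exact log strings are illustrative, not the PR's literal code:

```python
import asyncio
import logging
from pathlib import Path

from watchfiles import Change, awatch

logger = logging.getLogger(__name__)

async def _watch_vault(indexer, vault_path: Path) -> None:
    try:
        # awatch yields batches of (Change, path) tuples as events arrive.
        async for changes in awatch(vault_path):
            for change, raw_path in changes:
                path = Path(raw_path)
                if path.suffix != ".md":
                    continue  # non-.md events are ignored
                rel_path = str(path.relative_to(vault_path))
                try:
                    if change in (Change.added, Change.modified):
                        logger.info("Watcher: re-indexing %s (%s)", rel_path, change.name)
                        indexer.index_file(rel_path)
                    elif change == Change.deleted:
                        logger.info("Watcher: removing %s", rel_path)
                        indexer._delete_files_chunks([rel_path])
                except Exception:
                    # One bad file is logged and skipped; the watcher continues.
                    logger.exception("Watcher: failed to process %s", rel_path)
    except Exception:
        # A global crash is logged; the server keeps running with
        # auto-reindex disabled until the next restart.
        logger.exception("Filesystem watcher crashed")
```

The task would be spawned with asyncio.create_task(...) at the end of indexer.start(), per the paragraph above.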

Configuration

  • VAULT_WATCH env var (default true). Set VAULT_WATCH=false/0/no/off to disable the watcher entirely and revert to the existing restart-to-pick-up behavior; a parsing sketch follows this list.
  • VaultIndexer(..., watch_files=True) constructor arg so library users can opt out without env vars.
  • watchfiles>=0.21 added as an explicit dep in pyproject.toml. It was already pulled in transitively via uvicorn, but pinning it explicitly means the import won't break if uvicorn drops it later.
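A plausible reading of the VAULT_WATCH flag handling, assuming the accepted falsy spellings are exactly those listed above (the `_env_flag` helper is hypothetical):

```python
import os

def _env_flag(name: str, default: bool = True) -> bool:
    # Hypothetical helper: false/0/no/off disable; unset keeps the default.
    raw = os.environ.get(name)
    if raw is None:
        return default
    return raw.strip().lower() not in {"false", "0", "no", "off"}

# Library users can bypass the env var entirely via the constructor arg:
# VaultIndexer(..., watch_files=_env_flag("VAULT_WATCH"))
```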

Verification

Full add/modify/delete cycle on a live server, no restart between steps:

$ mkdir /tmp/vault && echo '# initial' > /tmp/vault/a.md
$ VAULT_PATH=/tmp/vault SERVER_HOST=127.0.0.1 SERVER_PORT=8794 \
    python -m vault_mcp.server &
... INFO Filesystem watcher started on /tmp/vault

# Add
$ echo '# new file with Shadowfax the horse' > /tmp/vault/b.md
... INFO Watcher: re-indexing b.md (added)
$ vault_search "Shadowfax"
  → b.md, score 0.752

# Modify
$ echo '# revised, mentions Wisteria Lane' > /tmp/vault/a.md
... INFO Watcher: re-indexing a.md (modified)
$ vault_search "Wisteria"
  → a.md with new content, score 0.771
$ vault_search "old content phrase"
  → a.md no longer matches the old content

# Delete
$ rm /tmp/vault/b.md
... INFO Watcher: removing b.md
$ vault_search "Shadowfax"
  → only weak fallback match, b.md gone from the index

Notes

Found while deploying vault-mcp behind a Cloudflare Tunnel for cross-device access from claude.ai web. The vault is also git-synced from a separate machine, so the "restart to pick up new content" UX would have been painful — especially with OAuth enabled, since that flow wipes the in-memory token store on restart and requires re-auth in the connector dialog every time.

Sister to #1 (vault_reindex guard), #2 (MCP_ALLOWED_HOSTS), and #3 (PIN field maxlength).

tricamtech pushed a commit to tricamtech/vault-mcp-server that referenced this pull request Apr 8, 2026
The SimpleOAuthProvider stores everything in in-memory dicts:

  self._clients: dict[str, OAuthClientInformationFull] = {}
  self._access_tokens: dict[str, AccessToken] = {}
  self._refresh_tokens: dict[str, RefreshToken] = {}
  self._auth_codes: dict[str, AuthorizationCode] = {}
  self._pending: dict[str, dict] = {}

Restart wipes all of them. For deployments where the only consumer is
a single Claude Desktop instance running on the same machine that's
restarted manually, this is fine — that's the use case the docstring
explicitly calls out:

  Stores clients, authorization codes, and tokens in memory.
  Tokens survive until server restart (acceptable for personal use).

But for the deployment shape the README's "Remote deployment" section
recommends — vault-mcp running as a long-lived systemd service behind
a reverse proxy, with claude.ai web / Claude desktop / multiple Claude
Code clients all connected through dynamically-registered OAuth — a
restart silently invalidates every client's stored bearer token. The
next request from each client returns 401, and the user has to walk
through the PIN approval flow again per-client. Even worse: if the
restart happens with an approval page open in a tab, the new process
has no _pending entry for the request_id, so the user gets "Invalid
or expired authorization request" with no clue that the entire
state was wiped, not just their pending request.

I hit this several times today running the public instance behind a
Cloudflare Tunnel for cross-device access. Every PIN rotation, every
config change requiring `systemctl restart`, every accidental crash
required re-authing in claude.ai's connector dialog.

## Fix

Add optional disk persistence for the long-lived state:

- New constructor arg: `state_path: Path | None = None`
- New env var:        `OAUTH_STATE_PATH`
- When set, the provider loads `_clients`, `_access_tokens`, and
  `_refresh_tokens` from a JSON file on init, and writes them back
  after every mutation.

What is persisted:
- Registered DCR clients (`_clients`)
- Issued access tokens (`_access_tokens`)
- Issued refresh tokens (`_refresh_tokens`)

What is NOT persisted (intentionally):
- The per-process `cf_bypass_token` — regenerated each start so it
  can never be exfiltrated from a stale state file. The save filter
  `if k != self._cf_token` excludes it explicitly.
- `_auth_codes` — short-lived (5 minute expiry by design), no value
  in persisting
- `_pending` — even shorter-lived (in-flight authorization requests),
  no value in persisting; surviving across restart could even be
  confusing (user clicks Approve on a half-dead request_id)

Implementation details:

- Atomic writes via `tempfile.mkstemp` + `os.replace` so a crash
  mid-write can't corrupt the state file (see the sketch after this list)
- File mode 0600 (root-only readable, like /root/.env)
- `asyncio.Lock` (`_save_lock`) serializes concurrent saves
- Actual file I/O dispatched to the default executor to avoid
  blocking the event loop
- Save errors are logged via `logger.exception` and swallowed —
  we never want a transient disk issue to block an auth flow
- Load errors fall back to empty state with a logged warning,
  so a corrupted state file doesn't take down the server
- Refresh-token rotation now correctly drops the OLD refresh token
  from the dict (it wasn't being cleaned up before, just orphaned;
  now it's removed and the save reflects that)
- `version: 1` field in the state file for future schema migrations
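A minimal sketch of the save/load path described in the bullets above; the function names and the state-file layout beyond the quoted field names are illustrative:

```python
import asyncio
import json
import logging
import os
import tempfile
from pathlib import Path

logger = logging.getLogger(__name__)

def _write_atomic(state_path: Path, state: dict) -> None:
    # Temp file in the same directory + os.replace: a crash mid-write
    # can never leave a half-written state file behind.
    fd, tmp = tempfile.mkstemp(dir=state_path.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(state, f)
        os.chmod(tmp, 0o600)  # owner-only readable, like /root/.env
        os.replace(tmp, state_path)
    except BaseException:
        os.unlink(tmp)
        raise

async def save_state(state_path: Path, state: dict, save_lock: asyncio.Lock) -> None:
    # Concurrent saves are serialized; the blocking I/O runs in the
    # default executor so the event loop never stalls on disk.
    async with save_lock:
        try:
            await asyncio.get_running_loop().run_in_executor(
                None, _write_atomic, state_path, state
            )
        except Exception:
            # Logged and swallowed: a transient disk issue must not
            # block an in-flight auth flow.
            logger.exception("Failed to persist OAuth state to %s", state_path)

def load_state(state_path: Path) -> dict:
    # Missing or corrupted state falls back to empty with a warning,
    # so a bad file can't take down the server.
    empty = {"version": 1, "clients": {}, "access_tokens": {}, "refresh_tokens": {}}
    try:
        return json.loads(state_path.read_text())
    except FileNotFoundError:
        return empty
    except (OSError, ValueError):
        logger.warning("Unreadable OAuth state file %s; starting empty", state_path)
        return empty
```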

## Verification

End-to-end on a live server:

  $ OAUTH_STATE_PATH=/tmp/state.json OAUTH_PIN=test1234 ... \
      python -m vault_mcp.server &
  ... INFO OAuth provider initialized — state path: /tmp/state.json,
           loaded 0 clients / 0 access tokens / 0 refresh tokens

  $ curl -X POST http://127.0.0.1:8795/register \
      -d '{"client_name":"test","redirect_uris":["http://localhost/cb"]}'
  → client_id: 567926af-...

  $ ls -la /tmp/state.json
  -rw------- 1 root root 719 ... /tmp/state.json
  $ jq '.clients | keys' /tmp/state.json
  ["567926af-..."]

  # Kill and restart
  $ kill %1 && python -m vault_mcp.server &
  ... INFO OAuth provider initialized — state path: /tmp/state.json,
           loaded 1 clients / 0 access tokens / 0 refresh tokens
                  ^^^^^^^^^

  # Register a second client → both are now persisted
  $ curl -X POST http://127.0.0.1:8795/register ...
  $ jq '.clients | keys' /tmp/state.json
  ["567926af-...", "bff56b80-..."]

The same flow with real access/refresh tokens (after walking the
full Authorization Code grant via the browser) preserves token
validity across restarts — the connecting client doesn't need to
re-auth after the service comes back up.

## Backwards compatibility

`state_path` defaults to `None`. Without `OAUTH_STATE_PATH` set in
the environment, behavior is byte-identical to before this commit:
all state in memory, wiped on restart, no disk I/O, no new files.
Existing deployments are unaffected.

Sister to thebackpackdevorg#1 (vault_reindex guard), thebackpackdevorg#2 (MCP_ALLOWED_HOSTS),
thebackpackdevorg#3 (PIN field maxlength), and thebackpackdevorg#4 (watchfiles auto-reindex).