gitsema

A content-addressed semantic index synchronized with Git's object model.

Gitsema walks your Git history, embeds every blob, and lets you semantically search your codebase — including across time. It treats blob hashes as the unit of identity, so identical content is only embedded once regardless of how many commits reference it.

For a comprehensive feature catalog see docs/features.md. For the development roadmap see docs/PLAN.md.

Installation & quick start

Requirements: Node.js 18+, Git on PATH, and an embedding backend (Ollama or any OpenAI-compatible HTTP API).

pnpm install       # install dependencies
pnpm build         # compile TypeScript → dist/
pnpm link --global # optional: put `gitsema` on PATH

Development (no build step):

pnpm dev -- <command> [options]
# e.g.: pnpm dev -- search "authentication middleware"

Running compiled output:

node dist/cli/index.js <command> [options]

Ollama quick start

ollama pull nomic-embed-text
gitsema index start
gitsema search "authentication middleware"

Or run the guided wizard, which detects your provider, configures a model and storage backend (sqlite/postgres/qdrant), optionally configures a narrator/guide model, and indexes HEAD in one step:

gitsema setup
# (gitsema quickstart is a backward-compat alias)

OpenAI-compatible HTTP example

export GITSEMA_PROVIDER=http
export GITSEMA_HTTP_URL=https://api.openai.com
export GITSEMA_MODEL=text-embedding-3-small
export GITSEMA_API_KEY=sk-...
gitsema index start

Multi-model routing: set GITSEMA_CODE_MODEL to a different model than GITSEMA_TEXT_MODEL to use separate models for code and prose. When they match (the default), a single provider is used.

Configuration

Configuration is read from environment variables or persisted via gitsema config (stored in .gitsema/config.json, repo-level by default, or ~/.config/gitsema/config.json with --global). Environment variables always override config-file values.

Variable	Default	Description
`GITSEMA_PROVIDER`	`ollama`	`ollama` or `http`
`GITSEMA_MODEL`	`nomic-embed-text`	Default embedding model
`GITSEMA_TEXT_MODEL`	`$GITSEMA_MODEL`	Model for prose/docs
`GITSEMA_CODE_MODEL`	`$GITSEMA_TEXT_MODEL`	Model for source code files
`GITSEMA_HTTP_URL`	(required if `http`)	Base URL of OpenAI-compatible API
`GITSEMA_API_KEY`	(optional)	Bearer token for the HTTP provider
`GITSEMA_VERBOSE`	off	Set to `1` for debug logging (or pass `--verbose`)
`GITSEMA_LOG_MAX_BYTES`	`1048576`	Log rotation threshold (1 MB)
`GITSEMA_SERVE_PORT`	`4242`	Port for `gitsema tools serve`
`GITSEMA_SERVE_KEY`	(optional)	Bearer token required by `gitsema tools serve`
`GITSEMA_LLM_URL`	(optional)	OpenAI-compatible URL for `--narrate` LLM summaries
`GITSEMA_DATA_DIR`	`~/.gitsema/data`	Root directory where `gitsema tools serve` persists cloned repos + index DBs (`repos/<repoId>/{repo,index.db}`, `registry.db`)

Run gitsema config list to see every active key and where its value came from (env, repo config, global config, or default). Supported dot-notation keys include provider, model, textModel, codeModel, httpUrl, apiKey, llmUrl, llmModel, remoteUrl, remoteKey, index.concurrency, index.maxCommits, index.ext, index.maxSize, index.exclude, index.chunker, index.windowSize, index.overlap, search.top, search.hybrid, search.recent, search.weightVector, search.weightRecency, search.weightPath, evolution.threshold, clusters.k, hooks.enabled, vscode.mcp, vscode.lsp, storage.backend, storage.scope, storage.name, storage.metadata.url, storage.vectors.url, storage.vectors.apiKey, storage.fts.backend, and more.

Storage backends (experimental, Phases 101–103): storage.backend selects where the index lives — sqlite (the default), postgres (+ pgvector), and qdrant are all implemented. For postgres, set storage.metadata.url=postgres://user:pass@host:5432/dbname (see docker-compose.postgres.yml for a local pgvector instance) and optionally storage.fts.backend=tsvector|pg_search|none. For qdrant, set storage.vectors.url=http://host:6333 (see docker-compose.qdrant.yml) plus storage.metadata.url=postgres://... (a Postgres companion for paths/commits/branches/FTS) and optionally storage.vectors.apiKey. gitsema index and all read-path commands (search, history, evolution, etc.) work against postgres and qdrant. storage.scope chooses which index a command resolves to: project (per-repo .gitsema/, default), user (~/.gitsema/), or named (an explicitly addressed index via storage.name). Use gitsema storage migrate --to <backend> [...] to copy an existing sqlite index into postgres/qdrant/sqlite, and gitsema doctor/gitsema status to inspect any backend. See docs/storage-backends-plan.md.

Command reference

All commands support a top-level --verbose flag (or GITSEMA_VERBOSE=1) for debug output. Run gitsema <command> --help for full flag details — the tables below summarize the most important options. Commands are grouped exactly as in gitsema --help.

Setup & Infrastructure

Command	Description
`gitsema config <action> [key] [value]`	Manage persistent configuration (`set`, `get`, `list`, `unset`)
`gitsema status [file]`	Show index status and database info, or status for a specific file
`gitsema doctor [options]`	Run index health checks (integrity, schema version, FTS backfill, scale warnings; `--extended` for model reachability/freshness)
`gitsema storage [info]`	Show the resolved storage backend/scope/location/FTS config (no connections opened)
`gitsema storage migrate --to <backend> [options]`	Copy the active index into another storage backend (sqlite/postgres/qdrant)
`gitsema models`	Manage embedding model configurations (list, add, remove, info); also manages LLM narrator/guide model configs via `--narrator`/`--guide`
`gitsema index`	Show index coverage (blob counts per model)
`gitsema index start [options]`	Perform indexing — walk Git history and embed all blobs
`gitsema setup` (alias: `gitsema quickstart`)	Guided onboarding wizard: detect provider, configure embedding model, select storage backend (sqlite/postgres/qdrant), index HEAD, and optionally configure a narrator/guide model
`gitsema remote-index <repoUrl>`	Ask a remote gitsema server to clone and index a Git repository

`gitsema index start [options]`

Walk Git history and embed all blobs. Already-indexed blobs are skipped (dedup by blob hash).

Flag	Default	Description
`--since <ref>`	last indexed commit	Date, tag, commit hash, or `"all"` for full re-index
`--max-commits <n>`	unlimited	Cap commits per run (for splitting large histories)
`--concurrency <n>`	`4`	Parallel embedding calls
`--embed-batch-size <n>`	—	Batch embedding requests
`--ext <exts>`	all	Only index these comma-separated extensions
`--max-size <size>`	`200kb`	Skip blobs larger than this
`--exclude <patterns>`	none	Skip paths containing these comma-separated substrings
`--include-glob <glob>`	—	Glob-based selective indexing
`--chunker <strategy>`	`file`	`file` \| `function` \| `fixed`
`--window-size <n>`	`1500`	Characters per chunk (fixed chunker)
`--overlap <n>`	`200`	Character overlap between adjacent fixed chunks
`--file <paths...>`	—	Index specific files from HEAD only
`--model <name>`	—	Model override for this run
`--allow-mixed`	off	Allow mixing models within one index
`--auto-build-vss [threshold]`	off	Build the VSS/HNSW ANN index after indexing
`--quantize`	off	Int8 scalar quantization of embeddings
`--profile <name>`	—	`speed` \| `balanced` \| `quality` preset
`--graph`	off	Extract structural references (imports/calls/extends/implements) for TS/TSX/JS/Python blobs into `structural_refs` (Phase 106 knowledge-graph track)

The indexer applies a multi-level fallback chain: whole-file → function chunker → fixed windows (1500 chars → 800 chars) when a blob exceeds the embedding model's context limit.

Other index subcommands: index export / index import (bundle transfer), index update-modules (directory-centroid embeddings), index build-vss (build the ANN index).

`gitsema storage [info]`

Prints the resolved storage.* configuration — backend, scope, location, and FTS status — without opening any connections. Bare gitsema storage is an alias for gitsema storage info. Use gitsema status/gitsema doctor for row counts and health checks.

`gitsema storage migrate --to <backend> [options]`

Copies the active index into another storage backend (sqlite/postgres/qdrant) via the StorageProfile seam. Content-addressed and idempotent, so a migration is safe to re-run/resume after an interruption. Only sqlite sources are supported today.

Flag	Description
`--to <backend>`	destination backend: `sqlite` \| `postgres` \| `qdrant` (required)
`--to-path <file>`	destination SQLite database file (for `--to sqlite`)
`--to-metadata-url <url>`	destination `postgres://...` connection string (for `--to postgres \| qdrant`)
`--to-vectors-url <url>`	destination Qdrant `http(s)://` URL (for `--to qdrant`)
`--to-vectors-api-key <key>`	Qdrant API key (for `--to qdrant`)
`--to-fts-backend <backend>`	destination FTS backend: `tsvector` \| `pg_search` \| `none` (default: `tsvector`)

Protocol Servers

Command	Description
`gitsema tools mcp`	Start the MCP stdio server (AI tool interface)
`gitsema tools lsp [--tcp <port>]`	Start the LSP semantic hover server (JSON-RPC over stdio or TCP)
`gitsema tools serve [--port n] [--key token] [--ui]`	Start the HTTP API server (remote embedding backend)

The old top-level gitsema mcp, gitsema lsp, gitsema serve, and gitsema backfill-fts still work as hidden backward-compat aliases.

gitsema tools serve defaults POST /api/v1/remote/index to persistent mode: the cloned repo and its index are stored under GITSEMA_DATA_DIR (default ~/.gitsema/data) and reused on subsequent requests (fetch + incremental index instead of a fresh clone + full reindex). The response includes a repoId you can pass to /api/v1/search, /api/v1/evolution/*, /api/v1/analysis/*, and other query routes to run them against that repo's index. Pass persist: false to fall back to the legacy ephemeral (temp-dir, GITSEMA_CLONE_KEEP-governed) behavior. Manage persisted repos with gitsema repos list-persisted and gitsema repos remove <repoId> [--purge]. See docs/features.md for details.

Deploying the server? See the deployment guide for Docker / docker-compose, systemd, the Postgres + Qdrant backends, key security, backups, and per-repo-size tuning. The repo also ships docker-compose.yml (Ollama sidecar), docker-compose.postgres.yml (pgvector), and docker-compose.qdrant.yml.

Search & Discovery

Command	Description
`gitsema search <query> [options]`	Semantically search the index for blobs matching the query
`gitsema first-seen <query> [options]`	Find when a concept first appeared, sorted chronologically (earliest first)
`gitsema repl [options]`	Start an interactive semantic search session
`gitsema dead-concepts [options]`	Find historical concepts no longer in HEAD but semantically similar to current code

`gitsema search <query> [options]`

Flag	Default	Description
`-k, --top <n>`	`10`	Results to return
`--recent`	off	Blend cosine similarity with recency score
`--alpha <n>`	`0.8`	Cosine weight in blended score
`--before <date>`	—	Only blobs first seen before this date (YYYY-MM-DD or ISO 8601); alias of `--until`
`--after <date>`	—	Only blobs first seen after this date (YYYY-MM-DD or ISO 8601); alias of `--since`
`--since <date>`	—	Only blobs first seen at or after this date (YYYY-MM-DD or ISO 8601); alias of `--after`
`--until <date>`	—	Only blobs first seen before this date (YYYY-MM-DD or ISO 8601); alias of `--before`
`--weight-vector <n>`	`0.7`	Vector weight in three-signal ranking
`--weight-recency <n>`	`0.2`	Recency weight
`--weight-path <n>`	`0.1`	Path-relevance weight
`--group <mode>`	—	Collapse results by `file` \| `module` \| `commit`
`--chunks`	off	Include chunk-level embeddings
`--hybrid`	off	Combine vector + BM25 (FTS5); requires prior `gitsema backfill-fts` for older data
`--bm25-weight <n>`	`0.3`	BM25 weight in hybrid score
`--branch <name>`	—	Restrict results to blobs seen on this branch
`--vss`	off	Use the HNSW ANN index (requires prior `gitsema index build-vss`)
`--out <spec>`	—	Output spec (repeatable): `text\|json[:file]\|html[:file]\|markdown[:file]`

`gitsema first-seen <query> [options]`

Same underlying search as gitsema search, but results are sorted by first-seen date (earliest first).

Flag	Default	Description
`-k, --top <n>`	`10`	Results to return
`--branch <name>`	—	Restrict results to blobs seen on this branch
`--hybrid`	off	Blend vector similarity with BM25 keyword matching
`--bm25-weight <n>`	`0.3`	BM25 weight in hybrid score
`--include-commits`	off	Also search commit messages and show chronological commit results
`--vss`	off	Use the HNSW ANN index for approximate search
`--repos <ids>`	—	Comma-separated repo IDs to include (multi-repo search)
`--out <spec>`	—	Output spec (repeatable): `text\|json[:file]\|html[:file]\|markdown[:file]`

Analysis

Command	Description
`gitsema triage <query> [options]`	Incident triage: composite workflow (first-seen, change-points, file-evolution, bisect, experts). `--lens structural\|hybrid` adds a cascade-planner structural context section (default: semantic, unchanged)
`gitsema policy-check [options]`	Run policy gates for drift, debt, and security scores (CI-friendly exit codes)
`gitsema ownership <query> [options]`	Show ownership heatmap for a concept (ownership confidence and trends)
`gitsema eval <file> [options]`	Evaluate retrieval quality against a JSONL file of (query, expectedPaths) pairs
`gitsema narrate [options]`	Return commit evidence (default) or an LLM-generated narrative of repository development history
`gitsema explain <topic> [options]`	Return matching commits (default) or an LLM-generated explanation/timeline for a bug, error, or topic. `--lens structural\|hybrid` appends call-graph/co-change context for a `--files` path
`gitsema guide [question] [options]`	Interactive LLM chat that answers questions about this repository, using gathered git context. `--lens structural\|hybrid` biases the agent toward the `call_graph`/`blast_radius`/`hotspots` tools

`gitsema narrate [options]` / `gitsema explain <topic> [options]`

Both commands are safe-by-default: with no narrator model configured (or without --narrate), they return raw commit evidence as JSON/text — no network calls are made. Pass --narrate (and configure a narrator model) to generate LLM prose instead.

Flag	Default	Description
`--since <ref\|date>`	—	Only include commits after this ref or date
`--until <ref\|date>`	—	Only include commits before this ref or date
`--range <rev-range>`	—	Git revision range (e.g. `v1.0..HEAD`) (`narrate` only)
`--focus <area>`	`all`	Filter commits by area: `bugs`, `features`, `ops`, `security`, `deps`, `performance`, `all` (`narrate` only)
`--format <fmt>`	`md`	Output format when narrating: `md`, `text`, `json` (legacy: prefer `--out`)
`--out <spec>`	—	Output spec (repeatable): `text\|json[:file]\|markdown[:file]` (overrides `--format`)
`--max-commits <n>`	`500`	Maximum commits to analyse (`narrate` only)
`--narrator-model-id <id>`	—	`embed_config.id` of the narrator model to use (overrides active selection)
`--model <name>`	—	Narrator model name to use (overrides active selection)
`--narrate`	off	Call the configured LLM narrator and return prose (default: return evidence only)
`--evidence-only`	on	Return raw commit evidence without calling the LLM (this is the default)

Configure a narrator model with gitsema models add <name> --narrator --http-url <url> [--key <token>] --activate, with a local Ollama model (gitsema models add <name> --narrator --provider ollama [--global-name <tag>] --activate, see "Ollama" below), or with a local CLI AI tool: gitsema models add <name> --narrator --provider cli --cli-command <tool> [--cli-args "<args>"] --activate (see "CLI-based AI tool backends" below).

`--narrate` on other commands

The same safe-by-default narration (no network unless --narrate is passed and a narrator model is configured) is also available as a --narrate flag on: first-seen, branch-summary, merge-audit, merge-preview, dead-concepts, debt, doc-gap, security-scan, blame/semantic-blame, triage, impact, ownership, experts, author, contributor-profile, bisect, refactor-candidates, cherry-pick-suggest, and heatmap. Each prints its normal output followed by an === LLM Narrative === section summarizing the result using the active narrator model.

`gitsema guide [question] [options]`

Interactive LLM chat that gathers repository context (recent commits, repo stats) and answers questions about the codebase. Falls back to the active narrator model if no guide model is configured, and prints the gathered context (without calling an LLM, no network access) if neither is configured.

When a guide (or fallback narrator) model is configured, guide runs a real agentic tool-calling loop (via @jsilvanus/chattydeer's runAgentLoop): the LLM can call gitsema tools to gather additional evidence before answering, up to 5 roundtrips. The tool registry (src/core/narrator/guideTools.ts) wires the full gitsema analysis toolset — the same ~36 capabilities exposed as MCP tools — grouped by category:

Category	Tools
Repository (git-only)	`repo_stats`, `recent_commits`, `narrate_repo`, `explain_topic`
Search & discovery	`semantic_search`, `code_search`, `search_history`, `first_seen`, `multi_repo_search`
History & temporal drift	`file_evolution`, `concept_evolution`, `change_points`, `file_change_points`, `health_timeline`
Branch & merge	`branch_summary`, `merge_audit`, `merge_preview`
Ownership & expertise	`author`, `experts`, `ownership`, `contributor_profile`
Quality, debt & risk	`impact`, `dead_concepts`, `debt_score`, `doc_gap`, `security_scan`
Diff & blame	`semantic_diff`, `semantic_blame`
Clustering	`clusters`, `cluster_diff`, `cluster_timeline`
Compound workflows	`triage`, `workflow_run`, `policy_check`, `eval`
Administration	`index`

Each tool has a corresponding entry in src/core/narrator/interpretations.ts describing its result shape and how to interpret it; this catalog is embedded directly in the guide's system prompt so the LLM knows what each result means (significant values, thresholds, caveats) and is also used to generate the "Interpreting gitsema tool results" section of skill/gitsema-ai-assistant.md (pnpm gen:skill).

Tools that require a .gitsema index (search, evolution, clustering, ownership, etc.) return {"error": "..."} gracefully if no index exists or the embedding provider is unreachable — the agent is instructed to fall back to git-only tools (repo_stats, recent_commits, narrate_repo, explain_topic) and tell the user to run gitsema index first. The index tool is mutating/expensive (admin category) and the agent is instructed to use it conservatively.

All outbound content (prompts, tool results) is passed through the same secret/PII redaction (redactAll) used by narrate/explain before it reaches the LLM.

Flag	Description
`--guide-model-id <id>`	`embed_config.id` of the guide model to use
`--model <name>`	Guide/narrator model name to use
`--no-context`	Skip gathering git context (faster but less accurate)
`-i, --interactive`	Start an interactive REPL session (one question per line, multi-turn — the agent session is reused across turns)

Configure a guide model with gitsema models add <name> --guide --http-url <url> [--key <token>] --activate, or with a local CLI AI tool: gitsema models add <name> --guide --provider cli --cli-command <tool> [--use-mcp] --activate (see "CLI-based AI tool backends" below).

Ollama: --provider ollama configures narrate/explain/guide against a local Ollama server with no API key. It defaults --http-url to http://localhost:11434 (no trailing /v1 — both the narrator and @jsilvanus/chattydeer append /v1/chat/completions themselves) and sends the local model name as the model field in chat requests, so the name (or --global-name) must match a pulled Ollama tag. The agentic guide loop needs a tool-calling-capable model (e.g. llama3.1, qwen2.5):

ollama pull llama3.1
gitsema models add llama3.1 --guide --provider ollama --activate
gitsema guide "what changed recently?"

# Or use a local alias mapped to the Ollama tag via --global-name:
gitsema models add ol-guide --guide --provider ollama --global-name llama3.1 --activate

If you omit <name> with --provider ollama, gitsema lists the models available on your Ollama server:

gitsema models add --provider ollama

CLI-based AI tool backends

Instead of an HTTP endpoint, narrate/explain/guide can shell out to a local CLI AI coding agent (Claude Code, Codex CLI, GitHub Copilot CLI, or any other tool already installed and authenticated on your machine):

# One-shot narration via the local `claude` CLI
gitsema models add claude-cli --narrator --provider cli --cli-command claude --activate
gitsema narrate --narrate

# Agentic guide, with gitsema's own MCP server exposed to the CLI tool's agent loop
gitsema models add claude-cli --guide --provider cli --cli-command claude --use-mcp --activate
gitsema guide "what changed recently?"

Flag	Description
`--provider cli`	Use a local CLI tool instead of an HTTP endpoint
`--cli-command <tool>`	Executable to spawn (e.g. `claude`, `codex`, `copilot`/`gh`); required with `--provider cli`
`--cli-args "<args>"`	Extra fixed arguments inserted before the prompt (space-separated)
`--use-mcp`	Guide only: write a temporary MCP config exposing gitsema's `tools mcp` server and pass `--mcp-config`/`--allowedTools mcp__gitsema__*`, so the CLI tool's own agent loop can call gitsema's analysis tools directly

For narrate/explain, the system+user prompt is combined, redacted, and passed one-shot (e.g. claude -p "<prompt>" --output-format json); the result is parsed from stdout. For guide -i (interactive, multi-turn), conversational context is preserved via the CLI tool's session-resume flag (e.g. Claude Code's --resume <session-id>), extracted from the previous turn's output.

Built-in adapters: claude (full support, including --mcp-config and --resume), codex (codex exec, best-effort/experimental MCP support), copilot/gh (copilot explain, one-shot only — no MCP/session support). Any other --cli-command falls back to a generic adapter (<tool> [cliArgs...] "<prompt>", raw stdout as prose).

`gitsema policy-check [options]`

CI policy gate over drift, debt, and security thresholds.

Flag	Description
`--max-drift <n>`	Fail if any concept change-point distance exceeds `n` (cosine distance, 0–2)
`--max-debt-score <n>`	Fail if aggregate debt score exceeds `n`
`--min-security-score <n>`	Fail if any security finding similarity exceeds `n` (cosine similarity, 0–1)
`--query <text>`	Concept query (required for the drift gate)
`--out <spec>`	Output spec (repeatable): `text\|json[:file]`

Exit codes: 0 = ok, 1 = runtime error, 2 = usage error, 3 = gate failed. The same exit-code contract applies to all CI-facing commands: ci-diff, regression-gate, code-review, and policy-check.

File History

Command	Description
`gitsema file-evolution <path> [options]`	Track semantic drift of a file over its Git history (see also: `file-diff`, `evolution`)
`gitsema file-diff <ref1> <ref2> <path>`	Compute semantic diff between two versions of a file
`gitsema blame <file>` (alias: `semantic-blame`)	Show semantic origin of each logical block in a file — nearest-neighbor blame
`gitsema impact <path> [--lens <lens>] [--weight-structural <n>]`	Compute semantically similar blobs across the codebase to highlight refactor impact. `--lens structural\|hybrid` makes this a thin alias over `blast-radius` (default lens: semantic, pre-Phase-109 behavior)

`gitsema file-evolution <path> [options]`

Track semantic drift of a single file across its Git history.

Flag	Default	Description
`--threshold <n>`	`0.3`	Cosine distance (0–2) above which a version is flagged as a large change
`--level <level>`	`file`	`file` \| `symbol` — symbol uses per-symbol centroid embeddings
`--dump [file]`	—	Output structured JSON; writes to `<file>` or stdout
`--html [file]`	—	Output an interactive HTML visualization
`--out <spec>`	—	Output spec (repeatable): `text\|json[:file]\|html[:file]\|markdown[:file]` (overrides `--dump`/`--html`)
`--include-content`	off	Add stored file text per version in JSON (requires `--dump`)
`--alerts [n]`	`5`	Show the top-N largest semantic jumps with author and commit link
`--branch <name>`	—	Restrict evolution to blobs seen on this branch
`--narrate`	off	Generate an LLM narrative summary of semantic shifts (requires `GITSEMA_LLM_URL`)

Concept History

Command	Description
`gitsema evolution <query> [options]` (alias: `concept-evolution`)	Trace how a semantic concept evolved across the entire codebase history
`gitsema diff <ref1> <ref2> <query>`	Compute a conceptual/semantic diff of a topic across two git refs — gained, lost, and stable concepts
`gitsema author <query> [options]`	Rank authors by semantic contribution to a concept

`gitsema evolution <query> [options]`

Flag	Default	Description
`-k, --top <n>`	`50`	Top-matching blobs to include
`--threshold <n>`	`0.3`	Large-change flag threshold (cosine distance, 0–2)
`--dump [file]`	—	Structured JSON output
`--include-content`	off	Add stored file text (requires `--dump`)

Cluster Analysis

Command	Description
`gitsema clusters [options]`	Cluster all blob embeddings into semantic regions and display a concept graph
`gitsema cluster-diff <ref1> <ref2>`	Compare semantic clusters between two points in history
`gitsema cluster-timeline [options]`	Show how semantic clusters shifted over commit history (multi-step timeline)
`gitsema branch-summary <branch>`	Generate a semantic summary of what a branch is about compared to its base branch
`gitsema merge-audit <branch-a> <branch-b>`	Detect semantic collisions between two branches — concept-level conflicts textual diff can't find
`gitsema merge-preview <branch>`	Predict semantic cluster shifts after merging a branch

Change Detection

Command	Description
`gitsema change-points <query> [options]`	Detect conceptual change points for a semantic query across commit history
`gitsema file-change-points <path> [options]`	Detect semantic change points in a file's Git history
`gitsema cluster-change-points [options]`	Detect change points in the repo's cluster structure across commit history

Code Quality

Command	Description
`gitsema code-search <snippet> [options]`	Find code semantically similar to a given snippet
`gitsema security-scan [options]`	Scan the codebase for common security patterns (semantic + structural heuristics)
`gitsema health [options]`	Show codebase health timeline
`gitsema debt [options]`	Score technical debt across the codebase
`gitsema code-review [options]`	Semantic code review: find historical analogues for changed code and flag regressions. `--lens structural\|hybrid` adds per-file call-graph/co-change context (default: semantic, unchanged)
`gitsema refactor-candidates [options]`	Find pairs of symbols/chunks/files semantically similar enough to be refactoring candidates
`gitsema lifecycle <query> [options]`	Analyze the lifecycle stages (born → growing → mature → declining → dead) of a semantic concept
`gitsema doc-gap [options]`	Find undocumented code by comparing code blobs against prose/documentation embeddings

code-review follows the same exit-code contract as policy-check (0 ok / 1 runtime error / 2 usage error / 3 gate failed).

Knowledge Graph

Command	Description
`gitsema graph build`	Build/rebuild `graph_nodes` and `edges` from `structural_refs`, `symbols`, and `blob_commits` (truncate-and-rebuild; requires prior `gitsema index --graph`)
`gitsema co-change <path> [-k/--top <n>]`	Files that historically change together with `<path>`
`gitsema deps <identifier> [--reverse] [--depth <n>] [--edge-types <types>]`	Import/dependency closure of a file or symbol (default edge types: `imports,calls,extends,implements`)
`gitsema graph cycles [--edge-types <types>]` / `gitsema cycles [--edge-types <types>]`	Detect cycles in the structural graph (default: `imports`)
`gitsema graph callers <symbol> [--depth <n>]`	Reverse `calls` traversal — who (transitively) calls `<symbol>` (default depth 3, max 3)
`gitsema graph callees <symbol> [--depth <n>]`	Forward `calls` traversal — what `<symbol>` (transitively) calls (default depth 3, max 3)
`gitsema graph neighbors <node> [--edge-types <types>] [--direction <dir>] [--depth <n>]`	Typed neighborhood of `<node>` — any edge kinds by default (default depth 1, max 3)
`gitsema graph path <a> <b>`	Shortest typed path from `<a>` to `<b>` (max depth 3)
`gitsema blast-radius <symbol> [--lens <lens>] [--depth <n>] [-k/--top <n>] [--weight-structural <n>]`	What changes if I touch this — structural dependents (`calls`/`imports`/`extends`/`implements`/`references`, reverse traversal) and/or semantically similar blobs (default lens: hybrid)
`gitsema relate <symbol> [--lens <lens>] [-k/--top <n>]`	Callers/callees (structural, depth 1) and semantically similar blobs, labeled — both lenses, lose neither (default lens: hybrid)
`gitsema similar <symbol> [--lens <lens>] [-k/--top <n>] [--weight-structural <n>]`	Symbols/files with a similar call/import shape (structural, Jaccard overlap) and/or semantically similar (vector) (default lens: hybrid)
`gitsema unused [--edge-types <types>]`	Symbols/files with no inbound `calls`/`imports` edges — structural complement to `dead-concepts`
`gitsema hotspots [--lens <lens>] [-k/--top <n>]`	Architectural risk = co-change (temporal) × call-coupling (structural) × churn — geometric mean of the signals the lens selects (default lens: hybrid). Also `POST /api/v1/graph/hotspots` and MCP `hotspots`

Workflow & CI

Command	Description
`gitsema repos`	Manage tracked repositories for multi-repo indexing
`gitsema watch`	Manage saved searches and watch mode notifications
`gitsema pr-report [options]`	Compose a semantic PR report: diff, impacted modules, change-points, reviewer suggestions
`gitsema regression-gate [options]`	CI gate: fail if key concepts drift beyond threshold between two refs
`gitsema bisect <good> <bad> <query>`	Semantic git bisect — binary search to find where a concept diverged from a "good" baseline
`gitsema ci-diff [options]`	CI/CD semantic diff — compare semantic content between two Git refs, non-zero exit when concepts changed
`gitsema contributor-profile <author> [options]`	Compute a contributor semantic profile and show top-N blobs they specialize in
`gitsema cherry-pick-suggest <query> [options]`	Suggest commits to cherry-pick based on semantic similarity to a query

regression-gate and ci-diff follow the same exit-code contract as policy-check (0 ok / 1 runtime error / 2 usage error / 3 gate failed) — useful for distinguishing "drift detected" from "provider unreachable" in CI pipelines.

Workflows

Command	Description
`gitsema workflow list`	List available productized workflow templates
`gitsema workflow run <name>`	Run a named workflow template (`pr-review` \| `incident` \| `release-audit`)

Repo Insights

Command	Description
`gitsema cross-repo-similarity <query> [options]`	Compare semantic similarity of a concept across two separately indexed repos
`gitsema experts [options]`	Show top contributors ranked by blob count and the semantic areas they worked on

Visualization

Command	Description
`gitsema map`	Output a JSON representation of semantic clusters and blob assignments
`gitsema heatmap [options]`	Show semantic activity heatmap — distinct blob changes by time period (week or month)
`gitsema project [options]`	Compute 2D random projections of all embeddings for the web UI (requires prior `gitsema index`)

Database & schema

Location: .gitsema/index.db (relative to the indexed repo root) — add .gitsema/ to .gitignore
Engine: SQLite via better-sqlite3, WAL mode enabled
Current schema version: 21

See CLAUDE.md for the full schema table and migration history.

MCP integration

Gitsema ships an MCP stdio server exposing 32 tools (semantic search, temporal analysis, clustering, security/debt scans, workflow automation, and more) to Claude and other MCP-compatible AI clients:

gitsema tools mcp

See CLAUDE.md for the full tool list and VS Code registration instructions, and docs/features.md for tool-by-tool descriptions.

Further documentation

Document	Purpose
`docs/deploy.md`	Deployment guide: running the HTTP server (Docker / docker-compose / systemd), Postgres + Qdrant backends, securing keys, backups, and tuning
`docs/features.md`	Comprehensive feature catalog grouped by area (indexing, search, MCP tools, HTTP API, etc.)
`docs/PLAN.md`	Full development roadmap: phase history, current status, backlog, and planned phases
`CLAUDE.md`	Architecture, schema, configuration, and development conventions
`skill/gitsema-ai-assistant.md`	AI workflow skill/playbook for operating gitsema in coding tasks

Name		Name	Last commit message	Last commit date
Latest commit History 553 Commits
.changeset		.changeset
.claude/skills/gitsema		.claude/skills/gitsema
.github		.github
.vscode		.vscode
action		action
assets		assets
docs		docs
modelserver		modelserver
scripts		scripts
skill		skill
src		src
tests		tests
.env.example		.env.example
.gitignore		.gitignore
.npmignore		.npmignore
CHANGELOG.md		CHANGELOG.md
CLAUDE.md		CLAUDE.md
Dockerfile		Dockerfile
ISSUES.md		ISSUES.md
LICENSE		LICENSE
README.md		README.md
docker-compose.postgres.yml		docker-compose.postgres.yml
docker-compose.qdrant.yml		docker-compose.qdrant.yml
docker-compose.yml		docker-compose.yml
package.json		package.json
pnpm-lock.yaml		pnpm-lock.yaml
tsconfig.json		tsconfig.json
vitest.config.ts		vitest.config.ts

Folders and files

Latest commit

History

Repository files navigation

gitsema

Installation & quick start

Ollama quick start

OpenAI-compatible HTTP example

Configuration

Command reference

Setup & Infrastructure

gitsema index start [options]

gitsema storage [info]

gitsema storage migrate --to <backend> [options]

Protocol Servers

Search & Discovery

gitsema search <query> [options]

gitsema first-seen <query> [options]

Analysis

gitsema narrate [options] / gitsema explain <topic> [options]

--narrate on other commands

gitsema guide [question] [options]

CLI-based AI tool backends

gitsema policy-check [options]

File History

gitsema file-evolution <path> [options]

Concept History

gitsema evolution <query> [options]

Cluster Analysis

Change Detection

Code Quality

Knowledge Graph

Workflow & CI

Workflows

Repo Insights

Visualization

Database & schema

MCP integration

Further documentation

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases 4

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

`gitsema index start [options]`

`gitsema storage [info]`

`gitsema storage migrate --to <backend> [options]`

`gitsema search <query> [options]`

`gitsema first-seen <query> [options]`

`gitsema narrate [options]` / `gitsema explain <topic> [options]`

`--narrate` on other commands

`gitsema guide [question] [options]`

`gitsema policy-check [options]`

`gitsema file-evolution <path> [options]`

`gitsema evolution <query> [options]`

Packages