From d93eaf5eeb3c77338f9af8663e577e11d67a797e Mon Sep 17 00:00:00 2001 From: Claude Date: Mon, 15 Jun 2026 15:00:32 +0000 Subject: [PATCH 1/6] Phase 108: graph traversal primitives (callers/callees/neighbors/path) Add GraphStore.neighbors/callers/callees/path/subgraph (recursive-CTE traversals over graph_nodes/edges, depth-capped at 3) for sqlite and Postgres, with an UnsupportedGraphStore stub for the Qdrant profile. Expose them via `gitsema graph callers|callees|neighbors|path` and MCP tools `call_graph`/`graph_neighbors`. https://claude.ai/code/session_01BjWermmuwNjWvx2dXfDU6q --- .changeset/phase-108-graph-traversal.md | 5 + CLAUDE.md | 4 +- README.md | 4 + docs/PLAN.md | 30 +++ docs/features.md | 3 + src/cli/commands/graphCallees.ts | 34 ++++ src/cli/commands/graphCallers.ts | 34 ++++ src/cli/commands/graphNeighbors.ts | 42 ++++ src/cli/commands/graphPath.ts | 46 +++++ src/cli/register/graph.ts | 30 +++ src/core/graph/traversal.ts | 54 +++++ src/core/storage/postgres/graphStore.ts | 75 ++++++- src/core/storage/postgres/graphTraversal.ts | 118 +++++++++++ src/core/storage/sqlite/graphTraversal.ts | 131 ++++++++++++ src/core/storage/sqlite/profile.ts | 84 +++++++- src/core/storage/types.ts | 57 +++++- src/core/storage/unsupportedGraphStore.ts | 22 ++- src/mcp/server.ts | 2 + src/mcp/tools/graph.ts | 90 +++++++++ tests/graphTraversal.test.ts | 208 ++++++++++++++++++++ 20 files changed, 1068 insertions(+), 5 deletions(-) create mode 100644 .changeset/phase-108-graph-traversal.md create mode 100644 src/cli/commands/graphCallees.ts create mode 100644 src/cli/commands/graphCallers.ts create mode 100644 src/cli/commands/graphNeighbors.ts create mode 100644 src/cli/commands/graphPath.ts create mode 100644 src/core/graph/traversal.ts create mode 100644 src/core/storage/postgres/graphTraversal.ts create mode 100644 src/core/storage/sqlite/graphTraversal.ts create mode 100644 src/mcp/tools/graph.ts create mode 100644 tests/graphTraversal.test.ts diff --git 
a/.changeset/phase-108-graph-traversal.md b/.changeset/phase-108-graph-traversal.md new file mode 100644 index 0000000..0d7e730 --- /dev/null +++ b/.changeset/phase-108-graph-traversal.md @@ -0,0 +1,5 @@ +--- +"gitsema": minor +--- + +Add graph traversal primitives over the Phase 107 structural graph: `gitsema graph callers ` / `gitsema graph callees ` (transitive `calls` traversal, default and max depth 3), `gitsema graph neighbors ` (typed neighborhood, any edge kinds, configurable direction/depth), and `gitsema graph path ` (shortest typed path between two nodes). New MCP tools `call_graph` and `graph_neighbors` expose the same traversals. diff --git a/CLAUDE.md b/CLAUDE.md index f15dc31..a797686 100644 --- a/CLAUDE.md +++ b/CLAUDE.md @@ -468,7 +468,7 @@ node dist/cli/index.js tools mcp The MCP server reads the same environment variables as the CLI. It runs against the `.gitsema/index.db` in the current working directory when the server is started. -**Exposed tools (34 total, registered across `src/mcp/tools/{search,analysis,clustering,infrastructure,workflow,narrator}.ts`):** +**Exposed tools (36 total, registered across `src/mcp/tools/{search,analysis,clustering,infrastructure,workflow,narrator,graph}.ts`):** | Tool | Description | |---|---| @@ -506,6 +506,8 @@ The MCP server reads the same environment variables as the CLI. 
It runs against | `workflow_run` | Run a named workflow template (`pr-review` \| `incident` \| `release-audit`) | | `narrate_repo` | Generate evidence (default) or an LLM narrative of repository development history | | `explain_issue_or_error` | Generate evidence (default) or an LLM explanation/timeline for a bug, error, or topic | +| `call_graph` | Structural call-graph traversal — callers/callees of a symbol (Phase 108) | +| `graph_neighbors` | Typed neighborhood of a graph node — any edge kinds, direction, depth (Phase 108) | --- diff --git a/README.md b/README.md index e7ed033..dc95df5 100644 --- a/README.md +++ b/README.md @@ -435,6 +435,10 @@ Track semantic drift of a single file across its Git history. | `gitsema co-change [-k/--top ]` | Files that historically change together with `` | | `gitsema deps [--reverse] [--depth ] [--edge-types ]` | Import/dependency closure of a file or symbol (default edge types: `imports,calls,extends,implements`) | | `gitsema graph cycles [--edge-types ]` / `gitsema cycles [--edge-types ]` | Detect cycles in the structural graph (default: `imports`) | +| `gitsema graph callers [--depth ]` | Reverse `calls` traversal — who (transitively) calls `` (default depth 3, max 3) | +| `gitsema graph callees [--depth ]` | Forward `calls` traversal — what `` (transitively) calls (default depth 3, max 3) | +| `gitsema graph neighbors [--edge-types ] [--direction ] [--depth ]` | Typed neighborhood of `` — any edge kinds by default (default depth 1, max 3) | +| `gitsema graph path ` | Shortest typed path from `` to `` (max depth 3) | ### Workflow & CI diff --git a/docs/PLAN.md b/docs/PLAN.md index bb39d46..f62c86c 100644 --- a/docs/PLAN.md +++ b/docs/PLAN.md @@ -4003,3 +4003,33 @@ pairwise co-change computation per commit to avoid O(n²) blowup on vendoring/lockfile-regeneration commits. Traversal primitives (callers/callees/path/neighbors) and the `--lens` toggle remain out of scope — Phase 108/109. + +**Status:** Phase 108 ✅ complete. 
The `GraphStore` interface +(`src/core/storage/types.ts`) gains five traversal primitives — +`neighbors`/`callers`/`callees`/`path`/`subgraph` — plus `GraphHit`, `GraphPath`, +`GraphPathHop`, `GraphSubgraph`, and a shared `MAX_GRAPH_TRAVERSAL_DEPTH = 3` +constant (knowledge-graph §6). Both `SqliteGraphStore` and `PostgresGraphStore` +implement them via recursive CTEs over `edges`/`graph_nodes` +(`src/core/storage/sqlite/graphTraversal.ts` and +`src/core/storage/postgres/graphTraversal.ts`): a `WITH RECURSIVE` walk with a +`ROW_NUMBER() OVER (PARTITION BY node_key ORDER BY depth)` window picks the +shortest-depth hit (and its edge type) per reached node for +`neighbors`/`callers`/`callees`/`subgraph`; `path` uses a second recursive CTE that +accumulates a delimited path string (`node|edgeType|reversed|node|...`) and returns +the shortest match. All traversal depths are clamped to +`MAX_GRAPH_TRAVERSAL_DEPTH` (`callers`/`callees`/`path`/`subgraph` default to 3; +`neighbors` defaults to 1). `UnsupportedGraphStore` throws the same +"graph queries require a relational backend" error for all five new methods, per +review9 §4. A new `src/core/graph/traversal.ts` wraps the primitives with +`resolveNode()` (Phase 107) for identifier resolution, backing four new CLI +commands — `gitsema graph callers [--depth]`, `gitsema graph callees + [--depth]`, `gitsema graph neighbors [--edge-types] [--direction] +[--depth]`, and `gitsema graph path ` — and two new MCP tools, `call_graph` +(callers/callees over `calls` edges) and `graph_neighbors` (typed neighborhood, any +edge kinds), registered in `src/mcp/tools/graph.ts`. 
**Deviation from the original +sketch:** `call_graph`/`graph_neighbors` are not yet added to the `gitsema guide` +`GUIDE_TOOLS` registry (46 tools) or `interpretations.ts` — left for the Phase 110 +fusion pass / Phase 112 lens-coverage sweep, consistent with `docsSync`'s existing +guard (which only requires every `GUIDE_TOOLS` entry to have an interpretation, not +that every MCP tool is in `GUIDE_TOOLS`). No schema change. Tests: +`tests/graphTraversal.test.ts`. diff --git a/docs/features.md b/docs/features.md index b30c2db..7e77fe4 100644 --- a/docs/features.md +++ b/docs/features.md @@ -200,6 +200,7 @@ All search uses the **text embedding model** (not the code model) to embed queri | **Generic `--narrate` via `narrateToolResult` (Phase 104)** | `narrateToolResult(toolKey, result)` in `src/core/llm/narrator.ts` looks up the tool's `TOOL_INTERPRETATIONS` entry, redacts and caps the JSON result, and asks the active narrator model for a prose summary (safe-by-default — no network unless `--narrate` is passed and a narrator model is configured). 
Wired onto `--narrate` flags on `first-seen`, `branch-summary`, `merge-audit`, `merge-preview`, `dead-concepts`, `debt`, `doc-gap`, `security-scan`, `blame`/`semantic-blame`, `triage`, `impact`, `ownership`, `experts`, `author`, `contributor-profile`, `bisect`, `refactor-candidates`, `cherry-pick-suggest`, and `heatmap` | | **Guided `gitsema setup` wizard with storage backend selection (Phase 104)** | `gitsema setup` (primary name; `gitsema quickstart` remains a backward-compat alias) extends the onboarding wizard with a storage-backend step (sqlite/postgres/qdrant), persisting `storage.*` config keys and validating the connection via `getCachedStorageProfile().metadata.getLastIndexedCommit()` (reverting to sqlite on failure), plus an optional final step to configure a local Ollama narrator/guide model via `gitsema models add --narrator\|--guide --provider ollama --activate` | | **Co-change / dependency / cycle queries (Phase 107)** | `gitsema co-change [-k/--top]` — files that historically change together with `` (from `co_change` edges); `gitsema deps [--reverse] [--depth] [--edge-types]` — import/dependency closure of a file or symbol (BFS over `imports`/`calls`/`extends`/`implements` edges); `gitsema graph cycles` / top-level `gitsema cycles [--edge-types]` — detect cycles in the structural graph (default: `imports`). All require `gitsema index --graph` + `gitsema graph build` first | +| **Graph traversal primitives (Phase 108)** | `GraphStore.neighbors/callers/callees/path/subgraph` — recursive-CTE traversals over `graph_nodes`/`edges` (sqlite + Postgres; Qdrant profile throws "graph queries require a relational backend"), depth capped at 3. 
CLI: `gitsema graph callers [--depth]` (reverse `calls` traversal), `gitsema graph callees [--depth]` (forward `calls` traversal), `gitsema graph neighbors [--edge-types] [--direction] [--depth]` (typed neighborhood, any edge kinds), `gitsema graph path ` (shortest typed path, rendered as `-[edgeType]->`/`<-[edgeType]-` hops). MCP: `call_graph` (callers/callees) and `graph_neighbors`. All resolve symbol qualified names, file paths, or literal node keys via `resolveNode()`; require `gitsema index --graph` + `gitsema graph build` first | --- @@ -361,6 +362,8 @@ Start with `gitsema tools mcp`. All tools share the same core logic as the CLI. | `triage` | Incident triage bundle: first-seen, change points, evolution, bisect, experts | | `policy_check` | CI policy gate — debt score, security similarity, and concept drift thresholds | | `workflow_run` | Run a named workflow template (`pr-review` \| `incident` \| `release-audit`) | +| `call_graph` | Structural call-graph traversal — callers/callees of a symbol (Phase 108) | +| `graph_neighbors` | Typed neighborhood of a graph node — any edge kinds, direction, depth (Phase 108) | --- diff --git a/src/cli/commands/graphCallees.ts b/src/cli/commands/graphCallees.ts new file mode 100644 index 0000000..53079a3 --- /dev/null +++ b/src/cli/commands/graphCallees.ts @@ -0,0 +1,34 @@ +import { getCachedStorageProfile } from '../../core/storage/resolveProfile.js' +import { callees } from '../../core/graph/traversal.js' + +export interface GraphCalleesCommandOptions { + depth?: string +} + +export async function graphCalleesCommand(symbol: string, options: GraphCalleesCommandOptions = {}): Promise<void> { + const profile = getCachedStorageProfile(process.cwd()) + const depth = options.depth !== undefined ? parseInt(options.depth, 10) : undefined + + const result = await callees(profile.graph, symbol, depth) + + if (result.resolved.status === 'not-found') { + console.log(`No graph node found for "${symbol}". 
Run \`gitsema index --graph\` then \`gitsema graph build\` first.`) + return + } + if (result.resolved.status === 'ambiguous') { + console.log(`"${symbol}" is ambiguous — matches multiple symbols:`) + for (const c of result.resolved.candidates) console.log(` ${c.nodeKey}`) + return + } + + const node = result.resolved.node + console.log(`Callees of ${node.displayName} (${node.nodeKey}):\n`) + + if (result.hits.length === 0) { + console.log(' (none)') + return + } + for (const hit of result.hits) { + console.log(` ${hit.displayName} (depth ${hit.depth})`) + } +} diff --git a/src/cli/commands/graphCallers.ts b/src/cli/commands/graphCallers.ts new file mode 100644 index 0000000..17bfda0 --- /dev/null +++ b/src/cli/commands/graphCallers.ts @@ -0,0 +1,34 @@ +import { getCachedStorageProfile } from '../../core/storage/resolveProfile.js' +import { callers } from '../../core/graph/traversal.js' + +export interface GraphCallersCommandOptions { + depth?: string +} + +export async function graphCallersCommand(symbol: string, options: GraphCallersCommandOptions = {}): Promise<void> { + const profile = getCachedStorageProfile(process.cwd()) + const depth = options.depth !== undefined ? parseInt(options.depth, 10) : undefined + + const result = await callers(profile.graph, symbol, depth) + + if (result.resolved.status === 'not-found') { + console.log(`No graph node found for "${symbol}". 
Run \`gitsema index --graph\` then \`gitsema graph build\` first.`) + return + } + if (result.resolved.status === 'ambiguous') { + console.log(`"${symbol}" is ambiguous — matches multiple symbols:`) + for (const c of result.resolved.candidates) console.log(` ${c.nodeKey}`) + return + } + + const node = result.resolved.node + console.log(`Callers of ${node.displayName} (${node.nodeKey}):\n`) + + if (result.hits.length === 0) { + console.log(' (none)') + return + } + for (const hit of result.hits) { + console.log(` ${hit.displayName} (depth ${hit.depth})`) + } +} diff --git a/src/cli/commands/graphNeighbors.ts b/src/cli/commands/graphNeighbors.ts new file mode 100644 index 0000000..f00c66e --- /dev/null +++ b/src/cli/commands/graphNeighbors.ts @@ -0,0 +1,42 @@ +import { getCachedStorageProfile } from '../../core/storage/resolveProfile.js' +import { neighbors } from '../../core/graph/traversal.js' +import type { EdgeType } from '../../core/storage/types.js' + +export interface GraphNeighborsCommandOptions { + edgeTypes?: string + direction?: string + depth?: string +} + +export async function graphNeighborsCommand(node: string, options: GraphNeighborsCommandOptions = {}): Promise<void> { + const profile = getCachedStorageProfile(process.cwd()) + const edgeTypes = options.edgeTypes + ? options.edgeTypes.split(',').map((s) => s.trim()).filter(Boolean) as EdgeType[] + : undefined + const direction = (options.direction as 'out' | 'in' | 'both' | undefined) ?? 'both' + const depth = options.depth !== undefined ? parseInt(options.depth, 10) : undefined + + const result = await neighbors(profile.graph, node, { edgeTypes, direction, depth }) + + if (result.resolved.status === 'not-found') { + console.log(`No graph node found for "${node}". 
Run \`gitsema index --graph\` then \`gitsema graph build\` first.`) + return + } + if (result.resolved.status === 'ambiguous') { + console.log(`"${node}" is ambiguous — matches multiple symbols:`) + for (const c of result.resolved.candidates) console.log(` ${c.nodeKey}`) + return + } + + const resolvedNode = result.resolved.node + console.log(`Neighbors of ${resolvedNode.displayName} (${resolvedNode.nodeKey}):\n`) + + if (result.hits.length === 0) { + console.log(' (none)') + return + } + for (const hit of result.hits) { + const edge = hit.edgeType ? `[${hit.edgeType}] ` : '' + console.log(` ${edge}${hit.displayName} (depth ${hit.depth})`) + } +} diff --git a/src/cli/commands/graphPath.ts b/src/cli/commands/graphPath.ts new file mode 100644 index 0000000..4f486d1 --- /dev/null +++ b/src/cli/commands/graphPath.ts @@ -0,0 +1,46 @@ +import { getCachedStorageProfile } from '../../core/storage/resolveProfile.js' +import { path } from '../../core/graph/traversal.js' + +export async function graphPathCommand(a: string, b: string): Promise<void> { + const profile = getCachedStorageProfile(process.cwd()) + const result = await path(profile.graph, a, b) + + if (result.from.status === 'not-found') { + console.log(`No graph node found for "${a}". Run \`gitsema index --graph\` then \`gitsema graph build\` first.`) + return + } + if (result.from.status === 'ambiguous') { + console.log(`"${a}" is ambiguous — matches multiple symbols:`) + for (const c of result.from.candidates) console.log(` ${c.nodeKey}`) + return + } + if (result.to.status === 'not-found') { + console.log(`No graph node found for "${b}". 
Run \`gitsema index --graph\` then \`gitsema graph build\` first.`) + return + } + if (result.to.status === 'ambiguous') { + console.log(`"${b}" is ambiguous — matches multiple symbols:`) + for (const c of result.to.candidates) console.log(` ${c.nodeKey}`) + return + } + + const fromNode = result.from.node + const toNode = result.to.node + + if (!result.path) { + console.log(`No path found from ${fromNode.displayName} to ${toNode.displayName} within the traversal depth limit.`) + return + } + + if (result.path.hops.length === 0) { + console.log(`${fromNode.displayName} is the same node as ${toNode.displayName}.`) + return + } + + const segments = [fromNode.displayName] + for (const hop of result.path.hops) { + const arrow = hop.reversed ? `<-[${hop.edgeType}]-` : `-[${hop.edgeType}]->` + segments.push(arrow, hop.displayName) + } + console.log(segments.join(' ')) +} diff --git a/src/cli/register/graph.ts b/src/cli/register/graph.ts index 9d6ff49..4bd0d4a 100644 --- a/src/cli/register/graph.ts +++ b/src/cli/register/graph.ts @@ -3,6 +3,10 @@ import { graphBuildCommand } from '../commands/graphBuild.js' import { coChangeCommand } from '../commands/coChange.js' import { depsCommand } from '../commands/deps.js' import { cyclesCommand } from '../commands/cycles.js' +import { graphCallersCommand } from '../commands/graphCallers.js' +import { graphCalleesCommand } from '../commands/graphCallees.js' +import { graphNeighborsCommand } from '../commands/graphNeighbors.js' +import { graphPathCommand } from '../commands/graphPath.js' /** * Structural knowledge-graph commands (Phase 107, knowledge-graph §3.3/§8). 
@@ -52,4 +56,30 @@ export function registerGraph(program: Command) { .description('Detect cycles in the structural graph (default: import cycles) (alias of `gitsema graph cycles`)') .option('--edge-types ', 'comma-separated edge types to check for cycles (default: imports)') .action(cyclesCommand) + + // Phase 108: traversal primitives (recursive CTEs over edges/graph_nodes). + graph + .command('callers ') + .description('Reverse `calls` traversal — who (transitively) calls (default depth 3)') + .option('--depth ', 'limit traversal depth (max 3)') + .action(graphCallersCommand) + + graph + .command('callees ') + .description('Forward `calls` traversal — what (transitively) calls (default depth 3)') + .option('--depth ', 'limit traversal depth (max 3)') + .action(graphCalleesCommand) + + graph + .command('neighbors ') + .description('Typed neighborhood of — any edge kinds by default (default depth 1, max 3)') + .option('--edge-types ', 'comma-separated edge types to traverse (default: all)') + .option('--direction ', "'out' | 'in' | 'both' (default: both)") + .option('--depth ', 'limit traversal depth (max 3)') + .action(graphNeighborsCommand) + + graph + .command('path ') + .description('Shortest typed path from to (structural lens; max depth 3)') + .action(graphPathCommand) } diff --git a/src/core/graph/traversal.ts b/src/core/graph/traversal.ts new file mode 100644 index 0000000..1805ead --- /dev/null +++ b/src/core/graph/traversal.ts @@ -0,0 +1,54 @@ +/** + * `gitsema graph callers|callees|neighbors|path` (Phase 108, knowledge-graph + * §6/§8): thin wrappers over `GraphStore`'s recursive-CTE traversal + * primitives, resolving user-supplied identifiers via `resolveNode`. 
+ */ + +import type { EdgeType, GraphHit, GraphPath, GraphStore } from '../storage/types.js' +import { resolveNode, type ResolveNodeResult } from './resolveNode.js' + +export interface TraversalResult { + resolved: ResolveNodeResult + hits: GraphHit[] +} + +/** Reverse `calls` traversal — who (transitively) calls `identifier`. */ +export async function callers(graph: GraphStore, identifier: string, depth?: number): Promise<TraversalResult> { + const resolved = await resolveNode(graph, identifier) + if (resolved.status !== 'found') return { resolved, hits: [] } + return { resolved, hits: await graph.callers(resolved.node.nodeKey, depth) } +} + +/** Forward `calls` traversal — what `identifier` (transitively) calls. */ +export async function callees(graph: GraphStore, identifier: string, depth?: number): Promise<TraversalResult> { + const resolved = await resolveNode(graph, identifier) + if (resolved.status !== 'found') return { resolved, hits: [] } + return { resolved, hits: await graph.callees(resolved.node.nodeKey, depth) } +} + +/** Typed neighborhood of `identifier` (any edge kinds by default). */ +export async function neighbors( + graph: GraphStore, + identifier: string, + opts: { edgeTypes?: EdgeType[]; direction?: 'out' | 'in' | 'both'; depth?: number } = {}, +): Promise<TraversalResult> { + const resolved = await resolveNode(graph, identifier) + if (resolved.status !== 'found') return { resolved, hits: [] } + return { resolved, hits: await graph.neighbors(resolved.node.nodeKey, opts) } +} + +export interface PathResult { + from: ResolveNodeResult + to: ResolveNodeResult + path: GraphPath | null +} + +/** Shortest typed path from `a` to `b` — "how does A reach B". 
*/ +export async function path(graph: GraphStore, a: string, b: string): Promise<PathResult> { + const from = await resolveNode(graph, a) + const to = await resolveNode(graph, b) + if (from.status !== 'found' || to.status !== 'found') { + return { from, to, path: null } + } + return { from, to, path: await graph.path(from.node.nodeKey, to.node.nodeKey) } +} diff --git a/src/core/storage/postgres/graphStore.ts b/src/core/storage/postgres/graphStore.ts index a6c923d..fc2f865 100644 --- a/src/core/storage/postgres/graphStore.ts +++ b/src/core/storage/postgres/graphStore.ts @@ -8,7 +8,9 @@ import type { Pool } from 'pg' import { ensurePostgresSchema } from './migrations.js' -import type { EdgeType, GraphEdgeRecord, GraphNodeRecord, GraphStore } from '../types.js' +import type { EdgeType, GraphEdgeRecord, GraphHit, GraphNodeRecord, GraphPath, GraphPathHop, GraphStore, GraphSubgraph } from '../types.js' +import { MAX_GRAPH_TRAVERSAL_DEPTH } from '../types.js' +import { traverseNeighbors, findShortestPath, clampDepth, type WalkHit } from './graphTraversal.js' export class PostgresGraphStore implements GraphStore { constructor(private readonly pool: Pool) {} @@ -97,6 +99,77 @@ export class PostgresGraphStore implements GraphStore { ? 
collected.filter((e) => edgeTypes.includes(e.edgeType)) : collected } + + async neighbors(key: string, opts?: { edgeTypes?: EdgeType[]; direction?: 'out' | 'in' | 'both'; depth?: number }): Promise<GraphHit[]> { + await ensurePostgresSchema(this.pool) + const hits = await traverseNeighbors(this.pool, key, { ...opts, depthFallback: 1 }) + return this.hydrateHits(hits) + } + + async callers(key: string, depth?: number): Promise<GraphHit[]> { + await ensurePostgresSchema(this.pool) + const hits = await traverseNeighbors(this.pool, key, { edgeTypes: ['calls'], direction: 'in', depth, depthFallback: MAX_GRAPH_TRAVERSAL_DEPTH }) + return this.hydrateHits(hits) + } + + async callees(key: string, depth?: number): Promise<GraphHit[]> { + await ensurePostgresSchema(this.pool) + const hits = await traverseNeighbors(this.pool, key, { edgeTypes: ['calls'], direction: 'out', depth, depthFallback: MAX_GRAPH_TRAVERSAL_DEPTH }) + return this.hydrateHits(hits) + } + + async path(from: string, to: string): Promise<GraphPath | null> { + await ensurePostgresSchema(this.pool) + const found = await findShortestPath(this.pool, from, to) + if (!found) return null + + const hops: GraphPathHop[] = [] + for (const hop of found.hops) { + const node = await this.getNode(hop.nodeKey) + hops.push({ + nodeKey: hop.nodeKey, + displayName: node?.displayName ?? 
hop.nodeKey, + edgeType: hop.edgeType, + reversed: hop.reversed, + }) + } + return { from, to, hops } + } + + async subgraph(seed: string, depth?: number): Promise<GraphSubgraph> { + await ensurePostgresSchema(this.pool) + const maxDepth = clampDepth(depth, MAX_GRAPH_TRAVERSAL_DEPTH) + const hits = await traverseNeighbors(this.pool, seed, { direction: 'both', depth: maxDepth }) + const nodeKeys = [...new Set([seed, ...hits.map((h) => h.nodeKey)])] + + if (nodeKeys.length === 0) return { nodes: [], edges: [] } + + const nodesRes = await this.pool.query('SELECT * FROM graph_nodes WHERE node_key = ANY($1::text[])', [nodeKeys]) + const edgesRes = await this.pool.query( + 'SELECT * FROM edges WHERE src_key = ANY($1::text[]) AND dst_key = ANY($1::text[])', + [nodeKeys], + ) + return { nodes: nodesRes.rows.map(rowToNode), edges: edgesRes.rows.map(rowToEdge) } + } + + private async hydrateHits(hits: WalkHit[]): Promise<GraphHit[]> { + if (hits.length === 0) return [] + const nodeKeys = hits.map((h) => h.nodeKey) + const res = await this.pool.query('SELECT * FROM graph_nodes WHERE node_key = ANY($1::text[])', [nodeKeys]) + const byKey = new Map(res.rows.map((r: GraphNodeRow) => [r.node_key, rowToNode(r)])) + return hits + .map((h) => { + const node = byKey.get(h.nodeKey) + return { + nodeKey: h.nodeKey, + displayName: node?.displayName ?? h.nodeKey, + kind: node?.kind ?? 'unknown', + depth: h.depth, + edgeType: h.edgeType, + } + }) + .sort((a, b) => a.depth - b.depth || a.nodeKey.localeCompare(b.nodeKey)) + } } interface GraphNodeRow { diff --git a/src/core/storage/postgres/graphTraversal.ts b/src/core/storage/postgres/graphTraversal.ts new file mode 100644 index 0000000..5ffe487 --- /dev/null +++ b/src/core/storage/postgres/graphTraversal.ts @@ -0,0 +1,118 @@ +/** + * Recursive-CTE traversal primitives over `edges`/`graph_nodes` for the + * Postgres `GraphStore` (Phase 108, knowledge-graph §6). 
Mirrors + * `../sqlite/graphTraversal.ts`; Postgres supports the same `WITH RECURSIVE` + * + window-function shape as SQLite. + */ + +import type { Pool } from 'pg' +import { MAX_GRAPH_TRAVERSAL_DEPTH, type EdgeType } from '../types.js' + +export function clampDepth(depth: number | undefined, fallback: number): number { + const d = depth ?? fallback + return Math.max(1, Math.min(Math.trunc(d), MAX_GRAPH_TRAVERSAL_DEPTH)) +} + +export interface WalkHit { + nodeKey: string + depth: number + edgeType?: EdgeType +} + +async function walkDirection( + pool: Pool, + start: string, + maxDepth: number, + edgeTypes: EdgeType[] | undefined, + direction: 'out' | 'in', +): Promise<WalkHit[]> { + const srcCol = direction === 'out' ? 'src_key' : 'dst_key' + const dstCol = direction === 'out' ? 'dst_key' : 'src_key' + const edgeFilter = edgeTypes && edgeTypes.length > 0 ? 'AND e.edge_type = ANY($3::text[])' : '' + const params: unknown[] = [start, maxDepth] + if (edgeTypes && edgeTypes.length > 0) params.push(edgeTypes) + + const query = ` + WITH RECURSIVE walk(node_key, depth, edge_type) AS ( + SELECT $1::text AS node_key, 0 AS depth, NULL::text AS edge_type + UNION ALL + SELECT e.${dstCol}, w.depth + 1, e.edge_type + FROM walk w JOIN edges e ON e.${srcCol} = w.node_key + WHERE w.depth < $2 ${edgeFilter} + ) + SELECT node_key, depth, edge_type FROM ( + SELECT node_key, depth, edge_type, + ROW_NUMBER() OVER (PARTITION BY node_key ORDER BY depth) AS rn + FROM walk WHERE depth > 0 + ) ranked WHERE rn = 1 + ` + const res = await pool.query(query, params) + return res.rows.map((r: { node_key: string; depth: number; edge_type: string | null }) => ({ + nodeKey: r.node_key, + depth: r.depth, + edgeType: (r.edge_type ?? undefined) as EdgeType | undefined, + })) +} + +export async function traverseNeighbors( + pool: Pool, + start: string, + opts: { edgeTypes?: EdgeType[]; direction?: 'out' | 'in' | 'both'; depth?: number; depthFallback?: number }, +): Promise<WalkHit[]> { + const direction = opts.direction ?? 
'both' + const maxDepth = clampDepth(opts.depth, opts.depthFallback ?? 1) + const merged = new Map<string, WalkHit>() + + if (direction === 'out' || direction === 'both') { + for (const hit of await walkDirection(pool, start, maxDepth, opts.edgeTypes, 'out')) { + merged.set(hit.nodeKey, hit) + } + } + if (direction === 'in' || direction === 'both') { + for (const hit of await walkDirection(pool, start, maxDepth, opts.edgeTypes, 'in')) { + const existing = merged.get(hit.nodeKey) + if (!existing || hit.depth < existing.depth) merged.set(hit.nodeKey, hit) + } + } + return [...merged.values()] +} + +export interface PathRow { + depth: number + hops: { nodeKey: string; edgeType: EdgeType; reversed: boolean }[] +} + +export async function findShortestPath( + pool: Pool, + from: string, + to: string, + maxDepth: number = MAX_GRAPH_TRAVERSAL_DEPTH, +): Promise<PathRow | null> { + if (from === to) return { depth: 0, hops: [] } + + const query = ` + WITH RECURSIVE walk(node_key, depth, path) AS ( + SELECT $1::text AS node_key, 0 AS depth, $1::text AS path + UNION ALL + SELECT + CASE WHEN e.src_key = w.node_key THEN e.dst_key ELSE e.src_key END, + w.depth + 1, + w.path || '|' || e.edge_type || '|' || (CASE WHEN e.src_key = w.node_key THEN '0' ELSE '1' END) + || '|' || (CASE WHEN e.src_key = w.node_key THEN e.dst_key ELSE e.src_key END) + FROM walk w + JOIN edges e ON (e.src_key = w.node_key OR e.dst_key = w.node_key) AND e.src_key != e.dst_key + WHERE w.depth < $2 + ) + SELECT path, depth FROM walk WHERE node_key = $3 AND depth > 0 ORDER BY depth ASC LIMIT 1 + ` + const res = await pool.query(query, [from, maxDepth, to]) + const row = res.rows[0] as { path: string; depth: number } | undefined + if (!row) return null + + const parts = row.path.split('|') + const hops: PathRow['hops'] = [] + for (let i = 1; i < parts.length; i += 3) { + hops.push({ edgeType: parts[i] as EdgeType, reversed: parts[i + 1] === '1', nodeKey: parts[i + 2] }) + } + return { depth: row.depth, hops } +} diff --git 
a/src/core/storage/sqlite/graphTraversal.ts b/src/core/storage/sqlite/graphTraversal.ts new file mode 100644 index 0000000..5865dcc --- /dev/null +++ b/src/core/storage/sqlite/graphTraversal.ts @@ -0,0 +1,131 @@ +/** + * Recursive-CTE traversal primitives over `edges`/`graph_nodes` for the + * SQLite `GraphStore` (Phase 108, knowledge-graph §6). + * + * Each helper takes the active session's raw `better-sqlite3` handle and + * returns plain rows; `SqliteGraphStore` (profile.ts) wraps these into the + * `GraphHit`/`GraphPath`/`GraphSubgraph` shapes from `../types.js`. + */ + +import type Database from 'better-sqlite3' +import { MAX_GRAPH_TRAVERSAL_DEPTH, type EdgeType } from '../types.js' + +export function clampDepth(depth: number | undefined, fallback: number): number { + const d = depth ?? fallback + return Math.max(1, Math.min(Math.trunc(d), MAX_GRAPH_TRAVERSAL_DEPTH)) +} + +export interface WalkHit { + nodeKey: string + depth: number + edgeType?: EdgeType +} + +/** + * Single-direction recursive walk from `start`, returning the shortest-depth + * hit (with the edge type of that hop) for every node reached within + * `maxDepth`. `direction: 'out'` follows `src_key -> dst_key`; `'in'` follows + * `dst_key -> src_key`. + */ +function walkDirection( + rawDb: InstanceType<typeof Database>, + start: string, + maxDepth: number, + edgeTypes: EdgeType[] | undefined, + direction: 'out' | 'in', +): WalkHit[] { + const srcCol = direction === 'out' ? 'src_key' : 'dst_key' + const dstCol = direction === 'out' ? 'dst_key' : 'src_key' + const edgeFilter = edgeTypes && edgeTypes.length > 0 + ? `AND e.edge_type IN (${edgeTypes.map(() => '?').join(',')})` + : '' + + const query = ` + WITH RECURSIVE walk(node_key, depth, edge_type) AS ( + SELECT ? AS node_key, 0 AS depth, NULL AS edge_type + UNION ALL + SELECT e.${dstCol}, w.depth + 1, e.edge_type + FROM walk w JOIN edges e ON e.${srcCol} = w.node_key + WHERE w.depth < ? 
${edgeFilter} + ) + SELECT node_key, depth, edge_type FROM ( + SELECT node_key, depth, edge_type, + ROW_NUMBER() OVER (PARTITION BY node_key ORDER BY depth) AS rn + FROM walk WHERE depth > 0 + ) WHERE rn = 1 + ` + const params: unknown[] = [start, maxDepth, ...(edgeTypes ?? [])] + const rows = rawDb.prepare(query).all(...params) as Array<{ node_key: string; depth: number; edge_type: string | null }> + return rows.map((r) => ({ nodeKey: r.node_key, depth: r.depth, edgeType: (r.edge_type ?? undefined) as EdgeType | undefined })) +} + +/** + * Typed neighborhood of `start` (Phase 108, knowledge-graph §6). `direction` + * defaults to `'both'`. Depth is clamped via `clampDepth`. + */ +export function traverseNeighbors( + rawDb: InstanceType<typeof Database>, + start: string, + opts: { edgeTypes?: EdgeType[]; direction?: 'out' | 'in' | 'both'; depth?: number; depthFallback?: number }, +): WalkHit[] { + const direction = opts.direction ?? 'both' + const maxDepth = clampDepth(opts.depth, opts.depthFallback ?? 1) + const merged = new Map<string, WalkHit>() + + if (direction === 'out' || direction === 'both') { + for (const hit of walkDirection(rawDb, start, maxDepth, opts.edgeTypes, 'out')) { + merged.set(hit.nodeKey, hit) + } + } + if (direction === 'in' || direction === 'both') { + for (const hit of walkDirection(rawDb, start, maxDepth, opts.edgeTypes, 'in')) { + const existing = merged.get(hit.nodeKey) + if (!existing || hit.depth < existing.depth) merged.set(hit.nodeKey, hit) + } + } + return [...merged.values()] +} + +export interface PathRow { + depth: number + hops: { nodeKey: string; edgeType: EdgeType; reversed: boolean }[] +} + +/** + * Shortest path from `from` to `to` over edges of any type, traversed in + * either direction, via a recursive CTE that accumulates a delimited path + * string. Returns `null` if unreachable within `MAX_GRAPH_TRAVERSAL_DEPTH`. 
+ */ +export function findShortestPath( + rawDb: InstanceType<typeof Database>, + from: string, + to: string, + maxDepth: number = MAX_GRAPH_TRAVERSAL_DEPTH, +): PathRow | null { + if (from === to) return { depth: 0, hops: [] } + + const query = ` + WITH RECURSIVE walk(node_key, depth, path) AS ( + SELECT ? AS node_key, 0 AS depth, ? AS path + UNION ALL + SELECT + CASE WHEN e.src_key = w.node_key THEN e.dst_key ELSE e.src_key END, + w.depth + 1, + w.path || '|' || e.edge_type || '|' || (CASE WHEN e.src_key = w.node_key THEN '0' ELSE '1' END) + || '|' || (CASE WHEN e.src_key = w.node_key THEN e.dst_key ELSE e.src_key END) + FROM walk w + JOIN edges e ON (e.src_key = w.node_key OR e.dst_key = w.node_key) AND e.src_key != e.dst_key + WHERE w.depth < ? + ) + SELECT path, depth FROM walk WHERE node_key = ? AND depth > 0 ORDER BY depth ASC LIMIT 1 + ` + const row = rawDb.prepare(query).get(from, from, maxDepth, to) as { path: string; depth: number } | undefined + if (!row) return null + + const parts = row.path.split('|') + const hops: PathRow['hops'] = [] + for (let i = 1; i < parts.length; i += 3) { + hops.push({ edgeType: parts[i] as EdgeType, reversed: parts[i + 1] === '1', nodeKey: parts[i + 2] }) + } + return { depth: row.depth, hops } +} diff --git a/src/core/storage/sqlite/profile.ts b/src/core/storage/sqlite/profile.ts index 3249410..b7ef884 100644 --- a/src/core/storage/sqlite/profile.ts +++ b/src/core/storage/sqlite/profile.ts @@ -10,7 +10,7 @@ import { getActiveSession } from '../../db/sqlite.js' import { embeddings, paths, blobs, commits, blobCommits, indexedCommits, blobBranches, chunks, chunkEmbeddings, symbols, symbolEmbeddings, moduleEmbeddings, commitEmbeddings, graphNodes, edges } from '../../db/schema.js' -import { eq, inArray, sql } from 'drizzle-orm' +import { eq, inArray, sql, and } from 'drizzle-orm' import { isIndexed as dedupeIsIndexed, filterNewBlobs as dedupeFilterNewBlobs } from '../../indexing/deduper.js' import { storeFtsContent, getBlobContent, storeBlob, 
storeBlobRecord, @@ -29,8 +29,12 @@ import type { EdgeType, FtsStore, GraphEdgeRecord, + GraphHit, GraphNodeRecord, + GraphPath, + GraphPathHop, GraphStore, + GraphSubgraph, MetadataStore, StorageProfile, StorageScope, @@ -42,6 +46,8 @@ import type { WriteBlobRecordArgs, WriteFileBlobArgs, } from '../types.js' +import { MAX_GRAPH_TRAVERSAL_DEPTH } from '../types.js' +import { traverseNeighbors, findShortestPath, clampDepth, type WalkHit } from './graphTraversal.js' class SqliteMetadataStore implements MetadataStore { async isIndexed(blobHash: string, model: string): Promise<boolean> { @@ -397,6 +403,82 @@ export class SqliteGraphStore implements GraphStore { ? collected.filter((e) => edgeTypes.includes(e.edgeType)) : collected } + + async neighbors(key: string, opts?: { edgeTypes?: EdgeType[]; direction?: 'out' | 'in' | 'both'; depth?: number }): Promise<GraphHit[]> { + const { rawDb } = getActiveSession() + const hits = traverseNeighbors(rawDb, key, { ...opts, depthFallback: 1 }) + return this.hydrateHits(hits) + } + + async callers(key: string, depth?: number): Promise<GraphHit[]> { + const { rawDb } = getActiveSession() + const hits = traverseNeighbors(rawDb, key, { edgeTypes: ['calls'], direction: 'in', depth, depthFallback: MAX_GRAPH_TRAVERSAL_DEPTH }) + return this.hydrateHits(hits) + } + + async callees(key: string, depth?: number): Promise<GraphHit[]> { + const { rawDb } = getActiveSession() + const hits = traverseNeighbors(rawDb, key, { edgeTypes: ['calls'], direction: 'out', depth, depthFallback: MAX_GRAPH_TRAVERSAL_DEPTH }) + return this.hydrateHits(hits) + } + + async path(from: string, to: string): Promise<GraphPath | null> { + const { rawDb } = getActiveSession() + const found = findShortestPath(rawDb, from, to) + if (!found) return null + + const hops: GraphPathHop[] = [] + for (const hop of found.hops) { + const node = await this.getNode(hop.nodeKey) + hops.push({ + nodeKey: hop.nodeKey, + displayName: node?.displayName ?? 
hop.nodeKey, + edgeType: hop.edgeType, + reversed: hop.reversed, + }) + } + return { from, to, hops } + } + + async subgraph(seed: string, depth?: number): Promise<GraphSubgraph> { + const { db, rawDb } = getActiveSession() + const maxDepth = clampDepth(depth, MAX_GRAPH_TRAVERSAL_DEPTH) + const hits = traverseNeighbors(rawDb, seed, { direction: 'both', depth: maxDepth }) + const nodeKeys = [...new Set([seed, ...hits.map((h) => h.nodeKey)])] + + const nodes = nodeKeys.length > 0 + ? db.select().from(graphNodes).where(inArray(graphNodes.nodeKey, nodeKeys)).all().map(rowToNode) + : [] + + const edgeRows = nodeKeys.length > 0 + ? db.select().from(edges) + .where(and(inArray(edges.srcKey, nodeKeys), inArray(edges.dstKey, nodeKeys))) + .all() + : [] + + return { nodes, edges: edgeRows.map(rowToEdge) } + } + + /** Resolves `WalkHit`s into `GraphHit`s by looking up display name/kind for each node. */ + private async hydrateHits(hits: WalkHit[]): Promise<GraphHit[]> { + if (hits.length === 0) return [] + const { db } = getActiveSession() + const nodeKeys = hits.map((h) => h.nodeKey) + const rows = db.select().from(graphNodes).where(inArray(graphNodes.nodeKey, nodeKeys)).all() + const byKey = new Map(rows.map((r) => [r.nodeKey, rowToNode(r)])) + return hits + .map((h) => { + const node = byKey.get(h.nodeKey) + return { + nodeKey: h.nodeKey, + displayName: node?.displayName ?? h.nodeKey, + kind: node?.kind ?? 'unknown', + depth: h.depth, + edgeType: h.edgeType, + } + }) + .sort((a, b) => a.depth - b.depth || a.nodeKey.localeCompare(b.nodeKey)) + } } function rowToNode(row: typeof graphNodes.$inferSelect): GraphNodeRecord { diff --git a/src/core/storage/types.ts b/src/core/storage/types.ts index 4e4e142..11b0dda 100644 --- a/src/core/storage/types.ts +++ b/src/core/storage/types.ts @@ -164,11 +164,51 @@ export interface GraphEdgeRecord { observedCount?: number } +/** + * Default/maximum traversal depth for `GraphStore` traversal primitives + * (Phase 108, knowledge-graph §6). 
Capped to bound recursive-CTE cost. + */ +export const MAX_GRAPH_TRAVERSAL_DEPTH = 3 + +/** A node reached during a `GraphStore` traversal (Phase 108, knowledge-graph §6). */ +export interface GraphHit { + nodeKey: string + displayName: string + kind: string + /** Number of hops from the traversal's starting node (>= 1). */ + depth: number + /** The edge type of the (shortest) hop that reached this node, if known. */ + edgeType?: EdgeType +} + +/** One hop in a `GraphPath` (Phase 108, knowledge-graph §6). */ +export interface GraphPathHop { + nodeKey: string + displayName: string + edgeType: EdgeType + /** True if this hop traverses an edge against its stored src->dst direction. */ + reversed: boolean +} + +/** A shortest typed path between two graph nodes (Phase 108, knowledge-graph §6). */ +export interface GraphPath { + from: string + to: string + /** Hops from `from` to `to`, in order. Empty if `from === to`. */ + hops: GraphPathHop[] +} + +/** A node-induced subgraph (Phase 108, knowledge-graph §6). */ +export interface GraphSubgraph { + nodes: GraphNodeRecord[] + edges: GraphEdgeRecord[] +} + /** * Storage for the recomputable structural graph (Phase 107, knowledge-graph * §3.3/§6). `gitsema graph build` truncates and rebuilds nodes/edges wholesale * (like `blob_clusters`); read methods back the early `co-change`/`deps`/ - * `cycles` commands. + * `cycles` commands plus the Phase 108 traversal primitives. * * Relational-only (review9 §4): the Qdrant profile's `GraphStore` throws on * every method — graph queries require a relational backend. @@ -187,6 +227,21 @@ export interface GraphStore { allEdges(edgeTypes?: EdgeType[]): Promise<GraphEdgeRecord[]> /** Edges touching `nodeKey`, optionally filtered by direction and edge types. */ edgesFor(nodeKey: string, opts?: { edgeTypes?: EdgeType[]; direction?: 'out' | 'in' | 'both' }): Promise<GraphEdgeRecord[]> + + /** + * Typed neighborhood of `key` via recursive traversal (Phase 108, + * knowledge-graph §6). 
`direction` defaults to `'both'`; `depth` defaults + * to 1 and is capped at `MAX_GRAPH_TRAVERSAL_DEPTH`. + */ + neighbors(key: string, opts?: { edgeTypes?: EdgeType[]; direction?: 'out' | 'in' | 'both'; depth?: number }): Promise<GraphHit[]> + /** Reverse `calls` traversal — who (transitively) calls `key`. Depth capped at `MAX_GRAPH_TRAVERSAL_DEPTH` (default). */ + callers(key: string, depth?: number): Promise<GraphHit[]> + /** Forward `calls` traversal — what `key` (transitively) calls. Depth capped at `MAX_GRAPH_TRAVERSAL_DEPTH` (default). */ + callees(key: string, depth?: number): Promise<GraphHit[]> + /** Shortest typed path from `from` to `to` (any edge type/direction), or null if unreachable within `MAX_GRAPH_TRAVERSAL_DEPTH`. */ + path(from: string, to: string): Promise<GraphPath | null> + /** The node-induced subgraph within `depth` hops of `seed` (both directions, all edge types). `depth` capped at `MAX_GRAPH_TRAVERSAL_DEPTH`. */ + subgraph(seed: string, depth?: number): Promise<GraphSubgraph> } /** A raw structural reference to persist for one blob (Phase 106, knowledge-graph §3.2). */ diff --git a/src/core/storage/unsupportedGraphStore.ts b/src/core/storage/unsupportedGraphStore.ts index 3b68511..8e54340 100644 --- a/src/core/storage/unsupportedGraphStore.ts +++ b/src/core/storage/unsupportedGraphStore.ts @@ -8,7 +8,7 @@ * graph-unavailable backend, not a silent empty graph. 
*/ -import type { EdgeType, GraphEdgeRecord, GraphNodeRecord, GraphStore } from './types.js' +import type { EdgeType, GraphEdgeRecord, GraphHit, GraphNodeRecord, GraphPath, GraphStore, GraphSubgraph } from './types.js' const ERROR_MESSAGE = 'graph queries require a relational backend (Qdrant storage profiles do not support gitsema graph build/co-change/deps/cycles)' @@ -40,4 +40,24 @@ export class UnsupportedGraphStore implements GraphStore { async edgesFor(_nodeKey: string, _opts?: { edgeTypes?: EdgeType[]; direction?: 'out' | 'in' | 'both' }): Promise<GraphEdgeRecord[]> { throw new Error(ERROR_MESSAGE) } + + async neighbors(_key: string, _opts?: { edgeTypes?: EdgeType[]; direction?: 'out' | 'in' | 'both'; depth?: number }): Promise<GraphHit[]> { + throw new Error(ERROR_MESSAGE) + } + + async callers(_key: string, _depth?: number): Promise<GraphHit[]> { + throw new Error(ERROR_MESSAGE) + } + + async callees(_key: string, _depth?: number): Promise<GraphHit[]> { + throw new Error(ERROR_MESSAGE) + } + + async path(_from: string, _to: string): Promise<GraphPath | null> { + throw new Error(ERROR_MESSAGE) + } + + async subgraph(_seed: string, _depth?: number): Promise<GraphSubgraph> { + throw new Error(ERROR_MESSAGE) + } } diff --git a/src/mcp/server.ts b/src/mcp/server.ts index e7595c4..bb62da0 100644 --- a/src/mcp/server.ts +++ b/src/mcp/server.ts @@ -12,6 +12,7 @@ import { registerClusteringTools } from './tools/clustering.js' import { registerWorkflowTools } from './tools/workflow.js' import { registerInfrastructureTools } from './tools/infrastructure.js' import { registerNarratorTools } from './tools/narrator.js' +import { registerGraphTools } from './tools/graph.js' import { readFileSync } from 'node:fs' // Read package version dynamically so the MCP server always matches package.json @@ -37,6 +38,7 @@ export async function startMcpServer(): Promise<void> { registerWorkflowTools(server) registerInfrastructureTools(server) registerNarratorTools(server) + 
registerGraphTools(server) const transport = new StdioServerTransport() await server.connect(transport) 
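Reviewer note on the traversal semantics in `src/core/storage/sqlite/graphTraversal.ts`: the recursive CTE in `walkDirection` keeps the shallowest hit per node, and `clampDepth` bounds the walk to the range 1..`MAX_GRAPH_TRAVERSAL_DEPTH`. The sketch below restates that behavior as a plain in-memory breadth-first walk, which may help when reasoning about expected results without a database. It is only an illustration under stated assumptions: `Edge`, `Hit`, `MAX_DEPTH`, `clamp`, `walk`, and `FIXTURE` are hypothetical names that do not exist in the patch, and the edge-type filter is omitted for brevity.

```typescript
// Hypothetical in-memory stand-ins; none of these names exist in gitsema.
type Edge = { src: string; dst: string; type: string }
type Hit = { nodeKey: string; depth: number; edgeType: string }

// Assumed to mirror MAX_GRAPH_TRAVERSAL_DEPTH from the patch.
const MAX_DEPTH = 3

// Same clamping rule as clampDepth(): at least 1, at most MAX_DEPTH.
function clamp(depth: number | undefined, fallback: number): number {
  const d = depth ?? fallback
  return Math.max(1, Math.min(Math.trunc(d), MAX_DEPTH))
}

// Breadth-first walk that records the first (shallowest) hit per node,
// which is what the ROW_NUMBER() ... ORDER BY depth filter selects in SQL.
function walk(
  edges: Edge[],
  start: string,
  depth: number | undefined,
  direction: 'out' | 'in',
): Hit[] {
  const maxDepth = clamp(depth, 1)
  const seen = new Map<string, Hit>()
  let frontier: string[] = [start]
  for (let d = 1; d <= maxDepth && frontier.length > 0; d++) {
    const next: string[] = []
    for (const e of edges) {
      // 'out' follows src -> dst; 'in' follows dst -> src.
      const from = direction === 'out' ? e.src : e.dst
      const to = direction === 'out' ? e.dst : e.src
      if (!frontier.includes(from) || seen.has(to)) continue
      seen.set(to, { nodeKey: to, depth: d, edgeType: e.type })
      next.push(to)
    }
    frontier = next
  }
  return [...seen.values()]
}

// A chain shaped like the test fixture: A calls B calls C calls lib.
const FIXTURE: Edge[] = [
  { src: 'A', dst: 'B', type: 'calls' },
  { src: 'B', dst: 'C', type: 'calls' },
  { src: 'C', dst: 'lib', type: 'calls' },
]

// A requested depth of 10 is clamped to MAX_DEPTH before walking.
console.log(walk(FIXTURE, 'A', 10, 'out'))
```

As in the SQL version, a `'both'` neighborhood would be the merge of one `'out'` walk and one `'in'` walk, keeping the smaller depth when a node appears in both.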
diff --git a/src/mcp/tools/graph.ts b/src/mcp/tools/graph.ts new file mode 100644 index 0000000..4afe271 --- /dev/null +++ b/src/mcp/tools/graph.ts @@ -0,0 +1,90 @@ +import { z } from 'zod' +import type { McpServer } from '@modelcontextprotocol/sdk/server/mcp.js' +import { registerTool } from '../registerTool.js' +import { getCachedStorageProfile } from '../../core/storage/resolveProfile.js' +import { callers, callees, neighbors } from '../../core/graph/traversal.js' +import type { EdgeType, GraphHit } from '../../core/storage/types.js' + +function renderResolutionError(label: string, resolved: { status: string; candidates?: Array<{ nodeKey: string }> }): string { + if (resolved.status === 'not-found') { + return `No graph node found for "${label}". Run \`gitsema index --graph\` then \`gitsema graph build\` first.` + } + const candidates = (resolved.candidates ?? []).map((c) => ` ${c.nodeKey}`).join('\n') + return `"${label}" is ambiguous — matches multiple symbols:\n${candidates}` +} + +function renderHits(hits: GraphHit[]): string { + if (hits.length === 0) return ' (none)' + return hits.map((h) => ` ${h.edgeType ? `[${h.edgeType}] ` : ''}${h.displayName} (depth ${h.depth})`).join('\n') +} + +/** + * Phase 108 (knowledge-graph §6/§8) MCP tools, exposing the `GraphStore` + * traversal primitives: `call_graph` (callers/callees over `calls` edges) + * and `graph_neighbors` (typed neighborhood, any edge kinds). + */ +export function registerGraphTools(server: McpServer) { + registerTool( + server, + 'call_graph', + 'Structural call-graph traversal: who calls (or is called by) a symbol, via the Phase 107/108 structural graph (`gitsema index --graph` + `gitsema graph build`). 
Reverse `calls` traversal (direction=callers) finds callers; forward (direction=callees) finds callees.', + { + symbol: z.string().describe('A symbol qualified name, file path, or literal node key (file:..., symbol:..., external:...)'), + direction: z.enum(['callers', 'callees']).optional().default('callers').describe('Traverse reverse (callers) or forward (callees) `calls` edges'), + depth: z.number().int().min(1).max(3).optional().describe('Traversal depth (default and max: 3)'), + }, + async ({ symbol, direction, depth }) => { + try { + const profile = getCachedStorageProfile(process.cwd()) + const result = direction === 'callees' + ? await callees(profile.graph, symbol, depth) + : await callers(profile.graph, symbol, depth) + + if (result.resolved.status !== 'found') { + return { content: [{ type: 'text', text: renderResolutionError(symbol, result.resolved) }] } + } + + const node = result.resolved.node + const label = direction === 'callees' ? 'Callees of' : 'Callers of' + return { content: [{ type: 'text', text: `${label} ${node.displayName} (${node.nodeKey}):\n\n${renderHits(result.hits)}` }] } + } catch (err) { + const msg = err instanceof Error ? err.message : String(err) + return { content: [{ type: 'text', text: `Error: ${msg}` }] } + } + }, + ) + + registerTool( + server, + 'graph_neighbors', + 'Typed neighborhood of a node in the structural graph (Phase 107/108: `gitsema index --graph` + `gitsema graph build`). 
Returns nodes connected by the given edge types (default: all) in the given direction.', + { + node: z.string().describe('A symbol qualified name, file path, or literal node key (file:..., symbol:..., external:...)'), + edge_types: z.array(z.enum(['contains', 'defines', 'imports', 'calls', 'extends', 'implements', 'references', 'co_change', 'similar_to'])) + .optional() + .describe('Edge types to traverse (default: all)'), + direction: z.enum(['out', 'in', 'both']).optional().default('both').describe("Edge direction relative to 'node'"), + depth: z.number().int().min(1).max(3).optional().describe('Traversal depth (default 1, max 3)'), + }, + async ({ node, edge_types, direction, depth }) => { + try { + const profile = getCachedStorageProfile(process.cwd()) + const result = await neighbors(profile.graph, node, { + edgeTypes: edge_types as EdgeType[] | undefined, + direction, + depth, + }) + + if (result.resolved.status !== 'found') { + return { content: [{ type: 'text', text: renderResolutionError(node, result.resolved) }] } + } + + const resolvedNode = result.resolved.node + return { content: [{ type: 'text', text: `Neighbors of ${resolvedNode.displayName} (${resolvedNode.nodeKey}):\n\n${renderHits(result.hits)}` }] } + } catch (err) { + const msg = err instanceof Error ? err.message : String(err) + return { content: [{ type: 'text', text: `Error: ${msg}` }] } + } + }, + ) +} diff --git a/tests/graphTraversal.test.ts b/tests/graphTraversal.test.ts new file mode 100644 index 0000000..ef71863 --- /dev/null +++ b/tests/graphTraversal.test.ts @@ -0,0 +1,208 @@ +/** + * Tests for the Phase 108 traversal primitives (knowledge-graph §6/§8): + * `GraphStore.neighbors/callers/callees/path/subgraph`, the recursive-CTE + * implementations in `SqliteGraphStore`, and the `core/graph/traversal.ts` + * wrappers used by `gitsema graph callers|callees|neighbors|path`. 
+ */ + +import { describe, it, expect, afterEach } from 'vitest' +import { mkdtempSync, rmSync } from 'node:fs' +import { join } from 'node:path' +import { tmpdir } from 'node:os' +import { openDatabaseAt, withDbSession, type DbSession } from '../src/core/db/sqlite.js' +import { SqliteGraphStore } from '../src/core/storage/sqlite/profile.js' +import { callers, callees, neighbors, path as graphPath } from '../src/core/graph/traversal.js' +import type { GraphEdgeRecord, GraphNodeRecord } from '../src/core/storage/types.js' + +// --------------------------------------------------------------------------- +// Fixture graph: +// +// file:a.ts --defines--> symbol:A, symbol:B, symbol:C +// symbol:A --calls--> symbol:B --calls--> symbol:C --calls--> external:lib +// +// --------------------------------------------------------------------------- + +const NODES: GraphNodeRecord[] = [ + { nodeKey: 'file:a.ts', kind: 'file', displayName: 'a.ts', path: 'a.ts' }, + { nodeKey: 'symbol:a.ts#A#sig1', kind: 'function', displayName: 'A', path: 'a.ts' }, + { nodeKey: 'symbol:a.ts#B#sig2', kind: 'function', displayName: 'B', path: 'a.ts' }, + { nodeKey: 'symbol:a.ts#C#sig3', kind: 'function', displayName: 'C', path: 'a.ts' }, + { nodeKey: 'external:lib', kind: 'external', displayName: 'lib', isExternal: true }, + { nodeKey: 'external:isolated', kind: 'external', displayName: 'isolated', isExternal: true }, +] + +const EDGES: GraphEdgeRecord[] = [ + { srcKey: 'file:a.ts', dstKey: 'symbol:a.ts#A#sig1', edgeType: 'defines' }, + { srcKey: 'file:a.ts', dstKey: 'symbol:a.ts#B#sig2', edgeType: 'defines' }, + { srcKey: 'file:a.ts', dstKey: 'symbol:a.ts#C#sig3', edgeType: 'defines' }, + { srcKey: 'symbol:a.ts#A#sig1', dstKey: 'symbol:a.ts#B#sig2', edgeType: 'calls' }, + { srcKey: 'symbol:a.ts#B#sig2', dstKey: 'symbol:a.ts#C#sig3', edgeType: 'calls' }, + { srcKey: 'symbol:a.ts#C#sig3', dstKey: 'external:lib', edgeType: 'calls' }, +] + +function setupFixtureDb(): { session: DbSession; tmpDir: 
string } { + const tmpDir = mkdtempSync(join(tmpdir(), 'gitsema-graphtraversal-')) + const session = openDatabaseAt(join(tmpDir, 'test.db')) + return { session, tmpDir } +} + +const tmpDirs: string[] = [] +afterEach(() => { + for (const dir of tmpDirs.splice(0)) { + rmSync(dir, { recursive: true, force: true }) + } +}) + +async function withGraph<T>(fn: (graph: SqliteGraphStore, session: DbSession) => Promise<T>): Promise<T> { + const { session, tmpDir } = setupFixtureDb() + tmpDirs.push(tmpDir) + return withDbSession(session, async () => { + const graph = new SqliteGraphStore() + await graph.replaceAll(NODES, EDGES) + return fn(graph, session) + }) +} + +describe('SqliteGraphStore.neighbors', () => { + it('returns depth-1 neighbors by default, in both directions', async () => { + await withGraph(async (graph) => { + const hits = await graph.neighbors('symbol:a.ts#B#sig2') + const keys = hits.map((h) => h.nodeKey).sort() + // B has a `calls` edge from A, a `calls` edge to C, and a `defines` edge from file:a.ts. 
+ expect(keys).toEqual(['file:a.ts', 'symbol:a.ts#A#sig1', 'symbol:a.ts#C#sig3'].sort()) + expect(hits.every((h) => h.depth === 1)).toBe(true) + }) + }) + + it('filters by edge type and direction', async () => { + await withGraph(async (graph) => { + const hits = await graph.neighbors('file:a.ts', { edgeTypes: ['defines'], direction: 'out' }) + const keys = hits.map((h) => h.nodeKey).sort() + expect(keys).toEqual([ + 'symbol:a.ts#A#sig1', + 'symbol:a.ts#B#sig2', + 'symbol:a.ts#C#sig3', + ].sort()) + expect(hits.every((h) => h.edgeType === 'defines')).toBe(true) + }) + }) + + it('expands to deeper depth when requested, capped at 3', async () => { + await withGraph(async (graph) => { + const hits = await graph.neighbors('symbol:a.ts#A#sig1', { edgeTypes: ['calls'], direction: 'out', depth: 10 }) + const keys = hits.map((h) => h.nodeKey).sort() + // depth capped at 3: A -calls-> B -calls-> C -calls-> external:lib + expect(keys).toEqual(['external:lib', 'symbol:a.ts#B#sig2', 'symbol:a.ts#C#sig3'].sort()) + const byKey = new Map(hits.map((h) => [h.nodeKey, h.depth])) + expect(byKey.get('symbol:a.ts#B#sig2')).toBe(1) + expect(byKey.get('symbol:a.ts#C#sig3')).toBe(2) + expect(byKey.get('external:lib')).toBe(3) + }) + }) +}) + +describe('SqliteGraphStore.callers / callees', () => { + it('callers walks `calls` edges backward', async () => { + await withGraph(async (graph) => { + const hits = await graph.callers('symbol:a.ts#C#sig3') + const byKey = new Map(hits.map((h) => [h.nodeKey, h.depth])) + expect(byKey.get('symbol:a.ts#B#sig2')).toBe(1) + expect(byKey.get('symbol:a.ts#A#sig1')).toBe(2) + }) + }) + + it('callees walks `calls` edges forward', async () => { + await withGraph(async (graph) => { + const hits = await graph.callees('symbol:a.ts#A#sig1') + const byKey = new Map(hits.map((h) => [h.nodeKey, h.depth])) + expect(byKey.get('symbol:a.ts#B#sig2')).toBe(1) + expect(byKey.get('symbol:a.ts#C#sig3')).toBe(2) + expect(byKey.get('external:lib')).toBe(3) + }) + }) + + 
it('respects an explicit depth limit', async () => { + await withGraph(async (graph) => { + const hits = await graph.callees('symbol:a.ts#A#sig1', 1) + expect(hits.map((h) => h.nodeKey)).toEqual(['symbol:a.ts#B#sig2']) + }) + }) +}) + +describe('SqliteGraphStore.path', () => { + it('finds the shortest typed path between two nodes', async () => { + await withGraph(async (graph) => { + const result = await graph.path('symbol:a.ts#A#sig1', 'symbol:a.ts#C#sig3') + expect(result).not.toBeNull() + expect(result!.hops.map((h) => h.nodeKey)).toEqual(['symbol:a.ts#B#sig2', 'symbol:a.ts#C#sig3']) + expect(result!.hops.every((h) => h.edgeType === 'calls' && !h.reversed)).toBe(true) + }) + }) + + it('returns an empty-hop path when from === to', async () => { + await withGraph(async (graph) => { + const result = await graph.path('symbol:a.ts#A#sig1', 'symbol:a.ts#A#sig1') + expect(result).toEqual({ from: 'symbol:a.ts#A#sig1', to: 'symbol:a.ts#A#sig1', hops: [] }) + }) + }) + + it('returns null when no path exists within the depth cap', async () => { + await withGraph(async (graph) => { + const result = await graph.path('external:lib', 'external:isolated') + expect(result).toBeNull() + }) + }) +}) + +describe('SqliteGraphStore.subgraph', () => { + it('returns the node-induced subgraph within depth hops of the seed', async () => { + await withGraph(async (graph) => { + const { nodes, edges } = await graph.subgraph('symbol:a.ts#B#sig2', 1) + const nodeKeys = nodes.map((n) => n.nodeKey).sort() + expect(nodeKeys).toEqual([ + 'file:a.ts', + 'symbol:a.ts#A#sig1', + 'symbol:a.ts#B#sig2', + 'symbol:a.ts#C#sig3', + ].sort()) + expect(edges.some((e) => e.srcKey === 'symbol:a.ts#A#sig1' && e.dstKey === 'symbol:a.ts#B#sig2')).toBe(true) + expect(edges.some((e) => e.srcKey === 'symbol:a.ts#B#sig2' && e.dstKey === 'symbol:a.ts#C#sig3')).toBe(true) + }) + }) +}) + +describe('core/graph/traversal wrappers', () => { + it('resolves a display name to its graph node and traverses callers', async () 
=> { + await withGraph(async (graph) => { + const result = await callers(graph, 'C') + expect(result.resolved.status).toBe('found') + expect(result.hits.map((h) => h.displayName).sort()).toEqual(['A', 'B'].sort()) + }) + }) + + it('resolves a display name and traverses callees', async () => { + await withGraph(async (graph) => { + const result = await callees(graph, 'A', 1) + expect(result.resolved.status).toBe('found') + expect(result.hits.map((h) => h.displayName)).toEqual(['B']) + }) + }) + + it('returns not-found for an unknown identifier', async () => { + await withGraph(async (graph) => { + const result = await neighbors(graph, 'does-not-exist') + expect(result.resolved.status).toBe('not-found') + expect(result.hits).toEqual([]) + }) + }) + + it('finds a path between two symbols by display name', async () => { + await withGraph(async (graph) => { + const result = await graphPath(graph, 'A', 'C') + expect(result.from.status).toBe('found') + expect(result.to.status).toBe('found') + expect(result.path).not.toBeNull() + expect(result.path!.hops.map((h) => h.displayName)).toEqual(['B', 'C']) + }) + }) +}) From 5ed1f53510183d745b0d785f557fe39ba0e45756 Mon Sep 17 00:00:00 2001 From: Claude Date: Mon, 15 Jun 2026 18:39:25 +0000 Subject: [PATCH 2/6] Phase 109: --lens toggle + structural/semantic fusion commands Add a four-signal vectorSearch ranking formula (weightStructural/structuralScores, byte-identical by default), a shared --lens semantic|structural|hybrid + --weight-structural CLI option, and new commands gitsema blast-radius/relate/ similar/unused. impact --lens structural|hybrid becomes a thin alias over blast-radius. 
https://claude.ai/code/session_01BjWermmuwNjWvx2dXfDU6q --- .changeset/phase-109-lens-toggle.md | 5 + README.md | 6 +- docs/PLAN.md | 60 +++++ docs/features.md | 2 + src/cli/commands/graphBlastRadius.ts | 26 +++ src/cli/commands/graphRelate.ts | 50 ++++ src/cli/commands/graphSimilar.ts | 51 ++++ src/cli/commands/graphUnused.ts | 26 +++ src/cli/commands/impact.ts | 36 +++ src/cli/lib/graphRender.ts | 51 ++++ src/cli/lib/lens.ts | 51 ++++ src/cli/register/all.ts | 35 +-- src/cli/register/graph.ts | 35 +++ src/core/graph/blastRadius.ts | 59 +++++ src/core/graph/relate.ts | 41 ++++ src/core/graph/semanticNeighbors.ts | 177 ++++++++++++++ src/core/graph/similar.ts | 95 ++++++++ src/core/graph/unused.ts | 42 ++++ src/core/models/types.ts | 2 +- src/core/search/analysis/vectorSearch.ts | 43 +++- tests/graphLens.test.ts | 283 +++++++++++++++++++++++ 21 files changed, 1147 insertions(+), 29 deletions(-) create mode 100644 .changeset/phase-109-lens-toggle.md create mode 100644 src/cli/commands/graphBlastRadius.ts create mode 100644 src/cli/commands/graphRelate.ts create mode 100644 src/cli/commands/graphSimilar.ts create mode 100644 src/cli/commands/graphUnused.ts create mode 100644 src/cli/lib/graphRender.ts create mode 100644 src/cli/lib/lens.ts create mode 100644 src/core/graph/blastRadius.ts create mode 100644 src/core/graph/relate.ts create mode 100644 src/core/graph/semanticNeighbors.ts create mode 100644 src/core/graph/similar.ts create mode 100644 src/core/graph/unused.ts create mode 100644 tests/graphLens.test.ts diff --git a/.changeset/phase-109-lens-toggle.md b/.changeset/phase-109-lens-toggle.md new file mode 100644 index 0000000..5bf1b5a --- /dev/null +++ b/.changeset/phase-109-lens-toggle.md @@ -0,0 +1,5 @@ +--- +"gitsema": minor +--- + +Add a cross-cutting `--lens semantic|structural|hybrid` toggle (plus `--weight-structural `) and four new structural/semantic fusion commands: `gitsema blast-radius ` ("what changes if I touch this" — structural dependents 
and/or semantically similar blobs), `gitsema relate ` (callers/callees plus semantically similar blobs, both lenses), `gitsema similar ` (same call/import shape and/or semantic similarity), and `gitsema unused` (symbols/files with no inbound calls/imports edges). `gitsema impact --lens structural|hybrid` now reuses `blast-radius` for true structural impact analysis. diff --git a/README.md b/README.md index dc95df5..891a99c 100644 --- a/README.md +++ b/README.md @@ -358,7 +358,7 @@ CI policy gate over drift, debt, and security thresholds. | `gitsema file-evolution [options]` | Track semantic drift of a file over its Git history (see also: `file-diff`, `evolution`) | | `gitsema file-diff ` | Compute semantic diff between two versions of a file | | `gitsema blame ` (alias: `semantic-blame`) | Show semantic origin of each logical block in a file — nearest-neighbor blame | -| `gitsema impact ` | Compute semantically similar blobs across the codebase to highlight refactor impact | +| `gitsema impact [--lens ] [--weight-structural ]` | Compute semantically similar blobs across the codebase to highlight refactor impact. `--lens structural\|hybrid` makes this a thin alias over `blast-radius` (default lens: semantic, pre-Phase-109 behavior) | #### `gitsema file-evolution [options]` @@ -439,6 +439,10 @@ Track semantic drift of a single file across its Git history. 
| `gitsema graph callees [--depth ]` | Forward `calls` traversal — what `` (transitively) calls (default depth 3, max 3) | | `gitsema graph neighbors [--edge-types ] [--direction ] [--depth ]` | Typed neighborhood of `` — any edge kinds by default (default depth 1, max 3) | | `gitsema graph path ` | Shortest typed path from `` to `` (max depth 3) | +| `gitsema blast-radius [--lens ] [--depth ] [-k/--top ] [--weight-structural ]` | What changes if I touch this — structural dependents (`calls`/`imports`/`extends`/`implements`/`references`, reverse traversal) and/or semantically similar blobs (default lens: hybrid) | +| `gitsema relate [-k/--top ]` | Callers/callees (structural, depth 1) and semantically similar blobs, labeled — both lenses, lose neither | +| `gitsema similar [--lens ] [-k/--top ] [--weight-structural ]` | Symbols/files with a similar call/import shape (structural, Jaccard overlap) and/or semantically similar (vector) (default lens: hybrid) | +| `gitsema unused [--edge-types ]` | Symbols/files with no inbound `calls`/`imports` edges — structural complement to `dead-concepts` | ### Workflow & CI diff --git a/docs/PLAN.md b/docs/PLAN.md index f62c86c..6db1bb3 100644 --- a/docs/PLAN.md +++ b/docs/PLAN.md @@ -4033,3 +4033,63 @@ fusion pass / Phase 112 lens-coverage sweep, consistent with `docsSync`'s existi guard (which only requires every `GUIDE_TOOLS` entry to have an interpretation, not that every MCP tool is in `GUIDE_TOOLS`). No schema change. Tests: `tests/graphTraversal.test.ts`. + +**Status:** Phase 109 ✅ complete. 
Adds the cross-cutting `--lens +semantic|structural|hybrid` toggle (knowledge-graph §7/§8) plus a fourth ranking +signal in `vectorSearch` (`src/core/search/analysis/vectorSearch.ts`): +`weightStructural`/`structuralScores` extend the three-signal formula to +`score = (wv*cosine + wr*recency + wp*pathScore + ws*structScore) / wTotal`, where +`structScore` comes from a precomputed `Map` of structural +proximity (`1 / (1 + hops)`) from a query anchor. When neither option is set +(the default for every existing caller), `useWeightedSignals` and the formula are +unchanged from before Phase 109 — semantic-lens output is byte-for-byte identical. +A shared `src/cli/lib/lens.ts` provides `parseLens()`, `lensWeights()`, and +`addLensOption()` (adds `--lens ` and `--weight-structural ` to a +Commander command). Four new core modules under `src/core/graph/` back four new +top-level commands: +- `blastRadius` (`blastRadius.ts`) → `gitsema blast-radius [--lens] + [--depth] [-k/--top]` (default lens: hybrid) — structural dependents via + `graph.neighbors(node, {edgeTypes: BLAST_RADIUS_EDGE_TYPES, direction: 'in', + depth})` (calls/imports/extends/implements/references) and/or semantically + similar blobs/symbols. +- `relate` (`relate.ts`) → `gitsema relate [-k/--top]` — depth-1 + callers + depth-1 callees (labeled, structural) plus semantically similar + blobs/symbols — "both lenses, lose neither", no `--lens` flag (always shows all + three sections). +- `similar` (`similar.ts`) → `gitsema similar [--lens] [-k/--top]` + (default lens: hybrid) — structural similarity ranks nodes of the same `kind` by + Jaccard overlap of their outgoing edge targets (`imports` for files, `calls` for + symbols by default); semantic similarity ranks by embedding cosine similarity. 
+- `unused` (`unused.ts`) → `gitsema unused [--edge-types]` — file/function/class/ + method nodes with no inbound `calls`/`imports` edges (excludes `external:*` + nodes); the structural complement to the semantic `dead-concepts` command. + +The "semantic similarity without an embedding provider" lookup +(`src/core/graph/semanticNeighbors.ts`, `semanticNeighborsForNode()`) ranks stored +`embeddings`/`symbol_embeddings` rows by cosine similarity to the resolved graph +node's own stored embedding (file nodes use `currentBlobHash`'s whole-file +embedding; symbol nodes parse `symbol:##` and +use the matching `symbol_embeddings` row) — no network call. It returns +`{supported: false, hits: []}` on non-sqlite backends, and all four new commands +(plus `impact`'s blast-radius alias) render `(not supported on this storage +backend)` for the semantic section in that case rather than throwing. Shared +rendering helpers (`renderResolutionError`, `renderBlastRadius`) live in +`src/cli/lib/graphRender.ts`. + +`gitsema impact --lens structural|hybrid` becomes a thin alias over +`blastRadius()` (knowledge-graph §8): the path is normalized to a `file:` +graph node and delegated entirely to `blastRadius()`/`renderBlastRadius()`, +including `--dump`/`--out json` support. `--lens semantic` (the default) preserves +the pre-Phase-109 `computeImpact()` code path exactly. + +**Deviations from the original sketch:** (1) `--weight-structural` is accepted by +`blast-radius`/`similar`/`impact` for consistency with the shared `--lens` option +helper, but the new fusion commands rank their structural/semantic sections +independently (Jaccard / graph-distance / cosine) rather than through +`vectorSearch`'s four-signal blend — `--weight-structural` only affects ranking +when `--lens` flows into `vectorSearch` directly (not yet wired for any CLI +command in this phase; the four-signal formula itself is tested and ready for a +future search-integration phase). 
(2) No new MCP tools were added for +`blast-radius`/`relate`/`similar`/`unused` — left for a future fusion/MCP-coverage +phase, consistent with the Phase 108 deviation note. No schema change. Tests: +`tests/graphLens.test.ts`. diff --git a/docs/features.md b/docs/features.md index 7e77fe4..f222dcc 100644 --- a/docs/features.md +++ b/docs/features.md @@ -201,6 +201,8 @@ All search uses the **text embedding model** (not the code model) to embed queri | **Guided `gitsema setup` wizard with storage backend selection (Phase 104)** | `gitsema setup` (primary name; `gitsema quickstart` remains a backward-compat alias) extends the onboarding wizard with a storage-backend step (sqlite/postgres/qdrant), persisting `storage.*` config keys and validating the connection via `getCachedStorageProfile().metadata.getLastIndexedCommit()` (reverting to sqlite on failure), plus an optional final step to configure a local Ollama narrator/guide model via `gitsema models add --narrator\|--guide --provider ollama --activate` | | **Co-change / dependency / cycle queries (Phase 107)** | `gitsema co-change [-k/--top]` — files that historically change together with `` (from `co_change` edges); `gitsema deps [--reverse] [--depth] [--edge-types]` — import/dependency closure of a file or symbol (BFS over `imports`/`calls`/`extends`/`implements` edges); `gitsema graph cycles` / top-level `gitsema cycles [--edge-types]` — detect cycles in the structural graph (default: `imports`). All require `gitsema index --graph` + `gitsema graph build` first | | **Graph traversal primitives (Phase 108)** | `GraphStore.neighbors/callers/callees/path/subgraph` — recursive-CTE traversals over `graph_nodes`/`edges` (sqlite + Postgres; Qdrant profile throws "graph queries require a relational backend"), depth capped at 3. 
CLI: `gitsema graph callers [--depth]` (reverse `calls` traversal), `gitsema graph callees [--depth]` (forward `calls` traversal), `gitsema graph neighbors [--edge-types] [--direction] [--depth]` (typed neighborhood, any edge kinds), `gitsema graph path ` (shortest typed path, rendered as `-[edgeType]->`/`<-[edgeType]-` hops). MCP: `call_graph` (callers/callees) and `graph_neighbors`. All resolve symbol qualified names, file paths, or literal node keys via `resolveNode()`; require `gitsema index --graph` + `gitsema graph build` first | +| **`--lens` toggle + four-signal ranking (Phase 109)** | `vectorSearch` gains a fourth ranking signal — `weightStructural`/`structuralScores` extend the three-signal formula to `score = (wv*cosine + wr*recency + wp*pathScore + ws*structScore) / wTotal`, where `structScore` is `1 / (1 + hops)` graph proximity from a query anchor. Unset by default (byte-for-byte identical to pre-Phase-109 ranking). Shared `--lens semantic\|structural\|hybrid` + `--weight-structural ` CLI options (`src/cli/lib/lens.ts`) toggle which signal(s) drive a command's output | +| **Structural/semantic fusion commands (Phase 109)** | `gitsema blast-radius [--lens] [--depth] [-k/--top]` (default: hybrid) — "what changes if I touch this": structural dependents (`calls`/`imports`/`extends`/`implements`/`references`, reverse traversal) and/or semantically similar blobs/symbols; `gitsema relate [-k/--top]` — depth-1 callers + depth-1 callees (labeled) plus semantically similar blobs/symbols, both lenses always shown; `gitsema similar [--lens] [-k/--top]` (default: hybrid) — structural similarity via Jaccard overlap of outgoing edge targets (same call/import "shape") and/or semantic embedding similarity; `gitsema unused [--edge-types]` — file/function/class/method nodes with no inbound `calls`/`imports` edges, the structural complement to `dead-concepts`. 
Semantic ranking (`semanticNeighborsForNode`) uses already-stored embeddings — no embedding provider call — and degrades to `(not supported on this storage backend)` on non-sqlite backends. `gitsema impact --lens structural\|hybrid` becomes a thin alias over `blast-radius`; `--lens semantic` (default) is unchanged | --- diff --git a/src/cli/commands/graphBlastRadius.ts b/src/cli/commands/graphBlastRadius.ts new file mode 100644 index 0000000..d0b0902 --- /dev/null +++ b/src/cli/commands/graphBlastRadius.ts @@ -0,0 +1,26 @@ +import { getCachedStorageProfile } from '../../core/storage/resolveProfile.js' +import { blastRadius } from '../../core/graph/blastRadius.js' +import { parseLens } from '../lib/lens.js' +import { renderResolutionError, renderBlastRadius } from '../lib/graphRender.js' + +export interface GraphBlastRadiusCommandOptions { + lens?: string + depth?: string + top?: string +} + +export async function blastRadiusCommand(symbol: string, options: GraphBlastRadiusCommandOptions = {}): Promise { + const profile = getCachedStorageProfile(process.cwd()) + const lens = parseLens(options.lens, 'hybrid') + const depth = options.depth !== undefined ? parseInt(options.depth, 10) : undefined + const topK = options.top !== undefined ? 
parseInt(options.top, 10) : undefined + + const result = await blastRadius(profile.graph, symbol, { lens, depth, topK }) + + if (result.resolved.status !== 'found') { + console.log(renderResolutionError(symbol, result.resolved)) + return + } + + console.log(renderBlastRadius(result, result.resolved.node)) +} diff --git a/src/cli/commands/graphRelate.ts b/src/cli/commands/graphRelate.ts new file mode 100644 index 0000000..95cd477 --- /dev/null +++ b/src/cli/commands/graphRelate.ts @@ -0,0 +1,50 @@ +import { getCachedStorageProfile } from '../../core/storage/resolveProfile.js' +import { relate } from '../../core/graph/relate.js' +import { renderResolutionError } from '../lib/graphRender.js' + +export interface GraphRelateCommandOptions { + top?: string +} + +export async function relateCommand(symbol: string, options: GraphRelateCommandOptions = {}): Promise { + const profile = getCachedStorageProfile(process.cwd()) + const topK = options.top !== undefined ? parseInt(options.top, 10) : undefined + + const result = await relate(profile.graph, symbol, { topK }) + + if (result.resolved.status !== 'found') { + console.log(renderResolutionError(symbol, result.resolved)) + return + } + + const node = result.resolved.node + console.log(`Related to ${node.displayName} (${node.nodeKey}):\n`) + + console.log('Called by (structural):') + if (result.callers.length === 0) { + console.log(' (none)') + } else { + for (const hit of result.callers) console.log(` ${hit.displayName}`) + } + console.log('') + + console.log('Calls (structural):') + if (result.callees.length === 0) { + console.log(' (none)') + } else { + for (const hit of result.callees) console.log(` ${hit.displayName}`) + } + console.log('') + + console.log('Semantically similar:') + if (!result.semanticSupported) { + console.log(' (not supported on this storage backend)') + } else if (result.similar.length === 0) { + console.log(' (none)') + } else { + for (const hit of result.similar) { + const label = hit.symbolName 
?? hit.paths[0] ?? '(unknown)' + console.log(` ${hit.score.toFixed(3)} ${label}`) + } + } +} diff --git a/src/cli/commands/graphSimilar.ts b/src/cli/commands/graphSimilar.ts new file mode 100644 index 0000000..fd5cb1f --- /dev/null +++ b/src/cli/commands/graphSimilar.ts @@ -0,0 +1,51 @@ +import { getCachedStorageProfile } from '../../core/storage/resolveProfile.js' +import { similar } from '../../core/graph/similar.js' +import { parseLens } from '../lib/lens.js' +import { renderResolutionError } from '../lib/graphRender.js' + +export interface GraphSimilarCommandOptions { + lens?: string + top?: string +} + +export async function similarCommand(symbol: string, options: GraphSimilarCommandOptions = {}): Promise { + const profile = getCachedStorageProfile(process.cwd()) + const lens = parseLens(options.lens, 'hybrid') + const topK = options.top !== undefined ? parseInt(options.top, 10) : undefined + + const result = await similar(profile.graph, symbol, { lens, topK }) + + if (result.resolved.status !== 'found') { + console.log(renderResolutionError(symbol, result.resolved)) + return + } + + const node = result.resolved.node + console.log(`Similar to ${node.displayName} (${node.nodeKey}) — lens: ${lens}\n`) + + if (lens !== 'semantic') { + console.log('Structural (same call/import shape):') + if (result.structural.length === 0) { + console.log(' (none)') + } else { + for (const hit of result.structural) { + console.log(` ${hit.jaccard.toFixed(3)} ${hit.displayName} (${hit.shared} shared)`) + } + } + console.log('') + } + + if (lens !== 'structural') { + console.log('Semantic:') + if (!result.semanticSupported) { + console.log(' (not supported on this storage backend)') + } else if (result.semantic.length === 0) { + console.log(' (none)') + } else { + for (const hit of result.semantic) { + const label = hit.symbolName ?? hit.paths[0] ?? 
'(unknown)' + console.log(` ${hit.score.toFixed(3)} ${label}`) + } + } + } +} diff --git a/src/cli/commands/graphUnused.ts b/src/cli/commands/graphUnused.ts new file mode 100644 index 0000000..56a243b --- /dev/null +++ b/src/cli/commands/graphUnused.ts @@ -0,0 +1,26 @@ +import { getCachedStorageProfile } from '../../core/storage/resolveProfile.js' +import { unused } from '../../core/graph/unused.js' +import type { EdgeType } from '../../core/storage/types.js' + +export interface GraphUnusedCommandOptions { + edgeTypes?: string +} + +export async function unusedCommand(options: GraphUnusedCommandOptions = {}): Promise { + const profile = getCachedStorageProfile(process.cwd()) + const edgeTypes = options.edgeTypes + ? options.edgeTypes.split(',').map((s) => s.trim()).filter(Boolean) as EdgeType[] + : undefined + + const result = await unused(profile.graph, { edgeTypes }) + + if (result.nodes.length === 0) { + console.log('No unused symbols or files found (or `gitsema graph build` has not been run).') + return + } + + console.log(`${result.nodes.length} unused node${result.nodes.length === 1 ? '' : 's'} (no inbound ${(edgeTypes ?? ['calls', 'imports']).join('/')}):\n`) + for (const node of result.nodes) { + console.log(` [${node.kind}] ${node.displayName}${node.path ? 
` (${node.path})` : ''}`) + } +} diff --git a/src/cli/commands/impact.ts b/src/cli/commands/impact.ts index 15a0bd3..9dcd897 100644 --- a/src/cli/commands/impact.ts +++ b/src/cli/commands/impact.ts @@ -11,6 +11,10 @@ import { shortHash } from '../../core/search/ranking.js' import { buildProviderOrExit, resolveModels } from '../lib/provider.js' import { emitJsonSink } from '../lib/output.js' import { narrateToolResult } from '../../core/llm/narrator.js' +import { getCachedStorageProfile } from '../../core/storage/resolveProfile.js' +import { blastRadius } from '../../core/graph/blastRadius.js' +import { parseLens } from '../lib/lens.js' +import { renderResolutionError, renderBlastRadius } from '../lib/graphRender.js' export interface ImpactCommandOptions { /** Number of similar blobs to return (default 10). */ @@ -33,6 +37,7 @@ export interface ImpactCommandOptions { noHeadings?: boolean out?: string[] narrate?: boolean + lens?: string } /** @@ -104,6 +109,37 @@ export async function impactCommand( process.exit(1) } + const lens = parseLens(options.lens, 'semantic') + + // Phase 109 (knowledge-graph §8): `--lens structural|hybrid` makes `impact` + // a thin alias over `blast-radius` — true structural dependents instead of + // (or alongside) semantic similarity. `--lens semantic` (default) preserves + // pre-Phase-109 behavior exactly. 
+ if (lens !== 'semantic') { + const profile = getCachedStorageProfile(process.cwd()) + const normalised = filePath.trim().replace(/\\/g, '/').replace(/^\.\//, '') + const result = await blastRadius(profile.graph, normalised, { lens, topK }) + + if (result.resolved.status !== 'found') { + console.log(renderResolutionError(filePath, result.resolved)) + return + } + + if (options.dump !== undefined || (options.out?.some((o) => o.startsWith('json')))) { + const jsonSink = getSink(resolveOutputs({ out: options.out, dump: options.dump, html: options.html }), 'json') + if (jsonSink?.file) { + writeFileSync(jsonSink.file, JSON.stringify(result, null, 2), 'utf8') + console.log(`Wrote impact (blast-radius) report JSON to ${jsonSink.file}`) + } else { + console.log(JSON.stringify(result, null, 2)) + } + return + } + + console.log(renderBlastRadius(result, result.resolved.node)) + return + } + const resolvedPath = resolve(filePath.trim()) if (!existsSync(resolvedPath)) { console.error(`Error: file not found: ${resolvedPath}`) diff --git a/src/cli/lib/graphRender.ts b/src/cli/lib/graphRender.ts new file mode 100644 index 0000000..baa354b --- /dev/null +++ b/src/cli/lib/graphRender.ts @@ -0,0 +1,51 @@ +import type { ResolveNodeResult } from '../../core/graph/resolveNode.js' +import type { BlastRadiusResult } from '../../core/graph/blastRadius.js' +import type { GraphNodeRecord } from '../../core/storage/types.js' + +/** Shared "not found" / "ambiguous" message for `resolveNode()` results. */ +export function renderResolutionError(label: string, resolved: ResolveNodeResult): string { + if (resolved.status === 'ambiguous') { + const candidates = resolved.candidates.map((c) => ` ${c.nodeKey}`).join('\n') + return `"${label}" is ambiguous — matches multiple symbols:\n${candidates}` + } + return `No graph node found for "${label}". Run \`gitsema index --graph\` then \`gitsema graph build\` first.` +} + +/** + * Renders a `BlastRadiusResult` as human-readable text. 
Shared by + * `blast-radius` and `impact --lens`. The caller must have already checked + * `result.resolved.status === 'found'`. + */ +export function renderBlastRadius(result: BlastRadiusResult, node: GraphNodeRecord): string { + const lines: string[] = [] + lines.push(`Blast radius of ${node.displayName} (${node.nodeKey}) — lens: ${result.lens}`, '') + + if (result.lens !== 'semantic') { + lines.push('Structural dependents (who references this):') + if (result.structural.length === 0) { + lines.push(' (none)') + } else { + for (const hit of result.structural) { + const edge = hit.edgeType ? `[${hit.edgeType}] ` : '' + lines.push(` ${edge}${hit.displayName} (depth ${hit.depth})`) + } + } + lines.push('') + } + + if (result.lens !== 'structural') { + lines.push('Semantically related:') + if (!result.semanticSupported) { + lines.push(' (not supported on this storage backend)') + } else if (result.semantic.length === 0) { + lines.push(' (none)') + } else { + for (const hit of result.semantic) { + const label = hit.symbolName ?? hit.paths[0] ?? '(unknown)' + lines.push(` ${hit.score.toFixed(3)} ${label}`) + } + } + } + + return lines.join('\n') +} diff --git a/src/cli/lib/lens.ts b/src/cli/lib/lens.ts new file mode 100644 index 0000000..26d4871 --- /dev/null +++ b/src/cli/lib/lens.ts @@ -0,0 +1,51 @@ +import type { Command } from 'commander' + +/** + * The cross-cutting `--lens` toggle (Phase 109, knowledge-graph §7): which of + * the semantic/structural signals drive a command's ranking. + * + * - `semantic` — vectors + FTS only (today's default; structural weight 0). + * - `structural` — pure graph traversal/ranking (vector weight 0). + * - `hybrid` — both blended. 
+ */ +export type Lens = 'semantic' | 'structural' | 'hybrid' + +export function parseLens(value: string | undefined, fallback: Lens): Lens { + if (value === 'semantic' || value === 'structural' || value === 'hybrid') return value + return fallback +} + +/** Ranking-weight overrides for `vectorSearch`'s four-signal formula (§7.2). */ +export interface LensWeights { + weightVector?: number + weightRecency?: number + weightPath?: number + weightStructural?: number +} + +/** + * Translates `--lens` (+ optional `--weight-structural` override) into + * `vectorSearch` ranking-weight overrides. + * + * `semantic` with no explicit structural weight returns `{}` — leaving + * `vectorSearch`'s defaults untouched, so existing semantic-lens callers stay + * byte-for-byte identical to pre-Phase-109 behavior. + */ +export function lensWeights(lens: Lens, weightStructural?: number): LensWeights { + switch (lens) { + case 'structural': + return { weightVector: 0, weightRecency: 0, weightPath: 0, weightStructural: weightStructural ?? 1 } + case 'hybrid': + return { weightStructural: weightStructural ?? 0.3 } + case 'semantic': + default: + return weightStructural !== undefined ? { weightStructural } : {} + } +} + +/** Adds the shared `--lens` and `--weight-structural` options to a command. 
*/ +export function addLensOption(cmd: Command, defaultLens: Lens): Command { + return cmd + .option('--lens ', `'semantic' | 'structural' | 'hybrid' — which signal(s) drive ranking (default: ${defaultLens})`, defaultLens) + .option('--weight-structural ', 'structural signal weight (overrides the --lens default)') +} diff --git a/src/cli/register/all.ts b/src/cli/register/all.ts index 1fb48e1..eafc9ff 100644 --- a/src/cli/register/all.ts +++ b/src/cli/register/all.ts @@ -1,5 +1,6 @@ import { Command } from 'commander' import { collectOut } from '../../utils/outputSink.js' +import { addLensOption } from '../lib/lens.js' // Per-domain register helpers (keep existing split modules available) import { registerSetup } from './setup.js' @@ -330,22 +331,24 @@ export function registerAll(program: Command) { .option('--narrate', 'generate an LLM narrative of dead concepts (requires GITSEMA_LLM_URL)') .action(deadConceptsCommand) - program - .command('impact ') - .description('Compute semantically similar blobs across the codebase to highlight refactor impact (see also: blame, file-diff)') - .option('-k, --top ', 'number of similar blobs to return', '10') - .option('--chunks', 'include chunk-level embeddings for finer-grained coupling') - .option('--level ', 'search level: file (default), chunk, or symbol') - .option('--dump [file]', 'output structured JSON; writes to if given, otherwise prints JSON to stdout (legacy: prefer --out json)') - .option('--model ', 'override embedding model') - .option('--text-model ', 'override text embedding model') - .option('--code-model ', 'override code embedding model') - .option('--branch ', 'restrict results to blobs seen on this branch') - .option('--html [file]', 'output interactive HTML; writes to if given, otherwise impact.html (legacy: prefer --out html)') - .option('--out ', 'output spec (repeatable): text|json[:file]|html[:file]|markdown[:file] (overrides --dump/--html)', collectOut, [] as string[]) - .option('--no-headings', 
"don't print section header") - .option('--narrate', 'generate an LLM narrative of the impact report (requires GITSEMA_LLM_URL)') - .action(impactCommand) + addLensOption( + program + .command('impact ') + .description('Compute semantically similar blobs across the codebase to highlight refactor impact (see also: blame, file-diff, blast-radius)') + .option('-k, --top ', 'number of similar blobs to return', '10') + .option('--chunks', 'include chunk-level embeddings for finer-grained coupling') + .option('--level ', 'search level: file (default), chunk, or symbol') + .option('--dump [file]', 'output structured JSON; writes to if given, otherwise prints JSON to stdout (legacy: prefer --out json)') + .option('--model ', 'override embedding model') + .option('--text-model ', 'override text embedding model') + .option('--code-model ', 'override code embedding model') + .option('--branch ', 'restrict results to blobs seen on this branch') + .option('--html [file]', 'output interactive HTML; writes to if given, otherwise impact.html (legacy: prefer --out html)') + .option('--out ', 'output spec (repeatable): text|json[:file]|html[:file]|markdown[:file] (overrides --dump/--html)', collectOut, [] as string[]) + .option('--no-headings', "don't print section header") + .option('--narrate', 'generate an LLM narrative of the impact report (requires GITSEMA_LLM_URL)'), + 'semantic', + ).action(impactCommand) program .command('clusters') diff --git a/src/cli/register/graph.ts b/src/cli/register/graph.ts index 4bd0d4a..2a4e21d 100644 --- a/src/cli/register/graph.ts +++ b/src/cli/register/graph.ts @@ -7,6 +7,11 @@ import { graphCallersCommand } from '../commands/graphCallers.js' import { graphCalleesCommand } from '../commands/graphCallees.js' import { graphNeighborsCommand } from '../commands/graphNeighbors.js' import { graphPathCommand } from '../commands/graphPath.js' +import { blastRadiusCommand } from '../commands/graphBlastRadius.js' +import { relateCommand } from 
'../commands/graphRelate.js' +import { similarCommand } from '../commands/graphSimilar.js' +import { unusedCommand } from '../commands/graphUnused.js' +import { addLensOption } from '../lib/lens.js' /** * Structural knowledge-graph commands (Phase 107, knowledge-graph §3.3/§8). @@ -82,4 +87,34 @@ export function registerGraph(program: Command) { .command('path ') .description('Shortest typed path from to (structural lens; max depth 3)') .action(graphPathCommand) + + // Phase 109: --lens toggle + fusion commands (knowledge-graph §7/§8). + addLensOption( + program + .command('blast-radius ') + .description('What changes if I touch this — structural dependents and/or semantically related blobs (default lens: hybrid)') + .option('--depth ', 'structural traversal depth (max 3)') + .option('-k, --top ', 'number of semantic results to return (default 10)'), + 'hybrid', + ).action(blastRadiusCommand) + + program + .command('relate ') + .description('Callers/callees (structural) and semantically similar blobs (vector), labeled — both lenses, lose neither') + .option('-k, --top ', 'number of semantic results to return (default 10)') + .action(relateCommand) + + addLensOption( + program + .command('similar ') + .description('Symbols/files with a similar call/import shape (structural) and/or semantically similar (vector) (default lens: hybrid)') + .option('-k, --top ', 'number of results to return per lens (default 10)'), + 'hybrid', + ).action(similarCommand) + + program + .command('unused') + .description('Symbols/files with no inbound calls/imports edges — structural complement to `dead-concepts`') + .option('--edge-types ', 'comma-separated inbound edge types that count as "used" (default: calls,imports)') + .action(unusedCommand) } diff --git a/src/core/graph/blastRadius.ts b/src/core/graph/blastRadius.ts new file mode 100644 index 0000000..a87a108 --- /dev/null +++ b/src/core/graph/blastRadius.ts @@ -0,0 +1,59 @@ +/** + * `gitsema blast-radius ` (Phase 109, 
knowledge-graph §7/§8): "what + * changes if I touch this" — structural dependents (who references this node, + * via `neighbors(..., direction: 'in')`) and/or semantically similar blobs, + * selected by `--lens`. The upgrade to `impact`'s semantic-only analysis. + */ + +import type { EdgeType, GraphHit, GraphStore } from '../storage/types.js' +import { resolveNode, type ResolveNodeResult } from './resolveNode.js' +import { semanticNeighborsForNode, type SemanticHit } from './semanticNeighbors.js' +import type { Lens } from '../../cli/lib/lens.js' + +/** Edge types that represent "depends on" relationships for blast-radius purposes. */ +export const BLAST_RADIUS_EDGE_TYPES: EdgeType[] = ['calls', 'imports', 'extends', 'implements', 'references'] + +export interface BlastRadiusResult { + resolved: ResolveNodeResult + lens: Lens + /** Nodes that (transitively) depend on the resolved node — empty unless lens is structural/hybrid. */ + structural: GraphHit[] + /** Semantically similar blobs/symbols — empty unless lens is semantic/hybrid. */ + semantic: SemanticHit[] + /** False when the semantic lens was requested but the storage backend doesn't support it. */ + semanticSupported: boolean +} + +export interface BlastRadiusOptions { + lens?: Lens + depth?: number + topK?: number + edgeTypes?: EdgeType[] +} + +export async function blastRadius(graph: GraphStore, identifier: string, opts: BlastRadiusOptions = {}): Promise { + const lens = opts.lens ?? 'hybrid' + const resolved = await resolveNode(graph, identifier) + if (resolved.status !== 'found') { + return { resolved, lens, structural: [], semantic: [], semanticSupported: true } + } + + const topK = opts.topK ?? 10 + const structural = lens === 'semantic' + ? [] + : await graph.neighbors(resolved.node.nodeKey, { + edgeTypes: opts.edgeTypes ?? 
BLAST_RADIUS_EDGE_TYPES, + direction: 'in', + depth: opts.depth, + }) + + let semantic: SemanticHit[] = [] + let semanticSupported = true + if (lens !== 'structural') { + const result = await semanticNeighborsForNode(resolved.node, topK) + semantic = result.hits + semanticSupported = result.supported + } + + return { resolved, lens, structural, semantic, semanticSupported } +} diff --git a/src/core/graph/relate.ts b/src/core/graph/relate.ts new file mode 100644 index 0000000..990d23f --- /dev/null +++ b/src/core/graph/relate.ts @@ -0,0 +1,41 @@ +/** + * `gitsema relate ` (Phase 109, knowledge-graph §7/§8): one view + * combining structural callers/callees (labeled, depth 1) with semantically + * similar blobs/symbols — "both lenses, lose neither". + */ + +import type { GraphHit, GraphStore } from '../storage/types.js' +import { resolveNode, type ResolveNodeResult } from './resolveNode.js' +import { semanticNeighborsForNode, type SemanticHit } from './semanticNeighbors.js' + +export interface RelateResult { + resolved: ResolveNodeResult + /** Direct (depth-1) callers of the resolved symbol. */ + callers: GraphHit[] + /** Direct (depth-1) callees of the resolved symbol. */ + callees: GraphHit[] + /** Semantically similar blobs/symbols. */ + similar: SemanticHit[] + /** False when the storage backend doesn't support the semantic lookup. */ + semanticSupported: boolean +} + +export interface RelateOptions { + topK?: number +} + +export async function relate(graph: GraphStore, identifier: string, opts: RelateOptions = {}): Promise { + const resolved = await resolveNode(graph, identifier) + if (resolved.status !== 'found') { + return { resolved, callers: [], callees: [], similar: [], semanticSupported: true } + } + + const topK = opts.topK ?? 
10 + const [callers, callees, semantic] = await Promise.all([ + graph.callers(resolved.node.nodeKey, 1), + graph.callees(resolved.node.nodeKey, 1), + semanticNeighborsForNode(resolved.node, topK), + ]) + + return { resolved, callers, callees, similar: semantic.hits, semanticSupported: semantic.supported } +} diff --git a/src/core/graph/semanticNeighbors.ts b/src/core/graph/semanticNeighbors.ts new file mode 100644 index 0000000..3ba45c7 --- /dev/null +++ b/src/core/graph/semanticNeighbors.ts @@ -0,0 +1,177 @@ +/** + * Semantic-similarity helper for the Phase 109 fusion commands + * (`blast-radius`, `relate`, `similar`): ranks stored embeddings by cosine + * similarity to a graph node's *own* stored embedding, so these commands need + * no embedding provider / network call. + * + * sqlite-only for now (raw `embeddings`/`symbols`/`symbol_embeddings` queries); + * other storage backends degrade to an empty (but `supported: false`) result, + * so hybrid-lens commands fall back to structural-only output gracefully. + */ + +import { getActiveSession } from '../db/sqlite.js' +import { embeddings, paths, symbols, symbolEmbeddings } from '../db/schema.js' +import { eq, inArray } from 'drizzle-orm' +import { cosineSimilarityPrecomputed, vectorNorm } from '../search/analysis/vectorSearch.js' +import { bufferToFloat32 } from '../../utils/embedding.js' +import { getCachedStorageProfile } from '../storage/resolveProfile.js' +import type { GraphNodeRecord } from '../storage/types.js' + +export interface SemanticHit { + blobHash: string + paths: string[] + score: number + symbolId?: number + symbolName?: string + qualifiedName?: string + startLine?: number + endLine?: number +} + +export interface SemanticNeighborsResult { + /** False when the active storage backend doesn't support this lookup (non-sqlite). 
*/ + supported: boolean + hits: SemanticHit[] +} + +function pathsByBlob(hashes: string[]): Map { + if (hashes.length === 0) return new Map() + const { db } = getActiveSession() + const rows = db.select({ blobHash: paths.blobHash, path: paths.path }) + .from(paths) + .where(inArray(paths.blobHash, hashes)) + .all() + const map = new Map() + for (const row of rows) { + const list = map.get(row.blobHash) ?? [] + list.push(row.path) + map.set(row.blobHash, list) + } + return map +} + +/** + * Finds blobs whose whole-file embedding is most similar to `blobHash`'s, + * excluding `blobHash` itself. + */ +function fileNeighbors(blobHash: string, topK: number): SemanticHit[] { + const { db } = getActiveSession() + const target = db.select({ vector: embeddings.vector, model: embeddings.model }) + .from(embeddings) + .where(eq(embeddings.blobHash, blobHash)) + .limit(1) + .all()[0] + if (!target) return [] + + const targetVec = bufferToFloat32(target.vector as Buffer) + const targetNorm = vectorNorm(targetVec) + + const rows = db.select({ blobHash: embeddings.blobHash, vector: embeddings.vector }) + .from(embeddings) + .where(eq(embeddings.model, target.model)) + .all() + + const scored = rows + .filter((r) => r.blobHash !== blobHash) + .map((r) => ({ + blobHash: r.blobHash, + score: cosineSimilarityPrecomputed(targetVec, targetNorm, bufferToFloat32(r.vector as Buffer)), + })) + .sort((a, b) => b.score - a.score) + .slice(0, topK) + + const byBlob = pathsByBlob(scored.map((s) => s.blobHash)) + return scored.map((s) => ({ blobHash: s.blobHash, paths: byBlob.get(s.blobHash) ?? [], score: s.score })) +} + +/** + * Finds symbols whose embedding is most similar to the symbol identified by + * `blobHash` + `qualifiedName` + `signatureHash`, excluding itself. 
+ */ +function symbolNeighbors(blobHash: string, qualifiedName: string, signatureHash: string, topK: number): SemanticHit[] { + const { db } = getActiveSession() + const targetRow = db.select({ id: symbols.id, qualifiedName: symbols.qualifiedName, signatureHash: symbols.signatureHash }) + .from(symbols) + .where(eq(symbols.blobHash, blobHash)) + .all() + .find((s) => s.qualifiedName === qualifiedName && s.signatureHash === signatureHash) + if (!targetRow) return [] + + const targetEmb = db.select({ vector: symbolEmbeddings.vector, model: symbolEmbeddings.model }) + .from(symbolEmbeddings) + .where(eq(symbolEmbeddings.symbolId, targetRow.id)) + .limit(1) + .all()[0] + if (!targetEmb) return [] + + const targetVec = bufferToFloat32(targetEmb.vector as Buffer) + const targetNorm = vectorNorm(targetVec) + + const rows = db.select({ + symbolId: symbolEmbeddings.symbolId, + vector: symbolEmbeddings.vector, + blobHash: symbols.blobHash, + symbolName: symbols.symbolName, + qualifiedName: symbols.qualifiedName, + startLine: symbols.startLine, + endLine: symbols.endLine, + }) + .from(symbolEmbeddings) + .innerJoin(symbols, eq(symbolEmbeddings.symbolId, symbols.id)) + .where(eq(symbolEmbeddings.model, targetEmb.model)) + .all() + + const scored = rows + .filter((r) => r.symbolId !== targetRow.id) + .map((r) => ({ + ...r, + score: cosineSimilarityPrecomputed(targetVec, targetNorm, bufferToFloat32(r.vector as Buffer)), + })) + .sort((a, b) => b.score - a.score) + .slice(0, topK) + + const byBlob = pathsByBlob(scored.map((s) => s.blobHash)) + return scored.map((s) => ({ + blobHash: s.blobHash, + paths: byBlob.get(s.blobHash) ?? [], + score: s.score, + symbolId: s.symbolId, + symbolName: s.symbolName, + qualifiedName: s.qualifiedName ?? undefined, + startLine: s.startLine, + endLine: s.endLine, + })) +} + +/** Parses `symbol:##` into its parts. 
*/ +export function parseSymbolNodeKey(nodeKey: string): { path: string; qualifiedName: string; signatureHash: string } | null { + if (!nodeKey.startsWith('symbol:')) return null + const rest = nodeKey.slice('symbol:'.length) + const lastHash = rest.lastIndexOf('#') + const secondHash = lastHash > 0 ? rest.lastIndexOf('#', lastHash - 1) : -1 // lastIndexOf clamps a negative fromIndex to 0, so a lone leading '#' would otherwise match itself + if (lastHash === -1 || secondHash === -1) return null + return { + path: rest.slice(0, secondHash), + qualifiedName: rest.slice(secondHash + 1, lastHash), + signatureHash: rest.slice(lastHash + 1), + } +} + +/** + * Semantic neighbors of a resolved graph node — file nodes rank by whole-file + * embedding similarity; symbol nodes rank by symbol-embedding similarity. + * Returns `{ supported: false, hits: [] }` on non-sqlite backends. + */ +export async function semanticNeighborsForNode(node: GraphNodeRecord, topK = 10): Promise { + const profile = getCachedStorageProfile() + if (profile.backend !== 'sqlite') return { supported: false, hits: [] } + if (!node.currentBlobHash) return { supported: true, hits: [] } + + if (node.kind === 'file') { + return { supported: true, hits: fileNeighbors(node.currentBlobHash, topK) } + } + + const parsed = parseSymbolNodeKey(node.nodeKey) + if (!parsed) return { supported: true, hits: [] } + return { supported: true, hits: symbolNeighbors(node.currentBlobHash, parsed.qualifiedName, parsed.signatureHash, topK) } +} diff --git a/src/core/graph/similar.ts b/src/core/graph/similar.ts new file mode 100644 index 0000000..febacbf --- /dev/null +++ b/src/core/graph/similar.ts @@ -0,0 +1,95 @@ +/** + * `gitsema similar --lens structural|semantic|hybrid` (Phase 109, + * knowledge-graph §7/§8): structural similarity ranks nodes by the Jaccard + * overlap of their outgoing edge targets (same call/import "shape" as the + * resolved node); semantic similarity ranks by embedding cosine similarity. 
+ */ + +import type { EdgeType, GraphStore } from '../storage/types.js' +import { resolveNode, type ResolveNodeResult } from './resolveNode.js' +import { semanticNeighborsForNode, type SemanticHit } from './semanticNeighbors.js' +import type { Lens } from '../../cli/lib/lens.js' + +export interface StructuralSimilarHit { + nodeKey: string + displayName: string + kind: string + /** Jaccard similarity of outgoing edge targets, in [0, 1]. */ + jaccard: number + /** Number of shared outgoing edge targets. */ + shared: number +} + +export interface SimilarResult { + resolved: ResolveNodeResult + lens: Lens + /** Nodes with a similar call/import shape — empty unless lens is structural/hybrid. */ + structural: StructuralSimilarHit[] + /** Semantically similar blobs/symbols — empty unless lens is semantic/hybrid. */ + semantic: SemanticHit[] + /** False when the semantic lens was requested but the storage backend doesn't support it. */ + semanticSupported: boolean +} + +export interface SimilarOptions { + lens?: Lens + topK?: number + /** Edge type whose outgoing targets define a node's "shape" (default: `calls` for symbols, `imports` for files). */ + edgeType?: EdgeType +} + +function jaccard(a: ReadonlySet, b: ReadonlySet): { score: number; shared: number } { + let shared = 0 + for (const k of a) if (b.has(k)) shared++ + const union = a.size + b.size - shared + return { score: union === 0 ? 0 : shared / union, shared } +} + +export async function similar(graph: GraphStore, identifier: string, opts: SimilarOptions = {}): Promise { + const lens = opts.lens ?? 'hybrid' + const resolved = await resolveNode(graph, identifier) + if (resolved.status !== 'found') { + return { resolved, lens, structural: [], semantic: [], semanticSupported: true } + } + + const topK = opts.topK ?? 10 + const edgeType: EdgeType = opts.edgeType ?? (resolved.node.kind === 'file' ? 
'imports' : 'calls') + + let structural: StructuralSimilarHit[] = [] + if (lens !== 'semantic') { + const edges = await graph.allEdges([edgeType]) + const setsBySrc = new Map>() + for (const e of edges) { + const set = setsBySrc.get(e.srcKey) ?? new Set() + set.add(e.dstKey) + setsBySrc.set(e.srcKey, set) + } + + const targetSet = setsBySrc.get(resolved.node.nodeKey) + if (targetSet && targetSet.size > 0) { + const allNodes = await graph.allNodes() + const byKey = new Map(allNodes.map((n) => [n.nodeKey, n])) + const hits: StructuralSimilarHit[] = [] + for (const [nodeKey, set] of setsBySrc) { + if (nodeKey === resolved.node.nodeKey) continue + const node = byKey.get(nodeKey) + if (!node || node.kind !== resolved.node.kind) continue + const { score, shared } = jaccard(targetSet, set) + if (shared === 0) continue + hits.push({ nodeKey, displayName: node.displayName, kind: node.kind, jaccard: score, shared }) + } + hits.sort((a, b) => b.jaccard - a.jaccard || b.shared - a.shared) + structural = hits.slice(0, topK) + } + } + + let semantic: SemanticHit[] = [] + let semanticSupported = true + if (lens !== 'structural') { + const result = await semanticNeighborsForNode(resolved.node, topK) + semantic = result.hits + semanticSupported = result.supported + } + + return { resolved, lens, structural, semantic, semanticSupported } +} diff --git a/src/core/graph/unused.ts b/src/core/graph/unused.ts new file mode 100644 index 0000000..ee8af26 --- /dev/null +++ b/src/core/graph/unused.ts @@ -0,0 +1,42 @@ +/** + * `gitsema unused` (Phase 109, knowledge-graph §7/§8): symbols/files with no + * inbound `calls`/`imports` edges — the structural complement to the semantic + * `dead-concepts` command. 
+ */ + +import type { EdgeType, GraphNodeRecord, GraphStore } from '../storage/types.js' + +export const UNUSED_EDGE_TYPES: EdgeType[] = ['calls', 'imports'] +export const UNUSED_NODE_KINDS = ['file', 'function', 'class', 'method'] + +export interface UnusedOptions { + /** Inbound edge types that count as "used" (default: calls, imports). */ + edgeTypes?: EdgeType[] + /** Node kinds to consider (default: file + function/class/method symbol kinds). */ + kinds?: string[] +} + +export interface UnusedResult { + nodes: GraphNodeRecord[] +} + +export async function unused(graph: GraphStore, opts: UnusedOptions = {}): Promise { + const edgeTypes = opts.edgeTypes ?? UNUSED_EDGE_TYPES + const kinds = opts.kinds ?? UNUSED_NODE_KINDS + + const [allNodes, allEdges] = await Promise.all([ + graph.allNodes(), + graph.allEdges(edgeTypes), + ]) + + const referenced = new Set() + for (const e of allEdges) referenced.add(e.dstKey) + + const nodes = allNodes.filter((n) => + !n.isExternal && + kinds.includes(n.kind) && + !referenced.has(n.nodeKey), + ) + + return { nodes } +} diff --git a/src/core/models/types.ts b/src/core/models/types.ts index 027c3d7..f2cb2b9 100644 --- a/src/core/models/types.ts +++ b/src/core/models/types.ts @@ -70,5 +70,5 @@ export interface SearchResult { /** Cluster label from `cluster_assignments` — populated by `--annotate-clusters` on the search command. */ clusterLabel?: string /** When explain=true, breakdown of score components. 
*/ - signals?: { cosine: number; recency?: number; pathScore?: number; bm25?: number } + signals?: { cosine: number; recency?: number; pathScore?: number; bm25?: number; structural?: number } } diff --git a/src/core/search/analysis/vectorSearch.ts b/src/core/search/analysis/vectorSearch.ts index 844566d..1319d2e 100644 --- a/src/core/search/analysis/vectorSearch.ts +++ b/src/core/search/analysis/vectorSearch.ts @@ -164,6 +164,21 @@ export interface VectorSearchOptions { weightVector?: number weightRecency?: number weightPath?: number + /** + * Fourth ranking signal weight (Phase 109, knowledge-graph §7.2): structural + * proximity from a query anchor along graph edges. Defaults to 0 (no + * structural signal) — when neither this nor `structuralScores` is set, the + * scoring formula is byte-for-byte identical to the pre-Phase-109 three-signal + * (or plain cosine) behavior. + */ + weightStructural?: number + /** + * Precomputed per-blob structural proximity scores (e.g. `1 / (1 + hops)` + * from a query anchor, weighted by edge confidence — see + * `src/core/graph/structuralScore.ts`), keyed by `blobHash`. Missing entries + * score 0. Only consulted when `weightStructural` is set. + */ + structuralScores?: Map query?: string searchChunks?: boolean searchSymbols?: boolean @@ -195,11 +210,14 @@ export async function vectorSearch(queryEmbedding: Embedding, options: VectorSea const { topK = 10, model, recent = false, alpha = 0.8, before, after, - weightVector, weightRecency, weightPath, query = '', + weightVector, weightRecency, weightPath, weightStructural, structuralScores, query = '', searchChunks = false, searchSymbols = false, searchModules = false, branch, negativeQueryEmbedding, negativeLambda, explain, earlyCut = 0, queryText, noCache = false, allowedHashes, } = options + // Per-anchor structural scores vary independently of the cache key's query + // text/embedding fingerprint, so bypass the result cache when present. 
+ const effectiveNoCache = noCache || !!structuralScores // Per-mode row caps to prevent memory spikes on large indexes (review7 §4.4/§4.5). const FILE_CAP = (() => { @@ -221,7 +239,7 @@ export async function vectorSearch(queryEmbedding: Embedding, options: VectorSea const cacheKeyOptions: Record = { topK, model, recent, alpha, before, after, - weightVector, weightRecency, weightPath, query, + weightVector, weightRecency, weightPath, weightStructural, query, searchChunks, searchSymbols, searchModules, branch, negativeLambda, explain, earlyCut, // §11.1 — include a deterministic fingerprint of allowedHashes so that @@ -237,16 +255,17 @@ export async function vectorSearch(queryEmbedding: Embedding, options: VectorSea queryText ?? embeddingFingerprint(queryEmbedding), cacheKeyOptions, ) - if (!noCache) { + if (!effectiveNoCache) { const cached = getCachedResults(cacheKey) if (cached) return cached } - const useThreeSignal = weightVector !== undefined || weightRecency !== undefined || weightPath !== undefined + const useWeightedSignals = weightVector !== undefined || weightRecency !== undefined || weightPath !== undefined || weightStructural !== undefined const wv = weightVector ?? 0.7 const wr = weightRecency ?? 0.2 const wp = weightPath ?? 0.1 - const wTotal = wv + wr + wp || 1 + const ws = weightStructural ?? 0 + const wTotal = wv + wr + wp + ws || 1 const { db, rawDb } = getActiveSession() const AUTO_CANDIDATE_LIMIT = FILE_CAP @@ -424,7 +443,7 @@ export async function vectorSearch(queryEmbedding: Embedding, options: VectorSea const negLambda = options.negativeLambda ?? 0.5 const negNorm = negEmbedding ? 
vectorNorm(negEmbedding) : 0 - const needRecency = recent || useThreeSignal + const needRecency = recent || useWeightedSignals let recencyScores: Map | null = null if (needRecency) { const candidateHashes = [...new Set( @@ -435,7 +454,7 @@ export async function vectorSearch(queryEmbedding: Embedding, options: VectorSea } let pathsByBlob: Map | null = null - if (useThreeSignal) { + if (useWeightedSignals) { const hashes = [...new Set( scoringPool.filter((r) => !r.blobHash.startsWith('\0module:')).map((r) => r.blobHash), )] @@ -475,11 +494,12 @@ export async function vectorSearch(queryEmbedding: Embedding, options: VectorSea score = cosine - (negLambda * negCos) } - if (useThreeSignal) { + if (useWeightedSignals) { const recency = recencyScores?.get(row.blobHash) ?? 0 const blobPaths = pathsByBlob?.get(row.blobHash) ?? [] const pathScore = blobPaths.length > 0 ? Math.max(...blobPaths.map((p) => pathRelevanceScore(query, p))) : 0 - score = (wv * cosine + wr * recency + wp * pathScore) / wTotal + const structScore = structuralScores?.get(row.blobHash) ?? 0 + score = (wv * cosine + wr * recency + wp * pathScore + ws * structScore) / wTotal } else if (recent) { const recency = recencyScores?.get(row.blobHash) ?? 0 score = alpha * cosine + (1 - alpha) * recency @@ -563,13 +583,14 @@ export async function vectorSearch(queryEmbedding: Embedding, options: VectorSea const recency = recencyScores?.get(b.blobHash) const blobPaths = pathsByBlob?.get(b.blobHash) ?? [] const pathScore = blobPaths.length > 0 ? Math.max(...blobPaths.map((p) => pathRelevanceScore(query, p))) : undefined - base.signals = { cosine: b.cosine, recency: recency ?? undefined, pathScore } + const structScore = structuralScores?.get(b.blobHash) + base.signals = { cosine: b.cosine, recency: recency ?? 
undefined, pathScore, structural: structScore } } return base }) - if (!noCache && !allowedHashes) { + if (!effectiveNoCache && !allowedHashes) { setCachedResults(cacheKey, results) } diff --git a/tests/graphLens.test.ts b/tests/graphLens.test.ts new file mode 100644 index 0000000..4f91589 --- /dev/null +++ b/tests/graphLens.test.ts @@ -0,0 +1,283 @@ +/** + * Tests for the Phase 109 `--lens` toggle (knowledge-graph §7/§8): + * - the four-signal `vectorSearch` ranking formula (`weightStructural` / + * `structuralScores`), and its semantic-lens-identical default behavior + * - the `blastRadius` / `relate` / `similar` / `unused` core modules. + */ + +import { describe, it, expect, afterEach } from 'vitest' +import { mkdtempSync, rmSync } from 'node:fs' +import { join } from 'node:path' +import { tmpdir } from 'node:os' +import { openDatabaseAt, withDbSession, type DbSession } from '../src/core/db/sqlite.js' +import { SqliteGraphStore } from '../src/core/storage/sqlite/profile.js' +import { vectorSearch } from '../src/core/search/analysis/vectorSearch.js' +import { blastRadius } from '../src/core/graph/blastRadius.js' +import { relate } from '../src/core/graph/relate.js' +import { similar } from '../src/core/graph/similar.js' +import { unused } from '../src/core/graph/unused.js' +import type { GraphEdgeRecord, GraphNodeRecord } from '../src/core/storage/types.js' + +function bufFromArray(arr: number[]) { + return Buffer.from(new Float32Array(arr).buffer) +} + +const tmpDirs: string[] = [] +afterEach(() => { + for (const dir of tmpDirs.splice(0)) { + rmSync(dir, { recursive: true, force: true }) + } +}) + +function setupDb(): { session: DbSession; tmpDir: string } { + const tmpDir = mkdtempSync(join(tmpdir(), 'gitsema-graphlens-')) + const session = openDatabaseAt(join(tmpDir, 'test.db')) + tmpDirs.push(tmpDir) + return { session, tmpDir } +} + +// --------------------------------------------------------------------------- +// vectorSearch four-signal ranking +// 
--------------------------------------------------------------------------- + +describe('vectorSearch — four-signal ranking (Phase 109)', () => { + it('semantic lens (no structural options) is identical to pre-Phase-109 cosine ranking', async () => { + const { session } = setupDb() + await withDbSession(session, async () => { + session.rawDb.prepare('INSERT INTO blobs (blob_hash, size, indexed_at) VALUES (?, ?, ?)').run('blobA', 10, 1) + session.rawDb.prepare('INSERT INTO blobs (blob_hash, size, indexed_at) VALUES (?, ?, ?)').run('blobB', 10, 1) + session.rawDb.prepare('INSERT INTO embeddings (blob_hash, model, dimensions, vector) VALUES (?, ?, ?, ?)') + .run('blobA', 'm', 4, bufFromArray([1, 0, 0, 0])) + session.rawDb.prepare('INSERT INTO embeddings (blob_hash, model, dimensions, vector) VALUES (?, ?, ?, ?)') + .run('blobB', 'm', 4, bufFromArray([0, 1, 0, 0])) + session.rawDb.prepare('INSERT INTO paths (blob_hash, path) VALUES (?, ?)').run('blobA', 'a.ts') + session.rawDb.prepare('INSERT INTO paths (blob_hash, path) VALUES (?, ?)').run('blobB', 'b.ts') + + const results = await vectorSearch([1, 0, 0, 0], { topK: 10, noCache: true }) + expect(results.map((r) => r.blobHash)).toEqual(['blobA', 'blobB']) + // No weighted-signals options set -> score === cosine similarity. + expect(results[0].score).toBeCloseTo(1, 6) + expect(results[1].score).toBeCloseTo(0, 6) + }) + }) + + it('weightStructural + structuralScores reorders results via the four-signal formula', async () => { + const { session } = setupDb() + await withDbSession(session, async () => { + session.rawDb.prepare('INSERT INTO blobs (blob_hash, size, indexed_at) VALUES (?, ?, ?)').run('blobA', 10, 1) + session.rawDb.prepare('INSERT INTO blobs (blob_hash, size, indexed_at) VALUES (?, ?, ?)').run('blobB', 10, 1) + // blobA is closer to the query vector by cosine... 
+ session.rawDb.prepare('INSERT INTO embeddings (blob_hash, model, dimensions, vector) VALUES (?, ?, ?, ?)') + .run('blobA', 'm', 4, bufFromArray([1, 0, 0, 0])) + session.rawDb.prepare('INSERT INTO embeddings (blob_hash, model, dimensions, vector) VALUES (?, ?, ?, ?)') + .run('blobB', 'm', 4, bufFromArray([0.9, 0.1, 0, 0])) + session.rawDb.prepare('INSERT INTO paths (blob_hash, path) VALUES (?, ?)').run('blobA', 'a.ts') + session.rawDb.prepare('INSERT INTO paths (blob_hash, path) VALUES (?, ?)').run('blobB', 'b.ts') + + // ...but blobB is structurally adjacent to the query anchor (score 1), + // while blobA has no structural relation (score 0). + const structuralScores = new Map([['blobB', 1]]) + + const results = await vectorSearch([1, 0, 0, 0], { + topK: 10, + noCache: true, + weightVector: 0, + weightRecency: 0, + weightPath: 0, + weightStructural: 1, + structuralScores, + explain: true, + }) + + const byHash = new Map(results.map((r) => [r.blobHash, r])) + // wTotal = 0+0+0+1 = 1, so score === structScore directly. + expect(byHash.get('blobB')!.score).toBeCloseTo(1, 6) + expect(byHash.get('blobA')!.score).toBeCloseTo(0, 6) + expect(byHash.get('blobB')!.signals?.structural).toBeCloseTo(1, 6) + expect(byHash.get('blobA')!.signals?.structural ?? 0).toBeCloseTo(0, 6) + // blobB now outranks blobA despite the lower cosine similarity. 
+ expect(results[0].blobHash).toBe('blobB') + }) + }) + + it('blends all four signals according to the supplied weights', async () => { + const { session } = setupDb() + await withDbSession(session, async () => { + session.rawDb.prepare('INSERT INTO blobs (blob_hash, size, indexed_at) VALUES (?, ?, ?)').run('blobA', 10, 1) + session.rawDb.prepare('INSERT INTO embeddings (blob_hash, model, dimensions, vector) VALUES (?, ?, ?, ?)') + .run('blobA', 'm', 4, bufFromArray([1, 0, 0, 0])) + session.rawDb.prepare('INSERT INTO paths (blob_hash, path) VALUES (?, ?)').run('blobA', 'a.ts') + + const structuralScores = new Map([['blobA', 0.5]]) + const results = await vectorSearch([1, 0, 0, 0], { + topK: 10, + noCache: true, + weightVector: 1, + weightRecency: 0, + weightPath: 0, + weightStructural: 1, + structuralScores, + explain: true, + }) + + // wTotal = 1+0+0+1 = 2; cosine=1, structural=0.5 -> score = (1*1 + 1*0.5) / 2 = 0.75 + expect(results[0].score).toBeCloseTo(0.75, 6) + }) + }) +}) + +// --------------------------------------------------------------------------- +// blastRadius / relate / similar / unused fixture graph: +// +// file:a.ts --defines--> symbol:A, symbol:B, symbol:C +// symbol:A --calls--> symbol:B --calls--> symbol:C --calls--> external:lib +// file:a.ts --imports--> external:lib +// file:b.ts --imports--> external:lib +// +// --------------------------------------------------------------------------- + +const NODES: GraphNodeRecord[] = [ + { nodeKey: 'file:a.ts', kind: 'file', displayName: 'a.ts', path: 'a.ts', currentBlobHash: 'blobA' }, + { nodeKey: 'file:b.ts', kind: 'file', displayName: 'b.ts', path: 'b.ts', currentBlobHash: 'blobB' }, + { nodeKey: 'symbol:a.ts#A#sig1', kind: 'function', displayName: 'A', path: 'a.ts', currentBlobHash: 'blobA' }, + { nodeKey: 'symbol:a.ts#B#sig2', kind: 'function', displayName: 'B', path: 'a.ts', currentBlobHash: 'blobA' }, + { nodeKey: 'symbol:a.ts#C#sig3', kind: 'function', displayName: 'C', path: 'a.ts', 
currentBlobHash: 'blobA' }, + { nodeKey: 'external:lib', kind: 'external', displayName: 'lib', isExternal: true }, + { nodeKey: 'external:isolated', kind: 'external', displayName: 'isolated', isExternal: true }, +] + +const EDGES: GraphEdgeRecord[] = [ + { srcKey: 'file:a.ts', dstKey: 'symbol:a.ts#A#sig1', edgeType: 'defines' }, + { srcKey: 'file:a.ts', dstKey: 'symbol:a.ts#B#sig2', edgeType: 'defines' }, + { srcKey: 'file:a.ts', dstKey: 'symbol:a.ts#C#sig3', edgeType: 'defines' }, + { srcKey: 'symbol:a.ts#A#sig1', dstKey: 'symbol:a.ts#B#sig2', edgeType: 'calls' }, + { srcKey: 'symbol:a.ts#B#sig2', dstKey: 'symbol:a.ts#C#sig3', edgeType: 'calls' }, + { srcKey: 'symbol:a.ts#C#sig3', dstKey: 'external:lib', edgeType: 'calls' }, + { srcKey: 'file:a.ts', dstKey: 'external:lib', edgeType: 'imports' }, + { srcKey: 'file:b.ts', dstKey: 'external:lib', edgeType: 'imports' }, +] + +async function withFusionGraph(fn: (graph: SqliteGraphStore, session: DbSession) => Promise): Promise { + const { session } = setupDb() + return withDbSession(session, async () => { + const graph = new SqliteGraphStore() + await graph.replaceAll(NODES, EDGES) + + // Seed embeddings/symbols so semanticNeighborsForNode() has data to rank. 
+ session.rawDb.prepare('INSERT INTO blobs (blob_hash, size, indexed_at) VALUES (?, ?, ?)').run('blobA', 10, 1) + session.rawDb.prepare('INSERT INTO blobs (blob_hash, size, indexed_at) VALUES (?, ?, ?)').run('blobB', 10, 1) + session.rawDb.prepare('INSERT INTO embeddings (blob_hash, model, dimensions, vector) VALUES (?, ?, ?, ?)') + .run('blobA', 'm', 4, bufFromArray([1, 0, 0, 0])) + session.rawDb.prepare('INSERT INTO embeddings (blob_hash, model, dimensions, vector) VALUES (?, ?, ?, ?)') + .run('blobB', 'm', 4, bufFromArray([0.9, 0.1, 0, 0])) + session.rawDb.prepare('INSERT INTO paths (blob_hash, path) VALUES (?, ?)').run('blobA', 'a.ts') + session.rawDb.prepare('INSERT INTO paths (blob_hash, path) VALUES (?, ?)').run('blobB', 'b.ts') + + const insertSymbol = session.rawDb.prepare( + 'INSERT INTO symbols (blob_hash, start_line, end_line, symbol_name, symbol_kind, language, qualified_name, signature_hash) ' + + 'VALUES (?, ?, ?, ?, ?, ?, ?, ?)', + ) + const symA = insertSymbol.run('blobA', 1, 2, 'A', 'function', 'typescript', 'A', 'sig1').lastInsertRowid as number + const symB = insertSymbol.run('blobA', 3, 4, 'B', 'function', 'typescript', 'B', 'sig2').lastInsertRowid as number + const symC = insertSymbol.run('blobA', 5, 6, 'C', 'function', 'typescript', 'C', 'sig3').lastInsertRowid as number + + const insertSymEmb = session.rawDb.prepare( + 'INSERT INTO symbol_embeddings (symbol_id, model, dimensions, vector) VALUES (?, ?, ?, ?)', + ) + insertSymEmb.run(symA, 'm', 4, bufFromArray([1, 0, 0, 0])) + insertSymEmb.run(symB, 'm', 4, bufFromArray([0.9, 0.1, 0, 0])) + insertSymEmb.run(symC, 'm', 4, bufFromArray([0, 1, 0, 0])) + + return fn(graph, session) + }) +} + +describe('blastRadius (Phase 109)', () => { + it('lens=structural returns only structural dependents', async () => { + await withFusionGraph(async (graph) => { + const result = await blastRadius(graph, 'C', { lens: 'structural', depth: 3 }) + expect(result.resolved.status).toBe('found') + 
expect(result.structural.map((h) => h.displayName).sort()).toEqual(['A', 'B']) + expect(result.semantic).toEqual([]) + }) + }) + + it('lens=semantic returns only semantically related blobs/symbols', async () => { + await withFusionGraph(async (graph) => { + const result = await blastRadius(graph, 'C', { lens: 'semantic' }) + expect(result.resolved.status).toBe('found') + expect(result.structural).toEqual([]) + expect(result.semanticSupported).toBe(true) + // C's embedding [0,1,0,0] is closest to nothing in this fixture (A/B are + // orthogonal-ish), but the call should still succeed and return hits sorted by score. + expect(Array.isArray(result.semantic)).toBe(true) + }) + }) + + it('lens=hybrid (default) returns both structural and semantic sections', async () => { + await withFusionGraph(async (graph) => { + const result = await blastRadius(graph, 'C', { depth: 3 }) + expect(result.lens).toBe('hybrid') + expect(result.structural.map((h) => h.displayName).sort()).toEqual(['A', 'B']) + expect(result.semanticSupported).toBe(true) + }) + }) + + it('returns not-found for an unknown identifier', async () => { + await withFusionGraph(async (graph) => { + const result = await blastRadius(graph, 'does-not-exist') + expect(result.resolved.status).toBe('not-found') + expect(result.structural).toEqual([]) + expect(result.semantic).toEqual([]) + }) + }) +}) + +describe('relate (Phase 109)', () => { + it('returns depth-1 callers, callees, and semantically similar hits', async () => { + await withFusionGraph(async (graph) => { + const result = await relate(graph, 'B') + expect(result.resolved.status).toBe('found') + expect(result.callers.map((h) => h.displayName)).toEqual(['A']) + expect(result.callees.map((h) => h.displayName)).toEqual(['C']) + expect(result.semanticSupported).toBe(true) + // B's embedding [0.9,0.1,0,0] is most similar to A's [1,0,0,0]. 
+ expect(result.similar[0]?.symbolName).toBe('A') + }) + }) +}) + +describe('similar (Phase 109)', () => { + it('lens=structural finds files with overlapping import targets', async () => { + await withFusionGraph(async (graph) => { + const result = await similar(graph, 'a.ts', { lens: 'structural' }) + expect(result.resolved.status).toBe('found') + expect(result.structural.map((h) => h.displayName)).toEqual(['b.ts']) + expect(result.structural[0].shared).toBe(1) + expect(result.semantic).toEqual([]) + }) + }) + + it('lens=semantic ranks by embedding similarity', async () => { + await withFusionGraph(async (graph) => { + const result = await similar(graph, 'A', { lens: 'semantic' }) + expect(result.resolved.status).toBe('found') + expect(result.structural).toEqual([]) + expect(result.semanticSupported).toBe(true) + expect(result.semantic[0]?.symbolName).toBe('B') + }) + }) +}) + +describe('unused (Phase 109)', () => { + it('returns nodes with no inbound calls/imports edges', async () => { + await withFusionGraph(async (graph) => { + const result = await unused(graph) + const keys = result.nodes.map((n) => n.nodeKey).sort() + // A has no inbound calls/imports (only `defines`); file:a.ts/file:b.ts + // have no inbound calls/imports either. B and C are `calls` targets, and + // external:* nodes are excluded. + expect(keys).toEqual(['file:a.ts', 'file:b.ts', 'symbol:a.ts#A#sig1'].sort()) + }) + }) +}) From a9581844756335593bbc9a3721704786f6740854 Mon Sep 17 00:00:00 2001 From: Claude Date: Mon, 15 Jun 2026 19:29:31 +0000 Subject: [PATCH 3/6] Fix Windows CI: close sqlite handles before rmSync in graph tests better-sqlite3 keeps test.db open after withDbSession() returns; rmSync() on its temp dir fails with EBUSY on Windows (not Linux/macOS). Close session.rawDb in a finally block before cleanup, and document the pattern in CLAUDE.md. 
https://claude.ai/code/session_01BjWermmuwNjWvx2dXfDU6q --- CLAUDE.md | 6 ++++++ tests/graphLens.test.ts | 11 +++++++++-- tests/graphTraversal.test.ts | 14 +++++++++----- 3 files changed, 24 insertions(+), 7 deletions(-) diff --git a/CLAUDE.md b/CLAUDE.md index a797686..1501bf6 100644 --- a/CLAUDE.md +++ b/CLAUDE.md @@ -101,6 +101,12 @@ pnpm test -- --watch # watch mode during development - Mock modules with `vi.mock()`, spy with `vi.fn()`, clean up with `vi.restoreAllMocks()` in `afterEach` - Integration tests use `mkdtempSync()` + `rmSync()` for isolated temp Git repos - `withDbSession()` helper creates isolated temp SQLite DBs per test +- **Always close `session.rawDb` (`better-sqlite3`) before `rmSync()`-ing its temp + directory.** On Windows, `rmSync` on a directory containing an open SQLite handle + fails with `EBUSY: resource busy or locked, unlink '...\test.db'` — this passes on + Linux/macOS (CI runs `ubuntu-latest` by default) but fails the Windows CI job. Call + `session.rawDb.close()` (e.g. in a `try`/`finally` around `withDbSession()`) before + the test's temp dir is removed in `afterEach`. --- diff --git a/tests/graphLens.test.ts b/tests/graphLens.test.ts index 4f91589..5937421 100644 --- a/tests/graphLens.test.ts +++ b/tests/graphLens.test.ts @@ -59,6 +59,7 @@ describe('vectorSearch — four-signal ranking (Phase 109)', () => { expect(results[0].score).toBeCloseTo(1, 6) expect(results[1].score).toBeCloseTo(0, 6) }) + session.rawDb.close() }) it('weightStructural + structuralScores reorders results via the four-signal formula', async () => { @@ -98,6 +99,7 @@ describe('vectorSearch — four-signal ranking (Phase 109)', () => { // blobB now outranks blobA despite the lower cosine similarity. 
expect(results[0].blobHash).toBe('blobB') }) + session.rawDb.close() }) it('blends all four signals according to the supplied weights', async () => { @@ -123,6 +125,7 @@ describe('vectorSearch — four-signal ranking (Phase 109)', () => { // wTotal = 1+0+0+1 = 2; cosine=1, structural=0.5 -> score = (1*1 + 1*0.5) / 2 = 0.75 expect(results[0].score).toBeCloseTo(0.75, 6) }) + session.rawDb.close() }) }) @@ -159,7 +162,8 @@ const EDGES: GraphEdgeRecord[] = [ async function withFusionGraph(fn: (graph: SqliteGraphStore, session: DbSession) => Promise): Promise { const { session } = setupDb() - return withDbSession(session, async () => { + try { + return await withDbSession(session, async () => { const graph = new SqliteGraphStore() await graph.replaceAll(NODES, EDGES) @@ -189,7 +193,10 @@ async function withFusionGraph(fn: (graph: SqliteGraphStore, session: DbSessi insertSymEmb.run(symC, 'm', 4, bufFromArray([0, 1, 0, 0])) return fn(graph, session) - }) + }) + } finally { + session.rawDb.close() + } } describe('blastRadius (Phase 109)', () => { diff --git a/tests/graphTraversal.test.ts b/tests/graphTraversal.test.ts index ef71863..a9c198b 100644 --- a/tests/graphTraversal.test.ts +++ b/tests/graphTraversal.test.ts @@ -56,11 +56,15 @@ afterEach(() => { async function withGraph(fn: (graph: SqliteGraphStore, session: DbSession) => Promise): Promise { const { session, tmpDir } = setupFixtureDb() tmpDirs.push(tmpDir) - return withDbSession(session, async () => { - const graph = new SqliteGraphStore() - await graph.replaceAll(NODES, EDGES) - return fn(graph, session) - }) + try { + return await withDbSession(session, async () => { + const graph = new SqliteGraphStore() + await graph.replaceAll(NODES, EDGES) + return fn(graph, session) + }) + } finally { + session.rawDb.close() + } } describe('SqliteGraphStore.neighbors', () => { From 1fc6b31d608b1b2b50b90758be566ae48dedba29 Mon Sep 17 00:00:00 2001 From: Claude Date: Mon, 15 Jun 2026 19:32:41 +0000 Subject: [PATCH 4/6] 
Regenerate docs/PLAN.md table of contents Run pnpm gen:toc to include the Phase 109 status section added in the previous commit. https://claude.ai/code/session_01BjWermmuwNjWvx2dXfDU6q --- docs/PLAN.md | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-) diff --git a/docs/PLAN.md b/docs/PLAN.md index 6db1bb3..77f8e72 100644 --- a/docs/PLAN.md +++ b/docs/PLAN.md @@ -115,11 +115,12 @@ | [Phase 97 — Full-toolset guide, tool interpretation registry, skill generation, Ollama docs](#phase-97-—-full-toolset-guide-tool-interpretation-registry-skill-generation-ollama-docs) | 3280 | | [Phase 98 — CLI-based AI tool backends for narrator/guide](#phase-98-—-cli-based-ai-tool-backends-for-narratorguide) | 3341 | | [Phase 99 — `--provider ollama` for narrator/guide + Ollama model discovery](#phase-99-—-provider-ollama-for-narratorguide-ollama-model-discovery) | 3405 | -| [Phase 100 — Persistent, registry-backed server-side repo storage](#phase-100-—-persistent-registry-backed-server-side-repo-storage) | 3471 | -| [Phases 101–103 — Pluggable storage backends & index scoping](#phases-101103-—-pluggable-storage-backends-index-scoping) | 3539 | -| [Phase 104 — Full-toolset guide coverage, per-command `--narrate`, and a guided `gitsema setup` wizard](#phase-104-—-full-toolset-guide-coverage-per-command-narrate-and-a-guided-gitsema-setup-wizard) | 3741 | | [Long-Term Investments](#long-term-investments) | 3446 | | [Non-goals for now (revisited later)](#non-goals-for-now-revisited-later) | 3463 | +| [Phase 100 — Persistent, registry-backed server-side repo storage](#phase-100-—-persistent-registry-backed-server-side-repo-storage) | 3471 | +| [Phases 101–103 — Pluggable storage backends & index scoping](#phases-101–103-—-pluggable-storage-backends-index-scoping) | 3539 | +| [Phase 104 — Full-toolset guide coverage, per-command `--narrate`, and a guided `gitsema setup` wizard](#phase-104-—-full-toolset-guide-coverage-per-command-narrate-and-a-guided-gitsema-setup-wizard) | 3741 | 
+| [Knowledge Graph Track (Phases 105–112) — *planned*](#knowledge-graph-track-phases-105–112-—-planned) | 3887 | --- From 8a82b94324883c8131c91c6afa2a8dd12efe9fbf Mon Sep 17 00:00:00 2001 From: Claude Date: Mon, 15 Jun 2026 19:36:40 +0000 Subject: [PATCH 5/6] Swap Phase 111/112 order and add CLI view to unified graph UI Lens coverage & parity sweep moves to Phase 111 (so it still lands before the UI phase and covers the 110 fusion commands); the unified graph UI moves to Phase 112 and now also specifies a CLI/text-mode subgraph view alongside the HTML view. https://claude.ai/code/session_01BjWermmuwNjWvx2dXfDU6q --- docs/PLAN.md | 232 +++++++++++++++++++++++++-------------------------- 1 file changed, 116 insertions(+), 116 deletions(-) diff --git a/docs/PLAN.md b/docs/PLAN.md index 77f8e72..274e6ee 100644 --- a/docs/PLAN.md +++ b/docs/PLAN.md @@ -8,119 +8,119 @@ | Section | Line | |---|---:| -| [Vision](#vision) | 126 | -| [Guiding principles](#guiding-principles) | 132 | -| [Architecture overview](#architecture-overview) | 142 | -| [Project structure](#project-structure) | 162 | -| [Section I - Phases](#section-i-phases) | 214 | -| [Phase 1 — Foundation](#phase-1-—-foundation) | 216 | -| [Phase 2 — Git walking](#phase-2-—-git-walking) | 258 | -| [Phase 3 — Embedding system](#phase-3-—-embedding-system) | 282 | -| [Phase 4 — Indexing](#phase-4-—-indexing) | 320 | -| [Phase 5 — Search · *MVP deliverable*](#phase-5-—-search-·-mvp-deliverable) | 346 | -| [Phase 6 — Commit mapping](#phase-6-—-commit-mapping) | 379 | -| [Phase 7 — Time-aware queries · *Phase 2 deliverable*](#phase-7-—-time-aware-queries-·-phase-2-deliverable) | 416 | -| [Phase 8 — File-type-aware embedding models](#phase-8-—-file-type-aware-embedding-models) | 449 | -| [Phase 9 — Performance](#phase-9-—-performance) | 487 | -| [Phase 10 — Smarter semantics](#phase-10-—-smarter-semantics) | 525 | -| [Phase 11 — Advanced features + MCP](#phase-11-—-advanced-features-mcp) | 570 | -| [Phase 11b — Content 
access and semantic concept tracking](#phase-11b-—-content-access-and-semantic-concept-tracking) | 641 | -| [Key technical decisions](#key-technical-decisions) | 758 | -| [Risk register](#risk-register) | 770 | -| [Phase 12 — CLI consolidation & robust per-file indexing](#phase-12-—-cli-consolidation-robust-per-file-indexing) | 782 | -| [Recent progress (snapshot: 2026-04-01)](#recent-progress-snapshot-2026-04-01) | 812 | -| [Phase 13 — Standalone model server for embeddings](#phase-13-—-standalone-model-server-for-embeddings) | 828 | -| [Phase 14 — Infrastructure, tooling, and maintenance](#phase-14-—-infrastructure-tooling-and-maintenance) | 911 | -| [Phase 14b — Search result deduplication](#phase-14b-—-search-result-deduplication) | 968 | -| [Phase 15 — Branch awareness](#phase-15-—-branch-awareness) | 1002 | -| [Phase 16 — Remote-repository indexing (server-managed clone, RAM-backed working tree, persistent DB)](#phase-16-—-remote-repository-indexing-server-managed-clone-ram-backed-working-tree-persistent-db) | 1074 | -| [Phase 17 — Remote-indexing hardening and SSH support](#phase-17-—-remote-indexing-hardening-and-ssh-support) | 1332 | -| [Phase 18 — Reliability, tests, and query caching](#phase-18-—-reliability-tests-and-query-caching) | 1403 | -| [Phase 19 — Smarter chunking, semantic blame & symbol-level embeddings](#phase-19-—-smarter-chunking-semantic-blame-symbol-level-embeddings) | 1417 | -| [Phase 20 — Dead-concept detection & refactor impact analysis](#phase-20-—-dead-concept-detection-refactor-impact-analysis) | 1482 | -| [Phase 21 — Semantic clustering & concept graph](#phase-21-—-semantic-clustering-concept-graph) | 1495 | -| [Phase 22 — Temporal cluster diff](#phase-22-—-temporal-cluster-diff) | 1508 | -| [Phase 23 — Cluster timeline](#phase-23-—-cluster-timeline) | 1521 | -| [Phase 24 — Enhanced cluster labeling](#phase-24-—-enhanced-cluster-labeling) | 1535 | -| [Phase 25 — Interactive HTML 
visualizations](#phase-25-—-interactive-html-visualizations) | 1549 | -| [Phase 26 — CLI naming consolidation & conceptual diff](#phase-26-—-cli-naming-consolidation-conceptual-diff) | 1564 | -| [Phase 27 — Semantic change-point detection](#phase-27-—-semantic-change-point-detection) | 1605 | -| [Phase 28 — Persistent configuration management](#phase-28-—-persistent-configuration-management) | 1665 | -| [Phase 29 — Automated indexing via Git hooks](#phase-29-—-automated-indexing-via-git-hooks) | 1692 | -| [Phase 30 — Commit message semantic indexing](#phase-30-—-commit-message-semantic-indexing) | 1708 | -| [Phase 31 — Semantic concept authorship ranking](#phase-31-—-semantic-concept-authorship-ranking) | 1759 | -| [Phase 32 — Branch and merge awareness](#phase-32-—-branch-and-merge-awareness) | 1809 | -| [Phase 33 — Multi-level hierarchical indexing](#phase-33-—-multi-level-hierarchical-indexing) | 1870 | -| [Phase 34 — Feature adoption & cross-cutting improvements](#phase-34-—-feature-adoption-cross-cutting-improvements) | 1926 | -| [Phase 35 — Multi-model DB, per-command model flags, clear-model, multi-model search](#phase-35-—-multi-model-db-per-command-model-flags-clear-model-multi-model-search) | 1964 | -| [Phase 36 — Vector Index (VSS), Int8 Quantization, ANN Search](#phase-36-—-vector-index-vss-int8-quantization-ann-search) | 2002 | -| [Phase 37 — Quick Wins: Selective Indexing, Code-to-Code Search, Negative Examples, Result Explanation](#phase-37-—-quick-wins-selective-indexing-code-to-code-search-negative-examples-result-explanation) | 2076 | -| [Phase 38 — Medium Effort: Documentation Gap Analysis, Semantic Bisect, GC, Boolean Queries](#phase-38-—-medium-effort-documentation-gap-analysis-semantic-bisect-gc-boolean-queries) | 2101 | -| [Phase 39 — Analysis Features: Contributor Profiles, Refactoring, Lifecycle, CI Diff](#phase-39-—-analysis-features-contributor-profiles-refactoring-lifecycle-ci-diff) | 2126 | -| [Phase 40 — Visualization & Scale: Codebase 
Map, Temporal Heatmap, Remote Index, Cherry-Pick](#phase-40-—-visualization-scale-codebase-map-temporal-heatmap-remote-index-cherry-pick) | 2151 | -| [Phase 41 — Multi-Repo Unified Index *(completed v0.43.0)*](#phase-41-—-multi-repo-unified-index-completed-v0430) | 2182 | -| [Phase 42 — IDE / LSP Integration *(completed v0.44.0)*](#phase-42-—-ide-lsp-integration-completed-v0440) | 2198 | -| [Phase 43 — Security Pattern Detection *(completed v0.45.0)*](#phase-43-—-security-pattern-detection-completed-v0450) | 2214 | -| [Phase 44 — Codebase Health Timeline *(completed v0.46.0)*](#phase-44-—-codebase-health-timeline-completed-v0460) | 2229 | -| [Phase 45 — Technical Debt Scoring *(completed v0.47.0)*](#phase-45-—-technical-debt-scoring-completed-v0470) | 2244 | -| [Phase 46 — Evolution Alerts and Commit URL Construction *(completed v0.48.0)*](#phase-46-—-evolution-alerts-and-commit-url-construction-completed-v0480) | 2261 | -| [Phase 47 — Richer Indexing Progress, Embed Latency Stats, and Incremental-by-Default Messaging](#phase-47-—-richer-indexing-progress-embed-latency-stats-and-incremental-by-default-messaging) | 2276 | -| [Phase 48 — Batch Embedding and Provider Throughput ✅ Implemented](#phase-48-—-batch-embedding-and-provider-throughput-✅-implemented) | 2306 | -| [Phase 49 — Auto-VSS Default Path ✅ Implemented (v0.51.0)](#phase-49-—-auto-vss-default-path-✅-implemented-v0510) | 2321 | -| [Phase 50 — Real Multi-Repo Search ✅ Implemented (v0.52.0)](#phase-50-—-real-multi-repo-search-✅-implemented-v0520) | 2333 | -| [Phase 51 — LSP Completion of the Protocol ✅ Implemented (v0.53.0)](#phase-51-—-lsp-completion-of-the-protocol-✅-implemented-v0530) | 2345 | -| [Phase 52 — Query Expansion ✅ Implemented (v0.54.0)](#phase-52-—-query-expansion-✅-implemented-v0540) | 2358 | -| [Phase 53 — Saved Searches and Watch Mode ✅ Implemented (v0.55.0)](#phase-53-—-saved-searches-and-watch-mode-✅-implemented-v0550) | 2370 | -| [Phase 54 — Index Bundle Export / Import ✅ Implemented 
(v0.56.0)](#phase-54-—-index-bundle-export-import-✅-implemented-v0560) | 2382 | -| [Phase 55 — Embedding Space Explorer (Web UI) ✅ Implemented (v0.57.0)](#phase-55-—-embedding-space-explorer-web-ui-✅-implemented-v0570) | 2393 | -| [Phase 56 — LLM-Powered Evolution Narration ✅ Implemented (v0.58.0)](#phase-56-—-llm-powered-evolution-narration-✅-implemented-v0580) | 2404 | -| [Phase 57 — GitHub Actions Integration for CI Diff ✅ Implemented (v0.59.0)](#phase-57-—-github-actions-integration-for-ci-diff-✅-implemented-v0590) | 2415 | -| [Phase 58 — Structured Security Scan (Static + Semantic) ✅ Implemented (v0.60.0)](#phase-58-—-structured-security-scan-static-semantic-✅-implemented-v0600) | 2426 | -| [Phase 59 — `gitsema tools` Subcommand Group (Protocol Servers) ✅ Implemented (v0.61.0)](#phase-59-—-gitsema-tools-subcommand-group-protocol-servers-✅-implemented-v0610) | 2438 | -| [Phase 60 — Uniform Column Headers + `--no-headings` Across All Commands ✅ Implemented (v.0.62.0)](#phase-60-—-uniform-column-headers-no-headings-across-all-commands-✅-implemented-v0620) | 2479 | -| [Phase 61 — MCP/HTTP Parity + Semantic PR Report *(completed v0.64.0)*](#phase-61-—-mcphttp-parity-semantic-pr-report-completed-v0640) | 2544 | -| [Phase 62 — Heavy Batching for Ollama + HTTP Providers *(completed v0.67.0)*](#phase-62-—-heavy-batching-for-ollama-http-providers-completed-v0670) | 2564 | -| [Phase 63 — Indexing Auto-Defaults and Adaptive Tuning *(completed v0.65.0)*](#phase-63-—-indexing-auto-defaults-and-adaptive-tuning-completed-v0650) | 2578 | -| [Phase 64 — Search Scalability + AI Retrieval Reliability *(completed v0.66.0)*](#phase-64-—-search-scalability-ai-retrieval-reliability-completed-v0660) | 2594 | -| [Phase 65 — Incident Triage Bundle *(completed v0.68.0)*](#phase-65-—-incident-triage-bundle-completed-v0680) | 2608 | -| [Phase 66 — Policy Checks for CI *(completed v0.68.0)*](#phase-66-—-policy-checks-for-ci-completed-v0680) | 2616 | -| [Phase 67 — Ownership Heatmap by 
Concept *(completed v0.68.0)*](#phase-67-—-ownership-heatmap-by-concept-completed-v0680) | 2624 | -| [Phase 68 — Persistent Workflow Templates *(completed v0.68.0)*](#phase-68-—-persistent-workflow-templates-completed-v0680) | 2632 | -| [Phase 69 — Pipelined Batch Indexing *(completed v0.68.0)*](#phase-69-—-pipelined-batch-indexing-completed-v0680) | 2640 | -| [Phase 70 — Unified Output System *(completed v0.69.0)*](#phase-70-—-unified-output-system-completed-v0690) | 2648 | -| [Phase 71 — Index Status Dashboard + Model Management *(completed v0.71.0)*](#phase-71-—-index-status-dashboard-model-management-completed-v0710) | 2665 | -| [Planned Phases (72+)](#planned-phases-72) | 2687 | -| [Phase 71 — Operational Readiness: Metrics, Rate Limiting, and OpenAPI *(completed v0.71.0)*](#phase-71-—-operational-readiness-metrics-rate-limiting-and-openapi-completed-v0710) | 2693 | -| [Phase 72 — HTTP Route Parity for All Analysis Commands *(completed v0.72.0)*](#phase-72-—-http-route-parity-for-all-analysis-commands-completed-v0720) | 2706 | -| [Phase 73 — Deployment Guide and Docker Infrastructure](#phase-73-—-deployment-guide-and-docker-infrastructure) | 2718 | -| [Phase 74 — `gitsema status` Scale Warnings + Extended `gitsema doctor` Pre-flight](#phase-74-—-gitsema-status-scale-warnings-extended-gitsema-doctor-pre-flight) | 2731 | -| [Phase 75 — Per-Repo Access Control on HTTP Server](#phase-75-—-per-repo-access-control-on-http-server) | 2744 | -| [Phase 76 — Complete `htmlRenderer.ts` Modularisation](#phase-76-—-complete-htmlrendererts-modularisation) | 2758 | -| [Phase 77 — Unified Indexing + Search Level Concept](#phase-77-—-unified-indexing-search-level-concept) | 2771 | -| [Phase 82 — Auto-cap Search Memory *(completed v0.79.0)*](#phase-82-—-auto-cap-search-memory-completed-v0790) | 2787 | -| [Phase 83 — Parallel Commit-Message Embedding *(completed v0.80.0)*](#phase-83-—-parallel-commit-message-embedding-completed-v0800) | 2799 | -| [Phase 84 — LSP: documentSymbol + 
Improved definition/references *(completed v0.81.0)*](#phase-84-—-lsp-documentsymbol-improved-definitionreferences-completed-v0810) | 2813 | -| [Phase 85 — Tier-1 Reliability: Test Isolation, SQL Sampling, Batch Dedup *(completed v0.84.0)*](#phase-85-—-tier-1-reliability-test-isolation-sql-sampling-batch-dedup-completed-v0840) | 2827 | -| [Phase 86 — Tier-2 Code Organisation: MCP Modularization + Search Module Split + CLI Register Split *(completed v0.85.0)*](#phase-86-—-tier-2-code-organisation-mcp-modularization-search-module-split-cli-register-split-completed-v0850) | 2855 | -| [Phase 87 — Tier-3 Robustness: Embed Retry, Queue Backpressure, Atomic FTS5, Body Limit *(completed v0.86.0)*](#phase-87-—-tier-3-robustness-embed-retry-queue-backpressure-atomic-fts5-body-limit-completed-v0860) | 2883 | -| [Phase 88 — Tier-4 Scale/Features: LLM Narrator Tests + Docs Sync Check *(completed v0.87.0)*](#phase-88-—-tier-4-scalefeatures-llm-narrator-tests-docs-sync-check-completed-v0870) | 2915 | -| [Phase 89 — Tier-5 Code Quality: review6 §11 Detailed Findings *(completed v0.88.0)*](#phase-89-—-tier-5-code-quality-review6-§11-detailed-findings-completed-v0880) | 2939 | -| [Phase 90 — Model Local Names (Shorthand / globalName) *(completed v0.89.0)*](#phase-90-—-model-local-names-shorthand-globalname-completed-v0890) | 3019 | -| [Phase 91 — 8 Productized Usage Patterns (review7 §5) *(completed v0.90.0)*](#phase-91-—-8-productized-usage-patterns-review7-§5-completed-v0900) | 3074 | -| [Phase 92 — review7 Improvement Bundle *(completed, 2026-04-09)*](#phase-92-—-review7-improvement-bundle-completed-2026-04-09) | 3120 | -| [Phase 93 — Time filter semantics & pagination stability](#phase-93-—-time-filter-semantics-pagination-stability) | 3162 | -| [Phase 94 — review8 CLI Wiring & Documentation Restoration *(completed v0.91.0)*](#phase-94-—-review8-cli-wiring-documentation-restoration-completed-v0910) | 3192 | -| [Phase 95 — Flag unification (review8 §8.6/§8.9) *(completed 
v0.92.0)*](#phase-95-—-flag-unification-review8-§86§89-completed-v0920) | 3224 | -| [Phase 96 — LLM Narrator/Explainer/Guide via chattydeer *(completed v0.93.0)*](#phase-96-—-llm-narratorexplainerguide-via-chattydeer-completed-v0930) | 3247 | -| [Phase 97 — Full-toolset guide, tool interpretation registry, skill generation, Ollama docs](#phase-97-—-full-toolset-guide-tool-interpretation-registry-skill-generation-ollama-docs) | 3280 | -| [Phase 98 — CLI-based AI tool backends for narrator/guide](#phase-98-—-cli-based-ai-tool-backends-for-narratorguide) | 3341 | -| [Phase 99 — `--provider ollama` for narrator/guide + Ollama model discovery](#phase-99-—-provider-ollama-for-narratorguide-ollama-model-discovery) | 3405 | -| [Long-Term Investments](#long-term-investments) | 3446 | -| [Non-goals for now (revisited later)](#non-goals-for-now-revisited-later) | 3463 | -| [Phase 100 — Persistent, registry-backed server-side repo storage](#phase-100-—-persistent-registry-backed-server-side-repo-storage) | 3471 | -| [Phases 101–103 — Pluggable storage backends & index scoping](#phases-101–103-—-pluggable-storage-backends-index-scoping) | 3539 | -| [Phase 104 — Full-toolset guide coverage, per-command `--narrate`, and a guided `gitsema setup` wizard](#phase-104-—-full-toolset-guide-coverage-per-command-narrate-and-a-guided-gitsema-setup-wizard) | 3741 | -| [Knowledge Graph Track (Phases 105–112) — *planned*](#knowledge-graph-track-phases-105–112-—-planned) | 3887 | +| [Vision](#vision) | 127 | +| [Guiding principles](#guiding-principles) | 133 | +| [Architecture overview](#architecture-overview) | 143 | +| [Project structure](#project-structure) | 163 | +| [Section I - Phases](#section-i-phases) | 215 | +| [Phase 1 — Foundation](#phase-1-—-foundation) | 217 | +| [Phase 2 — Git walking](#phase-2-—-git-walking) | 259 | +| [Phase 3 — Embedding system](#phase-3-—-embedding-system) | 283 | +| [Phase 4 — Indexing](#phase-4-—-indexing) | 321 | +| [Phase 5 — Search · *MVP 
deliverable*](#phase-5-—-search-·-mvp-deliverable) | 347 | +| [Phase 6 — Commit mapping](#phase-6-—-commit-mapping) | 380 | +| [Phase 7 — Time-aware queries · *Phase 2 deliverable*](#phase-7-—-time-aware-queries-·-phase-2-deliverable) | 417 | +| [Phase 8 — File-type-aware embedding models](#phase-8-—-file-type-aware-embedding-models) | 450 | +| [Phase 9 — Performance](#phase-9-—-performance) | 488 | +| [Phase 10 — Smarter semantics](#phase-10-—-smarter-semantics) | 526 | +| [Phase 11 — Advanced features + MCP](#phase-11-—-advanced-features-mcp) | 571 | +| [Phase 11b — Content access and semantic concept tracking](#phase-11b-—-content-access-and-semantic-concept-tracking) | 642 | +| [Key technical decisions](#key-technical-decisions) | 759 | +| [Risk register](#risk-register) | 771 | +| [Phase 12 — CLI consolidation & robust per-file indexing](#phase-12-—-cli-consolidation-robust-per-file-indexing) | 783 | +| [Recent progress (snapshot: 2026-04-01)](#recent-progress-snapshot-2026-04-01) | 813 | +| [Phase 13 — Standalone model server for embeddings](#phase-13-—-standalone-model-server-for-embeddings) | 829 | +| [Phase 14 — Infrastructure, tooling, and maintenance](#phase-14-—-infrastructure-tooling-and-maintenance) | 912 | +| [Phase 14b — Search result deduplication](#phase-14b-—-search-result-deduplication) | 969 | +| [Phase 15 — Branch awareness](#phase-15-—-branch-awareness) | 1003 | +| [Phase 16 — Remote-repository indexing (server-managed clone, RAM-backed working tree, persistent DB)](#phase-16-—-remote-repository-indexing-server-managed-clone-ram-backed-working-tree-persistent-db) | 1075 | +| [Phase 17 — Remote-indexing hardening and SSH support](#phase-17-—-remote-indexing-hardening-and-ssh-support) | 1333 | +| [Phase 18 — Reliability, tests, and query caching](#phase-18-—-reliability-tests-and-query-caching) | 1404 | +| [Phase 19 — Smarter chunking, semantic blame & symbol-level embeddings](#phase-19-—-smarter-chunking-semantic-blame-symbol-level-embeddings) 
| 1418 | +| [Phase 20 — Dead-concept detection & refactor impact analysis](#phase-20-—-dead-concept-detection-refactor-impact-analysis) | 1483 | +| [Phase 21 — Semantic clustering & concept graph](#phase-21-—-semantic-clustering-concept-graph) | 1496 | +| [Phase 22 — Temporal cluster diff](#phase-22-—-temporal-cluster-diff) | 1509 | +| [Phase 23 — Cluster timeline](#phase-23-—-cluster-timeline) | 1522 | +| [Phase 24 — Enhanced cluster labeling](#phase-24-—-enhanced-cluster-labeling) | 1536 | +| [Phase 25 — Interactive HTML visualizations](#phase-25-—-interactive-html-visualizations) | 1550 | +| [Phase 26 — CLI naming consolidation & conceptual diff](#phase-26-—-cli-naming-consolidation-conceptual-diff) | 1565 | +| [Phase 27 — Semantic change-point detection](#phase-27-—-semantic-change-point-detection) | 1606 | +| [Phase 28 — Persistent configuration management](#phase-28-—-persistent-configuration-management) | 1666 | +| [Phase 29 — Automated indexing via Git hooks](#phase-29-—-automated-indexing-via-git-hooks) | 1693 | +| [Phase 30 — Commit message semantic indexing](#phase-30-—-commit-message-semantic-indexing) | 1709 | +| [Phase 31 — Semantic concept authorship ranking](#phase-31-—-semantic-concept-authorship-ranking) | 1760 | +| [Phase 32 — Branch and merge awareness](#phase-32-—-branch-and-merge-awareness) | 1810 | +| [Phase 33 — Multi-level hierarchical indexing](#phase-33-—-multi-level-hierarchical-indexing) | 1871 | +| [Phase 34 — Feature adoption & cross-cutting improvements](#phase-34-—-feature-adoption-cross-cutting-improvements) | 1927 | +| [Phase 35 — Multi-model DB, per-command model flags, clear-model, multi-model search](#phase-35-—-multi-model-db-per-command-model-flags-clear-model-multi-model-search) | 1965 | +| [Phase 36 — Vector Index (VSS), Int8 Quantization, ANN Search](#phase-36-—-vector-index-vss-int8-quantization-ann-search) | 2003 | +| [Phase 37 — Quick Wins: Selective Indexing, Code-to-Code Search, Negative Examples, Result 
Explanation](#phase-37-—-quick-wins-selective-indexing-code-to-code-search-negative-examples-result-explanation) | 2077 | +| [Phase 38 — Medium Effort: Documentation Gap Analysis, Semantic Bisect, GC, Boolean Queries](#phase-38-—-medium-effort-documentation-gap-analysis-semantic-bisect-gc-boolean-queries) | 2102 | +| [Phase 39 — Analysis Features: Contributor Profiles, Refactoring, Lifecycle, CI Diff](#phase-39-—-analysis-features-contributor-profiles-refactoring-lifecycle-ci-diff) | 2127 | +| [Phase 40 — Visualization & Scale: Codebase Map, Temporal Heatmap, Remote Index, Cherry-Pick](#phase-40-—-visualization-scale-codebase-map-temporal-heatmap-remote-index-cherry-pick) | 2152 | +| [Phase 41 — Multi-Repo Unified Index *(completed v0.43.0)*](#phase-41-—-multi-repo-unified-index-completed-v0430) | 2183 | +| [Phase 42 — IDE / LSP Integration *(completed v0.44.0)*](#phase-42-—-ide-lsp-integration-completed-v0440) | 2199 | +| [Phase 43 — Security Pattern Detection *(completed v0.45.0)*](#phase-43-—-security-pattern-detection-completed-v0450) | 2215 | +| [Phase 44 — Codebase Health Timeline *(completed v0.46.0)*](#phase-44-—-codebase-health-timeline-completed-v0460) | 2230 | +| [Phase 45 — Technical Debt Scoring *(completed v0.47.0)*](#phase-45-—-technical-debt-scoring-completed-v0470) | 2245 | +| [Phase 46 — Evolution Alerts and Commit URL Construction *(completed v0.48.0)*](#phase-46-—-evolution-alerts-and-commit-url-construction-completed-v0480) | 2262 | +| [Phase 47 — Richer Indexing Progress, Embed Latency Stats, and Incremental-by-Default Messaging](#phase-47-—-richer-indexing-progress-embed-latency-stats-and-incremental-by-default-messaging) | 2277 | +| [Phase 48 — Batch Embedding and Provider Throughput ✅ Implemented](#phase-48-—-batch-embedding-and-provider-throughput-✅-implemented) | 2307 | +| [Phase 49 — Auto-VSS Default Path ✅ Implemented (v0.51.0)](#phase-49-—-auto-vss-default-path-✅-implemented-v0510) | 2322 | +| [Phase 50 — Real Multi-Repo Search ✅ 
Implemented (v0.52.0)](#phase-50-—-real-multi-repo-search-✅-implemented-v0520) | 2334 | +| [Phase 51 — LSP Completion of the Protocol ✅ Implemented (v0.53.0)](#phase-51-—-lsp-completion-of-the-protocol-✅-implemented-v0530) | 2346 | +| [Phase 52 — Query Expansion ✅ Implemented (v0.54.0)](#phase-52-—-query-expansion-✅-implemented-v0540) | 2359 | +| [Phase 53 — Saved Searches and Watch Mode ✅ Implemented (v0.55.0)](#phase-53-—-saved-searches-and-watch-mode-✅-implemented-v0550) | 2371 | +| [Phase 54 — Index Bundle Export / Import ✅ Implemented (v0.56.0)](#phase-54-—-index-bundle-export-import-✅-implemented-v0560) | 2383 | +| [Phase 55 — Embedding Space Explorer (Web UI) ✅ Implemented (v0.57.0)](#phase-55-—-embedding-space-explorer-web-ui-✅-implemented-v0570) | 2394 | +| [Phase 56 — LLM-Powered Evolution Narration ✅ Implemented (v0.58.0)](#phase-56-—-llm-powered-evolution-narration-✅-implemented-v0580) | 2405 | +| [Phase 57 — GitHub Actions Integration for CI Diff ✅ Implemented (v0.59.0)](#phase-57-—-github-actions-integration-for-ci-diff-✅-implemented-v0590) | 2416 | +| [Phase 58 — Structured Security Scan (Static + Semantic) ✅ Implemented (v0.60.0)](#phase-58-—-structured-security-scan-static-semantic-✅-implemented-v0600) | 2427 | +| [Phase 59 — `gitsema tools` Subcommand Group (Protocol Servers) ✅ Implemented (v0.61.0)](#phase-59-—-gitsema-tools-subcommand-group-protocol-servers-✅-implemented-v0610) | 2439 | +| [Phase 60 — Uniform Column Headers + `--no-headings` Across All Commands ✅ Implemented (v.0.62.0)](#phase-60-—-uniform-column-headers-no-headings-across-all-commands-✅-implemented-v0620) | 2480 | +| [Phase 61 — MCP/HTTP Parity + Semantic PR Report *(completed v0.64.0)*](#phase-61-—-mcphttp-parity-semantic-pr-report-completed-v0640) | 2545 | +| [Phase 62 — Heavy Batching for Ollama + HTTP Providers *(completed v0.67.0)*](#phase-62-—-heavy-batching-for-ollama-http-providers-completed-v0670) | 2565 | +| [Phase 63 — Indexing Auto-Defaults and Adaptive Tuning 
*(completed v0.65.0)*](#phase-63-—-indexing-auto-defaults-and-adaptive-tuning-completed-v0650) | 2579 | +| [Phase 64 — Search Scalability + AI Retrieval Reliability *(completed v0.66.0)*](#phase-64-—-search-scalability-ai-retrieval-reliability-completed-v0660) | 2595 | +| [Phase 65 — Incident Triage Bundle *(completed v0.68.0)*](#phase-65-—-incident-triage-bundle-completed-v0680) | 2609 | +| [Phase 66 — Policy Checks for CI *(completed v0.68.0)*](#phase-66-—-policy-checks-for-ci-completed-v0680) | 2617 | +| [Phase 67 — Ownership Heatmap by Concept *(completed v0.68.0)*](#phase-67-—-ownership-heatmap-by-concept-completed-v0680) | 2625 | +| [Phase 68 — Persistent Workflow Templates *(completed v0.68.0)*](#phase-68-—-persistent-workflow-templates-completed-v0680) | 2633 | +| [Phase 69 — Pipelined Batch Indexing *(completed v0.68.0)*](#phase-69-—-pipelined-batch-indexing-completed-v0680) | 2641 | +| [Phase 70 — Unified Output System *(completed v0.69.0)*](#phase-70-—-unified-output-system-completed-v0690) | 2649 | +| [Phase 71 — Index Status Dashboard + Model Management *(completed v0.71.0)*](#phase-71-—-index-status-dashboard-model-management-completed-v0710) | 2666 | +| [Planned Phases (72+)](#planned-phases-72) | 2688 | +| [Phase 71 — Operational Readiness: Metrics, Rate Limiting, and OpenAPI *(completed v0.71.0)*](#phase-71-—-operational-readiness-metrics-rate-limiting-and-openapi-completed-v0710) | 2694 | +| [Phase 72 — HTTP Route Parity for All Analysis Commands *(completed v0.72.0)*](#phase-72-—-http-route-parity-for-all-analysis-commands-completed-v0720) | 2707 | +| [Phase 73 — Deployment Guide and Docker Infrastructure](#phase-73-—-deployment-guide-and-docker-infrastructure) | 2719 | +| [Phase 74 — `gitsema status` Scale Warnings + Extended `gitsema doctor` Pre-flight](#phase-74-—-gitsema-status-scale-warnings-extended-gitsema-doctor-pre-flight) | 2732 | +| [Phase 75 — Per-Repo Access Control on HTTP Server](#phase-75-—-per-repo-access-control-on-http-server) 
| 2745 | +| [Phase 76 — Complete `htmlRenderer.ts` Modularisation](#phase-76-—-complete-htmlrendererts-modularisation) | 2759 | +| [Phase 77 — Unified Indexing + Search Level Concept](#phase-77-—-unified-indexing-search-level-concept) | 2772 | +| [Phase 82 — Auto-cap Search Memory *(completed v0.79.0)*](#phase-82-—-auto-cap-search-memory-completed-v0790) | 2788 | +| [Phase 83 — Parallel Commit-Message Embedding *(completed v0.80.0)*](#phase-83-—-parallel-commit-message-embedding-completed-v0800) | 2800 | +| [Phase 84 — LSP: documentSymbol + Improved definition/references *(completed v0.81.0)*](#phase-84-—-lsp-documentsymbol-improved-definitionreferences-completed-v0810) | 2814 | +| [Phase 85 — Tier-1 Reliability: Test Isolation, SQL Sampling, Batch Dedup *(completed v0.84.0)*](#phase-85-—-tier-1-reliability-test-isolation-sql-sampling-batch-dedup-completed-v0840) | 2828 | +| [Phase 86 — Tier-2 Code Organisation: MCP Modularization + Search Module Split + CLI Register Split *(completed v0.85.0)*](#phase-86-—-tier-2-code-organisation-mcp-modularization-search-module-split-cli-register-split-completed-v0850) | 2856 | +| [Phase 87 — Tier-3 Robustness: Embed Retry, Queue Backpressure, Atomic FTS5, Body Limit *(completed v0.86.0)*](#phase-87-—-tier-3-robustness-embed-retry-queue-backpressure-atomic-fts5-body-limit-completed-v0860) | 2884 | +| [Phase 88 — Tier-4 Scale/Features: LLM Narrator Tests + Docs Sync Check *(completed v0.87.0)*](#phase-88-—-tier-4-scalefeatures-llm-narrator-tests-docs-sync-check-completed-v0870) | 2916 | +| [Phase 89 — Tier-5 Code Quality: review6 §11 Detailed Findings *(completed v0.88.0)*](#phase-89-—-tier-5-code-quality-review6-§11-detailed-findings-completed-v0880) | 2940 | +| [Phase 90 — Model Local Names (Shorthand / globalName) *(completed v0.89.0)*](#phase-90-—-model-local-names-shorthand-globalname-completed-v0890) | 3020 | +| [Phase 91 — 8 Productized Usage Patterns (review7 §5) *(completed 
v0.90.0)*](#phase-91-—-8-productized-usage-patterns-review7-§5-completed-v0900) | 3075 | +| [Phase 92 — review7 Improvement Bundle *(completed, 2026-04-09)*](#phase-92-—-review7-improvement-bundle-completed-2026-04-09) | 3121 | +| [Phase 93 — Time filter semantics & pagination stability](#phase-93-—-time-filter-semantics-pagination-stability) | 3163 | +| [Phase 94 — review8 CLI Wiring & Documentation Restoration *(completed v0.91.0)*](#phase-94-—-review8-cli-wiring-documentation-restoration-completed-v0910) | 3193 | +| [Phase 95 — Flag unification (review8 §8.6/§8.9) *(completed v0.92.0)*](#phase-95-—-flag-unification-review8-§86§89-completed-v0920) | 3225 | +| [Phase 96 — LLM Narrator/Explainer/Guide via chattydeer *(completed v0.93.0)*](#phase-96-—-llm-narratorexplainerguide-via-chattydeer-completed-v0930) | 3248 | +| [Phase 97 — Full-toolset guide, tool interpretation registry, skill generation, Ollama docs](#phase-97-—-full-toolset-guide-tool-interpretation-registry-skill-generation-ollama-docs) | 3281 | +| [Phase 98 — CLI-based AI tool backends for narrator/guide](#phase-98-—-cli-based-ai-tool-backends-for-narratorguide) | 3342 | +| [Phase 99 — `--provider ollama` for narrator/guide + Ollama model discovery](#phase-99-—-provider-ollama-for-narratorguide-ollama-model-discovery) | 3406 | +| [Long-Term Investments](#long-term-investments) | 3447 | +| [Non-goals for now (revisited later)](#non-goals-for-now-revisited-later) | 3464 | +| [Phase 100 — Persistent, registry-backed server-side repo storage](#phase-100-—-persistent-registry-backed-server-side-repo-storage) | 3472 | +| [Phases 101–103 — Pluggable storage backends & index scoping](#phases-101–103-—-pluggable-storage-backends-index-scoping) | 3540 | +| [Phase 104 — Full-toolset guide coverage, per-command `--narrate`, and a guided `gitsema setup` wizard](#phase-104-—-full-toolset-guide-coverage-per-command-narrate-and-a-guided-gitsema-setup-wizard) | 3742 | +| [Knowledge Graph Track (Phases 105–112) — 
*planned*](#knowledge-graph-track-phases-105–112-—-planned) | 3888 | --- @@ -3932,8 +3932,8 @@ include `co-change`, `deps`, `cycles`, `callers`/`callees`/`path`/`neighbors`, | **108** | Traversal primitives + CLI/MCP | — | `GraphStore` seam (recursive CTEs); `gitsema graph callers\|callees\|neighbors\|path`; MCP `call_graph`/`graph_neighbors`. | | **109** | `--lens` toggle + structural ranking | — | Cross-cutting `--lens` + `--weight-structural` in the re-rank loop; new commands `blast-radius`, `relate`, `similar --lens`, `unused`; `impact` gains `--lens`. Semantic stays the default for existing commands. | | **110** | Fusion: cascade planner + hotspots | — | Cascade query planner (`FTS → vector → graph traversal → merge/rerank`); `hotspots`; structural enrichment of `code-review`/`explain`/`guide`/`triage`. | -| **111** | Unified graph UI | — | Render subgraphs in HTML (reuse `htmlRenderer-clusters.ts` force-graph); nodes deep-link into existing per-command HTML views — binds the standalone HTML outputs together. | -| **112** | Lens coverage & parity sweep | — | Cross-cutting adoption pass over the whole command surface (CLI + MCP + HTTP): shared `addLensOption()` helper, uniform §7.3 defaults + per-hit lens labeling, docs/skill/`interpretations.ts` parity, and a test asserting every lens-capable command exposes `lens`. Done last so it covers the 110 fusion commands too. | +| **111** | Lens coverage & parity sweep | — | Cross-cutting adoption pass over the whole command surface (CLI + MCP + HTTP): shared `addLensOption()` helper, uniform §7.3 defaults + per-hit lens labeling, docs/skill/`interpretations.ts` parity, and a test asserting every lens-capable command exposes `lens`. Done before the UI phase so it covers the 110 fusion commands too. | +| **112** | Unified graph UI (HTML + CLI) | — | Render subgraphs in HTML (reuse `htmlRenderer-clusters.ts` force-graph); nodes deep-link into existing per-command HTML views — binds the standalone HTML outputs together. 
Also adds a CLI/text-mode subgraph view (ASCII tree or list rendering of nodes/edges) for terminal-only workflows, alongside the HTML view. | Each phase ends with working software, tests, a `features.md` entry, a `PLAN.md` status update, and a changeset. **Start point: Phase 105** (isolated, test-heavy, @@ -4030,7 +4030,7 @@ commands — `gitsema graph callers [--depth]`, `gitsema graph callees edge kinds), registered in `src/mcp/tools/graph.ts`. **Deviation from the original sketch:** `call_graph`/`graph_neighbors` are not yet added to the `gitsema guide` `GUIDE_TOOLS` registry (46 tools) or `interpretations.ts` — left for the Phase 110 -fusion pass / Phase 112 lens-coverage sweep, consistent with `docsSync`'s existing +fusion pass / Phase 111 lens-coverage sweep, consistent with `docsSync`'s existing guard (which only requires every `GUIDE_TOOLS` entry to have an interpretation, not that every MCP tool is in `GUIDE_TOOLS`). No schema change. Tests: `tests/graphTraversal.test.ts`. From 784518253bb1ca3a20624955d89d949edee475a2 Mon Sep 17 00:00:00 2001 From: Claude Date: Mon, 15 Jun 2026 19:37:55 +0000 Subject: [PATCH 6/6] Sync knowledge-graph.md with Phase 111/112 reorder Mirrors the docs/PLAN.md swap: lens-coverage sweep is now Phase 111 (still ahead of the UI phase, still covers the 110 fusion commands), and the unified graph UI is now Phase 112 with both HTML and CLI views. https://claude.ai/code/session_01BjWermmuwNjWvx2dXfDU6q --- docs/knowledge-graph.md | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/docs/knowledge-graph.md b/docs/knowledge-graph.md index 1332ead..f4f2877 100644 --- a/docs/knowledge-graph.md +++ b/docs/knowledge-graph.md @@ -3,8 +3,8 @@ > Status: **Design / not yet implemented.** Target track: Phases 105–112. > Scope decision (owner): build the **structural** graph first (typed edges from > static analysis), starting with **TypeScript/JavaScript + Python**, then expand -> to Go/Rust/Java. 
A separate presentation/UI graph (Phase 111) folds in later, and -> a cross-command lens-coverage sweep (Phase 112) closes the track. +> to Go/Rust/Java. A cross-command lens-coverage sweep (Phase 111) precedes a +> separate presentation/UI graph (Phase 112, HTML + CLI) which closes the track. This document is the single design reference for the knowledge-graph track. It nails down the **identity model**, the **schema**, the **per-language name-resolution @@ -376,7 +376,7 @@ fusion lives in one place. This mirrors the precedent set by `--hybrid` + `--bm25-weight` (vector+BM25 blend), so the convention is already idiomatic. The MCP/HTTP tool surfaces expose `lens` as a -parameter wherever the CLI flag exists. A dedicated **Phase 112** sweeps the whole +parameter wherever the CLI flag exists. A dedicated **Phase 111** sweeps the whole command set to make this coverage uniform and mechanically enforced (a shared `addLensOption()` helper + a parity test), rather than wired ad-hoc per command. @@ -427,8 +427,8 @@ when each becomes buildable — several temporal ones need only `co_change` | **108** | Traversal primitives + CLI/MCP | — | `GraphStore` seam (recursive CTEs); `gitsema graph callers\|callees\|neighbors\|path`; MCP `call_graph`/`graph_neighbors`. | | **109** | `--lens` toggle + structural ranking | — | Cross-cutting `--lens semantic\|structural\|hybrid` + `--weight-structural` (§7) wired into the re-rank loop; new commands `blast-radius`, `relate`, `similar --lens`, `unused`; `impact` gains `--lens`. **Semantic stays the default for existing commands.** | | **110** | Fusion: cascade planner + hotspots | — | Cascade query planner `FTS filter → vector expand → graph traversal → merge/rerank`; `hotspots`; structural enrichment of `code-review`/`explain`/`guide`/`triage`. 
| -| **111** | Unified graph UI | — | Render subgraphs in HTML (reuse `htmlRenderer-clusters.ts` force-graph); nodes deep-link into existing per-command HTML views — binds the standalone HTML outputs together. | -| **112** | Lens coverage & parity sweep | — | Cross-cutting adoption pass over the **entire command surface** (CLI + MCP + HTTP): wire `--lens`/`lens` into every command where more than one lens is meaningful, via a single shared `addLensOption()` helper; enforce the §7.3 defaults and per-hit lens labeling uniformly; restore docs / skill / `interpretations.ts` parity; add a test asserting every lens-capable command/tool exposes `lens` (the same mechanical-guarantee approach as `docsSync`). Done last so it also covers the fusion commands from 110. | +| **111** | Lens coverage & parity sweep | — | Cross-cutting adoption pass over the **entire command surface** (CLI + MCP + HTTP): wire `--lens`/`lens` into every command where more than one lens is meaningful, via a single shared `addLensOption()` helper; enforce the §7.3 defaults and per-hit lens labeling uniformly; restore docs / skill / `interpretations.ts` parity; add a test asserting every lens-capable command/tool exposes `lens` (the same mechanical-guarantee approach as `docsSync`). Done before the UI phase so it also covers the fusion commands from 110. | +| **112** | Unified graph UI (HTML + CLI) | — | Render subgraphs in HTML (reuse `htmlRenderer-clusters.ts` force-graph); nodes deep-link into existing per-command HTML views — binds the standalone HTML outputs together. Also adds a CLI/text-mode subgraph view (ASCII tree or list rendering of nodes/edges) for terminal-only workflows. | Each phase ends with working software, tests, a `features.md` entry, a `PLAN.md` status update, and a changeset (per `CLAUDE.md`).
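
The Phase 112 row describes a CLI/text-mode subgraph view as an "ASCII tree or list rendering of nodes/edges". A minimal sketch of what the tree form could look like is below. It is illustrative only: `GraphEdge`, `renderSubgraphTree`, and the default depth of 3 (borrowed from the Phase 108 traversal cap) are assumptions made for this example, not the actual `GraphStore` types or the planned implementation.

```typescript
// Hypothetical edge shape for illustration; the real GraphStore types may differ.
interface GraphEdge {
  from: string;
  to: string;
  kind: string; // e.g. "calls" or "imports"
}

/**
 * Render the edges reachable from `root` as an indented ASCII tree.
 * The walk stops after `maxDepth` levels and marks a node that is already
 * on the current path as "(cycle)" instead of descending into it again.
 */
function renderSubgraphTree(root: string, edges: GraphEdge[], maxDepth = 3): string {
  // Index outgoing edges by source node so each lookup during the walk is cheap.
  const outgoing = new Map<string, GraphEdge[]>();
  for (const edge of edges) {
    const list = outgoing.get(edge.from) ?? [];
    list.push(edge);
    outgoing.set(edge.from, list);
  }

  const lines: string[] = [root];

  const walk = (node: string, prefix: string, depth: number, path: Set<string>): void => {
    if (depth >= maxDepth) return;
    const children = outgoing.get(node) ?? [];
    children.forEach((edge, i) => {
      const last = i === children.length - 1;
      const branch = last ? "\\-- " : "|-- ";
      const cyclic = path.has(edge.to);
      lines.push(`${prefix}${branch}[${edge.kind}] ${edge.to}${cyclic ? " (cycle)" : ""}`);
      if (!cyclic) {
        // Copy the path so sibling branches do not see each other's nodes.
        const nextPath = new Set(path).add(edge.to);
        walk(edge.to, prefix + (last ? "    " : "|   "), depth + 1, nextPath);
      }
    });
  };

  walk(root, "", 0, new Set([root]));
  return lines.join("\n");
}
```

Tracking the current path rather than a global visited set is a deliberate choice in this sketch: a node reached through two different callers is printed under both, which matches how a terminal tree is usually read, while true cycles are still cut off.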