diff --git a/.changeset/phase-108-graph-traversal.md b/.changeset/phase-108-graph-traversal.md
new file mode 100644
index 0000000..0d7e730
--- /dev/null
+++ b/.changeset/phase-108-graph-traversal.md
@@ -0,0 +1,5 @@
+---
+"gitsema": minor
+---
+
+Add graph traversal primitives over the Phase 107 structural graph: `gitsema graph callers ` / `gitsema graph callees ` (transitive `calls` traversal, default and max depth 3), `gitsema graph neighbors ` (typed neighborhood, any edge kinds, configurable direction/depth), and `gitsema graph path ` (shortest typed path between two nodes). New MCP tools `call_graph` and `graph_neighbors` expose the same traversals.
diff --git a/.changeset/phase-109-lens-toggle.md b/.changeset/phase-109-lens-toggle.md
new file mode 100644
index 0000000..5bf1b5a
--- /dev/null
+++ b/.changeset/phase-109-lens-toggle.md
@@ -0,0 +1,5 @@
+---
+"gitsema": minor
+---
+
+Add a cross-cutting `--lens semantic|structural|hybrid` toggle (plus `--weight-structural `) and four new structural/semantic fusion commands: `gitsema blast-radius ` ("what changes if I touch this" — structural dependents and/or semantically similar blobs), `gitsema relate ` (callers/callees plus semantically similar blobs, both lenses), `gitsema similar ` (same call/import shape and/or semantic similarity), and `gitsema unused` (symbols/files with no inbound calls/imports edges). `gitsema impact --lens structural|hybrid` now reuses `blast-radius` for true structural impact analysis.
diff --git a/CLAUDE.md b/CLAUDE.md
index f15dc31..1501bf6 100644
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -101,6 +101,12 @@ pnpm test -- --watch # watch mode during development
 - Mock modules with `vi.mock()`, spy with `vi.fn()`, clean up with `vi.restoreAllMocks()` in `afterEach`
 - Integration tests use `mkdtempSync()` + `rmSync()` for isolated temp Git repos
 - `withDbSession()` helper creates isolated temp SQLite DBs per test
+- **Always close `session.rawDb` (`better-sqlite3`) before `rmSync()`-ing its temp
+  directory.** On Windows, `rmSync` on a directory containing an open SQLite handle
+  fails with `EBUSY: resource busy or locked, unlink '...\test.db'` — this passes on
+  Linux/macOS (CI runs `ubuntu-latest` by default) but fails the Windows CI job. Call
+  `session.rawDb.close()` (e.g. in a `try`/`finally` around `withDbSession()`) before
+  the test's temp dir is removed in `afterEach`.
 
 ---
 
@@ -468,7 +474,7 @@ node dist/cli/index.js tools mcp
 
 The MCP server reads the same environment variables as the CLI. It runs against the `.gitsema/index.db` in the current working directory when the server is started.
 
-**Exposed tools (34 total, registered across `src/mcp/tools/{search,analysis,clustering,infrastructure,workflow,narrator}.ts`):**
+**Exposed tools (36 total, registered across `src/mcp/tools/{search,analysis,clustering,infrastructure,workflow,narrator,graph}.ts`):**
 
 | Tool | Description |
 |---|---|
@@ -506,6 +512,8 @@ The MCP server reads the same environment variables as the CLI. It runs against
 | `workflow_run` | Run a named workflow template (`pr-review` \| `incident` \| `release-audit`) |
 | `narrate_repo` | Generate evidence (default) or an LLM narrative of repository development history |
 | `explain_issue_or_error` | Generate evidence (default) or an LLM explanation/timeline for a bug, error, or topic |
+| `call_graph` | Structural call-graph traversal — callers/callees of a symbol (Phase 108) |
+| `graph_neighbors` | Typed neighborhood of a graph node — any edge kinds, direction, depth (Phase 108) |
 
 ---
 
diff --git a/README.md b/README.md
index e7ed033..891a99c 100644
--- a/README.md
+++ b/README.md
@@ -358,7 +358,7 @@ CI policy gate over drift, debt, and security thresholds.
 | `gitsema file-evolution [options]` | Track semantic drift of a file over its Git history (see also: `file-diff`, `evolution`) |
 | `gitsema file-diff ` | Compute semantic diff between two versions of a file |
 | `gitsema blame ` (alias: `semantic-blame`) | Show semantic origin of each logical block in a file — nearest-neighbor blame |
-| `gitsema impact ` | Compute semantically similar blobs across the codebase to highlight refactor impact |
+| `gitsema impact [--lens ] [--weight-structural ]` | Compute semantically similar blobs across the codebase to highlight refactor impact. `--lens structural\|hybrid` makes this a thin alias over `blast-radius` (default lens: semantic, pre-Phase-109 behavior) |
 
 #### `gitsema file-evolution [options]`
 
@@ -435,6 +435,14 @@ Track semantic drift of a single file across its Git history.
 | `gitsema co-change [-k/--top ]` | Files that historically change together with `` |
 | `gitsema deps [--reverse] [--depth ] [--edge-types ]` | Import/dependency closure of a file or symbol (default edge types: `imports,calls,extends,implements`) |
 | `gitsema graph cycles [--edge-types ]` / `gitsema cycles [--edge-types ]` | Detect cycles in the structural graph (default: `imports`) |
+| `gitsema graph callers [--depth ]` | Reverse `calls` traversal — who (transitively) calls `` (default depth 3, max 3) |
+| `gitsema graph callees [--depth ]` | Forward `calls` traversal — what `` (transitively) calls (default depth 3, max 3) |
+| `gitsema graph neighbors [--edge-types ] [--direction ] [--depth ]` | Typed neighborhood of `` — any edge kinds by default (default depth 1, max 3) |
+| `gitsema graph path ` | Shortest typed path from `` to `` (max depth 3) |
+| `gitsema blast-radius [--lens ] [--depth ] [-k/--top ] [--weight-structural ]` | What changes if I touch this — structural dependents (`calls`/`imports`/`extends`/`implements`/`references`, reverse traversal) and/or semantically similar blobs (default lens: hybrid) |
+| `gitsema relate [-k/--top ]` | Callers/callees (structural, depth 1) and semantically similar blobs, labeled — both lenses, lose neither |
+| `gitsema similar [--lens ] [-k/--top ] [--weight-structural ]` | Symbols/files with a similar call/import shape (structural, Jaccard overlap) and/or semantically similar (vector) (default lens: hybrid) |
+| `gitsema unused [--edge-types ]` | Symbols/files with no inbound `calls`/`imports` edges — structural complement to `dead-concepts` |
 
 ### Workflow & CI
 
diff --git a/docs/PLAN.md b/docs/PLAN.md
index bb39d46..274e6ee 100644
--- a/docs/PLAN.md
+++ b/docs/PLAN.md
@@ -8,118 +8,119 @@
 
 | Section | Line |
 |---|---:|
-| [Vision](#vision) | 126 |
-| [Guiding principles](#guiding-principles) | 132 |
-| [Architecture overview](#architecture-overview) | 142 |
-| [Project structure](#project-structure) | 162 |
-| [Section I - Phases](#section-i-phases) | 214 |
-| [Phase 1 — Foundation](#phase-1-—-foundation) | 216 |
-| [Phase 2 — Git walking](#phase-2-—-git-walking) | 258 |
-| [Phase 3 — Embedding system](#phase-3-—-embedding-system) | 282 |
-| [Phase 4 — Indexing](#phase-4-—-indexing) | 320 |
-| [Phase 5 — Search · *MVP deliverable*](#phase-5-—-search-·-mvp-deliverable) | 346 |
-| [Phase 6 — Commit mapping](#phase-6-—-commit-mapping) | 379 |
-| [Phase 7 — Time-aware queries · *Phase 2 deliverable*](#phase-7-—-time-aware-queries-·-phase-2-deliverable) | 416 |
-| [Phase 8 — File-type-aware embedding models](#phase-8-—-file-type-aware-embedding-models) | 449 |
-| [Phase 9 — Performance](#phase-9-—-performance) | 487 |
-| [Phase 10 — Smarter semantics](#phase-10-—-smarter-semantics) | 525 |
-| [Phase 11 — Advanced features + MCP](#phase-11-—-advanced-features-mcp) | 570 |
-| [Phase 11b — Content access and semantic concept tracking](#phase-11b-—-content-access-and-semantic-concept-tracking) | 641 |
-| [Key technical decisions](#key-technical-decisions) | 758 |
-| [Risk register](#risk-register) | 770 |
-| [Phase 12 — CLI consolidation & robust per-file indexing](#phase-12-—-cli-consolidation-robust-per-file-indexing) | 782 |
-| [Recent progress (snapshot: 2026-04-01)](#recent-progress-snapshot-2026-04-01) | 812 |
-| [Phase 13 — Standalone model server for embeddings](#phase-13-—-standalone-model-server-for-embeddings) | 828 |
-| [Phase 14 — Infrastructure, tooling, and maintenance](#phase-14-—-infrastructure-tooling-and-maintenance) | 911 |
-| [Phase 14b — Search result deduplication](#phase-14b-—-search-result-deduplication) | 968 |
-| [Phase 15 — Branch awareness](#phase-15-—-branch-awareness) | 1002 |
-| [Phase 16 — Remote-repository indexing (server-managed clone, RAM-backed working tree, persistent DB)](#phase-16-—-remote-repository-indexing-server-managed-clone-ram-backed-working-tree-persistent-db) | 1074 |
-| [Phase 17 — Remote-indexing hardening and SSH support](#phase-17-—-remote-indexing-hardening-and-ssh-support) | 1332 |
-| [Phase 18 — Reliability, tests, and query caching](#phase-18-—-reliability-tests-and-query-caching) | 1403 |
-| [Phase 19 — Smarter chunking, semantic blame & symbol-level embeddings](#phase-19-—-smarter-chunking-semantic-blame-symbol-level-embeddings) | 1417 |
-| [Phase 20 — Dead-concept detection & refactor impact analysis](#phase-20-—-dead-concept-detection-refactor-impact-analysis) | 1482 |
-| [Phase 21 — Semantic clustering & concept graph](#phase-21-—-semantic-clustering-concept-graph) | 1495 |
-| [Phase 22 — Temporal cluster diff](#phase-22-—-temporal-cluster-diff) | 1508 |
-| [Phase 23 — Cluster timeline](#phase-23-—-cluster-timeline) | 1521 |
-| [Phase 24 — Enhanced cluster labeling](#phase-24-—-enhanced-cluster-labeling) | 1535 |
-| [Phase 25 — Interactive HTML visualizations](#phase-25-—-interactive-html-visualizations) | 1549 |
-| [Phase 26 — CLI naming consolidation & conceptual diff](#phase-26-—-cli-naming-consolidation-conceptual-diff) | 1564 |
-| [Phase 27 — Semantic change-point detection](#phase-27-—-semantic-change-point-detection) | 1605 |
-| [Phase 28 — Persistent configuration management](#phase-28-—-persistent-configuration-management) | 1665 |
-| [Phase 29 — Automated indexing via Git hooks](#phase-29-—-automated-indexing-via-git-hooks) | 1692 |
-| [Phase 30 — Commit message semantic indexing](#phase-30-—-commit-message-semantic-indexing) | 1708 |
-| [Phase 31 — Semantic concept authorship ranking](#phase-31-—-semantic-concept-authorship-ranking) | 1759 |
-| [Phase 32 — Branch and merge awareness](#phase-32-—-branch-and-merge-awareness) | 1809 |
-| [Phase 33 — Multi-level hierarchical indexing](#phase-33-—-multi-level-hierarchical-indexing) | 1870 |
-| [Phase 34 — Feature adoption & cross-cutting improvements](#phase-34-—-feature-adoption-cross-cutting-improvements) | 1926 |
-| [Phase 35 — Multi-model DB, per-command model flags, clear-model, multi-model search](#phase-35-—-multi-model-db-per-command-model-flags-clear-model-multi-model-search) | 1964 |
-| [Phase 36 — Vector Index (VSS), Int8 Quantization, ANN Search](#phase-36-—-vector-index-vss-int8-quantization-ann-search) | 2002 |
-| [Phase 37 — Quick Wins: Selective Indexing, Code-to-Code Search, Negative Examples, Result Explanation](#phase-37-—-quick-wins-selective-indexing-code-to-code-search-negative-examples-result-explanation) | 2076 |
-| [Phase 38 — Medium Effort: Documentation Gap Analysis, Semantic Bisect, GC, Boolean Queries](#phase-38-—-medium-effort-documentation-gap-analysis-semantic-bisect-gc-boolean-queries) | 2101 |
-| [Phase 39 — Analysis Features: Contributor Profiles, Refactoring, Lifecycle, CI Diff](#phase-39-—-analysis-features-contributor-profiles-refactoring-lifecycle-ci-diff) | 2126 |
-| [Phase 40 — Visualization & Scale: Codebase Map, Temporal Heatmap, Remote Index, Cherry-Pick](#phase-40-—-visualization-scale-codebase-map-temporal-heatmap-remote-index-cherry-pick) | 2151 |
-| [Phase 41 — Multi-Repo Unified Index *(completed v0.43.0)*](#phase-41-—-multi-repo-unified-index-completed-v0430) | 2182 |
-| [Phase 42 — IDE / LSP Integration *(completed v0.44.0)*](#phase-42-—-ide-lsp-integration-completed-v0440) | 2198 |
-| [Phase 43 — Security Pattern Detection *(completed v0.45.0)*](#phase-43-—-security-pattern-detection-completed-v0450) | 2214 |
-| [Phase 44 — Codebase Health Timeline *(completed v0.46.0)*](#phase-44-—-codebase-health-timeline-completed-v0460) | 2229 |
-| [Phase 45 — Technical Debt Scoring *(completed v0.47.0)*](#phase-45-—-technical-debt-scoring-completed-v0470) | 2244 |
-| [Phase 46 — Evolution Alerts and Commit URL Construction *(completed v0.48.0)*](#phase-46-—-evolution-alerts-and-commit-url-construction-completed-v0480) | 2261 |
-| [Phase 47 — Richer Indexing Progress, Embed Latency Stats, and Incremental-by-Default Messaging](#phase-47-—-richer-indexing-progress-embed-latency-stats-and-incremental-by-default-messaging) | 2276 |
-| [Phase 48 — Batch Embedding and Provider Throughput ✅ Implemented](#phase-48-—-batch-embedding-and-provider-throughput-✅-implemented) | 2306 |
-| [Phase 49 — Auto-VSS Default Path ✅ Implemented (v0.51.0)](#phase-49-—-auto-vss-default-path-✅-implemented-v0510) | 2321 |
-| [Phase 50 — Real Multi-Repo Search ✅ Implemented (v0.52.0)](#phase-50-—-real-multi-repo-search-✅-implemented-v0520) | 2333 |
-| [Phase 51 — LSP Completion of the Protocol ✅ Implemented (v0.53.0)](#phase-51-—-lsp-completion-of-the-protocol-✅-implemented-v0530) | 2345 |
-| [Phase 52 — Query Expansion ✅ Implemented (v0.54.0)](#phase-52-—-query-expansion-✅-implemented-v0540) | 2358 |
-| [Phase 53 — Saved Searches and Watch Mode ✅ Implemented (v0.55.0)](#phase-53-—-saved-searches-and-watch-mode-✅-implemented-v0550) | 2370 |
-| [Phase 54 — Index Bundle Export / Import ✅ Implemented (v0.56.0)](#phase-54-—-index-bundle-export-import-✅-implemented-v0560) | 2382 |
-| [Phase 55 — Embedding Space Explorer (Web UI) ✅ Implemented (v0.57.0)](#phase-55-—-embedding-space-explorer-web-ui-✅-implemented-v0570) | 2393 |
-| [Phase 56 — LLM-Powered Evolution Narration ✅ Implemented (v0.58.0)](#phase-56-—-llm-powered-evolution-narration-✅-implemented-v0580) | 2404 |
-| [Phase 57 — GitHub Actions Integration for CI Diff ✅ Implemented (v0.59.0)](#phase-57-—-github-actions-integration-for-ci-diff-✅-implemented-v0590) | 2415 |
-| [Phase 58 — Structured Security Scan (Static + Semantic) ✅ Implemented (v0.60.0)](#phase-58-—-structured-security-scan-static-semantic-✅-implemented-v0600) | 2426 |
-| [Phase 59 — `gitsema tools` Subcommand Group (Protocol Servers) ✅ Implemented (v0.61.0)](#phase-59-—-gitsema-tools-subcommand-group-protocol-servers-✅-implemented-v0610) | 2438 |
-| [Phase 60 — Uniform Column Headers + `--no-headings` Across All Commands ✅ Implemented (v.0.62.0)](#phase-60-—-uniform-column-headers-no-headings-across-all-commands-✅-implemented-v0620) | 2479 |
-| [Phase 61 — MCP/HTTP Parity + Semantic PR Report *(completed v0.64.0)*](#phase-61-—-mcphttp-parity-semantic-pr-report-completed-v0640) | 2544 |
-| [Phase 62 — Heavy Batching for Ollama + HTTP Providers *(completed v0.67.0)*](#phase-62-—-heavy-batching-for-ollama-http-providers-completed-v0670) | 2564 |
-| [Phase 63 — Indexing Auto-Defaults and Adaptive Tuning *(completed v0.65.0)*](#phase-63-—-indexing-auto-defaults-and-adaptive-tuning-completed-v0650) | 2578 |
-| [Phase 64 — Search Scalability + AI Retrieval Reliability *(completed v0.66.0)*](#phase-64-—-search-scalability-ai-retrieval-reliability-completed-v0660) | 2594 |
-| [Phase 65 — Incident Triage Bundle *(completed v0.68.0)*](#phase-65-—-incident-triage-bundle-completed-v0680) | 2608 |
-| [Phase 66 — Policy Checks for CI *(completed v0.68.0)*](#phase-66-—-policy-checks-for-ci-completed-v0680) | 2616 |
-| [Phase 67 — Ownership Heatmap by Concept *(completed v0.68.0)*](#phase-67-—-ownership-heatmap-by-concept-completed-v0680) | 2624 |
-| [Phase 68 — Persistent Workflow Templates *(completed v0.68.0)*](#phase-68-—-persistent-workflow-templates-completed-v0680) | 2632 |
-| [Phase 69 — Pipelined Batch Indexing *(completed v0.68.0)*](#phase-69-—-pipelined-batch-indexing-completed-v0680) | 2640 |
-| [Phase 70 — Unified Output System *(completed v0.69.0)*](#phase-70-—-unified-output-system-completed-v0690) | 2648 |
-| [Phase 71 — Index Status Dashboard + Model Management *(completed v0.71.0)*](#phase-71-—-index-status-dashboard-model-management-completed-v0710) | 2665 |
-| [Planned Phases (72+)](#planned-phases-72) | 2687 |
-| [Phase 71 — Operational Readiness: Metrics, Rate Limiting, and OpenAPI *(completed v0.71.0)*](#phase-71-—-operational-readiness-metrics-rate-limiting-and-openapi-completed-v0710) | 2693 |
-| [Phase 72 — HTTP Route Parity for All Analysis Commands *(completed v0.72.0)*](#phase-72-—-http-route-parity-for-all-analysis-commands-completed-v0720) | 2706 |
-| [Phase 73 — Deployment Guide and Docker Infrastructure](#phase-73-—-deployment-guide-and-docker-infrastructure) | 2718 |
-| [Phase 74 — `gitsema status` Scale Warnings + Extended `gitsema doctor` Pre-flight](#phase-74-—-gitsema-status-scale-warnings-extended-gitsema-doctor-pre-flight) | 2731 |
-| [Phase 75 — Per-Repo Access Control on HTTP Server](#phase-75-—-per-repo-access-control-on-http-server) | 2744 |
-| [Phase 76 — Complete `htmlRenderer.ts` Modularisation](#phase-76-—-complete-htmlrendererts-modularisation) | 2758 |
-| [Phase 77 — Unified Indexing + Search Level Concept](#phase-77-—-unified-indexing-search-level-concept) | 2771 |
-| [Phase 82 — Auto-cap Search Memory *(completed v0.79.0)*](#phase-82-—-auto-cap-search-memory-completed-v0790) | 2787 |
-| [Phase 83 — Parallel Commit-Message Embedding *(completed v0.80.0)*](#phase-83-—-parallel-commit-message-embedding-completed-v0800) | 2799 |
-| [Phase 84 — LSP: documentSymbol + Improved definition/references *(completed v0.81.0)*](#phase-84-—-lsp-documentsymbol-improved-definitionreferences-completed-v0810) | 2813 |
-| [Phase 85 — Tier-1 Reliability: Test Isolation, SQL Sampling, Batch Dedup *(completed v0.84.0)*](#phase-85-—-tier-1-reliability-test-isolation-sql-sampling-batch-dedup-completed-v0840) | 2827 |
-| [Phase 86 — Tier-2 Code Organisation: MCP Modularization + Search Module Split + CLI Register Split *(completed v0.85.0)*](#phase-86-—-tier-2-code-organisation-mcp-modularization-search-module-split-cli-register-split-completed-v0850) | 2855 |
-| [Phase 87 — Tier-3 Robustness: Embed Retry, Queue Backpressure, Atomic FTS5, Body Limit *(completed v0.86.0)*](#phase-87-—-tier-3-robustness-embed-retry-queue-backpressure-atomic-fts5-body-limit-completed-v0860) | 2883 |
-| [Phase 88 — Tier-4 Scale/Features: LLM Narrator Tests + Docs Sync Check *(completed v0.87.0)*](#phase-88-—-tier-4-scalefeatures-llm-narrator-tests-docs-sync-check-completed-v0870) | 2915 |
-| [Phase 89 — Tier-5 Code Quality: review6 §11 Detailed Findings *(completed v0.88.0)*](#phase-89-—-tier-5-code-quality-review6-§11-detailed-findings-completed-v0880) | 2939 |
-| [Phase 90 — Model Local Names (Shorthand / globalName) *(completed v0.89.0)*](#phase-90-—-model-local-names-shorthand-globalname-completed-v0890) | 3019 |
-| [Phase 91 — 8 Productized Usage Patterns (review7 §5) *(completed v0.90.0)*](#phase-91-—-8-productized-usage-patterns-review7-§5-completed-v0900) | 3074 |
-| [Phase 92 — review7 Improvement Bundle *(completed, 2026-04-09)*](#phase-92-—-review7-improvement-bundle-completed-2026-04-09) | 3120 |
-| [Phase 93 — Time filter semantics & pagination stability](#phase-93-—-time-filter-semantics-pagination-stability) | 3162 |
-| [Phase 94 — review8 CLI Wiring & Documentation Restoration *(completed v0.91.0)*](#phase-94-—-review8-cli-wiring-documentation-restoration-completed-v0910) | 3192 |
-| [Phase 95 — Flag unification (review8 §8.6/§8.9) *(completed v0.92.0)*](#phase-95-—-flag-unification-review8-§86§89-completed-v0920) | 3224 |
-| [Phase 96 — LLM Narrator/Explainer/Guide via chattydeer *(completed v0.93.0)*](#phase-96-—-llm-narratorexplainerguide-via-chattydeer-completed-v0930) | 3247 |
-| [Phase 97 — Full-toolset guide, tool interpretation registry, skill generation, Ollama docs](#phase-97-—-full-toolset-guide-tool-interpretation-registry-skill-generation-ollama-docs) | 3280 |
-| [Phase 98 — CLI-based AI tool backends for narrator/guide](#phase-98-—-cli-based-ai-tool-backends-for-narratorguide) | 3341 |
-| [Phase 99 — `--provider ollama` for narrator/guide + Ollama model discovery](#phase-99-—-provider-ollama-for-narratorguide-ollama-model-discovery) | 3405 |
-| [Phase 100 — Persistent, registry-backed server-side repo storage](#phase-100-—-persistent-registry-backed-server-side-repo-storage) | 3471 |
-| [Phases 101–103 — Pluggable storage backends & index scoping](#phases-101103-—-pluggable-storage-backends-index-scoping) | 3539 |
-| [Phase 104 — Full-toolset guide coverage, per-command `--narrate`, and a guided `gitsema setup` wizard](#phase-104-—-full-toolset-guide-coverage-per-command-narrate-and-a-guided-gitsema-setup-wizard) | 3741 |
-| [Long-Term Investments](#long-term-investments) | 3446 |
-| [Non-goals for now (revisited later)](#non-goals-for-now-revisited-later) | 3463 |
+| [Vision](#vision) | 127 |
+| [Guiding principles](#guiding-principles) | 133 |
+| [Architecture overview](#architecture-overview) | 143 |
+| [Project structure](#project-structure) | 163 |
+| [Section I - Phases](#section-i-phases) | 215 |
+| [Phase 1 — Foundation](#phase-1-—-foundation) | 217 |
+| [Phase 2 — Git walking](#phase-2-—-git-walking) | 259 |
+| [Phase 3 — Embedding system](#phase-3-—-embedding-system) | 283 |
+| [Phase 4 — Indexing](#phase-4-—-indexing) | 321 |
+| [Phase 5 — Search · *MVP deliverable*](#phase-5-—-search-·-mvp-deliverable) | 347 |
+| [Phase 6 — Commit mapping](#phase-6-—-commit-mapping) | 380 |
+| [Phase 7 — Time-aware queries · *Phase 2 deliverable*](#phase-7-—-time-aware-queries-·-phase-2-deliverable) | 417 |
+| [Phase 8 — File-type-aware embedding models](#phase-8-—-file-type-aware-embedding-models) | 450 |
+| [Phase 9 — Performance](#phase-9-—-performance) | 488 |
+| [Phase 10 — Smarter semantics](#phase-10-—-smarter-semantics) | 526 |
+| [Phase 11 — Advanced features + MCP](#phase-11-—-advanced-features-mcp) | 571 |
+| [Phase 11b — Content access and semantic concept tracking](#phase-11b-—-content-access-and-semantic-concept-tracking) | 642 |
+| [Key technical decisions](#key-technical-decisions) | 759 |
+| [Risk register](#risk-register) | 771 |
+| [Phase 12 — CLI consolidation & robust per-file indexing](#phase-12-—-cli-consolidation-robust-per-file-indexing) | 783 |
+| [Recent progress (snapshot: 2026-04-01)](#recent-progress-snapshot-2026-04-01) | 813 |
+| [Phase 13 — Standalone model server for embeddings](#phase-13-—-standalone-model-server-for-embeddings) | 829 |
+| [Phase 14 — Infrastructure, tooling, and maintenance](#phase-14-—-infrastructure-tooling-and-maintenance) | 912 |
+| [Phase 14b — Search result deduplication](#phase-14b-—-search-result-deduplication) | 969 |
+| [Phase 15 — Branch awareness](#phase-15-—-branch-awareness) | 1003 |
+| [Phase 16 — Remote-repository indexing (server-managed clone, RAM-backed working tree, persistent DB)](#phase-16-—-remote-repository-indexing-server-managed-clone-ram-backed-working-tree-persistent-db) | 1075 |
+| [Phase 17 — Remote-indexing hardening and SSH support](#phase-17-—-remote-indexing-hardening-and-ssh-support) | 1333 |
+| [Phase 18 — Reliability, tests, and query caching](#phase-18-—-reliability-tests-and-query-caching) | 1404 |
+| [Phase 19 — Smarter chunking, semantic blame & symbol-level embeddings](#phase-19-—-smarter-chunking-semantic-blame-symbol-level-embeddings) | 1418 |
+| [Phase 20 — Dead-concept detection & refactor impact analysis](#phase-20-—-dead-concept-detection-refactor-impact-analysis) | 1483 |
+| [Phase 21 — Semantic clustering & concept graph](#phase-21-—-semantic-clustering-concept-graph) | 1496 |
+| [Phase 22 — Temporal cluster diff](#phase-22-—-temporal-cluster-diff) | 1509 |
+| [Phase 23 — Cluster timeline](#phase-23-—-cluster-timeline) | 1522 |
+| [Phase 24 — Enhanced cluster labeling](#phase-24-—-enhanced-cluster-labeling) | 1536 |
+| [Phase 25 — Interactive HTML visualizations](#phase-25-—-interactive-html-visualizations) | 1550 |
+| [Phase 26 — CLI naming consolidation & conceptual diff](#phase-26-—-cli-naming-consolidation-conceptual-diff) | 1565 |
+| [Phase 27 — Semantic change-point detection](#phase-27-—-semantic-change-point-detection) | 1606 |
+| [Phase 28 — Persistent configuration management](#phase-28-—-persistent-configuration-management) | 1666 |
+| [Phase 29 — Automated indexing via Git hooks](#phase-29-—-automated-indexing-via-git-hooks) | 1693 |
+| [Phase 30 — Commit message semantic indexing](#phase-30-—-commit-message-semantic-indexing) | 1709 |
+| [Phase 31 — Semantic concept authorship ranking](#phase-31-—-semantic-concept-authorship-ranking) | 1760 |
+| [Phase 32 — Branch and merge awareness](#phase-32-—-branch-and-merge-awareness) | 1810 |
+| [Phase 33 — Multi-level hierarchical indexing](#phase-33-—-multi-level-hierarchical-indexing) | 1871 |
+| [Phase 34 — Feature adoption & cross-cutting improvements](#phase-34-—-feature-adoption-cross-cutting-improvements) | 1927 |
+| [Phase 35 — Multi-model DB, per-command model flags, clear-model, multi-model search](#phase-35-—-multi-model-db-per-command-model-flags-clear-model-multi-model-search) | 1965 |
+| [Phase 36 — Vector Index (VSS), Int8 Quantization, ANN Search](#phase-36-—-vector-index-vss-int8-quantization-ann-search) | 2003 |
+| [Phase 37 — Quick Wins: Selective Indexing, Code-to-Code Search, Negative Examples, Result Explanation](#phase-37-—-quick-wins-selective-indexing-code-to-code-search-negative-examples-result-explanation) | 2077 |
+| [Phase 38 — Medium Effort: Documentation Gap Analysis, Semantic Bisect, GC, Boolean Queries](#phase-38-—-medium-effort-documentation-gap-analysis-semantic-bisect-gc-boolean-queries) | 2102 |
+| [Phase 39 — Analysis Features: Contributor Profiles, Refactoring, Lifecycle, CI Diff](#phase-39-—-analysis-features-contributor-profiles-refactoring-lifecycle-ci-diff) | 2127 |
+| [Phase 40 — Visualization & Scale: Codebase Map, Temporal Heatmap, Remote Index, Cherry-Pick](#phase-40-—-visualization-scale-codebase-map-temporal-heatmap-remote-index-cherry-pick) | 2152 |
+| [Phase 41 — Multi-Repo Unified Index *(completed v0.43.0)*](#phase-41-—-multi-repo-unified-index-completed-v0430) | 2183 |
+| [Phase 42 — IDE / LSP Integration *(completed v0.44.0)*](#phase-42-—-ide-lsp-integration-completed-v0440) | 2199 |
+| [Phase 43 — Security Pattern Detection *(completed v0.45.0)*](#phase-43-—-security-pattern-detection-completed-v0450) | 2215 |
+| [Phase 44 — Codebase Health Timeline *(completed v0.46.0)*](#phase-44-—-codebase-health-timeline-completed-v0460) | 2230 |
+| [Phase 45 — Technical Debt Scoring *(completed v0.47.0)*](#phase-45-—-technical-debt-scoring-completed-v0470) | 2245 |
+| [Phase 46 — Evolution Alerts and Commit URL Construction *(completed v0.48.0)*](#phase-46-—-evolution-alerts-and-commit-url-construction-completed-v0480) | 2262 |
+| [Phase 47 — Richer Indexing Progress, Embed Latency Stats, and Incremental-by-Default Messaging](#phase-47-—-richer-indexing-progress-embed-latency-stats-and-incremental-by-default-messaging) | 2277 |
+| [Phase 48 — Batch Embedding and Provider Throughput ✅ Implemented](#phase-48-—-batch-embedding-and-provider-throughput-✅-implemented) | 2307 |
+| [Phase 49 — Auto-VSS Default Path ✅ Implemented (v0.51.0)](#phase-49-—-auto-vss-default-path-✅-implemented-v0510) | 2322 |
+| [Phase 50 — Real Multi-Repo Search ✅ Implemented (v0.52.0)](#phase-50-—-real-multi-repo-search-✅-implemented-v0520) | 2334 |
+| [Phase 51 — LSP Completion of the Protocol ✅ Implemented (v0.53.0)](#phase-51-—-lsp-completion-of-the-protocol-✅-implemented-v0530) | 2346 |
+| [Phase 52 — Query Expansion ✅ Implemented (v0.54.0)](#phase-52-—-query-expansion-✅-implemented-v0540) | 2359 |
+| [Phase 53 — Saved Searches and Watch Mode ✅ Implemented (v0.55.0)](#phase-53-—-saved-searches-and-watch-mode-✅-implemented-v0550) | 2371 |
+| [Phase 54 — Index Bundle Export / Import ✅ Implemented (v0.56.0)](#phase-54-—-index-bundle-export-import-✅-implemented-v0560) | 2383 |
+| [Phase 55 — Embedding Space Explorer (Web UI) ✅ Implemented (v0.57.0)](#phase-55-—-embedding-space-explorer-web-ui-✅-implemented-v0570) | 2394 |
+| [Phase 56 — LLM-Powered Evolution Narration ✅ Implemented (v0.58.0)](#phase-56-—-llm-powered-evolution-narration-✅-implemented-v0580) | 2405 |
+| [Phase 57 — GitHub Actions Integration for CI Diff ✅ Implemented (v0.59.0)](#phase-57-—-github-actions-integration-for-ci-diff-✅-implemented-v0590) | 2416 |
+| [Phase 58 — Structured Security Scan (Static + Semantic) ✅ Implemented (v0.60.0)](#phase-58-—-structured-security-scan-static-semantic-✅-implemented-v0600) | 2427 |
+| [Phase 59 — `gitsema tools` Subcommand Group (Protocol Servers) ✅ Implemented (v0.61.0)](#phase-59-—-gitsema-tools-subcommand-group-protocol-servers-✅-implemented-v0610) | 2439 |
+| [Phase 60 — Uniform Column Headers + `--no-headings` Across All Commands ✅ Implemented (v.0.62.0)](#phase-60-—-uniform-column-headers-no-headings-across-all-commands-✅-implemented-v0620) | 2480 |
+| [Phase 61 — MCP/HTTP Parity + Semantic PR Report *(completed v0.64.0)*](#phase-61-—-mcphttp-parity-semantic-pr-report-completed-v0640) | 2545 |
+| [Phase 62 — Heavy Batching for Ollama + HTTP Providers *(completed v0.67.0)*](#phase-62-—-heavy-batching-for-ollama-http-providers-completed-v0670) | 2565 |
+| [Phase 63 — Indexing Auto-Defaults and Adaptive Tuning *(completed v0.65.0)*](#phase-63-—-indexing-auto-defaults-and-adaptive-tuning-completed-v0650) | 2579 |
+| [Phase 64 — Search Scalability + AI Retrieval Reliability *(completed v0.66.0)*](#phase-64-—-search-scalability-ai-retrieval-reliability-completed-v0660) | 2595 |
+| [Phase 65 — Incident Triage Bundle *(completed v0.68.0)*](#phase-65-—-incident-triage-bundle-completed-v0680) | 2609 |
+| [Phase 66 — Policy Checks for CI *(completed v0.68.0)*](#phase-66-—-policy-checks-for-ci-completed-v0680) | 2617 |
+| [Phase 67 — Ownership Heatmap by Concept *(completed v0.68.0)*](#phase-67-—-ownership-heatmap-by-concept-completed-v0680) | 2625 |
+| [Phase 68 — Persistent Workflow Templates *(completed v0.68.0)*](#phase-68-—-persistent-workflow-templates-completed-v0680) | 2633 |
+| [Phase 69 — Pipelined Batch Indexing *(completed v0.68.0)*](#phase-69-—-pipelined-batch-indexing-completed-v0680) | 2641 |
+| [Phase 70 — Unified Output System *(completed v0.69.0)*](#phase-70-—-unified-output-system-completed-v0690) | 2649 |
+| [Phase 71 — Index Status Dashboard + Model Management *(completed v0.71.0)*](#phase-71-—-index-status-dashboard-model-management-completed-v0710) | 2666 |
+| [Planned Phases (72+)](#planned-phases-72) | 2688 |
+| [Phase 71 — Operational Readiness: Metrics, Rate Limiting, and OpenAPI *(completed v0.71.0)*](#phase-71-—-operational-readiness-metrics-rate-limiting-and-openapi-completed-v0710) | 2694 |
+| [Phase 72 — HTTP Route Parity for All Analysis Commands *(completed v0.72.0)*](#phase-72-—-http-route-parity-for-all-analysis-commands-completed-v0720) | 2707 |
+| [Phase 73 — Deployment Guide and Docker Infrastructure](#phase-73-—-deployment-guide-and-docker-infrastructure) | 2719 |
+| [Phase 74 — `gitsema status` Scale Warnings + Extended `gitsema doctor` Pre-flight](#phase-74-—-gitsema-status-scale-warnings-extended-gitsema-doctor-pre-flight) | 2732 |
+| [Phase 75 — Per-Repo Access Control on HTTP Server](#phase-75-—-per-repo-access-control-on-http-server) | 2745 |
+| [Phase 76 — Complete `htmlRenderer.ts` Modularisation](#phase-76-—-complete-htmlrendererts-modularisation) | 2759 |
+| [Phase 77 — Unified Indexing + Search Level Concept](#phase-77-—-unified-indexing-search-level-concept) | 2772 |
+| [Phase 82 — Auto-cap Search Memory *(completed v0.79.0)*](#phase-82-—-auto-cap-search-memory-completed-v0790) | 2788 |
+| [Phase 83 — Parallel Commit-Message Embedding *(completed v0.80.0)*](#phase-83-—-parallel-commit-message-embedding-completed-v0800) | 2800 |
+| [Phase 84 — LSP: documentSymbol + Improved definition/references *(completed v0.81.0)*](#phase-84-—-lsp-documentsymbol-improved-definitionreferences-completed-v0810) | 2814 |
+| [Phase 85 — Tier-1 Reliability: Test Isolation, SQL Sampling, Batch Dedup *(completed v0.84.0)*](#phase-85-—-tier-1-reliability-test-isolation-sql-sampling-batch-dedup-completed-v0840) | 2828 |
+| [Phase 86 — Tier-2 Code Organisation: MCP Modularization + Search Module Split + CLI Register Split *(completed v0.85.0)*](#phase-86-—-tier-2-code-organisation-mcp-modularization-search-module-split-cli-register-split-completed-v0850) | 2856 |
+| [Phase 87 — Tier-3 Robustness: Embed Retry, Queue Backpressure, Atomic FTS5, Body Limit *(completed v0.86.0)*](#phase-87-—-tier-3-robustness-embed-retry-queue-backpressure-atomic-fts5-body-limit-completed-v0860) | 2884 |
+| [Phase 88 — Tier-4 Scale/Features: LLM Narrator Tests + Docs Sync Check *(completed v0.87.0)*](#phase-88-—-tier-4-scalefeatures-llm-narrator-tests-docs-sync-check-completed-v0870) | 2916 |
+| [Phase 89 — Tier-5 Code Quality: review6 §11 Detailed Findings *(completed v0.88.0)*](#phase-89-—-tier-5-code-quality-review6-§11-detailed-findings-completed-v0880) | 2940 |
+| [Phase 90 — Model Local Names (Shorthand / globalName) *(completed v0.89.0)*](#phase-90-—-model-local-names-shorthand-globalname-completed-v0890) | 3020 |
+| [Phase 91 — 8 Productized Usage Patterns (review7 §5) *(completed v0.90.0)*](#phase-91-—-8-productized-usage-patterns-review7-§5-completed-v0900) | 3075 |
+| [Phase 92 — review7 Improvement Bundle *(completed, 2026-04-09)*](#phase-92-—-review7-improvement-bundle-completed-2026-04-09) | 3121 |
+| [Phase 93 — Time filter semantics & pagination stability](#phase-93-—-time-filter-semantics-pagination-stability) | 3163 |
+| [Phase 94 — review8 CLI Wiring & Documentation Restoration *(completed v0.91.0)*](#phase-94-—-review8-cli-wiring-documentation-restoration-completed-v0910) | 3193 |
+| [Phase 95 — Flag unification (review8 §8.6/§8.9) *(completed v0.92.0)*](#phase-95-—-flag-unification-review8-§86§89-completed-v0920) | 3225 |
+| [Phase 96 — LLM Narrator/Explainer/Guide via chattydeer *(completed v0.93.0)*](#phase-96-—-llm-narratorexplainerguide-via-chattydeer-completed-v0930) | 3248 |
+| [Phase 97 — Full-toolset guide, tool interpretation registry, skill generation, Ollama docs](#phase-97-—-full-toolset-guide-tool-interpretation-registry-skill-generation-ollama-docs) | 3281 |
+| [Phase 98 — CLI-based AI tool backends for narrator/guide](#phase-98-—-cli-based-ai-tool-backends-for-narratorguide) | 3342 |
+| [Phase 99 — `--provider ollama` for narrator/guide + Ollama model discovery](#phase-99-—-provider-ollama-for-narratorguide-ollama-model-discovery) | 3406 |
+| [Long-Term Investments](#long-term-investments) | 3447 |
+| [Non-goals for now (revisited later)](#non-goals-for-now-revisited-later) | 3464 |
+| [Phase 100 — Persistent, registry-backed server-side repo storage](#phase-100-—-persistent-registry-backed-server-side-repo-storage) | 3472 |
+| [Phases 101–103 — Pluggable storage backends & index scoping](#phases-101–103-—-pluggable-storage-backends-index-scoping) | 3540 |
+| [Phase 104 — Full-toolset guide coverage, per-command `--narrate`, and a guided `gitsema setup` wizard](#phase-104-—-full-toolset-guide-coverage-per-command-narrate-and-a-guided-gitsema-setup-wizard) | 3742 |
+| [Knowledge Graph Track (Phases 105–112) — *planned*](#knowledge-graph-track-phases-105–112-—-planned) | 3888 |
 
 ---
 
@@ -3931,8 +3932,8 @@ include `co-change`, `deps`, `cycles`, `callers`/`callees`/`path`/`neighbors`,
 | **108** | Traversal primitives + CLI/MCP | — | `GraphStore` seam (recursive CTEs); `gitsema graph callers\|callees\|neighbors\|path`; MCP `call_graph`/`graph_neighbors`. |
 | **109** | `--lens` toggle + structural ranking | — | Cross-cutting `--lens` + `--weight-structural` in the re-rank loop; new commands `blast-radius`, `relate`, `similar --lens`, `unused`; `impact` gains `--lens`. Semantic stays the default for existing commands. |
 | **110** | Fusion: cascade planner + hotspots | — | Cascade query planner (`FTS → vector → graph traversal → merge/rerank`); `hotspots`; structural enrichment of `code-review`/`explain`/`guide`/`triage`.
| -| **111** | Unified graph UI | — | Render subgraphs in HTML (reuse `htmlRenderer-clusters.ts` force-graph); nodes deep-link into existing per-command HTML views — binds the standalone HTML outputs together. | -| **112** | Lens coverage & parity sweep | — | Cross-cutting adoption pass over the whole command surface (CLI + MCP + HTTP): shared `addLensOption()` helper, uniform §7.3 defaults + per-hit lens labeling, docs/skill/`interpretations.ts` parity, and a test asserting every lens-capable command exposes `lens`. Done last so it covers the 110 fusion commands too. | +| **111** | Lens coverage & parity sweep | — | Cross-cutting adoption pass over the whole command surface (CLI + MCP + HTTP): shared `addLensOption()` helper, uniform §7.3 defaults + per-hit lens labeling, docs/skill/`interpretations.ts` parity, and a test asserting every lens-capable command exposes `lens`. Done before the UI phase so it covers the 110 fusion commands too. | +| **112** | Unified graph UI (HTML + CLI) | — | Render subgraphs in HTML (reuse `htmlRenderer-clusters.ts` force-graph); nodes deep-link into existing per-command HTML views — binds the standalone HTML outputs together. Also adds a CLI/text-mode subgraph view (ASCII tree or list rendering of nodes/edges) for terminal-only workflows, alongside the HTML view. | Each phase ends with working software, tests, a `features.md` entry, a `PLAN.md` status update, and a changeset. **Start point: Phase 105** (isolated, test-heavy, @@ -4003,3 +4004,93 @@ pairwise co-change computation per commit to avoid O(n²) blowup on vendoring/lockfile-regeneration commits. Traversal primitives (callers/callees/path/neighbors) and the `--lens` toggle remain out of scope — Phase 108/109. + +**Status:** Phase 108 ✅ complete. 
The `GraphStore` interface +(`src/core/storage/types.ts`) gains five traversal primitives — +`neighbors`/`callers`/`callees`/`path`/`subgraph` — plus `GraphHit`, `GraphPath`, +`GraphPathHop`, `GraphSubgraph`, and a shared `MAX_GRAPH_TRAVERSAL_DEPTH = 3` +constant (knowledge-graph §6). Both `SqliteGraphStore` and `PostgresGraphStore` +implement them via recursive CTEs over `edges`/`graph_nodes` +(`src/core/storage/sqlite/graphTraversal.ts` and +`src/core/storage/postgres/graphTraversal.ts`): a `WITH RECURSIVE` walk with a +`ROW_NUMBER() OVER (PARTITION BY node_key ORDER BY depth)` window picks the +shortest-depth hit (and its edge type) per reached node for +`neighbors`/`callers`/`callees`/`subgraph`; `path` uses a second recursive CTE that +accumulates a delimited path string (`node|edgeType|reversed|node|...`) and returns +the shortest match. All traversal depths are clamped to +`MAX_GRAPH_TRAVERSAL_DEPTH` (`callers`/`callees`/`path`/`subgraph` default to 3; +`neighbors` defaults to 1). `UnsupportedGraphStore` throws the same +"graph queries require a relational backend" error for all five new methods, per +review9 §4. A new `src/core/graph/traversal.ts` wraps the primitives with +`resolveNode()` (Phase 107) for identifier resolution, backing four new CLI +commands — `gitsema graph callers [--depth]`, `gitsema graph callees + [--depth]`, `gitsema graph neighbors [--edge-types] [--direction] +[--depth]`, and `gitsema graph path ` — and two new MCP tools, `call_graph` +(callers/callees over `calls` edges) and `graph_neighbors` (typed neighborhood, any +edge kinds), registered in `src/mcp/tools/graph.ts`. 
**Deviation from the original +sketch:** `call_graph`/`graph_neighbors` are not yet added to the `gitsema guide` +`GUIDE_TOOLS` registry (46 tools) or `interpretations.ts` — left for the Phase 110 +fusion pass / Phase 111 lens-coverage sweep, consistent with `docsSync`'s existing +guard (which only requires every `GUIDE_TOOLS` entry to have an interpretation, not +that every MCP tool is in `GUIDE_TOOLS`). No schema change. Tests: +`tests/graphTraversal.test.ts`. + +**Status:** Phase 109 ✅ complete. Adds the cross-cutting `--lens +semantic|structural|hybrid` toggle (knowledge-graph §7/§8) plus a fourth ranking +signal in `vectorSearch` (`src/core/search/analysis/vectorSearch.ts`): +`weightStructural`/`structuralScores` extend the three-signal formula to +`score = (wv*cosine + wr*recency + wp*pathScore + ws*structScore) / wTotal`, where +`structScore` comes from a precomputed `Map` of structural +proximity (`1 / (1 + hops)`) from a query anchor. When neither option is set +(the default for every existing caller), `useWeightedSignals` and the formula are +unchanged from before Phase 109 — semantic-lens output is byte-for-byte identical. +A shared `src/cli/lib/lens.ts` provides `parseLens()`, `lensWeights()`, and +`addLensOption()` (adds `--lens ` and `--weight-structural ` to a +Commander command). Four new core modules under `src/core/graph/` back four new +top-level commands: +- `blastRadius` (`blastRadius.ts`) → `gitsema blast-radius [--lens] + [--depth] [-k/--top]` (default lens: hybrid) — structural dependents via + `graph.neighbors(node, {edgeTypes: BLAST_RADIUS_EDGE_TYPES, direction: 'in', + depth})` (calls/imports/extends/implements/references) and/or semantically + similar blobs/symbols. +- `relate` (`relate.ts`) → `gitsema relate [-k/--top]` — depth-1 + callers + depth-1 callees (labeled, structural) plus semantically similar + blobs/symbols — "both lenses, lose neither", no `--lens` flag (always shows all + three sections). 
+- `similar` (`similar.ts`) → `gitsema similar [--lens] [-k/--top]` + (default lens: hybrid) — structural similarity ranks nodes of the same `kind` by + Jaccard overlap of their outgoing edge targets (`imports` for files, `calls` for + symbols by default); semantic similarity ranks by embedding cosine similarity. +- `unused` (`unused.ts`) → `gitsema unused [--edge-types]` — file/function/class/ + method nodes with no inbound `calls`/`imports` edges (excludes `external:*` + nodes); the structural complement to the semantic `dead-concepts` command. + +The "semantic similarity without an embedding provider" lookup +(`src/core/graph/semanticNeighbors.ts`, `semanticNeighborsForNode()`) ranks stored +`embeddings`/`symbol_embeddings` rows by cosine similarity to the resolved graph +node's own stored embedding (file nodes use `currentBlobHash`'s whole-file +embedding; symbol nodes parse `symbol:##` and +use the matching `symbol_embeddings` row) — no network call. It returns +`{supported: false, hits: []}` on non-sqlite backends, and all four new commands +(plus `impact`'s blast-radius alias) render `(not supported on this storage +backend)` for the semantic section in that case rather than throwing. Shared +rendering helpers (`renderResolutionError`, `renderBlastRadius`) live in +`src/cli/lib/graphRender.ts`. + +`gitsema impact --lens structural|hybrid` becomes a thin alias over +`blastRadius()` (knowledge-graph §8): the path is normalized to a `file:` +graph node and delegated entirely to `blastRadius()`/`renderBlastRadius()`, +including `--dump`/`--out json` support. `--lens semantic` (the default) preserves +the pre-Phase-109 `computeImpact()` code path exactly. 
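The four-signal blend described above for Phase 109 can be sketched as follows. This is a minimal illustration under stated assumptions, not the code in `vectorSearch.ts`: the names `SignalWeights`, `Signals`, `structuralProximity`, and `blendedScore` are hypothetical, and treating an unreachable node as a structural score of 0 is an assumption rather than something the notes above specify.

```typescript
// Hypothetical types for illustration only; not the actual vectorSearch API.
interface SignalWeights {
  vector: number      // wv
  recency: number     // wr
  path: number        // wp
  structural: number  // ws
}

interface Signals {
  cosine: number
  recency: number
  pathScore: number
  hops?: number // graph distance from the query anchor; undefined if unreachable
}

// Structural proximity as described above: 1 / (1 + hops).
// Assumption: an unreachable node contributes 0.
function structuralProximity(hops: number | undefined): number {
  return hops === undefined ? 0 : 1 / (1 + hops)
}

// score = (wv*cosine + wr*recency + wp*pathScore + ws*structScore) / wTotal
function blendedScore(s: Signals, w: SignalWeights): number {
  const wTotal = w.vector + w.recency + w.path + w.structural
  if (wTotal === 0) return 0
  const structScore = structuralProximity(s.hops)
  return (
    w.vector * s.cosine +
    w.recency * s.recency +
    w.path * s.pathScore +
    w.structural * structScore
  ) / wTotal
}
```

With the structural weight set to 0, the sketch reduces to the three-signal formula, which matches the note above that semantic-lens ranking is unchanged when the new options are unset.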
+ +**Deviations from the original sketch:** (1) `--weight-structural` is accepted by +`blast-radius`/`similar`/`impact` for consistency with the shared `--lens` option +helper, but the new fusion commands rank their structural/semantic sections +independently (Jaccard / graph-distance / cosine) rather than through +`vectorSearch`'s four-signal blend — `--weight-structural` only affects ranking +when `--lens` flows into `vectorSearch` directly (not yet wired for any CLI +command in this phase; the four-signal formula itself is tested and ready for a +future search-integration phase). (2) No new MCP tools were added for +`blast-radius`/`relate`/`similar`/`unused` — left for a future fusion/MCP-coverage +phase, consistent with the Phase 108 deviation note. No schema change. Tests: +`tests/graphLens.test.ts`. diff --git a/docs/features.md b/docs/features.md index b30c2db..f222dcc 100644 --- a/docs/features.md +++ b/docs/features.md @@ -200,6 +200,9 @@ All search uses the **text embedding model** (not the code model) to embed queri | **Generic `--narrate` via `narrateToolResult` (Phase 104)** | `narrateToolResult(toolKey, result)` in `src/core/llm/narrator.ts` looks up the tool's `TOOL_INTERPRETATIONS` entry, redacts and caps the JSON result, and asks the active narrator model for a prose summary (safe-by-default — no network unless `--narrate` is passed and a narrator model is configured). 
Wired onto `--narrate` flags on `first-seen`, `branch-summary`, `merge-audit`, `merge-preview`, `dead-concepts`, `debt`, `doc-gap`, `security-scan`, `blame`/`semantic-blame`, `triage`, `impact`, `ownership`, `experts`, `author`, `contributor-profile`, `bisect`, `refactor-candidates`, `cherry-pick-suggest`, and `heatmap` | | **Guided `gitsema setup` wizard with storage backend selection (Phase 104)** | `gitsema setup` (primary name; `gitsema quickstart` remains a backward-compat alias) extends the onboarding wizard with a storage-backend step (sqlite/postgres/qdrant), persisting `storage.*` config keys and validating the connection via `getCachedStorageProfile().metadata.getLastIndexedCommit()` (reverting to sqlite on failure), plus an optional final step to configure a local Ollama narrator/guide model via `gitsema models add --narrator\|--guide --provider ollama --activate` | | **Co-change / dependency / cycle queries (Phase 107)** | `gitsema co-change [-k/--top]` — files that historically change together with `` (from `co_change` edges); `gitsema deps [--reverse] [--depth] [--edge-types]` — import/dependency closure of a file or symbol (BFS over `imports`/`calls`/`extends`/`implements` edges); `gitsema graph cycles` / top-level `gitsema cycles [--edge-types]` — detect cycles in the structural graph (default: `imports`). All require `gitsema index --graph` + `gitsema graph build` first | +| **Graph traversal primitives (Phase 108)** | `GraphStore.neighbors/callers/callees/path/subgraph` — recursive-CTE traversals over `graph_nodes`/`edges` (sqlite + Postgres; Qdrant profile throws "graph queries require a relational backend"), depth capped at 3. 
CLI: `gitsema graph callers [--depth]` (reverse `calls` traversal), `gitsema graph callees [--depth]` (forward `calls` traversal), `gitsema graph neighbors [--edge-types] [--direction] [--depth]` (typed neighborhood, any edge kinds), `gitsema graph path ` (shortest typed path, rendered as `-[edgeType]->`/`<-[edgeType]-` hops). MCP: `call_graph` (callers/callees) and `graph_neighbors`. All resolve symbol qualified names, file paths, or literal node keys via `resolveNode()`; require `gitsema index --graph` + `gitsema graph build` first | +| **`--lens` toggle + four-signal ranking (Phase 109)** | `vectorSearch` gains a fourth ranking signal — `weightStructural`/`structuralScores` extend the three-signal formula to `score = (wv*cosine + wr*recency + wp*pathScore + ws*structScore) / wTotal`, where `structScore` is `1 / (1 + hops)` graph proximity from a query anchor. Unset by default (byte-for-byte identical to pre-Phase-109 ranking). Shared `--lens semantic\|structural\|hybrid` + `--weight-structural ` CLI options (`src/cli/lib/lens.ts`) toggle which signal(s) drive a command's output | +| **Structural/semantic fusion commands (Phase 109)** | `gitsema blast-radius [--lens] [--depth] [-k/--top]` (default: hybrid) — "what changes if I touch this": structural dependents (`calls`/`imports`/`extends`/`implements`/`references`, reverse traversal) and/or semantically similar blobs/symbols; `gitsema relate [-k/--top]` — depth-1 callers + depth-1 callees (labeled) plus semantically similar blobs/symbols, both lenses always shown; `gitsema similar [--lens] [-k/--top]` (default: hybrid) — structural similarity via Jaccard overlap of outgoing edge targets (same call/import "shape") and/or semantic embedding similarity; `gitsema unused [--edge-types]` — file/function/class/method nodes with no inbound `calls`/`imports` edges, the structural complement to `dead-concepts`. 
Semantic ranking (`semanticNeighborsForNode`) uses already-stored embeddings — no embedding provider call — and degrades to `(not supported on this storage backend)` on non-sqlite backends. `gitsema impact --lens structural\|hybrid` becomes a thin alias over `blast-radius`; `--lens semantic` (default) is unchanged | --- @@ -361,6 +364,8 @@ Start with `gitsema tools mcp`. All tools share the same core logic as the CLI. | `triage` | Incident triage bundle: first-seen, change points, evolution, bisect, experts | | `policy_check` | CI policy gate — debt score, security similarity, and concept drift thresholds | | `workflow_run` | Run a named workflow template (`pr-review` \| `incident` \| `release-audit`) | +| `call_graph` | Structural call-graph traversal — callers/callees of a symbol (Phase 108) | +| `graph_neighbors` | Typed neighborhood of a graph node — any edge kinds, direction, depth (Phase 108) | --- diff --git a/docs/knowledge-graph.md b/docs/knowledge-graph.md index 1332ead..f4f2877 100644 --- a/docs/knowledge-graph.md +++ b/docs/knowledge-graph.md @@ -3,8 +3,8 @@ > Status: **Design / not yet implemented.** Target track: Phases 105–112. > Scope decision (owner): build the **structural** graph first (typed edges from > static analysis), starting with **TypeScript/JavaScript + Python**, then expand -> to Go/Rust/Java. A separate presentation/UI graph (Phase 111) folds in later, and -> a cross-command lens-coverage sweep (Phase 112) closes the track. +> to Go/Rust/Java. A cross-command lens-coverage sweep (Phase 111) precedes a +> separate presentation/UI graph (Phase 112, HTML + CLI) which closes the track. This document is the single design reference for the knowledge-graph track. It nails down the **identity model**, the **schema**, the **per-language name-resolution @@ -376,7 +376,7 @@ fusion lives in one place. This mirrors the precedent set by `--hybrid` + `--bm25-weight` (vector+BM25 blend), so the convention is already idiomatic. 
The MCP/HTTP tool surfaces expose `lens` as a -parameter wherever the CLI flag exists. A dedicated **Phase 112** sweeps the whole +parameter wherever the CLI flag exists. A dedicated **Phase 111** sweeps the whole command set to make this coverage uniform and mechanically enforced (a shared `addLensOption()` helper + a parity test), rather than wired ad-hoc per command. @@ -427,8 +427,8 @@ when each becomes buildable — several temporal ones need only `co_change` | **108** | Traversal primitives + CLI/MCP | — | `GraphStore` seam (recursive CTEs); `gitsema graph callers\|callees\|neighbors\|path`; MCP `call_graph`/`graph_neighbors`. | | **109** | `--lens` toggle + structural ranking | — | Cross-cutting `--lens semantic\|structural\|hybrid` + `--weight-structural` (§7) wired into the re-rank loop; new commands `blast-radius`, `relate`, `similar --lens`, `unused`; `impact` gains `--lens`. **Semantic stays the default for existing commands.** | | **110** | Fusion: cascade planner + hotspots | — | Cascade query planner `FTS filter → vector expand → graph traversal → merge/rerank`; `hotspots`; structural enrichment of `code-review`/`explain`/`guide`/`triage`. | -| **111** | Unified graph UI | — | Render subgraphs in HTML (reuse `htmlRenderer-clusters.ts` force-graph); nodes deep-link into existing per-command HTML views — binds the standalone HTML outputs together. | -| **112** | Lens coverage & parity sweep | — | Cross-cutting adoption pass over the **entire command surface** (CLI + MCP + HTTP): wire `--lens`/`lens` into every command where more than one lens is meaningful, via a single shared `addLensOption()` helper; enforce the §7.3 defaults and per-hit lens labeling uniformly; restore docs / skill / `interpretations.ts` parity; add a test asserting every lens-capable command/tool exposes `lens` (the same mechanical-guarantee approach as `docsSync`). Done last so it also covers the fusion commands from 110. 
| +| **111** | Lens coverage & parity sweep | — | Cross-cutting adoption pass over the **entire command surface** (CLI + MCP + HTTP): wire `--lens`/`lens` into every command where more than one lens is meaningful, via a single shared `addLensOption()` helper; enforce the §7.3 defaults and per-hit lens labeling uniformly; restore docs / skill / `interpretations.ts` parity; add a test asserting every lens-capable command/tool exposes `lens` (the same mechanical-guarantee approach as `docsSync`). Done before the UI phase so it also covers the fusion commands from 110. | +| **112** | Unified graph UI (HTML + CLI) | — | Render subgraphs in HTML (reuse `htmlRenderer-clusters.ts` force-graph); nodes deep-link into existing per-command HTML views — binds the standalone HTML outputs together. Also adds a CLI/text-mode subgraph view (ASCII tree or list rendering of nodes/edges) for terminal-only workflows. | Each phase ends with working software, tests, a `features.md` entry, a `PLAN.md` status update, and a changeset (per `CLAUDE.md`). diff --git a/src/cli/commands/graphBlastRadius.ts b/src/cli/commands/graphBlastRadius.ts new file mode 100644 index 0000000..d0b0902 --- /dev/null +++ b/src/cli/commands/graphBlastRadius.ts @@ -0,0 +1,26 @@ +import { getCachedStorageProfile } from '../../core/storage/resolveProfile.js' +import { blastRadius } from '../../core/graph/blastRadius.js' +import { parseLens } from '../lib/lens.js' +import { renderResolutionError, renderBlastRadius } from '../lib/graphRender.js' + +export interface GraphBlastRadiusCommandOptions { + lens?: string + depth?: string + top?: string +} + +export async function blastRadiusCommand(symbol: string, options: GraphBlastRadiusCommandOptions = {}): Promise<void> { + const profile = getCachedStorageProfile(process.cwd()) + const lens = parseLens(options.lens, 'hybrid') + const depth = options.depth !== undefined ? parseInt(options.depth, 10) : undefined + const topK = options.top !== undefined ? 
parseInt(options.top, 10) : undefined + + const result = await blastRadius(profile.graph, symbol, { lens, depth, topK }) + + if (result.resolved.status !== 'found') { + console.log(renderResolutionError(symbol, result.resolved)) + return + } + + console.log(renderBlastRadius(result, result.resolved.node)) +} diff --git a/src/cli/commands/graphCallees.ts b/src/cli/commands/graphCallees.ts new file mode 100644 index 0000000..53079a3 --- /dev/null +++ b/src/cli/commands/graphCallees.ts @@ -0,0 +1,34 @@ +import { getCachedStorageProfile } from '../../core/storage/resolveProfile.js' +import { callees } from '../../core/graph/traversal.js' + +export interface GraphCalleesCommandOptions { + depth?: string +} + +export async function graphCalleesCommand(symbol: string, options: GraphCalleesCommandOptions = {}): Promise<void> { + const profile = getCachedStorageProfile(process.cwd()) + const depth = options.depth !== undefined ? parseInt(options.depth, 10) : undefined + + const result = await callees(profile.graph, symbol, depth) + + if (result.resolved.status === 'not-found') { + console.log(`No graph node found for "${symbol}". 
Run \`gitsema index --graph\` then \`gitsema graph build\` first.`) + return + } + if (result.resolved.status === 'ambiguous') { + console.log(`"${symbol}" is ambiguous — matches multiple symbols:`) + for (const c of result.resolved.candidates) console.log(` ${c.nodeKey}`) + return + } + + const node = result.resolved.node + console.log(`Callees of ${node.displayName} (${node.nodeKey}):\n`) + + if (result.hits.length === 0) { + console.log(' (none)') + return + } + for (const hit of result.hits) { + console.log(` ${hit.displayName} (depth ${hit.depth})`) + } +} diff --git a/src/cli/commands/graphCallers.ts b/src/cli/commands/graphCallers.ts new file mode 100644 index 0000000..17bfda0 --- /dev/null +++ b/src/cli/commands/graphCallers.ts @@ -0,0 +1,34 @@ +import { getCachedStorageProfile } from '../../core/storage/resolveProfile.js' +import { callers } from '../../core/graph/traversal.js' + +export interface GraphCallersCommandOptions { + depth?: string +} + +export async function graphCallersCommand(symbol: string, options: GraphCallersCommandOptions = {}): Promise<void> { + const profile = getCachedStorageProfile(process.cwd()) + const depth = options.depth !== undefined ? parseInt(options.depth, 10) : undefined + + const result = await callers(profile.graph, symbol, depth) + + if (result.resolved.status === 'not-found') { + console.log(`No graph node found for "${symbol}". 
Run \`gitsema index --graph\` then \`gitsema graph build\` first.`) + return + } + if (result.resolved.status === 'ambiguous') { + console.log(`"${symbol}" is ambiguous — matches multiple symbols:`) + for (const c of result.resolved.candidates) console.log(` ${c.nodeKey}`) + return + } + + const node = result.resolved.node + console.log(`Callers of ${node.displayName} (${node.nodeKey}):\n`) + + if (result.hits.length === 0) { + console.log(' (none)') + return + } + for (const hit of result.hits) { + console.log(` ${hit.displayName} (depth ${hit.depth})`) + } +} diff --git a/src/cli/commands/graphNeighbors.ts b/src/cli/commands/graphNeighbors.ts new file mode 100644 index 0000000..f00c66e --- /dev/null +++ b/src/cli/commands/graphNeighbors.ts @@ -0,0 +1,42 @@ +import { getCachedStorageProfile } from '../../core/storage/resolveProfile.js' +import { neighbors } from '../../core/graph/traversal.js' +import type { EdgeType } from '../../core/storage/types.js' + +export interface GraphNeighborsCommandOptions { + edgeTypes?: string + direction?: string + depth?: string +} + +export async function graphNeighborsCommand(node: string, options: GraphNeighborsCommandOptions = {}): Promise<void> { + const profile = getCachedStorageProfile(process.cwd()) + const edgeTypes = options.edgeTypes + ? options.edgeTypes.split(',').map((s) => s.trim()).filter(Boolean) as EdgeType[] + : undefined + const direction = (options.direction as 'out' | 'in' | 'both' | undefined) ?? 'both' + const depth = options.depth !== undefined ? parseInt(options.depth, 10) : undefined + + const result = await neighbors(profile.graph, node, { edgeTypes, direction, depth }) + + if (result.resolved.status === 'not-found') { + console.log(`No graph node found for "${node}". 
Run \`gitsema index --graph\` then \`gitsema graph build\` first.`) + return + } + if (result.resolved.status === 'ambiguous') { + console.log(`"${node}" is ambiguous — matches multiple symbols:`) + for (const c of result.resolved.candidates) console.log(` ${c.nodeKey}`) + return + } + + const resolvedNode = result.resolved.node + console.log(`Neighbors of ${resolvedNode.displayName} (${resolvedNode.nodeKey}):\n`) + + if (result.hits.length === 0) { + console.log(' (none)') + return + } + for (const hit of result.hits) { + const edge = hit.edgeType ? `[${hit.edgeType}] ` : '' + console.log(` ${edge}${hit.displayName} (depth ${hit.depth})`) + } +} diff --git a/src/cli/commands/graphPath.ts b/src/cli/commands/graphPath.ts new file mode 100644 index 0000000..4f486d1 --- /dev/null +++ b/src/cli/commands/graphPath.ts @@ -0,0 +1,46 @@ +import { getCachedStorageProfile } from '../../core/storage/resolveProfile.js' +import { path } from '../../core/graph/traversal.js' + +export async function graphPathCommand(a: string, b: string): Promise<void> { + const profile = getCachedStorageProfile(process.cwd()) + const result = await path(profile.graph, a, b) + + if (result.from.status === 'not-found') { + console.log(`No graph node found for "${a}". Run \`gitsema index --graph\` then \`gitsema graph build\` first.`) + return + } + if (result.from.status === 'ambiguous') { + console.log(`"${a}" is ambiguous — matches multiple symbols:`) + for (const c of result.from.candidates) console.log(` ${c.nodeKey}`) + return + } + if (result.to.status === 'not-found') { + console.log(`No graph node found for "${b}". 
Run \`gitsema index --graph\` then \`gitsema graph build\` first.`) + return + } + if (result.to.status === 'ambiguous') { + console.log(`"${b}" is ambiguous — matches multiple symbols:`) + for (const c of result.to.candidates) console.log(` ${c.nodeKey}`) + return + } + + const fromNode = result.from.node + const toNode = result.to.node + + if (!result.path) { + console.log(`No path found from ${fromNode.displayName} to ${toNode.displayName} within the traversal depth limit.`) + return + } + + if (result.path.hops.length === 0) { + console.log(`${fromNode.displayName} is the same node as ${toNode.displayName}.`) + return + } + + const segments = [fromNode.displayName] + for (const hop of result.path.hops) { + const arrow = hop.reversed ? `<-[${hop.edgeType}]-` : `-[${hop.edgeType}]->` + segments.push(arrow, hop.displayName) + } + console.log(segments.join(' ')) +} diff --git a/src/cli/commands/graphRelate.ts b/src/cli/commands/graphRelate.ts new file mode 100644 index 0000000..95cd477 --- /dev/null +++ b/src/cli/commands/graphRelate.ts @@ -0,0 +1,50 @@ +import { getCachedStorageProfile } from '../../core/storage/resolveProfile.js' +import { relate } from '../../core/graph/relate.js' +import { renderResolutionError } from '../lib/graphRender.js' + +export interface GraphRelateCommandOptions { + top?: string +} + +export async function relateCommand(symbol: string, options: GraphRelateCommandOptions = {}): Promise<void> { + const profile = getCachedStorageProfile(process.cwd()) + const topK = options.top !== undefined ? 
parseInt(options.top, 10) : undefined + + const result = await relate(profile.graph, symbol, { topK }) + + if (result.resolved.status !== 'found') { + console.log(renderResolutionError(symbol, result.resolved)) + return + } + + const node = result.resolved.node + console.log(`Related to ${node.displayName} (${node.nodeKey}):\n`) + + console.log('Called by (structural):') + if (result.callers.length === 0) { + console.log(' (none)') + } else { + for (const hit of result.callers) console.log(` ${hit.displayName}`) + } + console.log('') + + console.log('Calls (structural):') + if (result.callees.length === 0) { + console.log(' (none)') + } else { + for (const hit of result.callees) console.log(` ${hit.displayName}`) + } + console.log('') + + console.log('Semantically similar:') + if (!result.semanticSupported) { + console.log(' (not supported on this storage backend)') + } else if (result.similar.length === 0) { + console.log(' (none)') + } else { + for (const hit of result.similar) { + const label = hit.symbolName ?? hit.paths[0] ?? '(unknown)' + console.log(` ${hit.score.toFixed(3)} ${label}`) + } + } +} diff --git a/src/cli/commands/graphSimilar.ts b/src/cli/commands/graphSimilar.ts new file mode 100644 index 0000000..fd5cb1f --- /dev/null +++ b/src/cli/commands/graphSimilar.ts @@ -0,0 +1,51 @@ +import { getCachedStorageProfile } from '../../core/storage/resolveProfile.js' +import { similar } from '../../core/graph/similar.js' +import { parseLens } from '../lib/lens.js' +import { renderResolutionError } from '../lib/graphRender.js' + +export interface GraphSimilarCommandOptions { + lens?: string + top?: string +} + +export async function similarCommand(symbol: string, options: GraphSimilarCommandOptions = {}): Promise<void> { + const profile = getCachedStorageProfile(process.cwd()) + const lens = parseLens(options.lens, 'hybrid') + const topK = options.top !== undefined ? 
parseInt(options.top, 10) : undefined + + const result = await similar(profile.graph, symbol, { lens, topK }) + + if (result.resolved.status !== 'found') { + console.log(renderResolutionError(symbol, result.resolved)) + return + } + + const node = result.resolved.node + console.log(`Similar to ${node.displayName} (${node.nodeKey}) — lens: ${lens}\n`) + + if (lens !== 'semantic') { + console.log('Structural (same call/import shape):') + if (result.structural.length === 0) { + console.log(' (none)') + } else { + for (const hit of result.structural) { + console.log(` ${hit.jaccard.toFixed(3)} ${hit.displayName} (${hit.shared} shared)`) + } + } + console.log('') + } + + if (lens !== 'structural') { + console.log('Semantic:') + if (!result.semanticSupported) { + console.log(' (not supported on this storage backend)') + } else if (result.semantic.length === 0) { + console.log(' (none)') + } else { + for (const hit of result.semantic) { + const label = hit.symbolName ?? hit.paths[0] ?? '(unknown)' + console.log(` ${hit.score.toFixed(3)} ${label}`) + } + } + } +} diff --git a/src/cli/commands/graphUnused.ts b/src/cli/commands/graphUnused.ts new file mode 100644 index 0000000..56a243b --- /dev/null +++ b/src/cli/commands/graphUnused.ts @@ -0,0 +1,26 @@ +import { getCachedStorageProfile } from '../../core/storage/resolveProfile.js' +import { unused } from '../../core/graph/unused.js' +import type { EdgeType } from '../../core/storage/types.js' + +export interface GraphUnusedCommandOptions { + edgeTypes?: string +} + +export async function unusedCommand(options: GraphUnusedCommandOptions = {}): Promise<void> { + const profile = getCachedStorageProfile(process.cwd()) + const edgeTypes = options.edgeTypes + ? 
options.edgeTypes.split(',').map((s) => s.trim()).filter(Boolean) as EdgeType[] + : undefined + + const result = await unused(profile.graph, { edgeTypes }) + + if (result.nodes.length === 0) { + console.log('No unused symbols or files found (or `gitsema graph build` has not been run).') + return + } + + console.log(`${result.nodes.length} unused node${result.nodes.length === 1 ? '' : 's'} (no inbound calls/imports):\n`) + for (const node of result.nodes) { + console.log(` [${node.kind}] ${node.displayName}${node.path ? ` (${node.path})` : ''}`) + } +} diff --git a/src/cli/commands/impact.ts b/src/cli/commands/impact.ts index 15a0bd3..9dcd897 100644 --- a/src/cli/commands/impact.ts +++ b/src/cli/commands/impact.ts @@ -11,6 +11,10 @@ import { shortHash } from '../../core/search/ranking.js' import { buildProviderOrExit, resolveModels } from '../lib/provider.js' import { emitJsonSink } from '../lib/output.js' import { narrateToolResult } from '../../core/llm/narrator.js' +import { getCachedStorageProfile } from '../../core/storage/resolveProfile.js' +import { blastRadius } from '../../core/graph/blastRadius.js' +import { parseLens } from '../lib/lens.js' +import { renderResolutionError, renderBlastRadius } from '../lib/graphRender.js' export interface ImpactCommandOptions { /** Number of similar blobs to return (default 10). */ @@ -33,6 +37,7 @@ export interface ImpactCommandOptions { noHeadings?: boolean out?: string[] narrate?: boolean + lens?: string } /** @@ -104,6 +109,37 @@ export async function impactCommand( process.exit(1) } + const lens = parseLens(options.lens, 'semantic') + + // Phase 109 (knowledge-graph §8): `--lens structural|hybrid` makes `impact` + // a thin alias over `blast-radius` — true structural dependents instead of + // (or alongside) semantic similarity. `--lens semantic` (default) preserves + // pre-Phase-109 behavior exactly. 
+ if (lens !== 'semantic') { + const profile = getCachedStorageProfile(process.cwd()) + const normalised = filePath.trim().replace(/\\/g, '/').replace(/^\.\//, '') + const result = await blastRadius(profile.graph, normalised, { lens, topK }) + + if (result.resolved.status !== 'found') { + console.log(renderResolutionError(filePath, result.resolved)) + return + } + + if (options.dump !== undefined || (options.out?.some((o) => o.startsWith('json')))) { + const jsonSink = getSink(resolveOutputs({ out: options.out, dump: options.dump, html: options.html }), 'json') + if (jsonSink?.file) { + writeFileSync(jsonSink.file, JSON.stringify(result, null, 2), 'utf8') + console.log(`Wrote impact (blast-radius) report JSON to ${jsonSink.file}`) + } else { + console.log(JSON.stringify(result, null, 2)) + } + return + } + + console.log(renderBlastRadius(result, result.resolved.node)) + return + } + const resolvedPath = resolve(filePath.trim()) if (!existsSync(resolvedPath)) { console.error(`Error: file not found: ${resolvedPath}`) diff --git a/src/cli/lib/graphRender.ts b/src/cli/lib/graphRender.ts new file mode 100644 index 0000000..baa354b --- /dev/null +++ b/src/cli/lib/graphRender.ts @@ -0,0 +1,51 @@ +import type { ResolveNodeResult } from '../../core/graph/resolveNode.js' +import type { BlastRadiusResult } from '../../core/graph/blastRadius.js' +import type { GraphNodeRecord } from '../../core/storage/types.js' + +/** Shared "not found" / "ambiguous" message for `resolveNode()` results. */ +export function renderResolutionError(label: string, resolved: ResolveNodeResult): string { + if (resolved.status === 'ambiguous') { + const candidates = resolved.candidates.map((c) => ` ${c.nodeKey}`).join('\n') + return `"${label}" is ambiguous — matches multiple symbols:\n${candidates}` + } + return `No graph node found for "${label}". Run \`gitsema index --graph\` then \`gitsema graph build\` first.` +} + +/** + * Renders a `BlastRadiusResult` as human-readable text. 
Shared by + * `blast-radius` and `impact --lens`. The caller must have already checked + * `result.resolved.status === 'found'`. + */ +export function renderBlastRadius(result: BlastRadiusResult, node: GraphNodeRecord): string { + const lines: string[] = [] + lines.push(`Blast radius of ${node.displayName} (${node.nodeKey}) — lens: ${result.lens}`, '') + + if (result.lens !== 'semantic') { + lines.push('Structural dependents (who references this):') + if (result.structural.length === 0) { + lines.push(' (none)') + } else { + for (const hit of result.structural) { + const edge = hit.edgeType ? `[${hit.edgeType}] ` : '' + lines.push(` ${edge}${hit.displayName} (depth ${hit.depth})`) + } + } + lines.push('') + } + + if (result.lens !== 'structural') { + lines.push('Semantically related:') + if (!result.semanticSupported) { + lines.push(' (not supported on this storage backend)') + } else if (result.semantic.length === 0) { + lines.push(' (none)') + } else { + for (const hit of result.semantic) { + const label = hit.symbolName ?? hit.paths[0] ?? '(unknown)' + lines.push(` ${hit.score.toFixed(3)} ${label}`) + } + } + } + + return lines.join('\n') +} diff --git a/src/cli/lib/lens.ts b/src/cli/lib/lens.ts new file mode 100644 index 0000000..26d4871 --- /dev/null +++ b/src/cli/lib/lens.ts @@ -0,0 +1,51 @@ +import type { Command } from 'commander' + +/** + * The cross-cutting `--lens` toggle (Phase 109, knowledge-graph §7): which of + * the semantic/structural signals drive a command's ranking. + * + * - `semantic` — vectors + FTS only (today's default; structural weight 0). + * - `structural` — pure graph traversal/ranking (vector weight 0). + * - `hybrid` — both blended. 
+ */ +export type Lens = 'semantic' | 'structural' | 'hybrid' + +export function parseLens(value: string | undefined, fallback: Lens): Lens { + if (value === 'semantic' || value === 'structural' || value === 'hybrid') return value + return fallback +} + +/** Ranking-weight overrides for `vectorSearch`'s four-signal formula (§7.2). */ +export interface LensWeights { + weightVector?: number + weightRecency?: number + weightPath?: number + weightStructural?: number +} + +/** + * Translates `--lens` (+ optional `--weight-structural` override) into + * `vectorSearch` ranking-weight overrides. + * + * `semantic` with no explicit structural weight returns `{}` — leaving + * `vectorSearch`'s defaults untouched, so existing semantic-lens callers stay + * byte-for-byte identical to pre-Phase-109 behavior. + */ +export function lensWeights(lens: Lens, weightStructural?: number): LensWeights { + switch (lens) { + case 'structural': + return { weightVector: 0, weightRecency: 0, weightPath: 0, weightStructural: weightStructural ?? 1 } + case 'hybrid': + return { weightStructural: weightStructural ?? 0.3 } + case 'semantic': + default: + return weightStructural !== undefined ? { weightStructural } : {} + } +} + +/** Adds the shared `--lens` and `--weight-structural` options to a command. 
*/ +export function addLensOption(cmd: Command, defaultLens: Lens): Command { + return cmd + .option('--lens ', `'semantic' | 'structural' | 'hybrid' — which signal(s) drive ranking (default: ${defaultLens})`, defaultLens) + .option('--weight-structural ', 'structural signal weight (overrides the --lens default)') +} diff --git a/src/cli/register/all.ts b/src/cli/register/all.ts index 1fb48e1..eafc9ff 100644 --- a/src/cli/register/all.ts +++ b/src/cli/register/all.ts @@ -1,5 +1,6 @@ import { Command } from 'commander' import { collectOut } from '../../utils/outputSink.js' +import { addLensOption } from '../lib/lens.js' // Per-domain register helpers (keep existing split modules available) import { registerSetup } from './setup.js' @@ -330,22 +331,24 @@ export function registerAll(program: Command) { .option('--narrate', 'generate an LLM narrative of dead concepts (requires GITSEMA_LLM_URL)') .action(deadConceptsCommand) - program - .command('impact ') - .description('Compute semantically similar blobs across the codebase to highlight refactor impact (see also: blame, file-diff)') - .option('-k, --top ', 'number of similar blobs to return', '10') - .option('--chunks', 'include chunk-level embeddings for finer-grained coupling') - .option('--level ', 'search level: file (default), chunk, or symbol') - .option('--dump [file]', 'output structured JSON; writes to if given, otherwise prints JSON to stdout (legacy: prefer --out json)') - .option('--model ', 'override embedding model') - .option('--text-model ', 'override text embedding model') - .option('--code-model ', 'override code embedding model') - .option('--branch ', 'restrict results to blobs seen on this branch') - .option('--html [file]', 'output interactive HTML; writes to if given, otherwise impact.html (legacy: prefer --out html)') - .option('--out ', 'output spec (repeatable): text|json[:file]|html[:file]|markdown[:file] (overrides --dump/--html)', collectOut, [] as string[]) - .option('--no-headings', 
"don't print section header") - .option('--narrate', 'generate an LLM narrative of the impact report (requires GITSEMA_LLM_URL)') - .action(impactCommand) + addLensOption( + program + .command('impact ') + .description('Compute semantically similar blobs across the codebase to highlight refactor impact (see also: blame, file-diff, blast-radius)') + .option('-k, --top ', 'number of similar blobs to return', '10') + .option('--chunks', 'include chunk-level embeddings for finer-grained coupling') + .option('--level ', 'search level: file (default), chunk, or symbol') + .option('--dump [file]', 'output structured JSON; writes to if given, otherwise prints JSON to stdout (legacy: prefer --out json)') + .option('--model ', 'override embedding model') + .option('--text-model ', 'override text embedding model') + .option('--code-model ', 'override code embedding model') + .option('--branch ', 'restrict results to blobs seen on this branch') + .option('--html [file]', 'output interactive HTML; writes to if given, otherwise impact.html (legacy: prefer --out html)') + .option('--out ', 'output spec (repeatable): text|json[:file]|html[:file]|markdown[:file] (overrides --dump/--html)', collectOut, [] as string[]) + .option('--no-headings', "don't print section header") + .option('--narrate', 'generate an LLM narrative of the impact report (requires GITSEMA_LLM_URL)'), + 'semantic', + ).action(impactCommand) program .command('clusters') diff --git a/src/cli/register/graph.ts b/src/cli/register/graph.ts index 9d6ff49..2a4e21d 100644 --- a/src/cli/register/graph.ts +++ b/src/cli/register/graph.ts @@ -3,6 +3,15 @@ import { graphBuildCommand } from '../commands/graphBuild.js' import { coChangeCommand } from '../commands/coChange.js' import { depsCommand } from '../commands/deps.js' import { cyclesCommand } from '../commands/cycles.js' +import { graphCallersCommand } from '../commands/graphCallers.js' +import { graphCalleesCommand } from '../commands/graphCallees.js' +import { 
graphNeighborsCommand } from '../commands/graphNeighbors.js' +import { graphPathCommand } from '../commands/graphPath.js' +import { blastRadiusCommand } from '../commands/graphBlastRadius.js' +import { relateCommand } from '../commands/graphRelate.js' +import { similarCommand } from '../commands/graphSimilar.js' +import { unusedCommand } from '../commands/graphUnused.js' +import { addLensOption } from '../lib/lens.js' /** * Structural knowledge-graph commands (Phase 107, knowledge-graph §3.3/§8). @@ -52,4 +61,60 @@ export function registerGraph(program: Command) { .description('Detect cycles in the structural graph (default: import cycles) (alias of `gitsema graph cycles`)') .option('--edge-types ', 'comma-separated edge types to check for cycles (default: imports)') .action(cyclesCommand) + + // Phase 108: traversal primitives (recursive CTEs over edges/graph_nodes). + graph + .command('callers ') + .description('Reverse `calls` traversal — who (transitively) calls (default depth 3)') + .option('--depth ', 'limit traversal depth (max 3)') + .action(graphCallersCommand) + + graph + .command('callees ') + .description('Forward `calls` traversal — what (transitively) calls (default depth 3)') + .option('--depth ', 'limit traversal depth (max 3)') + .action(graphCalleesCommand) + + graph + .command('neighbors ') + .description('Typed neighborhood of — any edge kinds by default (default depth 1, max 3)') + .option('--edge-types ', 'comma-separated edge types to traverse (default: all)') + .option('--direction ', "'out' | 'in' | 'both' (default: both)") + .option('--depth ', 'limit traversal depth (max 3)') + .action(graphNeighborsCommand) + + graph + .command('path ') + .description('Shortest typed path from to (structural lens; max depth 3)') + .action(graphPathCommand) + + // Phase 109: --lens toggle + fusion commands (knowledge-graph §7/§8). 
+ addLensOption( + program + .command('blast-radius ') + .description('What changes if I touch this — structural dependents and/or semantically related blobs (default lens: hybrid)') + .option('--depth ', 'structural traversal depth (max 3)') + .option('-k, --top ', 'number of semantic results to return (default 10)'), + 'hybrid', + ).action(blastRadiusCommand) + + program + .command('relate ') + .description('Callers/callees (structural) and semantically similar blobs (vector), labeled — both lenses, lose neither') + .option('-k, --top ', 'number of semantic results to return (default 10)') + .action(relateCommand) + + addLensOption( + program + .command('similar ') + .description('Symbols/files with a similar call/import shape (structural) and/or semantically similar (vector) (default lens: hybrid)') + .option('-k, --top ', 'number of results to return per lens (default 10)'), + 'hybrid', + ).action(similarCommand) + + program + .command('unused') + .description('Symbols/files with no inbound calls/imports edges — structural complement to `dead-concepts`') + .option('--edge-types ', 'comma-separated inbound edge types that count as "used" (default: calls,imports)') + .action(unusedCommand) } diff --git a/src/core/graph/blastRadius.ts b/src/core/graph/blastRadius.ts new file mode 100644 index 0000000..a87a108 --- /dev/null +++ b/src/core/graph/blastRadius.ts @@ -0,0 +1,59 @@ +/** + * `gitsema blast-radius ` (Phase 109, knowledge-graph §7/§8): "what + * changes if I touch this" — structural dependents (who references this node, + * via `neighbors(..., direction: 'in')`) and/or semantically similar blobs, + * selected by `--lens`. The upgrade to `impact`'s semantic-only analysis. 
+ */ + +import type { EdgeType, GraphHit, GraphStore } from '../storage/types.js' +import { resolveNode, type ResolveNodeResult } from './resolveNode.js' +import { semanticNeighborsForNode, type SemanticHit } from './semanticNeighbors.js' +import type { Lens } from '../../cli/lib/lens.js' + +/** Edge types that represent "depends on" relationships for blast-radius purposes. */ +export const BLAST_RADIUS_EDGE_TYPES: EdgeType[] = ['calls', 'imports', 'extends', 'implements', 'references'] + +export interface BlastRadiusResult { + resolved: ResolveNodeResult + lens: Lens + /** Nodes that (transitively) depend on the resolved node — empty unless lens is structural/hybrid. */ + structural: GraphHit[] + /** Semantically similar blobs/symbols — empty unless lens is semantic/hybrid. */ + semantic: SemanticHit[] + /** False when the semantic lens was requested but the storage backend doesn't support it. */ + semanticSupported: boolean +} + +export interface BlastRadiusOptions { + lens?: Lens + depth?: number + topK?: number + edgeTypes?: EdgeType[] +} + +export async function blastRadius(graph: GraphStore, identifier: string, opts: BlastRadiusOptions = {}): Promise<BlastRadiusResult> { + const lens = opts.lens ?? 'hybrid' + const resolved = await resolveNode(graph, identifier) + if (resolved.status !== 'found') { + return { resolved, lens, structural: [], semantic: [], semanticSupported: true } + } + + const topK = opts.topK ?? 10 + const structural = lens === 'semantic' + ? [] + : await graph.neighbors(resolved.node.nodeKey, { + edgeTypes: opts.edgeTypes ?? 
BLAST_RADIUS_EDGE_TYPES, + direction: 'in', + depth: opts.depth, + }) + + let semantic: SemanticHit[] = [] + let semanticSupported = true + if (lens !== 'structural') { + const result = await semanticNeighborsForNode(resolved.node, topK) + semantic = result.hits + semanticSupported = result.supported + } + + return { resolved, lens, structural, semantic, semanticSupported } +} diff --git a/src/core/graph/relate.ts b/src/core/graph/relate.ts new file mode 100644 index 0000000..990d23f --- /dev/null +++ b/src/core/graph/relate.ts @@ -0,0 +1,41 @@ +/** + * `gitsema relate ` (Phase 109, knowledge-graph §7/§8): one view + * combining structural callers/callees (labeled, depth 1) with semantically + * similar blobs/symbols — "both lenses, lose neither". + */ + +import type { GraphHit, GraphStore } from '../storage/types.js' +import { resolveNode, type ResolveNodeResult } from './resolveNode.js' +import { semanticNeighborsForNode, type SemanticHit } from './semanticNeighbors.js' + +export interface RelateResult { + resolved: ResolveNodeResult + /** Direct (depth-1) callers of the resolved symbol. */ + callers: GraphHit[] + /** Direct (depth-1) callees of the resolved symbol. */ + callees: GraphHit[] + /** Semantically similar blobs/symbols. */ + similar: SemanticHit[] + /** False when the storage backend doesn't support the semantic lookup. */ + semanticSupported: boolean +} + +export interface RelateOptions { + topK?: number +} + +export async function relate(graph: GraphStore, identifier: string, opts: RelateOptions = {}): Promise<RelateResult> { + const resolved = await resolveNode(graph, identifier) + if (resolved.status !== 'found') { + return { resolved, callers: [], callees: [], similar: [], semanticSupported: true } + } + + const topK = opts.topK ?? 
10 + const [callers, callees, semantic] = await Promise.all([ + graph.callers(resolved.node.nodeKey, 1), + graph.callees(resolved.node.nodeKey, 1), + semanticNeighborsForNode(resolved.node, topK), + ]) + + return { resolved, callers, callees, similar: semantic.hits, semanticSupported: semantic.supported } +} diff --git a/src/core/graph/semanticNeighbors.ts b/src/core/graph/semanticNeighbors.ts new file mode 100644 index 0000000..3ba45c7 --- /dev/null +++ b/src/core/graph/semanticNeighbors.ts @@ -0,0 +1,177 @@ +/** + * Semantic-similarity helper for the Phase 109 fusion commands + * (`blast-radius`, `relate`, `similar`): ranks stored embeddings by cosine + * similarity to a graph node's *own* stored embedding, so these commands need + * no embedding provider / network call. + * + * sqlite-only for now (raw `embeddings`/`symbols`/`symbol_embeddings` queries); + * other storage backends degrade to an empty (but `supported: false`) result, + * so hybrid-lens commands fall back to structural-only output gracefully. + */ + +import { getActiveSession } from '../db/sqlite.js' +import { embeddings, paths, symbols, symbolEmbeddings } from '../db/schema.js' +import { eq, inArray } from 'drizzle-orm' +import { cosineSimilarityPrecomputed, vectorNorm } from '../search/analysis/vectorSearch.js' +import { bufferToFloat32 } from '../../utils/embedding.js' +import { getCachedStorageProfile } from '../storage/resolveProfile.js' +import type { GraphNodeRecord } from '../storage/types.js' + +export interface SemanticHit { + blobHash: string + paths: string[] + score: number + symbolId?: number + symbolName?: string + qualifiedName?: string + startLine?: number + endLine?: number +} + +export interface SemanticNeighborsResult { + /** False when the active storage backend doesn't support this lookup (non-sqlite). 
*/ + supported: boolean + hits: SemanticHit[] +} + +function pathsByBlob(hashes: string[]): Map<string, string[]> { + if (hashes.length === 0) return new Map() + const { db } = getActiveSession() + const rows = db.select({ blobHash: paths.blobHash, path: paths.path }) + .from(paths) + .where(inArray(paths.blobHash, hashes)) + .all() + const map = new Map<string, string[]>() + for (const row of rows) { + const list = map.get(row.blobHash) ?? [] + list.push(row.path) + map.set(row.blobHash, list) + } + return map +} + +/** + * Finds blobs whose whole-file embedding is most similar to `blobHash`'s, + * excluding `blobHash` itself. + */ +function fileNeighbors(blobHash: string, topK: number): SemanticHit[] { + const { db } = getActiveSession() + const target = db.select({ vector: embeddings.vector, model: embeddings.model }) + .from(embeddings) + .where(eq(embeddings.blobHash, blobHash)) + .limit(1) + .all()[0] + if (!target) return [] + + const targetVec = bufferToFloat32(target.vector as Buffer) + const targetNorm = vectorNorm(targetVec) + + const rows = db.select({ blobHash: embeddings.blobHash, vector: embeddings.vector }) + .from(embeddings) + .where(eq(embeddings.model, target.model)) + .all() + + const scored = rows + .filter((r) => r.blobHash !== blobHash) + .map((r) => ({ + blobHash: r.blobHash, + score: cosineSimilarityPrecomputed(targetVec, targetNorm, bufferToFloat32(r.vector as Buffer)), + })) + .sort((a, b) => b.score - a.score) + .slice(0, topK) + + const byBlob = pathsByBlob(scored.map((s) => s.blobHash)) + return scored.map((s) => ({ blobHash: s.blobHash, paths: byBlob.get(s.blobHash) ?? [], score: s.score })) +} + +/** + * Finds symbols whose embedding is most similar to the symbol identified by + * `blobHash` + `qualifiedName` + `signatureHash`, excluding itself. 
+ */ +function symbolNeighbors(blobHash: string, qualifiedName: string, signatureHash: string, topK: number): SemanticHit[] { + const { db } = getActiveSession() + const targetRow = db.select({ id: symbols.id, qualifiedName: symbols.qualifiedName, signatureHash: symbols.signatureHash }) + .from(symbols) + .where(eq(symbols.blobHash, blobHash)) + .all() + .find((s) => s.qualifiedName === qualifiedName && s.signatureHash === signatureHash) + if (!targetRow) return [] + + const targetEmb = db.select({ vector: symbolEmbeddings.vector, model: symbolEmbeddings.model }) + .from(symbolEmbeddings) + .where(eq(symbolEmbeddings.symbolId, targetRow.id)) + .limit(1) + .all()[0] + if (!targetEmb) return [] + + const targetVec = bufferToFloat32(targetEmb.vector as Buffer) + const targetNorm = vectorNorm(targetVec) + + const rows = db.select({ + symbolId: symbolEmbeddings.symbolId, + vector: symbolEmbeddings.vector, + blobHash: symbols.blobHash, + symbolName: symbols.symbolName, + qualifiedName: symbols.qualifiedName, + startLine: symbols.startLine, + endLine: symbols.endLine, + }) + .from(symbolEmbeddings) + .innerJoin(symbols, eq(symbolEmbeddings.symbolId, symbols.id)) + .where(eq(symbolEmbeddings.model, targetEmb.model)) + .all() + + const scored = rows + .filter((r) => r.symbolId !== targetRow.id) + .map((r) => ({ + ...r, + score: cosineSimilarityPrecomputed(targetVec, targetNorm, bufferToFloat32(r.vector as Buffer)), + })) + .sort((a, b) => b.score - a.score) + .slice(0, topK) + + const byBlob = pathsByBlob(scored.map((s) => s.blobHash)) + return scored.map((s) => ({ + blobHash: s.blobHash, + paths: byBlob.get(s.blobHash) ?? [], + score: s.score, + symbolId: s.symbolId, + symbolName: s.symbolName, + qualifiedName: s.qualifiedName ?? undefined, + startLine: s.startLine, + endLine: s.endLine, + })) +} + +/** Parses `symbol:##` into its parts. 
*/ +export function parseSymbolNodeKey(nodeKey: string): { path: string; qualifiedName: string; signatureHash: string } | null { + if (!nodeKey.startsWith('symbol:')) return null + const rest = nodeKey.slice('symbol:'.length) + const lastHash = rest.lastIndexOf('#') + const secondHash = rest.lastIndexOf('#', lastHash - 1) + if (lastHash === -1 || secondHash === -1) return null + return { + path: rest.slice(0, secondHash), + qualifiedName: rest.slice(secondHash + 1, lastHash), + signatureHash: rest.slice(lastHash + 1), + } +} + +/** + * Semantic neighbors of a resolved graph node — file nodes rank by whole-file + * embedding similarity; symbol nodes rank by symbol-embedding similarity. + * Returns `{ supported: false, hits: [] }` on non-sqlite backends. + */ +export async function semanticNeighborsForNode(node: GraphNodeRecord, topK = 10): Promise<SemanticNeighborsResult> { + const profile = getCachedStorageProfile() + if (profile.backend !== 'sqlite') return { supported: false, hits: [] } + if (!node.currentBlobHash) return { supported: true, hits: [] } + + if (node.kind === 'file') { + return { supported: true, hits: fileNeighbors(node.currentBlobHash, topK) } + } + + const parsed = parseSymbolNodeKey(node.nodeKey) + if (!parsed) return { supported: true, hits: [] } + return { supported: true, hits: symbolNeighbors(node.currentBlobHash, parsed.qualifiedName, parsed.signatureHash, topK) } +} diff --git a/src/core/graph/similar.ts b/src/core/graph/similar.ts new file mode 100644 index 0000000..febacbf --- /dev/null +++ b/src/core/graph/similar.ts @@ -0,0 +1,95 @@ +/** + * `gitsema similar --lens structural|semantic|hybrid` (Phase 109, + * knowledge-graph §7/§8): structural similarity ranks nodes by the Jaccard + * overlap of their outgoing edge targets (same call/import "shape" as the + * resolved node); semantic similarity ranks by embedding cosine similarity. 
+ */ + +import type { EdgeType, GraphStore } from '../storage/types.js' +import { resolveNode, type ResolveNodeResult } from './resolveNode.js' +import { semanticNeighborsForNode, type SemanticHit } from './semanticNeighbors.js' +import type { Lens } from '../../cli/lib/lens.js' + +export interface StructuralSimilarHit { + nodeKey: string + displayName: string + kind: string + /** Jaccard similarity of outgoing edge targets, in [0, 1]. */ + jaccard: number + /** Number of shared outgoing edge targets. */ + shared: number +} + +export interface SimilarResult { + resolved: ResolveNodeResult + lens: Lens + /** Nodes with a similar call/import shape — empty unless lens is structural/hybrid. */ + structural: StructuralSimilarHit[] + /** Semantically similar blobs/symbols — empty unless lens is semantic/hybrid. */ + semantic: SemanticHit[] + /** False when the semantic lens was requested but the storage backend doesn't support it. */ + semanticSupported: boolean +} + +export interface SimilarOptions { + lens?: Lens + topK?: number + /** Edge type whose outgoing targets define a node's "shape" (default: `calls` for symbols, `imports` for files). */ + edgeType?: EdgeType +} + +function jaccard(a: ReadonlySet<string>, b: ReadonlySet<string>): { score: number; shared: number } { + let shared = 0 + for (const k of a) if (b.has(k)) shared++ + const union = a.size + b.size - shared + return { score: union === 0 ? 0 : shared / union, shared } +} + +export async function similar(graph: GraphStore, identifier: string, opts: SimilarOptions = {}): Promise<SimilarResult> { + const lens = opts.lens ?? 'hybrid' + const resolved = await resolveNode(graph, identifier) + if (resolved.status !== 'found') { + return { resolved, lens, structural: [], semantic: [], semanticSupported: true } + } + + const topK = opts.topK ?? 10 + const edgeType: EdgeType = opts.edgeType ?? (resolved.node.kind === 'file' ? 
'imports' : 'calls') + + let structural: StructuralSimilarHit[] = [] + if (lens !== 'semantic') { + const edges = await graph.allEdges([edgeType]) + const setsBySrc = new Map<string, Set<string>>() + for (const e of edges) { + const set = setsBySrc.get(e.srcKey) ?? new Set<string>() + set.add(e.dstKey) + setsBySrc.set(e.srcKey, set) + } + + const targetSet = setsBySrc.get(resolved.node.nodeKey) + if (targetSet && targetSet.size > 0) { + const allNodes = await graph.allNodes() + const byKey = new Map(allNodes.map((n) => [n.nodeKey, n])) + const hits: StructuralSimilarHit[] = [] + for (const [nodeKey, set] of setsBySrc) { + if (nodeKey === resolved.node.nodeKey) continue + const node = byKey.get(nodeKey) + if (!node || node.kind !== resolved.node.kind) continue + const { score, shared } = jaccard(targetSet, set) + if (shared === 0) continue + hits.push({ nodeKey, displayName: node.displayName, kind: node.kind, jaccard: score, shared }) + } + hits.sort((a, b) => b.jaccard - a.jaccard || b.shared - a.shared) + structural = hits.slice(0, topK) + } + } + + let semantic: SemanticHit[] = [] + let semanticSupported = true + if (lens !== 'structural') { + const result = await semanticNeighborsForNode(resolved.node, topK) + semantic = result.hits + semanticSupported = result.supported + } + + return { resolved, lens, structural, semantic, semanticSupported } +} diff --git a/src/core/graph/traversal.ts b/src/core/graph/traversal.ts new file mode 100644 index 0000000..1805ead --- /dev/null +++ b/src/core/graph/traversal.ts @@ -0,0 +1,54 @@ +/** + * `gitsema graph callers|callees|neighbors|path` (Phase 108, knowledge-graph + * §6/§8): thin wrappers over `GraphStore`'s recursive-CTE traversal + * primitives, resolving user-supplied identifiers via `resolveNode`. 
+ */ + +import type { EdgeType, GraphHit, GraphPath, GraphStore } from '../storage/types.js' +import { resolveNode, type ResolveNodeResult } from './resolveNode.js' + +export interface TraversalResult { + resolved: ResolveNodeResult + hits: GraphHit[] +} + +/** Reverse `calls` traversal — who (transitively) calls `identifier`. */ +export async function callers(graph: GraphStore, identifier: string, depth?: number): Promise<TraversalResult> { + const resolved = await resolveNode(graph, identifier) + if (resolved.status !== 'found') return { resolved, hits: [] } + return { resolved, hits: await graph.callers(resolved.node.nodeKey, depth) } +} + +/** Forward `calls` traversal — what `identifier` (transitively) calls. */ +export async function callees(graph: GraphStore, identifier: string, depth?: number): Promise<TraversalResult> { + const resolved = await resolveNode(graph, identifier) + if (resolved.status !== 'found') return { resolved, hits: [] } + return { resolved, hits: await graph.callees(resolved.node.nodeKey, depth) } +} + +/** Typed neighborhood of `identifier` (any edge kinds by default). */ +export async function neighbors( + graph: GraphStore, + identifier: string, + opts: { edgeTypes?: EdgeType[]; direction?: 'out' | 'in' | 'both'; depth?: number } = {}, +): Promise<TraversalResult> { + const resolved = await resolveNode(graph, identifier) + if (resolved.status !== 'found') return { resolved, hits: [] } + return { resolved, hits: await graph.neighbors(resolved.node.nodeKey, opts) } +} + +export interface PathResult { + from: ResolveNodeResult + to: ResolveNodeResult + path: GraphPath | null +} + +/** Shortest typed path from `a` to `b` — "how does A reach B". 
*/ +export async function path(graph: GraphStore, a: string, b: string): Promise<PathResult> { + const from = await resolveNode(graph, a) + const to = await resolveNode(graph, b) + if (from.status !== 'found' || to.status !== 'found') { + return { from, to, path: null } + } + return { from, to, path: await graph.path(from.node.nodeKey, to.node.nodeKey) } +} diff --git a/src/core/graph/unused.ts b/src/core/graph/unused.ts new file mode 100644 index 0000000..ee8af26 --- /dev/null +++ b/src/core/graph/unused.ts @@ -0,0 +1,42 @@ +/** + * `gitsema unused` (Phase 109, knowledge-graph §7/§8): symbols/files with no + * inbound `calls`/`imports` edges — the structural complement to the semantic + * `dead-concepts` command. + */ + +import type { EdgeType, GraphNodeRecord, GraphStore } from '../storage/types.js' + +export const UNUSED_EDGE_TYPES: EdgeType[] = ['calls', 'imports'] +export const UNUSED_NODE_KINDS = ['file', 'function', 'class', 'method'] + +export interface UnusedOptions { + /** Inbound edge types that count as "used" (default: calls, imports). */ + edgeTypes?: EdgeType[] + /** Node kinds to consider (default: file + function/class/method symbol kinds). */ + kinds?: string[] +} + +export interface UnusedResult { + nodes: GraphNodeRecord[] +} + +export async function unused(graph: GraphStore, opts: UnusedOptions = {}): Promise<UnusedResult> { + const edgeTypes = opts.edgeTypes ?? UNUSED_EDGE_TYPES + const kinds = opts.kinds ?? 
UNUSED_NODE_KINDS + + const [allNodes, allEdges] = await Promise.all([ + graph.allNodes(), + graph.allEdges(edgeTypes), + ]) + + const referenced = new Set() + for (const e of allEdges) referenced.add(e.dstKey) + + const nodes = allNodes.filter((n) => + !n.isExternal && + kinds.includes(n.kind) && + !referenced.has(n.nodeKey), + ) + + return { nodes } +} diff --git a/src/core/models/types.ts b/src/core/models/types.ts index 027c3d7..f2cb2b9 100644 --- a/src/core/models/types.ts +++ b/src/core/models/types.ts @@ -70,5 +70,5 @@ export interface SearchResult { /** Cluster label from `cluster_assignments` — populated by `--annotate-clusters` on the search command. */ clusterLabel?: string /** When explain=true, breakdown of score components. */ - signals?: { cosine: number; recency?: number; pathScore?: number; bm25?: number } + signals?: { cosine: number; recency?: number; pathScore?: number; bm25?: number; structural?: number } } diff --git a/src/core/search/analysis/vectorSearch.ts b/src/core/search/analysis/vectorSearch.ts index 844566d..1319d2e 100644 --- a/src/core/search/analysis/vectorSearch.ts +++ b/src/core/search/analysis/vectorSearch.ts @@ -164,6 +164,21 @@ export interface VectorSearchOptions { weightVector?: number weightRecency?: number weightPath?: number + /** + * Fourth ranking signal weight (Phase 109, knowledge-graph §7.2): structural + * proximity from a query anchor along graph edges. Defaults to 0 (no + * structural signal) — when neither this nor `structuralScores` is set, the + * scoring formula is byte-for-byte identical to the pre-Phase-109 three-signal + * (or plain cosine) behavior. + */ + weightStructural?: number + /** + * Precomputed per-blob structural proximity scores (e.g. `1 / (1 + hops)` + * from a query anchor, weighted by edge confidence — see + * `src/core/graph/structuralScore.ts`), keyed by `blobHash`. Missing entries + * score 0. Only consulted when `weightStructural` is set. 
+ */ + structuralScores?: Map<string, number> query?: string searchChunks?: boolean searchSymbols?: boolean @@ -195,11 +210,14 @@ export async function vectorSearch(queryEmbedding: Embedding, options: VectorSea const { topK = 10, model, recent = false, alpha = 0.8, before, after, - weightVector, weightRecency, weightPath, query = '', + weightVector, weightRecency, weightPath, weightStructural, structuralScores, query = '', searchChunks = false, searchSymbols = false, searchModules = false, branch, negativeQueryEmbedding, negativeLambda, explain, earlyCut = 0, queryText, noCache = false, allowedHashes, } = options + // Per-anchor structural scores vary independently of the cache key's query + // text/embedding fingerprint, so bypass the result cache when present. + const effectiveNoCache = noCache || !!structuralScores // Per-mode row caps to prevent memory spikes on large indexes (review7 §4.4/§4.5). const FILE_CAP = (() => { @@ -221,7 +239,7 @@ export async function vectorSearch(queryEmbedding: Embedding, options: VectorSea const cacheKeyOptions: Record<string, unknown> = { topK, model, recent, alpha, before, after, - weightVector, weightRecency, weightPath, query, + weightVector, weightRecency, weightPath, weightStructural, query, searchChunks, searchSymbols, searchModules, branch, negativeLambda, explain, earlyCut, // §11.1 — include a deterministic fingerprint of allowedHashes so that @@ -237,16 +255,17 @@ export async function vectorSearch(queryEmbedding: Embedding, options: VectorSea queryText ?? embeddingFingerprint(queryEmbedding), cacheKeyOptions, ) - if (!noCache) { + if (!effectiveNoCache) { const cached = getCachedResults(cacheKey) if (cached) return cached } - const useThreeSignal = weightVector !== undefined || weightRecency !== undefined || weightPath !== undefined + const useWeightedSignals = weightVector !== undefined || weightRecency !== undefined || weightPath !== undefined || weightStructural !== undefined const wv = weightVector ?? 0.7 const wr = weightRecency ??
0.2 const wp = weightPath ?? 0.1 - const wTotal = wv + wr + wp || 1 + const ws = weightStructural ?? 0 + const wTotal = wv + wr + wp + ws || 1 const { db, rawDb } = getActiveSession() const AUTO_CANDIDATE_LIMIT = FILE_CAP @@ -424,7 +443,7 @@ export async function vectorSearch(queryEmbedding: Embedding, options: VectorSea const negLambda = options.negativeLambda ?? 0.5 const negNorm = negEmbedding ? vectorNorm(negEmbedding) : 0 - const needRecency = recent || useThreeSignal + const needRecency = recent || useWeightedSignals let recencyScores: Map<string, number> | null = null if (needRecency) { const candidateHashes = [...new Set( @@ -435,7 +454,7 @@ export async function vectorSearch(queryEmbedding: Embedding, options: VectorSea } let pathsByBlob: Map<string, string[]> | null = null - if (useThreeSignal) { + if (useWeightedSignals) { const hashes = [...new Set( scoringPool.filter((r) => !r.blobHash.startsWith('\0module:')).map((r) => r.blobHash), )] @@ -475,11 +494,12 @@ export async function vectorSearch(queryEmbedding: Embedding, options: VectorSea score = cosine - (negLambda * negCos) } - if (useThreeSignal) { + if (useWeightedSignals) { const recency = recencyScores?.get(row.blobHash) ?? 0 const blobPaths = pathsByBlob?.get(row.blobHash) ?? [] const pathScore = blobPaths.length > 0 ? Math.max(...blobPaths.map((p) => pathRelevanceScore(query, p))) : 0 - score = (wv * cosine + wr * recency + wp * pathScore) / wTotal + const structScore = structuralScores?.get(row.blobHash) ?? 0 + score = (wv * cosine + wr * recency + wp * pathScore + ws * structScore) / wTotal } else if (recent) { const recency = recencyScores?.get(row.blobHash) ?? 0 score = alpha * cosine + (1 - alpha) * recency @@ -563,13 +583,14 @@ export async function vectorSearch(queryEmbedding: Embedding, options: VectorSea const recency = recencyScores?.get(b.blobHash) const blobPaths = pathsByBlob?.get(b.blobHash) ?? [] const pathScore = blobPaths.length > 0 ?
Math.max(...blobPaths.map((p) => pathRelevanceScore(query, p))) : undefined - base.signals = { cosine: b.cosine, recency: recency ?? undefined, pathScore } + const structScore = structuralScores?.get(b.blobHash) + base.signals = { cosine: b.cosine, recency: recency ?? undefined, pathScore, structural: structScore } } return base }) - if (!noCache && !allowedHashes) { + if (!effectiveNoCache && !allowedHashes) { setCachedResults(cacheKey, results) } diff --git a/src/core/storage/postgres/graphStore.ts b/src/core/storage/postgres/graphStore.ts index a6c923d..fc2f865 100644 --- a/src/core/storage/postgres/graphStore.ts +++ b/src/core/storage/postgres/graphStore.ts @@ -8,7 +8,9 @@ import type { Pool } from 'pg' import { ensurePostgresSchema } from './migrations.js' -import type { EdgeType, GraphEdgeRecord, GraphNodeRecord, GraphStore } from '../types.js' +import type { EdgeType, GraphEdgeRecord, GraphHit, GraphNodeRecord, GraphPath, GraphPathHop, GraphStore, GraphSubgraph } from '../types.js' +import { MAX_GRAPH_TRAVERSAL_DEPTH } from '../types.js' +import { traverseNeighbors, findShortestPath, clampDepth, type WalkHit } from './graphTraversal.js' export class PostgresGraphStore implements GraphStore { constructor(private readonly pool: Pool) {} @@ -97,6 +99,77 @@ export class PostgresGraphStore implements GraphStore { ? 
collected.filter((e) => edgeTypes.includes(e.edgeType)) : collected } + + async neighbors(key: string, opts?: { edgeTypes?: EdgeType[]; direction?: 'out' | 'in' | 'both'; depth?: number }): Promise<GraphHit[]> { + await ensurePostgresSchema(this.pool) + const hits = await traverseNeighbors(this.pool, key, { ...opts, depthFallback: 1 }) + return this.hydrateHits(hits) + } + + async callers(key: string, depth?: number): Promise<GraphHit[]> { + await ensurePostgresSchema(this.pool) + const hits = await traverseNeighbors(this.pool, key, { edgeTypes: ['calls'], direction: 'in', depth, depthFallback: MAX_GRAPH_TRAVERSAL_DEPTH }) + return this.hydrateHits(hits) + } + + async callees(key: string, depth?: number): Promise<GraphHit[]> { + await ensurePostgresSchema(this.pool) + const hits = await traverseNeighbors(this.pool, key, { edgeTypes: ['calls'], direction: 'out', depth, depthFallback: MAX_GRAPH_TRAVERSAL_DEPTH }) + return this.hydrateHits(hits) + } + + async path(from: string, to: string): Promise<GraphPath | null> { + await ensurePostgresSchema(this.pool) + const found = await findShortestPath(this.pool, from, to) + if (!found) return null + + const hops: GraphPathHop[] = [] + for (const hop of found.hops) { + const node = await this.getNode(hop.nodeKey) + hops.push({ + nodeKey: hop.nodeKey, + displayName: node?.displayName ??
hop.nodeKey, + edgeType: hop.edgeType, + reversed: hop.reversed, + }) + } + return { from, to, hops } + } + + async subgraph(seed: string, depth?: number): Promise<GraphSubgraph> { + await ensurePostgresSchema(this.pool) + const maxDepth = clampDepth(depth, MAX_GRAPH_TRAVERSAL_DEPTH) + const hits = await traverseNeighbors(this.pool, seed, { direction: 'both', depth: maxDepth }) + const nodeKeys = [...new Set([seed, ...hits.map((h) => h.nodeKey)])] + + if (nodeKeys.length === 0) return { nodes: [], edges: [] } + + const nodesRes = await this.pool.query('SELECT * FROM graph_nodes WHERE node_key = ANY($1::text[])', [nodeKeys]) + const edgesRes = await this.pool.query( + 'SELECT * FROM edges WHERE src_key = ANY($1::text[]) AND dst_key = ANY($1::text[])', + [nodeKeys], + ) + return { nodes: nodesRes.rows.map(rowToNode), edges: edgesRes.rows.map(rowToEdge) } + } + + private async hydrateHits(hits: WalkHit[]): Promise<GraphHit[]> { + if (hits.length === 0) return [] + const nodeKeys = hits.map((h) => h.nodeKey) + const res = await this.pool.query('SELECT * FROM graph_nodes WHERE node_key = ANY($1::text[])', [nodeKeys]) + const byKey = new Map(res.rows.map((r: GraphNodeRow) => [r.node_key, rowToNode(r)])) + return hits + .map((h) => { + const node = byKey.get(h.nodeKey) + return { + nodeKey: h.nodeKey, + displayName: node?.displayName ?? h.nodeKey, + kind: node?.kind ?? 'unknown', + depth: h.depth, + edgeType: h.edgeType, + } + }) + .sort((a, b) => a.depth - b.depth || a.nodeKey.localeCompare(b.nodeKey)) + } } interface GraphNodeRow { diff --git a/src/core/storage/postgres/graphTraversal.ts b/src/core/storage/postgres/graphTraversal.ts new file mode 100644 index 0000000..5ffe487 --- /dev/null +++ b/src/core/storage/postgres/graphTraversal.ts @@ -0,0 +1,118 @@ +/** + * Recursive-CTE traversal primitives over `edges`/`graph_nodes` for the + * Postgres `GraphStore` (Phase 108, knowledge-graph §6).
Mirrors + * `../sqlite/graphTraversal.ts`; Postgres supports the same `WITH RECURSIVE` + * + window-function shape as SQLite. + */ + +import type { Pool } from 'pg' +import { MAX_GRAPH_TRAVERSAL_DEPTH, type EdgeType } from '../types.js' + +export function clampDepth(depth: number | undefined, fallback: number): number { + const d = depth ?? fallback + return Math.max(1, Math.min(Math.trunc(d), MAX_GRAPH_TRAVERSAL_DEPTH)) +} + +export interface WalkHit { + nodeKey: string + depth: number + edgeType?: EdgeType +} + +async function walkDirection( + pool: Pool, + start: string, + maxDepth: number, + edgeTypes: EdgeType[] | undefined, + direction: 'out' | 'in', +): Promise<WalkHit[]> { + const srcCol = direction === 'out' ? 'src_key' : 'dst_key' + const dstCol = direction === 'out' ? 'dst_key' : 'src_key' + const edgeFilter = edgeTypes && edgeTypes.length > 0 ? 'AND e.edge_type = ANY($3::text[])' : '' + const params: unknown[] = [start, maxDepth] + if (edgeTypes && edgeTypes.length > 0) params.push(edgeTypes) + + const query = ` + WITH RECURSIVE walk(node_key, depth, edge_type) AS ( + SELECT $1::text AS node_key, 0 AS depth, NULL::text AS edge_type + UNION ALL + SELECT e.${dstCol}, w.depth + 1, e.edge_type + FROM walk w JOIN edges e ON e.${srcCol} = w.node_key + WHERE w.depth < $2 ${edgeFilter} + ) + SELECT node_key, depth, edge_type FROM ( + SELECT node_key, depth, edge_type, + ROW_NUMBER() OVER (PARTITION BY node_key ORDER BY depth) AS rn + FROM walk WHERE depth > 0 + ) ranked WHERE rn = 1 + ` + const res = await pool.query(query, params) + return res.rows.map((r: { node_key: string; depth: number; edge_type: string | null }) => ({ + nodeKey: r.node_key, + depth: r.depth, + edgeType: (r.edge_type ?? undefined) as EdgeType | undefined, + })) +} + +export async function traverseNeighbors( + pool: Pool, + start: string, + opts: { edgeTypes?: EdgeType[]; direction?: 'out' | 'in' | 'both'; depth?: number; depthFallback?: number }, +): Promise<WalkHit[]> { + const direction = opts.direction ??
'both' + const maxDepth = clampDepth(opts.depth, opts.depthFallback ?? 1) + const merged = new Map<string, WalkHit>() + + if (direction === 'out' || direction === 'both') { + for (const hit of await walkDirection(pool, start, maxDepth, opts.edgeTypes, 'out')) { + merged.set(hit.nodeKey, hit) + } + } + if (direction === 'in' || direction === 'both') { + for (const hit of await walkDirection(pool, start, maxDepth, opts.edgeTypes, 'in')) { + const existing = merged.get(hit.nodeKey) + if (!existing || hit.depth < existing.depth) merged.set(hit.nodeKey, hit) + } + } + return [...merged.values()] +} + +export interface PathRow { + depth: number + hops: { nodeKey: string; edgeType: EdgeType; reversed: boolean }[] +} + +export async function findShortestPath( + pool: Pool, + from: string, + to: string, + maxDepth: number = MAX_GRAPH_TRAVERSAL_DEPTH, +): Promise<PathRow | null> { + if (from === to) return { depth: 0, hops: [] } + + const query = ` + WITH RECURSIVE walk(node_key, depth, path) AS ( + SELECT $1::text AS node_key, 0 AS depth, $1::text AS path + UNION ALL + SELECT + CASE WHEN e.src_key = w.node_key THEN e.dst_key ELSE e.src_key END, + w.depth + 1, + w.path || '|' || e.edge_type || '|' || (CASE WHEN e.src_key = w.node_key THEN '0' ELSE '1' END) + || '|' || (CASE WHEN e.src_key = w.node_key THEN e.dst_key ELSE e.src_key END) + FROM walk w + JOIN edges e ON (e.src_key = w.node_key OR e.dst_key = w.node_key) AND e.src_key != e.dst_key + WHERE w.depth < $2 + ) + SELECT path, depth FROM walk WHERE node_key = $3 AND depth > 0 ORDER BY depth ASC LIMIT 1 + ` + const res = await pool.query(query, [from, maxDepth, to]) + const row = res.rows[0] as { path: string; depth: number } | undefined + if (!row) return null + + const parts = row.path.split('|') + const hops: PathRow['hops'] = [] + for (let i = 1; i < parts.length; i += 3) { + hops.push({ edgeType: parts[i] as EdgeType, reversed: parts[i + 1] === '1', nodeKey: parts[i + 2] }) + } + return { depth: row.depth, hops } +} diff --git
a/src/core/storage/sqlite/graphTraversal.ts b/src/core/storage/sqlite/graphTraversal.ts new file mode 100644 index 0000000..5865dcc --- /dev/null +++ b/src/core/storage/sqlite/graphTraversal.ts @@ -0,0 +1,131 @@ +/** + * Recursive-CTE traversal primitives over `edges`/`graph_nodes` for the + * SQLite `GraphStore` (Phase 108, knowledge-graph §6). + * + * Each helper takes the active session's raw `better-sqlite3` handle and + * returns plain rows; `SqliteGraphStore` (profile.ts) wraps these into the + * `GraphHit`/`GraphPath`/`GraphSubgraph` shapes from `../types.js`. + */ + +import type Database from 'better-sqlite3' +import { MAX_GRAPH_TRAVERSAL_DEPTH, type EdgeType } from '../types.js' + +export function clampDepth(depth: number | undefined, fallback: number): number { + const d = depth ?? fallback + return Math.max(1, Math.min(Math.trunc(d), MAX_GRAPH_TRAVERSAL_DEPTH)) +} + +export interface WalkHit { + nodeKey: string + depth: number + edgeType?: EdgeType +} + +/** + * Single-direction recursive walk from `start`, returning the shortest-depth + * hit (with the edge type of that hop) for every node reached within + * `maxDepth`. `direction: 'out'` follows `src_key -> dst_key`; `'in'` follows + * `dst_key -> src_key`. + */ +function walkDirection( + rawDb: InstanceType<typeof Database>, + start: string, + maxDepth: number, + edgeTypes: EdgeType[] | undefined, + direction: 'out' | 'in', +): WalkHit[] { + const srcCol = direction === 'out' ? 'src_key' : 'dst_key' + const dstCol = direction === 'out' ? 'dst_key' : 'src_key' + const edgeFilter = edgeTypes && edgeTypes.length > 0 + ? `AND e.edge_type IN (${edgeTypes.map(() => '?').join(',')})` + : '' + + const query = ` + WITH RECURSIVE walk(node_key, depth, edge_type) AS ( + SELECT ? AS node_key, 0 AS depth, NULL AS edge_type + UNION ALL + SELECT e.${dstCol}, w.depth + 1, e.edge_type + FROM walk w JOIN edges e ON e.${srcCol} = w.node_key + WHERE w.depth < ?
${edgeFilter} + ) + SELECT node_key, depth, edge_type FROM ( + SELECT node_key, depth, edge_type, + ROW_NUMBER() OVER (PARTITION BY node_key ORDER BY depth) AS rn + FROM walk WHERE depth > 0 + ) WHERE rn = 1 + ` + const params: unknown[] = [start, maxDepth, ...(edgeTypes ?? [])] + const rows = rawDb.prepare(query).all(...params) as Array<{ node_key: string; depth: number; edge_type: string | null }> + return rows.map((r) => ({ nodeKey: r.node_key, depth: r.depth, edgeType: (r.edge_type ?? undefined) as EdgeType | undefined })) +} + +/** + * Typed neighborhood of `start` (Phase 108, knowledge-graph §6). `direction` + * defaults to `'both'`. Depth is clamped via `clampDepth`. + */ +export function traverseNeighbors( + rawDb: InstanceType<typeof Database>, + start: string, + opts: { edgeTypes?: EdgeType[]; direction?: 'out' | 'in' | 'both'; depth?: number; depthFallback?: number }, +): WalkHit[] { + const direction = opts.direction ?? 'both' + const maxDepth = clampDepth(opts.depth, opts.depthFallback ?? 1) + const merged = new Map<string, WalkHit>() + + if (direction === 'out' || direction === 'both') { + for (const hit of walkDirection(rawDb, start, maxDepth, opts.edgeTypes, 'out')) { + merged.set(hit.nodeKey, hit) + } + } + if (direction === 'in' || direction === 'both') { + for (const hit of walkDirection(rawDb, start, maxDepth, opts.edgeTypes, 'in')) { + const existing = merged.get(hit.nodeKey) + if (!existing || hit.depth < existing.depth) merged.set(hit.nodeKey, hit) + } + } + return [...merged.values()] +} + +export interface PathRow { + depth: number + hops: { nodeKey: string; edgeType: EdgeType; reversed: boolean }[] +} + +/** + * Shortest path from `from` to `to` over edges of any type, traversed in + * either direction, via a recursive CTE that accumulates a delimited path + * string. Returns `null` if unreachable within `MAX_GRAPH_TRAVERSAL_DEPTH`.
+ */ +export function findShortestPath( + rawDb: InstanceType<typeof Database>, + from: string, + to: string, + maxDepth: number = MAX_GRAPH_TRAVERSAL_DEPTH, +): PathRow | null { + if (from === to) return { depth: 0, hops: [] } + + const query = ` + WITH RECURSIVE walk(node_key, depth, path) AS ( + SELECT ? AS node_key, 0 AS depth, ? AS path + UNION ALL + SELECT + CASE WHEN e.src_key = w.node_key THEN e.dst_key ELSE e.src_key END, + w.depth + 1, + w.path || '|' || e.edge_type || '|' || (CASE WHEN e.src_key = w.node_key THEN '0' ELSE '1' END) + || '|' || (CASE WHEN e.src_key = w.node_key THEN e.dst_key ELSE e.src_key END) + FROM walk w + JOIN edges e ON (e.src_key = w.node_key OR e.dst_key = w.node_key) AND e.src_key != e.dst_key + WHERE w.depth < ? + ) + SELECT path, depth FROM walk WHERE node_key = ? AND depth > 0 ORDER BY depth ASC LIMIT 1 + ` + const row = rawDb.prepare(query).get(from, from, maxDepth, to) as { path: string; depth: number } | undefined + if (!row) return null + + const parts = row.path.split('|') + const hops: PathRow['hops'] = [] + for (let i = 1; i < parts.length; i += 3) { + hops.push({ edgeType: parts[i] as EdgeType, reversed: parts[i + 1] === '1', nodeKey: parts[i + 2] }) + } + return { depth: row.depth, hops } +} diff --git a/src/core/storage/sqlite/profile.ts b/src/core/storage/sqlite/profile.ts index 3249410..b7ef884 100644 --- a/src/core/storage/sqlite/profile.ts +++ b/src/core/storage/sqlite/profile.ts @@ -10,7 +10,7 @@ import { getActiveSession } from '../../db/sqlite.js' import { embeddings, paths, blobs, commits, blobCommits, indexedCommits, blobBranches, chunks, chunkEmbeddings, symbols, symbolEmbeddings, moduleEmbeddings, commitEmbeddings, graphNodes, edges } from '../../db/schema.js' -import { eq, inArray, sql } from 'drizzle-orm' +import { eq, inArray, sql, and } from 'drizzle-orm' import { isIndexed as dedupeIsIndexed, filterNewBlobs as dedupeFilterNewBlobs } from '../../indexing/deduper.js' import { storeFtsContent, getBlobContent, storeBlob,
storeBlobRecord, @@ -29,8 +29,12 @@ import type { EdgeType, FtsStore, GraphEdgeRecord, + GraphHit, GraphNodeRecord, + GraphPath, + GraphPathHop, GraphStore, + GraphSubgraph, MetadataStore, StorageProfile, StorageScope, @@ -42,6 +46,8 @@ import type { WriteBlobRecordArgs, WriteFileBlobArgs, } from '../types.js' +import { MAX_GRAPH_TRAVERSAL_DEPTH } from '../types.js' +import { traverseNeighbors, findShortestPath, clampDepth, type WalkHit } from './graphTraversal.js' class SqliteMetadataStore implements MetadataStore { async isIndexed(blobHash: string, model: string): Promise<boolean> { @@ -397,6 +403,82 @@ export class SqliteGraphStore implements GraphStore { ? collected.filter((e) => edgeTypes.includes(e.edgeType)) : collected } + + async neighbors(key: string, opts?: { edgeTypes?: EdgeType[]; direction?: 'out' | 'in' | 'both'; depth?: number }): Promise<GraphHit[]> { + const { rawDb } = getActiveSession() + const hits = traverseNeighbors(rawDb, key, { ...opts, depthFallback: 1 }) + return this.hydrateHits(hits) + } + + async callers(key: string, depth?: number): Promise<GraphHit[]> { + const { rawDb } = getActiveSession() + const hits = traverseNeighbors(rawDb, key, { edgeTypes: ['calls'], direction: 'in', depth, depthFallback: MAX_GRAPH_TRAVERSAL_DEPTH }) + return this.hydrateHits(hits) + } + + async callees(key: string, depth?: number): Promise<GraphHit[]> { + const { rawDb } = getActiveSession() + const hits = traverseNeighbors(rawDb, key, { edgeTypes: ['calls'], direction: 'out', depth, depthFallback: MAX_GRAPH_TRAVERSAL_DEPTH }) + return this.hydrateHits(hits) + } + + async path(from: string, to: string): Promise<GraphPath | null> { + const { rawDb } = getActiveSession() + const found = findShortestPath(rawDb, from, to) + if (!found) return null + + const hops: GraphPathHop[] = [] + for (const hop of found.hops) { + const node = await this.getNode(hop.nodeKey) + hops.push({ + nodeKey: hop.nodeKey, + displayName: node?.displayName ??
hop.nodeKey, + edgeType: hop.edgeType, + reversed: hop.reversed, + }) + } + return { from, to, hops } + } + + async subgraph(seed: string, depth?: number): Promise<GraphSubgraph> { + const { db, rawDb } = getActiveSession() + const maxDepth = clampDepth(depth, MAX_GRAPH_TRAVERSAL_DEPTH) + const hits = traverseNeighbors(rawDb, seed, { direction: 'both', depth: maxDepth }) + const nodeKeys = [...new Set([seed, ...hits.map((h) => h.nodeKey)])] + + const nodes = nodeKeys.length > 0 + ? db.select().from(graphNodes).where(inArray(graphNodes.nodeKey, nodeKeys)).all().map(rowToNode) + : [] + + const edgeRows = nodeKeys.length > 0 + ? db.select().from(edges) + .where(and(inArray(edges.srcKey, nodeKeys), inArray(edges.dstKey, nodeKeys))) + .all() + : [] + + return { nodes, edges: edgeRows.map(rowToEdge) } + } + + /** Resolves `WalkHit`s into `GraphHit`s by looking up display name/kind for each node. */ + private async hydrateHits(hits: WalkHit[]): Promise<GraphHit[]> { + if (hits.length === 0) return [] + const { db } = getActiveSession() + const nodeKeys = hits.map((h) => h.nodeKey) + const rows = db.select().from(graphNodes).where(inArray(graphNodes.nodeKey, nodeKeys)).all() + const byKey = new Map(rows.map((r) => [r.nodeKey, rowToNode(r)])) + return hits + .map((h) => { + const node = byKey.get(h.nodeKey) + return { + nodeKey: h.nodeKey, + displayName: node?.displayName ?? h.nodeKey, + kind: node?.kind ?? 'unknown', + depth: h.depth, + edgeType: h.edgeType, + } + }) + .sort((a, b) => a.depth - b.depth || a.nodeKey.localeCompare(b.nodeKey)) + } } function rowToNode(row: typeof graphNodes.$inferSelect): GraphNodeRecord { diff --git a/src/core/storage/types.ts b/src/core/storage/types.ts index 4e4e142..11b0dda 100644 --- a/src/core/storage/types.ts +++ b/src/core/storage/types.ts @@ -164,11 +164,51 @@ export interface GraphEdgeRecord { observedCount?: number } +/** + * Default/maximum traversal depth for `GraphStore` traversal primitives + * (Phase 108, knowledge-graph §6).
Capped to bound recursive-CTE cost. + */ +export const MAX_GRAPH_TRAVERSAL_DEPTH = 3 + +/** A node reached during a `GraphStore` traversal (Phase 108, knowledge-graph §6). */ +export interface GraphHit { + nodeKey: string + displayName: string + kind: string + /** Number of hops from the traversal's starting node (>= 1). */ + depth: number + /** The edge type of the (shortest) hop that reached this node, if known. */ + edgeType?: EdgeType +} + +/** One hop in a `GraphPath` (Phase 108, knowledge-graph §6). */ +export interface GraphPathHop { + nodeKey: string + displayName: string + edgeType: EdgeType + /** True if this hop traverses an edge against its stored src->dst direction. */ + reversed: boolean +} + +/** A shortest typed path between two graph nodes (Phase 108, knowledge-graph §6). */ +export interface GraphPath { + from: string + to: string + /** Hops from `from` to `to`, in order. Empty if `from === to`. */ + hops: GraphPathHop[] +} + +/** A node-induced subgraph (Phase 108, knowledge-graph §6). */ +export interface GraphSubgraph { + nodes: GraphNodeRecord[] + edges: GraphEdgeRecord[] +} + /** * Storage for the recomputable structural graph (Phase 107, knowledge-graph * §3.3/§6). `gitsema graph build` truncates and rebuilds nodes/edges wholesale * (like `blob_clusters`); read methods back the early `co-change`/`deps`/ - * `cycles` commands. + * `cycles` commands plus the Phase 108 traversal primitives. * * Relational-only (review9 §4): the Qdrant profile's `GraphStore` throws on * every method — graph queries require a relational backend. @@ -187,6 +227,21 @@ export interface GraphStore { allEdges(edgeTypes?: EdgeType[]): Promise<GraphEdgeRecord[]> /** Edges touching `nodeKey`, optionally filtered by direction and edge types. */ edgesFor(nodeKey: string, opts?: { edgeTypes?: EdgeType[]; direction?: 'out' | 'in' | 'both' }): Promise<GraphEdgeRecord[]> + + /** + * Typed neighborhood of `key` via recursive traversal (Phase 108, + * knowledge-graph §6).
`direction` defaults to `'both'`; `depth` defaults + * to 1 and is capped at `MAX_GRAPH_TRAVERSAL_DEPTH`. + */ + neighbors(key: string, opts?: { edgeTypes?: EdgeType[]; direction?: 'out' | 'in' | 'both'; depth?: number }): Promise<GraphHit[]> + /** Reverse `calls` traversal — who (transitively) calls `key`. Depth capped at `MAX_GRAPH_TRAVERSAL_DEPTH` (default). */ + callers(key: string, depth?: number): Promise<GraphHit[]> + /** Forward `calls` traversal — what `key` (transitively) calls. Depth capped at `MAX_GRAPH_TRAVERSAL_DEPTH` (default). */ + callees(key: string, depth?: number): Promise<GraphHit[]> + /** Shortest typed path from `from` to `to` (any edge type/direction), or null if unreachable within `MAX_GRAPH_TRAVERSAL_DEPTH`. */ + path(from: string, to: string): Promise<GraphPath | null> + /** The node-induced subgraph within `depth` hops of `seed` (both directions, all edge types). `depth` capped at `MAX_GRAPH_TRAVERSAL_DEPTH`. */ + subgraph(seed: string, depth?: number): Promise<GraphSubgraph> } /** A raw structural reference to persist for one blob (Phase 106, knowledge-graph §3.2). */ diff --git a/src/core/storage/unsupportedGraphStore.ts b/src/core/storage/unsupportedGraphStore.ts index 3b68511..8e54340 100644 --- a/src/core/storage/unsupportedGraphStore.ts +++ b/src/core/storage/unsupportedGraphStore.ts @@ -8,7 +8,7 @@ * graph-unavailable backend, not a silent empty graph.
*/ -import type { EdgeType, GraphEdgeRecord, GraphNodeRecord, GraphStore } from './types.js' +import type { EdgeType, GraphEdgeRecord, GraphHit, GraphNodeRecord, GraphPath, GraphStore, GraphSubgraph } from './types.js' const ERROR_MESSAGE = 'graph queries require a relational backend (Qdrant storage profiles do not support gitsema graph build/co-change/deps/cycles)' @@ -40,4 +40,24 @@ export class UnsupportedGraphStore implements GraphStore { async edgesFor(_nodeKey: string, _opts?: { edgeTypes?: EdgeType[]; direction?: 'out' | 'in' | 'both' }): Promise<GraphEdgeRecord[]> { throw new Error(ERROR_MESSAGE) } + + async neighbors(_key: string, _opts?: { edgeTypes?: EdgeType[]; direction?: 'out' | 'in' | 'both'; depth?: number }): Promise<GraphHit[]> { + throw new Error(ERROR_MESSAGE) + } + + async callers(_key: string, _depth?: number): Promise<GraphHit[]> { + throw new Error(ERROR_MESSAGE) + } + + async callees(_key: string, _depth?: number): Promise<GraphHit[]> { + throw new Error(ERROR_MESSAGE) + } + + async path(_from: string, _to: string): Promise<GraphPath | null> { + throw new Error(ERROR_MESSAGE) + } + + async subgraph(_seed: string, _depth?: number): Promise<GraphSubgraph> { + throw new Error(ERROR_MESSAGE) + } } diff --git a/src/mcp/server.ts b/src/mcp/server.ts index e7595c4..bb62da0 100644 --- a/src/mcp/server.ts +++ b/src/mcp/server.ts @@ -12,6 +12,7 @@ import { registerClusteringTools } from './tools/clustering.js' import { registerWorkflowTools } from './tools/workflow.js' import { registerInfrastructureTools } from './tools/infrastructure.js' import { registerNarratorTools } from './tools/narrator.js' +import { registerGraphTools } from './tools/graph.js' import { readFileSync } from 'node:fs' // Read package version dynamically so the MCP server always matches package.json @@ -37,6 +38,7 @@ export async function startMcpServer(): Promise<void> { registerWorkflowTools(server) registerInfrastructureTools(server) registerNarratorTools(server) + registerGraphTools(server) const transport = new StdioServerTransport() await server.connect(transport)
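Review note on `findShortestPath` (both backends): the recursive CTE accumulates the route as a `|`-delimited string of the form `start|edgeType|reversedFlag|nodeKey|...`, and the TypeScript side decodes it in triples. The sketch below isolates that decoding step so the encoding is easier to check by hand. `decodePath` and `Hop` are hypothetical names used only for illustration, and `EdgeType` is simplified to `string` here; this is a minimal sketch of the loop shown in the diff, not the shipped code.

```typescript
// Assumption: the real EdgeType is a string-literal union; plain string is enough for the sketch.
type EdgeType = string

// Hypothetical shape mirroring one entry of PathRow['hops'] in the diff.
interface Hop {
  nodeKey: string
  edgeType: EdgeType
  reversed: boolean
}

// Decodes "start|edgeType|flag|nodeKey|edgeType|flag|nodeKey|..." into hops.
// parts[0] is the start node; every following group of three is one hop.
function decodePath(path: string): Hop[] {
  const parts = path.split('|')
  const hops: Hop[] = []
  for (let i = 1; i + 2 < parts.length; i += 3) {
    hops.push({
      edgeType: parts[i],
      // '1' marks an edge walked against its stored src -> dst direction.
      reversed: parts[i + 1] === '1',
      nodeKey: parts[i + 2],
    })
  }
  return hops
}
```

For instance, `decodePath('file:a.ts|calls|0|symbol:b|imports|1|file:c.ts')` yields two hops: a forward `calls` edge to `symbol:b`, then an `imports` edge to `file:c.ts` walked against its stored direction. One consequence of this encoding, assuming node keys are arbitrary strings: a key that itself contains `|` would shift the triples, so the scheme relies on keys never containing the delimiter.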
diff --git a/src/mcp/tools/graph.ts b/src/mcp/tools/graph.ts new file mode 100644 index 0000000..4afe271 --- /dev/null +++ b/src/mcp/tools/graph.ts @@ -0,0 +1,90 @@ +import { z } from 'zod' +import type { McpServer } from '@modelcontextprotocol/sdk/server/mcp.js' +import { registerTool } from '../registerTool.js' +import { getCachedStorageProfile } from '../../core/storage/resolveProfile.js' +import { callers, callees, neighbors } from '../../core/graph/traversal.js' +import type { EdgeType, GraphHit } from '../../core/storage/types.js' + +function renderResolutionError(label: string, resolved: { status: string; candidates?: Array<{ nodeKey: string }> }): string { + if (resolved.status === 'not-found') { + return `No graph node found for "${label}". Run \`gitsema index --graph\` then \`gitsema graph build\` first.` + } + const candidates = (resolved.candidates ?? []).map((c) => ` ${c.nodeKey}`).join('\n') + return `"${label}" is ambiguous — matches multiple symbols:\n${candidates}` +} + +function renderHits(hits: GraphHit[]): string { + if (hits.length === 0) return ' (none)' + return hits.map((h) => ` ${h.edgeType ? `[${h.edgeType}] ` : ''}${h.displayName} (depth ${h.depth})`).join('\n') +} + +/** + * Phase 108 (knowledge-graph §6/§8) MCP tools, exposing the `GraphStore` + * traversal primitives: `call_graph` (callers/callees over `calls` edges) + * and `graph_neighbors` (typed neighborhood, any edge kinds). + */ +export function registerGraphTools(server: McpServer) { + registerTool( + server, + 'call_graph', + 'Structural call-graph traversal: who calls (or is called by) a symbol, via the Phase 107/108 structural graph (`gitsema index --graph` + `gitsema graph build`). 
Reverse `calls` traversal (direction=callers) finds callers; forward (direction=callees) finds callees.', + { + symbol: z.string().describe('A symbol qualified name, file path, or literal node key (file:..., symbol:..., external:...)'), + direction: z.enum(['callers', 'callees']).optional().default('callers').describe('Traverse reverse (callers) or forward (callees) `calls` edges'), + depth: z.number().int().min(1).max(3).optional().describe('Traversal depth (default and max: 3)'), + }, + async ({ symbol, direction, depth }) => { + try { + const profile = getCachedStorageProfile(process.cwd()) + const result = direction === 'callees' + ? await callees(profile.graph, symbol, depth) + : await callers(profile.graph, symbol, depth) + + if (result.resolved.status !== 'found') { + return { content: [{ type: 'text', text: renderResolutionError(symbol, result.resolved) }] } + } + + const node = result.resolved.node + const label = direction === 'callees' ? 'Callees of' : 'Callers of' + return { content: [{ type: 'text', text: `${label} ${node.displayName} (${node.nodeKey}):\n\n${renderHits(result.hits)}` }] } + } catch (err) { + const msg = err instanceof Error ? err.message : String(err) + return { content: [{ type: 'text', text: `Error: ${msg}` }] } + } + }, + ) + + registerTool( + server, + 'graph_neighbors', + 'Typed neighborhood of a node in the structural graph (Phase 107/108: `gitsema index --graph` + `gitsema graph build`). 
Returns nodes connected by the given edge types (default: all) in the given direction.', + { + node: z.string().describe('A symbol qualified name, file path, or literal node key (file:..., symbol:..., external:...)'), + edge_types: z.array(z.enum(['contains', 'defines', 'imports', 'calls', 'extends', 'implements', 'references', 'co_change', 'similar_to'])) + .optional() + .describe('Edge types to traverse (default: all)'), + direction: z.enum(['out', 'in', 'both']).optional().default('both').describe("Edge direction relative to 'node'"), + depth: z.number().int().min(1).max(3).optional().describe('Traversal depth (default 1, max 3)'), + }, + async ({ node, edge_types, direction, depth }) => { + try { + const profile = getCachedStorageProfile(process.cwd()) + const result = await neighbors(profile.graph, node, { + edgeTypes: edge_types as EdgeType[] | undefined, + direction, + depth, + }) + + if (result.resolved.status !== 'found') { + return { content: [{ type: 'text', text: renderResolutionError(node, result.resolved) }] } + } + + const resolvedNode = result.resolved.node + return { content: [{ type: 'text', text: `Neighbors of ${resolvedNode.displayName} (${resolvedNode.nodeKey}):\n\n${renderHits(result.hits)}` }] } + } catch (err) { + const msg = err instanceof Error ? err.message : String(err) + return { content: [{ type: 'text', text: `Error: ${msg}` }] } + } + }, + ) +} diff --git a/tests/graphLens.test.ts b/tests/graphLens.test.ts new file mode 100644 index 0000000..5937421 --- /dev/null +++ b/tests/graphLens.test.ts @@ -0,0 +1,290 @@ +/** + * Tests for the Phase 109 `--lens` toggle (knowledge-graph §7/§8): + * - the four-signal `vectorSearch` ranking formula (`weightStructural` / + * `structuralScores`), and its semantic-lens-identical default behavior + * - the `blastRadius` / `relate` / `similar` / `unused` core modules. 
+ */ + +import { describe, it, expect, afterEach } from 'vitest' +import { mkdtempSync, rmSync } from 'node:fs' +import { join } from 'node:path' +import { tmpdir } from 'node:os' +import { openDatabaseAt, withDbSession, type DbSession } from '../src/core/db/sqlite.js' +import { SqliteGraphStore } from '../src/core/storage/sqlite/profile.js' +import { vectorSearch } from '../src/core/search/analysis/vectorSearch.js' +import { blastRadius } from '../src/core/graph/blastRadius.js' +import { relate } from '../src/core/graph/relate.js' +import { similar } from '../src/core/graph/similar.js' +import { unused } from '../src/core/graph/unused.js' +import type { GraphEdgeRecord, GraphNodeRecord } from '../src/core/storage/types.js' + +function bufFromArray(arr: number[]) { + return Buffer.from(new Float32Array(arr).buffer) +} + +const tmpDirs: string[] = [] +afterEach(() => { + for (const dir of tmpDirs.splice(0)) { + rmSync(dir, { recursive: true, force: true }) + } +}) + +function setupDb(): { session: DbSession; tmpDir: string } { + const tmpDir = mkdtempSync(join(tmpdir(), 'gitsema-graphlens-')) + const session = openDatabaseAt(join(tmpDir, 'test.db')) + tmpDirs.push(tmpDir) + return { session, tmpDir } +} + +// --------------------------------------------------------------------------- +// vectorSearch four-signal ranking +// --------------------------------------------------------------------------- + +describe('vectorSearch — four-signal ranking (Phase 109)', () => { + it('semantic lens (no structural options) is identical to pre-Phase-109 cosine ranking', async () => { + const { session } = setupDb() + await withDbSession(session, async () => { + session.rawDb.prepare('INSERT INTO blobs (blob_hash, size, indexed_at) VALUES (?, ?, ?)').run('blobA', 10, 1) + session.rawDb.prepare('INSERT INTO blobs (blob_hash, size, indexed_at) VALUES (?, ?, ?)').run('blobB', 10, 1) + session.rawDb.prepare('INSERT INTO embeddings (blob_hash, model, dimensions, vector) VALUES (?, ?, 
?, ?)') + .run('blobA', 'm', 4, bufFromArray([1, 0, 0, 0])) + session.rawDb.prepare('INSERT INTO embeddings (blob_hash, model, dimensions, vector) VALUES (?, ?, ?, ?)') + .run('blobB', 'm', 4, bufFromArray([0, 1, 0, 0])) + session.rawDb.prepare('INSERT INTO paths (blob_hash, path) VALUES (?, ?)').run('blobA', 'a.ts') + session.rawDb.prepare('INSERT INTO paths (blob_hash, path) VALUES (?, ?)').run('blobB', 'b.ts') + + const results = await vectorSearch([1, 0, 0, 0], { topK: 10, noCache: true }) + expect(results.map((r) => r.blobHash)).toEqual(['blobA', 'blobB']) + // No weighted-signals options set -> score === cosine similarity. + expect(results[0].score).toBeCloseTo(1, 6) + expect(results[1].score).toBeCloseTo(0, 6) + }) + session.rawDb.close() + }) + + it('weightStructural + structuralScores reorders results via the four-signal formula', async () => { + const { session } = setupDb() + await withDbSession(session, async () => { + session.rawDb.prepare('INSERT INTO blobs (blob_hash, size, indexed_at) VALUES (?, ?, ?)').run('blobA', 10, 1) + session.rawDb.prepare('INSERT INTO blobs (blob_hash, size, indexed_at) VALUES (?, ?, ?)').run('blobB', 10, 1) + // blobA is closer to the query vector by cosine... + session.rawDb.prepare('INSERT INTO embeddings (blob_hash, model, dimensions, vector) VALUES (?, ?, ?, ?)') + .run('blobA', 'm', 4, bufFromArray([1, 0, 0, 0])) + session.rawDb.prepare('INSERT INTO embeddings (blob_hash, model, dimensions, vector) VALUES (?, ?, ?, ?)') + .run('blobB', 'm', 4, bufFromArray([0.9, 0.1, 0, 0])) + session.rawDb.prepare('INSERT INTO paths (blob_hash, path) VALUES (?, ?)').run('blobA', 'a.ts') + session.rawDb.prepare('INSERT INTO paths (blob_hash, path) VALUES (?, ?)').run('blobB', 'b.ts') + + // ...but blobB is structurally adjacent to the query anchor (score 1), + // while blobA has no structural relation (score 0). 
+ const structuralScores = new Map([['blobB', 1]]) + + const results = await vectorSearch([1, 0, 0, 0], { + topK: 10, + noCache: true, + weightVector: 0, + weightRecency: 0, + weightPath: 0, + weightStructural: 1, + structuralScores, + explain: true, + }) + + const byHash = new Map(results.map((r) => [r.blobHash, r])) + // wTotal = 0+0+0+1 = 1, so score === structScore directly. + expect(byHash.get('blobB')!.score).toBeCloseTo(1, 6) + expect(byHash.get('blobA')!.score).toBeCloseTo(0, 6) + expect(byHash.get('blobB')!.signals?.structural).toBeCloseTo(1, 6) + expect(byHash.get('blobA')!.signals?.structural ?? 0).toBeCloseTo(0, 6) + // blobB now outranks blobA despite the lower cosine similarity. + expect(results[0].blobHash).toBe('blobB') + }) + session.rawDb.close() + }) + + it('blends all four signals according to the supplied weights', async () => { + const { session } = setupDb() + await withDbSession(session, async () => { + session.rawDb.prepare('INSERT INTO blobs (blob_hash, size, indexed_at) VALUES (?, ?, ?)').run('blobA', 10, 1) + session.rawDb.prepare('INSERT INTO embeddings (blob_hash, model, dimensions, vector) VALUES (?, ?, ?, ?)') + .run('blobA', 'm', 4, bufFromArray([1, 0, 0, 0])) + session.rawDb.prepare('INSERT INTO paths (blob_hash, path) VALUES (?, ?)').run('blobA', 'a.ts') + + const structuralScores = new Map([['blobA', 0.5]]) + const results = await vectorSearch([1, 0, 0, 0], { + topK: 10, + noCache: true, + weightVector: 1, + weightRecency: 0, + weightPath: 0, + weightStructural: 1, + structuralScores, + explain: true, + }) + + // wTotal = 1+0+0+1 = 2; cosine=1, structural=0.5 -> score = (1*1 + 1*0.5) / 2 = 0.75 + expect(results[0].score).toBeCloseTo(0.75, 6) + }) + session.rawDb.close() + }) +}) + +// --------------------------------------------------------------------------- +// blastRadius / relate / similar / unused fixture graph: +// +// file:a.ts --defines--> symbol:A, symbol:B, symbol:C +// symbol:A --calls--> symbol:B --calls--> symbol:C 
--calls--> external:lib +// file:a.ts --imports--> external:lib +// file:b.ts --imports--> external:lib +// +// --------------------------------------------------------------------------- + +const NODES: GraphNodeRecord[] = [ + { nodeKey: 'file:a.ts', kind: 'file', displayName: 'a.ts', path: 'a.ts', currentBlobHash: 'blobA' }, + { nodeKey: 'file:b.ts', kind: 'file', displayName: 'b.ts', path: 'b.ts', currentBlobHash: 'blobB' }, + { nodeKey: 'symbol:a.ts#A#sig1', kind: 'function', displayName: 'A', path: 'a.ts', currentBlobHash: 'blobA' }, + { nodeKey: 'symbol:a.ts#B#sig2', kind: 'function', displayName: 'B', path: 'a.ts', currentBlobHash: 'blobA' }, + { nodeKey: 'symbol:a.ts#C#sig3', kind: 'function', displayName: 'C', path: 'a.ts', currentBlobHash: 'blobA' }, + { nodeKey: 'external:lib', kind: 'external', displayName: 'lib', isExternal: true }, + { nodeKey: 'external:isolated', kind: 'external', displayName: 'isolated', isExternal: true }, +] + +const EDGES: GraphEdgeRecord[] = [ + { srcKey: 'file:a.ts', dstKey: 'symbol:a.ts#A#sig1', edgeType: 'defines' }, + { srcKey: 'file:a.ts', dstKey: 'symbol:a.ts#B#sig2', edgeType: 'defines' }, + { srcKey: 'file:a.ts', dstKey: 'symbol:a.ts#C#sig3', edgeType: 'defines' }, + { srcKey: 'symbol:a.ts#A#sig1', dstKey: 'symbol:a.ts#B#sig2', edgeType: 'calls' }, + { srcKey: 'symbol:a.ts#B#sig2', dstKey: 'symbol:a.ts#C#sig3', edgeType: 'calls' }, + { srcKey: 'symbol:a.ts#C#sig3', dstKey: 'external:lib', edgeType: 'calls' }, + { srcKey: 'file:a.ts', dstKey: 'external:lib', edgeType: 'imports' }, + { srcKey: 'file:b.ts', dstKey: 'external:lib', edgeType: 'imports' }, +] + +async function withFusionGraph<T>(fn: (graph: SqliteGraphStore, session: DbSession) => Promise<T>): Promise<T> { + const { session } = setupDb() + try { + return await withDbSession(session, async () => { + const graph = new SqliteGraphStore() + await graph.replaceAll(NODES, EDGES) + + // Seed embeddings/symbols so semanticNeighborsForNode() has data to rank. 
+ session.rawDb.prepare('INSERT INTO blobs (blob_hash, size, indexed_at) VALUES (?, ?, ?)').run('blobA', 10, 1) + session.rawDb.prepare('INSERT INTO blobs (blob_hash, size, indexed_at) VALUES (?, ?, ?)').run('blobB', 10, 1) + session.rawDb.prepare('INSERT INTO embeddings (blob_hash, model, dimensions, vector) VALUES (?, ?, ?, ?)') + .run('blobA', 'm', 4, bufFromArray([1, 0, 0, 0])) + session.rawDb.prepare('INSERT INTO embeddings (blob_hash, model, dimensions, vector) VALUES (?, ?, ?, ?)') + .run('blobB', 'm', 4, bufFromArray([0.9, 0.1, 0, 0])) + session.rawDb.prepare('INSERT INTO paths (blob_hash, path) VALUES (?, ?)').run('blobA', 'a.ts') + session.rawDb.prepare('INSERT INTO paths (blob_hash, path) VALUES (?, ?)').run('blobB', 'b.ts') + + const insertSymbol = session.rawDb.prepare( + 'INSERT INTO symbols (blob_hash, start_line, end_line, symbol_name, symbol_kind, language, qualified_name, signature_hash) ' + + 'VALUES (?, ?, ?, ?, ?, ?, ?, ?)', + ) + const symA = insertSymbol.run('blobA', 1, 2, 'A', 'function', 'typescript', 'A', 'sig1').lastInsertRowid as number + const symB = insertSymbol.run('blobA', 3, 4, 'B', 'function', 'typescript', 'B', 'sig2').lastInsertRowid as number + const symC = insertSymbol.run('blobA', 5, 6, 'C', 'function', 'typescript', 'C', 'sig3').lastInsertRowid as number + + const insertSymEmb = session.rawDb.prepare( + 'INSERT INTO symbol_embeddings (symbol_id, model, dimensions, vector) VALUES (?, ?, ?, ?)', + ) + insertSymEmb.run(symA, 'm', 4, bufFromArray([1, 0, 0, 0])) + insertSymEmb.run(symB, 'm', 4, bufFromArray([0.9, 0.1, 0, 0])) + insertSymEmb.run(symC, 'm', 4, bufFromArray([0, 1, 0, 0])) + + return fn(graph, session) + }) + } finally { + session.rawDb.close() + } +} + +describe('blastRadius (Phase 109)', () => { + it('lens=structural returns only structural dependents', async () => { + await withFusionGraph(async (graph) => { + const result = await blastRadius(graph, 'C', { lens: 'structural', depth: 3 }) + 
expect(result.resolved.status).toBe('found') + expect(result.structural.map((h) => h.displayName).sort()).toEqual(['A', 'B']) + expect(result.semantic).toEqual([]) + }) + }) + + it('lens=semantic returns only semantically related blobs/symbols', async () => { + await withFusionGraph(async (graph) => { + const result = await blastRadius(graph, 'C', { lens: 'semantic' }) + expect(result.resolved.status).toBe('found') + expect(result.structural).toEqual([]) + expect(result.semanticSupported).toBe(true) + // C's embedding [0,1,0,0] is nearly orthogonal to A's and B's, so no close match is + // expected; the call should still succeed and return an array of hits. + expect(Array.isArray(result.semantic)).toBe(true) + }) + }) + + it('lens=hybrid (default) returns both structural and semantic sections', async () => { + await withFusionGraph(async (graph) => { + const result = await blastRadius(graph, 'C', { depth: 3 }) + expect(result.lens).toBe('hybrid') + expect(result.structural.map((h) => h.displayName).sort()).toEqual(['A', 'B']) + expect(result.semanticSupported).toBe(true) + }) + }) + + it('returns not-found for an unknown identifier', async () => { + await withFusionGraph(async (graph) => { + const result = await blastRadius(graph, 'does-not-exist') + expect(result.resolved.status).toBe('not-found') + expect(result.structural).toEqual([]) + expect(result.semantic).toEqual([]) + }) + }) +}) + +describe('relate (Phase 109)', () => { + it('returns depth-1 callers, callees, and semantically similar hits', async () => { + await withFusionGraph(async (graph) => { + const result = await relate(graph, 'B') + expect(result.resolved.status).toBe('found') + expect(result.callers.map((h) => h.displayName)).toEqual(['A']) + expect(result.callees.map((h) => h.displayName)).toEqual(['C']) + expect(result.semanticSupported).toBe(true) + // B's embedding [0.9,0.1,0,0] is most similar to A's [1,0,0,0]. 
+ expect(result.similar[0]?.symbolName).toBe('A') + }) + }) +}) + +describe('similar (Phase 109)', () => { + it('lens=structural finds files with overlapping import targets', async () => { + await withFusionGraph(async (graph) => { + const result = await similar(graph, 'a.ts', { lens: 'structural' }) + expect(result.resolved.status).toBe('found') + expect(result.structural.map((h) => h.displayName)).toEqual(['b.ts']) + expect(result.structural[0].shared).toBe(1) + expect(result.semantic).toEqual([]) + }) + }) + + it('lens=semantic ranks by embedding similarity', async () => { + await withFusionGraph(async (graph) => { + const result = await similar(graph, 'A', { lens: 'semantic' }) + expect(result.resolved.status).toBe('found') + expect(result.structural).toEqual([]) + expect(result.semanticSupported).toBe(true) + expect(result.semantic[0]?.symbolName).toBe('B') + }) + }) +}) + +describe('unused (Phase 109)', () => { + it('returns nodes with no inbound calls/imports edges', async () => { + await withFusionGraph(async (graph) => { + const result = await unused(graph) + const keys = result.nodes.map((n) => n.nodeKey).sort() + // A has no inbound calls/imports (only `defines`); file:a.ts/file:b.ts + // have no inbound calls/imports either. B and C are `calls` targets, and + // external:* nodes are excluded. + expect(keys).toEqual(['file:a.ts', 'file:b.ts', 'symbol:a.ts#A#sig1'].sort()) + }) + }) +}) diff --git a/tests/graphTraversal.test.ts b/tests/graphTraversal.test.ts new file mode 100644 index 0000000..a9c198b --- /dev/null +++ b/tests/graphTraversal.test.ts @@ -0,0 +1,212 @@ +/** + * Tests for the Phase 108 traversal primitives (knowledge-graph §6/§8): + * `GraphStore.neighbors/callers/callees/path/subgraph`, the recursive-CTE + * implementations in `SqliteGraphStore`, and the `core/graph/traversal.ts` + * wrappers used by `gitsema graph callers|callees|neighbors|path`. 
+ */ + +import { describe, it, expect, afterEach } from 'vitest' +import { mkdtempSync, rmSync } from 'node:fs' +import { join } from 'node:path' +import { tmpdir } from 'node:os' +import { openDatabaseAt, withDbSession, type DbSession } from '../src/core/db/sqlite.js' +import { SqliteGraphStore } from '../src/core/storage/sqlite/profile.js' +import { callers, callees, neighbors, path as graphPath } from '../src/core/graph/traversal.js' +import type { GraphEdgeRecord, GraphNodeRecord } from '../src/core/storage/types.js' + +// --------------------------------------------------------------------------- +// Fixture graph: +// +// file:a.ts --defines--> symbol:A, symbol:B, symbol:C +// symbol:A --calls--> symbol:B --calls--> symbol:C --calls--> external:lib +// +// --------------------------------------------------------------------------- + +const NODES: GraphNodeRecord[] = [ + { nodeKey: 'file:a.ts', kind: 'file', displayName: 'a.ts', path: 'a.ts' }, + { nodeKey: 'symbol:a.ts#A#sig1', kind: 'function', displayName: 'A', path: 'a.ts' }, + { nodeKey: 'symbol:a.ts#B#sig2', kind: 'function', displayName: 'B', path: 'a.ts' }, + { nodeKey: 'symbol:a.ts#C#sig3', kind: 'function', displayName: 'C', path: 'a.ts' }, + { nodeKey: 'external:lib', kind: 'external', displayName: 'lib', isExternal: true }, + { nodeKey: 'external:isolated', kind: 'external', displayName: 'isolated', isExternal: true }, +] + +const EDGES: GraphEdgeRecord[] = [ + { srcKey: 'file:a.ts', dstKey: 'symbol:a.ts#A#sig1', edgeType: 'defines' }, + { srcKey: 'file:a.ts', dstKey: 'symbol:a.ts#B#sig2', edgeType: 'defines' }, + { srcKey: 'file:a.ts', dstKey: 'symbol:a.ts#C#sig3', edgeType: 'defines' }, + { srcKey: 'symbol:a.ts#A#sig1', dstKey: 'symbol:a.ts#B#sig2', edgeType: 'calls' }, + { srcKey: 'symbol:a.ts#B#sig2', dstKey: 'symbol:a.ts#C#sig3', edgeType: 'calls' }, + { srcKey: 'symbol:a.ts#C#sig3', dstKey: 'external:lib', edgeType: 'calls' }, +] + +function setupFixtureDb(): { session: DbSession; tmpDir: 
string } { + const tmpDir = mkdtempSync(join(tmpdir(), 'gitsema-graphtraversal-')) + const session = openDatabaseAt(join(tmpDir, 'test.db')) + return { session, tmpDir } +} + +const tmpDirs: string[] = [] +afterEach(() => { + for (const dir of tmpDirs.splice(0)) { + rmSync(dir, { recursive: true, force: true }) + } +}) + +async function withGraph<T>(fn: (graph: SqliteGraphStore, session: DbSession) => Promise<T>): Promise<T> { + const { session, tmpDir } = setupFixtureDb() + tmpDirs.push(tmpDir) + try { + return await withDbSession(session, async () => { + const graph = new SqliteGraphStore() + await graph.replaceAll(NODES, EDGES) + return fn(graph, session) + }) + } finally { + session.rawDb.close() + } +} + +describe('SqliteGraphStore.neighbors', () => { + it('returns depth-1 neighbors by default, in both directions', async () => { + await withGraph(async (graph) => { + const hits = await graph.neighbors('symbol:a.ts#B#sig2') + const keys = hits.map((h) => h.nodeKey).sort() + // B has a `calls` edge from A, a `calls` edge to C, and a `defines` edge from file:a.ts. 
+ expect(keys).toEqual(['file:a.ts', 'symbol:a.ts#A#sig1', 'symbol:a.ts#C#sig3'].sort()) + expect(hits.every((h) => h.depth === 1)).toBe(true) + }) + }) + + it('filters by edge type and direction', async () => { + await withGraph(async (graph) => { + const hits = await graph.neighbors('file:a.ts', { edgeTypes: ['defines'], direction: 'out' }) + const keys = hits.map((h) => h.nodeKey).sort() + expect(keys).toEqual([ + 'symbol:a.ts#A#sig1', + 'symbol:a.ts#B#sig2', + 'symbol:a.ts#C#sig3', + ].sort()) + expect(hits.every((h) => h.edgeType === 'defines')).toBe(true) + }) + }) + + it('expands to deeper depth when requested, capped at 3', async () => { + await withGraph(async (graph) => { + const hits = await graph.neighbors('symbol:a.ts#A#sig1', { edgeTypes: ['calls'], direction: 'out', depth: 10 }) + const keys = hits.map((h) => h.nodeKey).sort() + // depth capped at 3: A -calls-> B -calls-> C -calls-> external:lib + expect(keys).toEqual(['external:lib', 'symbol:a.ts#B#sig2', 'symbol:a.ts#C#sig3'].sort()) + const byKey = new Map(hits.map((h) => [h.nodeKey, h.depth])) + expect(byKey.get('symbol:a.ts#B#sig2')).toBe(1) + expect(byKey.get('symbol:a.ts#C#sig3')).toBe(2) + expect(byKey.get('external:lib')).toBe(3) + }) + }) +}) + +describe('SqliteGraphStore.callers / callees', () => { + it('callers walks `calls` edges backward', async () => { + await withGraph(async (graph) => { + const hits = await graph.callers('symbol:a.ts#C#sig3') + const byKey = new Map(hits.map((h) => [h.nodeKey, h.depth])) + expect(byKey.get('symbol:a.ts#B#sig2')).toBe(1) + expect(byKey.get('symbol:a.ts#A#sig1')).toBe(2) + }) + }) + + it('callees walks `calls` edges forward', async () => { + await withGraph(async (graph) => { + const hits = await graph.callees('symbol:a.ts#A#sig1') + const byKey = new Map(hits.map((h) => [h.nodeKey, h.depth])) + expect(byKey.get('symbol:a.ts#B#sig2')).toBe(1) + expect(byKey.get('symbol:a.ts#C#sig3')).toBe(2) + expect(byKey.get('external:lib')).toBe(3) + }) + }) + + 
it('respects an explicit depth limit', async () => { + await withGraph(async (graph) => { + const hits = await graph.callees('symbol:a.ts#A#sig1', 1) + expect(hits.map((h) => h.nodeKey)).toEqual(['symbol:a.ts#B#sig2']) + }) + }) +}) + +describe('SqliteGraphStore.path', () => { + it('finds the shortest typed path between two nodes', async () => { + await withGraph(async (graph) => { + const result = await graph.path('symbol:a.ts#A#sig1', 'symbol:a.ts#C#sig3') + expect(result).not.toBeNull() + expect(result!.hops.map((h) => h.nodeKey)).toEqual(['symbol:a.ts#B#sig2', 'symbol:a.ts#C#sig3']) + expect(result!.hops.every((h) => h.edgeType === 'calls' && !h.reversed)).toBe(true) + }) + }) + + it('returns an empty-hop path when from === to', async () => { + await withGraph(async (graph) => { + const result = await graph.path('symbol:a.ts#A#sig1', 'symbol:a.ts#A#sig1') + expect(result).toEqual({ from: 'symbol:a.ts#A#sig1', to: 'symbol:a.ts#A#sig1', hops: [] }) + }) + }) + + it('returns null when no path exists within the depth cap', async () => { + await withGraph(async (graph) => { + const result = await graph.path('external:lib', 'external:isolated') + expect(result).toBeNull() + }) + }) +}) + +describe('SqliteGraphStore.subgraph', () => { + it('returns the node-induced subgraph within depth hops of the seed', async () => { + await withGraph(async (graph) => { + const { nodes, edges } = await graph.subgraph('symbol:a.ts#B#sig2', 1) + const nodeKeys = nodes.map((n) => n.nodeKey).sort() + expect(nodeKeys).toEqual([ + 'file:a.ts', + 'symbol:a.ts#A#sig1', + 'symbol:a.ts#B#sig2', + 'symbol:a.ts#C#sig3', + ].sort()) + expect(edges.some((e) => e.srcKey === 'symbol:a.ts#A#sig1' && e.dstKey === 'symbol:a.ts#B#sig2')).toBe(true) + expect(edges.some((e) => e.srcKey === 'symbol:a.ts#B#sig2' && e.dstKey === 'symbol:a.ts#C#sig3')).toBe(true) + }) + }) +}) + +describe('core/graph/traversal wrappers', () => { + it('resolves a display name to its graph node and traverses callers', async () 
=> { + await withGraph(async (graph) => { + const result = await callers(graph, 'C') + expect(result.resolved.status).toBe('found') + expect(result.hits.map((h) => h.displayName).sort()).toEqual(['A', 'B'].sort()) + }) + }) + + it('resolves a display name and traverses callees', async () => { + await withGraph(async (graph) => { + const result = await callees(graph, 'A', 1) + expect(result.resolved.status).toBe('found') + expect(result.hits.map((h) => h.displayName)).toEqual(['B']) + }) + }) + + it('returns not-found for an unknown identifier', async () => { + await withGraph(async (graph) => { + const result = await neighbors(graph, 'does-not-exist') + expect(result.resolved.status).toBe('not-found') + expect(result.hits).toEqual([]) + }) + }) + + it('finds a path between two symbols by display name', async () => { + await withGraph(async (graph) => { + const result = await graphPath(graph, 'A', 'C') + expect(result.from.status).toBe('found') + expect(result.to.status).toBe('found') + expect(result.path).not.toBeNull() + expect(result.path!.hops.map((h) => h.displayName)).toEqual(['B', 'C']) + }) + }) +})
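
Reviewer note: the `callers` / `callees` semantics exercised by the traversal tests (only `calls` edges, first-seen depth per node, depth clamped to 3) can be illustrated with a small in-memory breadth-first walk. This is a sketch only, not the `SqliteGraphStore` implementation, which the test header describes as recursive-CTE based. `traverseCalls`, `FIXTURE_EDGES`, `Edge`, `Hit`, and `MAX_DEPTH` are hypothetical names introduced for illustration, and the clamping of out-of-range depths is an assumption inferred from the "capped at 3" test.

```typescript
// Hypothetical in-memory model of the behaviour the tests assert; not gitsema code.
type Edge = { srcKey: string; dstKey: string; edgeType: string }
type Hit = { nodeKey: string; depth: number }

const MAX_DEPTH = 3 // assumed cap, mirroring "default and max depth 3"

function traverseCalls(
  edges: Edge[],
  start: string,
  direction: 'callers' | 'callees',
  depth: number = MAX_DEPTH,
): Hit[] {
  // Clamp the requested depth into [1, MAX_DEPTH].
  const limit = Math.min(Math.max(depth, 1), MAX_DEPTH)
  const seen = new Set<string>([start])
  const hits: Hit[] = []
  let frontier = [start]

  for (let d = 1; d <= limit && frontier.length > 0; d++) {
    const next: string[] = []
    for (const edge of edges) {
      if (edge.edgeType !== 'calls') continue // ignore defines/imports/etc.
      // callees follow src -> dst; callers follow dst -> src.
      const from = direction === 'callees' ? edge.srcKey : edge.dstKey
      const to = direction === 'callees' ? edge.dstKey : edge.srcKey
      if (!frontier.includes(from) || seen.has(to)) continue
      seen.add(to) // record each node once, at its shallowest depth
      hits.push({ nodeKey: to, depth: d })
      next.push(to)
    }
    frontier = next
  }
  return hits
}

// Same shape as the test fixture: A -calls-> B -calls-> C -calls-> external:lib
const FIXTURE_EDGES: Edge[] = [
  { srcKey: 'file:a.ts', dstKey: 'symbol:a.ts#A#sig1', edgeType: 'defines' },
  { srcKey: 'symbol:a.ts#A#sig1', dstKey: 'symbol:a.ts#B#sig2', edgeType: 'calls' },
  { srcKey: 'symbol:a.ts#B#sig2', dstKey: 'symbol:a.ts#C#sig3', edgeType: 'calls' },
  { srcKey: 'symbol:a.ts#C#sig3', dstKey: 'external:lib', edgeType: 'calls' },
]

console.log(traverseCalls(FIXTURE_EDGES, 'symbol:a.ts#C#sig3', 'callers'))
console.log(traverseCalls(FIXTURE_EDGES, 'symbol:a.ts#A#sig1', 'callees', 10))
```

In this model a requested depth of 10 behaves like 3, which matches the "expands to deeper depth when requested, capped at 3" expectation; a production store would typically push the same walk into SQL rather than scanning an edge array per level.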