docs: Add AI-native knowledge layer architectural design report #71

Merged
bashandbone merged 4 commits into main from
claude/ai-knowledge-layer-design-fgs2p
Feb 21, 2026
Conversation


bashandbone (Contributor) commented on Feb 21, 2026

Summary

This PR adds a comprehensive architectural design document for an AI-native knowledge layer that would enhance Thread's code intelligence capabilities. The document explores how to move beyond file-centric abstractions to enable AI agents to interact with code through semantically rich, graph-structured representations.

Key Changes

  • New document: docs/architecture/AI_KNOWLEDGE_LAYER_DESIGN.md (1054 lines)
    • Executive summary establishing the thesis that files limit AI agent effectiveness
    • Problem statement detailing context window saturation, hidden relationships, and redundant re-parsing
    • State-of-the-art analysis covering Unison's content-addressed definitions, semantic code graphs (Kythe, SCIP, Glean), and AI agent ecosystem trends
    • Three architectural options analyzed:
      • Option A: Graph-enhanced file model (evolutionary, low effort)
      • Option B: Content-addressed definition store (transformative, very high effort)
      • Option C: Multi-resolution knowledge layer (recommended, medium-high effort)
    • Detailed recommendation for Option C with five abstraction levels (file index → definitions → semantic graph → architectural patterns → intent/contracts)
    • Key design decisions covering atomic units, source of truth model, MCP tool interface, context pack format, storage architecture, and update propagation
    • Implementation strategy with proposed crate organization and dependency graph
    • Risk analysis covering technical, adoption, and constitutional risks
    • Appendices with comparison matrix, landscape research sources, and specialist review findings

Notable Details

  • Multi-resolution architecture: Proposes simultaneous representation at five levels of abstraction, each content-addressed and incrementally updated
  • Git compatibility: Maintains Git as source of truth for content while establishing knowledge layer as authoritative for meaning/relationships
  • Incremental delivery: Designed as extension to existing 001-realtime-code-graph spec, with phased rollout from Phase 1 (file index + semantic graph) through Phase 5 (intent inference)
  • MCP-native interface: Exposes knowledge layer as Model Context Protocol tools for universal AI agent integration
  • Context compression: Targets 10-100x token reduction through AI Context Packs—pre-assembled, task-optimized subgraphs
  • Specialist review integration: Incorporates findings from three independent reviews addressing storage overhead, context pack assembly, and adoption risks
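The context-pack idea above is, at its core, a budget-constrained selection problem: pick the definitions most relevant to a focal point without exceeding the agent's token budget. The sketch below illustrates one plausible greedy strategy; `Definition`, `assemble_pack`, and the `relevance`/`token_estimate` fields are hypothetical names for illustration, not the Thread API.

```rust
// Hypothetical sketch: assembling a context pack by greedily selecting the
// most relevant definitions that still fit a token budget.
#[derive(Debug, Clone)]
struct Definition {
    hash: String,          // content address of the definition
    relevance: f64,        // relevance to the focal point, higher is better
    token_estimate: usize, // estimated tokens if included in the pack
}

fn assemble_pack(mut candidates: Vec<Definition>, budget: usize) -> Vec<Definition> {
    // Most relevant first; skip anything that would exceed the budget.
    candidates.sort_by(|a, b| b.relevance.partial_cmp(&a.relevance).unwrap());
    let mut pack = Vec::new();
    let mut used = 0;
    for def in candidates {
        if used + def.token_estimate <= budget {
            used += def.token_estimate;
            pack.push(def);
        }
    }
    pack
}

fn main() {
    let candidates = vec![
        Definition { hash: "a1".into(), relevance: 0.9, token_estimate: 400 },
        Definition { hash: "b2".into(), relevance: 0.7, token_estimate: 900 },
        Definition { hash: "c3".into(), relevance: 0.4, token_estimate: 300 },
    ];
    // With a 1000-token budget: a1 fits (400), b2 would overflow, c3 fits (700 total).
    let pack = assemble_pack(candidates, 1000);
    assert_eq!(
        pack.iter().map(|d| d.hash.as_str()).collect::<Vec<_>>(),
        vec!["a1", "c3"]
    );
}
```

A production assembler would weigh graph distance and edge types rather than a single relevance score, but the budget discipline is the same.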

Status

Document is marked as Draft (v0.1.0) in Brainstorm / Options Analysis status. Serves as foundation for architectural decision-making and future implementation planning.

https://claude.ai/code/session_01XrLAbKgucdCJMzfcQRqwJd

Summary by Sourcery

Add a draft architectural design document proposing a multi-resolution AI-native knowledge layer to enhance code intelligence, including evaluated options, a recommended approach, and phased implementation and risk analysis.

Documentation:

  • Introduce a comprehensive AI-native knowledge layer architectural design report outlining goals, state of the art, design options, and a recommended multi-resolution architecture.
  • Document an implementation and rollout strategy, including crate organization, dependency structure, phased delivery plan, and validation milestones for the knowledge layer.
  • Capture specialist review findings with revised risk assessments, storage and performance considerations, and adjustments to API design and project scope for the minimum viable knowledge layer.

Comprehensive brainstorm and options analysis for a multi-resolution
knowledge layer that replaces the file as the primary unit of code
intelligence for AI agents. Analyzes three architectural options
(evolutionary, Unison-inspired, multi-resolution) and recommends a
layered approach building on the 001-realtime-code-graph foundation.

Three independent specialist reviews (Rust systems architect, AI agent
integration specialist, product strategist) identified 11 key findings:

- Storage overhead underestimated 4-10x; revised to metadata-only L1
- Context packs need two-phase protocol (manifest then fetch)
- MCP tools restructured from 20 tools/5 levels to 12 tools/3 tiers
- L3-L4 relabeled as research spikes with go/no-go gates
- Graceful degradation path required (hybrid tools with fallback)
- Definition extraction via tree-sitter tags.scm, not custom extractors
- Crate count reduced from 4 new to 1 new (thread-definitions)
- Cross-session persistence claim scoped to codebase state only
- MVKL defined: L0-L2 + 5 tools + 4 languages + Postgres
- OSS/commercial boundary defined (L0-L2 OSS, L3-L4 commercial)
- Revised phasing with parallel P1/P2 and validation-first P3

Copilot AI review requested due to automatic review settings on February 21, 2026 at 19:13

sourcery-ai bot commented Feb 21, 2026

Reviewer's Guide

Adds a new, draft architectural design document describing an AI-native multi-resolution knowledge layer for Thread, including problem framing, survey of related work, three architectural options (A/B/C), a recommendation of Option C, and detailed design/implementation considerations aligned with the existing 001-realtime-code-graph spec.

Sequence diagram for MCP-based two-phase context pack retrieval

sequenceDiagram
  actor Agent
  participant MCPServer
  participant API as Thread_API
  participant Graph as Thread_Graph_L2
  participant Defs as Thread_Definitions_L1
  participant AST as Thread_AST_Engine
  participant Store as Storage_Backend

  Agent->>MCPServer: thread_context_plan(focal_point, depth, budget)
  MCPServer->>API: context_plan(focal_point, depth, budget)
  API->>Graph: load_subgraph(focal_point, depth)
  Graph->>Store: query_graph_nodes_and_edges(focal_point, depth)
  Store-->>Graph: graph_nodes_and_edges
  Graph-->>API: candidate_definitions_with_relevance
  API->>Defs: get_definition_metadata(definition_ids)
  Defs-->>API: definitions_with_token_estimates
  API-->>MCPServer: ContextPlanManifest(definitions, estimates, cursors)
  MCPServer-->>Agent: ContextPlanManifest

  Agent->>MCPServer: thread_context_fetch(selected_definition_hashes)
  MCPServer->>API: context_fetch(selected_definition_hashes)
  API->>Defs: resolve_file_and_ranges(selected_definition_hashes)
  Defs-->>API: file_ranges
  API->>AST: parse_files_if_needed(file_ids)
  AST->>Store: load_file_contents(file_ids)
  Store-->>AST: file_contents
  AST-->>API: definition_snippets
  API-->>MCPServer: ContextPack(definitions, edges, architecture_metadata)
  MCPServer-->>Agent: ContextPack
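The manifest-then-fetch exchange in the sequence diagram can be sketched in a few lines: phase one returns cheap metadata so the agent can budget before any source bytes move; phase two resolves only the hashes the agent selected. `ThreadApi`, `ManifestEntry`, and the chars-per-token heuristic are illustrative stand-ins, not the actual `thread_api` crate or MCP tool surface.

```rust
// Illustrative sketch of the two-phase context pack protocol.
#[derive(Debug)]
struct ManifestEntry {
    hash: String,
    token_estimate: usize,
}

struct ThreadApi {
    // (definition hash, definition source) pairs; a real store would be indexed.
    definitions: Vec<(String, String)>,
}

impl ThreadApi {
    // Phase 1 (thread_context_plan): metadata only, no source bytes sent.
    fn context_plan(&self) -> Vec<ManifestEntry> {
        self.definitions
            .iter()
            .map(|(hash, src)| ManifestEntry {
                hash: hash.clone(),
                token_estimate: src.len() / 4, // crude ~4 chars/token heuristic
            })
            .collect()
    }

    // Phase 2 (thread_context_fetch): resolve only the selected hashes.
    fn context_fetch(&self, selected: &[&str]) -> Vec<String> {
        self.definitions
            .iter()
            .filter(|(hash, _)| selected.contains(&hash.as_str()))
            .map(|(_, src)| src.clone())
            .collect()
    }
}

fn main() {
    let api = ThreadApi {
        definitions: vec![
            ("d1".into(), "fn parse(input: &str) -> Ast { /* ... */ }".into()),
            ("d2".into(), "fn unrelated_helper() {}".into()),
        ],
    };
    let manifest = api.context_plan();
    assert_eq!(manifest.len(), 2);
    // The agent inspects the estimates, then fetches only d1.
    let pack = api.context_fetch(&["d1"]);
    assert_eq!(pack.len(), 1);
    assert!(pack[0].contains("parse"));
}
```

The point of the split is that the expensive step (parsing files and cutting snippets) happens only for definitions the agent has explicitly chosen to spend budget on.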

Class diagram for proposed crates and their responsibilities in the knowledge layer

classDiagram
  class thread_ast_engine {
    +parse_file(path: string) AST
    +incremental_parse(path: string, previous_ast: AST) AST
  }

  class thread_language {
    +detect_language(path: string) Language
    +get_grammar(language: Language) Grammar
  }

  class thread_flow {
    +run_pipeline(pipeline_id: string, inputs: FlowInputs) FlowResult
  }

  class thread_graph {
    +add_nodes(nodes: NodeList) void
    +add_edges(edges: EdgeList) void
    +subgraph_from_focal(focal_point: string, depth: int) SubGraph
    +callers(symbol: string) NodeList
    +callees(symbol: string) NodeList
  }

  class thread_storage {
    +get_file(path: string) FileRecord
    +put_file(record: FileRecord) void
    +get_definition(hash: string) DefinitionRecord
    +put_definition(record: DefinitionRecord) void
    +query_graph(query: GraphQuery) GraphResult
  }

  class thread_definitions {
    +extract_definitions(ast: AST, language: Language) DefinitionList
    +compute_hash(ast_subtree: AST) string
    +store_definitions(definitions: DefinitionList) void
    +get_definition_metadata(hash: string) DefinitionMetadata
  }

  class thread_api {
    +context_plan(focal_point: string, depth: int, budget: int) ContextPlanManifest
    +context_fetch(definition_hashes: StringList) ContextPack
    +search(query: string, language: Language) SearchResults
    +callers(symbol: string) CallerList
    +status() StatusReport
  }

  class thread_projection {
    +project_to_files(definitions: DefinitionList) FileEdits
    +generate_context_pack(subgraph: SubGraph, budget: int) ContextPack
    +generate_docs(definitions: DefinitionList) Documentation
  }

  class thread_mcp {
    +serve(address: string) void
    +handle_tool_call(tool_name: string, params: ToolParams) ToolResult
  }

  thread_definitions --> thread_ast_engine : uses
  thread_definitions --> thread_language : uses
  thread_definitions --> thread_storage : persists

  thread_graph --> thread_storage : persists

  thread_projection --> thread_graph : reads
  thread_projection --> thread_definitions : reads

  thread_api --> thread_graph : queries
  thread_api --> thread_definitions : queries
  thread_api --> thread_projection : uses
  thread_api --> thread_storage : reads

  thread_mcp --> thread_api : delegates
  thread_flow --> thread_ast_engine : uses
  thread_flow --> thread_definitions : orchestrates
  thread_flow --> thread_graph : orchestrates
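The `compute_hash` responsibility shown for `thread_definitions` can be illustrated with a minimal content-addressing sketch. Two caveats: the design specifies Blake3, while std's `DefaultHasher` stands in here only to keep the example dependency-free, and the whitespace normalization below is an assumed simplification of hashing a normalized AST subtree.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Illustrative sketch: content-address a definition by hashing a normalized
// form of its source, so formatting-only edits do not change the address.
fn compute_hash(definition_source: &str) -> u64 {
    // Collapse all whitespace runs to single spaces before hashing.
    let normalized: String = definition_source
        .split_whitespace()
        .collect::<Vec<_>>()
        .join(" ");
    let mut hasher = DefaultHasher::new();
    normalized.hash(&mut hasher);
    hasher.finish()
}

fn main() {
    let a = compute_hash("fn add(a: i32, b: i32) -> i32 { a + b }");
    let b = compute_hash("fn add(a: i32,  b: i32) -> i32 {\n    a + b\n}");
    // Reformatting only: same content address.
    assert_eq!(a, b);
    let c = compute_hash("fn add(a: i32, b: i32) -> i32 { a - b }");
    // Semantic change: different address.
    assert_ne!(a, c);
}
```

Hashing the tree-sitter subtree rather than raw text would additionally make the address insensitive to comments, which is presumably what the design intends by "content-addressed definitions."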

Flow diagram for ingestion and update propagation across knowledge layer levels

flowchart TD
  A[Source_file_changed] --> B[Compute_file_hash_with_Blake3]
  B --> C{File_hash_changed?}
  C -- No --> Z[No_further_action]
  C -- Yes --> D[Update_Level_0_File_index]

  D --> E[Parse_file_with_tree_sitter]
  E --> F[Extract_definitions_with_tree_sitter_tags]
  F --> G[Update_Level_1_Definitions_metadata]

  G --> H[Rebuild_affected_graph_edges]
  H --> I[Update_Level_2_Semantic_graph]

  I --> J{Schedule_pattern_redetection?}
  J -- Yes --> K[Recompute_affected_patterns]
  J -- No --> L[Skip_pattern_update]

  K --> M[Update_Level_3_Architectural_patterns]

  M --> N{Schedule_intent_reinference?}
  N -- Yes --> O[Reinfer_intent_and_contracts]
  N -- No --> P[Skip_intent_update]

  O --> Q[Update_Level_4_Intent_and_contracts]

  subgraph Latency_Tiers
    D
    G
    I
    K
    O
  end
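The short-circuit at the top of this flow — stop when the file hash is unchanged, otherwise cascade L0 through L2 and merely schedule the L3/L4 work — can be sketched as follows. The enum variants and the four-step cascade are illustrative assumptions tracking the flowchart, not actual Thread code.

```rust
// Sketch of update propagation: recompute higher levels only when the file
// content hash actually changed.
#[derive(Debug, PartialEq)]
enum Update {
    FileIndex,        // L0: synchronous, cheap metadata update
    Definitions,      // L1: re-extract via tree-sitter tags queries
    SemanticGraph,    // L2: rebuild only the affected edges
    SchedulePatterns, // L3/L4: deferred to background jobs, not inline
}

fn propagate(old_hash: Option<u64>, new_hash: u64) -> Vec<Update> {
    match old_hash {
        // Unchanged content: no parsing, no graph work.
        Some(h) if h == new_hash => Vec::new(),
        // New or changed file: cascade through the levels in order.
        _ => vec![
            Update::FileIndex,
            Update::Definitions,
            Update::SemanticGraph,
            Update::SchedulePatterns,
        ],
    }
}

fn main() {
    assert!(propagate(Some(42), 42).is_empty());
    assert_eq!(propagate(Some(42), 43).len(), 4);
    assert_eq!(propagate(None, 7)[0], Update::FileIndex);
}
```

This ordering is what gives the latency tiers in the diagram their meaning: the agent-facing levels (L0–L2) update inline, while pattern redetection and intent reinference run asynchronously.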

File-Level Changes

Change Details Files
Introduce a comprehensive architecture design report for an AI-native knowledge layer, recommending a multi-resolution, graph-based model that extends the existing realtime code graph.
  • Describe limitations of file-centric code representations for AI agents and define success criteria for a knowledge layer.
  • Survey state-of-the-art systems in semantic code graphs, content-addressed code, projectional editing, and AI agent tooling to ground the design.
  • Define and analyze three architectural options (graph-enhanced file model, content-addressed definition store, and a multi-resolution knowledge layer) with tradeoffs and effort levels.
  • Recommend Option C (multi-resolution knowledge layer) with five abstraction levels, dual source-of-truth model (Git for content, knowledge layer for meaning), and MCP-based AI interfaces.
  • Specify key design decisions around definition-level content addressing, context pack format, storage architecture, update propagation, and MCP tool surface.
  • Outline an implementation strategy with proposed Rust crate/dependency structure that builds on the 001-realtime-code-graph spec and existing Thread components.
  • Provide risk analysis, open questions, comparison matrix, research landscape, and synthesized specialist review feedback that adjusts storage estimates, tool design, phasing, and scope.
docs/architecture/AI_KNOWLEDGE_LAYER_DESIGN.md


sourcery-ai bot left a comment


Hey - I've found 1 issue, and left some high level feedback:

  • There are conflicting claims about context compression (10–100x vs. revised 5–15x and 2–5x in the specialist findings); consider harmonizing these numbers so the document presents a single, consistent set of expectations.
  • The MCP tool design is described twice (a detailed level-based suite and then a revised 3‑tier suite in the review findings); it would help to clearly mark the earlier design as superseded or collapse the document to only describe the current 3‑tier plan to avoid confusion.
  • Given that L3/L4 are now framed as research spikes rather than planned engineering phases, you might want to simplify the main phased roadmap and push most L3/L4 detail into a dedicated ‘Future Work / Research’ section to keep the core proposal focused on the MVKL scope.
## Individual Comments

### Comment 1
<location> `docs/architecture/AI_KNOWLEDGE_LAYER_DESIGN.md:106` </location>
<code_context>
+
+**Sourcegraph SCIP**: Protobuf schema with human-readable symbol IDs. 10x faster indexing and 4-5x smaller indexes than LSIF. Powers Sourcegraph's Code Graph capturing inheritance, service dependencies, and API interactions.
+
+**Meta Glean**: General-purpose fact collection using RocksDB and declarative Angle queries. Indexes diffs as "diff sketches" for semantic search over commits. Hundreds-of-microseconds query latency at massive scale.
+
+### Graph-Integrated LLM Inference (CGM, NeurIPS 2025)
</code_context>

<issue_to_address>
**nitpick (typo):** Consider rephrasing "Hundreds-of-microseconds" to the more standard "hundreds of microseconds".

The hyphenated form reads like an editing artifact. Using "hundreds of microseconds" instead better follows standard technical writing conventions and improves readability.

```suggestion
**Meta Glean**: General-purpose fact collection using RocksDB and declarative Angle queries. Indexes diffs as "diff sketches" for semantic search over commits. Hundreds of microseconds query latency at massive scale.
```
</issue_to_address>


Co-authored-by: sourcery-ai[bot] <58596630+sourcery-ai[bot]@users.noreply.github.com>
Signed-off-by: Adam Poulemanos <89049923+bashandbone@users.noreply.github.com>

Copilot AI left a comment


Pull request overview

This PR adds a comprehensive architectural design document exploring an AI-native knowledge layer for Thread's code intelligence capabilities. The document presents a well-researched analysis of how to move beyond file-centric abstractions to enable AI agents to interact with code through semantically rich, graph-structured representations.

Changes:

  • Adds 1,054-line architectural design document exploring three options for enhancing Thread's code intelligence with an AI-native knowledge layer
  • Recommends Option C (Multi-Resolution Knowledge Layer) with five abstraction levels from file index to intent/contracts
  • Includes specialist review findings from three independent reviewers addressing storage overhead, context pack assembly, and adoption risks
  • Provides revised phasing, crate organization (reduced from 4 new crates to 1), and realistic compression estimates (5-15x vs. initial 100x claims)


bashandbone (Contributor, Author) commented:

@claude please consider the following feedback from sourcery and make any appropriate changes to the proposal:

There are conflicting claims about context compression (10–100x vs. revised 5–15x and 2–5x in the specialist findings); consider harmonizing these numbers so the document presents a single, consistent set of expectations.
The MCP tool design is described twice (a detailed level-based suite and then a revised 3-tier suite in the review findings); it would help to clearly mark the earlier design as superseded or collapse the document to only describe the current 3-tier plan to avoid confusion.
Given that L3/L4 are now framed as research spikes rather than planned engineering phases, you might want to simplify the main phased roadmap and push most L3/L4 detail into a dedicated 'Future Work / Research' section to keep the core proposal focused on the MVKL scope.

They make good points that you should examine and potentially correct or incorporate. We need to ensure that plan documents accurately reflect final resolved input and are clearly separated from preliminary discussion/initial ideas from initial brainstorming.

…feedback

Three structural issues addressed:

1. Context compression numbers harmonized: all references now consistently
   state 5-15x for targeted queries, 2-5x for exploratory work. Removed
   conflicting 10-100x, 10-50x, and 1/100th claims from the main body.

2. MCP tool design consolidated: replaced the preliminary 5-level/20-tool
   brainstorm with the revised 3-tier/12-tool design directly in the main
   body. Two-phase context pack protocol (manifest → fetch) now presented
   as the primary design, not as a review finding override.

3. L3-L4 detail moved to dedicated "Future Work / Research" section:
   main body now focuses on MVKL scope (L0-L2). L3-L4 framed as
   time-boxed research spikes with explicit go/no-go criteria. Phased
   delivery table updated to reflect MVKL-first approach.

Also: version bumped to 0.2.0, crate organization simplified to 1 new
crate, storage estimates revised to metadata-only L1, architecture
diagram updated to show MVKL scope boundaries.

bashandbone merged commit 591f187 into main on Feb 21, 2026
16 of 18 checks passed
bashandbone deleted the claude/ai-knowledge-layer-design-fgs2p branch on February 21, 2026 at 20:08