Conversation
|
Thank you for this and the clean implementation, midweste. It's well-structured, with good error isolation and test coverage. Adding a plugin/extension system is a significant architectural decision that I'd like to think through carefully before committing to it. The project is still young and changing rapidly, and it feels a bit early. Some considerations:
SocratiCode already has context artifacts for extending project knowledge without code changes, which may partially overlap with this. I'm not ruling it out for the future; I do like the idea of external plugins. But I'd prefer to think about it alongside a few concrete plugins, so there are real use cases and the API is validated by actual usage. I definitely prefer the idea of plugins over code that bloats the core beyond its design, but I'd want them to stay very surface-level, without posing any security concern or touching the core functionality and indexes. I'll keep this open for now for further comments from other contributors. In the meantime, I'd like to focus on the core product: smoothing out existing bugs and implementing core features. |
|
Thanks for the thoughtful review and for keeping the door open. Totally understand wanting to be careful with architectural decisions this early. I want to share the context behind why I built the plugin system, because there's a concrete feature driving it.

Git Memory Plugin

SocratiCode answers "what does this code do?" through semantic search and context artifacts. Git Memory fills a gap: "why was it written this way?" When a project is indexed, it reads unprocessed git commits (diffs, messages, and git trailers), batches them, and sends them to a configurable LLM (OpenRouter, OpenAI, Google, Ollama) to extract structured memories: architectural decisions, bug fixes, refactors, patterns. These get embedded and stored in the same index.

For example, a search for "authentication middleware" wouldn't just find the code; it would also surface "Switched from JWT to sessions due to XSS vulnerability (commit abc123)." An AI assistant would know not to reintroduce a bug because it can find "This validation was missing, caused a production outage, fixed in def456." It's fully opt-in.

I originally built it just as a local project MCP in Go, but I can see that what I've built complements a fully indexed and searchable codebase.

On the Plugin Architecture

Git Memory is a first use case for the plugin architecture. The interface is intentionally minimal (4 optional lifecycle hooks), and I think it's actually the right pattern for SocratiCode going forward. Features like this should be isolated modules that plug into the lifecycle, not code wired throughout the core. That keeps the core lean and each feature self-contained. That said, your concerns are fair.

I'd love to get your take on the git-memory feature itself; if it makes sense, we can work out the right integration approach together. Happy to discuss. |
|
The Git Memory idea is interesting, and thanks for addressing the security concern. A few thoughts:

Feature vs plugin: Git Memory feels more like a core feature behind a flag (like INCLUDE_DOT_FILES or context artifacts) than something that validates a plugin system (more below).

LLM dependency: SocratiCode today uses just embeddings. Simple, local, no API keys (by default). Git Memory would need, I think, a generative LLM to read diffs and extract structured memories, which means configuring providers, API keys, and picking models. That's a different level of complexity that I'm trying to avoid. How would it work without that?

Existing artifacts: Could raw commit messages + diffs be embedded directly without LLM structuring? How could they be kept up to date? If possible, a simple script that updates a folder with all of that would mean artifacts could already cover it. I keep thinking git memory is something for the existing artifacts more than anything.

Existing tools: most coding AI agents already have access to git; they can run git log, git blame, and search commit history dynamically. So is it really a concern for SocratiCode?

On plugins generally: I really do like the idea of opening SocratiCode to community extensions, but I think a plugin should be a lighter touch: something that enriches the index (or the use of it) without introducing heavy dependencies or high complexity.

I'd love to see the Git Memory implementation to understand the architecture better, even if we end up shipping it as a core feature (maybe as part of artifacts) rather than a plugin. Happy to keep discussing :-) |
I understand that perspective too; I never want to be too presumptuous with other people's projects :)
So currently, I've built it to use OpenRouter, and it evaluates first. The three steps it does via LLM are:

Extraction:

Triage:

Synthesis (where some of the magic happens; as currently set up, it needs a good-sized context window):
I expect so, but I do think the "why" is where some of this becomes more valuable. However, without any LLM, "git-commit" could be a context artifact of its own with its own links to search results. "Git memory lite"?
But do they use it? That's the real question. In my experience, I have to keep telling it over and over to scan files, and tell it myself what to scan. MCPs seem to function as first-class tools that it will use. I talk to Opus a lot about why it doesn't use certain things, and it tells me that the more friction something takes, the more likely it is to skip it and just do it the "old-fashioned" way with grep etc. Maybe this is a cost-cutting methodology in how they train the models or inject prompts.
I don't really have a preference one way or another on this. The reason I'm here is that I liked what you put together and thought my beta project was a natural fit. I did some dry-run AI testing with both systems as MCPs before I even considered porting it, and Opus reported a lot of complementary results. I had it mentally dry-run a feature addition to a codebase; it used both MCPs and told me what information it would draw from each system and how that would influence how it built the new feature. I do like modularity (even your existing indexer could, in theory, be a plugin), but if it works as part of the existing codebase, that's fine too, as the plugin I submitted is really the only touch point needed.
What's the best way for me to do this? I can push to my fork after I test a couple of runs. My gap analysis of the port is nearly covered now. My main motivation for the git memory MCP was the realization that git is probably one of the best and most available memory systems a project has at its disposal. Yes, memories can be superseded by later commits, but let's be honest: once it ends up in the repo, it's a meaningful thing to remember. |
|
Ok, got the basic flow working for memory additions. Here's a document I asked Opus to make showing a theoretical feature; it chose a "Test coverage plugin", I'm guessing one where test coverage information is added to the index. Not even sure if this makes any sense, but it does show what it thinks about what it's finding:

Dry Run: "Add a Test Coverage Plugin"

A walkthrough of how an AI agent researches a new feature using SocratiCode. Each step shows the combined results the agent receives, tagged by source.

Phase 1: How Do I Create a Plugin?

The agent runs two searches in parallel and receives these combined results:
Agent is ready to: scaffold.

Phase 2: How Do I Store Data?
Agent is ready to: write the storage layer using the correct API, with relative paths, into the shared collection.
Phase 3: Collection & Project Identity
Agent is ready to: use the canonical naming functions instead of constructing collection names manually.

Full Knowledge Map

Every piece of knowledge the agent gathered, by source:
Code search provided 7 pieces: implementation contracts, API signatures, working templates. Together: 12 distinct pieces of knowledge from 4 parallel query pairs, zero files opened. |
|
I've been thinking more about this, and I think there's an approach that could work well for both the plugin system and Git Memory, one that I like more because it's a good compromise between SocratiCode's KISS philosophy and an expandable plugin system that doesn't affect or interact with any of the core features.

Plugins as artifact generators

What if the plugin contract was simply: "generate files in the context artifacts directory"? SocratiCode's existing pipeline handles embedding, indexing, and search. Plugins just produce the knowledge. The interface could be minimal:
A plugin gets the project path and a directory to write to. It does its thing. SocratiCode indexes whatever it finds there. No Qdrant access, no lifecycle hooks into the indexing pipeline, no core API surface to maintain.

This solves the concerns I have:

Security: plugins never see Qdrant or any core internals.

How Git Memory fits

Git Memory becomes a perfect first plugin, with two modes:

Lite (no LLM): runs git log, extracts commits, and writes structured markdown artifacts: commit messages, authors, affected files, diffs. No AI processing, just organized git history made searchable.

Full (with LLM): same extraction step, then sends batches to a configured LLM for structuring: types, relationships, importance scores. Writes richer artifacts.

Both modes just produce markdown files. SocratiCode's hybrid search (embeddings + BM25) handles them naturally: "switched from JWT to sessions due to XSS" will surface when someone searches "authentication security" regardless of format. Structured markdown with good headings actually chunks well for embeddings. The LLM provider configuration stays entirely within the plugin, while SocratiCode core remains embeddings-only. Users who want the full mode configure the plugin separately.

What this enables

Because the contract is just "generate useful artifacts," other plugins become natural too: API docs from OpenAPI specs, dependency analysis, architecture decision records, CI/CD context. All just file generators, all sandboxed by design.

I know this trades power for safety compared to your original plugin system with lifecycle hooks and Qdrant access. But I think that limitation is actually a feature right now, fitting SocratiCode's original philosophy: doing one thing, well. It's enough to validate the concept, and it covers the git memory use case fully.

What do you think? If this direction works for you, could you share, maybe in a dedicated PR, this approach for the plugin system plus Git Memory Lite and Full?
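The Lite mode described above could be sketched as a small standalone script (the output layout and file naming are assumptions, not the actual implementation):

```typescript
import { execFileSync } from "node:child_process";
import { mkdirSync, writeFileSync } from "node:fs";
import { join } from "node:path";

interface Commit {
  hash: string;
  author: string;
  date: string;
  subject: string;
  body: string;
}

// Read recent commits with `git log`, using control characters as field and
// record separators so multi-line commit bodies parse cleanly.
export function readCommits(repoPath: string, limit = 50): Commit[] {
  const FIELD = "\x1f";
  const RECORD = "\x1e";
  const out = execFileSync(
    "git",
    ["log", `-n${limit}`, `--format=%H${FIELD}%an${FIELD}%aI${FIELD}%s${FIELD}%b${RECORD}`],
    { cwd: repoPath, encoding: "utf8" },
  );
  return out
    .split(RECORD)
    .map((r) => r.trim())
    .filter(Boolean)
    .map((r) => {
      const [hash, author, date, subject, body = ""] = r.split(FIELD);
      return { hash, author, date, subject, body: body.trim() };
    });
}

// One markdown artifact per commit: heading, metadata list, message body.
export function commitToMarkdown(c: Commit): string {
  return `# ${c.subject}\n\n- commit: ${c.hash}\n- author: ${c.author}\n- date: ${c.date}\n\n${c.body}\n`;
}

// Write every commit as a file named by its short hash; the indexer can then
// pick the directory up as context artifacts.
export function writeArtifacts(repoPath: string, outDir: string): void {
  mkdirSync(outDir, { recursive: true });
  for (const c of readCommits(repoPath)) {
    writeFileSync(join(outDir, `${c.hash.slice(0, 12)}.md`), commitToMarkdown(c));
  }
}
```

The Full mode would slot in between `readCommits` and the file write, replacing the raw body with LLM-structured content.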
So there's the plugin system and two first use cases for it: a simple one and a more complex one needing more configuration (the LLM part should support OpenRouter and major providers: OpenAI-compatible, Ollama, etc.). |
|
When I get some additional time, I'll ingest this a bit more fully and see what changes need to be made. Honestly, I don't think it's that much; however, the file format may benefit from a structure like JSON. The current system generates links between memories during the final step and adds "superseded by" type tags to inform the consumer of their relevance. A couple of quick questions:
Maybe the plugin implements a schema that can plug in? A JSON schema would be useful, as the core could validate input before adding anything (or maybe that's the domain of the plugin?). The upsert ends up looking like this:

```js
{
  id, // UUID derived from SHA-256 of "git-memory:{contentHash}"
  vector, // embedding of prepareDocumentText(`[${memoryType}] ${summary}`, `git-memory:${filePath}`)
  bm25Text, // same text as above (for hybrid search)
  payload: {
    // ── Context artifact fields (shared with all SocratiCode context) ──
    artifactName: "git-memory", // constant — groups all git memories
    artifactDescription: "[decision] Git memory (importance: 85) from commits abc123, def456",
    filePath: "src/services/auth.ts", // primary file path
    relativePath: "git-memory", // constant
    content: "Decided to use JWT over sessions because...", // the memory summary
    startLine: 0, // not applicable
    endLine: 0, // not applicable
    language: "git-memory:decision", // "git-memory:{memoryType}"
    type: "git-memory", // constant — Qdrant filter key
    // ── Git-memory-specific fields ──
    contentHash: "a1b2c3d4e5f67890", // 16-char hex for dedup
    sourceCommits: ["abc123...", "def456..."], // full commit hashes
    filePaths: ["src/services/auth.ts", "src/config.ts"], // all related files
    tags: ["authentication", "jwt", "architecture"],
    importance: 85, // 0-100
    confidence: 70, // 0-100
    memoryType: "decision", // one of GIT_MEMORY_TYPES
    createdAt: "2025-06-15T10:30:00Z", // ISO date of earliest source commit
  },
}
```

Will respond more later; haven't had my second cup of joe yet, so I may be completely off base here :P |
|
Sorry, I forgot to answer your questions:
|
|
Closing this one for now; happy to explore it again later, also in consideration of all the core features that are being added and will keep being added, because the product is still new and expanding :-) |
|
Currently I'm just using it as-is (adding context meta to Qdrant), getting a handle on what the data looks like after running it on a few repos, and seeing what value it gives. Didn't have time to do the conversion to flat JSON yet, but I'm keeping it in mind. One thing I noticed, and this may be relevant when using a remote Qdrant server: I was hoping to set this up as a shared resource for our small development team, and the paths are absolute in the Qdrant store. Does this need to be that way? Could we write a file to source control that identifies the project by name or hash, and have the data be written relatively? Just a thought. |
|
The one gap I saw when initially planning to fit it into the existing JSON file schema is that the current schema doesn't seem designed to hold extra meta. That wouldn't allow a plugin to define what data matters within its plugin scope. These are all fields I didn't see a way to make available in the current schema design, but they are VERY important to the knowledge, operation, and value of git memory. Without them, it really loses a lot of value. |
This was actually addressed already in recent weeks (you might need to update): the Qdrant payload stores both [filePath] (absolute, used internally for reading the file) and [relativePath] (relative to the project root). Search results returned to the AI use the relative path, so what the agent sees is something like src/services/auth.ts (lines 42-67), not /home/dev1/projects/myapp/src/services/auth.ts.

The collection naming also doesn't expose your path. By default, the collection name is a SHA-256 hash of the absolute path (e.g. codebase_a1b2c3d4e5f6), not the path itself. So, looking at Qdrant directly, you'll see hash-based names, not filesystem paths.

For your shared remote Qdrant use case, the key is [SOCRATICODE_PROJECT_ID]. Set this to a stable team-wide name (e.g. my-project) and every team member's instance will read/write the same Qdrant collections regardless of where they cloned the repo locally. Without it, each developer's different absolute path produces a different hash, so they'd each get their own index. It goes in the env block of your MCP config.

With this, the whole team shares one index on your remote Qdrant. The SOCRATICODE_PROJECT_ID is also documented in the [Git Worktrees section] of the README and in the [Environment Variables] table. If you want, you can also join the SocratiCode Discord: https://discord.gg/5DrMXfNG |
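For reference, a sketch of that MCP config entry (the server name, command, and args below are placeholders for however you launch SocratiCode; only the SOCRATICODE_PROJECT_ID variable is the point here):

```json
{
  "mcpServers": {
    "socraticode": {
      "command": "node",
      "args": ["/path/to/socraticode/dist/index.js"],
      "env": {
        "SOCRATICODE_PROJECT_ID": "my-project"
      }
    }
  }
}
```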
I think there may be a misunderstanding about the constraint here. Qdrant payloads are schemaless; you can store any JSON fields you want. The current code chunks have fields like [filePath], [relativePath], [content], [startLine], etc., and context artifacts add [artifactName], [artifactDescription], [contentHash], but those aren't a rigid schema that limits what can be stored.

For a git-memory plugin, you wouldn't need to modify the existing payload structure at all. The plugin would manage its own data in its own Qdrant collection (or in the existing collection, with a different [type] field to distinguish git-memory points from code/artifact points). Your PR #12 already exports getClient() from qdrant.ts, so the plugin can upsert points directly with whatever payload fields it needs: sourceCommits, tags, importance, confidence, memoryType, createdAt, all of it.

The helper functions like [upsertChunks] are convenience wrappers for code indexing specifically. A plugin doing something fundamentally different (like git memory) would use the Qdrant client directly, which gives full control over the payload. The collection creation and embedding generation utilities are also exported and reusable.

So the architecture actually supports what you described already. The plugin system from your PR provides the lifecycle hooks (when to run), and the exported Qdrant client provides the storage layer (where to write). No changes to the existing schema needed.

Happy to dig into specifics. There have been many updates, with more in the planning (one of the reasons I keep saying a plugin system at this point must be very light-touch: the product will keep changing at speed). |
|
Great to hear in regard to paths! In regard to the project ID: it seems some AI client implementations use global MCPs for some reason, so putting "SOCRATICODE_PROJECT_ID": "my-project" in the MCP config would fix it to one index, yes? (Currently using Antigravity, and there is no per-project MCP config.) I'll look to see if .socraticode.json in the project root allows a project ID; that would cover it.

I think I misunderstood how .socraticodecontextartifacts.json worked. Not at the computer yet, but after looking again, it seems I could generate git memories in a folder using my own JSON schema and then just add them to .socraticodecontextartifacts.json? Essentially bypassing the need for direct access to Qdrant. Sorry, I'm having to catch up mentally! Then the "plugin manager" would scale down to just something that triggered hooks? This is probably what you were suggesting before, but it went over my head! Apologies.
Project ID with global MCPs

Setting SOCRATICODE_PROJECT_ID in the MCP config pins it to that specific index regardless of what path the client resolves. So even if Antigravity uses a global MCP config, adding "SOCRATICODE_PROJECT_ID": "my-project" means every call maps to the same Qdrant collections. That's the intended escape hatch for clients that don't support per-project MCP configs.

On .socraticode.json: right now that file only supports linkedProjects (for cross-project search). It doesn't have a projectId field yet, so the env var is the way to go for your setup. Adding projectId to .socraticode.json is a reasonable feature request, though, if you want it; it would cover the case where the env var isn't an option.

Context artifacts for git memories

No apologies needed! That's exactly the pattern:
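As a sketch, an entry might look something like this (the field names here are illustrative guesses, not the documented schema; check the README for the exact shape):

```json
{
  "artifacts": [
    {
      "name": "git-memory",
      "path": "docs/git-memory",
      "description": "Git-derived architectural decisions and code evolution patterns"
    }
  ]
}
```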
The description field in the config is important here because it gets stored in every chunk's payload. So if you set it to something like "Git-derived architectural decisions and code evolution patterns", the AI gets that context when results come back, helping it understand what it's looking at.

And yes, the "plugin manager" simplifies down to just triggering hooks. Something like: on codebase_index completion, run the git analysis script, write the output files, and let SocratiCode pick them up on the next codebase_update. No need to touch Qdrant directly at all.

The beauty of this approach is that it's completely decoupled. Your git-memory tool doesn't even need to know SocratiCode exists. It just writes files. SocratiCode just indexes files. The .socraticodecontextartifacts.json config is the glue. |
|
That approach would simplify some of what I'm doing in the code as well. Currently:
Most recently, I had changed it to write to Qdrant after every generation, since if some LLM calls failed I was having to regenerate the whole commit again. Later revisions wrote to Qdrant per generation and then altered the existing memories on synthesis. Files would work well because it's a semi-heavy operation. Having files based on memories makes it idempotent: once memories are generated, they never have to be regenerated (for instance, if the remote Qdrant connection is down or something). This would save LLM hits. Also, some AIs will just look at files in the codebase anyway and gather knowledge, so it would help if the LLM decides it's "too hard" to use the MCP; it would run into the same memories then. |
File-based approach

That's a great direction. The file-based pattern gives you exactly what you described: idempotency (skip already-generated commits), resilience (Qdrant being down doesn't lose work), and discoverability (agents browsing the repo find the memories directly). It's a strictly better architecture than writing to Qdrant on every LLM call.

Where this leaves the PR

Thinking about it, the file-based approach means your git-memory tool is fully external to SocratiCode. It generates files, you list them in the context artifacts config, and SocratiCode indexes them.

So PR #12 as scoped (plugin manager with lifecycle hooks) may not be needed anymore for your use case. That said, if after building it out you find there's a small, focused contribution that would help, we can look at it then.

I'd say: build out the git-memory tool with the file-based approach, see how it feels end-to-end, and then maybe we can see how it could be integrated into SocratiCode? That way it maps directly to a real friction point rather than theoretical infrastructure (with only one main application). Looking forward to seeing what you build with it. |
Summary
Adds a PluginManager class that enables SocratiCode to be extended via self-contained plugins without modifying core code. Plugins are auto-discovered from src/plugins/*/index.ts at startup and receive lifecycle hooks. All plugin errors are non-fatal — a failing plugin never affects the indexer.
SocratiCode gives AI agents context about what code does and how it's structured. Plugins extend that context with knowledge that can't be extracted from source files alone — things like why code was written a certain way, which parts of the codebase are most volatile, or what implicit dependencies exist between components. This additional context is stored alongside the existing index and surfaced automatically during search, giving AI agents a deeper understanding of the project without any changes to the core.
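In spirit, the non-fatal hook dispatch works like this sketch (the hook names and Plugin shape are illustrative, not the PR's exact API):

```typescript
type HookName = "onIndexStart" | "onFileIndexed" | "onIndexComplete" | "onShutdown";

interface Plugin {
  name: string;
  onIndexStart?(projectPath: string): void | Promise<void>;
  onFileIndexed?(filePath: string): void | Promise<void>;
  onIndexComplete?(): void | Promise<void>;
  onShutdown?(): void | Promise<void>;
}

class PluginManager {
  private plugins: Plugin[] = [];

  register(plugin: Plugin): void {
    this.plugins.push(plugin);
  }

  // Dispatch one lifecycle hook to every registered plugin, in registration
  // order. A plugin that throws is logged and skipped; it never aborts the run.
  async dispatch(hook: HookName, ...args: unknown[]): Promise<void> {
    for (const plugin of this.plugins) {
      const fn = plugin[hook] as ((...a: unknown[]) => unknown) | undefined;
      if (!fn) continue; // all hooks are optional
      try {
        await fn.apply(plugin, args);
      } catch (err) {
        console.error(`[plugin:${plugin.name}] ${hook} failed:`, err);
      }
    }
  }
}
```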
Changes
Type of change
Testing
12 tests covering: plugin registration, hook dispatch order, non-fatal error isolation, onProgress forwarding, shutdown resilience, and test reset. No plugins directory = gracefully skipped.
Checklist
Related issues
None