| 1 | +# Project Guide for AI Coding Agents (Copilot) |
| 2 | + |
| 3 | +This document gives AI coding assistants (like GitHub Copilot) essential context to work effectively in the SharpVector repo: architecture overview, key entry points, docs locations, conventions, and safe practices for adding features or fixing bugs. |
| 4 | + |
| 5 | +## Overview |
| 6 | + |
| 7 | +- Primary package: `Build5Nines.SharpVector` (NuGet: Build5Nines.SharpVector) targeting .NET 8+. |
| 8 | +- Optional integrations: |
| 9 | + - `Build5Nines.SharpVector.OpenAI` — embeddings via OpenAI/Azure OpenAI. |
| 10 | + - `Build5Nines.SharpVector.Ollama` — embeddings via local Ollama server. |
| 11 | +- Playground and samples are provided for demos and manual testing. |
| 12 | + |
| 13 | +- Branches: active development occurs on `dev`; confirm before broad changes. |
| 14 | +- CI: GitHub Actions workflow `build-release.yml` builds and releases NuGet packages. |
| 15 | + |
| 16 | +## Documentation Locations |
| 17 | + |
| 18 | +- Public docs site sources: `docs/` (MkDocs) |
| 19 | + - Index: `docs/docs/index.md` |
| 20 | + - Get Started: `docs/docs/get-started/` |
| 21 | + - Concepts, Persistence, Text Chunking, Samples, etc. under `docs/docs/` |
| 22 | +- Root README: `README.md` — high-level intro and NuGet info. |
| 23 | +- Project-specific docs inside src: |
| 24 | + - `src/Build5Nines.SharpVector/docs/` — internal docs snippets. |
| 25 | + |
| 26 | +When adding features, update both code and related docs (MkDocs under `docs/docs/...`). Keep docs concise with examples and cross-links. |
| 27 | +## Repository Layout |
| 28 | +- `src/SharpVector.sln` — solution file. |
| 29 | +- `src/Build5Nines.SharpVector/` — core library |
| 30 | + - Embeddings interfaces: `Embeddings/IEmbeddingsGenerator.cs`, `Embeddings/IBatchEmbeddingsGenerator.cs` |
| 31 | + - Core DB abstractions: |
| 32 | + - `IVectorDatabase.cs` — main interface. |
| 33 | + - `VectorDatabaseBase.cs` — common logic. |
| 34 | + - `MemoryVectorDatabaseBase.cs`, `MemoryVectorDatabase.cs`, `BasicMemoryVectorDatabase.cs` — in-memory implementations. |
| 35 | + - Disk persistence: `BasicDiskMemoryVectorDatabaseBase.cs`, `BasicDiskVectorDatabase.cs`, `DatabaseFile.cs`, `DatabaseInfo.cs` |
| 36 | + - Vector comparison (search metrics): `VectorCompare/` |
| 37 | + - `IVectorComparer.cs` |
| 38 | + - `CosineSimilarityVectorComparerAsync.cs` (default) |
| 39 | + - `EuclideanDistanceVectorComparerAsync.cs` |
| 40 | + - Preprocessing & Vectorization pipeline: `Preprocessing/`, `Vectorization/`, `Vocabulary/`, `VectorStore/`, `Id/` |
| 41 | + - Extensions: `IVectorDatabaseExtensions.cs` |
| 42 | +- `src/Build5Nines.SharpVector.OpenAI/` — OpenAI embeddings |
| 43 | + - `Embeddings/OpenAIEmbeddingsGenerator.cs` |
| 44 | + - Memory DB wrappers using OpenAI: `OpenAIMemoryVectorDatabase*.cs` |
| 45 | +- `src/Build5Nines.SharpVector.Ollama/` — Ollama embeddings |
| 46 | + - `Embeddings/OllamaEmbeddingsGenerator.cs` |
| 47 | + - Memory DB wrappers using Ollama: `OllamaMemoryVectorDatabase*.cs` |
| 48 | +- Playground & samples |
| 49 | + - `src/Build5Nines.SharpVector.Playground/` — demo app, configurable via `appsettings.json` |
| 50 | + - `samples/` and `src/*ConsoleTest`, `*Test` projects — usage examples and tests. |
| 51 | + |
| 52 | +## Typical Usage (Core Library) |
| 53 | + |
| 54 | +- Create DB: `var vdb = new BasicMemoryVectorDatabase();` |
| 55 | +- Add text: `vdb.AddText("some text", metadata);` (sync/async variants) |
| 56 | +- Search: `var results = vdb.Search("query text");` (uses cosine similarity by default) |
| 57 | +- Custom embeddings: Provide your own `IEmbeddingsGenerator` or use OpenAI/Ollama packages. |
| 58 | +- Change comparison metric: Supply an `IVectorComparer` (e.g., Euclidean distance) to the DB. |
| 59 | + |
| 60 | +Minimal example: |
| 61 | + |
| 62 | +```csharp |
| 63 | +using Build5Nines.SharpVector; |
| 64 | + |
| 65 | +var vdb = new BasicMemoryVectorDatabase(); // in-memory database |
| 66 | +vdb.AddText("Hello SharpVector", metadata: "sample"); // store the text with its metadata |
| 67 | +var results = vdb.Search("Hello"); // cosine similarity by default |
| 68 | +``` |
| 69 | + |
| 70 | +## Key Design Concepts |
| 71 | + |
| 72 | +- In-memory first: Default DB stores vectors in memory for speed. Disk-backed options exist for persistence. |
| 73 | +- Pluggable pipeline: |
| 74 | + - Embeddings generation — interfaces allow external providers. |
| 75 | + - Preprocessing — text normalization/tokenization configurable under `Preprocessing/`. |
| 76 | + - Vector comparison — swappable similarity metrics via `IVectorComparer`. |
| 77 | +- Metadata support: Store arbitrary metadata alongside each text entry. |
| 78 | +- Async support: Async APIs exist for scalable operations. |
| 79 | + |
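To make the "swappable metric" idea concrete, here is a minimal standalone sketch of the two metrics named in this guide. The `MetricSketch` class and its method names are hypothetical and exist only for illustration; they are not the library's `IVectorComparer` implementations, whose exact signatures should be checked in `VectorCompare/`.

```csharp
using System;

// Hypothetical helper for illustration; not part of Build5Nines.SharpVector.
public static class MetricSketch
{
    // Cosine similarity: larger values mean the vectors point in more similar directions.
    public static float CosineSimilarity(float[] a, float[] b)
    {
        if (a.Length != b.Length)
            throw new ArgumentException("Vectors must have the same length.");

        float dot = 0f, normA = 0f, normB = 0f;
        for (int i = 0; i < a.Length; i++)
        {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }

        // A zero vector has no direction; this sketch treats it as "no similarity".
        if (normA == 0f || normB == 0f) return 0f;
        return dot / (MathF.Sqrt(normA) * MathF.Sqrt(normB));
    }

    // Euclidean distance: smaller values mean the vectors are closer together.
    public static float EuclideanDistance(float[] a, float[] b)
    {
        if (a.Length != b.Length)
            throw new ArgumentException("Vectors must have the same length.");

        float sum = 0f;
        for (int i = 0; i < a.Length; i++)
        {
            float diff = a[i] - b[i];
            sum += diff * diff;
        }
        return MathF.Sqrt(sum);
    }
}
```

The practical consequence of swapping metrics is the sort direction: with a similarity score the best match has the highest value, while with a distance the best match has the lowest.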
| 80 | +## Conventions & Coding Guidelines |
| 81 | + |
| 82 | +- Language/Runtime: C#, .NET 8+. Use async/await where appropriate. |
| 83 | +- Style: Match existing patterns. Avoid wide refactors; make minimal, focused changes. |
| 84 | +- Naming: Prefer descriptive names; avoid single-letter variables. |
| 85 | +- Comments: Keep code clear; avoid inline comments unless necessary for clarity. |
| 86 | +- Errors/exceptions: Use specific exception types like `DatabaseFileException` where applicable. |
| 87 | +- Tests: When adding/altering behavior, include or update tests in `src/*Test` projects. |
| 88 | + |
| 89 | +- API stability: Prefer additive changes; avoid breaking public types/methods. |
| 90 | +- Nullability: Follow existing project settings; respect nullable context in projects. |
| 91 | +- Performance: Avoid allocations in tight loops; prefer spans/arrays where safe. |
| 92 | + |
| 93 | +## How to Add Features Safely |
| 94 | + |
| 95 | +1. Identify extension point: |
| 96 | + - New similarity metric → implement `IVectorComparer` under `src/Build5Nines.SharpVector/VectorCompare/` and wire via constructor/config. |
| 97 | + - New embeddings provider → implement `IEmbeddingsGenerator` (and optionally `IBatchEmbeddingsGenerator`) under a new package or existing OpenAI/Ollama. |
| 98 | + - Persistence enhancement → extend `BasicDiskMemoryVectorDatabaseBase` and update `DatabaseFile` handling. |
| 99 | +2. Keep public APIs stable; prefer additive changes. |
| 100 | +3. Update docs in `docs/docs/...` with a short “Getting Started” snippet and an example. |
| 101 | +4. Run tests and add focused unit tests for new logic. |
| 102 | + |
| 103 | +Example wiring for custom comparer: |
| 104 | + |
| 105 | +```csharp |
| 106 | +var comparer = new EuclideanDistanceVectorComparerAsync(); |
| 107 | +var vdb = new BasicMemoryVectorDatabase(vectorComparer: comparer); |
| 108 | +``` |
| 109 | + |
| 110 | +## Bug Fix Workflow |
| 111 | + |
| 112 | +- Reproduce with a minimal sample (Playground or `*ConsoleTest`). |
| 113 | +- Locate source by interfaces: |
| 114 | + - Insert/search issues → `MemoryVectorDatabase*`, `VectorStore/`, `Vectorization/`. |
| 115 | + - Similarity result issues → `VectorCompare/*`. |
| 116 | + - Embeddings/provider issues → `Embeddings/*`, OpenAI/Ollama projects. |
| 117 | +- Add small unit tests or use BenchmarkDotNet samples for perf-sensitive changes. |
| 118 | +- Keep changes minimal; do not alter unrelated behavior. |
| 119 | + |
| 120 | +When fixing bugs, add a regression test under the closest `*Test` project and keep scope tight. |
| 121 | + |
| 122 | +## Performance Notes |
| 123 | + |
| 124 | +- Benchmark artifacts: `BenchmarkDotNet.Artifacts/` and `src/BenchmarkDotNet.Artifacts/` contain previous perf runs. |
| 125 | +- Optimize critical paths: |
| 126 | + - Avoid unnecessary allocations in comparison loops. |
| 127 | + - Prefer span/array operations where safe. |
| 128 | + - Batch embeddings when provider supports it. |
| 129 | + |
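The batching advice above can be sketched as follows. This is an assumption-laden illustration: `BatchingSketch`, `EmbedInBatchesAsync`, and the `embedBatchAsync` delegate are hypothetical names standing in for whatever batch call a provider exposes, and they do not reflect the actual shape of `IBatchEmbeddingsGenerator`. Only `Enumerable.Chunk` (.NET 6+) is a real framework API here.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical helper for illustration; not part of Build5Nines.SharpVector.
public static class BatchingSketch
{
    // Splits the inputs into fixed-size batches and issues one provider call per batch.
    // embedBatchAsync is a stand-in for a provider call that accepts many texts at once.
    public static async Task<List<float[]>> EmbedInBatchesAsync(
        IEnumerable<string> texts,
        int batchSize,
        Func<IReadOnlyList<string>, Task<IReadOnlyList<float[]>>> embedBatchAsync)
    {
        if (batchSize <= 0)
            throw new ArgumentOutOfRangeException(nameof(batchSize));

        var vectors = new List<float[]>();
        foreach (string[] batch in texts.Chunk(batchSize))
        {
            IReadOnlyList<float[]> batchVectors = await embedBatchAsync(batch);

            // Guard against a provider returning fewer or more vectors than inputs.
            if (batchVectors.Count != batch.Length)
                throw new InvalidOperationException("Provider returned an unexpected number of vectors.");

            vectors.AddRange(batchVectors);
        }
        return vectors;
    }
}
```

With five inputs and a batch size of two, this makes three provider calls instead of five, and the results come back in input order.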
| 130 | +## Build & Run |
| 131 | + |
| 132 | +- Build solution: |
| 133 | + |
| 134 | +```bash |
| 135 | +dotnet build src/SharpVector.sln -c Release |
| 136 | +``` |
| 137 | + |
| 138 | +- Run Playground: |
| 139 | + |
| 140 | +```bash |
| 141 | +dotnet run --project src/Build5Nines.SharpVector.Playground -c Debug |
| 142 | +``` |
| 143 | + |
| 144 | +- Run tests (adjust if test projects differ): |
| 145 | + |
| 146 | +```bash |
| 147 | +dotnet test src/SharpVector.sln |
| 148 | +``` |
| 149 | + |
| 150 | +Common test projects (names may vary): |
| 151 | +- `src/SharpVectorTest/` — unit tests for core library. |
| 152 | +- `src/SharpVectorPerformance/` — benchmarks, see `BenchmarkDotNet.Artifacts/`. |
| 153 | + |
| 154 | + |
| 155 | +## Docs Authoring (MkDocs) |
| 156 | + |
| 157 | +- MkDocs config: `docs/mkdocs.yml` |
| 158 | +- Local preview (requires Python + requirements): |
| 159 | + |
| 160 | +```bash |
| 161 | +python3 -m venv .venv |
| 162 | +source .venv/bin/activate |
| 163 | +pip install -r docs/requirements.txt |
| 164 | +mkdocs serve -f docs/mkdocs.yml |
| 165 | +``` |
| 166 | + |
| 167 | +- Theme overrides in `docs/overrides/`. |
| 168 | + |
| 169 | +When adding a feature, include a short “Getting Started” snippet and cross-link to relevant concepts under `docs/docs/`. |
| 170 | + |
| 171 | +## External Integrations |
| 172 | + |
| 173 | +- OpenAI: configure API keys via environment or appsettings in sample apps. |
| 174 | +- Ollama: ensure local Ollama server is running and accessible. |
| 175 | + |
| 176 | +OpenAI/Azure OpenAI configuration (samples): |
| 177 | + |
| 178 | +- Environment variables (example): |
| 179 | + |
| 180 | +```bash |
| 181 | +export OPENAI_API_KEY="..." |
| 182 | +export AZURE_OPENAI_ENDPOINT="https://<your-endpoint>.openai.azure.com" |
| 183 | +export AZURE_OPENAI_API_KEY="..." |
| 184 | +``` |
| 185 | + |
| 186 | +- `appsettings.json` keys for Playground: |
| 187 | + |
| 188 | +```json |
| 189 | +{ |
| 190 | + "OpenAI": { |
| 191 | + "ApiKey": "...", |
| 192 | + "Model": "text-embedding-3-large" |
| 193 | + }, |
| 194 | + "AzureOpenAI": { |
| 195 | + "Endpoint": "https://<your-endpoint>.openai.azure.com", |
| 196 | + "ApiKey": "...", |
| 197 | + "Deployment": "text-embedding-3-large" |
| 198 | + }, |
| 199 | + "Ollama": { |
| 200 | + "Endpoint": "http://localhost:11434", |
| 201 | + "Model": "nomic-embed-text" |
| 202 | + } |
| 203 | +} |
| 204 | +``` |
| 205 | + |
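When a sample reads keys from the environment, failing fast on a missing value tends to give a clearer error than a later authentication failure. The sketch below assumes the example variable names shown above; `SettingsSketch` and `RequireEnvironmentVariable` are hypothetical names, and how the sample apps actually load configuration should be confirmed in their source.

```csharp
using System;

// Hypothetical helper for illustration; not part of Build5Nines.SharpVector.
public static class SettingsSketch
{
    // Returns the variable's value, or throws a descriptive error if it is missing or blank.
    public static string RequireEnvironmentVariable(string name)
    {
        string? value = Environment.GetEnvironmentVariable(name);
        if (string.IsNullOrWhiteSpace(value))
            throw new InvalidOperationException($"Environment variable '{name}' is not set.");
        return value;
    }
}
```

A call such as `SettingsSketch.RequireEnvironmentVariable("OPENAI_API_KEY")` would then run before any embeddings client is constructed.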
| 206 | +## Useful Entry Points (Code Navigation) |
| 207 | + |
| 208 | +- `Build5Nines.SharpVector/IVectorDatabase.cs` — core interface for DB operations. |
| 209 | +- `Build5Nines.SharpVector/BasicMemoryVectorDatabase.cs` — easiest reference implementation. |
| 210 | +- `Build5Nines.SharpVector/VectorCompare/*` — similarity algorithms. |
| 211 | +- `Build5Nines.SharpVector.OpenAI/OpenAIMemoryVectorDatabase*.cs` — OpenAI integration. |
| 212 | +- `Build5Nines.SharpVector.Ollama/OllamaMemoryVectorDatabase*.cs` — Ollama integration. |
| 213 | + |
| 214 | +Also see: |
| 215 | +- `Build5Nines.SharpVector/VectorStore/*` — storage mechanics for vector entries. |
| 216 | +- `Build5Nines.SharpVector/Vectorization/*` — local vectorization pipeline. |
| 217 | +- `Build5Nines.SharpVector/Preprocessing/*` — normalization/tokenization steps. |
| 218 | +- `Build5Nines.SharpVector/Vocabulary/*` — vocabulary handling if applicable. |
| 219 | + |
| 220 | +## Common Pitfalls |
| 221 | + |
| 222 | +- Mismatched vector dimensions between the embeddings provider and existing DB entries; validate sizes. |
| 223 | +- Forgetting to persist when using a disk-backed DB implementation; verify the save/load paths. |
| 224 | +- Not disposing resources held by external providers (HTTP clients, etc.) in integration packages. |
| 225 | +- Assuming synchronous search; prefer async variants for large datasets. |
| 230 | + |
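For the dimension pitfall, a small guard run before vectors are stored can surface the problem early, for example after switching embedding models. This is a sketch under assumptions: `DimensionGuard` and `EnsureDimensions` are hypothetical names, and the library may already perform its own checks.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper for illustration; not part of Build5Nines.SharpVector.
public static class DimensionGuard
{
    // Throws if any vector's length differs from the expected dimension,
    // reporting the index of the first offending vector.
    public static void EnsureDimensions(IReadOnlyList<float[]> vectors, int expectedDimensions)
    {
        for (int i = 0; i < vectors.Count; i++)
        {
            if (vectors[i].Length != expectedDimensions)
            {
                throw new ArgumentException(
                    $"Vector {i} has {vectors[i].Length} dimensions; expected {expectedDimensions}.");
            }
        }
    }
}
```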
| 231 | +## Maintainers & Attribution |
| 232 | + |
| 233 | +- Maintained by Build5Nines / Chris Pietschmann. MIT licensed. |
| 234 | +- Follow CODE_OF_CONDUCT.md. Keep PRs focused and well-documented. |
| 235 | + |
| 236 | +--- |
| 237 | + |
| 238 | +If you are an AI agent assisting here: prefer targeted edits, update docs, add tests, and keep public APIs stable. When unsure, propose alternatives and ask for confirmation before broad refactors. |
| 239 | + |
| 240 | +## Contribution & PR Guidance |
| 241 | + |
| 242 | +- Discuss significant changes via issue first; target the `dev` branch. |
| 243 | +- Keep PRs focused and small; include tests and docs updates. |
| 244 | +- Run `dotnet format` if configured to keep style consistent. |
| 245 | +- Ensure CI passes; include clear description and rationale. |