# Project Guide for AI Coding Agents (Copilot)

This document gives AI coding assistants (such as GitHub Copilot) the essential context for working effectively in the SharpVector repo: an architecture overview, key entry points, documentation locations, conventions, and safe practices for adding features or fixing bugs.

## Overview

- Primary package: `Build5Nines.SharpVector` (NuGet: `Build5Nines.SharpVector`), targeting .NET 8+.
- Optional integrations:
  - `Build5Nines.SharpVector.OpenAI`: embeddings via OpenAI/Azure OpenAI.
  - `Build5Nines.SharpVector.Ollama`: embeddings via a local Ollama server.
- Playground and samples are provided for demos and manual testing.
- Branches: active development occurs on `dev`; confirm before making broad changes.
- CI: the GitHub Actions workflow `build-release.yml` builds and releases the NuGet packages.

## Documentation Locations

- Public docs site sources: `docs/` (MkDocs)
  - Index: `docs/docs/index.md`
  - Get Started: `docs/docs/get-started/`
  - Concepts, Persistence, Text Chunking, Samples, etc. live under `docs/docs/`.
- Root README: `README.md` (high-level intro and NuGet info).
- Project-specific docs inside `src/`:
  - `src/Build5Nines.SharpVector/docs/`: internal docs snippets.

When adding features, update both the code and the related docs (MkDocs under `docs/docs/...`). Keep docs concise, with examples and cross-links.

## Repository Layout

- `src/SharpVector.sln`: solution file.
- `src/Build5Nines.SharpVector/`: core library
  - Embeddings interfaces: `Embeddings/IEmbeddingsGenerator.cs`, `Embeddings/IBatchEmbeddingsGenerator.cs`
  - Core DB abstractions:
    - `IVectorDatabase.cs`: main interface.
    - `VectorDatabaseBase.cs`: common logic.
    - `MemoryVectorDatabaseBase.cs`, `MemoryVectorDatabase.cs`, `BasicMemoryVectorDatabase.cs`: in-memory implementations.
  - Disk persistence: `BasicDiskMemoryVectorDatabaseBase.cs`, `BasicDiskVectorDatabase.cs`, `DatabaseFile.cs`, `DatabaseInfo.cs`
  - Vector comparison (search metrics): `VectorCompare/`
    - `IVectorComparer.cs`
    - `CosineSimilarityVectorComparerAsync.cs` (default)
    - `EuclideanDistanceVectorComparerAsync.cs`
  - Preprocessing and vectorization pipeline: `Preprocessing/`, `Vectorization/`, `Vocabulary/`, `VectorStore/`, `Id/`
  - Extensions: `IVectorDatabaseExtensions.cs`
- `src/Build5Nines.SharpVector.OpenAI/`: OpenAI embeddings
  - `Embeddings/OpenAIEmbeddingsGenerator.cs`
  - Memory DB wrappers using OpenAI: `OpenAIMemoryVectorDatabase*.cs`
- `src/Build5Nines.SharpVector.Ollama/`: Ollama embeddings
  - `Embeddings/OllamaEmbeddingsGenerator.cs`
  - Memory DB wrappers using Ollama: `OllamaMemoryVectorDatabase*.cs`
- Playground and samples:
  - `src/Build5Nines.SharpVector.Playground/`: demo app, configurable via `appsettings.json`
  - `samples/` and the `src/*ConsoleTest` and `*Test` projects: usage examples and tests.

## Typical Usage (Core Library)

- Create a DB: `var vdb = new BasicMemoryVectorDatabase();`
- Add text: `vdb.AddText("some text", metadata);` (sync and async variants)
- Search: `var results = vdb.Search("query text");` (uses cosine similarity by default)
- Custom embeddings: provide your own `IEmbeddingsGenerator` or use the OpenAI/Ollama packages.
- Change the comparison metric: supply an `IVectorComparer` (e.g., Euclidean distance) to the DB.

Minimal example:

```csharp
using Build5Nines.SharpVector;

// In-memory database with default settings (cosine similarity for search)
var vdb = new BasicMemoryVectorDatabase();
// Store a text entry together with a metadata value
vdb.AddText("Hello SharpVector", metadata: "sample");
// Return stored entries similar to the query text
var results = vdb.Search("Hello");
```

## Key Design Concepts

- In-memory first: the default DB stores vectors in memory for speed. Disk-backed options exist for persistence.
- Pluggable pipeline:
  - Embeddings generation: interfaces allow external providers.
  - Preprocessing: text normalization/tokenization, configurable under `Preprocessing/`.
  - Vector comparison: swappable similarity metrics via `IVectorComparer`.
- Metadata support: store arbitrary metadata alongside each text entry.
- Async support: async APIs exist for scalable operations.
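
The "swappable similarity metric" idea can be shown with a small, self-contained sketch of the two metrics this repo ships in `VectorCompare/`: cosine similarity and Euclidean distance. Each is written as a plain local function over `ReadOnlySpan<float>`. This sketch does not use SharpVector's `IVectorComparer`. The function names are hypothetical, and the real comparer classes may differ in signature, async shape, and edge-case handling. The two metrics point in opposite directions: a higher cosine similarity means closer, while a lower Euclidean distance means closer. A comparer therefore also has to define how results are sorted.

```csharp
using System;

// Hypothetical helpers illustrating the two metrics; not the SharpVector IVectorComparer API.
static float CosineSimilarity(ReadOnlySpan<float> a, ReadOnlySpan<float> b)
{
    if (a.Length != b.Length)
        throw new ArgumentException("Vectors must have the same dimension.");

    // Single pass: accumulate the dot product and both squared norms without allocating.
    float dot = 0f, normA = 0f, normB = 0f;
    for (int i = 0; i < a.Length; i++)
    {
        dot += a[i] * b[i];
        normA += a[i] * a[i];
        normB += b[i] * b[i];
    }

    // Treat a zero vector as having no similarity to anything.
    if (normA == 0f || normB == 0f)
        return 0f;

    return dot / (MathF.Sqrt(normA) * MathF.Sqrt(normB));
}

static float EuclideanDistance(ReadOnlySpan<float> a, ReadOnlySpan<float> b)
{
    if (a.Length != b.Length)
        throw new ArgumentException("Vectors must have the same dimension.");

    float sum = 0f;
    for (int i = 0; i < a.Length; i++)
    {
        float diff = a[i] - b[i];
        sum += diff * diff;
    }
    return MathF.Sqrt(sum);
}

float[] query = { 1f, 0f };
float[] near = { 0.9f, 0.1f };
float[] far = { 0f, 1f };

Console.WriteLine(CosineSimilarity(query, near) > CosineSimilarity(query, far));   // True: higher is closer
Console.WriteLine(EuclideanDistance(query, near) < EuclideanDistance(query, far)); // True: lower is closer
```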

## Conventions & Coding Guidelines

- Language/runtime: C#, .NET 8+. Use async/await where appropriate.
- Style: match existing patterns. Avoid wide refactors; make minimal, focused changes.
- Naming: prefer descriptive names; avoid single-letter variables.
- Comments: keep code self-explanatory; add inline comments only where they aid clarity.
- Errors/exceptions: use specific exception types, such as `DatabaseFileException`, where applicable.
- Tests: when adding or altering behavior, include or update tests in the `src/*Test` projects.
- API stability: prefer additive changes; avoid breaking public types/methods.
- Nullability: follow existing project settings and respect each project's nullable context.
- Performance: avoid allocations in tight loops; prefer spans/arrays where safe.

## How to Add Features Safely

1. Identify the extension point:
   - New similarity metric → implement `IVectorComparer` under `src/Build5Nines.SharpVector/VectorCompare/` and wire it in via constructor/config.
   - New embeddings provider → implement `IEmbeddingsGenerator` (and optionally `IBatchEmbeddingsGenerator`), either in a new package or in the existing OpenAI/Ollama packages.
   - Persistence enhancement → extend `BasicDiskMemoryVectorDatabaseBase` and update `DatabaseFile` handling.
2. Keep public APIs stable; prefer additive changes.
3. Update docs in `docs/docs/...` with a short “Getting Started” section and an example.
4. Run the tests and add focused unit tests for new logic.
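
Example wiring for a custom comparer:

```csharp
// Assumption: verify that the DB type accepts a `vectorComparer` argument; the comparer may instead be a generic type parameter.
var comparer = new EuclideanDistanceVectorComparerAsync();
var vdb = new BasicMemoryVectorDatabase(vectorComparer: comparer);
```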
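
For the "new embeddings provider" extension point, a deterministic in-process generator is useful in unit tests because it needs no network access or API key. The sketch below is a toy embedder based on the hashing trick, written as plain local functions. It is not SharpVector's `IEmbeddingsGenerator`/`IBatchEmbeddingsGenerator` contract. The names `Embed`, `GenerateEmbeddingsAsync` and `GenerateBatchEmbeddingsAsync`, and the 64-dimension size, are assumptions for illustration. Check the real interface signatures before adapting it.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

// Toy embedder (hashing trick): each lowercase, space-separated token increments one of
// `dimension` buckets. FNV-1a is used instead of string.GetHashCode(), which is randomized
// per process in .NET, so vectors are stable across runs.
static float[] Embed(string text, int dimension)
{
    var vector = new float[dimension];
    foreach (var token in text.ToLowerInvariant().Split(' ', StringSplitOptions.RemoveEmptyEntries))
    {
        uint hash = 2166136261;
        foreach (char c in token)
        {
            hash ^= c;
            hash *= 16777619;
        }
        vector[(int)(hash % (uint)dimension)] += 1f;
    }
    return vector;
}

// Assumed single-text shape; check the real IEmbeddingsGenerator signature.
static Task<float[]> GenerateEmbeddingsAsync(string text) =>
    Task.FromResult(Embed(text, 64));

// Assumed batch shape; results keep the input order.
static async Task<IEnumerable<float[]>> GenerateBatchEmbeddingsAsync(IEnumerable<string> texts)
{
    var results = new List<float[]>();
    foreach (var text in texts)
        results.Add(await GenerateEmbeddingsAsync(text));
    return results;
}

var single = await GenerateEmbeddingsAsync("Hello SharpVector");
var batch = (await GenerateBatchEmbeddingsAsync(new[] { "Hello SharpVector", "Goodbye" })).ToList();
Console.WriteLine(single.Length);                  // 64
Console.WriteLine(single.SequenceEqual(batch[0])); // True
```

Because the output is deterministic, tests can assert exact vectors and similarity orderings without mocking HTTP calls.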

## Bug Fix Workflow

- Reproduce the issue with a minimal sample (Playground or a `*ConsoleTest` project).
- Locate the source by interface:
  - Insert/search issues → `MemoryVectorDatabase*`, `VectorStore/`, `Vectorization/`.
  - Similarity result issues → `VectorCompare/*`.
  - Embeddings/provider issues → `Embeddings/*` and the OpenAI/Ollama projects.
- Add small unit tests, or use the BenchmarkDotNet samples for performance-sensitive changes.
- Keep changes minimal; do not alter unrelated behavior.

When fixing bugs, add a regression test under the closest `*Test` project and keep the scope tight.

## Performance Notes

- Benchmark artifacts: `BenchmarkDotNet.Artifacts/` and `src/BenchmarkDotNet.Artifacts/` contain previous performance runs.
- Optimize critical paths:
  - Avoid unnecessary allocations in comparison loops.
  - Prefer span/array operations where safe.
  - Batch embeddings requests when the provider supports it.
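
Batching usually means splitting the input texts into fixed-size groups and issuing one provider call per group instead of one call per text. Below is a minimal sketch using `Enumerable.Chunk` (available since .NET 6). The batch size of 16 and the `embedBatchAsync` callback are hypothetical stand-ins for whatever batch call a provider actually exposes. Real providers also impose their own request-size limits.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

// Embeds `texts` in groups of `batchSize`, making one (hypothetical) provider call per group.
// Results are returned in the same order as the input.
static async Task<List<float[]>> EmbedInBatchesAsync(
    IReadOnlyList<string> texts,
    int batchSize,
    Func<string[], Task<float[][]>> embedBatchAsync)
{
    if (batchSize <= 0)
        throw new ArgumentOutOfRangeException(nameof(batchSize));

    var results = new List<float[]>(texts.Count);
    foreach (string[] batch in texts.Chunk(batchSize))
    {
        float[][] batchVectors = await embedBatchAsync(batch);
        if (batchVectors.Length != batch.Length)
            throw new InvalidOperationException("Provider returned a different number of vectors than inputs.");
        results.AddRange(batchVectors);
    }
    return results;
}

// Usage with a fake provider that counts calls and returns a 1-dimensional vector per text.
int calls = 0;
Task<float[][]> FakeProvider(string[] inputs)
{
    calls++;
    return Task.FromResult(inputs.Select(t => new float[] { t.Length }).ToArray());
}

var inputTexts = Enumerable.Range(0, 40).Select(i => $"text {i}").ToList();
var embedded = await EmbedInBatchesAsync(inputTexts, 16, FakeProvider);
Console.WriteLine($"{embedded.Count} vectors in {calls} calls"); // 40 vectors in 3 calls
```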

## Build & Run

- Build the solution:

```bash
dotnet build src/SharpVector.sln -c Release
```

- Run the Playground:

```bash
dotnet run --project src/Build5Nines.SharpVector.Playground -c Debug
```

- Run tests (adjust if the test projects differ):

```bash
dotnet test src/SharpVector.sln
```

Common test projects (names may vary):

- `src/SharpVectorTest/`: unit tests for the core library.
- `src/SharpVectorPerformance/`: benchmarks; see `BenchmarkDotNet.Artifacts/`.

## Docs Authoring (MkDocs)

- MkDocs config: `docs/mkdocs.yml`
- Local preview (requires Python and the docs requirements):

```bash
python3 -m venv .venv
source .venv/bin/activate
pip install -r docs/requirements.txt
mkdocs serve -f docs/mkdocs.yml
```

- Theme overrides live in `docs/overrides/`.

When adding a feature, include a short “Getting Started” snippet and cross-link to relevant concepts under `docs/docs/`.

## External Integrations

- OpenAI: configure API keys via environment variables or `appsettings.json` in the sample apps.
- Ollama: ensure a local Ollama server is running and accessible.

OpenAI/Azure OpenAI configuration (samples):

- Environment variables (example):

```bash
export OPENAI_API_KEY="..."
export AZURE_OPENAI_ENDPOINT="https://<your-endpoint>.openai.azure.com"
export AZURE_OPENAI_API_KEY="..."
```

- `appsettings.json` keys for the Playground:

```json
{
  "OpenAI": {
    "ApiKey": "...",
    "Model": "text-embedding-3-large"
  },
  "AzureOpenAI": {
    "Endpoint": "https://<your-endpoint>.openai.azure.com",
    "ApiKey": "...",
    "Deployment": "text-embedding-3-large"
  },
  "Ollama": {
    "Endpoint": "http://localhost:11434",
    "Model": "nomic-embed-text"
  }
}
```
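
Below is a standard-library-only sketch of resolving a setting, where the environment variable takes precedence over a value read from `appsettings.json`. Both the precedence order and the `ResolveSetting` helper are assumptions for illustration. The `OLLAMA_ENDPOINT` variable name is also hypothetical. The sample apps may load configuration differently (for example through `Microsoft.Extensions.Configuration`), so check them before relying on this order.

```csharp
using System;

// Hypothetical helper: prefer a non-empty environment variable, else the value read from appsettings.json.
// Throws if neither source provides a value, so a missing key fails fast instead of at the first API call.
static string ResolveSetting(string environmentVariable, string? appSettingsValue)
{
    string? fromEnvironment = Environment.GetEnvironmentVariable(environmentVariable);
    if (!string.IsNullOrWhiteSpace(fromEnvironment))
        return fromEnvironment;

    if (!string.IsNullOrWhiteSpace(appSettingsValue))
        return appSettingsValue;

    throw new InvalidOperationException(
        $"Set the {environmentVariable} environment variable or the matching appsettings.json key.");
}

// Usage: falls back to the Ollama endpoint value from the appsettings.json example.
// OLLAMA_ENDPOINT is a hypothetical variable name used only for this illustration.
string ollamaEndpoint = ResolveSetting("OLLAMA_ENDPOINT", "http://localhost:11434");
Console.WriteLine(ollamaEndpoint);
```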

## Useful Entry Points (Code Navigation)

- `Build5Nines.SharpVector/IVectorDatabase.cs`: core interface for DB operations.
- `Build5Nines.SharpVector/BasicMemoryVectorDatabase.cs`: the simplest reference implementation.
- `Build5Nines.SharpVector/VectorCompare/*`: similarity algorithms.
- `Build5Nines.SharpVector.OpenAI/OpenAIMemoryVectorDatabase*.cs`: OpenAI integration.
- `Build5Nines.SharpVector.Ollama/OllamaMemoryVectorDatabase*.cs`: Ollama integration.

Also see:

- `Build5Nines.SharpVector/VectorStore/*`: storage mechanics for vector entries.
- `Build5Nines.SharpVector/Vectorization/*`: local vectorization pipeline.
- `Build5Nines.SharpVector/Preprocessing/*`: normalization/tokenization steps.
- `Build5Nines.SharpVector/Vocabulary/*`: vocabulary handling, where applicable.
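
To orient yourself in `Preprocessing/`, `Vocabulary/` and `Vectorization/`, it helps to know the general shape of a local pipeline (one with no external provider). It works in three steps:

1. Normalize and tokenize the text.
2. Map each token to a vocabulary index.
3. Count occurrences into a vector.

The sketch below is a generic bag-of-words illustration under that assumption. It is not copied from SharpVector, and the library's actual preprocessing rules, vocabulary storage, and vector layout may differ.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Step 1 (preprocessing): lowercase and split on any character that is not a letter or digit.
static List<string> Tokenize(string text)
{
    var tokens = new List<string>();
    var current = new StringBuilder();
    foreach (char c in text.ToLowerInvariant())
    {
        if (char.IsLetterOrDigit(c))
        {
            current.Append(c);
        }
        else if (current.Length > 0)
        {
            tokens.Add(current.ToString());
            current.Clear();
        }
    }
    if (current.Length > 0)
        tokens.Add(current.ToString());
    return tokens;
}

// Step 2 (vocabulary): assign each new token the next free index.
var vocabulary = new Dictionary<string, int>();
void AddToVocabulary(IEnumerable<string> tokens)
{
    foreach (var token in tokens)
        if (!vocabulary.ContainsKey(token))
            vocabulary[token] = vocabulary.Count;
}

// Step 3 (vectorization): count occurrences of each vocabulary token; unknown tokens are ignored.
float[] Vectorize(IEnumerable<string> tokens)
{
    var vector = new float[vocabulary.Count];
    foreach (var token in tokens)
        if (vocabulary.TryGetValue(token, out int index))
            vector[index] += 1f;
    return vector;
}

var documents = new[] { "Hello SharpVector!", "Hello, hello world." };
foreach (var doc in documents)
    AddToVocabulary(Tokenize(doc));

Console.WriteLine(vocabulary.Count);                                     // 3
Console.WriteLine(string.Join(", ", Vectorize(Tokenize(documents[1])))); // 2, 0, 1
```

In a pipeline of this shape, the vector length grows with the vocabulary. This is one reason to keep the vectorization and vocabulary code paths in sync when changing either one.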

## Common Pitfalls

- Mismatched vector dimensions between the embeddings provider and the DB entries; validate sizes.
- Forgetting to persist when using disk-backed DB implementations; verify the save/load paths.
- Not disposing resources held by external providers (e.g., HTTP clients in the integration packages).
- Assuming synchronous search; prefer the async variants for large datasets.
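
A cheap guard against the dimension-mismatch pitfall is to record the dimension of the first stored vector and reject any later vector of a different length. That way, a provider or model switch fails loudly instead of producing meaningless similarity scores. The sketch below is minimal: the `EnsureDimension` helper and its error message are hypothetical and not part of SharpVector.

```csharp
using System;

// Hypothetical guard: the first vector fixes the expected dimension; later mismatches throw.
int? expectedDimension = null;

void EnsureDimension(float[] vector)
{
    if (expectedDimension is null)
    {
        expectedDimension = vector.Length;
    }
    else if (vector.Length != expectedDimension)
    {
        throw new ArgumentException(
            $"Expected a {expectedDimension}-dimension vector but got {vector.Length}. " +
            "Were the embeddings generated with a different model or provider?");
    }
}

EnsureDimension(new float[1536]); // the first vector sets the expected dimension
try
{
    EnsureDimension(new float[768]); // e.g., a vector from a different embeddings model
}
catch (ArgumentException ex)
{
    Console.WriteLine(ex.Message);
}
```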

## Maintainers & Attribution

- Maintained by Build5Nines / Chris Pietschmann. MIT licensed.
- Follow `CODE_OF_CONDUCT.md`. Keep PRs focused and well-documented.

---

If you are an AI agent assisting here: prefer targeted edits, update docs, add tests, and keep public APIs stable. When unsure, propose alternatives and ask for confirmation before broad refactors.

## Contribution & PR Guidance

- Discuss significant changes in an issue first; target the `dev` branch.
- Keep PRs small and focused; include tests and docs updates.
- Run `dotnet format` if it is configured, to keep style consistent.
- Ensure CI passes; include a clear description and rationale.
