Commit e9ed3aa

Merge pull request #78 from Build5Nines/main
merge main back to dev
2 parents 6553071 + e1bbbd6 commit e9ed3aa

10 files changed

Lines changed: 192 additions & 8 deletions

CHANGELOG.md

Lines changed: 6 additions & 0 deletions
@@ -12,6 +12,12 @@ Add:
 - Added `IVectorTextResultItem.Similarity` and marked `IVectorTextResultItem.VectorComparison` obsolete. `VectorComparison` will be removed in the future.
 - Added more comment metadata to code
 
+## v2.1.2
+
+Fixed:
+
+- Fixed a bug when loading saved database from file/stream where `IntIdGenerator` or `NumericIdGenerator` lose max Id, resulting in adding new texts to database causes existing texts to be overwritten. This specifically affected `SharpVector.OpenAI` and `SharpVector.Ollama` libraries but the fix is implemented within the core `Build5Nines.SharpVector` library.
+
 ## v2.1.1
 
 Add:

README.md

Lines changed: 8 additions & 3 deletions
@@ -1,19 +1,24 @@
-# Build5Nines SharpVector - The lightweight, in-memory, Semantic Search, Text Vector Database for any C# / .NET Applications
+# Build5Nines SharpVector - The lightweight, in-memory, local, Semantic Search, Text Vector Database for any C# / .NET Applications
 
 `Build5Nines.SharpVector` is an in-memory vector database library designed for .NET applications. It allows you to store, search, and manage text data using vector representations. The library is customizable and extensible, enabling support for different vector comparison methods, preprocessing techniques, and vectorization strategies.
 
 [![Release Build](https://github.com/Build5Nines/SharpVector/actions/workflows/build-release.yml/badge.svg)](https://github.com/Build5Nines/SharpVector/actions/workflows/build-release.yml)
 ![Libraries.io dependency status for GitHub repo](https://img.shields.io/librariesio/github/build5nines/sharpvector)
 
-[![NuGet](https://img.shields.io/nuget/v/Build5Nines.SharpVector.svg)](https://www.nuget.org/packages/Build5Nines.SharpVector/)
 [![License: MIT](https://img.shields.io/badge/License-MIT-blue.svg)](LICENSE)
 ![Framework: .NET 8+](https://img.shields.io/badge/framework-.NET%208%2B-blue)
 ![Semantic Search: Enabled](https://img.shields.io/badge/semantic%20search-enabled-purple)
 ![Gen AI: Ready](https://img.shields.io/badge/Gen%20AI-ready-purple)
 
 Vector databases are used with Semantic Search and [Generative AI](https://build5nines.com/what-is-generative-ai/?utm_source=github&utm_medium=sharpvector) solutions augmenting the LLM (Large Language Model) with the ability to load additional context data with the AI prompt using the [RAG (Retrieval-Augmented Generation)](https://build5nines.com/what-is-retrieval-augmented-generation-rag/?utm_source=github&utm_medium=sharpvector) design pattern.
 
-While there are lots of large databases that can be used to build Vector Databases (like Azure CosmosDB, PostgreSQL w/ pgvector, Azure AI Search, Elasticsearch, and more), there are not many options for a lightweight vector database that can be embedded into any .NET application. Build5Nines SharpVector is the lightweight in-memory Text Vector Database for use in any .NET application that you're looking for!
+While there are lots of large databases that can be used to build Vector Databases (like Azure CosmosDB, PostgreSQL w/ pgvector, Azure AI Search, Elasticsearch, and more), there are not many options for a lightweight vector database that can be embedded into any .NET application to provide a local text vector database.
+
+> "For the in-memory vector database, we're using Build5Nines.SharpVector, an excellent open-source project by Chris Pietschmann. SharpVector makes it easy to store and retrieve vectorized data, making it an ideal choice for our sample RAG implementation."
+>
+> [Tulika Chaudharie, Principal Product Manager at Microsoft for Azure App Service](https://azure.github.io/AppService/2024/09/03/Phi3-vector.html)
+
+Build5Nines SharpVector is the lightweight, local, in-memory Text Vector Database for implementing semantic search into any .NET application!
 
 ### [Documentation](https://sharpvector.build5nines.com) | [Get Started](https://sharpvector.build5nines.com/get-started/) | [Samples](https://sharpvector.build5nines.com/samples/)
 

docs/docs/index.md

Lines changed: 0 additions & 1 deletion
@@ -9,7 +9,6 @@ description: The lightweight, in-memory, semantic search, text vector database f
 [![Release Build](https://github.com/Build5Nines/SharpVector/actions/workflows/build-release.yml/badge.svg)](https://github.com/Build5Nines/SharpVector/actions/workflows/build-release.yml)
 ![Libraries.io dependency status for GitHub repo](https://img.shields.io/librariesio/github/build5nines/sharpvector)
 
-[![NuGet](https://img.shields.io/nuget/v/Build5Nines.SharpVector.svg)](https://www.nuget.org/packages/Build5Nines.SharpVector/)
 [![License: MIT](https://img.shields.io/badge/License-MIT-blue.svg)](LICENSE)
 ![Framework: .NET 8+](https://img.shields.io/badge/framework-.NET%208%2B-blue)
 ![Semantic Search: Enabled](https://img.shields.io/badge/semantic%20search-enabled-purple)

src/Build5Nines.SharpVector/Build5Nines.SharpVector.csproj

Lines changed: 1 addition & 1 deletion
@@ -9,7 +9,7 @@
     <PackageId>Build5Nines.SharpVector</PackageId>
     <PackageProjectUrl>https://sharpvector.build5nines.com</PackageProjectUrl>
     <RepositoryUrl>https://github.com/Build5Nines/SharpVector</RepositoryUrl>
-    <Version>2.1.1</Version>
+    <Version>2.1.2</Version>
     <Description>Lightweight In-memory Vector Database to embed in any .NET Applications</Description>
     <Copyright>Copyright (c) 2025 Build5Nines LLC</Copyright>
     <PackageReadmeFile>README.md</PackageReadmeFile>
Lines changed: 15 additions & 0 deletions
@@ -0,0 +1,15 @@
+namespace Build5Nines.SharpVector.Id;
+
+/// <summary>
+/// Interface for ID generators that support setting the most recent generated ID (sequential/numeric style).
+/// </summary>
+/// <typeparam name="TId">The ID type.</typeparam>
+public interface ISequentialIdGenerator<TId> : IIdGenerator<TId>
+    where TId : notnull
+{
+    /// <summary>
+    /// Sets the most recent ID value so the next generated ID will continue the sequence.
+    /// </summary>
+    /// <param name="mostRecentId">The most recently used/generated ID.</param>
+    void SetMostRecent(TId mostRecentId);
+}

src/Build5Nines.SharpVector/Id/NumericIdGenerator.cs

Lines changed: 8 additions & 1 deletion
@@ -1,6 +1,6 @@
 namespace Build5Nines.SharpVector.Id;
 
-public class NumericIdGenerator<TId> : IIdGenerator<TId>
+public class NumericIdGenerator<TId> : ISequentialIdGenerator<TId>
     where TId : struct
 {
     public NumericIdGenerator()
@@ -22,4 +22,11 @@ public TId NewId() {
             return _lastId;
         }
     }
+
+    public void SetMostRecent(TId mostRecentId)
+    {
+        lock(_lock) {
+            _lastId = mostRecentId;
+        }
+    }
 }

src/Build5Nines.SharpVector/MemoryVectorDatabaseBase.cs

Lines changed: 23 additions & 1 deletion
@@ -11,6 +11,7 @@
 using Build5Nines.SharpVector.Embeddings;
 using System.Runtime.ExceptionServices;
 using System.Collections;
+using System.Linq;
 
 namespace Build5Nines.SharpVector;
 
@@ -351,8 +352,18 @@ await DatabaseFile.LoadDatabaseFromZipArchiveAsync(
             async (archive) =>
             {
                 await DatabaseFile.LoadVectorStoreAsync(archive, VectorStore);
-
                 await DatabaseFile.LoadVocabularyStoreAsync(archive, VectorStore.VocabularyStore);
+
+                // Re-initialize the IdGenerator with the max Id value from the VectorStore if it supports sequential numeric IDs
+                if (_idGenerator is ISequentialIdGenerator<TId> seqIdGen)
+                {
+                    // Re-seed the sequence only if there are existing IDs
+                    var ids = VectorStore.GetIds();
+                    if (ids.Any())
+                    {
+                        seqIdGen.SetMostRecent(ids.Max()!);
+                    }
+                }
             }
         );
     }
@@ -708,6 +719,17 @@ await DatabaseFile.LoadDatabaseFromZipArchiveAsync(
             async (archive) =>
             {
                 await DatabaseFile.LoadVectorStoreAsync(archive, VectorStore);
+
+                // Re-initialize the IdGenerator with the max Id value from the VectorStore if it supports sequential numeric IDs
+                if (_idGenerator is ISequentialIdGenerator<TId> seqIdGen)
+                {
+                    // Re-seed the sequence only if there are existing IDs
+                    var ids = VectorStore.GetIds();
+                    if (ids.Any())
+                    {
+                        seqIdGen.SetMostRecent(ids.Max()!);
+                    }
+                }
             }
         );
     }

src/SharpVectorOpenAITest/BasicOpenAIMemoryVectorDatabaseTest.cs

Lines changed: 95 additions & 0 deletions
@@ -7,6 +7,9 @@
 using System.Threading;
 using System.Threading.Tasks;
 using System.Collections.Generic;
+using System.ClientModel.Primitives;
+using System.IO;
+using System;
 
 namespace Build5Nines.SharpVector.OpenAI.Tests
 {
@@ -20,9 +23,49 @@ public class BasicMemoryVectorDatabaseTest
         public void Setup()
         {
             _mockEmbeddingClient = new Mock<EmbeddingClient>();
+
+            // Mock the OpenAI EmbeddingClient to return a deterministic embedding vector
+            // GenerateEmbeddingAsync(string input, EmbeddingGenerationOptions? options = null, CancellationToken cancellationToken = default)
+            // returns ClientResult<OpenAIEmbedding>. We create one using the Model Factory helpers.
+            var embeddingVector = new float[] { 0.1f, 0.2f, 0.3f }; // small deterministic vector for tests
+            var openAiEmbedding = OpenAIEmbeddingsModelFactory.OpenAIEmbedding(index: 0, vector: embeddingVector);
+            // Create minimal concrete PipelineResponse implementation to satisfy ClientResult.FromValue without relying on Moq for abstract type
+            var response = new TestPipelineResponse();
+            var clientResult = ClientResult.FromValue(openAiEmbedding, response);
+
+            _mockEmbeddingClient
+                .Setup(c => c.GenerateEmbeddingAsync(
+                    It.IsAny<string>(),
+                    It.IsAny<EmbeddingGenerationOptions?>(),
+                    It.IsAny<CancellationToken>()))
+                .ReturnsAsync(clientResult);
+
             _database = new BasicOpenAIMemoryVectorDatabase(_mockEmbeddingClient.Object);
         }
 
+        // Minimal headers implementation for TestPipelineResponse
+        internal class EmptyPipelineResponseHeaders : PipelineResponseHeaders
+        {
+            public override IEnumerator<KeyValuePair<string, string>> GetEnumerator() => (new List<KeyValuePair<string,string>>()).GetEnumerator();
+            public override bool TryGetValue(string name, out string? value) { value = null; return false; }
+            public override bool TryGetValues(string name, out IEnumerable<string>? values) { values = null; return false; }
+        }
+
+        // Minimal PipelineResponse implementation
+        internal class TestPipelineResponse : PipelineResponse
+        {
+            private Stream? _contentStream = Stream.Null;
+            private readonly EmptyPipelineResponseHeaders _headers = new EmptyPipelineResponseHeaders();
+            public override int Status => 200;
+            public override string ReasonPhrase => "OK";
+            public override Stream? ContentStream { get => _contentStream; set => _contentStream = value; }
+            protected override PipelineResponseHeaders HeadersCore => _headers;
+            public override BinaryData Content => BinaryData.FromBytes(Array.Empty<byte>());
+            public override BinaryData BufferContent(CancellationToken cancellationToken = default) => Content;
+            public override ValueTask<BinaryData> BufferContentAsync(CancellationToken cancellationToken = default) => ValueTask.FromResult(Content);
+            public override void Dispose() { _contentStream?.Dispose(); }
+        }
+
         [TestMethod]
         public void TestInitialization()
         {
@@ -40,5 +83,57 @@ public async Task Test_SaveLoad_01()
             await _database.LoadFromFileAsync(filename);
         }
 
+        [TestMethod]
+        public async Task Test_SaveLoad_TestIds_01()
+        {
+            _database.AddText("Sample text for testing IDs.", "111");
+            _database.AddText("Another sample text for testing IDs.", "222");
+
+            var results = _database.Search("testing IDs");
+            Assert.AreEqual(2, results.Texts.Count());
+
+            var filename = "openai_test_saveload_testids_01.b59vdb";
+#pragma warning disable CS8604 // Possible null reference argument.
+            await _database.SaveToFileAsync(filename);
+#pragma warning restore CS8604 // Possible null reference argument.
+
+            await _database.LoadFromFileAsync(filename);
+
+            _database.AddText("A new text after loading to check ID assignment.", "333");
+
+            var newResults = _database.Search("testing IDs");
+            Assert.AreEqual(3, newResults.Texts.Count());
+            var texts = newResults.Texts.OrderBy(x => x.Metadata).ToArray();
+            Assert.AreEqual("111", texts[0].Metadata);
+            Assert.AreEqual("222", texts[1].Metadata);
+            Assert.AreEqual("333", texts[2].Metadata);
+        }
+
+        [TestMethod]
+        public async Task Test_SaveLoad_TestIds_02()
+        {
+            _database.AddText("Sample text for testing IDs.", "111");
+            _database.AddText("Another sample text for testing IDs.", "222");
+
+            var results = _database.Search("testing IDs");
+            Assert.AreEqual(2, results.Texts.Count());
+
+            var filename = "openai_test_saveload_testids_02.b59vdb";
+#pragma warning disable CS8604 // Possible null reference argument.
+            await _database.SaveToFileAsync(filename);
+#pragma warning restore CS8604 // Possible null reference argument.
+
+            var newdb = new BasicOpenAIMemoryVectorDatabase(_mockEmbeddingClient.Object);
+            await newdb.LoadFromFileAsync(filename);
+
+            newdb.AddText("A new text after loading to check ID assignment.", "333");
+
+            var newResults = newdb.Search("testing IDs");
+            Assert.AreEqual(3, newResults.Texts.Count());
+            var texts = newResults.Texts.OrderBy(x => x.Metadata).ToArray();
+            Assert.AreEqual("111", texts[0].Metadata);
+            Assert.AreEqual("222", texts[1].Metadata);
+            Assert.AreEqual("333", texts[2].Metadata);
+        }
     }
 }

src/SharpVectorOpenAITest/SharpVectorOpenAITest.csproj

Lines changed: 2 additions & 1 deletion
@@ -10,7 +10,7 @@
   </PropertyGroup>
 
   <ItemGroup>
-    <PackageReference Include="Build5Nines.SharpVector" Version="[2.0.3,3.0.0)" />
+    <!-- <PackageReference Include="Build5Nines.SharpVector" Version="[2.0.3,3.0.0)" /> -->
     <PackageReference Include="coverlet.collector" Version="6.0.0" />
     <PackageReference Include="Microsoft.NET.Test.Sdk" Version="17.8.0" />
     <PackageReference Include="Moq" Version="4.20.72" />
@@ -25,6 +25,7 @@
 
   <ItemGroup>
     <ProjectReference Include="..\Build5Nines.SharpVector.OpenAI\Build5Nines.SharpVector.OpenAI.csproj" />
+    <ProjectReference Include="..\Build5Nines.SharpVector\Build5Nines.SharpVector.csproj" />
   </ItemGroup>
 
 </Project>

src/SharpVectorTest/VectorDatabaseTests.cs

Lines changed: 34 additions & 0 deletions
@@ -174,6 +174,40 @@ public void BasicMemoryVectorDatabase_SaveLoad_01()
         Assert.AreEqual(0.3396831452846527, results.Texts.First().Similarity);
     }
 
+    [TestMethod]
+    public void BasicMemoryVectorDatabase_SaveLoad_TestIds()
+    {
+        var vdb = new BasicMemoryVectorDatabase();
+
+        // // Load Vector Database with some sample text
+        vdb.AddText("The Lion King is a 1994 Disney animated film about a young lion cub named Simba who is the heir to the throne of an African savanna.", "First");
+        vdb.AddText("Build5Nines is awesome!", "Second");
+        var results = vdb.Search("Lion King");
+
+        Assert.AreEqual(2, results.Texts.Count());
+
+        var filename = "BasicMemoryVectorDatabase_SaveLoad_TestIds.b59vdb";
+        vdb.SaveToFile(filename);
+
+        var newvdb = new BasicMemoryVectorDatabase();
+        newvdb.LoadFromFile(filename);
+
+        // Add a new text entry after loading
+        // This should get the next available ID (3) and not overwrite existing entries
+        newvdb.AddText("A new string that should be added, not replacing existing one.", "Third");
+
+        results = newvdb.Search("Lion King");
+
+        Assert.AreEqual(3, results.Texts.Count());
+        var listOfTexts = results.Texts.OrderBy(x => x.Id).ToArray();
+        Assert.AreEqual(listOfTexts[0].Id, 1);
+        Assert.AreEqual(listOfTexts[0].Metadata, "First");
+        Assert.AreEqual(listOfTexts[1].Id, 2);
+        Assert.AreEqual(listOfTexts[1].Metadata, "Second");
+        Assert.AreEqual(listOfTexts[2].Id, 3);
+        Assert.AreEqual(listOfTexts[2].Metadata, "Third");
+    }
+
     [TestMethod]
     public async Task BasicMemoryVectorDatabase_SaveLoadBinaryStreamAsync_01()
     {
