Add Multi2VecGoogleGemini vectorizer configuration #297
Update README
- Added Multi2VecGoogleGemini class with apiEndpoint support
- ApiEndpoint defaults to generativelanguage.googleapis.com
- Does not require location and project_id (unlike Multi2VecGoogle for Vertex AI)
- Added factory methods for Multi2VecGoogleGemini configuration
- Supports image, text, and video fields with optional weights

Fixes #296
    TextFields = textFields,
    VideoFields = videoFields,
    VideoIntervalSeconds = videoIntervalSeconds,
    ModelId = model,
Let's use model like in all other modules. I have added support for that field name with a fallback to modelId.
Yes, the argument is already called model, the property should follow. It's still serialized to JSON as modelId.
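To illustrate the point about keeping the property name and the wire name separate, here is a minimal sketch. It assumes System.Text.Json attributes are used for serialization; `GeminiConfigSketch` and `GeminiWireNameDemo` are hypothetical stand-ins written for this comment, not the client's real types.

```csharp
#nullable enable
using System;
using System.Text.Json;
using System.Text.Json.Serialization;

// Hypothetical stand-in for the vectorizer config record (not the real client type).
public sealed record GeminiConfigSketch
{
    // The C# property is called Model, but the JSON key stays "modelId".
    [JsonPropertyName("modelId")]
    public string? Model { get; init; }

    [JsonPropertyName("apiEndpoint")]
    public string? ApiEndpoint { get; init; }
}

public static class GeminiWireNameDemo
{
    // Serializes the sketch with default System.Text.Json options.
    public static string Serialize(GeminiConfigSketch config) =>
        JsonSerializer.Serialize(config);

    public static void Main()
    {
        var config = new GeminiConfigSketch
        {
            Model = "example-model", // placeholder value, not a real model name
            ApiEndpoint = "generativelanguage.googleapis.com",
        };

        // The printed JSON uses the key "modelId", not "Model".
        Console.WriteLine(Serialize(config));
    }
}
```

With this shape, renaming the property affects only C# callers; the serialized payload keeps the same key.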
Summary - Weaviate C# Client Coverage
Weaviate.Client - 61.3%
Weaviate.Client.Analyzers - 91.1%
    VideoIntervalSeconds = videoIntervalSeconds,
    ModelId = model,
    Dimensions = dimensions,
    VectorizeCollectionName = vectorizeCollectionName,
This setting is not applicable to multi2vec modules, so you can remove it.
Do you mean vectorizeCollectionName?
Pull request overview
This PR adds support for the multi2vec-google-gemini vectorizer configuration, allowing the use of Google's Gemini API for multi-modal (image, text, video) embeddings. Unlike Multi2VecGoogle (which targets Vertex AI and requires ProjectId and Location), the new Multi2VecGoogleGemini targets Google AI Studio and only needs an optional apiEndpoint (defaulting to generativelanguage.googleapis.com). The PR also updates the README to remove beta warnings and update documentation URLs to production.
Changes:
- Added `Multi2VecGoogleGemini` record class in `Vectorizer.cs` with `[Vectorizer("multi2vec-google-gemini")]` attribute and all relevant properties
- Added two `Multi2VecGoogleGemini` factory method overloads in `VectorizerFactory.cs` (one with `WeightedFields`, one with `string[]?`)
- README updated: removed beta warning, updated documentation URLs from staging to production, consolidated feedback section
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| src/Weaviate.Client/Models/Vectorizer.cs | Adds the Multi2VecGoogleGemini record with all config properties |
| src/Weaviate.Client/Configure/VectorizerFactory.cs | Adds two factory method overloads for creating Multi2VecGoogleGemini configs |
| memory/MEMORY.md | AI agent session artifact unintentionally committed to the repo |
| README.md | Removes beta disclaimer, updates docs/quickstart links to production, consolidates the Community/Feedback section |
    public VectorizerConfig Multi2VecGoogleGemini(
        WeightedFields imageFields,
        WeightedFields textFields,
        WeightedFields videoFields,
        string? apiEndpoint = null,
        int? videoIntervalSeconds = null,
        string? model = null,
        int? dimensions = null,
        bool? vectorizeCollectionName = null
    ) =>
        new Multi2VecGoogleGemini
        {
            ApiEndpoint = apiEndpoint ?? "generativelanguage.googleapis.com",
            ImageFields = imageFields,
            TextFields = textFields,
            VideoFields = videoFields,
            VideoIntervalSeconds = videoIntervalSeconds,
            ModelId = model,
            Dimensions = dimensions,
            VectorizeCollectionName = vectorizeCollectionName,
            Weights = VectorizerWeights.FromWeightedFields(imageFields, textFields, videoFields),
        };

    /// <summary>
    /// Multi2Vec Google Gemini configuration (using Google AI Studio/Gemini API)
    /// </summary>
    /// <param name="imageFields">The image fields</param>
    /// <param name="textFields">The text fields</param>
    /// <param name="videoFields">The video fields</param>
    /// <param name="apiEndpoint">The API endpoint</param>
    /// <param name="videoIntervalSeconds">The video interval seconds</param>
    /// <param name="model">The model</param>
    /// <param name="dimensions">The dimensions</param>
    /// <param name="vectorizeCollectionName">The vectorize collection name</param>
    /// <returns>The vectorizer config</returns>
    public VectorizerConfig Multi2VecGoogleGemini(
        string[]? imageFields = null,
        string[]? textFields = null,
        string[]? videoFields = null,
        string? apiEndpoint = null,
        int? videoIntervalSeconds = null,
        string? model = null,
        int? dimensions = null,
        bool? vectorizeCollectionName = null
    ) =>
        new Multi2VecGoogleGemini
        {
            ApiEndpoint = apiEndpoint ?? "generativelanguage.googleapis.com",
            ImageFields = imageFields,
            TextFields = textFields,
            VideoFields = videoFields,
            VideoIntervalSeconds = videoIntervalSeconds,
            ModelId = model,
            Dimensions = dimensions,
            VectorizeCollectionName = vectorizeCollectionName,
        };
The two new public factory method overloads Multi2VecGoogleGemini(WeightedFields, ...) and Multi2VecGoogleGemini(string[]?, ...) are not registered in src/Weaviate.Client/PublicAPI.Unshipped.txt. The project uses a Roslyn API analyzer (Roslyn Analyzers RS0016) that enforces this file, as evidenced by all other public factory methods being listed there (e.g., Multi2VecGoogle, Multi2VecVoyageAI, Multi2VecJinaAI). The build will fail unless these entries are added to PublicAPI.Unshipped.txt.
memory/MEMORY.md
Outdated
    # Project Memory: weaviate/csharp-client

    ## Workflow Preferences
    - **Always pause for code review before committing.** Stage changes, show a diff summary, and wait for user approval before running `git commit`.
    - PRs target the current branch (`v1.0.1`) as base, not `main`.
    - All work in a session creates PRs that merge to the current branch.
    - **Do NOT commit intermediate docs or plans** (design docs, plan files). Only commit code changes.

    ## Project Structure
    - `src/Weaviate.Client/` — main library
    - `src/Weaviate.Client.Tests/` — tests (Unit/ and Integration/)
    - DTOs are auto-generated via NSwag from `Rest/Schema/openapi.json` → `Rest/Dto/Models.g.cs`
    - Public API surface tracked in `PublicAPI.Unshipped.txt` (Roslyn analyzer enforces this)
    - Pre-commit hooks: dotnet build + CSharpier formatting

    ## Key Patterns
    - REST layer: `Rest/Endpoints.cs` (paths) + `Rest/Collection.cs` / other partials (HTTP calls)
    - Public API: `CollectionConfigClient.cs`, `CollectionConfigFactory`, etc.
    - Enum → API string: use `ToEnumMemberString()` from `Extensions.cs` (supports both `[EnumMember]` and `[JsonStringEnumMemberName]`)
    - Generated `Dto.*` enums (e.g. `Dto.IndexName`) are the canonical source for API string values
    - Internal DTOs use `internal`, public models in `Models/`

    ## TDD Practice
    - Write failing test first, confirm compile error, then implement
    - Unit tests use `MockWeaviateClient.CreateWithMockHandler()` + `MockHttpMessageHandler`
    - Path assertions: `ShouldHaveMethod(HttpMethod.Delete).ShouldHavePath("/v1/schema/...")`

    ## Tooling
    - **Use csharp-lsp actively** for navigating types, finding usages, and validating changes — prefer LSP-driven analysis over re-reading files manually
The memory/MEMORY.md file appears to be an AI agent session artifact (a scratch-pad for the Copilot Coding Agent) and should not be committed to the repository. It documents internal workflow preferences such as "Always pause for code review before committing" and "PRs target the current branch (v1.0.1) as base, not main." These are not meant to be part of the project's source code or history. This file should either be excluded via .gitignore or removed from the PR entirely.
    # Internal Notes
    This directory is not used for source code and should not contain AI agent
    scratchpads, session artifacts, or transient planning documents.
    Contributors should refer to the main project documentation and contribution
    guidelines for workflow, testing practices, and coding standards.
    If additional persistent documentation is needed, please add it under the
    existing documentation locations (for example, within the main `docs/`
    or root-level markdown files) rather than using this directory.
- Remove dimensions property from Multi2VecGoogleGemini (not applicable)
- Remove dimensions parameter from both factory method overloads
- Add public API entries for Multi2VecGoogleGemini factory methods
- Remove accidentally committed memory/MEMORY.md file

Co-authored-by: antas-marcin <antas-marcin@users.noreply.github.com>
- Remove dimensions property (not applicable to multi2vec modules)
- Rename ModelId to Model (maintain wire format as 'modelId')
- Remove VectorizeCollectionName (not used in multi2vec modules)
- Update both factory method overloads
- Update PublicAPI.Unshipped.txt with correct signatures
- Remove accidentally committed memory/MEMORY.md

Co-authored-by: antas-marcin <antas-marcin@users.noreply.github.com>
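For reference, a rough sketch of what the string-array overload might look like once the review changes listed in these commits are applied (no `dimensions`, no `vectorizeCollectionName`, `Model` instead of `ModelId`). This is an assumption pieced together from the diff and the commit messages, not the merged code; `GeminiVectorizerSketch` and `VectorizerFactorySketch` are hypothetical stand-ins so the snippet compiles on its own.

```csharp
#nullable enable
using System;

// Hypothetical stand-in for the config record after the review changes.
public sealed record GeminiVectorizerSketch
{
    public string? ApiEndpoint { get; init; }
    public string[]? ImageFields { get; init; }
    public string[]? TextFields { get; init; }
    public string[]? VideoFields { get; init; }
    public int? VideoIntervalSeconds { get; init; }
    public string? Model { get; init; }
}

// Hypothetical stand-in for the factory; only the string[] overload is sketched.
public static class VectorizerFactorySketch
{
    public const string DefaultGeminiEndpoint = "generativelanguage.googleapis.com";

    public static GeminiVectorizerSketch Multi2VecGoogleGemini(
        string[]? imageFields = null,
        string[]? textFields = null,
        string[]? videoFields = null,
        string? apiEndpoint = null,
        int? videoIntervalSeconds = null,
        string? model = null
    ) =>
        new GeminiVectorizerSketch
        {
            // Fall back to the Gemini API host when no endpoint is given.
            ApiEndpoint = apiEndpoint ?? DefaultGeminiEndpoint,
            ImageFields = imageFields,
            TextFields = textFields,
            VideoFields = videoFields,
            VideoIntervalSeconds = videoIntervalSeconds,
            Model = model,
        };

    public static void Main()
    {
        // Field names here are placeholders for illustration.
        var config = Multi2VecGoogleGemini(
            imageFields: new[] { "poster" },
            textFields: new[] { "title" }
        );
        Console.WriteLine(config.ApiEndpoint);
    }
}
```

Callers that omit `apiEndpoint` get the default host, while an explicit value is passed through unchanged.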
Description
This PR adds support for the `multi2vec-google-gemini` vectorizer configuration, enabling the use of Google's Gemini API for multi-modal embeddings.

Changes
- Added `Multi2VecGoogleGemini` class in `Vectorizer.cs`
  - `ApiEndpoint` property (defaults to `generativelanguage.googleapis.com`)
  - Does not require `ProjectId` or `Location` (unlike `Multi2VecGoogle`, which is for Vertex AI)
- Added factory methods in `VectorizerFactory.cs`
  - Two overloads: one with `WeightedFields` and one with string arrays

Differences from Multi2VecGoogle
- Module name: `multi2vec-palm` for `Multi2VecGoogle`, `multi2vec-google-gemini` for `Multi2VecGoogleGemini`
- Default endpoint for `Multi2VecGoogleGemini`: `generativelanguage.googleapis.com`

Testing
Text2VecGoogle)

Closes #296