.Net: fix(connectors): Support request-level ModelId overrides for Google, Vertex AI, and OpenAI #13999
Open
Yusuftmle wants to merge 12 commits into
Conversation
…proofing
- Changed exception handling to catch all Exception types instead of only OpenAI-specific ones
- This approach is more robust and won't miss new exception types from future SDK updates
- Inner exceptions are preserved for detailed error analysis
- Updated unit tests accordingly
…ogle/Vertex AI connectors
…enAIChatCompletionService
Contributor
Pull request overview
This PR updates the .NET Google (Gemini / Vertex AI) and OpenAI connectors to honor request-level ModelId overrides (e.g., via PromptExecutionSettings / EmbeddingGenerationOptions) instead of always using the constructor-provided default model.
Changes:
- OpenAI chat completion service now forwards `executionSettings.ModelId` to the underlying client when provided.
- Google AI / Vertex AI embedding + Gemini chat clients now resolve endpoint URIs per request using the overridden model id.
- Adds unit tests validating `ModelId` override behavior; additionally introduces new OpenAI Response Agent exception-wrapping behavior + tests (not mentioned in the PR description).
Reviewed changes
Copilot reviewed 9 out of 9 changed files in this pull request and generated 7 comments.
| File | Description |
|---|---|
| dotnet/src/Connectors/Connectors.OpenAI/Services/OpenAIChatCompletionService.cs | Prefer request-level ModelId when calling ClientCore. |
| dotnet/src/Connectors/Connectors.OpenAI.UnitTests/Services/OpenAIChatCompletionServiceTests.cs | Adds test asserting OpenAI payload + result use overridden model id. |
| dotnet/src/Connectors/Connectors.Google/GeminiPromptExecutionSettings.cs | Ensures ModelId is preserved when translating from generic execution settings. |
| dotnet/src/Connectors/Connectors.Google/Core/VertexAI/VertexAIEmbeddingClient.cs | Builds Vertex embedding endpoint per request to support model override. |
| dotnet/src/Connectors/Connectors.Google/Core/GoogleAI/GoogleAIEmbeddingClient.cs | Builds Google embedding endpoint per request to support model override. |
| dotnet/src/Connectors/Connectors.Google/Core/Gemini/Clients/GeminiChatCompletionClient.cs | Builds Gemini generation/streaming endpoints per request and propagates overridden model id into responses. |
| dotnet/src/Connectors/Connectors.Google.UnitTests/Services/GoogleAIGeminiChatCompletionServiceTests.cs | Adds test verifying overridden model id is used in request URI and returned content. |
| dotnet/src/Agents/UnitTests/OpenAI/OpenAIResponseAgentExceptionTests.cs | New tests for exception wrapping behavior in OpenAIResponseAgent. |
| dotnet/src/Agents/OpenAI/OpenAIResponseAgent.cs | Adds provider exception wrapping for streaming and non-streaming invocation paths. |
Comment on lines +100 to +105

```diff
     Kernel? kernel = null,
     CancellationToken cancellationToken = default)
-        => this._client.GetChatMessageContentsAsync(this._client.ModelId, chatHistory, executionSettings, kernel, cancellationToken);
+        => this._client.GetChatMessageContentsAsync(executionSettings?.ModelId ?? this._client.ModelId, chatHistory, executionSettings, kernel, cancellationToken);
```
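The forwarding change boils down to "use the request-level model when it is usable, otherwise the service default". A minimal, dependency-free sketch of that resolution (the `ExecutionSettings` record and `ModelResolver` class are illustrative stand-ins for Semantic Kernel's `PromptExecutionSettings`, and this version uses the `string.IsNullOrWhiteSpace` guard the PR description mentions rather than plain null-coalescing):

```csharp
using System;

// Illustrative stand-in for the connector's settings type.
record ExecutionSettings(string? ModelId);

static class ModelResolver
{
    // Prefer a non-blank request-level override; otherwise fall back to the default.
    public static string Resolve(ExecutionSettings? settings, string defaultModelId)
        => !string.IsNullOrWhiteSpace(settings?.ModelId) ? settings!.ModelId! : defaultModelId;
}

class Program
{
    static void Main()
    {
        Console.WriteLine(ModelResolver.Resolve(new ExecutionSettings("gpt-4o"), "gpt-4o-mini")); // gpt-4o
        Console.WriteLine(ModelResolver.Resolve(new ExecutionSettings("   "), "gpt-4o-mini"));    // gpt-4o-mini
        Console.WriteLine(ModelResolver.Resolve(null, "gpt-4o-mini"));                            // gpt-4o-mini
    }
}
```

The whitespace check matters because a caller could set `ModelId = ""` on the settings object; a plain `??` would forward that empty string to the API.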
```csharp
    ? state.ExecutionSettings.ModelId
    : this._modelId;

var (generationEndpoint, streamingEndpoint) = this.GetEndpoints(modelId);
```
Comment on lines 19 to 23

```csharp
private readonly string _embeddingModelId;
private readonly GoogleAIVersion _apiVersion;
private readonly Uri _embeddingEndpoint;
private readonly int? _dimensions;
```
Comment on lines 74 to +80

```diff
     Verify.NotNullOrEmpty(data);

+    string modelId = !string.IsNullOrWhiteSpace(options?.ModelId) ? options.ModelId : this._embeddingModelId;
     var geminiRequest = this.GetEmbeddingRequest(data, options);

-    using var httpRequestMessage = await this.CreateHttpRequestAsync(geminiRequest, this._embeddingEndpoint).ConfigureAwait(false);
+    var endpoint = this.GetEmbeddingEndpoint(modelId);
+    using var httpRequestMessage = await this.CreateHttpRequestAsync(geminiRequest, endpoint).ConfigureAwait(false);
```
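The per-request endpoint resolution pairs with a cache of the default endpoint built in the constructor, so the common no-override path does not rebuild a URI on every call. A simplified sketch of that idea (the resolver class and the `example.invalid` URI shape are illustrative, not the connector's real code):

```csharp
using System;

class EmbeddingEndpointResolver
{
    private readonly string _defaultModelId;
    private readonly Uri _defaultEndpoint; // built once in the constructor

    public EmbeddingEndpointResolver(string defaultModelId)
    {
        _defaultModelId = defaultModelId;
        _defaultEndpoint = BuildEndpoint(defaultModelId);
    }

    // Reuse the cached endpoint for the default model; build a fresh one for overrides.
    public Uri GetEmbeddingEndpoint(string modelId)
        => modelId == _defaultModelId ? _defaultEndpoint : BuildEndpoint(modelId);

    // Hypothetical URI shape; the real Google AI / Vertex AI endpoint formats differ.
    private static Uri BuildEndpoint(string modelId)
        => new($"https://example.invalid/v1beta/models/{modelId}:batchEmbedContents");
}

class Program
{
    static void Main()
    {
        var resolver = new EmbeddingEndpointResolver("text-embedding-004");
        Console.WriteLine(resolver.GetEmbeddingEndpoint("text-embedding-004"));
        Console.WriteLine(resolver.GetEmbeddingEndpoint("custom-embedding-model"));
    }
}
```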
Comment on lines 19 to 25

```csharp
private readonly string _embeddingModelId;
private readonly VertexAIVersion _apiVersion;
private readonly string _location;
private readonly string _projectId;
private readonly Uri _embeddingEndpoint;
private readonly int? _dimensions;
```
Comment on lines +155 to +158

```csharp
await foreach (var result in mappedResults.ConfigureAwait(false))
{
    await NotifyMessagesAsync().ConfigureAwait(false);
    yield return new(result, agentThread);
```
Comment on lines +62 to +78

```csharp
try
{
    agentThread = await this.EnsureThreadExistsWithMessagesAsync(messages, thread, cancellationToken).ConfigureAwait(false);
    extensionsContextOptions = await this.FinalizeInvokeOptionsAsync(messages, options, agentThread, cancellationToken).ConfigureAwait(false);

    ChatHistory chatHistory = [.. messages];
    invokeResults = ResponseThreadActions.InvokeAsync(
        this,
        chatHistory,
        agentThread,
        extensionsContextOptions,
        cancellationToken);
}
catch (Exception ex) when (ex is not OperationCanceledException)
{
    throw new KernelException($"OpenAI provider error for agent '{this.Name}': {ex.Message}", ex);
}
```
Summary
This PR addresses and resolves the issue where the Google AI, Vertex AI, and OpenAI connectors ignored request-level `ModelId` overrides (such as those supplied via `PromptExecutionSettings` or `EmbeddingGenerationOptions`) and instead always defaulted to the model ID passed at service construction.

Resolves #13287
Note
Also includes changes from #13011 (OpenAIResponseAgent exception handling).
Motivation
In modern multi-agent or dynamically routed applications, developers often initialize a single LLM or embedding service instance (e.g., with a fast default model) and then override the model with a larger or specialized one for specific requests via the request-level execution options.
Previously:

- The Google AI / Vertex AI clients ignored the `ModelId` supplied on the request options.
- `OpenAIChatCompletionService` was hardcoded to forward only `this._client.ModelId` to its core client, completely bypassing `executionSettings?.ModelId`.

With this PR, all of these connectors support dynamic, request-level model overriding, while still reusing the default endpoint when no override is supplied.
Changes Made

Microsoft.SemanticKernel.Connectors.Google

- `GeminiPromptExecutionSettings.cs`: updated `FromExecutionSettings` to ensure that `settings.ModelId` is explicitly copied from the original `executionSettings.ModelId` during option translation/deserialization.
- `GoogleAIEmbeddingClient.cs` & `VertexAIEmbeddingClient.cs`:
  - Store the endpoint-building configuration (`_apiVersion`, `_location`, `_projectId`) in private fields.
  - Add `GetEmbeddingEndpoint(string modelId)`, which reuses the constructor-instantiated `_embeddingEndpoint` when the requested model matches the default.
  - Update `GetEmbeddingRequest` to accept the resolved, validated `modelId` directly to avoid redundant whitespace checking.
  - Update `GenerateEmbeddingsAsync` to fetch the endpoint dynamically per request, directing the API call to the overridden model.
- `GeminiChatCompletionClient.cs`:
  - Add `GetEndpoints(string modelId)` to dynamically construct both generation and streaming endpoints for the requested model override.
  - Update the non-streaming (`GenerateChatMessageAsync`) and streaming (`StreamGenerateChatMessageAsync`) routines to resolve the runtime `modelId` and use the dynamic endpoints (with clean variable discards for unused endpoints).
  - Propagate the resolved `modelId` into all response and metadata factories so that the returned chat content reports the correct executing model name.

Microsoft.SemanticKernel.Connectors.OpenAI

- `OpenAIChatCompletionService.cs`: updated all four service methods (`GetChatMessageContentsAsync`, `GetStreamingChatMessageContentsAsync`, `GetTextContentsAsync`, and `GetStreamingTextContentsAsync`) to forward overridden models, using a robust `string.IsNullOrWhiteSpace` check to prevent forwarding invalid model identifiers.

Test Coverage

Google AI & Vertex AI:

- Added `GetChatMessageContentsAsyncUsesModelIdFromExecutionSettingsAsync` in `GoogleAIGeminiChatCompletionServiceTests.cs` to verify that overriding `ModelId` in `GeminiPromptExecutionSettings` correctly updates the target endpoint URI and the returned model properties.

OpenAI:

- Added `GetChatMessageContentsAsyncUsesModelIdFromExecutionSettingsAsync` in `OpenAIChatCompletionServiceTests.cs` to verify that overriding `ModelId` in `OpenAIPromptExecutionSettings` successfully routes the OpenAI payload with the overridden model identifier.