
fix: Remove evaluation metric key from schema which failed on some LLMs#105

Open
jsonbailey wants to merge 6 commits into main from
jb/aic-1897/remove-keys-from-evaluation-structure

Conversation

@jsonbailey (Contributor) commented Mar 11, 2026

fix: Improve metric token collection for Judge evaluations when using LangChain
fix: Include raw response when performing Judge evaluations


Note

Medium Risk
Changes the structured-output contract for Judge evaluations and modifies LangChain structured invocation/metrics extraction, which could affect evaluation parsing and reported token usage across providers.

Overview
Judge structured evaluation output is simplified from a dynamic evaluations[{metricKey}] shape to a fixed evaluation { score, reasoning } schema, and parsing/validation is updated to key results by evaluation_metric_key at runtime (with tests adjusted accordingly).
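A rough sketch of the fixed shape and the runtime keying described above (the dataclass and helper below are hypothetical illustrations, not the SDK's actual types):

```python
from dataclasses import dataclass, asdict

@dataclass
class Evaluation:
    # Fixed schema: just score and reasoning, with no dynamic
    # metric-key field, which some LLMs failed to emit reliably.
    score: float
    reasoning: str

def key_result(evaluation_metric_key: str, evaluation: Evaluation) -> dict:
    # Keying by metric now happens at parse time rather than inside
    # the structured-output schema itself.
    return {evaluation_metric_key: asdict(evaluation)}
```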

LangChain structured invocations now request include_raw=True, propagate the raw model message + token usage into StructuredResponse, treat parsing errors as failures, and improve provider handling by mapping Bedrock bedrock:* to bedrock_converse (including passing the original provider string via parameters when needed) and reading token usage from usage_metadata when available.
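In LangChain, `with_structured_output(..., include_raw=True)` makes the model return a dict with `raw`, `parsed`, and `parsing_error` keys, and providers that report token counts populate `usage_metadata` on the raw message. A sketch of folding such a result into a response object (the output shape here is a hypothetical stand-in for `StructuredResponse`):

```python
def to_structured_response(result: dict) -> dict:
    """Fold a LangChain include_raw=True result into a response dict.

    `result` has keys: "raw" (the AIMessage), "parsed" (the schema
    instance, or None on failure), and "parsing_error" (an exception
    or None). The output shape below is a hypothetical stand-in for
    StructuredResponse, not the SDK's real type.
    """
    raw = result.get("raw")
    # usage_metadata is populated by providers that report token counts.
    usage = getattr(raw, "usage_metadata", None) or {}
    return {
        # Treat parsing errors as failures.
        "success": result.get("parsing_error") is None,
        "data": result.get("parsed"),
        # Propagate the raw model message content alongside the parse.
        "raw": getattr(raw, "content", None),
        "usage": {
            "input": usage.get("input_tokens", 0),
            "output": usage.get("output_tokens", 0),
            "total": usage.get("total_tokens", 0),
        },
    }
```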

Written by Cursor Bugbot for commit 1ed23cf.

@jsonbailey jsonbailey requested a review from a team as a code owner March 11, 2026 22:40

cursor bot left a comment


Cursor Bugbot has reviewed your changes and found 2 potential issues.



# Bedrock requires the foundation provider (e.g. Bedrock:Anthropic) passed in
# parameters separately from model_provider, which is used for LangChain routing.
if mapped_provider == 'bedrock_converse' and 'provider' not in parameters:
    parameters['provider'] = provider


Bedrock provider parameter passes wrong format to LangChain

High Severity

The provider variable holds the raw LaunchDarkly provider name (e.g., "Bedrock:Anthropic" or "Bedrock"), which gets passed directly as parameters['provider'] to init_chat_model / ChatBedrockConverse. However, ChatBedrockConverse expects the provider parameter to be just the model family name in lowercase (e.g., "anthropic"), not the full LD-formatted name. Passing "Bedrock:Anthropic" will cause incorrect provider inference and likely break Bedrock model initialization.
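One possible fix is to strip the LaunchDarkly prefix and lowercase the family name before passing it on. A sketch only (whether a bare "Bedrock" should omit the parameter entirely is for the maintainers to decide):

```python
from typing import Optional

def bedrock_family(ld_provider: str) -> Optional[str]:
    # "Bedrock:Anthropic" -> "anthropic", the lowercase model-family
    # name ChatBedrockConverse expects. A bare "Bedrock" carries no
    # family, so return None and let LangChain infer it from the
    # model id instead.
    _, sep, family = ld_provider.partition(":")
    return family.lower() if sep else None
```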


usage=TokenUsage(total=0, input=0, output=0),
),
)
return structured_response


Exception handler may return success=True after partial mutation

Low Severity

The except handler returns the shared mutable structured_response without resetting metrics.success. After line 110, get_ai_metrics_from_response replaces the metrics with success=True. If any exception occurs between that point and the explicit returns, the handler returns a response indicating success despite the failure. The previous code defensively created a fresh StructuredResponse with success=False in the handler.
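The defensive pattern the reviewer is describing — building a fresh failure response inside the handler rather than returning the shared, possibly half-mutated object — might look like this (all names here are hypothetical, simplified to plain dicts):

```python
def invoke_structured(call):
    # Hypothetical wrapper illustrating the defensive pattern: on any
    # exception, construct a fresh failure response instead of
    # returning a shared object whose metrics may already have been
    # flipped to success=True before the error occurred.
    try:
        response = call()
        response["metrics"]["success"] = True
        return response
    except Exception:
        # Fresh object: no partially mutated state can leak out.
        return {
            "data": None,
            "metrics": {
                "success": False,
                "usage": {"total": 0, "input": 0, "output": 0},
            },
        }
```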

Additional Locations (1)
