Fix Python remote eval parameter serialization in dev mode #210
Open
ekeith (evanmkeith) wants to merge 1 commit into
## Summary
Fixes Python `bt eval --dev` `/list` responses for evals that define `parameters={...}`.

The Python eval runner was emitting parameters as a raw JSON Schema object via `parameters_to_json_schema(...)`. The Braintrust UI expects remote eval parameters to use the serialized parameter container shape, such as `braintrust.staticParameters` or `braintrust.parameters`. As a result, `/list` could return `200 OK` while the UI still failed to parse the evaluator manifest and showed the generic connection/listing error.

This updates the Python runner to serialize parameters with the same remote eval parameter container shape used by the Python SDK devserver/push flow.
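
For reviewers unfamiliar with the two shapes, here is a rough sketch of the difference between a raw JSON Schema object and a tagged container. Only the `braintrust.staticParameters` type tag comes from this PR. The helper name `to_static_parameters_container` and the `schema` field name are assumptions made for illustration, not the SDK's actual serializer or its exact output.

```python
import json
from typing import Any

# Type tag named in this PR's summary and test plan.
STATIC_PARAMETERS_TYPE = "braintrust.staticParameters"


def to_static_parameters_container(schema: dict[str, Any]) -> dict[str, Any]:
    """Wrap a parameter schema in a tagged container (hypothetical helper)."""
    # ASSUMPTION: the "schema" key is illustrative; only "type" is confirmed here.
    return {"type": STATIC_PARAMETERS_TYPE, "schema": schema}


# Roughly the kind of raw object the runner emitted before: plain JSON Schema.
raw_schema = {
    "type": "object",
    "properties": {"temperature": {"type": "number", "default": 0.2}},
}

container = to_static_parameters_container(raw_schema)
print(json.dumps(container, indent=2))
```

The point of the tag is that a consumer can dispatch on `parameters.type` instead of guessing the shape from the payload, which is presumably why an untagged JSON Schema object failed to parse even though `/list` returned `200 OK`.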
## Test Plan
- Add a Python list-mode fixture with a Pydantic parameter model.
- Assert `BT_EVAL_DEV_MODE=list` emits `parameters.type = "braintrust.staticParameters"`.
- Run:
  - `python3 -m py_compile scripts/eval-runner.py tests/evals/py/remote_list_params/eval_remote_list_params.py`
  - `cargo fmt --check`
  - `cargo test eval_python_runner_list_mode_serializes_remote_parameter_container --test eval_fixtures -- --nocapture`
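
The core assertion in the test plan can be approximated in Python as below. This is a sketch only: the actual test is the Rust `eval_fixtures` test named above, and the helper `has_static_parameter_container` and both sample entries are hypothetical. Only the `parameters.type` path and its expected value are taken from this PR.

```python
from typing import Any

EXPECTED_TYPE = "braintrust.staticParameters"


def has_static_parameter_container(evaluator: dict[str, Any]) -> bool:
    """Return True if an evaluator entry carries the tagged parameter container."""
    params = evaluator.get("parameters")
    return isinstance(params, dict) and params.get("type") == EXPECTED_TYPE


# Hypothetical entries: one with the tagged container, one with raw JSON Schema.
fixed_entry = {"parameters": {"type": EXPECTED_TYPE, "schema": {}}}
legacy_entry = {"parameters": {"type": "object", "properties": {}}}

print(has_static_parameter_container(fixed_entry))
print(has_static_parameter_container(legacy_entry))
```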