Yardstick config updates#14

Open
ericwindmill wants to merge 8 commits into main from yardstick-config-updates

@ericwindmill
Collaborator

Updating config to serve both our local use cases and Yardstick.

…on fields, refactoring `configuration_reference.md` to link to it and updating `index.md` and `custom.css`.
- `Job.sampleFilters` / `Job.sample_filters` — select samples by metadata tags
- `variant_filters` on task YAML — restrict which variants apply to a task (supplements `allowed_variants`)
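
To illustrate how tag-based selection of this kind could behave, here is a minimal Python sketch. The `TagFilter` field names (`include_tags`, `exclude_tags`) and the matching semantics are assumptions made for illustration; the model added in this PR may be shaped differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TagFilter:
    """Hypothetical filter; the field names are assumed, not taken from the PR."""
    include_tags: tuple = ()
    exclude_tags: tuple = ()


def matches_tag_filter(tags, tag_filter):
    """Return True if `tags` satisfies the filter.

    Assumed semantics: every include tag must be present,
    and no exclude tag may be present.
    """
    tag_set = set(tags)
    if any(t not in tag_set for t in tag_filter.include_tags):
        return False
    return not any(t in tag_set for t in tag_filter.exclude_tags)


# Illustrative samples; the ids and tags are made up.
samples = [
    {"id": "a", "tags": ["widgets", "easy"]},
    {"id": "b", "tags": ["widgets", "flaky"]},
    {"id": "c", "tags": ["networking"]},
]
sample_filter = TagFilter(include_tags=("widgets",), exclude_tags=("flaky",))
selected = [s["id"] for s in samples if matches_tag_filter(s["tags"], sample_filter)]
print(selected)
```

Under these assumed semantics an empty filter matches everything, so leaving out `sample_filters` would leave the sample set unchanged.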

- **`JobTask.args`.** Per-task argument overrides. Allows a job to pass task-specific arguments (e.g. `base_url`, `dataset_path`) to individual tasks.
Collaborator Author

This refers to `tasks.<task>.args` in job.yaml; `JobTask` is the underlying model.
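
As a rough illustration of per-task argument overrides, the sketch below merges a job-level `tasks.<task>.args` mapping over a task's own defaults. The dictionary shapes, the task names, and the `resolve_task_args` helper are hypothetical; only `args`, `base_url`, and `dataset_path` come from the description above.

```python
def resolve_task_args(task_defaults, job_tasks, task_name):
    """Merge job-level args for one task over that task's defaults (hypothetical helper)."""
    job_task = job_tasks.get(task_name) or {}
    overrides = job_task.get("args") or {}
    # Job-level values win; tasks without overrides keep their defaults untouched.
    return {**task_defaults, **overrides}


# Parsed form of a job.yaml `tasks:` section (shape assumed for illustration).
job_tasks = {
    "code_gen": {"args": {"base_url": "http://localhost:8080"}},
    "qa": {},
}
defaults = {"base_url": "https://example.com", "dataset_path": "data/samples.jsonl"}

code_gen_args = resolve_task_args(defaults, job_tasks, "code_gen")
qa_args = resolve_task_args(defaults, job_tasks, "qa")
print(code_gen_args)
print(qa_args)
```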


- **`Job.description`.** Optional human-readable description field on Job.

- **`Job.imagePrefix` / `Job.image_prefix`.** Registry URL prefix prepended to image names during sandbox resolution. Enables switching between local images and remote registries (e.g. Artifact Registry on GKE) without duplicating job YAML files.
Collaborator Author

Both `imagePrefix` and `image_prefix` are listed because the change is needed in both the Dart and Python packages.
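
A minimal sketch of how a registry prefix might be applied during sandbox resolution, assuming the prefix is simply joined to the image name with a single slash. The `apply_image_prefix` helper, the image name, and the registry URL are hypothetical and only illustrate the idea.

```python
from typing import Optional


def apply_image_prefix(image: str, image_prefix: Optional[str]) -> str:
    """Prepend a registry prefix to an image name (hypothetical helper)."""
    if not image_prefix:
        # No prefix configured: keep using the local image name as-is.
        return image
    # Normalise slashes so the prefix and image join with exactly one '/'.
    return f"{image_prefix.rstrip('/')}/{image.lstrip('/')}"


local = apply_image_prefix("flutter-sandbox:latest", None)
remote = apply_image_prefix(
    "flutter-sandbox:latest", "us-docker.pkg.dev/example-project/evals/"
)
print(local)
print(remote)
```

With this shape, the same job definition can target local images or a remote registry by changing one field.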

…resolvers and support colon syntax for task function resolution.
…, and expanded sandbox configuration options in documentation and API.
…rguments to resolved task metadata, updating the config parity tool.
@ericwindmill marked this pull request as ready for review on March 14, 2026 at 00:56
@gemini-code-assist
Contributor

Summary of Changes

Hello, I'm Gemini Code Assist [1]! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly refactors the configuration system to improve its adaptability and extensibility, particularly for Yardstick and diverse local use cases. It introduces new configuration options, standardizes naming conventions, and decouples environment-specific settings, allowing for more flexible and maintainable evaluation setups. A major enhancement is the introduction of tag-based filtering, which provides powerful new ways to manage task and sample selection.

Highlights

  • New Configuration Fields: Introduced several new fields across Job and Task models, including 'description' and 'imagePrefix' for Job, 'args' for JobTask, and 'systemMessage' and 'sandboxParameters' for Task, enhancing configuration flexibility.
  • Renamed Task Function Field: The 'taskFunc' field in the Task model was renamed to 'func' for consistency with YAML key names, requiring updates in both Dart and Python packages.
  • Configurable Sandbox Registry: The sandbox registry and SDK channel mappings are now data-driven and configurable, allowing non-Flutter projects to define their own sandbox setups and removing hardcoded values.
  • Enhanced Workspace Resolution: Workspace resolution now directly uses Inspect AI's native 'Sample.files' and 'Sample.setup' fields, making the setup command configurable and no longer hardcoded to Flutter-specific commands.
  • Tag-Based Filtering: Implemented a new 'TagFilter' model to enable tag-based filtering for tasks, samples, and variants, providing more granular control over evaluation execution.
  • Module:Task Syntax Support: Added support for the 'module.path:function_name' syntax for Python task function references, improving clarity and organization for task definitions.
  • Comprehensive Documentation Updates: Added a new 'CHANGELOG.md' and 'IMPLEMENTATION_PLAN.md', along with extensive updates to existing documentation, including a new 'yaml_config.md' reference, to reflect all configuration changes.
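
The `module.path:function_name` reference syntax mentioned in the highlights could be resolved roughly as follows. This is a sketch using the standard library's `importlib`, not the code in `json_runner.py`; the `resolve_task_func` name and its error handling are assumptions.

```python
import importlib


def resolve_task_func(ref):
    """Resolve a 'module.path:function_name' reference to a callable (hypothetical helper)."""
    module_path, sep, func_name = ref.partition(":")
    if not sep or not module_path or not func_name:
        raise ValueError(f"expected 'module.path:function_name', got {ref!r}")
    module = importlib.import_module(module_path)
    func = getattr(module, func_name, None)
    if not callable(func):
        raise AttributeError(f"{module_path!r} has no callable {func_name!r}")
    return func


# A stdlib function stands in for a real task function here.
basename = resolve_task_func("os.path:basename")
print(basename("tasks/code_gen.yaml"))
```

Splitting on the colon keeps the module path and the function name unambiguous, even when the module path itself contains dots.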

Changelog
  • CHANGELOG.md
    • Added a new changelog file detailing new features, breaking changes, and documentation updates across multiple releases, including new Job and Task fields, tag-based filtering, configurable sandbox, and the module:task syntax.
    • Documented the renaming of Task.taskFunc to Task.func and changes to workspace resolution.
  • IMPLEMENTATION_PLAN.md
    • Added a new implementation plan outlining the steps for config improvements, including model changes, parser/resolver changes, tag-based filtering, and a file index of modifications.
  • docs/_static/custom.css
    • Modified CSS to adjust sidebar and article container widths for wide screens.
  • docs/guides/config.md
    • Updated configuration guide with sections on tag-based filtering, task function references, sandbox configuration, and workspace setup.
  • docs/reference/configuration_reference.md
    • Updated the configuration reference to link to the new yaml_config.md and removed detailed field tables, deferring to the new dedicated reference.
  • docs/reference/dart_api/dataset_config_dart/dataset_config_dart.md
    • Updated Dart API documentation for EvalSetResolver, Job, JobTask, ParsedTask, TagFilter, Task, TaskMetadata, and matchesTagFilter to reflect new fields, renames, and configurable sandbox.
  • docs/reference/index.md
    • Added yaml_config to the reference index.
  • docs/reference/yaml_config.md
    • Added a new comprehensive YAML configuration reference file detailing Job, Task, and Sample fields with Dart and Python cross-references.
  • packages/dash_evals/src/dash_evals/runner/json_runner.py
    • Updated task function resolution to support module:task syntax and changed references from task_func to func.
  • packages/dataset_config_dart/lib/src/models/context_file.g.dart
    • Modified JSON serialization for ContextFile to directly use metadata instead of toJson().
  • packages/dataset_config_dart/lib/src/models/dataset.g.dart
    • Modified JSON serialization for Dataset to directly use samples instead of map((e) => e.toJson()).toList().
  • packages/dataset_config_dart/lib/src/models/eval_log.g.dart
    • Modified JSON serialization for EvalLog and related classes to directly use nested objects instead of toJson() or map((e) => e.toJson()).toList().
  • packages/dataset_config_dart/lib/src/models/eval_set.g.dart
    • Modified JSON serialization for EvalSet to directly use tasks instead of map((e) => e.toJson()).toList().
  • packages/dataset_config_dart/lib/src/models/job.dart
    • Added description, imagePrefix, taskFilters, sampleFilters to Job and args to JobTask.
  • packages/dataset_config_dart/lib/src/models/job.freezed.dart
    • Updated Job and JobTask freezed classes to include new fields (description, imagePrefix, taskFilters, sampleFilters, args) and their copyWith, ==, hashCode, and toString methods.
  • packages/dataset_config_dart/lib/src/models/job.g.dart
    • Updated JSON serialization for Job and JobTask to include new fields (description, imagePrefix, taskFilters, sampleFilters, args) and to directly use nested objects.
  • packages/dataset_config_dart/lib/src/models/models.dart
    • Exported the new tag_filter.dart model.
  • packages/dataset_config_dart/lib/src/models/tag_filter.dart
    • Added a new TagFilter model and matchesTagFilter utility function.
  • packages/dataset_config_dart/lib/src/models/tag_filter.freezed.dart
    • Added freezed class for TagFilter.
  • packages/dataset_config_dart/lib/src/models/tag_filter.g.dart
    • Added JSON serialization for TagFilter.
  • packages/dataset_config_dart/lib/src/models/task.dart
    • Renamed taskFunc to func and added systemMessage and sandboxParameters to Task and TaskMetadata.
  • packages/dataset_config_dart/lib/src/models/task.freezed.dart
    • Updated Task freezed class to reflect taskFunc rename to func and addition of systemMessage and sandboxParameters.
  • packages/dataset_config_dart/lib/src/models/task.g.dart
    • Updated JSON serialization for Task to reflect taskFunc rename to func and addition of systemMessage and sandboxParameters, and to directly use dataset.
  • packages/dataset_config_dart/lib/src/models/variant.dart
    • Renamed flutterChannel to branch in Variant.
  • packages/dataset_config_dart/lib/src/models/variant.freezed.dart
    • Updated Variant freezed class to reflect flutterChannel rename to branch.
  • packages/dataset_config_dart/lib/src/models/variant.g.dart
    • Updated JSON serialization for Variant to reflect flutterChannel rename to branch and to directly use context_files.
  • packages/dataset_config_dart/lib/src/parsed_task.dart
    • Renamed taskFunc to func and added variantFilters and sandboxParameters to ParsedTask.
  • packages/dataset_config_dart/lib/src/parsers/json_parser.dart
    • Updated JsonParser to use func instead of taskFunc.
  • packages/dataset_config_dart/lib/src/parsers/yaml_parser.dart
    • Updated YamlParser to use func instead of taskFunc, and to parse new fields like variant_filters, sandbox_parameters, description, image_prefix, task_filters, and sample_filters.
  • packages/dataset_config_dart/lib/src/resolvers/eval_set_resolver.dart
    • Refactored EvalSetResolver to be configurable with sandboxRegistry and branchChannels instead of hardcoded constants.
    • Renamed kSandboxRegistry to kDefaultSandboxRegistry and kSdkChannels to kDefaultBranchChannels.
    • Updated sandbox resolution logic and implemented tag-based filtering for tasks, samples, and variants.
    • Propagated image_prefix and JobTask.args to task metadata.
  • packages/dataset_config_dart/pubspec.yaml
    • Added build_runner, freezed, and json_serializable to dev dependencies.
  • packages/dataset_config_dart/test/eval_set_resolver_test.dart
    • Updated tests to reflect taskFunc rename to func.
    • Added tests for variant_filters, image_prefix, and JobTask.args.
  • packages/dataset_config_dart/test/eval_set_writer_test.dart
    • Updated test to use func instead of taskFunc.
  • packages/dataset_config_dart/test/json_parser_test.dart
    • Updated tests to use func instead of taskFunc.
  • packages/dataset_config_dart/test/parsed_task_test.dart
    • Updated tests to use func instead of taskFunc.
  • packages/dataset_config_python/src/dataset_config_python/__init__.py
    • Exported new SandboxConfig, DEFAULT_BRANCH_CHANNELS, and DEFAULT_SANDBOX_REGISTRY.
  • packages/dataset_config_python/src/dataset_config_python/models/__init__.py
    • Exported new TagFilter and matches_tag_filter.
  • packages/dataset_config_python/src/dataset_config_python/models/job.py
    • Imported TagFilter.
    • Added description, image_prefix, task_filters, sample_filters to Job, and args to JobTask.
  • packages/dataset_config_python/src/dataset_config_python/models/tag_filter.py
    • Added new TagFilter model and matches_tag_filter utility function.
  • packages/dataset_config_python/src/dataset_config_python/models/task.py
    • Renamed task_func to func and added system_message and sandbox_parameters to Task.
  • packages/dataset_config_python/src/dataset_config_python/models/variant.py
    • Renamed flutter_channel to branch in Variant.
  • packages/dataset_config_python/src/dataset_config_python/parser.py
    • Imported TagFilter.
    • Renamed task_func to func in ParsedTask.
    • Added sandbox_parameters and variant_filters to ParsedTask.
    • Updated _load_task_file and parse_job to handle these new fields.
  • packages/dataset_config_python/src/dataset_config_python/resolver.py
    • Refactored resolver to use a SandboxConfig dataclass.
    • Renamed SANDBOX_REGISTRY to DEFAULT_SANDBOX_REGISTRY and SDK_CHANNELS to DEFAULT_BRANCH_CHANNELS.
    • Updated _resolve_job and _resolve_sandbox to accept configurable sandbox settings.
    • Implemented tag-based filtering for tasks, samples, and variants.
    • Propagated image_prefix and JobTask.args to task metadata.
  • packages/dataset_config_python/tests/test_config.py
    • Updated tests to reflect flutter_channel rename to branch and task_func rename to func.
  • packages/devals_cli/lib/src/dataset/dry_run.dart
    • Updated _validateConfig to use func instead of taskFunc.
  • tool/config_parity/bin/config_parity.dart
    • Renamed from config_partiy.dart.
Activity
  • The pull request introduces a new CHANGELOG.md file, detailing several new features and breaking changes across different releases, indicating significant ongoing development.
  • An IMPLEMENTATION_PLAN.md is added, suggesting a structured approach to these configuration improvements.
  • Documentation files are updated and a new comprehensive YAML configuration reference is added, implying a focus on clarity and usability for developers.
  • The changes span both Dart and Python packages, indicating a cross-language impact and coordination.
Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces a significant number of updates to the configuration system to support Yardstick and local use cases. Key changes include adding new fields to Job and Task models (like description, imagePrefix, args), introducing tag-based filtering for tasks and samples, and making the sandbox registry configurable. The Task.taskFunc field has been consistently renamed to Task.func across the codebase. Extensive documentation has been added or updated to reflect these changes, including a new detailed YAML configuration reference.

My review focuses on ensuring consistency across the implementation and documentation. I've found a couple of minor inconsistencies in the documentation regarding field names. I also noticed one part of the implementation in the Dart resolver that seems to contradict the implementation plan regarding hardcoded setup commands. Overall, this is a well-structured and comprehensive update.

Comment on lines 157 to 161
      if (workspace != null && isContainer) {
        files = {...?files, '/workspace': workspace};
    -   setup = setup ?? 'cd /workspace && flutter pub get';
    +   setup ??= 'cd /workspace && flutter pub get';
        enriched['workspace'] = '/workspace';
      }
Contributor

Priority: high

According to the implementation plan, the hardcoded flutter pub get command for workspace setup should be removed to make the configuration more generic and not assume a Flutter project. However, it's still present here. To align with the goal of making the system more project-agnostic, this line should be removed. The task author should explicitly specify the setup command in their task or sample YAML if one is needed.

        if (workspace != null && isContainer) {
          files = {...?files, '/workspace': workspace};
          enriched['workspace'] = '/workspace';
        }
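
For comparison, a project-agnostic version of this logic might look like the following Python sketch, in which a setup command is attached only when the task or sample author supplies one. The `attach_workspace` helper and the sample dictionary shape are hypothetical; they mirror the Dart snippet above rather than the actual Python resolver.

```python
from typing import Optional


def attach_workspace(
    sample: dict,
    workspace: Optional[str],
    is_container: bool,
    setup: Optional[str] = None,
) -> dict:
    """Return a copy of `sample` with the workspace mapped in (hypothetical helper).

    No setup command is invented here: `setup` is set only when the caller provides one.
    """
    enriched = dict(sample)
    if workspace is not None and is_container:
        # Map the workspace into the container alongside any existing files.
        enriched["files"] = {**(sample.get("files") or {}), "/workspace": workspace}
        enriched["workspace"] = "/workspace"
    if setup is not None:
        enriched["setup"] = setup
    return enriched


generic = attach_workspace({"id": "s1"}, "./app", is_container=True)
flutter = attach_workspace(
    {"id": "s2"}, "./app", is_container=True,
    setup="cd /workspace && flutter pub get",
)
print(generic)
print(flutter)
```

A Flutter task keeps its previous behaviour by declaring the command explicitly, while other projects get no implicit setup step.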

Comment on lines +87 to +92
&nbsp;&nbsp;`.flutter_channel`
- string
- Y
-
-
- Flutter SDK channel (`stable`, `beta`, `main`)
Contributor

Priority: medium

The field flutter_channel has been renamed to branch to be more generic, but this documentation still refers to the old name. Please update the field name and its description to reflect this change.

Suggested change

    - &nbsp;&nbsp;`.flutter_channel`
    - - string
    - - Y
    - -
    - -
    - - Flutter SDK channel (`stable`, `beta`, `main`)
    + * - `variants`\
    + &nbsp;&nbsp;`.<name>`\
    + &nbsp;&nbsp;`.branch`
    + - string
    + - Y
    + -
    + -
    + - SDK branch/channel to use (`stable`, `beta`, `main`)

| `context_files` | list | `[]` | Paths or glob patterns to context files (relative to task dir) |
| `skills` | list | `[]` | Paths or glob patterns to skill directories (relative to task dir) |
| `mcp_servers` | list | `[]` | MCP server identifiers |
Variant sub-fields (`context_files`, `mcp_servers`, `skills`, `flutter_channel`) are documented in the [Job fields table](yaml_config.md#job).
Contributor

Priority: medium

The documentation for variant sub-fields mentions flutter_channel, but this field has been renamed to branch to be more generic. Please update this reference to maintain consistency with the implementation.

Suggested change

    - Variant sub-fields (`context_files`, `mcp_servers`, `skills`, `flutter_channel`) are documented in the [Job fields table](yaml_config.md#job).
    + Variant sub-fields (`context_files`, `mcp_servers`, `skills`, `branch`) are documented in the [Job fields table](yaml_config.md#job).
