flutter · ericwindmill · Mar 11, 2026
diff --git a/.firebaserc b/.firebaserc
@@ -0,0 +1,5 @@
+{
+  "projects": {
+    "default": "dash-evals"
+  }
+}
diff --git a/.gemini/config.yaml b/.gemini/config.yaml
@@ -0,0 +1,10 @@
+# Minimize verbosity.
+have_fun: false
+code_review:
+  # For now, use the default of MEDIUM for testing. Based on desired verbosity,
+  # we can change this to LOW or HIGH in the future.
+  comment_severity_threshold: MEDIUM
+  pull_request_opened:
+    summary: true
+    include_drafts: false
+ignore_patterns:
diff --git a/.gemini/styleguide.md b/.gemini/styleguide.md
@@ -0,0 +1,90 @@
+# dash_evals Style Guide
+
+This style guide outlines the coding conventions and contribution requirements for the dash_evals repository.
+
+---
+
+## Documentation Requirements
+
+All changes that affect user-facing behavior, configuration, or APIs **must** be documented in the `docs/` directory:
+
+- **New features**: Add documentation explaining the feature and how to use it
+- **CLI changes**: Update `docs/dataset_yaml_schema.md` (CLI Usage section)
+- **Configuration changes**: Update `docs/dataset_yaml_schema.md`
+- **Workflow changes**: Update `docs/contributing_guide.md`
+- **Architecture changes**: Update `docs/repository_structure.md`
+
+When reviewing PRs, check that:
+1. Any new CLI flags or options are documented
+2. New configuration fields are documented with type, description, and examples
+3. User-facing error messages are clear and actionable
+
+---
+
+## Python Style Guide
+
+This project follows the [Google Python Style Guide](https://google.github.io/styleguide/pyguide.html).
+
+### Key Points
+
+- **Formatting**: Use `ruff format` for automatic formatting
+- **Linting**: Use `ruff check` and `pylint`
+- **Line length**: 100 characters maximum
+- **Docstrings**: Use Google-style docstrings with Args, Returns, and Raises sections
+- **Type hints**: Required for all public functions and methods
+- **Imports**: Use absolute imports, grouped by standard library / third-party / local
+
+### Docstring Example
+
+```python
+def parse(dataset_path: Path, jobs: list[str] | None = None) -> list[EvalSetConfig]:
+    """Parse dataset directory into resolved EvalSetConfig(s).
+
+    Args:
+        dataset_path: Path to dataset directory containing dataset.yaml.
+        jobs: Optional list of job names or paths. Uses default_job if not specified.
+
+    Returns:
+        List of EvalSetConfig objects ready to pass to inspect_ai.eval_set().
+
+    Raises:
+        FileNotFoundError: If dataset or job file not found.
+    """
+```
+
+---
+
+## Dart Style Guide
+
+This project follows the [Effective Dart Style Guide](https://dart.dev/effective-dart/style).
+
+Code should follow the relevant style guide and use the correct
+auto-formatter for each language, as described in
+[the flutter/packages contributing guide's Style section](https://github.com/flutter/packages/blob/main/CONTRIBUTING.md#style).
+
+### Best Practices
+
+- Code should follow the guidance and principles described in
+  [the flutter/packages contribution guide](https://github.com/flutter/flutter/blob/master/docs/ecosystem/contributing/README.md).
+- Code should be tested. Changes to plugin packages, which include code written
+  in C, C++, Java, Kotlin, Objective-C, or Swift, should have appropriate tests
+  as described in [the plugin test guidance](https://github.com/flutter/flutter/blob/master/docs/ecosystem/testing/Plugin-Tests.md).
+- PR descriptions should include the Pre-Review Checklist from
+  [the PR template](https://github.com/flutter/packages/blob/main/.github/PULL_REQUEST_TEMPLATE.md),
+  with all of the steps completed.
+
+### Review Agent Guidelines
+
+When providing a summary, the review agent must adhere to the following principles:
+- **Be Objective:** Focus on a neutral, descriptive summary of the changes. Avoid subjective value judgments
+  like "good," "bad," "positive," or "negative." The goal is to report what the code does, not to evaluate it.
+- **Use Code as the Source of Truth:** Base all summaries on the code diff. Do not trust or rephrase the PR
+  description, which may be outdated or inaccurate. A summary must reflect the actual changes in the code.
+- **Be Concise:** Generate summaries that are brief and to the point. Focus on the most significant changes,
+  and avoid unnecessary details or verbose explanations. This ensures the feedback is easy to scan and understand.
+
+## YAML Configuration Files
+
+- Use 2-space indentation
+- Include comments explaining non-obvious fields
+- Use explicit `path:` references for file paths (e.g., `- path: samples/file.yaml`)
diff --git a/.github/workflows/docs.yml b/.github/workflows/docs.yml
@@ -70,6 +70,6 @@ jobs:
         with:
           repoToken: ${{ secrets.GITHUB_TOKEN }}
           firebaseServiceAccount: ${{ secrets.FIREBASE_SERVICE_ACCOUNT }}
-          projectId: evals
-          target: evals-docs
+          projectId: dash-evals
+          target: dash-evals-docs
           channelId: live
diff --git a/.gitignore b/.gitignore
@@ -233,3 +233,4 @@ app.*.map.json
 /android/app/debug
 /android/app/profile
 /android/app/release
+.firebase/
diff --git a/docs/contributing/repository_structure.md b/docs/contributing/repository_structure.md
@@ -21,7 +21,7 @@ evals/
 
 ## dataset/
 
-Contains all evaluation data, configurations, and resources. See the [Configuration Overview](./config/about.md) for detailed file formats.
+Contains all evaluation data, configurations, and resources. See the [Configuration Overview](../reference/configuration_reference.md) for detailed file formats.
 
 | Path | Description |
 |------|-------------|
@@ -81,7 +81,7 @@ dash_evals/
 
 ### devals_cli/ (devals)
 
-Dart CLI for creating and managing evaluation tasks and jobs. See the [CLI documentation](./cli.md) for full command reference.
+Dart CLI for creating and managing evaluation tasks and jobs. See the [CLI documentation](../reference/cli.md) for full command reference.
 
 ```
 devals_cli/

diff --git a/docs/guides/quick_start.md b/docs/guides/quick_start.md
@@ -14,16 +14,12 @@ You'll also need an API key for at least one model provider (`GOOGLE_API_KEY`, `
 ## 1. Install the packages
 
 ```bash
-git clone https://github.com/flutter/evals.git
-pip install -e <path-to-evals>/packages/dash_evals
-dart pub global activate devals --source path <path-to-evals>/packages/devals_cli
-
-
-## TODO: Integrate in the new repo. This is wrong for this repo
+git clone https://github.com/flutter/evals.git && cd evals
 python3 -m venv .venv
 source .venv/bin/activate
 pip install -e "packages/dash_evals[dev]"
 pip install -e "packages/dataset_config_python[dev]"
+dart pub global activate devals --source path packages/devals_cli
 ```
 
 This installs two things:

diff --git a/docs/guides/tutorial.md b/docs/guides/tutorial.md
@@ -169,7 +169,7 @@ samples:
 | `tests.path` | Path to test files the scorer runs against the generated code. |
 
 > [!NOTE]
-> See [Tasks](config/tasks.md) and [Samples](config/samples.md) for the
+> See [Tasks](../reference/configuration_reference.md#task-files) and [Samples](../reference/configuration_reference.md#sample-files) for the
 > complete field reference.
 
 ---
@@ -215,7 +215,7 @@ That's the minimal job — it will:
 >   with_context:
 >     context_files: [./context_files/dart_docs.md]
 > ```
-> See [Configuration Overview](config/about.md#variants) for details.
+> See [Configuration Overview](../reference/configuration_reference.md#variants) for details.
 
 ---
 
@@ -281,7 +281,7 @@ devals view path/to/logs
 Now that you've run your first custom evaluation, here are some things to try:
 
 - **Add more samples** to your task: `devals create sample`
-- **Try different task types** — `question_answer`, `bug_fix`, or `flutter_code_gen`. See [all available task functions](../packages/dash_evals.md).
+- **Try different task types** — `question_answer`, `bug_fix`, or `flutter_code_gen`. See [all available task functions](../contributing/packages/dash_evals.md).
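-- **Add variants** to test how context files or MCP tools affect performance. See [Variants](config/about.md#variants).
+- **Add variants** to test how context files or MCP tools affect performance. See [Variants](../reference/configuration_reference.md#variants).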
 - **Run multiple models** by adding more entries to the `models` list in your job file
-- **Read the config reference** for [Jobs](config/jobs.md), [Tasks](config/tasks.md), and [Samples](config/samples.md)
+- **Read the config reference** for [Jobs](../reference/configuration_reference.md#job-files), [Tasks](../reference/configuration_reference.md#task-files), and [Samples](../reference/configuration_reference.md#sample-files)
diff --git a/docs/reference/glossary.md b/docs/reference/glossary.md
@@ -68,6 +68,6 @@ Key terminology for understanding the evals framework.
 
 ---
 
-See the [Configuration Overview](./config/about.md) for detailed configuration file documentation.
+See the [Configuration Reference](./configuration_reference.md) for detailed configuration file documentation.
 
 [Learn more about Inspect AI](https://inspect.aisi.org.uk/)
diff --git a/firebase.json b/firebase.json
@@ -1,6 +1,6 @@
 {
   "hosting": {
-    "site": "evals-docs",
+    "site": "dash-evals-docs",
     "public": "docs/_build/html",
     "ignore": [
       "firebase.json",
@@ -9,4 +9,4 @@
       "**/dart_docs/"
     ]
   }
-}
+}
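
Most of the docs hunks above repoint relative Markdown links after files moved (for example `config/about.md` to `../reference/configuration_reference.md`). As a possible follow-up, a small check along these lines could flag relative links whose target file no longer exists. This is only a sketch and not part of this change: `find_broken_links` is a hypothetical helper name, it checks file existence only (not `#anchor` fragments), and it assumes plain inline `[text](target)` links.

```python
import re
from pathlib import Path

# Matches inline Markdown links such as [Tasks](../reference/cli.md#usage).
_LINK_RE = re.compile(r"\[[^\]]*\]\(([^)\s]+)\)")


def find_broken_links(docs_root: Path) -> list[tuple[Path, str]]:
    """Return (file, target) pairs for relative links that do not resolve.

    Hypothetical helper for illustration; not an existing dash_evals API.

    Args:
        docs_root: Directory to scan recursively for .md files.

    Returns:
        List of (markdown file, link target) pairs whose target file is missing,
        ordered by file path and then by position in the file.
    """
    broken = []
    for md_file in sorted(docs_root.rglob("*.md")):
        text = md_file.read_text(encoding="utf-8")
        for target in _LINK_RE.findall(text):
            # Skip external links and same-page anchors.
            if target.startswith(("http://", "https://", "mailto:", "#")):
                continue
            # Drop the #anchor; only the file part is checked here.
            file_part = target.split("#", 1)[0]
            if not (md_file.parent / file_part).exists():
                broken.append((md_file, target))
    return broken
```

Calling `find_broken_links(Path("docs"))` from the repository root would list each stale link together with the file that contains it.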