feat(frontend): Run-on modes in the evaluator creation drawer (shared controls) #4557

Open
mmabrouk wants to merge 2 commits into fe-fix/app-workflow-router-unification-regression-fix from fe-feat/evaluator-drawer-run-on

Conversation

mmabrouk (Member) commented Jun 5, 2026

Why

The Run on selector (test case / app output / trace) was only wired into the full-page evaluator playground. The evaluator-creation drawer still hardcoded runDisabled={!hasAppConnected} and only showed the test-set dropdown after an app was connected — so in the drawer you were forced to pick an app even when you wanted to run the evaluator directly on a test case. The drawer had silently drifted out of sync with the page.

What

Rather than paste the run-on wiring into the drawer (a fourth copy), this extracts the logic the page and drawer were already duplicating and shares it:

  • useEvaluatorRunControls() — one hook for the app adapter, app-select handler, run-on mode + handlePickRunOn, and the run gate (runDisabled = runOnMode === "app" && !hasAppConnected).
  • EvaluatorRunControls — the run-on selector + app picker + disconnect affordance + test-set dropdown, as one cluster used by both the page header and the drawer header, so they can't diverge again.
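
The change to the run gate can be sketched as a pure function. This is a minimal illustration, not the code in this PR: the function names and the "testcase" and "trace" mode literals are assumptions; only the "app" literal and the two gate expressions come from the description above.

```typescript
// Hypothetical sketch of the run gate. Only the gate expressions and the
// "app" literal come from the PR description; other names are assumed.
type RunOnMode = "testcase" | "app" | "trace"

interface RunGateInput {
    runOnMode: RunOnMode
    hasAppConnected: boolean
}

// Old drawer behavior: running always required a connected app,
// regardless of the selected run-on mode.
function legacyDrawerRunDisabled(input: RunGateInput): boolean {
    return !input.hasAppConnected
}

// Shared gate: an app is required only when running on an app output.
function sharedRunDisabled(input: RunGateInput): boolean {
    return input.runOnMode === "app" && !input.hasAppConnected
}

// Test-case mode with no app connected: blocked before, allowed now.
const testCaseNoApp: RunGateInput = {runOnMode: "testcase", hasAppConnected: false}
console.log(legacyDrawerRunDisabled(testCaseNoApp)) // true
console.log(sharedRunDisabled(testCaseNoApp)) // false
```

Keeping the gate a function of mode and connection state alone is what lets the page and the drawer compute it identically.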

Result:

  • Page: behavior-preserving (just sources its controls from the shared hook/cluster).
  • Drawer: gains all three run-on modes, the run-on selector, a disconnect affordance, and an always-available test-set dropdown. Test-case mode now runs without forcing an app — the bug is fixed.
  • Removes the appWorkflowAdapter / handleAppSelect / evaluator-node-lookup triplication across the page body, drawer header, and drawer body.

Net: 218 insertions / 274 deletions across 5 files (2 new, 3 slimmed).

Notes

  • runOnMode stays persisted per project (shared by page and drawer); the per-evaluator question is tracked separately for a later PR, as discussed.
  • runDisabled only manifests where the run panel renders (the page and the expanded drawer); the collapsed/config-only drawer ignores it, unchanged.

Stacked on

Based on fe-fix/app-workflow-router-unification-regression-fix (the merged evaluator-playground branch, which already contains the page-side run-on feature from #4553).

Test plan

  • Open the New Evaluation flow → create-evaluator drawer → switch Run on to "Run directly on a test case": the test-case editor is usable and runs without selecting an app.
  • Switch to "Run on an app output" with no app: the run panel shows the "Select an app" empty state; pick an app → it runs.
  • Confirm the full-page evaluator playground is unchanged (modes, default, dark mode, disconnect).

The Run-on selector (test case / app output / trace) was only wired into the
full-page evaluator playground. The evaluator-creation drawer still hardcoded
`runDisabled={!hasAppConnected}` and only showed the test-set dropdown after an
app was connected, so it forced the user to pick an app even when they wanted to
run the evaluator directly on a test case.

Rather than copy the run-on wiring into the drawer (a fourth duplicate), extract
the shared logic the page and drawer were already duplicating:

- useEvaluatorRunControls(): app adapter, app-select handler, run-on mode +
  handlePickRunOn, and the runDisabled gate (runOnMode === 'app' && !hasAppConnected).
- EvaluatorRunControls: the run-on selector + app picker + disconnect + test-set
  cluster, shared by the page header and the drawer header so they can't drift.

The page is behavior-preserving; the drawer gains all three modes, the run-on
selector, a disconnect affordance, and an always-available test-set dropdown.
This also removes the adapter/handleAppSelect/evaluator-node triplication across
the page body, drawer header, and drawer body.

vercel Bot commented Jun 5, 2026

The latest updates on your projects.

| Project | Deployment | Actions | Updated (UTC) |
| --- | --- | --- | --- |
| agenta-documentation | Ready | Preview, Comment | Jun 5, 2026 1:45pm |

dosubot Bot added the size:L label (this PR changes 100-499 lines, ignoring generated files) on Jun 5, 2026

coderabbitai Bot commented Jun 5, 2026

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

github-actions Bot (Contributor) commented Jun 5, 2026

Railway Preview Environment

Preview URL https://gateway-production-d7bf.up.railway.app/w
Project agenta-oss-pr-4557
Image tag pr-4557-1bda40a
Status Deployed
Railway logs Open logs
Workflow logs View workflow run
Updated at 2026-06-05T13:54:26.050Z

The creation drawer renders inside EvaluationRunsTableStoreProvider, a scoped
jotai store that mirrors only a handful of global atoms. The playground state,
however, runs on the default store (the playground package uses
getDefaultStore() throughout). So in the drawer the run-on mode was read/written
in the scoped store while the playground lived in the default store — the two
split, and switching to test-case mode never reached the run panel: it stayed
stuck on the 'Select an app' empty state.

Read and write all run-on / playground atoms through getDefaultStore() in
useEvaluatorRunControls, mirroring the existing workaround in
usePreviewVariantConfig and TestsetCells. On the full page (no scoped store)
this is a no-op; in the drawer it aligns run-on state with the playground so
test-case mode shows the inputs/outputs as it does on the page.
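
The store split described above can be illustrated with a toy model. This sketch deliberately does not use jotai: `ToyStore`, `RUN_ON_KEY`, `simulate`, the "testcase" literal, and the "app" fallback are all hypothetical stand-ins, chosen only to show why a write to one store is invisible to a reader of another.

```typescript
// Toy model only: NOT jotai. ToyStore stands in for a jotai store, and the
// "app" fallback stands in for the run-on atom's default (an assumption).
class ToyStore {
    private values = new Map<string, string>()

    get(key: string, fallback: string): string {
        const value = this.values.get(key)
        return value === undefined ? fallback : value
    }

    set(key: string, value: string): void {
        this.values.set(key, value)
    }
}

const RUN_ON_KEY = "runOnMode" // hypothetical key

// The controls write the mode to one store; the run panel reads it from another.
function simulate(controlsStore: ToyStore, panelStore: ToyStore): string {
    controlsStore.set(RUN_ON_KEY, "testcase")
    return panelStore.get(RUN_ON_KEY, "app")
}

// Before the fix: the drawer controls use the scoped store while the
// playground reads the default store, so the panel never sees the change.
const splitResult = simulate(new ToyStore(), new ToyStore())

// After the fix: both sides go through the same (default) store.
const sharedStore = new ToyStore()
const alignedResult = simulate(sharedStore, sharedStore)

console.log(splitResult) // "app"
console.log(alignedResult) // "testcase"
```

On the full page there is only one store, so both arguments are already the same object, which is why the change is a no-op there.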
dosubot Bot added the size:XL label (this PR changes 500-999 lines, ignoring generated files) and removed the size:L label (this PR changes 100-499 lines, ignoring generated files) on Jun 5, 2026

Labels: Frontend, size:XL (this PR changes 500-999 lines, ignoring generated files)
