
[release] v0.102.0 #4547

Open
github-actions[bot] wants to merge 52 commits into main from release/v0.102.0

@github-actions github-actions Bot commented Jun 4, 2026

New version v0.102.0 in

  • web
    • web/oss
    • web/ee
  • services
  • api
  • sdks
    • sdks/python
  • clients
    • clients/python
    • clients/typescript
  • kubernetes
    • kubernetes/helm

bekossy and others added 5 commits June 3, 2026 11:46
- Created unit tests for data transformation utilities including error extraction, response status preservation, and metadata stripping.
- Added tests for formatting utilities covering number, currency, latency, and percentage formatting.
- Implemented tests for path utilities to validate object navigation and manipulation.
- Developed tests for slug generation and validation functions.
- Added tests for template variable validation and extraction.
- Included tests for various validators including UUID and HTTP URL validation.
- Configured Vitest for running tests with coverage reporting and JUnit output.
…annotation packages

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…verage, template-variable alignment

- Replace `as any` fixture casts with `as unknown as T` in annotation tests
- Fix incorrect Annotation import source in testset-sync (now from @agenta/entities/annotation)
- Add Testcase type import and remove all as-any call-site casts in testset-sync
- Add falsy-root short-circuit tests for getValueAtPath (0, false, "", null)
- Realign template-variable tests to the strict envelope-slot behavior on main

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

vercel Bot commented Jun 4, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

| Project | Deployment | Actions | Updated (UTC) |
| --- | --- | --- | --- |
| agenta-documentation | Ready | Preview, Comment | Jun 8, 2026 12:12pm |


@dosubot dosubot Bot added the size:XS label (This PR changes 0-9 lines, ignoring generated files.) Jun 4, 2026
ardaerzin and others added 22 commits June 4, 2026 18:45
PR #4384 disabled EVALUATOR_FULL_PAGE_NAV_ENABLED because the app-style
playground was a regression for evaluators (lost the upstream-app
connection) and app-scoped observability defaulted to "invocation"
instead of "annotation" for evaluator workflows. This change addresses
both blockers and re-enables the flow by default.

Playground
- ConfigureEvaluatorPage: upstream app workflow can be connected via
  EntityPicker (skip-variant adapter, filtered to non-evaluator
  non-feedback workflows). Disconnect affordance on the picker
  trigger and as a popup footer.
- Standalone evaluator runs no longer require an upstream app
  (TestsetDropdown is always available; runDisabled gate removed).
- Playground chain traces now write evaluator references
  (evaluator / evaluator_variant / evaluator_revision slots) so the
  per-evaluator observability page can find them.
- EntityPicker search bar respects a new parentLabel option so app
  pickers no longer show "Search evaluator...".

Observability filters
- Per-workflow-kind trace_type default extracted into
  @agenta/entities (defaultTraceTypeForWorkflow): annotation for
  evaluators, invocation otherwise. Pure helper unit-tested with
  vitest.
- References scope filter adapts to the effective trace_type:
  evaluators with trace_type=annotation pin to references.evaluator,
  invocation pins to references.application, and "no trace_type"
  ORs across both slots so all traces mentioning the evaluator
  surface.
- Dialog reconciliation: live label flip while editing trace_type
  in the filter dialog ("Application ID" / "Evaluator ID") via an
  opt-in reconcileFilterRows callback on Filters; observability
  page provides an evaluator-workflow-aware reconciler.
- Filter persistence across reloads: per-app via atomWithStorage
  under "agenta:observability:filters", with __global__ fallback
  for project-level pages. Both userFilters and traceTypeChoice
  share one packed storage atom.
- Cleaner state machine for trace_type intent: tagged union
  (default / value / cleared) replaces the dual-atom dance that
  could silently revert.
- application_id URL param dropped for evaluator workflows; the
  query is gated on workflow context being settled to avoid
  firing with the wrong scope.
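
The trace_type default and the reference-scope rule from the list above can be sketched roughly as follows. Only the name `defaultTraceTypeForWorkflow` and the `references.evaluator` / `references.application` slots come from the commit message; the `WorkflowKind` type, both signatures, the `referenceScopeFor` helper, and the behaviour for non-evaluator workflows are assumptions made for illustration, not the code in @agenta/entities.

```typescript
type WorkflowKind = "evaluator" | "app"; // assumed shape of the workflow kind
type TraceType = "annotation" | "invocation";

// Evaluators default to annotation traces; every other workflow to invocation.
function defaultTraceTypeForWorkflow(kind: WorkflowKind): TraceType {
    return kind === "evaluator" ? "annotation" : "invocation";
}

// Hypothetical helper: which reference slots the scope filter should match.
function referenceScopeFor(kind: WorkflowKind, traceType: TraceType | undefined): string[] {
    // Assumption: non-evaluator workflows always scope to the application slot.
    if (kind !== "evaluator") return ["references.application"];
    if (traceType === "annotation") return ["references.evaluator"];
    if (traceType === "invocation") return ["references.application"];
    // No trace_type: OR across both slots so every trace mentioning the evaluator surfaces.
    return ["references.evaluator", "references.application"];
}
```

Keeping the default as a pure function is what makes it cheap to unit-test in isolation, as the commit notes.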

Tests
- vitest unit tests for defaultTraceTypeForWorkflow.
- Playwright acceptance for full-page playground: post-create
  nav, row click for LLM and declarative evaluators, direct URL,
  sidebar switcher; fixes the previously broken
  select-app-and-run test for the new flow.
CodeRabbit flagged 5 issues on the evaluator-full-page rollout PR.
This commit addresses each:

1. PlaygroundRouter — `is_feedback` evaluators skip the full-page swap.
   `workflowKind === "evaluator"` was too broad. Human/feedback
   evaluators are drawer-only in /evaluators (they capture human input,
   they don't run), so routing them to ConfigureEvaluatorPage produced
   a run-controls UI for a workflow with nothing to run. Added a
   `flags.is_feedback` exclusion next to the workflowKind check.

2. Sidebar — switcher filters out `is_feedback` evaluators.
   `nonArchivedEvaluatorsAtom` only filters by `deleted_at` and
   includes human evaluators; the switcher was exposing entries that,
   when clicked, would land on the (now-correctly-gated) generic
   <Playground /> for a feedback workflow. Filtered the list at the
   switcher boundary.

3. controls.ts — handle array-valued `trace_type` for in/not_in.
   The dialog dispatches `{operator: "in", value: ["annotation"]}` for
   the IN operator family, but the intent setter only normalized
   scalars — so the user's choice was silently dropped to
   `{kind: "cleared"}`. Normalize to an array, filter to enum values,
   and collapse single-value arrays back to a scalar. Multi-value
   selections (which mean "no filter" for a 2-value enum) still map
   to `cleared`.

4. Playwright — drop stale `[data-row-key]` poll in select-app-and-run.
   The test asserted post-create navigation to /apps/<id>/playground
   AFTER polling for the new row in the evaluators table — but the
   redirect wins first, the table disappears, and the poll became a
   timing-dependent failure. Removed the registry-side wait;
   evaluator-in-registry assertion is covered by the
   post-create-row-click test alongside.

5. ConfigureEvaluator/atoms.ts — fix persistedAppSelectionAtom race.
   `connectAppToEvaluatorAtom` persisted the app selection BEFORE
   `changePrimaryNode` ran, so a failed swap (returns `null` with no
   primary to swap from) left a stale localStorage record that the
   next mount re-hydrated into a phantom "connected" state. Moved the
   persist call to after both graph mutations succeed.
   `disconnectAppFromEvaluatorAtom` early-returned on no-downstream
   without clearing the persisted state, allowing the same phantom
   record to survive a disconnect attempt. Clear it on that branch
   too.

No behavior change for the happy-path full-page flow — these all
narrow edge cases the reviewer flagged.
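
The normalization described in item 3 could look something like the sketch below. The tagged union mirrors the default / value / cleared intent mentioned earlier in this PR; the function name `toTraceTypeIntent`, its signature, and the choice to ignore the operator are assumptions, so treat this as an illustration rather than the contents of controls.ts.

```typescript
type TraceType = "annotation" | "invocation";

type TraceTypeIntent =
    | {kind: "default"}
    | {kind: "value"; value: TraceType}
    | {kind: "cleared"};

function isTraceType(v: unknown): v is TraceType {
    return v === "annotation" || v === "invocation";
}

// Hypothetical normalizer: accepts a scalar or an array (in / not_in family).
function toTraceTypeIntent(raw: unknown): TraceTypeIntent {
    // Normalize to an array, keep only enum values, and de-duplicate.
    const candidates: unknown[] = Array.isArray(raw) ? raw : [raw];
    const valid = Array.from(new Set(candidates.filter(isTraceType)));
    const [only] = valid;
    // A single surviving value collapses back to a scalar choice.
    if (valid.length === 1 && only !== undefined) return {kind: "value", value: only};
    // Zero valid values, or both values of the 2-value enum, mean "no filter".
    return {kind: "cleared"};
}
```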
…ssion-fix

Resolves a single conflict in
`web/packages/agenta-entities/src/workflow/core/schema.ts` —
release v0.100.4 added `artifact_slug` / `variant_slug` to the
revision schema alongside the `workflow_slug` /
`workflow_variant_slug` fields this branch had introduced for
emitting evaluator references on playground chain runs.

Both sides added `workflow_slug` and `workflow_variant_slug`
with overlapping intent; resolution keeps all four fields
and merges the two doc comments into one that covers both
purposes (parent-workflow identification for ID-less callers
+ evaluator chain-trace emission).

No source behavior change — schema is additive on both sides.
…ssion-fix

Resolved conflicts:
- web/oss/src/components/Filters/Filters.tsx — kept this branch's
  `displayedFilter` (reconciles filter rows for evaluator workflows) with
  main's `filterContainerClass` plain-class styling.
- web/packages/agenta-playground/src/state/execution/executionRunner.ts —
  kept this branch's `stageReferences` builder which merges upstream app
  references with the evaluator's own self-references (via
  `buildEvaluatorSelfReferences`). Main's variant dropped references for
  evaluator stages, which was the regression PR #4474 is fixing — evaluator
  traces need `references.evaluator.slug` attached so they are searchable
  on the evaluator's /apps/<evalId>/traces page.
Issue: In the LLM-as-a-judge playground, switching the chained app from a
chat application to a completion application kept sending `context` and
`messages` from the previous app in the new request body.

Root cause: At `executionRunner.ts` for depth=0 (the root entity), the
runner spreads the entire row's data into `nodeInputs` (`{...data}`) and
hands it to the stage handle as `inputValues`. The downstream filter in
`resolveVariableValues` / `buildCompletionInputRow` correctly drops keys
that aren't in the entity's input variables — when `variables` is non-
empty. But when the entity's input ports haven't resolved yet (entity
mid-hydration) or genuinely declares no input variables, that filter
falls back to "spread every key from the row", which is exactly the
window in which stale chat-shape keys (`messages`, `context`) leak into
a completion request.

Fix: Filter `data` at the runner against the entity's declared
`inputSchema.properties` BEFORE building `nodeInputs`. This applies to
both the first execution (line ~417) and the repetition retries (line
~689). When the entity has no resolvable input schema, the helper falls
back to `{...data}` so workflows that genuinely depend on free-form
input (e.g. `__rawBody` app workflows whose variables live in
`__meta.variables`) keep working.

The fix is safe for chat mode: chat strips `messages` separately at line
587 of `executionItems.ts` and rebuilds the conversation from
`chatHistory` via `messageIdsAtomFamily(loadableId)` — independent of
`inputValues`.

Defense-in-depth: this complements the existing
`resolveVariableValues` filter rather than replacing it.
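
A minimal sketch of the runner-side filter described in this commit, assuming the schema is a plain JSON-schema-like object. The helper name `filterDataToEntityInputSchema` appears later in this PR, but the signature and the `InputSchema` type here are assumptions.

```typescript
type Row = Record<string, unknown>;

interface InputSchema {
    properties?: Record<string, unknown>;
}

// Keep only the row keys declared in inputSchema.properties.
// With no resolvable (or empty) properties, fall back to a copy of the whole row
// so free-form workflows keep receiving their input.
function filterDataToEntityInputSchema(data: Row, schema: InputSchema | undefined): Row {
    const properties = schema?.properties;
    const declared =
        properties !== undefined && typeof properties === "object" ? Object.keys(properties) : [];
    if (declared.length === 0) return {...data};

    const filtered: Row = {};
    for (const key of declared) {
        if (Object.prototype.hasOwnProperty.call(data, key)) filtered[key] = data[key];
    }
    return filtered;
}
```

With a row carrying `messages`, `context`, and `country`, and a schema that declares only `country`, the stale chat-shaped keys are left out of `nodeInputs`.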
The evaluator info notice in SingleLayout rendered with hardcoded
light-mode colors (bg-blue-50, text-gray-700) and was unreadable
against the dark UI. Add dark: variants to background, border,
icon, body text, and dismiss button to match the existing
dark:bg-blue-900/* pattern used elsewhere in the app.
Previous attempt used dark:text-gray-200 which conflicted with the
themeAwareColors CSS-variable layer — the gray scale is role-inverted
in dark mode, so dark:text-gray-200 resolved to a dark shade against
the dark callout background.

Switch overrides to the blue scale (not theme-flipped): dark:text-blue-50
for body text, dark:text-blue-300 for the icon, and dark:text-blue-200
for the dismiss button. All readable against dark:bg-blue-900/20.
The first #4525 fix only covered the depth=0 (root entity) path. In
the LLM-as-a-judge evaluator playground the chained app sits at
depth>0, where input construction goes through resolveChainInputs
(spreads testcaseData on the no-mapping branch) or
buildEvaluatorExecutionInputs (spreads testcaseData when the schema
allows additionalProperties). Both paths re-leak the stale
`messages` field from a previous chat app into the current target
entity's request body.

Add stripChatTransportForEntity — a targeted strip of known chat-
transport keys (currently just `messages`) that runs unless the
target entity's input schema explicitly declares them. Applied:

- depth=0 path: as a defense-in-depth pass after the strict
  filterDataToEntityInputSchema, so the spread fallback (taken
  while the new app's schema is mid-hydration) can't leak the
  stale field either.
- depth=0 repetition path: same.
- depth>0 path: pre-filters `data` before chain / evaluator input
  construction. Uses a targeted strip (rather than the strict
  schema filter) so evaluators that legitimately depend on
  additionalProperties: true spread of testcase columns keep
  receiving them.

The helper short-circuits to the input reference when no chat
transport keys are present, so there's no allocation in the
common path.
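
The targeted strip and its short-circuit might be sketched as below. `stripChatTransportForEntity` and the single `messages` key come from the commit message; the signature (a set of declared keys passed in) is an assumption.

```typescript
type Row = Record<string, unknown>;

// Known chat-transport keys (currently just `messages`, per the commit).
const CHAT_TRANSPORT_KEYS = ["messages"] as const;

// Drop chat-transport keys unless the target entity's schema declares them.
// Returns the SAME object reference when there is nothing to strip.
function stripChatTransportForEntity(data: Row, declaredKeys: ReadonlySet<string>): Row {
    let result: Row | undefined;
    for (const key of CHAT_TRANSPORT_KEYS) {
        if (key in data && !declaredKeys.has(key)) {
            // Copy lazily, only once a key actually has to go.
            if (result === undefined) result = {...data};
            delete result[key];
        }
    }
    return result ?? data;
}
```

Returning the input reference unchanged lets callers detect "nothing stripped" with a plain identity check.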
…chat keys

Diagnostic telemetry for #4525 / AGE-3793 — three console.warn signals
in executionRunner so we can tell which layer is actually rescuing the
request body during a chat→completion swap:

1. filterDataToEntityInputSchema schema-not-resolved fallback — the
   strict allow-list can't run because workflowMolecule.selectors
   .ioSchemas returned no inputSchema.properties. Logs the entityId,
   the reason (no-properties vs properties-not-object), the data keys
   present, and whether `messages` is among them.
2. filterDataToEntityInputSchema empty-properties fallback — schema
   resolved but Object.keys(properties).length === 0. Same payload.
3. stripChatTransportForEntity strip — emits only when a chat-transport
   key was actually dropped, with which keys and whether the entity
   schema was resolved at the time of the strip.

All three are warn-level so they're visible in production console
without code changes, and gated to the unusual paths so the happy
path stays quiet.
…4525)

Move the stale-key fix from execution-time stripping to the layer
where it belongs: the testcase row store, on swap of the primary
entity. The testcaseMolecule is shared across loadables, so when the
user swaps the chained app in the LLM-as-a-judge playground (anchor
positional swap in setEntityIdsAtom), the same rows now carry every
key the previous primary populated — `messages` from a prior chat
app, completion variables from a prior completion app, etc.

Reconciliation strategy (decided with the user):
- Closed schema (additionalProperties: false): drop any row key not
  declared by the new entity's inputSchema.properties. Drops silently
  — no toast, no confirm modal. Matches what the user typed for the
  new app and nothing more.
- Open schema (additionalProperties not set or true): only strip the
  CHAT_TRANSPORT_KEYS set (currently `messages`). Evaluator
  workflows that legitimately depend on additionalProperties spread
  keep receiving their extra testcase columns.
- Schema not resolved: skip. The execution-time strip in
  executionRunner.ts is the fallback during this hydration window —
  it will be removed in a follow-up commit once the row-layer fix is
  verified end-to-end and a reactive deferred reconciliation handles
  the hydration race.

Mutation goes through testcaseMolecule.actions.batchUpdate with
stale keys set to `undefined` (the store's update reducer
interprets that as a delete). Drafts are created per affected row.

A console.warn is emitted in two cases:
- schema-not-resolved on swap (so we can verify the hydration race
  surface area in practice).
- one summary per swap that lists which keys were dropped per row
  and the schema mode (closed vs open).
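
The reconciliation strategy above can be illustrated by a helper that computes the per-row patch. Everything here other than `CHAT_TRANSPORT_KEYS` and the closed / open / unresolved policy is hypothetical: `buildPrunePatch` is an invented name, and the rule that a declared key is never dropped in open mode is an assumption.

```typescript
type Row = Record<string, unknown>;

interface InputSchema {
    properties?: Record<string, unknown>;
    additionalProperties?: unknown;
}

const CHAT_TRANSPORT_KEYS = new Set<string>(["messages"]);

// Returns a patch with stale keys set to `undefined` (read as "delete" by the
// store's update reducer, per the commit), or null when nothing should change.
function buildPrunePatch(row: Row, schema: InputSchema | undefined): Record<string, undefined> | null {
    if (schema?.properties === undefined) return null; // schema not resolved: skip

    const declared = new Set(Object.keys(schema.properties));
    const closed = schema.additionalProperties === false;
    const patch: Record<string, undefined> = {};

    for (const key of Object.keys(row)) {
        if (declared.has(key)) continue; // assumption: declared keys always survive
        // Closed schema drops every undeclared key; open schema only chat transport.
        if (closed || CHAT_TRANSPORT_KEYS.has(key)) patch[key] = undefined;
    }
    return Object.keys(patch).length > 0 ? patch : null;
}
```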
…chema (#4525)

Root cause of the `context` leak that survived the prior fixes: both
the row prune and the runtime filter read the allow-list from
`workflowMolecule.selectors.ioSchemas(entityId).inputSchema.properties`,
which is EMPTY for completion apps. Completion apps express their
variables as prompt template placeholders surfaced through
`inputPorts`, not through the static input schema. So the filter
degraded to its empty-properties fallback (keep everything) and only
the hardcoded chat-transport strip removed `messages` — `context`
(a real chat template var, stale on the row) sailed through.

Diagnostic confirmation from the repro console:
  [executionRunner.filter] empty-properties fallback
    {entityId, dataKeys: ['messages','context','country'], hasMessagesKey: true}

Fix: new shared helper `state/helpers/entityInputContract.ts` that
resolves the allow-list the SAME way executionItems builds request
`variables`:
  variablesFromInputPorts = inputPorts[].key
  variablesFromPayload    = requestPayload.__meta.variables
                            ?? requestPayload.variables ?? []
  variables = variablesFromInputPorts.length > 0
                ? variablesFromInputPorts : variablesFromPayload
  (+ `messages` when executionMode === 'chat')

`reconcileRowDataForEntity` applies the policy:
  - app with resolved contract → strict allow-list (drops context+messages)
  - evaluator → chat-transport-only strip (preserves additionalProperties
    spread of extra testcase columns)
  - unresolved contract → chat-transport-only safety strip

Both consumers now delegate to it:
  - playgroundController.pruneTestcaseRowsForEntity (swap-time, primary fix)
  - executionRunner.reconcileEntityInputData (exec-time hydration safety net)

This collapses the three ad-hoc helpers (filterDataToEntityInputSchema,
stripChatTransportForEntity, getEntityInputSchema) into one correct
source-of-truth resolution and removes the now-misleading
empty-properties / schema-not-resolved diagnostics.
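
As a rough illustration of the contract resolution and policy described above, here is a sketch under stated assumptions. `reconcileRowDataForEntity` is named in the commit, but its signature is not shown there; `resolveAllowedKeys`, the type shapes, and the use of `null` to mean "unresolved contract" are all invented for this example.

```typescript
type Row = Record<string, unknown>;

interface InputPort {
    key: string;
}

interface RequestPayload {
    __meta?: {variables?: string[]};
    variables?: string[];
}

// Hypothetical: resolve the allow-list the way request `variables` are built.
function resolveAllowedKeys(
    inputPorts: InputPort[],
    payload: RequestPayload | undefined,
    executionMode: "chat" | "completion",
): string[] {
    const fromPorts = inputPorts.map((port) => port.key);
    const fromPayload = payload?.__meta?.variables ?? payload?.variables ?? [];
    const variables = fromPorts.length > 0 ? fromPorts : fromPayload;
    return executionMode === "chat" ? [...variables, "messages"] : [...variables];
}

// Assumed signature; `allowed === null` stands for an unresolved contract.
function reconcileRowDataForEntity(row: Row, kind: "app" | "evaluator", allowed: string[] | null): Row {
    // Only apps with a resolved contract get the strict allow-list.
    const allowList = kind === "app" && allowed !== null ? new Set(allowed) : null;
    const out: Row = {};
    for (const [key, value] of Object.entries(row)) {
        // Evaluators and unresolved contracts only lose chat-transport keys.
        const keep = allowList !== null ? allowList.has(key) : key !== "messages";
        if (keep) out[key] = value;
    }
    return out;
}
```

Applied to the repro row (`messages`, `context`, `country`) with a completion app whose only input port is `country`, the strict branch keeps just `country`, while the evaluator branch keeps `context` and `country`.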
…nputs (#4525)

The prior fix stripped stale keys from the APP's request inputs, but the
trace still showed `context` because it surfaces in the downstream
EVALUATOR's {inputs, outputs} envelope. The evaluator reads the SAME
shared testcase row, and the evaluator policy (chat-transport-only)
intentionally preserves non-`messages` keys — so `context` survived
there. The UI row also still showed it.

Also: the swap-time prune in setEntityIdsAtom never fired for this flow —
the evaluator playground selects the app via add/remove node actions in
ConfigureEvaluator, not a setEntityIds positional swap (no
[playgroundController.prune] log appeared in the repro).

Fix: reconcile the shared testcase row against the ROOT entity's input
contract in webWorkerIntegration, right before execution — path-agnostic,
fires on every run regardless of how the app was selected. The cleaned
row is:
  - passed to the runner (so app request AND evaluator envelope are clean),
  - written back via loadableController.actions.updateRow (so the UI and
    future runs reflect it; undefined values delete the keys).

Evaluator-referenced columns are protected: collectDownstreamReferencedColumns
gathers testcase columns named by downstream evaluator `<input>_key`
settings (e.g. correct_answer_key → ground_truth) and passes them as
protectedKeys, so a strict clean against the app contract never drops
intentional evaluation inputs.

reconcileRowDataForEntity gains an optional protectedKeys set; a key
survives strict filtering when it's in the app allow-list OR protected.

Emits [webWorker.reconcile] when keys are dropped, listing the strategy,
dropped keys, and protected columns.
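
The protected-column rule can be sketched like this. `collectDownstreamReferencedColumns` is named in the commit, but its real inputs are not shown; the flat settings objects, the `_key` suffix scan, and the `cleanRowAgainstContract` helper are assumptions for illustration.

```typescript
type Row = Record<string, unknown>;

// Gather testcase columns named by evaluator settings whose name ends in "_key"
// (for example correct_answer_key -> ground_truth).
function collectDownstreamReferencedColumns(evaluatorSettings: Array<Record<string, unknown>>): Set<string> {
    const columns = new Set<string>();
    for (const settings of evaluatorSettings) {
        for (const [name, value] of Object.entries(settings)) {
            if (name.endsWith("_key") && typeof value === "string" && value !== "") {
                columns.add(value);
            }
        }
    }
    return columns;
}

// Hypothetical strict clean: a key survives when it is allowed OR protected.
// `writeBack` marks dropped keys as undefined so a row update can delete them.
function cleanRowAgainstContract(
    row: Row,
    allowed: ReadonlySet<string>,
    protectedKeys: ReadonlySet<string>,
): {cleaned: Row; writeBack: Record<string, undefined>} {
    const cleaned: Row = {};
    const writeBack: Record<string, undefined> = {};
    for (const [key, value] of Object.entries(row)) {
        if (allowed.has(key) || protectedKeys.has(key)) cleaned[key] = value;
        else writeBack[key] = undefined;
    }
    return {cleaned, writeBack};
}
```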
…4525)

Two follow-ups now that the row-reconciliation fix is verified working:

1. Clean-on-swap: the evaluator playground selects an app via
   changePrimaryNode + connectDownstreamNode (ConfigureEvaluator), not a
   setEntityIds positional swap — so the previously-wired swap-time prune
   never fired there. Add playgroundController.actions.reconcileRowsToPrimary
   and call it from connectAppToEvaluatorAtom AFTER connectDownstreamNode, so
   the shared testcase row is cleaned the instant the app changes (not only at
   run time). Running after the downstream connect means the evaluator's
   referenced columns (correct_answer_key → ground_truth, etc.) are protected
   from the strict app-contract clean.

   pruneTestcaseRowsForEntity now:
   - collects downstream-evaluator protected columns,
   - returns a status ('acted' | 'noop' | 'unresolved').

   reconcileRowsToPrimary handles the hydration race: if the new primary's
   inputPorts aren't resolved yet AND the entity isn't loaded, it subscribes
   to inputPorts and retries once, then unsubscribes. If the entity is loaded
   but genuinely has no variables, it doesn't subscribe (no dangling sub). The
   run-time reconciliation in webWorkerIntegration remains the backstop.

2. Remove diagnostic logs added while tracing the bug:
   [executionRunner.filter], [webWorker.reconcile],
   [playgroundController.prune]. The reconcile + writeback logic stays; only
   the console.warn telemetry is dropped.
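
The "subscribe, retry once, then unsubscribe" handling in item 1 could be sketched as follows. The status values and the name `reconcileRowsToPrimary` come from the commit; the injected `Deps` object and its three callbacks are hypothetical stand-ins for the real store wiring.

```typescript
type PruneStatus = "acted" | "noop" | "unresolved";
type Unsubscribe = () => void;

// Hypothetical dependencies, injected so the sketch stays store-agnostic.
interface Deps {
    prune: () => PruneStatus;
    isEntityLoaded: () => boolean;
    subscribeToInputPorts: (listener: () => void) => Unsubscribe;
}

function reconcileRowsToPrimary(deps: Deps): PruneStatus {
    const status = deps.prune();
    // Resolved, or loaded with genuinely no variables: do not subscribe.
    if (status !== "unresolved" || deps.isEntityLoaded()) return status;

    let done = false;
    let unsubscribe: Unsubscribe = () => {};
    unsubscribe = deps.subscribeToInputPorts(() => {
        if (done) return;
        done = true;
        unsubscribe(); // one retry only, then drop the subscription
        deps.prune();
    });
    // Covers a listener that fired synchronously, before `unsubscribe` was assigned.
    if (done) unsubscribe();
    return status;
}
```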
The beautified/markdown view forced H2 headings to uppercase via
text-transform, rewriting the user's own prompt text. H1 was also lighter
than H2, and H3-H6 had no styling. Apply a consistent best-practice scale
(descending sizes, shared weight/color/spacing) across H1-H6 in both light
and dark mode, with no case transform.
Drop the bottom bar that showed the GitHub, LinkedIn, and X icons and the
'Copyright © <year> | Agenta.' line from the platform layout.

Remove the FooterIsland component, its styles, and the footerHeight resize
observer that only existed to size the footer.
The message editors had the Text/Markdown view inverted: the view-mode
dropdown mapped markdownView to the wrong boolean, so picking 'Markdown'
showed raw source and the default 'Text' showed rendered markdown. Fixed
the mapping across every message editor (chat turns, prompt messages,
variable inputs, the JSON object field) and the live markdown toggle
button, whose icon and tooltip were also inverted.

Also:
- The view mode is now a shared, persisted atom (messageViewModeAtom), so
  switching one message switches all and the choice survives a refresh.
- Text mode renders with the editor's proportional font instead of
  monospace, with spacing that matches the rendered markdown view.
- Message text and placeholder align with the role label above them.
Adds a 'Run on' control to the evaluator (LLM-as-a-judge) playground header
so the first/empty state explains itself instead of leaving the user with two
disconnected loaders. Three modes, each drawing its own data-flow:

- Run directly on a test case  (Data -> Evaluator -> Score)
- Run on an app output         (Data -> App -> Output -> Evaluator -> Score) - default
- Run on a trace               (Trace -> Evaluator -> Score) - disabled for now

The mode is persisted per project; a connected app forces effective 'app' mode.
In app mode with no app connected, the run panel hides the testcases and shows a
centered 'Select an app' empty state (shared with the evaluator-creation drawer).
All colors come from the antd theme token so it follows light/dark mode.

Prompt playground is intentionally untouched.
The useStyles call cast its arg to StyleProps, but StyleProps was never
imported in Layout.tsx (a latent issue, flagged by review). With footerHeight
gone, StyleProps is just {themeMode}, so the cast is unnecessary. Pass the
arg directly; tsc confirms it type-checks.
ardaerzin added 7 commits June 5, 2026 23:19
QA round 2: in the evaluator playground, selecting an app → disconnecting
→ re-selecting the SAME app connected nothing in the UI (workflow selector
+ generation panel stayed on the 'Select an app' empty state).

Root cause (pinned via runtime instrumentation): the node graph IS correct
after reconnect — connectAppToEvaluatorAtom writes playgroundNodesAtom to
[app, evaluator] and a follow-up read confirms 2 nodes in the single store.
But on a disconnect→reconnect cycle jotai applies the two sequential
playgroundNodesAtom writes (changePrimaryNode → connectDownstreamNode)
WITHOUT notifying the mounted dependents, so selectedAppLabelAtom /
hasAppConnectedAtom (and the package's generation-panel atoms) never
recompute and the UI shows stale 'disconnected' state. First-connect and
disconnect notify fine; only the reconnect drops the notification.

Fix: after the graph mutations in connectAppToEvaluatorAtom, read the
node-derived display atoms (selectedAppLabelAtom, hasAppConnectedAtom) to
re-establish the dependency and flush the pending notification to their
subscribers. Verified locally: reconnect now updates both the selector and
the generation panel.

Also removes the temporary [B-repro] diagnostics added while root-causing.
…QA critical)

The critical QA bug: invoking an LLM-as-a-judge evaluator opened from the
drawer 422'd because the request shipped
references.evaluator_revision.id = "local-…" (an unsaved local-draft id),
which the backend /invoke validator rejects as a non-UUID.

buildEvaluatorSelfReferences (chain stage refs) was already guarded, but
references can also arrive from the requestPayload builder and from
trace-span extraction. Rather than chase each builder, add a single final
sanitization at the one chokepoint where the request body's references are
assembled (buildExecutionItem, after all sources are merged): drop any
reference id that is a local-draft or placeholder id, keep slug/version,
and drop a slot that ends up empty.

Path-agnostic — covers the drawer direct-invoke, the chained evaluator
playground, and any future reference source.
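
A minimal sketch of the final sanitization step, assuming each reference slot has optional `id`, `slug`, and `version` fields. The commit does not show how local-draft or placeholder ids are recognized; this example simply treats any non-UUID id as one, which is an assumption, and `sanitizeReferences` is a hypothetical name.

```typescript
interface Reference {
    id?: string;
    slug?: string;
    version?: string;
}

type References = Record<string, Reference>;

// Assumption: an id that is not a UUID is a local-draft or placeholder id.
const UUID_RE = /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i;

function sanitizeReferences(refs: References): References {
    const out: References = {};
    for (const [slot, ref] of Object.entries(refs)) {
        const cleaned: Reference = {};
        if (ref.id !== undefined && UUID_RE.test(ref.id)) cleaned.id = ref.id;
        // slug and version are kept as they are.
        if (ref.slug !== undefined) cleaned.slug = ref.slug;
        if (ref.version !== undefined) cleaned.version = ref.version;
        // A slot that ends up empty is dropped entirely.
        if (Object.keys(cleaned).length > 0) out[slot] = cleaned;
    }
    return out;
}
```

Running this once, after all reference sources are merged, is what makes the guard independent of which builder produced the bad id.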
…4474 QA E)

In the evaluator playground the primary (app) result row exposed an 'open
trace' affordance, but the downstream evaluator result card (DownstreamNodeCard)
rendered only its output fields — no way to open the evaluator's own trace to
debug a grade (QA 2026-06-05: 'show the trace links (icon) for evaluators too').

The downstream result already carries a traceId; the card just never read it.
Read it and pass a compact 'open trace' icon (SharedGenerationResultUtils in
actionsOnly mode) into NodeResultCard's headerActions slot, so it appears next
to the evaluator node name on hover — same trace drawer the app row opens.

Adds actionsOnly to the package's SharedGenerationResultUtilsProps provider
type (the OSS wrapper + entity component already support it).
…ions & Overview

Evaluators can be evaluated as subjects (#4237). Show those runs on the
evaluator's own Evaluations tab and Overview summaries:

- Add a run-list reference predicate (entities/evaluationRun/etl) that filters
  runs by the ROLE their references play - keeping runs where the workflow is
  the evaluated subject (application/invocation ref) and excluding runs where
  it was merely a grader (evaluator/annotation ref). Replaces the flaky
  meta.application heuristic with the structural data.steps source of truth.
- Wire the subject filter into the eval-runs fetch, with a hit-ratio meter
  reporting the v1->v2 escalation signal, and a bounded over-fetch so the
  fixed-size Overview summaries fill instead of falsely reading empty.
- Re-enable Overview eval summaries + Evaluations route for evaluators
  (sidebar links, route guards, DISABLED_FOR_EVALUATOR).
- Resolve the locked Apps filter chip to the workflow name for evaluators.
Evaluators aren't deployed to environments, but deploy actions leaked onto
their surfaces. Gate at the reusable chokepoints:

- DeployVariantButton self-guards via the workflow-level is_evaluator flag
  (correct even on v0 revisions), covering the revision drawer + every other
  reuse without per-call-site checks.
- Recent Prompts (VariantsOverview) passes hideDeployActions for evaluators,
  matching the variants dashboard.
Switch ArchivedAppsPage to the shared PageLayout with an inline back-arrow
title and no subtitle, matching the Archived Evaluators page.
…valuation modal

When the modal is app-scoped to an evaluator route, resolve the Application
panel label/kind from the evaluators list so it shows the evaluator's name
(not its raw id) - the app-scoped pre-lock never sets selectedWorkflowMeta.

Also drop the full evaluator query-result objects from the derived-evaluators
memo deps (and a dead humanEvaluatorsQuery subscription); they changed
identity every query tick and churned the modal's renders.
@mmabrouk mmabrouk marked this pull request as draft June 8, 2026 09:04
@mmabrouk mmabrouk marked this pull request as ready for review June 8, 2026 09:05
@dosubot dosubot Bot added python (Pull requests that update Python code) and typescript labels Jun 8, 2026
…ification-regression-fix

[FE Fix]: Re-enable full-page playground for evaluator workflows
@dosubot dosubot Bot added size:XXL (This PR changes 1000+ lines, ignoring generated files.) and removed size:L (This PR changes 100-499 lines, ignoring generated files.) labels Jun 8, 2026
…-annotation-packages

test(frontend): add unit tests for @agenta/shared and @agenta/annotation packages
The switcher used fullPagePlaygroundEvaluatorsAtom, which narrows to
evaluators that have a full-page playground (LLM, code) and so dropped the
declarative matchers (exact match, regex, similarity, json diff, contains
json, ...). Add nonHumanEvaluatorsAtom - non-archived evaluators with only the
human (is_feedback, resolved from the latest revision) exclusion - and point
the switcher at it, so every automatic evaluator is listed while human ones
stay out.
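
The selection rule behind `nonHumanEvaluatorsAtom` can be pictured as a plain filter. The `deleted_at` and `is_feedback` fields are mentioned in this PR; the `EvaluatorSummary` shape, the `latestRevisionFlags` field, and the function name are hypothetical, and the real atom presumably derives the list from store state rather than taking an array.

```typescript
// Assumed minimal shape; the real entity carries more fields.
interface EvaluatorSummary {
    id: string;
    deleted_at: string | null;
    latestRevisionFlags?: {is_feedback?: boolean};
}

// Keep every non-archived evaluator except human (feedback) ones.
function selectNonHumanEvaluators(evaluators: EvaluatorSummary[]): EvaluatorSummary[] {
    return evaluators.filter(
        (evaluator) =>
            evaluator.deleted_at === null && evaluator.latestRevisionFlags?.is_feedback !== true,
    );
}
```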
mmabrouk added 2 commits June 8, 2026 13:56
Two fixes found while auditing the bot against all open PRs:

- Only a non-empty Summary plus a demo (for functional changes) are required.
  Missing Testing/Checklist sections no longer close a PR. The demo is now
  detected anywhere in the body, not just the Demo section. This fixes a PR
  that had a YouTube demo and full testing notes but was closed for lacking
  the checklist section.
- Drop the 'reopened' trigger so a maintainer who manually reopens a flagged
  PR wins, instead of the bot immediately re-closing it. Auto-reopen on a
  fixed description still works via 'edited'/'synchronize'.
ci: stop PR bot over-closing on missing checklist + fix reopen loop
mmabrouk added 2 commits June 8, 2026 14:09
The evaluator drawer rendered by WorkflowRevisionDrawerWrapper reimplemented
the run panel gate as `runDisabled={!hasAppConnected}`, ignoring the run-on
mode. So switching its Run-on selector to 'test case' updated the header while
the panel kept showing the 'Select an app' empty state and demanding an app —
the page and creation drawer respected the mode, only this third surface didn't.

Route it through the shared useEvaluatorRunControls hook (+ SelectAppEmptyState
and the prop-less EvaluatorPlaygroundHeader), the same wiring the page and the
creation drawer use, so the gate is `runOnMode === 'app' && !hasAppConnected`
everywhere and the three surfaces can't drift again. Removes this drawer's
duplicated app adapter / app-select / run-gate logic.
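
The shared gate is small enough to state directly. In this sketch only the `'app'` mode literal and the gate expression come from the commit; the `"testcase"` and `"trace"` literals and both function names are assumptions, and `effectiveRunOnMode` reflects the earlier Run-on commit's note that a connected app forces 'app' mode.

```typescript
// Assumed literals for the three Run-on modes; only "app" appears in the source.
type RunOnMode = "testcase" | "app" | "trace";

// A connected app forces the effective mode to "app"; otherwise use the persisted choice.
function effectiveRunOnMode(persisted: RunOnMode, hasAppConnected: boolean): RunOnMode {
    return hasAppConnected ? "app" : persisted;
}

// The single gate every surface should share.
function isRunDisabled(runOnMode: RunOnMode, hasAppConnected: boolean): boolean {
    return runOnMode === "app" && !hasAppConnected;
}
```

Deriving the gate from one function, rather than re-implementing `!hasAppConnected` per surface, is the drift the commit is removing.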

Also drop the getDefaultStore() patch from useEvaluatorRunControls: runtime
debugging proved these surfaces are not in a scoped store (the drawer that was
broken is WorkflowRevisionDrawerWrapper, not the scoped-store CreateEvaluator
drawer), so the override was a no-op based on a wrong hypothesis.
feat(frontend): Run-on modes in the evaluator creation drawer (shared controls)
@dosubot dosubot Bot added the lgtm label (This PR has been approved by a maintainer) Jun 8, 2026
