8 changes: 8 additions & 0 deletions CONCEPTS.md
@@ -33,3 +33,11 @@ Shared domain vocabulary for this project — entities, named processes, and sta
**Gate policy** — The explicit rule that decides whether repeated attempts pass CI, such as `all_attempts_successful`, `any_attempt_successful`, `attempt_success_rate_at_least`, or `mean_pass_rate_at_least`. Without a repeat-run gate policy, AgentV preserves the normal single-run gate behavior and treats repeat statistics as report data.

**Flaky eval outcome** — A repeat-run aggregate whose attempts disagree, or whose failure classification points at verifier, infrastructure, or timeout instability rather than a stable model-quality failure.

## Release Channels

**Stable release** — A package publication channel whose surfaces are treated as compatibility commitments for normal users.

**Next tag** — A prerelease package channel used to validate upcoming AgentV surfaces before they become stable compatibility commitments.

Next-tag-only surfaces may be hard-corrected before stable release when preserving them would encode an unsafe or misleading contract. Stable-release surfaces need an explicit compatibility or migration strategy.
2 changes: 1 addition & 1 deletion apps/cli/src/commands/results/remote.ts
@@ -156,7 +156,7 @@ export interface ResultsPublishOverrides {
readonly remote?: string;
readonly auto_push?: boolean;
readonly require_push?: boolean;
readonly push_conflict_policy?: 'block' | 'backup_and_force_push';
readonly push_conflict_policy?: 'block';
}

const REMOTE_RUN_PREFIX = 'remote::';
15 changes: 15 additions & 0 deletions apps/dashboard/src/lib/project-sync-status.test.ts
@@ -205,6 +205,21 @@ describe('buildProjectSyncFeedback', () => {
expect(feedback.message).toContain('pulled remote results');
});

it('surfaces auto-merged remote changes in successful sync feedback', () => {
const feedback = buildProjectSyncFeedback({
configured: true,
available: true,
sync_status: 'clean',
auto_merged_remote: true,
push_performed: true,
run_count: 2,
});

expect(feedback.kind).toBe('success');
expect(feedback.message).toContain('Merged remote (auto)');
expect(feedback.message).toContain('pushed local results');
});

it('keeps blocked sync feedback explicit', () => {
expect(
buildProjectSyncFeedback({
1 change: 1 addition & 0 deletions apps/dashboard/src/lib/project-sync-status.ts
@@ -334,6 +334,7 @@ export function buildProjectSyncFeedback(status: RemoteStatusResponse): {
const actions = [
status.commit_created ? 'committed pending metadata' : undefined,
status.pull_performed ? 'pulled remote results' : undefined,
status.auto_merged_remote ? 'Merged remote (auto)' : undefined,
status.push_performed ? 'pushed local results' : undefined,
].filter((action): action is string => action !== undefined);

3 changes: 2 additions & 1 deletion apps/dashboard/src/lib/types.ts
@@ -471,7 +471,7 @@ export interface RemoteStatusResponse {
local_dir?: string;
path?: string;
auto_push?: boolean;
push_conflict_policy?: 'block' | 'backup_and_force_push';
push_conflict_policy?: 'block';
branch_prefix?: string;
run_count?: number;
last_synced_at?: string;
@@ -500,6 +500,7 @@ export interface RemoteStatusResponse {
pull_performed?: boolean;
push_performed?: boolean;
commit_created?: boolean;
auto_merged_remote?: boolean;
target_branch?: string;
remote_commit?: string;
local_commit?: string;
4 changes: 2 additions & 2 deletions apps/web/src/content/docs/docs/tools/dashboard.mdx
@@ -295,7 +295,7 @@ projects:
push_conflict_policy: block
```

`results.repo.remote` is the Git remote URL AgentV fetches and pushes. `results.repo.path: .` stores completed run artifacts on a dedicated branch of the source repository without checking out that branch in the source worktree. AgentV manages the local Git remote alias for that URL, so the normal config stays portable across machines. When `results.repo.remote` is omitted, `results.repo.path` means an existing local Git checkout whose object database and refs AgentV should write to, and the branch defaults to `agentv/results/v1`. AgentV creates the branch automatically on first publish and commits only AgentV result paths into it. `sync.auto_push: false` keeps the result commit local; set it to `true` to push the branch best-effort after each completed run. `sync.require_push: true` is for CI workflows where a push failure should fail the command after local artifacts are written. `sync.push_conflict_policy` defaults to `block`; the `backup_and_force_push` value is deprecated and no longer force-pushes. Non-fast-forward result branch pushes are auto-merged with artifact-aware Git merge drivers and pushed as a fast-forward, so the canonical results branch is never force-pushed or rewritten. Genuine overlay conflicts route to a timestamped temp branch plus a GitHub compare link for a human merge instead.
`results.repo.remote` is the Git remote URL AgentV fetches and pushes. `results.repo.path: .` stores completed run artifacts on a dedicated branch of the source repository without checking out that branch in the source worktree. AgentV manages the local Git remote alias for that URL, so the normal config stays portable across machines. When `results.repo.remote` is omitted, `results.repo.path` means an existing local Git checkout whose object database and refs AgentV should write to, and the branch defaults to `agentv/results/v1`. AgentV creates the branch automatically on first publish and commits only AgentV result paths into it. `sync.auto_push: false` keeps the result commit local; set it to `true` to push the branch best-effort after each completed run. `sync.require_push: true` is for CI workflows where a push failure should fail the command after local artifacts are written. `sync.push_conflict_policy` defaults to `block`; the removed `backup_and_force_push` value is rejected with migration guidance because AgentV never force-pushes result branches. Non-fast-forward result branch pushes are auto-merged with artifact-aware Git merge drivers and pushed as a fast-forward, so the canonical results branch is never force-pushed or rewritten. Genuine overlay conflicts route to a timestamped temp branch plus a GitHub compare link for a human merge instead.

For a separate results repository, use `results.repo.remote` and an optional managed clone `results.repo.path`:

@@ -425,7 +425,7 @@ After sync, newly fetched remote runs appear in the list with a **remote** sourc
- Safe uncommitted changes under the configured results repo's owned result and metadata paths, such as remote tag overlays under `metadata/runs/**`, are committed and pushed when `sync.auto_push: true`.
- A local results repo that is ahead is pushed when `sync.auto_push: true` and the committed paths are all under `.agentv/results/**`.
- Dirty non-results files, dirty metadata plus remote changes, unresolved conflicts, missing upstream branches, non-results commits ahead, and rejected pushes are blocked instead of reset.
- Non-fast-forward result branch pushes never force-push. AgentV runs a bounded fetch → merge → push loop that absorbs concurrent remote writes with a real merge commit using artifact-aware Git merge drivers (union for the append-only `index.jsonl`, a JSON-union driver for tag and feedback overlays), so the common append-mostly case auto-merges and pushes as a fast-forward. The `sync.push_conflict_policy: backup_and_force_push` value is deprecated and no longer force-pushes; it now auto-merges like the default and emits a one-time deprecation notice.
- Non-fast-forward result branch pushes never force-push. AgentV runs a bounded fetch → merge → push loop that absorbs concurrent remote writes with a real merge commit using artifact-aware Git merge drivers (union for the append-only `index.jsonl`, a JSON-union driver for tag and feedback overlays), so the common append-mostly case auto-merges and pushes as a fast-forward. When Dashboard sync absorbs concurrent remote changes this way, the success feedback includes **Merged remote (auto)**. The removed `sync.push_conflict_policy: backup_and_force_push` value is rejected with migration guidance; remove the field or set it to `block`.
- When a genuine overlay conflict cannot be auto-merged, AgentV does not touch the canonical branch. It pushes the local work to a fresh timestamped `agentv/results-sync/<timestamp>-<branch-slug>-<random>` branch and reports `needs_human_merge` with a `pending_merge` block (temp branch, target branch, and a GitHub compare URL when the remote is on GitHub). The toolbar shows a **Pending merge** card: open the link to merge the branch into the canonical target on GitHub (GitHub's pull request is the conflict surface — AgentV builds no merge UI), then click **I merged it — resync**. That resumes canonical sync by fast-forward-pulling the merged target. A premature click is a safe no-op — local work stays intact and the next sync re-creates a temp branch.

When sync is blocked, Dashboard keeps the local clone intact and shows the `block_reason`, `dirty_paths` or `conflicted_paths`, `git_status`, and a compact `git_diff_summary` so you can resolve the results repo manually before syncing again.
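
The bounded fetch → merge → push loop described in the list above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not AgentV's implementation: the `GitOps` interface, the `absorbRemoteAndPush` function, the `maxAttempts` default, and the `pushed` and `blocked` outcome shapes are hypothetical names introduced here. Only the `needs_human_merge` status and the temp-branch fallback come from the documentation.

```typescript
// Hypothetical abstraction over the Git steps; not AgentV's internal API.
interface GitOps {
  fetch(): void;
  // Returns false when the merge drivers cannot resolve a conflict.
  merge(): boolean;
  // Returns false when the remote rejects the push as non-fast-forward.
  push(): boolean;
  // Pushes local work to a fresh temp branch and returns its name.
  pushToTempBranch(): string;
}

type SyncOutcome =
  | { status: 'pushed'; attempts: number }
  | { status: 'needs_human_merge'; tempBranch: string }
  | { status: 'blocked'; reason: string };

// Called after an initial push was rejected as non-fast-forward.
// The loop never force-pushes: it either merges and pushes a fast-forward,
// hands a genuine conflict to a human, or gives up after maxAttempts.
function absorbRemoteAndPush(git: GitOps, maxAttempts = 3): SyncOutcome {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    git.fetch();
    if (!git.merge()) {
      // Genuine conflict: leave the canonical branch untouched.
      return { status: 'needs_human_merge', tempBranch: git.pushToTempBranch() };
    }
    if (git.push()) {
      return { status: 'pushed', attempts: attempt };
    }
    // Another writer pushed in between; fetch and merge again.
  }
  return {
    status: 'blocked',
    reason: `push still rejected after ${maxAttempts} attempts`,
  };
}
```

Bounding the loop means a busy remote cannot keep a sync spinning forever. In this sketch an exhausted loop reports a blocked state, which is an assumption chosen to mirror the "blocked instead of reset" rule above.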
2 changes: 1 addition & 1 deletion apps/web/src/content/docs/docs/tools/results.mdx
@@ -239,7 +239,7 @@ The CLI contract is deliberately narrow: `agentv results` manages local result a

Use these supported remote workflows instead:

- **Automatic publishing:** configure `projects[].results` or top-level `results`; new `agentv eval` and `agentv pipeline bench` runs publish completed artifacts after the run completes. Use `repo.remote` with `repo.path: .` and `repo.branch: agentv/results/v1` to store primary result records on a dedicated branch of the source repo without requiring a machine-local Git remote name. AgentV reserves `agentv/results/v1` for primary results and `agentv/artifacts/v1` for heavy artifact payloads. When `index.jsonl` rows point trace or transcript payloads at `agentv/artifacts/v1`, automatic publishing stores those bytes on that artifact branch in the same remote and publishes pointer keys such as `runs/<run-path>/<pointer.path>`. The configured results branch remains the metadata/control plane (`index.jsonl`, `benchmark.json`, tags, and pointers) instead of duplicating canonical trace/transcript payload bodies. Local pre-publish run workspaces can still contain those files beside the manifest so local tools keep working. Mutable run tags are stored as `tags.json` with a `tag_revision`; there is no tag event log in the normal results layout. `results.repo.path` without `results.repo.remote` means an existing local Git checkout, distinct from `workspace.repos[].repo`, which is a portable repository identity. AgentV manages any local Git remote alias internally. Set `sync.auto_push: true` to push after publish, or `sync.require_push: true` in CI to fail when that push fails. Non-fast-forward result branch pushes never force-push: AgentV auto-merges concurrent remote writes with artifact-aware Git merge drivers (a union driver for the append-only `index.jsonl`, a JSON-union driver for tag and feedback overlays) and pushes the merge as a fast-forward, and routes a genuine overlay conflict to a timestamped `agentv/results-sync/...` branch plus a GitHub compare/PR link for a human merge. 
`sync.push_conflict_policy: backup_and_force_push` is deprecated and no longer force-pushes — it now auto-merges like the default `block` and emits a one-time deprecation notice. While an eval is still running, [WIP checkpoints](/docs/tools/wip-checkpoints/) can keep partial run output durable on `agentv/wip/...` branches when auto-push is enabled.
- **Automatic publishing:** configure `projects[].results` or top-level `results`; new `agentv eval` and `agentv pipeline bench` runs publish completed artifacts after the run completes. Use `repo.remote` with `repo.path: .` and `repo.branch: agentv/results/v1` to store primary result records on a dedicated branch of the source repo without requiring a machine-local Git remote name. AgentV reserves `agentv/results/v1` for primary results and `agentv/artifacts/v1` for heavy artifact payloads. When `index.jsonl` rows point trace or transcript payloads at `agentv/artifacts/v1`, automatic publishing stores those bytes on that artifact branch in the same remote and publishes pointer keys such as `runs/<run-path>/<pointer.path>`. The configured results branch remains the metadata/control plane (`index.jsonl`, `benchmark.json`, tags, and pointers) instead of duplicating canonical trace/transcript payload bodies. Local pre-publish run workspaces can still contain those files beside the manifest so local tools keep working. Mutable run tags are stored as `tags.json` with a `tag_revision`; there is no tag event log in the normal results layout. `results.repo.path` without `results.repo.remote` means an existing local Git checkout, distinct from `workspace.repos[].repo`, which is a portable repository identity. AgentV manages any local Git remote alias internally. Set `sync.auto_push: true` to push after publish, or `sync.require_push: true` in CI to fail when that push fails. Non-fast-forward result branch pushes never force-push: AgentV auto-merges concurrent remote writes with artifact-aware Git merge drivers (a union driver for the append-only `index.jsonl`, a JSON-union driver for tag and feedback overlays) and pushes the merge as a fast-forward, and routes a genuine overlay conflict to a timestamped `agentv/results-sync/...` branch plus a GitHub compare/PR link for a human merge. 
The removed `sync.push_conflict_policy: backup_and_force_push` value is rejected with migration guidance; remove the field or set it to `block`. While an eval is still running, [WIP checkpoints](/docs/tools/wip-checkpoints/) can keep partial run output durable on `agentv/wip/...` branches when auto-push is enabled.
- **Manual Dashboard sync:** run `agentv dashboard`, open the project, and use **Sync Project**.
- **Manual API sync:** while Dashboard is running, call `GET /api/projects/:projectId/remote/status` or `POST /api/projects/:projectId/remote/sync` for project-scoped automation. Single-project sessions also expose `GET /api/remote/status` and `POST /api/remote/sync`.
- **Git escape hatch:** for advanced recovery, inspect or repair the configured `projects[].results.repo.path` clone with `git` directly, then sync again.
13 changes: 8 additions & 5 deletions docs/adr/2026-06-24-no-force-push-results-sync.md
@@ -69,9 +69,11 @@ all of it and is safe: a premature OK just pulls a target lacking the local work
re-diverges on the next push, and re-creates a temp branch — no data loss, no
force push.

`backup_and_force_push` is **deprecated, not removed**: the config value still
validates but now auto-merges like the default and emits a one-time deprecation
notice, so shipped surfaces referencing it keep working.
`backup_and_force_push` is **removed** from supported config:
the value shipped only on the `next` npm tag before stable release, so AgentV
now rejects it with migration guidance instead of preserving a compatibility
alias. Remove the field or set `sync.push_conflict_policy: block`; AgentV never
force-pushes result branches.

## Consequences

@@ -112,8 +114,9 @@ notice, so shipped surfaces referencing it keep working.
Delivered in phases under epic av-raf (all non-breaking):

- Phase 0 — `.gitattributes` + `agentv-json` merge driver registration (#1506).
- Phase 1 — bounded `fetch → merge → push` loop replacing the force-push path;
`backup_and_force_push` deprecated (#1506).
- Phase 1 — bounded `fetch → merge → push` loop replacing the force-push path
(#1506); `backup_and_force_push` hard-deprecated before stable release
(#1510).
- Phase 2 — temp-branch fallback + `confirm-merge` (OK-to-resync) API (#1507).
- Phase 3 — Dashboard **Pending merge** card with the GitHub link + resync button
(#1508).
75 changes: 75 additions & 0 deletions docs/solutions/conventions/hard-correct-next-tag-only-surfaces.md
@@ -0,0 +1,75 @@
---
title: Hard-correct next-tag-only surfaces before stable release
date: 2026-06-25
category: conventions
module: Release compatibility
problem_type: convention
component: development_workflow
severity: medium
applies_when:
- Removing or renaming a config value, wire field, CLI flag, or public API surface
- Deciding whether a shipped-looking surface needs backward compatibility
tags: [release-channel, compatibility, deprecation, config-schema]
---

# Hard-correct next-tag-only surfaces before stable release

## Context

AgentV briefly exposed `results.sync.push_conflict_policy: backup_and_force_push`
on the npm `next` tag while replacing force-push results sync with a no-force
merge loop. Treating that as a stable shipped surface would have kept a
misleading compatibility alias around even though the value contradicted the new
product invariant: AgentV never force-pushes result branches.

## Guidance

When checking whether a config value or public surface has shipped, distinguish
release channels:

- Stable npm releases require normal compatibility handling: preserve behavior,
soft-deprecate, or provide an explicit migration path.
- `next`-only releases can be hard-corrected before the surface reaches stable,
especially when preserving the surface would encode a dangerous or misleading
contract.

For removed config values, make the correction explicit:

```yaml
results:
sync:
# Remove unsupported aliases and use the stable default.
push_conflict_policy: block
```

If existing local registries or generated config may contain the removed value,
either reject it with migration guidance or drop it during a registry migration
that rewrites the supported shape on the next save.
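
The "reject it with migration guidance" option can be sketched as a small validator. This is an illustrative sketch under stated assumptions rather than AgentV's loader: `parsePushConflictPolicy`, `PolicyResult`, and `REMOVED_VALUES` are hypothetical names, and the message wording is only modeled on the guidance above.

```typescript
// Hypothetical validator; names and messages are illustrative, not AgentV's API.
type PushConflictPolicy = 'block';

type PolicyResult =
  | { ok: true; value: PushConflictPolicy | undefined }
  | { ok: false; message: string };

// Values that appeared only on a prerelease channel and were later removed,
// mapped to the migration guidance shown to the user.
const REMOVED_VALUES = new Map<string, string>([
  ['backup_and_force_push', "Remove the field or set it to 'block'."],
]);

function parsePushConflictPolicy(raw: unknown, configPath: string): PolicyResult {
  // An omitted field is valid and falls back to the default behavior.
  if (raw === undefined) return { ok: true, value: undefined };
  if (raw === 'block') return { ok: true, value: 'block' };

  if (typeof raw === 'string') {
    const guidance = REMOVED_VALUES.get(raw);
    if (guidance !== undefined) {
      // Removed value: say exactly how to migrate instead of a generic error.
      return {
        ok: false,
        message: `results.sync.push_conflict_policy: '${raw}' in ${configPath} is no longer supported. ${guidance}`,
      };
    }
  }

  // Anything else was never a supported value.
  return {
    ok: false,
    message: `Invalid results.sync.push_conflict_policy in ${configPath}, expected 'block'`,
  };
}
```

Checking removed values before the generic branch keeps the migration hint from being swallowed by an "expected 'block'" message that would not tell the user why a previously accepted value stopped working.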

## Why This Matters

Pre-release tags are useful for discovering wrong API names and unsafe contracts.
If every `next` exposure becomes permanent compatibility debt, the project loses
the ability to correct those mistakes before stable release. The compatibility
bar should protect stable users without forcing unsafe pre-release names into
the long-term schema.

## When to Apply

- A value, flag, or field appeared only on npm `next` or another prerelease
channel.
- The replacement behavior is already stable and safer.
- Keeping the old surface would confuse users about current behavior or
preserve a hazardous name.

## Examples

`backup_and_force_push` should not remain a supported
`results.sync.push_conflict_policy` value after the force-push implementation is
removed. Even though it appeared on a published `next` tarball, the stable
migration is to remove the field or set it to `block`; AgentV's actual behavior
is a no-force-push merge loop.

## Related

- docs/adr/2026-06-24-no-force-push-results-sync.md
17 changes: 8 additions & 9 deletions packages/core/src/evaluation/loaders/config-loader.ts
@@ -35,7 +35,7 @@ export type ExecutionDefaults = {
readonly pool_slots?: number;
};

export type ResultPushConflictPolicy = 'block' | 'backup_and_force_push';
export type ResultPushConflictPolicy = 'block';

export type ResultsConfig = {
readonly mode?: 'github';
@@ -782,21 +782,20 @@ export function parseResultsConfig(raw: unknown, configPath: string): ResultsCon
logWarning(`Invalid results.sync.require_push in ${configPath}, expected boolean`);
return undefined;
}
if (
syncObj.push_conflict_policy !== undefined &&
syncObj.push_conflict_policy !== 'block' &&
syncObj.push_conflict_policy !== 'backup_and_force_push'
) {
if (syncObj.push_conflict_policy === 'backup_and_force_push') {
logWarning(
`Invalid results.sync.push_conflict_policy in ${configPath}, expected 'block' or 'backup_and_force_push'`,
`results.sync.push_conflict_policy: 'backup_and_force_push' in ${configPath} is no longer supported. Remove the field or set it to 'block'; AgentV never force-pushes result branches.`,
);
return undefined;
}
if (syncObj.push_conflict_policy !== undefined && syncObj.push_conflict_policy !== 'block') {
logWarning(`Invalid results.sync.push_conflict_policy in ${configPath}, expected 'block'`);
return undefined;
}
sync = {
...(typeof syncObj.auto_push === 'boolean' && { auto_push: syncObj.auto_push }),
...(typeof syncObj.require_push === 'boolean' && { require_push: syncObj.require_push }),
...((syncObj.push_conflict_policy === 'block' ||
syncObj.push_conflict_policy === 'backup_and_force_push') && {
...(syncObj.push_conflict_policy === 'block' && {
push_conflict_policy: syncObj.push_conflict_policy,
}),
};