Summary
Workspace-Bench already has several useful isolation boundaries: task metadata contains rubrics and expected outputs, agent runs and rubric judging are separate stages, and agent_as_a_judge.py builds a restricted judge_view that exposes original inputs plus candidate outputs while avoiding GT-like directories such as output, output_cc, or gt.
I think it would be useful to add a small canary leakage audit that verifies these boundaries stay intact across harnesses and future runner changes.
Why this matters
For workspace benchmarks, leakage can be subtle. An agent does not need the full answer key to overfit; it may be enough to see:
- rubric-only wording,
- expected output hints beyond the task prompt,
- reference output filenames plus nearby GT artifacts,
- judge-only metadata during the agent execution phase,
- or stale files left in a restored workspace.
This is not a bug report. The current code already appears to care about this boundary. The proposal is to make the boundary testable.
Proposed canary audit
Add a tiny synthetic task or CI fixture with explicit canary strings placed in different visibility zones:
- Agent-visible input canary
  - Present in normal input files.
  - The agent is allowed to read this.
- Rubric-only canary
  - Present only in metadata.json rubrics.
  - Should not appear in the tested agent's prompt, trace, or workspace during the agent-run phase.
- Ground-truth canary
  - Present only in GT-like directories such as output, output_cc, or gt.
  - Should not be copied into the tested agent workspace.
  - Should not be copied into judge_view.
- Candidate-output canary
  - Present only if the tested agent generated it.
  - The judge is allowed to see this through judge_view/candidate_output.
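The zones above could be materialized by a small fixture builder. The following is only a sketch: the directory names `input/` and `gt/`, the file names, and the shape of `metadata.json` (a top-level `rubrics` list) are assumptions for illustration, not the actual Workspace-Bench task layout, and `make_canary` and `build_canary_fixture` are hypothetical helpers.

```python
import json
import uuid
from pathlib import Path


def make_canary(zone: str) -> str:
    """Return a unique, greppable marker for one visibility zone."""
    return f"CANARY-{zone}-{uuid.uuid4().hex}"


def build_canary_fixture(task_root: Path) -> dict[str, str]:
    """Write one canary per zone under task_root and return them keyed by zone.

    Layout is hypothetical: input/ for agent-visible files, metadata.json
    for rubrics, gt/ for ground truth.
    """
    canaries = {zone: make_canary(zone) for zone in ("INPUT", "RUBRIC", "GT")}

    # Agent-visible input canary: the agent is allowed to read this file.
    input_dir = task_root / "input"
    input_dir.mkdir(parents=True, exist_ok=True)
    (input_dir / "notes.txt").write_text(
        f"Project notes. {canaries['INPUT']}\n", encoding="utf-8"
    )

    # Rubric-only canary: lives only in the task metadata.
    metadata = {"rubrics": [f"Output mentions {canaries['RUBRIC']}"]}
    (task_root / "metadata.json").write_text(
        json.dumps(metadata, indent=2), encoding="utf-8"
    )

    # Ground-truth canary: lives only in a GT-like directory.
    gt_dir = task_root / "gt"
    gt_dir.mkdir(parents=True, exist_ok=True)
    (gt_dir / "expected.txt").write_text(f"{canaries['GT']}\n", encoding="utf-8")

    return canaries
```

The candidate-output canary is deliberately not written by the fixture, since it should exist only if the tested agent produces it. Random suffixes keep the markers from colliding with real task content.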
Expected assertions:
  agent phase:
    can see input canary
    cannot see rubric-only canary
    cannot see ground-truth canary
  judge phase:
    can see input canary
    can see candidate-output canary
    can see rubrics if intentionally supplied to the judge
    cannot see ground-truth canary
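These assertions could be checked by scanning whatever directory a phase is given. The sketch below is a minimal version under stated assumptions: `canaries_seen`, `check_phase`, and the two rule tables are hypothetical names, and the scan only covers files on disk as raw bytes. Prompts and traces would need to be dumped to files (or scanned separately), and canaries inside compressed or encoded files would not be detected.

```python
from pathlib import Path

# (must_see, must_not_see) per phase, mirroring the assertions above.
# The rubric canary is left out of the judge rules because rubrics may be
# intentionally supplied to the judge.
AGENT_RULES = ({"input"}, {"rubric", "ground_truth"})
JUDGE_RULES = ({"input", "candidate_output"}, {"ground_truth"})


def canaries_seen(root: Path, canaries: dict[str, str]) -> set[str]:
    """Return the names of canaries whose marker appears in any file under root."""
    seen: set[str] = set()
    for path in root.rglob("*"):
        if not path.is_file():
            continue
        data = path.read_bytes()
        for name, marker in canaries.items():
            if marker.encode("utf-8") in data:
                seen.add(name)
    return seen


def check_phase(
    root: Path,
    canaries: dict[str, str],
    must_see: set[str],
    must_not_see: set[str],
) -> list[str]:
    """Return human-readable violations; an empty list means the phase passed."""
    seen = canaries_seen(root, canaries)
    violations = [f"missing expected canary: {n}" for n in sorted(must_see - seen)]
    violations += [f"leaked canary: {n}" for n in sorted(must_not_see & seen)]
    return violations
```

A CI test could call `check_phase(agent_workspace, canaries, *AGENT_RULES)` after the agent run and `check_phase(judge_view_dir, canaries, *JUDGE_RULES)` after the judge view is built, failing on any non-empty result.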
Useful output
The audit could emit a small JSON report such as:
{
  "agent_visible_paths": ["..."],
  "judge_visible_paths": ["..."],
  "rubric_canary_seen_by_agent": false,
  "ground_truth_canary_seen_by_agent": false,
  "ground_truth_canary_seen_by_judge": false,
  "passed": true
}
This would make leakage failures easy to debug without exposing real benchmark answers.
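One way such a report might be assembled is sketched below. The key names follow the example report; `build_report` and its helpers are hypothetical, and the two root directories stand in for whatever the runner actually exposes to the agent and to the judge. The sketch treats `passed` as "no forbidden canary was seen", which is an assumption about the intended semantics.

```python
import json
from pathlib import Path


def _visible_paths(root: Path) -> list[str]:
    """List every file under root as a sorted, root-relative POSIX path."""
    return sorted(
        p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file()
    )


def _contains(root: Path, marker: str) -> bool:
    """True if any file under root contains the marker as raw UTF-8 bytes."""
    needle = marker.encode("utf-8")
    return any(needle in p.read_bytes() for p in root.rglob("*") if p.is_file())


def build_report(
    agent_root: Path, judge_root: Path, rubric_canary: str, gt_canary: str
) -> dict:
    """Build a leakage report with the same keys as the example JSON."""
    report = {
        "agent_visible_paths": _visible_paths(agent_root),
        "judge_visible_paths": _visible_paths(judge_root),
        "rubric_canary_seen_by_agent": _contains(agent_root, rubric_canary),
        "ground_truth_canary_seen_by_agent": _contains(agent_root, gt_canary),
        "ground_truth_canary_seen_by_judge": _contains(judge_root, gt_canary),
    }
    # Assumed semantics: the audit passes only if no leak flag is set.
    leak_flags = [
        value
        for key, value in report.items()
        if key.endswith(("_seen_by_agent", "_seen_by_judge"))
    ]
    report["passed"] = not any(leak_flags)
    return report
```

Writing the result with `json.dumps(report, indent=2)` gives a CI artifact that lists only paths and booleans, so it can be shared without revealing rubric or ground-truth content.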
Related reference
I have been working on a separate bounded verifier harness here:
https://github.com/sunghunkwag/rsi-metaforge-core
The relevant pattern is narrow: sealed hidden evaluations, explicit train-only rejection, and evidence that hidden expectations/scoring artifacts are not exposed to the adaptive loop. This is not an AGI claim; it is just a verifier-discipline pattern that seems relevant to Workspace-Bench's rubric and workspace isolation story.