Add a canary leakage audit for rubric and ground-truth isolation

## Summary

Workspace-Bench already has several useful isolation boundaries. Task metadata contains rubrics and expected outputs, agent runs and rubric judging are separate stages, and `agent_as_a_judge.py` builds a restricted `judge_view` that exposes original inputs plus candidate outputs while excluding ground-truth (GT) directories such as `output`, `output_cc`, or `gt`.

I think it would be useful to add a small canary leakage audit that checks these boundaries stay intact across harnesses and after future runner changes.

## Why this matters

For workspace benchmarks, leakage can be subtle. An agent does not need the full answer key to overfit; it may be enough to see:

- rubric-only wording,
- expected output hints beyond the task prompt,
- reference output filenames plus nearby GT artifacts,
- judge-only metadata during the agent execution phase,
- or stale files left in a restored workspace.

This is not a bug report. The current code already appears to care about this boundary. The proposal is to make the boundary testable.

## Proposed canary audit

Add a tiny synthetic task or CI fixture with explicit canary strings placed in different visibility zones:

1. **Agent-visible input canary**
   - Present in normal input files.
   - The agent is allowed to read this.

2. **Rubric-only canary**
   - Present only in `metadata.json` rubrics.
   - Should not appear in the tested agent's prompt, trace, or workspace during the agent-run phase.

3. **Ground-truth canary**
   - Present only in GT-like directories such as `output`, `output_cc`, or `gt`.
   - Should not be copied into the tested agent workspace.
   - Should not be copied into `judge_view`.

4. **Candidate-output canary**
   - Present only if the tested agent generated it.
   - The judge is allowed to see this through `judge_view/candidate_output`.

Expected assertions:

```text
agent phase:
  can see input canary
  cannot see rubric-only canary
  cannot see ground-truth canary

judge phase:
  can see input canary
  can see candidate-output canary
  can see rubrics if intentionally supplied to the judge
  cannot see ground-truth canary
```
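As a rough illustration, the assertions above could be checked with a file scan along these lines. This is only a sketch under stated assumptions: the canary tokens, the fixture layout (`task/input`, `agent_workspace`), and the helper names `build_fixture`, `canaries_seen`, and `run_audit` are all hypothetical, and the copy steps are stand-ins for whatever the real runner and `agent_as_a_judge.py` do when they build the agent workspace and `judge_view`.

```python
import json
import shutil
import tempfile
from pathlib import Path

# Hypothetical canary tokens, one per visibility zone.
CANARIES = {
    "input": "CANARY-INPUT-7f3a",
    "rubric": "CANARY-RUBRIC-91c2",
    "ground_truth": "CANARY-GT-d45e",
    "candidate": "CANARY-CANDIDATE-0b68",
}
GT_DIR_NAMES = ("output", "output_cc", "gt")


def build_fixture(root: Path) -> Path:
    """Create a synthetic task with one canary per zone (layout is assumed)."""
    task = root / "task"
    (task / "input").mkdir(parents=True)
    (task / "input" / "notes.txt").write_text(CANARIES["input"])
    (task / "metadata.json").write_text(
        json.dumps({"rubrics": [CANARIES["rubric"]]})
    )
    for name in GT_DIR_NAMES:
        (task / name).mkdir()
        (task / name / "answer.txt").write_text(CANARIES["ground_truth"])
    return task


def canaries_seen(root: Path) -> set[str]:
    """Return the zones whose canary token appears in any file under root."""
    seen: set[str] = set()
    for path in root.rglob("*"):
        if path.is_file():
            text = path.read_text(errors="ignore")
            seen |= {zone for zone, token in CANARIES.items() if token in text}
    return seen


def run_audit() -> tuple[set[str], set[str]]:
    """Build stand-in agent and judge views, then scan each for canaries."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        task = build_fixture(root)

        # Stand-in for the real runner: the agent gets only the inputs.
        agent_ws = root / "agent_workspace"
        shutil.copytree(task / "input", agent_ws / "input")

        # Stand-in for the agent writing its own output.
        candidate = root / "candidate_output"
        candidate.mkdir()
        (candidate / "result.txt").write_text(CANARIES["candidate"])

        # Stand-in for judge_view: inputs plus candidate output, no GT dirs.
        judge_view = root / "judge_view"
        shutil.copytree(task / "input", judge_view / "input")
        shutil.copytree(candidate, judge_view / "candidate_output")

        return canaries_seen(agent_ws), canaries_seen(judge_view)


agent_seen, judge_seen = run_audit()
assert agent_seen == {"input"}
assert {"input", "candidate"} <= judge_seen
assert "ground_truth" not in judge_seen
```

In a real CI fixture the stand-in copy steps would be replaced by the actual harness code paths, and the same scan would also need to cover the agent's prompt and trace, not only files on disk.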

## Useful output

The audit could emit a small JSON report such as:

```json
{
  "agent_visible_paths": ["..."],
  "judge_visible_paths": ["..."],
  "rubric_canary_seen_by_agent": false,
  "ground_truth_canary_seen_by_agent": false,
  "ground_truth_canary_seen_by_judge": false,
  "passed": true
}
```

This would make leakage failures easy to debug without exposing real benchmark answers.
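A report in that shape could be assembled roughly as follows. This is a sketch, not a proposed final API: `build_report` and its arguments are hypothetical, the zone labels `"rubric"` and `"ground_truth"` are assumed names for whatever the scan step returns, and the rule that `passed` is true only when every leak flag is false is my assumption about the intended semantics.

```python
import json


def build_report(agent_paths, judge_paths, agent_seen, judge_seen):
    """Assemble the audit report from visible paths and detected canary zones.

    agent_seen / judge_seen are sets of zone labels (assumed names) whose
    canary token was found in the corresponding phase.
    """
    report = {
        "agent_visible_paths": sorted(agent_paths),
        "judge_visible_paths": sorted(judge_paths),
        "rubric_canary_seen_by_agent": "rubric" in agent_seen,
        "ground_truth_canary_seen_by_agent": "ground_truth" in agent_seen,
        "ground_truth_canary_seen_by_judge": "ground_truth" in judge_seen,
    }
    # Assumed pass rule: no leak flag may be set.
    leak_flags = [
        value
        for key, value in report.items()
        if key.endswith(("_seen_by_agent", "_seen_by_judge"))
    ]
    report["passed"] = not any(leak_flags)
    return report


clean = build_report(
    agent_paths=["input/notes.txt"],
    judge_paths=["input/notes.txt", "candidate_output/result.txt"],
    agent_seen={"input"},
    judge_seen={"input", "candidate"},
)
print(json.dumps(clean, indent=2))
```

Because the report lists only paths and boolean flags, it can be attached to CI logs without revealing rubric text or reference outputs.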

## Related reference

I have been working on a separate bounded verifier harness here:

https://github.com/sunghunkwag/rsi-metaforge-core

The relevant pattern is narrow: sealed hidden evaluations, explicit train-only rejection, and evidence that hidden expectations/scoring artifacts are not exposed to the adaptive loop. This is not an AGI claim; it is just a verifier-discipline pattern that seems relevant to Workspace-Bench's rubric and workspace isolation story.

