This document describes the current test structure and manual test flows for Open Agentic 2.0. It is intended for developers who want to verify behavior end-to-end, including adversarial scenarios against legacy agents.
All commands assume:
- Repository root: ~/open-agentic
- Virtualenv active: source .venv/bin/activate
Make sure your local clone is up to date with main:
cd ~/open-agentic
git pull origin main
source .venv/bin/activate
All test commands in this document assume you are on the latest main branch.
The main automated tests live under tests/:
- tests/test_agentic2.py – end-to-end tests for Open Agentic 2.0
- tests/test_audit_chain_all.py – audit-chain validation
- tests/test_smoke.py – basic smoke tests for Audit and Orchestrator
Run the full suite:
cd ~/open-agentic
source .venv/bin/activate
pytest -q -vv
Expected outcome (current state):
- 4 tests collected
- All tests pass
In Terminal 1:
cd ~/open-agentic
source .venv/bin/activate
python meta_stub.py
# Meta stub listening on http://127.0.0.1:8081
Leave this terminal running.
In Terminal 2:
cd ~/open-agentic
source .venv/bin/activate
python agentic2_micro_plugin.py \
--plan plan.json \
--policy policy.yaml \
--plugins plugins.yaml \
--min_coverage 0.75 \
--min_sources 2 \
--bundle
Example output:
{
"done": 3,
"status": "OK",
"trace": "1503d1d5-8fc1-481f-b750-e2836b9950e9",
"audit_file": "audit_1503d1d5-8fc1-481f-b750-e2836b9950e9.jsonl",
"bundle_file": "bundle_1503d1d5-8fc1-481f-b750-e2836b9950e9.json"
}
This confirms:
- The plan has three steps.
- All steps meet the evidence thresholds.
- New audit_*.jsonl and bundle_*.json files are written.
This scenario shows:
- Tampering with the audit log breaks the SHA chain and is detected.
- maintain_audits.py quarantines the corrupted audit and writes a _salvaged version.
- The audit-chain test becomes green again without manual file editing.
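This document does not specify the audit record format, so the following is only a sketch of how a plain SHA-256 chain can make tampering detectable. It assumes (hypothetically) that each JSONL record carries a prev field holding the previous record's digest and a hash field computed over prev plus the canonical JSON of a payload field. These field names and the all-zero genesis value are illustrative assumptions, not the format Open Agentic 2.0 actually writes.

```python
import hashlib
import json
from typing import List, Optional

GENESIS = "0" * 64  # assumed "prev" value of the first record


def record_hash(prev: str, payload) -> str:
    # Hash the previous digest together with a canonical JSON encoding of the payload.
    body = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev + body).encode("utf-8")).hexdigest()


def append_record(lines: List[str], payload: dict) -> None:
    """Append a record that links to the previous one (hypothetical layout)."""
    prev = json.loads(lines[-1])["hash"] if lines else GENESIS
    rec = {"prev": prev, "payload": payload, "hash": record_hash(prev, payload)}
    lines.append(json.dumps(rec, sort_keys=True))


def first_break(lines: List[str]) -> Optional[int]:
    """Return the index of the first line that breaks the chain, or None if intact."""
    prev = GENESIS
    for i, line in enumerate(lines):
        try:
            rec = json.loads(line)
        except json.JSONDecodeError:
            return i  # an unparseable line cannot continue the chain
        if not isinstance(rec, dict) or rec.get("prev") != prev:
            return i
        if rec.get("hash") != record_hash(prev, rec.get("payload")):
            return i
        prev = rec["hash"]
    return None
```

Because every record stores a digest of its own content, an edit to the last line is caught; a scheme that only stored the previous line's hash could not detect changes to the final record.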
Use the latest audit_*.jsonl produced in the previous step:
cd ~/open-agentic
source .venv/bin/activate
python - <<'PY'
import pathlib

root = pathlib.Path(".")
# Pick the most recently modified audit file.
audits = sorted(
    root.glob("audit_*.jsonl"),
    key=lambda p: p.stat().st_mtime,
    reverse=True,
)
if not audits:
    raise SystemExit("No audit_*.jsonl found")
f = audits[0]
print("Manipulating audit file:", f)
lines = f.read_text().splitlines()
if not lines:
    raise SystemExit("Audit file is empty")
# Simple corruption: change the first occurrence of "a" to "b" in the last line
corrupted = lines[-1].replace("a", "b", 1)
if corrupted == lines[-1]:
    raise SystemExit("Last line contains no 'a'; nothing was changed")
lines[-1] = corrupted
f.write_text("\n".join(lines) + "\n")
print("Last line has been corrupted.")
PY
Then run the audit-chain test:
cd ~/open-agentic
source .venv/bin/activate
pytest -q tests/test_audit_chain_all.py -vv
Expected: the test fails with a message similar to:
AssertionError: Broken chain (plain SHA): audit_1503d1d5-8fc1-481f-b750-e2836b9950e9.jsonl
cd ~/open-agentic
source .venv/bin/activate
python maintain_audits.py
Typical behavior:
- Reports which audit file has a broken chain.
- Shows where the chain breaks (line number and line content).
- Moves the original file to audit_corrupted/audit_<trace>.jsonl.
- Writes a new file, audit_<trace>_salvaged.jsonl, containing only the valid prefix of lines.
Then re-run the audit-chain test:
pytest -q tests/test_audit_chain_all.py -vv
Expected: the test now passes, because all audit_*.jsonl files in the root have valid chains.
The corrupted original is preserved under audit_corrupted/ for forensic analysis.
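The quarantine-and-salvage behavior described above can be sketched as follows. This is not the code of maintain_audits.py; quarantine_and_salvage is a hypothetical name, and the chain check is passed in as a function (any callable that returns the index of the first broken line, or None) so the file handling stays independent of the audit format.

```python
import shutil
from pathlib import Path
from typing import Callable, List, Optional


def quarantine_and_salvage(
    audit: Path,
    find_break: Callable[[List[str]], Optional[int]],
    quarantine_dir: Optional[Path] = None,
) -> Optional[Path]:
    """Move a broken audit aside and write its valid prefix next to it.

    Returns the salvaged file path, or None if the chain was intact.
    """
    lines = audit.read_text(encoding="utf-8").splitlines()
    idx = find_break(lines)
    if idx is None:
        return None  # chain intact: leave the file alone
    print(f"{audit.name}: chain breaks at line {idx + 1}: {lines[idx][:80]!r}")
    qdir = quarantine_dir or audit.parent / "audit_corrupted"
    qdir.mkdir(exist_ok=True)
    # Keep the untouched original for forensic analysis.
    shutil.move(str(audit), str(qdir / audit.name))
    # audit_<trace>.jsonl -> audit_<trace>_salvaged.jsonl holding only the valid prefix.
    salvaged = audit.with_name(f"{audit.stem}_salvaged{audit.suffix}")
    salvaged.write_text("".join(line + "\n" for line in lines[:idx]), encoding="utf-8")
    return salvaged
```

Writing only the prefix before the first break keeps every remaining line verifiable, at the cost of dropping any later records.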
These checks exercise the legacy agent and meta agent directly, without the full orchestrator.
cd ~/open-agentic
source .venv/bin/activate
python - <<'PY'
import json, subprocess, sys

payload = {"op": "echo", "params": {"msg": "Hello from legacy direct"}}
# Run the legacy agent with the interpreter of the active virtualenv.
p = subprocess.Popen(
    [sys.executable, "legacy_agentic.py"],
    stdin=subprocess.PIPE,
    stdout=subprocess.PIPE,
    text=True,
)
out, _ = p.communicate(json.dumps(payload))
print(out)
PY
Expected JSON:
- ok: true
- result: "Hello from legacy direct"
- evidence.coverage around 0.85
- evidence.sources containing ["legacy", "echo"]
Assuming meta_stub.py is running on 127.0.0.1:8081:
curl -s -X POST http://127.0.0.1:8081 \
-H "Content-Type: application/json" \
-d '{"op": "echo", "params": {"msg": "Hello from meta direct"}}'
Expected JSON:
- ok: true
- result: "Hello from meta direct"
- evidence.coverage around 0.85
- evidence.sources containing ["meta", "echo"]
These tests confirm that both legacy and meta agents behave correctly as individual tools.
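To make the request/response shape concrete, here is a minimal stand-in echo agent built on Python's http.server. It is a hypothetical sketch inferred from the expected JSON above, not the actual meta_stub.py or evil_meta_low_evidence.py; make_handler and build_server are illustrative names.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import List


def make_handler(coverage: float, sources: List[str]):
    """Build a request handler that answers {"op": "echo", ...} POSTs."""

    class EchoHandler(BaseHTTPRequestHandler):
        def do_POST(self):
            length = int(self.headers.get("Content-Length", 0))
            try:
                req = json.loads(self.rfile.read(length) or b"{}")
            except ValueError:  # invalid JSON or invalid UTF-8
                req = None
            if isinstance(req, dict) and req.get("op") == "echo":
                msg = (req.get("params") or {}).get("msg", "")
                resp = {"ok": True, "result": msg,
                        "evidence": {"coverage": coverage, "sources": list(sources)}}
            else:
                resp = {"ok": False, "error": "unsupported request"}
            body = json.dumps(resp).encode("utf-8")
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

        def log_message(self, format, *args):  # silence per-request logging
            pass

    return EchoHandler


def build_server(port: int, coverage: float, sources: List[str]) -> HTTPServer:
    """Bind a stand-in agent on 127.0.0.1:<port> (port 0 picks a free port)."""
    return HTTPServer(("127.0.0.1", port), make_handler(coverage, sources))
```

build_server(8081, 0.85, ["meta", "echo"]).serve_forever() would mimic the healthy values quoted above, while build_server(8081, 0.10, []) would mimic the low-evidence values used in the adversarial scenarios.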
This scenario introduces a meta-agent that pretends everything is fine (ok: true), but returns evidence that does not meet the policy thresholds. It is useful to test that the verifier and policy do not trust such responses.
First, stop any running meta_stub.py on port 8081, then:
cd ~/open-agentic
source .venv/bin/activate
python evil_meta_low_evidence.py
# Evil meta (low evidence) listening on http://127.0.0.1:8081
This server returns:
- ok: true
- coverage: 0.10 (below min_coverage=0.75)
- sources: [] (below min_sources=2)
In another terminal:
cd ~/open-agentic
source .venv/bin/activate
python agentic2_micro_plugin.py \
--plan plan.json \
--policy policy.yaml \
--plugins plugins.yaml \
--min_coverage 0.75 \
--min_sources 2 \
--bundle
Expected behavior:
- The meta step produces weak evidence.
- The verifier or policy should not accept this evidence as satisfying the thresholds.
- The final summary (the done count, and the reasons in the bundle) reflects that the meta step was rejected or not trusted.
Developers can inspect the generated bundle_*.json to see per-step evidence, coverage, sources and reasons.
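The key point of this scenario is that ok: true alone must not be trusted. A minimal sketch of such a per-step check, assuming the response shape shown in the direct agent checks (ok, evidence.coverage, evidence.sources), might look like the following. evidence_problems is a hypothetical name, and counting distinct sources is an assumption about how min_sources is applied, not the project's verifier.

```python
from typing import Any, Dict, List


def evidence_problems(resp: Dict[str, Any], min_coverage: float, min_sources: int) -> List[str]:
    """Reasons why a step response must not be trusted; an empty list means accepted."""
    reasons = []
    if resp.get("ok") is not True:
        reasons.append("agent reported ok=false")
    evidence = resp.get("evidence") or {}
    coverage = evidence.get("coverage")
    if not isinstance(coverage, (int, float)) or coverage < min_coverage:
        reasons.append(f"coverage {coverage!r} < min_coverage {min_coverage}")
    distinct = len(set(evidence.get("sources") or []))
    if distinct < min_sources:
        reasons.append(f"{distinct} distinct sources < min_sources {min_sources}")
    return reasons
```

With min_coverage=0.75 and min_sources=2, a healthy meta response (coverage 0.85, sources ["meta", "echo"]) produces no reasons, while the low-evidence response produces two, even though it reports ok: true.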
This scenario exercises Open Agentic 2.0 under alternating conditions:
- Healthy meta-agent (strong evidence)
- Adversarial meta-agent (weak evidence)
- Audit integrity maintained across runs via maintain_audits.py
It shows that the governance layer behaves consistently over time, even when the environment changes.
Terminal 1:
cd ~/open-agentic
source .venv/bin/activate
python meta_stub.py
# Meta stub listening on http://127.0.0.1:8081
Terminal 2:
cd ~/open-agentic
source .venv/bin/activate
python agentic2_micro_plugin.py \
--plan plan.json \
--policy policy.yaml \
--plugins plugins.yaml \
--min_coverage 0.75 \
--min_sources 2 \
--bundle
Expected:
- done: 3
- status: "OK"
- New audit_*.jsonl and bundle_*.json files.
Stop the healthy meta-agent (Ctrl+C in Terminal 1) and start the evil meta-agent:
cd ~/open-agentic
source .venv/bin/activate
python evil_meta_low_evidence.py
# Evil meta (low evidence) listening on http://127.0.0.1:8081
In Terminal 2, run Open Agentic 2.0 again:
cd ~/open-agentic
source .venv/bin/activate
python agentic2_micro_plugin.py \
--plan plan.json \
--policy policy.yaml \
--plugins plugins.yaml \
--min_coverage 0.75 \
--min_sources 2 \
--bundle
Expected:
- The JSON summary shows done: 2 and status: "OK".
- A new bundle_*.json is created for this run.
You can inspect the latest bundle:
python - <<'PY'
import json, pathlib

root = pathlib.Path(".")
# Pick the most recently modified bundle.
bundles = sorted(
    root.glob("bundle_*.json"),
    key=lambda p: p.stat().st_mtime,
    reverse=True,
)
if not bundles:
    raise SystemExit("No bundle_*.json found")
f = bundles[0]
print("Inspecting bundle:", f)
data = json.loads(f.read_text())
print(json.dumps(data, indent=2))
PY
The bundle reflects:
- A plan with three tasks (legacy, meta, summarize).
- Only two tasks fully counted as done, because the meta step does not meet min_coverage/min_sources and is not trusted as strong evidence.
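How a weak step lowers the done count without failing the whole run can be sketched as below. The per-step records (task, ok, evidence) are a hypothetical shape, not the documented bundle schema, and count_done is an illustrative name; the step values in the usage note reuse the coverage and sources figures quoted in this document, except summarize, whose evidence is made up for illustration.

```python
from typing import Dict, List, Tuple


def count_done(steps: List[Dict], min_coverage: float, min_sources: int) -> Tuple[int, Dict[str, str]]:
    """Count steps whose evidence meets both thresholds; collect reasons for the rest."""
    done = 0
    rejected = {}
    for step in steps:
        ev = step.get("evidence") or {}
        cov = ev.get("coverage", 0.0)
        n_src = len(set(ev.get("sources") or []))
        if step.get("ok") is True and cov >= min_coverage and n_src >= min_sources:
            done += 1
        else:
            # Rejected steps are recorded, not fatal: the run can still finish with status OK.
            rejected[step.get("task", "?")] = f"coverage={cov}, sources={n_src}"
    return done, rejected
```

For a plan of legacy (0.85, two sources), meta (0.10, no sources) and summarize (hypothetically 0.80, two sources), this returns done == 2 with only meta in the rejected map, matching the done: 2 result above.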
After running both the healthy and adversarial scenarios, verify that all audit files are still consistent:
cd ~/open-agentic
source .venv/bin/activate
pytest -q tests/test_audit_chain_all.py -vv
If any audit has been manually or externally tampered with, the test will fail.
To repair and quarantine corrupted audits:
cd ~/open-agentic
source .venv/bin/activate
python maintain_audits.py
pytest -q tests/test_audit_chain_all.py -vv
Expected:
- All audit_*.jsonl files in the repository root have valid chains.
- Any corrupted audits have been moved to audit_corrupted/ with a corresponding _salvaged version in the root.
This scenario simulates a small "swarm" of runs over time, mixing healthy and adversarial conditions. It does not introduce new code, but reuses the existing meta stub, evil meta and audit maintenance logic.
The goal is to show that:
- Open Agentic 2.0 remains consistent over many runs.
- Evidence thresholds are enforced (some runs have done: 3, some done: 2).
- Audit-chain integrity is maintained, and maintain_audits.py can still repair and quarantine corrupted audits if needed.
Terminal 1 – start healthy meta:
cd ~/open-agentic
source .venv/bin/activate
python meta_stub.py
# Meta stub listening on http://127.0.0.1:8081
Terminal 2 – run Open Agentic 2.0 multiple times:
cd ~/open-agentic
source .venv/bin/activate
for i in 1 2 3; do
echo "Run $i (healthy meta)..."
python agentic2_micro_plugin.py \
--plan plan.json \
--policy policy.yaml \
--plugins plugins.yaml \
--min_coverage 0.75 \
--min_sources 2 \
--bundle
done
Expected:
- Each run prints JSON with done: 3 and status: "OK".
- Several new audit_*.jsonl and bundle_*.json files are created.
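If you prefer to tally a batch from Python rather than reading each printed summary, a sketch like the following reproduces the shell loop. It assumes the script prints only its JSON summary on stdout and exits with status 0; run_batch and tally are hypothetical helpers, and AGENTIC_CMD simply copies the flags used in this document.

```python
import json
import subprocess
import sys
from collections import Counter
from typing import List, Sequence

# Same invocation as the shell loop, using the active interpreter.
AGENTIC_CMD = [sys.executable, "agentic2_micro_plugin.py",
               "--plan", "plan.json", "--policy", "policy.yaml",
               "--plugins", "plugins.yaml", "--min_coverage", "0.75",
               "--min_sources", "2", "--bundle"]


def run_batch(cmd: Sequence[str], runs: int) -> List[dict]:
    """Run `cmd` several times and parse the JSON summary each run prints."""
    summaries = []
    for _ in range(runs):
        out = subprocess.run(list(cmd), capture_output=True, text=True, check=True).stdout
        summaries.append(json.loads(out))
    return summaries


def tally(summaries: List[dict]) -> Counter:
    """Count how often each (done, status) pair occurred across runs."""
    return Counter((s.get("done"), s.get("status")) for s in summaries)
```

Calling tally(run_batch(AGENTIC_CMD, 3)) from the repository root gives a count of (done, status) pairs, which makes the difference between the healthy and adversarial batches easy to compare.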
Stop the healthy meta (Ctrl+C in Terminal 1), then start the low-evidence meta:
cd ~/open-agentic
source .venv/bin/activate
python evil_meta_low_evidence.py
# Evil meta (low evidence) listening on http://127.0.0.1:8081
In Terminal 2, run another batch:
cd ~/open-agentic
source .venv/bin/activate
for i in 1 2 3; do
echo "Run $i (evil meta)..."
python agentic2_micro_plugin.py \
--plan plan.json \
--policy policy.yaml \
--plugins plugins.yaml \
--min_coverage 0.75 \
--min_sources 2 \
--bundle
done
Expected:
- Each run shows done: 2 and status: "OK".
- The plan in the corresponding bundle_*.json still contains three tasks (legacy, meta, summarize), but only two are fully accepted according to the policy/evidence thresholds.
You can inspect the latest bundle:
python - <<'PY'
import json, pathlib

root = pathlib.Path(".")
# Pick the most recently modified bundle.
bundles = sorted(
    root.glob("bundle_*.json"),
    key=lambda p: p.stat().st_mtime,
    reverse=True,
)
if not bundles:
    raise SystemExit("No bundle_*.json found")
f = bundles[0]
print("Inspecting bundle:", f)
data = json.loads(f.read_text())
print(json.dumps(data, indent=2))
PY
This shows that the plan is stable, while the effective done count depends on whether the meta-agent evidence satisfies the thresholds.
After both the healthy and adversarial batches:
cd ~/open-agentic
source .venv/bin/activate
pytest -q tests/test_audit_chain_all.py -vv
Expected:
- If no manual tampering was done, the test should pass.
- If any audit file was corrupted (for example during separate tamper tests), the test will fail and report which audit_*.jsonl has a broken chain.
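The per-file report described above can be sketched as a scan over the repository root. This is not the code of tests/test_audit_chain_all.py: broken_audits is a hypothetical helper, and the chain check is passed in as a function returning the index of the first broken line (or None), since the real audit format is not specified here.

```python
from pathlib import Path
from typing import Callable, Dict, List, Optional


def broken_audits(root: Path, find_break: Callable[[List[str]], Optional[int]]) -> Dict[str, int]:
    """Map each broken audit_*.jsonl in `root` to the 1-based line where its chain breaks."""
    broken = {}
    # glob() is non-recursive, so files already moved to audit_corrupted/ are not rescanned.
    for path in sorted(root.glob("audit_*.jsonl")):
        idx = find_break(path.read_text(encoding="utf-8").splitlines())
        if idx is not None:
            broken[path.name] = idx + 1
    return broken
```

A test built on it could end with assert not broken, f"Broken chain (plain SHA): {', '.join(broken)}", which yields a failure message in the same style as the one shown earlier.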
To repair and quarantine corrupted audits:
cd ~/open-agentic
source .venv/bin/activate
python maintain_audits.py
pytest -q tests/test_audit_chain_all.py -vv
Expected:
- All audit_*.jsonl files in the repository root have valid chains again.
- Any corrupted audits were moved to audit_corrupted/ with a corresponding _salvaged version remaining in the root.
