Distributed AI evaluation infrastructure and deployment gating experiments for generative AI systems.
Agent Vigilo explores what LLM and agent evaluation infrastructure can look like beyond ad hoc scripts: versioned WASM evaluators, durable evaluation runs, worker/coordinator execution, normalized results, and pass/fail gates that can sit in CI or release workflows.
It focuses on the parts of AI evaluation that become hard as systems grow: idempotent distributed work, durable event delivery, evaluator isolation, retry-safe persistence, and auditable results.
- Run evaluations like infrastructure: PostgreSQL-backed state, RabbitMQ work distribution, Rust workers, and deterministic state guards.
- Ship versioned evaluators: publish WASI Preview 2 WebAssembly evaluators with strict WIT contracts.
- Protect the runtime: Wasmtime fuel, memory, timeout, log, and concurrency limits isolate evaluator execution.
- Avoid lost events: durable outbox ledger plus hot delivery queue, RabbitMQ publisher confirms, and idempotency keys.
- Gate deployments: turn evaluator findings into dimension scores, per-execution aggregate scores, and reproducible pass/fail decisions for agent releases.
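
To illustrate the idempotency keys mentioned in the list above, here is a minimal sketch of retry-safe persistence. It is not the project's implementation: `ResultStore`, `ResultRow`, `WriteOutcome`, and the key format are hypothetical names, and an in-memory map stands in for what would presumably be a uniqueness constraint in PostgreSQL. The point is only that a redelivered message carrying the same key becomes a no-op instead of a second row.

```rust
use std::collections::HashMap;

/// Hypothetical result payload written once per evaluated chunk.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct ResultRow {
    pub chunk_id: u64,
    pub score_millis: u32,
}

/// Whether a write stored a new row or was recognized as a replay.
#[derive(Debug, PartialEq, Eq)]
pub enum WriteOutcome {
    Inserted,
    Duplicate,
}

/// In-memory stand-in for a durable table keyed by idempotency key (assumption).
#[derive(Default)]
pub struct ResultStore {
    rows: HashMap<String, ResultRow>,
}

impl ResultStore {
    /// Stores the row only if the key has not been seen before.
    /// A replayed delivery with the same key leaves the first row untouched.
    pub fn persist(&mut self, idempotency_key: &str, row: ResultRow) -> WriteOutcome {
        if self.rows.contains_key(idempotency_key) {
            return WriteOutcome::Duplicate;
        }
        self.rows.insert(idempotency_key.to_string(), row);
        WriteOutcome::Inserted
    }

    /// Number of distinct rows stored so far.
    pub fn row_count(&self) -> usize {
        self.rows.len()
    }
}
```

Calling `persist` twice with the same key, for example a hypothetical `"run-1/chunk-7"`, returns `Inserted` and then `Duplicate`, and `row_count` stays at 1. In a database-backed version the same effect would typically come from a unique index on the key rather than a lookup followed by an insert.
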
Evaluator findings are normalized to scores, grouped into profile dimensions, combined into one execution aggregate_score, and checked against the overall score gate. An execution passes when aggregate_score >= min_execution_score and no hard-blocking finding fails or errors. A run passes only when every expected execution has an aggregate, no chunk failed or was cancelled, and no execution failed or errored.
A run can fail operationally because work did not complete, or complete with a failed gate because evaluation policy failed.
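
The gate rules above can be sketched in a few lines of Rust. This is an illustrative model under stated assumptions, not the project's code: only `aggregate_score` and `min_execution_score` come from the text, while every type and field name (`Finding`, `Execution`, `ChunkStatus`, `RunOutcome`, `errored`) is hypothetical. The sketch also assumes one reading of the operational/policy split: missing aggregates, failed or cancelled chunks, and errored executions count as operational failures, and a below-threshold or hard-blocked execution counts as a failed gate.

```rust
/// Result of a single evaluator finding (hypothetical shape).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum FindingStatus {
    Passed,
    Failed,
    Errored,
}

#[derive(Debug, Clone, Copy)]
pub struct Finding {
    pub status: FindingStatus,
    /// Hard-blocking findings fail the execution regardless of score.
    pub hard_blocking: bool,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ChunkStatus {
    Completed,
    Failed,
    Cancelled,
}

#[derive(Debug, Clone)]
pub struct Execution {
    /// `None` until an aggregate has been computed for this execution.
    pub aggregate_score: Option<f64>,
    pub findings: Vec<Finding>,
    /// True when the execution itself errored (assumed field).
    pub errored: bool,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum RunOutcome {
    Passed,
    GateFailed,
    OperationalFailure,
}

/// Passes when the score meets the threshold and no hard-blocking
/// finding failed or errored.
pub fn execution_passes(exec: &Execution, min_execution_score: f64) -> bool {
    let score = match exec.aggregate_score {
        Some(score) => score,
        None => return false,
    };
    let blocked = exec
        .findings
        .iter()
        .any(|f| f.hard_blocking && f.status != FindingStatus::Passed);
    !exec.errored && score >= min_execution_score && !blocked
}

/// Separates "work did not complete" from "policy failed".
pub fn run_outcome(
    expected_executions: usize,
    executions: &[Execution],
    chunks: &[ChunkStatus],
    min_execution_score: f64,
) -> RunOutcome {
    let incomplete = executions.len() < expected_executions
        || executions
            .iter()
            .any(|e| e.aggregate_score.is_none() || e.errored)
        || chunks.iter().any(|c| *c != ChunkStatus::Completed);
    if incomplete {
        return RunOutcome::OperationalFailure;
    }
    if executions
        .iter()
        .all(|e| execution_passes(e, min_execution_score))
    {
        RunOutcome::Passed
    } else {
        RunOutcome::GateFailed
    }
}
```

Returning a three-way outcome rather than a boolean keeps the two failure modes distinguishable for callers such as a CI step, which may want to retry an operational failure but block a release on a failed gate.
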
- Getting started: run your first evaluation.
- Architecture overview: containers, components, flows, and state diagrams.
- Scale-out and shard migration: 128 logical run shards and expansion guidance.
- Worker runtime: chunk claiming, evaluator execution, and result persistence.
- Runtime limits: Wasm evaluator sandbox and worker concurrency controls.
- Outbox lifecycle: durable event publication and retry behavior.
- Publishing evaluators: build and publish versioned WASM evaluators.
Rust, Tokio, PostgreSQL, SQLx, RabbitMQ, Wasmtime, WASI Preview 2, WIT, Docusaurus.
GitHub Actions is the source of truth for build verification. To install the optional local Git hooks:
chmod +x scripts/hooks/pre-commit scripts/hooks/pre-push
git config core.hooksPath scripts/hooks

The pre-commit hook runs nightly rustfmt only. The pre-push hook runs clippy, Rust tests, and the web typecheck. Migration smoke checks, evaluator Wasm builds, and the web production build run in CI.
Agent Vigilo is an active systems project focused on reliable AI evaluation, LLM evaluation workflows, agent testing, and deployment gates. The implementation favors explicit contracts, durable state transitions, and operational diagrams over black-box orchestration.