Agent Vigilo

Distributed AI evaluation infrastructure and deployment gating experiments for generative AI systems.

Agent Vigilo explores what LLM and agent evaluation infrastructure can look like beyond ad hoc scripts: versioned WASM evaluators, durable evaluation runs, worker/coordinator execution, normalized results, and pass/fail gates that can sit in CI or release workflows.

It focuses on the parts of AI evaluation that become hard as systems grow: idempotent distributed work, durable event delivery, evaluator isolation, retry-safe persistence, and auditable results.

Why It Matters

Run evaluations like infrastructure: PostgreSQL-backed state, RabbitMQ work distribution, Rust workers, and deterministic state guards.
Ship versioned evaluators: publish WASI Preview 2 WebAssembly evaluators with strict WIT contracts.
Protect the runtime: Wasmtime fuel, memory, timeout, log, and concurrency limits isolate evaluator execution.
Avoid lost events: durable outbox ledger plus hot delivery queue, RabbitMQ publisher confirms, and idempotency keys.
Gate deployments: turn evaluator findings into dimension scores, an aggregate score per execution, and reproducible pass/fail decisions for agent releases.
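The idempotency keys mentioned above are what make redelivery safe: if a consumer records each key before applying a result, the same message can arrive twice without producing a second write. The sketch below illustrates that idea with an in-memory map. The `ResultStore` type, its `apply` method, and the `run:chunk` key format are hypothetical names chosen for illustration, not Agent Vigilo's API. A real deployment would more plausibly enforce the same rule with a unique constraint in PostgreSQL.

```rust
use std::collections::HashMap;

/// Hypothetical in-memory stand-in for a table with a unique idempotency key.
#[derive(Default)]
struct ResultStore {
    applied: HashMap<String, f64>,
}

impl ResultStore {
    /// Records `score` under `key` unless that key was already applied.
    /// Returns true only when this call performed the write.
    fn apply(&mut self, key: &str, score: f64) -> bool {
        if self.applied.contains_key(key) {
            // Duplicate delivery: keep the first write and report a no-op.
            return false;
        }
        self.applied.insert(key.to_string(), score);
        true
    }
}
```

Calling `apply` twice with the same key leaves the first score in place, so a retried delivery cannot overwrite or duplicate a persisted result.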

How Results Are Calculated

Evaluator findings are normalized to scores, grouped into profile dimensions, combined into one execution aggregate_score, and checked against the overall score gate. An execution passes when aggregate_score >= min_execution_score and no hard blocking finding fails or errors. A run passes only when every expected execution has an aggregate, no chunk failed or was cancelled, and no execution failed or errored.

A run can fail in two distinct ways: operationally, because work did not complete, or at the gate, because the run completed but the evaluation policy was not met.
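The pass rules above can be sketched as two small predicates. This is a minimal illustration under stated assumptions, not the project's implementation: the `Status`, `ChunkState`, `Finding`, and `Execution` types and their field names are hypothetical, and `aggregate_score` is taken as already computed because the text does not say how dimension scores are combined.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum Status { Passed, Failed, Errored }

#[derive(Debug, Clone, Copy, PartialEq)]
enum ChunkState { Completed, Failed, Cancelled }

/// One evaluator finding (hypothetical shape).
struct Finding { hard_blocking: bool, status: Status }

/// One execution; `aggregate_score` is None until an aggregate exists.
struct Execution {
    aggregate_score: Option<f64>,
    findings: Vec<Finding>,
    errored: bool,
}

/// Passes when the aggregate meets the minimum and no hard blocking
/// finding failed or errored.
fn execution_passes(exec: &Execution, min_execution_score: f64) -> bool {
    let Some(score) = exec.aggregate_score else { return false };
    let blocked = exec
        .findings
        .iter()
        .any(|f| f.hard_blocking && f.status != Status::Passed);
    !exec.errored && score >= min_execution_score && !blocked
}

/// Passes only when every expected execution has an aggregate, every chunk
/// completed, and every execution passes its own gate.
fn run_passes(
    executions: &[Execution],
    expected_executions: usize,
    chunks: &[ChunkState],
    min_execution_score: f64,
) -> bool {
    let all_aggregated = executions.len() == expected_executions
        && executions.iter().all(|e| e.aggregate_score.is_some());
    let chunks_ok = chunks.iter().all(|c| *c == ChunkState::Completed);
    all_aggregated
        && chunks_ok
        && executions.iter().all(|e| execution_passes(e, min_execution_score))
}
```

In this sketch a failed or cancelled chunk makes the run fail regardless of scores, which corresponds to the operational failure case, while a low score or a failed hard blocking finding corresponds to the gate failure case.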

Start Here

Getting started: run your first evaluation.
Architecture overview: containers, components, flows, and state diagrams.
Scale-out and shard migration: 128 logical run shards and expansion guidance.
Worker runtime: chunk claiming, evaluator execution, and result persistence.
Runtime limits: Wasm evaluator sandbox and worker concurrency controls.
Outbox lifecycle: durable event publication and retry behavior.
Publishing evaluators: build and publish versioned WASM evaluators.
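As a rough illustration of what "128 logical run shards" could mean in practice, the sketch below maps a run identifier to one of 128 logical shards with a stable hash, then folds logical shards onto a smaller number of physical queues. The shard count comes from the summary above. Everything else is an assumption made for illustration, including the choice of FNV-1a, the string run ID, and the `logical_shard` and `physical_queue` helpers. The linked scale-out guide is the authority on how the project actually assigns shards.

```rust
/// Number of logical run shards, taken from the docs summary above.
const LOGICAL_SHARDS: u64 = 128;

/// 64-bit FNV-1a, written out by hand so the mapping stays stable across
/// Rust versions (std's DefaultHasher does not promise a fixed algorithm).
fn fnv1a_64(bytes: &[u8]) -> u64 {
    let mut hash: u64 = 0xcbf2_9ce4_8422_2325; // FNV offset basis
    for &b in bytes {
        hash ^= u64::from(b);
        hash = hash.wrapping_mul(0x0000_0100_0000_01b3); // FNV prime
    }
    hash
}

/// Hypothetical helper: assign a run to a logical shard in 0..128.
fn logical_shard(run_id: &str) -> u64 {
    fnv1a_64(run_id.as_bytes()) % LOGICAL_SHARDS
}

/// Hypothetical helper: fold a logical shard onto `physical_queues` queues.
/// `physical_queues` must be non-zero.
fn physical_queue(shard: u64, physical_queues: u64) -> u64 {
    shard % physical_queues
}
```

Keeping the logical shard count fixed means a run's logical shard never changes. Under this scheme, expanding capacity only changes the logical-to-physical mapping.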

Core Stack

Rust, Tokio, PostgreSQL, SQLx, RabbitMQ, Wasmtime, WASI Preview 2, WIT, Docusaurus.

Development Checks

GitHub Actions is the source of truth for build verification. To install the optional local Git hooks:

chmod +x scripts/hooks/pre-commit scripts/hooks/pre-push
git config core.hooksPath scripts/hooks

The pre-commit hook runs nightly rustfmt only. The pre-push hook runs clippy, Rust tests, and the web typecheck. Migration smoke checks, evaluator Wasm builds, and the web production build run in CI.

Project Status

Agent Vigilo is an active systems project focused on reliable AI evaluation, LLM evaluation workflows, agent testing, and deployment gates. The implementation favors explicit contracts, durable state transitions, and operational diagrams over black-box orchestration.

License

MIT
