gregl83/agent-vigilo

Agent Vigilo

Distributed AI evaluation infrastructure and deployment gating experiments for generative AI systems.

Agent Vigilo explores what LLM and agent evaluation infrastructure can look like beyond ad hoc scripts: versioned WASM evaluators, durable evaluation runs, worker/coordinator execution, normalized results, and pass/fail gates that can sit in CI or release workflows.

It focuses on the parts of AI evaluation that become hard as systems grow: idempotent distributed work, durable event delivery, evaluator isolation, retry-safe persistence, and auditable results.

Why It Matters

  • Run evaluations like infrastructure: PostgreSQL-backed state, RabbitMQ work distribution, Rust workers, and deterministic state guards.
  • Ship versioned evaluators: publish WASI Preview 2 WebAssembly evaluators with strict WIT contracts.
  • Protect the runtime: Wasmtime fuel, memory, timeout, log, and concurrency limits isolate evaluator execution.
  • Avoid lost events: a durable outbox ledger plus a hot delivery queue, RabbitMQ publisher confirms, and idempotency keys.
  • Gate deployments: turn evaluator findings into dimension scores, aggregate scores, and reproducible pass/fail decisions for agent releases.
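
To illustrate the idempotency-key idea from the list above, here is a minimal in-memory sketch in Rust. It is not Agent Vigilo's implementation: the `Event` and `Consumer` types and their fields are hypothetical. A real deployment would presumably record keys durably (for example in PostgreSQL) rather than in a `HashSet`. The sketch only shows why redelivering the same message does not apply its effect twice.

```rust
use std::collections::HashSet;

/// Hypothetical event envelope; the field names are illustrative only.
struct Event {
    idempotency_key: String,
    payload: String,
}

/// Hypothetical consumer that remembers which keys it has already applied.
#[derive(Default)]
struct Consumer {
    seen: HashSet<String>,
    applied: Vec<String>,
}

impl Consumer {
    /// Applies the event once. Returns `true` if the event was applied and
    /// `false` if it was a duplicate delivery that was safely ignored.
    fn handle(&mut self, event: &Event) -> bool {
        // `HashSet::insert` returns false when the key was already present.
        if !self.seen.insert(event.idempotency_key.clone()) {
            return false;
        }
        self.applied.push(event.payload.clone());
        true
    }
}
```

Because at-least-once delivery can hand the same message to a worker more than once, the consumer side has to be the place where duplicates are absorbed. The key check above is the smallest form of that guard.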

How Results Are Calculated

Evaluator findings are normalized to scores, grouped into profile dimensions, combined into one execution aggregate_score, and checked against the overall score gate. An execution passes when aggregate_score >= min_execution_score and no hard blocking finding fails or errors. A run passes only when every expected execution has an aggregate, no chunk failed or was cancelled, and no execution failed or errored.

A run can therefore fall short of passing in two distinct ways: it can fail operationally, because work did not complete, or it can complete with a failed gate, because the evaluation policy was not met.
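
The pass rules above can be sketched in Rust as follows. This is a hedged illustration, not the project's code: only `aggregate_score` and `min_execution_score` come from the text, and every type and function name here is hypothetical. The sketch also assumes that missing aggregates, failed or cancelled chunks, and errored executions count as operational failures, while an execution that completes but misses the gate counts as a gate failure.

```rust
/// Status of a hard blocking finding (hypothetical type).
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
enum FindingStatus {
    Passed,
    Failed,
    Errored,
}

/// Status of a unit of distributed work (hypothetical type).
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
enum ChunkStatus {
    Completed,
    Failed,
    Cancelled,
}

/// One execution's evaluation result (hypothetical shape).
struct Execution {
    /// `None` until an aggregate has been computed for this execution.
    aggregate_score: Option<f64>,
    hard_blocking: Vec<FindingStatus>,
    /// Assumption: set when the execution itself errored operationally.
    errored: bool,
}

#[derive(PartialEq, Eq, Debug)]
enum RunOutcome {
    Passed,
    GateFailed,
    OperationalFailure,
}

/// An execution passes when its aggregate meets the threshold and no hard
/// blocking finding failed or errored.
fn execution_passes(exec: &Execution, min_execution_score: f64) -> bool {
    match exec.aggregate_score {
        Some(score) => {
            score >= min_execution_score
                && exec.hard_blocking.iter().all(|f| *f == FindingStatus::Passed)
        }
        None => false,
    }
}

/// Separates "work did not complete" from "policy was not met".
fn classify_run(
    executions: &[Execution],
    expected_executions: usize,
    chunks: &[ChunkStatus],
    min_execution_score: f64,
) -> RunOutcome {
    let work_complete = executions.len() == expected_executions
        && executions.iter().all(|e| e.aggregate_score.is_some() && !e.errored)
        && chunks.iter().all(|c| *c == ChunkStatus::Completed);
    if !work_complete {
        return RunOutcome::OperationalFailure;
    }
    if executions.iter().all(|e| execution_passes(e, min_execution_score)) {
        RunOutcome::Passed
    } else {
        RunOutcome::GateFailed
    }
}
```

Checking operational completeness before the gate keeps the two outcomes from being conflated: a run with a cancelled chunk is reported as incomplete work rather than as a policy failure, even if every finished execution scored well.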

Start Here

Core Stack

Rust, Tokio, PostgreSQL, SQLx, RabbitMQ, Wasmtime, WASI Preview 2, WIT, Docusaurus.

Development Checks

GitHub Actions is the source of truth for build verification. To install the optional local Git hooks:

chmod +x scripts/hooks/pre-commit scripts/hooks/pre-push
git config core.hooksPath scripts/hooks

The pre-commit hook runs nightly rustfmt only. The pre-push hook runs clippy, Rust tests, and the web typecheck. Migration smoke checks, evaluator Wasm builds, and the web production build run in CI.

Project Status

Agent Vigilo is an active systems project focused on reliable AI evaluation, LLM evaluation workflows, agent testing, and deployment gates. The implementation favors explicit contracts, durable state transitions, and operational diagrams over black-box orchestration.

License

MIT