
# Future AGI

**Simulate — test AI agents before users meet them**

Python SDK for the Simulate pillar of Future AGI. Run voice and text agents against persona-driven scenarios, capture transcripts and audio, and feed results straight into evals.


PyPI · Docs · Platform · Main repo · Discord · Issues


## Why this SDK?

AI agents hallucinate: they fabricate facts and misquote policies, and by the time anyone notices a bad production output, it is already in the world. You can't unit-test "don't make things up" — instead, you run the agent against realistic conversations before real users do.

agent-simulate is the client SDK for the Simulate pillar of Future AGI. It drives voice and text agents through persona-driven scenarios and hands the transcripts + audio to the Evaluate pillar for scoring.


agent-simulate Demo

## What's in the box

### Voice agents (LiveKit)

Connect a simulated customer to an agent sitting in a LiveKit room over WebRTC. Full multi-turn conversation, per-speaker and combined WAV recordings, and a complete transcript.

### Text agents (Cloud)

Orchestrate thousands of multi-turn text conversations against any agent framework (OpenAI, Anthropic, LangChain, Gemini, or your own) via Future AGI's hosted simulation backend.

### Evaluation-ready

Results drop straight into ai-evaluation via the evaluate_report helper. Score with 50+ built-in metrics or your own rubrics — task completion, tone, audio quality, groundedness, and others.


## Install

```bash
# Core SDK
pip install agent-simulate

# With voice (LiveKit) support
pip install "agent-simulate[livekit]"

# With evaluation helpers
pip install "agent-simulate[evaluation]"

# Everything
pip install "agent-simulate[all]"
```

Requires Python 3.10–3.13.

### Voice mode: download Silero VAD weights (one time)

The LiveKit engine uses Silero VAD for voice-activity detection. Run this once after installing the `[livekit]` extra:

```python
from livekit.plugins import silero

if __name__ == "__main__":
    silero.VAD.load()
```

## 🚀 Quickstart — Voice agent (LiveKit)

Connects a simulated customer (Alice) to a deployed voice agent waiting in a LiveKit room, records the call, and scores the transcript.

```python
import asyncio
import os
from dotenv import load_dotenv
from fi.simulate import AgentDefinition, Scenario, Persona, TestRunner
from fi.simulate.evaluation import evaluate_report

load_dotenv()

async def main():
    # 1. Point at your deployed voice agent
    agent = AgentDefinition(
        name="my-support-agent",
        url=os.environ["LIVEKIT_URL"],
        room_name="support-room",
        system_prompt="Helpful support agent",
    )

    # 2. Describe the test case
    scenario = Scenario(
        name="Password Reset",
        dataset=[
            Persona(
                persona={"name": "Alice", "mood": "frustrated"},
                situation="She cannot log into her account.",
                outcome="The agent should guide her through a password reset.",
            ),
        ],
    )

    # 3. Run the simulation
    runner = TestRunner()
    report = await runner.run_test(
        agent_definition=agent,
        scenario=scenario,
        record_audio=True,  # writes per-speaker + combined WAVs
    )

    # 4. Inspect results
    for r in report.results:
        print(r.transcript)
        print(r.audio_combined_path)

    # 5. Score with Future AGI evals
    evaluated = evaluate_report(
        report,
        eval_specs=[
            {"template": "task_completion",
             "map": {"input": "persona.situation", "output": "transcript"}},
            {"template": "audio_quality",
             "map": {"input_audio": "audio_combined_path"}},
        ],
    )

    for r in evaluated.results:
        for name, scores in (r.evaluation or {}).items():
            print(name, scores["score"], scores["reason"])

asyncio.run(main())
```

Required environment variables (voice mode):

```bash
LIVEKIT_URL="wss://your-livekit-server.com"
LIVEKIT_API_KEY="..."
LIVEKIT_API_SECRET="..."
OPENAI_API_KEY="..."           # for the simulated customer
FI_API_KEY="..."               # for evaluation
FI_SECRET_KEY="..."
```
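A missing variable can otherwise surface as a failure mid-call. A small preflight check fails fast instead — `missing_env` is a hypothetical helper sketched here, not part of the SDK:

```python
import os

# The voice-mode variables listed above.
REQUIRED_VOICE_ENV = (
    "LIVEKIT_URL", "LIVEKIT_API_KEY", "LIVEKIT_API_SECRET",
    "OPENAI_API_KEY", "FI_API_KEY", "FI_SECRET_KEY",
)

def missing_env(required=REQUIRED_VOICE_ENV):
    """Return the names of required variables that are unset or empty."""
    return [name for name in required if not os.environ.get(name)]
```

Call `missing_env()` before `runner.run_test(...)` and abort with a clear message if it returns anything.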

## 🚀 Quickstart — Text agent (Cloud)

Cloud mode runs the scenario orchestration on Future AGI's backend and calls your agent over a local callback. Use it when you want thousands of parallel text conversations against an OpenAI, Anthropic, LangChain, or Gemini agent without running LiveKit.

  1. Create a simulation run from the Future AGI platform and copy its run_id (or name).
  2. Wire your agent to the runner:
```python
import asyncio
import os
from openai import AsyncOpenAI
from fi.simulate import TestRunner, OpenAIAgentWrapper

async def main():
    # Your agent, wrapped in a zero-config adapter
    wrapper = OpenAIAgentWrapper(
        client=AsyncOpenAI(),
        model="gpt-4o-mini",
        system_prompt="You are a helpful support agent.",
    )

    runner = TestRunner(
        api_key=os.environ["FI_API_KEY"],
        secret_key=os.environ["FI_SECRET_KEY"],
    )

    report = await runner.run_test(
        run_test_name="support-agent-smoke-test",  # or run_id="..."
        agent_callback=wrapper,
        concurrency=5,
    )

asyncio.run(main())
```

Scores and transcripts land in the platform dashboard. The local TestReport is intentionally empty — metrics live in the backend so you can compare runs over time.

See examples/test_cloud_simulation.py for a full tool-using walkthrough. That example shows a custom AgentWrapper subclass around the OpenAI Agents SDK — useful when you need tool-call capture beyond the built-in OpenAIAgentWrapper.


## Agent wrappers

Built-in adapters for the most common Python agent frameworks. All are text-only and live under fi.simulate.

| Wrapper | Wraps | Import |
| --- | --- | --- |
| `OpenAIAgentWrapper` | `openai.OpenAI` / `AsyncOpenAI` (`chat.completions`) | `from fi.simulate import OpenAIAgentWrapper` |
| `AnthropicAgentWrapper` | `anthropic.Anthropic` / `AsyncAnthropic` | `from fi.simulate import AnthropicAgentWrapper` |
| `GeminiAgentWrapper` | `google.generativeai.GenerativeModel` | `from fi.simulate import GeminiAgentWrapper` |
| `LangChainAgentWrapper` | Any LangChain `Runnable` / chain | `from fi.simulate import LangChainAgentWrapper` |
| Custom | Anything — subclass `AgentWrapper` | `from fi.simulate import AgentWrapper` |

Rolling your own wrapper is a 20-line class — see CONTRIBUTING.md → Adding a new agent wrapper.
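As a concrete illustration, here is a minimal custom wrapper that answers from plain Python rules instead of an LLM. The `respond` method name and signature are assumptions for this sketch; check CONTRIBUTING.md for the actual `AgentWrapper` interface. The stand-in base class below only exists so the snippet runs without the SDK installed:

```python
import asyncio

# In real code: from fi.simulate import AgentWrapper
# Stand-in base class so this sketch is self-contained; the SDK's
# actual interface may differ.
class AgentWrapper:
    async def respond(self, message: str, history: list) -> str:
        raise NotImplementedError

class RuleBasedWrapper(AgentWrapper):
    """A deterministic smoke-test agent built from hard-coded rules."""

    async def respond(self, message: str, history: list) -> str:
        if "password" in message.lower():
            return "Let's reset your password. What's the email on the account?"
        return "How can I help you today?"

if __name__ == "__main__":
    print(asyncio.run(RuleBasedWrapper().respond("I forgot my password", [])))
```

A deterministic wrapper like this is useful for exercising scenario plumbing before spending tokens on a real model.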


## Evaluation

The evaluate_report helper delegates to ai-evaluation, Future AGI's Evaluate SDK. It accepts either a named template list or field-mapped specs:

```python
from fi.simulate.evaluation import evaluate_report

# Named templates with sensible defaults
evaluate_report(report, eval_templates=("task_completion", "tone", "is_helpful"))

# Or explicit field mapping — including audio
evaluate_report(
    report,
    eval_specs=[
        {"template": "task_completion",
         "map": {"input": "persona.situation", "output": "transcript"}},
        {"template": "audio_quality",
         "map": {"input_audio": "audio_combined_path"}},
    ],
)
```

50+ metrics are available out of the box — groundedness, faithfulness, tool-use correctness, RAG context relevance, hallucination, PII, toxicity, bias, audio quality, and custom rubrics. See the evaluation docs for the full catalog.
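The `map` values are dotted field paths into each simulation result. The SDK's own resolution logic isn't shown here, but the idea can be sketched as a simple key-or-attribute walk — `resolve_path` is a hypothetical helper, not the SDK's code:

```python
def resolve_path(obj, path: str):
    """Walk a dotted path like "persona.situation" through dict keys or attributes."""
    for part in path.split("."):
        obj = obj[part] if isinstance(obj, dict) else getattr(obj, part)
    return obj

# A result shaped like the voice quickstart above:
result = {
    "persona": {"situation": "She cannot log into her account."},
    "transcript": "agent: Hi, how can I help? ...",
}
resolve_path(result, "persona.situation")  # -> the text mapped to the eval's "input"
```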


## How this fits into Future AGI

agent-simulate is one of six pillars in the Future AGI platform:

Simulate → Evaluate → Control → Monitor → Optimize · with Agent Command Center as the runtime gateway.

Traces from simulations flow into Monitor, scores flow into Evaluate, and failures feed Optimize — one loop, on your infrastructure.

| SDK | Pillar | Purpose |
| --- | --- | --- |
| `agent-simulate` (you are here) | Simulate | Voice + text agent simulation |
| `ai-evaluation` | Evaluate | 50+ metrics, LLM-as-judge, guardrail scanners |
| `traceAI` | Monitor | OpenTelemetry tracing for 50+ AI frameworks |
| `agent-opt` | Optimize | 6 prompt-optimization algorithms |

Full platform README →


## Roadmap

**Shipped**

  • LiveKit voice simulation engine
  • Cloud simulation engine
  • OpenAI / Anthropic / Gemini / LangChain wrappers
  • Per-speaker + combined audio capture
  • Scenario auto-generation from a topic
  • evaluate_report integration with ai-evaluation

**In progress**

  • Tool-call capture in wrapper responses
  • Conversation-graph scenarios (branching flows)
  • Latency, interruption, and turn-taking metrics

**Coming up**

  • Streaming transcript API
  • Pluggable voice-backend interface (VAPI / Retell / Pipecat land on the main platform roadmap)
  • Adversarial persona templates (jailbreak, PII probing)
  • Multi-agent scenarios
  • On-device VAD + STT for air-gapped runs
  • Regression dashboards in-SDK

## 🤝 Contributing

We love contributions — bug fixes, new wrappers, framework integrations, docs, examples.

  1. Browse good first issue
  2. Read the Contributing Guide
  3. Say hi on Discord
  4. Sign the CLA on your first PR (automatic bot)

## 🌍 Community & support

| Channel | Purpose |
| --- | --- |
| 💬 Discord | Real-time help from the team and community |
| 🗨️ GitHub Discussions | Ideas, questions, roadmap input |
| 🐦 Twitter / X | Release announcements |
| 📝 Blog | Engineering & research posts |
| 📧 support@futureagi.com | Cloud account / billing |
| 🔐 security@futureagi.com | Private vulnerability disclosure (see SECURITY.md) |

## 📄 License

agent-simulate is licensed under the Apache License 2.0. See LICENSE and NOTICE.


Built by the Future AGI team and contributors worldwide.

If this SDK helps you ship better agents, a ⭐ helps more teams find us.

🌐 futureagi.com · 📖 docs.futureagi.com · ☁️ app.futureagi.com