Python SDK for the Simulate pillar of Future AGI.
Run voice and text agents against persona-driven scenarios, capture transcripts and audio, and feed results straight into evals.
AI agents hallucinate. They fabricate facts and misquote policies, and a bad output in production is already in the world by the time anyone notices. You can't unit-test "don't make things up" — you run the agent against realistic conversations before real users do.
agent-simulate is the client SDK for the Simulate pillar of Future AGI. It drives voice and text agents through persona-driven scenarios and hands the transcripts + audio to the Evaluate pillar for scoring.
| Voice mode (LiveKit) | Cloud mode (text) |
|---|---|
| Connect a simulated customer to an agent sitting in a LiveKit room over WebRTC. Full multi-turn conversation, per-speaker and combined WAV recordings, and a complete transcript. | Orchestrate thousands of multi-turn text conversations against any agent framework (OpenAI, Anthropic, LangChain, Gemini, or your own), via Future AGI's hosted simulation backend. |

Results from both modes drop straight into Future AGI's Evaluate pillar for scoring.
```bash
# Core SDK
pip install agent-simulate

# With voice (LiveKit) support
pip install "agent-simulate[livekit]"

# With evaluation helpers
pip install "agent-simulate[evaluation]"

# Everything
pip install "agent-simulate[all]"
```

Requires Python 3.10–3.13.
Voice mode: download Silero VAD weights (one time)
The LiveKit engine uses Silero VAD for voice-activity detection. Run this once after installing the [livekit] extra:
```python
from livekit.plugins import silero

if __name__ == "__main__":
    silero.VAD.load()
```

The quickstart below connects a simulated customer (Alice) to a deployed voice agent waiting in a LiveKit room, records the call, and scores the transcript.
```python
import asyncio
import os

from dotenv import load_dotenv

from fi.simulate import AgentDefinition, Scenario, Persona, TestRunner
from fi.simulate.evaluation import evaluate_report

load_dotenv()


async def main():
    # 1. Point at your deployed voice agent
    agent = AgentDefinition(
        name="my-support-agent",
        url=os.environ["LIVEKIT_URL"],
        room_name="support-room",
        system_prompt="Helpful support agent",
    )

    # 2. Describe the test case
    scenario = Scenario(
        name="Password Reset",
        dataset=[
            Persona(
                persona={"name": "Alice", "mood": "frustrated"},
                situation="She cannot log into her account.",
                outcome="The agent should guide her through a password reset.",
            ),
        ],
    )

    # 3. Run the simulation
    runner = TestRunner()
    report = await runner.run_test(
        agent_definition=agent,
        scenario=scenario,
        record_audio=True,  # writes per-speaker + combined WAVs
    )

    # 4. Inspect results
    for r in report.results:
        print(r.transcript)
        print(r.audio_combined_path)

    # 5. Score with Future AGI evals
    evaluated = evaluate_report(
        report,
        eval_specs=[
            {"template": "task_completion",
             "map": {"input": "persona.situation", "output": "transcript"}},
            {"template": "audio_quality",
             "map": {"input_audio": "audio_combined_path"}},
        ],
    )
    for r in evaluated.results:
        for name, scores in (r.evaluation or {}).items():
            print(name, scores["score"], scores["reason"])


asyncio.run(main())
```

Required environment variables (voice mode):
```bash
LIVEKIT_URL="wss://your-livekit-server.com"
LIVEKIT_API_KEY="..."
LIVEKIT_API_SECRET="..."
OPENAI_API_KEY="..."   # for the simulated customer
FI_API_KEY="..."       # for evaluation
FI_SECRET_KEY="..."
```

Cloud mode runs the scenario orchestration on Future AGI's backend and calls your agent over a local callback. Use it when you want thousands of parallel text conversations against an OpenAI, Anthropic, LangChain, or Gemini agent without running LiveKit.
- Create a simulation run from the Future AGI platform and copy its `run_id` (or name).
- Wire your agent to the runner:
```python
import asyncio
import os

from openai import AsyncOpenAI

from fi.simulate import TestRunner, OpenAIAgentWrapper


async def main():
    # Your agent, wrapped in a zero-config adapter
    wrapper = OpenAIAgentWrapper(
        client=AsyncOpenAI(),
        model="gpt-4o-mini",
        system_prompt="You are a helpful support agent.",
    )

    runner = TestRunner(
        api_key=os.environ["FI_API_KEY"],
        secret_key=os.environ["FI_SECRET_KEY"],
    )

    report = await runner.run_test(
        run_test_name="support-agent-smoke-test",  # or run_id="..."
        agent_callback=wrapper,
        concurrency=5,
    )


asyncio.run(main())
```

Scores and transcripts land in the platform dashboard. The local TestReport is intentionally empty — metrics live in the backend so you can compare runs over time.
See examples/test_cloud_simulation.py for a full tool-using walkthrough. That example shows a custom AgentWrapper subclass around the OpenAI Agents SDK — useful when you need tool-call capture beyond the built-in OpenAIAgentWrapper.
Built-in adapters for the most common Python agent frameworks. All are text-only and live under fi.simulate.
| Wrapper | Wraps | Import |
|---|---|---|
| `OpenAIAgentWrapper` | `openai.OpenAI` / `AsyncOpenAI` (chat.completions) | `from fi.simulate import OpenAIAgentWrapper` |
| `AnthropicAgentWrapper` | `anthropic.Anthropic` / `AsyncAnthropic` | `from fi.simulate import AnthropicAgentWrapper` |
| `GeminiAgentWrapper` | `google.generativeai.GenerativeModel` | `from fi.simulate import GeminiAgentWrapper` |
| `LangChainAgentWrapper` | Any LangChain `Runnable` / chain | `from fi.simulate import LangChainAgentWrapper` |
| Custom | Anything (subclass `AgentWrapper`) | `from fi.simulate import AgentWrapper` |
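Swapping frameworks is meant to be a one-line change. As a rough sketch, the snippet below assumes `AnthropicAgentWrapper` takes the same constructor arguments as `OpenAIAgentWrapper` in the cloud example above (`client`, `model`, `system_prompt`); that parallel is an assumption, so check the wrapper's actual signature in `fi.simulate` before relying on it.

```python
# Sketch only: the constructor arguments mirror OpenAIAgentWrapper above and are
# an assumption; verify against the real AnthropicAgentWrapper signature.
from anthropic import AsyncAnthropic

from fi.simulate import AnthropicAgentWrapper

wrapper = AnthropicAgentWrapper(
    client=AsyncAnthropic(),
    model="claude-3-5-sonnet-latest",
    system_prompt="You are a helpful support agent.",
)
# Pass it to the runner exactly as in the cloud example:
#   report = await runner.run_test(run_test_name="...", agent_callback=wrapper, concurrency=5)
```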
Rolling your own wrapper is a 20-line class — see CONTRIBUTING.md → Adding a new agent wrapper.
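The sketch below shows the general shape of such a subclass. The `respond` method name, its signature, and the message format are placeholders (this README does not document the base class's abstract interface), so mirror the actual method defined by `AgentWrapper` in `fi.simulate` and CONTRIBUTING.md when writing yours.

```python
# Illustrative sketch only: `respond`, its signature, and the message format are
# assumptions, not the documented AgentWrapper interface.
from fi.simulate import AgentWrapper


class RuleBasedAgentWrapper(AgentWrapper):
    """Toy adapter around an in-house, non-LLM agent (hypothetical)."""

    async def respond(self, messages: list[dict]) -> str:
        # Assume `messages` is the running conversation as
        # [{"role": "user" | "assistant", "content": "..."}] dicts.
        last_user_turn = next(
            (m["content"] for m in reversed(messages) if m.get("role") == "user"),
            "",
        )
        if "password" in last_user_turn.lower():
            return "Let's reset your password. I'll send a reset link to your email."
        return "Thanks for reaching out. Could you tell me more about the issue?"
```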
The evaluate_report helper delegates to ai-evaluation, Future AGI's Evaluate SDK. It accepts either a named template list or field-mapped specs:
```python
from fi.simulate.evaluation import evaluate_report

# Named templates with sensible defaults
evaluate_report(report, eval_templates=("task_completion", "tone", "is_helpful"))

# Or explicit field mapping — including audio
evaluate_report(
    report,
    eval_specs=[
        {"template": "task_completion",
         "map": {"input": "persona.situation", "output": "transcript"}},
        {"template": "audio_quality",
         "map": {"input_audio": "audio_combined_path"}},
    ],
)
```

50+ metrics are available out of the box — groundedness, faithfulness, tool-use correctness, RAG context relevance, hallucination, PII, toxicity, bias, audio quality, and custom rubrics. See the evaluation docs for the full catalog.
agent-simulate is one of six pillars in the Future AGI platform:
Simulate → Evaluate → Control → Monitor → Optimize · with Agent Command Center as the runtime gateway.
Traces from simulations flow into Monitor, scores flow into Evaluate, and failures feed Optimize — one loop, on your infrastructure.
| SDK | Pillar | Purpose |
|---|---|---|
| agent-simulate (you are here) | Simulate | Voice + text agent simulation |
| ai-evaluation | Evaluate | 50+ metrics, LLM-as-judge, guardrail scanners |
| traceAI | Monitor | OpenTelemetry tracing for 50+ AI frameworks |
| agent-opt | Optimize | 6 prompt-optimization algorithms |
We love contributions — bug fixes, new wrappers, framework integrations, docs, examples.
- Browse issues labeled `good first issue`
- Read the Contributing Guide
- Say hi on Discord
- Sign the CLA on your first PR (automatic bot)
| Channel | What it's for |
|---|---|
| 💬 Discord | Real-time help from the team and community |
| 🗨️ GitHub Discussions | Ideas, questions, roadmap input |
| 🐦 Twitter / X | Release announcements |
| 📝 Blog | Engineering & research posts |
| 📧 support@futureagi.com | Cloud account / billing |
| 🔐 security@futureagi.com | Private vulnerability disclosure (see SECURITY.md) |
agent-simulate is licensed under the Apache License 2.0. See LICENSE and NOTICE.
Built by the Future AGI team and contributors worldwide.
If this SDK helps you ship better agents, a ⭐ helps more teams find us.
🌐 futureagi.com · 📖 docs.futureagi.com · ☁️ app.futureagi.com

