
# Future AGI

**Simulate — test AI agents before users meet them**

Python SDK for the Simulate pillar of Future AGI. Run voice and text agents against persona-driven scenarios, capture transcripts and audio, and feed results straight into evals.


PyPI · Docs · Platform · Main repo · Discord · Issues


## Why this SDK?

AI agents hallucinate: they fabricate facts and misquote policies, and by the time anyone notices a bad production output, it is already in the world. You can't unit-test "don't make things up" — instead, you run the agent against realistic conversations before real users do.

agent-simulate is the client SDK for the Simulate pillar of Future AGI. It drives voice and text agents through persona-driven scenarios and hands the transcripts + audio to the Evaluate pillar for scoring.


agent-simulate Demo

## What's in the box

### Voice agents (LiveKit)

Connect a simulated customer to an agent sitting in a LiveKit room over WebRTC. Full multi-turn conversation, per-speaker and combined WAV recordings, and a complete transcript.

### Text agents (Cloud)

Orchestrate thousands of multi-turn text conversations against any agent framework (OpenAI, Anthropic, LangChain, Gemini, or your own) via Future AGI's hosted simulation backend.

### Evaluation-ready

Results drop straight into ai-evaluation via the evaluate_report helper. Score with 50+ built-in metrics or your own rubrics — task completion, tone, audio quality, groundedness, and others.


## Install

```bash
# Core SDK
pip install agent-simulate

# With voice (LiveKit) support
pip install "agent-simulate[livekit]"

# With evaluation helpers
pip install "agent-simulate[evaluation]"

# Everything
pip install "agent-simulate[all]"
```

Requires Python 3.10–3.13.

### Voice mode: download Silero VAD weights (one time)

The LiveKit engine uses Silero VAD for voice-activity detection. Run this once after installing the `[livekit]` extra:

```python
from livekit.plugins import silero

if __name__ == "__main__":
    silero.VAD.load()
```

## 🚀 Quickstart — Voice agent (LiveKit)

Connects a simulated customer (Alice) to a deployed voice agent waiting in a LiveKit room, records the call, and scores the transcript.

```python
import asyncio
import os
from dotenv import load_dotenv
from fi.simulate import AgentDefinition, Scenario, Persona, TestRunner
from fi.simulate.evaluation import evaluate_report

load_dotenv()

async def main():
    # 1. Point at your deployed voice agent
    agent = AgentDefinition(
        name="my-support-agent",
        url=os.environ["LIVEKIT_URL"],
        room_name="support-room",
        system_prompt="Helpful support agent",
    )

    # 2. Describe the test case
    scenario = Scenario(
        name="Password Reset",
        dataset=[
            Persona(
                persona={"name": "Alice", "mood": "frustrated"},
                situation="She cannot log into her account.",
                outcome="The agent should guide her through a password reset.",
            ),
        ],
    )

    # 3. Run the simulation
    runner = TestRunner()
    report = await runner.run_test(
        agent_definition=agent,
        scenario=scenario,
        record_audio=True,  # writes per-speaker + combined WAVs
    )

    # 4. Inspect results
    for r in report.results:
        print(r.transcript)
        print(r.audio_combined_path)

    # 5. Score with Future AGI evals
    evaluated = evaluate_report(
        report,
        eval_specs=[
            {"template": "task_completion",
             "map": {"input": "persona.situation", "output": "transcript"}},
            {"template": "audio_quality",
             "map": {"input_audio": "audio_combined_path"}},
        ],
    )

    for r in evaluated.results:
        for name, scores in (r.evaluation or {}).items():
            print(name, scores["score"], scores["reason"])

asyncio.run(main())
```

Required environment variables (voice mode):

```bash
LIVEKIT_URL="wss://your-livekit-server.com"
LIVEKIT_API_KEY="..."
LIVEKIT_API_SECRET="..."
OPENAI_API_KEY="..."           # for the simulated customer
FI_API_KEY="..."               # for evaluation
FI_SECRET_KEY="..."
```
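A missing variable can otherwise surface as a failure mid-call. A small preflight check fails fast instead — `missing_env` is a hypothetical helper sketched here, not part of the SDK:

```python
import os

# The voice-mode variables listed above.
REQUIRED_VOICE_ENV = (
    "LIVEKIT_URL", "LIVEKIT_API_KEY", "LIVEKIT_API_SECRET",
    "OPENAI_API_KEY", "FI_API_KEY", "FI_SECRET_KEY",
)

def missing_env(required=REQUIRED_VOICE_ENV):
    """Return the names of required variables that are unset or empty."""
    return [name for name in required if not os.environ.get(name)]
```

Call `missing_env()` before `runner.run_test(...)` and abort with a clear message if it returns anything.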

## 🚀 Quickstart — Text agent (Cloud)

Cloud mode runs the scenario orchestration on Future AGI's backend and calls your agent over a local callback. Use it when you want thousands of parallel text conversations against an OpenAI, Anthropic, LangChain, or Gemini agent without running LiveKit.

  1. Create a simulation run from the Future AGI platform and copy its run_id (or name).
  2. Wire your agent to the runner:
```python
import asyncio
import os
from openai import AsyncOpenAI
from fi.simulate import TestRunner, OpenAIAgentWrapper

async def main():
    # Your agent, wrapped in a zero-config adapter
    wrapper = OpenAIAgentWrapper(
        client=AsyncOpenAI(),
        model="gpt-4o-mini",
        system_prompt="You are a helpful support agent.",
    )

    runner = TestRunner(
        api_key=os.environ["FI_API_KEY"],
        secret_key=os.environ["FI_SECRET_KEY"],
    )

    report = await runner.run_test(
        run_test_name="support-agent-smoke-test",  # or run_id="..."
        agent_callback=wrapper,
        concurrency=5,
    )

asyncio.run(main())
```

Scores and transcripts land in the platform dashboard. The local TestReport is intentionally empty — metrics live in the backend so you can compare runs over time.

See examples/test_cloud_simulation.py for a full tool-using walkthrough. That example shows a custom AgentWrapper subclass around the OpenAI Agents SDK — useful when you need tool-call capture beyond the built-in OpenAIAgentWrapper.


## Agent wrappers

Built-in adapters for the most common Python agent frameworks. All are text-only and live under fi.simulate.

| Wrapper | Wraps | Import |
| --- | --- | --- |
| `OpenAIAgentWrapper` | `openai.OpenAI` / `AsyncOpenAI` (`chat.completions`) | `from fi.simulate import OpenAIAgentWrapper` |
| `AnthropicAgentWrapper` | `anthropic.Anthropic` / `AsyncAnthropic` | `from fi.simulate import AnthropicAgentWrapper` |
| `GeminiAgentWrapper` | `google.generativeai.GenerativeModel` | `from fi.simulate import GeminiAgentWrapper` |
| `LangChainAgentWrapper` | Any LangChain `Runnable` / chain | `from fi.simulate import LangChainAgentWrapper` |
| Custom | Anything — subclass `AgentWrapper` | `from fi.simulate import AgentWrapper` |

Rolling your own wrapper is a 20-line class — see CONTRIBUTING.md → Adding a new agent wrapper.
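As a concrete illustration, here is a minimal custom wrapper that answers from plain Python rules instead of an LLM. The `respond` method name and signature are assumptions for this sketch; check CONTRIBUTING.md for the actual `AgentWrapper` interface. The stand-in base class below only exists so the snippet runs without the SDK installed:

```python
import asyncio

# In real code: from fi.simulate import AgentWrapper
# Stand-in base class so this sketch is self-contained; the SDK's
# actual interface may differ.
class AgentWrapper:
    async def respond(self, message: str, history: list) -> str:
        raise NotImplementedError

class RuleBasedWrapper(AgentWrapper):
    """A deterministic smoke-test agent built from hard-coded rules."""

    async def respond(self, message: str, history: list) -> str:
        if "password" in message.lower():
            return "Let's reset your password. What's the email on the account?"
        return "How can I help you today?"

if __name__ == "__main__":
    print(asyncio.run(RuleBasedWrapper().respond("I forgot my password", [])))
```

A deterministic wrapper like this is useful for exercising scenario plumbing before spending tokens on a real model.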


## Evaluation

The evaluate_report helper delegates to ai-evaluation, Future AGI's Evaluate SDK. It accepts either a named template list or field-mapped specs:

```python
from fi.simulate.evaluation import evaluate_report

# Named templates with sensible defaults
evaluate_report(report, eval_templates=("task_completion", "tone", "is_helpful"))

# Or explicit field mapping — including audio
evaluate_report(
    report,
    eval_specs=[
        {"template": "task_completion",
         "map": {"input": "persona.situation", "output": "transcript"}},
        {"template": "audio_quality",
         "map": {"input_audio": "audio_combined_path"}},
    ],
)
```

50+ metrics are available out of the box — groundedness, faithfulness, tool-use correctness, RAG context relevance, hallucination, PII, toxicity, bias, audio quality, and custom rubrics. See the evaluation docs for the full catalog.
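The `map` values are dotted field paths into each simulation result. The SDK's own resolution logic isn't shown here, but the idea can be sketched as a simple key-or-attribute walk — `resolve_path` is a hypothetical helper, not the SDK's code:

```python
def resolve_path(obj, path: str):
    """Walk a dotted path like "persona.situation" through dict keys or attributes."""
    for part in path.split("."):
        obj = obj[part] if isinstance(obj, dict) else getattr(obj, part)
    return obj

# A result shaped like the voice quickstart above:
result = {
    "persona": {"situation": "She cannot log into her account."},
    "transcript": "agent: Hi, how can I help? ...",
}
resolve_path(result, "persona.situation")  # -> the text mapped to the eval's "input"
```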


## How this fits into Future AGI

agent-simulate is one of six pillars in the Future AGI platform:

Simulate → Evaluate → Control → Monitor → Optimize · with Agent Command Center as the runtime gateway.

Traces from simulations flow into Monitor, scores flow into Evaluate, and failures feed Optimize — one loop, on your infrastructure.

| SDK | Pillar | Purpose |
| --- | --- | --- |
| `agent-simulate` (you are here) | Simulate | Voice + text agent simulation |
| `ai-evaluation` | Evaluate | 50+ metrics, LLM-as-judge, guardrail scanners |
| `traceAI` | Monitor | OpenTelemetry tracing for 50+ AI frameworks |
| `agent-opt` | Optimize | 6 prompt-optimization algorithms |

Full platform README →


## Roadmap

**Shipped**

  • LiveKit voice simulation engine
  • Cloud simulation engine
  • OpenAI / Anthropic / Gemini / LangChain wrappers
  • Per-speaker + combined audio capture
  • Scenario auto-generation from a topic
  • evaluate_report integration with ai-evaluation

**In progress**

  • Tool-call capture in wrapper responses
  • Conversation-graph scenarios (branching flows)
  • Latency, interruption, and turn-taking metrics

**Coming up**

  • Streaming transcript API
  • Pluggable voice-backend interface (VAPI / Retell / Pipecat land on the main platform roadmap)
  • Adversarial persona templates (jailbreak, PII probing)
  • Multi-agent scenarios
  • On-device VAD + STT for air-gapped runs
  • Regression dashboards in-SDK

## 🤝 Contributing

We love contributions — bug fixes, new wrappers, framework integrations, docs, examples.

  1. Browse good first issue
  2. Read the Contributing Guide
  3. Say hi on Discord
  4. Sign the CLA on your first PR (automatic bot)

## 🌍 Community & support

| Channel | Purpose |
| --- | --- |
| 💬 Discord | Real-time help from the team and community |
| 🗨️ GitHub Discussions | Ideas, questions, roadmap input |
| 🐦 Twitter / X | Release announcements |
| 📝 Blog | Engineering & research posts |
| 📧 support@futureagi.com | Cloud account / billing |
| 🔐 security@futureagi.com | Private vulnerability disclosure (see SECURITY.md) |

## 📄 License

agent-simulate is licensed under the Apache License 2.0. See LICENSE and NOTICE.


Built by the Future AGI team and contributors worldwide.

If this SDK helps you ship better agents, a ⭐ helps more teams find us.

🌐 futureagi.com · 📖 docs.futureagi.com · ☁️ app.futureagi.com