🤖 ROBO CODED — This project was made with AI and may not be 100% sane. But the code does work! 🎉
A voice-powered AI assistant that answers phone calls, understands natural language, and performs actions like checking weather, setting timers, scheduling callbacks, and more.
| Feature | Description |
|---|---|
| 🎙️ Voice Conversations | Natural speech-to-text and text-to-speech powered by Whisper & Kokoro |
| 🤖 LLM Integration | Connects to OpenAI, vLLM, Ollama, LM Studio, and more |
| 🔧 Built-in Tools | Weather, timers, callbacks, date/time, calculator, jokes |
| 🔌 Plugin System | Easily add custom tools with Python |
| 🌐 REST API | Initiate outbound calls, execute tools, manage schedules |
| ⏰ Scheduled Calls | One-time or recurring calls (daily briefings, reminders) |
| 🔗 Webhooks | Trigger calls from Home Assistant, n8n, Grafana, and more |
| 🗣️ Custom Phrases | Customize greetings, goodbyes, and responses via JSON or env vars |
| 📊 Observability | Prometheus metrics, OpenTelemetry tracing, structured JSON logs |
| Use Case | Example |
|---|---|
| ⏲️ Timers & Reminders | "Set a timer for 10 minutes" |
| 📞 Callbacks | "Call me back in an hour" |
| 🌤️ Weather Briefings | Scheduled morning weather calls |
| 📅 Appointment Reminders | Outbound calls with confirmation |
| 🚨 Alerts & Notifications | Webhook-triggered phone calls |
| 🏠 Smart Home | Voice control via phone |
Call the assistant and say:
🗣️ "What's the weather like?"
```mermaid
sequenceDiagram
    participant User as 👤 User
    participant Agent as 🤖 SIP Agent
    participant STT as 🎤 Speaches
    participant LLM as 🧠 LLM
    participant Tool as 🌤️ Weather Tool

    User->>Agent: "What's the weather like?"
    Agent->>STT: Audio stream
    STT-->>Agent: Transcribed text
    Agent->>LLM: User query + context
    LLM-->>Agent: [TOOL:WEATHER]
    Agent->>Tool: Execute
    Tool-->>Agent: Weather data
    Agent->>LLM: Tool result
    LLM-->>Agent: Natural response
    Agent->>STT: Text to speech
    STT-->>Agent: Audio
    Agent->>User: "At Storm Lake, it's 44°..."
```
Assistant responds:
🤖 "At Storm Lake, as of 9:30 pm, it's 44 degrees with foggy conditions. Wind is calm."
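In the diagram, `[TOOL:WEATHER]` is the marker the LLM emits to request a tool, which the agent extracts before executing. A minimal sketch of that extraction step (the exact marker grammar beyond `[TOOL:NAME]` is an assumption):

```python
import re

# Matches markers like [TOOL:WEATHER] or [TOOL:SET_TIMER].
TOOL_TAG = re.compile(r"\[TOOL:([A-Z_]+)\]")

def extract_tool(llm_reply: str):
    """Return the tool name if the reply contains a [TOOL:NAME] tag, else None."""
    m = TOOL_TAG.search(llm_reply)
    return m.group(1) if m else None

print(extract_tool("[TOOL:WEATHER]"))      # WEATHER
print(extract_tool("It is sunny today."))  # None
```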
```mermaid
flowchart LR
    subgraph Caller
        Phone[📱 SIP Phone]
    end
    subgraph Agent["🤖 SIP AI Agent"]
        SIP[SIP Client]
        Audio[Audio Pipeline]
        Tools[Tool Manager]
        API[REST API]
    end
    subgraph Services
        LLM[🧠 LLM Server<br/>OpenAI / vLLM / Ollama]
        Speaches[🎤 Speaches<br/>STT + TTS]
    end
    subgraph Integrations
        HA[🏠 Home Assistant]
        N8N[🔄 n8n]
        Webhook[🔗 Webhooks]
    end

    Phone <-->|SIP/RTP| SIP
    SIP <--> Audio
    Audio <-->|Whisper| Speaches
    Audio <-->|Kokoro| Speaches
    Audio <--> Tools
    Tools <-->|OpenAI API| LLM
    API <--> Tools
    HA -->|HTTP| API
    N8N -->|HTTP| API
    Webhook -->|HTTP| API
```
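The integration edges in the diagram all boil down to an HTTP POST against the agent's REST API. As one sketch, a Home Assistant `rest_command` entry that would trigger an outbound call (the service name `sip_agent_call`, the extension, and the `localhost:8080` host are illustrative assumptions):

```yaml
# configuration.yaml — hypothetical rest_command pointing at the agent's /call endpoint
rest_command:
  sip_agent_call:
    url: "http://localhost:8080/call"
    method: POST
    content_type: "application/json"
    payload: '{"extension": "5551234567", "message": "{{ message }}"}'
```

Any automation can then call `rest_command.sip_agent_call` with a `message` to place a phone call.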
| Service | Purpose | URL |
|---|---|---|
| 🤖 SIP Agent | AI Voice Assistant API | localhost:8080 |
| 🎤 Speaches | STT/TTS (Whisper + Kokoro) | localhost:8001 |
| 🧠 vLLM | LLM Inference | localhost:8000 |
| 🔴 Redis | Call Queue & Cache | redis://localhost:6379 |
| 📊 Prometheus | Metrics Collection | localhost:9090 |
| 📈 Grafana | Dashboards | localhost:3000 |
| 📝 Loki | Log Aggregation | localhost:3100 |
| 🔍 Tempo | Distributed Tracing | localhost:3200 |
| 🔄 n8n | Workflow Automation | localhost:5678 |
| Requirement | Description |
|---|---|
| 🐳 Docker | Docker and Docker Compose |
| 📞 SIP Server | FreePBX, Asterisk, 3CX, or any SIP PBX |
```bash
# Clone the repository
git clone https://github.com/your-org/sip-agent.git
cd sip-agent

# Configure environment
cp sip-agent/.env.example sip-agent/.env
nano sip-agent/.env

# Start services
docker compose up -d

# (Optional) Start services with observability
docker compose -f ./docker-compose.yml -f docker-compose.observability.yml up -d
```

Verify the agent is healthy:

```bash
curl http://localhost:8080/health | jq
```

Expected output:
```json
{
  "status": "healthy",
  "sip_registered": true,
  "active_calls": 0
}
```

An example conversation:

```text
┌────────────────────────────────────────────────────────────┐
│ 📞 INCOMING CALL                                           │
├────────────────────────────────────────────────────────────┤
│ 🤖 "Hello! Welcome to the AI assistant. How can I help?"   │
│ 👤 "What's the weather like?"                              │
│ 🤖 "At Storm Lake, it's 44 degrees with foggy conditions." │
│ 👤 "Set a timer for 5 minutes"                             │
│ 🤖 "Timer set for 5 minutes!"                              │
│ 👤 "Goodbye"                                               │
│ 🤖 "Goodbye! Have a great day!"                            │
└────────────────────────────────────────────────────────────┘
```
Create a `.env` file with your settings:

```bash
# 📞 SIP Connection
SIP_USER=ai-assistant
SIP_PASSWORD=your-secure-password
SIP_DOMAIN=pbx.example.com

# 🎤 Speaches (STT + TTS)
SPEACHES_API_URL=http://speaches:8001

# 🧠 LLM Settings
LLM_BASE_URL=http://vllm:8000/v1
LLM_MODEL=openai-community/gpt2-xl

# 🌤️ Weather (Optional)
TEMPEST_STATION_ID=12345
TEMPEST_API_TOKEN=your-api-token
```

📖 See Configuration Reference for all options.
```bash
curl -X POST http://localhost:8080/call \
  -H "Content-Type: application/json" \
  -d '{
    "extension": "5551234567",
    "message": "Hello! This is a reminder about your appointment tomorrow."
  }'
```

Response:

```json
{
  "call_id": "out-1732945860-1",
  "status": "queued",
  "message": "Call initiated"
}
```

Schedule a daily weather call at 7 am:
```bash
curl -X POST http://sip-agent:8080/schedule \
  -H "Content-Type: application/json" \
  -d '{
    "extension": "5551234567",
    "tool": "WEATHER",
    "at_time": "07:00",
    "timezone": "America/Los_Angeles",
    "recurring": "daily",
    "prefix": "Good morning! Here is your weather update for today.",
    "suffix": "Have a great day!"
  }' | jq
```

Response:
```json
{
  "schedule_id": "a1b2c3d4",
  "status": "scheduled",
  "scheduled_for": "2025-12-01T07:00:00-08:00",
  "recurring": "daily"
}
```

List the registered tools:

```bash
curl http://localhost:8080/tools | jq '.[].name'
```

Output:
```text
"WEATHER"
"SET_TIMER"
"CALLBACK"
"HANGUP"
"STATUS"
"CANCEL"
"DATETIME"
"CALC"
"JOKE"
```
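The same endpoints can be driven from a script. A minimal sketch using only Python's standard library, mirroring the curl examples above (the host and port are taken from the service table; error handling omitted):

```python
import json
import urllib.request

AGENT_URL = "http://localhost:8080"  # assumed from the service table

def build_call_payload(extension: str, message: str) -> bytes:
    """Encode the POST /call body shown in the curl example."""
    return json.dumps({"extension": extension, "message": message}).encode("utf-8")

def place_call(extension: str, message: str) -> dict:
    """POST /call and return the agent's JSON response (call_id, status, ...)."""
    req = urllib.request.Request(
        f"{AGENT_URL}/call",
        data=build_call_payload(extension, message),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

# Usage (requires a running agent):
# print(place_call("5551234567", "Reminder about your appointment tomorrow."))
```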
Data center GPUs with maximum performance.

| Component | Model | Notes |
|---|---|---|
| LLM | meta-llama/Llama-3.1-70B-Instruct | Best quality, fits in a single GPU |
| LLM | Qwen/Qwen2.5-72B-Instruct | Alternative, excellent reasoning |
| STT | Systran/faster-whisper-large-v3 | Best accuracy |
| TTS | af_heart | Warm, natural voice |

```bash
# H100/A100 80GB Configuration
LLM_MODEL=meta-llama/Llama-3.1-70B-Instruct
LLM_URL=http://localhost:8000/v1
STT_MODEL=Systran/faster-whisper-large-v3
TTS_VOICE=af_heart
```

Grace Blackwell GB10 with shared CPU/GPU memory.
| Component | Model | Notes |
|---|---|---|
| LLM | meta-llama/Llama-3.1-70B-Instruct | Fits in unified memory |
| LLM | Qwen/Qwen2.5-72B-Instruct | Alternative option |
| LLM | deepseek-ai/DeepSeek-R1-Distill-Llama-70B | Reasoning focused |
| STT | Systran/faster-whisper-large-v3 | Best accuracy |
| TTS | af_heart | Warm, natural voice |

```bash
# DGX Spark Configuration (128GB unified memory)
LLM_MODEL=meta-llama/Llama-3.1-70B-Instruct
LLM_URL=http://localhost:8000/v1
STT_MODEL=Systran/faster-whisper-large-v3
TTS_VOICE=af_heart
```

Next-gen consumer flagship.
| Component | Model | Notes |
|---|---|---|
| LLM | Qwen/Qwen2.5-32B-Instruct | Best fit for 32GB |
| LLM | meta-llama/Llama-3.1-8B-Instruct | Faster, lower quality |
| LLM | mistralai/Mistral-Small-24B-Instruct-2501 | Good balance |
| STT | Systran/faster-whisper-large-v3 | Best accuracy |
| TTS | af_heart | Warm, natural voice |

```bash
# RTX 5090 Configuration (32GB VRAM)
LLM_MODEL=Qwen/Qwen2.5-32B-Instruct
LLM_URL=http://localhost:8000/v1
STT_MODEL=Systran/faster-whisper-large-v3
TTS_VOICE=af_heart
```

Current consumer flagship.
| Component | Model | Notes |
|---|---|---|
| LLM | Qwen/Qwen2.5-14B-Instruct | Best quality for 24GB |
| LLM | meta-llama/Llama-3.1-8B-Instruct | Faster option |
| LLM | mistralai/Mistral-7B-Instruct-v0.3 | Good tool calling |
| STT | Systran/faster-whisper-large-v3 | Best accuracy |
| TTS | af_heart | Warm, natural voice |

```bash
# RTX 4090 Configuration (24GB VRAM)
LLM_MODEL=Qwen/Qwen2.5-14B-Instruct
LLM_URL=http://localhost:8000/v1
STT_MODEL=Systran/faster-whisper-large-v3
TTS_VOICE=af_heart
```

High-end consumer GPUs.
| Component | Model | Notes |
|---|---|---|
| LLM | meta-llama/Llama-3.1-8B-Instruct | Best for 16-24GB |
| LLM | Qwen/Qwen2.5-7B-Instruct | Fast alternative |
| LLM | microsoft/Phi-3-medium-4k-instruct | 14B, good quality |
| STT | Systran/faster-whisper-medium | Good balance |
| TTS | af_heart | Warm, natural voice |

```bash
# RTX 3090/4080 Configuration (16-24GB VRAM)
LLM_MODEL=meta-llama/Llama-3.1-8B-Instruct
LLM_URL=http://localhost:8000/v1
STT_MODEL=Systran/faster-whisper-medium
TTS_VOICE=af_heart
```

Mid-range GPUs.
| Component | Model | Notes |
|---|---|---|
| LLM | Qwen/Qwen2.5-7B-Instruct |
Best for 10-12GB |
| LLM | microsoft/Phi-3-mini-4k-instruct |
3.8B, very fast |
| LLM | meta-llama/Llama-3.2-3B-Instruct |
Lightweight |
| STT | Systran/faster-whisper-small |
Low VRAM |
| TTS | af_heart |
Warm, natural voice |
# RTX 3080/4070 Configuration (10-12GB VRAM)
LLM_MODEL=Qwen/Qwen2.5-7B-Instruct
LLM_URL=http://localhost:8000/v1
STT_MODEL=Systran/faster-whisper-small
TTS_VOICE=af_heartOptimized for fastest response times.
```bash
# Minimum latency configuration
LLM_MODEL=Qwen/Qwen2.5-3B-Instruct
STT_MODEL=Systran/faster-whisper-tiny.en
TTS_VOICE=af_heart
TTS_SPEED=1.1
```

| Voice | Style | Gender | Accent |
|---|---|---|---|
| af_heart | Warm, friendly | Female | American |
| af_bella | Professional | Female | American |
| af_sarah | Casual | Female | American |
| af_nicole | Expressive | Female | American |
| am_adam | Neutral | Male | American |
| am_michael | Professional | Male | American |
| bf_emma | Warm | Female | British |
| bm_george | Professional | Male | British |
| Tool | Description | Example Phrase |
|---|---|---|
| 🌤️ WEATHER | Current weather conditions | "What's the weather?" |
| ⏲️ SET_TIMER | Set a countdown timer | "Set a timer for 5 minutes" |
| 📞 CALLBACK | Schedule a callback | "Call me back in an hour" |
| 📴 HANGUP | End the call | "Goodbye" |
| 📋 STATUS | Check pending timers | "What timers do I have?" |
| ❌ CANCEL | Cancel timers/callbacks | "Cancel my timer" |
| 🕐 DATETIME | Current date and time | "What time is it?" |
| 🧮 CALC | Math calculations | "What's 25 times 4?" |
| 😄 JOKE | Tell a joke | "Tell me a joke" |
| 🦜 SIMON_SAYS | Repeat back verbatim | "Simon says hello world" |
Add custom tools by creating Python plugins:

```python
# src/plugins/hello_tool.py
from tool_plugins import BaseTool, ToolResult, ToolStatus


class HelloTool(BaseTool):
    name = "HELLO"
    description = "Say hello to someone"
    parameters = {
        "name": {
            "type": "string",
            "description": "Name to greet",
            "required": True
        }
    }

    async def execute(self, params):
        name = params.get("name", "friend")
        return ToolResult(
            status=ToolStatus.SUCCESS,
            message=f"Hello, {name}! Nice to meet you."
        )
```

Register it in `tool_manager.py`:

```python
from plugins.hello_tool import HelloTool

tool_classes = [
    # ... existing tools ...
    HelloTool,
]
```

📖 See Creating Plugins for the full guide.
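A plugin's `execute` coroutine can be exercised on its own before wiring it into the agent. A standalone sketch, using lightweight stand-ins for the project's `BaseTool`/`ToolResult` classes so it runs outside the agent:

```python
import asyncio
from dataclasses import dataclass


# Stand-in for the project's ToolResult, so this sketch runs on its own.
@dataclass
class ToolResult:
    status: str
    message: str


class HelloTool:
    name = "HELLO"

    async def execute(self, params):
        name = params.get("name", "friend")
        return ToolResult(status="success", message=f"Hello, {name}! Nice to meet you.")


# Drive the async tool from synchronous test code.
result = asyncio.run(HelloTool().execute({"name": "Ada"}))
print(result.message)  # Hello, Ada! Nice to meet you.
```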
```bash
# Docker logs
docker logs -f sip-agent

# Formatted log viewer
python tools/view-logs.py -f
```

Example output:

```text
┌──────────────────────────────────────────────────────────────
│ 📞 CALL #1 - From: 1001
└──────────────────────────────────────────────────────────────
15:30:05 📞 Call started
15:30:06 👤 "What's the weather?"
15:30:07 🔧 [TOOL:WEATHER]
15:30:08 🤖 "At Storm Lake, it's 44 degrees..."
15:30:12 👤 "Thanks, goodbye"
15:30:13 📴 Call ended (duration: 0:08)
```
Import the included dashboard: `grafana/dashboards/sip-agent.json`

```text
sip-agent/
├── 📄 README.md                         # 👈 You are here
├── 📄 RELEASE.md                        # Release notes
├── 📄 CHANGELOG.md                      # Version history
├── 📄 docker-compose.yml                # Main compose file
├── 📄 docker-compose.observability.yml
├── 📄 openapi.yaml                      # API specification
│
├── 📂 sip-agent/                        # Core application
│   ├── 📄 Dockerfile
│   ├── 📄 requirements.txt
│   ├── 📄 .env.example
│   ├── 📂 data/
│   │   └── 📄 phrases.json.example
│   └── 📂 src/
│       ├── 📄 main.py                   # Application entry
│       ├── 📄 config.py                 # Configuration
│       ├── 📄 api.py                    # REST API
│       ├── 📄 sip_handler.py            # SIP call handling
│       ├── 📄 audio_pipeline.py         # STT/TTS processing
│       ├── 📄 llm_engine.py             # LLM integration
│       ├── 📄 tool_manager.py           # Tool orchestration
│       ├── 📄 tool_plugins.py           # Plugin base classes
│       ├── 📄 call_queue.py             # Redis call queue
│       ├── 📄 realtime_client.py        # WebSocket STT
│       ├── 📄 telemetry.py              # OpenTelemetry
│       ├── 📄 logging_utils.py          # Structured logging
│       ├── 📄 retry_utils.py            # API retry logic
│       └── 📂 plugins/                  # Built-in tools
│           ├── 📄 weather_tool.py
│           ├── 📄 timer_tool.py
│           ├── 📄 callback_tool.py
│           ├── 📄 hangup_tool.py
│           ├── 📄 status_tool.py
│           ├── 📄 cancel_tool.py
│           ├── 📄 datetime_tool.py
│           ├── 📄 calc_tool.py
│           ├── 📄 joke_tool.py
│           └── 📄 simon_says_tool.py
│
├── 📂 docs/                             # Documentation
│   ├── 📄 index.md                      # Overview
│   ├── 📄 getting-started.md            # Installation
│   ├── 📄 configuration.md              # Config reference
│   ├── 📄 api-reference.md              # REST API
│   ├── 📄 tools.md                      # Built-in tools
│   ├── 📄 plugins.md                    # Plugin development
│   ├── 📄 examples.md                   # Integration examples
│   └── 📂 screenshots/
│
├── 📂 observability/                    # Monitoring stack
│   ├── 📂 grafana/
│   │   └── 📂 provisioning/
│   │       ├── 📂 dashboards/           # Pre-built dashboards
│   │       └── 📂 datasources/
│   ├── 📂 prometheus/
│   │   └── 📄 prometheus.yaml
│   ├── 📂 loki/
│   │   └── 📄 loki.yaml
│   ├── 📂 tempo/
│   │   └── 📄 tempo.yaml
│   └── 📂 otel-collector/
│       └── 📄 config.yaml
│
├── 📂 tools/                            # Utilities
│   └── 📄 view-logs.py                  # Log viewer
│
└── 📂 .github/
    └── 📂 workflows/
        ├── 📄 docker-build.yml          # Docker CI
        └── 📄 readme-sync.yml           # Docs sync
```
This project is optimized to run on the NVIDIA DGX Spark with Grace Blackwell architecture.
```text
┌─────────────────────────────────────────────────────────────┐
│ 🟢 NVIDIA DGX Spark                                         │
├─────────────────────────────────────────────────────────────┤
│ 🧠 Grace Blackwell GB10 Superchip                           │
│ 💾 128GB Unified Memory                                     │
│ ⚡ 1 PFLOP AI Performance                                   │
├─────────────────────────────────────────────────────────────┤
│ ✅ Local LLM inference (vLLM, Ollama)                       │
│ ✅ Local STT/TTS (Speaches + Whisper + Kokoro)              │
│ ✅ Real-time voice processing                               │
│ ✅ Multiple concurrent calls                                │
└─────────────────────────────────────────────────────────────┘
```

Recommended DGX Spark setup:

```bash
# Run everything locally on DGX Spark
LLM_BASE_URL=http://localhost:8000/v1
LLM_MODEL=openai/gpt-oss-20b
SPEACHES_API_URL=http://localhost:8001
```

📚 Full documentation available at sip-agent.readme.io
| Document | Description |
|---|---|
| 📖 Overview | Architecture and features |
| 🚀 Getting Started | Installation guide |
| ⚙️ Configuration | Environment variables |
| 🌐 API Reference | REST API endpoints |
| 🔧 Built-in Tools | Available tools |
| 🔌 Creating Plugins | Custom tool development |
| 📖 Examples | Integration patterns |
Contributions are welcome! Please read our contributing guidelines first.
```bash
# Fork and clone
git clone https://github.com/your-username/sip-agent.git

# Create a branch
git checkout -b feature/amazing-feature

# Make changes and test
docker compose up -d
python -m pytest

# Commit with emoji
git commit -m "✨ feat: add amazing feature"

# Push and open a PR
git push origin feature/amazing-feature
```

This project is licensed under the GNU Affero General Public License v3.0; see the LICENSE file for details.

SPDX-License-Identifier: AGPL-3.0-or-later
- NVIDIA DGX Spark — AI supercomputer platform
- Speaches — Unified STT/TTS server
- PJSIP — SIP stack
- FastAPI — REST API framework
- WeatherFlow Tempest — Weather data
| Resource | Link |
|---|---|
| 📖 Docs | sip-agent.readme.io |
| 🐛 Issues | GitHub Issues |
| 💬 Discussions | GitHub Discussions |
Made with ❤️ and 🤖






