🤖 ROBO CODED — This project was made with AI and may not be 100% sane. But the code does work! 🎉
A voice-powered AI assistant that answers phone calls, understands natural language, and performs actions like checking weather, setting timers, scheduling callbacks, and more.
| Feature | Description |
|---|---|
| 🎙️ Voice Conversations | Natural speech-to-text and text-to-speech powered by Whisper & Kokoro |
| 🤖 LLM Integration | Connects to OpenAI, vLLM, Ollama, LM Studio, and more |
| 🔧 Built-in Tools | Weather, timers, callbacks, date/time, calculator, jokes |
| 🔌 Plugin System | Easily add custom tools with Python |
| 🌐 REST API | Initiate outbound calls, execute tools, manage schedules |
| ⏰ Scheduled Calls | One-time or recurring calls (daily briefings, reminders) |
| 🔗 Webhooks | Trigger calls from Home Assistant, n8n, Grafana, and more |
| 🗣️ Custom Phrases | Customize greetings, goodbyes, and responses via JSON or env vars |
| 📊 Observability | Prometheus metrics, OpenTelemetry tracing, structured JSON logs |
| Use Case | Example |
|---|---|
| ⏲️ Timers & Reminders | "Set a timer for 10 minutes" |
| 📞 Callbacks | "Call me back in an hour" |
| 🌤️ Weather Briefings | Scheduled morning weather calls |
| 📅 Appointment Reminders | Outbound calls with confirmation |
| 🚨 Alerts & Notifications | Webhook-triggered phone calls |
| 🏠 Smart Home | Voice control via phone |
Call the assistant and say:
🗣️ "What's the weather like?"
```mermaid
sequenceDiagram
    participant User as 👤 User
    participant Agent as 🤖 SIP Agent
    participant STT as 🎤 Speaches
    participant LLM as 🧠 LLM
    participant Tool as 🌤️ Weather Tool

    User->>Agent: "What's the weather like?"
    Agent->>STT: Audio stream
    STT-->>Agent: Transcribed text
    Agent->>LLM: User query + context
    LLM-->>Agent: [TOOL:WEATHER]
    Agent->>Tool: Execute
    Tool-->>Agent: Weather data
    Agent->>LLM: Tool result
    LLM-->>Agent: Natural response
    Agent->>STT: Text to speech
    STT-->>Agent: Audio
    Agent->>User: "At Storm Lake, it's 44°..."
```
Assistant responds:
🤖 "At Storm Lake, as of 9:30 pm, it's 44 degrees with foggy conditions. Wind is calm."
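In the diagram, `[TOOL:WEATHER]` is the marker the LLM emits to request a tool, which the agent extracts before executing. A minimal sketch of that extraction step (the exact marker grammar beyond `[TOOL:NAME]` is an assumption):

```python
import re

# Matches markers like [TOOL:WEATHER] or [TOOL:SET_TIMER].
TOOL_TAG = re.compile(r"\[TOOL:([A-Z_]+)\]")

def extract_tool(llm_reply: str):
    """Return the tool name if the reply contains a [TOOL:NAME] tag, else None."""
    m = TOOL_TAG.search(llm_reply)
    return m.group(1) if m else None

print(extract_tool("[TOOL:WEATHER]"))      # WEATHER
print(extract_tool("It is sunny today."))  # None
```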
```mermaid
flowchart LR
    subgraph Caller
        Phone[📱 SIP Phone]
    end
    subgraph Agent["🤖 SIP AI Agent"]
        SIP[SIP Client]
        Audio[Audio Pipeline]
        Tools[Tool Manager]
        API[REST API]
    end
    subgraph Services
        LLM[🧠 LLM Server<br/>OpenAI / vLLM / Ollama]
        Speaches[🎤 Speaches<br/>STT + TTS]
    end
    subgraph Integrations
        HA[🏠 Home Assistant]
        N8N[🔄 n8n]
        Webhook[🔗 Webhooks]
    end

    Phone <-->|SIP/RTP| SIP
    SIP <--> Audio
    Audio <-->|Whisper| Speaches
    Audio <-->|Kokoro| Speaches
    Audio <--> Tools
    Tools <-->|OpenAI API| LLM
    API <--> Tools
    HA -->|HTTP| API
    N8N -->|HTTP| API
    Webhook -->|HTTP| API
```
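The integration edges in the diagram all boil down to an HTTP POST against the agent's REST API. As one sketch, a Home Assistant `rest_command` entry that would trigger an outbound call (the service name `sip_agent_call`, the extension, and the `localhost:8080` host are illustrative assumptions):

```yaml
# configuration.yaml — hypothetical rest_command pointing at the agent's /call endpoint
rest_command:
  sip_agent_call:
    url: "http://localhost:8080/call"
    method: POST
    content_type: "application/json"
    payload: '{"extension": "5551234567", "message": "{{ message }}"}'
```

Any automation can then call `rest_command.sip_agent_call` with a `message` to place a phone call.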
| Service | Purpose | URL |
|---|---|---|
| 🤖 SIP Agent | AI Voice Assistant API | localhost:8080 |
| 🎤 Speaches | STT/TTS (Whisper + Kokoro) | localhost:8001 |
| 🧠 vLLM | LLM Inference | localhost:8000 |
| 🔴 Redis | Call Queue & Cache | redis://localhost:6379 |
| 📊 Prometheus | Metrics Collection | localhost:9090 |
| 📈 Grafana | Dashboards | localhost:3000 |
| 📝 Loki | Log Aggregation | localhost:3100 |
| 🔍 Tempo | Distributed Tracing | localhost:3200 |
| 🔄 n8n | Workflow Automation | localhost:5678 |
| Requirement | Description |
|---|---|
| 🐳 Docker | Docker and Docker Compose |
| 📞 SIP Server | FreePBX, Asterisk, 3CX, or any SIP PBX |
```bash
# Clone the repository
git clone https://github.com/your-org/sip-agent.git
cd sip-agent

# Configure environment
cp sip-agent/.env.example sip-agent/.env
nano sip-agent/.env

# Start services
docker compose up -d

# (Optional) Start services with observability
docker compose -f ./docker-compose.yml -f docker-compose.observability.yml up -d
```

Verify the agent is healthy:

```bash
curl http://localhost:8080/health | jq
```

Expected output:
```json
{
  "status": "healthy",
  "sip_registered": true,
  "active_calls": 0
}
```

An example conversation:

```text
┌────────────────────────────────────────────────────────────┐
│ 📞 INCOMING CALL                                           │
├────────────────────────────────────────────────────────────┤
│ 🤖 "Hello! Welcome to the AI assistant. How can I help?"   │
│ 👤 "What's the weather like?"                              │
│ 🤖 "At Storm Lake, it's 44 degrees with foggy conditions." │
│ 👤 "Set a timer for 5 minutes"                             │
│ 🤖 "Timer set for 5 minutes!"                              │
│ 👤 "Goodbye"                                               │
│ 🤖 "Goodbye! Have a great day!"                            │
└────────────────────────────────────────────────────────────┘
```
Create a `.env` file with your settings:

```bash
# 📞 SIP Connection
SIP_USER=ai-assistant
SIP_PASSWORD=your-secure-password
SIP_DOMAIN=pbx.example.com

# 🎤 Speaches (STT + TTS)
SPEACHES_API_URL=http://speaches:8001

# 🧠 LLM Settings
LLM_BASE_URL=http://vllm:8000/v1
LLM_MODEL=openai-community/gpt2-xl

# 🌤️ Weather (Optional)
TEMPEST_STATION_ID=12345
TEMPEST_API_TOKEN=your-api-token
```

📖 See Configuration Reference for all options.
```bash
curl -X POST http://localhost:8080/call \
  -H "Content-Type: application/json" \
  -d '{
    "extension": "5551234567",
    "message": "Hello! This is a reminder about your appointment tomorrow."
  }'
```

Response:

```json
{
  "call_id": "out-1732945860-1",
  "status": "queued",
  "message": "Call initiated"
}
```

Schedule a daily weather call at 7 am:
```bash
curl -X POST http://sip-agent:8080/schedule \
  -H "Content-Type: application/json" \
  -d '{
    "extension": "5551234567",
    "tool": "WEATHER",
    "at_time": "07:00",
    "timezone": "America/Los_Angeles",
    "recurring": "daily",
    "prefix": "Good morning! Here is your weather update for today.",
    "suffix": "Have a great day!"
  }' | jq
```

Response:
```json
{
  "schedule_id": "a1b2c3d4",
  "status": "scheduled",
  "scheduled_for": "2025-12-01T07:00:00-08:00",
  "recurring": "daily"
}
```

List the registered tools:

```bash
curl http://localhost:8080/tools | jq '.[].name'
```

Output:
```text
"WEATHER"
"SET_TIMER"
"CALLBACK"
"HANGUP"
"STATUS"
"CANCEL"
"DATETIME"
"CALC"
"JOKE"
```
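The same endpoints can be driven from a script. A minimal sketch using only Python's standard library, mirroring the curl examples above (the host and port are taken from the service table; error handling omitted):

```python
import json
import urllib.request

AGENT_URL = "http://localhost:8080"  # assumed from the service table

def build_call_payload(extension: str, message: str) -> bytes:
    """Encode the POST /call body shown in the curl example."""
    return json.dumps({"extension": extension, "message": message}).encode("utf-8")

def place_call(extension: str, message: str) -> dict:
    """POST /call and return the agent's JSON response (call_id, status, ...)."""
    req = urllib.request.Request(
        f"{AGENT_URL}/call",
        data=build_call_payload(extension, message),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

# Usage (requires a running agent):
# print(place_call("5551234567", "Reminder about your appointment tomorrow."))
```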
Data center GPUs with maximum performance.

| Component | Model | Notes |
|---|---|---|
| LLM | meta-llama/Llama-3.1-70B-Instruct | Best quality, fits in a single GPU |
| LLM | Qwen/Qwen2.5-72B-Instruct | Alternative, excellent reasoning |
| STT | Systran/faster-whisper-large-v3 | Best accuracy |
| TTS | af_heart | Warm, natural voice |

```bash
# H100/A100 80GB Configuration
LLM_MODEL=meta-llama/Llama-3.1-70B-Instruct
LLM_URL=http://localhost:8000/v1
STT_MODEL=Systran/faster-whisper-large-v3
TTS_VOICE=af_heart
```

Grace Blackwell GB10 with shared CPU/GPU memory.
| Component | Model | Notes |
|---|---|---|
| LLM | meta-llama/Llama-3.1-70B-Instruct | Fits in unified memory |
| LLM | Qwen/Qwen2.5-72B-Instruct | Alternative option |
| LLM | deepseek-ai/DeepSeek-R1-Distill-Llama-70B | Reasoning focused |
| STT | Systran/faster-whisper-large-v3 | Best accuracy |
| TTS | af_heart | Warm, natural voice |

```bash
# DGX Spark Configuration (128GB unified memory)
LLM_MODEL=meta-llama/Llama-3.1-70B-Instruct
LLM_URL=http://localhost:8000/v1
STT_MODEL=Systran/faster-whisper-large-v3
TTS_VOICE=af_heart
```

Next-gen consumer flagship.
| Component | Model | Notes |
|---|---|---|
| LLM | Qwen/Qwen2.5-32B-Instruct | Best fit for 32GB |
| LLM | meta-llama/Llama-3.1-8B-Instruct | Faster, lower quality |
| LLM | mistralai/Mistral-Small-24B-Instruct-2501 | Good balance |
| STT | Systran/faster-whisper-large-v3 | Best accuracy |
| TTS | af_heart | Warm, natural voice |

```bash
# RTX 5090 Configuration (32GB VRAM)
LLM_MODEL=Qwen/Qwen2.5-32B-Instruct
LLM_URL=http://localhost:8000/v1
STT_MODEL=Systran/faster-whisper-large-v3
TTS_VOICE=af_heart
```

Current consumer flagship.
| Component | Model | Notes |
|---|---|---|
| LLM | Qwen/Qwen2.5-14B-Instruct | Best quality for 24GB |
| LLM | meta-llama/Llama-3.1-8B-Instruct | Faster option |
| LLM | mistralai/Mistral-7B-Instruct-v0.3 | Good tool calling |
| STT | Systran/faster-whisper-large-v3 | Best accuracy |
| TTS | af_heart | Warm, natural voice |

```bash
# RTX 4090 Configuration (24GB VRAM)
LLM_MODEL=Qwen/Qwen2.5-14B-Instruct
LLM_URL=http://localhost:8000/v1
STT_MODEL=Systran/faster-whisper-large-v3
TTS_VOICE=af_heart
```

High-end consumer GPUs.
| Component | Model | Notes |
|---|---|---|
| LLM | meta-llama/Llama-3.1-8B-Instruct | Best for 16-24GB |
| LLM | Qwen/Qwen2.5-7B-Instruct | Fast alternative |
| LLM | microsoft/Phi-3-medium-4k-instruct | 14B, good quality |
| STT | Systran/faster-whisper-medium | Good balance |
| TTS | af_heart | Warm, natural voice |

```bash
# RTX 3090/4080 Configuration (16-24GB VRAM)
LLM_MODEL=meta-llama/Llama-3.1-8B-Instruct
LLM_URL=http://localhost:8000/v1
STT_MODEL=Systran/faster-whisper-medium
TTS_VOICE=af_heart
```

Mid-range GPUs.
| Component | Model | Notes |
|---|---|---|
| LLM | Qwen/Qwen2.5-7B-Instruct |
Best for 10-12GB |
| LLM | microsoft/Phi-3-mini-4k-instruct |
3.8B, very fast |
| LLM | meta-llama/Llama-3.2-3B-Instruct |
Lightweight |
| STT | Systran/faster-whisper-small |
Low VRAM |
| TTS | af_heart |
Warm, natural voice |
# RTX 3080/4070 Configuration (10-12GB VRAM)
LLM_MODEL=Qwen/Qwen2.5-7B-Instruct
LLM_URL=http://localhost:8000/v1
STT_MODEL=Systran/faster-whisper-small
TTS_VOICE=af_heartOptimized for fastest response times.
```bash
# Minimum latency configuration
LLM_MODEL=Qwen/Qwen2.5-3B-Instruct
STT_MODEL=Systran/faster-whisper-tiny.en
TTS_VOICE=af_heart
TTS_SPEED=1.1
```

| Voice | Style | Gender | Accent |
|---|---|---|---|
| af_heart | Warm, friendly | Female | American |
| af_bella | Professional | Female | American |
| af_sarah | Casual | Female | American |
| af_nicole | Expressive | Female | American |
| am_adam | Neutral | Male | American |
| am_michael | Professional | Male | American |
| bf_emma | Warm | Female | British |
| bm_george | Professional | Male | British |
| Tool | Description | Example Phrase |
|---|---|---|
| 🌤️ WEATHER | Current weather conditions | "What's the weather?" |
| ⏲️ SET_TIMER | Set a countdown timer | "Set a timer for 5 minutes" |
| 📞 CALLBACK | Schedule a callback | "Call me back in an hour" |
| 📴 HANGUP | End the call | "Goodbye" |
| 📋 STATUS | Check pending timers | "What timers do I have?" |
| ❌ CANCEL | Cancel timers/callbacks | "Cancel my timer" |
| 🕐 DATETIME | Current date and time | "What time is it?" |
| 🧮 CALC | Math calculations | "What's 25 times 4?" |
| 😄 JOKE | Tell a joke | "Tell me a joke" |
| 🦜 SIMON_SAYS | Repeat back verbatim | "Simon says hello world" |
Add custom tools by creating Python plugins:

```python
# src/plugins/hello_tool.py
from tool_plugins import BaseTool, ToolResult, ToolStatus


class HelloTool(BaseTool):
    name = "HELLO"
    description = "Say hello to someone"
    parameters = {
        "name": {
            "type": "string",
            "description": "Name to greet",
            "required": True
        }
    }

    async def execute(self, params):
        name = params.get("name", "friend")
        return ToolResult(
            status=ToolStatus.SUCCESS,
            message=f"Hello, {name}! Nice to meet you."
        )
```

Register it in `tool_manager.py`:

```python
from plugins.hello_tool import HelloTool

tool_classes = [
    # ... existing tools ...
    HelloTool,
]
```

📖 See Creating Plugins for the full guide.
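A plugin's `execute` coroutine can be exercised on its own before wiring it into the agent. A standalone sketch, using lightweight stand-ins for the project's `BaseTool`/`ToolResult` classes so it runs outside the agent:

```python
import asyncio
from dataclasses import dataclass


# Stand-in for the project's ToolResult, so this sketch runs on its own.
@dataclass
class ToolResult:
    status: str
    message: str


class HelloTool:
    name = "HELLO"

    async def execute(self, params):
        name = params.get("name", "friend")
        return ToolResult(status="success", message=f"Hello, {name}! Nice to meet you.")


# Drive the async tool from synchronous test code.
result = asyncio.run(HelloTool().execute({"name": "Ada"}))
print(result.message)  # Hello, Ada! Nice to meet you.
```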
```bash
# Docker logs
docker logs -f sip-agent

# Formatted log viewer
python tools/view-logs.py -f
```

Example output:

```text
┌──────────────────────────────────────────────────────────────
│ 📞 CALL #1 - From: 1001
└──────────────────────────────────────────────────────────────
15:30:05 📞 Call started
15:30:06 👤 "What's the weather?"
15:30:07 🔧 [TOOL:WEATHER]
15:30:08 🤖 "At Storm Lake, it's 44 degrees..."
15:30:12 👤 "Thanks, goodbye"
15:30:13 📴 Call ended (duration: 0:08)
```
Import the included dashboard: `grafana/dashboards/sip-agent.json`

```text
sip-agent/
├── 📄 README.md                         # 👈 You are here
├── 📄 RELEASE.md                        # Release notes
├── 📄 CHANGELOG.md                      # Version history
├── 📄 docker-compose.yml                # Main compose file
├── 📄 docker-compose.observability.yml
├── 📄 openapi.yaml                      # API specification
│
├── 📂 sip-agent/                        # Core application
│   ├── 📄 Dockerfile
│   ├── 📄 requirements.txt
│   ├── 📄 .env.example
│   ├── 📂 data/
│   │   └── 📄 phrases.json.example
│   └── 📂 src/
│       ├── 📄 main.py                   # Application entry
│       ├── 📄 config.py                 # Configuration
│       ├── 📄 api.py                    # REST API
│       ├── 📄 sip_handler.py            # SIP call handling
│       ├── 📄 audio_pipeline.py         # STT/TTS processing
│       ├── 📄 llm_engine.py             # LLM integration
│       ├── 📄 tool_manager.py           # Tool orchestration
│       ├── 📄 tool_plugins.py           # Plugin base classes
│       ├── 📄 call_queue.py             # Redis call queue
│       ├── 📄 realtime_client.py        # WebSocket STT
│       ├── 📄 telemetry.py              # OpenTelemetry
│       ├── 📄 logging_utils.py          # Structured logging
│       ├── 📄 retry_utils.py            # API retry logic
│       └── 📂 plugins/                  # Built-in tools
│           ├── 📄 weather_tool.py
│           ├── 📄 timer_tool.py
│           ├── 📄 callback_tool.py
│           ├── 📄 hangup_tool.py
│           ├── 📄 status_tool.py
│           ├── 📄 cancel_tool.py
│           ├── 📄 datetime_tool.py
│           ├── 📄 calc_tool.py
│           ├── 📄 joke_tool.py
│           └── 📄 simon_says_tool.py
│
├── 📂 docs/                             # Documentation
│   ├── 📄 index.md                      # Overview
│   ├── 📄 getting-started.md            # Installation
│   ├── 📄 configuration.md              # Config reference
│   ├── 📄 api-reference.md              # REST API
│   ├── 📄 tools.md                      # Built-in tools
│   ├── 📄 plugins.md                    # Plugin development
│   ├── 📄 examples.md                   # Integration examples
│   └── 📂 screenshots/
│
├── 📂 observability/                    # Monitoring stack
│   ├── 📂 grafana/
│   │   └── 📂 provisioning/
│   │       ├── 📂 dashboards/           # Pre-built dashboards
│   │       └── 📂 datasources/
│   ├── 📂 prometheus/
│   │   └── 📄 prometheus.yaml
│   ├── 📂 loki/
│   │   └── 📄 loki.yaml
│   ├── 📂 tempo/
│   │   └── 📄 tempo.yaml
│   └── 📂 otel-collector/
│       └── 📄 config.yaml
│
├── 📂 tools/                            # Utilities
│   └── 📄 view-logs.py                  # Log viewer
│
└── 📂 .github/
    └── 📂 workflows/
        ├── 📄 docker-build.yml          # Docker CI
        └── 📄 readme-sync.yml           # Docs sync
```
This project is optimized to run on the NVIDIA DGX Spark with Grace Blackwell architecture.
```text
┌─────────────────────────────────────────────────────────────┐
│ 🟢 NVIDIA DGX Spark                                         │
├─────────────────────────────────────────────────────────────┤
│ 🧠 Grace Blackwell GB10 Superchip                           │
│ 💾 128GB Unified Memory                                     │
│ ⚡ 1 PFLOP AI Performance                                   │
├─────────────────────────────────────────────────────────────┤
│ ✅ Local LLM inference (vLLM, Ollama)                       │
│ ✅ Local STT/TTS (Speaches + Whisper + Kokoro)              │
│ ✅ Real-time voice processing                               │
│ ✅ Multiple concurrent calls                                │
└─────────────────────────────────────────────────────────────┘
```

Recommended DGX Spark setup:

```bash
# Run everything locally on DGX Spark
LLM_BASE_URL=http://localhost:8000/v1
LLM_MODEL=openai/gpt-oss-20b
SPEACHES_API_URL=http://localhost:8001
```

📚 Full documentation available at sip-agent.readme.io
| Document | Description |
|---|---|
| 📖 Overview | Architecture and features |
| 🚀 Getting Started | Installation guide |
| ⚙️ Configuration | Environment variables |
| 🌐 API Reference | REST API endpoints |
| 🔧 Built-in Tools | Available tools |
| 🔌 Creating Plugins | Custom tool development |
| 📖 Examples | Integration patterns |
Contributions are welcome! Please read our contributing guidelines first.
```bash
# Fork and clone
git clone https://github.com/your-username/sip-agent.git

# Create a branch
git checkout -b feature/amazing-feature

# Make changes and test
docker compose up -d
python -m pytest

# Commit with emoji
git commit -m "✨ feat: add amazing feature"

# Push and open a PR
git push origin feature/amazing-feature
```

This project is licensed under the GNU Affero General Public License v3.0; see the LICENSE file for details.

SPDX-License-Identifier: AGPL-3.0-or-later
- NVIDIA DGX Spark — AI supercomputer platform
- Speaches — Unified STT/TTS server
- PJSIP — SIP stack
- FastAPI — REST API framework
- WeatherFlow Tempest — Weather data
| Resource | Link |
|---|---|
| 📖 Docs | sip-agent.readme.io |
| 🐛 Issues | GitHub Issues |
| 💬 Discussions | GitHub Discussions |
Made with ❤️ and 🤖






