Live Demo: https://agoraio-community.github.io/RPM-agora-agent/
Build conversational AI agents with synchronized lip movements, natural expressions, and genuine real-time responses powered by Agora ConvoAI Engine, WebAudio API, and ReadyPlayer.me avatars.
- Real-Time FFT Analysis - Analyzes AI voice at 60 FPS using WebAudio API (85-255 Hz speech range)
- ARKit Viseme Mapping - Frequency patterns map to phonemes (aa, E, I, O, U, PP, FF, TH, etc.)
- 50+ Morph Targets - ARKit blend shapes for realistic facial deformation
- Exponential Smoothing - Delta-time easing for fluid transitions without jitter
- <50ms Latency - Audio-to-visual synchronization with minimal delay
- Breathing Simulation - Subtle sine wave variations for natural idle behavior
- Ultra-Low Latency WebRTC - Real-time voice streaming via Agora RTC SDK
- Speech-to-Text (ASR) - Automatic speech recognition for user input
- LLM Integration - OpenAI GPT-4 or compatible models for intelligent responses
- Text-to-Speech (TTS) - Azure Speech Services for natural voice synthesis
- Cloud-Based Agent - ConvoAI Agent joins Agora channel as a remote user
- Multi-Language Support - Configurable ASR/TTS language settings
- GLB 3D Models - Optimized web-ready avatars with facial rigs
- Facial Expressions - 7 emotional states (smile, surprised, sad, angry, etc.)
- Body Animations - Idle, talking, laughing, crying, and more
- Real-Time Morphing - Facial blend shapes respond to live audio analysis
- Manual Override - UI panels for expression/animation control
- Three.js Rendering - 60 FPS WebGL performance
```
User Speech → Agora RTC → ConvoAI Engine → LLM (GPT-4) → TTS (Azure) → Audio Stream
                                                                            ↓
                                                                   WebAudio Analyzer
                                                                            ↓
                                                                   FFT Analysis (256)
                                                                            ↓
                                                              Frequency → Viseme Mapping
                                                                            ↓
                                                                   ARKit Blend Shapes
                                                                            ↓
                                                              Three.js Rendering (60 FPS)
                                                                            ↓
                                                                 Synchronized Lip Sync
```
- User speaks → Agora RTC captures and streams audio to ConvoAI Engine
- ConvoAI processes → Speech-to-text (ASR), LLM reasoning, text-to-speech (TTS)
- AI responds → TTS audio streams back through Agora RTC as remote user
- WebAudio analyzes → AnalyserNode performs FFT on audio stream (85-255 Hz speech range)
- Viseme mapping → Frequency patterns map to phoneme shapes (A, E, I, O, U, PP, FF, etc.)
- Morph targets update → ARKit blend shapes deform facial mesh at 60 FPS
- Avatar speaks → Realistic lip sync with <50ms audio-to-visual latency
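Steps 4-5 above can be sketched with the WebAudio API. This is a minimal sketch, not the repo's actual code (which lives in `useAgora.jsx`): `remoteTrack` stands in for an Agora remote audio track, and the pure `hzToBin` helper shows how the 85-255 Hz band maps onto FFT bins.

```javascript
// Convert a frequency in Hz to its FFT bin index (pure helper).
// Each bin covers sampleRate / fftSize Hz.
function hzToBin(hz, sampleRate, fftSize) {
  return Math.round(hz / (sampleRate / fftSize));
}

// Wire an AnalyserNode to the remote track (browser-only sketch).
function attachAnalyser(remoteTrack) {
  const ctx = new AudioContext();
  const source = ctx.createMediaStreamSource(
    new MediaStream([remoteTrack.getMediaStreamTrack()])
  );
  const analyser = ctx.createAnalyser();
  analyser.fftSize = 256; // yields 128 frequency bins
  source.connect(analyser);

  const bins = new Uint8Array(analyser.frequencyBinCount);
  const lo = hzToBin(85, ctx.sampleRate, analyser.fftSize);
  const hi = hzToBin(255, ctx.sampleRate, analyser.fftSize);

  function tick() {
    analyser.getByteFrequencyData(bins);
    // Only the 85-255 Hz speech band drives the visemes.
    const speechBins = bins.slice(lo, hi + 1);
    // ...classify speechBins into a viseme and update morph targets...
    requestAnimationFrame(tick);
  }
  tick();
}
```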
Human speech frequencies cluster in predictable ranges:
- Low (85-150 Hz): Open vowels → "O", "U" visemes
- Mid (150-200 Hz): Central vowels → "A" visemes
- High (200-255 Hz): Closed vowels → "E", "I" visemes
- Consonants: Distinct spikes → PP, FF, TH, kk visemes
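As a hypothetical sketch of the banding rule above — the real classifier in `useAgora.jsx` also weighs intensity and consonant spikes — the dominant frequency alone can pick a viseme family:

```javascript
// Map a dominant speech frequency (Hz) to a viseme family, following
// the bands listed above. Returns null outside the 85-255 Hz range.
function visemeForFrequency(hz) {
  if (hz < 85 || hz > 255) return null; // outside the speech band
  if (hz < 150) return "O";             // low: open vowels ("O", "U")
  if (hz < 200) return "A";             // mid: central vowels
  return "E";                           // high: closed vowels ("E", "I")
}
```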
```javascript
// Vowel phonemes with complex mouth shapes
A: { jawOpen: 0.7, mouthOpen: 0.8, mouthWide: 0.5 }
E: { jawOpen: 0.4, mouthOpen: 0.6, mouthWide: 0.7, mouthSmile: 0.3 }
I: { jawOpen: 0.2, mouthOpen: 0.3, mouthWide: 0.8, mouthSmile: 0.5 }
O: { jawOpen: 0.5, mouthOpen: 0.7, mouthFunnel: 0.6, mouthPucker: 0.4 }
U: { jawOpen: 0.3, mouthOpen: 0.4, mouthFunnel: 0.8, mouthPucker: 0.7 }

// Consonant phonemes with precise articulation
PP: { mouthPressLeft: 0.8, mouthPressRight: 0.8, mouthClose: 0.9 }
FF: { jawOpen: 0.1, mouthOpen: 0.2, mouthFunnel: 0.3 }
TH: { jawOpen: 0.3, mouthOpen: 0.4, tongueOut: 0.2 }
```

- Exponential Smoothing: `lerp(current, target, 1 - exp(-15 * deltaTime))` eliminates jitter
- Frame-Rate Independent: Delta-time integration for consistent animation speed
- Viseme Transitions: 12x speed multiplier for natural phoneme blending
- Breathing Variation: `sin(time * 2) * 0.1` adds subtle idle movement
- Intensity Scaling: 2x-4x audio level multipliers for visible mouth movement
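The smoothing bullets above reduce to two small helpers. This is a minimal sketch with the `15` easing constant and the `sin(time * 2) * 0.1` breathing term taken from the description; the function names are illustrative.

```javascript
// Frame-rate independent exponential smoothing: converges on `target`
// at the same perceived speed regardless of frame time.
function smooth(current, target, deltaTime, speed = 15) {
  const alpha = 1 - Math.exp(-speed * deltaTime);
  return current + (target - current) * alpha;
}

// Subtle sine-wave variation for natural idle behavior.
function breathingOffset(timeSeconds) {
  return Math.sin(timeSeconds * 2) * 0.1;
}
```

Because `alpha` depends on `deltaTime`, a dropped frame takes a proportionally bigger step, so the animation speed stays constant across frame rates.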
- Agora account with App ID and Token
- Agora ConvoAI API credentials ("Customer ID" and "Customer Secret")
- OpenAI API key or compatible LLM
- Azure Speech Services API key for TTS
- Modern browser with WebAudio API support (Chrome 80+, Firefox 75+, Safari 14+, Edge 80+)
Visit: https://agoraio-community.github.io/RPM-agora-agent/
Click the Settings (⚙️) button in the top-right and enter your credentials:
App ID: [From Agora Console]
Token: [Generate from Agora Console]
Channel: [Your channel name, e.g., "test-channel"]
API Base URL: https://api.agora.io/v1
Customer ID: [Your ConvoAI Customer ID]
Customer Secret: [Your ConvoAI Customer Secret]
Agent Name: Virtual Assistant
Agent UID: 8888
API URL: https://api.openai.com/v1
API Key: [Your OpenAI API Key]
Model: gpt-4o-mini
System Message: You are a friendly virtual agent assistant.
Greeting: Hello! How can I help you today?
API Key: [Your Azure Speech Key]
Region: eastus (or your region)
Voice Name: en-US-AriaNeural
Language: en-US
Settings are stored in sessionStorage during your browser session.
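The session-only persistence might look like the sketch below; the `"agoraSettings"` key name is an assumption, not the repo's actual key. `sessionStorage` clears when the browser session ends, so credentials never persist on disk.

```javascript
// Save/load the settings panel state for the current session only.
function saveSettings(storage, settings) {
  storage.setItem("agoraSettings", JSON.stringify(settings));
}

function loadSettings(storage) {
  // Fall back to an empty object when nothing has been saved yet.
  return JSON.parse(storage.getItem("agoraSettings") ?? "{}");
}
```

In the browser, `storage` would simply be `window.sessionStorage`.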
- Click Connect to join the Agora channel
- The ConvoAI agent will automatically join as a remote user
- Start speaking - the avatar will listen and analyze your speech
- The AI responds with synthesized voice and synchronized lip movements
- Watch real-time lip sync powered by WebAudio FFT analysis!
- React 18 - UI component framework
- React Three Fiber - React renderer for Three.js
- Three.js - WebGL graphics engine for 3D rendering
- @react-three/drei - Useful helpers for R3F (useGLTF, etc.)
- Agora RTC SDK - WebRTC communication and streaming
- WebAudio API - Browser-native audio analysis (AnalyserNode, FFT)
- Vite - Fast build tool and dev server
- Tailwind CSS - Utility-first CSS framework
- Sample Rate: 48kHz audio streams from Agora RTC
- FFT Size: 256 (provides 128 frequency bins)
- Frequency Range: 85-255 Hz (primary speech frequencies)
- Analysis Rate: ~60 FPS via requestAnimationFrame
- Latency: <50ms from audio output to visual update
- Smoothing: Exponential interpolation (`lerp` with `exp(-speed * deltaTime)`)
- Model Format: GLB (Binary glTF) - optimized for web streaming
- Facial Rig: 50+ ARKit-compatible morph targets
- Rendering: Three.js SkinnedMesh with morph target influences
- Animation: Frame-by-frame morph target updates at 60 FPS
- Expressions: Layered blend shapes (expressions + lip sync)
- Source: ReadyPlayer.me avatar creator with full facial rig
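A morph-target update of the kind described above might look like the sketch below in a React Three Fiber frame loop. It assumes a `SkinnedMesh` exposing `morphTargetInfluences`/`morphTargetDictionary` (standard Three.js fields) and a viseme weight table like the one shown earlier; the function name is illustrative.

```javascript
// Ease every morph-target influence toward the active viseme's weights.
// Targets absent from the viseme relax back to 0 each frame.
function applyViseme(influences, dictionary, viseme, weightTable, dt) {
  const targets = weightTable[viseme] || {};
  const alpha = 1 - Math.exp(-15 * dt); // same easing as the lip sync loop
  for (const [name, index] of Object.entries(dictionary)) {
    const target = targets[name] ?? 0;
    influences[index] += (target - influences[index]) * alpha;
  }
}
```

Inside a `useFrame((state, delta) => ...)` callback this would run once per frame with the viseme currently detected from the audio, layered on top of any expression preset.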
- REST API: Join/leave agent endpoints
- Authentication: Basic Auth with Customer ID/Secret
- Agent Lifecycle: Programmatic agent creation and management
- Voice Pipeline: ASR → LLM → TTS fully managed by ConvoAI
- Agent UID: ConvoAI agent joins as remote user in Agora channel
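The Basic Auth portion of those REST calls is straightforward to sketch. The `/join` path below is illustrative only — take the exact routes and request body from the Agora ConvoAI documentation and `useAgora.jsx`.

```javascript
// Build the Basic Auth header from ConvoAI Customer ID/Secret.
// (Use btoa(`${id}:${secret}`) in the browser; Buffer works in Node.)
function basicAuthHeader(customerId, customerSecret) {
  const token = Buffer.from(`${customerId}:${customerSecret}`).toString("base64");
  return `Basic ${token}`;
}

// Illustrative agent-join call against the configured API base URL.
async function joinAgent(baseUrl, creds, body) {
  const res = await fetch(`${baseUrl}/join`, { // path is an assumption
    method: "POST",
    headers: {
      Authorization: basicAuthHeader(creds.customerId, creds.customerSecret),
      "Content-Type": "application/json",
    },
    body: JSON.stringify(body),
  });
  if (!res.ok) throw new Error(`ConvoAI join failed: ${res.status}`);
  return res.json();
}
```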
Manual controls override AI behavior for creative control
- Animation System: Mixamo-compatible FBX animations
- Texture Resolution: Optimized for real-time rendering
- LOD System: Performance-optimized for web browsers
You provide all API credentials and control spending.

```bash
cd RPM-agora-agent
npm install

# Start development server (runs on http://localhost:5173)
npm run dev

# Build for production
npm run build

# Deploy to GitHub Pages
npm run deploy
```
### **Project Structure**
```
src/
├── components/
│   ├── Avatar.jsx         # 3D avatar with lip sync engine
│   ├── Experience.jsx     # Three.js scene setup
│   ├── UI.jsx             # Main interface
│   ├── Settings.jsx       # API credentials panel
│   └── CombinedChat.jsx   # Chat interface
├── hooks/
│   ├── useAgora.jsx       # Agora RTC + ConvoAI integration
│   ├── useChat.jsx        # Chat state management
│   └── useLipSync.jsx     # Lip sync audio analysis
├── App.jsx                # Root component
└── main.jsx               # Entry point
```
### **Customization Options**
- **Avatar Models**: Replace GLB files in `public/models/Avatars/` with custom ReadyPlayer.me avatars
- **Viseme Tuning**: Adjust frequency ranges and intensity multipliers in `useAgora.jsx`
- **LLM Models**: Switch between GPT-4, GPT-3.5, or other OpenAI-compatible APIs
- **TTS Voices**: Choose from 400+ Azure neural voices in different languages
- **UI Styling**: Modify Tailwind classes for custom appearance
- **Facial Expressions**: Add new expression presets in `Avatar.jsx`
- ✅ **Session-Only Storage** - Credentials cleared when the browser closes
- ✅ **Open Source** - Full code transparency
- ✅ **No Tracking** - No analytics or user tracking
## 🛠️ Advanced Development
### **Local Development**
```bash
# Clone repository
git clone https://github.com/AgoraIO-Community/RPM-agora-agent.git
cd RPM-agora-agent
# Install dependencies
npm install
# Start development server
npm run dev
# Build for production
npm run build
# Deploy to GitHub Pages
npm run deploy
```
- No Environment Variables - All config via UI
- Hot Module Replacement - Instant code updates
- Debug Panels - Real-time lip sync monitoring
- Animation Controls - Manual override capabilities
- Audio Level Indicators - WebRTC connection status
- Avatar Models - Replace GLB files with custom 3D models
- Animation Sets - Add custom FBX animations
- Voice Personalities - Configure different AI personalities
- UI Themes - Customize interface appearance
- Lip Sync Tuning - Adjust viseme sensitivity parameters
- No Audio Output: Check microphone permissions and Agora token validity
- ConvoAI Connection Failed: Verify Customer ID/Secret and App ID match
- No Lip Sync: Ensure AudioContext is not suspended (some browsers require user interaction)
- Avatar Not Loading: Check browser console for GLB loading errors
- Performance Issues: Close other browser tabs, check FPS in Three.js stats
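For the suspended-AudioContext case above, resuming inside the Connect click handler is usually enough. A sketch (the function name is illustrative):

```javascript
// Browsers start a fresh AudioContext in the "suspended" state until a
// user gesture occurs; call this from the Connect button's click handler.
async function ensureAudioRunning(ctx) {
  if (ctx.state === "suspended") {
    await ctx.resume();
  }
  return ctx.state;
}
```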
- Open browser DevTools Console for detailed logs
- Check Network tab for ConvoAI API call responses
- Monitor WebAudio analyzer data in `useAgora.jsx`
- Use Three.js DevTools extension for scene inspection
- Comprehensive Guide: See GUIDE.md for detailed implementation walkthrough
- Deployment: See docs/DEPLOYMENT.md for production deployment
- Architecture: See docs/ARCHITECTURE_PLAN.md for system design
- Agora ConvoAI Documentation
- Agora RTC Web SDK Reference
- ReadyPlayer.me Documentation
- WebAudio API Guide
- Three.js Morph Targets
- Join the Agora Developer Community
WebAudio-driven lip sync meets AI conversation in stunning 3D - all running in your browser with <50ms latency!
Live Demo: https://agoraio-community.github.io/RPM-agora-agent/
Built with ❤️ using Agora ConvoAI, ReadyPlayer.me, and WebAudio API
Questions? Open an issue on GitHub
