🎭 Real-Time AI Avatars with Lip Sync Using Agora ConvoAI

Live Demo: https://agoraio-community.github.io/RPM-agora-agent/

Build conversational AI agents with synchronized lip movements, natural expressions, and genuine real-time responses powered by Agora ConvoAI Engine, WebAudio API, and ReadyPlayer.me avatars.

🌟 Key Features

🎤 WebAudio-Driven Lip Sync

Real-Time FFT Analysis - Analyzes AI voice at 60 FPS using WebAudio API (85-255 Hz speech range)
ARKit Viseme Mapping - Frequency patterns map to phonemes (aa, E, I, O, U, PP, FF, TH, etc.)
50+ Morph Targets - ARKit blend shapes for realistic facial deformation
Exponential Smoothing - Delta-time easing for fluid transitions without jitter
<50ms Latency - Audio-to-visual synchronization with minimal delay
Breathing Simulation - Subtle sine wave variations for natural idle behavior

🤖 Agora ConvoAI Engine

Ultra-Low Latency WebRTC - Real-time voice streaming via Agora RTC SDK
Speech-to-Text (ASR) - Automatic speech recognition for user input
LLM Integration - OpenAI GPT-4 or compatible models for intelligent responses
Text-to-Speech (TTS) - Azure Speech Services for natural voice synthesis
Cloud-Based Agent - ConvoAI Agent joins Agora channel as a remote user
Multi-Language Support - Configurable ASR/TTS language settings

🎨 ReadyPlayer.me Avatar System

GLB 3D Models - Optimized web-ready avatars with facial rigs
Facial Expressions - 7 emotional states (smile, surprised, sad, angry, etc.)
Body Animations - Idle, talking, laughing, crying, and more
Real-Time Morphing - Facial blend shapes respond to live audio analysis
Manual Override - UI panels for expression/animation control
Three.js Rendering - 60 FPS WebGL performance

🎯 How It Works

Real-Time Data Flow

User Speech → Agora RTC → ConvoAI Engine → LLM (GPT-4) → TTS (Azure) → Audio Stream
                                                                              ↓
                                                                    WebAudio Analyzer
                                                                              ↓
                                                                    FFT Analysis (256)
                                                                              ↓
                                                              Frequency → Viseme Mapping
                                                                              ↓
                                                                  ARKit Blend Shapes
                                                                              ↓
                                                              Three.js Rendering (60 FPS)
                                                                              ↓
                                                                  Synchronized Lip Sync

User speaks → Agora RTC captures and streams audio to ConvoAI Engine
ConvoAI processes → Speech-to-text (ASR), LLM reasoning, text-to-speech (TTS)
AI responds → TTS audio streams back through Agora RTC as remote user
WebAudio analyzes → AnalyserNode performs FFT on audio stream (85-255 Hz speech range)
Viseme mapping → Frequency patterns map to phoneme shapes (A, E, I, O, U, PP, FF, etc.)
Morph targets update → ARKit blend shapes deform facial mesh at 60 FPS
Avatar speaks → Realistic lip sync with <50ms audio-to-visual latency
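
The WebAudio step in this flow can be sketched roughly as follows. This is a minimal illustration rather than the project's actual code: `attachAnalyser` and `readSpectrum` are hypothetical helper names, and the sketch assumes you already hold a `MediaStream` that carries the agent's TTS audio.

```javascript
// Hypothetical helpers: wire an audio stream into an AnalyserNode
// and read one frame of FFT magnitudes from it.
function attachAnalyser(audioContext, mediaStream, fftSize = 256) {
  const source = audioContext.createMediaStreamSource(mediaStream);
  const analyser = audioContext.createAnalyser();
  analyser.fftSize = fftSize;   // yields fftSize / 2 frequency bins
  source.connect(analyser);     // analysis only; nothing is routed to the speakers here
  return analyser;
}

function readSpectrum(analyser) {
  const bins = new Uint8Array(analyser.frequencyBinCount);
  analyser.getByteFrequencyData(bins); // fills each bin with a magnitude in 0..255
  return bins;
}
```

In a render loop, `readSpectrum(analyser)` would typically be called once per `requestAnimationFrame` tick and its result passed on to the viseme mapping.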

Technical Deep Dive

Frequency-to-Viseme Mapping

Human speech frequencies cluster in predictable ranges:

Low (85-150 Hz): Open vowels → "O", "U" visemes
Mid (150-200 Hz): Central vowels → "A" visemes
High (200-255 Hz): Closed vowels → "E", "I" visemes
Consonants: Distinct spikes → PP, FF, TH, kk visemes
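
As a rough illustration of the vowel ranges above, the lookup below maps a dominant frequency to its candidate visemes. It is a sketch under stated assumptions: `candidateVisemes` is a hypothetical helper, the handling of the band edges (150 Hz and 200 Hz go to the higher band) is a guess, and consonant spike detection is left out because the document does not describe how it works.

```javascript
// Band edges in Hz, taken from the ranges listed above.
const VOWEL_BANDS = [
  { min: 85, max: 150, visemes: ['O', 'U'] },  // low: open vowels
  { min: 150, max: 200, visemes: ['A'] },      // mid: central vowels
  { min: 200, max: 255, visemes: ['E', 'I'] }, // high: closed vowels
];

// Hypothetical helper: return the candidate visemes for a dominant frequency,
// or an empty array when it falls outside the 85-255 Hz range.
function candidateVisemes(dominantHz) {
  const lastIndex = VOWEL_BANDS.length - 1;
  const band = VOWEL_BANDS.find(
    (b, i) =>
      dominantHz >= b.min &&
      (dominantHz < b.max || (i === lastIndex && dominantHz <= b.max))
  );
  return band ? band.visemes : [];
}
```

Choosing between the two candidates in a band (for example "O" versus "U") would need extra information, such as overall intensity, which this sketch does not attempt.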

ARKit Blend Shape Targets

// Vowel phonemes with complex mouth shapes
A: { jawOpen: 0.7, mouthOpen: 0.8, mouthWide: 0.5 }
E: { jawOpen: 0.4, mouthOpen: 0.6, mouthWide: 0.7, mouthSmile: 0.3 }
I: { jawOpen: 0.2, mouthOpen: 0.3, mouthWide: 0.8, mouthSmile: 0.5 }
O: { jawOpen: 0.5, mouthOpen: 0.7, mouthFunnel: 0.6, mouthPucker: 0.4 }
U: { jawOpen: 0.3, mouthOpen: 0.4, mouthFunnel: 0.8, mouthPucker: 0.7 }

// Consonant phonemes with precise articulation
PP: { mouthPressLeft: 0.8, mouthPressRight: 0.8, mouthClose: 0.9 }
FF: { jawOpen: 0.1, mouthOpen: 0.2, mouthFunnel: 0.3 }
TH: { jawOpen: 0.3, mouthOpen: 0.4, tongueOut: 0.2 }

Smooth Animation Pipeline

Exponential Smoothing: lerp(current, target, 1 - exp(-15 * deltaTime)) eliminates jitter
Frame-Rate Independent: Delta-time integration for consistent animation speed
Viseme Transitions: 12x speed multiplier for natural phoneme blending
Breathing Variation: sin(time * 2) * 0.1 adds subtle idle movement
Intensity Scaling: 2x-4x audio level multipliers for visible mouth movement
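
The smoothing and breathing formulas above can be written out as plain functions. This is a minimal sketch: the function names are illustrative, and only the formulas themselves come from the list.

```javascript
// Linear interpolation between two values.
function lerp(current, target, t) {
  return current + (target - current) * t;
}

// Exponential smoothing with delta-time: the blend factor grows with elapsed
// time, so the result does not depend on how that time is split into frames.
function smoothToward(current, target, deltaTime, speed = 15) {
  return lerp(current, target, 1 - Math.exp(-speed * deltaTime));
}

// Breathing offset from the formula above: oscillates between -0.1 and 0.1.
function breathingOffset(timeSeconds) {
  return Math.sin(timeSeconds * 2) * 0.1;
}
```

At 60 FPS, `deltaTime` is about 0.0167 s, so the blend factor is roughly `1 - exp(-0.25)`, or about 0.22 per frame. Two 1/60 s steps land on the same value as a single 1/30 s step, which is what makes the animation frame-rate independent.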

🚀 Quick Start

Prerequisites

Agora account with App ID and Token
Agora ConvoAI API credentials ("Customer ID" and "Customer Secret")
OpenAI API key or compatible LLM
Azure Speech Services API key for TTS
Modern browser with WebAudio API support (Chrome 80+, Firefox 75+, Safari 14+, Edge 80+)

1. Access the Live Demo

Visit: https://agoraio-community.github.io/RPM-agora-agent/

2. Configure Your API Credentials

Click the Settings (☰) button in the top-right and enter your credentials:

Agora Tab

App ID: [From Agora Console]
Token: [Generate from Agora Console]
Channel: [Your channel name, e.g., "test-channel"]

ConvoAI Tab

API Base URL: https://api.agora.io/v1
Customer ID: [Your ConvoAI Customer ID]
Customer Secret: [Your ConvoAI Customer Secret]
Agent Name: Virtual Assistant
Agent UID: 8888

LLM Tab

API URL: https://api.openai.com/v1
API Key: [Your OpenAI API Key]
Model: gpt-4o-mini
System Message: You are a friendly virtual agent assistant.
Greeting: Hello! How can I help you today?

TTS Tab

API Key: [Your Azure Speech Key]
Region: eastus (or your region)
Voice Name: en-US-AriaNeural

ASR Tab

Language: en-US

Settings are stored in sessionStorage during your browser session.
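
For readers curious how session-scoped settings can work, here is a minimal sketch. The storage key and both helper names are hypothetical; the project's actual key and data shape are not documented here.

```javascript
// Hypothetical storage key, used for illustration only.
const SETTINGS_KEY = 'avatar-demo-settings';

// `storage` is any object with the Web Storage interface,
// for example window.sessionStorage in the browser.
function saveSettings(storage, settings) {
  storage.setItem(SETTINGS_KEY, JSON.stringify(settings));
}

// Returns the defaults merged with whatever was saved, if anything.
function loadSettings(storage, defaults = {}) {
  const raw = storage.getItem(SETTINGS_KEY);
  return raw === null ? { ...defaults } : { ...defaults, ...JSON.parse(raw) };
}
```

Because `sessionStorage` is scoped to a single tab and cleared when that tab closes, values saved this way do not persist across browser sessions.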

3. Start Conversing with Your AI Avatar

Click Connect to join the Agora channel
The ConvoAI agent will automatically join as a remote user
Start speaking - the avatar will listen and analyze your speech
The AI responds with synthesized voice and synchronized lip movements
Watch real-time lip sync powered by WebAudio FFT analysis!

React 18 - UI component framework

React Three Fiber - React renderer for Three.js
Three.js - WebGL graphics engine for 3D rendering
@react-three/drei - Useful helpers for R3F (useGLTF, etc.)
Agora RTC SDK - WebRTC communication and streaming
WebAudio API - Browser-native audio analysis (AnalyserNode, FFT)
Vite - Fast build tool and dev server
Tailwind CSS - Utility-first CSS framework

Real-Time Audio Processing

Sample Rate: 48kHz audio streams from Agora RTC
FFT Size: 256 (provides 128 frequency bins)
Frequency Range: 85-255 Hz (primary speech frequencies)
Analysis Rate: ~60 FPS via requestAnimationFrame
Latency: <50ms from audio output to visual update
Smoothing: Exponential interpolation (lerp with exp(-speed * deltaTime))
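
The relationship between these numbers can be checked with a little arithmetic. The helpers below are an illustrative sketch (the names are not from the project); they use the standard relation that FFT bin `k` is centered at `k * sampleRate / fftSize`.

```javascript
// Spacing between adjacent FFT bins in Hz.
function binWidthHz(sampleRate, fftSize) {
  return sampleRate / fftSize;
}

// Index of the bin whose center is nearest to a given frequency.
function hzToBin(hz, sampleRate, fftSize) {
  return Math.round(hz / binWidthHz(sampleRate, fftSize));
}

const SAMPLE_RATE = 48000; // from the list above
const FFT_SIZE = 256;      // from the list above

const binWidth = binWidthHz(SAMPLE_RATE, FFT_SIZE);      // 187.5 Hz per bin
const lowestBin = hzToBin(85, SAMPLE_RATE, FFT_SIZE);    // 0
const highestBin = hzToBin(255, SAMPLE_RATE, FFT_SIZE);  // 1
```

With these settings the 85-255 Hz band falls within the first two bins. If finer frequency resolution is wanted, a larger `fftSize` (a power of two between 32 and 32768) narrows each bin, and the real rate can be read from `audioContext.sampleRate` rather than assumed.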

3D Avatar Architecture

Model Format: GLB (Binary glTF) - optimized for web streaming
Facial Rig: 50+ ARKit-compatible morph targets
Rendering: Three.js SkinnedMesh with morph target influences
Animation: Frame-by-frame morph target updates at 60 FPS
Expressions: Layered blend shapes (expressions + lip sync)
Source: ReadyPlayer.me avatar creator with full facial rig

ConvoAI Integration

REST API: Join/leave agent endpoints
Authentication: Basic Auth with Customer ID/Secret
Agent Lifecycle: Programmatic agent creation and management
Voice Pipeline: ASR → LLM → TTS fully managed by ConvoAI
Agent UID: ConvoAI agent joins as remote user in Agora channel
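
As an illustration of the Basic Auth step, the sketch below builds the `Authorization` header from the Customer ID and Customer Secret. The helper names are hypothetical, and the endpoint path and request body are deliberately omitted because this document does not specify them.

```javascript
// Build an HTTP Basic Authorization header value from the ConvoAI credentials.
// btoa is available in browsers and in recent Node.js versions; it expects
// Latin-1 input, which is assumed to hold for these credentials.
function basicAuthHeader(customerId, customerSecret) {
  return `Basic ${btoa(`${customerId}:${customerSecret}`)}`;
}

// Headers for a JSON request to the join/leave agent endpoints.
function convoAiHeaders(customerId, customerSecret) {
  return {
    Authorization: basicAuthHeader(customerId, customerSecret),
    'Content-Type': 'application/json',
  };
}
```

The resulting object can be passed as the `headers` option of a `fetch` call against the API Base URL configured in the ConvoAI tab.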

Manual controls override AI behavior for creative control

🔧 Technical Architecture

Frontend Stack

React Three Fiber - 3D rendering and animation
Three.js - WebGL graphics engine
WebAudio API - Real-time audio analysis
Agora SDK - WebRTC communication
Tailwind CSS - UI styling

Real-Time Processing

Audio Sampling: 48 kHz audio analysis (matching the Agora RTC stream rate listed under Real-Time Audio Processing)
Frequency Analysis: FFT processing for audio features
Viseme Detection: Speech sound classification
Morph Target Interpolation: Smooth facial animation
Frame Rate: 60fps animation updates

3D Model Features

File Format: GLB (optimized for web)
Facial Rig: 50+ morph targets
Animation System: Mixamo-compatible FBX animations
Texture Resolution: Optimized for real-time rendering
LOD System: Performance-optimized for web browsers

💰 Cost Structure

User-Controlled Costs

You provide all API credentials and control spending.


### **Project Structure**

    src/
    ├── components/
    │   ├── Avatar.jsx          # 3D avatar with lip sync engine
    │   ├── Experience.jsx      # Three.js scene setup
    │   ├── UI.jsx              # Main interface
    │   ├── Settings.jsx        # API credentials panel
    │   └── CombinedChat.jsx    # Chat interface
    ├── hooks/
    │   ├── useAgora.jsx        # Agora RTC + ConvoAI integration
    │   ├── useChat.jsx         # Chat state management
    │   └── useLipSync.jsx      # Lip sync audio analysis
    ├── App.jsx                 # Root component
    └── main.jsx                # Entry point


### **Customization Options**
- **Avatar Models**: Replace GLB files in `public/models/Avatars/` with custom ReadyPlayer.me avatars
- **Viseme Tuning**: Adjust frequency ranges and intensity multipliers in `useAgora.jsx`
- **LLM Models**: Switch between GPT-4, GPT-3.5, or other OpenAI-compatible APIs
- **TTS Voices**: Choose from 400+ Azure neural voices in different languages
- **UI Styling**: Modify Tailwind classes for custom appearance
- **Facial Expressions**: Add new expression presets in `Avatar.jsx`
- ✅ **Open Source** - Full code transparency
- ✅ **No Tracking** - No analytics or user tracking

## 🛠️ Advanced Development

### **Local Development**
```bash
# Clone repository
git clone https://github.com/AgoraIO-Community/RPM-agora-agent.git
cd RPM-agora-agent

# Install dependencies
npm install

# Start development server (runs on http://localhost:5173)
npm run dev

# Build for production
npm run build

# Deploy to GitHub Pages
npm run deploy
```

Key Development Features

No Environment Variables - All config via UI
Hot Module Replacement - Instant code updates
Debug Panels - Real-time lip sync monitoring
Animation Controls - Manual override capabilities
Audio Level Indicators - WebRTC connection status

Customization Options

Avatar Models - Replace GLB files with custom 3D models
Animation Sets - Add custom FBX animations
Voice Personalities - Configure different AI personalities
UI Themes - Customize interface appearance
Lip Sync Tuning - Adjust viseme sensitivity parameters

Troubleshooting

Common Issues

No Audio Output: Check microphone permissions and Agora token validity
ConvoAI Connection Failed: Verify Customer ID/Secret and App ID match
No Lip Sync: Ensure AudioContext is not suspended (some browsers require user interaction)
Avatar Not Loading: Check browser console for GLB loading errors
Performance Issues: Close other browser tabs, check FPS in Three.js stats
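
For the suspended `AudioContext` case, a common remedy is to resume the context from a user gesture. The helper below is a minimal sketch with a hypothetical name; it relies only on the standard `state` property and `resume()` method.

```javascript
// Resume a suspended AudioContext. Browsers typically allow this only
// from within a user-gesture handler such as a click.
async function ensureAudioRunning(audioContext) {
  if (audioContext.state === 'suspended') {
    await audioContext.resume();
  }
  return audioContext.state;
}
```

A typical use is to call it from the click handler of the Connect button, for example `button.addEventListener('click', () => ensureAudioRunning(audioContext))`.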

Debug Mode

Open browser DevTools Console for detailed logs
Check Network tab for ConvoAI API call responses
Monitor WebAudio analyzer data in useAgora.jsx
Use Three.js DevTools extension for scene inspection

Learn More

Comprehensive Guide: See GUIDE.md for detailed implementation walkthrough
Deployment: See docs/DEPLOYMENT.md for production deployment
Architecture: See docs/ARCHITECTURE_PLAN.md for system design

🎉 Experience Real-Time AI Avatars

WebAudio-driven lip sync meets AI conversation in stunning 3D - all running in your browser with <50ms latency!

Live Demo: https://agoraio-community.github.io/RPM-agora-agent/

Built with ❤️ using Agora ConvoAI, ReadyPlayer.me, and WebAudio API
Questions? Open an issue on GitHub

📞 Support & Troubleshooting

Common Issues

No Audio: Check microphone permissions
Connection Failed: Verify Agora credentials
No Lip Sync: Ensure the AudioContext is running (some browsers suspend it until a user interaction)
Performance Issues: Lower quality settings

Debug Mode

Open browser DevTools
Check Console for errors
Monitor Network tab for API calls
Use Performance tab for optimization

