A modern Next.js application showcasing D-ID's live streaming capabilities with AI-powered conversation, voice recognition, and dynamic presenter selection.
- Dynamic Presenter Selection: Choose from D-ID's presenter library or use custom images
- Real-time Avatar Streaming: D-ID WebRTC streaming with interactive avatars
- Voice Recognition: Deepgram Nova-2 speech-to-text transcription
- AI Chat: GPT-4o integration for intelligent conversations
- Multi-Modal Input: Supports both text and voice interactions
- Modern Glass-morphism UI: Responsive design with Tailwind CSS v4
- TypeScript: Fully typed for a better development experience
- Interactive Previews: Hover to see presenter talking previews
- Context-Based Architecture: Global state management for seamless switching
- Advanced Error Handling: Detailed error reporting with fallback systems
You'll need API keys for the following services:
- D-ID - For avatar streaming
- OpenAI - For GPT-4o chat completions
- Deepgram - For speech-to-text transcription
- ElevenLabs - For voice synthesis
- Clone and navigate to the project: `cd d-id-nextjs`
- Install dependencies: `npm install` (or `bun install`)
- Configure environment variables: copy `.env.local` and update it with your API keys:

      # D-ID API Configuration
      NEXT_PUBLIC_DID_API_KEY=your_did_api_key_here
      NEXT_PUBLIC_DID_WEBSOCKET_URL=wss://api.d-id.com
      # Note: DID_SERVICE is now optional - managed by PresenterContext

      # OpenAI API Configuration
      NEXT_PUBLIC_OPENAI_API_KEY=your_openai_api_key_here

      # Deepgram API Configuration
      NEXT_PUBLIC_DEEPGRAM_API_KEY=your_deepgram_api_key_here

      # ElevenLabs API Configuration
      NEXT_PUBLIC_ELEVENLABS_API_KEY=your_elevenlabs_api_key_here

- Add idle videos (optional): place idle video files in the `public` directory:
  - `emma_idle.mp4` (for talks service)
  - `alex_v2_idle.mp4` (for clips service)
- Run the development server: `npm run dev` (or `bun dev`)
- Open http://localhost:3000 in your browser
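A missing or placeholder key usually shows up only later as a failed connection, so it can help to check the configuration at startup. The following is a minimal sketch, not part of the project: `findMissingEnv` is a hypothetical helper, and the variable names are copied from the `.env.local` template above.

```typescript
// Variable names copied from the .env.local template in the setup steps.
const REQUIRED_KEYS = [
  "NEXT_PUBLIC_DID_API_KEY",
  "NEXT_PUBLIC_DID_WEBSOCKET_URL",
  "NEXT_PUBLIC_OPENAI_API_KEY",
  "NEXT_PUBLIC_DEEPGRAM_API_KEY",
  "NEXT_PUBLIC_ELEVENLABS_API_KEY",
] as const;

type RequiredKey = (typeof REQUIRED_KEYS)[number];
type EnvInput = Partial<Record<RequiredKey, string | undefined>>;

// Hypothetical helper: returns the names of variables that are unset,
// blank, or still hold the "your_..._here" placeholder from the template.
function findMissingEnv(env: EnvInput): RequiredKey[] {
  return REQUIRED_KEYS.filter((key) => {
    const value = env[key]?.trim();
    return !value || /^your_.*_here$/.test(value);
  });
}
```

Next.js inlines `NEXT_PUBLIC_` variables into client code only where they are referenced statically, so in the browser the input object would need to be built from explicit references such as `process.env.NEXT_PUBLIC_DID_API_KEY` rather than by passing `process.env` itself.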
- Select Presenter: Click the presenter icon to choose from available presenters or custom images
- Choose Mode: Toggle between "Clips" (pre-trained presenters) and "Talks" (custom images)
- Connect: Click the "Connect" button to establish a connection to D-ID streaming
- Chat: Type messages in the text input or use voice recording
- Voice: Hold the "Hold to Record" button to capture voice input
- Watch: The avatar will respond with synthesized speech and lip-sync
- Presenter Previews: Hover over presenters in the selection grid to see talking previews
- Dynamic Switching: Change presenters anytime (automatically disconnects and requires reconnection)
- Error Handling: Detailed error messages help troubleshoot connection issues
- Fallback System: If presenter videos fail to load, local videos automatically serve as backups
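The fallback behaviour can be pictured as a small lookup from the active service to its local idle video. This is a sketch under assumptions: `nextVideoSource` and the `DidService` type are illustrative names, not the project's actual code, and the paths assume the idle videos from the setup steps sit at the root of the `public` directory.

```typescript
type DidService = "talks" | "clips";

// Local idle videos named in the setup steps; files in public/ are served from "/".
const LOCAL_IDLE_VIDEOS: Record<DidService, string> = {
  talks: "/emma_idle.mp4",
  clips: "/alex_v2_idle.mp4",
};

// Hypothetical helper: given the source that just failed to load, return the
// next source to try, or null when the local fallback itself has failed.
function nextVideoSource(service: DidService, failedSrc: string): string | null {
  const fallback = LOCAL_IDLE_VIDEOS[service];
  // endsWith also matches absolute URLs such as http://localhost:3000/emma_idle.mp4
  return failedSrc.endsWith(fallback) ? null : fallback;
}
```

A video element's error handler could call this with the failed `src` and stop retrying once it returns `null`, which avoids an endless reload loop when the local file is missing too.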
The application features a modern, context-driven architecture:
- `PresenterContext.tsx` - Global presenter state management with automatic disconnect handling
- `/api/presenters/route.ts` - D-ID presenter API proxy with 30-minute caching
- `deepgramClient.ts` - Speech-to-text transcription
- `openaiClient.ts` - GPT-4o chat completions
- `didClient.ts` - Enhanced D-ID WebSocket/WebRTC with dynamic presenter support
- `webrtcManager.ts` - WebRTC peer connection handling
- `useConversation.ts` - Chat history and LLM interactions
- `useVoiceRecording.ts` - Audio capture and transcription
- `useDidStreaming.ts` - Context-aware D-ID connection and video streaming
- `StreamingChat.tsx` - Main application orchestrator with enhanced error handling
- `PresenterSelector.tsx` - Dynamic presenter selection with API integration
- `VideoDisplay.tsx` - Smart video display with automatic fallback system
- `ChatInterface.tsx` - Text chat with message history
- `VoiceRecorder.tsx` - Voice recording with visual feedback
- `StatusPanel.tsx` - Connection and system status
- `ControlButtons.tsx` - Connect/disconnect controls
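The 30-minute caching in the presenter proxy can be sketched as a single-value cache with an expiry timestamp. This is an illustration only: `TtlCache` is a hypothetical class, and the actual route may cache differently (for example through Next.js's own caching options).

```typescript
const THIRTY_MINUTES_MS = 30 * 60 * 1000;

// Hypothetical single-value cache; the clock is injectable so expiry can be tested.
class TtlCache<T> {
  private entry: { value: T; expiresAt: number } | null = null;
  private readonly ttlMs: number;
  private readonly now: () => number;

  constructor(ttlMs: number, now: () => number = Date.now) {
    this.ttlMs = ttlMs;
    this.now = now;
  }

  // Returns the cached value, or undefined if nothing is cached or it has expired.
  get(): T | undefined {
    if (this.entry && this.now() < this.entry.expiresAt) {
      return this.entry.value;
    }
    this.entry = null;
    return undefined;
  }

  // Stores a value and restarts the expiry window.
  set(value: T): void {
    this.entry = { value, expiresAt: this.now() + this.ttlMs };
  }
}
```

A route handler would call `get()` first and only request the presenter list from D-ID, followed by `set()`, when the result is `undefined`.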
- Next.js 15 - React framework with App Router
- TypeScript - Type safety and better DX
- Tailwind CSS - Utility-first styling
- WebRTC - Real-time peer-to-peer communication
- WebSocket - Real-time messaging with D-ID
The app features comprehensive error handling and debugging:
- Categorized Errors: Separate display for D-ID Streaming, AI, and Voice errors
- Detailed Information: Shows connection IDs, request IDs for easier debugging
- User-Friendly Messages: Clear, actionable error descriptions
- API Error Parsing: Extracts detailed error information from D-ID responses
- Console Logging: Comprehensive logging for development and troubleshooting
- Message Tracking: Full WebSocket message logging with presenter configuration
- Connection State Monitoring: Real-time connection status and error tracking
- Video Fallbacks: Automatic switching from remote to local videos on load failure
- API Rate Limiting: Detection and user notification for API limits
- Connection Recovery: Graceful handling of WebSocket/WebRTC disconnections
- Error Boundaries: React error boundaries for graceful degradation
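The categorized error display described above could be modelled with a small tagged type. This is a sketch under assumptions: `AppError`, its field names, and `formatError` are illustrative and not taken from the project, and no particular D-ID response shape is implied.

```typescript
// The three categories named in the error-handling list above.
type ErrorCategory = "did-streaming" | "ai" | "voice";

// Hypothetical error shape; the optional IDs mirror the "connection IDs,
// request IDs" that the UI is described as showing.
interface AppError {
  category: ErrorCategory;
  message: string;
  connectionId?: string;
  requestId?: string;
}

const CATEGORY_LABELS: Record<ErrorCategory, string> = {
  "did-streaming": "D-ID Streaming",
  ai: "AI",
  voice: "Voice",
};

// Builds a single display line, appending whichever IDs are present.
function formatError(error: AppError): string {
  const ids = [
    error.connectionId && `connection ${error.connectionId}`,
    error.requestId && `request ${error.requestId}`,
  ]
    .filter(Boolean)
    .join(", ");
  const base = `[${CATEGORY_LABELS[error.category]}] ${error.message}`;
  return ids ? `${base} (${ids})` : base;
}
```

Keeping the category as a closed union means a new error source cannot be added without also giving it a label, which keeps the per-category display exhaustive.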
├── app/
│   ├── api/presenters/        # D-ID presenter API proxy with caching
│   ├── globals.css            # Tailwind CSS v4 configuration
│   ├── layout.tsx             # Root layout
│   └── page.tsx               # Main page with PresenterProvider
├── components/
│   ├── PresenterSelector.tsx  # Dynamic presenter selection UI
│   ├── StreamingChat.tsx      # Main orchestrator with error handling
│   ├── VideoDisplay.tsx       # Smart video display with fallbacks
│   └── [other components]     # Chat, voice, status components
├── contexts/
│   └── PresenterContext.tsx   # Global presenter state management
├── hooks/                     # Context-aware custom React hooks
├── services/                  # Enhanced API clients
├── types/                     # Comprehensive TypeScript definitions
├── utils/                     # Configuration and constants
└── public/                    # Static assets and fallback videos
- Context-Driven Architecture: Global state management using React Context
- Modular Components: Single-responsibility components with clear interfaces
- Custom Hooks: Context-aware hooks for state management
- TypeScript Safety: Comprehensive type definitions and strict typing
- Error-First Design: Comprehensive error handling and fallback systems
- Performance Optimized: API caching, video preloading, efficient re-rendering
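The rule that changing presenters disconnects the current session can be expressed as a pure reducer, which is one common way to drive a React Context. This is a sketch only: the state shape, action names, and `presenterReducer` are assumptions, not the contents of `PresenterContext.tsx`.

```typescript
// Hypothetical state held by the presenter context.
interface PresenterState {
  service: "talks" | "clips";
  presenterId: string | null;
  connected: boolean;
}

type PresenterAction =
  | { type: "select"; service: "talks" | "clips"; presenterId: string }
  | { type: "connected" }
  | { type: "disconnected" };

function presenterReducer(state: PresenterState, action: PresenterAction): PresenterState {
  switch (action.type) {
    case "select": {
      const unchanged =
        state.service === action.service && state.presenterId === action.presenterId;
      // Re-selecting the current presenter keeps the connection; returning the
      // same object also lets React skip a re-render.
      if (unchanged) return state;
      // Any real change drops the connection, so the user must reconnect.
      return { service: action.service, presenterId: action.presenterId, connected: false };
    }
    case "connected":
      // A connection only makes sense once a presenter is selected.
      return state.presenterId ? { ...state, connected: true } : state;
    case "disconnected":
      return { ...state, connected: false };
  }
}
```

The reducer only records that the session is no longer connected; actually closing the WebRTC peer connection would happen in an effect or hook that reacts to this state.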
# Development
bun run dev
# Production build
bun run build
# Type checking
bun run type-check
# Linting
bun run lint

- "Internal server error" from D-ID
  - Check the detailed error display in the UI for connection/request IDs
  - Verify presenter configuration in console logs
  - Ensure the selected presenter is valid and streamable
  - Check D-ID API key permissions and quotas
- Presenter videos not loading
  - Videos automatically fall back to local files if remote URLs fail
  - Check console for "Trying fallback local video..." messages
  - Ensure local idle videos exist in the `/public` directory
- WebRTC Connection Failed
  - Check firewall settings
  - Ensure HTTPS in production
  - Verify D-ID API key and permissions
  - Check browser console for detailed WebSocket messages
- Voice Recording Not Working
  - Check microphone permissions
  - Ensure HTTPS for getUserMedia
  - Verify Deepgram API key
  - Check browser compatibility
- Presenter Selection Issues
  - Ensure D-ID API key has access to clips/presenters endpoint
  - Check network connectivity for API calls
  - Verify API rate limits haven't been exceeded
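Some of the voice-recording checks above can be automated before the user ever presses the record button. This is a minimal sketch with hypothetical names (`RecordingEnvironment`, `diagnoseRecording`); it takes the browser facts as plain booleans so the logic can be tested outside a browser.

```typescript
// Facts about the current page, gathered by the caller from browser APIs.
interface RecordingEnvironment {
  isSecureContext: boolean; // true on HTTPS pages and on localhost
  hasGetUserMedia: boolean; // whether navigator.mediaDevices.getUserMedia exists
}

// Hypothetical helper: returns human-readable problems, empty when recording can be attempted.
function diagnoseRecording(env: RecordingEnvironment): string[] {
  const problems: string[] = [];
  if (!env.isSecureContext) {
    problems.push("Page is not a secure context; serve it over HTTPS or from localhost.");
  }
  if (!env.hasGetUserMedia) {
    problems.push("getUserMedia is unavailable in this browser or context.");
  }
  return problems;
}
```

In the browser, the input could be built from `window.isSecureContext` and `typeof navigator.mediaDevices?.getUserMedia === "function"`. Microphone permission itself is only known for certain once `getUserMedia` is called, so a denied permission still has to be handled where recording starts.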
- Chrome/Chromium: Full support
- Firefox: Full support
- Safari: Requires additional WebRTC polyfills
- Mobile browsers: Limited WebRTC support
Unlike traditional static configurations, this demo features:
- Real-time API Integration: Fetches presenters directly from D-ID's live API
- Interactive Selection: Visual grid with hover previews and smooth transitions
- Context Management: Global state ensures consistency across components
- Automatic Switching: Seamless presenter changes with connection management
- Granular Error Parsing: Extracts specific error details from D-ID responses
- User-Friendly Display: Categorized error messages with actionable information
- Development Tools: Comprehensive logging and debugging information
- Graceful Fallbacks: Multiple layers of fallback systems
- Smart Caching: 30-minute API response caching to reduce calls
- Video Fallbacks: Automatic switching to local videos when remote fails
- Optimized Rendering: Context-based architecture prevents unnecessary re-renders
- Mobile-Responsive: Works seamlessly across devices and screen sizes
This project is for demonstration purposes. Please ensure you comply with the terms of service for all third-party APIs used (D-ID, OpenAI, Deepgram, ElevenLabs).