# MEMO - Meeting Intelligence Agent

AI-powered assistant for meeting transcription, analysis, and management.

This project implements an intelligent conversational agent that orchestrates the entire meeting intelligence workflow. It allows users to upload video recordings, automatically transcribe them with speaker identification, edit transcripts, store them in a vector database (Pinecone), and perform advanced RAG (Retrieval-Augmented Generation) queries to extract insights, summaries, and action items.

## Table of Contents

- Features
- Documentation
- System Architecture
- Quick Start
- Docker Support
- Live Demo & Deployment
- Project Structure
- Monitoring & Evaluation
- MCP Integration Details
- Future Enhancements
- Contributing
- License
- Acknowledgments
- Contact
## Features

- **Natural Language Interface**: Control everything through a chat-based agent using LangGraph.
- **Local/Cloud Deployment**: Docker + Hugging Face Spaces.
- **Video Analysis Pipeline**:
  - Upload MP4/MOV/AVI files directly.
  - WhisperX Transcription: high-accuracy speech-to-text.
  - Speaker Diarization: automatically distinguishes between different speakers.
  - Smart Speaker Mapping: an LLM intelligently assigns real names to speaker labels (e.g., "Speaker_01" → "Alice") from context.
- **Interactive Editor**: Review and correct transcripts before committing them to the database.
- **Semantic Search (RAG)**:
  - Stores meetings in a Pinecone vector database.
  - Intelligent metadata extraction (titles, dates, summaries) using GPT-4o-mini.
  - Time-Aware Queries: understands relative time (e.g., "What did we discuss 2 weeks ago?") using a dedicated Time MCP server.
  - Ask questions like "What did we decide about the budget?" or "List all action items for John".
- **MCP Integration (Model Context Protocol)**:
  - Connects to external tools like Notion to export meeting minutes directly.
  - Custom Time Server: world time queries for relative date calculations.
  - Zoom Integration (future): real-time meeting capture via the RTMS API.
- **LangSmith Integration**: Full tracing and monitoring of agent workflows.
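The smart speaker mapping step can be illustrated with a small sketch. The function and field names here are hypothetical, not the project's actual API: in the real pipeline, the LLM reads the transcript and proposes the label-to-name mapping, which is then applied as a pure transformation.

```python
# Illustrative sketch of the speaker-mapping step.
# Names and data shapes are assumptions, not the project's actual code.

def apply_speaker_mapping(segments: list[dict], mapping: dict[str, str]) -> list[dict]:
    """Replace diarization labels (e.g. 'Speaker_01') with real names,
    leaving unmapped labels untouched."""
    return [
        {**seg, "speaker": mapping.get(seg["speaker"], seg["speaker"])}
        for seg in segments
    ]

segments = [
    {"speaker": "Speaker_01", "text": "Let's review the budget."},
    {"speaker": "Speaker_02", "text": "I'll send the figures today."},
]
# In the pipeline, this mapping would come from the LLM reading the transcript.
mapping = {"Speaker_01": "Alice", "Speaker_02": "Bob"}
named = apply_speaker_mapping(segments, mapping)
```

Keeping the LLM's job limited to producing the mapping (rather than rewriting the transcript) makes the step cheap to verify in the interactive editor.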
## Documentation

For detailed technical documentation, see:

- `ARCHITECTURE.md` - full system design
- `TECHNICAL_IMPLEMENTATION.md` - complete tool reference and Mermaid diagrams
- `DEPLOYMENT_GUIDE.md` - step-by-step deployment guide
- Pinecone management script - utility for database management
## System Architecture

```mermaid
graph TD
    %% Define styles
    classDef ui fill:#e1f5fe,stroke:#01579b,stroke-width:2px,color:#000000;
    classDef agent fill:#e8f5e9,stroke:#2e7d32,stroke-width:2px,color:#000000;
    classDef tools fill:#fff3e0,stroke:#ef6c00,stroke-width:2px,color:#000000;
    classDef pipe fill:#f3e5f5,stroke:#7b1fa2,stroke-width:2px,color:#000000;
    classDef db fill:#fff9c4,stroke:#fbc02d,stroke-width:2px,color:#000000;
    classDef ext fill:#c8e6c9,stroke:#388e3c,stroke-width:2px,color:#000000;
    classDef dev fill:#ffccbc,stroke:#d84315,stroke-width:2px,stroke-dasharray: 5 5,color:#000000;

    User([User]) <--> UI[Gradio Interface]
    UI <--> Agent["Conversational Agent (LangGraph)"]
    Agent <--> LLM[OpenAI GPT-3.5-turbo]

    subgraph Tools ["Tools & Capabilities"]
        direction TB
        VideoTools[Video Processing<br/>8 Tools]
        QueryTools[Meeting Queries<br/>6 Tools]
        MCPTools[MCP Integration]
    end

    Agent --> VideoTools
    Agent --> QueryTools
    Agent --> MCPTools

    subgraph Pipeline ["Video Pipeline"]
        direction LR
        Upload[Upload MP4/MOV/AVI]
        Upload --> Whisper[WhisperX]
        Whisper --> SpeakerID[Pyannote Diarization]
        SpeakerID --> Editor[Interactive Editor]
        Editor --> MetaExtract["GPT-4o-mini<br/>Metadata Extraction"]
    end

    VideoTools --> Pipeline

    subgraph Storage ["Data Storage"]
        MetaExtract --> Pinecone[("Pinecone DB")]
        QueryTools <--> Pinecone
    end

    subgraph Integrations ["External APIs"]
        MCPTools --> Notion[Notion API]
        MCPTools --> Time["World Time Server<br/>(Custom MCP)"]
        MCPTools -.-> Zoom["Zoom RTMS API<br/>(Future Integration)"]
    end

    subgraph Monitoring ["Monitoring"]
        Agent --> LangSmith[LangSmith Tracing]
    end

    %% Apply styles
    class UI ui;
    class Agent,LLM agent;
    class VideoTools,QueryTools,MCPTools tools;
    class Upload,Whisper,SpeakerID,Editor,MetaExtract pipe;
    class Pinecone db;
    class Notion,Time ext;
    class Zoom dev;
    class LangSmith ext;
```
## Quick Start

### Prerequisites

- Python 3.11
- FFmpeg (required for audio processing)
- Node.js & npm (optional, required if using the Notion MCP integration)
- Pinecone account
- OpenAI API key

### Installation

1. Clone the repository:

```bash
git clone https://github.com/yourusername/meeting-agent.git
cd meeting-agent
```

2. Install dependencies:

```bash
pip install -r requirements.txt
```

3. Configure the environment. Create a `.env` file in the root directory:

```bash
OPENAI_API_KEY=your_openai_key
PINECONE_API_KEY=your_pinecone_key
PINECONE_INDEX=your_index_name
PINECONE_ENVIRONMENT=us-east-1

# Optional: For Notion MCP
ENABLE_MCP=true
NOTION_TOKEN=your_notion_key

# LangSmith (optional)
LANGSMITH_API_KEY=your_langsmith_key
LANGSMITH_PROJECT=meeting-agent
```

4. Run the application:

```bash
python app.py
```

Access the UI at `http://localhost:7860`.
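At startup, a settings loader along these lines can validate the required variables and fail fast with a clear message. This is a hypothetical sketch, not the project's actual configuration module (which lives in `src/config/`):

```python
# Hypothetical settings loader: validate required environment variables early.
# Names here mirror the .env keys above; the helper itself is illustrative.
import os

REQUIRED = ["OPENAI_API_KEY", "PINECONE_API_KEY", "PINECONE_INDEX"]

def load_settings() -> dict:
    """Collect required settings, raising one clear error for any missing keys."""
    missing = [key for key in REQUIRED if not os.environ.get(key)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return {
        "openai_api_key": os.environ["OPENAI_API_KEY"],
        "pinecone_api_key": os.environ["PINECONE_API_KEY"],
        "pinecone_index": os.environ["PINECONE_INDEX"],
        # Optional toggles fall back to safe defaults.
        "enable_mcp": os.environ.get("ENABLE_MCP", "false").lower() == "true",
    }
```

Failing at startup beats a cryptic Pinecone or OpenAI error halfway through a transcription run.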
### Pinecone Management

Manage your vector database with the included utility:

```bash
# List all meetings
python scripts/manage_pinecone.py list

# View statistics
python scripts/manage_pinecone.py stats

# Delete a specific meeting
python scripts/manage_pinecone.py delete meeting_abc12345
```

## Docker Support

Build and run the application in a container.
1. Build the image:

```bash
docker build -t meeting-agent .
```

> **Important for Hugging Face Spaces:** Standard Gradio deployment may fail due to dependency conflicts (WhisperX/Pyannote). You must use Docker for deployment. Use `requirements_hf.txt` (renamed to `requirements.txt` inside your deployment repo), which contains safe, Linux-compatible version ranges. The standard `requirements.txt` is optimized for local Mac/dev environments.

2. Run the container:

```bash
docker run -p 7860:7860 --env-file .env meeting-agent
```
## Live Demo & Deployment

- **Live Demo**: https://huggingface.co/spaces/GFiaMon/meeting-agent-docker
- **One-Click Clone**: Click "Duplicate Space" on the Hugging Face page to deploy your own instance.
- **Auto-Deploy**: Cloned Spaces automatically build as Docker containers with all dependencies.
### Custom MCP Servers

- **World Time Server**: A custom MCP server for timezone-aware queries, deployed at https://huggingface.co/spaces/GFiaMon/date_time_mpc_server_tool (can be cloned or connected to an AI agent as an external MCP server).
- **Zoom RTMS Integration**: In development (`external_mcp_servers/zoom_mcp/`), working with Zoom's API team.
## Project Structure

```
meeting-agent/
├── app.py                      # Entry point (Gradio app)
├── src/
│   ├── agents/                 # LangGraph agent definition
│   ├── config/                 # Configuration & settings
│   ├── processing/             # Audio/video processing pipelines
│   ├── retrievers/             # Pinecone & RAG logic
│   ├── tools/                  # Tool definitions (video, general, MCP)
│   └── ui/                     # Gradio UI components
├── external_mcp_servers/       # Custom MCP servers
│   ├── time_mcp_server/        # World Time Server (Gradio app)
│   └── zoom_mcp/               # Zoom RTMS (prototype, in development)
├── archive/                    # Deprecated code & experiments (local repo only)
├── scripts/                    # Helper scripts
│   ├── manage_pinecone.py      # Pinecone index management utility
│   └── setup_pinecone.py       # Initial Pinecone setup
├── documentation/              # Technical documentation
└── requirements.txt            # Dependencies
```
## Monitoring & Evaluation

The agent integrates with LangSmith for comprehensive tracing and monitoring:

- **Prompt/Response Tracking**: All agent interactions are logged
- **Tool Usage**: Complete tool execution history
- **Performance Metrics**: Latency and token usage tracking
- **Debugging**: Easy identification of issues in complex workflows
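LangSmith tracing is enabled purely through environment variables, with no code changes. A typical `.env` addition looks like the fragment below (variable names follow LangSmith's standard configuration; verify against the current LangSmith docs):

```shell
# Enable LangSmith tracing for all agent runs.
# (LANGCHAIN_TRACING_V2=true is the older equivalent for legacy LangChain versions.)
LANGSMITH_TRACING=true
LANGSMITH_API_KEY=your_langsmith_key
LANGSMITH_PROJECT=meeting-agent
```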
While full quantitative metrics are a future enhancement, the system includes:

- **Functional Testing**: All tools tested end-to-end
- **Integration Testing**: Pinecone, Notion, and MCP connections verified
## MCP Integration Details

- **Notion MCP**: The official `@notionhq/notion-mcp-server` for Notion API access
- **World Time Server**: Custom Gradio-based MCP server for timezone-aware queries
- **Zoom RTMS**: Prototype for future Zoom integration (in development)
### Time-Aware Queries

The custom Time MCP server enables queries like:

- "What did we discuss last Tuesday?"
- "Show me meetings from 2 weeks ago"
- "What action items were assigned in December?"

The server calculates relative dates and provides timezone-aware timestamps for accurate meeting retrieval.
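The relative-date resolution at the heart of these queries can be sketched in plain Python. This is a simplified illustration, not the deployed Time MCP server's code, and the phrase handling shown is a deliberately small subset:

```python
# Simplified sketch of relative-date resolution for time-aware queries.
# Illustrative only; the real Time MCP server is a separate Gradio app.
from datetime import date, timedelta

WEEKDAYS = ["monday", "tuesday", "wednesday", "thursday",
            "friday", "saturday", "sunday"]

def resolve_relative_date(phrase: str, today: date) -> date:
    """Map a small set of relative-time phrases to concrete dates."""
    phrase = phrase.lower().strip()
    if phrase == "yesterday":
        return today - timedelta(days=1)
    if phrase.endswith("weeks ago"):
        weeks = int(phrase.split()[0])           # "2 weeks ago" -> 2
        return today - timedelta(weeks=weeks)
    if phrase.startswith("last "):
        # "last tuesday" -> most recent Tuesday strictly before today.
        target = WEEKDAYS.index(phrase.split()[1])
        delta = (today.weekday() - target) % 7 or 7
        return today - timedelta(days=delta)
    raise ValueError(f"Unsupported phrase: {phrase!r}")
```

The resolved date can then drive a metadata filter on the Pinecone query, e.g. restricting results to meetings whose stored date falls on or after the computed day.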
## Future Enhancements

- **Zoom RTMS API Integration**: Real-time meeting transcription and capture
- **Enhanced Metrics**: Quantitative evaluation with LangSmith
- **Batch Processing**: Handle multiple meetings simultaneously
- **Multi-language Support**: Transcription in 10+ languages
- **Advanced Analytics**: Sentiment analysis, speaker analytics
- **Export Formats**: PDF, Google Docs, etc.
## Contributing

- Fork the Project
- Create your Feature Branch (`git checkout -b feature/AmazingFeature`)
- Commit your Changes (`git commit -m 'Add some AmazingFeature'`)
- Push to the Branch (`git push origin feature/AmazingFeature`)
- Open a Pull Request
## License

Distributed under the MIT License. See `LICENSE` for more information.
## Acknowledgments

- **Ironhack Data Science & AI Program** - course framework and guidance
- **OpenAI** - Whisper and GPT models
- **WhisperX** - audio/video processing
- **Pinecone** - vector database
- **Notion** - Notion API access
- **LangChain** - agent framework and tools
- **Hugging Face** - deployment platform and community
## Contact

**Author**: Guillermo Fiallo Montero - Data Science & AI Engineer

**Project Link**: https://github.com/GFiaMon/meeting-intelligence-agent

Capstone Project - Ironhack Data Science & AI Program - December 2025