# ai-research-assistant-starter
An end-to-end multimodal AI Research Agent that listens, retrieves, summarizes, and speaks.
Designed for rapid prototyping, technical demonstration, and job-ready GitHub pinning.
- Clean structure suitable for pinning on your GitHub profile
- Swappable components (ASR / Retriever / Summarizer / TTS)
- Small, readable codebase with a simple test to validate imports
- No proprietary assets or heavy data included
# 0) Verify GPU access (optional)
nvidia-smi
# 1) Create environment (Python 3.10)
conda create -n research-assistant python=3.10 -y
conda activate research-assistant
# 2) System deps (Whisper requires FFmpeg)
sudo apt-get update && sudo apt-get install -y ffmpeg
# 3) Install Python deps
pip install -r requirements.txt
# 4) Run API
uvicorn app:app --reload --port 8000
Try it
- Text query:
curl -X POST http://localhost:8000/ask -H "Content-Type: application/json" -d '{"query":"Summarize recent advances in retrieval-augmented generation."}'
- Audio (WAV/MP3) query:
curl -X POST "http://localhost:8000/ask" -F "audio_file=@input.wav"
Response includes: answer, citations (arXiv IDs), and tts_wav (path to WAV file if generated).
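The text query above can also be sent from Python. The following is a minimal sketch using only the standard library; the helper names (`build_ask_request`, `read_ask_response`, `ask`) and the 120-second timeout are illustrative assumptions, not part of the project. It covers the JSON text query only, not the multipart audio upload.

```python
import json
import urllib.request

# Assumption: the server from step 4 is listening on localhost:8000.
BASE_URL = "http://localhost:8000"


def build_ask_request(query: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build the same JSON POST that the text-query curl command sends."""
    payload = json.dumps({"query": query}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/ask",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def read_ask_response(body: bytes) -> dict:
    """Pick out the three documented fields; missing ones become None."""
    data = json.loads(body)
    return {key: data.get(key) for key in ("answer", "citations", "tts_wav")}


def ask(query: str, base_url: str = BASE_URL, timeout: float = 120.0) -> dict:
    """Send the query and return the parsed fields (needs a running server)."""
    request = build_ask_request(query, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return read_ask_response(resp.read())
```

With the API running, `ask("Summarize recent advances in retrieval-augmented generation.")["answer"]` should return the summary text.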
This project functions as a lightweight AI Research Agent, coordinating several cognitive tools through a reasoning loop:
User → ASR (Whisper/Faster-Whisper) → arXiv Retriever → LLM Summarizer → TTS / Notion Sync
Each module operates as an independent agent skill, orchestrated by the FastAPI controller (app.py). Together they emulate perception (listening), reasoning (retrieval + summarization), and action (text/voice response).
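The flow above can be sketched as a small pipeline of swappable callables. This is an illustration of the idea only, not the code in app.py: the `Pipeline` class, the stage signatures, and the stub stages are all assumptions chosen so the example runs without models or network access. Notion sync is omitted.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

# Hypothetical signatures for the four swappable stages.
Transcriber = Callable[[bytes], str]                 # audio bytes -> text
Retriever = Callable[[str], List[Tuple[str, str]]]   # query -> [(arxiv_id, abstract)]
Summarizer = Callable[[str, List[str]], str]         # query, abstracts -> answer
Synthesizer = Callable[[str], Optional[str]]         # answer -> WAV path or None


@dataclass
class Pipeline:
    transcribe: Transcriber
    retrieve: Retriever
    summarize: Summarizer
    synthesize: Synthesizer

    def ask(self, query: Optional[str] = None, audio: Optional[bytes] = None) -> dict:
        # Perception: fall back to ASR only when no text query was given.
        if query is None:
            if audio is None:
                raise ValueError("provide either query or audio")
            query = self.transcribe(audio)
        # Reasoning: retrieve papers, then summarize their abstracts.
        papers = self.retrieve(query)
        answer = self.summarize(query, [abstract for _, abstract in papers])
        # Action: synthesize speech and return the documented fields.
        return {
            "answer": answer,
            "citations": [arxiv_id for arxiv_id, _ in papers],
            "tts_wav": self.synthesize(answer),
        }


# Stub stages (placeholder data, not real arXiv results).
demo = Pipeline(
    transcribe=lambda audio: audio.decode("utf-8"),
    retrieve=lambda q: [("0000.00001", f"Placeholder abstract about {q}.")],
    summarize=lambda q, abstracts: " ".join(abstracts),
    synthesize=lambda answer: None,
)
result = demo.ask(audio=b"retrieval-augmented generation")
```

Because each stage is just a callable, swapping an ASR backend or summarizer model means passing a different function, leaving the orchestration untouched.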
The system can be easily extended with:
- Memory (e.g., SQLite or vector store for multi-turn context)
- Advanced orchestration using LangChain, LangGraph, or function-calling frameworks
- External integrations (Notion, Slack, or custom research APIs)
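As one possible shape for the memory extension, here is a minimal SQLite-backed sketch for multi-turn context. The `ConversationMemory` class, the `turns` table, and the `session_id` concept are hypothetical; the project does not currently define them.

```python
import sqlite3
from typing import List, Tuple


class ConversationMemory:
    """Append-only store of (role, content) turns, keyed by session."""

    def __init__(self, path: str = ":memory:") -> None:
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS turns ("
            "id INTEGER PRIMARY KEY AUTOINCREMENT, "
            "session_id TEXT NOT NULL, "
            "role TEXT NOT NULL, "
            "content TEXT NOT NULL)"
        )

    def add(self, session_id: str, role: str, content: str) -> None:
        with self.conn:  # commits on success, rolls back on error
            self.conn.execute(
                "INSERT INTO turns (session_id, role, content) VALUES (?, ?, ?)",
                (session_id, role, content),
            )

    def recent(self, session_id: str, limit: int = 6) -> List[Tuple[str, str]]:
        """Return the last `limit` turns in chronological order."""
        rows = self.conn.execute(
            "SELECT role, content FROM turns WHERE session_id = ? "
            "ORDER BY id DESC LIMIT ?",
            (session_id, limit),
        ).fetchall()
        return rows[::-1]
```

The turns returned by `recent()` could be prepended to the summarizer prompt so a follow-up question keeps its context. Pass a file path instead of `:memory:` to persist across restarts.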
Edit .env (copy from .env.example) or set env vars:
- ASR_BACKEND = faster_whisper | whisper | none
- USE_GPU = 1 or 0
- SUMMARIZER_MODEL = Hugging Face model id (default: facebook/bart-large-cnn)
- Optional Notion: NOTION_API_KEY, NOTION_PAGE_ID
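For illustration, these variables could be read and validated roughly as follows. This is a sketch, not the contents of config.py: the `Settings` dataclass and `load_settings` function are hypothetical, and the fallbacks for ASR_BACKEND and USE_GPU are assumptions (only the SUMMARIZER_MODEL default is documented above). It reads the process environment only and does not parse the .env file itself.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

VALID_ASR_BACKENDS = ("faster_whisper", "whisper", "none")


@dataclass(frozen=True)
class Settings:
    asr_backend: str
    use_gpu: bool
    summarizer_model: str
    notion_api_key: Optional[str]
    notion_page_id: Optional[str]


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    # Assumed fallbacks: faster_whisper and USE_GPU=0.
    backend = env.get("ASR_BACKEND", "faster_whisper")
    if backend not in VALID_ASR_BACKENDS:
        raise ValueError(
            f"ASR_BACKEND must be one of {VALID_ASR_BACKENDS}, got {backend!r}"
        )
    return Settings(
        asr_backend=backend,
        use_gpu=env.get("USE_GPU", "0") == "1",
        summarizer_model=env.get("SUMMARIZER_MODEL", "facebook/bart-large-cnn"),
        # Treat empty strings as "not configured".
        notion_api_key=env.get("NOTION_API_KEY") or None,
        notion_page_id=env.get("NOTION_PAGE_ID") or None,
    )
```

Failing fast on an unknown ASR_BACKEND surfaces typos at startup and not on the first audio request.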
ai-research-assistant-starter/
├─ app.py
├─ config.py
├─ state.py
├─ tools/
│ ├─ asr.py
│ ├─ retriever.py
│ ├─ llm.py
│ ├─ tts.py
│ └─ notion_sync.py
├─ tests/
│ └─ test_smoke.py
├─ cli.py
├─ requirements.txt
├─ environment.yml
├─ .env.example
├─ .gitignore
└─ README.md
A compact verification bundle (script + fixtures) can be shared privately to confirm:
- ASR is functional with a reference sample audio clip (Whisper / Faster‑Whisper)
- Retrieval returns deterministic arXiv results for a fixed query
- Summarizer produces a valid non‑empty answer within expected latency bounds
- TTS generates a playable WAV file (Linux/WSL) or speaks via pyttsx3 (Windows)
- API contract: /ask responds with answer, citations, and tts_wav
For reviewers: request the verification pack to reproduce results quickly without exposing private assets.
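The API contract and latency items in the list above could be checked with something like the sketch below. It is not the verification pack itself: `contract_errors` and `timed` are hypothetical helpers, and the assumed field types (citations as a list of strings, tts_wav as a path string or null) are inferred from the response description, not confirmed by the code.

```python
import time
from typing import Any, Callable, List, Mapping, Tuple


def contract_errors(response: Mapping[str, Any]) -> List[str]:
    """Return contract violations; an empty list means the response passes."""
    errors = []
    answer = response.get("answer")
    if not isinstance(answer, str) or not answer.strip():
        errors.append("answer must be a non-empty string")
    citations = response.get("citations")
    if not isinstance(citations, list) or not all(
        isinstance(c, str) for c in citations
    ):
        errors.append("citations must be a list of strings")
    if "tts_wav" not in response:
        errors.append("tts_wav key is missing")
    elif response["tts_wav"] is not None and not isinstance(response["tts_wav"], str):
        errors.append("tts_wav must be a path string or null")
    return errors


def timed(fn: Callable[[], Any]) -> Tuple[Any, float]:
    """Run fn once and return (result, elapsed seconds) for latency checks."""
    start = time.perf_counter()
    result = fn()
    return result, time.perf_counter() - start
```

A reviewer script could wrap the /ask call in `timed(...)`, assert that `contract_errors(result)` is empty, and compare the elapsed time against whatever latency bound applies to the hardware in use.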
- On WSL, play WAV responses using ffplay or transfer them to Windows for playback.
- Swap in smaller summarizer models (e.g., sshleifer/distilbart-cnn-12-6) for faster inference.
- Retrieval is modular — integrate vector databases (Chroma, FAISS) for richer results.
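To make the vector-retrieval tip concrete, here is a brute-force, pure-Python stand-in that shows the add/search interface a vector-backed retriever needs. It deliberately does not use the Chroma or FAISS APIs; `InMemoryVectorIndex` is a hypothetical class, and producing the embedding vectors is left to whichever embedding model is chosen.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class InMemoryVectorIndex:
    """Brute-force stand-in for a vector database."""

    def __init__(self) -> None:
        self._vectors: Dict[str, List[float]] = {}

    def add(self, doc_id: str, vector: Sequence[float]) -> None:
        self._vectors[doc_id] = list(vector)

    def search(self, query: Sequence[float], k: int = 3) -> List[Tuple[str, float]]:
        """Return the k most similar (doc_id, score) pairs, best first."""
        scored = [
            (doc_id, cosine(query, vec)) for doc_id, vec in self._vectors.items()
        ]
        return sorted(scored, key=lambda item: item[1], reverse=True)[:k]
```

A real vector database replaces the linear scan in `search` with an approximate nearest-neighbour index, which matters once the collection grows beyond a few thousand abstracts.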
MIT