Multimodal Math Mentor

Author: Ujjwal Reddy Annedla

Project Overview

The goal was to build a math tutoring system that doesn't just "guess" answers but actually follows a logical, verifiable process.

🏗️ How It Works (The "5-Agent" Architecture)

I implemented a Multi-Agent System using LangChain to mimic how a human tutor thinks. It doesn't just output an answer; it moves through these distinct stages:

The Parser: First, it cleans the input. If the image is blurry or the audio is muffled, it triggers a HITL (Human-in-the-Loop) request immediately rather than guessing.
The Router: It identifies whether the problem is Algebra, Calculus, or Probability so it can pick the right solving strategy.
The Solver (with RAG): Before solving, it looks up formulas in a local ChromaDB knowledge base. This prevents the "hallucination" of fake math theorems.
The Verifier: This agent acts as a critic. In my testing, this was crucial for catching sign errors (e.g., confusing - for +).
The Explainer: Finally, it formats the output into a student-friendly explanation.
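
To make the hand-offs between the five stages concrete, here is a minimal sketch in plain Python. It is not the code in agents.py: every name (`Parsed`, `parse`, `route`, `run_pipeline`, `MAX_ATTEMPTS`), the `[unclear]` marker, the 0.8 threshold, and the keyword-based router are assumptions made for illustration. In the real system each stage would be a LangChain agent backed by Gemini.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple

MAX_ATTEMPTS = 3  # assumed retry budget; the README does not state one


@dataclass
class Parsed:
    text: str
    confidence: float  # hypothetical 0..1 score for how cleanly the input was read


def parse(raw: str) -> Parsed:
    """Agent 1: normalise whitespace and flag unreadable input (toy heuristic)."""
    cleaned = " ".join(raw.split())
    # "[unclear]" is a made-up marker standing in for a blurry or muffled region.
    confidence = 0.3 if (not cleaned or "[unclear]" in cleaned) else 1.0
    return Parsed(cleaned, confidence)


def route(problem: str) -> str:
    """Agent 2: classify the topic by keywords (a stand-in for an LLM call)."""
    lowered = problem.lower()
    if any(k in lowered for k in ("integral", "derivative", "limit")):
        return "calculus"
    if any(k in lowered for k in ("probability", "dice", "coin")):
        return "probability"
    return "algebra"


def run_pipeline(
    raw: str,
    solve: Callable[[str, str, Optional[str]], str],
    verify: Callable[[str, str], Tuple[bool, Optional[str]]],
    explain: Callable[[str, str], str],
    ask_human: Callable[[str], str],
    threshold: float = 0.8,
) -> str:
    parsed = parse(raw)
    if parsed.confidence < threshold:
        # HITL: ask the user to correct the text instead of guessing.
        parsed = Parsed(ask_human(parsed.text), 1.0)
    topic = route(parsed.text)
    feedback: Optional[str] = None
    for _ in range(MAX_ATTEMPTS):
        draft = solve(parsed.text, topic, feedback)    # Agent 3
        approved, feedback = verify(parsed.text, draft)  # Agent 4
        if approved:
            return explain(parsed.text, draft)           # Agent 5
    raise RuntimeError("Verifier rejected every draft")
```

Passing the Verifier's feedback back into the next `solve` call is one way to model the Reject edge. Capping the loop keeps a stubborn disagreement between Solver and Verifier from running forever.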

graph TD
    %% Styling
    classDef gemini fill:#e8f0fe,stroke:#1a73e8,stroke-width:2px;
    classDef agent fill:#fce8e6,stroke:#d93025,stroke-width:2px;
    classDef database fill:#e6f4ea,stroke:#1e8e3e,stroke-width:2px;
    classDef hitl fill:#fef7e0,stroke:#f9ab00,stroke-width:2px,stroke-dasharray: 5 5;

    %% Nodes
    User([👤 User Input])
    Gemini(⚡ Gemini 2.0 Flash\nVision + Audio + Logic):::gemini
    
    subgraph "5-Agent System (LangChain)"
        Parser(Agent 1: Parser):::agent
        Router(Agent 2: Router):::agent
        Solver(Agent 3: Solver):::agent
        Verifier(Agent 4: Verifier):::agent
        Explainer(Agent 5: Explainer):::agent
    end

    RAG[(📚 ChromaDB\nKnowledge Base)]:::database
    Memory[(💾 Memory\nJSON History)]:::database
    HITL{Requires\nReview?}:::hitl
    UserEdit[✍️ HITL Panel]:::hitl

    %% Flow
    User -->|Image / Audio / Text| Gemini
    Gemini --> Parser
    Parser --> HITL
    
    HITL -->|Ambiguous| UserEdit
    UserEdit --> Router
    HITL -->|Clear| Router
    
    Router -->|Classify Topic| Solver
    Solver <-->|Retrieve Formula| RAG
    Solver -->|Draft Solution| Verifier
    
    Verifier -->|❌ Reject| Solver
    Verifier -->|✅ Approve| Explainer
    
    Explainer -->|Final Output| User
    Explainer -.->|Save Pattern| Memory
    Memory -.->|Recall Similar| Solver
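
The dotted Memory edges in the diagram (Save Pattern, Recall Similar) could look roughly like the sketch below. It assumes the history is a single JSON file holding a list of problem/solution records and uses `difflib` string similarity from the standard library to decide what counts as "similar". The function names, the record layout, and the 0.8 cutoff are all hypothetical, and the project may match problems differently.

```python
import json
from difflib import SequenceMatcher
from pathlib import Path
from typing import Optional


def save_pattern(path: Path, problem: str, solution: str) -> None:
    """Append one solved problem to the JSON history file."""
    history = json.loads(path.read_text()) if path.exists() else []
    history.append({"problem": problem, "solution": solution})
    path.write_text(json.dumps(history, indent=2))


def recall_similar(path: Path, problem: str, min_ratio: float = 0.8) -> Optional[dict]:
    """Return the closest stored record, or None if nothing is similar enough."""
    if not path.exists():
        return None
    best, best_ratio = None, 0.0
    for entry in json.loads(path.read_text()):
        ratio = SequenceMatcher(None, problem.lower(), entry["problem"].lower()).ratio()
        if ratio > best_ratio:
            best, best_ratio = entry, ratio
    return best if best_ratio >= min_ratio else None
```

A Solver could call `recall_similar` first and fall back to a full solve only when it returns `None`.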

📊 Evaluation & Observations

I tested the system on 20 JEE-level problems (handwritten and typed). Here is the honest breakdown:

Handwriting Recognition: Switching to Gemini Vision was a win. It correctly read 19/20 handwritten integrals, whereas my initial tests with Tesseract failed on complex notation.
Reasoning Capability: The system solved 18/20 problems correctly. The 2 failures were in complex 3D geometry where the RAG retrieval didn't find the exact theorem needed.
Latency: The average response time is ~3.2 seconds, which feels snappy for a real-time app.
Memory Reuse: When I asked a similar question twice, the second response was generated ~40% faster because it retrieved the reasoning path from memory.
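
For anyone who wants to reproduce numbers like these, a small harness along the following lines would do. It is a sketch, not the script used for the results above: `evaluate` is a hypothetical helper, and it grades by exact string match, which is stricter than how a human would mark a math answer. For reference, 18 correct out of 20 is an accuracy of 0.9.

```python
import time
from statistics import mean
from typing import Callable, Iterable, Tuple


def evaluate(solve: Callable[[str], str], cases: Iterable[Tuple[str, str]]) -> dict:
    """Run solve() on (problem, expected_answer) pairs; report accuracy and latency."""
    latencies = []
    correct = 0
    for problem, expected in cases:
        start = time.perf_counter()
        answer = solve(problem)
        latencies.append(time.perf_counter() - start)
        # Exact match after trimming whitespace; a simplification.
        correct += int(answer.strip() == expected.strip())
    total = len(latencies)
    return {
        "correct": correct,
        "total": total,
        "accuracy": correct / total if total else 0.0,
        "mean_latency_s": mean(latencies) if latencies else 0.0,
    }
```

Running the same case list twice, with the memory store empty and then populated, is one way to measure a speed-up like the one in the Memory Reuse line.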

🛠️ Tech Stack

Model: Google Gemini 2.0 Flash (Chosen for its native multimodal reasoning)
Orchestration: LangChain
Vector Store: ChromaDB (Local persistence)
Frontend: Streamlit

⚙️ Setup & Run

1. Clone the repo

git clone https://github.com/[your-username]/math-mentor.git
cd math-mentor

2. Install dependencies

pip install -r requirements.txt

3. Set up your API Key

Create a .env file and add your Google key. It is the only key needed, since Gemini handles vision, audio, and reasoning.

GOOGLE_API_KEY=your_key_here
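
A fail-fast check at startup can catch a missing or placeholder key before the first Gemini call. The sketch below assumes the .env file has already been loaded into the process environment (for example by a dotenv loader). Whether app.py does this is not stated here, and `require_google_key` is a hypothetical helper.

```python
import os
from typing import Mapping


def require_google_key(env: Mapping[str, str] = os.environ) -> str:
    """Return the Google API key, or raise if it is missing or still the placeholder."""
    key = env.get("GOOGLE_API_KEY", "").strip()
    if not key or key == "your_key_here":
        raise RuntimeError("GOOGLE_API_KEY is missing; add it to .env before launching")
    return key
```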

4. Build the Knowledge Base

Run this script to ingest the math formulas into ChromaDB:

python rag_engine.py
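
To show the idea behind the lookup without needing ChromaDB installed, here is a standard-library stand-in. It swaps the vector store for bag-of-words cosine similarity, so it is a deliberately simpler technique than embedding search. The three formulas are illustrative entries, not the contents of the project's knowledge base, and `FORMULAS` and `retrieve` are hypothetical names.

```python
import math
import re
from collections import Counter
from typing import List, Tuple

# Illustrative knowledge-base entries (assumed, not taken from the repository).
FORMULAS = {
    "power_rule": "power rule for integration: integral of x^n dx = x^(n+1)/(n+1) + C",
    "quadratic_formula": "quadratic formula: roots of ax^2 + bx + c = 0 are x = (-b ± sqrt(b^2 - 4ac)) / 2a",
    "bayes_theorem": "Bayes theorem for conditional probability: P(A|B) = P(B|A) P(A) / P(B)",
}


def _vector(text: str) -> Counter:
    # Keep words of two or more letters so single-letter variables do not dominate.
    return Counter(re.findall(r"[a-z]{2,}", text.lower()))


def _cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, k: int = 1) -> List[Tuple[str, float]]:
    """Return the k formula names most similar to the query, best first."""
    q = _vector(query)
    scored = sorted(((_cosine(q, _vector(doc)), name) for name, doc in FORMULAS.items()), reverse=True)
    return [(name, score) for score, name in scored[:k]]
```

The Solver would then paste the retrieved formula text into its prompt, which is what keeps it grounded in known theorems rather than invented ones.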

5. Launch the App

streamlit run app.py

🎥 Demo Video

[Link to Demo Video] - Shows the HITL flow and Audio transcription in action.
