Sumitkk10/SME-Agent-for-Climate-and-Weather

Climate SME Agent

A K-12 focused Subject Matter Expert (SME) agent for Climate and Weather education, built with advanced RAG, hierarchical chunking, and safety guardrails.

This project implements a Retrieval-Augmented Generation (RAG) system that acts as a tutor and curriculum designer. It goes beyond simple "chat with PDF" by using an "Index Small, Retrieve Big" strategy: small chunks are indexed so that search stays precise, and the larger enclosing chunk is passed to the LLM so that it has rich context.

Key Features

  • Hierarchical Retrieval ("Index Small, Retrieve Big"):
    • Documents are split into Parent (2048 tokens), Child (512 tokens), and Grandchild (128 tokens) chunks.
    • Only Grandchild chunks are indexed for high-precision vector search.
    • When a match is found, the system automatically retrieves the Parent chunk to provide full context to the LLM.
  • Advanced RAG Pipeline:
    • Hybrid Search: Combines dense vector search (FAISS + BAAI/bge-large-en-v1.5) with keyword search.
    • Reranking: Uses a cross-encoder (BAAI/bge-reranker-large) to re-score top results for maximum relevance.
  • K-12 Education Focused:
    • Difficulty Adaptation: Automatically adjusts responses for K6-8, K9-10, or K11-12 levels.
    • Curriculum Generation: Uses an LLM agent to design structured lesson plans and export them as PDF, DOCX, or PPTX.
  • Safety Guardrails:
    • Input sanitization to prevent prompt injections.
    • Output moderation to ensure content is safe for students.
  • Model Agnostic:
    • Supports Local LLMs (Ollama) for privacy and zero cost.
    • Supports Cloud LLMs (OpenAI, Anthropic, Gemini) for higher reasoning capabilities.
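
The "Index Small, Retrieve Big" idea from the first feature can be sketched in a few lines. This is a minimal illustration, not the code in src/chunking.py: whitespace-separated words stand in for tokens, and the names Chunk, build_hierarchy, and root_parent are hypothetical. Only the chunk sizes (2048, 512, 128) come from the description above.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Chunk:
    chunk_id: str
    text: str
    level: str                 # "parent", "child", or "grandchild"
    parent_id: Optional[str]   # id of the enclosing chunk one level up


def split_tokens(tokens: List[str], size: int) -> List[List[str]]:
    """Split a token list into consecutive windows of at most `size` tokens."""
    return [tokens[i:i + size] for i in range(0, len(tokens), size)]


def build_hierarchy(text: str, sizes=(2048, 512, 128)) -> Dict[str, Chunk]:
    """Build parent -> child -> grandchild chunks.

    Assumption: whitespace words approximate tokens; a real pipeline
    would use the embedding model's tokenizer.
    """
    store: Dict[str, Chunk] = {}
    levels = ("parent", "child", "grandchild")

    def recurse(tokens: List[str], depth: int, parent_id: Optional[str], prefix: str) -> None:
        for i, window in enumerate(split_tokens(tokens, sizes[depth])):
            # Ids look like "p0", "p0.c1", "p0.c1.g2".
            cid = f"{prefix}{levels[depth][0]}{i}"
            store[cid] = Chunk(cid, " ".join(window), levels[depth], parent_id)
            if depth + 1 < len(sizes):
                recurse(window, depth + 1, cid, cid + ".")

    recurse(text.split(), 0, None, "")
    return store


def root_parent(store: Dict[str, Chunk], chunk_id: str) -> Chunk:
    """Walk up from a matched grandchild to its top-level parent chunk."""
    chunk = store[chunk_id]
    while chunk.parent_id is not None:
        chunk = store[chunk.parent_id]
    return chunk
```

In this sketch only the chunks whose level is "grandchild" would be embedded and added to the vector index. When a search returns a grandchild id, root_parent(store, hit_id).text is what would be handed to the LLM.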

Architecture Highlights

The system is designed around a decoupled indexing/retrieval strategy:

  1. Ingestion: src/preprocessing.py loads PDF, DOCX, PPTX, and TXT files.
  2. Chunking: src/chunking.py applies recursive character splitting and propagates metadata to every chunk.
  3. Indexing: src/indexing.py builds a FAISS index of grandchild chunks using BAAI/bge-large-en-v1.5.
  4. Routing: A hybrid intent classifier (Regex + LLM) routes queries to the appropriate handler (Chat, RAG, Curriculum, Tools).
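
The routing step can be pictured as cheap regex rules tried first, with an LLM consulted only when no rule matches. The sketch below is an assumption about how such a hybrid classifier might look, not the logic in src/agent.py: the patterns, the intent labels, and the llm_classify callback are all hypothetical.

```python
import re
from typing import Callable, Optional

# Hypothetical regex rules, checked in order. The real patterns are not
# documented in this README.
RULES = [
    ("curriculum", re.compile(r"\b(lesson plan|curriculum|syllabus)\b", re.I)),
    ("tools", re.compile(r"\b(calculate|convert)\b|\d\s*[+*/]\s*\d", re.I)),
    ("chat", re.compile(r"^\s*(hi|hello|hey|thanks|thank you)\b", re.I)),
]
INTENTS = {"chat", "rag", "curriculum", "tools"}


def classify_intent(query: str,
                    llm_classify: Optional[Callable[[str], str]] = None) -> str:
    """Return one of INTENTS for a user query.

    Regex rules run first because they are fast and deterministic. If
    none match, an optional LLM callback (assumed to return an intent
    label as text) is asked. Anything unrecognized falls back to "rag".
    """
    for intent, pattern in RULES:
        if pattern.search(query):
            return intent
    if llm_classify is not None:
        label = llm_classify(query).strip().lower()
        if label in INTENTS:
            return label
    return "rag"
```

Defaulting to "rag" means an ambiguous question is answered from the knowledge base rather than from the model's unsupported recall, which seems the safer choice for a tutoring tool.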

Installation

Prerequisites

  • Python 3.8+
  • Ollama (optional, for local LLMs)

Setup

  1. Clone the repository and install dependencies:

    pip install -r requirements.txt
  2. (Optional) If using a local LLM, pull the model:

    ollama pull llama3.2

Usage

1. Build the Knowledge Base

Process the documents in data/ and build the vector index:

python build_index.py

This handles hierarchical chunking, embedding generation, and FAISS indexing.

2. Run the Agent (Interactive Mode)

Start the chat interface:

python climate_sme_agent.py --interactive

Commands inside chat:

  • /difficulty [basic|intermediate|advanced] - Set the target audience level.
  • /tools - List available tools (Calculator, Source Lookup, etc.).
  • /quit - Exit.

3. Usage Examples

Single Query:

python climate_sme_agent.py --query "Explain the greenhouse effect" --difficulty basic

Curriculum Generation: Generate a 6-week lesson plan on hurricanes and severe storms and export it to PowerPoint:

python climate_sme_agent.py \
  --curriculum-topics "hurricanes,severe storms" \
  --curriculum-grade K6-8 \
  --curriculum-format pptx \
  --curriculum-weeks 6 \
  --curriculum-output my_lesson_plan
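
For readers extending the CLI, the curriculum flags above could be parsed along these lines. This is a sketch under assumptions: the flag names are taken from the command above, but the defaults, the allowed choices, and the parse_topics helper are hypothetical and may differ from what climate_sme_agent.py actually does.

```python
import argparse
from typing import List


def parse_topics(value: str) -> List[str]:
    """Turn "hurricanes,severe storms" into ["hurricanes", "severe storms"]."""
    topics = [t.strip() for t in value.split(",") if t.strip()]
    if not topics:
        raise argparse.ArgumentTypeError("expected at least one topic")
    return topics


def build_parser() -> argparse.ArgumentParser:
    # Defaults and choices below are assumptions for illustration only.
    parser = argparse.ArgumentParser(prog="climate_sme_agent.py")
    parser.add_argument("--curriculum-topics", type=parse_topics)
    parser.add_argument("--curriculum-grade",
                        choices=["K6-8", "K9-10", "K11-12"], default="K6-8")
    parser.add_argument("--curriculum-format",
                        choices=["pdf", "docx", "pptx"], default="pdf")
    parser.add_argument("--curriculum-weeks", type=int, default=4)
    parser.add_argument("--curriculum-output", default="curriculum")
    return parser
```

argparse converts the dashes in each flag to underscores, so the values are read as args.curriculum_topics, args.curriculum_weeks, and so on.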

Demo Mode: Run a pre-scripted demonstration of capabilities:

python climate_sme_agent.py --demo

Configuration

Settings are managed in src/config.py.

LLM Providers

You can switch providers via command-line arguments or environment variables. Cloud providers read their API key from the environment, as shown below.

Local (Ollama):

python climate_sme_agent.py --llm-provider ollama --llm-model llama3.2

OpenAI (GPT-4, etc.):

export OPENAI_API_KEY="sk-..."
python climate_sme_agent.py --llm-provider openai --llm-model gpt-4

Anthropic (Claude):

export ANTHROPIC_API_KEY="sk-..."
python climate_sme_agent.py --llm-provider anthropic --llm-model claude-3-opus
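
A wrapper that checks for the API key before any request is sent gives a clearer error than a failed API call. The sketch below assumes the environment variable names from the examples above. The resolve_api_key function and the API_KEY_ENV table are hypothetical and not taken from src/llm.py. Gemini is left out because this README does not name its variable.

```python
import os
from typing import Mapping, Optional

# Provider name -> environment variable that holds its key.
# None means no key is needed (Ollama runs locally).
API_KEY_ENV = {
    "ollama": None,
    "openai": "OPENAI_API_KEY",
    "anthropic": "ANTHROPIC_API_KEY",
}


def resolve_api_key(provider: str,
                    env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return the API key for `provider`, or None for local providers.

    Raises ValueError for an unknown provider and RuntimeError when the
    expected environment variable is missing or empty.
    """
    if provider not in API_KEY_ENV:
        raise ValueError(f"unknown provider: {provider!r}")
    var = API_KEY_ENV[provider]
    if var is None:
        return None
    key = env.get(var)
    if not key:
        raise RuntimeError(
            f"{var} is not set; export it before using --llm-provider {provider}"
        )
    return key
```

Passing env explicitly, for example resolve_api_key("openai", {"OPENAI_API_KEY": "sk-..."}), keeps the function easy to test without touching the real environment.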

Project Structure

.
├── src/
│   ├── preprocessing.py      # content ingestion
│   ├── chunking.py           # hierarchical splitter
│   ├── indexing.py           # FAISS + Embedding logic
│   ├── sme.py                # Difficulty adaptation & response logic
│   ├── agent.py              # Orchestrator & Router
│   ├── llm.py                # LLM Provider wrappers
│   └── tools.py              # Calculator, Search tools
├── data/                     # Source documents
├── build_index.py            # Script: Docs -> Vector Store
├── query_index.py            # Script: Test retrieval only
└── climate_sme_agent.py      # Main Entry Point

Data Sources

The default knowledge base includes open educational resources:

  • Open WA Weather and Climate Book
  • NIOS Weather and Climate Chapters
  • KS3 Weather and Climate Lessons
