Skip to content

HarshitWaldia/WebTraffic-Analysis-Agent

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

14 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

📊 WEBTraffic Analysis Agent

Python Groq LangChain LangGraph Gradio dotenv Agentic AI Status


Agentic GA4 Traffic Analysis with Exact-Format LLM Summaries

An Agentic AI system that automatically analyzes Google Analytics (GA4) traffic data from Excel, computes YOY and Month-over-Month metrics, and generates strictly formatted, non-hallucinated business summaries using Groq LLM.

This project demonstrates safe, production-ready LLM integration by clearly separating deterministic computation from generative reasoning.


🚀 Key Features

  • ✅ Multi-table GA4 Excel parsing
  • ✅ Automatic YOY % and LM % calculations
  • ✅ State-driven agent workflow (INIT → COMPLETE)
  • ✅ LLM-generated summaries in exact predefined format
  • ✅ Anti-hallucination design (LLM reasoning only)
  • ✅ Professional, styled Excel output
  • ✅ Fallback logic if LLM is unavailable

🧠 Why This Project?

GA4 (Google Analytics 4) traffic reports often:

  • Contain multiple tables in a single Excel sheet
  • Require manual calculations for YOY and MoM
  • Depend on human-written summaries, leading to inconsistency

LLMs alone cannot be trusted with numerical accuracy.

👉 This project solves that by:

  • Using Python for all calculations
  • Using LLM only for explanation
  • Enforcing strict output formats
  • Producing business-ready reports

🏗️ System Architecture

High-Level Flow

Excel Input (GA4 Data)
        │
        ▼
Traffic Analysis Agent (State Machine)
        │
 ┌──────┴────────┐
 │               │
 ▼               ▼
Table Parser   Metrics Engine
 │               │
 └──────┬────────┘
        ▼
LLM Summary Agent (Groq – LLaMA 3.3)
        │
        ▼
Styled Excel Report (Output)

Architectural Principles

  • Agentic orchestration via state machine
  • Deterministic math, generative reasoning
  • Fail-safe design with fallback summaries

🧩 Agent Workflow (State Machine)

State Description
INIT Initialize agent and config
LOAD_DATA Load Excel GA4 data
PARSE_TABLES Detect and parse multiple tables
CALCULATE_METRICS Compute YOY %, LM %, totals
GENERATE_SUMMARIES LLM-based reasoning
CREATE_OUTPUT Styled Excel report
COMPLETE Successful execution

📐 Metrics Calculation Logic

Year-over-Year (YOY %)

(Current Year − Previous Year) / Previous Year × 100

Last Month (LM %)

(Current Month − Previous Month) / Previous Month × 100

Special Handling

  • Totals calculated only till August
  • YOY applied to Total row
  • % Change row auto-updated
  • All numeric values validated before use

🤖 LLM Integration (Groq)

  • Model: llama-3.3-70b-versatile
  • Purpose: Narrative reasoning only
  • Temperature: 0.2
  • No calculations performed by LLM

Summary Format (Strict)

  • Reasoning (Aug vs July – 1 sentence)
  • Year-over-Year comparison
  • Trend analysis
  • Important notes

Anti-Hallucination Controls

  • Explicit data injection into prompt
  • Strict formatting rules
  • Low temperature
  • Python-computed values only

📄 Output Report

The final Excel report includes:

  • 📌 Calculated GA4 tables
  • 📌 LLM summaries placed above each table
  • 📌 Color-coded trends
  • 📌 Business-ready formatting

Ideal for:

  • Stakeholder reviews
  • Management presentations
  • Monthly analytics reporting

🛠️ Tech Stack

  • Python
  • Pandas / NumPy
  • OpenPyXL
  • Groq API
  • LLaMA 3.3 (70B)
  • Excel (GA4 Data)

⚙️ How It Works (Step-by-Step)

1️⃣ Input Ingestion

  • The agent loads a GA4 Excel file containing multiple traffic tables
  • No assumptions about table count or placement

2️⃣ Agentic Orchestration (State Machine)

  • The system progresses through explicit states
  • Each state has one responsibility
  • Failures are isolated and debuggable

This avoids tightly coupled scripts and makes the system predictable.


3️⃣ Intelligent Table Parsing

  • Tables are detected dynamically using "Month" headers
  • Supports inconsistent Excel formatting
  • Duplicate columns handled safely

4️⃣ Deterministic Metrics Engine (Python)

All calculations are done before invoking the LLM:

  • YOY %
  • Last Month %
  • Totals till August
  • % Change rows

✔ Reproducible ✔ Accurate ✔ Excel-equivalent


5️⃣ Controlled LLM Reasoning Layer

  • LLM receives pre-computed data only
  • Generates business summaries
  • Enforced exact format

❌ No math ❌ No data invention


6️⃣ Professional Output Generation

  • Summaries placed above each table
  • Styled Excel formatting
  • Stakeholder-ready output

flowchart TD
    Start([Start]) --> A[📊 GA4 Excel Input<br/>Multiple Traffic Tables]
    
    A --> B{🤖 Traffic Analysis Agent<br/>State Machine}
    
    B -->|State 1| C[📥 LOAD_DATA]
    C -->|pandas.read_excel| C1[Validate Structure]
    C1 --> C2[Load All Sheets]
    
    C2 -->|State 2| D[🔍 PARSE_TABLES]
    D --> D1[Detect Month Headers]
    D1 --> D2[Extract Table Ranges]
    D2 --> D3[Separate Multiple Tables]
    
    D3 -->|State 3| E[📐 CALCULATE_METRICS]
    E --> E1[🔢 YOY % Calculation<br/>Current vs Last Year]
    E --> E2[📊 LM % Calculation<br/>Current vs Last Month]
    E --> E3[➕ Total Rows<br/>Sum + % Change]
    
    E1 & E2 & E3 -->|State 4| F[🧠 GENERATE_SUMMARIES]
    
    F --> G[☁️ Groq API Call]
    G --> G1[LLaMA 3.3 70B Model]
    G1 --> G2[Prompt: Exact Format Instructions]
    
    G2 --> H[📝 Structured Summary]
    H --> H1[💡 Reasoning Section]
    H --> H2[📈 YOY Analysis]
    H --> H3[📉 Trend Observations]
    H --> H4[📌 Key Notes]
    
    H1 & H2 & H3 & H4 -->|State 5| I[📦 CREATE_OUTPUT]
    
    I --> I1[Build Excel Workbook]
    I1 --> I2[Apply Cell Styling]
    I2 --> I3[Format Headers]
    I3 --> I4[Add Summary Sections]
    
    I4 --> J[💾 Styled Excel Output<br/>Professional Report]
    
    J --> K([✅ COMPLETE])
    
    %% Styling
    classDef input fill:#E3F2FD,stroke:#1E88E5,stroke-width:3px
    classDef orchestrator fill:#E8F5E9,stroke:#2E7D32,stroke-width:3px
    classDef processing fill:#FFFDE7,stroke:#F9A825,stroke-width:2px
    classDef metrics fill:#F3E5F5,stroke:#6A1B9A,stroke-width:2px
    classDef llm fill:#FCE4EC,stroke:#C2185B,stroke-width:2px
    classDef output fill:#E0F2F1,stroke:#00695C,stroke-width:2px
    classDef terminal fill:#ECEFF1,stroke:#37474F,stroke-width:3px
    
    class Start,A input
    class B orchestrator
    class C,C1,C2,D,D1,D2,D3 processing
    class E,E1,E2,E3 metrics
    class F,G,G1,G2,H,H1,H2,H3,H4 llm
    class I,I1,I2,I3,I4,J output
    class K terminal
Loading

🧠 Why This Works (Design Rationale)

✅ Separation of Concerns

Responsibility Component
Math & Logic Python
Orchestration Agent State Machine
Reasoning LLM
Presentation Excel Builder

✅ Anti-Hallucination Architecture

LLMs explain results — they do not compute them.

  • Data is injected directly
  • Strict formatting rules
  • Low temperature
  • Fallback summaries

✅ Agentic, Not Scripted

  • Dynamic table detection
  • Context-aware summaries
  • Multi-stage decision flow
  • Failure-tolerant execution

✅ Production-Ready

  • Deterministic execution
  • Observable states
  • Scalable to more tables
  • Business-aligned output

⚙️ Setup & Usage

1️⃣ Install Dependencies

pip install pandas numpy openpyxl python-dotenv

2️⃣ Set Environment Variable

export GROQ_API_KEY=your_api_key_here

(Windows)

set GROQ_API_KEY=your_api_key_here

3️⃣ Run the Agent

python agentic_main.py

🔐 Fallback Behavior

If no Groq API key is provided:

  • The agent still runs
  • Summaries are generated using deterministic fallback logic
  • No execution failure

📌 What This Project Demonstrates

  • Real-world Agentic AI design
  • Safe and responsible LLM usage
  • Production-grade data automation
  • Explainable AI workflows
  • Clean separation of logic and reasoning

📧 Contact

Have questions or suggestions?


🌟 Star History

If you find this project helpful, please consider giving it a star! ⭐


👨‍💻 Author

Harshit Waldia

Ahaṁ Brahmāsmi | अहं ब्रह्मास्मि

The true self is not the body but an eternal, infinite part of the universe

About

An Agentic AI system that automatically analyzes Google Analytics (GA4) traffic data from Excel, computes YOY and Month-over-Month metrics, and generates strictly formatted, non-hallucinated business summaries using Groq LLM.

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages