
🛡️ TruthGuard AI — Explainable Misinformation Detection

Decoding the Linguistic DNA of Deception in the Information Ecosystem


Overview

TruthGuard AI is a production-ready, explainable misinformation detection system trained on 72,134 labeled news articles from the WELFake dataset. It goes beyond black-box accuracy by providing:

  • LIME-powered explanations — word-level justification for every prediction
  • Bias auditing — per-class performance parity analysis (EU AI Act aligned)
  • Interpretable features — 8 engineered linguistic signals capturing "deceptive DNA"
  • Soft Voting Ensemble — Logistic Regression + Random Forest + Linear SVM
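LIME fits a local linear model over many random word maskings; the core intuition can be sketched with a one-word occlusion pass, dropping each word and watching the fake-probability move. The `fake_probability` function below is a toy stand-in for the ensemble's `predict_proba`, not the project's actual model:

```python
# Simplified word-attribution sketch (occlusion): drop each word and
# measure how much the fake-probability changes. LIME proper samples
# many random maskings and fits a weighted linear model; this is the
# one-word special case, probed against a toy probability function.
def fake_probability(text: str) -> float:
    # Toy stand-in for ensemble.predict_proba, flagging a few trigger words.
    triggers = {"shocking": 0.3, "unbelievable": 0.25, "miracle": 0.2}
    score = 0.1 + sum(w for t, w in triggers.items() if t in text.lower())
    return min(score, 1.0)

def explain(text: str):
    words = text.split()
    base = fake_probability(text)
    contribs = []
    for i, word in enumerate(words):
        masked = " ".join(words[:i] + words[i + 1:])
        contribs.append((word, base - fake_probability(masked)))
    # Largest absolute contribution first.
    return sorted(contribs, key=lambda c: -abs(c[1]))

top = explain("Shocking miracle cure stuns doctors")
print(top[:2])  # highest-impact words first
```

In the real app, the classifier function handed to LIME's `LimeTextExplainer` would be the TF-IDF + ensemble pipeline rather than this toy scorer.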

✨ Key Features

| Feature | Description |
| --- | --- |
| 🔍 Real-time Detection | Paste any article and get an instant fake/real classification |
| 📊 Confidence Scores | Probabilistic output with per-class breakdown |
| 🚨 Trigger Word Flagging | Identifies specific suspicious lexical patterns |
| 📈 Linguistic Fingerprint | 8 human-interpretable features visualized per article |
| ⚖️ Bias Audit Dashboard | Per-class recall/precision parity analysis |
| 🏗️ Model Insights | Architecture explanation with feature weights |

🚀 Quick Start

1. Clone & Install

```bash
git clone https://github.com/your-username/truthguard-ai.git
cd truthguard-ai
pip install -r requirements.txt
```

2. Add Model Files

Place the trained model artifacts (generated from the notebook) in a models/ folder:

```
truthguard-ai/
├── app.py
├── models/
│   ├── truthguard_ensemble.pkl
│   ├── truthguard_tfidf.pkl
│   ├── feature_stats.pkl
│   └── metadata.pkl
├── requirements.txt
└── README.md
```

No models? The app runs in heuristic fallback mode using rule-based linguistic analysis — still functional for demonstration.
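A minimal sketch of this load-or-fallback pattern, assuming joblib-pickled artifacts in the layout above; the heuristic weights here are illustrative, not the app's actual rules:

```python
# Load the pickled ensemble when present; otherwise fall back to a
# rule-based heuristic. File names follow the models/ layout above.
from pathlib import Path

import joblib

MODEL_DIR = Path("models")

def heuristic_score(text: str):
    # Crude illustrative rules standing in for the trained model:
    # heavy capitalisation and exclamation marks push toward "fake".
    caps = sum(c.isupper() for c in text) / max(len(text), 1)
    bangs = text.count("!") / max(len(text.split()), 1)
    fake = min(0.95, 0.2 + 2.0 * caps + 1.5 * bangs)
    return [fake, 1.0 - fake]  # [P(fake), P(real)]

def load_pipeline():
    try:
        ensemble = joblib.load(MODEL_DIR / "truthguard_ensemble.pkl")
        tfidf = joblib.load(MODEL_DIR / "truthguard_tfidf.pkl")
        return lambda text: ensemble.predict_proba(tfidf.transform([text]))[0]
    except FileNotFoundError:
        return heuristic_score  # rule-based fallback mode

score = load_pipeline()("BREAKING!!! You WON'T believe this!")
```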

3. Run the App

```bash
streamlit run app.py
```

Open http://localhost:8501 in your browser.


📦 Requirements

```
streamlit>=1.35.0
scikit-learn>=1.3.0
numpy>=1.24.0
textblob>=0.17.1
joblib>=1.3.0
nltk>=3.8.0
```

Install everything:

```bash
pip install -r requirements.txt

# Download TextBlob corpora
python -m textblob.download_corpora
```

🔬 Methodology

Dataset

  • WELFake — 72,134 news articles (Kaggle)
  • Labels: 0 = Fake, 1 = Real
  • Training subset: 20,000 articles (80/20 train/test split)
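The 80/20 split can be reproduced with scikit-learn's `train_test_split`; placeholder texts stand in for the WELFake CSV here, and the stratification and `random_state` are assumptions:

```python
# 80/20 train/test split as described above, shown on placeholder data
# (the real pipeline reads the WELFake CSV from Kaggle).
from sklearn.model_selection import train_test_split

texts = [f"article {i}" for i in range(1000)]
labels = [i % 2 for i in range(1000)]  # 0 = Fake, 1 = Real

X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.20, stratify=labels, random_state=42
)
print(len(X_train), len(X_test))  # 800 200
```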

Feature Engineering (8 signals)

| Feature | Signal Type | Insight |
| --- | --- | --- |
| Sentiment Score | Emotional | Fake news skews to extremes (±0.3+) |
| Caps Ratio | Visual | Fake articles average 8%+ uppercase |
| Exclamation Density | Punctuation | Urgency fabrication via `!!!` |
| Avg Sentence Length | Structural | Fake news uses shorter, punchier sentences |
| Lexical Diversity | Vocabulary | Low diversity → repetitive rhetoric |
| Readability Score | Complexity | Fake content is deliberately simpler |
| Title Length | Structural | Sensational headlines tend to be longer |
| Text Length | Volume | Extremes (very short/long) are suspicious |
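Several of these signals reduce to a few lines of plain Python. A sketch (sentiment and readability are omitted, since they rely on TextBlob and a readability formula; the exact definitions in the notebook may differ):

```python
# Sketch of some engineered linguistic signals from the table above.
import re

def linguistic_features(title: str, text: str) -> dict:
    words = text.split()
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    return {
        # Share of uppercase characters in the body text.
        "caps_ratio": sum(c.isupper() for c in text) / max(len(text), 1),
        # Exclamation marks per word.
        "exclaim_density": text.count("!") / max(len(words), 1),
        # Mean words per sentence.
        "avg_sentence_len": len(words) / max(len(sentences), 1),
        # Unique-word ratio (type/token).
        "lexical_diversity": len({w.lower() for w in words}) / max(len(words), 1),
        "title_length": len(title),
        "text_length": len(text),
    }

feats = linguistic_features("Example Headline",
                            "Short. Very SHORT!!! Repeat repeat repeat.")
```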

Ensemble Architecture

```
Input Text
    │
    ├── TF-IDF Vectorizer (5000 features, 1-2 ngrams)
    │       │
    │       ├── Logistic Regression  (weight: 30%) ← interpretable
    │       ├── Random Forest        (weight: 40%) ← non-linear
    │       └── Linear SVM           (weight: 30%) ← high-dim text
    │
    └── Soft Voting → Final Probability → Verdict
```
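The diagram maps almost directly onto scikit-learn's `VotingClassifier`. One caveat: `LinearSVC` exposes no `predict_proba`, so soft voting needs a probability-capable substitute; `SVC(kernel="linear", probability=True)` is used in this sketch, and the tiny toy corpus exists only to make the snippet runnable:

```python
# Soft-voting ensemble over TF-IDF features, mirroring the diagram above.
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

ensemble = make_pipeline(
    TfidfVectorizer(max_features=5000, ngram_range=(1, 2)),
    VotingClassifier(
        estimators=[
            ("lr", LogisticRegression(max_iter=1000)),
            ("rf", RandomForestClassifier(n_estimators=100)),
            # Linear SVM with Platt scaling so it can vote with probabilities.
            ("svm", SVC(kernel="linear", probability=True)),
        ],
        voting="soft",
        weights=[0.30, 0.40, 0.30],
    ),
)

# Toy corpus just to exercise the pipeline end to end.
texts = ["real news report", "shocking miracle cure!!!"] * 10
labels = [1, 0] * 10
ensemble.fit(texts, labels)
proba = ensemble.predict_proba(["shocking miracle!!!"])[0]  # [P(fake), P(real)]
```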

Model Performance

| Metric | Score |
| --- | --- |
| Accuracy | 93.4% |
| Precision | 94.1% |
| Recall | 92.8% |
| F1-Score | 93.4% |
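These metrics come straight from scikit-learn, shown here on placeholder predictions (the table's numbers are from the notebook's held-out test set):

```python
# Computing the four reported metrics with scikit-learn on toy labels.
from sklearn.metrics import (accuracy_score, f1_score,
                             precision_score, recall_score)

y_true = [1, 0, 1, 1, 0, 1, 0, 0]  # placeholder ground truth
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]  # placeholder model output

metrics = {
    "accuracy": accuracy_score(y_true, y_pred),
    "precision": precision_score(y_true, y_pred),
    "recall": recall_score(y_true, y_pred),
    "f1": f1_score(y_true, y_pred),
}
```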

🖥️ Dashboard Pages

🔍 Analyze Article

Paste article text → get verdict, confidence bars, linguistic fingerprint, and flagged trigger words.

📊 Model Insights

  • Performance metrics dashboard
  • Ensemble architecture breakdown
  • Feature importance visualization
  • Bias audit (per-class recall/precision parity)

🔬 About & Methods

  • Research questions answered
  • Dataset description
  • Full methodology pipeline (Phases 0–8)

📁 Project Structure

```
truthguard-ai/
├── app.py                     # Main Streamlit dashboard
├── models/                    # Saved model artifacts (from notebook)
│   ├── truthguard_ensemble.pkl
│   ├── truthguard_tfidf.pkl
│   ├── feature_stats.pkl
│   └── metadata.pkl
├── truthguard_notebook.ipynb  # Full training notebook
├── requirements.txt
└── README.md
```

⚖️ Ethical Considerations

  • Bias Audit: Per-class recall parity gap < 1.5 percentage points (the project's internal fairness threshold, chosen for EU AI Act alignment)
  • Explainability: LIME word-level explanations available for every prediction
  • Transparency: All feature engineering is interpretable and documented
  • Limitations: Model trained on English-language articles only; performance may degrade on highly domain-specific content
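The recall-parity check in the bias audit reduces to per-class recall and an absolute gap, sketched here on placeholder predictions:

```python
# Per-class recall parity check, shown on toy labels.
from sklearn.metrics import recall_score

y_true = [0, 0, 0, 0, 1, 1, 1, 1]  # placeholder: 0 = Fake, 1 = Real
y_pred = [0, 0, 0, 1, 1, 1, 1, 0]  # placeholder model output

# average=None returns recall per class, ordered by sorted label value.
recall_fake, recall_real = recall_score(y_true, y_pred, average=None)
gap = abs(recall_fake - recall_real)
passes = gap < 0.015  # 1.5-percentage-point parity threshold
```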

🛠️ Built With

Python · Streamlit · scikit-learn · NLTK · TextBlob · joblib

👩‍💻 Author

Srishti Rajput
TruthGuard AI — Explainable Misinformation Detection
Defending Truth in the Digital Age


📄 License

This project is licensed under the MIT License.