
IamIremIdil/Gitboard-live


🚀 GitHub Archive Live Dashboard

A real-time data pipeline and dashboard that visualizes live GitHub activity from the GitHub Archive dataset.


🎯 Features

  • Real-time Data Collection: Continuously fetches GitHub Archive data
  • Live Dashboard: Auto-refreshing Streamlit dashboard
  • Redis Backend: Fast in-memory data storage
  • Multiple Visualizations:
    • Programming language popularity
    • Trending repositories
    • Event type breakdown
    • Top active users
    • Live event stream

🏗️ Architecture

┌─────────────────┐
│ GitHub Archive  │
│   (gharchive)   │
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│   Collector     │ ◄── Fetches hourly data
│  (collector.py) │     Processes events
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│     Redis       │ ◄── Stores aggregated stats
│   (in-memory)   │     Sorted sets & hashes
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│   Dashboard     │ ◄── Reads from Redis
│   (main.py)     │     Real-time updates
└─────────────────┘
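
To make the "fetches hourly data" step concrete, here is a minimal sketch of how the hourly file URLs could be derived. GH Archive publishes one gzip file per UTC hour, named with an unpadded hour (for example 2015-01-01-15.json.gz). The function name archive_urls and its parameters are hypothetical and not taken from collector.py.

```python
from datetime import datetime, timedelta, timezone
from typing import List

BASE_URL = "https://data.gharchive.org"


def archive_urls(now: datetime, lookback_hours: int) -> List[str]:
    """Return URLs for the last `lookback_hours` completed UTC hours, oldest first."""
    # Truncate to the start of the current hour; that hour is still incomplete.
    current_hour = now.astimezone(timezone.utc).replace(
        minute=0, second=0, microsecond=0
    )
    urls = []
    for offset in range(lookback_hours, 0, -1):
        hour = current_hour - timedelta(hours=offset)
        # File names use an unpadded hour, e.g. 2024-01-01-0.json.gz
        urls.append(f"{BASE_URL}/{hour:%Y-%m-%d}-{hour.hour}.json.gz")
    return urls


if __name__ == "__main__":
    for url in archive_urls(datetime.now(timezone.utc), lookback_hours=2):
        print(url)
```

The file for the most recently completed hour may not be published yet when the collector asks for it, so a real collector would probably need to tolerate a missing file and try again later.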

🚀 Quick Start

Choose the method that works best for you:

🐳 Method 1: Docker (Recommended - Easiest)

Prerequisites: Docker & Docker Compose (Install Docker)

# 1. Clone the repository
git clone <your-repo-url>
cd github-live-dashboard

# 2. Setup environment
cp .env.example .env

# 3. Start everything
docker compose up --build

# 4. Open browser
# http://localhost:8501

That's it! The collector will automatically start fetching GitHub data.

Useful commands:

# Run in background
docker compose up -d

# View logs
docker compose logs -f

# Stop services
docker compose down

🐍 Method 2: Standalone (Without Docker)

Prerequisites: Python 3.8+, pip, Redis (Install Redis)

Quick Setup (Linux/Mac)

# 1. Clone the repository
git clone <your-repo-url>
cd github-live-dashboard

# 2. Run automated setup
chmod +x setup-standalone.sh
./setup-standalone.sh

# 3. Start services
./run-all.sh

# 4. Open browser
# http://localhost:8501

Manual Setup (All Platforms)

# 1. Install Redis
# Ubuntu/Debian: sudo apt install redis-server
# Mac: brew install redis
# Windows: See WINDOWS_SETUP.md

# 2. Start Redis
sudo systemctl start redis-server  # Linux
brew services start redis           # Mac

# 3. Setup Python
python3 -m venv .venv
source .venv/bin/activate           # Linux/Mac
pip install -r requirements.txt

# 4. Configure environment
cp .env.example .env
# Edit .env: Change REDIS_HOST=redis to REDIS_HOST=localhost

# 5. Run services (in separate terminals)
python src/collector.py      # Terminal 1
streamlit run src/main.py    # Terminal 2

💡 Which Method Should I Use?

| Choose Docker if... | Choose Standalone if... |
| --- | --- |
| ✅ You want the easiest setup | ✅ You can't install Docker |
| ✅ You're okay installing Docker | ✅ You already have Python/Redis |
| ✅ You want an isolated environment | ✅ You want more control |
| ✅ You need it to work on any OS | ✅ You want lower resource usage |

🔧 VS Code Dev Containers (Optional)

For development with VS Code:

  1. Install the "Dev Containers" extension
  2. Open the project in VS Code
  3. Press F1 → "Dev Containers: Reopen in Container"
  4. The dashboard auto-starts at http://localhost:8501

📁 Project Structure

github-live-dashboard/
├── .devcontainer/
│   └── devcontainer.json          # VS Code dev container config
├── src/
│   ├── collector.py               # Data collection service
│   └── main.py                    # Streamlit dashboard
├── docker-compose.yml             # Service orchestration
├── Dockerfile                     # Application container
├── requirements.txt               # Python dependencies
├── .env                           # Environment variables
└── README.md                      # This file

🔧 Configuration

Environment Variables

Edit .env to customize:

# How many hours of historical data to fetch initially
LOOKBACK_HOURS=2

# Redis connection
REDIS_HOST=redis
REDIS_PORT=6379

# Streamlit configuration
STREAMLIT_SERVER_PORT=8501
STREAMLIT_SERVER_ADDRESS=0.0.0.0
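
As an illustration of how a Python service might consume these variables, the sketch below reads them from the process environment and falls back to the example values shown above. The Settings class and load_settings function are hypothetical names, and the sketch assumes the variables are already exported into the environment (Python does not read a .env file on its own).

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    lookback_hours: int
    redis_host: str
    redis_port: int


def load_settings(env=None) -> Settings:
    """Build Settings from a mapping of environment variables."""
    env = os.environ if env is None else env
    return Settings(
        # Defaults mirror the example .env values; they are assumptions.
        lookback_hours=int(env.get("LOOKBACK_HOURS", "2")),
        redis_host=env.get("REDIS_HOST", "redis"),
        redis_port=int(env.get("REDIS_PORT", "6379")),
    )


if __name__ == "__main__":
    print(load_settings())
```

Passing a plain dict instead of os.environ makes the function easy to exercise in tests, for example load_settings({"REDIS_HOST": "localhost"}) for a standalone setup.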

Dashboard Settings

The dashboard includes controls for:

  • Auto Refresh: Enable/disable automatic updates
  • Refresh Interval: Set update frequency (5-60 seconds)
  • Manual Refresh: Force immediate data reload
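
The interaction between these three controls can be sketched as a small decision function. This is only an illustration of the described behaviour, not the code in main.py; the names clamp_interval and refresh_due are hypothetical, and the 5-60 second bounds are taken from the list above.

```python
def clamp_interval(seconds: float, low: float = 5, high: float = 60) -> float:
    """Keep the refresh interval inside the supported 5-60 second range."""
    return max(low, min(high, seconds))


def refresh_due(
    last_refresh: float,
    now: float,
    interval: float,
    auto_refresh: bool,
    manual: bool = False,
) -> bool:
    """Decide whether the dashboard should reload its data."""
    if manual:
        # A manual refresh always wins, even with auto refresh disabled.
        return True
    if not auto_refresh:
        return False
    return now - last_refresh >= clamp_interval(interval)
```

Timestamps are plain seconds (for example from time.monotonic()), which keeps the function independent of any UI framework.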

📊 Data Flow

  1. Collector (collector.py):

    • Fetches compressed JSON files from https://data.gharchive.org
    • Processes each event (Push, Watch, Fork, Issues, etc.)
    • Updates Redis sorted sets and counters
    • Maintains hourly statistics
  2. Redis Storage:

    • trending_repos: Sorted set of repository activity
    • language_popularity: Sorted set of language counts
    • event_types: Sorted set of event type counts
    • top_users: Sorted set of user activity
    • recent_events: List of last 100 events
    • hourly_stats:{hour}: Hash maps for hourly breakdowns
    • total_events: Counter for all processed events
  3. Dashboard (main.py):

    • Reads from Redis every N seconds
    • Generates interactive Plotly visualizations
    • Displays real-time metrics and charts
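
The collector and storage steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the actual collector.py: the function names are hypothetical, the hour format in the hourly_stats key is a guess, and language_popularity is omitted because the source of the language field is not described here. The argument r is expected to be a redis-py style client (zincrby, hincrby, lpush, ltrim and incr are standard redis-py commands).

```python
import gzip
import io
import json


def parse_events(raw_gz):
    """Yield one event dict per line of a gzip-compressed, newline-delimited JSON file."""
    with gzip.open(io.BytesIO(raw_gz), "rt", encoding="utf-8") as fh:
        for line in fh:
            if line.strip():
                yield json.loads(line)


def record_event(r, event, max_recent=100):
    """Update the Redis keys listed above for a single GitHub event."""
    repo = (event.get("repo") or {}).get("name")
    user = (event.get("actor") or {}).get("login")
    event_type = event.get("type", "Unknown")
    # Assumed hour bucket: first 13 characters of created_at, e.g. "2024-01-01T15".
    hour = event.get("created_at", "")[:13]

    if repo:
        r.zincrby("trending_repos", 1, repo)
    if user:
        r.zincrby("top_users", 1, user)
    r.zincrby("event_types", 1, event_type)
    r.hincrby(f"hourly_stats:{hour}", event_type, 1)

    # Push the newest event to the head, then trim the list to max_recent items.
    summary = json.dumps({"type": event_type, "repo": repo, "user": user})
    r.lpush("recent_events", summary)
    r.ltrim("recent_events", 0, max_recent - 1)
    r.incr("total_events")

# Usage with a real client (requires the redis package and a running server):
#   r = redis.Redis(host="localhost", port=6379, decode_responses=True)
#   for event in parse_events(downloaded_bytes):
#       record_event(r, event)
```

On the dashboard side, the top entries of a sorted set can be read back with a call such as r.zrevrange("trending_repos", 0, 9, withscores=True).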

🐛 Troubleshooting

Dashboard shows "Waiting for data"

  • Check if the collector is running: docker compose ps
  • View collector logs: docker compose logs ingestor
  • The collector fetches data from GitHub Archive, so the first results may take 1-2 minutes to appear

Redis connection failed

  • Ensure Redis is running: docker compose ps redis
  • Check Redis logs: docker compose logs redis
  • Verify that no other process is already using port 6379: lsof -i :6379
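
As a complement to the checks above, a short standard-library script can confirm whether anything is accepting TCP connections on the Redis port. This is a sketch; the function name redis_port_reachable is hypothetical, and the host and port defaults assume a standalone setup.

```python
import socket


def redis_port_reachable(host="localhost", port=6379, timeout=2.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and DNS failures.
        return False


if __name__ == "__main__":
    print("reachable" if redis_port_reachable() else "not reachable")
```

A successful connection only shows that some process is listening on the port; it does not prove that the process is Redis.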

Collector not processing events

  • GitHub Archive may be temporarily unavailable
  • Check network connectivity
  • Try reducing LOOKBACK_HOURS in .env

📈 Scaling for Production

For production deployments, consider:

  1. Persistent Redis: Use Redis with persistence (RDB/AOF)
  2. Database: Store historical data in PostgreSQL/TimescaleDB
  3. Caching: Add Redis caching layer for dashboard queries
  4. Load Balancing: Use multiple dashboard instances behind nginx
  5. Monitoring: Add Prometheus/Grafana for system metrics
  6. Error Handling: Implement retry logic and dead letter queues
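
As one possible starting point for the retry logic in item 6, here is a minimal sketch of retrying a call with exponential backoff. The helper name retry and its parameters are hypothetical; the sleep function is injectable so the behaviour can be tested without waiting.

```python
import time


def retry(func, attempts=4, base_delay=1.0, sleep=time.sleep, retry_on=(OSError,)):
    """Call func(), retrying on the given exceptions with exponential backoff."""
    for attempt in range(attempts):
        try:
            return func()
        except retry_on:
            if attempt == attempts - 1:
                # Out of attempts: let the last error propagate to the caller.
                raise
            # Wait 1x, 2x, 4x, ... the base delay between attempts.
            sleep(base_delay * 2 ** attempt)
```

OSError is used as the default because network failures from the standard library (including urllib.error.URLError) derive from it; a collector built on another HTTP client would pass that client's exception types through retry_on instead.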

🀝 Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

📝 License

This project is licensed under the MIT License - see the LICENSE file for details.

🙏 Acknowledgments

📧 Contact

Created by @IamIremIdil


⭐ If you found this project useful, please consider giving it a star! ⭐
