A real-time data pipeline and dashboard that visualizes live GitHub activity from the GitHub Archive dataset.
- Real-time Data Collection: Continuously fetches GitHub Archive data
- Live Dashboard: Auto-refreshing Streamlit dashboard
- Redis Backend: Fast in-memory data storage
- Multiple Visualizations:
  - Programming language popularity
  - Trending repositories
  - Event type breakdown
  - Top active users
  - Live event stream
┌─────────────────┐
│ GitHub Archive  │
│   (gharchive)   │
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│    Collector    │ ◄── Fetches hourly data
│ (collector.py)  │     Processes events
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│      Redis      │ ◄── Stores aggregated stats
│   (in-memory)   │     Sorted sets & hashes
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│    Dashboard    │ ◄── Reads from Redis
│    (main.py)    │     Real-time updates
└─────────────────┘
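The collector stage pulls one archive file per hour. The sketch below shows one way to build the list of files to fetch. It assumes the GH Archive naming scheme `YYYY-MM-DD-H.json.gz` (hour not zero-padded) and that the newest complete file is the hour before the current one. The function name `archive_urls` and its parameters are illustrative and not taken from `collector.py`.

```python
from datetime import datetime, timedelta, timezone

BASE_URL = "https://data.gharchive.org"


def archive_urls(now, lookback_hours):
    """Return hourly archive URLs, oldest first, for the hours before `now`.

    Assumption: the most recent complete file is the hour preceding `now`.
    """
    # Truncate to the top of the current hour in UTC.
    top = now.astimezone(timezone.utc).replace(minute=0, second=0, microsecond=0)
    urls = []
    for offset in range(lookback_hours, 0, -1):
        hour = top - timedelta(hours=offset)
        # GH Archive file names use an unpadded hour (0..23).
        urls.append(f"{BASE_URL}/{hour:%Y-%m-%d}-{hour.hour}.json.gz")
    return urls
```

Called at 01:30 UTC on 2024-01-01 with a lookback of 2, this returns the files for `2023-12-31-23` and `2024-01-01-0`.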
Choose the method that works best for you:
Prerequisites: Docker & Docker Compose (Install Docker)
# 1. Clone the repository
git clone <your-repo-url>
cd github-live-dashboard
# 2. Setup environment
cp .env.example .env
# 3. Start everything
docker compose up --build
# 4. Open browser
# http://localhost:8501

That's it! The collector will automatically start fetching GitHub data.
Useful commands:
# Run in background
docker compose up -d
# View logs
docker compose logs -f
# Stop services
docker compose down

Prerequisites: Python 3.8+, pip, Redis (Install Redis)
# 1. Clone the repository
git clone <your-repo-url>
cd github-live-dashboard
# 2. Run automated setup
chmod +x setup-standalone.sh
./setup-standalone.sh
# 3. Start services
./run-all.sh
# 4. Open browser
# http://localhost:8501

# 1. Install Redis
# Ubuntu/Debian: sudo apt install redis-server
# Mac: brew install redis
# Windows: See WINDOWS_SETUP.md
# 2. Start Redis
sudo systemctl start redis-server # Linux
brew services start redis # Mac
# 3. Setup Python
python3 -m venv .venv
source .venv/bin/activate # Linux/Mac
pip install -r requirements.txt
# 4. Configure environment
cp .env.example .env
# Edit .env: Change REDIS_HOST=redis to REDIS_HOST=localhost
# 5. Run services (in separate terminals)
python src/collector.py # Terminal 1
streamlit run src/main.py # Terminal 2

| Choose Docker if... | Choose Standalone if... |
|---|---|
| ✅ You want the easiest setup | ✅ You can't install Docker |
| ✅ You're okay installing Docker | ✅ You already have Python/Redis |
| ✅ You want an isolated environment | ✅ You want more control |
| ✅ You need it to work on any OS | ✅ You want lower resource usage |
For development with VS Code:
- Install "Dev Containers" extension
- Open project in VS Code
- Press F1 → "Dev Containers: Reopen in Container"
- Dashboard auto-starts at http://localhost:8501
github-live-dashboard/
├── .devcontainer/
│   └── devcontainer.json     # VS Code dev container config
├── src/
│   ├── collector.py          # Data collection service
│   └── main.py               # Streamlit dashboard
├── docker-compose.yml        # Service orchestration
├── Dockerfile                # Application container
├── requirements.txt          # Python dependencies
├── .env                      # Environment variables
└── README.md                 # This file
Edit .env to customize:
# How many hours of historical data to fetch initially
LOOKBACK_HOURS=2
# Redis connection
REDIS_HOST=redis
REDIS_PORT=6379
# Streamlit configuration
STREAMLIT_SERVER_PORT=8501
STREAMLIT_SERVER_ADDRESS=0.0.0.0

The dashboard includes controls for:
- Auto Refresh: Enable/disable automatic updates
- Refresh Interval: Set update frequency (5-60 seconds)
- Manual Refresh: Force immediate data reload
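The refresh controls above boil down to a small decision: reload when the user asks for it, or when auto refresh is on and the chosen interval has elapsed. The sketch below captures that logic in plain Python so it can be read without Streamlit. The function names are hypothetical and the actual wiring in `main.py` may differ; only the 5-60 second range comes from the list above.

```python
MIN_INTERVAL = 5   # seconds, lower bound of the refresh interval
MAX_INTERVAL = 60  # seconds, upper bound of the refresh interval


def clamp_interval(seconds):
    """Keep the requested interval inside the supported 5-60 second range."""
    return max(MIN_INTERVAL, min(MAX_INTERVAL, int(seconds)))


def should_reload(auto_refresh, manual_clicked, seconds_since_last, interval):
    """Decide whether the dashboard should re-read its data now."""
    if manual_clicked:
        # A manual refresh always wins, even with auto refresh disabled.
        return True
    return bool(auto_refresh and seconds_since_last >= clamp_interval(interval))
```

In a Streamlit app the inputs would typically come from widgets such as a checkbox, a slider, and a button.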
- Collector (`collector.py`):
  - Fetches compressed JSON files from https://data.gharchive.org
  - Processes each event (Push, Watch, Fork, Issues, etc.)
  - Updates Redis sorted sets and counters
  - Maintains hourly statistics
- Redis Storage:
  - `trending_repos`: Sorted set of repository activity
  - `language_popularity`: Sorted set of language counts
  - `event_types`: Sorted set of event type counts
  - `top_users`: Sorted set of user activity
  - `recent_events`: List of last 100 events
  - `hourly_stats:{hour}`: Hash maps for hourly breakdowns
  - `total_events`: Counter for all processed events
- Dashboard (`main.py`):
  - Reads from Redis every N seconds
  - Generates interactive Plotly visualizations
  - Displays real-time metrics and charts
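To make the flow above concrete, here is a minimal sketch of the collector's processing step. It assumes each archive file is gzip-compressed, newline-delimited JSON and that events carry `type`, `actor.login`, `repo.name`, and `created_at` fields. The `InMemoryStats` class is a hypothetical stand-in for the Redis keys so the example runs without a server. The comments name the Redis commands that would plausibly be used instead. `language_popularity` is left out because the text does not say how the language is derived, and the per-hour breakdown by event type is likewise an assumption.

```python
import gzip
import io
import json
from collections import Counter, defaultdict, deque


def iter_events(gz_bytes):
    """Yield one event dict per non-empty line of a gzip'd NDJSON payload."""
    with gzip.open(io.BytesIO(gz_bytes), mode="rt", encoding="utf-8") as lines:
        for line in lines:
            if line.strip():
                yield json.loads(line)


class InMemoryStats:
    """Hypothetical in-memory mirror of the Redis keys (not the project's code)."""

    def __init__(self, max_recent=100):
        self.trending_repos = Counter()            # Redis: ZINCRBY trending_repos
        self.event_types = Counter()               # Redis: ZINCRBY event_types
        self.top_users = Counter()                 # Redis: ZINCRBY top_users
        self.hourly_stats = defaultdict(Counter)   # Redis: HINCRBY hourly_stats:{hour}
        self.recent_events = deque(maxlen=max_recent)  # Redis: LPUSH + LTRIM
        self.total_events = 0                      # Redis: INCR total_events

    def record(self, event):
        repo = event.get("repo", {}).get("name", "unknown")
        user = event.get("actor", {}).get("login", "unknown")
        event_type = event.get("type", "unknown")
        # "2024-01-01T00:05:00Z" -> "2024-01-01T00" (assumed hour key format).
        hour = event.get("created_at", "")[:13]

        self.trending_repos[repo] += 1
        self.event_types[event_type] += 1
        self.top_users[user] += 1
        self.hourly_stats[hour][event_type] += 1
        # Newest first; the deque drops the oldest entry once it is full.
        self.recent_events.appendleft(
            {"type": event_type, "repo": repo, "user": user}
        )
        self.total_events += 1
```

Sorted sets fit the "top N" charts because a ranked query returns the highest counts directly, while a capped list keeps the live event stream bounded.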
- Check if collector is running: `docker compose ps`
- View collector logs: `docker compose logs ingestor`
- The collector fetches data from GitHub Archive, which may take 1-2 minutes initially
- Ensure Redis is running: `docker compose ps redis`
- Check Redis logs: `docker compose logs redis`
- Verify port 6379 is not in use: `lsof -i :6379`
- GitHub Archive may be temporarily unavailable
- Check network connectivity
- Try reducing `LOOKBACK_HOURS` in `.env`
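For the Redis checks above, it can help to test from Python whether anything is accepting connections on the Redis port, for example on systems without `lsof`. The helper below is a small sketch using only the standard library. The name `port_is_open` is made up for this example, and a `True` result only means that some process is listening, not that it is Redis.

```python
import socket


def port_is_open(host, port, timeout=1.0):
    """Return True if a TCP connection to (host, port) succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False
```

For the standalone setup, `port_is_open("localhost", 6379)` returning `False` suggests Redis is not running yet.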
For production deployments, consider:
- Persistent Redis: Use Redis with persistence (RDB/AOF)
- Database: Store historical data in PostgreSQL/TimescaleDB
- Caching: Add Redis caching layer for dashboard queries
- Load Balancing: Use multiple dashboard instances behind nginx
- Monitoring: Add Prometheus/Grafana for system metrics
- Error Handling: Implement retry logic and dead letter queues
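As an illustration of the retry logic mentioned in the last item, the sketch below retries a failing operation with exponential backoff. It is a generic pattern rather than code from this project. The name `retry`, its defaults, and the injectable `sleep` parameter are assumptions made for the example, and dead letter queues are out of scope here.

```python
import time


def retry(operation, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call `operation` until it succeeds or `attempts` is exhausted.

    Waits base_delay, 2*base_delay, 4*base_delay, ... between tries and
    re-raises the last error if every attempt fails.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return operation()
        except Exception:
            if attempt == attempts - 1:
                raise  # Out of attempts: surface the original error.
            sleep(base_delay * 2 ** attempt)
```

Wrapping the download of an hourly archive file in such a helper would let the collector ride out short outages of GitHub Archive. Passing `sleep` as a parameter keeps the function easy to test without real delays.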
Contributions are welcome! Please feel free to submit a Pull Request.
This project is licensed under the MIT License - see the LICENSE file for details.
- Data provided by GitHub Archive
- Built with Streamlit
- Charts powered by Plotly
Created by @IamIremIdil
⭐ If you found this project useful, please consider giving it a star! ⭐