🏎 F1 Analytics Hub — Historical Race Intelligence Platform

End-to-end data engineering & analytics project that extracts 7+ years of Formula 1 race data, models it in a dimensional data warehouse, and delivers interactive visualizations through a custom-built dashboard.

Project Overview

This project demonstrates a complete data engineering pipeline, from raw API extraction to a polished, interactive analytics dashboard. It processes historical Formula 1 data spanning the 2018–2025 seasons, covering race results, lap-by-lap timing data, pit stop strategies, and qualifying sessions.

Key highlights:

ETL Pipeline built in Python that extracts data from the FastF1 API, applies business transformations (team name normalization, time conversions, null handling), and loads it into a PostgreSQL star schema
Dimensional Data Model with 4 dimension tables and 4 fact tables, optimized with indexes for analytical queries across 100k+ lap records
Interactive Dashboard with 5 analytical views, dynamic team-based theming, and light/dark mode support
Incremental Loading — the ETL detects already-loaded races and skips them, with graceful API rate limit handling
41 automated tests covering transformations, ORM models, and ETL logic

Architecture

                    ┌─────────────────┐
                    │   FastF1 API    │  Historical F1 data source
                    └────────┬────────┘
                             │ Extract (Python · requests · caching)
                    ┌────────▼────────┐
                    │   ETL Engine    │  Pandas · SQLAlchemy · Custom transforms
                    │   (Transform)   │  Team normalization · Time conversion
                    └────────┬────────┘  Null handling · Incremental logic
                             │ Load (bulk insert · chunksize=100)
                    ┌────────▼────────┐
                    │   PostgreSQL    │  Docker container · Star Schema
                    │   15 (DW)       │  4 dimensions · 4 fact tables
                    └────────┬────────┘  Indexed FKs · Persistent volume
                             │ Query (SQLAlchemy · cached)
                    ┌────────▼────────┐
                    │   Streamlit     │  5 analytical pages
                    │   Dashboard     │  Plotly charts · Dynamic theming
                    └─────────────────┘  Light/Dark mode · Team palettes

Data Model — Star Schema

The data warehouse follows a Kimball-style star schema designed for analytical queries:

Dimension Tables

Table	Description	Key Fields
`dim_drivers`	Driver master data	`full_name`, `abbreviation`, `nationality`, `number`
`dim_teams`	Team/constructor data with canonical names	`team_name`, `team_color`
`dim_circuits`	Circuit information	`circuit_name`, `country`, `locality`
`dim_calendar`	Race calendar linking year, round, circuit, date, and format (sprint/race)	`year`, `round`, `circuit_id`, `event_date`

Fact Tables

Table	Grain	Description	Records
`fact_results`	1 row per driver/race	Final position, points, grid, status, fastest lap time, total race time	~2,000+
`fact_laps`	1 row per driver/lap	Sector times, compound, tyre life, position, pit in/out flags	~100,000+
`fact_pit_stops`	1 row per pit stop	Stop duration, compound before/after, lap number	~5,000+
`fact_qualifying`	1 row per driver/race	Q1/Q2/Q3 times, qualifying position	~2,000+

All foreign keys are indexed for fast joins across the schema.
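To make the grain and the indexed foreign keys concrete, here is a minimal sketch of a two-table slice of the schema. It uses the standard-library `sqlite3` module in memory instead of PostgreSQL and SQLAlchemy, and the column types, the `result_id` key, the index names, and the placeholder drivers are all assumptions for illustration, not the project's actual definitions.

```python
import sqlite3

# Two-table slice of the star schema. Types and key names are assumptions.
DDL = """
CREATE TABLE dim_drivers (
    driver_id    INTEGER PRIMARY KEY,
    full_name    TEXT NOT NULL,
    abbreviation TEXT NOT NULL UNIQUE
);
CREATE TABLE fact_results (
    result_id INTEGER PRIMARY KEY,
    race_id   INTEGER NOT NULL,
    driver_id INTEGER NOT NULL REFERENCES dim_drivers (driver_id),
    position  INTEGER,
    points    REAL NOT NULL DEFAULT 0,
    UNIQUE (race_id, driver_id)  -- grain: 1 row per driver/race
);
CREATE INDEX ix_fact_results_driver_id ON fact_results (driver_id);
CREATE INDEX ix_fact_results_race_id ON fact_results (race_id);
"""


def build_demo_warehouse() -> sqlite3.Connection:
    """Create the demo schema and load a few placeholder rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(DDL)
    conn.executemany(
        "INSERT INTO dim_drivers (driver_id, full_name, abbreviation) VALUES (?, ?, ?)",
        [(1, "Driver One", "ONE"), (2, "Driver Two", "TWO")],
    )
    conn.executemany(
        "INSERT INTO fact_results (race_id, driver_id, position, points) VALUES (?, ?, ?, ?)",
        [(1, 1, 1, 25.0), (1, 2, 2, 18.0), (2, 1, 2, 18.0), (2, 2, 1, 25.0), (3, 1, 1, 25.0)],
    )
    conn.commit()
    return conn


def driver_standings(conn: sqlite3.Connection) -> list:
    """Aggregate points per driver by joining the fact table to its dimension."""
    query = """
        SELECT d.abbreviation, SUM(f.points) AS total_points
        FROM fact_results AS f
        JOIN dim_drivers AS d ON d.driver_id = f.driver_id
        GROUP BY d.abbreviation
        ORDER BY total_points DESC
    """
    return conn.execute(query).fetchall()
```

The same join shape (fact table filtered or grouped through a dimension key) is what the indexes on the foreign key columns are meant to speed up.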

Technology Stack

Layer	Technology	Purpose
Data Extraction	FastF1	Python library for accessing F1 timing data
Transformation	Pandas, NumPy	Data cleaning, normalization, type conversion
Data Warehouse	PostgreSQL 15 (Docker)	Persistent star schema storage
ORM	SQLAlchemy 2.x	Schema definition, get-or-create patterns
Dashboard	Streamlit (multipage)	Interactive web application
Visualization	Plotly Express / Graph Objects	Charts, heatmaps, radar plots
Styling	Custom CSS injection	Dynamic theming, team palettes, light/dark
Testing	pytest	Unit tests with SQLite in-memory fixtures
Infrastructure	Docker Compose	Containerized database with health checks

ETL Pipeline — Design Decisions

Extraction

Uses the FastF1 library to access the official F1 timing API
Data is cached locally (./cache/) to avoid redundant API calls
Rate limit handling: the pipeline detects the 500 calls/hour API limit and gracefully skips affected races, allowing re-execution later to fill gaps
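The skip-on-limit behaviour described above could look roughly like the sketch below. `RateLimitError` and the `fetch` callable are hypothetical stand-ins: the README does not say which exception the pipeline catches or how a race is fetched, so treat this as an outline of the control flow only.

```python
class RateLimitError(Exception):
    """Hypothetical stand-in for the error raised when the hourly limit is hit."""


def extract_races(races, fetch):
    """Fetch each race and skip the ones that hit the rate limit.

    `races` is an iterable of (year, round) tuples. `fetch` is any callable
    that returns the race data or raises RateLimitError.
    """
    loaded, skipped = {}, []
    for race in races:
        try:
            loaded[race] = fetch(race)
        except RateLimitError:
            # Do not abort the run: record the gap so a later run can fill it.
            skipped.append(race)
    return loaded, skipped
```

Because loading is incremental, re-running with the same list of races only needs to fetch the ones that ended up in `skipped`.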

Transformation

Team name normalization: historical team names are mapped to their current canonical name (e.g., Renault → Alpine, Toro Rosso → AlphaTauri → RB)
Time conversion: all timedelta objects are converted to float seconds for consistent storage and calculation
Null sanitization: custom _clean_str helper converts NaN, NaT, "nan", and empty strings to None before database insertion
Position/points cleaning: handles edge cases like 0, negative values, and non-numeric strings
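The four transformations above can be sketched with the standard library alone. This is an illustration, not the project's code: the alias map holds only the names mentioned in this README, case-insensitive matching is an assumption, and `clean_str` is a pandas-free approximation of the `_clean_str` helper (it recognises the text "NaT" rather than a real `NaT` object).

```python
import math
from datetime import timedelta

# Illustrative subset of the alias map; the real mapping has 30+ entries.
TEAM_ALIASES = {
    "renault": "Alpine",
    "toro rosso": "RB",
    "alphatauri": "RB",
}


def normalize_team(name: str) -> str:
    """Map a historical team name to its canonical name; unknown names pass through."""
    stripped = name.strip()
    return TEAM_ALIASES.get(stripped.lower(), stripped)


def to_seconds(value):
    """Convert a timedelta to float seconds; None stays None."""
    if value is None:
        return None
    return value.total_seconds()


def clean_str(value):
    """Return None for None, NaN, "nan", "NaT" and blank strings, else the stripped text."""
    if value is None:
        return None
    if isinstance(value, float) and math.isnan(value):
        return None
    text = str(value).strip()
    if text == "" or text.lower() in {"nan", "nat"}:
        return None
    return text


def clean_position(value):
    """Return a positive integer position, or None for 0, negatives and non-numeric input."""
    try:
        number = float(value)
    except (TypeError, ValueError):
        return None
    if math.isnan(number) or math.isinf(number) or number < 1:
        return None
    return int(number)
```

For example, `to_seconds(timedelta(minutes=1, seconds=23))` gives a plain float that can be stored in a numeric column and averaged or subtracted without further conversion.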

Loading

Incremental: checks if a race is already loaded before processing — safe to re-run at any time
Bulk inserts: uses Pandas to_sql with method='multi' and chunksize=100 to avoid SQL parameter overflow
Transaction management: dimension records are committed before fact table inserts to satisfy foreign key constraints across separate connections
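As a rough illustration of the incremental check and chunked inserts, the sketch below uses `sqlite3` and `executemany` in place of PostgreSQL and `to_sql`. The existence check on `fact_laps`, the `lap_time_s` column, and the helper names are assumptions; the README does not say how the pipeline decides that a race is already loaded.

```python
import sqlite3


def make_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with a simplified fact_laps table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE fact_laps ("
        "race_id INTEGER, driver_id INTEGER, lap_number INTEGER, lap_time_s REAL)"
    )
    return conn


def chunked(rows, size):
    """Yield successive slices of `rows` with at most `size` items each."""
    for start in range(0, len(rows), size):
        yield rows[start:start + size]


def load_race(conn, race_id, lap_rows, chunksize=100):
    """Insert laps for one race unless that race is already present.

    `lap_rows` holds (driver_id, lap_number, lap_time_s) tuples.
    Returns the number of rows inserted, which is 0 when the race is skipped.
    """
    already_loaded = conn.execute(
        "SELECT 1 FROM fact_laps WHERE race_id = ? LIMIT 1", (race_id,)
    ).fetchone()
    if already_loaded:
        return 0  # incremental: a re-run is a no-op for this race
    inserted = 0
    for chunk in chunked(lap_rows, chunksize):
        conn.executemany(
            "INSERT INTO fact_laps (race_id, driver_id, lap_number, lap_time_s) "
            "VALUES (?, ?, ?, ?)",
            [(race_id, *row) for row in chunk],
        )
        inserted += len(chunk)
    conn.commit()
    return inserted
```

This sketch uses a single connection, so the commit ordering issue does not arise here. In the project, dimension rows are committed first because the bulk insert runs on a separate connection that cannot see uncommitted rows.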

Dashboard — 5 Analytical Views

🏆 Championship Overview

Season KPIs: races, drivers, teams, championship leader
Driver and constructor standings with interactive line charts
Points evolution across the season with team-colored traces
Race wins and podiums distribution (horizontal bar charts)

🏁 Race Analysis

Deep dive into any individual race (selectable by season + round)
Full results table, position chart (lap-by-lap), tyre strategy visualization
Lap time distribution (box plots), qualifying results with gap-to-pole analysis
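The gap-to-pole figure is simply each driver's best qualifying time minus the fastest time of the session. A minimal sketch, assuming times are already stored as float seconds and that `gap_to_pole` is a hypothetical helper name:

```python
def gap_to_pole(best_times):
    """Map driver -> gap in seconds to the fastest qualifying time.

    `best_times` maps a driver abbreviation to the best lap in seconds,
    with None meaning that no time was set. Drivers without a time are omitted.
    """
    valid = {driver: t for driver, t in best_times.items() if t is not None}
    if not valid:
        return {}
    pole = min(valid.values())
    ranked = sorted(valid.items(), key=lambda item: item[1])
    return {driver: round(t - pole, 3) for driver, t in ranked}
```

The result is ordered from pole position downwards, so it can feed a bar chart directly.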

👥 Driver Comparison (Head-to-Head)

Side-by-side KPIs for any two drivers in a season
Radar chart comparing points, wins, podiums, consistency, and reliability
Race-by-race comparison table with head-to-head win count
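One way to compute the head-to-head win count is sketched below. Counting only rounds in which both drivers were classified is a simplifying assumption; the dashboard may treat retirements differently.

```python
def head_to_head(results_a, results_b):
    """Count races in which each driver finished ahead of the other.

    Each argument maps round number -> finishing position, where None means
    the driver was not classified. Returns (wins_a, wins_b).
    """
    wins_a = wins_b = 0
    for rnd in results_a.keys() & results_b.keys():
        pos_a, pos_b = results_a[rnd], results_b[rnd]
        if pos_a is None or pos_b is None:
            continue  # skip rounds without a comparable result
        if pos_a < pos_b:
            wins_a += 1
        elif pos_b < pos_a:
            wins_b += 1
    return wins_a, wins_b
```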

🏢 Team Performance

Dynamic team theming: the entire UI adapts to the selected team's official color palette
Cumulative points progression and per-race points breakdown
Teammate battle: points and finish position comparison
Reliability analysis (pie chart) and grid-vs-finish positions gained/lost

⏱ Lap & Strategy Analysis

Lap time evolution per driver (excluding pit laps for clean visualization)
Tyre degradation scatter plot: lap time vs tyre life by compound
Best sector times table with theoretical best lap calculation
Sector heatmap (driver × lap) with F1-style purple/yellow/green color scale
Gap to leader chart across the race distance
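The theoretical best lap is the sum of a driver's best time in each of the three sectors, even if those sectors were set on different laps. A minimal sketch, assuming sector times in float seconds and a hypothetical helper name:

```python
def theoretical_best(laps):
    """Sum the best time of each sector across a driver's laps.

    `laps` is a list of (s1, s2, s3) tuples in seconds; None marks a missing
    sector. Returns None if any sector has no valid time at all.
    """
    best = []
    for sector in range(3):
        times = [lap[sector] for lap in laps if lap[sector] is not None]
        if not times:
            return None
        best.append(min(times))
    return round(sum(best), 3)
```

With laps of (30.0, 40.5, 25.0), (29.5, 41.0, 25.5) and (30.5, 40.0, None), the best sectors are 29.5, 40.0 and 25.0, giving a theoretical best of 94.5 seconds, faster than any single lap in the set.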

Theming System

The dashboard features a dual-mode theme system (light/dark) with a toggle in the sidebar.

Dark Mode (Default)

Background: #0F1117 — deep black inspired by F1 broadcast graphics
Cards: gradient from #1A1A2E to #16213E
Accent: #E10600 (official F1 red)

Light Mode

Background: #F5F5F7 — clean, professional white
Cards: white with subtle gradient
All text, charts, and UI elements adapt automatically

Team Color Palettes

When a team is selected in the sidebar, all accent colors (tabs, headers, metric borders, chart highlights) switch to the team's official 2024 colors:

Team	Primary	Secondary	Accent
Red Bull Racing	`#3671C6`	`#1B1F27`	`#FFD700`
Ferrari	`#E80020`	`#FFEB3B`	`#FFFFFF`
Mercedes	`#27F4D2`	`#000000`	`#AAAAAA`
McLaren	`#FF8000`	`#47C7FC`	`#FFFFFF`
Aston Martin	`#229971`	`#CEDC00`	`#FFFFFF`
Alpine	`#FF87BC`	`#0093CC`	`#FFFFFF`
Williams	`#64C4FF`	`#041E42`	`#FFFFFF`
RB	`#6692FF`	`#1B3D73`	`#FFFFFF`
Kick Sauber	`#52E252`	`#000000`	`#FFFFFF`
Haas	`#B6BABD`	`#B6181C`	`#FFFFFF`

Tyre compounds use official F1 colors: 🔴 Soft · 🟡 Medium · ⚪ Hard · 🟢 Intermediate · 🔵 Wet
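The palette switch can be pictured as a dictionary lookup with a fallback to the default F1 red, followed by generating a small CSS snippet. The sketch below copies two rows from the table; the CSS variable names and the `accent_css` helper are hypothetical and not taken from `theme.py`.

```python
DEFAULT_ACCENT = "#E10600"  # F1 red used when no team is selected

# Two rows copied from the palette table; the others are omitted for brevity.
TEAM_PALETTES = {
    "Ferrari": {"primary": "#E80020", "secondary": "#FFEB3B", "accent": "#FFFFFF"},
    "McLaren": {"primary": "#FF8000", "secondary": "#47C7FC", "accent": "#FFFFFF"},
}


def accent_css(team=None):
    """Build a CSS snippet that sets the accent colours for the selected team."""
    palette = TEAM_PALETTES.get(team)
    primary = palette["primary"] if palette else DEFAULT_ACCENT
    secondary = palette["secondary"] if palette else DEFAULT_ACCENT
    return (
        "<style>:root {"
        f" --accent-primary: {primary};"
        f" --accent-secondary: {secondary};"
        " }</style>"
    )
```

In a Streamlit app, a snippet like this is typically injected with `st.markdown(css, unsafe_allow_html=True)`.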

Quick Start

Prerequisites

Docker & Docker Compose
Python 3.10+

1. Clone & Configure

git clone <repo-url>
cd f1_project
# Optionally edit .env for custom DB credentials

2. Start PostgreSQL

docker compose up -d
docker compose ps   # verify it's healthy

3. Install Dependencies

pip install -r requirements.txt

4. Run the ETL

# Full load (2018–2025)
python -m src.etl_engine

# Custom range
python -m src.etl_engine 2023 2025

# Single season
python -m src.etl_engine 2024

The ETL is incremental and safe to re-run: already-loaded races are skipped automatically. The first run downloads data from the API and caches it locally, so subsequent runs are much faster. The FastF1 API has a 500 calls/hour rate limit; if it is hit, the ETL skips the affected races, and re-running later fills the gaps.
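The three invocations above imply zero, one, or two positional year arguments. How `etl_engine` actually parses them is not shown in this README; the sketch below is one plausible interpretation, with `parse_seasons` as a hypothetical helper.

```python
DEFAULT_START, DEFAULT_END = 2018, 2025


def parse_seasons(args):
    """Turn CLI arguments into an inclusive (start, end) season range.

    []               -> full default range
    ["2024"]         -> single season
    ["2023", "2025"] -> custom range
    """
    if len(args) > 2:
        raise ValueError("expected at most two arguments: [start_year] [end_year]")
    years = [int(arg) for arg in args]
    if not years:
        return DEFAULT_START, DEFAULT_END
    start, end = years[0], years[-1]
    if start > end:
        raise ValueError("start year must not be after end year")
    return start, end
```

A module entry point would call it as `parse_seasons(sys.argv[1:])`.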

5. Launch the Dashboard

streamlit run src/dashboard/app.py

Opens at http://localhost:8501

Testing

pytest tests/ -v

41 tests covering:

Test File	Coverage
`tests/test_transformations.py`	Team name normalization, timedelta conversion, position/points cleaning
`tests/test_models.py`	ORM table creation in SQLite in-memory, unique constraints, relationships
`tests/test_etl.py`	Get-or-create dimension patterns, upsert behavior, abbreviation updates
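To give a flavour of what a get-or-create test against an in-memory SQLite database looks like, here is a simplified sketch. It uses `sqlite3` directly rather than the project's SQLAlchemy models and pytest fixtures, and `get_or_create_driver` is a hypothetical helper, not the function under test in `tests/test_etl.py`.

```python
import sqlite3


def get_or_create_driver(conn, abbreviation, full_name):
    """Return the driver_id for `abbreviation`, inserting the row if needed."""
    row = conn.execute(
        "SELECT driver_id FROM dim_drivers WHERE abbreviation = ?", (abbreviation,)
    ).fetchone()
    if row:
        return row[0]
    cursor = conn.execute(
        "INSERT INTO dim_drivers (abbreviation, full_name) VALUES (?, ?)",
        (abbreviation, full_name),
    )
    return cursor.lastrowid


def test_get_or_create_is_idempotent():
    conn = sqlite3.connect(":memory:")  # fresh in-memory database per test
    conn.execute(
        "CREATE TABLE dim_drivers ("
        "driver_id INTEGER PRIMARY KEY, abbreviation TEXT UNIQUE, full_name TEXT)"
    )
    first = get_or_create_driver(conn, "ONE", "Driver One")
    second = get_or_create_driver(conn, "ONE", "Driver One")
    assert first == second
    assert conn.execute("SELECT COUNT(*) FROM dim_drivers").fetchone()[0] == 1
```

Because the function is named `test_...` and uses plain `assert`, pytest would collect and run it as is.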

Project Structure

f1_project/
├── docker-compose.yml                # PostgreSQL 15 · persistent volume · healthcheck
├── .env                              # DB credentials (gitignored)
├── .gitignore
├── .streamlit/config.toml            # Streamlit theme config
├── requirements.txt                  # Python dependencies
├── README.md
│
├── src/
│   ├── config.py                     # DB connection, cache path, ETL parameters
│   ├── models.py                     # SQLAlchemy ORM (star schema)
│   ├── etl_engine.py                 # ETL pipeline: extract → transform → load
│   ├── transformations.py            # Normalization, cleaning, conversion
│   ├── logger.py                     # Dual logging (console + file)
│   └── dashboard/
│       ├── app.py                    # Streamlit entry point (multipage)
│       ├── data.py                   # Data access layer (cached queries)
│       ├── theme.py                  # Palettes, CSS injection, light/dark
│       ├── components.py             # KPI cards, chart builders, tables
│       └── pages/
│           ├── 01_overview.py        # Championship Overview
│           ├── 02_race_analysis.py   # Race Analysis
│           ├── 03_driver_compare.py  # Driver Comparison (H2H)
│           ├── 04_team_performance.py # Team Performance
│           └── 05_lap_analysis.py    # Lap & Strategy Analysis
│
├── tests/
│   ├── conftest.py                   # pytest fixtures (SQLite in-memory)
│   ├── test_transformations.py
│   ├── test_models.py
│   └── test_etl.py
│
└── cache/                            # FastF1 API cache (gitignored)

Challenges & Solutions

Challenge	Solution
FastF1 API rate limit (500 calls/hour)	Implemented graceful skip-on-limit with incremental loading — re-run fills gaps
`psycopg2` SQL parameter overflow on bulk inserts	Reduced `chunksize` from 500 to 100 in `pandas.to_sql`
Foreign key violations during fact table loading	Changed `session.flush()` to `session.commit()` before `to_sql` calls (separate connections)
`NaN` strings persisted in compound fields	Built `_clean_str` helper to sanitize `NaN`, `NaT`, `"nan"`, empty strings → `None`
Historical team name inconsistencies	Created normalization mapping (30+ aliases → 10 canonical names)
PostgreSQL timezone configuration errors	Created timezone symlink in Docker entrypoint for client compatibility
Streamlit subprocess import resolution	Added `sys.path` injection at the top of each page script
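The `sys.path` injection in the last row can be sketched as follows. Going three directories up from a page script is an assumption based on the project structure shown above (`src/dashboard/pages/` sits three levels below the project root), and `add_project_root` is a hypothetical helper name.

```python
import sys
from pathlib import Path


def add_project_root(page_file, levels_up=3):
    """Put the project root on sys.path so that `import src...` resolves.

    For src/dashboard/pages/01_overview.py the parents are, in order:
    pages, dashboard, src and then the project root, hence levels_up=3.
    """
    root = str(Path(page_file).resolve().parents[levels_up])
    if root not in sys.path:
        sys.path.insert(0, root)
    return root
```

Each page script would call `add_project_root(__file__)` before importing from `src`.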

Connect Power BI (Optional)

Open Power BI Desktop → Get Data → PostgreSQL
Server: localhost:5432 · Database: f1_warehouse · User: f1admin
Select the star schema tables
Verify relationships on driver_id, team_id, race_id, circuit_id

Technical Notes

Team normalization: 30+ historical names mapped to current canonical names (Renault → Alpine, Toro Rosso → RB, etc.)
Time storage: all timedelta values converted to float seconds for consistent arithmetic
Bulk loading: method='multi' with chunksize=100 prevents SQL parameter overflow
Index strategy: all foreign keys indexed for fast analytical joins across 100k+ rows
Query caching: Streamlit @st.cache_data(ttl=600) prevents redundant DB queries
Persistent storage: PostgreSQL data stored in Docker volume f1_pgdata
Typography: Inter font family (Google Fonts) with weights 300–900 for professional appearance

krushodev/f1-analytics
