End-to-End Streaming Analytics using the Modern Data Stack
Snowflake · dbt · Apache Airflow · Apache Kafka · Python · Docker · Power BI
This project showcases a production-style, real-time stock market data pipeline built using the Modern Data Stack.
Live stock prices are fetched from an external API, streamed in real time through Kafka, orchestrated with Airflow, transformed with dbt inside Snowflake, and finally surfaced in analytics-ready Power BI dashboards.
The goal of this project is to demonstrate real-world data engineering skills, including streaming ingestion, layered data modeling (Bronze/Silver/Gold), orchestration, and business-focused analytics.
High-level flow:
API → Kafka Producer → Kafka Topic → Kafka Consumer → MinIO (Raw/Bronze) → Airflow → Snowflake → dbt Models → Power BI
This architecture closely mirrors how modern data platforms handle near real-time analytics at scale.
- Snowflake – Cloud data warehouse for scalable analytics
- dbt – SQL-based transformations and analytics engineering
- Apache Airflow – Workflow orchestration and scheduling
- Apache Kafka – Real-time data streaming
- Python – API ingestion, producers, consumers, utilities
- Docker – Containerized local development environment
- Power BI – Business intelligence and visualization
- Fetches live stock market data from a real external API (not simulated)
- Real-time streaming using Apache Kafka
- Layered data modeling (Bronze → Silver → Gold)
- Orchestrated ingestion and transformations with Airflow
- Scalable Snowflake warehouse design
- dbt models built with analytics best practices
- Power BI dashboards powered by real-time warehouse data
```
real-time-stocks-data-pipeline/
├── producer/                 # Kafka producer (API → Kafka)
│   └── producer.py
├── consumer/                 # Kafka consumer (Kafka → MinIO)
│   └── consumer.py
├── dbt_stocks/               # dbt project
│   └── models/
│       ├── bronze/
│       │   ├── bronze_stg_stock_quotes.sql
│       │   └── sources.yml
│       ├── silver/
│       │   └── silver_clean_stock_quotes.sql
│       └── gold/
│           ├── gold_candlestick.sql
│           ├── gold_kpi.sql
│           └── gold_treechart.sql
├── dags/                     # Airflow DAGs
│   └── minio_to_snowflake.py
├── docker-compose.yml        # Kafka, Zookeeper, MinIO, Airflow, Postgres
├── requirements.txt
└── README.md
```
1. Clone the repository
2. Configure environment variables (API keys, Snowflake credentials)
3. Start the services with Docker Compose
4. Run the Kafka producer to fetch live stock data
5. Let Airflow orchestrate ingestion into Snowflake
6. Run the dbt transformations
7. Connect Power BI to Snowflake for visualization
- Apache Kafka and Zookeeper are configured using Docker
- A dedicated topic is created for streaming stock market events
- Producers publish live market data; consumers process and persist it
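The Kafka and Zookeeper services might be wired up along these lines; this is a minimal hedged sketch, not the project's actual `docker-compose.yml` (image tags, ports, and settings are assumptions):

```yaml
version: "3.8"
services:
  zookeeper:
    image: confluentinc/cp-zookeeper:7.4.0
    environment:
      ZOOKEEPER_CLIENT_PORT: 2181
  kafka:
    image: confluentinc/cp-kafka:7.4.0
    depends_on:
      - zookeeper
    ports:
      - "9092:9092"
    environment:
      KAFKA_ZOOKEEPER_CONNECT: zookeeper:2181
      KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://localhost:9092
      KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 1
```

A single-broker setup like this is fine for local development; a production cluster would raise the replication factor and add authentication.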
- Python-based Kafka producer fetches live stock prices from an external API
- Data is streamed in JSON format at near real-time intervals
- Designed to tolerate schema changes and intermittent API failures
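The producer loop can be sketched roughly as follows. The endpoint, topic name, polling interval, and quote fields (Finnhub-style `c`/`h`/`l`/`o`/`pc`/`t`) are illustrative assumptions, not the project's actual code:

```python
import json
import os
import time

# Assumed endpoint and topic names, for illustration only.
QUOTE_URL = "https://finnhub.io/api/v1/quote"
TOPIC = "stock_quotes"


def build_record(symbol: str, quote: dict) -> dict:
    """Normalize a raw API quote payload into the JSON event sent to Kafka."""
    return {
        "symbol": symbol,
        "price": quote.get("c"),       # current price
        "high": quote.get("h"),
        "low": quote.get("l"),
        "open": quote.get("o"),
        "prev_close": quote.get("pc"),
        "fetched_at": quote.get("t"),  # epoch seconds from the API
    }


def run(symbols, poll_seconds=5):
    # Imported lazily so build_record stays usable without these dependencies.
    import requests
    from kafka import KafkaProducer

    producer = KafkaProducer(
        bootstrap_servers="localhost:9092",
        value_serializer=lambda v: json.dumps(v).encode("utf-8"),
    )
    while True:
        for symbol in symbols:
            resp = requests.get(
                QUOTE_URL,
                params={"symbol": symbol, "token": os.environ["API_KEY"]},
                timeout=10,
            )
            resp.raise_for_status()
            producer.send(TOPIC, build_record(symbol, resp.json()))
        producer.flush()
        time.sleep(poll_seconds)


if __name__ == "__main__":
    run(["AAPL", "MSFT", "GOOGL"])
```

Keeping the payload-shaping logic in a pure function (`build_record`) makes the schema easy to unit-test independently of Kafka or the network.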
- Python consumer reads messages from Kafka
- Data is written to MinIO (S3-compatible storage)
- Raw data is organized for Bronze-layer ingestion
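A minimal sketch of the consumer step, assuming the JSON events carry a symbol and an epoch timestamp; the bucket, topic, key layout, and credentials below are hypothetical:

```python
import json
from datetime import datetime, timezone

BUCKET = "stock-data"   # assumed bucket name
TOPIC = "stock_quotes"  # assumed topic name


def make_object_key(symbol: str, epoch_seconds: int) -> str:
    """Partition raw events by UTC date so the Bronze layer is easy to scan."""
    dt = datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)
    return f"bronze/stock_quotes/dt={dt:%Y-%m-%d}/{symbol}_{epoch_seconds}.json"


def run():
    # Imported lazily so make_object_key stays usable without these deps.
    from io import BytesIO
    from kafka import KafkaConsumer
    from minio import Minio

    client = Minio("localhost:9000", access_key="minioadmin",
                   secret_key="minioadmin", secure=False)
    if not client.bucket_exists(BUCKET):
        client.make_bucket(BUCKET)

    consumer = KafkaConsumer(
        TOPIC,
        bootstrap_servers="localhost:9092",
        value_deserializer=lambda b: json.loads(b.decode("utf-8")),
        auto_offset_reset="earliest",
    )
    for message in consumer:
        event = message.value
        payload = json.dumps(event).encode("utf-8")
        key = make_object_key(event["symbol"], event["fetched_at"])
        client.put_object(BUCKET, key, BytesIO(payload), length=len(payload))


if __name__ == "__main__":
    run()
```

Date-partitioned object keys let the downstream Airflow load target one partition per run instead of rescanning the whole bucket.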
- Apache Airflow runs inside Docker
- DAG responsibilities:
  - Load data from MinIO into Snowflake staging tables
  - Schedule ingestion jobs at frequent intervals
  - Monitor pipeline health and failures
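The DAG's load step could look like this sketch: a `COPY INTO` over one date partition, wrapped in a scheduled task. Stage, table, and connection names are assumptions, and the operator import path follows the Airflow Snowflake provider package:

```python
# Stage/table names are hypothetical; the real DAG lives in dags/minio_to_snowflake.py.
STAGE = "@RAW.STOCKS_STAGE"      # external stage pointing at the MinIO bucket
TARGET = "RAW.STG_STOCK_QUOTES"


def build_copy_sql(load_date: str) -> str:
    """COPY INTO statement for one date partition of the Bronze layer."""
    return (
        f"COPY INTO {TARGET} "
        f"FROM {STAGE}/bronze/stock_quotes/dt={load_date}/ "
        f"FILE_FORMAT = (TYPE = JSON) "
        f"ON_ERROR = 'CONTINUE'"
    )


def make_dag():
    # Imported lazily so build_copy_sql can be tested outside Airflow.
    from datetime import datetime
    from airflow import DAG
    from airflow.providers.snowflake.operators.snowflake import SnowflakeOperator

    with DAG(
        dag_id="minio_to_snowflake",
        start_date=datetime(2024, 1, 1),
        schedule="*/15 * * * *",  # frequent ingestion intervals
        catchup=False,
    ) as dag:
        SnowflakeOperator(
            task_id="copy_quotes_to_staging",
            snowflake_conn_id="snowflake_default",
            sql=build_copy_sql("{{ ds }}"),
        )
    return dag
```

Templating the partition with Airflow's `{{ ds }}` macro keeps each run idempotent over a single day of raw files.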
- Snowflake database and schemas created for analytics
- Layered architecture implemented:
  - Bronze – Raw structured data
  - Silver – Cleaned and validated data
  - Gold – Business-ready analytical models
- dbt project configured with the Snowflake adapter
- Models include:
  - Bronze models for structured ingestion
  - Silver models for data quality and normalization
  - Gold models for analytics and reporting
- Follows dbt best practices: modular SQL, clear naming, and testability
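A Gold model such as `gold_candlestick.sql` might aggregate Silver quotes into per-minute OHLC rows along these lines; the column names (`price`, `event_ts`) and the use of Snowflake's `MIN_BY`/`MAX_BY` are assumptions, not the project's actual SQL:

```sql
-- Hypothetical sketch of a Gold candlestick model.
with quotes as (
    select * from {{ ref('silver_clean_stock_quotes') }}
)
select
    symbol,
    date_trunc('minute', event_ts) as candle_ts,
    min_by(price, event_ts)        as open,
    max(price)                     as high,
    min(price)                     as low,
    max_by(price, event_ts)        as close
from quotes
group by 1, 2
```

Referencing the Silver model through `{{ ref(...) }}` keeps the Bronze → Silver → Gold lineage visible in dbt's DAG.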
- Power BI connects directly to Snowflake (Gold layer)
- Dashboards include:
  - Candlestick charts for price movement
  - Tree maps for comparative trends
  - KPI cards for real-time metrics
  - Volume and performance indicators
- Fully automated real-time data pipeline
- Snowflake warehouse with Bronze/Silver/Gold layers
- dbt analytics models
- Airflow DAGs for orchestration
- Power BI dashboards with near real-time insights
This project demonstrates:
- Real-world data engineering workflows
- Streaming + batch hybrid architecture
- Analytics engineering using dbt
- Cloud-native warehouse design
- Business-focused visualization
It is designed to closely resemble production-grade data platforms used in modern analytics teams.
Prince Pastakiya · Data Engineer | Analytics Engineer
⭐ If you find this project helpful, feel free to star the repository and explore further enhancements!

