A complete end-to-end pipeline for real-time analytics using Kafka + Spark Structured Streaming + PostgreSQL + Elasticsearch.
It simulates and processes clickstream data in real time. It uses:
- Kafka to ingest user click events
- Spark Structured Streaming (via PySpark) for stream processing
- PostgreSQL to store historical data
- Elasticsearch + Kibana for search & visualization
Optional tools like Kafka Connect and Kafdrop are also included.
\[Simulator (Go)]
↓
\[Kafka Topic: clicks]
↓
\[Spark Structured Streaming]
↓ ↘
\[PostgreSQL] \[Elasticsearch]
↓
\[Kibana]
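The Kafka-to-Spark hop in the diagram above could look roughly like the following PySpark sketch. It is not the contents of spark/stream_processor.py. The broker address `localhost:9092`, the column types in `EVENT_SCHEMA`, and the helper names `read_clicks` and `main` are assumptions made for illustration. The sketch prints to the console instead of writing to PostgreSQL or Elasticsearch, and it assumes the Spark Kafka connector (`spark-sql-kafka-0-10`) is available to the job.

```python
# Sketch only: values marked "assumed" are not taken from the repository.
KAFKA_BOOTSTRAP = "localhost:9092"  # assumed broker address; check docker-compose.yml
TOPIC = "clicks"                    # topic name shown in the architecture diagram
# Assumed schema, written as a DDL string and based on the sample event fields.
EVENT_SCHEMA = "user_id STRING, page STRING, event STRING, timestamp TIMESTAMP"


def read_clicks(spark):
    """Return a streaming DataFrame with one column per click-event field."""
    # Imported lazily so this module can be loaded without Spark installed.
    from pyspark.sql.functions import col, from_json

    raw = (
        spark.readStream.format("kafka")
        .option("kafka.bootstrap.servers", KAFKA_BOOTSTRAP)
        .option("subscribe", TOPIC)
        .option("startingOffsets", "latest")
        .load()
    )
    # Kafka delivers the payload as bytes in the "value" column:
    # cast it to a string, parse the JSON, then flatten the struct.
    parsed = raw.select(
        from_json(col("value").cast("string"), EVENT_SCHEMA).alias("e")
    )
    return parsed.select("e.*")


def main():
    """Start the stream and print parsed events to the console."""
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("clickstream-sketch").getOrCreate()
    query = (
        read_clicks(spark)
        .writeStream.format("console")
        .outputMode("append")
        .start()
    )
    query.awaitTermination()
```

Calling `main()` from a script passed to `spark-submit` would start the stream. The console sink is used here only to keep the sketch free of connection details that the README does not specify.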
- ✅ Real-time event simulation (Go)
- ✅ Kafka producer/consumer pipeline
- ✅ Stream processing with PySpark
- ✅ Sink to PostgreSQL and Elasticsearch
- ✅ Docker Compose setup for full reproducibility
- ✅ Extensible and production-oriented
| Tool | Role |
|---|---|
| Kafka | Message broker for real-time events |
| Spark | Stream processing engine (PySpark) |
| PostgreSQL | Storage of historical events |
| Elasticsearch | Real-time search and analytics |
| Kibana | Visualization UI (optional) |
| Kafka Connect | Sink connectors (optional) |
| Docker Compose | Environment orchestration |
kafka-spark-realtime/
├── docker-compose.yml
├── simulator/
│ └── main.go
├── spark/
│ └── stream\_processor.py
├── sql/
│ └── create\_tables.sql
├── connectors/
│ ├── postgres-sink.json
│ └── elastic-sink.json
├── notebooks/
│ └── exploratory\_analysis.ipynb
└── README.md
- Clone the repo
git clone https://github.com/Yurhigz/kafka-spark-realtime.git
cd kafka-spark-realtime
- Launch the environment
docker-compose up -d
- Set up C bindings
Install the GCC compiler:
sudo apt-get update
sudo apt-get install build-essential
Modify the CGO_ENABLED variable
- Start the simulator
go run simulator/main.go
- Launch the Spark job
spark-submit spark/stream_processor.py
- Explore
  - PostgreSQL: localhost:5432
  - Elasticsearch: localhost:9200
  - Kibana: localhost:5601
  - Kafdrop: localhost:9000
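Events on the clicks topic are JSON objects with `user_id`, `page`, `event`, and `timestamp` fields, as in the sample event in this README. When debugging the pipeline, a small standalone check like the one below can confirm that a message has the expected shape. This is a minimal sketch using only the Python standard library. The `parse_click_event` helper and the `REQUIRED_FIELDS` tuple are hypothetical names, and the timestamp format is inferred from the single sample event rather than from the simulator code.

```python
import json
from datetime import datetime, timezone

# Field names taken from the sample event in this README.
REQUIRED_FIELDS = ("user_id", "page", "event", "timestamp")


def parse_click_event(raw: str) -> dict:
    """Parse one JSON click event and convert its timestamp to a UTC datetime."""
    event = json.loads(raw)
    missing = [name for name in REQUIRED_FIELDS if name not in event]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    # Assumed format: second precision with a literal "Z" suffix, as in the sample.
    event["timestamp"] = datetime.strptime(
        event["timestamp"], "%Y-%m-%dT%H:%M:%SZ"
    ).replace(tzinfo=timezone.utc)
    return event


sample = (
    '{"user_id": "user_42", "page": "/product/1234", '
    '"event": "click", "timestamp": "2025-06-04T12:00:00Z"}'
)
parsed = parse_click_event(sample)
print(parsed["user_id"], parsed["timestamp"].isoformat())
# prints: user_42 2025-06-04T12:00:00+00:00
```

An event without one of the four fields raises `ValueError`, which makes malformed messages easy to spot before they reach the Spark job.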
Example click event:
{
  "user_id": "user_42",
  "page": "/product/1234",
  "event": "click",
  "timestamp": "2025-06-04T12:00:00Z"
}

MIT License. Feel free to use for academic or enterprise learning.