sunilmakkar/README.md

👋 Hi, I’m Sunil! | Data & Analytics Engineer

🔧 What I’m Building

I bridge the gap between raw data and actionable insights. My focus is on Medallion Architecture, modular data modeling with dbt, and building the infrastructure that powers AI/LLM applications. While I am a fan of the Modern Data Stack and primarily work with batch pipelines, I love getting my hands dirty with real-time and near real-time systems, from high-frequency pollers to event-driven streaming.

💼 Background

  • Recent Applied Data Engineering grad from WeCloudData (June 2025).

  • Experience building production-grade pipelines within Snowflake-centric architectures, as well as AWS, Azure, and local OLAP environments.

🚀 Core Stack

  • Data: Snowflake, dbt, DuckDB, PostgreSQL, Delta Lake, Airflow.

  • Data Modeling: Medallion Architecture, Kimball Star Schema, Fact/Dimension Modeling.

  • Engineering: Python, Kafka, PySpark, FastAPI, Docker, systemd, Airflow, Celery, SQLAlchemy, Redis.

  • AI: RAG Systems, Vector DBs (Qdrant), LLM APIs (OpenAI/Gemini), LangChain.

📂 Featured Work

  • 🎧 Spotify Data Platform: A hybrid Kafka-S3-Snowflake pipeline + a real-time playback poller.

  • 🇧🇷 Olist E-commerce: End-to-end dbt transformation layer using Kimball Star Schema.

  • 🤖 YouTube Sentiment: Async FastAPI/Celery pipeline processing 10k+ comments.

  • 🗳️ Election Monte Carlo: Distributed Spark simulation of the 2016 Electoral College.

📝 Let's Connect

I’m always down to chat about data modeling, the future of AI infra, or the best way to optimize an Airflow DAG.

🎯 Currently focused on Analytics Engineering & Data Platform opportunities.

Pinned

  1. spotify-data-pipeline (Public)

    Build a production-grade data pipeline that mimics Spotify's data infrastructure - from event generation through transformations to analytics and recommendations. Demonstrate end-to-end data engine…

    HTML 1

  2. olist-e-commerce-data-pipeline (Public)

    This project builds a scalable, production-grade data pipeline for the Olist E-commerce dataset (the largest public dataset of Brazilian e-commerce). The pipeline transitions from raw, messy CSV da…

    Jupyter Notebook

  3. youtube-sentiment-analyzer (Public)

    Async FastAPI service that ingests YouTube comments, runs sentiment analysis with HuggingFace, stores results in Postgres, and exposes analytics endpoints (trends, distribution, keywords).

    Python

  4. call-detailed-record-data (Public)

    A school assignment: a simple data transformation project on Azure Databricks, working from a dataset of call detail records.

    Jupyter Notebook

  5. ai-web-scraper (Public)

    Basic AI Web Scraper

    Python

  6. insightful-orders (Public)

    Insightful-Orders is a Flask-based API that ingests Olist order data, runs business analytics (AOV, RFM, cohorts), and streams real-time alerts. It’s JWT-secured, OpenAPI-documented, fully containe…

    Python