Rocío Baigorria

Data Engineer | SQL • Python • Kafka • AWS Data Platforms

I design and build AWS-based data platforms that process large-scale datasets using batch and real-time pipelines, improving data availability, reliability, and cost efficiency.

I specialize in end-to-end data engineering: ingestion, streaming, transformation, storage, and analytical data modeling. My focus is on building systems that work under real production constraints — not just isolated pipelines.

US Citizen — Open to Remote Roles and Relocation to the United States


About Me

I build production-style data platforms that replicate real-world data engineering challenges, including multi-source ingestion, streaming pipelines, and analytical data modeling.

My work emphasizes:

  • Systems that handle failure gracefully
  • Predictable scalability under growing data volumes
  • Operational simplicity for engineering teams

Core principle:
Data platforms must be reliable under failure, scalable under growth, and simple enough for teams to operate in production.


Core Skills

Data Engineering

  • SQL for analytics and large-scale data transformation
  • Python for data pipelines and automation
  • ETL / ELT pipeline design
  • Batch and streaming data processing
  • Data modeling (Star Schema, Fact & Dimension tables)
  • Incremental data processing
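As a minimal illustration of the incremental-processing item above, the sketch below shows watermark-based extraction in pure Python (hypothetical field names, not tied to any one project): each run picks up only rows newer than the last persisted watermark, then advances the watermark.

```python
from datetime import datetime, timezone

def incremental_extract(records, last_watermark):
    """Return rows updated after the last persisted watermark, along with
    the new watermark to store for the next run. `records` is any iterable
    of dicts carrying a timezone-aware `updated_at` datetime."""
    new_rows = [r for r in records if r["updated_at"] > last_watermark]
    # If nothing new arrived, keep the old watermark unchanged
    new_watermark = max((r["updated_at"] for r in new_rows), default=last_watermark)
    return new_rows, new_watermark

# Example run over an in-memory batch
rows = [
    {"id": 1, "updated_at": datetime(2024, 1, 1, tzinfo=timezone.utc)},
    {"id": 2, "updated_at": datetime(2024, 1, 3, tzinfo=timezone.utc)},
]
fresh, wm = incremental_extract(rows, datetime(2024, 1, 2, tzinfo=timezone.utc))
```

In a real pipeline the watermark would be persisted (e.g., in DynamoDB or a control table) between runs; here it is returned in memory to keep the sketch self-contained.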

Streaming & Distributed Systems

  • Kafka event streaming
  • Event-driven architectures
  • Schema evolution and event versioning
  • Reliable ingestion and decoupled systems

AWS Data Platforms

  • S3 (data lake architecture)
  • Glue (ETL + catalog)
  • Athena (serverless analytics)
  • Redshift (data warehouse)
  • Lambda, DynamoDB

Infrastructure & Reliability

  • Terraform (Infrastructure as Code)
  • CloudFormation
  • IAM (least-privilege design)
  • Monitoring and observability (CloudWatch, Grafana)
  • Failure handling and retry strategies
  • Cost-aware architecture decisions
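The failure-handling and retry item above can be sketched as a small backoff helper (a generic pattern, not code from any listed repository): retry transient errors with exponential backoff plus jitter, and re-raise once attempts are exhausted.

```python
import random
import time

def with_retries(fn, max_attempts=5, base_delay=0.5, retryable=(ConnectionError,)):
    """Call `fn`, retrying on transient errors with exponential backoff
    plus jitter. Re-raises the last error after `max_attempts` failures."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except retryable:
            if attempt == max_attempts:
                raise
            # Sleep base_delay * 2^(attempt-1), with up to 100 ms of jitter
            time.sleep(base_delay * 2 ** (attempt - 1) + random.uniform(0, 0.1))

# Demo: a function that fails twice, then succeeds
calls = {"n": 0}

def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("transient")
    return "ok"

result = with_retries(flaky, base_delay=0.01)
```

Jitter matters in practice: it spreads out retries from many workers so a recovering downstream service is not hit by a synchronized thundering herd.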

Selected Data Engineering Projects

Ecommerce Data Warehouse

AWS Redshift Serverless — Analytical Data Platform

Designed an analytical data warehouse to support business reporting and handle evolving transactional data.

This system addresses challenges such as late-arriving data, incremental ingestion, and query performance optimization for analytics workloads.

Key Decisions

  • Star schema modeling for analytical performance
  • Incremental ingestion pipelines to reduce compute costs
  • Strategy for handling late-arriving transactional data
  • Infrastructure fully defined using Terraform
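To make the late-arriving-data decision above concrete, here is an in-memory sketch of the merge semantics an upsert into a fact table performs (illustrative keys, not the warehouse's actual schema): late rows insert, and revised rows overwrite only when they carry a newer event timestamp, so out-of-order replays are safe.

```python
def merge_facts(fact_table, incoming):
    """Upsert incoming transactional rows into a fact table keyed by
    `order_id`. Older duplicates are ignored, so replaying a batch is
    idempotent."""
    for row in incoming:
        existing = fact_table.get(row["order_id"])
        if existing is None or row["event_ts"] >= existing["event_ts"]:
            fact_table[row["order_id"]] = row
    return fact_table

facts = {}
merge_facts(facts, [{"order_id": 1, "event_ts": 2, "amount": 10}])
merge_facts(facts, [
    {"order_id": 1, "event_ts": 1, "amount": 5},   # stale replay: ignored
    {"order_id": 2, "event_ts": 3, "amount": 7},   # late arrival: inserted
])
```

In Redshift the same semantics would typically be expressed with a staging table plus a MERGE (or delete-then-insert) statement rather than Python.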

Architecture Highlights

  • AWS Redshift Serverless
  • Incremental batch pipelines
  • Fact and dimension modeling
  • Reproducible infrastructure

Tech Stack:
AWS Redshift • S3 • Terraform • SQL

Repository:
https://github.com/tuni56/ecommerce-data-warehouse-redshift


Serverless Data Lake Platform

AWS-native Analytics Data Lake

Designed a serverless data lake architecture to support scalable analytics across raw and curated datasets.

The system enables efficient querying using columnar storage and automated metadata discovery.

Key Decisions

  • Serverless-first architecture to eliminate idle compute
  • Columnar storage (Parquet) for performance optimization
  • Automated schema discovery with Glue Crawlers
  • Metadata catalog for data discoverability
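The multi-layer layout and automated discovery decisions above rest on a predictable object-key scheme. A small sketch (illustrative layer names matching the raw/processed/curated convention) of building Hive-style partitioned S3 keys that Glue crawlers and Athena can discover automatically:

```python
from datetime import date

def lake_key(layer, dataset, event_date, filename):
    """Build an S3 object key for a multi-layer data lake with
    Hive-style date partitions (year=/month=/day=)."""
    if layer not in {"raw", "processed", "curated"}:
        raise ValueError(f"unknown layer: {layer}")
    return (f"{layer}/{dataset}/"
            f"year={event_date.year}/month={event_date.month:02d}/"
            f"day={event_date.day:02d}/{filename}")

key = lake_key("curated", "orders", date(2024, 3, 7), "part-0.parquet")
```

Because the partition columns are encoded in the key path, Athena can prune partitions at query time and scan only the days a query actually touches.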

Architecture Highlights

  • Multi-layer S3 data lake (raw → processed → curated)
  • AWS Glue catalog and ETL
  • Athena for querying
  • Infrastructure automation

Tech Stack:
S3 • Glue • Athena • Python • CloudFormation

Repository:
https://github.com/tuni56/serverless-aws-data-lake-with-kiro


Real-Time Event-Driven Data Pipeline

Kafka Streaming Architecture

Built a real-time data pipeline to process high-velocity event streams with reliability and observability.

Designed to reflect production-grade streaming systems with decoupled components and resilient processing.

Key Decisions

  • Event-driven architecture using Kafka
  • Schema evolution with Schema Registry
  • Monitoring-first system design
  • Consumer reliability and fault tolerance strategies
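The consumer-reliability decision above can be sketched as a dead-letter pattern (a generic simplification, not the pipeline's actual consumer): one poison message is routed aside instead of halting the partition.

```python
def consume(messages, handler, dead_letters):
    """Process a batch of messages; route failures to a dead-letter list
    so one bad event cannot block the rest of the batch. Returns the
    number of successfully processed messages."""
    processed = 0
    for msg in messages:
        try:
            handler(msg)
            processed += 1
        except Exception as exc:
            dead_letters.append({"message": msg, "error": str(exc)})
    return processed

# Demo: the middle message is malformed
def handler(msg):
    if msg == "bad":
        raise ValueError("cannot decode")

dlq = []
ok = consume(["a", "bad", "c"], handler, dlq)
```

In a real Kafka consumer the dead-letter list would be a dedicated topic, and offsets would only be committed after the message is either handled or dead-lettered, so nothing is silently dropped.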

Architecture Highlights

  • Streaming ingestion pipeline
  • Event routing and processing
  • Observability dashboards
  • Resilient message handling

Tech Stack:
Kafka • Python • Redis • Grafana • Terraform

Repository:
https://github.com/tuni56/real-time-event-driven-data-pipeline


IoT Data Architecture on AWS

Scalable Sensor Data Ingestion

Designed a scalable architecture to ingest and store long-term IoT datasets while maintaining cost efficiency and queryability.

Key Decisions

  • Storage lifecycle optimization
  • Serverless ingestion design
  • Long-term data retention strategy
  • Queryable historical datasets
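The lifecycle-optimization decision above boils down to mapping object age to a storage tier. A sketch with illustrative thresholds (not the project's actual policy):

```python
def storage_class_for(age_days):
    """Pick an S3 storage class by object age: hot sensor data stays in
    Standard, older data moves to progressively cheaper tiers."""
    if age_days < 30:
        return "STANDARD"
    if age_days < 180:
        return "STANDARD_IA"
    if age_days < 730:
        return "GLACIER"
    return "DEEP_ARCHIVE"
```

In production this logic lives in an S3 lifecycle configuration rather than application code, so transitions happen automatically without a scheduled job.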

Focus Areas

  • Scalability
  • Cost optimization
  • Long-term data management

Tech Stack:
AWS Serverless • Kafka • Data Lake Architecture

Repository:
https://github.com/tuni56/iot-data-architecture-aws


AWS Serverless Cost Dashboard

Operational Data Pipeline for Cloud Cost Monitoring

Built an automated data pipeline to process AWS Cost & Usage Reports and generate near real-time operational insights.

Highlights

  • Automated cost data ingestion
  • Event-driven processing
  • Monitoring-first design
  • Near real-time cost visibility
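The core transformation behind the dashboard above is aggregating Cost & Usage Report line items into per-service totals. A minimal sketch with illustrative field names (the real CUR schema uses longer column names):

```python
from collections import defaultdict

def cost_by_service(line_items):
    """Aggregate CUR-style line items into per-service cost totals."""
    totals = defaultdict(float)
    for item in line_items:
        totals[item["service"]] += item["unblended_cost"]
    return dict(totals)

report = cost_by_service([
    {"service": "AmazonS3", "unblended_cost": 1.5},
    {"service": "AWSLambda", "unblended_cost": 0.25},
    {"service": "AmazonS3", "unblended_cost": 0.5},
])
```

In the event-driven version, a Lambda runs this aggregation whenever a new report lands in S3 and pushes anomalies out through SNS.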

Tech Stack:
Lambda • S3 • CloudWatch • SNS • Python

Repository:
https://github.com/tuni56/AWS-Cost-Dashboard-Serverless-


Engineering Background

Background in distributed backend systems:

  • Java
  • Spring Boot
  • Microservices
  • Messaging systems

This experience influences how I design data platforms as production systems, not isolated pipelines.


Current Focus

  • AWS-native data platform architecture
  • Event-driven data systems
  • Infrastructure as Code for data platforms
  • Observability-driven pipeline design

Actively pursuing Data Engineer / Data Platform Engineer roles focused on modern data infrastructure.

Location

Argentina (GMT-3)

Open to:

  • Remote roles
  • Relocation to the United States

US Citizen


Contact

LinkedIn
https://www.linkedin.com/in/rociobaigorria/

Email
rociomnbaigorria@gmail.com


Engineering Philosophy

Data systems are not just pipelines.

They are living distributed systems that must handle:

  • failures
  • scale changes
  • operational pressure
  • human operators

My goal is to build data platforms where information flows reliably and teams can make decisions with confidence.



Pinned Repositories

  1. ecommerce-streaming-data-platform (Python): Real-time ecommerce streaming data platform using Kafka, AWS Route 53 routing, event-driven architecture, and observability with Grafana.

  2. serverless-aws-data-lake-with-kiro (Python): Cost-optimized serverless AWS data lake using S3, Glue, Athena, CloudFormation, and Kiro. Raw/curated architecture, Parquet, automated crawlers, and zero-idle compute.

  3. iot-data-architecture-aws: Cost-effective AWS architecture for ingesting, storing, and querying 5 years of IoT sensor data using a serverless data lake approach.

  4. AWS-Cost-Dashboard-Serverless- (Python): Serverless AWS cost dashboard.

  5. real-time-event-driven-data-pipeline (Java): Real-time event streaming pipeline with Kafka, Schema Registry, Kafka Streams, and production monitoring. Demonstrates advanced data engineering patterns at scale.

  6. datalake-analytics-pipeline (Python)