Data Engineer | SQL • Python • Kafka • AWS Data Platforms
I design and build AWS-based data platforms that process large-scale datasets using batch and real-time pipelines, improving data availability, reliability, and cost efficiency.
I specialize in end-to-end data engineering: ingestion, streaming, transformation, storage, and analytical data modeling. My focus is on building systems that work under real production constraints — not just isolated pipelines.
US Citizen — Open to Remote Roles and Relocation to the United States
I build production-style data platforms that tackle real-world data engineering challenges, including multi-source ingestion, streaming pipelines, and analytical data modeling.
My work emphasizes:
- Systems that handle failure gracefully
- Predictable scalability under growing data volumes
- Operational simplicity for engineering teams
Core principle:
Data platforms must be reliable under failure, scalable under growth, and simple enough for teams to operate in production.
- SQL for analytics and large-scale data transformation
- Python for data pipelines and automation
- ETL / ELT pipeline design
- Batch and streaming data processing
- Data modeling (Star Schema, Fact & Dimension tables)
- Incremental data processing
- Kafka event streaming
- Event-driven architectures
- Schema evolution and event versioning
- Reliable ingestion and decoupled systems
- S3 (data lake architecture)
- Glue (ETL + catalog)
- Athena (serverless analytics)
- Redshift (data warehouse)
- Lambda, DynamoDB
- Terraform (Infrastructure as Code)
- CloudFormation
- IAM (least-privilege design)
- Monitoring and observability (CloudWatch, Grafana)
- Failure handling and retry strategies
- Cost-aware architecture decisions
AWS Redshift Serverless — Analytical Data Platform
Designed an analytical data warehouse to support business reporting and handle evolving transactional data.
This system addresses challenges such as late-arriving data, incremental ingestion, and query performance optimization for analytics workloads.
Key Decisions
- Star schema modeling for analytical performance
- Incremental ingestion pipelines to reduce compute costs
- Strategy for handling late-arriving transactional data
- Infrastructure fully defined using Terraform
Architecture Highlights
- AWS Redshift Serverless
- Incremental batch pipelines
- Fact and dimension modeling
- Reproducible infrastructure
Tech Stack:
AWS Redshift • S3 • Terraform • SQL
Repository:
https://github.com/tuni56/ecommerce-data-warehouse-redshift
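The incremental-ingestion and late-arriving-data strategy above can be sketched as an upsert keyed on the business key, where a later source timestamp wins. This is a minimal illustration using SQLite standing in for Redshift (in Redshift the same idea is typically a staging table plus MERGE); table and column names are illustrative, not from the actual repository.

```python
import sqlite3

# Minimal sketch: incremental ingestion with late-arriving corrections.
# SQLite stands in for Redshift; names are illustrative.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE fact_orders (
        order_id   INTEGER PRIMARY KEY,   -- business key
        amount     REAL NOT NULL,
        updated_at TEXT NOT NULL          -- source-system timestamp
    )
""")

def upsert_batch(conn, rows):
    """Apply one incremental batch; the later updated_at wins."""
    conn.executemany(
        """
        INSERT INTO fact_orders (order_id, amount, updated_at)
        VALUES (?, ?, ?)
        ON CONFLICT(order_id) DO UPDATE SET
            amount     = excluded.amount,
            updated_at = excluded.updated_at
        WHERE excluded.updated_at > fact_orders.updated_at
        """,
        rows,
    )

# Initial load, then a batch carrying a late-arriving correction to order 1.
upsert_batch(conn, [(1, 100.0, "2024-01-01"), (2, 50.0, "2024-01-01")])
upsert_batch(conn, [(1, 120.0, "2024-01-02"), (3, 75.0, "2024-01-02")])

rows = conn.execute(
    "SELECT order_id, amount FROM fact_orders ORDER BY order_id"
).fetchall()
print(rows)  # [(1, 120.0), (2, 50.0), (3, 75.0)]
```

Only rows whose source timestamp is newer than the stored one are rewritten, which keeps reruns of the same batch idempotent and compute costs proportional to the batch size rather than the table size.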
AWS-native Analytics Data Lake
Designed a serverless data lake architecture to support scalable analytics across raw and curated datasets.
The system enables efficient querying using columnar storage and automated metadata discovery.
Key Decisions
- Serverless-first architecture to eliminate idle compute
- Columnar storage (Parquet) for performance optimization
- Automated schema discovery with Glue Crawlers
- Metadata catalog for data discoverability
Architecture Highlights
- Multi-layer S3 data lake (raw → processed → curated)
- AWS Glue catalog and ETL
- Athena for querying
- Infrastructure automation
Tech Stack:
S3 • Glue • Athena • Python • CloudFormation
Repository:
https://github.com/tuni56/serverless-aws-data-lake-with-kiro
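The multi-layer layout (raw → processed → curated) can be sketched as a key-naming convention: Hive-style partition keys (`year=`/`month=`/`day=`) are what let Glue crawlers infer partitions and Athena prune them at query time. Bucket-relative paths and dataset names below are illustrative, not from the actual repository.

```python
from datetime import date

# Minimal sketch of the multi-layer S3 key layout (raw -> processed ->
# curated) with Hive-style date partitions. Names are illustrative.
LAYERS = ("raw", "processed", "curated")

def object_key(layer: str, dataset: str, d: date, filename: str) -> str:
    """Build a partitioned S3 object key for one lake layer."""
    if layer not in LAYERS:
        raise ValueError(f"unknown layer: {layer}")
    return (
        f"{layer}/{dataset}/"
        f"year={d.year}/month={d.month:02d}/day={d.day:02d}/{filename}"
    )

key = object_key("processed", "orders", date(2024, 3, 7), "part-0000.parquet")
print(key)  # processed/orders/year=2024/month=03/day=07/part-0000.parquet
```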
Kafka Streaming Architecture
Built a real-time data pipeline to process high-velocity event streams with reliability and observability.
Designed to reflect production-grade streaming systems with decoupled components and resilient processing.
Key Decisions
- Event-driven architecture using Kafka
- Schema evolution with Schema Registry
- Monitoring-first system design
- Consumer reliability and fault tolerance strategies
Architecture Highlights
- Streaming ingestion pipeline
- Event routing and processing
- Observability dashboards
- Resilient message handling
Tech Stack:
Kafka • Python • Redis • Grafana • Terraform
Repository:
https://github.com/tuni56/real-time-event-driven-data-pipeline
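The consumer-reliability strategy above can be sketched as bounded retries with exponential backoff, after which the event is parked in a dead-letter queue so the main stream keeps flowing. Plain Python stands in for a Kafka consumer here; the handler and queue names are illustrative, not from the actual repository.

```python
import time

# Minimal sketch: bounded retries with exponential backoff, then dead-letter
# routing. A plain list stands in for a Kafka dead-letter topic.
def process_with_retries(event, handler, dead_letters,
                         max_attempts=3, base_delay=0.01):
    for attempt in range(1, max_attempts + 1):
        try:
            return handler(event)
        except Exception:
            if attempt == max_attempts:
                dead_letters.append(event)  # give up; park for inspection
                return None
            time.sleep(base_delay * 2 ** (attempt - 1))  # exponential backoff

def flaky_handler(event):
    """Illustrative handler that rejects malformed events."""
    if event.get("bad"):
        raise ValueError("unprocessable event")
    return event["value"] * 2

dlq = []
ok = process_with_retries({"value": 21}, flaky_handler, dlq)
process_with_retries({"bad": True}, flaky_handler, dlq)
print(ok, len(dlq))  # 42 1
```

Keeping the retry budget small and the dead-letter path explicit is what makes failure handling observable: poisoned events surface in one place instead of silently stalling a partition.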
Scalable Sensor Data Ingestion
Designed a scalable architecture to ingest IoT sensor data and retain it long term while maintaining cost efficiency and queryability.
Key Decisions
- Storage lifecycle optimization
- Serverless ingestion design
- Long-term data retention strategy
- Queryable historical datasets
Focus Areas
- Scalability
- Cost optimization
- Long-term data management
Tech Stack:
AWS Serverless • Kafka • Data Lake Architecture
Repository:
https://github.com/tuni56/iot-data-architecture-aws
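The storage-lifecycle idea above can be sketched as a tiering rule: older sensor data moves to cheaper storage classes while staying queryable. The tier names mirror S3 storage classes, but the age cutoffs here are illustrative assumptions, not values from the actual repository.

```python
# Minimal sketch of lifecycle tiering for long-term sensor data.
# Cutoffs (in days) are illustrative; tier names mirror S3 storage classes.
TIERS = [
    (30,  "STANDARD"),      # hot: recent data, frequent queries
    (90,  "STANDARD_IA"),   # warm: occasional access
    (365, "GLACIER_IR"),    # cold: rare access, still queryable
]

def storage_class(age_days: int) -> str:
    """Pick a storage tier for an object based on its age."""
    for cutoff, tier in TIERS:
        if age_days < cutoff:
            return tier
    return "DEEP_ARCHIVE"   # archival retention beyond one year

print(storage_class(10), storage_class(200), storage_class(400))
# STANDARD GLACIER_IR DEEP_ARCHIVE
```

In practice this rule lives in an S3 lifecycle configuration rather than application code, but the cost model is the same: pay hot-storage prices only for the data that is actually queried often.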
Operational Data Pipeline for Cloud Cost Monitoring
Built an automated data pipeline to process AWS Cost & Usage Reports and generate near real-time operational insights.
Highlights
- Automated cost data ingestion
- Event-driven processing
- Monitoring-first design
- Near real-time cost visibility
Tech Stack:
Lambda • S3 • CloudWatch • SNS • Python
Repository:
https://github.com/tuni56/AWS-Cost-Dashboard-Serverless-
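The processing step can be sketched as a roll-up of Cost & Usage Report line items into per-service spend. The column names follow the CUR convention (`lineItem/ProductCode`, `lineItem/UnblendedCost`), but the sample rows are fabricated for illustration and the function name is hypothetical.

```python
import csv
import io
from collections import defaultdict

# Minimal sketch: aggregate CUR line items into per-service spend.
# Sample rows are fabricated for illustration.
SAMPLE_CUR = """lineItem/ProductCode,lineItem/UnblendedCost
AmazonS3,0.12
AWSLambda,0.03
AmazonS3,0.30
"""

def cost_by_service(report_csv: str) -> dict:
    """Sum unblended cost per AWS service from a CUR-style CSV."""
    totals = defaultdict(float)
    for row in csv.DictReader(io.StringIO(report_csv)):
        totals[row["lineItem/ProductCode"]] += float(row["lineItem/UnblendedCost"])
    return dict(totals)

totals = cost_by_service(SAMPLE_CUR)
print(round(totals["AmazonS3"], 2))  # 0.42
```

In the pipeline itself, the same aggregation runs in Lambda against reports delivered to S3, with totals pushed to CloudWatch metrics for near real-time visibility.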
Background in distributed backend systems:
- Java
- Spring Boot
- Microservices
- Messaging systems
This experience influences how I design data platforms as production systems, not isolated pipelines.
- AWS-native data platform architecture
- Event-driven data systems
- Infrastructure as Code for data platforms
- Observability-driven pipeline design
Actively pursuing Data Engineer / Data Platform Engineer roles focused on modern data infrastructure.
Argentina (GMT-3)
Open to:
- Remote roles
- Relocation to the United States
US Citizen
LinkedIn
https://www.linkedin.com/in/rociobaigorria/
Email
rociomnbaigorria@gmail.com
Data systems are not just pipelines.
They are living distributed systems that must handle:
- failures
- scale changes
- operational pressure
- human operators
My goal is to build data platforms where information flows reliably and teams can make decisions with confidence.

