Data Engineer | SQL • Python • Kafka • AWS Data Platforms
I design and build AWS-based data platforms that process large-scale datasets using batch and real-time pipelines, improving data availability, reliability, and cost efficiency.
I specialize in end-to-end data engineering: ingestion, streaming, transformation, storage, and analytical data modeling. My focus is on building systems that work under real production constraints — not just isolated pipelines.
US Citizen — Open to Remote Roles and Relocation to the United States
I build production-style data platforms that tackle real-world data engineering challenges, including multi-source ingestion, streaming pipelines, and analytical data modeling.
My work emphasizes:
- Systems that handle failure gracefully
- Predictable scalability under growing data volumes
- Operational simplicity for engineering teams
Core principle:
Data platforms must be reliable under failure, scalable under growth, and simple enough for teams to operate in production.
- SQL for analytics and large-scale data transformation
- Python for data pipelines and automation
- ETL / ELT pipeline design
- Batch and streaming data processing
- Data modeling (Star Schema, Fact & Dimension tables)
- Incremental data processing
- Kafka event streaming
- Event-driven architectures
- Schema evolution and event versioning
- Reliable ingestion and decoupled systems
- S3 (data lake architecture)
- Glue (ETL + catalog)
- Athena (serverless analytics)
- Redshift (data warehouse)
- Lambda, DynamoDB
- Terraform (Infrastructure as Code)
- CloudFormation
- IAM (least-privilege design)
- Monitoring and observability (CloudWatch, Grafana)
- Failure handling and retry strategies
- Cost-aware architecture decisions
AWS Redshift Serverless — Analytical Data Platform
Designed an analytical data warehouse to support business reporting and handle evolving transactional data.
This system addresses challenges such as late-arriving data, incremental ingestion, and query performance optimization for analytics workloads.
Key Decisions
- Star schema modeling for analytical performance
- Incremental ingestion pipelines to reduce compute costs
- Strategy for handling late-arriving transactional data
- Infrastructure fully defined using Terraform
Architecture Highlights
- AWS Redshift Serverless
- Incremental batch pipelines
- Fact and dimension modeling
- Reproducible infrastructure
Tech Stack:
AWS Redshift • S3 • Terraform • SQL
Repository:
https://github.com/tuni56/ecommerce-data-warehouse-redshift
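The incremental-ingestion and late-arriving-data strategy above can be sketched as an upsert keyed on the business key, where a later source timestamp wins. This is a minimal illustration using SQLite standing in for Redshift (in Redshift the same idea is typically a staging table plus MERGE); table and column names are illustrative, not from the actual repository.

```python
import sqlite3

# Minimal sketch: incremental ingestion with late-arriving corrections.
# SQLite stands in for Redshift; names are illustrative.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE fact_orders (
        order_id   INTEGER PRIMARY KEY,   -- business key
        amount     REAL NOT NULL,
        updated_at TEXT NOT NULL          -- source-system timestamp
    )
""")

def upsert_batch(conn, rows):
    """Apply one incremental batch; the later updated_at wins."""
    conn.executemany(
        """
        INSERT INTO fact_orders (order_id, amount, updated_at)
        VALUES (?, ?, ?)
        ON CONFLICT(order_id) DO UPDATE SET
            amount     = excluded.amount,
            updated_at = excluded.updated_at
        WHERE excluded.updated_at > fact_orders.updated_at
        """,
        rows,
    )

# Initial load, then a batch carrying a late-arriving correction to order 1.
upsert_batch(conn, [(1, 100.0, "2024-01-01"), (2, 50.0, "2024-01-01")])
upsert_batch(conn, [(1, 120.0, "2024-01-02"), (3, 75.0, "2024-01-02")])

rows = conn.execute(
    "SELECT order_id, amount FROM fact_orders ORDER BY order_id"
).fetchall()
print(rows)  # [(1, 120.0), (2, 50.0), (3, 75.0)]
```

Only rows whose source timestamp is newer than the stored one are rewritten, which keeps reruns of the same batch idempotent and compute costs proportional to the batch size rather than the table size.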
AWS-native Analytics Data Lake
Designed a serverless data lake architecture to support scalable analytics across raw and curated datasets.
The system enables efficient querying using columnar storage and automated metadata discovery.
Key Decisions
- Serverless-first architecture to eliminate idle compute
- Columnar storage (Parquet) for performance optimization
- Automated schema discovery with Glue Crawlers
- Metadata catalog for data discoverability
Architecture Highlights
- Multi-layer S3 data lake (raw → processed → curated)
- AWS Glue catalog and ETL
- Athena for querying
- Infrastructure automation
Tech Stack:
S3 • Glue • Athena • Python • CloudFormation
Repository:
https://github.com/tuni56/serverless-aws-data-lake-with-kiro
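The multi-layer layout (raw → processed → curated) can be sketched as a key-naming convention: Hive-style partition keys (`year=`/`month=`/`day=`) are what let Glue crawlers infer partitions and Athena prune them at query time. Bucket-relative paths and dataset names below are illustrative, not from the actual repository.

```python
from datetime import date

# Minimal sketch of the multi-layer S3 key layout (raw -> processed ->
# curated) with Hive-style date partitions. Names are illustrative.
LAYERS = ("raw", "processed", "curated")

def object_key(layer: str, dataset: str, d: date, filename: str) -> str:
    """Build a partitioned S3 object key for one lake layer."""
    if layer not in LAYERS:
        raise ValueError(f"unknown layer: {layer}")
    return (
        f"{layer}/{dataset}/"
        f"year={d.year}/month={d.month:02d}/day={d.day:02d}/{filename}"
    )

key = object_key("processed", "orders", date(2024, 3, 7), "part-0000.parquet")
print(key)  # processed/orders/year=2024/month=03/day=07/part-0000.parquet
```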
Kafka Streaming Architecture
Built a real-time data pipeline to process high-velocity event streams with reliability and observability.
Designed to reflect production-grade streaming systems with decoupled components and resilient processing.
Key Decisions
- Event-driven architecture using Kafka
- Schema evolution with Schema Registry
- Monitoring-first system design
- Consumer reliability and fault tolerance strategies
Architecture Highlights
- Streaming ingestion pipeline
- Event routing and processing
- Observability dashboards
- Resilient message handling
Tech Stack:
Kafka • Python • Redis • Grafana • Terraform
Repository:
https://github.com/tuni56/real-time-event-driven-data-pipeline
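The consumer-reliability strategy above can be sketched as bounded retries with exponential backoff, after which the event is parked in a dead-letter queue so the main stream keeps flowing. Plain Python stands in for a Kafka consumer here; the handler and queue names are illustrative, not from the actual repository.

```python
import time

# Minimal sketch: bounded retries with exponential backoff, then dead-letter
# routing. A plain list stands in for a Kafka dead-letter topic.
def process_with_retries(event, handler, dead_letters,
                         max_attempts=3, base_delay=0.01):
    for attempt in range(1, max_attempts + 1):
        try:
            return handler(event)
        except Exception:
            if attempt == max_attempts:
                dead_letters.append(event)  # give up; park for inspection
                return None
            time.sleep(base_delay * 2 ** (attempt - 1))  # exponential backoff

def flaky_handler(event):
    """Illustrative handler that rejects malformed events."""
    if event.get("bad"):
        raise ValueError("unprocessable event")
    return event["value"] * 2

dlq = []
ok = process_with_retries({"value": 21}, flaky_handler, dlq)
process_with_retries({"bad": True}, flaky_handler, dlq)
print(ok, len(dlq))  # 42 1
```

Keeping the retry budget small and the dead-letter path explicit is what makes failure handling observable: poisoned events surface in one place instead of silently stalling a partition.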
Scalable Sensor Data Ingestion
Designed a scalable architecture to ingest IoT sensor data and retain it long term while maintaining cost efficiency and queryability.
Key Decisions
- Storage lifecycle optimization
- Serverless ingestion design
- Long-term data retention strategy
- Queryable historical datasets
Focus Areas
- Scalability
- Cost optimization
- Long-term data management
Tech Stack:
AWS Serverless • Kafka • Data Lake Architecture
Repository:
https://github.com/tuni56/iot-data-architecture-aws
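The storage-lifecycle idea above can be sketched as a tiering rule: older sensor data moves to cheaper storage classes while staying queryable. The tier names mirror S3 storage classes, but the age cutoffs here are illustrative assumptions, not values from the actual repository.

```python
# Minimal sketch of lifecycle tiering for long-term sensor data.
# Cutoffs (in days) are illustrative; tier names mirror S3 storage classes.
TIERS = [
    (30,  "STANDARD"),      # hot: recent data, frequent queries
    (90,  "STANDARD_IA"),   # warm: occasional access
    (365, "GLACIER_IR"),    # cold: rare access, still queryable
]

def storage_class(age_days: int) -> str:
    """Pick a storage tier for an object based on its age."""
    for cutoff, tier in TIERS:
        if age_days < cutoff:
            return tier
    return "DEEP_ARCHIVE"   # archival retention beyond one year

print(storage_class(10), storage_class(200), storage_class(400))
# STANDARD GLACIER_IR DEEP_ARCHIVE
```

In practice this rule lives in an S3 lifecycle configuration rather than application code, but the cost model is the same: pay hot-storage prices only for the data that is actually queried often.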
Operational Data Pipeline for Cloud Cost Monitoring
Built an automated data pipeline to process AWS Cost & Usage Reports and generate near real-time operational insights.
Highlights
- Automated cost data ingestion
- Event-driven processing
- Monitoring-first design
- Near real-time cost visibility
Tech Stack:
Lambda • S3 • CloudWatch • SNS • Python
Repository:
https://github.com/tuni56/AWS-Cost-Dashboard-Serverless-
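The processing step can be sketched as a roll-up of Cost & Usage Report line items into per-service spend. The column names follow the CUR convention (`lineItem/ProductCode`, `lineItem/UnblendedCost`), but the sample rows are fabricated for illustration and the function name is hypothetical.

```python
import csv
import io
from collections import defaultdict

# Minimal sketch: aggregate CUR line items into per-service spend.
# Sample rows are fabricated for illustration.
SAMPLE_CUR = """lineItem/ProductCode,lineItem/UnblendedCost
AmazonS3,0.12
AWSLambda,0.03
AmazonS3,0.30
"""

def cost_by_service(report_csv: str) -> dict:
    """Sum unblended cost per AWS service from a CUR-style CSV."""
    totals = defaultdict(float)
    for row in csv.DictReader(io.StringIO(report_csv)):
        totals[row["lineItem/ProductCode"]] += float(row["lineItem/UnblendedCost"])
    return dict(totals)

totals = cost_by_service(SAMPLE_CUR)
print(round(totals["AmazonS3"], 2))  # 0.42
```

In the pipeline itself, the same aggregation runs in Lambda against reports delivered to S3, with totals pushed to CloudWatch metrics for near real-time visibility.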
Background in distributed backend systems:
- Java
- Spring Boot
- Microservices
- Messaging systems
This experience influences how I design data platforms as production systems, not isolated pipelines.
- AWS-native data platform architecture
- Event-driven data systems
- Infrastructure as Code for data platforms
- Observability-driven pipeline design
Actively pursuing Data Engineer / Data Platform Engineer roles focused on modern data infrastructure.
Argentina (GMT-3)
Open to:
- Remote roles
- Relocation to the United States
US Citizen
LinkedIn
https://www.linkedin.com/in/rociobaigorria/
Email
rociomnbaigorria@gmail.com
Data systems are not just pipelines.
They are living distributed systems that must handle:
- failures
- scale changes
- operational pressure
- human operators
My goal is to build data platforms where information flows reliably and teams can make decisions with confidence.

