This repository was archived by the owner on Mar 24, 2026. It is now read-only.


Go File Processor

Parallel and resilient processing of massive files with Worker Pool in Go.




Note: Archived Project
This was my second major project in Go, built as a deep dive into the language's idiomatic concurrency patterns and high-performance I/O. It is now archived, but it remains a solid reference for ETL (Extract, Transform, Load) implementations in Go.

Go File Processor is a high-performance command-line tool and library designed to efficiently convert massive CSV files (millions of records) into structured JSON. It demonstrates the power of Go's concurrency primitives to achieve maximum throughput with minimal memory overhead.

🚀 Core Learning Objectives

This project was a hands-on laboratory to master several Go concepts:

  • Concurrency via Worker Pool: Leveraging goroutines and channels to process data in parallel without overwhelming the system.
  • Memory Efficiency (Streaming): Using io.Reader and io.Writer to process gigabytes of data with a constant, tiny memory footprint.
  • The Middleware Pattern: Implementing a "Chain of Responsibility" for data transformation that is both flexible and type-safe.
  • Atomic Operations: Using sync/atomic for high-speed metrics tracking, avoiding the overhead of mutexes.
  • Idiomatic Project Layout: Following standard Go folder structures (cmd/, internal/) and build automation with Makefile.

Demonstration

As a Library

```go
// Create a streaming CSV-to-JSON processor with 8 parallel workers.
proc := processor.NewCSVToJSONProcessor()
config := processor.Config{WorkerCount: 8}

// Fluent transformation chain: keep company emails, then mask the field.
config.AddTransformer(processor.EmailFilter(`@company.com$`))
config.AddTransformer(processor.FieldMasker("email"))

metrics, err := proc.Process("input.csv", "output.json", config)
if err != nil {
	log.Fatalf("processing failed: %v", err)
}
log.Printf("metrics: %+v", metrics)
```

As a CLI

```bash
./fileproc -input data.csv -output data.json -workers 4
```

Tech Stack & Architecture

| Technology | What I Learned |
| --- | --- |
| Worker Pool | How to orchestrate multiple goroutines for parallel work. |
| Channels | Managing safe communication and backpressure between stages. |
| Streaming I/O | Processing files record-by-record instead of loading everything into RAM. |
| Atomic Counters | Implementing thread-safe counters with maximum performance. |
| Structured Logs | Using `slog` for modern, machine-readable observability. |

Pipeline Flow

The system uses a streaming model to maintain low memory usage:

Input CSV → Producer → Job Channel → [Workers + Transformers] → Result Channel → Consumer → Output JSON

Makefile Targets

| Target | Description |
| --- | --- |
| `make build` | Compiles the `fileproc` binary. |
| `make test` | Runs the full unit test suite. |
| `make bench` | Runs benchmarks comparing parallel vs. sequential throughput. |
| `make generate-data` | Generates a 100k-row test file for performance testing. |

📚 Final Thoughts

Building this project taught me that Go isn't just about syntax; it's about a philosophy of simplicity and performance. The transition from sequential processing to a parallel worker pool showed me how Go empowers developers to build tools that scale effortlessly.


Author

Enoque Sousa

LinkedIn GitHub Portfolio


Made with ❤️ by Enoque Sousa

Project Status: Archived — Educational Milestone
