Feat/low memory ingestion by koenvo · Pull Request #61 · PySport/ingestify

koenvo · 2026-03-17T10:14:05Z

No description provided.

Replace BytesIO with BufferedStream (SpooledTemporaryFile, 5MB threshold) throughout the fetch and store pipeline to avoid loading large files fully into memory.

- Add BufferedStream to utils: stays in memory up to 5MB, then spills to disk
- Stream the HTTP response body via iter_content (1MB chunks) into BufferedStream, hashing on the fly, instead of loading the full body into memory through response.content
- Add an http_decompress=True option to retrieve_http: stream-decompresses gzip content (e.g. .json.gz from S3) so it is not double-compressed on store
- Use BufferedStream in _prepare_write_stream and _prepare_read_stream for compression/decompression in the store
- Type DraftFile.stream as BufferedStream with a coercing validator for backwards compatibility (accepts BytesIO, bytes, or any readable)
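A minimal sketch of the buffering described above, assuming BufferedStream behaves like a plain tempfile.SpooledTemporaryFile with a 5MB max_size. The helper names (make_buffered_stream, spool_chunks, coerce_to_stream), the constants, and the choice of SHA-1 as the hash are illustrative assumptions, not the PR's actual code. The coercion that the PR attaches to DraftFile.stream as a validator is shown here as a plain function.

```python
import hashlib
import shutil
import tempfile
from typing import Iterable, Tuple

# Values from the PR description; the constant names are hypothetical.
SPOOL_THRESHOLD = 5 * 1024 * 1024  # keep up to 5MB in memory
CHUNK_SIZE = 1024 * 1024  # 1MB chunks, as passed to iter_content


def make_buffered_stream() -> tempfile.SpooledTemporaryFile:
    """In-memory buffer that rolls over to a temp file past SPOOL_THRESHOLD."""
    return tempfile.SpooledTemporaryFile(max_size=SPOOL_THRESHOLD, mode="w+b")


def spool_chunks(
    chunks: Iterable[bytes],
) -> Tuple[tempfile.SpooledTemporaryFile, str, int]:
    """Copy chunks into a spooled buffer, hashing and counting on the fly."""
    stream = make_buffered_stream()
    digest = hashlib.sha1()  # hash algorithm is an assumption
    size = 0
    for chunk in chunks:
        if not chunk:  # iter_content can yield empty keep-alive chunks
            continue
        digest.update(chunk)
        stream.write(chunk)
        size += len(chunk)
    stream.seek(0)  # rewind so the caller reads from the start
    return stream, digest.hexdigest(), size


def coerce_to_stream(value: object) -> tempfile.SpooledTemporaryFile:
    """Backwards-compatible coercion: accept bytes, BytesIO or any readable."""
    if isinstance(value, tempfile.SpooledTemporaryFile):
        return value
    stream = make_buffered_stream()
    if isinstance(value, (bytes, bytearray)):
        stream.write(value)
    elif hasattr(value, "read"):
        shutil.copyfileobj(value, stream, CHUNK_SIZE)
    else:
        raise TypeError(f"cannot coerce {type(value).__name__} to a stream")
    stream.seek(0)
    return stream


# Typical use with requests (needs network, so not run here):
#   with requests.get(url, stream=True) as response:
#       response.raise_for_status()
#       stream, digest, size = spool_chunks(
#           response.iter_content(chunk_size=CHUNK_SIZE)
#       )
```

Because the hash is updated per chunk, peak memory stays near one chunk plus the in-memory part of the spool, however large the response is.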

Replace BytesIO with BufferedStream (SpooledTemporaryFile, 5MB threshold) throughout the fetch and store pipeline to avoid loading large files into memory.

- Add BufferedStream to utils: stays in memory up to 5MB, then spills to disk
- Stream the HTTP response via iter_content (1MB chunks) into BufferedStream, hashing on the fly, instead of loading the full body into memory through response.content
- Detect gzip content (magic bytes) once in retrieve_http and store the result as DraftFile.content_compression_method, so the stream is not re-read later
- Store gzip files as-is (no recompression CPU cost); read the size from the gzip trailer so file.size always reflects the uncompressed data size
- _prepare_write_stream uses content_compression_method to skip compression for already-compressed files, and returns the compression_method actually used so File metadata is always correct
- Type DraftFile.stream as BufferedStream with a coercing validator for backwards compatibility (accepts BytesIO, bytes, or any readable)
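The write-side decision could look roughly like this. prepare_write_stream and its store_compression parameter are hypothetical stand-ins for _prepare_write_stream and the store's configured compression; the sketch only illustrates the idea from the commit message: skip compression when content_compression_method is already set, and return the method actually applied so metadata matches the stored bytes.

```python
import gzip
import shutil
import tempfile
from typing import BinaryIO, Optional, Tuple

SPOOL_THRESHOLD = 5 * 1024 * 1024  # hypothetical name for the 5MB threshold


def prepare_write_stream(
    stream: BinaryIO,
    content_compression_method: Optional[str],
    store_compression: Optional[str] = "gzip",
) -> Tuple[BinaryIO, Optional[str]]:
    """Return (stream to persist, compression method actually applied)."""
    stream.seek(0)
    if content_compression_method is not None or store_compression is None:
        # Already compressed (e.g. a .json.gz from S3), or the store does not
        # compress: persist the bytes as-is and report what they really are.
        return stream, content_compression_method
    if store_compression != "gzip":
        raise ValueError(f"unsupported compression: {store_compression}")
    out = tempfile.SpooledTemporaryFile(max_size=SPOOL_THRESHOLD, mode="w+b")
    # Closing the GzipFile writes the trailer but leaves `out` open.
    with gzip.GzipFile(fileobj=out, mode="wb") as gz:
        shutil.copyfileobj(stream, gz, 1024 * 1024)
    out.seek(0)
    return out, "gzip"
```

Returning the method alongside the stream means the caller never has to guess whether the stored bytes are compressed.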

- Extract detect_compression() and gzip_uncompressed_size() into utils
- Set DraftFile.content_compression_method once on fetch, for use by the store
- Add tests for detect_compression, gzip_uncompressed_size, and http fetch
- Format with black
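The PR names detect_compression() and gzip_uncompressed_size(), but their signatures are not shown, so the ones below are a guess. A sketch, assuming both take a seekable binary stream and restore its position afterwards: gzip data starts with the magic bytes 0x1f 0x8b, and RFC 1952 stores the uncompressed length (ISIZE) in the last four bytes of a member, little-endian.

```python
import io
import struct
from typing import BinaryIO, Optional

GZIP_MAGIC = b"\x1f\x8b"


def detect_compression(stream: BinaryIO) -> Optional[str]:
    """Return "gzip" if the stream starts with the gzip magic bytes, else None."""
    position = stream.tell()
    header = stream.read(2)
    stream.seek(position)  # leave the stream where we found it
    return "gzip" if header == GZIP_MAGIC else None


def gzip_uncompressed_size(stream: BinaryIO) -> int:
    """Read ISIZE from the gzip trailer: last 4 bytes, unsigned little-endian."""
    position = stream.tell()
    stream.seek(-4, io.SEEK_END)
    (size,) = struct.unpack("<I", stream.read(4))
    stream.seek(position)
    return size
```

ISIZE is the size modulo 2^32 and only describes the last member of a multi-member file, so for gzip files above 4 GiB uncompressed, or concatenated gzip members, it is not a reliable uncompressed size.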

koenvo added 3 commits March 12, 2026 17:46

koenvo merged commit 3bc8373 into main Mar 17, 2026
12 checks passed

koenvo deleted the feat/low-memory-ingestion branch March 17, 2026 10:48


Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

koenvo commented Mar 17, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant