Skip to content

Stream template build-context upload from disk instead of buffering in memory#1435

Open
mishushakov wants to merge 8 commits into
mainfrom
mishushakov/stream-template-build-upload
Open

Stream template build-context upload from disk instead of buffering in memory#1435
mishushakov wants to merge 8 commits into
mainfrom
mishushakov/stream-template-build-upload

Conversation

@mishushakov

@mishushakov mishushakov commented Jun 12, 2026

Copy link
Copy Markdown
Member

Summary

Template builds previously buffered the entire gzipped build-context tar archive in memory before uploading it. This PR spools the archive to a temporary file and streams it from disk during upload — in the JS SDK and both sync and async Python SDKs — so memory usage no longer scales with the size of the build context.

The upload keeps an explicit Content-Length header (taken from the spooled file's size), which S3 presigned PUT URLs require — they reject Transfer-Encoding: chunked with 501 NotImplemented (#1243).

Changes

  • JS (packages/js-sdk/src/template/): tarFileStream/tarFileStreamUpload are replaced by tarFileToStream, which writes the archive to a temp file and returns a self-cleaning read stream plus its size. The spooled temp file deletes itself once the stream is closed (consumed, errored, or destroyed) via the stream's close event — mirroring the Python SDK's tar_file_stream. buildApi streams this body with duplex: 'half' and an explicit Content-Length from size; if fetch throws before consuming the body, it destroys the stream to trigger the same cleanup. There is no separate cleanup callback, so a cleanup failure can no longer mask the upload result.
  • Python (packages/python-sdk/e2b/template/utils.py, template_async/build_api.py, template_sync/build_api.py): tar_file_stream now writes to a tempfile.TemporaryFile instead of io.BytesIO and returns the file object positioned at the start; the upload streams from it with an explicit Content-Length and closes it (deleting the temp file) when done.
  • Tests updated for the new return shapes (JS tarFileToStream.test.ts, uploadFile.test.ts; Python upload/tar tests), including assertions that the spooled archive is removed on both the consume and destroy paths.

Usage

No API changes — Template.build() / template builds behave the same, just without holding the build context in memory:

await Template.build(template, { alias: 'my-template' })
Template.build(template, alias="my-template")

Split out of #1433, which covers streaming for sandbox/volume file uploads and downloads.

🤖 Generated with Claude Code

…uffering in memory

Spool the build-context tar archive to a temporary file and stream it
from disk during upload (JS and Python, sync and async), keeping the
explicit Content-Length required by S3 presigned PUT URLs, which reject
Transfer-Encoding: chunked with 501 NotImplemented (#1243).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
@cla-bot cla-bot Bot added the cla-signed label Jun 12, 2026
@changeset-bot

changeset-bot Bot commented Jun 12, 2026

Copy link
Copy Markdown

🦋 Changeset detected

Latest commit: e353629

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 2 packages
Name Type
e2b Patch
@e2b/python-sdk Patch

Not sure what this means? Click here to learn what changesets are.

Click here if you're a maintainer who wants to add another changeset to this PR

@cursor

cursor Bot commented Jun 12, 2026

Copy link
Copy Markdown

PR Summary

Medium Risk
Changes core template build upload paths in both SDKs; behavior is intended to be equivalent but relies on streaming fetch/httpx and temp-file lifecycle, with added regression tests for upload headers and cleanup error handling.

Overview
Template build uploads no longer hold the full gzipped build-context tar in RAM. JS and Python (sync and async) now spool the archive to a temp file, upload from disk with an explicit Content-Length (still required for S3 presigned PUTs that reject chunked encoding), and remove the spool when the read stream finishes or is destroyed.

JS: tarFileStream writes to disk and returns { stream, size }; uploadFile uses Readable.toWeb, Content-Length, and duplex: 'half', and destroys the stream on failure so temp dirs are cleaned up. The old in-memory buffer path and tarFileStreamUpload wrapper are gone.

Python: tar_file_stream uses tempfile.TemporaryFile instead of BytesIO; uploads stream the file (async via aiter_io_chunks + Content-Length). Temp-file close/cleanup is best-effort so a cleanup error after a successful upload does not surface as FileUploadException.

Tests cover Content-Length/no-chunked contract, temp cleanup on consume/destroy, and post-upload close failures.

Reviewed by Cursor Bugbot for commit e353629. Bugbot is set up for automated code reviews on this repo. Configure here.

@github-actions

github-actions Bot commented Jun 12, 2026

Copy link
Copy Markdown
Contributor

Package Artifacts

Built from 534aa69. Download artifacts from this workflow run.

JS SDK (e2b@2.30.6-mishushakov-stream-template-build-upload.0):

npm install ./e2b-2.30.6-mishushakov-stream-template-build-upload.0.tgz

CLI (@e2b/cli@2.12.3-mishushakov-stream-template-build-upload.0):

npm install ./e2b-cli-2.12.3-mishushakov-stream-template-build-upload.0.tgz

Python SDK (e2b==2.29.5+mishushakov-stream-template-build-upload):

pip install ./e2b-2.29.5+mishushakov.stream.template.build.upload-py3-none-any.whl

@cursor cursor Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Fix All in Cursor

❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.

Reviewed by Cursor Bugbot for commit 4c78c2b. Configure here.

Comment thread packages/js-sdk/src/template/buildApi.ts Outdated
mishushakov and others added 2 commits June 23, 2026 15:31
After S3 accepts the archive, the temp-file cleanup running in the
upload's `finally` block could throw (e.g. removal fails while a read
stream still holds the file). That error escaped into the outer catch
and was wrapped as a FileUploadError/FileUploadException, reporting a
false failure for an upload that already succeeded — and on a failed
upload it would overwrite the real error.

Make cleanup best-effort in all three implementations (JS, Python sync
and async) so it can never alter the upload result. Add regression tests
covering a cleanup/close failure after a 200 response.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Fixes the Lint and Generated-files CI checks, which run `pnpm run
format` and fail on any resulting diff.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@mishushakov mishushakov marked this pull request as ready for review June 23, 2026 13:44
Comment thread packages/js-sdk/src/template/utils.ts
Comment thread packages/js-sdk/tests/template/uploadFile.test.ts
mishushakov and others added 4 commits June 23, 2026 21:31
Address Claude bot review on PR #1435:

- In tarFileToTempFile, the failure-path `catch` did `await cleanup();
  throw err`. If cleanup() rejected (e.g. EBUSY/EPERM on Windows while
  the tar process still holds a handle), it short-circuited and the
  original archive-creation error was lost. Use the same best-effort
  `await cleanup().catch(() => {})` pattern as uploadFile.
- Update the stale uploadFile test header comment that still described
  the old in-memory buffering instead of the spool-and-stream approach.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Mirror the JS tarFileToTempFile fix in the shared tar_file_stream util
(used by both sync and async). On the archive-creation failure path,
`tar_file.close()` running before `raise` could itself throw (e.g. an
OSError flushing the partially-written archive) and propagate as the
primary exception instead of the genuine archive-creation error. Wrap
the close in best-effort try/except so the real error is preserved.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
… Python)

Replace tarFileToTempFile ({ path, size, cleanup }) with tarFileToStream
({ stream, size }), mirroring the Python SDK's tar_file_stream: the
spooled archive deletes itself once the returned stream is closed
(consumed, errored, or destroyed) via its `close` event.

uploadFile no longer manages cleanup — there is no cleanup callback or
`finally` in the upload path, so a cleanup failure can no longer mask the
upload result. To preserve guaranteed removal when fetch never consumes
the body (e.g. it throws before reading), uploadFile destroys the stream
in its catch, which triggers the same `close`-based cleanup.

Node has no portable delete-on-close (Python's TemporaryFile uses
O_TEMPORARY on Windows / immediate unlink on POSIX), so cleanup is tied to
the stream lifecycle instead. Tests rewritten to consume the stream and
assert the spooled archive is removed on both consume and destroy paths.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Rename tarFileToStream -> tarFileStream to match the prior naming and the
Python SDK's tar_file_stream, and the local upload variable stream ->
uploadStream for clarity. No behavior change.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant