WIP - feat(storage): add snapshot/fork benchmark mode by kisernl · Pull Request #145 · computesdk/benchmarks

kisernl · 2026-06-24T15:40:45Z

Add a new snapshot-fork benchmark mode that measures snapshot creation, fork-from-snapshot, and fork-from-live timings against the storage providers, with read-back verification and best-effort teardown of every created object/snapshot/fork.

- src/storage/snapshot-fork-benchmark.ts: per-iteration seed -> snapshot -> fork(snapshot) -> fork(live) -> verify -> cleanup
- src/storage/snapshot-fork-types.ts: small/wide/deep dataset presets to separate per-object overhead from bytes-copied cost
- src/storage/stats.ts: extract shared median/p95/p99 helpers out of benchmark.ts and reuse in both storage benchmarks
- src/run.ts: wire up --mode snapshot-fork and --dataset, results output
- package.json: bench:snapshot-fork scripts
- CI: PR-only snapshot-fork job (single iteration, small dataset) across aws-s3, cloudflare-r2, tigris; skips gracefully without secrets
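The per-iteration flow listed above (seed -> snapshot -> fork(snapshot) -> fork(live) -> verify -> cleanup) could look roughly like the sketch below. The `SnapshotForkProvider` interface, its method names, and `runIteration` are all hypothetical names introduced for illustration; the actual adapter API in this PR is not shown in the conversation and may differ.

```typescript
// Hypothetical provider surface; the real adapter API in this PR may differ.
interface SnapshotForkProvider {
  seed(): Promise<string>;                          // returns a source bucket id
  snapshot(source: string): Promise<string>;        // returns a snapshot id
  forkFromSnapshot(snapshot: string): Promise<string>;
  forkFromLive(source: string): Promise<string>;
  verify(fork: string): Promise<boolean>;           // read-back check
  remove(id: string): Promise<void>;
}

interface IterationTimings {
  snapshotMs: number;
  forkSnapshotMs: number;
  forkLiveMs: number;
  verified: boolean;
}

// Times one async step. Date.now() keeps the sketch dependency-free;
// a real benchmark would likely prefer a higher-resolution clock.
async function timed<T>(fn: () => Promise<T>): Promise<[T, number]> {
  const start = Date.now();
  const value = await fn();
  return [value, Date.now() - start];
}

async function runIteration(p: SnapshotForkProvider): Promise<IterationTimings> {
  const created: string[] = []; // every object we are responsible for tearing down
  try {
    const source = await p.seed();
    created.push(source);
    const [snap, snapshotMs] = await timed(() => p.snapshot(source));
    created.push(snap);
    const [forkA, forkSnapshotMs] = await timed(() => p.forkFromSnapshot(snap));
    created.push(forkA);
    const [forkB, forkLiveMs] = await timed(() => p.forkFromLive(source));
    created.push(forkB);
    const verified = (await p.verify(forkA)) && (await p.verify(forkB));
    return { snapshotMs, forkSnapshotMs, forkLiveMs, verified };
  } finally {
    // Best-effort teardown: newest first, and cleanup errors never mask the result.
    for (const id of created.reverse()) {
      await p.remove(id).catch(() => undefined);
    }
  }
}
```

Tracking created ids in a list and removing them in a `finally` block is one way to get the "best-effort teardown of every created object/snapshot/fork" behaviour: a failure partway through still cleans up whatever was created before it.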

NOTE: untested. The snapshot/fork paths have not been run against any live provider yet. Needs validation before relying on results.
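The shared median/p95/p99 helpers moved into src/storage/stats.ts are not shown in this conversation. As a minimal sketch of what such helpers might look like, the block below assumes a nearest-rank percentile; the function names and the percentile method are assumptions, not the repository's code.

```typescript
// Median of a sample set; averages the two middle values for even lengths.
function median(samples: number[]): number {
  if (samples.length === 0) return NaN;
  const sorted = [...samples].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 0 ? (sorted[mid - 1] + sorted[mid]) / 2 : sorted[mid];
}

// Nearest-rank percentile (assumed method): the smallest sample such that
// at least p percent of the samples are less than or equal to it.
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) return NaN;
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  const index = Math.min(sorted.length - 1, Math.max(0, rank - 1));
  return sorted[index];
}

const p95 = (samples: number[]): number => percentile(samples, 95);
const p99 = (samples: number[]): number => percentile(samples, 99);
```

Under this assumed method, a run of ten samples resolves both p95 and p99 to the largest sample. That would be consistent with the identical P95 and P99 columns in the 10-iteration sandbox tables, though the repository may compute them differently.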

open-cla · 2026-06-24T15:40:53Z

Contributor License Agreement

All contributors are covered by a CLA.

github-actions · 2026-06-24T15:42:00Z

Browser Benchmark Results

#	Provider	Score	Create	Connect	Navigate	Release	Total	Status
1	Browserbase	94.2	0.21s	0.12s	0.10s	0.12s	0.61s	10/10
2	Kernel	87.2	0.05s	0.12s	0.13s	0.07s	0.35s	9/10
3	Hyperbrowser	86.4	0.83s	0.24s	0.15s	0.08s	1.40s	10/10
4	Steel	80.9	0.60s	0.77s	0.14s	0.21s	1.99s	10/10

View full run · SVG available as build artifact

github-actions · 2026-06-24T15:42:42Z

Sandbox Benchmark Results

Sequential

#	Provider	Score	Median TTI	P95	P99	Status
1	isorun	99.4	0.03s	0.12s	0.12s	10/10
2	declaw	97.8	0.20s	0.25s	0.25s	10/10
3	northflank	97.6	0.19s	0.31s	0.31s	10/10
4	tensorlake	97.2	0.27s	0.29s	0.29s	10/10
5	daytona	96.8	0.23s	0.44s	0.44s	10/10
6	upstash	96.5	0.28s	0.44s	0.44s	10/10
7	e2b	94.5	0.48s	0.66s	0.66s	10/10
8	blaxel	94.1	0.53s	0.68s	0.68s	10/10
9	modal	93.8	0.57s	0.71s	0.71s	10/10
10	vercel	92.0	0.69s	0.97s	0.97s	10/10
11	archil	84.7	1.50s	1.58s	1.58s	10/10
12	runloop	83.3	0.61s	3.27s	3.27s	10/10
13	hopx	81.1	1.75s	2.10s	2.10s	10/10
14	codesandbox	71.3	2.68s	3.16s	3.16s	10/10
15	cloudflare	67.5	3.09s	3.49s	3.49s	10/10

Staggered

#	Provider	Score	Median TTI	P95	P99	Status
1	isorun	99.7	0.03s	0.03s	0.03s	10/10
2	northflank	98.1	0.17s	0.22s	0.22s	10/10
3	declaw	97.2	0.23s	0.36s	0.36s	10/10
4	daytona	96.8	0.23s	0.44s	0.44s	10/10
5	upstash	96.5	0.30s	0.43s	0.43s	10/10
6	tensorlake	95.4	0.40s	0.55s	0.55s	10/10
7	e2b	95.2	0.44s	0.53s	0.53s	10/10
8	blaxel	94.1	0.52s	0.69s	0.69s	10/10
9	modal	94.1	0.56s	0.63s	0.63s	10/10
10	vercel	92.8	0.60s	0.90s	0.90s	10/10
11	archil	85.9	1.34s	1.52s	1.52s	10/10
12	runloop	83.1	0.65s	3.24s	3.24s	10/10
13	hopx	83.0	1.53s	1.95s	1.95s	10/10
14	codesandbox	70.4	2.77s	3.25s	3.25s	10/10
15	cloudflare	68.7	2.97s	3.37s	3.37s	10/10

Burst

#	Provider	Score	Median TTI	P95	P99	Status
1	isorun	99.7	0.03s	0.03s	0.03s	10/10
2	northflank	97.8	0.20s	0.24s	0.24s	10/10
3	upstash	95.8	0.34s	0.53s	0.53s	10/10
4	tensorlake	95.8	0.41s	0.44s	0.44s	10/10
5	blaxel	94.4	0.54s	0.59s	0.59s	10/10
6	e2b	94.2	0.49s	0.73s	0.73s	10/10
7	modal	93.9	0.60s	0.63s	0.63s	10/10
8	declaw	93.6	0.62s	0.67s	0.67s	10/10
9	runloop	92.5	0.74s	0.78s	0.78s	10/10
10	vercel	90.9	0.77s	1.12s	1.12s	10/10
11	daytona	86.6	0.28s	0.52s	0.52s	9/10
12	archil	84.4	1.36s	1.87s	1.87s	10/10
13	hopx	76.0	2.00s	2.99s	2.99s	10/10
14	cloudflare	68.0	2.90s	3.63s	3.63s	10/10
15	codesandbox	64.0	3.40s	3.91s	3.91s	10/10

View full run · SVGs available as build artifacts

superagent-security

Superagent found 2 security concern(s).

superagent-security · 2026-06-24T15:43:06Z

+    # smallest dataset to keep cost and leak risk minimal. Fork PRs without
+    # secrets skip gracefully (missing-creds path) rather than failing.
+    if: github.event_name == 'pull_request'
+    runs-on: namespace-profile-default


P0: New workflow job runs on self-hosted runner for pull_request events

New PR job runs on a self-hosted runner, which fork PRs can reach.

Restrict the runner group to disallow public fork jobs, or use GitHub-hosted runners for PRs.

AI prompt

Check if this security scanner issue is valid. If so, understand the root cause and fix it. If appropriate, update or add tests. Keep the change focused and preserve intended behavior.

<file name=".github/workflows/storage-benchmarks.yml">
<violation number="1" location=".github/workflows/storage-benchmarks.yml:108">
<priority>P0</priority>
<title>New workflow job runs on self-hosted runner for pull_request events</title>
<evidence>The snapshot-fork job uses runs-on: namespace-profile-default with if: github.event_name == 'pull_request'. Self-hosted runners are reachable by fork PRs under the pull_request trigger, and runner state is shared across jobs. This has led to critical compromises (e.g., PyTorch).</evidence>
<recommendation>Verify in GitHub Settings that the namespace-profile-default runner group is restricted to not accept jobs from public forks. If fork PRs must run benchmarks, consider using GitHub-hosted runners (ubuntu-latest) for the fork-PR path, or gate the job behind an approval requirement.</recommendation>
</violation>
</file>

github-actions · 2026-06-24T15:43:21Z

Browser Throughput Benchmark Results

#	Provider	Score	APS (med)	Task (med)	Task (p95)	Screenshot	Status
1	Kernel	59.9	4.61/s	10.85s	14.08s	330ms	3/3
2	Browserbase	51.5	3.48/s	14.39s	15.51s	224ms	3/3
3	Hyperbrowser	34.8	2.67/s	18.69s	32.25s	544ms	3/3
4	Steel	21.1	1.60/s	31.19s	34.41s	666ms	3/3

View full run · SVG available as build artifact

github-actions · 2026-06-24T15:46:27Z

Storage Benchmark Results

1MB Files

#	Provider	Score	Download	Throughput	Upload	Status
1	Tigris	96.0	0.03s	246.0 Mbps	0.11s	1000/1000
2	AWS S3	95.6	0.05s	172.1 Mbps	0.06s	1000/1000
3	Cloudflare R2	94.7	0.13s	66.3 Mbps	0.24s	1000/1000

4MB Files

#	Provider	Score	Download	Throughput	Upload	Status
1	Tigris	97.0	0.07s	467.6 Mbps	0.15s	10000/10000
2	Cloudflare R2	94.7	0.18s	184.5 Mbps	0.46s	10000/10000

10MB Files

#	Provider	Score	Download	Throughput	Upload	Status
1	Tigris	96.7	0.16s	524.6 Mbps	0.40s	10000/10000
2	Cloudflare R2	94.5	0.27s	308.1 Mbps	0.83s	10000/10000

16MB Files

#	Provider	Score	Download	Throughput	Upload	Status
1	Tigris	95.8	0.31s	427.5 Mbps	0.45s	10000/10000
2	Cloudflare R2	94.3	0.40s	337.8 Mbps	0.86s	9999/10000

View full run · SVGs available as build artifacts

Wire the snapshot/fork benchmark into the collect job so results show up on PRs and get committed to results/ on scheduled runs, matching how the storage benchmark is handled.

- collect now depends on the snapshot-fork job and downloads its per-provider artifacts into a separate sf-artifacts/ dir (kept apart so merge-results.ts doesn't try to parse them as storage results)
- post a dedicated "Snapshot/Fork Benchmark Results" PR comment (its own marker, find-or-update) rendered from those artifacts
- merge-results.ts: add --mode snapshot-fork to combine per-provider artifacts into results/snapshot-fork/<dataset>/, deduped by provider with scores recomputed (absolute latency ceiling, so no cross-provider normalization needed)
- run snapshot-fork on schedule/dispatch (not just PRs) with low, event-scaled iteration counts, clear stale checked-out results, and stage results/snapshot-fork/ in the existing commit-and-push step
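To illustrate the "deduped by provider with scores recomputed (absolute latency ceiling)" step, here is a rough sketch. The `ProviderResult` shape, the 10-second ceiling, the linear scoring formula, and the "latest timestamp wins" rule are all assumptions made for the example; the PR's merge-results.ts is not shown here and may work differently.

```typescript
// Hypothetical per-provider artifact shape.
interface ProviderResult {
  provider: string;
  timestamp: string; // ISO 8601, so string comparison orders chronologically
  medianMs: number;
  score?: number;
}

// Assumed ceiling: a latency at or above this value scores 0.
const LATENCY_CEILING_MS = 10000;

// Linear score against a fixed ceiling (assumed formula). Because the ceiling
// is absolute, a provider's score does not depend on any other provider.
function scoreAgainstCeiling(medianMs: number, ceilingMs = LATENCY_CEILING_MS): number {
  const clamped = Math.min(Math.max(medianMs, 0), ceilingMs);
  return Math.round((1 - clamped / ceilingMs) * 1000) / 10;
}

function mergeResults(existing: ProviderResult[], incoming: ProviderResult[]): ProviderResult[] {
  const byProvider = new Map<string, ProviderResult>();
  for (const result of [...existing, ...incoming]) {
    const previous = byProvider.get(result.provider);
    // Keep one entry per provider; the most recent artifact wins.
    if (!previous || result.timestamp >= previous.timestamp) {
      byProvider.set(result.provider, result);
    }
  }
  return [...byProvider.values()]
    .map((result) => ({ ...result, score: scoreAgainstCeiling(result.medianMs) }))
    .sort((a, b) => b.score - a.score);
}
```

Scoring each provider against a fixed ceiling rather than against the fastest provider in the run is what makes cross-provider normalization unnecessary: merging a single provider's new artifact cannot shift anyone else's score.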

superagent-security · 2026-06-24T16:13:36Z

Superagent didn't find any vulnerabilities or security issues in this PR.

github-actions · 2026-06-24T16:17:42Z

Snapshot/Fork Benchmark Results

small dataset

#	Provider	Score	Snapshot create	Fork (snapshot)	Fork (live)	First read	Status
1	Tigris	99.8	0.07s	0.22s	0.13s	0.09s	1/1
2	AWS S3	0.0	0.00s	0.00s	0.00s	0.00s	0/1
3	Cloudflare R2	0.0	0.00s	0.00s	0.00s	0.00s	0/1

View full run

The Tigris adapter wraps unmapped failures in a StorageError whose message is just the code (e.g. "Provider"), with the real error in `.cause`. The benchmark only recorded `err.message`, so CI logged the uninformative "FAILED: Provider" and discarded the underlying cause. Add a `formatError` helper that walks the cause chain and prepends each error's code, and use it at the error-capture sites in the snapshot/fork and storage benchmarks so failures are self-diagnosing.
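A `formatError` helper of the kind described above might look like the following sketch. The bracketed code prefix, the ` <- ` separator, and the depth limit are illustrative choices, not details confirmed by the PR.

```typescript
// Walks the `.cause` chain and renders each error as "[code] message".
// maxDepth guards against accidental cycles in the chain.
function formatError(err: unknown, maxDepth = 5): string {
  const parts: string[] = [];
  let current: unknown = err;
  for (let depth = 0; current != null && depth < maxDepth; depth++) {
    if (!(current instanceof Error)) {
      // Non-Error values (strings, plain objects) end the chain.
      parts.push(String(current));
      break;
    }
    const detailed = current as Error & { code?: unknown; cause?: unknown };
    const hasCode = detailed.code !== undefined && detailed.code !== null;
    parts.push(hasCode ? `[${String(detailed.code)}] ${detailed.message}` : detailed.message);
    current = detailed.cause;
  }
  return parts.join(" <- ");
}
```

With a helper like this at the error-capture sites, a wrapped failure whose top-level message is only the code would still surface every message further down the cause chain in the CI log.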

…hot-fork

Tigris snapshots require a Standard-tier, snapshot-enabled bucket, which the default upload/download bucket is not. Add an optional per-provider snapshotFork override (bucket/credentials/requiredEnvVars) applied only in snapshot-fork mode, and point Tigris at TIGRIS_SNAPSHOT_* env vars. Wire the new secrets into the snapshot-fork CI job.
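One way the optional per-provider override could be resolved is sketched below. The field names bucket, credentials, and requiredEnvVars come from the commit message; the `ProviderConfig` shape, the `Mode` type, and both function names are hypothetical.

```typescript
interface ProviderSettings {
  bucket: string;
  credentials: Record<string, string | undefined>;
  requiredEnvVars: string[];
}

// Hypothetical config shape: base settings plus an optional snapshot-fork override.
interface ProviderConfig extends ProviderSettings {
  name: string;
  snapshotFork?: Partial<ProviderSettings>;
}

type Mode = "storage" | "snapshot-fork";

// Applies the override only in snapshot-fork mode; any field the override
// leaves unset falls back to the provider's default value.
function resolveSettings(config: ProviderConfig, mode: Mode): ProviderSettings {
  const base: ProviderSettings = {
    bucket: config.bucket,
    credentials: config.credentials,
    requiredEnvVars: config.requiredEnvVars,
  };
  const override = config.snapshotFork;
  if (mode !== "snapshot-fork" || !override) return base;
  return {
    bucket: override.bucket ?? base.bucket,
    credentials: override.credentials ?? base.credentials,
    requiredEnvVars: override.requiredEnvVars ?? base.requiredEnvVars,
  };
}

// Lists required variables that are unset or empty, so a run without
// secrets can be skipped instead of failing.
function missingEnvVars(
  settings: ProviderSettings,
  env: Record<string, string | undefined>
): string[] {
  return settings.requiredEnvVars.filter((name) => !env[name]);
}
```

Keeping the override separate from the base settings means the existing upload/download benchmark keeps using the default bucket, while snapshot-fork mode can point at a snapshot-enabled one.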

kisernl changed the title ~~feat(storage): add snapshot/fork benchmark mode~~ WIP - feat(storage): add snapshot/fork benchmark mode Jun 24, 2026

superagent-security Bot added the pr:flagged label ("PR flagged for review by security analysis.") Jun 24, 2026

superagent-security Bot reviewed Jun 24, 2026

superagent-security Bot removed the pr:flagged label Jun 24, 2026

kisernl added 3 commits June 24, 2026 17:27

fix: swap to 1mb test only for storage on PR

7c22ae5

kisernl wants to merge 5 commits into master from storage-snapshot-fork-bench
