WIP - feat(storage): add snapshot/fork benchmark mode #145

Open · kisernl wants to merge 5 commits into master from storage-snapshot-fork-bench

kisernl (Collaborator) commented Jun 24, 2026

Add a new snapshot-fork benchmark mode that measures snapshot creation, fork-from-snapshot, and fork-from-live timings against the storage providers, with read-back verification and best-effort teardown of every created object/snapshot/fork.

  • src/storage/snapshot-fork-benchmark.ts: per-iteration seed -> snapshot -> fork(snapshot) -> fork(live) -> verify -> cleanup
  • src/storage/snapshot-fork-types.ts: small/wide/deep dataset presets to separate per-object overhead from bytes-copied cost
  • src/storage/stats.ts: extract shared median/p95/p99 helpers out of benchmark.ts and reuse in both storage benchmarks
  • src/run.ts: wire up --mode snapshot-fork and --dataset, results output
  • package.json: bench:snapshot-fork scripts
  • CI: PR-only snapshot-fork job (single iteration, small dataset) across aws-s3, cloudflare-r2, tigris; skips gracefully without secrets
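The per-iteration flow in src/storage/snapshot-fork-benchmark.ts is seed -> snapshot -> fork(snapshot) -> fork(live) -> verify -> cleanup, with best-effort teardown. It can be sketched roughly as follows.

This is a minimal illustration under stated assumptions, not the PR's code:

- The `SnapshotForkProvider` interface, `runIteration`, and `timed` are all hypothetical.
- Timing the first read from the snapshot fork is only a guess at what a "first read" measurement covers.
- The one design point taken from the description is that teardown is best-effort. Every object, snapshot, and fork is registered for cleanup as soon as it exists, and cleanup failures are collected rather than masking the original error.

```typescript
// Hypothetical provider surface for illustration; the real adapters under
// src/storage/ are not shown in this PR and likely differ.
interface SnapshotForkProvider {
  put(bucket: string, key: string, body: string): Promise<void>;
  get(bucket: string, key: string): Promise<string | undefined>;
  remove(bucket: string, key: string): Promise<void>;
  snapshot(bucket: string): Promise<string>; // resolves to a snapshot id
  forkFromSnapshot(bucket: string, snapshotId: string): Promise<string>; // resolves to the fork's name
  forkFromLive(bucket: string): Promise<string>;
  deleteSnapshot(bucket: string, snapshotId: string): Promise<void>;
  deleteFork(fork: string): Promise<void>;
}

interface IterationResult {
  snapshotMs: number;
  forkSnapshotMs: number;
  forkLiveMs: number;
  firstReadMs: number;
  cleanupErrors: string[]; // teardown failures are reported, never thrown
}

// Awaits fn and pairs its result with the elapsed wall-clock milliseconds.
async function timed<T>(fn: () => Promise<T>): Promise<[T, number]> {
  const start = performance.now();
  const value = await fn();
  return [value, performance.now() - start];
}

async function runIteration(
  p: SnapshotForkProvider,
  bucket: string,
  dataset: Map<string, string>,
): Promise<IterationResult> {
  const entries = Array.from(dataset);
  if (entries.length === 0) throw new Error("dataset must contain at least one object");
  // Each teardown step is registered as soon as its resource exists, so a
  // failure part-way through still removes everything created so far.
  const undo: Array<() => Promise<void>> = [];
  const cleanupErrors: string[] = [];
  try {
    // 1. seed
    for (const [key, body] of entries) {
      await p.put(bucket, key, body);
      undo.push(() => p.remove(bucket, key));
    }
    // 2. snapshot, 3. fork(snapshot), 4. fork(live)
    const [snapId, snapshotMs] = await timed(() => p.snapshot(bucket));
    undo.push(() => p.deleteSnapshot(bucket, snapId));
    const [snapFork, forkSnapshotMs] = await timed(() => p.forkFromSnapshot(bucket, snapId));
    undo.push(() => p.deleteFork(snapFork));
    const [liveFork, forkLiveMs] = await timed(() => p.forkFromLive(bucket));
    undo.push(() => p.deleteFork(liveFork));
    // 5. verify: time the first read, then check every object in both forks
    const [, firstReadMs] = await timed(() => p.get(snapFork, entries[0][0]));
    for (const fork of [snapFork, liveFork]) {
      for (const [key, body] of entries) {
        if ((await p.get(fork, key)) !== body) throw new Error(`read-back mismatch: ${fork}/${key}`);
      }
    }
    // cleanupErrors is the same array the finally block fills in, so the
    // caller sees any teardown failures once the promise resolves.
    return { snapshotMs, forkSnapshotMs, forkLiveMs, firstReadMs, cleanupErrors };
  } finally {
    // 6. best-effort cleanup, newest resource first
    for (const step of undo.reverse()) {
      try {
        await step();
      } catch (err) {
        cleanupErrors.push(String(err));
      }
    }
  }
}
```

Undo steps are registered incrementally rather than cleaned up from a fixed list at the end. That is what keeps a failure in, say, fork(live) from leaking the snapshot and the snapshot fork.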

NOTE: untested — the snapshot/fork paths have not been run against any
live provider yet. Needs validation before relying on results.

open-cla Bot commented Jun 24, 2026

Contributor License Agreement

All contributors are covered by a CLA.

kisernl changed the title from "feat(storage): add snapshot/fork benchmark mode" to "WIP - feat(storage): add snapshot/fork benchmark mode" Jun 24, 2026

github-actions Bot (Contributor) commented Jun 24, 2026

Browser Benchmark Results

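| # | Provider | Score | Create | Connect | Navigate | Release | Total | Status |
|---|---|---|---|---|---|---|---|---|
| 1 | Browserbase | 94.2 | 0.21s | 0.12s | 0.10s | 0.12s | 0.61s | 10/10 |
| 2 | Kernel | 87.2 | 0.05s | 0.12s | 0.13s | 0.07s | 0.35s | 9/10 |
| 3 | Hyperbrowser | 86.4 | 0.83s | 0.24s | 0.15s | 0.08s | 1.40s | 10/10 |
| 4 | Steel | 80.9 | 0.60s | 0.77s | 0.14s | 0.21s | 1.99s | 10/10 |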

View full run · SVG available as build artifact

github-actions Bot (Contributor) commented Jun 24, 2026

Sandbox Benchmark Results

Sequential

| # | Provider | Score | Median TTI | P95 | P99 | Status |
|---|---|---|---|---|---|---|
| 1 | isorun | 99.4 | 0.03s | 0.12s | 0.12s | 10/10 |
| 2 | declaw | 97.8 | 0.20s | 0.25s | 0.25s | 10/10 |
| 3 | northflank | 97.6 | 0.19s | 0.31s | 0.31s | 10/10 |
| 4 | tensorlake | 97.2 | 0.27s | 0.29s | 0.29s | 10/10 |
| 5 | daytona | 96.8 | 0.23s | 0.44s | 0.44s | 10/10 |
| 6 | upstash | 96.5 | 0.28s | 0.44s | 0.44s | 10/10 |
| 7 | e2b | 94.5 | 0.48s | 0.66s | 0.66s | 10/10 |
| 8 | blaxel | 94.1 | 0.53s | 0.68s | 0.68s | 10/10 |
| 9 | modal | 93.8 | 0.57s | 0.71s | 0.71s | 10/10 |
| 10 | vercel | 92.0 | 0.69s | 0.97s | 0.97s | 10/10 |
| 11 | archil | 84.7 | 1.50s | 1.58s | 1.58s | 10/10 |
| 12 | runloop | 83.3 | 0.61s | 3.27s | 3.27s | 10/10 |
| 13 | hopx | 81.1 | 1.75s | 2.10s | 2.10s | 10/10 |
| 14 | codesandbox | 71.3 | 2.68s | 3.16s | 3.16s | 10/10 |
| 15 | cloudflare | 67.5 | 3.09s | 3.49s | 3.49s | 10/10 |

Staggered

| # | Provider | Score | Median TTI | P95 | P99 | Status |
|---|---|---|---|---|---|---|
| 1 | isorun | 99.7 | 0.03s | 0.03s | 0.03s | 10/10 |
| 2 | northflank | 98.1 | 0.17s | 0.22s | 0.22s | 10/10 |
| 3 | declaw | 97.2 | 0.23s | 0.36s | 0.36s | 10/10 |
| 4 | daytona | 96.8 | 0.23s | 0.44s | 0.44s | 10/10 |
| 5 | upstash | 96.5 | 0.30s | 0.43s | 0.43s | 10/10 |
| 6 | tensorlake | 95.4 | 0.40s | 0.55s | 0.55s | 10/10 |
| 7 | e2b | 95.2 | 0.44s | 0.53s | 0.53s | 10/10 |
| 8 | blaxel | 94.1 | 0.52s | 0.69s | 0.69s | 10/10 |
| 9 | modal | 94.1 | 0.56s | 0.63s | 0.63s | 10/10 |
| 10 | vercel | 92.8 | 0.60s | 0.90s | 0.90s | 10/10 |
| 11 | archil | 85.9 | 1.34s | 1.52s | 1.52s | 10/10 |
| 12 | runloop | 83.1 | 0.65s | 3.24s | 3.24s | 10/10 |
| 13 | hopx | 83.0 | 1.53s | 1.95s | 1.95s | 10/10 |
| 14 | codesandbox | 70.4 | 2.77s | 3.25s | 3.25s | 10/10 |
| 15 | cloudflare | 68.7 | 2.97s | 3.37s | 3.37s | 10/10 |

Burst

| # | Provider | Score | Median TTI | P95 | P99 | Status |
|---|---|---|---|---|---|---|
| 1 | isorun | 99.7 | 0.03s | 0.03s | 0.03s | 10/10 |
| 2 | northflank | 97.8 | 0.20s | 0.24s | 0.24s | 10/10 |
| 3 | upstash | 95.8 | 0.34s | 0.53s | 0.53s | 10/10 |
| 4 | tensorlake | 95.8 | 0.41s | 0.44s | 0.44s | 10/10 |
| 5 | blaxel | 94.4 | 0.54s | 0.59s | 0.59s | 10/10 |
| 6 | e2b | 94.2 | 0.49s | 0.73s | 0.73s | 10/10 |
| 7 | modal | 93.9 | 0.60s | 0.63s | 0.63s | 10/10 |
| 8 | declaw | 93.6 | 0.62s | 0.67s | 0.67s | 10/10 |
| 9 | runloop | 92.5 | 0.74s | 0.78s | 0.78s | 10/10 |
| 10 | vercel | 90.9 | 0.77s | 1.12s | 1.12s | 10/10 |
| 11 | daytona | 86.6 | 0.28s | 0.52s | 0.52s | 9/10 |
| 12 | archil | 84.4 | 1.36s | 1.87s | 1.87s | 10/10 |
| 13 | hopx | 76.0 | 2.00s | 2.99s | 2.99s | 10/10 |
| 14 | cloudflare | 68.0 | 2.90s | 3.63s | 3.63s | 10/10 |
| 15 | codesandbox | 64.0 | 3.40s | 3.91s | 3.91s | 10/10 |

View full run · SVGs available as build artifacts
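Every row in the three sandbox tables reports identical P95 and P99 values. That is what a nearest-rank percentile produces at this sample size.

With 10 runs, both ceil(0.95·n) and ceil(0.99·n) equal n, so p95 and p99 both land on the slowest run. The same holds for 9 runs, if daytona's failed burst run is excluded. Linear interpolation between order statistics would usually separate the two values.

The sketch below shows nearest-rank percentiles next to a conventional median. It is an assumption for illustration only:

- Neither the sandbox benchmark's percentile code nor the shared helpers in src/storage/stats.ts appear in this thread.
- `percentile` and `median` are hypothetical names.

```typescript
// Nearest-rank percentile: the smallest sample such that at least p% of all
// samples are less than or equal to it.
function percentile(samples: readonly number[], p: number): number {
  if (samples.length === 0) throw new RangeError("no samples");
  if (!(p > 0 && p <= 100)) throw new RangeError("p must be in (0, 100]");
  const sorted = samples.slice().sort((a, b) => a - b);
  // Multiplying before dividing keeps p * n an exact integer for whole-number p.
  const rank = Math.ceil((p * sorted.length) / 100); // 1-based rank
  return sorted[rank - 1];
}

// Conventional median: the middle sample, or the mean of the two middle
// samples when the count is even.
function median(samples: readonly number[]): number {
  if (samples.length === 0) throw new RangeError("no samples");
  const sorted = samples.slice().sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
}
```

Under this definition, p95 stops being the maximum once n reaches 20, and p99 only once n reaches 100. At 10 iterations, the P95 and P99 columns are best read as the worst observed run.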

superagent-security Bot added the pr:flagged label (PR flagged for review by security analysis) Jun 24, 2026

superagent-security Bot left a comment

Superagent found 2 security concern(s).

# smallest dataset to keep cost and leak risk minimal. Fork PRs without
# secrets skip gracefully (missing-creds path) rather than failing.
if: github.event_name == 'pull_request'
runs-on: namespace-profile-default

P0: New workflow job runs on self-hosted runner for pull_request events

New PR job runs on a self-hosted runner, which fork PRs can reach.

Restrict the runner group to disallow public fork jobs, or use GitHub-hosted runners for PRs.

AI prompt
Check if this security scanner issue is valid. If so, understand the root cause and fix it. If appropriate, update or add tests. Keep the change focused and preserve intended behavior.

<file name=".github/workflows/storage-benchmarks.yml">
<violation number="1" location=".github/workflows/storage-benchmarks.yml:108">
<priority>P0</priority>
<title>New workflow job runs on self-hosted runner for pull_request events</title>
<evidence>The snapshot-fork job uses runs-on: namespace-profile-default with if: github.event_name == 'pull_request'. Self-hosted runners are reachable by fork PRs under the pull_request trigger, and runner state is shared across jobs. This has led to critical compromises (e.g., PyTorch).</evidence>
<recommendation>Verify in GitHub Settings that the namespace-profile-default runner group is restricted to not accept jobs from public forks. If fork PRs must run benchmarks, consider using GitHub-hosted runners (ubuntu-latest) for the fork-PR path, or gate the job behind an approval requirement.</recommendation>
</violation>
</file>
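The scanner's primary recommendation is to restrict the runner group. That is a repository or organization setting, which code cannot express.

A complementary guard keeps fork PRs off the self-hosted runner. It is commonly written in a job's `if:` as `github.event.pull_request.head.repo.full_name == github.repository`.

The sketch below restates that decision logic in TypeScript against a minimal, assumed subset of the `pull_request` webhook payload. The `isForkPullRequest` and `pickRunner` helpers, and the `ubuntu-latest` fallback label, are hypothetical. Fork PRs routed to a GitHub-hosted runner would still have no secrets, so the job's missing-credentials skip path would apply there.

```typescript
// Minimal, assumed subset of GitHub's pull_request event payload.
interface PullRequestEvent {
  pull_request: {
    head: { repo: { full_name: string } | null }; // can be null, e.g. if the head repo was deleted
    base: { repo: { full_name: string } };
  };
}

// A PR is treated as coming from a fork when its head branch lives in a
// different repository than its base, or when that repository is gone.
function isForkPullRequest(event: PullRequestEvent): boolean {
  const head = event.pull_request.head.repo;
  if (head === null) return true; // cannot prove it is same-repo: treat as untrusted
  return head.full_name !== event.pull_request.base.repo.full_name;
}

// Hypothetical routing policy: only same-repo PRs may use the self-hosted label.
function pickRunner(event: PullRequestEvent, selfHostedLabel: string, hostedLabel = "ubuntu-latest"): string {
  return isForkPullRequest(event) ? hostedLabel : selfHostedLabel;
}
```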

github-actions Bot (Contributor) commented Jun 24, 2026

Browser Throughput Benchmark Results

| # | Provider | Score | APS (med) | Task (med) | Task (p95) | Screenshot | Status |
|---|---|---|---|---|---|---|---|
| 1 | Kernel | 59.9 | 4.61/s | 10.85s | 14.08s | 330ms | 3/3 |
| 2 | Browserbase | 51.5 | 3.48/s | 14.39s | 15.51s | 224ms | 3/3 |
| 3 | Hyperbrowser | 34.8 | 2.67/s | 18.69s | 32.25s | 544ms | 3/3 |
| 4 | Steel | 21.1 | 1.60/s | 31.19s | 34.41s | 666ms | 3/3 |

View full run · SVG available as build artifact

github-actions Bot (Contributor) commented Jun 24, 2026

Storage Benchmark Results

1MB Files

| # | Provider | Score | Download | Throughput | Upload | Status |
|---|---|---|---|---|---|---|
| 1 | Tigris | 96.0 | 0.03s | 246.0 Mbps | 0.11s | 1000/1000 |
| 2 | AWS S3 | 95.6 | 0.05s | 172.1 Mbps | 0.06s | 1000/1000 |
| 3 | Cloudflare R2 | 94.7 | 0.13s | 66.3 Mbps | 0.24s | 1000/1000 |

4MB Files

| # | Provider | Score | Download | Throughput | Upload | Status |
|---|---|---|---|---|---|---|
| 1 | Tigris | 97.0 | 0.07s | 467.6 Mbps | 0.15s | 10000/10000 |
| 2 | Cloudflare R2 | 94.7 | 0.18s | 184.5 Mbps | 0.46s | 10000/10000 |

10MB Files

| # | Provider | Score | Download | Throughput | Upload | Status |
|---|---|---|---|---|---|---|
| 1 | Tigris | 96.7 | 0.16s | 524.6 Mbps | 0.40s | 10000/10000 |
| 2 | Cloudflare R2 | 94.5 | 0.27s | 308.1 Mbps | 0.83s | 10000/10000 |

16MB Files

| # | Provider | Score | Download | Throughput | Upload | Status |
|---|---|---|---|---|---|---|
| 1 | Tigris | 95.8 | 0.31s | 427.5 Mbps | 0.45s | 10000/10000 |
| 2 | Cloudflare R2 | 94.3 | 0.40s | 337.8 Mbps | 0.86s | 9999/10000 |

View full run · SVGs available as build artifacts
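A small conversion helper makes the units in the storage tables checkable. Assume that Throughput is in megabits per second (10^6 bit/s) and that "1MB" means 1 MiB (1,048,576 bytes).

Under those assumptions, every Throughput value above falls inside the range implied by its Download time, once that time is treated as rounded to 0.01s. With decimal megabytes, the Cloudflare R2 1MB row would not fit: 0.13s allows at most 64.0 Mbps, but 66.3 is reported.

That is consistent with binary sizes, not proof of them. The benchmark may instead report a median of per-transfer throughputs. `MIB`, `throughputMbps`, and `throughputRangeMbps` are hypothetical names for this sketch.

```typescript
// Assumption: "MB" in the table headers means MiB; Mbps means 10^6 bit/s.
const MIB = 1024 * 1024;

// Megabits per second for `bytes` transferred in `seconds`.
function throughputMbps(bytes: number, seconds: number): number {
  if (!(seconds > 0)) throw new RangeError("seconds must be positive");
  return (bytes * 8) / seconds / 1_000_000;
}

// Throughput range consistent with a time that was rounded to `step` seconds:
// the slowest admissible time gives the low end, the fastest the high end.
function throughputRangeMbps(bytes: number, roundedSeconds: number, step = 0.01): [number, number] {
  return [
    throughputMbps(bytes, roundedSeconds + step / 2),
    throughputMbps(bytes, roundedSeconds - step / 2),
  ];
}
```

For example, `throughputRangeMbps(MIB, 0.13)` spans roughly 62.1 to 67.1 Mbps, which brackets the reported 66.3. `throughputRangeMbps(1_000_000, 0.13)` spans roughly 59.3 to 64.0 Mbps, which does not.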

Wire the snapshot/fork benchmark into the collect job so results show up
on PRs and get committed to results/ on scheduled runs, matching how the
storage benchmark is handled.

- collect now depends on the snapshot-fork job and downloads its
  per-provider artifacts into a separate sf-artifacts/ dir (kept apart so
  merge-results.ts doesn't try to parse them as storage results)
- post a dedicated "Snapshot/Fork Benchmark Results" PR comment (its own
  marker, find-or-update) rendered from those artifacts
- merge-results.ts: add --mode snapshot-fork to combine per-provider
  artifacts into results/snapshot-fork/<dataset>/, deduped by provider
  with scores recomputed (absolute latency ceiling, so no cross-provider
  normalization needed)
- run snapshot-fork on schedule/dispatch (not just PRs) with low,
  event-scaled iteration counts, clear stale checked-out results, and
  stage results/snapshot-fork/ in the existing commit-and-push step
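merge-results.ts is described as deduplicating per-provider artifacts and recomputing scores against an absolute latency ceiling. A minimal sketch of that idea keeps the newest artifact per provider, then scores each one against a fixed ceiling.

The PR shows none of the following, so all of them are hypothetical:

- the artifact shape;
- the "newest wins" dedup rule;
- the 10 s ceiling;
- the averaging formula, which is not claimed to reproduce the scores in the tables.

```typescript
// Hypothetical per-provider artifact; the real JSON emitted by the
// snapshot-fork job is not shown in this PR.
interface SnapshotForkArtifact {
  provider: string;
  finishedAt: string; // ISO-8601; the newest artifact per provider wins
  succeeded: number;
  total: number;
  medianSeconds: { snapshotCreate: number; forkSnapshot: number; forkLive: number; firstRead: number };
  score?: number;
}

// Hypothetical ceiling: a latency at or above this scores 0 for that metric.
const LATENCY_CEILING_S = 10;

// Each score depends only on the provider's own latencies and a fixed ceiling,
// so it can be recomputed per artifact with no cross-provider normalization.
function absoluteScore(a: SnapshotForkArtifact, ceiling = LATENCY_CEILING_S): number {
  if (a.total === 0 || a.succeeded === 0) return 0;
  const m = a.medianSeconds;
  const perMetric = [m.snapshotCreate, m.forkSnapshot, m.forkLive, m.firstRead].map(
    (s) => 100 * (1 - Math.min(s, ceiling) / ceiling),
  );
  const mean = perMetric.reduce((x, y) => x + y, 0) / perMetric.length;
  return Math.round(mean * (a.succeeded / a.total) * 10) / 10; // one decimal place
}

// Dedupe by provider (newest finishedAt wins), rescore, and rank descending.
function mergeSnapshotForkArtifacts(artifacts: SnapshotForkArtifact[]): SnapshotForkArtifact[] {
  const newest = new Map<string, SnapshotForkArtifact>();
  for (const a of artifacts) {
    const prev = newest.get(a.provider);
    if (prev === undefined || Date.parse(a.finishedAt) > Date.parse(prev.finishedAt)) {
      newest.set(a.provider, a);
    }
  }
  return Array.from(newest.values())
    .map((a) => ({ ...a, score: absoluteScore(a) }))
    .sort((x, y) => y.score - x.score);
}
```

Because each score is a function of a single artifact, adding or dropping a provider never changes anyone else's score. That is the property the commit message relies on when it says no cross-provider normalization is needed.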

Superagent didn't find any vulnerabilities or security issues in this PR.

superagent-security Bot removed the pr:flagged label (PR flagged for review by security analysis) Jun 24, 2026

github-actions Bot (Contributor) commented Jun 24, 2026

Snapshot/Fork Benchmark Results

small dataset

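| # | Provider | Score | Snapshot create | Fork (snapshot) | Fork (live) | First read | Status |
|---|---|---|---|---|---|---|---|
| 1 | Tigris | 99.8 | 0.07s | 0.22s | 0.13s | 0.09s | 1/1 |
| 2 | AWS S3 | 0.0 | 0.00s | 0.00s | 0.00s | 0.00s | 0/1 |
| 3 | Cloudflare R2 | 0.0 | 0.00s | 0.00s | 0.00s | 0.00s | 0/1 |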

View full run

kisernl added 3 commits June 24, 2026 17:27
The Tigris adapter wraps unmapped failures in a StorageError whose
message is just the code (e.g. "Provider"), with the real error in
`.cause`. The benchmark only recorded `err.message`, so CI logged the
uninformative "FAILED: Provider" and discarded the underlying cause.

Add a `formatError` helper that walks the cause chain and prepends each
error's code, and use it at the error-capture sites in the snapshot/fork
and storage benchmarks so failures are self-diagnosing.
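A minimal sketch of a cause-chain formatter in the spirit of the `formatError` helper described above. It makes these assumptions:

- Errors carry an optional string `code` and link to their cause through the standard `cause` property.
- The output format (the `[code]` prefix and the `<- caused by:` separator) is invented here, and the real helper may differ.

It also skips the prefix when the message already equals the code. That way a wrapper like the Tigris StorageError, whose message is just its code, does not print "Provider" twice.

```typescript
// Assumption: provider errors may carry a string `code` and chain to their
// underlying error via the standard `cause` property.
function formatError(err: unknown, maxDepth = 10): string {
  const parts: string[] = [];
  const seen = new Set<unknown>(); // guards against cyclic cause chains
  let current: unknown = err;
  while (current !== undefined && current !== null && parts.length < maxDepth) {
    if (seen.has(current)) break;
    seen.add(current);
    if (!(current instanceof Error)) {
      parts.push(String(current)); // e.g. a thrown string or a non-Error cause
      break;
    }
    const code = (current as Error & { code?: unknown }).code;
    // Prepend the code unless the message already is the code (e.g. "Provider").
    const prefix = typeof code === "string" && code !== current.message ? `[${code}] ` : "";
    parts.push(prefix + current.message);
    current = (current as Error & { cause?: unknown }).cause;
  }
  return parts.join(" <- caused by: ");
}
```

For a wrapped error whose message is just "Provider", this yields output shaped like `Provider <- caused by: [SomeCode] underlying message`, so the CI log shows the real failure.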
…hot-fork

Tigris snapshots require a Standard-tier, snapshot-enabled bucket, which
the default upload/download bucket is not. Add an optional per-provider
snapshotFork override (bucket/credentials/requiredEnvVars) applied only in
snapshot-fork mode, and point Tigris at TIGRIS_SNAPSHOT_* env vars. Wire
the new secrets into the snapshot-fork CI job.
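The per-provider `snapshotFork` override can be pictured as a mode-dependent merge over the provider's default config. Only two things come from the commit message: the bucket/credentials/requiredEnvVars fields, and the rule that the override applies in snapshot-fork mode alone. `ProviderConfig`, `resolveProviderConfig`, and `missingEnvVars` are hypothetical, and no real TIGRIS_SNAPSHOT_* variable names are assumed.

In this sketch, unset override fields fall back to the defaults. The missing-variable check gives the caller a list to report when it skips a provider instead of failing.

```typescript
// Hypothetical shapes: field names follow the commit message
// (bucket / credentials / requiredEnvVars); the real types are not shown.
interface Credentials {
  accessKeyId: string;
  secretAccessKey: string;
}

interface BaseProviderConfig {
  name: string;
  bucket: string;
  credentials: Credentials;
  requiredEnvVars: string[];
}

interface ProviderConfig extends BaseProviderConfig {
  // Optional override used only in snapshot-fork mode, e.g. to point at a
  // snapshot-enabled bucket with its own credentials.
  snapshotFork?: Partial<Pick<BaseProviderConfig, "bucket" | "credentials" | "requiredEnvVars">>;
}

// Returns the effective config for a mode; other modes keep the defaults.
function resolveProviderConfig(cfg: ProviderConfig, mode: string): BaseProviderConfig {
  const { snapshotFork, ...base } = cfg;
  if (mode !== "snapshot-fork" || snapshotFork === undefined) return base;
  return {
    ...base,
    bucket: snapshotFork.bucket ?? base.bucket,
    credentials: snapshotFork.credentials ?? base.credentials,
    requiredEnvVars: snapshotFork.requiredEnvVars ?? base.requiredEnvVars,
  };
}

// Env vars that are unset or empty; a non-empty result lets the caller skip
// the provider gracefully instead of failing the run.
function missingEnvVars(cfg: BaseProviderConfig, env: Record<string, string | undefined>): string[] {
  return cfg.requiredEnvVars.filter((name) => !env[name]);
}
```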