HuggingFace safetensors load benchmark #3396
Open
mridul-sahu wants to merge 2 commits into
A/B-comparison load benchmark for HuggingFace-format safetensors checkpoints: the SafetensorLoadBenchmark generator, per-tensor sharding staged by prepare.py (leading-dim FSDP / inner-dim TP), tiered model configs (GPT-2 through DeepSeek-V3 / Llama-3-405B), per-host correctness digests, and a run_suite.sh tier runner. The loader reports its per-host read accounting via jax.monitoring, so per-host bytes/reads land in the card without TensorStore counters.
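A per-host correctness digest can be sketched as follows. This is only an illustration: the PR text does not say how the digests are computed, so the `host_digest` helper, the sorted-name ordering, and the length-prefixed framing are all assumptions.

```python
import hashlib
from typing import Mapping


def host_digest(shards: Mapping[str, bytes]) -> str:
    """Hash one host's shard bytes in sorted-name order (hypothetical scheme)."""
    h = hashlib.sha256()
    for name in sorted(shards):
        data = shards[name]
        encoded = name.encode("utf-8")
        # Length-prefix the name and payload so field boundaries are unambiguous.
        h.update(len(encoded).to_bytes(8, "little"))
        h.update(encoded)
        h.update(len(data).to_bytes(8, "little"))
        h.update(data)
    return h.hexdigest()


# Two loaders that yield the same bytes on this host agree on the digest,
# regardless of the order in which tensors were loaded.
digest_a = host_digest({"b": b"\x01\x02", "a": b"\x03"})
digest_b = host_digest({"a": b"\x03", "b": b"\x01\x02"})
digest_changed = host_digest({"a": b"\x03", "b": b"\x01\x03"})
```

Comparing the digest from each host between the two arms of an A/B run would then flag any host whose loaded bytes differ.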
The current (transient-array) loader now emits /jax/orbax/read/safetensors/ bytes_read and num_reads per host, the same channel the sharding-driven loader uses, so the load benchmark's per-host bytes/reads card populates for the baseline too and an A/B run compares like-for-like. storage_reads is omitted: each per-host bundle is one contiguous read, so it would equal num_reads.
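The accounting described above can be sketched as below. The metric names come from this PR; everything else is an assumption. In particular, the `record` callback is a plain stand-in for the jax.monitoring call (whose exact form is not shown here), and `read_bundle` is a hypothetical helper.

```python
from collections import defaultdict
from typing import Callable

PREFIX = "/jax/orbax/read/safetensors/"


def read_bundle(
    blob: bytes, start: int, end: int, record: Callable[[str, int], None]
) -> bytes:
    """Read one contiguous byte range and report it as a single read."""
    data = blob[start:end]
    record(PREFIX + "bytes_read", len(data))
    # One bundle is one contiguous read, so a separate storage_reads
    # counter would always equal num_reads.
    record(PREFIX + "num_reads", 1)
    return data


# Stand-in listener that accumulates per-host totals for the card.
totals = defaultdict(int)


def accumulate(event: str, value: int) -> None:
    totals[event] += value


blob = bytes(range(256)) * 4  # 1024 bytes standing in for a checkpoint file
first = read_bundle(blob, 0, 512, accumulate)
second = read_bundle(blob, 512, 768, accumulate)
```

After the two reads, `totals` holds 768 bytes and 2 reads for this host, which is the pair of numbers the per-host card would show.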
Description
Adds a load-only benchmark for HuggingFace-format safetensors checkpoints under
_src/testing/benchmarks/safetensor/, for A/B-comparing the safetensors load path across revisions:
- SafetensorLoadBenchmark generator + tiered model configs (GPT-2 through DeepSeek-V3 / Llama-3-405B, in leading-dim FSDP and inner-dim TP variants), staged by prepare.py: it downloads a model from the HF Hub, optionally mirrors it to GCS, and emits the per-tensor sharding spec read from the safetensors headers. Per-host SHA-256 digests carry load correctness.
- The current loader reports per-host read accounting via jax.monitoring (/jax/orbax/read/safetensors/bytes_read, num_reads). It has no TensorStore counters, so this is what lets the benchmark's per-host bytes/reads card populate, and lets a future loader change be A/B'd against this baseline on the same channel. Telemetry only; load behavior is unchanged.
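As a rough illustration of deriving a per-tensor sharding spec from safetensors headers: the file layout used here (an 8-byte little-endian header length followed by a JSON header, with an optional `__metadata__` entry) is the documented safetensors format. The `sharding_spec` helper, its output shape, the reading of "inner-dim" as the last axis, and the fall-back to replication when a dimension does not divide evenly are assumptions, not what prepare.py necessarily does.

```python
import json
import struct


def read_header(blob: bytes) -> dict:
    """Parse the JSON header at the start of a safetensors file."""
    (header_len,) = struct.unpack("<Q", blob[:8])
    header = json.loads(blob[8 : 8 + header_len])
    header.pop("__metadata__", None)  # free-form metadata, not a tensor
    return header


def sharding_spec(header: dict, num_shards: int, mode: str) -> dict:
    """Pick a shard axis per tensor: axis 0 for "fsdp", last axis for "tp"."""
    spec = {}
    for name, info in header.items():
        shape = info["shape"]
        dim = None  # None means replicated (assumed fallback)
        if shape:
            candidate = 0 if mode == "fsdp" else len(shape) - 1
            if shape[candidate] % num_shards == 0:
                dim = candidate
        spec[name] = {"dtype": info["dtype"], "shape": shape, "shard_dim": dim}
    return spec


# Build a tiny in-memory safetensors file with two float32 tensors.
raw_header = json.dumps({
    "__metadata__": {"format": "pt"},
    "wte.weight": {"dtype": "F32", "shape": [8, 4], "data_offsets": [0, 128]},
    "ln.bias": {"dtype": "F32", "shape": [3], "data_offsets": [128, 140]},
}).encode("utf-8")
blob = struct.pack("<Q", len(raw_header)) + raw_header + bytes(140)

fsdp_spec = sharding_spec(read_header(blob), num_shards=4, mode="fsdp")
tp_spec = sharding_spec(read_header(blob), num_shards=4, mode="tp")
```

Only the header is parsed, so a spec like this can be produced without reading any tensor data.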
Builds on the per-name benchmark card work merged in #3395.
Type of change
Checklist