fix: filter single-file safetensors by assigned layers before push #83
Open
cjchanh wants to merge 1 commit into evilsocket:main
When a worker is assigned a subset of layers from a single-file safetensors model, extract only the needed tensors instead of pushing the entire file. For Qwen2.5-7B-4bit (4 GiB), a 2-layer iPad worker now receives 250 MiB instead of 4 GiB, staying well under the 3 GiB iOS jetsam limit.

The indexed model path already filtered correctly via weight_map. This extends the same extraction to the single-file fallback by:
- Reading the safetensors header to enumerate tensor names
- Filtering by assigned layer prefixes
- Calling extract_layer_tensors to build a minimal blob
- Falling back to full push when layers is empty (backward compat)

Verified: M5 master + iPad Air M3 worker, 2 layers, 250.1 MiB push, 1.4 GiB RSS, coherent output at 17.21 tok/s.
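For readers following the steps above, here is a minimal sketch of the filter-and-fallback decision using only the standard library. `PushPlan` and `plan_push` are hypothetical names for illustration, not the PR's API, and the trailing-dot match is an assumption of this sketch (it keeps `model.layers.1` from also matching `model.layers.10.*`); the actual code may match prefixes differently.

```rust
/// Outcome of deciding what to push to a worker (hypothetical type for this sketch).
#[derive(Debug, PartialEq)]
enum PushPlan {
    /// Push the whole single-file model.
    Full,
    /// Push only these tensors, in the order they were listed.
    Reduced(Vec<String>),
}

/// Keep the tensor names that belong to one of the assigned layers.
///
/// A name matches a layer when it starts with "<layer>."; the trailing dot
/// is an assumption of this sketch, not something the PR confirms.
fn plan_push(tensor_names: &[&str], layers: &[&str]) -> PushPlan {
    // No specific assignment: keep the old behaviour and push everything.
    if layers.is_empty() {
        return PushPlan::Full;
    }

    let selected: Vec<String> = tensor_names
        .iter()
        .filter(|name| {
            layers
                .iter()
                .any(|layer| name.starts_with(&format!("{layer}.")))
        })
        .map(|name| name.to_string())
        .collect();

    if selected.is_empty() {
        // Nothing matched: fall back to the full file and warn.
        eprintln!("warning: no tensors matched assigned layers, pushing full file");
        PushPlan::Full
    } else {
        PushPlan::Reduced(selected)
    }
}
```

Returning an explicit `Full` variant for both the empty-assignment and the no-match case keeps the two fallbacks on the same code path as the old behaviour.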
Author
This fix is still relevant from my side. I attempted a conflict-only rebase against current main but found that recent upstream changes (PR #84's iOS TCP retry refactor and adjacent commits) introduce API drift beyond a simple merge …
cjchanh added a commit to cjchanh/cake that referenced this pull request May 1, 2026
…row resolution)

Mobile workers receiving a single-file `.safetensors` model previously got the FULL file regardless of layer assignment. On 4 GiB single-file models (Qwen2.5-7B-Instruct-4bit) this exceeded iPad jetsam budgets and crashed with `early eof`.

Same root cause as PR evilsocket#83 against cake/main, but applied here on q4-metal-patchset (PR evilsocket#82's source branch) since PR evilsocket#83's branch (`fix/single-file-layer-filter` at ee01115) has API drift against current upstream and isn't cleanly rebasable.

Changes:
* cake-core/src/utils/split.rs:
  - extract `reduce_for_layers(&Index, &[String])` from the worker-specific `reduce_for_worker` (more general, layer-list-driven)
  - introduce `ReducedModelBundle { index_json, safetensors }` for the reduced-bundle return type
  - add `build_reduced_single_file_bundle(model_path, layers)` that reads the safetensors header, filters tensors by layer prefixes, and emits a minimal safetensors blob + matching index.json
* cake-core/src/cake/sharding/mod.rs:
  - replace the single-file fallback (which pushed the full model regardless of layer) with the reduced-bundle path
  - generalize `inline_files: HashMap<String, Vec<u8>>` so both the indexed and single-file paths can stream multiple inline blobs (index + reduced safetensors)
  - import `HashMap` (already had `HashSet`)

Test coverage and benchmark updates pair with this in the existing q4-metal-patchset commits.

Closes spec 199's cake-q4-branch NEEDS_OPERATOR_DECISION row with disposition: COMMIT (intentional q4 follow-up; preserves PR evilsocket#82 contribution path; commit stays local until operator authorizes fork push).

Spec: 199-triage-dirty-trees-across-active-portfolio (cake-q4-branch row)
SOP: ~/Documents/Centennial/SOPs/CDS_Stuck_Spec_Triage_SOP_v1.md §3.A v1.1
Triage report: ~/ai/evidence/spec-096-triage-20260430/TRIAGE_REPORT.md
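To make "emits a minimal safetensors blob" concrete, the sketch below serializes a few tensors into the safetensors layout (an 8-byte little-endian header length, a JSON header, then the raw tensor bytes) and reads the header back. `TensorEntry`, `build_blob` and `header_json` are hypothetical names for illustration, not the functions in `split.rs`. The sketch assumes tensor names and dtypes need no JSON escaping, and it omits any header padding a production writer might add for alignment.

```rust
use std::convert::TryFrom;

/// One tensor to serialize (hypothetical struct for this sketch).
struct TensorEntry<'a> {
    name: &'a str,
    dtype: &'a str, // e.g. "F32" or "U8"
    shape: &'a [usize],
    data: &'a [u8], // raw bytes, already in on-disk layout
}

/// Serialize tensors as: 8-byte LE header length, JSON header, raw data.
/// Assumes names and dtypes contain nothing that needs JSON escaping.
fn build_blob(tensors: &[TensorEntry<'_>]) -> Vec<u8> {
    let mut header = String::from("{");
    let mut offset = 0usize;
    for (i, t) in tensors.iter().enumerate() {
        if i > 0 {
            header.push(',');
        }
        let shape: Vec<String> = t.shape.iter().map(|d| d.to_string()).collect();
        let end = offset + t.data.len();
        // data_offsets are relative to the start of the data section.
        header.push_str(&format!(
            "\"{}\":{{\"dtype\":\"{}\",\"shape\":[{}],\"data_offsets\":[{},{}]}}",
            t.name,
            t.dtype,
            shape.join(","),
            offset,
            end
        ));
        offset = end;
    }
    header.push('}');

    let mut blob = Vec::with_capacity(8 + header.len() + offset);
    blob.extend_from_slice(&(header.len() as u64).to_le_bytes());
    blob.extend_from_slice(header.as_bytes());
    for t in tensors {
        blob.extend_from_slice(t.data);
    }
    blob
}

/// Read the JSON header back out of a blob, or None if it is malformed.
fn header_json(blob: &[u8]) -> Option<&str> {
    let mut len_bytes = [0u8; 8];
    len_bytes.copy_from_slice(blob.get(..8)?);
    let len = usize::try_from(u64::from_le_bytes(len_bytes)).ok()?;
    let end = 8usize.checked_add(len)?;
    std::str::from_utf8(blob.get(8..end)?).ok()
}
```

Because offsets are recomputed from zero for the reduced blob, a worker can load it like any other safetensors file without knowing it was cut from a larger one.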
Problem
When a Cake master distributes a single-file safetensors model to a worker, it pushes the entire file regardless of how many layers the worker is assigned. For `Qwen2.5-7B-Instruct-4bit` (4 GiB single file), an iPad worker with a 3 GiB jetsam budget receives the full 4 GiB, exceeds memory, and crashes with `early eof`.

The indexed model path (`model.safetensors.index.json` present) already filters correctly via `weight_map`. The single-file fallback at `sharding/mod.rs` unconditionally adds `model.safetensors` to the push list.

Fix
For single-file models with assigned layers, the push path now:
- Reads the safetensors header to enumerate tensor names
- Filters tensor names by assigned layer prefixes (same `starts_with` logic as the indexed path)
- Calls `extract_layer_tensors` to build a minimal safetensors blob containing only the needed tensors

Backward compatible: if `layers` is empty (no specific assignment), the full file is still pushed. If no tensors match the assigned layers, the push falls back to the full file with a warning.

Results
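Tested with M5 Max master + iPad Air M3 worker, `Qwen2.5-7B-Instruct-4bit`:
- Before: full 4 GiB push, worker exceeds its memory budget and crashes (`early eof`)
- After (2 layers assigned): 250.1 MiB push, 1.4 GiB RSS, coherent output at 17.21 tok/s

Test plan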
- `cargo test -p cake-core --lib`: 641 tests pass (638 existing + 3 new)
- `cargo test -p cake-core --test unit`: 235 tests pass
- `cargo clippy`: zero new warnings

New unit tests
- `extract_layer_tensors_single_file_filters_correctly`: 4 tensors in, request 2, verify only 2 in output with correct data bytes
- `extract_layer_tensors_single_file_all_layers`: request all tensors, verify all present with correct total size
- `extract_layer_tensors_single_file_missing_tensor_errors`: request nonexistent tensor, verify error
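
The contract these three tests check can be illustrated with an in-memory stand-in. `extract_tensors` below is a hypothetical helper operating on a name-to-bytes map; it is not the real `extract_layer_tensors`, whose signature is not shown in this PR. The point is only the behaviour: requested tensors come back with their bytes intact, and a missing name is an error rather than a silently incomplete blob.

```rust
use std::collections::BTreeMap;

/// Copy the requested tensors out of an in-memory name -> bytes map.
/// Fails on the first name that is not present (hypothetical stand-in,
/// not the PR's `extract_layer_tensors`).
fn extract_tensors(
    all: &BTreeMap<String, Vec<u8>>,
    wanted: &[&str],
) -> Result<Vec<(String, Vec<u8>)>, String> {
    wanted
        .iter()
        .map(|name| {
            all.get(*name)
                .map(|bytes| (name.to_string(), bytes.clone()))
                .ok_or_else(|| format!("tensor not found: {name}"))
        })
        .collect()
}
```

Collecting into a `Result` stops at the first missing tensor, so a partially filled output is never returned.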