fix: filter single-file safetensors by assigned layers before push by cjchanh · Pull Request #83 · evilsocket/cake

cjchanh · 2026-04-14T17:03:41Z

Problem

When a Cake master distributes a single-file safetensors model to a worker, it pushes the entire file regardless of how many layers the worker is assigned. For Qwen2.5-7B-Instruct-4bit (a 4 GiB single file), an iPad worker with a 3 GiB jetsam budget receives the full 4 GiB, exceeds its memory budget, and crashes with `early eof`.

The indexed model path (`model.safetensors.index.json` present) already filters correctly via `weight_map`. The single-file fallback in `sharding/mod.rs`, however, unconditionally adds `model.safetensors` to the push list.
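For comparison, the indexed path's filter can be sketched as a prefix match over `weight_map` (tensor name to shard file) that keeps only the shards holding an assigned layer's tensors. This is a minimal illustration, not Cake's code; `shards_for_layers` and the `model.layers.N.` prefix format are assumptions.

```rust
use std::collections::{BTreeSet, HashMap};

/// Hypothetical helper: shard files that hold at least one tensor of an
/// assigned layer. The "model.layers.N." prefix format is an assumption.
fn shards_for_layers(
    weight_map: &HashMap<String, String>,
    layer_prefixes: &[&str],
) -> BTreeSet<String> {
    weight_map
        .iter()
        // Keep a tensor if its name starts with any assigned layer prefix.
        .filter(|(tensor, _)| layer_prefixes.iter().any(|&p| tensor.starts_with(p)))
        // Only the shard file name matters for the push list.
        .map(|(_, shard)| shard.clone())
        .collect()
}

fn main() {
    // Toy weight_map: tensor name -> shard file, as in model.safetensors.index.json.
    let weight_map: HashMap<String, String> = [
        ("model.embed_tokens.weight", "model-00001-of-00003.safetensors"),
        ("model.layers.0.self_attn.q_proj.weight", "model-00001-of-00003.safetensors"),
        ("model.layers.1.mlp.down_proj.weight", "model-00002-of-00003.safetensors"),
        ("model.layers.10.mlp.down_proj.weight", "model-00003-of-00003.safetensors"),
    ]
    .iter()
    .map(|&(t, s)| (t.to_string(), s.to_string()))
    .collect();

    // The trailing '.' keeps "model.layers.1." from also matching layer 10.
    let shards = shards_for_layers(&weight_map, &["model.layers.1."]);
    println!("{:?}", shards);
}
```

One detail worth checking in any `starts_with` filter: a prefix without a trailing separator, such as `model.layers.1`, also matches `model.layers.10`. Whether Cake's layer prefixes carry the trailing `.` is not stated in this PR.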

Fix

For single-file models with assigned layers, the push path now:

- Reads only the safetensors header to enumerate tensor names
- Filters tensors by assigned layer prefixes (same `starts_with` logic as the indexed path)
- Calls `extract_layer_tensors` to build a minimal safetensors blob containing only the needed tensors
- Pushes the reduced blob instead of the full file
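The header-read and rebuild steps above can be sketched with only the standard library, following the public safetensors layout: an 8-byte little-endian header length, a JSON header whose `data_offsets` are relative to the start of the data section, then the raw tensor bytes. This is not the PR's `extract_layer_tensors`; `Tensor`, `build_blob`, and `read_header` are hypothetical names, and the JSON is formatted by hand, assuming tensor names need no escaping.

```rust
use std::io::{self, Read};

/// Hypothetical record for one tensor to keep.
struct Tensor<'a> {
    name: &'a str,
    dtype: &'a str, // safetensors dtype string, e.g. "F16"
    shape: Vec<usize>,
    data: &'a [u8],
}

/// Build a minimal safetensors blob:
/// [u64 LE header length][JSON header, space-padded][tensor data].
fn build_blob(tensors: &[Tensor]) -> Vec<u8> {
    let mut entries = Vec::new();
    let mut offset = 0usize;
    for t in tensors {
        let end = offset + t.data.len();
        let shape: Vec<String> = t.shape.iter().map(|d| d.to_string()).collect();
        entries.push(format!(
            "\"{}\":{{\"dtype\":\"{}\",\"shape\":[{}],\"data_offsets\":[{},{}]}}",
            t.name, t.dtype, shape.join(","), offset, end
        ));
        offset = end;
    }
    let mut header = format!("{{{}}}", entries.join(",")).into_bytes();
    // Trailing spaces are allowed; padding keeps the data 8-byte aligned.
    while header.len() % 8 != 0 {
        header.push(b' ');
    }
    let mut blob = Vec::with_capacity(8 + header.len() + offset);
    blob.extend_from_slice(&(header.len() as u64).to_le_bytes());
    blob.extend_from_slice(&header);
    for t in tensors {
        blob.extend_from_slice(t.data);
    }
    blob
}

/// Read only the length prefix and JSON header, leaving tensor data unread.
/// Works on any `Read`, such as a `std::fs::File`.
fn read_header<R: Read>(mut r: R) -> io::Result<String> {
    let mut len_bytes = [0u8; 8];
    r.read_exact(&mut len_bytes)?;
    let len = u64::from_le_bytes(len_bytes) as usize;
    let mut header = vec![0u8; len];
    r.read_exact(&mut header)?;
    String::from_utf8(header).map_err(|e| io::Error::new(io::ErrorKind::InvalidData, e))
}

fn main() -> io::Result<()> {
    let a = [1u8, 2, 3, 4]; // F16 x 2 elements = 4 bytes
    let b = [5u8, 6, 7, 8, 9, 10, 11, 12]; // F16 x 4 elements = 8 bytes
    let blob = build_blob(&[
        Tensor { name: "model.layers.1.input_layernorm.weight", dtype: "F16", shape: vec![2], data: &a },
        Tensor { name: "model.layers.1.mlp.up_proj.weight", dtype: "F16", shape: vec![2, 2], data: &b },
    ]);
    // A byte slice implements Read, so this parses just the header.
    let header = read_header(&blob[..])?;
    println!("{} header bytes, {} total", header.len(), blob.len());
    println!("{}", header.trim_end());
    Ok(())
}
```

Reading the header through `read_exact` on a file touches only `8 + N` bytes, which is what makes enumerating tensor names cheap even for a 4 GiB model.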

Backward compatible: if `layers` is empty (no specific assignment), the full file is still pushed. If no tensors match the assigned layers, the push falls back to the full file with a warning.
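The fallback rules above amount to a three-way decision. A minimal sketch, using a hypothetical `PushPlan` enum and `plan_push` function (not Cake's actual types), with `eprintln!` standing in for whatever logging the real code uses:

```rust
/// Hypothetical outcome of planning a single-file push.
#[derive(Debug, PartialEq)]
enum PushPlan {
    /// Send model.safetensors unchanged.
    Full,
    /// Send a reduced blob containing only these tensor names.
    Reduced(Vec<String>),
}

fn plan_push(tensor_names: &[String], layers: &[String]) -> PushPlan {
    // No specific assignment: keep the old behaviour.
    if layers.is_empty() {
        return PushPlan::Full;
    }
    // Same prefix test as the indexed path.
    let selected: Vec<String> = tensor_names
        .iter()
        .filter(|name| layers.iter().any(|layer| name.starts_with(layer.as_str())))
        .cloned()
        .collect();
    if selected.is_empty() {
        // Nothing matched: fall back to the full file, but say so.
        eprintln!("warning: no tensors matched assigned layers {:?}; pushing full file", layers);
        return PushPlan::Full;
    }
    PushPlan::Reduced(selected)
}

fn main() {
    let names: Vec<String> = [
        "model.embed_tokens.weight",
        "model.layers.0.mlp.up_proj.weight",
        "model.layers.1.mlp.up_proj.weight",
    ]
    .iter()
    .map(|s| s.to_string())
    .collect();
    let layers = vec!["model.layers.1.".to_string()];
    println!("{:?}", plan_push(&names, &layers));
    println!("{:?}", plan_push(&names, &[]));
}
```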

Results

Tested with M5 Max master + iPad Air M3 worker, Qwen2.5-7B-Instruct-4bit:

| Metric | Before | After |
|---|---|---|
| Push size | 4 GiB (full model) | 250.1 MiB (52 tensors, 2 layers) |
| iPad RSS | jetsam kill | 1.4 GiB (under 3 GiB limit) |
| Result | crash (`early eof`) | coherent output at 17.21 tok/s |

Test plan

- `cargo test -p cake-core --lib`: 641 tests pass (638 existing + 3 new)
- `cargo test -p cake-core --test unit`: 235 tests pass
- `cargo clippy`: zero new warnings
- Integration: M5 master + iPad Air M3, 2 layers of 7B-4bit, verified 250.1 MiB push, 1.4 GiB RSS, correct inference
- Extended inference: longer generation to verify sustained correctness across distributed layers

New unit tests

- `extract_layer_tensors_single_file_filters_correctly`: 4 tensors in, request 2; verify only those 2 are in the output, with correct data bytes
- `extract_layer_tensors_single_file_all_layers`: request all tensors; verify all are present with the correct total size
- `extract_layer_tensors_single_file_missing_tensor_errors`: request a nonexistent tensor; verify an error is returned
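The three unit tests describe a contract that can be mirrored on a toy in-memory store: return exactly the requested tensors with their own bytes, and fail on a missing name rather than produce a short blob. `extract` and its `HashMap<String, Vec<u8>>` input are hypothetical; the real `extract_layer_tensors` works on safetensors data and its signature is not shown in the PR.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in: copy out the requested tensors in request order,
/// or return an error naming the first missing one.
fn extract(
    tensors: &HashMap<String, Vec<u8>>,
    wanted: &[&str],
) -> Result<Vec<(String, Vec<u8>)>, String> {
    wanted
        .iter()
        .map(|&name| {
            tensors
                .get(name)
                .map(|bytes| (name.to_string(), bytes.clone()))
                .ok_or_else(|| format!("tensor not found: {}", name))
        })
        // Stops at the first Err, mirroring "missing tensor errors".
        .collect()
}

fn main() {
    // Four toy tensors, each filled with its layer index.
    let tensors: HashMap<String, Vec<u8>> = (0..4)
        .map(|i| (format!("model.layers.{}.w", i), vec![i as u8; 4]))
        .collect();

    // 1. Request 2 of 4: only those come back, with their own bytes.
    let two = extract(&tensors, &["model.layers.1.w", "model.layers.3.w"]).unwrap();
    assert_eq!(two.len(), 2);
    assert_eq!(two[0].1, vec![1u8; 4]);

    // 2. Request all: total size is the sum of every tensor.
    let names: Vec<String> = tensors.keys().cloned().collect();
    let refs: Vec<&str> = names.iter().map(|s| s.as_str()).collect();
    let all = extract(&tensors, &refs).unwrap();
    let total: usize = all.iter().map(|(_, b)| b.len()).sum();
    assert_eq!(total, 16);

    // 3. Request a missing tensor: error instead of a silently short blob.
    assert!(extract(&tensors, &["model.layers.9.w"]).is_err());
    println!("all three cases behave as described");
}
```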

When a worker is assigned a subset of layers from a single-file safetensors model, extract only the needed tensors instead of pushing the entire file. For Qwen2.5-7B-4bit (4 GiB), a 2-layer iPad worker now receives 250 MiB instead of 4 GiB, staying well under the 3 GiB iOS jetsam limit.

The indexed model path already filtered correctly via weight_map. This extends the same extraction to the single-file fallback by:

- Reading the safetensors header to enumerate tensor names
- Filtering by assigned layer prefixes
- Calling extract_layer_tensors to build a minimal blob
- Falling back to full push when layers is empty (backward compat)

Verified: M5 master + iPad Air M3 worker, 2 layers, 250.1 MiB push, 1.4 GiB RSS, coherent output at 17.21 tok/s.

cjchanh · 2026-04-30T17:58:57Z

This fix is still relevant from my side. I attempted a conflict-only rebase against current main, but recent upstream changes (PR #84's iOS TCP retry refactor and adjacent commits) introduce API drift beyond a simple merge: the `Strategy::assign_layers` trait signature changed (7→8 params), the `Message::DeviceInfoRequest` variant was removed, and the `BUILD_HASH` constant moved, producing 16 compile errors when ee01115 is rebased onto current main. Rather than ship a broken-build force-push, I'm leaving this PR in CONFLICTING state. Happy to either redo this as a fresh PR against current main (cherry-picking only the minimal safetensors-filter logic) or close this in favor of that; let me know which you'd prefer.

…row resolution)

Mobile workers receiving a single-file `.safetensors` model previously got the FULL file regardless of layer assignment. On 4 GiB single-file models (Qwen2.5-7B-Instruct-4bit) this exceeded iPad jetsam budgets and crashed with `early eof`.

Same root cause as PR evilsocket#83 against cake/main, but applied here on q4-metal-patchset (PR evilsocket#82's source branch) since PR evilsocket#83's branch (`fix/single-file-layer-filter` at ee01115) has API drift against current upstream and isn't cleanly rebasable.

Changes:

* cake-core/src/utils/split.rs:
  - extract `reduce_for_layers(&Index, &[String])` from the worker-specific `reduce_for_worker` (more general, layer-list-driven)
  - introduce `ReducedModelBundle { index_json, safetensors }` for the reduced-bundle return type
  - add `build_reduced_single_file_bundle(model_path, layers)` that reads the safetensors header, filters tensors by layer prefixes, and emits a minimal safetensors blob + matching index.json
* cake-core/src/cake/sharding/mod.rs:
  - replace the single-file fallback (which pushed the full model regardless of layer) with the reduced-bundle path
  - generalize `inline_files: HashMap<String, Vec<u8>>` so both the indexed and single-file paths can stream multiple inline blobs (index + reduced safetensors)
  - import `HashMap` (already had `HashSet`)

Test coverage and benchmark updates pair with this in the existing q4-metal-patchset commits.

Closes spec 199's cake-q4-branch NEEDS_OPERATOR_DECISION row with disposition: COMMIT (intentional q4 follow-up; preserves PR evilsocket#82 contribution path; commit stays local until operator authorizes fork push).

Spec: 199-triage-dirty-trees-across-active-portfolio (cake-q4-branch row)
SOP: ~/Documents/Centennial/SOPs/CDS_Stuck_Spec_Triage_SOP_v1.md §3.A v1.1
Triage report: ~/ai/evidence/spec-096-triage-20260430/TRIAGE_REPORT.md
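The commit message says `build_reduced_single_file_bundle` emits a reduced blob plus a matching index.json, and that `inline_files` becomes a `HashMap<String, Vec<u8>>`. A minimal sketch of how those pieces could fit, assuming both `ReducedModelBundle` fields hold raw bytes and the map is keyed by plain file names; only `ReducedModelBundle`, `index_json`, `safetensors`, and `inline_files` come from the commit, while `index_for` is hypothetical.

```rust
use std::collections::HashMap;

/// Field names from the commit message; the field types are assumptions.
struct ReducedModelBundle {
    index_json: Vec<u8>,
    safetensors: Vec<u8>,
}

/// Hypothetical helper: an index.json whose weight_map points every kept
/// tensor at the single reduced file. Hand-formatted JSON; assumes names
/// need no escaping.
fn index_for(tensor_names: &[&str], file: &str) -> Vec<u8> {
    let entries: Vec<String> = tensor_names
        .iter()
        .map(|name| format!("\"{}\":\"{}\"", name, file))
        .collect();
    format!("{{\"weight_map\":{{{}}}}}", entries.join(",")).into_bytes()
}

fn main() {
    let kept = ["model.layers.1.mlp.up_proj.weight", "model.layers.1.mlp.down_proj.weight"];
    let bundle = ReducedModelBundle {
        index_json: index_for(&kept, "model.safetensors"),
        safetensors: vec![0u8; 16], // placeholder for the reduced blob
    };

    // One map carries every inline blob, whichever path produced it.
    let mut inline_files: HashMap<String, Vec<u8>> = HashMap::new();
    inline_files.insert("model.safetensors.index.json".to_string(), bundle.index_json);
    inline_files.insert("model.safetensors".to_string(), bundle.safetensors);

    let mut names: Vec<&String> = inline_files.keys().collect();
    names.sort();
    println!("{:?}", names);
    println!("{}", String::from_utf8_lossy(&inline_files["model.safetensors.index.json"]));
}
```

Emitting an index alongside the reduced file plausibly lets the worker load it the same way as an indexed model; that this is how Cake's worker consumes the bundle is an assumption, not something the commit states.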

cjchanh wants to merge 1 commit into evilsocket:main from cjchanh:fix/single-file-layer-filter
