[Optimization] Incremental checkpoint save for dcp on torch 2.7.x (ARM CPU optimization) #1525

Open
tina-wen wants to merge 1 commit into InternLM:main from tina-wen:dcp_save

tina-wen commented Mar 3, 2026

Description

This PR optimizes dcp.save performance on ARM CPUs by implementing incremental metadata saving for torch 2.7.1.

Implementation

  • Incremental save: Save only the metadata changes after the first checkpoint
  • xtuner framework patch: Add a patch_for_dcp_finish config flag
  • API update: Switch to storage_writer/planner for dcp.save
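
The incremental idea in the list above can be illustrated with a minimal sketch. This is not the code from this PR: IncrementalMetadataTracker, its delta method, and the plain dict used as a stand-in for checkpoint metadata are all hypothetical names chosen for illustration. The sketch only shows the general pattern of emitting the full metadata once and then only the entries that changed.

```python
from typing import Any, Dict, List, Optional, Tuple

# Sentinel so that a key missing from the previous snapshot never compares equal.
_MISSING = object()


class IncrementalMetadataTracker:
    """Hypothetical helper (not part of torch or xtuner).

    Remembers the metadata seen at the previous checkpoint and reports
    only the entries that changed or disappeared since then.
    """

    def __init__(self) -> None:
        self._previous: Optional[Dict[str, Any]] = None

    def delta(self, current: Dict[str, Any]) -> Tuple[Dict[str, Any], List[str]]:
        if self._previous is None:
            # First checkpoint: everything counts as changed.
            changed = dict(current)
            removed: List[str] = []
        else:
            changed = {
                key: value
                for key, value in current.items()
                if self._previous.get(key, _MISSING) != value
            }
            removed = sorted(key for key in self._previous if key not in current)
        # Keep a copy so later mutation of `current` does not affect the snapshot.
        self._previous = dict(current)
        return changed, removed


# Usage: plain dicts stand in for real checkpoint metadata (an assumption).
tracker = IncrementalMetadataTracker()
first = {"model.weight": (4, 4), "optimizer.step": 100}
second = {"model.weight": (4, 4), "optimizer.step": 200}

full, _ = tracker.delta(first)            # first save: full metadata
changed, removed = tracker.delta(second)  # later save: only the difference
```

In this sketch the second call returns only the optimizer.step entry, because the tensor shape entry is unchanged. How the PR itself detects and writes changes is not described here, so the comparison strategy above is an assumption.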

Performance

Checkpoint saving performance improved by up to 85%

Compatibility

✅ Works with existing ckpt_save
✅ No precision issues on recovery
✅ No PyTorch/PTA source changes
