
cleaning _accumulate_local_param_grad#46394

Open
3outeille wants to merge 8 commits into sp_tp_ep_plan from to_local_swap_tensors

Conversation

@3outeille
Member

@3outeille 3outeille commented Jun 4, 2026

We can't remove _accumulate_local_param_grad entirely: a user might want to train in TP only (without using grouped_gemm for the experts), which involves moe_tp_gate_up_colwise and therefore StridedShard, so the gradient still has to be stitched manually.
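To illustrate why a strided layout calls for manual stitching, here is a toy sketch. It assumes (this is not confirmed by the PR) that the fused gate_up weight is stored as [gate; up] along one dimension and that each TP rank holds its slice of gate followed by its slice of up. Plain Python lists stand in for tensors, and the helper names are hypothetical, not part of transformers or PyTorch.

```python
# Toy illustration only: lists stand in for tensor rows, and none of
# these helper names exist in transformers or PyTorch.

def shard_fused_gate_up(fused_rows, tp_size):
    """Give each rank its slice of gate AND its slice of up (strided layout)."""
    half = len(fused_rows) // 2
    gate, up = fused_rows[:half], fused_rows[half:]
    per_rank = half // tp_size
    shards = []
    for rank in range(tp_size):
        lo, hi = rank * per_rank, (rank + 1) * per_rank
        shards.append(gate[lo:hi] + up[lo:hi])
    return shards


def stitch_fused_gate_up(shards):
    """Rebuild the fused [gate; up] order from the per-rank pieces."""
    gate, up = [], []
    for shard in shards:
        half = len(shard) // 2
        gate.extend(shard[:half])  # first half of each shard is gate rows
        up.extend(shard[half:])    # second half is up rows
    return gate + up


fused_grad = ["g0", "g1", "g2", "g3", "u0", "u1", "u2", "u3"]
local_grads = shard_fused_gate_up(fused_grad, tp_size=2)

# Naive concatenation of the local shards interleaves gate and up rows.
naive = [row for shard in local_grads for row in shard]

# Stitching restores the original fused order.
stitched = stitch_fused_gate_up(local_grads)
```

Under these assumptions, concatenating the per-rank gradients in rank order does not reproduce the full gradient, which is the kind of mismatch a stitching step has to correct.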

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@3outeille 3outeille changed the title from "remove _accumulate_local_param_grad" to "cleaning _accumulate_local_param_grad" Jun 4, 2026
@github-actions
Contributor

github-actions Bot commented Jun 4, 2026

View the CircleCI Test Summary for this PR:

https://huggingface.co/spaces/transformers-community/circle-ci-viz?pr=46394&sha=2eba46


Labels

None yet

Projects

None yet


2 participants