HF Buckets integration to Trainer by SunMarc · Pull Request #46386 · huggingface/transformers

SunMarc · 2026-06-03T16:36:55Z

What does this PR do?

This PR adds support for HF Buckets in Trainer. Instead of storing checkpoints only locally or in the Hub repository, we can store them in HF Buckets (S3-like storage with Xet deduplication).

With buckets (push_to_bucket=True), users no longer need to save checkpoints to the Hub as before (hub_strategy="checkpoint" or "all_checkpoints"), since all checkpoints are pushed to the bucket. Pushing to the Hub is still useful (hub_strategy="every_save"), as it uploads a version of the model that can be loaded with transformers.

Features:

Saving checkpoints to HF Buckets
Resuming training from HF Buckets

Right now, everything should be about as compatible as before, since we are just syncing a local dir to the HF Bucket asynchronously. In the future, it would be much better to load from and save to HF Buckets directly without going through the disk; this might be useful for DeepSpeed and FSDP.
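The description only says the local directory is synced to the bucket asynchronously. A minimal sketch of that pattern, assuming a hypothetical `upload_dir` callable that stands in for whatever huggingface_hub call actually performs the upload: the sync is submitted to a background thread, and the training loop only keeps the returned future.

```python
from concurrent.futures import Future, ThreadPoolExecutor
from pathlib import Path
from typing import Callable

# Hypothetical signature for the real bucket upload (not an actual huggingface_hub API).
UploadFn = Callable[[Path, str], None]


def start_async_sync(
    executor: ThreadPoolExecutor, upload_dir: UploadFn, local_dir: str, bucket_id: str
) -> Future:
    """Submit a sync of `local_dir` to `bucket_id` and return without waiting."""
    return executor.submit(upload_dir, Path(local_dir), bucket_id)


# Usage: training continues while the (fake) upload runs in the background.
uploaded = []
with ThreadPoolExecutor(max_workers=1) as pool:
    future = start_async_sync(
        pool,
        lambda path, bucket: uploaded.append((path.name, bucket)),
        "out/checkpoint-500",
        "my-org/my-run",
    )
    # ... more training steps would run here ...
    future.result()  # wait only when the upload must be finished, e.g. at the end of training
```

Because the checkpoint is written to disk first and uploaded afterwards, this keeps the existing save logic untouched, which is the compatibility argument made above.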

Usage

from transformers import Trainer, TrainingArguments

# `model` and `train_dataset` are assumed to be defined beforehand
trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="out",
        save_strategy="steps",
        push_to_bucket=True,
        bucket_id="my-org/my-run",  # optional: defaults to hub_model_id, else output_dir
    ),
    train_dataset=train_dataset,
)
trainer.train()

# resume on a fresh machine: pulls the latest checkpoint from the bucket
trainer.train(resume_from_checkpoint="hf://buckets/my-org/my-run")
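Resuming takes an `hf://buckets/<namespace>/<name>` URI instead of a local path. The PR description doesn't show how Trainer parses it; a hypothetical helper that splits such a URI into the bucket id used for `bucket_id` (e.g. `my-org/my-run`) could look like this. Treating anything after the bucket name as a path inside the bucket is an assumption.

```python
from typing import Optional, Tuple

BUCKET_PREFIX = "hf://buckets/"


def parse_bucket_uri(uri: str) -> Optional[Tuple[str, str]]:
    """Split an hf://buckets/ URI into (bucket_id, path_in_bucket).

    Returns None for anything that is not a bucket URI (e.g. a local path),
    so callers can fall back to the usual local-checkpoint behaviour.
    """
    if not uri.startswith(BUCKET_PREFIX):
        return None
    parts = uri[len(BUCKET_PREFIX):].strip("/").split("/")
    if len(parts) < 2 or not parts[0] or not parts[1]:
        raise ValueError(f"Expected hf://buckets/<namespace>/<name>, got {uri!r}")
    bucket_id = f"{parts[0]}/{parts[1]}"
    path_in_bucket = "/".join(parts[2:])  # empty string means "the whole bucket"
    return bucket_id, path_in_bucket
```

For example, `parse_bucket_uri("hf://buckets/my-org/my-run")` gives `("my-org/my-run", "")`, while a plain local path such as `"out/checkpoint-500"` gives `None`.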

HuggingFaceDocBuilderDev · 2026-06-03T16:51:09Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

stevhliu

super nice! 🪣


ArthurZucker

Nice! I was wondering whether mounting the bucket and just writing to it wouldn't make more sense?

good defaults?! since anyways

ArthurZucker · 2026-06-04T12:32:24Z

+    model=model,
+    args=TrainingArguments(
+        push_to_bucket=True,
+        bucket_id="my-org/my-run",


we could have a good default for this IMO; the fewer args the better!

Yeah, I didn't specify it, but it defaults to hub_model_id if set, and to output_dir otherwise.

so this is actually optional
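A minimal sketch of the fallback order described in the reply, so `bucket_id` can be omitted. The function name is hypothetical, and the thread doesn't say how `output_dir` is turned into a bucket id (for example whether a namespace is prepended), so this version simply returns it unchanged.

```python
from typing import Optional


def resolve_bucket_id(
    bucket_id: Optional[str], hub_model_id: Optional[str], output_dir: str
) -> str:
    """Pick the bucket to push to when `bucket_id` is not given explicitly."""
    if bucket_id:  # an explicit value always wins
        return bucket_id
    if hub_model_id:  # otherwise reuse the Hub repo id, if one is configured
        return hub_model_id
    # Last resort: derive it from output_dir. Returning it unchanged is a
    # simplification; the real code may normalize it first.
    return output_dir
```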

ArthurZucker · 2026-06-04T12:33:22Z

+        self.push_in_progress = None  # Tracks the in-flight repo push
        if self.args.push_to_hub:
            self.init_hf_repo()
+        if self.args.push_to_bucket and self.is_world_process_zero():


we could have push_in_progress = True default to push to a bucket?

This is a private attribute unrelated to pushing to a bucket; it tracks that we are still pushing a checkpoint so that we don't interrupt it. If we wanted to push to a bucket by default, we would have to make push_to_bucket=True the default.
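The bookkeeping behind `push_in_progress` follows the rule in the diff comment quoted in this thread: if the previous push hasn't finished, the next one is skipped unless `hub_always_push=True`. A minimal sketch of that guard, assuming the in-flight push is represented by a `concurrent.futures.Future`; the class and method names are hypothetical.

```python
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Callable, Optional


class PushTracker:
    """Hypothetical helper mirroring Trainer's `push_in_progress` bookkeeping."""

    def __init__(self, executor: ThreadPoolExecutor, always_push: bool = False):
        self.executor = executor
        self.always_push = always_push  # plays the role of args.hub_always_push
        self.push_in_progress: Optional[Future] = None

    def maybe_push(self, push_fn: Callable[[], None]) -> bool:
        """Start `push_fn` in the background; return False if it was skipped."""
        busy = self.push_in_progress is not None and not self.push_in_progress.done()
        if busy and not self.always_push:
            return False  # leave the in-flight push alone and skip this one
        self.push_in_progress = self.executor.submit(push_fn)
        return True
```

Skipping rather than queueing means a slow upload never piles up behind further saves; the next save after it finishes simply pushes the newer state.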

qgallouedec

lgtm, just a few questions and edge cases

qgallouedec · 2026-06-04T12:31:42Z

-                revision=self.args.hub_revision,
+
+            # Full checkpoint -> repo (unchanged behavior, gated by hub_strategy).
+            if self.args.hub_strategy in [HubStrategy.CHECKPOINT, HubStrategy.ALL_CHECKPOINTS]:


we don't need the modeling_files in this case?

There are three things happening sequentially:

if push to hub: we update output_dir with the most recent model, tokenizer and so on (these are the modeling_files) and upload that to the Hub

if push to hub + hub_strategy="checkpoint" or "all_checkpoints": we either update the last_checkpoint folder with the new checkpoint or upload the new checkpoint

if push_to_bucket, we sync the output_dir folder with the bucket

One thing I didn't add, but maybe should: right now we don't update output_dir with the modeling_files when pushing to a bucket. We should probably do this to have parity with push to hub and avoid confusion.
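The three sequential steps described in the reply can be sketched as a function that lists what runs after each save, in order. The function and action names are hypothetical, the strategy values are the strings used in this thread, and the early return for `hub_strategy="end"` is left out to keep the focus on ordering.

```python
from typing import List

# Hub strategies that upload full checkpoints to the repo (values as used in the thread).
CHECKPOINT_STRATEGIES = {"checkpoint", "all_checkpoints"}


def save_actions(push_to_hub: bool, hub_strategy: str, push_to_bucket: bool) -> List[str]:
    """List, in order, what happens after a checkpoint is saved."""
    actions = []
    if push_to_hub:
        # 1. refresh output_dir with the latest modeling files and upload them
        actions.append("upload modeling files to repo")
        # 2. optionally push the full checkpoint to the repo as well
        if hub_strategy in CHECKPOINT_STRATEGIES:
            actions.append("upload checkpoint to repo")
    if push_to_bucket:
        # 3. mirror the whole output_dir into the bucket
        actions.append("sync output_dir to bucket")
    return actions
```

With `push_to_hub=False, push_to_bucket=True`, only the bucket sync runs, which is why output_dir does not receive refreshed modeling files in that configuration.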

qgallouedec · 2026-06-04T12:41:42Z

+        """Push model files to the repo and/or sync the checkpoint to the bucket, from a checkpoint folder."""
        if not self.is_world_process_zero() or self.args.hub_strategy == HubStrategy.END:
            return
        # If we haven't finished the last push, we don't do this one unless args.hub_always_push=True.


I think this is problematic: with push_to_bucket=True, push_to_hub=False and hub_strategy="end", this disables the bucket entirely. Would someone use push_to_hub=False with hub_strategy="end"? Maybe it could be

- self.args.hub_strategy == HubStrategy.END
+ (self.args.push_to_hub and self.args.hub_strategy == HubStrategy.END) and not self.args.push_to_bucket

something like this

Nice, thanks for noticing this. I will update it!
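To make the edge case concrete, here is a hypothetical pair of predicates comparing the original early-return condition with the suggested one. Strategy values are written as the plain strings used in the thread (the real code compares against `HubStrategy` members). With `push_to_hub=False`, `push_to_bucket=True` and `hub_strategy="end"`, only the original version skips the bucket sync.

```python
def skips_push_original(is_main_process: bool, hub_strategy: str) -> bool:
    """Original guard: any non-main process or hub_strategy="end" skips everything."""
    return not is_main_process or hub_strategy == "end"


def skips_push(
    is_main_process: bool, push_to_hub: bool, push_to_bucket: bool, hub_strategy: str
) -> bool:
    """Suggested guard: "end" only suppresses per-save pushes when no bucket sync is requested."""
    if not is_main_process:
        return True  # only process zero pushes
    return (push_to_hub and hub_strategy == "end") and not push_to_bucket
```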



github-actions · 2026-06-05T14:00:48Z

CI Dashboard: View test results in Grafana

github-actions · 2026-06-05T14:03:17Z

View the CircleCI Test Summary for this PR:

https://huggingface.co/spaces/transformers-community/circle-ci-viz?pr=46386&sha=58a878

SunMarc added 3 commits June 3, 2026 15:34

buckets integration to trainer

9c0d084

udapte

9a396a8

udpate

c7dd8ff

stevhliu approved these changes Jun 3, 2026


Comment thread docs/source/en/trainer_recipes.md Outdated

Comment thread docs/source/en/trainer_recipes.md Outdated

Comment thread docs/source/en/trainer_recipes.md Outdated

Comment thread docs/source/en/trainer_recipes.md Outdated

davanstrien mentioned this pull request Jun 4, 2026

docs: surface Transformers Trainer → HF Buckets huggingface/hub-docs#2528

Draft

3 tasks

Apply suggestions from code review

2c085d1

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

SunMarc requested a review from qgallouedec June 4, 2026 12:26

ArthurZucker approved these changes Jun 4, 2026


qgallouedec approved these changes Jun 4, 2026


SunMarc and others added 6 commits June 4, 2026 17:31

Update src/transformers/training_args.py

0218015

Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>

Fix

5b21903

Merge remote-tracking branch 'origin/buckets-trainer' into buckets-trainer

67fa6bc

style

9629cb8

update

47cadc4

Merge branch 'main' into buckets-trainer

58a878d
