Fix trust_remote_code and gradient checkpointing for custom models (#696)
* Fix: Add trust_remote_code=True for models with custom code
- Add trust_remote_code=True to all AutoConfig/AutoTokenizer.from_pretrained() calls
- Add torchrun path resolution (shutil.which with sys.executable fallback)
- Pass trust_remote_code=True to base_model_args and VLM helper functions
This fixes training failures for models like Nemotron that use remote code.
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
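A minimal sketch of the kind of change this commit describes; the concrete call sites and variable names in the library differ, and the model path below is a placeholder.

```python
from transformers import AutoConfig, AutoTokenizer

model_path = "org/model-with-custom-code"  # placeholder Hub id or local path

# trust_remote_code=True lets transformers execute the modeling/tokenizer code
# shipped inside the repository, which models such as Nemotron require.
config = AutoConfig.from_pretrained(model_path, trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
```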
* Fix: Handle models without gradient checkpointing support
Wrap gradient_checkpointing_enable() in try/except to handle models
like NemotronH that don't support gradient checkpointing.
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
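Roughly, the guard looks like the sketch below; the surrounding model-setup code and logger are assumed rather than taken from the actual diff.

```python
try:
    model.gradient_checkpointing_enable()
except ValueError as e:
    # Architectures such as NemotronH raise when checkpointing is unsupported;
    # log and continue training without it rather than failing outright.
    logger.warning("Gradient checkpointing not supported for this model: %s", e)
```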
* Address reviewer feedback and fix ruff formatting
- Narrow exception handling in is_gpt_oss_model to catch specific
exceptions (OSError, ValueError) instead of bare Exception, and
log the failure details for debugging
- Add trust_remote_code=True to process_documents_for_pretraining()
tokenizer loading for consistency with configure_tokenizer()
- Replace invalid fast_tokenizer kwarg with use_fast in
tokenizer_utils.py setup_tokenizer()
- Create shared _enable_gradient_checkpointing_if_supported() helper
on Model base class, catching ValueError, NotImplementedError, and
AttributeError; use it in both LigerModel and CausalLMModel
- Improve torchrun fallback to use sys.executable -m
torch.distributed.run instead of assuming a sibling script exists
- Fix ruff formatting for AutoConfig.from_pretrained call
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
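The shared helper mentioned above might look roughly like this sketch, which refines the earlier inline wrap into a method on the base class. The class body shown here is a simplified stand-in, not the library's actual Model class.

```python
import logging

from transformers import PreTrainedModel

logger = logging.getLogger(__name__)


class Model:  # simplified stand-in for the library's Model base class
    model: PreTrainedModel

    def _enable_gradient_checkpointing_if_supported(self) -> None:
        # Shared by LigerModel and CausalLMModel: enable checkpointing when the
        # architecture supports it, otherwise log the reason and continue.
        try:
            self.model.gradient_checkpointing_enable()
        except (ValueError, NotImplementedError, AttributeError) as e:
            logger.warning(
                "Gradient checkpointing is not supported by this model, "
                "continuing without it: %s",
                e,
            )
```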
* Fix unit tests to expect trust_remote_code=True
Update test assertions to expect trust_remote_code=True parameter
in AutoTokenizer.from_pretrained calls after adding this parameter
to process_documents_for_pretraining.
* Fix ruff formatting: break long assertion line
* Make trust_remote_code configurable via flag and environment variable
Instead of hardcoding trust_remote_code=True everywhere:
1. Add trust_remote_code field to TrainingArgs (default: False)
2. Add --trust_remote_code argparse flag to subprocess CLI
3. Support TRUST_REMOTE_CODE=1 environment variable
4. Thread the setting through Model, tokenizer, and config calls
5. Remove torchrun fallback — error if torchrun is not found
6. Remove unnecessary try/except in is_gpt_oss_model
7. Remove redundant use_fast=True from tokenizer_utils
The env var is exported by main() when the flag is set, so
downstream calls (data_process, tokenizer_utils, gpt_oss_utils)
automatically pick it up without needing explicit parameter threading.
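A rough sketch of how the flag, the TrainingArgs field, and the environment variable fit together; the helper names (`resolve_trust_remote_code`, `export_for_subprocesses`) and the trimmed-down dataclass are illustrative, not the library's real definitions.

```python
import argparse
import os
from dataclasses import dataclass


@dataclass
class TrainingArgs:  # simplified stand-in; the real class has many more fields
    model_path: str
    trust_remote_code: bool = False  # off by default; opt in per run


def resolve_trust_remote_code(args: argparse.Namespace) -> bool:
    # Either the CLI flag or TRUST_REMOTE_CODE=1 in the environment enables it.
    return args.trust_remote_code or os.environ.get("TRUST_REMOTE_CODE", "0") == "1"


def export_for_subprocesses(trust_remote_code: bool) -> None:
    # main() exports the setting so torchrun subprocesses and helpers
    # (data_process, tokenizer_utils, gpt_oss_utils) can read it without
    # explicit parameter threading.
    if trust_remote_code:
        os.environ["TRUST_REMOTE_CODE"] = "1"


parser = argparse.ArgumentParser()
parser.add_argument("--trust_remote_code", action="store_true", default=False)
```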
* Document trust_remote_code in README
Add trust_remote_code to the TrainingArgs table and document the
TRUST_REMOTE_CODE environment variable in the environment variables
section.
* Enable local mamba kernel pre-population for NemotronH models
NemotronH has Mamba layers just like GraniteMoeHybrid and needs
the same _use_local_mamba_kernels() call to avoid causal_conv1d_cuda
import failures in torchrun subprocesses.
* Fix lint: revert torchrun shutil.which, remove unused imports, ruff format
The torchrun-not-found issue was caused by the venv not being
activated, not an installation problem. Revert to plain 'torchrun'
command. Remove now-unused shutil and sys imports. Run ruff format
on all modified files.
* Clarify trust_remote_code docs with security warning
* Add FP8 dequantization and requantization for Ministral VLM training
Ministral-3-3B ships with FP8 quantized weights that include scalar
parameters (weight_scale_inv, activation_scale) which FSDP rejects.
This change dequantizes FP8 weights to bf16 after VLM extraction for
training compatibility, preserves the original scales, and requantizes
back to FP8 at checkpoint save time so saved checkpoints match the
original FP8 format.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
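A hedged sketch of the dequantize/requantize round trip, not the PR's actual implementation. It assumes per-tensor scalar scales and the common convention that the dequantized value is the FP8 value multiplied by the stored scale; the real Ministral checkpoint layout (and how the PR handles `weight_scale_inv` and `activation_scale`) may differ.

```python
import torch


def dequantize_fp8_weight(weight_fp8: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    # Upcast to bf16 so FSDP sees ordinary floating-point parameters.
    return (weight_fp8.to(torch.float32) * scale).to(torch.bfloat16)


def requantize_fp8_weight(weight_bf16: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    # Reverse the transform at checkpoint-save time so the saved tensor
    # matches the original FP8 format.
    return (weight_bf16.to(torch.float32) / scale).to(torch.float8_e4m3fn)
```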
* Ruff formatting fixes
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Code <claude@anthropic.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
Co-authored-by: Mustafa Eyceoz <meyceoz@redhat.com>
README.md: 2 additions & 0 deletions
@@ -244,6 +244,7 @@ for training jobs. There are a number of options you can specify, such as settin
 | distributed_backend | Specifies which distributed training backend to use. Supported options are "fsdp" and "deepspeed". |
 | disable_flash_attn | Disables flash attention when set to true. This allows for training on older devices. |
 | keep_last_checkpoint_only | Determines whether we should only keep the last checkpoint directory - the previous checkpoint directory is always overwritten. The checkpoint directory is called `last_epoch`. |
+| trust_remote_code | Controls whether repository-provided Python code from HuggingFace Hub is executed when loading models and tokenizers. This is required for models that ship custom modeling code, such as Nemotron, Ministral, and Qwen3.5. Can also be enabled via the `TRUST_REMOTE_CODE=1` environment variable. Defaults to `False`. **Security note:** enabling this setting will execute remote code from the model repository — only enable it for sources you trust. |

 ### `DeepSpeedOptions`
@@ -507,6 +508,7 @@ run_training(
 Below is a list of custom environment variables users can set in the training library.

 1.`INSTRUCTLAB_NCCL_TIMEOUT_MS`, this environment variable controls the NCCL timeout in milliseconds. Consider increasing if seeing FSDP related NCCL errors.
+2.`TRUST_REMOTE_CODE`, when set to `1`, allows repository-provided Python code from HuggingFace Hub to be executed when loading models and tokenizers. This is required for models that ship custom modeling code (e.g. Nemotron, Ministral, Qwen3.5). Equivalent to setting `trust_remote_code=True` in `TrainingArgs`. Only enable for sources you trust.