Changelog

NVIDIA NeMo Run 0.5.0

  • Fix docs warnings #271
  • Fix docs build #269
  • Support overlapped srun commands in Slurm Ray #263
  • Refactor DGXC Lepton data mover: switch to BatchJob with auto cleanup and sleep after every run #265
  • ci: Fix nemo fw template ref after migrating to new org #256
  • Enable Nsys GPU device metrics #257
  • Sync job code in local tunnel for Slurm Ray job #254
  • Change the create dist job function to support creating a single node #240
  • Make job names match Run:ai requirements and make errors more descriptive #255
  • Support %j in Slurm log retrieval #252
  • Add KubeRay tests for Ray APIs #249
  • Upgrade SkyPilot executor to 0.9.2 #246
  • Add user scoping for k8s backend and log level support for Ray APIs #247
  • Update to latest Lepton SDK #248
  • Add storage mount options to LeptonExecutor #237
  • Import guard k8s import in Ray Cluster and Job #245
  • Add RayJob and Slurm support for Ray APIs + integration with run.Experiment #236
  • ci: Enforce coverage #238
  • Fix bug with a CLI overwrite #235
  • Add LeptonExecutor support #224
  • Add cancel to Docker executor #233
  • Change default log wait timeout to 10s #232
  • Add RayCluster API with KubeRay support #222
  • Add sbatch network arg #230
  • chore: Update package info #227
  • Add support for job groups for local executor #220
  • Roll back get_underlying_types change + introduce extract_constituent #223
  • Fix some bugs for --lazy in CLI #179
  • Add support for modern type hints #221
  • Fix bug in CLI with calling a factory-fn inside a list #214
  • Handle more edge cases in --help #219
  • Add autogenerated API reference content to the documentation #190
  • Handle Callable in --help to fix nemo llm export --help error #217
  • Ensure job directory creation for various schedulers #216
  • Add support for ForwardRef in CLI #176
  • Add additional debug to DGXC data mover #215
  • Handle ctx in entrypoint for experiment #213
  • zozhang/dgxc executor data mover #206
  • Add support for YAML, TOML & JSON #182
  • Add clean mode for experiment to avoid printing any NeMo-Run specific logs #208
  • Fix seed for torchrun #209
  • Support torchrun multi-node on local executor #143
  • Add nsys filename param #205
  • Add DGXCloudExecutor docs and update execution guide #192
  • Add --cuda-event-trace=false to nsys command #180