fix: resolve Windows single-GPU training crash and matplotlib compati…#2794

Open
unknown7751 wants to merge 4 commits into RVC-Project:main from unknown7751:fix/windows-single-gpu-training-v2

Conversation

@unknown7751

…bility

  • Skip mp.Process subprocess for single-GPU runs; call run() directly. On Windows, PyTorch spawn-based child processes silently crash during CUDA context initialization, preventing any training from starting.
  • Guard dist.init_process_group and DDP wrapping behind n_gpus > 1.
  • Set num_workers=0 on Windows to avoid nested subprocess spawning. Remove persistent_workers and prefetch_factor, both of which require num_workers > 0.
  • Replace fig.canvas.tostring_rgb() with fig.canvas.buffer_rgba() in plot_spectrogram_to_numpy(). tostring_rgb() was deprecated in matplotlib 3.8 and removed in 3.10.
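
The first three items can be sketched in plain Python. This is a minimal illustration and not the code from this PR: the helper names `launch`, `run`, and `dataloader_kwargs`, the `log` argument, and the default worker settings are all assumptions. The real `run()` would call `torch.distributed.init_process_group` and wrap the model in `DistributedDataParallel` where the comment indicates.

```python
import multiprocessing as mp
import sys


def dataloader_kwargs(num_workers=4, platform=sys.platform):
    """Build DataLoader keyword arguments (hypothetical helper).

    On Windows the worker count is forced to 0, and the options that are
    only valid with num_workers > 0 are left out entirely.
    """
    if platform.startswith("win"):
        num_workers = 0
    kwargs = {"num_workers": num_workers}
    if num_workers > 0:
        # Illustrative values; both options require worker processes.
        kwargs["persistent_workers"] = True
        kwargs["prefetch_factor"] = 2
    return kwargs


def run(rank, n_gpus, log):
    """Stand-in for the training entry point (hypothetical)."""
    if n_gpus > 1:
        # Real code would call torch.distributed.init_process_group(...)
        # and wrap the model in DistributedDataParallel here.
        log.append("distributed setup")
    log.append(f"training on rank {rank}")


def launch(run_fn, n_gpus, *args):
    """Call run_fn in-process for one GPU, else start one process per GPU."""
    if n_gpus <= 1:
        # No child process: avoids the spawn path described above.
        run_fn(0, n_gpus, *args)
        return
    procs = [
        mp.Process(target=run_fn, args=(rank, n_gpus, *args))
        for rank in range(n_gpus)
    ]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
```

With a single GPU, `launch(run, 1, log)` runs the training function in the parent process, so an exception raised during setup surfaces directly instead of being lost in a child process.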

Pull request checklist

  • The PR has a proper title. Use Semantic Commit Messages. (No more branch-name title please)

  • Make sure this is ready to be merged into the relevant branch. Please don't create a PR and let it hang for a few days.

  • Ensure you can run the code you submitted successfully. These submissions will be prioritized for review:

    Introduce improvements in program execution speed;

    Introduce improvements in synthesis quality;

    Fix existing bugs reported by user feedback (or you met);

    Introduce more convenient user operations.

PR type

  • Bug fix / new feature / synthesis quality improvement / program execution speed improvement

Description

  • Describe what this pull request is for.
  • What will it affect.

Screenshot

  • Please include a screenshot if applicable

unknown7751 and others added 4 commits May 10, 2026 10:19
…bility

- Skip mp.Process subprocess for single-GPU runs; call run() directly.
  On Windows, PyTorch spawn-based child processes silently crash during
  CUDA context initialization, preventing any training from starting.
- Guard dist.init_process_group and DDP wrapping behind n_gpus > 1.
- Set num_workers=0 on Windows to avoid nested subprocess spawning.
  Remove persistent_workers and prefetch_factor which require num_workers > 0.
- Replace fig.canvas.tostring_rgb() with fig.canvas.buffer_rgba() in
  plot_spectrogram_to_numpy(). tostring_rgb() was deprecated in matplotlib 3.8
  and removed in 3.10.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
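
The matplotlib change in the commit above can be sketched as follows. This is an assumed reconstruction, not the repository's function: the figure size, axis labels, and the helper name `figure_to_rgb_array` are illustrative. The detail worth noting is that `buffer_rgba()` yields four channels, so the alpha channel is dropped to keep the RGB array that callers of the old `tostring_rgb()` path would expect.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend, set before importing pyplot
import matplotlib.pyplot as plt
import numpy as np


def figure_to_rgb_array(fig):
    """Render a figure and return it as an (H, W, 3) uint8 array."""
    fig.canvas.draw()
    # buffer_rgba() exposes the rendered canvas as (H, W, 4) uint8 data.
    rgba = np.asarray(fig.canvas.buffer_rgba())
    # Drop alpha and copy so the result does not alias the canvas buffer.
    return rgba[..., :3].copy()


def plot_spectrogram_to_numpy(spectrogram):
    """Plot a 2-D spectrogram and return the rendered image (sketch)."""
    fig, ax = plt.subplots(figsize=(10, 2))
    im = ax.imshow(
        spectrogram, aspect="auto", origin="lower", interpolation="none"
    )
    fig.colorbar(im, ax=ax)
    ax.set_xlabel("Frames")
    ax.set_ylabel("Channels")
    fig.tight_layout()
    image = figure_to_rgb_array(fig)
    plt.close(fig)  # release the figure so repeated calls do not leak memory
    return image
```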
fairseq checkpoint_utils.load_model_ensemble_and_task calls torch.load
without weights_only=False, which raises UnpicklingError in PyTorch 2.6+
where weights_only defaults to True. Apply the same patch used in train.py.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
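
The `torch.load` patch described in this commit is not shown on this page, so the following is only a sketch of one way such a patch could work. The wrapper name `default_weights_only_false` and the stand-in `fake_load` are hypothetical; `fake_load` mimics the PyTorch 2.6+ default so the example runs without PyTorch installed.

```python
import functools


def default_weights_only_false(load_fn):
    """Wrap a loader so weights_only defaults to False (hypothetical helper).

    An explicit weights_only argument from the caller is left untouched.
    """
    @functools.wraps(load_fn)
    def wrapper(*args, **kwargs):
        kwargs.setdefault("weights_only", False)
        return load_fn(*args, **kwargs)

    return wrapper


def fake_load(path, weights_only=True):
    """Stand-in for torch.load that reports which mode was requested."""
    return {"path": path, "weights_only": weights_only}


patched_load = default_weights_only_false(fake_load)

# Assumed real-world usage, applied before fairseq loads the checkpoint:
#   import torch
#   torch.load = default_weights_only_false(torch.load)
```

Loading with `weights_only=False` unpickles arbitrary Python objects, so a patch like this is only appropriate for checkpoints from a trusted source.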