fix: skip cudaHostRegister when DS4_CUDA_COPY_MODEL_CHUNKED is set by kyuz0 · Pull Request #320 · antirez/ds4

kyuz0 · 2026-06-01T14:38:05Z

Fixes a bug where setting DS4_CUDA_COPY_MODEL_CHUNKED did not prevent host RAM exhaustion during model load on APUs.

Context:
Previously, ds4_gpu_set_model_map called cudaHostRegister on the entire memory-mapped model unconditionally, even when DS4_CUDA_COPY_MODEL_CHUNKED=1 was set. cudaHostRegister page-locks (pins) that memory, so the posix_madvise(DONTNEED) calls issued later by the chunked copy could not release it.
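
To make the interaction concrete, here is a minimal, GPU-free sketch of a chunked copy loop of the kind described above, assuming the model is read through a read-only mmap and copied out in fixed-size chunks with a DONTNEED hint after each one. The names copy_model_chunked and upload_chunk are hypothetical and not taken from ds4, and a plain memcpy into a host buffer stands in for the real device copy. POSIX_MADV_DONTNEED is only advisory, so whether pages are actually released depends on the OS; the sketch only shows where the hint is issued relative to the copy.

```c
#define _POSIX_C_SOURCE 200809L  /* exposes posix_madvise and mmap */
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical stand-in for the device upload (a cudaMemcpy/hipMemcpy into
 * VRAM in real code). A plain memcpy keeps the sketch runnable without a GPU. */
static void upload_chunk(void *dst, const void *src, size_t n) {
    memcpy(dst, src, n);
}

/* Copy a file through a read-only mapping into dst, `chunk` bytes at a time.
 * chunk should be a multiple of the page size so every posix_madvise() start
 * address is page-aligned. Returns 0 on success, -1 on error. */
static int copy_model_chunked(const char *path, void *dst, size_t chunk) {
    assert(chunk > 0);
    int fd = open(path, O_RDONLY);
    if (fd < 0) return -1;
    struct stat st;
    if (fstat(fd, &st) != 0) { close(fd); return -1; }
    size_t size = (size_t)st.st_size;
    if (size == 0) { close(fd); return 0; }  /* mmap rejects zero-length maps */
    void *map = mmap(NULL, size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);  /* the mapping stays valid after the descriptor is closed */
    if (map == MAP_FAILED) return -1;

    const char *src = map;
    for (size_t off = 0; off < size; off += chunk) {
        size_t n = size - off < chunk ? size - off : chunk;
        upload_chunk((char *)dst + off, src + off, n);  /* faults pages in */
        /* Hint that this range will not be read again. The hint is advisory and
         * its effect is OS-dependent; per this PR, pages pinned by
         * cudaHostRegister cannot be released this way. */
        (void)posix_madvise((void *)(src + off), n, POSIX_MADV_DONTNEED);
    }
    munmap(map, size);
    return 0;
}
```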

On AMD APUs with unified memory, such as the Ryzen AI Max ("Strix Halo"), this pinned 80GB+ of system RAM, completely defeating the purpose of the chunked copy and causing an immediate OOM.

This patch makes ds4_gpu_set_model_map return early when chunking is enabled, so the chunked copy loop can page in each chunk and then discard it from system RAM as intended.
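
The fix itself can be sketched as an early return guarded by the environment variable. This is a hypothetical shape, not the actual ds4 code: the real ds4_gpu_set_model_map signature is not shown in this PR, so set_model_map_sketch takes a register_fn parameter standing in for cudaHostRegister, and a counting stub (fake_host_register) replaces it so the sketch runs without a GPU. It also assumes that any non-empty value other than "0" enables chunking; the PR only shows DS4_CUDA_COPY_MODEL_CHUNKED=1.

```c
#define _POSIX_C_SOURCE 200809L  /* exposes setenv/unsetenv in <stdlib.h> */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical: true when the chunked model copy is requested.
 * Assumption: any non-empty value other than "0" counts as enabled. */
static bool model_copy_chunked_enabled(void) {
    const char *v = getenv("DS4_CUDA_COPY_MODEL_CHUNKED");
    return v != NULL && v[0] != '\0' && strcmp(v, "0") != 0;
}

/* Counting stub standing in for cudaHostRegister, so the sketch runs
 * without a GPU. Returns 0 to mimic success. */
static int register_calls = 0;
static int fake_host_register(void *ptr, size_t size) {
    (void)ptr;
    (void)size;
    register_calls++;
    return 0;
}

/* Hypothetical shape of the fix in ds4_gpu_set_model_map: skip page-locking
 * the whole mapping when the chunked copy will be used, so later
 * POSIX_MADV_DONTNEED hints on the mapping are not defeated by pinning. */
static int set_model_map_sketch(void *map, size_t size,
                                int (*register_fn)(void *, size_t)) {
    if (model_copy_chunked_enabled())
        return 0;  /* early exit: the mapping stays pageable */
    assert(register_fn != NULL);
    return register_fn(map, size);
}
```

Passing the registration call in as a function pointer only keeps the guard testable on machines without a GPU; in real code it would be a direct call that the early return skips.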

When DS4_CUDA_COPY_MODEL_CHUNKED is set, skip cudaHostRegister. Registering a large memory map prevents posix_madvise(DONTNEED) from freeing pages during the chunked copy, leading to catastrophic system RAM exhaustion on APUs with unified memory.

kyuz0 wants to merge 1 commit into antirez:rocm from kyuz0:rocm