
[Bug?] Failed to allocate CPU_REPACK buffer -> failed to load model (with --usemmap) #2207

@alex-ie


Describe the Issue
I have successfully used --usemmap to load a GGUF model about 110% the size of free/available RAM. However, loading a model about 3x the size of free RAM failed. (That model consists of several GGUF files; does that matter for the error below?)

In terminal (numbers rounded):

done getting tensors: ... moved from CPU_REPACK, using CPU instead
ggml_aligned_malloc: insufficient memory (attempted to allocate 103 000 MB)
ggml_backend_cpu_buffer_type_alloc_buffer: failed to allocate buffer of size 108 000 000 000
alloc_tensor_range: failed to allocate CPU_REPACK buffer of size 108 000 000 000
llama_model_load: error loading model: unable to allocate CPU_REPACK buffer
llama_model_load_from_file_impl: failed to load model 
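
A quick sanity check that the two rounded figures in the log refer to the same allocation. This is only a sketch: it assumes the "MB" in the first message means bytes divided by 1024*1024, which is a guess about the log's units and not something confirmed from the source code.

```python
# Rounded byte count taken from the "failed to allocate buffer" line above.
requested_bytes = 108_000_000_000

# Assumption: the "MB" figure in the log is bytes / (1024 * 1024), i.e. MiB.
requested_mib = requested_bytes / (1024 * 1024)

# Decimal gigabytes, for comparison with the on-disk GGUF size.
requested_gb = requested_bytes / 10**9

print(f"{requested_mib:.0f} MiB, {requested_gb:.0f} GB")
```

Under that assumption the result lands within a few MiB of the rounded 103 000 MB in the first message, so both lines describe one allocation of roughly the full model size.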

I used --usemmap, so why did the engine try to allocate roughly the total size of the GGUF? Could this be a bug? If not, does the need for such a large allocation depend on the model architecture? Can some 120 GB models be loaded with 40 GB of free RAM while others cannot? If so, what does that depend on?

https://github.com/LostRuins/koboldcpp/wiki

mmap, or memory-mapped file I/O, maps files or devices into memory. It is a method of reducing the amount of RAM needed for loading the model, as parts can be read from disk into RAM on demand. You can enable it with --usemmap
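
As a rough illustration of the on-demand behaviour the wiki describes, here is a generic Python sketch. It is not koboldcpp or llama.cpp code; the file name, the sizes, and the `read_slice_mmap` helper are made up for the example. It shows that a memory-mapped file can be sliced without first reading the whole file into an allocated buffer:

```python
import mmap
import os
import tempfile


def read_slice_mmap(path, offset, length):
    """Return `length` bytes starting at `offset` via a read-only memory mapping."""
    with open(path, "rb") as f:
        # Length 0 maps the whole file; the OS loads pages lazily as they are touched.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            return mm[offset:offset + length]


# Hypothetical stand-in for a model file: zero padding with a marker in the middle.
pad = 2 * 1024 * 1024
tmp_path = os.path.join(tempfile.mkdtemp(), "fake_model.bin")
with open(tmp_path, "wb") as f:
    f.write(b"\0" * pad)
    f.write(b"TENSOR")
    f.write(b"\0" * pad)

# Only the pages covering this slice need to be brought in from disk.
chunk = read_slice_mmap(tmp_path, pad, 6)
print(chunk)
```

The mapping hands back the file's bytes exactly as stored on disk, and pages that are never touched never need to occupy RAM.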

Additional Information:
v1.112 Linux nocuda
