[Bug?] Failed to allocate CPU_REPACK buffer -> failed to load model (with `--usemmap`)

**Describe the Issue**
With `--usemmap`, I have successfully loaded a GGUF model about 110% the size of my free/available RAM. However, loading a model about 3x the size of free RAM failed. (That model is split across several GGUF files; does that matter for the error below?)

Terminal output (numbers rounded):
```
done getting tensors: ... moved from CPU_REPACK, using CPU instead
ggml_aligned_malloc: insufficient memory (attempted to allocate 103 000 MB)
ggml_backend_cpu_buffer_type_alloc_buffer: failed to allocate buffer of size 108 000 000 000
alloc_tensor_range: failed to allocate CPU_REPACK buffer of size 108 000 000 000
llama_model_load: error loading model: unable to allocate CPU_REPACK buffer
llama_model_load_from_file_impl: failed to load model 
```
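
As a sanity check on the rounded figures in the log, here is a small Python sketch showing that the two reported sizes describe the same allocation. It assumes the "MB" printed by `ggml_aligned_malloc` means MiB (1024 * 1024 bytes); the helper name `sizes_agree` is hypothetical and only used for illustration.

```python
# Rounded sizes copied from the log above.
buffer_bytes = 108_000_000_000   # "failed to allocate buffer of size ..."
reported_mb = 103_000            # "attempted to allocate ... MB"

# Assumption: "MB" in the ggml_aligned_malloc message means MiB.
MIB = 1024 * 1024
buffer_mib = buffer_bytes / MIB


def sizes_agree(a, b, tolerance=0.01):
    """Return True if a and b differ by less than `tolerance` (relative)."""
    return abs(a - b) / max(a, b) < tolerance


print(f"{buffer_mib:.0f} MiB")
print(sizes_agree(buffer_mib, reported_mb))
```

Under that assumption, both messages refer to a single request of roughly 108 GB, not to two separate allocations.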

I used `--usemmap`, so why did the engine try to allocate a buffer roughly equal to the total size of the GGUF? Could this be a bug? If not, does the need for such a huge allocation depend on the model architecture? Can some 120 GB models be loaded with 40 GB of free RAM while others cannot? If so, what does it depend on?

For reference, the wiki (https://github.com/LostRuins/koboldcpp/wiki) describes mmap as follows:

> mmap, or memory-mapped file I/O, maps files or devices into memory. It is a method of reducing the amount of RAM needed for loading the model, as parts can be read from disk into RAM on demand. You can enable it with --usemmap
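
To illustrate the general mechanism the wiki describes, here is a minimal Python sketch of memory-mapped file access. It is not koboldcpp's loader: the helper names `make_demo_file` and `read_slice_via_mmap` are hypothetical, and a small temporary file stands in for a GGUF. It only shows that mapping a file lets the OS page data in on demand, without allocating a buffer the size of the file.

```python
import mmap
import os
import tempfile

PAGE = mmap.PAGESIZE


def make_demo_file(size):
    """Create a temporary file of `size` bytes with a repeating 0..255 pattern."""
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as f:
        f.write(bytes(i % 256 for i in range(size)))
    return path


def read_slice_via_mmap(path, offset, length):
    """Map the whole file read-only and copy out only the requested bytes.

    Only the pages that are actually touched need to be resident in RAM;
    the OS is free to evict them again under memory pressure.
    """
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mapped:
            return mapped[offset:offset + length]


# Stand-in for a model file: 16 pages of data.
path = make_demo_file(16 * PAGE)
try:
    chunk = read_slice_via_mmap(path, 5 * PAGE, 4)
finally:
    os.remove(path)

print(chunk)
```

By contrast, the failure in the log above comes from `ggml_aligned_malloc`, which, judging by its name, is an ordinary memory allocation and not a file mapping.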

**Additional Information:**
v1.112 Linux nocuda


[Bug?] Failed to allocate CPU_REPACK buffer -> failed to load model (with `--usemmap`) #2207
