feat: add --threads-all option to llama-bench #25261

Open
xiaobai0529 wants to merge 1 commit into ggml-org:master from xiaobai0529:feature/bench-threads-all


Summary

Add a new command-line option -ta / --threads-all to llama-bench that uses all logical cores (including SMT/hyperthreading) instead of only physical cores.

Problem

As discussed in issue #17611, llama-bench currently uses cpu_get_num_math(), which returns the number of physical cores. On systems with SMT/hyperthreading enabled (e.g., a dual-CPU machine with 36 cores / 72 threads), a default run therefore shows only about 50% CPU utilization.

While using physical cores is generally optimal for llama.cpp workloads, users who want to benchmark their system's full capabilities currently have to specify the thread count manually, e.g. -t 72.

Solution

Add a convenient -ta / --threads-all flag that automatically uses std::thread::hardware_concurrency() to get all logical cores.
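For illustration only, the selection logic could look roughly like the sketch below. This is not the code from the commit: resolve_thread_count and its physical_default parameter are hypothetical names, with physical_default standing in for whatever cpu_get_num_math() returns. One detail worth handling is that std::thread::hardware_concurrency() is only a hint and may return 0 when the value cannot be determined.

```cpp
#include <thread>

// Hypothetical helper (not taken from the PR): chooses the thread count
// depending on whether a "threads-all" style flag was passed.
// `physical_default` stands in for the value cpu_get_num_math() would return.
int resolve_thread_count(bool threads_all, int physical_default) {
    if (!threads_all) {
        // Flag not set: keep the existing physical-core default.
        return physical_default;
    }
    // hardware_concurrency() reports logical cores (including SMT siblings),
    // but it may return 0 if the count is not computable.
    unsigned int logical = std::thread::hardware_concurrency();
    return logical > 0 ? static_cast<int>(logical) : physical_default;
}
```

Falling back to the physical-core default keeps the benchmark usable on platforms where the logical core count is unavailable.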

Usage

# Use all logical cores (including SMT)
./llama-bench -ta
./llama-bench --threads-all
