From 5ad9c0507145151e63c73bb9a4c0f9692bb53135 Mon Sep 17 00:00:00 2001
From: JasonOA888
Date: Fri, 13 Mar 2026 12:11:03 +0800
Subject: [PATCH] docs: clarify 100B refers to training tokens, not model size

This fixes confusion where readers misinterpret '100B model' as referring to
parameter count, when it actually refers to training tokens. The change makes
this distinction clear:

- Before: 'run a 100B BitNet b1.58 model'
- After: 'run a BitNet b1.58 model trained on 100B tokens'

Closes #391

Co-authored-by: Jason L
---
 README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/README.md b/README.md
index 3bb25596e..d0daff89f 100644
--- a/README.md
+++ b/README.md
@@ -8,7 +8,7 @@ Try it out via this [demo](https://demo-bitnet-h0h8hcfqeqhrf5gf.canadacentral-01
 bitnet.cpp is the official inference framework for 1-bit LLMs (e.g., BitNet b1.58). It offers a suite of optimized kernels, that support **fast** and **lossless** inference of 1.58-bit models on CPU and GPU (NPU support will coming next).
 
-The first release of bitnet.cpp is to support inference on CPUs. bitnet.cpp achieves speedups of **1.37x** to **5.07x** on ARM CPUs, with larger models experiencing greater performance gains. Additionally, it reduces energy consumption by **55.4%** to **70.0%**, further boosting overall efficiency. On x86 CPUs, speedups range from **2.37x** to **6.17x** with energy reductions between **71.9%** to **82.2%**. Furthermore, bitnet.cpp can run a 100B BitNet b1.58 model on a single CPU, achieving speeds comparable to human reading (5-7 tokens per second), significantly enhancing the potential for running LLMs on local devices. Please refer to the [technical report](https://arxiv.org/abs/2410.16144) for more details.
+The first release of bitnet.cpp is to support inference on CPUs. bitnet.cpp achieves speedups of **1.37x** to **5.07x** on ARM CPUs, with larger models experiencing greater performance gains. Additionally, it reduces energy consumption by **55.4%** to **70.0%**, further boosting overall efficiency. On x86 CPUs, speedups range from **2.37x** to **6.17x** with energy reductions between **71.9%** to **82.2%**. Furthermore, bitnet.cpp can run a BitNet b1.58 model trained on 100B tokens on a single CPU, achieving speeds comparable to human reading (5-7 tokens per second), significantly enhancing the potential for running LLMs on local devices. Please refer to the [technical report](https://arxiv.org/abs/2410.16144) for more details.
 
 **Latest optimization** introduces parallel kernel implementations with configurable tiling and embedding quantization support, achieving **1.15x to 2.1x** additional speedup over the original implementation across different hardware platforms and workloads. For detailed technical information, see the [optimization guide](src/README.md).
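To make the "1.58-bit" terminology in the patched text concrete: BitNet b1.58 weights take one of three values {-1, 0, +1}, carrying log2(3) ≈ 1.58 bits each. Below is a minimal pure-Python sketch of the absmean ternary quantization described in the BitNet b1.58 paper; the function name is illustrative, and this is not how bitnet.cpp's optimized kernels are implemented.

```python
def absmean_ternary_quantize(weights, eps=1e-5):
    """Quantize a 2-D weight matrix to ternary {-1, 0, +1} values.

    Conceptual sketch of the absmean scheme from the BitNet b1.58
    paper: scale by the mean absolute weight, then round-and-clip.
    """
    flat = [abs(x) for row in weights for x in row]
    scale = sum(flat) / len(flat) + eps  # absmean scale (eps avoids /0)
    clip = lambda v: max(-1, min(1, v))  # confine rounded values to {-1, 0, 1}
    quantized = [[clip(round(x / scale)) for x in row] for row in weights]
    return quantized, scale

# Small weights collapse to 0; large ones saturate at +/-1.
w_q, s = absmean_ternary_quantize([[0.4, -0.9, 0.02], [1.2, -0.05, 0.7]])
# w_q == [[1, -1, 0], [1, 0, 1]]
```

At inference time, ternary weights reduce matrix multiplication to additions and subtractions (no weight multiplies), which is the source of the CPU speedups and energy savings the README reports.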