A fast Rust implementation of k-means, based on fast-kmeans and flash-kmeans.

[dependencies]
fastkmeans-rs = "0.1"

| Feature | Platform | Description |
|---|---|---|
| `cuda` | NVIDIA GPU | Flash-accelerated CUDA with cuBLAS GEMM and warp-cooperative kernels |
| `metal_gpu` | macOS (Apple Silicon) | Metal Performance Shaders GPU acceleration |
| `accelerate` | macOS | Apple Accelerate BLAS for CPU |
| `mkl` | Linux (Intel/AMD) | Intel MKL for CPU (recommended for Linux, fastest) |
| `openblas` | Linux / Windows | OpenBLAS for CPU (requires libopenblas-dev) |

# NVIDIA GPU (recommended for Linux)
fastkmeans-rs = { version = "0.1", features = ["cuda"] }
# Apple Silicon GPU
fastkmeans-rs = { version = "0.1", features = ["metal_gpu", "accelerate"] }
# CPU-only with BLAS
fastkmeans-rs = { version = "0.1", features = ["mkl"] } # Linux (fastest)
fastkmeans-rs = { version = "0.1", features = ["accelerate"] } # macOS
fastkmeans-rs = { version = "0.1", features = ["openblas"] } # Linux (fallback)

When `cuda` or `metal_gpu` is enabled, `FastKMeans` automatically uses the GPU. No code changes are needed.

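The BLAS and cuBLAS backends matter because the assignment step of k-means can be written around a matrix product: since ||x - c||^2 = ||x||^2 - 2 x·c + ||c||^2, the nearest centroid for each point depends only on the dot products x·c and the centroid norms. The sketch below shows that expansion in plain Rust with explicit loops. It is an illustration only, not this crate's code: `assign_nearest` is a hypothetical helper, and a real backend would presumably compute all the dot products at once with a GEMM call.

```rust
/// Assigns each point to its nearest centroid using the expansion
/// ||x - c||^2 = ||x||^2 - 2 x.c + ||c||^2.
/// `points` is row-major (n x dim); `centroids` is row-major (k x dim).
/// Hypothetical helper for illustration; not part of fastkmeans-rs.
fn assign_nearest(points: &[f32], centroids: &[f32], dim: usize) -> Vec<usize> {
    assert!(dim > 0);
    assert!(!centroids.is_empty());
    assert_eq!(points.len() % dim, 0);
    assert_eq!(centroids.len() % dim, 0);

    // Squared norms of the centroids, computed once per call.
    let centroid_norms: Vec<f32> = centroids
        .chunks_exact(dim)
        .map(|c| c.iter().map(|v| v * v).sum::<f32>())
        .collect();

    points
        .chunks_exact(dim)
        .map(|x| {
            let mut best = 0usize;
            let mut best_score = f32::INFINITY;
            for (j, c) in centroids.chunks_exact(dim).enumerate() {
                // One entry of the (n x k) matrix X * C^T, which is the
                // part a GEMM routine would compute in bulk.
                let dot = x.iter().zip(c).map(|(a, b)| a * b).sum::<f32>();
                // ||x||^2 is the same for every centroid, so it can be
                // dropped without changing which centroid wins.
                let score = centroid_norms[j] - 2.0 * dot;
                if score < best_score {
                    best_score = score;
                    best = j;
                }
            }
            best
        })
        .collect()
}
```

Calling `assign_nearest(&points, &centroids, 2)` with four 2-D points and two centroids returns one cluster index per point. Dropping the ||x||^2 term saves work but means the scores are not true distances, only values with the same ordering.
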
use fastkmeans_rs::{FastKMeans, KMeansConfig};
use ndarray::Array2;
use ndarray_rand::RandomExt;
use ndarray_rand::rand_distr::Uniform;
// Generate data: 100K points, 128 dimensions
let data = Array2::random((100_000, 128), Uniform::new(-1.0f32, 1.0));
// Create model with 256 clusters
let config = KMeansConfig::new(256)
.with_max_iters(25)
.with_max_points_per_centroid(None);
let mut kmeans = FastKMeans::with_config(config);
// Train
kmeans.train(&data.view()).unwrap();
// Predict
let labels = kmeans.predict(&data.view()).unwrap();
// Or fit + predict in one call
let labels = kmeans.fit_predict(&data.view()).unwrap();
// Access centroids
let centroids = kmeans.centroids().unwrap(); // shape: (256, 128)

All benchmarks train on 100K vectors with 128 dimensions for 25 iterations.

Results are compared against fast-kmeans and flash-kmeans (optimized Triton kernels). CUDA benchmarks were run on an H100, and Metal GPU benchmarks on Apple Silicon. fastkmeans-rs is pure Rust with no Python dependency.

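For readers who want a picture of what each of those 25 training iterations involves, here is a minimal sketch of one standard Lloyd iteration: assign every point to its nearest centroid, then move each centroid to the mean of its points. It assumes row-major flat slices and brute-force distances. `lloyd_step` is a hypothetical name, and the crate's actual kernels are likely organized quite differently.

```rust
/// One Lloyd iteration: assign every point to its nearest centroid, then
/// move each centroid to the mean of the points assigned to it.
/// `points` is row-major (n x dim); `centroids` is row-major (k x dim).
/// Returns the labels computed during the assignment step.
/// Hypothetical helper for illustration; not part of fastkmeans-rs.
fn lloyd_step(points: &[f32], centroids: &mut [f32], dim: usize) -> Vec<usize> {
    assert!(dim > 0 && !centroids.is_empty());
    assert!(points.len() % dim == 0 && centroids.len() % dim == 0);
    let k = centroids.len() / dim;

    // Assignment step: brute-force squared Euclidean distance.
    let labels: Vec<usize> = points
        .chunks_exact(dim)
        .map(|x| {
            let mut best = 0usize;
            let mut best_dist = f32::INFINITY;
            for (j, c) in centroids.chunks_exact(dim).enumerate() {
                let dist = x
                    .iter()
                    .zip(c)
                    .map(|(a, b)| (a - b) * (a - b))
                    .sum::<f32>();
                if dist < best_dist {
                    best_dist = dist;
                    best = j;
                }
            }
            best
        })
        .collect();

    // Update step: accumulate per-cluster sums and counts.
    let mut sums = vec![0.0f32; k * dim];
    let mut counts = vec![0usize; k];
    for (x, &label) in points.chunks_exact(dim).zip(&labels) {
        counts[label] += 1;
        for (s, v) in sums[label * dim..(label + 1) * dim].iter_mut().zip(x) {
            *s += *v;
        }
    }

    for (j, c) in centroids.chunks_exact_mut(dim).enumerate() {
        // Empty clusters keep their previous centroid in this sketch.
        if counts[j] > 0 {
            for (cv, s) in c.iter_mut().zip(&sums[j * dim..(j + 1) * dim]) {
                *cv = *s / counts[j] as f32;
            }
        }
    }

    labels
}
```

Training would call this step repeatedly, up to the configured iteration limit. Leaving empty clusters untouched is a simplification chosen for brevity; other strategies, such as reseeding the centroid, are common.
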
Based on fast-kmeans and flash-kmeans. Credit for the algorithm design goes to the original authors.
Apache-2.0 — see LICENSE.
