fastkmeans-rs

A fast Rust implementation of k-means, based on fast-kmeans and flash-kmeans.

Installation

[dependencies]
fastkmeans-rs = "0.1"

Features

Feature	Platform	Description
`cuda`	NVIDIA GPU	Flash-accelerated CUDA with cuBLAS GEMM and warp-cooperative kernels
`metal_gpu`	macOS (Apple Silicon)	Metal Performance Shaders GPU acceleration
`accelerate`	macOS	Apple Accelerate BLAS for CPU
`mkl`	Linux (Intel/AMD)	Intel MKL for CPU (recommended for Linux, fastest)
`openblas`	Linux / Windows	OpenBLAS for CPU (requires `libopenblas-dev`)

# NVIDIA GPU (recommended for Linux)
fastkmeans-rs = { version = "0.1", features = ["cuda"] }

# Apple Silicon GPU
fastkmeans-rs = { version = "0.1", features = ["metal_gpu", "accelerate"] }

# CPU-only with BLAS
fastkmeans-rs = { version = "0.1", features = ["mkl"] }          # Linux (fastest)
fastkmeans-rs = { version = "0.1", features = ["accelerate"] }   # macOS
fastkmeans-rs = { version = "0.1", features = ["openblas"] }     # Linux (fallback)

When cuda or metal_gpu is enabled, FastKMeans automatically uses the GPU. No code changes needed.

Usage

use fastkmeans_rs::{FastKMeans, KMeansConfig};
use ndarray::Array2;
use ndarray_rand::RandomExt;
use ndarray_rand::rand_distr::Uniform;

// Generate data: 100K points, 128 dimensions
let data = Array2::random((100_000, 128), Uniform::new(-1.0f32, 1.0));

// Create model with 256 clusters
let config = KMeansConfig::new(256)
    .with_max_iters(25)
    .with_max_points_per_centroid(None);

let mut kmeans = FastKMeans::with_config(config);

// Train
kmeans.train(&data.view()).unwrap();

// Predict
let labels = kmeans.predict(&data.view()).unwrap();

// Or fit + predict in one call
let labels = kmeans.fit_predict(&data.view()).unwrap();

// Access centroids
let centroids = kmeans.centroids().unwrap(); // shape: (256, 128)

Benchmarks

All benchmarks run with 25 iterations.

fastkmeans-rs vs fast-kmeans vs flash-kmeans

Train 100K vectors, 128 dimensions, 25 iterations.

Compared against fast-kmeans and flash-kmeans (optimized Triton kernels). CUDA/GPU benchmarks on H100, Metal GPU on Apple Silicon. fastkmeans-rs is pure Rust with no Python dependency.

Acknowledgements

Based on fast-kmeans and flash-kmeans. Credit for the algorithm design goes to the original authors.

License

Apache-2.0 — see LICENSE.

Name		Name	Last commit message	Last commit date
Latest commit History 53 Commits
.github/workflows		.github/workflows
assets		assets
benches		benches
examples		examples
scripts		scripts
src		src
tests		tests
.gitignore		.gitignore
Cargo.lock		Cargo.lock
Cargo.toml		Cargo.toml
LICENSE		LICENSE
Makefile		Makefile
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

fastkmeans-rs

Installation

Features

Usage

Benchmarks

fastkmeans-rs vs fast-kmeans vs flash-kmeans

Acknowledgements

License

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

fastkmeans-rs

Installation

Features

Usage

Benchmarks

fastkmeans-rs vs fast-kmeans vs flash-kmeans

Acknowledgements

License

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages