lightonai/fastkmeans-rs

fastkmeans-rs

A fast Rust implementation of k-means, based on fast-kmeans and flash-kmeans.


Installation

[dependencies]
fastkmeans-rs = "0.1"

Features

| Feature | Platform | Description |
|---|---|---|
| cuda | NVIDIA GPU | Flash-accelerated CUDA with cuBLAS GEMM and warp-cooperative kernels |
| metal_gpu | macOS (Apple Silicon) | Metal Performance Shaders GPU acceleration |
| accelerate | macOS | Apple Accelerate BLAS for CPU |
| mkl | Linux (Intel/AMD) | Intel MKL for CPU (recommended for Linux, fastest) |
| openblas | Linux / Windows | OpenBLAS for CPU (requires libopenblas-dev) |

# NVIDIA GPU (recommended for Linux)
fastkmeans-rs = { version = "0.1", features = ["cuda"] }

# Apple Silicon GPU
fastkmeans-rs = { version = "0.1", features = ["metal_gpu", "accelerate"] }

# CPU-only with BLAS (pick one)
fastkmeans-rs = { version = "0.1", features = ["mkl"] }          # Linux (fastest)
fastkmeans-rs = { version = "0.1", features = ["accelerate"] }   # macOS
fastkmeans-rs = { version = "0.1", features = ["openblas"] }     # Linux (fallback)

When cuda or metal_gpu is enabled, FastKMeans automatically uses the GPU. No code changes needed.
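
The feature list above leans on BLAS and GEMM backends. A common reason k-means implementations do this is that squared Euclidean distances can be expanded as ||x||² - 2 x·c + ||c||², where the middle term for all point/centroid pairs is a single matrix product. The sketch below illustrates that identity in plain Rust with no BLAS call. It is an assumption about how such backends are typically used, not a description of this crate's internals, and `pairwise_sq_dists` and `sq_norm` are hypothetical names.

```rust
/// Squared L2 norm of one row.
fn sq_norm(row: &[f32]) -> f32 {
    row.iter().map(|v| v * v).sum()
}

/// Squared Euclidean distances between every point and every centroid,
/// computed as ||x||^2 - 2 x.c + ||c||^2.
/// `points` is n x d and `centroids` is k x d, both row-major.
/// Returns an n x k row-major matrix.
fn pairwise_sq_dists(points: &[f32], centroids: &[f32], d: usize) -> Vec<f32> {
    let n = points.len() / d;
    let k = centroids.len() / d;
    let x_norms: Vec<f32> = points.chunks_exact(d).map(sq_norm).collect();
    let c_norms: Vec<f32> = centroids.chunks_exact(d).map(sq_norm).collect();

    let mut out = vec![0.0f32; n * k];
    for (i, x) in points.chunks_exact(d).enumerate() {
        for (j, c) in centroids.chunks_exact(d).enumerate() {
            // Entry (i, j) of X * C^T: the part a GEMM routine would compute in bulk.
            let dot: f32 = x.iter().zip(c).map(|(a, b)| a * b).sum();
            // Clamp at zero to absorb small negative values from rounding.
            out[i * k + j] = (x_norms[i] - 2.0 * dot + c_norms[j]).max(0.0);
        }
    }
    out
}
```

With two 2-D points (0, 0) and (3, 4) and two centroids (0, 0) and (3, 0), the function returns the distances 0, 9, 25, and 16 in row-major order.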


Usage

use fastkmeans_rs::{FastKMeans, KMeansConfig};
use ndarray::Array2;
use ndarray_rand::RandomExt;
use ndarray_rand::rand_distr::Uniform;

// Generate data: 100K points, 128 dimensions
let data = Array2::random((100_000, 128), Uniform::new(-1.0f32, 1.0));

// Create model with 256 clusters
let config = KMeansConfig::new(256)
    .with_max_iters(25)
    .with_max_points_per_centroid(None);

let mut kmeans = FastKMeans::with_config(config);

// Train
kmeans.train(&data.view()).unwrap();

// Predict
let labels = kmeans.predict(&data.view()).unwrap();

// Or fit + predict in one call
let labels = kmeans.fit_predict(&data.view()).unwrap();

// Access centroids
let centroids = kmeans.centroids().unwrap(); // shape: (256, 128)
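
For readers new to k-means, the train and predict calls above correspond to the two alternating steps of Lloyd's algorithm: assign each point to its nearest centroid, then move each centroid to the mean of its points. The following is a minimal, dependency-free sketch of those steps on row-major `f32` slices. It is not the crate's implementation, and `assign`, `update`, and `lloyd` are hypothetical helper names used only for illustration.

```rust
/// Index of the nearest centroid for each point (the "predict" step).
/// `points` is n x d and `centroids` is k x d, both row-major.
fn assign(points: &[f32], centroids: &[f32], d: usize) -> Vec<usize> {
    points
        .chunks_exact(d)
        .map(|x| {
            let mut best = 0;
            let mut best_dist = f32::INFINITY;
            for (j, c) in centroids.chunks_exact(d).enumerate() {
                let dist: f32 = x.iter().zip(c).map(|(a, b)| (a - b) * (a - b)).sum();
                if dist < best_dist {
                    best_dist = dist;
                    best = j;
                }
            }
            best
        })
        .collect()
}

/// One update step: move each centroid to the mean of its assigned points.
/// Centroids with no assigned points are left unchanged.
fn update(points: &[f32], labels: &[usize], centroids: &mut [f32], d: usize) {
    let k = centroids.len() / d;
    let mut sums = vec![0.0f32; k * d];
    let mut counts = vec![0usize; k];
    for (x, &label) in points.chunks_exact(d).zip(labels) {
        counts[label] += 1;
        for (s, v) in sums[label * d..(label + 1) * d].iter_mut().zip(x) {
            *s += v;
        }
    }
    for j in 0..k {
        if counts[j] > 0 {
            for t in 0..d {
                centroids[j * d + t] = sums[j * d + t] / counts[j] as f32;
            }
        }
    }
}

/// Alternate assignment and update for a fixed number of iterations,
/// then return the final labels.
fn lloyd(points: &[f32], centroids: &mut [f32], d: usize, max_iters: usize) -> Vec<usize> {
    for _ in 0..max_iters {
        let labels = assign(points, centroids, d);
        update(points, &labels, centroids, d);
    }
    assign(points, centroids, d)
}
```

The sketch takes initial centroids from the caller and always runs the full iteration count. A production library would also handle initialization, convergence checks, and batching.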

Benchmarks

All benchmarks run with 25 iterations.

fastkmeans-rs vs fast-kmeans vs flash-kmeans

Training on 100K vectors with 128 dimensions, 25 iterations.

[Figure: benchmark chart]

Results are compared against fast-kmeans and flash-kmeans (optimized Triton kernels). CUDA GPU benchmarks were run on an H100, and Metal GPU benchmarks on Apple Silicon. fastkmeans-rs is pure Rust with no Python dependency.


Acknowledgements

Based on fast-kmeans and flash-kmeans. Credit for the algorithm design goes to the original authors.

License

Apache-2.0 — see LICENSE.
