Add KMedoids clusterer #398
base: 2.6
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -1,3 +1,6 @@ | ||
| - 2.5.3 | ||
| - Added K Medoids clusterer | ||
| | ||
| - 2.5.2 | ||
| - Fix bug in One-class SVM inferencing | ||
| | ||

| Original file line number | Diff line number | Diff line change | ||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| @@ -0,0 +1,60 @@ | ||||||||||||||
| <?php | ||||||||||||||
| | ||||||||||||||
| namespace Rubix\ML\Benchmarks\Clusterers; | ||||||||||||||
| | ||||||||||||||
| use Rubix\ML\Clusterers\KMedoids; | ||||||||||||||
| use Rubix\ML\Datasets\Generators\Blob; | ||||||||||||||
| use Rubix\ML\Datasets\Generators\Agglomerate; | ||||||||||||||
| | ||||||||||||||
| /** | ||||||||||||||
| * @Groups({"Clusterers"}) | ||||||||||||||
| * @BeforeMethods({"setUp"}) | ||||||||||||||
| */ | ||||||||||||||
| class KMedoidsBench | ||||||||||||||
| { | ||||||||||||||
| protected const TRAINING_SIZE = 10000; | ||||||||||||||
| | ||||||||||||||
| protected const TESTING_SIZE = 10000; | ||||||||||||||
Comment on lines +15 to +17

Suggested change:

| - protected const TRAINING_SIZE = 10000; |
| - protected const TESTING_SIZE = 10000; |
| + protected const TRAINING_SIZE = 2000; |
| + protected const TESTING_SIZE = 2000; |

| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,61 @@ | ||
| <span style="float:right;"><a href="https://github.com/RubixML/ML/blob/master/src/Clusterers/KMedoids.php">[source]</a></span> | ||
| | ||
| # K Medoids | ||
| A robust centroid-based hard clustering algorithm that uses actual data points (medoids) as cluster centers instead of computed means. K Medoids is more resistant to outliers than K Means and is suitable for clustering with arbitrary distance metrics. The algorithm minimizes the sum of dissimilarities between samples and their nearest medoid using the Partitioning Around Medoids (PAM) approach. | ||
| | ||
| **Interfaces:** [Estimator](../estimator.md), [Learner](../learner.md), [Online](../online.md), [Probabilistic](../probabilistic.md), [Persistable](../persistable.md), [Verbose](../verbose.md) | ||
| | ||
| **Data Type Compatibility:** Continuous | ||
| | ||
| ## Parameters | ||
| | # | Name | Default | Type | Description | | ||
| |---|---|---|---|---| | ||
| | 1 | k | | int | The number of target clusters. | | ||
| | 2 | batch size | 128 | int | The size of each mini batch in samples. | | ||
Comment on lines +10 to +14
| | 3 | epochs | 1000 | int | The maximum number of training rounds to execute. | | ||
| | 4 | min change | 1e-4 | float | The minimum change in the inertia for training to continue. | | ||
| | 5 | window | 5 | int | The number of epochs without improvement in the validation score to wait before considering an early stop. | | ||
| | 6 | kernel | Euclidean | Distance | The distance kernel used to compute the distance between sample points. | | ||
| | 7 | seeder | PlusPlus | Seeder | The seeder used to initialize the cluster medoids. | | ||
| | ||
| ## Example | ||
| ```php | ||
| use Rubix\ML\Clusterers\KMedoids; | ||
| use Rubix\ML\Kernels\Distance\Euclidean; | ||
| use Rubix\ML\Clusterers\Seeders\PlusPlus; | ||
| | ||
| $estimator = new KMedoids(3, 128, 300, 10.0, 10, new Euclidean(), new PlusPlus()); | ||
| ``` | ||
| | ||
| ## Additional Methods | ||
| Return the *k* computed medoids of the training set: | ||
| ```php | ||
| public medoids() : array[] | ||
| ``` | ||
| | ||
| Return the number of training samples that each medoid is responsible for: | ||
| ```php | ||
| public sizes() : int[] | ||
| ``` | ||
| | ||
| Return an iterable progress table with the steps from the last training session: | ||
| ```php | ||
| public steps() : iterable | ||
| ``` | ||
| | ||
| ```php | ||
| use Rubix\ML\Extractors\CSV; | ||
| | ||
| $extractor = new CSV('progress.csv', true); | ||
| | ||
| $extractor->export($estimator->steps()); | ||
| ``` | ||
| | ||
| Return the loss for each epoch from the last training session: | ||
| ```php | ||
| public losses() : float[]|null | ||
| ``` | ||
| | ||
| ## References | ||
| [^1]: L. Kaufman et al. (1987). Clustering by means of Medoids. | ||
| [^2]: H. S. Park et al. (2009). A simple and fast algorithm for K-medoids clustering. | ||
Since this is a new feature, it will not be released as a bug fix. This would be released in 2.6 at minimum.
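
To make the description in the K Medoids documentation diff concrete, here is a minimal, dependency-free PHP sketch of medoid-based clustering. It is not the code from this PR and does not use the Rubix ML API. It uses a simple alternating scheme (assign each sample to its nearest medoid, then move each medoid to the cluster member with the lowest summed distance) rather than a full PAM swap search, has no mini batching or seeding, and stops on a plain minimum-change test. The function names `euclidean`, `assign`, `update`, and `kMedoids` and the toy data are assumptions made for illustration only.

```php
<?php

declare(strict_types=1);

// Euclidean distance between two equal-length numeric vectors.
function euclidean(array $a, array $b) : float
{
    $sum = 0.0;
    foreach ($a as $i => $value) {
        $sum += ($value - $b[$i]) ** 2;
    }

    return sqrt($sum);
}

// Assign each sample to its nearest medoid. Returns the cluster labels
// and the loss (sum of distances from samples to their nearest medoid).
function assign(array $samples, array $medoids) : array
{
    $labels = [];
    $loss = 0.0;
    foreach ($samples as $sample) {
        $best = 0;
        $bestDistance = INF;
        foreach ($medoids as $cluster => $medoid) {
            $distance = euclidean($sample, $medoid);
            if ($distance < $bestDistance) {
                $bestDistance = $distance;
                $best = $cluster;
            }
        }
        $labels[] = $best;
        $loss += $bestDistance;
    }

    return [$labels, $loss];
}

// Move each medoid to the cluster member with the smallest summed
// distance to the other members. Empty clusters keep their medoid.
function update(array $samples, array $labels, array $medoids) : array
{
    foreach (array_keys($medoids) as $cluster) {
        $members = [];
        foreach ($labels as $i => $label) {
            if ($label === $cluster) {
                $members[] = $samples[$i];
            }
        }
        $bestCost = INF;
        foreach ($members as $candidate) {
            $cost = 0.0;
            foreach ($members as $member) {
                $cost += euclidean($candidate, $member);
            }
            if ($cost < $bestCost) {
                $bestCost = $cost;
                $medoids[$cluster] = $candidate;
            }
        }
    }

    return $medoids;
}

// Alternate the two steps until the loss improves by less than
// $minChange or $epochs rounds have run (hypothetical defaults).
function kMedoids(array $samples, array $medoids, int $epochs = 100, float $minChange = 1e-4) : array
{
    $previous = INF;
    for ($epoch = 0; $epoch < $epochs; ++$epoch) {
        [$labels, $loss] = assign($samples, $medoids);
        if ($previous - $loss < $minChange) {
            break;
        }
        $previous = $loss;
        $medoids = update($samples, $labels, $medoids);
    }
    // Recompute labels and loss for the final medoids.
    [$labels, $loss] = assign($samples, $medoids);

    return [$medoids, $labels, $loss];
}

// Toy 1-D data: two groups plus one outlier at 100.
$samples = [[1.0], [2.0], [3.0], [10.0], [11.0], [12.0], [13.0], [100.0]];

// Hand-picked starting medoids stand in for a seeder.
[$medoids, $labels, $loss] = kMedoids($samples, [[1.0], [13.0]]);

print_r($medoids); // the medoids end up at the samples [2.0] and [12.0]
```

On this toy data the outlier at 100 is assigned to the second cluster but does not drag its center: the medoid stays at the sample 12, whereas the mean of that cluster would be 29.2. That is the outlier resistance the documentation claims relative to K Means.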