Is your feature request related to a problem? Please describe.
cuML currently supports nn-descent-based approximate nearest neighbor (ANN) search in UMAP and HDBSCAN, and nn-descent exposes several configuration parameters. However, these parameters are named differently in each estimator.

HDBSCAN documents them with a `knn_` prefix:

- `knn_n_clusters` (int, default=1): Number of clusters for data partitioning. Higher values reduce memory usage at the cost of accuracy. When `knn_n_clusters > 1`, HDBSCAN can process data larger than device memory.
- `knn_overlap_factor` (int, default=2): Number of clusters each data point belongs to.

UMAP, by contrast, reads the same settings from `build_kwds` with an `nnd_` prefix:

```python
n_clusters = build_kwds.get("nnd_n_clusters", 1)
overlap_factor = build_kwds.get("nnd_overlap_factor", 2)
```

We should unify these for a better UX.
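
One way the unification could look is to pick a single canonical prefix and accept the other spelling as a deprecated alias during a transition period. The sketch below is illustrative only: the helper `normalize_build_kwds` and the choice of `nnd_` as the canonical prefix are hypothetical assumptions, not part of cuML's current API.

```python
import warnings

# Hypothetical mapping: assumes `nnd_` is kept as the canonical prefix and the
# `knn_` spelling (currently used by HDBSCAN) becomes a deprecated alias.
_LEGACY_ALIASES = {
    "knn_n_clusters": "nnd_n_clusters",
    "knn_overlap_factor": "nnd_overlap_factor",
}


def normalize_build_kwds(build_kwds):
    """Return a copy of ``build_kwds`` with legacy keys renamed to canonical ones.

    Hypothetical helper. Raises ValueError if a legacy key and its canonical
    replacement are both supplied, so neither value is silently dropped.
    """
    normalized = dict(build_kwds or {})
    for legacy, canonical in _LEGACY_ALIASES.items():
        if legacy not in normalized:
            continue
        if canonical in normalized:
            raise ValueError(
                f"Pass only one of {legacy!r} or {canonical!r}, not both."
            )
        # Warn so existing user code keeps working but learns the new name.
        warnings.warn(
            f"{legacy!r} is deprecated; use {canonical!r} instead.",
            FutureWarning,
            stacklevel=2,
        )
        normalized[canonical] = normalized.pop(legacy)
    return normalized


# Both estimators would then read the same keys:
kwds = normalize_build_kwds({"knn_n_clusters": 4})  # emits a FutureWarning
n_clusters = kwds.get("nnd_n_clusters", 1)
overlap_factor = kwds.get("nnd_overlap_factor", 2)
```

Rejecting a dict that sets both spellings avoids ambiguity about which value wins, and a `FutureWarning` gives users of the old names a migration window before the alias is removed.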
Additional context
Relevant source (permalinks at commit 72e6458):
- `cuml/python/cuml/cuml/cluster/hdbscan/hdbscan.pyx`, lines 603 to 607
- `cuml/python/cuml/cuml/manifold/umap/umap.pyx`, lines 379 to 380