[FEA] Standardize configuration parameters for nn-descent based ANN algorithms #7652

@beckernick

Is your feature request related to a problem? Please describe.

cuML currently supports nn-descent based ANN for UMAP and HDBSCAN. The algorithm has several configuration parameters.

Currently, the parameter names don't match across the estimators. HDBSCAN documents them with a `knn_` prefix:

- `knn_n_clusters` (int, default=1): Number of clusters for data partitioning.
Higher values reduce memory usage at the cost of accuracy. When `knn_n_clusters > 1`,
HDBSCAN can process data larger than device memory.
- `knn_overlap_factor` (int, default=2): Number of clusters each data point belongs to.

The `build_kwds` lookup, by contrast, reads the same settings with an `nnd_` prefix:

```python
n_clusters = build_kwds.get("nnd_n_clusters", 1)
overlap_factor = build_kwds.get("nnd_overlap_factor", 2)
```

We should unify these for a better UX.
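
One possible shape for the unification, as a rough sketch only: accept both existing spellings, map them onto a single canonical name, and warn when a legacy spelling is used. The helper `normalize_nnd_params` and the canonical names `n_clusters` / `overlap_factor` are hypothetical and are not part of cuML's API. Only the `knn_*` and `nnd_*` keys and their defaults come from the snippets above.

```python
import warnings

# Hypothetical canonical names mapped to the two spellings quoted above.
_ALIASES = {
    "n_clusters": ("knn_n_clusters", "nnd_n_clusters"),
    "overlap_factor": ("knn_overlap_factor", "nnd_overlap_factor"),
}
# Defaults taken from the docstring and the build_kwds lookup in this issue.
_DEFAULTS = {"n_clusters": 1, "overlap_factor": 2}


def normalize_nnd_params(build_kwds=None):
    """Return a dict keyed by canonical names, accepting legacy spellings."""
    build_kwds = dict(build_kwds or {})
    resolved = {}
    for canonical, legacy_names in _ALIASES.items():
        # Collect every spelling of this setting that the caller supplied.
        supplied = [
            name for name in (canonical, *legacy_names) if name in build_kwds
        ]
        if len(supplied) > 1:
            raise ValueError(
                f"Conflicting spellings for {canonical!r}: {supplied}"
            )
        if not supplied:
            resolved[canonical] = _DEFAULTS[canonical]
            continue
        name = supplied[0]
        if name != canonical:
            # Legacy spelling: still honored, but flagged for migration.
            warnings.warn(
                f"{name!r} is a legacy spelling; use {canonical!r} instead.",
                FutureWarning,
                stacklevel=2,
            )
        resolved[canonical] = build_kwds[name]
    return resolved
```

With something like this, both estimators could call the same helper, so `{"nnd_n_clusters": 4}` and `{"knn_n_clusters": 4}` would resolve to the same setting while users migrate to whichever names are finally chosen.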
