K-Means Initialization with Missing Values #64

@jaanisfehling

Hi, I think it should not be possible to initialize StepMix with a NaN-compatible measurement model (e.g. GaussianNan) together with init_params="kmeans".
When the data contains missing values, I get the following error:

ValueError: Input X contains NaN.
KMeans does not accept missing values encoded as NaN natively. For supervised learning, you might want to consider sklearn.ensemble.HistGradientBoostingClassifier and Regressor which accept missing values encoded as NaNs natively. Alternatively, it is possible to preprocess the data, for instance by using an imputer transformer in a pipeline or drop samples with missing values. See https://scikit-learn.org/stable/modules/impute.html You can find a list of all estimators that handle NaN values at the following page: https://scikit-learn.org/stable/modules/impute.html#estimators-that-handle-nan-values

The error itself makes sense, though: the data contains NaN values, and the default sklearn implementation of k-means does not handle them.
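
One way the requested behaviour could look is an early check that rejects the combination with a clearer message before k-means is ever called. The snippet below is only a minimal sketch: `check_kmeans_init` is a hypothetical helper, not part of StepMix or scikit-learn, and it assumes the data is a 2D iterable of floats (nested lists or a 2D array).

```python
import math


def check_kmeans_init(X, init_params):
    """Hypothetical guard: reject k-means initialization when X contains NaN.

    X is assumed to be a 2D iterable of floats (nested lists or a 2D array).
    """
    if init_params != "kmeans":
        return  # other initializations are not checked here
    has_nan = any(math.isnan(value) for row in X for value in row)
    if has_nan:
        raise ValueError(
            'init_params="kmeans" cannot be used when the data contains NaN: '
            "scikit-learn's KMeans does not accept missing values. "
            "Impute the data first or choose a different initialization."
        )


# Small example with one missing value.
X = [[1.0, 2.0], [float("nan"), 0.5]]
try:
    check_kmeans_init(X, "kmeans")
except ValueError as err:
    print(err)
```

A check like this would fail fast at initialization time instead of surfacing the generic scikit-learn message from inside KMeans.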

Labels: enhancement (New feature or request)