[BUG]: problem in anomaly detection using model 'cluster' (CBLOF) when using _kmeans to label observations #3606

@Guillaume-Lombardo

pycaret version checks

Issue Description

Hello,

In anomaly detection, when using the 'cluster' model (CBLOF), the resulting model raises a ValueError when calling the clustering_estimator_.predict method. The problem (raised in the function sklearn.cluster._k_means_lloyd.lloyd_iter_chunked_dense) is present in pycaret (3.0.2 and git@master) but not in pyod, for example.

The problem arises when at least one of the columns is float (it does not occur if they are all int, for instance).

The problem is confirmed with Python versions 3.9 and 3.10.


ValueError: Buffer dtype mismatch, expected 'double' but got 'float'

Reproducible Example

import pandas as pd
from pycaret.anomaly import AnomalyExperiment

data = pd.DataFrame({"col1": [j * 1.3 for j in range(10)]})

exp = AnomalyExperiment()
exp.setup(data, log_experiment=False, session_id=1)
cluster = exp.create_model("cluster")
cluster.clustering_estimator_.predict(data)

Expected Behavior

The result should be the array produced by this code:

import pandas as pd
from pyod.models.cblof import CBLOF

data = pd.DataFrame({"col1": [j * 1.3 for j in range(10)]})

pyod_model = CBLOF()
pyod_model.fit(data)
pyod_model.clustering_estimator_.predict(data)

result:

> array([3, 3, 1, 7, 4, 6, 2, 2, 5, 0], dtype=int32)

Actual Results

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[10], line 11
      9 exp.setup(data, log_experiment=False, session_id=1)
     10 cluster = exp.create_model("cluster")
---> 11 cluster.clustering_estimator_.predict(data)

File .../env/lib/python3.9/site-packages/sklearn/cluster/_kmeans.py:1061, in _BaseKMeans.predict(self, X, sample_weight)
   1058 X = self._check_test_data(X)
   1059 sample_weight = _check_sample_weight(sample_weight, X, dtype=X.dtype)
-> 1061 labels = _labels_inertia_threadpool_limit(
   1062     X,
   1063     sample_weight,
   1064     self.cluster_centers_,
   1065     n_threads=self._n_threads,
   1066     return_inertia=False,
   1067 )
   1069 return labels

File .../env/lib/python3.9/site-packages/sklearn/cluster/_kmeans.py:813, in _labels_inertia_threadpool_limit(X, sample_weight, centers, n_threads, return_inertia)
    811 """Same as _labels_inertia but in a threadpool_limits context."""
    812 with threadpool_limits(limits=1, user_api="blas"):
--> 813     result = _labels_inertia(X, sample_weight, centers, n_threads, return_inertia)
    815 return result
...
    802     inertia = _inertia(X, sample_weight, centers, labels, n_threads)

File sklearn/cluster/_k_means_lloyd.pyx:27, in sklearn.cluster._k_means_lloyd.lloyd_iter_chunked_dense()

ValueError: Buffer dtype mismatch, expected 'double' but got 'float'
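
A possible workaround, offered only as a sketch: the error message suggests that the fitted cluster centers and the data passed to predict have different float precisions (for example float32 centers and float64 input). That is an assumption and has not been confirmed against the pycaret source. Under it, casting the input to the dtype of cluster_centers_ before calling predict may avoid the mismatch. The helper name predict_with_matching_dtype is hypothetical and is not part of pycaret, pyod, or scikit-learn.

```python
import numpy as np


def predict_with_matching_dtype(estimator, X):
    """Cast X to the dtype of the fitted cluster centers, then predict.

    Hypothetical helper for illustration only. It assumes `estimator`
    exposes a fitted `cluster_centers_` array and a `predict` method,
    as scikit-learn's KMeans does.
    """
    centers_dtype = estimator.cluster_centers_.dtype
    if hasattr(X, "astype"):
        # NumPy arrays and pandas DataFrames both provide astype().
        X_cast = X.astype(centers_dtype)
    else:
        # Fall back to a plain array for lists and other sequences.
        X_cast = np.asarray(X, dtype=centers_dtype)
    return estimator.predict(X_cast)
```

With the reproducible example above, this would be called as predict_with_matching_dtype(cluster.clustering_estimator_, data). Comparing cluster.clustering_estimator_.cluster_centers_.dtype with data.dtypes beforehand shows whether the two precisions actually differ.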

Installed Versions

System:
python: 3.9.7 (default, Sep 16 2021, 13:09:58) [GCC 7.5.0]
executable: .../env/bin/python
machine: Linux-3.10.0-1160.45.1.el7.x86_64-x86_64-with-glibc2.17

PyCaret required dependencies:
pip: 23.1.2
setuptools: 57.4.0
pycaret: 3.0.2
IPython: 8.13.2
ipywidgets: 8.0.6
tqdm: 4.65.0
numpy: 1.23.5
pandas: 1.5.3
jinja2: 3.1.2
scipy: 1.10.1
joblib: 1.2.0
sklearn: 1.2.1
pyod: 1.0.9
imblearn: 0.10.1
category_encoders: 2.6.1
lightgbm: 3.3.5
numba: 0.57.0
requests: 2.31.0
matplotlib: 3.7.1
scikitplot: 0.3.7
yellowbrick: 1.5
plotly: 5.13.0
kaleido: 0.2.1
statsmodels: 0.14.0
sktime: 0.17.0
tbats: 1.1.3
pmdarima: 2.0.3
psutil: 5.9.5

PyCaret optional dependencies:
shap: Not installed
interpret: Not installed
umap: 0.5.3
pandas_profiling: Not installed
explainerdashboard: Not installed
autoviz: Not installed
fairlearn: Not installed
xgboost: Not installed
catboost: Not installed
kmodes: Not installed
mlxtend: Not installed
statsforecast: Not installed
tune_sklearn: Not installed
ray: Not installed
hyperopt: Not installed
optuna: Not installed
skopt: Not installed
mlflow: 1.30.1
gradio: Not installed
fastapi: Not installed
uvicorn: Not installed
m2cgen: Not installed
evidently: Not installed
fugue: Not installed
streamlit: Not installed
prophet: Not installed

Labels: bug, clustering
