Description
pycaret version checks
- I have checked that this issue has not already been reported here.
- I have confirmed this bug exists on the latest version of pycaret.
- I have confirmed this bug exists on the master branch of pycaret (pip install -U git+https://github.com/pycaret/pycaret.git@master).
Issue Description
Hello,
In anomaly detection, when using the 'cluster' model (CBLOF), the resulting model raises a ValueError when calling the clustering_estimator_.predict method. The error (raised by sklearn.cluster._k_means_lloyd.lloyd_iter_chunked_dense) occurs in pycaret (3.0.2 and git@master) but not when using pyod directly, for example.
The problem arises when at least one of the columns is float (it does not occur if they are all int, for instance).
The problem is confirmed with Python 3.9 and 3.10.
ValueError: Buffer dtype mismatch, expected 'double' but got 'float'
Reproducible Example
import pandas as pd
from pycaret.anomaly import AnomalyExperiment
data = pd.DataFrame({"col1": [j * 1.3 for j in range(10)]})
exp = AnomalyExperiment()
exp.setup(data, log_experiment=False, session_id=1)
cluster = exp.create_model("cluster")
cluster.clustering_estimator_.predict(data)
Expected Behavior
The result should be the array produced by this code:
import pandas as pd
from pyod.models.cblof import CBLOF
data = pd.DataFrame({"col1": [j * 1.3 for j in range(10)]})
pyod_model = CBLOF()
pyod_model.fit(data)
pyod_model.clustering_estimator_.predict(data)
result:
> array([3, 3, 1, 7, 4, 6, 2, 2, 5, 0], dtype=int32)
Actual Results
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
Cell In[10], line 11
9 exp.setup(data, log_experiment=False, session_id=1)
10 cluster = exp.create_model("cluster")
---> 11 cluster.clustering_estimator_.predict(data)
File .../env/lib/python3.9/site-packages/sklearn/cluster/_kmeans.py:1061, in _BaseKMeans.predict(self, X, sample_weight)
1058 X = self._check_test_data(X)
1059 sample_weight = _check_sample_weight(sample_weight, X, dtype=X.dtype)
-> 1061 labels = _labels_inertia_threadpool_limit(
1062 X,
1063 sample_weight,
1064 self.cluster_centers_,
1065 n_threads=self._n_threads,
1066 return_inertia=False,
1067 )
1069 return labels
File .../env/lib/python3.9/site-packages/sklearn/cluster/_kmeans.py:813, in _labels_inertia_threadpool_limit(X, sample_weight, centers, n_threads, return_inertia)
811 """Same as _labels_inertia but in a threadpool_limits context."""
812 with threadpool_limits(limits=1, user_api="blas"):
--> 813 result = _labels_inertia(X, sample_weight, centers, n_threads, return_inertia)
815 return result
...
802 inertia = _inertia(X, sample_weight, centers, labels, n_threads)
File sklearn/cluster/_k_means_lloyd.pyx:27, in sklearn.cluster._k_means_lloyd.lloyd_iter_chunked_dense()
ValueError: Buffer dtype mismatch, expected 'double' but got 'float'
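A possible workaround, sketched under assumptions: the message suggests that the fitted cluster centers and the array passed to predict have different float widths (one float32, one float64). The helper below, predict_with_matching_dtype, is a hypothetical name that is not part of pycaret, pyod or scikit-learn. It casts the input to the dtype of cluster_centers_ before predicting. A plain scikit-learn KMeans fitted on float32 data stands in for the pycaret model here, so this has not been checked against pycaret itself.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans


def predict_with_matching_dtype(estimator, X):
    """Hypothetical helper: cast X to the dtype of the fitted centers, then predict."""
    centers_dtype = estimator.cluster_centers_.dtype
    # C-contiguous copy of X in the same float width as the centers.
    X_cast = np.ascontiguousarray(np.asarray(X), dtype=centers_dtype)
    return estimator.predict(X_cast)


# Stand-in for the reported situation (assumption): fit on float32, predict on float64.
data = pd.DataFrame({"col1": [j * 1.3 for j in range(10)]})
kmeans = KMeans(n_clusters=3, n_init=10, random_state=1)
kmeans.fit(data.to_numpy(dtype="float32"))

labels = predict_with_matching_dtype(kmeans, data)
print(kmeans.cluster_centers_.dtype, data["col1"].dtype, labels.shape)
```

If the assumption holds, calling predict_with_matching_dtype(cluster.clustering_estimator_, data) on the model from the reproducible example might avoid the error, though it would only hide the mismatch rather than fix it inside pycaret.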
Installed Versions
PyCaret required dependencies:
pip: 23.1.2
setuptools: 57.4.0
pycaret: 3.0.2
IPython: 8.13.2
ipywidgets: 8.0.6
tqdm: 4.65.0
numpy: 1.23.5
pandas: 1.5.3
jinja2: 3.1.2
scipy: 1.10.1
joblib: 1.2.0
sklearn: 1.2.1
pyod: 1.0.9
imblearn: 0.10.1
category_encoders: 2.6.1
lightgbm: 3.3.5
numba: 0.57.0
requests: 2.31.0
matplotlib: 3.7.1
scikitplot: 0.3.7
yellowbrick: 1.5
plotly: 5.13.0
kaleido: 0.2.1
statsmodels: 0.14.0
sktime: 0.17.0
tbats: 1.1.3
pmdarima: 2.0.3
psutil: 5.9.5
PyCaret optional dependencies:
shap: Not installed
interpret: Not installed
umap: 0.5.3
pandas_profiling: Not installed
explainerdashboard: Not installed
autoviz: Not installed
fairlearn: Not installed
xgboost: Not installed
catboost: Not installed
kmodes: Not installed
mlxtend: Not installed
statsforecast: Not installed
tune_sklearn: Not installed
ray: Not installed
hyperopt: Not installed
optuna: Not installed
skopt: Not installed
mlflow: 1.30.1
gradio: Not installed
fastapi: Not installed
uvicorn: Not installed
m2cgen: Not installed
evidently: Not installed
fugue: Not installed
streamlit: Not installed
prophet: Not installed