Description
pycaret version checks
- I have checked that this issue has not already been reported here.
- I have confirmed this bug exists on the latest version of pycaret.
- I have confirmed this bug exists on the master branch of pycaret (pip install -U git+https://github.com/pycaret/pycaret.git@master).
Issue Description
I am getting a test AUC of 1 with tree-based models on random data.
Models tested: rf, xgboost, lightgbm, et, dt.
Reproducible Example
from pycaret.classification import *
# Import libraries
import numpy as np
import pandas as pd
# Set random seed for reproducibility
np.random.seed(42)
# Define number of samples
N = 30000
# Generate features
numeric_feature = np.random.normal(0, 1, N)
cat_feature = np.random.choice(31, N).astype(int)
cat_feature2 = np.random.choice(2, N).astype(str)
# Combine features into a single matrix
X = np.column_stack((cat_feature, cat_feature2))
# Generate target variable
y = np.random.binomial(1, 0.2, N)
# Convert to DataFrame and add target variable
df = pd.DataFrame(X, columns=['categorical-cardinal', 'categorical-binary'])
df['target'] = y
# Print the first five rows of the dataset
print("Synthetic Dataset for Machine Learning:\n")
print(df.head())
setup(df, target='target', numeric_features=[], categorical_features=['categorical-cardinal', 'categorical-binary'])
model = create_model('lightgbm')
plot_model(model, 'auc')
predict_model(model)
Expected Behavior
The test AUC should be around 0.5, because the target is generated independently of the features. This is what pycaret version 2 reports.
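As a sanity check on that expectation, here is a minimal sketch in plain Python that does not use pycaret. It scores random labels (20% positives, as in the example above) against random scores and computes a rank-based AUC. The names `roc_auc` and `baseline_auc` are only illustrative and are not part of any library.

```python
import random


def roc_auc(labels, scores):
    """Rank-based (Mann-Whitney U) AUC; tied scores share an average rank."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        # Find the end of the group of tied scores starting at position i.
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank for the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    rank_sum_pos = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


rng = random.Random(42)
N = 30000  # same sample size as the reproducible example

# Labels and scores are drawn independently, so the scores carry no signal.
labels = [1 if rng.random() < 0.2 else 0 for _ in range(N)]
scores = [rng.random() for _ in range(N)]

baseline_auc = roc_auc(labels, scores)
print(round(baseline_auc, 3))
```

With scores that are independent of the labels, the value should land close to 0.5. A holdout AUC of 1 on this kind of data therefore points to a problem in the pipeline or the evaluation rather than to real predictive signal.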
Actual Results
No error message, but the reported test AUC is 1, which is unrealistic for random data.
Installed Versions
PyCaret required dependencies:
pip: 23.0.1
setuptools: 66.0.0
pycaret: 2.1.post14082020
IPython: 7.34.0
ipywidgets: 7.7.5
tqdm: 4.64.1
numpy: 1.23.5
pandas: 1.5.3
jinja2: 3.1.2
scipy: 1.9.3
joblib: 1.2.0
sklearn: 1.2.2
pyod: 1.0.9
imblearn: 0.10.1
category_encoders: 2.6.0
lightgbm: 3.3.5
numba: 0.56.4
requests: 2.28.2
matplotlib: 3.6.3
scikitplot: 0.3.7
yellowbrick: 1.5
plotly: 5.14.1
kaleido: 0.2.1
statsmodels: 0.13.5
sktime: 0.17.0
tbats: 1.1.3
pmdarima: 2.0.3
psutil: 5.9.5
PyCaret optional dependencies:
shap: 0.41.0
interpret: 0.3.2
umap: 0.5.3
pandas_profiling: 3.6.6
explainerdashboard: 0.4.2.1
autoviz: 0.1.601
fairlearn: 0.7.0
xgboost: 1.7.5
catboost: 1.1.1
kmodes: 0.12.2
mlxtend: 0.22.0
statsforecast: 1.5.0
tune_sklearn: 0.4.5
ray: 2.3.1
hyperopt: 0.2.7
optuna: 3.1.1
skopt: 0.9.0
mlflow: 1.30.1
gradio: 3.27.0
fastapi: 0.95.1
uvicorn: 0.21.1
m2cgen: 0.10.0
evidently: 0.3.0
fugue: 0.8.3
streamlit: Not installed
prophet: Not installed