celestinoxp
Contributor

@celestinoxp celestinoxp commented Dec 15, 2023

This PR makes some simple, surface-level modifications to pycaret so that it supports scikit-learn 1.4.

Closes #3795

celestinoxp and others added 12 commits December 15, 2023 21:02
base_estimator - This parameter is deprecated. Use estimator instead.

Deprecated since version 1.2: The parameter base_estimator is deprecated in 1.2 and will be removed in 1.4. Use estimator instead.
Fix warning: SAMME.R is deprecated and will be removed in scikit-learn 1.6

@adrinjalali adrinjalali left a comment

Not really a complete review, just two things I noticed.

@adrinjalali

Happy to review, but I think you should first figure out dependency issues 😉

ERROR: Ignored the following versions that require a different python version: 1.11.0 Requires-Python <3.13,>=3.9; 1.11.0rc1 Requires-Python <3.13,>=3.9; 1.11.0rc2 Requires-Python <3.13,>=3.9; 1.11.1 Requires-Python <3.13,>=3.9; 1.11.2 Requires-Python <3.13,>=3.9; 1.11.3 Requires-Python <3.13,>=3.9; 1.11.4 Requires-Python >=3.9; 1.12.0 Requires-Python >=3.9; 1.12.0rc1 Requires-Python >=3.9; 1.12.0rc2 Requires-Python >=3.9; 1.4.0 Requires-Python >=3.9; 1.4.0rc1 Requires-Python >=3.9; 8.13.1 Requires-Python >=3.9; 8.13.2 Requires-Python >=3.9; 8.14.0 Requires-Python >=3.9; 8.15.0 Requires-Python >=3.9; 8.16.0 Requires-Python >=3.9; 8.16.1 Requires-Python >=3.9; 8.17.0 Requires-Python >=3.9; 8.17.1 Requires-Python >=3.9; 8.17.2 Requires-Python >=3.9; 8.18.0 Requires-Python >=3.9; 8.18.1 Requires-Python >=3.9; 8.19.0 Requires-Python >=3.10; 8.20.0 Requires-Python >=3.10
ERROR: Could not find a version that satisfies the requirement scikit-learn<1.5.0,>=1.4.0 (from pycaret) (from versions: 0.9, 0.10, 0.11, 0.12, 0.12.1, 0.13, 0.13.1, 0.14, 0.14.1, 0.15.0, 0.15.1, 0.15.2, 0.16.0, 0.16.1, 0.17, 0.17.1, 0.18, 0.18.1, 0.18.2, 0.19.0, 0.19.1, 0.19.2, 0.20.0, 0.20.1, 0.20.2, 0.20.3, 0.20.4, 0.21.1, 0.21.2, 0.21.3, 0.22, 0.22.1, 0.22.2, 0.22.2.post1, 0.23.0, 0.23.1, 0.23.2, 0.24.0, 0.24.1, 0.24.2, 1.0, 1.0.1, 1.0.2, 1.1.0, 1.1.1, 1.1.2, 1.1.3, 1.2.0rc1, 1.2.0, 1.2.1, 1.2.2, 1.3.0rc1, 1.3.0, 1.3.1, 1.3.2)
ERROR: No matching distribution found for scikit-learn<1.5.0,>=1.4.0

Once the CI runs, happy to be pinged for a review. Or if you have any particular questions, also happy to jump in.
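
The pip log above lists scikit-learn 1.4.0 among the versions ignored because they require Python >=3.9, which suggests the CI interpreter was older than that. A rough sketch of the check pip performs is below; `parse_version` and `python_satisfies` are hypothetical helpers that handle only the `>=` and `<` clauses seen in the log, and real tooling would use a full specifier parser instead.

```python
import operator
import re
import sys

# Only the two comparison operators that appear in the log above.
_OPERATORS = {">=": operator.ge, "<": operator.lt}


def parse_version(text):
    """Turn a dotted version such as '3.13' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def python_satisfies(requires_python, python_version):
    """Return True if python_version (a (major, minor) tuple) meets every clause."""
    for clause in requires_python.split(","):
        match = re.fullmatch(r"\s*(>=|<)\s*(\d+(?:\.\d+)*)\s*", clause)
        if match is None:
            raise ValueError(f"unsupported clause: {clause!r}")
        op, bound = match.groups()
        if not _OPERATORS[op](python_version, parse_version(bound)):
            return False
    return True


print(python_satisfies(">=3.9", (3, 8)))          # False
print(python_satisfies("<3.13,>=3.9", (3, 11)))   # True
print(python_satisfies(">=3.9", tuple(sys.version_info[:2])))
```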

@celestinoxp
Contributor Author

@adrinjalali I think the problem is that pyod is a dependency of pycaret and does not support scikit-learn 1.4.
pycaret/internal/metrics.py needs some changes... I get warnings about pos_label and the score drops to half :(
Can you fix it?

@adrinjalali

In the latest release, you get warnings when pos_label is not very clear from the data, which is mostly the case. You need to introduce it on the user end and fix the tests to make sure it's always passed.
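
As an illustration of passing pos_label explicitly, here is a minimal sketch with made-up string labels. It does not reflect how pycaret/internal/metrics.py is structured; the data and variable names are assumptions chosen for the example.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score, make_scorer

y_true = ["yes", "no", "yes", "yes", "no"]
y_pred = ["yes", "no", "no", "yes", "yes"]

# With string labels the default pos_label=1 matches nothing, so the
# positive class is named explicitly. The score depends on that choice.
f1_yes = f1_score(y_true, y_pred, pos_label="yes")
f1_no = f1_score(y_true, y_pred, pos_label="no")
print(f1_yes, f1_no)

# The same argument can be baked into a scorer so it is always passed.
X, y = make_classification(n_samples=100, random_state=0)
y_named = np.where(y == 1, "yes", "no")
clf = LogisticRegression().fit(X, y_named)
scorer = make_scorer(f1_score, pos_label="yes")
print(scorer(clf, X, y_named))
```

Because the two calls give different values for the same predictions, a silently wrong positive class is one plausible way for a reported score to change sharply.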

pyod is blocking the GitHub tests, so it is deactivated until it supports scikit-learn 1.4
I thought that the use_line_collection error came from the shap package, but it is actually raised by matplotlib and the cause is in the yellowbrick package; that is, yellowbrick should remove use_line_collection.
I opened an issue, DistrictDataLabs/yellowbrick#1312; as soon as it is resolved, we should update the yellowbrick version
@celestinoxp
Contributor Author

@Yard1 @adrinjalali @glemaitre I'm testing on my laptop because scikit-learn 1.4 has a bug that has already been fixed, but the fix will only be available in 1.4.1, so I'm using the nightly version. It's embarrassing that this PR is stuck until scikit-learn 1.4.1 is released; even worse, I don't know how to solve it! So I'm asking for your help with this problem (if you can analyze the code on your machines):

image

@adrinjalali

That doesn't look like a bug to me at all. It seems it's because you are trying to get predict_proba from a pipeline where the last step of the pipeline doesn't implement predict_proba.
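
The situation described here can be reproduced with a small sketch. The pipeline in the screenshot is not preserved, so the estimators below are stand-ins and `supports_predict_proba` is a hypothetical helper: `RidgeClassifier` is used only because it does not implement predict_proba.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression, RidgeClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=100, random_state=0)

# The final step implements predict_proba, so the pipeline exposes it too.
with_proba = make_pipeline(StandardScaler(), LogisticRegression()).fit(X, y)

# The final step has no predict_proba, so the pipeline does not expose it.
without_proba = make_pipeline(StandardScaler(), RidgeClassifier()).fit(X, y)


def supports_predict_proba(model):
    """Check for the method before calling it (hypothetical helper)."""
    return hasattr(model, "predict_proba")


for name, model in [("logistic", with_proba), ("ridge", without_proba)]:
    if supports_predict_proba(model):
        print(name, model.predict_proba(X[:2]).shape)
    else:
        print(name, "has no predict_proba; falling back to predict")
```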

@celestinoxp
Contributor Author

celestinoxp commented Feb 18, 2024

The time-series tests are failing! Can you help fix this PR? @ngupta23 @fkiraly @yarnabrina

@celestinoxp celestinoxp marked this pull request as ready for review February 18, 2024 23:47
@yarnabrina

Hi @celestinoxp, it seems that there is just one failure.

=========================== short test summary info ============================
FAILED tests/test_optimize_threshold.py::test_optimize_threshold - ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
= 1 failed, 882 passed, 2 skipped, 166 deselected, 17864 warnings in 4235.05s (1:10:35) =

(Ref. https://github.com/pycaret/pycaret/actions/runs/7958439755/job/21723320810?pr=3857#step:6:3841)

And the logs suggest it's from classification.

    # train model
    lr = pycaret.classification.create_model("lr")

    # optimize threshold
    optimized_data, optimized_model = pycaret.classification.optimize_threshold(
        lr, return_data=True
    )

(Ref. https://github.com/pycaret/pycaret/actions/runs/7958439755/job/21723320810?pr=3857#step:6:95)

It does not look sktime-related. I'm not very familiar with pycaret's testing framework, so I may have missed something; please let me know if that's the case.

@celestinoxp
Contributor Author

celestinoxp commented Feb 19, 2024

Hi @yarnabrina. At the time I asked for help I was getting a time-series error, but I think I've already solved it. Now, however, this error has appeared: ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

I still haven't found where the problem comes from...

It's a shame that so far no one has collaborated to resolve this PR... the funny thing is that I'm just a self-taught person without enough knowledge for a PR of this magnitude...

@celestinoxp
Contributor Author

How can this be fixed? FAILED tests/test_optimize_threshold.py::test_optimize_threshold - ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
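
For reference, this ValueError generally appears when a pandas Series is used where a single boolean is expected. The sketch below reproduces it with made-up data and shows the usual ways to state the intent explicitly; it is not the code path inside optimize_threshold, and which reduction applies there is not established by this thread.

```python
import pandas as pd

scores = pd.Series([0.2, 0.7, 0.9])

# Using an element-wise comparison as a condition raises the error,
# because pandas cannot reduce several booleans to one implicitly.
try:
    if scores > 0.5:
        pass
except ValueError as exc:
    message = str(exc)
    print(message)

# Stating the intended reduction makes the condition unambiguous.
any_above = bool((scores > 0.5).any())   # at least one value above 0.5
all_above = bool((scores > 0.5).all())   # every value above 0.5
first_above = bool(scores.iloc[0] > 0.5) # a single element compared
print(any_above, all_above, first_above)
```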

Signed-off-by: Antoni Baum <antoni.baum@protonmail.com>
Member

@Yard1 Yard1 left a comment

Thank you for your hard work! If tests pass I'll merge and release.

@Yard1 Yard1 merged commit 401fc19 into pycaret:master Feb 20, 2024
@celestinoxp celestinoxp deleted the support-scikit-learn-1.4 branch February 22, 2024 11:11
Successfully merging this pull request may close these issues.

[BUG]: pycaret will need changes to support next version of scikit-learn (1.4)
5 participants