Conversation

tvdboom
Collaborator

@tvdboom tvdboom commented May 10, 2023

Closes #3507

Describe the changes you've made

Change the default encoder for high cardinality features from LeaveOneOut to TargetEncoder
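
For context, here is a minimal, dependency-free sketch of why leave-one-out encoding can leak the target on training rows. It is not the PyCaret or category_encoders implementation: the helper names `leave_one_out_encode` and `target_mean_encode` and the toy data are hypothetical, and the mean encoder below omits the smoothing a real target encoder typically applies. Whether this is exactly the mechanism behind the linked issue is an assumption.

```python
from collections import defaultdict


def _category_stats(categories, targets):
    """Return per-category target sums and row counts."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for c, y in zip(categories, targets):
        sums[c] += y
        counts[c] += 1
    return sums, counts


def leave_one_out_encode(categories, targets):
    """Encode each row as the mean target of the OTHER rows in its category."""
    sums, counts = _category_stats(categories, targets)
    global_mean = sum(targets) / len(targets)
    encoded = []
    for c, y in zip(categories, targets):
        if counts[c] > 1:
            # The row's own target is subtracted, so the result depends on y.
            encoded.append((sums[c] - y) / (counts[c] - 1))
        else:
            # Singleton category: no other rows, fall back to the global mean.
            encoded.append(global_mean)
    return encoded


def target_mean_encode(categories, targets):
    """Encode each row as the plain (unsmoothed) mean target of its category."""
    sums, counts = _category_stats(categories, targets)
    return [sums[c] / counts[c] for c in categories]


# Hypothetical unbalanced toy data: two categories, few positives.
categories = ["a"] * 5 + ["b"] * 5
targets = [1, 0, 0, 0, 0, 1, 1, 0, 0, 0]

loo = leave_one_out_encode(categories, targets)
mean = target_mean_encode(categories, targets)

for c, y, l, m in zip(categories, targets, loo, mean):
    print(c, y, l, m)
```

Within each category, the leave-one-out value is lower for positive rows than for negative rows, so a model can read the training label back from the encoded feature. The plain category mean is constant within a category and does not separate rows by their own label.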

Type of change

Please delete options that are not relevant.

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Code style update (formatting, local variables)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update

@tvdboom tvdboom requested review from Yard1 and ngupta23 May 10, 2023 16:04
@@ -104,6 +104,14 @@ def test_assign_index(index):
assert pc.dataset.index[0] != 0


def test_duplicate_columns():
Collaborator

Is this test related to the change to the encoder? Do we need to add a test for the encoding issue?

Collaborator Author

It's unrelated, just a minor check I added. I don't think we need a test for the encoding, since it's no longer an issue. It would only come up if users specifically selected the LeaveOneOut encoder, which will only happen if they understand the encoder, in which case they should see that it doesn't work as expected.

@tvdboom tvdboom merged commit 7f57794 into master May 11, 2023
@tvdboom tvdboom deleted the remove_leaveoneout branch May 11, 2023 09:20
Development

Successfully merging this pull request may close these issues.

[BUG]: Data leakage in pycaret 3 classification with unbalanced dataset?
3 participants