[BUG] ImplicitCF failing if not using stratified split #2024

@daviddavo

Description

ImplicitCF raises an IndexError if a user appears in the test dataset but not in the training dataset.

How do we replicate the issue?

Split a dataset with a method such as TimeSeriesSplit or python_chrono_split, so that some users end up only in the test set. In that case: len(ImplicitCF.interact_status) < len(ImplicitCF.user_idx)
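To illustrate the condition, here is a minimal pandas-only sketch. It does not call the library's splitters; a plain global cut by timestamp stands in for them, and the toy data and variable names are assumptions made for illustration.

```python
import pandas as pd

# Toy interaction log; the column names follow the issue's code (userID, itemID).
ratings = pd.DataFrame(
    {
        "userID": [1, 1, 2, 2, 3],
        "itemID": [10, 11, 10, 12, 11],
        "timestamp": [1, 2, 3, 4, 5],
    }
)

# Global chronological cut (an assumption, not the library's splitter):
# the three oldest rows go to train, the remaining rows go to test.
ratings = ratings.sort_values("timestamp")
train, test = ratings.iloc[:3], ratings.iloc[3:]

# Users that occur in test but never in train. These are the users that
# have no row in the interaction table built from train.
unseen_users = sorted(set(test["userID"]) - set(train["userID"]))
print(unseen_users)
```

If `unseen_users` is non-empty, the split is of the kind described in this issue.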

Expected behavior (i.e. solution)

Raise a meaningful error if the dataset needs to be stratified, or assume that a user missing from the ImplicitCF.interact_status table has an empty set of interacted items.
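The first option could look roughly like the sketch below. `check_users_covered` is a hypothetical helper, not part of the library, and the default column name `userID` is an assumption taken from the code in this issue.

```python
import pandas as pd


def check_users_covered(
    train: pd.DataFrame, test: pd.DataFrame, col_user: str = "userID"
) -> None:
    """Raise a ValueError if test contains users that are absent from train.

    Hypothetical helper for illustration; not an existing library function.
    """
    missing = set(test[col_user]) - set(train[col_user])
    if missing:
        sample = sorted(missing)[:5]  # show a few offending users only
        raise ValueError(
            f"{len(missing)} user(s) in test do not appear in train "
            f"(e.g. {sample}); use a stratified split or drop them from test."
        )
```

Calling such a check before building the data object would replace the IndexError with a message that names the actual problem.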

Other Comments

Meanwhile, I solved it by using:

data.interact_status = data.interact_status.reindex(data.user_idx['userID_idx'])
data.interact_status['userID'] = data.interact_status.index
data.interact_status['itemID_interacted'] = data.interact_status['itemID_interacted'].fillna("").apply(set)

This creates the remaining "empty" users.
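The effect of those three lines can be checked on toy data. The frames below are stand-ins built by hand, assuming that interact_status has one row per training user, indexed by the user index, with a set-valued itemID_interacted column; the real table may differ.

```python
import pandas as pd

# Stand-in for data.interact_status: users 0 and 1 were seen in train.
interact_status = pd.DataFrame(
    {"userID": [0, 1], "itemID_interacted": [{0, 1}, {0, 2}]}
)
# Stand-in for data.user_idx["userID_idx"]: user 2 occurs only in test.
all_user_idx = pd.Series([0, 1, 2], name="userID_idx")

# One row per known user; rows for unseen users are created with NaN values.
interact_status = interact_status.reindex(all_user_idx)
# Restore the user column from the new index.
interact_status["userID"] = interact_status.index
# NaN -> "" -> set("") == set(); existing sets are copied unchanged.
interact_status["itemID_interacted"] = (
    interact_status["itemID_interacted"].fillna("").apply(set)
)
```

After this, user 2 has a row whose itemID_interacted is the empty set, which matches the second option under "Expected behavior".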

Alternatively, just delete the items in test that don't appear in train.
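A sketch of that filtering with plain pandas follows. Since the IndexError described above is triggered by unseen users, the example filters on userID; the same isin pattern would apply to itemID. The toy frames are assumptions for illustration.

```python
import pandas as pd

train = pd.DataFrame({"userID": [1, 1, 2], "itemID": [10, 11, 10]})
test = pd.DataFrame({"userID": [2, 3], "itemID": [12, 11]})

# Keep only the test rows whose user was also seen in train.
test_filtered = test[test["userID"].isin(train["userID"])]
```

This trades coverage for safety: interactions of cold-start users are dropped from evaluation instead of being scored against an empty history.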

Metadata

Assignees

No one assigned

Labels

bug: Something isn't working
