Closed
Labels
bug (Something isn't working)
Description
ImplicitCF raises an IndexError if a user appears in the test dataset but not in the training dataset.
How do we replicate the issue?
Split a dataset using a method like TimeSeriesSplit or python_chrono_split so that some users end up only in the test set, i.e. len(ImplicitCF.interact_status) < len(ImplicitCF.user_idx).
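A minimal sketch of how such a split can leave a user only in the test set. The toy DataFrame and the hand-rolled global time cutoff are assumptions for illustration; this does not call the library's own splitters or ImplicitCF.

```python
import pandas as pd

# Hypothetical toy interactions; column names follow the userID/itemID convention.
ratings = pd.DataFrame(
    {
        "userID": [1, 1, 2, 2, 3],
        "itemID": [10, 11, 10, 12, 11],
        "timestamp": [1, 2, 3, 4, 5],
    }
)

# Global chronological cutoff: the three earliest rows go to train, the rest to test.
ratings = ratings.sort_values("timestamp")
cutoff = 3
train, test = ratings.iloc[:cutoff], ratings.iloc[cutoff:]

# Users that appear in test but never in train.
cold_users = set(test["userID"]) - set(train["userID"])
print(cold_users)
```

Any user in `cold_users` has no row in a table built from train alone, which is the condition described above.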
Expected behavior (i.e. solution)
Raise a meaningful error if the dataset needs to be stratified, or assume that a user missing from the ImplicitCF.interact_status table has an empty set of items.
Other Comments
Meanwhile, I worked around it by using:
data.interact_status = data.interact_status.reindex(data.user_idx['userID_idx'])
data.interact_status['userID'] = data.interact_status.index
data.interact_status['itemID_interacted'] = data.interact_status['itemID_interacted'].fillna("").apply(set)
This creates the remaining "empty" users.
Or just delete the items in test that don't appear in train.
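Both workarounds can be sketched on toy data. The tables below are hypothetical stand-ins that only borrow the column names from the snippet above, not the library's real objects, and the second option is read here as dropping test rows whose user is absent from train, which is an interpretation of the note above.

```python
import pandas as pd

# Hypothetical stand-ins for the tables named in the issue.
user_idx = pd.DataFrame({"userID": ["a", "b", "c"], "userID_idx": [0, 1, 2]})
# Only users 0 and 1 interacted in train; user 2 is test-only.
interact_status = pd.DataFrame(
    {"userID": [0, 1], "itemID_interacted": [{10, 11}, {10}]}
)

# Option 1: pad the table so every known user has a (possibly empty) item set.
padded = interact_status.set_index("userID").reindex(user_idx["userID_idx"])
padded["itemID_interacted"] = padded["itemID_interacted"].apply(
    lambda items: items if isinstance(items, set) else set()
)
padded = padded.rename_axis("userID").reset_index()

# Option 2: drop test rows whose user never appears in train.
train = pd.DataFrame({"userID": [0, 0, 1], "itemID": [10, 11, 10]})
test = pd.DataFrame({"userID": [1, 2], "itemID": [12, 11]})
test_filtered = test[test["userID"].isin(train["userID"])]
```

Option 1 keeps every test user, with unseen users getting an empty interaction set. Option 2 removes those users from the test data altogether, so they are not scored.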