
Conversation

hcho3 (Collaborator) commented Nov 9, 2018

See discussion at #3884. The num_feature variable uses a 32-bit integer, which causes XGBoost to fail on datasets with more than 2147483647 features. For now, print a better error message.
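
A minimal sketch of the kind of guard this change adds, assuming a hypothetical ValidateNumFeature helper (the actual check lives in XGBoost's learner code and uses its CHECK/LOG macros, not an exception):

```cpp
#include <cstdint>
#include <limits>
#include <stdexcept>
#include <string>

// Hypothetical guard: reject feature counts that overflow the 32-bit
// num_feature field instead of failing obscurely later. Names here are
// illustrative, not XGBoost's actual API.
void ValidateNumFeature(uint64_t num_col) {
  const uint64_t kMax =
      static_cast<uint64_t>(std::numeric_limits<int32_t>::max());  // 2147483647
  if (num_col > kMax) {
    throw std::invalid_argument(
        "Unfortunately, XGBoost does not support data matrices with more than "
        + std::to_string(kMax) + " features; got " + std::to_string(num_col));
  }
}
```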

hcho3 requested a review from trivialfis on November 9, 2018 03:55
hcho3 (Collaborator, Author) commented Nov 9, 2018

@trivialfis FYI

trivialfis (Member)

@hcho3 What's currently preventing the use of int64_t?

hcho3 (Collaborator, Author) commented Nov 9, 2018

@trivialfis XGBoost saves the trained model as a raw binary dump, so you can't easily add or change fields without breaking backward compatibility. As an aside, someone told me today that the binary dump is also causing endianness problems: a model created on a little-endian machine doesn't load on a big-endian machine. We may want to consider a next-generation model format.
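
A minimal illustration of the endianness problem with raw binary dumps (hypothetical, not XGBoost code): writing a field's bytes directly bakes the writer's byte order into the file.

```cpp
#include <cstdint>
#include <cstdio>

int main() {
  // Writing a field as raw bytes stores it in the host byte order.
  uint32_t num_feature = 123;  // 0x0000007B
  std::FILE* fp = std::fopen("model.bin", "wb");
  std::fwrite(&num_feature, sizeof(num_feature), 1, fp);
  std::fclose(fp);

  // On a little-endian writer, the file holds bytes 7B 00 00 00. A
  // big-endian reader doing the same fread would reconstruct them as
  // 0x7B000000 = 2063597568 -- a corrupted model field.
  uint32_t raw = 0;
  fp = std::fopen("model.bin", "rb");
  std::fread(&raw, sizeof(raw), 1, fp);
  std::fclose(fp);
  std::printf("read back: %u\n", raw);  // 123 only on same-endian hardware
  return 0;
}
```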

trivialfis (Member) left a review comment

LGTM.

hcho3 merged commit e38d5a6 into dmlc:master on Nov 9, 2018
hcho3 deleted the too_large_dim branch on November 9, 2018 08:32
khotilov (Member) commented Nov 9, 2018

@trivialfis Since dense structures are currently used in many places to index features, using such large indices would consume many gigabytes of memory.
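
To put a rough number on that (my arithmetic, not a figure from the thread): any structure allocated densely per feature blows up as soon as the feature count passes the 32-bit limit.

```cpp
#include <cstdint>
#include <cstdio>

int main() {
  // Back-of-the-envelope cost of ONE dense per-feature array once the
  // feature count exceeds INT32_MAX. (Illustrative, not XGBoost internals.)
  const uint64_t kNumFeatures = 1ULL << 31;        // just past 2147483647
  const uint64_t kBytesPerEntry = sizeof(double);  // e.g. one weight per feature
  const double gib = static_cast<double>(kNumFeatures * kBytesPerEntry)
                     / (1024.0 * 1024.0 * 1024.0);
  std::printf("%.0f GiB per dense per-feature array\n", gib);  // prints 16 GiB
  return 0;
}
```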

lock bot locked the conversation as resolved and limited it to collaborators on Feb 7, 2019