[tabular] AutoGluon Distributed #4606
At this point, I am considering not adding distributed prediction to this PR. The use case for distributed prediction is still unclear to me (besides Kaggle competitions). Moreover, it would make a nice, cleaner PR of its own (maybe after 1.2).
Marking PR as ready for review. I have addressed many of the prior limitations / TODOs.
Benchmark results on TabRepo with m6i.16xlarge (64 CPU cores)
Parallel logic has zero failures across 1464 tasks.
Parallel logic produces identical* results to sequential if both are given infinite time, but parallel trains over 2x faster.
*With the exception of NeuralNetFastAI, whose results differ depending on how many CPU cores were used to train it; however, the results are neither better nor worse on average.
Elo Table on TabRepo datasets >=10000 samples (258 tasks):
Note: "pr4606" == parallel mode, "pr4606_seq" == sequential mode, aka mainline
With a 4 hour runtime, parallel mode gains 62 Elo over sequential. With a 1 hour runtime, it gains 17 Elo over sequential.
Current limitations
Other notes
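To put the Elo gaps reported in this comment into perspective, here is a minimal sketch that converts an Elo difference into an expected win rate. It assumes the benchmark's Elo follows the standard logistic model with a 400-point scale, which this thread does not confirm; the function name `elo_expected_score` is hypothetical and not part of AutoGluon or TabRepo.

```python
def elo_expected_score(elo_gap: float, scale: float = 400.0) -> float:
    """Expected score of the higher-rated side under the standard Elo model.

    Assumption: a 400-point logistic scale; the benchmark may use another.
    """
    return 1.0 / (1.0 + 10.0 ** (-elo_gap / scale))


# Elo gaps between parallel and sequential mode quoted in this comment.
for label, gap in [("4 hour", 62), ("1 hour", 17)]:
    rate = elo_expected_score(gap)
    print(f"{label} runtime: +{gap} Elo -> expected win rate {rate:.3f}")
```

Under that assumption, +62 Elo corresponds to an expected win rate of roughly 0.59 against sequential mode, and +17 Elo to roughly 0.52.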
Minor comments, otherwise LGTM!
Co-authored-by: Lennart Purucker <contact@lennart-purucker.com>
LGTM! Approving after extensive benchmarking and testing.
For documentation purposes: https://gist.github.com/LennartPurucker/61357dd53efadd0e72fbee7986e2b025
Description of changes:
This PR starts the implementation of distributed AutoGluon, which is based on our Kaggle Grand Prix code.
I am adding some documentation below. Feel free to ignore this for now, as it is mostly documentation for me.
Road Map
Open Problems / Questions
Minor Open Questions (can be ignored for a merge)
_add_model multiple times?
Local Testing Script:
By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.