Conversation

LennartPurucker
Collaborator

Problem that is to be solved:

By default, AutoGluon uses the ParallelFoldFittingStrategy. In edge cases where AutoGluon detects that there is not enough memory to fit folds in parallel, AutoGluon switches to SequentialLocalFoldFittingStrategy instead.

While this is the most time-efficient alternative, I observed that it leads to memory leaks after training models, reducing the already insufficient memory even further in the above-mentioned edge case. Consequently, certain models will not be fit, and the available memory stays reduced for the rest of the AutoGluon run.

The reason this happens is that Python's garbage collector works in mysterious and unreliable ways. In this case, it makes no difference to manually delete objects from memory, to use sub-functions, or to call the garbage collector explicitly. The only reliable solution I have found so far that avoids the leak is to execute memory-consuming code in a subprocess, so that no trace of unwanted memory remains once the subprocess exits.
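To illustrate the idea (this is a minimal standalone sketch, not AutoGluon code, and `fit_fold_in_subprocess` is an invented name): running the memory-hungry work in a child Python process guarantees that all of its allocations are returned to the OS when the child exits, independent of what the parent's garbage collector does.

```python
import json
import subprocess
import sys

# Stand-in for fitting one fold: the child allocates a large buffer,
# computes a result, and reports it as JSON on stdout. Everything it
# allocated dies with the process.
CHILD_CODE = r"""
import json, sys
big_buffer = list(range(1_000_000))  # placeholder for model training memory
result = {"fold": int(sys.argv[1]), "n": len(big_buffer)}
print(json.dumps(result))
"""

def fit_fold_in_subprocess(fold_id):
    # Memory leaked inside the child cannot outlive the child process,
    # so the parent's available memory is unaffected after the call.
    out = subprocess.run(
        [sys.executable, "-c", CHILD_CODE, str(fold_id)],
        capture_output=True, text=True, check=True,
    )
    return json.loads(out.stdout)

if __name__ == "__main__":
    print(fit_fold_in_subprocess(0))
```

AutoGluon achieves the same isolation with ray worker processes rather than raw `subprocess` calls; the sketch only shows why process boundaries fix the leak where explicit `gc.collect()` calls do not.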

Luckily, AutoGluon already has functionality to fit all models in a subprocess via ray, as is done in the ParallelFoldFittingStrategy. That is, we usually do not have to worry about memory leakage, but once we switch to SequentialLocalFoldFittingStrategy, we do.

Thus, I propose to adjust ParallelFoldFittingStrategy instead of switching to SequentialLocalFoldFittingStrategy in the edge case where there is not enough memory to fit folds in parallel. In essence, we adjust ParallelFoldFittingStrategy so that it fits at most one fold at a time, sequentially; see the code for more details.
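The decision logic can be sketched roughly as follows (a simplified illustration with an invented function name and made-up units, not the actual code in `bagged_ensemble_model.py`): instead of falling back to an in-process sequential strategy when memory is tight, the parallel strategy simply caps its fold concurrency at one.

```python
def choose_num_parallel_folds(n_folds, mem_per_fold_gb, available_gb):
    """Toy sketch: how many folds to fit at once under a memory estimate.

    Previously, an estimate of 0 parallel folds triggered a switch to an
    in-process sequential strategy (which can leak memory). The idea here
    is to instead keep the subprocess-based parallel strategy and run one
    fold at a time, so each fold's memory dies with its worker process.
    """
    n_parallel = min(n_folds, int(available_gb // mem_per_fold_gb))
    if n_parallel < 1:
        # Not even one fold fits by the estimate; still schedule a single
        # fold per worker process rather than switching strategies.
        return 1
    return n_parallel
```

With 8 folds estimated at 4 GB each, 16 GB of free memory yields 4 parallel folds, while 2 GB of free memory yields 1 fold at a time instead of a strategy switch.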

One downside of this PR is that ParallelFoldFittingStrategy introduces a time overhead that makes it less efficient than SequentialLocalFoldFittingStrategy. But considering that the above-mentioned edge case is a serious memory problem rather than a time problem, this should be a tolerable trade-off. Another downside is debugging, as it becomes harder for the user to debug the edge case. As the log message explains, the user can still override this behavior by specifying SequentialLocalFoldFittingStrategy for the model in question.

Description of changes:

Adjust the reaction to not having enough memory in bagged_ensemble_model.py and adjust fold_fitting_strategy.py to support a sequential version of ParallelFoldFittingStrategy in a memory-optimal way.


By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

@Innixma Innixma added this to the 1.0 Release milestone Oct 20, 2023
@Innixma Innixma added enhancement New feature or request module: tabular labels Oct 20, 2023
@Innixma Innixma self-requested a review October 20, 2023 18:21
Contributor

@Innixma Innixma left a comment

LGTM, this is an awesome improvement to our memory stability!

@github-actions

Job PR-3614-82aa603 is done.
Docs are uploaded to http://autogluon-staging.s3-website-us-west-2.amazonaws.com/PR-3614/82aa603/index.html

@Innixma Innixma merged commit cac053d into autogluon:master Oct 20, 2023
LennartPurucker added a commit to LennartPurucker/autogluon that referenced this pull request Jun 1, 2024
…gStrategy instead of SequentialLocalFoldFittingStrategy If Not Enough Memory to Fit Models in Parallel (autogluon#3614)