Avoid Memory Leak by Using a Sequential Version of ParallelFoldFittingStrategy instead of SequentialLocalFoldFittingStrategy If Not Enough Memory to Fit Models in Parallel #3614
Problem that is to be solved:
By default, AutoGluon uses the ParallelFoldFittingStrategy. In edge cases where AutoGluon detects that there is not enough memory to fit folds in parallel, it switches to SequentialLocalFoldFittingStrategy instead.
While this is the most time-efficient alternative, I observed that it leaks memory after training models, shrinking the already scarce memory even further in this edge case. Consequently, certain models will not be fit, and the available memory stays reduced for the rest of the AutoGluon run.
The reason this happens is that Python's garbage collector works in mysterious and unreliable ways. In this case, it makes no difference whether we manually delete objects, wrap the code in sub-functions, or call the garbage collector explicitly. The only reliable solution I have found so far is to execute the memory-consuming code in a subprocess, so that once the subprocess exits, no trace of the unwanted memory remains.
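For illustration, here is a minimal, self-contained sketch of the subprocess trick, using plain multiprocessing instead of ray; fit_fold and its dummy buffer are stand-ins for actual model fitting:

```python
import multiprocessing as mp


def fit_fold(fold_id: int) -> None:
    # Memory-heavy training happens here. Any objects the garbage
    # collector fails to reclaim live only in this child process.
    big_buffer = [bytes(1024) for _ in range(100_000)]  # stand-in for model fitting
    print(f"fold {fold_id}: fitted using {len(big_buffer)} chunks")


if __name__ == "__main__":
    ctx = mp.get_context("spawn")
    for fold_id in range(3):
        p = ctx.Process(target=fit_fold, args=(fold_id,))
        p.start()
        p.join()
        # Once the child exits, the OS reclaims all of its memory,
        # regardless of what Python's garbage collector did or did not free.
```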
Luckily, AutoGluon already has functionality to fit models in subprocesses via ray, as done in the ParallelFoldFittingStrategy. In other words, we usually do not have to worry about memory leakage; but once we switch to SequentialLocalFoldFittingStrategy, we do.
Thus, I propose to adjust ParallelFoldFittingStrategy instead of switching to SequentialLocalFoldFittingStrategy in the edge case where there is not enough memory to fit folds in parallel. In essence, ParallelFoldFittingStrategy is adjusted so that it fits at most one fold at a time, i.e., sequentially; see the code for details and the sketch below.
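A rough sketch of the selection logic this amounts to (all names here are illustrative, not AutoGluon's actual internals):

```python
# Illustrative sketch only: function and parameter names follow the PR's
# prose, not necessarily AutoGluon's real implementation.

def resolve_fold_fitting_config(mem_available_bytes: int,
                                mem_per_fold_bytes: int,
                                num_folds: int) -> dict:
    """Decide how many folds the ray-backed parallel strategy may run at once."""
    folds_in_parallel = max(1, min(num_folds, mem_available_bytes // mem_per_fold_bytes))
    # Even when only one fold fits in memory, we keep the parallel
    # (ray/subprocess) strategy with its parallelism capped at 1, so each
    # fold's memory is fully reclaimed by the OS when its worker exits.
    return {"strategy": "ParallelFoldFittingStrategy",
            "num_folds_in_parallel": folds_in_parallel}


print(resolve_fold_fitting_config(mem_available_bytes=4 * 2**30,
                                  mem_per_fold_bytes=3 * 2**30,
                                  num_folds=8))
# -> {'strategy': 'ParallelFoldFittingStrategy', 'num_folds_in_parallel': 1}
```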
One downside of this PR is that ParallelFoldFittingStrategy introduces a time overhead that makes it less efficient than SequentialLocalFoldFittingStrategy. But considering that the above-mentioned edge case suffers from a memory problem rather than a time problem, this should be a tolerable trade-off. Another downside is debugging, as running everything in subprocesses makes the edge case harder for the user to debug. As the log message explains, the user can still override this behavior by explicitly setting SequentialLocalFoldFittingStrategy for the model in question, for example as shown below.
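Assuming the documented ag_args_ensemble option behaves as I understand it, the override could look like this (the "train.csv" path and "target" label are placeholders):

```python
from autogluon.tabular import TabularDataset, TabularPredictor

train_data = TabularDataset("train.csv")  # placeholder path
predictor = TabularPredictor(label="target").fit(
    train_data,
    # Force sequential fold fitting for bagged models, bypassing the
    # memory-based auto-selection described above.
    ag_args_ensemble={"fold_fitting_strategy": "sequential_local"},
)
```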
Description of changes:
Adjust the reaction to not having enough memory in bagged_ensemble_model.py, and adjust fold_fitting_strategy.py to support a sequential version of ParallelFoldFittingStrategy in a memory-optimal way.

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.