Skip to content

Conversation

shchur
Copy link
Collaborator

@shchur shchur commented Feb 24, 2025

Issue #, if available: Fixes #4914

Description of changes:

  • MLflow autologging can break AutoGluon model pickling, completely breaking the training process. This PR logs a warning if MLflow autologging is detected.

autologging is enabled on platforms like databricks, so users often encounter this issue with AG.

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

except Exception:
# gracefully handle cases where autologging_is_disabled is not available
pass
except Exception:
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could, in theory, be narrowed down to an ImportWarning, but I don't want to accidentally break the entire AG training process if import mlflow fails with some other spectacular exception.

@shchur shchur changed the title Log a wraning if mlflow autologging is enabled [common] Log a wraning if mlflow autologging is enabled Feb 24, 2025
@shchur shchur changed the title [common] Log a wraning if mlflow autologging is enabled [common] Log a warning if mlflow autologging is enabled Feb 24, 2025
@@ -193,3 +193,32 @@ def reset_logger_for_remote_call(verbosity: int):
# Limiting the verbosity of the BaggedEnsembleModel logger to 10 to avoid
# duplicated messages about fitting.
set_logger_verbosity(verbosity=min(verbosity, 1), logger=bem_logger)


def warn_if_mlflow_autologging_is_enabled(logger: Optional[logging.Logger] = None):
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any ideas for how/whether we can test this method in our unittests without installing mlflow?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rather test the good path?

(pseudocode)

def test():
    try:
         import mlflow
         pytest.fail("mlflow not expected to be installed in test env")
    except:
         assert warn_if_mlflow_autologging_is_enabled()  does not log

Copy link

Job PR-4925-ef0b265 is done.
Docs are uploaded to http://autogluon-staging.s3-website-us-west-2.amazonaws.com/PR-4925/ef0b265/index.html

Copy link
Contributor

@canerturkmen canerturkmen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me as is. Couple of comments. Thanks!

@@ -1043,6 +1043,7 @@ def fit(

verbosity = kwargs.get("verbosity", self.verbosity)
set_logger_verbosity(verbosity)
warn_if_mlflow_autologging_is_enabled(logger=logger)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is it enough to call this in fit? why not somewhere in an init file?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The patched metrics only cause problems when saving objects with pickle (which is only required during training). Calling methods like leaderboard or evaluate should work normally.

@@ -193,3 +193,32 @@ def reset_logger_for_remote_call(verbosity: int):
# Limiting the verbosity of the BaggedEnsembleModel logger to 10 to avoid
# duplicated messages about fitting.
set_logger_verbosity(verbosity=min(verbosity, 1), logger=bem_logger)


def warn_if_mlflow_autologging_is_enabled(logger: Optional[logging.Logger] = None):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rather test the good path?

(pseudocode)

def test():
    try:
         import mlflow
         pytest.fail("mlflow not expected to be installed in test env")
    except:
         assert warn_if_mlflow_autologging_is_enabled()  does not log

@shchur shchur force-pushed the warn-if-mlflow-autologging branch from ef0b265 to 06e0679 Compare March 3, 2025 09:50
Copy link

github-actions bot commented Mar 3, 2025

Job PR-4925-06e0679 is done.
Docs are uploaded to http://autogluon-staging.s3-website-us-west-2.amazonaws.com/PR-4925/06e0679/index.html

Copy link
Contributor

@Innixma Innixma left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@shchur shchur merged commit 4b90af4 into autogluon:master Mar 4, 2025
28 checks passed
@shchur shchur deleted the warn-if-mlflow-autologging branch April 2, 2025 07:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[BUG] TimeSeriesPredictor problems training tree-based models RecursiveTabular, DirectTabular and ChronosFineTuned[bolt_small]
3 participants