@Innixma Innixma commented May 18, 2024

Issue #, if available:
Resolves #4148

Description of changes:

  • Overhauled dynamic stacking logging to provide more useful information.
  • Added an option to show ray logs during dynamic stacking: `force_ray_logging=True` (passed via `ds_args`).
  • Added extra type hints.
  • Moved system info logging earlier in `fit` so it is logged before dynamic stacking runs.
  • Added exception handling for errors during dynamic stacking (for example, negative time remaining after preprocessing).

This PR:

DyStack is enabled (dynamic_stacking=True). AutoGluon will try to determine whether the input data is affected by stacked overfitting and enable or disable stacking as a consequence.
	This is used to identify the optimal `num_stack_levels` value. Copies of AutoGluon will be fit on subsets of the data. Then holdout validation data is used to detect stacked overfitting.
	Running DyStack for up to 15s of the 60s of remaining time (25%).
	Running DyStack sub-fit in a ray process to avoid memory leakage. Logs will not be shown until this process is complete, due to a limitation in ray. You can force logging by specifying `ds_args={'force_ray_logging': True}`.
		Context path: "AutogluonModels/ag-20240518_004300/ds_sub_fit/sub_fit_ho"
Leaderboard on holdout data from dynamic stacking:
                 model  score_holdout  score_val eval_metric  pred_time_test  pred_time_val  fit_time  pred_time_test_marginal  pred_time_val_marginal  fit_time_marginal  stack_level  can_infer  fit_order
0       XGBoost_BAG_L1      -0.337663  -0.333345    log_loss        0.202160       0.017601  0.657651                 0.202160                0.017601           0.657651            1       True          2
1  WeightedEnsemble_L3      -0.339735  -0.331134    log_loss        0.360134       0.033437  2.938443                 0.001720                0.000586           0.010818            3       True          6
2  WeightedEnsemble_L2      -0.339999  -0.332223    log_loss        0.351907       0.026324  1.826918                 0.001708                0.000594           0.007622            2       True          3
3       XGBoost_BAG_L2      -0.345056  -0.349410    log_loss        0.372647       0.043121  2.426904                 0.022447                0.017391           0.607608            2       True          5
4      LightGBM_BAG_L2      -0.349683  -0.351669    log_loss        0.358413       0.032851  2.927625                 0.008214                0.007121           1.108329            2       True          4
5      LightGBM_BAG_L1      -0.357196  -0.346560    log_loss        0.148039       0.008130  1.161645                 0.148039                0.008130           1.161645            1       True          1
	1	 = Optimal   num_stack_levels (Stacked Overfitting Occurred: False)
	17s	 = DyStack   runtime |	43s	 = Remaining runtime
Starting main fit with num_stack_levels=1.
	For future fit calls on this dataset, you can skip dynamic stacking to save time: `predictor.fit(..., dynamic_stacking=False, num_stack_levels=1)`
Beginning AutoGluon training ... Time limit = 43s
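The log above suggests overriding a dynamic-stacking setting with `ds_args={'force_ray_logging': True}`. A minimal sketch of that kind of options-over-defaults merge, with a hypothetical `resolve_ds_args` helper that is not AutoGluon's actual internal code:

```python
# Hypothetical sketch: merging a user-supplied ds_args dict over defaults.
# This mirrors the documented interface (ds_args={'force_ray_logging': True})
# but is NOT AutoGluon's actual implementation.
def resolve_ds_args(ds_args=None):
    defaults = {"force_ray_logging": False}  # ray logs suppressed by default
    if ds_args:
        defaults.update(ds_args)  # user-supplied keys override defaults
    return defaults

# Default: the ray sub-fit runs silently until it completes.
print(resolve_ds_args())  # {'force_ray_logging': False}

# Override: force ray logs to be shown during the sub-fit.
print(resolve_ds_args({"force_ray_logging": True}))  # {'force_ray_logging': True}
```

The dict-merge pattern keeps new flags backward compatible: callers who never pass `ds_args` see unchanged behavior.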

Mainline:

Dynamic stacking is enabled (dynamic_stacking=True). AutoGluon will try to determine whether the input data is affected by stacked overfitting and enable or disable stacking as a consequence.
Detecting stacked overfitting by sub-fitting AutoGluon on the input data. That is, copies of AutoGluon will be sub-fit on subset(s) of the data. Then, the holdout validation data is used to detect stacked overfitting.
Sub-fit(s) time limit is: 60 seconds.
Starting holdout-based sub-fit for dynamic stacking. Context path is: AutogluonModels/ag-20240518_004822/ds_sub_fit/sub_fit_ho.
Running the sub-fit in a ray process to avoid memory leakage.
Spend 17 seconds for the sub-fit(s) during dynamic stacking.
Time left for full fit of AutoGluon: 43 seconds.
Starting full fit now with num_stack_levels 1.
Beginning AutoGluon training ... Time limit = 43s

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

@Innixma Innixma added this to the 1.1.1 Release milestone May 18, 2024
@yinweisu

Previous CI Run Current CI Run

@Innixma Innixma force-pushed the dystack_logging_enhancement branch from e5320e5 to cbc2a81 Compare May 18, 2024 01:09

Comment on lines 959 to 961
include_gpu_count = False
if verbosity >= 3:
include_gpu_count = True

nit: this can be done in a more Pythonic way

include_gpu_count = True if verbosity >= 3 else False

@Innixma (Author) replied:

Good point! Updated to

include_gpu_count = (verbosity >= 3)
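As a standalone illustration of the idiom adopted in the reply (not the PR's actual function), the comparison already evaluates to a `bool`, so neither the original four-line `if`/`else` nor the suggested ternary is needed:

```python
# The comparison expression itself returns a bool, so assigning it directly
# replaces both the if/else block and the `True if ... else False` ternary.
def include_gpu_count_for(verbosity: int) -> bool:
    return verbosity >= 3

print(include_gpu_count_for(3))  # True
print(include_gpu_count_for(2))  # False
```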

num_stack_levels = 0 if stacked_overfitting else org_num_stack_levels
self._stacked_overfitting_occurred = stacked_overfitting

# logger.info(f"\tSpent {time_spend_sub_fits}s for the sub-fit(s) during dynamic stacking.")

If this statement isn't needed, it's better to remove it.

@Innixma (Author) replied:

Good catch! This one snuck through, removed.

@rey-allan rey-allan left a comment

LGTM!



Job PR-4208-c21c80c is done.
Docs are uploaded to http://autogluon-staging.s3-website-us-west-2.amazonaws.com/PR-4208/c21c80c/index.html

@Innixma Innixma merged commit 7b782df into autogluon:master May 21, 2024
@Innixma Innixma deleted the dystack_logging_enhancement branch April 16, 2025 21:10

Successfully merging this pull request may close these issues.

[BUG] time_limit is displayed wrong in logs
3 participants