@Innixma Innixma commented May 18, 2024

Issue #, if available:
Resolves #4148

Description of changes:

  • Overhauled dynamic stacking logging to provide more useful information.
  • Added an option to show ray logs during dynamic stacking: `force_ray_logging=True` (passed via `ds_args`).
  • Added extra type hints.
  • Moved system info logging earlier in `fit` so it is logged before dynamic stacking runs.
  • Added exception handling for errors during dynamic stacking (for example, negative time remaining after preprocessing).

This PR:

DyStack is enabled (dynamic_stacking=True). AutoGluon will try to determine whether the input data is affected by stacked overfitting and enable or disable stacking as a consequence.
	This is used to identify the optimal `num_stack_levels` value. Copies of AutoGluon will be fit on subsets of the data. Then holdout validation data is used to detect stacked overfitting.
	Running DyStack for up to 15s of the 60s of remaining time (25%).
	Running DyStack sub-fit in a ray process to avoid memory leakage. Logs will not be shown until this process is complete, due to a limitation in ray. You can force logging by specifying `ds_args={'force_ray_logging': True}`.
		Context path: "AutogluonModels/ag-20240518_004300/ds_sub_fit/sub_fit_ho"
Leaderboard on holdout data from dynamic stacking:
                 model  score_holdout  score_val eval_metric  pred_time_test  pred_time_val  fit_time  pred_time_test_marginal  pred_time_val_marginal  fit_time_marginal  stack_level  can_infer  fit_order
0       XGBoost_BAG_L1      -0.337663  -0.333345    log_loss        0.202160       0.017601  0.657651                 0.202160                0.017601           0.657651            1       True          2
1  WeightedEnsemble_L3      -0.339735  -0.331134    log_loss        0.360134       0.033437  2.938443                 0.001720                0.000586           0.010818            3       True          6
2  WeightedEnsemble_L2      -0.339999  -0.332223    log_loss        0.351907       0.026324  1.826918                 0.001708                0.000594           0.007622            2       True          3
3       XGBoost_BAG_L2      -0.345056  -0.349410    log_loss        0.372647       0.043121  2.426904                 0.022447                0.017391           0.607608            2       True          5
4      LightGBM_BAG_L2      -0.349683  -0.351669    log_loss        0.358413       0.032851  2.927625                 0.008214                0.007121           1.108329            2       True          4
5      LightGBM_BAG_L1      -0.357196  -0.346560    log_loss        0.148039       0.008130  1.161645                 0.148039                0.008130           1.161645            1       True          1
	1	 = Optimal   num_stack_levels (Stacked Overfitting Occurred: False)
	17s	 = DyStack   runtime |	43s	 = Remaining runtime
Starting main fit with num_stack_levels=1.
	For future fit calls on this dataset, you can skip dynamic stacking to save time: `predictor.fit(..., dynamic_stacking=False, num_stack_levels=1)`
Beginning AutoGluon training ... Time limit = 43s
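The log above suggests overriding a dynamic-stacking setting with `ds_args={'force_ray_logging': True}`. A minimal sketch of that kind of options-over-defaults merge, with a hypothetical `resolve_ds_args` helper that is not AutoGluon's actual internal code:

```python
# Hypothetical sketch: merging a user-supplied ds_args dict over defaults.
# This mirrors the documented interface (ds_args={'force_ray_logging': True})
# but is NOT AutoGluon's actual implementation.
def resolve_ds_args(ds_args=None):
    defaults = {"force_ray_logging": False}  # ray logs suppressed by default
    if ds_args:
        defaults.update(ds_args)  # user-supplied keys override defaults
    return defaults

# Default: the ray sub-fit runs silently until it completes.
print(resolve_ds_args())  # {'force_ray_logging': False}

# Override: force ray logs to be shown during the sub-fit.
print(resolve_ds_args({"force_ray_logging": True}))  # {'force_ray_logging': True}
```

The dict-merge pattern keeps new flags backward compatible: callers who never pass `ds_args` see unchanged behavior.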

Mainline:

Dynamic stacking is enabled (dynamic_stacking=True). AutoGluon will try to determine whether the input data is affected by stacked overfitting and enable or disable stacking as a consequence.
Detecting stacked overfitting by sub-fitting AutoGluon on the input data. That is, copies of AutoGluon will be sub-fit on subset(s) of the data. Then, the holdout validation data is used to detect stacked overfitting.
Sub-fit(s) time limit is: 60 seconds.
Starting holdout-based sub-fit for dynamic stacking. Context path is: AutogluonModels/ag-20240518_004822/ds_sub_fit/sub_fit_ho.
Running the sub-fit in a ray process to avoid memory leakage.
Spend 17 seconds for the sub-fit(s) during dynamic stacking.
Time left for full fit of AutoGluon: 43 seconds.
Starting full fit now with num_stack_levels 1.
Beginning AutoGluon training ... Time limit = 43s

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

@Innixma Innixma added this to the 1.1.1 Release milestone May 18, 2024
@yinweisu

Previous CI Run Current CI Run

@Innixma Innixma force-pushed the dystack_logging_enhancement branch from e5320e5 to cbc2a81 Compare May 18, 2024 01:09

Comment on lines 959 to 961
include_gpu_count = False
if verbosity >= 3:
include_gpu_count = True

nit: this can be done in a more Pythonic way

include_gpu_count = True if verbosity >= 3 else False

@Innixma (Author) replied:

Good point! Updated to

include_gpu_count = (verbosity >= 3)
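As a standalone illustration of the idiom adopted in the reply (not the PR's actual function), the comparison already evaluates to a `bool`, so neither the original four-line `if`/`else` nor the suggested ternary is needed:

```python
# The comparison expression itself returns a bool, so assigning it directly
# replaces both the if/else block and the `True if ... else False` ternary.
def include_gpu_count_for(verbosity: int) -> bool:
    return verbosity >= 3

print(include_gpu_count_for(3))  # True
print(include_gpu_count_for(2))  # False
```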

num_stack_levels = 0 if stacked_overfitting else org_num_stack_levels
self._stacked_overfitting_occurred = stacked_overfitting

# logger.info(f"\tSpent {time_spend_sub_fits}s for the sub-fit(s) during dynamic stacking.")

If this statement isn't needed, it's better to remove it.

@Innixma (Author) replied:

Good catch! This one snuck through, removed.

@rey-allan rey-allan left a comment

LGTM!



Job PR-4208-c21c80c is done.
Docs are uploaded to http://autogluon-staging.s3-website-us-west-2.amazonaws.com/PR-4208/c21c80c/index.html

@Innixma Innixma merged commit 7b782df into autogluon:master May 21, 2024
@Innixma Innixma deleted the dystack_logging_enhancement branch April 16, 2025 21:10

Successfully merging this pull request may close these issues.

[BUG] time_limit is displayed wrong in logs
3 participants