
Conversation

LennartPurucker
Collaborator

@LennartPurucker LennartPurucker commented Oct 21, 2023

Rel.: #2779

Description of changes:

This PR adds support for dynamic stacking. The idea of dynamic stacking is to avoid stacked overfitting (a.k.a. stacked information leakage) by fitting AutoGluon at least twice. All but the last fit are done on a subset of the training data, using a holdout set to determine whether stacked overfitting occurs for the provided data. Afterwards, we run the last fit with or without multi-layer stacking, depending on whether stacked overfitting was detected in the earlier fits.

We see dynamic stacking as a baseline solution to avoid stacked overfitting. Nevertheless, it is so far the best solution we have benchmarked, because it is the most consistent (which is very important for AutoML systems). Moreover, the holdout set produced by this approach gives us an empirical source of truth for whether stacked overfitting occurred. While this source of truth may not necessarily generalize to the test data, it has so far been better than any heuristic alternative or adjustment to the out-of-fold predictions.
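The decision described above can be sketched as follows. This is a hypothetical illustration of the control flow, not AutoGluon's actual implementation; the function names and score comparison are assumptions.

```python
# Hypothetical sketch of the dynamic-stacking decision (not AutoGluon's actual API).
def detect_stacked_overfitting(score_l1_val, score_l2_val,
                               score_l1_holdout, score_l2_holdout):
    """Flag leakage when stacked (L2+) models look better on the inner
    validation scores but fail to improve on the untouched holdout set."""
    looks_better_inside = score_l2_val > score_l1_val
    fails_on_holdout = score_l2_holdout <= score_l1_holdout
    return looks_better_inside and fails_on_holdout


def final_fit_config(score_l1_val, score_l2_val,
                     score_l1_holdout, score_l2_holdout):
    # Disable multi-layer stacking for the last fit if leakage was detected.
    leak = detect_stacked_overfitting(score_l1_val, score_l2_val,
                                      score_l1_holdout, score_l2_holdout)
    return {"num_stack_levels": 0 if leak else 1}
```

Here higher scores are assumed to be better; the detection fit produces the L1/L2 scores, and the returned config drives the final fit.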

The default version of the code in this PR correctly determines whether stacked overfitting occurs for each fold of the AutoML benchmark with an accuracy (preliminary numbers!) of ~74% (balanced accuracy: ~69%). This translates into better predictive performance on average.
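For reference, the balanced accuracy quoted here is the mean of per-class recalls over the binary detection labels (stacked overfitting occurred: yes/no). A minimal pure-Python version of that metric:

```python
def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recalls for a binary detection task.

    y_true / y_pred are lists of booleans indicating whether stacked
    overfitting occurred (ground truth vs. the detector's prediction).
    """
    recalls = []
    for cls in (False, True):
        # Predictions on instances whose true label is `cls`.
        preds_for_cls = [p for t, p in zip(y_true, y_pred) if t is cls]
        if preds_for_cls:  # skip classes absent from y_true
            recalls.append(sum(p is cls for p in preds_for_cls) / len(preds_for_cls))
    return sum(recalls) / len(recalls)
```

Unlike plain accuracy, this weights both classes equally, which matters because stacked overfitting occurs on only a subset of benchmark folds.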

Code Example

Script (outdated, see tests for newest examples)
import numpy as np
import openml
import pandas as pd
from autogluon.tabular import TabularPredictor


def get_data(tid: int, fold: int):
    # Get Task and dataset from OpenML and return split data
    oml_task = openml.tasks.get_task(tid, download_splits=True, download_data=True, download_qualities=False, download_features_meta_data=False)

    train_ind, test_ind = oml_task.get_train_test_split_indices(fold)
    X, *_ = oml_task.get_dataset().get_data(dataset_format="dataframe")

    return (
        X.iloc[train_ind, :].reset_index(drop=True),
        X.iloc[test_ind, :].reset_index(drop=True),
        oml_task.target_name,
        oml_task.task_type != "Supervised Classification",
    )


def _print_lb(leaderboard, task_id):
    with pd.option_context("display.max_rows", None, "display.max_columns", None, "display.width", 1000):
        print(leaderboard[leaderboard["model"].str.endswith("L1")].sort_values(by="score_val", ascending=True))
        print(leaderboard[~leaderboard["model"].str.endswith("L1")].sort_values(by="score_val", ascending=False))


def _run(task_id, metric, fold=0, print_res=True, enable_test=False):
    train_data, test_data, label, regression = get_data(task_id, fold)
    n_max_cols = 100
    n_max_train_instances = 500
    n_max_test_instances = 200

    # Sub sample instances
    train_data = train_data.sample(n=min(len(train_data), n_max_train_instances), random_state=0).reset_index(drop=True)
    test_data = test_data.sample(n=min(len(test_data), n_max_test_instances), random_state=0).reset_index(drop=True)

    # Sub sample columns
    cols = list(train_data.columns)
    cols.remove(label)
    if len(cols) > n_max_cols:
        cols = list(np.random.RandomState(42).choice(cols, replace=False, size=n_max_cols))
    train_data = train_data[cols + [label]]
    test_data = test_data[cols + [label]]

    # Run AutoGluon
    print(f"### task {task_id} and fold {fold} and shape {(train_data.shape, test_data.shape)}.")
    predictor = TabularPredictor(eval_metric=metric, label=label, verbosity=2)
    predictor.fit(
        train_data=train_data,
        hyperparameters={
            "FASTAI": [{}],
            "NN_TORCH": [{}],
            "RF": [{}],
            "XT": [{}],
            "GBM": [{}],
        },
        num_bag_sets=2,
        num_bag_folds=2,
        num_gpus=0,
        num_stack_levels=1,
        fit_weighted_ensemble=True,
        dynamic_stacking=True,
        time_limit=240,
        ds_args=dict(use_holdout=True, detection_time_frac=1 / 4, holdout_frac=1 / 9),
    )
    leaderboard = predictor.leaderboard(train_data, silent=True)[["model", "score_test", "score_val"]].sort_values(by="model").reset_index(drop=True)

    if print_res:
        _print_lb(leaderboard, task_id)

    return leaderboard


if __name__ == "__main__":
    _run(359955, "roc_auc")
    _run(146217, "log_loss")
    _run(359938, "mse")

TODOs and Open Questions for Merge

  • Determine how to include this in presets and decide on default split ratio

Write tests:

  • No time limit and time limit
  • Extra fit works with dynamic stacking at first fit
  • Holdout, CV, repeated CV, custom validation data

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

@LennartPurucker LennartPurucker marked this pull request as ready for review October 26, 2023 21:07
@github-actions

Job PR-3616-11eef0c is done.
Docs are uploaded to http://autogluon-staging.s3-website-us-west-2.amazonaws.com/PR-3616/11eef0c/index.html

@github-actions

Job PR-3616-bb16dbc is done.
Docs are uploaded to http://autogluon-staging.s3-website-us-west-2.amazonaws.com/PR-3616/bb16dbc/index.html

@github-actions

Job PR-3616-fa14d95 is done.
Docs are uploaded to http://autogluon-staging.s3-website-us-west-2.amazonaws.com/PR-3616/fa14d95/index.html

@Innixma Innixma added this to the 1.0 Release milestone Oct 27, 2023
@Innixma Innixma added enhancement New feature or request module: tabular priority: 0 Maximum priority labels Oct 27, 2023
@Innixma Innixma self-requested a review October 27, 2023 18:48
Contributor

@Innixma Innixma left a comment


Added initial review

raise ValueError("Unsupported validation procedure during dynamic stacking!")

set_logger_verbosity(ag_fit_kwargs["verbosity"])
org_learner = copy.deepcopy(self._learner)
Contributor


Why are we creating a copy of the learner instead of a copy of the predictor? (both are probably fine, but wanted to know the reasoning)

Collaborator Author


To my understanding, the only part of the predictor that is affected by a fit, and therefore needs to be reset to its original state before the next fit, is the learner. I think copying the predictor would be more expensive (it copies state we do not need) than copying the learner. But for safety, we could also copy the predictor and reset the object that way.
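The snapshot-and-restore pattern being discussed can be sketched like this. The `Predictor` class below is a stand-in for illustration, not the real `TabularPredictor`; the assumption (per the comment above) is that all fit state lives in the learner.

```python
import copy


class Predictor:
    """Stand-in for TabularPredictor: only the learner holds fit state."""

    def __init__(self):
        self._learner = {"models": []}

    def fit(self, data):
        # Fitting mutates the learner by adding trained models.
        self._learner["models"].append(data)


pred = Predictor()
org_learner = copy.deepcopy(pred._learner)  # snapshot before the detection fit
pred.fit("subset")                          # detection fit mutates the learner
pred._learner = org_learner                 # restore pristine state for the final fit
```

Deep-copying only the learner is cheaper than deep-copying the whole predictor, at the cost of assuming no other predictor attribute is mutated during a fit.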

@github-actions

Job PR-3616-64c98fc is done.
Docs are uploaded to http://autogluon-staging.s3-website-us-west-2.amazonaws.com/PR-3616/64c98fc/index.html

@github-actions

Job PR-3616-7711a9a is done.
Docs are uploaded to http://autogluon-staging.s3-website-us-west-2.amazonaws.com/PR-3616/7711a9a/index.html

@github-actions

Job PR-3616-8228cc4 is done.
Docs are uploaded to http://autogluon-staging.s3-website-us-west-2.amazonaws.com/PR-3616/8228cc4/index.html

@github-actions

Job PR-3616-9e20387 is done.
Docs are uploaded to http://autogluon-staging.s3-website-us-west-2.amazonaws.com/PR-3616/9e20387/index.html

@github-actions

Job PR-3616-6e2aace is done.
Docs are uploaded to http://autogluon-staging.s3-website-us-west-2.amazonaws.com/PR-3616/6e2aace/index.html

@github-actions

Job PR-3616-f090dc2 is done.
Docs are uploaded to http://autogluon-staging.s3-website-us-west-2.amazonaws.com/PR-3616/f090dc2/index.html

@Innixma
Contributor

Innixma commented Nov 2, 2023

Benchmark results look good! Thanks for the amazing contribution @LennartPurucker!

Contributor

@Innixma Innixma left a comment


LGTM!

@Innixma Innixma merged commit 29dd8d2 into autogluon:master Nov 2, 2023
LennartPurucker added a commit to LennartPurucker/autogluon that referenced this pull request Jun 1, 2024