[BUG] TimeSeriesPredictor loading from different path than originally saved in fails for tabular predictors #4133

@inigohidalgo

Bug Report Checklist

  • I provided code that demonstrates a minimal reproducible example.
  • I confirmed bug exists on the latest mainline of AutoGluon via source install.
  • I confirmed bug exists on the latest stable version of AutoGluon.

Describe the bug

We train a TimeSeriesPredictor, copy the resulting model directory into blob storage, and at prediction time download it from blob storage into a local directory. All the non-tabular models work well, but when the best model is RecursiveTabular or DirectTabular, prediction fails because the TimeSeriesPredictor cannot find the model's pkl file:

Traceback (most recent call last):
  File "/usr/local/lib/python3.10/site-packages/autogluon/timeseries/trainer/abstract_trainer.py", line 944, in get_model_pred_dict
    model_pred_dict[model_name] = self._predict_model(
  File "/usr/local/lib/python3.10/site-packages/autogluon/timeseries/trainer/abstract_trainer.py", line 874, in _predict_model
    return model.predict(data, known_covariates=known_covariates)
  File "/usr/local/lib/python3.10/site-packages/autogluon/timeseries/models/abstract/abstract_timeseries_model.py", line 298, in predict
    predictions = self._predict(data=data, known_covariates=known_covariates, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/autogluon/timeseries/models/multi_window/multi_window_model.py", line 177, in _predict
    return self.most_recent_model.predict(data, known_covariates=known_covariates, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/autogluon/timeseries/models/abstract/abstract_timeseries_model.py", line 298, in predict
    predictions = self._predict(data=data, known_covariates=known_covariates, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/autogluon/timeseries/models/autogluon_tabular/mlforecast.py", line 467, in _predict
    raw_predictions = self._mlf.models_["mean"].predict(df)
  File "/usr/local/lib/python3.10/site-packages/autogluon/timeseries/models/autogluon_tabular/mlforecast.py", line 55, in predict
    return self.predictor.predict(X).values
  File "/usr/local/lib/python3.10/site-packages/autogluon/tabular/predictor/predictor.py", line 1931, in predict
    return self._learner.predict(X=data, model=model, as_pandas=as_pandas, transform_features=transform_features, decision_threshold=decision_threshold)
  File "/usr/local/lib/python3.10/site-packages/autogluon/tabular/learner/abstract_learner.py", line 208, in predict
    y_pred_proba = self.predict_proba(
  File "/usr/local/lib/python3.10/site-packages/autogluon/tabular/learner/abstract_learner.py", line 189, in predict_proba
    y_pred_proba = self.load_trainer().predict_proba(X, model=model)
  File "/usr/local/lib/python3.10/site-packages/autogluon/core/trainer/abstract_trainer.py", line 773, in predict_proba
    return self._predict_proba_model(X, model, cascade=cascade)
  File "/usr/local/lib/python3.10/site-packages/autogluon/core/trainer/abstract_trainer.py", line 2525, in _predict_proba_model
    return self.get_pred_proba_from_model(model=model, X=X, model_pred_proba_dict=model_pred_proba_dict, cascade=cascade)
  File "/usr/local/lib/python3.10/site-packages/autogluon/core/trainer/abstract_trainer.py", line 787, in get_pred_proba_from_model
    model_pred_proba_dict = self.get_model_pred_proba_dict(X=X, models=models, model_pred_proba_dict=model_pred_proba_dict, cascade=cascade)
  File "/usr/local/lib/python3.10/site-packages/autogluon/core/trainer/abstract_trainer.py", line 1022, in get_model_pred_proba_dict
    model = self.load_model(model_name=model_name)
  File "/usr/local/lib/python3.10/site-packages/autogluon/core/trainer/abstract_trainer.py", line 1651, in load_model
    return model_type.load(path=os.path.join(self.path, path), reset_paths=self.reset_paths)
  File "/usr/local/lib/python3.10/site-packages/autogluon/core/models/abstract/abstract_model.py", line 1096, in load
    model = load_pkl.load(path=file_path, verbose=verbose)
  File "/usr/local/lib/python3.10/site-packages/autogluon/common/loaders/load_pkl.py", line 43, in load
    with compression_fn_map[compression_fn]["open"](validated_path, "rb", **compression_fn_kwargs) as fin:
FileNotFoundError: [Errno 2] No such file or directory: '/tmp/tmp4katc9my/models/DirectTabular/W2/tabular_predictor/models/LightGBM/model.pkl'

The temporary path it is trying to load from is the directory where the model was trained, on a different cloud instance. We persist these models by uploading the full model directory, downloading it into a new path, and calling TimeSeriesPredictor.load(new_path).

Steps to reproduce

  1. Train a TimeSeriesPredictor with DirectTabular enabled
  2. Move the model directory to a different location, ensuring the original path no longer exists
  3. Load from the new directory
  4. Predict using DirectTabular (or the WeightedEnsemble)

The Predictor will try to load the model from the old directory.

This issue only arises with the Tabular models.
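The steps above can be sketched as a single script. This is an untested outline under assumptions: the synthetic data, the directory names `trained_here` and `moved_here`, and the function name `reproduce_relocation_failure` are made up for illustration, and a local `shutil.move` stands in for the upload to and download from blob storage.

```python
import shutil
import tempfile
from pathlib import Path


def reproduce_relocation_failure():
    # Imported lazily so this file can be imported without AutoGluon installed.
    import numpy as np
    import pandas as pd
    from autogluon.timeseries import TimeSeriesDataFrame, TimeSeriesPredictor

    # Synthetic data: two daily series of 60 points each (illustrative only).
    timestamps = pd.date_range("2024-01-01", periods=60, freq="D")
    frames = [
        pd.DataFrame({"item_id": item, "timestamp": timestamps, "target": np.random.rand(60)})
        for item in ("A", "B")
    ]
    data = TimeSeriesDataFrame.from_data_frame(
        pd.concat(frames, ignore_index=True),
        id_column="item_id",
        timestamp_column="timestamp",
    )

    workdir = Path(tempfile.mkdtemp())
    old_path, new_path = workdir / "trained_here", workdir / "moved_here"

    # Step 1: train with only DirectTabular enabled.
    TimeSeriesPredictor(prediction_length=7, path=str(old_path)).fit(
        data, hyperparameters={"DirectTabular": {}}
    )

    # Step 2: move the directory so the original path no longer exists.
    shutil.move(str(old_path), str(new_path))

    # Steps 3 and 4: load from the new location and predict.
    predictor = TimeSeriesPredictor.load(str(new_path))
    return predictor.predict(data)
```

In our setup, the final `predict` call is where the FileNotFoundError from the traceback appears.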

Installed Versions

INSTALLED VERSIONS
------------------
date                : 2024-04-24
time                : 12:29:19.525968
python              : 3.10.13.final.0
OS                  : Linux
OS-release          : 5.15.0-1052-azure
Version             : #60-Ubuntu SMP Mon Nov 6 10:08:16 UTC 2023
machine             : x86_64
processor           : x86_64
num_cores           : 32
cpu_ram_mb          : 257926.4453125
cuda version        : None
num_gpus            : 0
gpu_ram_mb          : []
avail_disk_size_mb  : 33082

accelerate          : 0.21.0
autogluon           : None
autogluon.common    : 1.1.0
autogluon.core      : 1.1.0
autogluon.features  : 1.1.0
autogluon.tabular   : 1.1.0
autogluon.timeseries: 1.1.0
boto3               : 1.34.90
catboost            : 1.2.5
fastai              : None
gluonts             : 0.14.3
hyperopt            : 0.2.7
imodels             : None
joblib              : 1.4.0
lightgbm            : 3.3.5
lightning           : 2.1.4
matplotlib          : 3.8.4
mlforecast          : 0.10.0
networkx            : 3.3
numpy               : 1.25.2
onnxruntime-gpu     : None
optimum             : None
optimum-intel       : None
orjson              : 3.10.1
pandas              : 2.0.3
psutil              : 5.9.8
pytorch-lightning   : 2.1.4
ray                 : 2.10.0
requests            : 2.31.0
scikit-learn        : 1.4.0
scikit-learn-intelex: None
scipy               : 1.12.0
setuptools          : 69.5.1
skl2onnx            : None
statsforecast       : 1.4.0
tabpfn              : None
tensorboard         : 2.16.2
torch               : 2.1.2
tqdm                : 4.66.2
transformers        : 4.38.2
utilsforecast       : 0.0.10
vowpalwabbit        : None
xgboost             : 1.7.6

Somewhere there is an absolute path being used instead of a relative path. Have you seen this issue before?
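To illustrate the kind of mechanism I suspect, here is a toy example using only the standard library. It is not AutoGluon code, and `save_and_move` and the file names are hypothetical. It relies on one documented fact: `os.path.join(root, p)` discards `root` when `p` is absolute, so a pickled absolute path keeps pointing at the old directory after a move, while a pickled relative path follows the new root.

```python
import os
import pickle
import shutil
import tempfile


def save_and_move(store_absolute: bool) -> str:
    """Save a toy 'model' under old/, move it to new/, then load it from new/."""
    base = tempfile.mkdtemp()
    old_root = os.path.join(base, "old")
    new_root = os.path.join(base, "new")
    os.makedirs(old_root)

    # The child artifact that the pickled metadata points to.
    with open(os.path.join(old_root, "weights.txt"), "w") as fout:
        fout.write("ok")

    # Store the artifact location either as an absolute or a relative path.
    location = os.path.join(old_root, "weights.txt") if store_absolute else "weights.txt"
    with open(os.path.join(old_root, "model.pkl"), "wb") as fout:
        pickle.dump({"weights_location": location}, fout)

    # Relocate the whole directory; the original path no longer exists.
    shutil.move(old_root, new_root)

    with open(os.path.join(new_root, "model.pkl"), "rb") as fin:
        meta = pickle.load(fin)
    try:
        # If weights_location is absolute, new_root is silently ignored here.
        with open(os.path.join(new_root, meta["weights_location"])) as fin:
            return fin.read()
    finally:
        shutil.rmtree(base)
```

Calling `save_and_move(store_absolute=False)` reads the file from the new location, while `save_and_move(store_absolute=True)` raises FileNotFoundError for the old path, which is the same symptom as in the traceback.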

I see that TabularPredictor has some methods for cloning a predictor for deployment, and autogluon.cloud has functionality to persist models to S3. Is there a recommended way to do this on our own (using Azure Blob Storage)?

Thank you

Labels: bug, module: timeseries