
Conversation

shchur
Collaborator

@shchur shchur commented Apr 23, 2025

Issue #, if available:

Description of changes:

  • Second attempt at [timeseries] Add support for horizon_weight in time series forecasting metrics #5058, this time folding horizon_weight and eval_metric_seasonal_period into properties of the TimeSeriesScorer
  • Add an optional argument horizon_weight: list[float] | None to the TimeSeriesPredictor and all forecasting metrics that allows assigning custom weights to each time step in the forecast horizon when computing the metric.
  • When computing any metric, we first build an array of raw error values of shape [num_items, prediction_length]. We then multiply this array by horizon_weight.reshape(1, prediction_length) before applying the final aggregation step.
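The weighting step described above can be sketched as follows (a minimal illustration with a hypothetical `weighted_mean_error` helper, not the actual AutoGluon implementation):

```python
import numpy as np

def weighted_mean_error(errors, horizon_weight=None):
    """errors: array-like of shape [num_items, prediction_length]."""
    errors = np.asarray(errors, dtype=float)
    if horizon_weight is not None:
        prediction_length = errors.shape[1]
        # reshape to [1, prediction_length] so it broadcasts across items
        w = np.asarray(horizon_weight, dtype=float).reshape(1, prediction_length)
        errors = errors * w
    return float(np.nanmean(errors))
```

With uniform weights (all ones) this reduces to a plain mean of the raw errors.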

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

@shchur shchur requested a review from canerturkmen April 23, 2025 14:58
Contributor

@canerturkmen canerturkmen left a comment


Thanks a lot for the great PR, I think this is headed in a great direction. Left some comments here and there.

@@ -93,6 +93,14 @@ class TimeSeriesPredictor:
eval_metric_seasonal_period : int, optional
Seasonal period used to compute some evaluation metrics such as mean absolute scaled error (MASE). Defaults to
``None``, in which case the seasonal period is computed based on the data frequency.
horizon_weight : List[float], optional
Contributor


is [1, 1, 1, 1] more intuitive than [0.25, 0.25, 0.25, 0.25]? When one says weight my mind immediately jumps to something that sums to 1. This is especially apparent if we need lead time: [0, 0, 2, 2] vs [0, 0, .5, .5].

Collaborator Author

@shchur shchur Apr 24, 2025


This is mostly about efficiency and numerical stability rather than being intuitive. We always have a matrix errors of shape [num_items, prediction_length]. There are two ways to apply different weights per item / per time step.

Current approach (no weights = all weights are equal to 1)

If we have an array horizon_weight of shape [1, prediction_length] where values sum up to prediction_length, and an array item_weight of shape [num_items, 1], where values sum up to num_items, the averaging logic is quite simple

def score(errors, horizon_weight=None, item_weight=None):
    # errors: np.ndarray of shape [num_items, prediction_length]
    if horizon_weight is not None:
        errors = errors * horizon_weight  # shape [1, prediction_length]
    if item_weight is not None:
        errors = errors * item_weight  # shape [num_items, 1]
    return np.nanmean(errors)

Alternative approach (no weights = all weights are equal to 1/prediction_length or 1/num_items)

If instead horizon_weight and item_weight each summed up to 1, the code would be a lot less elegant. We would no longer be able to just optionally multiply the errors by some weights; we would need to perform a reduction after each multiplication. We can't use np.average for this reduction since it doesn't support NaN values, so we would need to use np.nansum. The code would look something like

def score(errors, horizon_weight=None, item_weight=None):
    # errors: np.ndarray of shape [num_items, prediction_length]
    if horizon_weight is not None:
        errors = np.nansum(errors * horizon_weight, axis=1)
    else:
        errors = np.nanmean(errors, axis=1)
    # now `errors` has shape [num_items]
    if item_weight is not None:
        return np.nansum(errors * item_weight)
    else:
        return np.nanmean(errors)

This looks less elegant to me.

Also, if the weights are uniform, I suspect that np.nansum(errors * horizon_weight, axis=1) is less numerically stable than np.nanmean(errors, axis=1) (especially if prediction_length is very large).
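To make the comparison concrete: when there are no NaN values, the two conventions produce identical scores (with NaNs they can diverge, since np.nanmean also shrinks the denominator while the normalized np.nansum variant does not). A small check, using illustrative numbers:

```python
import numpy as np

errors = np.array([[1.0, 2.0, 3.0, 4.0],
                   [2.0, 0.0, 1.0, 3.0]])

# Convention in this PR: weights sum to prediction_length (uniform = all ones)
w = np.array([0.0, 0.0, 2.0, 2.0])
score = np.nanmean(errors * w.reshape(1, -1))

# Normalized convention: weights sum to 1, requiring a reduction per axis
w_norm = w / w.sum()
score_norm = np.mean(np.sum(errors * w_norm.reshape(1, -1), axis=1))

assert np.isclose(score, score_norm)  # both equal 2.75 here
```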

Contributor


Sorry for being late to the party. I understand it's similar in spirit to how sample_weight works in most stats software. LGTM.

Collaborator Author


Do you mean that sample_weight usually adds up to 1, or that it usually adds up to len(y)? In sklearn it seems to be the latter (https://scikit-learn.org/stable/modules/generated/sklearn.utils.class_weight.compute_sample_weight.html).
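As a quick sanity check on the sklearn convention (assuming scikit-learn is installed): "balanced" sample weights average to 1, i.e. they sum to len(y) rather than to 1.

```python
import numpy as np
from sklearn.utils.class_weight import compute_sample_weight

y = [0, 0, 0, 1]
w = compute_sample_weight("balanced", y)
# each sample's weight is n_samples / (n_classes * class_count)
assert np.isclose(w.sum(), len(y))
```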

Contributor


I meant the latter.

@shchur shchur force-pushed the embedded-horizon-weight branch from fefbc01 to 0ab8fca Compare April 24, 2025 09:12
@@ -66,18 +77,16 @@ def __call__(
self,
data: TimeSeriesDataFrame,
predictions: TimeSeriesDataFrame,
prediction_length: int = 1,
Collaborator Author

@shchur shchur Apr 24, 2025


One breaking change is that the prediction_length must be set as an attribute of the TimeSeriesScorer.

This, for example, means that changes are required to the code in the tutorials

mse = MeanSquaredError()
mse_score = mse(
  data=test_data,
  predictions=predictions,
  prediction_length=predictor.prediction_length,
  target=predictor.target,
)

to

mse = MeanSquaredError(prediction_length=predictor.prediction_length)
mse_score = mse(
  data=test_data,
  predictions=predictions,
  target=predictor.target,
)

We could still allow passing prediction_length via kwargs here for backward compatibility (so both options above would work). Do you think this makes sense?

def __call__(..., **kwargs):
    prediction_length = kwargs.get("prediction_length", self.prediction_length)
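
A minimal sketch of such a backward-compatible shim (hypothetical class and method names; the real TimeSeriesScorer API may differ), which additionally warns when the legacy keyword is used:

```python
import warnings

class TimeSeriesScorer:
    def __init__(self, prediction_length: int = 1):
        self.prediction_length = prediction_length

    def __call__(self, data, predictions, **kwargs):
        # prefer the constructor attribute, but accept the legacy keyword
        prediction_length = self.prediction_length
        if "prediction_length" in kwargs:
            warnings.warn(
                "Passing `prediction_length` to __call__ is deprecated; "
                "set it in the constructor instead.",
                DeprecationWarning,
            )
            prediction_length = kwargs.pop("prediction_length")
        return self._score(data, predictions, prediction_length)

    def _score(self, data, predictions, prediction_length):
        raise NotImplementedError
```

This way both call styles keep working while nudging users toward the new one.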

Collaborator


would be good to have backward compatibility although I am not sure how many users work directly with metrics.

Collaborator Author


I've added a backward-compatible option to pass the prediction_length here.

@@ -699,19 +692,15 @@ def _score_with_predictions(
self,
data: TimeSeriesDataFrame,
predictions: TimeSeriesDataFrame,
metric: Optional[str] = None,
Collaborator Author


I don't see why an AbstractTimeSeriesModel should be able to score with different metrics (other than its own eval_metric), so I removed this functionality. We only use it in unit tests; in normal usage, scoring with custom metrics is handled by the TimeSeriesTrainer.

Contributor


makes sense. in the future: why should AbstractTimeSeriesModel even be aware of its metric :) ?


Job PR-5084-cbd0df2 is done.
Docs are uploaded to http://autogluon-staging.s3-website-us-west-2.amazonaws.com/PR-5084/cbd0df2/index.html

@shchur shchur force-pushed the embedded-horizon-weight branch from cbd0df2 to 669146c Compare April 25, 2025 07:35
@shchur shchur requested a review from abdulfatir April 25, 2025 07:44
Collaborator

@abdulfatir abdulfatir left a comment


Thanks @shchur! Overall looks great to me. Left some comments.


Job PR-5084-669146c is done.
Docs are uploaded to http://autogluon-staging.s3-website-us-west-2.amazonaws.com/PR-5084/669146c/index.html

Collaborator

@abdulfatir abdulfatir left a comment


🚀

Contributor

@canerturkmen canerturkmen left a comment


LGTM!

@canerturkmen canerturkmen added module: timeseries related to the timeseries module enhancement New feature or request labels Apr 25, 2025
@canerturkmen canerturkmen added this to the 1.3 Release milestone Apr 25, 2025
@shchur shchur merged commit 32fcbab into autogluon:master Apr 25, 2025
25 checks passed
@shchur shchur deleted the embedded-horizon-weight branch April 25, 2025 13:32

Job PR-5084-1a0d133 is done.
Docs are uploaded to http://autogluon-staging.s3-website-us-west-2.amazonaws.com/PR-5084/1a0d133/index.html
