[timeseries] Add horizon_weight support for TimeSeriesPredictor #5084
Conversation
Thanks a lot for the great PR, I think this is headed in a great direction. Left some comments here and there.
```diff
@@ -93,6 +93,14 @@ class TimeSeriesPredictor:
     eval_metric_seasonal_period : int, optional
         Seasonal period used to compute some evaluation metrics such as mean absolute scaled error (MASE). Defaults to
         ``None``, in which case the seasonal period is computed based on the data frequency.
+    horizon_weight : List[float], optional
```
Is `[1, 1, 1, 1]` more intuitive than `[0.25, 0.25, 0.25, 0.25]`? When one says "weight", my mind immediately jumps to something that sums to 1. This is especially apparent if we need lead time: `[0, 0, 2, 2]` vs `[0, 0, 0.5, 0.5]`.
This is mostly about efficiency and numerical stability rather than intuitiveness. We always have a matrix `errors` of shape `[num_items, prediction_length]`. There are two ways to apply different weights per item / per time step.

Current approach (no weights = all weights are equal to 1)

If we have an array `horizon_weight` of shape `[1, prediction_length]` whose values sum up to `prediction_length`, and an array `item_weight` of shape `[num_items, 1]` whose values sum up to `num_items`, the averaging logic is quite simple:

```python
errors: np.ndarray  # shape [num_items, prediction_length]
if horizon_weight is not None:
    errors *= horizon_weight
if item_weight is not None:
    errors *= item_weight
return np.nanmean(errors)
```
Alternative approach (no weights = all weights are equal to 1/prediction_length or 1/num_items)

If instead `horizon_weight` and `item_weight` each summed up to 1, the code would be a lot less elegant. We wouldn't be able to just optionally multiply the errors by some weights; rather, we would need to perform a reduction after each multiplication. We can't use `np.average` for this reduction since it doesn't support NaN values, so we would need to use `np.nansum`. The code would look something like:

```python
errors: np.ndarray  # shape [num_items, prediction_length]
if horizon_weight is not None:
    errors = np.nansum(errors * horizon_weight, axis=1)
else:
    errors = np.nanmean(errors, axis=1)
# now `errors` has shape [num_items]
if item_weight is not None:
    return np.nansum(errors * item_weight)
else:
    return np.nanmean(errors)
```

This looks less elegant to me.
Also, if the weights are uniform, I suspect that `np.nansum(errors * horizon_weight, axis=1)` is less numerically stable than `np.nanmean(errors, axis=1)`, especially if `prediction_length` is very large.
Sorry for being late to the party. I understand it's similar in spirit to how `sample_weight` works in most stats software. LGTM.
Do you mean that `sample_weight` usually adds up to 1, or that it usually adds up to `len(y)`? In `sklearn` it seems to be the latter (https://scikit-learn.org/stable/modules/generated/sklearn.utils.class_weight.compute_sample_weight.html).
I meant the latter.
Force-pushed from fefbc01 to 0ab8fca
```diff
@@ -66,18 +77,16 @@ def __call__(
         self,
         data: TimeSeriesDataFrame,
         predictions: TimeSeriesDataFrame,
-        prediction_length: int = 1,
```
One breaking change is that `prediction_length` must now be set as an attribute of the `TimeSeriesScorer`. This means, for example, that code in the tutorials must change from

```python
mse = MeanSquaredError()
mse_score = mse(
    data=test_data,
    predictions=predictions,
    prediction_length=predictor.prediction_length,
    target=predictor.target,
)
```

to

```python
mse = MeanSquaredError(prediction_length=predictor.prediction_length)
mse_score = mse(
    data=test_data,
    predictions=predictions,
    target=predictor.target,
)
```

We could still allow passing `prediction_length` via `kwargs` here for backward compatibility (so both options above would work). Do you think this makes sense?

```python
def __call__(..., **kwargs):
    prediction_length = kwargs.get("prediction_length", self.prediction_length)
```
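A self-contained sketch of what this backward-compatible pattern could look like; the class below is an illustrative stand-in, not the actual `TimeSeriesScorer`, and the deprecation warning is an optional extra:

```python
import warnings

class ScorerSketch:
    # Illustrative stand-in for TimeSeriesScorer: prediction_length lives
    # on the instance, but __call__ still accepts it as a kwarg for
    # backward compatibility.
    def __init__(self, prediction_length: int = 1):
        self.prediction_length = prediction_length

    def __call__(self, data=None, predictions=None, **kwargs):
        if "prediction_length" in kwargs:
            warnings.warn(
                "passing prediction_length to __call__ is deprecated; "
                "set it in the constructor instead",
                DeprecationWarning,
            )
        prediction_length = kwargs.get("prediction_length", self.prediction_length)
        # ... actual metric computation over the last prediction_length steps ...
        return prediction_length  # placeholder return so the sketch runs

scorer = ScorerSketch(prediction_length=7)
print(scorer())  # new style, uses the attribute -> 7
```

Both call styles then resolve to the same code path, and existing tutorial code keeps working while emitting a deprecation hint.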
Would be good to have backward compatibility, although I am not sure how many users work directly with metrics.
I've added a backward-compatible option to pass the `prediction_length` here.
```diff
@@ -699,19 +692,15 @@ def _score_with_predictions(
         self,
         data: TimeSeriesDataFrame,
         predictions: TimeSeriesDataFrame,
-        metric: Optional[str] = None,
```
I don't see why an `AbstractTimeSeriesModel` should be able to score with different metrics (other than its own `eval_metric`), so I removed this functionality. We only used it in unit tests; in normal usage, scoring with custom metrics is handled by the `TimeSeriesTrainer`.
Makes sense. In the future: why should `AbstractTimeSeriesModel` even be aware of its metric? :)
Job PR-5084-cbd0df2 is done.
Force-pushed from cbd0df2 to 669146c
Thanks @shchur! Overall looks great to me. Left some comments.
timeseries/src/autogluon/timeseries/models/abstract/abstract_timeseries_model.py
Job PR-5084-669146c is done.
🚀
LGTM!
Job PR-5084-1a0d133 is done.
Issue #, if available:

Description of changes:

- Alternative to "Add `horizon_weight` in time series forecasting metrics" #5058, this time folding `horizon_weight` and `eval_metric_seasonal_period` into properties of the `TimeSeriesScorer`.
- Adds `horizon_weight: list[float] | None` to the `TimeSeriesPredictor` and all forecasting metrics, which allows assigning custom weights to each time step in the forecast horizon when computing the metric.
- Internally, the error values form an array of shape `[num_items, prediction_length]`. We then multiply this array with `horizon_weight.reshape(1, prediction_length)` before applying the final aggregation step.

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.