Note
If you want to skip a bunch of discussion and see @indexjoseph's proposed design, see #3008 (comment)
Is your feature request related to a problem? Please describe.
During scheduled in-game events or new version releases, we see rapid spikes in usage of either an already heavily used fleet or a newer, previously unused fleet. We know the timing of both kinds of events in advance, and our current options are either:
- Prescale aggressively: this works, but unless we build scheduling logic ourselves to undo the additional scale afterwards, we're paying for a lot of unused capacity.
- Webhook autoscaler: this is a viable solution, but it requires us to build a service to back it.
Describe the solution you'd like
Introduce the concept of scheduled overrides that contain the following:
- a start time (in UTC)
- an end time (in UTC)
- a priority int (the higher the better, much like PriorityClasses)
- a buffer autoscaler block
Then on autoscaling evaluation:
- collect those overrides for which we are between the start and end time
  - if there are no matching overrides, just use the default autoscaling rule
- of those, select the one with the highest priority
- apply that buffer autoscaling rule instead of the default
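
The evaluation steps above could be sketched roughly as follows. This is only an illustration, not a proposed API: `BufferPolicy`, `ScheduledOverride`, `select_policy` and all field names are hypothetical, and treating the window as start-inclusive and end-exclusive is an assumption the description does not settle.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Sequence


@dataclass(frozen=True)
class BufferPolicy:
    """Hypothetical stand-in for a buffer autoscaler block."""
    buffer_size: str   # absolute count ("50") or percentage ("20%")
    min_replicas: int
    max_replicas: int


@dataclass(frozen=True)
class ScheduledOverride:
    start: datetime    # timezone-aware, UTC; assumed inclusive
    end: datetime      # timezone-aware, UTC; assumed exclusive
    priority: int      # higher wins
    buffer: BufferPolicy


def select_policy(
    default: BufferPolicy,
    overrides: Sequence[ScheduledOverride],
    now: datetime,
) -> BufferPolicy:
    # Step 1: collect the overrides whose window contains `now`.
    active = [o for o in overrides if o.start <= now < o.end]
    # Step 1a: with no matching override, keep the default rule.
    if not active:
        return default
    # Steps 2 and 3: use the buffer of the highest-priority override.
    # On equal priority, max() keeps the first one in list order.
    return max(active, key=lambda o: o.priority).buffer


# Example: a launch window that overrides the default for six hours.
default = BufferPolicy(buffer_size="10", min_replicas=10, max_replicas=100)
launch = ScheduledOverride(
    start=datetime(2023, 3, 1, 17, 0, tzinfo=timezone.utc),
    end=datetime(2023, 3, 1, 23, 0, tzinfo=timezone.utc),
    priority=10,
    buffer=BufferPolicy(buffer_size="50%", min_replicas=100, max_replicas=1000),
)
during = datetime(2023, 3, 1, 18, 0, tzinfo=timezone.utc)
after = datetime(2023, 3, 2, 0, 0, tzinfo=timezone.utc)
```

With these values, `select_policy(default, [launch], during)` picks the launch buffer and `select_policy(default, [launch], after)` falls back to the default. How ties between equal priorities should be broken is left open here and would need a decision in a real design.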
This would allow us to set special scaling windows for events or new version releases. A further extension could be to allow recurring windows for time-of-day scheduling, so that we could have a buffer window for the off hours and a percentage buffer during higher usage, which could help with issues like the one described in #2504
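
For the recurring time-of-day idea, the window check might look something like the sketch below. The function name `in_daily_window` and its parameters are hypothetical, and the daily UTC window with an exclusive end is an assumption; the main point is that a window such as 22:00 to 06:00 wraps past midnight and needs separate handling.

```python
from datetime import datetime, time, timezone


def in_daily_window(now: datetime, start: time, end: time) -> bool:
    """Return True if `now` falls inside a daily UTC window [start, end).

    `now` must be timezone-aware; `start` and `end` are naive UTC
    times of day. A window with start == end is treated as empty.
    """
    t = now.astimezone(timezone.utc).time()
    if start <= end:
        # Window within a single day, e.g. 09:00 to 17:00.
        return start <= t < end
    # Window wraps past midnight, e.g. 22:00 to 06:00.
    return t >= start or t < end
```

For example, with an off-hours window of `time(22, 0)` to `time(6, 0)`, a timestamp at 23:30 UTC is inside the window and one at 12:00 UTC is not.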
Describe alternatives you've considered
As described at the top, we can either prescale aggressively, which means adjusting the autoscaler directly, or use the webhook autoscaler.