Description
Is your feature request related to a problem? Please describe.
The current HA solution for the allocator is implemented with a service and at least 2 replicas of the allocator, each of which processes the received GameServerAllocation (GSA) requests at regular intervals.
This approach is not well suited to high allocation rates and high-capacity servers, because the allocators compete to update the same server CRD. This generates update failures and many retries, which eventually spread sessions across multiple servers and increase allocation times. Even with one session per server, the allocators can compete for the same server.
Using one allocator (especially with batching enabled) with a fine-tuned batch wait time gives better results, but the HA policy is downgraded to just restarting the pod when needed.
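To illustrate the contention described above, here is a minimal, self-contained sketch of optimistic concurrency on a single server record. It is not Agones code: `Server`, `Store`, and `AllocateRound` are hypothetical names, and the in-memory store only mimics the way an update carrying a stale resource version is rejected.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrConflict mimics the error returned when an update carries a stale
// resource version (hypothetical, for illustration only).
var ErrConflict = errors.New("conflict: stale resource version")

// Server is a hypothetical stand-in for the server resource.
type Server struct {
	ResourceVersion int
	Sessions        int
}

// Store is a hypothetical in-memory stand-in for the API server.
type Store struct{ server Server }

// Get returns a snapshot of the current server state.
func (s *Store) Get() Server { return s.server }

// Update applies next only if it was derived from the latest version.
func (s *Store) Update(next Server) error {
	if next.ResourceVersion != s.server.ResourceVersion {
		return ErrConflict
	}
	next.ResourceVersion++
	s.server = next
	return nil
}

// AllocateRound simulates n allocators that all read the same snapshot and
// then each try to add one session. It returns how many updates succeeded
// and how many were rejected and would need a retry.
func AllocateRound(store *Store, n int) (ok, conflicts int) {
	snapshots := make([]Server, n)
	for i := range snapshots {
		snapshots[i] = store.Get()
	}
	for _, snap := range snapshots {
		snap.Sessions++
		if err := store.Update(snap); errors.Is(err, ErrConflict) {
			conflicts++
			continue
		}
		ok++
	}
	return ok, conflicts
}

func main() {
	store := &Store{}
	ok, conflicts := AllocateRound(store, 3)
	fmt.Println(ok, conflicts, store.Get().Sessions) // prints "1 2 1"
}
```

In this sketch, only one of the three simulated allocators succeeds per round; the other two must re-read and retry, which is the cost the single-allocator setup avoids.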
The scenarios we need to support would be:
- a node goes down with the allocator on it
- planned maintenance, when the allocator is restarted for an update
- the allocator crashes
Describe the solution you'd like
The perfect solution would be to find a design where multiple allocators are running, sharing the load when needed.
Describe alternatives you've considered
Possible solutions:
- master/slave allocators, implemented using the readiness check of each pod and a leader election scheme
- pub/sub instead of a channel for consuming the allocations
- a shared cache and list of server state between allocators
Additional context
Add any other context or screenshots about the feature request here.
Link to the Agones Feature Proposal (if any)
None
Discussion Link (if any)
There have been discussions here: #4176 (comment)