Description
We are using the Dapr sidecar in our application, which is deployed on GKE. The application is based on Spring Boot, and we use KEDA for HPA. While scaling down pods in the GKE environment, we observed 503 HTTP response codes for many requests. We have configured dapr.io/graceful-shutdown-seconds for the Dapr container and a graceful shutdown timeout for the application (Spring Boot) as well.
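For reference, a minimal sketch of how the Dapr side of this is typically configured (deployment name, app-id, and the duration are placeholders):

```yaml
# Pod template annotations on the Deployment (placeholders: myapp, 30).
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp
spec:
  template:
    metadata:
      annotations:
        dapr.io/enabled: "true"
        dapr.io/app-id: "myapp"
        # How long daprd waits for in-flight work to finish
        # after receiving SIGTERM.
        dapr.io/graceful-shutdown-seconds: "30"
```

On the Spring side, graceful shutdown is enabled with server.shutdown: graceful (see the application.yaml sketch in section D below).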
Analysis of the issue
A. Kubernetes Graceful Shutdown Process
- The pod is marked as TERMINATING.
- Containers in the pod receive the preStop request first, but only if a preStop hook is configured.
- After preStop completes, SIGTERM is sent to the containers. At this stage a container should stop accepting new requests and close its HTTP ports.
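A minimal pod spec illustrating this ordering (container name, image, and durations are placeholders):

```yaml
spec:
  # Total budget for preStop execution plus SIGTERM handling; the pod is
  # killed with SIGKILL once this expires.
  terminationGracePeriodSeconds: 60
  containers:
    - name: app
      image: app:latest
      lifecycle:
        preStop:
          exec:
            # Runs first on termination; SIGTERM is sent to the container
            # only after this command returns.
            command: ["sh", "-c", "sleep 15"]
```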
B. What GKE does while preStop is being executed by containers
Once the pod is marked as TERMINATING, GKE sends the preStop requests to the containers. While a container is handling preStop, GKE uses this time to update its IP tables and remove the IP of the terminating pod. Until the pod IP is actually removed, GKE can still send new requests to the container, and the container can accept them because its HTTP ports are still open.
In a way, preStop provides a way to delay the delivery of SIGTERM to the containers in the pod so that GKE can update its IP tables. Refer to the links below for more details.
C. What happens when SIGTERM is received by the container
The container shuts down gracefully within the configured time, closing its inbound ports.
D. Scale-down behaviour without the Dapr sidecar
The Spring application has a preStop hook. We configure this hook with a sleep, and during this time GKE updates its IP tables to remove the pod IP. While this is happening, the Spring container still receives new requests, since it has not yet closed its HTTP ports and GKE is still routing requests to it. After preStop completes, the Spring container receives SIGTERM and closes its HTTP ports. The execution of the preStop hook by the Spring container therefore gives GKE sufficient time to update its IP tables.
Please refer to
https://cloud.google.com/blog/products/containers-kubernetes/kubernetes-best-practices-terminating-with-grace
(see point 2 in the above URL).
For Spring, please refer to
https://docs.spring.io/spring-boot/docs/current/reference/html/deployment.html#deployment.cloud.kubernetes.container-lifecycle
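On the Spring side, the graceful-shutdown behaviour referenced above is a two-line configuration (the 30s timeout is a placeholder):

```yaml
# application.yaml
server:
  shutdown: graceful
spring:
  lifecycle:
    # Upper bound on how long Spring waits for in-flight requests to finish.
    timeout-per-shutdown-phase: 30s
```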
E. Scale-down behaviour with the Dapr sidecar
When the pod is marked TERMINATING, GKE is supposed to remove its IP. While this is happening, daprd receives SIGTERM immediately, because no preStop hook is configured for daprd, whereas the Spring container receives preStop first. As a result, daprd has already stopped (it got SIGTERM) while the Spring container is still executing its preStop hook.
There can be some delay before GKE finishes updating its IP tables. During this delay, GKE still sends new requests to the daprd container, even though daprd has already closed its ports (having received SIGTERM in the absence of a preStop hook). This results in 503 responses. Execution of a preStop hook by daprd would have given GKE sufficient time to update its IP tables.
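For illustration only, this is the kind of hook the daprd container would need. It is hypothetical: the Dapr sidecar injector owns the daprd container spec, so there appears to be no supported way to attach it (which is the point of the query below). The exec form of preStop assumes a shell exists in the image, which may not hold for daprd, so the sketch uses the shell-free sleep handler available in newer Kubernetes versions (PodLifecycleSleepAction):

```yaml
# Hypothetical sketch: the Dapr injector does not expose a way to set this.
containers:
  - name: daprd
    lifecycle:
      preStop:
        # Would delay SIGTERM to daprd, giving GKE time to remove the
        # pod IP before daprd closes its ports.
        sleep:
          seconds: 15
```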
Query
Is there a way to configure a preStop hook for daprd that gives Kubernetes sufficient time to remove the pod IP? There seems to be no way to configure preStop for daprd, the way there is for the Spring container.