Kubernetes startup watch may never terminate if there is a Pod error #258

@Skazza94

Description

The current implementation of the KubernetesMachine._wait_machines_startup method continuously loops on watch events from list_namespaced_pod. In specific cases, such as critical Pod errors (like CNI errors), no further events are generated.

Consequently, the for loop never exits: it blocks waiting for an event that never arrives, and the program hangs indefinitely.

To resolve this issue, the loop needs a mechanism that breaks it when no event arrives within a defined threshold. Our approach uses threading.Timer to start a 3-minute timer, which is reset each time a new event is received. If no event occurs within the 3-minute interval, the callback is triggered, signaling an error and terminating the program.
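
The resettable-timer idea described above can be sketched as follows. This is a minimal illustration, not the project's actual fix: WatchdogTimer and watch_with_timeout are hypothetical names, and the event source is any iterable rather than the real Kubernetes watch stream.

```python
import threading
from typing import Callable, Iterable, Iterator, Optional, TypeVar

T = TypeVar("T")


class WatchdogTimer:
    """Restartable countdown built on threading.Timer (hypothetical helper)."""

    def __init__(self, timeout: float, on_timeout: Callable[[], object]) -> None:
        self._timeout = timeout
        self._on_timeout = on_timeout
        self._timer: Optional[threading.Timer] = None
        self._lock = threading.Lock()

    def reset(self) -> None:
        """Cancel the pending countdown, if any, and start a fresh one."""
        with self._lock:
            if self._timer is not None:
                self._timer.cancel()
            self._timer = threading.Timer(self._timeout, self._on_timeout)
            self._timer.daemon = True  # do not keep the process alive
            self._timer.start()

    def cancel(self) -> None:
        """Stop the countdown without invoking the callback."""
        with self._lock:
            if self._timer is not None:
                self._timer.cancel()
                self._timer = None


def watch_with_timeout(
    events: Iterable[T], timeout: float, on_timeout: Callable[[], object]
) -> Iterator[T]:
    """Yield events, restarting the countdown each time one arrives."""
    watchdog = WatchdogTimer(timeout, on_timeout)
    watchdog.reset()  # also covers the case where no event ever arrives
    try:
        for event in events:
            watchdog.reset()
            yield event
    finally:
        # The stream ended or the consumer stopped: the countdown is moot.
        watchdog.cancel()
```

In the scenario from this issue, events would presumably be the watch stream over list_namespaced_pod and timeout would be 180 seconds. Note that threading.Timer runs the callback in a separate thread, so the callback cannot unblock the for loop by raising an exception; it has to report the error and end the process itself, or signal the main thread in some other way.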


Status

Done
