Description
First check
- I added a descriptive title to this issue.
- I used the GitHub search to find a similar request and didn't find it.
- I searched the Prefect documentation for this feature.
Prefect Version
2.x
Describe the current behavior
Prefect workers need to be able to uniquely identify a cluster in order to support flow run cancellation. Our existing approach requires that the user installing the Helm chart have permission to read the `kube-system` namespace, which causes issues in some deployment environments. When running `helm template` or using ArgoCD, we have found that the `lookup` function always returns an empty value, which means that the worker has no cluster identity at runtime.
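A minimal sketch of the pattern involved (illustrative only, not the chart's actual template): Helm's `lookup` function queries the live cluster API, and by design it returns an empty map whenever there is no cluster connection — which is exactly the situation under `helm template` and during ArgoCD's client-side render step.

```yaml
# Illustrative sketch, not the actual chart template.
# `lookup` returns an empty map when there is no live API connection
# (e.g. `helm template`, ArgoCD render), so the env var below renders
# empty in those environments.
{{- $ns := lookup "v1" "Namespace" "" "kube-system" }}
- name: PREFECT_KUBERNETES_CLUSTER_UID
  value: {{ if $ns }}{{ $ns.metadata.uid | quote }}{{ else }}{{ "" | quote }}{{ end }}
```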
Describe the proposed behavior
Either document this UUID or revisit our approach. Perhaps a generated UUID stored in a ConfigMap would be sufficient for our needs.
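The ConfigMap idea could look roughly like the following (a sketch only; the resource name and data key are assumptions). Note the caveat in the comments: a chart-side get-or-create still depends on `lookup`, so fully avoiding the problem likely means generating the UUID at runtime, e.g. from the worker itself.

```yaml
# Sketch of a chart-generated identity persisted in a ConfigMap.
# Caveat: the get-or-create below still uses `lookup`, which returns
# empty under `helm template`/ArgoCD; generating the UUID at runtime
# (e.g. in the worker) would avoid that dependency entirely.
{{- $existing := lookup "v1" "ConfigMap" .Release.Namespace "prefect-worker-cluster-uid" }}
apiVersion: v1
kind: ConfigMap
metadata:
  name: prefect-worker-cluster-uid   # hypothetical name
  annotations:
    "helm.sh/resource-policy": keep  # keep the identity across uninstalls
data:
  cluster-uid: {{ if $existing }}{{ index $existing.data "cluster-uid" | quote }}{{ else }}{{ uuidv4 | quote }}{{ end }}
```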
Example Use
No response
Additional context
Diagnosing the problem
The Helm chart normally sets `PREFECT_KUBERNETES_CLUSTER_UID`. If it is unset, the worker will try to load the UID itself and will likely receive an error similar to the following, due to the missing ClusterRole/ClusterRoleBinding:
```json
{
  "kind": "Status",
  "apiVersion": "v1",
  "metadata": {},
  "status": "Failure",
  "message": "namespaces \"kube-system\" is forbidden: User \"system:serviceaccount:prefect:prefect-worker\" cannot get resource \"namespaces\" in API group \"\" in the namespace \"kube-system\"",
  "reason": "Forbidden",
  "details": {
    "name": "kube-system",
    "kind": "namespaces"
  },
  "code": 403
}
```
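The fallback behavior described above (and implemented in the worker source linked at the bottom of this issue) can be sketched in Python. The function name and the injected fetcher are hypothetical; only the environment variable name comes from the chart.

```python
import os


def get_cluster_uid(fetch_namespace_uid=None) -> str:
    """Hypothetical sketch of the worker's cluster-identity resolution.

    Precedence (mirroring the behavior described in this issue):
    1. PREFECT_KUBERNETES_CLUSTER_UID, normally set by the Helm chart.
    2. The UID of the kube-system namespace, fetched from the Kubernetes
       API -- the call that fails with a 403 when RBAC is missing.
    """
    env_uid = os.environ.get("PREFECT_KUBERNETES_CLUSTER_UID")
    if env_uid:
        return env_uid
    if fetch_namespace_uid is not None:
        # In the real worker this is a Kubernetes API read of the
        # kube-system namespace metadata.
        return fetch_namespace_uid("kube-system")
    raise RuntimeError("no cluster identity available")
```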
Workaround
The workaround is to manually override the `clusterUUID` setting. Administrators are expected to supply a cluster-unique value; otherwise, cancellation may not behave correctly.
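A hedged sketch of that override in `values.yaml` — the exact key path is an assumption here and may differ between chart versions, so check the chart's documented values:

```yaml
# Hypothetical values.yaml override; the issue refers to the setting as
# `clusterUUID` -- verify the exact key name for your chart version.
worker:
  clusterUUID: "my-unique-cluster-identity"  # any value unique per cluster
```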
Background on the original implementation
Kubernetes does not provide any generic feature for determining cluster identity. There are various techniques for determining a unique cluster identity, but because this is not standardized, they are all approximations (e.g. using the IP address of the control plane, using a randomly-generated UUID stored in a ConfigMap or similar, etc.)
Because the `kube-system` namespace always exists (it cannot be deleted) and has a unique UID, some Kubernetes experts have suggested using the `kube-system` namespace's UID to uniquely identify a cluster.
In our Helm chart, we use the `lookup` function, which uses the installer's credentials to read the UID at installation time. This is done so that we don't need to grant the worker's Kubernetes service account a ClusterRoleBinding or create a ClusterRole (since namespaces are not themselves namespaced, the ClusterRole would need to allow read access to all namespaces in the cluster).
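For reference, granting the worker permission to read the namespace at runtime would require cluster-scoped RBAC roughly like the following (a sketch; the role names are illustrative, and the service account is taken from the error message above):

```yaml
# Sketch of the cluster-scoped RBAC the chart avoids by using `lookup`.
# Namespaces are cluster-scoped resources, so this must be a ClusterRole.
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: prefect-worker-namespace-reader   # illustrative name
rules:
  - apiGroups: [""]
    resources: ["namespaces"]
    verbs: ["get"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: prefect-worker-namespace-reader   # illustrative name
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: prefect-worker-namespace-reader
subjects:
  - kind: ServiceAccount
    name: prefect-worker
    namespace: prefect
```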
See also
- This is the source code that retrieves the UID, with fallback to an environment variable: https://github.com/PrefectHQ/prefect-kubernetes/blob/8c33171a7dbe1e2cd304162fcd1331d48cb5248d/prefect_kubernetes/worker.py#L684-L708
- Use cluster uid and namespace instead of cluster "name" for Kubernetes job identifiers #7747
- Setup permissions so that we can read the kube-system namespace to get its UID. prefect-helm#91