Description
/kind bug
What steps did you take and what happened:
I installed the latest version of Katib by cloning the repo's master tree and running `make deploy` against our OpenShift 4.6.21 cluster.
Then I applied random-example.yaml.
The created Experiment remains in the Running condition, the Trial's pods are not updated with sidecar containers, and `deployment/katib-controller` logs the following lines:
2021/04/07 14:57:53 http: TLS handshake error from 10.254.2.1:47974: remote error: tls: bad certificate
2021/04/07 14:57:53 http: TLS handshake error from 10.254.2.1:47972: remote error: tls: bad certificate
What did you expect to happen:
The webhook certificates are valid, the Trial's pods are injected with metric-gathering sidecars, and the Experiment gathers metrics and progresses as it should.
Anything else you would like to add:
As a result of `job/katib-cert-generator`, the WebhookConfiguration's `.webhooks[].clientConfig.caBundle` fields are updated with `ca.crt` from the `katib-cert-generator-token` secret assigned to the SA `katib-cert-generator`.
According to the documentation on CSRs, a ServiceAccount's `ca.crt` is not guaranteed to verify arbitrary client certificates:
None of these usages are related to ServiceAccount token secrets .data[ca.crt] in any way. That CA bundle is only guaranteed to verify a connection to the API server using the default service (kubernetes.default.svc).
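To confirm that the `caBundle` really is the ServiceAccount CA, the two values can be compared directly. The following is a minimal sketch, not part of Katib. It assumes the webhook configuration and the token secret have already been exported with `kubectl get ... -o json` and parsed into dictionaries. The helper names `ca_bundles` and `uses_service_account_ca` are hypothetical.

```python
import base64
from typing import List


def ca_bundles(webhook_config: dict) -> List[bytes]:
    """Return the decoded caBundle of every entry under .webhooks[]."""
    return [
        base64.b64decode(hook["clientConfig"]["caBundle"])
        for hook in webhook_config.get("webhooks", [])
    ]


def uses_service_account_ca(webhook_config: dict, token_secret: dict) -> bool:
    """True if every caBundle is byte-identical to the secret's .data[ca.crt]."""
    sa_ca = base64.b64decode(token_secret["data"]["ca.crt"]).strip()
    bundles = ca_bundles(webhook_config)
    # An empty webhook list proves nothing, so report False in that case.
    return bool(bundles) and all(b.strip() == sa_ca for b in bundles)


if __name__ == "__main__":
    # Placeholder PEM, only to exercise the comparison; not a real certificate.
    pem = b"-----BEGIN CERTIFICATE-----\nplaceholder\n-----END CERTIFICATE-----\n"
    encoded = base64.b64encode(pem).decode()
    config = {"webhooks": [{"clientConfig": {"caBundle": encoded}}]}
    secret = {"data": {"ca.crt": encoded}}
    print(uses_service_account_ca(config, secret))
```

A `True` result only shows where the bundle came from. Whether that CA actually signed the webhook serving certificate still has to be checked separately.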
I fetched `tls.crt` from `secret/katib-webhook-cert` and `ca.crt` from `secret/katib-cert-generator-token-***`, attached to the corresponding SA. Indeed, the pair is not valid:
[maanur@maanur-notebook katib-webhook-cert]$ openssl verify -verbose -CAfile ca.crt katib.crt
O = system:nodes, CN = system:node:katib-controller.kubeflow.svc
error 20 at 0 depth lookup: unable to get local issuer certificate
error katib.crt: verification failed
Environment:
- Katib version: 86884ca
- Kubeflow version: (not used)
- Kubernetes version: 1.19
- OpenShift version: 4.6.21