Webhook certificates validation fails #1512

@maanur

/kind bug

What steps did you take and what happened:
I installed the latest version of Katib by cloning the repo's master tree and running `make deploy` against our OpenShift 4.6.21 cluster.
Then I applied random-example.yaml.
The created Experiment stays in the Running condition, the Trial pods are not updated with sidecar containers, and the logs of `deployment/katib-controller` contain the following lines:

2021/04/07 14:57:53 http: TLS handshake error from 10.254.2.1:47974: remote error: tls: bad certificate
2021/04/07 14:57:53 http: TLS handshake error from 10.254.2.1:47972: remote error: tls: bad certificate

What did you expect to happen:
The webhook certificates are valid, Trial pods are injected with metric-gathering sidecars, and the Experiment gathers metrics and progresses as it should.

Anything else you would like to add:
When job/katib-cert-generator runs, it sets .webhooks[].clientConfig.caBundle in the WebhookConfigurations to the ca.crt from the katib-cert-generator-token secret, which belongs to the katib-cert-generator ServiceAccount.
According to the Kubernetes documentation on CSRs, a ServiceAccount's ca.crt is not guaranteed to verify arbitrary certificates:

None of these usages are related to ServiceAccount token secrets .data[ca.crt] in any way. That CA bundle is only guaranteed to verify a connection to the API server using the default service (kubernetes.default.svc).

I fetched tls.crt from secret/katib-webhook-cert and ca.crt from secret/katib-cert-generator-token-***, which is attached to that ServiceAccount. Indeed, the certificate does not verify against that CA:

[maanur@maanur-notebook katib-webhook-cert]$ openssl verify -verbose -CAfile ca.crt katib.crt
O = system:nodes, CN = system:node:katib-controller.kubeflow.svc
error 20 at 0 depth lookup: unable to get local issuer certificate
error katib.crt: verification failed

Environment:

  • Katib version: 86884ca
  • Kubeflow version: (not used)
  • Kubernetes version: 1.19
  • OpenShift version: 4.6.21
