Skip to content

Conversation

hakman
Copy link
Member

@hakman hakman commented Jun 22, 2025

@k8s-ci-robot
Copy link
Contributor

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@k8s-ci-robot k8s-ci-robot requested a review from justinsb June 22, 2025 06:37
@k8s-ci-robot k8s-ci-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jun 22, 2025
@k8s-ci-robot k8s-ci-robot requested a review from rifelpet June 22, 2025 06:37
@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Jun 22, 2025
@hakman
Copy link
Member Author

hakman commented Jun 22, 2025

/test pull-kops-e2e-k8s-aws-calico

@hakman
Copy link
Member Author

hakman commented Jun 22, 2025

pull-kops-e2e-k8s-aws-calico succeeded using the new etcd-manager-slim image built with ko.
Minor adjustments are needed, as seen in this POC. The only missing piece is etcd-manager-ctl.

- hostPath:
path: /etc/ssl
type: Directory
name: ca-certificates
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a little torn about this. On the one hand, it's great that we pick up system ca-certificates so if they update we don't have to do anything (although in practice these don't change fast, but I guess if someone wanted to install their own in-house ca-certificate this now work). On the other hand, using hostPath to pick up ca-certificates isn't the general pattern for other pods, and I don't know if we're going to have new problems.

Given it is etcd-manager, which is really a system component and in another universe could run as a systemd service, I say we run with it!

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@@ -558,7 +564,7 @@ func (b *EtcdManagerBuilder) buildPod(etcdCluster kops.EtcdClusterSpec, instance
}

{
container.Command = exec.WithTee("/etcd-manager", args, "/var/log/etcd.log")
container.Command = exec.WithTee("/ko-app/etcd-manager", args, "/var/log/etcd.log")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we should use alsologtostderr and logfile instead of the tee "hack"

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The goal would be to just override args, and rely on Entrypoint being set correctly

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That would be nice, if possible.

@hakman
Copy link
Member Author

hakman commented Jun 29, 2025

/test pull-kops-e2e-k8s-aws-calico
/test pull-kops-e2e-k8s-gce-cilium

@hakman
Copy link
Member Author

hakman commented Jun 29, 2025

/test pull-kops-e2e-k8s-aws-calico
/test pull-kops-e2e-k8s-gce-cilium

@k8s-ci-robot k8s-ci-robot added size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. and removed size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Jun 29, 2025
@hakman hakman changed the title etcd-manager-slim built with ko Update etcd-manager to v3.0.20250629 Jun 29, 2025
@hakman hakman marked this pull request as ready for review June 29, 2025 11:55
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jun 29, 2025
@hakman hakman requested a review from justinsb June 29, 2025 11:55
@hakman
Copy link
Member Author

hakman commented Jun 29, 2025

@rifelpet @justinsb This should be ready for review now.

@hakman
Copy link
Member Author

hakman commented Jun 29, 2025

/test pull-kops-e2e-k8s-aws-amazonvpc

@rifelpet
Copy link
Member

I'm surprised with the bazel-less builds there is a significant increase in image size. We also lose some reproducibility based on some timestamps changing:

registry.k8s.io/etcd-manager/etcd-manager-slim     v3.0.20250629     e44fd24eed1b   2 weeks ago     232MB
registry.k8s.io/etcd-manager/etcd-manager-slim     v3.0.20241012     44e2eaa3e9ae   55 years ago    152MB

@hakman
Copy link
Member Author

hakman commented Jun 29, 2025

I'm surprised with the bazel-less builds there is a significant increase in image size. We also lose some reproducibility based on some timestamps changing:

registry.k8s.io/etcd-manager/etcd-manager-slim     v3.0.20250629     e44fd24eed1b   2 weeks ago     232MB
registry.k8s.io/etcd-manager/etcd-manager-slim     v3.0.20241012     44e2eaa3e9ae   55 years ago    152MB

Bazel builds were based on distroless, now the base is debian12-slim.
Also, binaries are a little bigger now in v3.0.20250629 vs v3.0.20250628.

@hakman
Copy link
Member Author

hakman commented Jun 29, 2025

@rifelpet Any idea about the failure in amazonvpc CNI test?

I0629 13:19:57.131784 29505 middleware.go:44] AWS request: EC2 DescribeVolumes
W0629 13:19:57.135973 29505 boot.go:44] error during attempt to bootstrap (will sleep and retry): unable to attach master volumes: error querying for EC2 volumes: operation error EC2: DescribeVolumes, exceeded maximum number of attempts, 3, https response error StatusCode: 0, RequestID: , request send failed, Post "https://ec2.eu-west-1.amazonaws.com/": tls: failed to verify certificate: x509: certificate signed by unknown authority

@hakman
Copy link
Member Author

hakman commented Jun 29, 2025

Looks like RHEL/AL2023 use different paths for certs:
https://github.com/golang/go/blob/go1.24.4/src/crypto/x509/root_linux.go#L19-L23

// Possible directories with certificate files; all will be read.
var certDirectories = []string{
        "/etc/ssl/certs",     // SLES10/SLES11, https://golang.org/issue/12139
        "/etc/pki/tls/certs", // Fedora/RHEL
}

@hakman
Copy link
Member Author

hakman commented Jun 29, 2025

/test pull-kops-e2e-k8s-aws-amazonvpc

@justinsb
Copy link
Member

Looks like RHEL/AL2023 use different paths for certs:
https://github.com/golang/go/blob/go1.24.4/src/crypto/x509/root_linux.go#L19-L23

One thing we could do is use bind mounting to "fix" this, making sure that the path is always in either a standard place we define, or alternatively bind mounting to e.g. /etc/kubernetes/etcd-manager/ca-certs/ so that we can do per-pod configuration. That way we could have a static manifest, which I personally think is easier to follow. It's overkill for this use case, because etcd-manager comes in so early though, so I think we probably should not do this in this PR (it's also a separable idea, so I think we should separate it into another PR if we do choose to pursue it).

Do you think this PR is ready to merge? It is looking good...

@hakman
Copy link
Member Author

hakman commented Jun 29, 2025

@justinsb I think for sure it can be improved, but we should merge it and get some testgrid feedback.

@k8s-ci-robot k8s-ci-robot added lgtm "Looks good to me", indicates that a PR is ready to be merged. approved Indicates a PR has been approved by an approver from all required OWNERS files. labels Jun 29, 2025
@justinsb
Copy link
Member

Thanks @hakman

/approve
/lgtm

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: justinsb

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot merged commit 2dbaf8c into kubernetes:master Jun 29, 2025
26 of 27 checks passed
@k8s-ci-robot k8s-ci-robot added this to the v1.33 milestone Jun 29, 2025
@hakman hakman deleted the etcd-ko-build branch June 29, 2025 20:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants