Skip to content

Conversation

howard-junec
Copy link
Contributor

@howard-junec howard-junec commented Jun 3, 2025

Issue #, if available:

Description of changes:

This PR implements certificate renewal functionality for Bottlerocket nodes in EKS Anywhere clusters. This is part 3 of the certificate renewal feature, focusing on Bottlerocket-specific implementation. Key features include:

  • Container-based certificate renewal strategy
  • Integration with Bottlerocket's sheltie for privileged operations
  • Structured command builders for reliable operations
  • Secure certificate transfer between host and containers
  • Flexible renewal options:
    • Full cluster renewal (etcd + control plane)
    • Selective renewal of etcd nodes only
    • Selective renewal of control plane nodes only
    • Component selection via --component flag
  • Auto-creates eksa-system namespace if missing for workload cluster certificate renewal

This PR completes the certificate renewal feature by adding Bottlerocket support, complementing the Linux implementation from part 2.

Testing (if applicable):

Documentation added/planned (if applicable):

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

@eks-distro-bot
Copy link
Collaborator

Hi @howard-junec. Thanks for your PR.

I'm waiting for a aws member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@eks-distro-bot eks-distro-bot added the size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. label Jun 3, 2025
@eks-distro-bot eks-distro-bot added size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. and removed size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Jun 9, 2025
@eks-distro-bot eks-distro-bot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels Jun 9, 2025
@eks-distro-bot eks-distro-bot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Jun 12, 2025
@eks-distro-bot eks-distro-bot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Jun 12, 2025
@eks-distro-bot eks-distro-bot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Jun 12, 2025
@eks-distro-bot eks-distro-bot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Jun 12, 2025
@eks-distro-bot eks-distro-bot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Jul 8, 2025

func buildBRControlPlaneCopyCertsFromTmpCmd() string {
script := `if [ -d "/run/host-containerd/io.containerd.runtime.v2.task/default/admin/rootfs/tmp/etcd-client-certs" ]; then
cp /run/host-containerd/io.containerd.runtime.v2.task/default/admin/rootfs/tmp/etcd-client-certs/apiserver-etcd-client.crt /var/lib/kubeadm/pki/server-etcd-client.crt || exit 1
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need || exit 1 for each command?

if hasExternalEtcd {
script = fmt.Sprintf(`mkdir -p '/etc/kubernetes/pki.bak_%[1]s'
cp -r %[2]s/* '/etc/kubernetes/pki.bak_%[1]s/'
rm -rf '/etc/kubernetes/pki.bak_%[1]s/etcd'`, backupDir, certDir)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why are we removing etcd directory from the backup folder?

--mount type=bind,src=/var/lib/kubeadm,dst=/var/lib/kubeadm,options=rbind:rw \
--mount type=bind,src=/var/lib/kubeadm,dst=/etc/kubernetes,options=rbind:rw \
--rm ${IMAGE_ID} tmp-cert-renew \
/opt/bin/kubeadm certs renew all`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would this work in case if we have external etcd?

return script
}

func buildBRControlPlaneCheckCertsCmd() string {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would this work in case if we have external etcd?

func buildBRControlPlaneBackupCertsCmd(_ string, hasExternalEtcd bool, backupDir, certDir string) string {
var script string
if hasExternalEtcd {
script = fmt.Sprintf(`mkdir -p '/etc/kubernetes/pki.bak_%[1]s'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess we can skip mkdir


func buildBRReadTmpCertCmd(tempDir string) string {
script := fmt.Sprintf(`sudo sheltie << 'EOF'
cat %s/apiserver-etcd-client.crt
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we combine getting data from key and crt file by passing entire file name with folder?

buildBRControlPlaneRenewCertsCmd(),
buildBRControlPlaneCheckCertsCmd(),
buildBRControlPlaneCopyCertsFromTmpCmd(),
buildBRControlPlaneRestartPodsCmd(),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we change the names to this to make it more readable?

sessionCmds := sheltie(
pullContainerImage(),
backupControlPlaneCerts(component, hasExternalEtcd, b.backup, brControlPlaneCertDir),
renewControlPlaneCerts(),
checkControlPlaneCerts(),
copyExternalEtcdCerts(),
restartControlPlaneStaticPods(),

Can you pl change these names for Ubuntu too in the other PR that you have?

}

logger.V(4).Info("Certificates copied successfully")
logger.V(4).Info("Backup directory", "path", b.backup)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we need these log lines from 199-201.

crtPath := filepath.Join(destDir, "apiserver-etcd-client.crt")
keyPath := filepath.Join(destDir, "apiserver-etcd-client.key")

logger.V(4).Info("Writing certificates to:")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's remove all these logs.


func (b *BottlerocketRenewer) CopyEtcdCerts(ctx context.Context, node string, ssh SSHRunner) error {
logger.V(4).Info("Reading certificate from ETCD node", "node", node)
logger.V(4).Info("Using backup directory", "path", b.backup)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we remove this log line?

KEY_END
[ $? -eq 0 ]

chmod 600 "${TARGET_DIR}/apiserver-etcd-client.crt"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need to give 600 permission?

cat <<'KEY_END' | base64 -d > "${TARGET_DIR}/apiserver-etcd-client.key"
%[3]s
KEY_END
[ $? -eq 0 ]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what is [ $? -eq 0 ] for?

func (b *BottlerocketRenewer) createTempDirectoryAndWriteCerts(dirName, certBase64, keyBase64 string) string {
script := fmt.Sprintf(`TARGET_DIR="/tmp/%[1]s"
mkdir -p "${TARGET_DIR}"
chmod 755 "${TARGET_DIR}"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why do we need to give 755 permission here?


func (b *BottlerocketRenewer) copyEtcdCertsToTemp(tempDir string) string {
script := fmt.Sprintf(`cp /var/lib/etcd/pki/apiserver-etcd-client.* %[1]s/
chmod 600 %[1]s/apiserver-etcd-client.crt
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess we only need to give chmod permission for the key file?

return nil
}

func (b *BottlerocketRenewer) transferCertsToControlPlane(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SInce we have this for both ubuntu and bottlerocket, can we define this method in OSRenewer interface?

@howard-junec howard-junec force-pushed the cert-renewal-bottlerocket branch from 2e25a80 to 5f1bcfb Compare July 11, 2025 22:24

shellCommands := b.sheltie(
b.pullContainerImage(),
b.backupControlPlaneCerts(component, hasExternalEtcd, b.backup, brControlPlaneCertDir),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should be able to get b.backup from backupControlPlaneCerts, can we remove it from the param?

return fmt.Errorf("certificate file is empty")
}

logger.V(4).Info("Reading key from ETCD node", "node", node)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's remove this log

if _, err := ssh.RunCommand(ctx, node, b.sheltie(
b.copyEtcdCertsToTemp(brTempDir),
)); err != nil {
return fmt.Errorf("copying certificates to tmp: %v", err)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we specify etcd in all the return error of CopyEtcdCerts method?

}

func (b *BottlerocketRenewer) backupEtcdCerts(backupDir string) string {
backupCerts := fmt.Sprintf(`cp -r /var/lib/etcd/pki /var/lib/etcd/pki.bak_%[1]s
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we use/create constant variable for /var/lib/etcd/pki and use it where possible?

@panktishah26
Copy link
Member

/lgtm
/approve

@eks-distro-bot
Copy link
Collaborator

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: panktishah26

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@eks-distro-bot eks-distro-bot merged commit e35e926 into aws:main Jul 14, 2025
6 checks passed
@eks-distro-pr-bot
Copy link
Contributor

@panktishah26: new pull request created: #9958

In response to this:

/cherry-pick release-0.23

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@panktishah26
Copy link
Member

/cherry-pick release-0.23

@eks-distro-pr-bot
Copy link
Contributor

@panktishah26: new pull request created: #9961

In response to this:

/cherry-pick release-0.23

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved lgtm needs-ok-to-test size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants