Skip to content

Ubuntu/Rhel Certificate Renewal Implementation (Part 2 of the certificate renewal command feature) #9781

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 9 commits into from
Jul 2, 2025

Conversation

howard-junec
Copy link
Contributor

@howard-junec howard-junec commented Jun 3, 2025

Issue #, if available:

Description of changes:

This PR implements certificate renewal functionality for Ubuntu and RHEL nodes in EKS Anywhere clusters. This is part 2 of the certificate renewal feature, focusing on Linux-based operating systems. Key features include:

  • Direct certificate management for Ubuntu and RHEL nodes using a unified Linux implementation
  • SSH-based secure certificate deployment and renewal system
  • Automatic certificate backup and validation
  • Support for control plane certificate renewal for both stacked and external etcd configurations on Ubuntu and RHEL nodes
  • Automatic kubeconfig discovery based on cluster name from input YAML file
  • Configurable logging verbosity with -v flag to control output detail level
    • Default mode shows only essential information
    • Verbose mode (-v=2) displays detailed renewal process information
  • Environment variable support:
    • Set EKSA_SSH_KEY_PASSPHRASE_ETCD and EKSA_SSH_KEY_PASSPHRASE_CP to avoid manual password input
    • Manual password input when environment variables are not set

Example Config File:

clusterName: my-cluster
os: ubuntu  # ubuntu, rhel
controlPlane:
  nodes:
    - 100.000.0.11
    - 100.000.0.12
  ssh:
    sshKey: /path/to/ssh/private-key
    sshUser: ec2-user
etcd:
  nodes:
    - 100.000.0.14
    - 100.000.0.15
  ssh:
    sshKey: /path/to/ssh/private-key
    sshUser: ec2-user

This PR builds on the configuration structure from part 1, implementing the renewal logic for traditional Linux distributions.

Testing (if applicable):

Documentation added/planned (if applicable):

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

@eks-distro-bot
Copy link
Collaborator

Hi @howard-junec. Thanks for your PR.

I'm waiting for a aws member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@eks-distro-bot eks-distro-bot added needs-ok-to-test size/L Denotes a PR that changes 100-499 lines, ignoring generated files. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Jun 3, 2025
@howard-junec howard-junec force-pushed the cert-renewal-ubuntu branch from fbe65d4 to 51d856f Compare June 6, 2025 22:14
@eks-distro-bot eks-distro-bot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. and removed size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels Jun 6, 2025
@howard-junec howard-junec force-pushed the cert-renewal-ubuntu branch from b2dad8b to 6eb3e86 Compare June 12, 2025 21:32
@eks-distro-bot eks-distro-bot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels Jun 18, 2025
@howard-junec howard-junec force-pushed the cert-renewal-ubuntu branch from 6fcf2db to c56b9b5 Compare June 18, 2025 20:31
@eks-distro-bot eks-distro-bot added size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. and removed size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Jun 18, 2025
@eks-distro-bot eks-distro-bot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels Jun 18, 2025
err = r.kube.Get(ctx, secretName, constants.EksaSystemNamespace, secret)
if err != nil {
if apierrors.IsNotFound(err) {
logger.V(2).Info("Secret not found—skipping creation", "name", secretName)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When will we run into this case?

Copy link
Contributor Author

@howard-junec howard-junec Jun 19, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is for skipping creating the secret if not found, otherwise, we will update the apiserver etcd client secret

return fmt.Errorf("backup certs: %v", err)
}
if err := ssh.RunCommand(ctx, node,
[]string{"sudo", "etcdadm", "join", "phase", "certificates", "http://eks-a-etcd-dumb-url"}); err != nil {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why are we running this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is for run certificates join phase to regenerate the deleted certificates

"cd %[1]s && cp -r pki pki.bak_%[2]s && rm -rf pki/* && cp pki.bak_%[2]s/ca.* pki/",
linuxEtcdCertDir, backup,
)
return []string{"sudo", "sh", "-c", script}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: can we just move the first 3 params here to a separate func and make it accept any script and just call it here and all other use cases?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes

}

func (r *DefaultSSHRunner) buildCommandString(cmds []string) string {
if len(cmds) >= 3 && cmds[0] == "sudo" && cmds[1] == "sh" && cmds[2] == "-c" {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need this only for this command?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

since commands are passed as string slices, but sudo sh -c needs proper quoting to preserve script integrity, and in later, bottlerocket's sudo sheltie commands need special handling to maintain heredoc syntax. This function will formats both cases correctly.


kubeClient := deps.UnAuthKubeClient.KubeconfigClient(kubeCfgPath)

osKey := cfg.OS
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we just call it os?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes

// RunCommand runs a command on the remote host
RunCommand(ctx context.Context, node string, cmds []string) error
// RunCommandWithOutput runs a command on the remote host and returns the output
RunCommandWithOutput(ctx context.Context, node string, cmds []string) (string, error)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we need both RunCommandWithOutput and RunCommand? I think we only need RunCommandWithOutput, just don't assign the output if we don't need it
_, err := ssh.RunCommandWithOutput

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried replacing RunCommand with RunCommandWithOutput as suggested, but found that it loses detailed command output when using high verbosity levels like -v=9. RunCommand has special logging behavior that streams command output directly to the console at high verbosity levels, which is critical for debugging. Both methods serve different purposes, I think we can keep both.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How are we deciding when to run RunCommand or RunCommandWithOutput?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for nor we only use RunCommandWithOutput when we need to programmatically capture and process the command's output, like captures details steps in "copyEtcdCerts", and all other steps we just use RunCommand.

@howard-junec howard-junec force-pushed the cert-renewal-ubuntu branch from 2851169 to 160214e Compare June 25, 2025 21:26
…tialization, and handle environment variables more robustly
@howard-junec howard-junec force-pushed the cert-renewal-ubuntu branch from fe2d729 to c59d637 Compare June 26, 2025 21:16
@eks-distro-bot eks-distro-bot added size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. and removed size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Jun 30, 2025
@howard-junec howard-junec force-pushed the cert-renewal-ubuntu branch from a454c69 to 9e664e6 Compare June 30, 2025 18:12
@eks-distro-bot eks-distro-bot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. and removed size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Jun 30, 2025
@howard-junec howard-junec force-pushed the cert-renewal-ubuntu branch from c1ae8ea to 9e664e6 Compare June 30, 2025 18:17
@eks-distro-bot eks-distro-bot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels Jun 30, 2025
@panktishah26
Copy link
Member

Thanks for all the work!
/lgtm

@eks-distro-bot
Copy link
Collaborator

[APPROVALNOTIFIER] This PR is APPROVED

Approval requirements bypassed by manually added approval.

This pull-request has been approved by:

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@eks-distro-bot eks-distro-bot merged commit 7942b13 into aws:main Jul 2, 2025
6 checks passed
@eks-distro-pr-bot
Copy link
Contributor

@panktishah26: new pull request created: #9957

In response to this:

/cherry-pick release-0.23

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@panktishah26
Copy link
Member

/cherry-pick release-0.23

@eks-distro-pr-bot
Copy link
Contributor

@panktishah26: new pull request created: #9959

In response to this:

/cherry-pick release-0.23

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved lgtm needs-ok-to-test size/XL Denotes a PR that changes 500-999 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants