@theboringstuff theboringstuff commented May 20, 2025

Description

During upgrade, if the new kubelet version is installed on all nodes and the kubelet service is then immediately restarted on all nodes, the upgrade fails, because many pods in the cluster become non-functional due to the kubelet/Kubernetes version mismatch.
More generally, upgrading kubelet globally on all nodes at the beginning of the upgrade is risky: if the upgrade fails, the cluster is left in a fragile state with inconsistent kubelet versions.

Solution

As the least invasive fix, kubelet is now upgraded per node, right after the node is drained:

  • changed the install_all_thirdparties task:
    • the task now allows excluding a thirdparty from the upgrade
    • fixed a name typo
  • during upgrade, kubelet is excluded from the global thirdparties upgrade
  • the kubelet thirdparty is upgraded per node during the control-plane/worker upgrade, right after the node is drained
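
The ordering described above can be sketched roughly as follows. This is a minimal illustration, not Kubemarine's actual code: `install_thirdparties`, `upgrade_node`, `plan_upgrade`, the thirdparty names, and the step strings are all hypothetical stand-ins chosen for this example.

```python
from typing import Iterable, List

# Hypothetical thirdparty names, for illustration only.
ALL_THIRDPARTIES = ["kubeadm", "kubelet", "kubectl", "calicoctl"]


def install_thirdparties(nodes: Iterable[str], thirdparties: Iterable[str],
                         exclude: Iterable[str] = ()) -> List[str]:
    """Return install steps for every node, skipping excluded thirdparties."""
    excluded = set(exclude)
    selected = [tp for tp in thirdparties if tp not in excluded]
    return [f"install {tp} on {node}" for node in nodes for tp in selected]


def upgrade_node(node: str) -> List[str]:
    """Per-node sequence: drain first, then install kubelet, then upgrade."""
    steps = [f"drain {node}"]
    # kubelet is installed only once the node no longer serves workloads.
    steps += install_thirdparties([node], ["kubelet"])
    steps += [f"upgrade {node}", f"uncordon {node}"]
    return steps


def plan_upgrade(nodes: List[str]) -> List[str]:
    """Global phase without kubelet, followed by the per-node phase."""
    steps = install_thirdparties(nodes, ALL_THIRDPARTIES, exclude=["kubelet"])
    for node in nodes:
        steps += upgrade_node(node)
    return steps


if __name__ == "__main__":
    for step in plan_upgrade(["control-plane-1", "worker-1"]):
        print(step)
```

In this sketch, a failure during the global phase leaves every node on its old kubelet, and a failure during the per-node phase affects at most the node that is currently drained.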

Test Cases

TestCase 1

Steps:

  1. Install cluster with k8s v1.30.10
  2. Run the upgrade to v1.31.6 with only the following tasks (the first ones in the upgrade task sequence):
    cleanup_tmp_dir,verify_upgrade_versions,thirdparties,prepull_images
    
  3. Restart kubelet on all nodes, e.g.
    kubemarine do -g all --no_stream -- systemctl restart kubelet
    
    These steps simulate a sudden kubelet upgrade and a cluster failure after "prepull_images"
  4. Rerun the upgrade from v1.30.10 (make sure this version is still used in cluster.yaml) to v1.31.6 as usual, with all tasks included

Results:

| Before | After |
|--------|-------|
| Upgrade fails, since system containers are in a bad state after global kubelet upgrade/restart | Upgrade completes, since kubelet is not globally upgraded |
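
The difference between the two outcomes can be illustrated with a toy model (not Kubemarine code). It assumes that the risky state is a node whose running kubelet is already the new version although the node has not been drained, and it lists the nodes in that state after kubelet is restarted everywhere, as in step 3. `Node`, `simulate`, and the other names are hypothetical.

```python
from dataclasses import dataclass
from typing import List

OLD, NEW = "v1.30.10", "v1.31.6"


@dataclass
class Node:
    name: str
    installed_kubelet: str = OLD  # binary on disk
    running_kubelet: str = OLD    # version of the running service
    drained: bool = False


def global_thirdparties(nodes: List[Node], exclude_kubelet: bool) -> None:
    """Global thirdparties phase: installs the new kubelet unless excluded."""
    if not exclude_kubelet:
        for node in nodes:
            node.installed_kubelet = NEW


def restart_kubelet_everywhere(nodes: List[Node]) -> None:
    """Models a kubelet restart on all nodes: the service picks up the installed binary."""
    for node in nodes:
        node.running_kubelet = node.installed_kubelet


def nodes_at_risk(nodes: List[Node]) -> List[str]:
    """Nodes running the new kubelet without having been drained (assumed risky)."""
    return [n.name for n in nodes if n.running_kubelet == NEW and not n.drained]


def simulate(exclude_kubelet: bool) -> List[str]:
    nodes = [Node("control-plane-1"), Node("worker-1"), Node("worker-2")]
    global_thirdparties(nodes, exclude_kubelet)
    restart_kubelet_everywhere(nodes)
    return nodes_at_risk(nodes)


if __name__ == "__main__":
    print("before the fix:", simulate(exclude_kubelet=False))
    print("after the fix: ", simulate(exclude_kubelet=True))
```

Under these assumptions, the old behavior puts every node at risk at once, while excluding kubelet from the global phase makes the restart harmless because the old binary is still installed.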

Checklist

  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • There are no breaking changes, or a migration patch is provided
  • Integration CI passed
  • Unit tests. If yes, list the new/changed tests with a brief description
  • There are no merge conflicts

@theboringstuff theboringstuff marked this pull request as ready for review May 20, 2025 13:32
@DmitriiRabenok DmitriiRabenok merged commit 3b87612 into main Jul 8, 2025
35 checks passed
@github-actions github-actions bot locked and limited conversation to collaborators Jul 8, 2025

5 participants