Shrinking and extending does not work with first node #210

@pfaelzerchen

Description

Summary

I've set up a three-node HA cluster with etcd, following the quickstart guide, with the hosts tick, trick and track. I then wanted to test how to take single nodes out of the cluster (e.g. to install a new Ubuntu LTS release) and bring them back in. This works fine with trick and track, but not with tick.

I'm relatively new to Ansible and k3s, so apologies if I've missed something obvious.

Issue Type

  • Bug Report

Controller Environment and Configuration

I'm using v3.4.2 from Ansible Galaxy. The dump below was taken during the shrinking step.

# Begin ANSIBLE VERSION
ansible [core 2.14.2]
  config file = None
  configured module search path = ['/home/matthias/.ansible/plugins/modules', '/usr/share/ansible/plugins/modules']
  ansible python module location = /usr/lib/python3/dist-packages/ansible
  ansible collection location = /home/matthias/.ansible/collections:/usr/share/ansible/collections
  executable location = /usr/bin/ansible
  python version = 3.11.2 (main, May 30 2023, 17:45:26) [GCC 12.2.0] (/usr/bin/python3)
  jinja version = 3.1.2
  libyaml = True
# End ANSIBLE VERSION

# Begin ANSIBLE CONFIG
CONFIG_FILE() = None
# End ANSIBLE CONFIG

# Begin ANSIBLE ROLES
# /home/matthias/.ansible/roles
- hifis.unattended_upgrades, v3.1.0
- xanmanning.k3s, v3.4.2
# End ANSIBLE ROLES

# Begin PLAY HOSTS
["tick", "trick", "track"]
# End PLAY HOSTS

# Begin K3S ROLE CONFIG
## tick
k3s_control_node: true
k3s_server: {"disable": ["traefik"]}
k3s_state: "uninstalled"
k3s_check_openrc_run: {"changed": false, "skipped": true, "skip_reason": "Conditional result was False"}
k3s_check_cgroup_option: {"changed": false, "stdout": "cpuset\t0\t129\t1", "stderr": "", "rc": 0, "cmd": ["grep", "-E", "^cpuset\\s+.*\\s+1$", "/proc/cgroups"], "start": "2023-07-16 12:24:52.605739", "end": "2023-07-16 12:24:52.607773", "delta": "0:00:00.002034", "msg": "", "stdout_lines": ["cpuset\t0\t129\t1"], "stderr_lines": [], "failed": false, "failed_when_result": false}

## trick
k3s_control_node: true
k3s_server: {"disable": ["traefik"]}
k3s_check_openrc_run: {"changed": false, "skipped": true, "skip_reason": "Conditional result was False"}
k3s_check_cgroup_option: {"changed": false, "stdout": "cpuset\t0\t133\t1", "stderr": "", "rc": 0, "cmd": ["grep", "-E", "^cpuset\\s+.*\\s+1$", "/proc/cgroups"], "start": "2023-07-16 12:24:52.741053", "end": "2023-07-16 12:24:52.744222", "delta": "0:00:00.003169", "msg": "", "stdout_lines": ["cpuset\t0\t133\t1"], "stderr_lines": [], "failed": false, "failed_when_result": false}

## track
k3s_control_node: true
k3s_server: {"disable": ["traefik"]}
k3s_check_openrc_run: {"changed": false, "skipped": true, "skip_reason": "Conditional result was False"}
k3s_check_cgroup_option: {"changed": false, "stdout": "cpuset\t0\t129\t1", "stderr": "", "rc": 0, "cmd": ["grep", "-E", "^cpuset\\s+.*\\s+1$", "/proc/cgroups"], "start": "2023-07-16 12:24:52.737496", "end": "2023-07-16 12:24:52.740649", "delta": "0:00:00.003153", "msg": "", "stdout_lines": ["cpuset\t0\t129\t1"], "stderr_lines": [], "failed": false, "failed_when_result": false}

# End K3S ROLE CONFIG

# Begin K3S RUNTIME CONFIG
## tick
## trick
## track
# End K3S RUNTIME CONFIG

Steps to Reproduce

  1. Set up a three-node cluster as described in the quickstart documentation.
  2. Follow the shrinking documentation for track => the cluster stays alive with 2 nodes.
  3. Follow the extending documentation for track => the cluster is alive again with 3 nodes.
  4. Follow the shrinking documentation for tick => there may be errors while running the playbook, but the cluster stays alive with 2 nodes.
  5. Follow the extending documentation for tick => the playbook run fails with errors and may hang and need to be rerun. The errors don't seem to be exactly reproducible. (See the sketch after this list for the host_vars change used in the shrink/extend steps.)
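For reference, the shrink step amounts to setting k3s_state: uninstalled for only the node being removed (this matches the k3s_state visible for tick in the role config dump above) and then running the playbook against the whole group. A minimal sketch, assuming the override lives in host_vars (the file path is illustrative):

---
# host_vars/tick.yml (illustrative path)
# Shrink step: mark only this node for removal, then run the
# playbook against the whole kubernetes group.
k3s_state: uninstalled

# Extend step: remove this override again (the role default is
# "installed", if I read the defaults correctly) and rerun the
# playbook so the node is reinstalled and rejoins the cluster.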

Playbook:

---
- name: Install k3s cluster
  hosts: kubernetes
  remote_user: matthias
  become: true
  vars:
    k3s_release_version: v1.27.3+k3s1
    k3s_become: true
    k3s_etcd_datastore: true
    k3s_use_experimental: false  # Note this is required for k3s < v1.19.5+k3s1
    k3s_use_unsupported_config: false
    k3s_install_hard_links: true
    k3s_build_cluster: true

  roles:
    - role: xanmanning.k3s

Inventory:

---
all:
  children:
    kubernetes:
      hosts:
        tick:
          hostname: tick
        trick:
          hostname: trick
        track:
          hostname: track
      vars:
        k3s_control_node: true
        k3s_server:
          disable:
            - traefik

Expected Result

The cluster is up and running with three nodes, using the existing certificates.

Actual Result

tick comes up running a one-node cluster; trick and track are unable to start k3s. The k3s systemd unit fails on those hosts.

I had also copied the kubectl configuration to my local machine. Locally I can no longer connect with kubectl because the certificates no longer match. So it seems tick got a completely new installation with new certificates, i.e. it initialised a fresh cluster instead of rejoining the existing one. After steps 1 and 2 the cluster was still reachable with the existing certificates.
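For anyone trying to reproduce this: the failing unit on trick and track can be inspected with a small ad-hoc play along these lines (a sketch; the task names and line count are arbitrary):

---
- name: Inspect the failing k3s unit on the rejoined nodes
  hosts: trick,track
  become: true
  tasks:
    - name: Grab the tail of the k3s unit log
      ansible.builtin.command: journalctl -u k3s --no-pager -n 50
      register: k3s_log
      changed_when: false

    - name: Print the log tail
      ansible.builtin.debug:
        var: k3s_log.stdout_lines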

Metadata

Labels: wontfix (This will not be worked on)