Description
Summary
I've set up a three-node HA cluster with etcd, following the quickstart guide, with the hosts tick, trick and track. I then wanted to test how to take single nodes out of the cluster (e.g. to install a new Ubuntu LTS release) and bring them back in afterwards (sketched below). This works fine with trick and track, but not with tick.
I'm relatively new to Ansible and k3s, so sorry if I missed something obvious.
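For reference, this is roughly how I take a node out, following the role's shrinking documentation: I set k3s_state: uninstalled on the host that should leave the cluster and re-run the playbook. A minimal sketch of the inventory change (host names from my setup, not the exact diff I applied):
---
all:
  children:
    kubernetes:
      hosts:
        tick:
          hostname: tick
          k3s_state: uninstalled   # take tick out of the cluster, then re-run the playbook
        trick:
          hostname: trick
        track:
          hostname: track
This matches the role config dump below, where tick shows k3s_state: "uninstalled".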
Issue Type
- Bug Report
Controller Environment and Configuration
I'm using v3.4.2 from Ansible Galaxy. Below is the configuration dump taken after the shrinking step.
# Begin ANSIBLE VERSION
ansible [core 2.14.2]
config file = None
configured module search path = ['/home/matthias/.ansible/plugins/modules', '/usr/share/ansible/plugins/modules']
ansible python module location = /usr/lib/python3/dist-packages/ansible
ansible collection location = /home/matthias/.ansible/collections:/usr/share/ansible/collections
executable location = /usr/bin/ansible
python version = 3.11.2 (main, May 30 2023, 17:45:26) [GCC 12.2.0] (/usr/bin/python3)
jinja version = 3.1.2
libyaml = True
# End ANSIBLE VERSION
# Begin ANSIBLE CONFIG
CONFIG_FILE() = None
# End ANSIBLE CONFIG
# Begin ANSIBLE ROLES
# /home/matthias/.ansible/roles
- hifis.unattended_upgrades, v3.1.0
- xanmanning.k3s, v3.4.2
# End ANSIBLE ROLES
# Begin PLAY HOSTS
["tick", "trick", "track"]
# End PLAY HOSTS
# Begin K3S ROLE CONFIG
## tick
k3s_control_node: true
k3s_server: {"disable": ["traefik"]}
k3s_state: "uninstalled"
k3s_check_openrc_run: {"changed": false, "skipped": true, "skip_reason": "Conditional result was False"}
k3s_check_cgroup_option: {"changed": false, "stdout": "cpuset\t0\t129\t1", "stderr": "", "rc": 0, "cmd": ["grep", "-E", "^cpuset\\s+.*\\s+1$", "/proc/cgroups"], "start": "2023-07-16 12:24:52.605739", "end": "2023-07-16 12:24:52.607773", "delta": "0:00:00.002034", "msg": "", "stdout_lines": ["cpuset\t0\t129\t1"], "stderr_lines": [], "failed": false, "failed_when_result": false}
## trick
k3s_control_node: true
k3s_server: {"disable": ["traefik"]}
k3s_check_openrc_run: {"changed": false, "skipped": true, "skip_reason": "Conditional result was False"}
k3s_check_cgroup_option: {"changed": false, "stdout": "cpuset\t0\t133\t1", "stderr": "", "rc": 0, "cmd": ["grep", "-E", "^cpuset\\s+.*\\s+1$", "/proc/cgroups"], "start": "2023-07-16 12:24:52.741053", "end": "2023-07-16 12:24:52.744222", "delta": "0:00:00.003169", "msg": "", "stdout_lines": ["cpuset\t0\t133\t1"], "stderr_lines": [], "failed": false, "failed_when_result": false}
## track
k3s_control_node: true
k3s_server: {"disable": ["traefik"]}
k3s_check_openrc_run: {"changed": false, "skipped": true, "skip_reason": "Conditional result was False"}
k3s_check_cgroup_option: {"changed": false, "stdout": "cpuset\t0\t129\t1", "stderr": "", "rc": 0, "cmd": ["grep", "-E", "^cpuset\\s+.*\\s+1$", "/proc/cgroups"], "start": "2023-07-16 12:24:52.737496", "end": "2023-07-16 12:24:52.740649", "delta": "0:00:00.003153", "msg": "", "stdout_lines": ["cpuset\t0\t129\t1"], "stderr_lines": [], "failed": false, "failed_when_result": false}
# End K3S ROLE CONFIG
# Begin K3S RUNTIME CONFIG
## tick
## trick
## track
# End K3S RUNTIME CONFIG
Steps to Reproduce
- Set up a three-node cluster as described in the quickstart documentation
- Follow the shrinking documentation for track => the cluster stays alive with 2 nodes
- Follow the extending documentation for track => the cluster is alive again with 3 nodes
- Follow the shrinking documentation for tick => there may be errors running the playbook, but the cluster stays alive with 2 nodes
- Follow the extending documentation for tick (see the sketch after this list) => there are errors running the playbook, and the process may hang and needs to be re-run. The errors in the playbook run don't seem to be exactly reproducible.
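For the extending steps I essentially revert the override again. A minimal sketch, assuming installed is the role's default state (host names from my setup):
---
all:
  children:
    kubernetes:
      hosts:
        tick:
          hostname: tick
          k3s_state: installed   # or simply drop the k3s_state override again
        trick:
          hostname: trick
        track:
          hostname: track
Then I re-run the same playbook against all hosts.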
Playbook:
---
- name: Install k3s cluster
  hosts: kubernetes
  remote_user: matthias
  become: true
  vars:
    k3s_release_version: v1.27.3+k3s1
    k3s_become: true
    k3s_etcd_datastore: true
    k3s_use_experimental: false # Note this is required for k3s < v1.19.5+k3s1
    k3s_use_unsupported_config: false
    k3s_install_hard_links: true
    k3s_build_cluster: true
  roles:
    - role: xanmanning.k3s
Inventory:
---
all:
  children:
    kubernetes:
      hosts:
        tick:
          hostname: tick
        trick:
          hostname: trick
        track:
          hostname: track
      vars:
        k3s_control_node: true
        k3s_server:
          disable:
            - traefik
Expected Result
The cluster is up and running with three nodes and using the existing certificates.
Actual Result
tick is up and running as a one-node cluster; trick and track are unable to start k3s, the systemd unit fails on those hosts.
I had also copied the kubectl configuration to my local machine. Locally, I can no longer connect with kubectl because the certificates are wrong. So it seems tick got a completely new installation with new certificates. After steps 1 and 2 the cluster was still reachable with the existing certificates.