Unable to start VM on aarch64 cluster with ceph storage #1708

@tregubovav-dev

Description

I'm unable to start any VM on my cluster after upgrading Incus from 6.9 to 6.10.

Cluster config:
  • 7x Raspberry Pi 4B
  • Host OS: Ubuntu Server 24.04.2 LTS
  • microceph cluster with RBD and CephFS services (snap latest/stable - 18.2.4+snapc9f2b08f92)
  • Incus 6.10 with ceph and cephfs storages (ceph-common package installed)

Starting any VM fails with the following error:
Error: Failed setting up device via monitor: Failed adding block device for disk device "root": Failed adding block device: error connecting: Operation not supported
No instance log is created: the command incus info <instance> --show-log displays an empty log.

Instance config:
architecture: aarch64
config:
  image.architecture: arm64
  image.description: Alpine edge arm64 (20250228_13:00)
  image.os: Alpine
  image.release: edge
  image.requirements.secureboot: "false"
  image.serial: "20250228_13:00"
  image.type: disk-kvm.img
  image.variant: default
  security.secureboot: "false"
  volatile.base_image: faf1e97bddf1d6c09256671b8aab3ac486f8ecbacbb20f2794cfbd1a0092de44
  volatile.cloud-init.instance-id: 620a588d-4859-4f2e-89c4-7cdc7d537fe8
  volatile.uuid: 43a84905-a9a5-41d4-9206-647cd642e255
  volatile.uuid.generation: 43a84905-a9a5-41d4-9206-647cd642e255
  volatile.vsock_id: "192594832"
devices: {}
ephemeral: false
profiles:
- default
stateful: false
description: ""

Storage config:
config:
  ceph.cluster_name: ceph
  ceph.osd.pg_num: "32"
  ceph.osd.pool_name: test_remote
  ceph.user.name: admin
  volatile.pool.pristine: "true"
description: ""
name: test_remote
driver: ceph
used_by:
  <redacted>
status: Created
locations:
  <redacted>

incusd log snippet:
DEBUG  [2025-02-28T22:10:26-08:00] Start started                                 instance=alp instanceType=virtual-machine project=test stateful=false
DEBUG  [2025-02-28T22:10:26-08:00] Instance operation lock created               action=start instance=alp project=test reusable=false
DEBUG  [2025-02-28T22:10:26-08:00] MountInstance started                         driver=ceph instance=alp pool=test_remote project=test
DEBUG  [2025-02-28T22:10:26-08:00] Matched trusted cert                          fingerprint=4da06b45f08856c028ee3dfac94944772f40053813bcd29d59a85ce8648fb185 subject="CN=root@picl-01,O=Linux Containers"
DEBUG  [2025-02-28T22:10:26-08:00] Handling API request                          fingerprint=4da06b45f08856c028ee3dfac94944772f40053813bcd29d59a85ce8648fb185 ip="picl-01:34246" method=GET protocol=cluster url="/1.0/operations/396201d4-6b5b-422c-bcd1-a893e57c5b6b?project=test"
DEBUG  [2025-02-28T22:10:26-08:00] Matched trusted cert                          fingerprint=4da06b45f08856c028ee3dfac94944772f40053813bcd29d59a85ce8648fb185 subject="CN=root@picl-01,O=Linux Containers"
DEBUG  [2025-02-28T22:10:26-08:00] WriteJSON
        {
                "type": "sync",
                "status": "Success",
                "status_code": 200,
                "operation": "",
                "error_code": 0,
                "error": "",
                "metadata": {
                        "id": "396201d4-6b5b-422c-bcd1-a893e57c5b6b",
                        "class": "task",
                        "description": "Starting instance",
                        "created_at": "2025-02-28T22:10:26.369873345-08:00",
                        "updated_at": "2025-02-28T22:10:26.369873345-08:00",
                        "status": "Running",
                        "status_code": 103,
                        "resources": {
                                "instances": [
                                        "/1.0/instances/alp?project=test"
                                ]
                        },
                        "metadata": null,
                        "may_cancel": false,
                        "err": "",
                        "location": "picl-03"
                }
        }  http_code=200
DEBUG  [2025-02-28T22:10:27-08:00] Activated RBD volume                          dev=/dev/rbd3 driver=ceph pool=test_remote volName=virtual-machine_test_alp.block
DEBUG  [2025-02-28T22:10:27-08:00] Activated RBD volume                          dev=/dev/rbd4 driver=ceph pool=test_remote volName=virtual-machine_test_alp
DEBUG  [2025-02-28T22:10:27-08:00] Mounted RBD volume                            dev=/dev/rbd4 driver=ceph options=discard path=/var/lib/incus/storage-pools/test_remote/virtual-machines/test_alp pool=test_remote volName=test_alp
DEBUG  [2025-02-28T22:10:27-08:00] MountInstance finished                        driver=ceph instance=alp pool=test_remote project=test
DEBUG  [2025-02-28T22:10:27-08:00] Starting device                               device=root instance=alp instanceType=virtual-machine project=test type=disk
DEBUG  [2025-02-28T22:10:27-08:00] Starting QEMU                                 command="[forklimits limit=memlock:unlimited:unlimited fd=3 fd=4 -- /opt/incus/bin/qemu-system-aarch64 -S -name alp -uuid 43a84905-a9a5-41d4-9206-647cd642e255 -daemonize -cpu host -nographic -serial chardev:console -nodefaults -no-user-config -sandbox on,obsolete=deny,elevateprivileges=allow,spawn=allow,resourcecontrol=deny -readconfig /run/incus/test_alp/qemu.conf -spice unix=on,disable-ticketing=on,addr=/run/incus/test_alp/qemu.spice -pidfile /run/incus/test_alp/qemu.pid -D /var/log/incus/test_alp/qemu.log -smbios type=2,manufacturer=LinuxContainers,product=Incus -runas incus]" instance=alp instanceType=virtual-machine project=test
DEBUG  [2025-02-28T22:10:27-08:00] UpdateInstanceBackupFile started              driver=ceph instance=alp pool=test_remote project=test
DEBUG  [2025-02-28T22:10:27-08:00] Skipping unmount as in use                    driver=ceph pool=test_remote refCount=1 volName=test_alp
DEBUG  [2025-02-28T22:10:27-08:00] UpdateInstanceBackupFile finished             driver=ceph instance=alp pool=test_remote project=test
DEBUG  [2025-02-28T22:10:29-08:00] QMP monitor started                           path=/run/incus/test_alp/qemu.monitor
DEBUG  [2025-02-28T22:10:29-08:00] Instance operation lock finished              action=start err="Failed adding block device for disk device \"root\": Failed adding block device: error connecting: Permission denied" instance=alp project=test reusable=false
DEBUG  [2025-02-28T22:10:29-08:00] Failed to unmount                             attempt=0 err="device or resource busy" path=/var/lib/incus/devices/test_alp/config.mount
DEBUG  [2025-02-28T22:10:29-08:00] QMP monitor stopped                           path=/run/incus/test_alp/qemu.monitor
DEBUG  [2025-02-28T22:10:29-08:00] Stopping device                               device=root instance=alp instanceType=virtual-machine project=test type=disk
DEBUG  [2025-02-28T22:10:29-08:00] UnmountInstance started                       driver=ceph instance=alp pool=test_remote project=test
DEBUG  [2025-02-28T22:10:29-08:00] Unmounted RBD volume                          driver=ceph keepBlockDev=false path=/var/lib/incus/storage-pools/test_remote/virtual-machines/test_alp pool=test_remote volName=test_alp
DEBUG  [2025-02-28T22:10:30-08:00] Deactivated RBD volume                        driver=ceph pool=test_remote volName=virtual-machine_test_alp
DEBUG  [2025-02-28T22:10:30-08:00] Matched trusted cert                          fingerprint=5ff252dc3942daa47bfa6756e0a74d3288574413b09b40a2d56b395dfc83db47 subject="CN=root@picl-02,O=Linux Containers"
DEBUG  [2025-02-28T22:10:30-08:00] Replace current raft nodes                    raftMembers="[{{14 192.168.82.7:8443 voter} picl-07} {{15 192.168.82.4:8443 voter} picl-04} {{8 192.168.82.1:8443 voter} picl-01} {{9 192.168.82.2:8443 voter} picl-02} {{10 192.168.82.3:8443 stand-by} picl-03} {{12 192.168.82.5:8443 voter} picl-05} {{13 192.168.82.6:8443 stand-by} picl-06}]"
DEBUG  [2025-02-28T22:10:31-08:00] Deactivated RBD volume                        driver=ceph pool=test_remote volName=virtual-machine_test_alp.block
DEBUG  [2025-02-28T22:10:31-08:00] UnmountInstance finished                      driver=ceph instance=alp pool=test_remote project=test
DEBUG  [2025-02-28T22:10:31-08:00] Start finished                                instance=alp instanceType=virtual-machine project=test stateful=false
DEBUG  [2025-02-28T22:10:31-08:00] Failure for operation                         class=task description="Starting instance" err="Failed setting up device via monitor: Failed adding block device for disk device \"root\": Failed adding block device: error connecting: Permission denied" operation=396201d4-6b5b-422c-bcd1-a893e57c5b6b project=test
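
For anyone triaging a long incusd debug log like the one above, the failure reasons can be pulled out by collecting the err="..." fields. The following is only a minimal sketch: the helper name extract_errors and the regular expression are illustrative assumptions, not part of Incus, and the sample lines are copied from the snippet above.

```python
import re

# Matches logfmt-style err="..." fields, allowing backslash-escaped
# quotes inside the value. (Hypothetical helper, not an Incus tool.)
ERR_FIELD = re.compile(r'\berr="((?:[^"\\]|\\.)*)"')


def extract_errors(log_text):
    """Return distinct, non-empty err= values in order of first appearance."""
    seen = []
    for match in ERR_FIELD.finditer(log_text):
        # Undo the \" escaping used inside quoted log values.
        value = match.group(1).replace('\\"', '"')
        if value and value not in seen:
            seen.append(value)
    return seen


# Two lines taken verbatim from the incusd log snippet in this report.
sample = r'''
DEBUG  [2025-02-28T22:10:29-08:00] Instance operation lock finished              action=start err="Failed adding block device for disk device \"root\": Failed adding block device: error connecting: Permission denied" instance=alp project=test reusable=false
DEBUG  [2025-02-28T22:10:29-08:00] Failed to unmount                             attempt=0 err="device or resource busy" path=/var/lib/incus/devices/test_alp/config.mount
'''

for err in extract_errors(sample):
    print(err)
```

The JSON block printed by WriteJSON is not matched, because its "err": "" key uses a colon rather than the err= form.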

syslog snippet:
2025-02-28T22:20:01.115763-08:00 picl-03 wpa_supplicant[643]: wlan0: Failed to initiate sched scan
2025-02-28T22:20:01.640351-08:00 picl-03 CRON[10619]: (certupdater) CMD (/var/lib/certupdater/incus/cert-update.sh)
2025-02-28T22:20:01.653279-08:00 picl-03 systemd[1]: Starting sysstat-collect.service - system activity accounting tool...
2025-02-28T22:20:01.682181-08:00 picl-03 systemd[1]: sysstat-collect.service: Deactivated successfully.
2025-02-28T22:20:01.686078-08:00 picl-03 systemd[1]: Finished sysstat-collect.service - system activity accounting tool.
2025-02-28T22:20:02.117473-08:00 picl-03 CRON[10618]: (CRON) info (No MTA installed, discarding output)
2025-02-28T22:20:02.860856-08:00 picl-03 kernel:  rbd3: p1 p2
2025-02-28T22:20:02.861852-08:00 picl-03 kernel: rbd: rbd3: capacity 10737418240 features 0x1
2025-02-28T22:20:03.011734-08:00 picl-03 (udev-worker)[10654]: rbd3: Process '/usr/bin/unshare -m /usr/bin/snap auto-import --mount=/dev/rbd3' failed with exit code 1.
2025-02-28T22:20:03.132859-08:00 picl-03 (udev-worker)[10654]: rbd3p1: Process '/usr/bin/unshare -m /usr/bin/snap auto-import --mount=/dev/rbd3p1' failed with exit code 1.
2025-02-28T22:20:03.154264-08:00 picl-03 (udev-worker)[10655]: rbd3p2: Process '/usr/bin/unshare -m /usr/bin/snap auto-import --mount=/dev/rbd3p2' failed with exit code 1.
2025-02-28T22:20:03.345869-08:00 picl-03 kernel: rbd: rbd4: capacity 524288000 features 0x1
2025-02-28T22:20:03.455099-08:00 picl-03 (udev-worker)[10654]: rbd4: Process '/usr/bin/unshare -m /usr/bin/snap auto-import --mount=/dev/rbd4' failed with exit code 1.
2025-02-28T22:20:03.525921-08:00 picl-03 kernel: EXT4-fs (rbd4): mounted filesystem a3e63722-7fa5-4ed0-8667-74404004510f r/w with ordered data mode. Quota mode: none.
2025-02-28T22:20:03.620101-08:00 picl-03 kernel: audit: type=1400 audit(1740810003.618:184): apparmor="STATUS" operation="profile_replace" info="same as current profile, skipping" profile="unconfined" name="incus-test_alp_</var/lib/incus>" pid=10737 comm="apparmor_parser"
2025-02-28T22:20:05.280842-08:00 picl-03 systemd[1]: var-lib-incus-devices-test_alp-config.mount.mount: Deactivated successfully.
2025-02-28T22:20:05.292967-08:00 picl-03 systemd[1]: var-lib-incus-storage\x2dpools-test_remote-virtual\x2dmachines-test_alp.mount: Deactivated successfully.
2025-02-28T22:20:05.382857-08:00 picl-03 kernel: EXT4-fs (rbd4): unmounting filesystem a3e63722-7fa5-4ed0-8667-74404004510f.

dmesg output:
[ 2364.008043]  rbd3: p1 p2
[ 2364.008546] rbd: rbd3: capacity 10737418240 features 0x1
[ 2364.494942] rbd: rbd4: capacity 524288000 features 0x1
[ 2364.674972] EXT4-fs (rbd4): mounted filesystem a3e63722-7fa5-4ed0-8667-74404004510f r/w with ordered data mode. Quota mode: none.
[ 2364.789207] audit: type=1400 audit(1740810259.210:185): apparmor="STATUS" operation="profile_replace" info="same as current profile, skipping" profile="unconfined" name="incus-test_alp_</var/lib/incus>" pid=11021 comm="apparmor_parser"
[ 2366.457320] EXT4-fs (rbd4): unmounting filesystem a3e63722-7fa5-4ed0-8667-74404004510f.
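
The capacities the kernel reports for the two RBD devices are in bytes. As a quick sanity check, the sketch below converts them to binary units; the function name human_size is an illustrative assumption. In the incusd log above, rbd3 is the virtual-machine_test_alp.block volume and rbd4 is the virtual-machine_test_alp volume, and the dmesg output appears consistent with that mapping (rbd3 carries partitions, rbd4 is mounted as ext4).

```python
def human_size(num_bytes):
    """Format a byte count using binary units (KiB, MiB, GiB, TiB)."""
    size = float(num_bytes)
    for unit in ("B", "KiB", "MiB", "GiB", "TiB"):
        # Stop at the first unit where the value fits, or at the largest unit.
        if size < 1024 or unit == "TiB":
            return f"{size:g} {unit}"
        size /= 1024


# Capacities copied from the "rbd: rbdN: capacity ..." kernel messages above.
capacities = {"rbd3": 10737418240, "rbd4": 524288000}

for dev, cap in capacities.items():
    print(dev, human_size(cap))
# rbd3 10 GiB
# rbd4 500 MiB
```

Both volumes are mapped with their expected sizes, which suggests the kernel RBD path works and that the failure happens later, when QEMU connects to the block device.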

P.S.
VMs start without issue on a single Incus 6.10 instance with dir storage.
