Description
Hello folks!
While developing the LINSTOR driver for Incus (lxc/incus#1621), we noticed that the request to create a snapshot sometimes hangs indefinitely. When this happens, LINSTOR reports the snapshot with a Failed status, and the controller generates an error report with the message "Creation of snapshot 'X' of resource 'Y' failed due to an unknown exception".
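When a snapshot request hangs, the stuck snapshot can be picked out of the human-readable snapshot list without scanning the whole table. The helper below is a minimal sketch and is not part of LINSTOR. The name list_failed_snapshots is hypothetical, and the helper assumes the box-drawn layout of linstor snapshot list shown in this report: cells separated by ┊, ResourceName and SnapshotName first, and State last.

```shell
# Hypothetical helper: print "RESOURCE SNAPSHOT" for every row of the
# human-readable `linstor snapshot list` table whose State column is "Failed".
# Assumes cells are separated by "┊" and State is the last column.
list_failed_snapshots() {
  linstor snapshot list | awk -F'┊' '
    NF > 2 {
      rsc = $2; snap = $3; state = $(NF - 1)
      # Trim the padding LINSTOR puts around each cell.
      gsub(/^ +| +$/, "", rsc)
      gsub(/^ +| +$/, "", snap)
      gsub(/^ +| +$/, "", state)
      if (state == "Failed") print rsc, snap
    }'
}
```

For the state captured under How to reproduce, this should print test-resource snap. You can then locate the matching controller report with linstor error-reports list and print it with linstor error-reports show <report-id>.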
Environment information
$ uname -a
Linux server01 6.8.0-55-generic #57-Ubuntu SMP PREEMPT_DYNAMIC Wed Feb 12 23:42:21 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
$ lvm version
LVM version: 2.03.16(2) (2022-05-18)
Library version: 1.02.185 (2022-05-18)
Driver version: 4.48.0
Configuration: ./configure --build=x86_64-linux-gnu --prefix=/usr --includedir=${prefix}/include --mandir=${prefix}/share/man --infodir=${prefix}/share/info --sysconfdir=/etc --localstatedir=/var --disable-option-checking --disable-silent-rules --libdir=${prefix}/lib/x86_64-linux-gnu --runstatedir=/run --disable-maintainer-mode --disable-dependency-tracking --prefix=/usr --exec-prefix=/usr --with-usrlibdir=/usr/lib/x86_64-linux-gnu --with-optimisation=-O2 --with-cache=internal --with-device-uid=0 --with-device-gid=6 --with-device-mode=0660 --with-default-pid-dir=/run --with-default-run-dir=/run/lvm --with-default-locking-dir=/run/lock/lvm --with-thin=internal --with-thin-check=/usr/sbin/thin_check --with-thin-dump=/usr/sbin/thin_dump --with-thin-repair=/usr/sbin/thin_repair --enable-applib --enable-blkid_wiping --enable-cmdlib --enable-dmeventd --enable-editline --enable-lvmlockd-dlm --enable-lvmlockd-sanlock --enable-lvmpolld --enable-notify-dbus --enable-pkgconfig --enable-udev_rules --enable-udev_sync --disable-readline
$ linstor controller version
linstor controller 1.30.4; GIT-hash: bef74a44609cb592c5efad2e707b50e696623c61
$ linstor node list
╭─────────────────────────────────────────────────────────────╮
┊ Node     ┊ NodeType  ┊ Addresses                   ┊ State  ┊
╞═════════════════════════════════════════════════════════════╡
┊ server01 ┊ SATELLITE ┊ 10.172.117.143:3366 (PLAIN) ┊ Online ┊
┊ server02 ┊ SATELLITE ┊ 10.172.117.58:3366 (PLAIN)  ┊ Online ┊
┊ server03 ┊ SATELLITE ┊ 10.172.117.93:3366 (PLAIN)  ┊ Online ┊
┊ server04 ┊ SATELLITE ┊ 10.172.117.241:3366 (PLAIN) ┊ Online ┊
╰─────────────────────────────────────────────────────────────╯
$ linstor storage-pool list
╭──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
┊ StoragePool          ┊ Node     ┊ Driver   ┊ PoolName                          ┊ FreeCapacity ┊ TotalCapacity ┊ CanSnapshots ┊ State ┊ SharedName                    ┊
╞══════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════╡
┊ DfltDisklessStorPool ┊ server01 ┊ DISKLESS ┊                                   ┊              ┊               ┊ False        ┊ Ok    ┊ server01;DfltDisklessStorPool ┊
┊ DfltDisklessStorPool ┊ server02 ┊ DISKLESS ┊                                   ┊              ┊               ┊ False        ┊ Ok    ┊ server02;DfltDisklessStorPool ┊
┊ DfltDisklessStorPool ┊ server03 ┊ DISKLESS ┊                                   ┊              ┊               ┊ False        ┊ Ok    ┊ server03;DfltDisklessStorPool ┊
┊ DfltDisklessStorPool ┊ server04 ┊ DISKLESS ┊                                   ┊              ┊               ┊ False        ┊ Ok    ┊ server04;DfltDisklessStorPool ┊
┊ nvme                 ┊ server01 ┊ LVM_THIN ┊ linstor_linstor-nvme/linstor-nvme ┊    49.89 GiB ┊     49.89 GiB ┊ True         ┊ Ok    ┊ server01;nvme                 ┊
┊ nvme                 ┊ server02 ┊ LVM_THIN ┊ linstor_linstor-nvme/linstor-nvme ┊    49.89 GiB ┊     49.89 GiB ┊ True         ┊ Ok    ┊ server02;nvme                 ┊
┊ nvme                 ┊ server03 ┊ LVM_THIN ┊ linstor_linstor-nvme/linstor-nvme ┊    49.89 GiB ┊     49.89 GiB ┊ True         ┊ Ok    ┊ server03;nvme                 ┊
┊ nvme                 ┊ server04 ┊ LVM_THIN ┊ linstor_linstor-nvme/linstor-nvme ┊    49.89 GiB ┊     49.89 GiB ┊ True         ┊ Ok    ┊ server04;nvme                 ┊
╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
$ linstor node info
╭───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
┊ Node     ┊ Diskless ┊ LVM ┊ LVMThin ┊ ZFS/Thin ┊ File/Thin ┊ SPDK ┊ EXOS ┊ Remote SPDK ┊ Storage Spaces ┊ Storage Spaces/Thin ┊
╞═══════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════╡
┊ server01 ┊ +        ┊ +   ┊ +       ┊ +        ┊ +         ┊ -    ┊ -    ┊ +           ┊ -              ┊ -                   ┊
┊ server02 ┊ +        ┊ +   ┊ +       ┊ +        ┊ +         ┊ -    ┊ -    ┊ +           ┊ -              ┊ -                   ┊
┊ server03 ┊ +        ┊ +   ┊ +       ┊ +        ┊ +         ┊ -    ┊ -    ┊ +           ┊ -              ┊ -                   ┊
┊ server04 ┊ +        ┊ +   ┊ +       ┊ +        ┊ +         ┊ -    ┊ -    ┊ +           ┊ -              ┊ -                   ┊
╰───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
╭───────────────────────────────────────────────────────────────────────╮
┊ Node     ┊ DRBD ┊ LUKS ┊ NVMe ┊ Cache ┊ BCache ┊ WriteCache ┊ Storage ┊
╞═══════════════════════════════════════════════════════════════════════╡
┊ server01 ┊ +    ┊ -    ┊ -    ┊ +     ┊ -      ┊ +          ┊ +       ┊
┊ server02 ┊ +    ┊ -    ┊ -    ┊ +     ┊ -      ┊ +          ┊ +       ┊
┊ server03 ┊ +    ┊ -    ┊ -    ┊ +     ┊ -      ┊ +          ┊ +       ┊
┊ server04 ┊ +    ┊ -    ┊ -    ┊ +     ┊ -      ┊ +          ┊ +       ┊
╰───────────────────────────────────────────────────────────────────────╯
How to reproduce
Given an environment similar to the one described above (I was also able to reproduce the behavior on a single node), spawn a resource with linstor resource-group spawn:
$ linstor resource-group spawn DfltRscGrp test-resource 1GiB
Then run a loop that repeatedly creates and deletes a snapshot until one of the commands fails:
$ while linstor snapshot create test-resource snap && linstor snapshot delete test-resource snap; do :; done
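For longer runs, a more verbose version of this loop can make the failure easier to report. It counts iterations, names the command that failed, and dumps the snapshot list and the controller's error-report list to stderr at that moment. This is a sketch that assumes a shell supporting local (bash or dash). The name repro_snapshot_loop and its optional iteration limit are hypothetical additions. A hung request still only surfaces once the client gives up with its 300s socket timeout.

```shell
# Hypothetical helper: repeatedly create and delete a snapshot (like the
# one-liner above), counting iterations and dumping diagnostics on failure.
# Usage: repro_snapshot_loop RESOURCE [SNAPSHOT] [MAX_ITERATIONS]
# MAX_ITERATIONS=0 (the default) means "loop until a command fails".
repro_snapshot_loop() {
  local rsc="$1" snap="${2:-snap}" max="${3:-0}" i=0 step

  while [ "$max" -eq 0 ] || [ "$i" -lt "$max" ]; do
    i=$((i + 1))
    for step in create delete; do
      if ! linstor snapshot "$step" "$rsc" "$snap"; then
        echo "snapshot $step failed on iteration $i" >&2
        # Capture what LINSTOR reports right after the failure.
        linstor snapshot list >&2 || true
        linstor error-reports list >&2 || true
        return 1
      fi
    done
  done
  echo "completed $i iterations without a failure"
}
```

For example, repro_snapshot_loop test-resource snap loops until a command fails. repro_snapshot_loop test-resource snap 500 stops after 500 clean iterations.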
...
Error: Socket timeout, no data received for more than 300s.
$ linstor snapshot list
╭───────────────────────────────────────────────────────────────────────────────────╮
┊ ResourceName  ┊ SnapshotName ┊ NodeNames          ┊ Volumes  ┊ CreatedOn ┊ State  ┊
╞═══════════════════════════════════════════════════════════════════════════════════╡
┊ test-resource ┊ snap         ┊ server01, server02 ┊ 0: 1 GiB ┊           ┊ Failed ┊
╰───────────────────────────────────────────────────────────────────────────────────╯
Logs
Here are the logs for the linstor-controller and linstor-satellite services collected when the error was reproduced, as well as the error report.
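On systemd-based hosts, one way to gather these artifacts is sketched below. It assumes the packaged systemd unit names linstor-controller and linstor-satellite. The helper name collect_linstor_logs is hypothetical. Run it on every node, since the controller unit only exists where the controller runs.

```shell
# Hypothetical helper: save controller/satellite journal entries since a given
# time, plus the controller's error-report list, into one directory per node.
# Usage: collect_linstor_logs SINCE [OUTDIR]
# SINCE is passed verbatim to journalctl --since (e.g. "2025-03-01 10:00").
collect_linstor_logs() {
  local since="$1" outdir="${2:-linstor-logs-$(uname -n)}" unit

  mkdir -p "$outdir" || return 1
  for unit in linstor-controller linstor-satellite; do
    # A unit that does not run on this node simply yields no entries.
    journalctl -u "$unit" --since "$since" --no-pager >"$outdir/$unit.log" 2>&1 || true
  done
  # Only meaningful where the linstor client can reach the controller.
  linstor error-reports list >"$outdir/error-reports.txt" 2>&1 || true
}
```

Passing the time the reproduction loop was started as SINCE keeps the logs limited to the failing run.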