Creation of snapshot resource sporadically fails due to an unknown exception #443

@luissimas

Hello folks!

While developing the LINSTOR driver for Incus (lxc/incus#1621), we noticed that a request to create a snapshot sometimes hangs indefinitely. When this happens, LINSTOR reports the snapshot with a Failed status, and the controller generates an error report with the message "Creation of snapshot 'X' of resource 'Y' failed due to an unknown exception."

Environment information

$ uname -a
Linux server01 6.8.0-55-generic #57-Ubuntu SMP PREEMPT_DYNAMIC Wed Feb 12 23:42:21 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
$ lvm version
  LVM version:     2.03.16(2) (2022-05-18)
  Library version: 1.02.185 (2022-05-18)
  Driver version:  4.48.0
  Configuration:   ./configure --build=x86_64-linux-gnu --prefix=/usr --includedir=${prefix}/include --mandir=${prefix}/share/man --infodir=${prefix}/share/info --sysconfdir=/etc --localstatedir=/var --disable-option-checking --disable-silent-rules --libdir=${prefix}/lib/x86_64-linux-gnu --runstatedir=/run --disable-maintainer-mode --disable-dependency-tracking --prefix=/usr --exec-prefix=/usr --with-usrlibdir=/usr/lib/x86_64-linux-gnu --with-optimisation=-O2 --with-cache=internal --with-device-uid=0 --with-device-gid=6 --with-device-mode=0660 --with-default-pid-dir=/run --with-default-run-dir=/run/lvm --with-default-locking-dir=/run/lock/lvm --with-thin=internal --with-thin-check=/usr/sbin/thin_check --with-thin-dump=/usr/sbin/thin_dump --with-thin-repair=/usr/sbin/thin_repair --enable-applib --enable-blkid_wiping --enable-cmdlib --enable-dmeventd --enable-editline --enable-lvmlockd-dlm --enable-lvmlockd-sanlock --enable-lvmpolld --enable-notify-dbus --enable-pkgconfig --enable-udev_rules --enable-udev_sync --disable-readline
$ linstor controller version
linstor controller 1.30.4; GIT-hash: bef74a44609cb592c5efad2e707b50e696623c61
$ linstor node list
╭─────────────────────────────────────────────────────────────╮
┊ Node     ┊ NodeType  ┊ Addresses                   ┊ State  ┊
╞═════════════════════════════════════════════════════════════╡
┊ server01 ┊ SATELLITE ┊ 10.172.117.143:3366 (PLAIN) ┊ Online ┊
┊ server02 ┊ SATELLITE ┊ 10.172.117.58:3366 (PLAIN)  ┊ Online ┊
┊ server03 ┊ SATELLITE ┊ 10.172.117.93:3366 (PLAIN)  ┊ Online ┊
┊ server04 ┊ SATELLITE ┊ 10.172.117.241:3366 (PLAIN) ┊ Online ┊
╰─────────────────────────────────────────────────────────────╯
$ linstor storage-pool list
╭──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
┊ StoragePool          ┊ Node     ┊ Driver   ┊ PoolName                          ┊ FreeCapacity ┊ TotalCapacity ┊ CanSnapshots ┊ State ┊ SharedName                    ┊
╞══════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════╡
┊ DfltDisklessStorPool ┊ server01 ┊ DISKLESS ┊                                   ┊              ┊               ┊ False        ┊ Ok    ┊ server01;DfltDisklessStorPool ┊
┊ DfltDisklessStorPool ┊ server02 ┊ DISKLESS ┊                                   ┊              ┊               ┊ False        ┊ Ok    ┊ server02;DfltDisklessStorPool ┊
┊ DfltDisklessStorPool ┊ server03 ┊ DISKLESS ┊                                   ┊              ┊               ┊ False        ┊ Ok    ┊ server03;DfltDisklessStorPool ┊
┊ DfltDisklessStorPool ┊ server04 ┊ DISKLESS ┊                                   ┊              ┊               ┊ False        ┊ Ok    ┊ server04;DfltDisklessStorPool ┊
┊ nvme                 ┊ server01 ┊ LVM_THIN ┊ linstor_linstor-nvme/linstor-nvme ┊    49.89 GiB ┊     49.89 GiB ┊ True         ┊ Ok    ┊ server01;nvme                 ┊
┊ nvme                 ┊ server02 ┊ LVM_THIN ┊ linstor_linstor-nvme/linstor-nvme ┊    49.89 GiB ┊     49.89 GiB ┊ True         ┊ Ok    ┊ server02;nvme                 ┊
┊ nvme                 ┊ server03 ┊ LVM_THIN ┊ linstor_linstor-nvme/linstor-nvme ┊    49.89 GiB ┊     49.89 GiB ┊ True         ┊ Ok    ┊ server03;nvme                 ┊
┊ nvme                 ┊ server04 ┊ LVM_THIN ┊ linstor_linstor-nvme/linstor-nvme ┊    49.89 GiB ┊     49.89 GiB ┊ True         ┊ Ok    ┊ server04;nvme                 ┊
╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
$ linstor node info
╭───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
┊ Node     ┊ Diskless ┊ LVM ┊ LVMThin ┊ ZFS/Thin ┊ File/Thin ┊ SPDK ┊ EXOS ┊ Remote SPDK ┊ Storage Spaces ┊ Storage Spaces/Thin ┊
╞═══════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════╡
┊ server01 ┊ +        ┊ +   ┊ +       ┊ +        ┊ +         ┊ -    ┊ -    ┊ +           ┊ -              ┊ -                   ┊
┊ server02 ┊ +        ┊ +   ┊ +       ┊ +        ┊ +         ┊ -    ┊ -    ┊ +           ┊ -              ┊ -                   ┊
┊ server03 ┊ +        ┊ +   ┊ +       ┊ +        ┊ +         ┊ -    ┊ -    ┊ +           ┊ -              ┊ -                   ┊
┊ server04 ┊ +        ┊ +   ┊ +       ┊ +        ┊ +         ┊ -    ┊ -    ┊ +           ┊ -              ┊ -                   ┊
╰───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯

╭───────────────────────────────────────────────────────────────────────╮
┊ Node     ┊ DRBD ┊ LUKS ┊ NVMe ┊ Cache ┊ BCache ┊ WriteCache ┊ Storage ┊
╞═══════════════════════════════════════════════════════════════════════╡
┊ server01 ┊ +    ┊ -    ┊ -    ┊ +     ┊ -      ┊ +          ┊ +       ┊
┊ server02 ┊ +    ┊ -    ┊ -    ┊ +     ┊ -      ┊ +          ┊ +       ┊
┊ server03 ┊ +    ┊ -    ┊ -    ┊ +     ┊ -      ┊ +          ┊ +       ┊
┊ server04 ┊ +    ┊ -    ┊ -    ┊ +     ┊ -      ┊ +          ┊ +       ┊
╰───────────────────────────────────────────────────────────────────────╯

How to reproduce

Given an environment similar to the one described above (I was also able to reproduce the behavior on a single node), spawn a resource with linstor resource-group spawn:

$ linstor resource-group spawn DfltRscGrp test-resource 1GiB

Then run a loop to reproduce the behavior. Here we create and delete a snapshot repeatedly until one of the commands fails:

$ while linstor snapshot create test-resource snap && linstor snapshot delete test-resource snap; do :; done
...
Error: Socket timeout, no data received for more than 300s.
$ linstor snapshot list
╭───────────────────────────────────────────────────────────────────────────────────╮
┊ ResourceName  ┊ SnapshotName ┊ NodeNames          ┊ Volumes  ┊ CreatedOn ┊ State  ┊
╞═══════════════════════════════════════════════════════════════════════════════════╡
┊ test-resource ┊ snap         ┊ server01, server02 ┊ 0: 1 GiB ┊           ┊ Failed ┊
╰───────────────────────────────────────────────────────────────────────────────────╯
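
Because the failure is sporadic, it can help to know how many iterations it took and roughly when it happened, so the moment can be matched against the controller and satellite logs. The following is a minimal sketch of the same create/delete loop with an iteration counter and a UTC timestamp. The function name `repro_snapshot_loop`, the `LINSTOR_CMD` override variable, and the default cap of 1000 iterations are illustrative assumptions and not part of LINSTOR; only the `linstor snapshot create` and `linstor snapshot delete` invocations come from the report above.

```shell
# Sketch only: repro_snapshot_loop and LINSTOR_CMD are illustrative names,
# not part of LINSTOR. The snapshot create/delete calls mirror the loop above.
repro_snapshot_loop() (
    # The ( ... ) body runs in a subshell, so the variables below stay local
    # to this call and `exit` ends only the function, not the calling shell.
    resource=$1
    snapshot=$2
    max_iterations=${3:-1000}             # assumed default cap
    linstor_cmd=${LINSTOR_CMD:-linstor}   # overridable, e.g. for a dry run
    i=1
    while [ "$i" -le "$max_iterations" ]; do
        for action in create delete; do
            if ! "$linstor_cmd" snapshot "$action" "$resource" "$snapshot"; then
                # UTC timestamp to help line the failure up with the service logs.
                echo "$(date -u '+%Y-%m-%dT%H:%M:%SZ') snapshot $action failed on iteration $i" >&2
                exit 1
            fi
        done
        i=$((i + 1))
    done
    echo "no failure after $max_iterations iterations"
)
```

With the function defined in the current shell, `repro_snapshot_loop test-resource snap 500` would run at most 500 create/delete cycles and stop at the first command that exits with a non-zero status, which is the same stop condition the one-line loop relies on.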

Logs

Here are the logs for the linstor-controller and linstor-satellite services collected when the error was reproduced, as well as the error report.

controller.log
report.log
satellite1.log
satellite2.log
