
Conversation

markjdb
Contributor

@markjdb markjdb commented Oct 13, 2024

Unconditionally hold the rangelock in zfs_getpages().

Motivation and Context

This is reportedly required for direct I/O support on FreeBSD.

Description

This change modifies zfs_getpages() to unconditionally acquire the rangelock. To avoid a deadlock, we may need to drop the page busy lock and allocate a new page later on.

How Has This Been Tested?

Smoke testing on FreeBSD.

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Performance enhancement (non-breaking change which improves efficiency)
  • Code cleanup (non-breaking change which makes code smaller or more readable)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Library ABI change (libzfs, libzfs_core, libnvpair, libuutil and libzfsbootenv)
  • Documentation (a change to man pages or other documentation)

Checklist:

@behlendorf behlendorf added the Status: Code Review Needed Ready for review and testing label Oct 14, 2024
@behlendorf
Contributor

I'm not familiar with the FreeBSD VM layer, but it looks like the mmap_mixed test case managed to trip an assertion.

From the "vm2: serial console" log, https://github.com/openzfs/zfs/actions/runs/11315921698/job/31476777580?pr=16643

  ZTS run /usr/local/share/zfs/zfs-tests/tests/functional/mmap/mmap_mixed
  panic: VERIFY(vm_page_none_valid(m)) failed
  
  cpuid = 1
  time = 1728878112
  KDB: stack backtrace:
  #0 0xffffffff80b9002d at kdb_backtrace+0x5d
  #1 0xffffffff80b43132 at vpanic+0x132
  #2 0xffffffff82841b6a at spl_panic+0x3a
  #3 0xffffffff8284845b at dmu_read_pages+0x7fb
  #4 0xffffffff82865527 at zfs_freebsd_getpages+0x2e7
  #5 0xffffffff80eddf81 at vnode_pager_getpages+0x41
  #6 0xffffffff80ed45c2 at vm_pager_get_pages+0x22
  #7 0xffffffff80eb18df at vm_fault+0x5ef
  #8 0xffffffff80eb11db at vm_fault_trap+0x6b
  #9 0xffffffff8100ca39 at trap_pfault+0x1d9
  #10 0xffffffff8100bfb2 at trap+0x442
  #11 0xffffffff80fe3828 at calltrap+0x8

@markjdb
Contributor Author

markjdb commented Oct 14, 2024

I'm not familiar with the FreeBSD VM layer, but it looks like the mmap_mixed test case managed to trip an assertion.

From the "vm2: serial console" log, https://github.com/openzfs/zfs/actions/runs/11315921698/job/31476777580?pr=16643

  ZTS run /usr/local/share/zfs/zfs-tests/tests/functional/mmap/mmap_mixed
  panic: VERIFY(vm_page_none_valid(m)) failed
  
  cpuid = 1
  time = 1728878112
  KDB: stack backtrace:
  #0 0xffffffff80b9002d at kdb_backtrace+0x5d
  #1 0xffffffff80b43132 at vpanic+0x132
  #2 0xffffffff82841b6a at spl_panic+0x3a
  #3 0xffffffff8284845b at dmu_read_pages+0x7fb
  #4 0xffffffff82865527 at zfs_freebsd_getpages+0x2e7
  #5 0xffffffff80eddf81 at vnode_pager_getpages+0x41
  #6 0xffffffff80ed45c2 at vm_pager_get_pages+0x22
  #7 0xffffffff80eb18df at vm_fault+0x5ef
  #8 0xffffffff80eb11db at vm_fault_trap+0x6b
  #9 0xffffffff8100ca39 at trap_pfault+0x1d9
  #10 0xffffffff8100bfb2 at trap+0x442
  #11 0xffffffff80fe3828 at calltrap+0x8

Thanks, I think I see what's going on there. I didn't see that panic when I ran the ZTS locally, but if I understand the problem correctly, it's due to a race.

@behlendorf behlendorf added the Status: Work in Progress Not yet ready for general review label Oct 20, 2024
mappedread_sf() may allocate pages; if it fails to populate a page and
can't free it, it needs to ensure that it's placed into a page queue;
otherwise it can't be reclaimed until the vnode is destroyed.

I think this is quite unlikely to happen in practice; it was noticed by
code inspection.

Signed-off-by: Mark Johnston <markj@FreeBSD.org>
@markjdb
Contributor Author

markjdb commented Nov 8, 2024

I'm not familiar with the FreeBSD VM layer, but it looks like the mmap_mixed test case managed to trip an assertion.
From the "vm2: serial console" log, https://github.com/openzfs/zfs/actions/runs/11315921698/job/31476777580?pr=16643

  ZTS run /usr/local/share/zfs/zfs-tests/tests/functional/mmap/mmap_mixed
  panic: VERIFY(vm_page_none_valid(m)) failed
  
  cpuid = 1
  time = 1728878112
  KDB: stack backtrace:
  #0 0xffffffff80b9002d at kdb_backtrace+0x5d
  #1 0xffffffff80b43132 at vpanic+0x132
  #2 0xffffffff82841b6a at spl_panic+0x3a
  #3 0xffffffff8284845b at dmu_read_pages+0x7fb
  #4 0xffffffff82865527 at zfs_freebsd_getpages+0x2e7
  #5 0xffffffff80eddf81 at vnode_pager_getpages+0x41
  #6 0xffffffff80ed45c2 at vm_pager_get_pages+0x22
  #7 0xffffffff80eb18df at vm_fault+0x5ef
  #8 0xffffffff80eb11db at vm_fault_trap+0x6b
  #9 0xffffffff8100ca39 at trap_pfault+0x1d9
  #10 0xffffffff8100bfb2 at trap+0x442
  #11 0xffffffff80fe3828 at calltrap+0x8

Thanks, I think I see what's going on there. I didn't see that panic when I ran the ZTS locally, but if I understand the problem correctly, it's due to a race.

I believe the latest version addresses this, and I didn't observe any panics while running the ZTS locally.

As a deadlock avoidance measure, zfs_getpages() would only try to
acquire a rangelock, falling back to a single-page read if this was not
possible.  However, this is incompatible with direct I/O.

Instead, release the busy lock before trying to acquire the rangelock in
blocking mode.  This means that it's possible for the page to be
replaced, so we have to re-lookup.

Signed-off-by: Mark Johnston <markj@FreeBSD.org>
Contributor

@bwatkinson bwatkinson left a comment


Seems reasonable to me. I am not an expert in FreeBSD, but acquiring the rangelock unconditionally in zfs_getpages() was the goal. It is also now passing the FreeBSD CI tests for the dio_mmap test case.

@bwatkinson bwatkinson requested a review from amotin November 13, 2024 00:50
@amotin amotin removed the Status: Work in Progress Not yet ready for general review label Nov 13, 2024
Contributor

@behlendorf behlendorf left a comment


@behlendorf behlendorf added Status: Accepted Ready to integrate (reviewed, tested) and removed Status: Code Review Needed Ready for review and testing labels Nov 13, 2024
behlendorf pushed a commit that referenced this pull request Nov 13, 2024
As a deadlock avoidance measure, zfs_getpages() would only try to
acquire a rangelock, falling back to a single-page read if this was not
possible.  However, this is incompatible with direct I/O.

Instead, release the busy lock before trying to acquire the rangelock in
blocking mode.  This means that it's possible for the page to be
replaced, so we have to re-lookup.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Signed-off-by: Mark Johnston <markj@FreeBSD.org>
Closes #16643
Comment on lines +4020 to +4025
count1 = j - i;
last_size = j == count ?
MIN(end, obj_size) - (end - PAGE_SIZE) : PAGE_SIZE;
error = dmu_read_pages(zfsvfs->z_os, zp->z_id, &ma[i], count1,
i == 0 ? &pgsin_b : NULL, j == count ? &pgsin_a : NULL,
last_size);
Member

@amotin amotin Nov 13, 2024


I wonder: once we read several (count1) pages at once, shouldn't we also increment i by the same number? I guess it may be fine via the vm_page_any_valid() continue above, and it seems to be cheap, but do we need those iterations at all?

But more seriously: dmu_read_pages() does not seem to expect NULL for its rbehind and rahead arguments; it requires pointers to zero values instead. @markjdb I think this may end in a NULL pointer dereference in whatever scenario this page-skipping algorithm handles.

Contributor


Looking over this again, I am seeing the same thing. I overlooked this originally.

Contributor Author


Somehow I convinced myself that a null pointer was OK, probably because zfs_getpages() itself permits null read-ahead/read-behind pointers.

I submitted a PR to fix this, thanks for taking another look: #16758

behlendorf pushed a commit to behlendorf/zfs that referenced this pull request Nov 15, 2024
mappedread_sf() may allocate pages; if it fails to populate a page and
can't free it, it needs to ensure that it's placed into a page queue;
otherwise it can't be reclaimed until the vnode is destroyed.

I think this is quite unlikely to happen in practice; it was noticed by
code inspection.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Signed-off-by: Mark Johnston <markj@FreeBSD.org>
Closes openzfs#16643
behlendorf pushed a commit to behlendorf/zfs that referenced this pull request Nov 15, 2024
As a deadlock avoidance measure, zfs_getpages() would only try to
acquire a rangelock, falling back to a single-page read if this was not
possible.  However, this is incompatible with direct I/O.

Instead, release the busy lock before trying to acquire the rangelock in
blocking mode.  This means that it's possible for the page to be
replaced, so we have to re-lookup.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Signed-off-by: Mark Johnston <markj@FreeBSD.org>
Closes openzfs#16643
ixhamza pushed a commit to truenas/zfs that referenced this pull request Dec 2, 2024
mappedread_sf() may allocate pages; if it fails to populate a page and
can't free it, it needs to ensure that it's placed into a page queue;
otherwise it can't be reclaimed until the vnode is destroyed.

I think this is quite unlikely to happen in practice; it was noticed by
code inspection.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Signed-off-by: Mark Johnston <markj@FreeBSD.org>
Closes openzfs#16643
ixhamza pushed a commit to truenas/zfs that referenced this pull request Dec 2, 2024
As a deadlock avoidance measure, zfs_getpages() would only try to
acquire a rangelock, falling back to a single-page read if this was not
possible.  However, this is incompatible with direct I/O.

Instead, release the busy lock before trying to acquire the rangelock in
blocking mode.  This means that it's possible for the page to be
replaced, so we have to re-lookup.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Signed-off-by: Mark Johnston <markj@FreeBSD.org>
Closes openzfs#16643
lundman pushed a commit to openzfsonwindows/openzfs that referenced this pull request Jan 26, 2025
mappedread_sf() may allocate pages; if it fails to populate a page and
can't free it, it needs to ensure that it's placed into a page queue;
otherwise it can't be reclaimed until the vnode is destroyed.

I think this is quite unlikely to happen in practice; it was noticed by
code inspection.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Signed-off-by: Mark Johnston <markj@FreeBSD.org>
Closes openzfs#16643
lundman pushed a commit to openzfsonwindows/openzfs that referenced this pull request Jan 26, 2025
As a deadlock avoidance measure, zfs_getpages() would only try to
acquire a rangelock, falling back to a single-page read if this was not
possible.  However, this is incompatible with direct I/O.

Instead, release the busy lock before trying to acquire the rangelock in
blocking mode.  This means that it's possible for the page to be
replaced, so we have to re-lookup.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Signed-off-by: Mark Johnston <markj@FreeBSD.org>
Closes openzfs#16643