Skip to content

Conversation

micromaomao
Copy link
Owner

Correctness TBD...

@micromaomao micromaomao force-pushed the landlock-rcu-walk-base branch from 93a8240 to a5806cd Compare June 2, 2025 23:32
@micromaomao micromaomao force-pushed the landlock-rcu-walk branch 5 times, most recently from 0f085cd to 383b3c4 Compare June 3, 2025 23:29
This commit replaces dget_parent with a direct read of d_parent. By
holding rcu read lock we should be safe in terms of not reading freed
memory, but this is still problematic due to move+unlink, as will be shown
with the test in the next commit.

Note that follow_up is still used when walking up a mountpoint.

Signed-off-by: Tingmao Wang <m@maowtm.org>
By not taking references to parent directories when walking, the dentry
can become negative under us, if the target file is moved then the parent
quickly deleted.  This is problematic because it means that we lose access
to any landlock rules attached to that parent, and thus we won't be able
to grant the correct access even when we're allowed to pretend the move
hasn't happened yet.

This commit tests a slightly more complicated scenario, where after moving
the file's parent directory away, the next two intermediate directories
are quickly removed.  This demonstrates that in this situation the only
choice for Landlock is to restart the walk.  Without doing that, even if
we were to re-check d_parent, if we were "cut off" when we've already
walked away from the original leaf, we would still not be able to recover.

As an illustration:

mkdir /d1 /d1/d2 /b
create landlock rule on /d1
create landlock rule on /b
touch /d1/d2/file

thread 1                              thread 2
                                      cd /d1/d2
                                      cat file
                                        landlock walks to /d1/d2 without ref,
                                        checked rule on /d1/d2 (nothing),
                                        about to walk up to /d1
mv /d1/d2/file /b
rmdir /d1/d2 /d1
(/d1/d2 and /d1 both becomes negative)
                                        notices stuff changed
                                        at this point, we're looking at /d1/d2
                                        and trying to walk to its parent.
                                        however, both /d1/d2 and /d1 are negative
                                        now.
                                        Our only choice is to restart the walk
                                        altogether from the original file's dentry,
                                        which will now have d_parent /b.

The test is probablistic as it tests for a race condition, but I found
that in my environment, with the previous patch, it pretty much always
reliably fails within 10 seconds.  I've set the timeout to 30 seconds, and
the test will pass if no permission errors (or other errors) detected.  In
those 30 seconds it will keep recreating the above directory structure
(except with a lot more sibling directories so it can run for some time
before it "exhausts" all the directories and has to recreate the whole
thing).

Signed-off-by: Tingmao Wang <m@maowtm.org>
This fixes the issue mentioned in the previous patch, by essentially
having two "modes" for the pathwalk code - in the pathwalk_ref == false
case we don't take references and just inspect `d_parent` (unless we have
to `follow_up`).  In the pathwalk_ref == true case, this is the same as
before.

When we detect any renames during a pathwalk_ref == false walk, we restart
with pathwalk_ref == true, re-initializing the layer masks.  I'm not sure
if this is completely correct in regards to is_dom_check - but seems to
work for now.  I can revisit this later.

Signed-off-by: Tingmao Wang <m@maowtm.org>
micromaomao pushed a commit that referenced this pull request Sep 7, 2025
These iterations require the read lock, otherwise RCU
lockdep will splat:

=============================
WARNING: suspicious RCU usage
6.17.0-rc3-00014-g31419c045d64 #6 Tainted: G           O
-----------------------------
drivers/base/power/main.c:1333 RCU-list traversed in non-reader section!!

other info that might help us debug this:

rcu_scheduler_active = 2, debug_locks = 1
5 locks held by rtcwake/547:
 #0: 00000000643ab418 (sb_writers#6){.+.+}-{0:0}, at: file_start_write+0x2b/0x3a
 #1: 0000000067a0ca88 (&of->mutex#2){+.+.}-{4:4}, at: kernfs_fop_write_iter+0x181/0x24b
 #2: 00000000631eac40 (kn->active#3){.+.+}-{0:0}, at: kernfs_fop_write_iter+0x191/0x24b
 #3: 00000000609a1308 (system_transition_mutex){+.+.}-{4:4}, at: pm_suspend+0xaf/0x30b
 #4: 0000000060c0fdb0 (device_links_srcu){.+.+}-{0:0}, at: device_links_read_lock+0x75/0x98

stack backtrace:
CPU: 0 UID: 0 PID: 547 Comm: rtcwake Tainted: G           O        6.17.0-rc3-00014-g31419c045d64 #6 VOLUNTARY
Tainted: [O]=OOT_MODULE
Stack:
 223721b3a80 6089eac6 00000001 00000001
 ffffff00 6089eac6 00000535 6086e528
 721b3ac0 6003c294 00000000 60031fc0
Call Trace:
 [<600407ed>] show_stack+0x10e/0x127
 [<6003c294>] dump_stack_lvl+0x77/0xc6
 [<6003c2fd>] dump_stack+0x1a/0x20
 [<600bc2f8>] lockdep_rcu_suspicious+0x116/0x13e
 [<603d8ea1>] dpm_async_suspend_superior+0x117/0x17e
 [<603d980f>] device_suspend+0x528/0x541
 [<603da24b>] dpm_suspend+0x1a2/0x267
 [<603da837>] dpm_suspend_start+0x5d/0x72
 [<600ca0c9>] suspend_devices_and_enter+0xab/0x736
 [...]

Add the fourth argument to the iteration to annotate
this and avoid the splat.

Fixes: 0679963 ("PM: sleep: Make async suspend handle suppliers like parents")
Fixes: ed18738 ("PM: sleep: Make async resume handle consumers like children")
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Link: https://patch.msgid.link/20250826134348.aba79f6e6299.I9ecf55da46ccf33778f2c018a82e1819d815b348@changeid
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant