fs/9p: Reuse inode based on path (in addition to qid) #12

micromaomao · 2025-08-27T09:39:35Z

Changes since v1:

Fix various issues (rcu usage etc) in ino_path.c with feedback from Al
Viro and Mickaël Salaün
Use d_same_name instead of strncmp
Properly initialize v9inode->path to NULL
merge v9fs_test_inode_dotl and v9fs_test_new_inode_dotl
don't do path comparison for cached mode

TODO:

~~remove the ability to configure inodeident - it will just be cached vs uncached...? (ask suggestion)~~
ask to clarify the role of st_gen
guard(rcu)

Instead of never reusing inode if path-based ident is not enabled, we
always reuse in uncached mode, but if path-based ident is not enabled,
we don't check the path. This makes the code easier to reason about,
and gets rid of the complexity of having to support two quite different
mode of operation (re-using and not re-using inodes).
Don't allocate v9fs_ino_path within v9fs_set_inode_dotl, as iget5_locked
specifies that it can't sleep.

TODO:
don't down_read rename_sem on iget5 when new

option: no none for inodeident, default to path, can be qid.
inodeident=path on cached mode not supported.

Removing V9FS_INODE_IDENT_MASK

Dropped original patch 3 ("Hide inodeident=path from show_options as it is the default")

rename handling

Discovered by xfstests/generic/084:
Fix crash when inode is allocated then immediately deallocated in iget5_locked due to race
Fix incorrect handling of not-fully-initialized inode being tested by iget5

Cover

(This would also cause a new inode to be created on server-side renames, but
on cached mode that is not supported anyway)

Use rename_sem to protect v9inode->path, since we need to take it to compare paths anyway.

handle also modifying the ino_path of children when renaming a directory.

Remote renames

We cannot cope with remote renames. The fact that path-based inode ident
can be turned off helps those who want to support such use cases.

Hard links

inodeident=path is "ideologically" incompatible with hard-links, so
changing the inode path to the destination, and not of any other hard
links, is a best effort approach.

Q: why not tree?
A: inode lifetime is independent of dentry, and Landlock might e.g. ihold
/a/b/c but dentry for /a and everything below is gone. Therefore if we
need to mv /a/b -> /a/b2, there is no way to find the ino_path originally
created for /a/b that /a/b/c linked to without searching over either all
children or all live inodes, unless we effectively invent our own
"dcache". So let's just do a O(N) scan of all inodes and change their
ino_path...

Bugs found via xfstests: generic/084
link/unlink and rm/touch race
Use rename_sem to protect v9inode->path, since we need to take it to compare paths anyway.

Receiving HSR frame with insufficient space to hold HSR tag in the skb can result in a crash (kernel BUG): [ 45.390915] skbuff: skb_under_panic: text:ffffffff86f32cac len:26 put:14 head:ffff888042418000 data:ffff888042417ff4 tail:0xe end:0x180 dev:bridge_slave_1 [ 45.392559] ------------[ cut here ]------------ [ 45.392912] kernel BUG at net/core/skbuff.c:211! [ 45.393276] Oops: invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN NOPTI [ 45.393809] CPU: 1 UID: 0 PID: 2496 Comm: reproducer Not tainted 6.15.0 #12 PREEMPT(undef) [ 45.394433] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014 [ 45.395273] RIP: 0010:skb_panic+0x15b/0x1d0 <snip registers, remove unreliable trace> [ 45.402911] Call Trace: [ 45.403105] <IRQ> [ 45.404470] skb_push+0xcd/0xf0 [ 45.404726] br_dev_queue_push_xmit+0x7c/0x6c0 [ 45.406513] br_forward_finish+0x128/0x260 [ 45.408483] __br_forward+0x42d/0x590 [ 45.409464] maybe_deliver+0x2eb/0x420 [ 45.409763] br_flood+0x174/0x4a0 [ 45.410030] br_handle_frame_finish+0xc7c/0x1bc0 [ 45.411618] br_handle_frame+0xac3/0x1230 [ 45.413674] __netif_receive_skb_core.constprop.0+0x808/0x3df0 [ 45.422966] __netif_receive_skb_one_core+0xb4/0x1f0 [ 45.424478] __netif_receive_skb+0x22/0x170 [ 45.424806] process_backlog+0x242/0x6d0 [ 45.425116] __napi_poll+0xbb/0x630 [ 45.425394] net_rx_action+0x4d1/0xcc0 [ 45.427613] handle_softirqs+0x1a4/0x580 [ 45.427926] do_softirq+0x74/0x90 [ 45.428196] </IRQ> This issue was found by syzkaller. The panic happens in br_dev_queue_push_xmit() once it receives a corrupted skb with ETH header already pushed in linear data. When it attempts the skb_push() call, there's not enough headroom and skb_push() panics. The corrupted skb is put on the queue by HSR layer, which makes a sequence of unintended transformations when it receives a specific corrupted HSR frame (with incomplete TAG). Fix it by dropping and consuming frames that are not long enough to contain both ethernet and hsr headers. Alternative fix would be to check for enough headroom before skb_push() in br_dev_queue_push_xmit(). In the reproducer, this is injected via AF_PACKET, but I don't easily see why it couldn't be sent over the wire from adjacent network. Further Details: In the reproducer, the following network interface chain is set up: ┌────────────────┐ ┌────────────────┐ │ veth0_to_hsr ├───┤ hsr_slave0 ┼───┐ └────────────────┘ └────────────────┘ │ │ ┌──────┐ ├─┤ hsr0 ├───┐ │ └──────┘ │ ┌────────────────┐ ┌────────────────┐ │ │┌────────┐ │ veth1_to_hsr ┼───┤ hsr_slave1 ├───┘ └┤ │ └────────────────┘ └────────────────┘ ┌┼ bridge │ ││ │ │└────────┘ │ ┌───────┐ │ │ ... ├──────┘ └───────┘ To trigger the events leading up to crash, reproducer sends a corrupted HSR frame with incomplete TAG, via AF_PACKET socket on 'veth0_to_hsr'. The first HSR-layer function to process this frame is hsr_handle_frame(). It and then checks if the protocol is ETH_P_PRP or ETH_P_HSR. If it is, it calls skb_set_network_header(skb, ETH_HLEN + HSR_HLEN), without checking that the skb is long enough. For the crashing frame it is not, and hence the skb->network_header and skb->mac_len fields are set incorrectly, pointing after the end of the linear buffer. I will call this a BUG#1 and it is what is addressed by this patch. In the crashing scenario before the fix, the skb continues to go down the hsr path as follows. hsr_handle_frame() then calls this sequence hsr_forward_skb() fill_frame_info() hsr->proto_ops->fill_frame_info() hsr_fill_frame_info() hsr_fill_frame_info() contains a check that intends to check whether the skb actually contains the HSR header. But the check relies on the skb->mac_len field which was erroneously setup due to BUG#1, so the check passes and the execution continues back in the hsr_forward_skb(): hsr_forward_skb() hsr_forward_do() hsr->proto_ops->get_untagged_frame() hsr_get_untagged_frame() create_stripped_skb_hsr() In create_stripped_skb_hsr(), a copy of the skb is created and is further corrupted by operation that attempts to strip the HSR tag in a call to __pskb_copy(). The skb enters create_stripped_skb_hsr() with ethernet header pushed in linear buffer. The skb_pull(skb_in, HSR_HLEN) thus pulls 6 bytes of ethernet header into the headroom, creating skb_in with a headroom of size 8. The subsequent __pskb_copy() then creates an skb with headroom of just 2 and skb->len of just 12, this is how it looks after the copy: gdb) p skb->len $10 = 12 (gdb) p skb->data $11 = (unsigned char *) 0xffff888041e45382 "\252\252\252\252\252!\210\373", (gdb) p skb->head $12 = (unsigned char *) 0xffff888041e45380 "" It seems create_stripped_skb_hsr() assumes that ETH header is pulled in the headroom when it's entered, because it just pulls HSR header on top. But that is not the case in our code-path and we end up with the corrupted skb instead. I will call this BUG#2 *I got confused here because it seems that under no conditions can create_stripped_skb_hsr() work well, the assumption it makes is not true during the processing of hsr frames - since the skb_push() in hsr_handle_frame to skb_pull in hsr_deliver_master(). I wonder whether I missed something here.* Next, the execution arrives in hsr_deliver_master(). It calls skb_pull(ETH_HLEN), which just returns NULL - the SKB does not have enough space for the pull (as it only has 12 bytes in total at this point). *The skb_pull() here further suggests that ethernet header is meant to be pushed through the whole hsr processing and create_stripped_skb_hsr() should pull it before doing the HSR header pull.* hsr_deliver_master() then puts the corrupted skb on the queue, it is then picked up from there by bridge frame handling layer and finally lands in br_dev_queue_push_xmit where it panics. Cc: stable@kernel.org Fixes: 48b491a ("net: hsr: fix mac_len checks") Reported-by: syzbot+a81f2759d022496b40ab@syzkaller.appspotmail.com Signed-off-by: Jakub Acs <acsjakub@amazon.de> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20250819082842.94378-1-acsjakub@amazon.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>

The intention of this patch is to allow features like Landlock and fanotify (inode mark mode) to work on uncached 9pfs. These features rely on holding a specific inode and handling further access to the same file (as identified by that inode), however, currently in uncached mode, we always get a new inode on each access, due to concerns regarding server side inode number collision. On cached mode (either CACHE_LOOSE or CACHE_META), inode is already reused only by looking at the qid (server-side inode number). Since introducing this additional check would regress hard links (as they will have different path, and thus the two ends of a hard link won't be the same inode anymore under this approach), this won't be done for cached mode. Currently this patch doesn't actually have any effect - the next commit will introduce a config option to control inodeident=path enablement and default it to on for uncached mode. Signed-off-by: Tingmao Wang <m@maowtm.org> Cc: "Mickaël Salaün" <mic@digikod.net> Cc: "Günther Noack" <gnoack@google.com> Closes: landlock-lsm#45 --- Changes since v1: - Assume inodeident=path will not be set in cached mode. - Fix various issues (rcu usage etc) in ino_path.c with feedback from Al Viro and Mickaël Salaün - Use d_same_name instead of strncmp - Instead of changing v9fs_test_new_inode_dotl to add the path check (thus hijacking the meaning of "new" to actually mean "uncached"), we add the path check (conditional on the right flags in v9ses) to the cached test function (v9fs_test_inode_dotl) and use that function for both cached and uncached mode, by adding additional conditionals within in for the version/generation check. The v9fs_test_new_inode_dotl function is thus used only for mknod, mkdir and atomic_open in the "need to create" case. - Instead of never reusing inode if path-based ident is not enabled, we always reuse in uncached mode, but if path-based ident is not enabled, we don't check the path. This makes the code easier to reason about, and gets rid of the complexity of having to support two quite different mode of operation (re-using and not re-using inodes). - Fix crash due to uninitialized v9inode->path when inode is allocated then immediately deallocated in iget5_locked as a result of two iget racing with each other to insert the inode. Spotted via xfstests. - Don't allocate v9fs_ino_path within v9fs_set_inode_dotl, as iget5_locked specifies that it can't sleep. Doing so means that we need to handle a special case of inode being created and hashed into the inode list, and thus may be tested by another iget5_locked call, but its v9inode->path has not been populated yet. This is resolved via waiting for iget5_locked to return before checking the path. This edge case was spotted via xfstests.

By this point we have two ways to test for inode reuse - qid and qid+path. By default, uncached mode uses qid+path and cached mode uses qid (and in fact does not support qid+path). This patch adds the option to control the behaviour for uncached mode. In a future version, if we can negotiate with the server and be sure that it won't give us duplicate qid.path, the default for those cases can be qid-based. Signed-off-by: Tingmao Wang <m@maowtm.org> Cc: "Mickaël Salaün" <mic@digikod.net> Cc: "Günther Noack" <gnoack@google.com> --- Changes since v1: - Removed inodeident=none and instead supports inodeident=qid. This means that there is no longer an option to not re-use inodes at all. - No longer supports inodeident=path on cached mode, checks added at option init time. - Added explicit bits for both V9FS_INODE_IDENT_PATH and V9FS_INODE_IDENT_QID, in order to set a default based on cache bits when neither are set explicitly by the user.

This replicates the earlier .L patch for non-.L, and removing some previously inserted conditionals in shared code. Signed-off-by: Tingmao Wang <m@maowtm.org> --- Changes since v1: - Reflect v2 changes to the .L counterpart of this.

This uses the stat struct we already got as part of lookup process to refresh the inode "for free". Only for uncached mode for now. We will need to revisit this for cached mode once we sort out reusing an old inode with changed qid.version. (Currently this is not done in this series, which would be fine unless server side change happens, which is not supposed to happen in cached mode anyway) Note that v9fs_test_inode_dotl already makes sure we don't reuse inodes of the wrong type or different qid. Signed-off-by: Tingmao Wang <m@maowtm.org> --- Changes since v1: - Check cache bits instead of using `new` - uncached mode now also have new=0.

This replicates the previous .L commit for non-.L Signed-off-by: Tingmao Wang <m@maowtm.org> --- Changes since v1: - Check cache bits instead of using `new` - uncached mode now also have new=0.

This makes it possible for the inode to "move along" to the new location when a file under a inodeident=path 9pfs is moved, and it will be reused on next access to the new location. Modifying the ino_path of children when renaming a directory is currently not handled. Renaming non-empty directories still work, but the children won't have their the inodes be reused after renaming. Inodes will also not be reused on server-side rename, since there is no way for us to know about it. From our perspective this is indistinguishable from a new file being created in the destination that just happened to have the same qid, and the original file being deleted. Signed-off-by: Tingmao Wang <m@maowtm.org> Cc: "Mickaël Salaün" <mic@digikod.net> Cc: "Günther Noack" <gnoack@google.com> --- New patch in v2

Add a row for this option in the Options table. Signed-off-by: Tingmao Wang <m@maowtm.org> --- New patch in v2

FUSE is already supported but v9fs requires some fixes [1]. These tests require /mnt/test-v9fs and /mnt/test-fuse to be already mounted. It would be too complex to set up such mount points with the test fixtures because of the related daemons' lifetime, but it is easy to run these tests with check-linux.sh from landlock-test-tools [2]. Add new kernel configurations to support these two new filesystems. Rename and update path_is_fs() to take a path as argument. [tingmao: pulled this patch into this series per discussion with Mickaël, original link is:] Link: https://lore.kernel.org/all/20250704171345.1393451-1-mic@digikod.net/ Cc: Christian Schoenebeck <linux_oss@crudebyte.com> Cc: Dominique Martinet <asmadeus@codewreck.org> Cc: Eric Van Hensbergen <ericvh@kernel.org> Cc: Günther Noack <gnoack@google.com> Cc: Latchesar Ionkov <lucho@ionkov.net> Cc: Tingmao Wang <m@maowtm.org> Link: https://lore.kernel.org/r/cover.1743971855.git.m@maowtm.org [1] Link: landlock-lsm/landlock-test-tools#21 [2] Signed-off-by: Mickaël Salaün <mic@digikod.net> Tested-by: Tingmao Wang <m@maowtm.org>

To: "Mickaël Salaün" <mic@digikod.net> To: "Günther Noack" <gnoack@google.com> Signed-off-by: Tingmao Wang <m@maowtm.org>

Since the path-based inode identification feature needs special handling for renames, we add some Landlock tests for renames on custom filesystems. As part of this, also standardize the file path used in existing layout3_fs variants to be file1_s1d1 (unless they are per-fs fixed paths). This is so that we can rename this file to e.g. file2_s1d1 or file1_s2d1 (by renaming the parent dir). This is an optional change - we could always add an extra file creation in fixture setup for layout3_fs, but since there's no particular reason to use "dir/file" as the test file, we just make it use the "standard" name used in the rest of this fs_test. To: "Mickaël Salaün" <mic@digikod.net> To: "Günther Noack" <gnoack@google.com> Signed-off-by: Tingmao Wang <m@maowtm.org>

Copilot

Pull Request Overview

This PR introduces path-based inode identification for 9p filesystems as an alternative to the existing qid-based approach. The primary motivation is to support security frameworks like Landlock and fanotify that rely on inode identity, while handling cases where servers may present qid collisions or when remote renames occur.

Adds inodeident=path mount option that tracks file paths internally to determine inode reuse eligibility
Implements path comparison logic using stored path snapshots for inode matching
Updates inode lookup functions throughout the 9p codebase to support path-based identification

Reviewed Changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated 4 comments.

Show a summary per file

File	Description
tools/testing/selftests/landlock/fs_test.c	Adds comprehensive test coverage for rename operations across multiple filesystems including new 9p and FUSE variants
tools/testing/selftests/landlock/config	Enables required kernel config options for 9p and FUSE filesystem testing
fs/9p/vfs_super.c	Initializes root inode path for path-based identification mode
fs/9p/vfs_inode_dotl.c	Major refactoring of inode lookup to support path comparison and handle race conditions
fs/9p/vfs_inode.c	Updates non-dotl inode operations for path-based identification and rename handling
fs/9p/v9fs.h	Defines new data structures and function signatures for path-based inode identification
fs/9p/v9fs.c	Adds mount option parsing and validation for `inodeident` parameter
fs/9p/ino_path.c	New file implementing path snapshot creation, comparison, and cleanup operations
fs/9p/Makefile	Includes new ino_path.c compilation unit
Documentation/filesystems/9p.rst	Documents the new `inodeident` mount option and its behavior

_{Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.}

Copilot · 2025-09-06T01:12:17Z

fs/9p/vfs_inode_dotl.c

+					"Refusing to reuse inode %p based on path mismatch",
+					inode);


[nitpick] The debug message format is inconsistent with other debug messages in the function. Consider adding the path information to make debugging easier: "Refusing to reuse inode %p (path mismatch for %pd)" and include the dentry parameter.

Suggested change

"Refusing to reuse inode %p based on path mismatch",

inode);

"Refusing to reuse inode %p (path mismatch for %pd)",

inode, dentry);

Copilot · 2025-09-06T01:12:17Z

fs/9p/vfs_inode_dotl.c

+		if (!inode || (inode->i_state & I_NEW)) {
+			WARN_ONCE(
+				1,
+				"Expected iget5_locked to return an existing inode");


The error message is unclear about what went wrong. Consider expanding it to: "Expected iget5_locked to return an existing inode during double-check, got new inode instead" to better explain the unexpected condition.

Suggested change

"Expected iget5_locked to return an existing inode");

"Expected iget5_locked to return an existing inode during double-check, got new inode instead");

Copilot · 2025-09-06T01:12:17Z

fs/9p/ino_path.c

+
+	/* Don't include the root dentry */
+	while (curr->d_parent != curr) {
+		if (WARN_ON_ONCE(path_components >= SSIZE_MAX)) {


Using SSIZE_MAX as the limit is incorrect since path_components is declared as size_t (unsigned) but compared against a signed maximum. This should be SIZE_MAX or the variable should be ssize_t.

Suggested change

if (WARN_ON_ONCE(path_components >= SSIZE_MAX)) {

if (WARN_ON_ONCE(path_components >= SIZE_MAX)) {

Copilot · 2025-09-06T01:12:18Z

fs/9p/ino_path.c

+			return false;
+		}
+		compare = &ino_path->names[i];
+		if (!d_same_name(curr, curr->d_parent, &compare->name)) {


The d_same_name function expects a parent dentry as the second parameter, but here curr->d_parent is passed while curr is the child. The parameter order should be verified against the function signature to ensure correct usage.

micromaomao requested a review from Copilot August 27, 2025 09:39