-
Notifications
You must be signed in to change notification settings - Fork 1
fs/9p: Reuse inode based on path (in addition to qid) #12
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: tmp-base
Are you sure you want to change the base?
Conversation
a19f53b
to
6a8a4f7
Compare
6a8a4f7
to
d0f3cb6
Compare
3f8f951
to
74e300f
Compare
74e300f
to
3a3120f
Compare
a4abf12
to
c03accc
Compare
39eea49
to
52b5459
Compare
52b5459
to
a848f82
Compare
a848f82
to
38ac722
Compare
Receiving HSR frame with insufficient space to hold HSR tag in the skb can result in a crash (kernel BUG): [ 45.390915] skbuff: skb_under_panic: text:ffffffff86f32cac len:26 put:14 head:ffff888042418000 data:ffff888042417ff4 tail:0xe end:0x180 dev:bridge_slave_1 [ 45.392559] ------------[ cut here ]------------ [ 45.392912] kernel BUG at net/core/skbuff.c:211! [ 45.393276] Oops: invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN NOPTI [ 45.393809] CPU: 1 UID: 0 PID: 2496 Comm: reproducer Not tainted 6.15.0 #12 PREEMPT(undef) [ 45.394433] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014 [ 45.395273] RIP: 0010:skb_panic+0x15b/0x1d0 <snip registers, remove unreliable trace> [ 45.402911] Call Trace: [ 45.403105] <IRQ> [ 45.404470] skb_push+0xcd/0xf0 [ 45.404726] br_dev_queue_push_xmit+0x7c/0x6c0 [ 45.406513] br_forward_finish+0x128/0x260 [ 45.408483] __br_forward+0x42d/0x590 [ 45.409464] maybe_deliver+0x2eb/0x420 [ 45.409763] br_flood+0x174/0x4a0 [ 45.410030] br_handle_frame_finish+0xc7c/0x1bc0 [ 45.411618] br_handle_frame+0xac3/0x1230 [ 45.413674] __netif_receive_skb_core.constprop.0+0x808/0x3df0 [ 45.422966] __netif_receive_skb_one_core+0xb4/0x1f0 [ 45.424478] __netif_receive_skb+0x22/0x170 [ 45.424806] process_backlog+0x242/0x6d0 [ 45.425116] __napi_poll+0xbb/0x630 [ 45.425394] net_rx_action+0x4d1/0xcc0 [ 45.427613] handle_softirqs+0x1a4/0x580 [ 45.427926] do_softirq+0x74/0x90 [ 45.428196] </IRQ> This issue was found by syzkaller. The panic happens in br_dev_queue_push_xmit() once it receives a corrupted skb with ETH header already pushed in linear data. When it attempts the skb_push() call, there's not enough headroom and skb_push() panics. The corrupted skb is put on the queue by HSR layer, which makes a sequence of unintended transformations when it receives a specific corrupted HSR frame (with incomplete TAG). Fix it by dropping and consuming frames that are not long enough to contain both ethernet and hsr headers. Alternative fix would be to check for enough headroom before skb_push() in br_dev_queue_push_xmit(). In the reproducer, this is injected via AF_PACKET, but I don't easily see why it couldn't be sent over the wire from adjacent network. Further Details: In the reproducer, the following network interface chain is set up: ┌────────────────┐ ┌────────────────┐ │ veth0_to_hsr ├───┤ hsr_slave0 ┼───┐ └────────────────┘ └────────────────┘ │ │ ┌──────┐ ├─┤ hsr0 ├───┐ │ └──────┘ │ ┌────────────────┐ ┌────────────────┐ │ │┌────────┐ │ veth1_to_hsr ┼───┤ hsr_slave1 ├───┘ └┤ │ └────────────────┘ └────────────────┘ ┌┼ bridge │ ││ │ │└────────┘ │ ┌───────┐ │ │ ... ├──────┘ └───────┘ To trigger the events leading up to crash, reproducer sends a corrupted HSR frame with incomplete TAG, via AF_PACKET socket on 'veth0_to_hsr'. The first HSR-layer function to process this frame is hsr_handle_frame(). It and then checks if the protocol is ETH_P_PRP or ETH_P_HSR. If it is, it calls skb_set_network_header(skb, ETH_HLEN + HSR_HLEN), without checking that the skb is long enough. For the crashing frame it is not, and hence the skb->network_header and skb->mac_len fields are set incorrectly, pointing after the end of the linear buffer. I will call this a BUG#1 and it is what is addressed by this patch. In the crashing scenario before the fix, the skb continues to go down the hsr path as follows. hsr_handle_frame() then calls this sequence hsr_forward_skb() fill_frame_info() hsr->proto_ops->fill_frame_info() hsr_fill_frame_info() hsr_fill_frame_info() contains a check that intends to check whether the skb actually contains the HSR header. But the check relies on the skb->mac_len field which was erroneously setup due to BUG#1, so the check passes and the execution continues back in the hsr_forward_skb(): hsr_forward_skb() hsr_forward_do() hsr->proto_ops->get_untagged_frame() hsr_get_untagged_frame() create_stripped_skb_hsr() In create_stripped_skb_hsr(), a copy of the skb is created and is further corrupted by operation that attempts to strip the HSR tag in a call to __pskb_copy(). The skb enters create_stripped_skb_hsr() with ethernet header pushed in linear buffer. The skb_pull(skb_in, HSR_HLEN) thus pulls 6 bytes of ethernet header into the headroom, creating skb_in with a headroom of size 8. The subsequent __pskb_copy() then creates an skb with headroom of just 2 and skb->len of just 12, this is how it looks after the copy: gdb) p skb->len $10 = 12 (gdb) p skb->data $11 = (unsigned char *) 0xffff888041e45382 "\252\252\252\252\252!\210\373", (gdb) p skb->head $12 = (unsigned char *) 0xffff888041e45380 "" It seems create_stripped_skb_hsr() assumes that ETH header is pulled in the headroom when it's entered, because it just pulls HSR header on top. But that is not the case in our code-path and we end up with the corrupted skb instead. I will call this BUG#2 *I got confused here because it seems that under no conditions can create_stripped_skb_hsr() work well, the assumption it makes is not true during the processing of hsr frames - since the skb_push() in hsr_handle_frame to skb_pull in hsr_deliver_master(). I wonder whether I missed something here.* Next, the execution arrives in hsr_deliver_master(). It calls skb_pull(ETH_HLEN), which just returns NULL - the SKB does not have enough space for the pull (as it only has 12 bytes in total at this point). *The skb_pull() here further suggests that ethernet header is meant to be pushed through the whole hsr processing and create_stripped_skb_hsr() should pull it before doing the HSR header pull.* hsr_deliver_master() then puts the corrupted skb on the queue, it is then picked up from there by bridge frame handling layer and finally lands in br_dev_queue_push_xmit where it panics. Cc: stable@kernel.org Fixes: 48b491a ("net: hsr: fix mac_len checks") Reported-by: syzbot+a81f2759d022496b40ab@syzkaller.appspotmail.com Signed-off-by: Jakub Acs <acsjakub@amazon.de> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20250819082842.94378-1-acsjakub@amazon.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
38ac722
to
5eb9748
Compare
ac0ce1c
to
d2698b4
Compare
1ea5ef7
to
7ef55d4
Compare
The intention of this patch is to allow features like Landlock and fanotify (inode mark mode) to work on uncached 9pfs. These features rely on holding a specific inode and handling further access to the same file (as identified by that inode), however, currently in uncached mode, we always get a new inode on each access, due to concerns regarding server side inode number collision. On cached mode (either CACHE_LOOSE or CACHE_META), inode is already reused only by looking at the qid (server-side inode number). Since introducing this additional check would regress hard links (as they will have different path, and thus the two ends of a hard link won't be the same inode anymore under this approach), this won't be done for cached mode. Currently this patch doesn't actually have any effect - the next commit will introduce a config option to control inodeident=path enablement and default it to on for uncached mode. Signed-off-by: Tingmao Wang <m@maowtm.org> Cc: "Mickaël Salaün" <mic@digikod.net> Cc: "Günther Noack" <gnoack@google.com> Closes: landlock-lsm#45 --- Changes since v1: - Assume inodeident=path will not be set in cached mode. - Fix various issues (rcu usage etc) in ino_path.c with feedback from Al Viro and Mickaël Salaün - Use d_same_name instead of strncmp - Instead of changing v9fs_test_new_inode_dotl to add the path check (thus hijacking the meaning of "new" to actually mean "uncached"), we add the path check (conditional on the right flags in v9ses) to the cached test function (v9fs_test_inode_dotl) and use that function for both cached and uncached mode, by adding additional conditionals within in for the version/generation check. The v9fs_test_new_inode_dotl function is thus used only for mknod, mkdir and atomic_open in the "need to create" case. - Instead of never reusing inode if path-based ident is not enabled, we always reuse in uncached mode, but if path-based ident is not enabled, we don't check the path. This makes the code easier to reason about, and gets rid of the complexity of having to support two quite different mode of operation (re-using and not re-using inodes). - Fix crash due to uninitialized v9inode->path when inode is allocated then immediately deallocated in iget5_locked as a result of two iget racing with each other to insert the inode. Spotted via xfstests. - Don't allocate v9fs_ino_path within v9fs_set_inode_dotl, as iget5_locked specifies that it can't sleep. Doing so means that we need to handle a special case of inode being created and hashed into the inode list, and thus may be tested by another iget5_locked call, but its v9inode->path has not been populated yet. This is resolved via waiting for iget5_locked to return before checking the path. This edge case was spotted via xfstests.
By this point we have two ways to test for inode reuse - qid and qid+path. By default, uncached mode uses qid+path and cached mode uses qid (and in fact does not support qid+path). This patch adds the option to control the behaviour for uncached mode. In a future version, if we can negotiate with the server and be sure that it won't give us duplicate qid.path, the default for those cases can be qid-based. Signed-off-by: Tingmao Wang <m@maowtm.org> Cc: "Mickaël Salaün" <mic@digikod.net> Cc: "Günther Noack" <gnoack@google.com> --- Changes since v1: - Removed inodeident=none and instead supports inodeident=qid. This means that there is no longer an option to not re-use inodes at all. - No longer supports inodeident=path on cached mode, checks added at option init time. - Added explicit bits for both V9FS_INODE_IDENT_PATH and V9FS_INODE_IDENT_QID, in order to set a default based on cache bits when neither are set explicitly by the user.
This replicates the earlier .L patch for non-.L, and removing some previously inserted conditionals in shared code. Signed-off-by: Tingmao Wang <m@maowtm.org> --- Changes since v1: - Reflect v2 changes to the .L counterpart of this.
This uses the stat struct we already got as part of lookup process to refresh the inode "for free". Only for uncached mode for now. We will need to revisit this for cached mode once we sort out reusing an old inode with changed qid.version. (Currently this is not done in this series, which would be fine unless server side change happens, which is not supposed to happen in cached mode anyway) Note that v9fs_test_inode_dotl already makes sure we don't reuse inodes of the wrong type or different qid. Signed-off-by: Tingmao Wang <m@maowtm.org> --- Changes since v1: - Check cache bits instead of using `new` - uncached mode now also have new=0.
This replicates the previous .L commit for non-.L Signed-off-by: Tingmao Wang <m@maowtm.org> --- Changes since v1: - Check cache bits instead of using `new` - uncached mode now also have new=0.
This makes it possible for the inode to "move along" to the new location when a file under a inodeident=path 9pfs is moved, and it will be reused on next access to the new location. Modifying the ino_path of children when renaming a directory is currently not handled. Renaming non-empty directories still work, but the children won't have their the inodes be reused after renaming. Inodes will also not be reused on server-side rename, since there is no way for us to know about it. From our perspective this is indistinguishable from a new file being created in the destination that just happened to have the same qid, and the original file being deleted. Signed-off-by: Tingmao Wang <m@maowtm.org> Cc: "Mickaël Salaün" <mic@digikod.net> Cc: "Günther Noack" <gnoack@google.com> --- New patch in v2
Add a row for this option in the Options table. Signed-off-by: Tingmao Wang <m@maowtm.org> --- New patch in v2
7ef55d4
to
25c93e5
Compare
FUSE is already supported but v9fs requires some fixes [1]. These tests require /mnt/test-v9fs and /mnt/test-fuse to be already mounted. It would be too complex to set up such mount points with the test fixtures because of the related daemons' lifetime, but it is easy to run these tests with check-linux.sh from landlock-test-tools [2]. Add new kernel configurations to support these two new filesystems. Rename and update path_is_fs() to take a path as argument. [tingmao: pulled this patch into this series per discussion with Mickaël, original link is:] Link: https://lore.kernel.org/all/20250704171345.1393451-1-mic@digikod.net/ Cc: Christian Schoenebeck <linux_oss@crudebyte.com> Cc: Dominique Martinet <asmadeus@codewreck.org> Cc: Eric Van Hensbergen <ericvh@kernel.org> Cc: Günther Noack <gnoack@google.com> Cc: Latchesar Ionkov <lucho@ionkov.net> Cc: Tingmao Wang <m@maowtm.org> Link: https://lore.kernel.org/r/cover.1743971855.git.m@maowtm.org [1] Link: landlock-lsm/landlock-test-tools#21 [2] Signed-off-by: Mickaël Salaün <mic@digikod.net> Tested-by: Tingmao Wang <m@maowtm.org>
f45ce6e
to
84736ea
Compare
To: "Mickaël Salaün" <mic@digikod.net> To: "Günther Noack" <gnoack@google.com> Signed-off-by: Tingmao Wang <m@maowtm.org>
Since the path-based inode identification feature needs special handling for renames, we add some Landlock tests for renames on custom filesystems. As part of this, also standardize the file path used in existing layout3_fs variants to be file1_s1d1 (unless they are per-fs fixed paths). This is so that we can rename this file to e.g. file2_s1d1 or file1_s2d1 (by renaming the parent dir). This is an optional change - we could always add an extra file creation in fixture setup for layout3_fs, but since there's no particular reason to use "dir/file" as the test file, we just make it use the "standard" name used in the rest of this fs_test. To: "Mickaël Salaün" <mic@digikod.net> To: "Günther Noack" <gnoack@google.com> Signed-off-by: Tingmao Wang <m@maowtm.org>
84736ea
to
6728435
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR introduces path-based inode identification for 9p filesystems as an alternative to the existing qid-based approach. The primary motivation is to support security frameworks like Landlock and fanotify that rely on inode identity, while handling cases where servers may present qid collisions or when remote renames occur.
- Adds
inodeident=path
mount option that tracks file paths internally to determine inode reuse eligibility - Implements path comparison logic using stored path snapshots for inode matching
- Updates inode lookup functions throughout the 9p codebase to support path-based identification
Reviewed Changes
Copilot reviewed 10 out of 10 changed files in this pull request and generated 4 comments.
Show a summary per file
File | Description |
---|---|
tools/testing/selftests/landlock/fs_test.c | Adds comprehensive test coverage for rename operations across multiple filesystems including new 9p and FUSE variants |
tools/testing/selftests/landlock/config | Enables required kernel config options for 9p and FUSE filesystem testing |
fs/9p/vfs_super.c | Initializes root inode path for path-based identification mode |
fs/9p/vfs_inode_dotl.c | Major refactoring of inode lookup to support path comparison and handle race conditions |
fs/9p/vfs_inode.c | Updates non-dotl inode operations for path-based identification and rename handling |
fs/9p/v9fs.h | Defines new data structures and function signatures for path-based inode identification |
fs/9p/v9fs.c | Adds mount option parsing and validation for inodeident parameter |
fs/9p/ino_path.c | New file implementing path snapshot creation, comparison, and cleanup operations |
fs/9p/Makefile | Includes new ino_path.c compilation unit |
Documentation/filesystems/9p.rst | Documents the new inodeident mount option and its behavior |
Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.
"Refusing to reuse inode %p based on path mismatch", | ||
inode); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[nitpick] The debug message format is inconsistent with other debug messages in the function. Consider adding the path information to make debugging easier: "Refusing to reuse inode %p (path mismatch for %pd)"
and include the dentry parameter.
"Refusing to reuse inode %p based on path mismatch", | |
inode); | |
"Refusing to reuse inode %p (path mismatch for %pd)", | |
inode, dentry); |
Copilot uses AI. Check for mistakes.
if (!inode || (inode->i_state & I_NEW)) { | ||
WARN_ONCE( | ||
1, | ||
"Expected iget5_locked to return an existing inode"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The error message is unclear about what went wrong. Consider expanding it to: "Expected iget5_locked to return an existing inode during double-check, got new inode instead"
to better explain the unexpected condition.
"Expected iget5_locked to return an existing inode"); | |
"Expected iget5_locked to return an existing inode during double-check, got new inode instead"); |
Copilot uses AI. Check for mistakes.
|
||
/* Don't include the root dentry */ | ||
while (curr->d_parent != curr) { | ||
if (WARN_ON_ONCE(path_components >= SSIZE_MAX)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using SSIZE_MAX
as the limit is incorrect since path_components
is declared as size_t
(unsigned) but compared against a signed maximum. This should be SIZE_MAX
or the variable should be ssize_t
.
if (WARN_ON_ONCE(path_components >= SSIZE_MAX)) { | |
if (WARN_ON_ONCE(path_components >= SIZE_MAX)) { |
Copilot uses AI. Check for mistakes.
return false; | ||
} | ||
compare = &ino_path->names[i]; | ||
if (!d_same_name(curr, curr->d_parent, &compare->name)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The d_same_name
function expects a parent dentry as the second parameter, but here curr->d_parent
is passed while curr
is the child. The parameter order should be verified against the function signature to ensure correct usage.
Copilot uses AI. Check for mistakes.
Changes since v1:
Viro and Mickaël Salaün
TODO:
remove the ability to configure inodeident - it will just be cached vs uncached...? (ask suggestion)always reuse in uncached mode, but if path-based ident is not enabled,
we don't check the path. This makes the code easier to reason about,
and gets rid of the complexity of having to support two quite different
mode of operation (re-using and not re-using inodes).
specifies that it can't sleep.
TODO:
don't down_read rename_sem on iget5 when new
option: no none for inodeident, default to path, can be qid.
inodeident=path on cached mode not supported.
Removing V9FS_INODE_IDENT_MASK
Dropped original patch 3 ("Hide inodeident=path from show_options as it is the default")
rename handling
Discovered by xfstests/generic/084:
Fix crash when inode is allocated then immediately deallocated in iget5_locked due to race
Fix incorrect handling of not-fully-initialized inode being tested by iget5
Cover
(This would also cause a new inode to be created on server-side renames, but
on cached mode that is not supported anyway)
Use rename_sem to protect v9inode->path, since we need to take it to compare paths anyway.
handle also modifying the ino_path of children when renaming a directory.
Remote renames
We cannot cope with remote renames. The fact that path-based inode ident
can be turned off helps those who want to support such use cases.
Hard links
inodeident=path is "ideologically" incompatible with hard-links, so
changing the inode path to the destination, and not of any other hard
links, is a best effort approach.
Q: why not tree?
A: inode lifetime is independent of dentry, and Landlock might e.g. ihold
/a/b/c but dentry for /a and everything below is gone. Therefore if we
need to mv /a/b -> /a/b2, there is no way to find the ino_path originally
created for /a/b that /a/b/c linked to without searching over either all
children or all live inodes, unless we effectively invent our own
"dcache". So let's just do a O(N) scan of all inodes and change their
ino_path...
Bugs found via xfstests: generic/084
link/unlink and rm/touch race
Use rename_sem to protect v9inode->path, since we need to take it to compare paths anyway.