Skip to content

Conversation

micromaomao
Copy link
Owner

@micromaomao micromaomao commented Aug 27, 2025

Changes since v1:

  • Fix various issues (rcu usage etc) in ino_path.c with feedback from Al
    Viro and Mickaël Salaün
  • Use d_same_name instead of strncmp
  • Properly initialize v9inode->path to NULL
  • merge v9fs_test_inode_dotl and v9fs_test_new_inode_dotl
  • don't do path comparison for cached mode

TODO:

  • remove the ability to configure inodeident - it will just be cached vs uncached...? (ask suggestion)
  • ask to clarify the role of st_gen
  • guard(rcu)

  • Instead of never reusing inode if path-based ident is not enabled, we
    always reuse in uncached mode, but if path-based ident is not enabled,
    we don't check the path. This makes the code easier to reason about,
    and gets rid of the complexity of having to support two quite different
    mode of operation (re-using and not re-using inodes).
  • Don't allocate v9fs_ino_path within v9fs_set_inode_dotl, as iget5_locked
    specifies that it can't sleep.

TODO:
don't down_read rename_sem on iget5 when new

option: no none for inodeident, default to path, can be qid.
inodeident=path on cached mode not supported.

Removing V9FS_INODE_IDENT_MASK


Dropped original patch 3 ("Hide inodeident=path from show_options as it is the default")


rename handling


Discovered by xfstests/generic/084:
Fix crash when inode is allocated then immediately deallocated in iget5_locked due to race
Fix incorrect handling of not-fully-initialized inode being tested by iget5


Cover

(This would also cause a new inode to be created on server-side renames, but
on cached mode that is not supported anyway)

Use rename_sem to protect v9inode->path, since we need to take it to compare paths anyway.

handle also modifying the ino_path of children when renaming a directory.

Remote renames

We cannot cope with remote renames. The fact that path-based inode ident
can be turned off helps those who want to support such use cases.

Hard links

inodeident=path is "ideologically" incompatible with hard-links, so
changing the inode path to the destination, and not of any other hard
links, is a best effort approach.

Q: why not tree?
A: inode lifetime is independent of dentry, and Landlock might e.g. ihold
/a/b/c but dentry for /a and everything below is gone. Therefore if we
need to mv /a/b -> /a/b2, there is no way to find the ino_path originally
created for /a/b that /a/b/c linked to without searching over either all
children or all live inodes, unless we effectively invent our own
"dcache". So let's just do a O(N) scan of all inodes and change their
ino_path...

Bugs found via xfstests: generic/084
link/unlink and rm/touch race
Use rename_sem to protect v9inode->path, since we need to take it to compare paths anyway.

@micromaomao micromaomao requested a review from Copilot August 27, 2025 09:39
Copilot

This comment was marked as outdated.

Copilot

This comment was marked as outdated.

Copilot

This comment was marked as outdated.

@micromaomao micromaomao force-pushed the 9pfs-landlock-fix branch 2 times, most recently from 3f8f951 to 74e300f Compare August 27, 2025 10:37
@micromaomao micromaomao requested a review from Copilot August 27, 2025 10:38
Copilot

This comment was marked as outdated.

Copilot

This comment was marked as outdated.

@micromaomao micromaomao force-pushed the 9pfs-landlock-fix branch 2 times, most recently from a4abf12 to c03accc Compare August 27, 2025 13:06
@micromaomao micromaomao requested a review from Copilot August 27, 2025 13:06
Copilot

This comment was marked as outdated.

@micromaomao micromaomao force-pushed the 9pfs-landlock-fix branch 3 times, most recently from 39eea49 to 52b5459 Compare August 27, 2025 13:36
@micromaomao micromaomao requested a review from Copilot August 27, 2025 13:39
Copilot

This comment was marked as outdated.

Copilot

This comment was marked as outdated.

micromaomao pushed a commit that referenced this pull request Aug 30, 2025
Receiving HSR frame with insufficient space to hold HSR tag in the skb
can result in a crash (kernel BUG):

[   45.390915] skbuff: skb_under_panic: text:ffffffff86f32cac len:26 put:14 head:ffff888042418000 data:ffff888042417ff4 tail:0xe end:0x180 dev:bridge_slave_1
[   45.392559] ------------[ cut here ]------------
[   45.392912] kernel BUG at net/core/skbuff.c:211!
[   45.393276] Oops: invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN NOPTI
[   45.393809] CPU: 1 UID: 0 PID: 2496 Comm: reproducer Not tainted 6.15.0 #12 PREEMPT(undef)
[   45.394433] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014
[   45.395273] RIP: 0010:skb_panic+0x15b/0x1d0

<snip registers, remove unreliable trace>

[   45.402911] Call Trace:
[   45.403105]  <IRQ>
[   45.404470]  skb_push+0xcd/0xf0
[   45.404726]  br_dev_queue_push_xmit+0x7c/0x6c0
[   45.406513]  br_forward_finish+0x128/0x260
[   45.408483]  __br_forward+0x42d/0x590
[   45.409464]  maybe_deliver+0x2eb/0x420
[   45.409763]  br_flood+0x174/0x4a0
[   45.410030]  br_handle_frame_finish+0xc7c/0x1bc0
[   45.411618]  br_handle_frame+0xac3/0x1230
[   45.413674]  __netif_receive_skb_core.constprop.0+0x808/0x3df0
[   45.422966]  __netif_receive_skb_one_core+0xb4/0x1f0
[   45.424478]  __netif_receive_skb+0x22/0x170
[   45.424806]  process_backlog+0x242/0x6d0
[   45.425116]  __napi_poll+0xbb/0x630
[   45.425394]  net_rx_action+0x4d1/0xcc0
[   45.427613]  handle_softirqs+0x1a4/0x580
[   45.427926]  do_softirq+0x74/0x90
[   45.428196]  </IRQ>

This issue was found by syzkaller.

The panic happens in br_dev_queue_push_xmit() once it receives a
corrupted skb with ETH header already pushed in linear data. When it
attempts the skb_push() call, there's not enough headroom and
skb_push() panics.

The corrupted skb is put on the queue by HSR layer, which makes a
sequence of unintended transformations when it receives a specific
corrupted HSR frame (with incomplete TAG).

Fix it by dropping and consuming frames that are not long enough to
contain both ethernet and hsr headers.

Alternative fix would be to check for enough headroom before skb_push()
in br_dev_queue_push_xmit().

In the reproducer, this is injected via AF_PACKET, but I don't easily
see why it couldn't be sent over the wire from adjacent network.

Further Details:

In the reproducer, the following network interface chain is set up:

┌────────────────┐   ┌────────────────┐
│ veth0_to_hsr   ├───┤  hsr_slave0    ┼───┐
└────────────────┘   └────────────────┘   │
                                          │ ┌──────┐
                                          ├─┤ hsr0 ├───┐
                                          │ └──────┘   │
┌────────────────┐   ┌────────────────┐   │            │┌────────┐
│ veth1_to_hsr   ┼───┤  hsr_slave1    ├───┘            └┤        │
└────────────────┘   └────────────────┘                ┌┼ bridge │
                                                       ││        │
                                                       │└────────┘
                                                       │
                                        ┌───────┐      │
                                        │  ...  ├──────┘
                                        └───────┘

To trigger the events leading up to crash, reproducer sends a corrupted
HSR frame with incomplete TAG, via AF_PACKET socket on 'veth0_to_hsr'.

The first HSR-layer function to process this frame is
hsr_handle_frame(). It and then checks if the
protocol is ETH_P_PRP or ETH_P_HSR. If it is, it calls
skb_set_network_header(skb, ETH_HLEN + HSR_HLEN), without checking that
the skb is long enough. For the crashing frame it is not, and hence the
skb->network_header and skb->mac_len fields are set incorrectly,
pointing after the end of the linear buffer.

I will call this a BUG#1 and it is what is addressed by this patch. In
the crashing scenario before the fix, the skb continues to go down the
hsr path as follows.

hsr_handle_frame() then calls this sequence
hsr_forward_skb()
  fill_frame_info()
    hsr->proto_ops->fill_frame_info()
      hsr_fill_frame_info()

hsr_fill_frame_info() contains a check that intends to check whether the
skb actually contains the HSR header. But the check relies on the
skb->mac_len field which was erroneously setup due to BUG#1, so the
check passes and the execution continues  back in the hsr_forward_skb():

hsr_forward_skb()
  hsr_forward_do()
    hsr->proto_ops->get_untagged_frame()
      hsr_get_untagged_frame()
        create_stripped_skb_hsr()

In create_stripped_skb_hsr(), a copy of the skb is created and is
further corrupted by operation that attempts to strip the HSR tag in a
call to __pskb_copy().

The skb enters create_stripped_skb_hsr() with ethernet header pushed in
linear buffer. The skb_pull(skb_in, HSR_HLEN) thus pulls 6 bytes of
ethernet header into the headroom, creating skb_in with a headroom of
size 8. The subsequent __pskb_copy() then creates an skb with headroom
of just 2 and skb->len of just 12, this is how it looks after the copy:

gdb) p skb->len
$10 = 12
(gdb) p skb->data
$11 = (unsigned char *) 0xffff888041e45382 "\252\252\252\252\252!\210\373",
(gdb) p skb->head
$12 = (unsigned char *) 0xffff888041e45380 ""

It seems create_stripped_skb_hsr() assumes that ETH header is pulled
in the headroom when it's entered, because it just pulls HSR header on
top. But that is not the case in our code-path and we end up with the
corrupted skb instead. I will call this BUG#2

*I got confused here because it seems that under no conditions can
create_stripped_skb_hsr() work well, the assumption it makes is not true
during the processing of hsr frames - since the skb_push() in
hsr_handle_frame to skb_pull in hsr_deliver_master(). I wonder whether I
missed something here.*

Next, the execution arrives in hsr_deliver_master(). It calls
skb_pull(ETH_HLEN), which just returns NULL - the SKB does not have
enough space for the pull (as it only has 12 bytes in total at this
point).

*The skb_pull() here further suggests that ethernet header is meant
to be pushed through the whole hsr processing and
create_stripped_skb_hsr() should pull it before doing the HSR header
pull.*

hsr_deliver_master() then puts the corrupted skb on the queue, it is
then picked up from there by bridge frame handling layer and finally
lands in br_dev_queue_push_xmit where it panics.

Cc: stable@kernel.org
Fixes: 48b491a ("net: hsr: fix mac_len checks")
Reported-by: syzbot+a81f2759d022496b40ab@syzkaller.appspotmail.com
Signed-off-by: Jakub Acs <acsjakub@amazon.de>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20250819082842.94378-1-acsjakub@amazon.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
@micromaomao micromaomao force-pushed the 9pfs-landlock-fix branch 3 times, most recently from ac0ce1c to d2698b4 Compare September 2, 2025 20:12
@micromaomao micromaomao requested a review from Copilot September 2, 2025 20:14
Copilot

This comment was marked as outdated.

@micromaomao micromaomao requested a review from Copilot September 2, 2025 20:51
Copilot

This comment was marked as outdated.

@micromaomao micromaomao force-pushed the 9pfs-landlock-fix branch 2 times, most recently from 1ea5ef7 to 7ef55d4 Compare September 4, 2025 00:17
The intention of this patch is to allow features like Landlock and
fanotify (inode mark mode) to work on uncached 9pfs.  These features rely
on holding a specific inode and handling further access to the same file
(as identified by that inode), however, currently in uncached mode, we
always get a new inode on each access, due to concerns regarding server
side inode number collision.

On cached mode (either CACHE_LOOSE or CACHE_META), inode is already reused
only by looking at the qid (server-side inode number).  Since introducing
this additional check would regress hard links (as they will have
different path, and thus the two ends of a hard link won't be the same
inode anymore under this approach), this won't be done for cached mode.

Currently this patch doesn't actually have any effect - the next commit
will introduce a config option to control inodeident=path enablement and
default it to on for uncached mode.

Signed-off-by: Tingmao Wang <m@maowtm.org>
Cc: "Mickaël Salaün" <mic@digikod.net>
Cc: "Günther Noack" <gnoack@google.com>
Closes: landlock-lsm#45

---
Changes since v1:
- Assume inodeident=path will not be set in cached mode.

- Fix various issues (rcu usage etc) in ino_path.c with feedback from Al
  Viro and Mickaël Salaün

- Use d_same_name instead of strncmp

- Instead of changing v9fs_test_new_inode_dotl to add the path check (thus
  hijacking the meaning of "new" to actually mean "uncached"), we add the
  path check (conditional on the right flags in v9ses) to the cached test
  function (v9fs_test_inode_dotl) and use that function for both cached
  and uncached mode, by adding additional conditionals within in for the
  version/generation check.  The v9fs_test_new_inode_dotl function is thus
  used only for mknod, mkdir and atomic_open in the "need to create" case.

- Instead of never reusing inode if path-based ident is not enabled, we
  always reuse in uncached mode, but if path-based ident is not enabled,
  we don't check the path.  This makes the code easier to reason about,
  and gets rid of the complexity of having to support two quite different
  mode of operation (re-using and not re-using inodes).

- Fix crash due to uninitialized v9inode->path when inode is allocated
  then immediately deallocated in iget5_locked as a result of two iget
  racing with each other to insert the inode.  Spotted via xfstests.

- Don't allocate v9fs_ino_path within v9fs_set_inode_dotl, as iget5_locked
  specifies that it can't sleep.  Doing so means that we need to handle a
  special case of inode being created and hashed into the inode list, and
  thus may be tested by another iget5_locked call, but its v9inode->path
  has not been populated yet.  This is resolved via waiting for
  iget5_locked to return before checking the path.  This edge case was
  spotted via xfstests.
By this point we have two ways to test for inode reuse - qid and qid+path.
By default, uncached mode uses qid+path and cached mode uses qid (and in
fact does not support qid+path).  This patch adds the option to control
the behaviour for uncached mode.

In a future version, if we can negotiate with the server and be sure that
it won't give us duplicate qid.path, the default for those cases can be
qid-based.

Signed-off-by: Tingmao Wang <m@maowtm.org>
Cc: "Mickaël Salaün" <mic@digikod.net>
Cc: "Günther Noack" <gnoack@google.com>

---
Changes since v1:
- Removed inodeident=none and instead supports inodeident=qid.  This means
  that there is no longer an option to not re-use inodes at all.

- No longer supports inodeident=path on cached mode, checks added at
  option init time.

- Added explicit bits for both V9FS_INODE_IDENT_PATH and
  V9FS_INODE_IDENT_QID, in order to set a default based on cache bits when
  neither are set explicitly by the user.
This replicates the earlier .L patch for non-.L, and removing some
previously inserted conditionals in shared code.

Signed-off-by: Tingmao Wang <m@maowtm.org>

---
Changes since v1:
- Reflect v2 changes to the .L counterpart of this.
This uses the stat struct we already got as part of lookup process to
refresh the inode "for free".

Only for uncached mode for now.  We will need to revisit this for cached
mode once we sort out reusing an old inode with changed qid.version.
(Currently this is not done in this series, which would be fine unless
server side change happens, which is not supposed to happen in cached mode
anyway)

Note that v9fs_test_inode_dotl already makes sure we don't reuse
inodes of the wrong type or different qid.

Signed-off-by: Tingmao Wang <m@maowtm.org>

---
Changes since v1:
- Check cache bits instead of using `new` - uncached mode now also have
  new=0.
This replicates the previous .L commit for non-.L

Signed-off-by: Tingmao Wang <m@maowtm.org>

---
Changes since v1:
- Check cache bits instead of using `new` - uncached mode now also have
  new=0.
This makes it possible for the inode to "move along" to the new location
when a file under a inodeident=path 9pfs is moved, and it will be reused
on next access to the new location.

Modifying the ino_path of children when renaming a directory is currently
not handled.  Renaming non-empty directories still work, but the children
won't have their the inodes be reused after renaming.

Inodes will also not be reused on server-side rename, since there is no
way for us to know about it.  From our perspective this is
indistinguishable from a new file being created in the destination that
just happened to have the same qid, and the original file being deleted.

Signed-off-by: Tingmao Wang <m@maowtm.org>
Cc: "Mickaël Salaün" <mic@digikod.net>
Cc: "Günther Noack" <gnoack@google.com>

---
New patch in v2
Add a row for this option in the Options table.

Signed-off-by: Tingmao Wang <m@maowtm.org>

---
New patch in v2
FUSE is already supported but v9fs requires some fixes [1].

These tests require /mnt/test-v9fs and /mnt/test-fuse to be already
mounted.  It would be too complex to set up such mount points with the
test fixtures because of the related daemons' lifetime, but it is easy
to run these tests with check-linux.sh from landlock-test-tools [2].

Add new kernel configurations to support these two new filesystems.

Rename and update path_is_fs() to take a path as argument.

[tingmao: pulled this patch into this series per discussion with Mickaël, original link is:]
Link: https://lore.kernel.org/all/20250704171345.1393451-1-mic@digikod.net/
Cc: Christian Schoenebeck <linux_oss@crudebyte.com>
Cc: Dominique Martinet <asmadeus@codewreck.org>
Cc: Eric Van Hensbergen <ericvh@kernel.org>
Cc: Günther Noack <gnoack@google.com>
Cc: Latchesar Ionkov <lucho@ionkov.net>
Cc: Tingmao Wang <m@maowtm.org>
Link: https://lore.kernel.org/r/cover.1743971855.git.m@maowtm.org [1]
Link: landlock-lsm/landlock-test-tools#21 [2]
Signed-off-by: Mickaël Salaün <mic@digikod.net>
Tested-by: Tingmao Wang <m@maowtm.org>
To: "Mickaël Salaün" <mic@digikod.net>
To: "Günther Noack" <gnoack@google.com>
Signed-off-by: Tingmao Wang <m@maowtm.org>
Since the path-based inode identification feature needs special handling
for renames, we add some Landlock tests for renames on custom filesystems.

As part of this, also standardize the file path used in existing
layout3_fs variants to be file1_s1d1 (unless they are per-fs fixed paths).
This is so that we can rename this file to e.g. file2_s1d1 or file1_s2d1
(by renaming the parent dir).  This is an optional change - we could
always add an extra file creation in fixture setup for layout3_fs, but
since there's no particular reason to use "dir/file" as the test file, we
just make it use the "standard" name used in the rest of this fs_test.

To: "Mickaël Salaün" <mic@digikod.net>
To: "Günther Noack" <gnoack@google.com>
Signed-off-by: Tingmao Wang <m@maowtm.org>
Copy link

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR introduces path-based inode identification for 9p filesystems as an alternative to the existing qid-based approach. The primary motivation is to support security frameworks like Landlock and fanotify that rely on inode identity, while handling cases where servers may present qid collisions or when remote renames occur.

  • Adds inodeident=path mount option that tracks file paths internally to determine inode reuse eligibility
  • Implements path comparison logic using stored path snapshots for inode matching
  • Updates inode lookup functions throughout the 9p codebase to support path-based identification

Reviewed Changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
tools/testing/selftests/landlock/fs_test.c Adds comprehensive test coverage for rename operations across multiple filesystems including new 9p and FUSE variants
tools/testing/selftests/landlock/config Enables required kernel config options for 9p and FUSE filesystem testing
fs/9p/vfs_super.c Initializes root inode path for path-based identification mode
fs/9p/vfs_inode_dotl.c Major refactoring of inode lookup to support path comparison and handle race conditions
fs/9p/vfs_inode.c Updates non-dotl inode operations for path-based identification and rename handling
fs/9p/v9fs.h Defines new data structures and function signatures for path-based inode identification
fs/9p/v9fs.c Adds mount option parsing and validation for inodeident parameter
fs/9p/ino_path.c New file implementing path snapshot creation, comparison, and cleanup operations
fs/9p/Makefile Includes new ino_path.c compilation unit
Documentation/filesystems/9p.rst Documents the new inodeident mount option and its behavior

Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.

Comment on lines +104 to +105
"Refusing to reuse inode %p based on path mismatch",
inode);
Copy link
Preview

Copilot AI Sep 6, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[nitpick] The debug message format is inconsistent with other debug messages in the function. Consider adding the path information to make debugging easier: "Refusing to reuse inode %p (path mismatch for %pd)" and include the dentry parameter.

Suggested change
"Refusing to reuse inode %p based on path mismatch",
inode);
"Refusing to reuse inode %p (path mismatch for %pd)",
inode, dentry);

Copilot uses AI. Check for mistakes.

if (!inode || (inode->i_state & I_NEW)) {
WARN_ONCE(
1,
"Expected iget5_locked to return an existing inode");
Copy link
Preview

Copilot AI Sep 6, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The error message is unclear about what went wrong. Consider expanding it to: "Expected iget5_locked to return an existing inode during double-check, got new inode instead" to better explain the unexpected condition.

Suggested change
"Expected iget5_locked to return an existing inode");
"Expected iget5_locked to return an existing inode during double-check, got new inode instead");

Copilot uses AI. Check for mistakes.


/* Don't include the root dentry */
while (curr->d_parent != curr) {
if (WARN_ON_ONCE(path_components >= SSIZE_MAX)) {
Copy link
Preview

Copilot AI Sep 6, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using SSIZE_MAX as the limit is incorrect since path_components is declared as size_t (unsigned) but compared against a signed maximum. This should be SIZE_MAX or the variable should be ssize_t.

Suggested change
if (WARN_ON_ONCE(path_components >= SSIZE_MAX)) {
if (WARN_ON_ONCE(path_components >= SIZE_MAX)) {

Copilot uses AI. Check for mistakes.

return false;
}
compare = &ino_path->names[i];
if (!d_same_name(curr, curr->d_parent, &compare->name)) {
Copy link
Preview

Copilot AI Sep 6, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The d_same_name function expects a parent dentry as the second parameter, but here curr->d_parent is passed while curr is the child. The parameter order should be verified against the function signature to ensure correct usage.

Copilot uses AI. Check for mistakes.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants