-
Notifications
You must be signed in to change notification settings - Fork 57.6k
Update faq.txt #61
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Update faq.txt #61
Conversation
I remove dead links.
Linus does not accept pull requests from github. Send an e-mail to LKML. |
How to send an email to LKML ? http://www.cadsoft.de/people/kls/vdr/ => Not Found. Apologies, but the page you requested could not be found. http://www.metzlerbros.org/dvb/ => Not Found. The requested URL /dvb/ was not found on this server. http://xinehq.de/ => The page is not displayed http://cvs.tuxbox.org/ => projet dead, no code source :( |
As the new x86 CPU bootup printout format code maintainer, I am taking immediate action to improve and clean (and thus indulge my OCD) the reporting of the cores when coming up online. Fix padding to a right-hand alignment, cleanup code and bind reporting width to the max number of supported CPUs on the system, like this: [ 0.074509] smpboot: Booting Node 0, Processors: #1 #2 #3 #4 #5 torvalds#6 torvalds#7 OK [ 0.644008] smpboot: Booting Node 1, Processors: torvalds#8 torvalds#9 torvalds#10 torvalds#11 torvalds#12 torvalds#13 torvalds#14 torvalds#15 OK [ 1.245006] smpboot: Booting Node 2, Processors: torvalds#16 torvalds#17 torvalds#18 torvalds#19 torvalds#20 torvalds#21 torvalds#22 torvalds#23 OK [ 1.864005] smpboot: Booting Node 3, Processors: torvalds#24 torvalds#25 torvalds#26 torvalds#27 torvalds#28 torvalds#29 torvalds#30 torvalds#31 OK [ 2.489005] smpboot: Booting Node 4, Processors: torvalds#32 torvalds#33 torvalds#34 torvalds#35 torvalds#36 torvalds#37 torvalds#38 torvalds#39 OK [ 3.093005] smpboot: Booting Node 5, Processors: torvalds#40 torvalds#41 torvalds#42 torvalds#43 torvalds#44 torvalds#45 torvalds#46 torvalds#47 OK [ 3.698005] smpboot: Booting Node 6, Processors: torvalds#48 torvalds#49 torvalds#50 torvalds#51 #52 #53 torvalds#54 torvalds#55 OK [ 4.304005] smpboot: Booting Node 7, Processors: torvalds#56 torvalds#57 #58 torvalds#59 torvalds#60 torvalds#61 torvalds#62 torvalds#63 OK [ 4.961413] Brought up 64 CPUs and this: [ 0.072367] smpboot: Booting Node 0, Processors: #1 #2 #3 #4 #5 torvalds#6 torvalds#7 OK [ 0.686329] Brought up 8 CPUs Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Libin <huawei.libin@huawei.com> Cc: wangyijing@huawei.com Cc: fenghua.yu@intel.com Cc: guohanjun@huawei.com Cc: paul.gortmaker@windriver.com Link: http://lkml.kernel.org/r/20130927143554.GF4422@pd.tnic Signed-off-by: Ingo Molnar <mingo@kernel.org>
Turn it into (for example): [ 0.073380] x86: Booting SMP configuration: [ 0.074005] .... node #0, CPUs: #1 #2 #3 #4 #5 torvalds#6 torvalds#7 [ 0.603005] .... node #1, CPUs: torvalds#8 torvalds#9 torvalds#10 torvalds#11 torvalds#12 torvalds#13 torvalds#14 torvalds#15 [ 1.200005] .... node #2, CPUs: torvalds#16 torvalds#17 torvalds#18 torvalds#19 torvalds#20 torvalds#21 torvalds#22 torvalds#23 [ 1.796005] .... node #3, CPUs: torvalds#24 torvalds#25 torvalds#26 torvalds#27 torvalds#28 torvalds#29 torvalds#30 torvalds#31 [ 2.393005] .... node #4, CPUs: torvalds#32 torvalds#33 torvalds#34 torvalds#35 torvalds#36 torvalds#37 torvalds#38 torvalds#39 [ 2.996005] .... node #5, CPUs: torvalds#40 torvalds#41 torvalds#42 torvalds#43 torvalds#44 torvalds#45 torvalds#46 torvalds#47 [ 3.600005] .... node torvalds#6, CPUs: torvalds#48 torvalds#49 torvalds#50 torvalds#51 #52 #53 torvalds#54 torvalds#55 [ 4.202005] .... node torvalds#7, CPUs: torvalds#56 torvalds#57 #58 torvalds#59 torvalds#60 torvalds#61 torvalds#62 torvalds#63 [ 4.811005] .... node torvalds#8, CPUs: torvalds#64 torvalds#65 torvalds#66 torvalds#67 torvalds#68 torvalds#69 #70 torvalds#71 [ 5.421006] .... node torvalds#9, CPUs: torvalds#72 torvalds#73 torvalds#74 torvalds#75 torvalds#76 torvalds#77 torvalds#78 torvalds#79 [ 6.032005] .... node torvalds#10, CPUs: torvalds#80 torvalds#81 torvalds#82 torvalds#83 torvalds#84 torvalds#85 torvalds#86 torvalds#87 [ 6.648006] .... node torvalds#11, CPUs: torvalds#88 torvalds#89 torvalds#90 torvalds#91 torvalds#92 torvalds#93 torvalds#94 torvalds#95 [ 7.262005] .... node torvalds#12, CPUs: torvalds#96 torvalds#97 torvalds#98 torvalds#99 torvalds#100 torvalds#101 torvalds#102 torvalds#103 [ 7.865005] .... node torvalds#13, CPUs: torvalds#104 torvalds#105 torvalds#106 torvalds#107 torvalds#108 torvalds#109 torvalds#110 torvalds#111 [ 8.466005] .... node torvalds#14, CPUs: torvalds#112 torvalds#113 torvalds#114 torvalds#115 torvalds#116 torvalds#117 torvalds#118 torvalds#119 [ 9.073006] .... node torvalds#15, CPUs: torvalds#120 torvalds#121 torvalds#122 torvalds#123 torvalds#124 torvalds#125 torvalds#126 torvalds#127 [ 9.679901] x86: Booted up 16 nodes, 128 CPUs and drop useless elements. Change num_digits() to hpa's division-avoiding, cell-phone-typed version which he went at great lengths and pains to submit on a Saturday evening. Signed-off-by: Borislav Petkov <bp@suse.de> Cc: huawei.libin@huawei.com Cc: wangyijing@huawei.com Cc: fenghua.yu@intel.com Cc: guohanjun@huawei.com Cc: paul.gortmaker@windriver.com Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20130930095624.GB16383@pd.tnic Signed-off-by: Ingo Molnar <mingo@kernel.org>
…loop-during-umount-checkpatch-fixes ERROR: code indent should use tabs where possible torvalds#56: FILE: fs/ocfs2/dlm/dlmmaster.c:3087: + if (tmp->type == DLM_MLE_MASTER) {$ WARNING: please, no spaces at the start of a line torvalds#56: FILE: fs/ocfs2/dlm/dlmmaster.c:3087: + if (tmp->type == DLM_MLE_MASTER) {$ WARNING: suspect code indent for conditional statements (23, 31) torvalds#56: FILE: fs/ocfs2/dlm/dlmmaster.c:3087: + if (tmp->type == DLM_MLE_MASTER) { + ret = DLM_MIGRATE_RESPONSE_MASTERY_REF; ERROR: code indent should use tabs where possible torvalds#57: FILE: fs/ocfs2/dlm/dlmmaster.c:3088: + ret = DLM_MIGRATE_RESPONSE_MASTERY_REF;$ WARNING: please, no spaces at the start of a line torvalds#57: FILE: fs/ocfs2/dlm/dlmmaster.c:3088: + ret = DLM_MIGRATE_RESPONSE_MASTERY_REF;$ ERROR: code indent should use tabs where possible #58: FILE: fs/ocfs2/dlm/dlmmaster.c:3089: + mlog(0, "%s:%.*s: master=%u, newmaster=%u, "$ WARNING: please, no spaces at the start of a line #58: FILE: fs/ocfs2/dlm/dlmmaster.c:3089: + mlog(0, "%s:%.*s: master=%u, newmaster=%u, "$ WARNING: quoted string split across lines torvalds#59: FILE: fs/ocfs2/dlm/dlmmaster.c:3090: + mlog(0, "%s:%.*s: master=%u, newmaster=%u, " + "telling master to get ref " ERROR: code indent should use tabs where possible torvalds#59: FILE: fs/ocfs2/dlm/dlmmaster.c:3090: + "telling master to get ref "$ WARNING: please, no spaces at the start of a line torvalds#59: FILE: fs/ocfs2/dlm/dlmmaster.c:3090: + "telling master to get ref "$ WARNING: quoted string split across lines torvalds#60: FILE: fs/ocfs2/dlm/dlmmaster.c:3091: + "telling master to get ref " + "for cleared out mle during " ERROR: code indent should use tabs where possible torvalds#60: FILE: fs/ocfs2/dlm/dlmmaster.c:3091: + "for cleared out mle during "$ WARNING: please, no spaces at the start of a line torvalds#60: FILE: fs/ocfs2/dlm/dlmmaster.c:3091: + "for cleared out mle during "$ WARNING: quoted string split across lines torvalds#61: FILE: fs/ocfs2/dlm/dlmmaster.c:3092: + "for cleared out mle during " + "migration\n", dlm->name, ERROR: code indent should use tabs where possible torvalds#61: FILE: fs/ocfs2/dlm/dlmmaster.c:3092: + "migration\n", dlm->name,$ WARNING: please, no spaces at the start of a line torvalds#61: FILE: fs/ocfs2/dlm/dlmmaster.c:3092: + "migration\n", dlm->name,$ ERROR: code indent should use tabs where possible torvalds#62: FILE: fs/ocfs2/dlm/dlmmaster.c:3093: + namelen, name, master,$ WARNING: please, no spaces at the start of a line torvalds#62: FILE: fs/ocfs2/dlm/dlmmaster.c:3093: + namelen, name, master,$ ERROR: code indent should use tabs where possible torvalds#63: FILE: fs/ocfs2/dlm/dlmmaster.c:3094: + new_master);$ WARNING: please, no spaces at the start of a line torvalds#63: FILE: fs/ocfs2/dlm/dlmmaster.c:3094: + new_master);$ ERROR: code indent should use tabs where possible torvalds#64: FILE: fs/ocfs2/dlm/dlmmaster.c:3095: + }$ WARNING: please, no spaces at the start of a line torvalds#64: FILE: fs/ocfs2/dlm/dlmmaster.c:3095: + }$ total: 9 errors, 13 warnings, 20 lines checked NOTE: whitespace errors detected, you may wish to use scripts/cleanpatch or scripts/cleanfile ./patches/ocfs2-do-not-return-dlm_migrate_response_mastery_ref-to-avoid-endlessloop-during-umount.patch has style problems, please review. If any of these errors are false positives, please report them to the maintainer, see CHECKPATCH in MAINTAINERS. Please run checkpatch prior to sending patches Cc: Joel Becker <jlbec@evilplan.org> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Xue jiufei <xuejiufei@huawei.com> Cc: jiangyiwen <jiangyiwen@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
…loop-during-umount-checkpatch-fixes ERROR: code indent should use tabs where possible torvalds#56: FILE: fs/ocfs2/dlm/dlmmaster.c:3087: + if (tmp->type == DLM_MLE_MASTER) {$ WARNING: please, no spaces at the start of a line torvalds#56: FILE: fs/ocfs2/dlm/dlmmaster.c:3087: + if (tmp->type == DLM_MLE_MASTER) {$ WARNING: suspect code indent for conditional statements (23, 31) torvalds#56: FILE: fs/ocfs2/dlm/dlmmaster.c:3087: + if (tmp->type == DLM_MLE_MASTER) { + ret = DLM_MIGRATE_RESPONSE_MASTERY_REF; ERROR: code indent should use tabs where possible torvalds#57: FILE: fs/ocfs2/dlm/dlmmaster.c:3088: + ret = DLM_MIGRATE_RESPONSE_MASTERY_REF;$ WARNING: please, no spaces at the start of a line torvalds#57: FILE: fs/ocfs2/dlm/dlmmaster.c:3088: + ret = DLM_MIGRATE_RESPONSE_MASTERY_REF;$ ERROR: code indent should use tabs where possible #58: FILE: fs/ocfs2/dlm/dlmmaster.c:3089: + mlog(0, "%s:%.*s: master=%u, newmaster=%u, "$ WARNING: please, no spaces at the start of a line #58: FILE: fs/ocfs2/dlm/dlmmaster.c:3089: + mlog(0, "%s:%.*s: master=%u, newmaster=%u, "$ WARNING: quoted string split across lines torvalds#59: FILE: fs/ocfs2/dlm/dlmmaster.c:3090: + mlog(0, "%s:%.*s: master=%u, newmaster=%u, " + "telling master to get ref " ERROR: code indent should use tabs where possible torvalds#59: FILE: fs/ocfs2/dlm/dlmmaster.c:3090: + "telling master to get ref "$ WARNING: please, no spaces at the start of a line torvalds#59: FILE: fs/ocfs2/dlm/dlmmaster.c:3090: + "telling master to get ref "$ WARNING: quoted string split across lines torvalds#60: FILE: fs/ocfs2/dlm/dlmmaster.c:3091: + "telling master to get ref " + "for cleared out mle during " ERROR: code indent should use tabs where possible torvalds#60: FILE: fs/ocfs2/dlm/dlmmaster.c:3091: + "for cleared out mle during "$ WARNING: please, no spaces at the start of a line torvalds#60: FILE: fs/ocfs2/dlm/dlmmaster.c:3091: + "for cleared out mle during "$ WARNING: quoted string split across lines torvalds#61: FILE: fs/ocfs2/dlm/dlmmaster.c:3092: + "for cleared out mle during " + "migration\n", dlm->name, ERROR: code indent should use tabs where possible torvalds#61: FILE: fs/ocfs2/dlm/dlmmaster.c:3092: + "migration\n", dlm->name,$ WARNING: please, no spaces at the start of a line torvalds#61: FILE: fs/ocfs2/dlm/dlmmaster.c:3092: + "migration\n", dlm->name,$ ERROR: code indent should use tabs where possible torvalds#62: FILE: fs/ocfs2/dlm/dlmmaster.c:3093: + namelen, name, master,$ WARNING: please, no spaces at the start of a line torvalds#62: FILE: fs/ocfs2/dlm/dlmmaster.c:3093: + namelen, name, master,$ ERROR: code indent should use tabs where possible torvalds#63: FILE: fs/ocfs2/dlm/dlmmaster.c:3094: + new_master);$ WARNING: please, no spaces at the start of a line torvalds#63: FILE: fs/ocfs2/dlm/dlmmaster.c:3094: + new_master);$ ERROR: code indent should use tabs where possible torvalds#64: FILE: fs/ocfs2/dlm/dlmmaster.c:3095: + }$ WARNING: please, no spaces at the start of a line torvalds#64: FILE: fs/ocfs2/dlm/dlmmaster.c:3095: + }$ total: 9 errors, 13 warnings, 20 lines checked NOTE: whitespace errors detected, you may wish to use scripts/cleanpatch or scripts/cleanfile ./patches/ocfs2-do-not-return-dlm_migrate_response_mastery_ref-to-avoid-endlessloop-during-umount.patch has style problems, please review. If any of these errors are false positives, please report them to the maintainer, see CHECKPATCH in MAINTAINERS. Please run checkpatch prior to sending patches Cc: Joel Becker <jlbec@evilplan.org> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Xue jiufei <xuejiufei@huawei.com> Cc: jiangyiwen <jiangyiwen@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
The scenario here is that someone calls enable_irq_wake() from somewhere in the code. This will result in the lockdep producing a backtrace as can be seen below. In my case, this problem is triggered when using the wl1271 (TI WlCore) driver found in drivers/net/wireless/ti/ . The problem cause is rather obvious from the backtrace, but let's outline the dependency. enable_irq_wake() grabs the IRQ buslock in irq_set_irq_wake(), which in turns calls mxs_gpio_set_wake_irq() . But mxs_gpio_set_wake_irq() calls enable_irq_wake() again on the one-level-higher IRQ , thus it tries to grab the IRQ buslock again in irq_set_irq_wake() . Because the spinlock in irq_set_irq_wake()->irq_get_desc_buslock()->__irq_get_desc_lock() is not marked as recursive, lockdep will spew the stuff below. We know we can safely re-enter the lock, so use IRQ_GC_INIT_NESTED_LOCK to fix the spew. ============================================= [ INFO: possible recursive locking detected ] 3.10.33-00012-gf06b763-dirty torvalds#61 Not tainted --------------------------------------------- kworker/0:1/18 is trying to acquire lock: (&irq_desc_lock_class){-.-...}, at: [<c00685f0>] __irq_get_desc_lock+0x48/0x88 but task is already holding lock: (&irq_desc_lock_class){-.-...}, at: [<c00685f0>] __irq_get_desc_lock+0x48/0x88 other info that might help us debug this: Possible unsafe locking scenario: CPU0 ---- lock(&irq_desc_lock_class); lock(&irq_desc_lock_class); *** DEADLOCK *** May be due to missing lock nesting notation 3 locks held by kworker/0:1/18: #0: (events){.+.+.+}, at: [<c0036308>] process_one_work+0x134/0x4a4 #1: ((&fw_work->work)){+.+.+.}, at: [<c0036308>] process_one_work+0x134/0x4a4 #2: (&irq_desc_lock_class){-.-...}, at: [<c00685f0>] __irq_get_desc_lock+0x48/0x88 stack backtrace: CPU: 0 PID: 18 Comm: kworker/0:1 Not tainted 3.10.33-00012-gf06b763-dirty torvalds#61 Workqueue: events request_firmware_work_func [<c0013eb4>] (unwind_backtrace+0x0/0xf0) from [<c0011c74>] (show_stack+0x10/0x14) [<c0011c74>] (show_stack+0x10/0x14) from [<c005bb08>] (__lock_acquire+0x140c/0x1a64) [<c005bb08>] (__lock_acquire+0x140c/0x1a64) from [<c005c6a8>] (lock_acquire+0x9c/0x104) [<c005c6a8>] (lock_acquire+0x9c/0x104) from [<c051d5a4>] (_raw_spin_lock_irqsave+0x44/0x58) [<c051d5a4>] (_raw_spin_lock_irqsave+0x44/0x58) from [<c00685f0>] (__irq_get_desc_lock+0x48/0x88) [<c00685f0>] (__irq_get_desc_lock+0x48/0x88) from [<c0068e78>] (irq_set_irq_wake+0x20/0xf4) [<c0068e78>] (irq_set_irq_wake+0x20/0xf4) from [<c027260c>] (mxs_gpio_set_wake_irq+0x1c/0x24) [<c027260c>] (mxs_gpio_set_wake_irq+0x1c/0x24) from [<c0068cf4>] (set_irq_wake_real+0x30/0x44) [<c0068cf4>] (set_irq_wake_real+0x30/0x44) from [<c0068ee4>] (irq_set_irq_wake+0x8c/0xf4) [<c0068ee4>] (irq_set_irq_wake+0x8c/0xf4) from [<c0310748>] (wlcore_nvs_cb+0x10c/0x97c) [<c0310748>] (wlcore_nvs_cb+0x10c/0x97c) from [<c02be5e8>] (request_firmware_work_func+0x38/0x58) [<c02be5e8>] (request_firmware_work_func+0x38/0x58) from [<c0036394>] (process_one_work+0x1c0/0x4a4) [<c0036394>] (process_one_work+0x1c0/0x4a4) from [<c0036a4c>] (worker_thread+0x138/0x394) [<c0036a4c>] (worker_thread+0x138/0x394) from [<c003cb74>] (kthread+0xa4/0xb0) [<c003cb74>] (kthread+0xa4/0xb0) from [<c000ee00>] (ret_from_fork+0x14/0x34) wlcore: loaded Cc: stable@vger.kernel.org Signed-off-by: Marek Vasut <marex@denx.de> Acked-by: Shawn Guo <shawn.guo@linaro.org> Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
commit a585f87 upstream. The scenario here is that someone calls enable_irq_wake() from somewhere in the code. This will result in the lockdep producing a backtrace as can be seen below. In my case, this problem is triggered when using the wl1271 (TI WlCore) driver found in drivers/net/wireless/ti/ . The problem cause is rather obvious from the backtrace, but let's outline the dependency. enable_irq_wake() grabs the IRQ buslock in irq_set_irq_wake(), which in turns calls mxs_gpio_set_wake_irq() . But mxs_gpio_set_wake_irq() calls enable_irq_wake() again on the one-level-higher IRQ , thus it tries to grab the IRQ buslock again in irq_set_irq_wake() . Because the spinlock in irq_set_irq_wake()->irq_get_desc_buslock()->__irq_get_desc_lock() is not marked as recursive, lockdep will spew the stuff below. We know we can safely re-enter the lock, so use IRQ_GC_INIT_NESTED_LOCK to fix the spew. ============================================= [ INFO: possible recursive locking detected ] 3.10.33-00012-gf06b763-dirty torvalds#61 Not tainted --------------------------------------------- kworker/0:1/18 is trying to acquire lock: (&irq_desc_lock_class){-.-...}, at: [<c00685f0>] __irq_get_desc_lock+0x48/0x88 but task is already holding lock: (&irq_desc_lock_class){-.-...}, at: [<c00685f0>] __irq_get_desc_lock+0x48/0x88 other info that might help us debug this: Possible unsafe locking scenario: CPU0 ---- lock(&irq_desc_lock_class); lock(&irq_desc_lock_class); *** DEADLOCK *** May be due to missing lock nesting notation 3 locks held by kworker/0:1/18: #0: (events){.+.+.+}, at: [<c0036308>] process_one_work+0x134/0x4a4 #1: ((&fw_work->work)){+.+.+.}, at: [<c0036308>] process_one_work+0x134/0x4a4 #2: (&irq_desc_lock_class){-.-...}, at: [<c00685f0>] __irq_get_desc_lock+0x48/0x88 stack backtrace: CPU: 0 PID: 18 Comm: kworker/0:1 Not tainted 3.10.33-00012-gf06b763-dirty torvalds#61 Workqueue: events request_firmware_work_func [<c0013eb4>] (unwind_backtrace+0x0/0xf0) from [<c0011c74>] (show_stack+0x10/0x14) [<c0011c74>] (show_stack+0x10/0x14) from [<c005bb08>] (__lock_acquire+0x140c/0x1a64) [<c005bb08>] (__lock_acquire+0x140c/0x1a64) from [<c005c6a8>] (lock_acquire+0x9c/0x104) [<c005c6a8>] (lock_acquire+0x9c/0x104) from [<c051d5a4>] (_raw_spin_lock_irqsave+0x44/0x58) [<c051d5a4>] (_raw_spin_lock_irqsave+0x44/0x58) from [<c00685f0>] (__irq_get_desc_lock+0x48/0x88) [<c00685f0>] (__irq_get_desc_lock+0x48/0x88) from [<c0068e78>] (irq_set_irq_wake+0x20/0xf4) [<c0068e78>] (irq_set_irq_wake+0x20/0xf4) from [<c027260c>] (mxs_gpio_set_wake_irq+0x1c/0x24) [<c027260c>] (mxs_gpio_set_wake_irq+0x1c/0x24) from [<c0068cf4>] (set_irq_wake_real+0x30/0x44) [<c0068cf4>] (set_irq_wake_real+0x30/0x44) from [<c0068ee4>] (irq_set_irq_wake+0x8c/0xf4) [<c0068ee4>] (irq_set_irq_wake+0x8c/0xf4) from [<c0310748>] (wlcore_nvs_cb+0x10c/0x97c) [<c0310748>] (wlcore_nvs_cb+0x10c/0x97c) from [<c02be5e8>] (request_firmware_work_func+0x38/0x58) [<c02be5e8>] (request_firmware_work_func+0x38/0x58) from [<c0036394>] (process_one_work+0x1c0/0x4a4) [<c0036394>] (process_one_work+0x1c0/0x4a4) from [<c0036a4c>] (worker_thread+0x138/0x394) [<c0036a4c>] (worker_thread+0x138/0x394) from [<c003cb74>] (kthread+0xa4/0xb0) [<c003cb74>] (kthread+0xa4/0xb0) from [<c000ee00>] (ret_from_fork+0x14/0x34) wlcore: loaded Signed-off-by: Marek Vasut <marex@denx.de> Acked-by: Shawn Guo <shawn.guo@linaro.org> Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit a585f87 upstream. The scenario here is that someone calls enable_irq_wake() from somewhere in the code. This will result in the lockdep producing a backtrace as can be seen below. In my case, this problem is triggered when using the wl1271 (TI WlCore) driver found in drivers/net/wireless/ti/ . The problem cause is rather obvious from the backtrace, but let's outline the dependency. enable_irq_wake() grabs the IRQ buslock in irq_set_irq_wake(), which in turns calls mxs_gpio_set_wake_irq() . But mxs_gpio_set_wake_irq() calls enable_irq_wake() again on the one-level-higher IRQ , thus it tries to grab the IRQ buslock again in irq_set_irq_wake() . Because the spinlock in irq_set_irq_wake()->irq_get_desc_buslock()->__irq_get_desc_lock() is not marked as recursive, lockdep will spew the stuff below. We know we can safely re-enter the lock, so use IRQ_GC_INIT_NESTED_LOCK to fix the spew. ============================================= [ INFO: possible recursive locking detected ] 3.10.33-00012-gf06b763-dirty torvalds#61 Not tainted --------------------------------------------- kworker/0:1/18 is trying to acquire lock: (&irq_desc_lock_class){-.-...}, at: [<c00685f0>] __irq_get_desc_lock+0x48/0x88 but task is already holding lock: (&irq_desc_lock_class){-.-...}, at: [<c00685f0>] __irq_get_desc_lock+0x48/0x88 other info that might help us debug this: Possible unsafe locking scenario: CPU0 ---- lock(&irq_desc_lock_class); lock(&irq_desc_lock_class); *** DEADLOCK *** May be due to missing lock nesting notation 3 locks held by kworker/0:1/18: #0: (events){.+.+.+}, at: [<c0036308>] process_one_work+0x134/0x4a4 #1: ((&fw_work->work)){+.+.+.}, at: [<c0036308>] process_one_work+0x134/0x4a4 #2: (&irq_desc_lock_class){-.-...}, at: [<c00685f0>] __irq_get_desc_lock+0x48/0x88 stack backtrace: CPU: 0 PID: 18 Comm: kworker/0:1 Not tainted 3.10.33-00012-gf06b763-dirty torvalds#61 Workqueue: events request_firmware_work_func [<c0013eb4>] (unwind_backtrace+0x0/0xf0) from [<c0011c74>] (show_stack+0x10/0x14) [<c0011c74>] (show_stack+0x10/0x14) from [<c005bb08>] (__lock_acquire+0x140c/0x1a64) [<c005bb08>] (__lock_acquire+0x140c/0x1a64) from [<c005c6a8>] (lock_acquire+0x9c/0x104) [<c005c6a8>] (lock_acquire+0x9c/0x104) from [<c051d5a4>] (_raw_spin_lock_irqsave+0x44/0x58) [<c051d5a4>] (_raw_spin_lock_irqsave+0x44/0x58) from [<c00685f0>] (__irq_get_desc_lock+0x48/0x88) [<c00685f0>] (__irq_get_desc_lock+0x48/0x88) from [<c0068e78>] (irq_set_irq_wake+0x20/0xf4) [<c0068e78>] (irq_set_irq_wake+0x20/0xf4) from [<c027260c>] (mxs_gpio_set_wake_irq+0x1c/0x24) [<c027260c>] (mxs_gpio_set_wake_irq+0x1c/0x24) from [<c0068cf4>] (set_irq_wake_real+0x30/0x44) [<c0068cf4>] (set_irq_wake_real+0x30/0x44) from [<c0068ee4>] (irq_set_irq_wake+0x8c/0xf4) [<c0068ee4>] (irq_set_irq_wake+0x8c/0xf4) from [<c0310748>] (wlcore_nvs_cb+0x10c/0x97c) [<c0310748>] (wlcore_nvs_cb+0x10c/0x97c) from [<c02be5e8>] (request_firmware_work_func+0x38/0x58) [<c02be5e8>] (request_firmware_work_func+0x38/0x58) from [<c0036394>] (process_one_work+0x1c0/0x4a4) [<c0036394>] (process_one_work+0x1c0/0x4a4) from [<c0036a4c>] (worker_thread+0x138/0x394) [<c0036a4c>] (worker_thread+0x138/0x394) from [<c003cb74>] (kthread+0xa4/0xb0) [<c003cb74>] (kthread+0xa4/0xb0) from [<c000ee00>] (ret_from_fork+0x14/0x34) wlcore: loaded Signed-off-by: Marek Vasut <marex@denx.de> Acked-by: Shawn Guo <shawn.guo@linaro.org> Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit a585f87 upstream. The scenario here is that someone calls enable_irq_wake() from somewhere in the code. This will result in the lockdep producing a backtrace as can be seen below. In my case, this problem is triggered when using the wl1271 (TI WlCore) driver found in drivers/net/wireless/ti/ . The problem cause is rather obvious from the backtrace, but let's outline the dependency. enable_irq_wake() grabs the IRQ buslock in irq_set_irq_wake(), which in turns calls mxs_gpio_set_wake_irq() . But mxs_gpio_set_wake_irq() calls enable_irq_wake() again on the one-level-higher IRQ , thus it tries to grab the IRQ buslock again in irq_set_irq_wake() . Because the spinlock in irq_set_irq_wake()->irq_get_desc_buslock()->__irq_get_desc_lock() is not marked as recursive, lockdep will spew the stuff below. We know we can safely re-enter the lock, so use IRQ_GC_INIT_NESTED_LOCK to fix the spew. ============================================= [ INFO: possible recursive locking detected ] 3.10.33-00012-gf06b763-dirty #61 Not tainted --------------------------------------------- kworker/0:1/18 is trying to acquire lock: (&irq_desc_lock_class){-.-...}, at: [<c00685f0>] __irq_get_desc_lock+0x48/0x88 but task is already holding lock: (&irq_desc_lock_class){-.-...}, at: [<c00685f0>] __irq_get_desc_lock+0x48/0x88 other info that might help us debug this: Possible unsafe locking scenario: CPU0 ---- lock(&irq_desc_lock_class); lock(&irq_desc_lock_class); *** DEADLOCK *** May be due to missing lock nesting notation 3 locks held by kworker/0:1/18: #0: (events){.+.+.+}, at: [<c0036308>] process_one_work+0x134/0x4a4 #1: ((&fw_work->work)){+.+.+.}, at: [<c0036308>] process_one_work+0x134/0x4a4 #2: (&irq_desc_lock_class){-.-...}, at: [<c00685f0>] __irq_get_desc_lock+0x48/0x88 stack backtrace: CPU: 0 PID: 18 Comm: kworker/0:1 Not tainted 3.10.33-00012-gf06b763-dirty #61 Workqueue: events request_firmware_work_func [<c0013eb4>] (unwind_backtrace+0x0/0xf0) from [<c0011c74>] (show_stack+0x10/0x14) [<c0011c74>] (show_stack+0x10/0x14) from [<c005bb08>] (__lock_acquire+0x140c/0x1a64) [<c005bb08>] (__lock_acquire+0x140c/0x1a64) from [<c005c6a8>] (lock_acquire+0x9c/0x104) [<c005c6a8>] (lock_acquire+0x9c/0x104) from [<c051d5a4>] (_raw_spin_lock_irqsave+0x44/0x58) [<c051d5a4>] (_raw_spin_lock_irqsave+0x44/0x58) from [<c00685f0>] (__irq_get_desc_lock+0x48/0x88) [<c00685f0>] (__irq_get_desc_lock+0x48/0x88) from [<c0068e78>] (irq_set_irq_wake+0x20/0xf4) [<c0068e78>] (irq_set_irq_wake+0x20/0xf4) from [<c027260c>] (mxs_gpio_set_wake_irq+0x1c/0x24) [<c027260c>] (mxs_gpio_set_wake_irq+0x1c/0x24) from [<c0068cf4>] (set_irq_wake_real+0x30/0x44) [<c0068cf4>] (set_irq_wake_real+0x30/0x44) from [<c0068ee4>] (irq_set_irq_wake+0x8c/0xf4) [<c0068ee4>] (irq_set_irq_wake+0x8c/0xf4) from [<c0310748>] (wlcore_nvs_cb+0x10c/0x97c) [<c0310748>] (wlcore_nvs_cb+0x10c/0x97c) from [<c02be5e8>] (request_firmware_work_func+0x38/0x58) [<c02be5e8>] (request_firmware_work_func+0x38/0x58) from [<c0036394>] (process_one_work+0x1c0/0x4a4) [<c0036394>] (process_one_work+0x1c0/0x4a4) from [<c0036a4c>] (worker_thread+0x138/0x394) [<c0036a4c>] (worker_thread+0x138/0x394) from [<c003cb74>] (kthread+0xa4/0xb0) [<c003cb74>] (kthread+0xa4/0xb0) from [<c000ee00>] (ret_from_fork+0x14/0x34) wlcore: loaded Signed-off-by: Marek Vasut <marex@denx.de> Acked-by: Shawn Guo <shawn.guo@linaro.org> Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Kamal Mostafa <kamal@canonical.com>
commit a585f87 upstream. The scenario here is that someone calls enable_irq_wake() from somewhere in the code. This will result in the lockdep producing a backtrace as can be seen below. In my case, this problem is triggered when using the wl1271 (TI WlCore) driver found in drivers/net/wireless/ti/ . The problem cause is rather obvious from the backtrace, but let's outline the dependency. enable_irq_wake() grabs the IRQ buslock in irq_set_irq_wake(), which in turns calls mxs_gpio_set_wake_irq() . But mxs_gpio_set_wake_irq() calls enable_irq_wake() again on the one-level-higher IRQ , thus it tries to grab the IRQ buslock again in irq_set_irq_wake() . Because the spinlock in irq_set_irq_wake()->irq_get_desc_buslock()->__irq_get_desc_lock() is not marked as recursive, lockdep will spew the stuff below. We know we can safely re-enter the lock, so use IRQ_GC_INIT_NESTED_LOCK to fix the spew. ============================================= [ INFO: possible recursive locking detected ] 3.10.33-00012-gf06b763-dirty #61 Not tainted --------------------------------------------- kworker/0:1/18 is trying to acquire lock: (&irq_desc_lock_class){-.-...}, at: [<c00685f0>] __irq_get_desc_lock+0x48/0x88 but task is already holding lock: (&irq_desc_lock_class){-.-...}, at: [<c00685f0>] __irq_get_desc_lock+0x48/0x88 other info that might help us debug this: Possible unsafe locking scenario: CPU0 ---- lock(&irq_desc_lock_class); lock(&irq_desc_lock_class); *** DEADLOCK *** May be due to missing lock nesting notation 3 locks held by kworker/0:1/18: #0: (events){.+.+.+}, at: [<c0036308>] process_one_work+0x134/0x4a4 #1: ((&fw_work->work)){+.+.+.}, at: [<c0036308>] process_one_work+0x134/0x4a4 #2: (&irq_desc_lock_class){-.-...}, at: [<c00685f0>] __irq_get_desc_lock+0x48/0x88 stack backtrace: CPU: 0 PID: 18 Comm: kworker/0:1 Not tainted 3.10.33-00012-gf06b763-dirty #61 Workqueue: events request_firmware_work_func [<c0013eb4>] (unwind_backtrace+0x0/0xf0) from [<c0011c74>] (show_stack+0x10/0x14) [<c0011c74>] (show_stack+0x10/0x14) from [<c005bb08>] (__lock_acquire+0x140c/0x1a64) [<c005bb08>] (__lock_acquire+0x140c/0x1a64) from [<c005c6a8>] (lock_acquire+0x9c/0x104) [<c005c6a8>] (lock_acquire+0x9c/0x104) from [<c051d5a4>] (_raw_spin_lock_irqsave+0x44/0x58) [<c051d5a4>] (_raw_spin_lock_irqsave+0x44/0x58) from [<c00685f0>] (__irq_get_desc_lock+0x48/0x88) [<c00685f0>] (__irq_get_desc_lock+0x48/0x88) from [<c0068e78>] (irq_set_irq_wake+0x20/0xf4) [<c0068e78>] (irq_set_irq_wake+0x20/0xf4) from [<c027260c>] (mxs_gpio_set_wake_irq+0x1c/0x24) [<c027260c>] (mxs_gpio_set_wake_irq+0x1c/0x24) from [<c0068cf4>] (set_irq_wake_real+0x30/0x44) [<c0068cf4>] (set_irq_wake_real+0x30/0x44) from [<c0068ee4>] (irq_set_irq_wake+0x8c/0xf4) [<c0068ee4>] (irq_set_irq_wake+0x8c/0xf4) from [<c0310748>] (wlcore_nvs_cb+0x10c/0x97c) [<c0310748>] (wlcore_nvs_cb+0x10c/0x97c) from [<c02be5e8>] (request_firmware_work_func+0x38/0x58) [<c02be5e8>] (request_firmware_work_func+0x38/0x58) from [<c0036394>] (process_one_work+0x1c0/0x4a4) [<c0036394>] (process_one_work+0x1c0/0x4a4) from [<c0036a4c>] (worker_thread+0x138/0x394) [<c0036a4c>] (worker_thread+0x138/0x394) from [<c003cb74>] (kthread+0xa4/0xb0) [<c003cb74>] (kthread+0xa4/0xb0) from [<c000ee00>] (ret_from_fork+0x14/0x34) wlcore: loaded Signed-off-by: Marek Vasut <marex@denx.de> Acked-by: Shawn Guo <shawn.guo@linaro.org> Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit a585f87 upstream. The scenario here is that someone calls enable_irq_wake() from somewhere in the code. This will result in the lockdep producing a backtrace as can be seen below. In my case, this problem is triggered when using the wl1271 (TI WlCore) driver found in drivers/net/wireless/ti/ . The problem cause is rather obvious from the backtrace, but let's outline the dependency. enable_irq_wake() grabs the IRQ buslock in irq_set_irq_wake(), which in turns calls mxs_gpio_set_wake_irq() . But mxs_gpio_set_wake_irq() calls enable_irq_wake() again on the one-level-higher IRQ , thus it tries to grab the IRQ buslock again in irq_set_irq_wake() . Because the spinlock in irq_set_irq_wake()->irq_get_desc_buslock()->__irq_get_desc_lock() is not marked as recursive, lockdep will spew the stuff below. We know we can safely re-enter the lock, so use IRQ_GC_INIT_NESTED_LOCK to fix the spew. ============================================= [ INFO: possible recursive locking detected ] 3.10.33-00012-gf06b763-dirty torvalds#61 Not tainted --------------------------------------------- kworker/0:1/18 is trying to acquire lock: (&irq_desc_lock_class){-.-...}, at: [<c00685f0>] __irq_get_desc_lock+0x48/0x88 but task is already holding lock: (&irq_desc_lock_class){-.-...}, at: [<c00685f0>] __irq_get_desc_lock+0x48/0x88 other info that might help us debug this: Possible unsafe locking scenario: CPU0 ---- lock(&irq_desc_lock_class); lock(&irq_desc_lock_class); *** DEADLOCK *** May be due to missing lock nesting notation 3 locks held by kworker/0:1/18: #0: (events){.+.+.+}, at: [<c0036308>] process_one_work+0x134/0x4a4 #1: ((&fw_work->work)){+.+.+.}, at: [<c0036308>] process_one_work+0x134/0x4a4 #2: (&irq_desc_lock_class){-.-...}, at: [<c00685f0>] __irq_get_desc_lock+0x48/0x88 stack backtrace: CPU: 0 PID: 18 Comm: kworker/0:1 Not tainted 3.10.33-00012-gf06b763-dirty torvalds#61 Workqueue: events request_firmware_work_func [<c0013eb4>] (unwind_backtrace+0x0/0xf0) from [<c0011c74>] (show_stack+0x10/0x14) [<c0011c74>] (show_stack+0x10/0x14) from [<c005bb08>] (__lock_acquire+0x140c/0x1a64) [<c005bb08>] (__lock_acquire+0x140c/0x1a64) from [<c005c6a8>] (lock_acquire+0x9c/0x104) [<c005c6a8>] (lock_acquire+0x9c/0x104) from [<c051d5a4>] (_raw_spin_lock_irqsave+0x44/0x58) [<c051d5a4>] (_raw_spin_lock_irqsave+0x44/0x58) from [<c00685f0>] (__irq_get_desc_lock+0x48/0x88) [<c00685f0>] (__irq_get_desc_lock+0x48/0x88) from [<c0068e78>] (irq_set_irq_wake+0x20/0xf4) [<c0068e78>] (irq_set_irq_wake+0x20/0xf4) from [<c027260c>] (mxs_gpio_set_wake_irq+0x1c/0x24) [<c027260c>] (mxs_gpio_set_wake_irq+0x1c/0x24) from [<c0068cf4>] (set_irq_wake_real+0x30/0x44) [<c0068cf4>] (set_irq_wake_real+0x30/0x44) from [<c0068ee4>] (irq_set_irq_wake+0x8c/0xf4) [<c0068ee4>] (irq_set_irq_wake+0x8c/0xf4) from [<c0310748>] (wlcore_nvs_cb+0x10c/0x97c) [<c0310748>] (wlcore_nvs_cb+0x10c/0x97c) from [<c02be5e8>] (request_firmware_work_func+0x38/0x58) [<c02be5e8>] (request_firmware_work_func+0x38/0x58) from [<c0036394>] (process_one_work+0x1c0/0x4a4) [<c0036394>] (process_one_work+0x1c0/0x4a4) from [<c0036a4c>] (worker_thread+0x138/0x394) [<c0036a4c>] (worker_thread+0x138/0x394) from [<c003cb74>] (kthread+0xa4/0xb0) [<c003cb74>] (kthread+0xa4/0xb0) from [<c000ee00>] (ret_from_fork+0x14/0x34) wlcore: loaded Signed-off-by: Marek Vasut <marex@denx.de> Acked-by: Shawn Guo <shawn.guo@linaro.org> Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
commit a585f87 upstream. The scenario here is that someone calls enable_irq_wake() from somewhere in the code. This will result in the lockdep producing a backtrace as can be seen below. In my case, this problem is triggered when using the wl1271 (TI WlCore) driver found in drivers/net/wireless/ti/ . The problem cause is rather obvious from the backtrace, but let's outline the dependency. enable_irq_wake() grabs the IRQ buslock in irq_set_irq_wake(), which in turns calls mxs_gpio_set_wake_irq() . But mxs_gpio_set_wake_irq() calls enable_irq_wake() again on the one-level-higher IRQ , thus it tries to grab the IRQ buslock again in irq_set_irq_wake() . Because the spinlock in irq_set_irq_wake()->irq_get_desc_buslock()->__irq_get_desc_lock() is not marked as recursive, lockdep will spew the stuff below. We know we can safely re-enter the lock, so use IRQ_GC_INIT_NESTED_LOCK to fix the spew. ============================================= [ INFO: possible recursive locking detected ] 3.10.33-00012-gf06b763-dirty torvalds#61 Not tainted --------------------------------------------- kworker/0:1/18 is trying to acquire lock: (&irq_desc_lock_class){-.-...}, at: [<c00685f0>] __irq_get_desc_lock+0x48/0x88 but task is already holding lock: (&irq_desc_lock_class){-.-...}, at: [<c00685f0>] __irq_get_desc_lock+0x48/0x88 other info that might help us debug this: Possible unsafe locking scenario: CPU0 ---- lock(&irq_desc_lock_class); lock(&irq_desc_lock_class); *** DEADLOCK *** May be due to missing lock nesting notation 3 locks held by kworker/0:1/18: #0: (events){.+.+.+}, at: [<c0036308>] process_one_work+0x134/0x4a4 #1: ((&fw_work->work)){+.+.+.}, at: [<c0036308>] process_one_work+0x134/0x4a4 #2: (&irq_desc_lock_class){-.-...}, at: [<c00685f0>] __irq_get_desc_lock+0x48/0x88 stack backtrace: CPU: 0 PID: 18 Comm: kworker/0:1 Not tainted 3.10.33-00012-gf06b763-dirty torvalds#61 Workqueue: events request_firmware_work_func [<c0013eb4>] (unwind_backtrace+0x0/0xf0) from [<c0011c74>] (show_stack+0x10/0x14) [<c0011c74>] (show_stack+0x10/0x14) from [<c005bb08>] (__lock_acquire+0x140c/0x1a64) [<c005bb08>] (__lock_acquire+0x140c/0x1a64) from [<c005c6a8>] (lock_acquire+0x9c/0x104) [<c005c6a8>] (lock_acquire+0x9c/0x104) from [<c051d5a4>] (_raw_spin_lock_irqsave+0x44/0x58) [<c051d5a4>] (_raw_spin_lock_irqsave+0x44/0x58) from [<c00685f0>] (__irq_get_desc_lock+0x48/0x88) [<c00685f0>] (__irq_get_desc_lock+0x48/0x88) from [<c0068e78>] (irq_set_irq_wake+0x20/0xf4) [<c0068e78>] (irq_set_irq_wake+0x20/0xf4) from [<c027260c>] (mxs_gpio_set_wake_irq+0x1c/0x24) [<c027260c>] (mxs_gpio_set_wake_irq+0x1c/0x24) from [<c0068cf4>] (set_irq_wake_real+0x30/0x44) [<c0068cf4>] (set_irq_wake_real+0x30/0x44) from [<c0068ee4>] (irq_set_irq_wake+0x8c/0xf4) [<c0068ee4>] (irq_set_irq_wake+0x8c/0xf4) from [<c0310748>] (wlcore_nvs_cb+0x10c/0x97c) [<c0310748>] (wlcore_nvs_cb+0x10c/0x97c) from [<c02be5e8>] (request_firmware_work_func+0x38/0x58) [<c02be5e8>] (request_firmware_work_func+0x38/0x58) from [<c0036394>] (process_one_work+0x1c0/0x4a4) [<c0036394>] (process_one_work+0x1c0/0x4a4) from [<c0036a4c>] (worker_thread+0x138/0x394) [<c0036a4c>] (worker_thread+0x138/0x394) from [<c003cb74>] (kthread+0xa4/0xb0) [<c003cb74>] (kthread+0xa4/0xb0) from [<c000ee00>] (ret_from_fork+0x14/0x34) wlcore: loaded Signed-off-by: Marek Vasut <marex@denx.de> Acked-by: Shawn Guo <shawn.guo@linaro.org> Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
WARNING: quoted string split across lines torvalds#60: FILE: fs/isofs/compress.c:163: " page idx = %d, bh idx = %d," + " avail_in = %ld," WARNING: quoted string split across lines torvalds#61: FILE: fs/isofs/compress.c:164: + " avail_in = %ld," + " avail_out = %ld\n", WARNING: missing space after return type torvalds#76: FILE: include/linux/decompress/bunzip2.h:5: + long(*fill)(void*, unsigned long), WARNING: missing space after return type torvalds#77: FILE: include/linux/decompress/bunzip2.h:6: + long(*flush)(void*, unsigned long), WARNING: missing space after return type torvalds#94: FILE: include/linux/decompress/generic.h:5: + long(*fill)(void*, unsigned long), WARNING: missing space after return type torvalds#95: FILE: include/linux/decompress/generic.h:6: + long(*flush)(void*, unsigned long), WARNING: missing space after return type torvalds#122: FILE: include/linux/decompress/inflate.h:5: + long(*fill)(void*, unsigned long), WARNING: missing space after return type torvalds#123: FILE: include/linux/decompress/inflate.h:6: + long(*flush)(void*, unsigned long), WARNING: missing space after return type torvalds#140: FILE: include/linux/decompress/unlz4.h:5: + long(*fill)(void*, unsigned long), WARNING: missing space after return type torvalds#141: FILE: include/linux/decompress/unlz4.h:6: + long(*flush)(void*, unsigned long), WARNING: missing space after return type torvalds#158: FILE: include/linux/decompress/unlzma.h:5: + long(*fill)(void*, unsigned long), WARNING: missing space after return type torvalds#159: FILE: include/linux/decompress/unlzma.h:6: + long(*flush)(void*, unsigned long), WARNING: missing space after return type torvalds#177: FILE: include/linux/decompress/unlzo.h:5: + long(*fill)(void*, unsigned long), WARNING: missing space after return type torvalds#178: FILE: include/linux/decompress/unlzo.h:6: + long(*flush)(void*, unsigned long), WARNING: please, no spaces at the start of a line torvalds#210: FILE: include/linux/zlib.h:86: + uLong avail_in; /* number of bytes available at next_in */$ WARNING: please, no spaces at the start of a line torvalds#215: FILE: include/linux/zlib.h:90: + uLong avail_out; /* remaining free space at next_out */$ WARNING: __initdata should be placed after count torvalds#259: FILE: init/initramfs.c:177: +static __initdata unsigned long count; WARNING: __initdata should be placed after remains torvalds#268: FILE: init/initramfs.c:189: +static __initdata long remains; WARNING: missing space after return type torvalds#385: FILE: lib/decompress_bunzip2.c:679: + long(*fill)(void*, unsigned long), WARNING: missing space after return type torvalds#386: FILE: lib/decompress_bunzip2.c:680: + long(*flush)(void*, unsigned long), WARNING: missing space after return type torvalds#401: FILE: lib/decompress_bunzip2.c:747: + long(*fill)(void*, unsigned long), WARNING: missing space after return type torvalds#402: FILE: lib/decompress_bunzip2.c:748: + long(*flush)(void*, unsigned long), WARNING: missing space after return type torvalds#427: FILE: lib/decompress_inflate.c:37: + long(*fill)(void*, unsigned long), WARNING: missing space after return type torvalds#428: FILE: lib/decompress_inflate.c:38: + long(*flush)(void*, unsigned long), WARNING: Unnecessary space before function pointer arguments torvalds#456: FILE: lib/decompress_unlz4.c:35: + long (*fill) (void *, unsigned long), WARNING: Unnecessary space before function pointer arguments torvalds#457: FILE: lib/decompress_unlz4.c:36: + long (*flush) (void *, unsigned long), WARNING: missing space after return type torvalds#479: FILE: lib/decompress_unlz4.c:179: + long(*fill)(void*, unsigned long), WARNING: missing space after return type torvalds#480: FILE: lib/decompress_unlz4.c:180: + long(*flush)(void*, unsigned long), WARNING: missing space after return type torvalds#529: FILE: lib/decompress_unlzma.c:283: + long(*flush)(void*, unsigned long); WARNING: missing space after return type torvalds#541: FILE: lib/decompress_unlzma.c:538: + long(*fill)(void*, unsigned long), WARNING: missing space after return type torvalds#542: FILE: lib/decompress_unlzma.c:539: + long(*flush)(void*, unsigned long), WARNING: missing space after return type torvalds#557: FILE: lib/decompress_unlzma.c:671: + long(*fill)(void*, unsigned long), WARNING: missing space after return type torvalds#558: FILE: lib/decompress_unlzma.c:672: + long(*flush)(void*, unsigned long), WARNING: Unnecessary space before function pointer arguments torvalds#586: FILE: lib/decompress_unlzo.c:112: + long (*fill) (void *, unsigned long), WARNING: Unnecessary space before function pointer arguments torvalds#587: FILE: lib/decompress_unlzo.c:113: + long (*flush) (void *, unsigned long), total: 0 errors, 35 warnings, 479 lines checked ./patches/initramfs-support-initramfs-that-is-more-than-2g.patch has style problems, please review. If any of these errors are false positives, please report them to the maintainer, see CHECKPATCH in MAINTAINERS. Please run checkpatch prior to sending patches Cc: Yinghai Lu <yinghai@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
The scenario here is that someone calls enable_irq_wake() from somewhere in the code. This will result in the lockdep producing a backtrace as can be seen below. In my case, this problem is triggered when using the wl1271 (TI WlCore) driver found in drivers/net/wireless/ti/ . The problem cause is rather obvious from the backtrace, but let's outline the dependency. enable_irq_wake() grabs the IRQ buslock in irq_set_irq_wake(), which in turns calls mxs_gpio_set_wake_irq() . But mxs_gpio_set_wake_irq() calls enable_irq_wake() again on the one-level-higher IRQ , thus it tries to grab the IRQ buslock again in irq_set_irq_wake() . Because the spinlock in irq_set_irq_wake()->irq_get_desc_buslock()->__irq_get_desc_lock() is not marked as recursive, lockdep will spew the stuff below. We know we can safely re-enter the lock, so use IRQ_GC_INIT_NESTED_LOCK to fix the spew. ============================================= [ INFO: possible recursive locking detected ] 3.10.33-00012-gf06b763-dirty torvalds#61 Not tainted --------------------------------------------- kworker/0:1/18 is trying to acquire lock: (&irq_desc_lock_class){-.-...}, at: [<c00685f0>] __irq_get_desc_lock+0x48/0x88 but task is already holding lock: (&irq_desc_lock_class){-.-...}, at: [<c00685f0>] __irq_get_desc_lock+0x48/0x88 other info that might help us debug this: Possible unsafe locking scenario: CPU0 ---- lock(&irq_desc_lock_class); lock(&irq_desc_lock_class); *** DEADLOCK *** May be due to missing lock nesting notation 3 locks held by kworker/0:1/18: #0: (events){.+.+.+}, at: [<c0036308>] process_one_work+0x134/0x4a4 #1: ((&fw_work->work)){+.+.+.}, at: [<c0036308>] process_one_work+0x134/0x4a4 #2: (&irq_desc_lock_class){-.-...}, at: [<c00685f0>] __irq_get_desc_lock+0x48/0x88 stack backtrace: CPU: 0 PID: 18 Comm: kworker/0:1 Not tainted 3.10.33-00012-gf06b763-dirty torvalds#61 Workqueue: events request_firmware_work_func [<c0013eb4>] (unwind_backtrace+0x0/0xf0) from [<c0011c74>] (show_stack+0x10/0x14) [<c0011c74>] (show_stack+0x10/0x14) from [<c005bb08>] (__lock_acquire+0x140c/0x1a64) [<c005bb08>] (__lock_acquire+0x140c/0x1a64) from [<c005c6a8>] (lock_acquire+0x9c/0x104) [<c005c6a8>] (lock_acquire+0x9c/0x104) from [<c051d5a4>] (_raw_spin_lock_irqsave+0x44/0x58) [<c051d5a4>] (_raw_spin_lock_irqsave+0x44/0x58) from [<c00685f0>] (__irq_get_desc_lock+0x48/0x88) [<c00685f0>] (__irq_get_desc_lock+0x48/0x88) from [<c0068e78>] (irq_set_irq_wake+0x20/0xf4) [<c0068e78>] (irq_set_irq_wake+0x20/0xf4) from [<c027260c>] (mxs_gpio_set_wake_irq+0x1c/0x24) [<c027260c>] (mxs_gpio_set_wake_irq+0x1c/0x24) from [<c0068cf4>] (set_irq_wake_real+0x30/0x44) [<c0068cf4>] (set_irq_wake_real+0x30/0x44) from [<c0068ee4>] (irq_set_irq_wake+0x8c/0xf4) [<c0068ee4>] (irq_set_irq_wake+0x8c/0xf4) from [<c0310748>] (wlcore_nvs_cb+0x10c/0x97c) [<c0310748>] (wlcore_nvs_cb+0x10c/0x97c) from [<c02be5e8>] (request_firmware_work_func+0x38/0x58) [<c02be5e8>] (request_firmware_work_func+0x38/0x58) from [<c0036394>] (process_one_work+0x1c0/0x4a4) [<c0036394>] (process_one_work+0x1c0/0x4a4) from [<c0036a4c>] (worker_thread+0x138/0x394) [<c0036a4c>] (worker_thread+0x138/0x394) from [<c003cb74>] (kthread+0xa4/0xb0) [<c003cb74>] (kthread+0xa4/0xb0) from [<c000ee00>] (ret_from_fork+0x14/0x34) wlcore: loaded Cc: stable@vger.kernel.org Signed-off-by: Marek Vasut <marex@denx.de> Acked-by: Shawn Guo <shawn.guo@linaro.org> Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
BugLink: http://bugs.launchpad.net/bugs/1320946 commit a585f87 upstream. The scenario here is that someone calls enable_irq_wake() from somewhere in the code. This will result in the lockdep producing a backtrace as can be seen below. In my case, this problem is triggered when using the wl1271 (TI WlCore) driver found in drivers/net/wireless/ti/ . The problem cause is rather obvious from the backtrace, but let's outline the dependency. enable_irq_wake() grabs the IRQ buslock in irq_set_irq_wake(), which in turns calls mxs_gpio_set_wake_irq() . But mxs_gpio_set_wake_irq() calls enable_irq_wake() again on the one-level-higher IRQ , thus it tries to grab the IRQ buslock again in irq_set_irq_wake() . Because the spinlock in irq_set_irq_wake()->irq_get_desc_buslock()->__irq_get_desc_lock() is not marked as recursive, lockdep will spew the stuff below. We know we can safely re-enter the lock, so use IRQ_GC_INIT_NESTED_LOCK to fix the spew. ============================================= [ INFO: possible recursive locking detected ] 3.10.33-00012-gf06b763-dirty torvalds#61 Not tainted --------------------------------------------- kworker/0:1/18 is trying to acquire lock: (&irq_desc_lock_class){-.-...}, at: [<c00685f0>] __irq_get_desc_lock+0x48/0x88 but task is already holding lock: (&irq_desc_lock_class){-.-...}, at: [<c00685f0>] __irq_get_desc_lock+0x48/0x88 other info that might help us debug this: Possible unsafe locking scenario: CPU0 ---- lock(&irq_desc_lock_class); lock(&irq_desc_lock_class); *** DEADLOCK *** May be due to missing lock nesting notation 3 locks held by kworker/0:1/18: #0: (events){.+.+.+}, at: [<c0036308>] process_one_work+0x134/0x4a4 #1: ((&fw_work->work)){+.+.+.}, at: [<c0036308>] process_one_work+0x134/0x4a4 #2: (&irq_desc_lock_class){-.-...}, at: [<c00685f0>] __irq_get_desc_lock+0x48/0x88 stack backtrace: CPU: 0 PID: 18 Comm: kworker/0:1 Not tainted 3.10.33-00012-gf06b763-dirty torvalds#61 Workqueue: events request_firmware_work_func [<c0013eb4>] (unwind_backtrace+0x0/0xf0) from [<c0011c74>] (show_stack+0x10/0x14) [<c0011c74>] (show_stack+0x10/0x14) from [<c005bb08>] (__lock_acquire+0x140c/0x1a64) [<c005bb08>] (__lock_acquire+0x140c/0x1a64) from [<c005c6a8>] (lock_acquire+0x9c/0x104) [<c005c6a8>] (lock_acquire+0x9c/0x104) from [<c051d5a4>] (_raw_spin_lock_irqsave+0x44/0x58) [<c051d5a4>] (_raw_spin_lock_irqsave+0x44/0x58) from [<c00685f0>] (__irq_get_desc_lock+0x48/0x88) [<c00685f0>] (__irq_get_desc_lock+0x48/0x88) from [<c0068e78>] (irq_set_irq_wake+0x20/0xf4) [<c0068e78>] (irq_set_irq_wake+0x20/0xf4) from [<c027260c>] (mxs_gpio_set_wake_irq+0x1c/0x24) [<c027260c>] (mxs_gpio_set_wake_irq+0x1c/0x24) from [<c0068cf4>] (set_irq_wake_real+0x30/0x44) [<c0068cf4>] (set_irq_wake_real+0x30/0x44) from [<c0068ee4>] (irq_set_irq_wake+0x8c/0xf4) [<c0068ee4>] (irq_set_irq_wake+0x8c/0xf4) from [<c0310748>] (wlcore_nvs_cb+0x10c/0x97c) [<c0310748>] (wlcore_nvs_cb+0x10c/0x97c) from [<c02be5e8>] (request_firmware_work_func+0x38/0x58) [<c02be5e8>] (request_firmware_work_func+0x38/0x58) from [<c0036394>] (process_one_work+0x1c0/0x4a4) [<c0036394>] (process_one_work+0x1c0/0x4a4) from [<c0036a4c>] (worker_thread+0x138/0x394) [<c0036a4c>] (worker_thread+0x138/0x394) from [<c003cb74>] (kthread+0xa4/0xb0) [<c003cb74>] (kthread+0xa4/0xb0) from [<c000ee00>] (ret_from_fork+0x14/0x34) wlcore: loaded Signed-off-by: Marek Vasut <marex@denx.de> Acked-by: Shawn Guo <shawn.guo@linaro.org> Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Kamal Mostafa <kamal@canonical.com> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
Rtc: merged. thanks
Handle KSEG1 addresses in mips_dma_mmap()
After the recent fix of runtime PM for USB-audio driver, we got a lockdep warning like: ============================================= [ INFO: possible recursive locking detected ] 4.2.0-rc8+ torvalds#61 Not tainted --------------------------------------------- pulseaudio/980 is trying to acquire lock: (&chip->shutdown_rwsem){.+.+.+}, at: [<ffffffffa0355dac>] snd_usb_autoresume+0x1d/0x52 [snd_usb_audio] but task is already holding lock: (&chip->shutdown_rwsem){.+.+.+}, at: [<ffffffffa0355dac>] snd_usb_autoresume+0x1d/0x52 [snd_usb_audio] This comes from snd_usb_autoresume() invoking down_read() and it's used in a nested way. Although it's basically safe, per se (as these are read locks), it's better to reduce such spurious warnings. The read lock is needed to guarantee the execution of "shutdown" (cleanup at disconnection) task after all concurrent tasks are finished. This can be implemented in another better way. Also, the current check of chip->in_pm isn't good enough for protecting the racy execution of multiple auto-resumes. This patch rewrites the logic of snd_usb_autoresume() & co; namely, - The recursive call of autopm is avoided by the new refcount, chip->active. The chip->in_pm flag is removed accordingly. - Instead of rwsem, another refcount, chip->usage_count, is introduced for tracking the period to delay the shutdown procedure. At the last clear of this refcount, wake_up() to the shutdown waiter is called. - The shutdown flag is replaced with shutdown atomic count; this is for reducing the lock. - Two new helpers are introduced to simplify the management of these refcounts; snd_usb_lock_shutdown() increases the usage_count, checks the shutdown state, and does autoresume. snd_usb_unlock_shutdown() does the opposite. Most of mixer and other codes just need this, and simply returns an error if it receives an error from lock. Fixes: 9003ebb ('ALSA: usb-audio: Fix runtime PM unbalance') Reported-and-tested-by: Alexnader Kuleshov <kuleshovmail@gmail.com> Signed-off-by: Takashi Iwai <tiwai@suse.de>
Commit 06bb6f5 ("mtd: spi-nor: stop (ab)using struct spi_device_id") converted an array into a pointer, which means that we should be checking if the pointer goes anywhere, not whether the C string is empty. To do the latter means we dereference a NULL pointer when we reach the terminating entry, for which 'name' is now NULL instead of an array { 0, 0, ... }. Sample crash: [ 1.101371] Unable to handle kernel NULL pointer dereference at virtual address 00000000 [ 1.109457] pgd = c0004000 [ 1.112157] [00000000] *pgd=00000000 [ 1.115736] Internal error: Oops: 5 [#1] SMP ARM [ 1.120345] Modules linked in: [ 1.123405] CPU: 3 PID: 1 Comm: swapper/0 Not tainted 4.2.0-next-20150902+ #61 [ 1.130611] Hardware name: Rockchip (Device Tree) [ 1.135306] task: ee0b8d4 ti: ee0ba00 task.ti: ee0ba00 [ 1.140697] PC is at spi_nor_scan+0x90/0x8c4 [ 1.144958] LR is at spi_nor_scan+0xa4/0x8c4 ... [ 1.504112] [<c03cc2e0>] (spi_nor_scan) from [<c03cb188>] (m25p_probe+0xc8/0x11c) [ 1.511583] [<c03cb188>] (m25p_probe) from [<c03cd9d8>] (spi_drv_probe+0x60/0x7c) [ 1.519055] [<c03cd9d8>] (spi_drv_probe) from [<c037faa0>] (driver_probe_device+0x1a0/0x444) [ 1.527478] [<c037faa0>] (driver_probe_device) from [<c037fec8>] (__device_attach_driver+0x94/0xa0) [ 1.536507] [<c037fec8>] (__device_attach_driver) from [<c037db3c>] (bus_for_each_drv+0x94/0xa4) [ 1.545277] [<c037db3c>] (bus_for_each_drv) from [<c037f7e4>] (__device_attach+0xa4/0x144) [ 1.553526] [<c037f7e4>] (__device_attach) from [<c0380058>] (device_initial_probe+0x1c/0x20) [ 1.562035] [<c0380058>] (device_initial_probe) from [<c037ec88>] (bus_probe_device+0x38/0x94) [ 1.570631] [<c037ec88>] (bus_probe_device) from [<c037ccf4>] (device_add+0x430/0x558) [ 1.578534] [<c037ccf4>] (device_add) from [<c03d0240>] (spi_add_device+0xe4/0x174) [ 1.586178] [<c03d0240>] (spi_add_device) from [<c03d0a24>] (spi_register_master+0x698/0x7d4) [ 1.594688] [<c03d0a24>] (spi_register_master) from [<c03d0ba0>] (devm_spi_register_master+0x40/0x7c) [ 1.603892] [<c03d0ba0>] (devm_spi_register_master) from [<c03d2fb4>] (rockchip_spi_probe+0x360/0x3f4) [ 1.613182] [<c03d2fb4>] (rockchip_spi_probe) from [<c0381e34>] (platform_drv_probe+0x58/0xa8) [ 1.621779] [<c0381e34>] (platform_drv_probe) from [<c037faa0>] (driver_probe_device+0x1a0/0x444) [ 1.630635] [<c037faa0>] (driver_probe_device) from [<c037fdc4>] (__driver_attach+0x80/0xa4) [ 1.639058] [<c037fdc4>] (__driver_attach) from [<c037e850>] (bus_for_each_dev+0x98/0xac) [ 1.647221] [<c037e850>] (bus_for_each_dev) from [<c037f448>] (driver_attach+0x28/0x30) [ 1.655210] [<c037f448>] (driver_attach) from [<c037ef74>] (bus_add_driver+0x128/0x250) [ 1.663200] [<c037ef74>] (bus_add_driver) from [<c0380c40>] (driver_register+0xac/0xf0) [ 1.671191] [<c0380c40>] (driver_register) from [<c0381d50>] (__platform_driver_register+0x58/0x6c) [ 1.680221] [<c0381d50>] (__platform_driver_register) from [<c0a467c8>] (rockchip_spi_driver_init+0x18/0x20) [ 1.690033] [<c0a467c8>] (rockchip_spi_driver_init) from [<c00098a4>] (do_one_initcall+0x124/0x1dc) [ 1.699063] [<c00098a4>] (do_one_initcall) from [<c0a19f84>] (kernel_init_freeable+0x218/0x2ec) [ 1.707748] [<c0a19f84>] (kernel_init_freeable) from [<c0719ed8>] (kernel_init+0x1c/0xf4) [ 1.715912] [<c0719ed8>] (kernel_init) from [<c000fe50>] (ret_from_fork+0x14/0x24) [ 1.723460] Code: e3510000 159f67c0 0a00000c e5961000 (e5d13000) [ 1.729564] ---[ end trace 95baa6b3b861ce25 ]--- Fixes: 06bb6f5 ("mtd: spi-nor: stop (ab)using struct spi_device_id") Signed-off-by: Brian Norris <computersforpeace@gmail.com> Cc: Rafał Miłecki <zajec5@gmail.com>
…ntext" With 4.1 RT-kernel on TI ARM dra7-evm the below error report is displayed each time system is woken up from sleep state: BUG: sleeping function called from invalid context at kernel/locking/rtmutex.c:917 in_atomic(): 1, irqs_disabled(): 128, pid: 142, name: sh INFO: lockdep is turned off. irq event stamp: 11575 hardirqs last enabled at (11575): [<c06dd280>] _raw_spin_unlock_irq+0x24/0x68 hardirqs last disabled at (11574): [<c06dd0b8>] _raw_spin_lock_irq+0x18/0x4c softirqs last enabled at (0): [<c0045080>] copy_process.part.54+0x3f4/0x1930 softirqs last disabled at (0): [< (null)>] (null) Preemption disabled at:[< (null)>] (null) CPU: 0 PID: 142 Comm: sh Not tainted 4.1.10-rt8-01710-g6c5fab9 torvalds#61 Hardware name: Generic DRA74X (Flattened Device Tree) [<c00190a8>] (unwind_backtrace) from [<c0014460>] (show_stack+0x10/0x14) [<c0014460>] (show_stack) from [<c06d7638>] (dump_stack+0x7c/0x98) [<c06d7638>] (dump_stack) from [<c06dd604>] (rt_spin_lock+0x20/0x60) [<c06dd604>] (rt_spin_lock) from [<c00a2d20>] (freeze_wake+0x10/0x58) [<c00a2d20>] (freeze_wake) from [<c00b1a1c>] (irq_pm_check_wakeup+0x40/0x48) [<c00b1a1c>] (irq_pm_check_wakeup) from [<c00acee0>] (irq_may_run+0x20/0x48) [<c00acee0>] (irq_may_run) from [<c00ad1f8>] (handle_level_irq+0x40/0x160) [<c00ad1f8>] (handle_level_irq) from [<c00a8fbc>] (generic_handle_irq+0x28/0x3c) [<c00a8fbc>] (generic_handle_irq) from [<c03fc330>] (pcs_irq_handle+0x6c/0x8c) [<c03fc330>] (pcs_irq_handle) from [<c03fc380>] (pcs_irq_chain_handler+0x30/0x88) [<c03fc380>] (pcs_irq_chain_handler) from [<c00a8fbc>] (generic_handle_irq+0x28/0x3c) [<c00a8fbc>] (generic_handle_irq) from [<c00330c4>] (omap_prcm_irq_handler+0xc4/0x1a8) [<c00330c4>] (omap_prcm_irq_handler) from [<c00a8fbc>] (generic_handle_irq+0x28/0x3c) [<c00a8fbc>] (generic_handle_irq) from [<c00a92bc>] (__handle_domain_irq+0x8c/0x120) [<c00a92bc>] (__handle_domain_irq) from [<c000958c>] (gic_handle_irq+0x20/0x60) [<c000958c>] (gic_handle_irq) from [<c06ddea4>] (__irq_svc+0x44/0x90) Exception stack(0xeca49dc8 to 0xeca49e10) 9dc0: c00a2e38 ec9fae00 00000100 00000000 c12fb80c 00000003 9de0: 00000000 c0b396a0 c0a70804 c0896b20 c0896ae8 00000000 00000000 eca49e10 9e00: c00a2e38 c00a2e3c 60000013 ffffffff [<c06ddea4>] (__irq_svc) from [<c00a2e3c>] (arch_suspend_enable_irqs+0xc/0x10) [<c00a2e3c>] (arch_suspend_enable_irqs) from [<c00a3138>] (suspend_enter+0x2f8/0xce4) [<c00a3138>] (suspend_enter) from [<c00a3bf8>] (suspend_devices_and_enter+0xd4/0x648) [<c00a3bf8>] (suspend_devices_and_enter) from [<c00a480c>] (enter_state+0x6a0/0x1030) [<c00a480c>] (enter_state) from [<c00a51b0>] (pm_suspend+0x14/0x74) [<c00a51b0>] (pm_suspend) from [<c00a1d6c>] (state_store+0x64/0xb8) [<c00a1d6c>] (state_store) from [<c02143bc>] (kernfs_fop_write+0xb8/0x19c) [<c02143bc>] (kernfs_fop_write) from [<c019c290>] (__vfs_write+0x20/0xd8) [<c019c290>] (__vfs_write) from [<c019cb30>] (vfs_write+0x90/0x164) [<c019cb30>] (vfs_write) from [<c019d354>] (SyS_write+0x44/0x9c) [<c019d354>] (SyS_write) from [<c0010300>] (ret_fast_syscall+0x0/0x1c) The root cause of issue is freeze_wake() which is called from atomic context (HW IRQ handler). The call path: gic_handle_irq - omap_prcm_irq_handler - pcs_irq_chain_handler - [...] - handle_level_irq - irq_may_run - irq_pm_check_wakeup - freeze_wake - spin_lock_irqsave(&suspend_freeze_lock,..); Hence, fix this issue by doing the following: - convert suspend_freeze_lock to raw lock; - convert suspend_freeze_wait_head to simple waitqueue; - move pm_wakeup_pending() out of region, protected by suspend_freeze_lock; Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
Reported to be needed as per fdo#70354 comment torvalds#61. Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
This fixes BUG that is trickered when fwnode->secondary is not null, but has ERR_PTR(-ENODEV) instead. BUG: unable to handle kernel paging request at ffffffffffffffed IP: [<ffffffff81677b86>] __fwnode_property_read_string+0x26/0x160 PGD 200e067 PUD 2010067 PMD 0 Oops: 0000 [#1] SMP KASAN Modules linked in: dwc3_pci(+) dwc3 CPU: 0 PID: 1138 Comm: modprobe Not tainted 4.5.0-rc5+ torvalds#61 task: ffff88015aaf5b00 ti: ffff88007b958000 task.ti: ffff88007b958000 RIP: 0010:[<ffffffff81677b86>] [<ffffffff81677b86>] __fwnode_property_read_string+0x26/0x160 RSP: 0018:ffff88007b95eff8 EFLAGS: 00010246 RAX: fffffbfffffffffd RBX: ffffffffffffffed RCX: ffff88015999cd37 RDX: dffffc0000000000 RSI: ffffffff81e11bc0 RDI: ffffffffffffffed RBP: ffff88007b95f020 R08: 0000000000000000 R09: 0000000000000000 R10: ffff88007b90f7cf R11: 0000000000000000 R12: ffff88007b95f0a0 R13: 00000000fffffffa R14: ffffffff81e11bc0 R15: ffff880159ea37a0 FS: 00007ff35f46c700(0000) GS:ffff88015b800000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b CR2: ffffffffffffffed CR3: 000000007b8be000 CR4: 00000000001006f0 Stack: ffff88015999cd20 ffffffff81e11bc0 ffff88007b95f0a0 ffff88007b383dd8 ffff880159ea37a0 ffff88007b95f048 ffffffff81677d03 ffff88007b952460 ffffffff81e11bc0 ffff88007b95f0a0 ffff88007b95f070 ffffffff81677d40 Call Trace: [<ffffffff81677d03>] fwnode_property_read_string+0x43/0x50 [<ffffffff81677d40>] device_property_read_string+0x30/0x40 ... Fixes: 362c0b3 (device property: Fallback to secondary fwnode if primary misses the property) Signed-off-by: Heikki Krogerus <heikki.krogerus@linux.intel.com>
This fixes BUG that is trickered when fwnode->secondary is not null, but has ERR_PTR(-ENODEV) instead. BUG: unable to handle kernel paging request at ffffffffffffffed IP: [<ffffffff81677b86>] __fwnode_property_read_string+0x26/0x160 PGD 200e067 PUD 2010067 PMD 0 Oops: 0000 [#1] SMP KASAN Modules linked in: dwc3_pci(+) dwc3 CPU: 0 PID: 1138 Comm: modprobe Not tainted 4.5.0-rc5+ torvalds#61 task: ffff88015aaf5b00 ti: ffff88007b958000 task.ti: ffff88007b958000 RIP: 0010:[<ffffffff81677b86>] [<ffffffff81677b86>] __fwnode_property_read_string+0x26/0x160 RSP: 0018:ffff88007b95eff8 EFLAGS: 00010246 RAX: fffffbfffffffffd RBX: ffffffffffffffed RCX: ffff88015999cd37 RDX: dffffc0000000000 RSI: ffffffff81e11bc0 RDI: ffffffffffffffed RBP: ffff88007b95f020 R08: 0000000000000000 R09: 0000000000000000 R10: ffff88007b90f7cf R11: 0000000000000000 R12: ffff88007b95f0a0 R13: 00000000fffffffa R14: ffffffff81e11bc0 R15: ffff880159ea37a0 FS: 00007ff35f46c700(0000) GS:ffff88015b800000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b CR2: ffffffffffffffed CR3: 000000007b8be000 CR4: 00000000001006f0 Stack: ffff88015999cd20 ffffffff81e11bc0 ffff88007b95f0a0 ffff88007b383dd8 ffff880159ea37a0 ffff88007b95f048 ffffffff81677d03 ffff88007b952460 ffffffff81e11bc0 ffff88007b95f0a0 ffff88007b95f070 ffffffff81677d40 Call Trace: [<ffffffff81677d03>] fwnode_property_read_string+0x43/0x50 [<ffffffff81677d40>] device_property_read_string+0x30/0x40 ... Fixes: 362c0b3 (device property: Fallback to secondary fwnode if primary misses the property) Signed-off-by: Heikki Krogerus <heikki.krogerus@linux.intel.com>
This fixes BUG triggered when fwnode->secondary is not NULL, but has ERR_PTR(-ENODEV) instead. BUG: unable to handle kernel paging request at ffffffffffffffed IP: [<ffffffff81677b86>] __fwnode_property_read_string+0x26/0x160 PGD 200e067 PUD 2010067 PMD 0 Oops: 0000 [#1] SMP KASAN Modules linked in: dwc3_pci(+) dwc3 CPU: 0 PID: 1138 Comm: modprobe Not tainted 4.5.0-rc5+ torvalds#61 task: ffff88015aaf5b00 ti: ffff88007b958000 task.ti: ffff88007b958000 RIP: 0010:[<ffffffff81677b86>] [<ffffffff81677b86>] __fwnode_property_read_string+0x26/0x160 RSP: 0018:ffff88007b95eff8 EFLAGS: 00010246 RAX: fffffbfffffffffd RBX: ffffffffffffffed RCX: ffff88015999cd37 RDX: dffffc0000000000 RSI: ffffffff81e11bc0 RDI: ffffffffffffffed RBP: ffff88007b95f020 R08: 0000000000000000 R09: 0000000000000000 R10: ffff88007b90f7cf R11: 0000000000000000 R12: ffff88007b95f0a0 R13: 00000000fffffffa R14: ffffffff81e11bc0 R15: ffff880159ea37a0 FS: 00007ff35f46c700(0000) GS:ffff88015b800000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b CR2: ffffffffffffffed CR3: 000000007b8be000 CR4: 00000000001006f0 Stack: ffff88015999cd20 ffffffff81e11bc0 ffff88007b95f0a0 ffff88007b383dd8 ffff880159ea37a0 ffff88007b95f048 ffffffff81677d03 ffff88007b952460 ffffffff81e11bc0 ffff88007b95f0a0 ffff88007b95f070 ffffffff81677d40 Call Trace: [<ffffffff81677d03>] fwnode_property_read_string+0x43/0x50 [<ffffffff81677d40>] device_property_read_string+0x30/0x40 ... Fixes: 362c0b3 (device property: Fallback to secondary fwnode if primary misses the property) Signed-off-by: Heikki Krogerus <heikki.krogerus@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
WARNING: Missing a blank line after declarations torvalds#61: FILE: mm/util.c:356: + int i; + if (likely(!PageCompound(page))) total: 0 errors, 1 warnings, 55 lines checked ./patches/mm-uninline-page_mapped.patch has style problems, please review. NOTE: If any of the errors are false positives, please report them to the maintainer, see CHECKPATCH in MAINTAINERS. Please run checkpatch prior to sending patches Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
[ Upstream commit 47ab154 ] After the recent fix of runtime PM for USB-audio driver, we got a lockdep warning like: ============================================= [ INFO: possible recursive locking detected ] 4.2.0-rc8+ torvalds#61 Not tainted --------------------------------------------- pulseaudio/980 is trying to acquire lock: (&chip->shutdown_rwsem){.+.+.+}, at: [<ffffffffa0355dac>] snd_usb_autoresume+0x1d/0x52 [snd_usb_audio] but task is already holding lock: (&chip->shutdown_rwsem){.+.+.+}, at: [<ffffffffa0355dac>] snd_usb_autoresume+0x1d/0x52 [snd_usb_audio] This comes from snd_usb_autoresume() invoking down_read() and it's used in a nested way. Although it's basically safe, per se (as these are read locks), it's better to reduce such spurious warnings. The read lock is needed to guarantee the execution of "shutdown" (cleanup at disconnection) task after all concurrent tasks are finished. This can be implemented in another better way. Also, the current check of chip->in_pm isn't good enough for protecting the racy execution of multiple auto-resumes. This patch rewrites the logic of snd_usb_autoresume() & co; namely, - The recursive call of autopm is avoided by the new refcount, chip->active. The chip->in_pm flag is removed accordingly. - Instead of rwsem, another refcount, chip->usage_count, is introduced for tracking the period to delay the shutdown procedure. At the last clear of this refcount, wake_up() to the shutdown waiter is called. - The shutdown flag is replaced with shutdown atomic count; this is for reducing the lock. - Two new helpers are introduced to simplify the management of these refcounts; snd_usb_lock_shutdown() increases the usage_count, checks the shutdown state, and does autoresume. snd_usb_unlock_shutdown() does the opposite. Most of mixer and other codes just need this, and simply returns an error if it receives an error from lock. Fixes: 9003ebb ('ALSA: usb-audio: Fix runtime PM unbalance') Reported-and-tested-by: Alexnader Kuleshov <kuleshovmail@gmail.com> Signed-off-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
… runtime 8021q(vlan_device_event) will add VLAN 0 when enabling the device, and remove it on disabling it if NETIF_F_HW_VLAN_CTAG_FILTER set. However, if changing filter feature during netdev runtime, null-ptr-unref[1] or bug_on[2] will be triggered by unregister_vlan_dev() for refcount imbalance. [1] BUG: KASAN: null-ptr-deref in unregister_vlan_dev (net/8021q/vlan.h:90 net/8021q/vlan.c:110) Write of size 8 at addr 0000000000000000 by task ip/382 CPU: 2 UID: 0 PID: 382 Comm: ip Not tainted 6.16.0-rc3 torvalds#60 PREEMPT(voluntary) Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014 Call Trace: <TASK> dump_stack_lvl (lib/dump_stack.c:123) kasan_report (mm/kasan/report.c:636) unregister_vlan_dev (net/8021q/vlan.h:90 net/8021q/vlan.c:110) rtnl_dellink (net/core/rtnetlink.c:3511 net/core/rtnetlink.c:3553) rtnetlink_rcv_msg (net/core/rtnetlink.c:6945) netlink_rcv_skb (net/netlink/af_netlink.c:2535) netlink_unicast (net/netlink/af_netlink.c:1314 net/netlink/af_netlink.c:1339) netlink_sendmsg (net/netlink/af_netlink.c:1883) ____sys_sendmsg (net/socket.c:712 net/socket.c:727 net/socket.c:2566) [2] kernel BUG at net/8021q/vlan.c:99! Oops: invalid opcode: 0000 [#1] SMP KASAN PTI CPU: 0 UID: 0 PID: 382 Comm: ip Not tainted 6.16.0-rc3 torvalds#61 PREEMPT(voluntary) Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014 RIP: 0010:unregister_vlan_dev (net/8021q/vlan.c:99 (discriminator 1)) RSP: 0018:ffff88810badf310 EFLAGS: 00010246 RAX: 0000000000000000 RBX: ffff88810da84000 RCX: ffffffffb47ceb9a RDX: dffffc0000000000 RSI: 0000000000000008 RDI: ffff88810e8b43c8 RBP: 0000000000000000 R08: 0000000000000000 R09: fffffbfff6cefe80 R10: ffffffffb677f407 R11: ffff88810badf3c0 R12: ffff88810e8b4000 R13: 0000000000000000 R14: ffff88810642a5c0 R15: 000000000000017e FS: 00007f1ff68c20c0(0000) GS:ffff888163a24000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f1ff5dad240 CR3: 0000000107e56000 CR4: 00000000000006f0 Call Trace: <TASK> rtnl_dellink (net/core/rtnetlink.c:3511 net/core/rtnetlink.c:3553) rtnetlink_rcv_msg (net/core/rtnetlink.c:6945) netlink_rcv_skb (net/netlink/af_netlink.c:2535) netlink_unicast (net/netlink/af_netlink.c:1314 net/netlink/af_netlink.c:1339) netlink_sendmsg (net/netlink/af_netlink.c:1883) ____sys_sendmsg (net/socket.c:712 net/socket.c:727 net/socket.c:2566) ___sys_sendmsg (net/socket.c:2622) __sys_sendmsg (net/socket.c:2652) do_syscall_64 (arch/x86/entry/syscall_64.c:63 arch/x86/entry/syscall_64.c:94) Root cause is as below: step1: add vlan0 for real_dev, such as bond, team. register_vlan_dev vlan_vid_add(real_dev,htons(ETH_P_8021Q),0) //refcnt=1 step2: disable vlan filter feature and enable real_dev step3: change filter from 0 to 1 vlan_device_event vlan_filter_push_vids ndo_vlan_rx_add_vid //No refcnt added to real_dev vlan0 step4: real_dev down vlan_device_event vlan_vid_del(dev, htons(ETH_P_8021Q), 0); //refcnt=0 vlan_info_rcu_free //free vlan0 step5: real_dev up vlan_device_event vlan_vid_add(dev, htons(ETH_P_8021Q), 0); vlan_info_alloc //alloc new empty vid0. refcnt=1 step6: delete vlan0 unregister_vlan_dev BUG_ON(!vlan_info); //will trigger it if step5 was not executed vlan_group_set_device array = vg->vlan_devices_arrays //null-ptr-ref will be triggered after step5 E.g. the following sequence can reproduce null-ptr-ref $ ip link add bond0 type bond mode 0 $ ip link add link bond0 name vlan0 type vlan id 0 protocol 802.1q $ ethtool -K bond0 rx-vlan-filter off $ ifconfig bond0 up $ ethtool -K bond0 rx-vlan-filter on $ ifconfig bond0 down $ ifconfig bond0 up $ ip link del vlan0 Add the auto_vid0 flag to track the refcount of vlan0, and use this flag to determine whether to dec refcount while disabling real_dev. Fixes: ad1afb0 ("vlan_dev: VLAN 0 should be treated as "no vlan tag" (802.1p packet)") Reported-by: syzbot+a8b046e462915c65b10b@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=a8b046e462915c65b10b Co-developed-by: Ido Schimmel <idosch@idosch.org> Signed-off-by: Ido Schimmel <idosch@idosch.org> Signed-off-by: Dong Chenchen <dongchenchen2@huawei.com> Signed-off-by: NipaLocal <nipa@local>
… runtime 8021q(vlan_device_event) will add VLAN 0 when enabling the device, and remove it on disabling it if NETIF_F_HW_VLAN_CTAG_FILTER set. However, if changing filter feature during netdev runtime, null-ptr-unref[1] or bug_on[2] will be triggered by unregister_vlan_dev() for refcount imbalance. [1] BUG: KASAN: null-ptr-deref in unregister_vlan_dev (net/8021q/vlan.h:90 net/8021q/vlan.c:110) Write of size 8 at addr 0000000000000000 by task ip/382 CPU: 2 UID: 0 PID: 382 Comm: ip Not tainted 6.16.0-rc3 torvalds#60 PREEMPT(voluntary) Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014 Call Trace: <TASK> dump_stack_lvl (lib/dump_stack.c:123) kasan_report (mm/kasan/report.c:636) unregister_vlan_dev (net/8021q/vlan.h:90 net/8021q/vlan.c:110) rtnl_dellink (net/core/rtnetlink.c:3511 net/core/rtnetlink.c:3553) rtnetlink_rcv_msg (net/core/rtnetlink.c:6945) netlink_rcv_skb (net/netlink/af_netlink.c:2535) netlink_unicast (net/netlink/af_netlink.c:1314 net/netlink/af_netlink.c:1339) netlink_sendmsg (net/netlink/af_netlink.c:1883) ____sys_sendmsg (net/socket.c:712 net/socket.c:727 net/socket.c:2566) [2] kernel BUG at net/8021q/vlan.c:99! Oops: invalid opcode: 0000 [#1] SMP KASAN PTI CPU: 0 UID: 0 PID: 382 Comm: ip Not tainted 6.16.0-rc3 torvalds#61 PREEMPT(voluntary) Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014 RIP: 0010:unregister_vlan_dev (net/8021q/vlan.c:99 (discriminator 1)) RSP: 0018:ffff88810badf310 EFLAGS: 00010246 RAX: 0000000000000000 RBX: ffff88810da84000 RCX: ffffffffb47ceb9a RDX: dffffc0000000000 RSI: 0000000000000008 RDI: ffff88810e8b43c8 RBP: 0000000000000000 R08: 0000000000000000 R09: fffffbfff6cefe80 R10: ffffffffb677f407 R11: ffff88810badf3c0 R12: ffff88810e8b4000 R13: 0000000000000000 R14: ffff88810642a5c0 R15: 000000000000017e FS: 00007f1ff68c20c0(0000) GS:ffff888163a24000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f1ff5dad240 CR3: 0000000107e56000 CR4: 00000000000006f0 Call Trace: <TASK> rtnl_dellink (net/core/rtnetlink.c:3511 net/core/rtnetlink.c:3553) rtnetlink_rcv_msg (net/core/rtnetlink.c:6945) netlink_rcv_skb (net/netlink/af_netlink.c:2535) netlink_unicast (net/netlink/af_netlink.c:1314 net/netlink/af_netlink.c:1339) netlink_sendmsg (net/netlink/af_netlink.c:1883) ____sys_sendmsg (net/socket.c:712 net/socket.c:727 net/socket.c:2566) ___sys_sendmsg (net/socket.c:2622) __sys_sendmsg (net/socket.c:2652) do_syscall_64 (arch/x86/entry/syscall_64.c:63 arch/x86/entry/syscall_64.c:94) Root cause is as below: step1: add vlan0 for real_dev, such as bond, team. register_vlan_dev vlan_vid_add(real_dev,htons(ETH_P_8021Q),0) //refcnt=1 step2: disable vlan filter feature and enable real_dev step3: change filter from 0 to 1 vlan_device_event vlan_filter_push_vids ndo_vlan_rx_add_vid //No refcnt added to real_dev vlan0 step4: real_dev down vlan_device_event vlan_vid_del(dev, htons(ETH_P_8021Q), 0); //refcnt=0 vlan_info_rcu_free //free vlan0 step5: real_dev up vlan_device_event vlan_vid_add(dev, htons(ETH_P_8021Q), 0); vlan_info_alloc //alloc new empty vid0. refcnt=1 step6: delete vlan0 unregister_vlan_dev BUG_ON(!vlan_info); //will trigger it if step5 was not executed vlan_group_set_device array = vg->vlan_devices_arrays //null-ptr-ref will be triggered after step5 E.g. the following sequence can reproduce null-ptr-ref $ ip link add bond0 type bond mode 0 $ ip link add link bond0 name vlan0 type vlan id 0 protocol 802.1q $ ethtool -K bond0 rx-vlan-filter off $ ifconfig bond0 up $ ethtool -K bond0 rx-vlan-filter on $ ifconfig bond0 down $ ifconfig bond0 up $ ip link del vlan0 Add the auto_vid0 flag to track the refcount of vlan0, and use this flag to determine whether to dec refcount while disabling real_dev. Fixes: ad1afb0 ("vlan_dev: VLAN 0 should be treated as "no vlan tag" (802.1p packet)") Reported-by: syzbot+a8b046e462915c65b10b@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=a8b046e462915c65b10b Co-developed-by: Ido Schimmel <idosch@idosch.org> Signed-off-by: Ido Schimmel <idosch@idosch.org> Signed-off-by: Dong Chenchen <dongchenchen2@huawei.com> Signed-off-by: NipaLocal <nipa@local>
… runtime 8021q(vlan_device_event) will add VLAN 0 when enabling the device, and remove it on disabling it if NETIF_F_HW_VLAN_CTAG_FILTER set. However, if changing filter feature during netdev runtime, null-ptr-unref[1] or bug_on[2] will be triggered by unregister_vlan_dev() for refcount imbalance. [1] BUG: KASAN: null-ptr-deref in unregister_vlan_dev (net/8021q/vlan.h:90 net/8021q/vlan.c:110) Write of size 8 at addr 0000000000000000 by task ip/382 CPU: 2 UID: 0 PID: 382 Comm: ip Not tainted 6.16.0-rc3 torvalds#60 PREEMPT(voluntary) Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014 Call Trace: <TASK> dump_stack_lvl (lib/dump_stack.c:123) kasan_report (mm/kasan/report.c:636) unregister_vlan_dev (net/8021q/vlan.h:90 net/8021q/vlan.c:110) rtnl_dellink (net/core/rtnetlink.c:3511 net/core/rtnetlink.c:3553) rtnetlink_rcv_msg (net/core/rtnetlink.c:6945) netlink_rcv_skb (net/netlink/af_netlink.c:2535) netlink_unicast (net/netlink/af_netlink.c:1314 net/netlink/af_netlink.c:1339) netlink_sendmsg (net/netlink/af_netlink.c:1883) ____sys_sendmsg (net/socket.c:712 net/socket.c:727 net/socket.c:2566) [2] kernel BUG at net/8021q/vlan.c:99! Oops: invalid opcode: 0000 [#1] SMP KASAN PTI CPU: 0 UID: 0 PID: 382 Comm: ip Not tainted 6.16.0-rc3 torvalds#61 PREEMPT(voluntary) Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014 RIP: 0010:unregister_vlan_dev (net/8021q/vlan.c:99 (discriminator 1)) RSP: 0018:ffff88810badf310 EFLAGS: 00010246 RAX: 0000000000000000 RBX: ffff88810da84000 RCX: ffffffffb47ceb9a RDX: dffffc0000000000 RSI: 0000000000000008 RDI: ffff88810e8b43c8 RBP: 0000000000000000 R08: 0000000000000000 R09: fffffbfff6cefe80 R10: ffffffffb677f407 R11: ffff88810badf3c0 R12: ffff88810e8b4000 R13: 0000000000000000 R14: ffff88810642a5c0 R15: 000000000000017e FS: 00007f1ff68c20c0(0000) GS:ffff888163a24000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f1ff5dad240 CR3: 0000000107e56000 CR4: 00000000000006f0 Call Trace: <TASK> rtnl_dellink (net/core/rtnetlink.c:3511 net/core/rtnetlink.c:3553) rtnetlink_rcv_msg (net/core/rtnetlink.c:6945) netlink_rcv_skb (net/netlink/af_netlink.c:2535) netlink_unicast (net/netlink/af_netlink.c:1314 net/netlink/af_netlink.c:1339) netlink_sendmsg (net/netlink/af_netlink.c:1883) ____sys_sendmsg (net/socket.c:712 net/socket.c:727 net/socket.c:2566) ___sys_sendmsg (net/socket.c:2622) __sys_sendmsg (net/socket.c:2652) do_syscall_64 (arch/x86/entry/syscall_64.c:63 arch/x86/entry/syscall_64.c:94) Root cause is as below: step1: add vlan0 for real_dev, such as bond, team. register_vlan_dev vlan_vid_add(real_dev,htons(ETH_P_8021Q),0) //refcnt=1 step2: disable vlan filter feature and enable real_dev step3: change filter from 0 to 1 vlan_device_event vlan_filter_push_vids ndo_vlan_rx_add_vid //No refcnt added to real_dev vlan0 step4: real_dev down vlan_device_event vlan_vid_del(dev, htons(ETH_P_8021Q), 0); //refcnt=0 vlan_info_rcu_free //free vlan0 step5: real_dev up vlan_device_event vlan_vid_add(dev, htons(ETH_P_8021Q), 0); vlan_info_alloc //alloc new empty vid0. refcnt=1 step6: delete vlan0 unregister_vlan_dev BUG_ON(!vlan_info); //will trigger it if step5 was not executed vlan_group_set_device array = vg->vlan_devices_arrays //null-ptr-ref will be triggered after step5 E.g. the following sequence can reproduce null-ptr-ref $ ip link add bond0 type bond mode 0 $ ip link add link bond0 name vlan0 type vlan id 0 protocol 802.1q $ ethtool -K bond0 rx-vlan-filter off $ ifconfig bond0 up $ ethtool -K bond0 rx-vlan-filter on $ ifconfig bond0 down $ ifconfig bond0 up $ ip link del vlan0 Add the auto_vid0 flag to track the refcount of vlan0, and use this flag to determine whether to dec refcount while disabling real_dev. Fixes: ad1afb0 ("vlan_dev: VLAN 0 should be treated as "no vlan tag" (802.1p packet)") Reported-by: syzbot+a8b046e462915c65b10b@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=a8b046e462915c65b10b Co-developed-by: Ido Schimmel <idosch@idosch.org> Signed-off-by: Ido Schimmel <idosch@idosch.org> Signed-off-by: Dong Chenchen <dongchenchen2@huawei.com> Signed-off-by: NipaLocal <nipa@local>
… runtime 8021q(vlan_device_event) will add VLAN 0 when enabling the device, and remove it on disabling it if NETIF_F_HW_VLAN_CTAG_FILTER set. However, if changing filter feature during netdev runtime, null-ptr-unref[1] or bug_on[2] will be triggered by unregister_vlan_dev() for refcount imbalance. [1] BUG: KASAN: null-ptr-deref in unregister_vlan_dev (net/8021q/vlan.h:90 net/8021q/vlan.c:110) Write of size 8 at addr 0000000000000000 by task ip/382 CPU: 2 UID: 0 PID: 382 Comm: ip Not tainted 6.16.0-rc3 torvalds#60 PREEMPT(voluntary) Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014 Call Trace: <TASK> dump_stack_lvl (lib/dump_stack.c:123) kasan_report (mm/kasan/report.c:636) unregister_vlan_dev (net/8021q/vlan.h:90 net/8021q/vlan.c:110) rtnl_dellink (net/core/rtnetlink.c:3511 net/core/rtnetlink.c:3553) rtnetlink_rcv_msg (net/core/rtnetlink.c:6945) netlink_rcv_skb (net/netlink/af_netlink.c:2535) netlink_unicast (net/netlink/af_netlink.c:1314 net/netlink/af_netlink.c:1339) netlink_sendmsg (net/netlink/af_netlink.c:1883) ____sys_sendmsg (net/socket.c:712 net/socket.c:727 net/socket.c:2566) [2] kernel BUG at net/8021q/vlan.c:99! Oops: invalid opcode: 0000 [#1] SMP KASAN PTI CPU: 0 UID: 0 PID: 382 Comm: ip Not tainted 6.16.0-rc3 torvalds#61 PREEMPT(voluntary) Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014 RIP: 0010:unregister_vlan_dev (net/8021q/vlan.c:99 (discriminator 1)) RSP: 0018:ffff88810badf310 EFLAGS: 00010246 RAX: 0000000000000000 RBX: ffff88810da84000 RCX: ffffffffb47ceb9a RDX: dffffc0000000000 RSI: 0000000000000008 RDI: ffff88810e8b43c8 RBP: 0000000000000000 R08: 0000000000000000 R09: fffffbfff6cefe80 R10: ffffffffb677f407 R11: ffff88810badf3c0 R12: ffff88810e8b4000 R13: 0000000000000000 R14: ffff88810642a5c0 R15: 000000000000017e FS: 00007f1ff68c20c0(0000) GS:ffff888163a24000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f1ff5dad240 CR3: 0000000107e56000 CR4: 00000000000006f0 Call Trace: <TASK> rtnl_dellink (net/core/rtnetlink.c:3511 net/core/rtnetlink.c:3553) rtnetlink_rcv_msg (net/core/rtnetlink.c:6945) netlink_rcv_skb (net/netlink/af_netlink.c:2535) netlink_unicast (net/netlink/af_netlink.c:1314 net/netlink/af_netlink.c:1339) netlink_sendmsg (net/netlink/af_netlink.c:1883) ____sys_sendmsg (net/socket.c:712 net/socket.c:727 net/socket.c:2566) ___sys_sendmsg (net/socket.c:2622) __sys_sendmsg (net/socket.c:2652) do_syscall_64 (arch/x86/entry/syscall_64.c:63 arch/x86/entry/syscall_64.c:94) Root cause is as below: step1: add vlan0 for real_dev, such as bond, team. register_vlan_dev vlan_vid_add(real_dev,htons(ETH_P_8021Q),0) //refcnt=1 step2: disable vlan filter feature and enable real_dev step3: change filter from 0 to 1 vlan_device_event vlan_filter_push_vids ndo_vlan_rx_add_vid //No refcnt added to real_dev vlan0 step4: real_dev down vlan_device_event vlan_vid_del(dev, htons(ETH_P_8021Q), 0); //refcnt=0 vlan_info_rcu_free //free vlan0 step5: real_dev up vlan_device_event vlan_vid_add(dev, htons(ETH_P_8021Q), 0); vlan_info_alloc //alloc new empty vid0. refcnt=1 step6: delete vlan0 unregister_vlan_dev BUG_ON(!vlan_info); //will trigger it if step5 was not executed vlan_group_set_device array = vg->vlan_devices_arrays //null-ptr-ref will be triggered after step5 E.g. the following sequence can reproduce null-ptr-ref $ ip link add bond0 type bond mode 0 $ ip link add link bond0 name vlan0 type vlan id 0 protocol 802.1q $ ethtool -K bond0 rx-vlan-filter off $ ifconfig bond0 up $ ethtool -K bond0 rx-vlan-filter on $ ifconfig bond0 down $ ifconfig bond0 up $ ip link del vlan0 Add the auto_vid0 flag to track the refcount of vlan0, and use this flag to determine whether to dec refcount while disabling real_dev. Fixes: ad1afb0 ("vlan_dev: VLAN 0 should be treated as "no vlan tag" (802.1p packet)") Reported-by: syzbot+a8b046e462915c65b10b@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=a8b046e462915c65b10b Co-developed-by: Ido Schimmel <idosch@idosch.org> Signed-off-by: Ido Schimmel <idosch@idosch.org> Signed-off-by: Dong Chenchen <dongchenchen2@huawei.com> Signed-off-by: NipaLocal <nipa@local>
… runtime Assuming the "rx-vlan-filter" feature is enabled on a net device, the 8021q module will automatically add or remove VLAN 0 when the net device is put administratively up or down, respectively. There are a couple of problems with the above scheme. The first problem is a memory leak that can happen if the "rx-vlan-filter" feature is disabled while the device is running: # ip link add bond1 up type bond mode 0 # ethtool -K bond1 rx-vlan-filter off # ip link del dev bond1 When the device is put administratively down the "rx-vlan-filter" feature is disabled, so the 8021q module will not remove VLAN 0 and the memory will be leaked [1]. Another problem that can happen is that the kernel can automatically delete VLAN 0 when the device is put administratively down despite not adding it when the device was put administratively up since during that time the "rx-vlan-filter" feature was disabled. null-ptr-unref or bug_on[2] will be triggered by unregister_vlan_dev() for refcount imbalance if toggling filtering during runtime: $ ip link add bond0 type bond mode 0 $ ip link add link bond0 name vlan0 type vlan id 0 protocol 802.1q $ ethtool -K bond0 rx-vlan-filter off $ ifconfig bond0 up $ ethtool -K bond0 rx-vlan-filter on $ ifconfig bond0 down $ ip link del vlan0 Root cause is as below: step1: add vlan0 for real_dev, such as bond, team. register_vlan_dev vlan_vid_add(real_dev,htons(ETH_P_8021Q),0) //refcnt=1 step2: disable vlan filter feature and enable real_dev step3: change filter from 0 to 1 vlan_device_event vlan_filter_push_vids ndo_vlan_rx_add_vid //No refcnt added to real_dev vlan0 step4: real_dev down vlan_device_event vlan_vid_del(dev, htons(ETH_P_8021Q), 0); //refcnt=0 vlan_info_rcu_free //free vlan0 step5: delete vlan0 unregister_vlan_dev BUG_ON(!vlan_info); //vlan_info is null Fix both problems by noting in the VLAN info whether VLAN 0 was automatically added upon NETDEV_UP and based on that decide whether it should be deleted upon NETDEV_DOWN, regardless of the state of the "rx-vlan-filter" feature. [1] unreferenced object 0xffff8880068e3100 (size 256): comm "ip", pid 384, jiffies 4296130254 hex dump (first 32 bytes): 00 20 30 0d 80 88 ff ff 00 00 00 00 00 00 00 00 . 0............. 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ backtrace (crc 81ce31fa): __kmalloc_cache_noprof+0x2b5/0x340 vlan_vid_add+0x434/0x940 vlan_device_event.cold+0x75/0xa8 notifier_call_chain+0xca/0x150 __dev_notify_flags+0xe3/0x250 rtnl_configure_link+0x193/0x260 rtnl_newlink_create+0x383/0x8e0 __rtnl_newlink+0x22c/0xa40 rtnl_newlink+0x627/0xb00 rtnetlink_rcv_msg+0x6fb/0xb70 netlink_rcv_skb+0x11f/0x350 netlink_unicast+0x426/0x710 netlink_sendmsg+0x75a/0xc20 __sock_sendmsg+0xc1/0x150 ____sys_sendmsg+0x5aa/0x7b0 ___sys_sendmsg+0xfc/0x180 [2] kernel BUG at net/8021q/vlan.c:99! Oops: invalid opcode: 0000 [#1] SMP KASAN PTI CPU: 0 UID: 0 PID: 382 Comm: ip Not tainted 6.16.0-rc3 torvalds#61 PREEMPT(voluntary) Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014 RIP: 0010:unregister_vlan_dev (net/8021q/vlan.c:99 (discriminator 1)) RSP: 0018:ffff88810badf310 EFLAGS: 00010246 RAX: 0000000000000000 RBX: ffff88810da84000 RCX: ffffffffb47ceb9a RDX: dffffc0000000000 RSI: 0000000000000008 RDI: ffff88810e8b43c8 RBP: 0000000000000000 R08: 0000000000000000 R09: fffffbfff6cefe80 R10: ffffffffb677f407 R11: ffff88810badf3c0 R12: ffff88810e8b4000 R13: 0000000000000000 R14: ffff88810642a5c0 R15: 000000000000017e FS: 00007f1ff68c20c0(0000) GS:ffff888163a24000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f1ff5dad240 CR3: 0000000107e56000 CR4: 00000000000006f0 Call Trace: <TASK> rtnl_dellink (net/core/rtnetlink.c:3511 net/core/rtnetlink.c:3553) rtnetlink_rcv_msg (net/core/rtnetlink.c:6945) netlink_rcv_skb (net/netlink/af_netlink.c:2535) netlink_unicast (net/netlink/af_netlink.c:1314 net/netlink/af_netlink.c:1339) netlink_sendmsg (net/netlink/af_netlink.c:1883) ____sys_sendmsg (net/socket.c:712 net/socket.c:727 net/socket.c:2566) ___sys_sendmsg (net/socket.c:2622) __sys_sendmsg (net/socket.c:2652) do_syscall_64 (arch/x86/entry/syscall_64.c:63 arch/x86/entry/syscall_64.c:94) Fixes: ad1afb0 ("vlan_dev: VLAN 0 should be treated as "no vlan tag" (802.1p packet)") Reported-by: syzbot+a8b046e462915c65b10b@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=a8b046e462915c65b10b Suggested-by: Ido Schimmel <idosch@idosch.org> Signed-off-by: Dong Chenchen <dongchenchen2@huawei.com>
… runtime Assuming the "rx-vlan-filter" feature is enabled on a net device, the 8021q module will automatically add or remove VLAN 0 when the net device is put administratively up or down, respectively. There are a couple of problems with the above scheme. The first problem is a memory leak that can happen if the "rx-vlan-filter" feature is disabled while the device is running: # ip link add bond1 up type bond mode 0 # ethtool -K bond1 rx-vlan-filter off # ip link del dev bond1 When the device is put administratively down the "rx-vlan-filter" feature is disabled, so the 8021q module will not remove VLAN 0 and the memory will be leaked [1]. Another problem that can happen is that the kernel can automatically delete VLAN 0 when the device is put administratively down despite not adding it when the device was put administratively up since during that time the "rx-vlan-filter" feature was disabled. null-ptr-unref or bug_on[2] will be triggered by unregister_vlan_dev() for refcount imbalance if toggling filtering during runtime: $ ip link add bond0 type bond mode 0 $ ip link add link bond0 name vlan0 type vlan id 0 protocol 802.1q $ ethtool -K bond0 rx-vlan-filter off $ ifconfig bond0 up $ ethtool -K bond0 rx-vlan-filter on $ ifconfig bond0 down $ ip link del vlan0 Root cause is as below: step1: add vlan0 for real_dev, such as bond, team. register_vlan_dev vlan_vid_add(real_dev,htons(ETH_P_8021Q),0) //refcnt=1 step2: disable vlan filter feature and enable real_dev step3: change filter from 0 to 1 vlan_device_event vlan_filter_push_vids ndo_vlan_rx_add_vid //No refcnt added to real_dev vlan0 step4: real_dev down vlan_device_event vlan_vid_del(dev, htons(ETH_P_8021Q), 0); //refcnt=0 vlan_info_rcu_free //free vlan0 step5: delete vlan0 unregister_vlan_dev BUG_ON(!vlan_info); //vlan_info is null Fix both problems by noting in the VLAN info whether VLAN 0 was automatically added upon NETDEV_UP and based on that decide whether it should be deleted upon NETDEV_DOWN, regardless of the state of the "rx-vlan-filter" feature. [1] unreferenced object 0xffff8880068e3100 (size 256): comm "ip", pid 384, jiffies 4296130254 hex dump (first 32 bytes): 00 20 30 0d 80 88 ff ff 00 00 00 00 00 00 00 00 . 0............. 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ backtrace (crc 81ce31fa): __kmalloc_cache_noprof+0x2b5/0x340 vlan_vid_add+0x434/0x940 vlan_device_event.cold+0x75/0xa8 notifier_call_chain+0xca/0x150 __dev_notify_flags+0xe3/0x250 rtnl_configure_link+0x193/0x260 rtnl_newlink_create+0x383/0x8e0 __rtnl_newlink+0x22c/0xa40 rtnl_newlink+0x627/0xb00 rtnetlink_rcv_msg+0x6fb/0xb70 netlink_rcv_skb+0x11f/0x350 netlink_unicast+0x426/0x710 netlink_sendmsg+0x75a/0xc20 __sock_sendmsg+0xc1/0x150 ____sys_sendmsg+0x5aa/0x7b0 ___sys_sendmsg+0xfc/0x180 [2] kernel BUG at net/8021q/vlan.c:99! Oops: invalid opcode: 0000 [#1] SMP KASAN PTI CPU: 0 UID: 0 PID: 382 Comm: ip Not tainted 6.16.0-rc3 torvalds#61 PREEMPT(voluntary) Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014 RIP: 0010:unregister_vlan_dev (net/8021q/vlan.c:99 (discriminator 1)) RSP: 0018:ffff88810badf310 EFLAGS: 00010246 RAX: 0000000000000000 RBX: ffff88810da84000 RCX: ffffffffb47ceb9a RDX: dffffc0000000000 RSI: 0000000000000008 RDI: ffff88810e8b43c8 RBP: 0000000000000000 R08: 0000000000000000 R09: fffffbfff6cefe80 R10: ffffffffb677f407 R11: ffff88810badf3c0 R12: ffff88810e8b4000 R13: 0000000000000000 R14: ffff88810642a5c0 R15: 000000000000017e FS: 00007f1ff68c20c0(0000) GS:ffff888163a24000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f1ff5dad240 CR3: 0000000107e56000 CR4: 00000000000006f0 Call Trace: <TASK> rtnl_dellink (net/core/rtnetlink.c:3511 net/core/rtnetlink.c:3553) rtnetlink_rcv_msg (net/core/rtnetlink.c:6945) netlink_rcv_skb (net/netlink/af_netlink.c:2535) netlink_unicast (net/netlink/af_netlink.c:1314 net/netlink/af_netlink.c:1339) netlink_sendmsg (net/netlink/af_netlink.c:1883) ____sys_sendmsg (net/socket.c:712 net/socket.c:727 net/socket.c:2566) ___sys_sendmsg (net/socket.c:2622) __sys_sendmsg (net/socket.c:2652) do_syscall_64 (arch/x86/entry/syscall_64.c:63 arch/x86/entry/syscall_64.c:94) Fixes: ad1afb0 ("vlan_dev: VLAN 0 should be treated as "no vlan tag" (802.1p packet)") Reported-by: syzbot+a8b046e462915c65b10b@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=a8b046e462915c65b10b Suggested-by: Ido Schimmel <idosch@idosch.org> Signed-off-by: Dong Chenchen <dongchenchen2@huawei.com> Signed-off-by: NipaLocal <nipa@local>
… runtime Assuming the "rx-vlan-filter" feature is enabled on a net device, the 8021q module will automatically add or remove VLAN 0 when the net device is put administratively up or down, respectively. There are a couple of problems with the above scheme. The first problem is a memory leak that can happen if the "rx-vlan-filter" feature is disabled while the device is running: # ip link add bond1 up type bond mode 0 # ethtool -K bond1 rx-vlan-filter off # ip link del dev bond1 When the device is put administratively down the "rx-vlan-filter" feature is disabled, so the 8021q module will not remove VLAN 0 and the memory will be leaked [1]. Another problem that can happen is that the kernel can automatically delete VLAN 0 when the device is put administratively down despite not adding it when the device was put administratively up since during that time the "rx-vlan-filter" feature was disabled. null-ptr-unref or bug_on[2] will be triggered by unregister_vlan_dev() for refcount imbalance if toggling filtering during runtime: $ ip link add bond0 type bond mode 0 $ ip link add link bond0 name vlan0 type vlan id 0 protocol 802.1q $ ethtool -K bond0 rx-vlan-filter off $ ifconfig bond0 up $ ethtool -K bond0 rx-vlan-filter on $ ifconfig bond0 down $ ip link del vlan0 Root cause is as below: step1: add vlan0 for real_dev, such as bond, team. register_vlan_dev vlan_vid_add(real_dev,htons(ETH_P_8021Q),0) //refcnt=1 step2: disable vlan filter feature and enable real_dev step3: change filter from 0 to 1 vlan_device_event vlan_filter_push_vids ndo_vlan_rx_add_vid //No refcnt added to real_dev vlan0 step4: real_dev down vlan_device_event vlan_vid_del(dev, htons(ETH_P_8021Q), 0); //refcnt=0 vlan_info_rcu_free //free vlan0 step5: delete vlan0 unregister_vlan_dev BUG_ON(!vlan_info); //vlan_info is null Fix both problems by noting in the VLAN info whether VLAN 0 was automatically added upon NETDEV_UP and based on that decide whether it should be deleted upon NETDEV_DOWN, regardless of the state of the "rx-vlan-filter" feature. [1] unreferenced object 0xffff8880068e3100 (size 256): comm "ip", pid 384, jiffies 4296130254 hex dump (first 32 bytes): 00 20 30 0d 80 88 ff ff 00 00 00 00 00 00 00 00 . 0............. 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ backtrace (crc 81ce31fa): __kmalloc_cache_noprof+0x2b5/0x340 vlan_vid_add+0x434/0x940 vlan_device_event.cold+0x75/0xa8 notifier_call_chain+0xca/0x150 __dev_notify_flags+0xe3/0x250 rtnl_configure_link+0x193/0x260 rtnl_newlink_create+0x383/0x8e0 __rtnl_newlink+0x22c/0xa40 rtnl_newlink+0x627/0xb00 rtnetlink_rcv_msg+0x6fb/0xb70 netlink_rcv_skb+0x11f/0x350 netlink_unicast+0x426/0x710 netlink_sendmsg+0x75a/0xc20 __sock_sendmsg+0xc1/0x150 ____sys_sendmsg+0x5aa/0x7b0 ___sys_sendmsg+0xfc/0x180 [2] kernel BUG at net/8021q/vlan.c:99! Oops: invalid opcode: 0000 [#1] SMP KASAN PTI CPU: 0 UID: 0 PID: 382 Comm: ip Not tainted 6.16.0-rc3 torvalds#61 PREEMPT(voluntary) Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014 RIP: 0010:unregister_vlan_dev (net/8021q/vlan.c:99 (discriminator 1)) RSP: 0018:ffff88810badf310 EFLAGS: 00010246 RAX: 0000000000000000 RBX: ffff88810da84000 RCX: ffffffffb47ceb9a RDX: dffffc0000000000 RSI: 0000000000000008 RDI: ffff88810e8b43c8 RBP: 0000000000000000 R08: 0000000000000000 R09: fffffbfff6cefe80 R10: ffffffffb677f407 R11: ffff88810badf3c0 R12: ffff88810e8b4000 R13: 0000000000000000 R14: ffff88810642a5c0 R15: 000000000000017e FS: 00007f1ff68c20c0(0000) GS:ffff888163a24000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f1ff5dad240 CR3: 0000000107e56000 CR4: 00000000000006f0 Call Trace: <TASK> rtnl_dellink (net/core/rtnetlink.c:3511 net/core/rtnetlink.c:3553) rtnetlink_rcv_msg (net/core/rtnetlink.c:6945) netlink_rcv_skb (net/netlink/af_netlink.c:2535) netlink_unicast (net/netlink/af_netlink.c:1314 net/netlink/af_netlink.c:1339) netlink_sendmsg (net/netlink/af_netlink.c:1883) ____sys_sendmsg (net/socket.c:712 net/socket.c:727 net/socket.c:2566) ___sys_sendmsg (net/socket.c:2622) __sys_sendmsg (net/socket.c:2652) do_syscall_64 (arch/x86/entry/syscall_64.c:63 arch/x86/entry/syscall_64.c:94) Fixes: ad1afb0 ("vlan_dev: VLAN 0 should be treated as "no vlan tag" (802.1p packet)") Reported-by: syzbot+a8b046e462915c65b10b@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=a8b046e462915c65b10b Suggested-by: Ido Schimmel <idosch@idosch.org> Signed-off-by: Dong Chenchen <dongchenchen2@huawei.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: NipaLocal <nipa@local>
… runtime Assuming the "rx-vlan-filter" feature is enabled on a net device, the 8021q module will automatically add or remove VLAN 0 when the net device is put administratively up or down, respectively. There are a couple of problems with the above scheme. The first problem is a memory leak that can happen if the "rx-vlan-filter" feature is disabled while the device is running: # ip link add bond1 up type bond mode 0 # ethtool -K bond1 rx-vlan-filter off # ip link del dev bond1 When the device is put administratively down the "rx-vlan-filter" feature is disabled, so the 8021q module will not remove VLAN 0 and the memory will be leaked [1]. Another problem that can happen is that the kernel can automatically delete VLAN 0 when the device is put administratively down despite not adding it when the device was put administratively up since during that time the "rx-vlan-filter" feature was disabled. null-ptr-unref or bug_on[2] will be triggered by unregister_vlan_dev() for refcount imbalance if toggling filtering during runtime: $ ip link add bond0 type bond mode 0 $ ip link add link bond0 name vlan0 type vlan id 0 protocol 802.1q $ ethtool -K bond0 rx-vlan-filter off $ ifconfig bond0 up $ ethtool -K bond0 rx-vlan-filter on $ ifconfig bond0 down $ ip link del vlan0 Root cause is as below: step1: add vlan0 for real_dev, such as bond, team. register_vlan_dev vlan_vid_add(real_dev,htons(ETH_P_8021Q),0) //refcnt=1 step2: disable vlan filter feature and enable real_dev step3: change filter from 0 to 1 vlan_device_event vlan_filter_push_vids ndo_vlan_rx_add_vid //No refcnt added to real_dev vlan0 step4: real_dev down vlan_device_event vlan_vid_del(dev, htons(ETH_P_8021Q), 0); //refcnt=0 vlan_info_rcu_free //free vlan0 step5: delete vlan0 unregister_vlan_dev BUG_ON(!vlan_info); //vlan_info is null Fix both problems by noting in the VLAN info whether VLAN 0 was automatically added upon NETDEV_UP and based on that decide whether it should be deleted upon NETDEV_DOWN, regardless of the state of the "rx-vlan-filter" feature. [1] unreferenced object 0xffff8880068e3100 (size 256): comm "ip", pid 384, jiffies 4296130254 hex dump (first 32 bytes): 00 20 30 0d 80 88 ff ff 00 00 00 00 00 00 00 00 . 0............. 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ backtrace (crc 81ce31fa): __kmalloc_cache_noprof+0x2b5/0x340 vlan_vid_add+0x434/0x940 vlan_device_event.cold+0x75/0xa8 notifier_call_chain+0xca/0x150 __dev_notify_flags+0xe3/0x250 rtnl_configure_link+0x193/0x260 rtnl_newlink_create+0x383/0x8e0 __rtnl_newlink+0x22c/0xa40 rtnl_newlink+0x627/0xb00 rtnetlink_rcv_msg+0x6fb/0xb70 netlink_rcv_skb+0x11f/0x350 netlink_unicast+0x426/0x710 netlink_sendmsg+0x75a/0xc20 __sock_sendmsg+0xc1/0x150 ____sys_sendmsg+0x5aa/0x7b0 ___sys_sendmsg+0xfc/0x180 [2] kernel BUG at net/8021q/vlan.c:99! Oops: invalid opcode: 0000 [#1] SMP KASAN PTI CPU: 0 UID: 0 PID: 382 Comm: ip Not tainted 6.16.0-rc3 torvalds#61 PREEMPT(voluntary) Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014 RIP: 0010:unregister_vlan_dev (net/8021q/vlan.c:99 (discriminator 1)) RSP: 0018:ffff88810badf310 EFLAGS: 00010246 RAX: 0000000000000000 RBX: ffff88810da84000 RCX: ffffffffb47ceb9a RDX: dffffc0000000000 RSI: 0000000000000008 RDI: ffff88810e8b43c8 RBP: 0000000000000000 R08: 0000000000000000 R09: fffffbfff6cefe80 R10: ffffffffb677f407 R11: ffff88810badf3c0 R12: ffff88810e8b4000 R13: 0000000000000000 R14: ffff88810642a5c0 R15: 000000000000017e FS: 00007f1ff68c20c0(0000) GS:ffff888163a24000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f1ff5dad240 CR3: 0000000107e56000 CR4: 00000000000006f0 Call Trace: <TASK> rtnl_dellink (net/core/rtnetlink.c:3511 net/core/rtnetlink.c:3553) rtnetlink_rcv_msg (net/core/rtnetlink.c:6945) netlink_rcv_skb (net/netlink/af_netlink.c:2535) netlink_unicast (net/netlink/af_netlink.c:1314 net/netlink/af_netlink.c:1339) netlink_sendmsg (net/netlink/af_netlink.c:1883) ____sys_sendmsg (net/socket.c:712 net/socket.c:727 net/socket.c:2566) ___sys_sendmsg (net/socket.c:2622) __sys_sendmsg (net/socket.c:2652) do_syscall_64 (arch/x86/entry/syscall_64.c:63 arch/x86/entry/syscall_64.c:94) Fixes: ad1afb0 ("vlan_dev: VLAN 0 should be treated as "no vlan tag" (802.1p packet)") Reported-by: syzbot+a8b046e462915c65b10b@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=a8b046e462915c65b10b Suggested-by: Ido Schimmel <idosch@idosch.org> Signed-off-by: Dong Chenchen <dongchenchen2@huawei.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: NipaLocal <nipa@local>
… runtime Assuming the "rx-vlan-filter" feature is enabled on a net device, the 8021q module will automatically add or remove VLAN 0 when the net device is put administratively up or down, respectively. There are a couple of problems with the above scheme. The first problem is a memory leak that can happen if the "rx-vlan-filter" feature is disabled while the device is running: # ip link add bond1 up type bond mode 0 # ethtool -K bond1 rx-vlan-filter off # ip link del dev bond1 When the device is put administratively down the "rx-vlan-filter" feature is disabled, so the 8021q module will not remove VLAN 0 and the memory will be leaked [1]. Another problem that can happen is that the kernel can automatically delete VLAN 0 when the device is put administratively down despite not adding it when the device was put administratively up since during that time the "rx-vlan-filter" feature was disabled. null-ptr-unref or bug_on[2] will be triggered by unregister_vlan_dev() for refcount imbalance if toggling filtering during runtime: $ ip link add bond0 type bond mode 0 $ ip link add link bond0 name vlan0 type vlan id 0 protocol 802.1q $ ethtool -K bond0 rx-vlan-filter off $ ifconfig bond0 up $ ethtool -K bond0 rx-vlan-filter on $ ifconfig bond0 down $ ip link del vlan0 Root cause is as below: step1: add vlan0 for real_dev, such as bond, team. register_vlan_dev vlan_vid_add(real_dev,htons(ETH_P_8021Q),0) //refcnt=1 step2: disable vlan filter feature and enable real_dev step3: change filter from 0 to 1 vlan_device_event vlan_filter_push_vids ndo_vlan_rx_add_vid //No refcnt added to real_dev vlan0 step4: real_dev down vlan_device_event vlan_vid_del(dev, htons(ETH_P_8021Q), 0); //refcnt=0 vlan_info_rcu_free //free vlan0 step5: delete vlan0 unregister_vlan_dev BUG_ON(!vlan_info); //vlan_info is null Fix both problems by noting in the VLAN info whether VLAN 0 was automatically added upon NETDEV_UP and based on that decide whether it should be deleted upon NETDEV_DOWN, regardless of the state of the "rx-vlan-filter" feature. [1] unreferenced object 0xffff8880068e3100 (size 256): comm "ip", pid 384, jiffies 4296130254 hex dump (first 32 bytes): 00 20 30 0d 80 88 ff ff 00 00 00 00 00 00 00 00 . 0............. 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ backtrace (crc 81ce31fa): __kmalloc_cache_noprof+0x2b5/0x340 vlan_vid_add+0x434/0x940 vlan_device_event.cold+0x75/0xa8 notifier_call_chain+0xca/0x150 __dev_notify_flags+0xe3/0x250 rtnl_configure_link+0x193/0x260 rtnl_newlink_create+0x383/0x8e0 __rtnl_newlink+0x22c/0xa40 rtnl_newlink+0x627/0xb00 rtnetlink_rcv_msg+0x6fb/0xb70 netlink_rcv_skb+0x11f/0x350 netlink_unicast+0x426/0x710 netlink_sendmsg+0x75a/0xc20 __sock_sendmsg+0xc1/0x150 ____sys_sendmsg+0x5aa/0x7b0 ___sys_sendmsg+0xfc/0x180 [2] kernel BUG at net/8021q/vlan.c:99! Oops: invalid opcode: 0000 [#1] SMP KASAN PTI CPU: 0 UID: 0 PID: 382 Comm: ip Not tainted 6.16.0-rc3 torvalds#61 PREEMPT(voluntary) Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014 RIP: 0010:unregister_vlan_dev (net/8021q/vlan.c:99 (discriminator 1)) RSP: 0018:ffff88810badf310 EFLAGS: 00010246 RAX: 0000000000000000 RBX: ffff88810da84000 RCX: ffffffffb47ceb9a RDX: dffffc0000000000 RSI: 0000000000000008 RDI: ffff88810e8b43c8 RBP: 0000000000000000 R08: 0000000000000000 R09: fffffbfff6cefe80 R10: ffffffffb677f407 R11: ffff88810badf3c0 R12: ffff88810e8b4000 R13: 0000000000000000 R14: ffff88810642a5c0 R15: 000000000000017e FS: 00007f1ff68c20c0(0000) GS:ffff888163a24000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f1ff5dad240 CR3: 0000000107e56000 CR4: 00000000000006f0 Call Trace: <TASK> rtnl_dellink (net/core/rtnetlink.c:3511 net/core/rtnetlink.c:3553) rtnetlink_rcv_msg (net/core/rtnetlink.c:6945) netlink_rcv_skb (net/netlink/af_netlink.c:2535) netlink_unicast (net/netlink/af_netlink.c:1314 net/netlink/af_netlink.c:1339) netlink_sendmsg (net/netlink/af_netlink.c:1883) ____sys_sendmsg (net/socket.c:712 net/socket.c:727 net/socket.c:2566) ___sys_sendmsg (net/socket.c:2622) __sys_sendmsg (net/socket.c:2652) do_syscall_64 (arch/x86/entry/syscall_64.c:63 arch/x86/entry/syscall_64.c:94) Fixes: ad1afb0 ("vlan_dev: VLAN 0 should be treated as "no vlan tag" (802.1p packet)") Reported-by: syzbot+a8b046e462915c65b10b@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=a8b046e462915c65b10b Suggested-by: Ido Schimmel <idosch@idosch.org> Signed-off-by: Dong Chenchen <dongchenchen2@huawei.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: NipaLocal <nipa@local>
… runtime Assuming the "rx-vlan-filter" feature is enabled on a net device, the 8021q module will automatically add or remove VLAN 0 when the net device is put administratively up or down, respectively. There are a couple of problems with the above scheme. The first problem is a memory leak that can happen if the "rx-vlan-filter" feature is disabled while the device is running: # ip link add bond1 up type bond mode 0 # ethtool -K bond1 rx-vlan-filter off # ip link del dev bond1 When the device is put administratively down the "rx-vlan-filter" feature is disabled, so the 8021q module will not remove VLAN 0 and the memory will be leaked [1]. Another problem that can happen is that the kernel can automatically delete VLAN 0 when the device is put administratively down despite not adding it when the device was put administratively up since during that time the "rx-vlan-filter" feature was disabled. null-ptr-unref or bug_on[2] will be triggered by unregister_vlan_dev() for refcount imbalance if toggling filtering during runtime: $ ip link add bond0 type bond mode 0 $ ip link add link bond0 name vlan0 type vlan id 0 protocol 802.1q $ ethtool -K bond0 rx-vlan-filter off $ ifconfig bond0 up $ ethtool -K bond0 rx-vlan-filter on $ ifconfig bond0 down $ ip link del vlan0 Root cause is as below: step1: add vlan0 for real_dev, such as bond, team. register_vlan_dev vlan_vid_add(real_dev,htons(ETH_P_8021Q),0) //refcnt=1 step2: disable vlan filter feature and enable real_dev step3: change filter from 0 to 1 vlan_device_event vlan_filter_push_vids ndo_vlan_rx_add_vid //No refcnt added to real_dev vlan0 step4: real_dev down vlan_device_event vlan_vid_del(dev, htons(ETH_P_8021Q), 0); //refcnt=0 vlan_info_rcu_free //free vlan0 step5: delete vlan0 unregister_vlan_dev BUG_ON(!vlan_info); //vlan_info is null Fix both problems by noting in the VLAN info whether VLAN 0 was automatically added upon NETDEV_UP and based on that decide whether it should be deleted upon NETDEV_DOWN, regardless of the state of the "rx-vlan-filter" feature. [1] unreferenced object 0xffff8880068e3100 (size 256): comm "ip", pid 384, jiffies 4296130254 hex dump (first 32 bytes): 00 20 30 0d 80 88 ff ff 00 00 00 00 00 00 00 00 . 0............. 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ backtrace (crc 81ce31fa): __kmalloc_cache_noprof+0x2b5/0x340 vlan_vid_add+0x434/0x940 vlan_device_event.cold+0x75/0xa8 notifier_call_chain+0xca/0x150 __dev_notify_flags+0xe3/0x250 rtnl_configure_link+0x193/0x260 rtnl_newlink_create+0x383/0x8e0 __rtnl_newlink+0x22c/0xa40 rtnl_newlink+0x627/0xb00 rtnetlink_rcv_msg+0x6fb/0xb70 netlink_rcv_skb+0x11f/0x350 netlink_unicast+0x426/0x710 netlink_sendmsg+0x75a/0xc20 __sock_sendmsg+0xc1/0x150 ____sys_sendmsg+0x5aa/0x7b0 ___sys_sendmsg+0xfc/0x180 [2] kernel BUG at net/8021q/vlan.c:99! Oops: invalid opcode: 0000 [#1] SMP KASAN PTI CPU: 0 UID: 0 PID: 382 Comm: ip Not tainted 6.16.0-rc3 torvalds#61 PREEMPT(voluntary) Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014 RIP: 0010:unregister_vlan_dev (net/8021q/vlan.c:99 (discriminator 1)) RSP: 0018:ffff88810badf310 EFLAGS: 00010246 RAX: 0000000000000000 RBX: ffff88810da84000 RCX: ffffffffb47ceb9a RDX: dffffc0000000000 RSI: 0000000000000008 RDI: ffff88810e8b43c8 RBP: 0000000000000000 R08: 0000000000000000 R09: fffffbfff6cefe80 R10: ffffffffb677f407 R11: ffff88810badf3c0 R12: ffff88810e8b4000 R13: 0000000000000000 R14: ffff88810642a5c0 R15: 000000000000017e FS: 00007f1ff68c20c0(0000) GS:ffff888163a24000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f1ff5dad240 CR3: 0000000107e56000 CR4: 00000000000006f0 Call Trace: <TASK> rtnl_dellink (net/core/rtnetlink.c:3511 net/core/rtnetlink.c:3553) rtnetlink_rcv_msg (net/core/rtnetlink.c:6945) netlink_rcv_skb (net/netlink/af_netlink.c:2535) netlink_unicast (net/netlink/af_netlink.c:1314 net/netlink/af_netlink.c:1339) netlink_sendmsg (net/netlink/af_netlink.c:1883) ____sys_sendmsg (net/socket.c:712 net/socket.c:727 net/socket.c:2566) ___sys_sendmsg (net/socket.c:2622) __sys_sendmsg (net/socket.c:2652) do_syscall_64 (arch/x86/entry/syscall_64.c:63 arch/x86/entry/syscall_64.c:94) Fixes: ad1afb0 ("vlan_dev: VLAN 0 should be treated as "no vlan tag" (802.1p packet)") Reported-by: syzbot+a8b046e462915c65b10b@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=a8b046e462915c65b10b Suggested-by: Ido Schimmel <idosch@idosch.org> Signed-off-by: Dong Chenchen <dongchenchen2@huawei.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: NipaLocal <nipa@local>
… runtime Assuming the "rx-vlan-filter" feature is enabled on a net device, the 8021q module will automatically add or remove VLAN 0 when the net device is put administratively up or down, respectively. There are a couple of problems with the above scheme. The first problem is a memory leak that can happen if the "rx-vlan-filter" feature is disabled while the device is running: # ip link add bond1 up type bond mode 0 # ethtool -K bond1 rx-vlan-filter off # ip link del dev bond1 When the device is put administratively down the "rx-vlan-filter" feature is disabled, so the 8021q module will not remove VLAN 0 and the memory will be leaked [1]. Another problem that can happen is that the kernel can automatically delete VLAN 0 when the device is put administratively down despite not adding it when the device was put administratively up since during that time the "rx-vlan-filter" feature was disabled. null-ptr-unref or bug_on[2] will be triggered by unregister_vlan_dev() for refcount imbalance if toggling filtering during runtime: $ ip link add bond0 type bond mode 0 $ ip link add link bond0 name vlan0 type vlan id 0 protocol 802.1q $ ethtool -K bond0 rx-vlan-filter off $ ifconfig bond0 up $ ethtool -K bond0 rx-vlan-filter on $ ifconfig bond0 down $ ip link del vlan0 Root cause is as below: step1: add vlan0 for real_dev, such as bond, team. register_vlan_dev vlan_vid_add(real_dev,htons(ETH_P_8021Q),0) //refcnt=1 step2: disable vlan filter feature and enable real_dev step3: change filter from 0 to 1 vlan_device_event vlan_filter_push_vids ndo_vlan_rx_add_vid //No refcnt added to real_dev vlan0 step4: real_dev down vlan_device_event vlan_vid_del(dev, htons(ETH_P_8021Q), 0); //refcnt=0 vlan_info_rcu_free //free vlan0 step5: delete vlan0 unregister_vlan_dev BUG_ON(!vlan_info); //vlan_info is null Fix both problems by noting in the VLAN info whether VLAN 0 was automatically added upon NETDEV_UP and based on that decide whether it should be deleted upon NETDEV_DOWN, regardless of the state of the "rx-vlan-filter" feature. [1] unreferenced object 0xffff8880068e3100 (size 256): comm "ip", pid 384, jiffies 4296130254 hex dump (first 32 bytes): 00 20 30 0d 80 88 ff ff 00 00 00 00 00 00 00 00 . 0............. 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ backtrace (crc 81ce31fa): __kmalloc_cache_noprof+0x2b5/0x340 vlan_vid_add+0x434/0x940 vlan_device_event.cold+0x75/0xa8 notifier_call_chain+0xca/0x150 __dev_notify_flags+0xe3/0x250 rtnl_configure_link+0x193/0x260 rtnl_newlink_create+0x383/0x8e0 __rtnl_newlink+0x22c/0xa40 rtnl_newlink+0x627/0xb00 rtnetlink_rcv_msg+0x6fb/0xb70 netlink_rcv_skb+0x11f/0x350 netlink_unicast+0x426/0x710 netlink_sendmsg+0x75a/0xc20 __sock_sendmsg+0xc1/0x150 ____sys_sendmsg+0x5aa/0x7b0 ___sys_sendmsg+0xfc/0x180 [2] kernel BUG at net/8021q/vlan.c:99! Oops: invalid opcode: 0000 [#1] SMP KASAN PTI CPU: 0 UID: 0 PID: 382 Comm: ip Not tainted 6.16.0-rc3 torvalds#61 PREEMPT(voluntary) Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014 RIP: 0010:unregister_vlan_dev (net/8021q/vlan.c:99 (discriminator 1)) RSP: 0018:ffff88810badf310 EFLAGS: 00010246 RAX: 0000000000000000 RBX: ffff88810da84000 RCX: ffffffffb47ceb9a RDX: dffffc0000000000 RSI: 0000000000000008 RDI: ffff88810e8b43c8 RBP: 0000000000000000 R08: 0000000000000000 R09: fffffbfff6cefe80 R10: ffffffffb677f407 R11: ffff88810badf3c0 R12: ffff88810e8b4000 R13: 0000000000000000 R14: ffff88810642a5c0 R15: 000000000000017e FS: 00007f1ff68c20c0(0000) GS:ffff888163a24000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f1ff5dad240 CR3: 0000000107e56000 CR4: 00000000000006f0 Call Trace: <TASK> rtnl_dellink (net/core/rtnetlink.c:3511 net/core/rtnetlink.c:3553) rtnetlink_rcv_msg (net/core/rtnetlink.c:6945) netlink_rcv_skb (net/netlink/af_netlink.c:2535) netlink_unicast (net/netlink/af_netlink.c:1314 net/netlink/af_netlink.c:1339) netlink_sendmsg (net/netlink/af_netlink.c:1883) ____sys_sendmsg (net/socket.c:712 net/socket.c:727 net/socket.c:2566) ___sys_sendmsg (net/socket.c:2622) __sys_sendmsg (net/socket.c:2652) do_syscall_64 (arch/x86/entry/syscall_64.c:63 arch/x86/entry/syscall_64.c:94) Fixes: ad1afb0 ("vlan_dev: VLAN 0 should be treated as "no vlan tag" (802.1p packet)") Reported-by: syzbot+a8b046e462915c65b10b@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=a8b046e462915c65b10b Suggested-by: Ido Schimmel <idosch@idosch.org> Signed-off-by: Dong Chenchen <dongchenchen2@huawei.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: NipaLocal <nipa@local>
… runtime Assuming the "rx-vlan-filter" feature is enabled on a net device, the 8021q module will automatically add or remove VLAN 0 when the net device is put administratively up or down, respectively. There are a couple of problems with the above scheme. The first problem is a memory leak that can happen if the "rx-vlan-filter" feature is disabled while the device is running: # ip link add bond1 up type bond mode 0 # ethtool -K bond1 rx-vlan-filter off # ip link del dev bond1 When the device is put administratively down the "rx-vlan-filter" feature is disabled, so the 8021q module will not remove VLAN 0 and the memory will be leaked [1]. Another problem that can happen is that the kernel can automatically delete VLAN 0 when the device is put administratively down despite not adding it when the device was put administratively up since during that time the "rx-vlan-filter" feature was disabled. null-ptr-unref or bug_on[2] will be triggered by unregister_vlan_dev() for refcount imbalance if toggling filtering during runtime: $ ip link add bond0 type bond mode 0 $ ip link add link bond0 name vlan0 type vlan id 0 protocol 802.1q $ ethtool -K bond0 rx-vlan-filter off $ ifconfig bond0 up $ ethtool -K bond0 rx-vlan-filter on $ ifconfig bond0 down $ ip link del vlan0 Root cause is as below: step1: add vlan0 for real_dev, such as bond, team. register_vlan_dev vlan_vid_add(real_dev,htons(ETH_P_8021Q),0) //refcnt=1 step2: disable vlan filter feature and enable real_dev step3: change filter from 0 to 1 vlan_device_event vlan_filter_push_vids ndo_vlan_rx_add_vid //No refcnt added to real_dev vlan0 step4: real_dev down vlan_device_event vlan_vid_del(dev, htons(ETH_P_8021Q), 0); //refcnt=0 vlan_info_rcu_free //free vlan0 step5: delete vlan0 unregister_vlan_dev BUG_ON(!vlan_info); //vlan_info is null Fix both problems by noting in the VLAN info whether VLAN 0 was automatically added upon NETDEV_UP and based on that decide whether it should be deleted upon NETDEV_DOWN, regardless of the state of the "rx-vlan-filter" feature. [1] unreferenced object 0xffff8880068e3100 (size 256): comm "ip", pid 384, jiffies 4296130254 hex dump (first 32 bytes): 00 20 30 0d 80 88 ff ff 00 00 00 00 00 00 00 00 . 0............. 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ backtrace (crc 81ce31fa): __kmalloc_cache_noprof+0x2b5/0x340 vlan_vid_add+0x434/0x940 vlan_device_event.cold+0x75/0xa8 notifier_call_chain+0xca/0x150 __dev_notify_flags+0xe3/0x250 rtnl_configure_link+0x193/0x260 rtnl_newlink_create+0x383/0x8e0 __rtnl_newlink+0x22c/0xa40 rtnl_newlink+0x627/0xb00 rtnetlink_rcv_msg+0x6fb/0xb70 netlink_rcv_skb+0x11f/0x350 netlink_unicast+0x426/0x710 netlink_sendmsg+0x75a/0xc20 __sock_sendmsg+0xc1/0x150 ____sys_sendmsg+0x5aa/0x7b0 ___sys_sendmsg+0xfc/0x180 [2] kernel BUG at net/8021q/vlan.c:99! Oops: invalid opcode: 0000 [#1] SMP KASAN PTI CPU: 0 UID: 0 PID: 382 Comm: ip Not tainted 6.16.0-rc3 torvalds#61 PREEMPT(voluntary) Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014 RIP: 0010:unregister_vlan_dev (net/8021q/vlan.c:99 (discriminator 1)) RSP: 0018:ffff88810badf310 EFLAGS: 00010246 RAX: 0000000000000000 RBX: ffff88810da84000 RCX: ffffffffb47ceb9a RDX: dffffc0000000000 RSI: 0000000000000008 RDI: ffff88810e8b43c8 RBP: 0000000000000000 R08: 0000000000000000 R09: fffffbfff6cefe80 R10: ffffffffb677f407 R11: ffff88810badf3c0 R12: ffff88810e8b4000 R13: 0000000000000000 R14: ffff88810642a5c0 R15: 000000000000017e FS: 00007f1ff68c20c0(0000) GS:ffff888163a24000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f1ff5dad240 CR3: 0000000107e56000 CR4: 00000000000006f0 Call Trace: <TASK> rtnl_dellink (net/core/rtnetlink.c:3511 net/core/rtnetlink.c:3553) rtnetlink_rcv_msg (net/core/rtnetlink.c:6945) netlink_rcv_skb (net/netlink/af_netlink.c:2535) netlink_unicast (net/netlink/af_netlink.c:1314 net/netlink/af_netlink.c:1339) netlink_sendmsg (net/netlink/af_netlink.c:1883) ____sys_sendmsg (net/socket.c:712 net/socket.c:727 net/socket.c:2566) ___sys_sendmsg (net/socket.c:2622) __sys_sendmsg (net/socket.c:2652) do_syscall_64 (arch/x86/entry/syscall_64.c:63 arch/x86/entry/syscall_64.c:94) Fixes: ad1afb0 ("vlan_dev: VLAN 0 should be treated as "no vlan tag" (802.1p packet)") Reported-by: syzbot+a8b046e462915c65b10b@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=a8b046e462915c65b10b Suggested-by: Ido Schimmel <idosch@idosch.org> Signed-off-by: Dong Chenchen <dongchenchen2@huawei.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: NipaLocal <nipa@local>
… runtime Assuming the "rx-vlan-filter" feature is enabled on a net device, the 8021q module will automatically add or remove VLAN 0 when the net device is put administratively up or down, respectively. There are a couple of problems with the above scheme. The first problem is a memory leak that can happen if the "rx-vlan-filter" feature is disabled while the device is running: # ip link add bond1 up type bond mode 0 # ethtool -K bond1 rx-vlan-filter off # ip link del dev bond1 When the device is put administratively down the "rx-vlan-filter" feature is disabled, so the 8021q module will not remove VLAN 0 and the memory will be leaked [1]. Another problem that can happen is that the kernel can automatically delete VLAN 0 when the device is put administratively down despite not adding it when the device was put administratively up since during that time the "rx-vlan-filter" feature was disabled. null-ptr-unref or bug_on[2] will be triggered by unregister_vlan_dev() for refcount imbalance if toggling filtering during runtime: $ ip link add bond0 type bond mode 0 $ ip link add link bond0 name vlan0 type vlan id 0 protocol 802.1q $ ethtool -K bond0 rx-vlan-filter off $ ifconfig bond0 up $ ethtool -K bond0 rx-vlan-filter on $ ifconfig bond0 down $ ip link del vlan0 Root cause is as below: step1: add vlan0 for real_dev, such as bond, team. register_vlan_dev vlan_vid_add(real_dev,htons(ETH_P_8021Q),0) //refcnt=1 step2: disable vlan filter feature and enable real_dev step3: change filter from 0 to 1 vlan_device_event vlan_filter_push_vids ndo_vlan_rx_add_vid //No refcnt added to real_dev vlan0 step4: real_dev down vlan_device_event vlan_vid_del(dev, htons(ETH_P_8021Q), 0); //refcnt=0 vlan_info_rcu_free //free vlan0 step5: delete vlan0 unregister_vlan_dev BUG_ON(!vlan_info); //vlan_info is null Fix both problems by noting in the VLAN info whether VLAN 0 was automatically added upon NETDEV_UP and based on that decide whether it should be deleted upon NETDEV_DOWN, regardless of the state of the "rx-vlan-filter" feature. [1] unreferenced object 0xffff8880068e3100 (size 256): comm "ip", pid 384, jiffies 4296130254 hex dump (first 32 bytes): 00 20 30 0d 80 88 ff ff 00 00 00 00 00 00 00 00 . 0............. 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ backtrace (crc 81ce31fa): __kmalloc_cache_noprof+0x2b5/0x340 vlan_vid_add+0x434/0x940 vlan_device_event.cold+0x75/0xa8 notifier_call_chain+0xca/0x150 __dev_notify_flags+0xe3/0x250 rtnl_configure_link+0x193/0x260 rtnl_newlink_create+0x383/0x8e0 __rtnl_newlink+0x22c/0xa40 rtnl_newlink+0x627/0xb00 rtnetlink_rcv_msg+0x6fb/0xb70 netlink_rcv_skb+0x11f/0x350 netlink_unicast+0x426/0x710 netlink_sendmsg+0x75a/0xc20 __sock_sendmsg+0xc1/0x150 ____sys_sendmsg+0x5aa/0x7b0 ___sys_sendmsg+0xfc/0x180 [2] kernel BUG at net/8021q/vlan.c:99! Oops: invalid opcode: 0000 [#1] SMP KASAN PTI CPU: 0 UID: 0 PID: 382 Comm: ip Not tainted 6.16.0-rc3 torvalds#61 PREEMPT(voluntary) Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014 RIP: 0010:unregister_vlan_dev (net/8021q/vlan.c:99 (discriminator 1)) RSP: 0018:ffff88810badf310 EFLAGS: 00010246 RAX: 0000000000000000 RBX: ffff88810da84000 RCX: ffffffffb47ceb9a RDX: dffffc0000000000 RSI: 0000000000000008 RDI: ffff88810e8b43c8 RBP: 0000000000000000 R08: 0000000000000000 R09: fffffbfff6cefe80 R10: ffffffffb677f407 R11: ffff88810badf3c0 R12: ffff88810e8b4000 R13: 0000000000000000 R14: ffff88810642a5c0 R15: 000000000000017e FS: 00007f1ff68c20c0(0000) GS:ffff888163a24000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f1ff5dad240 CR3: 0000000107e56000 CR4: 00000000000006f0 Call Trace: <TASK> rtnl_dellink (net/core/rtnetlink.c:3511 net/core/rtnetlink.c:3553) rtnetlink_rcv_msg (net/core/rtnetlink.c:6945) netlink_rcv_skb (net/netlink/af_netlink.c:2535) netlink_unicast (net/netlink/af_netlink.c:1314 net/netlink/af_netlink.c:1339) netlink_sendmsg (net/netlink/af_netlink.c:1883) ____sys_sendmsg (net/socket.c:712 net/socket.c:727 net/socket.c:2566) ___sys_sendmsg (net/socket.c:2622) __sys_sendmsg (net/socket.c:2652) do_syscall_64 (arch/x86/entry/syscall_64.c:63 arch/x86/entry/syscall_64.c:94) Fixes: ad1afb0 ("vlan_dev: VLAN 0 should be treated as "no vlan tag" (802.1p packet)") Reported-by: syzbot+a8b046e462915c65b10b@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=a8b046e462915c65b10b Suggested-by: Ido Schimmel <idosch@idosch.org> Signed-off-by: Dong Chenchen <dongchenchen2@huawei.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: NipaLocal <nipa@local>
… runtime Assuming the "rx-vlan-filter" feature is enabled on a net device, the 8021q module will automatically add or remove VLAN 0 when the net device is put administratively up or down, respectively. There are a couple of problems with the above scheme. The first problem is a memory leak that can happen if the "rx-vlan-filter" feature is disabled while the device is running: # ip link add bond1 up type bond mode 0 # ethtool -K bond1 rx-vlan-filter off # ip link del dev bond1 When the device is put administratively down the "rx-vlan-filter" feature is disabled, so the 8021q module will not remove VLAN 0 and the memory will be leaked [1]. Another problem that can happen is that the kernel can automatically delete VLAN 0 when the device is put administratively down despite not adding it when the device was put administratively up since during that time the "rx-vlan-filter" feature was disabled. null-ptr-unref or bug_on[2] will be triggered by unregister_vlan_dev() for refcount imbalance if toggling filtering during runtime: $ ip link add bond0 type bond mode 0 $ ip link add link bond0 name vlan0 type vlan id 0 protocol 802.1q $ ethtool -K bond0 rx-vlan-filter off $ ifconfig bond0 up $ ethtool -K bond0 rx-vlan-filter on $ ifconfig bond0 down $ ip link del vlan0 Root cause is as below: step1: add vlan0 for real_dev, such as bond, team. register_vlan_dev vlan_vid_add(real_dev,htons(ETH_P_8021Q),0) //refcnt=1 step2: disable vlan filter feature and enable real_dev step3: change filter from 0 to 1 vlan_device_event vlan_filter_push_vids ndo_vlan_rx_add_vid //No refcnt added to real_dev vlan0 step4: real_dev down vlan_device_event vlan_vid_del(dev, htons(ETH_P_8021Q), 0); //refcnt=0 vlan_info_rcu_free //free vlan0 step5: delete vlan0 unregister_vlan_dev BUG_ON(!vlan_info); //vlan_info is null Fix both problems by noting in the VLAN info whether VLAN 0 was automatically added upon NETDEV_UP and based on that decide whether it should be deleted upon NETDEV_DOWN, regardless of the state of the "rx-vlan-filter" feature. [1] unreferenced object 0xffff8880068e3100 (size 256): comm "ip", pid 384, jiffies 4296130254 hex dump (first 32 bytes): 00 20 30 0d 80 88 ff ff 00 00 00 00 00 00 00 00 . 0............. 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ backtrace (crc 81ce31fa): __kmalloc_cache_noprof+0x2b5/0x340 vlan_vid_add+0x434/0x940 vlan_device_event.cold+0x75/0xa8 notifier_call_chain+0xca/0x150 __dev_notify_flags+0xe3/0x250 rtnl_configure_link+0x193/0x260 rtnl_newlink_create+0x383/0x8e0 __rtnl_newlink+0x22c/0xa40 rtnl_newlink+0x627/0xb00 rtnetlink_rcv_msg+0x6fb/0xb70 netlink_rcv_skb+0x11f/0x350 netlink_unicast+0x426/0x710 netlink_sendmsg+0x75a/0xc20 __sock_sendmsg+0xc1/0x150 ____sys_sendmsg+0x5aa/0x7b0 ___sys_sendmsg+0xfc/0x180 [2] kernel BUG at net/8021q/vlan.c:99! Oops: invalid opcode: 0000 [#1] SMP KASAN PTI CPU: 0 UID: 0 PID: 382 Comm: ip Not tainted 6.16.0-rc3 torvalds#61 PREEMPT(voluntary) Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014 RIP: 0010:unregister_vlan_dev (net/8021q/vlan.c:99 (discriminator 1)) RSP: 0018:ffff88810badf310 EFLAGS: 00010246 RAX: 0000000000000000 RBX: ffff88810da84000 RCX: ffffffffb47ceb9a RDX: dffffc0000000000 RSI: 0000000000000008 RDI: ffff88810e8b43c8 RBP: 0000000000000000 R08: 0000000000000000 R09: fffffbfff6cefe80 R10: ffffffffb677f407 R11: ffff88810badf3c0 R12: ffff88810e8b4000 R13: 0000000000000000 R14: ffff88810642a5c0 R15: 000000000000017e FS: 00007f1ff68c20c0(0000) GS:ffff888163a24000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f1ff5dad240 CR3: 0000000107e56000 CR4: 00000000000006f0 Call Trace: <TASK> rtnl_dellink (net/core/rtnetlink.c:3511 net/core/rtnetlink.c:3553) rtnetlink_rcv_msg (net/core/rtnetlink.c:6945) netlink_rcv_skb (net/netlink/af_netlink.c:2535) netlink_unicast (net/netlink/af_netlink.c:1314 net/netlink/af_netlink.c:1339) netlink_sendmsg (net/netlink/af_netlink.c:1883) ____sys_sendmsg (net/socket.c:712 net/socket.c:727 net/socket.c:2566) ___sys_sendmsg (net/socket.c:2622) __sys_sendmsg (net/socket.c:2652) do_syscall_64 (arch/x86/entry/syscall_64.c:63 arch/x86/entry/syscall_64.c:94) Fixes: ad1afb0 ("vlan_dev: VLAN 0 should be treated as "no vlan tag" (802.1p packet)") Reported-by: syzbot+a8b046e462915c65b10b@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=a8b046e462915c65b10b Suggested-by: Ido Schimmel <idosch@idosch.org> Signed-off-by: Dong Chenchen <dongchenchen2@huawei.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: NipaLocal <nipa@local>
… runtime Assuming the "rx-vlan-filter" feature is enabled on a net device, the 8021q module will automatically add or remove VLAN 0 when the net device is put administratively up or down, respectively. There are a couple of problems with the above scheme. The first problem is a memory leak that can happen if the "rx-vlan-filter" feature is disabled while the device is running: # ip link add bond1 up type bond mode 0 # ethtool -K bond1 rx-vlan-filter off # ip link del dev bond1 When the device is put administratively down the "rx-vlan-filter" feature is disabled, so the 8021q module will not remove VLAN 0 and the memory will be leaked [1]. Another problem that can happen is that the kernel can automatically delete VLAN 0 when the device is put administratively down despite not adding it when the device was put administratively up since during that time the "rx-vlan-filter" feature was disabled. null-ptr-unref or bug_on[2] will be triggered by unregister_vlan_dev() for refcount imbalance if toggling filtering during runtime: $ ip link add bond0 type bond mode 0 $ ip link add link bond0 name vlan0 type vlan id 0 protocol 802.1q $ ethtool -K bond0 rx-vlan-filter off $ ifconfig bond0 up $ ethtool -K bond0 rx-vlan-filter on $ ifconfig bond0 down $ ip link del vlan0 Root cause is as below: step1: add vlan0 for real_dev, such as bond, team. register_vlan_dev vlan_vid_add(real_dev,htons(ETH_P_8021Q),0) //refcnt=1 step2: disable vlan filter feature and enable real_dev step3: change filter from 0 to 1 vlan_device_event vlan_filter_push_vids ndo_vlan_rx_add_vid //No refcnt added to real_dev vlan0 step4: real_dev down vlan_device_event vlan_vid_del(dev, htons(ETH_P_8021Q), 0); //refcnt=0 vlan_info_rcu_free //free vlan0 step5: delete vlan0 unregister_vlan_dev BUG_ON(!vlan_info); //vlan_info is null Fix both problems by noting in the VLAN info whether VLAN 0 was automatically added upon NETDEV_UP and based on that decide whether it should be deleted upon NETDEV_DOWN, regardless of the state of the "rx-vlan-filter" feature. [1] unreferenced object 0xffff8880068e3100 (size 256): comm "ip", pid 384, jiffies 4296130254 hex dump (first 32 bytes): 00 20 30 0d 80 88 ff ff 00 00 00 00 00 00 00 00 . 0............. 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ backtrace (crc 81ce31fa): __kmalloc_cache_noprof+0x2b5/0x340 vlan_vid_add+0x434/0x940 vlan_device_event.cold+0x75/0xa8 notifier_call_chain+0xca/0x150 __dev_notify_flags+0xe3/0x250 rtnl_configure_link+0x193/0x260 rtnl_newlink_create+0x383/0x8e0 __rtnl_newlink+0x22c/0xa40 rtnl_newlink+0x627/0xb00 rtnetlink_rcv_msg+0x6fb/0xb70 netlink_rcv_skb+0x11f/0x350 netlink_unicast+0x426/0x710 netlink_sendmsg+0x75a/0xc20 __sock_sendmsg+0xc1/0x150 ____sys_sendmsg+0x5aa/0x7b0 ___sys_sendmsg+0xfc/0x180 [2] kernel BUG at net/8021q/vlan.c:99! Oops: invalid opcode: 0000 [#1] SMP KASAN PTI CPU: 0 UID: 0 PID: 382 Comm: ip Not tainted 6.16.0-rc3 torvalds#61 PREEMPT(voluntary) Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014 RIP: 0010:unregister_vlan_dev (net/8021q/vlan.c:99 (discriminator 1)) RSP: 0018:ffff88810badf310 EFLAGS: 00010246 RAX: 0000000000000000 RBX: ffff88810da84000 RCX: ffffffffb47ceb9a RDX: dffffc0000000000 RSI: 0000000000000008 RDI: ffff88810e8b43c8 RBP: 0000000000000000 R08: 0000000000000000 R09: fffffbfff6cefe80 R10: ffffffffb677f407 R11: ffff88810badf3c0 R12: ffff88810e8b4000 R13: 0000000000000000 R14: ffff88810642a5c0 R15: 000000000000017e FS: 00007f1ff68c20c0(0000) GS:ffff888163a24000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f1ff5dad240 CR3: 0000000107e56000 CR4: 00000000000006f0 Call Trace: <TASK> rtnl_dellink (net/core/rtnetlink.c:3511 net/core/rtnetlink.c:3553) rtnetlink_rcv_msg (net/core/rtnetlink.c:6945) netlink_rcv_skb (net/netlink/af_netlink.c:2535) netlink_unicast (net/netlink/af_netlink.c:1314 net/netlink/af_netlink.c:1339) netlink_sendmsg (net/netlink/af_netlink.c:1883) ____sys_sendmsg (net/socket.c:712 net/socket.c:727 net/socket.c:2566) ___sys_sendmsg (net/socket.c:2622) __sys_sendmsg (net/socket.c:2652) do_syscall_64 (arch/x86/entry/syscall_64.c:63 arch/x86/entry/syscall_64.c:94) Fixes: ad1afb0 ("vlan_dev: VLAN 0 should be treated as "no vlan tag" (802.1p packet)") Reported-by: syzbot+a8b046e462915c65b10b@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=a8b046e462915c65b10b Suggested-by: Ido Schimmel <idosch@idosch.org> Signed-off-by: Dong Chenchen <dongchenchen2@huawei.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: NipaLocal <nipa@local>
… runtime Assuming the "rx-vlan-filter" feature is enabled on a net device, the 8021q module will automatically add or remove VLAN 0 when the net device is put administratively up or down, respectively. There are a couple of problems with the above scheme. The first problem is a memory leak that can happen if the "rx-vlan-filter" feature is disabled while the device is running: # ip link add bond1 up type bond mode 0 # ethtool -K bond1 rx-vlan-filter off # ip link del dev bond1 When the device is put administratively down the "rx-vlan-filter" feature is disabled, so the 8021q module will not remove VLAN 0 and the memory will be leaked [1]. Another problem that can happen is that the kernel can automatically delete VLAN 0 when the device is put administratively down despite not adding it when the device was put administratively up since during that time the "rx-vlan-filter" feature was disabled. null-ptr-unref or bug_on[2] will be triggered by unregister_vlan_dev() for refcount imbalance if toggling filtering during runtime: $ ip link add bond0 type bond mode 0 $ ip link add link bond0 name vlan0 type vlan id 0 protocol 802.1q $ ethtool -K bond0 rx-vlan-filter off $ ifconfig bond0 up $ ethtool -K bond0 rx-vlan-filter on $ ifconfig bond0 down $ ip link del vlan0 Root cause is as below: step1: add vlan0 for real_dev, such as bond, team. register_vlan_dev vlan_vid_add(real_dev,htons(ETH_P_8021Q),0) //refcnt=1 step2: disable vlan filter feature and enable real_dev step3: change filter from 0 to 1 vlan_device_event vlan_filter_push_vids ndo_vlan_rx_add_vid //No refcnt added to real_dev vlan0 step4: real_dev down vlan_device_event vlan_vid_del(dev, htons(ETH_P_8021Q), 0); //refcnt=0 vlan_info_rcu_free //free vlan0 step5: delete vlan0 unregister_vlan_dev BUG_ON(!vlan_info); //vlan_info is null Fix both problems by noting in the VLAN info whether VLAN 0 was automatically added upon NETDEV_UP and based on that decide whether it should be deleted upon NETDEV_DOWN, regardless of the state of the "rx-vlan-filter" feature. [1] unreferenced object 0xffff8880068e3100 (size 256): comm "ip", pid 384, jiffies 4296130254 hex dump (first 32 bytes): 00 20 30 0d 80 88 ff ff 00 00 00 00 00 00 00 00 . 0............. 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ backtrace (crc 81ce31fa): __kmalloc_cache_noprof+0x2b5/0x340 vlan_vid_add+0x434/0x940 vlan_device_event.cold+0x75/0xa8 notifier_call_chain+0xca/0x150 __dev_notify_flags+0xe3/0x250 rtnl_configure_link+0x193/0x260 rtnl_newlink_create+0x383/0x8e0 __rtnl_newlink+0x22c/0xa40 rtnl_newlink+0x627/0xb00 rtnetlink_rcv_msg+0x6fb/0xb70 netlink_rcv_skb+0x11f/0x350 netlink_unicast+0x426/0x710 netlink_sendmsg+0x75a/0xc20 __sock_sendmsg+0xc1/0x150 ____sys_sendmsg+0x5aa/0x7b0 ___sys_sendmsg+0xfc/0x180 [2] kernel BUG at net/8021q/vlan.c:99! Oops: invalid opcode: 0000 [#1] SMP KASAN PTI CPU: 0 UID: 0 PID: 382 Comm: ip Not tainted 6.16.0-rc3 torvalds#61 PREEMPT(voluntary) Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014 RIP: 0010:unregister_vlan_dev (net/8021q/vlan.c:99 (discriminator 1)) RSP: 0018:ffff88810badf310 EFLAGS: 00010246 RAX: 0000000000000000 RBX: ffff88810da84000 RCX: ffffffffb47ceb9a RDX: dffffc0000000000 RSI: 0000000000000008 RDI: ffff88810e8b43c8 RBP: 0000000000000000 R08: 0000000000000000 R09: fffffbfff6cefe80 R10: ffffffffb677f407 R11: ffff88810badf3c0 R12: ffff88810e8b4000 R13: 0000000000000000 R14: ffff88810642a5c0 R15: 000000000000017e FS: 00007f1ff68c20c0(0000) GS:ffff888163a24000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f1ff5dad240 CR3: 0000000107e56000 CR4: 00000000000006f0 Call Trace: <TASK> rtnl_dellink (net/core/rtnetlink.c:3511 net/core/rtnetlink.c:3553) rtnetlink_rcv_msg (net/core/rtnetlink.c:6945) netlink_rcv_skb (net/netlink/af_netlink.c:2535) netlink_unicast (net/netlink/af_netlink.c:1314 net/netlink/af_netlink.c:1339) netlink_sendmsg (net/netlink/af_netlink.c:1883) ____sys_sendmsg (net/socket.c:712 net/socket.c:727 net/socket.c:2566) ___sys_sendmsg (net/socket.c:2622) __sys_sendmsg (net/socket.c:2652) do_syscall_64 (arch/x86/entry/syscall_64.c:63 arch/x86/entry/syscall_64.c:94) Fixes: ad1afb0 ("vlan_dev: VLAN 0 should be treated as "no vlan tag" (802.1p packet)") Reported-by: syzbot+a8b046e462915c65b10b@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=a8b046e462915c65b10b Suggested-by: Ido Schimmel <idosch@idosch.org> Signed-off-by: Dong Chenchen <dongchenchen2@huawei.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: NipaLocal <nipa@local>
… runtime Assuming the "rx-vlan-filter" feature is enabled on a net device, the 8021q module will automatically add or remove VLAN 0 when the net device is put administratively up or down, respectively. There are a couple of problems with the above scheme. The first problem is a memory leak that can happen if the "rx-vlan-filter" feature is disabled while the device is running: # ip link add bond1 up type bond mode 0 # ethtool -K bond1 rx-vlan-filter off # ip link del dev bond1 When the device is put administratively down the "rx-vlan-filter" feature is disabled, so the 8021q module will not remove VLAN 0 and the memory will be leaked [1]. Another problem that can happen is that the kernel can automatically delete VLAN 0 when the device is put administratively down despite not adding it when the device was put administratively up since during that time the "rx-vlan-filter" feature was disabled. null-ptr-unref or bug_on[2] will be triggered by unregister_vlan_dev() for refcount imbalance if toggling filtering during runtime: $ ip link add bond0 type bond mode 0 $ ip link add link bond0 name vlan0 type vlan id 0 protocol 802.1q $ ethtool -K bond0 rx-vlan-filter off $ ifconfig bond0 up $ ethtool -K bond0 rx-vlan-filter on $ ifconfig bond0 down $ ip link del vlan0 Root cause is as below: step1: add vlan0 for real_dev, such as bond, team. register_vlan_dev vlan_vid_add(real_dev,htons(ETH_P_8021Q),0) //refcnt=1 step2: disable vlan filter feature and enable real_dev step3: change filter from 0 to 1 vlan_device_event vlan_filter_push_vids ndo_vlan_rx_add_vid //No refcnt added to real_dev vlan0 step4: real_dev down vlan_device_event vlan_vid_del(dev, htons(ETH_P_8021Q), 0); //refcnt=0 vlan_info_rcu_free //free vlan0 step5: delete vlan0 unregister_vlan_dev BUG_ON(!vlan_info); //vlan_info is null Fix both problems by noting in the VLAN info whether VLAN 0 was automatically added upon NETDEV_UP and based on that decide whether it should be deleted upon NETDEV_DOWN, regardless of the state of the "rx-vlan-filter" feature. [1] unreferenced object 0xffff8880068e3100 (size 256): comm "ip", pid 384, jiffies 4296130254 hex dump (first 32 bytes): 00 20 30 0d 80 88 ff ff 00 00 00 00 00 00 00 00 . 0............. 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ backtrace (crc 81ce31fa): __kmalloc_cache_noprof+0x2b5/0x340 vlan_vid_add+0x434/0x940 vlan_device_event.cold+0x75/0xa8 notifier_call_chain+0xca/0x150 __dev_notify_flags+0xe3/0x250 rtnl_configure_link+0x193/0x260 rtnl_newlink_create+0x383/0x8e0 __rtnl_newlink+0x22c/0xa40 rtnl_newlink+0x627/0xb00 rtnetlink_rcv_msg+0x6fb/0xb70 netlink_rcv_skb+0x11f/0x350 netlink_unicast+0x426/0x710 netlink_sendmsg+0x75a/0xc20 __sock_sendmsg+0xc1/0x150 ____sys_sendmsg+0x5aa/0x7b0 ___sys_sendmsg+0xfc/0x180 [2] kernel BUG at net/8021q/vlan.c:99! Oops: invalid opcode: 0000 [#1] SMP KASAN PTI CPU: 0 UID: 0 PID: 382 Comm: ip Not tainted 6.16.0-rc3 torvalds#61 PREEMPT(voluntary) Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014 RIP: 0010:unregister_vlan_dev (net/8021q/vlan.c:99 (discriminator 1)) RSP: 0018:ffff88810badf310 EFLAGS: 00010246 RAX: 0000000000000000 RBX: ffff88810da84000 RCX: ffffffffb47ceb9a RDX: dffffc0000000000 RSI: 0000000000000008 RDI: ffff88810e8b43c8 RBP: 0000000000000000 R08: 0000000000000000 R09: fffffbfff6cefe80 R10: ffffffffb677f407 R11: ffff88810badf3c0 R12: ffff88810e8b4000 R13: 0000000000000000 R14: ffff88810642a5c0 R15: 000000000000017e FS: 00007f1ff68c20c0(0000) GS:ffff888163a24000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f1ff5dad240 CR3: 0000000107e56000 CR4: 00000000000006f0 Call Trace: <TASK> rtnl_dellink (net/core/rtnetlink.c:3511 net/core/rtnetlink.c:3553) rtnetlink_rcv_msg (net/core/rtnetlink.c:6945) netlink_rcv_skb (net/netlink/af_netlink.c:2535) netlink_unicast (net/netlink/af_netlink.c:1314 net/netlink/af_netlink.c:1339) netlink_sendmsg (net/netlink/af_netlink.c:1883) ____sys_sendmsg (net/socket.c:712 net/socket.c:727 net/socket.c:2566) ___sys_sendmsg (net/socket.c:2622) __sys_sendmsg (net/socket.c:2652) do_syscall_64 (arch/x86/entry/syscall_64.c:63 arch/x86/entry/syscall_64.c:94) Fixes: ad1afb0 ("vlan_dev: VLAN 0 should be treated as "no vlan tag" (802.1p packet)") Reported-by: syzbot+a8b046e462915c65b10b@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=a8b046e462915c65b10b Suggested-by: Ido Schimmel <idosch@idosch.org> Signed-off-by: Dong Chenchen <dongchenchen2@huawei.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: NipaLocal <nipa@local>
… runtime Assuming the "rx-vlan-filter" feature is enabled on a net device, the 8021q module will automatically add or remove VLAN 0 when the net device is put administratively up or down, respectively. There are a couple of problems with the above scheme. The first problem is a memory leak that can happen if the "rx-vlan-filter" feature is disabled while the device is running: # ip link add bond1 up type bond mode 0 # ethtool -K bond1 rx-vlan-filter off # ip link del dev bond1 When the device is put administratively down the "rx-vlan-filter" feature is disabled, so the 8021q module will not remove VLAN 0 and the memory will be leaked [1]. Another problem that can happen is that the kernel can automatically delete VLAN 0 when the device is put administratively down despite not adding it when the device was put administratively up since during that time the "rx-vlan-filter" feature was disabled. null-ptr-unref or bug_on[2] will be triggered by unregister_vlan_dev() for refcount imbalance if toggling filtering during runtime: $ ip link add bond0 type bond mode 0 $ ip link add link bond0 name vlan0 type vlan id 0 protocol 802.1q $ ethtool -K bond0 rx-vlan-filter off $ ifconfig bond0 up $ ethtool -K bond0 rx-vlan-filter on $ ifconfig bond0 down $ ip link del vlan0 Root cause is as below: step1: add vlan0 for real_dev, such as bond, team. register_vlan_dev vlan_vid_add(real_dev,htons(ETH_P_8021Q),0) //refcnt=1 step2: disable vlan filter feature and enable real_dev step3: change filter from 0 to 1 vlan_device_event vlan_filter_push_vids ndo_vlan_rx_add_vid //No refcnt added to real_dev vlan0 step4: real_dev down vlan_device_event vlan_vid_del(dev, htons(ETH_P_8021Q), 0); //refcnt=0 vlan_info_rcu_free //free vlan0 step5: delete vlan0 unregister_vlan_dev BUG_ON(!vlan_info); //vlan_info is null Fix both problems by noting in the VLAN info whether VLAN 0 was automatically added upon NETDEV_UP and based on that decide whether it should be deleted upon NETDEV_DOWN, regardless of the state of the "rx-vlan-filter" feature. [1] unreferenced object 0xffff8880068e3100 (size 256): comm "ip", pid 384, jiffies 4296130254 hex dump (first 32 bytes): 00 20 30 0d 80 88 ff ff 00 00 00 00 00 00 00 00 . 0............. 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ backtrace (crc 81ce31fa): __kmalloc_cache_noprof+0x2b5/0x340 vlan_vid_add+0x434/0x940 vlan_device_event.cold+0x75/0xa8 notifier_call_chain+0xca/0x150 __dev_notify_flags+0xe3/0x250 rtnl_configure_link+0x193/0x260 rtnl_newlink_create+0x383/0x8e0 __rtnl_newlink+0x22c/0xa40 rtnl_newlink+0x627/0xb00 rtnetlink_rcv_msg+0x6fb/0xb70 netlink_rcv_skb+0x11f/0x350 netlink_unicast+0x426/0x710 netlink_sendmsg+0x75a/0xc20 __sock_sendmsg+0xc1/0x150 ____sys_sendmsg+0x5aa/0x7b0 ___sys_sendmsg+0xfc/0x180 [2] kernel BUG at net/8021q/vlan.c:99! Oops: invalid opcode: 0000 [#1] SMP KASAN PTI CPU: 0 UID: 0 PID: 382 Comm: ip Not tainted 6.16.0-rc3 torvalds#61 PREEMPT(voluntary) Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014 RIP: 0010:unregister_vlan_dev (net/8021q/vlan.c:99 (discriminator 1)) RSP: 0018:ffff88810badf310 EFLAGS: 00010246 RAX: 0000000000000000 RBX: ffff88810da84000 RCX: ffffffffb47ceb9a RDX: dffffc0000000000 RSI: 0000000000000008 RDI: ffff88810e8b43c8 RBP: 0000000000000000 R08: 0000000000000000 R09: fffffbfff6cefe80 R10: ffffffffb677f407 R11: ffff88810badf3c0 R12: ffff88810e8b4000 R13: 0000000000000000 R14: ffff88810642a5c0 R15: 000000000000017e FS: 00007f1ff68c20c0(0000) GS:ffff888163a24000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f1ff5dad240 CR3: 0000000107e56000 CR4: 00000000000006f0 Call Trace: <TASK> rtnl_dellink (net/core/rtnetlink.c:3511 net/core/rtnetlink.c:3553) rtnetlink_rcv_msg (net/core/rtnetlink.c:6945) netlink_rcv_skb (net/netlink/af_netlink.c:2535) netlink_unicast (net/netlink/af_netlink.c:1314 net/netlink/af_netlink.c:1339) netlink_sendmsg (net/netlink/af_netlink.c:1883) ____sys_sendmsg (net/socket.c:712 net/socket.c:727 net/socket.c:2566) ___sys_sendmsg (net/socket.c:2622) __sys_sendmsg (net/socket.c:2652) do_syscall_64 (arch/x86/entry/syscall_64.c:63 arch/x86/entry/syscall_64.c:94) Fixes: ad1afb0 ("vlan_dev: VLAN 0 should be treated as "no vlan tag" (802.1p packet)") Reported-by: syzbot+a8b046e462915c65b10b@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=a8b046e462915c65b10b Suggested-by: Ido Schimmel <idosch@idosch.org> Signed-off-by: Dong Chenchen <dongchenchen2@huawei.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: NipaLocal <nipa@local>
… runtime Assuming the "rx-vlan-filter" feature is enabled on a net device, the 8021q module will automatically add or remove VLAN 0 when the net device is put administratively up or down, respectively. There are a couple of problems with the above scheme. The first problem is a memory leak that can happen if the "rx-vlan-filter" feature is disabled while the device is running: # ip link add bond1 up type bond mode 0 # ethtool -K bond1 rx-vlan-filter off # ip link del dev bond1 When the device is put administratively down the "rx-vlan-filter" feature is disabled, so the 8021q module will not remove VLAN 0 and the memory will be leaked [1]. Another problem that can happen is that the kernel can automatically delete VLAN 0 when the device is put administratively down despite not adding it when the device was put administratively up since during that time the "rx-vlan-filter" feature was disabled. null-ptr-unref or bug_on[2] will be triggered by unregister_vlan_dev() for refcount imbalance if toggling filtering during runtime: $ ip link add bond0 type bond mode 0 $ ip link add link bond0 name vlan0 type vlan id 0 protocol 802.1q $ ethtool -K bond0 rx-vlan-filter off $ ifconfig bond0 up $ ethtool -K bond0 rx-vlan-filter on $ ifconfig bond0 down $ ip link del vlan0 Root cause is as below: step1: add vlan0 for real_dev, such as bond, team. register_vlan_dev vlan_vid_add(real_dev,htons(ETH_P_8021Q),0) //refcnt=1 step2: disable vlan filter feature and enable real_dev step3: change filter from 0 to 1 vlan_device_event vlan_filter_push_vids ndo_vlan_rx_add_vid //No refcnt added to real_dev vlan0 step4: real_dev down vlan_device_event vlan_vid_del(dev, htons(ETH_P_8021Q), 0); //refcnt=0 vlan_info_rcu_free //free vlan0 step5: delete vlan0 unregister_vlan_dev BUG_ON(!vlan_info); //vlan_info is null Fix both problems by noting in the VLAN info whether VLAN 0 was automatically added upon NETDEV_UP and based on that decide whether it should be deleted upon NETDEV_DOWN, regardless of the state of the "rx-vlan-filter" feature. [1] unreferenced object 0xffff8880068e3100 (size 256): comm "ip", pid 384, jiffies 4296130254 hex dump (first 32 bytes): 00 20 30 0d 80 88 ff ff 00 00 00 00 00 00 00 00 . 0............. 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ backtrace (crc 81ce31fa): __kmalloc_cache_noprof+0x2b5/0x340 vlan_vid_add+0x434/0x940 vlan_device_event.cold+0x75/0xa8 notifier_call_chain+0xca/0x150 __dev_notify_flags+0xe3/0x250 rtnl_configure_link+0x193/0x260 rtnl_newlink_create+0x383/0x8e0 __rtnl_newlink+0x22c/0xa40 rtnl_newlink+0x627/0xb00 rtnetlink_rcv_msg+0x6fb/0xb70 netlink_rcv_skb+0x11f/0x350 netlink_unicast+0x426/0x710 netlink_sendmsg+0x75a/0xc20 __sock_sendmsg+0xc1/0x150 ____sys_sendmsg+0x5aa/0x7b0 ___sys_sendmsg+0xfc/0x180 [2] kernel BUG at net/8021q/vlan.c:99! Oops: invalid opcode: 0000 [#1] SMP KASAN PTI CPU: 0 UID: 0 PID: 382 Comm: ip Not tainted 6.16.0-rc3 torvalds#61 PREEMPT(voluntary) Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014 RIP: 0010:unregister_vlan_dev (net/8021q/vlan.c:99 (discriminator 1)) RSP: 0018:ffff88810badf310 EFLAGS: 00010246 RAX: 0000000000000000 RBX: ffff88810da84000 RCX: ffffffffb47ceb9a RDX: dffffc0000000000 RSI: 0000000000000008 RDI: ffff88810e8b43c8 RBP: 0000000000000000 R08: 0000000000000000 R09: fffffbfff6cefe80 R10: ffffffffb677f407 R11: ffff88810badf3c0 R12: ffff88810e8b4000 R13: 0000000000000000 R14: ffff88810642a5c0 R15: 000000000000017e FS: 00007f1ff68c20c0(0000) GS:ffff888163a24000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f1ff5dad240 CR3: 0000000107e56000 CR4: 00000000000006f0 Call Trace: <TASK> rtnl_dellink (net/core/rtnetlink.c:3511 net/core/rtnetlink.c:3553) rtnetlink_rcv_msg (net/core/rtnetlink.c:6945) netlink_rcv_skb (net/netlink/af_netlink.c:2535) netlink_unicast (net/netlink/af_netlink.c:1314 net/netlink/af_netlink.c:1339) netlink_sendmsg (net/netlink/af_netlink.c:1883) ____sys_sendmsg (net/socket.c:712 net/socket.c:727 net/socket.c:2566) ___sys_sendmsg (net/socket.c:2622) __sys_sendmsg (net/socket.c:2652) do_syscall_64 (arch/x86/entry/syscall_64.c:63 arch/x86/entry/syscall_64.c:94) Fixes: ad1afb0 ("vlan_dev: VLAN 0 should be treated as "no vlan tag" (802.1p packet)") Reported-by: syzbot+a8b046e462915c65b10b@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=a8b046e462915c65b10b Suggested-by: Ido Schimmel <idosch@idosch.org> Signed-off-by: Dong Chenchen <dongchenchen2@huawei.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Link: https://patch.msgid.link/20250716034504.2285203-2-dongchenchen2@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
… runtime [ Upstream commit 579d4f9 ] Assuming the "rx-vlan-filter" feature is enabled on a net device, the 8021q module will automatically add or remove VLAN 0 when the net device is put administratively up or down, respectively. There are a couple of problems with the above scheme. The first problem is a memory leak that can happen if the "rx-vlan-filter" feature is disabled while the device is running: # ip link add bond1 up type bond mode 0 # ethtool -K bond1 rx-vlan-filter off # ip link del dev bond1 When the device is put administratively down the "rx-vlan-filter" feature is disabled, so the 8021q module will not remove VLAN 0 and the memory will be leaked [1]. Another problem that can happen is that the kernel can automatically delete VLAN 0 when the device is put administratively down despite not adding it when the device was put administratively up since during that time the "rx-vlan-filter" feature was disabled. null-ptr-unref or bug_on[2] will be triggered by unregister_vlan_dev() for refcount imbalance if toggling filtering during runtime: $ ip link add bond0 type bond mode 0 $ ip link add link bond0 name vlan0 type vlan id 0 protocol 802.1q $ ethtool -K bond0 rx-vlan-filter off $ ifconfig bond0 up $ ethtool -K bond0 rx-vlan-filter on $ ifconfig bond0 down $ ip link del vlan0 Root cause is as below: step1: add vlan0 for real_dev, such as bond, team. register_vlan_dev vlan_vid_add(real_dev,htons(ETH_P_8021Q),0) //refcnt=1 step2: disable vlan filter feature and enable real_dev step3: change filter from 0 to 1 vlan_device_event vlan_filter_push_vids ndo_vlan_rx_add_vid //No refcnt added to real_dev vlan0 step4: real_dev down vlan_device_event vlan_vid_del(dev, htons(ETH_P_8021Q), 0); //refcnt=0 vlan_info_rcu_free //free vlan0 step5: delete vlan0 unregister_vlan_dev BUG_ON(!vlan_info); //vlan_info is null Fix both problems by noting in the VLAN info whether VLAN 0 was automatically added upon NETDEV_UP and based on that decide whether it should be deleted upon NETDEV_DOWN, regardless of the state of the "rx-vlan-filter" feature. [1] unreferenced object 0xffff8880068e3100 (size 256): comm "ip", pid 384, jiffies 4296130254 hex dump (first 32 bytes): 00 20 30 0d 80 88 ff ff 00 00 00 00 00 00 00 00 . 0............. 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ backtrace (crc 81ce31fa): __kmalloc_cache_noprof+0x2b5/0x340 vlan_vid_add+0x434/0x940 vlan_device_event.cold+0x75/0xa8 notifier_call_chain+0xca/0x150 __dev_notify_flags+0xe3/0x250 rtnl_configure_link+0x193/0x260 rtnl_newlink_create+0x383/0x8e0 __rtnl_newlink+0x22c/0xa40 rtnl_newlink+0x627/0xb00 rtnetlink_rcv_msg+0x6fb/0xb70 netlink_rcv_skb+0x11f/0x350 netlink_unicast+0x426/0x710 netlink_sendmsg+0x75a/0xc20 __sock_sendmsg+0xc1/0x150 ____sys_sendmsg+0x5aa/0x7b0 ___sys_sendmsg+0xfc/0x180 [2] kernel BUG at net/8021q/vlan.c:99! Oops: invalid opcode: 0000 [#1] SMP KASAN PTI CPU: 0 UID: 0 PID: 382 Comm: ip Not tainted 6.16.0-rc3 torvalds#61 PREEMPT(voluntary) Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014 RIP: 0010:unregister_vlan_dev (net/8021q/vlan.c:99 (discriminator 1)) RSP: 0018:ffff88810badf310 EFLAGS: 00010246 RAX: 0000000000000000 RBX: ffff88810da84000 RCX: ffffffffb47ceb9a RDX: dffffc0000000000 RSI: 0000000000000008 RDI: ffff88810e8b43c8 RBP: 0000000000000000 R08: 0000000000000000 R09: fffffbfff6cefe80 R10: ffffffffb677f407 R11: ffff88810badf3c0 R12: ffff88810e8b4000 R13: 0000000000000000 R14: ffff88810642a5c0 R15: 000000000000017e FS: 00007f1ff68c20c0(0000) GS:ffff888163a24000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f1ff5dad240 CR3: 0000000107e56000 CR4: 00000000000006f0 Call Trace: <TASK> rtnl_dellink (net/core/rtnetlink.c:3511 net/core/rtnetlink.c:3553) rtnetlink_rcv_msg (net/core/rtnetlink.c:6945) netlink_rcv_skb (net/netlink/af_netlink.c:2535) netlink_unicast (net/netlink/af_netlink.c:1314 net/netlink/af_netlink.c:1339) netlink_sendmsg (net/netlink/af_netlink.c:1883) ____sys_sendmsg (net/socket.c:712 net/socket.c:727 net/socket.c:2566) ___sys_sendmsg (net/socket.c:2622) __sys_sendmsg (net/socket.c:2652) do_syscall_64 (arch/x86/entry/syscall_64.c:63 arch/x86/entry/syscall_64.c:94) Fixes: ad1afb0 ("vlan_dev: VLAN 0 should be treated as "no vlan tag" (802.1p packet)") Reported-by: syzbot+a8b046e462915c65b10b@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=a8b046e462915c65b10b Suggested-by: Ido Schimmel <idosch@idosch.org> Signed-off-by: Dong Chenchen <dongchenchen2@huawei.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Link: https://patch.msgid.link/20250716034504.2285203-2-dongchenchen2@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
… runtime [ Upstream commit 579d4f9 ] Assuming the "rx-vlan-filter" feature is enabled on a net device, the 8021q module will automatically add or remove VLAN 0 when the net device is put administratively up or down, respectively. There are a couple of problems with the above scheme. The first problem is a memory leak that can happen if the "rx-vlan-filter" feature is disabled while the device is running: # ip link add bond1 up type bond mode 0 # ethtool -K bond1 rx-vlan-filter off # ip link del dev bond1 When the device is put administratively down the "rx-vlan-filter" feature is disabled, so the 8021q module will not remove VLAN 0 and the memory will be leaked [1]. Another problem that can happen is that the kernel can automatically delete VLAN 0 when the device is put administratively down despite not adding it when the device was put administratively up since during that time the "rx-vlan-filter" feature was disabled. null-ptr-unref or bug_on[2] will be triggered by unregister_vlan_dev() for refcount imbalance if toggling filtering during runtime: $ ip link add bond0 type bond mode 0 $ ip link add link bond0 name vlan0 type vlan id 0 protocol 802.1q $ ethtool -K bond0 rx-vlan-filter off $ ifconfig bond0 up $ ethtool -K bond0 rx-vlan-filter on $ ifconfig bond0 down $ ip link del vlan0 Root cause is as below: step1: add vlan0 for real_dev, such as bond, team. register_vlan_dev vlan_vid_add(real_dev,htons(ETH_P_8021Q),0) //refcnt=1 step2: disable vlan filter feature and enable real_dev step3: change filter from 0 to 1 vlan_device_event vlan_filter_push_vids ndo_vlan_rx_add_vid //No refcnt added to real_dev vlan0 step4: real_dev down vlan_device_event vlan_vid_del(dev, htons(ETH_P_8021Q), 0); //refcnt=0 vlan_info_rcu_free //free vlan0 step5: delete vlan0 unregister_vlan_dev BUG_ON(!vlan_info); //vlan_info is null Fix both problems by noting in the VLAN info whether VLAN 0 was automatically added upon NETDEV_UP and based on that decide whether it should be deleted upon NETDEV_DOWN, regardless of the state of the "rx-vlan-filter" feature. [1] unreferenced object 0xffff8880068e3100 (size 256): comm "ip", pid 384, jiffies 4296130254 hex dump (first 32 bytes): 00 20 30 0d 80 88 ff ff 00 00 00 00 00 00 00 00 . 0............. 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ backtrace (crc 81ce31fa): __kmalloc_cache_noprof+0x2b5/0x340 vlan_vid_add+0x434/0x940 vlan_device_event.cold+0x75/0xa8 notifier_call_chain+0xca/0x150 __dev_notify_flags+0xe3/0x250 rtnl_configure_link+0x193/0x260 rtnl_newlink_create+0x383/0x8e0 __rtnl_newlink+0x22c/0xa40 rtnl_newlink+0x627/0xb00 rtnetlink_rcv_msg+0x6fb/0xb70 netlink_rcv_skb+0x11f/0x350 netlink_unicast+0x426/0x710 netlink_sendmsg+0x75a/0xc20 __sock_sendmsg+0xc1/0x150 ____sys_sendmsg+0x5aa/0x7b0 ___sys_sendmsg+0xfc/0x180 [2] kernel BUG at net/8021q/vlan.c:99! Oops: invalid opcode: 0000 [#1] SMP KASAN PTI CPU: 0 UID: 0 PID: 382 Comm: ip Not tainted 6.16.0-rc3 torvalds#61 PREEMPT(voluntary) Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014 RIP: 0010:unregister_vlan_dev (net/8021q/vlan.c:99 (discriminator 1)) RSP: 0018:ffff88810badf310 EFLAGS: 00010246 RAX: 0000000000000000 RBX: ffff88810da84000 RCX: ffffffffb47ceb9a RDX: dffffc0000000000 RSI: 0000000000000008 RDI: ffff88810e8b43c8 RBP: 0000000000000000 R08: 0000000000000000 R09: fffffbfff6cefe80 R10: ffffffffb677f407 R11: ffff88810badf3c0 R12: ffff88810e8b4000 R13: 0000000000000000 R14: ffff88810642a5c0 R15: 000000000000017e FS: 00007f1ff68c20c0(0000) GS:ffff888163a24000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f1ff5dad240 CR3: 0000000107e56000 CR4: 00000000000006f0 Call Trace: <TASK> rtnl_dellink (net/core/rtnetlink.c:3511 net/core/rtnetlink.c:3553) rtnetlink_rcv_msg (net/core/rtnetlink.c:6945) netlink_rcv_skb (net/netlink/af_netlink.c:2535) netlink_unicast (net/netlink/af_netlink.c:1314 net/netlink/af_netlink.c:1339) netlink_sendmsg (net/netlink/af_netlink.c:1883) ____sys_sendmsg (net/socket.c:712 net/socket.c:727 net/socket.c:2566) ___sys_sendmsg (net/socket.c:2622) __sys_sendmsg (net/socket.c:2652) do_syscall_64 (arch/x86/entry/syscall_64.c:63 arch/x86/entry/syscall_64.c:94) Fixes: ad1afb0 ("vlan_dev: VLAN 0 should be treated as "no vlan tag" (802.1p packet)") Reported-by: syzbot+a8b046e462915c65b10b@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=a8b046e462915c65b10b Suggested-by: Ido Schimmel <idosch@idosch.org> Signed-off-by: Dong Chenchen <dongchenchen2@huawei.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Link: https://patch.msgid.link/20250716034504.2285203-2-dongchenchen2@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
… runtime [ Upstream commit 579d4f9 ] Assuming the "rx-vlan-filter" feature is enabled on a net device, the 8021q module will automatically add or remove VLAN 0 when the net device is put administratively up or down, respectively. There are a couple of problems with the above scheme. The first problem is a memory leak that can happen if the "rx-vlan-filter" feature is disabled while the device is running: # ip link add bond1 up type bond mode 0 # ethtool -K bond1 rx-vlan-filter off # ip link del dev bond1 When the device is put administratively down the "rx-vlan-filter" feature is disabled, so the 8021q module will not remove VLAN 0 and the memory will be leaked [1]. Another problem that can happen is that the kernel can automatically delete VLAN 0 when the device is put administratively down despite not adding it when the device was put administratively up since during that time the "rx-vlan-filter" feature was disabled. null-ptr-unref or bug_on[2] will be triggered by unregister_vlan_dev() for refcount imbalance if toggling filtering during runtime: $ ip link add bond0 type bond mode 0 $ ip link add link bond0 name vlan0 type vlan id 0 protocol 802.1q $ ethtool -K bond0 rx-vlan-filter off $ ifconfig bond0 up $ ethtool -K bond0 rx-vlan-filter on $ ifconfig bond0 down $ ip link del vlan0 Root cause is as below: step1: add vlan0 for real_dev, such as bond, team. register_vlan_dev vlan_vid_add(real_dev,htons(ETH_P_8021Q),0) //refcnt=1 step2: disable vlan filter feature and enable real_dev step3: change filter from 0 to 1 vlan_device_event vlan_filter_push_vids ndo_vlan_rx_add_vid //No refcnt added to real_dev vlan0 step4: real_dev down vlan_device_event vlan_vid_del(dev, htons(ETH_P_8021Q), 0); //refcnt=0 vlan_info_rcu_free //free vlan0 step5: delete vlan0 unregister_vlan_dev BUG_ON(!vlan_info); //vlan_info is null Fix both problems by noting in the VLAN info whether VLAN 0 was automatically added upon NETDEV_UP and based on that decide whether it should be deleted upon NETDEV_DOWN, regardless of the state of the "rx-vlan-filter" feature. [1] unreferenced object 0xffff8880068e3100 (size 256): comm "ip", pid 384, jiffies 4296130254 hex dump (first 32 bytes): 00 20 30 0d 80 88 ff ff 00 00 00 00 00 00 00 00 . 0............. 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ backtrace (crc 81ce31fa): __kmalloc_cache_noprof+0x2b5/0x340 vlan_vid_add+0x434/0x940 vlan_device_event.cold+0x75/0xa8 notifier_call_chain+0xca/0x150 __dev_notify_flags+0xe3/0x250 rtnl_configure_link+0x193/0x260 rtnl_newlink_create+0x383/0x8e0 __rtnl_newlink+0x22c/0xa40 rtnl_newlink+0x627/0xb00 rtnetlink_rcv_msg+0x6fb/0xb70 netlink_rcv_skb+0x11f/0x350 netlink_unicast+0x426/0x710 netlink_sendmsg+0x75a/0xc20 __sock_sendmsg+0xc1/0x150 ____sys_sendmsg+0x5aa/0x7b0 ___sys_sendmsg+0xfc/0x180 [2] kernel BUG at net/8021q/vlan.c:99! Oops: invalid opcode: 0000 [#1] SMP KASAN PTI CPU: 0 UID: 0 PID: 382 Comm: ip Not tainted 6.16.0-rc3 torvalds#61 PREEMPT(voluntary) Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014 RIP: 0010:unregister_vlan_dev (net/8021q/vlan.c:99 (discriminator 1)) RSP: 0018:ffff88810badf310 EFLAGS: 00010246 RAX: 0000000000000000 RBX: ffff88810da84000 RCX: ffffffffb47ceb9a RDX: dffffc0000000000 RSI: 0000000000000008 RDI: ffff88810e8b43c8 RBP: 0000000000000000 R08: 0000000000000000 R09: fffffbfff6cefe80 R10: ffffffffb677f407 R11: ffff88810badf3c0 R12: ffff88810e8b4000 R13: 0000000000000000 R14: ffff88810642a5c0 R15: 000000000000017e FS: 00007f1ff68c20c0(0000) GS:ffff888163a24000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f1ff5dad240 CR3: 0000000107e56000 CR4: 00000000000006f0 Call Trace: <TASK> rtnl_dellink (net/core/rtnetlink.c:3511 net/core/rtnetlink.c:3553) rtnetlink_rcv_msg (net/core/rtnetlink.c:6945) netlink_rcv_skb (net/netlink/af_netlink.c:2535) netlink_unicast (net/netlink/af_netlink.c:1314 net/netlink/af_netlink.c:1339) netlink_sendmsg (net/netlink/af_netlink.c:1883) ____sys_sendmsg (net/socket.c:712 net/socket.c:727 net/socket.c:2566) ___sys_sendmsg (net/socket.c:2622) __sys_sendmsg (net/socket.c:2652) do_syscall_64 (arch/x86/entry/syscall_64.c:63 arch/x86/entry/syscall_64.c:94) Fixes: ad1afb0 ("vlan_dev: VLAN 0 should be treated as "no vlan tag" (802.1p packet)") Reported-by: syzbot+a8b046e462915c65b10b@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=a8b046e462915c65b10b Suggested-by: Ido Schimmel <idosch@idosch.org> Signed-off-by: Dong Chenchen <dongchenchen2@huawei.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Link: https://patch.msgid.link/20250716034504.2285203-2-dongchenchen2@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
… runtime [ Upstream commit 579d4f9 ] Assuming the "rx-vlan-filter" feature is enabled on a net device, the 8021q module will automatically add or remove VLAN 0 when the net device is put administratively up or down, respectively. There are a couple of problems with the above scheme. The first problem is a memory leak that can happen if the "rx-vlan-filter" feature is disabled while the device is running: # ip link add bond1 up type bond mode 0 # ethtool -K bond1 rx-vlan-filter off # ip link del dev bond1 When the device is put administratively down the "rx-vlan-filter" feature is disabled, so the 8021q module will not remove VLAN 0 and the memory will be leaked [1]. Another problem that can happen is that the kernel can automatically delete VLAN 0 when the device is put administratively down despite not adding it when the device was put administratively up since during that time the "rx-vlan-filter" feature was disabled. null-ptr-unref or bug_on[2] will be triggered by unregister_vlan_dev() for refcount imbalance if toggling filtering during runtime: $ ip link add bond0 type bond mode 0 $ ip link add link bond0 name vlan0 type vlan id 0 protocol 802.1q $ ethtool -K bond0 rx-vlan-filter off $ ifconfig bond0 up $ ethtool -K bond0 rx-vlan-filter on $ ifconfig bond0 down $ ip link del vlan0 Root cause is as below: step1: add vlan0 for real_dev, such as bond, team. register_vlan_dev vlan_vid_add(real_dev,htons(ETH_P_8021Q),0) //refcnt=1 step2: disable vlan filter feature and enable real_dev step3: change filter from 0 to 1 vlan_device_event vlan_filter_push_vids ndo_vlan_rx_add_vid //No refcnt added to real_dev vlan0 step4: real_dev down vlan_device_event vlan_vid_del(dev, htons(ETH_P_8021Q), 0); //refcnt=0 vlan_info_rcu_free //free vlan0 step5: delete vlan0 unregister_vlan_dev BUG_ON(!vlan_info); //vlan_info is null Fix both problems by noting in the VLAN info whether VLAN 0 was automatically added upon NETDEV_UP and based on that decide whether it should be deleted upon NETDEV_DOWN, regardless of the state of the "rx-vlan-filter" feature. [1] unreferenced object 0xffff8880068e3100 (size 256): comm "ip", pid 384, jiffies 4296130254 hex dump (first 32 bytes): 00 20 30 0d 80 88 ff ff 00 00 00 00 00 00 00 00 . 0............. 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ backtrace (crc 81ce31fa): __kmalloc_cache_noprof+0x2b5/0x340 vlan_vid_add+0x434/0x940 vlan_device_event.cold+0x75/0xa8 notifier_call_chain+0xca/0x150 __dev_notify_flags+0xe3/0x250 rtnl_configure_link+0x193/0x260 rtnl_newlink_create+0x383/0x8e0 __rtnl_newlink+0x22c/0xa40 rtnl_newlink+0x627/0xb00 rtnetlink_rcv_msg+0x6fb/0xb70 netlink_rcv_skb+0x11f/0x350 netlink_unicast+0x426/0x710 netlink_sendmsg+0x75a/0xc20 __sock_sendmsg+0xc1/0x150 ____sys_sendmsg+0x5aa/0x7b0 ___sys_sendmsg+0xfc/0x180 [2] kernel BUG at net/8021q/vlan.c:99! Oops: invalid opcode: 0000 [#1] SMP KASAN PTI CPU: 0 UID: 0 PID: 382 Comm: ip Not tainted 6.16.0-rc3 torvalds#61 PREEMPT(voluntary) Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014 RIP: 0010:unregister_vlan_dev (net/8021q/vlan.c:99 (discriminator 1)) RSP: 0018:ffff88810badf310 EFLAGS: 00010246 RAX: 0000000000000000 RBX: ffff88810da84000 RCX: ffffffffb47ceb9a RDX: dffffc0000000000 RSI: 0000000000000008 RDI: ffff88810e8b43c8 RBP: 0000000000000000 R08: 0000000000000000 R09: fffffbfff6cefe80 R10: ffffffffb677f407 R11: ffff88810badf3c0 R12: ffff88810e8b4000 R13: 0000000000000000 R14: ffff88810642a5c0 R15: 000000000000017e FS: 00007f1ff68c20c0(0000) GS:ffff888163a24000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f1ff5dad240 CR3: 0000000107e56000 CR4: 00000000000006f0 Call Trace: <TASK> rtnl_dellink (net/core/rtnetlink.c:3511 net/core/rtnetlink.c:3553) rtnetlink_rcv_msg (net/core/rtnetlink.c:6945) netlink_rcv_skb (net/netlink/af_netlink.c:2535) netlink_unicast (net/netlink/af_netlink.c:1314 net/netlink/af_netlink.c:1339) netlink_sendmsg (net/netlink/af_netlink.c:1883) ____sys_sendmsg (net/socket.c:712 net/socket.c:727 net/socket.c:2566) ___sys_sendmsg (net/socket.c:2622) __sys_sendmsg (net/socket.c:2652) do_syscall_64 (arch/x86/entry/syscall_64.c:63 arch/x86/entry/syscall_64.c:94) Fixes: ad1afb0 ("vlan_dev: VLAN 0 should be treated as "no vlan tag" (802.1p packet)") Reported-by: syzbot+a8b046e462915c65b10b@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=a8b046e462915c65b10b Suggested-by: Ido Schimmel <idosch@idosch.org> Signed-off-by: Dong Chenchen <dongchenchen2@huawei.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Link: https://patch.msgid.link/20250716034504.2285203-2-dongchenchen2@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
… runtime [ Upstream commit 579d4f9 ] Assuming the "rx-vlan-filter" feature is enabled on a net device, the 8021q module will automatically add or remove VLAN 0 when the net device is put administratively up or down, respectively. There are a couple of problems with the above scheme. The first problem is a memory leak that can happen if the "rx-vlan-filter" feature is disabled while the device is running: # ip link add bond1 up type bond mode 0 # ethtool -K bond1 rx-vlan-filter off # ip link del dev bond1 When the device is put administratively down the "rx-vlan-filter" feature is disabled, so the 8021q module will not remove VLAN 0 and the memory will be leaked [1]. Another problem that can happen is that the kernel can automatically delete VLAN 0 when the device is put administratively down despite not adding it when the device was put administratively up since during that time the "rx-vlan-filter" feature was disabled. null-ptr-unref or bug_on[2] will be triggered by unregister_vlan_dev() for refcount imbalance if toggling filtering during runtime: $ ip link add bond0 type bond mode 0 $ ip link add link bond0 name vlan0 type vlan id 0 protocol 802.1q $ ethtool -K bond0 rx-vlan-filter off $ ifconfig bond0 up $ ethtool -K bond0 rx-vlan-filter on $ ifconfig bond0 down $ ip link del vlan0 Root cause is as below: step1: add vlan0 for real_dev, such as bond, team. register_vlan_dev vlan_vid_add(real_dev,htons(ETH_P_8021Q),0) //refcnt=1 step2: disable vlan filter feature and enable real_dev step3: change filter from 0 to 1 vlan_device_event vlan_filter_push_vids ndo_vlan_rx_add_vid //No refcnt added to real_dev vlan0 step4: real_dev down vlan_device_event vlan_vid_del(dev, htons(ETH_P_8021Q), 0); //refcnt=0 vlan_info_rcu_free //free vlan0 step5: delete vlan0 unregister_vlan_dev BUG_ON(!vlan_info); //vlan_info is null Fix both problems by noting in the VLAN info whether VLAN 0 was automatically added upon NETDEV_UP and based on that decide whether it should be deleted upon NETDEV_DOWN, regardless of the state of the "rx-vlan-filter" feature. [1] unreferenced object 0xffff8880068e3100 (size 256): comm "ip", pid 384, jiffies 4296130254 hex dump (first 32 bytes): 00 20 30 0d 80 88 ff ff 00 00 00 00 00 00 00 00 . 0............. 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ backtrace (crc 81ce31fa): __kmalloc_cache_noprof+0x2b5/0x340 vlan_vid_add+0x434/0x940 vlan_device_event.cold+0x75/0xa8 notifier_call_chain+0xca/0x150 __dev_notify_flags+0xe3/0x250 rtnl_configure_link+0x193/0x260 rtnl_newlink_create+0x383/0x8e0 __rtnl_newlink+0x22c/0xa40 rtnl_newlink+0x627/0xb00 rtnetlink_rcv_msg+0x6fb/0xb70 netlink_rcv_skb+0x11f/0x350 netlink_unicast+0x426/0x710 netlink_sendmsg+0x75a/0xc20 __sock_sendmsg+0xc1/0x150 ____sys_sendmsg+0x5aa/0x7b0 ___sys_sendmsg+0xfc/0x180 [2] kernel BUG at net/8021q/vlan.c:99! Oops: invalid opcode: 0000 [#1] SMP KASAN PTI CPU: 0 UID: 0 PID: 382 Comm: ip Not tainted 6.16.0-rc3 torvalds#61 PREEMPT(voluntary) Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014 RIP: 0010:unregister_vlan_dev (net/8021q/vlan.c:99 (discriminator 1)) RSP: 0018:ffff88810badf310 EFLAGS: 00010246 RAX: 0000000000000000 RBX: ffff88810da84000 RCX: ffffffffb47ceb9a RDX: dffffc0000000000 RSI: 0000000000000008 RDI: ffff88810e8b43c8 RBP: 0000000000000000 R08: 0000000000000000 R09: fffffbfff6cefe80 R10: ffffffffb677f407 R11: ffff88810badf3c0 R12: ffff88810e8b4000 R13: 0000000000000000 R14: ffff88810642a5c0 R15: 000000000000017e FS: 00007f1ff68c20c0(0000) GS:ffff888163a24000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f1ff5dad240 CR3: 0000000107e56000 CR4: 00000000000006f0 Call Trace: <TASK> rtnl_dellink (net/core/rtnetlink.c:3511 net/core/rtnetlink.c:3553) rtnetlink_rcv_msg (net/core/rtnetlink.c:6945) netlink_rcv_skb (net/netlink/af_netlink.c:2535) netlink_unicast (net/netlink/af_netlink.c:1314 net/netlink/af_netlink.c:1339) netlink_sendmsg (net/netlink/af_netlink.c:1883) ____sys_sendmsg (net/socket.c:712 net/socket.c:727 net/socket.c:2566) ___sys_sendmsg (net/socket.c:2622) __sys_sendmsg (net/socket.c:2652) do_syscall_64 (arch/x86/entry/syscall_64.c:63 arch/x86/entry/syscall_64.c:94) Fixes: ad1afb0 ("vlan_dev: VLAN 0 should be treated as "no vlan tag" (802.1p packet)") Reported-by: syzbot+a8b046e462915c65b10b@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=a8b046e462915c65b10b Suggested-by: Ido Schimmel <idosch@idosch.org> Signed-off-by: Dong Chenchen <dongchenchen2@huawei.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Link: https://patch.msgid.link/20250716034504.2285203-2-dongchenchen2@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
… runtime [ Upstream commit 579d4f9 ] Assuming the "rx-vlan-filter" feature is enabled on a net device, the 8021q module will automatically add or remove VLAN 0 when the net device is put administratively up or down, respectively. There are a couple of problems with the above scheme. The first problem is a memory leak that can happen if the "rx-vlan-filter" feature is disabled while the device is running: # ip link add bond1 up type bond mode 0 # ethtool -K bond1 rx-vlan-filter off # ip link del dev bond1 When the device is put administratively down the "rx-vlan-filter" feature is disabled, so the 8021q module will not remove VLAN 0 and the memory will be leaked [1]. Another problem that can happen is that the kernel can automatically delete VLAN 0 when the device is put administratively down despite not adding it when the device was put administratively up since during that time the "rx-vlan-filter" feature was disabled. null-ptr-unref or bug_on[2] will be triggered by unregister_vlan_dev() for refcount imbalance if toggling filtering during runtime: $ ip link add bond0 type bond mode 0 $ ip link add link bond0 name vlan0 type vlan id 0 protocol 802.1q $ ethtool -K bond0 rx-vlan-filter off $ ifconfig bond0 up $ ethtool -K bond0 rx-vlan-filter on $ ifconfig bond0 down $ ip link del vlan0 Root cause is as below: step1: add vlan0 for real_dev, such as bond, team. register_vlan_dev vlan_vid_add(real_dev,htons(ETH_P_8021Q),0) //refcnt=1 step2: disable vlan filter feature and enable real_dev step3: change filter from 0 to 1 vlan_device_event vlan_filter_push_vids ndo_vlan_rx_add_vid //No refcnt added to real_dev vlan0 step4: real_dev down vlan_device_event vlan_vid_del(dev, htons(ETH_P_8021Q), 0); //refcnt=0 vlan_info_rcu_free //free vlan0 step5: delete vlan0 unregister_vlan_dev BUG_ON(!vlan_info); //vlan_info is null Fix both problems by noting in the VLAN info whether VLAN 0 was automatically added upon NETDEV_UP and based on that decide whether it should be deleted upon NETDEV_DOWN, regardless of the state of the "rx-vlan-filter" feature. [1] unreferenced object 0xffff8880068e3100 (size 256): comm "ip", pid 384, jiffies 4296130254 hex dump (first 32 bytes): 00 20 30 0d 80 88 ff ff 00 00 00 00 00 00 00 00 . 0............. 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ backtrace (crc 81ce31fa): __kmalloc_cache_noprof+0x2b5/0x340 vlan_vid_add+0x434/0x940 vlan_device_event.cold+0x75/0xa8 notifier_call_chain+0xca/0x150 __dev_notify_flags+0xe3/0x250 rtnl_configure_link+0x193/0x260 rtnl_newlink_create+0x383/0x8e0 __rtnl_newlink+0x22c/0xa40 rtnl_newlink+0x627/0xb00 rtnetlink_rcv_msg+0x6fb/0xb70 netlink_rcv_skb+0x11f/0x350 netlink_unicast+0x426/0x710 netlink_sendmsg+0x75a/0xc20 __sock_sendmsg+0xc1/0x150 ____sys_sendmsg+0x5aa/0x7b0 ___sys_sendmsg+0xfc/0x180 [2] kernel BUG at net/8021q/vlan.c:99! Oops: invalid opcode: 0000 [#1] SMP KASAN PTI CPU: 0 UID: 0 PID: 382 Comm: ip Not tainted 6.16.0-rc3 torvalds#61 PREEMPT(voluntary) Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014 RIP: 0010:unregister_vlan_dev (net/8021q/vlan.c:99 (discriminator 1)) RSP: 0018:ffff88810badf310 EFLAGS: 00010246 RAX: 0000000000000000 RBX: ffff88810da84000 RCX: ffffffffb47ceb9a RDX: dffffc0000000000 RSI: 0000000000000008 RDI: ffff88810e8b43c8 RBP: 0000000000000000 R08: 0000000000000000 R09: fffffbfff6cefe80 R10: ffffffffb677f407 R11: ffff88810badf3c0 R12: ffff88810e8b4000 R13: 0000000000000000 R14: ffff88810642a5c0 R15: 000000000000017e FS: 00007f1ff68c20c0(0000) GS:ffff888163a24000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f1ff5dad240 CR3: 0000000107e56000 CR4: 00000000000006f0 Call Trace: <TASK> rtnl_dellink (net/core/rtnetlink.c:3511 net/core/rtnetlink.c:3553) rtnetlink_rcv_msg (net/core/rtnetlink.c:6945) netlink_rcv_skb (net/netlink/af_netlink.c:2535) netlink_unicast (net/netlink/af_netlink.c:1314 net/netlink/af_netlink.c:1339) netlink_sendmsg (net/netlink/af_netlink.c:1883) ____sys_sendmsg (net/socket.c:712 net/socket.c:727 net/socket.c:2566) ___sys_sendmsg (net/socket.c:2622) __sys_sendmsg (net/socket.c:2652) do_syscall_64 (arch/x86/entry/syscall_64.c:63 arch/x86/entry/syscall_64.c:94) Fixes: ad1afb0 ("vlan_dev: VLAN 0 should be treated as "no vlan tag" (802.1p packet)") Reported-by: syzbot+a8b046e462915c65b10b@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=a8b046e462915c65b10b Suggested-by: Ido Schimmel <idosch@idosch.org> Signed-off-by: Dong Chenchen <dongchenchen2@huawei.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Link: https://patch.msgid.link/20250716034504.2285203-2-dongchenchen2@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
… runtime [ Upstream commit 579d4f9 ] Assuming the "rx-vlan-filter" feature is enabled on a net device, the 8021q module will automatically add or remove VLAN 0 when the net device is put administratively up or down, respectively. There are a couple of problems with the above scheme. The first problem is a memory leak that can happen if the "rx-vlan-filter" feature is disabled while the device is running: # ip link add bond1 up type bond mode 0 # ethtool -K bond1 rx-vlan-filter off # ip link del dev bond1 When the device is put administratively down the "rx-vlan-filter" feature is disabled, so the 8021q module will not remove VLAN 0 and the memory will be leaked [1]. Another problem that can happen is that the kernel can automatically delete VLAN 0 when the device is put administratively down despite not adding it when the device was put administratively up since during that time the "rx-vlan-filter" feature was disabled. null-ptr-unref or bug_on[2] will be triggered by unregister_vlan_dev() for refcount imbalance if toggling filtering during runtime: $ ip link add bond0 type bond mode 0 $ ip link add link bond0 name vlan0 type vlan id 0 protocol 802.1q $ ethtool -K bond0 rx-vlan-filter off $ ifconfig bond0 up $ ethtool -K bond0 rx-vlan-filter on $ ifconfig bond0 down $ ip link del vlan0 Root cause is as below: step1: add vlan0 for real_dev, such as bond, team. register_vlan_dev vlan_vid_add(real_dev,htons(ETH_P_8021Q),0) //refcnt=1 step2: disable vlan filter feature and enable real_dev step3: change filter from 0 to 1 vlan_device_event vlan_filter_push_vids ndo_vlan_rx_add_vid //No refcnt added to real_dev vlan0 step4: real_dev down vlan_device_event vlan_vid_del(dev, htons(ETH_P_8021Q), 0); //refcnt=0 vlan_info_rcu_free //free vlan0 step5: delete vlan0 unregister_vlan_dev BUG_ON(!vlan_info); //vlan_info is null Fix both problems by noting in the VLAN info whether VLAN 0 was automatically added upon NETDEV_UP and based on that decide whether it should be deleted upon NETDEV_DOWN, regardless of the state of the "rx-vlan-filter" feature. [1] unreferenced object 0xffff8880068e3100 (size 256): comm "ip", pid 384, jiffies 4296130254 hex dump (first 32 bytes): 00 20 30 0d 80 88 ff ff 00 00 00 00 00 00 00 00 . 0............. 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ backtrace (crc 81ce31fa): __kmalloc_cache_noprof+0x2b5/0x340 vlan_vid_add+0x434/0x940 vlan_device_event.cold+0x75/0xa8 notifier_call_chain+0xca/0x150 __dev_notify_flags+0xe3/0x250 rtnl_configure_link+0x193/0x260 rtnl_newlink_create+0x383/0x8e0 __rtnl_newlink+0x22c/0xa40 rtnl_newlink+0x627/0xb00 rtnetlink_rcv_msg+0x6fb/0xb70 netlink_rcv_skb+0x11f/0x350 netlink_unicast+0x426/0x710 netlink_sendmsg+0x75a/0xc20 __sock_sendmsg+0xc1/0x150 ____sys_sendmsg+0x5aa/0x7b0 ___sys_sendmsg+0xfc/0x180 [2] kernel BUG at net/8021q/vlan.c:99! Oops: invalid opcode: 0000 [#1] SMP KASAN PTI CPU: 0 UID: 0 PID: 382 Comm: ip Not tainted 6.16.0-rc3 torvalds#61 PREEMPT(voluntary) Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014 RIP: 0010:unregister_vlan_dev (net/8021q/vlan.c:99 (discriminator 1)) RSP: 0018:ffff88810badf310 EFLAGS: 00010246 RAX: 0000000000000000 RBX: ffff88810da84000 RCX: ffffffffb47ceb9a RDX: dffffc0000000000 RSI: 0000000000000008 RDI: ffff88810e8b43c8 RBP: 0000000000000000 R08: 0000000000000000 R09: fffffbfff6cefe80 R10: ffffffffb677f407 R11: ffff88810badf3c0 R12: ffff88810e8b4000 R13: 0000000000000000 R14: ffff88810642a5c0 R15: 000000000000017e FS: 00007f1ff68c20c0(0000) GS:ffff888163a24000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f1ff5dad240 CR3: 0000000107e56000 CR4: 00000000000006f0 Call Trace: <TASK> rtnl_dellink (net/core/rtnetlink.c:3511 net/core/rtnetlink.c:3553) rtnetlink_rcv_msg (net/core/rtnetlink.c:6945) netlink_rcv_skb (net/netlink/af_netlink.c:2535) netlink_unicast (net/netlink/af_netlink.c:1314 net/netlink/af_netlink.c:1339) netlink_sendmsg (net/netlink/af_netlink.c:1883) ____sys_sendmsg (net/socket.c:712 net/socket.c:727 net/socket.c:2566) ___sys_sendmsg (net/socket.c:2622) __sys_sendmsg (net/socket.c:2652) do_syscall_64 (arch/x86/entry/syscall_64.c:63 arch/x86/entry/syscall_64.c:94) Fixes: ad1afb0 ("vlan_dev: VLAN 0 should be treated as "no vlan tag" (802.1p packet)") Reported-by: syzbot+a8b046e462915c65b10b@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=a8b046e462915c65b10b Suggested-by: Ido Schimmel <idosch@idosch.org> Signed-off-by: Dong Chenchen <dongchenchen2@huawei.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Link: https://patch.msgid.link/20250716034504.2285203-2-dongchenchen2@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
… runtime [ Upstream commit 579d4f9 ] Assuming the "rx-vlan-filter" feature is enabled on a net device, the 8021q module will automatically add or remove VLAN 0 when the net device is put administratively up or down, respectively. There are a couple of problems with the above scheme. The first problem is a memory leak that can happen if the "rx-vlan-filter" feature is disabled while the device is running: # ip link add bond1 up type bond mode 0 # ethtool -K bond1 rx-vlan-filter off # ip link del dev bond1 When the device is put administratively down the "rx-vlan-filter" feature is disabled, so the 8021q module will not remove VLAN 0 and the memory will be leaked [1]. Another problem that can happen is that the kernel can automatically delete VLAN 0 when the device is put administratively down despite not adding it when the device was put administratively up since during that time the "rx-vlan-filter" feature was disabled. null-ptr-unref or bug_on[2] will be triggered by unregister_vlan_dev() for refcount imbalance if toggling filtering during runtime: $ ip link add bond0 type bond mode 0 $ ip link add link bond0 name vlan0 type vlan id 0 protocol 802.1q $ ethtool -K bond0 rx-vlan-filter off $ ifconfig bond0 up $ ethtool -K bond0 rx-vlan-filter on $ ifconfig bond0 down $ ip link del vlan0 Root cause is as below: step1: add vlan0 for real_dev, such as bond, team. register_vlan_dev vlan_vid_add(real_dev,htons(ETH_P_8021Q),0) //refcnt=1 step2: disable vlan filter feature and enable real_dev step3: change filter from 0 to 1 vlan_device_event vlan_filter_push_vids ndo_vlan_rx_add_vid //No refcnt added to real_dev vlan0 step4: real_dev down vlan_device_event vlan_vid_del(dev, htons(ETH_P_8021Q), 0); //refcnt=0 vlan_info_rcu_free //free vlan0 step5: delete vlan0 unregister_vlan_dev BUG_ON(!vlan_info); //vlan_info is null Fix both problems by noting in the VLAN info whether VLAN 0 was automatically added upon NETDEV_UP and based on that decide whether it should be deleted upon NETDEV_DOWN, regardless of the state of the "rx-vlan-filter" feature. [1] unreferenced object 0xffff8880068e3100 (size 256): comm "ip", pid 384, jiffies 4296130254 hex dump (first 32 bytes): 00 20 30 0d 80 88 ff ff 00 00 00 00 00 00 00 00 . 0............. 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ backtrace (crc 81ce31fa): __kmalloc_cache_noprof+0x2b5/0x340 vlan_vid_add+0x434/0x940 vlan_device_event.cold+0x75/0xa8 notifier_call_chain+0xca/0x150 __dev_notify_flags+0xe3/0x250 rtnl_configure_link+0x193/0x260 rtnl_newlink_create+0x383/0x8e0 __rtnl_newlink+0x22c/0xa40 rtnl_newlink+0x627/0xb00 rtnetlink_rcv_msg+0x6fb/0xb70 netlink_rcv_skb+0x11f/0x350 netlink_unicast+0x426/0x710 netlink_sendmsg+0x75a/0xc20 __sock_sendmsg+0xc1/0x150 ____sys_sendmsg+0x5aa/0x7b0 ___sys_sendmsg+0xfc/0x180 [2] kernel BUG at net/8021q/vlan.c:99! Oops: invalid opcode: 0000 [#1] SMP KASAN PTI CPU: 0 UID: 0 PID: 382 Comm: ip Not tainted 6.16.0-rc3 torvalds#61 PREEMPT(voluntary) Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014 RIP: 0010:unregister_vlan_dev (net/8021q/vlan.c:99 (discriminator 1)) RSP: 0018:ffff88810badf310 EFLAGS: 00010246 RAX: 0000000000000000 RBX: ffff88810da84000 RCX: ffffffffb47ceb9a RDX: dffffc0000000000 RSI: 0000000000000008 RDI: ffff88810e8b43c8 RBP: 0000000000000000 R08: 0000000000000000 R09: fffffbfff6cefe80 R10: ffffffffb677f407 R11: ffff88810badf3c0 R12: ffff88810e8b4000 R13: 0000000000000000 R14: ffff88810642a5c0 R15: 000000000000017e FS: 00007f1ff68c20c0(0000) GS:ffff888163a24000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f1ff5dad240 CR3: 0000000107e56000 CR4: 00000000000006f0 Call Trace: <TASK> rtnl_dellink (net/core/rtnetlink.c:3511 net/core/rtnetlink.c:3553) rtnetlink_rcv_msg (net/core/rtnetlink.c:6945) netlink_rcv_skb (net/netlink/af_netlink.c:2535) netlink_unicast (net/netlink/af_netlink.c:1314 net/netlink/af_netlink.c:1339) netlink_sendmsg (net/netlink/af_netlink.c:1883) ____sys_sendmsg (net/socket.c:712 net/socket.c:727 net/socket.c:2566) ___sys_sendmsg (net/socket.c:2622) __sys_sendmsg (net/socket.c:2652) do_syscall_64 (arch/x86/entry/syscall_64.c:63 arch/x86/entry/syscall_64.c:94) Fixes: ad1afb0 ("vlan_dev: VLAN 0 should be treated as "no vlan tag" (802.1p packet)") Reported-by: syzbot+a8b046e462915c65b10b@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=a8b046e462915c65b10b Suggested-by: Ido Schimmel <idosch@idosch.org> Signed-off-by: Dong Chenchen <dongchenchen2@huawei.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Link: https://patch.msgid.link/20250716034504.2285203-2-dongchenchen2@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org> (cherry picked from commit 8984bcb)
… runtime [ Upstream commit 579d4f9 ] Assuming the "rx-vlan-filter" feature is enabled on a net device, the 8021q module will automatically add or remove VLAN 0 when the net device is put administratively up or down, respectively. There are a couple of problems with the above scheme. The first problem is a memory leak that can happen if the "rx-vlan-filter" feature is disabled while the device is running: # ip link add bond1 up type bond mode 0 # ethtool -K bond1 rx-vlan-filter off # ip link del dev bond1 When the device is put administratively down the "rx-vlan-filter" feature is disabled, so the 8021q module will not remove VLAN 0 and the memory will be leaked [1]. Another problem that can happen is that the kernel can automatically delete VLAN 0 when the device is put administratively down despite not adding it when the device was put administratively up since during that time the "rx-vlan-filter" feature was disabled. null-ptr-unref or bug_on[2] will be triggered by unregister_vlan_dev() for refcount imbalance if toggling filtering during runtime: $ ip link add bond0 type bond mode 0 $ ip link add link bond0 name vlan0 type vlan id 0 protocol 802.1q $ ethtool -K bond0 rx-vlan-filter off $ ifconfig bond0 up $ ethtool -K bond0 rx-vlan-filter on $ ifconfig bond0 down $ ip link del vlan0 Root cause is as below: step1: add vlan0 for real_dev, such as bond, team. register_vlan_dev vlan_vid_add(real_dev,htons(ETH_P_8021Q),0) //refcnt=1 step2: disable vlan filter feature and enable real_dev step3: change filter from 0 to 1 vlan_device_event vlan_filter_push_vids ndo_vlan_rx_add_vid //No refcnt added to real_dev vlan0 step4: real_dev down vlan_device_event vlan_vid_del(dev, htons(ETH_P_8021Q), 0); //refcnt=0 vlan_info_rcu_free //free vlan0 step5: delete vlan0 unregister_vlan_dev BUG_ON(!vlan_info); //vlan_info is null Fix both problems by noting in the VLAN info whether VLAN 0 was automatically added upon NETDEV_UP and based on that decide whether it should be deleted upon NETDEV_DOWN, regardless of the state of the "rx-vlan-filter" feature. [1] unreferenced object 0xffff8880068e3100 (size 256): comm "ip", pid 384, jiffies 4296130254 hex dump (first 32 bytes): 00 20 30 0d 80 88 ff ff 00 00 00 00 00 00 00 00 . 0............. 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ backtrace (crc 81ce31fa): __kmalloc_cache_noprof+0x2b5/0x340 vlan_vid_add+0x434/0x940 vlan_device_event.cold+0x75/0xa8 notifier_call_chain+0xca/0x150 __dev_notify_flags+0xe3/0x250 rtnl_configure_link+0x193/0x260 rtnl_newlink_create+0x383/0x8e0 __rtnl_newlink+0x22c/0xa40 rtnl_newlink+0x627/0xb00 rtnetlink_rcv_msg+0x6fb/0xb70 netlink_rcv_skb+0x11f/0x350 netlink_unicast+0x426/0x710 netlink_sendmsg+0x75a/0xc20 __sock_sendmsg+0xc1/0x150 ____sys_sendmsg+0x5aa/0x7b0 ___sys_sendmsg+0xfc/0x180 [2] kernel BUG at net/8021q/vlan.c:99! Oops: invalid opcode: 0000 [#1] SMP KASAN PTI CPU: 0 UID: 0 PID: 382 Comm: ip Not tainted 6.16.0-rc3 torvalds#61 PREEMPT(voluntary) Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014 RIP: 0010:unregister_vlan_dev (net/8021q/vlan.c:99 (discriminator 1)) RSP: 0018:ffff88810badf310 EFLAGS: 00010246 RAX: 0000000000000000 RBX: ffff88810da84000 RCX: ffffffffb47ceb9a RDX: dffffc0000000000 RSI: 0000000000000008 RDI: ffff88810e8b43c8 RBP: 0000000000000000 R08: 0000000000000000 R09: fffffbfff6cefe80 R10: ffffffffb677f407 R11: ffff88810badf3c0 R12: ffff88810e8b4000 R13: 0000000000000000 R14: ffff88810642a5c0 R15: 000000000000017e FS: 00007f1ff68c20c0(0000) GS:ffff888163a24000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f1ff5dad240 CR3: 0000000107e56000 CR4: 00000000000006f0 Call Trace: <TASK> rtnl_dellink (net/core/rtnetlink.c:3511 net/core/rtnetlink.c:3553) rtnetlink_rcv_msg (net/core/rtnetlink.c:6945) netlink_rcv_skb (net/netlink/af_netlink.c:2535) netlink_unicast (net/netlink/af_netlink.c:1314 net/netlink/af_netlink.c:1339) netlink_sendmsg (net/netlink/af_netlink.c:1883) ____sys_sendmsg (net/socket.c:712 net/socket.c:727 net/socket.c:2566) ___sys_sendmsg (net/socket.c:2622) __sys_sendmsg (net/socket.c:2652) do_syscall_64 (arch/x86/entry/syscall_64.c:63 arch/x86/entry/syscall_64.c:94) Fixes: ad1afb0 ("vlan_dev: VLAN 0 should be treated as "no vlan tag" (802.1p packet)") Reported-by: syzbot+a8b046e462915c65b10b@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=a8b046e462915c65b10b Suggested-by: Ido Schimmel <idosch@idosch.org> Signed-off-by: Dong Chenchen <dongchenchen2@huawei.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Link: https://patch.msgid.link/20250716034504.2285203-2-dongchenchen2@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
… runtime [ Upstream commit 579d4f9 ] Assuming the "rx-vlan-filter" feature is enabled on a net device, the 8021q module will automatically add or remove VLAN 0 when the net device is put administratively up or down, respectively. There are a couple of problems with the above scheme. The first problem is a memory leak that can happen if the "rx-vlan-filter" feature is disabled while the device is running: # ip link add bond1 up type bond mode 0 # ethtool -K bond1 rx-vlan-filter off # ip link del dev bond1 When the device is put administratively down the "rx-vlan-filter" feature is disabled, so the 8021q module will not remove VLAN 0 and the memory will be leaked [1]. Another problem that can happen is that the kernel can automatically delete VLAN 0 when the device is put administratively down despite not adding it when the device was put administratively up since during that time the "rx-vlan-filter" feature was disabled. null-ptr-unref or bug_on[2] will be triggered by unregister_vlan_dev() for refcount imbalance if toggling filtering during runtime: $ ip link add bond0 type bond mode 0 $ ip link add link bond0 name vlan0 type vlan id 0 protocol 802.1q $ ethtool -K bond0 rx-vlan-filter off $ ifconfig bond0 up $ ethtool -K bond0 rx-vlan-filter on $ ifconfig bond0 down $ ip link del vlan0 Root cause is as below: step1: add vlan0 for real_dev, such as bond, team. register_vlan_dev vlan_vid_add(real_dev,htons(ETH_P_8021Q),0) //refcnt=1 step2: disable vlan filter feature and enable real_dev step3: change filter from 0 to 1 vlan_device_event vlan_filter_push_vids ndo_vlan_rx_add_vid //No refcnt added to real_dev vlan0 step4: real_dev down vlan_device_event vlan_vid_del(dev, htons(ETH_P_8021Q), 0); //refcnt=0 vlan_info_rcu_free //free vlan0 step5: delete vlan0 unregister_vlan_dev BUG_ON(!vlan_info); //vlan_info is null Fix both problems by noting in the VLAN info whether VLAN 0 was automatically added upon NETDEV_UP and based on that decide whether it should be deleted upon NETDEV_DOWN, regardless of the state of the "rx-vlan-filter" feature. [1] unreferenced object 0xffff8880068e3100 (size 256): comm "ip", pid 384, jiffies 4296130254 hex dump (first 32 bytes): 00 20 30 0d 80 88 ff ff 00 00 00 00 00 00 00 00 . 0............. 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ backtrace (crc 81ce31fa): __kmalloc_cache_noprof+0x2b5/0x340 vlan_vid_add+0x434/0x940 vlan_device_event.cold+0x75/0xa8 notifier_call_chain+0xca/0x150 __dev_notify_flags+0xe3/0x250 rtnl_configure_link+0x193/0x260 rtnl_newlink_create+0x383/0x8e0 __rtnl_newlink+0x22c/0xa40 rtnl_newlink+0x627/0xb00 rtnetlink_rcv_msg+0x6fb/0xb70 netlink_rcv_skb+0x11f/0x350 netlink_unicast+0x426/0x710 netlink_sendmsg+0x75a/0xc20 __sock_sendmsg+0xc1/0x150 ____sys_sendmsg+0x5aa/0x7b0 ___sys_sendmsg+0xfc/0x180 [2] kernel BUG at net/8021q/vlan.c:99! Oops: invalid opcode: 0000 [#1] SMP KASAN PTI CPU: 0 UID: 0 PID: 382 Comm: ip Not tainted 6.16.0-rc3 torvalds#61 PREEMPT(voluntary) Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014 RIP: 0010:unregister_vlan_dev (net/8021q/vlan.c:99 (discriminator 1)) RSP: 0018:ffff88810badf310 EFLAGS: 00010246 RAX: 0000000000000000 RBX: ffff88810da84000 RCX: ffffffffb47ceb9a RDX: dffffc0000000000 RSI: 0000000000000008 RDI: ffff88810e8b43c8 RBP: 0000000000000000 R08: 0000000000000000 R09: fffffbfff6cefe80 R10: ffffffffb677f407 R11: ffff88810badf3c0 R12: ffff88810e8b4000 R13: 0000000000000000 R14: ffff88810642a5c0 R15: 000000000000017e FS: 00007f1ff68c20c0(0000) GS:ffff888163a24000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f1ff5dad240 CR3: 0000000107e56000 CR4: 00000000000006f0 Call Trace: <TASK> rtnl_dellink (net/core/rtnetlink.c:3511 net/core/rtnetlink.c:3553) rtnetlink_rcv_msg (net/core/rtnetlink.c:6945) netlink_rcv_skb (net/netlink/af_netlink.c:2535) netlink_unicast (net/netlink/af_netlink.c:1314 net/netlink/af_netlink.c:1339) netlink_sendmsg (net/netlink/af_netlink.c:1883) ____sys_sendmsg (net/socket.c:712 net/socket.c:727 net/socket.c:2566) ___sys_sendmsg (net/socket.c:2622) __sys_sendmsg (net/socket.c:2652) do_syscall_64 (arch/x86/entry/syscall_64.c:63 arch/x86/entry/syscall_64.c:94) Fixes: ad1afb0 ("vlan_dev: VLAN 0 should be treated as "no vlan tag" (802.1p packet)") Reported-by: syzbot+a8b046e462915c65b10b@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=a8b046e462915c65b10b Suggested-by: Ido Schimmel <idosch@idosch.org> Signed-off-by: Dong Chenchen <dongchenchen2@huawei.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Link: https://patch.msgid.link/20250716034504.2285203-2-dongchenchen2@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
… runtime [ Upstream commit 579d4f9 ] Assuming the "rx-vlan-filter" feature is enabled on a net device, the 8021q module will automatically add or remove VLAN 0 when the net device is put administratively up or down, respectively. There are a couple of problems with the above scheme. The first problem is a memory leak that can happen if the "rx-vlan-filter" feature is disabled while the device is running: # ip link add bond1 up type bond mode 0 # ethtool -K bond1 rx-vlan-filter off # ip link del dev bond1 When the device is put administratively down the "rx-vlan-filter" feature is disabled, so the 8021q module will not remove VLAN 0 and the memory will be leaked [1]. Another problem that can happen is that the kernel can automatically delete VLAN 0 when the device is put administratively down despite not adding it when the device was put administratively up since during that time the "rx-vlan-filter" feature was disabled. null-ptr-unref or bug_on[2] will be triggered by unregister_vlan_dev() for refcount imbalance if toggling filtering during runtime: $ ip link add bond0 type bond mode 0 $ ip link add link bond0 name vlan0 type vlan id 0 protocol 802.1q $ ethtool -K bond0 rx-vlan-filter off $ ifconfig bond0 up $ ethtool -K bond0 rx-vlan-filter on $ ifconfig bond0 down $ ip link del vlan0 Root cause is as below: step1: add vlan0 for real_dev, such as bond, team. register_vlan_dev vlan_vid_add(real_dev,htons(ETH_P_8021Q),0) //refcnt=1 step2: disable vlan filter feature and enable real_dev step3: change filter from 0 to 1 vlan_device_event vlan_filter_push_vids ndo_vlan_rx_add_vid //No refcnt added to real_dev vlan0 step4: real_dev down vlan_device_event vlan_vid_del(dev, htons(ETH_P_8021Q), 0); //refcnt=0 vlan_info_rcu_free //free vlan0 step5: delete vlan0 unregister_vlan_dev BUG_ON(!vlan_info); //vlan_info is null Fix both problems by noting in the VLAN info whether VLAN 0 was automatically added upon NETDEV_UP and based on that decide whether it should be deleted upon NETDEV_DOWN, regardless of the state of the "rx-vlan-filter" feature. [1] unreferenced object 0xffff8880068e3100 (size 256): comm "ip", pid 384, jiffies 4296130254 hex dump (first 32 bytes): 00 20 30 0d 80 88 ff ff 00 00 00 00 00 00 00 00 . 0............. 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ backtrace (crc 81ce31fa): __kmalloc_cache_noprof+0x2b5/0x340 vlan_vid_add+0x434/0x940 vlan_device_event.cold+0x75/0xa8 notifier_call_chain+0xca/0x150 __dev_notify_flags+0xe3/0x250 rtnl_configure_link+0x193/0x260 rtnl_newlink_create+0x383/0x8e0 __rtnl_newlink+0x22c/0xa40 rtnl_newlink+0x627/0xb00 rtnetlink_rcv_msg+0x6fb/0xb70 netlink_rcv_skb+0x11f/0x350 netlink_unicast+0x426/0x710 netlink_sendmsg+0x75a/0xc20 __sock_sendmsg+0xc1/0x150 ____sys_sendmsg+0x5aa/0x7b0 ___sys_sendmsg+0xfc/0x180 [2] kernel BUG at net/8021q/vlan.c:99! Oops: invalid opcode: 0000 [#1] SMP KASAN PTI CPU: 0 UID: 0 PID: 382 Comm: ip Not tainted 6.16.0-rc3 torvalds#61 PREEMPT(voluntary) Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014 RIP: 0010:unregister_vlan_dev (net/8021q/vlan.c:99 (discriminator 1)) RSP: 0018:ffff88810badf310 EFLAGS: 00010246 RAX: 0000000000000000 RBX: ffff88810da84000 RCX: ffffffffb47ceb9a RDX: dffffc0000000000 RSI: 0000000000000008 RDI: ffff88810e8b43c8 RBP: 0000000000000000 R08: 0000000000000000 R09: fffffbfff6cefe80 R10: ffffffffb677f407 R11: ffff88810badf3c0 R12: ffff88810e8b4000 R13: 0000000000000000 R14: ffff88810642a5c0 R15: 000000000000017e FS: 00007f1ff68c20c0(0000) GS:ffff888163a24000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f1ff5dad240 CR3: 0000000107e56000 CR4: 00000000000006f0 Call Trace: <TASK> rtnl_dellink (net/core/rtnetlink.c:3511 net/core/rtnetlink.c:3553) rtnetlink_rcv_msg (net/core/rtnetlink.c:6945) netlink_rcv_skb (net/netlink/af_netlink.c:2535) netlink_unicast (net/netlink/af_netlink.c:1314 net/netlink/af_netlink.c:1339) netlink_sendmsg (net/netlink/af_netlink.c:1883) ____sys_sendmsg (net/socket.c:712 net/socket.c:727 net/socket.c:2566) ___sys_sendmsg (net/socket.c:2622) __sys_sendmsg (net/socket.c:2652) do_syscall_64 (arch/x86/entry/syscall_64.c:63 arch/x86/entry/syscall_64.c:94) Fixes: ad1afb0 ("vlan_dev: VLAN 0 should be treated as "no vlan tag" (802.1p packet)") Reported-by: syzbot+a8b046e462915c65b10b@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=a8b046e462915c65b10b Suggested-by: Ido Schimmel <idosch@idosch.org> Signed-off-by: Dong Chenchen <dongchenchen2@huawei.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Link: https://patch.msgid.link/20250716034504.2285203-2-dongchenchen2@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
I remove dead links.