-
Notifications
You must be signed in to change notification settings - Fork 31
Vsc fixes #11
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Vsc fixes #11
Conversation
Drop the unused vsc_tp_request_irq() and vsc_tp_free_irq() functions. Signed-off-by: Hans de Goede <hansg@kernel.org>
mei_vsc_hw_reset() gets called from mei_start() and mei_stop() in the latter case we do not need to re-init the VSC by calling vsc_tp_init(). mei_stop() only happens on shutdown and driver unbind. On shutdown we don't need to load + boot the firmware and if the driver later is bound to the device again then mei_start() will do another reset. The intr_enable flag is true when called from mei_start() and false on mei_stop(). Skip vsc_tp_init() when intr_enable is false. This avoids unnecessarily uploading the firmware, which takes 11 seconds. This change reduces the poweroff/reboot time by 11 seconds. Fixes: 386a766 ("mei: Add MEI hardware support for IVSC device") Signed-off-by: Hans de Goede <hansg@kernel.org>
Now that mei_vsc_hw_reset() no longer re-inits the VSC when called from mei_stop(), vsc_tp_shutdown() unregistering the platform-device, which runs mei_stop() is sufficient to put the VSC in a clean state. Signed-off-by: Hans de Goede <hansg@kernel.org>
After removing the vsc_tp_reset() call from vsc_tp_shutdown() it is now identical to vsc_tp_remove(). Use vsc_tp_remove() as shutdown handler and remove vsc_tp_shutdown(). Signed-off-by: Hans de Goede <hansg@kernel.org>
The event_notify callback which runs from vsc_tp_thread_isr may call vsc_tp_xfer() which locks the mutex. So the ISR depends on the mutex. Move the mutex_destroy() call to after free_irq() to ensure that the ISR is not running while the mutex is destroyed. Fixes: 566f5ca ("mei: Add transport driver for IVSC device") Signed-off-by: Hans de Goede <hansg@kernel.org>
vsc_tp_register_event_cb() can race with vsc_tp_thread_isr(), add a mutex to protect against this. Fixes: 566f5ca ("mei: Add transport driver for IVSC device") Signed-off-by: Hans de Goede <hansg@kernel.org>
Make mei_vsc_remove() properly unset the callback to avoid a dead callback sticking around after probe errors or unbinding of the platform driver. Fixes: 386a766 ("mei: Add MEI hardware support for IVSC device") Signed-off-by: Hans de Goede <hansg@kernel.org>
The event_notify callback in some cases calls vsc_tp_xfer(), which checks tp->assert_cnt and waits for it through the tp->xfer_wait wait-queue. And tp->assert_cnt is increased and the tp->xfer_wait queue is woken o from the interrupt handler. So the interrupt handler which is running the event callback is waiting for itself to signal that it can continue. This happens to work because the event callback runs from the threaded ISR handler and while that is running the hard ISR handler will still get called a second / third time for further interrupts and it is the hard ISR handler which does the atomic_inc() and wake_up() calls. But having the threaded ISR handler wait for its own interrupt to trigger again is not how a threaded ISR handler is supposed to be used. Move the running of the event callback from a threaded interrupt handler to a workqueue since a threaded ISR should not wait for events from its own interrupt. This is a preparation patch for moving the atomic_inc() and wake_up() calls to the threaded ISR handler, which is necessary to fix a locking issue. Fixes: 566f5ca ("mei: Add transport driver for IVSC device") Signed-off-by: Hans de Goede <hansg@kernel.org>
Kernels build with CONFIG_PROVE_RAW_LOCK_NESTING report the following tp-vsc lockdep error: ============================= [ BUG: Invalid wait context ] ... swapper/10/0 is trying to lock: ffff88819c271888 (&tp->xfer_wait){....}-{3:3}, at: __wake_up (kernel/sched/wait.c:106 kernel/sched/wait.c:127) ... Call Trace: <IRQ> ... __raw_spin_lock_irqsave (./include/linux/spinlock_api_smp.h:111) __wake_up (kernel/sched/wait.c:106 kernel/sched/wait.c:127) vsc_tp_isr (drivers/misc/mei/vsc-tp.c:110) mei_vsc_hw __handle_irq_event_percpu (kernel/irq/handle.c:158) handle_irq_event (kernel/irq/handle.c:195 kernel/irq/handle.c:210) handle_edge_irq (kernel/irq/chip.c:833) ... </IRQ> The root-cause of this is the IRQF_NO_THREAD flag used by the intel-pinctrl code. Setting IRQF_NO_THREAD requires all interrupt handlers for GPIO ISRs to use raw-spinlocks only since normal spinlocks can sleep in PREEMPT-RT kernels and with IRQF_NO_THREAD the interrupt handlers will always run in an atomic context [1]. vsc_tp_isr() calls wake_up(&tp->xfer_wait), which uses a regular spinlock, breaking the raw-spinlocks only rule for Intel GPIO ISRs. Make vsc_tp_isr() run as threaded ISR instead of as hard ISR to fix this. Fixes: 566f5ca ("mei: Add transport driver for IVSC device") Link: https://lore.kernel.org/linux-gpio/18ab52bd-9171-4667-a600-0f52ab7017ac@kernel.org/ [1] Signed-off-by: Hans de Goede <hansg@kernel.org>
mei_cl_bus_dev_release() also frees the mei-client (struct mei_cl) belonging to the device being released. If there are bugs like the just fixed bug in the ACE/CSI2 mei drivers, the mei-client being freed might still be part of the mei_device's file_list and iterating over this list after the freeing will then trigger a use-afer-free bug. Add a check to mei_cl_bus_dev_release() to make sure that the to-be-freed mei-client is not on the mei_device's file_list. Signed-off-by: Hans de Goede <hansg@kernel.org>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR updates the VSC transport layer by removing deprecated IRQ API functions, refactoring IRQ handling with a threaded work mechanism, and unifying shutdown and removal procedures.
- Removed vsc_tp_request_irq and vsc_tp_free_irq definitions from the header.
- Replaced the dual IRQ handlers with a threaded work approach that schedules event callbacks.
- Adjusted event callback registration and integrated shutdown with the device removal path.
Reviewed Changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.
File | Description |
---|---|
drivers/misc/mei/vsc-tp.h | Removed deprecated IRQ API calls |
drivers/misc/mei/vsc-tp.c | Refactored IRQ handling, added workqueue-based event processing |
drivers/misc/mei/platform-vsc.c | Updated event callback registration and interrupt handling in probe |
drivers/misc/mei/bus.c | Added a debug check to warn on duplicate client entries in a list |
Comments suppressed due to low confidence (2)
drivers/misc/mei/vsc-tp.c:500
- Using NULL as the primary IRQ handler means that the interrupt is only handled in the threaded context. Please confirm that this change meets the hardware interrupt handling requirements.
ret = request_threaded_irq(spi->irq, NULL, vsc_tp_isr,
drivers/misc/mei/vsc-tp.c:551
- The previous shutdown implementation called vsc_tp_reset before releasing resources. Verify that removing this reset step during shutdown is intentional and does not affect device behavior.
mutex_destroy(&tp->mutex);
|
||
mei_cl_flush_queues(cldev->cl, NULL); | ||
mei_me_cl_put(cldev->me_cl); | ||
mei_dev_bus_put(cldev->bus); | ||
|
||
list_for_each_entry(cl, &mdev->file_list, link) | ||
WARN_ON(cl == cldev->cl); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[nitpick] The WARN_ON check may benefit from additional contextual information in its message to aid debugging if the condition is triggered.
WARN_ON(cl == cldev->cl); | |
WARN(cl == cldev->cl, "Unexpected condition: cl (%p) matches cldev->cl (%p)\n", cl, cldev->cl); |
Copilot uses AI. Check for mistakes.
This pull-req was created to test Copilot code-review, see: This is not intended for merging, closing this now. |
Since commit 6b9f29b ("riscv: Enable pcpu page first chunk allocator"), if NUMA is enabled, the page percpu allocator may be used on very sparse configurations, or when requested on boot with percpu_alloc=page. In that case, percpu data gets put in the vmalloc area. However, sbi_hsm_hart_start() needs the physical address of a sbi_hart_boot_data, and simply assumes that __pa() would work. This causes the just started hart to immediately access an invalid address and hang. Fortunately, struct sbi_hart_boot_data is not too large, so we can simply allocate an array for boot_data statically, putting it in the kernel image. This fixes NUMA=y SMP boot on Sophgo SG2042. To reproduce on QEMU: Set CONFIG_NUMA=y and CONFIG_DEBUG_VIRTUAL=y, then run with: qemu-system-riscv64 -M virt -smp 2 -nographic \ -kernel arch/riscv/boot/Image \ -append "percpu_alloc=page" Kernel output: [ 0.000000] Booting Linux on hartid 0 [ 0.000000] Linux version 6.16.0-rc1 (dram@sakuya) (riscv64-unknown-linux-gnu-gcc (GCC) 14.2.1 20250322, GNU ld (GNU Binutils) 2.44) #11 SMP Tue Jun 24 14:56:22 CST 2025 ... [ 0.000000] percpu: 28 4K pages/cpu s85784 r8192 d20712 ... [ 0.083192] smp: Bringing up secondary CPUs ... [ 0.086722] ------------[ cut here ]------------ [ 0.086849] virt_to_phys used for non-linear address: (____ptrval____) (0xff2000000001d080) [ 0.088001] WARNING: CPU: 0 PID: 1 at arch/riscv/mm/physaddr.c:14 __virt_to_phys+0xae/0xe8 [ 0.088376] Modules linked in: [ 0.088656] CPU: 0 UID: 0 PID: 1 Comm: swapper/0 Not tainted 6.16.0-rc1 #11 NONE [ 0.088833] Hardware name: riscv-virtio,qemu (DT) [ 0.088948] epc : __virt_to_phys+0xae/0xe8 [ 0.089001] ra : __virt_to_phys+0xae/0xe8 [ 0.089037] epc : ffffffff80021eaa ra : ffffffff80021eaa sp : ff2000000004bbc0 [ 0.089057] gp : ffffffff817f49c0 tp : ff60000001d60000 t0 : 5f6f745f74726976 [ 0.089076] t1 : 0000000000000076 t2 : 705f6f745f747269 s0 : ff2000000004bbe0 [ 0.089095] s1 : ff2000000001d080 a0 : 0000000000000000 a1 : 0000000000000000 [ 0.089113] a2 : 0000000000000000 a3 : 0000000000000000 a4 : 0000000000000000 [ 0.089131] a5 : 0000000000000000 a6 : 0000000000000000 a7 : 0000000000000000 [ 0.089155] s2 : ffffffff8130dc00 s3 : 0000000000000001 s4 : 0000000000000001 [ 0.089174] s5 : ffffffff8185eff8 s6 : ff2000007f1eb000 s7 : ffffffff8002a2ec [ 0.089193] s8 : 0000000000000001 s9 : 0000000000000001 s10: 0000000000000000 [ 0.089211] s11: 0000000000000000 t3 : ffffffff8180a9f7 t4 : ffffffff8180a9f7 [ 0.089960] t5 : ffffffff8180a9f8 t6 : ff2000000004b9d8 [ 0.089984] status: 0000000200000120 badaddr: ffffffff80021eaa cause: 0000000000000003 [ 0.090101] [<ffffffff80021eaa>] __virt_to_phys+0xae/0xe8 [ 0.090228] [<ffffffff8001d796>] sbi_cpu_start+0x6e/0xe8 [ 0.090247] [<ffffffff8001a5da>] __cpu_up+0x1e/0x8c [ 0.090260] [<ffffffff8002a32e>] bringup_cpu+0x42/0x258 [ 0.090277] [<ffffffff8002914c>] cpuhp_invoke_callback+0xe0/0x40c [ 0.090292] [<ffffffff800294e0>] __cpuhp_invoke_callback_range+0x68/0xfc [ 0.090320] [<ffffffff8002a96a>] _cpu_up+0x11a/0x244 [ 0.090334] [<ffffffff8002aae6>] cpu_up+0x52/0x90 [ 0.090384] [<ffffffff80c09350>] bringup_nonboot_cpus+0x78/0x118 [ 0.090411] [<ffffffff80c11060>] smp_init+0x34/0xb8 [ 0.090425] [<ffffffff80c01220>] kernel_init_freeable+0x148/0x2e4 [ 0.090442] [<ffffffff80b83802>] kernel_init+0x1e/0x14c [ 0.090455] [<ffffffff800124ca>] ret_from_fork_kernel+0xe/0xf0 [ 0.090471] [<ffffffff80b8d9c2>] ret_from_fork_kernel_asm+0x16/0x18 [ 0.090560] ---[ end trace 0000000000000000 ]--- [ 1.179875] CPU1: failed to come online [ 1.190324] smp: Brought up 1 node, 1 CPU Cc: stable@vger.kernel.org Reported-by: Han Gao <rabenda.cn@gmail.com> Fixes: 6b9f29b ("riscv: Enable pcpu page first chunk allocator") Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com> Tested-by: Alexandre Ghiti <alexghiti@rivosinc.com> Signed-off-by: Vivian Wang <wangruikang@iscas.ac.cn> Link: https://lore.kernel.org/r/20250624-riscv-hsm-boot-data-array-v1-1-50b5eeafbe61@iscas.ac.cn Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
No description provided.