-
Notifications
You must be signed in to change notification settings - Fork 5
Update cs35l41_hda_property.c #1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
By comparing with asus-linux/fedora-kernel it seems ChimeraOS still uses an older version of Luke's patch as you can see here. We should make sure to ship the latest interation of it also here. |
This is what a successful patch letting BIOS DSD maintain control for >= 330 looks like.
This is the proper behavior that happens when If I change this one line as suggested by another user on the ChimeraOS discord to This is what a failed
For reference, this patch series is discussed here. If you want Luke's latest ASUS Linux patch as well, that can be found here. This pull request was meant to fix my original patch, once I discovered what was wrong with this current implementation. Thanks. |
A GISA cannot be destroyed as long it is linked in the GIB alert list as this would break the alert list. Just waiting for its removal from the list triggered by another vm is not sufficient as it might be the only vm. The below shown cpu stall situation might occur when GIB alerts are delayed and is fixed by calling process_gib_alert_list() instead of waiting. At this time the vcpus of the vm are already destroyed and thus no vcpu can be kicked to enter the SIE again if for some reason an interrupt is pending for that vm. Additionally the IAM restore value is set to 0x00. That would be a bug introduced by incomplete device de-registration, i.e. missing kvm_s390_gisc_unregister() call. Setting this value and the IAM in the GISA to 0x00 guarantees that late interrupts don't bring the GISA back into the alert list. CPU stall caused by kvm_s390_gisa_destroy(): [ 4915.311372] rcu: INFO: rcu_sched detected expedited stalls on CPUs/tasks: { 14-.... } 24533 jiffies s: 5269 root: 0x1/. [ 4915.311390] rcu: blocking rcu_node structures (internal RCU debug): l=1:0-15:0x4000/. [ 4915.311394] Task dump for CPU 14: [ 4915.311395] task:qemu-system-s39 state:R running task stack:0 pid:217198 ppid:1 flags:0x00000045 [ 4915.311399] Call Trace: [ 4915.311401] [<0000038003a33a10>] 0x38003a33a10 [ 4933.861321] rcu: INFO: rcu_sched self-detected stall on CPU [ 4933.861332] rcu: 14-....: (42008 ticks this GP) idle=53f4/1/0x4000000000000000 softirq=61530/61530 fqs=14031 [ 4933.861353] rcu: (t=42008 jiffies g=238109 q=100360 ncpus=18) [ 4933.861357] CPU: 14 PID: 217198 Comm: qemu-system-s39 Not tainted 6.5.0-20230816.rc6.git26.a9d17c5d8813.300.fc38.s390x #1 [ 4933.861360] Hardware name: IBM 8561 T01 703 (LPAR) [ 4933.861361] Krnl PSW : 0704e00180000000 000003ff804bfc66 (kvm_s390_gisa_destroy+0x3e/0xe0 [kvm]) [ 4933.861414] R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:2 PM:0 RI:0 EA:3 [ 4933.861416] Krnl GPRS: 0000000000000000 00000372000000fc 00000002134f8000 000000000d5f5900 [ 4933.861419] 00000002f5ea1d18 00000002f5ea1d18 0000000000000000 0000000000000000 [ 4933.861420] 00000002134fa890 00000002134f8958 000000000d5f5900 00000002134f8000 [ 4933.861422] 000003ffa06acf98 000003ffa06858b0 0000038003a33c20 0000038003a33bc8 [ 4933.861430] Krnl Code: 000003ff804bfc58: ec66002b007e cij %r6,0,6,000003ff804bfcae 000003ff804bfc5e: b904003a lgr %r3,%r10 #000003ff804bfc62: a7f40005 brc 15,000003ff804bfc6c >000003ff804bfc66: e330b7300204 lg %r3,10032(%r11) 000003ff804bfc6c: 58003000 l %r0,0(%r3) 000003ff804bfc70: ec03fffb6076 crj %r0,%r3,6,000003ff804bfc66 000003ff804bfc76: e320b7600271 lay %r2,10080(%r11) 000003ff804bfc7c: c0e5fffea339 brasl %r14,000003ff804942ee [ 4933.861444] Call Trace: [ 4933.861445] [<000003ff804bfc66>] kvm_s390_gisa_destroy+0x3e/0xe0 [kvm] [ 4933.861460] ([<00000002623523de>] free_unref_page+0xee/0x148) [ 4933.861507] [<000003ff804aea98>] kvm_arch_destroy_vm+0x50/0x120 [kvm] [ 4933.861521] [<000003ff8049d374>] kvm_destroy_vm+0x174/0x288 [kvm] [ 4933.861532] [<000003ff8049d4fe>] kvm_vm_release+0x36/0x48 [kvm] [ 4933.861542] [<00000002623cd04a>] __fput+0xea/0x2a8 [ 4933.861547] [<00000002620d5bf8>] task_work_run+0x88/0xf0 [ 4933.861551] [<00000002620b0aa6>] do_exit+0x2c6/0x528 [ 4933.861556] [<00000002620b0f00>] do_group_exit+0x40/0xb8 [ 4933.861557] [<00000002620b0fa6>] __s390x_sys_exit_group+0x2e/0x30 [ 4933.861559] [<0000000262d481f4>] __do_syscall+0x1d4/0x200 [ 4933.861563] [<0000000262d59028>] system_call+0x70/0x98 [ 4933.861565] Last Breaking-Event-Address: [ 4933.861566] [<0000038003a33b60>] 0x38003a33b60 Fixes: 9f30f62 ("KVM: s390: add gib_alert_irq_handler()") Signed-off-by: Michael Mueller <mimu@linux.ibm.com> Reviewed-by: Matthew Rosato <mjrosato@linux.ibm.com> Reviewed-by: Nico Boehr <nrb@linux.ibm.com> Link: https://lore.kernel.org/r/20230901105823.3973928-1-mimu@linux.ibm.com Message-ID: <20230901105823.3973928-1-mimu@linux.ibm.com> Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Use atomic readl_poll_timeout_atomic, because msdc_reset_hw may be invoked in IRQ handler in the following context: msdc_irq() -> msdc_cmd_done() -> msdc_reset_hw() The following kernel BUG stack trace can be observed on Genio 1200 EVK after initializing MSDC1 hardware during kernel boot: [ 1.187441] BUG: scheduling while atomic: swapper/0/0/0x00010002 [ 1.189157] Modules linked in: [ 1.204633] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G W 5.15.42-mtk+modified #1 [ 1.205713] Hardware name: MediaTek Genio 1200 EVK-P1V2-EMMC (DT) [ 1.206484] Call trace: [ 1.206796] dump_backtrace+0x0/0x1ac [ 1.207266] show_stack+0x24/0x30 [ 1.207692] dump_stack_lvl+0x68/0x84 [ 1.208162] dump_stack+0x1c/0x38 [ 1.208587] __schedule_bug+0x68/0x80 [ 1.209056] __schedule+0x6ec/0x7c0 [ 1.209502] schedule+0x7c/0x110 [ 1.209915] schedule_hrtimeout_range_clock+0xc4/0x1f0 [ 1.210569] schedule_hrtimeout_range+0x20/0x30 [ 1.211148] usleep_range_state+0x84/0xc0 [ 1.211661] msdc_reset_hw+0xc8/0x1b0 [ 1.212134] msdc_cmd_done.isra.0+0x4ac/0x5f0 [ 1.212693] msdc_irq+0x104/0x2d4 [ 1.213121] __handle_irq_event_percpu+0x68/0x280 [ 1.213725] handle_irq_event+0x70/0x15c [ 1.214230] handle_fasteoi_irq+0xb0/0x1a4 [ 1.214755] handle_domain_irq+0x6c/0x9c [ 1.215260] gic_handle_irq+0xc4/0x180 [ 1.215741] call_on_irq_stack+0x2c/0x54 [ 1.216245] do_interrupt_handler+0x5c/0x70 [ 1.216782] el1_interrupt+0x30/0x80 [ 1.217242] el1h_64_irq_handler+0x1c/0x2c [ 1.217769] el1h_64_irq+0x78/0x7c [ 1.218206] cpuidle_enter_state+0xc8/0x600 [ 1.218744] cpuidle_enter+0x44/0x5c [ 1.219205] do_idle+0x224/0x2d0 [ 1.219624] cpu_startup_entry+0x30/0x80 [ 1.220129] rest_init+0x108/0x134 [ 1.220568] arch_call_rest_init+0x1c/0x28 [ 1.221094] start_kernel+0x6c0/0x700 [ 1.221564] __primary_switched+0xc0/0xc8 Fixes: ffaea6e ("mmc: mtk-sd: Use readl_poll_timeout instead of open-coded polling") Signed-off-by: Pablo Sun <pablo.sun@mediatek.com> Reviewed-by: Chen-Yu Tsai <wenst@chromium.org> Reviewed-by: AngeloGioacchino Del Regno <angelogioachino.delregno@collabora.com> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20230922095348.22182-1-pablo.sun@mediatek.com Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
In powerpc pseries system, below behaviour is observed while enabling tracing on hcall: # cd /sys/kernel/debug/tracing/ # cat events/powerpc/hcall_exit/enable 0 # echo 1 > events/powerpc/hcall_exit/enable # ls -bash: fork: Bad address Above is from power9 lpar with latest kernel. Past this, softlockup is observed. Initially while attempting via perf_event_open to use "PERF_TYPE_TRACEPOINT", kernel panic was observed. perf config used: ================ memset(&pe[1],0,sizeof(struct perf_event_attr)); pe[1].type=PERF_TYPE_TRACEPOINT; pe[1].size=96; pe[1].config=0x26ULL; /* 38 raw_syscalls/sys_exit */ pe[1].sample_type=0; /* 0 */ pe[1].read_format=PERF_FORMAT_TOTAL_TIME_ENABLED|PERF_FORMAT_TOTAL_TIME_RUNNING|PERF_FORMAT_ID|PERF_FORMAT_GROUP|0x10ULL; /* 1f */ pe[1].inherit=1; pe[1].precise_ip=0; /* arbitrary skid */ pe[1].wakeup_events=0; pe[1].bp_type=HW_BREAKPOINT_EMPTY; pe[1].config1=0x1ULL; Kernel panic logs: ================== Kernel attempted to read user page (8) - exploit attempt? (uid: 0) BUG: Kernel NULL pointer dereference on read at 0x00000008 Faulting instruction address: 0xc0000000004c2814 Oops: Kernel access of bad area, sig: 11 [#1] LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries Modules linked in: nfnetlink bonding tls rfkill sunrpc dm_service_time dm_multipath pseries_rng xts vmx_crypto xfs libcrc32c sd_mod t10_pi crc64_rocksoft crc64 sg ibmvfc scsi_transport_fc ibmveth dm_mirror dm_region_hash dm_log dm_mod fuse CPU: 0 PID: 1431 Comm: login Not tainted 6.4.0+ #1 Hardware name: IBM,8375-42A POWER9 (raw) 0x4e0202 0xf000005 of:IBM,FW950.30 (VL950_892) hv:phyp pSeries NIP page_remove_rmap+0x44/0x320 LR wp_page_copy+0x384/0xec0 Call Trace: 0xc00000001416e400 (unreliable) wp_page_copy+0x384/0xec0 __handle_mm_fault+0x9d4/0xfb0 handle_mm_fault+0xf0/0x350 ___do_page_fault+0x48c/0xc90 hash__do_page_fault+0x30/0x70 do_hash_fault+0x1a4/0x330 data_access_common_virt+0x198/0x1f0 --- interrupt: 300 at 0x7fffae971abc git bisect tracked this down to below commit: 'commit baa49d8 ("powerpc/pseries: hvcall stack frame overhead")' This commit changed STACK_FRAME_OVERHEAD (112 ) to STACK_FRAME_MIN_SIZE (32 ) since 32 bytes is the minimum size for ELFv2 stack. With the latest kernel, when running on ELFv2, STACK_FRAME_MIN_SIZE is used to allocate stack size. During plpar_hcall_trace, first call is made to HCALL_INST_PRECALL which saves the registers and allocates new stack frame. In the plpar_hcall_trace code, STK_PARAM is accessed at two places. 1. To save r4: std r4,STK_PARAM(R4)(r1) 2. To access r4 back: ld r12,STK_PARAM(R4)(r1) HCALL_INST_PRECALL precall allocates a new stack frame. So all the stack parameter access after the precall, needs to be accessed with +STACK_FRAME_MIN_SIZE. So the store instruction should be: std r4,STACK_FRAME_MIN_SIZE+STK_PARAM(R4)(r1) If the "std" is not updated with STACK_FRAME_MIN_SIZE, we will end up with overwriting stack contents and cause corruption. But instead of updating 'std', we can instead remove it since HCALL_INST_PRECALL already saves it to the correct location. similarly load instruction should be: ld r12,STACK_FRAME_MIN_SIZE+STK_PARAM(R4)(r1) Fix the load instruction to correctly access the stack parameter with +STACK_FRAME_MIN_SIZE and remove the store of r4 since the precall saves it correctly. Cc: stable@vger.kernel.org # v6.2+ Fixes: baa49d8 ("powerpc/pseries: hvcall stack frame overhead") Co-developed-by: Naveen N Rao <naveen@kernel.org> Signed-off-by: Naveen N Rao <naveen@kernel.org> Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20230929172337.7906-1-atrajeev@linux.vnet.ibm.com
Many functions in drivers/usb/core/hub.c and drivers/usb/core/hub.h access fields inside udev->bos without checking if it was allocated and initialized. If usb_get_bos_descriptor() fails for whatever reason, udev->bos will be NULL and those accesses will result in a crash: BUG: kernel NULL pointer dereference, address: 0000000000000018 PGD 0 P4D 0 Oops: 0000 [#1] PREEMPT SMP NOPTI CPU: 5 PID: 17818 Comm: kworker/5:1 Tainted: G W 5.15.108-18910-gab0e1cb584e1 #1 <HASH:1f9e 1> Hardware name: Google Kindred/Kindred, BIOS Google_Kindred.12672.413.0 02/03/2021 Workqueue: usb_hub_wq hub_event RIP: 0010:hub_port_reset+0x193/0x788 Code: 89 f7 e8 20 f7 15 00 48 8b 43 08 80 b8 96 03 00 00 03 75 36 0f b7 88 92 03 00 00 81 f9 10 03 00 00 72 27 48 8b 80 a8 03 00 00 <48> 83 78 18 00 74 19 48 89 df 48 8b 75 b0 ba 02 00 00 00 4c 89 e9 RSP: 0018:ffffab740c53fcf8 EFLAGS: 00010246 RAX: 0000000000000000 RBX: ffffa1bc5f678000 RCX: 0000000000000310 RDX: fffffffffffffdff RSI: 0000000000000286 RDI: ffffa1be9655b840 RBP: ffffab740c53fd70 R08: 00001b7d5edaa20c R09: ffffffffb005e060 R10: 0000000000000001 R11: 0000000000000000 R12: 0000000000000000 R13: ffffab740c53fd3e R14: 0000000000000032 R15: 0000000000000000 FS: 0000000000000000(0000) GS:ffffa1be96540000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000018 CR3: 000000022e80c005 CR4: 00000000003706e0 Call Trace: hub_event+0x73f/0x156e ? hub_activate+0x5b7/0x68f process_one_work+0x1a2/0x487 worker_thread+0x11a/0x288 kthread+0x13a/0x152 ? process_one_work+0x487/0x487 ? kthread_associate_blkcg+0x70/0x70 ret_from_fork+0x1f/0x30 Fall back to a default behavior if the BOS descriptor isn't accessible and skip all the functionalities that depend on it: LPM support checks, Super Speed capabilitiy checks, U1/U2 states setup. Signed-off-by: Ricardo Cañuelo <ricardo.canuelo@collabora.com> Cc: stable <stable@vger.kernel.org> Link: https://lore.kernel.org/r/20230830100418.1952143-1-ricardo.canuelo@collabora.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
For ARM processor, unaligned access to device memory is not allowed. Method memcpy does not take care of alignment. USB detection failure with the unalingned address of memory, with below kernel crash. To fix the unalingned address kernel panic, replace memcpy with memcpy_toio method. Kernel crash: Unable to handle kernel paging request at virtual address ffff80000c05008a Mem abort info: ESR = 0x96000061 EC = 0x25: DABT (current EL), IL = 32 bits SET = 0, FnV = 0 EA = 0, S1PTW = 0 FSC = 0x21: alignment fault Data abort info: ISV = 0, ISS = 0x00000061 CM = 0, WnR = 1 swapper pgtable: 4k pages, 48-bit VAs, pgdp=000000000143b000 [ffff80000c05008a] pgd=100000087ffff003, p4d=100000087ffff003, pud=100000087fffe003, pmd=1000000800bcc003, pte=00680000a0010713 Internal error: Oops: 96000061 [#1] SMP Modules linked in: CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.15.19-xilinx-v2022.1 #1 Hardware name: ZynqMP ZCU102 Rev1.0 (DT) pstate: 200000c5 (nzCv daIF -PAN -UAO -TCO -DIT -SSBS BTYPE=--) pc : __memcpy+0x30/0x260 lr : __xudc_ep0_queue+0xf0/0x110 sp : ffff800008003d00 x29: ffff800008003d00 x28: ffff800009474e80 x27: 00000000000000a0 x26: 0000000000000100 x25: 0000000000000012 x24: ffff000800bc8080 x23: 0000000000000001 x22: 0000000000000012 x21: ffff000800bc8080 x20: 0000000000000012 x19: ffff000800bc8080 x18: 0000000000000000 x17: ffff800876482000 x16: ffff800008004000 x15: 0000000000004000 x14: 00001f09785d0400 x13: 0103020101005567 x12: 0781400000000200 x11: 00000000c5672a10 x10: 00000000000008d0 x9 : ffff800009463cf0 x8 : ffff8000094757b0 x7 : 0201010055670781 x6 : 4000000002000112 x5 : ffff80000c05009a x4 : ffff000800a15012 x3 : ffff00080362ad80 x2 : 0000000000000012 x1 : ffff000800a15000 x0 : ffff80000c050088 Call trace: __memcpy+0x30/0x260 xudc_ep0_queue+0x3c/0x60 usb_ep_queue+0x38/0x44 composite_ep0_queue.constprop.0+0x2c/0xc0 composite_setup+0x8d0/0x185c configfs_composite_setup+0x74/0xb0 xudc_irq+0x570/0xa40 __handle_irq_event_percpu+0x58/0x170 handle_irq_event+0x60/0x120 handle_fasteoi_irq+0xc0/0x220 handle_domain_irq+0x60/0x90 gic_handle_irq+0x74/0xa0 call_on_irq_stack+0x2c/0x60 do_interrupt_handler+0x54/0x60 el1_interrupt+0x30/0x50 el1h_64_irq_handler+0x18/0x24 el1h_64_irq+0x78/0x7c arch_cpu_idle+0x18/0x2c do_idle+0xdc/0x15c cpu_startup_entry+0x28/0x60 rest_init+0xc8/0xe0 arch_call_rest_init+0x10/0x1c start_kernel+0x694/0x6d4 __primary_switched+0xa4/0xac Fixes: 1f7c516 ("usb: gadget: Add xilinx usb2 device support") Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/all/202209020044.CX2PfZzM-lkp@intel.com/ Cc: stable@vger.kernel.org Signed-off-by: Piyush Mehta <piyush.mehta@amd.com> Link: https://lore.kernel.org/r/20230929121514.13475-1-piyush.mehta@amd.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Surfaces can be backed (i.e. stored in) memory objects (mob's) which are created and managed by the userspace as GEM buffers. Surfaces grab only a ttm reference which means that the gem object can be deleted underneath us, especially in cases where prime buffer export is used. Make sure that all userspace surfaces which are backed by gem objects hold a gem reference to make sure they're not deleted before vmw surfaces are done with them, which fixes: ------------[ cut here ]------------ refcount_t: underflow; use-after-free. WARNING: CPU: 2 PID: 2632 at lib/refcount.c:28 refcount_warn_saturate+0xfb/0x150 Modules linked in: overlay vsock_loopback vmw_vsock_virtio_transport_common vmw_vsock_vmci_transport vsock snd_ens1371 snd_ac97_codec ac97_bus snd_pcm gameport> CPU: 2 PID: 2632 Comm: vmw_ref_count Not tainted 6.5.0-rc2-vmwgfx #1 Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 11/12/2020 RIP: 0010:refcount_warn_saturate+0xfb/0x150 Code: eb 9e 0f b6 1d 8b 5b a6 01 80 fb 01 0f 87 ba e4 80 00 83 e3 01 75 89 48 c7 c7 c0 3c f9 a3 c6 05 6f 5b a6 01 01 e8 15 81 98 ff <0f> 0b e9 6f ff ff ff 0f b> RSP: 0018:ffffbdc34344bba0 EFLAGS: 00010286 RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000027 RDX: ffff960475ea1548 RSI: 0000000000000001 RDI: ffff960475ea1540 RBP: ffffbdc34344bba8 R08: 0000000000000003 R09: 65646e75203a745f R10: ffffffffa5b32b20 R11: 72657466612d6573 R12: ffff96037d6a6400 R13: ffff9603484805b0 R14: 000000000000000b R15: ffff9603bed06060 FS: 00007f5fd8520c40(0000) GS:ffff960475e80000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f5fda755000 CR3: 000000010d012005 CR4: 00000000003706e0 Call Trace: <TASK> ? show_regs+0x6e/0x80 ? refcount_warn_saturate+0xfb/0x150 ? __warn+0x91/0x150 ? refcount_warn_saturate+0xfb/0x150 ? report_bug+0x19d/0x1b0 ? handle_bug+0x46/0x80 ? exc_invalid_op+0x1d/0x80 ? asm_exc_invalid_op+0x1f/0x30 ? refcount_warn_saturate+0xfb/0x150 drm_gem_object_handle_put_unlocked+0xba/0x110 [drm] drm_gem_object_release_handle+0x6e/0x80 [drm] drm_gem_handle_delete+0x6a/0xc0 [drm] ? __pfx_vmw_bo_unref_ioctl+0x10/0x10 [vmwgfx] vmw_bo_unref_ioctl+0x33/0x40 [vmwgfx] drm_ioctl_kernel+0xbc/0x160 [drm] drm_ioctl+0x2d2/0x580 [drm] ? __pfx_vmw_bo_unref_ioctl+0x10/0x10 [vmwgfx] ? do_vmi_munmap+0xee/0x180 vmw_generic_ioctl+0xbd/0x180 [vmwgfx] vmw_unlocked_ioctl+0x19/0x20 [vmwgfx] __x64_sys_ioctl+0x99/0xd0 do_syscall_64+0x5d/0x90 ? syscall_exit_to_user_mode+0x2a/0x50 ? do_syscall_64+0x6d/0x90 ? handle_mm_fault+0x16e/0x2f0 ? exit_to_user_mode_prepare+0x34/0x170 ? irqentry_exit_to_user_mode+0xd/0x20 ? irqentry_exit+0x3f/0x50 ? exc_page_fault+0x8e/0x190 entry_SYSCALL_64_after_hwframe+0x6e/0xd8 RIP: 0033:0x7f5fda51aaff Code: 00 48 89 44 24 18 31 c0 48 8d 44 24 60 c7 04 24 10 00 00 00 48 89 44 24 08 48 8d 44 24 20 48 89 44 24 10 b8 10 00 00 00 0f 05 <41> 89 c0 3d 00 f0 ff ff 7> RSP: 002b:00007ffd536a4d30 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 RAX: ffffffffffffffda RBX: 00007ffd536a4de0 RCX: 00007f5fda51aaff RDX: 00007ffd536a4de0 RSI: 0000000040086442 RDI: 0000000000000003 RBP: 0000000040086442 R08: 000055fa603ada50 R09: 0000000000000000 R10: 0000000000000001 R11: 0000000000000246 R12: 00007ffd536a51b8 R13: 0000000000000003 R14: 000055fa5ebb4c80 R15: 00007f5fda90f040 </TASK> ---[ end trace 0000000000000000 ]--- A lot of the analyis on the bug was done by Murray McAllister and Ian Forbes. Reported-by: Murray McAllister <murray.mcallister@gmail.com> Cc: Ian Forbes <iforbes@vmware.com> Signed-off-by: Zack Rusin <zackr@vmware.com> Fixes: a950b98 ("drm/vmwgfx: Do not drop the reference to the handle too soon") Cc: <stable@vger.kernel.org> # v6.2+ Reviewed-by: Martin Krastev <krastevm@vmware.com> Link: https://patchwork.freedesktop.org/patch/msgid/20230928041355.737635-1-zack@kde.org
Eddie reported that newer kernels were crashing during boot on his 476 FSP2 system: kernel tried to execute user page (b7ee2000) - exploit attempt? (uid: 0) BUG: Unable to handle kernel instruction fetch Faulting instruction address: 0xb7ee2000 Oops: Kernel access of bad area, sig: 11 [#1] BE PAGE_SIZE=4K FSP-2 Modules linked in: CPU: 0 PID: 61 Comm: mount Not tainted 6.1.55-d23900f.ppcnf-fsp2 #1 Hardware name: ibm,fsp2 476fpe 0x7ff520c0 FSP-2 NIP: b7ee2000 LR: 8c008000 CTR: 00000000 REGS: bffebd83 TRAP: 0400 Not tainted (6.1.55-d23900f.ppcnf-fs p2) MSR: 00000030 <IR,DR> CR: 00001000 XER: 20000000 GPR00: c00110ac bffebe63 bffebe7e bffebe88 8c008000 00001000 00000d12 b7ee2000 GPR08: 00000033 00000000 00000000 c139df10 48224824 1016c314 10160000 00000000 GPR16: 10160000 10160000 00000008 00000000 10160000 00000000 10160000 1017f5b0 GPR24: 1017fa50 1017f4f0 1017fa50 1017f740 1017f630 00000000 00000000 1017f4f0 NIP [b7ee2000] 0xb7ee2000 LR [8c008000] 0x8c008000 Call Trace: Instruction dump: XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX ---[ end trace 0000000000000000 ]--- The problem is in ret_from_syscall where the check for icache_44x_need_flush is done. When the flush is needed the code jumps out-of-line to do the flush, and then intends to jump back to continue the syscall return. However the branch back to label 1b doesn't return to the correct location, instead branching back just prior to the return to userspace, causing bogus register values to be used by the rfi. The breakage was introduced by commit 6f76a01 ("powerpc/syscall: implement system call entry/exit logic in C for PPC32") which inadvertently removed the "1" label and reused it elsewhere. Fix it by adding named local labels in the correct locations. Note that the return label needs to be outside the ifdef so that CONFIG_PPC_47x=n compiles. Fixes: 6f76a01 ("powerpc/syscall: implement system call entry/exit logic in C for PPC32") Cc: stable@vger.kernel.org # v5.12+ Reported-by: Eddie James <eajames@linux.ibm.com> Tested-by: Eddie James <eajames@linux.ibm.com> Link: https://lore.kernel.org/linuxppc-dev/fdaadc46-7476-9237-e104-1d2168526e72@linux.ibm.com/ Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu> Link: https://msgid.link/20231010114750.847794-1-mpe@ellerman.id.au
The cited patch change mlx5 driver so that during probe DMA operations were performed before pci_enable_device(), and during teardown DMA operations were performed after pci_disable_device(). DMA operations require PCI to be enabled. Hence, The above leads to the following oops in PPC systems[1]. On s390x systems, as reported by Niklas Schnelle, this is a problem because mlx5_pci_init() is where the DMA and coherent mask is set but mlx5_cmd_init() already does a dma_alloc_coherent(). Thus a DMA allocation is done during probe before the correct mask is set. This causes probe to fail initialization of the cmdif SW structs on s390x after that is converted to the common dma-iommu code. This is because on s390x DMA addresses below 4 GiB are reserved on current machines and unlike the old s390x specific DMA API implementation common code enforces DMA masks. Fix it by performing the DMA operations during probe after pci_enable_device() and after the dma mask is set, and during teardown before pci_disable_device(). [1] Oops: Kernel access of bad area, sig: 11 [#1] LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA pSeries Modules linked in: xt_MASQUERADE nf_conntrack_netlink nfnetlink xfrm_user iptable_nat xt_addrtype xt_conntrack nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 netconsole rpcsec_gss_krb5 auth_rpcgss oid_registry overlay rpcrdma rdma_ucm ib_iser ib_umad rdma_cm ib_ipoib iw_cm libiscsi scsi_transport_iscsi ib_cm ib_uverbs ib_core mlx5_core(-) ptp pps_core fuse vmx_crypto crc32c_vpmsum [last unloaded: mlx5_ib] CPU: 1 PID: 8937 Comm: modprobe Not tainted 6.5.0-rc3_for_upstream_min_debug_2023_07_31_16_02 #1 Hardware name: IBM pSeries (emulated by qemu) POWER9 (raw) 0x4e1202 0xf000005 of:SLOF,HEAD hv:linux,kvm pSeries NIP: c000000000423388 LR: c0000000001e733c CTR: c0000000001e4720 REGS: c0000000055636d0 TRAP: 0380 Not tainted (6.5.0-rc3_for_upstream_min_debug_2023_07_31_16_02) MSR: 8000000000009033 CR: 24008884 XER: 20040000 CFAR: c0000000001e7338 IRQMASK: 0 NIP [c000000000423388] __free_pages+0x28/0x160 LR [c0000000001e733c] dma_direct_free+0xac/0x190 Call Trace: [c000000005563970] [5deadbeef0000100] 0x5deadbeef0000100 (unreliable) [c0000000055639b0] [c0000000003d46cc] kfree+0x7c/0x150 [c000000005563a40] [c0000000001e47c8] dma_free_attrs+0xa8/0x1a0 [c000000005563aa0] [c008000000d0064c] mlx5_cmd_cleanup+0xa4/0x100 [mlx5_core] [c000000005563ad0] [c008000000cf629c] mlx5_mdev_uninit+0xf4/0x140 [mlx5_core] [c000000005563b00] [c008000000cf6448] remove_one+0x160/0x1d0 [mlx5_core] [c000000005563b40] [c000000000958540] pci_device_remove+0x60/0x110 [c000000005563b80] [c000000000a35e80] device_remove+0x70/0xd0 [c000000005563bb0] [c000000000a37a38] device_release_driver_internal+0x2a8/0x330 [c000000005563c00] [c000000000a37b8c] driver_detach+0x8c/0x160 [c000000005563c40] [c000000000a35350] bus_remove_driver+0x90/0x110 [c000000005563c80] [c000000000a38948] driver_unregister+0x48/0x90 [c000000005563cf0] [c000000000957e38] pci_unregister_driver+0x38/0x150 [c000000005563d40] [c008000000eb6140] mlx5_cleanup+0x38/0x90 [mlx5_core] Fixes: 06cd555 ("net/mlx5: split mlx5_cmd_init() to probe and reload routines") Signed-off-by: Shay Drory <shayd@nvidia.com> Reviewed-by: Moshe Shemesh <moshe@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Reviewed-by: Niklas Schnelle <schnelle@linux.ibm.com> Tested-by: Niklas Schnelle <schnelle@linux.ibm.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Hold RTNL lock when calling xdp_set_features() with a registered netdev, as the call triggers the netdev notifiers. This could happen when switching from nic profile to uplink representor for example. Similar logic which fixed a similar scenario was previously introduced in the following commit: commit 72cc654 net/mlx5e: Take RTNL lock when needed before calling xdp_set_features(). This fixes the following assertion and warning call trace: RTNL: assertion failed at net/core/dev.c (1961) WARNING: CPU: 13 PID: 2529 at net/core/dev.c:1961 call_netdevice_notifiers_info+0x7c/0x80 Modules linked in: rpcrdma rdma_ucm ib_iser libiscsi scsi_transport_iscsi ib_umad rdma_cm ib_ipoib iw_cm ib_cm mlx5_ib ib_uverbs ib_core xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink xt_addrtype iptable_nat nf_nat br_netfilter rpcsec_gss_krb5 auth_rpcgss oid_registry overlay mlx5_core zram zsmalloc fuse CPU: 13 PID: 2529 Comm: devlink Not tainted 6.5.0_for_upstream_min_debug_2023_09_07_20_04 #1 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014 RIP: 0010:call_netdevice_notifiers_info+0x7c/0x80 Code: 8f ff 80 3d 77 0d 16 01 00 75 c5 ba a9 07 00 00 48 c7 c6 c4 bb 0d 82 48 c7 c7 18 c8 06 82 c6 05 5b 0d 16 01 01 e8 44 f6 8c ff <0f> 0b eb a2 0f 1f 44 00 00 55 48 89 e5 41 54 48 83 e4 f0 48 83 ec RSP: 0018:ffff88819930f7f0 EFLAGS: 00010282 RAX: 0000000000000000 RBX: ffffffff8309f740 RCX: 0000000000000027 RDX: ffff88885fb5b5c8 RSI: 0000000000000001 RDI: ffff88885fb5b5c0 RBP: 0000000000000028 R08: ffff88887ffabaa8 R09: 0000000000000003 R10: ffff88887fecbac0 R11: ffff88887ff7bac0 R12: ffff88819930f810 R13: ffff88810b7fea40 R14: ffff8881154e8fd8 R15: ffff888107e881a0 FS: 00007f3ad248f800(0000) GS:ffff88885fb40000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000563b85f164e0 CR3: 0000000113b5c006 CR4: 0000000000370ea0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <TASK> ? __warn+0x79/0x120 ? call_netdevice_notifiers_info+0x7c/0x80 ? report_bug+0x17c/0x190 ? handle_bug+0x3c/0x60 ? exc_invalid_op+0x14/0x70 ? asm_exc_invalid_op+0x16/0x20 ? call_netdevice_notifiers_info+0x7c/0x80 call_netdevice_notifiers+0x2e/0x50 mlx5e_set_xdp_feature+0x21/0x50 [mlx5_core] mlx5e_build_rep_params+0x97/0x130 [mlx5_core] mlx5e_init_ul_rep+0x9f/0x100 [mlx5_core] mlx5e_netdev_init_profile+0x76/0x110 [mlx5_core] mlx5e_netdev_attach_profile+0x1f/0x90 [mlx5_core] mlx5e_netdev_change_profile+0x92/0x160 [mlx5_core] mlx5e_vport_rep_load+0x329/0x4a0 [mlx5_core] mlx5_esw_offloads_rep_load+0x9e/0xf0 [mlx5_core] esw_offloads_enable+0x4bc/0xe90 [mlx5_core] mlx5_eswitch_enable_locked+0x3c8/0x570 [mlx5_core] ? kmalloc_trace+0x25/0x80 mlx5_devlink_eswitch_mode_set+0x224/0x680 [mlx5_core] ? devlink_get_from_attrs_lock+0x9e/0x110 devlink_nl_cmd_eswitch_set_doit+0x60/0xe0 genl_family_rcv_msg_doit+0xd0/0x120 genl_rcv_msg+0x180/0x2b0 ? devlink_get_from_attrs_lock+0x110/0x110 ? devlink_nl_cmd_eswitch_get_doit+0x290/0x290 ? devlink_pernet_pre_exit+0xf0/0xf0 ? genl_family_rcv_msg_dumpit+0xf0/0xf0 netlink_rcv_skb+0x54/0x100 genl_rcv+0x24/0x40 netlink_unicast+0x1fc/0x2c0 netlink_sendmsg+0x232/0x4a0 sock_sendmsg+0x38/0x60 ? _copy_from_user+0x2a/0x60 __sys_sendto+0x110/0x160 ? handle_mm_fault+0x161/0x260 ? do_user_addr_fault+0x276/0x620 __x64_sys_sendto+0x20/0x30 do_syscall_64+0x3d/0x90 entry_SYSCALL_64_after_hwframe+0x46/0xb0 RIP: 0033:0x7f3ad231340a Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 15 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 7e c3 0f 1f 44 00 00 41 54 48 83 ec 30 44 89 RSP: 002b:00007ffd70aad4b8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c RAX: ffffffffffffffda RBX: 0000000000c36b00 RCX:00007f3ad231340a RDX: 0000000000000038 RSI: 0000000000c36b00 RDI: 0000000000000003 RBP: 0000000000c36910 R08: 00007f3ad2625200 R09: 000000000000000c R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000 R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000001 </TASK> ---[ end trace 0000000000000000 ]--- ------------[ cut here ]------------ Fixes: 4d5ab0a ("net/mlx5e: take into account device reconfiguration for xdp_features flag") Signed-off-by: Lama Kayal <lkayal@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
When testing on risc-v QEMU environment with "crashkernel=" parameter enabled, a problem occurred with the following message: [ 0.000000] crashkernel low memory reserved: 0xf8000000 - 0x100000000 (128 MB) [ 0.000000] crashkernel reserved: 0x0000000177e00000 - 0x0000000277e00000 (4096 MB) [ 0.000000] ------------[ cut here ]------------ [ 0.000000] WARNING: CPU: 0 PID: 0 at kernel/resource.c:779 __insert_resource+0x8e/0xd0 [ 0.000000] Modules linked in: [ 0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted 6.6.0-rc2-next-20230920 #1 [ 0.000000] Hardware name: riscv-virtio,qemu (DT) [ 0.000000] epc : __insert_resource+0x8e/0xd0 [ 0.000000] ra : insert_resource+0x28/0x4e [ 0.000000] epc : ffffffff80017344 ra : ffffffff8001742e sp : ffffffff81203db0 [ 0.000000] gp : ffffffff812ece98 tp : ffffffff8120dac0 t0 : ff600001f7ff2b00 [ 0.000000] t1 : 0000000000000000 t2 : 3428203030303030 s0 : ffffffff81203dc0 [ 0.000000] s1 : ffffffff81211e18 a0 : ffffffff81211e18 a1 : ffffffff81289380 [ 0.000000] a2 : 0000000277dfffff a3 : 0000000177e00000 a4 : 0000000177e00000 [ 0.000000] a5 : ffffffff81289380 a6 : 0000000277dfffff a7 : 0000000000000078 [ 0.000000] s2 : ffffffff81289380 s3 : ffffffff80a0bac8 s4 : ff600001f7ff2880 [ 0.000000] s5 : 0000000000000280 s6 : 8000000a00006800 s7 : 000000000000007f [ 0.000000] s8 : 0000000080017038 s9 : 0000000080038ea0 s10: 0000000000000000 [ 0.000000] s11: 0000000000000000 t3 : ffffffff80a0bc00 t4 : ffffffff80a0bc00 [ 0.000000] t5 : ffffffff80a0bbd0 t6 : ffffffff80a0bc00 [ 0.000000] status: 0000000200000100 badaddr: 0000000000000000 cause: 0000000000000003 [ 0.000000] [<ffffffff80017344>] __insert_resource+0x8e/0xd0 [ 0.000000] ---[ end trace 0000000000000000 ]--- [ 0.000000] Failed to add a Crash kernel resource at 177e00000 The crashkernel memory has been allocated successfully, whereas it failed to insert into iomem_resource. This is due to the unique reserving logic in risc-v arch specific code, i.e. crashk_res/crashk_low_res will be added into iomem_resource later in init_resources(), which is not aligned with current unified reserving logic in reserve_crashkernel_{generic,low}() and therefore leads to the failure of crashkernel reservation. Removing the arch specific code within #ifdef CONFIG_KEXEC_CORE in init_resources() to fix above problem. Fixes: 3154915 ("riscv: kdump: use generic interface to simplify crashkernel reservation") Signed-off-by: Chen Jiahao <chenjiahao16@huawei.com> Link: https://lore.kernel.org/r/20230925024333.730964-1-chenjiahao16@huawei.com Cc: stable@vger.kernel.org Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
On driver load, scsi_add_host() can fail. This triggers the free path to call qla2x00_mem_free() multiple times. This causes NULL pointer access of ha->base_qpair. Add check before access. BUG: unable to handle kernel NULL pointer dereference at 0000000000000030 IP: [<ffffffffc118f73c>] qla2x00_mem_free+0x51c/0xcb0 [qla2xxx] PGD 8000001fcfe4a067 PUD 1fc8f0a067 PMD 0 Oops: 0000 [#1] SMP RIP: 0010:[<ffffffffc118f73c>] [<ffffffffc118f73c>] qla2x00_mem_free+0x51c/0xcb0 [qla2xxx] RSP: 0018:ffff8ace97a93a30 EFLAGS: 00010246 RAX: 0000000000000000 RBX: ffff8ace8efd0000 RCX: 000000000000488f RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000 RBP: ffff8ace97a93a60 R08: 000000000001f040 R09: ffffffff8678209b R10: ffff8acf7d6df040 R11: ffffc591c0fcc980 R12: ffffffff87034800 R13: ffff8acf0e3cc740 R14: ffff8ace8efd0000 R15: 00000000fffffff4 FS: 00007f4cf5449740(0000) GS:ffff8acf7d6c0000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000030 CR3: 0000001fc2f6c000 CR4: 00000000007607e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: [<ffffffff86781f18>] ? kobject_put+0x28/0x60 [<ffffffffc119a59c>] qla2x00_probe_one+0x19fc/0x3040 [qla2xxx] Fixes: efeda3b ("scsi: qla2xxx: Move resource to allow code reuse") Signed-off-by: Quinn Tran <qutran@marvell.com> Signed-off-by: Nilesh Javali <njavali@marvell.com> Link: https://lore.kernel.org/r/20231016101749.5059-1-njavali@marvell.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
…ress If we specify a valid CQ ring address but an invalid SQ ring address, we'll correctly spot this and free the allocated pages and clear them to NULL. However, we don't clear the ring page count, and hence will attempt to free the pages again. We've already cleared the address of the page array when freeing them, but we don't check for that. This causes the following crash: Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000 Oops [#1] Modules linked in: CPU: 0 PID: 20 Comm: kworker/u2:1 Not tainted 6.6.0-rc5-dirty torvalds#56 Hardware name: ucbbar,riscvemu-bare (DT) Workqueue: events_unbound io_ring_exit_work epc : io_pages_free+0x2a/0x58 ra : io_rings_free+0x3a/0x50 epc : ffffffff808811a2 ra : ffffffff80881406 sp : ffff8f80000c3cd0 status: 0000000200000121 badaddr: 0000000000000000 cause: 000000000000000d [<ffffffff808811a2>] io_pages_free+0x2a/0x58 [<ffffffff80881406>] io_rings_free+0x3a/0x50 [<ffffffff80882176>] io_ring_exit_work+0x37e/0x424 [<ffffffff80027234>] process_one_work+0x10c/0x1f4 [<ffffffff8002756e>] worker_thread+0x252/0x31c [<ffffffff8002f5e4>] kthread+0xc4/0xe0 [<ffffffff8000332a>] ret_from_fork+0xa/0x1c Check for a NULL array in io_pages_free(), but also clear the page counts when we free them to be on the safer side. Reported-by: rtm@csail.mit.edu Fixes: 03d89a2 ("io_uring: support for user allocated memory for rings/sqes") Cc: stable@vger.kernel.org Reviewed-by: Jeff Moyer <jmoyer@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
when the checked address is illegal,the corresponding shadow address from kasan_mem_to_shadow may have no mapping in mmu table. Access such shadow address causes kernel oops. Here is a sample about oops on arm64(VA 39bit) with KASAN_SW_TAGS and KASAN_OUTLINE on: [ffffffb80aaaaaaa] pgd=000000005d3ce003, p4d=000000005d3ce003, pud=000000005d3ce003, pmd=0000000000000000 Internal error: Oops: 0000000096000006 [#1] PREEMPT SMP Modules linked in: CPU: 3 PID: 100 Comm: sh Not tainted 6.6.0-rc1-dirty torvalds#43 Hardware name: linux,dummy-virt (DT) pstate: 80000005 (Nzcv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--) pc : __hwasan_load8_noabort+0x5c/0x90 lr : do_ib_ob+0xf4/0x110 ffffffb80aaaaaaa is the shadow address for efffff80aaaaaaaa. The problem is reading invalid shadow in kasan_check_range. The generic kasan also has similar oops. It only reports the shadow address which causes oops but not the original address. Commit 2f004ee("x86/kasan: Print original address on #GP") introduce to kasan_non_canonical_hook but limit it to KASAN_INLINE. This patch extends it to KASAN_OUTLINE mode. Link: https://lkml.kernel.org/r/20231009073748.159228-1-haibo.li@mediatek.com Fixes: 2f004ee("x86/kasan: Print original address on #GP") Signed-off-by: Haibo Li <haibo.li@mediatek.com> Reviewed-by: Andrey Konovalov <andreyknvl@gmail.com> Cc: Alexander Potapenko <glider@google.com> Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com> Cc: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Haibo Li <haibo.li@mediatek.com> Cc: Matthias Brugger <matthias.bgg@gmail.com> Cc: Vincenzo Frascino <vincenzo.frascino@arm.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Kees Cook <keescook@chromium.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
As drm_dp_get_mst_branch_device_by_guid() is called from drm_dp_get_mst_branch_device_by_guid(), mstb parameter has to be checked, otherwise NULL dereference may occur in the call to the memcpy() and cause following: [12579.365869] BUG: kernel NULL pointer dereference, address: 0000000000000049 [12579.365878] #PF: supervisor read access in kernel mode [12579.365880] #PF: error_code(0x0000) - not-present page [12579.365882] PGD 0 P4D 0 [12579.365887] Oops: 0000 [#1] PREEMPT SMP NOPTI ... [12579.365895] Workqueue: events_long drm_dp_mst_up_req_work [12579.365899] RIP: 0010:memcmp+0xb/0x29 [12579.365921] Call Trace: [12579.365927] get_mst_branch_device_by_guid_helper+0x22/0x64 [12579.365930] drm_dp_mst_up_req_work+0x137/0x416 [12579.365933] process_one_work+0x1d0/0x419 [12579.365935] worker_thread+0x11a/0x289 [12579.365938] kthread+0x13e/0x14f [12579.365941] ? process_one_work+0x419/0x419 [12579.365943] ? kthread_blkcg+0x31/0x31 [12579.365946] ret_from_fork+0x1f/0x30 As get_mst_branch_device_by_guid_helper() is recursive, moving condition to the first line allow to remove a similar one for step over of NULL elements inside a loop. Fixes: 5e93b82 ("drm/dp/mst: move GUID storage from mgr, port to only mst branch") Cc: <stable@vger.kernel.org> # 4.14+ Signed-off-by: Lukasz Majczak <lma@semihalf.com> Reviewed-by: Radoslaw Biernacki <rad@chromium.org> Signed-off-by: Manasi Navare <navaremanasi@chromium.org> Link: https://patchwork.freedesktop.org/patch/msgid/20230922063410.23626-1-lma@semihalf.com
Erhard reported that his G5 was crashing with v6.6-rc kernels: mpic: Setting up HT PICs workarounds for U3/U4 BUG: Unable to handle kernel data access at 0xfeffbb62ffec65fe Faulting instruction address: 0xc00000000005dc40 Oops: Kernel access of bad area, sig: 11 [#1] BE PAGE_SIZE=4K MMU=Hash SMP NR_CPUS=2 PowerMac Modules linked in: CPU: 0 PID: 0 Comm: swapper/0 Tainted: G T 6.6.0-rc3-PMacGS #1 Hardware name: PowerMac11,2 PPC970MP 0x440101 PowerMac NIP: c00000000005dc40 LR: c000000000066660 CTR: c000000000007730 REGS: c0000000022bf510 TRAP: 0380 Tainted: G T (6.6.0-rc3-PMacGS) MSR: 9000000000001032 <SF,HV,ME,IR,DR,RI> CR: 44004242 XER: 00000000 IRQMASK: 3 GPR00: 0000000000000000 c0000000022bf7b0 c0000000010c0b00 00000000000001ac GPR04: 0000000003c80000 0000000000000300 c0000000f20001ae 0000000000000300 GPR08: 0000000000000006 feffbb62ffec65ff 0000000000000001 0000000000000000 GPR12: 9000000000001032 c000000002362000 c000000000f76b80 000000000349ecd8 GPR16: 0000000002367ba8 0000000002367f08 0000000000000006 0000000000000000 GPR20: 00000000000001ac c000000000f6f920 c0000000022cd985 000000000000000c GPR24: 0000000000000300 00000003b0a3691d c0003e008030000e 0000000000000000 GPR28: c00000000000000c c0000000f20001ee feffbb62ffec65fe 00000000000001ac NIP hash_page_do_lazy_icache+0x50/0x100 LR __hash_page_4K+0x420/0x590 Call Trace: hash_page_mm+0x364/0x6f0 do_hash_fault+0x114/0x2b0 data_access_common_virt+0x198/0x1f0 --- interrupt: 300 at mpic_init+0x4bc/0x10c4 NIP: c000000002020a5c LR: c000000002020a04 CTR: 0000000000000000 REGS: c0000000022bf9f0 TRAP: 0300 Tainted: G T (6.6.0-rc3-PMacGS) MSR: 9000000000001032 <SF,HV,ME,IR,DR,RI> CR: 24004248 XER: 00000000 DAR: c0003e008030000e DSISR: 40000000 IRQMASK: 1 ... NIP mpic_init+0x4bc/0x10c4 LR mpic_init+0x464/0x10c4 --- interrupt: 300 pmac_setup_one_mpic+0x258/0x2dc pmac_pic_init+0x28c/0x3d8 init_IRQ+0x90/0x140 start_kernel+0x57c/0x78c start_here_common+0x1c/0x20 A bisect pointed to the breakage beginning with commit 9fee28b ("powerpc: implement the new page table range API"). Analysis of the oops pointed to a struct page with a corrupted compound_head being loaded via page_folio() -> _compound_head() in hash_page_do_lazy_icache(). The access by the mpic code is to an MMIO address, so the expectation is that the struct page for that address would be initialised by init_unavailable_range(), as pointed out by Aneesh. Instrumentation showed that was not the case, which eventually lead to the realisation that pfn_valid() was returning false for that address, causing the struct page to not be initialised. Because the system is using FLATMEM, the version of pfn_valid() in memory_model.h is used: static inline int pfn_valid(unsigned long pfn) { ... return pfn >= pfn_offset && (pfn - pfn_offset) < max_mapnr; } Which relies on max_mapnr being initialised. Early in boot max_mapnr is zero meaning no PFNs are valid. max_mapnr is initialised in mem_init() called via: start_kernel() mm_core_init() # init/main.c:928 mem_init() But that is too late for the usage in init_unavailable_range() called via: start_kernel() setup_arch() # init/main.c:893 paging_init() free_area_init() init_unavailable_range() Although max_mapnr is currently set in mem_init(), the value is actually already available much earlier, as soon as mem_topology_setup() has completed, which is also before paging_init() is called. So move the initialisation there, which causes paging_init() to correctly initialise the struct page and fixes the bug. This bug seems to have been lurking for years, but went unnoticed because the pre-folio code was inspecting the uninitialised page->flags but not dereferencing it. Thanks to Erhard and Aneesh for help debugging. Reported-by: Erhard Furtner <erhard_f@mailbox.org> Closes: https://lore.kernel.org/all/20230929132750.3cd98452@yea/ Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20231023112500.1550208-1-mpe@ellerman.id.au
c26ab45
to
d5c3fe9
Compare
Hi all, thanks for letting me know. I have rebased with your proposed changes |
commit bb32500 upstream The following can crash the kernel: # cd /sys/kernel/tracing # echo 'p:sched schedule' > kprobe_events # exec 5>>events/kprobes/sched/enable # > kprobe_events # exec 5>&- The above commands: 1. Change directory to the tracefs directory 2. Create a kprobe event (doesn't matter what one) 3. Open bash file descriptor 5 on the enable file of the kprobe event 4. Delete the kprobe event (removes the files too) 5. Close the bash file descriptor 5 The above causes a crash! BUG: kernel NULL pointer dereference, address: 0000000000000028 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page PGD 0 P4D 0 Oops: 0000 [#1] PREEMPT SMP PTI CPU: 6 PID: 877 Comm: bash Not tainted 6.5.0-rc4-test-00008-g2c6b6b1029d4-dirty torvalds#186 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-debian-1.16.2-1 04/01/2014 RIP: 0010:tracing_release_file_tr+0xc/0x50 What happens here is that the kprobe event creates a trace_event_file "file" descriptor that represents the file in tracefs to the event. It maintains state of the event (is it enabled for the given instance?). Opening the "enable" file gets a reference to the event "file" descriptor via the open file descriptor. When the kprobe event is deleted, the file is also deleted from the tracefs system which also frees the event "file" descriptor. But as the tracefs file is still opened by user space, it will not be totally removed until the final dput() is called on it. But this is not true with the event "file" descriptor that is already freed. If the user does a write to or simply closes the file descriptor it will reference the event "file" descriptor that was just freed, causing a use-after-free bug. To solve this, add a ref count to the event "file" descriptor as well as a new flag called "FREED". The "file" will not be freed until the last reference is released. But the FREE flag will be set when the event is removed to prevent any more modifications to that event from happening, even if there's still a reference to the event "file" descriptor. Link: https://lore.kernel.org/linux-trace-kernel/20231031000031.1e705592@gandalf.local.home/ Link: https://lore.kernel.org/linux-trace-kernel/20231031122453.7a48b923@gandalf.local.home Cc: stable@vger.kernel.org Cc: Mark Rutland <mark.rutland@arm.com> Fixes: f5ca233 ("tracing: Increase trace array ref count on enable and filter files") Reported-by: Beau Belgrave <beaub@linux.microsoft.com> Tested-by: Beau Belgrave <beaub@linux.microsoft.com> Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When GPU reset, we need to mark CRTC and planes for reset if content_type from new and old state differs. Therefore, mark CRTC mode as changed when atomic checking connector states. It mitigates recovering issue when trying to artificially reset GPU with `cat /sys/kernel/debug/dri/0/amdgpu_gpu_recover`: [110.957772] [drm:amdgpu_dm_atomic_commit_tail [amdgpu]] *ERROR* got no status for stream 00000000f8460964 on acrtc000000002c01a447 [110.958270] amdgpu: [PFLIP]:crtc:0, pflip_stat:AMDGPU_FLIP_SUBMITTED [110.958395] ------------[ cut here ]------------ [110.958397] WARNING: CPU: 7 PID: 9 at drivers/gpu/drm/amd/amdgpu/../display/dc/core/dc.c:3126 dc_update_planes_and_stream+0x28a/0x9f0 [amdgpu] [110.958836] Modules linked in: [...] [110.958996] CPU: 7 PID: 9 Comm: kworker/u32:0 Not tainted 6.1.29-valve4-1-neptune-61 #1 822eaa97b1b4805dc2af914eea829bf025fccfc6 [110.959001] Hardware name: Valve Jupiter/Jupiter, BIOS F7A0116 05/12/2023 [110.959004] Workqueue: events_unbound commit_work [110.959012] RIP: 0010:dc_update_planes_and_stream+0x28a/0x9f0 [amdgpu] [110.959438] Code: [...] [110.959441] RSP: 0018:ffffb407800b39d8 EFLAGS: 00010202 [110.959445] RAX: 0000000000000000 RBX: ffff8bcf4cc11800 RCX: 0000000000000000 [110.959447] RDX: 0000000000000000 RSI: ffff8bd017e80000 RDI: ffff8bcf4cc11800 [110.959449] RBP: 0000000000000001 R08: 0000000000000000 R09: ffffb407800b39b8 [110.959451] R10: ffff8bcf4cc11800 R11: ffffffffc0e01a80 R12: ffff8bcf491c4000 [110.959453] R13: ffff8bcf4b0e0000 R14: 0000000000000000 R15: 0000000000000000 [110.959456] FS: 0000000000000000(0000) GS:ffff8bd26edc0000(0000) knlGS:0000000000000000 [110.959458] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [110.959461] CR2: 000017568061a809 CR3: 00000001fb610000 CR4: 0000000000350ee0 [110.959463] Call Trace: [110.959469] <TASK> [110.959476] amdgpu_dm_atomic_commit_tail+0x19f1/0x39e0 [amdgpu d6a1084b950a0fdbd34dc529d5f93aeb468cb916] [110.959918] ? update_sd_lb_stats.constprop.0+0x961/0x980 [110.959927] ? load_balance+0x19d/0xe90 [110.959931] ? update_load_avg+0x7e/0x780 [110.959937] ? sugov_update_single_freq+0x18f/0x250 [110.959943] ? update_load_avg+0x7e/0x780 [110.959948] ? psi_group_change+0x15f/0x3a0 [110.959953] ? psi_task_switch+0xd6/0x230 [110.959958] ? wait_for_completion_timeout+0x13e/0x170 [110.959963] ? __switch_to_asm+0x3e/0x60 [110.959968] ? finish_task_switch.isra.0+0x90/0x2c0 [110.959975] commit_tail+0x94/0x130 [110.959979] process_one_work+0x1c7/0x380 [110.959987] worker_thread+0x51/0x390 [110.959992] ? rescuer_thread+0x3b0/0x3b0 [110.959996] kthread+0xde/0x110 [110.960000] ? kthread_complete_and_exit+0x20/0x20 [110.960004] ret_from_fork+0x22/0x30 [110.960010] </TASK> [110.960012] ---[ end trace 0000000000000000 ]--- [117.904310] tpm tpm0: Operation Timed out [121.024304] amdgpu 0000:04:00.0: [drm] *ERROR* [CRTC:81:crtc-0] flip_done timed out NOTE: content_type support for AMD display is a downstream work yet. Link: https://gitlab.steamos.cloud/jupiter/linux-integration/-/issues/33 Fixes: c57428cc9cca ("drm/amd/display: Hook up content_type property") Reported-by: Guilherme Piccoli <gpiccoli@igalia.com> Signed-off-by: Melissa Wen <mwen@igalia.com>
[ Upstream commit 4a8e65b ] SRCU callbacks acceleration might fail if the preceding callbacks advance also fails. This can happen when the following steps are met: 1) The RCU_WAIT_TAIL segment has callbacks (say for gp_num 8) and the RCU_NEXT_READY_TAIL also has callbacks (say for gp_num 12). 2) The grace period for RCU_WAIT_TAIL is observed as started but not yet completed so rcu_seq_current() returns 4 + SRCU_STATE_SCAN1 = 5. 3) This value is passed to rcu_segcblist_advance() which can't move any segment forward and fails. 4) srcu_gp_start_if_needed() still proceeds with callback acceleration. But then the call to rcu_seq_snap() observes the grace period for the RCU_WAIT_TAIL segment (gp_num 8) as completed and the subsequent one for the RCU_NEXT_READY_TAIL segment as started (ie: 8 + SRCU_STATE_SCAN1 = 9) so it returns a snapshot of the next grace period, which is 16. 5) The value of 16 is passed to rcu_segcblist_accelerate() but the freshly enqueued callback in RCU_NEXT_TAIL can't move to RCU_NEXT_READY_TAIL which already has callbacks for a previous grace period (gp_num = 12). So acceleration fails. 6) Note in all these steps, srcu_invoke_callbacks() hadn't had a chance to run srcu_invoke_callbacks(). Then some very bad outcome may happen if the following happens: 7) Some other CPU races and starts the grace period number 16 before the CPU handling previous steps had a chance. Therefore srcu_gp_start() isn't called on the latter sdp to fix the acceleration leak from previous steps with a new pair of call to advance/accelerate. 8) The grace period 16 completes and srcu_invoke_callbacks() is finally called. All the callbacks from previous grace periods (8 and 12) are correctly advanced and executed but callbacks in RCU_NEXT_READY_TAIL still remain. Then rcu_segcblist_accelerate() is called with a snaphot of 20. 9) Since nothing started the grace period number 20, callbacks stay unhandled. This has been reported in real load: [3144162.608392] INFO: task kworker/136:12:252684 blocked for more than 122 seconds. [3144162.615986] Tainted: G O K 5.4.203-1-tlinux4-0011.1 #1 [3144162.623053] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [3144162.631162] kworker/136:12 D 0 252684 2 0x90004000 [3144162.631189] Workqueue: kvm-irqfd-cleanup irqfd_shutdown [kvm] [3144162.631192] Call Trace: [3144162.631202] __schedule+0x2ee/0x660 [3144162.631206] schedule+0x33/0xa0 [3144162.631209] schedule_timeout+0x1c4/0x340 [3144162.631214] ? update_load_avg+0x82/0x660 [3144162.631217] ? raw_spin_rq_lock_nested+0x1f/0x30 [3144162.631218] wait_for_completion+0x119/0x180 [3144162.631220] ? wake_up_q+0x80/0x80 [3144162.631224] __synchronize_srcu.part.19+0x81/0xb0 [3144162.631226] ? __bpf_trace_rcu_utilization+0x10/0x10 [3144162.631227] synchronize_srcu+0x5f/0xc0 [3144162.631236] irqfd_shutdown+0x3c/0xb0 [kvm] [3144162.631239] ? __schedule+0x2f6/0x660 [3144162.631243] process_one_work+0x19a/0x3a0 [3144162.631244] worker_thread+0x37/0x3a0 [3144162.631247] kthread+0x117/0x140 [3144162.631247] ? process_one_work+0x3a0/0x3a0 [3144162.631248] ? __kthread_cancel_work+0x40/0x40 [3144162.631250] ret_from_fork+0x1f/0x30 Fix this with taking the snapshot for acceleration _before_ the read of the current grace period number. The only side effect of this solution is that callbacks advancing happen then _after_ the full barrier in rcu_seq_snap(). This is not a problem because that barrier only cares about: 1) Ordering accesses of the update side before call_srcu() so they don't bleed. 2) See all the accesses prior to the grace period of the current gp_num The only things callbacks advancing need to be ordered against are carried by snp locking. Reported-by: Yong He <alexyonghe@tencent.com> Co-developed-by:: Yong He <alexyonghe@tencent.com> Signed-off-by: Yong He <alexyonghe@tencent.com> Co-developed-by: Joel Fernandes (Google) <joel@joelfernandes.org> Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org> Co-developed-by: Neeraj upadhyay <Neeraj.Upadhyay@amd.com> Signed-off-by: Neeraj upadhyay <Neeraj.Upadhyay@amd.com> Link: http://lore.kernel.org/CANZk6aR+CqZaqmMWrC2eRRPY12qAZnDZLwLnHZbNi=xXMB401g@mail.gmail.com Fixes: da915ad ("srcu: Parallelize callback handling") Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit d5252d9 ] The clock dividers might be used with clock stop bit enabled or not. Current tables only support recommended values from the datasheet. This might result in warnings like below because no valid clock divider is found. Resulting in a 0 divider. There are Renesas ARM Trusted Firmware version out there which e.g. configure 0x201 (shifted logical right by 2: 0x80) and with this match the added { STPnHCK | 0, 1 }: https://github.com/renesas-rcar/arm-trusted-firmware/blob/rcar_gen3_v2.3/drivers/renesas/rcar/emmc/emmc_init.c#L108 ------------[ cut here ]------------ sd1h: Zero divisor and CLK_DIVIDER_ALLOW_ZERO not set WARNING: CPU: 1 PID: 1 at drivers/clk/clk-divider.c:141 divider_recalc_rate+0x48/0x70 Modules linked in: CPU: 1 PID: 1 Comm: swapper/0 Not tainted 6.1.52 #1 Hardware name: Custom board based on r8a7796 (DT) pstate: 40000005 (nZcv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--) pc : divider_recalc_rate+0x48/0x70 ... ------------[ cut here ]------------ Fixes: bb6d3fa ("clk: renesas: rcar-gen3: Switch to new SD clock handling") Signed-off-by: Dirk Behme <dirk.behme@de.bosch.com> [wsa: extended the table to 5 entries, added comments, reword commit message a little] Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com> Tested-by: Dirk Behme <dirk.behme@de.bosch.com> Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be> Link: https://lore.kernel.org/r/20230928080317.28224-1-wsa+renesas@sang-engineering.com Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 4428399 ] The lt8912b driver, in its bridge detach function, calls drm_connector_unregister() and drm_connector_cleanup(). drm_connector_unregister() should be called only for connectors explicitly registered with drm_connector_register(), which is not the case in lt8912b. The driver's drm_connector_funcs.destroy hook is set to drm_connector_cleanup(). Thus the driver should not call either drm_connector_unregister() nor drm_connector_cleanup() in its lt8912_bridge_detach(), as they cause a crash on bridge detach: Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000 Mem abort info: ESR = 0x0000000096000006 EC = 0x25: DABT (current EL), IL = 32 bits SET = 0, FnV = 0 EA = 0, S1PTW = 0 FSC = 0x06: level 2 translation fault Data abort info: ISV = 0, ISS = 0x00000006, ISS2 = 0x00000000 CM = 0, WnR = 0, TnD = 0, TagAccess = 0 GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0 user pgtable: 4k pages, 48-bit VAs, pgdp=00000000858f3000 [0000000000000000] pgd=0800000085918003, p4d=0800000085918003, pud=0800000085431003, pmd=0000000000000000 Internal error: Oops: 0000000096000006 [#1] PREEMPT SMP Modules linked in: tidss(-) display_connector lontium_lt8912b tc358768 panel_lvds panel_simple drm_dma_helper drm_kms_helper drm drm_panel_orientation_quirks CPU: 3 PID: 462 Comm: rmmod Tainted: G W 6.5.0-rc2+ #2 Hardware name: Toradex Verdin AM62 on Verdin Development Board (DT) pstate: 80000005 (Nzcv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--) pc : drm_connector_cleanup+0x78/0x2d4 [drm] lr : lt8912_bridge_detach+0x54/0x6c [lontium_lt8912b] sp : ffff800082ed3a90 x29: ffff800082ed3a90 x28: ffff0000040c1940 x27: 0000000000000000 x26: 0000000000000000 x25: dead000000000122 x24: dead000000000122 x23: dead000000000100 x22: ffff000003fb6388 x21: 0000000000000000 x20: 0000000000000000 x19: ffff000003fb6260 x18: fffffffffffe56e8 x17: 0000000000000000 x16: 0010000000000000 x15: 0000000000000038 x14: 0000000000000000 x13: ffff800081914b48 x12: 000000000000040e x11: 000000000000015a x10: ffff80008196ebb8 x9 : ffff800081914b48 x8 : 00000000ffffefff x7 : ffff0000040c1940 x6 : ffff80007aa649d0 x5 : 0000000000000000 x4 : 0000000000000001 x3 : ffff80008159e008 x2 : 0000000000000000 x1 : 0000000000000000 x0 : 0000000000000000 Call trace: drm_connector_cleanup+0x78/0x2d4 [drm] lt8912_bridge_detach+0x54/0x6c [lontium_lt8912b] drm_bridge_detach+0x44/0x84 [drm] drm_encoder_cleanup+0x40/0xb8 [drm] drmm_encoder_alloc_release+0x1c/0x30 [drm] drm_managed_release+0xac/0x148 [drm] drm_dev_put.part.0+0x88/0xb8 [drm] devm_drm_dev_init_release+0x14/0x24 [drm] devm_action_release+0x14/0x20 release_nodes+0x5c/0x90 devres_release_all+0x8c/0xe0 device_unbind_cleanup+0x18/0x68 device_release_driver_internal+0x208/0x23c driver_detach+0x4c/0x94 bus_remove_driver+0x70/0xf4 driver_unregister+0x30/0x60 platform_driver_unregister+0x14/0x20 tidss_platform_driver_exit+0x18/0xb2c [tidss] __arm64_sys_delete_module+0x1a0/0x2b4 invoke_syscall+0x48/0x110 el0_svc_common.constprop.0+0x60/0x10c do_el0_svc_compat+0x1c/0x40 el0_svc_compat+0x40/0xac el0t_32_sync_handler+0xb0/0x138 el0t_32_sync+0x194/0x198 Code: 9104a276 f2fbd5b7 aa0203e1 91008af8 (f85c0420) Fixes: 30e2ae9 ("drm/bridge: Introduce LT8912B DSI to HDMI bridge") Signed-off-by: Tomi Valkeinen <tomi.valkeinen@ideasonboard.com> Reviewed-by: Robert Foss <rfoss@kernel.org> Signed-off-by: Robert Foss <rfoss@kernel.org> Link: https://patchwork.freedesktop.org/patch/msgid/20230804-lt8912b-v1-2-c542692c6a2f@ideasonboard.com Signed-off-by: Sasha Levin <sashal@kernel.org>
…i_hns3_pmu uninit process [ Upstream commit 50b5607 ] When tearing down a 'hisi_hns3' PMU, we mistakenly run the CPU hotplug callbacks after the device has been unregistered, leading to fireworks when we try to execute empty function callbacks within the driver: | Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000 | CPU: 0 PID: 15 Comm: cpuhp/0 Tainted: G W O 5.12.0-rc4+ #1 | Hardware name: , BIOS KpxxxFPGA 1P B600 V143 04/22/2021 | pstate: 80400009 (Nzcv daif +PAN -UAO -TCO BTYPE=--) | pc : perf_pmu_migrate_context+0x98/0x38c | lr : perf_pmu_migrate_context+0x94/0x38c | | Call trace: | perf_pmu_migrate_context+0x98/0x38c | hisi_hns3_pmu_offline_cpu+0x104/0x12c [hisi_hns3_pmu] Use cpuhp_state_remove_instance_nocalls() instead of cpuhp_state_remove_instance() so that the notifiers don't execute after the PMU device has been unregistered. Fixes: 66637ab ("drivers/perf: hisi: add driver for HNS3 PMU") Signed-off-by: Hao Chen <chenhao418@huawei.com> Signed-off-by: Jijie Shao <shaojijie@huawei.com> Reviewed-by: Yicong Yang <yangyicong@hisilicon.com> Link: https://lore.kernel.org/r/20231019091352.998964-1-shaojijie@huawei.com [will: Rewrote commit message] Signed-off-by: Will Deacon <will@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
…it_event_hooks() [ Upstream commit 91cfe0b ] When CONFIG_HID_UCLOGIC=y and CONFIG_KUNIT_ALL_TESTS=y, launch kernel and then the below user-memory-access bug occurs. In hid_test_uclogic_params_cleanup_event_hooks(),it call uclogic_params_ugee_v2_init_event_hooks() with the first arg=NULL, so when it calls uclogic_params_ugee_v2_has_battery(), the hid_get_drvdata() will access hdev->dev with hdev=NULL, which will cause below user-memory-access. So add a fake_device with quirks member and call hid_set_drvdata() to assign hdev->dev->driver_data which avoids the null-ptr-def bug for drvdata->quirks in uclogic_params_ugee_v2_has_battery(). After applying this patch, the below user-memory-access bug never occurs. general protection fault, probably for non-canonical address 0xdffffc0000000329: 0000 [#1] PREEMPT SMP KASAN KASAN: probably user-memory-access in range [0x0000000000001948-0x000000000000194f] CPU: 5 PID: 2189 Comm: kunit_try_catch Tainted: G B W N 6.6.0-rc2+ torvalds#30 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014 RIP: 0010:uclogic_params_ugee_v2_init_event_hooks+0x87/0x600 Code: f3 f3 65 48 8b 14 25 28 00 00 00 48 89 54 24 60 31 d2 48 89 fa c7 44 24 30 00 00 00 00 48 c7 44 24 28 02 f8 02 01 48 c1 ea 03 <80> 3c 02 00 0f 85 2c 04 00 00 48 8b 9d 48 19 00 00 48 b8 00 00 00 RSP: 0000:ffff88810679fc88 EFLAGS: 00010202 RAX: dffffc0000000000 RBX: 0000000000000004 RCX: 0000000000000000 RDX: 0000000000000329 RSI: ffff88810679fd88 RDI: 0000000000001948 RBP: 0000000000000000 R08: 0000000000000000 R09: ffffed1020f639f0 R10: ffff888107b1cf87 R11: 0000000000000400 R12: 1ffff11020cf3f92 R13: ffff88810679fd88 R14: ffff888100b97b08 R15: ffff8881030bb080 FS: 0000000000000000(0000) GS:ffff888119e80000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000000 CR3: 0000000005286001 CR4: 0000000000770ee0 DR0: ffffffff8fdd6cf4 DR1: ffffffff8fdd6cf5 DR2: ffffffff8fdd6cf6 DR3: ffffffff8fdd6cf7 DR6: 00000000fffe0ff0 DR7: 0000000000000600 PKRU: 55555554 Call Trace: <TASK> ? die_addr+0x3d/0xa0 ? exc_general_protection+0x144/0x220 ? asm_exc_general_protection+0x22/0x30 ? uclogic_params_ugee_v2_init_event_hooks+0x87/0x600 ? sched_clock_cpu+0x69/0x550 ? uclogic_parse_ugee_v2_desc_gen_params+0x70/0x70 ? load_balance+0x2950/0x2950 ? rcu_trc_cmpxchg_need_qs+0x67/0xa0 hid_test_uclogic_params_cleanup_event_hooks+0x9e/0x1a0 ? uclogic_params_ugee_v2_init_event_hooks+0x600/0x600 ? __switch_to+0x5cf/0xe60 ? migrate_enable+0x260/0x260 ? __kthread_parkme+0x83/0x150 ? kunit_try_run_case_cleanup+0xe0/0xe0 kunit_generic_run_threadfn_adapter+0x4a/0x90 ? kunit_try_catch_throw+0x80/0x80 kthread+0x2b5/0x380 ? kthread_complete_and_exit+0x20/0x20 ret_from_fork+0x2d/0x70 ? kthread_complete_and_exit+0x20/0x20 ret_from_fork_asm+0x11/0x20 </TASK> Modules linked in: Dumping ftrace buffer: (ftrace buffer empty) ---[ end trace 0000000000000000 ]--- RIP: 0010:uclogic_params_ugee_v2_init_event_hooks+0x87/0x600 Code: f3 f3 65 48 8b 14 25 28 00 00 00 48 89 54 24 60 31 d2 48 89 fa c7 44 24 30 00 00 00 00 48 c7 44 24 28 02 f8 02 01 48 c1 ea 03 <80> 3c 02 00 0f 85 2c 04 00 00 48 8b 9d 48 19 00 00 48 b8 00 00 00 RSP: 0000:ffff88810679fc88 EFLAGS: 00010202 RAX: dffffc0000000000 RBX: 0000000000000004 RCX: 0000000000000000 RDX: 0000000000000329 RSI: ffff88810679fd88 RDI: 0000000000001948 RBP: 0000000000000000 R08: 0000000000000000 R09: ffffed1020f639f0 R10: ffff888107b1cf87 R11: 0000000000000400 R12: 1ffff11020cf3f92 R13: ffff88810679fd88 R14: ffff888100b97b08 R15: ffff8881030bb080 FS: 0000000000000000(0000) GS:ffff888119e80000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000000 CR3: 0000000005286001 CR4: 0000000000770ee0 DR0: ffffffff8fdd6cf4 DR1: ffffffff8fdd6cf5 DR2: ffffffff8fdd6cf6 DR3: ffffffff8fdd6cf7 DR6: 00000000fffe0ff0 DR7: 0000000000000600 PKRU: 55555554 Kernel panic - not syncing: Fatal exception Dumping ftrace buffer: (ftrace buffer empty) Kernel Offset: disabled Rebooting in 1 seconds.. Fixes: a251d65 ("HID: uclogic: Handle wireless device reconnection") Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com> Reviewed-by: José Expósito <jose.exposito89@gmail.com> Link: https://lore.kernel.org/r/20231009064245.3573397-2-ruanjinjie@huawei.com Signed-off-by: Benjamin Tissoires <bentiss@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit a84fbf2 ] Generating metrics llc_code_read_mpi_demand_plus_prefetch, llc_data_read_mpi_demand_plus_prefetch, llc_miss_local_memory_bandwidth_read, llc_miss_local_memory_bandwidth_write, nllc_miss_remote_memory_bandwidth_read, memory_bandwidth_read, memory_bandwidth_write, uncore_frequency, upi_data_transmit_bw, C2_Pkg_Residency, C3_Core_Residency, C3_Pkg_Residency, C6_Core_Residency, C6_Pkg_Residency, C7_Core_Residency, C7_Pkg_Residency, UNCORE_FREQ and tma_info_system_socket_clks would trigger an address sanitizer heap-buffer-overflows on a SkylakeX. ``` ==2567752==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x5020003ed098 at pc 0x5621a816654e bp 0x7fffb55d4da0 sp 0x7fffb55d4d98 READ of size 4 at 0x5020003eee78 thread T0 #0 0x558265d6654d in aggr_cpu_id__is_empty tools/perf/util/cpumap.c:694:12 #1 0x558265c914da in perf_stat__get_aggr tools/perf/builtin-stat.c:1490:6 #2 0x558265c914da in perf_stat__get_global_cached tools/perf/builtin-stat.c:1530:9 #3 0x558265e53290 in should_skip_zero_counter tools/perf/util/stat-display.c:947:31 #4 0x558265e53290 in print_counter_aggrdata tools/perf/util/stat-display.c:985:18 #5 0x558265e51931 in print_counter tools/perf/util/stat-display.c:1110:3 #6 0x558265e51931 in evlist__print_counters tools/perf/util/stat-display.c:1571:5 #7 0x558265c8ec87 in print_counters tools/perf/builtin-stat.c:981:2 #8 0x558265c8cc71 in cmd_stat tools/perf/builtin-stat.c:2837:3 #9 0x558265bb9bd4 in run_builtin tools/perf/perf.c:323:11 #10 0x558265bb98eb in handle_internal_command tools/perf/perf.c:377:8 #11 0x558265bb9389 in run_argv tools/perf/perf.c:421:2 #12 0x558265bb9389 in main tools/perf/perf.c:537:3 ``` The issue was the use of testing a cpumap with NULL rather than using empty, as a map containing the dummy value isn't NULL and the -1 results in an empty aggr map being allocated which legitimately overflows when any member is accessed. Fixes: 8a96f45 ("perf stat: Avoid SEGV if core.cpus isn't set") Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20230906003912.3317462-1-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit ede72dc ] Fuzzing found that an invalid tracepoint name would create a memory leak with an address sanitizer build: ``` $ perf stat -e '*:o/' true event syntax error: '*:o/' \___ parser error Run 'perf list' for a list of valid events Usage: perf stat [<options>] [<command>] -e, --event <event> event selector. use 'perf list' to list available events ================================================================= ==59380==ERROR: LeakSanitizer: detected memory leaks Direct leak of 4 byte(s) in 2 object(s) allocated from: #0 0x7f38ac07077b in __interceptor_strdup ../../../../src/libsanitizer/asan/asan_interceptors.cpp:439 #1 0x55f2f41be73b in str util/parse-events.l:49 #2 0x55f2f41d08e8 in parse_events_lex util/parse-events.l:338 #3 0x55f2f41dc3b1 in parse_events_parse util/parse-events-bison.c:1464 #4 0x55f2f410b8b3 in parse_events__scanner util/parse-events.c:1822 #5 0x55f2f410d1b9 in __parse_events util/parse-events.c:2094 #6 0x55f2f410e57f in parse_events_option util/parse-events.c:2279 #7 0x55f2f4427b56 in get_value tools/lib/subcmd/parse-options.c:251 #8 0x55f2f4428d98 in parse_short_opt tools/lib/subcmd/parse-options.c:351 #9 0x55f2f4429d80 in parse_options_step tools/lib/subcmd/parse-options.c:539 #10 0x55f2f442acb9 in parse_options_subcommand tools/lib/subcmd/parse-options.c:654 #11 0x55f2f3ec99fc in cmd_stat tools/perf/builtin-stat.c:2501 #12 0x55f2f4093289 in run_builtin tools/perf/perf.c:322 #13 0x55f2f40937f5 in handle_internal_command tools/perf/perf.c:375 #14 0x55f2f4093bbd in run_argv tools/perf/perf.c:419 #15 0x55f2f409412b in main tools/perf/perf.c:535 SUMMARY: AddressSanitizer: 4 byte(s) leaked in 2 allocation(s). ``` Fix by adding the missing destructor. Fixes: 865582c ("perf tools: Adds the tracepoint name parsing support") Signed-off-by: Ian Rogers <irogers@google.com> Cc: He Kuang <hekuang@huawei.com> Link: https://lore.kernel.org/r/20230914164028.363220-1-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
When GPU reset, we need to mark CRTC and planes for reset if content_type from new and old state differs. Therefore, mark CRTC mode as changed when atomic checking connector states. It mitigates recovering issue when trying to artificially reset GPU with `cat /sys/kernel/debug/dri/0/amdgpu_gpu_recover`: [110.957772] [drm:amdgpu_dm_atomic_commit_tail [amdgpu]] *ERROR* got no status for stream 00000000f8460964 on acrtc000000002c01a447 [110.958270] amdgpu: [PFLIP]:crtc:0, pflip_stat:AMDGPU_FLIP_SUBMITTED [110.958395] ------------[ cut here ]------------ [110.958397] WARNING: CPU: 7 PID: 9 at drivers/gpu/drm/amd/amdgpu/../display/dc/core/dc.c:3126 dc_update_planes_and_stream+0x28a/0x9f0 [amdgpu] [110.958836] Modules linked in: [...] [110.958996] CPU: 7 PID: 9 Comm: kworker/u32:0 Not tainted 6.1.29-valve4-1-neptune-61 #1 822eaa97b1b4805dc2af914eea829bf025fccfc6 [110.959001] Hardware name: Valve Jupiter/Jupiter, BIOS F7A0116 05/12/2023 [110.959004] Workqueue: events_unbound commit_work [110.959012] RIP: 0010:dc_update_planes_and_stream+0x28a/0x9f0 [amdgpu] [110.959438] Code: [...] [110.959441] RSP: 0018:ffffb407800b39d8 EFLAGS: 00010202 [110.959445] RAX: 0000000000000000 RBX: ffff8bcf4cc11800 RCX: 0000000000000000 [110.959447] RDX: 0000000000000000 RSI: ffff8bd017e80000 RDI: ffff8bcf4cc11800 [110.959449] RBP: 0000000000000001 R08: 0000000000000000 R09: ffffb407800b39b8 [110.959451] R10: ffff8bcf4cc11800 R11: ffffffffc0e01a80 R12: ffff8bcf491c4000 [110.959453] R13: ffff8bcf4b0e0000 R14: 0000000000000000 R15: 0000000000000000 [110.959456] FS: 0000000000000000(0000) GS:ffff8bd26edc0000(0000) knlGS:0000000000000000 [110.959458] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [110.959461] CR2: 000017568061a809 CR3: 00000001fb610000 CR4: 0000000000350ee0 [110.959463] Call Trace: [110.959469] <TASK> [110.959476] amdgpu_dm_atomic_commit_tail+0x19f1/0x39e0 [amdgpu d6a1084b950a0fdbd34dc529d5f93aeb468cb916] [110.959918] ? update_sd_lb_stats.constprop.0+0x961/0x980 [110.959927] ? load_balance+0x19d/0xe90 [110.959931] ? update_load_avg+0x7e/0x780 [110.959937] ? sugov_update_single_freq+0x18f/0x250 [110.959943] ? update_load_avg+0x7e/0x780 [110.959948] ? psi_group_change+0x15f/0x3a0 [110.959953] ? psi_task_switch+0xd6/0x230 [110.959958] ? wait_for_completion_timeout+0x13e/0x170 [110.959963] ? __switch_to_asm+0x3e/0x60 [110.959968] ? finish_task_switch.isra.0+0x90/0x2c0 [110.959975] commit_tail+0x94/0x130 [110.959979] process_one_work+0x1c7/0x380 [110.959987] worker_thread+0x51/0x390 [110.959992] ? rescuer_thread+0x3b0/0x3b0 [110.959996] kthread+0xde/0x110 [110.960000] ? kthread_complete_and_exit+0x20/0x20 [110.960004] ret_from_fork+0x22/0x30 [110.960010] </TASK> [110.960012] ---[ end trace 0000000000000000 ]--- [117.904310] tpm tpm0: Operation Timed out [121.024304] amdgpu 0000:04:00.0: [drm] *ERROR* [CRTC:81:crtc-0] flip_done timed out NOTE: content_type support for AMD display is a downstream work yet. Link: https://gitlab.steamos.cloud/jupiter/linux-integration/-/issues/33 Fixes: c57428cc9cca ("drm/amd/display: Hook up content_type property") Reported-by: Guilherme Piccoli <gpiccoli@igalia.com> Signed-off-by: Melissa Wen <mwen@igalia.com>
Commit ddd0a42 only increments scomp_scratch_users when it was 0, causing a panic when using ipcomp: Oops: general protection fault, probably for non-canonical address 0xdffffc0000000000: 0000 [ChimeraOS#1] SMP KASAN NOPTI KASAN: null-ptr-deref in range [0x0000000000000000-0x0000000000000007] CPU: 1 UID: 0 PID: 619 Comm: ping Tainted: G N 6.15.0-rc3-net-00032-ga79be02bba5c torvalds#41 PREEMPT(full) Tainted: [N]=TEST Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Arch Linux 1.16.3-1-1 04/01/2014 RIP: 0010:inflate_fast+0x5a2/0x1b90 [...] Call Trace: <IRQ> zlib_inflate+0x2d60/0x6620 deflate_sdecompress+0x166/0x350 scomp_acomp_comp_decomp+0x45f/0xa10 scomp_acomp_decompress+0x21/0x120 acomp_do_req_chain+0x3e5/0x4e0 ipcomp_input+0x212/0x550 xfrm_input+0x2de2/0x72f0 [...] Kernel panic - not syncing: Fatal exception in interrupt Kernel Offset: disabled ---[ end Kernel panic - not syncing: Fatal exception in interrupt ]--- Instead, let's keep the old increment, and decrement back to 0 if the scratch allocation fails. Fixes: ddd0a42 ("crypto: scompress - Fix scratch allocation failure handling") Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
For some SPI flash memory operations, dummy bytes are not mandatory. For example, in Winbond SPINAND flash memory devices, the `write_cache` and `update_cache` operation variants have zero dummy bytes. Calculating the duration for SPI memory operations with zero dummy bytes causes a divide error when `ncycles` is calculated in the spi_mem_calc_op_duration(). Add changes to skip the 'ncylcles' calculation for zero dummy bytes. Following divide error is fixed by this change: Oops: divide error: 0000 [ChimeraOS#1] PREEMPT SMP NOPTI ... ? do_trap+0xdb/0x100 ? do_error_trap+0x75/0xb0 ? spi_mem_calc_op_duration+0x56/0xb0 ? exc_divide_error+0x3b/0x70 ? spi_mem_calc_op_duration+0x56/0xb0 ? asm_exc_divide_error+0x1b/0x20 ? spi_mem_calc_op_duration+0x56/0xb0 ? spinand_select_op_variant+0xee/0x190 [spinand] spinand_match_and_init+0x13e/0x1a0 [spinand] spinand_manufacturer_match+0x6e/0xa0 [spinand] spinand_probe+0x357/0x7f0 [spinand] ? kernfs_activate+0x87/0xd0 spi_mem_probe+0x7a/0xb0 spi_probe+0x7d/0x130 Fixes: 226d6cb ("spi: spi-mem: Estimate the time taken by operations") Suggested-by: Krishnamoorthi M <krishnamoorthi.m@amd.com> Co-developed-by: Akshata MukundShetty <akshata.mukundshetty@amd.com> Signed-off-by: Akshata MukundShetty <akshata.mukundshetty@amd.com> Signed-off-by: Raju Rangoju <Raju.Rangoju@amd.com> Link: https://patch.msgid.link/20250424121333.417372-1-Raju.Rangoju@amd.com Reviewed-by: Miquel Raynal <miquel.raynal@bootlin.com> Signed-off-by: Mark Brown <broonie@kernel.org>
When memory is insufficient, the allocation of nfs_lock_context in nfs_get_lock_context() fails and returns -ENOMEM. If we mistakenly treat an nfs4_unlockdata structure (whose l_ctx member has been set to -ENOMEM) as valid and proceed to execute rpc_run_task(), this will trigger a NULL pointer dereference in nfs4_locku_prepare. For example: BUG: kernel NULL pointer dereference, address: 000000000000000c PGD 0 P4D 0 Oops: Oops: 0000 [ChimeraOS#1] SMP PTI CPU: 15 UID: 0 PID: 12 Comm: kworker/u64:0 Not tainted 6.15.0-rc2-dirty torvalds#60 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-2.fc40 Workqueue: rpciod rpc_async_schedule RIP: 0010:nfs4_locku_prepare+0x35/0xc2 Code: 89 f2 48 89 fd 48 c7 c7 68 69 ef b5 53 48 8b 8e 90 00 00 00 48 89 f3 RSP: 0018:ffffbbafc006bdb8 EFLAGS: 00010246 RAX: 000000000000004b RBX: ffff9b964fc1fa00 RCX: 0000000000000000 RDX: 0000000000000000 RSI: fffffffffffffff4 RDI: ffff9ba53fddbf40 RBP: ffff9ba539934000 R08: 0000000000000000 R09: ffffbbafc006bc38 R10: ffffffffb6b689c8 R11: 0000000000000003 R12: ffff9ba539934030 R13: 0000000000000001 R14: 0000000004248060 R15: ffffffffb56d1c30 FS: 0000000000000000(0000) GS:ffff9ba5881f0000(0000) knlGS:00000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 000000000000000c CR3: 000000093f244000 CR4: 00000000000006f0 Call Trace: <TASK> __rpc_execute+0xbc/0x480 rpc_async_schedule+0x2f/0x40 process_one_work+0x232/0x5d0 worker_thread+0x1da/0x3d0 ? __pfx_worker_thread+0x10/0x10 kthread+0x10d/0x240 ? __pfx_kthread+0x10/0x10 ret_from_fork+0x34/0x50 ? __pfx_kthread+0x10/0x10 ret_from_fork_asm+0x1a/0x30 </TASK> Modules linked in: CR2: 000000000000000c ---[ end trace 0000000000000000 ]--- Free the allocated nfs4_unlockdata when nfs_get_lock_context() fails and return NULL to terminate subsequent rpc_run_task, preventing NULL pointer dereference. Fixes: f30cb75 ("NFS: Always wait for I/O completion before unlock") Signed-off-by: Li Lingfeng <lilingfeng3@huawei.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Link: https://lore.kernel.org/r/20250417072508.3850532-1-lilingfeng3@huawei.com Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
queue->state_change is set as part of nvmet_tcp_set_queue_sock(), but if the TCP connection isn't established when nvmet_tcp_set_queue_sock() is called then queue->state_change isn't set and sock->sk->sk_state_change isn't replaced. As such we don't need to restore sock->sk->sk_state_change if queue->state_change is NULL. This avoids NULL pointer dereferences such as this: [ 286.462026][ C0] BUG: kernel NULL pointer dereference, address: 0000000000000000 [ 286.462814][ C0] #PF: supervisor instruction fetch in kernel mode [ 286.463796][ C0] #PF: error_code(0x0010) - not-present page [ 286.464392][ C0] PGD 8000000140620067 P4D 8000000140620067 PUD 114201067 PMD 0 [ 286.465086][ C0] Oops: Oops: 0010 [ChimeraOS#1] SMP KASAN PTI [ 286.465559][ C0] CPU: 0 UID: 0 PID: 1628 Comm: nvme Not tainted 6.15.0-rc2+ ChimeraOS#11 PREEMPT(voluntary) [ 286.466393][ C0] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-3.fc41 04/01/2014 [ 286.467147][ C0] RIP: 0010:0x0 [ 286.467420][ C0] Code: Unable to access opcode bytes at 0xffffffffffffffd6. [ 286.467977][ C0] RSP: 0018:ffff8883ae008580 EFLAGS: 00010246 [ 286.468425][ C0] RAX: 0000000000000000 RBX: ffff88813fd34100 RCX: ffffffffa386cc43 [ 286.469019][ C0] RDX: 1ffff11027fa68b6 RSI: 0000000000000008 RDI: ffff88813fd34100 [ 286.469545][ C0] RBP: ffff88813fd34160 R08: 0000000000000000 R09: ffffed1027fa682c [ 286.470072][ C0] R10: ffff88813fd34167 R11: 0000000000000000 R12: ffff88813fd344c3 [ 286.470585][ C0] R13: ffff88813fd34112 R14: ffff88813fd34aec R15: ffff888132cdd268 [ 286.471070][ C0] FS: 00007fe3c04c7d80(0000) GS:ffff88840743f000(0000) knlGS:0000000000000000 [ 286.471644][ C0] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 286.472543][ C0] CR2: ffffffffffffffd6 CR3: 000000012daca000 CR4: 00000000000006f0 [ 286.473500][ C0] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 286.474467][ C0] DR3: 0000000000000000 DR6: 00000000ffff07f0 DR7: 0000000000000400 [ 286.475453][ C0] Call Trace: [ 286.476102][ C0] <IRQ> [ 286.476719][ C0] tcp_fin+0x2bb/0x440 [ 286.477429][ C0] tcp_data_queue+0x190f/0x4e60 [ 286.478174][ C0] ? __build_skb_around+0x234/0x330 [ 286.478940][ C0] ? rcu_is_watching+0x11/0xb0 [ 286.479659][ C0] ? __pfx_tcp_data_queue+0x10/0x10 [ 286.480431][ C0] ? tcp_try_undo_loss+0x640/0x6c0 [ 286.481196][ C0] ? seqcount_lockdep_reader_access.constprop.0+0x82/0x90 [ 286.482046][ C0] ? kvm_clock_get_cycles+0x14/0x30 [ 286.482769][ C0] ? ktime_get+0x66/0x150 [ 286.483433][ C0] ? rcu_is_watching+0x11/0xb0 [ 286.484146][ C0] tcp_rcv_established+0x6e4/0x2050 [ 286.484857][ C0] ? rcu_is_watching+0x11/0xb0 [ 286.485523][ C0] ? ipv4_dst_check+0x160/0x2b0 [ 286.486203][ C0] ? __pfx_tcp_rcv_established+0x10/0x10 [ 286.486917][ C0] ? lock_release+0x217/0x2c0 [ 286.487595][ C0] tcp_v4_do_rcv+0x4d6/0x9b0 [ 286.488279][ C0] tcp_v4_rcv+0x2af8/0x3e30 [ 286.488904][ C0] ? raw_local_deliver+0x51b/0xad0 [ 286.489551][ C0] ? rcu_is_watching+0x11/0xb0 [ 286.490198][ C0] ? __pfx_tcp_v4_rcv+0x10/0x10 [ 286.490813][ C0] ? __pfx_raw_local_deliver+0x10/0x10 [ 286.491487][ C0] ? __pfx_nf_confirm+0x10/0x10 [nf_conntrack] [ 286.492275][ C0] ? rcu_is_watching+0x11/0xb0 [ 286.492900][ C0] ip_protocol_deliver_rcu+0x8f/0x370 [ 286.493579][ C0] ip_local_deliver_finish+0x297/0x420 [ 286.494268][ C0] ip_local_deliver+0x168/0x430 [ 286.494867][ C0] ? __pfx_ip_local_deliver+0x10/0x10 [ 286.495498][ C0] ? __pfx_ip_local_deliver_finish+0x10/0x10 [ 286.496204][ C0] ? ip_rcv_finish_core+0x19a/0x1f20 [ 286.496806][ C0] ? lock_release+0x217/0x2c0 [ 286.497414][ C0] ip_rcv+0x455/0x6e0 [ 286.497945][ C0] ? __pfx_ip_rcv+0x10/0x10 [ 286.498550][ C0] ? rcu_is_watching+0x11/0xb0 [ 286.499137][ C0] ? __pfx_ip_rcv_finish+0x10/0x10 [ 286.499763][ C0] ? lock_release+0x217/0x2c0 [ 286.500327][ C0] ? dl_scaled_delta_exec+0xd1/0x2c0 [ 286.500922][ C0] ? __pfx_ip_rcv+0x10/0x10 [ 286.501480][ C0] __netif_receive_skb_one_core+0x166/0x1b0 [ 286.502173][ C0] ? __pfx___netif_receive_skb_one_core+0x10/0x10 [ 286.502903][ C0] ? lock_acquire+0x2b2/0x310 [ 286.503487][ C0] ? process_backlog+0x372/0x1350 [ 286.504087][ C0] ? lock_release+0x217/0x2c0 [ 286.504642][ C0] process_backlog+0x3b9/0x1350 [ 286.505214][ C0] ? process_backlog+0x372/0x1350 [ 286.505779][ C0] __napi_poll.constprop.0+0xa6/0x490 [ 286.506363][ C0] net_rx_action+0x92e/0xe10 [ 286.506889][ C0] ? __pfx_net_rx_action+0x10/0x10 [ 286.507437][ C0] ? timerqueue_add+0x1f0/0x320 [ 286.507977][ C0] ? sched_clock_cpu+0x68/0x540 [ 286.508492][ C0] ? lock_acquire+0x2b2/0x310 [ 286.509043][ C0] ? kvm_sched_clock_read+0xd/0x20 [ 286.509607][ C0] ? handle_softirqs+0x1aa/0x7d0 [ 286.510187][ C0] handle_softirqs+0x1f2/0x7d0 [ 286.510754][ C0] ? __pfx_handle_softirqs+0x10/0x10 [ 286.511348][ C0] ? irqtime_account_irq+0x181/0x290 [ 286.511937][ C0] ? __dev_queue_xmit+0x85d/0x3450 [ 286.512510][ C0] do_softirq.part.0+0x89/0xc0 [ 286.513100][ C0] </IRQ> [ 286.513548][ C0] <TASK> [ 286.513953][ C0] __local_bh_enable_ip+0x112/0x140 [ 286.514522][ C0] ? __dev_queue_xmit+0x85d/0x3450 [ 286.515072][ C0] __dev_queue_xmit+0x872/0x3450 [ 286.515619][ C0] ? nft_do_chain+0xe16/0x15b0 [nf_tables] [ 286.516252][ C0] ? __pfx___dev_queue_xmit+0x10/0x10 [ 286.516817][ C0] ? selinux_ip_postroute+0x43c/0xc50 [ 286.517433][ C0] ? __pfx_selinux_ip_postroute+0x10/0x10 [ 286.518061][ C0] ? rcu_is_watching+0x11/0xb0 [ 286.518606][ C0] ? ip_output+0x164/0x4a0 [ 286.519149][ C0] ? rcu_is_watching+0x11/0xb0 [ 286.519671][ C0] ? ip_finish_output2+0x17d5/0x1fb0 [ 286.520258][ C0] ip_finish_output2+0xb4b/0x1fb0 [ 286.520787][ C0] ? __pfx_ip_finish_output2+0x10/0x10 [ 286.521355][ C0] ? __ip_finish_output+0x15d/0x750 [ 286.521890][ C0] ip_output+0x164/0x4a0 [ 286.522372][ C0] ? __pfx_ip_output+0x10/0x10 [ 286.522872][ C0] ? rcu_is_watching+0x11/0xb0 [ 286.523402][ C0] ? _raw_spin_unlock_irqrestore+0x4c/0x60 [ 286.524031][ C0] ? __pfx_ip_finish_output+0x10/0x10 [ 286.524605][ C0] ? __ip_queue_xmit+0x999/0x2260 [ 286.525200][ C0] ? rcu_is_watching+0x11/0xb0 [ 286.525744][ C0] ? ipv4_dst_check+0x16a/0x2b0 [ 286.526279][ C0] ? lock_release+0x217/0x2c0 [ 286.526793][ C0] __ip_queue_xmit+0x1883/0x2260 [ 286.527324][ C0] ? __skb_clone+0x54c/0x730 [ 286.527827][ C0] __tcp_transmit_skb+0x209b/0x37a0 [ 286.528374][ C0] ? __pfx___tcp_transmit_skb+0x10/0x10 [ 286.528952][ C0] ? rcu_is_watching+0x11/0xb0 [ 286.529472][ C0] ? seqcount_lockdep_reader_access.constprop.0+0x82/0x90 [ 286.530152][ C0] ? trace_hardirqs_on+0x12/0x120 [ 286.530691][ C0] tcp_write_xmit+0xb81/0x88b0 [ 286.531224][ C0] ? mod_memcg_state+0x4d/0x60 [ 286.531736][ C0] ? rcu_is_watching+0x11/0xb0 [ 286.532253][ C0] __tcp_push_pending_frames+0x90/0x320 [ 286.532826][ C0] tcp_send_fin+0x141/0xb50 [ 286.533352][ C0] ? __pfx_tcp_send_fin+0x10/0x10 [ 286.533908][ C0] ? __local_bh_enable_ip+0xab/0x140 [ 286.534495][ C0] inet_shutdown+0x243/0x320 [ 286.535077][ C0] nvme_tcp_alloc_queue+0xb3b/0x2590 [nvme_tcp] [ 286.535709][ C0] ? do_raw_spin_lock+0x129/0x260 [ 286.536314][ C0] ? __pfx_nvme_tcp_alloc_queue+0x10/0x10 [nvme_tcp] [ 286.536996][ C0] ? do_raw_spin_unlock+0x54/0x1e0 [ 286.537550][ C0] ? _raw_spin_unlock+0x29/0x50 [ 286.538127][ C0] ? do_raw_spin_lock+0x129/0x260 [ 286.538664][ C0] ? __pfx_do_raw_spin_lock+0x10/0x10 [ 286.539249][ C0] ? nvme_tcp_alloc_admin_queue+0xd5/0x340 [nvme_tcp] [ 286.539892][ C0] ? __wake_up+0x40/0x60 [ 286.540392][ C0] nvme_tcp_alloc_admin_queue+0xd5/0x340 [nvme_tcp] [ 286.541047][ C0] ? rcu_is_watching+0x11/0xb0 [ 286.541589][ C0] nvme_tcp_setup_ctrl+0x8b/0x7a0 [nvme_tcp] [ 286.542254][ C0] ? _raw_spin_unlock_irqrestore+0x4c/0x60 [ 286.542887][ C0] ? __pfx_nvme_tcp_setup_ctrl+0x10/0x10 [nvme_tcp] [ 286.543568][ C0] ? trace_hardirqs_on+0x12/0x120 [ 286.544166][ C0] ? _raw_spin_unlock_irqrestore+0x35/0x60 [ 286.544792][ C0] ? nvme_change_ctrl_state+0x196/0x2e0 [nvme_core] [ 286.545477][ C0] nvme_tcp_create_ctrl+0x839/0xb90 [nvme_tcp] [ 286.546126][ C0] nvmf_dev_write+0x3db/0x7e0 [nvme_fabrics] [ 286.546775][ C0] ? rw_verify_area+0x69/0x520 [ 286.547334][ C0] vfs_write+0x218/0xe90 [ 286.547854][ C0] ? do_syscall_64+0x9f/0x190 [ 286.548408][ C0] ? trace_hardirqs_on_prepare+0xdb/0x120 [ 286.549037][ C0] ? syscall_exit_to_user_mode+0x93/0x280 [ 286.549659][ C0] ? __pfx_vfs_write+0x10/0x10 [ 286.550259][ C0] ? do_syscall_64+0x9f/0x190 [ 286.550840][ C0] ? syscall_exit_to_user_mode+0x8e/0x280 [ 286.551516][ C0] ? trace_hardirqs_on_prepare+0xdb/0x120 [ 286.552180][ C0] ? syscall_exit_to_user_mode+0x93/0x280 [ 286.552834][ C0] ? ksys_read+0xf5/0x1c0 [ 286.553386][ C0] ? __pfx_ksys_read+0x10/0x10 [ 286.553964][ C0] ksys_write+0xf5/0x1c0 [ 286.554499][ C0] ? __pfx_ksys_write+0x10/0x10 [ 286.555072][ C0] ? trace_hardirqs_on_prepare+0xdb/0x120 [ 286.555698][ C0] ? syscall_exit_to_user_mode+0x93/0x280 [ 286.556319][ C0] ? do_syscall_64+0x54/0x190 [ 286.556866][ C0] do_syscall_64+0x93/0x190 [ 286.557420][ C0] ? rcu_read_unlock+0x17/0x60 [ 286.557986][ C0] ? rcu_is_watching+0x11/0xb0 [ 286.558526][ C0] ? lock_release+0x217/0x2c0 [ 286.559087][ C0] ? rcu_is_watching+0x11/0xb0 [ 286.559659][ C0] ? count_memcg_events.constprop.0+0x4a/0x60 [ 286.560476][ C0] ? exc_page_fault+0x7a/0x110 [ 286.561064][ C0] ? rcu_is_watching+0x11/0xb0 [ 286.561647][ C0] ? lock_release+0x217/0x2c0 [ 286.562257][ C0] ? do_user_addr_fault+0x171/0xa00 [ 286.562839][ C0] ? do_user_addr_fault+0x4a2/0xa00 [ 286.563453][ C0] ? irqentry_exit_to_user_mode+0x84/0x270 [ 286.564112][ C0] ? rcu_is_watching+0x11/0xb0 [ 286.564677][ C0] ? irqentry_exit_to_user_mode+0x84/0x270 [ 286.565317][ C0] ? trace_hardirqs_on_prepare+0xdb/0x120 [ 286.565922][ C0] entry_SYSCALL_64_after_hwframe+0x76/0x7e [ 286.566542][ C0] RIP: 0033:0x7fe3c05e6504 [ 286.567102][ C0] Code: c7 00 16 00 00 00 b8 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 f3 0f 1e fa 80 3d c5 8b 10 00 00 74 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 55 48 89 e5 48 83 ec 20 48 89 [ 286.568931][ C0] RSP: 002b:00007fff76444f58 EFLAGS: 00000202 ORIG_RAX: 0000000000000001 [ 286.569807][ C0] RAX: ffffffffffffffda RBX: 000000003b40d930 RCX: 00007fe3c05e6504 [ 286.570621][ C0] RDX: 00000000000000cf RSI: 000000003b40d930 RDI: 0000000000000003 [ 286.571443][ C0] RBP: 0000000000000003 R08: 00000000000000cf R09: 000000003b40d930 [ 286.572246][ C0] R10: 0000000000000000 R11: 0000000000000202 R12: 000000003b40cd60 [ 286.573069][ C0] R13: 00000000000000cf R14: 00007fe3c07417f8 R15: 00007fe3c073502e [ 286.573886][ C0] </TASK> Closes: https://lore.kernel.org/linux-nvme/5hdonndzoqa265oq3bj6iarwtfk5dewxxjtbjvn5uqnwclpwt6@a2n6w3taxxex/ Signed-off-by: Alistair Francis <alistair.francis@wdc.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Tested-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
Use a separate lock in the polling function eu_stall_data_buf_poll() instead of eu_stall->stream_lock. This would prevent a possible circular locking dependency leading to a deadlock as described below. This would also require additional locking with the new lock in the read function. <4> [787.192986] ====================================================== <4> [787.192988] WARNING: possible circular locking dependency detected <4> [787.192991] 6.14.0-rc7-xe+ ChimeraOS#1 Tainted: G U <4> [787.192993] ------------------------------------------------------ <4> [787.192994] xe_eu_stall/20093 is trying to acquire lock: <4> [787.192996] ffff88819847e2c0 ((work_completion) (&(&stream->buf_poll_work)->work)), at: __flush_work+0x1f8/0x5e0 <4> [787.193005] but task is already holding lock: <4> [787.193007] ffff88814ce83ba8 (>->eu_stall->stream_lock){3:3}, at: xe_eu_stall_stream_ioctl+0x41/0x6a0 [xe] <4> [787.193090] which lock already depends on the new lock. <4> [787.193093] the existing dependency chain (in reverse order) is: <4> [787.193095] -> ChimeraOS#1 (>->eu_stall->stream_lock){+.+.}-{3:3}: <4> [787.193099] __mutex_lock+0xb4/0xe40 <4> [787.193104] mutex_lock_nested+0x1b/0x30 <4> [787.193106] eu_stall_data_buf_poll_work_fn+0x44/0x1d0 [xe] <4> [787.193155] process_one_work+0x21c/0x740 <4> [787.193159] worker_thread+0x1db/0x3c0 <4> [787.193161] kthread+0x10d/0x270 <4> [787.193164] ret_from_fork+0x44/0x70 <4> [787.193168] ret_from_fork_asm+0x1a/0x30 <4> [787.193172] -> #0 ((work_completion)(&(&stream->buf_poll_work)->work)){+.+.}-{0:0}: <4> [787.193176] __lock_acquire+0x1637/0x2810 <4> [787.193180] lock_acquire+0xc9/0x300 <4> [787.193183] __flush_work+0x219/0x5e0 <4> [787.193186] cancel_delayed_work_sync+0x87/0x90 <4> [787.193189] xe_eu_stall_disable_locked+0x9a/0x260 [xe] <4> [787.193237] xe_eu_stall_stream_ioctl+0x5b/0x6a0 [xe] <4> [787.193285] __x64_sys_ioctl+0xa4/0xe0 <4> [787.193289] x64_sys_call+0x131e/0x2650 <4> [787.193292] do_syscall_64+0x91/0x180 <4> [787.193295] entry_SYSCALL_64_after_hwframe+0x76/0x7e <4> [787.193299] other info that might help us debug this: <4> [787.193302] Possible unsafe locking scenario: <4> [787.193304] CPU0 CPU1 <4> [787.193305] ---- ---- <4> [787.193306] lock(>->eu_stall->stream_lock); <4> [787.193308] lock((work_completion) (&(&stream->buf_poll_work)->work)); <4> [787.193311] lock(>->eu_stall->stream_lock); <4> [787.193313] lock((work_completion) (&(&stream->buf_poll_work)->work)); <4> [787.193315] *** DEADLOCK *** Fixes: 760edec ("drm/xe/eustall: Add support to read() and poll() EU stall data") Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/4598 Signed-off-by: Harish Chegondi <harish.chegondi@intel.com> Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com> Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com> Link: https://lore.kernel.org/r/c896932fca84f79db2df5942911997ed77b2b9b6.1744934656.git.harish.chegondi@intel.com (cherry picked from commit c2b1f1b) Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Commit a595138 ("arm64: errata: Add newer ARM cores to the spectre_bhb_loop_affected() lists") added some additional CPUs to the Spectre-BHB workaround, including some new arrays for designs that require new 'k' values for the workaround to be effective. Unfortunately, the new arrays omitted the sentinel entry and so is_midr_in_range_list() will walk off the end when it doesn't find a match. With UBSAN enabled, this leads to a crash during boot when is_midr_in_range_list() is inlined (which was more common prior to c8c2647 ("arm64: Make _midr_in_range_list() an exported function")): | Internal error: aarch64 BRK: 00000000f2000001 [ChimeraOS#1] PREEMPT SMP | pstate: 804000c5 (Nzcv daIF +PAN -UAO -TCO -DIT -SSBS BTYPE=--) | pc : spectre_bhb_loop_affected+0x28/0x30 | lr : is_spectre_bhb_affected+0x170/0x190 | [...] | Call trace: | spectre_bhb_loop_affected+0x28/0x30 | update_cpu_capabilities+0xc0/0x184 | init_cpu_features+0x188/0x1a4 | cpuinfo_store_boot_cpu+0x4c/0x60 | smp_prepare_boot_cpu+0x38/0x54 | start_kernel+0x8c/0x478 | __primary_switched+0xc8/0xd4 | Code: 6b09011f 54000061 52801080 d65f03c0 (d4200020) | ---[ end trace 0000000000000000 ]--- | Kernel panic - not syncing: aarch64 BRK: Fatal exception Add the missing sentinel entries. Cc: Lee Jones <lee@kernel.org> Cc: James Morse <james.morse@arm.com> Cc: Doug Anderson <dianders@chromium.org> Cc: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com> Cc: <stable@vger.kernel.org> Reported-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Fixes: a595138 ("arm64: errata: Add newer ARM cores to the spectre_bhb_loop_affected() lists") Signed-off-by: Will Deacon <will@kernel.org> Reviewed-by: Lee Jones <lee@kernel.org> Reviewed-by: Douglas Anderson <dianders@chromium.org> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Link: https://lore.kernel.org/r/20250501104747.28431-1-will@kernel.org Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
btrfs_prelim_ref() calls the old and new reference variables in the incorrect order. This causes a NULL pointer dereference because oldref is passed as NULL to trace_btrfs_prelim_ref_insert(). Note, trace_btrfs_prelim_ref_insert() is being called with newref as oldref (and oldref as NULL) on purpose in order to print out the values of newref. To reproduce: echo 1 > /sys/kernel/debug/tracing/events/btrfs/btrfs_prelim_ref_insert/enable Perform some writeback operations. Backtrace: BUG: kernel NULL pointer dereference, address: 0000000000000018 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page PGD 115949067 P4D 115949067 PUD 11594a067 PMD 0 Oops: Oops: 0000 [ChimeraOS#1] SMP NOPTI CPU: 1 UID: 0 PID: 1188 Comm: fsstress Not tainted 6.15.0-rc2-tester+ torvalds#47 PREEMPT(voluntary) 7ca2cef72d5e9c600f0c7718adb6462de8149622 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.3-2-gc13ff2cd-prebuilt.qemu.org 04/01/2014 RIP: 0010:trace_event_raw_event_btrfs__prelim_ref+0x72/0x130 Code: e8 43 81 9f ff 48 85 c0 74 78 4d 85 e4 0f 84 8f 00 00 00 49 8b 94 24 c0 06 00 00 48 8b 0a 48 89 48 08 48 8b 52 08 48 89 50 10 <49> 8b 55 18 48 89 50 18 49 8b 55 20 48 89 50 20 41 0f b6 55 28 88 RSP: 0018:ffffce44820077a0 EFLAGS: 00010286 RAX: ffff8c6b403f9014 RBX: ffff8c6b55825730 RCX: 304994edf9cf506b RDX: d8b11eb7f0fdb699 RSI: ffff8c6b403f9010 RDI: ffff8c6b403f9010 RBP: 0000000000000001 R08: 0000000000000001 R09: 0000000000000010 R10: 00000000ffffffff R11: 0000000000000000 R12: ffff8c6b4e8fb000 R13: 0000000000000000 R14: ffffce44820077a8 R15: ffff8c6b4abd1540 FS: 00007f4dc6813740(0000) GS:ffff8c6c1d378000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000018 CR3: 000000010eb42000 CR4: 0000000000750ef0 PKRU: 55555554 Call Trace: <TASK> prelim_ref_insert+0x1c1/0x270 find_parent_nodes+0x12a6/0x1ee0 ? __entry_text_end+0x101f06/0x101f09 ? srso_alias_return_thunk+0x5/0xfbef5 ? srso_alias_return_thunk+0x5/0xfbef5 ? srso_alias_return_thunk+0x5/0xfbef5 ? srso_alias_return_thunk+0x5/0xfbef5 btrfs_is_data_extent_shared+0x167/0x640 ? fiemap_process_hole+0xd0/0x2c0 extent_fiemap+0xa5c/0xbc0 ? __entry_text_end+0x101f05/0x101f09 btrfs_fiemap+0x7e/0xd0 do_vfs_ioctl+0x425/0x9d0 __x64_sys_ioctl+0x75/0xc0 Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
The blammed commit copied to argv the size of the reallocated argv, instead of the size of the old_argv, thus reading and copying from past the old_argv allocated memory. Following BUG_ON was hit: [ 3.038929][ T1] kernel BUG at lib/string_helpers.c:1040! [ 3.039147][ T1] Internal error: Oops - BUG: 00000000f2000800 [ChimeraOS#1] SMP ... [ 3.056489][ T1] Call trace: [ 3.056591][ T1] __fortify_panic+0x10/0x18 (P) [ 3.056773][ T1] dm_split_args+0x20c/0x210 [ 3.056942][ T1] dm_table_add_target+0x13c/0x360 [ 3.057132][ T1] table_load+0x110/0x3ac [ 3.057292][ T1] dm_ctl_ioctl+0x424/0x56c [ 3.057457][ T1] __arm64_sys_ioctl+0xa8/0xec [ 3.057634][ T1] invoke_syscall+0x58/0x10c [ 3.057804][ T1] el0_svc_common+0xa8/0xdc [ 3.057970][ T1] do_el0_svc+0x1c/0x28 [ 3.058123][ T1] el0_svc+0x50/0xac [ 3.058266][ T1] el0t_64_sync_handler+0x60/0xc4 [ 3.058452][ T1] el0t_64_sync+0x1b0/0x1b4 [ 3.058620][ T1] Code: f800865e a9bf7bfd 910003fd 941f48aa (d4210000) [ 3.058897][ T1] ---[ end trace 0000000000000000 ]--- [ 3.059083][ T1] Kernel panic - not syncing: Oops - BUG: Fatal exception Fix it by copying the size of src, and not the size of dst, as it was. Fixes: 5a2a6c4 ("dm: always update the array size in realloc_argv on success") Cc: stable@vger.kernel.org Signed-off-by: Tudor Ambarus <tudor.ambarus@linaro.org> Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Calling core::fmt::write() from rust code while FineIBT is enabled results in a kernel panic: [ 4614.199779] kernel BUG at arch/x86/kernel/cet.c:132! [ 4614.205343] Oops: invalid opcode: 0000 [ChimeraOS#1] PREEMPT SMP NOPTI [ 4614.211781] CPU: 2 UID: 0 PID: 6057 Comm: dmabuf_dump Tainted: G U O 6.12.17-android16-0-g6ab38c534a43 ChimeraOS#1 9da040f27673ec3945e23b998a0f8bd64c846599 [ 4614.227832] Tainted: [U]=USER, [O]=OOT_MODULE [ 4614.241247] RIP: 0010:do_kernel_cp_fault+0xea/0xf0 ... [ 4614.398144] RIP: 0010:_RNvXs5_NtNtNtCs3o2tGsuHyou_4core3fmt3num3impyNtB9_7Display3fmt+0x0/0x20 [ 4614.407792] Code: 48 f7 df 48 0f 48 f9 48 89 f2 89 c6 5d e9 18 fd ff ff 0f 1f 84 00 00 00 00 00 f3 0f 1e fa 41 81 ea 14 61 af 2c 74 03 0f 0b 90 <66> 0f 1f 00 55 48 89 e5 48 89 f2 48 8b 3f be 01 00 00 00 5d e9 e7 [ 4614.428775] RSP: 0018:ffffb95acfa4ba68 EFLAGS: 00010246 [ 4614.434609] RAX: 0000000000000000 RBX: 0000000000000010 RCX: 0000000000000000 [ 4614.442587] RDX: 0000000000000007 RSI: ffffb95acfa4ba70 RDI: ffffb95acfa4bc88 [ 4614.450557] RBP: ffffb95acfa4bae0 R08: ffff0a00ffffff05 R09: 0000000000000070 [ 4614.458527] R10: 0000000000000000 R11: ffffffffab67eaf0 R12: ffffb95acfa4bcc8 [ 4614.466493] R13: ffffffffac5d50f0 R14: 0000000000000000 R15: 0000000000000000 [ 4614.474473] ? __cfi__RNvXs5_NtNtNtCs3o2tGsuHyou_4core3fmt3num3impyNtB9_7Display3fmt+0x10/0x10 [ 4614.484118] ? _RNvNtCs3o2tGsuHyou_4core3fmt5write+0x1d2/0x250 This happens because core::fmt::write() calls core::fmt::rt::Argument::fmt(), which currently has CFI disabled: library/core/src/fmt/rt.rs: 171 // FIXME: Transmuting formatter in new and indirectly branching to/calling 172 // it here is an explicit CFI violation. 173 #[allow(inline_no_sanitize)] 174 #[no_sanitize(cfi, kcfi)] 175 #[inline] 176 pub(super) unsafe fn fmt(&self, f: &mut Formatter<'_>) -> Result { This causes a Control Protection exception, because FineIBT has sealed off the original function's endbr64. This makes rust currently incompatible with FineIBT. Add a Kconfig dependency that prevents FineIBT from getting turned on by default if rust is enabled. [ Rust 1.88.0 (scheduled for 2025-06-26) should have this fixed [1], and thus we relaxed the condition with Rust >= 1.88. When `objtool` lands checking for this with e.g. [2], the plan is to ideally run that in upstream Rust's CI to prevent regressions early [3], since we do not control `core`'s source code. Alice tested the Rust PR backported to an older compiler. Peter would like that Rust provides a stable `core` which can be pulled into the kernel: "Relying on that much out of tree code is 'unfortunate'". - Miguel ] Signed-off-by: Paweł Anikiel <panikiel@google.com> Reviewed-by: Alice Ryhl <aliceryhl@google.com> Acked-by: Peter Zijlstra <peterz@infradead.org> Link: rust-lang/rust#139632 [1] Link: https://lore.kernel.org/rust-for-linux/20250410154556.GB9003@noisy.programming.kicks-ass.net/ [2] Link: rust-lang/rust#139632 (comment) [3] Link: https://lore.kernel.org/r/20250410115420.366349-1-panikiel@google.com Link: https://lore.kernel.org/r/att0-CANiq72kjDM0cKALVy4POEzhfdT4nO7tqz0Pm7xM+3=_0+L1t=A@mail.gmail.com [ Reduced splat. - Miguel ] Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
…cess Add __netif_tx_lock() to ensure that only one packet is being transmitted at a time to avoid race conditions in the netif_txq struct and prevent packet data corruption. Failing to do so causes kernel panic with the following error: [ 2184.746764] ------------[ cut here ]------------ [ 2184.751412] kernel BUG at lib/dynamic_queue_limits.c:99! [ 2184.756728] Internal error: Oops - BUG: 00000000f2000800 [ChimeraOS#1] PREEMPT SMP logs: https://gist.github.com/MeghanaMalladiTI/9c7aa5fc3b7fb03f87c74aad487956e9 The lock is acquired before calling emac_xmit_xdp_frame() and released after the call returns. This ensures that the TX queue is protected from concurrent access during the transmission of XDP frames. Fixes: 62aa324 ("net: ti: icssg-prueth: Add XDP support") Signed-off-by: Meghana Malladi <m-malladi@ti.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://patch.msgid.link/20250506110546.4065715-3-m-malladi@ti.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
When userspace does PR_SET_TAGGED_ADDR_CTRL, but Supm extension is not available, the kernel crashes: Oops - illegal instruction [ChimeraOS#1] [snip] epc : set_tagged_addr_ctrl+0x112/0x15a ra : set_tagged_addr_ctrl+0x74/0x15a epc : ffffffff80011ace ra : ffffffff80011a30 sp : ffffffc60039be10 [snip] status: 0000000200000120 badaddr: 0000000010a79073 cause: 0000000000000002 set_tagged_addr_ctrl+0x112/0x15a __riscv_sys_prctl+0x352/0x73c do_trap_ecall_u+0x17c/0x20c andle_exception+0x150/0x15c Fix it by checking if Supm is available. Fixes: 09d6775 ("riscv: Add support for userspace pointer masking") Signed-off-by: Nam Cao <namcao@linutronix.de> Cc: stable@vger.kernel.org Reviewed-by: Samuel Holland <samuel.holland@sifive.com> Link: https://lore.kernel.org/r/20250504101920.3393053-1-namcao@linutronix.de Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
When CONFIG_PROVE_RCU_LIST is enabled, fprobe triggers the following warning: WARNING: suspicious RCU usage kernel/trace/fprobe.c:457 RCU-list traversed in non-reader section!! other info that might help us debug this: ChimeraOS#1: ffffffff863c4e08 (fprobe_mutex){+.+.}-{4:4}, at: fprobe_module_callback+0x7b/0x8c0 Call Trace: fprobe_module_callback notifier_call_chain blocking_notifier_call_chain This warning occurs because fprobe_remove_node_in_module() traverses an RCU list using RCU primitives without holding an RCU read lock. However, the function is only called from fprobe_module_callback(), which holds the fprobe_mutex lock that provides sufficient protection for safely traversing the list. Fix the warning by specifying the locking design to the CONFIG_PROVE_RCU_LIST mechanism. Add the lockdep_is_held() argument to hlist_for_each_entry_rcu() to inform the RCU checker that fprobe_mutex provides the required protection. Link: https://lore.kernel.org/all/20250410-fprobe-v1-1-068ef5f41436@debian.org/ Fixes: a3dc298 ("tracing: fprobe: Cleanup fprobe hash when module unloading") Signed-off-by: Breno Leitao <leitao@debian.org> Tested-by: Antonio Quartulli <antonio@mandelbit.com> Tested-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
into HEAD KVM/riscv fixes for 6.15, take ChimeraOS#1 - Add missing reset of smstateen CSRs
If the discard worker is running and there's currently only one block group, that block group is a data block group, it's in the unused block groups discard list and is being used (it got an extent allocated from it after becoming unused), the worker can end up in an infinite loop if a transaction abort happens or the async discard is disabled (during remount or unmount for example). This happens like this: 1) Task A, the discard worker, is at peek_discard_list() and find_next_block_group() returns block group X; 2) Block group X is in the unused block groups discard list (its discard index is BTRFS_DISCARD_INDEX_UNUSED) since at some point in the past it become an unused block group and was added to that list, but then later it got an extent allocated from it, so its ->used counter is not zero anymore; 3) The current transaction is aborted by task B and we end up at __btrfs_handle_fs_error() in the transaction abort path, where we call btrfs_discard_stop(), which clears BTRFS_FS_DISCARD_RUNNING from fs_info, and then at __btrfs_handle_fs_error() we set the fs to RO mode (setting SB_RDONLY in the super block's s_flags field); 4) Task A calls __add_to_discard_list() with the goal of moving the block group from the unused block groups discard list into another discard list, but at __add_to_discard_list() we end up doing nothing because btrfs_run_discard_work() returns false, since the super block has SB_RDONLY set in its flags and BTRFS_FS_DISCARD_RUNNING is not set anymore in fs_info->flags. So block group X remains in the unused block groups discard list; 5) Task A then does a goto into the 'again' label, calls find_next_block_group() again we gets block group X again. Then it repeats the previous steps over and over since there are not other block groups in the discard lists and block group X is never moved out of the unused block groups discard list since btrfs_run_discard_work() keeps returning false and therefore __add_to_discard_list() doesn't move block group X out of that discard list. When this happens we can get a soft lockup report like this: [71.957] watchdog: BUG: soft lockup - CPU#0 stuck for 27s! [kworker/u4:3:97] [71.957] Modules linked in: xfs af_packet rfkill (...) [71.957] CPU: 0 UID: 0 PID: 97 Comm: kworker/u4:3 Tainted: G W 6.14.2-1-default ChimeraOS#1 openSUSE Tumbleweed 968795ef2b1407352128b466fe887416c33af6fa [71.957] Tainted: [W]=WARN [71.957] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.2-3-gd478f380-rebuilt.opensuse.org 04/01/2014 [71.957] Workqueue: btrfs_discard btrfs_discard_workfn [btrfs] [71.957] RIP: 0010:btrfs_discard_workfn+0xc4/0x400 [btrfs] [71.957] Code: c1 01 48 83 (...) [71.957] RSP: 0018:ffffafaec03efe08 EFLAGS: 00000246 [71.957] RAX: ffff897045500000 RBX: ffff8970413ed8d0 RCX: 0000000000000000 [71.957] RDX: 0000000000000001 RSI: ffff8970413ed8d0 RDI: 0000000a8f1272ad [71.957] RBP: 0000000a9d61c60e R08: ffff897045500140 R09: 8080808080808080 [71.957] R10: ffff897040276800 R11: fefefefefefefeff R12: ffff8970413ed860 [71.957] R13: ffff897045500000 R14: ffff8970413ed868 R15: 0000000000000000 [71.957] FS: 0000000000000000(0000) GS:ffff89707bc00000(0000) knlGS:0000000000000000 [71.957] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [71.957] CR2: 00005605bcc8d2f0 CR3: 000000010376a001 CR4: 0000000000770ef0 [71.957] PKRU: 55555554 [71.957] Call Trace: [71.957] <TASK> [71.957] process_one_work+0x17e/0x330 [71.957] worker_thread+0x2ce/0x3f0 [71.957] ? __pfx_worker_thread+0x10/0x10 [71.957] kthread+0xef/0x220 [71.957] ? __pfx_kthread+0x10/0x10 [71.957] ret_from_fork+0x34/0x50 [71.957] ? __pfx_kthread+0x10/0x10 [71.957] ret_from_fork_asm+0x1a/0x30 [71.957] </TASK> [71.957] Kernel panic - not syncing: softlockup: hung tasks [71.987] CPU: 0 UID: 0 PID: 97 Comm: kworker/u4:3 Tainted: G W L 6.14.2-1-default ChimeraOS#1 openSUSE Tumbleweed 968795ef2b1407352128b466fe887416c33af6fa [71.989] Tainted: [W]=WARN, [L]=SOFTLOCKUP [71.989] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.2-3-gd478f380-rebuilt.opensuse.org 04/01/2014 [71.991] Workqueue: btrfs_discard btrfs_discard_workfn [btrfs] [71.992] Call Trace: [71.993] <IRQ> [71.994] dump_stack_lvl+0x5a/0x80 [71.994] panic+0x10b/0x2da [71.995] watchdog_timer_fn.cold+0x9a/0xa1 [71.996] ? __pfx_watchdog_timer_fn+0x10/0x10 [71.997] __hrtimer_run_queues+0x132/0x2a0 [71.997] hrtimer_interrupt+0xff/0x230 [71.998] __sysvec_apic_timer_interrupt+0x55/0x100 [71.999] sysvec_apic_timer_interrupt+0x6c/0x90 [72.000] </IRQ> [72.000] <TASK> [72.001] asm_sysvec_apic_timer_interrupt+0x1a/0x20 [72.002] RIP: 0010:btrfs_discard_workfn+0xc4/0x400 [btrfs] [72.002] Code: c1 01 48 83 (...) [72.005] RSP: 0018:ffffafaec03efe08 EFLAGS: 00000246 [72.006] RAX: ffff897045500000 RBX: ffff8970413ed8d0 RCX: 0000000000000000 [72.006] RDX: 0000000000000001 RSI: ffff8970413ed8d0 RDI: 0000000a8f1272ad [72.007] RBP: 0000000a9d61c60e R08: ffff897045500140 R09: 8080808080808080 [72.008] R10: ffff897040276800 R11: fefefefefefefeff R12: ffff8970413ed860 [72.009] R13: ffff897045500000 R14: ffff8970413ed868 R15: 0000000000000000 [72.010] ? btrfs_discard_workfn+0x51/0x400 [btrfs 23b01089228eb964071fb7ca156eee8cd3bf996f] [72.011] process_one_work+0x17e/0x330 [72.012] worker_thread+0x2ce/0x3f0 [72.013] ? __pfx_worker_thread+0x10/0x10 [72.014] kthread+0xef/0x220 [72.014] ? __pfx_kthread+0x10/0x10 [72.015] ret_from_fork+0x34/0x50 [72.015] ? __pfx_kthread+0x10/0x10 [72.016] ret_from_fork_asm+0x1a/0x30 [72.017] </TASK> [72.017] Kernel Offset: 0x15000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff) [72.019] Rebooting in 90 seconds.. So fix this by making sure we move a block group out of the unused block groups discard list when calling __add_to_discard_list(). Fixes: 2bee7eb ("btrfs: discard one region at a time in async discard") Link: https://bugzilla.suse.com/show_bug.cgi?id=1242012 CC: stable@vger.kernel.org # 5.10+ Reviewed-by: Boris Burkov <boris@bur.io> Reviewed-by: Daniel Vacek <neelx@suse.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
MACsec offload is not supported in switchdev mode for uplink representors. When switching to the uplink representor profile, the MACsec offload feature must be cleared from the netdevice's features. If left enabled, attempts to add offloads result in a null pointer dereference, as the uplink representor does not support MACsec offload even though the feature bit remains set. Clear NETIF_F_HW_MACSEC in mlx5e_fix_uplink_rep_features(). Kernel log: Oops: general protection fault, probably for non-canonical address 0xdffffc000000000f: 0000 [ChimeraOS#1] SMP KASAN KASAN: null-ptr-deref in range [0x0000000000000078-0x000000000000007f] CPU: 29 UID: 0 PID: 4714 Comm: ip Not tainted 6.14.0-rc4_for_upstream_debug_2025_03_02_17_35 ChimeraOS#1 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014 RIP: 0010:__mutex_lock+0x128/0x1dd0 Code: d0 7c 08 84 d2 0f 85 ad 15 00 00 8b 35 91 5c fe 03 85 f6 75 29 49 8d 7e 60 48 b8 00 00 00 00 00 fc ff df 48 89 fa 48 c1 ea 03 <80> 3c 02 00 0f 85 a6 15 00 00 4d 3b 76 60 0f 85 fd 0b 00 00 65 ff RSP: 0018:ffff888147a4f160 EFLAGS: 00010206 RAX: dffffc0000000000 RBX: 0000000000000000 RCX: 0000000000000001 RDX: 000000000000000f RSI: 0000000000000000 RDI: 0000000000000078 RBP: ffff888147a4f2e0 R08: ffffffffa05d2c19 R09: 0000000000000000 R10: 0000000000000001 R11: 0000000000000000 R12: 0000000000000000 R13: dffffc0000000000 R14: 0000000000000018 R15: ffff888152de0000 FS: 00007f855e27d800(0000) GS:ffff88881ee80000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00000000004e5768 CR3: 000000013ae7c005 CR4: 0000000000372eb0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe07f0 DR7: 0000000000000400 Call Trace: <TASK> ? die_addr+0x3d/0xa0 ? exc_general_protection+0x144/0x220 ? asm_exc_general_protection+0x22/0x30 ? mlx5e_macsec_add_secy+0xf9/0x700 [mlx5_core] ? __mutex_lock+0x128/0x1dd0 ? lockdep_set_lock_cmp_fn+0x190/0x190 ? mlx5e_macsec_add_secy+0xf9/0x700 [mlx5_core] ? mutex_lock_io_nested+0x1ae0/0x1ae0 ? lock_acquire+0x1c2/0x530 ? macsec_upd_offload+0x145/0x380 ? lockdep_hardirqs_on_prepare+0x400/0x400 ? kasan_save_stack+0x30/0x40 ? kasan_save_stack+0x20/0x40 ? kasan_save_track+0x10/0x30 ? __kasan_kmalloc+0x77/0x90 ? __kmalloc_noprof+0x249/0x6b0 ? genl_family_rcv_msg_attrs_parse.constprop.0+0xb5/0x240 ? mlx5e_macsec_add_secy+0xf9/0x700 [mlx5_core] mlx5e_macsec_add_secy+0xf9/0x700 [mlx5_core] ? mlx5e_macsec_add_rxsa+0x11a0/0x11a0 [mlx5_core] macsec_update_offload+0x26c/0x820 ? macsec_set_mac_address+0x4b0/0x4b0 ? lockdep_hardirqs_on_prepare+0x284/0x400 ? _raw_spin_unlock_irqrestore+0x47/0x50 macsec_upd_offload+0x2c8/0x380 ? macsec_update_offload+0x820/0x820 ? __nla_parse+0x22/0x30 ? genl_family_rcv_msg_attrs_parse.constprop.0+0x15e/0x240 genl_family_rcv_msg_doit+0x1cc/0x2a0 ? genl_family_rcv_msg_attrs_parse.constprop.0+0x240/0x240 ? cap_capable+0xd4/0x330 genl_rcv_msg+0x3ea/0x670 ? genl_family_rcv_msg_dumpit+0x2a0/0x2a0 ? lockdep_set_lock_cmp_fn+0x190/0x190 ? macsec_update_offload+0x820/0x820 netlink_rcv_skb+0x12b/0x390 ? genl_family_rcv_msg_dumpit+0x2a0/0x2a0 ? netlink_ack+0xd80/0xd80 ? rwsem_down_read_slowpath+0xf90/0xf90 ? netlink_deliver_tap+0xcd/0xac0 ? netlink_deliver_tap+0x155/0xac0 ? _copy_from_iter+0x1bb/0x12c0 genl_rcv+0x24/0x40 netlink_unicast+0x440/0x700 ? netlink_attachskb+0x760/0x760 ? lock_acquire+0x1c2/0x530 ? __might_fault+0xbb/0x170 netlink_sendmsg+0x749/0xc10 ? netlink_unicast+0x700/0x700 ? __might_fault+0xbb/0x170 ? netlink_unicast+0x700/0x700 __sock_sendmsg+0xc5/0x190 ____sys_sendmsg+0x53f/0x760 ? import_iovec+0x7/0x10 ? kernel_sendmsg+0x30/0x30 ? __copy_msghdr+0x3c0/0x3c0 ? filter_irq_stacks+0x90/0x90 ? stack_depot_save_flags+0x28/0xa30 ___sys_sendmsg+0xeb/0x170 ? kasan_save_stack+0x30/0x40 ? copy_msghdr_from_user+0x110/0x110 ? do_syscall_64+0x6d/0x140 ? lock_acquire+0x1c2/0x530 ? __virt_addr_valid+0x116/0x3b0 ? __virt_addr_valid+0x1da/0x3b0 ? lock_downgrade+0x680/0x680 ? __delete_object+0x21/0x50 __sys_sendmsg+0xf7/0x180 ? __sys_sendmsg_sock+0x20/0x20 ? kmem_cache_free+0x14c/0x4e0 ? __x64_sys_close+0x78/0xd0 do_syscall_64+0x6d/0x140 entry_SYSCALL_64_after_hwframe+0x4b/0x53 RIP: 0033:0x7f855e113367 Code: 0e 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b9 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 2e 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 89 54 24 1c 48 89 74 24 10 RSP: 002b:00007ffd15e90c88 EFLAGS: 00000246 ORIG_RAX: 000000000000002e RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f855e113367 RDX: 0000000000000000 RSI: 00007ffd15e90cf0 RDI: 0000000000000004 RBP: 00007ffd15e90dbc R08: 0000000000000028 R09: 000000000045d100 R10: 00007f855e011dd8 R11: 0000000000000246 R12: 0000000000000019 R13: 0000000067c6b785 R14: 00000000004a1e80 R15: 0000000000000000 </TASK> Modules linked in: 8021q garp mrp sch_ingress openvswitch nsh mlx5_ib mlx5_fwctl mlx5_dpll mlx5_core rpcrdma rdma_ucm ib_iser libiscsi scsi_transport_iscsi ib_umad rdma_cm ib_ipoib iw_cm ib_cm ib_uverbs ib_core xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink xt_addrtype iptable_nat nf_nat br_netfilter rpcsec_gss_krb5 auth_rpcgss oid_registry overlay zram zsmalloc fuse [last unloaded: mlx5_core] ---[ end trace 0000000000000000 ]--- Fixes: 8ff0ac5 ("net/mlx5: Add MACsec offload Tx command support") Signed-off-by: Carolina Jubran <cjubran@nvidia.com> Reviewed-by: Shahar Shitrit <shshitrit@nvidia.com> Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/1746958552-561295-1-git-send-email-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
…unload Kernel panic occurs when a devmem TCP socket is closed after NIC module is unloaded. This is Devmem TCP unregistration scenarios. number is an order. (a)netlink socket close (b)pp destroy (c)uninstall result 1 2 3 OK 1 3 2 (d)Impossible 2 1 3 OK 3 1 2 (e)Kernel panic 2 3 1 (d)Impossible 3 2 1 (d)Impossible (a) netdev_nl_sock_priv_destroy() is called when devmem TCP socket is closed. (b) page_pool_destroy() is called when the interface is down. (c) mp_ops->uninstall() is called when an interface is unregistered. (d) There is no scenario in mp_ops->uninstall() is called before page_pool_destroy(). Because unregister_netdevice_many_notify() closes interfaces first and then calls mp_ops->uninstall(). (e) netdev_nl_sock_priv_destroy() accesses struct net_device to acquire netdev_lock(). But if the interface module has already been removed, net_device pointer is invalid, so it causes kernel panic. In summary, there are only 3 possible scenarios. A. sk close -> pp destroy -> uninstall. B. pp destroy -> sk close -> uninstall. C. pp destroy -> uninstall -> sk close. Case C is a kernel panic scenario. In order to fix this problem, It makes mp_dmabuf_devmem_uninstall() set binding->dev to NULL. It indicates an bound net_device was unregistered. It makes netdev_nl_sock_priv_destroy() do not acquire netdev_lock() if binding->dev is NULL. A new binding->lock is added to protect a dev of a binding. So, lock ordering is like below. priv->lock netdev_lock(dev) binding->lock Tests: Scenario A: ./ncdevmem -s 192.168.1.4 -c 192.168.1.2 -f $interface -l -p 8000 \ -v 7 -t 1 -q 1 & pid=$! sleep 10 kill $pid ip link set $interface down modprobe -rv $module Scenario B: ./ncdevmem -s 192.168.1.4 -c 192.168.1.2 -f $interface -l -p 8000 \ -v 7 -t 1 -q 1 & pid=$! sleep 10 ip link set $interface down kill $pid modprobe -rv $module Scenario C: ./ncdevmem -s 192.168.1.4 -c 192.168.1.2 -f $interface -l -p 8000 \ -v 7 -t 1 -q 1 & pid=$! sleep 10 modprobe -rv $module sleep 5 kill $pid Splat looks like: Oops: general protection fault, probably for non-canonical address 0xdffffc001fffa9f7: 0000 [ChimeraOS#1] SMP DEBUG_PAGEALLOC KASAN NOPTI KASAN: probably user-memory-access in range [0x00000000fffd4fb8-0x00000000fffd4fbf] CPU: 0 UID: 0 PID: 2041 Comm: ncdevmem Tainted: G B W 6.15.0-rc1+ ChimeraOS#2 PREEMPT(undef) 0947ec89efa0fd68838b78e36aa1617e97ff5d7f Tainted: [B]=BAD_PAGE, [W]=WARN RIP: 0010:__mutex_lock (./include/linux/sched.h:2244 kernel/locking/mutex.c:400 kernel/locking/mutex.c:443 kernel/locking/mutex.c:605 kernel/locking/mutex.c:746) Code: ea 03 80 3c 02 00 0f 85 4f 13 00 00 49 8b 1e 48 83 e3 f8 74 6a 48 b8 00 00 00 00 00 fc ff df 48 8d 7b 34 48 89 fa 48 c1 ea 03 <0f> b6 f RSP: 0018:ffff88826f7ef730 EFLAGS: 00010203 RAX: dffffc0000000000 RBX: 00000000fffd4f88 RCX: ffffffffaa9bc811 RDX: 000000001fffa9f7 RSI: 0000000000000008 RDI: 00000000fffd4fbc RBP: ffff88826f7ef8b0 R08: 0000000000000000 R09: ffffed103e6aa1a4 R10: 0000000000000007 R11: ffff88826f7ef442 R12: fffffbfff669f65e R13: ffff88812a830040 R14: ffff8881f3550d20 R15: 00000000fffd4f88 FS: 0000000000000000(0000) GS:ffff888866c05000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000563bed0cb288 CR3: 00000001a7c98000 CR4: 00000000007506f0 PKRU: 55555554 Call Trace: <TASK> ... netdev_nl_sock_priv_destroy (net/core/netdev-genl.c:953 (discriminator 3)) genl_release (net/netlink/genetlink.c:653 net/netlink/genetlink.c:694 net/netlink/genetlink.c:705) ... netlink_release (net/netlink/af_netlink.c:737) ... __sock_release (net/socket.c:647) sock_close (net/socket.c:1393) Fixes: 1d22d30 ("net: drop rtnl_lock for queue_mgmt operations") Signed-off-by: Taehee Yoo <ap420073@gmail.com> Acked-by: Stanislav Fomichev <sdf@fomichev.me> Link: https://patch.msgid.link/20250514154028.1062909-1-ap420073@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Guoyu Yin reported a splat in the ipmr netns cleanup path: WARNING: CPU: 2 PID: 14564 at net/ipv4/ipmr.c:440 ipmr_free_table net/ipv4/ipmr.c:440 [inline] WARNING: CPU: 2 PID: 14564 at net/ipv4/ipmr.c:440 ipmr_rules_exit+0x135/0x1c0 net/ipv4/ipmr.c:361 Modules linked in: CPU: 2 UID: 0 PID: 14564 Comm: syz.4.838 Not tainted 6.14.0 ChimeraOS#1 Hardware name: QEMU Ubuntu 24.04 PC (i440FX + PIIX, 1996), BIOS 1.16.3-debian-1.16.3-2 04/01/2014 RIP: 0010:ipmr_free_table net/ipv4/ipmr.c:440 [inline] RIP: 0010:ipmr_rules_exit+0x135/0x1c0 net/ipv4/ipmr.c:361 Code: ff df 48 c1 ea 03 80 3c 02 00 75 7d 48 c7 83 60 05 00 00 00 00 00 00 5b 5d 41 5c 41 5d 41 5e e9 71 67 7f 00 e8 4c 2d 8a fd 90 <0f> 0b 90 eb 93 e8 41 2d 8a fd 0f b6 2d 80 54 ea 01 31 ff 89 ee e8 RSP: 0018:ffff888109547c58 EFLAGS: 00010293 RAX: 0000000000000000 RBX: ffff888108c12dc0 RCX: ffffffff83e09868 RDX: ffff8881022b3300 RSI: ffffffff83e098d4 RDI: 0000000000000005 RBP: ffff888104288000 R08: 0000000000000000 R09: ffffed10211825c9 R10: 0000000000000001 R11: ffff88801816c4a0 R12: 0000000000000001 R13: ffff888108c13320 R14: ffff888108c12dc0 R15: fffffbfff0b74058 FS: 00007f84f39316c0(0000) GS:ffff88811b100000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f84f3930f98 CR3: 0000000113b56000 CR4: 0000000000350ef0 Call Trace: <TASK> ipmr_net_exit_batch+0x50/0x90 net/ipv4/ipmr.c:3160 ops_exit_list+0x10c/0x160 net/core/net_namespace.c:177 setup_net+0x47d/0x8e0 net/core/net_namespace.c:394 copy_net_ns+0x25d/0x410 net/core/net_namespace.c:516 create_new_namespaces+0x3f6/0xaf0 kernel/nsproxy.c:110 unshare_nsproxy_namespaces+0xc3/0x180 kernel/nsproxy.c:228 ksys_unshare+0x78d/0x9a0 kernel/fork.c:3342 __do_sys_unshare kernel/fork.c:3413 [inline] __se_sys_unshare kernel/fork.c:3411 [inline] __x64_sys_unshare+0x31/0x40 kernel/fork.c:3411 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0xa6/0x1a0 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x77/0x7f RIP: 0033:0x7f84f532cc29 Code: ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 a8 ff ff ff f7 d8 64 89 01 48 RSP: 002b:00007f84f3931038 EFLAGS: 00000246 ORIG_RAX: 0000000000000110 RAX: ffffffffffffffda RBX: 00007f84f5615fa0 RCX: 00007f84f532cc29 RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000040000400 RBP: 00007f84f53fba18 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000 R13: 0000000000000000 R14: 00007f84f5615fa0 R15: 00007fff51c5f328 </TASK> The running kernel has CONFIG_IP_MROUTE_MULTIPLE_TABLES disabled, and the sanity check for such build is still too loose. Address the issue consolidating the relevant sanity check in a single helper regardless of the kernel configuration. Also share it between the ipv4 and ipv6 code. Reported-by: Guoyu Yin <y04609127@gmail.com> Fixes: 50b9420 ("ipmr: tune the ipmr_can_free_table() checks.") Signed-off-by: Paolo Abeni <pabeni@redhat.com> Link: https://patch.msgid.link/372dc261e1bf12742276e1b984fc5a071b7fc5a8.1747321903.git.pabeni@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
When performing a right split on a folio, the split_at2 may point to a not-present page if the offset + length equals the original folio size, which will trigger the following error: BUG: unable to handle page fault for address: ffffea0006000008 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page PGD 143ffb9067 P4D 143ffb9067 PUD 143ffb8067 PMD 0 Oops: Oops: 0000 [ChimeraOS#1] SMP PTI CPU: 0 UID: 0 PID: 502640 Comm: fsx Not tainted 6.15.0-rc3-gc6156189fc6b torvalds#889 PR Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-2.fc40 04/01/4 RIP: 0010:truncate_inode_partial_folio+0x208/0x620 Code: ff 03 48 01 da e8 78 7e 13 00 48 83 05 10 b5 5a 0c 01 85 c0 0f 85 1c 02 001 RSP: 0018:ffffc90005bafab0 EFLAGS: 00010286 RAX: 0000000000000000 RBX: ffffea0005ffff00 RCX: 0000000000000002 RDX: 000000000000000c RSI: 0000000000013975 RDI: ffffc90005bafa30 RBP: ffffea0006000000 R08: 0000000000000000 R09: 00000000000009bf R10: 00000000000007e0 R11: 0000000000000000 R12: 0000000000001633 R13: 0000000000000000 R14: ffffea0005ffff00 R15: fffffffffffffffe FS: 00007f9f9a161740(0000) GS:ffff8894971fd000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: ffffea0006000008 CR3: 000000017c2ae000 CR4: 00000000000006f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <TASK> truncate_inode_pages_range+0x226/0x720 truncate_pagecache+0x57/0x90 ... Fix this issue by skipping the split if truncation aligns with the folio size, make sure the split page number lies within the folio. Link: https://lkml.kernel.org/r/20250512062825.3533342-1-yi.zhang@huaweicloud.com Fixes: 7460b47 ("mm/truncate: use folio_split() in truncate operation") Signed-off-by: Zhang Yi <yi.zhang@huawei.com> Reviewed-by: Zi Yan <ziy@nvidia.com> Cc: ErKun Yang <yangerkun@huawei.com> Cc: Kefeng Wang <wangkefeng.wang@huawei.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
…ugetlb folios A kernel crash was observed when replacing free hugetlb folios: BUG: kernel NULL pointer dereference, address: 0000000000000028 PGD 0 P4D 0 Oops: Oops: 0000 [ChimeraOS#1] SMP NOPTI CPU: 28 UID: 0 PID: 29639 Comm: test_cma.sh Tainted 6.15.0-rc6-zp torvalds#41 PREEMPT(voluntary) RIP: 0010:alloc_and_dissolve_hugetlb_folio+0x1d/0x1f0 RSP: 0018:ffffc9000b30fa90 EFLAGS: 00010286 RAX: 0000000000000000 RBX: 0000000000342cca RCX: ffffea0043000000 RDX: ffffc9000b30fb08 RSI: ffffea0043000000 RDI: 0000000000000000 RBP: ffffc9000b30fb20 R08: 0000000000001000 R09: 0000000000000000 R10: ffff88886f92eb00 R11: 0000000000000000 R12: ffffea0043000000 R13: 0000000000000000 R14: 00000000010c0200 R15: 0000000000000004 FS: 00007fcda5f14740(0000) GS:ffff8888ec1d8000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000028 CR3: 0000000391402000 CR4: 0000000000350ef0 Call Trace: <TASK> replace_free_hugepage_folios+0xb6/0x100 alloc_contig_range_noprof+0x18a/0x590 ? srso_return_thunk+0x5/0x5f ? down_read+0x12/0xa0 ? srso_return_thunk+0x5/0x5f cma_range_alloc.constprop.0+0x131/0x290 __cma_alloc+0xcf/0x2c0 cma_alloc_write+0x43/0xb0 simple_attr_write_xsigned.constprop.0.isra.0+0xb2/0x110 debugfs_attr_write+0x46/0x70 full_proxy_write+0x62/0xa0 vfs_write+0xf8/0x420 ? srso_return_thunk+0x5/0x5f ? filp_flush+0x86/0xa0 ? srso_return_thunk+0x5/0x5f ? filp_close+0x1f/0x30 ? srso_return_thunk+0x5/0x5f ? do_dup2+0xaf/0x160 ? srso_return_thunk+0x5/0x5f ksys_write+0x65/0xe0 do_syscall_64+0x64/0x170 entry_SYSCALL_64_after_hwframe+0x76/0x7e There is a potential race between __update_and_free_hugetlb_folio() and replace_free_hugepage_folios(): CPU1 CPU2 __update_and_free_hugetlb_folio replace_free_hugepage_folios folio_test_hugetlb(folio) -- It's still hugetlb folio. __folio_clear_hugetlb(folio) hugetlb_free_folio(folio) h = folio_hstate(folio) -- Here, h is NULL pointer When the above race condition occurs, folio_hstate(folio) returns NULL, and subsequent access to this NULL pointer will cause the system to crash. To resolve this issue, execute folio_hstate(folio) under the protection of the hugetlb_lock lock, ensuring that folio_hstate(folio) does not return NULL. Link: https://lkml.kernel.org/r/1747884137-26685-1-git-send-email-yangge1116@126.com Fixes: 04f13d2 ("mm: replace free hugepage folios after migration") Signed-off-by: Ge Yang <yangge1116@126.com> Reviewed-by: Muchun Song <muchun.song@linux.dev> Reviewed-by: Oscar Salvador <osalvador@suse.de> Cc: Baolin Wang <baolin.wang@linux.alibaba.com> Cc: Barry Song <21cnbao@gmail.com> Cc: David Hildenbrand <david@redhat.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
[ Upstream commit ef1d345 ] If a faulty CXL memory device returns a broken zero LSA size in its memory device information (Identify Memory Device (Opcode 4000h), CXL spec. 3.1, 8.2.9.9.1.1), a divide error occurs in the libnvdimm driver: Oops: divide error: 0000 [#1] PREEMPT SMP NOPTI RIP: 0010:nd_label_data_init+0x10e/0x800 [libnvdimm] Code and flow: 1) CXL Command 4000h returns LSA size = 0 2) config_size is assigned to zero LSA size (CXL pmem driver): drivers/cxl/pmem.c: .config_size = mds->lsa_size, 3) max_xfer is set to zero (nvdimm driver): drivers/nvdimm/label.c: max_xfer = min_t(size_t, ndd->nsarea.max_xfer, config_size); 4) A subsequent DIV_ROUND_UP() causes a division by zero: drivers/nvdimm/label.c: /* Make our initial read size a multiple of max_xfer size */ drivers/nvdimm/label.c: read_size = min(DIV_ROUND_UP(read_size, max_xfer) * max_xfer, drivers/nvdimm/label.c- config_size); Fix this by checking the config size parameter by extending an existing check. Signed-off-by: Robert Richter <rrichter@amd.com> Reviewed-by: Pankaj Gupta <pankaj.gupta@amd.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Link: https://patch.msgid.link/20250320112223.608320-1-rrichter@amd.com Signed-off-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 5da692e ] A cache device failing to resume due to mapping errors should not be retried, as the failure leaves a partially initialized policy object. Repeating the resume operation risks triggering BUG_ON when reloading cache mappings into the incomplete policy object. Reproduce steps: 1. create a cache metadata consisting of 512 or more cache blocks, with some mappings stored in the first array block of the mapping array. Here we use cache_restore v1.0 to build the metadata. cat <<EOF >> cmeta.xml <superblock uuid="" block_size="128" nr_cache_blocks="512" \ policy="smq" hint_width="4"> <mappings> <mapping cache_block="0" origin_block="0" dirty="false"/> </mappings> </superblock> EOF dmsetup create cmeta --table "0 8192 linear /dev/sdc 0" cache_restore -i cmeta.xml -o /dev/mapper/cmeta --metadata-version=2 dmsetup remove cmeta 2. wipe the second array block of the mapping array to simulate data degradations. mapping_root=$(dd if=/dev/sdc bs=1c count=8 skip=192 \ 2>/dev/null | hexdump -e '1/8 "%u\n"') ablock=$(dd if=/dev/sdc bs=1c count=8 skip=$((4096*mapping_root+2056)) \ 2>/dev/null | hexdump -e '1/8 "%u\n"') dd if=/dev/zero of=/dev/sdc bs=4k count=1 seek=$ablock 3. try bringing up the cache device. The resume is expected to fail due to the broken array block. dmsetup create cmeta --table "0 8192 linear /dev/sdc 0" dmsetup create cdata --table "0 65536 linear /dev/sdc 8192" dmsetup create corig --table "0 524288 linear /dev/sdc 262144" dmsetup create cache --notable dmsetup load cache --table "0 524288 cache /dev/mapper/cmeta \ /dev/mapper/cdata /dev/mapper/corig 128 2 metadata2 writethrough smq 0" dmsetup resume cache 4. try resuming the cache again. An unexpected BUG_ON is triggered while loading cache mappings. dmsetup resume cache Kernel logs: (snip) ------------[ cut here ]------------ kernel BUG at drivers/md/dm-cache-policy-smq.c:752! Oops: invalid opcode: 0000 [#1] PREEMPT SMP KASAN NOPTI CPU: 0 UID: 0 PID: 332 Comm: dmsetup Not tainted 6.13.4 #3 RIP: 0010:smq_load_mapping+0x3e5/0x570 Fix by disallowing resume operations for devices that failed the initial attempt. Signed-off-by: Ming-Hung Tsai <mtsai@redhat.com> Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 1b9366c ] If waiting for gpu reset done in KFD release_work, thers is WARNING: possible circular locking dependency detected #2 kfd_create_process kfd_process_mutex flush kfd release work #1 kfd release work wait for amdgpu reset work #0 amdgpu_device_gpu_reset kgd2kfd_pre_reset kfd_process_mutex Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock((work_completion)(&p->release_work)); lock((wq_completion)kfd_process_wq); lock((work_completion)(&p->release_work)); lock((wq_completion)amdgpu-reset-dev); To fix this, KFD create process move flush release work outside kfd_process_mutex. Signed-off-by: Philip Yang <Philip.Yang@amd.com> Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 88f7f56 ] When a bio with REQ_PREFLUSH is submitted to dm, __send_empty_flush() generates a flush_bio with REQ_OP_WRITE | REQ_PREFLUSH | REQ_SYNC, which causes the flush_bio to be throttled by wbt_wait(). An example from v5.4, similar problem also exists in upstream: crash> bt 2091206 PID: 2091206 TASK: ffff2050df92a300 CPU: 109 COMMAND: "kworker/u260:0" #0 [ffff800084a2f7f0] __switch_to at ffff80004008aeb8 #1 [ffff800084a2f820] __schedule at ffff800040bfa0c4 #2 [ffff800084a2f880] schedule at ffff800040bfa4b4 #3 [ffff800084a2f8a0] io_schedule at ffff800040bfa9c4 #4 [ffff800084a2f8c0] rq_qos_wait at ffff8000405925bc #5 [ffff800084a2f940] wbt_wait at ffff8000405bb3a0 #6 [ffff800084a2f9a0] __rq_qos_throttle at ffff800040592254 #7 [ffff800084a2f9c0] blk_mq_make_request at ffff80004057cf38 #8 [ffff800084a2fa60] generic_make_request at ffff800040570138 #9 [ffff800084a2fae0] submit_bio at ffff8000405703b4 #10 [ffff800084a2fb50] xlog_write_iclog at ffff800001280834 [xfs] #11 [ffff800084a2fbb0] xlog_sync at ffff800001280c3c [xfs] #12 [ffff800084a2fbf0] xlog_state_release_iclog at ffff800001280df4 [xfs] #13 [ffff800084a2fc10] xlog_write at ffff80000128203c [xfs] #14 [ffff800084a2fcd0] xlog_cil_push at ffff8000012846dc [xfs] #15 [ffff800084a2fda0] xlog_cil_push_work at ffff800001284a2c [xfs] #16 [ffff800084a2fdb0] process_one_work at ffff800040111d08 #17 [ffff800084a2fe00] worker_thread at ffff8000401121cc #18 [ffff800084a2fe70] kthread at ffff800040118de4 After commit 2def284 ("xfs: don't allow log IO to be throttled"), the metadata submitted by xlog_write_iclog() should not be throttled. But due to the existence of the dm layer, throttling flush_bio indirectly causes the metadata bio to be throttled. Fix this by conditionally adding REQ_IDLE to flush_bio.bi_opf, which makes wbt_should_throttle() return false to avoid wbt_wait(). Signed-off-by: Jinliang Zheng <alexjlzheng@tencent.com> Reviewed-by: Tianxiang Peng <txpeng@tencent.com> Reviewed-by: Hao Peng <flyingpeng@tencent.com> Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 183a087 ] It looks like GPUs are used after shutdown is invoked. Thus, breaking virtio gpu in the shutdown callback is not a good idea - guest hangs attempting to finish console drawing, with these warnings: [ 20.504464] WARNING: CPU: 0 PID: 568 at drivers/gpu/drm/virtio/virtgpu_vq.c:358 virtio_gpu_queue_ctrl_sgs+0x236/0x290 [virtio_gpu] [ 20.505685] Modules linked in: nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 rfkill ip_set nf_tables nfnetlink vfat fat intel_rapl_msr intel_rapl_common intel_uncore_frequency_common nfit libnvdimm kvm_intel kvm rapl iTCO_wdt iTCO_vendor_support virtio_gpu virtio_dma_buf pcspkr drm_shmem_helper i2c_i801 drm_kms_helper lpc_ich i2c_smbus virtio_balloon joydev drm fuse xfs libcrc32c ahci libahci crct10dif_pclmul crc32_pclmul crc32c_intel libata virtio_net ghash_clmulni_intel net_failover virtio_blk failover serio_raw dm_mirror dm_region_hash dm_log dm_mod [ 20.511847] CPU: 0 PID: 568 Comm: kworker/0:3 Kdump: loaded Tainted: G W ------- --- 5.14.0-578.6675_1757216455.el9.x86_64 #1 [ 20.513157] Hardware name: Red Hat KVM/RHEL, BIOS edk2-20241117-3.el9 11/17/2024 [ 20.513918] Workqueue: events drm_fb_helper_damage_work [drm_kms_helper] [ 20.514626] RIP: 0010:virtio_gpu_queue_ctrl_sgs+0x236/0x290 [virtio_gpu] [ 20.515332] Code: 00 00 48 85 c0 74 0c 48 8b 78 08 48 89 ee e8 51 50 00 00 65 ff 0d 42 e3 74 3f 0f 85 69 ff ff ff 0f 1f 44 00 00 e9 5f ff ff ff <0f> 0b e9 3f ff ff ff 48 83 3c 24 00 74 0e 49 8b 7f 40 48 85 ff 74 [ 20.517272] RSP: 0018:ff34f0a8c0787ad8 EFLAGS: 00010282 [ 20.517820] RAX: 00000000fffffffb RBX: 0000000000000000 RCX: 0000000000000820 [ 20.518565] RDX: 0000000000000000 RSI: ff34f0a8c0787be0 RDI: ff218bef03a26300 [ 20.519308] RBP: ff218bef03a26300 R08: 0000000000000001 R09: ff218bef07224360 [ 20.520059] R10: 0000000000008dc0 R11: 0000000000000002 R12: ff218bef02630028 [ 20.520806] R13: ff218bef0263fb48 R14: ff218bef00cb8000 R15: ff218bef07224360 [ 20.521555] FS: 0000000000000000(0000) GS:ff218bef7ba00000(0000) knlGS:0000000000000000 [ 20.522397] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 20.522996] CR2: 000055ac4f7871c0 CR3: 000000010b9f2002 CR4: 0000000000771ef0 [ 20.523740] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 20.524477] DR3: 0000000000000000 DR6: 00000000fffe07f0 DR7: 0000000000000400 [ 20.525223] PKRU: 55555554 [ 20.525515] Call Trace: [ 20.525777] <TASK> [ 20.526003] ? show_trace_log_lvl+0x1c4/0x2df [ 20.526464] ? show_trace_log_lvl+0x1c4/0x2df [ 20.526925] ? virtio_gpu_queue_fenced_ctrl_buffer+0x82/0x2c0 [virtio_gpu] [ 20.527643] ? virtio_gpu_queue_ctrl_sgs+0x236/0x290 [virtio_gpu] [ 20.528282] ? __warn+0x7e/0xd0 [ 20.528621] ? virtio_gpu_queue_ctrl_sgs+0x236/0x290 [virtio_gpu] [ 20.529256] ? report_bug+0x100/0x140 [ 20.529643] ? handle_bug+0x3c/0x70 [ 20.530010] ? exc_invalid_op+0x14/0x70 [ 20.530421] ? asm_exc_invalid_op+0x16/0x20 [ 20.530862] ? virtio_gpu_queue_ctrl_sgs+0x236/0x290 [virtio_gpu] [ 20.531506] ? virtio_gpu_queue_ctrl_sgs+0x174/0x290 [virtio_gpu] [ 20.532148] virtio_gpu_queue_fenced_ctrl_buffer+0x82/0x2c0 [virtio_gpu] [ 20.532843] virtio_gpu_primary_plane_update+0x3e2/0x460 [virtio_gpu] [ 20.533520] drm_atomic_helper_commit_planes+0x108/0x320 [drm_kms_helper] [ 20.534233] drm_atomic_helper_commit_tail+0x45/0x80 [drm_kms_helper] [ 20.534914] commit_tail+0xd2/0x130 [drm_kms_helper] [ 20.535446] drm_atomic_helper_commit+0x11b/0x140 [drm_kms_helper] [ 20.536097] drm_atomic_commit+0xa4/0xe0 [drm] [ 20.536588] ? __pfx___drm_printfn_info+0x10/0x10 [drm] [ 20.537162] drm_atomic_helper_dirtyfb+0x192/0x270 [drm_kms_helper] [ 20.537823] drm_fbdev_shmem_helper_fb_dirty+0x43/0xa0 [drm_shmem_helper] [ 20.538536] drm_fb_helper_damage_work+0x87/0x160 [drm_kms_helper] [ 20.539188] process_one_work+0x194/0x380 [ 20.539612] worker_thread+0x2fe/0x410 [ 20.540007] ? __pfx_worker_thread+0x10/0x10 [ 20.540456] kthread+0xdd/0x100 [ 20.540791] ? __pfx_kthread+0x10/0x10 [ 20.541190] ret_from_fork+0x29/0x50 [ 20.541566] </TASK> [ 20.541802] ---[ end trace 0000000000000000 ]--- It looks like the shutdown is called in the middle of console drawing, so we should either wait for it to finish, or let drm handle the shutdown. This patch implements this second option: Add an option for drivers to bypass the common break+reset handling. As DRM is careful to flush/synchronize outstanding buffers, it looks like GPU can just have a NOP there. Reviewed-by: Eric Auger <eric.auger@redhat.com> Tested-by: Eric Auger <eric.auger@redhat.com> Fixes: 8bd2fa0 ("virtio: break and reset virtio devices on device_shutdown()") Cc: Eric Auger <eauger@redhat.com> Cc: Jocelyn Falempe <jfalempe@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Message-Id: <8490dbeb6f79ed039e6c11d121002618972538a3.1744293540.git.mst@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 46d22b4 ] queue->state_change is set as part of nvmet_tcp_set_queue_sock(), but if the TCP connection isn't established when nvmet_tcp_set_queue_sock() is called then queue->state_change isn't set and sock->sk->sk_state_change isn't replaced. As such we don't need to restore sock->sk->sk_state_change if queue->state_change is NULL. This avoids NULL pointer dereferences such as this: [ 286.462026][ C0] BUG: kernel NULL pointer dereference, address: 0000000000000000 [ 286.462814][ C0] #PF: supervisor instruction fetch in kernel mode [ 286.463796][ C0] #PF: error_code(0x0010) - not-present page [ 286.464392][ C0] PGD 8000000140620067 P4D 8000000140620067 PUD 114201067 PMD 0 [ 286.465086][ C0] Oops: Oops: 0010 [#1] SMP KASAN PTI [ 286.465559][ C0] CPU: 0 UID: 0 PID: 1628 Comm: nvme Not tainted 6.15.0-rc2+ #11 PREEMPT(voluntary) [ 286.466393][ C0] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-3.fc41 04/01/2014 [ 286.467147][ C0] RIP: 0010:0x0 [ 286.467420][ C0] Code: Unable to access opcode bytes at 0xffffffffffffffd6. [ 286.467977][ C0] RSP: 0018:ffff8883ae008580 EFLAGS: 00010246 [ 286.468425][ C0] RAX: 0000000000000000 RBX: ffff88813fd34100 RCX: ffffffffa386cc43 [ 286.469019][ C0] RDX: 1ffff11027fa68b6 RSI: 0000000000000008 RDI: ffff88813fd34100 [ 286.469545][ C0] RBP: ffff88813fd34160 R08: 0000000000000000 R09: ffffed1027fa682c [ 286.470072][ C0] R10: ffff88813fd34167 R11: 0000000000000000 R12: ffff88813fd344c3 [ 286.470585][ C0] R13: ffff88813fd34112 R14: ffff88813fd34aec R15: ffff888132cdd268 [ 286.471070][ C0] FS: 00007fe3c04c7d80(0000) GS:ffff88840743f000(0000) knlGS:0000000000000000 [ 286.471644][ C0] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 286.472543][ C0] CR2: ffffffffffffffd6 CR3: 000000012daca000 CR4: 00000000000006f0 [ 286.473500][ C0] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 286.474467][ C0] DR3: 0000000000000000 DR6: 00000000ffff07f0 DR7: 0000000000000400 [ 286.475453][ C0] Call Trace: [ 286.476102][ C0] <IRQ> [ 286.476719][ C0] tcp_fin+0x2bb/0x440 [ 286.477429][ C0] tcp_data_queue+0x190f/0x4e60 [ 286.478174][ C0] ? __build_skb_around+0x234/0x330 [ 286.478940][ C0] ? rcu_is_watching+0x11/0xb0 [ 286.479659][ C0] ? __pfx_tcp_data_queue+0x10/0x10 [ 286.480431][ C0] ? tcp_try_undo_loss+0x640/0x6c0 [ 286.481196][ C0] ? seqcount_lockdep_reader_access.constprop.0+0x82/0x90 [ 286.482046][ C0] ? kvm_clock_get_cycles+0x14/0x30 [ 286.482769][ C0] ? ktime_get+0x66/0x150 [ 286.483433][ C0] ? rcu_is_watching+0x11/0xb0 [ 286.484146][ C0] tcp_rcv_established+0x6e4/0x2050 [ 286.484857][ C0] ? rcu_is_watching+0x11/0xb0 [ 286.485523][ C0] ? ipv4_dst_check+0x160/0x2b0 [ 286.486203][ C0] ? __pfx_tcp_rcv_established+0x10/0x10 [ 286.486917][ C0] ? lock_release+0x217/0x2c0 [ 286.487595][ C0] tcp_v4_do_rcv+0x4d6/0x9b0 [ 286.488279][ C0] tcp_v4_rcv+0x2af8/0x3e30 [ 286.488904][ C0] ? raw_local_deliver+0x51b/0xad0 [ 286.489551][ C0] ? rcu_is_watching+0x11/0xb0 [ 286.490198][ C0] ? __pfx_tcp_v4_rcv+0x10/0x10 [ 286.490813][ C0] ? __pfx_raw_local_deliver+0x10/0x10 [ 286.491487][ C0] ? __pfx_nf_confirm+0x10/0x10 [nf_conntrack] [ 286.492275][ C0] ? rcu_is_watching+0x11/0xb0 [ 286.492900][ C0] ip_protocol_deliver_rcu+0x8f/0x370 [ 286.493579][ C0] ip_local_deliver_finish+0x297/0x420 [ 286.494268][ C0] ip_local_deliver+0x168/0x430 [ 286.494867][ C0] ? __pfx_ip_local_deliver+0x10/0x10 [ 286.495498][ C0] ? __pfx_ip_local_deliver_finish+0x10/0x10 [ 286.496204][ C0] ? ip_rcv_finish_core+0x19a/0x1f20 [ 286.496806][ C0] ? lock_release+0x217/0x2c0 [ 286.497414][ C0] ip_rcv+0x455/0x6e0 [ 286.497945][ C0] ? __pfx_ip_rcv+0x10/0x10 [ 286.498550][ C0] ? rcu_is_watching+0x11/0xb0 [ 286.499137][ C0] ? __pfx_ip_rcv_finish+0x10/0x10 [ 286.499763][ C0] ? lock_release+0x217/0x2c0 [ 286.500327][ C0] ? dl_scaled_delta_exec+0xd1/0x2c0 [ 286.500922][ C0] ? __pfx_ip_rcv+0x10/0x10 [ 286.501480][ C0] __netif_receive_skb_one_core+0x166/0x1b0 [ 286.502173][ C0] ? __pfx___netif_receive_skb_one_core+0x10/0x10 [ 286.502903][ C0] ? lock_acquire+0x2b2/0x310 [ 286.503487][ C0] ? process_backlog+0x372/0x1350 [ 286.504087][ C0] ? lock_release+0x217/0x2c0 [ 286.504642][ C0] process_backlog+0x3b9/0x1350 [ 286.505214][ C0] ? process_backlog+0x372/0x1350 [ 286.505779][ C0] __napi_poll.constprop.0+0xa6/0x490 [ 286.506363][ C0] net_rx_action+0x92e/0xe10 [ 286.506889][ C0] ? __pfx_net_rx_action+0x10/0x10 [ 286.507437][ C0] ? timerqueue_add+0x1f0/0x320 [ 286.507977][ C0] ? sched_clock_cpu+0x68/0x540 [ 286.508492][ C0] ? lock_acquire+0x2b2/0x310 [ 286.509043][ C0] ? kvm_sched_clock_read+0xd/0x20 [ 286.509607][ C0] ? handle_softirqs+0x1aa/0x7d0 [ 286.510187][ C0] handle_softirqs+0x1f2/0x7d0 [ 286.510754][ C0] ? __pfx_handle_softirqs+0x10/0x10 [ 286.511348][ C0] ? irqtime_account_irq+0x181/0x290 [ 286.511937][ C0] ? __dev_queue_xmit+0x85d/0x3450 [ 286.512510][ C0] do_softirq.part.0+0x89/0xc0 [ 286.513100][ C0] </IRQ> [ 286.513548][ C0] <TASK> [ 286.513953][ C0] __local_bh_enable_ip+0x112/0x140 [ 286.514522][ C0] ? __dev_queue_xmit+0x85d/0x3450 [ 286.515072][ C0] __dev_queue_xmit+0x872/0x3450 [ 286.515619][ C0] ? nft_do_chain+0xe16/0x15b0 [nf_tables] [ 286.516252][ C0] ? __pfx___dev_queue_xmit+0x10/0x10 [ 286.516817][ C0] ? selinux_ip_postroute+0x43c/0xc50 [ 286.517433][ C0] ? __pfx_selinux_ip_postroute+0x10/0x10 [ 286.518061][ C0] ? rcu_is_watching+0x11/0xb0 [ 286.518606][ C0] ? ip_output+0x164/0x4a0 [ 286.519149][ C0] ? rcu_is_watching+0x11/0xb0 [ 286.519671][ C0] ? ip_finish_output2+0x17d5/0x1fb0 [ 286.520258][ C0] ip_finish_output2+0xb4b/0x1fb0 [ 286.520787][ C0] ? __pfx_ip_finish_output2+0x10/0x10 [ 286.521355][ C0] ? __ip_finish_output+0x15d/0x750 [ 286.521890][ C0] ip_output+0x164/0x4a0 [ 286.522372][ C0] ? __pfx_ip_output+0x10/0x10 [ 286.522872][ C0] ? rcu_is_watching+0x11/0xb0 [ 286.523402][ C0] ? _raw_spin_unlock_irqrestore+0x4c/0x60 [ 286.524031][ C0] ? __pfx_ip_finish_output+0x10/0x10 [ 286.524605][ C0] ? __ip_queue_xmit+0x999/0x2260 [ 286.525200][ C0] ? rcu_is_watching+0x11/0xb0 [ 286.525744][ C0] ? ipv4_dst_check+0x16a/0x2b0 [ 286.526279][ C0] ? lock_release+0x217/0x2c0 [ 286.526793][ C0] __ip_queue_xmit+0x1883/0x2260 [ 286.527324][ C0] ? __skb_clone+0x54c/0x730 [ 286.527827][ C0] __tcp_transmit_skb+0x209b/0x37a0 [ 286.528374][ C0] ? __pfx___tcp_transmit_skb+0x10/0x10 [ 286.528952][ C0] ? rcu_is_watching+0x11/0xb0 [ 286.529472][ C0] ? seqcount_lockdep_reader_access.constprop.0+0x82/0x90 [ 286.530152][ C0] ? trace_hardirqs_on+0x12/0x120 [ 286.530691][ C0] tcp_write_xmit+0xb81/0x88b0 [ 286.531224][ C0] ? mod_memcg_state+0x4d/0x60 [ 286.531736][ C0] ? rcu_is_watching+0x11/0xb0 [ 286.532253][ C0] __tcp_push_pending_frames+0x90/0x320 [ 286.532826][ C0] tcp_send_fin+0x141/0xb50 [ 286.533352][ C0] ? __pfx_tcp_send_fin+0x10/0x10 [ 286.533908][ C0] ? __local_bh_enable_ip+0xab/0x140 [ 286.534495][ C0] inet_shutdown+0x243/0x320 [ 286.535077][ C0] nvme_tcp_alloc_queue+0xb3b/0x2590 [nvme_tcp] [ 286.535709][ C0] ? do_raw_spin_lock+0x129/0x260 [ 286.536314][ C0] ? __pfx_nvme_tcp_alloc_queue+0x10/0x10 [nvme_tcp] [ 286.536996][ C0] ? do_raw_spin_unlock+0x54/0x1e0 [ 286.537550][ C0] ? _raw_spin_unlock+0x29/0x50 [ 286.538127][ C0] ? do_raw_spin_lock+0x129/0x260 [ 286.538664][ C0] ? __pfx_do_raw_spin_lock+0x10/0x10 [ 286.539249][ C0] ? nvme_tcp_alloc_admin_queue+0xd5/0x340 [nvme_tcp] [ 286.539892][ C0] ? __wake_up+0x40/0x60 [ 286.540392][ C0] nvme_tcp_alloc_admin_queue+0xd5/0x340 [nvme_tcp] [ 286.541047][ C0] ? rcu_is_watching+0x11/0xb0 [ 286.541589][ C0] nvme_tcp_setup_ctrl+0x8b/0x7a0 [nvme_tcp] [ 286.542254][ C0] ? _raw_spin_unlock_irqrestore+0x4c/0x60 [ 286.542887][ C0] ? __pfx_nvme_tcp_setup_ctrl+0x10/0x10 [nvme_tcp] [ 286.543568][ C0] ? trace_hardirqs_on+0x12/0x120 [ 286.544166][ C0] ? _raw_spin_unlock_irqrestore+0x35/0x60 [ 286.544792][ C0] ? nvme_change_ctrl_state+0x196/0x2e0 [nvme_core] [ 286.545477][ C0] nvme_tcp_create_ctrl+0x839/0xb90 [nvme_tcp] [ 286.546126][ C0] nvmf_dev_write+0x3db/0x7e0 [nvme_fabrics] [ 286.546775][ C0] ? rw_verify_area+0x69/0x520 [ 286.547334][ C0] vfs_write+0x218/0xe90 [ 286.547854][ C0] ? do_syscall_64+0x9f/0x190 [ 286.548408][ C0] ? trace_hardirqs_on_prepare+0xdb/0x120 [ 286.549037][ C0] ? syscall_exit_to_user_mode+0x93/0x280 [ 286.549659][ C0] ? __pfx_vfs_write+0x10/0x10 [ 286.550259][ C0] ? do_syscall_64+0x9f/0x190 [ 286.550840][ C0] ? syscall_exit_to_user_mode+0x8e/0x280 [ 286.551516][ C0] ? trace_hardirqs_on_prepare+0xdb/0x120 [ 286.552180][ C0] ? syscall_exit_to_user_mode+0x93/0x280 [ 286.552834][ C0] ? ksys_read+0xf5/0x1c0 [ 286.553386][ C0] ? __pfx_ksys_read+0x10/0x10 [ 286.553964][ C0] ksys_write+0xf5/0x1c0 [ 286.554499][ C0] ? __pfx_ksys_write+0x10/0x10 [ 286.555072][ C0] ? trace_hardirqs_on_prepare+0xdb/0x120 [ 286.555698][ C0] ? syscall_exit_to_user_mode+0x93/0x280 [ 286.556319][ C0] ? do_syscall_64+0x54/0x190 [ 286.556866][ C0] do_syscall_64+0x93/0x190 [ 286.557420][ C0] ? rcu_read_unlock+0x17/0x60 [ 286.557986][ C0] ? rcu_is_watching+0x11/0xb0 [ 286.558526][ C0] ? lock_release+0x217/0x2c0 [ 286.559087][ C0] ? rcu_is_watching+0x11/0xb0 [ 286.559659][ C0] ? count_memcg_events.constprop.0+0x4a/0x60 [ 286.560476][ C0] ? exc_page_fault+0x7a/0x110 [ 286.561064][ C0] ? rcu_is_watching+0x11/0xb0 [ 286.561647][ C0] ? lock_release+0x217/0x2c0 [ 286.562257][ C0] ? do_user_addr_fault+0x171/0xa00 [ 286.562839][ C0] ? do_user_addr_fault+0x4a2/0xa00 [ 286.563453][ C0] ? irqentry_exit_to_user_mode+0x84/0x270 [ 286.564112][ C0] ? rcu_is_watching+0x11/0xb0 [ 286.564677][ C0] ? irqentry_exit_to_user_mode+0x84/0x270 [ 286.565317][ C0] ? trace_hardirqs_on_prepare+0xdb/0x120 [ 286.565922][ C0] entry_SYSCALL_64_after_hwframe+0x76/0x7e [ 286.566542][ C0] RIP: 0033:0x7fe3c05e6504 [ 286.567102][ C0] Code: c7 00 16 00 00 00 b8 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 f3 0f 1e fa 80 3d c5 8b 10 00 00 74 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 55 48 89 e5 48 83 ec 20 48 89 [ 286.568931][ C0] RSP: 002b:00007fff76444f58 EFLAGS: 00000202 ORIG_RAX: 0000000000000001 [ 286.569807][ C0] RAX: ffffffffffffffda RBX: 000000003b40d930 RCX: 00007fe3c05e6504 [ 286.570621][ C0] RDX: 00000000000000cf RSI: 000000003b40d930 RDI: 0000000000000003 [ 286.571443][ C0] RBP: 0000000000000003 R08: 00000000000000cf R09: 000000003b40d930 [ 286.572246][ C0] R10: 0000000000000000 R11: 0000000000000202 R12: 000000003b40cd60 [ 286.573069][ C0] R13: 00000000000000cf R14: 00007fe3c07417f8 R15: 00007fe3c073502e [ 286.573886][ C0] </TASK> Closes: https://lore.kernel.org/linux-nvme/5hdonndzoqa265oq3bj6iarwtfk5dewxxjtbjvn5uqnwclpwt6@a2n6w3taxxex/ Signed-off-by: Alistair Francis <alistair.francis@wdc.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Tested-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit bc7e097 ] btrfs_prelim_ref() calls the old and new reference variables in the incorrect order. This causes a NULL pointer dereference because oldref is passed as NULL to trace_btrfs_prelim_ref_insert(). Note, trace_btrfs_prelim_ref_insert() is being called with newref as oldref (and oldref as NULL) on purpose in order to print out the values of newref. To reproduce: echo 1 > /sys/kernel/debug/tracing/events/btrfs/btrfs_prelim_ref_insert/enable Perform some writeback operations. Backtrace: BUG: kernel NULL pointer dereference, address: 0000000000000018 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page PGD 115949067 P4D 115949067 PUD 11594a067 PMD 0 Oops: Oops: 0000 [#1] SMP NOPTI CPU: 1 UID: 0 PID: 1188 Comm: fsstress Not tainted 6.15.0-rc2-tester+ torvalds#47 PREEMPT(voluntary) 7ca2cef72d5e9c600f0c7718adb6462de8149622 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.3-2-gc13ff2cd-prebuilt.qemu.org 04/01/2014 RIP: 0010:trace_event_raw_event_btrfs__prelim_ref+0x72/0x130 Code: e8 43 81 9f ff 48 85 c0 74 78 4d 85 e4 0f 84 8f 00 00 00 49 8b 94 24 c0 06 00 00 48 8b 0a 48 89 48 08 48 8b 52 08 48 89 50 10 <49> 8b 55 18 48 89 50 18 49 8b 55 20 48 89 50 20 41 0f b6 55 28 88 RSP: 0018:ffffce44820077a0 EFLAGS: 00010286 RAX: ffff8c6b403f9014 RBX: ffff8c6b55825730 RCX: 304994edf9cf506b RDX: d8b11eb7f0fdb699 RSI: ffff8c6b403f9010 RDI: ffff8c6b403f9010 RBP: 0000000000000001 R08: 0000000000000001 R09: 0000000000000010 R10: 00000000ffffffff R11: 0000000000000000 R12: ffff8c6b4e8fb000 R13: 0000000000000000 R14: ffffce44820077a8 R15: ffff8c6b4abd1540 FS: 00007f4dc6813740(0000) GS:ffff8c6c1d378000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000018 CR3: 000000010eb42000 CR4: 0000000000750ef0 PKRU: 55555554 Call Trace: <TASK> prelim_ref_insert+0x1c1/0x270 find_parent_nodes+0x12a6/0x1ee0 ? __entry_text_end+0x101f06/0x101f09 ? srso_alias_return_thunk+0x5/0xfbef5 ? srso_alias_return_thunk+0x5/0xfbef5 ? srso_alias_return_thunk+0x5/0xfbef5 ? srso_alias_return_thunk+0x5/0xfbef5 btrfs_is_data_extent_shared+0x167/0x640 ? fiemap_process_hole+0xd0/0x2c0 extent_fiemap+0xa5c/0xbc0 ? __entry_text_end+0x101f05/0x101f09 btrfs_fiemap+0x7e/0xd0 do_vfs_ioctl+0x425/0x9d0 __x64_sys_ioctl+0x75/0xc0 Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 5595c31 ] Calling core::fmt::write() from rust code while FineIBT is enabled results in a kernel panic: [ 4614.199779] kernel BUG at arch/x86/kernel/cet.c:132! [ 4614.205343] Oops: invalid opcode: 0000 [#1] PREEMPT SMP NOPTI [ 4614.211781] CPU: 2 UID: 0 PID: 6057 Comm: dmabuf_dump Tainted: G U O 6.12.17-android16-0-g6ab38c534a43 #1 9da040f27673ec3945e23b998a0f8bd64c846599 [ 4614.227832] Tainted: [U]=USER, [O]=OOT_MODULE [ 4614.241247] RIP: 0010:do_kernel_cp_fault+0xea/0xf0 ... [ 4614.398144] RIP: 0010:_RNvXs5_NtNtNtCs3o2tGsuHyou_4core3fmt3num3impyNtB9_7Display3fmt+0x0/0x20 [ 4614.407792] Code: 48 f7 df 48 0f 48 f9 48 89 f2 89 c6 5d e9 18 fd ff ff 0f 1f 84 00 00 00 00 00 f3 0f 1e fa 41 81 ea 14 61 af 2c 74 03 0f 0b 90 <66> 0f 1f 00 55 48 89 e5 48 89 f2 48 8b 3f be 01 00 00 00 5d e9 e7 [ 4614.428775] RSP: 0018:ffffb95acfa4ba68 EFLAGS: 00010246 [ 4614.434609] RAX: 0000000000000000 RBX: 0000000000000010 RCX: 0000000000000000 [ 4614.442587] RDX: 0000000000000007 RSI: ffffb95acfa4ba70 RDI: ffffb95acfa4bc88 [ 4614.450557] RBP: ffffb95acfa4bae0 R08: ffff0a00ffffff05 R09: 0000000000000070 [ 4614.458527] R10: 0000000000000000 R11: ffffffffab67eaf0 R12: ffffb95acfa4bcc8 [ 4614.466493] R13: ffffffffac5d50f0 R14: 0000000000000000 R15: 0000000000000000 [ 4614.474473] ? __cfi__RNvXs5_NtNtNtCs3o2tGsuHyou_4core3fmt3num3impyNtB9_7Display3fmt+0x10/0x10 [ 4614.484118] ? _RNvNtCs3o2tGsuHyou_4core3fmt5write+0x1d2/0x250 This happens because core::fmt::write() calls core::fmt::rt::Argument::fmt(), which currently has CFI disabled: library/core/src/fmt/rt.rs: 171 // FIXME: Transmuting formatter in new and indirectly branching to/calling 172 // it here is an explicit CFI violation. 173 #[allow(inline_no_sanitize)] 174 #[no_sanitize(cfi, kcfi)] 175 #[inline] 176 pub(super) unsafe fn fmt(&self, f: &mut Formatter<'_>) -> Result { This causes a Control Protection exception, because FineIBT has sealed off the original function's endbr64. This makes rust currently incompatible with FineIBT. Add a Kconfig dependency that prevents FineIBT from getting turned on by default if rust is enabled. [ Rust 1.88.0 (scheduled for 2025-06-26) should have this fixed [1], and thus we relaxed the condition with Rust >= 1.88. When `objtool` lands checking for this with e.g. [2], the plan is to ideally run that in upstream Rust's CI to prevent regressions early [3], since we do not control `core`'s source code. Alice tested the Rust PR backported to an older compiler. Peter would like that Rust provides a stable `core` which can be pulled into the kernel: "Relying on that much out of tree code is 'unfortunate'". - Miguel ] Signed-off-by: Paweł Anikiel <panikiel@google.com> Reviewed-by: Alice Ryhl <aliceryhl@google.com> Acked-by: Peter Zijlstra <peterz@infradead.org> Link: rust-lang/rust#139632 [1] Link: https://lore.kernel.org/rust-for-linux/20250410154556.GB9003@noisy.programming.kicks-ass.net/ [2] Link: rust-lang/rust#139632 (comment) [3] Link: https://lore.kernel.org/r/20250410115420.366349-1-panikiel@google.com Link: https://lore.kernel.org/r/att0-CANiq72kjDM0cKALVy4POEzhfdT4nO7tqz0Pm7xM+3=_0+L1t=A@mail.gmail.com [ Reduced splat. - Miguel ] Signed-off-by: Miguel Ojeda <ojeda@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit c46286f ] Guoyu Yin reported a splat in the ipmr netns cleanup path: WARNING: CPU: 2 PID: 14564 at net/ipv4/ipmr.c:440 ipmr_free_table net/ipv4/ipmr.c:440 [inline] WARNING: CPU: 2 PID: 14564 at net/ipv4/ipmr.c:440 ipmr_rules_exit+0x135/0x1c0 net/ipv4/ipmr.c:361 Modules linked in: CPU: 2 UID: 0 PID: 14564 Comm: syz.4.838 Not tainted 6.14.0 #1 Hardware name: QEMU Ubuntu 24.04 PC (i440FX + PIIX, 1996), BIOS 1.16.3-debian-1.16.3-2 04/01/2014 RIP: 0010:ipmr_free_table net/ipv4/ipmr.c:440 [inline] RIP: 0010:ipmr_rules_exit+0x135/0x1c0 net/ipv4/ipmr.c:361 Code: ff df 48 c1 ea 03 80 3c 02 00 75 7d 48 c7 83 60 05 00 00 00 00 00 00 5b 5d 41 5c 41 5d 41 5e e9 71 67 7f 00 e8 4c 2d 8a fd 90 <0f> 0b 90 eb 93 e8 41 2d 8a fd 0f b6 2d 80 54 ea 01 31 ff 89 ee e8 RSP: 0018:ffff888109547c58 EFLAGS: 00010293 RAX: 0000000000000000 RBX: ffff888108c12dc0 RCX: ffffffff83e09868 RDX: ffff8881022b3300 RSI: ffffffff83e098d4 RDI: 0000000000000005 RBP: ffff888104288000 R08: 0000000000000000 R09: ffffed10211825c9 R10: 0000000000000001 R11: ffff88801816c4a0 R12: 0000000000000001 R13: ffff888108c13320 R14: ffff888108c12dc0 R15: fffffbfff0b74058 FS: 00007f84f39316c0(0000) GS:ffff88811b100000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f84f3930f98 CR3: 0000000113b56000 CR4: 0000000000350ef0 Call Trace: <TASK> ipmr_net_exit_batch+0x50/0x90 net/ipv4/ipmr.c:3160 ops_exit_list+0x10c/0x160 net/core/net_namespace.c:177 setup_net+0x47d/0x8e0 net/core/net_namespace.c:394 copy_net_ns+0x25d/0x410 net/core/net_namespace.c:516 create_new_namespaces+0x3f6/0xaf0 kernel/nsproxy.c:110 unshare_nsproxy_namespaces+0xc3/0x180 kernel/nsproxy.c:228 ksys_unshare+0x78d/0x9a0 kernel/fork.c:3342 __do_sys_unshare kernel/fork.c:3413 [inline] __se_sys_unshare kernel/fork.c:3411 [inline] __x64_sys_unshare+0x31/0x40 kernel/fork.c:3411 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0xa6/0x1a0 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x77/0x7f RIP: 0033:0x7f84f532cc29 Code: ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 a8 ff ff ff f7 d8 64 89 01 48 RSP: 002b:00007f84f3931038 EFLAGS: 00000246 ORIG_RAX: 0000000000000110 RAX: ffffffffffffffda RBX: 00007f84f5615fa0 RCX: 00007f84f532cc29 RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000040000400 RBP: 00007f84f53fba18 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000 R13: 0000000000000000 R14: 00007f84f5615fa0 R15: 00007fff51c5f328 </TASK> The running kernel has CONFIG_IP_MROUTE_MULTIPLE_TABLES disabled, and the sanity check for such build is still too loose. Address the issue consolidating the relevant sanity check in a single helper regardless of the kernel configuration. Also share it between the ipv4 and ipv6 code. Reported-by: Guoyu Yin <y04609127@gmail.com> Fixes: 50b9420 ("ipmr: tune the ipmr_can_free_table() checks.") Signed-off-by: Paolo Abeni <pabeni@redhat.com> Link: https://patch.msgid.link/372dc261e1bf12742276e1b984fc5a071b7fc5a8.1747321903.git.pabeni@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
…ugetlb folios commit 113ed54 upstream. A kernel crash was observed when replacing free hugetlb folios: BUG: kernel NULL pointer dereference, address: 0000000000000028 PGD 0 P4D 0 Oops: Oops: 0000 [#1] SMP NOPTI CPU: 28 UID: 0 PID: 29639 Comm: test_cma.sh Tainted 6.15.0-rc6-zp torvalds#41 PREEMPT(voluntary) RIP: 0010:alloc_and_dissolve_hugetlb_folio+0x1d/0x1f0 RSP: 0018:ffffc9000b30fa90 EFLAGS: 00010286 RAX: 0000000000000000 RBX: 0000000000342cca RCX: ffffea0043000000 RDX: ffffc9000b30fb08 RSI: ffffea0043000000 RDI: 0000000000000000 RBP: ffffc9000b30fb20 R08: 0000000000001000 R09: 0000000000000000 R10: ffff88886f92eb00 R11: 0000000000000000 R12: ffffea0043000000 R13: 0000000000000000 R14: 00000000010c0200 R15: 0000000000000004 FS: 00007fcda5f14740(0000) GS:ffff8888ec1d8000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000028 CR3: 0000000391402000 CR4: 0000000000350ef0 Call Trace: <TASK> replace_free_hugepage_folios+0xb6/0x100 alloc_contig_range_noprof+0x18a/0x590 ? srso_return_thunk+0x5/0x5f ? down_read+0x12/0xa0 ? srso_return_thunk+0x5/0x5f cma_range_alloc.constprop.0+0x131/0x290 __cma_alloc+0xcf/0x2c0 cma_alloc_write+0x43/0xb0 simple_attr_write_xsigned.constprop.0.isra.0+0xb2/0x110 debugfs_attr_write+0x46/0x70 full_proxy_write+0x62/0xa0 vfs_write+0xf8/0x420 ? srso_return_thunk+0x5/0x5f ? filp_flush+0x86/0xa0 ? srso_return_thunk+0x5/0x5f ? filp_close+0x1f/0x30 ? srso_return_thunk+0x5/0x5f ? do_dup2+0xaf/0x160 ? srso_return_thunk+0x5/0x5f ksys_write+0x65/0xe0 do_syscall_64+0x64/0x170 entry_SYSCALL_64_after_hwframe+0x76/0x7e There is a potential race between __update_and_free_hugetlb_folio() and replace_free_hugepage_folios(): CPU1 CPU2 __update_and_free_hugetlb_folio replace_free_hugepage_folios folio_test_hugetlb(folio) -- It's still hugetlb folio. __folio_clear_hugetlb(folio) hugetlb_free_folio(folio) h = folio_hstate(folio) -- Here, h is NULL pointer When the above race condition occurs, folio_hstate(folio) returns NULL, and subsequent access to this NULL pointer will cause the system to crash. To resolve this issue, execute folio_hstate(folio) under the protection of the hugetlb_lock lock, ensuring that folio_hstate(folio) does not return NULL. Link: https://lkml.kernel.org/r/1747884137-26685-1-git-send-email-yangge1116@126.com Fixes: 04f13d2 ("mm: replace free hugepage folios after migration") Signed-off-by: Ge Yang <yangge1116@126.com> Reviewed-by: Muchun Song <muchun.song@linux.dev> Reviewed-by: Oscar Salvador <osalvador@suse.de> Cc: Baolin Wang <baolin.wang@linux.alibaba.com> Cc: Barry Song <21cnbao@gmail.com> Cc: David Hildenbrand <david@redhat.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit e3975aa upstream. No device was set which caused serial_base_ctrl_add to crash. BUG: kernel NULL pointer dereference, address: 0000000000000050 Oops: Oops: 0000 [ChimeraOS#1] PREEMPT SMP NOPTI CPU: 16 UID: 0 PID: 368 Comm: (udev-worker) Not tainted 6.12.25-amd64 ChimeraOS#1 Debian 6.12.25-1 RIP: 0010:serial_base_ctrl_add+0x96/0x120 Call Trace: <TASK> serial_core_register_port+0x1a0/0x580 ? __setup_irq+0x39c/0x660 ? __kmalloc_cache_noprof+0x111/0x310 jsm_uart_port_init+0xe8/0x180 [jsm] jsm_probe_one+0x1f4/0x410 [jsm] local_pci_probe+0x42/0x90 pci_device_probe+0x22f/0x270 really_probe+0xdb/0x340 ? pm_runtime_barrier+0x54/0x90 ? __pfx___driver_attach+0x10/0x10 __driver_probe_device+0x78/0x110 driver_probe_device+0x1f/0xa0 __driver_attach+0xba/0x1c0 bus_for_each_dev+0x8c/0xe0 bus_add_driver+0x112/0x1f0 driver_register+0x72/0xd0 jsm_init_module+0x36/0xff0 [jsm] ? __pfx_jsm_init_module+0x10/0x10 [jsm] do_one_initcall+0x58/0x310 do_init_module+0x60/0x230 Tested with Digi Neo PCIe 8 port card. Fixes: 84a9582 ("serial: core: Start managing serial controllers to enable runtime PM") Cc: stable <stable@kernel.org> Signed-off-by: Dustin Lundquist <dustin@null-ptr.net> Link: https://lore.kernel.org/r/3f31d4f75863614655c4673027a208be78d022ec.camel@null-ptr.net Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Fixing no patch applied exit code.
Return 0 breaks the patch as incorrect branches are still taken.