forked from torvalds/linux
-
Notifications
You must be signed in to change notification settings - Fork 4
[pull] master from torvalds:master #1830
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merged
Merged
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Recently new functions for IO memcpy and IO memset were added in libs/iomem_copy.c. So, remove the arch specific implementations, to fall back to the generic ones which do exactly the same. Keep memcpy_fromio for now, because it's slight more optimized by doing 'u16' accesses if the buffer is aligned this way. Signed-off-by: Julian Vetter <julian@outer-limits.org> Reviewed-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Helge Deller <deller@gmx.de>
Smatch generates the following false error report: drivers/infiniband/hw/mlx4/main.c:393 mlx4_ib_del_gid() error: uninitialized symbol 'gids'. Traditionally, we are not changing kernel code and asking people to fix the tools. However in this case, the fix can be done by simply rearranging the code to be more clear. Fixes: e26be1b ("IB/mlx4: Implement ib_device callbacks") Link: https://patch.msgid.link/6a3a1577463da16962463fcf62883a87506e9b62.1733233426.git.leonro@nvidia.com Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Convert mlx4 to use ib_umem_find_best_pgsz() instead of open-coded variant to calculate MTT size. Link: https://patch.msgid.link/c39ec6f5d4664c439a72f2961728ebb5895a9f07.1733233299.git.leonro@nvidia.com Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Replace an open coding of rdma_umem_for_each_dma_block() with the proper function. Link: https://patch.msgid.link/0bf595962c964fb8918743405acf9103a5a85983.1733233299.git.leonro@nvidia.com Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
The current ODP counters represent the total number of pages handled, but it is not enough to understand the effectiveness of these operations. Extend the ODP counters to include the number of times page fault and invalidation events were handled. Example for a single page fault handling 512 pages: - page_fault: incremented by 512 (total pages) - page_fault_handled: incremented by 1 (operation count) The same example is applicable for page invalidation too. Previous output: $ rdma stat mr dev rocep8s0f0 mrn 8 page_faults 27 page_invalidations 0 page_prefetch 29 New output: $ rdma stat mr dev rocep8s0f0 mrn 21 page_faults 512 page_faults_handled 1 page_invalidations 0 page_invalidations_handled 0 page_prefetch 51200 Signed-off-by: Chiara Meiohas <cmeiohas@nvidia.com> Reviewed-by: Michael Guralnik <michaelgur@nvidia.com> Link: https://patch.msgid.link/b18f29ed1392996ade66e9e6c45f018925253f6a.1733234165.git.leonro@nvidia.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
The "gl->tot_len" variable is controlled by the user. It comes from process_responses(). On 32bit systems, the "gl->tot_len + sizeof(struct cpl_pass_accept_req) + sizeof(struct rss_header)" addition could have an integer wrapping bug. Use size_add() to prevent this. Fixes: 1cab775 ("RDMA/cxgb4: Fix LE hash collision bug for passive open connection") Link: https://patch.msgid.link/r/86b404e1-4a75-4a35-a34e-e3054fa554c7@stanley.mountain Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Since this implements domain_alloc_paging_flags() it only needs one op. Fold mock_domain_alloc_paging() into mock_domain_alloc_paging_flags(). Link: https://patch.msgid.link/r/0-v1-8a3e7e21ff6a+1745d-iommufd_paging_flags_jgg@nvidia.com Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Yi Liu <yi.l.liu@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
IOMMU_HWPT_FAULT_ID_VALID is used to mark if the fault_id field of iommu_hwp_alloc is valid or not. As the fault_id field is handled in the iommufd core, so it makes sense to sanitize the IOMMU_HWPT_FAULT_ID_VALID flag in the iommufd core, and mask it out before passing the user flags to the iommu drivers. Link: https://patch.msgid.link/r/20241207120108.5640-1-yi.l.liu@intel.com Signed-off-by: Yi Liu <yi.l.liu@intel.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Currently, the erdma driver supports both the iWARP and RoCEv2 protocols. The erdma driver reads the ERDMA_REGS_DEV_PROTO_REG register to identify the protocol used by the erdma device. Since each protocol requires different ib_device_ops, we introduce the erdma_device_ops_iwarp and erdma_device_ops_rocev2 for iWARP and RoCEv2 protocols, respectively. Signed-off-by: Boshi Yu <boshiyu@linux.alibaba.com> Link: https://patch.msgid.link/20241211020930.68833-2-boshiyu@linux.alibaba.com Reviewed-by: Cheng Xu <chengyou@linux.alibaba.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>
The erdma_add_gid() interface inserts a GID entry at the specified index. The erdma_del_gid() interface deletes the GID entry at the specified index. Additionally, programs can invoke the erdma_query_port() and erdma_get_port_immutable() interfaces to query the GID table length. Signed-off-by: Boshi Yu <boshiyu@linux.alibaba.com> Link: https://patch.msgid.link/20241211020930.68833-3-boshiyu@linux.alibaba.com Reviewed-by: Cheng Xu <chengyou@linux.alibaba.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>
The erdma_query_pkey() interface queries the PKey at the specified index. Currently, erdma supports only one partition and returns the default PKey for each query. Besides, the correct length of the PKey table can be obtained by calling the erdma_query_port() and erdma_get_port_immutable() interfaces. Signed-off-by: Boshi Yu <boshiyu@linux.alibaba.com> Link: https://patch.msgid.link/20241211020930.68833-4-boshiyu@linux.alibaba.com Reviewed-by: Cheng Xu <chengyou@linux.alibaba.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>
The address handle contains the necessary information to transmit messages to a remote peer in the RoCEv2 protocol. This commit implements the erdma_create_ah(), erdma_destroy_ah(), and erdma_query_ah() interfaces, which are used to create, destroy, and query an address handle, respectively. Signed-off-by: Boshi Yu <boshiyu@linux.alibaba.com> Link: https://patch.msgid.link/20241211020930.68833-5-boshiyu@linux.alibaba.com Reviewed-by: Cheng Xu <chengyou@linux.alibaba.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>
The QP state machines in the RoCEv2 and iWARP protocols are different. To handle these differences for the erdma RoCEv2 device, we provide the erdma_modify_qp_rocev2() interface, which transitions the QP state and modifies QP attributes accordingly. Signed-off-by: Boshi Yu <boshiyu@linux.alibaba.com> Link: https://patch.msgid.link/20241211020930.68833-6-boshiyu@linux.alibaba.com Reviewed-by: Cheng Xu <chengyou@linux.alibaba.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>
The procedure for modifying QP is similar for both the iWARP and RoCEv2 protocols. Therefore, we unify the code and provide the erdma_modify_qp() interface for both protocols. Signed-off-by: Boshi Yu <boshiyu@linux.alibaba.com> Link: https://patch.msgid.link/20241211020930.68833-7-boshiyu@linux.alibaba.com Reviewed-by: Cheng Xu <chengyou@linux.alibaba.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>
Certian QP attributes, such as sq_draining, can only be obtained by querying the hardware on the erdma RoCEv2 device. To address this, we add the query_qp command to the cmdq and parse the response to retrieve corresponding QP attributes. Signed-off-by: Boshi Yu <boshiyu@linux.alibaba.com> Link: https://patch.msgid.link/20241211020930.68833-8-boshiyu@linux.alibaba.com Reviewed-by: Cheng Xu <chengyou@linux.alibaba.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>
The iWARP protocol supports only RC QPs previously. Now we add UD QPs and UD WRs support for the RoCEv2 protocol. Signed-off-by: Boshi Yu <boshiyu@linux.alibaba.com> Link: https://patch.msgid.link/20241211020930.68833-9-boshiyu@linux.alibaba.com Reviewed-by: Cheng Xu <chengyou@linux.alibaba.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>
This is a purely cosmetic change. Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Link: https://patch.msgid.link/1733888745-30939-2-git-send-email-selvin.xavier@broadcom.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
Return directly in case of error without a goto label as there is no cleanup actions performed. Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Link: https://patch.msgid.link/1733888745-30939-3-git-send-email-selvin.xavier@broadcom.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
Optimize error handling path in bnxt_re_probe by removing some duplicate code. Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Link: https://patch.msgid.link/1733888745-30939-4-git-send-email-selvin.xavier@broadcom.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
Move the function definition of bnxt_re_shutdown() to avoid forward declarartion of bnxt_re_dev_uninit(). Move the function definition of bnxt_re_setup_cc() before bnxt_re_add_device() to avoid it's forward declarations. Also, forward declarartions of bnxt_re_stop_irq() and bnxt_re_dev_stop() are unnecessary. Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Link: https://patch.msgid.link/1733888745-30939-5-git-send-email-selvin.xavier@broadcom.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
There is no need to include bnxt_ulp.h in ib_verbs.c. Remove it. Also, fixed hw_counters.c to remove unwanted header file inclusions. Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Link: https://patch.msgid.link/1733888745-30939-6-git-send-email-selvin.xavier@broadcom.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
hfi1_format_hwerrors() was added in 2015 by commit 7724105 ("IB/hfi1: add driver files") but never used. Remove it. Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org> Link: https://patch.msgid.link/20241216211914.745111-1-linux@treblig.org Signed-off-by: Leon Romanovsky <leon@kernel.org>
User mode queries max_msg_sz as 0x800000 by command 'ibv_devinfo -v', however ibv_post_send/ibv_post_recv has a limit of 2^31. Fix this mismatched information. Signed-off-by: zhenwei pi <pizhenwei@bytedance.com> Fixes: f605f26 ("RDMA/rxe: Protect QP state with qp->state_lock") Fixes: 5bf944f ("RDMA/rxe: Add error messages") Link: https://patch.msgid.link/20241216121953.765331-1-pizhenwei@bytedance.com Review-by: Zhu Yanjun <yanjun.zhu@linux.dev> Signed-off-by: Leon Romanovsky <leon@kernel.org>
As comment of device_add() says, if device_add() succeeds, you should call device_del() when you want to get rid of it. If device_add() has not succeeded, use only put_device() to drop the reference count. Add a put_device() call before returning from the function to decrement reference count for cleanup. Found by code review. Fixes: c8e4c23 ("RDMA/srp: Rework the srp_add_port() error path") Signed-off-by: Ma Ke <make_ruc2021@163.com> Link: https://patch.msgid.link/20241217075538.2909996-1-make_ruc2021@163.com Signed-off-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Leon Romanovsky <leon@kernel.org>
Fix conditional if else check by checking with wr->opcode. The indicated dead code may have performed some action; that action will never occur as op is pre-assigned a different value. Fixes: 999a0a2 ("RDMA/erdma: Support UD QPs and UD WRs") Signed-off-by: Advait Dhamorikar <advaitdhamorikar@gmail.com> Link: https://patch.msgid.link/20241219043939.10344-1-advaitdhamorikar@gmail.com Reviewed-by: Cheng Xu <chengyou@linux.alibaba.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>
The sysfs core now provides callback variants that explicitly take a const pointer. Make use of it to match the attribute definition. Signed-off-by: Thomas Weißschuh <linux@weissschuh.net> Signed-off-by: Helge Deller <deller@gmx.de>
The sysfs core now provides callback variants that explicitly take a const pointer. Make use of it to match the attribute definitions. Signed-off-by: Thomas Weißschuh <linux@weissschuh.net> Signed-off-by: Helge Deller <deller@gmx.de>
hdmi_infoframe_check() has been unused since it was added in commit c5e69ab ("video/hdmi: Constify infoframe passed to the pack functions") Remove it. Note that the individual check functions for each type are actually used, so they're staying. Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org> Signed-off-by: Helge Deller <deller@gmx.de>
hdmi5_core_handle_irqs() has been unused since commit f5bab22 ("OMAPDSS: HDMI: Add OMAP5 HDMI support") Remove it. Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org> Signed-off-by: Helge Deller <deller@gmx.de>
irdma_cqp_commit_fpm_val_cmd() and irdma_cqp_query_fpm_val_cmd() were added in 2021 by commit 915cc7a ("RDMA/irdma: Add miscellaneous utility definitions") but haven't been used. Remove them. Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org> Link: https://patch.msgid.link/20241223001613.307138-1-linux@treblig.org Signed-off-by: Leon Romanovsky <leon@kernel.org>
Use syscon_regmap_lookup_by_phandle_args() which is a wrapper over syscon_regmap_lookup_by_phandle() combined with getting the syscon argument. Except simpler code this annotates within one line that given phandle has arguments, so grepping for code would be easier. There is also no real benefit in printing errors on missing syscon argument, because this is done just too late: runtime check on static/build-time data. Dtschema and Devicetree bindings offer the static/build-time check for this already. Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Signed-off-by: Helge Deller <deller@gmx.de>
Restrict the check for the number of pages handled during an ODP page fault to direct mkeys. Perform the check right after handling the page fault and don't propagate the number of handled pages to callers. Indirect mkeys and their associated direct mkeys can have different start addresses. As a result, the calculation of the number of pages to handle for an indirect mkey may not match the actual page fault handling done on the direct mkey. For example: A 4K sized page fault on a KSM mkey that has a start address that is not aligned to a page will result a calculation that assumes the number of pages required to handle are 2. While the underlying MTT might be aligned will require fetching only a single page. Thus, do the calculation and compare number of pages handled only per direct mkey. Fixes: db570d7 ("IB/mlx5: Add ODP support to MW") Signed-off-by: Michael Guralnik <michaelgur@nvidia.com> Reviewed-by: Artemy Kovalyov <artemyko@nvidia.com> Link: https://patch.msgid.link/86c483d9e75ce8fe14e9ff85b62df72b779f8ab1.1736187990.git.leon@kernel.org Signed-off-by: Leon Romanovsky <leon@kernel.org>
When the driver receives an async event notification from the Firmware, we make the new ulp_async_notifier() call to inform the RDMA driver that a firmware async event has been received. RDMA driver can then take necessary actions based on the event type. In the next patch, we will implement the ulp_async_notifier() callbacks in the RDMA driver. Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Link: https://patch.msgid.link/20250107024553.2926983-2-kalesh-anakkur.purayil@broadcom.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
Using the option provided by Ethernet driver, register for FW Async event. During probe, while registeriung with Ethernet driver, provide the ulp hook 'ulp_async_notifier' for receiving the firmware events. Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Link: https://patch.msgid.link/20250107024553.2926983-3-kalesh-anakkur.purayil@broadcom.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
Added function to query firmware default values of CC parameters during driver init. These values will be stored in driver local structure and used in subsequent patch. Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Link: https://patch.msgid.link/20250107024553.2926983-4-kalesh-anakkur.purayil@broadcom.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
QP1 context in HW needs to be updated when there is a change in the default DSCP values used for RoCE traffic. Handle the event from FW and modify the dscp value used by QP1. Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Link: https://patch.msgid.link/20250107024553.2926983-5-kalesh-anakkur.purayil@broadcom.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
ulp_irq_stop() can be invoked from a context where FW is healthy or when FW is in a reset state. In the latter case, ULP must stop all interactions with HW/FW and also with application and stack. Added a new parameter to the ulp_irq_stop() function to achieve that. Reviewed-by: Vikas Gupta <vikas.gupta@broadcom.com> Reviewed-by: Michael Chan <michael.chan@broadcom.com> Reviewed-by: Chandramohan Akula <chandramohan.akula@broadcom.com> Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Link: https://patch.msgid.link/1736446693-6692-2-git-send-email-selvin.xavier@broadcom.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
In order to optimize the size of driver private structure, the memory for dev_attr is allocated dynamically during the chip context initialization. In order to make certain runtime decisions, store dev_attr in the qplib_res structure. Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Link: https://patch.msgid.link/1736446693-6692-3-git-send-email-selvin.xavier@broadcom.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
This patch sends IB_EVENT_QP_LAST_WQE_REACHED event on a QP that is in error state and associated with an SRQ. This behaviour is incorporated in flush_qp() which is called when QP transitions to error state. Supports SRQ drain functionality added by commit 844bc12 ("IB/core: add support for draining Shared receive queues") Fixes: 844bc12 ("IB/core: add support for draining Shared receive queues") Signed-off-by: Anumula Murali Mohan Reddy <anumula@chelsio.com> Signed-off-by: Potnuri Bharat Teja <bharat@chelsio.com> Link: https://patch.msgid.link/20250107095053.81007-1-anumula@chelsio.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
The Call Trace is as below: " <TASK> ? show_regs.cold+0x1a/0x1f ? __rxe_cleanup+0x12c/0x170 [rdma_rxe] ? __warn+0x84/0xd0 ? __rxe_cleanup+0x12c/0x170 [rdma_rxe] ? report_bug+0x105/0x180 ? handle_bug+0x46/0x80 ? exc_invalid_op+0x19/0x70 ? asm_exc_invalid_op+0x1b/0x20 ? __rxe_cleanup+0x12c/0x170 [rdma_rxe] ? __rxe_cleanup+0x124/0x170 [rdma_rxe] rxe_destroy_qp.cold+0x24/0x29 [rdma_rxe] ib_destroy_qp_user+0x118/0x190 [ib_core] rdma_destroy_qp.cold+0x43/0x5e [rdma_cm] rtrs_cq_qp_destroy.cold+0x1d/0x2b [rtrs_core] rtrs_srv_close_work.cold+0x1b/0x31 [rtrs_server] process_one_work+0x21d/0x3f0 worker_thread+0x4a/0x3c0 ? process_one_work+0x3f0/0x3f0 kthread+0xf0/0x120 ? kthread_complete_and_exit+0x20/0x20 ret_from_fork+0x22/0x30 </TASK> " When too many rdma resources are allocated, rxe needs more time to handle these rdma resources. Sometimes with the current timeout, rxe can not release the rdma resources correctly. Compared with other rdma drivers, a bigger timeout is used. Fixes: 215d0a7 ("RDMA/rxe: Stop lookup of partially built objects") Signed-off-by: Zhu Yanjun <yanjun.zhu@linux.dev> Link: https://patch.msgid.link/20250110160927.55014-1-yanjun.zhu@linux.dev Tested-by: Joe Klein <joe.klein812@gmail.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>
Fix an issue detected by syzbot: WARNING in iommufd_device_unbind iommufd: Time out waiting for iommufd object to become free Resolve a warning in iommufd_device_unbind caused by a timeout while waiting for the shortterm_users reference count to reach zero. The existing 10-second timeout is insufficient in some scenarios, resulting in failures the above warning. Increase the timeout in iommufd_object_dec_wait_shortterm from 10 seconds to 60 seconds to allow sufficient time for the reference count to drop to zero. This change prevents premature timeouts and reduces the likelihood of warnings during iommufd_device_unbind. Fixes: 6f9c4d8 ("iommufd: Do not UAF during iommufd_put_object()") Link: https://patch.msgid.link/r/20241123195900.3176-1-surajsonawane0215@gmail.com Reported-by: syzbot+c92878e123785b1fa2db@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=c92878e123785b1fa2db Tested-by: syzbot+c92878e123785b1fa2db@syzkaller.appspotmail.com Signed-off-by: Suraj Sonawane <surajsonawane0215@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
…_index() Resolve a UBSAN shift-out-of-bounds issue in iova_bitmap_offset_to_index() where shifting the constant "1" (of type int) by bitmap->mapped.pgshift (an unsigned long value) could result in undefined behavior. The constant "1" defaults to a 32-bit "int", and when "pgshift" exceeds 31 (e.g., pgshift = 63) the shift operation overflows, as the result cannot be represented in a 32-bit type. To resolve this, the constant is updated to "1UL", promoting it to an unsigned long type to match the operand's type. Fixes: 58ccf01 ("vfio: Add an IOVA bitmap support") Link: https://patch.msgid.link/r/20250113223820.10713-1-qasdev00@gmail.com Reported-by: syzbot <syzbot+85992ace37d5b7b51635@syzkaller.appspotmail.com> Closes: https://syzkaller.appspot.com/bug?extid=85992ace37d5b7b51635 Signed-off-by: Qasim Ijaz <qasdev00@gmail.com> Reviewed-by: Joao Martins <joao.m.martins@oracle.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Reorder the existing OBJ/IOCTL lists. Also run clang-format for the same coding style at line wrappings. No functional change. Link: https://patch.msgid.link/r/c5e6d6e0a0bb7abc92ad26937fde19c9426bee96.1736237481.git.nicolinc@nvidia.com Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Both were missing in the initial patch. Fixes: 07838f7 ("iommufd: Add iommufd fault object") Link: https://patch.msgid.link/r/bc8bb13e215af27e62ee51bdba3648dd4ed2dce3.1736923732.git.nicolinc@nvidia.com Cc: stable@vger.kernel.org Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
The sysfs core now allows instances of 'struct bin_attribute' to be moved into read-only memory. Make use of that to protect them against accidental or malicious modifications. Signed-off-by: Thomas Weißschuh <linux@weissschuh.net> Link: https://patch.msgid.link/20250114-sysfs-const-bin_attr-infiniband-v1-1-397aaa94d453@weissschuh.net Signed-off-by: Leon Romanovsky <leon@kernel.org>
The sysfs core now allows instances of 'struct bin_attribute' to be moved into read-only memory. Make use of that to protect them against accidental or malicious modifications. Signed-off-by: Thomas Weißschuh <linux@weissschuh.net> Link: https://patch.msgid.link/20250114-sysfs-const-bin_attr-infiniband-v1-2-397aaa94d453@weissschuh.net Signed-off-by: Leon Romanovsky <leon@kernel.org>
At some point the orig_video_isVGA field of screen_info was repurposed for video type. Using it directly for video type check is unsafe as it can still mean yes (1) or no-output (0) in certain configurations. I had one of those. Signed-off-by: Zsolt Kajtar <soci@c64.rulez.org> Signed-off-by: Helge Deller <deller@gmx.de>
The fault->mutex serializes the fault read()/write() fops and the iommufd_fault_auto_response_faults(), mainly for fault->response. Also, it was conveniently used to fence the fault->deliver in poll() fop and iommufd_fault_iopf_handler(). However, copy_from/to_user() may sleep if pagefaults are enabled. Thus, they could take a long time to wait for user pages to swap in, blocking iommufd_fault_iopf_handler() and its caller that is typically a shared IRQ handler of an IOMMU driver, resulting in a potential global DOS. Instead of reusing the mutex to protect the fault->deliver list, add a separate spinlock, nested under the mutex, to do the job. iommufd_fault_iopf_handler() would no longer be blocked by copy_from/to_user(). Add a free_list in iommufd_auto_response_faults(), so the spinlock can simply fence a fast list_for_each_entry_safe routine. Provide two deliver list helpers for iommufd_fault_fops_read() to use: - Fetch the first iopf_group out of the fault->deliver list - Restore an iopf_group back to the head of the fault->deliver list Lastly, move the mutex closer to the response in the fault structure, and update its kdoc accordingly. Fixes: 07838f7 ("iommufd: Add iommufd fault object") Link: https://patch.msgid.link/r/20250117192901.79491-1-nicolinc@nvidia.com Cc: stable@vger.kernel.org Suggested-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
The vdso linker script is preprocessed on demand. Adding it to 'targets' is enough to include the .cmd file. This commit applies the previous change to parisc, which added the vdso support after commit 887af6d ("arch: vdso: add vdso linker script to 'targets' instead of extra-y"). Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Acked-by: Helge Deller <deller@gmx.de> Signed-off-by: Helge Deller <deller@gmx.de>
The 32-bit Debian kernel 6.12 fails to boot and crashes like this: init (pid 65): Protection id trap (code 7) CPU: 0 UID: 0 PID: 65 Comm: init Not tainted 6.12.9 #2 Hardware name: 9000/778/B160L YZrvWESTHLNXBCVMcbcbcbcbOGFRQPDI PSW: 00000000000001000000000000001111 Not tainted r00-03 0004000f 110d39d0 109a6558 12974400 r04-07 12a810e0 12a810e0 00000000 12a81144 r08-11 12a81174 00000007 00000000 00000002 r12-15 f8c55c08 0000006c 00000001 f8c55c08 r16-19 00000002 f8c58620 002da3a8 0000004e r20-23 00001a46 0000000f 10754f84 00000000 r24-27 00000000 00000003 12ae6980 1127b9d0 r28-31 00000000 00000000 12974440 109a6558 sr00-03 00000000 00000000 00000000 00000010 sr04-07 00000000 00000000 00000000 00000000 IASQ: 00000000 00000000 IAOQ: 110d39d0 110d39d4 IIR: baadf00d ISR: 00000000 IOR: 110d39d0 CPU: 0 CR30: 128740c0 CR31: 00000000 ORIG_R28: 000003f3 IAOQ[0]: 0x110d39d0 IAOQ[1]: 0x110d39d4 RP(r2): security_sk_free+0x70/0x1a4 Backtrace: [<10d8c844>] __sk_destruct+0x2bc/0x378 [<10d8e33c>] sk_destruct+0x68/0x8c [<10d8e3dc>] __sk_free+0x7c/0x148 [<10d8e560>] sk_free+0xb8/0xf0 [<10f6420c>] unix_release_sock+0x3ac/0x50c [<10f643b8>] unix_release+0x4c/0x7c [<10d832f8>] __sock_release+0x5c/0xf8 [<10d833b4>] sock_close+0x20/0x44 [<107ba52c>] __fput+0xf8/0x468 [<107baa08>] __fput_sync+0xb4/0xd4 [<107b471c>] sys_close+0x44/0x94 [<10405334>] syscall_exit+0x0/0x10 Bisecting points to this commit which triggers the issue: commit 417c564 Author: KP Singh <kpsingh@kernel.org> Date: Fri Aug 16 17:43:07 2024 +0200 lsm: replace indirect LSM hook calls with static calls After more analysis it seems that we don't fully implement the static calls and jump tables yet. Additionally the functions which mark kernel memory read-only or read-write-executable needs to be further enhanced to be able to fully support static calls. Enabling CONFIG_SECURITY_YAMA=y was one possibility to trigger the issue, although YAMA isn't the reason for the fault. As a temporary solution disable JUMP_LABEL functionality to avoid the crashes. Signed-off-by: Helge Deller <deller@gmx.de> Cc: Guenter Roeck <linux@roeck-us.net> Reviewed-by: Guenter Roeck <linux@roeck-us.net> Cc: <stable@vger.kernel.org> # v6.12+
Signed-off-by: Shixiong Ou <oushixiong@kylinos.cn> Signed-off-by: Helge Deller <deller@gmx.de>
The iommu_hwpt_pgfault is used to report IO page fault data to userspace, but iommufd_fault_fops_read was never zeroing its padding. This leaks the content of the kernel stack memory to userspace. Also, the iommufd uAPI requires explicit padding and use of __aligned_u64 to ensure ABI compatibility's with 32 bit. pahole result, before: struct iommu_hwpt_pgfault { __u32 flags; /* 0 4 */ __u32 dev_id; /* 4 4 */ __u32 pasid; /* 8 4 */ __u32 grpid; /* 12 4 */ __u32 perm; /* 16 4 */ /* XXX 4 bytes hole, try to pack */ __u64 addr; /* 24 8 */ __u32 length; /* 32 4 */ __u32 cookie; /* 36 4 */ /* size: 40, cachelines: 1, members: 8 */ /* sum members: 36, holes: 1, sum holes: 4 */ /* last cacheline: 40 bytes */ }; pahole result, after: struct iommu_hwpt_pgfault { __u32 flags; /* 0 4 */ __u32 dev_id; /* 4 4 */ __u32 pasid; /* 8 4 */ __u32 grpid; /* 12 4 */ __u32 perm; /* 16 4 */ __u32 __reserved; /* 20 4 */ __u64 addr __attribute__((__aligned__(8))); /* 24 8 */ __u32 length; /* 32 4 */ __u32 cookie; /* 36 4 */ /* size: 40, cachelines: 1, members: 9 */ /* forced alignments: 1 */ /* last cacheline: 40 bytes */ } __attribute__((__aligned__(8))); Fixes: c714f15 ("iommufd: Add fault and response message definitions") Link: https://patch.msgid.link/r/20250120195051.2450-1-nicolinc@nvidia.com Cc: stable@vger.kernel.org Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
This patch addresses a race condition for an ODP MR that can result in a CQE with an error on the UMR QP. During the __mlx5_ib_dereg_mr() flow, the following sequence of calls occurs: mlx5_revoke_mr() mlx5r_umr_revoke_mr() mlx5r_umr_post_send_wait() At this point, the lkey is freed from the hardware's perspective. However, concurrently, mlx5_ib_invalidate_range() might be triggered by another task attempting to invalidate a range for the same freed lkey. This task will: - Acquire the umem_odp->umem_mutex lock. - Call mlx5r_umr_update_xlt() on the UMR QP. - Since the lkey has already been freed, this can lead to a CQE error, causing the UMR QP to enter an error state [1]. To resolve this race condition, the umem_odp->umem_mutex lock is now also acquired as part of the mlx5_revoke_mr() scope. Upon successful revoke, we set umem_odp->private which points to that MR to NULL, preventing any further invalidation attempts on its lkey. [1] From dmesg: infiniband rocep8s0f0: dump_cqe:277:(pid 0): WC error: 6, Message: memory bind operation error cqe_dump: 00000000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 cqe_dump: 00000010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 cqe_dump: 00000020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 cqe_dump: 00000030: 00 00 00 00 08 00 78 06 25 00 11 b9 00 0e dd d2 WARNING: CPU: 15 PID: 1506 at drivers/infiniband/hw/mlx5/umr.c:394 mlx5r_umr_post_send_wait+0x15a/0x2b0 [mlx5_ib] Modules linked in: ip6table_mangle ip6table_natip6table_filter ip6_tables iptable_mangle xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink xt_addrtype iptable_nat nf_nat br_netfilter rpcsec_gss_krb5 auth_rpcgss oid_registry overlay rpcrdma rdma_ucm ib_iser libiscsi scsi_transport_iscsi rdma_cm iw_cm ib_umad ib_ipoib ib_cm mlx5_ib ib_uverbs ib_core fuse mlx5_core CPU: 15 UID: 0 PID: 1506 Comm: ibv_rc_pingpong Not tainted 6.12.0-rc7+ #1626 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014 RIP: 0010:mlx5r_umr_post_send_wait+0x15a/0x2b0 [mlx5_ib] [..] Call Trace: <TASK> mlx5r_umr_update_xlt+0x23c/0x3e0 [mlx5_ib] mlx5_ib_invalidate_range+0x2e1/0x330 [mlx5_ib] __mmu_notifier_invalidate_range_start+0x1e1/0x240 zap_page_range_single+0xf1/0x1a0 madvise_vma_behavior+0x677/0x6e0 do_madvise+0x1a2/0x4b0 __x64_sys_madvise+0x25/0x30 do_syscall_64+0x6b/0x140 entry_SYSCALL_64_after_hwframe+0x76/0x7e Fixes: e6fb246 ("RDMA/mlx5: Consolidate MR destruction to mlx5_ib_dereg_mr()") Cc: stable@vger.kernel.org Link: https://patch.msgid.link/r/68a1e007c25b2b8fe5d625f238cc3b63e5341f77.1737290229.git.leon@kernel.org Signed-off-by: Yishai Hadas <yishaih@nvidia.com> Reviewed-by: Artemy Kovalyov <artemyko@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Prevent double queueing of implicit ODP mr destroy work by using __xa_cmpxchg() to make sure this is the only time we are destroying this specific mr. Without this change, we could try to invalidate this mr twice, which in turn could result in queuing a MR work destroy twice, and eventually the second work could execute after the MR was freed due to the first work, causing a user after free and trace below. refcount_t: underflow; use-after-free. WARNING: CPU: 2 PID: 12178 at lib/refcount.c:28 refcount_warn_saturate+0x12b/0x130 Modules linked in: bonding ib_ipoib vfio_pci ip_gre geneve nf_tables ip6_gre gre ip6_tunnel tunnel6 ipip tunnel4 ib_umad rdma_ucm mlx5_vfio_pci vfio_pci_core vfio_iommu_type1 mlx5_ib vfio ib_uverbs mlx5_core iptable_raw openvswitch nsh rpcrdma ib_iser libiscsi scsi_transport_iscsi rdma_cm iw_cm ib_cm ib_core xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink xt_addrtype iptable_nat nf_nat br_netfilter rpcsec_gss_krb5 auth_rpcgss oid_registry overlay zram zsmalloc fuse [last unloaded: ib_uverbs] CPU: 2 PID: 12178 Comm: kworker/u20:5 Not tainted 6.5.0-rc1_net_next_mlx5_58c644e #1 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014 Workqueue: events_unbound free_implicit_child_mr_work [mlx5_ib] RIP: 0010:refcount_warn_saturate+0x12b/0x130 Code: 48 c7 c7 38 95 2a 82 c6 05 bc c6 fe 00 01 e8 0c 66 aa ff 0f 0b 5b c3 48 c7 c7 e0 94 2a 82 c6 05 a7 c6 fe 00 01 e8 f5 65 aa ff <0f> 0b 5b c3 90 8b 07 3d 00 00 00 c0 74 12 83 f8 01 74 13 8d 50 ff RSP: 0018:ffff8881008e3e40 EFLAGS: 00010286 RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000027 RDX: ffff88852c91b5c8 RSI: 0000000000000001 RDI: ffff88852c91b5c0 RBP: ffff8881dacd4e00 R08: 00000000ffffffff R09: 0000000000000019 R10: 000000000000072e R11: 0000000063666572 R12: ffff88812bfd9e00 R13: ffff8881c792d200 R14: ffff88810011c005 R15: ffff8881002099c0 FS: 0000000000000000(0000) GS:ffff88852c900000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f5694b5e000 CR3: 00000001153f6003 CR4: 0000000000370ea0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <TASK> ? refcount_warn_saturate+0x12b/0x130 free_implicit_child_mr_work+0x180/0x1b0 [mlx5_ib] process_one_work+0x1cc/0x3c0 worker_thread+0x218/0x3c0 kthread+0xc6/0xf0 ret_from_fork+0x1f/0x30 </TASK> Fixes: 5256edc ("RDMA/mlx5: Rework implicit ODP destroy") Cc: stable@vger.kernel.org Link: https://patch.msgid.link/r/c96b8645a81085abff739e6b06e286a350d1283d.1737274283.git.leon@kernel.org Signed-off-by: Patrisious Haddad <phaddad@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
…kernel/git/deller/linux-fbdev Pull fbdev updates from Helge Deller: "Fixes: - omap: use threaded IRQ for LCD DMA - omapfb: Fix an OF node leak in dss_of_port_get_parent_device() - vga16fb: fix orig_video_isVGA confusion Updates & cleanups: - hdmi: Remove unused hdmi_infoframe_check - omapfb: - Remove unused hdmi5_core_handle_irqs - Use of_property_present() to test existence of DT property - Use syscon_regmap_lookup_by_phandle_args - efifb: Change the return value type to void - lcdcfb: Use backlight helper - udlfb: Use const 'struct bin_attribute' callback - radeon: Use const 'struct bin_attribute' callbacks - sm501fb: Use str_enabled_disabled() helper in sm501fb_init_fb()" * tag 'fbdev-for-6.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/linux-fbdev: fbdev: lcdcfb: Use backlight helper fbdev: vga16fb: fix orig_video_isVGA confusion fbdev: omapfb: Use syscon_regmap_lookup_by_phandle_args fbdev: omapfb: Use of_property_present() to test existence of DT property fbdev: sm501fb: Use str_enabled_disabled() helper in sm501fb_init_fb() fbdev: omap: use threaded IRQ for LCD DMA fbdev: omapfb: Fix an OF node leak in dss_of_port_get_parent_device() fbdev: efifb: Change the return value type to void fbdev: omapfb: Remove unused hdmi5_core_handle_irqs video: hdmi: Remove unused hdmi_infoframe_check fbdev: radeon: Use const 'struct bin_attribute' callbacks fbdev: udlfb: Use const 'struct bin_attribute' callback
…/kernel/git/deller/parisc-linux Pull parisc architecture updates from Helge Deller: - Temporarily disable jump label support to avoid kernel crash with 32-bit kernel - Add vdso linker script to 'targets' instead of extra-y - Remove parisc versions of memcpy_toio and memset_io * tag 'parisc-for-6.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux: parisc: Temporarily disable jump label support parisc: add vdso linker script to 'targets' instead of extra-y parisc: Remove memcpy_toio and memset_io
…ernel/git/jgg/iommufd Pull iommufd updates from Jason Gunthorpe: "No major functionality this cycle: - iommufd part of the domain_alloc_paging_flags() conversion - Move IOMMU_HWPT_FAULT_ID_VALID processing out of drivers - Increase a timeout waiting for other threads to drop transient refcounts that syzkaller was hitting - Fix a UBSAN hit in iova_bitmap due to shift out of bounds - Add missing cleanup of fault events during FD shutdown, fixing a memory leak - Improve the fault delivery flow to have a smaller locking critical region that does not include copy_to_user() - Fix 32 bit ABI breakage due to missed implicit padding, and fix the stack memory leakage" * tag 'for-linus-iommufd' of git://git.kernel.org/pub/scm/linux/kernel/git/jgg/iommufd: iommufd: Fix struct iommu_hwpt_pgfault init and padding iommufd/fault: Use a separate spinlock to protect fault->deliver list iommufd/fault: Destroy response and mutex in iommufd_fault_destroy() iommufd: Keep OBJ/IOCTL lists in an alphabetical order iommufd/iova_bitmap: Fix shift-out-of-bounds in iova_bitmap_offset_to_index() iommu: iommufd: fix WARNING in iommufd_device_unbind iommufd: Deal with IOMMU_HWPT_FAULT_ID_VALID in iommufd core iommufd/selftest: Remove domain_alloc_paging()
…t/rdma/rdma Pull rdma updates from Jason Gunthorpe: "Lighter that normal, but the now usual collection of driver fixes and small improvements: - Small fixes and minor improvements to cxgb4, bnxt_re, rxe, srp, efa, cxgb4 - Update mlx4 to use the new umem APIs, avoiding direct use of scatterlist - Support ROCEv2 in erdma - Remove various uncalled functions, constify bin_attribute - Provide core infrastructure to catch netdev events and route them to drivers, consolidating duplicated driver code - Fix rare race condition crashes in mlx5 ODP flows" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (63 commits) RDMA/mlx5: Fix implicit ODP use after free RDMA/mlx5: Fix a race for an ODP MR which leads to CQE with error RDMA/qib: Constify 'struct bin_attribute' RDMA/hfi1: Constify 'struct bin_attribute' RDMA/rxe: Fix the warning "__rxe_cleanup+0x12c/0x170 [rdma_rxe]" RDMA/cxgb4: Notify rdma stack for IB_EVENT_QP_LAST_WQE_REACHED event RDMA/bnxt_re: Allocate dev_attr information dynamically RDMA/bnxt_re: Pass the context for ulp_irq_stop RDMA/bnxt_re: Add support to handle DCB_CONFIG_CHANGE event RDMA/bnxt_re: Query firmware defaults of CC params during probe RDMA/bnxt_re: Add Async event handling support bnxt_en: Add ULP call to notify async events RDMA/mlx5: Fix indirect mkey ODP page count MAINTAINERS: Update the bnxt_re maintainers RDMA/hns: Clean up the legacy CONFIG_INFINIBAND_HNS RDMA/rtrs: Add missing deinit() call RDMA/efa: Align interrupt related fields to same type RDMA/bnxt_re: Fix to drop reference to the mmap entry in case of error RDMA/mlx5: Fix link status down event for MPV RDMA/erdma: Support create_ah/destroy_ah in non-sleepable contexts ...
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
See Commits and Changes for more details.
Created by
pull[bot] (v2.0.0-alpha.1)
Can you help keep this open source service alive? 💖 Please sponsor : )