summaryrefslogtreecommitdiff
path: root/drivers/infiniband
AgeCommit message (Collapse)Author
2026-04-22x86-64: rename misleadingly named '__copy_user_nocache()' functionLinus Torvalds
commit d187a86de793f84766ea40b9ade7ac60aabbb4fe upstream. This function was a masterclass in bad naming, for various historical reasons. It claimed to be a non-cached user copy. It is literally _neither_ of those things. It's a specialty memory copy routine that uses non-temporal stores for the destination (but not the source), and that does exception handling for both source and destination accesses. Also note that while it works for unaligned targets, any unaligned parts (whether at beginning or end) will not use non-temporal stores, since only words and quadwords can be non-temporal on x86. The exception handling means that it _can_ be used for user space accesses, but not on its own - it needs all the normal "start user space access" logic around it. But typically the user space access would be the source, not the non-temporal destination. That was the original intention of this, where the destination was some fragile persistent memory target that needed non-temporal stores in order to catch machine check exceptions synchronously and deal with them gracefully. Thus that non-descriptive name: one use case was to copy from user space into a non-cached kernel buffer. However, the existing users are a mix of that intended use-case, and a couple of random drivers that just did this as a performance tweak. Some of those random drivers then actively misused the user copying version (with STAC/CLAC and all) to do kernel copies without ever even caring about the exception handling, _just_ for the non-temporal destination. Rename it as a first small step to actually make it halfway sane, and change the prototype to be more normal: it doesn't take a user pointer unless the caller has done the proper conversion, and the argument size is the full size_t (it still won't actually copy more than 4GB in one go, but there's also no reason to silently truncate the size argument in the caller). Finally, use this now sanely named function in the NTB code, which mis-used a user copy version (with STAC/CLAC and all) of this interface despite it not actually being a user copy at all. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2026-04-09kernfs: pass struct ns_common instead of const void * for namespace tagsChristian Brauner
kernfs has historically used const void * to pass around namespace tags used for directory-level namespace filtering. The only current user of this is sysfs network namespace tagging where struct net pointers are cast to void *. Replace all const void * namespace parameters with const struct ns_common * throughout the kernfs, sysfs, and kobject namespace layers. This includes the kobj_ns_type_operations callbacks, kobject_namespace(), and all sysfs/kernfs APIs that accept or return namespace tags. Passing struct ns_common is needed because various codepaths require access to the underlying namespace. A struct ns_common can always be converted back to the concrete namespace type (e.g., struct net) via container_of() or to_ns_common() in the reverse direction. This is a preparatory change for switching to ns_id-based directory iteration to prevent a KASLR pointer leak through the current use of raw namespace pointers as hash seeds and comparison keys. Signed-off-by: Christian Brauner <brauner@kernel.org>
2026-03-27Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdmaLinus Torvalds
Pull rdma fixes from Jason Gunthorpe: - Quite a few irdma bug fixes, several user triggerable - Fix a 0 SMAC header in ionic - Tolerate FW errors for RAAS in bng_re - Don't UAF in efa when printing error events - Better handle pool exhaustion in the new bvec paths * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: RDMA/irdma: Harden depth calculation functions RDMA/irdma: Return EINVAL for invalid arp index error RDMA/irdma: Fix deadlock during netdev reset with active connections RDMA/irdma: Remove reset check from irdma_modify_qp_to_err() RDMA/irdma: Clean up unnecessary dereference of event->cm_node RDMA/irdma: Remove a NOP wait_event() in irdma_modify_qp_roce() RDMA/irdma: Update ibqp state to error if QP is already in error state RDMA/irdma: Initialize free_qp completion before using it RDMA/efa: Fix possible deadlock RDMA/rw: Fix MR pool exhaustion in bvec RDMA READ path RDMA/rw: Fall back to direct SGE on MR pool exhaustion RDMA/efa: Fix use of completion ctx after free RDMA/bng_re: Fix silent failure in HWRM version query RDMA/ionic: Preserve and set Ethernet source MAC after ib_ud_header_init() RDMA/irdma: Fix double free related to rereg_user_mr
2026-03-26Merge tag 'dma-mapping-7.0-2026-03-25' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/mszyprowski/linux Pull dma-mapping fixes from Marek Szyprowski: "A set of fixes for DMA-mapping subsystem, which resolve false- positive warnings from KMSAN and DMA-API debug (Shigeru Yoshida and Leon Romanovsky) as well as a simple build fix (Miguel Ojeda)" * tag 'dma-mapping-7.0-2026-03-25' of git://git.kernel.org/pub/scm/linux/kernel/git/mszyprowski/linux: dma-mapping: add missing `inline` for `dma_free_attrs` mm/hmm: Indicate that HMM requires DMA coherency RDMA/umem: Tell DMA mapping that UMEM requires coherency iommu/dma: add support for DMA_ATTR_REQUIRE_COHERENT attribute dma-direct: prevent SWIOTLB path when DMA_ATTR_REQUIRE_COHERENT is set dma-mapping: Introduce DMA require coherency attribute dma-mapping: Clarify valid conditions for CPU cache line overlap dma-mapping: handle DMA_ATTR_CPU_CACHE_CLEAN in trace output dma-debug: Allow multiple invocations of overlapping entries dma: swiotlb: add KMSAN annotations to swiotlb_bounce()
2026-03-20RDMA/umem: Tell DMA mapping that UMEM requires coherencyLeon Romanovsky
The RDMA subsystem exposes DMA regions through the verbs interface, which assumes a coherent system. Use the DMA_ATTR_REQUIRE_COHERENCE attribute to ensure coherency and avoid taking the SWIOTLB path. The RDMA verbs programming model resembles HMM and assumes concurrent DMA and CPU access to userspace memory. The hardware and programming model support "one-sided" operations initiated remotely without any local CPU involvement or notification. These include ATOMIC compare/swap, READ, and WRITE. A remote CPU can use these operations to traverse data structures, manipulate locks, and perform similar tasks without the host CPU’s awareness. If SWIOTLB substitutes memory or DMA is not cache coherent, these use cases break entirely. In-kernel RDMA is fine with incoherent mappings because kernel users do not rely on one-sided operations in ways that would expose these issues. A given region may also be exported multiple times, which can trigger warnings about cacheline overlaps. These warnings are suppressed when the new attribute is used. infiniband rocep8s0f0: mlx5_ib_reg_user_mr:1592:(pid 5812): start 0x2b28c000, iova 0x2b28c000, length 0x1000, access_flags 0x1 infiniband rocep8s0f0: mlx5_ib_reg_user_mr:1592:(pid 5812): start 0x2b28c001, iova 0x2b28c001, length 0xfff, access_flags 0x1 ------------[ cut here ]------------ DMA-API: mlx5_core 0000:08:00.0: cacheline tracking EEXIST, overlapping mappings aren't supported WARNING: kernel/dma/debug.c:620 at add_dma_entry+0x1bb/0x280, CPU#6: ibv_rc_pingpong/5812 Modules linked in: veth xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink iptable_nat nf_nat xt_addrtype br_netfilter rpcsec_gss_krb5 auth_rpcgss oid_registry overlay mlx5_fwctl zram zsmalloc mlx5_ib fuse rpcrdma rdma_ucm ib_uverbs ib_iser libiscsi scsi_transport_iscsi ib_umad rdma_cm ib_ipoib iw_cm ib_cm mlx5_core ib_core CPU: 6 UID: 2733 PID: 5812 Comm: ibv_rc_pingpong Tainted: G W 6.19.0+ #129 PREEMPT Tainted: [W]=WARN Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014 RIP: 0010:add_dma_entry+0x1be/0x280 Code: 8b 7b 10 48 85 ff 0f 84 c3 00 00 00 48 8b 6f 50 48 85 ed 75 03 48 8b 2f e8 ff 8e 6a 00 48 89 c6 48 8d 3d 55 ef 2d 01 48 89 ea <67> 48 0f b9 3a 48 85 db 74 1a 48 c7 c7 b0 00 2b 82 e8 9c 25 fd ff RSP: 0018:ff11000138717978 EFLAGS: 00010286 RAX: ffffffffa02d7831 RBX: ff1100010246de00 RCX: 0000000000000000 RDX: ff110001036fac30 RSI: ffffffffa02d7831 RDI: ffffffff82678650 RBP: ff110001036fac30 R08: ff11000110dcb4a0 R09: ff11000110dcb478 R10: 0000000000000000 R11: ffffffff824b30a8 R12: 0000000000000000 R13: 00000000ffffffef R14: 0000000000000202 R15: ff1100010246de00 FS: 00007f59b411c740(0000) GS:ff110008dcc99000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007ffe538f7000 CR3: 000000010e066005 CR4: 0000000000373eb0 Call Trace: <TASK> debug_dma_map_sg+0x1b4/0x390 __dma_map_sg_attrs+0x6d/0x1a0 dma_map_sgtable+0x19/0x30 ib_umem_get+0x254/0x380 [ib_uverbs] mlx5_ib_reg_user_mr+0x68/0x2a0 [mlx5_ib] ib_uverbs_reg_mr+0x17f/0x2a0 [ib_uverbs] ib_uverbs_handler_UVERBS_METHOD_INVOKE_WRITE+0xc2/0x130 [ib_uverbs] ib_uverbs_cmd_verbs+0xa0b/0xae0 [ib_uverbs] ? ib_uverbs_handler_UVERBS_METHOD_QUERY_PORT_SPEED+0xe0/0xe0 [ib_uverbs] ? mmap_region+0x7a/0xb0 ? do_mmap+0x3b8/0x5c0 ib_uverbs_ioctl+0xa7/0x110 [ib_uverbs] __x64_sys_ioctl+0x14f/0x8b0 ? ksys_mmap_pgoff+0xc5/0x190 do_syscall_64+0x8c/0xbf0 entry_SYSCALL_64_after_hwframe+0x4b/0x53 RIP: 0033:0x7f59b430aeed Code: 04 25 28 00 00 00 48 89 45 c8 31 c0 48 8d 45 10 c7 45 b0 10 00 00 00 48 89 45 b8 48 8d 45 d0 48 89 45 c0 b8 10 00 00 00 0f 05 <89> c2 3d 00 f0 ff ff 77 1a 48 8b 45 c8 64 48 2b 04 25 28 00 00 00 RSP: 002b:00007ffe538f9430 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 RAX: ffffffffffffffda RBX: 00007ffe538f94c0 RCX: 00007f59b430aeed RDX: 00007ffe538f94e0 RSI: 00000000c0181b01 RDI: 0000000000000003 RBP: 00007ffe538f9480 R08: 0000000000000028 R09: 00007ffe538f9684 R10: 0000000000000001 R11: 0000000000000246 R12: 00007ffe538f9684 R13: 000000000000000c R14: 000000002b28d170 R15: 000000000000000c </TASK> ---[ end trace 0000000000000000 ]--- Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com> Link: https://lore.kernel.org/r/20260316-dma-debug-overlap-v3-7-1dde90a7f08b@nvidia.com
2026-03-18RDMA/irdma: Harden depth calculation functionsShiraz Saleem
An issue was exposed where OS can pass in U32_MAX for SQ/RQ/SRQ size. This can cause integer overflow and truncation of SQ/RQ/SRQ depth returning a success when it should have failed. Harden the functions to do all depth calculations and boundary checking in u64 sizes. Fixes: 563e1feb5f6e ("RDMA/irdma: Add SRQ support") Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com> Signed-off-by: Tatyana Nikolova <tatyana.e.nikolova@intel.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>
2026-03-18RDMA/irdma: Return EINVAL for invalid arp index errorTatyana Nikolova
When rdma_connect() fails due to an invalid arp index, user space rdma core reports ENOMEM which is confusing. Modify irdma_make_cm_node() to return the correct error code. Fixes: 146b9756f14c ("RDMA/irdma: Add connection manager") Signed-off-by: Tatyana Nikolova <tatyana.e.nikolova@intel.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>
2026-03-18RDMA/irdma: Fix deadlock during netdev reset with active connectionsAnil Samal
Resolve deadlock that occurs when user executes netdev reset while RDMA applications (e.g., rping) are active. The netdev reset causes ice driver to remove irdma auxiliary driver, triggering device_delete and subsequent client removal. During client removal, uverbs_client waits for QP reference count to reach zero while cma_client holds the final reference, creating circular dependency and indefinite wait in iWARP mode. Skip QP reference count wait during device reset to prevent deadlock. Fixes: c8f304d75f6c ("RDMA/irdma: Prevent QP use after free") Signed-off-by: Anil Samal <anil.samal@intel.com> Signed-off-by: Tatyana Nikolova <tatyana.e.nikolova@intel.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>
2026-03-18RDMA/irdma: Remove reset check from irdma_modify_qp_to_err()Tatyana Nikolova
During reset, irdma_modify_qp() to error should be called to disconnect the QP. Without this fix, if not preceded by irdma_modify_qp() to error, the API call irdma_destroy_qp() gets stuck waiting for the QP refcount to go to zero, because the cm_node associated with this QP isn't disconnected. Fixes: 915cc7ac0f8e ("RDMA/irdma: Add miscellaneous utility definitions") Signed-off-by: Tatyana Nikolova <tatyana.e.nikolova@intel.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>
2026-03-18RDMA/irdma: Clean up unnecessary dereference of event->cm_nodeIvan Barrera
The cm_node is available and the usage of cm_node and event->cm_node seems arbitrary. Clean up unnecessary dereference of event->cm_node. Fixes: 146b9756f14c ("RDMA/irdma: Add connection manager") Signed-off-by: Ivan Barrera <ivan.d.barrera@intel.com> Signed-off-by: Tatyana Nikolova <tatyana.e.nikolova@intel.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>
2026-03-18RDMA/irdma: Remove a NOP wait_event() in irdma_modify_qp_roce()Tatyana Nikolova
Remove a NOP wait_event() in irdma_modify_qp_roce() which is relevant for iWARP and likely a copy and paste artifact for RoCEv2. The wait event is for sending a reset on a TCP connection, after the reset has been requested in irdma_modify_qp(), which occurs only in iWarp mode. Fixes: b48c24c2d710 ("RDMA/irdma: Implement device supported verb APIs") Signed-off-by: Tatyana Nikolova <tatyana.e.nikolova@intel.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>
2026-03-18RDMA/irdma: Update ibqp state to error if QP is already in error stateTatyana Nikolova
In irdma_modify_qp() update ibqp state to error if the irdma QP is already in error state, otherwise the ibqp state which is visible to the consumer app remains stale. Fixes: b48c24c2d710 ("RDMA/irdma: Implement device supported verb APIs") Signed-off-by: Tatyana Nikolova <tatyana.e.nikolova@intel.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>
2026-03-18RDMA/irdma: Initialize free_qp completion before using itJacob Moroni
In irdma_create_qp, if ib_copy_to_udata fails, it will call irdma_destroy_qp to clean up which will attempt to wait on the free_qp completion, which is not initialized yet. Fix this by initializing the completion before the ib_copy_to_udata call. Fixes: b48c24c2d710 ("RDMA/irdma: Implement device supported verb APIs") Signed-off-by: Jacob Moroni <jmoroni@google.com> Signed-off-by: Tatyana Nikolova <tatyana.e.nikolova@intel.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>
2026-03-17RDMA/efa: Fix possible deadlockEthan Tidmore
In the error path for efa_com_alloc_comp_ctx() the semaphore assigned to &aq->avail_cmds is not released. Detected by Smatch: drivers/infiniband/hw/efa/efa_com.c:662 efa_com_cmd_exec() warn: inconsistent returns '&aq->avail_cmds' Add release for &aq->avail_cmds in efa_com_alloc_comp_ctx() error path. Fixes: ef3b06742c8a2 ("RDMA/efa: Fix use of completion ctx after free") Signed-off-by: Ethan Tidmore <ethantidmore06@gmail.com> Link: https://patch.msgid.link/20260314045730.1143862-1-ethantidmore06@gmail.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2026-03-17RDMA/rw: Fix MR pool exhaustion in bvec RDMA READ pathChuck Lever
When IOVA-based DMA mapping is unavailable (e.g., IOMMU passthrough mode), rdma_rw_ctx_init_bvec() falls back to checking rdma_rw_io_needs_mr() with the raw bvec count. Unlike the scatterlist path in rdma_rw_ctx_init(), which passes a post-DMA-mapping entry count that reflects coalescing of physically contiguous pages, the bvec path passes the pre-mapping page count. This overstates the number of DMA entries, causing every multi-bvec RDMA READ to consume an MR from the QP's pool. Under NFS WRITE workloads the server performs RDMA READs to pull data from the client. With the inflated MR demand, the pool is rapidly exhausted, ib_mr_pool_get() returns NULL, and rdma_rw_init_one_mr() returns -EAGAIN. svcrdma treats this as a DMA mapping failure, closes the connection, and the client reconnects -- producing a cycle of 71% RPC retransmissions and ~100 reconnections per test run. RDMA WRITEs (NFS READ direction) are unaffected because DMA_TO_DEVICE never triggers the max_sgl_rd check. Remove the rdma_rw_io_needs_mr() gate from the bvec path entirely, so that bvec RDMA operations always use the map_wrs path (direct WR posting without MR allocation). The bvec caller has no post-DMA-coalescing segment count available -- xdr_buf and svc_rqst hold pages as individual pointers, and physical contiguity is discovered only during DMA mapping -- so the raw page count cannot serve as a reliable input to rdma_rw_io_needs_mr(). iWARP devices, which require MRs unconditionally, are handled by an earlier check in rdma_rw_ctx_init_bvec() and are unaffected. Fixes: bea28ac14cab ("RDMA/core: add MR support for bvec-based RDMA operations") Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://patch.msgid.link/20260313194201.5818-3-cel@kernel.org Signed-off-by: Leon Romanovsky <leon@kernel.org>
2026-03-17RDMA/rw: Fall back to direct SGE on MR pool exhaustionChuck Lever
When IOMMU passthrough mode is active, ib_dma_map_sgtable_attrs() produces no coalescing: each scatterlist page maps 1:1 to a DMA entry, so sgt.nents equals the raw page count. A 1 MB transfer yields 256 DMA entries. If that count exceeds the device's max_sgl_rd threshold (an optimization hint from mlx5 firmware), rdma_rw_io_needs_mr() steers the operation into the MR registration path. Each such operation consumes one or more MRs from a pool sized at max_rdma_ctxs -- roughly one MR per concurrent context. Under write-intensive workloads that issue many concurrent RDMA READs, the pool is rapidly exhausted, ib_mr_pool_get() returns NULL, and rdma_rw_init_one_mr() returns -EAGAIN. Upper layer protocols treat this as a fatal DMA mapping failure and tear down the connection. The max_sgl_rd check is a performance optimization, not a correctness requirement: the device can handle large SGE counts via direct posting, just less efficiently than with MR registration. When the MR pool cannot satisfy a request, falling back to the direct SGE (map_wrs) path avoids the connection reset while preserving the MR optimization for the common case where pool resources are available. Add a fallback in rdma_rw_ctx_init() so that -EAGAIN from rdma_rw_init_mr_wrs() triggers direct SGE posting instead of propagating the error. iWARP devices, which mandate MR registration for RDMA READs, and force_mr debug mode continue to treat -EAGAIN as terminal. Fixes: 00bd1439f464 ("RDMA/rw: Support threshold for registration vs scattering to local pages") Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://patch.msgid.link/20260313194201.5818-2-cel@kernel.org Signed-off-by: Leon Romanovsky <leon@kernel.org>
2026-03-08RDMA/efa: Fix use of completion ctx after freeYonatan Nachum
On admin queue completion handling, if the admin command completed with error we print data from the completion context. The issue is that we already freed the completion context in polling/interrupts handler which means we print data from context in an unknown state (it might be already used again). Change the admin submission flow so alloc/dealloc of the context will be symmetric and dealloc will be called after any potential use of the context. Fixes: 68fb9f3e312a ("RDMA/efa: Remove redundant NULL pointer check of CQE") Reviewed-by: Daniel Kranzdorf <dkkranzd@amazon.com> Reviewed-by: Michael Margolin <mrgolin@amazon.com> Signed-off-by: Yonatan Nachum <ynachum@amazon.com> Link: https://patch.msgid.link/20260308165350.18219-1-ynachum@amazon.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2026-03-05RDMA/bng_re: Fix silent failure in HWRM version queryKamal Heib
If the firmware version query fails, the driver currently ignores the error and continues initializing. This leaves the device in a bad state. Fix this by making bng_re_query_hwrm_version() return the error code and update the driver to check for this error and stop the setup process safely if it happens. Fixes: 745065770c2d ("RDMA/bng_re: Register and get the resources from bnge driver") Signed-off-by: Kamal Heib <kheib@redhat.com> Link: https://patch.msgid.link/20260303043645.425724-1-kheib@redhat.com Reviewed-by: Siva Reddy Kallam <siva.kallam@broadcom.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>
2026-03-04RDMA/ionic: Preserve and set Ethernet source MAC after ib_ud_header_init()Abhijit Gangurde
ionic_build_hdr() populated the Ethernet source MAC (hdr->eth.smac_h) by passing the header’s storage directly to rdma_read_gid_l2_fields(). However, ib_ud_header_init() is called after that and re-initializes the UD header, which wipes the previously written smac_h. As a result, packets are emitted with an zero source MAC address on the wire. Correct the source MAC by reading the GID-derived smac into a temporary buffer and copy it after ib_ud_header_init() completes. Fixes: e8521822c733 ("RDMA/ionic: Register device ops for control path") Cc: stable@vger.kernel.org # 6.18 Signed-off-by: Abhijit Gangurde <abhijit.gangurde@amd.com> Link: https://patch.msgid.link/20260227061809.2979990-1-abhijit.gangurde@amd.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2026-03-04RDMA/irdma: Fix double free related to rereg_user_mrJacob Moroni
If IB_MR_REREG_TRANS is set during rereg_user_mr, the umem will be released and a new one will be allocated in irdma_rereg_mr_trans. If any step of irdma_rereg_mr_trans fails after the new umem is allocated, it releases the umem, but does not set iwmr->region to NULL. The problem is that this failure is propagated to the user, who will then call ibv_dereg_mr (as they should). Then, the dereg_mr path will see a non-NULL umem and attempt to call ib_umem_release again. Fix this by setting iwmr->region to NULL after ib_umem_release. Fixed: 5ac388db27c4 ("RDMA/irdma: Add support to re-register a memory region") Signed-off-by: Jacob Moroni <jmoroni@google.com> Link: https://patch.msgid.link/20260227152743.1183388-1-jmoroni@google.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2026-02-26RDMA/uverbs: Import DMA-BUF module in uverbs_std_types_dmabuf fileLeon Romanovsky
Fix the following compilation error: ERROR: modpost: module ib_uverbs uses symbol dma_buf_move_notify from namespace DMA_BUF, but does not import it. Fixes: 0ac6f4056c4a ("RDMA/uverbs: Add DMABUF object type and operations") Link: https://patch.msgid.link/20260225-fix-uverbs-compilation-v1-1-acf7b3d0f9fa@nvidia.com Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
2026-02-25RDMA/umem: Fix double dma_buf_unpin in failure pathJacob Moroni
In ib_umem_dmabuf_get_pinned_with_dma_device(), the call to ib_umem_dmabuf_map_pages() can fail. If this occurs, the dmabuf is immediately unpinned but the umem_dmabuf->pinned flag is still set. Then, when ib_umem_release() is called, it calls ib_umem_dmabuf_revoke() which will call dma_buf_unpin() again. Fix this by removing the immediate unpin upon failure and just let the ib_umem_release/revoke path handle it. This also ensures the proper unmap-unpin unwind ordering if the dmabuf_map_pages call happened to fail due to dma_resv_wait_timeout (and therefore has a non-NULL umem_dmabuf->sgt). Fixes: 1e4df4a21c5a ("RDMA/umem: Allow pinned dmabuf umem usage") Signed-off-by: Jacob Moroni <jmoroni@google.com> Link: https://patch.msgid.link/20260224234153.1207849-1-jmoroni@google.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2026-02-25RDMA/core: Check id_priv->restricted_node_type in cma_listen_on_dev()Stefan Metzmacher
When listening on wildcard addresses we have a global list for the application layer rdma_cm_id and for any existing device or any device added in future we try to listen on any wildcard listener. When the listener has a restricted_node_type we should prevent listening on devices with a different node type. While there fix the documentation comment of rdma_restrict_node_type() to include rdma_resolve_addr() instead of having rdma_bind_addr() twice. Fixes: a760e80e90f5 ("RDMA/core: introduce rdma_restrict_node_type()") Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Leon Romanovsky <leon@kernel.org> Cc: Steve French <smfrench@gmail.com> Cc: Namjae Jeon <linkinjeon@kernel.org> Cc: Tom Talpey <tom@talpey.com> Cc: Long Li <longli@microsoft.com> Cc: linux-rdma@vger.kernel.org Cc: linux-cifs@vger.kernel.org Cc: samba-technical@lists.samba.org Signed-off-by: Stefan Metzmacher <metze@samba.org> Link: https://patch.msgid.link/20260224165951.3582093-2-metze@samba.org Signed-off-by: Leon Romanovsky <leon@kernel.org>
2026-02-24RDMA/ionic: Fix kernel stack leak in ionic_create_cq()Jason Gunthorpe
struct ionic_cq_resp resp { __u32 cqid[2]; // offset 0 - PARTIALLY SET (see below) __u8 udma_mask; // offset 8 - SET (resp.udma_mask = vcq->udma_mask) __u8 rsvd[7]; // offset 9 - NEVER SET <- LEAK }; rsvd[7]: 7 bytes of stack memory leaked unconditionally. cqid[2]: The loop at line 1256 iterates over udma_idx but skips indices where !(vcq->udma_mask & BIT(udma_idx)). The array has 2 entries but udma_count could be 1, meaning cqid[1] might never be written via ionic_create_cq_common(). If udma_mask only has bit 0 set, cqid[1] (4 bytes) is also leaked. So potentially 11 bytes leaked. Cc: stable@vger.kernel.org Fixes: e8521822c733 ("RDMA/ionic: Register device ops for control path") Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://patch.msgid.link/4-v1-83e918d69e73+a9-rdma_udata_rc_jgg@nvidia.com Acked-by: Abhijit Gangurde <abhijit.gangurde@amd.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>
2026-02-24RDMA/irdma: Fix kernel stack leak in irdma_create_user_ah()Jason Gunthorpe
struct irdma_create_ah_resp { // 8 bytes, no padding __u32 ah_id; // offset 0 - SET (uresp.ah_id = ah->sc_ah.ah_info.ah_idx) __u8 rsvd[4]; // offset 4 - NEVER SET <- LEAK }; rsvd[4]: 4 bytes of stack memory leaked unconditionally. Only ah_id is assigned before ib_respond_udata(). The reserved members of the structure were not zeroed. Cc: stable@vger.kernel.org Fixes: b48c24c2d710 ("RDMA/irdma: Implement device supported verb APIs") Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://patch.msgid.link/3-v1-83e918d69e73+a9-rdma_udata_rc_jgg@nvidia.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2026-02-24IB/mthca: Add missed mthca_unmap_user_db() for mthca_create_srq()Jason Gunthorpe
Fix a user triggerable leak on the system call failure path. Cc: stable@vger.kernel.org Fixes: ec34a922d243 ("[PATCH] IB/mthca: Add SRQ implementation") Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://patch.msgid.link/2-v1-83e918d69e73+a9-rdma_udata_rc_jgg@nvidia.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2026-02-24RDMA/efa: Fix typo in efa_alloc_mr()Jason Gunthorpe
The pattern is to check the entire driver request space, not just sizeof something unrelated. Fixes: 40909f664d27 ("RDMA/efa: Add EFA verbs implementation") Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://patch.msgid.link/1-v1-83e918d69e73+a9-rdma_udata_rc_jgg@nvidia.com Acked-by: Michael Margolin <mrgolin@amazon.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>
2026-02-24RDMA/ionic: Fix potential NULL pointer dereference in ionic_query_portKamal Heib
The function ionic_query_port() calls ib_device_get_netdev() without checking the return value which could lead to NULL pointer dereference, Fix it by checking the return value and return -ENODEV if the 'ndev' is NULL. Fixes: 2075bbe8ef03 ("RDMA/ionic: Register device ops for miscellaneous functionality") Signed-off-by: Kamal Heib <kheib@redhat.com> Link: https://patch.msgid.link/20260220222125.16973-2-kheib@redhat.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2026-02-24RDMA/bng_re: Unwind bng_re_dev_init properlySiva Reddy Kallam
Fix below smatch warning: drivers/infiniband/hw/bng_re/bng_dev.c:270 bng_re_dev_init() warn: missing unwind goto? Current bng_re_dev_init function is not having clear unwinding code. So, added proper unwinding with ladder. Fixes: 4f830cd8d7fe ("RDMA/bng_re: Add infrastructure for enabling Firmware channel") Reported-by: Simon Horman <horms@kernel.org> Reported-by: kernel test robot <lkp@intel.com> Reported-by: Dan Carpenter <error27@gmail.com> Closes: https://lore.kernel.org/r/202601010413.sWadrQel-lkp@intel.com/ Signed-off-by: Siva Reddy Kallam <siva.kallam@broadcom.com> Link: https://patch.msgid.link/20260218091246.1764808-3-siva.kallam@broadcom.com Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
2026-02-24RDMA/bng_re: Remove unnessary validity checksSiva Reddy Kallam
Fix below smatch warning: drivers/infiniband/hw/bng_re/bng_dev.c:113 bng_re_net_ring_free() warn: variable dereferenced before check 'rdev' (see line 107) current driver has unnessary validity checks. So, removing these unnessary validity checks. Fixes: 4f830cd8d7fe ("RDMA/bng_re: Add infrastructure for enabling Firmware channel") Fixes: 745065770c2d ("RDMA/bng_re: Register and get the resources from bnge driver") Fixes: 04e031ff6e60 ("RDMA/bng_re: Initialize the Firmware and Hardware") Fixes: d0da769c19d0 ("RDMA/bng_re: Add Auxiliary interface") Reported-by: Simon Horman <horms@kernel.org> Reported-by: kernel test robot <lkp@intel.com> Reported-by: Dan Carpenter <error27@gmail.com> Closes: https://lore.kernel.org/r/202601010413.sWadrQel-lkp@intel.com/ Signed-off-by: Siva Reddy Kallam <siva.kallam@broadcom.com> Link: https://patch.msgid.link/20260218091246.1764808-2-siva.kallam@broadcom.com Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
2026-02-24RDMA/core: Fix stale RoCE GIDs during netdev events at registrationJiri Pirko
RoCE GID entries become stale when netdev properties change during the IB device registration window. This is reproducible with a udev rule that sets a MAC address when a VF netdev appears: ACTION=="add", SUBSYSTEM=="net", KERNEL=="eth4", \ RUN+="/sbin/ip link set eth4 address 88:22:33:44:55:66" After VF creation, show_gids displays GIDs derived from the original random MAC rather than the configured one. The root cause is a race between netdev event processing and device registration: CPU 0 (driver) CPU 1 (udev/workqueue) ────────────── ────────────────────── ib_register_device() ib_cache_setup_one() gid_table_setup_one() _gid_table_setup_one() ← GID table allocated rdma_roce_rescan_device() ← GIDs populated with OLD MAC ip link set eth4 addr NEW_MAC NETDEV_CHANGEADDR queued netdevice_event_work_handler() ib_enum_all_roce_netdevs() ← Iterates DEVICE_REGISTERED ← Device NOT marked yet, SKIP! enable_device_and_get() xa_set_mark(DEVICE_REGISTERED) ← Too late, event was lost The netdev event handler uses ib_enum_all_roce_netdevs() which only iterates devices marked DEVICE_REGISTERED. However, this mark is set late in the registration process, after the GID cache is already populated. Events arriving in this window are silently dropped. Fix this by introducing a new xarray mark DEVICE_GID_UPDATES that is set immediately after the GID table is allocated and initialized. Use the new mark in ib_enum_all_roce_netdevs() function to iterate devices instead of DEVICE_REGISTERED. This is safe because: - After _gid_table_setup_one(), all required structures exist (port_data, immutable, cache.gid) - The GID table mutex serializes concurrent access between the initial rescan and event handlers - Event handlers correctly update stale GIDs even when racing with rescan - The mark is cleared in ib_cache_cleanup_one() before teardown This also fixes similar races for IP address events (inetaddr_event, inet6addr_event) which use the same enumeration path. Fixes: 0df91bb67334 ("RDMA/devices: Use xarray to store the client_data") Signed-off-by: Jiri Pirko <jiri@nvidia.com> Link: https://patch.msgid.link/20260127093839.126291-1-jiri@resnulli.us Reported-by: syzbot+881d65229ca4f9ae8c84@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=881d65229ca4f9ae8c84 Signed-off-by: Leon Romanovsky <leon@kernel.org>
2026-02-23RDMA/uverbs: select CONFIG_DMA_SHARED_BUFFERArnd Bergmann
The addition of dmabuf support in uverbs means that it is no longer possible to build infiniband support if that is disabled: arm-linux-gnueabi-ld: drivers/infiniband/core/ib_core_uverbs.o: in function `rdma_user_mmap_entry_remove.part.0': ib_core_uverbs.c:(.text+0x508): undefined reference to `dma_buf_move_notify' (dma_buf_move_notify): Unknown destination type (ARM/Thumb) in drivers/infiniband/core/ib_core_uverbs.o ib_core_uverbs.c:(.text+0x518): undefined reference to `dma_resv_wait_timeout' (dma_resv_wait_timeout): Unknown destination type (ARM/Thumb) in drivers/infiniband/core/ib_core_uverbs.o Select this from Kconfig, as we do for the other users. Fixes: 0ac6f4056c4a ("RDMA/uverbs: Add DMABUF object type and operations") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Link: https://patch.msgid.link/20260216121213.2088910-1-arnd@kernel.org Signed-off-by: Leon Romanovsky <leon@kernel.org>
2026-02-22Convert remaining multi-line kmalloc_obj/flex GFP_KERNEL usesKees Cook
Conversion performed via this Coccinelle script: // SPDX-License-Identifier: GPL-2.0-only // Options: --include-headers-for-types --all-includes --include-headers --keep-comments virtual patch @gfp depends on patch && !(file in "tools") && !(file in "samples")@ identifier ALLOC = {kmalloc_obj,kmalloc_objs,kmalloc_flex, kzalloc_obj,kzalloc_objs,kzalloc_flex, kvmalloc_obj,kvmalloc_objs,kvmalloc_flex, kvzalloc_obj,kvzalloc_objs,kvzalloc_flex}; @@ ALLOC(... - , GFP_KERNEL ) $ make coccicheck MODE=patch COCCI=gfp.cocci Build and boot tested x86_64 with Fedora 42's GCC and Clang: Linux version 6.19.0+ (user@host) (gcc (GCC) 15.2.1 20260123 (Red Hat 15.2.1-7), GNU ld version 2.44-12.fc42) #1 SMP PREEMPT_DYNAMIC 1970-01-01 Linux version 6.19.0+ (user@host) (clang version 20.1.8 (Fedora 20.1.8-4.fc42), LLD 20.1.8) #1 SMP PREEMPT_DYNAMIC 1970-01-01 Signed-off-by: Kees Cook <kees@kernel.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2026-02-21Convert more 'alloc_obj' cases to default GFP_KERNEL argumentsLinus Torvalds
This converts some of the visually simpler cases that have been split over multiple lines. I only did the ones that are easy to verify the resulting diff by having just that final GFP_KERNEL argument on the next line. Somebody should probably do a proper coccinelle script for this, but for me the trivial script actually resulted in an assertion failure in the middle of the script. I probably had made it a bit _too_ trivial. So after fighting that far a while I decided to just do some of the syntactically simpler cases with variations of the previous 'sed' scripts. The more syntactically complex multi-line cases would mostly really want whitespace cleanup anyway. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2026-02-21Convert 'alloc_flex' family to use the new default GFP_KERNEL argumentLinus Torvalds
This is the exact same thing as the 'alloc_obj()' version, only much smaller because there are a lot fewer users of the *alloc_flex() interface. As with alloc_obj() version, this was done entirely with mindless brute force, using the same script, except using 'flex' in the pattern rather than 'objs*'. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2026-02-21Convert 'alloc_obj' family to use the new default GFP_KERNEL argumentLinus Torvalds
This was done entirely with mindless brute force, using git grep -l '\<k[vmz]*alloc_objs*(.*, GFP_KERNEL)' | xargs sed -i 's/\(alloc_objs*(.*\), GFP_KERNEL)/\1)/' to convert the new alloc_obj() users that had a simple GFP_KERNEL argument to just drop that argument. Note that due to the extreme simplicity of the scripting, any slightly more complex cases spread over multiple lines would not be triggered: they definitely exist, but this covers the vast bulk of the cases, and the resulting diff is also then easier to check automatically. For the same reason the 'flex' versions will be done as a separate conversion. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2026-02-21treewide: Replace kmalloc with kmalloc_obj for non-scalar typesKees Cook
This is the result of running the Coccinelle script from scripts/coccinelle/api/kmalloc_objs.cocci. The script is designed to avoid scalar types (which need careful case-by-case checking), and instead replace kmalloc-family calls that allocate struct or union object instances: Single allocations: kmalloc(sizeof(TYPE), ...) are replaced with: kmalloc_obj(TYPE, ...) Array allocations: kmalloc_array(COUNT, sizeof(TYPE), ...) are replaced with: kmalloc_objs(TYPE, COUNT, ...) Flex array allocations: kmalloc(struct_size(PTR, FAM, COUNT), ...) are replaced with: kmalloc_flex(*PTR, FAM, COUNT, ...) (where TYPE may also be *VAR) The resulting allocations no longer return "void *", instead returning "TYPE *". Signed-off-by: Kees Cook <kees@kernel.org>
2026-02-12Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdmaLinus Torvalds
Pull rdma updates from Jason Gunthorpe: "Usual smallish cycle. The NFS biovec work to push it down into RDMA instead of indirecting through a scatterlist is pretty nice to see, been talked about for a long time now. - Various code improvements in irdma, rtrs, qedr, ocrdma, irdma, rxe - Small driver improvements and minor bug fixes to hns, mlx5, rxe, mana, mlx5, irdma - Robusness improvements in completion processing for EFA - New query_port_speed() verb to move past limited IBA defined speed steps - Support for SG_GAPS in rts and many other small improvements - Rare list corruption fix in iwcm - Better support different page sizes in rxe - Device memory support for mana - Direct bio vec to kernel MR for use by NFS-RDMA - QP rate limiting for bnxt_re - Remote triggerable NULL pointer crash in siw - DMA-buf exporter support for RDMA mmaps like doorbells" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (66 commits) RDMA/mlx5: Implement DMABUF export ops RDMA/uverbs: Add DMABUF object type and operations RDMA/uverbs: Support external FD uobjects RDMA/siw: Fix potential NULL pointer dereference in header processing RDMA/umad: Reject negative data_len in ib_umad_write IB/core: Extend rate limit support for RC QPs RDMA/mlx5: Support rate limit only for Raw Packet QP RDMA/bnxt_re: Report QP rate limit in debugfs RDMA/bnxt_re: Report packet pacing capabilities when querying device RDMA/bnxt_re: Add support for QP rate limiting MAINTAINERS: Drop RDMA files from Hyper-V section RDMA/uverbs: Add __GFP_NOWARN to ib_uverbs_unmarshall_recv() kmalloc svcrdma: use bvec-based RDMA read/write API RDMA/core: add rdma_rw_max_sge() helper for SQ sizing RDMA/core: add MR support for bvec-based RDMA operations RDMA/core: use IOVA-based DMA mapping for bvec RDMA operations RDMA/core: add bio_vec based RDMA read/write API RDMA/irdma: Use kvzalloc for paged memory DMA address array RDMA/rxe: Fix race condition in QP timer handlers RDMA/mana_ib: Add device‑memory support ...
2026-02-12Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsiLinus Torvalds
Pull SCSI updates from James Bottomley: "Usual driver updates (qla2xxx, mpi3mr, mpt3sas, ufs) plus assorted cleanups and fixes. The biggest core change is the massive code motion in the sd driver to remove forward declarations and the most significant change is to enumify the queuecommand return" * tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (78 commits) scsi: csiostor: Fix dereference of null pointer rn scsi: buslogic: Reduce stack usage scsi: ufs: host: mediatek: Require CONFIG_PM scsi: ufs: mediatek: Fix page faults in ufs_mtk_clk_scale() trace event scsi: smartpqi: Fix memory leak in pqi_report_phys_luns() scsi: mpi3mr: Make driver probing asynchronous scsi: ufs: core: Flush exception handling work when RPM level is zero scsi: efct: Use IRQF_ONESHOT and default primary handler scsi: ufs: core: Use a host-wide tagset in SDB mode scsi: qla2xxx: target: Add WQ_PERCPU to alloc_workqueue() users scsi: qla2xxx: Add WQ_PERCPU to alloc_workqueue() users scsi: qla4xxx: Add WQ_PERCPU to alloc_workqueue() users scsi: mpi3mr: Driver version update to 8.17.0.3.50 scsi: mpi3mr: Fixed the W=1 compilation warning scsi: mpi3mr: Record and report controller firmware faults scsi: mpi3mr: Update MPI Headers to revision 39 scsi: mpi3mr: Use negotiated link rate from DevicePage0 scsi: mpi3mr: Avoid redundant diag-fault resets scsi: mpi3mr: Rename log data save helper to reflect threaded/BH context scsi: mpi3mr: Add module parameter to control threaded IRQ polling ...
2026-02-12Merge tag 'mm-nonmm-stable-2026-02-12-10-48' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull non-MM updates from Andrew Morton: - "ocfs2: give ocfs2 the ability to reclaim suballocator free bg" saves disk space by teaching ocfs2 to reclaim suballocator block group space (Heming Zhao) - "Add ARRAY_END(), and use it to fix off-by-one bugs" adds the ARRAY_END() macro and uses it in various places (Alejandro Colomar) - "vmcoreinfo: support VMCOREINFO_BYTES larger than PAGE_SIZE" makes the vmcore code future-safe, if VMCOREINFO_BYTES ever exceeds the page size (Pnina Feder) - "kallsyms: Prevent invalid access when showing module buildid" cleans up kallsyms code related to module buildid and fixes an invalid access crash when printing backtraces (Petr Mladek) - "Address page fault in ima_restore_measurement_list()" fixes a kexec-related crash that can occur when booting the second-stage kernel on x86 (Harshit Mogalapalli) - "kho: ABI headers and Documentation updates" updates the kexec handover ABI documentation (Mike Rapoport) - "Align atomic storage" adds the __aligned attribute to atomic_t and atomic64_t definitions to get natural alignment of both types on csky, m68k, microblaze, nios2, openrisc and sh (Finn Thain) - "kho: clean up page initialization logic" simplifies the page initialization logic in kho_restore_page() (Pratyush Yadav) - "Unload linux/kernel.h" moves several things out of kernel.h and into more appropriate places (Yury Norov) - "don't abuse task_struct.group_leader" removes the usage of ->group_leader when it is "obviously unnecessary" (Oleg Nesterov) - "list private v2 & luo flb" adds some infrastructure improvements to the live update orchestrator (Pasha Tatashin) * tag 'mm-nonmm-stable-2026-02-12-10-48' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (107 commits) watchdog/hardlockup: simplify perf event probe and remove per-cpu dependency procfs: fix missing RCU protection when reading real_parent in do_task_stat() watchdog/softlockup: fix sample ring index wrap in need_counting_irqs() kcsan, compiler_types: avoid duplicate type issues in BPF Type Format kho: fix doc for kho_restore_pages() tests/liveupdate: add in-kernel liveupdate test liveupdate: luo_flb: introduce File-Lifecycle-Bound global state liveupdate: luo_file: Use private list list: add kunit test for private list primitives list: add primitives for private list manipulations delayacct: fix uapi timespec64 definition panic: add panic_force_cpu= parameter to redirect panic to a specific CPU netclassid: use thread_group_leader(p) in update_classid_task() RDMA/umem: don't abuse current->group_leader drm/pan*: don't abuse current->group_leader drm/amd: kill the outdated "Only the pthreads threading model is supported" checks drm/amdgpu: don't abuse current->group_leader android/binder: use same_thread_group(proc->tsk, current) in binder_mmap() android/binder: don't abuse current->group_leader kho: skip memoryless NUMA nodes when reserving scratch areas ...
2026-02-12Merge tag 'v7.0-rc-part1-ksmbd-and-smbdirect-fixes' of git://git.samba.org/ksmbdLinus Torvalds
Pull smb server and smbdirect updates from Steve French: - Fix tcp connection leak - Fix potential use after free when freeing multichannel - Fix locking problem in showing channel list - Locking improvement for tree connection - Fix infinite loop when signing errors - Add /proc interface for monitoring server state - Fixes to avoid mixing iWarp and InfiniBand/RoCEv1/RoCEv2 port ranges used for smbdirect - Fixes for smbdirect credit handling problems, these make the connections more reliable * tag 'v7.0-rc-part1-ksmbd-and-smbdirect-fixes' of git://git.samba.org/ksmbd: (32 commits) ksmbd: fix non-IPv6 build ksmbd: convert tree_conns_lock to rw_semaphore ksmbd: fix missing chann_lock while iterating session channel list ksmbd: add chann_lock to protect ksmbd_chann_list xarray smb: server: correct value for smb_direct_max_fragmented_recv_size smb: client: correct value for smbd_max_fragmented_recv_size smb: server: fix leak of active_num_conn in ksmbd_tcp_new_connection() ksmbd: add procfs interface for runtime monitoring and statistics ksmbd: fix infinite loop caused by next_smb2_rcv_hdr_off reset in error paths smb: server: make use of rdma_restrict_node_type() smb: client: make use of rdma_restrict_node_type() RDMA/core: introduce rdma_restrict_node_type() smb: client: let send_done handle a completion without IB_SEND_SIGNALED smb: client: let smbd_post_send_negotiate_req() use smbd_post_send() smb: client: fix last send credit problem causing disconnects smb: client: make use of smbdirect_socket.send_io.bcredits smb: client: use smbdirect_send_batch processing smb: client: introduce and use smbd_{alloc, free}_send_io() smb: client: split out smbd_ib_post_send() smb: client: port and use the wait_for_credits logic used by server ...
2026-02-11bnge/bng_re: Add a new HSIVikas Gupta
The HSI is shared between the firmware and the driver and is automatically generated. Add a new HSI for the BNGE driver. The current HSI refers to BNXT, which will become incompatible with ThorUltra devices as the BNGE driver adds more features. The BNGE driver will not use the HSI located in the bnxt folder. Also, add an HSI for ThorUltra RoCE driver. Changes in v3: - Fix in bng_roce_hsi.h reported by Jakub (AI review) https://lore.kernel.org/netdev/20260207051422.4181717-1-kuba@kernel.org/ - Add an entry in MAINTAINERS Signed-off-by: Vikas Gupta <vikas.gupta@broadcom.com> Signed-off-by: Siva Reddy Kallam <siva.kallam@broadcom.com> Reviewed-by: Bhargava Chenna Marreddy <bhargava.marreddy@broadcom.com> Link: https://patch.msgid.link/20260208172925.1861255-1-vikas.gupta@broadcom.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2026-02-08RDMA/mlx5: Implement DMABUF export opsYishai Hadas
Enable p2pdma on the mlx5 PCI device to allow DMABUF-based peer-to-peer DMA mappings. Add implementation of the mmap_get_pfns and pgoff_to_mmap_entry device operations required for DMABUF support in the mlx5 RDMA driver. The pgoff_to_mmap_entry operation converts a page offset to the corresponding rdma_user_mmap_entry by extracting the command and index from the offset and looking it up in the ucontext's mmap_xa. The mmap_get_pfns operation retrieves the physical address and length from the mmap entry and obtains the p2pdma provider for the underlying PCI device, which is needed for peer-to-peer DMA operations with DMABUFs. Signed-off-by: Yishai Hadas <yishaih@nvidia.com> Signed-off-by: Edward Srouji <edwards@nvidia.com> Link: https://patch.msgid.link/20260201-dmabuf-export-v3-3-da238b614fe3@nvidia.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2026-02-08RDMA/uverbs: Add DMABUF object type and operationsYishai Hadas
Expose DMABUF functionality to userspace through the uverbs interface, enabling InfiniBand/RDMA devices to export PCI based memory regions (e.g. device memory) as DMABUF file descriptors. This allows zero-copy sharing of RDMA memory with other subsystems that support the dma-buf framework. A new UVERBS_OBJECT_DMABUF object type and allocation method were introduced. During allocation, uverbs invokes the driver to supply the rdma_user_mmap_entry associated with the given page offset (pgoff). Based on the returned rdma_user_mmap_entry, uverbs requests the driver to provide the corresponding physical-memory details as well as the driver’s PCI provider information. Using this information, dma_buf_export() is called; if it succeeds, uobj->object is set to the underlying file pointer returned by the dma-buf framework. The file descriptor number follows the standard uverbs allocation flow, but the file pointer comes from the dma-buf subsystem, including its own fops and private data. When an mmap entry is removed, uverbs iterates over its associated DMABUFs, marks them as revoked, and calls dma_buf_move_notify() so that their importers are notified. The same procedure applies during the disassociate flow; final cleanup occurs when the application closes the file. Signed-off-by: Yishai Hadas <yishaih@nvidia.com> Signed-off-by: Edward Srouji <edwards@nvidia.com> Link: https://patch.msgid.link/20260201-dmabuf-export-v3-2-da238b614fe3@nvidia.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2026-02-08RDMA/uverbs: Support external FD uobjectsYishai Hadas
Add support for uobjects that wrap externally allocated file descriptors (FDs). In this mode, the FD number still follows the standard uverbs allocation flow, but the file pointer is allocated externally and has its own fops and private data. As a result, alloc_begin_fd_uobject() must handle cases where fd_type->fops is NULL, and both alloc_commit_fd_uobject() and alloc_abort_fd_uobject() must account for whether filp->private_data exists, since it is populated outside the standard uverbs flow. Signed-off-by: Yishai Hadas <yishaih@nvidia.com> Signed-off-by: Edward Srouji <edwards@nvidia.com> Link: https://patch.msgid.link/20260201-dmabuf-export-v3-1-da238b614fe3@nvidia.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2026-02-08RDMA/core: introduce rdma_restrict_node_type()Stefan Metzmacher
For smbdirect it required to use different ports depending on the RDMA protocol. E.g. for iWarp 5445 is needed (as tcp port 445 already used by the raw tcp transport for SMB), while InfiniBand, RoCEv1 and RoCEv2 use port 445, as they use an independent port range (even for RoCEv2, which uses udp port 4791 itself). Currently ksmbd is not able to function correctly at all if the system has iWarp (RDMA_NODE_RNIC) interface(s) and any InfiniBand, RoCEv1 and/or RoCEv2 interface(s) at the same time. And cifs.ko uses 5445 with a fallback to 445, which means depending on the available interfaces, it tries 5445 in the RoCE range or may tries iWarp with 445 as a fallback. This leads to strange error messages and strange network captures. To avoid these problems they will be able to use rdma_restrict_node_type(RDMA_NODE_RNIC) before trying port 5445 and rdma_restrict_node_type(RDMA_NODE_IB_CA) before trying port 445. It means we'll get early -ENODEV early from rdma_resolve_addr() without any network traffic and timeouts. This is designed to be called before calling any of rdma_bind_addr(), rdma_resolve_addr() or rdma_listen(). Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Steve French <smfrench@gmail.com> Cc: Tom Talpey <tom@talpey.com> Cc: Long Li <longli@microsoft.com> Cc: linux-rdma@vger.kernel.org Cc: linux-cifs@vger.kernel.org Cc: samba-technical@lists.samba.org Acked-by: Leon Romanovsky <leon@kernel.org> Acked-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Stefan Metzmacher <metze@samba.org> Signed-off-by: Steve French <stfrench@microsoft.com>
2026-02-05net/mlx5: Fix 1600G link mode enum namingYael Chemla
Rename TAUI/TBASE to GAUI/GBASE in 1600G link mode identifier and its usage in ethtool and link-info tables. Reported-by: Dawid Osuchowski <dawid.osuchowski@linux.intel.com> Signed-off-by: Yael Chemla <ychemla@nvidia.com> Reviewed-by: Shahar Shitrit <shshitrit@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Dawid Osuchowski <dawid.osuchowski@linux.intel.com> Reported-by: Dawid Osuchowski <dawid.osuchowski@linux.intel.com> Signed-off-by: Yael Chemla <ychemla@nvidia.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Link: https://patch.msgid.link/20260204194324.1723534-1-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-02-05RDMA/siw: Fix potential NULL pointer dereference in header processingYunJe Shin
If siw_get_hdr() returns -EINVAL before set_rx_fpdu_context(), qp->rx_fpdu can be NULL. The error path in siw_tcp_rx_data() dereferences qp->rx_fpdu->more_ddp_segs without checking, which may lead to a NULL pointer deref. Only check more_ddp_segs when rx_fpdu is present. KASAN splat: [ 101.384271] KASAN: null-ptr-deref in range [0x00000000000000c0-0x00000000000000c7] [ 101.385869] RIP: 0010:siw_tcp_rx_data+0x13ad/0x1e50 Fixes: 8b6a361b8c48 ("rdma/siw: receive path") Signed-off-by: YunJe Shin <ioerts@kookmin.ac.kr> Link: https://patch.msgid.link/20260204092546.489842-1-ioerts@kookmin.ac.kr Acked-by: Bernard Metzler <bernard.metzler@linux.dev> Signed-off-by: Leon Romanovsky <leon@kernel.org>
2026-02-05RDMA/umad: Reject negative data_len in ib_umad_writeYunJe Shin
ib_umad_write computes data_len from user-controlled count and the MAD header sizes. With a mismatched user MAD header size and RMPP header length, data_len can become negative and reach ib_create_send_mad(). This can make the padding calculation exceed the segment size and trigger an out-of-bounds memset in alloc_send_rmpp_list(). Add an explicit check to reject negative data_len before creating the send buffer. KASAN splat: [ 211.363464] BUG: KASAN: slab-out-of-bounds in ib_create_send_mad+0xa01/0x11b0 [ 211.364077] Write of size 220 at addr ffff88800c3fa1f8 by task spray_thread/102 [ 211.365867] ib_create_send_mad+0xa01/0x11b0 [ 211.365887] ib_umad_write+0x853/0x1c80 Fixes: 2be8e3ee8efd ("IB/umad: Add P_Key index support") Signed-off-by: YunJe Shin <ioerts@kookmin.ac.kr> Link: https://patch.msgid.link/20260203100628.1215408-1-ioerts@kookmin.ac.kr Signed-off-by: Leon Romanovsky <leon@kernel.org>
2026-02-03RDMA/umem: don't abuse current->group_leaderOleg Nesterov
Cleanup and preparation to simplify the next changes. Use current->tgid instead of current->group_leader->pid. Link: https://lkml.kernel.org/r/aXY_2JIhCeGAYC0r@redhat.com Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Leon Romanovsky <leon@kernel.org> Cc: Alice Ryhl <aliceryhl@google.com> Cc: Boris Brezillon <boris.brezillon@collabora.com> Cc: Christan König <christian.koenig@amd.com> Cc: David S. Miller <davem@davemloft.net> Cc: Eric Dumazet <edumazet@google.com> Cc: Felix Kuehling <felix.kuehling@amd.com> Cc: Jakub Kicinski <kuba@kernel.org> Cc: Paolo Abeni <pabeni@redhat.com> Cc: Simon Horman <horms@kernel.org> Cc: Steven Price <steven.price@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>