kernel/net/sunrpc/xprtrdma/verbs.c, branch linux-rolling-lts

xprtrdma: Decrement re_receiving on the early exit paths

2026-03-19T15:08:12Z

[ Upstream commit 7b6275c80a0c81c5f8943272292dfe67730ce849 ] In the event that rpcrdma_post_recvs() fails to create a work request (due to memory allocation failure, say) or otherwise exits early, we should decrement ep->re_receiving before returning. Otherwise we will hang in rpcrdma_xprt_drain() as re_receiving will never reach zero and the completion will never be triggered. On a system with high memory pressure, this can appear as the following hung task: INFO: task kworker/u385:17:8393 blocked for more than 122 seconds. Tainted: G S E 6.19.0 #3 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. task:kworker/u385:17 state:D stack:0 pid:8393 tgid:8393 ppid:2 task_flags:0x4248060 flags:0x00080000 Workqueue: xprtiod xprt_autoclose [sunrpc] Call Trace: __schedule+0x48b/0x18b0 ? ib_post_send_mad+0x247/0xae0 [ib_core] schedule+0x27/0xf0 schedule_timeout+0x104/0x110 __wait_for_common+0x98/0x180 ? __pfx_schedule_timeout+0x10/0x10 wait_for_completion+0x24/0x40 rpcrdma_xprt_disconnect+0x444/0x460 [rpcrdma] xprt_rdma_close+0x12/0x40 [rpcrdma] xprt_autoclose+0x5f/0x120 [sunrpc] process_one_work+0x191/0x3e0 worker_thread+0x2e3/0x420 ? __pfx_worker_thread+0x10/0x10 kthread+0x10d/0x230 ? __pfx_kthread+0x10/0x10 ret_from_fork+0x273/0x2b0 ? __pfx_kthread+0x10/0x10 ret_from_fork_asm+0x1a/0x30 Fixes: 15788d1d1077 ("xprtrdma: Do not refresh Receive Queue while it is draining") Signed-off-by: Eric Badger Reviewed-by: Chuck Lever Signed-off-by: Anna Schumaker Signed-off-by: Sasha Levin

xprtrdma: Remove temp allocation of rpcrdma_rep objects

2024-07-08T17:47:24Z

The original code was designed so that most calls to rpcrdma_rep_create() would occur on the NUMA node that the device preferred. There are a few cases where that's not possible, so those reps are marked as temporary. However, we have the device (and its preferred node) already in rpcrdma_rep_create(), so let's use that to guarantee the memory is allocated from the correct node. Signed-off-by: Chuck Lever Reviewed-by: Sagi Grimberg Signed-off-by: Anna Schumaker

xprtrdma: Handle device removal outside of the CM event handler

2024-07-08T17:47:24Z

Wait for all disconnects to complete to ensure the transport has divested all of its hardware resources before the underlying RDMA device can be removed. Signed-off-by: Chuck Lever Reviewed-by: Sagi Grimberg Signed-off-by: Anna Schumaker

xprtrdma: Fix rpcrdma_reqs_reset()

2024-07-08T17:47:24Z

Avoid FastReg operations getting MW_BIND_ERR after a reconnect. rpcrdma_reqs_reset() is called on transport tear-down to get each rpcrdma_req back into a clean state. MRs on req->rl_registered are waiting for a FastReg, are already registered, or are waiting for invalidation. If the transport is being torn down when reqs_reset() is called, the matching LocalInv might never be posted. That leaves these MR registered /and/ on req->rl_free_mrs, where they can be re-used for the next connection. Since xprtrdma does not keep specific track of the MR state, it's not possible to know what state these MRs are in, so the only safe thing to do is release them immediately. Fixes: 5de55ce951a1 ("xprtrdma: Release in-flight MRs on disconnect") Signed-off-by: Chuck Lever Signed-off-by: Anna Schumaker

xprtrdma: removed asm-generic headers from verbs.c

2024-07-08T14:55:39Z

asm-generic/barrier.h and asm/bitops.h are already brought into the file through the header linux/sunrpc/svc_rdma.h and the file can still be built with their removal. They have been replaced with the preferred linux/bitops.h and asm/barrier.h to remove the need for the asm-generic header. Suggested-by: Al Viro Signed-off-by: Tanzir Hasan Acked-by: Chuck Lever Signed-off-by: Anna Schumaker

rpcrdma: fix handling for RDMA_CM_EVENT_DEVICE_REMOVAL

2024-05-20T15:37:15Z

Under the scenario of IB device bonding, when bringing down one of the ports, or all ports, we saw xprtrdma entering a non-recoverable state where it is not even possible to complete the disconnect and shut it down the mount, requiring a reboot. Following debug, we saw that transport connect never ended after receiving the RDMA_CM_EVENT_DEVICE_REMOVAL callback. The DEVICE_REMOVAL callback is irrespective of whether the CM_ID is connected, and ESTABLISHED may not have happened. So need to work with each of these states accordingly. Fixes: 2acc5cae2923 ('xprtrdma: Prevent dereferencing r_xprt->rx_ep after it is freed') Cc: Sagi Grimberg Signed-off-by: Dan Aloni Reviewed-by: Sagi Grimberg Reviewed-by: Chuck Lever Signed-off-by: Trond Myklebust

rpcrdma: Introduce a simple cid tracepoint class

2024-01-07T22:54:27Z

De-duplicate some code, making it easier to add new tracepoints that report only a completion ID. Signed-off-by: Chuck Lever

xprtrdma: Remap Receive buffers after a reconnect

2023-08-19T14:26:29Z

On server-initiated disconnect, rpcrdma_xprt_disconnect() was DMA- unmapping the Receive buffers, but rpcrdma_post_recvs() neglected to remap them after a new connection had been established. The result was immediate failure of the new connection with the Receives flushing with LOCAL_PROT_ERR. Fixes: 671c450b6fe0 ("xprtrdma: Fix oops in Receive handler after device removal") Signed-off-by: Chuck Lever Signed-off-by: Trond Myklebust

xprtrdma: Fix regbuf data not freed in rpcrdma_req_create()

2022-12-06T17:19:48Z

If rdma receive buffer allocate failed, should call rpcrdma_regbuf_free() to free the send buffer, otherwise, the buffer data will be leaked. Fixes: bb93a1ae2bf4 ("xprtrdma: Allocate req's regbufs at xprt create time") Signed-off-by: Zhang Xiaoxu Signed-off-by: Trond Myklebust

xprtrdma: Prevent memory allocations from driving a reclaim

2022-10-05T19:47:17Z

Many memory allocations that xprtrdma does can fail safely. Let's use this fact to avoid some potential deadlocks: Replace GFP_KERNEL with GFP flags that do not try hard to acquire memory. Signed-off-by: Chuck Lever Signed-off-by: Anna Schumaker