summaryrefslogtreecommitdiff
path: root/drivers/infiniband/core
diff options
context:
space:
mode:
authorChuck Lever <chuck.lever@oracle.com>2026-03-13 15:41:58 -0400
committerGreg Kroah-Hartman <gregkh@linuxfoundation.org>2026-04-02 13:25:32 +0200
commita068637274187f5ab7672cfb6edaa562994adc0a (patch)
tree6157b10e627ef9a8413bc0b7511b333a5785f131 /drivers/infiniband/core
parent460a7b84516db0817fa92e20a0b79a7142540acc (diff)
RDMA/rw: Fall back to direct SGE on MR pool exhaustion
[ Upstream commit 00da250c21b074ea9494c375d0117b69e5b1d0a4 ] When IOMMU passthrough mode is active, ib_dma_map_sgtable_attrs() produces no coalescing: each scatterlist page maps 1:1 to a DMA entry, so sgt.nents equals the raw page count. A 1 MB transfer yields 256 DMA entries. If that count exceeds the device's max_sgl_rd threshold (an optimization hint from mlx5 firmware), rdma_rw_io_needs_mr() steers the operation into the MR registration path. Each such operation consumes one or more MRs from a pool sized at max_rdma_ctxs -- roughly one MR per concurrent context. Under write-intensive workloads that issue many concurrent RDMA READs, the pool is rapidly exhausted, ib_mr_pool_get() returns NULL, and rdma_rw_init_one_mr() returns -EAGAIN. Upper layer protocols treat this as a fatal DMA mapping failure and tear down the connection. The max_sgl_rd check is a performance optimization, not a correctness requirement: the device can handle large SGE counts via direct posting, just less efficiently than with MR registration. When the MR pool cannot satisfy a request, falling back to the direct SGE (map_wrs) path avoids the connection reset while preserving the MR optimization for the common case where pool resources are available. Add a fallback in rdma_rw_ctx_init() so that -EAGAIN from rdma_rw_init_mr_wrs() triggers direct SGE posting instead of propagating the error. iWARP devices, which mandate MR registration for RDMA READs, and force_mr debug mode continue to treat -EAGAIN as terminal. Fixes: 00bd1439f464 ("RDMA/rw: Support threshold for registration vs scattering to local pages") Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://patch.msgid.link/20260313194201.5818-2-cel@kernel.org Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
Diffstat (limited to 'drivers/infiniband/core')
-rw-r--r--drivers/infiniband/core/rw.c21
1 files changed, 18 insertions, 3 deletions
diff --git a/drivers/infiniband/core/rw.c b/drivers/infiniband/core/rw.c
index 2522ff1cc462..49fbfe1cef68 100644
--- a/drivers/infiniband/core/rw.c
+++ b/drivers/infiniband/core/rw.c
@@ -326,14 +326,29 @@ int rdma_rw_ctx_init(struct rdma_rw_ctx *ctx, struct ib_qp *qp, u32 port_num,
if (rdma_rw_io_needs_mr(qp->device, port_num, dir, sg_cnt)) {
ret = rdma_rw_init_mr_wrs(ctx, qp, port_num, sg, sg_cnt,
sg_offset, remote_addr, rkey, dir);
- } else if (sg_cnt > 1) {
+ /*
+ * If MR init succeeded or failed for a reason other
+ * than pool exhaustion, that result is final.
+ *
+ * Pool exhaustion (-EAGAIN) from the max_sgl_rd
+ * optimization is recoverable: fall back to
+ * direct SGE posting. iWARP and force_mr require
+ * MRs unconditionally, so -EAGAIN is terminal.
+ */
+ if (ret != -EAGAIN ||
+ rdma_protocol_iwarp(qp->device, port_num) ||
+ unlikely(rdma_rw_force_mr))
+ goto out;
+ }
+
+ if (sg_cnt > 1)
ret = rdma_rw_init_map_wrs(ctx, qp, sg, sg_cnt, sg_offset,
remote_addr, rkey, dir);
- } else {
+ else
ret = rdma_rw_init_single_wr(ctx, qp, sg, sg_offset,
remote_addr, rkey, dir);
- }
+out:
if (ret < 0)
goto out_unmap_sg;
return ret;