<feed xmlns='http://www.w3.org/2005/Atom'>
<title>kernel/drivers/nvme/target/rdma.c, branch linux-6.9.y</title>
<subtitle>Hosts the 0x221E linux distro kernel.</subtitle>
<id>https://universe.0xinfinity.dev/distro/kernel/atom?h=linux-6.9.y</id>
<link rel='self' href='https://universe.0xinfinity.dev/distro/kernel/atom?h=linux-6.9.y'/>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/'/>
<updated>2024-05-08T13:17:01Z</updated>
<entry>
<title>nvmet-rdma: fix possible bad dereference when freeing rsps</title>
<updated>2024-05-08T13:17:01Z</updated>
<author>
<name>Sagi Grimberg</name>
<email>sagi@grimberg.me</email>
</author>
<published>2024-05-08T07:53:06Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=73964c1d07c054376f1b32a62548571795159148'/>
<id>urn:sha1:73964c1d07c054376f1b32a62548571795159148</id>
<content type='text'>
It is possible that the host connected and saw a cm established
event and started sending nvme capsules on the qp, while the
ctrl did not yet see an established event. This is why the
rsp_wait_list exists (for async handling of these cmds, we move
them to a pending list).

Furthermore, it is possible that the ctrl cm times out, resulting
in a connect-error cm event. In this case we hit a bad deref [1]
because in nvmet_rdma_free_rsps we assume that all the responses
are in the free list.

We are freeing the cmds array anyway, so don't even bother to
remove the rsp from the free_list. It is also guaranteed that we
are not racing anything when we are releasing the queue, so no
other context accessing this array should be running.
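
As a minimal sketch of the resulting release path (names follow the
surrounding driver; this is illustrative, not the verbatim patch):

  /* No list_del() on rsp-&gt;free_list: the whole rsps array is
   * freed immediately afterwards, so the list is irrelevant. */
  for (i = 0; i &lt; nr_rsps; i++)
          nvmet_rdma_free_rsp(ndev, &amp;queue-&gt;rsps[i]);
  kfree(queue-&gt;rsps);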

[1]:
--
Workqueue: nvmet-free-wq nvmet_rdma_free_queue_work [nvmet_rdma]
[...]
pc : nvmet_rdma_free_rsps+0x78/0xb8 [nvmet_rdma]
lr : nvmet_rdma_free_queue_work+0x88/0x120 [nvmet_rdma]
 Call trace:
 nvmet_rdma_free_rsps+0x78/0xb8 [nvmet_rdma]
 nvmet_rdma_free_queue_work+0x88/0x120 [nvmet_rdma]
 process_one_work+0x1ec/0x4a0
 worker_thread+0x48/0x490
 kthread+0x158/0x160
 ret_from_fork+0x10/0x18
--

Signed-off-by: Sagi Grimberg &lt;sagi@grimberg.me&gt;
Reviewed-by: Christoph Hellwig &lt;hch@lst.de&gt;
Signed-off-by: Keith Busch &lt;kbusch@kernel.org&gt;
</content>
</entry>
<entry>
<title>nvmet-rdma: remove NVMET_RDMA_REQ_INVALIDATE_RKEY flag</title>
<updated>2024-03-21T17:46:53Z</updated>
<author>
<name>Guixin Liu</name>
<email>kanie@linux.alibaba.com</email>
</author>
<published>2024-03-20T11:08:21Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=910934da9444dbb102294796481ab05e4419d311'/>
<id>urn:sha1:910934da9444dbb102294796481ab05e4419d311</id>
<content type='text'>
We can simply check invalidate_rkey directly instead of adding a flag.
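
A sketch of the before/after shape (illustrative, not the verbatim
patch): a non-zero rkey already encodes that an invalidate is needed,
so the extra flag bit is redundant.

  /* Before: a flag that had to be kept in sync with the rkey. */
  if (rsp-&gt;flags &amp; NVMET_RDMA_REQ_INVALIDATE_RKEY)
          /* ... chain an invalidate work request ... */;
  /* After: the rkey itself carries the information. */
  if (rsp-&gt;invalidate_rkey)
          /* ... chain an invalidate work request ... */;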

Signed-off-by: Guixin Liu &lt;kanie@linux.alibaba.com&gt;
Reviewed-by: Chaitanya Kulkarni &lt;kch@nvidia.com&gt;
Reviewed-by: Sagi Grimberg &lt;sagi@grimberg.me&gt;
Signed-off-by: Keith Busch &lt;kbusch@kernel.org&gt;
</content>
</entry>
<entry>
<title>nvmet-rdma: set max_queue_size for RDMA transport</title>
<updated>2024-03-02T23:18:08Z</updated>
<author>
<name>Max Gurtovoy</name>
<email>mgurtovoy@nvidia.com</email>
</author>
<published>2024-01-23T14:40:32Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=f096ba3286f5e773c496cf81667d01f2e8a2a37b'/>
<id>urn:sha1:f096ba3286f5e773c496cf81667d01f2e8a2a37b</id>
<content type='text'>
A new port configuration was added to set max_queue_size. Clamp user
configuration to RDMA transport limits.

Increase the maximal queue size of RDMA controllers from 128 to 256
(the default size stays at 128, the same as before).
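
A rough sketch of the clamping (the field and constant names here
are assumptions for illustration):

  /* Clamp the user-configured size to what RDMA can support. */
  if (!port-&gt;max_queue_size)
          port-&gt;max_queue_size = NVME_RDMA_DEFAULT_QUEUE_SIZE;
  else
          port-&gt;max_queue_size = min(port-&gt;max_queue_size,
                                     NVME_RDMA_MAX_QUEUE_SIZE);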

Reviewed-by: Christoph Hellwig &lt;hch@lst.de&gt;
Reviewed-by: Sagi Grimberg &lt;sagi@grimberg.me&gt;
Reviewed-by: Israel Rukshin &lt;israelr@nvidia.com&gt;
Signed-off-by: Max Gurtovoy &lt;mgurtovoy@nvidia.com&gt;
Signed-off-by: Keith Busch &lt;kbusch@kernel.org&gt;
</content>
</entry>
<entry>
<title>nvme-rdma: introduce NVME_RDMA_MAX_METADATA_QUEUE_SIZE definition</title>
<updated>2024-03-02T23:18:08Z</updated>
<author>
<name>Max Gurtovoy</name>
<email>mgurtovoy@nvidia.com</email>
</author>
<published>2024-01-23T14:40:29Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=36144964062b8676ee64281852de2a2c1b193aca'/>
<id>urn:sha1:36144964062b8676ee64281852de2a2c1b193aca</id>
<content type='text'>
This definition will be used by controllers that are configured with
metadata support. For now, both regular and metadata controllers have
the same maximal queue size, but a later commit will increase the
maximal queue size for regular RDMA controllers to 256.
We'll keep the maximal queue size for metadata controllers at 128,
since metadata operations need more resources and 128 is the optimal
size found for metadata controllers based on testing.
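
As a sketch, the definitions described above amount to (values as
stated in the text; exact spelling of the names is per this series):

  #define NVME_RDMA_MAX_QUEUE_SIZE           256
  #define NVME_RDMA_MAX_METADATA_QUEUE_SIZE  128
  #define NVME_RDMA_DEFAULT_QUEUE_SIZE       128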

Reviewed-by: Christoph Hellwig &lt;hch@lst.de&gt;
Reviewed-by: Sagi Grimberg &lt;sagi@grimberg.me&gt;
Reviewed-by: Israel Rukshin &lt;israelr@nvidia.com&gt;
Signed-off-by: Max Gurtovoy &lt;mgurtovoy@nvidia.com&gt;
Signed-off-by: Keith Busch &lt;kbusch@kernel.org&gt;
</content>
</entry>
<entry>
<title>nvmet: add module description to stop warnings</title>
<updated>2024-01-24T15:25:04Z</updated>
<author>
<name>Chaitanya Kulkarni</name>
<email>kch@nvidia.com</email>
</author>
<published>2024-01-23T22:13:41Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=41951f83ef9044e906e11f5ea7db35a30dc9f581'/>
<id>urn:sha1:41951f83ef9044e906e11f5ea7db35a30dc9f581</id>
<content type='text'>
Add MODULE_DESCRIPTION() in order to remove warnings &amp; get a clean build:-

WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/nvme/target/nvmet.o
WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/nvme/target/nvme-loop.o
WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/nvme/target/nvmet-rdma.o
WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/nvme/target/nvmet-fc.o
WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/nvme/target/nvme-fcloop.o
WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/nvme/target/nvmet-tcp.o
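
Each module gains a single line of this form (the description string
below is illustrative, not necessarily the exact wording used):

  MODULE_DESCRIPTION("NVMe target RDMA transport driver");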

Signed-off-by: Chaitanya Kulkarni &lt;kch@nvidia.com&gt;
Reviewed-by: Sagi Grimberg &lt;sagi@grimberg.me&gt;
Reviewed-by: Christoph Hellwig &lt;hch@lst.de&gt;
Signed-off-by: Keith Busch &lt;kbusch@kernel.org&gt;
</content>
</entry>
<entry>
<title>nvmet-rdma: avoid circular locking dependency on install_queue()</title>
<updated>2024-01-10T21:27:45Z</updated>
<author>
<name>Hannes Reinecke</name>
<email>hare@suse.de</email>
</author>
<published>2023-12-08T12:53:21Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=31deaeb11ba7a885116c9c30892b9f763c04d59c'/>
<id>urn:sha1:31deaeb11ba7a885116c9c30892b9f763c04d59c</id>
<content type='text'>
nvmet_rdma_install_queue() is driven from the -&gt;io_work workqueue
function, but will call flush_workqueue(), which might trigger
-&gt;release_work(), which itself calls flush_work on -&gt;io_work.

To avoid that, check for pending queues in disconnecting status,
and return 'controller busy' once we reach a certain threshold.
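
A rough sketch of the check (identifiers and threshold are
illustrative, not the verbatim patch):

  /* Refuse new queues while too many are still disconnecting,
   * rather than flushing a workqueue from inside -&gt;io_work. */
  list_for_each_entry(q, &amp;nvmet_rdma_queue_list, queue_list)
          if (q-&gt;state == NVMET_RDMA_Q_DISCONNECTING)
                  pending++;
  if (pending &gt; NVMET_RDMA_BACKLOG)
          return NVME_SC_CONNECT_CTRL_BUSY;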

Signed-off-by: Hannes Reinecke &lt;hare@suse.de&gt;
Tested-by: Shin'ichiro Kawasaki &lt;shinichiro.kawasaki@wdc.com&gt;
Reviewed-by: Sagi Grimberg &lt;sagi@grimberg.me&gt;
Signed-off-by: Keith Busch &lt;kbusch@kernel.org&gt;
</content>
</entry>
<entry>
<title>RDMA/core: introduce ib_dma_pci_p2p_dma_supported()</title>
<updated>2022-07-26T11:28:07Z</updated>
<author>
<name>Logan Gunthorpe</name>
<email>logang@deltatee.com</email>
</author>
<published>2022-07-08T16:51:02Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=495758bb1a72316094ee854dc51f667ad3db29b5'/>
<id>urn:sha1:495758bb1a72316094ee854dc51f667ad3db29b5</id>
<content type='text'>
Introduce the helper function ib_dma_pci_p2p_dma_supported() to check
if a given ib_device can be used in P2PDMA transfers. This ensures
the ib_device is not using virt_dma and also that the underlying
dma_device supports P2PDMA.
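
The helper boils down to roughly the following (a sketch consistent
with the description above):

  static inline bool ib_dma_pci_p2p_dma_supported(struct ib_device *dev)
  {
          /* virt_dma devices cannot take part in P2PDMA transfers... */
          if (ib_uses_virt_dma(dev))
                  return false;
          /* ...and the underlying DMA device must support P2PDMA. */
          return dma_pci_p2pdma_supported(dev-&gt;dma_device);
  }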

Use the new helper in nvme-rdma to replace the existing check for
ib_uses_virt_dma(). Adding the dma_pci_p2pdma_supported() check allows
switching away from pci_p2pdma_[un]map_sg().

Signed-off-by: Logan Gunthorpe &lt;logang@deltatee.com&gt;
Reviewed-by: Jason Gunthorpe &lt;jgg@nvidia.com&gt;
Reviewed-by: Max Gurtovoy &lt;mgurtovoy@nvidia.com&gt;
Reviewed-by: Christoph Hellwig &lt;hch@lst.de&gt;
Signed-off-by: Christoph Hellwig &lt;hch@lst.de&gt;
</content>
</entry>
<entry>
<title>RDMA: Split kernel-only global device caps from uverbs device caps</title>
<updated>2022-04-06T18:02:13Z</updated>
<author>
<name>Jason Gunthorpe</name>
<email>jgg@nvidia.com</email>
</author>
<published>2022-04-04T15:26:42Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=e945c653c8e972d1b81a88e474d79f801b60213a'/>
<id>urn:sha1:e945c653c8e972d1b81a88e474d79f801b60213a</id>
<content type='text'>
Split out flags from ib_device::device_cap_flags that are only used
internally to the kernel into kernel_cap_flags that is not part of the
uapi. This limits the device_cap_flags to being the same bitmap that will
be copied to userspace.

This cleanly splits out the uverbs flags from the kernel flags to avoid
confusion in the flags bitmap.

Add some short comments describing which each of the kernel flags is
connected to. Remove unused kernel flags.
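
After the split, kernel-internal checks read against the new bitmap,
along these lines (the flag shown is one example):

  /* Kernel-only capabilities now live in kernel_cap_flags, leaving
   * device_cap_flags identical to what uverbs copies to userspace. */
  if (dev-&gt;attrs.kernel_cap_flags &amp; IBK_SG_GAPS_REG)
          /* ... device supports SG lists with gaps ... */;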

Link: https://lore.kernel.org/r/0-v2-22c19e565eef+139a-kern_caps_jgg@nvidia.com
Reviewed-by: Christoph Hellwig &lt;hch@lst.de&gt;
Reviewed-by: Max Gurtovoy &lt;mgurtovoy@nvidia.com&gt;
Signed-off-by: Jason Gunthorpe &lt;jgg@nvidia.com&gt;
</content>
</entry>
<entry>
<title>nvmet: use a private workqueue instead of the system workqueue</title>
<updated>2022-03-29T07:29:04Z</updated>
<author>
<name>Sagi Grimberg</name>
<email>sagi@grimberg.me</email>
</author>
<published>2022-03-21T11:57:27Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=8832cf922151e9dfa2821736beb0ae2dd3968b6e'/>
<id>urn:sha1:8832cf922151e9dfa2821736beb0ae2dd3968b6e</id>
<content type='text'>
Any attempt to flush kernel-global WQs has the possibility of
deadlock, so we should simply stop using them. Instead, introduce
nvmet_wq, a generic nvmet workqueue for work elements that don't
explicitly require a dedicated workqueue (by the mere fact that
they are using the system_wq).

Changes were done using the following replaces:

 - s/schedule_work(/queue_work(nvmet_wq, /g
 - s/schedule_delayed_work(/queue_delayed_work(nvmet_wq, /g
 - s/flush_scheduled_work()/flush_workqueue(nvmet_wq)/g
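
The workqueue itself is allocated once at init, roughly like so (the
WQ flags shown are an assumption):

  /* A private WQ, so nvmet never has to flush a kernel-global one. */
  nvmet_wq = alloc_workqueue("nvmet-wq", WQ_MEM_RECLAIM, 0);
  if (!nvmet_wq)
          return -ENOMEM;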

Reported-by: Tetsuo Handa &lt;penguin-kernel@I-love.SAKURA.ne.jp&gt;
Signed-off-by: Sagi Grimberg &lt;sagi@grimberg.me&gt;
Reviewed-by: Chaitanya Kulkarni &lt;kch@nvidia.com&gt;
Signed-off-by: Christoph Hellwig &lt;hch@lst.de&gt;
</content>
</entry>
<entry>
<title>nvmet-rdma: fix kernel-doc warning for nvmet_rdma_device_removal</title>
<updated>2022-03-14T07:43:06Z</updated>
<author>
<name>Chaitanya Kulkarni</name>
<email>kch@nvidia.com</email>
</author>
<published>2022-03-09T05:59:27Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=a8adf0cddc230be98da0e68bea846e28686dcb60'/>
<id>urn:sha1:a8adf0cddc230be98da0e68bea846e28686dcb60</id>
<content type='text'>
This fixes the following kernel-doc warning:-

drivers/nvme/target/rdma.c:1722: warning: expecting prototype for nvme_rdma_device_removal(). Prototype was for nvmet_rdma_device_removal() instead
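
The fix is simply renaming the function in the kernel-doc comment to
match the real symbol, i.e. along these lines (surrounding comment
text elided):

  - * nvme_rdma_device_removal() - ...
  + * nvmet_rdma_device_removal() - ...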

Signed-off-by: Chaitanya Kulkarni &lt;kch@nvidia.com&gt;
Signed-off-by: Christoph Hellwig &lt;hch@lst.de&gt;
</content>
</entry>
</feed>
