summaryrefslogtreecommitdiff
path: root/drivers/block
AgeCommit message (Collapse)Author
2023-06-12block: use the holder as indication for exclusive opensChristoph Hellwig
The current interface for exclusive opens is rather confusing as it requires both the FMODE_EXCL flag and a holder. Remove the need to pass FMODE_EXCL and just key off the exclusive open off a non-NULL holder. For blkdev_put this requires adding the holder argument, which provides better debug checking that only the holder actually releases the hold, but at the same time allows removing the now superfluous mode argument. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Acked-by: Christian Brauner <brauner@kernel.org> Acked-by: David Sterba <dsterba@suse.com> [btrfs] Acked-by: Jack Wang <jinpu.wang@ionos.com> [rnbd] Link: https://lore.kernel.org/r/20230608110258.189493-16-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-06-12rnbd-srv: don't pass a holder for non-exclusive blkdev_get_by_pathChristoph Hellwig
Passing a holder to blkdev_get_by_path when FMODE_EXCL isn't set doesn't make sense, so pass NULL instead. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Acked-by: Christian Brauner <brauner@kernel.org> Acked-by: Jack Wang <jinpu.wang@ionos.com> Link: https://lore.kernel.org/r/20230608110258.189493-14-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-06-12block: remove the unused mode argument to ->releaseChristoph Hellwig
The mode argument to the ->release block_device_operation is never used, so remove it. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Acked-by: Christian Brauner <brauner@kernel.org> Acked-by: Jack Wang <jinpu.wang@ionos.com> [rnbd] Link: https://lore.kernel.org/r/20230608110258.189493-10-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-06-12block: pass a gendisk to ->openChristoph Hellwig
->open is only called on the whole device. Make that explicit by passing a gendisk instead of the block_device. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Acked-by: Christian Brauner <brauner@kernel.org> Acked-by: Jack Wang <jinpu.wang@ionos.com> [rnbd] Link: https://lore.kernel.org/r/20230608110258.189493-9-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-06-12block: pass a gendisk on bdev_check_media_changeChristoph Hellwig
bdev_check_media_change should only ever be called for the whole device. Pass a gendisk to make that explicit and rename the function to disk_check_media_change. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Acked-by: Christian Brauner <brauner@kernel.org> Link: https://lore.kernel.org/r/20230608110258.189493-8-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-06-11block/rnbd-srv: make process_msg_sess_info returns voidGuoqing Jiang
Change the return type to void given it always returns 0. Acked-by: Jack Wang <jinpu.wang@ionos.com> Signed-off-by: Guoqing Jiang <guoqing.jiang@linux.dev> Link: https://lore.kernel.org/r/20230524070026.2932-9-guoqing.jiang@linux.dev Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-06-11block/rnbd-srv: init err earlier in rnbd_srv_init_moduleGuoqing Jiang
With this, we can remove several lines of code. Acked-by: Jack Wang <jinpu.wang@ionos.com> Signed-off-by: Guoqing Jiang <guoqing.jiang@linux.dev> Link: https://lore.kernel.org/r/20230524070026.2932-8-guoqing.jiang@linux.dev Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-06-11block/rnbd-srv: init ret with 0 instead of -EPERMGuoqing Jiang
Let's always set errno after pr_err which is consistent with default case. Acked-by: Jack Wang <jinpu.wang@ionos.com> Signed-off-by: Guoqing Jiang <guoqing.jiang@linux.dev> Link: https://lore.kernel.org/r/20230524070026.2932-7-guoqing.jiang@linux.dev Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-06-11block/rnbd-srv: rename one member in rnbd_srv_devGuoqing Jiang
It actually represents the name of rnbd_srv_dev. Acked-by: Jack Wang <jinpu.wang@ionos.com> Signed-off-by: Guoqing Jiang <guoqing.jiang@linux.dev> Link: https://lore.kernel.org/r/20230524070026.2932-6-guoqing.jiang@linux.dev Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-06-11block/rnbd-srv: no need to check sess_devGuoqing Jiang
Check ret is enough since if sess_dev is NULL which also implies ret should be 0. Acked-by: Jack Wang <jinpu.wang@ionos.com> Signed-off-by: Guoqing Jiang <guoqing.jiang@linux.dev> Link: https://lore.kernel.org/r/20230524070026.2932-5-guoqing.jiang@linux.dev Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-06-11block/rnbd: introduce rnbd_access_modesGuoqing Jiang
Add one new array (marked with __maybe_unused to prevent gcc warning about "defined but not used" with W=1), then we can remove rnbd_access_mode_str and rnbd-common.c accordingly. Signed-off-by: Guoqing Jiang <guoqing.jiang@linux.dev> Acked-by: Jack Wang <jinpu.wang@ionos.com> Link: https://lore.kernel.org/r/20230524070026.2932-4-guoqing.jiang@linux.dev Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-06-11block/rnbd-srv: remove unused headerGuoqing Jiang
No need to include it since none of macros in limits.h are used by rnbd-srv. Acked-by: Jack Wang <jinpu.wang@ionos.com> Signed-off-by: Guoqing Jiang <guoqing.jiang@linux.dev> Link: https://lore.kernel.org/r/20230524070026.2932-3-guoqing.jiang@linux.dev Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-06-11block/rnbd: kill rnbd_flags_supportedGuoqing Jiang
This routine is not called since added. Then the two flags (RNBD_OP_LAST and RNBD_F_ALL) can be removed too after kill rnbd_flags_supported. Acked-by: Jack Wang <jinpu.wang@ionos.com> Signed-off-by: Guoqing Jiang <guoqing.jiang@linux.dev> Link: https://lore.kernel.org/r/20230524070026.2932-2-guoqing.jiang@linux.dev Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-06-09Merge tag 'block-6.4-2023-06-09' of git://git.kernel.dk/linuxLinus Torvalds
Pull block fixes from Jens Axboe: - Fix an issue with the hardware queue nr_active, causing it to become imbalanced (Tian) - Fix an issue with null_blk not releasing pages if configured as memory backed (Nitesh) - Fix a locking issue in dasd (Jan) * tag 'block-6.4-2023-06-09' of git://git.kernel.dk/linux: s390/dasd: Use correct lock while counting channel queue length null_blk: Fix: memory release when memory_backed=1 blk-mq: fix blk_mq_hw_ctx active request accounting
2023-06-07pktcdvd: Sort headersAndy Shevchenko
Sort the headers in alphabetic order in order to ease the maintenance for this part. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Link: https://lore.kernel.org/r/20230310164549.22133-10-andriy.shevchenko@linux.intel.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-06-07pktcdvd: Get rid of redundant 'else'Andy Shevchenko
In the snippets like the following if (...) return / goto / break / continue ...; else ... the 'else' is redundant. Get rid of it. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Link: https://lore.kernel.org/r/20230310164549.22133-9-andriy.shevchenko@linux.intel.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-06-07pktcdvd: Use put_unaligned_be16() and get_unaligned_be16()Andy Shevchenko
This makes the driver code slightly better to understand. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Link: https://lore.kernel.org/r/20230310164549.22133-8-andriy.shevchenko@linux.intel.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-06-07pktcdvd: Use DEFINE_SHOW_ATTRIBUTE() to simplify codeAndy Shevchenko
Use DEFINE_SHOW_ATTRIBUTE() helper macro to simplify the code. No functional change. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Link: https://lore.kernel.org/r/20230310164549.22133-7-andriy.shevchenko@linux.intel.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-06-07pktcdvd: Drop redundant castings for sector_tAndy Shevchenko
Since the commit 72deb455b5ec ("block: remove CONFIG_LBDAF") the sector_t is always 64-bit type, no need to cast anymore. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Link: https://lore.kernel.org/r/20230310164549.22133-6-andriy.shevchenko@linux.intel.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-06-07pktcdvd: Get rid of pkt_seq_show() forward declarationAndy Shevchenko
The code can be neater without forward declarations. Get rid of pkt_seq_show() forward declaration. This will also allow futher cleanups to be cleaner. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Link: https://lore.kernel.org/r/20230310164549.22133-5-andriy.shevchenko@linux.intel.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-06-07pktcdvd: use sysfs_emit() to instead of scnprintf()Andy Shevchenko
Follow the advice of the Documentation/filesystems/sysfs.rst and show() should only use sysfs_emit() or sysfs_emit_at() when formatting the value to be returned to user space. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Link: https://lore.kernel.org/r/20230310164549.22133-4-andriy.shevchenko@linux.intel.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-06-07pktcdvd: replace sscanf() by kstrtoul()Andy Shevchenko
The checkpatch.pl warns: "Prefer kstrto<type> to single variable sscanf". Fix the code accordingly. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Link: https://lore.kernel.org/r/20230310164549.22133-3-andriy.shevchenko@linux.intel.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-06-07pktcdvd: Get rid of custom printing macrosAndy Shevchenko
We may use traditional dev_*() macros instead of custom ones provided by the driver. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Link: https://lore.kernel.org/r/20230310164549.22133-2-andriy.shevchenko@linux.intel.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-06-07nbd: Add the maximum limit of allocated index in nbd_dev_addZhong Jinghua
If the index allocated by idr_alloc greater than MINORMASK >> part_shift, the device number will overflow, resulting in failure to create a block device. Fix it by imiting the size of the max allocation. Signed-off-by: Zhong Jinghua <zhongjinghua@huawei.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20230605122159.2134384-1-zhongjinghua@huaweicloud.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-06-06rbd: get snapshot context after exclusive lock is ensured to be heldIlya Dryomov
Move capturing the snapshot context into the image request state machine, after exclusive lock is ensured to be held for the duration of dealing with the image request. This is needed to ensure correctness of fast-diff states (OBJECT_EXISTS vs OBJECT_EXISTS_CLEAN) and object deltas computed based off of them. Otherwise the object map that is forked for the snapshot isn't guaranteed to accurately reflect the contents of the snapshot when the snapshot is taken under I/O. This breaks differential backup and snapshot-based mirroring use cases with fast-diff enabled: since some object deltas may be incomplete, the destination image may get corrupted. Cc: stable@vger.kernel.org Link: https://tracker.ceph.com/issues/61472 Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Dongsheng Yang <dongsheng.yang@easystack.cn>
2023-06-06rbd: move RBD_OBJ_FLAG_COPYUP_ENABLED flag settingIlya Dryomov
Move RBD_OBJ_FLAG_COPYUP_ENABLED flag setting into the object request state machine to allow for the snapshot context to be captured in the image request state machine rather than in rbd_queue_workfn(). Cc: stable@vger.kernel.org Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Dongsheng Yang <dongsheng.yang@easystack.cn>
2023-06-05null_blk: Fix: memory release when memory_backed=1Nitesh Shetty
Memory/pages are not freed, when unloading nullblk driver. Steps to reproduce issue 1.free -h total used free shared buff/cache available Mem: 7.8Gi 260Mi 7.1Gi 3.0Mi 395Mi 7.3Gi Swap: 0B 0B 0B 2.modprobe null_blk memory_backed=1 3.dd if=/dev/urandom of=/dev/nullb0 oflag=direct bs=1M count=1000 4.modprobe -r null_blk 5.free -h total used free shared buff/cache available Mem: 7.8Gi 1.2Gi 6.1Gi 3.0Mi 398Mi 6.3Gi Swap: 0B 0B 0B Signed-off-by: Anuj Gupta <anuj20.g@samsung.com> Signed-off-by: Nitesh Shetty <nj.shetty@samsung.com> Link: https://lore.kernel.org/r/20230605062354.24785-1-nj.shetty@samsung.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-06-05block: introduce holder opsChristoph Hellwig
Add a new blk_holder_ops structure, which is passed to blkdev_get_by_* and installed in the block_device for exclusive claims. It will be used to allow the block layer to call back into the user of the block device for thing like notification of a removed device or a device resize. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jan Kara <jack@suse.cz> Acked-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Link: https://lore.kernel.org/r/20230601094459.1350643-10-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-06-05drbd: stop defining __KERNEL_SYSCALLS__Christoph Hellwig
__KERNEL_SYSCALLS__ hasn't been needed since Linux 2.6.19 so stop defining it. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Christoph Böhmwalder <christoph.boehmwalder@linbit.com> Link: https://lore.kernel.org/r/20230601151646.1386867-1-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-06-04ublk: add control command of UBLK_U_CMD_GET_FEATURESMing Lei
Add control command of UBLK_U_CMD_GET_FEATURES for returning driver's feature set or capability. This way can simplify userspace for maintaining compatibility because userspace doesn't need to send command to one device for querying driver feature set any more. Such as, with the queried feature set, userspace can choose to use: - UBLK_CMD_GET_DEV_INFO2 or UBLK_CMD_GET_DEV_INFO, - UBLK_U_CMD_* or UBLK_CMD_* Userspace code: https://github.com/ming1/ubdsrv/commits/features-cmd Signed-off-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20230603040601.775227-1-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-05-31floppy: use __bio_add_page for adding single page to bioJohannes Thumshirn
The floppy code uses bio_add_page() to add a page to a newly created bio. bio_add_page() can fail, but the return value is never checked. Use __bio_add_page() as adding a single page to a newly created bio is guaranteed to succeed. This brings us a step closer to marking bio_add_page() as __must_check. Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Link: https://lore.kernel.org/r/33c445a3b431270c72d9be03d5da1b08ae983920.1685532726.git.johannes.thumshirn@wdc.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-05-31zram: use __bio_add_page for adding single page to bioJohannes Thumshirn
The zram writeback code uses bio_add_page() to add a page to a newly created bio. bio_add_page() can fail, but the return value is never checked. Use __bio_add_page() as adding a single page to a newly created bio is guaranteed to succeed. This brings us a step closer to marking bio_add_page() as __must_check. Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Reviewed-by: Sergey Senozhatsky <senozhatsky@chromium.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Link: https://lore.kernel.org/r/cfd141dd7773315879a126f2aa81b7f698bc0e10.1685532726.git.johannes.thumshirn@wdc.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-05-31drbd: use __bio_add_page to add page to bioJohannes Thumshirn
The drbd code only adds a single page to a newly created bio. So use __bio_add_page() to add the page which is guaranteed to succeed in this case. This brings us closer to marking bio_add_page() as __must_check. Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Link: https://lore.kernel.org/r/435007afac14f3766455559059d21843771fae53.1685532726.git.johannes.thumshirn@wdc.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-05-27Merge tag 'for-linus-6.4-rc4-tag' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip Pull xen fixes from Juergen Gross: - a double free fix in the Xen pvcalls backend driver - a fix for a regression causing the MSI related sysfs entries to not being created in Xen PV guests - a fix in the Xen blkfront driver for handling insane input data better * tag 'for-linus-6.4-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: x86/pci/xen: populate MSI sysfs entries xen/pvcalls-back: fix double frees with pvcalls_new_active_socket() xen/blkfront: Only check REQ_FUA for writes
2023-05-24xen/blkfront: Only check REQ_FUA for writesRoss Lagerwall
The existing code silently converts read operations with the REQ_FUA bit set into write-barrier operations. This results in data loss as the backend scribbles zeroes over the data instead of returning it. While the REQ_FUA bit doesn't make sense on a read operation, at least one well-known out-of-tree kernel module does set it and since it results in data loss, let's be safe here and only look at REQ_FUA for writes. Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com> Acked-by: Juergen Gross <jgross@suse.com> Link: https://lore.kernel.org/r/20230426164005.2213139-1-ross.lagerwall@citrix.com Signed-off-by: Juergen Gross <jgross@suse.com>
2023-05-20ublk: fix build warning on iov_iter_get_pages2Ming Lei
Return type of iov_iter_get_pages2() is ssize_t instead of size_t, so fix it. Fixes: 981f95a571e3 ("ublk: cleanup ublk_copy_user_pages") Reported-by: kernel test robot <lkp@intel.com> Reported-by: Julia Lawall <julia.lawall@inria.fr> Signed-off-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20230520151134.459679-1-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-05-19ublk: support user copyMing Lei
Currently copy between io request buffer(pages) and userspace buffer is done inside ublk_map_io() or ublk_unmap_io(). This way performs very well in case of pre-allocated userspace io buffer. For dynamically allocated or external userspace backend io buffer, UBLK_F_NEED_GET_DATA is added for ublk server to provide buffer by one extra command communication for WRITE request. For READ, userspace simply provides buffer, but can't know when the buffer is done[1]. Add UBLK_F_USER_COPY by moving io data copy out of kernel by providing read()/write() on /dev/ublkcN, and simply let ublk server do the io data copy. This way makes both side cleaner, the cost is that one extra syscall for copy io data between request and backend buffer. With UBLK_F_USER_COPY, it actually becomes possible to run per-io zero copy now, such as, only do zero copy for big size IO, so it can be thought as one prep patch for supporting zero copy. Meantime zero copy still needs to expose read()/write() buffer for some corner case, such as passthrough IO. [1] READ buffer in UBLK_F_NEED_GET_DATA https://lore.kernel.org/linux-block/116d8a56-0881-56d3-9bcc-78ff3e1dc4e5@linux.alibaba.com/T/#m23bd4b8634c0a054e6797063167b469949a247bb ublksrv loop usercopy code: https://github.com/ming1/ubdsrv/commits/usercopy Signed-off-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20230519065030.351216-8-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-05-19ublk: add read()/write() support for ublk char deviceMing Lei
Support pread()/pwrite() on ublk char device for reading/writing request io buffer, so data copy between io request buffer and userspace buffer can be moved to ublk server from ublk driver. Then UBLK_F_NEED_GET_DATA becomes not necessary, so ublk server can allocate buffer without one extra round uring command communication for userspace to provide buffer. IO buffer can be located by iocb->ki_pos which encodes buffer offset, io tag and queue id info, and type of iocb->ki_pos is u64, so it is big enough for holding reasonable queue depth, nr_queues and max io buffer size. Signed-off-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20230519065030.351216-7-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-05-19ublk: support to copy any part of request pagesMing Lei
Add 'offset' to 'struct ublk_map_data', so that ublk_copy_user_pages() can be used to copy any sub-buffer(linear mapped) of the request. Signed-off-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20230519065030.351216-6-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-05-19ublk: grab request reference when the request is handled by userspaceMing Lei
Add one reference counter into request pdu data, and hold this reference in the request's lifetime. Prepare for supporting to move request data copy into userspace, which needs to copy request data by read()/write() on /dev/ublkcN, so we have to guarantee that read()/write() is done on one valid/active request, and that will be enhanced by holding the io request reference in read()/write(). Signed-off-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20230519065030.351216-5-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-05-19ublk: cleanup ublk_copy_user_pagesMing Lei
Clean up ublk_copy_user_pages() by using iov_iter_get_pages2, and code gets simplified a lot and becomes much more readable than before. Signed-off-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20230519065030.351216-4-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-05-19ublk: cleanup io cmd code path by adding ublk_fill_io_cmd()Ming Lei
Add one small helper to cleanup io command hanlding code path. Signed-off-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20230519065030.351216-3-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-05-19ublk: kill queuing request by task_work_addMing Lei
task_work_add() is used from early ublk development stage for handling request in batch. However, since commit 7d4a93176e01 ("ublk_drv: don't forward io commands in reserve order"), we can get similar batch processing with io_uring_cmd_complete_in_task(), and similar performance data is observed between task_work_add() and io_uring_cmd_complete_in_task(). Meantime we can kill one fast code path, which is actually seldom used given it is common to build ublk driver as module. Signed-off-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20230519065030.351216-2-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-05-18ublk: fix AB-BA lockdep warningMing Lei
When handling UBLK_IO_FETCH_REQ, ctx->uring_lock is grabbed first, then ub->mutex is acquired. When handling UBLK_CMD_STOP_DEV or UBLK_CMD_DEL_DEV, ub->mutex is grabbed first, then calling io_uring_cmd_done() for canceling uring command, in which ctx->uring_lock may be required. Real deadlock only happens when all the above commands are issued from same uring context, and in reality different uring contexts are often used for handing control command and IO command. Fix the issue by using io_uring_cmd_complete_in_task() to cancel command in ublk_cancel_dev(ublk_cancel_queue). Reported-by: Shinichiro Kawasaki <shinichiro.kawasaki@wdc.com> Closes: https://lore.kernel.org/linux-block/becol2g7sawl4rsjq2dztsbc7mqypfqko6wzsyoyazqydoasml@rcxarzwidrhk Cc: Ziyang Zhang <ZiyangZhang@linux.alibaba.com> Signed-off-by: Ming Lei <ming.lei@redhat.com> Tested-by: Shinichiro Kawasaki <shinichiro.kawasaki@wdc.com> Link: https://lore.kernel.org/r/20230517133408.210944-1-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-05-16brd: use XArray instead of radix-tree to index backing pagesPankaj Raghav
XArray was introduced to hold large array of pointers with a simple API. XArray API also provides array semantics which simplifies the way we store and access the backing pages, and the code becomes significantly easier to understand. No performance difference was noticed between the two implementation using fio with direct=1 [1]. [1] Performance in KIOPS: | radix-tree | XArray | Diff | | | write | 315 | 313 | -0.6% randwrite | 286 | 290 | +1.3% read | 330 | 335 | +1.5% randread | 309 | 312 | +0.9% Signed-off-by: Pankaj Raghav <p.raghav@samsung.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Link: https://lore.kernel.org/r/20230511121544.111648-1-p.raghav@samsung.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-05-12ublk: fix command op code checkMing Lei
In case of CONFIG_BLKDEV_UBLK_LEGACY_OPCODES, type of cmd opcode could be 0 or 'u'; and type can only be 'u' if CONFIG_BLKDEV_UBLK_LEGACY_OPCODES isn't set. So fix the wrong check. Fixes: 2d786e66c966 ("block: ublk: switch to ioctl command encoding") Signed-off-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20230505153142.1258336-1-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-05-12block/rnbd: replace REQ_OP_FLUSH with REQ_OP_WRITEGuoqing Jiang
Since flush bios are implemented as writes with no data and the preflush flag per Christoph's comment [1]. And we need to change it in rnbd accordingly. Otherwise, I got splatting when create fs from rnbd client. [ 464.028545] ------------[ cut here ]------------ [ 464.028553] WARNING: CPU: 0 PID: 65 at block/blk-core.c:751 submit_bio_noacct+0x32c/0x5d0 [ ... ] [ 464.028668] CPU: 0 PID: 65 Comm: kworker/0:1H Tainted: G OE 6.4.0-rc1 #9 [ 464.028671] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.15.0-0-g2dd4b9b-rebuilt.opensuse.org 04/01/2014 [ 464.028673] Workqueue: ib-comp-wq ib_cq_poll_work [ib_core] [ 464.028717] RIP: 0010:submit_bio_noacct+0x32c/0x5d0 [ 464.028720] Code: 03 0f 85 51 fe ff ff 48 8b 43 18 8b 88 04 03 00 00 85 c9 0f 85 3f fe ff ff e9 be fd ff ff 0f b6 d0 3c 0d 74 26 83 fa 01 74 21 <0f> 0b b8 0a 00 00 00 e9 56 fd ff ff 4c 89 e7 e8 70 a1 03 00 84 c0 [ 464.028722] RSP: 0018:ffffaf3680b57c68 EFLAGS: 00010202 [ 464.028724] RAX: 0000000000060802 RBX: ffffa09dcc18bf00 RCX: 0000000000000000 [ 464.028726] RDX: 0000000000000002 RSI: 0000000000000000 RDI: ffffa09dde081d00 [ 464.028727] RBP: ffffaf3680b57c98 R08: ffffa09dde081d00 R09: ffffa09e38327200 [ 464.028729] R10: 0000000000000000 R11: 0000000000000000 R12: ffffa09dde081d00 [ 464.028730] R13: ffffa09dcb06e1e8 R14: 0000000000000000 R15: 0000000000200000 [ 464.028733] FS: 0000000000000000(0000) GS:ffffa09e3bc00000(0000) knlGS:0000000000000000 [ 464.028735] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 464.028736] CR2: 000055a4e8206c40 CR3: 0000000119f06000 CR4: 00000000003506f0 [ 464.028738] Call Trace: [ 464.028740] <TASK> [ 464.028746] submit_bio+0x1b/0x80 [ 464.028748] rnbd_srv_rdma_ev+0x50d/0x10c0 [rnbd_server] [ 464.028754] ? percpu_ref_get_many.constprop.0+0x55/0x140 [rtrs_server] [ 464.028760] ? __this_cpu_preempt_check+0x13/0x20 [ 464.028769] process_io_req+0x1dc/0x450 [rtrs_server] [ 464.028775] rtrs_srv_inv_rkey_done+0x67/0xb0 [rtrs_server] [ 464.028780] __ib_process_cq+0xbc/0x1f0 [ib_core] [ 464.028793] ib_cq_poll_work+0x2b/0xa0 [ib_core] [ 464.028804] process_one_work+0x2a9/0x580 [1]. https://lore.kernel.org/all/ZFHgefWofVt24tRl@infradead.org/ Signed-off-by: Guoqing Jiang <guoqing.jiang@linux.dev> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Link: https://lore.kernel.org/r/20230512034631.28686-1-guoqing.jiang@linux.dev Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-05-12nbd: Fix debugfs_create_dir error checkingIvan Orlov
The debugfs_create_dir function returns ERR_PTR in case of error, and the only correct way to check if an error occurred is 'IS_ERR' inline function. This patch will replace the null-comparison with IS_ERR. Signed-off-by: Ivan Orlov <ivan.orlov0322@gmail.com> Link: https://lore.kernel.org/r/20230512130533.98709-1-ivan.orlov0322@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-05-07Merge tag 'for-6.4/io_uring-2023-05-07' of git://git.kernel.dk/linuxLinus Torvalds
Pull more io_uring updates from Jens Axboe: "Nothing major in here, just two different parts: - A small series from Breno that enables passing the full SQE down for ->uring_cmd(). This is a prerequisite for enabling full network socket operations. Queued up a bit late because of some stylistic concerns that got resolved, would be nice to have this in 6.4-rc1 so the dependent work will be easier to handle for 6.5. - Fix for the huge page coalescing, which was a regression introduced in the 6.3 kernel release (Tobias)" * tag 'for-6.4/io_uring-2023-05-07' of git://git.kernel.dk/linux: io_uring: Remove unnecessary BUILD_BUG_ON io_uring: Pass whole sqe to commands io_uring: Create a helper to return the SQE size io_uring/rsrc: check for nonconsecutive pages
2023-05-06Merge tag 'for-6.4/block-2023-05-06' of git://git.kernel.dk/linuxLinus Torvalds
Pull more block updates from Jens Axboe: - MD pull request via Song: - Improve raid5 sequential IO performance on spinning disks, which fixes a regression since v6.0 (Jan Kara) - Fix bitmap offset types, which fixes an issue introduced in this merge window (Jonathan Derrick) - Cleanup of hweight type used for cgroup writeback (Maxim) - Fix a regression with the "has_submit_bio" changes across partitions (Ming) - Cleanup of QUEUE_FLAG_ADD_RANDOM clearing. We used to set this flag on queues non blk-mq queues, and hence some drivers clear it unconditionally. Since all of these have since been converted to true blk-mq drivers, drop the useless clear as the bit is not set (Chaitanya) - Fix the flags being set in a bio for a flush for drbd (Christoph) - Cleanup and deduplication of the code handling setting block device capacity (Damien) - Fix for ublk handling IO timeouts (Ming) - Fix for a regression in blk-cgroup teardown (Tao) - NBD documentation and code fixes (Eric) - Convert blk-integrity to using device_attributes rather than a second kobject to manage lifetimes (Thomas) * tag 'for-6.4/block-2023-05-06' of git://git.kernel.dk/linux: ublk: add timeout handler drbd: correctly submit flush bio on barrier mailmap: add mailmap entries for Jens Axboe block: Skip destroyed blkg when restart in blkg_destroy_all() writeback: fix call of incorrect macro md: Fix bitmap offset type in sb writer md/raid5: Improve performance for sequential IO docs nbd: userspace NBD now favors github over sourceforge block nbd: use req.cookie instead of req.handle uapi nbd: add cookie alias to handle uapi nbd: improve doc links to userspace spec blk-integrity: register sysfs attributes on struct device blk-integrity: convert to struct device_attribute blk-integrity: use sysfs_emit block/drivers: remove dead clear of random flag block: sync part's ->bd_has_submit_bio with disk's block: Cleanup set_capacity()/bdev_set_nr_sectors()