| author | Darrick J. Wong <djwong@kernel.org> | 2025-04-23 12:53:42 -0700 |
|---|---|---|
| committer | Greg Kroah-Hartman <gregkh@linuxfoundation.org> | 2025-10-23 16:16:38 +0200 |
| commit | a0caf1de97e1edd7f3451f1818ea6cb970495fc5 (patch) | |
| tree | 30e9d94c86b6cbbdfc1fb3ab3ab81a4beed88e34 /block/blk-zoned.c | |
| parent | 8fdd0ad43977cf123f40570329344acb07d5b9be (diff) | |
block: fix race between set_blocksize and read paths
commit c0e473a0d226479e8e925d5ba93f751d8df628e9 upstream.
With the new large sector size support, it's now the case that
set_blocksize can change i_blksize and the folio order in a manner that
conflicts with a concurrent reader and causes a kernel crash.
Specifically, let's say that udev-worker calls libblkid to detect the
labels on a block device. The read call can create an order-0 folio to
read the first 4096 bytes from the disk. But then udev is preempted.
Next, someone tries to mount an 8k-sectorsize filesystem from the same
block device. The filesystem calls set_blocksize, which sets i_blksize to
8192 and the minimum folio order to 1.
Now udev resumes, still holding the order-0 folio it allocated. It then
tries to schedule a read bio and do_mpage_readpage tries to create
bufferheads for the folio. Unfortunately, blocks_per_folio == 0 because
the page size is 4096 but the blocksize is 8192 so no bufferheads are
attached and the bh walk never sets bdev. We then submit the bio with a
NULL block device and crash.
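
To make the arithmetic above concrete, here is a minimal userspace sketch. It assumes blocks_per_folio is derived as the folio size shifted right by the block-size bits, and that pages are 4096 bytes. The helper names `folio_bytes` and `blocks_per_folio` are hypothetical and exist only for illustration; this is not the kernel code.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption for this sketch: 4096-byte pages, as in the scenario above. */
#define SKETCH_PAGE_SIZE ((size_t)4096)

/* Hypothetical helper: bytes covered by a folio of the given order. */
static size_t folio_bytes(unsigned int order)
{
	return SKETCH_PAGE_SIZE << order;
}

/*
 * Hypothetical helper: number of (1 << blkbits)-byte blocks that fit in a
 * folio of folio_size bytes. The result is zero when the block is larger
 * than the folio, which is the problematic case described above.
 */
static unsigned int blocks_per_folio(size_t folio_size, unsigned int blkbits)
{
	assert(blkbits < 8 * sizeof(size_t));	/* keep the shift well-defined */
	return (unsigned int)(folio_size >> blkbits);
}
```

With an order-0 folio and an 8192-byte block size (blkbits == 13), the shift yields 0, so a loop over the blocks in the folio never runs. An order-1 folio with the same block size yields 1.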
Therefore, truncate the page cache after flushing but before updating
i_blksize. However, that's not enough -- we also need to lock out file
IO and page faults during the update. Take both the i_rwsem and the
invalidate_lock in exclusive mode for invalidations, and in shared mode
for read/write operations.
I don't know if this is the correct fix, but xfs/259 found it.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Luis Chamberlain <mcgrof@kernel.org>
Tested-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Link: https://lore.kernel.org/r/174543795699.4139148.2086129139322431423.stgit@frogsfrogsfrogs
Signed-off-by: Jens Axboe <axboe@kernel.dk>
[ use bdev->bd_inode instead ]
Signed-off-by: Mahmoud Adam <mngyadam@amazon.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Diffstat (limited to 'block/blk-zoned.c')
| -rw-r--r-- | block/blk-zoned.c | 5 |
1 files changed, 4 insertions, 1 deletions
diff --git a/block/blk-zoned.c b/block/blk-zoned.c
index 619ee41a51cc..644bfa1f6753 100644
--- a/block/blk-zoned.c
+++ b/block/blk-zoned.c
@@ -401,6 +401,7 @@ int blkdev_zone_mgmt_ioctl(struct block_device *bdev, blk_mode_t mode,
 		op = REQ_OP_ZONE_RESET;
 
 		/* Invalidate the page cache, including dirty pages. */
+		inode_lock(bdev->bd_inode);
 		filemap_invalidate_lock(bdev->bd_inode->i_mapping);
 		ret = blkdev_truncate_zone_range(bdev, mode, &zrange);
 		if (ret)
@@ -423,8 +424,10 @@ int blkdev_zone_mgmt_ioctl(struct block_device *bdev, blk_mode_t mode,
 			       GFP_KERNEL);
 
 fail:
-	if (cmd == BLKRESETZONE)
+	if (cmd == BLKRESETZONE) {
 		filemap_invalidate_unlock(bdev->bd_inode->i_mapping);
+		inode_unlock(bdev->bd_inode);
+	}
 
 	return ret;
 }
