<feed xmlns='http://www.w3.org/2005/Atom'>
<title>kernel/block, branch linux-5.7.y</title>
<subtitle>Hosts the 0x221E linux distro kernel.</subtitle>
<id>https://universe.0xinfinity.dev/distro/kernel/atom?h=linux-5.7.y</id>
<link rel='self' href='https://universe.0xinfinity.dev/distro/kernel/atom?h=linux-5.7.y'/>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/'/>
<updated>2020-08-19T06:23:33Z</updated>
<entry>
<title>block: don't do revalidate zones on invalid devices</title>
<updated>2020-08-19T06:23:33Z</updated>
<author>
<name>Johannes Thumshirn</name>
<email>johannes.thumshirn@wdc.com</email>
</author>
<published>2020-07-30T11:25:17Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=8904c89418a5a691b16be9d862e8bda5f0332803'/>
<id>urn:sha1:8904c89418a5a691b16be9d862e8bda5f0332803</id>
<content type='text'>
[ Upstream commit 1a1206dc4cf02cee4b5cbce583ee4c22368b4c28 ]

When we loose a device for whatever reason while (re)scanning zones, we
trip over a NULL pointer in blk_revalidate_zone_cb, like in the following
log:

sd 0:0:0:0: [sda] 3418095616 4096-byte logical blocks: (14.0 TB/12.7 TiB)
sd 0:0:0:0: [sda] 52156 zones of 65536 logical blocks
sd 0:0:0:0: [sda] Write Protect is off
sd 0:0:0:0: [sda] Mode Sense: 37 00 00 08
sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
sd 0:0:0:0: [sda] REPORT ZONES start lba 1065287680 failed
sd 0:0:0:0: [sda] REPORT ZONES: Result: hostbyte=0x00 driverbyte=0x08
sd 0:0:0:0: [sda] Sense Key : 0xb [current]
sd 0:0:0:0: [sda] ASC=0x0 ASCQ=0x6
sda: failed to revalidate zones
sd 0:0:0:0: [sda] 0 4096-byte logical blocks: (0 B/0 B)
sda: detected capacity change from 14000519643136 to 0
==================================================================
BUG: KASAN: null-ptr-deref in blk_revalidate_zone_cb+0x1b7/0x550
Write of size 8 at addr 0000000000000010 by task kworker/u4:1/58

CPU: 1 PID: 58 Comm: kworker/u4:1 Not tainted 5.8.0-rc1 #692
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-0-gf21b5a4-rebuilt.opensuse.org 04/01/2014
Workqueue: events_unbound async_run_entry_fn
Call Trace:
 dump_stack+0x7d/0xb0
 ? blk_revalidate_zone_cb+0x1b7/0x550
 kasan_report.cold+0x5/0x37
 ? blk_revalidate_zone_cb+0x1b7/0x550
 check_memory_region+0x145/0x1a0
 blk_revalidate_zone_cb+0x1b7/0x550
 sd_zbc_parse_report+0x1f1/0x370
 ? blk_req_zone_write_trylock+0x200/0x200
 ? sectors_to_logical+0x60/0x60
 ? blk_req_zone_write_trylock+0x200/0x200
 ? blk_req_zone_write_trylock+0x200/0x200
 sd_zbc_report_zones+0x3c4/0x5e0
 ? sd_dif_config_host+0x500/0x500
 blk_revalidate_disk_zones+0x231/0x44d
 ? _raw_write_lock_irqsave+0xb0/0xb0
 ? blk_queue_free_zone_bitmaps+0xd0/0xd0
 sd_zbc_read_zones+0x8cf/0x11a0
 sd_revalidate_disk+0x305c/0x64e0
 ? __device_add_disk+0x776/0xf20
 ? read_capacity_16.part.0+0x1080/0x1080
 ? blk_alloc_devt+0x250/0x250
 ? create_object.isra.0+0x595/0xa20
 ? kasan_unpoison_shadow+0x33/0x40
 sd_probe+0x8dc/0xcd2
 really_probe+0x20e/0xaf0
 __driver_attach_async_helper+0x249/0x2d0
 async_run_entry_fn+0xbe/0x560
 process_one_work+0x764/0x1290
 ? _raw_read_unlock_irqrestore+0x30/0x30
 worker_thread+0x598/0x12f0
 ? __kthread_parkme+0xc6/0x1b0
 ? schedule+0xed/0x2c0
 ? process_one_work+0x1290/0x1290
 kthread+0x36b/0x440
 ? kthread_create_worker_on_cpu+0xa0/0xa0
 ret_from_fork+0x22/0x30
==================================================================

When the device is already gone we end up with the following scenario:
The device's capacity is 0 and thus the number of zones will be 0 as well. When
allocating the bitmap for the conventional zones, we then trip over a NULL
pointer.

So if we encounter a zoned block device with a 0 capacity, don't dare to
revalidate the zones sizes.

Fixes: 6c6b35491422 ("block: set the zone size in blk_revalidate_disk_zones atomically")
Signed-off-by: Johannes Thumshirn &lt;johannes.thumshirn@wdc.com&gt;
Reviewed-by: Damien Le Moal &lt;damien.lemoal@wdc.com&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>iocost: Fix check condition of iocg abs_vdebt</title>
<updated>2020-08-19T06:23:32Z</updated>
<author>
<name>Chengming Zhou</name>
<email>zhouchengming@bytedance.com</email>
</author>
<published>2020-07-30T09:03:21Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=d850fdbff6f468db14a34c335cc95feb4bfb4be1'/>
<id>urn:sha1:d850fdbff6f468db14a34c335cc95feb4bfb4be1</id>
<content type='text'>
[ Upstream commit d9012a59db54442d5b2fcfdfcded35cf566397d3 ]

We shouldn't skip iocg when its abs_vdebt is not zero.

Fixes: 0b80f9866e6b ("iocost: protect iocg-&gt;abs_vdebt with iocg-&gt;waitq.lock")
Signed-off-by: Chengming Zhou &lt;zhouchengming@bytedance.com&gt;
Acked-by: Tejun Heo &lt;tj@kernel.org&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>blk-mq-debugfs: update blk_queue_flag_name[] accordingly for new flags</title>
<updated>2020-07-22T07:33:59Z</updated>
<author>
<name>Hou Tao</name>
<email>houtao1@huawei.com</email>
</author>
<published>2020-04-28T01:54:56Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=1bccfbbd11d431f1550646ce395d6b6b60eb6f1a'/>
<id>urn:sha1:1bccfbbd11d431f1550646ce395d6b6b60eb6f1a</id>
<content type='text'>
[ Upstream commit bfe373f608cf81b7626dfeb904001b0e867c5110 ]

Else there may be magic numbers in /sys/kernel/debug/block/*/state.

Signed-off-by: Hou Tao &lt;houtao1@huawei.com&gt;
Reviewed-by: Bart Van Assche &lt;bvanassche@acm.org&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>blk-mq: consider non-idle request as "inflight" in blk_mq_rq_inflight()</title>
<updated>2020-07-16T06:13:34Z</updated>
<author>
<name>Ming Lei</name>
<email>ming.lei@redhat.com</email>
</author>
<published>2020-07-07T15:04:33Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=1506084b7799564f6da07c1eae57051a7afc1d51'/>
<id>urn:sha1:1506084b7799564f6da07c1eae57051a7afc1d51</id>
<content type='text'>
commit 05a4fed69ff00a8bd83538684cb602a4636b07a7 upstream.

dm-multipath is the only user of blk_mq_queue_inflight().  When
dm-multipath calls blk_mq_queue_inflight() to check if it has
outstanding IO it can get a false negative.  The reason for this is
blk_mq_rq_inflight() doesn't consider requests that are no longer
MQ_RQ_IN_FLIGHT but that are now MQ_RQ_COMPLETE (-&gt;complete isn't
called or finished yet) as "inflight".

This causes request-based dm-multipath's dm_wait_for_completion() to
return before all outstanding dm-multipath requests have actually
completed.  This breaks DM multipath's suspend functionality because
blk-mq requests complete after DM's suspend has finished -- which
shouldn't happen.

Fix this by considering any request not in the MQ_RQ_IDLE state
(so either MQ_RQ_COMPLETE or MQ_RQ_IN_FLIGHT) as "inflight" in
blk_mq_rq_inflight().

Fixes: 3c94d83cb3526 ("blk-mq: change blk_mq_queue_busy() to blk_mq_queue_inflight()")
Signed-off-by: Ming Lei &lt;ming.lei@redhat.com&gt;
Signed-off-by: Mike Snitzer &lt;snitzer@redhat.com&gt;
Cc: stable@vger.kernel.org
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>block: release bip in a right way in error path</title>
<updated>2020-07-16T06:13:16Z</updated>
<author>
<name>Chengguang Xu</name>
<email>cgxu519@mykernel.net</email>
</author>
<published>2020-06-24T10:21:39Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=7a28be6a6eed46837abdaa097b0f0be7bd769323'/>
<id>urn:sha1:7a28be6a6eed46837abdaa097b0f0be7bd769323</id>
<content type='text'>
[ Upstream commit 0b8eb629a700c0ef15a437758db8255f8444e76c ]

Release bip using kfree() in error path when that was allocated
by kmalloc().

Signed-off-by: Chengguang Xu &lt;cgxu519@mykernel.net&gt;
Reviewed-by: Christoph Hellwig &lt;hch@lst.de&gt;
Acked-by: Martin K. Petersen &lt;martin.petersen@oracle.com&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>block: update hctx map when use multiple maps</title>
<updated>2020-06-30T19:36:19Z</updated>
<author>
<name>Weiping Zhang</name>
<email>zhangweiping@didiglobal.com</email>
</author>
<published>2020-06-17T06:18:37Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=f6f7d573f8a18174c0c8c90d7e778f12846cd97d'/>
<id>urn:sha1:f6f7d573f8a18174c0c8c90d7e778f12846cd97d</id>
<content type='text'>
[ Upstream commit fe35ec58f0d339221643287bbb7cee15c93a5389 ]

There is an issue when tune the number for read and write queues,
if the total queue count was not changed. The hctx-&gt;type cannot
be updated, since __blk_mq_update_nr_hw_queues will return directly
if the total queue count has not been changed.

Reproduce:

dmesg | grep "default/read/poll"
[    2.607459] nvme nvme0: 48/0/0 default/read/poll queues
cat /sys/kernel/debug/block/nvme0n1/hctx*/type | sort | uniq -c
     48 default

tune the write queues to 24:
echo 24 &gt; /sys/module/nvme/parameters/write_queues
echo 1 &gt; /sys/block/nvme0n1/device/reset_controller

dmesg | grep "default/read/poll"
[  433.547235] nvme nvme0: 24/24/0 default/read/poll queues

cat /sys/kernel/debug/block/nvme0n1/hctx*/type | sort | uniq -c
     48 default

The driver's hardware queue mapping is not same as block layer.

Signed-off-by: Weiping Zhang &lt;zhangweiping@didiglobal.com&gt;
Reviewed-by: Ming Lei &lt;ming.lei@redhat.com&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>block/bio-integrity: don't free 'buf' if bio_integrity_add_page() failed</title>
<updated>2020-06-30T19:35:44Z</updated>
<author>
<name>yu kuai</name>
<email>yukuai3@huawei.com</email>
</author>
<published>2020-06-01T12:38:56Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=75afdbf6aa42bf65d53372f97e49b887f2e6d9a1'/>
<id>urn:sha1:75afdbf6aa42bf65d53372f97e49b887f2e6d9a1</id>
<content type='text'>
commit a75ca9303175d36af93c0937dd9b1a6422908b8d upstream.

commit e7bf90e5afe3 ("block/bio-integrity: fix a memory leak bug") added
a kfree() for 'buf' if bio_integrity_add_page() returns '0'. However,
the object will be freed in bio_integrity_free() since 'bio-&gt;bi_opf' and
'bio-&gt;bi_integrity' were set previousy in bio_integrity_alloc().

Fixes: commit e7bf90e5afe3 ("block/bio-integrity: fix a memory leak bug")
Signed-off-by: yu kuai &lt;yukuai3@huawei.com&gt;
Reviewed-by: Ming Lei &lt;ming.lei@redhat.com&gt;
Reviewed-by: Bob Liu &lt;bob.liu@oracle.com&gt;
Acked-by: Martin K. Petersen &lt;martin.petersen@oracle.com&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
Cc: Guenter Roeck &lt;linux@roeck-us.net&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>block: nr_sects_write(): Disable preemption on seqcount write</title>
<updated>2020-06-22T07:33:08Z</updated>
<author>
<name>Ahmed S. Darwish</name>
<email>a.darwish@linutronix.de</email>
</author>
<published>2020-06-03T14:49:48Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=68f1dfc8702f7eb793d34243cb6dac658badc164'/>
<id>urn:sha1:68f1dfc8702f7eb793d34243cb6dac658badc164</id>
<content type='text'>
commit 15b81ce5abdc4b502aa31dff2d415b79d2349d2f upstream.

For optimized block readers not holding a mutex, the "number of sectors"
64-bit value is protected from tearing on 32-bit architectures by a
sequence counter.

Disable preemption before entering that sequence counter's write side
critical section. Otherwise, the read side can preempt the write side
section and spin for the entire scheduler tick. If the reader belongs to
a real-time scheduling class, it can spin forever and the kernel will
livelock.

Fixes: c83f6bf98dc1 ("block: add partition resize function to blkpg ioctl")
Cc: &lt;stable@vger.kernel.org&gt;
Signed-off-by: Ahmed S. Darwish &lt;a.darwish@linutronix.de&gt;
Reviewed-by: Sebastian Andrzej Siewior &lt;bigeasy@linutronix.de&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>iocost: don't let vrate run wild while there's no saturation signal</title>
<updated>2020-06-22T07:32:44Z</updated>
<author>
<name>Tejun Heo</name>
<email>tj@kernel.org</email>
</author>
<published>2019-10-15T00:18:11Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=a348ba72763ac921275bf7391abf9996c09c9cfe'/>
<id>urn:sha1:a348ba72763ac921275bf7391abf9996c09c9cfe</id>
<content type='text'>
[ Upstream commit 81ca627a933063fa63a6d4c66425de822a2ab7f5 ]

When the QoS targets are met and nothing is being throttled, there's
no way to tell how saturated the underlying device is - it could be
almost entirely idle, at the cusp of saturation or anywhere inbetween.
Given that there's no information, it's best to keep vrate as-is in
this state.  Before 7cd806a9a953 ("iocost: improve nr_lagging
handling"), this was the case - if the device isn't missing QoS
targets and nothing is being throttled, busy_level was reset to zero.

While fixing nr_lagging handling, 7cd806a9a953 ("iocost: improve
nr_lagging handling") broke this.  Now, while the device is hitting
QoS targets and nothing is being throttled, vrate keeps getting
adjusted according to the existing busy_level.

This led to vrate keeping climing till it hits max when there's an IO
issuer with limited request concurrency if the vrate started low.
vrate starts getting adjusted upwards until the issuer can issue IOs
w/o being throttled.  From then on, QoS targets keeps getting met and
nothing on the system needs throttling and vrate keeps getting
increased due to the existing busy_level.

This patch makes the following changes to the busy_level logic.

* Reset busy_level if nr_shortages is zero to avoid the above
  scenario.

* Make non-zero nr_lagging block lowering nr_level but still clear
  positive busy_level if there's clear non-saturation signal - QoS
  targets are met and nr_shortages is non-zero.  nr_lagging's role is
  preventing adjusting vrate upwards while there are long-running
  commands and it shouldn't keep busy_level positive while there's
  clear non-saturation signal.

* Restructure code for clarity and add comments.

Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Reported-by: Andy Newell &lt;newella@fb.com&gt;
Fixes: 7cd806a9a953 ("iocost: improve nr_lagging handling")
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>block: reset mapping if failed to update hardware queue count</title>
<updated>2020-06-22T07:32:21Z</updated>
<author>
<name>Weiping Zhang</name>
<email>zhangweiping@didiglobal.com</email>
</author>
<published>2020-05-13T00:44:05Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=2b8d00334a55cc0537ecca6a581ae5c448f12be3'/>
<id>urn:sha1:2b8d00334a55cc0537ecca6a581ae5c448f12be3</id>
<content type='text'>
[ Upstream commit aa880ad690ab6d4c53934af85fb5a43e69ecb0f5 ]

When we increase hardware queue count, blk_mq_update_queue_map will
reset the mapping between cpu and hardware queue base on the hardware
queue count(set-&gt;nr_hw_queues). The mapping cannot be reset if it
encounters error in blk_mq_realloc_hw_ctxs, but the fallback flow will
continue using it, then blk_mq_map_swqueue will touch a invalid memory,
because the mapping points to a wrong hctx.

blktest block/030:

null_blk: module loaded
Increasing nr_hw_queues to 8 fails, fallback to 1
==================================================================
BUG: KASAN: null-ptr-deref in blk_mq_map_swqueue+0x2f2/0x830
Read of size 8 at addr 0000000000000128 by task nproc/8541

CPU: 5 PID: 8541 Comm: nproc Not tainted 5.7.0-rc4-dbg+ #3
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
rel-1.13.0-0-gf21b5a4-rebuilt.opensuse.org 04/01/2014
Call Trace:
dump_stack+0xa5/0xe6
__kasan_report.cold+0x65/0xbb
kasan_report+0x45/0x60
check_memory_region+0x15e/0x1c0
__kasan_check_read+0x15/0x20
blk_mq_map_swqueue+0x2f2/0x830
__blk_mq_update_nr_hw_queues+0x3df/0x690
blk_mq_update_nr_hw_queues+0x32/0x50
nullb_device_submit_queues_store+0xde/0x160 [null_blk]
configfs_write_file+0x1c4/0x250 [configfs]
__vfs_write+0x4c/0x90
vfs_write+0x14b/0x2d0
ksys_write+0xdd/0x180
__x64_sys_write+0x47/0x50
do_syscall_64+0x6f/0x310
entry_SYSCALL_64_after_hwframe+0x49/0xb3

Signed-off-by: Weiping Zhang &lt;zhangweiping@didiglobal.com&gt;
Tested-by: Bart van Assche &lt;bvanassche@acm.org&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
</feed>
