<feed xmlns='http://www.w3.org/2005/Atom'>
<title>kernel/block/partition-generic.c, branch linux-5.1.y</title>
<subtitle>Hosts the 0x221E linux distro kernel.</subtitle>
<id>https://universe.0xinfinity.dev/distro/kernel/atom?h=linux-5.1.y</id>
<link rel='self' href='https://universe.0xinfinity.dev/distro/kernel/atom?h=linux-5.1.y'/>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/'/>
<updated>2019-05-31T13:43:28Z</updated>
<entry>
<title>block: fix use-after-free on gendisk</title>
<updated>2019-05-31T13:43:28Z</updated>
<author>
<name>Yufen Yu</name>
<email>yuyufen@huawei.com</email>
</author>
<published>2019-04-02T12:06:34Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=5090dd8f2804fa746e668fe5c047ad1664ef9528'/>
<id>urn:sha1:5090dd8f2804fa746e668fe5c047ad1664ef9528</id>
<content type='text'>
[ Upstream commit 2c88e3c7ec32d7a40cc7c9b4a487cf90e4671bdd ]

commit 2da78092dda "block: Fix dev_t minor allocation lifetime"
specifically moved the blk_free_devt(dev-&gt;devt) call to part_release()
to avoid reallocating the device number before the device is fully
shut down.

However, this can cause a use-after-free of the gendisk in get_gendisk().
Using an md device as an example, the race looks like this:

Process1		Worker			Process2
md_free
						blkdev_open
del_gendisk
  add delete_partition_work_fn() to wq
  						__blkdev_get
						get_gendisk
put_disk
  disk_release
    kfree(disk)
    						find part from ext_devt_idr
						get_disk_and_module(disk)
    					  	cause use after free

    			delete_partition_work_fn
			put_device(part)
    		  	part_release
		    	remove part from ext_devt_idr

Until the &lt;devt, hd_struct pointer&gt; pair is removed from ext_devt_idr
by delete_partition_work_fn(), a lookup can still find the devt and
then reach the gendisk through the hd_struct pointer. If the gendisk
has already been freed at that point, get_gendisk() accesses freed
memory: a use-after-free on the gendisk.

We fix this by adding a new helper, blk_invalidate_devt(), which is
called from delete_partition() and del_gendisk(). It replaces the
hd_struct pointer in the idr with NULL so that later lookups fail,
while the entry itself is still removed from the idr in part_release(),
as before.
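
The ordering the fix relies on can be modeled outside the kernel. What follows is a simplified userspace sketch, not the kernel's idr API: a small array stands in for ext_devt_idr, and devt_alloc(), devt_invalidate(), devt_lookup() and devt_free() are hypothetical names. Invalidating clears the pointer so that lookups fail, while the minor stays reserved until it is freed. This mirrors the split between blk_invalidate_devt() and part_release() described above.

```c
#include "assert.h"
#include "stddef.h"

/* Hypothetical stand-in for ext_devt_idr: a fixed table indexed by minor.
 * This is a model for illustration, not the kernel idr API. */
#define NR_MINORS_SKETCH 8

struct devt_slot {
	int reserved;	/* minor is taken and must not be handed out again */
	void *part;	/* hd_struct pointer, or NULL once invalidated */
};

static struct devt_slot devt_table[NR_MINORS_SKETCH];

/* Reserve the first free minor and publish its partition pointer. */
static int devt_alloc(void *part)
{
	for (int i = 0; i != NR_MINORS_SKETCH; i++) {
		if (!devt_table[i].reserved) {
			devt_table[i].reserved = 1;
			devt_table[i].part = part;
			return i;
		}
	}
	return -1;	/* no free minor */
}

/* Role of blk_invalidate_devt(): hide the pointer, keep the minor reserved. */
static void devt_invalidate(int minor)
{
	devt_table[minor].part = NULL;
}

/* Role of the lookup in get_gendisk(): NULL means "no such device". */
static void *devt_lookup(int minor)
{
	return devt_table[minor].part;
}

/* Role of the removal in part_release(): the minor becomes reusable. */
static void devt_free(int minor)
{
	devt_table[minor].reserved = 0;
	devt_table[minor].part = NULL;
}
```

With this ordering, a lookup that races with teardown sees NULL instead of a pointer into a gendisk that may already be freed. The minor is still not reused until the release step runs.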

Thanks to Jan Kara for providing the solution and clearer comments
for the code.

Fixes: 2da78092dda1 ("block: Fix dev_t minor allocation lifetime")
Cc: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
Reviewed-by: Bart Van Assche &lt;bvanassche@acm.org&gt;
Reviewed-by: Keith Busch &lt;keith.busch@intel.com&gt;
Reviewed-by: Jan Kara &lt;jack@suse.cz&gt;
Suggested-by: Jan Kara &lt;jack@suse.cz&gt;
Signed-off-by: Yufen Yu &lt;yuyufen@huawei.com&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>block: return just one value from part_in_flight</title>
<updated>2018-12-10T15:30:38Z</updated>
<author>
<name>Mikulas Patocka</name>
<email>mpatocka@redhat.com</email>
</author>
<published>2018-12-06T16:41:21Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=e016b78201a2d9ff40f3f0da072292689af24c7f'/>
<id>urn:sha1:e016b78201a2d9ff40f3f0da072292689af24c7f</id>
<content type='text'>
The previous patches deleted all the code that needed the second value
returned from part_in_flight; now the kernel only uses the first value.

Consequently, part_in_flight (and blk_mq_in_flight) can be changed so
that each returns only one value.

This patch only refactors the code; there is no functional change.
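
Here is a minimal sketch of the interface change. It assumes in-flight counts are kept per CPU and per direction, matching the per-cpu in_flight counters this series is converting to. struct part_sketch and part_in_flight_sketch() are hypothetical. The only point is that the function returns one total instead of filling a two-element array.

```c
#include "assert.h"

#define NR_CPUS_SKETCH 4	/* assumed number of possible CPUs */

/* Hypothetical partition state: per-CPU in-flight counts, split by
 * direction (0 = read, 1 = write). Not the kernel's hd_struct. */
struct part_sketch {
	int in_flight[NR_CPUS_SKETCH][2];
};

/* Returns one value: the partition's total in-flight I/Os. The old
 * style filled an inflight[2] out-array whose second slot no caller
 * reads anymore. */
static unsigned int part_in_flight_sketch(const struct part_sketch *part)
{
	int sum = 0;

	for (int cpu = 0; cpu != NR_CPUS_SKETCH; cpu++)
		sum += part->in_flight[cpu][0] + part->in_flight[cpu][1];

	/* An I/O may be counted up on one CPU and down on another, so a
	 * racy sum can dip below zero; report zero in that case. */
	if (0 > sum)
		sum = 0;
	return (unsigned int)sum;
}
```

Callers that used to read inflight[0] after the call now use the return value directly.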

Signed-off-by: Mikulas Patocka &lt;mpatocka@redhat.com&gt;
Signed-off-by: Mike Snitzer &lt;snitzer@redhat.com&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
<entry>
<title>block: delete part_round_stats and switch to less precise counting</title>
<updated>2018-12-10T15:30:37Z</updated>
<author>
<name>Mikulas Patocka</name>
<email>mpatocka@redhat.com</email>
</author>
<published>2018-12-06T16:41:19Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=5b18b5a737600fd20ba2045f320d5926ebbf341a'/>
<id>urn:sha1:5b18b5a737600fd20ba2045f320d5926ebbf341a</id>
<content type='text'>
We want to convert to per-cpu in_flight counters.

The function part_round_stats needs the in_flight counter every jiffy.
Summing all the percpu variables every jiffy would be too costly, so
part_round_stats must be deleted. It is used to calculate two counters:
time_in_queue and io_ticks.

time_in_queue can be calculated without part_round_stats, by adding the
duration of the I/O when the I/O ends (the value is almost as exact as the
previously calculated value, except that time for in-progress I/Os is not
counted).

io_ticks can be approximated by increasing the value when I/O is started
or ended and the jiffies value has changed. If the I/Os take less than a
jiffy, the value is as exact as the previously calculated value. If the
I/Os take more than a jiffy, io_ticks can drift behind the previously
calculated value.
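
The two replacement rules can be sketched as follows. The sketch assumes a single partition and a caller that passes the current jiffies value. struct io_stats_sketch and the *_sketch functions are hypothetical. It also assumes that "increasing the value" means adding one tick. io_ticks gains a tick whenever an I/O starts or ends in a jiffy different from the last recorded one. time_in_queue gains each I/O's full duration when that I/O completes.

```c
#include "assert.h"

/* Hypothetical per-partition counters; all times are in jiffies. */
struct io_stats_sketch {
	unsigned long stamp;		/* jiffies value at the last io_ticks update */
	unsigned long io_ticks;		/* approximate time the device was busy */
	unsigned long time_in_queue;	/* sum of durations of completed I/Os */
};

/* Called at I/O start and end. If jiffies has moved since the last
 * update, credit one tick (an assumption of this sketch) and record it. */
static void update_io_ticks_sketch(struct io_stats_sketch *st, unsigned long now)
{
	if (st->stamp != now) {
		st->stamp = now;
		st->io_ticks++;
	}
}

static void io_start_sketch(struct io_stats_sketch *st, unsigned long now)
{
	update_io_ticks_sketch(st, now);
}

/* At completion, time_in_queue gets the whole duration in one step, so
 * nothing has to be sampled every jiffy. */
static void io_done_sketch(struct io_stats_sketch *st, unsigned long start,
			   unsigned long now)
{
	update_io_ticks_sketch(st, now);
	st->time_in_queue += now - start;
}
```

Consider an I/O issued at jiffy 20 that completes at jiffy 25, with no other I/O touching the stamp in between. It adds 5 to time_in_queue but only 2 to io_ticks: one tick at start and one at completion. This is the drift for I/Os longer than a jiffy that the message mentions.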

Signed-off-by: Mikulas Patocka &lt;mpatocka@redhat.com&gt;
Signed-off-by: Mike Snitzer &lt;snitzer@redhat.com&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
<entry>
<title>block: stop passing 'cpu' to all percpu stats methods</title>
<updated>2018-12-10T15:30:37Z</updated>
<author>
<name>Mike Snitzer</name>
<email>snitzer@redhat.com</email>
</author>
<published>2018-12-06T16:41:18Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=112f158f66cbe25fd561a5dfe9c3826e06abf757'/>
<id>urn:sha1:112f158f66cbe25fd561a5dfe9c3826e06abf757</id>
<content type='text'>
All of part_stat_* and related methods are used with preemption
disabled, so there is no need to pass cpu around to all of them. Just
call smp_processor_id() as needed.

Suggested-by: Jens Axboe &lt;axboe@kernel.dk&gt;
Signed-off-by: Mike Snitzer &lt;snitzer@redhat.com&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
<entry>
<title>block: use rcu_work instead of call_rcu to avoid sleep in softirq</title>
<updated>2018-11-28T16:08:27Z</updated>
<author>
<name>Yufen Yu</name>
<email>yuyufen@huawei.com</email>
</author>
<published>2018-11-28T08:42:01Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=94a2c3a32b62e868dc1e3d854326745a7f1b8c7a'/>
<id>urn:sha1:94a2c3a32b62e868dc1e3d854326745a7f1b8c7a</id>
<content type='text'>
Syzkaller recently reported a stack trace like this:

BUG: sleeping function called from invalid context at mm/slab.h:361
in_atomic(): 1, irqs_disabled(): 0, pid: 6644, name: blkid
INFO: lockdep is turned off.
CPU: 1 PID: 6644 Comm: blkid Not tainted 4.4.163-514.55.6.9.x86_64+ #76
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014
 0000000000000000 5ba6a6b879e50c00 ffff8801f6b07b10 ffffffff81cb2194
 0000000041b58ab3 ffffffff833c7745 ffffffff81cb2080 5ba6a6b879e50c00
 0000000000000000 0000000000000001 0000000000000004 0000000000000000
Call Trace:
 &lt;IRQ&gt;  [&lt;ffffffff81cb2194&gt;] __dump_stack lib/dump_stack.c:15 [inline]
 &lt;IRQ&gt;  [&lt;ffffffff81cb2194&gt;] dump_stack+0x114/0x1a0 lib/dump_stack.c:51
 [&lt;ffffffff8129a981&gt;] ___might_sleep+0x291/0x490 kernel/sched/core.c:7675
 [&lt;ffffffff8129ac33&gt;] __might_sleep+0xb3/0x270 kernel/sched/core.c:7637
 [&lt;ffffffff81794c13&gt;] slab_pre_alloc_hook mm/slab.h:361 [inline]
 [&lt;ffffffff81794c13&gt;] slab_alloc_node mm/slub.c:2610 [inline]
 [&lt;ffffffff81794c13&gt;] slab_alloc mm/slub.c:2692 [inline]
 [&lt;ffffffff81794c13&gt;] kmem_cache_alloc_trace+0x2c3/0x5c0 mm/slub.c:2709
 [&lt;ffffffff81cbe9a7&gt;] kmalloc include/linux/slab.h:479 [inline]
 [&lt;ffffffff81cbe9a7&gt;] kzalloc include/linux/slab.h:623 [inline]
 [&lt;ffffffff81cbe9a7&gt;] kobject_uevent_env+0x2c7/0x1150 lib/kobject_uevent.c:227
 [&lt;ffffffff81cbf84f&gt;] kobject_uevent+0x1f/0x30 lib/kobject_uevent.c:374
 [&lt;ffffffff81cbb5b9&gt;] kobject_cleanup lib/kobject.c:633 [inline]
 [&lt;ffffffff81cbb5b9&gt;] kobject_release+0x229/0x440 lib/kobject.c:675
 [&lt;ffffffff81cbb0a2&gt;] kref_sub include/linux/kref.h:73 [inline]
 [&lt;ffffffff81cbb0a2&gt;] kref_put include/linux/kref.h:98 [inline]
 [&lt;ffffffff81cbb0a2&gt;] kobject_put+0x72/0xd0 lib/kobject.c:692
 [&lt;ffffffff8216f095&gt;] put_device+0x25/0x30 drivers/base/core.c:1237
 [&lt;ffffffff81c4cc34&gt;] delete_partition_rcu_cb+0x1d4/0x2f0 block/partition-generic.c:232
 [&lt;ffffffff813c08bc&gt;] __rcu_reclaim kernel/rcu/rcu.h:118 [inline]
 [&lt;ffffffff813c08bc&gt;] rcu_do_batch kernel/rcu/tree.c:2705 [inline]
 [&lt;ffffffff813c08bc&gt;] invoke_rcu_callbacks kernel/rcu/tree.c:2973 [inline]
 [&lt;ffffffff813c08bc&gt;] __rcu_process_callbacks kernel/rcu/tree.c:2940 [inline]
 [&lt;ffffffff813c08bc&gt;] rcu_process_callbacks+0x59c/0x1c70 kernel/rcu/tree.c:2957
 [&lt;ffffffff8120f509&gt;] __do_softirq+0x299/0xe20 kernel/softirq.c:273
 [&lt;ffffffff81210496&gt;] invoke_softirq kernel/softirq.c:350 [inline]
 [&lt;ffffffff81210496&gt;] irq_exit+0x216/0x2c0 kernel/softirq.c:391
 [&lt;ffffffff82c2cd7b&gt;] exiting_irq arch/x86/include/asm/apic.h:652 [inline]
 [&lt;ffffffff82c2cd7b&gt;] smp_apic_timer_interrupt+0x8b/0xc0 arch/x86/kernel/apic/apic.c:926
 [&lt;ffffffff82c2bc25&gt;] apic_timer_interrupt+0xa5/0xb0 arch/x86/entry/entry_64.S:746
 &lt;EOI&gt;  [&lt;ffffffff814cbf40&gt;] ? audit_kill_trees+0x180/0x180
 [&lt;ffffffff8187d2f7&gt;] fd_install+0x57/0x80 fs/file.c:626
 [&lt;ffffffff8180989e&gt;] do_sys_open+0x45e/0x550 fs/open.c:1043
 [&lt;ffffffff818099c2&gt;] SYSC_open fs/open.c:1055 [inline]
 [&lt;ffffffff818099c2&gt;] SyS_open+0x32/0x40 fs/open.c:1050
 [&lt;ffffffff82c299e1&gt;] entry_SYSCALL_64_fastpath+0x1e/0x9a

In softirq context, we call the RCU callback delete_partition_rcu_cb(),
which drops a device reference with put_device(). As the trace shows,
that can end up in kobject_uevent_env(), which allocates memory with
kzalloc() and GFP_KERNEL. If the allocation cannot be satisfied
immediately, it may sleep, which is not allowed in softirq context.

Although we found this problem on Linux 4.4, the latest kernel appears
to have it as well. It is also very similar to a previously reported
problem:
	https://lkml.org/lkml/2018/7/9/391

Fix it by using rcu_work, whose callback runs from a workqueue in
process context after the RCU grace period and is therefore allowed
to sleep.

Reviewed-by: Paul E. McKenney &lt;paulmck@linux.ibm.com&gt;
Signed-off-by: Yufen Yu &lt;yuyufen@huawei.com&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
<entry>
<title>block: use nanosecond resolution for iostat</title>
<updated>2018-09-22T02:26:59Z</updated>
<author>
<name>Omar Sandoval</name>
<email>osandov@fb.com</email>
</author>
<published>2018-09-21T23:44:34Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=b57e99b4b8b0ebdf9707424e7ddc0c392bdc5fe6'/>
<id>urn:sha1:b57e99b4b8b0ebdf9707424e7ddc0c392bdc5fe6</id>
<content type='text'>
Klaus Kusche reported that the I/O busy time in /proc/diskstats was not
updating properly on 4.18. This is because we started using ktime to
track elapsed time, and we convert nanoseconds to jiffies when we update
the partition counter. However, this gets rounded down, so any I/Os that
take less than a jiffy are not accounted for. Previously in this case,
the value of jiffies would sometimes increment while we were doing I/O,
so at least some I/Os were accounted for.

Let's convert the stats to use nanoseconds internally. We still report
milliseconds as before, now more accurately than ever. The value is
still truncated to 32 bits for backwards compatibility.
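
The rounding problem is easiest to see with numbers. This sketch assumes HZ=250, so one jiffy is 4 ms. It reduces both schemes to a single busy-time total, and the function names are hypothetical. Converting each sub-jiffy I/O to jiffies before adding contributes nothing. Adding nanoseconds first and converting once keeps the time.

```c
#include "assert.h"
#include "stdint.h"

#define HZ_SKETCH	250	/* assumed tick rate: one jiffy = 4 ms */
#define NSEC_PER_JIFFY	(1000000000ULL / HZ_SKETCH)
#define NSEC_PER_MSEC	1000000ULL

/* Old behaviour, simplified: each I/O's duration is converted to jiffies
 * (rounding down) before it is added, so sub-jiffy I/Os add nothing. */
static uint64_t busy_jiffies_old(const uint64_t *durations_ns, int n)
{
	uint64_t total = 0;

	for (int i = 0; i != n; i++)
		total += durations_ns[i] / NSEC_PER_JIFFY;
	return total;
}

/* New behaviour, simplified: accumulate nanoseconds and convert only when
 * reporting. The millisecond value is truncated to 32 bits. */
static uint32_t busy_msecs_new(const uint64_t *durations_ns, int n)
{
	uint64_t total_ns = 0;

	for (int i = 0; i != n; i++)
		total_ns += durations_ns[i];
	return (uint32_t)(total_ns / NSEC_PER_MSEC);
}
```

Take three I/Os of 1 ms, 2.5 ms and 3 ms. Under the old scheme they add 0 jiffies. Under the new one they report 6 ms (6.5 ms, truncated).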

Fixes: 522a777566f5 ("block: consolidate struct request timestamp fields")
Cc: stable@vger.kernel.org
Reported-by: Klaus Kusche &lt;klaus.kusche@computerix.info&gt;
Signed-off-by: Omar Sandoval &lt;osandov@fb.com&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
<entry>
<title>block: Track DISCARD statistics and output them in stat and diskstat</title>
<updated>2018-07-18T14:44:22Z</updated>
<author>
<name>Michael Callahan</name>
<email>michaelcallahan@fb.com</email>
</author>
<published>2018-07-18T11:47:40Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=bdca3c87fb7ad1cc61d231d37eb0d8f90d001e0c'/>
<id>urn:sha1:bdca3c87fb7ad1cc61d231d37eb0d8f90d001e0c</id>
<content type='text'>
Add tracking of REQ_OP_DISCARD I/Os to the partition statistics and
append them to the various stat files in /sys as well as
/proc/diskstats. These are tracked with the same four stats as reads
and writes:

Number of discard I/Os completed
Number of discard I/Os merged
Number of discard sectors completed
Milliseconds spent on discard requests

This is done via adding a new STAT_DISCARD define to genhd.h and then
using it to index that stat field for discard requests.
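
Here is a minimal sketch of indexing by stat group, using a simplified request type. enum op_sketch, struct part_stats_sketch and the *_sketch functions are hypothetical. NR_STAT_GROUPS is this sketch's name for the number of groups. Discard requests get their own slot in each of the four counters listed above.

```c
#include "assert.h"

/* Stat group indices. STAT_DISCARD is the new one; NR_STAT_GROUPS is
 * this sketch's name for the number of groups. */
#define STAT_READ	0
#define STAT_WRITE	1
#define STAT_DISCARD	2
#define NR_STAT_GROUPS	3

/* Hypothetical request types for the sketch, not the kernel's REQ_OP_*. */
enum op_sketch { OP_READ_SKETCH, OP_WRITE_SKETCH, OP_DISCARD_SKETCH };

/* The four per-group counters. */
struct part_stats_sketch {
	unsigned long ios[NR_STAT_GROUPS];	/* completed requests */
	unsigned long merges[NR_STAT_GROUPS];	/* bumped on merge; not modeled here */
	unsigned long sectors[NR_STAT_GROUPS];	/* sectors completed */
	unsigned long msecs[NR_STAT_GROUPS];	/* time spent, in ms */
};

/* Map a request type to the stat group that should count it. */
static int op_stat_group_sketch(enum op_sketch op)
{
	if (op == OP_DISCARD_SKETCH)
		return STAT_DISCARD;
	return op == OP_WRITE_SKETCH ? STAT_WRITE : STAT_READ;
}

/* Account one completed request in its group. */
static void account_done_sketch(struct part_stats_sketch *st, enum op_sketch op,
				unsigned long sectors, unsigned long msecs)
{
	int grp = op_stat_group_sketch(op);

	st->ios[grp]++;
	st->sectors[grp] += sectors;
	st->msecs[grp] += msecs;
}
```

Every counter is an array indexed by group. Adding a fourth category later would only mean a new index and a larger NR_STAT_GROUPS.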

tj: Refreshed on top of v4.17 and other previous updates.

Signed-off-by: Michael Callahan &lt;michaelcallahan@fb.com&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Cc: Andy Newell &lt;newella@fb.com&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
<entry>
<title>block: Define and use STAT_READ and STAT_WRITE</title>
<updated>2018-07-18T14:44:18Z</updated>
<author>
<name>Michael Callahan</name>
<email>michaelcallahan@fb.com</email>
</author>
<published>2018-07-18T11:47:38Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=dbae2c551377b6533a00c11fc7ede370100ab404'/>
<id>urn:sha1:dbae2c551377b6533a00c11fc7ede370100ab404</id>
<content type='text'>
Add defines for STAT_READ and STAT_WRITE for indexing the partition
stat entries. This clarifies some fs/ code which has hardcoded 1 for
STAT_WRITE and will make it easier to extend the stats with additional
fields.
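
As an illustration, consider a hypothetical helper that reports how much was written to a partition. It reads the write slot by name instead of by the bare index 1. The struct and function are illustrative, not code from fs/, and the sketch assumes 512-byte sectors.

```c
#include "assert.h"

#define STAT_READ	0
#define STAT_WRITE	1

/* Hypothetical per-partition counter, indexed by STAT_READ / STAT_WRITE. */
struct part_sectors_sketch {
	unsigned long sectors[2];
};

/* Before: p->sectors[1], a bare index the reader has to decode.
 * After: p->sectors[STAT_WRITE], which names the direction. */
static unsigned long written_kib_sketch(const struct part_sectors_sketch *p)
{
	/* 512-byte sectors, so two sectors make one KiB. */
	return p->sectors[STAT_WRITE] / 2;
}
```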

tj: Refreshed on top of v4.17.

Signed-off-by: Michael Callahan &lt;michaelcallahan@fb.com&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Cc: "Theodore Ts'o" &lt;tytso@mit.edu&gt;
Cc: Jaegeuk Kim &lt;jaegeuk@kernel.org&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
<entry>
<title>block: don't print a message when the device went away</title>
<updated>2018-05-29T14:59:21Z</updated>
<author>
<name>Christoph Hellwig</name>
<email>hch@lst.de</email>
</author>
<published>2018-05-29T14:42:59Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=5afb78356cead66db2203061fed6fc8957527ed4'/>
<id>urn:sha1:5afb78356cead66db2203061fed6fc8957527ed4</id>
<content type='text'>
The information about a size change in this case just creates confusion.

Signed-off-by: Christoph Hellwig &lt;hch@lst.de&gt;
Reviewed-by: Johannes Thumshirn &lt;jthumshirn@suse.de&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
<entry>
<title>block drivers/block: Use octal not symbolic permissions</title>
<updated>2018-05-24T19:38:59Z</updated>
<author>
<name>Joe Perches</name>
<email>joe@perches.com</email>
</author>
<published>2018-05-24T19:38:59Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=5657a819a8d94426c76be04dcedfad0f64cfff00'/>
<id>urn:sha1:5657a819a8d94426c76be04dcedfad0f64cfff00</id>
<content type='text'>
Convert the S_&lt;FOO&gt; symbolic permissions to their octal equivalents,
as octal permissions are preferred by many as more readable.

see: https://lkml.org/lkml/2016/8/2/1945

Done with automated conversion via:
$ ./scripts/checkpatch.pl -f --types=SYMBOLIC_PERMS --fix-inplace &lt;files...&gt;
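
The conversion preserves values. In this sketch, S_IRUGO_SKETCH mirrors the kernel's S_IRUGO shorthand, which userspace sys/stat.h does not provide. The compile-time checks confirm that the octal literals equal the symbolic expressions they replace.

```c
#include "assert.h"
#include "sys/stat.h"

/* Kernel-style shorthand; userspace sys/stat.h does not define S_IRUGO,
 * so this sketch spells it out. */
#define S_IRUGO_SKETCH (S_IRUSR | S_IRGRP | S_IROTH)

/* Checked at compile time: each symbolic mode equals its octal form. */
_Static_assert(S_IRUGO_SKETCH == 0444, "S_IRUGO -> 0444");
_Static_assert((S_IRUGO_SKETCH | S_IWUSR) == 0644, "S_IRUGO | S_IWUSR -> 0644");
_Static_assert((S_IRUSR | S_IWUSR) == 0600, "S_IRUSR | S_IWUSR -> 0600");
```

The patch itself does the rewrite mechanically with checkpatch.pl, as the command above shows. These checks only illustrate why the resulting octal values are equivalent.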

Miscellanea:

o Wrapped modified multi-line calls to a single line where appropriate
o Realigned modified multi-line calls to open parenthesis

Signed-off-by: Joe Perches &lt;joe@perches.com&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
</feed>
