<feed xmlns='http://www.w3.org/2005/Atom'>
<title>kernel/fs/btrfs/async-thread.h, branch linux-5.1.y</title>
<subtitle>Hosts the 0x221E linux distro kernel.</subtitle>
<id>https://universe.0xinfinity.dev/distro/kernel/atom?h=linux-5.1.y</id>
<link rel='self' href='https://universe.0xinfinity.dev/distro/kernel/atom?h=linux-5.1.y'/>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/'/>
<updated>2018-04-12T14:29:46Z</updated>
<entry>
<title>btrfs: replace GPL boilerplate by SPDX -- headers</title>
<updated>2018-04-12T14:29:46Z</updated>
<author>
<name>David Sterba</name>
<email>dsterba@suse.com</email>
</author>
<published>2018-04-03T17:16:55Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=9888c3402c8567a977de37f61e9dd87792723064'/>
<id>urn:sha1:9888c3402c8567a977de37f61e9dd87792723064</id>
<content type='text'>
Remove GPL boilerplate text (long, short, one-line) and keep the rest,
ie. personal, company or original source copyright statements. Add the
SPDX header.

Unify the include protection macros to match the file names.

Signed-off-by: David Sterba &lt;dsterba@suse.com&gt;
</content>
</entry>
<entry>
<title>btrfs: constify tracepoint arguments</title>
<updated>2017-08-16T12:19:53Z</updated>
<author>
<name>Jeff Mahoney</name>
<email>jeffm@suse.com</email>
</author>
<published>2017-06-29T03:56:54Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=9a35b63728ceb8602c111260044451dd64952500'/>
<id>urn:sha1:9a35b63728ceb8602c111260044451dd64952500</id>
<content type='text'>
Tracepoint arguments are all read-only.  If we mark the arguments
as const, we're able to keep or convert those arguments to const
where appropriate.

Signed-off-by: Jeff Mahoney &lt;jeffm@suse.com&gt;
Signed-off-by: David Sterba &lt;dsterba@suse.com&gt;
</content>
</entry>
<entry>
<title>btrfs: limit async_work allocation and worker func duration</title>
<updated>2016-12-13T19:01:30Z</updated>
<author>
<name>Maxim Patlasov</name>
<email>mpatlasov@virtuozzo.com</email>
</author>
<published>2016-12-12T22:32:44Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=2939e1a86f758b55cdba73e29397dd3d94df13bc'/>
<id>urn:sha1:2939e1a86f758b55cdba73e29397dd3d94df13bc</id>
<content type='text'>
Problem statement: unprivileged user who has read-write access to more than
one btrfs subvolume may easily consume all kernel memory (eventually
triggering oom-killer).

Reproducer (./mkrmdir below essentially loops over mkdir/rmdir):

[root@kteam1 ~]# cat prep.sh

DEV=/dev/sdb
mkfs.btrfs -f $DEV
mount $DEV /mnt
for i in `seq 1 16`
do
	mkdir /mnt/$i
	btrfs subvolume create /mnt/SV_$i
	ID=`btrfs subvolume list /mnt |grep "SV_$i$" |cut -d ' ' -f 2`
	mount -t btrfs -o subvolid=$ID $DEV /mnt/$i
	chmod a+rwx /mnt/$i
done

[root@kteam1 ~]# sh prep.sh

[maxim@kteam1 ~]$ for i in `seq 1 16`; do ./mkrmdir /mnt/$i 2000 2000 &amp; done

[root@kteam1 ~]# for i in `seq 1 4`; do grep "kmalloc-128" /proc/slabinfo | grep -v dma; sleep 60; done
kmalloc-128        10144  10144    128   32    1 : tunables    0    0    0 : slabdata    317    317      0
kmalloc-128       9992352 9992352    128   32    1 : tunables    0    0    0 : slabdata 312261 312261      0
kmalloc-128       24226752 24226752    128   32    1 : tunables    0    0    0 : slabdata 757086 757086      0
kmalloc-128       42754240 42754240    128   32    1 : tunables    0    0    0 : slabdata 1336070 1336070      0

The huge numbers above come from insane number of async_work-s allocated
and queued by btrfs_wq_run_delayed_node.

The problem is caused by btrfs_wq_run_delayed_node() queuing more and more
works if the number of delayed items is above BTRFS_DELAYED_BACKGROUND. The
worker func (btrfs_async_run_delayed_root) processes at least
BTRFS_DELAYED_BATCH items (if they are present in the list). So, the machinery
works as expected while the list is almost empty. As soon as it is getting
bigger, worker func starts to process more than one item at a time, it takes
longer, and the chances to have async_works queued more than needed is getting
higher.

The problem above is worsened by another flaw of delayed-inode implementation:
if async_work was queued in a throttling branch (number of items &gt;=
BTRFS_DELAYED_WRITEBACK), corresponding worker func won't quit until
the number of items &lt; BTRFS_DELAYED_BACKGROUND / 2. So, it is possible that
the func occupies CPU infinitely (up to 30sec in my experiments): while the
func is trying to drain the list, the user activity may add more and more
items to the list.

The patch fixes both problems in straightforward way: refuse queuing too
many works in btrfs_wq_run_delayed_node and bail out of worker func if
at least BTRFS_DELAYED_WRITEBACK items are processed.

Changed in v2: remove support of thresh == NO_THRESHOLD.

Signed-off-by: Maxim Patlasov &lt;mpatlasov@virtuozzo.com&gt;
Signed-off-by: Chris Mason &lt;clm@fb.com&gt;
Cc: stable@vger.kernel.org # v3.15+
</content>
</entry>
<entry>
<title>btrfs: plumb fs_info into btrfs_work</title>
<updated>2016-07-26T11:53:15Z</updated>
<author>
<name>Jeff Mahoney</name>
<email>jeffm@suse.com</email>
</author>
<published>2016-06-09T20:22:11Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=cb001095ca705dcd95f57fe98867e38a4889916d'/>
<id>urn:sha1:cb001095ca705dcd95f57fe98867e38a4889916d</id>
<content type='text'>
In order to provide an fsid for trace events, we'll need a btrfs_fs_info
pointer.  The most lightweight way to do that for btrfs_work structures
is to associate it with the __btrfs_workqueue structure.  Each queued
btrfs_work structure has a workqueue associated with it, so that's
a natural fit.  It's a privately defined structures, so we add accessors
to retrieve the fs_info pointer.

Signed-off-by: Jeff Mahoney &lt;jeffm@suse.com&gt;
Signed-off-by: David Sterba &lt;dsterba@suse.com&gt;
</content>
</entry>
<entry>
<title>btrfs: async_thread: Fix workqueue 'max_active' value when initializing</title>
<updated>2015-08-31T18:46:40Z</updated>
<author>
<name>Qu Wenruo</name>
<email>quwenruo@cn.fujitsu.com</email>
</author>
<published>2015-08-20T01:30:39Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=c6dd6ea55758cf403bdc07a51a06c2a1d474f906'/>
<id>urn:sha1:c6dd6ea55758cf403bdc07a51a06c2a1d474f906</id>
<content type='text'>
At initializing time, for threshold-able workqueue, it's max_active
of kernel workqueue should be 1 and grow if it hits threshold.

But due to the bad naming, there is both 'max_active' for kernel
workqueue and btrfs workqueue.
So wrong value is given at workqueue initialization.

This patch fixes it, and to avoid further misunderstanding, change the
member name of btrfs_workqueue to 'current_active' and 'limit_active'.

Also corresponding comment is added for readability.

Reported-by: Alex Lyakas &lt;alex.btrfs@zadarastorage.com&gt;
Signed-off-by: Qu Wenruo &lt;quwenruo@cn.fujitsu.com&gt;
Signed-off-by: Chris Mason &lt;clm@fb.com&gt;
</content>
</entry>
<entry>
<title>btrfs: Fix lockdep warning of wr_ctx-&gt;wr_lock in scrub_free_wr_ctx()</title>
<updated>2015-06-10T14:04:52Z</updated>
<author>
<name>Zhao Lei</name>
<email>zhaolei@cn.fujitsu.com</email>
</author>
<published>2015-06-04T12:09:15Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=20b2e3029eef277cd93a46a991004260057e1a9e'/>
<id>urn:sha1:20b2e3029eef277cd93a46a991004260057e1a9e</id>
<content type='text'>
lockdep report following warning in test:
 [25176.843958] =================================
 [25176.844519] [ INFO: inconsistent lock state ]
 [25176.845047] 4.1.0-rc3 #22 Tainted: G        W
 [25176.845591] ---------------------------------
 [25176.846153] inconsistent {SOFTIRQ-ON-W} -&gt; {IN-SOFTIRQ-W} usage.
 [25176.846713] fsstress/26661 [HC0[0]:SC1[1]:HE1:SE0] takes:
 [25176.847246]  (&amp;wr_ctx-&gt;wr_lock){+.?...}, at: [&lt;ffffffffa04cdc6d&gt;] scrub_free_ctx+0x2d/0xf0 [btrfs]
 [25176.847838] {SOFTIRQ-ON-W} state was registered at:
 [25176.848396]   [&lt;ffffffff810bf460&gt;] __lock_acquire+0x6a0/0xe10
 [25176.848955]   [&lt;ffffffff810bfd1e&gt;] lock_acquire+0xce/0x2c0
 [25176.849491]   [&lt;ffffffff816489af&gt;] mutex_lock_nested+0x7f/0x410
 [25176.850029]   [&lt;ffffffffa04d04ff&gt;] scrub_stripe+0x4df/0x1080 [btrfs]
 [25176.850575]   [&lt;ffffffffa04d11b1&gt;] scrub_chunk.isra.19+0x111/0x130 [btrfs]
 [25176.851110]   [&lt;ffffffffa04d144c&gt;] scrub_enumerate_chunks+0x27c/0x510 [btrfs]
 [25176.851660]   [&lt;ffffffffa04d3b87&gt;] btrfs_scrub_dev+0x1c7/0x6c0 [btrfs]
 [25176.852189]   [&lt;ffffffffa04e918e&gt;] btrfs_dev_replace_start+0x36e/0x450 [btrfs]
 [25176.852771]   [&lt;ffffffffa04a98e0&gt;] btrfs_ioctl+0x1e10/0x2d20 [btrfs]
 [25176.853315]   [&lt;ffffffff8121c5b8&gt;] do_vfs_ioctl+0x318/0x570
 [25176.853868]   [&lt;ffffffff8121c851&gt;] SyS_ioctl+0x41/0x80
 [25176.854406]   [&lt;ffffffff8164da17&gt;] system_call_fastpath+0x12/0x6f
 [25176.854935] irq event stamp: 51506
 [25176.855511] hardirqs last  enabled at (51506): [&lt;ffffffff810d4ce5&gt;] vprintk_emit+0x225/0x5e0
 [25176.856059] hardirqs last disabled at (51505): [&lt;ffffffff810d4b77&gt;] vprintk_emit+0xb7/0x5e0
 [25176.856642] softirqs last  enabled at (50886): [&lt;ffffffff81067a23&gt;] __do_softirq+0x363/0x640
 [25176.857184] softirqs last disabled at (50949): [&lt;ffffffff8106804d&gt;] irq_exit+0x10d/0x120
 [25176.857746]
 other info that might help us debug this:
 [25176.858845]  Possible unsafe locking scenario:
 [25176.859981]        CPU0
 [25176.860537]        ----
 [25176.861059]   lock(&amp;wr_ctx-&gt;wr_lock);
 [25176.861705]   &lt;Interrupt&gt;
 [25176.862272]     lock(&amp;wr_ctx-&gt;wr_lock);
 [25176.862881]
  *** DEADLOCK ***

Reason:
 Above warning is caused by:
 Interrupt
 -&gt; bio_endio()
 -&gt; ...
 -&gt; scrub_put_ctx()
 -&gt; scrub_free_ctx() *1
 -&gt; ...
 -&gt; mutex_lock(&amp;wr_ctx-&gt;wr_lock);

 scrub_put_ctx() is allowed to be called in end_bio interrupt, but
 in code design, it will never call scrub_free_ctx(sctx) in interrupe
 context(above *1), because btrfs_scrub_dev() get one additional
 reference of sctx-&gt;refs, which makes scrub_free_ctx() only called
 withine btrfs_scrub_dev().

 Now the code runs out of our wish, because free sequence in
 scrub_pending_bio_dec() have a gap.

 Current code:
 -----------------------------------+-----------------------------------
 scrub_pending_bio_dec()            |  btrfs_scrub_dev
 -----------------------------------+-----------------------------------
 atomic_dec(&amp;sctx-&gt;bios_in_flight); |
 wake_up(&amp;sctx-&gt;list_wait);         |
                                    | scrub_put_ctx()
                                    | -&gt; atomic_dec_and_test(&amp;sctx-&gt;refs)
 scrub_put_ctx(sctx);               |
 -&gt; atomic_dec_and_test(&amp;sctx-&gt;refs)|
 -&gt; scrub_free_ctx()                |
 -----------------------------------+-----------------------------------

 We expected:
 -----------------------------------+-----------------------------------
 scrub_pending_bio_dec()            |  btrfs_scrub_dev
 -----------------------------------+-----------------------------------
 atomic_dec(&amp;sctx-&gt;bios_in_flight); |
 wake_up(&amp;sctx-&gt;list_wait);         |
 scrub_put_ctx(sctx);               |
 -&gt; atomic_dec_and_test(&amp;sctx-&gt;refs)|
                                    | scrub_put_ctx()
                                    | -&gt; atomic_dec_and_test(&amp;sctx-&gt;refs)
                                    | -&gt; scrub_free_ctx()
 -----------------------------------+-----------------------------------

Fix:
 Move scrub_pending_bio_dec() to a workqueue, to avoid this function run
 in interrupt context.
 Tested by check tracelog in debug.

Changelog v1-&gt;v2:
 Use workqueue instead of adjust function call sequence in v1,
 because v1 will introduce a bug pointed out by:
 Filipe David Manana &lt;fdmanana@gmail.com&gt;

Reported-by: Qu Wenruo &lt;quwenruo@cn.fujitsu.com&gt;
Signed-off-by: Zhao Lei &lt;zhaolei@cn.fujitsu.com&gt;
Reviewed-by: Filipe Manana &lt;fdmanana@suse.com&gt;
Signed-off-by: Chris Mason &lt;clm@fb.com&gt;
</content>
</entry>
<entry>
<title>btrfs: use correct type for workqueue flags</title>
<updated>2015-02-16T17:48:47Z</updated>
<author>
<name>David Sterba</name>
<email>dsterba@suse.cz</email>
</author>
<published>2015-02-16T17:34:01Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=6f0110581911623df08bf4b65fdef4548ebcda0a'/>
<id>urn:sha1:6f0110581911623df08bf4b65fdef4548ebcda0a</id>
<content type='text'>
Through all the local wrappers to alloc_workqueue, __alloc_workqueue_key
takes an unsigned int.

Signed-off-by: David Sterba &lt;dsterba@suse.cz&gt;
</content>
</entry>
<entry>
<title>Btrfs: implement repair function when direct read fails</title>
<updated>2014-09-17T20:39:01Z</updated>
<author>
<name>Miao Xie</name>
<email>miaox@cn.fujitsu.com</email>
</author>
<published>2014-09-12T10:44:03Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=8b110e393c5a6e72d50fcdf9fa7ed8b647cfdfc9'/>
<id>urn:sha1:8b110e393c5a6e72d50fcdf9fa7ed8b647cfdfc9</id>
<content type='text'>
This patch implement data repair function when direct read fails.

The detail of the implementation is:
- When we find the data is not right, we try to read the data from the other
  mirror.
- When the io on the mirror ends, we will insert the endio work into the
  dedicated btrfs workqueue, not common read endio workqueue, because the
  original endio work is still blocked in the btrfs endio workqueue, if we
  insert the endio work of the io on the mirror into that workqueue, deadlock
  would happen.
- After we get right data, we write it back to the corrupted mirror.
- And if the data on the new mirror is still corrupted, we will try next
  mirror until we read right data or all the mirrors are traversed.
- After the above work, we set the uptodate flag according to the result.

Signed-off-by: Miao Xie &lt;miaox@cn.fujitsu.com&gt;
Signed-off-by: Chris Mason &lt;clm@fb.com&gt;
</content>
</entry>
<entry>
<title>Btrfs: fix task hang under heavy compressed write</title>
<updated>2014-08-24T14:17:02Z</updated>
<author>
<name>Liu Bo</name>
<email>bo.li.liu@oracle.com</email>
</author>
<published>2014-08-15T15:36:53Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=9e0af23764344f7f1b68e4eefbe7dc865018b63d'/>
<id>urn:sha1:9e0af23764344f7f1b68e4eefbe7dc865018b63d</id>
<content type='text'>
This has been reported and discussed for a long time, and this hang occurs in
both 3.15 and 3.16.

Btrfs now migrates to use kernel workqueue, but it introduces this hang problem.

Btrfs has a kind of work queued as an ordered way, which means that its
ordered_func() must be processed in the way of FIFO, so it usually looks like --

normal_work_helper(arg)
    work = container_of(arg, struct btrfs_work, normal_work);

    work-&gt;func() &lt;---- (we name it work X)
    for ordered_work in wq-&gt;ordered_list
            ordered_work-&gt;ordered_func()
            ordered_work-&gt;ordered_free()

The hang is a rare case, first when we find free space, we get an uncached block
group, then we go to read its free space cache inode for free space information,
so it will

file a readahead request
    btrfs_readpages()
         for page that is not in page cache
                __do_readpage()
                     submit_extent_page()
                           btrfs_submit_bio_hook()
                                 btrfs_bio_wq_end_io()
                                 submit_bio()
                                 end_workqueue_bio() &lt;--(ret by the 1st endio)
                                      queue a work(named work Y) for the 2nd
                                      also the real endio()

So the hang occurs when work Y's work_struct and work X's work_struct happens
to share the same address.

A bit more explanation,

A,B,C -- struct btrfs_work
arg   -- struct work_struct

kthread:
worker_thread()
    pick up a work_struct from @worklist
    process_one_work(arg)
	worker-&gt;current_work = arg;  &lt;-- arg is A-&gt;normal_work
	worker-&gt;current_func(arg)
		normal_work_helper(arg)
		     A = container_of(arg, struct btrfs_work, normal_work);

		     A-&gt;func()
		     A-&gt;ordered_func()
		     A-&gt;ordered_free()  &lt;-- A gets freed

		     B-&gt;ordered_func()
			  submit_compressed_extents()
			      find_free_extent()
				  load_free_space_inode()
				      ...   &lt;-- (the above readhead stack)
				      end_workqueue_bio()
					   btrfs_queue_work(work C)
		     B-&gt;ordered_free()

As if work A has a high priority in wq-&gt;ordered_list and there are more ordered
works queued after it, such as B-&gt;ordered_func(), its memory could have been
freed before normal_work_helper() returns, which means that kernel workqueue
code worker_thread() still has worker-&gt;current_work pointer to be work
A-&gt;normal_work's, ie. arg's address.

Meanwhile, work C is allocated after work A is freed, work C-&gt;normal_work
and work A-&gt;normal_work are likely to share the same address(I confirmed this
with ftrace output, so I'm not just guessing, it's rare though).

When another kthread picks up work C-&gt;normal_work to process, and finds our
kthread is processing it(see find_worker_executing_work()), it'll think
work C as a collision and skip then, which ends up nobody processing work C.

So the situation is that our kthread is waiting forever on work C.

Besides, there're other cases that can lead to deadlock, but the real problem
is that all btrfs workqueue shares one work-&gt;func, -- normal_work_helper,
so this makes each workqueue to have its own helper function, but only a
wraper pf normal_work_helper.

With this patch, I no long hit the above hang.

Signed-off-by: Liu Bo &lt;bo.li.liu@oracle.com&gt;
Signed-off-by: Chris Mason &lt;clm@fb.com&gt;
</content>
</entry>
<entry>
<title>btrfs: Add trace for btrfs_workqueue alloc/destroy</title>
<updated>2014-03-21T00:15:28Z</updated>
<author>
<name>Qu Wenruo</name>
<email>quwenruo@cn.fujitsu.com</email>
</author>
<published>2014-03-12T08:05:33Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=c3a468915a384c0015263edd9b7263775599a323'/>
<id>urn:sha1:c3a468915a384c0015263edd9b7263775599a323</id>
<content type='text'>
Since most of the btrfs_workqueue is printed as pointer address,
for easier analysis, add trace for btrfs_workqueue alloc/destroy.
So it is possible to determine the workqueue that a given work belongs
to(by comparing the wq pointer address with alloc trace event).

Signed-off-by: Qu Wenruo &lt;quenruo@cn.fujitsu.com&gt;
Signed-off-by: Chris Mason &lt;clm@fb.com&gt;
</content>
</entry>
</feed>
