<feed xmlns='http://www.w3.org/2005/Atom'>
<title>kernel/fs/eventpoll.c, branch linux-5.1.y</title>
<subtitle>Hosts the 0x221E linux distro kernel.</subtitle>
<id>https://universe.0xinfinity.dev/distro/kernel/atom?h=linux-5.1.y</id>
<link rel='self' href='https://universe.0xinfinity.dev/distro/kernel/atom?h=linux-5.1.y'/>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/'/>
<updated>2019-07-10T07:52:20Z</updated>
<entry>
<title>signal: remove the wrong signal_pending() check in restore_user_sigmask()</title>
<updated>2019-07-10T07:52:20Z</updated>
<author>
<name>Oleg Nesterov</name>
<email>oleg@redhat.com</email>
</author>
<published>2019-06-28T19:06:50Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=50c5095a4db998c8c268c3edead147bb4857d4b9'/>
<id>urn:sha1:50c5095a4db998c8c268c3edead147bb4857d4b9</id>
<content type='text'>
commit 97abc889ee296faf95ca0e978340fb7b942a3e32 upstream.

This is the minimal fix for stable; I'll send cleanups later.

Commit 854a6ed56839 ("signal: Add restore_user_sigmask()") introduced
a user-visible change which breaks user-space: a signal temporarily
unblocked by set_user_sigmask() can be delivered even if the caller
returns success or timeout.

Change restore_user_sigmask() to accept an additional "interrupted"
argument, which is used instead of the signal_pending() check, and
update the callers.
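
A minimal userspace sketch of the decision restore_user_sigmask() makes after this change. The task_model struct, the bitmask signal sets and the restore_sigmask flag (standing in for TIF_RESTORE_SIGMASK) are hypothetical; in the kernel the caller derives "interrupted" from its own outcome, for example whether it is returning -EINTR.

```c
#include "assert.h"
#include "stdbool.h"
#include "stddef.h"

/* Hypothetical model of the task state touched by restore_user_sigmask(). */
struct task_model {
	unsigned long blocked;       /* signal mask currently in effect */
	unsigned long saved_sigmask; /* mask to restore after delivery */
	bool restore_sigmask;        /* stands in for TIF_RESTORE_SIGMASK */
};

/*
 * usigmask:    non-NULL if the caller installed a temporary mask
 * sigsaved:    the mask that was in effect before that
 * interrupted: decided by the caller, not by checking signal_pending()
 */
static void restore_user_sigmask_model(struct task_model *t,
				       const unsigned long *usigmask,
				       unsigned long sigsaved,
				       bool interrupted)
{
	assert(t != NULL);
	if (usigmask == NULL)
		return;

	if (interrupted) {
		/*
		 * Leave the temporary mask in place so the pending signal
		 * can be delivered; the old mask is restored afterwards.
		 */
		t->saved_sigmask = sigsaved;
		t->restore_sigmask = true;
		return;
	}

	/*
	 * Success or timeout: restore the old mask now, so a signal that
	 * only the temporary mask unblocked stays pending.
	 */
	t->blocked = sigsaved;
}
```

With interrupted == false the saved mask takes effect immediately, which is what keeps a successful or timed-out call from delivering a signal that only the temporary mask had unblocked.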

Eric said:

: For clarity.  I don't think this is required by POSIX, or fundamentally to
: remove the races in select.  It is what Linux has always done and we have
: applications who care so I agree this fix is needed.
:
: Further, in any case where the semantic change that this patch rolls back
: (aka allowing a signal to be delivered and the select-like call to
: complete) would be an advantage, we can do as well if not better by using
: signalfd.
:
: Michael, is there any chance we can get this guarantee of the Linux
: implementation of pselect and friends clearly documented?  The guarantee
: that if the system call completes successfully we are guaranteed that no
: signal that is unblocked by using sigmask will be delivered.

Link: http://lkml.kernel.org/r/20190604134117.GA29963@redhat.com
Fixes: 854a6ed56839a40f6b5d02a2962f48841482eec4 ("signal: Add restore_user_sigmask()")
Signed-off-by: Oleg Nesterov &lt;oleg@redhat.com&gt;
Reported-by: Eric Wong &lt;e@80x24.org&gt;
Tested-by: Eric Wong &lt;e@80x24.org&gt;
Acked-by: "Eric W. Biederman" &lt;ebiederm@xmission.com&gt;
Acked-by: Arnd Bergmann &lt;arnd@arndb.de&gt;
Acked-by: Deepa Dinamani &lt;deepa.kernel@gmail.com&gt;
Cc: Michael Kerrisk &lt;mtk.manpages@gmail.com&gt;
Cc: Jens Axboe &lt;axboe@kernel.dk&gt;
Cc: Davidlohr Bueso &lt;dave@stgolabs.net&gt;
Cc: Jason Baron &lt;jbaron@akamai.com&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: Al Viro &lt;viro@ZenIV.linux.org.uk&gt;
Cc: David Laight &lt;David.Laight@ACULAB.COM&gt;
Cc: &lt;stable@vger.kernel.org&gt;	[5.0+]
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>epoll: use rwlock in order to reduce ep_poll_callback() contention</title>
<updated>2019-03-08T02:32:01Z</updated>
<author>
<name>Roman Penyaev</name>
<email>rpenyaev@suse.de</email>
</author>
<published>2019-03-08T00:28:53Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=a218cc4914209ac14476cb32769b31a556355b22'/>
<id>urn:sha1:a218cc4914209ac14476cb32769b31a556355b22</id>
<content type='text'>
The goal of this patch is to reduce contention in ep_poll_callback(),
which can be called concurrently from different CPUs in case of high
event rates and many fds per epoll.  The problem can be easily
reproduced by generating events (writes to a pipe or eventfd) from many
threads while a consumer thread does the polling.  In other words, this
patch increases the bandwidth of events which can be delivered from
sources to the poller by adding poll items to the list in a lockless way.

The main change is the replacement of the spinlock with a rwlock, which
is taken for reading in ep_poll_callback(), where poll items are then
added to the tail of the list using the xchg atomic instruction.  The
write lock is taken everywhere else in order to stop list modifications
and guarantee that list updates are fully completed (I assume that the
write side of a rwlock does not starve; it seems the qrwlock
implementation provides this guarantee).

The following are some microbenchmark results based on the test [1],
which starts threads that generate N events each.  The test ends when
all events are successfully fetched by the poller thread:

 spinlock
 ========

 threads  events/ms  run-time ms
       8       6402        12495
      16       7045        22709
      32       7395        43268

 rwlock + xchg
 =============

 threads  events/ms  run-time ms
       8      10038         7969
      16      12178        13138
      32      13223        24199

According to the results, the bandwidth of delivered events increases
significantly (by roughly 1.6x to 1.8x), and the execution time is
reduced accordingly.

This patch was tested with different sorts of microbenchmarks and with
artificial delays (e.g.  "udelay(get_random_int() &amp; 0xff)") introduced
in the kernel on paths where items are added to lists.

[1] https://github.com/rouming/test-tools/blob/master/stress-epoll.c

Link: http://lkml.kernel.org/r/20190103150104.17128-5-rpenyaev@suse.de
Signed-off-by: Roman Penyaev &lt;rpenyaev@suse.de&gt;
Cc: Davidlohr Bueso &lt;dbueso@suse.de&gt;
Cc: Jason Baron &lt;jbaron@akamai.com&gt;
Cc: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
Cc: "Paul E. McKenney" &lt;paulmck@linux.vnet.ibm.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>epoll: unify awaking of wakeup source on ep_poll_callback() path</title>
<updated>2019-03-08T02:32:01Z</updated>
<author>
<name>Roman Penyaev</name>
<email>rpenyaev@suse.de</email>
</author>
<published>2019-03-08T00:28:49Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=c3e320b61581ef7919269ca242ff13951ccfc763'/>
<id>urn:sha1:c3e320b61581ef7919269ca242ff13951ccfc763</id>
<content type='text'>
The original comment "Activate ep-&gt;ws since epi-&gt;ws may get deactivated
at any time" indeed sounds alarming, but it is incorrect: the path where
we check epi-&gt;ws is the path where insertion into ovflist happens, i.e.
ep_scan_ready_list() has taken ep-&gt;mtx and waits for this callback to
finish, so ep_modify() (which unregisters the wakeup source) waits for
ep_scan_ready_list().

In this patch I simply call ep_pm_stay_awake_rcu(), which does slightly
more than this path needs (it is indirectly protected by the main
ep-&gt;mtx, so even RCU is not needed), but I do not want to create
another naked __ep_pm_stay_awake() variant only for this particular
case, so the RCU variant is simply better for all cases.
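
A simplified userspace model of the callback's two queueing paths after this change, in which both activate the item's own wakeup source through one helper. The structs, the activation counter and the helper names are hypothetical; the real ep_pm_stay_awake_rcu() reads epi-&gt;ws under rcu_read_lock(), which this single-threaded model does not need.

```c
#include "assert.h"
#include "stdbool.h"
#include "stddef.h"

/* Hypothetical stand-in: a wakeup source is just an activation count. */
struct wakeup_source_model {
	int active;
};

struct epitem_model {
	struct wakeup_source_model *ws; /* NULL unless EPOLLWAKEUP was requested */
	bool on_ovflist;
	bool on_rdllist;
};

/* Plays the role of ep_pm_stay_awake_rcu(): activate the item's source. */
static void stay_awake_model(struct epitem_model *epi)
{
	if (epi->ws != NULL)
		epi->ws->active++;
}

/*
 * scanning == true models ep_scan_ready_list() running, when new events
 * go to ovflist instead of the ready list.  Both branches call the same
 * helper.
 */
static void poll_callback_model(struct epitem_model *epi, bool scanning)
{
	assert(epi != NULL);
	if (scanning) {
		if (!epi->on_ovflist) {
			epi->on_ovflist = true;
			stay_awake_model(epi);
		}
	} else if (!epi->on_rdllist) {
		epi->on_rdllist = true;
		stay_awake_model(epi);
	}
}
```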

Link: http://lkml.kernel.org/r/20190103150104.17128-4-rpenyaev@suse.de
Signed-off-by: Roman Penyaev &lt;rpenyaev@suse.de&gt;
Cc: Davidlohr Bueso &lt;dbueso@suse.de&gt;
Cc: Jason Baron &lt;jbaron@akamai.com&gt;
Cc: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
Cc: "Paul E. McKenney" &lt;paulmck@linux.vnet.ibm.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>epoll: make sure all elements in ready list are in FIFO order</title>
<updated>2019-03-08T02:32:01Z</updated>
<author>
<name>Roman Penyaev</name>
<email>rpenyaev@suse.de</email>
</author>
<published>2019-03-08T00:28:46Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=c141175d011f18252abb9aa8b018c4e93c71d64b'/>
<id>urn:sha1:c141175d011f18252abb9aa8b018c4e93c71d64b</id>
<content type='text'>
Patch series "use rwlock in order to reduce ep_poll_callback()
contention", v3.

The last patch targets the contention problem in ep_poll_callback(),
which can be easily reproduced by generating events (writes to a pipe or
eventfd) from many threads while a consumer thread does the polling.

The following are some microbenchmark results based on the test [1],
which starts threads that generate N events each.  The test ends when
all events are successfully fetched by the poller thread:

 spinlock
 ========

 threads  events/ms  run-time ms
       8       6402        12495
      16       7045        22709
      32       7395        43268

 rwlock + xchg
 =============

 threads  events/ms  run-time ms
       8      10038         7969
      16      12178        13138
      32      13223        24199

According to the results, the bandwidth of delivered events increases
significantly (by roughly 1.6x to 1.8x), and the execution time is
reduced accordingly.

This patch (of 4):

All incoming events are stored in FIFO order, and this should also apply
to -&gt;ovflist, which is originally a stack, i.e.  LIFO.

Thus, to keep the correct FIFO order, -&gt;ovflist should be reversed by
adding elements to the head of the ready list rather than to the tail.
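
A small userspace sketch of why adding to the head restores FIFO order: the overflow list is a stack, so draining it from the top and inserting each item at the head of the ready list reverses the order a second time. The item type and helper names are hypothetical, and the ready list is singly linked for brevity.

```c
#include "assert.h"
#include "stddef.h"

/* Hypothetical, simplified stand-in for struct epitem. */
struct item {
	int id;
	struct item *ovf_next; /* link in the LIFO overflow stack */
	struct item *rdl_next; /* link in the ready list */
};

/* Callback side while the ready list is being scanned: push on top. */
static struct item *ovflist_push(struct item *top, struct item *it)
{
	assert(it != NULL);
	it->ovf_next = top;
	return it;
}

/*
 * Drain the stack from the top (newest first) and add each item at the
 * head of the ready list; the two reversals cancel out, so the ready
 * list ends up in arrival order.  Returns the new ready-list head.
 */
static struct item *ovflist_drain_to_head(struct item *top,
					  struct item *rdl_head)
{
	struct item *it;

	for (it = top; it != NULL; it = it->ovf_next) {
		it->rdl_next = rdl_head;
		rdl_head = it;
	}
	return rdl_head;
}
```

Pushing items 1, 2, 3 and draining yields a ready list of 1, 2, 3; appending each drained item at the tail instead would yield 3, 2, 1.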

Link: http://lkml.kernel.org/r/20190103150104.17128-2-rpenyaev@suse.de
Signed-off-by: Roman Penyaev &lt;rpenyaev@suse.de&gt;
Reviewed-by: Davidlohr Bueso &lt;dbueso@suse.de&gt;
Cc: Jason Baron &lt;jbaron@akamai.com&gt;
Cc: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
Cc: "Paul E. McKenney" &lt;paulmck@linux.vnet.ibm.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>Merge branch 'akpm' (patches from Andrew)</title>
<updated>2019-01-05T17:16:18Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2019-01-05T17:16:18Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=a65981109f294ba7e64b33ad3b4575a4636fce66'/>
<id>urn:sha1:a65981109f294ba7e64b33ad3b4575a4636fce66</id>
<content type='text'>
Merge more updates from Andrew Morton:

 - procfs updates

 - various misc bits

 - lib/ updates

 - epoll updates

 - autofs

 - fatfs

 - a few more MM bits

* emailed patches from Andrew Morton &lt;akpm@linux-foundation.org&gt;: (58 commits)
  mm/page_io.c: fix polled swap page in
  checkpatch: add Co-developed-by to signature tags
  docs: fix Co-Developed-by docs
  drivers/base/platform.c: kmemleak ignore a known leak
  fs: don't open code lru_to_page()
  fs/: remove caller signal_pending branch predictions
  mm/: remove caller signal_pending branch predictions
  arch/arc/mm/fault.c: remove caller signal_pending_branch predictions
  kernel/sched/: remove caller signal_pending branch predictions
  kernel/locking/mutex.c: remove caller signal_pending branch predictions
  mm: select HAVE_MOVE_PMD on x86 for faster mremap
  mm: speed up mremap by 20x on large regions
  mm: treewide: remove unused address argument from pte_alloc functions
  initramfs: cleanup incomplete rootfs
  scripts/gdb: fix lx-version string output
  kernel/kcov.c: mark write_comp_data() as notrace
  kernel/sysctl: add panic_print into sysctl
  panic: add options to print system info when panic happens
  bfs: extra sanity checking and static inode bitmap
  exec: separate MM_ANONPAGES and RLIMIT_STACK accounting
  ...
</content>
</entry>
<entry>
<title>fs/epoll: deal with wait_queue only once</title>
<updated>2019-01-04T21:13:46Z</updated>
<author>
<name>Davidlohr Bueso</name>
<email>dave@stgolabs.net</email>
</author>
<published>2019-01-03T23:27:26Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=86c051793b4c941ee4481725d57cf2a27f6b3aaf'/>
<id>urn:sha1:86c051793b4c941ee4481725d57cf2a27f6b3aaf</id>
<content type='text'>
There is no reason to rearm the waitqueue upon every fetch_events retry
(for when events are found yet send_events() fails).  If nothing else,
this saves four lock operations per retry and further reduces the scope
of the lock.
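
A hypothetical counting model of the restructuring: with the wait-queue entry added and removed around every fetch_events pass, each retry costs a lock/unlock pair for the add and another for the removal; arming it once moves both out of the retry path. lock_ops_model() is illustrative only and counts operations rather than taking real locks.

```c
#include "assert.h"
#include "stdbool.h"

/*
 * Count spinlock operations (each lock and each unlock counts as one)
 * spent on the wait queue for one epoll_wait() call that needs
 * `retries` extra fetch_events passes.
 */
static int lock_ops_model(int retries, bool arm_once)
{
	bool armed = false;
	int ops = 0;
	int pass;

	assert(retries >= 0);
	for (pass = 0; retries >= pass; pass++) {
		if (!armed) {
			ops += 2;      /* lock, add to wait queue, unlock */
			armed = true;
		}
		/* ... wait for events, then try send_events() ... */
		if (!arm_once) {
			ops += 2;      /* lock, remove from wait queue, unlock */
			armed = false;
		}
	}
	if (arm_once)
		ops += 2;              /* single removal on the way out */
	return ops;
}
```

The difference between the two shapes is four operations per retry, matching the saving described above.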

[akpm@linux-foundation.org: restore code to original position, fix and reflow comment]
Link: http://lkml.kernel.org/r/20181114182532.27981-2-dave@stgolabs.net
Signed-off-by: Davidlohr Bueso &lt;dbueso@suse.de&gt;
Cc: Jason Baron &lt;jbaron@akamai.com&gt;
Cc: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>fs/epoll: rename check_events label to send_events</title>
<updated>2019-01-04T21:13:46Z</updated>
<author>
<name>Davidlohr Bueso</name>
<email>dave@stgolabs.net</email>
</author>
<published>2019-01-03T23:27:22Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=35cff1a6e0236500584a8ae227fe08120d9b5ee2'/>
<id>urn:sha1:35cff1a6e0236500584a8ae227fe08120d9b5ee2</id>
<content type='text'>
It is currently called check_events because it, well, did exactly that.
However, since the lockless ep_events_available() call, the label no
longer checks anything; it just sends the events.  Rename it accordingly.

Link: http://lkml.kernel.org/r/20181114182532.27981-1-dave@stgolabs.net
Signed-off-by: Davidlohr Bueso &lt;dbueso@suse.de&gt;
Reviewed-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Cc: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
Cc: Jason Baron &lt;jbaron@akamai.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>fs/epoll: avoid barrier after an epoll_wait(2) timeout</title>
<updated>2019-01-04T21:13:46Z</updated>
<author>
<name>Davidlohr Bueso</name>
<email>dave@stgolabs.net</email>
</author>
<published>2019-01-03T23:27:19Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=abc610e01c663e25c41a3bdcbc4115cd7fbb047b'/>
<id>urn:sha1:abc610e01c663e25c41a3bdcbc4115cd7fbb047b</id>
<content type='text'>
Upon timeout, we can just exit the loop without the cost of changing
the task's state with an smp_store_mb() call.  Just exit the loop and
be done; setting the task state afterwards would, of course, be
redundant.
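
A hypothetical control-flow model of the wait loop: set_state counts how often the loop changes the task state (set_current_state(), an smp_store_mb() in the kernel). In the model's older shape the timeout is only noticed at the top of the next iteration, paying for one more state change; breaking out immediately avoids it. The function and parameter names are illustrative.

```c
#include "assert.h"
#include "stdbool.h"

/*
 * Model an epoll_wait() that finds no events and times out after
 * `sleeps` calls to the (imaginary) sleep primitive.  Returns how many
 * times the task state was set.
 */
static int state_changes_model(int sleeps, bool break_on_timeout)
{
	bool timed_out = false;
	int set_state = 0;
	int slept = 0;

	assert(sleeps > 0);
	for (;;) {
		set_state++;            /* set_current_state(TASK_INTERRUPTIBLE) */
		if (timed_out)
			break;          /* old shape: timeout noticed here */
		slept++;                /* sleep until timeout or wakeup */
		if (slept == sleeps) {
			if (break_on_timeout)
				break;  /* new shape: leave right away */
			timed_out = true;
		}
	}
	return set_state;
}
```

The saving is exactly one state change, and its barrier, per timed-out call.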

[dave@stgolabs.net: forgotten fixlets]
  Link: http://lkml.kernel.org/r/20181109155258.jxcr4t2pnz6zqct3@linux-r8p5
Link: http://lkml.kernel.org/r/20181108051006.18751-7-dave@stgolabs.net
Signed-off-by: Davidlohr Bueso &lt;dave@stgolabs.net&gt;
Reviewed-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Cc: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
Cc: Davidlohr Bueso &lt;dbueso@suse.de&gt;
Cc: Jason Baron &lt;jbaron@akamai.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>fs/epoll: reduce the scope of wq lock in epoll_wait()</title>
<updated>2019-01-04T21:13:46Z</updated>
<author>
<name>Davidlohr Bueso</name>
<email>dave@stgolabs.net</email>
</author>
<published>2019-01-03T23:27:15Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=c5a282e9635e9c7382821565083db5d260085e3e'/>
<id>urn:sha1:c5a282e9635e9c7382821565083db5d260085e3e</id>
<content type='text'>
This patch aims to reduce ep-&gt;wq.lock hold times in epoll_wait(2).  For
the blocking case, there is no need to constantly take and drop the
spinlock, which is only needed to manipulate the waitqueue.

The call to ep_events_available() is now lockless and only exposed to
benign races.  On a false positive (it reports available events without
seeing another thread delete an epi from the list), we call into
send_events and the list's state is then seen correctly.  On the other
hand, on a false negative, where we miss a list_add_tail() from, for
example, the irq callback, the state is rechecked before blocking, which
will see the correct state.

To more accurately observe a concurrent list_del_init(), use the
list_empty_careful() variant; of course, this won't be safe against
insertions from wakeup.

For the overflow list, we obviously need to prevent load/store tearing,
as we don't want to see partial values while the ready list is disabled.
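
A single-threaded userspace sketch of the lockless check described above. list_empty_careful_model() mirrors the idea behind list_empty_careful() (both links must point back at the head), a relaxed C11 atomic load stands in for READ_ONCE() on the overflow list, and UNACTIVE_PTR stands in for EP_UNACTIVE_PTR; all names are hypothetical, and the sketch does not model the concurrent writers.

```c
#include "assert.h"
#include "stdatomic.h"
#include "stdbool.h"
#include "stddef.h"

struct list_head {
	struct list_head *next, *prev;
};

/* Hypothetical stand-in for EP_UNACTIVE_PTR: no scan in progress. */
#define UNACTIVE_PTR ((void *) -1L)

/* Empty only if next AND prev both point back at the head. */
static bool list_empty_careful_model(const struct list_head *head)
{
	const struct list_head *next = head->next;

	if (next != head)
		return false;
	return next == head->prev;
}

/*
 * Lockless hint: a false positive just leads to send_events, and a
 * false negative is caught by the recheck before sleeping.
 */
static bool events_available_model(const struct list_head *rdllist,
				   _Atomic(void *) *ovflist)
{
	assert(rdllist != NULL);
	if (!list_empty_careful_model(rdllist))
		return true;
	/* Relaxed load plays the role of READ_ONCE(): no torn values. */
	return atomic_load_explicit(ovflist, memory_order_relaxed) !=
	       UNACTIVE_PTR;
}
```

A half-finished deletion (next already pointing at the head, prev not yet) still reports events, which is the extra accuracy the careful variant buys.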

[dave@stgolabs.net: forgotten fixlets]
  Link: http://lkml.kernel.org/r/20181109155258.jxcr4t2pnz6zqct3@linux-r8p5
Link: http://lkml.kernel.org/r/20181108051006.18751-6-dave@stgolabs.net
Signed-off-by: Davidlohr Bueso &lt;dbueso@suse.de&gt;
Suggested-by: Jason Baron &lt;jbaron@akamai.com&gt;
Cc: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>fs/epoll: robustify ep-&gt;mtx held checks</title>
<updated>2019-01-04T21:13:46Z</updated>
<author>
<name>Davidlohr Bueso</name>
<email>dave@stgolabs.net</email>
</author>
<published>2019-01-03T23:27:12Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=21877e1a5b520132f54515f8835c963056418b4c'/>
<id>urn:sha1:21877e1a5b520132f54515f8835c963056418b4c</id>
<content type='text'>
Instead of just commenting on how important it is, let's make it more
robust and add a lockdep_assert_held() call.
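
A rough userspace analogue of the idea, assuming POSIX threads: like lockdep, it keeps a per-thread list of held locks, so a helper can assert that its caller holds the mutex instead of relying on a comment. tracked_lock(), tracked_unlock(), lock_is_held() and modify_under_mtx() are hypothetical; in the kernel, lockdep_assert_held() only does anything when lockdep is configured in.

```c
#include "assert.h"
#include "pthread.h"
#include "stdbool.h"

#define MAX_HELD 8

/* Per-thread record of held locks, loosely like lockdep's held_locks. */
static _Thread_local const void *held_locks[MAX_HELD];
static _Thread_local int nr_held;

static void tracked_lock(pthread_mutex_t *m)
{
	pthread_mutex_lock(m);
	assert(MAX_HELD > nr_held);
	held_locks[nr_held++] = m;
}

static void tracked_unlock(pthread_mutex_t *m)
{
	int i;

	for (i = 0; nr_held > i; i++) {
		if (held_locks[i] == m) {
			held_locks[i] = held_locks[--nr_held];
			break;
		}
	}
	pthread_mutex_unlock(m);
}

static bool lock_is_held(const void *lock)
{
	int i;

	for (i = 0; nr_held > i; i++)
		if (held_locks[i] == lock)
			return true;
	return false;
}

/* Hypothetical helper whose contract used to live only in a comment. */
static int modify_under_mtx(pthread_mutex_t *mtx, int *counter)
{
	assert(lock_is_held(mtx)); /* plays the role of lockdep_assert_held() */
	return ++*counter;
}
```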

Link: http://lkml.kernel.org/r/20181108051006.18751-5-dave@stgolabs.net
Signed-off-by: Davidlohr Bueso &lt;dbueso@suse.de&gt;
Reviewed-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Cc: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
Cc: Jason Baron &lt;jbaron@akamai.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
</feed>
