<feed xmlns='http://www.w3.org/2005/Atom'>
<title>kernel/ipc/mqueue.c, branch linux-5.10.y</title>
<subtitle>Hosts the 0x221E linux distro kernel.</subtitle>
<id>https://universe.0xinfinity.dev/distro/kernel/atom?h=linux-5.10.y</id>
<link rel='self' href='https://universe.0xinfinity.dev/distro/kernel/atom?h=linux-5.10.y'/>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/'/>
<updated>2022-06-09T08:21:17Z</updated>
<entry>
<title>ipc/mqueue: use get_tree_nodev() in mqueue_get_tree()</title>
<updated>2022-06-09T08:21:17Z</updated>
<author>
<name>Waiman Long</name>
<email>longman@redhat.com</email>
</author>
<published>2022-05-10T01:29:21Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=2a9d3b51185b3f04253e5474f3178aee7ff948fd'/>
<id>urn:sha1:2a9d3b51185b3f04253e5474f3178aee7ff948fd</id>
<content type='text'>
[ Upstream commit d60c4d01a98bc1942dba6e3adc02031f5519f94b ]

When running the stress-ng clone benchmark with multiple testing threads,
it was found that there were significant spinlock contention in sget_fc().
The contended spinlock was the sb_lock.  It is under heavy contention
because the following code in the critcal section of sget_fc():

  hlist_for_each_entry(old, &amp;fc-&gt;fs_type-&gt;fs_supers, s_instances) {
      if (test(old, fc))
          goto share_extant_sb;
  }

After testing with added instrumentation code, it was found that the
benchmark could generate thousands of ipc namespaces with the
corresponding number of entries in the mqueue's fs_supers list where the
namespaces are the key for the search.  This leads to excessive time in
scanning the list for a match.

Looking back at the mqueue calling sequence leading to sget_fc():

  mq_init_ns()
  =&gt; mq_create_mount()
  =&gt; fc_mount()
  =&gt; vfs_get_tree()
  =&gt; mqueue_get_tree()
  =&gt; get_tree_keyed()
  =&gt; vfs_get_super()
  =&gt; sget_fc()

Currently, mq_init_ns() is the only mqueue function that will indirectly
call mqueue_get_tree() with a newly allocated ipc namespace as the key for
searching.  As a result, there will never be a match with the exising ipc
namespaces stored in the mqueue's fs_supers list.

So using get_tree_keyed() to do an existing ipc namespace search is just a
waste of time.  Instead, we could use get_tree_nodev() to eliminate the
useless search.  By doing so, we can greatly reduce the sb_lock hold time
and avoid the spinlock contention problem in case a large number of ipc
namespaces are present.

Of course, if the code is modified in the future to allow
mqueue_get_tree() to be called with an existing ipc namespace instead of a
new one, we will have to use get_tree_keyed() in this case.

The following stress-ng clone benchmark command was run on a 2-socket
48-core Intel system:

./stress-ng --clone 32 --verbose --oomable --metrics-brief -t 20

The "bogo ops/s" increased from 5948.45 before patch to 9137.06 after
patch. This is an increase of 54% in performance.

Link: https://lkml.kernel.org/r/20220121172315.19652-1-longman@redhat.com
Fixes: 935c6912b198 ("ipc: Convert mqueue fs to fs_context")
Signed-off-by: Waiman Long &lt;longman@redhat.com&gt;
Cc: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
Cc: David Howells &lt;dhowells@redhat.com&gt;
Cc: Manfred Spraul &lt;manfred@colorfullife.com&gt;
Cc: Davidlohr Bueso &lt;dave@stgolabs.net&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>ipc/mqueue, msg, sem: avoid relying on a stack reference past its expiry</title>
<updated>2021-05-26T10:06:54Z</updated>
<author>
<name>Varad Gautam</name>
<email>varad.gautam@suse.com</email>
</author>
<published>2021-05-23T00:41:49Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=4528c0c323085e645b8765913b4a7fd42cf49b65'/>
<id>urn:sha1:4528c0c323085e645b8765913b4a7fd42cf49b65</id>
<content type='text'>
commit a11ddb37bf367e6b5239b95ca759e5389bb46048 upstream.

do_mq_timedreceive calls wq_sleep with a stack local address.  The
sender (do_mq_timedsend) uses this address to later call pipelined_send.

This leads to a very hard to trigger race where a do_mq_timedreceive
call might return and leave do_mq_timedsend to rely on an invalid
address, causing the following crash:

  RIP: 0010:wake_q_add_safe+0x13/0x60
  Call Trace:
   __x64_sys_mq_timedsend+0x2a9/0x490
   do_syscall_64+0x80/0x680
   entry_SYSCALL_64_after_hwframe+0x44/0xa9
  RIP: 0033:0x7f5928e40343

The race occurs as:

1. do_mq_timedreceive calls wq_sleep with the address of `struct
   ext_wait_queue` on function stack (aliased as `ewq_addr` here) - it
   holds a valid `struct ext_wait_queue *` as long as the stack has not
   been overwritten.

2. `ewq_addr` gets added to info-&gt;e_wait_q[RECV].list in wq_add, and
   do_mq_timedsend receives it via wq_get_first_waiter(info, RECV) to call
   __pipelined_op.

3. Sender calls __pipelined_op::smp_store_release(&amp;this-&gt;state,
   STATE_READY).  Here is where the race window begins.  (`this` is
   `ewq_addr`.)

4. If the receiver wakes up now in do_mq_timedreceive::wq_sleep, it
   will see `state == STATE_READY` and break.

5. do_mq_timedreceive returns, and `ewq_addr` is no longer guaranteed
   to be a `struct ext_wait_queue *` since it was on do_mq_timedreceive's
   stack.  (Although the address may not get overwritten until another
   function happens to touch it, which means it can persist around for an
   indefinite time.)

6. do_mq_timedsend::__pipelined_op() still believes `ewq_addr` is a
   `struct ext_wait_queue *`, and uses it to find a task_struct to pass to
   the wake_q_add_safe call.  In the lucky case where nothing has
   overwritten `ewq_addr` yet, `ewq_addr-&gt;task` is the right task_struct.
   In the unlucky case, __pipelined_op::wake_q_add_safe gets handed a
   bogus address as the receiver's task_struct causing the crash.

do_mq_timedsend::__pipelined_op() should not dereference `this` after
setting STATE_READY, as the receiver counterpart is now free to return.
Change __pipelined_op to call wake_q_add_safe on the receiver's
task_struct returned by get_task_struct, instead of dereferencing `this`
which sits on the receiver's stack.

As Manfred pointed out, the race potentially also exists in
ipc/msg.c::expunge_all and ipc/sem.c::wake_up_sem_queue_prepare.  Fix
those in the same way.

Link: https://lkml.kernel.org/r/20210510102950.12551-1-varad.gautam@suse.com
Fixes: c5b2cbdbdac563 ("ipc/mqueue.c: update/document memory barriers")
Fixes: 8116b54e7e23ef ("ipc/sem.c: document and update memory barriers")
Fixes: 0d97a82ba830d8 ("ipc/msg.c: update and document memory barriers")
Signed-off-by: Varad Gautam &lt;varad.gautam@suse.com&gt;
Reported-by: Matthias von Faber &lt;matthias.vonfaber@aox-tech.de&gt;
Acked-by: Davidlohr Bueso &lt;dbueso@suse.de&gt;
Acked-by: Manfred Spraul &lt;manfred@colorfullife.com&gt;
Cc: Christian Brauner &lt;christian.brauner@ubuntu.com&gt;
Cc: Oleg Nesterov &lt;oleg@redhat.com&gt;
Cc: "Eric W. Biederman" &lt;ebiederm@xmission.com&gt;
Cc: &lt;stable@vger.kernel.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>ipc/mqueue.c: change __do_notify() to bypass check_kill_permission()</title>
<updated>2020-05-08T02:27:20Z</updated>
<author>
<name>Oleg Nesterov</name>
<email>oleg@redhat.com</email>
</author>
<published>2020-05-08T01:35:39Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=b5f2006144c6ae941726037120fa1001ddede784'/>
<id>urn:sha1:b5f2006144c6ae941726037120fa1001ddede784</id>
<content type='text'>
Commit cc731525f26a ("signal: Remove kernel interal si_code magic")
changed the value of SI_FROMUSER(SI_MESGQ), this means that mq_notify() no
longer works if the sender doesn't have rights to send a signal.

Change __do_notify() to use do_send_sig_info() instead of kill_pid_info()
to avoid check_kill_permission().

This needs the additional notify.sigev_signo != 0 check, shouldn't we
change do_mq_notify() to deny sigev_signo == 0 ?

Test-case:

	#include &lt;signal.h&gt;
	#include &lt;mqueue.h&gt;
	#include &lt;unistd.h&gt;
	#include &lt;sys/wait.h&gt;
	#include &lt;assert.h&gt;

	static int notified;

	static void sigh(int sig)
	{
		notified = 1;
	}

	int main(void)
	{
		signal(SIGIO, sigh);

		int fd = mq_open("/mq", O_RDWR|O_CREAT, 0666, NULL);
		assert(fd &gt;= 0);

		struct sigevent se = {
			.sigev_notify	= SIGEV_SIGNAL,
			.sigev_signo	= SIGIO,
		};
		assert(mq_notify(fd, &amp;se) == 0);

		if (!fork()) {
			assert(setuid(1) == 0);
			mq_send(fd, "",1,0);
			return 0;
		}

		wait(NULL);
		mq_unlink("/mq");
		assert(notified);
		return 0;
	}

[manfred@colorfullife.com: 1) Add self_exec_id evaluation so that the implementation matches do_notify_parent 2) use PIDTYPE_TGID everywhere]
Fixes: cc731525f26a ("signal: Remove kernel interal si_code magic")
Reported-by: Yoji &lt;yoji.fujihar.min@gmail.com&gt;
Signed-off-by: Oleg Nesterov &lt;oleg@redhat.com&gt;
Signed-off-by: Manfred Spraul &lt;manfred@colorfullife.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Acked-by: "Eric W. Biederman" &lt;ebiederm@xmission.com&gt;
Cc: Davidlohr Bueso &lt;dave@stgolabs.net&gt;
Cc: Markus Elfring &lt;elfring@users.sourceforge.net&gt;
Cc: &lt;1vier1@web.de&gt;
Cc: &lt;stable@vger.kernel.org&gt;
Link: http://lkml.kernel.org/r/e2a782e4-eab9-4f5c-c749-c07a8f7a4e66@colorfullife.com
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>ipc/mqueue.c: fix a brace coding style issue</title>
<updated>2020-04-07T17:43:45Z</updated>
<author>
<name>Somala Swaraj</name>
<email>somalaswaraj@gmail.com</email>
</author>
<published>2020-04-07T03:12:53Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=43afe4d3661b04915aa357c5ab0f2a08718af9fd'/>
<id>urn:sha1:43afe4d3661b04915aa357c5ab0f2a08718af9fd</id>
<content type='text'>
Signed-off-by: somala swaraj &lt;somalaswaraj@gmail.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Link: http://lkml.kernel.org/r/20200301135530.18340-1-somalaswaraj@gmail.com
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>ipc/mqueue.c: update/document memory barriers</title>
<updated>2020-02-04T03:05:23Z</updated>
<author>
<name>Manfred Spraul</name>
<email>manfred@colorfullife.com</email>
</author>
<published>2020-02-04T01:34:36Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=c5b2cbdbdac563f46ecd5e187253ab1abbd6fc04'/>
<id>urn:sha1:c5b2cbdbdac563f46ecd5e187253ab1abbd6fc04</id>
<content type='text'>
Update and document memory barriers for mqueue.c:

- ewp-&gt;state is read without any locks, thus READ_ONCE is required.

- add smp_aquire__after_ctrl_dep() after the READ_ONCE, we need
  acquire semantics if the value is STATE_READY.

- use wake_q_add_safe()

- document why __set_current_state() may be used:
  Reading task-&gt;state cannot happen before the wake_q_add() call,
  which happens while holding info-&gt;lock. Thus the spin_unlock()
  is the RELEASE, and the spin_lock() is the ACQUIRE.

For completeness: there is also a 3 CPU scenario, if the to be woken
up task is already on another wake_q.
Then:
- CPU1: spin_unlock() of the task that goes to sleep is the RELEASE
- CPU2: the spin_lock() of the waker is the ACQUIRE
- CPU2: smp_mb__before_atomic inside wake_q_add() is the RELEASE
- CPU3: smp_mb__after_spinlock() inside try_to_wake_up() is the ACQUIRE

Link: http://lkml.kernel.org/r/20191020123305.14715-4-manfred@colorfullife.com
Signed-off-by: Manfred Spraul &lt;manfred@colorfullife.com&gt;
Reviewed-by: Davidlohr Bueso &lt;dbueso@suse.de&gt;
Cc: Waiman Long &lt;longman@redhat.com&gt;
Cc: &lt;1vier1@web.de&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Will Deacon &lt;will.deacon@arm.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>ipc/mqueue.c: remove duplicated code</title>
<updated>2020-02-04T03:05:23Z</updated>
<author>
<name>Davidlohr Bueso</name>
<email>dave@stgolabs.net</email>
</author>
<published>2020-02-04T01:34:32Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=ed29f171518cbe11c81e8c20d393bb094a9e2ce7'/>
<id>urn:sha1:ed29f171518cbe11c81e8c20d393bb094a9e2ce7</id>
<content type='text'>
pipelined_send() and pipelined_receive() are identical, so merge them.

[manfred@colorfullife.com: add changelog]
Link: http://lkml.kernel.org/r/20191020123305.14715-3-manfred@colorfullife.com
Signed-off-by: Davidlohr Bueso &lt;dave@stgolabs.net&gt;
Signed-off-by: Manfred Spraul &lt;manfred@colorfullife.com&gt;
Cc: &lt;1vier1@web.de&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Waiman Long &lt;longman@redhat.com&gt;
Cc: Will Deacon &lt;will.deacon@arm.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>ipc/mqueue: improve exception handling in do_mq_notify()</title>
<updated>2019-09-26T00:51:41Z</updated>
<author>
<name>Markus Elfring</name>
<email>elfring@users.sourceforge.net</email>
</author>
<published>2019-09-25T23:48:17Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=c231740dd95e854de5034cff8f49737d942bc098'/>
<id>urn:sha1:c231740dd95e854de5034cff8f49737d942bc098</id>
<content type='text'>
Null pointers were assigned to local variables in a few cases as exception
handling.  The jump target “out” was used where no meaningful data
processing actions should eventually be performed by branches of an if
statement then.  Use an additional jump target for calling dev_kfree_skb()
directly.

Return also directly after error conditions were detected when no extra
clean-up is needed by this function implementation.

Link: http://lkml.kernel.org/r/592ef10e-0b69-72d0-9789-fc48f638fdfd@web.de
Signed-off-by: Markus Elfring &lt;elfring@users.sourceforge.net&gt;
Cc: Davidlohr Bueso &lt;dave@stgolabs.net&gt;
Cc: Manfred Spraul &lt;manfred@colorfullife.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>ipc/mqueue.c: delete an unnecessary check before the macro call dev_kfree_skb()</title>
<updated>2019-09-26T00:51:41Z</updated>
<author>
<name>Markus Elfring</name>
<email>elfring@users.sourceforge.net</email>
</author>
<published>2019-09-25T23:48:14Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=97b0b1ad58fab9f71b1a6bc056a09af6065ec3bc'/>
<id>urn:sha1:97b0b1ad58fab9f71b1a6bc056a09af6065ec3bc</id>
<content type='text'>
dev_kfree_skb() input parameter validation, thus the test around the call
is not needed.

This issue was detected by using the Coccinelle software.

Link: http://lkml.kernel.org/r/07477187-63e5-cc80-34c1-32dd16b38e12@web.de
Signed-off-by: Markus Elfring &lt;elfring@users.sourceforge.net&gt;
Cc: Davidlohr Bueso &lt;dave@stgolabs.net&gt;
Cc: Manfred Spraul &lt;manfred@colorfullife.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>new helper: get_tree_keyed()</title>
<updated>2019-09-05T18:34:22Z</updated>
<author>
<name>Al Viro</name>
<email>viro@zeniv.linux.org.uk</email>
</author>
<published>2019-09-03T23:05:48Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=533770cc0ae84890624dc129609f3d75855c8982'/>
<id>urn:sha1:533770cc0ae84890624dc129609f3d75855c8982</id>
<content type='text'>
For vfs_get_keyed_super users.

Signed-off-by: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
</content>
</entry>
<entry>
<title>Merge branch 'work.mount0' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs</title>
<updated>2019-07-19T17:42:02Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2019-07-19T17:42:02Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=933a90bf4f3505f8ec83bda21a3c7d70d7c2b426'/>
<id>urn:sha1:933a90bf4f3505f8ec83bda21a3c7d70d7c2b426</id>
<content type='text'>
Pull vfs mount updates from Al Viro:
 "The first part of mount updates.

  Convert filesystems to use the new mount API"

* 'work.mount0' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (63 commits)
  mnt_init(): call shmem_init() unconditionally
  constify ksys_mount() string arguments
  don't bother with registering rootfs
  init_rootfs(): don't bother with init_ramfs_fs()
  vfs: Convert smackfs to use the new mount API
  vfs: Convert selinuxfs to use the new mount API
  vfs: Convert securityfs to use the new mount API
  vfs: Convert apparmorfs to use the new mount API
  vfs: Convert openpromfs to use the new mount API
  vfs: Convert xenfs to use the new mount API
  vfs: Convert gadgetfs to use the new mount API
  vfs: Convert oprofilefs to use the new mount API
  vfs: Convert ibmasmfs to use the new mount API
  vfs: Convert qib_fs/ipathfs to use the new mount API
  vfs: Convert efivarfs to use the new mount API
  vfs: Convert configfs to use the new mount API
  vfs: Convert binfmt_misc to use the new mount API
  convenience helper: get_tree_single()
  convenience helper get_tree_nodev()
  vfs: Kill sget_userns()
  ...
</content>
</entry>
</feed>
