<feed xmlns='http://www.w3.org/2005/Atom'>
<title>kernel/fs/exec.c, branch linux-4.1.y</title>
<subtitle>Hosts the 0x221E linux distro kernel.</subtitle>
<id>https://universe.0xinfinity.dev/distro/kernel/atom?h=linux-4.1.y</id>
<link rel='self' href='https://universe.0xinfinity.dev/distro/kernel/atom?h=linux-4.1.y'/>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/'/>
<updated>2017-07-31T17:37:55Z</updated>
<entry>
<title>exec: Limit arg stack to at most 75% of _STK_LIM</title>
<updated>2017-07-31T17:37:55Z</updated>
<author>
<name>Kees Cook</name>
<email>keescook@chromium.org</email>
</author>
<published>2017-07-07T18:57:29Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=d5e990d96459decaa24e5cb7918451f12c8582a9'/>
<id>urn:sha1:d5e990d96459decaa24e5cb7918451f12c8582a9</id>
<content type='text'>
[ Upstream commit da029c11e6b12f321f36dac8771e833b65cec962 ]

To avoid pathological stack usage or the need to special-case setuid
execs, just limit all arg stack usage to at most 75% of _STK_LIM (6MB).

Signed-off-by: Kees Cook &lt;keescook@chromium.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Sasha Levin &lt;alexander.levin@verizon.com&gt;
</content>
</entry>
<entry>
<title>fs/exec.c: account for argv/envp pointers</title>
<updated>2017-07-31T17:37:47Z</updated>
<author>
<name>Kees Cook</name>
<email>keescook@chromium.org</email>
</author>
<published>2017-06-23T22:08:57Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=a9cea2f13c889658769ba50d46cb0e88900e6795'/>
<id>urn:sha1:a9cea2f13c889658769ba50d46cb0e88900e6795</id>
<content type='text'>
[ Upstream commit 98da7d08850fb8bdeb395d6368ed15753304aa0c ]

When limiting the argv/envp strings during exec to 1/4 of the stack limit,
the storage of the pointers to the strings was not included.  This means
that an exec with huge numbers of tiny strings could eat 1/4 of the stack
limit in strings and then additional space would be later used by the
pointers to the strings.

For example, on 32-bit with a 8MB stack rlimit, an exec with 1677721
single-byte strings would consume less than 2MB of stack, the max (8MB /
4) amount allowed, but the pointers to the strings would consume the
remaining additional stack space (1677721 * 4 == 6710884).

The result (1677721 + 6710884 == 8388605) would exhaust stack space
entirely.  Controlling this stack exhaustion could result in
pathological behavior in setuid binaries (CVE-2017-1000365).

[akpm@linux-foundation.org: additional commenting from Kees]
Fixes: b6a2fea39318 ("mm: variable length argument support")
Link: http://lkml.kernel.org/r/20170622001720.GA32173@beast
Signed-off-by: Kees Cook &lt;keescook@chromium.org&gt;
Acked-by: Rik van Riel &lt;riel@redhat.com&gt;
Acked-by: Michal Hocko &lt;mhocko@suse.com&gt;
Cc: Alexander Viro &lt;viro@zeniv.linux.org.uk&gt;
Cc: Qualys Security Advisory &lt;qsa@qualys.com&gt;
Cc: &lt;stable@vger.kernel.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;

Signed-off-by: Sasha Levin &lt;alexander.levin@verizon.com&gt;
</content>
</entry>
<entry>
<title>ptrace: Capture the ptracer's creds not PT_PTRACE_CAP</title>
<updated>2017-03-03T02:51:40Z</updated>
<author>
<name>Eric W. Biederman</name>
<email>ebiederm@xmission.com</email>
</author>
<published>2016-11-15T00:48:07Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=6d2374517c06dd8ae4b478d355817563d1043317'/>
<id>urn:sha1:6d2374517c06dd8ae4b478d355817563d1043317</id>
<content type='text'>
[ Upstream commit 64b875f7ac8a5d60a4e191479299e931ee949b67 ]

When the flag PT_PTRACE_CAP was added the PTRACE_TRACEME path was
overlooked.  This can result in incorrect behavior when an application
like strace traces an exec of a setuid executable.

Further PT_PTRACE_CAP does not have enough information for making good
security decisions as it does not report which user namespace the
capability is in.  This has already allowed one mistake through
insufficient granulariy.

I found this issue when I was testing another corner case of exec and
discovered that I could not get strace to set PT_PTRACE_CAP even when
running strace as root with a full set of caps.

This change fixes the above issue with strace allowing stracing as
root a setuid executable without disabling setuid.  More fundamentaly
this change allows what is allowable at all times, by using the correct
information in it's decision.

Cc: stable@vger.kernel.org
Fixes: 4214e42f96d4 ("v2.4.9.11 -&gt; v2.4.9.12")
Signed-off-by: "Eric W. Biederman" &lt;ebiederm@xmission.com&gt;
Signed-off-by: Sasha Levin &lt;alexander.levin@verizon.com&gt;
</content>
</entry>
<entry>
<title>fs: exec: apply CLOEXEC before changing dumpable task flags</title>
<updated>2017-01-13T01:56:58Z</updated>
<author>
<name>Aleksa Sarai</name>
<email>asarai@suse.de</email>
</author>
<published>2016-12-21T05:26:24Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=363f1a90b7f3057efa6e08bf7b707874232f153c'/>
<id>urn:sha1:363f1a90b7f3057efa6e08bf7b707874232f153c</id>
<content type='text'>
[ Upstream commit 613cc2b6f272c1a8ad33aefa21cad77af23139f7 ]

If you have a process that has set itself to be non-dumpable, and it
then undergoes exec(2), any CLOEXEC file descriptors it has open are
"exposed" during a race window between the dumpable flags of the process
being reset for exec(2) and CLOEXEC being applied to the file
descriptors. This can be exploited by a process by attempting to access
/proc/&lt;pid&gt;/fd/... during this window, without requiring CAP_SYS_PTRACE.

The race in question is after set_dumpable has been (for get_link,
though the trace is basically the same for readlink):

[vfs]
-&gt; proc_pid_link_inode_operations.get_link
   -&gt; proc_pid_get_link
      -&gt; proc_fd_access_allowed
         -&gt; ptrace_may_access(task, PTRACE_MODE_READ_FSCREDS);

Which will return 0, during the race window and CLOEXEC file descriptors
will still be open during this window because do_close_on_exec has not
been called yet. As a result, the ordering of these calls should be
reversed to avoid this race window.

This is of particular concern to container runtimes, where joining a
PID namespace with file descriptors referring to the host filesystem
can result in security issues (since PRCTL_SET_DUMPABLE doesn't protect
against access of CLOEXEC file descriptors -- file descriptors which may
reference filesystem objects the container shouldn't have access to).

Cc: dev@opencontainers.org
Cc: &lt;stable@vger.kernel.org&gt; # v3.2+
Reported-by: Michael Crosby &lt;crosbymichael@gmail.com&gt;
Signed-off-by: Aleksa Sarai &lt;asarai@suse.de&gt;
Signed-off-by: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
Signed-off-by: Sasha Levin &lt;alexander.levin@verizon.com&gt;
</content>
</entry>
<entry>
<title>parisc,metag: Fix crashes due to stack randomization on stack-grows-upwards architectures</title>
<updated>2015-05-12T20:03:44Z</updated>
<author>
<name>Helge Deller</name>
<email>deller@gmx.de</email>
</author>
<published>2015-05-11T20:01:27Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=d045c77c1a69703143a36169c224429c48b9eecd'/>
<id>urn:sha1:d045c77c1a69703143a36169c224429c48b9eecd</id>
<content type='text'>
On architectures where the stack grows upwards (CONFIG_STACK_GROWSUP=y,
currently parisc and metag only) stack randomization sometimes leads to crashes
when the stack ulimit is set to lower values than STACK_RND_MASK (which is 8 MB
by default if not defined in arch-specific headers).

The problem is, that when the stack vm_area_struct is set up in fs/exec.c, the
additional space needed for the stack randomization (as defined by the value of
STACK_RND_MASK) was not taken into account yet and as such, when the stack
randomization code added a random offset to the stack start, the stack
effectively got smaller than what the user defined via rlimit_max(RLIMIT_STACK)
which then sometimes leads to out-of-stack situations and crashes.

This patch fixes it by adding the maximum possible amount of memory (based on
STACK_RND_MASK) which theoretically could be added by the stack randomization
code to the initial stack size. That way, the user-defined stack size is always
guaranteed to be at minimum what is defined via rlimit_max(RLIMIT_STACK).

This bug is currently not visible on the metag architecture, because on metag
STACK_RND_MASK is defined to 0 which effectively disables stack randomization.

The changes to fs/exec.c are inside an "#ifdef CONFIG_STACK_GROWSUP"
section, so it does not affect other platformws beside those where the
stack grows upwards (parisc and metag).

Signed-off-by: Helge Deller &lt;deller@gmx.de&gt;
Cc: linux-parisc@vger.kernel.org
Cc: James Hogan &lt;james.hogan@imgtec.com&gt;
Cc: linux-metag@vger.kernel.org
Cc: stable@vger.kernel.org # v3.16+
</content>
</entry>
<entry>
<title>fs: take i_mutex during prepare_binprm for set[ug]id executables</title>
<updated>2015-04-19T20:46:21Z</updated>
<author>
<name>Jann Horn</name>
<email>jann@thejh.net</email>
</author>
<published>2015-04-19T00:48:39Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=8b01fc86b9f425899f8a3a8fc1c47d73c2c20543'/>
<id>urn:sha1:8b01fc86b9f425899f8a3a8fc1c47d73c2c20543</id>
<content type='text'>
This prevents a race between chown() and execve(), where chowning a
setuid-user binary to root would momentarily make the binary setuid
root.

This patch was mostly written by Linus Torvalds.

Signed-off-by: Jann Horn &lt;jann@thejh.net&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>fs/exec.c:de_thread: move notify_count write under lock</title>
<updated>2015-04-17T13:04:07Z</updated>
<author>
<name>Kirill Tkhai</name>
<email>ktkhai@parallels.com</email>
</author>
<published>2015-04-16T19:48:01Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=dfcce791fb0ad06f3f0b745a23160b9d8858fe39'/>
<id>urn:sha1:dfcce791fb0ad06f3f0b745a23160b9d8858fe39</id>
<content type='text'>
We set sig-&gt;notify_count = -1 between RELEASE and ACQUIRE operations:

	spin_unlock_irq(lock);
	...
	if (!thread_group_leader(tsk)) {
		...
                for (;;) {
			sig-&gt;notify_count = -1;
                        write_lock_irq(&amp;tasklist_lock);

There are no restriction on it so other processors may see this STORE
mixed with other STOREs in both areas limited by the spinlocks.

Probably, it may be reordered with the above

	sig-&gt;group_exit_task = tsk;
	sig-&gt;notify_count = zap_other_threads(tsk);

in some way.

Set it under tasklist_lock locked to be sure nothing will be reordered.

Signed-off-by: Kirill Tkhai &lt;ktkhai@parallels.com&gt;
Acked-by: Oleg Nesterov &lt;oleg@redhat.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>prctl: avoid using mmap_sem for exe_file serialization</title>
<updated>2015-04-17T13:04:07Z</updated>
<author>
<name>Davidlohr Bueso</name>
<email>dave@stgolabs.net</email>
</author>
<published>2015-04-16T19:47:59Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=6e399cd144d8500ffb5d40fa6848890e2580a80a'/>
<id>urn:sha1:6e399cd144d8500ffb5d40fa6848890e2580a80a</id>
<content type='text'>
Oleg cleverly suggested using xchg() to set the new mm-&gt;exe_file instead
of calling set_mm_exe_file() which requires some form of serialization --
mmap_sem in this case.  For archs that do not have atomic rmw instructions
we still fallback to a spinlock alternative, so this should always be
safe.  As such, we only need the mmap_sem for looking up the backing
vm_file, which can be done sharing the lock.  Naturally, this means we
need to manually deal with both the new and old file reference counting,
and we need not worry about the MMF_EXE_FILE_CHANGED bits, which can
probably be deleted in the future anyway.

Signed-off-by: Davidlohr Bueso &lt;dbueso@suse.de&gt;
Suggested-by: Oleg Nesterov &lt;oleg@redhat.com&gt;
Acked-by: Oleg Nesterov &lt;oleg@redhat.com&gt;
Reviewed-by: Konstantin Khlebnikov &lt;khlebnikov@yandex-team.ru&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>fs: create proper filename objects using getname_kernel()</title>
<updated>2015-01-23T05:22:20Z</updated>
<author>
<name>Paul Moore</name>
<email>pmoore@redhat.com</email>
</author>
<published>2015-01-22T05:00:03Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=5168910413830435fa3f0a593933a83721ec8bad'/>
<id>urn:sha1:5168910413830435fa3f0a593933a83721ec8bad</id>
<content type='text'>
There are several areas in the kernel that create temporary filename
objects using the following pattern:

	int func(const char *name)
	{
		struct filename *file = { .name = name };
		...
		return 0;
	}

... which for the most part works okay, but it causes havoc within the
audit subsystem as the filename object does not persist beyond the
lifetime of the function.  This patch converts all of these temporary
filename objects into proper filename objects using getname_kernel()
and putname() which ensure that the filename object persists until the
audit subsystem is finished with it.

Also, a special thanks to Al Viro, Guenter Roeck, and Sabrina Dubroca
for helping resolve a difficult kernel panic on boot related to a
use-after-free problem in kern_path_create(); the thread can be seen
at the link below:

 * https://lkml.org/lkml/2015/1/20/710

This patch includes code that was either based on, or directly written
by Al in the above thread.

CC: viro@zeniv.linux.org.uk
CC: linux@roeck-us.net
CC: sd@queasysnail.net
CC: linux-fsdevel@vger.kernel.org
Signed-off-by: Paul Moore &lt;pmoore@redhat.com&gt;
Signed-off-by: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
</content>
</entry>
<entry>
<title>syscalls: implement execveat() system call</title>
<updated>2014-12-13T20:42:51Z</updated>
<author>
<name>David Drysdale</name>
<email>drysdale@google.com</email>
</author>
<published>2014-12-13T00:57:29Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=51f39a1f0cea1cacf8c787f652f26dfee9611874'/>
<id>urn:sha1:51f39a1f0cea1cacf8c787f652f26dfee9611874</id>
<content type='text'>
This patchset adds execveat(2) for x86, and is derived from Meredydd
Luff's patch from Sept 2012 (https://lkml.org/lkml/2012/9/11/528).

The primary aim of adding an execveat syscall is to allow an
implementation of fexecve(3) that does not rely on the /proc filesystem,
at least for executables (rather than scripts).  The current glibc version
of fexecve(3) is implemented via /proc, which causes problems in sandboxed
or otherwise restricted environments.

Given the desire for a /proc-free fexecve() implementation, HPA suggested
(https://lkml.org/lkml/2006/7/11/556) that an execveat(2) syscall would be
an appropriate generalization.

Also, having a new syscall means that it can take a flags argument without
back-compatibility concerns.  The current implementation just defines the
AT_EMPTY_PATH and AT_SYMLINK_NOFOLLOW flags, but other flags could be
added in future -- for example, flags for new namespaces (as suggested at
https://lkml.org/lkml/2006/7/11/474).

Related history:
 - https://lkml.org/lkml/2006/12/27/123 is an example of someone
   realizing that fexecve() is likely to fail in a chroot environment.
 - http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=514043 covered
   documenting the /proc requirement of fexecve(3) in its manpage, to
   "prevent other people from wasting their time".
 - https://bugzilla.redhat.com/show_bug.cgi?id=241609 described a
   problem where a process that did setuid() could not fexecve()
   because it no longer had access to /proc/self/fd; this has since
   been fixed.

This patch (of 4):

Add a new execveat(2) system call.  execveat() is to execve() as openat()
is to open(): it takes a file descriptor that refers to a directory, and
resolves the filename relative to that.

In addition, if the filename is empty and AT_EMPTY_PATH is specified,
execveat() executes the file to which the file descriptor refers.  This
replicates the functionality of fexecve(), which is a system call in other
UNIXen, but in Linux glibc it depends on opening "/proc/self/fd/&lt;fd&gt;" (and
so relies on /proc being mounted).

The filename fed to the executed program as argv[0] (or the name of the
script fed to a script interpreter) will be of the form "/dev/fd/&lt;fd&gt;"
(for an empty filename) or "/dev/fd/&lt;fd&gt;/&lt;filename&gt;", effectively
reflecting how the executable was found.  This does however mean that
execution of a script in a /proc-less environment won't work; also, script
execution via an O_CLOEXEC file descriptor fails (as the file will not be
accessible after exec).

Based on patches by Meredydd Luff.

Signed-off-by: David Drysdale &lt;drysdale@google.com&gt;
Cc: Meredydd Luff &lt;meredydd@senatehouse.org&gt;
Cc: Shuah Khan &lt;shuah.kh@samsung.com&gt;
Cc: "Eric W. Biederman" &lt;ebiederm@xmission.com&gt;
Cc: Andy Lutomirski &lt;luto@amacapital.net&gt;
Cc: Alexander Viro &lt;viro@zeniv.linux.org.uk&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: Ingo Molnar &lt;mingo@redhat.com&gt;
Cc: "H. Peter Anvin" &lt;hpa@zytor.com&gt;
Cc: Kees Cook &lt;keescook@chromium.org&gt;
Cc: Arnd Bergmann &lt;arnd@arndb.de&gt;
Cc: Rich Felker &lt;dalias@aerifal.cx&gt;
Cc: Christoph Hellwig &lt;hch@infradead.org&gt;
Cc: Michael Kerrisk &lt;mtk.manpages@gmail.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
</feed>
