<feed xmlns='http://www.w3.org/2005/Atom'>
<title>kernel/tools/testing/selftests/proc/.gitignore, branch linux-rolling-stable</title>
<subtitle>Hosts the 0x221E linux distro kernel.</subtitle>
<id>https://universe.0xinfinity.dev/distro/kernel/atom?h=linux-rolling-stable</id>
<link rel='self' href='https://universe.0xinfinity.dev/distro/kernel/atom?h=linux-rolling-stable'/>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/'/>
<updated>2025-10-03T01:44:54Z</updated>
<entry>
<title>Merge tag 'mm-nonmm-stable-2025-10-02-15-29' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm</title>
<updated>2025-10-03T01:44:54Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2025-10-03T01:44:54Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=e406d57be7bd2a4e73ea512c1ae36a40a44e499e'/>
<id>urn:sha1:e406d57be7bd2a4e73ea512c1ae36a40a44e499e</id>
<content type='text'>
Pull non-MM updates from Andrew Morton:

 - "ida: Remove the ida_simple_xxx() API" from Christophe Jaillet
   completes the removal of this legacy IDR API

 - "panic: introduce panic status function family" from Jinchao Wang
   provides a number of cleanups to the panic code and its various
   helpers, which were rather ad-hoc and scattered all over the place

 - "tools/delaytop: implement real-time keyboard interaction support"
   from Fan Yu adds a few nice user-facing usability changes to the
   delaytop monitoring tool

 - "efi: Fix EFI boot with kexec handover (KHO)" from Evangelos
   Petrongonas fixes a panic which was happening with the combination of
   EFI and KHO

 - "Squashfs: performance improvement and a sanity check" from Phillip
   Lougher teaches squashfs's lseek() about SEEK_DATA/SEEK_HOLE. A mere
   150x speedup was measured for a well-chosen microbenchmark

 - plus another 50-odd singleton patches all over the place

* tag 'mm-nonmm-stable-2025-10-02-15-29' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (75 commits)
  Squashfs: reject negative file sizes in squashfs_read_inode()
  kallsyms: use kmalloc_array() instead of kmalloc()
  MAINTAINERS: update Sibi Sankar's email address
  Squashfs: add SEEK_DATA/SEEK_HOLE support
  Squashfs: add additional inode sanity checking
  lib/genalloc: fix device leak in of_gen_pool_get()
  panic: remove CONFIG_PANIC_ON_OOPS_VALUE
  ocfs2: fix double free in user_cluster_connect()
  checkpatch: suppress strscpy warnings for userspace tools
  cramfs: fix incorrect physical page address calculation
  kernel: prevent prctl(PR_SET_PDEATHSIG) from racing with parent process exit
  Squashfs: fix uninit-value in squashfs_get_parent
  kho: only fill kimage if KHO is finalized
  ocfs2: avoid extra calls to strlen() after ocfs2_sprintf_system_inode_name()
  kernel/sys.c: fix the racy usage of task_lock(tsk-&gt;group_leader) in sys_prlimit64() paths
  sched/task.h: fix the wrong comment on task_lock() nesting with tasklist_lock
  coccinelle: platform_no_drv_owner: handle also built-in drivers
  coccinelle: of_table: handle SPI device ID tables
  lib/decompress: use designated initializers for struct compress_format
  efi: support booting with kexec handover (KHO)
  ...
</content>
</entry>
<entry>
<title>proc: test lseek on /proc/net/dev</title>
<updated>2025-09-14T00:32:48Z</updated>
<author>
<name>Alexey Dobriyan</name>
<email>adobriyan@gmail.com</email>
</author>
<published>2025-08-19T18:19:17Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=b1e34412998d628dfa8ba3da042bb60dee232b6c'/>
<id>urn:sha1:b1e34412998d628dfa8ba3da042bb60dee232b6c</id>
<content type='text'>
This line in tools/testing/selftests/proc/read.c was added to catch
oopses, not to verify lseek correctness:

        (void)lseek(fd, 0, SEEK_SET);

Oh, well. Prevent more embarassement with simple test.

Link: https://lkml.kernel.org/r/aKTCfMuRXOpjBXxI@p183
Signed-off-by: Alexey Dobriyan &lt;adobriyan@gmail.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>selftests/proc: add tests for new pidns APIs</title>
<updated>2025-09-02T09:40:20Z</updated>
<author>
<name>Aleksa Sarai</name>
<email>cyphar@cyphar.com</email>
</author>
<published>2025-08-05T05:45:11Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=5554d820f71c72fbe64e12c3d171908c5ef7257d'/>
<id>urn:sha1:5554d820f71c72fbe64e12c3d171908c5ef7257d</id>
<content type='text'>
Signed-off-by: Aleksa Sarai &lt;cyphar@cyphar.com&gt;
Link: https://lore.kernel.org/20250805-procfs-pidns-api-v4-4-705f984940e7@cyphar.com
Signed-off-by: Christian Brauner &lt;brauner@kernel.org&gt;
</content>
</entry>
<entry>
<title>selftests/proc: add /proc/pid/maps tearing from vma split test</title>
<updated>2025-07-25T02:12:36Z</updated>
<author>
<name>Suren Baghdasaryan</name>
<email>surenb@google.com</email>
</author>
<published>2025-07-19T18:28:49Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=beb69e81724634063b9dbae4bc79e2e011fdeeb1'/>
<id>urn:sha1:beb69e81724634063b9dbae4bc79e2e011fdeeb1</id>
<content type='text'>
Patch series "use per-vma locks for /proc/pid/maps reads", v8.

Reading /proc/pid/maps requires read-locking mmap_lock which prevents any
other task from concurrently modifying the address space.  This guarantees
coherent reporting of virtual address ranges, however it can block
important updates from happening.  Oftentimes /proc/pid/maps readers are
low priority monitoring tasks and them blocking high priority tasks
results in priority inversion.

Locking the entire address space is required to present fully coherent
picture of the address space, however even current implementation does not
strictly guarantee that by outputting vmas in page-size chunks and
dropping mmap_lock in between each chunk.  Address space modifications are
possible while mmap_lock is dropped and userspace reading the content is
expected to deal with possible concurrent address space modifications. 
Considering these relaxed rules, holding mmap_lock is not strictly needed
as long as we can guarantee that a concurrently modified vma is reported
either in its original form or after it was modified.

This patchset switches from holding mmap_lock while reading /proc/pid/maps
to taking per-vma locks as we walk the vma tree.  This reduces the
contention with tasks modifying the address space because they would have
to contend for the same vma as opposed to the entire address space. 
Previous version of this patchset [1] tried to perform /proc/pid/maps
reading under RCU, however its implementation is quite complex and the
results are worse than the new version because it still relied on
mmap_lock speculation which retries if any part of the address space gets
modified.  New implementaion is both simpler and results in less
contention.  Note that similar approach would not work for /proc/pid/smaps
reading as it also walks the page table and that's not RCU-safe.

Paul McKenney's designed a test [2] to measure mmap/munmap latencies while
concurrently reading /proc/pid/maps.  The test has a pair of processes
scanning /proc/PID/maps, and another process unmapping and remapping 4K
pages from a 128MB range of anonymous memory.  At the end of each 10
second run, the latency of each mmap() or munmap() operation is measured,
and for each run the maximum and mean latency is printed.  The map/unmap
process is started first, its PID is passed to the scanners, and then the
map/unmap process waits until both scanners are running before starting
its timed test.  The scanners keep scanning until the specified
/proc/PID/maps file disappears.

The latest results from Paul:
Stock mm-unstable, all of the runs had maximum latencies in excess of 0.5
milliseconds, and with 80% of the runs' latencies exceeding a full
millisecond, and ranging up beyond 4 full milliseconds.  In contrast, 99%
of the runs with this patch series applied had maximum latencies of less
than 0.5 milliseconds, with the single outlier at only 0.608 milliseconds.

From a median-performance (as opposed to maximum-latency) viewpoint, this
patch series also looks good, with stock mm weighing in at 11 microseconds
and patch series at 6 microseconds, better than a 2x improvement.

Before the change:
./run-proc-vs-map.sh --nsamples 100 --rawdata -- --busyduration 2
    0.011     0.008     0.521
    0.011     0.008     0.552
    0.011     0.008     0.590
    0.011     0.008     0.660
    ...
    0.011     0.015     2.987
    0.011     0.015     3.038
    0.011     0.016     3.431
    0.011     0.016     4.707

After the change:
./run-proc-vs-map.sh --nsamples 100 --rawdata -- --busyduration 2
    0.006     0.005     0.026
    0.006     0.005     0.029
    0.006     0.005     0.034
    0.006     0.005     0.035
    ...
    0.006     0.006     0.421
    0.006     0.006     0.423
    0.006     0.006     0.439
    0.006     0.006     0.608

The patchset also adds a number of tests to check for /proc/pid/maps data
coherency.  They are designed to detect any unexpected data tearing while
performing some common address space modifications (vma split, resize and
remap).  Even before these changes, reading /proc/pid/maps might have
inconsistent data because the file is read page-by-page with mmap_lock
being dropped between the pages.  An example of user-visible inconsistency
can be that the same vma is printed twice: once before it was modified and
then after the modifications.  For example if vma was extended, it might
be found and reported twice.  What is not expected is to see a gap where
there should have been a vma both before and after modification.  This
patchset increases the chances of such tearing, therefore it's even more
important now to test for unexpected inconsistencies.

In [3] Lorenzo identified the following possible vma merging/splitting
scenarios:

Merges with changes to existing vmas:
1 Merge both - mapping a vma over another one and between two vmas which
can be merged after this replacement;
2. Merge left full - mapping a vma at the end of an existing one and
completely over its right neighbor;
3. Merge left partial - mapping a vma at the end of an existing one and
partially over its right neighbor;
4. Merge right full - mapping a vma before the start of an existing one
and completely over its left neighbor;
5. Merge right partial - mapping a vma before the start of an existing one
and partially over its left neighbor;

Merges without changes to existing vmas:
6. Merge both - mapping a vma into a gap between two vmas which can be
merged after the insertion;
7. Merge left - mapping a vma at the end of an existing one;
8. Merge right - mapping a vma before the start end of an existing one;

Splits
9. Split with new vma at the lower address;
10. Split with new vma at the higher address;

If such merges or splits happen concurrently with the /proc/maps reading
we might report a vma twice, once before the modification and once after
it is modified:

Case 1 might report overwritten and previous vma along with the final
merged vma;
Case 2 might report previous and the final merged vma;
Case 3 might cause us to retry once we detect the temporary gap caused by
shrinking of the right neighbor;
Case 4 might report overritten and the final merged vma;
Case 5 might cause us to retry once we detect the temporary gap caused by
shrinking of the left neighbor;
Case 6 might report previous vma and the gap along with the final marged
vma;
Case 7 might report previous and the final merged vma;
Case 8 might report the original gap and the final merged vma covering the
gap;
Case 9 might cause us to retry once we detect the temporary gap caused by
shrinking of the original vma at the vma start;
Case 10 might cause us to retry once we detect the temporary gap caused by
shrinking of the original vma at the vma end;

In all these cases the retry mechanism prevents us from reporting possible
temporary gaps.

[1] https://lore.kernel.org/all/20250418174959.1431962-1-surenb@google.com/
[2] https://github.com/paulmckrcu/proc-mmap_sem-test
[3] https://lore.kernel.org/all/e1863f40-39ab-4e5b-984a-c48765ffde1c@lucifer.local/


The /proc/pid/maps file is generated page by page, with the mmap_lock
released between pages.  This can lead to inconsistent reads if the
underlying vmas are concurrently modified.  For instance, if a vma split
or merge occurs at a page boundary while /proc/pid/maps is being read, the
same vma might be seen twice: once before and once after the change.  This
duplication is considered acceptable for userspace handling.  However,
observing a "hole" where a vma should be (e.g., due to a vma being
replaced and the space temporarily being empty) is unacceptable.

Implement a test that:
1. Forks a child process which continuously modifies its address
   space, specifically targeting a vma at the boundary between two pages.
2. The parent process repeatedly reads the child's /proc/pid/maps.
3. The parent process checks the last vma of the first page and the
   first vma of the second page for consistency, looking for the effects
   of vma splits or merges.

The test duration is configurable via DURATION environment variable
expressed in seconds.  The default test duration is 5 seconds.

Example Command: DURATION=10 ./proc-maps-race

Link: https://lore.kernel.org/all/20250418174959.1431962-1-surenb@google.com/ [1]
Link: https://github.com/paulmckrcu/proc-mmap_sem-test [2]
Link: https://lore.kernel.org/all/e1863f40-39ab-4e5b-984a-c48765ffde1c@lucifer.local/ [3]
Link: https://lkml.kernel.org/r/20250719182854.3166724-1-surenb@google.com
Link: https://lkml.kernel.org/r/20250719182854.3166724-2-surenb@google.com
Signed-off-by: Suren Baghdasaryan &lt;surenb@google.com&gt;
Cc: Alexey Dobriyan &lt;adobriyan@gmail.com&gt;
Cc: Andrii Nakryiko &lt;andrii@kernel.org&gt;
Cc: Christian Brauner &lt;brauner@kernel.org&gt;
Cc: Christophe Leroy &lt;christophe.leroy@csgroup.eu&gt;
Cc: David Hildenbrand &lt;david@redhat.com&gt;
Cc: Jann Horn &lt;jannh@google.com&gt;
Cc: Jeongjun Park &lt;aha310510@gmail.com&gt;
Cc: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Cc: Josef Bacik &lt;josef@toxicpanda.com&gt;
Cc: Kalesh Singh &lt;kaleshsingh@google.com&gt;
Cc: Liam Howlett &lt;liam.howlett@oracle.com&gt;
Cc: Lorenzo Stoakes &lt;lorenzo.stoakes@oracle.com&gt;
Cc: Matthew Wilcox (Oracle) &lt;willy@infradead.org&gt;
Cc: Michal Hocko &lt;mhocko@kernel.org&gt;
Cc: Oscar Salvador &lt;osalvador@suse.de&gt;
Cc: "Paul E . McKenney" &lt;paulmck@kernel.org&gt;
Cc: Peter Xu &lt;peterx@redhat.com&gt;
Cc: Ryan Roberts &lt;ryan.roberts@arm.com&gt;
Cc: Shuah Khan &lt;shuah@kernel.org&gt;
Cc: Thomas Weißschuh &lt;linux@weissschuh.net&gt;
Cc: T.J. Mercier &lt;tjmercier@google.com&gt;
Cc: Vlastimil Babka &lt;vbabka@suse.cz&gt;
Cc: Ye Bin &lt;yebin10@huawei.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>proc: test "Kthread:" field</title>
<updated>2024-06-25T05:25:08Z</updated>
<author>
<name>Alexey Dobriyan</name>
<email>adobriyan@gmail.com</email>
</author>
<published>2024-06-07T15:14:29Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=e7679a5c820d0865d0900931a7341048d830917d'/>
<id>urn:sha1:e7679a5c820d0865d0900931a7341048d830917d</id>
<content type='text'>
/proc/${pid}/status got Kthread field recently.

Test that userspace program is not reported as kernel thread.

Test that kernel thread is reported as kernel thread.
Use kthreadd with pid 2 for this.

Link: https://lkml.kernel.org/r/818c4c41-8668-4566-97a9-7254abf819ee@p183
Signed-off-by: Alexey Dobriyan &lt;adobriyan@gmail.com&gt;
Cc: Chunguang Wu &lt;fullspring2018@gmail.com&gt;
Cc: Shuah Khan &lt;shuah@kernel.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>proc: test how it holds up with mapping'less process</title>
<updated>2022-10-12T01:51:11Z</updated>
<author>
<name>Alexey Dobriyan</name>
<email>adobriyan@gmail.com</email>
</author>
<published>2022-10-05T20:14:00Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=5bc73bb3451b9e449828694733a4c6b413ceeb3b'/>
<id>urn:sha1:5bc73bb3451b9e449828694733a4c6b413ceeb3b</id>
<content type='text'>
Create process without mappings and check

	/proc/*/maps
	/proc/*/numa_maps
	/proc/*/smaps
	/proc/*/smaps_rollup

They must be empty (excluding vsyscall page) or full of zeroes.

Retroactively this test should've caught embarassing /proc/*/smaps_rollup
oops:

[17752.703567] BUG: kernel NULL pointer dereference, address: 0000000000000000
[17752.703580] #PF: supervisor read access in kernel mode
[17752.703583] #PF: error_code(0x0000) - not-present page
[17752.703587] PGD 0 P4D 0
[17752.703593] Oops: 0000 [#1] PREEMPT SMP PTI
[17752.703598] CPU: 0 PID: 60649 Comm: cat Tainted: G        W         5.19.9-100.fc35.x86_64 #1
[17752.703603] Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./X99 Extreme6/3.1, BIOS P3.30 08/05/2016
[17752.703607] RIP: 0010:show_smaps_rollup+0x159/0x2e0

Note 1:
	ProtectionKey field in /proc/*/smaps is optional,
	so check most of its contents, not everything.

Note 2:
	due to the nature of this test, child process hardly can signal
	its readiness (after unmapping everything!) to parent.
	I feel like "sleep(1)" is justified.
	If you know how to do it without sleep please tell me.

Note 3:
	/proc/*/statm is not tested but can be.

Link: https://lkml.kernel.org/r/Yz3liL6Dn+n2SD8Q@localhost.localdomain
Signed-off-by: Alexey Dobriyan &lt;adobriyan@gmail.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>procfs: do not list TID 0 in /proc/&lt;pid&gt;/task</title>
<updated>2021-11-09T18:02:48Z</updated>
<author>
<name>Florian Weimer</name>
<email>fweimer@redhat.com</email>
</author>
<published>2021-11-09T02:31:30Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=0658a0961b0ace06b4cf0e1b73a4f20e349f4346'/>
<id>urn:sha1:0658a0961b0ace06b4cf0e1b73a4f20e349f4346</id>
<content type='text'>
If a task exits concurrently, task_pid_nr_ns may return 0.

[akpm@linux-foundation.org: coding style tweaks]
[adobriyan@gmail.com: test that /proc/*/task doesn't contain "0"]
  Link: https://lkml.kernel.org/r/YV88AnVzHxPafQ9o@localhost.localdomain

Link: https://lkml.kernel.org/r/8735pn5dx7.fsf@oldenburg.str.redhat.com
Signed-off-by: Florian Weimer &lt;fweimer@redhat.com&gt;
Signed-off-by: Alexey Dobriyan &lt;adobriyan@gmail.com&gt;
Acked-by: Christian Brauner &lt;christian.brauner@ubuntu.com&gt;
Reviewed-by: Alexey Dobriyan &lt;adobriyan@gmail.com&gt;
Cc: Kees Cook &lt;keescook@chromium.org&gt;
Cc: "Eric W. Biederman" &lt;ebiederm@xmission.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>proc: add .gitignore for proc-subset-pid selftest</title>
<updated>2021-06-05T15:58:11Z</updated>
<author>
<name>David Matlack</name>
<email>dmatlack@google.com</email>
</author>
<published>2021-06-05T03:01:30Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=263e88d678baa1a2e3f2d5afbdcd9fd3feb80a4d'/>
<id>urn:sha1:263e88d678baa1a2e3f2d5afbdcd9fd3feb80a4d</id>
<content type='text'>
This new selftest needs an entry in the .gitignore file otherwise git
will try to track the binary.

Link: https://lkml.kernel.org/r/20210601164305.11776-1-dmatlack@google.com
Fixes: 268af17ada5855 ("selftests: proc: test subset=pid")
Signed-off-by: David Matlack &lt;dmatlack@google.com&gt;
Acked-by: Christian Brauner &lt;christian.brauner@ubuntu.com&gt;
Cc: Shuah Khan &lt;shuah@kernel.org&gt;
Cc: Alexey Dobriyan &lt;adobriyan@gmail.com&gt;
Cc: Alexey Gladkov &lt;gladkov.alexey@gmail.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>proc: use human-readable values for hidepid</title>
<updated>2020-04-22T15:51:22Z</updated>
<author>
<name>Alexey Gladkov</name>
<email>gladkov.alexey@gmail.com</email>
</author>
<published>2020-04-19T14:10:56Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=1c6c4d112e81a919d4ea83ec6cbc2f55203217fd'/>
<id>urn:sha1:1c6c4d112e81a919d4ea83ec6cbc2f55203217fd</id>
<content type='text'>
The hidepid parameter values are becoming more and more and it becomes
difficult to remember what each new magic number means.

Backward compatibility is preserved since it is possible to specify
numerical value for the hidepid parameter. This does not break the
fsconfig since it is not possible to specify a numerical value through
it. All numeric values are converted to a string. The type
FSCONFIG_SET_BINARY cannot be used to indicate a numerical value.

Selftest has been added to verify this behavior.

Suggested-by: Andy Lutomirski &lt;luto@kernel.org&gt;
Signed-off-by: Alexey Gladkov &lt;gladkov.alexey@gmail.com&gt;
Reviewed-by: Alexey Dobriyan &lt;adobriyan@gmail.com&gt;
Reviewed-by: Kees Cook &lt;keescook@chromium.org&gt;
Signed-off-by: Eric W. Biederman &lt;ebiederm@xmission.com&gt;
</content>
</entry>
<entry>
<title>proc: allow to mount many instances of proc in one pid namespace</title>
<updated>2020-04-22T15:51:21Z</updated>
<author>
<name>Alexey Gladkov</name>
<email>gladkov.alexey@gmail.com</email>
</author>
<published>2020-04-19T14:10:52Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=fa10fed30f2550313a8284365b3e2398526eb42c'/>
<id>urn:sha1:fa10fed30f2550313a8284365b3e2398526eb42c</id>
<content type='text'>
This patch allows to have multiple procfs instances inside the
same pid namespace. The aim here is lightweight sandboxes, and to allow
that we have to modernize procfs internals.

1) The main aim of this work is to have on embedded systems one
supervisor for apps. Right now we have some lightweight sandbox support,
however if we create pid namespacess we have to manages all the
processes inside too, where our goal is to be able to run a bunch of
apps each one inside its own mount namespace without being able to
notice each other. We only want to use mount namespaces, and we want
procfs to behave more like a real mount point.

2) Linux Security Modules have multiple ptrace paths inside some
subsystems, however inside procfs, the implementation does not guarantee
that the ptrace() check which triggers the security_ptrace_check() hook
will always run. We have the 'hidepid' mount option that can be used to
force the ptrace_may_access() check inside has_pid_permissions() to run.
The problem is that 'hidepid' is per pid namespace and not attached to
the mount point, any remount or modification of 'hidepid' will propagate
to all other procfs mounts.

This also does not allow to support Yama LSM easily in desktop and user
sessions. Yama ptrace scope which restricts ptrace and some other
syscalls to be allowed only on inferiors, can be updated to have a
per-task context, where the context will be inherited during fork(),
clone() and preserved across execve(). If we support multiple private
procfs instances, then we may force the ptrace_may_access() on
/proc/&lt;pids&gt;/ to always run inside that new procfs instances. This will
allow to specifiy on user sessions if we should populate procfs with
pids that the user can ptrace or not.

By using Yama ptrace scope, some restricted users will only be able to see
inferiors inside /proc, they won't even be able to see their other
processes. Some software like Chromium, Firefox's crash handler, Wine
and others are already using Yama to restrict which processes can be
ptracable. With this change this will give the possibility to restrict
/proc/&lt;pids&gt;/ but more importantly this will give desktop users a
generic and usuable way to specifiy which users should see all processes
and which users can not.

Side notes:
* This covers the lack of seccomp where it is not able to parse
arguments, it is easy to install a seccomp filter on direct syscalls
that operate on pids, however /proc/&lt;pid&gt;/ is a Linux ABI using
filesystem syscalls. With this change LSMs should be able to analyze
open/read/write/close...

In the new patch set version I removed the 'newinstance' option
as suggested by Eric W. Biederman.

Selftest has been added to verify new behavior.

Signed-off-by: Alexey Gladkov &lt;gladkov.alexey@gmail.com&gt;
Reviewed-by: Alexey Dobriyan &lt;adobriyan@gmail.com&gt;
Reviewed-by: Kees Cook &lt;keescook@chromium.org&gt;
Signed-off-by: Eric W. Biederman &lt;ebiederm@xmission.com&gt;
</content>
</entry>
</feed>
