<feed xmlns='http://www.w3.org/2005/Atom'>
<title>kernel/fs/proc/internal.h, branch linux-4.16.y</title>
<subtitle>Hosts the 0x221E linux distro kernel.</subtitle>
<id>https://universe.0xinfinity.dev/distro/kernel/atom?h=linux-4.16.y</id>
<link rel='self' href='https://universe.0xinfinity.dev/distro/kernel/atom?h=linux-4.16.y'/>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/'/>
<updated>2018-02-07T02:32:43Z</updated>
<entry>
<title>proc: rearrange args</title>
<updated>2018-02-07T02:32:43Z</updated>
<author>
<name>Alexey Dobriyan</name>
<email>adobriyan@gmail.com</email>
</author>
<published>2018-02-06T23:37:31Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=93ad5bc6d4addb74e30d421cd3ba5249c961fb3e'/>
<id>urn:sha1:93ad5bc6d4addb74e30d421cd3ba5249c961fb3e</id>
<content type='text'>
Rearrange args for smaller code.

lookup revolves around memcmp() which gets len 3rd arg, so propagate
length as 3rd arg.

readdir and lookup add additional arg to VFS -&gt;readdir and -&gt;lookup, so
better add it to the end.

Space savings on x86_64:

	add/remove: 0/0 grow/shrink: 0/2 up/down: 0/-18 (-18)
	Function                                     old     new   delta
	proc_readdir                                  22      13      -9
	proc_lookup                                   18       9      -9

proc_match() is smaller if not inlined, I promise!

Link: http://lkml.kernel.org/r/20180104175958.GB5204@avx2
Signed-off-by: Alexey Dobriyan &lt;adobriyan@gmail.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>fs/proc/internal.h: fix up comment</title>
<updated>2018-02-07T02:32:43Z</updated>
<author>
<name>Alexey Dobriyan</name>
<email>adobriyan@gmail.com</email>
</author>
<published>2018-02-06T23:37:21Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=53f63345d893df36b58e81ddb3d11dcd2e9cc966'/>
<id>urn:sha1:53f63345d893df36b58e81ddb3d11dcd2e9cc966</id>
<content type='text'>
Document what -&gt;pde_unload_lock actually does.

Link: http://lkml.kernel.org/r/20180103185120.GB31849@avx2
Signed-off-by: Alexey Dobriyan &lt;adobriyan@gmail.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>fs/proc/internal.h: rearrange struct proc_dir_entry</title>
<updated>2018-02-07T02:32:43Z</updated>
<author>
<name>Alexey Dobriyan</name>
<email>adobriyan@gmail.com</email>
</author>
<published>2018-02-06T23:37:18Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=163cf548db888710695d5dbe907cda4262d45b52'/>
<id>urn:sha1:163cf548db888710695d5dbe907cda4262d45b52</id>
<content type='text'>
struct proc_dir_entry became bit messy over years:

* move 16-bit -&gt;mode_t before namelen to get rid of padding
* make -&gt;in_use first field: it seems to be most used resulting in
  smaller code on x86_64 (defconfig):

	add/remove: 0/0 grow/shrink: 7/13 up/down: 24/-67 (-43)
	Function                                     old     new   delta
	proc_readdir_de                              451     455      +4
	proc_get_inode                               282     286      +4
	pde_put                                       65      69      +4
	remove_proc_subtree                          294     297      +3
	remove_proc_entry                            297     300      +3
	proc_register                                295     298      +3
	proc_notify_change                            94      97      +3
	unuse_pde                                     27      26      -1
	proc_reg_write                                89      85      -4
	proc_reg_unlocked_ioctl                       85      81      -4
	proc_reg_read                                 89      85      -4
	proc_reg_llseek                               87      83      -4
	proc_reg_get_unmapped_area                   123     119      -4
	proc_entry_rundown                           139     135      -4
	proc_reg_poll                                 91      85      -6
	proc_reg_mmap                                 79      73      -6
	proc_get_link                                 55      49      -6
	proc_reg_release                             108     101      -7
	proc_reg_open                                298     291      -7
	close_pdeo                                   228     218     -10

* move writeable fields together to a first cacheline (on x86_64),
  those include
	* -&gt;in_use: reference count, taken every open/read/write/close etc
	* -&gt;count: reference count, taken at readdir on every entry
	* -&gt;pde_openers: tracks (nearly) every open, dirtied
	* -&gt;pde_unload_lock: spinlock protecting -&gt;pde_openers
	* -&gt;proc_iops, -&gt;proc_fops, -&gt;data: writeonce fields,
	  used right together with previous group.

* other rarely written fields go into 1st/2nd and 2nd/3rd cacheline on
  32-bit and 64-bit respectively.

Additionally on 32-bit, -&gt;subdir, -&gt;subdir_node, -&gt;namelen, -&gt;name go
fully into 2nd cacheline, separated from writeable fields.  They are all
used during lookup.

Link: http://lkml.kernel.org/r/20171220215914.GA7877@avx2
Signed-off-by: Alexey Dobriyan &lt;adobriyan@gmail.com&gt;
Cc: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>Merge branch 'work.whack-a-mole' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs</title>
<updated>2017-11-23T06:20:02Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2017-11-23T06:20:02Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=275327851e5c3e71bc73eaee7f065f22b2d1fe6c'/>
<id>urn:sha1:275327851e5c3e71bc73eaee7f065f22b2d1fe6c</id>
<content type='text'>
Pull mode_t whack-a-mole from Al Viro:
 "For all internal uses we want umode_t, which is arch-independent;
  mode_t (or __kernel_mode_t, for that matter) is wrong outside of
  userland ABI.

  Unfortunately, that crap keeps coming back and needs to be put down
  from time to time..."

* 'work.whack-a-mole' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  mode_t whack-a-mole: task_dump_owner()
</content>
</entry>
<entry>
<title>proc: : uninline name_to_int()</title>
<updated>2017-11-18T00:10:00Z</updated>
<author>
<name>Alexey Dobriyan</name>
<email>adobriyan@gmail.com</email>
</author>
<published>2017-11-17T23:26:49Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=3ee2a19908f27b8fea8ff14ffa8b755585eb7b4a'/>
<id>urn:sha1:3ee2a19908f27b8fea8ff14ffa8b755585eb7b4a</id>
<content type='text'>
Save ~360 bytes.

	add/remove: 1/0 grow/shrink: 0/4 up/down: 104/-463 (-359)
	function                                     old     new   delta
	name_to_int                                    -     104    +104
	proc_pid_lookup                              217     126     -91
	proc_lookupfd_common                         212     121     -91
	proc_task_lookup                             289     194     -95
	__proc_create                                588     402    -186

Link: http://lkml.kernel.org/r/20170912194850.GA17730@avx2
Signed-off-by: Alexey Dobriyan &lt;adobriyan@gmail.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>mode_t whack-a-mole: task_dump_owner()</title>
<updated>2017-09-30T18:45:42Z</updated>
<author>
<name>Al Viro</name>
<email>viro@zeniv.linux.org.uk</email>
</author>
<published>2017-09-30T18:45:42Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=c6eb50d2790478d8b5841379b9502812a5e5feb3'/>
<id>urn:sha1:c6eb50d2790478d8b5841379b9502812a5e5feb3</id>
<content type='text'>
should be umode_t...

Signed-off-by: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
</content>
</entry>
<entry>
<title>procfs: use faster rb_first_cached()</title>
<updated>2017-09-09T01:26:49Z</updated>
<author>
<name>Davidlohr Bueso</name>
<email>dave@stgolabs.net</email>
</author>
<published>2017-09-08T23:15:15Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=410bd5ecb276593e7ec1552014083215d4a43c3a'/>
<id>urn:sha1:410bd5ecb276593e7ec1552014083215d4a43c3a</id>
<content type='text'>
...  such that we can avoid the tree walks to get the node with the
smallest key.  Semantically the same, as the previously used rb_first(),
but O(1).  The main overhead is the extra footprint for the cached rb_node
pointer, which should not matter for procfs.

Link: http://lkml.kernel.org/r/20170719014603.19029-14-dave@stgolabs.net
Signed-off-by: Davidlohr Bueso &lt;dbueso@suse.de&gt;
Acked-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>mm: add /proc/pid/smaps_rollup</title>
<updated>2017-09-07T00:27:30Z</updated>
<author>
<name>Daniel Colascione</name>
<email>dancol@google.com</email>
</author>
<published>2017-09-06T23:25:08Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=493b0e9d945fa9dfe96be93ae41b4ca4b6fdb317'/>
<id>urn:sha1:493b0e9d945fa9dfe96be93ae41b4ca4b6fdb317</id>
<content type='text'>
/proc/pid/smaps_rollup is a new proc file that improves the performance
of user programs that determine aggregate memory statistics (e.g., total
PSS) of a process.

Android regularly "samples" the memory usage of various processes in
order to balance its memory pool sizes.  This sampling process involves
opening /proc/pid/smaps and summing certain fields.  For very large
processes, sampling memory use this way can take several hundred
milliseconds, due mostly to the overhead of the seq_printf calls in
task_mmu.c.

smaps_rollup improves the situation.  It contains most of the fields of
/proc/pid/smaps, but instead of a set of fields for each VMA,
smaps_rollup instead contains one synthetic smaps-format entry
representing the whole process.  In the single smaps_rollup synthetic
entry, each field is the summation of the corresponding field in all of
the real-smaps VMAs.  Using a common format for smaps_rollup and smaps
allows userspace parsers to repurpose parsers meant for use with
non-rollup smaps for smaps_rollup, and it allows userspace to switch
between smaps_rollup and smaps at runtime (say, based on the
availability of smaps_rollup in a given kernel) with minimal fuss.

By using smaps_rollup instead of smaps, a caller can avoid the
significant overhead of formatting, reading, and parsing each of a large
process's potentially very numerous memory mappings.  For sampling
system_server's PSS in Android, we measured a 12x speedup, representing
a savings of several hundred milliseconds.

One alternative to a new per-process proc file would have been including
PSS information in /proc/pid/status.  We considered this option but
thought that PSS would be too expensive (by a few orders of magnitude)
to collect relative to what's already emitted as part of
/proc/pid/status, and slowing every user of /proc/pid/status for the
sake of readers that happen to want PSS feels wrong.

The code itself works by reusing the existing VMA-walking framework we
use for regular smaps generation and keeping the mem_size_stats
structure around between VMA walks instead of using a fresh one for each
VMA.  In this way, summation happens automatically.  We let seq_file
walk over the VMAs just as it does for regular smaps and just emit
nothing to the seq_file until we hit the last VMA.

Benchmarks:

    using smaps:
    iterations:1000 pid:1163 pss:220023808
    0m29.46s real 0m08.28s user 0m20.98s system

    using smaps_rollup:
    iterations:1000 pid:1163 pss:220702720
    0m04.39s real 0m00.03s user 0m04.31s system

We're using the PSS samples we collect asynchronously for
system-management tasks like fine-tuning oom_adj_score, memory use
tracking for debugging, application-level memory-use attribution, and
deciding whether we want to kill large processes during system idle
maintenance windows.  Android has been using PSS for these purposes for
a long time; as the average process VMA count has increased and and
devices become more efficiency-conscious, PSS-collection inefficiency
has started to matter more.  IMHO, it'd be a lot safer to optimize the
existing PSS-collection model, which has been fine-tuned over the years,
instead of changing the memory tracking approach entirely to work around
smaps-generation inefficiency.

Tim said:

: There are two main reasons why Android gathers PSS information:
:
: 1. Android devices can show the user the amount of memory used per
:    application via the settings app.  This is a less important use case.
:
: 2. We log PSS to help identify leaks in applications.  We have found
:    an enormous number of bugs (in the Android platform, in Google's own
:    apps, and in third-party applications) using this data.
:
: To do this, system_server (the main process in Android userspace) will
: sample the PSS of a process three seconds after it changes state (for
: example, app is launched and becomes the foreground application) and about
: every ten minutes after that.  The net result is that PSS collection is
: regularly running on at least one process in the system (usually a few
: times a minute while the screen is on, less when screen is off due to
: suspend).  PSS of a process is an incredibly useful stat to track, and we
: aren't going to get rid of it.  We've looked at some very hacky approaches
: using RSS ("take the RSS of the target process, subtract the RSS of the
: zygote process that is the parent of all Android apps") to reduce the
: accounting time, but it regularly overestimated the memory used by 20+
: percent.  Accordingly, I don't think that there's a good alternative to
: using PSS.
:
: We started looking into PSS collection performance after we noticed random
: frequency spikes while a phone's screen was off; occasionally, one of the
: CPU clusters would ramp to a high frequency because there was 200-300ms of
: constant CPU work from a single thread in the main Android userspace
: process.  The work causing the spike (which is reasonable governor
: behavior given the amount of CPU time needed) was always PSS collection.
: As a result, Android is burning more power than we should be on PSS
: collection.
:
: The other issue (and why I'm less sure about improving smaps as a
: long-term solution) is that the number of VMAs per process has increased
: significantly from release to release.  After trying to figure out why we
: were seeing these 200-300ms PSS collection times on Android O but had not
: noticed it in previous versions, we found that the number of VMAs in the
: main system process increased by 50% from Android N to Android O (from
: ~1800 to ~2700) and varying increases in every userspace process.  Android
: M to N also had an increase in the number of VMAs, although not as much.
: I'm not sure why this is increasing so much over time, but thinking about
: ASLR and ways to make ASLR better, I expect that this will continue to
: increase going forward.  I would not be surprised if we hit 5000 VMAs on
: the main Android process (system_server) by 2020.
:
: If we assume that the number of VMAs is going to increase over time, then
: doing anything we can do to reduce the overhead of each VMA during PSS
: collection seems like the right way to go, and that means outputting an
: aggregate statistic (to avoid whatever overhead there is per line in
: writing smaps and in reading each line from userspace).

Link: http://lkml.kernel.org/r/20170812022148.178293-1-dancol@google.com
Signed-off-by: Daniel Colascione &lt;dancol@google.com&gt;
Cc: Tim Murray &lt;timmurray@google.com&gt;
Cc: Joel Fernandes &lt;joelaf@google.com&gt;
Cc: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
Cc: Randy Dunlap &lt;rdunlap@infradead.org&gt;
Cc: Minchan Kim &lt;minchan@kernel.org&gt;
Cc: Michal Hocko &lt;mhocko@kernel.org&gt;
Cc: Sonny Rao &lt;sonnyrao@chromium.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>Merge tag 'gcc-plugins-v4.13-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux</title>
<updated>2017-07-19T15:55:18Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2017-07-19T15:55:18Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=e06fdaf40a5c021dd4a2ec797e8b724f07360070'/>
<id>urn:sha1:e06fdaf40a5c021dd4a2ec797e8b724f07360070</id>
<content type='text'>
Pull structure randomization updates from Kees Cook:
 "Now that IPC and other changes have landed, enable manual markings for
  randstruct plugin, including the task_struct.

  This is the rest of what was staged in -next for the gcc-plugins, and
  comes in three patches, largest first:

   - mark "easy" structs with __randomize_layout

   - mark task_struct with an optional anonymous struct to isolate the
     __randomize_layout section

   - mark structs to opt _out_ of automated marking (which will come
     later)

  And, FWIW, this continues to pass allmodconfig (normal and patched to
  enable gcc-plugins) builds of x86_64, i386, arm64, arm, powerpc, and
  s390 for me"

* tag 'gcc-plugins-v4.13-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
  randstruct: opt-out externally exposed function pointer structs
  task_struct: Allow randomized layout
  randstruct: Mark various structs for randomization
</content>
</entry>
<entry>
<title>proc: Fix proc_sys_prune_dcache to hold a sb reference</title>
<updated>2017-07-11T16:01:24Z</updated>
<author>
<name>Eric W. Biederman</name>
<email>ebiederm@xmission.com</email>
</author>
<published>2017-07-06T13:41:06Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=2fd1d2c4ceb2248a727696962cf3370dc9f5a0a4'/>
<id>urn:sha1:2fd1d2c4ceb2248a727696962cf3370dc9f5a0a4</id>
<content type='text'>
Andrei Vagin writes:
FYI: This bug has been reproduced on 4.11.7
&gt; BUG: Dentry ffff895a3dd01240{i=4e7c09a,n=lo}  still in use (1) [unmount of proc proc]
&gt; ------------[ cut here ]------------
&gt; WARNING: CPU: 1 PID: 13588 at fs/dcache.c:1445 umount_check+0x6e/0x80
&gt; CPU: 1 PID: 13588 Comm: kworker/1:1 Not tainted 4.11.7-200.fc25.x86_64 #1
&gt; Hardware name: CompuLab sbc-flt1/fitlet, BIOS SBCFLT_0.08.04 06/27/2015
&gt; Workqueue: events proc_cleanup_work
&gt; Call Trace:
&gt;  dump_stack+0x63/0x86
&gt;  __warn+0xcb/0xf0
&gt;  warn_slowpath_null+0x1d/0x20
&gt;  umount_check+0x6e/0x80
&gt;  d_walk+0xc6/0x270
&gt;  ? dentry_free+0x80/0x80
&gt;  do_one_tree+0x26/0x40
&gt;  shrink_dcache_for_umount+0x2d/0x90
&gt;  generic_shutdown_super+0x1f/0xf0
&gt;  kill_anon_super+0x12/0x20
&gt;  proc_kill_sb+0x40/0x50
&gt;  deactivate_locked_super+0x43/0x70
&gt;  deactivate_super+0x5a/0x60
&gt;  cleanup_mnt+0x3f/0x90
&gt;  mntput_no_expire+0x13b/0x190
&gt;  kern_unmount+0x3e/0x50
&gt;  pid_ns_release_proc+0x15/0x20
&gt;  proc_cleanup_work+0x15/0x20
&gt;  process_one_work+0x197/0x450
&gt;  worker_thread+0x4e/0x4a0
&gt;  kthread+0x109/0x140
&gt;  ? process_one_work+0x450/0x450
&gt;  ? kthread_park+0x90/0x90
&gt;  ret_from_fork+0x2c/0x40
&gt; ---[ end trace e1c109611e5d0b41 ]---
&gt; VFS: Busy inodes after unmount of proc. Self-destruct in 5 seconds.  Have a nice day...
&gt; BUG: unable to handle kernel NULL pointer dereference at           (null)
&gt; IP: _raw_spin_lock+0xc/0x30
&gt; PGD 0

Fix this by taking a reference to the super block in proc_sys_prune_dcache.

The superblock reference is the core of the fix however the sysctl_inodes
list is converted to a hlist so that hlist_del_init_rcu may be used.  This
allows proc_sys_prune_dache to remove inodes the sysctl_inodes list, while
not causing problems for proc_sys_evict_inode when if it later choses to
remove the inode from the sysctl_inodes list.  Removing inodes from the
sysctl_inodes list allows proc_sys_prune_dcache to have a progress
guarantee, while still being able to drop all locks.  The fact that
head-&gt;unregistering is set in start_unregistering ensures that no more
inodes will be added to the the sysctl_inodes list.

Previously the code did a dance where it delayed calling iput until the
next entry in the list was being considered to ensure the inode remained on
the sysctl_inodes list until the next entry was walked to.  The structure
of the loop in this patch does not need that so is much easier to
understand and maintain.

Cc: stable@vger.kernel.org
Reported-by: Andrei Vagin &lt;avagin@gmail.com&gt;
Tested-by: Andrei Vagin &lt;avagin@openvz.org&gt;
Fixes: ace0c791e6c3 ("proc/sysctl: Don't grab i_lock under sysctl_lock.")
Fixes: d6cffbbe9a7e ("proc/sysctl: prune stale dentries during unregistering")
Signed-off-by: "Eric W. Biederman" &lt;ebiederm@xmission.com&gt;
</content>
</entry>
</feed>
