<feed xmlns='http://www.w3.org/2005/Atom'>
<title>kernel/fs/notify/group.c, branch linux-5.1.y</title>
<subtitle>Hosts the 0x221E linux distro kernel.</subtitle>
<id>https://universe.0xinfinity.dev/distro/kernel/atom?h=linux-5.1.y</id>
<link rel='self' href='https://universe.0xinfinity.dev/distro/kernel/atom?h=linux-5.1.y'/>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/'/>
<updated>2018-08-17T23:20:30Z</updated>
<entry>
<title>fs: fsnotify: account fsnotify metadata to kmemcg</title>
<updated>2018-08-17T23:20:30Z</updated>
<author>
<name>Shakeel Butt</name>
<email>shakeelb@google.com</email>
</author>
<published>2018-08-17T22:46:39Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=d46eb14b735b11927d4bdc2d1854c311af19de6d'/>
<id>urn:sha1:d46eb14b735b11927d4bdc2d1854c311af19de6d</id>
<content type='text'>
Patch series "Directed kmem charging", v8.

The Linux kernel's memory cgroup allows limiting the memory usage of the
jobs running on the system to provide isolation between the jobs.  All
the kernel memory allocated in the context of the job and marked with
__GFP_ACCOUNT will also be included in the memory usage and be limited
by the job's limit.

The kernel memory can only be charged to the memcg of the process in
whose context kernel memory was allocated.  However there are cases
where the allocated kernel memory should be charged to the memcg
different from the current processes's memcg.  This patch series
contains two such concrete use-cases i.e.  fsnotify and buffer_head.

The fsnotify event objects can consume a lot of system memory for large
or unlimited queues if there is either no or slow listener.  The events
are allocated in the context of the event producer.  However they should
be charged to the event consumer.  Similarly the buffer_head objects can
be allocated in a memcg different from the memcg of the page for which
buffer_head objects are being allocated.

To solve this issue, this patch series introduces mechanism to charge
kernel memory to a given memcg.  In case of fsnotify events, the memcg
of the consumer can be used for charging and for buffer_head, the memcg
of the page can be charged.  For directed charging, the caller can use
the scope API memalloc_[un]use_memcg() to specify the memcg to charge
for all the __GFP_ACCOUNT allocations within the scope.

This patch (of 2):

A lot of memory can be consumed by the events generated for the huge or
unlimited queues if there is either no or slow listener.  This can cause
system level memory pressure or OOMs.  So, it's better to account the
fsnotify kmem caches to the memcg of the listener.

However the listener can be in a different memcg than the memcg of the
producer and these allocations happen in the context of the event
producer.  This patch introduces remote memcg charging API which the
producer can use to charge the allocations to the memcg of the listener.

There are seven fsnotify kmem caches and among them allocations from
dnotify_struct_cache, dnotify_mark_cache, fanotify_mark_cache and
inotify_inode_mark_cachep happens in the context of syscall from the
listener.  So, SLAB_ACCOUNT is enough for these caches.

The objects from fsnotify_mark_connector_cachep are not accounted as
they are small compared to the notification mark or events and it is
unclear whom to account connector to since it is shared by all events
attached to the inode.

The allocations from the event caches happen in the context of the event
producer.  For such caches we will need to remote charge the allocations
to the listener's memcg.  Thus we save the memcg reference in the
fsnotify_group structure of the listener.

This patch has also moved the members of fsnotify_group to keep the size
same, at least for 64 bit build, even with additional member by filling
the holes.

[shakeelb@google.com: use GFP_KERNEL_ACCOUNT rather than open-coding it]
  Link: http://lkml.kernel.org/r/20180702215439.211597-1-shakeelb@google.com
Link: http://lkml.kernel.org/r/20180627191250.209150-2-shakeelb@google.com
Signed-off-by: Shakeel Butt &lt;shakeelb@google.com&gt;
Acked-by: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Cc: Michal Hocko &lt;mhocko@kernel.org&gt;
Cc: Jan Kara &lt;jack@suse.cz&gt;
Cc: Amir Goldstein &lt;amir73il@gmail.com&gt;
Cc: Greg Thelen &lt;gthelen@google.com&gt;
Cc: Vladimir Davydov &lt;vdavydov.dev@gmail.com&gt;
Cc: Roman Gushchin &lt;guro@fb.com&gt;
Cc: Alexander Viro &lt;viro@zeniv.linux.org.uk&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>fsnotify: use type id to identify connector object type</title>
<updated>2018-05-18T12:58:22Z</updated>
<author>
<name>Amir Goldstein</name>
<email>amir73il@gmail.com</email>
</author>
<published>2018-04-20T23:10:49Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=d6f7b98bc8147abd290ead82922f8d83c525fb42'/>
<id>urn:sha1:d6f7b98bc8147abd290ead82922f8d83c525fb42</id>
<content type='text'>
An fsnotify_mark_connector is referencing a single type of object
(either inode or vfsmount). Instead of storing a type mask in
connector-&gt;flags, store a single type id in connector-&gt;type to
identify the type of object.

When a connector object is detached from the object, its type is set
to FSNOTIFY_OBJ_TYPE_DETACHED and this object is not going to be
reused.

The function fsnotify_clear_marks_by_group() is the only place where
type mask was used, so use type flags instead of type id to this
function.

This change is going to be more convenient when adding a new object
type (super block).

Signed-off-by: Amir Goldstein &lt;amir73il@gmail.com&gt;
Signed-off-by: Jan Kara &lt;jack@suse.cz&gt;
</content>
</entry>
<entry>
<title>fsnotify: convert fsnotify_group.refcnt from atomic_t to refcount_t</title>
<updated>2017-10-31T16:54:56Z</updated>
<author>
<name>Elena Reshetova</name>
<email>elena.reshetova@intel.com</email>
</author>
<published>2017-10-20T10:26:01Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=7761daa6a1599fa5479b8da367470f632a1927e0'/>
<id>urn:sha1:7761daa6a1599fa5479b8da367470f632a1927e0</id>
<content type='text'>
atomic_t variables are currently used to implement reference
counters with the following properties:
 - counter is initialized to 1 using atomic_set()
 - a resource is freed upon counter reaching zero
 - once counter reaches zero, its further
   increments aren't allowed
 - counter schema uses basic atomic operations
   (set, inc, inc_not_zero, dec_and_test, etc.)

Such atomic variables should be converted to a newly provided
refcount_t type and API that prevents accidental counter overflows
and underflows. This is important since overflows and underflows
can lead to use-after-free situation and be exploitable.

The variable fsnotify_group.refcnt is used as pure reference counter.
Convert it to refcount_t and fix up the operations.

Suggested-by: Kees Cook &lt;keescook@chromium.org&gt;
Reviewed-by: David Windsor &lt;dwindsor@gmail.com&gt;
Reviewed-by: Hans Liljestrand &lt;ishkamiel@gmail.com&gt;
Signed-off-by: Elena Reshetova &lt;elena.reshetova@intel.com&gt;
Signed-off-by: Jan Kara &lt;jack@suse.cz&gt;
</content>
</entry>
<entry>
<title>fsnotify: Remove fsnotify_detach_group_marks()</title>
<updated>2017-04-10T15:37:36Z</updated>
<author>
<name>Jan Kara</name>
<email>jack@suse.cz</email>
</author>
<published>2017-01-04T09:51:58Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=2e37c6ca8d76c362e844c0cf3ebe8ba2e27940cb'/>
<id>urn:sha1:2e37c6ca8d76c362e844c0cf3ebe8ba2e27940cb</id>
<content type='text'>
The function is already mostly contained in what
fsnotify_clear_marks_by_group() does. Just update that function to not
select marks when all of them should be destroyed and remove
fsnotify_detach_group_marks().

Reviewed-by: Miklos Szeredi &lt;mszeredi@redhat.com&gt;
Reviewed-by: Amir Goldstein &lt;amir73il@gmail.com&gt;
Signed-off-by: Jan Kara &lt;jack@suse.cz&gt;
</content>
</entry>
<entry>
<title>fsnotify: Provide framework for dropping SRCU lock in -&gt;handle_event</title>
<updated>2017-04-10T15:37:36Z</updated>
<author>
<name>Jan Kara</name>
<email>jack@suse.cz</email>
</author>
<published>2016-11-10T15:02:11Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=abc77577a669f424c5d0c185b9994f2621c52aa4'/>
<id>urn:sha1:abc77577a669f424c5d0c185b9994f2621c52aa4</id>
<content type='text'>
fanotify wants to drop fsnotify_mark_srcu lock when waiting for response
from userspace so that the whole notification subsystem is not blocked
during that time. This patch provides a framework for safely getting
mark reference for a mark found in the object list which pins the mark
in that list. We can then drop fsnotify_mark_srcu, wait for userspace
response and then safely continue iteration of the object list once we
reaquire fsnotify_mark_srcu.

Reviewed-by: Miklos Szeredi &lt;mszeredi@redhat.com&gt;
Reviewed-by: Amir Goldstein &lt;amir73il@gmail.com&gt;
Signed-off-by: Jan Kara &lt;jack@suse.cz&gt;
</content>
</entry>
<entry>
<title>fsnotify: Remove special handling of mark destruction on group shutdown</title>
<updated>2017-04-10T15:37:36Z</updated>
<author>
<name>Jan Kara</name>
<email>jack@suse.cz</email>
</author>
<published>2016-12-21T13:48:18Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=f09b04a03e0239f65bd964a1de758e53cf6349e8'/>
<id>urn:sha1:f09b04a03e0239f65bd964a1de758e53cf6349e8</id>
<content type='text'>
Currently we queue all marks for destruction on group shutdown and then
destroy them from fsnotify_destroy_group() instead from a worker thread
which is the usual path. However worker can already be processing some
list of marks to destroy so this does not make 100% all marks are really
destroyed by the time group is shut down. This isn't a big problem as
each mark holds group reference and thus group stays partially alive
until all marks are really freed but there's no point in complicating
our lives - just wait for the delayed work to be finished instead.

Reviewed-by: Miklos Szeredi &lt;mszeredi@redhat.com&gt;
Reviewed-by: Amir Goldstein &lt;amir73il@gmail.com&gt;
Signed-off-by: Jan Kara &lt;jack@suse.cz&gt;
</content>
</entry>
<entry>
<title>fsnotify: convert notification_mutex to a spinlock</title>
<updated>2016-10-08T01:46:26Z</updated>
<author>
<name>Jan Kara</name>
<email>jack@suse.cz</email>
</author>
<published>2016-10-07T23:56:52Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=c21dbe20f606219fe54faf555b7bc5565487c58f'/>
<id>urn:sha1:c21dbe20f606219fe54faf555b7bc5565487c58f</id>
<content type='text'>
notification_mutex is used to protect the list of pending events.  As such
there's no reason to use a sleeping lock for it.  Convert it to a
spinlock.

[jack@suse.cz: fixed version]
  Link: http://lkml.kernel.org/r/1474031567-1831-1-git-send-email-jack@suse.cz
Link: http://lkml.kernel.org/r/1473797711-14111-5-git-send-email-jack@suse.cz
Signed-off-by: Jan Kara &lt;jack@suse.cz&gt;
Reviewed-by: Lino Sanfilippo &lt;LinoSanfilippo@gmx.de&gt;
Tested-by: Guenter Roeck &lt;linux@roeck-us.net&gt;
Cc: Miklos Szeredi &lt;mszeredi@redhat.com&gt;
Cc: Eric Paris &lt;eparis@redhat.com&gt;
Cc: Al Viro &lt;viro@ZenIV.linux.org.uk&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>fsnotify: add a way to stop queueing events on group shutdown</title>
<updated>2016-09-19T22:36:17Z</updated>
<author>
<name>Jan Kara</name>
<email>jack@suse.cz</email>
</author>
<published>2016-09-19T21:44:27Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=12703dbfeb15402260e7554d32a34ac40c233990'/>
<id>urn:sha1:12703dbfeb15402260e7554d32a34ac40c233990</id>
<content type='text'>
Implement a function that can be called when a group is being shutdown
to stop queueing new events to the group.  Fanotify will use this.

Fixes: 5838d4442bd5 ("fanotify: fix double free of pending permission events")
Link: http://lkml.kernel.org/r/1473797711-14111-2-git-send-email-jack@suse.cz
Signed-off-by: Jan Kara &lt;jack@suse.cz&gt;
Reviewed-by: Miklos Szeredi &lt;mszeredi@redhat.com&gt;
Cc: &lt;stable@vger.kernel.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>fsnotify: avoid spurious EMFILE errors from inotify_init()</title>
<updated>2016-05-20T02:12:14Z</updated>
<author>
<name>Jan Kara</name>
<email>jack@suse.cz</email>
</author>
<published>2016-05-20T00:08:59Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=35e481761cdc688dbee0ef552a13f49af8eba6cc'/>
<id>urn:sha1:35e481761cdc688dbee0ef552a13f49af8eba6cc</id>
<content type='text'>
Inotify instance is destroyed when all references to it are dropped.
That not only means that the corresponding file descriptor needs to be
closed but also that all corresponding instance marks are freed (as each
mark holds a reference to the inotify instance).  However marks are
freed only after SRCU period ends which can take some time and thus if
user rapidly creates and frees inotify instances, number of existing
inotify instances can exceed max_user_instances limit although from user
point of view there is always at most one existing instance.  Thus
inotify_init() returns EMFILE error which is hard to justify from user
point of view.  This problem is exposed by LTP inotify06 testcase on
some machines.

We fix the problem by making sure all group marks are properly freed
while destroying inotify instance.  We wait for SRCU period to end in
that path anyway since we have to make sure there is no event being
added to the instance while we are tearing down the instance.  So it
takes only some plumbing to allow for marks to be destroyed in that path
as well and not from a dedicated work item.

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Jan Kara &lt;jack@suse.cz&gt;
Reported-by: Xiaoguang Wang &lt;wangxg.fnst@cn.fujitsu.com&gt;
Tested-by: Xiaoguang Wang &lt;wangxg.fnst@cn.fujitsu.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>fs/notify/group.c: make fsnotify_final_destroy_group() static</title>
<updated>2014-10-10T02:25:45Z</updated>
<author>
<name>Andrew Morton</name>
<email>akpm@linux-foundation.org</email>
</author>
<published>2014-10-09T22:24:35Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=cafbaae8afdb1e6cf9f4715aea2c897cf407ddfd'/>
<id>urn:sha1:cafbaae8afdb1e6cf9f4715aea2c897cf407ddfd</id>
<content type='text'>
No callers outside this file.

Cc: Sasha Levin &lt;sasha.levin@oracle.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
</feed>
