<feed xmlns='http://www.w3.org/2005/Atom'>
<title>kernel/include/linux/cgroup-defs.h, branch linux-5.2.y</title>
<subtitle>Hosts the 0x221E linux distro kernel.</subtitle>
<id>https://universe.0xinfinity.dev/distro/kernel/atom?h=linux-5.2.y</id>
<link rel='self' href='https://universe.0xinfinity.dev/distro/kernel/atom?h=linux-5.2.y'/>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/'/>
<updated>2019-06-15T03:46:14Z</updated>
<entry>
<title>Merge branch 'for-5.2-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup</title>
<updated>2019-06-15T03:46:14Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2019-06-15T03:46:14Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=0011572c883082a95e02d47f45fc4a42dc0e8634'/>
<id>urn:sha1:0011572c883082a95e02d47f45fc4a42dc0e8634</id>
<content type='text'>
Pull cgroup fixes from Tejun Heo:
 "This has an unusually high density of tricky fixes:

   - task_get_css() could deadlock when it races against a dying cgroup.

   - cgroup.procs didn't list thread group leaders with live threads.

     This could mislead readers to think that a cgroup is empty when
     it's not. Fixed by making PROCS iterator include dead tasks. I made
     a couple mistakes making this change and this pull request contains
     a couple follow-up patches.

   - When cpusets run out of online cpus, it updates cpusmasks of member
     tasks in bizarre ways. Joel improved the behavior significantly"

* 'for-5.2-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
  cpuset: restore sanity to cpuset_cpus_allowed_fallback()
  cgroup: Fix css_task_iter_advance_css_set() cset skip condition
  cgroup: css_task_iter_skip()'d iterators must be advanced before accessed
  cgroup: Include dying leaders with live threads in PROCS iterations
  cgroup: Implement css_task_iter_skip()
  cgroup: Call cgroup_release() before __exit_signal()
  docs cgroups: add another example size for hugetlb
  cgroup: Use css_tryget() instead of css_tryget_online() in task_get_css()
</content>
</entry>
<entry>
<title>cgroup/bfq: revert bfq.weight symlink change</title>
<updated>2019-06-10T09:35:41Z</updated>
<author>
<name>Jens Axboe</name>
<email>axboe@kernel.dk</email>
</author>
<published>2019-06-10T09:35:41Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=cf8929885de318c0bf73438c9e5dde59d6536f7c'/>
<id>urn:sha1:cf8929885de318c0bf73438c9e5dde59d6536f7c</id>
<content type='text'>
There's some discussion on how to do this the best, and Tejun prefers
that BFQ just create the file itself instead of having cgroups support
a symlink feature.

Hence revert commit 54b7b868e826 and 19e9da9e86c4 for 5.2, and this
can be done properly for 5.3.

Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
<entry>
<title>cgroup: let a symlink too be created with a cftype file</title>
<updated>2019-06-07T07:29:39Z</updated>
<author>
<name>Angelo Ruocco</name>
<email>angeloruocco90@gmail.com</email>
</author>
<published>2019-05-21T08:01:54Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=54b7b868e826b294687c439b68ec55fe20cafe5b'/>
<id>urn:sha1:54b7b868e826b294687c439b68ec55fe20cafe5b</id>
<content type='text'>
This commit enables a cftype to have a symlink (of any name) that
points to the file associated with the cftype.

Signed-off-by: Angelo Ruocco &lt;angeloruocco90@gmail.com&gt;
Signed-off-by: Paolo Valente &lt;paolo.valente@linaro.org&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
<entry>
<title>mm, memcg: consider subtrees in memory.events</title>
<updated>2019-06-01T22:51:31Z</updated>
<author>
<name>Chris Down</name>
<email>chris@chrisdown.name</email>
</author>
<published>2019-06-01T05:30:22Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=9852ae3fe5293264f01c49f2571ef7688f7823ce'/>
<id>urn:sha1:9852ae3fe5293264f01c49f2571ef7688f7823ce</id>
<content type='text'>
memory.stat and other files already consider subtrees in their output, and
we should too in order to not present an inconsistent interface.

The current situation is fairly confusing, because people interacting with
cgroups expect hierarchical behaviour in the vein of memory.stat,
cgroup.events, and other files.  For example, this causes confusion when
debugging reclaim events under low, as currently these always read "0" at
non-leaf memcg nodes, which frequently causes people to misdiagnose breach
behaviour.  The same confusion applies to other counters in this file when
debugging issues.

Aggregation is done at write time instead of at read-time since these
counters aren't hot (unlike memory.stat which is per-page, so it does it
at read time), and it makes sense to bundle this with the file
notifications.

After this patch, events are propagated up the hierarchy:

    [root@ktst ~]# cat /sys/fs/cgroup/system.slice/memory.events
    low 0
    high 0
    max 0
    oom 0
    oom_kill 0
    [root@ktst ~]# systemd-run -p MemoryMax=1 true
    Running as unit: run-r251162a189fb4562b9dabfdc9b0422f5.service
    [root@ktst ~]# cat /sys/fs/cgroup/system.slice/memory.events
    low 0
    high 0
    max 7
    oom 1
    oom_kill 1

As this is a change in behaviour, this can be reverted to the old
behaviour by mounting with the `memory_localevents' flag set.  However, we
use the new behaviour by default as there's a lack of evidence that there
are any current users of memory.events that would find this change
undesirable.

akpm: this is a behaviour change, so Cc:stable.  THis is so that
forthcoming distros which use cgroup v2 are more likely to pick up the
revised behaviour.

Link: http://lkml.kernel.org/r/20190208224419.GA24772@chrisdown.name
Signed-off-by: Chris Down &lt;chris@chrisdown.name&gt;
Acked-by: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Reviewed-by: Shakeel Butt &lt;shakeelb@google.com&gt;
Cc: Michal Hocko &lt;mhocko@kernel.org&gt;
Cc: Tejun Heo &lt;tj@kernel.org&gt;
Cc: Roman Gushchin &lt;guro@fb.com&gt;
Cc: Dennis Zhou &lt;dennis@kernel.org&gt;
Cc: Suren Baghdasaryan &lt;surenb@google.com&gt;
Cc: &lt;stable@vger.kernel.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>cgroup: Include dying leaders with live threads in PROCS iterations</title>
<updated>2019-05-31T17:38:58Z</updated>
<author>
<name>Tejun Heo</name>
<email>tj@kernel.org</email>
</author>
<published>2019-05-31T17:38:58Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=c03cd7738a83b13739f00546166969342c8ff014'/>
<id>urn:sha1:c03cd7738a83b13739f00546166969342c8ff014</id>
<content type='text'>
CSS_TASK_ITER_PROCS currently iterates live group leaders; however,
this means that a process with dying leader and live threads will be
skipped.  IOW, cgroup.procs might be empty while cgroup.threads isn't,
which is confusing to say the least.

Fix it by making cset track dying tasks and include dying leaders with
live threads in PROCS iteration.

Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Reported-and-tested-by: Topi Miettinen &lt;toiwoton@gmail.com&gt;
Cc: Oleg Nesterov &lt;oleg@redhat.com&gt;
</content>
</entry>
<entry>
<title>cgroup: cgroup v2 freezer</title>
<updated>2019-04-19T18:26:48Z</updated>
<author>
<name>Roman Gushchin</name>
<email>guro@fb.com</email>
</author>
<published>2019-04-19T17:03:04Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=76f969e8948d82e78e1bc4beb6b9465908e74873'/>
<id>urn:sha1:76f969e8948d82e78e1bc4beb6b9465908e74873</id>
<content type='text'>
Cgroup v1 implements the freezer controller, which provides an ability
to stop the workload in a cgroup and temporarily free up some
resources (cpu, io, network bandwidth and, potentially, memory)
for some other tasks. Cgroup v2 lacks this functionality.

This patch implements freezer for cgroup v2.

Cgroup v2 freezer tries to put tasks into a state similar to jobctl
stop. This means that tasks can be killed, ptraced (using
PTRACE_SEIZE*), and interrupted. It is possible to attach to
a frozen task, get some information (e.g. read registers) and detach.
It's also possible to migrate a frozen tasks to another cgroup.

This differs cgroup v2 freezer from cgroup v1 freezer, which mostly
tried to imitate the system-wide freezer. However uninterruptible
sleep is fine when all tasks are going to be frozen (hibernation case),
it's not the acceptable state for some subset of the system.

Cgroup v2 freezer is not supporting freezing kthreads.
If a non-root cgroup contains kthread, the cgroup still can be frozen,
but the kthread will remain running, the cgroup will be shown
as non-frozen, and the notification will not be delivered.

* PTRACE_ATTACH is not working because non-fatal signal delivery
is blocked in frozen state.

There are some interface differences between cgroup v1 and cgroup v2
freezer too, which are required to conform the cgroup v2 interface
design principles:
1) There is no separate controller, which has to be turned on:
the functionality is always available and is represented by
cgroup.freeze and cgroup.events cgroup control files.
2) The desired state is defined by the cgroup.freeze control file.
Any hierarchical configuration is allowed.
3) The interface is asynchronous. The actual state is available
using cgroup.events control file ("frozen" field). There are no
dedicated transitional states.
4) It's allowed to make any changes with the cgroup hierarchy
(create new cgroups, remove old cgroups, move tasks between cgroups)
no matter if some cgroups are frozen.

Signed-off-by: Roman Gushchin &lt;guro@fb.com&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
No-objection-from-me-by: Oleg Nesterov &lt;oleg@redhat.com&gt;
Cc: kernel-team@fb.com
</content>
</entry>
<entry>
<title>cgroup: protect cgroup-&gt;nr_(dying_)descendants by css_set_lock</title>
<updated>2019-04-19T18:26:48Z</updated>
<author>
<name>Roman Gushchin</name>
<email>guro@fb.com</email>
</author>
<published>2019-04-19T17:03:03Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=4dcabece4c3a9f9522127be12cc12cc120399b2f'/>
<id>urn:sha1:4dcabece4c3a9f9522127be12cc12cc120399b2f</id>
<content type='text'>
The number of descendant cgroups and the number of dying
descendant cgroups are currently synchronized using the cgroup_mutex.

The number of descendant cgroups will be required by the cgroup v2
freezer, which will use it to determine if a cgroup is frozen
(depending on total number of descendants and number of frozen
descendants). It's not always acceptable to grab the cgroup_mutex,
especially from quite hot paths (e.g. exit()).

To avoid this, let's additionally synchronize these counters using
the css_set_lock.

So, it's safe to read these counters with either cgroup_mutex or
css_set_lock locked, and for changing both locks should be acquired.

Signed-off-by: Roman Gushchin &lt;guro@fb.com&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Cc: kernel-team@fb.com
</content>
</entry>
<entry>
<title>Merge branch 'for-5.1' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup</title>
<updated>2019-03-07T18:11:41Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2019-03-07T18:11:41Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=1fc1cd8399ab5541a488a7e47b2f21537dd76c2d'/>
<id>urn:sha1:1fc1cd8399ab5541a488a7e47b2f21537dd76c2d</id>
<content type='text'>
Pull cgroup updates from Tejun Heo:

 - Oleg's pids controller accounting update which gets rid of rcu delay
   in pids accounting updates

 - rstat (cgroup hierarchical stat collection mechanism) optimization

 - Doc updates

* 'for-5.1' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
  cpuset: remove unused task_has_mempolicy()
  cgroup, rstat: Don't flush subtree root unless necessary
  cgroup: add documentation for pids.events file
  Documentation: cgroup-v2: eliminate markup warnings
  MAINTAINERS: Update cgroup entry
  cgroup/pids: turn cgroup_subsys-&gt;free() into cgroup_subsys-&gt;release() to fix the accounting
</content>
</entry>
<entry>
<title>kernel: cgroup: add poll file operation</title>
<updated>2019-03-06T05:07:17Z</updated>
<author>
<name>Johannes Weiner</name>
<email>hannes@cmpxchg.org</email>
</author>
<published>2019-03-05T23:45:48Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=dc50537bdd1a0804fa2cbc990565ee9a944e66fa'/>
<id>urn:sha1:dc50537bdd1a0804fa2cbc990565ee9a944e66fa</id>
<content type='text'>
Cgroup has a standardized poll/notification mechanism for waking all
pollers on all fds when a filesystem node changes.  To allow polling for
custom events, add a .poll callback that can override the default.

This is in preparation for pollable cgroup pressure files which have
per-fd trigger configurations.

Link: http://lkml.kernel.org/r/20190124211518.244221-3-surenb@google.com
Signed-off-by: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Signed-off-by: Suren Baghdasaryan &lt;surenb@google.com&gt;
Cc: Dennis Zhou &lt;dennis@kernel.org&gt;
Cc: Ingo Molnar &lt;mingo@redhat.com&gt;
Cc: Jens Axboe &lt;axboe@kernel.dk&gt;
Cc: Li Zefan &lt;lizefan@huawei.com&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Tejun Heo &lt;tj@kernel.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>cgroup/pids: turn cgroup_subsys-&gt;free() into cgroup_subsys-&gt;release() to fix the accounting</title>
<updated>2019-01-31T14:55:57Z</updated>
<author>
<name>Oleg Nesterov</name>
<email>oleg@redhat.com</email>
</author>
<published>2019-01-28T16:00:13Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=51bee5abeab2058ea5813c5615d6197a23dbf041'/>
<id>urn:sha1:51bee5abeab2058ea5813c5615d6197a23dbf041</id>
<content type='text'>
The only user of cgroup_subsys-&gt;free() callback is pids_cgrp_subsys which
needs pids_free() to uncharge the pid.

However, -&gt;free() is called from __put_task_struct()-&gt;cgroup_free() and this
is too late. Even the trivial program which does

	for (;;) {
		int pid = fork();
		assert(pid &gt;= 0);
		if (pid)
			wait(NULL);
		else
			exit(0);
	}

can run out of limits because release_task()-&gt;call_rcu(delayed_put_task_struct)
implies an RCU gp after the task/pid goes away and before the final put().

Test-case:

	mkdir -p /tmp/CG
	mount -t cgroup2 none /tmp/CG
	echo '+pids' &gt; /tmp/CG/cgroup.subtree_control

	mkdir /tmp/CG/PID
	echo 2 &gt; /tmp/CG/PID/pids.max

	perl -e 'while ($p = fork) { wait; } $p // die "fork failed: $!\n"' &amp;
	echo $! &gt; /tmp/CG/PID/cgroup.procs

Without this patch the forking process fails soon after migration.

Rename cgroup_subsys-&gt;free() to cgroup_subsys-&gt;release() and move the callsite
into the new helper, cgroup_release(), called by release_task() which actually
frees the pid(s).

Reported-by: Herton R. Krzesinski &lt;hkrzesin@redhat.com&gt;
Reported-by: Jan Stancek &lt;jstancek@redhat.com&gt;
Signed-off-by: Oleg Nesterov &lt;oleg@redhat.com&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
</content>
</entry>
</feed>
