<feed xmlns='http://www.w3.org/2005/Atom'>
<title>kernel/include/linux/sched/task.h, branch linux-6.1.y</title>
<subtitle>Hosts the 0x221E linux distro kernel.</subtitle>
<id>https://universe.0xinfinity.dev/distro/kernel/atom?h=linux-6.1.y</id>
<link rel='self' href='https://universe.0xinfinity.dev/distro/kernel/atom?h=linux-6.1.y'/>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/'/>
<updated>2025-02-21T12:50:04Z</updated>
<entry>
<title>cgroup: fix race between fork and cgroup.kill</title>
<updated>2025-02-21T12:50:04Z</updated>
<author>
<name>Shakeel Butt</name>
<email>shakeel.butt@linux.dev</email>
</author>
<published>2025-01-31T00:05:42Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=28e51dd4f28bf5edb705a41475ee606d766ce375'/>
<id>urn:sha1:28e51dd4f28bf5edb705a41475ee606d766ce375</id>
<content type='text'>
commit b69bb476dee99d564d65d418e9a20acca6f32c3f upstream.

Tejun reported the following race between fork() and cgroup.kill at [1].

Tejun:
  I was looking at cgroup.kill implementation and wondering whether there
  could be a race window. So, __cgroup_kill() does the following:

   k1. Set CGRP_KILL.
   k2. Iterate tasks and deliver SIGKILL.
   k3. Clear CGRP_KILL.

  The copy_process() does the following:

   c1. Copy a bunch of stuff.
   c2. Grab siglock.
   c3. Check fatal_signal_pending().
   c4. Commit to forking.
   c5. Release siglock.
   c6. Call cgroup_post_fork() which puts the task on the css_set and tests
       CGRP_KILL.

  The intention seems to be that either a forking task gets SIGKILL and
  terminates on c3 or it sees CGRP_KILL on c6 and kills the child. However, I
  don't see what guarantees that k3 can't happen before c6. ie. After a
  forking task passes c5, k2 can take place and then before the forking task
  reaches c6, k3 can happen. Then, nobody would send SIGKILL to the child.
  What am I missing?

This is indeed a race. One way to fix this race is by taking
cgroup_threadgroup_rwsem in write mode in __cgroup_kill() as the fork()
side takes cgroup_threadgroup_rwsem in read mode from cgroup_can_fork()
to cgroup_post_fork(). However that would be heavy handed as this adds
one more potential stall scenario for cgroup.kill which is usually
called under extreme situation like memory pressure.

To fix this race, let's maintain a sequence number per cgroup which gets
incremented on __cgroup_kill() call. On the fork() side, the
cgroup_can_fork() will cache the sequence number locally and recheck it
against the cgroup's sequence number at cgroup_post_fork() site. If the
sequence numbers mismatch, it means __cgroup_kill() can been called and
we should send SIGKILL to the newly created task.

Reported-by: Tejun Heo &lt;tj@kernel.org&gt;
Closes: https://lore.kernel.org/all/Z5QHE2Qn-QZ6M-KW@slm.duckdns.org/ [1]
Fixes: 661ee6280931 ("cgroup: introduce cgroup.kill")
Cc: stable@vger.kernel.org # v5.14+
Signed-off-by: Shakeel Butt &lt;shakeel.butt@linux.dev&gt;
Reviewed-by: Michal Koutný &lt;mkoutny@suse.com&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>locking: Introduce __cleanup() based infrastructure</title>
<updated>2024-02-23T08:12:51Z</updated>
<author>
<name>Peter Zijlstra</name>
<email>peterz@infradead.org</email>
</author>
<published>2023-05-26T10:23:48Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=3c6cc62ce1265aa5623e2e1b29c0fe258bf6e232'/>
<id>urn:sha1:3c6cc62ce1265aa5623e2e1b29c0fe258bf6e232</id>
<content type='text'>
commit 54da6a0924311c7cf5015533991e44fb8eb12773 upstream.

Use __attribute__((__cleanup__(func))) to build:

 - simple auto-release pointers using __free()

 - 'classes' with constructor and destructor semantics for
   scope-based resource management.

 - lock guards based on the above classes.

Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Link: https://lkml.kernel.org/r/20230612093537.614161713%40infradead.org
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>kernel/fork: beware of __put_task_struct() calling context</title>
<updated>2023-09-23T09:11:00Z</updated>
<author>
<name>Wander Lairson Costa</name>
<email>wander@redhat.com</email>
</author>
<published>2023-06-14T12:23:21Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=3b1107abdc2cabdf2044683fc2fa9743a7da5ccc'/>
<id>urn:sha1:3b1107abdc2cabdf2044683fc2fa9743a7da5ccc</id>
<content type='text'>
[ Upstream commit d243b34459cea30cfe5f3a9b2feb44e7daff9938 ]

Under PREEMPT_RT, __put_task_struct() indirectly acquires sleeping
locks. Therefore, it can't be called from an non-preemptible context.

One practical example is splat inside inactive_task_timer(), which is
called in a interrupt context:

  CPU: 1 PID: 2848 Comm: life Kdump: loaded Tainted: G W ---------
   Hardware name: HP ProLiant DL388p Gen8, BIOS P70 07/15/2012
   Call Trace:
   dump_stack_lvl+0x57/0x7d
   mark_lock_irq.cold+0x33/0xba
   mark_lock+0x1e7/0x400
   mark_usage+0x11d/0x140
   __lock_acquire+0x30d/0x930
   lock_acquire.part.0+0x9c/0x210
   rt_spin_lock+0x27/0xe0
   refill_obj_stock+0x3d/0x3a0
   kmem_cache_free+0x357/0x560
   inactive_task_timer+0x1ad/0x340
   __run_hrtimer+0x8a/0x1a0
   __hrtimer_run_queues+0x91/0x130
   hrtimer_interrupt+0x10f/0x220
   __sysvec_apic_timer_interrupt+0x7b/0xd0
   sysvec_apic_timer_interrupt+0x4f/0xd0
   asm_sysvec_apic_timer_interrupt+0x12/0x20
   RIP: 0033:0x7fff196bf6f5

Instead of calling __put_task_struct() directly, we defer it using
call_rcu(). A more natural approach would use a workqueue, but since
in PREEMPT_RT, we can't allocate dynamic memory from atomic context,
the code would become more complex because we would need to put the
work_struct instance in the task_struct and initialize it when we
allocate a new task_struct.

The issue is reproducible with stress-ng:

  while true; do
      stress-ng --sched deadline --sched-period 1000000000 \
	      --sched-runtime 800000000 --sched-deadline \
	      1000000000 --mmapfork 23 -t 20
  done

Reported-by: Hu Chunyu &lt;chuhu@redhat.com&gt;
Suggested-by: Oleg Nesterov &lt;oleg@redhat.com&gt;
Suggested-by: Valentin Schneider &lt;vschneid@redhat.com&gt;
Suggested-by: Peter Zijlstra &lt;peterz@infradead.org&gt;
Signed-off-by: Wander Lairson Costa &lt;wander@redhat.com&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Link: https://lore.kernel.org/r/20230614122323.37957-2-wander@redhat.com
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>mm: Move mm_cachep initialization to mm_init()</title>
<updated>2023-08-08T18:03:49Z</updated>
<author>
<name>Peter Zijlstra</name>
<email>peterz@infradead.org</email>
</author>
<published>2022-10-25T19:38:18Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=e0fd83a193c530fdeced8b2e2ec83039ffdb884b'/>
<id>urn:sha1:e0fd83a193c530fdeced8b2e2ec83039ffdb884b</id>
<content type='text'>
commit af80602799681c78f14fbe20b6185a56020dedee upstream.

In order to allow using mm_alloc() much earlier, move initializing
mm_cachep into mm_init().

Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Link: https://lkml.kernel.org/r/20221025201057.751153381@infradead.org
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>x86/mm: Use mm_alloc() in poking_init()</title>
<updated>2023-08-08T18:03:49Z</updated>
<author>
<name>Peter Zijlstra</name>
<email>peterz@infradead.org</email>
</author>
<published>2022-10-25T19:38:21Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=9ae15aaff39c831e2f9d8b029e85a2d70c7c8a68'/>
<id>urn:sha1:9ae15aaff39c831e2f9d8b029e85a2d70c7c8a68</id>
<content type='text'>
commit 3f4c8211d982099be693be9aa7d6fc4607dff290 upstream.

Instead of duplicating init_mm, allocate a fresh mm. The advantage is
that mm_alloc() has much simpler dependencies. Additionally it makes
more conceptual sense, init_mm has no (and must not have) user state
to duplicate.

Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Link: https://lkml.kernel.org/r/20221025201057.816175235@infradead.org
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>kernel: exit: cleanup release_thread()</title>
<updated>2022-09-12T04:55:07Z</updated>
<author>
<name>Kefeng Wang</name>
<email>wangkefeng.wang@huawei.com</email>
</author>
<published>2022-08-19T01:44:06Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=2be9880dc87342dc7ae459c9ea5c9ee2a45b33d8'/>
<id>urn:sha1:2be9880dc87342dc7ae459c9ea5c9ee2a45b33d8</id>
<content type='text'>
Only x86 has own release_thread(), introduce a new weak release_thread()
function to clean empty definitions in other ARCHs.

Link: https://lkml.kernel.org/r/20220819014406.32266-1-wangkefeng.wang@huawei.com
Signed-off-by: Kefeng Wang &lt;wangkefeng.wang@huawei.com&gt;
Acked-by: Guo Ren &lt;guoren@kernel.org&gt;				[csky]
Acked-by: Russell King (Oracle) &lt;rmk+kernel@armlinux.org.uk&gt;
Acked-by: Geert Uytterhoeven &lt;geert@linux-m68k.org&gt;
Acked-by: Brian Cain &lt;bcain@quicinc.com&gt;
Acked-by: Michael Ellerman &lt;mpe@ellerman.id.au&gt;			[powerpc]
Acked-by: Stafford Horne &lt;shorne@gmail.com&gt;			[openrisc]
Acked-by: Catalin Marinas &lt;catalin.marinas@arm.com&gt;		[arm64]
Acked-by: Huacai Chen &lt;chenhuacai@kernel.org&gt;			[LoongArch]
Cc: Alexander Gordeev &lt;agordeev@linux.ibm.com&gt;
Cc: Anton Ivanov &lt;anton.ivanov@cambridgegreys.com&gt;
Cc: Borislav Petkov &lt;bp@alien8.de&gt;
Cc: Christian Borntraeger &lt;borntraeger@linux.ibm.com&gt;
Cc: Christophe Leroy &lt;christophe.leroy@csgroup.eu&gt;
Cc: Chris Zankel &lt;chris@zankel.net&gt;
Cc: Dave Hansen &lt;dave.hansen@linux.intel.com&gt;
Cc: "David S. Miller" &lt;davem@davemloft.net&gt;
Cc: Dinh Nguyen &lt;dinguyen@kernel.org&gt;
Cc: Guo Ren &lt;guoren@kernel.org&gt; [csky]
Cc: Heiko Carstens &lt;hca@linux.ibm.com&gt;
Cc: Helge Deller &lt;deller@gmx.de&gt;
Cc: Ingo Molnar &lt;mingo@redhat.com&gt;
Cc: Ivan Kokshaysky &lt;ink@jurassic.park.msu.ru&gt;
Cc: James Bottomley &lt;James.Bottomley@HansenPartnership.com&gt;
Cc: Johannes Berg &lt;johannes@sipsolutions.net&gt;
Cc: Jonas Bonn &lt;jonas@southpole.se&gt;
Cc: Matt Turner &lt;mattst88@gmail.com&gt;
Cc: Max Filippov &lt;jcmvbkbc@gmail.com&gt;
Cc: Michal Simek &lt;monstr@monstr.eu&gt;
Cc: Nicholas Piggin &lt;npiggin@gmail.com&gt;
Cc: Palmer Dabbelt &lt;palmer@dabbelt.com&gt;
Cc: Paul Walmsley &lt;paul.walmsley@sifive.com&gt;
Cc: Richard Henderson &lt;richard.henderson@linaro.org&gt;
Cc: Richard Weinberger &lt;richard@nod.at&gt;
Cc: Rich Felker &lt;dalias@libc.org&gt;
Cc: Stefan Kristiansson &lt;stefan.kristiansson@saunalahti.fi&gt;
Cc: Sven Schnelle &lt;svens@linux.ibm.com&gt;
Cc: Thomas Bogendoerfer &lt;tsbogend@alpha.franken.de&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: Vasily Gorbik &lt;gor@linux.ibm.com&gt;
Cc: Vineet Gupta &lt;vgupta@kernel.org&gt;
Cc: Will Deacon &lt;will@kernel.org&gt;
Cc: Xuerui Wang &lt;kernel@xen0n.name&gt;
Cc: Yoshinori Sato &lt;ysato@users.osdn.me&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>fix race between exit_itimers() and /proc/pid/timers</title>
<updated>2022-07-11T16:52:59Z</updated>
<author>
<name>Oleg Nesterov</name>
<email>oleg@redhat.com</email>
</author>
<published>2022-07-11T16:16:25Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=d5b36a4dbd06c5e8e36ca8ccc552f679069e2946'/>
<id>urn:sha1:d5b36a4dbd06c5e8e36ca8ccc552f679069e2946</id>
<content type='text'>
As Chris explains, the comment above exit_itimers() is not correct,
we can race with proc_timers_seq_ops. Change exit_itimers() to clear
signal-&gt;posix_timers with -&gt;siglock held.

Cc: &lt;stable@vger.kernel.org&gt;
Reported-by: chris@accessvector.net
Signed-off-by: Oleg Nesterov &lt;oleg@redhat.com&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>fork: Generalize PF_IO_WORKER handling</title>
<updated>2022-05-07T14:01:59Z</updated>
<author>
<name>Eric W. Biederman</name>
<email>ebiederm@xmission.com</email>
</author>
<published>2022-04-12T15:18:48Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=5bd2e97c868a8a44470950ed01846cab6328e540'/>
<id>urn:sha1:5bd2e97c868a8a44470950ed01846cab6328e540</id>
<content type='text'>
Add fn and fn_arg members into struct kernel_clone_args and test for
them in copy_thread (instead of testing for PF_KTHREAD | PF_IO_WORKER).
This allows any task that wants to be a user space task that only runs
in kernel mode to use this functionality.

The code on x86 is an exception and still retains a PF_KTHREAD test
because x86 unlikely everything else handles kthreads slightly
differently than user space tasks that start with a function.

The functions that created tasks that start with a function
have been updated to set ".fn" and ".fn_arg" instead of
".stack" and ".stack_size".  These functions are fork_idle(),
create_io_thread(), kernel_thread(), and user_mode_thread().

Link: https://lkml.kernel.org/r/20220506141512.516114-4-ebiederm@xmission.com
Signed-off-by: "Eric W. Biederman" &lt;ebiederm@xmission.com&gt;
</content>
</entry>
<entry>
<title>fork: Explicity test for idle tasks in copy_thread</title>
<updated>2022-05-07T14:01:59Z</updated>
<author>
<name>Eric W. Biederman</name>
<email>ebiederm@xmission.com</email>
</author>
<published>2022-04-11T21:17:28Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=36cb0e1cda645ee645b85a6ce652cb46a16e14e5'/>
<id>urn:sha1:36cb0e1cda645ee645b85a6ce652cb46a16e14e5</id>
<content type='text'>
The architectures ia64 and parisc have special handling for the idle
thread in copy_process.  Add a flag named idle to kernel_clone_args
and use it to explicity test if an idle process is being created.

Fullfill the expectations of the rest of the copy_thread
implemetations and pass a function pointer in .stack from fork_idle().
This makes what is happening in copy_thread better defined, and is
useful to make idle threads less special.

Link: https://lkml.kernel.org/r/20220506141512.516114-3-ebiederm@xmission.com
Signed-off-by: "Eric W. Biederman" &lt;ebiederm@xmission.com&gt;
</content>
</entry>
<entry>
<title>fork: Pass struct kernel_clone_args into copy_thread</title>
<updated>2022-05-07T14:01:48Z</updated>
<author>
<name>Eric W. Biederman</name>
<email>ebiederm@xmission.com</email>
</author>
<published>2022-04-08T23:07:50Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=c5febea0956fd3874e8fb59c6f84d68f128d68f8'/>
<id>urn:sha1:c5febea0956fd3874e8fb59c6f84d68f128d68f8</id>
<content type='text'>
With io_uring we have started supporting tasks that are for most
purposes user space tasks that exclusively run code in kernel mode.

The kernel task that exec's init and tasks that exec user mode
helpers are also user mode tasks that just run kernel code
until they call kernel execve.

Pass kernel_clone_args into copy_thread so these oddball
tasks can be supported more cleanly and easily.

v2: Fix spelling of kenrel_clone_args on h8300
Link: https://lkml.kernel.org/r/20220506141512.516114-2-ebiederm@xmission.com
Signed-off-by: "Eric W. Biederman" &lt;ebiederm@xmission.com&gt;
</content>
</entry>
</feed>
