<feed xmlns='http://www.w3.org/2005/Atom'>
<title>kernel/init/init_task.c, branch linux-6.10.y</title>
<subtitle>Hosts the 0x221E linux distro kernel.</subtitle>
<id>https://universe.0xinfinity.dev/distro/kernel/atom?h=linux-6.10.y</id>
<link rel='self' href='https://universe.0xinfinity.dev/distro/kernel/atom?h=linux-6.10.y'/>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/'/>
<updated>2024-05-15T20:07:49Z</updated>
<entry>
<title>Merge tag 'livepatching-for-6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/livepatching/livepatching</title>
<updated>2024-05-15T20:07:49Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2024-05-15T20:07:49Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=8c06da67d0bd3139a97f301b4aa9c482b9d4f29e'/>
<id>urn:sha1:8c06da67d0bd3139a97f301b4aa9c482b9d4f29e</id>
<content type='text'>
Pull livepatching update from Petr Mladek:

 - Use more informative names for the livepatch transition states

* tag 'livepatching-for-6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/livepatching/livepatching:
  livepatch: Rename KLP_* to KLP_TRANSITION_*
</content>
</entry>
<entry>
<title>livepatch: Rename KLP_* to KLP_TRANSITION_*</title>
<updated>2024-05-09T13:48:01Z</updated>
<author>
<name>Wardenjohn</name>
<email>zhangwarden@gmail.com</email>
</author>
<published>2024-05-07T05:01:11Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=d927752f287fe10965612541593468ffcfa9231f'/>
<id>urn:sha1:d927752f287fe10965612541593468ffcfa9231f</id>
<content type='text'>
The original KLP_* macros describe the state of the transition.
Rename them to KLP_TRANSITION_* so that the names make clear they
refer to the klp transition state.

Signed-off-by: Wardenjohn &lt;zhangwarden@gmail.com&gt;
Reviewed-by: Petr Mladek &lt;pmladek@suse.com&gt;
Tested-by: Petr Mladek &lt;pmladek@suse.com&gt;
Acked-by: Josh Poimboeuf &lt;jpoimboe@kernel.org&gt;
Acked-by: Miroslav Benes &lt;mbenes@suse.cz&gt;
Link: https://lore.kernel.org/r/20240507050111.38195-2-zhangwarden@gmail.com
Signed-off-by: Petr Mladek &lt;pmladek@suse.com&gt;
</content>
</entry>
<entry>
<title>sched/fair: Check if a task has a fitting CPU when updating misfit</title>
<updated>2024-03-25T11:09:54Z</updated>
<author>
<name>Qais Yousef</name>
<email>qyousef@layalina.io</email>
</author>
<published>2024-03-24T00:45:50Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=22d5607400c62c72da9b60e3324744be83e147a4'/>
<id>urn:sha1:22d5607400c62c72da9b60e3324744be83e147a4</id>
<content type='text'>
If a misfit task is affined to a subset of the possible CPUs, we need to
verify that one of these CPUs can fit it. Otherwise the load balancer
code will keep triggering needlessly, causing the balance_interval to
increase in turn, and we eventually end up in a situation where real
imbalances take a long time to address because of this impossible
imbalance.

This can happen in the Android world, where it's common for background
tasks to be restricted to little cores.

Similarly if we can't fit the biggest core, triggering misfit is
pointless as it is the best we can ever get on this system.

To detect that, we use asym_cap_list to iterate through the
capacities in the system to see if the task is able to run at a higher
capacity level based on its p-&gt;cpus_ptr. We do that when the affinity
changes, a fair task is forked, or a task switches to the fair policy.
We store the max_allowed_capacity in task_struct to allow for cheap
comparison in the fast path.
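
As an illustration only, the capacity scan described above can be
modelled in a few lines of Python. This is a sketch under stated
assumptions, not the kernel code: the names AsymCapEntry,
max_allowed_capacity() and can_run_at_higher_capacity(), the CAPS
topology and its capacity values are all hypothetical, and the sketch
covers only the affinity check, not the utilization test that decides
whether a task fits a CPU.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AsymCapEntry:
    """Hypothetical stand-in for one capacity level in asym_cap_list."""
    capacity: int      # capacity shared by every CPU in this entry
    cpus: frozenset    # CPU ids that have this capacity


def max_allowed_capacity(asym_cap_list, allowed_cpus):
    """Largest capacity among the levels that intersect the affinity mask."""
    allowed = frozenset(allowed_cpus)
    fitting = [e.capacity for e in asym_cap_list
               if not allowed.isdisjoint(e.cpus)]
    return max(fitting, default=0)


def can_run_at_higher_capacity(current_cpu_capacity, max_allowed):
    """True only when the task is allowed on a CPU bigger than the current one."""
    return max(current_cpu_capacity, max_allowed) != current_cpu_capacity


# Assumed topology: four little CPUs (0-3) and four big CPUs (4-7).
CAPS = [
    AsymCapEntry(capacity=1024, cpus=frozenset({4, 5, 6, 7})),
    AsymCapEntry(capacity=512, cpus=frozenset({0, 1, 2, 3})),
]
```

With this assumed topology, a task pinned to CPUs 0-3 gets a maximum
allowed capacity of 512, so can_run_at_higher_capacity(512, 512) is
False and flagging it as misfit would be pointless. The value is
computed once per affinity or policy change, which is what keeps the
fast-path comparison cheap.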

Improve check_misfit_status() function by removing redundant checks.
misfit_task_load will be 0 if the task can't move to a bigger CPU. And
nohz_balancer_kick() already checks for cpu_check_capacity() before
calling check_misfit_status().

Test:
=====

Add

	trace_printk("balance_interval = %lu\n", interval)

in get_sd_balance_interval().

run
	if [ "$MASK" != "0" ]; then
		adb shell "taskset -a $MASK cat /dev/zero &gt; /dev/null"
	fi
	sleep 10
	# parse the ftrace buffer, counting the occurrences of each value

Where MASK is either:

	* 0: no busy task running
	* 1: busy task is pinned to 1 cpu; handled today to not cause
	  misfit
	* f: busy task pinned to little cores, simulates busy background
	  task, demonstrates the problem to be fixed
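
The counting step itself is not shown in this message. One possible
way to do it is sketched below in Python; it assumes the ftrace buffer
has been saved as text and that each relevant line contains the string
printed by the trace_printk() above. The helper names
count_balance_intervals() and format_counts() are made up for
illustration.

```python
import re
from collections import Counter

# Matches the text emitted by the trace_printk() added for the test.
PATTERN = re.compile(r"balance_interval = (\d+)")


def count_balance_intervals(lines):
    """Count how often each balance_interval value appears in the trace text."""
    counts = Counter()
    for line in lines:
        match = PATTERN.search(line)
        if match:
            counts[int(match.group(1))] += 1
    return counts


def format_counts(counts):
    """Render one line per value, highest interval first."""
    ordered = sorted(counts.items(), reverse=True)
    return [f"{n:4d} balance_interval = {value}" for value, n in ordered]
```

Passing the lines of the saved trace to count_balance_intervals() and
the result to format_counts() gives one "count balance_interval =
value" line per distinct value, ordered from the highest interval to
the lowest.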

Results:
========

Note how the occurrence count of balance_interval = 128 overshoots for MASK = f.

BEFORE
------

	MASK=0

		   1 balance_interval = 175
		 120 balance_interval = 128
		 846 balance_interval = 64
		  55 balance_interval = 63
		 215 balance_interval = 32
		   2 balance_interval = 31
		   2 balance_interval = 16
		   4 balance_interval = 8
		1870 balance_interval = 4
		  65 balance_interval = 2

	MASK=1

		  27 balance_interval = 175
		  37 balance_interval = 127
		 840 balance_interval = 64
		 167 balance_interval = 63
		 449 balance_interval = 32
		  84 balance_interval = 31
		 304 balance_interval = 16
		1156 balance_interval = 8
		2781 balance_interval = 4
		 428 balance_interval = 2

	MASK=f

		   1 balance_interval = 175
		1328 balance_interval = 128
		  44 balance_interval = 64
		 101 balance_interval = 63
		  25 balance_interval = 32
		   5 balance_interval = 31
		  23 balance_interval = 16
		  23 balance_interval = 8
		4306 balance_interval = 4
		 177 balance_interval = 2

AFTER
-----

Note how the high values almost disappear for all MASK values. The
system has background tasks that could trigger the problem even with
MASK=0, without simulating it.

	MASK=0

		 103 balance_interval = 63
		  19 balance_interval = 31
		 194 balance_interval = 8
		4827 balance_interval = 4
		 179 balance_interval = 2

	MASK=1

		 131 balance_interval = 63
		   1 balance_interval = 31
		  87 balance_interval = 8
		3600 balance_interval = 4
		   7 balance_interval = 2

	MASK=f

		   8 balance_interval = 127
		 182 balance_interval = 63
		   3 balance_interval = 31
		   9 balance_interval = 16
		 415 balance_interval = 8
		3415 balance_interval = 4
		  21 balance_interval = 2

Signed-off-by: Qais Yousef &lt;qyousef@layalina.io&gt;
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
Reviewed-by: Vincent Guittot &lt;vincent.guittot@linaro.org&gt;
Link: https://lore.kernel.org/r/20240324004552.999936-3-qyousef@layalina.io
</content>
</entry>
<entry>
<title>rcu-tasks: Initialize data to eliminate RCU-tasks/do_exit() deadlocks</title>
<updated>2024-02-25T22:21:43Z</updated>
<author>
<name>Paul E. McKenney</name>
<email>paulmck@kernel.org</email>
</author>
<published>2024-02-05T21:10:19Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=46faf9d8e1d52e4a91c382c6c72da6bd8e68297b'/>
<id>urn:sha1:46faf9d8e1d52e4a91c382c6c72da6bd8e68297b</id>
<content type='text'>
Holding a mutex across synchronize_rcu_tasks() and acquiring
that same mutex in code called from do_exit() after its call to
exit_tasks_rcu_start() but before its call to exit_tasks_rcu_stop()
results in deadlock.  This is by design, because tasks that are far
enough into do_exit() are no longer present on the tasks list, making
it a bit difficult for RCU Tasks to find them, let alone wait on them
to do a voluntary context switch.  However, such deadlocks are becoming
more frequent.  In addition, lockdep currently does not detect such
deadlocks and they can be difficult to reproduce.

In addition, if a task voluntarily context switches during that time
(for example, if it blocks acquiring a mutex), then this task is in an
RCU Tasks quiescent state.  And with some adjustments, RCU Tasks could
just as well take advantage of that fact.

This commit therefore initializes the data structures that will be needed
to rely on these quiescent states and to eliminate these deadlocks.

Link: https://lore.kernel.org/all/20240118021842.290665-1-chenzhongjin@huawei.com/

Reported-by: Chen Zhongjin &lt;chenzhongjin@huawei.com&gt;
Reported-by: Yang Jihong &lt;yangjihong1@huawei.com&gt;
Signed-off-by: Paul E. McKenney &lt;paulmck@kernel.org&gt;
Tested-by: Yang Jihong &lt;yangjihong1@huawei.com&gt;
Tested-by: Chen Zhongjin &lt;chenzhongjin@huawei.com&gt;
Reviewed-by: Frederic Weisbecker &lt;frederic@kernel.org&gt;
Signed-off-by: Boqun Feng &lt;boqun.feng@gmail.com&gt;
</content>
</entry>
<entry>
<title>Merge tag 'header_cleanup-2024-01-10' of https://evilpiepirate.org/git/bcachefs</title>
<updated>2024-01-11T00:43:55Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2024-01-11T00:43:55Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=78273df7f646f8daf2604ec714bea0897cd03aae'/>
<id>urn:sha1:78273df7f646f8daf2604ec714bea0897cd03aae</id>
<content type='text'>
Pull header cleanups from Kent Overstreet:
 "The goal is to get sched.h down to a type only header, so the main
  thing happening in this patchset is splitting out various _types.h
  headers and dependency fixups, as well as moving some things out of
  sched.h to better locations.

  This is prep work for the memory allocation profiling patchset which
  adds new sched.h interdependencies"

* tag 'header_cleanup-2024-01-10' of https://evilpiepirate.org/git/bcachefs: (51 commits)
  Kill sched.h dependency on rcupdate.h
  kill unnecessary thread_info.h include
  Kill unnecessary kernel.h include
  preempt.h: Kill dependency on list.h
  rseq: Split out rseq.h from sched.h
  LoongArch: signal.c: add header file to fix build error
  restart_block: Trim includes
  lockdep: move held_lock to lockdep_types.h
  sem: Split out sem_types.h
  uidgid: Split out uidgid_types.h
  seccomp: Split out seccomp_types.h
  refcount: Split out refcount_types.h
  uapi/linux/resource.h: fix include
  x86/signal: kill dependency on time.h
  syscall_user_dispatch.h: split out *_types.h
  mm_types_task.h: Trim dependencies
  Split out irqflags_types.h
  ipc: Kill bogus dependency on spinlock.h
  shm: Slim down dependencies
  workqueue: Split out workqueue_types.h
  ...
</content>
</entry>
<entry>
<title>plist: Split out plist_types.h</title>
<updated>2023-12-21T00:26:31Z</updated>
<author>
<name>Kent Overstreet</name>
<email>kent.overstreet@linux.dev</email>
</author>
<published>2023-12-11T18:12:49Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=8b7787a543cde905e53eaf29172c9472fe8a6a75'/>
<id>urn:sha1:8b7787a543cde905e53eaf29172c9472fe8a6a75</id>
<content type='text'>
Trimming down sched.h dependencies: we don't want to include more than
the base types.

Signed-off-by: Kent Overstreet &lt;kent.overstreet@linux.dev&gt;
</content>
</entry>
<entry>
<title>arch: remove ARCH_TASK_STRUCT_ON_STACK</title>
<updated>2023-12-11T01:21:31Z</updated>
<author>
<name>Heiko Carstens</name>
<email>hca@linux.ibm.com</email>
</author>
<published>2023-11-16T13:36:38Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=0eb5085c38749f2a91e5bd8cbebb1ebf3398343c'/>
<id>urn:sha1:0eb5085c38749f2a91e5bd8cbebb1ebf3398343c</id>
<content type='text'>
IA-64 was the only architecture which selected ARCH_TASK_STRUCT_ON_STACK.
IA-64 was removed with commit cf8e8658100d ("arch: Remove Itanium (IA-64)
architecture"). Therefore remove support for ARCH_TASK_STRUCT_ON_STACK
as well.

Note: this also reveals a potential bug in powerpc code, which makes use of
__init_task_data without selecting ARCH_TASK_STRUCT_ON_STACK which makes
__init_task_data a no-op. This is broken since commit d11ed3ab3166 ("Expand
INIT_TASK() in init/init_task.c and remove") from 2018 and needs to be
addressed separately.

Link: https://lkml.kernel.org/r/20231116133638.1636277-4-hca@linux.ibm.com
Signed-off-by: Heiko Carstens &lt;hca@linux.ibm.com&gt;
Reviewed-by: Arnd Bergmann &lt;arnd@arndb.de&gt;
Cc: Michael Ellerman &lt;mpe@ellerman.id.au&gt;
Cc: Nicholas Piggin &lt;npiggin@gmail.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>Merge tag 'mm-nonmm-stable-2023-11-02-14-08' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm</title>
<updated>2023-11-03T06:53:31Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2023-11-03T06:53:31Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=8f6f76a6a29f36d2f3e4510d0bde5046672f6924'/>
<id>urn:sha1:8f6f76a6a29f36d2f3e4510d0bde5046672f6924</id>
<content type='text'>
Pull non-MM updates from Andrew Morton:
 "As usual, lots of singleton and doubleton patches all over the tree
  and there's little I can say which isn't in the individual changelogs.

  The lengthier patch series are

   - 'kdump: use generic functions to simplify crashkernel reservation
     in arch', from Baoquan He. This is mainly cleanups and
     consolidation of the 'crashkernel=' kernel parameter handling

   - After much discussion, David Laight's 'minmax: Relax type checks in
     min() and max()' is here. Hopefully reduces some typecasting and
     the use of min_t() and max_t()

   - A group of patches from Oleg Nesterov which clean up and slightly
     fix our handling of reads from /proc/PID/task/... and which remove
     task_struct.thread_group"

* tag 'mm-nonmm-stable-2023-11-02-14-08' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (64 commits)
  scripts/gdb/vmalloc: disable on no-MMU
  scripts/gdb: fix usage of MOD_TEXT not defined when CONFIG_MODULES=n
  .mailmap: add address mapping for Tomeu Vizoso
  mailmap: update email address for Claudiu Beznea
  tools/testing/selftests/mm/run_vmtests.sh: lower the ptrace permissions
  .mailmap: map Benjamin Poirier's address
  scripts/gdb: add lx_current support for riscv
  ocfs2: fix a spelling typo in comment
  proc: test ProtectionKey in proc-empty-vm test
  proc: fix proc-empty-vm test with vsyscall
  fs/proc/base.c: remove unneeded semicolon
  do_io_accounting: use sig-&gt;stats_lock
  do_io_accounting: use __for_each_thread()
  ocfs2: replace BUG_ON() at ocfs2_num_free_extents() with ocfs2_error()
  ocfs2: fix a typo in a comment
  scripts/show_delta: add __main__ judgement before main code
  treewide: mark stuff as __ro_after_init
  fs: ocfs2: check status values
  proc: test /proc/${pid}/statm
  compiler.h: move __is_constexpr() to compiler.h
  ...
</content>
</entry>
<entry>
<title>kill task_struct-&gt;thread_group</title>
<updated>2023-10-04T17:41:56Z</updated>
<author>
<name>Oleg Nesterov</name>
<email>oleg@redhat.com</email>
</author>
<published>2023-08-26T11:14:09Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=8e1f385104ac044f1552686ad6e1cbc71cc05a30'/>
<id>urn:sha1:8e1f385104ac044f1552686ad6e1cbc71cc05a30</id>
<content type='text'>
The last user was removed by the previous patch.

Link: https://lkml.kernel.org/r/20230826111409.GA23243@redhat.com
Signed-off-by: Oleg Nesterov &lt;oleg@redhat.com&gt;
Cc: Eric W. Biederman &lt;ebiederm@xmission.com&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>sched: Add task_struct-&gt;faults_disabled_mapping</title>
<updated>2023-09-12T03:59:46Z</updated>
<author>
<name>Kent Overstreet</name>
<email>kent.overstreet@gmail.com</email>
</author>
<published>2019-10-16T19:03:50Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=2b69987be575b92adb6c177679f3c559134f0d8f'/>
<id>urn:sha1:2b69987be575b92adb6c177679f3c559134f0d8f</id>
<content type='text'>
There has been a long-standing page cache coherence bug with direct IO.
This provides part of a mechanism to fix it, currently just used by
bcachefs but potentially worth promoting to the VFS.

Direct IO evicts the range of the pagecache being read or written to.

For reads, we need dirty pages to be written to disk, so that the read
doesn't return stale data. For writes, we need to evict that range of
the pagecache so that it's not stale after the write completes.

However, without a locking mechanism to prevent those pages from being
re-added to the pagecache - by a buffered read or page fault - page
cache inconsistency is still possible.

This isn't necessarily just an issue for userspace when they're playing
games; filesystems may hang arbitrary state off the pagecache, and so
page cache inconsistency may cause real filesystem bugs, depending on
the filesystem. This is less of an issue for iomap based filesystems,
but e.g. buffer heads cache disk block mappings (!) and attach them
to the pagecache, and bcachefs attaches disk reservations to pagecache
pages.

This issue has been hard to fix, because
 - we need to add a lock (henceforth called pagecache_add_lock), which
   would be held for the duration of the direct IO
 - page faults add pages to the page cache, thus need to take the same
   lock
 - dio -&gt; gup -&gt; page fault thus can deadlock

And we cannot enforce a lock ordering with this lock, since userspace
will be controlling the lock ordering (via the fd and buffer arguments
to direct IOs), so we need a different method of deadlock avoidance.

We need to tell the page fault handler that we're already holding a
pagecache_add_lock, and since plumbing it through the entire gup() path
would be highly impractical this adds a field to task_struct.

Then the full method is:
 - in the dio path, when we first take the pagecache_add_lock, note the
   mapping in the current task_struct
 - in the page fault handler, if faults_disabled_mapping is set, we
   check if it's the same mapping as the one we're taking a page fault
   for, and if so return an error.

   Then we check lock ordering: if there's a lock ordering violation and
   trylock fails, we'll have to cycle the locks and return an error that
   tells the DIO path to retry: faults_disabled_mapping is also used for
   signalling "locks were dropped, please retry".
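
The method above can be modelled roughly as follows. This is a toy
Python sketch, not the kernel or bcachefs implementation: the Mapping
and Task classes, the ident ordering key, the add_locked flag, the
separate locks_dropped flag and the string return values are all
assumptions made for illustration. In particular, the sketch assumes
locks are ordered by increasing ident and models a failed trylock as
the fault mapping already being locked by someone else.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Mapping:
    ident: int                 # hypothetical key used for lock ordering
    add_locked: bool = False   # models pagecache_add_lock being held


@dataclass
class Task:
    faults_disabled_mapping: Optional[Mapping] = None
    locks_dropped: bool = False   # models the "please retry" signal


def dio_begin(task, mapping):
    """Direct IO path: take the lock and note the mapping in the task."""
    mapping.add_locked = True
    task.faults_disabled_mapping = mapping
    task.locks_dropped = False


def handle_fault(task, fault_mapping):
    """Page fault path: returns "ok", "error" or "retry"."""
    held = task.faults_disabled_mapping
    if held is None:
        return "ok"       # no direct IO in flight for this task
    if held is fault_mapping:
        return "error"    # faulting on the mapping we already hold
    # Assumed rule: locks must be taken in increasing ident order.
    in_order = min(held.ident, fault_mapping.ident) == held.ident
    if not in_order and fault_mapping.add_locked:
        # Ordering violation and the trylock would fail: drop our lock
        # and tell the direct IO path to start over.
        held.add_locked = False
        task.locks_dropped = True
        return "retry"
    return "ok"
```

For example, a task that called dio_begin() on a mapping with ident 2
and then faults on an already locked mapping with ident 1 gets "retry"
and finds locks_dropped set, while a fault on the mapping it holds
gets "error".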

Also relevant to this patch: mapping-&gt;invalidate_lock.
mapping-&gt;invalidate_lock provides most of the required semantics - it's
used by truncate/fallocate to block pages being added to the pagecache.
However, since it's a rwsem, direct IOs would need to take the write
side in order to block page cache adds, and would then be exclusive with
each other - we'll need a new type of lock to pair with this approach.

Signed-off-by: Kent Overstreet &lt;kent.overstreet@linux.dev&gt;
Cc: Jan Kara &lt;jack@suse.cz&gt;
Cc: Darrick J. Wong &lt;djwong@kernel.org&gt;
Cc: linux-fsdevel@vger.kernel.org
Cc: Andreas Grünbacher &lt;andreas.gruenbacher@gmail.com&gt;
</content>
</entry>
</feed>
