<feed xmlns='http://www.w3.org/2005/Atom'>
<title>kernel/arch/powerpc/lib/qspinlock.c, branch linux-rolling-stable</title>
<subtitle>Hosts the 0x221E Linux distro kernel.</subtitle>
<id>https://universe.0xinfinity.dev/distro/kernel/atom?h=linux-rolling-stable</id>
<link rel='self' href='https://universe.0xinfinity.dev/distro/kernel/atom?h=linux-rolling-stable'/>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/'/>
<updated>2025-09-01T08:08:12Z</updated>
<entry>
<title>powerpc/qspinlock: Add spinlock contention tracepoint</title>
<updated>2025-09-01T08:08:12Z</updated>
<author>
<name>Nysal Jan K.A.</name>
<email>nysal@linux.ibm.com</email>
</author>
<published>2025-07-31T06:18:53Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=4f61d54d2245c15b23ad78a89f854fb2496b6216'/>
<id>urn:sha1:4f61d54d2245c15b23ad78a89f854fb2496b6216</id>
<content type='text'>
Add a lock contention tracepoint in the queued spinlock slowpath.
Also add the __lockfunc annotation so that in_lock_functions()
works as expected.
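
Roughly sketched (the generic lock contention tracepoints are assumed
here; this is not the exact diff), the change has the following shape.
__lockfunc places the function in the .spinlock.text section, which is
what in_lock_functions() tests:

   void __lockfunc queued_spin_lock_slowpath(struct qspinlock *lock)
   {
           trace_contention_begin(lock, LCB_F_SPIN);
           /* ... existing queueing and spinning logic ... */
           trace_contention_end(lock, 0);
   }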

Signed-off-by: Nysal Jan K.A. &lt;nysal@linux.ibm.com&gt;
Reviewed-by: Shrikanth Hegde &lt;sshegde@linux.ibm.com&gt;
Signed-off-by: Madhavan Srinivasan &lt;maddy@linux.ibm.com&gt;
Link: https://patch.msgid.link/20250731061856.1858898-1-nysal@linux.ibm.com

</content>
</entry>
<entry>
<title>powerpc/qspinlock: Fix deadlock in MCS queue</title>
<updated>2024-08-29T05:12:51Z</updated>
<author>
<name>Nysal Jan K.A.</name>
<email>nysal@linux.ibm.com</email>
</author>
<published>2024-08-29T02:28:27Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=734ad0af3609464f8f93e00b6c0de1e112f44559'/>
<id>urn:sha1:734ad0af3609464f8f93e00b6c0de1e112f44559</id>
<content type='text'>
If an interrupt occurs in queued_spin_lock_slowpath() after we increment
qnodesp-&gt;count and before node-&gt;lock is initialized, another CPU might
see stale lock values in get_tail_qnode(). If the stale lock value happens
to match the lock on that CPU, then we write to the "next" pointer of
the wrong qnode. This causes a deadlock as the former CPU, once it becomes
the head of the MCS queue, will spin indefinitely until its "next" pointer
is set by its successor in the queue.
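
In a rough sketch of the slowpath, the window sits here:

   qnodesp = this_cpu_ptr(&amp;qnodes);
   idx = qnodesp-&gt;count++;
   /*
    * An interrupt here leaves nodes[idx].lock holding the stale
    * value from a previous acquisition, which another CPU's
    * get_tail_qnode() can still match against.
    */
   node = &amp;qnodesp-&gt;nodes[idx];
   node-&gt;lock = lock;
   node-&gt;next = NULL;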

Running stress-ng on a 16-core (16EC/16VP) shared LPAR results in
occasional lockups similar to the following:

   $ stress-ng --all 128 --vm-bytes 80% --aggressive \
               --maximize --oomable --verify  --syslog \
               --metrics  --times  --timeout 5m

   watchdog: CPU 15 Hard LOCKUP
   ......
   NIP [c0000000000b78f4] queued_spin_lock_slowpath+0x1184/0x1490
   LR [c000000001037c5c] _raw_spin_lock+0x6c/0x90
   Call Trace:
    0xc000002cfffa3bf0 (unreliable)
    _raw_spin_lock+0x6c/0x90
    raw_spin_rq_lock_nested.part.135+0x4c/0xd0
    sched_ttwu_pending+0x60/0x1f0
    __flush_smp_call_function_queue+0x1dc/0x670
    smp_ipi_demux_relaxed+0xa4/0x100
    xive_muxed_ipi_action+0x20/0x40
    __handle_irq_event_percpu+0x80/0x240
    handle_irq_event_percpu+0x2c/0x80
    handle_percpu_irq+0x84/0xd0
    generic_handle_irq+0x54/0x80
    __do_irq+0xac/0x210
    __do_IRQ+0x74/0xd0
    0x0
    do_IRQ+0x8c/0x170
    hardware_interrupt_common_virt+0x29c/0x2a0
   --- interrupt: 500 at queued_spin_lock_slowpath+0x4b8/0x1490
   ......
   NIP [c0000000000b6c28] queued_spin_lock_slowpath+0x4b8/0x1490
   LR [c000000001037c5c] _raw_spin_lock+0x6c/0x90
   --- interrupt: 500
    0xc0000029c1a41d00 (unreliable)
    _raw_spin_lock+0x6c/0x90
    futex_wake+0x100/0x260
    do_futex+0x21c/0x2a0
    sys_futex+0x98/0x270
    system_call_exception+0x14c/0x2f0
    system_call_vectored_common+0x15c/0x2ec

The following code flow illustrates how the deadlock occurs.
For the sake of brevity, assume that both locks (A and B) are
contended and we call the queued_spin_lock_slowpath() function.

        CPU0                                   CPU1
        ----                                   ----
  spin_lock_irqsave(A)                          |
  spin_unlock_irqrestore(A)                     |
    spin_lock(B)                                |
         |                                      |
         ▼                                      |
   id = qnodesp-&gt;count++;                       |
  (Note that nodes[0].lock == A)                |
         |                                      |
         ▼                                      |
      Interrupt                                 |
  (happens before "nodes[0].lock = B")          |
         |                                      |
         ▼                                      |
  spin_lock_irqsave(A)                          |
         |                                      |
         ▼                                      |
   id = qnodesp-&gt;count++                        |
   nodes[1].lock = A                            |
         |                                      |
         ▼                                      |
  Tail of MCS queue                             |
         |                             spin_lock_irqsave(A)
         ▼                                      |
  Head of MCS queue                             ▼
         |                             CPU0 is previous tail
         ▼                                      |
   Spin indefinitely                            ▼
  (until "nodes[1].next != NULL")      prev = get_tail_qnode(A, CPU0)
                                                |
                                                ▼
                                       prev == &amp;qnodes[CPU0].nodes[0]
                                     (as qnodes[CPU0].nodes[0].lock == A)
                                                |
                                                ▼
                                       WRITE_ONCE(prev-&gt;next, node)
                                                |
                                                ▼
                                        Spin indefinitely
                                     (until nodes[0].locked == 1)
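
The wrong match happens in get_tail_qnode(), whose lookup loop
(approximately) compares lock pointers and can therefore hit a stale
entry:

   for (idx = 0; idx &lt; MAX_NODES; idx++) {
           struct qnode *qnode = &amp;qnodesp-&gt;nodes[idx];
           if (qnode-&gt;lock == lock)        /* may match a stale entry */
                   return qnode;
   }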

Thanks to Saket Kumar Bhaskar for help with recreating the issue.

Fixes: 84990b169557 ("powerpc/qspinlock: add mcs queueing for contended waiters")
Cc: stable@vger.kernel.org # v6.2+
Reported-by: Geetika Moolchandani &lt;geetika@linux.ibm.com&gt;
Reported-by: Vaishnavi Bhat &lt;vaish123@in.ibm.com&gt;
Reported-by: Jijo Varghese &lt;vargjijo@in.ibm.com&gt;
Signed-off-by: Nysal Jan K.A. &lt;nysal@linux.ibm.com&gt;
Reviewed-by: Nicholas Piggin &lt;npiggin@gmail.com&gt;
Signed-off-by: Michael Ellerman &lt;mpe@ellerman.id.au&gt;
Link: https://msgid.link/20240829022830.1164355-1-nysal@linux.ibm.com

</content>
</entry>
<entry>
<title>powerpc/qspinlock: Rename yield_propagate_owner tunable</title>
<updated>2023-10-20T11:43:34Z</updated>
<author>
<name>Nicholas Piggin</name>
<email>npiggin@gmail.com</email>
</author>
<published>2023-10-16T12:43:05Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=b629b541702b082d161c307c9d7d9867911fc933'/>
<id>urn:sha1:b629b541702b082d161c307c9d7d9867911fc933</id>
<content type='text'>
Rename yield_propagate_owner to yield_sleepy_owner, which better
describes what it does (what, not how).

Signed-off-by: Nicholas Piggin &lt;npiggin@gmail.com&gt;
Tested-by: Shrikanth Hegde &lt;sshegde@linux.vnet.ibm.com&gt;
Reviewed-by: "Nysal Jan K.A" &lt;nysal@linux.ibm.com&gt;
Signed-off-by: Michael Ellerman &lt;mpe@ellerman.id.au&gt;
Link: https://msgid.link/20231016124305.139923-7-npiggin@gmail.com

</content>
</entry>
<entry>
<title>powerpc/qspinlock: Propagate sleepy if previous waiter is preempted</title>
<updated>2023-10-20T11:43:34Z</updated>
<author>
<name>Nicholas Piggin</name>
<email>npiggin@gmail.com</email>
</author>
<published>2023-10-16T12:43:04Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=1e6d5f725738fff83e1c3cbdb8547142eb8820c2'/>
<id>urn:sha1:1e6d5f725738fff83e1c3cbdb8547142eb8820c2</id>
<content type='text'>
The sleepy (aka lock-owner-is-preempted) condition is propagated down
the queue by each waiter. If a waiter becomes preempted, it can no
longer propagate sleepy. To allow subsequent waiters to yield to the
lock owner, also check the lock owner in this case.
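
A hedged sketch of the idea, from the perspective of a queued waiter
(decode_owner_cpu() is a hypothetical helper standing in for reading
the owner out of the lock word):

   if (vcpu_is_preempted(prev_cpu)) {
           /*
            * prev cannot propagate sleepy while it is preempted,
            * so check the lock owner directly as well.
            */
           u32 val = READ_ONCE(lock-&gt;val);
           int owner = decode_owner_cpu(val);      /* hypothetical */
           if (owner != -1 &amp;&amp; vcpu_is_preempted(owner))
                   node-&gt;sleepy = true;
   }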

Signed-off-by: Nicholas Piggin &lt;npiggin@gmail.com&gt;
Tested-by: Shrikanth Hegde &lt;sshegde@linux.vnet.ibm.com&gt;
Reviewed-by: "Nysal Jan K.A" &lt;nysal@linux.ibm.com&gt;
Signed-off-by: Michael Ellerman &lt;mpe@ellerman.id.au&gt;
Link: https://msgid.link/20231016124305.139923-6-npiggin@gmail.com

</content>
</entry>
<entry>
<title>powerpc/qspinlock: don't propagate the not-sleepy state</title>
<updated>2023-10-20T11:43:34Z</updated>
<author>
<name>Nicholas Piggin</name>
<email>npiggin@gmail.com</email>
</author>
<published>2023-10-16T12:43:03Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=fcf77d44274b96a55cc74043561a4b3151b9ad24'/>
<id>urn:sha1:fcf77d44274b96a55cc74043561a4b3151b9ad24</id>
<content type='text'>
To simplify things, don't propagate the not-sleepy condition back down
the queue. Instead, have the waiters clear their own node-&gt;sleepy when
finding the lock owner is not preempted.
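
Sketched (not the exact diff), the waiter-local clearing amounts to:

   /*
    * No back-propagation of "not sleepy": each waiter clears its
    * own flag once it sees a running owner.
    */
   if (node-&gt;sleepy &amp;&amp; !vcpu_is_preempted(owner_cpu))
           node-&gt;sleepy = false;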

Signed-off-by: Nicholas Piggin &lt;npiggin@gmail.com&gt;
Tested-by: Shrikanth Hegde &lt;sshegde@linux.vnet.ibm.com&gt;
Reviewed-by: "Nysal Jan K.A" &lt;nysal@linux.ibm.com&gt;
Signed-off-by: Michael Ellerman &lt;mpe@ellerman.id.au&gt;
Link: https://msgid.link/20231016124305.139923-5-npiggin@gmail.com

</content>
</entry>
<entry>
<title>powerpc/qspinlock: propagate owner preemptedness rather than CPU number</title>
<updated>2023-10-20T11:43:34Z</updated>
<author>
<name>Nicholas Piggin</name>
<email>npiggin@gmail.com</email>
</author>
<published>2023-10-16T12:43:02Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=fd8fae50c9c6117c4e05954deab2cc515508666b'/>
<id>urn:sha1:fd8fae50c9c6117c4e05954deab2cc515508666b</id>
<content type='text'>
Rather than propagating the CPU number of the preempted lock owner,
just propagate whether the owner was preempted. Waiters must read the
lock value when yielding to it to prevent races anyway, so might as
well always load the owner CPU from the lock.

To further simplify the code, also don't propagate the -1 (or
sleepy=false in the new scheme) down the queue. Instead, have the
waiters clear it themselves when finding the lock owner is not
preempted.
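
Schematically (a sketch using the field names from this series):

   /* before: pass the owner's CPU number down the queue */
   node-&gt;yield_cpu = owner_cpu;

   /* after: pass only the condition; the owner CPU is always
    * (re)loaded from the lock word at the point of yielding */
   node-&gt;sleepy = true;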

Signed-off-by: Nicholas Piggin &lt;npiggin@gmail.com&gt;
Tested-by: Shrikanth Hegde &lt;sshegde@linux.vnet.ibm.com&gt;
Reviewed-by: "Nysal Jan K.A" &lt;nysal@linux.ibm.com&gt;
Signed-off-by: Michael Ellerman &lt;mpe@ellerman.id.au&gt;
Link: https://msgid.link/20231016124305.139923-4-npiggin@gmail.com

</content>
</entry>
<entry>
<title>powerpc/qspinlock: stop queued waiters trying to set lock sleepy</title>
<updated>2023-10-20T11:43:34Z</updated>
<author>
<name>Nicholas Piggin</name>
<email>npiggin@gmail.com</email>
</author>
<published>2023-10-16T12:43:01Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=f6568647382c667912245c8d07aa26c9c6d4d0c8'/>
<id>urn:sha1:f6568647382c667912245c8d07aa26c9c6d4d0c8</id>
<content type='text'>
If a queued waiter notices the lock owner or the previous waiter has
been preempted, it attempts to mark the lock sleepy, but it does this
as a try-set operation using the original lock value it got when
queueing; that value becomes stale as the queue progresses, so the
try-set fails. Drop this and just set the sleepy-seen clock.
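
The dropped try-set had roughly this shape (a sketch; _Q_SLEEPY_VAL
stands in for the real flag bit). Updating the sleepy-seen clock
instead avoids touching the lock word at all:

   u32 old = val;                  /* sampled when we queued; stale */
   u32 new = old | _Q_SLEEPY_VAL;
   /* almost always fails once the queue has progressed */
   cmpxchg(&amp;lock-&gt;val, old, new);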

Signed-off-by: Nicholas Piggin &lt;npiggin@gmail.com&gt;
Tested-by: Shrikanth Hegde &lt;sshegde@linux.vnet.ibm.com&gt;
Reviewed-by: "Nysal Jan K.A" &lt;nysal@linux.ibm.com&gt;
Signed-off-by: Michael Ellerman &lt;mpe@ellerman.id.au&gt;
Link: https://msgid.link/20231016124305.139923-3-npiggin@gmail.com

</content>
</entry>
<entry>
<title>powerpc/qspinlock: Fix stale propagated yield_cpu</title>
<updated>2023-10-18T10:07:21Z</updated>
<author>
<name>Nicholas Piggin</name>
<email>npiggin@gmail.com</email>
</author>
<published>2023-10-16T12:43:00Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=f9bc9bbe8afdf83412728f0b464979a72a3b9ec2'/>
<id>urn:sha1:f9bc9bbe8afdf83412728f0b464979a72a3b9ec2</id>
<content type='text'>
yield_cpu is a sample of a preempted lock holder that gets propagated
back through the queue. Queued waiters use this to yield to the
preempted lock holder without continually sampling the lock word (which
would defeat the purpose of MCS queueing by bouncing the cache line).

The problem is that yield_cpu can become stale. It can take some time to
be passed down the chain, and if any queued waiter gets preempted then
it will cease to propagate the yield_cpu to later waiters.

This can result in yielding to a CPU that no longer holds the lock,
which is bad enough, but if that CPU is currently in H_CEDE (idle) it
appears to be preempted, and some hypervisors (PowerVM) can cause very
long H_CONFER latencies waiting for H_CEDE wakeup. This
results in latency spikes and hard lockups on oversubscribed
partitions with lock contention.

This is a minimal fix. Before yielding to yield_cpu, sample the lock
word to confirm yield_cpu is still the owner, and bail out if it is not.
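
A hedged sketch of the added check (assuming get_owner_cpu() extracts
the owner from the lock word):

   u32 val = READ_ONCE(lock-&gt;val);
   if (get_owner_cpu(val) != yield_cpu)
           return;         /* stale yield_cpu: the owner has changed */
   yield_to_preempted(yield_cpu, yield_count);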

Thanks to a bunch of people who reported this and tracked down the
exact problem using tracepoints and dispatch trace logs.

Fixes: 28db61e207ea ("powerpc/qspinlock: allow propagation of yield CPU down the queue")
Cc: stable@vger.kernel.org # v6.2+
Reported-by: Srikar Dronamraju &lt;srikar@linux.vnet.ibm.com&gt;
Reported-by: Laurent Dufour &lt;ldufour@linux.ibm.com&gt;
Reported-by: Shrikanth Hegde &lt;sshegde@linux.vnet.ibm.com&gt;
Debugged-by: "Nysal Jan K.A" &lt;nysal@linux.ibm.com&gt;
Signed-off-by: Nicholas Piggin &lt;npiggin@gmail.com&gt;
Tested-by: Shrikanth Hegde &lt;sshegde@linux.vnet.ibm.com&gt;
Signed-off-by: Michael Ellerman &lt;mpe@ellerman.id.au&gt;
Link: https://msgid.link/20231016124305.139923-2-npiggin@gmail.com
</content>
</entry>
<entry>
<title>powerpc: qspinlock: Enforce qnode writes prior to publishing to queue</title>
<updated>2023-06-21T05:13:57Z</updated>
<author>
<name>Rohan McLure</name>
<email>rmclure@linux.ibm.com</email>
</author>
<published>2023-05-10T03:31:08Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=6f3136326ee47ae2dd5dac9306c9b08ccbc7e81e'/>
<id>urn:sha1:6f3136326ee47ae2dd5dac9306c9b08ccbc7e81e</id>
<content type='text'>
Annotate the release barrier and memory clobber (in effect, producing a
compiler barrier) in the publish_tail_cpu call. These barriers have the
effect of ensuring that qnode attributes are all written to prior to
publishing the node to the waitqueue.

Even though the initial write to the 'locked' attribute is guaranteed to
complete before the node becomes visible, KCSAN still complains that
the write is reorderable by the compiler. Issue a kcsan_release() to
inform KCSAN of the release barrier contained in publish_tail_cpu().
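
The annotation sits immediately before the publish (a sketch close to
the actual change):

   /*
    * Assign all attributes of a node before it can be published.
    * publish_tail_cpu() issues a release barrier that also acts as
    * a compiler barrier; kcsan_release() tells KCSAN about it.
    */
   kcsan_release();
   old = publish_tail_cpu(lock, tail);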

Signed-off-by: Rohan McLure &lt;rmclure@linux.ibm.com&gt;
Reviewed-by: Nicholas Piggin &lt;npiggin@gmail.com&gt;
Signed-off-by: Michael Ellerman &lt;mpe@ellerman.id.au&gt;
Link: https://msgid.link/20230510033117.1395895-3-rmclure@linux.ibm.com

</content>
</entry>
<entry>
<title>powerpc: qspinlock: Mark accesses to qnode lock checks</title>
<updated>2023-06-21T05:13:57Z</updated>
<author>
<name>Rohan McLure</name>
<email>rmclure@linux.ibm.com</email>
</author>
<published>2023-05-10T03:31:07Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=03d44ee80eac980a869ed3d5637ed85de6fb957f'/>
<id>urn:sha1:03d44ee80eac980a869ed3d5637ed85de6fb957f</id>
<content type='text'>
The powerpc implementation of qspinlocks will both poll and spin on the
bitlock guarding a qnode. Mark these accesses with READ_ONCE to convey
to KCSAN that polling is intentional here.
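
Sketched, the annotated polling loop looks like:

   /*
    * Spin until our MCS node's bitlock is released; READ_ONCE marks
    * the racy repeated load as intentional for KCSAN.
    */
   while (!READ_ONCE(node-&gt;locked))
           cpu_relax();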

Signed-off-by: Rohan McLure &lt;rmclure@linux.ibm.com&gt;
Reviewed-by: Nicholas Piggin &lt;npiggin@gmail.com&gt;
Signed-off-by: Michael Ellerman &lt;mpe@ellerman.id.au&gt;
Link: https://msgid.link/20230510033117.1395895-2-rmclure@linux.ibm.com

</content>
</entry>
</feed>
