<feed xmlns='http://www.w3.org/2005/Atom'>
<title>kernel/arch/powerpc/perf/isa207-common.c, branch linux-6.2.y</title>
<subtitle>Hosts the 0x221E linux distro kernel.</subtitle>
<id>https://universe.0xinfinity.dev/distro/kernel/atom?h=linux-6.2.y</id>
<link rel='self' href='https://universe.0xinfinity.dev/distro/kernel/atom?h=linux-6.2.y'/>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/'/>
<updated>2022-06-29T09:42:41Z</updated>
<entry>
<title>powerpc/perf: Update MMCR2 to support event exclude_idle</title>
<updated>2022-06-29T09:42:41Z</updated>
<author>
<name>Madhavan Srinivasan</name>
<email>maddy@linux.ibm.com</email>
</author>
<published>2021-04-29T05:02:08Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=5969e0c1c7e2132d8b2cf80168072b1195ddce46'/>
<id>urn:sha1:5969e0c1c7e2132d8b2cf80168072b1195ddce46</id>
<content type='text'>
struct perf_event_attr supports exclude counting of idle task.
This is sent to kernel via perf_event_attr.exclude_idle and
in perf tool, user can use ":I" event modifier to enable this
for specific event.

Monitor Mode Control Register 2 (MMCR2) SPR has control bits
for each PMCs to freeze counting based on the Control Register
CTRL[RUN] state. CTRL[RUN] is not set when idle task is
running. Patch adds a check for event attr.exclude_idle to
set MMCR2[FCnWAIT] bit.

Signed-off-by: Madhavan Srinivasan &lt;maddy@linux.ibm.com&gt;
Signed-off-by: Michael Ellerman &lt;mpe@ellerman.id.au&gt;
Link: https://lore.kernel.org/r/20210429050208.266619-1-maddy@linux.ibm.com

</content>
</entry>
<entry>
<title>powerpc/perf: Fix the threshold compare group constraint for power9</title>
<updated>2022-05-22T05:58:30Z</updated>
<author>
<name>Kajol Jain</name>
<email>kjain@linux.ibm.com</email>
</author>
<published>2022-05-06T06:10:15Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=ab0cc6bbf0c812731c703ec757fcc3fc3a457a34'/>
<id>urn:sha1:ab0cc6bbf0c812731c703ec757fcc3fc3a457a34</id>
<content type='text'>
Thresh compare bits for a event is used to program thresh compare
field in Monitor Mode Control Register A (MMCRA: 9-18 bits for power9).
When scheduling events as a group, all events in that group should
match value in threshold bits (like thresh compare, thresh control,
thresh select). Otherwise event open for the sibling events should fail.
But in the current code, incase thresh compare bits are not valid,
we are not failing in group_constraint function which can result
in invalid group schduling.

Fix the issue by returning -1 incase event is threshold and threshold
compare value is not valid.

Thresh control bits in the event code is used to program thresh_ctl
field in Monitor Mode Control Register A (MMCRA: 48-55). In below example,
the scheduling of group events PM_MRK_INST_CMPL (873534401e0) and
PM_THRESH_MET (8734340101ec) is expected to fail as both event
request different thresh control bits and invalid thresh compare value.

Result before the patch changes:

[command]# perf stat -e "{r8735340401e0,r8734340101ec}" sleep 1

 Performance counter stats for 'sleep 1':

            11,048      r8735340401e0
             1,967      r8734340101ec

       1.001354036 seconds time elapsed

       0.001421000 seconds user
       0.000000000 seconds sys

Result after the patch changes:

[command]# perf stat -e "{r8735340401e0,r8734340101ec}" sleep 1
Error:
The sys_perf_event_open() syscall returned with 22 (Invalid argument)
for event (r8735340401e0).
/bin/dmesg | grep -i perf may provide additional information.

Fixes: 78a16d9fc1206 ("powerpc/perf: Avoid FAB_*_MATCH checks for power9")
Signed-off-by: Kajol Jain &lt;kjain@linux.ibm.com&gt;
Reviewed-by: Athira Rajeev &lt;atrajeev@linux.vnet.ibm.com&gt;
Signed-off-by: Michael Ellerman &lt;mpe@ellerman.id.au&gt;
Link: https://lore.kernel.org/r/20220506061015.43916-2-kjain@linux.ibm.com

</content>
</entry>
<entry>
<title>powerpc/perf: Fix the threshold compare group constraint for power10</title>
<updated>2022-05-22T05:58:29Z</updated>
<author>
<name>Kajol Jain</name>
<email>kjain@linux.ibm.com</email>
</author>
<published>2022-05-06T06:10:14Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=505d31650ba96d6032313480fdb566d289a4698c'/>
<id>urn:sha1:505d31650ba96d6032313480fdb566d289a4698c</id>
<content type='text'>
Thresh compare bits for a event is used to program thresh compare
field in Monitor Mode Control Register A (MMCRA: 8-18 bits for power10).
When scheduling events as a group, all events in that group should
match value in threshold bits. Otherwise event open for the sibling
events should fail. But in the current code, incase thresh compare bits are
not valid, we are not failing in group_constraint function which can result
in invalid group schduling.

Fix the issue by returning -1 incase event is threshold and threshold
compare value is not valid in group_constraint function.

Patch also fixes the p10_thresh_cmp_val function to return -1,
incase threshold bits are not valid and changes corresponding check in
is_thresh_cmp_valid function to return false only when the thresh_cmp
value is less then 0.

Thresh control bits in the event code is used to program thresh_ctl
field in Monitor Mode Control Register A (MMCRA: 48-55). In below example,
the scheduling of group events PM_MRK_INST_CMPL (3534401e0) and
PM_THRESH_MET (34340101ec) is expected to fail as both event
request different thresh control bits.

Result before the patch changes:

[command]# perf stat -e "{r35340401e0,r34340101ec}" sleep 1

 Performance counter stats for 'sleep 1':

             8,482      r35340401e0
                 0      r34340101ec

       1.001474838 seconds time elapsed

       0.001145000 seconds user
       0.000000000 seconds sys

Result after the patch changes:

[command]# perf stat -e "{r35340401e0,r34340101ec}" sleep 1

 Performance counter stats for 'sleep 1':

     &lt;not counted&gt;      r35340401e0
   &lt;not supported&gt;      r34340101ec

       1.001499607 seconds time elapsed

       0.000204000 seconds user
       0.000760000 seconds sys

Fixes: 82d2c16b350f7 ("powerpc/perf: Adds support for programming of Thresholding in P10")
Signed-off-by: Kajol Jain &lt;kjain@linux.ibm.com&gt;
Reviewed-by: Athira Rajeev &lt;atrajeev@linux.vnet.ibm.com&gt;
Signed-off-by: Michael Ellerman &lt;mpe@ellerman.id.au&gt;
Link: https://lore.kernel.org/r/20220506061015.43916-1-kjain@linux.ibm.com

</content>
</entry>
<entry>
<title>powerpc: fix typos in comments</title>
<updated>2022-05-05T12:12:44Z</updated>
<author>
<name>Julia Lawall</name>
<email>Julia.Lawall@inria.fr</email>
</author>
<published>2022-04-30T18:56:54Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=1fd02f6605b855b4af2883f29a2abc88bdf17857'/>
<id>urn:sha1:1fd02f6605b855b4af2883f29a2abc88bdf17857</id>
<content type='text'>
Various spelling mistakes in comments.
Detected with the help of Coccinelle.

Signed-off-by: Julia Lawall &lt;Julia.Lawall@inria.fr&gt;
Reviewed-by: Joel Stanley &lt;joel@jms.id.au&gt;
Signed-off-by: Michael Ellerman &lt;mpe@ellerman.id.au&gt;
Link: https://lore.kernel.org/r/20220430185654.5855-1-Julia.Lawall@inria.fr

</content>
</entry>
<entry>
<title>powerpc: declare unmodified attribute_group usages const</title>
<updated>2022-03-08T11:15:32Z</updated>
<author>
<name>Rohan McLure</name>
<email>rmclure@linux.ibm.com</email>
</author>
<published>2022-03-07T23:14:14Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=6b3a3e12f8e6eea47428bb39aaf58832b50bb379'/>
<id>urn:sha1:6b3a3e12f8e6eea47428bb39aaf58832b50bb379</id>
<content type='text'>
Inspired by (bd75b4ef4977: Constify static attribute_group structs),
accepted by linux-next, reported:
https://patchwork.ozlabs.org/project/linuxppc-dev/patch/20220210202805.7750-4-rikard.falkeborn@gmail.com/

Nearly all singletons of type struct attribute_group are never modified,
and so are candidates for being const. Declare them as const.

Signed-off-by: Rohan McLure &lt;rmclure@linux.ibm.com&gt;
Signed-off-by: Michael Ellerman &lt;mpe@ellerman.id.au&gt;
Link: https://lore.kernel.org/r/20220307231414.86560-1-rmclure@linux.ibm.com
</content>
</entry>
<entry>
<title>powerpc/perf: Add data source encodings for power10 platform</title>
<updated>2021-12-16T10:31:44Z</updated>
<author>
<name>Kajol Jain</name>
<email>kjain@linux.ibm.com</email>
</author>
<published>2021-12-06T09:17:49Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=6ed05a8efda56e5be11081954929421de19cce88'/>
<id>urn:sha1:6ed05a8efda56e5be11081954929421de19cce88</id>
<content type='text'>
The code represent memory/cache level data based on PERF_MEM_LVL_*
namespace, which is in the process of deprication in the favour of
newer composite PERF_MEM_{LVLNUM_,REMOTE_,SNOOPX_,HOPS_} fields.
Add data source encodings to represent cache/memory data based on
newer composite PERF_MEM_{LVLNUM_,REMOTE_,SNOOPX_,HOPS_} fields.

Add data source encodings to represent data coming from local
memory/Remote memory/distant memory and remote/distant cache hits.

In order to represent data coming from OpenCAPI cache/memory, we use
LVLNUM "PMEM" field which is used to present persistent memory accesses.

Result in power10 system with patch changes:

localhost:# ./perf mem report --sort="mem,sym,dso" --stdio
 # Overhead       Samples  Memory access             Symbol                      Shared Object
 # ........  ............  ........................  ..........................  ................
 #
    29.46%          2331  L1 or L1 hit              [.] __random                                     libc-2.28.so
    23.11%          2121  L1 or L1 hit              [.] producer_populate_cache                      producer_consumer
    18.56%          1758  L1 or L1 hit              [.] __random_r                                   libc-2.28.so
    15.64%          1559  L2 or L2 hit              [.] __random                                     libc-2.28.so
    .....
    0.09%              5  Remote socket, same board Any cache hit             [.] __random         libc-2.28.so
    0.07%              4  Remote socket, same board Any cache hit             [.] __random         libc-2.28.so
    .....

Signed-off-by: Kajol Jain &lt;kjain@linux.ibm.com&gt;
Reviewed-by: Madhavan Srinivasan &lt;maddy@linux.ibm.com&gt;
Signed-off-by: Michael Ellerman &lt;mpe@ellerman.id.au&gt;
Link: https://lore.kernel.org/r/20211206091749.87585-5-kjain@linux.ibm.com

</content>
</entry>
<entry>
<title>powerpc/perf: Add encodings to represent data based on newer composite PERF_MEM_LVLNUM* fields</title>
<updated>2021-12-16T10:31:44Z</updated>
<author>
<name>Kajol Jain</name>
<email>kjain@linux.ibm.com</email>
</author>
<published>2021-12-06T09:17:48Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=4a20ee106154ac1765dea97932faad29f0ba57fc'/>
<id>urn:sha1:4a20ee106154ac1765dea97932faad29f0ba57fc</id>
<content type='text'>
The code represent data coming from L1/L2/L3 cache hits based on
PERF_MEM_LVL_* namespace, which is in the process of deprecation in
the favour of newer composite PERF_MEM_{LVLNUM_,REMOTE_,SNOOPX_,HOPS_}
fields.

Add data source encodings to represent L1/L2/L3 cache hits based on
newer composite PERF_MEM_{LVLNUM_,REMOTE_,SNOOPX_,HOPS_} fields for
power10 and older platforms

Result in power9 system without patch changes:

localhost:# ./perf mem report --sort="mem,sym,dso" --stdio
 # Overhead       Samples  Memory access             Symbol                             Shared Object
 # ........  ............  ........................  .................................  ................
 #
    29.51%             1  L2 hit                    [k] perf_event_exec                [kernel.vmlinux]
    27.05%             1  L1 hit                    [k] perf_ctx_unlock                [kernel.vmlinux]
    13.93%             1  L1 hit                    [k] vtime_delta                    [kernel.vmlinux]
    13.11%             1  L1 hit                    [k] prepend_path.isra.11           [kernel.vmlinux]
     8.20%             1  L1 hit                    [.] 00000038.plt_call.__GI_strlen  libc-2.28.so
     8.20%             1  L1 hit                    [k] perf_event_interrupt           [kernel.vmlinux]

Result in power9 system with patch changes:

localhost:# ./perf mem report --sort="mem,sym,dso" --stdio
 # Overhead       Samples  Memory access             Symbol                      Shared Object
 # ........  ............  ........................  ..........................  ................
 #
    36.63%             1  L2 or L2 hit              [k] perf_event_exec         [kernel.vmlinux]
    25.50%             1  L1 or L1 hit              [k] vtime_delta             [kernel.vmlinux]
    13.12%             1  L1 or L1 hit              [k] unmap_region            [kernel.vmlinux]
    12.62%             1  L1 or L1 hit              [k] perf_sample_event_took  [kernel.vmlinux]
     6.93%             1  L1 or L1 hit              [k] perf_ctx_unlock         [kernel.vmlinux]
     5.20%             1  L1 or L1 hit              [.] __memcpy_power7         libc-2.28.so

Signed-off-by: Kajol Jain &lt;kjain@linux.ibm.com&gt;
Reviewed-by: Madhavan Srinivasan &lt;maddy@linux.ibm.com&gt;
Signed-off-by: Michael Ellerman &lt;mpe@ellerman.id.au&gt;
Link: https://lore.kernel.org/r/20211206091749.87585-4-kjain@linux.ibm.com

</content>
</entry>
<entry>
<title>powerpc/perf: Fix data source encodings for L2.1 and L3.1 accesses</title>
<updated>2021-10-19T15:27:01Z</updated>
<author>
<name>Kajol Jain</name>
<email>kjain@linux.ibm.com</email>
</author>
<published>2021-10-06T14:06:54Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=26da4abfb38201c3cbe127daeded76d4c2bc9077'/>
<id>urn:sha1:26da4abfb38201c3cbe127daeded76d4c2bc9077</id>
<content type='text'>
Fix the data source encodings to represent L2.1/L3.1(another core's
L2/L3 on the same node) accesses properly for power10 and older
plaforms.

Add new macros(LEVEL/REM) which can be used to add mem_lvl_num and remote
field data inside perf_mem_data_src structure.

Result in power9 system with patch changes:

localhost:~/linux/tools/perf # ./perf mem report | grep Remote
     0.01%             1  252           Remote core, same node L3 or L3 hit  [.] 0x0000000000002dd0                producer_consumer   [.] 0x00007fff7f25eb90
anon               HitM          N/A                     No       N/A        0              0
     0.01%             1  220           Remote core, same node L3 or L3 hit  [.] 0x0000000000002dd0                producer_consumer   [.] 0x00007fff77776d90
anon               HitM          N/A                     No       N/A        0              0
     0.01%             1  220           Remote core, same node L3 or L3 hit  [.] 0x0000000000002dd0                producer_consumer   [.] 0x00007fff817d9410
anon               HitM          N/A                     No       N/A        0              0

Fixes: 79e96f8f930d ("powerpc/perf: Export memory hierarchy info to user space")
Signed-off-by: Kajol Jain &lt;kjain@linux.ibm.com&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Link: https://lore.kernel.org/r/20211006140654.298352-5-kjain@linux.ibm.com
</content>
</entry>
<entry>
<title>powerpc/perf: Fix sampled instruction type for larx/stcx</title>
<updated>2021-04-22T15:38:02Z</updated>
<author>
<name>Athira Rajeev</name>
<email>atrajeev@linux.vnet.ibm.com</email>
</author>
<published>2021-03-04T11:55:37Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=b4ded42268ee3d703da208278342b9901abe145a'/>
<id>urn:sha1:b4ded42268ee3d703da208278342b9901abe145a</id>
<content type='text'>
Sampled Instruction Event Register (SIER) field [46:48] identifies the
sampled instruction type. ISA v3.1 says value of 0b111 for this field as
reserved, but in POWER10 it denotes LARX/STCX type which will hopefully
be fixed in ISA v3.1 update.

Patch fixes the functions to handle type value 7 for CPU_FTR_ARCH_31.

Fixes: a64e697cef23 ("powerpc/perf: power10 Performance Monitoring support")
Signed-off-by: Athira Rajeev &lt;atrajeev@linux.vnet.ibm.com&gt;
Reviewed-by: Madhavan Srinivasan &lt;maddy@linux.ibm.com&gt;
[mpe: Avoid reading mmcra until necessary, use early return to deindent if block]
Signed-off-by: Michael Ellerman &lt;mpe@ellerman.id.au&gt;
Link: https://lore.kernel.org/r/1614858937-1485-1-git-send-email-atrajeev@linux.vnet.ibm.com

</content>
</entry>
<entry>
<title>powerpc/perf: Expose processor pipeline stage cycles using PERF_SAMPLE_WEIGHT_STRUCT</title>
<updated>2021-04-20T04:22:23Z</updated>
<author>
<name>Athira Rajeev</name>
<email>atrajeev@linux.vnet.ibm.com</email>
</author>
<published>2021-03-22T14:57:23Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=af31fd0c9107e400a8eb89d0eafb40bb78802f79'/>
<id>urn:sha1:af31fd0c9107e400a8eb89d0eafb40bb78802f79</id>
<content type='text'>
Performance Monitoring Unit (PMU) registers in powerpc provides
information on cycles elapsed between different stages in the
pipeline. This can be used for application tuning. On ISA v3.1
platform, this information is exposed by sampling registers.
Patch adds kernel support to capture two of the cycle counters
as part of perf sample using the sample type:
PERF_SAMPLE_WEIGHT_STRUCT.

The power PMU function 'get_mem_weight' currently uses 64 bit weight
field of perf_sample_data to capture memory latency. But following the
introduction of PERF_SAMPLE_WEIGHT_TYPE, weight field could contain
64-bit or 32-bit value depending on the architexture support for
PERF_SAMPLE_WEIGHT_STRUCT. Patches uses WEIGHT_STRUCT to expose the
pipeline stage cycles info. Hence update the ppmu functions to work for
64-bit and 32-bit weight values.

If the sample type is PERF_SAMPLE_WEIGHT, use the 64-bit weight field.
if the sample type is PERF_SAMPLE_WEIGHT_STRUCT, memory subsystem
latency is stored in the low 32bits of perf_sample_weight structure.
Also for CPU_FTR_ARCH_31, capture the two cycle counter information in
two 16 bit fields of perf_sample_weight structure.

Signed-off-by: Athira Rajeev &lt;atrajeev@linux.vnet.ibm.com&gt;
Reviewed-by: Madhavan Srinivasan &lt;maddy@linux.ibm.com&gt;
Signed-off-by: Michael Ellerman &lt;mpe@ellerman.id.au&gt;
Link: https://lore.kernel.org/r/1616425047-1666-2-git-send-email-atrajeev@linux.vnet.ibm.com
</content>
</entry>
</feed>
