<feed xmlns='http://www.w3.org/2005/Atom'>
<title>kernel/tools/perf/util/sort.h, branch linux-5.14.y</title>
<subtitle>Hosts the 0x221E linux distro kernel.</subtitle>
<id>https://universe.0xinfinity.dev/distro/kernel/atom?h=linux-5.14.y</id>
<link rel='self' href='https://universe.0xinfinity.dev/distro/kernel/atom?h=linux-5.14.y'/>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/'/>
<updated>2021-07-15T20:27:52Z</updated>
<entry>
<title>perf report: Free generated help strings for sort option</title>
<updated>2021-07-15T20:27:52Z</updated>
<author>
<name>Riccardo Mancini</name>
<email>rickyman7@gmail.com</email>
</author>
<published>2021-07-15T16:07:14Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=a37338aad8c4d8676173ead14e881d2ec308155c'/>
<id>urn:sha1:a37338aad8c4d8676173ead14e881d2ec308155c</id>
<content type='text'>
ASan reports the memory leak of the strings allocated by sort_help() when
running perf report.

This patch changes the returned pointer to char* (instead of const
char*), saves it in a temporary variable, and finally deallocates it at
function exit.

Signed-off-by: Riccardo Mancini &lt;rickyman7@gmail.com&gt;
Fixes: 702fb9b415e7c99b ("perf report: Show all sort keys in help output")
Cc: Andi Kleen &lt;ak@linux.intel.com&gt;
Cc: Ian Rogers &lt;irogers@google.com&gt;
Cc: Jiri Olsa &lt;jolsa@redhat.com&gt;
Cc: Mark Rutland &lt;mark.rutland@arm.com&gt;
Cc: Namhyung Kim &lt;namhyung@kernel.org&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Link: http://lore.kernel.org/lkml/a38b13f02812a8a6759200b9063c6191337f44d4.1626343282.git.rickyman7@gmail.com
Signed-off-by: Arnaldo Carvalho de Melo &lt;acme@redhat.com&gt;
</content>
</entry>
<entry>
<title>perf tools: Support pipeline stage cycles for powerpc</title>
<updated>2021-03-26T11:49:54Z</updated>
<author>
<name>Athira Rajeev</name>
<email>atrajeev@linux.vnet.ibm.com</email>
</author>
<published>2021-03-22T14:57:26Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=06e5ca746c07380dfe0e4c3e10c34a6daa69eae6'/>
<id>urn:sha1:06e5ca746c07380dfe0e4c3e10c34a6daa69eae6</id>
<content type='text'>
The pipeline stage cycles details can be recorded on powerpc from the
contents of Performance Monitor Unit (PMU) registers. On ISA v3.1
platform, sampling registers exposes the cycles spent in different
pipeline stages. Patch adds perf tools support to present two of the
cycle counter information along with memory latency (weight).

Re-use the field 'ins_lat' for storing the first pipeline stage cycle.
This is stored in 'var2_w' field of 'perf_sample_weight'.

Add a new field 'p_stage_cyc' to store the second pipeline stage cycle
which is stored in 'var3_w' field of perf_sample_weight.

Add new sort function 'Pipeline Stage Cycle' and include this in
default_mem_sort_order[]. This new sort function may be used to denote
some other pipeline stage in another architecture. So add this to list
of sort entries that can have dynamic header string.

Signed-off-by: Athira Rajeev &lt;atrajeev@linux.vnet.ibm.com&gt;
Reviewed-by: Madhavan Srinivasan &lt;maddy@linux.ibm.com&gt;
Acked-by: Jiri Olsa &lt;jolsa@redhat.com&gt;
Cc: Jiri Olsa &lt;jolsa@kernel.org&gt;
Cc: Kajol Jain &lt;kjain@linux.ibm.com&gt;
Cc: Kan Liang &lt;kan.liang@linux.intel.com&gt;
Cc: Michael Ellerman &lt;mpe@ellerman.id.au&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Ravi Bangoria &lt;ravi.bangoria@linux.ibm.com&gt;
Link: https://lore.kernel.org/r/1616425047-1666-5-git-send-email-atrajeev@linux.vnet.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo &lt;acme@redhat.com&gt;
</content>
</entry>
<entry>
<title>perf report: Support instruction latency</title>
<updated>2021-02-08T19:25:00Z</updated>
<author>
<name>Kan Liang</name>
<email>kan.liang@linux.intel.com</email>
</author>
<published>2021-02-02T20:09:10Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=590db42de068a1d11e51bd0796a9044621aeed2e'/>
<id>urn:sha1:590db42de068a1d11e51bd0796a9044621aeed2e</id>
<content type='text'>
The instruction latency information can be recorded on some platforms,
e.g., the Intel Sapphire Rapids server. With both memory latency
(weight) and the new instruction latency information, users can easily
locate the expensive load instructions, and also understand the time
spent in different stages. The users can optimize their applications in
different pipeline stages.

The 'weight' field is shared among different architectures. Reusing the
'weight' field may impacts other architectures. Add a new field to store
the instruction latency.

Like the 'weight' support, introduce a 'ins_lat' for the global
instruction latency, and a 'local_ins_lat' for the local instruction
latency version.

Add new sort functions, INSTR Latency and Local INSTR Latency,
accordingly.

Add local_ins_lat to the default_mem_sort_order[].

Signed-off-by: Kan Liang &lt;kan.liang@linux.intel.com&gt;
Cc: Andi Kleen &lt;ak@linux.intel.com&gt;
Cc: Jin Yao &lt;yao.jin@linux.intel.com&gt;
Cc: Jiri Olsa &lt;jolsa@redhat.com&gt;
Cc: Madhavan Srinivasan &lt;maddy@linux.vnet.ibm.com&gt;
Cc: Namhyung Kim &lt;namhyung@kernel.org&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Stephane Eranian &lt;eranian@google.com&gt;
Link: http://lore.kernel.org/lkml/1612296553-21962-7-git-send-email-kan.liang@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo &lt;acme@redhat.com&gt;
</content>
</entry>
<entry>
<title>perf tools: Support data block and addr block</title>
<updated>2021-02-08T19:25:00Z</updated>
<author>
<name>Kan Liang</name>
<email>kan.liang@linux.intel.com</email>
</author>
<published>2021-02-02T20:09:07Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=a054c2989fe36e1bde07f34b98e108a8a3a1e011'/>
<id>urn:sha1:a054c2989fe36e1bde07f34b98e108a8a3a1e011</id>
<content type='text'>
Two new data source fields, to indicate the block reasons of a load
instruction, are introduced on the Intel Sapphire Rapids server. The
fields can be used by the memory profiling.

Add a new sort function, SORT_MEM_BLOCKED, for the two fields.

For the previous platforms or the block reason is unknown, print "N/A"
for the block reason.

Add blocked as a default mem sort key for perf report and perf mem
report.

Committer testing:

So in machines without this capability we get a "N/A" filling the new "Blocked"
column:

  $ perf mem record ls
  arch     certs	 CREDITS  Documentation  include  ipc     Kconfig  lib       MAINTAINERS  mm   samples  security  usr    block
  COPYING	 crypto	 drivers  fs             init     Kbuild  kernel   LICENSES  Makefile     net  README   scripts   sound  tools
  virt
  [ perf record: Woken up 1 times to write data ]
  [ perf record: Captured and wrote 0.008 MB perf.data (17 samples) ]
  $
  $ perf mem report --stdio
  # To display the perf.data header info, please use --header/--header-only options.
  #
  # Total Lost Samples: 0
  #
  # Samples: 6  of event 'cpu/mem-loads,ldlat=30/Pu'
  # Total weight : 1381
  # Sort order   : local_weight,mem,sym,dso,symbol_daddr,dso_daddr,snoop,tlb,locked,blocked
  #
  # Overhead  Samples  Local Weight  Memory access         Symbol                   Shared Object  Data Symbol             Data Object   Snoop  TLB access    Locked  Blocked
  # ........  .......  ............  ....................  .......................  .............  ......................  ............  .....  ............  ......  .......
  #
      32.87%        1  454           Local RAM or RAM hit  [.] _dl_relocate_object  ld-2.31.so     [.] 0x00007fe91cef3078  libc-2.31.so  Hit    L1 or L2 hit  No       N/A
      25.56%        1  353           LFB or LFB hit        [.] strcmp               ld-2.31.so     [.] 0x00005586973855ca  ls            None   L1 or L2 hit  No       N/A
      22.59%        1  312           LFB or LFB hit        [.] _dl_cache_libcmp     ld-2.31.so     [.] 0x00007fe91d0e3b18  ld.so.cache   None   L1 or L2 hit  No       N/A
       8.47%        1  117           LFB or LFB hit        [.] _dl_relocate_object  ld-2.31.so     [.] 0x00007fe91ceee570  libc-2.31.so  None   L1 or L2 hit  No       N/A
       6.88%        1  95            LFB or LFB hit        [.] _dl_relocate_object  ld-2.31.so     [.] 0x00007fe91ceed490  libc-2.31.so  None   L1 or L2 hit  No       N/A
       3.62%        1  50            LFB or LFB hit        [.] _dl_cache_libcmp     ld-2.31.so     [.] 0x00007fe91d0ebe60  ld.so.cache   None   L1 or L2 hit  No       N/A

  # Samples: 11  of event 'cpu/mem-stores/Pu'
  # Total weight : 11
  # Sort order   : local_weight,mem,sym,dso,symbol_daddr,dso_daddr,snoop,tlb,locked,blocked
  #
  # Overhead  Samples  Local Weight  Memory access  Symbol                   Shared Object  Data Symbol             Data Object  Snoop  TLB access  Locked  Blocked
  # ........  .......  ............  .............  .......................  .............  ......................  ...........  .....  ..........  ......  .......
  #
       9.09%        1  0             L1 hit         [.] __strcoll_l          libc-2.31.so   [.] 0x00007fffe5648fc8  [stack]      N/A    N/A         N/A      N/A
       9.09%        1  0             L1 hit         [.] _dl_lookup_symbol_x  ld-2.31.so     [.] 0x00007fffe56490b8  [stack]      N/A    N/A         N/A      N/A
       9.09%        1  0             L1 hit         [.] _dl_name_match_p     ld-2.31.so     [.] 0x00007fffe56487d8  [stack]      N/A    N/A         N/A      N/A
       9.09%        1  0             L1 hit         [.] _dl_start            ld-2.31.so     [.] start_time+0x0      ld-2.31.so   N/A    N/A         N/A      N/A
       9.09%        1  0             L1 hit         [.] _dl_sysdep_start     ld-2.31.so     [.] 0x00007fffe56494b8  [stack]      N/A    N/A         N/A      N/A
       9.09%        1  0             L1 hit         [.] do_lookup_x          ld-2.31.so     [.] 0x00007fffe5648ff8  [stack]      N/A    N/A         N/A      N/A
       9.09%        1  0             L1 hit         [.] do_lookup_x          ld-2.31.so     [.] 0x00007fffe5649064  [stack]      N/A    N/A         N/A      N/A
       9.09%        1  0             L1 hit         [.] do_lookup_x          ld-2.31.so     [.] 0x00007fffe5649130  [stack]      N/A    N/A         N/A      N/A
       9.09%        1  0             L1 miss        [.] _dl_start            ld-2.31.so     [.] _rtld_global+0xaf8  ld-2.31.so   N/A    N/A         N/A      N/A
       9.09%        1  0             L1 miss        [.] _dl_start            ld-2.31.so     [.] _rtld_global+0xc28  ld-2.31.so   N/A    N/A         N/A      N/A
       9.09%        1  0             L1 miss        [.] _dl_start            ld-2.31.so     [.] 0x00007fffe56495b8  [stack]      N/A    N/A         N/A      N/A

  # (Tip: Show user configuration overrides: perf config --user --list)
  $

Signed-off-by: Kan Liang &lt;kan.liang@linux.intel.com&gt;
Tested-by: Arnaldo Carvalho de Melo &lt;acme@redhat.com&gt;
Cc: Andi Kleen &lt;ak@linux.intel.com&gt;
Cc: Jin Yao &lt;yao.jin@linux.intel.com&gt;
Cc: Jiri Olsa &lt;jolsa@redhat.com&gt;
Cc: Madhavan Srinivasan &lt;maddy@linux.vnet.ibm.com&gt;
Cc: Namhyung Kim &lt;namhyung@kernel.org&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Stephane Eranian &lt;eranian@google.com&gt;
Link: http://lore.kernel.org/lkml/1612296553-21962-4-git-send-email-kan.liang@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo &lt;acme@redhat.com&gt;
</content>
</entry>
<entry>
<title>perf report: Add support for PERF_SAMPLE_CODE_PAGE_SIZE</title>
<updated>2021-01-20T17:34:20Z</updated>
<author>
<name>Stephane Eranian</name>
<email>eranian@google.com</email>
</author>
<published>2021-01-05T19:57:51Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=9fd74f209c69c9157584af9cdc500af9bbc06b82'/>
<id>urn:sha1:9fd74f209c69c9157584af9cdc500af9bbc06b82</id>
<content type='text'>
Add a new sort dimension "code_page_size" for common sort.
With this option applied, perf can sort and report by sample's code page
size.

For example:

  # perf report --stdio --sort=comm,symbol,code_page_size
  # To display the perf.data header info, please use
  # --header/--header-only options.
  #
  #
  # Total Lost Samples: 0
  #
  # Samples: 3K of event 'mem-loads:uP'
  # Event count (approx.): 1470769
  #
  # Overhead  Command  Symbol                        Code Page Size IPC [IPC Coverage]
  # ........  .......  ............................  .............. ....................
  #
      69.56%  dtlb     [.] GetTickCount              4K              -   -
      17.93%  dtlb     [.] Calibrate                 4K              -   -
      11.40%  dtlb     [.] __gettimeofday            4K              -   -
  #

Signed-off-by: Stephane Eranian &lt;eranian@google.com&gt;
Acked-by: Jiri Olsa &lt;jolsa@redhat.com&gt;
Acked-by: Namhyung Kim &lt;namhyung@kernel.org&gt;
Cc: Andi Kleen &lt;ak@linux.intel.com&gt;
Cc: Ingo Molnar &lt;mingo@kernel.org&gt;
Cc: Mark Rutland &lt;mark.rutland@arm.com&gt;
Cc: Michael Ellerman &lt;mpe@ellerman.id.au&gt;
Cc: Will Deacon &lt;will@kernel.org&gt;
Link: https://lore.kernel.org/r/20210105195752.43489-6-kan.liang@linux.intel.com
Signed-off-by: Kan Liang &lt;kan.liang@linux.intel.com&gt;
Signed-off-by: Arnaldo Carvalho de Melo &lt;acme@redhat.com&gt;
</content>
</entry>
<entry>
<title>perf sort: Add sort option for data page size</title>
<updated>2020-12-19T20:52:24Z</updated>
<author>
<name>Kan Liang</name>
<email>kan.liang@linux.intel.com</email>
</author>
<published>2020-12-16T18:57:58Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=a50d03e3b8b68df13e47dcbde6c5d39b4237c479'/>
<id>urn:sha1:a50d03e3b8b68df13e47dcbde6c5d39b4237c479</id>
<content type='text'>
Add a new sort option "data_page_size" for --mem-mode sort.  With this
option applied, perf can sort and report by sample's data page size.

Here is an example:

perf report --stdio --mem-mode
--sort=comm,symbol,phys_daddr,data_page_size

 # To display the perf.data header info, please use
 # --header/--header-only options.
 #
 #
 # Total Lost Samples: 0
 #
 # Samples: 9K of event 'mem-loads:uP'
 # Total weight : 9028
 # Sort order   : comm,symbol,phys_daddr,data_page_size
 #
 # Overhead  Command  Symbol                        Data Physical
 # Address
 # Data Page Size
 # ........  .......  ............................
 # ......................  ......................
 #
    11.19%  dtlb     [.] touch_buffer              [.] 0x00000003fec82ea8  4K
     8.61%  dtlb     [.] GetTickCount              [.] 0x00000003c4f2c8a8  4K
     4.52%  dtlb     [.] GetTickCount              [.] 0x00000003fec82f58  4K
     4.33%  dtlb     [.] __gettimeofday            [.] 0x00000003fec82f48  4K
     4.32%  dtlb     [.] GetTickCount              [.] 0x00000003fec82f78  4K
     4.28%  dtlb     [.] GetTickCount              [.] 0x00000003fec82f50  4K
     4.23%  dtlb     [.] GetTickCount              [.] 0x00000003fec82f70  4K
     4.11%  dtlb     [.] GetTickCount              [.] 0x00000003fec82f68  4K
     4.00%  dtlb     [.] Calibrate                 [.] 0x00000003fec82f98  4K
     3.91%  dtlb     [.] Calibrate                 [.] 0x00000003fec82f90  4K
     3.43%  dtlb     [.] touch_buffer              [.] 0x00000003fec82e98  4K
     3.42%  dtlb     [.] touch_buffer              [.] 0x00000003fec82e90  4K
     0.09%  dtlb     [.] DoDependentLoads          [.] 0x000000036ea084c0  2M
     0.08%  dtlb     [.] DoDependentLoads          [.] 0x000000032b010b80  2M

Signed-off-by: Kan Liang &lt;kan.liang@linux.intel.com&gt;
Acked-by: Jiri Olsa &lt;jolsa@redhat.com&gt;
Acked-by: Namhyung Kim &lt;namhyung@kernel.org&gt;
Cc: Andi Kleen &lt;ak@linux.intel.com&gt;
Cc: Mark Rutland &lt;mark.rutland@arm.com&gt;
Cc: Michael Ellerman &lt;mpe@ellerman.id.au&gt;
Cc: Stephane Eranian &lt;eranian@google.com&gt;
Cc: Will Deacon &lt;will@kernel.org&gt;
Link: http://lore.kernel.org/lkml/20201216185805.9981-3-kan.liang@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo &lt;acme@redhat.com&gt;
</content>
</entry>
<entry>
<title>perf hist: Add fast path for duplicate entries check</title>
<updated>2020-04-18T12:05:01Z</updated>
<author>
<name>Kan Liang</name>
<email>kan.liang@linux.intel.com</email>
</author>
<published>2020-03-19T20:25:17Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=12e89e65f446476951f42aedeef56b6bd6f7f1e6'/>
<id>urn:sha1:12e89e65f446476951f42aedeef56b6bd6f7f1e6</id>
<content type='text'>
Perf checks the duplicate entries in a callchain before adding an entry.
However the check is very slow especially with deeper call stack.
Almost ~50% elapsed time of perf report is spent on the check when the
call stack is always depth of 32.

The hist_entry__cmp() is used to compare the new entry with the old
entries. It will go through all the available sorts in the sort_list,
and call the specific cmp of each sort, which is very slow.

Actually, for most cases, there are no duplicate entries in callchain.
The symbols are usually different. It's much faster to do a quick check
for symbols first. Only do the full cmp when the symbols are exactly the
same.

The quick check is only to check symbols, not dso. Export
_sort__sym_cmp.

  $ perf record --call-graph lbr ./tchain_edit_64

  Without the patch
  $time perf report --stdio
  real    0m21.142s
  user    0m21.110s
  sys     0m0.033s

  With the patch
  $time perf report --stdio
  real    0m10.977s
  user    0m10.948s
  sys     0m0.027s

Signed-off-by: Kan Liang &lt;kan.liang@linux.intel.com&gt;
Acked-by: Jiri Olsa &lt;jolsa@redhat.com&gt;
Cc: Adrian Hunter &lt;adrian.hunter@intel.com&gt;
Cc: Alexey Budankov &lt;alexey.budankov@linux.intel.com&gt;
Cc: Andi Kleen &lt;ak@linux.intel.com&gt;
Cc: Mathieu Poirier &lt;mathieu.poirier@linaro.org&gt;
Cc: Michael Ellerman &lt;mpe@ellerman.id.au&gt;
Cc: Namhyung Kim &lt;namhyung@kernel.org&gt;
Cc: Pavel Gerasimov &lt;pavel.gerasimov@intel.com&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Ravi Bangoria &lt;ravi.bangoria@linux.ibm.com&gt;
Cc: Stephane Eranian &lt;eranian@google.com&gt;
Cc: Vitaly Slobodskoy &lt;vitaly.slobodskoy@intel.com&gt;
Link: http://lore.kernel.org/lkml/20200319202517.23423-18-kan.liang@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo &lt;acme@redhat.com&gt;
</content>
</entry>
<entry>
<title>perf report: Add 'cgroup' sort key</title>
<updated>2020-04-03T12:37:55Z</updated>
<author>
<name>Namhyung Kim</name>
<email>namhyung@kernel.org</email>
</author>
<published>2020-03-25T12:45:32Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=b629f3e9d01b4e4ad4e84c8f76fad514967a83da'/>
<id>urn:sha1:b629f3e9d01b4e4ad4e84c8f76fad514967a83da</id>
<content type='text'>
The cgroup sort key is to show cgroup membership of each task.
Currently it shows full path in the cgroupfs (not relative to the root
of cgroup namespace) since it'd be more intuitive IMHO.  Otherwise root
cgroup in different namespaces will all show same name - "/".

The cgroup sort key should come before cgroup_id otherwise
sort_dimension__add() will match it to cgroup_id as it only matches with
the given substring.

For example it will look like following.  Note that record patch adding
--all-cgroups patch will come later.

  $ perf record -a --namespace --all-cgroups  cgtest
  [ perf record: Woken up 1 times to write data ]
  [ perf record: Captured and wrote 0.208 MB perf.data (4090 samples) ]

  $ perf report -s cgroup_id,cgroup,pid
  ...
  # Overhead  cgroup id (dev/inode)  Cgroup          Pid:Command
  # ........  .....................  ..........  ...............
  #
      93.96%  0/0x0                  /                 0:swapper
       1.25%  3/0xeffffffb           /               278:looper0
       0.86%  3/0xf000015f           /sub/cgrp1      280:cgtest
       0.37%  3/0xf0000160           /sub/cgrp2      281:cgtest
       0.34%  3/0xf0000163           /sub/cgrp3      282:cgtest
       0.22%  3/0xeffffffb           /sub            278:looper0
       0.20%  3/0xeffffffb           /               280:cgtest
       0.15%  3/0xf0000163           /sub/cgrp3      285:looper3

Signed-off-by: Namhyung Kim &lt;namhyung@kernel.org&gt;
Cc: Alexander Shishkin &lt;alexander.shishkin@linux.intel.com&gt;
Cc: Jiri Olsa &lt;jolsa@redhat.com&gt;
Cc: Mark Rutland &lt;mark.rutland@arm.com&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Link: http://lore.kernel.org/lkml/20200325124536.2800725-6-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo &lt;acme@redhat.com&gt;
</content>
</entry>
<entry>
<title>perf report/top: Improve toggle callchain menu option</title>
<updated>2020-01-06T14:46:10Z</updated>
<author>
<name>Arnaldo Carvalho de Melo</name>
<email>acme@redhat.com</email>
</author>
<published>2019-12-12T14:48:23Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=bdc633fec50be7e6856b9dee89af9bb7c5e9a04f'/>
<id>urn:sha1:bdc633fec50be7e6856b9dee89af9bb7c5e9a04f</id>
<content type='text'>
Taking into account the current status of the callchain, i.e. if folded,
show "Expand", otherwise "Collapse", also show the name of the entry
that will be affected and mention the hotkeys for expanding/collapsing
all callchains below the main entry, the one that appears with/without
callchains.

Reviewed-by: Jiri Olsa &lt;jolsa@kernel.org&gt;
Cc: Adrian Hunter &lt;adrian.hunter@intel.com&gt;
Cc: Andi Kleen &lt;ak@linux.intel.com&gt;
Cc: Jin Yao &lt;yao.jin@linux.intel.com&gt;
Cc: Kan Liang &lt;kan.liang@intel.com&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Namhyung Kim &lt;namhyung@kernel.org&gt;
Link: https://lkml.kernel.org/n/tip-03arm6poo8463k5tfcfp7gkk@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo &lt;acme@redhat.com&gt;
</content>
</entry>
<entry>
<title>perf diff: Report noisy for cycles diff</title>
<updated>2019-10-11T13:57:00Z</updated>
<author>
<name>Jin Yao</name>
<email>yao.jin@linux.intel.com</email>
</author>
<published>2019-09-25T01:14:46Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=cebf7d51a6c3babc4d0589da7aec0de1af0a5691'/>
<id>urn:sha1:cebf7d51a6c3babc4d0589da7aec0de1af0a5691</id>
<content type='text'>
This patch prints the stddev and hist for the cycles diff of program
block. It can help us to understand if the cycles is noisy or not.

This patch is inspired by Andi Kleen's patch:

  https://lwn.net/Articles/600471/

We create new option '--cycles-hist'.

Example:

  perf record -b ./div
  perf record -b ./div
  perf diff -c cycles

  # Baseline                                [Program Block Range] Cycles Diff  Shared Object      Symbol
  # ........  .......................................................... ....  .................  ............................
  #
      46.72%                                      [div.c:40 -&gt; div.c:40]    0  div                [.] main
      46.72%                                      [div.c:42 -&gt; div.c:44]    0  div                [.] main
      46.72%                                      [div.c:42 -&gt; div.c:39]    0  div                [.] main
      20.54%                          [random_r.c:357 -&gt; random_r.c:394]    1  libc-2.27.so       [.] __random_r
      20.54%                          [random_r.c:357 -&gt; random_r.c:380]    0  libc-2.27.so       [.] __random_r
      20.54%                          [random_r.c:388 -&gt; random_r.c:388]    0  libc-2.27.so       [.] __random_r
      20.54%                          [random_r.c:388 -&gt; random_r.c:391]    0  libc-2.27.so       [.] __random_r
      17.04%                              [random.c:288 -&gt; random.c:291]    0  libc-2.27.so       [.] __random
      17.04%                              [random.c:291 -&gt; random.c:291]    0  libc-2.27.so       [.] __random
      17.04%                              [random.c:293 -&gt; random.c:293]    0  libc-2.27.so       [.] __random
      17.04%                              [random.c:295 -&gt; random.c:295]    0  libc-2.27.so       [.] __random
      17.04%                              [random.c:295 -&gt; random.c:295]    0  libc-2.27.so       [.] __random
      17.04%                              [random.c:298 -&gt; random.c:298]    0  libc-2.27.so       [.] __random
       8.40%                                      [div.c:22 -&gt; div.c:25]    0  div                [.] compute_flag
       8.40%                                      [div.c:27 -&gt; div.c:28]    0  div                [.] compute_flag
       5.14%                                    [rand.c:26 -&gt; rand.c:27]    0  libc-2.27.so       [.] rand
       5.14%                                    [rand.c:28 -&gt; rand.c:28]    0  libc-2.27.so       [.] rand
       2.15%                                  [rand@plt+0 -&gt; rand@plt+0]    0  div                [.] rand@plt
       0.00%                                                                   [kernel.kallsyms]  [k] __x86_indirect_thunk_rax
       0.00%                                [do_mmap+714 -&gt; do_mmap+732]  -10  [kernel.kallsyms]  [k] do_mmap
       0.00%                                [do_mmap+737 -&gt; do_mmap+765]    1  [kernel.kallsyms]  [k] do_mmap
       0.00%                                [do_mmap+262 -&gt; do_mmap+299]    0  [kernel.kallsyms]  [k] do_mmap
       0.00%  [__x86_indirect_thunk_r15+0 -&gt; __x86_indirect_thunk_r15+0]    7  [kernel.kallsyms]  [k] __x86_indirect_thunk_r15
       0.00%            [native_sched_clock+0 -&gt; native_sched_clock+119]   -1  [kernel.kallsyms]  [k] native_sched_clock
       0.00%                 [native_write_msr+0 -&gt; native_write_msr+16]  -13  [kernel.kallsyms]  [k] native_write_msr

When we enable the option '--cycles-hist', the output is

  perf diff -c cycles --cycles-hist

  # Baseline                                [Program Block Range] Cycles Diff        stddev/Hist  Shared Object      Symbol
  # ........  .......................................................... ....  .................  .................  ............................
  #
      46.72%                                      [div.c:40 -&gt; div.c:40]    0  ± 37.8% ▁█▁▁██▁█   div                [.] main
      46.72%                                      [div.c:42 -&gt; div.c:44]    0  ± 49.4% ▁▁▂█▂▂▂▂   div                [.] main
      46.72%                                      [div.c:42 -&gt; div.c:39]    0  ± 24.1% ▃█▂▄▁▃▂▁   div                [.] main
      20.54%                          [random_r.c:357 -&gt; random_r.c:394]    1  ± 33.5% ▅▂▁█▃▁▂▁   libc-2.27.so       [.] __random_r
      20.54%                          [random_r.c:357 -&gt; random_r.c:380]    0  ± 39.4% ▁▁█▁██▅▁   libc-2.27.so       [.] __random_r
      20.54%                          [random_r.c:388 -&gt; random_r.c:388]    0                     libc-2.27.so       [.] __random_r
      20.54%                          [random_r.c:388 -&gt; random_r.c:391]    0  ± 41.2% ▁▃▁▂█▄▃▁   libc-2.27.so       [.] __random_r
      17.04%                              [random.c:288 -&gt; random.c:291]    0  ± 48.8% ▁▁▁▁███▁   libc-2.27.so       [.] __random
      17.04%                              [random.c:291 -&gt; random.c:291]    0  ±100.0% ▁█▁▁▁▁▁▁   libc-2.27.so       [.] __random
      17.04%                              [random.c:293 -&gt; random.c:293]    0  ±100.0% ▁█▁▁▁▁▁▁   libc-2.27.so       [.] __random
      17.04%                              [random.c:295 -&gt; random.c:295]    0  ±100.0% ▁█▁▁▁▁▁▁   libc-2.27.so       [.] __random
      17.04%                              [random.c:295 -&gt; random.c:295]    0                     libc-2.27.so       [.] __random
      17.04%                              [random.c:298 -&gt; random.c:298]    0  ± 75.6% ▃█▁▁▁▁▁▁   libc-2.27.so       [.] __random
       8.40%                                      [div.c:22 -&gt; div.c:25]    0  ± 42.1% ▁▃▁▁███▁   div                [.] compute_flag
       8.40%                                      [div.c:27 -&gt; div.c:28]    0  ± 41.8% ██▁▁▄▁▁▄   div                [.] compute_flag
       5.14%                                    [rand.c:26 -&gt; rand.c:27]    0  ± 37.8% ▁▁▁████▁   libc-2.27.so       [.] rand
       5.14%                                    [rand.c:28 -&gt; rand.c:28]    0                     libc-2.27.so       [.] rand
       2.15%                                  [rand@plt+0 -&gt; rand@plt+0]    0                     div                [.] rand@plt
       0.00%                                                                                      [kernel.kallsyms]  [k] __x86_indirect_thunk_rax
       0.00%                                [do_mmap+714 -&gt; do_mmap+732]  -10                     [kernel.kallsyms]  [k] do_mmap
       0.00%                                [do_mmap+737 -&gt; do_mmap+765]    1                     [kernel.kallsyms]  [k] do_mmap
       0.00%                                [do_mmap+262 -&gt; do_mmap+299]    0                     [kernel.kallsyms]  [k] do_mmap
       0.00%  [__x86_indirect_thunk_r15+0 -&gt; __x86_indirect_thunk_r15+0]    7                     [kernel.kallsyms]  [k] __x86_indirect_thunk_r15
       0.00%            [native_sched_clock+0 -&gt; native_sched_clock+119]   -1  ± 38.5% ▄█▁        [kernel.kallsyms]  [k] native_sched_clock
       0.00%                 [native_write_msr+0 -&gt; native_write_msr+16]  -13  ± 47.1% ▁█▇▃▁▁     [kernel.kallsyms]  [k] native_write_msr

 v8:
 ---
 Rebase to perf/core branch

 v7:
 ---
 1. v6 got Jiri's ACK.
 2. Rebase to latest perf/core branch.

 v6:
 ---
 1. Jiri provides better code for using data__hpp_register() in ui_init().
    Use this code in v6.

 v5:
 ---
 1. Refine the use of data__hpp_register() in ui_init() according to
    Jiri's suggestion.

 v4:
 ---
 1. Rename the new option from '--noisy' to '--cycles-hist'
 2. Remove the option '-n'.
 3. Only update the spark value and stats when '--cycles-hist' is enabled.
 4. Remove the code of printing '..'.

 v3:
 ---
 1. Move the histogram to a separate column
 2. Move the svals[] out of struct stats

 v2:
 ---
 Jiri got a compile error,

  CC       builtin-diff.o
  builtin-diff.c: In function ‘compute_cycles_diff’:
  builtin-diff.c:712:10: error: taking the absolute value of unsigned type ‘u64’ {aka ‘long unsigned int’} has no effect [-Werror=absolute-value]
  712 |          labs(pair-&gt;block_info-&gt;cycles_spark[i] -
      |          ^~~~

 Because the result of u64 - u64 is still u64. Now we change the type of
 cycles_spark[] to s64.

Signed-off-by: Jin Yao &lt;yao.jin@linux.intel.com&gt;
Acked-by: Jiri Olsa &lt;jolsa@kernel.org&gt;
Cc: Alexander Shishkin &lt;alexander.shishkin@linux.intel.com&gt;
Cc: Andi Kleen &lt;ak@linux.intel.com&gt;
Cc: Kan Liang &lt;kan.liang@linux.intel.com&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Link: http://lore.kernel.org/lkml/20190925011446.30678-1-yao.jin@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo &lt;acme@redhat.com&gt;
</content>
</entry>
</feed>
