<feed xmlns='http://www.w3.org/2005/Atom'>
<title>kernel/tools/perf/util/machine.c, branch linux-6.16.y</title>
<subtitle>Hosts the 0x221E linux distro kernel.</subtitle>
<id>https://universe.0xinfinity.dev/distro/kernel/atom?h=linux-6.16.y</id>
<link rel='self' href='https://universe.0xinfinity.dev/distro/kernel/atom?h=linux-6.16.y'/>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/'/>
<updated>2025-05-31T11:58:30Z</updated>
<entry>
<title>perf callchain: Always populate the addr_location map when adding IP</title>
<updated>2025-05-31T11:58:30Z</updated>
<author>
<name>Ian Rogers</name>
<email>irogers@google.com</email>
</author>
<published>2025-05-29T04:39:37Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=a913ef6fd883c05bd6538ed21ee1e773f0d750b7'/>
<id>urn:sha1:a913ef6fd883c05bd6538ed21ee1e773f0d750b7</id>
<content type='text'>
Dropping symbols also meant the callchain maps wasn't populated, but
the callchain map is needed to find the DSO.

Plumb the symbols option better, falling back to thread__find_map()
rather than thread__find_symbol() when symbols are disabled.

Fixes: 02b2705017d2e5ad ("perf callchain: Allow symbols to be optional when resolving a callchain")
Signed-off-by: Ian Rogers &lt;irogers@google.com&gt;
Cc: Adrian Hunter &lt;adrian.hunter@intel.com&gt;
Cc: Alexander Shishkin &lt;alexander.shishkin@linux.intel.com&gt;
Cc: Andi Kleen &lt;ak@linux.intel.com&gt;
Cc: Athira Rajeev &lt;atrajeev@linux.ibm.com&gt;
Cc: Ben Gainey &lt;ben.gainey@arm.com&gt;
Cc: Chaitanya S Prakash &lt;chaitanyas.prakash@arm.com&gt;
Cc: Charlie Jenkins &lt;charlie@rivosinc.com&gt;
Cc: Chun-Tse Shao &lt;ctshao@google.com&gt;
Cc: Colin Ian King &lt;colin.i.king@gmail.com&gt;
Cc: Dmitriy Vyukov &lt;dvyukov@google.com&gt;
Cc: Dr. David Alan Gilbert &lt;linux@treblig.org&gt;
Cc: Graham Woodward &lt;graham.woodward@arm.com&gt;
Cc: Howard Chu &lt;howardchu95@gmail.com&gt;
Cc: Ilkka Koskinen &lt;ilkka@os.amperecomputing.com&gt;
Cc: Ingo Molnar &lt;mingo@redhat.com&gt;
Cc: James Clark &lt;james.clark@linaro.org&gt;
Cc: Jiri Olsa &lt;jolsa@kernel.org&gt;
Cc: John Garry &lt;john.g.garry@oracle.com&gt;
Cc: Kajol Jain &lt;kjain@linux.ibm.com&gt;
Cc: Kan Liang &lt;kan.liang@linux.intel.com&gt;
Cc: Krzysztof Łopatowski &lt;krzysztof.m.lopatowski@gmail.com&gt;
Cc: Leo Yan &lt;leo.yan@linux.dev&gt;
Cc: Levi Yun &lt;yeoreum.yun@arm.com&gt;
Cc: Li Huafei &lt;lihuafei1@huawei.com&gt;
Cc: linux-arm-kernel@lists.infradead.org
Cc: Mark Rutland &lt;mark.rutland@arm.com&gt;
Cc: Martin Liška &lt;martin.liska@hey.com&gt;
Cc: Masami Hiramatsu &lt;mhiramat@kernel.org&gt;
Cc: Matt Fleming &lt;matt@readmodwrite.com&gt;
Cc: Mike Leach &lt;mike.leach@linaro.org&gt;
Cc: Namhyung Kim &lt;namhyung@kernel.org&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Ravi Bangoria &lt;ravi.bangoria@amd.com&gt;
Cc: Song Liu &lt;song@kernel.org&gt;
Cc: Steinar H. Gunderson &lt;sesse@google.com&gt;
Cc: Stephen Brennan &lt;stephen.s.brennan@oracle.com&gt;
Cc: Steve Clevenger &lt;scclevenger@os.amperecomputing.com&gt;
Cc: Thomas Falcon &lt;thomas.falcon@intel.com&gt;
Cc: Veronika Molnarova &lt;vmolnaro@redhat.com&gt;
Cc: Weilin Wang &lt;weilin.wang@intel.com&gt;
Cc: Will Deacon &lt;will@kernel.org&gt;
Cc: Yicong Yang &lt;yangyicong@hisilicon.com&gt;
Cc: Yujie Liu &lt;yujie.liu@intel.com&gt;
Cc: Zhongqiu Han &lt;quic_zhonhan@quicinc.com&gt;
Cc: Zixian Cai &lt;fzczx123@gmail.com&gt;
Link: https://lore.kernel.org/r/20250529044000.759937-2-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo &lt;acme@redhat.com&gt;
</content>
</entry>
<entry>
<title>perf machine: Factor creating a "live" machine out of dwarf-unwind</title>
<updated>2025-05-28T12:24:59Z</updated>
<author>
<name>Ian Rogers</name>
<email>irogers@google.com</email>
</author>
<published>2025-03-13T05:29:51Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=4c04654455c09b4905b84c350d88f718260a4cd4'/>
<id>urn:sha1:4c04654455c09b4905b84c350d88f718260a4cd4</id>
<content type='text'>
Factor out for use in places other than the dwarf unwinding tests for
libunwind.

Signed-off-by: Ian Rogers &lt;irogers@google.com&gt;
Cc: Adrian Hunter &lt;adrian.hunter@intel.com&gt;
Cc: Alexander Shishkin &lt;alexander.shishkin@linux.intel.com&gt;
Cc: Andi Kleen &lt;ak@linux.intel.com&gt;
Cc: Anne Macedo &lt;retpolanne@posteo.net&gt;
Cc: Dmitriy Vyukov &lt;dvyukov@google.com&gt;
Cc: Dr. David Alan Gilbert &lt;linux@treblig.org&gt;
Cc: Howard Chu &lt;howardchu95@gmail.com&gt;
Cc: Ingo Molnar &lt;mingo@redhat.com&gt;
Cc: James Clark &lt;james.clark@linaro.org&gt;
Cc: Jiri Olsa &lt;jolsa@kernel.org&gt;
Cc: Kan Liang &lt;kan.liang@linux.intel.com&gt;
Cc: Mark Rutland &lt;mark.rutland@arm.com&gt;
Cc: Michael Petlan &lt;mpetlan@redhat.com&gt;
Cc: Namhyung Kim &lt;namhyung@kernel.org&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Yicong Yang &lt;yangyicong@hisilicon.com&gt;
Link: https://lore.kernel.org/r/20250313052952.871958-1-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo &lt;acme@redhat.com&gt;
</content>
</entry>
<entry>
<title>perf report: Do not process non-JIT BPF ksymbol events</title>
<updated>2025-03-07T00:00:25Z</updated>
<author>
<name>Namhyung Kim</name>
<email>namhyung@kernel.org</email>
</author>
<published>2025-03-05T23:28:38Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=b0920abe0d529101bcf2eb1cd309032d2a42b4db'/>
<id>urn:sha1:b0920abe0d529101bcf2eb1cd309032d2a42b4db</id>
<content type='text'>
The length of PERF_RECORD_KSYMBOL for BPF is a size of JITed code so
it'd be 0 when it's not JITed.  The ksymbol is needed to symbolize the
code when it gets samples in the region but non-JITed code cannot get
samples.  Thus it'd be ok to ignore them.

Actually it caused a performance issue in the perf tools on old ARM
kernels where it can refuse to JIT some BPF codes.  It ended up
splitting the existing kernel map (kallsyms).  And later lookup for a
kernel symbol would create a new kernel map from kallsyms and then
split it again and again. :(

Probably there's a bug in the kernel map/symbol handling in perf tools.
But I think we need to fix this anyway.

Reported-by: Kevin Nomura &lt;nomurak@google.com&gt;
Acked-by: Song Liu &lt;song@kernel.org&gt;
Link: https://lore.kernel.org/r/20250305232838.128692-1-namhyung@kernel.org
Signed-off-by: Namhyung Kim &lt;namhyung@kernel.org&gt;
</content>
</entry>
<entry>
<title>perf machine: Fix insertion of PERF_RECORD_KSYMBOL related kernel maps</title>
<updated>2025-03-06T07:04:07Z</updated>
<author>
<name>Namhyung Kim</name>
<email>namhyung@kernel.org</email>
</author>
<published>2025-02-28T21:17:34Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=41453107bfc3008302c2b98cc01da55890235b77'/>
<id>urn:sha1:41453107bfc3008302c2b98cc01da55890235b77</id>
<content type='text'>
This was detected at the end of a 'perf record' session when build-id
collection was enabled and thus the BPF programs put in place while the
session was running, some even put in place by perf itself were
processed and inserted, with some overlaps related to BPF trampolines
and programs took place.

Using maps__fixup_overlap_and_insert() instead of maps__insert() "fixes"
the problem, in the sense that overlaps will be dealt with and then the
consistency will be kept, but it would be interesting to fully
understand why such overlaps take place and how to deal with them when
doing symbol resolution.

Reported-by: Arnaldo Carvalho de Melo &lt;acme@redhat.com&gt;
Suggested-by: Ian Rogers &lt;irogers@google.com&gt;
Tested-by: Arnaldo Carvalho de Melo &lt;acme@redhat.com&gt;
Reviewed-by: Ian Rogers &lt;irogers@google.com&gt;
Link: https://lore.kernel.org/lkml/CAP-5=fXEEMFgPF2aZhKsfrY_En+qoqX20dWfuE_ad73Uxf0ZHQ@mail.gmail.com
Link: https://lore.kernel.org/r/20250228211734.33781-7-acme@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo &lt;acme@redhat.com&gt;
Signed-off-by: Namhyung Kim &lt;namhyung@kernel.org&gt;
</content>
</entry>
<entry>
<title>perf machine: Fixup kernel maps ends after adding extra maps</title>
<updated>2025-03-06T07:03:15Z</updated>
<author>
<name>Namhyung Kim</name>
<email>namhyung@kernel.org</email>
</author>
<published>2025-02-28T21:17:31Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=f7a46e028c394cd422326caa7a2ad6ba0cd87915'/>
<id>urn:sha1:f7a46e028c394cd422326caa7a2ad6ba0cd87915</id>
<content type='text'>
I just noticed it would add extra kernel maps after modules.  I think it
should fixup end address of the kernel maps after adding all maps first.

Fixes: 876e80cf83d10585 ("perf tools: Fixup end address of modules")
Reported-by: Arnaldo Carvalho de Melo &lt;acme@redhat.com&gt;
Tested-by: Arnaldo Carvalho de Melo &lt;acme@redhat.com&gt;
Reviewed-by: Ian Rogers &lt;irogers@google.com&gt;
Link: https://lore.kernel.org/lkml/Z7TvZGjVix2asYWI@x1
Link: https://lore.kernel.org/lkml/Z712hzvv22Ni63f1@google.com
Link: https://lore.kernel.org/r/20250228211734.33781-4-acme@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo &lt;acme@redhat.com&gt;
Signed-off-by: Namhyung Kim &lt;namhyung@kernel.org&gt;
</content>
</entry>
<entry>
<title>perf machine: Reuse module path buffer</title>
<updated>2025-02-24T23:46:33Z</updated>
<author>
<name>Ian Rogers</name>
<email>irogers@google.com</email>
</author>
<published>2025-02-22T06:10:13Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=e7af1946818b302d1f8b83b6896e04e0fe8bc2eb'/>
<id>urn:sha1:e7af1946818b302d1f8b83b6896e04e0fe8bc2eb</id>
<content type='text'>
Rather than copying the path and appending the directory entry in a
fresh path buffer, append to the path at the end of where it is for
the recursion level. This saves a PATH_MAX buffer per recursion level
and some unnecessary copying.

Signed-off-by: Ian Rogers &lt;irogers@google.com&gt;
Link: https://lore.kernel.org/r/20250222061015.303622-9-irogers@google.com
Signed-off-by: Namhyung Kim &lt;namhyung@kernel.org&gt;
</content>
</entry>
<entry>
<title>perf maps: Switch modules tree walk to io_dir__readdir</title>
<updated>2025-02-24T23:46:33Z</updated>
<author>
<name>Ian Rogers</name>
<email>irogers@google.com</email>
</author>
<published>2025-02-22T06:10:07Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=f7cada5f7e7f51fa88428f267ef7216b769fd262'/>
<id>urn:sha1:f7cada5f7e7f51fa88428f267ef7216b769fd262</id>
<content type='text'>
Compared to glibc's opendir/readdir this lowers the max RSS of perf
record by 1.8MB on a Debian machine.

Acked-by: Namhyung Kim &lt;namhyung@kernel.org&gt;
Signed-off-by: Ian Rogers &lt;irogers@google.com&gt;
Link: https://lore.kernel.org/r/20250222061015.303622-3-irogers@google.com
Signed-off-by: Namhyung Kim &lt;namhyung@kernel.org&gt;
</content>
</entry>
<entry>
<title>perf tools: Improve startup time by reducing unnecessary stat() calls</title>
<updated>2025-02-19T21:55:59Z</updated>
<author>
<name>Krzysztof Łopatowski</name>
<email>krzysztof.m.lopatowski@gmail.com</email>
</author>
<published>2025-02-06T11:33:15Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=4bac7fb5862740087825eda3ed6168e91da8b7e6'/>
<id>urn:sha1:4bac7fb5862740087825eda3ed6168e91da8b7e6</id>
<content type='text'>
When testing perf trace on NixOS, I noticed significant startup delays:
- `ls`: ~2ms
- `strace ls`: ~10ms
- `perf trace ls`: ~550ms

Profiling showed that 51% of the time is spent reading files,
26% in loading BPF programs, and 11% in `newfstatat`.

This patch optimizes module path exploration by avoiding `stat()` calls
unless necessary. For filesystems that do not implement `d_type`
(DT_UNKNOWN), it falls back to the old behavior.
See `readdir(3)` for details.

This reduces `perf trace ls` time to ~500ms.

A more thorough startup optimization based on command parameters would
be ideal, but that is a larger effort.

Signed-off-by: Krzysztof Łopatowski &lt;krzysztof.m.lopatowski@gmail.com&gt;
Acked-by: Howard Chu &lt;howardchu95@gmail.com&gt;
Link: https://lore.kernel.org/r/20250206113314.335376-2-krzysztof.m.lopatowski@gmail.com
Signed-off-by: Namhyung Kim &lt;namhyung@kernel.org&gt;
</content>
</entry>
<entry>
<title>perf report: Add machine parallelism</title>
<updated>2025-02-18T06:00:50Z</updated>
<author>
<name>Dmitry Vyukov</name>
<email>dvyukov@google.com</email>
</author>
<published>2025-02-13T09:08:14Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=f13bc61b2e37957bf6e693ab5650d93fd21523f4'/>
<id>urn:sha1:f13bc61b2e37957bf6e693ab5650d93fd21523f4</id>
<content type='text'>
Add calculation of the current parallelism level (number of threads actively
running on CPUs). The parallelism level can be shown in reports on its own,
and to calculate latency overheads.

Signed-off-by: Dmitry Vyukov &lt;dvyukov@google.com&gt;
Reviewed-by: Andi Kleen &lt;ak@linux.intel.com&gt;
Link: https://lore.kernel.org/r/0f8c1b8eb12619029e31b3d5c0346f4616a5aeda.1739437531.git.dvyukov@google.com
Signed-off-by: Namhyung Kim &lt;namhyung@kernel.org&gt;
</content>
</entry>
<entry>
<title>perf sample: Make user_regs and intr_regs optional</title>
<updated>2025-02-13T04:06:11Z</updated>
<author>
<name>Ian Rogers</name>
<email>irogers@google.com</email>
</author>
<published>2025-01-13T19:43:45Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=dc6d2bc2d893a878e7b58578ff01b4738708deb4'/>
<id>urn:sha1:dc6d2bc2d893a878e7b58578ff01b4738708deb4</id>
<content type='text'>
The struct dump_regs contains 512 bytes of cache_regs, meaning the two
values in perf_sample contribute 1088 bytes of its total 1384 bytes
size. Initializing this much memory has a cost reported by Tavian
Barnes &lt;tavianator@tavianator.com&gt; as about 2.5% when running `perf
script --itrace=i0`:
https://lore.kernel.org/lkml/d841b97b3ad2ca8bcab07e4293375fb7c32dfce7.1736618095.git.tavianator@tavianator.com/

Adrian Hunter &lt;adrian.hunter@intel.com&gt; replied that the zero
initialization was necessary and couldn't simply be removed.

This patch aims to strike a middle ground of still zeroing the
perf_sample, but removing 79% of its size by make user_regs and
intr_regs optional pointers to zalloc-ed memory. To support the
allocation accessors are created for user_regs and intr_regs. To
support correct cleanup perf_sample__init and perf_sample__exit
functions are created and added throughout the code base.

Signed-off-by: Ian Rogers &lt;irogers@google.com&gt;
Link: https://lore.kernel.org/r/20250113194345.1537821-1-irogers@google.com
Signed-off-by: Namhyung Kim &lt;namhyung@kernel.org&gt;
</content>
</entry>
</feed>
