summaryrefslogtreecommitdiff
path: root/tools/perf/util
AgeCommit message (Collapse)Author
2025-11-13perf libbfd: Ensure libbfd is initialized prior to useIan Rogers
Multiple threads may be creating and destroying BFD objects in situations like `perf top`. Without appropriate initialization crashes may occur during libbfd's cache management. BFD's locks require recursive mutexes, add support for these. Committer testing: This happens only when building with 'make BUILD_NONDISTRO=1' and having the binutils-devel package (or equivalent) installed, i.e. linking with binutils devel files, an opt-in perf build. Before: root@x1:~# perf top perf: Segmentation fault -------- backtrace -------- <SNIP multiple failed attempts at printing a backtrace> root@x1:~# After this patch it works as before. Closes: https://lore.kernel.org/lkml/aQt66zhfxSA80xwt@gentoo.org/ Fixes: 95931d9a594dd0b5 ("perf libbfd: Move libbfd functionality to its own file") Reported-by: Guilherme Amadio <amadio@gentoo.org> Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2025-11-13perf header: Write bpf_prog (infos|btfs)_cnt to data fileThomas Falcon
With commit f0d0f978f3f5830a ("perf header: Don't write empty BPF/BTF info"), the write_bpf_( prog_info() | btf() ) functions exit without writing anything if env->bpf_prog.(infos| btfs)_cnt is zero. process_bpf_( prog_info() | btf() ), however, still expect a "count" value to exist in the data file. If btf information is empty, for example, process_bpf_btf will read garbage or some other data as the number of btf nodes in the data file. As a result, the data file will not be processed correctly. Instead, write the count to the data file and exit if it is zero. Fixes: f0d0f978f3f5830a ("perf header: Don't write empty BPF/BTF info") Reviewed-by: Ian Rogers <irogers@google.com> Signed-off-by: Thomas Falcon <thomas.falcon@intel.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2025-11-11perf stat: Align metric output without eventsNamhyung Kim
One of my concern in the perf stat output was the alignment in the metrics and shadow stats. I think it missed to calculate the basic output length using COUNTS_LEN and EVNAME_LEN but missed to add the unit length like "msec" and surround 2 spaces. I'm not sure why it's not printed below though. But anyway, now it shows correctly aligned metric output. $ perf stat true Performance counter stats for 'true': 859,772 task-clock # 0.395 CPUs utilized 0 context-switches # 0.000 /sec 0 cpu-migrations # 0.000 /sec 56 page-faults # 65.134 K/sec 1,075,022 instructions # 0.86 insn per cycle 1,255,911 cycles # 1.461 GHz 220,573 branches # 256.548 M/sec 7,381 branch-misses # 3.35% of all branches TopdownL1 # 19.2 % tma_retiring # 28.6 % tma_backend_bound # 9.5 % tma_bad_speculation # 42.6 % tma_frontend_bound 0.002174871 seconds time elapsed ^ | 0.002154000 seconds user | 0.000000000 seconds sys here Reviewed-by: Ian Rogers <irogers@google.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-11-11perf tool_pmu: Make core_wide and target_cpu json eventsIan Rogers
For the sake of better documentation, add core_wide and target_cpu to the tool.json. When the values of system_wide and user_requested_cpu_list are unknown, use the values from the global stat_config. Example output showing how '-a' modifies the values in `perf stat`: ``` $ perf stat -e core_wide,target_cpu true Performance counter stats for 'true': 0 core_wide 0 target_cpu 0.000993787 seconds time elapsed 0.001128000 seconds user 0.000000000 seconds sys $ perf stat -e core_wide,target_cpu -a true Performance counter stats for 'system wide': 1 core_wide 1 target_cpu 0.002271723 seconds time elapsed $ perf list ... tool: core_wide [1 if not SMT,if SMT are events being gathered on all SMT threads 1 otherwise 0. Unit: tool] ... target_cpu [1 if CPUs being analyzed,0 if threads/processes. Unit: tool] ... ``` Signed-off-by: Ian Rogers <irogers@google.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-11-11perf stat: Remove "unit" workarounds for metric-onlyIan Rogers
Remove code that tested the "unit" as in KB/sec for certain hard coded metric values and did workarounds. Signed-off-by: Ian Rogers <irogers@google.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-11-11perf stat: Fix default metricgroup display on hybridIan Rogers
The logic to skip output of a default metric line was firing on Alderlake and not displaying 'TopdownL1 (cpu_atom)'. Remove the need_full_name check as it is equivalent to the different PMU test in the cases we care about, merge the 'if's and flip the evsel of the PMU test. The 'if' is now basically saying, if the output matches the last printed output then skip the output. Before: ``` TopdownL1 (cpu_core) # 11.3 % tma_bad_speculation # 24.3 % tma_frontend_bound TopdownL1 (cpu_core) # 33.9 % tma_backend_bound # 30.6 % tma_retiring # 42.2 % tma_backend_bound # 25.0 % tma_frontend_bound (49.81%) # 12.8 % tma_bad_speculation # 20.0 % tma_retiring (59.46%) ``` After: ``` TopdownL1 (cpu_core) # 8.3 % tma_bad_speculation # 43.7 % tma_frontend_bound # 30.7 % tma_backend_bound # 17.2 % tma_retiring TopdownL1 (cpu_atom) # 31.9 % tma_backend_bound # 37.6 % tma_frontend_bound (49.66%) # 18.0 % tma_bad_speculation # 12.6 % tma_retiring (59.58%) ``` Signed-off-by: Ian Rogers <irogers@google.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-11-11perf stat: Remove hard coded shadow metricsIan Rogers
Now that the metrics are encoded in common json the hard coded printing means the metrics are shown twice. Remove the hard coded version. This means that when specifying events, and those events correspond to a hard coded metric, the metric will no longer be displayed. The metric will be displayed if the metric is requested. Due to the adhoc printing in the previous approach it was often found frustrating, the new approach avoids this. The default perf stat output on an alderlake now looks like: ``` $ perf stat -a -- sleep 1 Performance counter stats for 'system wide': 19,697 context-switches # nan cs/sec cs_per_second TopdownL1 (cpu_core) # 10.7 % tma_bad_speculation # 24.9 % tma_frontend_bound TopdownL1 (cpu_core) # 34.3 % tma_backend_bound # 30.1 % tma_retiring 6,593 page-faults # nan faults/sec page_faults_per_second 729,065,658 cpu_atom/cpu-cycles/ # nan GHz cycles_frequency (49.79%) 1,605,131,101 cpu_core/cpu-cycles/ # nan GHz cycles_frequency # 19.7 % tma_bad_speculation # 14.2 % tma_retiring (50.14%) # 37.3 % tma_frontend_bound (50.31%) 87,302,268 cpu_atom/branches/ # nan M/sec branch_frequency (60.27%) 512,046,956 cpu_core/branches/ # nan M/sec branch_frequency 1,111 cpu-migrations # nan migrations/sec migrations_per_second # 28.8 % tma_backend_bound (60.26%) 0.00 msec cpu-clock # 0.0 CPUs CPUs_utilized 392,509,323 cpu_atom/instructions/ # 0.6 instructions insn_per_cycle (60.19%) 2,990,369,310 cpu_core/instructions/ # 1.9 instructions insn_per_cycle 3,493,478 cpu_atom/branch-misses/ # 5.9 % branch_miss_rate (49.69%) 7,297,531 cpu_core/branch-misses/ # 1.4 % branch_miss_rate 1.006621701 seconds time elapsed ``` Signed-off-by: Ian Rogers <irogers@google.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-11-11perf jevents: Add metric DefaultShowEventsIan Rogers
Some Default group metrics require their events showing for consistency with perf's previous behavior. Add a flag to indicate when this is the case and use it in stat-display. As events are coming from Default metrics remove that default hardware and software events from perf stat. Following this change the default perf stat output on an alderlake looks like: ``` $ perf stat -a -- sleep 1 Performance counter stats for 'system wide': 20,550 context-switches # nan cs/sec cs_per_second TopdownL1 (cpu_core) # 9.0 % tma_bad_speculation # 28.1 % tma_frontend_bound TopdownL1 (cpu_core) # 29.2 % tma_backend_bound # 33.7 % tma_retiring 6,685 page-faults # nan faults/sec page_faults_per_second 790,091,064 cpu_atom/cpu-cycles/ # nan GHz cycles_frequency (49.83%) 2,563,918,366 cpu_core/cpu-cycles/ # nan GHz cycles_frequency # 12.3 % tma_bad_speculation # 14.5 % tma_retiring (50.20%) # 33.8 % tma_frontend_bound (50.24%) 76,390,322 cpu_atom/branches/ # nan M/sec branch_frequency (60.20%) 1,015,173,047 cpu_core/branches/ # nan M/sec branch_frequency 1,325 cpu-migrations # nan migrations/sec migrations_per_second # 39.3 % tma_backend_bound (60.17%) 0.00 msec cpu-clock # 0.000 CPUs utilized # 0.0 CPUs CPUs_utilized 554,347,072 cpu_atom/instructions/ # 0.64 insn per cycle # 0.6 instructions insn_per_cycle (60.14%) 5,228,931,991 cpu_core/instructions/ # 2.04 insn per cycle # 2.0 instructions insn_per_cycle 4,308,874 cpu_atom/branch-misses/ # 5.65% of all branches # 5.6 % branch_miss_rate (49.76%) 9,890,606 cpu_core/branch-misses/ # 0.97% of all branches # 1.0 % branch_miss_rate 1.005477803 seconds time elapsed ``` Signed-off-by: Ian Rogers <irogers@google.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-11-11perf jevents: Add set of common metrics based on default onesIan Rogers
Add support to getting a common set of metrics from a default table. It simplifies the generation to add json metrics at the same time. The metrics added are CPUs_utilized, cs_per_second, migrations_per_second, page_faults_per_second, insn_per_cycle, stalled_cycles_per_instruction, frontend_cycles_idle, backend_cycles_idle, cycles_frequency, branch_frequency and branch_miss_rate based on the shadow metric definitions. Following this change the default perf stat output on an alderlake looks like: ``` $ perf stat -a -- sleep 2 Performance counter stats for 'system wide': 0.00 msec cpu-clock # 0.000 CPUs utilized 77,739 context-switches 15,033 cpu-migrations 321,313 page-faults 14,355,634,225 cpu_atom/instructions/ # 1.40 insn per cycle (35.37%) 134,561,560,583 cpu_core/instructions/ # 3.44 insn per cycle (57.85%) 10,263,836,145 cpu_atom/cycles/ (35.42%) 39,138,632,894 cpu_core/cycles/ (57.60%) 2,989,658,777 cpu_atom/branches/ (42.60%) 32,170,570,388 cpu_core/branches/ (57.39%) 29,789,870 cpu_atom/branch-misses/ # 1.00% of all branches (42.69%) 165,991,152 cpu_core/branch-misses/ # 0.52% of all branches (57.19%) (software) # nan cs/sec cs_per_second TopdownL1 (cpu_core) # 11.9 % tma_bad_speculation # 19.6 % tma_frontend_bound (63.97%) TopdownL1 (cpu_core) # 18.8 % tma_backend_bound # 49.7 % tma_retiring (63.97%) (software) # nan faults/sec page_faults_per_second # nan GHz cycles_frequency (42.88%) # nan GHz cycles_frequency (69.88%) TopdownL1 (cpu_atom) # 11.7 % tma_bad_speculation # 29.9 % tma_retiring (50.07%) TopdownL1 (cpu_atom) # 31.3 % tma_frontend_bound (43.09%) (cpu_atom) # nan M/sec branch_frequency (43.09%) # nan M/sec branch_frequency (70.07%) # nan migrations/sec migrations_per_second TopdownL1 (cpu_atom) # 27.1 % tma_backend_bound (43.08%) (software) # 0.0 CPUs CPUs_utilized # 1.4 instructions insn_per_cycle (43.04%) # 3.5 instructions insn_per_cycle (69.99%) # 1.0 % branch_miss_rate (35.46%) # 0.5 % branch_miss_rate (65.02%) 2.005626564 seconds time elapsed ``` Signed-off-by: Ian Rogers <irogers@google.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-11-11perf expr: Add #target_cpu literalIan Rogers
For CPU nanoseconds a lot of the stat-shadow metrics use either task-clock or cpu-clock, the latter being used when target__has_cpu. Add a #target_cpu literal so that json metrics can perform the same test. Signed-off-by: Ian Rogers <irogers@google.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-11-11perf metricgroup: Add care to picking the evsel for displaying a metricIan Rogers
Rather than using the first evsel in the matched events, try to find the least shared non-tool evsel. The aim is to pick the first evsel that typifies the metric within the list of metrics. This addresses an issue where Default metric group metrics may lose their counter value due to how the stat displaying hides counters for default event/metric output. For a metricgroup like TopdownL1 on an Intel Alderlake the change is, before there are 4 events with metrics: ``` $ perf stat -M topdownL1 -a sleep 1 Performance counter stats for 'system wide': 7,782,334,296 cpu_core/TOPDOWN.SLOTS/ # 10.4 % tma_bad_speculation # 19.7 % tma_frontend_bound 2,668,927,977 cpu_core/topdown-retiring/ # 35.7 % tma_backend_bound # 34.1 % tma_retiring 803,623,987 cpu_core/topdown-bad-spec/ 167,514,386 cpu_core/topdown-heavy-ops/ 1,555,265,776 cpu_core/topdown-fe-bound/ 2,792,733,013 cpu_core/topdown-be-bound/ 279,769,310 cpu_atom/TOPDOWN_RETIRING.ALL/ # 12.2 % tma_retiring # 15.1 % tma_bad_speculation 457,917,232 cpu_atom/CPU_CLK_UNHALTED.CORE/ # 38.4 % tma_backend_bound # 34.2 % tma_frontend_bound 783,519,226 cpu_atom/TOPDOWN_FE_BOUND.ALL/ 10,790,192 cpu_core/INT_MISC.UOP_DROPPING/ 879,845,633 cpu_atom/TOPDOWN_BE_BOUND.ALL/ ``` After there are 6 events with metrics: ``` $ perf stat -M topdownL1 -a sleep 1 Performance counter stats for 'system wide': 2,377,551,258 cpu_core/TOPDOWN.SLOTS/ # 7.9 % tma_bad_speculation # 36.4 % tma_frontend_bound 480,791,142 cpu_core/topdown-retiring/ # 35.5 % tma_backend_bound 186,323,991 cpu_core/topdown-bad-spec/ 65,070,590 cpu_core/topdown-heavy-ops/ # 20.1 % tma_retiring 871,733,444 cpu_core/topdown-fe-bound/ 848,286,598 cpu_core/topdown-be-bound/ 260,936,456 cpu_atom/TOPDOWN_RETIRING.ALL/ # 12.4 % tma_retiring # 17.6 % tma_bad_speculation 419,576,513 cpu_atom/CPU_CLK_UNHALTED.CORE/ 797,132,597 cpu_atom/TOPDOWN_FE_BOUND.ALL/ # 38.0 % tma_frontend_bound 3,055,447 cpu_core/INT_MISC.UOP_DROPPING/ 671,014,164 cpu_atom/TOPDOWN_BE_BOUND.ALL/ # 32.0 % tma_backend_bound ``` Signed-off-by: Ian Rogers <irogers@google.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-11-11perf tools: Fix missing feature check for inherit + SAMPLE_READNamhyung Kim
It should also have PERF_SAMPLE_TID to enable inherit and PERF_SAMPLE_READ on recent kernels. Not having _TID makes the feature check wrongly detect the inherit and _READ support. It was reported that the following command failed due to the error in the missing feature check on Intel SPR machines. $ perf record -e '{cpu/mem-loads-aux/S,cpu/mem-loads,ldlat=3/PS}' -- ls Error: Failure to open event 'cpu/mem-loads,ldlat=3/PS' on PMU 'cpu' which will be removed. Invalid event (cpu/mem-loads,ldlat=3/PS) in per-thread mode, enable system wide with '-a'. Reviewed-by: Ian Rogers <irogers@google.com> Fixes: 3b193a57baf15c468 ("perf tools: Detect missing kernel features properly") Reported-and-tested-by: Chen, Zide <zide.chen@intel.com> Closes: https://lore.kernel.org/lkml/20251022220802.1335131-1-zide.chen@intel.com/ Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-11-10perf symbol: Remove unneeded semicolonChen Ni
Remove unnecessary semicolons reported by Coccinelle/coccicheck and the semantic patch at scripts/coccinelle/misc/semicolon.cocci. Signed-off-by: Chen Ni <nichen@iscas.ac.cn> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-11-09perf pmu: Make pmu_alias_terms weak againIan Rogers
The terms for a json event should be weak so they don't override command line options. Before: ``` $ perf record -vv -c 1000 -e uops_issued.any -o /dev/null true 2>&1 |grep "{ sample_period, sample_freq }" { sample_period, sample_freq } 200003 { sample_period, sample_freq } 2000003 { sample_period, sample_freq } 1000 ``` After: ``` $ perf record -vv -c 1000 -e uops_issued.any -o /dev/null true 2>&1 |grep "{ sample_period, sample_freq }" { sample_period, sample_freq } 1000 { sample_period, sample_freq } 1000 { sample_period, sample_freq } 1000 ``` Fixes: 84bae3af20d0 ("perf pmu: Don't eagerly parse event terms") Signed-off-by: Ian Rogers <irogers@google.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-11-07perf tool: Add a delegate_tool that just delegates actions to another toolIan Rogers
Add an ability to be able to compose perf_tools, by having one perform an action and then calling a delegate. Currently the perf_tools have if-then-elses setting the callback and then if-then-elses within the callback. Understanding the behavior is complex as it is in two places and logic for numerous operations, within things like perf inject, is interwoven. By chaining perf_tools together based on command line options this kind of code can be avoided. Signed-off-by: Ian Rogers <irogers@google.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-11-07perf tool: Add the perf_tool argument to all callbacksIan Rogers
Getting context for what a tool is doing, such as the perf_inject instance, using container_of the tool is a common pattern in the code. This isn't possible event_op2, event_op3 and event_op4 callbacks as the tool isn't passed. Add the argument and then fix function signatures to match. As tools maybe reading a tool from somewhere else, change that code to use the passed in tool. Signed-off-by: Ian Rogers <irogers@google.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-11-06perf record: Make sure to update build-ID cacheNamhyung Kim
Recent change on enabling --buildid-mmap by default brought an issue with build-id handling. With build-ID in MMAP2 records, we don't need to save the build-ID table in the header of a perf data file. But the actual file contents still need to be cached in the debug directory for annotation etc. Split the build-ID header processing and caching and make sure perf record to save hit DSOs in the build-ID cache by moving perf_session__cache_build_ids() to the end of the record__ finish_output(). Reviewed-by: Ian Rogers <irogers@google.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-11-03perf metricgroup: When copy metrics copy default informationIan Rogers
When copy metrics into a group also copy default information from the original metrics. Signed-off-by: Ian Rogers <irogers@google.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-11-03perf metricgroup: Missed free on error pathIan Rogers
If an out-of-memory occurs the expr also needs freeing. Signed-off-by: Ian Rogers <irogers@google.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-11-03perf metricgroup: Update comment on location of metric_event listIan Rogers
Update comment as the stat_config no longer holds all metrics. Signed-off-by: Ian Rogers <irogers@google.com> Fixes: faebee18d720 ("perf stat: Move metric list from config to evlist") Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-11-03perf evsel: Remove unused metric_events variableIan Rogers
The metric_events exist in the metric_expr list and so this variable has been unused for a while. Signed-off-by: Ian Rogers <irogers@google.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-11-03perf symbols: Handle '1' symbols in /proc/kallsymsArnaldo Carvalho de Melo
I started seeing this in recent Fedora 42 kernels: root@x1:~# uname -a Linux x1 6.17.4-200.fc42.x86_64 #1 SMP PREEMPT_DYNAMIC Sun Oct 19 18:47:49 UTC 2025 x86_64 GNU/Linux root@x1:~# root@x1:~# perf test 1 1: vmlinux symtab matches kallsyms : FAILED! root@x1:~# Related to: root@x1:~# grep ' 1 ' /proc/kallsyms ffffffffb098bc00 1 __pfx__RNCINvNtNtNtCsfwaGRd4cjqE_4core4iter8adapters3map12map_try_foldjNtCskFudTml27HW_12drm_panic_qr7VersionuINtNtNtBa_3ops12control_flow11ControlFlowB10_ENcB10_0NCINvNvNtNtNtB8_6traits8iterator8Iterator4find5checkB10_NCNvMB12_B10_13from_segments0E0E0B12_ ffffffffb098bc10 1 _RNCINvNtNtNtCsfwaGRd4cjqE_4core4iter8adapters3map12map_try_foldjNtCskFudTml27HW_12drm_panic_qr7VersionuINtNtNtBa_3ops12control_flow11ControlFlowB10_ENcB10_0NCINvNvNtNtNtB8_6traits8iterator8Iterator4find5checkB10_NCNvMB12_B10_13from_segments0E0E0B12_ root@x1:~# That is found in: root@x1:~# pahole --running_kernel_vmlinux /usr/lib/debug/lib/modules/6.17.4-200.fc42.x86_64/vmlinux root@x1:~# root@x1:~# readelf -sW /usr/lib/debug/lib/modules/6.17.4-200.fc42.x86_64/vmlinux | grep __pfx__RNCINvNtNtNtCsfwaGRd4cjqE_4core4iter8adapters3map12map_try_foldjNtCskFudTml27HW_12drm_panic_qr7VersionuINtNtNtBa_3ops12control_flow11ControlFlowB10_ENcB10_0NCINvNvNtNtNtB8_6traits8iterator8Iterator4find5checkB10_NCNvMB12_B10_13from_segments0E0E0B12_ 150649: ffffffff81f8bc00 16 FUNC LOCAL DEFAULT 1 __pfx__RNCINvNtNtNtCsfwaGRd4cjqE_4core4iter8adapters3map12map_try_foldjNtCskFudTml27HW_12drm_panic_qr7VersionuINtNtNtBa_3ops12control_flow11ControlFlowB10_ENcB10_0NCINvNvNtNtNtB8_6traits8iterator8Iterator4find5checkB10_NCNvMB12_B10_13from_segments0E0E0B12_ root@x1:~# But was being filtered out when reading /proc/kallsyms, as the '1' symbol type was not being handled, do it, there are just two of them at this point. Cc: Alex Gaynor <alex.gaynor@gmail.com> Cc: Alice Ryhl <aliceryhl@google.com> Cc: Andreas Hindborg <a.hindborg@kernel.org> Cc: Benno Lossin <lossin@kernel.org> Cc: Björn Roy Baron <bjorn3_gh@protonmail.com> Cc: Boqun Feng <boqun.feng@gmail.com> Cc: Danilo Krummrich <dakr@kernel.org> Cc: Gary Guo <gary@garyguo.net> Cc: Miguel Ojeda <ojeda@kernel.org> Cc: Trevor Gross <tmgross@umich.edu> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2025-10-31perf tools: Cache counter names for raw samples on s390Ian Rogers
Searching all event names is slower now that legacy names are included. Add a cache to avoid long iterative searches. Note, the cache isn't cleaned up and is as such a memory leak, however, globally reachable leaks like this aren't treated as leaks by leak sanitizer. Reported-by: Thomas Richter <tmricht@linux.ibm.com> Closes: https://lore.kernel.org/linux-perf-users/09943f4f-516c-4b93-877c-e4a64ed61d38@linux.ibm.com/ Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: Thomas Richter <tmricht@linux.ibm.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-10-31perf trace: Increase syscall handler map size to 1024Namhyung Kim
The syscalls_sys_{enter,exit} map in augmented_raw_syscalls.bpf.c has max entries of 512. Usually syscall numbers are smaller than this but x86 has x32 ABI where syscalls start from 512. That makes trace__init_syscalls_bpf_prog_array_maps() fail in the middle of the loop when it accesses those keys. As the loop iteration is not ordered by syscall numbers anymore, the failure can affect non-x32 syscalls. Let's increase the map size to 1024 so that it can handle those ABIs too. While most systems won't need this, increasing the size will be safer for potential future changes. Reviewed-by: Howard Chu <howardchu95@gmail.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-10-31perf lock contention: Load kernel map before lookupNamhyung Kim
On some machines, it caused troubles when it tried to find kernel symbols. I think it's because kernel modules and kallsyms are messed up during load and split. Basically we want to make sure the kernel map is loaded and the code has it in the lock_contention_read(). But recently we added more lookups in the lock_contention_prepare() which is called before _read(). Also the kernel map (kallsyms) may not be the first one in the group like on ARM. Let's use machine__kernel_map() rather than just loading the first map. Reviewed-by: Ian Rogers <irogers@google.com> Fixes: 688d2e8de231c54e ("perf lock contention: Add -l/--lock-addr option") Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-10-27perf hwmon_pmu: Fix uninitialized variable warningMichal Suchanek
The line_len is only set on success. Check the return value instead. util/hwmon_pmu.c: In function ‘perf_pmus__read_hwmon_pmus’: util/hwmon_pmu.c:742:20: warning: ‘line_len’ may be used uninitialized [-Wmaybe-uninitialized] 742 | if (line_len > 0 && line[line_len - 1] == '\n') | ^ util/hwmon_pmu.c:719:24: note: ‘line_len’ was declared here 719 | size_t line_len; Fixes: 53cc0b351ec9 ("perf hwmon_pmu: Add a tool PMU exposing events from hwmon in sysfs") Signed-off-by: Michal Suchanek <msuchanek@suse.de> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-10-25perf auxtrace: Add auxtrace_synth_id_range_start() helpertanze
To avoid hardcoding the offset value for synthetic event IDs in multiple auxtrace modules (arm-spe, cs-etm, intel-pt, etc.), and to improve code reusability, this patch unifies the handling of the ID offset via a dedicated helper function. Signed-off-by: tanze <tanze@kylinos.cn> Reviewed-by: Adrian Hunter <adrian.hunter@intel.com> Tested-by: Leo Yan <leo.yan@arm.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-10-25perf stat: Add/fix bperf cgroup max events workaroundsIan Rogers
Commit b8308511f6e0 bumped the max events to 1024 but this results in BPF verifier issues if the number of command line events is too large. Workaround this by: 1) moving the constants to a header file to share between BPF and perf C code, 2) testing that the maximum number of events doesn't cause BPF verifier issues in debug builds, 3) lower the max events from 1024 to 128, 4) in perf stat, if there are more events than the BPF counters can support then disable BPF counter usage. The rodata setup is factored into its own function to avoid duplicating it in the testing code. Signed-off-by: Ian Rogers <irogers@google.com> Fixes: b8308511f6e0 ("perf stat bperf cgroup: Increase MAX_EVENTS from 32 to 1024") Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-10-23perf cs-etm: Mute enumeration value warningLeo Yan
When the OpenCSD library introduces a new enumeration value (for example, in the v1.7.1 release), the perf build fails with an error: util/cs-etm-decoder/cs-etm-decoder.c:600:10: error: enumeration value 'OCSD_GEN_TRC_ELEM_ITMTRACE' not explicitly handled in switch [-Werror, -Wswitch-enum] 600 | switch (elem->elem_type) { | ^~~~~~~~~~~~~~~ 1 error generated. Convert to if-else sentences to mute the enumeration value warning, which can avoid build failures whenever the lib is updated. No functional change. Suggested-by: Suzuki K Poulose <suzuki.poulose@arm.com> Signed-off-by: Leo Yan <leo.yan@arm.com> Reviewed-by: James Clark <james.clark@linaro.org> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-10-23tools: arm64: Add Cortex-A720AE definitionsKuninori Morimoto
Add cputype definitions for Cortex-A720AE. These will be used for errata detection in subsequent patches. These values can be found in the Cortex-A720AE TRM: https://developer.arm.com/documentation/102828/0001/ ... in Table A-187 Signed-off-by: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com> Reviewed-by: Leo Yan <leo.yan@arm.com> Acked-by: Will Deacon <will@kernel.org> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-10-21perf annotate: Save pointer offset in stack stateZecheng Li
The tracked pointer offset was not being preserved in the stack state, which could lead to incorrect type analysis. This change adds a ptr_offset field to the type_state_stack struct and passes it to set_stack_state and findnew_stack_state to ensure the offset is preserved after the pointer is loaded from a stack location. It improves the type annotation coverage and quality. Signed-off-by: Zecheng Li <zecheng@google.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-10-21perf annotate: Track arithmetic instructions on pointersZecheng Li
Track the arithmetic operations on registers with pointer types. We handle only add, sub and lea instructions. The original pointer information needs to be preserved for getting outermost struct types. For example, reg0 points to a struct cfs_rq, when we add 0x10 to reg0, it should preserve the information of struct cfs_rq + 0x10 in the register instead of a pointer type to the child field at 0x10. Details: 1. struct type_state_reg now includes an offset, indicating if the register points to the start or an internal part of its associated type. This offset is used in mem to reg and reg to stack mem transfers, and also applied to the final type offset. 2. lea offset(%sp/%fp), reg is now treated as taking the address of a stack variable. It worked fine in most cases, but an issue with this approach is the pointer type may not exist. 3. lea offset(%base), reg is handled by moving the type from %base and adding an offset, similar to an add operation followed by a mov reg to reg. 4. Non-stack variables from DWARF with non-zero offsets in their location expressions are now accepted with register offset tracking. Multi-register addressing modes in LEA are not supported. Signed-off-by: Zecheng Li <zecheng@google.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-10-21perf annotate: Track address registers via TSR_KIND_POINTERZecheng Li
Introduce TSR_KIND_POINTER to improve the data type profiler's ability to track pointer-based memory accesses and address register variables. TSR_KIND_POINTER represents that the location holds a pointer type to the type in the type state. The semantics match the `breg` registers that describe a memory location. This change implements handling for this new kind in mov instructions and in the check_matching_type() function. When a TSR_KIND_POINTER is moved to the stack, the stack state size is set to the architecture's pointer size. Signed-off-by: Zecheng Li <zecheng@google.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-10-21perf annotate: Skip annotating data types to lea instructionsZecheng Li
Introduce a helper function is_address_gen_insn() to check arch-dependent address generation instructions like lea in x86. Remove type annotation on these instructions since they are not accessing memory. It should be counted as `no_mem_ops`. Signed-off-by: Zecheng Li <zecheng@google.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-10-21perf annotate: Check return value of evsel__get_arch() properlyTianyou Li
Check the error code of evsel__get_arch() in the symbol__annotate(). Previously it checked non-zero value but after the refactoring it does only for negative values. Fixes: 0669729eb0afb0cf ("perf annotate: Factor out evsel__get_arch()") Suggested-by: James Clark <james.clark@linaro.org> Acked-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Tianyou Li <tianyou.li@intel.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-10-21perf annotate: fix a crash when annotate the same symbol with 's' and 'T'Tianyou Li
When perf report with annotation for a symbol, press 's' and 'T', then exit the annotate browser. Once annotate the same symbol, the annotate browser will crash. The browser.arch was required to be correctly updated when data type feature was enabled by 'T'. Usually it was initialized by symbol__annotate2 function. If a symbol has already been correctly annotated at the first time, it should not call the symbol__annotate2 function again, thus the browser.arch will not get initialized. Then at the second time to show the annotate browser, the data type needs to be displayed but the browser.arch is empty. Stack trace as below: Perf: Segmentation fault -------- backtrace -------- #0 0x55d365 in ui__signal_backtrace setup.c:0 #1 0x7f5ff1a3e930 in __restore_rt libc.so.6[3e930] #2 0x570f08 in arch__is perf[570f08] #3 0x562186 in annotate_get_insn_location perf[562186] #4 0x562626 in __hist_entry__get_data_type annotate.c:0 #5 0x56476d in annotation_line__write perf[56476d] #6 0x54e2db in annotate_browser__write annotate.c:0 #7 0x54d061 in ui_browser__list_head_refresh perf[54d061] #8 0x54dc9e in annotate_browser__refresh annotate.c:0 #9 0x54c03d in __ui_browser__refresh browser.c:0 #10 0x54ccf8 in ui_browser__run perf[54ccf8] #11 0x54eb92 in __hist_entry__tui_annotate perf[54eb92] #12 0x552293 in do_annotate hists.c:0 #13 0x55941c in evsel__hists_browse hists.c:0 #14 0x55b00f in evlist__tui_browse_hists perf[55b00f] #15 0x42ff02 in cmd_report perf[42ff02] #16 0x494008 in run_builtin perf.c:0 #17 0x494305 in handle_internal_command perf.c:0 #18 0x410547 in main perf[410547] #19 0x7f5ff1a295d0 in __libc_start_call_main libc.so.6[295d0] #20 0x7f5ff1a29680 in __libc_start_main@@GLIBC_2.34 libc.so.6[29680] #21 0x410b75 in _start perf[410b75] Fixes: 1d4374afd000 ("perf annotate: Add 'T' hot key to toggle data type display") Reviewed-by: James Clark <james.clark@linaro.org> Tested-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Tianyou Li <tianyou.li@intel.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-10-21perf annotate: Fix build with NO_SLANG=1Namhyung Kim
The recent change for perf c2c annotate broke build without slang support like below. builtin-annotate.c: In function 'hists__find_annotations': builtin-annotate.c:522:73: error: 'NO_ADDR' undeclared (first use in this function); did you mean 'NR_ADDR'? 522 | key = hist_entry__tui_annotate(he, evsel, NULL, NO_ADDR); | ^~~~~~~ | NR_ADDR builtin-annotate.c:522:73: note: each undeclared identifier is reported only once for each function it appears in builtin-annotate.c:522:31: error: too many arguments to function 'hist_entry__tui_annotate' 522 | key = hist_entry__tui_annotate(he, evsel, NULL, NO_ADDR); | ^~~~~~~~~~~~~~~~~~~~~~~~ In file included from util/sort.h:6, from builtin-annotate.c:28: util/hist.h:756:19: note: declared here 756 | static inline int hist_entry__tui_annotate(struct hist_entry *he __maybe_unused, | ^~~~~~~~~~~~~~~~~~~~~~~~ And I noticed that it missed to update the other side of #ifdef HAVE_SLANG_SUPPORT. Let's fix it. Cc: Tianyou Li <tianyou.li@intel.com> Fixes: cd3466cd2639783d ("perf c2c: Add annotation support to perf c2c report") Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-10-20perf parse-events: Make X modifier more respectful of groupsIan Rogers
Events with an X modifier were reordered within a group, for example slots was made the leader in: ``` $ perf record -e '{cpu/mem-stores/ppu,cpu/slots/uX}' -- sleep 1 ``` Fix by making `dont_regroup` evsels always use their index for sorting. Make the cur_leader, when fixing the groups, be that of `dont_regroup` evsel so that the `dont_regroup` evsel doesn't become a leader. On a tigerlake this patch corrects this and meets expectations in: ``` $ perf stat -e '{cpu/mem-stores/,cpu/slots/uX}' -a -- sleep 0.1 Performance counter stats for 'system wide': 83,458,652 cpu/mem-stores/ 2,720,854,880 cpu/slots/uX 0.103780587 seconds time elapsed $ perf stat -e 'slots,slots:X' -a -- sleep 0.1 Performance counter stats for 'system wide': 732,042,247 slots (48.96%) 643,288,155 slots:X (51.04%) 0.102731018 seconds time elapsed ``` Closes: https://lore.kernel.org/lkml/18f20d38-070c-4e17-bc90-cf7102e1e53d@linux.intel.com/ Fixes: 035c17893082 ("perf parse-events: Add 'X' modifier to exclude an event from being regrouped") Signed-off-by: Ian Rogers <irogers@google.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-10-19perf c2c annotate: Start from the contention lineTianyou Li
Add support to highlight the contention line in the annotate browser, use 'TAB'/'UNTAB' to refocus to the contention line. Signed-off-by: Tianyou Li <tianyou.li@intel.com> Reviewed-by: Dapeng Mi <dapeng1.mi@linux.intel.com> Reviewed-by: Thomas Falcon <thomas.falcon@intel.com> Reviewed-by: Jiebin Sun <jiebin.sun@intel.com> Reviewed-by: Pan Deng <pan.deng@intel.com> Reviewed-by: Zhiguo Zhou <zhiguo.zhou@intel.com> Reviewed-by: Wangyang Guo <wangyang.guo@intel.com> Tested-by: Ravi Bangoria <ravi.bangoria@amd.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-10-19perf stat bperf cgroup: Increase MAX_EVENTS from 32 to 1024Ian Rogers
The MAX_EVENTS value ensured a counted loop presumably to satisfy the BPF verifier. It is possible to go past 32 events when gathering uncore events. Increase the amount to 1024 as that should provide some amount of headroom. Signed-off-by: Ian Rogers <irogers@google.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-10-19perf python: Add PMU argument to parse_metricsIan Rogers
Add an optional PMU argument to parse_metrics to allow restriction of the particular metrics to be opened. If no argument is provided then all metrics with the given name/group are opened Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Gautam Menghani <gautam@linux.ibm.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-10-15perf evsel: Improvements to __evsel__matchIan Rogers
Ensure both the perf_event_attr and alternate_hw_config are checked in the match. Don't mask the config if the perf_event_attr isn't a HARDWARE or HW_CACHE event. Add common early exit cases. Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: James Clark <james.clark@linaro.org> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-10-15perf evlist: Avoid scanning all PMUs for evlist__new_defaultIan Rogers
Rather than wildcard matching the cycles event specify only the core PMUs. This avoids potentially loading unnecessary uncore PMUs. Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: James Clark <james.clark@linaro.org> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-10-15perf parse-events: Remove hard coded legacy hardware and cache parsingIan Rogers
Now that legacy hardware and cache events are in json, having the lexer match the specific event is no longer necessary and generic PMU parsing can take place. Because of this remove the specific term parsing, event adding, and passing of alternate_hw_config which was now always PERF_COUNT_HW_MAX. This mirrors a similar change for software events in commit 6e9fa4131abb ("perf parse-events: Remove non-json software events"). With no hard coded legacy hardware or cache events the wild card, case insensitivity, etc. is consistent for events. This does, however, mean events like cycles will wild card against all PMUs. A change does the same was originally posted and merged from: https://lore.kernel.org/r/20240416061533.921723-10-irogers@google.com and reverted by Linus in commit 4f1b067359ac ("Revert "perf parse-events: Prefer sysfs/JSON hardware events over legacy"") due to his dislike for the cycles behavior on ARM. Earlier patches in this series make perf record event opening failures non-fatal and hide the cycles event's failure to open on ARM in perf record, so it is expected the behavior will now be transparent in perf record. perf stat with a cycles event will wildcard open the event on all PMUs. As cycles is a "default event", the perf stat behavior for default events was updated to only open them on core/software PMUs. The change to support legacy events with PMUs was done to clean up Intel's hybrid PMU implementation. Having sysfs/json events with increased priority to legacy was requested by Mark Rutland <mark.rutland@arm.com> to fix Apple-M PMU issues wrt broken legacy events on that PMU. It was requested that RISC-V be able to add events to the perf tool json so the PMU driver didn't need to map legacy events to config encodings: https://lore.kernel.org/lkml/20240217005738.3744121-1-atishp@rivosinc.com/ A previous series of patches decreasing legacy hardware event priorities was posted in: https://lore.kernel.org/lkml/20250416045117.876775-1-irogers@google.com/ Namhyung Kim <namhyung@kernel.org> mentioned that hardware and software events can be implemented similarly: https://lore.kernel.org/lkml/aIJmJns2lopxf3EK@google.com/ Tested-by: Thomas Richter <tmricht@linux.ibm.com> Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: James Clark <james.clark@linaro.org> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-10-15perf print-events: Remove print_symbol_eventsIan Rogers
Now legacy hardware events are in json there's no need for a specific printing routine that previously served for both hardware and software events. The associated event_symbols_hw is also removed. To support the previous filtered version use an event glob of "legacy hardware" which matches the topic of the json events. Tested-by: Thomas Richter <tmricht@linux.ibm.com> Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: James Clark <james.clark@linaro.org> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-10-15perf print-events: Remove print_hwcache_eventsIan Rogers
Now legacy cache events are in json there's no need for a specific printing routine. To support the previous filtered version use an event glob of "legacy cache" which matches the topic of the json events. Tested-by: Thomas Richter <tmricht@linux.ibm.com> Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: James Clark <james.clark@linaro.org> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-10-15perf pmu: Add and use legacy_terms in alias informationIan Rogers
Add support to finding/adding events from the default_core event table. If an event already exists from sysfs/json then the default_core configuration is saved in the legacy_terms string. Lazily use the legacy_terms string to set a legacy hardware or cache event as deprecated if the core PMU doesn't support it. Use the legacy terms string to set the alternate_hw_config, avoiding the value needing to be passed from the parse_events parser. Tested-by: Thomas Richter <tmricht@linux.ibm.com> Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: James Clark <james.clark@linaro.org> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-10-15perf parse-events: Add terms for legacy hardware and cache config valuesIan Rogers
Add the PMU terms legacy-hardware-config and legacy-cache-config. These terms are similar to the config term in that their values are assigned to the perf_event_attr config value. They differ in that the PMU type is switched to be either PERF_TYPE_HARDWARE or PERF_TYPE_HW_CACHE, and the PMU type is moved into the extended type information of the config value. This will allow later patches to add legacy events to json. An example use of the terms is in the following: ``` $ perf stat -vv -e 'cpu/legacy-hardware-config=1/,cpu/legacy-cache-config=0x10001/' true Using CPUID GenuineIntel-6-8D-1 Attempt to add: cpu/legacy-hardware-config=0x1/ ..after resolving event: cpu/legacy-hardware-config=0x1/ Attempt to add: cpu/legacy-cache-config=0x10001/ ..after resolving event: cpu/legacy-cache-config=0x10001/ Control descriptor is not initialized ------------------------------------------------------------ perf_event_attr: type 0 (PERF_TYPE_HARDWARE) size 136 config 0x1 (PERF_COUNT_HW_INSTRUCTIONS) sample_type IDENTIFIER read_format TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING disabled 1 inherit 1 enable_on_exec 1 ------------------------------------------------------------ sys_perf_event_open: pid 994937 cpu -1 group_fd -1 flags 0x8 = 3 ------------------------------------------------------------ perf_event_attr: type 3 (PERF_TYPE_HW_CACHE) size 136 config 0x10001 (PERF_COUNT_HW_CACHE_RESULT_MISS | PERF_COUNT_HW_CACHE_OP_READ | PERF_COUNT_HW_CACHE_L1I) sample_type IDENTIFIER read_format TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING disabled 1 inherit 1 enable_on_exec 1 ------------------------------------------------------------ sys_perf_event_open: pid 994937 cpu -1 group_fd -1 flags 0x8 = 4 cpu/legacy-hardware-config=1/: -1: 1364046 414756 414756 cpu/legacy-cache-config=0x10001/: -1: 57453 414756 414756 cpu/legacy-hardware-config=1/: 1364046 414756 414756 cpu/legacy-cache-config=0x10001/: 57453 414756 414756 Performance counter stats for 'true': 1,364,046 cpu/legacy-hardware-config=1/ 57,453 cpu/legacy-cache-config=0x10001/ 0.001988593 seconds time elapsed 0.002194000 seconds user 0.000000000 seconds sys ``` Tested-by: Thomas Richter <tmricht@linux.ibm.com> Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: James Clark <james.clark@linaro.org> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-10-15perf pmu: Factor term parsing into a perf_event_attr into a helperIan Rogers
Factor existing functionality in perf_pmu__name_from_config into a helper that will be used in later patches. Tested-by: Thomas Richter <tmricht@linux.ibm.com> Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: James Clark <james.clark@linaro.org> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-10-15perf pmu: Use fd rather than FILE from new_aliasIan Rogers
The FILE argument was necessary for the scanner but now that functionality is not being used we can switch to just using io__getline which should cut down on stdio buffer usage. Tested-by: Thomas Richter <tmricht@linux.ibm.com> Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: James Clark <james.clark@linaro.org> Signed-off-by: Namhyung Kim <namhyung@kernel.org>