path: root/tools/perf/util/arm64-frame-pointer-unwind-support.c
Age  Commit message  Author
2026-01-23  perf regs: Refactor use of arch__sample_reg_masks() to perf_reg_name()  Ian Rogers

arch__sample_reg_masks isn't supported on ARM(32), csky, loongarch, MIPS, RISC-V and s390. The table returned by the function just has the name of a register paired with the corresponding sample_regs_user mask value. For a given perf register we can compute the name with perf_reg_name and the mask is just 1 left-shifted by the perf register number.

Change __parse_regs to use this method for finding registers rather than arch__sample_reg_masks, thereby adding __parse_regs support for ARM(32), csky, loongarch, MIPS, RISC-V and s390. As arch__sample_reg_masks is then unused, remove the now unneeded declarations.

Signed-off-by: Ian Rogers <irogers@google.com>
Tested-by: Thomas Richter <tmricht@linux.ibm.com>
Cc: Aditya Bodkhe <aditya.b1@linux.ibm.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Alexandre Ghiti <alex@ghiti.fr>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Athira Rajeev <atrajeev@linux.ibm.com>
Cc: Chun-Tse Shao <ctshao@google.com>
Cc: Dapeng Mi <dapeng1.mi@linux.intel.com>
Cc: Dmitriy Vyukov <dvyukov@google.com>
Cc: Dr. David Alan Gilbert <linux@treblig.org>
Cc: Guo Ren <guoren@kernel.org>
Cc: Haibo Xu <haibo1.xu@intel.com>
Cc: Howard Chu <howardchu95@gmail.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@linaro.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Krzysztof Łopatowski <krzysztof.m.lopatowski@gmail.com>
Cc: Leo Yan <leo.yan@linux.dev>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Paul Walmsley <pjw@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sergei Trofimovich <slyich@gmail.com>
Cc: Shimin Guo <shimin.guo@skydio.com>
Cc: Stephen Brennan <stephen.s.brennan@oracle.com>
Cc: Thomas Falcon <thomas.falcon@intel.com>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
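The name-to-mask lookup described in this commit message can be sketched in a few lines of C. This is only an illustration under stated assumptions: demo_reg_name() is a hypothetical stand-in for perf_reg_name() (whose real signature is not shown here), the register table is invented for the example, and demo_reg_mask() is not the actual __parse_regs code.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical stand-in for perf_reg_name(): maps a perf register
 * number to its name. The table below is invented for illustration.
 */
static const char *demo_reg_name(int reg)
{
	static const char *const names[] = { "x0", "x1", "x2", "lr", "sp", "pc" };

	if (reg < 0 || reg >= (int)(sizeof(names) / sizeof(names[0])))
		return NULL;
	return names[reg];
}

/*
 * Find a register by name and return its sample mask, computed as
 * 1 left-shifted by the register number. Returns 0 if the name is unknown.
 */
static uint64_t demo_reg_mask(const char *name)
{
	assert(name != NULL);

	for (int reg = 0; reg < 64; reg++) {
		const char *candidate = demo_reg_name(reg);

		if (candidate && strcmp(candidate, name) == 0)
			return 1ULL << reg;
	}
	return 0;
}
```

Because the mask is derived from the register number, no per-architecture name/mask table is needed; any architecture that can name its registers gets the lookup for free.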
2025-02-12  perf sample: Make user_regs and intr_regs optional  Ian Rogers

The struct dump_regs contains 512 bytes of cache_regs, meaning the two values in perf_sample contribute 1088 bytes of its total 1384 bytes size. Initializing this much memory has a cost reported by Tavian Barnes <tavianator@tavianator.com> as about 2.5% when running `perf script --itrace=i0`:
https://lore.kernel.org/lkml/d841b97b3ad2ca8bcab07e4293375fb7c32dfce7.1736618095.git.tavianator@tavianator.com/

Adrian Hunter <adrian.hunter@intel.com> replied that the zero initialization was necessary and couldn't simply be removed.

This patch aims to strike a middle ground of still zeroing the perf_sample, but removing 79% of its size by making user_regs and intr_regs optional pointers to zalloc-ed memory. To support the allocation, accessors are created for user_regs and intr_regs. To support correct cleanup, perf_sample__init and perf_sample__exit functions are created and added throughout the code base.

Signed-off-by: Ian Rogers <irogers@google.com>
Link: https://lore.kernel.org/r/20250113194345.1537821-1-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
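The pattern this commit message describes (an optional pointer, a lazily allocating accessor, and paired init/exit functions) can be sketched as below. All names prefixed demo_ are hypothetical, the struct layouts are simplified assumptions rather than the real perf_sample and dump_regs definitions, and calloc() stands in for perf's zalloc().

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Simplified, hypothetical register dump: 64 cached 8-byte values = 512 bytes. */
struct demo_regs {
	uint64_t abi;
	uint64_t mask;
	uint64_t cache_regs[64];
};

/* Hypothetical sample: holds a pointer instead of embedding the large struct. */
struct demo_sample {
	uint64_t ip;
	struct demo_regs *user_regs; /* NULL until first requested */
};

/* Cheap initialization: only the small struct is zeroed. */
static void demo_sample_init(struct demo_sample *sample)
{
	sample->ip = 0;
	sample->user_regs = NULL;
}

/* Accessor: allocate zeroed storage on first use, then reuse it. */
static struct demo_regs *demo_sample_user_regs(struct demo_sample *sample)
{
	assert(sample != NULL);

	if (!sample->user_regs)
		sample->user_regs = calloc(1, sizeof(*sample->user_regs));
	return sample->user_regs; /* may be NULL if allocation failed */
}

/* Cleanup: release the optional storage, safe to call when nothing was allocated. */
static void demo_sample_exit(struct demo_sample *sample)
{
	free(sample->user_regs);
	sample->user_regs = NULL;
}
```

Samples that never touch their registers pay only for zeroing a pointer; the cost of the large zeroed block moves to the first accessor call, and every init must now be paired with an exit to avoid leaks.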
2022-04-09  perf unwind: Don't show unwind error messages when augmenting frame pointer stack  James Clark

Commit b9f6fbb3b2c29736 ("perf arm64: Inject missing frames when using 'perf record --call-graph=fp'") intended to add a 'best effort' DWARF unwind that improved the frame pointer stack in most scenarios.

It's expected that the unwind will fail sometimes, but this shouldn't be reported as an error. It only works when the return address can be determined from the contents of the link register alone.

Fix the error shown when the unwinder requires extra registers by adding a new flag that suppresses error messages. This flag is not set in the normal --call-graph=dwarf unwind mode so that behavior is not changed.

Fixes: b9f6fbb3b2c29736 ("perf arm64: Inject missing frames when using 'perf record --call-graph=fp'")
Reported-by: John Garry <john.garry@huawei.com>
Signed-off-by: James Clark <james.clark@arm.com>
Tested-by: John Garry <john.garry@huawei.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexandre Truong <alexandre.truong@arm.com>
Cc: German Gomez <german.gomez@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20220406145651.1392529-1-james.clark@arm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-01-14  perf arm: Fix off-by-one directory path  Ian Rogers

Relative path include works in the regular build due to -I paths but may fail in other situations.

Fixes: 83869019c74cc2d0 ("perf arch: Support register names from all archs")
Reviewed-by: German Gomez <german.gomez@arm.com>
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexandre Truong <alexandre.truong@arm.com>
Cc: Athira Jajeev <atrajeev@linux.vnet.ibm.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20220114064822.1806019-1-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-12-21  perf arm64: Inject missing frames when using 'perf record --call-graph=fp'  Alexandre Truong

When unwinding using frame pointers on ARM64, the return address of the current function may not have been pushed onto the stack when a function was interrupted, which makes perf show an incorrect call graph to the user.

Consider the following example program:

    void leaf() {
        /* long computation */
    }

    void parent() {
        // (1)
        leaf();
        // (2)
    }

... could be compiled into (using gcc -fno-inline -fno-omit-frame-pointer):

    leaf:
        /* long computation */
        nop
        ret
    parent:
        // (1)
        stp     x29, x30, [sp, -16]!
        mov     x29, sp
        bl      leaf
        nop
        ldp     x29, x30, [sp], 16
        // (2)
        ret

If the program is interrupted at (1), (2), or any point in "leaf:", the call graph will skip the callers of the current function. We can unwind using the dwarf info and check if the return addr is the same as the LR register, and inject the missing frame into the call graph.

Before this patch, the above example shows the following call-graph when recording using "--call-graph fp" mode in ARM64:

    # Children      Self  Command   Shared Object     Symbol
    # ........  ........  ........  ................  ......................
    #
        99.86%    99.86%  program3  program3          [.] leaf
                |
                ---_start
                   __libc_start_main
                   main
                   leaf

As can be seen, the "parent" function is missing. This is especially problematic in "leaf" because for leaf functions the compiler may always omit pushing the return addr onto the stack.

After this patch, it shows the correct graph:

    # Children      Self  Command   Shared Object     Symbol
    # ........  ........  ........  ................  ......................
    #
        99.86%    99.86%  program3  program3          [.] leaf
                |
                ---_start
                   __libc_start_main
                   main
                   parent
                   leaf

Reviewed-by: James Clark <james.clark@arm.com>
Signed-off-by: Alexandre Truong <alexandre.truong@arm.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: John Garry <john.garry@huawei.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Will Deacon <will@kernel.org>
Cc: linux-arm-kernel@lists.infradead.org
Link: https://lore.kernel.org/r/20211217154521.80603-7-german.gomez@arm.com
Signed-off-by: German Gomez <german.gomez@arm.com>
[ Rename machine__normalize_is() to machine__normalized_is(), as suggested by James Clark ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
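The injection step described in this commit message can be illustrated with a small, self-contained sketch. It is not the code in arm64-frame-pointer-unwind-support.c: the function name, the flat array representation of the call chain, and the convention that a failed DWARF unwind is passed as 0 are all assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper. chain[0] is the interrupted pc and chain[1..len-1]
 * are the callers found by the frame pointer walk. lr is the sampled link
 * register; dwarf_caller is the caller address a DWARF unwind of the
 * innermost frame produced (0 if that unwind failed).
 *
 * If the DWARF unwind agrees with LR and the frame pointer walk did not
 * already record that caller, insert LR right after the leaf entry.
 * Returns the new chain length.
 */
static size_t demo_inject_leaf_caller(uint64_t *chain, size_t len, size_t cap,
				      uint64_t lr, uint64_t dwarf_caller)
{
	assert(chain != NULL && len <= cap);

	if (len == 0 || len == cap)
		return len; /* nothing to augment, or no room */
	if (dwarf_caller == 0 || dwarf_caller != lr)
		return len; /* best effort: silently skip on failure or mismatch */
	if (len > 1 && chain[1] == lr)
		return len; /* caller is already present */

	/* Shift the existing callers down by one slot and insert LR. */
	memmove(&chain[2], &chain[1], (len - 1) * sizeof(chain[0]));
	chain[1] = lr;
	return len + 1;
}
```

With a chain of { leaf, main } and an LR that points into parent, the result is { leaf, parent, main }, which matches the corrected call graph shown in the commit message. A mismatch or failed unwind leaves the chain untouched rather than reporting an error.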