summaryrefslogtreecommitdiff
path: root/tools/tracing/rtla/src/timerlat_top.c
AgeCommit message (Collapse)Author
2026-01-07rtla: Remove unused headersWander Lairson Costa
Remove unused includes for <errno.h> and <signal.h> to clean up the code and reduce unnecessary dependencies. Signed-off-by: Wander Lairson Costa <wander@redhat.com> Link: https://lore.kernel.org/r/20260106133655.249887-12-wander@redhat.com Signed-off-by: Tomas Glozar <tglozar@redhat.com>
2026-01-07tools/rtla: Consolidate -H/--house-keeping option parsingCosta Shulyupin
Each rtla tool duplicates parsing of -H/--house-keeping. Migrate the option parsing from individual tools to the common_parse_options(). Signed-off-by: Costa Shulyupin <costa.shul@redhat.com> Link: https://lore.kernel.org/r/20251209100047.2692515-8-costa.shul@redhat.com Signed-off-by: Tomas Glozar <tglozar@redhat.com>
2026-01-07tools/rtla: Consolidate -P/--priority option parsingCosta Shulyupin
Each rtla tool duplicates parsing of -P/--priority. Migrate the option parsing from individual tools to the common_parse_options(). Signed-off-by: Costa Shulyupin <costa.shul@redhat.com> Link: https://lore.kernel.org/r/20251209100047.2692515-7-costa.shul@redhat.com Signed-off-by: Tomas Glozar <tglozar@redhat.com>
2026-01-07tools/rtla: Consolidate -e/--event option parsingCosta Shulyupin
Each rtla tool duplicates parsing of -e/--event. Migrate the option parsing from individual tools to the common_parse_options(). Signed-off-by: Costa Shulyupin <costa.shul@redhat.com> Link: https://lore.kernel.org/r/20251209100047.2692515-6-costa.shul@redhat.com Signed-off-by: Tomas Glozar <tglozar@redhat.com>
2026-01-07tools/rtla: Consolidate -d/--duration option parsingCosta Shulyupin
Each rtla tool duplicates parsing of -d/--duration. Migrate the option parsing from individual tools to the common_parse_options(). Signed-off-by: Costa Shulyupin <costa.shul@redhat.com> Link: https://lore.kernel.org/r/20251209100047.2692515-5-costa.shul@redhat.com Signed-off-by: Tomas Glozar <tglozar@redhat.com>
2026-01-07tools/rtla: Consolidate -D/--debug option parsingCosta Shulyupin
Each rtla tool duplicates parsing of -D/--debug. Migrate the option parsing from individual tools to the common_parse_options(). Signed-off-by: Costa Shulyupin <costa.shul@redhat.com> Link: https://lore.kernel.org/r/20251209100047.2692515-4-costa.shul@redhat.com Signed-off-by: Tomas Glozar <tglozar@redhat.com>
2026-01-07tools/rtla: Consolidate -C/--cgroup option parsingCosta Shulyupin
Each rtla tool duplicates parsing of -C/--cgroup. Migrate the option parsing from individual tools to the common_parse_options(). Signed-off-by: Costa Shulyupin <costa.shul@redhat.com> Link: https://lore.kernel.org/r/20251209100047.2692515-3-costa.shul@redhat.com Signed-off-by: Tomas Glozar <tglozar@redhat.com>
2026-01-07tools/rtla: Consolidate -c/--cpus option parsingCosta Shulyupin
Each rtla tool duplicates parsing of -c/--cpus. Migrate the option parsing from individual tools to the common_parse_options(). Signed-off-by: Costa Shulyupin <costa.shul@redhat.com> Link: https://lore.kernel.org/r/20251209100047.2692515-2-costa.shul@redhat.com Signed-off-by: Tomas Glozar <tglozar@redhat.com>
2026-01-07tools/rtla: Add common_parse_options()Costa Shulyupin
Each rtla tool duplicates parsing of many common options. This creates maintenance overhead and risks inconsistencies when updating these options. Add common_parse_options() to centralize parsing of options used across all tools. Common options to be migrated in future patches. Changes since v1: - restore opterr Signed-off-by: Costa Shulyupin <costa.shul@redhat.com> Link: https://lore.kernel.org/r/20251209100047.2692515-1-costa.shul@redhat.com Signed-off-by: Tomas Glozar <tglozar@redhat.com>
2026-01-07rtla/timerlat: Add --bpf-action optionTomas Glozar
Add option --bpf-action that allows the user to attach an external BPF program that will be executed via BPF tail call on latency threshold overflow. Executing additional BPF code on latency threshold overflow allows doing low-latency and in-kernel troubleshooting of the cause of the overflow. The option takes an argument, which is a path to a BPF ELF file expected to contain a function named "action_handler" in a section named "tp/timerlat_action" (the section is necessary for libbpf to assign the correct BPF program type to it). Link: https://lore.kernel.org/r/20251126144205.331954-3-tglozar@redhat.com Signed-off-by: Tomas Glozar <tglozar@redhat.com>
2026-01-07tools/rtla: Add common_usage()Costa Shulyupin
The rtla tools have significant code quadruplication in their usage functions. Each tool implements its own version of the same help text formatting and option descriptions, leading to maintenance overhead and inconsistencies. Documentation/tools/rtla/common_options.rst lists 14 common options. Add common_usage() infrastructure to consolidate help formatting. Subsequent patches will extend this to handle other common options. The refactored output is almost identical to the original, with the following changes: - add square brackets to specify optionality: `usage: [rtla] ...` - remove `-q` from timerlat hist because hist tools don't support it - minor spacing Signed-off-by: Costa Shulyupin <costa.shul@redhat.com> Link: https://lore.kernel.org/r/20251124063204.845425-1-costa.shul@redhat.com Signed-off-by: Tomas Glozar <tglozar@redhat.com>
2025-11-21rtla/timerlat: Exit top main loop on any non-zero wait_retvalCrystal Wood
Comparing to exactly 1 will fail if more than one ring buffer event was seen since the last call to timerlat_bpf_wait(), which can happen in some race scenarios. Signed-off-by: Crystal Wood <crwood@redhat.com> Link: https://lore.kernel.org/r/20251112152529.956778-5-crwood@redhat.com Signed-off-by: Tomas Glozar <tglozar@redhat.com>
2025-11-21rtla: Fix -a overriding -t argumentIvan Pravdin
When running rtla as `rtla <timerlat|osnoise> <top|hist> -t custom_file.txt -a 100` -a options override trace output filename specified by -t option. Running the command above will create <timerlat|osnoise>_trace.txt file instead of custom_file.txt. Fix this by making sure that -a option does not override trace output filename even if it's passed after trace output filename is specified. Fixes: 173a3b014827 ("rtla/timerlat: Add the automatic trace option") Signed-off-by: Ivan Pravdin <ipravdin.official@gmail.com> Reviewed-by: Tomas Glozar <tglozar@redhat.com> Link: https://lore.kernel.org/r/b6ae60424050b2c1c8709e18759adead6012b971.1762186418.git.ipravdin.official@gmail.com [ use capital letter in subject, as required by tracing subsystem ] Signed-off-by: Tomas Glozar <tglozar@redhat.com>
2025-11-21rtla: Fix -C/--cgroup interfaceIvan Pravdin
Currently, user can only specify cgroup to the tracer's thread the following ways: `-C[cgroup]` `-C[=cgroup]` `--cgroup[=cgroup]` If user tries to specify cgroup as `-C [cgroup]` or `--cgroup [cgroup]`, the parser silently fails and rtla's cgroup is used for the tracer threads. To make interface more user-friendly, allow user to specify cgroup in the aforementioned way, i.e. `-C [cgroup]` and `--cgroup [cgroup]`. Refactor identical logic between -t/--trace and -C/--cgroup into a common function. Change documentation to reflect this user interface change. Fixes: a957cbc02531 ("rtla: Add -C cgroup support") Signed-off-by: Ivan Pravdin <ipravdin.official@gmail.com> Reviewed-by: Tomas Glozar <tglozar@redhat.com> Link: https://lore.kernel.org/r/16132f1565cf5142b5fbd179975be370b529ced7.1762186418.git.ipravdin.official@gmail.com [ use capital letter in subject, as required by tracing subsystem ] Signed-off-by: Tomas Glozar <tglozar@redhat.com>
2025-11-21tools/rtla: Replace timerlat_top_usage("...") with fatal("...")Costa Shulyupin
A long time ago, when the usage help was short, it was a favor to the user to show it on error. Now that the usage help has become very long, it is too noisy to dump the complete help text for each typo after the error message itself. Replace timerlat_top_usage("...\n") with fatal("...") on errors. Remove the already unused 'usage' argument from timerlat_top_usage(). Signed-off-by: Costa Shulyupin <costa.shul@redhat.com> Reviewed-by: Tomas Glozar <tglozar@redhat.com> Link: https://lore.kernel.org/r/20251011082738.173670-3-costa.shul@redhat.com Signed-off-by: Tomas Glozar <tglozar@redhat.com>
2025-11-21tools/rtla: Add fatal() and replace error handling patternCosta Shulyupin
The code contains some technical debt in error handling, which complicates the consolidation of duplicated code. Introduce an fatal() function to replace the common pattern of err_msg() followed by exit(EXIT_FAILURE), reducing the length of an already long function. Further patches using fatal() follow. Signed-off-by: Costa Shulyupin <costa.shul@redhat.com> Reviewed-by: Tomas Glozar <tglozar@redhat.com> Link: https://lore.kernel.org/r/20251011082738.173670-2-costa.shul@redhat.com Signed-off-by: Tomas Glozar <tglozar@redhat.com>
2025-11-20tools/rtla: Remove unused optional option_indexCosta Shulyupin
The longindex argument of getopt_long() is optional and tied to the unused local variable option_index. Remove it to shorten the four longest functions and make the code neater. Signed-off-by: Costa Shulyupin <costa.shul@redhat.com> Reviewed-by: Tomas Glozar <tglozar@redhat.com> Link: https://lore.kernel.org/r/20251002123553.389467-2-costa.shul@redhat.com Signed-off-by: Tomas Glozar <tglozar@redhat.com>
2025-11-20tools/rtla: Add for_each_monitored_cpu() helperCosta Shulyupin
The rtla tools have many instances of iterating over CPUs while checking if they are monitored. Add a for_each_monitored_cpu() helper macro to make the code more readable and reduce code duplication. Signed-off-by: Costa Shulyupin <costa.shul@redhat.com> Reviewed-by: Tomas Glozar <tglozar@redhat.com> Link: https://lore.kernel.org/r/20251002123553.389467-1-costa.shul@redhat.com Signed-off-by: Tomas Glozar <tglozar@redhat.com>
2025-09-27tools/rtla: Add remaining support for osnoise actionsCrystal Wood
The basic functionality came with the consolidation; now hook up the command line options, and add documentation and tests. Cc: John Kacur <jkacur@redhat.com> Cc: Costa Shulyupin <costa.shul@redhat.com> Link: https://lore.kernel.org/20250907022325.243930-8-crwood@redhat.com Reviewed-by: Tomas Glozar <tglozar@redhat.com> Signed-off-by: Crystal Wood <crwood@redhat.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-09-27tools/rtla: Consolidate code between osnoise/timerlat and hist/topCrystal Wood
Currently a lot of code is duplicated between the different rtla tools, making maintenance more difficult, and encouraging divergence such as features that are only implemented for certain tools even though they could be more broadly applicable. Merge the various main() functions into a common run_tool() with an ops struct for tool-specific details. Implement enough support for actions on osnoise to not need to keep the old params->trace_output path. Cc: John Kacur <jkacur@redhat.com> Cc: Costa Shulyupin <costa.shul@redhat.com> Link: https://lore.kernel.org/20250907022325.243930-5-crwood@redhat.com Reviewed-by: Tomas Glozar <tglozar@redhat.com> Signed-off-by: Crystal Wood <crwood@redhat.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-09-27tools/rtla: Create common_apply_config()Crystal Wood
Merge the common bits of osnoise_apply_config() and timerlat_apply_config(). Put the result in a new common.c, and move enough things to common.h so that common.c does not need to include osnoise.h. Cc: John Kacur <jkacur@redhat.com> Cc: Costa Shulyupin <costa.shul@redhat.com> Link: https://lore.kernel.org/20250907022325.243930-4-crwood@redhat.com Reviewed-by: Tomas Glozar <tglozar@redhat.com> Signed-off-by: Crystal Wood <crwood@redhat.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-09-27tools/rtla: Move top/hist params into common structCrystal Wood
The hist members were very similar between timerlat and top, so just use one common hist struct. output_divisor, quiet, and pretty printing are pretty generic concepts that can go in the main struct even if not every specific tool (currently) uses them. Cc: John Kacur <jkacur@redhat.com> Cc: Costa Shulyupin <costa.shul@redhat.com> Link: https://lore.kernel.org/20250907022325.243930-3-crwood@redhat.com Reviewed-by: Tomas Glozar <tglozar@redhat.com> Signed-off-by: Crystal Wood <crwood@redhat.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-09-27tools/rtla: Consolidate common parameters into shared structureCosta Shulyupin
timerlat_params and osnoise_params structures contain 15 identical fields. Introduce a new header common.h and define a common_params structure to consolidate shared fields, reduce code duplication, and enhance maintainability. Cc: John Kacur <jkacur@redhat.com> Link: https://lore.kernel.org/20250907022325.243930-2-crwood@redhat.com Reviewed-by: Tomas Glozar <tglozar@redhat.com> Signed-off-by: Costa Shulyupin <costa.shul@redhat.com> Signed-off-by: Crystal Wood <crwood@redhat.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-07-25rtla/timerlat: Add action on end featureTomas Glozar
Implement actions on end next to actions on threshold. A new option, --on-end is added, parallel to --on-threshold. Instead of being executed whenever a latency threshold is reached, it is executed at the end of the measurement. For example: $ rtla timerlat hist -d 5s --on-end trace will save the trace output at the end. All actions supported by --on-threshold are also supported by --on-end, except for continue, which does nothing with --on-end. Cc: John Kacur <jkacur@redhat.com> Cc: Luis Goncalves <lgoncalv@redhat.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Chang Yin <cyin@redhat.com> Cc: Costa Shulyupin <costa.shul@redhat.com> Cc: Crystal Wood <crwood@redhat.com> Cc: Gabriele Monaco <gmonaco@redhat.com> Link: https://lore.kernel.org/20250626123405.1496931-6-tglozar@redhat.com Signed-off-by: Tomas Glozar <tglozar@redhat.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-07-25rtla/timerlat: Add continue actionTomas Glozar
Introduce option to resume tracing after a latency threshold overflow. The option is implemented as an action named "continue". Example: $ rtla timerlat top -q -T 200 -d 1s --on-threshold \ exec,command="echo Threshold" --on-threshold continue Threshold Threshold Threshold Timer Latency ... The feature is supported for both hist and top. After the continue action is executed, processing of the list of actions is stopped and tracing is resumed. Cc: John Kacur <jkacur@redhat.com> Cc: Luis Goncalves <lgoncalv@redhat.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Chang Yin <cyin@redhat.com> Cc: Costa Shulyupin <costa.shul@redhat.com> Cc: Crystal Wood <crwood@redhat.com> Cc: Gabriele Monaco <gmonaco@redhat.com> Link: https://lore.kernel.org/20250626123405.1496931-5-tglozar@redhat.com Signed-off-by: Tomas Glozar <tglozar@redhat.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-07-25rtla/timerlat: Add action on threshold featureTomas Glozar
Extend the functionality provided by the -t/--trace option, which triggers saving the contents of a tracefs buffer after tracing is stopped, to support implementing arbitrary actions. A new option, --on-threshold, is added, taking an argument that further specifies the action. Actions added in this patch are: - trace[,file=<filename>]: Saves tracefs buffer, optionally taking a filename. - signal,num=<sig>,pid=<pid>: Sends signal to process. "parent" might be specified instead of number to send signal to parent process. - shell,command=<command>: Execute shell command. Multiple actions may be specified and will be executed in order, including multiple actions of the same type. Trace output requested via -t and -a now adds a trace action to the end of the list. If an action fails, the following actions are not executed. For example, this command: $ rtla timerlat -T 20 --on-threshold trace \ --on-threshold shell,command="grep ipi_send timerlat_trace.txt" \ --on-threshold signal,num=2,pid=parent will send signal 2 (SIGINT) to parent process, but only if saved trace contains the text "ipi_send". This way, the feature can be used for flexible reactions on latency spikes, and allows combining rtla with other tooling like perf. Cc: John Kacur <jkacur@redhat.com> Cc: Luis Goncalves <lgoncalv@redhat.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Chang Yin <cyin@redhat.com> Cc: Costa Shulyupin <costa.shul@redhat.com> Cc: Crystal Wood <crwood@redhat.com> Cc: Gabriele Monaco <gmonaco@redhat.com> Link: https://lore.kernel.org/20250626123405.1496931-3-tglozar@redhat.com Signed-off-by: Tomas Glozar <tglozar@redhat.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-07-25rtla/timerlat: Introduce enum timerlat_tracing_modeTomas Glozar
After the introduction of BPF-based sample collection, rtla-timerlat effectively runs in one of three modes: - Pure BPF mode, with tracefs only being used to set up the timerlat tracer. Sample processing and stop on threshold are handled by BPF. - tracefs mode. BPF is unsupported or kernel is lacking the necessary trace event (osnoise:timerlat_sample). Stop on theshold is handled by timerlat tracer stopping tracing in all instances. - BPF/tracefs mixed mode - BPF is used for sample collection for top or histogram, tracefs is used for trace output and/or auto-analysis. Stop on threshold is handled both through BPF program, which stops sample collection for top/histogram and wakes up rtla, and by timerlat tracer, which stops tracing for trace output/auto-analysis instances. Add enum timerlat_tracing_mode, with three values: - TRACING_MODE_BPF - TRACING_MODE_TRACEFS - TRACING_MODE_MIXED Those represent the modes described above. A field of this type is added to struct timerlat_params, named "mode", replacing the no_bpf variable. params->mode is set in timerlat_{top,hist}_parse_args to TRACING_MODE_BPF or TRACING_MODE_MIXED based on whether trace output and/or auto-analysis is requested. timerlat_{top,hist}_main then checks if BPF is not unavailable or disabled, in that case, it sets params->mode to TRACING_MODE_TRACEFS. A condition is added to timerlat_apply_config that skips setting timerlat tracer thresholds if params->mode is TRACING_MODE_BPF (those are unnecessary, since they only turn off tracing, which is already turned off in that case, since BPF is used to collect samples). Cc: John Kacur <jkacur@redhat.com> Cc: Luis Goncalves <lgoncalv@redhat.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Chang Yin <cyin@redhat.com> Cc: Costa Shulyupin <costa.shul@redhat.com> Cc: Crystal Wood <crwood@redhat.com> Cc: Gabriele Monaco <gmonaco@redhat.com> Link: https://lore.kernel.org/20250626123405.1496931-2-tglozar@redhat.com Signed-off-by: Tomas Glozar <tglozar@redhat.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-05-07rtla: Set distinctive exit value for failed testsCosta Shulyupin
A test is considered failed when a sample trace exceeds the threshold. Failed tests return the same exit code as passed tests, requiring test frameworks to determine the result by searching for "hit stop tracing" in the output. Assign a distinct exit code for failed tests to enable the use of shell expressions and seamless integration with testing frameworks without the need to parse output. Add enum type for return value. Update `make check`. Cc: Daniel Bristot de Oliveira <bristot@kernel.org> Cc: John Kacur <jkacur@redhat.com> Cc: "Luis Claudio R. Goncalves" <lgoncalv@redhat.com> Cc: Eder Zulian <ezulian@redhat.com> Cc: Dan Carpenter <dan.carpenter@linaro.org> Cc: Jan Stancek <jstancek@redhat.com> Link: https://lore.kernel.org/20250417185757.2194541-1-costa.shul@redhat.com Signed-off-by: Costa Shulyupin <costa.shul@redhat.com> Reviewed-by: Tomas Glozar <tglozar@redhat.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-03-26rtla: Unify apply_config between top and histTomas Glozar
The functions osnoise_top_apply_config and osnoise_hist_apply_config, as well as timerlat_top_apply_config and timerlat_hist_apply_config, are mostly the same. Move common part from them into separate functions osnoise_apply_config and timerlat_apply_config. For rtla-timerlat, also unify params->user_hist and params->user_top into one field called params->user_data, and move several fields used only by timerlat-top into the top-only section of struct timerlat_params. Cc: Luis Goncalves <lgoncalv@redhat.com> Link: https://lore.kernel.org/20250320092500.101385-3-tglozar@redhat.com Signed-off-by: Tomas Glozar <tglozar@redhat.com> Reviewed-by: John Kacur <jkacur@redhat.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-03-26rtla: Fix segfault in save_trace_to_file callTomas Glozar
Running rtla with exit on threshold, but without saving trace leads to a segmenetation fault: $ rtla timerlat hist -T 10 ... Max timerlat IRQ latency from idle: 4.29 us in cpu 0 Segmentation fault This is caused by null pointer deference in the call of save_trace_to_file, which attempts to dereference an uninitialized osnoise_tool variable: save_trace_to_file(record->trace.inst, params->trace_output); ^ this is uninitialized if params->trace_output is not set Fix this by not attempting to dereference "record" if it is NULL and passing NULL instead. As a safety measure, the first field is also checked for NULL inside save_trace_to_file. Cc: John Kacur <jkacur@redhat.com> Cc: Luis Goncalves <lgoncalv@redhat.com> Cc: Costa Shulyupin <costa.shul@redhat.com> Link: https://lore.kernel.org/20250313141034.299117-1-tglozar@redhat.com Fixes: dc4d4e7c72d1 ("rtla: Refactor save_trace_to_file") Signed-off-by: Tomas Glozar <tglozar@redhat.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-03-04rtla: Refactor save_trace_to_fileCosta Shulyupin
The functions osnoise_hist_main(), osnoise_top_main(), timerlat_hist_main(), and timerlat_top_main() are lengthy and contain duplicated code. Refactor by consolidating the duplicate lines into the save_trace_to_file() function. Cc: Daniel Bristot de Oliveira <bristot@kernel.org> Cc: John Kacur <jkacur@redhat.com> Cc: "Luis Claudio R. Goncalves" <lgoncalv@redhat.com> Cc: Eder Zulian <ezulian@redhat.com> Cc: Dan Carpenter <dan.carpenter@linaro.org> Cc: Gabriele Monaco <gmonaco@redhat.com> Link: https://lore.kernel.org/20250219115138.406075-1-costa.shul@redhat.com Signed-off-by: Costa Shulyupin <costa.shul@redhat.com> Reviewed-by: Tomas Glozar <tglozar@redhat.com> Tested-by: Tomas Glozar <tglozar@redhat.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-03-04rtla/timerlat_top: Use BPF to collect samplesTomas Glozar
Collect samples using BPF program instead of pulling them from tracefs. If the osnoise:timerlat_sample tracepoint is unavailable or the BPF program fails to load for whatever reason, rtla falls back to the old implementation. The collection of samples using the BPF program is fully self-contained and requires no activity of the userspace part of rtla during the measurement. Thus, rtla only pulls the summary from the BPF map and displays it every second, improving the performance. In --aa-only mode, the BPF program does not collect any data and only signalizes the end of tracing to userspace. An optimization that re-used the main trace instance for auto-analysis in aa-only mode was dropped, as rtla no longer turns tracing on in the main trace instance, making it useless for auto-analysis. Cc: John Kacur <jkacur@redhat.com> Cc: Luis Goncalves <lgoncalv@redhat.com> Cc: Gabriele Monaco <gmonaco@redhat.com> Cc: Clark Williams <williams@redhat.com> Link: https://lore.kernel.org/20250218145859.27762-8-tglozar@redhat.com Signed-off-by: Tomas Glozar <tglozar@redhat.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-03-04rtla/timerlat_top: Move divisor to updateTomas Glozar
Unlike timerlat-hist, timerlat-top applies the output divisor used to set ns/us mode when printing results instead of applying it when collecting the samples. Move the application of the divisor from timerlat_top_print into timerlat_top_update to make it consistent with timerlat-hist. Cc: John Kacur <jkacur@redhat.com> Cc: Luis Goncalves <lgoncalv@redhat.com> Cc: Gabriele Monaco <gmonaco@redhat.com> Cc: Clark Williams <williams@redhat.com> Link: https://lore.kernel.org/20250218145859.27762-7-tglozar@redhat.com Signed-off-by: Tomas Glozar <tglozar@redhat.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-03-04rtla/timerlat: Unify params structTomas Glozar
Instead of having separate structs timerlat_top_params and timerlat_hist_params, use one struct timerlat_params for both. This allows code using the structs to be shared between timerlat-top and timerlat-hist. Cc: John Kacur <jkacur@redhat.com> Cc: Luis Goncalves <lgoncalv@redhat.com> Cc: Gabriele Monaco <gmonaco@redhat.com> Cc: Clark Williams <williams@redhat.com> Link: https://lore.kernel.org/20250218145859.27762-2-tglozar@redhat.com Signed-off-by: Tomas Glozar <tglozar@redhat.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-01-24rtla: Report missed event countTomas Glozar
Print how many events were missed by trace buffer overflow in the main instance at the end of the run (for hist) or during the run (for top). Cc: John Kacur <jkacur@redhat.com> Cc: Luis Goncalves <lgoncalv@redhat.com> Link: https://lore.kernel.org/20250123142339.990300-5-tglozar@redhat.com Signed-off-by: Tomas Glozar <tglozar@redhat.com> Tested-by: Gabriele Monaco <gmonaco@redhat.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-01-24tools/rtla: Add osnoise_trace_is_off()Costa Shulyupin
All of the users of trace_is_off() passes in &record->trace as the second parameter, where record is a pointer to a struct osnoise_tool. This record could be NULL and there is a hidden dependency that the trace field is the first field to allow &record->trace to work with a NULL record pointer. In order to make this code a bit more robust, as record shouldn't be dereferenced if it is NULL, even if the code does work, create a new function called osnoise_trace_is_off() that takes the pointer to a struct osnoise_tool as its second parameter. This way it can properly test if it is NULL before it dereferences it. The old function trace_is_off() is removed and the function osnoise_trace_is_off() is added into osnoise.c which is what the struct osnoise_tool is associated with. Cc: John Kacur <jkacur@redhat.com> Cc: "Luis Claudio R. Goncalves" <lgoncalv@redhat.com> Cc: Eder Zulian <ezulian@redhat.com> Cc: Dan Carpenter <dan.carpenter@linaro.org> Cc: Tomas Glozar <tglozar@redhat.com> Cc: Gabriele Monaco <gmonaco@redhat.com> Link: https://lore.kernel.org/20250115180055.2136815-1-costa.shul@redhat.com Signed-off-by: Costa Shulyupin <costa.shul@redhat.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-01-24rtla/timerlat_top: Set OSNOISE_WORKLOAD for kernel threadsTomas Glozar
When using rtla timerlat with userspace threads (-u or -U), rtla disables the OSNOISE_WORKLOAD option in /sys/kernel/tracing/osnoise/options. This option is not re-enabled in a subsequent run with kernel-space threads, leading to rtla collecting no results if the previous run exited abnormally: $ rtla timerlat top -u ^\Quit (core dumped) $ rtla timerlat top -k -d 1s Timer Latency 0 00:00:01 | IRQ Timer Latency (us) | Thread Timer Latency (us) CPU COUNT | cur min avg max | cur min avg max The issue persists until OSNOISE_WORKLOAD is set manually by running: $ echo OSNOISE_WORKLOAD > /sys/kernel/tracing/osnoise/options Set OSNOISE_WORKLOAD when running rtla with kernel-space threads if available to fix the issue. Cc: stable@vger.kernel.org Cc: John Kacur <jkacur@redhat.com> Cc: Luis Goncalves <lgoncalv@redhat.com> Link: https://lore.kernel.org/20250107144823.239782-4-tglozar@redhat.com Fixes: cdca4f4e5e8e ("rtla/timerlat_top: Add timerlat user-space support") Signed-off-by: Tomas Glozar <tglozar@redhat.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-01-24rtla/timerlat_top: Abort event processing on second signalTomas Glozar
If either SIGINT is received twice, or after a SIGALRM (that is, after timerlat was supposed to stop), abort processing events currently left in the tracefs buffer and exit immediately. This allows the user to exit rtla without waiting for processing all events, should that take longer than wanted, at the cost of not processing all samples. Cc: John Kacur <jkacur@redhat.com> Cc: Luis Goncalves <lgoncalv@redhat.com> Cc: Gabriele Monaco <gmonaco@redhat.com> Link: https://lore.kernel.org/20250116144931.649593-6-tglozar@redhat.com Signed-off-by: Tomas Glozar <tglozar@redhat.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-01-24rtla/timerlat_top: Stop timerlat tracer on signalTomas Glozar
Currently, when either SIGINT from the user or SIGALRM from the duration timer is caught by rtla-timerlat, stop_tracing is set to break out of the main loop. This is not sufficient for cases where the timerlat tracer is producing more data than rtla can consume, since in that case, rtla is looping indefinitely inside tracefs_iterate_raw_events, never reaches the check of stop_tracing and hangs. In addition to setting stop_tracing, also stop the timerlat tracer on received signal (SIGINT or SIGALRM). This will stop new samples so that the existing samples may be processed and tracefs_iterate_raw_events eventually exits. Cc: stable@vger.kernel.org Cc: John Kacur <jkacur@redhat.com> Cc: Luis Goncalves <lgoncalv@redhat.com> Cc: Gabriele Monaco <gmonaco@redhat.com> Link: https://lore.kernel.org/20250116144931.649593-4-tglozar@redhat.com Fixes: a828cd18bc4a ("rtla: Add timerlat tool and timelart top mode") Signed-off-by: Tomas Glozar <tglozar@redhat.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2024-11-22Merge tag 'trace-tools-v6.13' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace Pull tracing tools updates from Steven Rostedt: - Add ':' to getopt option 'trace-buffer-size' in timerlat_hist for consistency - Remove unused sched_getattr define - Rename sched_setattr() helper to syscall_sched_setattr() to avoid conflicts - Update counters to long from int to avoid overflow - Add libcpupower dependency detection - Add --deepest-idle-state to timerlat to limit deep idle sleeps - Other minor clean ups and documentation changes * tag 'trace-tools-v6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: verification/dot2: Improve dot parser robustness tools/rtla: Improve exception handling in timerlat_load.py tools/rtla: Enhance argument parsing in timerlat_load.py tools/rtla: Improve code readability in timerlat_load.py rtla/timerlat: Do not set params->user_workload with -U rtla: Documentation: Mention --deepest-idle-state rtla/timerlat: Add --deepest-idle-state for hist rtla/timerlat: Add --deepest-idle-state for top rtla/utils: Add idle state disabling via libcpupower rtla: Add optional dependency on libcpupower tools/build: Add libcpupower dependency detection rtla/timerlat: Make timerlat_hist_cpu->*_count unsigned long long rtla/timerlat: Make timerlat_top_cpu->*_count unsigned long long tools/rtla: fix collision with glibc sched_attr/sched_set_attr tools/rtla: drop __NR_sched_getattr rtla: Fix consistency in getopt_long for timerlat_hist rv: Fix a typo tools/rv: Correct the grammatical errors in the comments tools/rv: Correct the grammatical errors in the comments rtla: use the definition for stdout fd when calling isatty()
2024-11-19rtla/timerlat: Do not set params->user_workload with -UTomas Glozar
Since commit fb9e90a67ee9 ("rtla/timerlat: Make user-space threads the default"), rtla-timerlat has been defaulting to params->user_workload if neither that or params->kernel_workload is set. This has unintentionally made -U, which sets only params->user_hist/top but not params->user_workload, to behave like -u unless -k is set, preventing the user from running a custom workload. Example: $ rtla timerlat hist -U -c 0 & [1] 7413 $ python sample/timerlat_load.py 0 Error opening timerlat fd, did you run timerlat -U? $ ps | grep timerlatu 7415 pts/4 00:00:00 timerlatu/0 Fix the issue by checking for params->user_top/hist instead of params->user_workload when setting default thread mode. Link: https://lore.kernel.org/20241021123140.14652-1-tglozar@redhat.com Fixes: fb9e90a67ee9 ("rtla/timerlat: Make user-space threads the default") Signed-off-by: Tomas Glozar <tglozar@redhat.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2024-10-17rtla/timerlat: Add --deepest-idle-state for topTomas Glozar
Add option to limit deepest idle state on CPUs where timerlat is running for the duration of the workload. Link: https://lore.kernel.org/20241017140914.3200454-5-tglozar@redhat.com Signed-off-by: Tomas Glozar <tglozar@redhat.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2024-10-11rtla/timerlat: Make timerlat_top_cpu->*_count unsigned long longTomas Glozar
Most fields of struct timerlat_top_cpu are unsigned long long, but the fields {irq,thread,user}_count are int (32-bit signed). This leads to overflow when tracing on a large number of CPUs for a long enough time: $ rtla timerlat top -a20 -c 1-127 -d 12h ... 0 12:00:00 | IRQ Timer Latency (us) | Thread Timer Latency (us) CPU COUNT | cur min avg max | cur min avg max 1 #43200096 | 0 0 1 2 | 3 2 6 12 ... 127 #43200096 | 0 0 1 2 | 3 2 5 11 ALL #119144 e4 | 0 5 4 | 2 28 16 The average latency should be 0-1 for IRQ and 5-6 for thread, but is reported as 5 and 28, about 4 to 5 times more, due to the count overflowing when summed over all CPUs: 43200096 * 127 = 5486412192, however, 1191444898 (= 5486412192 mod MAX_INT) is reported instead, as seen on the last line of the output, and the averages are thus ~4.6 times higher than they should be (5486412192 / 1191444898 = ~4.6). Fix the issue by changing {irq,thread,user}_count fields to unsigned long long, similarly to other fields in struct timerlat_top_cpu and to the count variable in timerlat_top_print_sum. Link: https://lore.kernel.org/20241011121015.2868751-1-tglozar@redhat.com Reported-by: Attila Fazekas <afazekas@redhat.com> Signed-off-by: Tomas Glozar <tglozar@redhat.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2024-10-04rtla: use the definition for stdout fd when calling isatty()Eder Zulian
Use the STDOUT_FILENO definition when testing whether the standard output file descriptor refers to a terminal (for better redability). Link: https://lore.kernel.org/20240813142338.376039-1-ezulian@redhat.com Signed-off-by: Eder Zulian <ezulian@redhat.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2024-10-03rtla: Fix the help text in osnoise and timerlat top toolsEder Zulian
The help text in osnoise top and timerlat top had some minor errors and omissions. The -d option was missing the 's' (second) abbreviation and the error message for '-d' used '-D'. Cc: stable@vger.kernel.org Fixes: 1eceb2fc2ca54 ("rtla/osnoise: Add osnoise top mode") Fixes: a828cd18bc4ad ("rtla: Add timerlat tool and timelart top mode") Link: https://lore.kernel.org/20240813155831.384446-1-ezulian@redhat.com Suggested-by: Tomas Glozar <tglozar@redhat.com> Reviewed-by: Tomas Glozar <tglozar@redhat.com> Signed-off-by: Eder Zulian <ezulian@redhat.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2024-05-16rtla: Fix -t\--trace[=file]John Kacur
The -t option has an optional argument. The usual case is for a short option to be specified without an '=' and for the long version to be specified with an '=' Various forms of this do not work as expected. For example: rtla timerlat hist -T50 -tfile.txt will result in a truncated file name of "ile.txt" Another example is that the long form without the '=' will result in the default file name instead of the requested file name. This patch properly parses the optional argument with and without '=' and with and without spaces for the short form. This patch was also tested using -t and --trace without providing a file name both as the last requested option and with a following long and short option. For example: rtla timerlat hist -T50 -t -u rtla timerlat hist -T50 --trace -u This fix is applied to both timerlat top and hist and to osnoise top and hist. Here is the full testing for rtla timerlat hist. Before applying the patch rtla timerlat hist -T50 -t=file.txt Works as expected, "file.txt" rtla timerlat hist -T50 -tfile.txt Truncated file name "ile.txt" rtla timerlat hist -T50 -t file.txt Default file name instead of file.txt rtla timerlat hist -T50 --trace=file.txt Truncated file name "ile.txt" rtla timerlat hist -T50 --trace file.txt Default file name "timerlat_trace.txt" instead of "file.txt" After applying the patch: rtla timerlat hist -T50 -t=file.txt Works as expected, "file.txt" rtla timerlat hist -T50 -tfile.txt Works as expected, "file.txt" rtla timerlat hist -T50 -t file.txt Works as expected, "file.txt" rtla timerlat hist -T50 --trace=file.txt Works as expected, "file.txt" rtla timerlat hist -T50 --trace file.txt Works as expected, "file.txt" In addition the following tests were performed to make sure that the default file name worked as expected including with trailing options. rtla timerlat hist -T50 -t Works as expected "timerlat_trace.txt" rtla timerlat hist -T50 --trace Works as expected "timerlat_trace.txt" rtla timerlat hist -T50 -t -u Works as expected "timerlat_trace.txt" rtla timerlat hist -T50 --trace -u Works as expected "timerlat_trace.txt" Link: https://lkml.kernel.org/r/20240515183024.59985-1-jkacur@redhat.com Cc: Daniel Bristot de Oliveria <bristot@kernel.org> Signed-off-by: John Kacur <jkacur@redhat.com> Signed-off-by: Daniel Bristot de Oliveira <bristot@kernel.org>
2024-05-16rtla: Add --trace-buffer-size optionDaniel Bristot de Oliveira
Add the option allow the users to set a different buffer size for the trace. For example, in large systems, the user might be interested on reducing the trace buffer to avoid large tracing files. The buffer size is specified in kB, and it is only affecting the tracing instance. The function trace_set_buffer_size() appears on libtracefs v1.6, so increase the minimum required version on Makefile.config. Link: https://lkml.kernel.org/r/e7c9ca5b3865f28e131a49ec3b984fadf2d056c6.1715860611.git.bristot@kernel.org Cc: Jonathan Corbet <corbet@lwn.net> Cc: Juri Lelli <juri.lelli@redhat.com> Cc: John Kacur <jkacur@redhat.com> Signed-off-by: Daniel Bristot de Oliveira <bristot@kernel.org>
2024-05-15rtla/timerlat: Make user-space threads the defaultDaniel Bristot de Oliveira
After ther -u addition, most of the known users are setting it. And it makes sense, as it adds more information, and inherits the default setup for the threads - e.g., cgroups configs. Thus, if the user-space interface is available, enable -u. Otherwise, use the in-kernel thread. Add the -k option to allow the user to request kernel-threads. Link: https://lkml.kernel.org/r/9241d3089de4091b124f780ed832a0e6646cadaa.1713968967.git.bristot@kernel.org Cc: Jonathan Corbet <corbet@lwn.net> Cc: Juri Lelli <juri.lelli@redhat.com> Signed-off-by: Daniel Bristot de Oliveira <bristot@kernel.org>
2024-05-15rtla: Add the --warm-up optionDaniel Bristot de Oliveira
On many cases, the results right after the startup are different from the rest of the execution, biasing the results. For example, on osnoise, the scheduler might take some time to adapt to the new busy-loop workload. Add the --warm-up <seconds> option, adding a warm-up phase (in seconds) where the workload is set, but the results are discarded. Link: https://lkml.kernel.org/r/e682d5ce5af90f123bd13220f63d5c3d118a92be.1713968967.git.bristot@kernel.org Cc: Jonathan Corbet <corbet@lwn.net> Cc: Juri Lelli <juri.lelli@redhat.com> Signed-off-by: Daniel Bristot de Oliveira <bristot@kernel.org>
2024-05-15rtla/timerlat: Add a summary for top modeDaniel Bristot de Oliveira
While the per-cpu values are the results to take into consideration, the overall system values are also useful. Add a summary at the bottom of rtla timerlat top showing the overall results. For instance: Timer Latency 0 00:00:10 | IRQ Timer Latency (us) | Thread Timer Latency (us) CPU COUNT | cur min avg max | cur min avg max 0 #10003 | 113 19 150 441 | 134 35 170 459 1 #10003 | 63 8 99 462 | 84 15 119 481 2 #10003 | 3 2 89 396 | 21 8 108 414 3 #10002 | 206 11 210 394 | 223 21 228 415 ---------------|----------------------------------------|--------------------------------------- ALL #40011 e0 | 2 137 462 | 8 156 481 Link: https://lkml.kernel.org/r/5eb510d6faeb4ce745e09395196752df75a2dd1a.1713968967.git.bristot@kernel.org Cc: Jonathan Corbet <corbet@lwn.net> Suggested-by: Juri Lelli <juri.lelli@redhat.com> Signed-off-by: Daniel Bristot de Oliveira <bristot@kernel.org>