kernel/tools/perf/builtin-bench.c, branch linux-3.11.y

perf tools: Make numa benchmark optional

2013-01-30T13:36:21Z

Commit "perf: Add 'perf bench numa mem'..." added a NUMA performance benchmark to perf. Make this optional and test for required dependencies. Signed-off-by: Peter Hurley Acked-by: Ingo Molnar Cc: Ingo Molnar Cc: Peter Zijlstra Link: http://lkml.kernel.org/r/1359337882-21821-1-git-send-email-peter@hurleysoftware.com Signed-off-by: Arnaldo Carvalho de Melo

perf: Add 'perf bench numa mem' NUMA performance measurement suite

2013-01-30T13:35:36Z

Add a suite of NUMA performance benchmarks. The goal was simulate the behavior and access patterns of real NUMA workloads, via a wide range of parameters, so this tool goes well beyond simple bzero() measurements that most NUMA micro-benchmarks use: - It processes the data and creates a chain of data dependencies, like a real workload would. Neither the compiler, nor the kernel (via KSM and other optimizations) nor the CPU can eliminate parts of the workload. - It randomizes the initial state and also randomizes the target addresses of the processing - it's not a simple forward scan of addresses. - It provides flexible options to set process, thread and memory relationship information: -G sets "global" memory shared between all test processes, -P sets "process" memory shared by all threads of a process and -T sets "thread" private memory. - There's a NUMA convergence monitoring and convergence latency measurement option via -c and -m. - Micro-sleeps and synchronization can be injected to provoke lock contention and scheduling, via the -u and -S options. This simulates IO and contention. - The -x option instructs the workload to 'perturb' itself artificially every N seconds, by moving to the first and last CPU of the system periodically. This way the stability of convergence equilibrium and the number of steps taken for the scheduler to reach equilibrium again can be measured. - The amount of work can be specified via the -l loop count, and/or via a -s seconds-timeout value. - CPU and node memory binding options, to test hard binding scenarios. THP can be turned on and off via madvise() calls. - Live reporting of convergence progress in an 'at glance' output format. Printing of convergence and deconvergence events. The 'perf bench numa mem -a' option will start an array of about 30 individual tests that will each output such measurements: # Running 5x5-bw-thread, "perf bench numa mem -p 5 -t 5 -P 512 -s 20 -zZ0q --thp 1" 5x5-bw-thread, 20.276, secs, runtime-max/thread 5x5-bw-thread, 20.004, secs, runtime-min/thread 5x5-bw-thread, 20.155, secs, runtime-avg/thread 5x5-bw-thread, 0.671, %, spread-runtime/thread 5x5-bw-thread, 21.153, GB, data/thread 5x5-bw-thread, 528.818, GB, data-total 5x5-bw-thread, 0.959, nsecs, runtime/byte/thread 5x5-bw-thread, 1.043, GB/sec, thread-speed 5x5-bw-thread, 26.081, GB/sec, total-speed See the help text and the code for more details. Cc: Peter Zijlstra Cc: Arnaldo Carvalho de Melo Cc: Frederic Weisbecker Cc: Mike Galbraith Cc: Steven Rostedt Cc: Linus Torvalds Cc: Andrew Morton Cc: Peter Zijlstra Cc: Andrea Arcangeli Cc: Rik van Riel Cc: Mel Gorman Cc: Hugh Dickins Signed-off-by: Ingo Molnar

perf bench: Flush stdout before starting bench suite

2013-01-24T19:40:19Z

perf bench prints header message for bench suite before starting the benchmark. However if the stdout is redirected to a file and bench suite forks child processes this (and possibly other debugging messages too) will be repeated multiple times. $ perf bench sched messaging # Running sched/messaging benchmark... # 20 sender and receiver processes per group # 10 groups == 400 processes run Total time: 0.100 [sec] $ perf bench sched messaging > result.txt $ wc -l result.txt 391 In this file, there were so many "Running sched/messaging benchmark..." lines. This was because stdout is converted to fully-buffered due to the redirection and inherited child processes. Other lines are printed after reaping all those tasks. So fix it by flushing stdout before starting bench suites. Signed-off-by: Namhyung Kim Acked-by: Hitoshi Mitake Cc: Hitoshi Mitake Cc: Ingo Molnar Cc: Paul Mackerras Cc: Peter Zijlstra Link: http://lkml.kernel.org/r/1357637966-8216-1-git-send-email-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo

perf tools: Use __maybe_used for unused variables

2012-09-11T15:19:15Z

perf defines both __used and __unused variables to use for marking unused variables. The variable __used is defined to __attribute__((__unused__)), which contradicts the kernel definition to __attribute__((__used__)) for new gcc versions. On Android, __used is also defined in system headers and this leads to warnings like: warning: '__used__' attribute ignored __unused is not defined in the kernel and is not a standard definition. If __unused is included everywhere instead of __used, this leads to conflicts with glibc headers, since glibc has a variables with this name in its headers. The best approach is to use __maybe_unused, the definition used in the kernel for __attribute__((unused)). In this way there is only one definition in perf sources (instead of 2 definitions that point to the same thing: __used and __unused) and it works on both Linux and Android. This patch simply replaces all instances of __used and __unused with __maybe_unused. Signed-off-by: Irina Tirdea Acked-by: Pekka Enberg Cc: David Ahern Cc: Ingo Molnar Cc: Namhyung Kim Cc: Paul Mackerras Cc: Peter Zijlstra Cc: Steven Rostedt Link: http://lkml.kernel.org/r/1347315303-29906-7-git-send-email-irina.tirdea@intel.com [ committer note: fixed up conflict with a116e05 in builtin-sched.c ] Signed-off-by: Arnaldo Carvalho de Melo

perf bench: Documentation update

2012-06-27T16:17:48Z

The current perf-bench documentation has a couple of typos and even lacks entire description of mem subsystem. Fix it. Reported-by: Ingo Molnar Signed-off-by: Namhyung Kim Acked-by: Hitoshi Mitake Cc: Hitoshi Mitake Cc: Ingo Molnar Cc: Paul Mackerras Cc: Peter Zijlstra Link: http://lkml.kernel.org/r/1340172486-17805-1-git-send-email-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo

perf bench: Also allow measuring memset()

2012-01-24T22:25:32Z

This simply clones the respective memcpy() implementation. Cc: Ingo Molnar Cc: Paul Mackerras Cc: Peter Zijlstra Cc: Stephane Eranian Link: http://lkml.kernel.org/r/4F16D743020000780006D735@nat28.tlf.novell.com Signed-off-by: Jan Beulich Signed-off-by: Arnaldo Carvalho de Melo

perf options: Type check all the remaining OPT_ variants

2010-05-17T19:22:41Z

OPT_SET_INT was renamed to OPT_SET_UINT since the only use in these tools is to set something that has an enum type, that is builtin compatible with unsigned int. Several string constifications were done to make OPT_STRING require a const char * type. Cc: Frédéric Weisbecker Cc: Mike Galbraith Cc: Paul Mackerras Cc: Peter Zijlstra Cc: Stephane Eranian Cc: Tom Zanussi LKML-Reference: Signed-off-by: Arnaldo Carvalho de Melo

perf bench: Add "all" pseudo subsystem and "all" pseudo suite

2009-12-14T07:51:19Z

This patch adds a new "all" pseudo subsystem and an "all" pseudo suite. These are for testing all subsystem and its all suite, or all suite of one subsystem. (This patch also contains a few trivial comment fixes for bench/* and output style fixes. I judged that there are no necessity to make them into individual patch.) Example of use: | % ./perf bench sched all # Test all suites of sched subsystem | # Running sched/messaging benchmark... | # 20 sender and receiver processes per group | # 10 groups == 400 processes run | | Total time: 0.414 [sec] | | # Running sched/pipe benchmark... | # Extecuted 1000000 pipe operations between two tasks | | Total time: 10.999 [sec] | | 10.999317 usecs/op | 90914 ops/sec | | % ./perf bench all # Test all suites of all subsystems | # Running sched/messaging benchmark... | # 20 sender and receiver processes per group | # 10 groups == 400 processes run | | Total time: 0.420 [sec] | | # Running sched/pipe benchmark... | # Extecuted 1000000 pipe operations between two tasks | | Total time: 11.741 [sec] | | 11.741346 usecs/op | 85169 ops/sec | | # Running mem/memcpy benchmark... | # Copying 1MB Bytes from 0x7ff33e920010 to 0x7ff3401ae010 ... | | 808.407437 MB/Sec Signed-off-by: Hitoshi Mitake Cc: Peter Zijlstra Cc: Paul Mackerras Cc: Frederic Weisbecker LKML-Reference: <1260691319-4683-1-git-send-email-mitake@dcl.info.waseda.ac.jp> Signed-off-by: Ingo Molnar

perf bench: Add memcpy() benchmark

2009-11-19T05:21:48Z

'perf bench mem memcpy' is a benchmark suite for measuring memcpy() performance. Example on a Intel(R) Core(TM)2 Duo CPU E6850 @ 3.00GHz: | % perf bench mem memcpy -l 1GB | # Running mem/memcpy benchmark... | # Copying 1MB Bytes from 0xb7d98008 to 0xb7e99008 ... | | 726.216412 MB/Sec Signed-off-by: Hitoshi Mitake Cc: Peter Zijlstra Cc: Paul Mackerras Cc: Frederic Weisbecker LKML-Reference: <1258471212-30281-1-git-send-email-mitake@dcl.info.waseda.ac.jp> [ v2: updated changelog, clarified history of builtin-bench.c ] Signed-off-by: Ingo Molnar

perf bench: Improve builtin-bench.c for more friendly output

2009-11-10T18:56:44Z

This patch makes output of perf bench more friendly. Current style of putput, keeping user wait and printing everything suddenly when we finish, may confuse users. So I improved it: | % perf bench sched messaging | # Running sched/messaging benchmark... <- printed right after invocation | # 20 sender and receiver processes per group | # 10 groups == 400 processes run | | Total time: 1.476 [sec] Signed-off-by: Hitoshi Mitake Cc: Peter Zijlstra Cc: Paul Mackerras LKML-Reference: <1257865442-20252-2-git-send-email-mitake@dcl.info.waseda.ac.jp> Signed-off-by: Ingo Molnar