path: root/tools
Age | Commit message | Author
2025-11-13 | selftests/bpf: Add test to verify freeing the special fields in pcpu maps | Leon Hwang
Add test to verify that updating [lru_,]percpu_hash maps decrements refcount when BPF_KPTR_REF objects are involved.

The tests perform the following steps:

. Call update_elem() to insert an initial value.
. Use bpf_refcount_acquire() to increment the refcount.
. Store the node pointer in the map value.
. Add the node to a linked list.
. Probe-read the refcount and verify it is *2*.
. Call update_elem() again to trigger refcount decrement.
. Probe-read the refcount and verify it is *1*.

Signed-off-by: Leon Hwang <leon.hwang@linux.dev>
Acked-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20251105151407.12723-3-leon.hwang@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
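The refcount transitions that the steps above expect can be modelled in a few lines of Python. This is a toy model, not BPF code: `Node`, `PcpuMapModel`, `store_kptr()` and `run_steps()` are hypothetical names, and the only behavior taken from the commit message is that overwriting a map value drops the reference held by the old value.

```python
class Node:
    """Toy stand-in for a refcounted object held through a BPF_KPTR_REF field."""

    def __init__(self):
        self.refcount = 1  # reference held by the creator


def refcount_acquire(node):
    """Model of bpf_refcount_acquire(): take one more reference."""
    node.refcount += 1
    return node


class PcpuMapModel:
    """Hypothetical map model with one kptr slot per key."""

    def __init__(self):
        self.kptr = {}

    def update_elem(self, key):
        # Overwriting a value releases the kptr stored in the old value.
        old = self.kptr.get(key)
        if old is not None:
            old.refcount -= 1
        self.kptr[key] = None

    def store_kptr(self, key, node):
        self.kptr[key] = node


def run_steps():
    m = PcpuMapModel()
    linked_list = []
    m.update_elem(0)                # insert an initial value
    node = Node()
    extra = refcount_acquire(node)  # refcount goes from 1 to 2
    m.store_kptr(0, extra)          # one reference lives in the map value
    linked_list.append(node)        # the other one lives in the list
    before = node.refcount
    m.update_elem(0)                # the old value's kptr is released
    after = node.refcount
    return before, after
```

Calling `run_steps()` returns the refcount observed before and after the second update, which is the pair of values the selftest probe-reads.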
2025-11-13 | Merge branch 'for-6.19/cxl-addr-xlat' into cxl-for-next | Dave Jiang
Enable unit testing for XOR address translation of SPA to DPA and vice versa.
2025-11-13 | Merge branch 'for-6.19/cxl-misc' into cxl-for-next | Dave Jiang
Misc patches for CXL 6.19
- Remove incorrect page-allocator quirk section in documentation.
- Remove unused devm_cxl_port_enumerate_dports() function.
- Fix typo in cdat.c code comment.
- Replace use of system_wq with system_percpu_wq
- Add locked decoder support
- Return when generic target updated
- Rename region_res_match_cxl_range() to spa_maps_hpa()
- Clarify comment in spa_maps_hpa()
2025-11-13 | objtool: Warn on functions with ambiguous -ffunction-sections section names | Josh Poimboeuf
When compiled with -ffunction-sections, a function named startup() will be placed in .text.startup. However, .text.startup is also used by the compiler for functions with __attribute__((constructor)). That creates an ambiguity for the vmlinux linker script, which needs to differentiate those two cases. Similar naming conflicts exist for functions named exit(), split(), unlikely(), hot() and unknown(). One potential solution would be to use '#ifdef CC_USING_FUNCTION_SECTIONS' to create two distinct implementations of the TEXT_MAIN macro. However, -ffunction-sections can be (and is) enabled or disabled on a per-object basis (for example via ccflags-y or AUTOFDO_PROFILE). So the recently unified TEXT_MAIN macro (commit 1ba9f8979426 ("vmlinux.lds: Unify TEXT_MAIN, DATA_MAIN, and related macros")) is necessary. This means there's no way for the linker script to disambiguate things. Instead, use objtool to warn on any function names whose resulting section names might create ambiguity when the kernel is compiled (in whole or in part) with -ffunction-sections. Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: live-patching@vger.kernel.org Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: https://patch.msgid.link/65fedea974fe14be487c8867a0b8d0e4a294ce1e.1762991150.git.jpoimboe@kernel.org
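The naming clash described above can be illustrated with a small Python sketch. It is not objtool's implementation: `section_for()` and `ambiguous_functions()` are hypothetical helpers, the name list is only the one given in the commit message, and exact-name matching is an assumption about how such a check could work.

```python
# Function names listed in the commit message whose -ffunction-sections
# section name collides with a compiler-reserved .text.* section.
AMBIGUOUS_NAMES = {"startup", "exit", "split", "unlikely", "hot", "unknown"}


def section_for(func_name):
    """Section that -ffunction-sections gives a function with this name."""
    return ".text." + func_name


def ambiguous_functions(func_names):
    """Return the names that would deserve a warning, in input order."""
    return [name for name in func_names if name in AMBIGUOUS_NAMES]
```

For example, `ambiguous_functions(["startup", "do_work", "hot"])` flags `startup` and `hot`, whose sections would be `.text.startup` and `.text.hot`.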
2025-11-13 | Merge tag 'v6.18-rc5' into objtool/core, to pick up fixes | Ingo Molnar
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2025-11-12 | tools: ynltool: correct install in Makefile | Jakub Kicinski
Use the variable in case the user has a custom install binary. Acked-by: Stanislav Fomichev <sdf@fomichev.me> Link: https://patch.msgid.link/20251111155214.2760711-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-12 | selftests: drv-net: Limit the max number of queues in procfs_downup_hammer | Dimitri Daskalakis
For NICs with a large (1024+) number of queues, this test can cause excessive memory fragmentation. This results in OOM errors, and in the worst case driver/kernel crashes. We don't need to test with the max number of queues, just enough to create a high likelihood of races between reconfiguration and stats getting read. Signed-off-by: Dimitri Daskalakis <dimitri.daskalakis1@gmail.com> Link: https://patch.msgid.link/20251111225319.3019542-1-dimitri.daskalakis1@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-12 | perf test: Add a perf event fallback test | Zide Chen
This adds test cases to verify the precise ip fallback logic:
- If the system supports precise ip, for an event given with the maximum precision level, it should be able to decrease precise_ip to find a supported level.
- The same fallback behavior should also work in more complex scenarios, such as event groups or when PEBS is involved.

Additional fallback tests, such as those covering missing feature cases, can be added in the future.

Suggested-by: Ian Rogers <irogers@google.com>
Reviewed-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Signed-off-by: Zide Chen <zide.chen@intel.com>
Reviewed-by: Ian Rogers <irogers@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
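The fallback idea, stepping precise_ip down until a supported level is found, can be sketched as follows. This is an illustration only: `fallback_precise_ip()` and the `is_supported` callback are hypothetical, and the real logic lives in perf's event-opening code rather than in a helper like this.

```python
def fallback_precise_ip(requested, is_supported):
    """Return the highest supported level <= requested, or None.

    `is_supported(level)` is a caller-supplied predicate (an assumption
    of this sketch) that stands in for trying to open the event.
    """
    level = requested
    while level >= 0:
        if is_supported(level):
            return level
        level -= 1  # decrease precision and try again
    return None
```

With a system that supports levels 0 and 1 only, a request for level 3 falls back to 1.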
2025-11-12 | hung_task: panic when there are more than N hung tasks at the same time | Li RongQing
The hung_task_panic sysctl is currently a blunt instrument: it's all or nothing. Panicking on a single hung task can be an overreaction to a transient glitch. A more reliable indicator of a systemic problem is when multiple tasks hang simultaneously.

Extend hung_task_panic to accept an integer threshold, allowing the kernel to panic only when N hung tasks are detected in a single scan. This provides finer control to distinguish between isolated incidents and system-wide failures.

The accepted values are:
- 0: Don't panic (unchanged)
- 1: Panic on the first hung task (unchanged)
- N > 1: Panic after N hung tasks are detected in a single scan

The original behavior is preserved for values 0 and 1, maintaining full backward compatibility.

[lance.yang@linux.dev: new changelog]
Link: https://lkml.kernel.org/r/20251015063615.2632-1-lirongqing@baidu.com
Signed-off-by: Li RongQing <lirongqing@baidu.com>
Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Reviewed-by: Lance Yang <lance.yang@linux.dev>
Tested-by: Lance Yang <lance.yang@linux.dev>
Acked-by: Andrew Jeffery <andrew@codeconstruct.com.au> [aspeed_g5_defconfig]
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: David Hildenbrand <david@redhat.com>
Cc: Florian Westphal <fw@strlen.de>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Jason A. Donenfeld <jason@zx2c4.com>
Cc: Joel Granados <joel.granados@kernel.org>
Cc: Joel Stanley <joel@jms.id.au>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Kees Cook <kees@kernel.org>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: "Paul E. McKenney" <paulmck@kernel.org>
Cc: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Phil Auld <pauld@redhat.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Simon Horman <horms@kernel.org>
Cc: Stanislav Fomichev <sdf@fomichev.me>
Cc: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
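The threshold semantics listed in this commit message reduce to a single comparison. The sketch below is a model of that rule, not the kernel code; `should_panic()` and its parameter names are made up for illustration.

```python
def should_panic(hung_task_panic, hung_in_scan):
    """Model of the described rule for one scan.

    hung_task_panic: sysctl value (0 disables panicking).
    hung_in_scan:    number of hung tasks detected in this scan.
    """
    return hung_task_panic > 0 and hung_in_scan >= hung_task_panic
```

Values 0 and 1 keep their old meaning under this rule: 0 never panics, and 1 panics as soon as a single hung task is seen.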
2025-11-12 | sched_ext: Add scx_cpu0 example scheduler | Tejun Heo
Add scx_cpu0, a simple scheduler that queues all tasks to a single DSQ and only dispatches them from CPU0 in FIFO order. This is useful for testing bypass behavior when many tasks are concentrated on a single CPU. If the load balancer doesn't work, bypass mode can trigger task hangs or RCU stalls as the queue is long and there's only one CPU working on it. v2: Check whether task is on CPU0 at enqueue using scx_bpf_task_cpu() instead of nr_cpus_allowed (Andrea Righi). Cc: Dan Schatzberg <schatzberg.dan@gmail.com> Cc: Emil Tsalapatis <etsal@meta.com> Reviewed-by: Andrea Righi <arighi@nvidia.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2025-11-12 | vfio: selftests: replace iova=vaddr with allocated iovas | Alex Mastro
vfio_dma_mapping_test and vfio_pci_driver_test currently use iova=vaddr as part of DMA mapping operations. However, not all IOMMUs support the same virtual address width as the processor. For instance, older Intel consumer platforms only support 39 bits of IOMMU address space. On such platforms, using the virtual address as the IOVA fails. Make the tests more robust by using iova_allocator to vend IOVAs, which queries legally accessible IOVAs from the underlying IOMMUFD or VFIO container. Reviewed-by: David Matlack <dmatlack@google.com> Tested-by: David Matlack <dmatlack@google.com> Signed-off-by: Alex Mastro <amastro@fb.com> Link: https://lore.kernel.org/r/20251111-iova-ranges-v3-4-7960244642c5@fb.com Signed-off-by: Alex Williamson <alex@shazbot.org>
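A quick arithmetic check shows why iova=vaddr breaks on a 39-bit IOMMU. The helper `fits_in_iommu()` and the sample addresses below are illustrative assumptions, not values taken from the tests.

```python
def fits_in_iommu(addr, size, addr_bits=39):
    """True if [addr, addr + size) lies inside an addr_bits-wide IOVA space."""
    return addr >= 0 and size > 0 and addr + size - 1 < (1 << addr_bits)


# A typical-looking 47-bit userspace mapping address (made up for the example).
SAMPLE_VADDR = 0x7F0000000000
# A low IOVA such as an allocator might hand out (also made up).
SAMPLE_IOVA = 0x100000
```

A 39-bit space ends at 2^39 = 0x8000000000, so `SAMPLE_VADDR` is far outside it while `SAMPLE_IOVA` fits.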
2025-11-12 | vfio: selftests: add iova allocator | Alex Mastro
Add struct iova_allocator, which gives tests a convenient way to generate legally-accessible IOVAs to map. This allocator traverses the sorted available IOVA ranges linearly, requires power-of-two size allocations, and does not support freeing iova allocations. The assumption is that tests are not IOVA space-bounded, and will not need to recycle IOVAs. This is based on Alex Williamson's patch series for adding an IOVA allocator [1]. [1] https://lore.kernel.org/all/20251108212954.26477-1-alex@shazbot.org/ Reviewed-by: David Matlack <dmatlack@google.com> Tested-by: David Matlack <dmatlack@google.com> Signed-off-by: Alex Mastro <amastro@fb.com> Link: https://lore.kernel.org/r/20251111-iova-ranges-v3-3-7960244642c5@fb.com Signed-off-by: Alex Williamson <alex@shazbot.org>
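The allocation strategy described above (linear walk over sorted ranges, power-of-two sizes, no freeing) can be sketched in Python. This is not the selftest's struct iova_allocator: the class, its fields, and the choice to align each allocation to its own size are assumptions made for the example.

```python
class IovaAllocatorSketch:
    """Bump allocator over sorted, inclusive (start, last) IOVA ranges."""

    def __init__(self, ranges):
        self.ranges = sorted(ranges)
        self.idx = 0  # index of the range currently being carved up
        self.next = self.ranges[0][0] if self.ranges else 0

    def alloc(self, size):
        if size <= 0 or size & (size - 1):
            raise ValueError("size must be a power of two")
        while self.idx < len(self.ranges):
            start, last = self.ranges[self.idx]
            iova = max(self.next, start)
            # Align up to the allocation size (assumption of this sketch).
            iova = (iova + size - 1) & ~(size - 1)
            if iova + size - 1 <= last:
                self.next = iova + size
                return iova
            # Does not fit: move on to the next range; nothing is ever freed.
            self.idx += 1
            if self.idx < len(self.ranges):
                self.next = self.ranges[self.idx][0]
        raise MemoryError("no IOVA space left")
```

Because the walk only moves forward, space skipped for alignment or left at the end of a range is never reused, which matches the stated assumption that tests are not bounded by IOVA space.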
2025-11-12 | vfio: selftests: fix map limit tests to use last available iova | Alex Mastro
Use the newly available vfio_pci_iova_ranges() to determine the last legal IOVA, and use this as the basis for vfio_dma_map_limit_test tests. Fixes: de8d1f2fd5a5 ("vfio: selftests: add end of address space DMA map/unmap tests") Reviewed-by: David Matlack <dmatlack@google.com> Tested-by: David Matlack <dmatlack@google.com> Signed-off-by: Alex Mastro <amastro@fb.com> Link: https://lore.kernel.org/r/20251111-iova-ranges-v3-2-7960244642c5@fb.com Signed-off-by: Alex Williamson <alex@shazbot.org>
2025-11-12 | vfio: selftests: add iova range query helpers | Alex Mastro
VFIO selftests need to map IOVAs from legally accessible ranges, which could vary between hardware. Tests in vfio_dma_mapping_test.c are making excessively strong assumptions about which IOVAs can be mapped. Add vfio_iommu_iova_ranges(), which queries IOVA ranges from the IOMMUFD or VFIO container associated with the device. The queried ranges are normalized to IOMMUFD's iommu_iova_range representation so that handling of IOVA ranges up the stack can be implementation-agnostic. iommu_iova_range and vfio_iova_range are equivalent, so bias to using the new interface's struct. Query IOMMUFD's ranges with IOMMU_IOAS_IOVA_RANGES. Query VFIO container's ranges with VFIO_IOMMU_GET_INFO and VFIO_IOMMU_TYPE1_INFO_CAP_IOVA_RANGE. The underlying vfio_iommu_type1_info buffer-related functionality has been kept generic so the same helpers can be used to query other capability chain information, if needed. Reviewed-by: David Matlack <dmatlack@google.com> Tested-by: David Matlack <dmatlack@google.com> Signed-off-by: Alex Mastro <amastro@fb.com> Link: https://lore.kernel.org/r/20251111-iova-ranges-v3-1-7960244642c5@fb.com Signed-off-by: Alex Williamson <alex@shazbot.org>
2025-11-12 | selftests/vsock: disable shellcheck SC2317 and SC2119 | Bobby Eshleman
Disable shellcheck rules SC2317 and SC2119. These rules are being triggered due to false positives. For SC2317, many `return "${KSFT_PASS}"` lines are reported as unreachable, even though they are executed during normal runs. For SC2119, the fact that log_guest/log_host accept either stdin or arguments triggers SC2119, despite being valid. Signed-off-by: Bobby Eshleman <bobbyeshleman@meta.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Link: https://patch.msgid.link/20251108-vsock-selftests-fixes-and-improvements-v4-12-d5e8d6c87289@meta.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-12 | selftests/vsock: add vsock_loopback module loading | Bobby Eshleman
Add vsock_loopback module loading to the loopback test so that vmtest.sh can be used for kernels built with loopback as a module. This is not technically a fix as kselftest expects loopback to be built-in already (defined in selftests/vsock/config). This is useful only for using vmtest.sh outside of kselftest. Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: Bobby Eshleman <bobbyeshleman@meta.com> Link: https://patch.msgid.link/20251108-vsock-selftests-fixes-and-improvements-v4-11-d5e8d6c87289@meta.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-12 | selftests/vsock: add 1.37 to tested virtme-ng versions | Bobby Eshleman
Testing with 1.37 shows all tests passing but emits the warning: warning: vng version 'virtme-ng 1.37' has not been tested and may not function properly. The following versions have been tested: 1.33 1.36 This patch adds 1.37 to the virtme-ng versions to get rid of the above warning. Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: Bobby Eshleman <bobbyeshleman@meta.com> Link: https://patch.msgid.link/20251108-vsock-selftests-fixes-and-improvements-v4-10-d5e8d6c87289@meta.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-12 | selftests/vsock: add BUILD=0 definition | Bobby Eshleman
Add the definition for BUILD and initialize it to zero. This prevents 'bash -u vmtest.sh' from throwing 'unbound variable' when BUILD is not set to 1 and is later checked for its value. Signed-off-by: Bobby Eshleman <bobbyeshleman@meta.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Link: https://patch.msgid.link/20251108-vsock-selftests-fixes-and-improvements-v4-9-d5e8d6c87289@meta.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-12 | selftests/vsock: identify and execute tests that can re-use VM | Bobby Eshleman
In preparation for future patches that introduce tests that cannot re-use the same VM, add functions to identify those that *can* re-use a VM. By continuing to re-use the same VM for these tests we can save time by avoiding the delay of booting a VM for every test. Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Bobby Eshleman <bobbyeshleman@meta.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Link: https://patch.msgid.link/20251108-vsock-selftests-fixes-and-improvements-v4-8-d5e8d6c87289@meta.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-12 | selftests/vsock: add check_result() for pass/fail counting | Bobby Eshleman
Add check_result() function to reuse logic for incrementing the pass/fail counters. This function will get used by different callers as we add different types of tests in future patches (namely, namespace and non-namespace tests will be called at different places, and re-use this function). Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Bobby Eshleman <bobbyeshleman@meta.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Link: https://patch.msgid.link/20251108-vsock-selftests-fixes-and-improvements-v4-7-d5e8d6c87289@meta.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-12 | selftests/vsock: speed up tests by reducing the QEMU pidfile timeout | Bobby Eshleman
Reduce the time waiting for the QEMU pidfile from three minutes to five seconds. The three-minute time window was chosen to make sure QEMU had enough time to fully boot up. This, however, is an unreasonably long delay for QEMU to write the pidfile, which happens earlier when the QEMU process starts (not after VM boot). The three-minute delay becomes noticeably wasteful in future tests that expect QEMU to fail and wait a full three minutes for a pidfile that will never exist. Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Bobby Eshleman <bobbyeshleman@meta.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Link: https://patch.msgid.link/20251108-vsock-selftests-fixes-and-improvements-v4-6-d5e8d6c87289@meta.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
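The effect of a short pidfile timeout can be illustrated with a bounded polling loop. vmtest.sh is a bash script, so this Python version is only a sketch of the idea; `wait_for_pidfile()`, the polling interval, and the default values are assumptions.

```python
import os
import time


def wait_for_pidfile(path, timeout_s=5.0, poll_s=0.05):
    """Poll until `path` exists or `timeout_s` seconds have elapsed."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if os.path.exists(path):
            return True
        time.sleep(poll_s)
    # One last check in case the file appeared right at the deadline.
    return os.path.exists(path)
```

When QEMU is expected to fail, the caller learns about the missing pidfile after at most `timeout_s` seconds instead of after a full boot-sized window.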
2025-11-12 | selftests/vsock: do not unconditionally die if qemu fails | Bobby Eshleman
If QEMU fails to boot, then set the returncode (via timeout) instead of unconditionally dying. This is in preparation for tests that expect QEMU to fail to boot. In that case, we just want to know if the boot failed or not so we can test the pass/fail criteria, and continue executing the next test. Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Bobby Eshleman <bobbyeshleman@meta.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Link: https://patch.msgid.link/20251108-vsock-selftests-fixes-and-improvements-v4-5-d5e8d6c87289@meta.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-12 | selftests/vsock: avoid multi-VM pidfile collisions with QEMU | Bobby Eshleman
Change QEMU to use generated pidfile names instead of just a single globally-defined pidfile. This allows multiple QEMU instances to co-exist with different pidfiles. This is required for future tests that use multiple VMs to check for CID collisions. This also places the burden of killing the QEMU process and cleaning up the pidfile on the caller of vm_start(). To help with this, a function terminate_pidfiles() is introduced that callers use to perform the cleanup. The terminate_pidfiles() function supports multiple pidfile removals because future patches will need to process two pidfiles at a time. Change QEMU_OPTS to be initialized inside vm_start(). This allows the generated pidfile to be passed to the string assignment, and prepares for future vm-specific options as well (e.g., cid). Signed-off-by: Bobby Eshleman <bobbyeshleman@meta.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Link: https://patch.msgid.link/20251108-vsock-selftests-fixes-and-improvements-v4-4-d5e8d6c87289@meta.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-12 | selftests/vsock: reuse logic for vsock_test through wrapper functions | Bobby Eshleman
Add wrapper functions vm_vsock_test() and host_vsock_test() to invoke the vsock_test binary. This encapsulates several items of repeat logic, such as waiting for the server to reach listening state and enabling/disabling the bash option pipefail to avoid pipe-style logging from hiding failures. Signed-off-by: Bobby Eshleman <bobbyeshleman@meta.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Link: https://patch.msgid.link/20251108-vsock-selftests-fixes-and-improvements-v4-3-d5e8d6c87289@meta.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-12 | selftests/vsock: make wait_for_listener() work even if pipefail is on | Bobby Eshleman
Rewrite wait_for_listener()'s pattern matching to avoid tripping the if-condition when pipefail is on. awk doesn't gracefully handle SIGPIPE with a non-zero exit code, so grep exiting upon finding a match causes false-positives when the pipefail option is used (grep exits, SIGPIPE emits, and awk complains with a non-zero exit code). Instead, move all of the pattern matching into awk so that SIGPIPE cannot happen and the correct exit code is returned. Signed-off-by: Bobby Eshleman <bobbyeshleman@meta.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Link: https://patch.msgid.link/20251108-vsock-selftests-fixes-and-improvements-v4-2-d5e8d6c87289@meta.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
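The single-pass matching idea, doing every check in one process so that no pipeline stage can exit early, can be sketched in Python. The sketch assumes the listener check inspects text in the /proc/net/tcp format; `port_is_listening()` and the sample table are hypothetical and are not taken from vmtest.sh.

```python
def port_is_listening(proc_net_tcp_text, port):
    """Return True if some socket is in LISTEN state on local `port`."""
    for line in proc_net_tcp_text.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 4:
            continue
        # local_address is "HEXIP:HEXPORT"; st is the hex socket state.
        local_port = int(fields[1].rsplit(":", 1)[1], 16)
        if local_port == port and fields[3] == "0A":  # 0A is TCP_LISTEN
            return True
    return False


# Made-up sample in /proc/net/tcp layout: port 22 listening, 54321 established.
SAMPLE = """  sl  local_address rem_address   st tx_queue rx_queue
   0: 00000000:0016 00000000:0000 0A 00000000:00000000
   1: 0100007F:D431 0100007F:0016 01 00000000:00000000
"""
```

Since one pass reads the whole input and decides the result, there is no second process whose early exit could raise SIGPIPE in the first.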
2025-11-12 | selftests/vsock: improve logging in vmtest.sh | Bobby Eshleman
Improve usability of logging functions. Remove the test name prefix from logging functions so that logging calls can be made deeper into the call stack without passing down the test name or setting some global. Teach the log function to accept a LOG_PREFIX variable to avoid unnecessary argument shifting. Remove log_setup() and instead use log_host(). The host/guest prefixes are useful to show whether a failure happened on the guest or host side, but "setup" doesn't really give additional useful information. Since all log_setup() calls happen on the host, let's just use log_host() instead. Signed-off-by: Bobby Eshleman <bobbyeshleman@meta.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Link: https://patch.msgid.link/20251108-vsock-selftests-fixes-and-improvements-v4-1-d5e8d6c87289@meta.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-12 | KVM: selftests: Test for KVM_EXIT_ARM_SEA | Jiaqi Yan
Test how KVM handles guest SEA when APEI is unable to claim it, and KVM_CAP_ARM_SEA_TO_USER is enabled. The behavior is triggered by consuming a recoverable memory error (UER) injected via EINJ.

The test asserts two major things:
1. KVM returns to userspace with KVM_EXIT_ARM_SEA exit reason, and has provided expected fault information, e.g. esr, flags, gva, gpa.
2. Userspace is able to handle KVM_EXIT_ARM_SEA by injecting SEA to guest and KVM injects expected SEA into the VCPU.

Tested on a data center server running Siryn AmpereOne processor that has RAS support.

Several things to notice before attempting to run this selftest:
- The test relies on EINJ support in both firmware and kernel to inject UER. Otherwise the test will be skipped.
- The APEI of the platform under test should be unable to claim the SEA. Otherwise the test will be skipped.
- Some platforms don't support notrigger in EINJ, which may cause APEI and GHES to offline the memory before the guest can consume the injected UER, making the test unable to trigger SEA.

Signed-off-by: Jiaqi Yan <jiaqiyan@google.com>
Link: https://msgid.link/20251013185903.1372553-3-jiaqiyan@google.com
Signed-off-by: Oliver Upton <oupton@kernel.org>
2025-11-11 | selftests: mptcp: join: properly kill background tasks | Matthieu Baerts (NGI0)
The 'run_tests' function is executed in the background, but killing its associated PID would not kill the children tasks running in the background. To properly kill all background tasks, 'kill -- -PID' could be used, but this requires kill from procps-ng. Instead, all children tasks are listed using 'ps', and 'kill' is called with all PIDs of this group. Fixes: 31ee4ad86afd ("selftests: mptcp: join: stop transfer when check is done (part 1)") Cc: stable@vger.kernel.org Fixes: 04b57c9e096a ("selftests: mptcp: join: stop transfer when check is done (part 2)") Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20251110-net-mptcp-sft-join-unstable-v1-6-a4332c714e10@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-11 | selftests: mptcp: connect: trunc: read all recv data | Matthieu Baerts (NGI0)
MPTCP Join "fastclose server" selftest is sometimes failing because the client output file doesn't have the expected size, e.g. 296B instead of 1024B. When looking at a packet trace when this happens, the server sent the expected 1024B in two parts -- 100B, then 924B -- then the MP_FASTCLOSE. It is then strange to see the client only receiving 296B, which would mean it only got a part of the second packet. The problem is then not on the networking side, but rather on the data reception side. When mptcp_connect is launched with '-f -1', it means the connection might stop before having sent everything, because a reset has been received. When this happens, the program was directly stopped. But it is also possible there are still some data to read, simply because the previous 'read' step was done with a buffer smaller than the pending data, see do_rnd_read(). In this case, it is important to read what's left in the kernel buffers before stopping without error like before. SIGPIPE is now ignored, not to quit the app before having read everything. Fixes: 6bf41020b72b ("selftests: mptcp: update and extend fastclose test-cases") Cc: stable@vger.kernel.org Reviewed-by: Geliang Tang <geliang@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20251110-net-mptcp-sft-join-unstable-v1-5-a4332c714e10@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
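The fix boils down to reading until the kernel buffers are empty, even when each read uses a buffer smaller than the pending data. The sketch below shows that pattern with a plain Unix socket pair; it is not mptcp_connect code, it uses a clean close instead of an MP_FASTCLOSE reset, and `drain()` and `demo()` are hypothetical names.

```python
import socket


def drain(sock, bufsize=64):
    """Read with a small buffer until the peer's data is exhausted."""
    chunks = []
    while True:
        data = sock.recv(bufsize)
        if not data:  # empty read: nothing left and the peer has closed
            break
        chunks.append(data)
    return b"".join(chunks)


def demo():
    """Send 100 B then 924 B, close the sender, and count what arrives."""
    sender, receiver = socket.socketpair()
    try:
        sender.sendall(b"x" * 100)
        sender.sendall(b"y" * 924)
        sender.close()
        return len(drain(receiver))
    finally:
        receiver.close()
```

Stopping after the first short read would leave part of the 1024 bytes unread, which mirrors the truncated client output file described in the commit message.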
2025-11-11 | selftests: mptcp: join: userspace: longer transfer | Matthieu Baerts (NGI0)
In rare cases, when the test environment is very slow, some userspace tests can fail because some expected events have not been seen. Because the tests are expecting a long on-going connection, and they are not waiting for the end of the transfer, it is fine to make the connection longer. This connection will be killed at the end, after the verifications, so making it longer doesn't change anything, apart from preventing it from ending before the end of the verifications. To play it safe, all userspace tests not waiting for the end of the transfer are now sharing a longer file (128KB) at slow speed. Fixes: 4369c198e599 ("selftests: mptcp: test userspace pm out of transfer") Cc: stable@vger.kernel.org Fixes: b2e2248f365a ("selftests: mptcp: userspace pm create id 0 subflow") Fixes: e3b47e460b4b ("selftests: mptcp: userspace pm remove initial subflow") Fixes: b9fb176081fb ("selftests: mptcp: userspace pm send RM_ADDR for ID 0") Reviewed-by: Geliang Tang <geliang@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20251110-net-mptcp-sft-join-unstable-v1-4-a4332c714e10@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-11 | selftests: mptcp: join: endpoints: longer transfer | Matthieu Baerts (NGI0)
In rare cases, when the test environment is very slow, some userspace tests can fail because some expected events have not been seen. Because the tests are expecting a long on-going connection, and they are not waiting for the end of the transfer, it is fine to make the connection longer. This connection will be killed at the end, after the verifications, so making it longer doesn't change anything, apart from preventing it from ending before the end of the verifications. To play it safe, all endpoints tests not waiting for the end of the transfer are now sharing a longer file (128KB) at slow speed. Fixes: 69c6ce7b6eca ("selftests: mptcp: add implicit endpoint test case") Cc: stable@vger.kernel.org Fixes: e274f7154008 ("selftests: mptcp: add subflow limits test-cases") Fixes: b5e2fb832f48 ("selftests: mptcp: add explicit test case for remove/readd") Fixes: e06959e9eebd ("selftests: mptcp: join: test for flush/re-add endpoints") Reviewed-by: Geliang Tang <geliang@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20251110-net-mptcp-sft-join-unstable-v1-3-a4332c714e10@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-11 | selftests: mptcp: join: rm: set backup flag | Matthieu Baerts (NGI0)
Some of these 'remove' tests fail on rare occasions because a subflow has been reset instead of cleanly removed. This can happen when one extra subflow which has never carried data is being closed (FIN) on one side, while the other is sending data for the first time. To prevent such subflows from being used right at the end, the backup flag has been added. With that, data will only be carried on the initial subflow. Fixes: d2c4333a801c ("selftests: mptcp: add testcases for removing addrs") Cc: stable@vger.kernel.org Reviewed-by: Geliang Tang <geliang@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20251110-net-mptcp-sft-join-unstable-v1-2-a4332c714e10@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-11 | selftests: mptcp: connect: fix fallback note due to OoO | Matthieu Baerts (NGI0)
The "fallback due to TCP OoO" was never printed because the stat_ooo_now variable was checked twice: once in the parent if-statement, and once in the child one. The second condition was therefore always true, and the 'else' branch was never taken. The idea is that when there are more ACK + MP_CAPABLE than expected, the test either fails if there were no out-of-order packets, or a notice is printed. Fixes: 69ca3d29a755 ("mptcp: update selftest for fallback due to OoO") Cc: stable@vger.kernel.org Reviewed-by: Geliang Tang <geliang@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20251110-net-mptcp-sft-join-unstable-v1-1-a4332c714e10@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
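The double-check bug is easier to see side by side with the intended logic. The real test is a shell script, so the Python below only models the control flow; the exact shape of the original conditions is an assumption, and `buggy_check()`, `fixed_check()` and their return strings are invented for illustration.

```python
def buggy_check(extra_acks, stat_ooo_now):
    """Parent and child both test stat_ooo_now, so 'else' is dead code."""
    if extra_acks and stat_ooo_now == 0:
        if stat_ooo_now == 0:  # always true inside this branch
            return "fail"
        else:
            return "note: fallback due to TCP OoO"  # never reached
    return "ok"


def fixed_check(extra_acks, stat_ooo_now):
    """Only the child tests stat_ooo_now, so both outcomes are reachable."""
    if extra_acks:
        if stat_ooo_now == 0:
            return "fail"
        return "note: fallback due to TCP OoO"
    return "ok"
```

With extra ACKs and out-of-order packets present, the buggy form silently reports success while the fixed form prints the notice.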
2025-11-11 | perf stat: Align metric output without events | Namhyung Kim
One of my concerns with the perf stat output was the alignment of the metrics and shadow stats. I think the code calculated the basic output length using COUNTS_LEN and EVNAME_LEN but failed to add the unit length (such as "msec") and the two surrounding spaces. I'm not sure why it's not printed below though. But anyway, now it shows correctly aligned metric output.

  $ perf stat true

   Performance counter stats for 'true':

             859,772      task-clock                       #    0.395 CPUs utilized
                   0      context-switches                 #    0.000 /sec
                   0      cpu-migrations                   #    0.000 /sec
                  56      page-faults                      #   65.134 K/sec
           1,075,022      instructions                     #     0.86 insn per cycle
           1,255,911      cycles                           #    1.461 GHz
             220,573      branches                         #  256.548 M/sec
               7,381      branch-misses                    #     3.35% of all branches
                          TopdownL1                        #     19.2 %  tma_retiring
                                                           #     28.6 %  tma_backend_bound
                                                           #      9.5 %  tma_bad_speculation
                                                           #     42.6 %  tma_frontend_bound

         0.002174871 seconds time elapsed                  ^
                                                           |
         0.002154000 seconds user                          |
         0.000000000 seconds sys                          here

Reviewed-by: Ian Rogers <irogers@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-11-11 | perf tool_pmu: Make core_wide and target_cpu json events | Ian Rogers
For the sake of better documentation, add core_wide and target_cpu to the tool.json. When the values of system_wide and user_requested_cpu_list are unknown, use the values from the global stat_config.

Example output showing how '-a' modifies the values in `perf stat`:
```
$ perf stat -e core_wide,target_cpu true

 Performance counter stats for 'true':

                 0      core_wide
                 0      target_cpu

       0.000993787 seconds time elapsed

       0.001128000 seconds user
       0.000000000 seconds sys

$ perf stat -e core_wide,target_cpu -a true

 Performance counter stats for 'system wide':

                 1      core_wide
                 1      target_cpu

       0.002271723 seconds time elapsed

$ perf list
...
tool:
  core_wide
       [1 if not SMT,if SMT are events being gathered on all SMT threads 1 otherwise 0. Unit: tool]
...
  target_cpu
       [1 if CPUs being analyzed,0 if threads/processes. Unit: tool]
...
```

Signed-off-by: Ian Rogers <irogers@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-11-11 | perf test stat csv: Update test expectations and events | Ian Rogers
Explicitly use a metric rather than implicitly expecting '-e instructions,cycles' to produce a metric. Use a metric with software events to make it more compatible. Signed-off-by: Ian Rogers <irogers@google.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-11-11 | perf test stat: Update test expectations and events | Ian Rogers
test_stat_record_report and test_stat_record_script used default output which triggers a bug when sending metrics. As this isn't relevant to the test, switch to using named software events. Update the match in test_hybrid as the cycles event is now cpu-cycles to work around potential ARM issues. Signed-off-by: Ian Rogers <irogers@google.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-11-11 | perf test stat: Update shadow test to use metrics | Ian Rogers
Previously '-e cycles,instructions' would implicitly create an IPC metric. This now has to be explicit with '-M insn_per_cycle'. Signed-off-by: Ian Rogers <irogers@google.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-11-11 | perf test metrics: Update all metrics for possibly failing default metrics | Ian Rogers
Default metrics may use unsupported events and be ignored. These metrics shouldn't cause metric testing to fail. Signed-off-by: Ian Rogers <irogers@google.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-11-11 | perf test stat: Update std_output testing metric expectations | Ian Rogers
Make the expectations match json metrics rather than the previous hard coded ones. Signed-off-by: Ian Rogers <irogers@google.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-11-11 | perf test stat: Ignore failures in Default[234] metricgroups | Ian Rogers
The Default[234] metric groups may contain unsupported legacy events. Allow those metric groups to fail. Signed-off-by: Ian Rogers <irogers@google.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-11-11 | perf test stat+json: Improve metric-only testing | Ian Rogers
When testing metric-only, pass a metric to perf rather than expecting a hard coded metric value to be generated. Remove keys that were really metric-only units, and instead don't expect metric-only output to have a matching json key, as it encodes metrics as {"metric_name", "metric_value"}. Signed-off-by: Ian Rogers <irogers@google.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-11-11perf stat: Remove "unit" workarounds for metric-onlyIan Rogers
Remove code that tested the "unit" as in KB/sec for certain hard coded metric values and did workarounds. Signed-off-by: Ian Rogers <irogers@google.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-11-11perf stat: Sort default events/metricsIan Rogers
To improve the readability of default events/metrics, sort the evsels after the Default metric groups have been parsed.

Before:
```
$ perf stat -a sleep 1

 Performance counter stats for 'system wide':

          22,087  context-switches                # nan cs/sec cs_per_second
            TopdownL1 (cpu_core)                  # 10.3 % tma_bad_speculation
                                                  # 25.8 % tma_frontend_bound
                                                  # 34.5 % tma_backend_bound
                                                  # 29.3 % tma_retiring
           7,829  page-faults                     # nan faults/sec page_faults_per_second
     880,144,270  cpu_atom/cpu-cycles/            # nan GHz cycles_frequency (50.10%)
   1,693,081,235  cpu_core/cpu-cycles/            # nan GHz cycles_frequency
            TopdownL1 (cpu_atom)                  # 20.5 % tma_bad_speculation
                                                  # 13.8 % tma_retiring (50.26%)
                                                  # 34.6 % tma_frontend_bound (50.23%)
      89,326,916  cpu_atom/branches/              # nan M/sec branch_frequency (60.19%)
     538,123,088  cpu_core/branches/              # nan M/sec branch_frequency
           1,368  cpu-migrations                  # nan migrations/sec migrations_per_second
                                                  # 31.1 % tma_backend_bound (60.19%)
       0.00 msec  cpu-clock                       # 0.0 CPUs CPUs_utilized
     485,744,856  cpu_atom/instructions/          # 0.6 instructions insn_per_cycle (59.87%)
   3,093,112,283  cpu_core/instructions/          # 1.8 instructions insn_per_cycle
       4,939,427  cpu_atom/branch-misses/         # 5.0 % branch_miss_rate (49.77%)
       7,632,248  cpu_core/branch-misses/         # 1.4 % branch_miss_rate

       1.005084693 seconds time elapsed
```

After:
```
$ perf stat -a sleep 1

 Performance counter stats for 'system wide':

          22,165  context-switches                # nan cs/sec cs_per_second
       0.00 msec  cpu-clock                       # 0.0 CPUs CPUs_utilized
           2,260  cpu-migrations                  # nan migrations/sec migrations_per_second
          20,476  page-faults                     # nan faults/sec page_faults_per_second
      17,052,357  cpu_core/branch-misses/         # 1.5 % branch_miss_rate
   1,120,090,590  cpu_core/branches/              # nan M/sec branch_frequency
   3,402,892,275  cpu_core/cpu-cycles/            # nan GHz cycles_frequency
   6,129,236,701  cpu_core/instructions/          # 1.8 instructions insn_per_cycle
       6,159,523  cpu_atom/branch-misses/         # 3.1 % branch_miss_rate (49.86%)
     222,158,812  cpu_atom/branches/              # nan M/sec branch_frequency (50.25%)
   1,547,610,244  cpu_atom/cpu-cycles/            # nan GHz cycles_frequency (50.40%)
   1,304,901,260  cpu_atom/instructions/          # 0.8 instructions insn_per_cycle (50.41%)
            TopdownL1 (cpu_core)                  # 13.7 % tma_bad_speculation
                                                  # 23.5 % tma_frontend_bound
                                                  # 33.3 % tma_backend_bound
                                                  # 29.6 % tma_retiring
            TopdownL1 (cpu_atom)                  # 32.1 % tma_backend_bound (59.65%)
                                                  # 30.1 % tma_frontend_bound (59.51%)
                                                  # 22.3 % tma_bad_speculation
                                                  # 15.5 % tma_retiring (59.53%)

       1.008405429 seconds time elapsed
```

Signed-off-by: Ian Rogers <irogers@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
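The ordering visible in the "After" output of this commit can be approximated with a plain sort key. The Python sketch below is inferred only from that sample output: the `Row` type, the `PMU_ORDER` table and the key itself are assumptions for illustration, and the actual comparator in perf may use different criteria.

```python
from dataclasses import dataclass

# Assumed PMU ranking, read off the sample output: software events first,
# then cpu_core, then cpu_atom.
PMU_ORDER = {"software": 0, "cpu_core": 1, "cpu_atom": 2}


@dataclass
class Row:
    name: str                  # event or metric group name
    pmu: str                   # PMU the row belongs to
    metric_only: bool = False  # True for groups printed without a count


def sort_key(row):
    # Rows with counts come before metric-only groups, then rows are ordered
    # by PMU rank (unknown PMUs last), then alphabetically by name.
    return (row.metric_only, PMU_ORDER.get(row.pmu, len(PMU_ORDER)), row.name)


# A subset of the rows, in the unsorted "Before" order.
rows = [
    Row("context-switches", "software"),
    Row("TopdownL1", "cpu_core", metric_only=True),
    Row("page-faults", "software"),
    Row("cpu-cycles", "cpu_atom"),
    Row("cpu-cycles", "cpu_core"),
    Row("TopdownL1", "cpu_atom", metric_only=True),
    Row("branches", "cpu_core"),
    Row("cpu-clock", "software"),
    Row("branch-misses", "cpu_core"),
]

for row in sorted(rows, key=sort_key):
    print(f"{row.pmu:>9}  {row.name}")
```

Keeping the metric-only flag as the first key element is what moves the TopdownL1 groups to the end instead of interleaving them with the counted events.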
2025-11-11perf stat: Fix default metricgroup display on hybridIan Rogers
The logic to skip output of a default metric line was firing on Alderlake and not displaying 'TopdownL1 (cpu_atom)'. Remove the need_full_name check as it is equivalent to the different PMU test in the cases we care about, merge the 'if's and flip the evsel of the PMU test. The 'if' is now basically saying, if the output matches the last printed output then skip the output.

Before:
```
            TopdownL1 (cpu_core)                  # 11.3 % tma_bad_speculation
                                                  # 24.3 % tma_frontend_bound
            TopdownL1 (cpu_core)                  # 33.9 % tma_backend_bound
                                                  # 30.6 % tma_retiring
                                                  # 42.2 % tma_backend_bound
                                                  # 25.0 % tma_frontend_bound (49.81%)
                                                  # 12.8 % tma_bad_speculation
                                                  # 20.0 % tma_retiring (59.46%)
```

After:
```
            TopdownL1 (cpu_core)                  # 8.3 % tma_bad_speculation
                                                  # 43.7 % tma_frontend_bound
                                                  # 30.7 % tma_backend_bound
                                                  # 17.2 % tma_retiring
            TopdownL1 (cpu_atom)                  # 31.9 % tma_backend_bound
                                                  # 37.6 % tma_frontend_bound (49.66%)
                                                  # 18.0 % tma_bad_speculation
                                                  # 12.6 % tma_retiring (59.58%)
```

Signed-off-by: Ian Rogers <irogers@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
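The rule "if the output matches the last printed output then skip the output" can be sketched in a few lines of Python. This is an illustration only: the function `render_metric_lines` and its tuple input are hypothetical and do not mirror the C code in perf's stat-display.

```python
def render_metric_lines(metrics):
    """Format metric lines, printing the group header only when it changes.

    `metrics` is a hypothetical list of (group, pmu, text) tuples in print order.
    """
    lines = []
    last_header = None
    for group, pmu, text in metrics:
        header = f"{group} ({pmu})"
        # Skip the header when it matches the one printed for the previous line.
        label = "" if header == last_header else header
        lines.append(f"{label:<22}# {text}")
        last_header = header
    return lines


metrics = [
    ("TopdownL1", "cpu_core", "8.3 % tma_bad_speculation"),
    ("TopdownL1", "cpu_core", "43.7 % tma_frontend_bound"),
    ("TopdownL1", "cpu_atom", "31.9 % tma_backend_bound"),
    ("TopdownL1", "cpu_atom", "37.6 % tma_frontend_bound"),
]
print("\n".join(render_metric_lines(metrics)))
```

Because the comparison includes the PMU name, the cpu_atom group gets its own header even though the metric group name is the same as for cpu_core.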
2025-11-11perf stat: Remove hard coded shadow metricsIan Rogers
Now that the metrics are encoded in common json, the hard coded printing means the metrics are shown twice. Remove the hard coded version. This means that when specifying events, and those events correspond to a hard coded metric, the metric will no longer be displayed. The metric will be displayed if the metric is requested. The ad hoc printing in the previous approach was often found frustrating; the new approach avoids this.

The default perf stat output on an alderlake now looks like:
```
$ perf stat -a -- sleep 1

 Performance counter stats for 'system wide':

          19,697  context-switches                # nan cs/sec cs_per_second
            TopdownL1 (cpu_core)                  # 10.7 % tma_bad_speculation
                                                  # 24.9 % tma_frontend_bound
            TopdownL1 (cpu_core)                  # 34.3 % tma_backend_bound
                                                  # 30.1 % tma_retiring
           6,593  page-faults                     # nan faults/sec page_faults_per_second
     729,065,658  cpu_atom/cpu-cycles/            # nan GHz cycles_frequency (49.79%)
   1,605,131,101  cpu_core/cpu-cycles/            # nan GHz cycles_frequency
                                                  # 19.7 % tma_bad_speculation
                                                  # 14.2 % tma_retiring (50.14%)
                                                  # 37.3 % tma_frontend_bound (50.31%)
      87,302,268  cpu_atom/branches/              # nan M/sec branch_frequency (60.27%)
     512,046,956  cpu_core/branches/              # nan M/sec branch_frequency
           1,111  cpu-migrations                  # nan migrations/sec migrations_per_second
                                                  # 28.8 % tma_backend_bound (60.26%)
       0.00 msec  cpu-clock                       # 0.0 CPUs CPUs_utilized
     392,509,323  cpu_atom/instructions/          # 0.6 instructions insn_per_cycle (60.19%)
   2,990,369,310  cpu_core/instructions/          # 1.9 instructions insn_per_cycle
       3,493,478  cpu_atom/branch-misses/         # 5.9 % branch_miss_rate (49.69%)
       7,297,531  cpu_core/branch-misses/         # 1.4 % branch_miss_rate

       1.006621701 seconds time elapsed
```

Signed-off-by: Ian Rogers <irogers@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-11-11perf script: Change metric format to use json metricsIan Rogers
The metric format option isn't properly supported. This change improves that by making the sample events update the counts of an evsel, where the shadow metric code expects to read the values. To support printing metrics, metrics need to be found. This is done on the first attempt to print a metric. Every metric is parsed and then the evsels in the metric's evlist are compared to those in perf script using the perf_event_attr type and config. If the metric matches then it is added for printing. As an event in the perf script's evlist may have >1 metric id, or a different leader for aggregation, the first metric matched will be displayed in those cases.

An example use is:
```
$ perf record -a -e '{instructions,cpu-cycles}:S' -a -- sleep 1
$ perf script -F period,metric
...
     867817
           metric: 0.30 insn per cycle
     125394
           metric: 0.04 insn per cycle
     313516
           metric: 0.11 insn per cycle
           metric: 1.00 insn per cycle
```

Signed-off-by: Ian Rogers <irogers@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
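The matching step described above (compare each metric's events to the script's events by perf_event_attr type and config, and let the first match win) could look roughly like the Python sketch below. The function name, the data layout and the `hypothetical_cycles_only` metric are assumptions made for illustration; only the numeric type/config values come from the perf_event_open UAPI (PERF_TYPE_HARDWARE is 0, with cpu-cycles 0, instructions 1, branch instructions 4, branch misses 5).

```python
PERF_TYPE_HARDWARE = 0
HW_CPU_CYCLES = 0
HW_INSTRUCTIONS = 1
HW_BRANCH_INSTRUCTIONS = 4
HW_BRANCH_MISSES = 5


def find_printable_metrics(script_events, metrics):
    """Map each event in the script's evlist to the first metric that matches.

    `script_events`: dict of event name -> (type, config).
    `metrics`: list of (metric_name, [(type, config), ...]) in parse order.
    A metric matches when every event it needs is present in the script.
    """
    by_attr = {attr: name for name, attr in script_events.items()}
    chosen = {}
    for metric_name, needed in metrics:
        if all(attr in by_attr for attr in needed):
            for attr in needed:
                # setdefault keeps the first matched metric for an event.
                chosen.setdefault(by_attr[attr], metric_name)
    return chosen


script_events = {
    "instructions": (PERF_TYPE_HARDWARE, HW_INSTRUCTIONS),
    "cpu-cycles": (PERF_TYPE_HARDWARE, HW_CPU_CYCLES),
}
metrics = [
    ("branch_miss_rate", [(PERF_TYPE_HARDWARE, HW_BRANCH_MISSES),
                          (PERF_TYPE_HARDWARE, HW_BRANCH_INSTRUCTIONS)]),
    ("insn_per_cycle", [(PERF_TYPE_HARDWARE, HW_INSTRUCTIONS),
                        (PERF_TYPE_HARDWARE, HW_CPU_CYCLES)]),
    # Hypothetical second metric that also uses cpu-cycles.
    ("hypothetical_cycles_only", [(PERF_TYPE_HARDWARE, HW_CPU_CYCLES)]),
]
print(find_printable_metrics(script_events, metrics))
```

In this sketch branch_miss_rate is skipped because its events were not recorded, and cpu-cycles stays attached to insn_per_cycle because that metric matched first.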
2025-11-11perf stat: Add detail -d,-dd,-ddd metricsIan Rogers
Add metrics for the stat-shadow -d, -dd and -ddd events and hard coded metrics. Remove the events as these now come from the metrics.

Following this change a detailed perf stat output looks like:
```
$ perf stat -a -ddd -- sleep 1

 Performance counter stats for 'system wide':

          21,089  context-switches                # nan cs/sec cs_per_second
            TopdownL1 (cpu_core)                  # 14.1 % tma_bad_speculation
                                                  # 27.3 % tma_frontend_bound (30.56%)
            TopdownL1 (cpu_core)                  # 31.5 % tma_backend_bound
                                                  # 27.2 % tma_retiring (30.56%)
           6,302  page-faults                     # nan faults/sec page_faults_per_second
     928,495,163  cpu_atom/cpu-cycles/            # nan GHz cycles_frequency (28.41%)
   1,841,409,834  cpu_core/cpu-cycles/            # nan GHz cycles_frequency (38.51%)
                                                  # 14.5 % tma_bad_speculation
                                                  # 16.0 % tma_retiring (28.41%)
                                                  # 36.8 % tma_frontend_bound (35.57%)
     100,859,118  cpu_atom/branches/              # nan M/sec branch_frequency (42.73%)
     572,657,734  cpu_core/branches/              # nan M/sec branch_frequency (54.43%)
           1,527  cpu-migrations                  # nan migrations/sec migrations_per_second
                                                  # 32.7 % tma_backend_bound (42.73%)
       0.00 msec  cpu-clock                       # 0.000 CPUs utilized
                                                  # 0.0 CPUs CPUs_utilized
     498,668,509  cpu_atom/instructions/          # 0.57 insn per cycle
                                                  # 0.6 instructions insn_per_cycle (42.97%)
   3,281,762,225  cpu_core/instructions/          # 1.84 insn per cycle
                                                  # 1.8 instructions insn_per_cycle (62.20%)
       4,919,511  cpu_atom/branch-misses/         # 5.43% of all branches
                                                  # 5.4 % branch_miss_rate (35.80%)
       7,431,776  cpu_core/branch-misses/         # 1.39% of all branches
                                                  # 1.4 % branch_miss_rate (62.20%)
       2,517,007  cpu_atom/LLC-loads/             # 0.1 % llc_miss_rate (28.62%)
       3,931,318  cpu_core/LLC-loads/             # 40.4 % llc_miss_rate (45.98%)
      14,918,674  cpu_core/L1-dcache-load-misses/ # 2.25% of all L1-dcache accesses
                                                  # nan % l1d_miss_rate (37.80%)
      27,067,264  cpu_atom/L1-icache-load-misses/ # 15.92% of all L1-icache accesses
                                                  # 15.9 % l1i_miss_rate (21.47%)
     116,848,994  cpu_atom/dTLB-loads/            # 0.8 % dtlb_miss_rate (21.47%)
     764,870,407  cpu_core/dTLB-loads/            # 0.1 % dtlb_miss_rate (15.12%)

       1.006181526 seconds time elapsed
```

Signed-off-by: Ian Rogers <irogers@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-11-11perf jevents: Add metric DefaultShowEventsIan Rogers
Some Default group metrics require their events to be shown for consistency with perf's previous behavior. Add a flag to indicate when this is the case and use it in stat-display. As events are coming from Default metrics, remove the default hardware and software events from perf stat.

Following this change the default perf stat output on an alderlake looks like:
```
$ perf stat -a -- sleep 1

 Performance counter stats for 'system wide':

          20,550  context-switches                # nan cs/sec cs_per_second
            TopdownL1 (cpu_core)                  # 9.0 % tma_bad_speculation
                                                  # 28.1 % tma_frontend_bound
            TopdownL1 (cpu_core)                  # 29.2 % tma_backend_bound
                                                  # 33.7 % tma_retiring
           6,685  page-faults                     # nan faults/sec page_faults_per_second
     790,091,064  cpu_atom/cpu-cycles/            # nan GHz cycles_frequency (49.83%)
   2,563,918,366  cpu_core/cpu-cycles/            # nan GHz cycles_frequency
                                                  # 12.3 % tma_bad_speculation
                                                  # 14.5 % tma_retiring (50.20%)
                                                  # 33.8 % tma_frontend_bound (50.24%)
      76,390,322  cpu_atom/branches/              # nan M/sec branch_frequency (60.20%)
   1,015,173,047  cpu_core/branches/              # nan M/sec branch_frequency
           1,325  cpu-migrations                  # nan migrations/sec migrations_per_second
                                                  # 39.3 % tma_backend_bound (60.17%)
       0.00 msec  cpu-clock                       # 0.000 CPUs utilized
                                                  # 0.0 CPUs CPUs_utilized
     554,347,072  cpu_atom/instructions/          # 0.64 insn per cycle
                                                  # 0.6 instructions insn_per_cycle (60.14%)
   5,228,931,991  cpu_core/instructions/          # 2.04 insn per cycle
                                                  # 2.0 instructions insn_per_cycle
       4,308,874  cpu_atom/branch-misses/         # 5.65% of all branches
                                                  # 5.6 % branch_miss_rate (49.76%)
       9,890,606  cpu_core/branch-misses/         # 0.97% of all branches
                                                  # 1.0 % branch_miss_rate

       1.005477803 seconds time elapsed
```

Signed-off-by: Ian Rogers <irogers@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-11-11perf jevents: Add set of common metrics based on default onesIan Rogers
Add support for getting a common set of metrics from a default table. It simplifies the generation to add json metrics at the same time. The metrics added are CPUs_utilized, cs_per_second, migrations_per_second, page_faults_per_second, insn_per_cycle, stalled_cycles_per_instruction, frontend_cycles_idle, backend_cycles_idle, cycles_frequency, branch_frequency and branch_miss_rate, based on the shadow metric definitions.

Following this change the default perf stat output on an alderlake looks like:
```
$ perf stat -a -- sleep 2

 Performance counter stats for 'system wide':

       0.00 msec  cpu-clock                       # 0.000 CPUs utilized
          77,739  context-switches
          15,033  cpu-migrations
         321,313  page-faults
  14,355,634,225  cpu_atom/instructions/          # 1.40 insn per cycle (35.37%)
 134,561,560,583  cpu_core/instructions/          # 3.44 insn per cycle (57.85%)
  10,263,836,145  cpu_atom/cycles/                (35.42%)
  39,138,632,894  cpu_core/cycles/                (57.60%)
   2,989,658,777  cpu_atom/branches/              (42.60%)
  32,170,570,388  cpu_core/branches/              (57.39%)
      29,789,870  cpu_atom/branch-misses/         # 1.00% of all branches (42.69%)
     165,991,152  cpu_core/branch-misses/         # 0.52% of all branches (57.19%)
                      (software)                  # nan cs/sec cs_per_second
            TopdownL1 (cpu_core)                  # 11.9 % tma_bad_speculation
                                                  # 19.6 % tma_frontend_bound (63.97%)
            TopdownL1 (cpu_core)                  # 18.8 % tma_backend_bound
                                                  # 49.7 % tma_retiring (63.97%)
                      (software)                  # nan faults/sec page_faults_per_second
                                                  # nan GHz cycles_frequency (42.88%)
                                                  # nan GHz cycles_frequency (69.88%)
            TopdownL1 (cpu_atom)                  # 11.7 % tma_bad_speculation
                                                  # 29.9 % tma_retiring (50.07%)
            TopdownL1 (cpu_atom)                  # 31.3 % tma_frontend_bound (43.09%)
                      (cpu_atom)                  # nan M/sec branch_frequency (43.09%)
                                                  # nan M/sec branch_frequency (70.07%)
                                                  # nan migrations/sec migrations_per_second
            TopdownL1 (cpu_atom)                  # 27.1 % tma_backend_bound (43.08%)
                      (software)                  # 0.0 CPUs CPUs_utilized
                                                  # 1.4 instructions insn_per_cycle (43.04%)
                                                  # 3.5 instructions insn_per_cycle (69.99%)
                                                  # 1.0 % branch_miss_rate (35.46%)
                                                  # 0.5 % branch_miss_rate (65.02%)

       2.005626564 seconds time elapsed
```

Signed-off-by: Ian Rogers <irogers@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
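As a worked check of two of the metrics named in this commit, the Python sketch below recomputes insn_per_cycle and branch_miss_rate from the event counts in the sample output. It assumes the usual definitions (instructions divided by cycles, and branch misses as a percentage of branches); the helper names are made up for illustration, and the json metric expressions in perf may be written differently.

```python
def insn_per_cycle(instructions, cycles):
    """Instructions retired per CPU cycle."""
    return instructions / cycles


def branch_miss_rate(branch_misses, branches):
    """Branch misses as a percentage of all branches."""
    return 100.0 * branch_misses / branches


# Counts copied from the sample 'perf stat -a -- sleep 2' output.
counts = {
    "cpu_atom": {
        "instructions": 14_355_634_225,
        "cycles": 10_263_836_145,
        "branches": 2_989_658_777,
        "branch-misses": 29_789_870,
    },
    "cpu_core": {
        "instructions": 134_561_560_583,
        "cycles": 39_138_632_894,
        "branches": 32_170_570_388,
        "branch-misses": 165_991_152,
    },
}

for pmu, c in counts.items():
    ipc = insn_per_cycle(c["instructions"], c["cycles"])
    miss = branch_miss_rate(c["branch-misses"], c["branches"])
    print(f"{pmu}: {ipc:.2f} insn per cycle, {miss:.2f}% of all branches")
```

The results agree with the hard coded shadow values in the sample (1.40 and 3.44 insn per cycle, 1.00% and 0.52% of all branches), which is the consistency the json versions of these metrics are meant to preserve.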