kernel - Hosts the 0x221E linux distro kernel.

Age	Commit message (Collapse)	Author
2025-10-17	selftests/net: io_uring: fix unknown errnum values	Carlos Llamas
	The io_uring functions return negative error values, but error() expects these to be positive to properly match them to an errno string. Fix this to make sure the correct error descriptions are displayed upon failure. Signed-off-by: Carlos Llamas <cmllamas@google.com> Link: https://patch.msgid.link/20251016182538.3790567-1-cmllamas@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-10-17	Merge branch 'build-script' into docs-mw	Jonathan Corbet
	Quoth Mauro: This series should probably be called: "Move the trick-or-treat build hacks accumulated over time into a single place and document them." as this reflects its main goal. As such: - it places the jobserver logic on a library; - it removes sphinx/parallel-wrapper.sh; - the code now properly implements a jobserver-aware logic to do the parallelism when called via GNU make, failing back to "-j" when there's no jobserver; - converts check-variable-fonts.sh to Python and uses it via function call; - drops an extra script to generate man pages, adding a makefile target for it; - ensures that return code is 0 when PDF successfully builds; - about half of the script is comments and documentation. I tried to do my best to document all tricks that are inside the script. This way, the docs build steps is now documented. It should be noticed that it is out of the scope of this series to change the implementation. Surely the process can be improved, but first let's consolidate and document everything on a single place. Such script was written in a way that it can be called either directly or via a Makefile. Running outside Makefile is interesting specially when debug is needed. The command line interface replaces the need of having lots of env vars before calling sphinx-build: $ ./tools/docs/sphinx-build-wrapper --help usage: sphinx-build-wrapper [-h] [--sphinxdirs SPHINXDIRS [SPHINXDIRS ...]] [--conf CONF] [--builddir BUILDDIR] [--theme THEME] [--css CSS] [--paper {,a4,letter}] [-v] [-j JOBS] [-i] [-V [VENV]] {cleandocs,linkcheckdocs,htmldocs,epubdocs,texinfodocs,infodocs,mandocs,latexdocs,pdfdocs,xmldocs} Kernel documentation builder positional arguments: {cleandocs,linkcheckdocs,htmldocs,epubdocs,texinfodocs,infodocs,mandocs,latexdocs,pdfdocs,xmldocs} Documentation target to build options: -h, --help show this help message and exit --sphinxdirs SPHINXDIRS [SPHINXDIRS ...] Specific directories to build --conf CONF Sphinx configuration file --builddir BUILDDIR Sphinx configuration file --theme THEME Sphinx theme to use --css CSS Custom CSS file for HTML/EPUB --paper {,a4,letter} Paper size for LaTeX/PDF output -v, --verbose place build in verbose mode -j, --jobs JOBS Sets number of jobs to use with sphinx-build -i, --interactive Change latex default to run in interactive mode -V, --venv [VENV] If used, run Sphinx from a venv dir (default dir: sphinx_latest) the only mandatory argument is the target, which is identical with "make" targets. The call inside Makefile doesn't use the last four arguments. They're there to help identifying problems at the build: -v makes the output verbose; -j helps to test parallelism; -i runs latexmk in interactive mode, allowing to debug PDF build issues; -V is useful when testing it with different venvs. When used with GNU make (or some other make which implements jobserver), a call like: make -j <targets> htmldocs will make the wrapper to automatically use POSIX jobserver to claim the number of available job slots, calling sphinx-build with a "-j" parameter reflecting it. ON such case, the default can be overriden via SPHINXDIRS argument. Visiable changes when compared with the old behavior: When V=0, the only visible difference is that: - pdfdocs target now returns 0 on success, 1 on failures. This addresses an issue over the current process where we it always return success even on failures; - it will now print the name of PDF files that failed to build, if any. In verbose mode, sphinx-build-wrapper and sphinx-build command lines are now displayed.
2025-10-17	tools: docs: parse_data_structs.py: accept more reftypes	Mauro Carvalho Chehab
	Current regex is limited to only some c-domain reftypes. There are several others. Change the code to pick the name specified there. Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org> Signed-off-by: Jonathan Corbet <corbet@lwn.net> Message-ID: <5c146923d1e3183893f290216fb1378954e2e540.1759329363.git.mchehab+huawei@kernel.org>
2025-10-17	tools: docs: parse_data_structs.py: add namespace support	Mauro Carvalho Chehab
	C domain supports a ".. c:namespace::" tag that allows setting a symbol namespace. This is used within the kernel to avoid warnings about duplicated symbols. This is specially important for syscalls, as each subsystem may have their own documentation for them. This is specially true for ioctl. When such tag is used, all C domain symbols have c++ style, e.g. they'll become "{namespace}.{reference}". Allow specifying C namespace at the exception files, avoiding the need of override rules for every symbol. Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org> Signed-off-by: Jonathan Corbet <corbet@lwn.net> Message-ID: <cc27ec60ceb3bdac4197fb7266d2df8155edacda.1759329363.git.mchehab+huawei@kernel.org>
2025-10-17	tools: docs: parse_data_structs.py: get rid of process_exceptions()	Mauro Carvalho Chehab
	Add an extra parameter to parse_file to make it handle exceptions internally, cleaning up the API. Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org> Signed-off-by: Jonathan Corbet <corbet@lwn.net> Message-ID: <8575bbc94ff706aa7e7cc3a188399ca17a3169e6.1759329363.git.mchehab+huawei@kernel.org>
2025-10-17	tools: docs: parse_data_structs: make process_exceptions two stages	Mauro Carvalho Chehab
	Split the logic which parses exceptions on two stages, preparing the exceptions file to have rules that will affect xref generation. For now, preserve the original API. Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org> Signed-off-by: Jonathan Corbet <corbet@lwn.net> Message-ID: <da9ca5f2ce1ffcfb355e32e676ff013607c227e0.1759329363.git.mchehab+huawei@kernel.org>
2025-10-17	docs: kernel_include.py: fix line numbers for TOC	Mauro Carvalho Chehab
	On TOC output, we need to embeed line numbers with ViewList. Change the parse class to produce a line-number parsed result, and adjust the output accordingly. Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org> Signed-off-by: Jonathan Corbet <corbet@lwn.net> Message-ID: <74eed96e32f79eaaef7a99ffe7c3224fed369c27.1759329363.git.mchehab+huawei@kernel.org>
2025-10-17	tools: docs: parse_data_structs.py: output a line number	Mauro Carvalho Chehab
	Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org> Signed-off-by: Jonathan Corbet <corbet@lwn.net> Message-ID: <dcffa6844dede00052f5fb851a857991468f22b5.1759329363.git.mchehab+huawei@kernel.org>
2025-10-17	tools: docs: parse_data_structs.py: drop contents header	Mauro Carvalho Chehab
	When used in practice, one may want to have multiple header files on a single rst file, like: ********************* Digital TV uAPI symbols ********************* .. contents:: Table of Contents :depth: 2 :local: Frontend ======== .. kernel-include:: include/uapi/linux/dvb/frontend.h :generate-cross-refs: :toc: Demux ===== .. kernel-include:: include/uapi/linux/dvb/dmx.h :generate-cross-refs: :toc: ... So, don't add ..contents:: here. Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org> Signed-off-by: Jonathan Corbet <corbet@lwn.net> Message-ID: <4bf353e5248133a3b0abd82519a38453402fe7c6.1759329363.git.mchehab+huawei@kernel.org>
2025-10-17	selftests/bpf: Fix redefinition of 'off' as different kind of symbol	Brahmajit Das
	This fixes the following build error CLNG-BPF [test_progs] verifier_global_ptr_args.bpf.o progs/verifier_global_ptr_args.c:228:5: error: redefinition of 'off' as different kind of symbol 228 \| u32 off; \| ^ The symbol 'off' was previously defined in tools/testing/selftests/bpf/tools/include/vmlinux.h, which includes an enum i40e_ptp_gpio_pin_state from drivers/net/ethernet/intel/i40e/i40e_ptp.c: enum i40e_ptp_gpio_pin_state { end = -2, invalid = -1, off = 0, in_A = 1, in_B = 2, out_A = 3, out_B = 4, }; This enum is included when CONFIG_I40E is enabled. As of commit 032676ff8217 ("LoongArch: Update Loongson-3 default config file"), CONFIG_I40E is set in the defconfig, which leads to the conflict. Renaming the local variable avoids the redefinition and allows the build to succeed. Suggested-by: Yonghong Song <yonghong.song@linux.dev> Signed-off-by: Brahmajit Das <listout@listout.xyz> Acked-by: Yonghong Song <yonghong.song@linux.dev> Link: https://lore.kernel.org/r/20251017171551.53142-1-listout@listout.xyz Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-10-16	selftests/net: packetdrill: unflake tcp_user_timeout_user-timeout-probe.pkt	Eric Dumazet
	This test fails the first time I am running it after a fresh virtme-ng boot. tcp_user_timeout_user-timeout-probe.pkt:33: runtime error in write call: Expected result -1 but got 24 with errno 2 (No such file or directory) Tweaks the timings a bit, to reduce flakiness. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Soham Chakradeo <sohamch@google.com> Cc: Willem de Bruijn <willemb@google.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com> Tested-by: Jamal Hadi Salim <jhs@mojatatu.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Link: https://patch.msgid.link/20251014171907.3554413-2-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-10-16	selftests/bpf: Add test for sk->sk_bypass_prot_mem.	Kuniyuki Iwashima
	The test does the following for IPv4/IPv6 x TCP/UDP sockets with/without sk->sk_bypass_prot_mem, which can be turned on by net.core.bypass_prot_mem or bpf_setsockopt(SK_BPF_BYPASS_PROT_MEM). 1. Create socket pairs 2. Send NR_PAGES (32) of data (TCP consumes around 35 pages, and UDP consuems 66 pages due to skb overhead) 3. Read memory_allocated from sk->sk_prot->memory_allocated and sk->sk_prot->memory_per_cpu_fw_alloc 4. Check if unread data is charged to memory_allocated If sk->sk_bypass_prot_mem is set, memory_allocated should not be changed, but we allow a small error (up to 10 pages) in case other processes on the host use some amounts of TCP/UDP memory. The amount of allocated pages are buffered to per-cpu variable {tcp,udp}_memory_per_cpu_fw_alloc up to +/- net.core.mem_pcpu_rsv before reported to {tcp,udp}_memory_allocated. At 3., memory_allocated is calculated from the 2 variables at fentry of socket create function. We drain the receive queue only for UDP before close() because UDP recv queue is destroyed after RCU grace period. When I printed memory_allocated, UDP bypass cases sometimes saw the no-bypass case's leftover, but it's still in the small error range (<10 pages). bpf_trace_printk: memory_allocated: 0 <-- TCP no-bypass bpf_trace_printk: memory_allocated: 35 bpf_trace_printk: memory_allocated: 0 <-- TCP w/ sysctl bpf_trace_printk: memory_allocated: 0 bpf_trace_printk: memory_allocated: 0 <-- TCP w/ bpf bpf_trace_printk: memory_allocated: 0 bpf_trace_printk: memory_allocated: 0 <-- UDP no-bypass bpf_trace_printk: memory_allocated: 66 bpf_trace_printk: memory_allocated: 2 <-- UDP w/ sysctl (2 pages leftover) bpf_trace_printk: memory_allocated: 2 bpf_trace_printk: memory_allocated: 2 <-- UDP w/ bpf (2 pages leftover) bpf_trace_printk: memory_allocated: 2 We prefer finishing tests faster than oversleeping for call_rcu() + sk_destruct(). The test completes within 2s on QEMU (64 CPUs) w/ KVM. # time ./test_progs -t sk_bypass #371/1 sk_bypass_prot_mem/TCP :OK #371/2 sk_bypass_prot_mem/UDP :OK #371/3 sk_bypass_prot_mem/TCPv6:OK #371/4 sk_bypass_prot_mem/UDPv6:OK #371 sk_bypass_prot_mem:OK Summary: 1/4 PASSED, 0 SKIPPED, 0 FAILED real 0m1.481s user 0m0.181s sys 0m0.441s Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com> Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Acked-by: Roman Gushchin <roman.gushchin@linux.dev> Link: https://patch.msgid.link/20251014235604.3057003-7-kuniyu@google.com
2025-10-16	bpf: Introduce SK_BPF_BYPASS_PROT_MEM.	Kuniyuki Iwashima
	If a socket has sk->sk_bypass_prot_mem flagged, the socket opts out of the global protocol memory accounting. This is easily controlled by net.core.bypass_prot_mem sysctl, but it lacks flexibility. Let's support flagging (and clearing) sk->sk_bypass_prot_mem via bpf_setsockopt() at the BPF_CGROUP_INET_SOCK_CREATE hook. int val = 1; bpf_setsockopt(ctx, SOL_SOCKET, SK_BPF_BYPASS_PROT_MEM, &val, sizeof(val)); As with net.core.bypass_prot_mem, this is inherited to child sockets, and BPF always takes precedence over sysctl at socket(2) and accept(2). SK_BPF_BYPASS_PROT_MEM is only supported at BPF_CGROUP_INET_SOCK_CREATE and not supported on other hooks for some reasons: 1. UDP charges memory under sk->sk_receive_queue.lock instead of lock_sock() 2. Modifying the flag after skb is charged to sk requires such adjustment during bpf_setsockopt() and complicates the logic unnecessarily We can support other hooks later if a real use case justifies that. Most changes are inline and hard to trace, but a microbenchmark on __sk_mem_raise_allocated() during neper/tcp_stream showed that more samples completed faster with sk->sk_bypass_prot_mem == 1. This will be more visible under tcp_mem pressure (but it's not a fair comparison). # bpftrace -e 'kprobe:__sk_mem_raise_allocated { @start[tid] = nsecs; } kretprobe:__sk_mem_raise_allocated /@start[tid]/ { @end[tid] = nsecs - @start[tid]; @times = hist(@end[tid]); delete(@start[tid]); }' # tcp_stream -6 -F 1000 -N -T 256 Without bpf prog: [128, 256) 3846 \| \| [256, 512) 1505326 \|@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@\| [512, 1K) 1371006 \|@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ \| [1K, 2K) 198207 \|@@@@@@ \| [2K, 4K) 31199 \|@ \| With bpf prog in the next patch: (must be attached before tcp_stream) # bpftool prog load sk_bypass_prot_mem.bpf.o /sys/fs/bpf/test type cgroup/sock_create # bpftool cgroup attach /sys/fs/cgroup/test cgroup_inet_sock_create pinned /sys/fs/bpf/test [128, 256) 6413 \| \| [256, 512) 1868425 \|@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@\| [512, 1K) 1101697 \|@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ \| [1K, 2K) 117031 \|@@@@ \| [2K, 4K) 11773 \| \| Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com> Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Acked-by: Roman Gushchin <roman.gushchin@linux.dev> Link: https://patch.msgid.link/20251014235604.3057003-6-kuniyu@google.com
2025-10-16	Merge tag 'net-6.18-rc2' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Paolo Abeni: "Including fixes from CAN Current release - regressions: - udp: do not use skb_release_head_state() before skb_attempt_defer_free() - gro_cells: use nested-BH locking for gro_cell - dpll: zl3073x: increase maximum size of flash utility Previous releases - regressions: - core: fix lockdep splat on device unregister - tcp: fix tcp_tso_should_defer() vs large RTT - tls: - don't rely on tx_work during send() - wait for pending async decryptions if tls_strp_msg_hold fails - can: j1939: add missing calls in NETDEV_UNREGISTER notification handler - eth: lan78xx: fix lost EEPROM write timeout in lan78xx_write_raw_eeprom Previous releases - always broken: - ip6_tunnel: prevent perpetual tunnel growth - dpll: zl3073x: handle missing or corrupted flash configuration - can: m_can: fix pm_runtime and CAN state handling - eth: - ixgbe: fix too early devlink_free() in ixgbe_remove() - ixgbevf: fix mailbox API compatibility - gve: Check valid ts bit on RX descriptor before hw timestamping - idpf: cleanup remaining SKBs in PTP flows - r8169: fix packet truncation after S4 resume on RTL8168H/RTL8111H" * tag 'net-6.18-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (50 commits) udp: do not use skb_release_head_state() before skb_attempt_defer_free() net: usb: lan78xx: fix use of improperly initialized dev->chipid in lan78xx_reset netdevsim: set the carrier when the device goes up selftests: tls: add test for short splice due to full skmsg selftests: net: tls: add tests for cmsg vs MSG_MORE tls: don't rely on tx_work during send() tls: wait for pending async decryptions if tls_strp_msg_hold fails tls: always set record_type in tls_process_cmsg tls: wait for async encrypt in case of error during latter iterations of sendmsg tls: trim encrypted message to match the plaintext on short splice tg3: prevent use of uninitialized remote_adv and local_adv variables MAINTAINERS: new entry for IPv6 IOAM gve: Check valid ts bit on RX descriptor before hw timestamping net: core: fix lockdep splat on device unregister MAINTAINERS: add myself as maintainer for b53 selftests: net: check jq command is supported net: airoha: Take into account out-of-order tx completions in airoha_dev_xmit() tcp: fix tcp_tso_should_defer() vs large RTT r8152: add error handling in rtl8152_driver_init usbnet: Fix using smp_processor_id() in preemptible code warnings ...
2025-10-16	selftests: arg_parsing: Ensure data is flushed to disk before reading.	Xing Guo
	test_parse_test_list_file writes some data to /tmp/bpf_arg_parsing_test.XXXXXX and parse_test_list_file() will read the data back. However, after writing data to that file, we forget to call fsync() and it's causing testing failure in my laptop. This patch helps fix it by adding the missing fsync() call. Fixes: 64276f01dce8 ("selftests/bpf: Test_progs can read test lists from file") Signed-off-by: Xing Guo <higuoxing@gmail.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20251016035330.3217145-1-higuoxing@gmail.com
2025-10-16	Merge branch 'objtool/core' of ↵	Peter Zijlstra
	https://git.kernel.org/pub/scm/linux/kernel/git/jpoimboe/linux This series introduces new objtool features and a klp-build script to generate livepatch modules using a source .patch as input. This builds on concepts from the longstanding out-of-tree kpatch [1] project which began in 2012 and has been used for many years to generate livepatch modules for production kernels. However, this is a complete rewrite which incorporates hard-earned lessons from 12+ years of maintaining kpatch. Key improvements compared to kpatch-build: - Integrated with objtool: Leverages objtool's existing control-flow graph analysis to help detect changed functions. - Works on vmlinux.o: Supports late-linked objects, making it compatible with LTO, IBT, and similar. - Simplified code base: ~3k fewer lines of code. - Upstream: No more out-of-tree #ifdef hacks, far less cruft. - Cleaner internals: Vastly simplified logic for symbol/section/reloc inclusion and special section extraction. - Robust __LINE__ macro handling: Avoids false positive binary diffs caused by the __LINE__ macro by introducing a fix-patch-lines script which injects #line directives into the source .patch to preserve the original line numbers at compile time. The primary user interface is the klp-build script which does the following: - Builds an original kernel with -function-sections and -fdata-sections, plus objtool function checksumming. - Applies the .patch file and rebuilds the kernel using the same options. - Runs 'objtool klp diff' to detect changed functions and generate intermediate binary diff objects. - Builds a kernel module which links the diff objects with some livepatch module init code (scripts/livepatch/init.c). - Finalizes the livepatch module (aka work around linker wreckage) using 'objtool klp post-link'. I've tested with a variety of patches on defconfig and Fedora-config kernels with both GCC and Clang.
2025-10-16	x86/insn: Simplify for_each_insn_prefix()	Peter Zijlstra
	Use the new-found freedom of allowing variable declarions inside for() to simplify the for_each_insn_prefix() iterator to no longer need an external temporary. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
2025-10-15	selftests: tls: add test for short splice due to full skmsg	Sabrina Dubroca
	We don't have a test triggering a partial splice caused by a full skmsg. Add one, based on a program by Jann Horn. Use MAX_FRAGS=48 to make sure the skmsg will be full for any allowed value of CONFIG_MAX_SKB_FRAGS (17..45). Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Link: https://patch.msgid.link/1d129a15f526ea3602f3a2b368aa0b6f7e0d35d5.1760432043.git.sd@queasysnail.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-10-15	selftests: net: tls: add tests for cmsg vs MSG_MORE	Sabrina Dubroca
	We don't have a test to check that MSG_MORE won't let us merge records of different types across sendmsg calls. Add new tests that check: - MSG_MORE is only allowed for DATA records - a pending DATA record gets closed and pushed before a non-DATA record is processed Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Link: https://patch.msgid.link/b34feeadefe8a997f068d5ed5617afd0072df3c0.1760432043.git.sd@queasysnail.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-10-15	sched_ext: Add a selftest for scx_bpf_dsq_peek	Ryan Newton
	This commit adds two tests. The first is the most basic unit test: make sure an empty queue peeks as empty, and when we put one element in the queue, make sure peek returns that element. However, even this simple test is a little complicated by the different behavior of scx_bpf_dsq_insert in different calling contexts: - insert is for direct dispatch in enqueue - insert is delayed when called from select_cpu In this case we split the insert and the peek that verifies the result between enqueue/dispatch. Note: An alternative would be to call `scx_bpf_dsq_move_to_local` on an empty queue, which in turn calls `flush_dispatch_buf`, in order to flush the buffered insert. Unfortunately, this is not viable within the enqueue path, as it attempts a voluntary context switch within an RCU read-side critical section. The second test is a stress test that performs many peeks on all DSQs and records the observed tasks. Signed-off-by: Ryan Newton <newton@meta.com> Reviewed-by: Christian Loehle <christian.loehle@arm.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2025-10-15	sched_ext: Add lockless peek operation for DSQs	Ryan Newton
	The builtin DSQ queue data structures are meant to be used by a wide range of different sched_ext schedulers with different demands on these data structures. They might be per-cpu with low-contention, or high-contention shared queues. Unfortunately, DSQs have a coarse-grained lock around the whole data structure. Without going all the way to a lock-free, more scalable implementation, a small step we can take to reduce lock contention is to allow a lockless, small-fixed-cost peek at the head of the queue. This change allows certain custom SCX schedulers to cheaply peek at queues, e.g. during load balancing, before locking them. But it represents a few extra memory operations to update the pointer each time the DSQ is modified, including a memory barrier on ARM so the write appears correctly ordered. This commit adds a first_task pointer field which is updated atomically when the DSQ is modified, and allows any thread to peek at the head of the queue without holding the lock. Signed-off-by: Ryan Newton <newton@meta.com> Reviewed-by: Andrea Righi <arighi@nvidia.com> Reviewed-by: Christian Loehle <christian.loehle@arm.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2025-10-15	selftests/hid: add tests for missing release on the Dell Synaptics	Benjamin Tissoires
	Add a simple test for the corner case not currently covered by the sticky fingers quirk. Because it's a corner case test, we only test this on a couple of devices, not on all of them because the value of adding the same test over and over is rather moot. Signed-off-by: Benjamin Tissoires <bentiss@kernel.org> Signed-off-by: Jiri Kosina <jkosina@suse.com>
2025-10-15	selftests: cgroup: Use values_close_report in test_cpu	Sebastian Chlad
	Convert test_cpu to use the newly added values_close_report() helper to print detailed diagnostics when a tolerance check fails. This provides clearer insight into deviations while run in the CI. Signed-off-by: Sebastian Chlad <sebastian.chlad@suse.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2025-10-15	selftests: cgroup: add values_close_report helper	Sebastian Chlad
	Some cgroup selftests, such as test_cpu, occasionally fail by a very small margin and if run in the CI context, it is useful to have detailed diagnostic output to understand the deviation. Introduce a values_close_report() helper which performs the same comparison as values_close(), but prints detailed information when the values differ beyond the allowed tolerance. Signed-off-by: Sebastian Chlad <sebastian.chlad@suse.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2025-10-15	perf test parse-events: Add evsel test helper	Ian Rogers
	Add TEST_ASSERT_EVSEL to dump the failing evsel in the event of a failure. Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: James Clark <james.clark@linaro.org> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-10-15	perf test parse-events: Add evlist test helper	Ian Rogers
	Add TEST_ASSERT_EVLIST to dump the failing evlist in the event of a failure. Add the macro to a number of tests not currently checking the evlist length. Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: James Clark <james.clark@linaro.org> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-10-15	perf test: Clean up test_..config helpers	Ian Rogers
	Just have a single test_hw_config helper that strips extended type information in the case of hardware and hardware cache events. Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: James Clark <james.clark@linaro.org> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-10-15	perf test: Switch cycles event to cpu-cycles	Ian Rogers
	Without a PMU perf matches an event against any PMU with the event. Unfortunately some PMU drivers advertise a "cycles" event which is typically just a core event. As tests assume a core event, switch to use "cpu-cycles" that avoids the overloaded "cycles" event on troublesome PMUs and is so far not overloaded. Note, on x86 this changes a legacy event into a sysfs one. Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: James Clark <james.clark@linaro.org> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-10-15	perf test parse-events: Remove cpu PMU requirement	Ian Rogers
	In the event parse string, switch "cpu" to "default_core" and then rewrite this to the first core PMU name prior to parsing. This enables testing with a PMU on hybrid x86 and other systems that don't use "cpu" for the core PMU name. The name "default_core" is already used by jevents. Update test expectations to match. Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: James Clark <james.clark@linaro.org> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-10-15	perf test parse-events: Without a PMU use cpu-cycles rather than cycles	Ian Rogers
	Without a PMU perf matches an event against any PMU with the event. Unfortunately some PMU drivers advertise a "cycles" event which is typically just a core event. Switch to using "cpu-cycles" which is an indentical legacy event but avoids the multiple PMU confusion introduced by the PMU drivers. Note, on x86 cpu-cycles is also a sysfs event but cycles isn't. Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: James Clark <james.clark@linaro.org> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-10-15	perf test parse-events: Use evsel__match for legacy events	Ian Rogers
	Switch from the test's assert_hw/test_config to the common evsel__match code that appropriately handles events with both legacy and sysfs/json encoding. For tests asserting that a config value matches that placed in the perf_event_attr just directly compare the config values. Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: James Clark <james.clark@linaro.org> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-10-15	perf evsel: Improvements to __evsel__match	Ian Rogers
	Ensure both the perf_event_attr and alternate_hw_config are checked in the match. Don't mask the config if the perf_event_attr isn't a HARDWARE or HW_CACHE event. Add common early exit cases. Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: James Clark <james.clark@linaro.org> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-10-15	perf evlist: Avoid scanning all PMUs for evlist__new_default	Ian Rogers
	Rather than wildcard matching the cycles event specify only the core PMUs. This avoids potentially loading unnecessary uncore PMUs. Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: James Clark <james.clark@linaro.org> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-10-15	perf top: Use evlist__new_default when no events specified	Ian Rogers
	Rather than distributing the code doing similar things to evlist__new_default, use the one implementation so that paranoia and wildcard scanning can be optimized. Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: James Clark <james.clark@linaro.org> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-10-15	perf record: Use evlist__new_default when no events specified	Ian Rogers
	Rather than distributing the code doing similar things to evlist__new_default, use the one implementation so that paranoia and wildcard scanning can be optimized. Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: James Clark <james.clark@linaro.org> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-10-15	perf parse-events: Remove hard coded legacy hardware and cache parsing	Ian Rogers
	Now that legacy hardware and cache events are in json, having the lexer match the specific event is no longer necessary and generic PMU parsing can take place. Because of this remove the specific term parsing, event adding, and passing of alternate_hw_config which was now always PERF_COUNT_HW_MAX. This mirrors a similar change for software events in commit 6e9fa4131abb ("perf parse-events: Remove non-json software events"). With no hard coded legacy hardware or cache events the wild card, case insensitivity, etc. is consistent for events. This does, however, mean events like cycles will wild card against all PMUs. A change does the same was originally posted and merged from: https://lore.kernel.org/r/20240416061533.921723-10-irogers@google.com and reverted by Linus in commit 4f1b067359ac ("Revert "perf parse-events: Prefer sysfs/JSON hardware events over legacy"") due to his dislike for the cycles behavior on ARM. Earlier patches in this series make perf record event opening failures non-fatal and hide the cycles event's failure to open on ARM in perf record, so it is expected the behavior will now be transparent in perf record. perf stat with a cycles event will wildcard open the event on all PMUs. As cycles is a "default event", the perf stat behavior for default events was updated to only open them on core/software PMUs. The change to support legacy events with PMUs was done to clean up Intel's hybrid PMU implementation. Having sysfs/json events with increased priority to legacy was requested by Mark Rutland <mark.rutland@arm.com> to fix Apple-M PMU issues wrt broken legacy events on that PMU. It was requested that RISC-V be able to add events to the perf tool json so the PMU driver didn't need to map legacy events to config encodings: https://lore.kernel.org/lkml/20240217005738.3744121-1-atishp@rivosinc.com/ A previous series of patches decreasing legacy hardware event priorities was posted in: https://lore.kernel.org/lkml/20250416045117.876775-1-irogers@google.com/ Namhyung Kim <namhyung@kernel.org> mentioned that hardware and software events can be implemented similarly: https://lore.kernel.org/lkml/aIJmJns2lopxf3EK@google.com/ Tested-by: Thomas Richter <tmricht@linux.ibm.com> Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: James Clark <james.clark@linaro.org> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-10-15	perf print-events: Remove print_symbol_events	Ian Rogers
	Now legacy hardware events are in json there's no need for a specific printing routine that previously served for both hardware and software events. The associated event_symbols_hw is also removed. To support the previous filtered version use an event glob of "legacy hardware" which matches the topic of the json events. Tested-by: Thomas Richter <tmricht@linux.ibm.com> Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: James Clark <james.clark@linaro.org> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-10-15	perf print-events: Remove print_hwcache_events	Ian Rogers
	Now legacy cache events are in json there's no need for a specific printing routine. To support the previous filtered version use an event glob of "legacy cache" which matches the topic of the json events. Tested-by: Thomas Richter <tmricht@linux.ibm.com> Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: James Clark <james.clark@linaro.org> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-10-15	perf jevents: Add legacy-hardware and legacy-cache json	Ian Rogers
	The legacy-hardware.json is added containing hardware events similarly to the software.json file. A difference is that for the software PMU the name is known and matches sysfs. In the legacy-hardware.json no Unit/PMU is specified for the events meaning default_core is used and the events will appear for all core PMUs. There are potentially 1216 legacy cache events, rather than list them in a json file add a make_legacy_cache.py helper to generate them. By using json for legacy hardware and cache events: descriptions of the events can be added; events can be marked as deprecated, such as those misleadingly named l2 (deprecated is also used to mark all events that weren't previously displayed in perf list); and the name lookup becomes case insensitive. The C string encoding all the perf events and metrics is increased in size by 123,499 bytes which will increase the perf binary size. Later changes will remove hard coded event parsing for legacy hardware and cache events, turning parsing overhead into a binary search during event lookup. That event descriptions are based off of those in perf_event_open man page, credit to Vince Weaver <vincent.weaver@maine.edu>. Tested-by: Thomas Richter <tmricht@linux.ibm.com> Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: James Clark <james.clark@linaro.org> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-10-15	perf pmu: Add and use legacy_terms in alias information	Ian Rogers
	Add support to finding/adding events from the default_core event table. If an event already exists from sysfs/json then the default_core configuration is saved in the legacy_terms string. Lazily use the legacy_terms string to set a legacy hardware or cache event as deprecated if the core PMU doesn't support it. Use the legacy terms string to set the alternate_hw_config, avoiding the value needing to be passed from the parse_events parser. Tested-by: Thomas Richter <tmricht@linux.ibm.com> Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: James Clark <james.clark@linaro.org> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-10-15	perf jevents: Add legacy json terms and default_core event table helper	Ian Rogers
	Add json LegacyConfigCode and LegacyCacheCode values that translate to legacy-hardware-config and legacy-cache-config event terms respectively. Add perf_pmu__default_core_events_table as a means to find a default_core event table that will later contain legacy events. In situations like hypervisors it is more likely that tables will be NULL. Rather than testing in the calling PMU code, early exit in the pmu-event.c routines. Tested-by: Thomas Richter <tmricht@linux.ibm.com> Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: James Clark <james.clark@linaro.org> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-10-15	perf parse-events: Add terms for legacy hardware and cache config values	Ian Rogers
	Add the PMU terms legacy-hardware-config and legacy-cache-config. These terms are similar to the config term in that their values are assigned to the perf_event_attr config value. They differ in that the PMU type is switched to be either PERF_TYPE_HARDWARE or PERF_TYPE_HW_CACHE, and the PMU type is moved into the extended type information of the config value. This will allow later patches to add legacy events to json. An example use of the terms is in the following: ``` $ perf stat -vv -e 'cpu/legacy-hardware-config=1/,cpu/legacy-cache-config=0x10001/' true Using CPUID GenuineIntel-6-8D-1 Attempt to add: cpu/legacy-hardware-config=0x1/ ..after resolving event: cpu/legacy-hardware-config=0x1/ Attempt to add: cpu/legacy-cache-config=0x10001/ ..after resolving event: cpu/legacy-cache-config=0x10001/ Control descriptor is not initialized ------------------------------------------------------------ perf_event_attr: type 0 (PERF_TYPE_HARDWARE) size 136 config 0x1 (PERF_COUNT_HW_INSTRUCTIONS) sample_type IDENTIFIER read_format TOTAL_TIME_ENABLED\|TOTAL_TIME_RUNNING disabled 1 inherit 1 enable_on_exec 1 ------------------------------------------------------------ sys_perf_event_open: pid 994937 cpu -1 group_fd -1 flags 0x8 = 3 ------------------------------------------------------------ perf_event_attr: type 3 (PERF_TYPE_HW_CACHE) size 136 config 0x10001 (PERF_COUNT_HW_CACHE_RESULT_MISS \| PERF_COUNT_HW_CACHE_OP_READ \| PERF_COUNT_HW_CACHE_L1I) sample_type IDENTIFIER read_format TOTAL_TIME_ENABLED\|TOTAL_TIME_RUNNING disabled 1 inherit 1 enable_on_exec 1 ------------------------------------------------------------ sys_perf_event_open: pid 994937 cpu -1 group_fd -1 flags 0x8 = 4 cpu/legacy-hardware-config=1/: -1: 1364046 414756 414756 cpu/legacy-cache-config=0x10001/: -1: 57453 414756 414756 cpu/legacy-hardware-config=1/: 1364046 414756 414756 cpu/legacy-cache-config=0x10001/: 57453 414756 414756 Performance counter stats for 'true': 1,364,046 cpu/legacy-hardware-config=1/ 57,453 cpu/legacy-cache-config=0x10001/ 0.001988593 seconds time elapsed 0.002194000 seconds user 0.000000000 seconds sys ``` Tested-by: Thomas Richter <tmricht@linux.ibm.com> Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: James Clark <james.clark@linaro.org> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-10-15	perf pmu: Factor term parsing into a perf_event_attr into a helper	Ian Rogers
	Factor existing functionality in perf_pmu__name_from_config into a helper that will be used in later patches. Tested-by: Thomas Richter <tmricht@linux.ibm.com> Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: James Clark <james.clark@linaro.org> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-10-15	perf pmu: Use fd rather than FILE from new_alias	Ian Rogers
	The FILE argument was necessary for the scanner but now that functionality is not being used we can switch to just using io__getline which should cut down on stdio buffer usage. Tested-by: Thomas Richter <tmricht@linux.ibm.com> Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: James Clark <james.clark@linaro.org> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-10-15	perf parse-events: Remove unused FILE input argument to scanner	Ian Rogers
	Now the events file isn't directly parsed from a FILE but stored in a string prior to parsing, remove the FILE argument to the associated scanner functions as they only ever pass NULL. Tested-by: Thomas Richter <tmricht@linux.ibm.com> Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: James Clark <james.clark@linaro.org> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-10-15	perf pmu: Don't eagerly parse event terms	Ian Rogers
	When an event/alias is created for a PMU the terms are eagerly parsed using parse_events_terms. For a command like perf stat or perf record, the particular event/alias will be found, the terms parsed, the terms cloned for use in the event parsing, and then the terms used to configure the perf_event_attr. Events/aliases may be eagerly loaded, such as from sysfs or in perf list, in which case the aliases terms will be little or never used. To avoid redundant work, to avoid cloning, and to reduce memory overhead, hold the terms for an event as a string until they need handling as a term list. This may introduce duplicate parsing if an event is repeated in a list, but this situation is expected to be uncommon. Measuring the number of instructions before and after with a sysfs event and perf stat, there is a minor reduction in the number of instructions executed by 0.3%. Tested-by: Thomas Richter <tmricht@linux.ibm.com> Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: James Clark <james.clark@linaro.org> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-10-15	perf jevents: Support copying the source json files to OUTPUT	Ian Rogers
	The jevents command expects all json files to be organized under a single directory. When generating json files from scripts (to reduce laborious copy and paste in the json) we don't want to generate the json into the source directory if there is an OUTPUT directory specified. This change adds a GEN_JSON for this case where the GEN_JSON copies the JSON files to OUTPUT, only when OUTPUT is specified. The Makefile.perf clean code is updated to clean up this directory when present. This patch is part of: https://lore.kernel.org/lkml/20240926173554.404411-12-irogers@google.com/ which was similarly adding support for generating json in scripts for the consumption of jevents.py. Tested-by: Thomas Richter <tmricht@linux.ibm.com> Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: James Clark <james.clark@linaro.org> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-10-15	perf record: Skip don't fail for events that don't open	Ian Rogers
	Whilst for many tools it is an expected behavior that failure to open a perf event is a failure, ARM decided to name PMU events the same as legacy events and then failed to rename such events on a server uncore SLC PMU. As perf's default behavior when no PMU is specified is to open the event on all PMUs that advertise/"have" the event, this yielded failures when trying to make the priority of legacy and sysfs/json events uniform - something requested by RISC-V and ARM. A legacy event user on ARM hardware may find their event opened on an uncore PMU which for perf record will fail. Arnaldo suggested skipping such events which this patch implements. Rather than have the skipping conditional on running on ARM, the skipping is done on all architectures as such a fundamental behavioral difference could lead to problems with tools built/depending on perf. An example of perf record failing to open events on x86 is: ``` $ perf record -e data_read,cycles,LLC-prefetch-read -a sleep 0.1 Error: Failure to open event 'data_read' on PMU 'uncore_imc_free_running_0' which will be removed. The sys_perf_event_open() syscall returned with 22 (Invalid argument) for event (data_read). "dmesg \| grep -i perf" may provide additional information. Error: Failure to open event 'data_read' on PMU 'uncore_imc_free_running_1' which will be removed. The sys_perf_event_open() syscall returned with 22 (Invalid argument) for event (data_read). "dmesg \| grep -i perf" may provide additional information. Error: Failure to open event 'LLC-prefetch-read' on PMU 'cpu' which will be removed. The LLC-prefetch-read event is not supported. [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 2.188 MB perf.data (87 samples) ] $ perf report --stats Aggregated stats: TOTAL events: 17255 MMAP events: 284 ( 1.6%) COMM events: 1961 (11.4%) EXIT events: 1 ( 0.0%) FORK events: 1960 (11.4%) SAMPLE events: 87 ( 0.5%) MMAP2 events: 12836 (74.4%) KSYMBOL events: 83 ( 0.5%) BPF_EVENT events: 36 ( 0.2%) FINISHED_ROUND events: 2 ( 0.0%) ID_INDEX events: 1 ( 0.0%) THREAD_MAP events: 1 ( 0.0%) CPU_MAP events: 1 ( 0.0%) TIME_CONV events: 1 ( 0.0%) FINISHED_INIT events: 1 ( 0.0%) cycles stats: SAMPLE events: 87 ``` If all events fail to open then the perf record will fail: ``` $ perf record -e LLC-prefetch-read true Error: Failure to open event 'LLC-prefetch-read' on PMU 'cpu' which will be removed. The LLC-prefetch-read event is not supported. Error: Failure to open any events for recording ``` As an evlist may have dummy events that open when all command line events fail we ignore dummy events when detecting if at least some events open. This still permits the dummy event on its own to be used as a permission check: ``` $ perf record -e dummy true [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.046 MB perf.data ] ``` but allows failure when a dummy event is implicilty inserted or when there are insufficient permissions to open it: ``` $ perf record -e LLC-prefetch-read -a true Error: Failure to open event 'LLC-prefetch-read' on PMU 'cpu' which will be removed. The LLC-prefetch-read event is not supported. Error: Failure to open any events for recording ``` As the first parsed event in an evlist is marked as tracking, removing this event can remove tracking from the evlist, removing mmap events and breaking symbolization. To avoid this, if a tracking event is removed then the next event has tracking added. The issue with legacy events is that on RISC-V they want the driver to not have mappings from legacy to non-legacy config encodings for each vendor/model due to size, complexity and difficulty to update. It was reported that on ARM Apple-M? CPUs the legacy mapping in the driver was broken and the sysfs/json events should always take precedent, however, it isn't clear this is still the case. It is the case that without working around this issue a legacy event like cycles without a PMU can encode differently than when specified with a PMU - the non-PMU version favoring legacy encodings, the PMU one avoiding legacy encodings. Legacy events are also case sensitive while sysfs/json events are not. The patch removes events and then adjusts the idx value for each evsel. This is done so that the dense xyarrays used for file descriptors, etc. don't contain broken entries. On ARM it could be common following this change to see a lot of warnings for the cycles event due to many ARM PMUs advertising the cycles event (ARM inconsistently have events bus_cycles and then cycles implying CPU cycles, they also sometimes have a cpu_cycles event). As cycles is a popular event, avoid potentially spamming users with error messages on ARM when there are multiple cycles events in the evlist, the error is still shown when verbose is enabled. Prior versions without adding the tracking data and not warning for cycles on ARM was: Suggested-by: Arnaldo Carvalho de Melo <acme@kernel.org> Tested-by: James Clark <james.clark@linaro.org> Tested-by: Leo Yan <leo.yan@arm.com> Tested-by: Atish Patra <atishp@rivosinc.com> Tested-by: Thomas Richter <tmricht@linux.ibm.com> Signed-off-by: Ian Rogers <irogers@google.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-10-15	perf stat: Avoid wildcarding PMUs for default events	Ian Rogers
	Without a PMU perf matches an event against any PMU with the event. Unfortunately some PMU drivers advertise a "cycles" event which is typically just a core event. To make perf's behavior consistent, just look up default events with their designated PMU types. Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: James Clark <james.clark@linaro.org> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-10-15	perf perf_api_probe: Avoid scanning all PMUs, try software PMU first	Ian Rogers
	Scan the software PMU first rather than last as it is the least likely to fail the probe. Specifying the software PMU by name was enabled by commit 9957d8c801fe ("perf jevents: Add common software event json"). For hardware events, add core PMU names when getting events to probe so that not all PMUs are scanned. For example, when legacy events support wildcards and for the event "cycles:u" on x86, we want to only scan the "cpu" PMU and not all uncore PMUs for the event too. Tested-by: Thomas Richter <tmricht@linux.ibm.com> Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: James Clark <james.clark@linaro.org> Signed-off-by: Namhyung Kim <namhyung@kernel.org>