kernel - Hosts the 0x221E linux distro kernel.

Age	Commit message (Collapse)	Author
2026-01-12	selftests: ublk: add stop command with --safe option	Ming Lei
	Add 'stop' subcommand to kublk utility that uses the new UBLK_CMD_TRY_STOP_DEV command when --safe option is specified. This allows stopping a device only if it has no active openers, returning -EBUSY otherwise. Also add test_generic_16.sh to test the new functionality. Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2026-01-12	cgroup/cpuset: Move the v1 empty cpus/mems check to cpuset1_validate_change()	Waiman Long
	As stated in commit 1c09b195d37f ("cpuset: fix a regression in validating config change"), it is not allowed to clear masks of a cpuset if there're tasks in it. This is specific to v1 since empty "cpuset.cpus" or "cpuset.mems" will cause the v2 cpuset to inherit the effective CPUs or memory nodes from its parent. So it is OK to have empty cpus or mems even if there are tasks in the cpuset. Move this empty cpus/mems check in validate_change() to cpuset1_validate_change() to allow more flexibility in setting cpus or mems in v2. cpuset_is_populated() needs to be moved into cpuset-internal.h as it is needed by the empty cpus/mems checking code. Also add a test case to test_cpuset_prs.sh to verify that. Reported-by: Chen Ridong <chenridong@huaweicloud.com> Closes: https://lore.kernel.org/lkml/7a3ec392-2e86-4693-aa9f-1e668a668b9c@huaweicloud.com/ Signed-off-by: Waiman Long <longman@redhat.com> Reviewed-by: Chen Ridong <chenridong@huawei.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2026-01-12	cgroup/cpuset: Don't invalidate sibling partitions on cpuset.cpus conflict	Waiman Long
	Currently, when setting a cpuset's cpuset.cpus to a value that conflicts with the cpuset.cpus/cpuset.cpus.exclusive of a sibling partition, the sibling's partition state becomes invalid. This is overly harsh and is probably not necessary. The cpuset.cpus.exclusive control file, if set, will override the cpuset.cpus of the same cpuset when creating a cpuset partition. So cpuset.cpus has less priority than cpuset.cpus.exclusive in setting up a partition. However, it cannot override a conflicting cpuset.cpus file in a sibling cpuset and the partition creation process will fail. This is inconsistent. That will also make using cpuset.cpus.exclusive less valuable as a tool to set up cpuset partitions as the users have to check if such a cpuset.cpus conflict exists or not. Fix these problems by making sure that once a cpuset.cpus.exclusive is set without failure, it will always be allowed to form a valid partition as long as at least one CPU can be granted from its parent irrespective of the state of the siblings' cpuset.cpus values. Of course, setting cpuset.cpus.exclusive will fail if it conflicts with the cpuset.cpus.exclusive or the cpuset.cpus.exclusive.effective value of a sibling. Partition can still be created by setting only cpuset.cpus without setting cpuset.cpus.exclusive. However, any conflicting CPUs in sibling's cpuset.cpus.exclusive.effective and cpuset.cpus.exclusive values will be removed from its cpuset.cpus.exclusive.effective as long as there is still one or more CPUs left and can be granted from its parent. This CPU stripping is currently done in rm_siblings_excl_cpus(). The new code will now try its best to enable the creation of new partitions with only cpuset.cpus set without invalidating existing ones. However it is not guaranteed that all the CPUs requested in cpuset.cpus will be used in the new partition even when all these CPUs can be granted from the parent. This is similar to the fact that cpuset.cpus.effective may not be able to include all the CPUs requested in cpuset.cpus. In this case, the parent may not able to grant all the exclusive CPUs requested in cpuset.cpus to cpuset.cpus.exclusive.effective if some of them have already been granted to other partitions earlier. With the creation of multiple sibling partitions by setting only cpuset.cpus, this does have the side effect that their exact cpuset.cpus.exclusive.effective settings will depend on the order of partition creation if there are conflicts. Due to the exclusive nature of the CPUs in a partition, it is not easy to make it fair other than the old behavior of invalidating all the conflicting partitions. For example, # echo "0-2" > A1/cpuset.cpus # echo "root" > A1/cpuset.cpus.partition # cat A1/cpuset.cpus.partition root # cat A1/cpuset.cpus.exclusive.effective 0-2 # echo "2-4" > B1/cpuset.cpus # echo "root" > B1/cpuset.cpus.partition # cat B1/cpuset.cpus.partition root # cat B1/cpuset.cpus.exclusive.effective 3-4 # cat B1/cpuset.cpus.effective 3-4 For users who want to be sure that they can get most of the CPUs they want, cpuset.cpus.exclusive should be used instead if they can set it successfully without failure. Setting cpuset.cpus.exclusive will guarantee that sibling conflicts from then onward is no longer possible. To make this change, we have to separate out the is_cpu_exclusive() check in cpus_excl_conflict() into a cgroup v1 only cpuset1_cpus_excl_conflict() helper. The cpus_allowed_validate_change() helper is now no longer needed and can be removed. Some existing tests in test_cpuset_prs.sh are updated and new ones are added to reflect the new behavior. The cgroup-v2.rst doc file is also updated the clarify what exclusive CPUs will be used when a partition is created. Reported-by: Sun Shaojie <sunshaojie@kylinos.cn> Closes: https://lore.kernel.org/lkml/20251117015708.977585-1-sunshaojie@kylinos.cn/ Signed-off-by: Waiman Long <longman@redhat.com> Reviewed-by: Chen Ridong <chenridong@huawei.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2026-01-12	selftests: ublk: add end-to-end integrity test	Caleb Sander Mateos
	Add test case loop_08 to verify the ublk integrity data flow. It uses the kublk loop target to create a ublk device with integrity on top of backing data and integrity files. It then writes to the whole device with fio configured to generate integrity data. Then it reads back the whole device with fio configured to verify the integrity data. It also verifies that injected guard, reftag, and apptag corruptions are correctly detected. Signed-off-by: Caleb Sander Mateos <csander@purestorage.com> Reviewed-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2026-01-12	selftests: ublk: add integrity params test	Caleb Sander Mateos
	Add test case null_04 to exercise all the different integrity params. It creates 4 different ublk devices with different combinations of integrity arguments and verifies their integrity limits via sysfs and the metadata_size utility. Signed-off-by: Caleb Sander Mateos <csander@purestorage.com> Reviewed-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2026-01-12	selftests: ublk: add integrity data support to loop target	Caleb Sander Mateos
	To perform and end-to-end test of integrity information through a ublk device, we need to actually store it somewhere and retrieve it. Add this support to kublk's loop target. It uses a second backing file for the integrity data corresponding to the data stored in the first file. The integrity file is initialized with byte 0xFF, which ensures the app and reference tags are set to the "escape" pattern to disable the bio-integrity-auto guard and reftag checks until the blocks are written. The integrity file is opened without O_DIRECT since it will be accessed at sub-block granularity. Each incoming read/write results in a pair of reads/writes, one to the data file, and one to the integrity file. If either backing I/O fails, the error is propagated to the ublk request. If both backing I/Os read/write some bytes, the ublk request is completed with the smaller of the number of blocks accessed by each I/O. Signed-off-by: Caleb Sander Mateos <csander@purestorage.com> Reviewed-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2026-01-12	selftests: ublk: support non-O_DIRECT backing files	Caleb Sander Mateos
	A subsequent commit will add support for using a backing file to store integrity data. Since integrity data is accessed in intervals of metadata_size, which may be much smaller than a logical block on the backing device, direct I/O cannot be used. Add an argument to backing_file_tgt_init() to specify the number of files to open for direct I/O. The remaining files will use buffered I/O. For now, continue to request direct I/O for all the files. Signed-off-by: Caleb Sander Mateos <csander@purestorage.com> Reviewed-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2026-01-12	selftests: ublk: implement integrity user copy in kublk	Caleb Sander Mateos
	If integrity data is enabled for kublk, allocate an integrity buffer for each I/O. Extend ublk_user_copy() to copy the integrity data between the ublk request and the integrity buffer if the ublksrv_io_desc indicates that the request has integrity data. Signed-off-by: Caleb Sander Mateos <csander@purestorage.com> Reviewed-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2026-01-12	selftests: ublk: add kublk support for integrity params	Caleb Sander Mateos
	Add integrity param command line arguments to kublk. Plumb these to struct ublk_params for the null and fault_inject targets, as they don't need to actually read or write the integrity data. Forbid the integrity params for loop or stripe until the integrity data copy is implemented. Signed-off-by: Caleb Sander Mateos <csander@purestorage.com> Reviewed-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2026-01-12	selftests: ublk: add utility to get block device metadata size	Caleb Sander Mateos
	Some block device integrity parameters are available in sysfs, but others are only accessible using the FS_IOC_GETLBMD_CAP ioctl. Add a metadata_size utility program to print out the logical block metadata size, PI offset, and PI size within the metadata. Example output: $ metadata_size /dev/ublkb0 metadata_size: 64 pi_offset: 56 pi_tuple_size: 8 Signed-off-by: Caleb Sander Mateos <csander@purestorage.com> Reviewed-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2026-01-12	selftests: ublk: display UBLK_F_INTEGRITY support	Caleb Sander Mateos
	Add support for printing the UBLK_F_INTEGRITY feature flag in the human-readable kublk features output. Signed-off-by: Caleb Sander Mateos <csander@purestorage.com> Reviewed-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2026-01-11	Merge branch 'block-6.19' into for-7.0/block	Jens Axboe
	Merge in fixes that went to 6.19 after for-7.0/block was branched. Pending ublk changes depend on particularly the async scan work. * block-6.19: block: zero non-PI portion of auto integrity buffer ublk: fix use-after-free in ublk_partition_scan_work blk-mq: avoid stall during boot due to synchronize_rcu_expedited loop: add missing bd_abort_claiming in loop_set_status block: don't merge bios with different app_tags blk-rq-qos: Remove unlikely() hints from QoS checks loop: don't change loop device under exclusive opener in loop_set_status block, bfq: update outdated comment blk-mq: skip CPU offline notify on unmapped hctx selftests/ublk: fix Makefile to rebuild on header changes selftests/ublk: add test for async partition scan ublk: scan partition in async way block,bfq: fix aux stat accumulation destination md: Fix forward incompatibility from configurable logical block size md: Fix logical_block_size configuration being overwritten md: suspend array while updating raid_disks via sysfs md/raid5: fix possible null-pointer dereferences in raid5_store_group_thread_cnt() md: Fix static checker warning in analyze_sbs
2026-01-11	tools/nolibc: Add a simple test for writing to a FILE and reading it back	Daniel Palmer
	Add a test that exercises create->write->seek->read to check that using the stream functions (fwrite() etc) is not totally broken. The only edge cases this is testing for are: - Reading the file after writing but without rewinding reads nothing. - Trying to read more items than the file contains returns the count of fully read items. Signed-off-by: Daniel Palmer <daniel@thingy.jp> Link: https://patch.msgid.link/20260105023629.1502801-4-daniel@thingy.jp Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
2026-01-11	selftests/nolibc: also test libc-test through regular selftest framework	Thomas Weißschuh
	Hook up libc-test to the regular selftest build to make sure nolibc-test.c stays compatible with a normal libc. As the pattern rule from lib.mk does not handle compiling a target from a differently named source file, add an explicit rule definition. Signed-off-by: Thomas Weißschuh <linux@weissschuh.net> Acked-by: Willy Tarreau <w@1wt.eu> Link: https://patch.msgid.link/20260106-nolibc-selftests-v1-3-f82101c2c505@weissschuh.net
2026-01-11	selftests/nolibc: scope custom flags to the nolibc-test target	Thomas Weißschuh
	A new target for 'libc-test' is going to be added which should not be affected by these options. Signed-off-by: Thomas Weißschuh <linux@weissschuh.net> Acked-by: Willy Tarreau <w@1wt.eu> Link: https://patch.msgid.link/20260106-nolibc-selftests-v1-2-f82101c2c505@weissschuh.net
2026-01-11	selftests/nolibc: try to read from stdin in readv_zero test	Thomas Weißschuh
	When stdout is redirected to a file this test fails. This happens when running through the kselftest runner since commit d9e6269e3303 ("selftests/run_kselftest.sh: exit with error if tests fail"). For consistency with other tests that read from a file descriptor, switch to stdin over stdout. The tests are still brittle against a redirected stdin, but at least they are now consistently so. Signed-off-by: Thomas Weißschuh <linux@weissschuh.net> Acked-by: Willy Tarreau <w@1wt.eu> Link: https://patch.msgid.link/20260106-nolibc-selftests-v1-1-f82101c2c505@weissschuh.net
2026-01-10	Merge tag 'linux_kselftest-fixes-6.19-rc5' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest Pull kselftest fix from Shuah Khan: "Fix tracing test_multiple_writes stalls when buffer_size_kb is less than 12KB" * tag 'linux_kselftest-fixes-6.19-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest: selftests/tracing: Fix test_multiple_writes stall
2026-01-10	selftests: net: py: ensure defer() is only used within a test case	Jakub Kicinski
	I wasted a couple of hours recently after accidentally adding a defer() from within a function which itself was called as part of defer(). This leads to an infinite loop of defer(). Make sure this cannot happen and raise a helpful exception. I understand that the pair of _ksft_defer_arm() calls may not be the most Pythonic way to implement this, but it's easy enough to understand. Reviewed-by: Petr Machata <petrm@nvidia.com> Link: https://patch.msgid.link/20260108225257.2684238-2-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-01-10	selftests: net: py: capitalize defer queue and improve import	Jakub Kicinski
	Import utils and refer to the global defer queue that way instead of importing the queue. This will make it possible to assign value to the global variable. While at it capitalize the name, to comply with the Python coding style. Reviewed-by: Petr Machata <petrm@nvidia.com> Link: https://patch.msgid.link/20260108225257.2684238-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-01-10	selftests/net: parametrise iou-zcrx.py with ksft_variants	David Wei
	Use ksft_variants to parametrise tests in iou-zcrx.py to either use single queues or RSS contexts, reducing duplication. Signed-off-by: David Wei <dw@davidwei.uk> Link: https://patch.msgid.link/20260108234521.3619621-1-dw@davidwei.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-01-10	selftests: drv-net: psp: Better control the used PSP dev	Cosmin Ratiu
	The PSP responder fails when zero or multiple PSP devices are detected. There's an option to select the device id to use (-d) but it's currently not used from the PSP self test. It's also hard to use because the PSP test doesn't dump the PSP devices so can't choose one. When zero devices are detected, psp_responder fails which will cause the parent test to fail as well instead of skipping PSP tests. Fix both of these problems. Change psp_responder to: - not fail when no PSP devs are detected. - get an optional -i ifindex argument instead of -d. - select the correct PSP dev from the dump corresponding to ifindex or - select the first PSP dev when -i is not given. - fail when multiple devs are found and -i is not given. - warn and continue when the requested ifindex is not found. Also plumb the ifindex from the Python test. With these, when there are no PSP devs found or the wrong one is chosen, psp_responder opens the server socket, listens for control connections normally, and leaves the skipping of the various test cases which require a PSP device (~most, but not all of them) to the parent test. This results in output like: ok 1 psp.test_case # SKIP No PSP devices found [...] ok 12 psp.dev_get_device # SKIP No PSP devices found ok 13 psp.dev_get_device_bad ok 14 psp.dev_rotate # SKIP No PSP devices found [...] Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com> Reviewed-by: Carolina Jubran <cjubran@nvidia.com> Link: https://patch.msgid.link/20260109110851.2952906-2-cratiu@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-01-10	selftests: kvm: Verify TILELOADD actually #NM faults when XFD[18]=1	Sean Christopherson
	Rework the AMX test's #NM handling to use kvm_asm_safe() to verify an #NM actually occurs. As is, a completely missing #NM could go unnoticed. Signed-off-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2026-01-10	selftests: kvm: try getting XFD and XSAVE state out of sync	Paolo Bonzini
	The host is allowed to set FPU state that includes a disabled xstate component. Check that this does not cause bad effects. Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2026-01-10	selftests: kvm: replace numbered sync points with actions	Paolo Bonzini
	Rework the guest=>host syncs in the AMX test to use named actions instead of arbitrary, incrementing numbers. The "stage" of the test has no real meaning, what matters is what action the test wants the host to perform. The incrementing numbers are somewhat helpful for triaging failures, but fully debugging failures almost always requires a much deeper dive into the test (and KVM). Using named actions not only makes it easier to extend the test without having to shift all sync point numbers, it makes the code easier to read. [Commit message by Sean Christopherson] Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2026-01-09	selftests: forwarding: update PTP tcpdump patterns	Jakub Kicinski
	Recent version of tcpdump (tcpdump-4.99.6-1.fc43.x86_64) seems to have removed the spurious space after msg type in PTP info, e.g.: before: PTPv2, majorSdoId: 0x0, msg type : sync msg, length: 44 after: PTPv2, majorSdoId: 0x0, msg type: sync msg, length: 44 Update our patterns to match both. Reviewed-by: Alexander Sverdlin <alexander.sverdlin@gmail.com> Link: https://patch.msgid.link/20260107145320.1837464-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-01-09	selftests: drv-net: gro: increase the rcvbuf size	Jakub Kicinski
	The gro.py test (testing software GRO) is slightly flaky when running against fbnic. We see one flake per roughly 20 runs in NIPA, mostly in ipip.large, and always including some EAGAIN: # Shouldn't coalesce if exceed IP max pkt size: Test succeeded # Expected {65475 899 }, Total 2 packets # Received {65475 899 }, Total 2 packets. # Expected {64576 900 900 }, Total 3 packets # Received {64576 /home/virtme/testing/wt-24/tools/testing/selftests/drivers/net/gro: could not receive: Resource temporarily unavailable The test sends 2 large frames (64k + change). Looks like the default packet socket rcvbuf (~200kB) may not be large enough to hold them. Bump the rcvbuf to 1MB. Add a debug print showing socket statistics to make debugging this issue easier in the future. Without the rcvbuf increase we see: # Shouldn't coalesce if exceed IP max pkt size: Test succeeded # Expected {65475 899 }, Total 2 packets # Received {65475 899 }, Total 2 packets. # Expected {64576 900 900 }, Total 3 packets # Received {64576 Socket stats: packets=7, drops=3 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ # /home/virtme/testing/wt-24/tools/testing/selftests/drivers/net/gro: could not receive: Resource temporarily unavailable Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://patch.msgid.link/20260107232557.2147760-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-01-09	selftests: tls: avoid flakiness in data_steal	Jakub Kicinski
	We see the following failure a few times a week: # RUN global.data_steal ... # tls.c:3280:data_steal:Expected recv(cfd, buf2, sizeof(buf2), MSG_DONTWAIT) (10000) == -1 (-1) # data_steal: Test failed # FAIL global.data_steal not ok 8 global.data_steal The 10000 bytes read suggests that the child process did a recv() of half of the data using the TLS ULP and we're now getting the remaining half. The intent of the test is to get the child to enter _TCP_ recvmsg handler, so it needs to enter the syscall before parent installed the TLS recvmsg with setsockopt(SOL_TLS). Instead of the 10msec sleep send 1 byte of data and wait for the child to consume it. Reviewed-by: Sabrina Dubroca <sd@queasysnail.net> Link: https://patch.msgid.link/20260106200205.1593915-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-01-09	selftests/resctrl: Fix non-contiguous CBM check for Hygon	Xiaochen Shen
	The resctrl selftest currently fails on Hygon CPUs that always supports non-contiguous CBM, printing the error: "# Hardware and kernel differ on non-contiguous CBM support!" This occurs because the arch_supports_noncont_cat() function lacks vendor detection for Hygon CPUs, preventing proper identification of their non-contiguous CBM capability. Fix this by adding Hygon vendor ID detection to arch_supports_noncont_cat(). Link: https://lore.kernel.org/r/20251217030456.3834956-5-shenxiaochen@open-hieco.net Signed-off-by: Xiaochen Shen <shenxiaochen@open-hieco.net> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Reviewed-by: Fenghua Yu <fenghuay@nvidia.com> Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2026-01-09	selftests/resctrl: Add CPU vendor detection for Hygon	Xiaochen Shen
	The resctrl selftest currently fails on Hygon CPUs that support Platform QoS features, printing the error: "# Can not get vendor info..." This occurs because vendor detection is missing for Hygon CPUs. Fix this by extending the CPU vendor detection logic to include Hygon's vendor ID. Link: https://lore.kernel.org/r/20251217030456.3834956-4-shenxiaochen@open-hieco.net Signed-off-by: Xiaochen Shen <shenxiaochen@open-hieco.net> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2026-01-09	selftests/resctrl: Define CPU vendor IDs as bits to match usage	Xiaochen Shen
	The CPU vendor IDs are required to be unique bits because they're used for vendor_specific bitmask in the struct resctrl_test. Consider for example their usage in test_vendor_specific_check(): return get_vendor() & test->vendor_specific However, the definitions of CPU vendor IDs in file resctrl.h is quite subtle as a bitmask value: #define ARCH_INTEL 1 #define ARCH_AMD 2 A clearer and more maintainable approach is to define these CPU vendor IDs using BIT(). This ensures each vendor corresponds to a distinct bit and makes it obvious when adding new vendor IDs. Accordingly, update the return types of detect_vendor() and get_vendor() from 'int' to 'unsigned int' to align with their usage as bitmask values and to prevent potentially risky type conversions. Furthermore, introduce a bool flag 'initialized' to simplify the get_vendor() -> detect_vendor() logic. This ensures the vendor ID is detected only once and resolves the ambiguity of using the same variable 'vendor' both as a value and as a state. Link: https://lore.kernel.org/r/20251217030456.3834956-3-shenxiaochen@open-hieco.net Suggested-by: Reinette Chatre <reinette.chatre@intel.com> Suggested-by: Fenghua Yu <fenghuay@nvidia.com> Signed-off-by: Xiaochen Shen <shenxiaochen@open-hieco.net> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2026-01-09	selftests/resctrl: Fix a division by zero error on Hygon	Xiaochen Shen
	Change to adjust effective L3 cache size with SNC enabled change introduced the snc_nodes_per_l3_cache() function to detect the Intel Sub-NUMA Clustering (SNC) feature by comparing #CPUs in node0 with #CPUs sharing LLC with CPU0. The function was designed to return: (1) >1: SNC mode is enabled. (2) 1: SNC mode is not enabled or not supported. However, on certain Hygon CPUs, #CPUs sharing LLC with CPU0 is actually less than #CPUs in node0. This results in snc_nodes_per_l3_cache() returning 0 (calculated as cache_cpus / node_cpus). This leads to a division by zero error in get_cache_size(): *cache_size /= snc_nodes_per_l3_cache(); Causing the resctrl selftest to fail with: "Floating point exception (core dumped)" Fix the issue by ensuring snc_nodes_per_l3_cache() returns 1 when SNC mode is not supported on the platform. Updated commit log to fix commit has issues: Shuah Khan <skhan@linuxfoundation.org> Link: https://lore.kernel.org/r/20251217030456.3834956-2-shenxiaochen@open-hieco.net Fixes: a1cd99e700ec ("selftests/resctrl: Adjust effective L3 cache size with SNC enabled") Signed-off-by: Xiaochen Shen <shenxiaochen@open-hieco.net> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Reviewed-by: Fenghua Yu <fenghuay@nvidia.com> Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2026-01-09	selftests/tracing: Fix test_multiple_writes stall	Fushuai Wang
	When /sys/kernel/tracing/buffer_size_kb is less than 12KB, the test_multiple_writes test will stall and wait for more input due to insufficient buffer space. Check current buffer_size_kb value before the test. If it is less than 12KB, it temporarily increase the buffer to 12KB, and restore the original value after the tests are completed. Link: https://lore.kernel.org/r/20260109033620.25727-1-fushuai.wang@linux.dev Fixes: 37f46601383a ("selftests/tracing: Add basic test for trace_marker_raw file") Suggested-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Fushuai Wang <wangfushuai@baidu.com> Acked-by: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2026-01-09	selftests/landlock: Properly close a file descriptor	Günther Noack
	Add a missing close(srv_fd) call, and use EXPECT_EQ() to check the result. Signed-off-by: Günther Noack <gnoack3000@gmail.com> Fixes: f83d51a5bdfe ("selftests/landlock: Check IOCTL restrictions for named UNIX domain sockets") Link: https://lore.kernel.org/r/20260101134102.25938-2-gnoack3000@gmail.com [mic: Use EXPECT_EQ() and update commit message] Signed-off-by: Mickaël Salaün <mic@digikod.net>
2026-01-08	KVM: selftests: Extend vmx_set_nested_state_test to cover SVM	Yosry Ahmed
	Add test cases for the validation checks in svm_set_nested_state(), and allow the test to run with SVM as well as VMX. The SVM test also makes sure that KVM_SET_NESTED_STATE accepts GIF being set or cleared if EFER.SVME is cleared, verifying a recently fixed bug where GIF was incorrectly expected to always be set when EFER.SVME is cleared. Signed-off-by: Yosry Ahmed <yosry.ahmed@linux.dev> Link: https://patch.msgid.link/20251121204803.991707-5-yosry.ahmed@linux.dev Signed-off-by: Sean Christopherson <seanjc@google.com>
2026-01-08	KVM: selftests: Use TEST_ASSERT_EQ() in test_vmx_nested_state()	Yosry Ahmed
	The assert messages do not add much value, so use TEST_ASSERT_EQ(), which also nicely displays the addresses in hex. While at it, also assert the values of state->flags. Signed-off-by: Yosry Ahmed <yosry.ahmed@linux.dev> Link: https://patch.msgid.link/20251121204803.991707-4-yosry.ahmed@linux.dev Signed-off-by: Sean Christopherson <seanjc@google.com>
2026-01-08	KVM: selftests: Rename vm_get_page_table_entry() to vm_get_pte()	Sean Christopherson
	Shorten the API to get a PTE as the "PTE" acronym is ubiquitous, and the "page table entry" makes it unnecessarily difficult to quickly understand what callers are doing. No functional change intended. Reviewed-by: Yosry Ahmed <yosry.ahmed@linux.dev> Link: https://patch.msgid.link/20251230230150.4150236-21-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2026-01-08	KVM: selftests: Extend memstress to run on nested SVM	Yosry Ahmed
	Add L1 SVM code and generalize the setup code to work for both VMX and SVM. This allows running 'dirty_log_perf_test -n' on AMD CPUs. Signed-off-by: Yosry Ahmed <yosry.ahmed@linux.dev> Link: https://patch.msgid.link/20251230230150.4150236-20-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2026-01-08	KVM: selftests: Extend vmx_dirty_log_test to cover SVM	Yosry Ahmed
	Generalize the code in vmx_dirty_log_test.c by adding SVM-specific L1 code, doing some renaming (e.g. EPT -> TDP), and having setup code for both SVM and VMX in test_dirty_log(). Signed-off-by: Yosry Ahmed <yosry.ahmed@linux.dev> Link: https://patch.msgid.link/20251230230150.4150236-19-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2026-01-08	KVM: selftests: Set the user bit on nested NPT PTEs	Yosry Ahmed
	According to the APM, NPT walks are treated as user accesses. In preparation for supporting NPT mappings, set the 'user' bit on NPTs by adding a mask of bits to always be set on PTEs in kvm_mmu. Signed-off-by: Yosry Ahmed <yosry.ahmed@linux.dev> Link: https://patch.msgid.link/20251230230150.4150236-18-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2026-01-08	KVM: selftests: Add support for nested NPTs	Yosry Ahmed
	Implement nCR3 and NPT initialization functions, similar to the EPT equivalents, and create common TDP helpers for enablement checking and initialization. Enable NPT for nested guests by default if the TDP MMU was initialized, similar to VMX. Reuse the PTE masks from the main MMU in the NPT MMU, except for the C and S bits related to confidential VMs. Signed-off-by: Yosry Ahmed <yosry.ahmed@linux.dev> Link: https://patch.msgid.link/20251230230150.4150236-17-seanjc@google.com [sean: apply Yosry's fixup for ncr3_gpa] Signed-off-by: Sean Christopherson <seanjc@google.com>
2026-01-08	KVM: selftests: Allow kvm_cpu_has_ept() to be called on AMD CPUs	Yosry Ahmed
	In preparation for generalizing the nested dirty logging test, checking if either EPT or NPT is enabled will be needed. To avoid needing to gate the kvm_cpu_has_ept() call by the CPU type, make sure the function returns false if VMX is not available instead of trying to read VMX-only MSRs. No functional change intended. Signed-off-by: Yosry Ahmed <yosry.ahmed@linux.dev> Link: https://patch.msgid.link/20251230230150.4150236-16-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2026-01-08	KVM: selftests: Move TDP mapping functions outside of vmx.c	Sean Christopherson
	Now that the functions are no longer VMX-specific, move them to processor.c. Do a minor comment tweak replacing 'EPT' with 'TDP'. No functional change intended. Signed-off-by: Yosry Ahmed <yosry.ahmed@linux.dev> Link: https://patch.msgid.link/20251230230150.4150236-15-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2026-01-08	KVM: selftests: Reuse virt mapping functions for nested EPTs	Yosry Ahmed
	Rework tdp_map() and friends to use __virt_pg_map() and drop the custom EPT code in __tdp_pg_map() and tdp_create_pte(). The EPT code and __virt_pg_map() are practically identical, the main differences are: - EPT uses the EPT struct overlay instead of the PTE masks. - EPT always assumes 4-level EPTs. To reuse __virt_pg_map(), extend the PTE masks to work with EPT's RWX and X-only capabilities, and provide a tdp_mmu_init() API so that EPT can pass in the EPT PTE masks along with the root page level (which is currently hardcoded to '4'). Don't reuse KVM's insane overloading of the USER bit for EPT_R as there's no reason to multiplex bits in the selftests, e.g. selftests aren't trying to shadow guest PTEs and thus don't care about funnelling protections into a common permissions check. Another benefit of reusing the code is having separate handling for upper-level PTEs vs 4K PTEs, which avoids some quirks like setting the large bit on a 4K PTE in the EPTs. For all intents and purposes, no functional change intended. Suggested-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Yosry Ahmed <yosry.ahmed@linux.dev> Co-developed-by: Sean Christopherson <seanjc@google.com> Link: https://patch.msgid.link/20251230230150.4150236-14-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2026-01-08	KVM: selftests: Add a stage-2 MMU instance to kvm_vm	Sean Christopherson
	Add a stage-2 MMU instance so that architectures that support nested virtualization (more specifically, nested stage-2 page tables) can create and track stage-2 page tables for running L2 guests. Plumb the structure into common code to avoid cyclical dependencies, and to provide some line of sight to having common APIs for creating stage-2 mappings. As a bonus, putting the member in common code justifies using stage2_mmu instead of tdp_mmu for x86. Reviewed-by: Yosry Ahmed <yosry.ahmed@linux.dev> Link: https://patch.msgid.link/20251230230150.4150236-13-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2026-01-08	KVM: selftests: Stop passing VMX metadata to TDP mapping functions	Yosry Ahmed
	The root GPA is now retrieved from the nested MMU, stop passing VMX metadata. This is in preparation for making these functions work for NPTs as well. Opportunistically drop tdp_pg_map() since it's unused. No functional change intended. Signed-off-by: Yosry Ahmed <yosry.ahmed@linux.dev> Link: https://patch.msgid.link/20251230230150.4150236-12-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2026-01-08	KVM: selftests: Use a TDP MMU to share EPT page tables between vCPUs	Yosry Ahmed
	prepare_eptp() currently allocates new EPTs for each vCPU. memstress has its own hack to share the EPTs between vCPUs. Currently, there is no reason to have separate EPTs for each vCPU, and the complexity is significant. The only reason it doesn't matter now is because memstress is the only user with multiple vCPUs. Add vm_enable_ept() to allocate EPT page tables for an entire VM, and use it everywhere to replace prepare_eptp(). Drop 'eptp' and 'eptp_hva' from 'struct vmx_pages' as they serve no purpose (e.g. the EPTP can be built from the PGD), but keep 'eptp_gpa' so that the MMU structure doesn't need to be passed in along with vmx_pages. Dynamically allocate the TDP MMU structure to avoid a cyclical dependency between kvm_util_arch.h and kvm_util.h. Remove the workaround in memstress to copy the EPT root between vCPUs since that's now the default behavior. Name the MMU tdp_mmu instead of e.g. nested_mmu or nested.mmu to avoid recreating the same mess that KVM has with respect to "nested" MMUs, e.g. does nested refer to the stage-2 page tables created by L1, or the stage-1 page tables created by L2? Signed-off-by: Yosry Ahmed <yosry.ahmed@linux.dev> Co-developed-by: Sean Christopherson <seanjc@google.com> Link: https://patch.msgid.link/20251230230150.4150236-11-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2026-01-08	KVM: selftests: Move PTE bitmasks to kvm_mmu	Yosry Ahmed
	Move the PTE bitmasks into kvm_mmu to parameterize them for virt mapping functions. Introduce helpers to read/write different PTE bits given a kvm_mmu. Drop the 'global' bit definition as it's currently unused, but leave the 'user' bit as it will be used in coming changes. Opportunisitcally rename 'large' to 'huge' as it's more consistent with the kernel naming. Leave PHYSICAL_PAGE_MASK alone, it's fixed in all page table formats and a lot of other macros depend on it. It's tempting to move all the other macros to be per-struct instead, but it would be too much noise for little benefit. Keep c_bit and s_bit in vm->arch as they used before the MMU is initialized, through __vmcreate() -> vm_userspace_mem_region_add() -> vm_mem_add() -> vm_arch_has_protected_memory(). No functional change intended. Signed-off-by: Yosry Ahmed <yosry.ahmed@linux.dev> [sean: rename accessors to is_<adjective>_pte()] Link: https://patch.msgid.link/20251230230150.4150236-10-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2026-01-08	KVM: selftests: Add a "struct kvm_mmu_arch arch" member to kvm_mmu	Sean Christopherson
	Add an arch structure+field in "struct kvm_mmu" so that architectures can track arch-specific information for a given MMU. No functional change intended. Reviewed-by: Yosry Ahmed <yosry.ahmed@linux.dev> Link: https://patch.msgid.link/20251230230150.4150236-9-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2026-01-08	KVM: selftests: Plumb "struct kvm_mmu" into x86's MMU APIs	Sean Christopherson
	In preparation for generalizing the x86 virt mapping APIs to work with TDP (stage-2) page tables, plumb "struct kvm_mmu" into all of the helper functions instead of operating on vm->mmu directly. Opportunistically swap the order of the check in virt_get_pte() to first assert that the parent is the PGD, and then check that the PTE is present, as it makes more sense to check if the parent PTE is the PGD/root (i.e. not a PTE) before checking that the PTE is PRESENT. No functional change intended. Suggested-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Yosry Ahmed <yosry.ahmed@linux.dev> [sean: rebase on common kvm_mmu structure, rewrite changelog] Link: https://patch.msgid.link/20251230230150.4150236-8-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2026-01-08	KVM: selftests: Add "struct kvm_mmu" to track a given MMU instance	Sean Christopherson
	Add a "struct kvm_mmu" to track a given MMU instance, e.g. a VM's stage-1 MMU versus a VM's stage-2 MMU, so that x86 can share MMU functionality for both stage-1 and stage-2 MMUs, without creating the potential for subtle bugs, e.g. due to consuming on vm->pgtable_levels when operating a stage-2 MMU. Encapsulate the existing de facto MMU in "struct kvm_vm", e.g instead of burying the MMU details in "struct kvm_vm_arch", to avoid more #ifdefs in ____vm_create(), and in the hopes that other architectures can utilize the formalized MMU structure if/when they too support stage-2 page tables. No functional change intended. Reviewed-by: Yosry Ahmed <yosry.ahmed@linux.dev> Link: https://patch.msgid.link/20251230230150.4150236-7-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>