summaryrefslogtreecommitdiff
path: root/include/linux
AgeCommit message (Collapse)Author
2026-01-13hrtimer: Remove unused resolution constantsThomas Weißschuh
These constants are never used, remove them. Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@kernel.org> Link: https://patch.msgid.link/20260107-hrtimer-header-cleanup-v1-1-1a698ef0ddae@linutronix.de
2026-01-13fs: report filesystem and file I/O errors to fsnotifyDarrick J. Wong
Create some wrapper code around struct super_block so that filesystems have a standard way to queue filesystem metadata and file I/O error reports to have them sent to fsnotify. If a filesystem wants to provide an error number, it must supply only negative error numbers. These are stored internally as negative numbers, but they are converted to positive error numbers before being passed to fanotify, per the fanotify(7) manpage. Implementations of super_operations::report_error are passed the raw internal event data. Note that we have to play some shenanigans with mempools and queue_work so that the error handling doesn't happen outside of process context, and the event handler functions (both ->report_error and fsnotify) can handle file I/O error messages without having to worry about whatever locks might be held. This asynchronicity requires that unmount wait for pending events to clear. Add a new callback to the superblock operations structure so that filesystem drivers can themselves respond to file I/O errors if they so desire. This will be used for an upcoming self-healing patchset for XFS. Suggested-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Darrick J. Wong <djwong@kernel.org> Link: https://patch.msgid.link/176826402610.3490369.4378391061533403171.stgit@frogsfrogsfrogs Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Christian Brauner <brauner@kernel.org>
2026-01-13uapi: promote EFSCORRUPTED and EUCLEAN to errno.hDarrick J. Wong
Stop definining these privately and instead move them to the uapi errno.h so that they become canonical instead of copy pasta. Cc: linux-api@vger.kernel.org Signed-off-by: Darrick J. Wong <djwong@kernel.org> Link: https://patch.msgid.link/176826402587.3490369.17659117524205214600.stgit@frogsfrogsfrogs Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Christian Brauner <brauner@kernel.org>
2026-01-13net/mlx5: Add IFC bits for extended ETS rate limit bandwidth valueAlexei Lazar
Add hardware interface definitions to support extended bandwidth rate limiting in the QoS Enhanced Transmission Selection (ETS) configuration. The new fields include: - max_bw_value: extended from 8-bit to 16-bit in ets_tcn_config_reg, simplifying the implementation by using a single field instead of separate MSB/LSB fields. - qetcr_qshr_max_bw_val_msb: capability bit in qcam_qos_feature_cap_mask indicating device support for the extended 16-bit max_bw_value field. These interface additions are prerequisites for increasing the per-TC rate limit beyond 255 Gbps to support higher-bandwidth NICs. Signed-off-by: Alexei Lazar <alazar@nvidia.com> Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com> Reviewed-by: Gal Pressman <gal@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/1768200608-1543180-1-git-send-email-tariqt@nvidia.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2026-01-12net: airoha: Fix typo in airoha_ppe_setup_tc_block_cb definitionLorenzo Bianconi
Fix Typo in airoha_ppe_dev_setup_tc_block_cb routine definition when CONFIG_NET_AIROHA is not enabled. Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202601090517.Fj6v501r-lkp@intel.com/ Fixes: f45fc18b6de04 ("net: airoha: Add airoha_ppe_dev struct definition") Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Link: https://patch.msgid.link/20260109-airoha_ppe_dev_setup_tc_block_cb-typo-v1-1-282e8834a9f9@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-01-12Merge tag 'cgroup-for-6.19-rc5-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup Pull cgroup fix from Tejun Heo: - Fix -Wflex-array-member-not-at-end warnings in cgroup_root * tag 'cgroup-for-6.19-rc5-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: cgroup: Eliminate cgrp_ancestor_storage in cgroup_root
2026-01-12PCI: Provide pci_free_irq_vectors() stubBoqun Feng
473b9f331718 ("rust: pci: fix build failure when CONFIG_PCI_MSI is disabled") fixed a build error by providing Rust helpers when CONFIG_PCI_MSI is not set. However the Rust helpers rely on pci_free_irq_vectors(), which is only available when CONFIG_PCI=y. When CONFIG_PCI is not set, there is already a stub for pci_alloc_irq_vectors(). Add a similar stub for pci_free_irq_vectors(). Fixes: 473b9f331718 ("rust: pci: fix build failure when CONFIG_PCI_MSI is disabled") Reported-by: FUJITA Tomonori <fujita.tomonori@gmail.com> Closes: https://lore.kernel.org/rust-for-linux/20251209014312.575940-1-fujita.tomonori@gmail.com/ Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202512220740.4Kexm4dW-lkp@intel.com/ Reported-by: Liang Jie <liangjie@lixiang.com> Closes: https://lore.kernel.org/rust-for-linux/20251222034415.1384223-1-buaajxlj@163.com/ Signed-off-by: Boqun Feng <boqun.feng@gmail.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Drew Fustini <fustini@kernel.org> Reviewed-by: David Gow <davidgow@google.com> Reviewed-by: Joel Fernandes <joelagnelf@nvidia.com> Reviewed-by: Danilo Krummrich <dakr@kernel.org> Link: https://patch.msgid.link/20251226113938.52145-1-boqun.feng@gmail.com
2026-01-12initrd: remove deprecated code path (linuxrc)Askar Safin
Remove linuxrc initrd code path, which was deprecated in 2020. Initramfs and (non-initial) RAM disks (i. e. brd) still work. Both built-in and bootloader-supplied initramfs still work. Non-linuxrc initrd code path (i. e. using /dev/ram as final root filesystem) still works, but I put deprecation message into it. Also I deprecate command line parameters "noinitrd" and "ramdisk_start=". Signed-off-by: Askar Safin <safinaskar@gmail.com> Link: https://patch.msgid.link/20251119222407.3333257-3-safinaskar@gmail.com Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Christian Brauner <brauner@kernel.org>
2026-01-12blk-integrity: take const pointer in blk_integrity_rq()Caleb Sander Mateos
blk_integrity_rq() doesn't modify the struct request passed in, so allow a const pointer to be passed. Use a matching signature for the !CONFIG_BLK_DEV_INTEGRITY version. Signed-off-by: Caleb Sander Mateos <csander@purestorage.com> Reviewed-by: Ming Lei <ming.lei@redhat.com> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2026-01-12fs: add init_pivot_root()Christian Brauner
We will soon be able to pivot_root() with the introduction of the immutable rootfs. Add a wrapper for kernel internal usage. Link: https://patch.msgid.link/20260112-work-immutable-rootfs-v2-2-88dd1c34a204@kernel.org Signed-off-by: Christian Brauner <brauner@kernel.org>
2026-01-12sched: Move clock related paravirt code to kernel/schedJuergen Gross
Paravirt clock related functions are available in multiple archs. In order to share the common parts, move the common static keys to kernel/sched/ and remove them from the arch specific files. Make a common paravirt_steal_clock() implementation available in kernel/sched/cputime.c, guarding it with a new config option CONFIG_HAVE_PV_STEAL_CLOCK_GEN, which can be selected by an arch in case it wants to use that common variant. Signed-off-by: Juergen Gross <jgross@suse.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://patch.msgid.link/20260105110520.21356-7-jgross@suse.com
2026-01-12fs: add a ->sync_lazytime methodChristoph Hellwig
Allow the file system to explicitly implement lazytime syncing instead of pigging back on generic inode dirtying. This allows to simplify the XFS implementation and prepares for non-blocking lazytime timestamp updates. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://patch.msgid.link/20260108141934.2052404-8-hch@lst.de Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Christian Brauner <brauner@kernel.org>
2026-01-12fs: refactor ->update_time handlingChristoph Hellwig
Pass the type of update (atime vs c/mtime plus version) as an enum instead of a set of flags that caused all kinds of confusion. Because inode_update_timestamps now can't return a modified version of those flags, return the I_DIRTY_* flags needed to persist the update, which is what the main caller in generic_update_time wants anyway, and which is suitable for the other callers that only want to know if an update happened. The whole update_time path keeps the flags argument, which will be used to support non-blocking updates soon even if it is unused, and (the slightly renamed) inode_update_time also gains the possibility to return a negative errno to support this. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://patch.msgid.link/20260108141934.2052404-6-hch@lst.de Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Christian Brauner <brauner@kernel.org>
2026-01-12fs: allow error returns from generic_update_timeChristoph Hellwig
Now that no caller looks at the updated flags, switch generic_update_time to the same calling convention as the ->update_time method and return 0 or a negative errno. This prepares for adding non-blocking timestamp updates that could return -EAGAIN. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://patch.msgid.link/20260108141934.2052404-3-hch@lst.de Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org>
2026-01-12fs: remove inode_update_timeChristoph Hellwig
The only external user is gone now, open code it in the two VFS callers. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://patch.msgid.link/20260108141934.2052404-2-hch@lst.de Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org>
2026-01-12spi: spi-mem: Create a repeated address operationMiquel Raynal
In octal DTR mode addresses may either be long enough to cover at least two bytes (in which case the existing macro works), or otherwise for single byte addresses, the byte must also be duplicated and sent twice: on each front of the clock. Create a macro for this common case. Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com> Link: https://patch.msgid.link/20260109-winbond-v6-17-rc1-oddr-v2-2-1fff6a2ddb80@bootlin.com Signed-off-by: Mark Brown <broonie@kernel.org>
2026-01-12spi: spi-mem: Make the DTR command operation macro more suitableMiquel Raynal
In order to introduce DTR support in SPI NAND, a number of macros had to be created in the spi-mem layer. One of them remained unused at this point, SPI_MEM_DTR_OP_CMD. Being in the process of introducing octal DTR support now, experience shows that as-is the macro is not useful. In order to be really useful in octal DTR mode, the command opcode (one byte) must always be transmitted on the 8 data lines on both the rising and falling edge of the clock. Align the macro with the real needs by duplicating the opcode in the buffer and doubling its size. Reviewed-by: Tudor Ambarus <tudor.ambarus@linaro.org> Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com> Link: https://patch.msgid.link/20260109-winbond-v6-17-rc1-oddr-v2-1-1fff6a2ddb80@bootlin.com Signed-off-by: Mark Brown <broonie@kernel.org>
2026-01-12Merge tag 'v6.19-rc5' into driver-core-nextDanilo Krummrich
We need the driver-core fixes in here as well to build on top of. Signed-off-by: Danilo Krummrich <dakr@kernel.org>
2026-01-12readdir: require opt-in for d_type flagsAmir Goldstein
Commit c31f91c6af96 ("fuse: don't allow signals to interrupt getdents copying") introduced the use of high bits in d_type as flags. However, overlayfs was not adapted to handle this change. In ovl_cache_entry_new(), the code checks if d_type == DT_CHR to determine if an entry might be a whiteout. When fuse is used as the lower layer and sets high bits in d_type, this comparison fails, causing whiteout files to not be recognized properly and resulting in incorrect overlayfs behavior. Fix this by requiring callers of iterate_dir() to opt-in for getting flag bits in d_type outside of S_DT_MASK. Fixes: c31f91c6af96 ("fuse: don't allow signals to interrupt getdents copying") Link: https://lore.kernel.org/all/20260107034551.439-1-luochunsheng@ustc.edu/ Link: https://github.com/containerd/stargz-snapshotter/issues/2214 Reported-by: Chunsheng Luo <luochunsheng@ustc.edu> Reviewed-by: Chunsheng Luo <luochunsheng@ustc.edu> Tested-by: Chunsheng Luo <luochunsheng@ustc.edu> Signed-off-by: Amir Goldstein <amir73il@gmail.com> Link: https://patch.msgid.link/20260108074522.3400998-1-amir73il@gmail.com Signed-off-by: Christian Brauner <brauner@kernel.org>
2026-01-12fs: remove simple_nosetlease()Jeff Layton
Setting ->setlease() to a NULL pointer now has the same effect as setting it to simple_nosetlease(). Remove all of the setlease file_operations that are set to simple_nosetlease, and the function itself. Signed-off-by: Jeff Layton <jlayton@kernel.org> Link: https://patch.msgid.link/20260108-setlease-6-20-v1-24-ea4dec9b67fa@kernel.org Acked-by: Al Viro <viro@zeniv.linux.org.uk> Acked-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Christian Brauner <brauner@kernel.org>
2026-01-12Merge 6.19-rc5 into char-misc-nextGreg Kroah-Hartman
We need the char/misc fixes in here as well. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2026-01-12rv: Refactor da_monitor to minimise macrosGabriele Monaco
The da_monitor helper functions are generated from macros of the type: DECLARE_DA_FUNCTION(name, type) \ static void da_func_x_##name(type arg) {} \ static void da_func_y_##name(type arg) {} \ This is good to minimise code duplication but the long macros made of skipped end of lines is rather hard to parse. Since functions are static, the advantage of naming them differently for each monitor is minimal. Refactor the da_monitor.h file to minimise macros, instead of declaring functions from macros, we simply declare them with the same name for all monitors (e.g. da_func_x) and for any remaining reference to the monitor name (e.g. tracepoints, enums, global variables) we use the CONCATENATE macro. In this way the file is much easier to maintain while keeping the same generality. Functions depending on the monitor types are now conditionally compiled according to the value of RV_MON_TYPE, which must be defined in the monitor source. The monitor type can be specified as in the original implementation, although it's best to keep the default implementation (unsigned char) as not all parts of code support larger data types, and likely there's no need. We keep the empty macro definitions to ease review of this change with diff tools, but cleanup is required. Also adapt existing monitors to keep the build working. Reviewed-by: Nam Cao <namcao@linutronix.de> Link: https://lore.kernel.org/r/20251126104241.291258-2-gmonaco@redhat.com Signed-off-by: Gabriele Monaco <gmonaco@redhat.com>
2026-01-12firewire: core: stop using page private to store DMA mapping addressTakashi Sakamoto
There is a long discussion about the use of private field in page structure between Linux kernel developers. This commit stop using page private to store DMA mapping address for isochronous context, to prepare for mm future change. Link: https://lore.kernel.org/r/20260110013911.19160-6-o-takashi@sakamocchi.jp Signed-off-by: Takashi Sakamoto <o-takashi@sakamocchi.jp>
2026-01-12firewire: core: move private function declaration from public header to ↵Takashi Sakamoto
internal header The fw_iso_buffer_lookup function is used by core module only, thus no need to describe its prototype in kernel internal header. Link: https://lore.kernel.org/r/20260110013911.19160-2-o-takashi@sakamocchi.jp Signed-off-by: Takashi Sakamoto <o-takashi@sakamocchi.jp>
2026-01-11blk-crypto: handle the fallback above the block layerChristoph Hellwig
Add a blk_crypto_submit_bio helper that either submits the bio when it is not encrypted or inline encryption is provided, but otherwise handles the encryption before going down into the low-level driver. This reduces the risk from bio reordering and keeps memory allocation as high up in the stack as possible. Note that if the submitter knows that inline enctryption is known to be supported by the underyling driver, it can still use plain submit_bio. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Eric Biggers <ebiggers@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2026-01-11blk-crypto: add a bio_crypt_ctx() helperChristoph Hellwig
This returns the bio_crypt_ctx if CONFIG_BLK_INLINE_ENCRYPTION is enabled and a crypto context is attached to the bio, else NULL. The use case is to allow safely dereferencing the context in common code without needed #ifdef CONFIG_BLK_INLINE_ENCRYPTION. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Eric Biggers <ebiggers@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2026-01-11treewide: Update email addressThomas Gleixner
In a vain attempt to consolidate the email zoo switch everything to the kernel.org account. Signed-off-by: Thomas Gleixner <tglx@kernel.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2026-01-10bnxt_en: Update FW interface to 1.10.3.151Michael Chan
The main changes are the new HWRM_PORT_PHY_FDRSTAT command to collect FEC histogram bins and the new HWRM_NVM_DEFRAG command to defragment the NVRAM. There is also a minor name change in struct hwrm_vnic_cfg_input that requires updating the bnxt_re driver's main.c. Signed-off-by: Michael Chan <michael.chan@broadcom.com> Link: https://patch.msgid.link/20260108183521.215610-2-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-01-10x86/resctrl: Handle number of RMIDs supported by RDT_RESOURCE_PERF_PKGTony Luck
There are now three meanings for "number of RMIDs": 1) The number for legacy features enumerated by CPUID leaf 0xF. This is the maximum number of distinct values that can be loaded into MSR_IA32_PQR_ASSOC. Note that systems with Sub-NUMA Cluster mode enabled will force scaling down the CPUID enumerated value by the number of SNC nodes per L3-cache. 2) The number of registers in MMIO space for each event. This is enumerated in the XML files and is the value initialized into event_group::num_rmid. 3) The number of "hardware counters" (this isn't a strictly accurate description of how things work, but serves as a useful analogy that does describe the limitations) feeding to those MMIO registers. This is enumerated in telemetry_region::num_rmids returned by intel_pmt_get_regions_by_feature(). Event groups with insufficient "hardware counters" to track all RMIDs are difficult for users to use, since the system may reassign "hardware counters" at any time. This means that users cannot reliably collect two consecutive event counts to compute the rate at which events are occurring. Disable such event groups by default. The user may override this with a command line "rdt=" option. In this case limit an under-resourced event group's number of possible monitor resource groups to the lowest number of "hardware counters". Scan all enabled event groups and assign the RDT_RESOURCE_PERF_PKG resource "num_rmid" value to the smallest of these values as this value will be used later to compare against the number of RMIDs supported by other resources to determine how many monitoring resource groups are supported. N.B. Change type of resctrl_mon::num_rmid to u32 to match its usage and the type of event_group::num_rmid so that min(r->num_rmid, e->num_rmid) won't complain about mixing signed and unsigned types. Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Link: https://lore.kernel.org/20251217172121.12030-1-tony.luck@intel.com
2026-01-10iommu: debug-pagealloc: Check mapped/unmapped kernel memoryMostafa Saleh
Now, as the page_ext holds count of IOMMU mappings, we can use it to assert that any page allocated/freed is indeed not in the IOMMU. The sanitizer doesn’t protect against mapping/unmapping during this period. However, that’s less harmful as the page is not used by the kernel. Reviewed-by: Samiullah Khawaja <skhawaja@google.com> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Mostafa Saleh <smostafa@google.com> Reviewed-by: Pranjal Shrivastava <praan@google.com> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2026-01-10iommu: Add calls for IOMMU_DEBUG_PAGEALLOCMostafa Saleh
Add calls for the new iommu debug config IOMMU_DEBUG_PAGEALLOC: - iommu_debug_init: Enable the debug mode if configured by the user. - iommu_debug_map: Track iommu pages mapped, using physical address. - iommu_debug_unmap_begin: Track start of iommu unmap operation, with IOVA and size. - iommu_debug_unmap_end: Track the end of unmap operation, passing the actual unmapped size versus the tracked one at unmap_begin. We have to do the unmap_begin/end as once pages are unmapped we lose the information of the physical address. This is racy, but the API is racy by construction as it uses refcounts and doesn't attempt to lock/synchronize with the IOMMU API as that will be costly, meaning that possibility of false negative exists. Reviewed-by: Samiullah Khawaja <skhawaja@google.com> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Pranjal Shrivastava <praan@google.com> Signed-off-by: Mostafa Saleh <smostafa@google.com> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2026-01-10iommu: Add page_ext for IOMMU_DEBUG_PAGEALLOCMostafa Saleh
Add a new config IOMMU_DEBUG_PAGEALLOC, which registers new data to page_ext. This config will be used by the IOMMU API to track pages mapped in the IOMMU to catch drivers trying to free kernel memory that they still map in their domains, causing all types of memory corruption. This behaviour is disabled by default and can be enabled using kernel cmdline iommu.debug_pagealloc. Acked-by: David Hildenbrand (Red Hat) <david@kernel.org> Reviewed-by: Pranjal Shrivastava <praan@google.com> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Mostafa Saleh <smostafa@google.com> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2026-01-10iommu: Introduce pci_dev_reset_iommu_prepare/done()Nicolin Chen
PCIe permits a device to ignore ATS invalidation TLPs while processing a reset. This creates a problem visible to the OS where an ATS invalidation command will time out. E.g. an SVA domain will have no coordination with a reset event and can racily issue ATS invalidations to a resetting device. The OS should do something to mitigate this as we do not want production systems to be reporting critical ATS failures, especially in a hypervisor environment. Broadly, OS could arrange to ignore the timeouts, block page table mutations to prevent invalidations, or disable and block ATS. The PCIe r6.0, sec 10.3.1 IMPLEMENTATION NOTE recommends SW to disable and block ATS before initiating a Function Level Reset. It also mentions that other reset methods could have the same vulnerability as well. Provide a callback from the PCI subsystem that will enclose the reset and have the iommu core temporarily change all the attached RID/PASID domains group->blocking_domain so that the IOMMU hardware would fence any incoming ATS queries. And IOMMU drivers should also synchronously stop issuing new ATS invalidations and wait for all ATS invalidations to complete. This can avoid any ATS invaliation timeouts. However, if there is a domain attachment/replacement happening during an ongoing reset, ATS routines may be re-activated between the two function calls. So, introduce a new resetting_domain in the iommu_group structure to reject any concurrent attach_dev/set_dev_pasid call during a reset for a concern of compatibility failure. Since this changes the behavior of an attach operation, update the uAPI accordingly. Note that there are two corner cases: 1. Devices in the same iommu_group Since an attachment is always per iommu_group, this means that any sibling devices in the iommu_group cannot change domain, to prevent race conditions. 2. An SR-IOV PF that is being reset while its VF is not In such case, the VF itself is already broken. So, there is no point in preventing PF from going through the iommu reset. Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Tested-by: Dheeraj Kumar Srivastava <dheerajkumar.srivastava@amd.com> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2026-01-10iommu: Add iommu_driver_get_domain_for_dev() helperNicolin Chen
There is a need to stage a resetting PCI device to temporarily the blocked domain and then attach back to its previously attached domain after reset. This can be simply done by keeping the "previously attached domain" in the iommu_group->domain pointer while adding an iommu_group->resetting_domain, which gives troubles to IOMMU drivers using the iommu_get_domain_for_dev() for a device's physical domain in order to program IOMMU hardware. And in such for-driver use cases, the iommu_group->mutex must be held, so it doesn't fit in external callers that don't hold the iommu_group->mutex. Introduce a new iommu_driver_get_domain_for_dev() helper, exclusively for driver use cases that hold the iommu_group->mutex, to separate from those external use cases. Add a lockdep_assert_not_held to the existing iommu_get_domain_for_dev() and highlight that in a kdoc. Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Tested-by: Dheeraj Kumar Srivastava <dheerajkumar.srivastava@amd.com> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2026-01-10driver core: make bus_find_device_by_acpi_dev() stub prototype alignedAndy Shevchenko
Currently the bus_find_device_by_acpi_dev() stub for !CONFIG_ACPI case takes a const void * parameter instead of const struct acpi_device *. As long as it's a pointer, we may named it as we want to with the help of a forward declaration. Hence move the declaration out of the ifdeffery and use the same prototype in both cases. This adds a bit of an additional type checking at a compilation time. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Link: https://patch.msgid.link/20251229144325.1252197-1-andriy.shevchenko@linux.intel.com [ Fix minor typo in the commit message. - Danilo ] Signed-off-by: Danilo Krummrich <dakr@kernel.org>
2026-01-09x86/resctrl: Find and enable usable telemetry eventsTony Luck
Every event group has a private copy of the data of all telemetry event aggregators (aka "telemetry regions") tracking its feature type. Included may be regions that have the same feature type but tracking different GUID from the event group's. Traverse the event group's telemetry region data and mark all regions that are not usable by the event group as unusable by clearing those regions' MMIO addresses. A region is considered unusable if: 1) GUID does not match the GUID of the event group. 2) Package ID is invalid. 3) The enumerated size of the MMIO region does not match the expected value from the XML description file. Hereafter any telemetry region with an MMIO address is considered valid for the event group it is associated with. Enable all the event group's events as long as there is at least one usable region from where data for its events can be read. Enabling of an event can fail if the same event has already been enabled as part of another event group. It should never happen that the same event is described by different GUID supported by the same system so just WARN (via resctrl_enable_mon_event()) and skip the event. Note that it is architecturally possible that some telemetry events are only supported by a subset of the packages in the system. It is not expected that systems will ever do this. If they do the user will see event files in resctrl that always return "Unavailable". Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Link: https://lore.kernel.org/20251217172121.12030-1-tony.luck@intel.com
2026-01-09audit: move the compat_xxx_class[] extern declarations to audit_arch.hBen Dooks
The comapt_xxx_class symbols aren't declared in anything that lib/comapt_audit.c is including (arm64 build) which is causing the following sparse warnings: lib/compat_audit.c:7:10: warning: symbol 'compat_dir_class' was not declared. Should it be static? lib/compat_audit.c:12:10: warning: symbol 'compat_read_class' was not declared. Should it be static? lib/compat_audit.c:17:10: warning: symbol 'compat_write_class' was not declared. Should it be static? lib/compat_audit.c:22:10: warning: symbol 'compat_chattr_class' was not declared. Should it be static? lib/compat_audit.c:27:10: warning: symbol 'compat_signal_class' was not declared. Should it be static? Trying to fix this by chaning compat_audit.c to inclde <linux/audit.h> does not work on arm64 due to compile errors with the extra includes that changing this header makes. The simpler thing would be just to move the definitons of these symbols out of <linux/audit.h> into <linux/audit_arch.h> which is included. Fixes: 4b58841149dca ("audit: Add generic compat syscall support") Signed-off-by: Ben Dooks <ben.dooks@codethink.co.uk> [PM: rewrite subject line, fixed line length in description] Signed-off-by: Paul Moore <paul@paul-moore.com>
2026-01-09PCI: Add ASPEED vendor ID to pci_ids.hNirmoy Das
Add PCI_VENDOR_ID_ASPEED to the shared pci_ids.h header and remove the duplicate local definition from ehci-pci.c. This prepares for adding a PCI quirk for ASPEED devices. Signed-off-by: Nirmoy Das <nirmoyd@nvidia.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://patch.msgid.link/20251217154529.377586-1-nirmoyd@nvidia.com
2026-01-09Merge tag 'vfs-6.19-rc5.fixes' of ↵Linus Torvalds
gitolite.kernel.org:pub/scm/linux/kernel/git/vfs/vfs Pull vfs fixes from Christian Brauner: - Remove incorrect __user annotation from struct xattr_args::value - Documentation fix: Add missing kernel-doc description for the @isnew parameter in ilookup5_nowait() to silence Sphinx warnings - Documentation fix: Fix kernel-doc comment for __start_dirop() - the function name in the comment was wrong and the @state parameter was undocumented - Replace dynamic folio_batch allocation with stack allocation in iomap_zero_range(). The dynamic allocation was problematic for ext4-on-iomap work (didn't handle allocation failure properly) and triggered lockdep complaints. Uses a flag instead to control batch usage - Re-add #ifdef guards around PIDFD_GET_<ns-type>_NAMESPACE ioctls. When a namespace type is disabled, ns->ops is NULL, causes crashes during inode eviction when closing the fd. The ifdefs were removed in a recent simplification but are still needed - Fixe a race where a folio could be unlocked before the trailing zeros (for EOF within the page) were written - Split out a dedicated lease_dispose_list() helper since lease code paths always know they're disposing of leases. Removes unnecessary runtime flag checks and prepares for upcoming lease_manager enhancements - Fix userland delegation requests succeeding despite conflicting opens. Previously, FL_LAYOUT and FL_DELEG leases bypassed conflict checks (a hack for nfsd). Adds new ->lm_open_conflict() lease_manager operation so userland delegations get proper conflict checking while nfsd can continue its own conflict handling - Fix LOOKUP_CACHED path lookups incorrectly falling through to the slow path. After legitimize_links() calls were conditionally elided, the routine would always fail with LOOKUP_CACHED regardless of whether there were any links. Now the flag is checked at the two callsites before calling legitimize_links() - Fix bug in media fd allocation in media_request_alloc() - Fix mismatched API calls in ecryptfs_mknod(): was calling end_removing() instead of end_creating() after ecryptfs_start_creating_dentry() - Fix dentry reference count leak in ecryptfs_mkdir(): a dget() of the lower parent dir was added but never dput()'d, causing BUG during lower filesystem unmount due to the still-in-use dentry * tag 'vfs-6.19-rc5.fixes' of gitolite.kernel.org:pub/scm/linux/kernel/git/vfs/vfs: pidfs: protect PIDFD_GET_* ioctls() via ifdef ecryptfs: Release lower parent dentry after creating dir ecryptfs: Fix improper mknod pairing of start_creating()/end_removing() get rid of bogus __user in struct xattr_args::value VFS: fix __start_dirop() kernel-doc warnings fs: Describe @isnew parameter in ilookup5_nowait() fs: make sure to fail try_to_unlazy() and try_to_unlazy() for LOOKUP_CACHED netfs: Fix early read unlock of page with EOF in middle filelock: allow lease_managers to dictate what qualifies as a conflict filelock: add lease_dispose_list() helper iomap: replace folio_batch allocation with stack allocation media: mc: fix potential use-after-free in media_request_alloc()
2026-01-09x86,fs/resctrl: Add architectural event pointerTony Luck
The resctrl file system layer passes the domain, RMID, and event id to the architecture to fetch an event counter. Fetching a telemetry event counter requires additional information that is private to the architecture, for example, the offset into MMIO space from where the counter should be read. Add mon_evt::arch_priv that architecture can use for any private data related to the event. The resctrl filesystem initializes mon_evt::arch_priv when the architecture enables the event and passes it back to architecture when needing to fetch an event counter. Suggested-by: Reinette Chatre <reinette.chatre@intel.com> Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Link: https://lore.kernel.org/20251217172121.12030-1-tony.luck@intel.com
2026-01-09x86,fs/resctrl: Fill in details of events for performance and energy GUIDsTony Luck
The telemetry event aggregators of the Intel Clearwater Forest CPU support two RMID-based feature types: "energy" with GUID 0x26696143¹, and "perf" with GUID 0x26557651². The event counter offsets in an aggregator's MMIO space are arranged in groups for each RMID. E.g., the "energy" counters for GUID 0x26696143 are arranged like this: MMIO offset:0x0000 Counter for RMID 0 PMT_EVENT_ENERGY MMIO offset:0x0008 Counter for RMID 0 PMT_EVENT_ACTIVITY MMIO offset:0x0010 Counter for RMID 1 PMT_EVENT_ENERGY MMIO offset:0x0018 Counter for RMID 1 PMT_EVENT_ACTIVITY ... MMIO offset:0x23F0 Counter for RMID 575 PMT_EVENT_ENERGY MMIO offset:0x23F8 Counter for RMID 575 PMT_EVENT_ACTIVITY After all counters there are three status registers that provide indications of how many times an aggregator was unable to process event counts, the time stamp for the most recent loss of data, and the time stamp of the most recent successful update. MMIO offset:0x2400 AGG_DATA_LOSS_COUNT MMIO offset:0x2408 AGG_DATA_LOSS_TIMESTAMP MMIO offset:0x2410 LAST_UPDATE_TIMESTAMP Define event_group structures for both of these aggregator types and define the events tracked by the aggregators in the file system code. PMT_EVENT_ENERGY and PMT_EVENT_ACTIVITY are produced in fixed point format. File system code must output as floating point values. ¹https://github.com/intel/Intel-PMT/blob/main/xml/CWF/OOBMSM/RMID-ENERGY/cwf_aggregator.xml ²https://github.com/intel/Intel-PMT/blob/main/xml/CWF/OOBMSM/RMID-PERF/cwf_aggregator.xml [ bp: Massage commit message. ] Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Link: https://lore.kernel.org/20251217172121.12030-1-tony.luck@intel.com
2026-01-09x86,fs/resctrl: Add and initialize a resource for package scope monitoringTony Luck
Add a new PERF_PKG resource and introduce package level scope for monitoring telemetry events so that CPU hotplug notifiers can build domains at the package granularity. Use the physical package ID available via topology_physical_package_id() to identify the monitoring domains with package level scope. This enables user space to use: /sys/devices/system/cpu/cpuX/topology/physical_package_id to identify the monitoring domain a CPU is associated with. Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Link: https://lore.kernel.org/20251217172121.12030-1-tony.luck@intel.com
2026-01-09x86,fs/resctrl: Add an architectural hook called for first mountTony Luck
Enumeration of Intel telemetry events is an asynchronous process involving several mutually dependent drivers added as auxiliary devices during the device_initcall() phase of Linux boot. The process finishes after the probe functions of these drivers completes. But this happens after resctrl_arch_late_init() is executed. Tracing the enumeration process shows that it does complete a full seven seconds before the earliest possible mount of the resctrl file system (when included in /etc/fstab for automatic mount by systemd). Add a hook for use by telemetry event enumeration and initialization and run it once at the beginning of resctrl mount without any locks held. The architecture is responsible for any required locking. Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20260105191711.GBaVwON5nZn-uO6Sqg@fat_crate.local
2026-01-09regulator: core: don't fail regulator_register() with missing required supplyAndré Draszik
Since commit 98e48cd9283d ("regulator: core: resolve supply for boot-on/always-on regulators"), the regulator core returns -EPROBE_DEFER if a supply can not be resolved at regulator_register() time due to set_machine_constraints() requiring that supply (e.g. because of always-on or boot-on). In some hardware designs, multiple PMICs are used where individual rails of each act as supplies for rails of the other, and vice-versa. In such a design no PMIC driver can probe when registering one top- level regulator device (as is common practice for almost all regulator drivers in Linux) since that commit. Supplies are only considered when their driver has fully bound, but because in a design like the above two drivers / devices depend on each other, neither will have fully bound while the other probes. The Google Pixel 6 and 6 Pro (oriole and raven) are examples of such a design. One way to make this work would be to register each rail as an individual device, rather than just one top-level regulator device. Then, fw-devlink and Linux' driver core could do their usual handling of deferred device probe as each rail would be probed individually. This approach was dismissed in [1] as each regulator driver would have to take care of this itself. Alternatively, we can change the regulator core to not fail regulator_register() if a rail's required supply can not be resolved while keeping the intended change from above mentioned commit, and instead retry whenever a new rail is registered. This commit implements such an approach: If set_machine_constraints() requests probe deferral, regulator_register() still succeeds and we retry setting constraints as part of regulator_resolve_supply(). We still do not enable the regulator or allow consumers to use it until constraints have been set (including resolution of the supply) to prevent enabling of a regulator before its supply. With this change, we keep track of regulators with missing required supplies and can therefore try to resolve them again and try to set the constraints again once more regulators become available. Care has to be taken to not allow consumers to use regulators that haven't had their constraints set yet. regulator_get() ensures that and now returns -EPROBE_DEFER in that case. The implementation is straight-forward, thanks to our newly introduced regulator-bus. Locking in regulator_resolve_supply() has to be done carefully, as a combination of regulator_(un)lock_two() and regulator_(un)lock_dependent() is needed. The reason is that set_machine_constraints() might call regulator_enable() which needs rdev and all its dependents locked, but everything else requires to only have rdev and its supply locked. Link: https://lore.kernel.org/all/aRn_-o-vie_QoDXD@sirena.co.uk/ [1] Signed-off-by: André Draszik <andre.draszik@linaro.org> Link: https://patch.msgid.link/20260109-regulators-defer-v2-8-1a25dc968e60@linaro.org Signed-off-by: Mark Brown <broonie@kernel.org>
2026-01-09regulator: core: reresolve unresolved supplies when availableAndré Draszik
When a regulator A and its supply B are provided by different devices, the driver implementing B might be last to probe (with A still pending resolution of its supply B). While we try to resolve all pending supplies for all regulators (including A) during regulator_register() of B via regulator_register_resolve_supply(), supply resolution will still not work for A as the driver for B hasn't finished binding to the PMIC device corresponding to B at that stage yet. The regulator core explicitly only allows supplies from other devices to be used once the relevant driver has fully bound, mainly to avoid having to deal with cases where B itself might -EPROBE_DEFER. In this case, A's supply will only be resolved as part of the core's regulator_init_complete_work_function(), which currently is scheduled to run after 30s. This was added as a work-around in commit 3827b64dba27 ("regulator: core: Resolve supplies before disabling unused regulators") to cover this situation. There are two problems with that approach: * it potentially runs long after all our consumers have probed * an upcoming change will allow regulator_register() to complete successfully even when required supplies (e.g. due to always-on or boot-on) are missing at register time, deferring full configuration of the regulator (and usability by consumers, i.e. usually consumer probe) until the supply becomes available. Resolving supplies in the late work func can therefore make it impossible for consumers to probe at all, as the driver core will not know to reprobe consumers when supplies have resolved. We could schedule an earlier work to try to resolve supplies sooner, but that'd be racy as consumers of A might try to probe before A's supply gets fully resolved via this extra work. Instead, add a very simple regulator bus and add a dummy device with a corresponding driver to it for each regulator that is missing its supply during regulator_register(). This way, the driver core will call our bus' probe whenever a new (regulator) device was successfully bound, allowing us to retry resolving the supply during (our bus) probe and to bind this dummy device if successful. In turn this means the driver core will see a newly bound device and retry probing of all pending consumers, if any. With that in place, we can avoid walking the full list of all known regulators to try resolve missing supplies during regulator_register(), as the driver core will invoke the bus probe for regulators that are still pending their supplies. We can also drop the code trying to resolve supplies one last time before unused regulators get disabled, as all supplies should have resolved at that point in time, and if they haven't then there's no point in trying again, as the outcome won't change. Note: We can not reuse the existing struct device created for each rail, as a device can not be part of a class and a bus simultaneously. Signed-off-by: André Draszik <andre.draszik@linaro.org> Link: https://patch.msgid.link/20260109-regulators-defer-v2-7-1a25dc968e60@linaro.org Signed-off-by: Mark Brown <broonie@kernel.org>
2026-01-09Merge tag 'drm-misc-next-2026-01-08' of ↵Dave Airlie
https://gitlab.freedesktop.org/drm/misc/kernel into drm-next drm-misc-next for 6.20: UAPI Changes: Cross-subsystem Changes: Core Changes: - draw: Add API to check if a format conversion can be done - panic: Rename draw_panic_static_* to draw_panic_screen_*, Add kunit tests - shmem: Improve tests Driver Changes: - ast: Big endian fixes - etnaviv: Add PPU flop reset support - panfrost: Add GPU_PM_RT support for RZ/G3E SoC - panthor: multiple fixes around VM termination, huge page support - pl111: Fix build regression - v3d: Fix DMA segment size - bridge: - Add connector argument to .hpd_notify - Plenty of patches to convert existing drivers to refcounting - Convert Rockchip's inno hdmi support to a proper bridge - lontium-lt9611uxc: Switch to HDMI audio helpers - panel: - New panel: BOE NV140WUM-T08 Signed-off-by: Dave Airlie <airlied@redhat.com> From: Maxime Ripard <mripard@redhat.com> Link: https://patch.msgid.link/20260108-literate-nyala-of-courtesy-de501a@houat
2026-01-09scatterlist: introduce sg_nents_for_dma() helperAndy Shevchenko
Sometimes the user needs to split each entry on the mapped scatter list due to DMA length constrains. This helper returns a number of entities assuming that each of them is not bigger than supplied maximum length. Reviewed-by: Bjorn Andersson <andersson@kernel.org> Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Link: https://patch.msgid.link/20260108105619.3513561-2-andriy.shevchenko@linux.intel.com Signed-off-by: Vinod Koul <vkoul@kernel.org>
2026-01-08KVM: x86: Mark vmcs12 pages as dirty if and only if they're mappedSean Christopherson
Mark vmcs12 pages as dirty (in KVM's dirty log bitmap) if and only if the page is mapped, i.e. if the page is actually "active" in vmcs02. For some pages, KVM simply disables the associated VMCS control if the vmcs12 page is unreachable, i.e. it's possible for nested VM-Enter to succeed with a "bad" vmcs12 page. Link: https://patch.msgid.link/20251121223444.355422-3-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2026-01-08KVM: Add a simplified wrapper for registering perf callbacksSean Christopherson
Add a parameter-less API for registering perf callbacks in anticipation of introducing another x86-only parameter for handling mediated PMU PMIs. No functional change intended. Acked-by: Anup Patel <anup@brainfault.org> Tested-by: Xudong Hao <xudong.hao@intel.com> Tested-by: Manali Shukla <manali.shukla@amd.com> Link: https://patch.msgid.link/20251206001720.468579-15-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2026-01-08Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski
Cross-merge networking fixes after downstream PR (net-6.19-rc5). No conflicts, or adjacent changes. Signed-off-by: Jakub Kicinski <kuba@kernel.org>