summaryrefslogtreecommitdiff
path: root/include/linux
AgeCommit message (Collapse)Author
2026-02-26netfilter: nft_counter: fix reset of counters on 32bit archsAnders Grahn
[ Upstream commit 1e13f27e0675552161ab1778be9a23a636dde8a7 ] nft_counter_reset() calls u64_stats_add() with a negative value to reset the counter. This will work on 64bit archs, hence the negative value added will wrap as a 64bit value which then can wrap the stat counter as well. On 32bit archs, the added negative value will wrap as a 32bit value and _not_ wrapping the stat counter properly. In most cases, this would just lead to a very large 32bit value being added to the stat counter. Fix by introducing u64_stats_sub(). Fixes: 4a1d3acd6ea8 ("netfilter: nft_counter: Use u64_stats_t for statistic.") Signed-off-by: Anders Grahn <anders.grahn@gmail.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-02-26PCI: endpoint: Add BAR subrange mapping supportKoichiro Den
[ Upstream commit 31fb95400451040050361e22ff480476964280f0 ] Some endpoint platforms have only a small number of usable BARs. At the same time, EPF drivers (e.g. vNTB) may need multiple independent inbound regions (control/scratchpad, one or more memory windows, and optionally MSI or other feature-related regions). Subrange mapping allows these to share a single BAR without consuming additional BARs that may not be available, or forcing a fragile layout by aggressively packing into a single contiguous memory range. Extend the PCI endpoint core to support mapping subranges within a BAR. Add an optional 'submap' field in struct pci_epf_bar so an endpoint function driver can request inbound mappings that fully cover the BAR. Introduce a new EPC feature bit, subrange_mapping, and reject submap requests from pci_epc_set_bar() unless the controller advertises both subrange_mapping and dynamic_inbound_mapping features. The submap array describes the complete BAR layout (no overlaps and no gaps are allowed to avoid exposing untranslated address ranges). This provides the generic infrastructure needed to map multiple logical regions into a single BAR at different offsets, without assuming a controller-specific inbound address translation mechanism. Signed-off-by: Koichiro Den <den@valinux.co.jp> Signed-off-by: Manivannan Sadhasivam <mani@kernel.org> Reviewed-by: Niklas Cassel <cassel@kernel.org> Link: https://patch.msgid.link/20260124145012.2794108-3-den@valinux.co.jp Stable-dep-of: 72cb5ed2a5c6 ("PCI: dwc: ep: Add per-PF BAR and inbound ATU mapping support") Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-02-26PCI: endpoint: Add dynamic_inbound_mapping EPC featureKoichiro Den
[ Upstream commit 06a81c5940e46cc7bddee28f16bdd29a12a76344 ] Introduce a new EPC feature bit (dynamic_inbound_mapping) that indicates whether an Endpoint Controller can update the inbound address translation for a BAR without requiring the EPF driver to clear/reset the BAR first. Endpoint Function drivers (e.g. vNTB) can use this information to decide whether it really is safe to call pci_epc_set_bar() multiple times to update inbound mappings for the BAR. Suggested-by: Niklas Cassel <cassel@kernel.org> Signed-off-by: Koichiro Den <den@valinux.co.jp> Signed-off-by: Manivannan Sadhasivam <mani@kernel.org> Reviewed-by: Niklas Cassel <cassel@kernel.org> Reviewed-by: Frank Li <Frank.Li@nxp.com> Link: https://patch.msgid.link/20260124145012.2794108-2-den@valinux.co.jp Stable-dep-of: 72cb5ed2a5c6 ("PCI: dwc: ep: Add per-PF BAR and inbound ATU mapping support") Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-02-26ipc: don't audit capability check in ipc_permissions()Ondrej Mosnacek
[ Upstream commit 8924336531e21b187d724b5fdf5277269c9ec22c ] The IPC sysctls implement the ctl_table_root::permissions hook and they override the file access mode based on the CAP_CHECKPOINT_RESTORE capability, which is being checked regardless of whether any access is actually denied or not, so if an LSM denies the capability, an audit record may be logged even when access is in fact granted. It wouldn't be viable to restructure the sysctl permission logic to only check the capability when the access would be actually denied if it's not granted. Thus, do the same as in net_ctl_permissions() (net/sysctl_net.c) - switch from ns_capable() to ns_capable_noaudit(), so that the check never emits an audit record. Link: https://lkml.kernel.org/r/20260122141303.241133-1-omosnace@redhat.com Fixes: 0889f44e2810 ("ipc: Check permissions for checkpoint_restart sysctls at open time") Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com> Acked-by: Alexey Gladkov <legion@kernel.org> Acked-by: Serge Hallyn <serge@hallyn.com> Cc: Eric Biederman <ebiederm@xmission.com> Cc: Paul Moore <paul@paul-moore.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-02-26xdrgen: Initialize data pointer for zero-length itemsChuck Lever
[ Upstream commit 27b0fcae8f535fb882b1876227a935dcfdf576aa ] The xdrgen decoders for strings and opaque data had an optimization that skipped calling xdr_inline_decode() when the item length was zero. This left the data pointer uninitialized, which could lead to unpredictable behavior when callers access it. Remove the zero-length check and always call xdr_inline_decode(). When passed a length of zero, xdr_inline_decode() returns the current buffer position, which is valid and matches the behavior of hand-coded XDR decoders throughout the kernel. Fixes: 4b132aacb076 ("tools: Add xdrgen") Reviewed-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: NeilBrown <neil@brown.name> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-02-26kallsyms/ftrace: set module buildid in ftrace_mod_address_lookup()Petr Mladek
[ Upstream commit e8a1e7eaa19d0b757b06a2f913e3eeb4b1c002c6 ] __sprint_symbol() might access an invalid pointer when kallsyms_lookup_buildid() returns a symbol found by ftrace_mod_address_lookup(). The ftrace lookup function must set both @modname and @modbuildid the same way as module_address_lookup(). Link: https://lkml.kernel.org/r/20251128135920.217303-7-pmladek@suse.com Fixes: 9294523e3768 ("module: add printk formats to add module build ID to stacktraces") Signed-off-by: Petr Mladek <pmladek@suse.com> Reviewed-by: Aaron Tomlin <atomlin@atomlin.com> Acked-by: Steven Rostedt (Google) <rostedt@goodmis.org> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Daniel Borkman <daniel@iogearbox.net> Cc: Daniel Gomez <da.gomez@samsung.com> Cc: John Fastabend <john.fastabend@gmail.com> Cc: Kees Cook <kees@kernel.org> Cc: Luis Chamberalin <mcgrof@kernel.org> Cc: Marc Rutland <mark.rutland@arm.com> Cc: "Masami Hiramatsu (Google)" <mhiramat@kernel.org> Cc: Petr Pavlu <petr.pavlu@suse.com> Cc: Sami Tolvanen <samitolvanen@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-02-26module: add helper function for reading module_buildid()Petr Mladek
[ Upstream commit acfdbb4ab2910ff6f03becb569c23ac7b2223913 ] Add a helper function for reading the optional "build_id" member of struct module. It is going to be used also in ftrace_mod_address_lookup(). Use "#ifdef" instead of "#if IS_ENABLED()" to match the declaration of the optional field in struct module. Link: https://lkml.kernel.org/r/20251128135920.217303-4-pmladek@suse.com Signed-off-by: Petr Mladek <pmladek@suse.com> Reviewed-by: Daniel Gomez <da.gomez@samsung.com> Reviewed-by: Petr Pavlu <petr.pavlu@suse.com> Cc: Aaron Tomlin <atomlin@atomlin.com> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Daniel Borkman <daniel@iogearbox.net> Cc: John Fastabend <john.fastabend@gmail.com> Cc: Kees Cook <kees@kernel.org> Cc: Luis Chamberalin <mcgrof@kernel.org> Cc: Marc Rutland <mark.rutland@arm.com> Cc: "Masami Hiramatsu (Google)" <mhiramat@kernel.org> Cc: Sami Tolvanen <samitolvanen@google.com> Cc: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Stable-dep-of: e8a1e7eaa19d ("kallsyms/ftrace: set module buildid in ftrace_mod_address_lookup()") Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-02-26kallsyms/bpf: rename __bpf_address_lookup() to bpf_address_lookup()Petr Mladek
[ Upstream commit cd6735896d0343942cf3dafb48ce32eb79341990 ] bpf_address_lookup() has been used only in kallsyms_lookup_buildid(). It was supposed to set @modname and @modbuildid when the symbol was in a module. But it always just cleared @modname because BPF symbols were never in a module. And it did not clear @modbuildid because the pointer was not passed. The wrapper is no longer needed. Both @modname and @modbuildid are now always initialized to NULL in kallsyms_lookup_buildid(). Remove the wrapper and rename __bpf_address_lookup() to bpf_address_lookup() because this variant is used everywhere. [akpm@linux-foundation.org: fix loongarch] Link: https://lkml.kernel.org/r/20251128135920.217303-6-pmladek@suse.com Fixes: 9294523e3768 ("module: add printk formats to add module build ID to stacktraces") Signed-off-by: Petr Mladek <pmladek@suse.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Cc: Aaron Tomlin <atomlin@atomlin.com> Cc: Daniel Borkman <daniel@iogearbox.net> Cc: Daniel Gomez <da.gomez@samsung.com> Cc: John Fastabend <john.fastabend@gmail.com> Cc: Kees Cook <kees@kernel.org> Cc: Luis Chamberalin <mcgrof@kernel.org> Cc: Marc Rutland <mark.rutland@arm.com> Cc: "Masami Hiramatsu (Google)" <mhiramat@kernel.org> Cc: Petr Pavlu <petr.pavlu@suse.com> Cc: Sami Tolvanen <samitolvanen@google.com> Cc: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-02-26vsnprintf: drop __printf() attributes on binary printing functionsArnd Bergmann
[ Upstream commit b07829d546c83134629591f02c5348d57cea0c1e ] The printf() format attributes are applied inconsistently for the binary printf helpers, which causes warnings for the bpf_trace code using them from functions that pass down format strings: kernel/trace/bpf_trace.c: In function '____bpf_trace_printk': kernel/trace/bpf_trace.c:377:9: error: function '____bpf_trace_printk' might be a candidate for 'gnu_printf' format attribute [-Werror=suggest-attribute=format] 377 | ret = bstr_printf(data.buf, MAX_BPRINTF_BUF, fmt, data.bin_args); | ^~~ This can be addressed either by annotating all five callers in bpf code, or by removing the annotations on the callees that were added by Andy Shevchenko last year. As Alexei Starovoitov points out, there are no callers in C code that would benefit from the __printf attributes, the only users are in BPF code or in the do_trace_printk() helper that already checks the arguments. Drop all three of these annotations, reverting the earlierl commits that added these, in order to get a clean build with -Wsuggest-attribute=format. Fixes: 6b2c1e30ad68 ("seq_file: Mark binary printing functions with __printf() attribute") Fixes: 7bf819aa992f ("vsnprintf: Mark binary printing functions with __printf() attribute") Link: https://lore.kernel.org/all/CAADnVQK3eZp3yp35OUx8j1UBsQFhgsn5-4VReqAJ=68PaaKYmg@mail.gmail.com/ Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202512061640.9hKTnB8p-lkp@intel.com/ Suggested-by: Alexei Starovoitov <alexei.starovoitov@gmail.com> Acked-by: Alexei Starovoitov <alexei.starovoitov@gmail.com> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Petr Mladek <pmladek@suse.com> Acked-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Link: https://patch.msgid.link/20260204132643.1302967-1-arnd@kernel.org Signed-off-by: Petr Mladek <pmladek@suse.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-02-26soc: qcom: ubwc: add missing includeDmitry Baryshkov
[ Upstream commit ccef4b2703ff5b0de0b1bda30a0de3026d52eb19 ] The header has a function which calls pr_err(). Don't require users of the header to include <linux/printk.h> and include it here. Fixes: 87cfc79dcd60 ("drm/msm/a6xx: Resolve the meaning of UBWC_MODE") Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com> Reviewed-by: Bryan O'Donoghue <bryan.odonoghue@linaro.org> Link: https://lore.kernel.org/r/20260110-iris-ubwc-v1-1-dd70494dcd7b@oss.qualcomm.com Signed-off-by: Bjorn Andersson <andersson@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-02-26hwrng: core - use RCU and work_struct to fix race conditionLianjie Wang
[ Upstream commit cc2f39d6ac48e6e3cb2d6240bc0d6df839dd0828 ] Currently, hwrng_fill is not cleared until the hwrng_fillfn() thread exits. Since hwrng_unregister() reads hwrng_fill outside the rng_mutex lock, a concurrent hwrng_unregister() may call kthread_stop() again on the same task. Additionally, if hwrng_unregister() is called immediately after hwrng_register(), the stopped thread may have never been executed. Thus, hwrng_fill remains dirty even after hwrng_unregister() returns. In this case, subsequent calls to hwrng_register() will fail to start new threads, and hwrng_unregister() will call kthread_stop() on the same freed task. In both cases, a use-after-free occurs: refcount_t: addition on 0; use-after-free. WARNING: ... at lib/refcount.c:25 refcount_warn_saturate+0xec/0x1c0 Call Trace: kthread_stop+0x181/0x360 hwrng_unregister+0x288/0x380 virtrng_remove+0xe3/0x200 This patch fixes the race by protecting the global hwrng_fill pointer inside the rng_mutex lock, so that hwrng_fillfn() thread is stopped only once, and calls to kthread_run() and kthread_stop() are serialized with the lock held. To avoid deadlock in hwrng_fillfn() while being stopped with the lock held, we convert current_rng to RCU, so that get_current_rng() can read current_rng without holding the lock. To remove the lock from put_rng(), we also delay the actual cleanup into a work_struct. Since get_current_rng() no longer returns ERR_PTR values, the IS_ERR() checks are removed from its callers. With hwrng_fill protected by the rng_mutex lock, hwrng_fillfn() can no longer clear hwrng_fill itself. Therefore, if hwrng_fillfn() returns directly after current_rng is dropped, kthread_stop() would be called on a freed task_struct later. To fix this, hwrng_fillfn() calls schedule() now to keep the task alive until being stopped. The kthread_stop() call is also moved from hwrng_unregister() to drop_current_rng(), ensuring kthread_stop() is called on all possible paths where current_rng becomes NULL, so that the thread would not wait forever. Fixes: be4000bc4644 ("hwrng: create filler thread") Suggested-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Lianjie Wang <karin0.zst@gmail.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-02-26mfd: wm8350-core: Use IRQF_ONESHOTSebastian Andrzej Siewior
[ Upstream commit 553b4999cbe231b5011cb8db05a3092dec168aca ] Using a threaded interrupt without a dedicated primary handler mandates the IRQF_ONESHOT flag to mask the interrupt source while the threaded handler is active. Otherwise the interrupt can fire again before the threaded handler had a chance to run. Mark explained that this should not happen with this hardware since it is a slow irqchip which is behind an I2C/ SPI bus but the IRQ-core will refuse to accept such a handler. Set IRQF_ONESHOT so the interrupt source is masked until the secondary handler is done. Fixes: 1c6c69525b40e ("genirq: Reject bogus threaded irq requests") Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@kernel.org> Reviewed-by: Charles Keepax <ckeepax@opensource.cirrus.com> Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Link: https://patch.msgid.link/20260128095540.863589-16-bigeasy@linutronix.de Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-02-26genirq: Set IRQF_COND_ONESHOT in devm_request_irq().Sebastian Andrzej Siewior
[ Upstream commit 943b052ded21feb84f293d40b06af3181cd0d0d7 ] The flag IRQF_COND_ONESHOT was already force-added to request_irq() because the ACPI SCI interrupt handler is using the IRQF_ONESHOT flag which breaks all shared handlers. devm_request_irq() needs the same change since some users, such as int0002_vgpio, are using this function instead. Add IRQF_COND_ONESHOT to the flags passed to devm_request_irq(). Fixes: c37927a203fa2 ("genirq: Set IRQF_COND_ONESHOT in request_irq()") Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@kernel.org> Link: https://patch.msgid.link/20260128095540.863589-2-bigeasy@linutronix.de Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-02-26ftrace,bpf: Remove FTRACE_OPS_FL_JMP ftrace_ops flagJiri Olsa
[ Upstream commit 4be42c92220128b3128854a2d6b0f0ad0bcedbdb ] At the moment the we allow the jmp attach only for ftrace_ops that has FTRACE_OPS_FL_JMP set. This conflicts with following changes where we use single ftrace_ops object for all direct call sites, so all could be be attached via just call or jmp. We already limit the jmp attach support with config option and bit (LSB) set on the trampoline address. It turns out that's actually enough to limit the jmp attach for architecture and only for chosen addresses (with LSB bit set). Each user of register_ftrace_direct or modify_ftrace_direct can set the trampoline bit (LSB) to indicate it has to be attached by jmp. The bpf trampoline generation code uses trampoline flags to generate jmp-attach specific code and ftrace inner code uses the trampoline bit (LSB) to handle return from jmp attachment, so there's no harm to remove the FTRACE_OPS_FL_JMP bit. The fexit/fmodret performance stays the same (did not drop), current code: fentry : 77.904 ± 0.546M/s fexit : 62.430 ± 0.554M/s fmodret : 66.503 ± 0.902M/s with this change: fentry : 80.472 ± 0.061M/s fexit : 63.995 ± 0.127M/s fmodret : 67.362 ± 0.175M/s Fixes: 25e4e3565d45 ("ftrace: Introduce FTRACE_OPS_FL_JMP") Signed-off-by: Jiri Olsa <jolsa@kernel.org> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Andrii Nakryiko <andrii@kernel.org> Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org> Link: https://lore.kernel.org/bpf/20251230145010.103439-2-jolsa@kernel.org Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-02-26seqlock: fix scoped_seqlock_read kernel-docRandy Dunlap
[ Upstream commit f88a31308db6a856229150039b0f56d59696ed31 ] Eliminate all kernel-doc warnings in seqlock.h: - correct the macro to have "()" immediately following the macro name - don't include the macro parameters in the short description (first line) - make the parameter names in the comments match the actual macro parameter names. - use "::" for the Example WARNING: include/linux/seqlock.h:1341 This comment starts with '/**', but isn't a kernel-doc comment. * scoped_seqlock_read (lock, ss_state) - execute the read side critical Documentation/locking/seqlock:242: include/linux/seqlock.h:1351: WARNING: Definition list ends without a blank line; unexpected unindent. [docutils] Warning: include/linux/seqlock.h:1357 function parameter '_seqlock' not described in 'scoped_seqlock_read' Warning: include/linux/seqlock.h:1357 function parameter '_target' not described in 'scoped_seqlock_read' Fixes: cc39f3872c08 ("seqlock: Introduce scoped_seqlock_read()") Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://patch.msgid.link/20260123183749.3997533-1-rdunlap@infradead.org Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-02-26bpf: Fix tcx/netkit detach permissions when prog fd isn't givenGuillaume Gonnet
[ Upstream commit ae23bc81ddf7c17b663c4ed1b21e35527b0a7131 ] This commit fixes a security issue where BPF_PROG_DETACH on tcx or netkit devices could be executed by any user when no program fd was provided, bypassing permission checks. The fix adds a capability check for CAP_NET_ADMIN or CAP_SYS_ADMIN in this case. Fixes: e420bed02507 ("bpf: Add fd-based tcx multi-prog infra with link support") Signed-off-by: Guillaume Gonnet <ggonnet.linux@gmail.com> Link: https://lore.kernel.org/r/20260127160200.10395-1-ggonnet.linux@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-02-26bpf, sockmap: Fix FIONREAD for sockmapJiayuan Chen
[ Upstream commit 929e30f9312514902133c45e51c79088421ab084 ] A socket using sockmap has its own independent receive queue: ingress_msg. This queue may contain data from its own protocol stack or from other sockets. Therefore, for sockmap, relying solely on copied_seq and rcv_nxt to calculate FIONREAD is not enough. This patch adds a new msg_tot_len field in the psock structure to record the data length in ingress_msg. Additionally, we implement new ioctl interfaces for TCP and UDP to intercept FIONREAD operations. Note that we intentionally do not include sk_receive_queue data in the FIONREAD result. Data in sk_receive_queue has not yet been processed by the BPF verdict program, and may be redirected to other sockets or dropped. Including it would create semantic ambiguity since this data may never be readable by the user. Unix and VSOCK sockets have similar issues, but fixing them is outside the scope of this patch as it would require more intrusive changes. Previous work by John Fastabend made some efforts towards FIONREAD support: commit e5c6de5fa025 ("bpf, sockmap: Incorrectly handling copied_seq") Although the current patch is based on the previous work by John Fastabend, it is acceptable for our Fixes tag to point to the same commit. FD1:read() -- FD1->copied_seq++ | [read data] | [enqueue data] v [sockmap] -> ingress to self -> ingress_msg queue FD1 native stack ------> ^ -- FD1->rcv_nxt++ -> redirect to other | [enqueue data] | | | ingress to FD1 v ^ ... | [sockmap] FD2 native stack Fixes: 04919bed948dc ("tcp: Introduce tcp_read_skb()") Signed-off-by: Jiayuan Chen <jiayuan.chen@linux.dev> Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com> Link: https://lore.kernel.org/r/20260124113314.113584-3-jiayuan.chen@linux.dev Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-02-26bpf, sockmap: Fix incorrect copied_seq calculationJiayuan Chen
[ Upstream commit b40cc5adaa80e1471095a62d78233b611d7a558c ] A socket using sockmap has its own independent receive queue: ingress_msg. This queue may contain data from its own protocol stack or from other sockets. The issue is that when reading from ingress_msg, we update tp->copied_seq by default. However, if the data is not from its own protocol stack, tcp->rcv_nxt is not increased. Later, if we convert this socket to a native socket, reading from this socket may fail because copied_seq might be significantly larger than rcv_nxt. This fix also addresses the syzkaller-reported bug referenced in the Closes tag. This patch marks the skmsg objects in ingress_msg. When reading, we update copied_seq only if the data is from its own protocol stack. FD1:read() -- FD1->copied_seq++ | [read data] | [enqueue data] v [sockmap] -> ingress to self -> ingress_msg queue FD1 native stack ------> ^ -- FD1->rcv_nxt++ -> redirect to other | [enqueue data] | | | ingress to FD1 v ^ ... | [sockmap] FD2 native stack Closes: https://syzkaller.appspot.com/bug?extid=06dbd397158ec0ea4983 Fixes: 04919bed948dc ("tcp: Introduce tcp_read_skb()") Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com> Reviewed-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Jiayuan Chen <jiayuan.chen@linux.dev> Link: https://lore.kernel.org/r/20260124113314.113584-2-jiayuan.chen@linux.dev Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-02-26crypto: hisilicon - consolidate qp creation and start in hisi_qm_alloc_qps_nodeChenghai Huang
[ Upstream commit 72f3bbebff15e87171271d643ee2672fb8e92031 ] Consolidate the creation and start of qp into the function hisi_qm_alloc_qps_node. This change eliminates the need for each module to perform these steps in two separate phases (creation and start). Signed-off-by: Chenghai Huang <huangchenghai2@huawei.com> Signed-off-by: Weili Qian <qianweili@huawei.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Stable-dep-of: 6aff4d977e2d ("crypto: hisilicon/hpre - support the hpre algorithm fallback") Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-02-26crypto: hisilicon/qm - centralize the sending locks of each module into qmChenghai Huang
[ Upstream commit 8cd9b608ee8dea78cac3f373bd5e3b3de2755d46 ] When a single queue used by multiple tfms, the protection of shared resources by individual module driver programs is no longer sufficient. The hisi_qp_send needs to be ensured by the lock in qp. Fixes: 5fdb4b345cfb ("crypto: hisilicon - add a lock for the qp send operation") Signed-off-by: Chenghai Huang <huangchenghai2@huawei.com> Signed-off-by: Weili Qian <qianweili@huawei.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-02-26crypto: hisilicon/qm - enhance the configuration of req_type in queue attributesChenghai Huang
[ Upstream commit 21452eaa06edb5f6038720e643aed0bbfffad9c3 ] Originally, when a queue was requested, it could only be configured with the default algorithm type of 0. Now, when multiple tfms use the same queue, the queue must be selected based on its attributes to meet the requirements of tfm tasks. So the algorithm type attribute of queue need to be distinguished. Just like a queue used for compression in ZIP cannot be used for decompression tasks. Fixes: 3f1ec97aacf1 ("crypto: hisilicon/qm - Put device finding logic into QM") Signed-off-by: Chenghai Huang <huangchenghai2@huawei.com> Signed-off-by: Weili Qian <qianweili@huawei.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-02-26crypto: hisilicon/sec - move backlog management to qp and store sqe in qp ↵Chenghai Huang
for callback [ Upstream commit 08eb67d23e5172a5d1e60f1f0acccee569fe10ba ] When multiple tfm use a same qp, the backlog data should be managed centrally by the qp, rather than in the qp_ctx of each req. Additionally, since SEC_BD_TYPE1 and SEC_BD_TYPE2 cannot use the tag of the sqe to carry the virtual address of the req, the sent sqe is stored in the qp. This allows the callback function to get the req address. To handle the differences between hardware types, the callback functions are split into two separate implementations. Fixes: f0ae287c5045 ("crypto: hisilicon/sec2 - implement full backlog mode for sec") Signed-off-by: Chenghai Huang <huangchenghai2@huawei.com> Signed-off-by: Weili Qian <qianweili@huawei.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-02-26io_uring/eventfd: remove unused ctx->evfd_last_cq_tail memberJens Axboe
[ Upstream commit 07f3c3a1cd56c2048a92dad0c11f15e4ac3888c1 ] A previous commit got rid of any use of this member, but forgot to remove it. Kill it. Fixes: f4bb2f65bb81 ("io_uring/eventfd: move ctx->evfd_last_cq_tail into io_ev_fd") Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-02-26device_cgroup: remove branch hint after code refactorBreno Leitao
[ Upstream commit 6784f274722559c0cdaaa418bc8b7b1d61c314f9 ] commit 4ef4ac360101 ("device_cgroup: avoid access to ->i_rdev in the common case in devcgroup_inode_permission()") reordered the checks in devcgroup_inode_permission() to check the inode mode before checking i_rdev, for better cache behavior. However, the likely() annotation on the i_rdev check was not updated to reflect the new code flow. Originally, when i_rdev was checked first, likely(!inode->i_rdev) made sense because most inodes were(?) regular files/directories, thus i_rdev == 0. After the reorder, by the time we reach the i_rdev check, we have already confirmed the inode IS a block or character device. Block and character special files are precisely defined by having a device number (i_rdev), so !inode->i_rdev is now the rare edge case, not the common case. Branch profiling confirmed this is 100% mispredicted: correct incorrect % Function File Line ------- --------- - -------- ---- ---- 0 2631904 100 devcgroup_inode_permission device_cgroup.h 24 Remove likely() to avoid giving the wrong hint to the CPU. Fixes: 4ef4ac360101 ("device_cgroup: avoid access to ->i_rdev in the common case in devcgroup_inode_permission()") Signed-off-by: Breno Leitao <leitao@debian.org> Link: https://patch.msgid.link/20260107-likely_device-v1-1-0c55f83a7e47@debian.org Reviewed-by: Mateusz Guzik <mjguzik@gmail.com> Signed-off-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-02-26audit: move the compat_xxx_class[] extern declarations to audit_arch.hBen Dooks
[ Upstream commit 76489955c6d4a065ca69dc88faf7a50a59b66f35 ] The comapt_xxx_class symbols aren't declared in anything that lib/comapt_audit.c is including (arm64 build) which is causing the following sparse warnings: lib/compat_audit.c:7:10: warning: symbol 'compat_dir_class' was not declared. Should it be static? lib/compat_audit.c:12:10: warning: symbol 'compat_read_class' was not declared. Should it be static? lib/compat_audit.c:17:10: warning: symbol 'compat_write_class' was not declared. Should it be static? lib/compat_audit.c:22:10: warning: symbol 'compat_chattr_class' was not declared. Should it be static? lib/compat_audit.c:27:10: warning: symbol 'compat_signal_class' was not declared. Should it be static? Trying to fix this by chaning compat_audit.c to inclde <linux/audit.h> does not work on arm64 due to compile errors with the extra includes that changing this header makes. The simpler thing would be just to move the definitons of these symbols out of <linux/audit.h> into <linux/audit_arch.h> which is included. Fixes: 4b58841149dca ("audit: Add generic compat syscall support") Signed-off-by: Ben Dooks <ben.dooks@codethink.co.uk> [PM: rewrite subject line, fixed line length in description] Signed-off-by: Paul Moore <paul@paul-moore.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-02-19f2fs: support non-4KB block size without packed_ssa featureDaeho Jeong
commit e48e16f3e37fac76e2f0c14c58df2b0398a323b0 upstream. Currently, F2FS requires the packed_ssa feature to be enabled when utilizing non-4KB block sizes (e.g., 16KB). This restriction limits the flexibility of filesystem formatting options. This patch allows F2FS to support non-4KB block sizes even when the packed_ssa feature is disabled. It adjusts the SSA calculation logic to correctly handle summary entries in larger blocks without the packed layout. Cc: stable@kernel.org Fixes: 7ee8bc3942f2 ("f2fs: revert summary entry count from 2048 to 512 in 16kb block support") Signed-off-by: Daeho Jeong <daehojeong@google.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2026-02-07Merge tag 'sched-urgent-2026-02-07' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler fixes from Ingo Molnar: "Miscellaneous MMCID fixes to address bugs and performance regressions in the recent rewrite of the SCHED_MM_CID management code: - Fix livelock triggered by BPF CI testing - Fix hard lockup on weakly ordered systems - Simplify the dropping of CIDs in the exit path by removing an unintended transition phase - Fix performance/scalability regression on a thread-pool benchmark by optimizing transitional CIDs when scheduling out" * tag 'sched-urgent-2026-02-07' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: sched/mmcid: Optimize transitional CIDs when scheduling out sched/mmcid: Drop per CPU CID immediately when switching to per task mode sched/mmcid: Protect transition on weakly ordered systems sched/mmcid: Prevent live lock on task to CPU mode transition
2026-02-07Merge tag 'objtool-urgent-2026-02-07' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull objtool fixes from Ingo Molnar:: - Bump up the Clang minimum version requirements for livepatch builds, due to Clang assembler section handling bugs causing silent miscompilations - Strip livepatching symbol artifacts from non-livepatch modules - Fix livepatch build warnings when certain Clang LTO options are enabled - Fix livepatch build error when CONFIG_MEM_ALLOC_PROFILING_DEBUG=y * tag 'objtool-urgent-2026-02-07' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: objtool/klp: Fix unexported static call key access for manually built livepatch modules objtool/klp: Fix symbol correlation for orphaned local symbols livepatch: Free klp_{object,func}_ext data after initialization livepatch: Fix having __klp_objects relics in non-livepatch modules livepatch/klp-build: Require Clang assembler >= 20
2026-02-06Merge tag 'mm-hotfixes-stable-2026-02-06-12-37' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull hotfixes from Andrew Morton: "A couple of late-breaking MM fixes. One against a new-in-this-cycle patch and the other addresses a locking issue which has been there for over a year" * tag 'mm-hotfixes-stable-2026-02-06-12-37' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: mm/memory-failure: reject unsupported non-folio compound page procfs: avoid fetching build ID while holding VMA lock
2026-02-06Merge tag 'ceph-for-6.19-rc9' of https://github.com/ceph/ceph-clientLinus Torvalds
Pull ceph fixes from Ilya Dryomov: "One RBD and two CephFS fixes which address potential oopses. The RBD thing is more of a rare edge case that pops up in our CI, while the two CephFS scenarios are regressions that were reported by users and can be triggered trivially in normal operation. All marked for stable" * tag 'ceph-for-6.19-rc9' of https://github.com/ceph/ceph-client: ceph: fix NULL pointer dereference in ceph_mds_auth_match() ceph: fix oops due to invalid pointer for kfree() in parse_longname() rbd: check for EOD after exclusive lock is ensured to be held
2026-02-05procfs: avoid fetching build ID while holding VMA lockAndrii Nakryiko
Fix PROCMAP_QUERY to fetch optional build ID only after dropping mmap_lock or per-VMA lock, whichever was used to lock VMA under question, to avoid deadlock reported by syzbot: -> #1 (&mm->mmap_lock){++++}-{4:4}: __might_fault+0xed/0x170 _copy_to_iter+0x118/0x1720 copy_page_to_iter+0x12d/0x1e0 filemap_read+0x720/0x10a0 blkdev_read_iter+0x2b5/0x4e0 vfs_read+0x7f4/0xae0 ksys_read+0x12a/0x250 do_syscall_64+0xcb/0xf80 entry_SYSCALL_64_after_hwframe+0x77/0x7f -> #0 (&sb->s_type->i_mutex_key#8){++++}-{4:4}: __lock_acquire+0x1509/0x26d0 lock_acquire+0x185/0x340 down_read+0x98/0x490 blkdev_read_iter+0x2a7/0x4e0 __kernel_read+0x39a/0xa90 freader_fetch+0x1d5/0xa80 __build_id_parse.isra.0+0xea/0x6a0 do_procmap_query+0xd75/0x1050 procfs_procmap_ioctl+0x7a/0xb0 __x64_sys_ioctl+0x18e/0x210 do_syscall_64+0xcb/0xf80 entry_SYSCALL_64_after_hwframe+0x77/0x7f other info that might help us debug this: Possible unsafe locking scenario: CPU0 CPU1 ---- ---- rlock(&mm->mmap_lock); lock(&sb->s_type->i_mutex_key#8); lock(&mm->mmap_lock); rlock(&sb->s_type->i_mutex_key#8); *** DEADLOCK *** This seems to be exacerbated (as we haven't seen these syzbot reports before that) by the recent: 777a8560fd29 ("lib/buildid: use __kernel_read() for sleepable context") To make this safe, we need to grab file refcount while VMA is still locked, but other than that everything is pretty straightforward. Internal build_id_parse() API assumes VMA is passed, but it only needs the underlying file reference, so just add another variant build_id_parse_file() that expects file passed directly. [akpm@linux-foundation.org: fix up kerneldoc] Link: https://lkml.kernel.org/r/20260129215340.3742283-1-andrii@kernel.org Fixes: ed5d583a88a9 ("fs/procfs: implement efficient VMA querying API for /proc/<pid>/maps") Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Reported-by: <syzbot+4e70c8e0a2017b432f7a@syzkaller.appspotmail.com> Reviewed-by: Suren Baghdasaryan <surenb@google.com> Tested-by: Suren Baghdasaryan <surenb@google.com> Reviewed-by: Shakeel Butt <shakeel.butt@linux.dev> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: Eduard Zingerman <eddyz87@gmail.com> Cc: Hao Luo <haoluo@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Fastabend <john.fastabend@gmail.com> Cc: KP Singh <kpsingh@kernel.org> Cc: Martin KaFai Lau <martin.lau@linux.dev> Cc: Song Liu <song@kernel.org> Cc: Stanislav Fomichev <sdf@fomichev.me> Cc: Yonghong Song <yonghong.song@linux.dev> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-02-05Merge tag 'net-6.19-rc9' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Jakub Kicinski: "Including fixes from wireless and Netfilter. Previous releases - regressions: - eth: stmmac: fix stm32 (and potentially others) resume regression - nf_tables: fix inverted genmask check in nft_map_catchall_activate() - usb: r8152: fix resume reset deadlock - fix reporting RXH_XFRM_NO_CHANGE as input_xfrm for RSS contexts Previous releases - always broken: - sched: cls_u32: use skb_header_pointer_careful() to avoid OOB reads with malicious u32 rules - eth: ice: timestamping related fixes" * tag 'net-6.19-rc9' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (38 commits) ipv6: Fix ECMP sibling count mismatch when clearing RTF_ADDRCONF netfilter: nf_tables: fix inverted genmask check in nft_map_catchall_activate() net: cpsw: Execute ndo_set_rx_mode callback in a work queue net: cpsw_new: Execute ndo_set_rx_mode callback in a work queue gve: Correct ethtool rx_dropped calculation gve: Fix stats report corruption on queue count change selftest: net: add a test-case for encap segmentation after GRO net: gro: fix outer network offset net: add proper RCU protection to /proc/net/ptype net: ethernet: adi: adin1110: Check return value of devm_gpiod_get_optional() in adin1110_check_spi() wifi: iwlwifi: mvm: pause TCM on fast resume wifi: iwlwifi: mld: cancel mlo_scan_start_wk net: spacemit: k1-emac: fix jumbo frame support net: enetc: Convert 16-bit register reads to 32-bit for ENETC v4 net: enetc: Convert 16-bit register writes to 32-bit for ENETC v4 net: enetc: Remove CBDR cacheability AXI settings for ENETC v4 net: enetc: Remove SI/BDR cacheability AXI settings for ENETC v4 tipc: use kfree_sensitive() for session key material net: stmmac: fix stm32 (and potentially others) resume regression net: rss: fix reporting RXH_XFRM_NO_CHANGE as input_xfrm for contexts ...
2026-02-05livepatch: Fix having __klp_objects relics in non-livepatch modulesPetr Pavlu
The linker script scripts/module.lds.S specifies that all input __klp_objects sections should be consolidated into an output section of the same name, and start/stop symbols should be created to enable scripts/livepatch/init.c to locate this data. This start/stop pattern is not ideal for modules because the symbols are created even if no __klp_objects input sections are present. Consequently, a dummy __klp_objects section also appears in the resulting module. This unnecessarily pollutes non-livepatch modules. Instead, since modules are relocatable files, the usual method for locating consolidated data in a module is to read its section table. This approach avoids the aforementioned problem. The klp_modinfo already stores a copy of the entire section table with the final addresses. Introduce a helper function that scripts/livepatch/init.c can call to obtain the location of the __klp_objects section from this data. Fixes: dd590d4d57eb ("objtool/klp: Introduce klp diff subcommand for diffing object files") Signed-off-by: Petr Pavlu <petr.pavlu@suse.com> Acked-by: Joe Lawrence <joe.lawrence@redhat.com> Acked-by: Miroslav Benes <mbenes@suse.cz> Reviewed-by: Aaron Tomlin <atomlin@atomlin.com> Link: https://patch.msgid.link/20260123102825.3521961-2-petr.pavlu@suse.com Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
2026-02-04Merge tag 'tsm-fixes-for-6.19' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/devsec/tsm Pull TSM (TEE security Manager) fixes from Dan Williams: "The largest change is reverting part of an ABI that never shipped in a released kernel (Documentation/ABI/testing/sysfs-class-tsm). The fix / replacement for that is too large to squeeze in at this late date. The rest is a collection of small fixups: - Fix multiple streams per host bridge for SEV-TIO - Drop the TSM ABI for reporting IDE streams (to be replaced) - Fix virtual function enumeration - Fix reserved stream ID initialization - Fix unused variable compiler warning" * tag 'tsm-fixes-for-6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/devsec/tsm: crypto/ccp: Allow multiple streams on the same root bridge crypto/ccp: Use PCI bridge defaults for IDE coco/tsm: Remove unused variable tsm_rwsem PCI/IDE: Fix reading a wrong reg for unused sel stream initialization PCI/IDE: Fix off by one error calculating VF RID range Revert "PCI/TSM: Report active IDE streams"
2026-02-04ceph: fix NULL pointer dereference in ceph_mds_auth_match()Viacheslav Dubeyko
The CephFS kernel client has regression starting from 6.18-rc1. We have issue in ceph_mds_auth_match() if fs_name == NULL: const char fs_name = mdsc->fsc->mount_options->mds_namespace; ... if (auth->match.fs_name && strcmp(auth->match.fs_name, fs_name)) { / fsname mismatch, try next one */ return 0; } Patrick Donnelly suggested that: In summary, we should definitely start decoding `fs_name` from the MDSMap and do strict authorizations checks against it. Note that the `-o mds_namespace=foo` should only be used for selecting the file system to mount and nothing else. It's possible no mds_namespace is specified but the kernel will mount the only file system that exists which may have name "foo". This patch reworks ceph_mdsmap_decode() and namespace_equals() with the goal of supporting the suggested concept. Now struct ceph_mdsmap contains m_fs_name field that receives copy of extracted FS name by ceph_extract_encoded_string(). For the case of "old" CephFS file systems, it is used "cephfs" name. [ idryomov: replace redundant %*pE with %s in ceph_mdsmap_decode(), get rid of a series of strlen() calls in ceph_namespace_match(), drop changes to namespace_equals() body to avoid treating empty mds_namespace as equal, drop changes to ceph_mdsc_handle_fsmap() as namespace_equals() isn't an equivalent substitution there ] Cc: stable@vger.kernel.org Fixes: 22c73d52a6d0 ("ceph: fix multifs mds auth caps issue") Link: https://tracker.ceph.com/issues/73886 Signed-off-by: Viacheslav Dubeyko <Slava.Dubeyko@ibm.com> Reviewed-by: Patrick Donnelly <pdonnell@ibm.com> Tested-by: Patrick Donnelly <pdonnell@ibm.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2026-02-04sched/mmcid: Protect transition on weakly ordered systemsThomas Gleixner
Shrikanth reported a hard lockup which he observed once. The stack trace shows the following CID related participants: watchdog: CPU 23 self-detected hard LOCKUP @ mm_get_cid+0xe8/0x188 NIP: mm_get_cid+0xe8/0x188 LR: mm_get_cid+0x108/0x188 mm_cid_switch_to+0x3c4/0x52c __schedule+0x47c/0x700 schedule_idle+0x3c/0x64 do_idle+0x160/0x1b0 cpu_startup_entry+0x48/0x50 start_secondary+0x284/0x288 start_secondary_prolog+0x10/0x14 watchdog: CPU 11 self-detected hard LOCKUP @ plpar_hcall_norets_notrace+0x18/0x2c NIP: plpar_hcall_norets_notrace+0x18/0x2c LR: queued_spin_lock_slowpath+0xd88/0x15d0 _raw_spin_lock+0x80/0xa0 raw_spin_rq_lock_nested+0x3c/0xf8 mm_cid_fixup_cpus_to_tasks+0xc8/0x28c sched_mm_cid_exit+0x108/0x22c do_exit+0xf4/0x5d0 make_task_dead+0x0/0x178 system_call_exception+0x128/0x390 system_call_vectored_common+0x15c/0x2ec The task on CPU11 is running the CID ownership mode change fixup function and is stuck on a runqueue lock. The task on CPU23 is trying to get a CID from the pool with the same runqueue lock held, but the pool is empty. After decoding a similar issue in the opposite direction switching from per task to per CPU mode the tool which models the possible scenarios failed to come up with a similar loop hole. This showed up only once, was not reproducible and according to tooling not related to a overlooked scheduling scenario permutation. But the fact that it was observed on a PowerPC system gave the right hint: PowerPC is a weakly ordered architecture. The transition mechanism does: WRITE_ONCE(mm->mm_cid.transit, MM_CID_TRANSIT); WRITE_ONCE(mm->mm_cid.percpu, new_mode); fixup() WRITE_ONCE(mm->mm_cid.transit, 0); mm_cid_schedin() does: if (!READ_ONCE(mm->mm_cid.percpu)) ... cid |= READ_ONCE(mm->mm_cid.transit); so weakly ordered systems can observe percpu == false and transit == 0 even if the fixup function has not yet completed. As a consequence the task will not drop the CID when scheduling out before the fixup is completed, which means the CID space can be exhausted and the next task scheduling in will loop in mm_get_cid() and the fixup thread can livelock on the held runqueue lock as above. This could obviously be solved by using: smp_store_release(&mm->mm_cid.percpu, true); and smp_load_acquire(&mm->mm_cid.percpu); but that brings a memory barrier back into the scheduler hotpath, which was just designed out by the CID rewrite. That can be completely avoided by combining the per CPU mode and the transit storage into a single mm_cid::mode member and ordering the stores against the fixup functions to prevent the CPU from reordering them. That makes the update of both states atomic and a concurrent read observes always consistent state. The price is an additional AND operation in mm_cid_schedin() to evaluate the per CPU or the per task path, but that's in the noise even on strongly ordered architectures as the actual load can be significantly more expensive and the conditional branch evaluation is there anyway. Fixes: fbd0e71dc370 ("sched/mmcid: Provide CID ownership mode fixup functions") Closes: https://lore.kernel.org/bdfea828-4585-40e8-8835-247c6a8a76b0@linux.ibm.com Reported-by: Shrikanth Hegde <sshegde@linux.ibm.com> Signed-off-by: Thomas Gleixner <tglx@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Link: https://patch.msgid.link/20260201192834.965217106@kernel.org
2026-02-01Merge tag 'perf-urgent-2026-02-01' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf events fix from Ingo Molnar: "Fix a race in the user-callchains code" * tag 'perf-urgent-2026-02-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf: sched: Fix perf crash with new is_user_task() helper
2026-01-30perf: sched: Fix perf crash with new is_user_task() helperSteven Rostedt
In order to do a user space stacktrace the current task needs to be a user task that has executed in user space. It use to be possible to test if a task is a user task or not by simply checking the task_struct mm field. If it was non NULL, it was a user task and if not it was a kernel task. But things have changed over time, and some kernel tasks now have their own mm field. An idea was made to instead test PF_KTHREAD and two functions were used to wrap this check in case it became more complex to test if a task was a user task or not[1]. But this was rejected and the C code simply checked the PF_KTHREAD directly. It was later found that not all kernel threads set PF_KTHREAD. The io-uring helpers instead set PF_USER_WORKER and this needed to be added as well. But checking the flags is still not enough. There's a very small window when a task exits that it frees its mm field and it is set back to NULL. If perf were to trigger at this moment, the flags test would say its a user space task but when perf would read the mm field it would crash with at NULL pointer dereference. Now there are flags that can be used to test if a task is exiting, but they are set in areas that perf may still want to profile the user space task (to see where it exited). The only real test is to check both the flags and the mm field. Instead of making this modification in every location, create a new is_user_task() helper function that does all the tests needed to know if it is safe to read the user space memory or not. [1] https://lore.kernel.org/all/20250425204120.639530125@goodmis.org/ Fixes: 90942f9fac05 ("perf: Use current->flags & PF_KTHREAD|PF_USER_WORKER instead of current->mm == NULL") Closes: https://lore.kernel.org/all/0d877e6f-41a7-4724-875d-0b0a27b8a545@roeck-us.net/ Reported-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Tested-by: Guenter Roeck <linux@roeck-us.net> Cc: stable@vger.kernel.org Link: https://patch.msgid.link/20260129102821.46484722@gandalf.local.home
2026-01-30Merge tag 'dma-mapping-6.19-2026-01-30' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/mszyprowski/linux Pull dma-mapping fixes from Marek Szyprowski: - important fix for ARM 32-bit based systems using cma= kernel parameter (Oreoluwa Babatunde) - a fix for the corner case of the DMA atomic pool based allocations (Sai Sree Kartheek Adivi) * tag 'dma-mapping-6.19-2026-01-30' of git://git.kernel.org/pub/scm/linux/kernel/git/mszyprowski/linux: dma/pool: distinguish between missing and exhausted atomic pools of: reserved_mem: Allow reserved_mem framework detect "cma=" kernel param
2026-01-29net: add skb_header_pointer_careful() helperEric Dumazet
This variant of skb_header_pointer() should be used in contexts where @offset argument is user-controlled and could be negative. Negative offsets are supported, as long as the zone starts between skb->head and skb->data. Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20260128141539.3404400-2-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-01-29Merge tag 'mm-hotfixes-stable-2026-01-29-09-41' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull misc fixes from Andrew Morton: "16 hotfixes. 9 are cc:stable, 12 are for MM. There's a patch series from Pratyush Yadav which fixes a few things in the new-in-6.19 LUO memfd code. Plus the usual shower of singletons - please see the changelogs for details" * tag 'mm-hotfixes-stable-2026-01-29-09-41' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: vmcoreinfo: make hwerr_data visible for debugging mm/zone_device: reinitialize large zone device private folios mm/mm_init: don't cond_resched() in deferred_init_memmap_chunk() if called from deferred_grow_zone() mm/kfence: randomize the freelist on initialization kho: kho_preserve_vmalloc(): don't return 0 when ENOMEM kho: init alloc tags when restoring pages from reserved memory mm: memfd_luo: restore and free memfd_luo_ser on failure mm: memfd_luo: use memfd_alloc_file() instead of shmem_file_setup() memfd: export alloc_file() flex_proportions: make fprop_new_period() hardirq safe mailmap: add entry for Viacheslav Bocharov mm/memory-failure: teach kill_accessing_process to accept hugetlb tail page pfn mm/memory-failure: fix missing ->mf_stats count in hugetlb poison mm, swap: restore swap_space attr aviod kernel panic mm/kasan: fix KASAN poisoning in vrealloc() mm/shmem, swap: fix race of truncate and swap entry split
2026-01-29of: reserved_mem: Allow reserved_mem framework detect "cma=" kernel paramOreoluwa Babatunde
When initializing the default cma region, the "cma=" kernel parameter takes priority over a DT defined linux,cma-default region. Hence, give the reserved_mem framework the ability to detect this so that the DT defined cma region can skip initialization accordingly. Signed-off-by: Oreoluwa Babatunde <oreoluwa.babatunde@oss.qualcomm.com> Tested-by: Joy Zou <joy.zou@nxp.com> Acked-by: Rob Herring (Arm) <robh@kernel.org> Fixes: 8a6e02d0c00e ("of: reserved_mem: Restructure how the reserved memory regions are processed") Fixes: 2c223f7239f3 ("of: reserved_mem: Restructure call site for dma_contiguous_early_fixup()") Link: https://lore.kernel.org/r/20251210002027.1171519-1-oreoluwa.babatunde@oss.qualcomm.com [mszyprow: rebased onto v6.19-rc1, added fixes tags, added a stub for cma_skip_dt_default_reserved_mem() if no CONFIG_DMA_CMA is set] Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
2026-01-26mm/zone_device: reinitialize large zone device private foliosMatthew Brost
Reinitialize metadata for large zone device private folios in zone_device_page_init prior to creating a higher-order zone device private folio. This step is necessary when the folio's order changes dynamically between zone_device_page_init calls to avoid building a corrupt folio. As part of the metadata reinitialization, the dev_pagemap must be passed in from the caller because the pgmap stored in the folio page may have been overwritten with a compound head. Without this fix, individual pages could have invalid pgmap fields and flags (with PG_locked being notably problematic) due to prior different order allocations, which can, and will, result in kernel crashes. Link: https://lkml.kernel.org/r/20260116111325.1736137-2-francois.dugast@intel.com Fixes: d245f9b4ab80 ("mm/zone_device: support large zone device private folios") Signed-off-by: Matthew Brost <matthew.brost@intel.com> Signed-off-by: Francois Dugast <francois.dugast@intel.com> Acked-by: Felix Kuehling <felix.kuehling@amd.com> Reviewed-by: Balbir Singh <balbirs@nvidia.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Zi Yan <ziy@nvidia.com> Cc: Alistair Popple <apopple@nvidia.com> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: "Christophe Leroy (CS GROUP)" <chleroy@kernel.org> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: "Christian König" <christian.koenig@amd.com> Cc: David Airlie <airlied@gmail.com> Cc: Simona Vetter <simona@ffwll.ch> Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Cc: Maxime Ripard <mripard@kernel.org> Cc: Thomas Zimmermann <tzimmermann@suse.de> Cc: Lyude Paul <lyude@redhat.com> Cc: Danilo Krummrich <dakr@kernel.org> Cc: David Hildenbrand <david@kernel.org> Cc: Oscar Salvador <osalvador@suse.de> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Leon Romanovsky <leon@kernel.org> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Liam R. Howlett <Liam.Howlett@oracle.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Michal Hocko <mhocko@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-01-26memfd: export alloc_file()Pratyush Yadav (Google)
Patch series "mm: memfd_luo hotfixes". This series contains a couple of fixes for memfd preservation using LUO. This patch (of 3): The Live Update Orchestrator's (LUO) memfd preservation works by preserving all the folios of a memfd, re-creating an empty memfd on the next boot, and then inserting back the preserved folios. Currently it creates the file by directly calling shmem_file_setup(). This leaves out other work done by alloc_file() like setting up the file mode, flags, or calling the security hooks. Export alloc_file() to let memfd_luo use it. Rename it to memfd_alloc_file() since it is no longer private and thus needs a subsystem prefix. Link: https://lkml.kernel.org/r/20260122151842.4069702-1-pratyush@kernel.org Link: https://lkml.kernel.org/r/20260122151842.4069702-2-pratyush@kernel.org Signed-off-by: Pratyush Yadav (Google) <pratyush@kernel.org> Reviewed-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Reviewed-by: Pasha Tatashin <pasha.tatashin@soleen.com> Cc: Baolin Wang <baolin.wang@linux.alibaba.com> Cc: Hugh Dickins <hughd@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-01-26mm/kasan: fix KASAN poisoning in vrealloc()Andrey Ryabinin
A KASAN warning can be triggered when vrealloc() changes the requested size to a value that is not aligned to KASAN_GRANULE_SIZE. ------------[ cut here ]------------ WARNING: CPU: 2 PID: 1 at mm/kasan/shadow.c:174 kasan_unpoison+0x40/0x48 ... pc : kasan_unpoison+0x40/0x48 lr : __kasan_unpoison_vmalloc+0x40/0x68 Call trace: kasan_unpoison+0x40/0x48 (P) vrealloc_node_align_noprof+0x200/0x320 bpf_patch_insn_data+0x90/0x2f0 convert_ctx_accesses+0x8c0/0x1158 bpf_check+0x1488/0x1900 bpf_prog_load+0xd20/0x1258 __sys_bpf+0x96c/0xdf0 __arm64_sys_bpf+0x50/0xa0 invoke_syscall+0x90/0x160 Introduce a dedicated kasan_vrealloc() helper that centralizes KASAN handling for vmalloc reallocations. The helper accounts for KASAN granule alignment when growing or shrinking an allocation and ensures that partial granules are handled correctly. Use this helper from vrealloc_node_align_noprof() to fix poisoning logic. [ryabinin.a.a@gmail.com: move kasan_enabled() check, fix build] Link: https://lkml.kernel.org/r/20260119144509.32767-1-ryabinin.a.a@gmail.com Link: https://lkml.kernel.org/r/20260113191516.31015-1-ryabinin.a.a@gmail.com Fixes: d699440f58ce ("mm: fix vrealloc()'s KASAN poisoning logic") Signed-off-by: Andrey Ryabinin <ryabinin.a.a@gmail.com> Reported-by: Maciej Żenczykowski <maze@google.com> Reported-by: <joonki.min@samsung-slsi.corp-partner.google.com> Closes: https://lkml.kernel.org/r/CANP3RGeuRW53vukDy7WDO3FiVgu34-xVJYkfpm08oLO3odYFrA@mail.gmail.com Reviewed-by: Andrey Konovalov <andreyknvl@gmail.com> Tested-by: Maciej Wieczor-Retman <maciej.wieczor-retman@intel.com> Cc: Alexander Potapenko <glider@google.com> Cc: Dmitriy Vyukov <dvyukov@google.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Uladzislau Rezki <urezki@gmail.com> Cc: Vincenzo Frascino <vincenzo.frascino@arm.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-01-26Merge tag 'vfs-6.19-rc8.fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull vfs fixes from Christian Brauner: - Fix the the buggy conversion of fuse_reverse_inval_entry() introduced during the creation rework - Disallow nfs delegation requests for directories by setting simple_nosetlease() - Require an opt-in for getting readdir flag bits outside of S_DT_MASK set in d_type - Fix scheduling delayed writeback work by only scheduling when the dirty time expiry interval is non-zero and cancel the delayed work if the interval is set to zero - Use rounded_jiffies_interval for dirty time work - Check the return value of sb_set_blocksize() for romfs - Wait for batched folios to be stable in __iomap_get_folio() - Use private naming for fuse hash size - Fix the stale dentry cleanup to prevent a race that causes a UAF * tag 'vfs-6.19-rc8.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: vfs: document d_dispose_if_unused() fuse: shrink once after all buckets have been scanned fuse: clean up fuse_dentry_tree_work() fuse: add need_resched() before unlocking bucket fuse: make sure dentry is evicted if stale fuse: fix race when disposing stale dentries fuse: use private naming for fuse hash size writeback: use round_jiffies_relative for dirtytime_work iomap: wait for batched folios to be stable in __iomap_get_folio romfs: check sb_set_blocksize() return value docs: clarify that dirtytime_expire_seconds=0 disables writeback writeback: fix 100% CPU usage when dirtytime_expire_interval is 0 readdir: require opt-in for d_type flags vboxsf: don't allow delegations to be set on directories ceph: don't allow delegations to be set on directories gfs2: don't allow delegations to be set on directories 9p: don't allow delegations to be set on directories smb/client: properly disallow delegations on directories nfs: properly disallow delegation requests on directories fuse: fix conversion of fuse_reverse_inval_entry() to start_removing()
2026-01-25Merge tag 'char-misc-6.19-rc7' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc Pull char/misc/iio driver fixes from Greg KH: "Here are some small char/misc/iio and some other minor driver subsystem fixes for 6.19-rc7. Nothing huge here, just some fixes for reported issues including: - lots of little iio driver fixes - comedi driver fixes - mux driver fix - w1 driver fixes - uio driver fix - slimbus driver fixes - hwtracing bugfix - other tiny bugfixes All of these have been in linux-next for a while with no reported issues" * tag 'char-misc-6.19-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (36 commits) comedi: dmm32at: serialize use of paged registers mei: trace: treat reg parameter as string uio: pci_sva: correct '-ENODEV' check logic uacce: ensure safe queue release with state management uacce: implement mremap in uacce_vm_ops to return -EPERM uacce: fix isolate sysfs check condition uacce: fix cdev handling in the cleanup path slimbus: core: clean up of_slim_get_device() slimbus: core: fix of_slim_get_device() kernel doc slimbus: core: amend slim_get_device() kernel doc slimbus: core: fix device reference leak on report present slimbus: core: fix runtime PM imbalance on report present slimbus: core: fix OF node leak on registration failure intel_th: rename error label intel_th: fix device leak on output open() comedi: Fix getting range information for subdevices 16 to 255 mux: mmio: Fix IS_ERR() vs NULL check in probe() interconnect: debugfs: initialize src_node and dst_node to empty strings iio: dac: ad3552r-hs: fix out-of-bound write in ad3552r_hs_write_data_source iio: accel: iis328dq: fix gain values ...
2026-01-24Merge tag 'driver-core-6.19-rc7' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/driver-core/driver-core Pull driver core fixes from Danilo Krummrich: - Always inline I/O and IRQ methods using build_assert!() to avoid false positive build errors - Do not free the driver's device private data in I2C shutdown() avoiding race conditions that can lead to UAF bugs - Drop the driver's device private data after the driver has been fully unbound from its device to avoid UAF bugs from &Device<Bound> scopes, such as IRQ callbacks * tag 'driver-core-6.19-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/driver-core/driver-core: rust: driver: drop device private data post unbind rust: driver: add DriverData type to the DriverLayout trait rust: driver: add DEVICE_DRIVER_OFFSET to the DriverLayout trait rust: driver: introduce a DriverLayout trait rust: auxiliary: add Driver::unbind() callback rust: i2c: do not drop device private data on shutdown() rust: irq: always inline functions using build_assert with arguments rust: io: always inline functions using build_assert with arguments
2026-01-22PCI/IDE: Fix off by one error calculating VF RID rangeLi Ming
The VF ID range of an SR-IOV device is [0, num_VFs - 1]. pci_ide_stream_alloc() mistakenly uses num_VFs to represent the last ID. Fix that off by one error to stay in bounds of the range. Fixes: 1e4d2ff3ae45 ("PCI/IDE: Add IDE establishment helpers") Signed-off-by: Li Ming <ming.li@zohomail.com> Reviewed-by: Xu Yilun <yilun.xu@linux.intel.com> Link: https://patch.msgid.link/20260114111455.550984-1-ming.li@zohomail.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2026-01-22Revert "PCI/TSM: Report active IDE streams"Dan Williams
The proposed ABI failed to account for multiple host bridges with the same stream name. The fix needs to namespace streams or otherwise link back to the host bridge, but a change like that is too big for a fix. Given this ABI never saw a released kernel, delete it for now and bring it back later with this issue addressed. Reported-by: Xu Yilun <yilun.xu@linux.intel.com> Reported-by: Yi Lai <yi1.lai@intel.com> Closes: http://lore.kernel.org/20251223085601.2607455-1-yilun.xu@linux.intel.com Link: http://patch.msgid.link/6972c872acbb9_1d3310035@dwillia2-mobl4.notmuch Signed-off-by: Dan Williams <dan.j.williams@intel.com>