kernel - Hosts the 0x221E linux distro kernel.

Age	Commit message (Collapse)	Author
2026-02-10	tcp: populate inet->cork.fl.u.ip6 in tcp_v6_connect()	Eric Dumazet
	Instead of using private @fl6 and @final variables use respectively inet->cork.fl.u.ip6 and np->final. As explained in commit 85d05e281712 ("ipv6: change inet6_sk_rebuild_header() to use inet->cork.fl.u.ip6"): TCP v6 spends a good amount of time rebuilding a fresh fl6 at each transmit in inet6_csk_xmit()/inet6_csk_route_socket(). TCP v4 caches the information in inet->cork.fl.u.ip4 instead. After this patch, active TCP ipv6 flows have correctly initialized inet->cork.fl.u.ip6 structure. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com> Link: https://patch.msgid.link/20260206173426.1638518-6-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-02-10	ipv6: inet6_csk_xmit() and inet6_csk_update_pmtu() use inet->cork.fl.u.ip6	Eric Dumazet
	Convert inet6_csk_route_socket() to use np->final instead of an automatic variable to get rid of a stack canary. Convert inet6_csk_xmit() and inet6_csk_update_pmtu() to use inet->cork.fl.u.ip6 instead of @fl6 automatic variable. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com> Link: https://patch.msgid.link/20260206173426.1638518-5-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-02-10	ipv6: use inet->cork.fl.u.ip6 and np->final in ip6_datagram_dst_update()	Eric Dumazet
	Get rid of @fl6 and &final variables in ip6_datagram_dst_update(). Use instead inet->cork.fl.u.ip6 and np->final so that a stack canary is no longer needed. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com> Link: https://patch.msgid.link/20260206173426.1638518-4-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-02-10	ipv6: use np->final in inet6_sk_rebuild_header()	Eric Dumazet
	Instead of using an automatic variable, use np->final to get rid of the stack canary in inet6_sk_rebuild_header(). Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com> Link: https://patch.msgid.link/20260206173426.1638518-3-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-02-10	Merge tag 'nf-next-26-02-06' of ↵	Jakub Kicinski
	https://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next Florian Westphal says: ==================== netfilter: updates for net-next The following patchset contains Netfilter updates for net-next: 1) Fix net-next-only use-after-free bug in nf_tables rbtree set: Expired elements cannot be released right away after unlink anymore because there is no guarantee that the binary-search blob is going to be updated. Spotted by syzkaller. 2) Fix esoteric bug in nf_queue with udp fraglist gro, broken since 6.11. Patch 3 adds extends the nfqueue selftest for this. 4) Use dedicated slab for flowtable entries, currently the -512 cache is used, which is wasteful. From Qingfang Deng. 5) Recent net-next update extended existing test for ip6ip6 tunnels, add the required /config entry. Test still passed by accident because the previous tests network setup gets re-used, so also update the test so it will fail in case the ip6ip6 tunnel interface cannot be added. 6) Fix 'nft get element mytable myset { 1.2.3.4 }' on big endian platforms, this was broken since code was added in v5.1. 7) Fix nf_tables counter reset support on 32bit platforms, where counter reset may cause huge values to appear due to wraparound. Broken since reset feature was added in v6.11. From Anders Grahn. 8-11) update nf_tables rbtree set type to detect partial operlaps. This will eventually speed up nftables userspace: at this time userspace does a netlink dump of the set content which slows down incremental updates on interval sets. From Pablo Neira Ayuso. * tag 'nf-next-26-02-06' of https://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next: netfilter: nft_set_rbtree: validate open interval overlap netfilter: nft_set_rbtree: validate element belonging to interval netfilter: nft_set_rbtree: check for partial overlaps in anonymous sets netfilter: nft_set_rbtree: fix bogus EEXIST with NLM_F_CREATE with null interval netfilter: nft_counter: fix reset of counters on 32bit archs netfilter: nft_set_hash: fix get operation on big endian selftests: netfilter: add IPV6_TUNNEL to config netfilter: flowtable: dedicated slab for flow entry selftests: netfilter: nft_queue.sh: add udp fraglist gro test case netfilter: nfnetlink_queue: do shared-unconfirmed check before segmentation netfilter: nft_set_rbtree: don't gc elements on insert ==================== Link: https://patch.msgid.link/20260206153048.17570-1-fw@strlen.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-02-10	mptcp: allow overridden write_space to be invoked	Geliang Tang
	Future extensions with psock will override their own sk->sk_write_space callback. This patch ensures that the overridden sk_write_space can be invoked by MPTCP. INDIRECT_CALL is used to keep the default path optimised. Note that sk->sk_write_space was never called directly with MPTCP sockets, so changing it to sk_stream_write_space in the init, and using it from mptcp_write_space() is not supposed to change the current behaviour. This patch is shared early to ease discussions around future RFC and avoid confusions with this "fix" that is needed for different future extensions. Suggested-by: Paolo Abeni <pabeni@redhat.com> Co-developed-by: Gang Yan <yangang@kylinos.cn> Signed-off-by: Gang Yan <yangang@kylinos.cn> Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn> Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20260206-net-next-mptcp-write_space-override-v2-1-e0b12be818c6@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-02-10	Merge tag 'locking-core-2026-02-08' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull locking updates from Ingo Molnar: "Lock debugging: - Implement compiler-driven static analysis locking context checking, using the upcoming Clang 22 compiler's context analysis features (Marco Elver) We removed Sparse context analysis support, because prior to removal even a defconfig kernel produced 1,700+ context tracking Sparse warnings, the overwhelming majority of which are false positives. On an allmodconfig kernel the number of false positive context tracking Sparse warnings grows to over 5,200... On the plus side of the balance actual locking bugs found by Sparse context analysis is also rather ... sparse: I found only 3 such commits in the last 3 years. So the rate of false positives and the maintenance overhead is rather high and there appears to be no active policy in place to achieve a zero-warnings baseline to move the annotations & fixers to developers who introduce new code. Clang context analysis is more complete and more aggressive in trying to find bugs, at least in principle. Plus it has a different model to enabling it: it's enabled subsystem by subsystem, which results in zero warnings on all relevant kernel builds (as far as our testing managed to cover it). Which allowed us to enable it by default, similar to other compiler warnings, with the expectation that there are no warnings going forward. This enforces a zero-warnings baseline on clang-22+ builds (Which are still limited in distribution, admittedly) Hopefully the Clang approach can lead to a more maintainable zero-warnings status quo and policy, with more and more subsystems and drivers enabling the feature. Context tracking can be enabled for all kernel code via WARN_CONTEXT_ANALYSIS_ALL=y (default disabled), but this will generate a lot of false positives. ( Having said that, Sparse support could still be added back, if anyone is interested - the removal patch is still relatively straightforward to revert at this stage. ) Rust integration updates: (Alice Ryhl, Fujita Tomonori, Boqun Feng) - Add support for Atomic<i8/i16/bool> and replace most Rust native AtomicBool usages with Atomic<bool> - Clean up LockClassKey and improve its documentation - Add missing Send and Sync trait implementation for SetOnce - Make ARef Unpin as it is supposed to be - Add __rust_helper to a few Rust helpers as a preparation for helper LTO - Inline various lock related functions to avoid additional function calls WW mutexes: - Extend ww_mutex tests and other test-ww_mutex updates (John Stultz) Misc fixes and cleanups: - rcu: Mark lockdep_assert_rcu_helper() __always_inline (Arnd Bergmann) - locking/local_lock: Include more missing headers (Peter Zijlstra) - seqlock: fix scoped_seqlock_read kernel-doc (Randy Dunlap) - rust: sync: Replace `kernel::c_str!` with C-Strings (Tamir Duberstein)" * tag 'locking-core-2026-02-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (90 commits) locking/rwlock: Fix write_trylock_irqsave() with CONFIG_INLINE_WRITE_TRYLOCK rcu: Mark lockdep_assert_rcu_helper() __always_inline compiler-context-analysis: Remove __assume_ctx_lock from initializers tomoyo: Use scoped init guard crypto: Use scoped init guard kcov: Use scoped init guard compiler-context-analysis: Introduce scoped init guards cleanup: Make __DEFINE_LOCK_GUARD handle commas in initializers seqlock: fix scoped_seqlock_read kernel-doc tools: Update context analysis macros in compiler_types.h rust: sync: Replace `kernel::c_str!` with C-Strings rust: sync: Inline various lock related methods rust: helpers: Move #define __rust_helper out of atomic.c rust: wait: Add __rust_helper to helpers rust: time: Add __rust_helper to helpers rust: task: Add __rust_helper to helpers rust: sync: Add __rust_helper to helpers rust: refcount: Add __rust_helper to helpers rust: rcu: Add __rust_helper to helpers rust: processor: Add __rust_helper to helpers ...
2026-02-10	Merge tag 'bpf-next-7.0' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next Pull bpf updates from Alexei Starovoitov: - Support associating BPF program with struct_ops (Amery Hung) - Switch BPF local storage to rqspinlock and remove recursion detection counters which were causing false positives (Amery Hung) - Fix live registers marking for indirect jumps (Anton Protopopov) - Introduce execution context detection BPF helpers (Changwoo Min) - Improve verifier precision for 32bit sign extension pattern (Cupertino Miranda) - Optimize BTF type lookup by sorting vmlinux BTF and doing binary search (Donglin Peng) - Allow states pruning for misc/invalid slots in iterator loops (Eduard Zingerman) - In preparation for ASAN support in BPF arenas teach libbpf to move global BPF variables to the end of the region and enable arena kfuncs while holding locks (Emil Tsalapatis) - Introduce support for implicit arguments in kfuncs and migrate a number of them to new API. This is a prerequisite for cgroup sub-schedulers in sched-ext (Ihor Solodrai) - Fix incorrect copied_seq calculation in sockmap (Jiayuan Chen) - Fix ORC stack unwind from kprobe_multi (Jiri Olsa) - Speed up fentry attach by using single ftrace direct ops in BPF trampolines (Jiri Olsa) - Require frozen map for calculating map hash (KP Singh) - Fix lock entry creation in TAS fallback in rqspinlock (Kumar Kartikeya Dwivedi) - Allow user space to select cpu in lookup/update operations on per-cpu array and hash maps (Leon Hwang) - Make kfuncs return trusted pointers by default (Matt Bobrowski) - Introduce "fsession" support where single BPF program is executed upon entry and exit from traced kernel function (Menglong Dong) - Allow bpf_timer and bpf_wq use in all programs types (Mykyta Yatsenko, Andrii Nakryiko, Kumar Kartikeya Dwivedi, Alexei Starovoitov) - Make KF_TRUSTED_ARGS the default for all kfuncs and clean up their definition across the tree (Puranjay Mohan) - Allow BPF arena calls from non-sleepable context (Puranjay Mohan) - Improve register id comparison logic in the verifier and extend linked registers with negative offsets (Puranjay Mohan) - In preparation for BPF-OOM introduce kfuncs to access memcg events (Roman Gushchin) - Use CFI compatible destructor kfunc type (Sami Tolvanen) - Add bitwise tracking for BPF_END in the verifier (Tianci Cao) - Add range tracking for BPF_DIV and BPF_MOD in the verifier (Yazhou Tang) - Make BPF selftests work with 64k page size (Yonghong Song) * tag 'bpf-next-7.0' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (268 commits) selftests/bpf: Fix outdated test on storage->smap selftests/bpf: Choose another percpu variable in bpf for btf_dump test selftests/bpf: Remove test_task_storage_map_stress_lookup selftests/bpf: Update task_local_storage/task_storage_nodeadlock test selftests/bpf: Update task_local_storage/recursion test selftests/bpf: Update sk_storage_omem_uncharge test bpf: Switch to bpf_selem_unlink_nofail in bpf_local_storage_{map_free, destroy} bpf: Support lockless unlink when freeing map or local storage bpf: Prepare for bpf_selem_unlink_nofail() bpf: Remove unused percpu counter from bpf_local_storage_map_free bpf: Remove cgroup local storage percpu counter bpf: Remove task local storage percpu counter bpf: Change local_storage->lock and b->lock to rqspinlock bpf: Convert bpf_selem_unlink to failable bpf: Convert bpf_selem_link_map to failable bpf: Convert bpf_selem_unlink_map to failable bpf: Select bpf_local_storage_map_bucket based on bpf_local_storage selftests/xsk: fix number of Tx frags in invalid packet selftests/xsk: properly handle batch ending in the middle of a packet bpf: Prevent reentrance into call_rcu_tasks_trace() ...
2026-02-10	Merge tag 'libcrypto-for-linus' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux Pull crypto library updates from Eric Biggers: - Add support for verifying ML-DSA signatures. ML-DSA (Module-Lattice-Based Digital Signature Algorithm) is a recently-standardized post-quantum (quantum-resistant) signature algorithm. It was known as Dilithium pre-standardization. The first use case in the kernel will be module signing. But there are also other users of RSA and ECDSA signatures in the kernel that might want to upgrade to ML-DSA eventually. - Improve the AES library: - Make the AES key expansion and single block encryption and decryption functions use the architecture-optimized AES code. Enable these optimizations by default. - Support preparing an AES key for encryption-only, using about half as much memory as a bidirectional key. - Replace the existing two generic implementations of AES with a single one. - Simplify how Adiantum message hashing is implemented. Remove the "nhpoly1305" crypto_shash in favor of direct lib/crypto/ support for NH hashing, and enable optimizations by default. * tag 'libcrypto-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux: (53 commits) lib/crypto: mldsa: Clarify the documentation for mldsa_verify() slightly lib/crypto: aes: Drop 'volatile' from aes_sbox and aes_inv_sbox lib/crypto: aes: Remove old AES en/decryption functions lib/crypto: aesgcm: Use new AES library API lib/crypto: aescfb: Use new AES library API crypto: omap - Use new AES library API crypto: inside-secure - Use new AES library API crypto: drbg - Use new AES library API crypto: crypto4xx - Use new AES library API crypto: chelsio - Use new AES library API crypto: ccp - Use new AES library API crypto: x86/aes-gcm - Use new AES library API crypto: arm64/ghash - Use new AES library API crypto: arm/ghash - Use new AES library API staging: rtl8723bs: core: Use new AES library API net: phy: mscc: macsec: Use new AES library API chelsio: Use new AES library API Bluetooth: SMP: Use new AES library API crypto: x86/aes - Remove the superseded AES-NI crypto_cipher lib/crypto: x86/aes: Add AES-NI optimization ...
2026-02-10	net: dsa: eliminate local type for tc policers	Vladimir Oltean
	David Yang is saying that struct flow_action_entry in include/net/flow_offload.h has gained new fields and DSA's struct dsa_mall_policer_tc_entry, derived from that, isn't keeping up. This structure is passed to drivers and they are completely oblivious to the values of fields they don't see. This has happened before, and almost always the solution was to make the DSA layer thinner and use the upstream data structures. Here, the reason why we didn't do that is because struct flow_action_entry :: police is an anonymous structure. That is easily enough fixable, just name those fields "struct flow_action_police" and reference them from DSA. Make the according transformations to the two users (sja1105 and felix): "rate_bytes_per_sec" -> "rate_bytes_ps". Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Co-developed-by: David Yang <mmyangfl@gmail.com> Signed-off-by: David Yang <mmyangfl@gmail.com> Link: https://patch.msgid.link/20260206075427.44733-1-mmyangfl@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2026-02-10	xfrm: fix ip_rt_bug race in icmp_route_lookup reverse path	Jiayuan Chen
	icmp_route_lookup() performs multiple route lookups to find a suitable route for sending ICMP error messages, with special handling for XFRM (IPsec) policies. The lookup sequence is: 1. First, lookup output route for ICMP reply (dst = original src) 2. Pass through xfrm_lookup() for policy check 3. If blocked (-EPERM) or dst is not local, enter "reverse path" 4. In reverse path, call xfrm_decode_session_reverse() to get fl4_dec which reverses the original packet's flow (saddr<->daddr swapped) 5. If fl4_dec.saddr is local (we are the original destination), use __ip_route_output_key() for output route lookup 6. If fl4_dec.saddr is NOT local (we are a forwarding node), use ip_route_input() to simulate the reverse packet's input path 7. Finally, pass rt2 through xfrm_lookup() with XFRM_LOOKUP_ICMP flag The bug occurs in step 6: ip_route_input() is called with fl4_dec.daddr (original packet's source) as destination. If this address becomes local between the initial check and ip_route_input() call (e.g., due to concurrent "ip addr add"), ip_route_input() returns a LOCAL route with dst.output set to ip_rt_bug. This route is then used for ICMP output, causing dst_output() to call ip_rt_bug(), triggering a WARN_ON: ------------[ cut here ]------------ WARNING: net/ipv4/route.c:1275 at ip_rt_bug+0x21/0x30, CPU#1 Call Trace: <TASK> ip_push_pending_frames+0x202/0x240 icmp_push_reply+0x30d/0x430 __icmp_send+0x1149/0x24f0 ip_options_compile+0xa2/0xd0 ip_rcv_finish_core+0x829/0x1950 ip_rcv+0x2d7/0x420 __netif_receive_skb_one_core+0x185/0x1f0 netif_receive_skb+0x90/0x450 tun_get_user+0x3413/0x3fb0 tun_chr_write_iter+0xe4/0x220 ... Fix this by checking rt2->rt_type after ip_route_input(). If it's RTN_LOCAL, the route cannot be used for output, so treat it as an error. The reproducer requires kernel modification to widen the race window, making it unsuitable as a selftest. It is available at: https://gist.github.com/mrpre/eae853b72ac6a750f5d45d64ddac1e81 Reported-by: syzbot+e738404dcd14b620923c@syzkaller.appspotmail.com Closes: https://lore.kernel.org/all/000000000000b1060905eada8881@google.com/T/ Closes: https://lore.kernel.org/r/20260128090523.356953-1-jiayuan.chen@linux.dev Fixes: 8b7817f3a959 ("[IPSEC]: Add ICMP host relookup support") Signed-off-by: Jiayuan Chen <jiayuan.chen@shopee.com> Signed-off-by: Jiayuan Chen <jiayuan.chen@linux.dev> Link: https://patch.msgid.link/20260206050220.59642-1-jiayuan.chen@linux.dev Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2026-02-10	hsr: Implement more robust duplicate discard for HSR	Felix Maurer
	The HSR duplicate discard algorithm had even more basic problems than the described for PRP in the previous patch. It relied only on the last received sequence number to decide if a new frame should be forwarded to any port. This does not work correctly in any case where frames are received out of order. The linked bug report claims that this can even happen with perfectly fine links due to the order in which incoming frames are processed (which can be unexpected on multi-core systems). The issue also occasionally shows up in the HSR selftests. The main reason is that the sequence number that was last forwarded to the master port may have skipped a number which will in turn never be delivered to the host. As the problem (we accidentally skip over a sequence number that has not been received but will be received in the future) is similar to PRP, we can apply a similar solution. The duplicate discard algorithm based on the "sparse bitmap" works well for HSR if it is extended to track one bitmap for each port (A, B, master, interlink). To do this, change the sequence number blocks to contain a flexible array member as the last member that can keep chunks for as many bitmaps as we need. This design makes it easy to reuse the same algorithm in a potential PRP RedBox implementation. The duplicate discard algorithm functions are modified to deal with sequence number blocks of different sizes and to correctly use the array of bitmap chunks. There is a notable speciality for HSR: the port type has a special port type NONE with value 0. This leads to the number of port types being 5 instead of actually 4. To save memory, remove the NONE port from the bitmap (by subtracting 1) when setting up the block buffer and when accessing the bitmap chunks in the array. Removing the old algorithm allows us to get rid of a few fields that are not needed any more: time_out and seq_out for each port. We can also remove some functions that were only necessary for the previous duplicate discard algorithm. The removal of seq_out is possible despite its previous usage in hsr_register_frame_in: it was used to prevent updates to time_in when "invalid" sequence numbers were received. With the new duplicate discard algorithm, time_in has no relevance for the expiry of sequence numbers anymore. They will expire based on the timestamps in the sequence number blocks after at most 400ms. There is no need that a node "re-registers" to "resume communication": after 400ms, all sequence numbers are accepted again. Also, according to the IEC 62439-3:2021, all nodes are supposed to send no traffic for 500ms after boot to lead exactly to this expiry of seen sequence numbers. time_in is still used for pruning nodes from the node table after no traffic has been received for 60sec. Pruning is only needed if the node is really gone and has not been sending any traffic for that period. seq_out was also used to report the last incoming sequence number from a node through netlink. I am not sure how useful this value is to userspace at all, but added getting it from the sequence number blocks. This number can be outdated after node merging until a new block has been added. Update the KUnit test for the PRP duplicate discard so that the node allocation matches and expectations on the removed fields are removed. Reported-by: Yoann Congal <yoann.congal@smile.fr> Closes: https://lore.kernel.org/netdev/7d221a07-8358-4c0b-a09c-3b029c052245@smile.fr/ Signed-off-by: Felix Maurer <fmaurer@redhat.com> Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Link: https://patch.msgid.link/36dc3bc5bdb7e68b70bb5ef86f53ca95a3f35418.1770299429.git.fmaurer@redhat.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2026-02-10	hsr: Implement more robust duplicate discard for PRP	Felix Maurer
	The PRP duplicate discard algorithm does not work reliably with certain link faults. Especially with packet loss on one link, the duplicate discard algorithm drops valid packets which leads to packet loss on the PRP interface where the link fault should in theory be perfectly recoverable by PRP. This happens because the algorithm opens the drop window on the lossy link, covering received and lost sequence numbers. If the other, non-lossy link receives the duplicate for a lost frame, it is within the drop window of the lossy link and therefore dropped. Since IEC 62439-3:2012, a node has one sequence number counter for frames it sends, instead of one sequence number counter for each destination. Therefore, a node can not expect to receive contiguous sequence numbers from a sender. A missing sequence number can be totally normal (if the sender intermittently communicates with another node) or mean a frame was lost. The algorithm, as previously implemented in commit 05fd00e5e7b1 ("net: hsr: Fix PRP duplicate detection"), was part of IEC 62439-3:2010 (HSRv0/PRPv0) but was removed with IEC 62439-3:2012 (HSRv1/PRPv1). Since that, no algorithm is specified but up to implementers. It should be "designed such that it never rejects a legitimate frame, while occasional acceptance of a duplicate can be tolerated" (IEC 62439-3:2021). For the duplicate discard algorithm, this means that 1) we need to track the sequence numbers individually to account for non-contiguous sequence numbers, and 2) we should always err on the side of accepting a duplicate than dropping a valid frame. The idea of the new algorithm is to store the seen sequence numbers in a bitmap. To keep the size of the bitmap in control, we store it as a "sparse bitmap" where the bitmap is split into blocks and not all blocks exist at the same time. The sparse bitmap is implemented using an xarray that keeps the references to the individual blocks and a backing ring buffer that stores the actual blocks. New blocks are initialized in the buffer and added to the xarray as needed when new frames arrive. Existing blocks are removed in two conditions: 1. The block found for an arriving sequence number is old and therefore not relevant to the duplicate discard algorithm anymore, i.e., it has been added more than the entry forget time ago. In this case, the block is removed from the xarray and marked as forgotten (by setting its timestamp to 0). 2. Space is needed in the ring buffer for a new block. In this case, the block is removed from the xarray, if it hasn't already been forgotten (by 1.). Afterwards, the new block is initialized in its place. This has the nice property that we can reliably track sequence numbers on low traffic situations (where they expire based on their timestamp) and more quickly forget sequence numbers in high traffic situations before they potentially wrap over and repeat before they are expired. When nodes are merged, the blocks are merged as well. The timestamp of a merged block is set to the minimum of the two timestamps to never keep around a seen sequence number for too long. The bitmaps are or'd to mark all seen sequence numbers as seen. All of this still happens under seq_out_lock, to prevent concurrent access to the blocks. The KUnit test for the algorithm is updated as well. The updates are done in a way to match the original intends pretty closely. Currently, there is much knowledge about the actual algorithm baked into the tests (especially the expectations) which may need some redesign in the future. Reported-by: Steffen Lindner <steffen.lindner@de.abb.com> Signed-off-by: Felix Maurer <fmaurer@redhat.com> Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Tested-by: Steffen Lindner <steffen.lindner@de.abb.com> Link: https://patch.msgid.link/8ce15a996099df2df5b700969a39e7df400e8dbb.1770299429.git.fmaurer@redhat.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2026-02-10	net: atm: fix crash due to unvalidated vcc pointer in sigd_send()	Jiayuan Chen
	Reproducer available at [1]. The ATM send path (sendmsg -> vcc_sendmsg -> sigd_send) reads the vcc pointer from msg->vcc and uses it directly without any validation. This pointer comes from userspace via sendmsg() and can be arbitrarily forged: int fd = socket(AF_ATMSVC, SOCK_DGRAM, 0); ioctl(fd, ATMSIGD_CTRL); // become ATM signaling daemon struct msghdr msg = { .msg_iov = &iov, ... }; (unsigned long )(buf + 4) = 0xdeadbeef; // fake vcc pointer sendmsg(fd, &msg, 0); // kernel dereferences 0xdeadbeef In normal operation, the kernel sends the vcc pointer to the signaling daemon via sigd_enq() when processing operations like connect(), bind(), or listen(). The daemon is expected to return the same pointer when responding. However, a malicious daemon can send arbitrary pointer values. Fix this by introducing find_get_vcc() which validates the pointer by searching through vcc_hash (similar to how sigd_close() iterates over all VCCs), and acquires a reference via sock_hold() if found. Since struct atm_vcc embeds struct sock as its first member, they share the same lifetime. Therefore using sock_hold/sock_put is sufficient to keep the vcc alive while it is being used. Note that there may be a race with sigd_close() which could mark the vcc with various flags (e.g., ATM_VF_RELEASED) after find_get_vcc() returns. However, sock_hold() guarantees the memory remains valid, so this race only affects the logical state, not memory safety. [1]: https://gist.github.com/mrpre/1ba5949c45529c511152e2f4c755b0f3 Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Reported-by: syzbot+1f22cb1769f249df9fa0@syzkaller.appspotmail.com Closes: https://lore.kernel.org/all/69039850.a70a0220.5b2ed.005d.GAE@google.com/T/ Signed-off-by: Jiayuan Chen <jiayuan.chen@shopee.com> Link: https://patch.msgid.link/20260205095501.131890-1-jiayuan.chen@linux.dev Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2026-02-09	Merge tag 'kthread-for-7.0' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/frederic/linux-dynticks Pull kthread updates from Frederic Weisbecker: "The kthread code provides an infrastructure which manages the preferred affinity of unbound kthreads (node or custom cpumask) against housekeeping (CPU isolation) constraints and CPU hotplug events. One crucial missing piece is the handling of cpuset: when an isolated partition is created, deleted, or its CPUs updated, all the unbound kthreads in the top cpuset become indifferently affine to _all_ the non-isolated CPUs, possibly breaking their preferred affinity along the way. Solve this with performing the kthreads affinity update from cpuset to the kthreads consolidated relevant code instead so that preferred affinities are honoured and applied against the updated cpuset isolated partitions. The dispatch of the new isolated cpumasks to timers, workqueues and kthreads is performed by housekeeping, as per the nice Tejun's suggestion. As a welcome side effect, HK_TYPE_DOMAIN then integrates both the set from boot defined domain isolation (through isolcpus=) and cpuset isolated partitions. Housekeeping cpumasks are now modifiable with a specific RCU based synchronization. A big step toward making nohz_full= also mutable through cpuset in the future" * tag 'kthread-for-7.0' of git://git.kernel.org/pub/scm/linux/kernel/git/frederic/linux-dynticks: (33 commits) doc: Add housekeeping documentation kthread: Document kthread_affine_preferred() kthread: Comment on the purpose and placement of kthread_affine_node() call kthread: Honour kthreads preferred affinity after cpuset changes sched/arm64: Move fallback task cpumask to HK_TYPE_DOMAIN sched: Switch the fallback task allowed cpumask to HK_TYPE_DOMAIN kthread: Rely on HK_TYPE_DOMAIN for preferred affinity management kthread: Include kthreadd to the managed affinity list kthread: Include unbound kthreads in the managed affinity list kthread: Refine naming of affinity related fields PCI: Remove superfluous HK_TYPE_WQ check sched/isolation: Remove HK_TYPE_TICK test from cpu_is_isolated() cpuset: Remove cpuset_cpu_is_isolated() timers/migration: Remove superfluous cpuset isolation test cpuset: Propagate cpuset isolation update to timers through housekeeping cpuset: Propagate cpuset isolation update to workqueue through housekeeping PCI: Flush PCI probe workqueue on cpuset isolated partition change sched/isolation: Flush vmstat workqueues on cpuset isolated partition change sched/isolation: Flush memcg workqueues on cpuset isolated partition change cpuset: Update HK_TYPE_DOMAIN cpumask from cpuset ...
2026-02-09	SUNRPC: fix gss_auth kref leak in gss_alloc_msg error path	Daniel Hodges
	Commit 5940d1cf9f42 ("SUNRPC: Rebalance a kref in auth_gss.c") added a kref_get(&gss_auth->kref) call to balance the gss_put_auth() done in gss_release_msg(), but forgot to add a corresponding kref_put() on the error path when kstrdup_const() fails. If service_name is non-NULL and kstrdup_const() fails, the function jumps to err_put_pipe_version which calls put_pipe_version() and kfree(gss_msg), but never releases the gss_auth reference. This leads to a kref leak where the gss_auth structure is never freed. Add a forward declaration for gss_free_callback() and call kref_put() in the err_put_pipe_version error path to properly release the reference taken earlier. Fixes: 5940d1cf9f42 ("SUNRPC: Rebalance a kref in auth_gss.c") Cc: stable@vger.kernel.org Signed-off-by: Daniel Hodges <git@danielhodges.dev> Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
2026-02-09	SUNRPC: Change list definition method	Chenguang Zhao
	The LIST_HEAD macro can both define a linked list and initialize it in one step. To simplify code, we replace the separate operations of linked list definition and manual initialization with the LIST_HEAD macro. Signed-off-by: Chenguang Zhao <zhaochenguang@kylinos.cn> Reviewed-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
2026-02-09	Merge tag 'audit-pr-20260203' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit Pull audit updates from Paul Moore: - Improve the NETFILTER_PKT audit records Add source and destination ports to the NETFILTER_PKT audit records while also consolidating a lot of the code into a new, singular audit_log_nf_skb() function. This new approach to structuring the NETFILTER_PKT record generation should eliminate some unnecessary overhead when audit is not built into the kernel. - Update the audit syscall classifier code Add the listxattrat(), getxattrat(), and fchmodat2() syscall to the audit code which classifies syscalls into categories of operations, e.g. "read" or "change attributes". - Move the syscall classifier declarations into audit_arch.h Shuffle around some header file declarations to resolve some sparse warnings. * tag 'audit-pr-20260203' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit: audit: move the compat_xxx_class[] extern declarations to audit_arch.h audit: add missing syscalls to read class audit: include source and destination ports to NETFILTER_PKT audit: add audit_log_nf_skb helper function audit: add fchmodat2() to change attributes class
2026-02-09	libceph: adapt ceph_x_challenge_blob hashing and msgr1 message signing	Ilya Dryomov
	The existing approach where ceph_x_challenge_blob is encrypted with the client's secret key and then the digest derived from the ciphertext is used for the test doesn't work with CEPH_CRYPTO_AES256KRB5 because the confounder randomizes the ciphertext: the client and the server get two different ciphertexts and therefore two different digests. msgr1 signatures are affected the same way: a digest derived from the ciphertext for the message's "sigblock" is what becomes a signature and the two sides disagree on the expected value. For CEPH_CRYPTO_AES256KRB5 (and potential future encryption schemes), switch to HMAC-SHA256 function keyed in the same way as the existing encryption. For CEPH_CRYPTO_AES, everything is preserved as is. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2026-02-09	libceph: add support for CEPH_CRYPTO_AES256KRB5	Ilya Dryomov
	This is based on AES256-CTS-HMAC384-192 crypto algorithm per RFC 8009 (i.e. Kerberos 5, hence the name) with custom-defined key usage numbers. The implementation allows a given key to have/be linked to between one and three usage numbers. The existing CEPH_CRYPTO_AES remains in place and unchanged. The usage_slot parameter that needed to be added to ceph_crypt() and its wrappers is simply ignored there. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2026-02-09	libceph: introduce ceph_crypto_key_prepare()	Ilya Dryomov
	In preparation for bringing in a new encryption scheme/key type, decouple decoding or cloning the key from allocating required crypto API objects and setting them up. The rationale is that a) in some cases a shallow clone is sufficient and b) ceph_crypto_key_prepare() may grow additional parameters that would be inconvenient to provide at the point the key is originally decoded. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2026-02-09	libceph: generalize ceph_x_encrypt_offset() and ceph_x_encrypt_buflen()	Ilya Dryomov
	- introduce the notion of a data offset for ceph_x_encrypt_offset() to allow for e.g. confounder to be prepended before the encryption header in the future. For CEPH_CRYPTO_AES, the data offset is 0 (i.e. nothing is prepended). - adjust ceph_x_encrypt_buflen() accordingly and make it account for PKCS#7 padding that is used by CEPH_CRYPTO_AES precisely instead of just always adding 16. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2026-02-09	libceph: define and enforce CEPH_MAX_KEY_LEN	Ilya Dryomov
	When decoding the key, verify that the key material would fit into a fixed-size buffer in process_auth_done() and generally has a sane length. The new CEPH_MAX_KEY_LEN check replaces the existing check for a key with no key material which is a) not universal since CEPH_CRYPTO_NONE has to be excluded and b) doesn't provide much value since a smaller than needed key is just as invalid as no key -- this has to be handled elsewhere anyway. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2026-02-09	xfrm: always flush state and policy upon NETDEV_UNREGISTER event	Tetsuo Handa
	syzbot is reporting that "struct xfrm_state" refcount is leaking. unregister_netdevice: waiting for netdevsim0 to become free. Usage count = 2 ref_tracker: netdev@ffff888052f24618 has 1/1 users at __netdev_tracker_alloc include/linux/netdevice.h:4400 [inline] netdev_tracker_alloc include/linux/netdevice.h:4412 [inline] xfrm_dev_state_add+0x3a5/0x1080 net/xfrm/xfrm_device.c:316 xfrm_state_construct net/xfrm/xfrm_user.c:986 [inline] xfrm_add_sa+0x34ff/0x5fa0 net/xfrm/xfrm_user.c:1022 xfrm_user_rcv_msg+0x58e/0xc00 net/xfrm/xfrm_user.c:3507 netlink_rcv_skb+0x158/0x420 net/netlink/af_netlink.c:2550 xfrm_netlink_rcv+0x71/0x90 net/xfrm/xfrm_user.c:3529 netlink_unicast_kernel net/netlink/af_netlink.c:1318 [inline] netlink_unicast+0x5aa/0x870 net/netlink/af_netlink.c:1344 netlink_sendmsg+0x8c8/0xdd0 net/netlink/af_netlink.c:1894 sock_sendmsg_nosec net/socket.c:727 [inline] __sock_sendmsg net/socket.c:742 [inline] ____sys_sendmsg+0xa5d/0xc30 net/socket.c:2592 ___sys_sendmsg+0x134/0x1d0 net/socket.c:2646 __sys_sendmsg+0x16d/0x220 net/socket.c:2678 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline] do_syscall_64+0xcd/0xf80 arch/x86/entry/syscall_64.c:94 entry_SYSCALL_64_after_hwframe+0x77/0x7f This is because commit d77e38e612a0 ("xfrm: Add an IPsec hardware offloading API") implemented xfrm_dev_unregister() as no-op despite xfrm_dev_state_add() from xfrm_state_construct() acquires a reference to "struct net_device". I guess that that commit expected that NETDEV_DOWN event is fired before NETDEV_UNREGISTER event fires, and also assumed that xfrm_dev_state_add() is called only if (dev->features & NETIF_F_HW_ESP) != 0. Sabrina Dubroca identified steps to reproduce the same symptoms as below. echo 0 > /sys/bus/netdevsim/new_device dev=$(ls -1 /sys/bus/netdevsim/devices/netdevsim0/net/) ip xfrm state add src 192.168.13.1 dst 192.168.13.2 proto esp \ spi 0x1000 mode tunnel aead 'rfc4106(gcm(aes))' $key 128 \ offload crypto dev $dev dir out ethtool -K $dev esp-hw-offload off echo 0 > /sys/bus/netdevsim/del_device Like these steps indicate, the NETIF_F_HW_ESP bit can be cleared after xfrm_dev_state_add() acquired a reference to "struct net_device". Also, xfrm_dev_state_add() does not check for the NETIF_F_HW_ESP bit when acquiring a reference to "struct net_device". Commit 03891f820c21 ("xfrm: handle NETDEV_UNREGISTER for xfrm device") re-introduced the NETDEV_UNREGISTER event to xfrm_dev_event(), but that commit for unknown reason chose to share xfrm_dev_down() between the NETDEV_DOWN event and the NETDEV_UNREGISTER event. I guess that that commit missed the behavior in the previous paragraph. Therefore, we need to re-introduce xfrm_dev_unregister() in order to release the reference to "struct net_device" by unconditionally flushing state and policy. Reported-by: syzbot+881d65229ca4f9ae8c84@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=881d65229ca4f9ae8c84 Fixes: d77e38e612a0 ("xfrm: Add an IPsec hardware offloading API") Cc: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2026-02-06	net/ipv6: Remove jumbo_remove step from TX path	Alice Mikityanska
	Now that the kernel doesn't insert HBH for BIG TCP IPv6 packets, remove unnecessary steps from the GSO TX path, that used to check and remove HBH. Signed-off-by: Alice Mikityanska <alice@isovalent.com> Acked-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20260205133925.526371-5-alice.kernel@fastmail.im Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-02-06	net/ipv6: Drop HBH for BIG TCP on RX side	Alice Mikityanska
	Complementary to the previous commit, stop inserting HBH when building BIG TCP GRO SKBs. Signed-off-by: Alice Mikityanska <alice@isovalent.com> Acked-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20260205133925.526371-4-alice.kernel@fastmail.im Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-02-06	net/ipv6: Drop HBH for BIG TCP on TX side	Alice Mikityanska
	BIG TCP IPv6 inserts a hop-by-hop extension header to indicate the real IPv6 payload length when it doesn't fit into the 16-bit field in the IPv6 header itself. While it helps tools parse the packet, it also requires every driver that supports TSO and BIG TCP to remove this 8-byte extension header. It might not sound that bad until we try to apply it to tunneled traffic. Currently, the drivers don't attempt to strip HBH if skb->encapsulation = 1. Moreover, trying to do so would require dissecting different tunnel protocols and making corresponding adjustments on case-by-case basis, which would slow down the fastpath (potentially also requiring adjusting checksums in outer headers). At the same time, BIG TCP IPv4 doesn't insert any extra headers and just calculates the payload length from skb->len, significantly simplifying implementing BIG TCP for tunnels. Stop inserting HBH when building BIG TCP GSO SKBs. Signed-off-by: Alice Mikityanska <alice@isovalent.com> Acked-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20260205133925.526371-3-alice.kernel@fastmail.im Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-02-06	net/ipv6: Introduce payload_len helpers	Alice Mikityanska
	The next commits will transition away from using the hop-by-hop extension header to encode packet length for BIG TCP. Add wrappers around ip6->payload_len that return the actual value if it's non-zero, and calculate it from skb->len if payload_len is set to zero (and a symmetrical setter). The new helpers are used wherever the surrounding code supports the hop-by-hop jumbo header for BIG TCP IPv6, or the corresponding IPv4 code uses skb_ip_totlen (e.g., in include/net/netfilter/nf_tables_ipv6.h). No behavioral change in this commit. Signed-off-by: Alice Mikityanska <alice@isovalent.com> Acked-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20260205133925.526371-2-alice.kernel@fastmail.im Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-02-06	mptcp: fix kdoc warnings	Matthieu Baerts (NGI0)
	The following warnings were visible: $ ./scripts/kernel-doc -Wall -none \ net/mptcp/ include/net/mptcp.h include/uapi/linux/mptcp.h \ include/trace/events/mptcp.h Warning: net/mptcp/token.c:108 No description found for return value of 'mptcp_token_new_request' Warning: net/mptcp/token.c:151 No description found for return value of 'mptcp_token_new_connect' Warning: net/mptcp/token.c:246 No description found for return value of 'mptcp_token_get_sock' Warning: net/mptcp/token.c:298 No description found for return value of 'mptcp_token_iter_next' Warning: net/mptcp/protocol.c:4431 No description found for return value of 'mptcp_splice_read' Warning: include/uapi/linux/mptcp_pm.h:13 missing initial short description on line: enum mptcp_event_type Address all of them: either by using the 'Return:' keyword, or by adding a missing initial short description. The MPTCP CI will soon report issues with kdoc to avoid introducing new issues and being flagged by the Netdev CI. Reviewed-by: Geliang Tang <geliang@kernel.org> Reviewed-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20260205-net-mptcp-misc-fixes-6-19-rc8-v2-3-c2720ce75c34@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-02-06	mptcp: pm: in-kernel: clarify mptcp_pm_remove_anno_addr()	Matthieu Baerts (NGI0)
	The variable 'ret' was used, but it was not cleared what it was, and probably led to an issue [1]. Rename it to 'announced' to avoid confusions. While at it, remove the returned value of the helper: it is only used in one place, and the returned value is not used. Link: https://github.com/multipath-tcp/mptcp_net-next/issues/606 [1] Reviewed-by: Mat Martineau <martineau@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20260205-net-mptcp-misc-fixes-6-19-rc8-v2-2-c2720ce75c34@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-02-06	mptcp: pm: in-kernel: always set ID as avail when rm endp	Matthieu Baerts (NGI0)
	Syzkaller managed to find a combination of actions that was generating this warning: WARNING: net/mptcp/pm_kernel.c:1074 at __mark_subflow_endp_available net/mptcp/pm_kernel.c:1074 [inline], CPU#1: syz.7.48/2535 WARNING: net/mptcp/pm_kernel.c:1074 at mptcp_pm_nl_fullmesh net/mptcp/pm_kernel.c:1446 [inline], CPU#1: syz.7.48/2535 WARNING: net/mptcp/pm_kernel.c:1074 at mptcp_pm_nl_set_flags_all net/mptcp/pm_kernel.c:1474 [inline], CPU#1: syz.7.48/2535 WARNING: net/mptcp/pm_kernel.c:1074 at mptcp_pm_nl_set_flags+0x5de/0x640 net/mptcp/pm_kernel.c:1538, CPU#1: syz.7.48/2535 Modules linked in: CPU: 1 UID: 0 PID: 2535 Comm: syz.7.48 Not tainted 6.18.0-03987-gea5f5e676cf5 #17 PREEMPT(voluntary) Hardware name: QEMU Ubuntu 25.10 PC (i440FX + PIIX, 1996), BIOS 1.17.0-debian-1.17.0-1 04/01/2014 RIP: 0010:__mark_subflow_endp_available net/mptcp/pm_kernel.c:1074 [inline] RIP: 0010:mptcp_pm_nl_fullmesh net/mptcp/pm_kernel.c:1446 [inline] RIP: 0010:mptcp_pm_nl_set_flags_all net/mptcp/pm_kernel.c:1474 [inline] RIP: 0010:mptcp_pm_nl_set_flags+0x5de/0x640 net/mptcp/pm_kernel.c:1538 Code: 89 c7 e8 c5 8c 73 fe e9 f7 fd ff ff 49 83 ef 80 e8 b7 8c 73 fe 4c 89 ff be 03 00 00 00 e8 4a 29 e3 fe eb ac e8 a3 8c 73 fe 90 <0f> 0b 90 e9 3d ff ff ff e8 95 8c 73 fe b8 a1 ff ff ff eb 1a e8 89 RSP: 0018:ffffc9001535b820 EFLAGS: 00010287 netdevsim0: tun_chr_ioctl cmd 1074025677 RAX: ffffffff82da294d RBX: 0000000000000001 RCX: 0000000000080000 RDX: ffffc900096d0000 RSI: 00000000000006d6 RDI: 00000000000006d7 netdevsim0: linktype set to 823 RBP: ffff88802cdb2240 R08: 00000000000104ae R09: ffffffffffffffff R10: ffffffff82da27d4 R11: 0000000000000000 R12: 0000000000000000 R13: ffff88801246d8c0 R14: ffffc9001535b8b8 R15: ffff88802cdb1800 FS: 00007fc6ac5a76c0(0000) GS:ffff8880f90c8000(0000) knlGS:0000000000000000 netlink: 'syz.3.50': attribute type 5 has an invalid length. CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 netlink: 1232 bytes leftover after parsing attributes in process `syz.3.50'. CR2: 0000200000010000 CR3: 0000000025b1a000 CR4: 0000000000350ef0 Call Trace: <TASK> mptcp_pm_set_flags net/mptcp/pm_netlink.c:277 [inline] mptcp_pm_nl_set_flags_doit+0x1d7/0x210 net/mptcp/pm_netlink.c:282 genl_family_rcv_msg_doit+0x117/0x180 net/netlink/genetlink.c:1115 genl_family_rcv_msg net/netlink/genetlink.c:1195 [inline] genl_rcv_msg+0x3a8/0x3f0 net/netlink/genetlink.c:1210 netlink_rcv_skb+0x16d/0x240 net/netlink/af_netlink.c:2550 genl_rcv+0x28/0x40 net/netlink/genetlink.c:1219 netlink_unicast_kernel net/netlink/af_netlink.c:1318 [inline] netlink_unicast+0x3e9/0x4c0 net/netlink/af_netlink.c:1344 netlink_sendmsg+0x4ab/0x5b0 net/netlink/af_netlink.c:1894 sock_sendmsg_nosec net/socket.c:718 [inline] __sock_sendmsg+0xc9/0xf0 net/socket.c:733 ____sys_sendmsg+0x272/0x3b0 net/socket.c:2608 ___sys_sendmsg+0x2de/0x320 net/socket.c:2662 __sys_sendmsg net/socket.c:2694 [inline] __do_sys_sendmsg net/socket.c:2699 [inline] __se_sys_sendmsg net/socket.c:2697 [inline] __x64_sys_sendmsg+0x110/0x1a0 net/socket.c:2697 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline] do_syscall_64+0xed/0x360 arch/x86/entry/syscall_64.c:94 entry_SYSCALL_64_after_hwframe+0x77/0x7f RIP: 0033:0x7fc6adb66f6d Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 e8 ff ff ff f7 d8 64 89 01 48 RSP: 002b:00007fc6ac5a6ff8 EFLAGS: 00000246 ORIG_RAX: 000000000000002e RAX: ffffffffffffffda RBX: 00007fc6addf5fa0 RCX: 00007fc6adb66f6d RDX: 0000000000048084 RSI: 00002000000002c0 RDI: 000000000000000e RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000 netlink: 'syz.5.51': attribute type 2 has an invalid length. R13: 00007fff25e91fe0 R14: 00007fc6ac5a7ce4 R15: 00007fff25e920d7 </TASK> The actions that caused that seem to be: - Create an MPTCP endpoint for address A without any flags - Create a new MPTCP connection from address A - Remove the MPTCP endpoint: the corresponding subflows will be removed - Recreate the endpoint with the same ID, but with the subflow flag - Change the same endpoint to add the fullmesh flag In this case, msk->pm.local_addr_used has been kept to 0 as expected, but the corresponding bit in msk->pm.id_avail_bitmap was still unset after having removed the endpoint, causing the splat later on. When removing an endpoint, the corresponding endpoint ID was only marked as available for "signal" types with an announced address, plus all "subflow" types, but not the other types like an endpoint corresponding to the initial subflow. In these cases, re-creating an endpoint with the same ID didn't signal/create anything. Here, adding the fullmesh flag was creating the splat when calling __mark_subflow_endp_available() from mptcp_pm_nl_fullmesh(), because msk->pm.local_addr_used was set to 0 while the ID was marked as used. To fix this issue, the corresponding bit in msk->pm.id_avail_bitmap can always be set as available when removing an MPTCP in-kernel endpoint. In other words, moving the call to __set_bit() to do it in all cases, except for "subflow" types where this bit is handled in a dedicated helper. Note: instead of adding a new spin_(un)lock_bh that would be taken in all cases, do all the actions requiring the spin lock under the same block. This modification potentially fixes another issue reported by syzbot, see [1]. But without a reproducer or more details about what exactly happened before, it is hard to confirm. Fixes: e255683c06df ("mptcp: pm: re-using ID of unused removed ADD_ADDR") Cc: stable@vger.kernel.org Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/606 Reported-by: syzbot+f56f7d56e2c6e11a01b6@syzkaller.appspotmail.com Closes: https://lore.kernel.org/68fcfc4a.050a0220.346f24.02fb.GAE@google.com [1] Reviewed-by: Mat Martineau <martineau@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20260205-net-mptcp-misc-fixes-6-19-rc8-v2-1-c2720ce75c34@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-02-06	ipv6: do not use skb_header_pointer() in icmpv6_filter()	Eric Dumazet
	Prefer pskb_may_pull() to avoid a stack canary in raw6_local_deliver(). Note: skb->head can change, hence we reload ip6h pointer in ipv6_raw_deliver() $ scripts/bloat-o-meter -t vmlinux.old vmlinux.new add/remove: 0/0 grow/shrink: 0/1 up/down: 0/-86 (-86) Function old new delta raw6_local_deliver 780 694 -86 Total: Before=24889784, After=24889698, chg -0.00% Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com> Link: https://patch.msgid.link/20260205211909.4115285-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-02-06	tcp: inline tcp_filter()	Eric Dumazet
	This helper is already (auto)inlined from IPv4 TCP stack. Make it an inline function to benefit IPv6 as well. $ scripts/bloat-o-meter -t vmlinux.old vmlinux.new add/remove: 0/2 grow/shrink: 1/0 up/down: 30/-49 (-19) Function old new delta tcp_v6_rcv 3448 3478 +30 __pfx_tcp_filter 16 - -16 tcp_filter 33 - -33 Total: Before=24891904, After=24891885, chg -0.00% Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com> Link: https://patch.msgid.link/20260205164329.3401481-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-02-06	net: skb: allow up to 8 skb extension ids	Oliver Hartkopp
	The skb extension ids range from 0 .. 7 to fit their bits as flags into a single byte. The ids are automatically enumnerated in enum skb_ext_id in skbuff.h, where SKB_EXT_NUM is defined as the last value. When having 8 skb extension ids (0 .. 7), SKB_EXT_NUM becomes 8 which is a valid value for SKB_EXT_NUM. Fixes: 96ea3a1e2d31 ("can: add CAN skb extension infrastructure") Link: https://lore.kernel.org/netdev/aXoMqaA7b2CqJZNA@strlen.de/ Reviewed-by: Florian Westphal <fw@strlen.de> Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net> Link: https://patch.msgid.link/20260205-skb_ext-v1-1-9ba992ccee8b@hartkopp.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-02-06	net_sched: sch_fq: rework fq_gc() to avoid stack canary	Eric Dumazet
	Using kmem_cache_free_bulk() in fq_gc() was not optimal. 1) It needs an array. 2) It is only saving cpu cycles for large batches. The automatic array forces a stack canary, which is expensive. In practice fq_gc was finding zero, one or two flows at most per round. Remove the array, use kmem_cache_free(). This makes fq_enqueue() smaller and faster. $ scripts/bloat-o-meter -t vmlinux.old vmlinux.new add/remove: 0/0 grow/shrink: 0/1 up/down: 0/-79 (-79) Function old new delta fq_enqueue 1629 1550 -79 Total: Before=24886583, After=24886504, chg -0.00% Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20260204190034.76277-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-02-06	netns: optimize netns cleaning by batching unhash_nsid calls	Qiliang Yuan
	Currently, unhash_nsid() scans the entire system for each netns being killed, leading to O(L_dying_net * M_alive_net * N_id) complexity, as __peernet2id() also performs a linear search in the IDR. Optimize this to O(M_alive_net * N_id) by batching unhash operations. Move unhash_nsid() out of the per-netns loop in cleanup_net() to perform a single-pass traversal over survivor namespaces. Identify dying peers by an 'is_dying' flag, which is set under net_rwsem write lock after the netns is removed from the global list. This batches the unhashing work and eliminates the O(L_dying_net) multiplier. To minimize the impact on struct net size, 'is_dying' is placed in an existing hole after 'hash_mix' in struct net. Use a restartable idr_get_next() loop for iteration. This avoids the unsafe modification issue inherent to idr_for_each() callbacks and allows dropping the nsid_lock to safely call sleepy rtnl_net_notifyid(). Clean up redundant nsid_lock and simplify the destruction loop now that unhashing is centralized. Signed-off-by: Qiliang Yuan <yuanql9@chinatelecom.cn> Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com> Link: https://patch.msgid.link/20260204074854.3506916-1-realwujing@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-02-06	bpf: Switch to bpf_selem_unlink_nofail in bpf_local_storage_{map_free, destroy}	Amery Hung
	Take care of rqspinlock error in bpf_local_storage_{map_free, destroy}() properly by switching to bpf_selem_unlink_nofail(). Both functions iterate their own RCU-protected list of selems and call bpf_selem_unlink_nofail(). In map_free(), to prevent infinite loop when both map_free() and destroy() fail to remove a selem from b->list (extremely unlikely), switch to hlist_for_each_entry_rcu(). In destroy(), also switch to hlist_for_each_entry_rcu() since we no longer iterate local_storage->list under local_storage->lock. bpf_selem_unlink() now becomes dedicated to helpers and syscalls paths so reuse_now should always be false. Remove it from the argument and hardcode it. Acked-by: Alexei Starovoitov <ast@kernel.org> Co-developed-by: Martin KaFai Lau <martin.lau@kernel.org> Signed-off-by: Amery Hung <ameryhung@gmail.com> Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Link: https://patch.msgid.link/20260205222916.1788211-12-ameryhung@gmail.com
2026-02-06	bpf: Remove unused percpu counter from bpf_local_storage_map_free	Amery Hung
	Percpu locks have been removed from cgroup and task local storage. Now that all local storage no longer use percpu variables as locks preventing recursion, there is no need to pass them to bpf_local_storage_map_free(). Remove the argument from the function. Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Amery Hung <ameryhung@gmail.com> Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Link: https://patch.msgid.link/20260205222916.1788211-9-ameryhung@gmail.com
2026-02-06	bpf: Convert bpf_selem_unlink to failable	Amery Hung
	To prepare changing both bpf_local_storage_map_bucket::lock and bpf_local_storage::lock to rqspinlock, convert bpf_selem_unlink() to failable. It still always succeeds and returns 0 until the change happens. No functional change. Open code bpf_selem_unlink_storage() in the only caller, bpf_selem_unlink(), since unlink_map and unlink_storage must be done together after all the necessary locks are acquired. For bpf_local_storage_map_free(), ignore the return from bpf_selem_unlink() for now. A later patch will allow it to unlink selems even when failing to acquire locks. Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Amery Hung <ameryhung@gmail.com> Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Link: https://patch.msgid.link/20260205222916.1788211-5-ameryhung@gmail.com
2026-02-06	bpf: Convert bpf_selem_link_map to failable	Amery Hung
	To prepare for changing bpf_local_storage_map_bucket::lock to rqspinlock, convert bpf_selem_link_map() to failable. It still always succeeds and returns 0 until the change happens. No functional change. Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Amery Hung <ameryhung@gmail.com> Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Link: https://patch.msgid.link/20260205222916.1788211-4-ameryhung@gmail.com
2026-02-06	bpf: Select bpf_local_storage_map_bucket based on bpf_local_storage	Amery Hung
	A later bpf_local_storage refactor will acquire all locks before performing any update. To simplified the number of locks needed to take in bpf_local_storage_map_update(), determine the bucket based on the local_storage an selem belongs to instead of the selem pointer. Currently, when a new selem needs to be created to replace the old selem in bpf_local_storage_map_update(), locks of both buckets need to be acquired to prevent racing. This can be simplified if the two selem belongs to the same bucket so that only one bucket needs to be locked. Therefore, instead of hashing selem, hashing the local_storage pointer the selem belongs. Performance wise, this is slightly better as update now requires locking one bucket. It should not change the level of contention on one bucket as the pointers to local storages of selems in a map are just as unique as pointers to selems. Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Amery Hung <ameryhung@gmail.com> Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Link: https://patch.msgid.link/20260205222916.1788211-2-ameryhung@gmail.com
2026-02-06	netfilter: nft_set_rbtree: validate open interval overlap	Pablo Neira Ayuso
	Open intervals do not have an end element, in particular an open interval at the end of the set is hard to validate because of it is lacking the end element, and interval validation relies on such end element to perform the checks. This patch adds a new flag field to struct nft_set_elem, this is not an issue because this is a temporary object that is allocated in the stack from the insert/deactivate path. This flag field is used to specify that this is the last element in this add/delete command. The last flag is used, in combination with the start element cookie, to check if there is a partial overlap, eg. Already exists: 255.255.255.0-255.255.255.254 Add interval: 255.255.255.0-255.255.255.255 ~~~~~~~~~~~~~ start element overlap Basically, the idea is to check for an existing end element in the set if there is an overlap with an existing start element. However, the last open interval can come in any position in the add command, the corner case can get a bit more complicated: Already exists: 255.255.255.0-255.255.255.254 Add intervals: 255.255.255.0-255.255.255.255,255.255.255.0-255.255.255.254 ~~~~~~~~~~~~~ start element overlap To catch this overlap, annotate that the new start element is a possible overlap, then report the overlap if the next element is another start element that confirms that previous element in an open interval at the end of the set. For deletions, do not update the start cookie when deleting an open interval, otherwise this can trigger spurious EEXIST when adding new elements. Unfortunately, there is no NFT_SET_ELEM_INTERVAL_OPEN flag which would make easier to detect open interval overlaps. Fixes: 7c84d41416d8 ("netfilter: nft_set_rbtree: Detect partial overlaps on insertion") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Florian Westphal <fw@strlen.de>
2026-02-06	netfilter: nft_set_rbtree: validate element belonging to interval	Pablo Neira Ayuso
	The existing partial overlap detection does not check if the elements belong to the interval, eg. add element inet x y { 1.1.1.1-2.2.2.2, 4.4.4.4-5.5.5.5 } add element inet x y { 1.1.1.1-5.5.5.5 } => this should fail: ENOENT Similar situation occurs with deletions: add element inet x y { 1.1.1.1-2.2.2.2, 4.4.4.4-5.5.5.5} delete element inet x y { 1.1.1.1-5.5.5.5 } => this should fail: ENOENT This currently works via mitigation by nft in userspace, which is performing the overlap detection before sending the elements to the kernel. This requires a previous netlink dump of the set content which slows down incremental updates on interval sets, because a netlink set content dump is needed. This patch extends the existing overlap detection to track the most recent start element that already exists. The pointer to the existing start element is stored as a cookie (no pointer dereference is ever possible). If the end element is added and it already exists, then check that the existing end element is adjacent to the already existing start element. Similar logic applies to element deactivation. This patch also annotates the timestamp to identify if start cookie comes from an older batch, in such case reset it. Otherwise, a failing create element command leaves the start cookie in place, resulting in bogus error reporting. There is still a few more corner cases of overlap detection related to the open interval that are addressed in follow up patches. This is address an early design mistake where an interval is expressed as two elements, using the NFT_SET_ELEM_INTERVAL_END flag, instead of the more recent NFTA_SET_ELEM_KEY_END attribute that pipapo already uses. Fixes: 7c84d41416d8 ("netfilter: nft_set_rbtree: Detect partial overlaps on insertion") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Florian Westphal <fw@strlen.de>
2026-02-06	netfilter: nft_set_rbtree: check for partial overlaps in anonymous sets	Pablo Neira Ayuso
	Userspace provides an optimized representation in case intervals are adjacent, where the end element is omitted. The existing partial overlap detection logic skips anonymous set checks on start elements for this reason. However, it is possible to add intervals that overlap to this anonymous where two start elements with the same, eg. A-B, A-C where C < B. start end A B start end A C Restore the check on overlapping start elements to report an overlap. Fixes: c9e6978e2725 ("netfilter: nft_set_rbtree: Switch to node list walk for overlap detection") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Florian Westphal <fw@strlen.de>
2026-02-06	netfilter: nft_set_rbtree: fix bogus EEXIST with NLM_F_CREATE with null interval	Pablo Neira Ayuso
	Userspace adds a non-matching null element to the kernel for historical reasons. This null element is added when the set is populated with elements. Inclusion of this element is conditional, therefore, userspace needs to dump the set content to check for its presence. If the NLM_F_CREATE flag is turned on, this becomes an issue because kernel bogusly reports EEXIST. Add special case to ignore NLM_F_CREATE in this case, therefore, re-adding the nul-element never fails. Fixes: c016c7e45ddf ("netfilter: nf_tables: honor NLM_F_EXCL flag in set element insertion") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Florian Westphal <fw@strlen.de>
2026-02-06	netfilter: nft_counter: fix reset of counters on 32bit archs	Anders Grahn
	nft_counter_reset() calls u64_stats_add() with a negative value to reset the counter. This will work on 64bit archs, hence the negative value added will wrap as a 64bit value which then can wrap the stat counter as well. On 32bit archs, the added negative value will wrap as a 32bit value and _not_ wrapping the stat counter properly. In most cases, this would just lead to a very large 32bit value being added to the stat counter. Fix by introducing u64_stats_sub(). Fixes: 4a1d3acd6ea8 ("netfilter: nft_counter: Use u64_stats_t for statistic.") Signed-off-by: Anders Grahn <anders.grahn@gmail.com> Signed-off-by: Florian Westphal <fw@strlen.de>
2026-02-06	netfilter: nft_set_hash: fix get operation on big endian	Florian Westphal
	tests/shell/testcases/packetpath/set_match_nomatch_hash_fast fails on big endian with: Error: Could not process rule: No such file or directory reset element ip test s { 244.147.90.126 } ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ Fatal: Cannot fetch element "244.147.90.126" ... because the wrong bucket is searched, jhash() and jhash1_word are not interchangeable on big endian. Fixes: 3b02b0adc242 ("netfilter: nft_set_hash: fix lookups with fixed size hash on big endian") Signed-off-by: Florian Westphal <fw@strlen.de>
2026-02-06	netfilter: flowtable: dedicated slab for flow entry	Qingfang Deng
	The size of `struct flow_offload` has grown beyond 256 bytes on 64-bit kernels (currently 280 bytes) because of the `flow_offload_tunnel` member added recently. So kmalloc() allocates from the kmalloc-512 slab, causing significant memory waste per entry. Introduce a dedicated slab cache for flow entries to reduce memory footprint. Results in a reduction from 512 bytes to 320 bytes per entry on x86_64 kernels. Signed-off-by: Qingfang Deng <dqfext@gmail.com> Signed-off-by: Florian Westphal <fw@strlen.de>
2026-02-06	netfilter: nfnetlink_queue: do shared-unconfirmed check before segmentation	Florian Westphal
	Ulrich reports a regression with nfqueue: If an application did not set the 'F_GSO' capability flag and a gso packet with an unconfirmed nf_conn entry is received all packets are now dropped instead of queued, because the check happens after skb_gso_segment(). In that case, we did have exclusive ownership of the skb and its associated conntrack entry. The elevated use count is due to skb_clone happening via skb_gso_segment(). Move the check so that its peformed vs. the aggregated packet. Then, annotate the individual segments except the first one so we can do a 2nd check at reinject time. For the normal case, where userspace does in-order reinjects, this avoids packet drops: first reinjected segment continues traversal and confirms entry, remaining segments observe the confirmed entry. While at it, simplify nf_ct_drop_unconfirmed(): We only care about unconfirmed entries with a refcnt > 1, there is no need to special-case dying entries. This only happens with UDP. With TCP, the only unconfirmed packet will be the TCP SYN, those aren't aggregated by GRO. Next patch adds a udpgro test case to cover this scenario. Reported-by: Ulrich Weber <ulrich.weber@gmail.com> Fixes: 7d8dc1c7be8d ("netfilter: nf_queue: drop packets with cloned unconfirmed conntracks") Signed-off-by: Florian Westphal <fw@strlen.de>
2026-02-06	netfilter: nft_set_rbtree: don't gc elements on insert	Florian Westphal
	During insertion we can queue up expired elements for garbage collection. In case of later abort, the commit hook will never be called. Packet path and 'get' requests will find free'd elements in the binary search blob: nft_set_ext_key include/net/netfilter/nf_tables.h:800 [inline] nft_array_get_cmp+0x1f6/0x2a0 net/netfilter/nft_set_rbtree.c:133 __inline_bsearch include/linux/bsearch.h:15 [inline] bsearch+0x50/0xc0 lib/bsearch.c:33 nft_rbtree_get+0x16b/0x400 net/netfilter/nft_set_rbtree.c:169 nft_setelem_get net/netfilter/nf_tables_api.c:6495 [inline] nft_get_set_elem+0x420/0xaa0 net/netfilter/nf_tables_api.c:6543 nf_tables_getsetelem+0x448/0x5e0 net/netfilter/nf_tables_api.c:6632 nfnetlink_rcv_msg+0x8ae/0x12c0 net/netfilter/nfnetlink.c:290 Also, when we insert an element that triggers -EEXIST, and that insertion happens to also zap a timed-out entry, we end up with same issue: Neither commit nor abort hook is called. Fix this by removing gc api usage during insertion. The blamed commit also removes concurrency of the rbtree with the packet path, so we can now safely rb_erase() the element and move it to a new expired list that can be reaped in the commit hook before building the next blob iteration. This also avoids the need to rebuild the blob in the abort path: Expired elements seen during insertion attempts are kept around until a transaction passes. Reported-by: syzbot+d417922a3e7935517ef6@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=d417922a3e7935517ef6 Fixes: 7e43e0a1141d ("netfilter: nft_set_rbtree: translate rbtree to array for binary search") Signed-off-by: Florian Westphal <fw@strlen.de>