summaryrefslogtreecommitdiff
path: root/net/netfilter
AgeCommit message (Collapse)Author
4 daysnetfilter: ipset: drop logically empty buckets in mtype_delYifan Wu
commit 9862ef9ab0a116c6dca98842aab7de13a252ae02 upstream. mtype_del() counts empty slots below n->pos in k, but it only drops the bucket when both n->pos and k are zero. This misses buckets whose live entries have all been removed while n->pos still points past deleted slots. Treat a bucket as empty when all positions below n->pos are unused and release it directly instead of shrinking it further. Fixes: 8af1c6fbd923 ("netfilter: ipset: Fix forceadd evaluation path") Cc: stable@vger.kernel.org Reported-by: Juefei Pu <tomapufckgml@gmail.com> Reported-by: Xin Liu <dstsmallbird@foxmail.com> Signed-off-by: Yifan Wu <yifanwucs@gmail.com> Co-developed-by: Yuan Tan <yuantan098@gmail.com> Signed-off-by: Yuan Tan <yuantan098@gmail.com> Reviewed-by: Phil Sutter <phil@nwl.cc> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 daysnetfilter: nf_tables: reject immediate NF_QUEUE verdictPablo Neira Ayuso
[ Upstream commit da107398cbd4bbdb6bffecb2ce86d5c9384f4cec ] nft_queue is always used from userspace nftables to deliver the NF_QUEUE verdict. Immediately emitting an NF_QUEUE verdict is never used by the userspace nft tools, so reject immediate NF_QUEUE verdicts. The arp family does not provide queue support, but such an immediate verdict is still reachable. Globally reject NF_QUEUE immediate verdicts to address this issue. Fixes: f342de4e2f33 ("netfilter: nf_tables: reject QUEUE/DROP verdict parameters") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
4 daysnetfilter: x_tables: restrict xt_check_match/xt_check_target extensions for ↵Pablo Neira Ayuso
NFPROTO_ARP [ Upstream commit 3d5d488f11776738deab9da336038add95d342d1 ] Weiming Shi says: xt_match and xt_target structs registered with NFPROTO_UNSPEC can be loaded by any protocol family through nft_compat. When such a match/target sets .hooks to restrict which hooks it may run on, the bitmask uses NF_INET_* constants. This is only correct for families whose hook layout matches NF_INET_*: IPv4, IPv6, INET, and bridge all share the same five hooks (PRE_ROUTING ... POST_ROUTING). ARP only has three hooks (IN=0, OUT=1, FORWARD=2) with different semantics. Because NF_ARP_OUT == 1 == NF_INET_LOCAL_IN, the .hooks validation silently passes for the wrong reasons, allowing matches to run on ARP chains where the hook assumptions (e.g. state->in being set on input hooks) do not hold. This leads to NULL pointer dereferences; xt_devgroup is one concrete example: Oops: general protection fault, probably for non-canonical address 0xdffffc0000000044: 0000 [#1] SMP KASAN NOPTI KASAN: null-ptr-deref in range [0x0000000000000220-0x0000000000000227] RIP: 0010:devgroup_mt+0xff/0x350 Call Trace: <TASK> nft_match_eval (net/netfilter/nft_compat.c:407) nft_do_chain (net/netfilter/nf_tables_core.c:285) nft_do_chain_arp (net/netfilter/nft_chain_filter.c:61) nf_hook_slow (net/netfilter/core.c:623) arp_xmit (net/ipv4/arp.c:666) </TASK> Kernel panic - not syncing: Fatal exception in interrupt Fix it by restricting arptables to NFPROTO_ARP extensions only. Note that arptables-legacy only supports: - arpt_CLASSIFY - arpt_mangle - arpt_MARK that provide explicit NFPROTO_ARP match/target declarations. Fixes: 9291747f118d ("netfilter: xtables: add device group match") Reported-by: Xiang Mei <xmei5@asu.edu> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
4 daysnetfilter: ctnetlink: ignore explicit helper on new expectationsPablo Neira Ayuso
[ Upstream commit 917b61fa2042f11e2af4c428e43f08199586633a ] Use the existing master conntrack helper, anything else is not really supported and it just makes validation more complicated, so just ignore what helper userspace suggests for this expectation. This was uncovered when validating CTA_EXPECT_CLASS via different helper provided by userspace than the existing master conntrack helper: BUG: KASAN: slab-out-of-bounds in nf_ct_expect_related_report+0x2479/0x27c0 Read of size 4 at addr ffff8880043fe408 by task poc/102 Call Trace: nf_ct_expect_related_report+0x2479/0x27c0 ctnetlink_create_expect+0x22b/0x3b0 ctnetlink_new_expect+0x4bd/0x5c0 nfnetlink_rcv_msg+0x67a/0x950 netlink_rcv_skb+0x120/0x350 Allowing to read kernel memory bytes off the expectation boundary. CTA_EXPECT_HELP_NAME is still used to offer the helper name to userspace via netlink dump. Fixes: bd0779370588 ("netfilter: nfnetlink_queue: allow to attach expectations to conntracks") Reported-by: Qi Tang <tpluszz77@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
4 daysnetfilter: nf_conntrack_expect: store netns and zone in expectationPablo Neira Ayuso
[ Upstream commit 02a3231b6d82efe750da6554ebf280e4a6f78756 ] __nf_ct_expect_find() and nf_ct_expect_find_get() are called under rcu_read_lock() but they dereference the master conntrack via exp->master. Since the expectation does not hold a reference on the master conntrack, this could be dying conntrack or different recycled conntrack than the real master due to SLAB_TYPESAFE_RCU. Store the netns, the master_tuple and the zone in struct nf_conntrack_expect as a safety measure. This patch is required by the follow up fix not to dump expectations that do not belong to this netns. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Stable-dep-of: 917b61fa2042 ("netfilter: ctnetlink: ignore explicit helper on new expectations") Signed-off-by: Sasha Levin <sashal@kernel.org>
4 daysnetfilter: nf_conntrack_expect: use expect->helperPablo Neira Ayuso
[ Upstream commit f01794106042ee27e54af6fdf5b319a2fe3df94d ] Use expect->helper in ctnetlink and /proc to dump the helper name. Using nfct_help() without holding a reference to the master conntrack is unsafe. Use exp->master->helper in ctnetlink path if userspace does not provide an explicit helper when creating an expectation to retain the existing behaviour. The ctnetlink expectation path holds the reference on the master conntrack and nf_conntrack_expect lock and the nfnetlink glue path refers to the master ct that is attached to the skb. Reported-by: Hyunwoo Kim <imv4bel@gmail.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Stable-dep-of: 917b61fa2042 ("netfilter: ctnetlink: ignore explicit helper on new expectations") Signed-off-by: Sasha Levin <sashal@kernel.org>
4 daysnetfilter: nf_conntrack_expect: honor expectation helper fieldPablo Neira Ayuso
[ Upstream commit 9c42bc9db90a154bc61ae337a070465f3393485a ] The expectation helper field is mostly unused. As a result, the netfilter codebase relies on accessing the helper through exp->master. Always set on the expectation helper field so it can be used to reach the helper. nf_ct_expect_init() is called from packet path where the skb owns the ct object, therefore accessing exp->master for the newly created expectation is safe. This saves a lot of updates in all callsites to pass the ct object as parameter to nf_ct_expect_init(). This is a preparation patches for follow up fixes. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Stable-dep-of: 917b61fa2042 ("netfilter: ctnetlink: ignore explicit helper on new expectations") Signed-off-by: Sasha Levin <sashal@kernel.org>
4 daysnetfilter: ctnetlink: zero expect NAT fields when CTA_EXPECT_NAT absentQi Tang
[ Upstream commit 35177c6877134a21315f37d57a5577846225623e ] ctnetlink_alloc_expect() allocates expectations from a non-zeroing slab cache via nf_ct_expect_alloc(). When CTA_EXPECT_NAT is not present in the netlink message, saved_addr and saved_proto are never initialized. Stale data from a previous slab occupant can then be dumped to userspace by ctnetlink_exp_dump_expect(), which checks these fields to decide whether to emit CTA_EXPECT_NAT. The safe sibling nf_ct_expect_init(), used by the packet path, explicitly zeroes these fields. Zero saved_addr, saved_proto and dir in the else branch, guarded by IS_ENABLED(CONFIG_NF_NAT) since these fields only exist when NAT is enabled. Confirmed by priming the expect slab with NAT-bearing expectations, freeing them, creating a new expectation without CTA_EXPECT_NAT, and observing that the ctnetlink dump emits a spurious CTA_EXPECT_NAT containing stale data from the prior allocation. Fixes: 076a0ca02644 ("netfilter: ctnetlink: add NAT support for expectations") Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Qi Tang <tpluszz77@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
4 daysnetfilter: nf_conntrack_helper: pass helper to expect cleanupQi Tang
[ Upstream commit a242a9ae58aa46ff7dae51ce64150a93957abe65 ] nf_conntrack_helper_unregister() calls nf_ct_expect_iterate_destroy() to remove expectations belonging to the helper being unregistered. However, it passes NULL instead of the helper pointer as the data argument, so expect_iter_me() never matches any expectation and all of them survive the cleanup. After unregister returns, nfnl_cthelper_del() frees the helper object immediately. Subsequent expectation dumps or packet-driven init_conntrack() calls then dereference the freed exp->helper, causing a use-after-free. Pass the actual helper pointer so expectations referencing it are properly destroyed before the helper object is freed. BUG: KASAN: slab-use-after-free in string+0x38f/0x430 Read of size 1 at addr ffff888003b14d20 by task poc/103 Call Trace: string+0x38f/0x430 vsnprintf+0x3cc/0x1170 seq_printf+0x17a/0x240 exp_seq_show+0x2e5/0x560 seq_read_iter+0x419/0x1280 proc_reg_read+0x1ac/0x270 vfs_read+0x179/0x930 ksys_read+0xef/0x1c0 Freed by task 103: The buggy address is located 32 bytes inside of freed 192-byte region [ffff888003b14d00, ffff888003b14dc0) Fixes: ac7b84839003 ("netfilter: expect: add and use nf_ct_expect_iterate helpers") Signed-off-by: Qi Tang <tpluszz77@gmail.com> Reviewed-by: Phil Sutter <phil@nwl.cc> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
4 daysnetfilter: ipset: use nla_strcmp for IPSET_ATTR_NAME attrFlorian Westphal
[ Upstream commit b7e8590987aa94c9dc51518fad0e58cb887b1db5 ] IPSET_ATTR_NAME and IPSET_ATTR_NAMEREF are of NLA_STRING type, they cannot be treated like a c-string. They either have to be switched to NLA_NUL_STRING, or the compare operations need to use the nla functions. Fixes: f830837f0eed ("netfilter: ipset: list:set set type support") Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
4 daysnetfilter: x_tables: ensure names are nul-terminatedFlorian Westphal
[ Upstream commit a958a4f90ddd7de0800b33ca9d7b886b7d40f74e ] Reject names that lack a \0 character before feeding them to functions that expect c-strings. Fixes tag is the most recent commit that needs this change. Fixes: c38c4597e4bf ("netfilter: implement xt_cgroup cgroup2 path match") Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
4 daysnetfilter: nfnetlink_log: account for netlink header sizeFlorian Westphal
[ Upstream commit 6d52a4a0520a6696bdde51caa11f2d6821cd0c01 ] This is a followup to an old bug fix: NLMSG_DONE needs to account for the netlink header size, not just the attribute size. This can result in a WARN splat + drop of the netlink message, but other than this there are no ill effects. Fixes: 9dfa1dfe4d5e ("netfilter: nf_log: account for size of NLMSG_DONE attribute") Reported-by: Yiming Qian <yimingqian591@gmail.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
4 daysnetfilter: flowtable: strictly check for maximum number of actionsPablo Neira Ayuso
[ Upstream commit 76522fcdbc3a02b568f5d957f7e66fc194abb893 ] The maximum number of flowtable hardware offload actions in IPv6 is: * ethernet mangling (4 payload actions, 2 for each ethernet address) * SNAT (4 payload actions) * DNAT (4 payload actions) * Double VLAN (4 vlan actions, 2 for popping vlan, and 2 for pushing) for QinQ. * Redirect (1 action) Which makes 17, while the maximum is 16. But act_ct supports for tunnels actions too. Note that payload action operates at 32-bit word level, so mangling an IPv6 address takes 4 payload actions. Update flow_action_entry_next() calls to check for the maximum number of supported actions. While at it, rise the maximum number of actions per flow from 16 to 24 so this works fine with IPv6 setups. Fixes: c29f74e0df7a ("netfilter: nf_flow_table: hardware offload support") Reported-by: Hyunwoo Kim <imv4bel@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
13 daysnetfilter: ctnetlink: use netlink policy range checksDavid Carlier
[ Upstream commit 8f15b5071b4548b0aafc03b366eb45c9c6566704 ] Replace manual range and mask validations with netlink policy annotations in ctnetlink code paths, so that the netlink core rejects invalid values early and can generate extack errors. - CTA_PROTOINFO_TCP_STATE: reject values > TCP_CONNTRACK_SYN_SENT2 at policy level, removing the manual >= TCP_CONNTRACK_MAX check. - CTA_PROTOINFO_TCP_WSCALE_ORIGINAL/REPLY: reject values > TCP_MAX_WSCALE (14). The normal TCP option parsing path already clamps to this value, but the ctnetlink path accepted 0-255, causing undefined behavior when used as a u32 shift count. - CTA_FILTER_ORIG_FLAGS/REPLY_FLAGS: use NLA_POLICY_MASK with CTA_FILTER_F_ALL, removing the manual mask checks. - CTA_EXPECT_FLAGS: use NLA_POLICY_MASK with NF_CT_EXPECT_MASK, adding a new mask define grouping all valid expect flags. Extracted from a broader nf-next patch by Florian Westphal, scoped to ctnetlink for the fixes tree. Fixes: c8e2078cfe41 ("[NETFILTER]: ctnetlink: add support for internal tcp connection tracking flags handling") Signed-off-by: David Carlier <devnexen@gmail.com> Co-developed-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
13 daysnetfilter: nf_conntrack_sip: fix use of uninitialized rtp_addr in process_sdpWeiming Shi
[ Upstream commit 6a2b724460cb67caed500c508c2ae5cf012e4db4 ] process_sdp() declares union nf_inet_addr rtp_addr on the stack and passes it to the nf_nat_sip sdp_session hook after walking the SDP media descriptions. However rtp_addr is only initialized inside the media loop when a recognized media type with a non-zero port is found. If the SDP body contains no m= lines, only inactive media sections (m=audio 0 ...) or only unrecognized media types, rtp_addr is never assigned. Despite that, the function still calls hooks->sdp_session() with &rtp_addr, causing nf_nat_sdp_session() to format the stale stack value as an IP address and rewrite the SDP session owner and connection lines with it. With CONFIG_INIT_STACK_ALL_ZERO (default on most distributions) this results in the session-level o= and c= addresses being rewritten to 0.0.0.0 for inactive SDP sessions. Without stack auto-init the rewritten address is whatever happened to be on the stack. Fix this by pre-initializing rtp_addr from the session-level connection address (caddr) when available, and tracking via a have_rtp_addr flag whether any valid address was established. Skip the sdp_session hook entirely when no valid address exists. Fixes: 4ab9e64e5e3c ("[NETFILTER]: nf_nat_sip: split up SDP mangling") Reported-by: Xiang Mei <xmei5@asu.edu> Signed-off-by: Weiming Shi <bestswngs@gmail.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
13 daysnetfilter: nf_conntrack_expect: skip expectations in other netns via procPablo Neira Ayuso
[ Upstream commit 3db5647984de03d9cae0dcddb509b058351f0ee4 ] Skip expectations that do not reside in this netns. Similar to e77e6ff502ea ("netfilter: conntrack: do not dump other netns's conntrack entries via proc"). Fixes: 9b03f38d0487 ("netfilter: netns nf_conntrack: per-netns expectations") Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
13 daysnetfilter: nft_set_rbtree: revisit array resize logicPablo Neira Ayuso
[ Upstream commit fafdd92b9e30fe057740c5bb5cd4f92ecea9bf26 ] Chris Arges reports high memory consumption with thousands of containers, this patch revisits the array allocation logic. For anonymous sets, start by 16 slots (which takes 256 bytes on x86_64). Expand it by x2 until threshold of 512 slots is reached, over that threshold, expand it by x1.5. For non-anonymous set, start by 1024 slots in the array (which takes 16 Kbytes initially on x86_64). Expand it by x1.5. Use set->ndeact to subtract deactivated elements when calculating the number of the slots in the array, otherwise the array size array gets increased artifically. Add special case shrink logic to deal with flush set too. The shrink logic is skipped by anonymous sets. Use check_add_overflow() to calculate the new array size. Add a WARN_ON_ONCE check to make sure elements fit into the new array size. Reported-by: Chris Arges <carges@cloudflare.com> Fixes: 7e43e0a1141d ("netfilter: nft_set_rbtree: translate rbtree to array for binary search") Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
13 daysnetfilter: nfnetlink_log: fix uninitialized padding leak in NFULA_PAYLOADWeiming Shi
[ Upstream commit 52025ebaa29f4eb4ed8bf92ce83a68f24ab7fdf7 ] __build_packet_message() manually constructs the NFULA_PAYLOAD netlink attribute using skb_put() and skb_copy_bits(), bypassing the standard nla_reserve()/nla_put() helpers. While nla_total_size(data_len) bytes are allocated (including NLA alignment padding), only data_len bytes of actual packet data are copied. The trailing nla_padlen(data_len) bytes (1-3 when data_len is not 4-byte aligned) are never initialized, leaking stale heap contents to userspace via the NFLOG netlink socket. Replace the manual attribute construction with nla_reserve(), which handles the tailroom check, header setup, and padding zeroing via __nla_reserve(). The subsequent skb_copy_bits() fills in the payload data on top of the properly initialized attribute. Fixes: df6fb868d611 ("[NETFILTER]: nfnetlink: convert to generic netlink attribute functions") Reported-by: Xiang Mei <xmei5@asu.edu> Signed-off-by: Weiming Shi <bestswngs@gmail.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-25nfnetlink_osf: validate individual option lengths in fingerprintsWeiming Shi
[ Upstream commit dbdfaae9609629a9569362e3b8f33d0a20fd783c ] nfnl_osf_add_callback() validates opt_num bounds and string NUL-termination but does not check individual option length fields. A zero-length option causes nf_osf_match_one() to enter the option matching loop even when foptsize sums to zero, which matches packets with no TCP options where ctx->optp is NULL: Oops: general protection fault KASAN: null-ptr-deref in range [0x0000000000000000-0x0000000000000007] RIP: 0010:nf_osf_match_one (net/netfilter/nfnetlink_osf.c:98) Call Trace: nf_osf_match (net/netfilter/nfnetlink_osf.c:227) xt_osf_match_packet (net/netfilter/xt_osf.c:32) ipt_do_table (net/ipv4/netfilter/ip_tables.c:293) nf_hook_slow (net/netfilter/core.c:623) ip_local_deliver (net/ipv4/ip_input.c:262) ip_rcv (net/ipv4/ip_input.c:573) Additionally, an MSS option (kind=2) with length < 4 causes out-of-bounds reads when nf_osf_match_one() unconditionally accesses optp[2] and optp[3] for MSS value extraction. While RFC 9293 section 3.2 specifies that the MSS option is always exactly 4 bytes (Kind=2, Length=4), the check uses "< 4" rather than "!= 4" because lengths greater than 4 do not cause memory safety issues -- the buffer is guaranteed to be at least foptsize bytes by the ctx->optsize == foptsize check. Reject fingerprints where any option has zero length, or where an MSS option has length less than 4, at add time rather than trusting these values in the packet matching hot path. Fixes: 11eeef41d5f6 ("netfilter: passive OS fingerprint xtables match") Reported-by: Xiang Mei <xmei5@asu.edu> Signed-off-by: Weiming Shi <bestswngs@gmail.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-25netfilter: nf_tables: release flowtable after rcu grace period on errorPablo Neira Ayuso
[ Upstream commit d73f4b53aaaea4c95f245e491aa5eeb8a21874ce ] Call synchronize_rcu() after unregistering the hooks from error path, since a hook that already refers to this flowtable can be already registered, exposing this flowtable to packet path and nfnetlink_hook control plane. This error path is rare, it should only happen by reaching the maximum number hooks or by failing to set up to hardware offload, just call synchronize_rcu(). There is a check for already used device hooks by different flowtable that could result in EEXIST at this late stage. The hook parser can be updated to perform this check earlier to this error path really becomes rarely exercised. Uncovered by KASAN reported as use-after-free from nfnetlink_hook path when dumping hooks. Fixes: 3b49e2e94e6e ("netfilter: nf_tables: add flow table netlink frontend") Reported-by: Yiming Qian <yimingqian591@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-25netfilter: bpf: defer hook memory release until rcu readers are doneFlorian Westphal
[ Upstream commit 24f90fa3994b992d1a09003a3db2599330a5232a ] Yiming Qian reports UaF when concurrent process is dumping hooks via nfnetlink_hooks: BUG: KASAN: slab-use-after-free in nfnl_hook_dump_one.isra.0+0xe71/0x10f0 Read of size 8 at addr ffff888003edbf88 by task poc/79 Call Trace: <TASK> nfnl_hook_dump_one.isra.0+0xe71/0x10f0 netlink_dump+0x554/0x12b0 nfnl_hook_get+0x176/0x230 [..] Defer release until after concurrent readers have completed. Reported-by: Yiming Qian <yimingqian591@gmail.com> Fixes: 84601d6ee68a ("bpf: add bpf_link support for BPF_NETFILTER programs") Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-25netfilter: nf_conntrack_h323: check for zero length in DecodeQ931()Jenny Guanni Qu
[ Upstream commit f173d0f4c0f689173f8cdac79991043a4a89bf66 ] In DecodeQ931(), the UserUserIE code path reads a 16-bit length from the packet, then decrements it by 1 to skip the protocol discriminator byte before passing it to DecodeH323_UserInformation(). If the encoded length is 0, the decrement wraps to -1, which is then passed as a large value to the decoder, leading to an out-of-bounds read. Add a check to ensure len is positive after the decrement. Fixes: 5e35941d9901 ("[NETFILTER]: Add H.323 conntrack/NAT helper") Reported-by: Klaudia Kloc <klaudia@vidocsecurity.com> Reported-by: Dawid Moczadło <dawid@vidocsecurity.com> Tested-by: Jenny Guanni Qu <qguanni@gmail.com> Signed-off-by: Jenny Guanni Qu <qguanni@gmail.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-25netfilter: xt_time: use unsigned int for monthday bit shiftJenny Guanni Qu
[ Upstream commit 00050ec08cecfda447e1209b388086d76addda3a ] The monthday field can be up to 31, and shifting a signed integer 1 by 31 positions (1 << 31) is undefined behavior in C, as the result overflows a 32-bit signed int. Use 1U to ensure well-defined behavior for all valid monthday values. Change the weekday shift to 1U as well for consistency. Fixes: ee4411a1b1e0 ("[NETFILTER]: x_tables: add xt_time match") Reported-by: Klaudia Kloc <klaudia@vidocsecurity.com> Reported-by: Dawid Moczadło <dawid@vidocsecurity.com> Tested-by: Jenny Guanni Qu <qguanni@gmail.com> Signed-off-by: Jenny Guanni Qu <qguanni@gmail.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-25netfilter: xt_CT: drop pending enqueued packets on template removalPablo Neira Ayuso
[ Upstream commit f62a218a946b19bb59abdd5361da85fa4606b96b ] Templates refer to objects that can go away while packets are sitting in nfqueue refer to: - helper, this can be an issue on module removal. - timeout policy, nfnetlink_cttimeout might remove it. The use of templates with zone and event cache filter are safe, since this just copies values. Flush these enqueued packets in case the template rule gets removed. Fixes: 24de58f46516 ("netfilter: xt_CT: allow to attach timeout policy + glue code") Reported-by: Yiming Qian <yimingqian591@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-25netfilter: nft_ct: drop pending enqueued packets on removalPablo Neira Ayuso
[ Upstream commit 36eae0956f659e48d5366d9b083d9417f3263ddc ] Packets sitting in nfqueue might hold a reference to: - templates that specify the conntrack zone, because a percpu area is used and module removal is possible. - conntrack timeout policies and helper, where object removal leave a stale reference. Since these objects can just go away, drop enqueued packets to avoid stale reference to them. If there is a need for finer grain removal, this logic can be revisited to make selective packet drop upon dependencies. Fixes: 7e0b2b57f01d ("netfilter: nft_ct: add ct timeout support") Reported-by: Yiming Qian <yimingqian591@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-25nf_tables: nft_dynset: fix possible stateful expression memleak in error pathPablo Neira Ayuso
[ Upstream commit 0548a13b5a145b16e4da0628b5936baf35f51b43 ] If cloning the second stateful expression in the element via GFP_ATOMIC fails, then the first stateful expression remains in place without being released.   unreferenced object (percpu) 0x607b97e9cab8 (size 16):     comm "softirq", pid 0, jiffies 4294931867     hex dump (first 16 bytes on cpu 3):       00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00     backtrace (crc 0):       pcpu_alloc_noprof+0x453/0xd80       nft_counter_clone+0x9c/0x190 [nf_tables]       nft_expr_clone+0x8f/0x1b0 [nf_tables]       nft_dynset_new+0x2cb/0x5f0 [nf_tables]       nft_rhash_update+0x236/0x11c0 [nf_tables]       nft_dynset_eval+0x11f/0x670 [nf_tables]       nft_do_chain+0x253/0x1700 [nf_tables]       nft_do_chain_ipv4+0x18d/0x270 [nf_tables]       nf_hook_slow+0xaa/0x1e0       ip_local_deliver+0x209/0x330 Fixes: 563125a73ac3 ("netfilter: nftables: generalize set extension to support for several expressions") Reported-by: Gurpreet Shergill <giki.shergill@proton.me> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-25netfilter: nf_conntrack_h323: fix OOB read in decode_int() CONS caseJenny Guanni Qu
[ Upstream commit 1e3a3593162c96e8a8de48b1e14f60c3b57fca8a ] In decode_int(), the CONS case calls get_bits(bs, 2) to read a length value, then calls get_uint(bs, len) without checking that len bytes remain in the buffer. The existing boundary check only validates the 2 bits for get_bits(), not the subsequent 1-4 bytes that get_uint() reads. This allows a malformed H.323/RAS packet to cause a 1-4 byte slab-out-of-bounds read. Add a boundary check for len bytes after get_bits() and before get_uint(). Fixes: 5e35941d9901 ("[NETFILTER]: Add H.323 conntrack/NAT helper") Reported-by: Klaudia Kloc <klaudia@vidocsecurity.com> Reported-by: Dawid Moczadło <dawid@vidocsecurity.com> Signed-off-by: Jenny Guanni Qu <qguanni@gmail.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-25netfilter: nf_flow_table_ip: reset mac header before vlan pushEric Woudstra
[ Upstream commit a3aca98aec9a278ee56da4f8013bfa1dd1a1c298 ] With double vlan tagged packets in the fastpath, getting the error: skb_vlan_push got skb with skb->data not at mac header (offset 18) Call skb_reset_mac_header() before calling skb_vlan_push(). Fixes: c653d5a78f34 ("netfilter: flowtable: inline vlan encapsulation in xmit path") Signed-off-by: Eric Woudstra <ericwouds@gmail.com> Acked-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-25netfilter: nf_conntrack_sip: fix Content-Length u32 truncation in sip_help_tcp()Lukas Johannes Möller
[ Upstream commit fbce58e719a17aa215c724473fd5baaa4a8dc57c ] sip_help_tcp() parses the SIP Content-Length header with simple_strtoul(), which returns unsigned long, but stores the result in unsigned int clen. On 64-bit systems, values exceeding UINT_MAX are silently truncated before computing the SIP message boundary. For example, Content-Length 4294967328 (2^32 + 32) is truncated to 32, causing the parser to miscalculate where the current message ends. The loop then treats trailing data in the TCP segment as a second SIP message and processes it through the SDP parser. Fix this by changing clen to unsigned long to match the return type of simple_strtoul(), and reject Content-Length values that exceed the remaining TCP payload length. Fixes: f5b321bd37fb ("netfilter: nf_conntrack_sip: add TCP support") Signed-off-by: Lukas Johannes Möller <research@johannes-moeller.dev> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-25netfilter: conntrack: add missing netlink policy validationsFlorian Westphal
[ Upstream commit f900e1d77ee0ef87bfb5ab3fe60f0b3d8ad5ba05 ] Hyunwoo Kim reports out-of-bounds access in sctp and ctnetlink. These attributes are used by the kernel without any validation. Extend the netlink policies accordingly. Quoting the reporter: nlattr_to_sctp() assigns the user-supplied CTA_PROTOINFO_SCTP_STATE value directly to ct->proto.sctp.state without checking that it is within the valid range. [..] and: ... with exp->dir = 100, the access at ct->master->tuplehash[100] reads 5600 bytes past the start of a 320-byte nf_conn object, causing a slab-out-of-bounds read confirmed by UBSAN. Fixes: 076a0ca02644 ("netfilter: ctnetlink: add NAT support for expectations") Fixes: a258860e01b8 ("netfilter: ctnetlink: add full support for SCTP to ctnetlink") Reported-by: Hyunwoo Kim <imv4bel@gmail.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-25netfilter: ctnetlink: fix use-after-free in ctnetlink_dump_exp_ct()Hyunwoo Kim
[ Upstream commit 5cb81eeda909dbb2def209dd10636b51549a3f8a ] ctnetlink_dump_exp_ct() stores a conntrack pointer in cb->data for the netlink dump callback ctnetlink_exp_ct_dump_table(), but drops the conntrack reference immediately after netlink_dump_start(). When the dump spans multiple rounds, the second recvmsg() triggers the dump callback which dereferences the now-freed conntrack via nfct_help(ct), leading to a use-after-free on ct->ext. The bug is that the netlink_dump_control has no .start or .done callbacks to manage the conntrack reference across dump rounds. Other dump functions in the same file (e.g. ctnetlink_get_conntrack) properly use .start/.done callbacks for this purpose. Fix this by adding .start and .done callbacks that hold and release the conntrack reference for the duration of the dump, and move the nfct_help() call after the cb->args[0] early-return check in the dump callback to avoid dereferencing ct->ext unnecessarily. BUG: KASAN: slab-use-after-free in ctnetlink_exp_ct_dump_table+0x4f/0x2e0 Read of size 8 at addr ffff88810597ebf0 by task ctnetlink_poc/133 CPU: 1 UID: 0 PID: 133 Comm: ctnetlink_poc Not tainted 7.0.0-rc2+ #3 PREEMPTLAZY Call Trace: <TASK> ctnetlink_exp_ct_dump_table+0x4f/0x2e0 netlink_dump+0x333/0x880 netlink_recvmsg+0x3e2/0x4b0 ? aa_sk_perm+0x184/0x450 sock_recvmsg+0xde/0xf0 Allocated by task 133: kmem_cache_alloc_noprof+0x134/0x440 __nf_conntrack_alloc+0xa8/0x2b0 ctnetlink_create_conntrack+0xa1/0x900 ctnetlink_new_conntrack+0x3cf/0x7d0 nfnetlink_rcv_msg+0x48e/0x510 netlink_rcv_skb+0xc9/0x1f0 nfnetlink_rcv+0xdb/0x220 netlink_unicast+0x3ec/0x590 netlink_sendmsg+0x397/0x690 __sys_sendmsg+0xf4/0x180 Freed by task 0: slab_free_after_rcu_debug+0xad/0x1e0 rcu_core+0x5c3/0x9c0 Fixes: e844a928431f ("netfilter: ctnetlink: allow to dump expectation per master conntrack") Signed-off-by: Hyunwoo Kim <imv4bel@gmail.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-19netfilter: xt_IDLETIMER: reject rev0 reuse of ALARM timer labelsYuan Tan
[ Upstream commit 329f0b9b48ee6ab59d1ab72fef55fe8c6463a6cf ] IDLETIMER revision 0 rules reuse existing timers by label and always call mod_timer() on timer->timer. If the label was created first by revision 1 with XT_IDLETIMER_ALARM, the object uses alarm timer semantics and timer->timer is never initialized. Reusing that object from revision 0 causes mod_timer() on an uninitialized timer_list, triggering debugobjects warnings and possible panic when panic_on_warn=1. Fix this by rejecting revision 0 rule insertion when an existing timer with the same label is of ALARM type. Fixes: 68983a354a65 ("netfilter: xtables: Add snapshot of hardidletimer target") Co-developed-by: Yifan Wu <yifanwucs@gmail.com> Signed-off-by: Yifan Wu <yifanwucs@gmail.com> Co-developed-by: Juefei Pu <tomapufckgml@gmail.com> Signed-off-by: Juefei Pu <tomapufckgml@gmail.com> Signed-off-by: Yuan Tan <tanyuan98@outlook.com> Signed-off-by: Xin Liu <dstsmallbird@foxmail.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-19netfilter: nfnetlink_cthelper: fix OOB read in nfnl_cthelper_dump_table()Hyunwoo Kim
[ Upstream commit 6dcee8496d53165b2d8a5909b3050b62ae71fe89 ] nfnl_cthelper_dump_table() has a 'goto restart' that jumps to a label inside the for loop body. When the "last" helper saved in cb->args[1] is deleted between dump rounds, every entry fails the (cur != last) check, so cb->args[1] is never cleared. The for loop finishes with cb->args[0] == nf_ct_helper_hsize, and the 'goto restart' jumps back into the loop body bypassing the bounds check, causing an 8-byte out-of-bounds read on nf_ct_helper_hash[nf_ct_helper_hsize]. The 'goto restart' block was meant to re-traverse the current bucket when "last" is no longer found, but it was placed after the for loop instead of inside it. Move the block into the for loop body so that the restart only occurs while cb->args[0] is still within bounds. BUG: KASAN: slab-out-of-bounds in nfnl_cthelper_dump_table+0x9f/0x1b0 Read of size 8 at addr ffff888104ca3000 by task poc_cthelper/131 Call Trace: nfnl_cthelper_dump_table+0x9f/0x1b0 netlink_dump+0x333/0x880 netlink_recvmsg+0x3e2/0x4b0 sock_recvmsg+0xde/0xf0 __sys_recvfrom+0x150/0x200 __x64_sys_recvfrom+0x76/0x90 do_syscall_64+0xc3/0x6e0 Allocated by task 1: __kvmalloc_node_noprof+0x21b/0x700 nf_ct_alloc_hashtable+0x65/0xd0 nf_conntrack_helper_init+0x21/0x60 nf_conntrack_init_start+0x18d/0x300 nf_conntrack_standalone_init+0x12/0xc0 Fixes: 12f7a505331e ("netfilter: add user-space connection tracking helper infrastructure") Signed-off-by: Hyunwoo Kim <imv4bel@gmail.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-19netfilter: nfnetlink_queue: fix entry leak in bridge verdict error pathHyunwoo Kim
[ Upstream commit f1ba83755d81c6fc66ac7acd723d238f974091e9 ] nfqnl_recv_verdict() calls find_dequeue_entry() to remove the queue entry from the queue data structures, taking ownership of the entry. For PF_BRIDGE packets, it then calls nfqa_parse_bridge() to parse VLAN attributes. If nfqa_parse_bridge() returns an error (e.g. NFQA_VLAN present but NFQA_VLAN_TCI missing), the function returns immediately without freeing the dequeued entry or its sk_buff. This leaks the nf_queue_entry, its associated sk_buff, and all held references (net_device refcounts, struct net refcount). Repeated triggering exhausts kernel memory. Fix this by dropping the entry via nfqnl_reinject() with NF_DROP verdict on the error path, consistent with other error handling in this file. Fixes: 8d45ff22f1b4 ("netfilter: bridge: nf queue verdict to use NFQA_VLAN and NFQA_L2HDR") Reviewed-by: David Dull <monderasdor@gmail.com> Signed-off-by: Hyunwoo Kim <imv4bel@gmail.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-19netfilter: x_tables: guard option walkers against 1-byte tail readsDavid Dull
[ Upstream commit cfe770220ac2dbd3e104c6b45094037455da81d4 ] When the last byte of options is a non-single-byte option kind, walkers that advance with i += op[i + 1] ? : 1 can read op[i + 1] past the end of the option area. Add an explicit i == optlen - 1 check before dereferencing op[i + 1] in xt_tcpudp and xt_dccp option walkers. Fixes: 2e4e6a17af35 ("[NETFILTER] x_tables: Abstraction layer for {ip,ip6,arp}_tables") Signed-off-by: David Dull <monderasdor@gmail.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-19netfilter: nft_set_pipapo: fix stack out-of-bounds read in pipapo_drop()Jenny Guanni Qu
[ Upstream commit d6d8cd2db236a9dd13dbc2d05843b3445cc964b5 ] pipapo_drop() passes rulemap[i + 1].n to pipapo_unmap() as the to_offset argument on every iteration, including the last one where i == m->field_count - 1. This reads one element past the end of the stack-allocated rulemap array (declared as rulemap[NFT_PIPAPO_MAX_FIELDS] with NFT_PIPAPO_MAX_FIELDS == 16). Although pipapo_unmap() returns early when is_last is true without using the to_offset value, the argument is evaluated at the call site before the function body executes, making this a genuine out-of-bounds stack read confirmed by KASAN: BUG: KASAN: stack-out-of-bounds in pipapo_drop+0x50c/0x57c [nf_tables] Read of size 4 at addr ffff8000810e71a4 This frame has 1 object: [32, 160) 'rulemap' The buggy address is at offset 164 -- exactly 4 bytes past the end of the rulemap array. Pass 0 instead of rulemap[i + 1].n on the last iteration to avoid the out-of-bounds read. Fixes: 3c4287f62044 ("nf_tables: Add set type for arbitrary concatenation of ranges") Signed-off-by: Jenny Guanni Qu <qguanni@gmail.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-19netfilter: nf_tables: always walk all pending catchall elementsFlorian Westphal
[ Upstream commit 7cb9a23d7ae40a702577d3d8bacb7026f04ac2a9 ] During transaction processing we might have more than one catchall element: 1 live catchall element and 1 pending element that is coming as part of the new batch. If the map holding the catchall elements is also going away, its required to toggle all catchall elements and not just the first viable candidate. Otherwise, we get: WARNING: ./include/net/netfilter/nf_tables.h:1281 at nft_data_release+0xb7/0xe0 [nf_tables], CPU#2: nft/1404 RIP: 0010:nft_data_release+0xb7/0xe0 [nf_tables] [..] __nft_set_elem_destroy+0x106/0x380 [nf_tables] nf_tables_abort_release+0x348/0x8d0 [nf_tables] nf_tables_abort+0xcf2/0x3ac0 [nf_tables] nfnetlink_rcv_batch+0x9c9/0x20e0 [..] Fixes: 628bd3e49cba ("netfilter: nf_tables: drop map element references from preparation phase") Reported-by: Yiming Qian <yimingqian591@gmail.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-19netfilter: nf_tables: Fix for duplicate device in netdev hooksPhil Sutter
[ Upstream commit b7cdc5a97d02c943f4bdde4d5767ad0c13cad92b ] When handling NETDEV_REGISTER notification, duplicate device registration must be avoided since the device may have been added by nft_netdev_hook_alloc() already when creating the hook. Suggested-by: Florian Westphal <fw@strlen.de> Reported-by: syzbot+bb9127e278fa198e110c@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=bb9127e278fa198e110c Fixes: a331b78a5525 ("netfilter: nf_tables: Respect NETDEV_REGISTER events") Tested-by: Helen Koike <koike@igalia.com> Signed-off-by: Phil Sutter <phil@nwl.cc> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-12netfilter: nft_set_pipapo: split gc into unlink and reclaim phaseFlorian Westphal
[ Upstream commit 9df95785d3d8302f7c066050117b04cd3c2048c2 ] Yiming Qian reports Use-after-free in the pipapo set type: Under a large number of expired elements, commit-time GC can run for a very long time in a non-preemptible context, triggering soft lockup warnings and RCU stall reports (local denial of service). We must split GC in an unlink and a reclaim phase. We cannot queue elements for freeing until pointers have been swapped. Expired elements are still exposed to both the packet path and userspace dumpers via the live copy of the data structure. call_rcu() does not protect us: dump operations or element lookups starting after call_rcu has fired can still observe the free'd element, unless the commit phase has made enough progress to swap the clone and live pointers before any new reader has picked up the old version. This a similar approach as done recently for the rbtree backend in commit 35f83a75529a ("netfilter: nft_set_rbtree: don't gc elements on insert"). Fixes: 3c4287f62044 ("nf_tables: Add set type for arbitrary concatenation of ranges") Reported-by: Yiming Qian <yimingqian591@gmail.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-12netfilter: nf_tables: clone set on flush onlyPablo Neira Ayuso
[ Upstream commit fb7fb4016300ac622c964069e286dc83166a5d52 ] Syzbot with fault injection triggered a failing memory allocation with GFP_KERNEL which results in a WARN splat: iter.err WARNING: net/netfilter/nf_tables_api.c:845 at nft_map_deactivate+0x34e/0x3c0 net/netfilter/nf_tables_api.c:845, CPU#0: syz.0.17/5992 Modules linked in: CPU: 0 UID: 0 PID: 5992 Comm: syz.0.17 Not tainted syzkaller #0 PREEMPT(full) Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 02/12/2026 RIP: 0010:nft_map_deactivate+0x34e/0x3c0 net/netfilter/nf_tables_api.c:845 Code: 8b 05 86 5a 4e 09 48 3b 84 24 a0 00 00 00 75 62 48 8d 65 d8 5b 41 5c 41 5d 41 5e 41 5f 5d c3 cc cc cc cc cc e8 63 6d fa f7 90 <0f> 0b 90 43 +80 7c 35 00 00 0f 85 23 fe ff ff e9 26 fe ff ff 89 d9 RSP: 0018:ffffc900045af780 EFLAGS: 00010293 RAX: ffffffff89ca45bd RBX: 00000000fffffff4 RCX: ffff888028111e40 RDX: 0000000000000000 RSI: 00000000fffffff4 RDI: 0000000000000000 RBP: ffffc900045af870 R08: 0000000000400dc0 R09: 00000000ffffffff R10: dffffc0000000000 R11: fffffbfff1d141db R12: ffffc900045af7e0 R13: 1ffff920008b5f24 R14: dffffc0000000000 R15: ffffc900045af920 FS: 000055557a6a5500(0000) GS:ffff888125496000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007fb5ea271fc0 CR3: 000000003269e000 CR4: 00000000003526f0 Call Trace: <TASK> __nft_release_table+0xceb/0x11f0 net/netfilter/nf_tables_api.c:12115 nft_rcv_nl_event+0xc25/0xdb0 net/netfilter/nf_tables_api.c:12187 notifier_call_chain+0x19d/0x3a0 kernel/notifier.c:85 blocking_notifier_call_chain+0x6a/0x90 kernel/notifier.c:380 netlink_release+0x123b/0x1ad0 net/netlink/af_netlink.c:761 __sock_release net/socket.c:662 [inline] sock_close+0xc3/0x240 net/socket.c:1455 Restrict set clone to the flush set command in the preparation phase. Add NFT_ITER_UPDATE_CLONE and use it for this purpose, update the rbtree and pipapo backends to only clone the set when this iteration type is used. As for the existing NFT_ITER_UPDATE type, update the pipapo backend to use the existing set clone if available, otherwise use the existing set representation. After this update, there is no need to clone a set that is being deleted, this includes bound anonymous set. An alternative approach to NFT_ITER_UPDATE_CLONE is to add a .clone interface and call it from the flush set path. Reported-by: syzbot+4924a0edc148e8b4b342@syzkaller.appspotmail.com Fixes: 3f1d886cc7c3 ("netfilter: nft_set_pipapo: move cloning of match info to insert/removal path") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-12netfilter: nf_tables: unconditionally bump set->nelems before insertionPablo Neira Ayuso
[ Upstream commit def602e498a4f951da95c95b1b8ce8ae68aa733a ] In case that the set is full, a new element gets published then removed without waiting for the RCU grace period, while RCU reader can be walking over it already. To address this issue, add the element transaction even if set is full, but toggle the set_full flag to report -ENFILE so the abort path safely unwinds the set to its previous state. As for element updates, decrement set->nelems to restore it. A simpler fix is to call synchronize_rcu() in the error path. However, with a large batch adding elements to already maxed-out set, this could cause noticeable slowdown of such batches. Fixes: 35d0ac9070ef ("netfilter: nf_tables: fix set->nelems counting with no NLM_F_EXCL") Reported-by: Inseo An <y0un9sa@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-12Revert "netfilter: nft_set_rbtree: validate open interval overlap"Greg Kroah-Hartman
This reverts commit 6db2be971e3d70c9e3f85d39eff7103c2ee2f579 which is commit 648946966a08e4cb1a71619e3d1b12bd7642de7b upstream. It is causing netfilter issues, so revert it for now. Link: https://lore.kernel.org/r/aaeEd8UqYQ33Af7_@chamomile Cc: Pablo Neira Ayuso <pablo@netfilter.org> Cc: Florian Westphal <fw@strlen.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2026-03-04netfilter: nf_conntrack_h323: fix OOB read in decode_choice()Vahagn Vardanian
[ Upstream commit baed0d9ba91d4f390da12d5039128ee897253d60 ] In decode_choice(), the boundary check before get_len() uses the variable `len`, which is still 0 from its initialization at the top of the function: unsigned int type, ext, len = 0; ... if (ext || (son->attr & OPEN)) { BYTE_ALIGN(bs); if (nf_h323_error_boundary(bs, len, 0)) /* len is 0 here */ return H323_ERROR_BOUND; len = get_len(bs); /* OOB read */ When the bitstream is exactly consumed (bs->cur == bs->end), the check nf_h323_error_boundary(bs, 0, 0) evaluates to (bs->cur + 0 > bs->end), which is false. The subsequent get_len() call then dereferences *bs->cur++, reading 1 byte past the end of the buffer. If that byte has bit 7 set, get_len() reads a second byte as well. This can be triggered remotely by sending a crafted Q.931 SETUP message with a User-User Information Element containing exactly 2 bytes of PER-encoded data ({0x08, 0x00}) to port 1720 through a firewall with the nf_conntrack_h323 helper active. The decoder fully consumes the PER buffer before reaching this code path, resulting in a 1-2 byte heap-buffer-overflow read confirmed by AddressSanitizer. Fix this by checking for 2 bytes (the maximum that get_len() may read) instead of the uninitialized `len`. This matches the pattern used at every other get_len() call site in the same file, where the caller checks for 2 bytes of available data before calling get_len(). Fixes: ec8a8f3c31dd ("netfilter: nf_ct_h323: Extend nf_h323_error_boundary to work on bits as well") Signed-off-by: Vahagn Vardanian <vahagn@redrays.io> Signed-off-by: Florian Westphal <fw@strlen.de> Link: https://patch.msgid.link/20260225130619.1248-2-fw@strlen.de Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-04netfilter: xt_tcpmss: check remaining length before reading optlenFlorian Westphal
[ Upstream commit 735ee8582da3d239eb0c7a53adca61b79fb228b3 ] Quoting reporter: In net/netfilter/xt_tcpmss.c (lines 53-68), the TCP option parser reads op[i+1] directly without validating the remaining option length. If the last byte of the option field is not EOL/NOP (0/1), the code attempts to index op[i+1]. In the case where i + 1 == optlen, this causes an out-of-bounds read, accessing memory past the optlen boundary (either reading beyond the stack buffer _opt or the following payload). Reported-by: sungzii <sungzii@pm.me> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-04netfilter: nf_conntrack: Add allow_clash to generic protocol handlerYuto Hamaguchi
[ Upstream commit 8a49fc8d8a3e83dc51ec05bcd4007bdea3c56eec ] The upstream commit, 71d8c47fc653711c41bc3282e5b0e605b3727956 ("netfilter: conntrack: introduce clash resolution on insertion race"), sets allow_clash=true in the UDP/UDPLITE protocol handler but does not set it in the generic protocol handler. As a result, packets composed of connectionless protocols at each layer, such as UDP over IP-in-IP, still drop packets due to conflicts during conntrack insertion. To resolve this, this patch sets allow_clash in the nf_conntrack_l4proto_generic. Signed-off-by: Yuto Hamaguchi <Hamaguchi.Yuto@da.MitsubishiElectric.co.jp> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-02-27netfilter: nf_tables: add .abort_skip_removal flag for set typesPablo Neira Ayuso
commit f175b46d9134f708358b5404730c6dfa200fbf3c upstream. The pipapo set backend is the only user of the .abort interface so far. To speed up pipapo abort path, removals are skipped. The follow up patch updates the rbtree to use to build an array of ordered elements, then use binary search. This needs a new .abort interface but, unlike pipapo, it also need to undo/remove elements. Add a flag and use it from the pipapo set backend. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Florian Westphal <fw@strlen.de> Cc: "Kris Karas (Bug Reporting)" <bugs-a21@moonlit-rail.com> Cc: Genes Lists <lists@sapience.com> Cc: Philip Müller <philm@manjaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2026-02-26netfilter: nf_tables: fix use-after-free in nf_tables_addchain()Inseo An
[ Upstream commit 71e99ee20fc3f662555118cf1159443250647533 ] nf_tables_addchain() publishes the chain to table->chains via list_add_tail_rcu() (in nft_chain_add()) before registering hooks. If nf_tables_register_hook() then fails, the error path calls nft_chain_del() (list_del_rcu()) followed by nf_tables_chain_destroy() with no RCU grace period in between. This creates two use-after-free conditions: 1) Control-plane: nf_tables_dump_chains() traverses table->chains under rcu_read_lock(). A concurrent dump can still be walking the chain when the error path frees it. 2) Packet path: for NFPROTO_INET, nf_register_net_hook() briefly installs the IPv4 hook before IPv6 registration fails. Packets entering nft_do_chain() via the transient IPv4 hook can still be dereferencing chain->blob_gen_X when the error path frees the chain. Add synchronize_rcu() between nft_chain_del() and the chain destroy so that all RCU readers -- both dump threads and in-flight packet evaluation -- have finished before the chain is freed. Fixes: 91c7b38dc9f0 ("netfilter: nf_tables: use new transaction infrastructure to handle chain") Signed-off-by: Inseo An <y0un9sa@gmail.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-02-26ipvs: do not keep dest_dst if dev is going downJulian Anastasov
[ Upstream commit 8fde939b0206afc1d5846217a01a16b9bc8c7896 ] There is race between the netdev notifier ip_vs_dst_event() and the code that caches dst with dev that is going down. As the FIB can be notified for the closed device after our handler finishes, it is possible valid route to be returned and cached resuling in a leaked dev reference until the dest is not removed. To prevent new dest_dst to be attached to dest just after the handler dropped the old one, add a netif_running() check to make sure the notifier handler is not currently running for device that is closing. Fixes: 7a4f0761fce3 ("IPVS: init and cleanup restructuring") Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-02-26ipvs: skip ipv6 extension headers for csum checksJulian Anastasov
[ Upstream commit 05cfe9863ef049d98141dc2969eefde72fb07625 ] Protocol checksum validation fails for IPv6 if there are extension headers before the protocol header. iph->len already contains its offset, so use it to fix the problem. Fixes: 2906f66a5682 ("ipvs: SCTP Trasport Loadbalancing Support") Fixes: 0bbdd42b7efa ("IPVS: Extend protocol DNAT/SNAT and state handlers") Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-02-26netfilter: nf_conntrack_h323: don't pass uninitialised l3num valueFlorian Westphal
[ Upstream commit a6d28eb8efe96b3e35c92efdf1bfacb0cccf541f ] Mihail Milev reports: Error: UNINIT (CWE-457): net/netfilter/nf_conntrack_h323_main.c:1189:2: var_decl: Declaring variable "tuple" without initializer. net/netfilter/nf_conntrack_h323_main.c:1197:2: uninit_use_in_call: Using uninitialized value "tuple.src.l3num" when calling "__nf_ct_expect_find". net/netfilter/nf_conntrack_expect.c:142:2: read_value: Reading value "tuple->src.l3num" when calling "nf_ct_expect_dst_hash". 1195| tuple.dst.protonum = IPPROTO_TCP; 1196| 1197|-> exp = __nf_ct_expect_find(net, nf_ct_zone(ct), &tuple); 1198| if (exp && exp->master == ct) 1199| return exp; Switch this to a C99 initialiser and set the l3num value. Fixes: f587de0e2feb ("[NETFILTER]: nf_conntrack/nf_nat: add H.323 helper port") Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Sasha Levin <sashal@kernel.org>