| Age | Commit message (Collapse) | Author |
|
[ Upstream commit 6a65c0cb0ff20b3cbc5f1c87b37dd22cdde14a1c ]
tipc_aead_users_dec() calls rcu_dereference(aead) twice: once to store
in 'tmp' for the NULL check, and again inside the atomic_add_unless()
call.
Use the already-dereferenced 'tmp' pointer consistently, matching the
correct pattern used in tipc_aead_users_inc() and tipc_aead_users_set().
Fixes: fc1b6d6de220 ("tipc: introduce TIPC encryption & authentication")
Cc: stable@vger.kernel.org
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Daniel Hodges <hodgesd@meta.com>
Link: https://patch.msgid.link/20260203145621.17399-1-git@danielhodges.dev
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit baed0d9ba91d4f390da12d5039128ee897253d60 ]
In decode_choice(), the boundary check before get_len() uses the
variable `len`, which is still 0 from its initialization at the top of
the function:
unsigned int type, ext, len = 0;
...
if (ext || (son->attr & OPEN)) {
BYTE_ALIGN(bs);
if (nf_h323_error_boundary(bs, len, 0)) /* len is 0 here */
return H323_ERROR_BOUND;
len = get_len(bs); /* OOB read */
When the bitstream is exactly consumed (bs->cur == bs->end), the check
nf_h323_error_boundary(bs, 0, 0) evaluates to (bs->cur + 0 > bs->end),
which is false. The subsequent get_len() call then dereferences
*bs->cur++, reading 1 byte past the end of the buffer. If that byte
has bit 7 set, get_len() reads a second byte as well.
This can be triggered remotely by sending a crafted Q.931 SETUP message
with a User-User Information Element containing exactly 2 bytes of
PER-encoded data ({0x08, 0x00}) to port 1720 through a firewall with
the nf_conntrack_h323 helper active. The decoder fully consumes the
PER buffer before reaching this code path, resulting in a 1-2 byte
heap-buffer-overflow read confirmed by AddressSanitizer.
Fix this by checking for 2 bytes (the maximum that get_len() may read)
instead of the uninitialized `len`. This matches the pattern used at
every other get_len() call site in the same file, where the caller
checks for 2 bytes of available data before calling get_len().
Fixes: ec8a8f3c31dd ("netfilter: nf_ct_h323: Extend nf_h323_error_boundary to work on bits as well")
Signed-off-by: Vahagn Vardanian <vahagn@redrays.io>
Signed-off-by: Florian Westphal <fw@strlen.de>
Link: https://patch.msgid.link/20260225130619.1248-2-fw@strlen.de
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 7aa767d0d3d04e50ae94e770db7db8197f666970 ]
udpgro_frglist.sh and udpgro_bench.sh are the flakiest tests
currently in NIPA. They fail in the same exact way, TCP GRO
test stalls occasionally and the test gets killed after 10min.
These tests use veth to simulate GRO. They attach a trivial
("return XDP_PASS;") XDP program to the veth to force TSO off
and NAPI on.
Digging into the failure mode we can see that the connection
is completely stuck after a burst of drops. The sender's snd_nxt
is at sequence number N [1], but the receiver claims to have
received (rcv_nxt) up to N + 3 * MSS [2]. Last piece of the puzzle
is that senders rtx queue is not empty (let's say the block in
the rtx queue is at sequence number N - 4 * MSS [3]).
In this state, sender sends a retransmission from the rtx queue
with a single segment, and sequence numbers N-4*MSS:N-3*MSS [3].
Receiver sees it and responds with an ACK all the way up to
N + 3 * MSS [2]. But sender will reject this ack as TCP_ACK_UNSENT_DATA
because it has no recollection of ever sending data that far out [1].
And we are stuck.
The root cause is the mess of the xmit return codes. veth returns
an error when it can't xmit a frame. We end up with a loss event
like this:
-------------------------------------------------
| GSO super frame 1 | GSO super frame 2 |
|-----------------------------------------------|
| seg | seg | seg | seg | seg | seg | seg | seg |
| 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
-------------------------------------------------
x ok ok <ok>| ok ok ok <x>
\\
snd_nxt
"x" means packet lost by veth, and "ok" means it went thru.
Since veth has TSO disabled in this test it sees individual segments.
Segment 1 is on the retransmit queue and will be resent.
So why did the sender not advance snd_nxt even tho it clearly did
send up to seg 8? tcp_write_xmit() interprets the return code
from the core to mean that data has not been sent at all. Since
TCP deals with GSO super frames, not individual segment the crux
of the problem is that loss of a single segment can be interpreted
as loss of all. TCP only sees the last return code for the last
segment of the GSO frame (in <> brackets in the diagram above).
Of course for the problem to occur we need a setup or a device
without a Qdisc. Otherwise Qdisc layer disconnects the protocol
layer from the device errors completely.
We have multiple ways to fix this.
1) make veth not return an error when it lost a packet.
While this is what I think we did in the past, the issue keeps
reappearing and it's annoying to debug. The game of whack
a mole is not great.
2) fix the damn return codes
We only talk about NETDEV_TX_OK and NETDEV_TX_BUSY in the
documentation, so maybe we should make the return code from
ndo_start_xmit() a boolean. I like that the most, but perhaps
some ancient, not-really-networking protocol would suffer.
3) make TCP ignore the errors
It is not entirely clear to me what benefit TCP gets from
interpreting the result of ip_queue_xmit()? Specifically once
the connection is established and we're pushing data - packet
loss is just packet loss?
4) this fix
Ignore the rc in the Qdisc-less+GSO case, since it's unreliable.
We already always return OK in the TCQ_F_CAN_BYPASS case.
In the Qdisc-less case let's be a bit more conservative and only
mask the GSO errors. This path is taken by non-IP-"networks"
like CAN, MCTP etc, so we could regress some ancient thing.
This is the simplest, but also maybe the hackiest fix?
Similar fix has been proposed by Eric in the past but never committed
because original reporter was working with an OOT driver and wasn't
providing feedback (see Link).
Link: https://lore.kernel.org/CANn89iJcLepEin7EtBETrZ36bjoD9LrR=k4cfwWh046GB+4f9A@mail.gmail.com
Fixes: 1f59533f9ca5 ("qdisc: validate frames going through the direct_xmit path")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20260223235100.108939-1-kuba@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 138d7eca445ef37a0333425d269ee59900ca1104 ]
This adds a check for encryption key size upon receiving
L2CAP_LE_CONN_REQ which is required by L2CAP/LE/CFC/BV-15-C which
expects L2CAP_CR_LE_BAD_KEY_SIZE.
Link: https://lore.kernel.org/linux-bluetooth/5782243.rdbgypaU67@n9w6sw14/
Fixes: 27e2d4c8d28b ("Bluetooth: Add basic LE L2CAP connect request receiving support")
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Tested-by: Christian Eggers <ceggers@arri.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 7af8479d9eb4319b4ba7b47a8c4d2c55af1c31e1 ]
l2cap_check_enc_key_size shall check the security level of the
l2cap_chan rather than the hci_conn since for incoming connection
request that may be different as hci_conn may already been
encrypted using a different security level.
Fixes: 522e9ed157e3 ("Bluetooth: l2cap: Check encryption key size on incoming connection")
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Stable-dep-of: 138d7eca445e ("Bluetooth: L2CAP: Fix missing key size check for L2CAP_LE_CONN_REQ")
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 522e9ed157e3c21b4dd623c79967f72c21e45b78 ]
This is required for passing GAP/SEC/SEM/BI-04-C PTS test case:
Security Mode 4 Level 4, Responder - Invalid Encryption Key Size
- 128 bit
This tests the security key with size from 1 to 15 bytes while the
Security Mode 4 Level 4 requests 16 bytes key size.
Currently PTS fails with the following logs:
- expected:Connection Response:
Code: [3 (0x03)] Code
Identifier: (lt)WildCard: Exists(gt)
Length: [8 (0x0008)]
Destination CID: (lt)WildCard: Exists(gt)
Source CID: [64 (0x0040)]
Result: [3 (0x0003)] Connection refused - Security block
Status: (lt)WildCard: Exists(gt),
but received:Connection Response:
Code: [3 (0x03)] Code
Identifier: [1 (0x01)]
Length: [8 (0x0008)]
Destination CID: [64 (0x0040)]
Source CID: [64 (0x0040)]
Result: [0 (0x0000)] Connection Successful
Status: [0 (0x0000)] No further information available
And HCI logs:
< HCI Command: Read Encrypti.. (0x05|0x0008) plen 2
Handle: 14 Address: 00:1B:DC:F2:24:10 (Vencer Co., Ltd.)
> HCI Event: Command Complete (0x0e) plen 7
Read Encryption Key Size (0x05|0x0008) ncmd 1
Status: Success (0x00)
Handle: 14 Address: 00:1B:DC:F2:24:10 (Vencer Co., Ltd.)
Key size: 7
> ACL Data RX: Handle 14 flags 0x02 dlen 12
L2CAP: Connection Request (0x02) ident 1 len 4
PSM: 4097 (0x1001)
Source CID: 64
< ACL Data TX: Handle 14 flags 0x00 dlen 16
L2CAP: Connection Response (0x03) ident 1 len 8
Destination CID: 64
Source CID: 64
Result: Connection successful (0x0000)
Status: No further information available (0x0000)
Fixes: 288c06973daa ("Bluetooth: Enforce key size of 16 bytes on FIPS level")
Signed-off-by: Frédéric Danis <frederic.danis@collabora.com>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Stable-dep-of: 138d7eca445e ("Bluetooth: L2CAP: Fix missing key size check for L2CAP_LE_CONN_REQ")
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 288c06973daae4637f25a0d1bdaf65fdbf8455f9 ]
According to the spec Ver 5.2, Vol 3, Part C, Sec 5.2.2.8:
Device in security mode 4 level 4 shall enforce:
128-bit equivalent strength for link and encryption keys required
using FIPS approved algorithms (E0 not allowed, SAFER+ not allowed,
and P-192 not allowed; encryption key not shortened)
This patch rejects connection with key size below 16 for FIPS
level services.
Signed-off-by: Archie Pusaka <apusaka@chromium.org>
Reviewed-by: Alain Michaud <alainm@chromium.org>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Stable-dep-of: 138d7eca445e ("Bluetooth: L2CAP: Fix missing key size check for L2CAP_LE_CONN_REQ")
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 05761c2c2b5bfec85c47f60c903c461e9b56cf87 ]
Similar to 03dba9cea72f ("Bluetooth: L2CAP: Fix not responding with
L2CAP_CR_LE_ENCRYPTION") the result code L2CAP_CR_LE_ENCRYPTION shall
be used when BT_SECURITY_MEDIUM is set since that means security mode 2
which mean it doesn't require authentication which results in
qualification test L2CAP/ECFC/BV-32-C failing.
Link: https://github.com/bluez/bluez/issues/1871
Fixes: 15f02b910562 ("Bluetooth: L2CAP: Add initial code for Enhanced Credit Based Mode")
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 7accb1c4321acb617faf934af59d928b0b047e2b ]
This fixes responding with an invalid result caused by checking the
wrong size of CID which should have been (cmd_len - sizeof(*req)) and
on top of it the wrong result was use L2CAP_CR_LE_INVALID_PARAMS which
is invalid/reserved for reconf when running test like L2CAP/ECFC/BI-03-C:
> ACL Data RX: Handle 64 flags 0x02 dlen 14
LE L2CAP: Enhanced Credit Reconfigure Request (0x19) ident 2 len 6
MTU: 64
MPS: 64
Source CID: 64
< ACL Data TX: Handle 64 flags 0x00 dlen 10
LE L2CAP: Enhanced Credit Reconfigure Respond (0x1a) ident 2 len 2
! Result: Reserved (0x000c)
Result: Reconfiguration failed - one or more Destination CIDs invalid (0x0003)
Fiix L2CAP/ECFC/BI-04-C which expects L2CAP_RECONF_INVALID_MPS (0x0002)
when more than one channel gets its MPS reduced:
> ACL Data RX: Handle 64 flags 0x02 dlen 16
LE L2CAP: Enhanced Credit Reconfigure Request (0x19) ident 2 len 8
MTU: 264
MPS: 99
Source CID: 64
! Source CID: 65
< ACL Data TX: Handle 64 flags 0x00 dlen 10
LE L2CAP: Enhanced Credit Reconfigure Respond (0x1a) ident 2 len 2
! Result: Reconfiguration successful (0x0000)
Result: Reconfiguration failed - reduction in size of MPS not allowed for more than one channel at a time (0x0002)
Fix L2CAP/ECFC/BI-05-C when SCID is invalid (85 unconnected):
> ACL Data RX: Handle 64 flags 0x02 dlen 14
LE L2CAP: Enhanced Credit Reconfigure Request (0x19) ident 2 len 6
MTU: 65
MPS: 64
! Source CID: 85
< ACL Data TX: Handle 64 flags 0x00 dlen 10
LE L2CAP: Enhanced Credit Reconfigure Respond (0x1a) ident 2 len 2
! Result: Reconfiguration successful (0x0000)
Result: Reconfiguration failed - one or more Destination CIDs invalid (0x0003)
Fix L2CAP/ECFC/BI-06-C when MPS < L2CAP_ECRED_MIN_MPS (64):
> ACL Data RX: Handle 64 flags 0x02 dlen 14
LE L2CAP: Enhanced Credit Reconfigure Request (0x19) ident 2 len 6
MTU: 672
! MPS: 63
Source CID: 64
< ACL Data TX: Handle 64 flags 0x00 dlen 10
LE L2CAP: Enhanced Credit Reconfigure Respond (0x1a) ident 2 len 2
! Result: Reconfiguration failed - reduction in size of MPS not allowed for more than one channel at a time (0x0002)
Result: Reconfiguration failed - other unacceptable parameters (0x0004)
Fix L2CAP/ECFC/BI-07-C when MPS reduced for more than one channel:
> ACL Data RX: Handle 64 flags 0x02 dlen 16
LE L2CAP: Enhanced Credit Reconfigure Request (0x19) ident 3 len 8
MTU: 84
! MPS: 71
Source CID: 64
! Source CID: 65
< ACL Data TX: Handle 64 flags 0x00 dlen 10
LE L2CAP: Enhanced Credit Reconfigure Respond (0x1a) ident 2 len 2
! Result: Reconfiguration successful (0x0000)
Result: Reconfiguration failed - reduction in size of MPS not allowed for more than one channel at a time (0x0002)
Link: https://github.com/bluez/bluez/issues/1865
Fixes: 15f02b910562 ("Bluetooth: L2CAP: Add initial code for Enhanced Credit Based Mode")
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit c8d7f21ead727485ebf965e2b4d42d4a4f0840f6 ]
The IGTK key ID must be 4 or 5, but the code checks against
key ID + 1, so must check against 5/6 rather than 4/5. Fix
that.
Reported-by: Jouni Malinen <j@w1.fi>
Fixes: 08645126dd24 ("cfg80211: implement wext key handling")
Link: https://patch.msgid.link/20260209181220.362205-2-johannes@sipsolutions.net
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 1799d8abeabc68ec05679292aaf6cba93b343c05 ]
xfrm6_get_saddr() does not check the return value of
ipv6_dev_get_saddr(). When ipv6_dev_get_saddr() fails to find a suitable
source address (returns -EADDRNOTAVAIL), saddr->in6 is left
uninitialized, but xfrm6_get_saddr() still returns 0 (success).
This causes the caller xfrm_tmpl_resolve_one() to use the uninitialized
address in xfrm_state_find(), triggering KMSAN warning:
=====================================================
BUG: KMSAN: uninit-value in xfrm_state_find+0x2424/0xa940
xfrm_state_find+0x2424/0xa940
xfrm_resolve_and_create_bundle+0x906/0x5a20
xfrm_lookup_with_ifid+0xcc0/0x3770
xfrm_lookup_route+0x63/0x2b0
ip_route_output_flow+0x1ce/0x270
udp_sendmsg+0x2ce1/0x3400
inet_sendmsg+0x1ef/0x2a0
__sock_sendmsg+0x278/0x3d0
__sys_sendto+0x593/0x720
__x64_sys_sendto+0x130/0x200
x64_sys_call+0x332b/0x3e70
do_syscall_64+0xd3/0xf80
entry_SYSCALL_64_after_hwframe+0x77/0x7f
Local variable tmp.i.i created at:
xfrm_resolve_and_create_bundle+0x3e3/0x5a20
xfrm_lookup_with_ifid+0xcc0/0x3770
=====================================================
Fix by checking the return value of ipv6_dev_get_saddr() and propagating
the error.
Fixes: a1e59abf8249 ("[XFRM]: Fix wildcard as tunnel source")
Reported-by: syzbot+e136d86d34b42399a8b1@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/all/68bf1024.a70a0220.7a912.02c2.GAE@google.com/T/
Signed-off-by: Jiayuan Chen <jiayuan.chen@shopee.com>
Signed-off-by: Jiayuan Chen <jiayuan.chen@linux.dev>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit b89fc7c2523b2b0750d91840f4e52521270d70ed ]
When canceling the reconnect worker, care must be taken to reset the
reconnect-pending bit. If the reconnect worker has not yet been
scheduled before it is canceled, the reconnect-pending bit will stay
on forever.
Signed-off-by: Håkon Bugge <haakon.bugge@oracle.com>
Signed-off-by: Allison Henderson <allison.henderson@oracle.com>
Link: https://patch.msgid.link/20260203055723.1085751-6-achender@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit e25dbf561e03c0c5e36228e3b8b784392819ce85 ]
The gcc-16.0.1 snapshot produces a false-positive warning that turns
into a build failure with CONFIG_WERROR:
In file included from arch/x86/include/asm/string.h:6,
from net/vmw_vsock/vmci_transport.c:10:
In function 'vmci_transport_packet_init',
inlined from '__vmci_transport_send_control_pkt.constprop' at net/vmw_vsock/vmci_transport.c:198:2:
arch/x86/include/asm/string_32.h:150:25: error: argument 2 null where non-null expected because argument 3 is nonzero [-Werror=nonnull]
150 | #define memcpy(t, f, n) __builtin_memcpy(t, f, n)
| ^~~~~~~~~~~~~~~~~~~~~~~~~
net/vmw_vsock/vmci_transport.c:164:17: note: in expansion of macro 'memcpy'
164 | memcpy(&pkt->u.wait, wait, sizeof(pkt->u.wait));
| ^~~~~~
arch/x86/include/asm/string_32.h:150:25: note: in a call to built-in function '__builtin_memcpy'
net/vmw_vsock/vmci_transport.c:164:17: note: in expansion of macro 'memcpy'
164 | memcpy(&pkt->u.wait, wait, sizeof(pkt->u.wait));
| ^~~~~~
This seems relatively harmless, and it so far the only instance of this
warning I have found. The __vmci_transport_send_control_pkt function
is called either with wait=NULL or with one of the type values that
pass 'wait' into memcpy() here, but not from the same caller.
Replacing the memcpy with a struct assignment is otherwise the same
but avoids the warning.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Bobby Eshleman <bobbyeshleman@meta.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Reviewed-by: Bryan Tan <bryan-bt.tan@broadcom.com>
Link: https://patch.msgid.link/20260203163406.2636463-1-arnd@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 49d0901e260739de2fcc90c0c29f9e31e39a2d9b ]
hci_conn_enter_active_mode() uses queue_delayed_work() with the
intention that the work will run after the given timeout. However,
queue_delayed_work() does nothing if the work is already queued, so
depending on the link policy we may end up putting the connection
into idle mode every hdev->idle_timeout ms.
Use mod_delayed_work() instead so the work is queued if not already
queued, and the timeout is updated otherwise.
Signed-off-by: Stefan Sørensen <ssorensen@roku.com>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 6e84fc395e90465f1418f582a9f7d53c87ab010e ]
syzbot reported that struct fib_alias.fa_state can be
modified locklessly by RCU readers. [0]
Let's use READ_ONCE()/WRITE_ONCE() properly.
[0]:
BUG: KCSAN: data-race in fib_table_lookup / fib_table_lookup
write to 0xffff88811b06a7fa of 1 bytes by task 4167 on cpu 0:
fib_alias_accessed net/ipv4/fib_lookup.h:32 [inline]
fib_table_lookup+0x361/0xd60 net/ipv4/fib_trie.c:1565
fib_lookup include/net/ip_fib.h:390 [inline]
ip_route_output_key_hash_rcu+0x378/0x1380 net/ipv4/route.c:2814
ip_route_output_key_hash net/ipv4/route.c:2705 [inline]
__ip_route_output_key include/net/route.h:169 [inline]
ip_route_output_flow+0x65/0x110 net/ipv4/route.c:2932
udp_sendmsg+0x13c3/0x15d0 net/ipv4/udp.c:1450
inet_sendmsg+0xac/0xd0 net/ipv4/af_inet.c:859
sock_sendmsg_nosec net/socket.c:727 [inline]
__sock_sendmsg net/socket.c:742 [inline]
____sys_sendmsg+0x53a/0x600 net/socket.c:2592
___sys_sendmsg+0x195/0x1e0 net/socket.c:2646
__sys_sendmmsg+0x185/0x320 net/socket.c:2735
__do_sys_sendmmsg net/socket.c:2762 [inline]
__se_sys_sendmmsg net/socket.c:2759 [inline]
__x64_sys_sendmmsg+0x57/0x70 net/socket.c:2759
x64_sys_call+0x1e28/0x3000 arch/x86/include/generated/asm/syscalls_64.h:308
do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
do_syscall_64+0xc0/0x2a0 arch/x86/entry/syscall_64.c:94
entry_SYSCALL_64_after_hwframe+0x77/0x7f
read to 0xffff88811b06a7fa of 1 bytes by task 4168 on cpu 1:
fib_alias_accessed net/ipv4/fib_lookup.h:31 [inline]
fib_table_lookup+0x338/0xd60 net/ipv4/fib_trie.c:1565
fib_lookup include/net/ip_fib.h:390 [inline]
ip_route_output_key_hash_rcu+0x378/0x1380 net/ipv4/route.c:2814
ip_route_output_key_hash net/ipv4/route.c:2705 [inline]
__ip_route_output_key include/net/route.h:169 [inline]
ip_route_output_flow+0x65/0x110 net/ipv4/route.c:2932
udp_sendmsg+0x13c3/0x15d0 net/ipv4/udp.c:1450
inet_sendmsg+0xac/0xd0 net/ipv4/af_inet.c:859
sock_sendmsg_nosec net/socket.c:727 [inline]
__sock_sendmsg net/socket.c:742 [inline]
____sys_sendmsg+0x53a/0x600 net/socket.c:2592
___sys_sendmsg+0x195/0x1e0 net/socket.c:2646
__sys_sendmmsg+0x185/0x320 net/socket.c:2735
__do_sys_sendmmsg net/socket.c:2762 [inline]
__se_sys_sendmmsg net/socket.c:2759 [inline]
__x64_sys_sendmmsg+0x57/0x70 net/socket.c:2759
x64_sys_call+0x1e28/0x3000 arch/x86/include/generated/asm/syscalls_64.h:308
do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
do_syscall_64+0xc0/0x2a0 arch/x86/entry/syscall_64.c:94
entry_SYSCALL_64_after_hwframe+0x77/0x7f
value changed: 0x00 -> 0x01
Reported by Kernel Concurrency Sanitizer on:
CPU: 1 UID: 0 PID: 4168 Comm: syz.4.206 Not tainted syzkaller #0 PREEMPT(voluntary)
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 10/25/2025
Reported-by: syzbot+d24f940f770afda885cf@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/netdev/69783ead.050a0220.c9109.0013.GAE@google.com/
Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20260127043528.514160-1-kuniyu@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit ad22d24be635c6beab6a1fdd3f8b1f3c478d15da ]
RDS connections carry a state "rds_conn_path::cp_state"
and transitions from one state to another and are conditional
upon an expected state: "rds_conn_path_transition."
There is one exception to this conditionality, which is
"RDS_CONN_ERROR" that can be enforced by "rds_conn_path_drop"
regardless of what state the condition is currently in.
But as soon as a connection enters state "RDS_CONN_ERROR",
the connection handling code expects it to go through the
shutdown-path.
The RDS/TCP multipath changes added a shortcut out of
"RDS_CONN_ERROR" straight back to "RDS_CONN_CONNECTING"
via "rds_tcp_accept_one_path" (e.g. after "rds_tcp_state_change").
A subsequent "rds_tcp_reset_callbacks" can then transition
the state to "RDS_CONN_RESETTING" with a shutdown-worker queued.
That'll trip up "rds_conn_init_shutdown", which was
never adjusted to handle "RDS_CONN_RESETTING" and subsequently
drops the connection with the dreaded "DR_INV_CONN_STATE",
which leaves "RDS_SHUTDOWN_WORK_QUEUED" on forever.
So we do two things here:
a) Don't shortcut "RDS_CONN_ERROR", but take the longer
path through the shutdown code.
b) Add "RDS_CONN_RESETTING" to the expected states in
"rds_conn_init_shutdown" so that we won't error out
and get stuck, if we ever hit weird state transitions
like this again."
Signed-off-by: Gerd Rausch <gerd.rausch@oracle.com>
Signed-off-by: Allison Henderson <allison.henderson@oracle.com>
Link: https://patch.msgid.link/20260122055213.83608-2-achender@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 735ee8582da3d239eb0c7a53adca61b79fb228b3 ]
Quoting reporter:
In net/netfilter/xt_tcpmss.c (lines 53-68), the TCP option parser reads
op[i+1] directly without validating the remaining option length.
If the last byte of the option field is not EOL/NOP (0/1), the code attempts
to index op[i+1]. In the case where i + 1 == optlen, this causes an
out-of-bounds read, accessing memory past the optlen boundary
(either reading beyond the stack buffer _opt or the
following payload).
Reported-by: sungzii <sungzii@pm.me>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 8a49fc8d8a3e83dc51ec05bcd4007bdea3c56eec ]
The upstream commit, 71d8c47fc653711c41bc3282e5b0e605b3727956
("netfilter: conntrack: introduce clash resolution on insertion race"),
sets allow_clash=true in the UDP/UDPLITE protocol handler
but does not set it in the generic protocol handler.
As a result, packets composed of connectionless protocols at each layer,
such as UDP over IP-in-IP, still drop packets due to conflicts during conntrack insertion.
To resolve this, this patch sets allow_clash in the nf_conntrack_l4proto_generic.
Signed-off-by: Yuto Hamaguchi <Hamaguchi.Yuto@da.MitsubishiElectric.co.jp>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
commit dd2fdc3504592d85e549c523b054898a036a6afe upstream.
Commit 5940d1cf9f42 ("SUNRPC: Rebalance a kref in auth_gss.c") added
a kref_get(&gss_auth->kref) call to balance the gss_put_auth() done
in gss_release_msg(), but forgot to add a corresponding kref_put()
on the error path when kstrdup_const() fails.
If service_name is non-NULL and kstrdup_const() fails, the function
jumps to err_put_pipe_version which calls put_pipe_version() and
kfree(gss_msg), but never releases the gss_auth reference. This leads
to a kref leak where the gss_auth structure is never freed.
Add a forward declaration for gss_free_callback() and call kref_put()
in the err_put_pipe_version error path to properly release the
reference taken earlier.
Fixes: 5940d1cf9f42 ("SUNRPC: Rebalance a kref in auth_gss.c")
Cc: stable@vger.kernel.org
Signed-off-by: Daniel Hodges <git@danielhodges.dev>
Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 3e6397b056335cc56ef0e9da36c95946a19f5118 upstream.
The gssx_dec_ctx(), gssx_dec_status(), and gssx_dec_name()
functions allocate memory via gssx_dec_buffer(), which calls
kmemdup(). When a subsequent decode operation fails, these
functions return immediately without freeing previously
allocated buffers, causing memory leaks.
The leak in gssx_dec_ctx() is particularly relevant because
the caller (gssp_accept_sec_context_upcall) initializes several
buffer length fields to non-zero values, resulting in memory
allocation:
struct gssx_ctx rctxh = {
.exported_context_token.len = GSSX_max_output_handle_sz,
.mech.len = GSS_OID_MAX_LEN,
.src_name.display_name.len = GSSX_max_princ_sz,
.targ_name.display_name.len = GSSX_max_princ_sz
};
If, for example, gssx_dec_name() succeeds for src_name but
fails for targ_name, the memory allocated for
exported_context_token, mech, and src_name.display_name
remains unreferenced and cannot be reclaimed.
Add error handling with goto-based cleanup to free any
previously allocated buffers before returning an error.
Reported-by: Xingjing Deng <micro6947@gmail.com>
Closes: https://lore.kernel.org/linux-nfs/CAK+ZN9qttsFDu6h1FoqGadXjMx1QXqPMoYQ=6O9RY4SxVTvKng@mail.gmail.com/
Fixes: 1d658336b05f ("SUNRPC: Add RPC based upcall mechanism for RPCGSS auth")
Cc: stable@vger.kernel.org
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit a6d28eb8efe96b3e35c92efdf1bfacb0cccf541f ]
Mihail Milev reports: Error: UNINIT (CWE-457):
net/netfilter/nf_conntrack_h323_main.c:1189:2: var_decl:
Declaring variable "tuple" without initializer.
net/netfilter/nf_conntrack_h323_main.c:1197:2:
uninit_use_in_call: Using uninitialized value "tuple.src.l3num" when calling "__nf_ct_expect_find".
net/netfilter/nf_conntrack_expect.c:142:2:
read_value: Reading value "tuple->src.l3num" when calling "nf_ct_expect_dst_hash".
1195| tuple.dst.protonum = IPPROTO_TCP;
1196|
1197|-> exp = __nf_ct_expect_find(net, nf_ct_zone(ct), &tuple);
1198| if (exp && exp->master == ct)
1199| return exp;
Switch this to a C99 initialiser and set the l3num value.
Fixes: f587de0e2feb ("[NETFILTER]: nf_conntrack/nf_nat: add H.323 helper port")
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit da29e453dcb3aa7cabead7915f5f945d0add3a52 ]
Commit 3db6e0d172c9 ("rds: use RCU to synchronize work-enqueue with
connection teardown") modifies rds_sendmsg to avoid enqueueing work
while a tear down is in progress. However, it also changed the return
value of rds_sendmsg to that of rds_send_xmit instead of the
payload_len. This means the user may incorrectly receive errno values
when it should have simply received a payload of 0 while the peer
attempts a reconnections. So this patch corrects the teardown handling
code to only use the out error path in that case, thus restoring the
original payload_len return value.
Fixes: 3db6e0d172c9 ("rds: use RCU to synchronize work-enqueue with connection teardown")
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Allison Henderson <achender@kernel.org>
Link: https://patch.msgid.link/20260213035409.1963391-1-achender@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit afcae7d7b8a278a6c29e064f99e5bafd4ac1fb37 ]
svc_rdma_accept() computes sc_sq_depth as the sum of rq_depth and the
number of rdma_rw contexts (ctxts). This value is used to allocate the
Send CQ and to initialize the sc_sq_avail credit pool.
However, when the device uses memory registration for RDMA operations,
rdma_rw_init_qp() inflates the QP's max_send_wr by a factor of three
per context to account for REG and INV work requests. The Send CQ and
credit pool remain sized for only one work request per context,
causing Send Queue exhaustion under heavy NFS WRITE workloads.
Introduce rdma_rw_max_sge() to compute the actual number of Send Queue
entries required for a given number of rdma_rw contexts. Upper layer
protocols call this helper before creating a Queue Pair so that their
Send CQs and credit accounting match the QP's true capacity.
Update svc_rdma_accept() to use rdma_rw_max_sge() when computing
sc_sq_depth, ensuring the credit pool reflects the work requests
that rdma_rw_init_qp() will reserve.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Fixes: 00bd1439f464 ("RDMA/rw: Support threshold for registration vs scattering to local pages")
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Link: https://patch.msgid.link/20260128005400.25147-5-cel@kernel.org
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 59243315890578a040a2d50ae9e001a2ef2fcb62 ]
There is an upper bound on the number of rdma_rw contexts that can
be created per QP.
This invisible upper bound is because rdma_create_qp() adds one or
more additional SQEs for each ctxt that the ULP requests via
qp_attr.cap.max_rdma_ctxs. The QP's actual Send Queue length is on
the order of the sum of qp_attr.cap.max_send_wr and a factor times
qp_attr.cap.max_rdma_ctxs. The factor can be up to three, depending
on whether MR operations are required before RDMA Reads.
This limit is not visible to RDMA consumers via dev->attrs. When the
limit is surpassed, QP creation fails with -ENOMEM. For example:
svcrdma's estimate of the number of rdma_rw contexts it needs is
three times the number of pages in RPCSVC_MAXPAGES. When MAXPAGES
is about 260, the internally-computed SQ length should be:
64 credits + 10 backlog + 3 * (3 * 260) = 2414
Which is well below the advertised qp_max_wr of 32768.
If RPCSVC_MAXPAGES is increased to 4MB, that's 1040 pages:
64 credits + 10 backlog + 3 * (3 * 1040) = 9434
However, QP creation fails. Dynamic printk for mlx5 shows:
calc_sq_size:618:(pid 1514): send queue size (9326 * 256 / 64 -> 65536) exceeds limits(32768)
Although 9326 is still far below qp_max_wr, QP creation still
fails.
Because the total SQ length calculation is opaque to RDMA consumers,
there doesn't seem to be much that can be done about this except for
consumers to try to keep the requested rdma_rw ctxt count low.
Fixes: 2da0f610e733 ("svcrdma: Increase the per-transport rw_ctx count")
Reviewed-by: NeilBrown <neil@brown.name>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Stable-dep-of: afcae7d7b8a2 ("RDMA/core: add rdma_rw_max_sge() helper for SQ sizing")
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 2da0f610e733606e06284ac3c1f188b9dec75d68 ]
rdma_rw_mr_factor() returns the smallest number of MRs needed to
move a particular number of pages. svcrdma currently asks for the
number of MRs needed to move RPCSVC_MAXPAGES (a little over one
megabyte), as that is the number of pages in the largest r/wsize
the server supports.
This call assumes that the client's NIC can bundle a full one
megabyte payload in a single rdma_segment. In fact, most NICs cannot
handle a full megabyte with a single rkey / rdma_segment. Clients
will typically split even a single Read chunk into many segments.
The server needs one MR to read each rdma_segment in a Read chunk,
and thus each one needs an rw_ctx.
svcrdma has been vastly underestimating the number of rw_ctxs needed
to handle 64 RPC requests with large Read chunks using small
rdma_segments.
Unfortunately there doesn't seem to be a good way to estimate this
number without knowing the client NIC's capabilities. Even then,
the client RPC/RDMA implementation is still free to split a chunk
into smaller segments (for example, it might be using physical
registration, which needs an rdma_segment per page).
The best we can do for now is choose a number that will guarantee
forward progress in the worst case (one page per segment).
At some later point, we could add some mechanisms to make this
much less of a problem:
- Add a core API to add more rw_ctxs to an already-established QP
- svcrdma could treat rw_ctx exhaustion as a temporary error and
try again
- Limit the number of Reads in flight
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Stable-dep-of: afcae7d7b8a2 ("RDMA/core: add rdma_rw_max_sge() helper for SQ sizing")
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit fc2e69db82c1ac506cd7f539a3ab66d51d3380dc ]
The comment that starts "Qualify ..." applies to only some of the
following code paragraph. Re-arrange the lines so the comment makes
more sense.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Stable-dep-of: afcae7d7b8a2 ("RDMA/core: add rdma_rw_max_sge() helper for SQ sizing")
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit b918bfcf370c92ea3b82fa9bb3d017702b5fa4cb ]
These won't have much diagnostic value for site administrators.
Since they can't be disabled, they become noise.
What's more, the subsequent rdma_create_qp() call adjusts the Send
Queue size (possibly downward) without warning, making the size
reported by these pr_warns inaccurate.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Stable-dep-of: afcae7d7b8a2 ("RDMA/core: add rdma_rw_max_sge() helper for SQ sizing")
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit c558d47596867ff1082fd7475b63670f63f7f5cf ]
Post more Receives when the number of pending Receives drops below
a water mark. The batch mechanism is disabled if the underlying
device cannot support a reasonably-sized Receive Queue.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Stable-dep-of: afcae7d7b8a2 ("RDMA/core: add rdma_rw_max_sge() helper for SQ sizing")
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 7b748c30cc046056a24c459de415844a856ea54b ]
Replace svc_rdma_post_recv() with the new batch receive mechanism.
For the moment it is posting just a single Receive WR at a time,
so no change in behavior is expected.
Since svc_rdma_wc_receive() was the last call site for
svc_rdma_post_recv(), it is removed.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Stable-dep-of: afcae7d7b8a2 ("RDMA/core: add rdma_rw_max_sge() helper for SQ sizing")
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 77f0a2aa5cdde0524eab745f7a117706d3e3014f ]
Introduce a server-side mechanism similar to commit e340c2d6ef2a
("xprtrdma: Reduce the doorbell rate (Receive)") to post Receive
WRs in batch. Its first consumer is svc_rdma_post_recvs(), which
posts the initial set of Receive WRs.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Stable-dep-of: afcae7d7b8a2 ("RDMA/core: add rdma_rw_max_sge() helper for SQ sizing")
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit ae88a5d2f29b69819dc7b04086734439d074a643 ]
Reproducer available at [1].
The ATM send path (sendmsg -> vcc_sendmsg -> sigd_send) reads the vcc
pointer from msg->vcc and uses it directly without any validation. This
pointer comes from userspace via sendmsg() and can be arbitrarily forged:
int fd = socket(AF_ATMSVC, SOCK_DGRAM, 0);
ioctl(fd, ATMSIGD_CTRL); // become ATM signaling daemon
struct msghdr msg = { .msg_iov = &iov, ... };
*(unsigned long *)(buf + 4) = 0xdeadbeef; // fake vcc pointer
sendmsg(fd, &msg, 0); // kernel dereferences 0xdeadbeef
In normal operation, the kernel sends the vcc pointer to the signaling
daemon via sigd_enq() when processing operations like connect(), bind(),
or listen(). The daemon is expected to return the same pointer when
responding. However, a malicious daemon can send arbitrary pointer values.
Fix this by introducing find_get_vcc() which validates the pointer by
searching through vcc_hash (similar to how sigd_close() iterates over
all VCCs), and acquires a reference via sock_hold() if found.
Since struct atm_vcc embeds struct sock as its first member, they share
the same lifetime. Therefore using sock_hold/sock_put is sufficient to
keep the vcc alive while it is being used.
Note that there may be a race with sigd_close() which could mark the vcc
with various flags (e.g., ATM_VF_RELEASED) after find_get_vcc() returns.
However, sock_hold() guarantees the memory remains valid, so this race
only affects the logical state, not memory safety.
[1]: https://gist.github.com/mrpre/1ba5949c45529c511152e2f4c755b0f3
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Reported-by: syzbot+1f22cb1769f249df9fa0@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/all/69039850.a70a0220.5b2ed.005d.GAE@google.com/T/
Signed-off-by: Jiayuan Chen <jiayuan.chen@shopee.com>
Link: https://patch.msgid.link/20260205095501.131890-1-jiayuan.chen@linux.dev
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 4780ec142cbb24b794129d3080eee5cac2943ffc ]
Userspace provides an optimized representation in case intervals are
adjacent, where the end element is omitted.
The existing partial overlap detection logic skips anonymous set checks
on start elements for this reason.
However, it is possible to add intervals that overlap to this anonymous
where two start elements with the same, eg. A-B, A-C where C < B.
start end
A B
start end
A C
Restore the check on overlapping start elements to report an overlap.
Fixes: c9e6978e2725 ("netfilter: nft_set_rbtree: Switch to node list walk for overlap detection")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 2f635adbe2642d398a0be3ab245accd2987be0c3 ]
tests/shell/testcases/packetpath/set_match_nomatch_hash_fast
fails on big endian with:
Error: Could not process rule: No such file or directory
reset element ip test s { 244.147.90.126 }
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Fatal: Cannot fetch element "244.147.90.126"
... because the wrong bucket is searched, jhash() and jhash1_word are
not interchangeable on big endian.
Fixes: 3b02b0adc242 ("netfilter: nft_set_hash: fix lookups with fixed size hash on big endian")
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 838eb9687691d29915797a885b861fd09353386e ]
tcp_tx_timestamp() is only called at the end of tcp_sendmsg_locked()
before the final tcp_push().
By the time it is called, it is possible all the copied data
has been sent already (transmit queue is empty).
If this is the case, use the last skb in the rtx queue.
Fixes: 75c119afe14f ("tcp: implement rb-tree based retransmit queue")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Jason Xing <kerneljasonxing@gmail.com>
Link: https://patch.msgid.link/20260127123828.4098577-2-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit de8a70cefcb26cdceaafdc5ac144712681419c29 ]
Since commit be102eb6a0e7 ("netfilter: nf_conncount: rework API to use
sk_buff directly"), we skip the adding and trigger a GC when the ct is
confirmed. For connections originated from local to local it doesn't
work because the connection is confirmed on POSTROUTING, therefore
tracking on the INPUT hook is always skipped.
In order to fix this, we check whether skb input ifindex is set to
loopback ifindex. If it is then we fallback on a GC plus track operation
skipping the optimization. This fallback is necessary to avoid
duplicated tracking of a packet train e.g 10 UDP datagrams sent on a
burst when initiating the connection.
Tested with xt_connlimit/nft_connlimit and OVS limit and with a HTTP
server and iperf3 on UDP mode.
Fixes: be102eb6a0e7 ("netfilter: nf_conncount: rework API to use sk_buff directly")
Reported-by: Michal Slabihoudek <michal.slabihoudek@gooddata.com>
Closes: https://lore.kernel.org/netfilter/6989BD9F-8C24-4397-9AD7-4613B28BF0DB@gooddata.com/
Signed-off-by: Fernando Fernandez Mancera <fmancera@suse.de>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 21d033e472735ecec677f1ae46d6740b5e47a4f3 ]
After the optimization to only perform one GC per jiffy, a new problem
was introduced. If more than 8 new connections are tracked per jiffy the
list won't be cleaned up fast enough possibly reaching the limit
wrongly.
In order to prevent this issue, only skip the GC if it was already
triggered during the same jiffy and the increment is lower than the
clean up limit. In addition, increase the clean up limit to 64
connections to avoid triggering GC too often and do more effective GCs.
This has been tested using a HTTP server and several
performance tools while having nft_connlimit/xt_connlimit or OVS limit
configured.
Output of slowhttptest + OVS limit at 52000 connections:
slow HTTP test status on 340th second:
initializing: 0
pending: 432
connected: 51998
error: 0
closed: 0
service available: YES
Fixes: d265929930e2 ("netfilter: nf_conncount: reduce unnecessary GC")
Reported-by: Aleksandra Rukomoinikova <ARukomoinikova@k2.cloud>
Closes: https://lore.kernel.org/netfilter/b2064e7b-0776-4e14-adb6-c68080987471@k2.cloud/
Signed-off-by: Fernando Fernandez Mancera <fmancera@suse.de>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit c0362b5748282e22fa1592a8d3474f726ad964c2 ]
For convenience when performing GC over the connection list, make
nf_conncount_gc_list() to disable BH. This unifies the behavior with
nf_conncount_add() and nf_conncount_count().
Signed-off-by: Fernando Fernandez Mancera <fmancera@suse.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Stable-dep-of: 21d033e47273 ("netfilter: nf_conncount: increase the connection clean up limit to 64")
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit e1696c8bd0056bc1a5f7766f58ac333adc203e8a ]
Seems that there is an assumption that this function should be called
only for netdev interfaces, but it can also be called in suspend, or
from nl80211_netlink_notify (indirectly).
Note that the documentation of NL80211_ATTR_SOCKET_OWNER explicitly
says that NAN interfaces would be destroyed as well in the
nl80211_netlink_notify case.
Fix this by also stopping P2P and NAN.
Fixes: cb3b7d87652a ("cfg80211: add start / stop NAN commands")
Reviewed-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
Link: https://patch.msgid.link/20260107140430.dab142cbef0b.I290cc47836d56dd7e35012ce06bec36c6da688cd@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 74d9391e8849e70ded5309222d09b0ed0edbd039 ]
The rx->skey field contains a struct tipc_aead_key with GCM-AES
encryption keys used for TIPC cluster communication. Using plain
kfree() leaves this sensitive key material in freed memory pages
where it could potentially be recovered.
Switch to kfree_sensitive() to ensure the key material is zeroed
before the memory is freed.
Fixes: 1ef6f7c9390f ("tipc: add automatic session key exchange")
Signed-off-by: Daniel Hodges <hodgesd@meta.com>
Link: https://patch.msgid.link/20260131180114.2121438-1-hodgesd@meta.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 3f3d8ff31496874a69b131866f62474eb24ed20a ]
In reconfig, in case the driver asks to disconnect during the reconfig,
all the keys of the interface are marked as tainted.
Then ieee80211_reenable_keys will loop over all the interface keys, and
for each one it will
a) increment crypto_tx_tailroom_needed_cnt
b) call ieee80211_key_enable_hw_accel, which in turn will detect that
this key is tainted, so it will mark it as "not in hardware", which is
paired with crypto_tx_tailroom_needed_cnt incrementation, so we get two
incrementations for each tainted key.
Then we get a warning in ieee80211_free_keys.
To fix it, don't increment the count in ieee80211_reenable_keys for
tainted keys
Reviewed-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
Link: https://patch.msgid.link/20260118092821.4ca111fddcda.Id6e554f4b1c83760aa02d5a9e4e3080edb197aa2@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit a3034bf0746d88a00cceda9541534a5721445a24 ]
An integer overflow occurs in cfg80211_calculate_bitrate_he() when
calculating bitrates for high throughput HE configurations.
For example, with 160 MHz bandwidth, HE-MCS 13, HE-NSS 4, and HE-GI 0,
the multiplication (result * rate->nss) overflows the 32-bit 'result'
variable before division by 8, leading to significantly underestimated
bitrate values.
The overflow occurs because the NSS multiplication operates on a 32-bit
integer that cannot accommodate intermediate values exceeding
4,294,967,295. When overflow happens, the value wraps around, producing
incorrect bitrates for high MCS and NSS combinations.
Fix this by utilizing the 64-bit 'tmp' variable for the NSS
multiplication and subsequent divisions via do_div(). This approach
preserves full precision throughout the entire calculation, with the
final value assigned to 'result' only after completing all operations.
Signed-off-by: Veerendranath Jakkam <veerendranath.jakkam@oss.qualcomm.com>
Link: https://patch.msgid.link/20260109-he_bitrate_overflow-v1-1-95575e466b6e@oss.qualcomm.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit a203dbeeca15a9b924f0d51f510921f4bae96801 ]
In __sta_info_destroy_part2(), station statistics are requested after the
IEEE80211_STA_NONE -> IEEE80211_STA_NOTEXIST transition. This is
problematic because the driver may be unable to handle the request due to
the STA being in the NOTEXIST state (i.e. if the driver destroys the
underlying data when transitioning to NOTEXIST).
Move the statistics collection to before the state transition to avoid
this issue.
Signed-off-by: Baochen Qiang <baochen.qiang@oss.qualcomm.com>
Link: https://patch.msgid.link/20251222-mac80211-move-station-stats-collection-earlier-v1-1-12cd4e42c633@oss.qualcomm.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit ff4071c60018a668249dc6a2df7d16330543540e ]
ieee80211_ocb_rx_no_sta() assumes a valid channel context, which is only
present after JOIN_OCB.
RX may run before JOIN_OCB is executed, in which case the OCB interface
is not operational. Skip RX peer handling when the interface is not
joined to avoid warnings in the RX path.
Reported-by: syzbot+b364457b2d1d4e4a3054@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=b364457b2d1d4e4a3054
Tested-by: syzbot+b364457b2d1d4e4a3054@syzkaller.appspotmail.com
Signed-off-by: Moon Hee Lee <moonhee.lee.ca@gmail.com>
Link: https://patch.msgid.link/20251216035932.18332-1-moonhee.lee.ca@gmail.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
commit b85e3367a5716ed3662a4fe266525190d2af76df upstream.
Otherwise, it is possible to hit WARN_ON_ONCE in __kvmalloc_node_noprof()
when resizing hashtable because __GFP_NOWARN is unset.
Similar to:
b541ba7d1f5a ("netfilter: conntrack: clamp maximum hashtable size to INT_MAX")
Reviewed-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
[ Keerthana: Handle freeing new_lt ]
Signed-off-by: Keerthana K <keerthana.kalyanasundaram@broadcom.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 51edb2ff1c6fc27d3fa73f0773a31597ecd8e230 upstream.
This should check for NULL in case memory allocation fails.
Reported-by: Julian Wiedmann <jwiedmann.dev@gmail.com>
Fixes: 3b9e2ea6c11b ("netfilter: nft_limit: move stateful fields out of expression data")
Fixes: 37f319f37d90 ("netfilter: nft_connlimit: move stateful fields out of expression data")
Fixes: 33a24de37e81 ("netfilter: nft_last: move stateful fields out of expression data")
Fixes: ed0a0c60f0e5 ("netfilter: nft_quota: move stateful fields out of expression data")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Link: https://lore.kernel.org/r/20220110194817.53481-1-pablo@netfilter.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Cc: Ben Hutchings <ben@decadent.org.uk>
[ Portion of this patch applied - gregkh ]
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit a458b2902115b26a25d67393b12ddd57d1216aaa upstream.
To prevent timing attacks, MACs need to be compared in constant time.
Use the appropriate helper function for this.
Fixes: bf355b8d2c30 ("ipv6: sr: add core files for SR HMAC support")
Cc: stable@vger.kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
Reviewed-by: Andrea Mayer <andrea.mayer@uniroma2.it>
Link: https://patch.msgid.link/20250818202724.15713-1-ebiggers@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
[ Include crypto/algapi.h instead of crypto/utils.h in v5.10.y. ]
Signed-off-by: Alva Lan <alvalan9@foxmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 205305c028ad986d0649b8b100bab6032dcd1bb5 upstream.
Replace comma between expressions with semicolons.
Using a ',' in place of a ';' can have unintended side effects.
Although that is not the case here, it is seems best to use ';'
unless ',' is intended.
Found by inspection.
No functional change intended.
Compile tested only.
Signed-off-by: Chen Ni <nichen@iscas.ac.cn>
Reviewed-by: Jamal Hadi Salim <jhs@mojatatu.com>
Link: https://patch.msgid.link/20251112072709.73755-1-nichen@iscas.ac.cn
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Cc: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit cc0cf10fdaeadf5542d64a55b5b4120d3df90b7d ]
Fix the check if netfilter's static keys are available. netfilter defines
and exports static keys if CONFIG_JUMP_LABEL is enabled. (HAVE_JUMP_LABEL
is never defined.)
Fixes: 971502d77faa ("bridge: netfilter: unroll NF_HOOK helper in bridge input path")
Signed-off-by: Martin Kaiser <martin@kaiser.cx>
Reviewed-by: Florian Westphal <fw@strlen.de>
Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
Link: https://patch.msgid.link/20260127101925.1754425-1-martin@kaiser.cx
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit d2492688bb9fed6ab6e313682c387ae71a66ebae ]
syzbot reported the splat below [0] without a repro.
It indicates that struct nci_dev.cmd_wq had been destroyed before
nci_close_device() was called via rfkill.
nci_dev.cmd_wq is only destroyed in nci_unregister_device(), which
(I think) was called from virtual_ncidev_close() when syzbot close()d
an fd of virtual_ncidev.
The problem is that nci_unregister_device() destroys nci_dev.cmd_wq
first and then calls nfc_unregister_device(), which removes the
device from rfkill by rfkill_unregister().
So, the device is still visible via rfkill even after nci_dev.cmd_wq
is destroyed.
Let's unregister the device from rfkill first in nci_unregister_device().
Note that we cannot call nfc_unregister_device() before
nci_close_device() because
1) nfc_unregister_device() calls device_del() which frees
all memory allocated by devm_kzalloc() and linked to
ndev->conn_info_list
2) nci_rx_work() could try to queue nci_conn_info to
ndev->conn_info_list which could be leaked
Thus, nfc_unregister_device() is split into two functions so we
can remove rfkill interfaces only before nci_close_device().
[0]:
DEBUG_LOCKS_WARN_ON(1)
WARNING: kernel/locking/lockdep.c:238 at hlock_class kernel/locking/lockdep.c:238 [inline], CPU#0: syz.0.8675/6349
WARNING: kernel/locking/lockdep.c:238 at check_wait_context kernel/locking/lockdep.c:4854 [inline], CPU#0: syz.0.8675/6349
WARNING: kernel/locking/lockdep.c:238 at __lock_acquire+0x39d/0x2cf0 kernel/locking/lockdep.c:5187, CPU#0: syz.0.8675/6349
Modules linked in:
CPU: 0 UID: 0 PID: 6349 Comm: syz.0.8675 Not tainted syzkaller #0 PREEMPT(full)
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/13/2026
RIP: 0010:hlock_class kernel/locking/lockdep.c:238 [inline]
RIP: 0010:check_wait_context kernel/locking/lockdep.c:4854 [inline]
RIP: 0010:__lock_acquire+0x3a4/0x2cf0 kernel/locking/lockdep.c:5187
Code: 18 00 4c 8b 74 24 08 75 27 90 e8 17 f2 fc 02 85 c0 74 1c 83 3d 50 e0 4e 0e 00 75 13 48 8d 3d 43 f7 51 0e 48 c7 c6 8b 3a de 8d <67> 48 0f b9 3a 90 31 c0 0f b6 98 c4 00 00 00 41 8b 45 20 25 ff 1f
RSP: 0018:ffffc9000c767680 EFLAGS: 00010046
RAX: 0000000000000001 RBX: 0000000000040000 RCX: 0000000000080000
RDX: ffffc90013080000 RSI: ffffffff8dde3a8b RDI: ffffffff8ff24ca0
RBP: 0000000000000003 R08: ffffffff8fef35a3 R09: 1ffffffff1fde6b4
R10: dffffc0000000000 R11: fffffbfff1fde6b5 R12: 00000000000012a2
R13: ffff888030338ba8 R14: ffff888030338000 R15: ffff888030338b30
FS: 00007fa5995f66c0(0000) GS:ffff8881256f8000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f7e72f842d0 CR3: 00000000485a0000 CR4: 00000000003526f0
Call Trace:
<TASK>
lock_acquire+0x106/0x330 kernel/locking/lockdep.c:5868
touch_wq_lockdep_map+0xcb/0x180 kernel/workqueue.c:3940
__flush_workqueue+0x14b/0x14f0 kernel/workqueue.c:3982
nci_close_device+0x302/0x630 net/nfc/nci/core.c:567
nci_dev_down+0x3b/0x50 net/nfc/nci/core.c:639
nfc_dev_down+0x152/0x290 net/nfc/core.c:161
nfc_rfkill_set_block+0x2d/0x100 net/nfc/core.c:179
rfkill_set_block+0x1d2/0x440 net/rfkill/core.c:346
rfkill_fop_write+0x461/0x5a0 net/rfkill/core.c:1301
vfs_write+0x29a/0xb90 fs/read_write.c:684
ksys_write+0x150/0x270 fs/read_write.c:738
do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
do_syscall_64+0xe2/0xf80 arch/x86/entry/syscall_64.c:94
entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7fa59b39acb9
Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 e8 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007fa5995f6028 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
RAX: ffffffffffffffda RBX: 00007fa59b615fa0 RCX: 00007fa59b39acb9
RDX: 0000000000000008 RSI: 0000200000000080 RDI: 0000000000000007
RBP: 00007fa59b408bf7 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 00007fa59b616038 R14: 00007fa59b615fa0 R15: 00007ffc82218788
</TASK>
Fixes: 6a2968aaf50c ("NFC: basic NCI protocol implementation")
Reported-by: syzbot+f9c5fd1a0874f9069dce@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/all/695e7f56.050a0220.1c677c.036c.GAE@google.com/
Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20260127040411.494931-1-kuniyu@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 165c34fb6068ff153e3fc99a932a80a9d5755709 ]
syzbot reported various memory leaks related to NFC, struct
nfc_llcp_sock, sk_buff, nfc_dev, etc. [0]
The leading log hinted that nfc_llcp_send_ui_frame() failed
to allocate skb due to sock_error(sk) being -ENXIO.
ENXIO is set by nfc_llcp_socket_release() when struct
nfc_llcp_local is destroyed by local_cleanup().
The problem is that there is no synchronisation between
nfc_llcp_send_ui_frame() and local_cleanup(), and skb
could be put into local->tx_queue after it was purged in
local_cleanup():
CPU1 CPU2
---- ----
nfc_llcp_send_ui_frame() local_cleanup()
|- do { '
|- pdu = nfc_alloc_send_skb(..., &err)
| .
| |- nfc_llcp_socket_release(local, false, ENXIO);
| |- skb_queue_purge(&local->tx_queue); |
| ' |
|- skb_queue_tail(&local->tx_queue, pdu); |
... |
|- pdu = nfc_alloc_send_skb(..., &err) |
^._________________________________.'
local_cleanup() is called for struct nfc_llcp_local only
after nfc_llcp_remove_local() unlinks it from llcp_devices.
If we hold local->tx_queue.lock then, we can synchronise
the thread and nfc_llcp_send_ui_frame().
Let's do that and check list_empty(&local->list) before
queuing skb to local->tx_queue in nfc_llcp_send_ui_frame().
[0]:
[ 56.074943][ T6096] llcp: nfc_llcp_send_ui_frame: Could not allocate PDU (error=-6)
[ 64.318868][ T5813] kmemleak: 6 new suspected memory leaks (see /sys/kernel/debug/kmemleak)
BUG: memory leak
unreferenced object 0xffff8881272f6800 (size 1024):
comm "syz.0.17", pid 6096, jiffies 4294942766
hex dump (first 32 bytes):
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
27 00 03 40 00 00 00 00 00 00 00 00 00 00 00 00 '..@............
backtrace (crc da58d84d):
kmemleak_alloc_recursive include/linux/kmemleak.h:44 [inline]
slab_post_alloc_hook mm/slub.c:4979 [inline]
slab_alloc_node mm/slub.c:5284 [inline]
__do_kmalloc_node mm/slub.c:5645 [inline]
__kmalloc_noprof+0x3e3/0x6b0 mm/slub.c:5658
kmalloc_noprof include/linux/slab.h:961 [inline]
sk_prot_alloc+0x11a/0x1b0 net/core/sock.c:2239
sk_alloc+0x36/0x360 net/core/sock.c:2295
nfc_llcp_sock_alloc+0x37/0x130 net/nfc/llcp_sock.c:979
llcp_sock_create+0x71/0xd0 net/nfc/llcp_sock.c:1044
nfc_sock_create+0xc9/0xf0 net/nfc/af_nfc.c:31
__sock_create+0x1a9/0x340 net/socket.c:1605
sock_create net/socket.c:1663 [inline]
__sys_socket_create net/socket.c:1700 [inline]
__sys_socket+0xb9/0x1a0 net/socket.c:1747
__do_sys_socket net/socket.c:1761 [inline]
__se_sys_socket net/socket.c:1759 [inline]
__x64_sys_socket+0x1b/0x30 net/socket.c:1759
do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
do_syscall_64+0xa4/0xfa0 arch/x86/entry/syscall_64.c:94
entry_SYSCALL_64_after_hwframe+0x77/0x7f
BUG: memory leak
unreferenced object 0xffff88810fbd9800 (size 240):
comm "syz.0.17", pid 6096, jiffies 4294942850
hex dump (first 32 bytes):
68 f0 ff 08 81 88 ff ff 68 f0 ff 08 81 88 ff ff h.......h.......
00 00 00 00 00 00 00 00 00 68 2f 27 81 88 ff ff .........h/'....
backtrace (crc 6cc652b1):
kmemleak_alloc_recursive include/linux/kmemleak.h:44 [inline]
slab_post_alloc_hook mm/slub.c:4979 [inline]
slab_alloc_node mm/slub.c:5284 [inline]
kmem_cache_alloc_node_noprof+0x36f/0x5e0 mm/slub.c:5336
__alloc_skb+0x203/0x240 net/core/skbuff.c:660
alloc_skb include/linux/skbuff.h:1383 [inline]
alloc_skb_with_frags+0x69/0x3f0 net/core/skbuff.c:6671
sock_alloc_send_pskb+0x379/0x3e0 net/core/sock.c:2965
sock_alloc_send_skb include/net/sock.h:1859 [inline]
nfc_alloc_send_skb+0x45/0x80 net/nfc/core.c:724
nfc_llcp_send_ui_frame+0x162/0x360 net/nfc/llcp_commands.c:766
llcp_sock_sendmsg+0x14c/0x1d0 net/nfc/llcp_sock.c:814
sock_sendmsg_nosec net/socket.c:727 [inline]
__sock_sendmsg net/socket.c:742 [inline]
__sys_sendto+0x2d8/0x2f0 net/socket.c:2244
__do_sys_sendto net/socket.c:2251 [inline]
__se_sys_sendto net/socket.c:2247 [inline]
__x64_sys_sendto+0x28/0x30 net/socket.c:2247
do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
do_syscall_64+0xa4/0xfa0 arch/x86/entry/syscall_64.c:94
entry_SYSCALL_64_after_hwframe+0x77/0x7f
Fixes: 94f418a20664 ("NFC: UI frame sending routine implementation")
Reported-by: syzbot+f2d245f1d76bbfa50e4c@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/netdev/697569c7.a00a0220.33ccc7.0014.GAE@google.com/T/#u
Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20260125010214.1572439-1-kuniyu@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|