summaryrefslogtreecommitdiff
path: root/include/net
AgeCommit message (Collapse)Author
2025-11-24net: factor-out _sk_charge() helperPaolo Abeni
Move out of __inet_accept() the code dealing charging newly accepted socket to memcg. MPTCP will soon use it to on a per subflow basis, in different contexts. No functional changes intended. Signed-off-by: Paolo Abeni <pabeni@redhat.com> Acked-by: Geliang Tang <geliang@kernel.org> Acked-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20251121-net-next-mptcp-memcg-backlog-imp-v1-1-1f34b6c1e0b1@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-24net: sched: fix TCF_LAYER_TRANSPORT handling in tcf_get_base_ptr()Eric Dumazet
syzbot reported that tcf_get_base_ptr() can be called while transport header is not set [1]. Instead of returning a dangling pointer, return NULL. Fix tcf_get_base_ptr() callers to handle this NULL value. [1] WARNING: CPU: 1 PID: 6019 at ./include/linux/skbuff.h:3071 skb_transport_header include/linux/skbuff.h:3071 [inline] WARNING: CPU: 1 PID: 6019 at ./include/linux/skbuff.h:3071 tcf_get_base_ptr include/net/pkt_cls.h:539 [inline] WARNING: CPU: 1 PID: 6019 at ./include/linux/skbuff.h:3071 em_nbyte_match+0x2d8/0x3f0 net/sched/em_nbyte.c:43 Modules linked in: CPU: 1 UID: 0 PID: 6019 Comm: syz.0.17 Not tainted syzkaller #0 PREEMPT(full) Call Trace: <TASK> tcf_em_match net/sched/ematch.c:494 [inline] __tcf_em_tree_match+0x1ac/0x770 net/sched/ematch.c:520 tcf_em_tree_match include/net/pkt_cls.h:512 [inline] basic_classify+0x115/0x2d0 net/sched/cls_basic.c:50 tc_classify include/net/tc_wrapper.h:197 [inline] __tcf_classify net/sched/cls_api.c:1764 [inline] tcf_classify+0x4cf/0x1140 net/sched/cls_api.c:1860 multiq_classify net/sched/sch_multiq.c:39 [inline] multiq_enqueue+0xfd/0x4c0 net/sched/sch_multiq.c:66 dev_qdisc_enqueue+0x4e/0x260 net/core/dev.c:4118 __dev_xmit_skb net/core/dev.c:4214 [inline] __dev_queue_xmit+0xe83/0x3b50 net/core/dev.c:4729 packet_snd net/packet/af_packet.c:3076 [inline] packet_sendmsg+0x3e33/0x5080 net/packet/af_packet.c:3108 sock_sendmsg_nosec net/socket.c:727 [inline] __sock_sendmsg+0x21c/0x270 net/socket.c:742 ____sys_sendmsg+0x505/0x830 net/socket.c:2630 Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Reported-by: syzbot+f3a497f02c389d86ef16@syzkaller.appspotmail.com Closes: https://lore.kernel.org/netdev/6920855a.a70a0220.2ea503.0058.GAE@google.com/T/#u Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Jamal Hadi Salim <jhs@mojatatu.com> Link: https://patch.msgid.link/20251121154100.1616228-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-20devlink: support default values for param-get and param-setDaniel Zahka
Support querying and resetting to default param values. Introduce two new devlink netlink attrs: DEVLINK_ATTR_PARAM_VALUE_DEFAULT and DEVLINK_ATTR_PARAM_RESET_DEFAULT. The former is used to contain an optional parameter value inside of the param_value nested attribute. The latter is used in param-set requests from userspace to indicate that the driver should reset the param to its default value. To implement this, two new functions are added to the devlink driver api: devlink_param::get_default() and devlink_param::reset_default(). These callbacks allow drivers to implement default param actions for runtime and permanent cmodes. For driverinit params, the core latches the last value set by a driver via devl_param_driverinit_value_set(), and uses that as the default value for a param. Because default parameter values are optional, it would be impossible to discern whether or not a param of type bool has default value of false or not provided if the default value is encoded using a netlink flag type. For this reason, when a DEVLINK_PARAM_TYPE_BOOL has an associated default value, the default value is encoded using a u8 type. Signed-off-by: Daniel Zahka <daniel.zahka@gmail.com> Link: https://patch.msgid.link/20251119025038.651131-4-daniel.zahka@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-20devlink: pass extack through to devlink_param::get()Daniel Zahka
Allow devlink_param::get() handlers to report error messages via extack. This function is called in a few different contexts, but not all of them will have an valid extack to use. When devlink_param::get() is called from param_get_doit or param_get_dumpit contexts, pass the extack through so that drivers can report errors when retrieving param values. devlink_param::get() is called from the context of devlink_param_notify(), pass NULL in for the extack. Reviewed-by: Saeed Mahameed <saeedm@nvidia.com> Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Signed-off-by: Daniel Zahka <daniel.zahka@gmail.com> Link: https://patch.msgid.link/20251119025038.651131-2-daniel.zahka@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-20tcp: add net.ipv4.tcp_rcvbuf_low_rttEric Dumazet
This is a follow up of commit aa251c84636c ("tcp: fix too slow tcp_rcvbuf_grow() action") which brought again the issue that I tried to fix in commit 65c5287892e9 ("tcp: fix sk_rcvbuf overshoot") We also recently increased tcp_rmem[2] to 32 MB in commit 572be9bf9d0d ("tcp: increase tcp_rmem[2] to 32 MB") Idea of this patch is to not let tcp_rcvbuf_grow() grow sk->sk_rcvbuf too fast for small RTT flows. If sk->sk_rcvbuf is too big, this can force NIC driver to not recycle pages from their page pool, and also can cause cache evictions for DDIO enabled cpus/NIC, as receivers are usually slower than senders. Add net.ipv4.tcp_rcvbuf_low_rtt sysctl, set by default to 1000 usec (1 ms) If RTT if smaller than the sysctl value, use the RTT/tcp_rcvbuf_low_rtt ratio to control sk_rcvbuf inflation. Tested: Pair of hosts with a 200Gbit IDPF NIC. Using netperf/netserver Client initiates 8 TCP bulk flows, asking netserver to use CPU #10 only. super_netperf 8 -H server -T,10 -l 30 On server, use perf -e tcp:tcp_rcvbuf_grow while test is running. Before: sysctl -w net.ipv4.tcp_rcvbuf_low_rtt=1 perf record -a -e tcp:tcp_rcvbuf_grow sleep 30 ; perf script|tail -20|cut -c30-230 1153.051201: tcp:tcp_rcvbuf_grow: time=398 rtt_us=382 copied=6905856 inq=180224 space=6115328 ooo=0 scaling_ratio=240 rcvbuf=27666235 rcv_ssthresh=25878235 window_clamp=25937095 rcv_wnd=25600000 famil 1153.138752: tcp:tcp_rcvbuf_grow: time=446 rtt_us=413 copied=5529600 inq=180224 space=4505600 ooo=0 scaling_ratio=240 rcvbuf=23068672 rcv_ssthresh=21571860 window_clamp=21626880 rcv_wnd=21286912 famil 1153.361484: tcp:tcp_rcvbuf_grow: time=415 rtt_us=380 copied=7061504 inq=204800 space=6725632 ooo=0 scaling_ratio=240 rcvbuf=27666235 rcv_ssthresh=25878235 window_clamp=25937095 rcv_wnd=25600000 famil 1153.457642: tcp:tcp_rcvbuf_grow: time=483 rtt_us=421 copied=5885952 inq=720896 space=4407296 ooo=0 scaling_ratio=240 rcvbuf=23763511 rcv_ssthresh=22223271 window_clamp=22278291 rcv_wnd=21430272 famil 1153.466002: tcp:tcp_rcvbuf_grow: time=308 rtt_us=281 copied=3244032 inq=180224 space=2883584 ooo=0 scaling_ratio=240 rcvbuf=44854314 rcv_ssthresh=41992059 window_clamp=42050919 rcv_wnd=41713664 famil 1153.747792: tcp:tcp_rcvbuf_grow: time=394 rtt_us=332 copied=4460544 inq=585728 space=3063808 ooo=0 scaling_ratio=240 rcvbuf=44854314 rcv_ssthresh=41992059 window_clamp=42050919 rcv_wnd=41373696 famil 1154.260747: tcp:tcp_rcvbuf_grow: time=652 rtt_us=226 copied=10977280 inq=737280 space=9486336 ooo=0 scaling_ratio=240 rcvbuf=31165538 rcv_ssthresh=29197743 window_clamp=29217691 rcv_wnd=28368896 fami 1154.375019: tcp:tcp_rcvbuf_grow: time=461 rtt_us=443 copied=7573504 inq=507904 space=6856704 ooo=0 scaling_ratio=240 rcvbuf=27666235 rcv_ssthresh=25878235 window_clamp=25937095 rcv_wnd=25288704 famil 1154.463072: tcp:tcp_rcvbuf_grow: time=494 rtt_us=408 copied=7983104 inq=200704 space=7065600 ooo=0 scaling_ratio=240 rcvbuf=27666235 rcv_ssthresh=25878235 window_clamp=25937095 rcv_wnd=25579520 famil 1154.474658: tcp:tcp_rcvbuf_grow: time=507 rtt_us=459 copied=5586944 inq=540672 space=4718592 ooo=0 scaling_ratio=240 rcvbuf=17852266 rcv_ssthresh=16692999 window_clamp=16736499 rcv_wnd=16056320 famil 1154.584657: tcp:tcp_rcvbuf_grow: time=494 rtt_us=427 copied=8126464 inq=204800 space=7782400 ooo=0 scaling_ratio=240 rcvbuf=27666235 rcv_ssthresh=25878235 window_clamp=25937095 rcv_wnd=25600000 famil 1154.702117: tcp:tcp_rcvbuf_grow: time=480 rtt_us=406 copied=5734400 inq=180224 space=5349376 ooo=0 scaling_ratio=240 rcvbuf=23068672 rcv_ssthresh=21571860 window_clamp=21626880 rcv_wnd=21286912 famil 1155.941595: tcp:tcp_rcvbuf_grow: time=717 rtt_us=670 copied=11042816 inq=3784704 space=7159808 ooo=0 scaling_ratio=240 rcvbuf=19581357 rcv_ssthresh=18333222 window_clamp=18357522 rcv_wnd=14614528 fam 1156.384735: tcp:tcp_rcvbuf_grow: time=529 rtt_us=473 copied=9011200 inq=180224 space=7258112 ooo=0 scaling_ratio=240 rcvbuf=19581357 rcv_ssthresh=18333222 window_clamp=18357522 rcv_wnd=18018304 famil 1157.821676: tcp:tcp_rcvbuf_grow: time=529 rtt_us=272 copied=8224768 inq=602112 space=6545408 ooo=0 scaling_ratio=240 rcvbuf=67000000 rcv_ssthresh=62793576 window_clamp=62812500 rcv_wnd=62115840 famil 1158.906379: tcp:tcp_rcvbuf_grow: time=710 rtt_us=445 copied=11845632 inq=540672 space=10240000 ooo=0 scaling_ratio=240 rcvbuf=31165538 rcv_ssthresh=29205935 window_clamp=29217691 rcv_wnd=28536832 fam 1164.600160: tcp:tcp_rcvbuf_grow: time=841 rtt_us=430 copied=12976128 inq=1290240 space=11304960 ooo=0 scaling_ratio=240 rcvbuf=31165538 rcv_ssthresh=29212591 window_clamp=29217691 rcv_wnd=27856896 fa 1165.163572: tcp:tcp_rcvbuf_grow: time=845 rtt_us=800 copied=12632064 inq=540672 space=7921664 ooo=0 scaling_ratio=240 rcvbuf=27666235 rcv_ssthresh=25912795 window_clamp=25937095 rcv_wnd=25260032 fami 1165.653464: tcp:tcp_rcvbuf_grow: time=388 rtt_us=309 copied=4493312 inq=180224 space=3874816 ooo=0 scaling_ratio=240 rcvbuf=44854314 rcv_ssthresh=41995899 window_clamp=42050919 rcv_wnd=41713664 famil 1166.651211: tcp:tcp_rcvbuf_grow: time=556 rtt_us=553 copied=6328320 inq=540672 space=5554176 ooo=0 scaling_ratio=240 rcvbuf=23068672 rcv_ssthresh=21571860 window_clamp=21626880 rcv_wnd=20946944 famil After: sysctl -w net.ipv4.tcp_rcvbuf_low_rtt=1000 perf record -a -e tcp:tcp_rcvbuf_grow sleep 30 ; perf script|tail -20|cut -c30-230 1457.053149: tcp:tcp_rcvbuf_grow: time=128 rtt_us=24 copied=1441792 inq=40960 space=1269760 ooo=0 scaling_ratio=240 rcvbuf=2960741 rcv_ssthresh=2605474 window_clamp=2775694 rcv_wnd=2568192 family=AF_I 1458.000778: tcp:tcp_rcvbuf_grow: time=128 rtt_us=31 copied=1441792 inq=24576 space=1400832 ooo=0 scaling_ratio=240 rcvbuf=3060163 rcv_ssthresh=2810042 window_clamp=2868902 rcv_wnd=2674688 family=AF_I 1458.088059: tcp:tcp_rcvbuf_grow: time=190 rtt_us=110 copied=3227648 inq=385024 space=2781184 ooo=0 scaling_ratio=240 rcvbuf=6728240 rcv_ssthresh=6252705 window_clamp=6307725 rcv_wnd=5799936 family=AF 1458.148549: tcp:tcp_rcvbuf_grow: time=232 rtt_us=129 copied=3956736 inq=237568 space=2842624 ooo=0 scaling_ratio=240 rcvbuf=6731333 rcv_ssthresh=6252705 window_clamp=6310624 rcv_wnd=5918720 family=AF 1458.466861: tcp:tcp_rcvbuf_grow: time=193 rtt_us=83 copied=2949120 inq=180224 space=2457600 ooo=0 scaling_ratio=240 rcvbuf=5751438 rcv_ssthresh=5357689 window_clamp=5391973 rcv_wnd=5054464 family=AF_ 1458.775476: tcp:tcp_rcvbuf_grow: time=257 rtt_us=127 copied=4304896 inq=352256 space=3346432 ooo=0 scaling_ratio=240 rcvbuf=8067131 rcv_ssthresh=7523275 window_clamp=7562935 rcv_wnd=7061504 family=AF 1458.776631: tcp:tcp_rcvbuf_grow: time=200 rtt_us=96 copied=3260416 inq=143360 space=2768896 ooo=0 scaling_ratio=240 rcvbuf=6397256 rcv_ssthresh=5938567 window_clamp=5997427 rcv_wnd=5828608 family=AF_ 1459.707973: tcp:tcp_rcvbuf_grow: time=215 rtt_us=96 copied=2506752 inq=163840 space=1388544 ooo=0 scaling_ratio=240 rcvbuf=3068867 rcv_ssthresh=2768282 window_clamp=2877062 rcv_wnd=2555904 family=AF_ 1460.246494: tcp:tcp_rcvbuf_grow: time=231 rtt_us=80 copied=3756032 inq=204800 space=3117056 ooo=0 scaling_ratio=240 rcvbuf=7288091 rcv_ssthresh=6773725 window_clamp=6832585 rcv_wnd=6471680 family=AF_ 1460.714596: tcp:tcp_rcvbuf_grow: time=270 rtt_us=110 copied=4714496 inq=311296 space=3719168 ooo=0 scaling_ratio=240 rcvbuf=8957739 rcv_ssthresh=8339020 window_clamp=8397880 rcv_wnd=7933952 family=AF 1462.029977: tcp:tcp_rcvbuf_grow: time=101 rtt_us=19 copied=1105920 inq=40960 space=1036288 ooo=0 scaling_ratio=240 rcvbuf=2338970 rcv_ssthresh=2091684 window_clamp=2192784 rcv_wnd=1986560 family=AF_I 1462.802385: tcp:tcp_rcvbuf_grow: time=89 rtt_us=45 copied=1069056 inq=0 space=1064960 ooo=0 scaling_ratio=240 rcvbuf=2338970 rcv_ssthresh=2091684 window_clamp=2192784 rcv_wnd=2035712 family=AF_INET6 1462.918648: tcp:tcp_rcvbuf_grow: time=105 rtt_us=33 copied=1441792 inq=180224 space=1069056 ooo=0 scaling_ratio=240 rcvbuf=2383282 rcv_ssthresh=2091684 window_clamp=2234326 rcv_wnd=1896448 family=AF_ 1463.222533: tcp:tcp_rcvbuf_grow: time=273 rtt_us=144 copied=4603904 inq=385024 space=3469312 ooo=0 scaling_ratio=240 rcvbuf=8422564 rcv_ssthresh=7891053 window_clamp=7896153 rcv_wnd=7409664 family=AF 1466.519312: tcp:tcp_rcvbuf_grow: time=130 rtt_us=23 copied=1343488 inq=0 space=1261568 ooo=0 scaling_ratio=240 rcvbuf=2780158 rcv_ssthresh=2493778 window_clamp=2606398 rcv_wnd=2494464 family=AF_INET6 1466.681003: tcp:tcp_rcvbuf_grow: time=128 rtt_us=21 copied=1441792 inq=12288 space=1343488 ooo=0 scaling_ratio=240 rcvbuf=2932027 rcv_ssthresh=2578555 window_clamp=2748775 rcv_wnd=2568192 family=AF_I 1470.689959: tcp:tcp_rcvbuf_grow: time=255 rtt_us=122 copied=3932160 inq=204800 space=3551232 ooo=0 scaling_ratio=240 rcvbuf=8182038 rcv_ssthresh=7647384 window_clamp=7670660 rcv_wnd=7442432 family=AF 1471.754154: tcp:tcp_rcvbuf_grow: time=188 rtt_us=95 copied=2138112 inq=577536 space=1429504 ooo=0 scaling_ratio=240 rcvbuf=3113650 rcv_ssthresh=2806426 window_clamp=2919046 rcv_wnd=2248704 family=AF_ 1476.813542: tcp:tcp_rcvbuf_grow: time=269 rtt_us=99 copied=3088384 inq=180224 space=2564096 ooo=0 scaling_ratio=240 rcvbuf=6219470 rcv_ssthresh=5771893 window_clamp=5830753 rcv_wnd=5509120 family=AF_ 1477.738309: tcp:tcp_rcvbuf_grow: time=166 rtt_us=54 copied=1777664 inq=180224 space=1417216 ooo=0 scaling_ratio=240 rcvbuf=3117118 rcv_ssthresh=2874958 window_clamp=2922298 rcv_wnd=2613248 family=AF_ We can see sk_rcvbuf values are much smaller, and that rtt_us (estimation of rtt from a receiver point of view) is kept small, instead of being bloated. No difference in throughput. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com> Tested-by: Paolo Abeni <pabeni@redhat.com> Link: https://patch.msgid.link/20251119084813.3684576-3-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-20tcp: tcp_moderate_rcvbuf is only used in rx pathEric Dumazet
sysctl_tcp_moderate_rcvbuf is only used from tcp_rcvbuf_grow(). Move it to netns_ipv4_read_rx group. Remove various CACHELINE_ASSERT_GROUP_SIZE() from netns_ipv4_struct_check(), as they have no real benefit but cause pain for all changes. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com> Link: https://patch.msgid.link/20251119084813.3684576-2-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-20Bluetooth: hci_core: lookup hci_conn on RX path on protocol sidePauli Virtanen
The hdev lock/lookup/unlock/use pattern in the packet RX path doesn't ensure hci_conn* is not concurrently modified/deleted. This locking appears to be leftover from before conn_hash started using RCU commit bf4c63252490b ("Bluetooth: convert conn hash to RCU") and not clear if it had purpose since then. Currently, there are code paths that delete hci_conn* from elsewhere than the ordered hdev->workqueue where the RX work runs in. E.g. commit 5af1f84ed13a ("Bluetooth: hci_sync: Fix UAF on hci_abort_conn_sync") introduced some of these, and there probably were a few others before it. It's better to do the locking so that even if these run concurrently no UAF is possible. Move the lookup of hci_conn and associated socket-specific conn to protocol recv handlers, and do them within a single critical section to cover hci_conn* usage and lookup. syzkaller has reported a crash that appears to be this issue: [Task hdev->workqueue] [Task 2] hci_disconnect_all_sync l2cap_recv_acldata(hcon) hci_conn_get(hcon) hci_abort_conn_sync(hcon) hci_dev_lock hci_dev_lock hci_conn_del(hcon) v-------------------------------- hci_dev_unlock hci_conn_put(hcon) conn = hcon->l2cap_data (UAF) Fixes: 5af1f84ed13a ("Bluetooth: hci_sync: Fix UAF on hci_abort_conn_sync") Reported-by: syzbot+d32d77220b92eddd89ad@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=d32d77220b92eddd89ad Signed-off-by: Pauli Virtanen <pav@iki.fi> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2025-11-20Bluetooth: btusb: mediatek: Fix kernel crash when releasing mtk iso interfaceChris Lu
When performing reset tests and encountering abnormal card drop issues that lead to a kernel crash, it is necessary to perform a null check before releasing resources to avoid attempting to release a null pointer. <4>[ 29.158070] Hardware name: Google Quigon sku196612/196613 board (DT) <4>[ 29.158076] Workqueue: hci0 hci_cmd_sync_work [bluetooth] <4>[ 29.158154] pstate: 20400009 (nzCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--) <4>[ 29.158162] pc : klist_remove+0x90/0x158 <4>[ 29.158174] lr : klist_remove+0x88/0x158 <4>[ 29.158180] sp : ffffffc0846b3c00 <4>[ 29.158185] pmr_save: 000000e0 <4>[ 29.158188] x29: ffffffc0846b3c30 x28: ffffff80cd31f880 x27: ffffff80c1bdc058 <4>[ 29.158199] x26: dead000000000100 x25: ffffffdbdc624ea3 x24: ffffff80c1bdc4c0 <4>[ 29.158209] x23: ffffffdbdc62a3e6 x22: ffffff80c6c07000 x21: ffffffdbdc829290 <4>[ 29.158219] x20: 0000000000000000 x19: ffffff80cd3e0648 x18: 000000031ec97781 <4>[ 29.158229] x17: ffffff80c1bdc4a8 x16: ffffffdc10576548 x15: ffffff80c1180428 <4>[ 29.158238] x14: 0000000000000000 x13: 000000000000e380 x12: 0000000000000018 <4>[ 29.158248] x11: ffffff80c2a7fd10 x10: 0000000000000000 x9 : 0000000100000000 <4>[ 29.158257] x8 : 0000000000000000 x7 : 7f7f7f7f7f7f7f7f x6 : 2d7223ff6364626d <4>[ 29.158266] x5 : 0000008000000000 x4 : 0000000000000020 x3 : 2e7325006465636e <4>[ 29.158275] x2 : ffffffdc11afeff8 x1 : 0000000000000000 x0 : ffffffdc11be4d0c <4>[ 29.158285] Call trace: <4>[ 29.158290] klist_remove+0x90/0x158 <4>[ 29.158298] device_release_driver_internal+0x20c/0x268 <4>[ 29.158308] device_release_driver+0x1c/0x30 <4>[ 29.158316] usb_driver_release_interface+0x70/0x88 <4>[ 29.158325] btusb_mtk_release_iso_intf+0x68/0xd8 [btusb (HASH:e8b6 5)] <4>[ 29.158347] btusb_mtk_reset+0x5c/0x480 [btusb (HASH:e8b6 5)] <4>[ 29.158361] hci_cmd_sync_work+0x10c/0x188 [bluetooth (HASH:a4fa 6)] <4>[ 29.158430] process_scheduled_works+0x258/0x4e8 <4>[ 29.158441] worker_thread+0x300/0x428 <4>[ 29.158448] kthread+0x108/0x1d0 <4>[ 29.158455] ret_from_fork+0x10/0x20 <0>[ 29.158467] Code: 91343000 940139d1 f9400268 927ff914 (f9401297) <4>[ 29.158474] ---[ end trace 0000000000000000 ]--- <0>[ 29.167129] Kernel panic - not syncing: Oops: Fatal exception <2>[ 29.167144] SMP: stopping secondary CPUs <4>[ 29.167158] ------------[ cut here ]------------ Fixes: ceac1cb0259d ("Bluetooth: btusb: mediatek: add ISO data transmission functions") Signed-off-by: Chris Lu <chris.lu@mediatek.com> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2025-11-20Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski
Cross-merge networking fixes after downstream PR (net-6.18-rc7). No conflicts, adjacent changes: tools/testing/selftests/net/af_unix/Makefile e1bb28bf13f4 ("selftest: af_unix: Add test for SO_PEEK_OFF.") 45a1cd8346ca ("selftests: af_unix: Add tests for ECONNRESET and EOF semantics") Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-20wifi: cfg80211: Add support for 6GHz AP role not relevant AP typePagadala Yesu Anjaneyulu
Add IEEE80211_6GHZ_CTRL_REG_AP_ROLE_NOT_RELEVANT and map it to IEEE80211_REG_LPI_AP for safe regulatory compliance when AP role classification is not applicable. Use LPI as safe fallback to prevent power limit violations. Signed-off-by: Pagadala Yesu Anjaneyulu <pagadala.yesu.anjaneyulu@intel.com> Reviewed-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com> Link: https://patch.msgid.link/20251112110828.856283677cc7.I36138a34847c3b4e680974bf347dde844448f3bc@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2025-11-19net: mana: Drop TX skb on post_work_request failure and unmap resourcesAditya Garg
Drop TX packets when posting the work request fails and ensure DMA mappings are always cleaned up. Signed-off-by: Aditya Garg <gargaditya@linux.microsoft.com> Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com> Link: https://patch.msgid.link/1763464269-10431-3-git-send-email-gargaditya@linux.microsoft.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-19net: mana: Handle SKB if TX SGEs exceed hardware limitAditya Garg
The MANA hardware supports a maximum of 30 scatter-gather entries (SGEs) per TX WQE. Exceeding this limit can cause TX failures. Add ndo_features_check() callback to validate SKB layout before transmission. For GSO SKBs that would exceed the hardware SGE limit, clear NETIF_F_GSO_MASK to enforce software segmentation in the stack. Add a fallback in mana_start_xmit() to linearize non-GSO SKBs that still exceed the SGE limit. Also, Add ethtool counter for SKBs linearized Co-developed-by: Dipayaan Roy <dipayanroy@linux.microsoft.com> Signed-off-by: Dipayaan Roy <dipayanroy@linux.microsoft.com> Signed-off-by: Aditya Garg <gargaditya@linux.microsoft.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com> Link: https://patch.msgid.link/1763464269-10431-2-git-send-email-gargaditya@linux.microsoft.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-18Merge tag 'ipsec-2025-11-18' of ↵Jakub Kicinski
git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec Steffen Klassert says: ==================== pull request (net): ipsec 2025-11-18 1) Misc fixes for xfrm_state creation/modification/deletion. Patchset from Sabrina Dubroca. 2) Fix inner packet family determination for xfrm offloads. From Jianbo Liu. 3) Don't push locally generated packets directly to L2 tunnel mode offloading, they still need processing from the standard xfrm path. From Jianbo Liu. 4) Fix memory leaks in xfrm_add_acquire for policy offloads and policy security contexts. From Zilin Guan. * tag 'ipsec-2025-11-18' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec: xfrm: fix memory leak in xfrm_add_acquire() xfrm: Prevent locally generated packets from direct output in tunnel mode xfrm: Determine inner GSO type from packet inner protocol xfrm: Check inner packet family directly from skb_dst xfrm: check all hash buckets for leftover states during netns deletion xfrm: set err and extack on failure to create pcpu SA xfrm: call xfrm_dev_state_delete when xfrm_state_migrate fails to add the state xfrm: make state as DEAD before final put when migrate fails xfrm: also call xfrm_state_delete_tunnel at destroy time for states that were never added xfrm: drop SA reference in xfrm_state_update if dir doesn't match ==================== Link: https://patch.msgid.link/20251118085344.2199815-1-steffen.klassert@secunet.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-17net: mana: Add standard counter rx_missed_errorsErni Sri Satya Vennela
Report standard counter stats->rx_missed_errors using hc_rx_discards_no_wqe from the hardware. Add a global workqueue to periodically run mana_query_gf_stats every 2 seconds to get the latest info in eth_stats and define a driver capability flag to notify hardware of the periodic queries. To avoid repeated failures and log flooding, the workqueue is not rescheduled if mana_query_gf_stats fails on HWC timeout error and the stats are reset to 0. Other errors are transient which will not need a VF reset for recovery. Signed-off-by: Erni Sri Satya Vennela <ernis@linux.microsoft.com> Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com> Link: https://patch.msgid.link/1763120599-6331-3-git-send-email-ernis@linux.microsoft.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-17net: mana: Move hardware counter stats from per-port to per-VF contextErni Sri Satya Vennela
Move hardware counter (HC) statistics from mana_port_context to mana_context to enable sharing stats across multiple network ports on the same MANA VF. Previously, each network port queried hardware counters independently using MANA_QUERY_GF_STAT command (GF = Generic Function stats from GDMA hardware), resulting in redundant queries when multiple ports existed on the same device. Isolate hardware counter stats by introducing mana_ethtool_hc_stats in mana_context and update the code to ensure all stats are properly reported via ethtool -S <interface>, maintaining consistency with previous behavior. Signed-off-by: Erni Sri Satya Vennela <ernis@linux.microsoft.com> Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com> Link: https://patch.msgid.link/1763120599-6331-2-git-send-email-ernis@linux.microsoft.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-16memcg: net: track network throttling due to memcg memory pressureShakeel Butt
The kernel can throttle network sockets if the memory cgroup associated with the corresponding socket is under memory pressure. The throttling actions include clamping the transmit window, failing to expand receive or send buffers, aggressively prune out-of-order receive queue, FIN deferred to a retransmitted packet and more. Let's add memcg metric to track such throttling actions. At the moment memcg memory pressure is defined through vmpressure and in future it may be defined using PSI or we may add more flexible way for the users to define memory pressure, maybe through ebpf. However the potential throttling actions will remain the same, so this newly introduced metric will continue to track throttling actions irrespective of how memcg memory pressure is defined. Link: https://lkml.kernel.org/r/20251016161035.86161-1-shakeel.butt@linux.dev Signed-off-by: Shakeel Butt <shakeel.butt@linux.dev> Acked-by: Roman Gushchin <roman.gushchin@linux.dev> Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com> Reviewed-by: Daniel Sedlak <daniel.sedlak@cdn77.com> Cc: David S. Miller <davem@davemloft.net> Cc: Eric Dumazet <edumazet@google.com> Cc: Jakub Kacinski <kuba@kernel.org> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Muchun Song <muchun.song@linux.dev> Cc: Neal Cardwell <ncardwell@google.com> Cc: Paolo Abeni <pabeni@redhat.com> Cc: Simon Horman <horms@kernel.org> Cc: Tejun Heo <tj@kernel.org> Cc: Willem de Bruijn <willemb@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-14sctp: Remove unused declaration sctp_auth_init_hmacs()Yue Haibing
Commit bf40785fa437 ("sctp: Use HMAC-SHA1 and HMAC-SHA256 library for chunk authentication") removed the implementation but leave declaration. Signed-off-by: Yue Haibing <yuehaibing@huawei.com> Reviewed-by: Eric Biggers <ebiggers@kernel.org> Link: https://patch.msgid.link/20251113114501.32905-1-yuehaibing@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-14tcp: gro: inline tcp_gro_pull_header()Eric Dumazet
tcp_gro_pull_header() is used in GRO fast path, inline it. Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20251113140358.58242-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-13net: dsa: remove definition of struct dsa_switch_driverHeiner Kallweit
Since 93e86b3bc842 ("net: dsa: Remove legacy probing support") this struct has no user any longer. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Link: https://patch.msgid.link/4053a98f-052f-4dc1-a3d4-ed9b3d3cc7cb@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-13Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski
Cross-merge networking fixes after downstream PR (net-6.18-rc6). No conflicts, adjacent changes in: drivers/net/phy/micrel.c 96a9178a29a6 ("net: phy: micrel: lan8814 fix reset of the QSGMII interface") 61b7ade9ba8c ("net: phy: micrel: Add support for non PTP SKUs for lan8814") and a trivial one in tools/testing/selftests/drivers/net/Makefile. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-13Merge tag 'net-6.18-rc6' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Paolo Abeni: "Including fixes from Bluetooth and Wireless. No known outstanding regressions. Current release - regressions: - eth: - bonding: fix mii_status when slave is down - mlx5e: fix missing error assignment in mlx5e_xfrm_add_state() Previous releases - regressions: - sched: limit try_bulk_dequeue_skb() batches - ipv4: route: prevent rt_bind_exception() from rebinding stale fnhe - af_unix: initialise scc_index in unix_add_edge() - netpoll: fix incorrect refcount handling causing incorrect cleanup - bluetooth: don't hold spin lock over sleeping functions - hsr: Fix supervision frame sending on HSRv0 - sctp: prevent possible shift out-of-bounds - tipc: fix use-after-free in tipc_mon_reinit_self(). - dsa: tag_brcm: do not mark link local traffic as offloaded - eth: virtio-net: fix incorrect flags recording in big mode Previous releases - always broken: - sched: initialize struct tc_ife to fix kernel-infoleak - wifi: - mac80211: reject address change while connecting - iwlwifi: avoid toggling links due to wrong element use - bluetooth: cancel mesh send timer when hdev removed - strparser: fix signed/unsigned mismatch bug - handshake: fix memory leak in tls_handshake_accept() Misc: - selftests: mptcp: fix some flaky tests" * tag 'net-6.18-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (60 commits) hsr: Follow standard for HSRv0 supervision frames hsr: Fix supervision frame sending on HSRv0 virtio-net: fix incorrect flags recording in big mode ipv4: route: Prevent rt_bind_exception() from rebinding stale fnhe wifi: iwlwifi: mld: always take beacon ies in link grading wifi: iwlwifi: mvm: fix beacon template/fixed rate wifi: iwlwifi: fix aux ROC time event iterator usage net_sched: limit try_bulk_dequeue_skb() batches selftests: mptcp: join: properly kill background tasks selftests: mptcp: connect: trunc: read all recv data selftests: mptcp: join: userspace: longer transfer selftests: mptcp: join: endpoints: longer transfer selftests: mptcp: join: rm: set backup flag selftests: mptcp: connect: fix fallback note due to OoO ethtool: fix incorrect kernel-doc style comment in ethtool.h mlx5: Fix default values in create CQ Bluetooth: btrtl: Avoid loading the config file on security chips net/mlx5e: Fix potentially misleading debug message net/mlx5e: Fix wraparound in rate limiting for values above 255 Gbps net/mlx5e: Fix maxrate wraparound in threshold between units ...
2025-11-12Merge tag 'wireless-next-2025-11-12' of ↵Jakub Kicinski
https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next Johannes Berg says: ==================== More -next material, notably: - split ieee80211.h file, it's way too big - mac80211: initial chanctx work towards NAN - mac80211: MU-MIMO sniffer improvements - ath12k: statistics improvements * tag 'wireless-next-2025-11-12' of https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next: (26 commits) wifi: cw1200: Fix potential memory leak in cw1200_bh_rx_helper() wifi: mac80211: make monitor link info check more specific wifi: mac80211: track MU-MIMO configuration on disabled interfaces wifi: cfg80211/mac80211: Add fallback mechanism for INDOOR_SP connection wifi: cfg80211/mac80211: clean up duplicate ap_power handling wifi: cfg80211: use a C99 initializer in wiphy_register wifi: cfg80211: fix doc of struct key_params wifi: mac80211: remove unnecessary vlan NULL check wifi: mac80211: pass frame type to element parsing wifi: mac80211: remove "disabling VHT" message wifi: mac80211: add and use chanctx usage iteration wifi: mac80211: simplify ieee80211_recalc_chanctx_min_def() API wifi: mac80211: remove chanctx to link back-references wifi: mac80211: make link iteration safe for 'break' wifi: mac80211: fix EHT typo wifi: cfg80211: fix EHT typo wifi: ieee80211: split NAN definitions out wifi: ieee80211: split P2P definitions out wifi: ieee80211: split S1G definitions out wifi: ieee80211: split EHT definitions out ... ==================== Link: https://patch.msgid.link/20251112115126.16223-4-johannes@sipsolutions.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-11Bluetooth: hci_event: Fix not handling PA Sync Lost eventLuiz Augusto von Dentz
This handles PA Sync Lost event which previously was assumed to be handled with BIG Sync Lost but their lifetime are not the same thus why there are 2 different events to inform when each sync is lost. Fixes: b2a5f2e1c127 ("Bluetooth: hci_event: Add support for handling LE BIG Sync Lost event") Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2025-11-11wifi: cfg80211/mac80211: Add fallback mechanism for INDOOR_SP connectionPagadala Yesu Anjaneyulu
Implement fallback to LPI mode when SP mode is not permitted by regulatory constraints for INDOOR_SP connections. Limit fallback mechanism to client mode. Signed-off-by: Pagadala Yesu Anjaneyulu <pagadala.yesu.anjaneyulu@intel.com> Reviewed-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com> Link: https://patch.msgid.link/20251110140806.8b43201a34ae.I37fc7bb5892eb9d044d619802e8f2095fde6b296@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2025-11-11wifi: cfg80211/mac80211: clean up duplicate ap_power handlingPagadala Yesu Anjaneyulu
Move duplicated ap_power type handling code to an inline function in cfg80211. Signed-off-by: Pagadala Yesu Anjaneyulu <pagadala.yesu.anjaneyulu@intel.com> Reviewed-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com> Link: https://patch.msgid.link/20251110140806.959948da1cb5.I893b5168329fb3232f249c182a35c99804112da6@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2025-11-11xsk: add indirect call for xsk_destruct_skbJason Xing
Since Eric proposed an idea about adding indirect call wrappers for UDP and managed to see a huge improvement[1], the same situation can also be applied in xsk scenario. This patch adds an indirect call for xsk and helps current copy mode improve the performance by around 1% stably which was observed with IXGBE at 10Gb/sec loaded. If the throughput grows, the positive effect will be magnified. I applied this patch on top of batch xmit series[2], and was able to see <5% improvement from our internal application which is a little bit unstable though. Use INDIRECT wrappers to keep xsk_destruct_skb static as it used to be when the mitigation config is off. Be aware of the freeing path that can be very hot since the frequency can reach around 2,000,000 times per second with the xdpsock test. [1]: https://lore.kernel.org/netdev/20251006193103.2684156-2-edumazet@google.com/ [2]: https://lore.kernel.org/all/20251021131209.41491-1-kerneljasonxing@gmail.com/ Suggested-by: Alexander Lobakin <aleksander.lobakin@intel.com> Signed-off-by: Jason Xing <kernelxing@tencent.com> Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com> Link: https://patch.msgid.link/20251031103328.95468-1-kerneljasonxing@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-11-10Merge tag 'for-netdev' of ↵Jakub Kicinski
https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next Martin KaFai Lau says: ==================== pull-request: bpf-next 2025-11-10 We've added 19 non-merge commits during the last 3 day(s) which contain a total of 22 files changed, 1345 insertions(+), 197 deletions(-). The main changes are: 1) Preserve skb metadata after a TC BPF program has changed the skb, from Jakub Sitnicki. This allows a TC program at the end of a TC filter chain to still see the skb metadata, even if another TC program at the front of the chain has changed the skb using BPF helpers. 2) Initial af_smc bpf_struct_ops support to control the smc specific syn/synack options, from D. Wythe. * tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: bpf/selftests: Add selftest for bpf_smc_hs_ctrl net/smc: bpf: Introduce generic hook for handshake flow bpf: Export necessary symbols for modules with struct_ops selftests/bpf: Cover skb metadata access after bpf_skb_change_proto selftests/bpf: Cover skb metadata access after change_head/tail helper selftests/bpf: Cover skb metadata access after bpf_skb_adjust_room selftests/bpf: Cover skb metadata access after vlan push/pop helper selftests/bpf: Expect unclone to preserve skb metadata selftests/bpf: Dump skb metadata on verification failure selftests/bpf: Verify skb metadata in BPF instead of userspace bpf: Make bpf_skb_change_head helper metadata-safe bpf: Make bpf_skb_change_proto helper metadata-safe bpf: Make bpf_skb_adjust_room metadata-safe bpf: Make bpf_skb_vlan_push helper metadata-safe bpf: Make bpf_skb_vlan_pop helper metadata-safe vlan: Make vlan_remove_tag return nothing bpf: Unclone skb head on bpf_dynptr_write to skb metadata net: Preserve metadata on pskb_expand_head net: Helper to move packet data and metadata after skb_push/pull ==================== Link: https://patch.msgid.link/20251110232427.3929291-1-martin.lau@linux.dev Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-10sctp: Don't inherit do_auto_asconf in sctp_clone_sock().Kuniyuki Iwashima
syzbot reported list_del(&sp->auto_asconf_list) corruption in sctp_destroy_sock(). The repro calls setsockopt(SCTP_AUTO_ASCONF, 1) to a SCTP listener, calls accept(), and close()s the child socket. setsockopt(SCTP_AUTO_ASCONF, 1) sets sp->do_auto_asconf to 1 and links sp->auto_asconf_list to a per-netns list. Both fields are placed after sp->pd_lobby in struct sctp_sock, and sctp_copy_descendant() did not copy the fields before the cited commit. Also, sctp_clone_sock() did not set them explicitly. In addition, sctp_auto_asconf_init() is called from sctp_sock_migrate(), but it initialises the fields only conditionally. The two fields relied on __GFP_ZERO added in sk_alloc(), but sk_clone() does not use it. Let's clear newsp->do_auto_asconf in sctp_clone_sock(). [0]: list_del corruption. prev->next should be ffff8880799e9148, but was ffff8880799e8808. (prev=ffff88803347d9f8) kernel BUG at lib/list_debug.c:64! Oops: invalid opcode: 0000 [#1] SMP KASAN PTI CPU: 0 UID: 0 PID: 6008 Comm: syz.0.17 Not tainted syzkaller #0 PREEMPT(full) Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 10/02/2025 RIP: 0010:__list_del_entry_valid_or_report+0x15a/0x190 lib/list_debug.c:62 Code: e8 7b 26 71 fd 43 80 3c 2c 00 74 08 4c 89 ff e8 7c ee 92 fd 49 8b 17 48 c7 c7 80 0a bf 8b 48 89 de 4c 89 f9 e8 07 c6 94 fc 90 <0f> 0b 4c 89 f7 e8 4c 26 71 fd 43 80 3c 2c 00 74 08 4c 89 ff e8 4d RSP: 0018:ffffc90003067ad8 EFLAGS: 00010246 RAX: 000000000000006d RBX: ffff8880799e9148 RCX: b056988859ee6e00 RDX: 0000000000000000 RSI: 0000000000000202 RDI: 0000000000000000 RBP: dffffc0000000000 R08: ffffc90003067807 R09: 1ffff9200060cf00 R10: dffffc0000000000 R11: fffff5200060cf01 R12: 1ffff1100668fb3f R13: dffffc0000000000 R14: ffff88803347d9f8 R15: ffff88803347d9f8 FS: 00005555823e5500(0000) GS:ffff88812613e000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000200000000480 CR3: 00000000741ce000 CR4: 00000000003526f0 Call Trace: <TASK> __list_del_entry_valid include/linux/list.h:132 [inline] __list_del_entry include/linux/list.h:223 [inline] list_del include/linux/list.h:237 [inline] sctp_destroy_sock+0xb4/0x370 net/sctp/socket.c:5163 sk_common_release+0x75/0x310 net/core/sock.c:3961 sctp_close+0x77e/0x900 net/sctp/socket.c:1550 inet_release+0x144/0x190 net/ipv4/af_inet.c:437 __sock_release net/socket.c:662 [inline] sock_close+0xc3/0x240 net/socket.c:1455 __fput+0x44c/0xa70 fs/file_table.c:468 task_work_run+0x1d4/0x260 kernel/task_work.c:227 resume_user_mode_work include/linux/resume_user_mode.h:50 [inline] exit_to_user_mode_loop+0xe9/0x130 kernel/entry/common.c:43 exit_to_user_mode_prepare include/linux/irq-entry-common.h:225 [inline] syscall_exit_to_user_mode_work include/linux/entry-common.h:175 [inline] syscall_exit_to_user_mode include/linux/entry-common.h:210 [inline] do_syscall_64+0x2bd/0xfa0 arch/x86/entry/syscall_64.c:100 entry_SYSCALL_64_after_hwframe+0x77/0x7f Fixes: 16942cf4d3e3 ("sctp: Use sk_clone() in sctp_accept().") Reported-by: syzbot+ba535cb417f106327741@syzkaller.appspotmail.com Closes: https://lore.kernel.org/netdev/690d2185.a70a0220.22f260.000e.GAE@google.com/ Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com> Acked-by: Xin Long <lucien.xin@gmail.com> Link: https://patch.msgid.link/20251106223418.1455510-1-kuniyu@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-10net/smc: bpf: Introduce generic hook for handshake flowD. Wythe
The introduction of IPPROTO_SMC enables eBPF programs to determine whether to use SMC based on the context of socket creation, such as network namespaces, PID and comm name, etc. As a subsequent enhancement, to introduce a new generic hook that allows decisions on whether to use SMC or not at runtime, including but not limited to local/remote IP address or ports. User can write their own implememtion via bpf_struct_ops now to choose whether to use SMC or not before TCP 3rd handshake to be comleted. Signed-off-by: D. Wythe <alibuda@linux.alibaba.com> Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Reviewed-by: Dust Li <dust.li@linux.alibaba.com> Link: https://patch.msgid.link/20251107035632.115950-3-alibuda@linux.alibaba.com
2025-11-10wifi: cfg80211: fix doc of struct key_paramsChien Wong
The seq in struct key_params is for many ciphers, including CCMP, GCMP, CMAC, GMAC. In addition to get_key(), it is also used when setting keys. Signed-off-by: Chien Wong <m@xv97.com> Link: https://patch.msgid.link/20251107142332.181308-1-m@xv97.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2025-11-10wifi: mac80211: fix EHT typoJohannes Berg
This is clearly EHT, not ETH, fix the typo. Link: https://patch.msgid.link/20251105153958.12a04517f7ec.Idcf800817fa30605b1002c3d2287cad016e7aea7@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2025-11-10wifi: cfg80211: fix EHT typoJohannes Berg
This is clearly EHT, not ETH, fix the typo. Link: https://patch.msgid.link/20251105153958.e9d4af3b768e.I5f3378326837e3f62928a2f1fd3403f29cea069b@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2025-11-07Merge branch '40GbE' of ↵Jakub Kicinski
git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue Tony Nguyen says: ==================== Intel Wired LAN Driver Updates 2025-11-06 (i40, ice, iavf) Mohammad Heib introduces a new devlink parameter, max_mac_per_vf, for controlling the maximum number of MAC address filters allowed by a VF. This allows administrators to control the VF behavior in a more nuanced manner. Aleksandr and Przemek add support for Receive Side Scaling of GTP to iAVF for VFs running on E800 series ice hardware. This improves performance and scalability for virtualized network functions in 5G and LTE deployments. * '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue: iavf: add RSS support for GTP protocol via ethtool ice: Extend PTYPE bitmap coverage for GTP encapsulated flows ice: improve TCAM priority handling for RSS profiles ice: implement GTP RSS context tracking and configuration ice: add virtchnl definitions and static data for GTP RSS ice: add flow parsing for GTP and new protocol field support i40e: support generic devlink param "max_mac_per_vf" devlink: Add new "max_mac_per_vf" generic device param ==================== Link: https://patch.msgid.link/20251106225321.1609605-1-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-07psp: add stats from psp spec to driver facing apiJakub Kicinski
Provide a driver api for reporting device statistics required by the "Implementation Requirements" section of the PSP Architecture Specification. Use a warning to ensure drivers report stats required by the spec. Signed-off-by: Daniel Zahka <daniel.zahka@gmail.com> Link: https://patch.msgid.link/20251106002608.1578518-4-daniel.zahka@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-07psp: report basic stats from the coreJakub Kicinski
Track and report stats common to all psp devices from the core. A 'stale-event' is when the core marks the rx state of an active psp_assoc as incapable of authenticating psp encapsulated data. Signed-off-by: Daniel Zahka <daniel.zahka@gmail.com> Link: https://patch.msgid.link/20251106002608.1578518-2-daniel.zahka@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-07tcp: add net.ipv4.tcp_comp_sack_rtt_percentEric Dumazet
TCP SACK compression has been added in 2018 in commit 5d9f4262b7ea ("tcp: add SACK compression"). It is working great for WAN flows (with large RTT). Wifi in particular gets a significant boost _when_ ACK are suppressed. Add a new sysctl so that we can tune the very conservative 5 % value that has been used so far in this formula, so that small RTT flows can benefit from this feature. delay = min ( 5 % of RTT, 1 ms) This patch adds new tcp_comp_sack_rtt_percent sysctl to ease experiments and tuning. Given that we cap the delay to 1ms (tcp_comp_sack_delay_ns sysctl), set the default value to 33 %. Quoting Neal Cardwell ( https://lore.kernel.org/netdev/CADVnQymZ1tFnEA1Q=vtECs0=Db7zHQ8=+WCQtnhHFVbEOzjVnQ@mail.gmail.com/ ) The rationale for 33% is basically to try to facilitate pipelining, where there are always at least 3 ACKs and 3 GSO/TSO skbs per SRTT, so that the path can maintain a budget for 3 full-sized GSO/TSO skbs "in flight" at all times: + 1 skb in the qdisc waiting to be sent by the NIC next + 1 skb being sent by the NIC (being serialized by the NIC out onto the wire) + 1 skb being received and aggregated by the receiver machine's aggregation mechanism (some combination of LRO, GRO, and sack compression) Note that this is basically the same magic number (3) and the same rationales as: (a) tcp_tso_should_defer() ensuring that we defer sending data for no longer than cwnd/tcp_tso_win_divisor (where tcp_tso_win_divisor = 3), and (b) bbr_quantization_budget() ensuring that cwnd is at least 3 GSO/TSO skbs to maintain pipelining and full throughput at low RTTs Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Neal Cardwell <ncardwell@google.com> Link: https://patch.msgid.link/20251106115236.3450026-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-07tcp: Apply max RTO to non-TFO SYN+ACK.Kuniyuki Iwashima
Since commit 54a378f43425 ("tcp: add the ability to control max RTO"), TFO SYN+ACK RTO is capped by the TFO full sk's inet_csk(sk)->icsk_rto_max. The value is inherited from the parent listener. Let's apply the same cap to non-TFO SYN+ACK. Note that req->rsk_listener is always non-NULL when we call tcp_reqsk_timeout() in reqsk_timer_handler() or tcp_check_req(). It could be NULL for SYN cookie req, but we do not use req->timeout then. Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20251106003357.273403-6-kuniyu@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-07tcp: Remove timeout arg from reqsk_timeout().Kuniyuki Iwashima
reqsk_timeout() is always called with @timeout being TCP_RTO_MAX. Let's remove the arg. As a prep for the next patch, reqsk_timeout() is moved to tcp.h and renamed to tcp_reqsk_timeout(). Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20251106003357.273403-5-kuniyu@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-07tcp: Remove timeout arg from reqsk_queue_hash_req().Kuniyuki Iwashima
inet_csk_reqsk_queue_hash_add() is no longer shared by DCCP. We do not need to pass req->timeout down to reqsk_queue_hash_req(). Let's move tcp_timeout_init() from tcp_conn_request() to reqsk_queue_hash_req(). Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20251106003357.273403-3-kuniyu@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-07tcp: Call tcp_syn_ack_timeout() directly.Kuniyuki Iwashima
Since DCCP has been removed, we do not need to use request_sock_ops.syn_ack_timeout(). Let's call tcp_syn_ack_timeout() directly. Now other function pointers of request_sock_ops are protocol-dependent. Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20251106003357.273403-2-kuniyu@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-06xsk: Move NETDEV_XDP_ACT_ZC into generic headerDaniel Borkmann
Move NETDEV_XDP_ACT_ZC into xdp_sock_drv.h header such that external code can reuse it, and rename it into more generic NETDEV_XDP_ACT_XSK. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Co-developed-by: David Wei <dw@davidwei.uk> Signed-off-by: David Wei <dw@davidwei.uk> Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org> Link: https://patch.msgid.link/20251031212103.310683-7-daniel@iogearbox.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-06net: dsa: add tagging driver for MaxLinear GSW1xx switch familyDaniel Golle
Add support for a new DSA tagging protocol driver for the MaxLinear GSW1xx switch family. The GSW1xx switches use a proprietary 8-byte special tag inserted between the source MAC address and the EtherType field to indicate the source and destination ports for frames traversing the CPU port. Implement the tag handling logic to insert the special tag on transmit and parse it on receive. Signed-off-by: Daniel Golle <daniel@makrotopia.org> Reviewed-by: Alexander Sverdlin <alexander.sverdlin@siemens.com> Tested-by: Alexander Sverdlin <alexander.sverdlin@siemens.com> Link: https://patch.msgid.link/0e973ebfd9433c30c96f50670da9e9449a0d98f2.1762170107.git.daniel@makrotopia.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-06devlink: Add new "max_mac_per_vf" generic device paramMohammad Heib
Add a new device generic parameter to controls the maximum number of MAC filters allowed per VF. For example, to limit a VF to 3 MAC addresses: $ devlink dev param set pci/0000:3b:00.0 name max_mac_per_vf \ value 3 \ cmode runtime Signed-off-by: Mohammad Heib <mheib@redhat.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-11-06Merge tag 'hardening-v6.18-rc5' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux Pull hardening fixes from Kees Cook: "This is a work-around for a (now fixed) corner case in the arm32 build with Clang KCFI enabled. - Introduce __nocfi_generic for arm32 Clang (Nathan Chancellor)" * tag 'hardening-v6.18-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: libeth: xdp: Disable generic kCFI pass for libeth_xdp_tx_xmit_bulk() ARM: Select ARCH_USES_CFI_GENERIC_LLVM_PASS compiler_types: Introduce __nocfi_generic
2025-11-06Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski
Cross-merge networking fixes after downstream PR (net-6.18-rc5). Conflicts: drivers/net/wireless/ath/ath12k/mac.c 9222582ec524 ("Revert "wifi: ath12k: Fix missing station power save configuration"") 6917e268c433 ("wifi: ath12k: Defer vdev bring-up until CSA finalize to avoid stale beacon") https://lore.kernel.org/11cece9f7e36c12efd732baa5718239b1bf8c950.camel@sipsolutions.net Adjacent changes: drivers/net/ethernet/intel/Kconfig b1d16f7c0063 ("libie: depend on DEBUG_FS when building LIBIE_FWLOG") 93f53db9f9dc ("ice: switch to Page Pool") Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-06net: selftests: export packet creation helpers for driver useRaju Rangoju
Export the network selftest packet creation infrastructure to allow network drivers to reuse the existing selftest framework instead of duplicating packet creation code. Signed-off-by: Raju Rangoju <Raju.Rangoju@amd.com> Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Link: https://patch.msgid.link/20251031111811.775434-1-Raju.Rangoju@amd.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-11-04net: Convert proto callbacks from sockaddr to sockaddr_unsizedKees Cook
Convert struct proto pre_connect(), connect(), bind(), and bind_add() callback function prototypes from struct sockaddr to struct sockaddr_unsized. This does not change per-implementation use of sockaddr for passing around an arbitrarily sized sockaddr struct. Those will be addressed in future patches. Additionally removes the no longer referenced struct sockaddr from include/net/inet_common.h. No binary changes expected. Signed-off-by: Kees Cook <kees@kernel.org> Link: https://patch.msgid.link/20251104002617.2752303-5-kees@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-04net: Convert proto_ops connect() callbacks to use sockaddr_unsizedKees Cook
Update all struct proto_ops connect() callback function prototypes from "struct sockaddr *" to "struct sockaddr_unsized *" to avoid lying to the compiler about object sizes. Calls into struct proto handlers gain casts that will be removed in the struct proto conversion patch. No binary changes expected. Signed-off-by: Kees Cook <kees@kernel.org> Link: https://patch.msgid.link/20251104002617.2752303-3-kees@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-04net: Convert proto_ops bind() callbacks to use sockaddr_unsizedKees Cook
Update all struct proto_ops bind() callback function prototypes from "struct sockaddr *" to "struct sockaddr_unsized *" to avoid lying to the compiler about object sizes. Calls into struct proto handlers gain casts that will be removed in the struct proto conversion patch. No binary changes expected. Signed-off-by: Kees Cook <kees@kernel.org> Link: https://patch.msgid.link/20251104002617.2752303-2-kees@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-04xsk: use a smaller new lock for shared pool caseJason Xing
- Split cq_lock into two smaller locks: cq_prod_lock and cq_cached_prod_lock - Avoid disabling/enabling interrupts in the hot xmit path In either xsk_cq_cancel_locked() or xsk_cq_reserve_locked() function, the race condition is only between multiple xsks sharing the same pool. They are all in the process context rather than interrupt context, so now the small lock named cq_cached_prod_lock can be used without handling interrupts. While cq_cached_prod_lock ensures the exclusive modification of @cached_prod, cq_prod_lock in xsk_cq_submit_addr_locked() only cares about @producer and corresponding @desc. Both of them don't necessarily be consistent with @cached_prod protected by cq_cached_prod_lock. That's the reason why the previous big lock can be split into two smaller ones. Please note that SPSC rule is all about the global state of producer and consumer that can affect both layers instead of local or cached ones. Frequently disabling and enabling interrupt are very time consuming in some cases, especially in a per-descriptor granularity, which now can be avoided after this optimization, even when the pool is shared by multiple xsks. With this patch, the performance number[1] could go from 1,872,565 pps to 1,961,009 pps. It's a minor rise of around 5%. [1]: taskset -c 1 ./xdpsock -i enp2s0f1 -q 0 -t -S -s 64 Signed-off-by: Jason Xing <kernelxing@tencent.com> Acked-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Link: https://patch.msgid.link/20251030000646.18859-3-kerneljasonxing@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>