summaryrefslogtreecommitdiff
path: root/drivers/net
AgeCommit message (Collapse)Author
2026-01-20net/xen-netback: Fix mispelling of "Software" as "Softare"Yicong Hui
Fix misspelling of "software" as "softare" in xen-netback code comment. Signed-off-by: Yicong Hui <yiconghui@gmail.com> Link: https://patch.msgid.link/20260118121001.136806-4-yiconghui@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-01-20net/micrel: Fix typos in micrel driver code commentsYicong Hui
Fix various typos and misspellings in code comments in the drivers/net/ethernet/micrel directory Signed-off-by: Yicong Hui <yiconghui@gmail.com> Link: https://patch.msgid.link/20260118121001.136806-3-yiconghui@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-01-20net/benet: Fix typos in driver code commentsYicong Hui
Fix various typos and misspellings in code comments in the drivers/net/ethernet/emulex directory Signed-off-by: Yicong Hui <yiconghui@gmail.com> Link: https://patch.msgid.link/20260118121001.136806-2-yiconghui@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-01-20Octeontx2-pf: Update xdp featuresHariprasad Kelam
In recent testing, verification of XDP_REDIRECT and zero-copy features failed because the driver is not setting the corresponding feature flags. Fixes: efabce290151 ("octeontx2-pf: AF_XDP zero copy receive support") Fixes: 66c0e13ad236 ("drivers: net: turn on XDP features") Signed-off-by: Hariprasad Kelam <hkelam@marvell.com> Link: https://patch.msgid.link/20260119100222.2267925-1-hkelam@marvell.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-01-20usbnet: limit max_mtu based on device's hard_mtuLaurent Vivier
The usbnet driver initializes net->max_mtu to ETH_MAX_MTU before calling the device's bind() callback. When the bind() callback sets dev->hard_mtu based the device's actual capability (from CDC Ethernet's wMaxSegmentSize descriptor), max_mtu is never updated to reflect this hardware limitation). This allows userspace (DHCP or IPv6 RA) to configure MTU larger than the device can handle, leading to silent packet drops when the backend sends packet exceeding the device's buffer size. Fix this by limiting net->max_mtu to the device's hard_mtu after the bind callback returns. See https://gitlab.com/qemu-project/qemu/-/issues/3268 and https://bugs.passt.top/attachment.cgi?bugid=189 Fixes: f77f0aee4da4 ("net: use core MTU range checking in USB NIC drivers") Signed-off-by: Laurent Vivier <lvivier@redhat.com> Link: https://bugs.passt.top/show_bug.cgi?id=189 Reviewed-by: Stefano Brivio <sbrivio@redhat.com> Link: https://patch.msgid.link/20260119075518.2774373-1-lvivier@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-01-20net: phy: simplify PHY fixup registrationHeiner Kallweit
Based on the fact that either bus_id-based matching or phy_uid-based matching is used, the code can be simplified. PHY_ANY_ID and PHY_ANY_UID are not needed. Ensure that phy_id_compare() is called only if phy_uid_mask isn't zero, because a zero value would always result in a match. In addition change the return value type of phy_needs_fixup() to bool. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Link: https://patch.msgid.link/e7394cc8-5895-4d02-a8fe-802345c7c547@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-01-20net: macb: Replace open-coded device config retrieval with ↵Kevin Hao
of_device_get_match_data() Use of_device_get_match_data() to replace the open-coded method for obtaining the device config. Additionally, adjust the ordering of local variables to ensure compatibility with RCS. Signed-off-by: Kevin Hao <haokexin@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://patch.msgid.link/20260117-macb-v1-1-f092092d8c91@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-01-20net: airoha_eth: increase max MTU to 9220 for DSA jumbo framesSayantan Nandy
The industry standard jumbo frame MTU is 9216 bytes. When using the DSA subsystem, a 4-byte tag is added to each Ethernet frame. Increase AIROHA_MAX_MTU to 9220 bytes (9216 + 4) so that users can set a standard 9216-byte MTU on DSA ports. The underlying hardware supports significantly larger frame sizes (approximately 16K). However, the maximum MTU is limited to 9220 bytes for now, as this is sufficient to support standard jumbo frames and does not incur additional memory allocation overhead. Signed-off-by: Sayantan Nandy <sayantann11@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Acked-by: Lorenzo Bianconi <lorenzo@kernel.org> Link: https://patch.msgid.link/20260119073658.6216-1-sayantann11@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-01-20net: txgbe: remove the redundant data return in SW-FW mailboxJiawen Wu
For these two firmware mailbox commands, in txgbe_test_hostif() and txgbe_set_phy_link_hostif(), there is no need to read data from the buffer. Under the current setting, OEM firmware will cause the driver to fail to probe. Because OEM firmware returns more link information, with a larger OEM structure txgbe_hic_ephy_getlink. However, the current driver does not support the OEM function. So just fix it in the way that does not involve reading the returned data. Fixes: d84a3ff9aae8 ("net: txgbe: Restrict the use of mismatched FW versions") Cc: stable@vger.kernel.org Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com> Link: https://patch.msgid.link/2914AB0BC6158DDA+20260119065935.6015-1-jiawenwu@trustnetic.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-01-20net: hns3: fix the HCLGE_FD_AD_NXT_KEY error setting issueJijie Shao
Use next_input_key instead of counter_id to set HCLGE_FD_AD_NXT_KEY. Fixes: 117328680288 ("net: hns3: Add input key and action config support for flow director") Signed-off-by: Jijie Shao <shaojijie@huawei.com> Link: https://patch.msgid.link/20260119132840.410513-3-shaojijie@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-01-20net: hns3: fix wrong GENMASK() for HCLGE_FD_AD_COUNTER_NUM_MJijie Shao
HCLGE_FD_AD_COUNTER_NUM_M should be at GENMASK(19, 13), rather than at GENMASK(20, 13), because bit 20 is HCLGE_FD_AD_NXT_STEP_B. This patch corrects the wrong definition. Fixes: 117328680288 ("net: hns3: Add input key and action config support for flow director") Signed-off-by: Jijie Shao <shaojijie@huawei.com> Link: https://patch.msgid.link/20260119132840.410513-2-shaojijie@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-01-20net: stmmac: fix resume: calculate tso last_segmentRussell King (Oracle)
Tao Wang reports that sometimes, after resume, stmmac can watchdog: NETDEV WATCHDOG: CPU: x: transmit queue x timed out xx ms When this occurs, the DMA transmit descriptors contain: eth0: 221 [0x0000000876d10dd0]: 0x73660cbe 0x8 0x42 0xb04416a0 eth0: 222 [0x0000000876d10de0]: 0x77731d40 0x8 0x16a0 0x90000000 where descriptor 221 is the TSO header and 222 is the TSO payload. tdes3 for descriptor 221 (0xb04416a0) has both bit 29 (first descriptor) and bit 28 (last descriptor) set, which is incorrect. The following packet also has bit 28 set, but isn't marked as a first descriptor, and this causes the transmit DMA to stall. This occurs because stmmac_tso_allocator() populates the first descriptor, but does not set .last_segment correctly. There are two places where this matters: one is later in stmmac_tso_xmit() where we use it to update the TSO header descriptor. The other is in the ring/chain mode clean_desc3() which is a performance optimisation. Rather than using tx_q->tx_skbuff_dma[].last_segment to determine whether the first descriptor entry is the only segment, calculate the number of descriptor entries used. If there is only one descriptor, then the first is also the last, so mark it as such. Further work will be necessary to either eliminate .last_segment entirely or set it correctly. Code analysis also indicates that a similar issue exists with .is_jumbo. These will be the subject of a future patch. Reported-by: Tao Wang <tao03.wang@horizon.auto> Fixes: c2837423cb54 ("net: stmmac: Rework TX Coalesce logic") Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Link: https://patch.msgid.link/E1vhq8O-00000005N5s-0Ke5@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-01-20be2net: fix data race in be_get_new_eqdDavid Yang
In be_get_new_eqd(), statistics of pkts, protected by u64_stats_sync, are read and accumulated in ignorance of possible u64_stats_fetch_retry() events. Before the commit in question, these statistics were retrieved one by one directly from queues. Fix this by reading them into temporary variables first. Fixes: 209477704187 ("be2net: set interrupt moderation for Skyhawk-R using EQ-DB") Signed-off-by: David Yang <mmyangfl@gmail.com> Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev> Link: https://patch.msgid.link/20260119153440.1440578-1-mmyangfl@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-01-20idpf: Fix data race in idpf_net_dimDavid Yang
In idpf_net_dim(), some statistics protected by u64_stats_sync, are read and accumulated in ignorance of possible u64_stats_fetch_retry() events. The correct way to copy statistics is already illustrated by idpf_add_queue_stats(). Fix this by reading them into temporary variables first. Fixes: c2d548cad150 ("idpf: add TX splitq napi poll support") Fixes: 3a8845af66ed ("idpf: add RX splitq napi poll support") Signed-off-by: David Yang <mmyangfl@gmail.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20260119162720.1463859-1-mmyangfl@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-01-20net: hns3: fix data race in hns3_fetch_statsDavid Yang
In hns3_fetch_stats(), ring statistics, protected by u64_stats_sync, are read and accumulated in ignorance of possible u64_stats_fetch_retry() events. These statistics are already accumulated by hns3_ring_stats_update(). Fix this by reading them into a temporary buffer first. Fixes: b20d7fe51e0d ("net: hns3: add some statitics info to tx process") Signed-off-by: David Yang <mmyangfl@gmail.com> Link: https://patch.msgid.link/20260119160759.1455950-1-mmyangfl@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-01-20octeontx2-pf: Remove unnecessary bounds checkSimon Horman
active_fec is a 2-bit unsigned field, and thus can only have the values 0-3. So checking that it is less than 4 is unnecessary. Simplify the code by dropping this check. As it no longer fits well where it is, move FEC_MAX_INDEX to towards the top of the file. And add the prefix OXT2. I believe this is more idiomatic. Flagged by Smatch as: ...//otx2_ethtool.c:1024 otx2_get_fecparam() warn: always true condition '(pfvf->linfo.fec < 4) => (0-3 < 4)' No functional change intended. Compile tested only. Signed-off-by: Simon Horman <horms@kernel.org> Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev> Reviewed-by: Hariprasad Kelam <hkelam@marvell.com> Link: https://patch.msgid.link/20260119-oob-v1-1-a4147e75e770@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-01-20net: usb: r8152: fix transmit queue timeoutMingj Ye
When the TX queue length reaches the threshold, the netdev watchdog immediately detects a TX queue timeout. This patch updates the trans_start timestamp of the transmit queue on every asynchronous USB URB submission along the transmit path, ensuring that the network watchdog accurately reflects ongoing transmission activity. Signed-off-by: Mingj Ye <insyelu@gmail.com> Reviewed-by: Hayes Wang <hayeswang@realtek.com> Link: https://patch.msgid.link/20260120015949.84996-1-insyelu@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-01-20eth: fbnic: Update RX mbox timeout valueMohsin Bashir
While waiting for completions on read requests, driver is using different timeout values for different messages. Make use of a single timeout value. Introduce a wrapper function to handle the wait, which also simplify maintaining the 80 char line limit. Signed-off-by: Mohsin Bashir <mohsin.bashr@gmail.com> Link: https://patch.msgid.link/20260115003353.4150771-6-mohsin.bashr@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-01-20eth: fbnic: Remove retry supportMohsin Bashir
The driver retries sensor read requests from firmware, but this is unnecessary. A functioning firmware should respond to each request within the timeout period. Remove the retry logic and set the timeout to the sum of all retry timeouts. Signed-off-by: Mohsin Bashir <mohsin.bashr@gmail.com> Link: https://patch.msgid.link/20260115003353.4150771-5-mohsin.bashr@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-01-20eth: fbnic: Reuse RX mailbox pagesMohsin Bashir
Currently, the RX mailbox frees and reallocates a page for each received message. Since FW Rx messages are processed synchronously, and nothing hold these pages (unlike skbs which we hand over to the stack), reuse the pages and put them back on the Rx ring. Now that we ensure the ring is always fully populated we don't have to worry about filling it up after partial population during init, either. Update fbnic_mbx_process_rx_msgs() to recycle pages after message processing. Signed-off-by: Mohsin Bashir <mohsin.bashr@gmail.com> Link: https://patch.msgid.link/20260115003353.4150771-4-mohsin.bashr@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-01-20eth: fbnic: Allocate all pages for RX mailboxMohsin Bashir
Now that memory is allocated with GFP_KERNEL, allocation failures should be extremely rare. Ensure the FW communication ring is always fully populated with free pages, and hard fail initialization otherwise. This enables simplifications in next patches. Signed-off-by: Mohsin Bashir <mohsin.bashr@gmail.com> Link: https://patch.msgid.link/20260115003353.4150771-3-mohsin.bashr@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-01-20eth: fbnic: Use GFP_KERNEL to allocting mbx pagesMohsin Bashir
Replace GFP_ATOMIC with GFP_KERNEL for mailbox RX page allocation. Since interrupt handler is threaded GFP_KERNEL is a safe option to reduce allocation failures. Also remove __GFP_NOWARN so the kernel reports a warning on allocation failure to aid debugging. Signed-off-by: Mohsin Bashir <mohsin.bashr@gmail.com> Link: https://patch.msgid.link/20260115003353.4150771-2-mohsin.bashr@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-01-20Merge tag 'net-queue-rx-buf-len-v9' of https://github.com/isilence/linuxJakub Kicinski
Pavel Begunkov says: ==================== Add support for providers with large rx buffer Many modern NICs support configurable receive buffer lengths, and zcrx and memory providers can use buffers larger than 4K to improve performance. When paired with hw-gro larger rx buffer sizes can drastically reduce the number of buffers traversing the stack and save a lot of processing time. It also allows to give to users larger contiguous chunks of data. Single stream benchmarks showed up to ~30% CPU util improvement. E.g. comparison for 4K vs 32K buffers using a 200Gbit NIC: packets=23987040 (MB=2745098), rps=199559 (MB/s=22837) CPU %usr %nice %sys %iowait %irq %soft %idle 0 1.53 0.00 27.78 2.72 1.31 66.45 0.22 packets=24078368 (MB=2755550), rps=200319 (MB/s=22924) CPU %usr %nice %sys %iowait %irq %soft %idle 0 0.69 0.00 8.26 31.65 1.83 57.00 0.57 This series adds net infrastructure for memory providers configuring the size and implements it for bnxt. It's an opt-in feature for drivers, they should advertise support for the parameter in the qops and must check if the hardware supports the given size. It's limited to memory providers as it drastically simplifies implementation. It doesn't affect the fast path zcrx uAPI, and the user exposed parameter is defined in zcrx terms, which allows it to be flexible and adjusted in the future. A liburing example can be found at [2] full branch: [1] https://github.com/isilence/linux.git zcrx/large-buffers-v8 Liburing example: [2] https://github.com/isilence/liburing.git zcrx/rx-buf-len * tag 'net-queue-rx-buf-len-v9' of https://github.com/isilence/linux: io_uring/zcrx: document area chunking parameter selftests: iou-zcrx: test large chunk sizes eth: bnxt: support qcfg provided rx page size eth: bnxt: adjust the fill level of agg queues with larger buffers eth: bnxt: store rx buffer size per queue net: pass queue rx page size from memory provider net: add bare bone queue configs net: reduce indent of struct netdev_queue_mgmt_ops members net: memzero mp params when closing a queue ==================== Link: https://patch.msgid.link/ Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-01-20Revert "Merge branch 'netkit-support-for-io_uring-zero-copy-and-af_xdp'"Jakub Kicinski
This reverts commit 77b9c4a438fc66e2ab004c411056b3fb71a54f2c, reversing changes made to 4515ec4ad58a37e70a9e1256c0b993958c9b7497: 931420a2fc36 ("selftests/net: Add netkit container tests") ab771c938d9a ("selftests/net: Make NetDrvContEnv support queue leasing") 6be87fbb2776 ("selftests/net: Add env for container based tests") 61d99ce3dfc2 ("selftests/net: Add bpf skb forwarding program") 920da3634194 ("netkit: Add xsk support for af_xdp applications") eef51113f8af ("netkit: Add netkit notifier to check for unregistering devices") b5ef109d22d4 ("netkit: Implement rtnl_link_ops->alloc and ndo_queue_create") b5c3fa4a0b16 ("netkit: Add single device mode for netkit") 0073d2fd679d ("xsk: Proxy pool management for leased queues") 1ecea95dd3b5 ("xsk: Extend xsk_rcv_check validation") 804bf334d08a ("net: Proxy netdev_queue_get_dma_dev for leased queues") 0caa9a8ddec3 ("net: Proxy net_mp_{open,close}_rxq for leased queues") ff8889ff9107 ("net, ethtool: Disallow leased real rxqs to be resized") 9e2103f36110 ("net: Add lease info to queue-get response") 31127deddef4 ("net: Implement netdev_nl_queue_create_doit") a5546e18f77c ("net: Add queue-create operation") The series will conflict with io_uring work, and the code needs more polish. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-01-20net: phy: intel-xway: fix OF node refcount leakageDaniel Golle
Automated review spotted am OF node reference count leakage when checking if the 'leds' child node exists. Call of_put_node() to correctly maintain the refcount. Link: https://netdev-ai.bots.linux.dev/ai-review.html?id=20f173ba-0c64-422b-a663-fea4b4ad01d0 Fixes: 1758af47b98c1 ("net: phy: intel-xway: add support for PHY LEDs") Signed-off-by: Daniel Golle <daniel@makrotopia.org> Link: https://patch.msgid.link/e3275e1c1cdca7e6426bb9c11f33bd84b8d900c8.1768783208.git.daniel@makrotopia.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-01-20idpf: read lower clock bits inside the time sandwichMina Almasry
PCIe reads need to be done inside the time sandwich because PCIe writes may get buffered in the PCIe fabric and posted to the device after the _postts completes. Doing the PCIe read inside the time sandwich guarantees that the write gets flushed before the _postts timestamp is taken. Cc: lrizzo@google.com Cc: namangulati@google.com Cc: willemb@google.com Cc: intel-wired-lan@lists.osuosl.org Cc: milena.olech@intel.com Cc: jacob.e.keller@intel.com Fixes: 5cb8805d2366 ("idpf: negotiate PTP capabilities and get PTP clock") Suggested-by: Shachar Raindel <shacharr@google.com> Signed-off-by: Mina Almasry <almasrymina@google.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Tested-by: Samuel Salin <Samuel.salin@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2026-01-20ice: fix devlink reload call tracePaul Greenwalt
Commit 4da71a77fc3b ("ice: read internal temperature sensor") introduced internal temperature sensor reading via HWMON. ice_hwmon_init() was added to ice_init_feature() and ice_hwmon_exit() was added to ice_remove(). As a result if devlink reload is used to reinit the device and then the driver is removed, a call trace can occur. BUG: unable to handle page fault for address: ffffffffc0fd4b5d Call Trace: string+0x48/0xe0 vsnprintf+0x1f9/0x650 sprintf+0x62/0x80 name_show+0x1f/0x30 dev_attr_show+0x19/0x60 The call trace repeats approximately every 10 minutes when system monitoring tools (e.g., sadc) attempt to read the orphaned hwmon sysfs attributes that reference freed module memory. The sequence is: 1. Driver load, ice_hwmon_init() gets called from ice_init_feature() 2. Devlink reload down, flow does not call ice_remove() 3. Devlink reload up, ice_hwmon_init() gets called from ice_init_feature() resulting in a second instance 4. Driver unload, ice_hwmon_exit() called from ice_remove() leaving the first hwmon instance orphaned with dangling pointer Fix this by moving ice_hwmon_exit() from ice_remove() to ice_deinit_features() to ensure proper cleanup symmetry with ice_hwmon_init(). Fixes: 4da71a77fc3b ("ice: read internal temperature sensor") Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Signed-off-by: Paul Greenwalt <paul.greenwalt@intel.com> Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de> Tested-by: Rinitha S <sx.rinitha@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2026-01-20ice: add missing ice_deinit_hw() in devlink reinit pathPaul Greenwalt
devlink-reload results in ice_init_hw failed error, and then removing the ice driver causes a NULL pointer dereference. [ +0.102213] ice 0000:ca:00.0: ice_init_hw failed: -16 ... [ +0.000001] Call Trace: [ +0.000003] <TASK> [ +0.000006] ice_unload+0x8f/0x100 [ice] [ +0.000081] ice_remove+0xba/0x300 [ice] Commit 1390b8b3d2be ("ice: remove duplicate call to ice_deinit_hw() on error paths") removed ice_deinit_hw() from ice_deinit_dev(). As a result ice_devlink_reinit_down() no longer calls ice_deinit_hw(), but ice_devlink_reinit_up() still calls ice_init_hw(). Since the control queues are not uninitialized, ice_init_hw() fails with -EBUSY. Add ice_deinit_hw() to ice_devlink_reinit_down() to correspond with ice_init_hw() in ice_devlink_reinit_up(). Fixes: 1390b8b3d2be ("ice: remove duplicate call to ice_deinit_hw() on error paths") Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Signed-off-by: Paul Greenwalt <paul.greenwalt@intel.com> Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de> Tested-by: Rinitha S <sx.rinitha@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2026-01-20ice: Fix persistent failure in ice_get_rxfhCody Haas
Several ioctl functions have the ability to call ice_get_rxfh, however all of these ioctl functions do not provide all of the expected information in ethtool_rxfh_param. For example, ethtool_get_rxfh_indir does not provide an rss_key. This previously caused ethtool_get_rxfh_indir to always fail with -EINVAL. This change draws inspiration from i40e_get_rss to handle this situation, by only calling the appropriate rss helpers when the necessary information has been provided via ethtool_rxfh_param. Fixes: b66a972abb6b ("ice: Refactor ice_set/get_rss into LUT and key specific functions") Signed-off-by: Cody Haas <chaas@riotgames.com> Closes: https://lore.kernel.org/intel-wired-lan/CAH7f-UKkJV8MLY7zCdgCrGE55whRhbGAXvgkDnwgiZ9gUZT7_w@mail.gmail.com/ Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Tested-by: Rinitha S <sx.rinitha@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2026-01-20netdevsim: fix a race issue related to the operation on bpf_bound_progs listYun Lu
The netdevsim driver lacks a protection mechanism for operations on the bpf_bound_progs list. When the nsim_bpf_create_prog() performs list_add_tail, it is possible that nsim_bpf_destroy_prog() is simultaneously performs list_del. Concurrent operations on the list may lead to list corruption and trigger a kernel crash as follows: [ 417.290971] kernel BUG at lib/list_debug.c:62! [ 417.290983] invalid opcode: 0000 [#1] PREEMPT SMP NOPTI [ 417.290992] CPU: 10 PID: 168 Comm: kworker/10:1 Kdump: loaded Not tainted 6.19.0-rc5 #1 [ 417.291003] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2 04/01/2014 [ 417.291007] Workqueue: events bpf_prog_free_deferred [ 417.291021] RIP: 0010:__list_del_entry_valid_or_report+0xa7/0xc0 [ 417.291034] Code: a8 ff 0f 0b 48 89 fe 48 89 ca 48 c7 c7 48 a1 eb ae e8 ed fb a8 ff 0f 0b 48 89 fe 48 89 c2 48 c7 c7 80 a1 eb ae e8 d9 fb a8 ff <0f> 0b 48 89 d1 48 c7 c7 d0 a1 eb ae 48 89 f2 48 89 c6 e8 c2 fb a8 [ 417.291040] RSP: 0018:ffffb16a40807df8 EFLAGS: 00010246 [ 417.291046] RAX: 000000000000006d RBX: ffff8e589866f500 RCX: 0000000000000000 [ 417.291051] RDX: 0000000000000000 RSI: ffff8e59f7b23180 RDI: ffff8e59f7b23180 [ 417.291055] RBP: ffffb16a412c9000 R08: 0000000000000000 R09: 0000000000000003 [ 417.291059] R10: ffffb16a40807c80 R11: ffffffffaf9edce8 R12: ffff8e594427ac20 [ 417.291063] R13: ffff8e59f7b44780 R14: ffff8e58800b7a05 R15: 0000000000000000 [ 417.291074] FS: 0000000000000000(0000) GS:ffff8e59f7b00000(0000) knlGS:0000000000000000 [ 417.291079] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 417.291083] CR2: 00007fc4083efe08 CR3: 00000001c3626006 CR4: 0000000000770ee0 [ 417.291088] PKRU: 55555554 [ 417.291091] Call Trace: [ 417.291096] <TASK> [ 417.291103] nsim_bpf_destroy_prog+0x31/0x80 [netdevsim] [ 417.291154] __bpf_prog_offload_destroy+0x2a/0x80 [ 417.291163] bpf_prog_dev_bound_destroy+0x6f/0xb0 [ 417.291171] bpf_prog_free_deferred+0x18e/0x1a0 [ 417.291178] process_one_work+0x18a/0x3a0 [ 417.291188] worker_thread+0x27b/0x3a0 [ 417.291197] ? __pfx_worker_thread+0x10/0x10 [ 417.291207] kthread+0xe5/0x120 [ 417.291214] ? __pfx_kthread+0x10/0x10 [ 417.291221] ret_from_fork+0x31/0x50 [ 417.291230] ? __pfx_kthread+0x10/0x10 [ 417.291236] ret_from_fork_asm+0x1a/0x30 [ 417.291246] </TASK> Add a mutex lock, to prevent simultaneous addition and deletion operations on the list. Fixes: 31d3ad832948 ("netdevsim: add bpf offload support") Reported-by: Yinhao Hu <dddddd@hust.edu.cn> Reported-by: Kaiyan Mei <M202472210@hust.edu.cn> Signed-off-by: Yun Lu <luyun@kylinos.cn> Link: https://patch.msgid.link/20260116095308.11441-1-luyun_611@163.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2026-01-20Merge tag 'mhi-for-v6.20' of ↵Greg Kroah-Hartman
ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/mani/mhi into char-misc-next Manivannan writes: MHI Host -------- - Add support for loading dual ELF image format firmware to Qcom Trust Management Engine Lit (TME-L) supported devices like QCC2072, which require separate ELF header for SBL and WLAN firmware segments in a single firmware. - Remove the MHI auto_queue feature support. This feature was added to offload the queuing of buffers from the client drivers to the MHI stack, but it caused a lot of race over the time. So remove this feature from the QRTR client driver and also from the MHI stack/controller drivers. - Move the .probe() and .remove() callbacks from driver level to bus level. MHI Endpoint ------------ - Move the .probe() and .remove() callbacks from driver level to bus level. * tag 'mhi-for-v6.20' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/mani/mhi: bus: mhi: ep: Use bus callbacks for .probe() and .remove() bus: mhi: host: Use bus callbacks for .probe() and .remove() bus: mhi: host: Drop the auto_queue support net: qrtr: Drop the MHI auto_queue feature for IPCR DL channels mhi: host: Add support for loading dual ELF image format
2026-01-20netkit: Add xsk support for af_xdp applicationsDaniel Borkmann
Enable support for AF_XDP applications to operate on a netkit device. The goal is that AF_XDP applications can natively consume AF_XDP from network namespaces. The use-case from Cilium side is to support Kubernetes KubeVirt VMs through QEMU's AF_XDP backend. KubeVirt is a virtual machine management add-on for Kubernetes which aims to provide a common ground for virtualization. KubeVirt spawns the VMs inside Kubernetes Pods which reside in their own network namespace just like regular Pods. Raw QEMU AF_XDP backend example with eth0 being a physical device with 16 queues where netkit is bound to the last queue (for multi-queue RSS context can be used if supported by the driver): # ethtool -X eth0 start 0 equal 15 # ethtool -X eth0 start 15 equal 1 context new # ethtool --config-ntuple eth0 flow-type ether \ src 00:00:00:00:00:00 \ src-mask ff:ff:ff:ff:ff:ff \ dst $mac dst-mask 00:00:00:00:00:00 \ proto 0 proto-mask 0xffff action 15 [ ... setup BPF/XDP prog on eth0 to steer into shared xsk map ... ] # ip netns add foo # ip link add numrxqueues 2 nk type netkit single # ./pyynl/cli.py --spec ~/netlink/specs/netdev.yaml \ --do queue-create \ --json "{"ifindex": $(ifindex nk), "type": "rx", \ "lease": { "ifindex": $(ifindex eth0), \ "queue": { "type": "rx", "id": 15 } } }" {'id': 1} # ip link set nk netns foo # ip netns exec foo ip link set lo up # ip netns exec foo ip link set nk up # ip netns exec foo qemu-system-x86_64 \ -kernel $kernel \ -drive file=${image_name},index=0,media=disk,format=raw \ -append "root=/dev/sda rw console=ttyS0" \ -cpu host \ -m $memory \ -enable-kvm \ -device virtio-net-pci,netdev=net0,mac=$mac \ -netdev af-xdp,ifname=nk,id=net0,mode=native,queues=1,start-queue=1,inhibit=on,map-path=$dir/xsks_map \ -nographic We have tested the above against a dual-port Nvidia ConnectX-6 (mlx5) 100G NIC with successful network connectivity out of QEMU. An earlier iteration of this work was presented at LSF/MM/BPF [0] and more recently at LPC [1]. For getting to a first starting point to connect all things with KubeVirt, bind mounting the xsk map from Cilium into the VM launcher Pod which acts as a regular Kubernetes Pod while not perfect, is not a big problem given its out of reach from the application sitting inside the VM (and some of the control plane aspects are baked in the launcher Pod already), so the isolation barrier is still the VM. Eventually the goal is to have a XDP/XSK redirect extension where there is no need to have the xsk map, and the BPF program can just derive the target xsk through the queue where traffic was received on. The exposure through netkit is because Cilium should not act as a proxy handing out xsk sockets. Existing applications expect a netdev from kernel side and should not need to rewrite just to implement against a CNI's protocol. Also, all the memory should not be accounted against Cilium but rather the application Pod itself which is consuming AF_XDP. Further, on up/downgrades we expect the data plane to being completely decoupled from the control plane; if Cilium would own the sockets that would be disruptive. Another use-case which opens up and is regularly asked from users would be to have DPDK applications on top of AF_XDP in regular Kubernetes Pods. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Co-developed-by: David Wei <dw@davidwei.uk> Signed-off-by: David Wei <dw@davidwei.uk> Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org> Link: https://bpfconf.ebpf.io/bpfconf2025/bpfconf2025_material/lsfmmbpf_2025_netkit_borkmann.pdf [0] Link: https://lpc.events/event/19/contributions/2275/ [1] Link: https://patch.msgid.link/20260115082603.219152-13-daniel@iogearbox.net Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2026-01-20netkit: Add netkit notifier to check for unregistering devicesDaniel Borkmann
Add a netdevice notifier in netkit to watch for NETDEV_UNREGISTER events. If the target device is indeed NETREG_UNREGISTERING and previously leased a queue to a netkit device, then collect the related netkit devices and batch-unregister_netdevice_many() them. If this would not be done, then the netkit device would hold a reference on the physical device preventing it from going away. However, in case of both io_uring zero-copy as well as AF_XDP this situation is handled gracefully and the allocated resources are torn down. In the case where mentioned infra is used through netkit, the applications have a reference on netkit, and netkit in turn holds a reference on the physical device. In order to have netkit release the reference on the physical device, we need such watcher to then unregister the netkit ones. This is generally quite similar to the dependency handling in case of tunnels (e.g. vxlan bound to a underlying netdev) where the tunnel device gets removed along with the physical device. # ip a [...] 4: enp10s0f0np0: <BROADCAST,MULTICAST> mtu 1500 qdisc mq state DOWN group default qlen 1000 link/ether e8:eb:d3:a3:43:f6 brd ff:ff:ff:ff:ff:ff inet 10.0.0.2/24 scope global enp10s0f0np0 valid_lft forever preferred_lft forever [...] 8: nk@NONE: <BROADCAST,MULTICAST,NOARP> mtu 1500 qdisc noop state DOWN group default qlen 1000 link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff [...] # rmmod mlx5_ib # rmmod mlx5_core [ 309.261822] mlx5_core 0000:0a:00.0 mlx5_0: Port: 1 Link DOWN [ 344.235236] mlx5_core 0000:0a:00.1: E-Switch: Unload vfs: mode(LEGACY), nvfs(0), necvfs(0), active vports(0) [ 344.246948] mlx5_core 0000:0a:00.1: E-Switch: Disable: mode(LEGACY), nvfs(0), necvfs(0), active vports(0) [ 344.463754] mlx5_core 0000:0a:00.1: E-Switch: Disable: mode(LEGACY), nvfs(0), necvfs(0), active vports(0) [ 344.770155] mlx5_core 0000:0a:00.1: E-Switch: cleanup [ 345.345709] mlx5_core 0000:0a:00.0: E-Switch: Unload vfs: mode(LEGACY), nvfs(0), necvfs(0), active vports(0) [ 345.357524] mlx5_core 0000:0a:00.0: E-Switch: Disable: mode(LEGACY), nvfs(0), necvfs(0), active vports(0) [ 350.995989] mlx5_core 0000:0a:00.0: E-Switch: Disable: mode(LEGACY), nvfs(0), necvfs(0), active vports(0) [ 351.574396] mlx5_core 0000:0a:00.0: E-Switch: cleanup # ip a [...] [ both enp10s0f0np0 and nk gone ] [...] Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Co-developed-by: David Wei <dw@davidwei.uk> Signed-off-by: David Wei <dw@davidwei.uk> Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org> Link: https://patch.msgid.link/20260115082603.219152-12-daniel@iogearbox.net Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2026-01-20netkit: Implement rtnl_link_ops->alloc and ndo_queue_createDavid Wei
Implement rtnl_link_ops->alloc that allows the number of rx queues to be set when netkit is created. By default, netkit has only a single rxq (and single txq). The number of queues is deliberately not allowed to be changed via ethtool -L and is fixed for the lifetime of a netkit instance. For netkit device creation, numrxqueues with larger than one rxq can be specified. These rxqs are leasable to real rxqs in physical netdevs: ip link add type netkit peer numrxqueues 64 # for device pair ip link add numrxqueues 64 type netkit single # for single device The limit of numrxqueues for netkit is currently set to 1024, which allows leasing multiple real rxqs from physical netdevs. The implementation of ndo_queue_create() adds a new rxq during the queue lease operation. We allow to create queues either in single device mode or for the case of dual device mode for the netkit peer device which gets placed into the target network namespace. For dual device mode the lease against the primary device does not make sense for the targeted use cases, and therefore gets rejected. We also need to add a lockdep class for netkit, such that lockdep does not trip over us, similarly done as in commit 0bef512012b1 ("net: add netdev_lockdep_set_classes() to virtual drivers"). This is also the last missing bit to netkit for supporting io_uring with zero-copy mode [0]. Up until this point it was not possible to consume the latter out of containers or Kubernetes Pods where applications are in their own network namespace. io_uring example with eth0 being a physical device with 16 queues where netkit is bound to the last queue, iou-zcrx.c is binary from selftests. Flow steering to that queue is based on the service VIP:port of the server utilizing io_uring: # ethtool -X eth0 start 0 equal 15 # ethtool -X eth0 start 15 equal 1 context new # ethtool --config-ntuple eth0 flow-type tcp4 dst-ip 1.2.3.4 dst-port 5000 action 15 # ip netns add foo # ip link add type netkit peer numrxqueues 2 # ./pyynl/cli.py --spec ~/netlink/specs/netdev.yaml \ --do queue-create \ --json "{"ifindex": $(ifindex nk0), "type": "rx", \ "lease": { "ifindex": $(ifindex eth0), \ "queue": { "type": "rx", "id": 15 } } }" {'id': 1} # ip link set nk0 netns foo # ip link set nk1 up # ip netns exec foo ip link set lo up # ip netns exec foo ip link set nk0 up # ip netns exec foo ip addr add 1.2.3.4/32 dev nk0 [ ... setup routing etc to get external traffic into the netns ... ] # ip netns exec foo ./iou-zcrx -s -p 5000 -i nk0 -q 1 Remote io_uring client: # ./iou-zcrx -c -h 1.2.3.4 -p 5000 -l 12840 -z 65536 We have tested the above against a Broadcom BCM957504 (bnxt_en) 100G NIC, supporting TCP header/data split. Similarly, this also works for devmem which we tested using ncdevmem: # ip netns exec foo ./ncdevmem -s 1.2.3.4 -l -p 5000 -f nk0 -t 1 -q 1 And on the remote client: # ./ncdevmem -s 1.2.3.4 -p 5000 -f eth0 For Cilium, the plan is to open up support for the various memory providers for regular Kubernetes Pods when Cilium is configured with netkit datapath mode. Signed-off-by: David Wei <dw@davidwei.uk> Co-developed-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org> Link: https://kernel-recipes.org/en/2024/schedule/efficient-zero-copy-networking-using-io_uring [0] Link: https://patch.msgid.link/20260115082603.219152-11-daniel@iogearbox.net Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2026-01-20netkit: Add single device mode for netkitDaniel Borkmann
Add a single device mode for netkit instead of netkit pairs. The primary target for the paired devices is to connect network namespaces, of course, and support has been implemented in projects like Cilium [0]. For the rxq leasing the plan is to support two main scenarios related to single device mode: * For the use-case of io_uring zero-copy, the control plane can either set up a netkit pair where the peer device can perform rxq leasing which is then tied to the lifetime of the peer device, or the control plane can use a regular netkit pair to connect the hostns to a Pod/container and dynamically add/remove rxq leasing through a single device without having to interrupt the device pair. In the case of io_uring, the memory pool is used as skb non-linear pages, and thus the skb will go its way through the regular stack into netkit. Things like the netkit policy when no BPF is attached or skb scrubbing etc apply as-is in case the paired devices are used, or if the backend memory is tied to the single device and traffic goes through a paired device. * For the use-case of AF_XDP, the control plane needs to use netkit in the single device mode. The single device mode currently enforces only a pass policy when no BPF is attached, and does not yet support BPF link attachments for AF_XDP. skbs sent to that device get dropped at the moment. Given AF_XDP operates at a lower layer of the stack tying this to the netkit pair did not make sense. In future, the plan is to allow BPF at the XDP layer which can: i) process traffic coming from the AF_XDP application (e.g. QEMU with AF_XDP backend) to filter egress traffic or to push selected egress traffic up to the single netkit device to the local stack (e.g. DHCP requests), and ii) vice-versa skbs sent to the single netkit into the AF_XDP application (e.g. DHCP replies). Also, the control-plane can dynamically manage rxq leasing for the single netkit device without having to interrupt (e.g. down/up cycle) the main netkit pair for the Pod which has traffic going in and out. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Co-developed-by: David Wei <dw@davidwei.uk> Signed-off-by: David Wei <dw@davidwei.uk> Reviewed-by: Jordan Rife <jordan@jrife.io> Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org> Link: https://docs.cilium.io/en/stable/operations/performance/tuning/#netkit-device-mode [0] Link: https://patch.msgid.link/20260115082603.219152-10-daniel@iogearbox.net Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2026-01-20hinic3: Add HW event handlerFan Gong
Add HINIC3_INIT_UP flags to trace netdev open status. Add port module event handler. Add link status event type(FAULT, PCIE link down, heart lost, mgmt watchdog). Co-developed-by: Zhu Yikai <zhuyikai1@h-partners.com> Signed-off-by: Zhu Yikai <zhuyikai1@h-partners.com> Signed-off-by: Fan Gong <gongfan1@huawei.com> Link: https://patch.msgid.link/53a2b928136998f740d597bbd45ca1740b95538f.1768375903.git.zhuyikai1@h-partners.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2026-01-20hinic3: Add mac filter opsFan Gong
Add ops to support unicast and multicast to filter mac address in packets sending and receiving. Co-developed-by: Zhu Yikai <zhuyikai1@h-partners.com> Signed-off-by: Zhu Yikai <zhuyikai1@h-partners.com> Signed-off-by: Fan Gong <gongfan1@huawei.com> Link: https://patch.msgid.link/9ea618ad5d6c404e222e9114e08e913a73177705.1768375903.git.zhuyikai1@h-partners.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2026-01-20hinic3: Add adaptive IRQ coalescing with DIMFan Gong
DIM offers a way to adjust the coalescing settings based on load. As hinic3 rx and tx share interrupts, we only need to base dim on rx stats. Co-developed-by: Zhu Yikai <zhuyikai1@h-partners.com> Signed-off-by: Zhu Yikai <zhuyikai1@h-partners.com> Signed-off-by: Fan Gong <gongfan1@huawei.com> Link: https://patch.msgid.link/af96c20a836800a5972a09cdaf520029d976ad48.1768375903.git.zhuyikai1@h-partners.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2026-01-20hinic3: Add .ndo_vlan_rx_add/kill_vid and .ndo_validate_addrFan Gong
Implement following callback function: .ndo_vlan_rx_add_vid .ndo_vlan_rx_kill_vid .ndo_validate_addr Co-developed-by: Zhu Yikai <zhuyikai1@h-partners.com> Signed-off-by: Zhu Yikai <zhuyikai1@h-partners.com> Signed-off-by: Fan Gong <gongfan1@huawei.com> Link: https://patch.msgid.link/70be187afcaa7b38981d114c088ffdc2cba0b4f1.1768375903.git.zhuyikai1@h-partners.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2026-01-20hinic3: Add .ndo_features_checkFan Gong
As we cannot solve packets with multiple stacked vlan, so we use .ndo_features_check to check for these packets and return a smaller feature without offload features. Co-developed-by: Zhu Yikai <zhuyikai1@h-partners.com> Signed-off-by: Zhu Yikai <zhuyikai1@h-partners.com> Signed-off-by: Fan Gong <gongfan1@huawei.com> Link: https://patch.msgid.link/3879b20b7ffa20106a3f8f56dbf2d5eb389f260a.1768375903.git.zhuyikai1@h-partners.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2026-01-20hinic3: Add .ndo_set_features and .ndo_fix_featuresFan Gong
Implement following callback function: .ndo_set_features .ndo_fix_features The .ndo_set_features function includes five features: rx_csum, tso, lro, rx_cvlan and vlan_filter. Add these new features in netdev_feature_init. Co-developed-by: Zhu Yikai <zhuyikai1@h-partners.com> Signed-off-by: Zhu Yikai <zhuyikai1@h-partners.com> Signed-off-by: Fan Gong <gongfan1@huawei.com> Link: https://patch.msgid.link/682734a08fde421413048bf70057dafe3cbe8497.1768375903.git.zhuyikai1@h-partners.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2026-01-20hinic3: Add .ndo_tx_timeout and .ndo_get_stats64Fan Gong
Implement following callback function: .ndo_tx_timeout .ndo_get_stats64 Use a work queue to trace tx_timeout callback and dump necessary debug information. Co-developed-by: Zhu Yikai <zhuyikai1@h-partners.com> Signed-off-by: Zhu Yikai <zhuyikai1@h-partners.com> Signed-off-by: Fan Gong <gongfan1@huawei.com> Link: https://patch.msgid.link/ec34d2ff9b142e1e142e47700714533baf7e659c.1768375903.git.zhuyikai1@h-partners.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2026-01-20hinic3: Add PF management interfacesFan Gong
Add management and communication pathways between PF and HW. Co-developed-by: Zhu Yikai <zhuyikai1@h-partners.com> Signed-off-by: Zhu Yikai <zhuyikai1@h-partners.com> Signed-off-by: Fan Gong <gongfan1@huawei.com> Link: https://patch.msgid.link/447e72b1c5f255ea5d79ecf96c8dac58011e604d.1768375903.git.zhuyikai1@h-partners.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2026-01-20hinic3: Add PF frameworkFan Gong
Add support for PF framework based on the VF code. Co-developed-by: Zhu Yikai <zhuyikai1@h-partners.com> Signed-off-by: Zhu Yikai <zhuyikai1@h-partners.com> Signed-off-by: Fan Gong <gongfan1@huawei.com> Link: https://patch.msgid.link/4796d6076bbad0f096eb10261ab3c7d989c5f7b7.1768375903.git.zhuyikai1@h-partners.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2026-01-19net/mlx5e: Conditionally create async ICOSQWilliam Tu
The async ICOSQ is only required by TLS RX (for re-sync flow) and XSK TX. Create it only when these features are enabled instead of always allocating it. This reduces per-channel memory usage, saves hardware resources, improves latency, and decreases the default number of SQs (from 3 to 2) and CQs (from 4 to 3). It also speeds up channel open/close operations for a netdev when async ICOSQ is not needed. Currently when TLS RX is enabled, there is no channel reset triggered. As a result, async ICOSQ allocation is not triggered, causing a NULL pointer crash. One solution is to do channel reset every time when toggling TLS RX. However, it's not straightforward as the offload state matters only on connection creation, and can go on beyond the channels reset. Instead, introduce a new field 'ktls_rx_was_enabled': if TLS RX is enabled for the first time: reset channels, create async ICOSQ, set the field. From that point on, no need to reset channels for any TLS RX enable/disable. Async ICOSQ will always be needed. For XSK TX, async ICOSQ is used in wakeup control and is guaranteed to have async ICOSQ allocated. This improves the latency of interface up/down operations when it applies. Perf numbers: NIC: Connect-X7. Test: Latency of interface up + down operations. Measured 20% speedup. Saving ~0.36 sec for 248 channels (~1.45 msec per channel). Signed-off-by: William Tu <witu@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/1768376800-1607672-5-git-send-email-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-01-19net/mlx5e: Move async ICOSQ to dynamic allocationWilliam Tu
Dynamically allocate async ICOSQ. ICO (Internal Communication Operations) is for driver to communicate with the HW, and it's not used for traffic. Currently mlx5 driver has sync and async ICO send queues. The async ICOSQ means that it's not necessarily under NAPI context protection. The patch is in preparation for the later patch to detect its usage and enable it when necessary. Signed-off-by: William Tu <witu@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/1768376800-1607672-4-git-send-email-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-01-19net/mlx5e: Use regular ICOSQ for triggering NAPIWilliam Tu
Before the cited commit, ICOSQ is used to post NOP WQE to trigger hardware interrupt and start NAPI, but this mechanism suffers from a race condition: mlx5e_alloc_rx_mpwqe may post UMR WQEs to ICOSQ _before_ NOP WQE is posted. The cited commit fixes the issue by replacing ICOSQ with async ICOSQ, as a new way to post the NOP WQE to trigger the hardware interrupt and NAPI. The patch changes it back by replacing async ICOSQ with regular ICOSQ, for the purpose of saving memory in later patches, and solves the issue by adding a new SQ state, MLX5E_SQ_STATE_LOCK_NEEDED for syncing the start of NAPI. What it does: - Switch trigger path from async ICOSQ to regular ICOSQ to reduce need for async SQ. - Introduce MLX5E_SQ_STATE_LOCK_NEEDED and mlx5e_icosq_sync_lock(), unlock() to prevent the race where UMR WQEs could be posted before the NOP WQE used to trigger NAPI. - Use synchronize_net() once per trigger cycle to quiesce in-flight softirqs before serializing the NOP WQE and any UMR postings via the ICOSQ lock. - Wrap ICOSQ UMR posting in en_rx.c and xsk/rx.c with the new conditional lock. The conditional locking approach is critical for performance: always locking would impose unnecessary overhead. Synchronization is not needed between regular NAPI cycles once the channel is activated and running. The lock is only required to protect against the race during channel activation—specifically, when the very first NOP WQE is posted to trigger NAPI. After that initial trigger, normal NAPI polling handles subsequent work without contention. The MLX5E_SQ_STATE_LOCK_NEEDED flag ensures we pay the synchronization cost only when necessary. Signed-off-by: William Tu <witu@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/1768376800-1607672-3-git-send-email-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-01-19net/mlx5e: Move async ICOSQ lock into ICOSQ structWilliam Tu
Move the async_icosq spinlock from the mlx5e_channel structure into the mlx5e_icosq structure itself for better encapsulation and for later patch to also use it for other icosq use cases. Changes: - Add spinlock_t lock field to struct mlx5e_icosq - Remove async_icosq_lock field from struct mlx5e_channel - Initialize the new lock in mlx5e_open_icosq() - Update all lock usage in ktls_rx.c and en_main.c to use sq->lock instead of c->async_icosq_lock Signed-off-by: William Tu <witu@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/1768376800-1607672-2-git-send-email-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-01-19octeon_ep: reset firmware ready statusVimlesh Kumar
Add support to reset firmware ready status when the driver is removed(either in unload or unbind) Signed-off-by: Sathesh Edara <sedara@marvell.com> Signed-off-by: Shinas Rasheed <srasheed@marvell.com> Signed-off-by: Vimlesh Kumar <vimleshk@marvell.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20260115092048.870237-1-vimleshk@marvell.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-01-19net: thunderbolt: Allow reading link settingsIan MacDonald
In order to use Thunderbolt networking as part of bonding device it needs to support ->get_link_ksettings() ethtool operation, so that the bonding driver can read the link speed and the related attributes. Add support for this to the driver. Signed-off-by: Ian MacDonald <ian@netstatz.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com> Link: https://patch.msgid.link/20260115115646.328898-5-mika.westerberg@linux.intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>