summaryrefslogtreecommitdiff
path: root/drivers/net/ethernet/intel
AgeCommit message (Collapse)Author
2026-03-19ice: fix retry for AQ command 0x06EEJakub Staniszewski
commit fb4903b3354aed4a2301180cf991226f896c87ed upstream. Executing ethtool -m can fail reporting a netlink I/O error while firmware link management holds the i2c bus used to communicate with the module. According to Intel(R) Ethernet Controller E810 Datasheet Rev 2.8 [1] Section 3.3.10.4 Read/Write SFF EEPROM (0x06EE) request should to be retried upon receiving EBUSY from firmware. Commit e9c9692c8a81 ("ice: Reimplement module reads used by ethtool") implemented it only for part of ice_get_module_eeprom(), leaving all other calls to ice_aq_sff_eeprom() vulnerable to returning early on getting EBUSY without retrying. Remove the retry loop from ice_get_module_eeprom() and add Admin Queue (AQ) command with opcode 0x06EE to the list of commands that should be retried on receiving EBUSY from firmware. Cc: stable@vger.kernel.org Fixes: e9c9692c8a81 ("ice: Reimplement module reads used by ethtool") Signed-off-by: Jakub Staniszewski <jakub.staniszewski@linux.intel.com> Co-developed-by: Dawid Osuchowski <dawid.osuchowski@linux.intel.com> Signed-off-by: Dawid Osuchowski <dawid.osuchowski@linux.intel.com> Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Link: https://www.intel.com/content/www/us/en/content-details/613875/intel-ethernet-controller-e810-datasheet.html [1] Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de> Tested-by: Rinitha S <sx.rinitha@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2026-03-19ixgbevf: fix link setup issueJedrzej Jagielski
commit feae40a6a178bb525a15f19288016e5778102a99 upstream. It may happen that VF spawned for E610 adapter has problem with setting link up. This happens when ixgbevf supporting mailbox API 1.6 cooperates with PF driver which doesn't support this version of API, and hence doesn't support new approach for getting PF link data. In that case VF asks PF to provide link data but as PF doesn't support it, returns -EOPNOTSUPP what leads to early bail from link configuration sequence. Avoid such situation by using legacy VFLINKS approach whenever negotiated API version is less than 1.6. To reproduce the issue just create VF and set its link up - adapter must be any from the E610 family, ixgbevf must support API 1.6 or higher while ixgbevf must not. Fixes: 53f0eb62b4d2 ("ixgbevf: fix getting link speed data for E610 devices") Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Reviewed-by: Piotr Kwapulinski <piotr.kwapulinski@intel.com> Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de> Cc: stable@vger.kernel.org Signed-off-by: Jedrzej Jagielski <jedrzej.jagielski@intel.com> Tested-by: Rafal Romanowski <rafal.romanowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2026-03-19ice: reintroduce retry mechanism for indirect AQJakub Staniszewski
commit 326256c0a72d4877cec1d4df85357da106233128 upstream. Add retry mechanism for indirect Admin Queue (AQ) commands. To do so we need to keep the command buffer. This technically reverts commit 43a630e37e25 ("ice: remove unused buffer copy code in ice_sq_send_cmd_retry()"), but combines it with a fix in the logic by using a kmemdup() call, making it more robust and less likely to break in the future due to programmer error. Cc: Michal Schmidt <mschmidt@redhat.com> Cc: stable@vger.kernel.org Fixes: 3056df93f7a8 ("ice: Re-send some AQ commands, as result of EBUSY AQ error") Signed-off-by: Jakub Staniszewski <jakub.staniszewski@linux.intel.com> Co-developed-by: Dawid Osuchowski <dawid.osuchowski@linux.intel.com> Signed-off-by: Dawid Osuchowski <dawid.osuchowski@linux.intel.com> Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de> Tested-by: Rinitha S <sx.rinitha@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2026-03-19e1000/e1000e: Fix leak in DMA error cleanupMatt Vollrath
[ Upstream commit e94eaef11142b01f77bf8ba4d0b59720b7858109 ] If an error is encountered while mapping TX buffers, the driver should unmap any buffers already mapped for that skb. Because count is incremented after a successful mapping, it will always match the correct number of unmappings needed when dma_error is reached. Decrementing count before the while loop in dma_error causes an off-by-one error. If any mapping was successful before an unsuccessful mapping, exactly one DMA mapping would leak. In these commits, a faulty while condition caused an infinite loop in dma_error: Commit 03b1320dfcee ("e1000e: remove use of skb_dma_map from e1000e driver") Commit 602c0554d7b0 ("e1000: remove use of skb_dma_map from e1000 driver") Commit c1fa347f20f1 ("e1000/e1000e/igb/igbvf/ixgb/ixgbe: Fix tests of unsigned in *_tx_map()") fixed the infinite loop, but introduced the off-by-one error. This issue may still exist in the igbvf driver, but I did not address it in this patch. Fixes: c1fa347f20f1 ("e1000/e1000e/igb/igbvf/ixgb/ixgbe: Fix tests of unsigned in *_tx_map()") Assisted-by: Claude:claude-4.6-opus Signed-off-by: Matt Vollrath <tactii@gmail.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-19i40e: fix src IP mask checks and memcpy argument names in cloud filterAlok Tiwari
[ Upstream commit e809085f492842ce7a519c9ef72d40f4bca89c13 ] Fix following issues in the IPv4 and IPv6 cloud filter handling logic in both the add and delete paths: - The source-IP mask check incorrectly compares mask.src_ip[0] against tcf.dst_ip[0]. Update it to compare against tcf.src_ip[0]. This likely goes unnoticed because the check is in an "else if" path that only executes when dst_ip is not set, most cloud filter use cases focus on destination-IP matching, and the buggy condition can accidentally evaluate true in some cases. - memcpy() for the IPv4 source address incorrectly uses ARRAY_SIZE(tcf.dst_ip) instead of ARRAY_SIZE(tcf.src_ip), although both arrays are the same size. - The IPv4 memcpy operations used ARRAY_SIZE(tcf.dst_ip) and ARRAY_SIZE (tcf.src_ip), Update these to use sizeof(cfilter->ip.v4.dst_ip) and sizeof(cfilter->ip.v4.src_ip) to ensure correct and explicit copy size. - In the IPv6 delete path, memcmp() uses sizeof(src_ip6) when comparing dst_ip6 fields. Replace this with sizeof(dst_ip6) to make the intent explicit, even though both fields are struct in6_addr. Fixes: e284fc280473 ("i40e: Add and delete cloud filter") Signed-off-by: Alok Tiwari <alok.a.tiwari@oracle.com> Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-19iavf: fix incorrect reset handling in callbacksPetr Oros
[ Upstream commit fdadbf6e84c44df8dbb85cfdd38bc10e4431501d ] Three driver callbacks schedule a reset and wait for its completion: ndo_change_mtu(), ethtool set_ringparam(), and ethtool set_channels(). Waiting for reset in ndo_change_mtu() and set_ringparam() was added by commit c2ed2403f12c ("iavf: Wait for reset in callbacks which trigger it") to fix a race condition where adding an interface to bonding immediately after MTU or ring parameter change failed because the interface was still in __RESETTING state. The same commit also added waiting in iavf_set_priv_flags(), which was later removed by commit 53844673d555 ("iavf: kill "legacy-rx" for good"). Waiting in set_channels() was introduced earlier by commit 4e5e6b5d9d13 ("iavf: Fix return of set the new channel count") to ensure the PF has enough time to complete the VF reset when changing channel count, and to return correct error codes to userspace. Commit ef490bbb2267 ("iavf: Add net_shaper_ops support") added net_shaper_ops to iavf, which required reset_task to use _locked NAPI variants (napi_enable_locked, napi_disable_locked) that need the netdev instance lock. Later, commit 7e4d784f5810 ("net: hold netdev instance lock during rtnetlink operations") and commit 2bcf4772e45a ("net: ethtool: try to protect all callback with netdev instance lock") started holding the netdev instance lock during ndo and ethtool callbacks for drivers with net_shaper_ops. Finally, commit 120f28a6f314 ("iavf: get rid of the crit lock") replaced the driver's crit_lock with netdev_lock in reset_task, causing incorrect behavior: the callback holds netdev_lock and waits for reset_task, but reset_task needs the same lock: Thread 1 (callback) Thread 2 (reset_task) ------------------- --------------------- netdev_lock() [blocked on workqueue] ndo_change_mtu() or ethtool op iavf_schedule_reset() iavf_wait_for_reset() iavf_reset_task() waiting... netdev_lock() <- blocked This does not strictly deadlock because iavf_wait_for_reset() uses wait_event_interruptible_timeout() with a 5-second timeout. The wait eventually times out, the callback returns an error to userspace, and after the lock is released reset_task completes the reset. This leads to incorrect behavior: userspace sees an error even though the configuration change silently takes effect after the timeout. Fix this by extracting the reset logic from iavf_reset_task() into a new iavf_reset_step() function that expects netdev_lock to be already held. The three callbacks now call iavf_reset_step() directly instead of scheduling the work and waiting, performing the reset synchronously in the caller's context which already holds netdev_lock. This eliminates both the incorrect error reporting and the need for iavf_wait_for_reset(), which is removed along with the now-unused reset_waitqueue. The workqueue-based iavf_reset_task() becomes a thin wrapper that acquires netdev_lock and calls iavf_reset_step(), preserving its use for PF-initiated resets. The callbacks may block for several seconds while iavf_reset_step() polls hardware registers, but this is acceptable since netdev_lock is a per-device mutex and only serializes operations on the same interface. v3: - Remove netif_running() guard from iavf_set_channels(). Unlike set_ringparam where descriptor counts are picked up by iavf_open() directly, num_req_queues is only consumed during iavf_reinit_interrupt_scheme() in the reset path. Skipping the reset on a down device would silently discard the channel count change. - Remove dead reset_waitqueue code (struct field, init, and all wake_up calls) since iavf_wait_for_reset() was the only consumer. Fixes: 120f28a6f314 ("iavf: get rid of the crit lock") Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Petr Oros <poros@redhat.com> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Tested-by: Rafal Romanowski <rafal.romanowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-19iavf: fix PTP use-after-free during resetPetr Oros
[ Upstream commit efc54fb13d79117a825fef17364315a58682c7ec ] Commit 7c01dbfc8a1c5f ("iavf: periodically cache PHC time") introduced a worker to cache PHC time, but failed to stop it during reset or disable. This creates a race condition where `iavf_reset_task()` or `iavf_disable_vf()` free adapter resources (AQ) while the worker is still running. If the worker triggers `iavf_queue_ptp_cmd()` during teardown, it accesses freed memory/locks, leading to a crash. Fix this by calling `iavf_ptp_release()` before tearing down the adapter. This ensures `ptp_clock_unregister()` synchronously cancels the worker and cleans up the chardev before the backing resources are destroyed. Fixes: 7c01dbfc8a1c5f ("iavf: periodically cache PHC time") Signed-off-by: Petr Oros <poros@redhat.com> Reviewed-by: Ivan Vecera <ivecera@redhat.com> Acked-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev> Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de> Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-19drivers: net: ice: fix devlink parameters get without irdmaNikolay Aleksandrov
[ Upstream commit bd98c6204d1195973b1760fe45860863deb6200c ] If CONFIG_IRDMA isn't enabled but there are ice NICs in the system, the driver will prevent full devlink dev param show dump because its rdma get callbacks return ENODEV and stop the dump. For example: $ devlink dev param show pci/0000:82:00.0: name msix_vec_per_pf_max type generic values: cmode driverinit value 2 name msix_vec_per_pf_min type generic values: cmode driverinit value 2 kernel answers: No such device Returning EOPNOTSUPP allows the dump to continue so we can see all devices' devlink parameters. Fixes: c24a65b6a27c ("iidc/ice/irdma: Update IDC to support multiple consumers") Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com> Tested-by: Rinitha S <sx.rinitha@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-12i40e: use xdp.frame_sz as XDP RxQ info frag_sizeLarysa Zaremba
[ Upstream commit c69d22c6c46a1d792ba8af3d8d6356fdc0e6f538 ] The only user of frag_size field in XDP RxQ info is bpf_xdp_frags_increase_tail(). It clearly expects whole buffer size instead of DMA write size. Different assumptions in i40e driver configuration lead to negative tailroom. Set frag_size to the same value as frame_sz in shared pages mode, use new helper to set frag_size when AF_XDP ZC is active. Fixes: a045d2f2d03d ("i40e: set xdp_rxq_info::frag_size") Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Signed-off-by: Larysa Zaremba <larysa.zaremba@intel.com> Link: https://patch.msgid.link/20260305111253.2317394-7-larysa.zaremba@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-12i40e: fix registering XDP RxQ infoLarysa Zaremba
[ Upstream commit 8f497dc8a61429cc004720aa8e713743355d80cf ] Current way of handling XDP RxQ info in i40e has a problem, where frag_size is not updated when xsk_buff_pool is detached or when MTU is changed, this leads to growing tail always failing for multi-buffer packets. Couple XDP RxQ info registering with buffer allocations and unregistering with cleaning the ring. Fixes: a045d2f2d03d ("i40e: set xdp_rxq_info::frag_size") Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Signed-off-by: Larysa Zaremba <larysa.zaremba@intel.com> Link: https://patch.msgid.link/20260305111253.2317394-6-larysa.zaremba@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-12igb: Fix trigger of incorrect irq in igb_xsk_wakeupVivek Behera
[ Upstream commit d4c13ab36273a8c318ba06799793cc1f5d9c6fa1 ] The current implementation in the igb_xsk_wakeup expects the Rx and Tx queues to share the same irq. This would lead to triggering of incorrect irq in split irq configuration. This patch addresses this issue which could impact environments with 2 active cpu cores or when the number of queues is reduced to 2 or less cat /proc/interrupts | grep eno2 167: 0 0 0 0 IR-PCI-MSIX-0000:08:00.0 0-edge eno2 168: 0 0 0 0 IR-PCI-MSIX-0000:08:00.0 1-edge eno2-rx-0 169: 0 0 0 0 IR-PCI-MSIX-0000:08:00.0 2-edge eno2-rx-1 170: 0 0 0 0 IR-PCI-MSIX-0000:08:00.0 3-edge eno2-tx-0 171: 0 0 0 0 IR-PCI-MSIX-0000:08:00.0 4-edge eno2-tx-1 Furthermore it uses the flags input argument to trigger either rx, tx or both rx and tx irqs as specified in the ndo_xsk_wakeup api documentation Fixes: 80f6ccf9f116 ("igb: Introduce XSK data structures and helpers") Signed-off-by: Vivek Behera <vivek.behera@siemens.com> Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Suggested-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Acked-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Tested-by: Saritha Sanigani <sarithax.sanigani@intel.com> (A Contingent Worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-12iavf: fix netdev->max_mtu to respect actual hardware limitKohei Enju
[ Upstream commit b84852170153671bb0fa6737a6e48370addd8e1a ] iavf sets LIBIE_MAX_MTU as netdev->max_mtu, ignoring vf_res->max_mtu from PF [1]. This allows setting an MTU beyond the actual hardware limit, causing TX queue timeouts [2]. Set correct netdev->max_mtu using vf_res->max_mtu from the PF. Note that currently PF drivers such as ice/i40e set the frame size in vf_res->max_mtu, not MTU. Convert vf_res->max_mtu to MTU before setting netdev->max_mtu. [1] # ip -j -d link show $DEV | jq '.[0].max_mtu' 16356 [2] iavf 0000:00:05.0 enp0s5: NETDEV WATCHDOG: CPU: 1: transmit queue 0 timed out 5692 ms iavf 0000:00:05.0 enp0s5: NIC Link is Up Speed is 10 Gbps Full Duplex iavf 0000:00:05.0 enp0s5: NETDEV WATCHDOG: CPU: 6: transmit queue 3 timed out 5312 ms iavf 0000:00:05.0 enp0s5: NIC Link is Up Speed is 10 Gbps Full Duplex ... Fixes: 5fa4caff59f2 ("iavf: switch to Page Pool") Signed-off-by: Kohei Enju <kohei@enjuk.jp> Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Tested-by: Rafal Romanowski <rafal.romanowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-12libie: don't unroll if fwlog isn't supportedMichal Swiatkowski
[ Upstream commit 636cc3bd12f499c74eaf5dc9a7d5b832f1bb24ed ] The libie_fwlog_deinit() function can be called during driver unload even when firmware logging was never properly initialized. This led to call trace: [ 148.576156] Oops: Oops: 0000 [#1] SMP NOPTI [ 148.576167] CPU: 80 UID: 0 PID: 12843 Comm: rmmod Kdump: loaded Not tainted 6.17.0-rc7next-queue-3oct-01915-g06d79d51cf51 #1 PREEMPT(full) [ 148.576177] Hardware name: HPE ProLiant DL385 Gen10 Plus/ProLiant DL385 Gen10 Plus, BIOS A42 07/18/2020 [ 148.576182] RIP: 0010:__dev_printk+0x16/0x70 [ 148.576196] Code: 1f 44 00 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 0f 1f 44 00 00 41 55 41 54 49 89 d4 55 48 89 fd 53 48 85 f6 74 3c <4c> 8b 6e 50 48 89 f3 4d 85 ed 75 03 4c 8b 2e 48 89 df e8 f3 27 98 [ 148.576204] RSP: 0018:ffffd2fd7ea17a48 EFLAGS: 00010202 [ 148.576211] RAX: ffffd2fd7ea17aa0 RBX: ffff8eb288ae2000 RCX: 0000000000000000 [ 148.576217] RDX: ffffd2fd7ea17a70 RSI: 00000000000000c8 RDI: ffffffffb68d3d88 [ 148.576222] RBP: ffffffffb68d3d88 R08: 0000000000000000 R09: 0000000000000000 [ 148.576227] R10: 00000000000000c8 R11: ffff8eb2b1a49400 R12: ffffd2fd7ea17a70 [ 148.576231] R13: ffff8eb3141fb000 R14: ffffffffc1215b48 R15: ffffffffc1215bd8 [ 148.576236] FS: 00007f5666ba6740(0000) GS:ffff8eb2472b9000(0000) knlGS:0000000000000000 [ 148.576242] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 148.576247] CR2: 0000000000000118 CR3: 000000011ad17000 CR4: 0000000000350ef0 [ 148.576252] Call Trace: [ 148.576258] <TASK> [ 148.576269] _dev_warn+0x7c/0x96 [ 148.576290] libie_fwlog_deinit+0x112/0x117 [libie_fwlog] [ 148.576303] ixgbe_remove+0x63/0x290 [ixgbe] [ 148.576342] pci_device_remove+0x42/0xb0 [ 148.576354] device_release_driver_internal+0x19c/0x200 [ 148.576365] driver_detach+0x48/0x90 [ 148.576372] bus_remove_driver+0x6d/0xf0 [ 148.576383] pci_unregister_driver+0x2e/0xb0 [ 148.576393] ixgbe_exit_module+0x1c/0xd50 [ixgbe] [ 148.576430] __do_sys_delete_module.isra.0+0x1bc/0x2e0 [ 148.576446] do_syscall_64+0x7f/0x980 It can be reproduced by trying to unload ixgbe driver in recovery mode. Fix that by checking if fwlog is supported before doing unroll. Fixes: 641585bc978e ("ixgbe: fwlog support for e610") Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Tested-by: Rinitha S <sx.rinitha@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-12ice: fix adding AQ LLDP filter for VFLarysa Zaremba
[ Upstream commit eef33aa44935d001747ca97703c08dd6f9031162 ] The referenced commit came from a misunderstanding of the FW LLDP filter AQ (Admin Queue) command due to the error in the internal documentation. Contrary to the assumptions in the original commit, VFs can be added and deleted from this filter without any problems. Introduced dev_info message proved to be useful, so reverting the whole commit does not make sense. Without this fix, trusted VFs do not receive LLDP traffic, if there is an AQ LLDP filter on PF. When trusted VF attempts to add an LLDP multicast MAC address, the following message can be seen in dmesg on host: ice 0000:33:00.0: Failed to add Rx LLDP rule on VSI 20 error: -95 Revert checking VSI type when adding LLDP filter through AQ. Fixes: 4d5a1c4e6d49 ("ice: do not add LLDP-specific filter if not necessary") Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Signed-off-by: Larysa Zaremba <larysa.zaremba@intel.com> Tested-by: Rafal Romanowski <rafal.romanowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-12e1000e: clear DPG_EN after reset to avoid autonomous power-gatingVitaly Lifshits
[ Upstream commit 0942fc6d324eb9c6b16187b2aa994c0823557f06 ] Panther Lake systems introduced an autonomous power gating feature for the integrated Gigabit Ethernet in shutdown state (S5) state. As part of it, the reset value of DPG_EN bit was changed to 1. Clear this bit after performing hardware reset to avoid errors such as Tx/Rx hangs, or packet loss/corruption. Fixes: 0c9183ce61bc ("e1000e: Add support for the next LOM generation") Signed-off-by: Vitaly Lifshits <vitaly.lifshits@intel.com> Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Tested-by: Avigail Dahan <avigailx.dahan@intel.com> Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-12i40e: Fix preempt count leak in napi poll tracepointThomas Gleixner
[ Upstream commit 4b3d54a85bd37ebf2d9836f0d0de775c0ff21af9 ] Using get_cpu() in the tracepoint assignment causes an obvious preempt count leak because nothing invokes put_cpu() to undo it: softirq: huh, entered softirq 3 NET_RX with preempt_count 00000100, exited with 00000101? This clearly has seen a lot of testing in the last 3+ years... Use smp_processor_id() instead. Fixes: 6d4d584a7ea8 ("i40e: Add i40e_napi_poll tracepoint") Signed-off-by: Thomas Gleixner <tglx@kernel.org> Cc: Tony Nguyen <anthony.l.nguyen@intel.com> Cc: Przemek Kitszel <przemyslaw.kitszel@intel.com> Cc: intel-wired-lan@lists.osuosl.org Cc: netdev@vger.kernel.org Reviewed-by: Joe Damato <joe@dama.to> Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-12ice: recap the VSI and QoS info after rebuildAaron Ma
[ Upstream commit 6aa07e23dd3ccd35a0100c06fcb6b6c3b01e7965 ] Fix IRDMA hardware initialization timeout (-110) after resume by separating VSI-dependent configuration from RDMA resource allocation, ensuring VSI is rebuilt before IRDMA accesses it. After resume from suspend, IRDMA hardware initialization fails: ice: IRDMA hardware initialization FAILED init_state=4 status=-110 Separate RDMA initialization into two phases: 1. ice_init_rdma() - Allocate resources only (no VSI/QoS access, no plug) 2. ice_rdma_finalize_setup() - Assign VSI/QoS info and plug device This allows: - ice_init_rdma() to stay in ice_resume() (mirrors ice_deinit_rdma() in ice_suspend()) - VSI assignment deferred until after ice_vsi_rebuild() completes - QoS info updated after ice_dcb_rebuild() completes - Device plugged only when control queues, VSI, and DCB are all ready Fixes: bc69ad74867db ("ice: avoid IRQ collision to fix init failure on ACPI S3 resume") Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Signed-off-by: Aaron Ma <aaron.ma@canonical.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-12idpf: Fix flow rule delete failure due to invalid validationSreedevi Joshi
[ Upstream commit 2c31557336a8e4d209ed8d4513cef2c0f15e7ef4 ] When deleting a flow rule using "ethtool -N <dev> delete <location>", idpf_sideband_action_ena() incorrectly validates fsp->ring_cookie even though ethtool doesn't populate this field for delete operations. The uninitialized ring_cookie may randomly match RX_CLS_FLOW_DISC or RX_CLS_FLOW_WAKE, causing validation to fail and preventing legitimate rule deletions. Remove the unnecessary sideband action enable check and ring_cookie validation during delete operations since action validation is not required when removing existing rules. Fixes: ada3e24b84a0 ("idpf: add flow steering support") Signed-off-by: Sreedevi Joshi <sreedevi.joshi@intel.com> Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-12idpf: change IRQ naming to match netdev and ethtool queue numberingBrian Vazquez
[ Upstream commit 1500a8662d2d41d6bb03e034de45ddfe6d7d362d ] The code uses the vidx for the IRQ name but that doesn't match ethtool reporting nor netdev naming, this makes it hard to tune the device and associate queues with IRQs. Sequentially requesting irqs starting from '0' makes the output consistent. This commit changes the interrupt numbering but preserves the name format, maintaining ABI compatibility. Existing tools relying on the old numbering are already non-functional, as they lack a useful correlation to the interrupts. Before: ethtool -L eth1 tx 1 combined 3 grep . /proc/irq/*/*idpf*/../smp_affinity_list /proc/irq/67/idpf-Mailbox-0/../smp_affinity_list:0-55,112-167 /proc/irq/68/idpf-eth1-TxRx-1/../smp_affinity_list:0 /proc/irq/70/idpf-eth1-TxRx-3/../smp_affinity_list:1 /proc/irq/71/idpf-eth1-TxRx-4/../smp_affinity_list:2 /proc/irq/72/idpf-eth1-Tx-5/../smp_affinity_list:3 ethtool -S eth1 | grep -v ': 0' NIC statistics: tx_q-0_pkts: 1002 tx_q-1_pkts: 2679 tx_q-2_pkts: 1113 tx_q-3_pkts: 1192 <----- tx_q-3 vs idpf-eth1-Tx-5 rx_q-0_pkts: 1143 rx_q-1_pkts: 3172 rx_q-2_pkts: 1074 After: ethtool -L eth1 tx 1 combined 3 grep . /proc/irq/*/*idpf*/../smp_affinity_list /proc/irq/67/idpf-Mailbox-0/../smp_affinity_list:0-55,112-167 /proc/irq/68/idpf-eth1-TxRx-0/../smp_affinity_list:0 /proc/irq/70/idpf-eth1-TxRx-1/../smp_affinity_list:1 /proc/irq/71/idpf-eth1-TxRx-2/../smp_affinity_list:2 /proc/irq/72/idpf-eth1-Tx-3/../smp_affinity_list:3 ethtool -S eth1 | grep -v ': 0' NIC statistics: tx_q-0_pkts: 118 tx_q-1_pkts: 134 tx_q-2_pkts: 228 tx_q-3_pkts: 138 <--- tx_q-3 matches idpf-eth1-Tx-3 rx_q-0_pkts: 111 rx_q-1_pkts: 366 rx_q-2_pkts: 120 Fixes: d4d558718266 ("idpf: initialize interrupts and enable vport") Signed-off-by: Brian Vazquez <brianvv@google.com> Reviewed-by: Brett Creeley <brett.creeley@amd.com> Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de> Reviewed-by: Eric Dumazet <edumazet@google.com> Tested-by: Samuel Salin <Samuel.salin@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-12idpf: increment completion queue next_to_clean in sw marker wait routineLi Li
[ Upstream commit 712896ac4bce38a965a1c175f6e7804ed0381334 ] Currently, in idpf_wait_for_sw_marker_completion(), when an IDPF_TXD_COMPLT_SW_MARKER packet is found, the routine breaks out of the for loop and does not increment the next_to_clean counter. This causes the subsequent NAPI polls to run into the same IDPF_TXD_COMPLT_SW_MARKER packet again and print out the following: [ 23.261341] idpf 0000:05:00.0 eth1: Unknown TX completion type: 5 Instead, we should increment next_to_clean regardless when an IDPF_TXD_COMPLT_SW_MARKER packet is found. Tested: with the patch applied, we do not see the errors above from NAPI polls anymore. Fixes: 9d39447051a0 ("idpf: remove SW marker handling from NAPI") Signed-off-by: Li Li <boolli@google.com> Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-04net: intel: fix PCI device ID conflict between i40e and ipw2200Ethan Nelson-Moore
[ Upstream commit d03e094473ecdeb68d853752ba467abe13e1de44 ] The ID 8086:104f is matched by both i40e and ipw2200. The same device ID should not be in more than one driver, because in that case, which driver is used is unpredictable. Fix this by taking advantage of the fact that i40e devices use PCI_CLASS_NETWORK_ETHERNET and ipw2200 devices use PCI_CLASS_NETWORK_OTHER to differentiate the devices. Fixes: 2e45d3f4677a ("i40e: Add support for X710 B/P & SFP+ cards") Cc: stable@vger.kernel.org Acked-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: Ethan Nelson-Moore <enelsonmoore@gmail.com> Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Link: https://patch.msgid.link/20260210021235.16315-1-enelsonmoore@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-02-11i40e: drop udp_tunnel_get_rx_info() call from i40e_open()Mohammad Heib
[ Upstream commit 40857194956dcaf3d2b66d6bd113d844c93bef54 ] The i40e driver calls udp_tunnel_get_rx_info() during i40e_open(). This is redundant because UDP tunnel RX offload state is preserved across device down/up cycles. The udp_tunnel core handles synchronization automatically when required. Furthermore, recent changes in the udp_tunnel infrastructure require querying RX info while holding the udp_tunnel lock. Calling it directly from the ndo_open path violates this requirement, triggering the following lockdep warning: Call Trace: <TASK> ? __udp_tunnel_nic_assert_locked+0x39/0x40 [udp_tunnel] i40e_open+0x135/0x14f [i40e] __dev_open+0x121/0x2e0 __dev_change_flags+0x227/0x270 dev_change_flags+0x3d/0xb0 devinet_ioctl+0x56f/0x860 sock_do_ioctl+0x7b/0x130 __x64_sys_ioctl+0x91/0xd0 do_syscall_64+0x90/0x170 ... </TASK> Remove the redundant and unsafe call to udp_tunnel_get_rx_info() from i40e_open() resolve the locking violation. Fixes: 1ead7501094c ("udp_tunnel: remove rtnl_lock dependency") Signed-off-by: Mohammad Heib <mheib@redhat.com> Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de> Tested-by: Rinitha S <sx.rinitha@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-02-11ice: drop udp_tunnel_get_rx_info() call from ndo_open()Mohammad Heib
[ Upstream commit 234e615bfece9e3e91c50fe49ab9e68ee37c791a ] The ice driver calls udp_tunnel_get_rx_info() during ice_open_internal(). This is redundant because UDP tunnel RX offload state is preserved across device down/up cycles. The udp_tunnel core handles synchronization automatically when required. Furthermore, recent changes in the udp_tunnel infrastructure require querying RX info while holding the udp_tunnel lock. Calling it directly from the ndo_open path violates this requirement, triggering the following lockdep warning: Call Trace: <TASK> ice_open_internal+0x253/0x350 [ice] __udp_tunnel_nic_assert_locked+0x86/0xb0 [udp_tunnel] __dev_open+0x2f5/0x880 __dev_change_flags+0x44c/0x660 netif_change_flags+0x80/0x160 devinet_ioctl+0xd21/0x15f0 inet_ioctl+0x311/0x350 sock_ioctl+0x114/0x220 __x64_sys_ioctl+0x131/0x1a0 ... </TASK> Remove the redundant and unsafe call to udp_tunnel_get_rx_info() from ice_open_internal() to resolve the locking violation Fixes: 1ead7501094c ("udp_tunnel: remove rtnl_lock dependency") Signed-off-by: Mohammad Heib <mheib@redhat.com> Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Tested-by: Rinitha S <sx.rinitha@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-02-11ice: Fix PTP NULL pointer dereference during VSI rebuildAaron Ma
[ Upstream commit fc6f36eaaedcf4b81af6fe1a568f018ffd530660 ] Fix race condition where PTP periodic work runs while VSI is being rebuilt, accessing NULL vsi->rx_rings. The sequence was: 1. ice_ptp_prepare_for_reset() cancels PTP work 2. ice_ptp_rebuild() immediately queues PTP work 3. VSI rebuild happens AFTER ice_ptp_rebuild() 4. PTP work runs and accesses NULL vsi->rx_rings Fix: Keep PTP work cancelled during rebuild, only queue it after VSI rebuild completes in ice_rebuild(). Added ice_ptp_queue_work() helper function to encapsulate the logic for queuing PTP work, ensuring it's only queued when PTP is supported and the state is ICE_PTP_READY. Error log: [ 121.392544] ice 0000:60:00.1: PTP reset successful [ 121.392692] BUG: kernel NULL pointer dereference, address: 0000000000000000 [ 121.392712] #PF: supervisor read access in kernel mode [ 121.392720] #PF: error_code(0x0000) - not-present page [ 121.392727] PGD 0 [ 121.392734] Oops: Oops: 0000 [#1] SMP NOPTI [ 121.392746] CPU: 8 UID: 0 PID: 1005 Comm: ice-ptp-0000:60 Tainted: G S 6.19.0-rc6+ #4 PREEMPT(voluntary) [ 121.392761] Tainted: [S]=CPU_OUT_OF_SPEC [ 121.392773] RIP: 0010:ice_ptp_update_cached_phctime+0xbf/0x150 [ice] [ 121.393042] Call Trace: [ 121.393047] <TASK> [ 121.393055] ice_ptp_periodic_work+0x69/0x180 [ice] [ 121.393202] kthread_worker_fn+0xa2/0x260 [ 121.393216] ? __pfx_ice_ptp_periodic_work+0x10/0x10 [ice] [ 121.393359] ? __pfx_kthread_worker_fn+0x10/0x10 [ 121.393371] kthread+0x10d/0x230 [ 121.393382] ? __pfx_kthread+0x10/0x10 [ 121.393393] ret_from_fork+0x273/0x2b0 [ 121.393407] ? __pfx_kthread+0x10/0x10 [ 121.393417] ret_from_fork_asm+0x1a/0x30 [ 121.393432] </TASK> Fixes: 803bef817807d ("ice: factor out ice_ptp_rebuild_owner()") Signed-off-by: Aaron Ma <aaron.ma@canonical.com> Tested-by: Sunitha Mekala <sunithax.d.mekala@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-02-11ice: PTP: fix missing timestamps on E825 hardwareJacob Keller
[ Upstream commit 88b68f35eb43ad5ac77ac1107059040b04e6f477 ] The E825 hardware currently has each PF handle the PFINT_TSYN_TX cause of the miscellaneous OICR interrupt vector. The actual interrupt cause underlying this is shared by all ports on the same quad: ┌─────────────────────────────────┐ │ │ │ ┌────┐ ┌────┐ ┌────┐ ┌────┐ │ │ │PF 0│ │PF 1│ │PF 2│ │PF 3│ │ │ └────┘ └────┘ └────┘ └────┘ │ │ │ └────────────────▲────────────────┘ │ │ ┌────────────────┼────────────────┐ │ PHY QUAD │ └───▲────────▲────────▲────────▲──┘ │ │ │ │ ┌───┼──┐ ┌───┴──┐ ┌───┼──┐ ┌───┼──┐ │Port 0│ │Port 1│ │Port 2│ │Port 3│ └──────┘ └──────┘ └──────┘ └──────┘ If multiple PFs issue Tx timestamp requests near simultaneously, it is possible that the correct PF will not be interrupted and will miss its timestamp. Understanding why is somewhat complex. Consider the following sequence of events: CPU 0: Send Tx packet on PF 0 ... PF 0 enqueues packet with Tx request CPU 1, PF1: ... Send Tx packet on PF1 ... PF 1 enqueues packet with Tx request HW: PHY Port 0 sends packet PHY raises Tx timestamp event interrupt MAC raises each PF interrupt CPU 0, PF0: CPU 1, PF1: ice_misc_intr() checks for Tx timestamps ice_misc_intr() checks for Tx timestamp Sees packet ready bit set Sees nothing available ... Exits ... ... HW: PHY port 1 sends packet PHY interrupt ignored because not all packet timestamps read yet. ... Read timestamp, report to stack Because the interrupt event is shared for all ports on the same quad, the PHY will not raise a new interrupt for any PF until all timestamps are read. In the example above, the second timestamp comes in for port 1 before the timestamp from port 0 is read. At this point, there is no longer an interrupt thread running that will read the timestamps, because each PF has checked and found that there was no work to do. Applications such as ptp4l will timeout after waiting a few milliseconds. Eventually, the watchdog service task will re-check for all quads and notice that there are outstanding timestamps, and issue a software interrupt to recover. However, by this point it is far too late, and applications have already failed. All of this occurs because of the underlying hardware behavior. The PHY cannot raise a new interrupt signal until all outstanding timestamps have been read. As a first step to fix this, switch the E825C hardware to the ICE_PTP_TX_INTERRUPT_ALL mode. In this mode, only the clock owner PF will respond to the PFINT_TSYN_TX cause. Other PFs disable this cause and will not wake. In this mode, the clock owner will iterate over all ports and handle timestamps for each connected port. This matches the E822 behavior, and is a necessary but insufficient step to resolve the missing timestamps. Even with use of the ICE_PTP_TX_INTERRUPT_ALL mode, we still sometimes miss a timestamp event. The ice_ptp_tx_tstamp_owner() does re-check the ready bitmap, but does so before re-enabling the OICR interrupt vector. It also only checks the ready bitmap, but not the software Tx timestamp tracker. To avoid risk of losing a timestamp, refactor the logic to check both the software Tx timestamp tracker bitmap *and* the hardware ready bitmap. Additionally, do this outside of ice_ptp_process_ts() after we have already re-enabled the OICR interrupt. Remove the checks from the ice_ptp_tx_tstamp(), ice_ptp_tx_tstamp_owner(), and the ice_ptp_process_ts() functions. This results in ice_ptp_tx_tstamp() being nothing more than a wrapper around ice_ptp_process_tx_tstamp() so we can remove it. Add the ice_ptp_tx_tstamps_pending() function which returns a boolean indicating if there are any pending Tx timestamps. First, check the software timestamp tracker bitmap. In ICE_PTP_TX_INTERRUPT_ALL mode, check *all* ports software trackers. If a tracker has outstanding timestamp requests, return true. Additionally, check the PHY ready bitmap to confirm if the PHY indicates any outstanding timestamps. In the ice_misc_thread_fn(), call ice_ptp_tx_tstamps_pending() just before returning from the IRQ thread handler. If it returns true, write to PFINT_OICR to trigger a PFINT_OICR_TSYN_TX_M software interrupt. This will force the handler to interrupt again and complete the work even if the PHY hardware did not interrupt for any reason. This results in the following new flow for handling Tx timestamps: 1) send Tx packet 2) PHY captures timestamp 3) PHY triggers MAC interrupt 4) clock owner executes ice_misc_intr() with PFINT_OICR_TSYN_TX flag set 5) ice_ptp_ts_irq() returns IRQ_WAKE_THREAD 7) The interrupt thread wakes up and kernel calls ice_misc_intr_thread_fn() 8) ice_ptp_process_ts() is called to handle any outstanding timestamps 9) ice_irq_dynamic_ena() is called to re-enable the OICR hardware interrupt cause 10) ice_ptp_tx_tstamps_pending() is called to check if we missed any more outstanding timestamps, checking both software and hardware indicators. With this change, it should no longer be possible for new timestamps to come in such a way that we lose an interrupt. If a timestamp comes in before the ice_ptp_tx_tstamps_pending() call, it will be noticed by at least one of the software bitmap check or the hardware bitmap check. If the timestamp comes in *after* this check, it should cause a timestamp interrupt as we have already read all timestamps from the PHY and the OICR vector has been re-enabled. Fixes: 7cab44f1c35f ("ice: Introduce ETH56G PHY model for E825C products") Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Reviewed-by: Przemyslaw Korba <przemyslaw.korba@intel.com> Tested-by: Vitaly Grinberg <vgrinber@redhat.com> Tested-by: Sunitha Mekala <sunithax.d.mekala@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-02-11ice: fix missing TX timestamps interrupts on E825 devicesGrzegorz Nitka
[ Upstream commit 99854c167cfc113ad863832b1601c4ca1a639cfe ] Modify PTP (Precision Time Protocol) configuration on link down flow. Previously, PHY_REG_TX_OFFSET_READY register was cleared in such case. This register is used to determine if the timestamp is valid or not on the hardware side. However, there is a possibility that there is still the packet in the HW queue which originally was supposed to be timestamped but the link is already down and given register is cleared. This potentially might lead to the situation in which that 'delayed' packet's timestamp is treated as invalid one when the link is up again. This in turn leads to the situation in which the driver is not able to effectively clean timestamp memory and interrupt configuration. From the hardware perspective, that 'old' interrupt was not handled properly and even if new timestamp packets are processed, no new interrupts is generated. As a result, providing timestamps to the user applications (like ptp4l) is not possible. The solution for this problem is implemented at the driver level rather than the firmware, and maintains the tx_ready bit high, even during link down events. This avoids entering a potential inconsistent state between the driver and the timestamp hardware. Testing hints: - run PTP traffic at higher rate (like 16 PTP messages per second) - observe ptp4l behaviour at the client side in the following conditions: a) trigger link toggle events. It needs to be physiscal link down/up events b) link speed change In all above cases, PTP processing at ptp4l application should resume always. In failure case, the following permanent error message in ptp4l log was observed: controller-0 ptp4l: err [6175.116] ptp4l-legacy timed out while polling for tx timestamp Fixes: 7cab44f1c35f ("ice: Introduce ETH56G PHY model for E825C products") Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Signed-off-by: Grzegorz Nitka <grzegorz.nitka@intel.com> Tested-by: Sunitha Mekala <sunithax.d.mekala@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-02-11treewide: Drop pci_save_state() after pci_restore_state()Lukas Wunner
commit 383d89699c5028de510a6667f674ed38585f77fc upstream. In 2009, commit c82f63e411f1 ("PCI: check saved state before restore") changed the behavior of pci_restore_state() such that it became necessary to call pci_save_state() afterwards, lest recovery from subsequent PCI errors fails. The commit has just been reverted and so all the pci_save_state() after pci_restore_state() calls that have accumulated in the tree are now superfluous. Drop them. Two drivers chose a different approach to achieve the same result: drivers/scsi/ipr.c and drivers/net/ethernet/intel/e1000e/netdev.c set the pci_dev's "state_saved" flag to true before calling pci_restore_state(). Drop this as well. Signed-off-by: Lukas Wunner <lukas@wunner.de> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Acked-by: Dave Jiang <dave.jiang@intel.com> Acked-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com> # qat Link: https://patch.msgid.link/c2b28cc4defa1b743cf1dedee23c455be98b397a.1760274044.git.lukas@wunner.de Cc: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2026-02-06ice: stop counting UDP csum mismatch as rx_errorsJesse Brandeburg
[ Upstream commit 05faf2c0a76581d0a7fdbb8ec46477ba183df95b ] Since the beginning, the Intel ice driver has counted receive checksum offload mismatches into the rx_errors member of the rtnl_link_stats64 struct. In ethtool -S these show up as rx_csum_bad.nic. I believe counting these in rx_errors is fundamentally wrong, as it's pretty clear from the comments in if_link.h and from every other statistic the driver is summing into rx_errors, that all of them would cause a "hardware drop" except for the UDP checksum mismatch, as well as the fact that all the other causes for rx_errors are L2 reasons, and this L4 UDP "mismatch" is an outlier. A last nail in the coffin is that rx_errors is monitored in production and can indicate a bad NIC/cable/Switch port, but instead some random series of UDP packets with bad checksums will now trigger this alert. This false positive makes the alert useless and affects us as well as other companies. This packet with presumably a bad UDP checksum is *already* passed to the stack, just not marked as offloaded by the hardware/driver. If it is dropped by the stack it will show up as UDP_MIB_CSUMERRORS. And one more thing, none of the other Intel drivers, and at least bnxt_en and mlx5 both don't appear to count UDP offload mismatches as rx_errors. Here is a related customer complaint: https://community.intel.com/t5/Ethernet-Products/ice-rx-errros-is-too-sensitive-to-IP-TCP-attack-packets-Intel/td-p/1662125 Fixes: 4f1fe43c920b ("ice: Add more Rx errors to netdev's rx_error counter") Cc: Tony Nguyen <anthony.l.nguyen@intel.com> Cc: Jake Keller <jacob.e.keller@intel.com> Cc: IWL <intel-wired-lan@lists.osuosl.org> Signed-off-by: Jesse Brandeburg <jbrandeburg@cloudflare.com> Acked-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-02-06ice: Fix NULL pointer dereference in ice_vsi_set_napi_queuesAaron Ma
[ Upstream commit 9bb30be4d89ff9a8d7ab1aa0eb2edaca83431f85 ] Add NULL pointer checks in ice_vsi_set_napi_queues() to prevent crashes during resume from suspend when rings[q_idx]->q_vector is NULL. Tested adaptor: 60:00.0 Ethernet controller [0200]: Intel Corporation Ethernet Controller E810-XXV for SFP [8086:159b] (rev 02) Subsystem: Intel Corporation Ethernet Network Adapter E810-XXV-2 [8086:4003] SR-IOV state: both disabled and enabled can reproduce this issue. kernel version: v6.18 Reproduce steps: Boot up and execute suspend like systemctl suspend or rtcwake. Log: <1>[ 231.443607] BUG: kernel NULL pointer dereference, address: 0000000000000040 <1>[ 231.444052] #PF: supervisor read access in kernel mode <1>[ 231.444484] #PF: error_code(0x0000) - not-present page <6>[ 231.444913] PGD 0 P4D 0 <4>[ 231.445342] Oops: Oops: 0000 [#1] SMP NOPTI <4>[ 231.446635] RIP: 0010:netif_queue_set_napi+0xa/0x170 <4>[ 231.447067] Code: 31 f6 31 ff c3 cc cc cc cc 0f 1f 80 00 00 00 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 0f 1f 44 00 00 48 85 c9 74 0b <48> 83 79 30 00 0f 84 39 01 00 00 55 41 89 d1 49 89 f8 89 f2 48 89 <4>[ 231.447513] RSP: 0018:ffffcc780fc078c0 EFLAGS: 00010202 <4>[ 231.447961] RAX: ffff8b848ca30400 RBX: ffff8b848caf2028 RCX: 0000000000000010 <4>[ 231.448443] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff8b848dbd4000 <4>[ 231.448896] RBP: ffffcc780fc078e8 R08: 0000000000000000 R09: 0000000000000000 <4>[ 231.449345] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000001 <4>[ 231.449817] R13: ffff8b848dbd4000 R14: ffff8b84833390c8 R15: 0000000000000000 <4>[ 231.450265] FS: 00007c7b29e9d740(0000) GS:ffff8b8c068e2000(0000) knlGS:0000000000000000 <4>[ 231.450715] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 <4>[ 231.451179] CR2: 0000000000000040 CR3: 000000030626f004 CR4: 0000000000f72ef0 <4>[ 231.451629] PKRU: 55555554 <4>[ 231.452076] Call Trace: <4>[ 231.452549] <TASK> <4>[ 231.452996] ? ice_vsi_set_napi_queues+0x4d/0x110 [ice] <4>[ 231.453482] ice_resume+0xfd/0x220 [ice] <4>[ 231.453977] ? __pfx_pci_pm_resume+0x10/0x10 <4>[ 231.454425] pci_pm_resume+0x8c/0x140 <4>[ 231.454872] ? __pfx_pci_pm_resume+0x10/0x10 <4>[ 231.455347] dpm_run_callback+0x5f/0x160 <4>[ 231.455796] ? dpm_wait_for_superior+0x107/0x170 <4>[ 231.456244] device_resume+0x177/0x270 <4>[ 231.456708] dpm_resume+0x209/0x2f0 <4>[ 231.457151] dpm_resume_end+0x15/0x30 <4>[ 231.457596] suspend_devices_and_enter+0x1da/0x2b0 <4>[ 231.458054] enter_state+0x10e/0x570 Add defensive checks for both the ring pointer and its q_vector before dereferencing, allowing the system to resume successfully even when q_vectors are unmapped. Fixes: 2a5dc090b92cf ("ice: move netif_queue_set_napi to rtnl-protected sections") Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Signed-off-by: Aaron Ma <aaron.ma@canonical.com> Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-02-06ixgbe: don't initialize aci lock in ixgbe_recovery_probe()Kohei Enju
[ Upstream commit 100cf7b4ca6ed770ec4287f3789b1da2e340a05a ] hw->aci.lock is already initialized in ixgbe_sw_init(), so ixgbe_recovery_probe() doesn't need to initialize the lock. This function is also not responsible for destroying the lock on failures. Additionally, change the name of label in accordance with this change. Fixes: 29cb3b8d95c7 ("ixgbe: add E610 implementation of FW recovery mode") Reported-by: Simon Horman <horms@kernel.org> Closes: https://lore.kernel.org/intel-wired-lan/aTcFhoH-z2btEKT-@horms.kernel.org/ Signed-off-by: Kohei Enju <enjuk@amazon.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-02-06ixgbe: fix memory leaks in the ixgbe_recovery_probe() pathKohei Enju
[ Upstream commit 638344712aefeba97b6e0d90f560815fd88abd0f ] When ixgbe_recovery_probe() is invoked and this function fails, allocated resources in advance are not completely freed, because ixgbe_probe() returns ixgbe_recovery_probe() directly and ixgbe_recovery_probe() only frees partial resources, resulting in memory leaks including: - adapter->io_addr - adapter->jump_tables[0] - adapter->mac_table - adapter->rss_key - adapter->af_xdp_zc_qps The leaked MMIO region can be observed in /proc/vmallocinfo, and the remaining leaks are reported by kmemleak. Don't return ixgbe_recovery_probe() directly, and instead let ixgbe_probe() to clean up resources on failures. Fixes: 29cb3b8d95c7 ("ixgbe: add E610 implementation of FW recovery mode") Signed-off-by: Kohei Enju <enjuk@amazon.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-01-30idpf: Fix data race in idpf_net_dimDavid Yang
[ Upstream commit 5fbe395cd1fdbc883584e7f38369e4ba5ca778d2 ] In idpf_net_dim(), some statistics protected by u64_stats_sync, are read and accumulated in ignorance of possible u64_stats_fetch_retry() events. The correct way to copy statistics is already illustrated by idpf_add_queue_stats(). Fix this by reading them into temporary variables first. Fixes: c2d548cad150 ("idpf: add TX splitq napi poll support") Fixes: 3a8845af66ed ("idpf: add RX splitq napi poll support") Signed-off-by: David Yang <mmyangfl@gmail.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20260119162720.1463859-1-mmyangfl@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-01-30idpf: read lower clock bits inside the time sandwichMina Almasry
[ Upstream commit bdfc7b55adcd04834ccc1b6b13e55e3fd7eaa789 ] PCIe reads need to be done inside the time sandwich because PCIe writes may get buffered in the PCIe fabric and posted to the device after the _postts completes. Doing the PCIe read inside the time sandwich guarantees that the write gets flushed before the _postts timestamp is taken. Cc: lrizzo@google.com Cc: namangulati@google.com Cc: willemb@google.com Cc: intel-wired-lan@lists.osuosl.org Cc: milena.olech@intel.com Cc: jacob.e.keller@intel.com Fixes: 5cb8805d2366 ("idpf: negotiate PTP capabilities and get PTP clock") Suggested-by: Shachar Raindel <shacharr@google.com> Signed-off-by: Mina Almasry <almasrymina@google.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Tested-by: Samuel Salin <Samuel.salin@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-01-30ice: fix devlink reload call tracePaul Greenwalt
[ Upstream commit d3f867e7a04678640ebcbfb81893c59f4af48586 ] Commit 4da71a77fc3b ("ice: read internal temperature sensor") introduced internal temperature sensor reading via HWMON. ice_hwmon_init() was added to ice_init_feature() and ice_hwmon_exit() was added to ice_remove(). As a result if devlink reload is used to reinit the device and then the driver is removed, a call trace can occur. BUG: unable to handle page fault for address: ffffffffc0fd4b5d Call Trace: string+0x48/0xe0 vsnprintf+0x1f9/0x650 sprintf+0x62/0x80 name_show+0x1f/0x30 dev_attr_show+0x19/0x60 The call trace repeats approximately every 10 minutes when system monitoring tools (e.g., sadc) attempt to read the orphaned hwmon sysfs attributes that reference freed module memory. The sequence is: 1. Driver load, ice_hwmon_init() gets called from ice_init_feature() 2. Devlink reload down, flow does not call ice_remove() 3. Devlink reload up, ice_hwmon_init() gets called from ice_init_feature() resulting in a second instance 4. Driver unload, ice_hwmon_exit() called from ice_remove() leaving the first hwmon instance orphaned with dangling pointer Fix this by moving ice_hwmon_exit() from ice_remove() to ice_deinit_features() to ensure proper cleanup symmetry with ice_hwmon_init(). Fixes: 4da71a77fc3b ("ice: read internal temperature sensor") Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Signed-off-by: Paul Greenwalt <paul.greenwalt@intel.com> Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de> Tested-by: Rinitha S <sx.rinitha@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-01-30ice: add missing ice_deinit_hw() in devlink reinit pathPaul Greenwalt
[ Upstream commit 42fb5f3deb582cb96440e4683745017dbabb83d6 ] devlink-reload results in ice_init_hw failed error, and then removing the ice driver causes a NULL pointer dereference. [ +0.102213] ice 0000:ca:00.0: ice_init_hw failed: -16 ... [ +0.000001] Call Trace: [ +0.000003] <TASK> [ +0.000006] ice_unload+0x8f/0x100 [ice] [ +0.000081] ice_remove+0xba/0x300 [ice] Commit 1390b8b3d2be ("ice: remove duplicate call to ice_deinit_hw() on error paths") removed ice_deinit_hw() from ice_deinit_dev(). As a result ice_devlink_reinit_down() no longer calls ice_deinit_hw(), but ice_devlink_reinit_up() still calls ice_init_hw(). Since the control queues are not uninitialized, ice_init_hw() fails with -EBUSY. Add ice_deinit_hw() to ice_devlink_reinit_down() to correspond with ice_init_hw() in ice_devlink_reinit_up(). Fixes: 1390b8b3d2be ("ice: remove duplicate call to ice_deinit_hw() on error paths") Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Signed-off-by: Paul Greenwalt <paul.greenwalt@intel.com> Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de> Tested-by: Rinitha S <sx.rinitha@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-01-30ice: Fix persistent failure in ice_get_rxfhCody Haas
[ Upstream commit f406220eb8e227ca344eef1a6d30aff53706b196 ] Several ioctl functions have the ability to call ice_get_rxfh, however all of these ioctl functions do not provide all of the expected information in ethtool_rxfh_param. For example, ethtool_get_rxfh_indir does not provide an rss_key. This previously caused ethtool_get_rxfh_indir to always fail with -EINVAL. This change draws inspiration from i40e_get_rss to handle this situation, by only calling the appropriate rss helpers when the necessary information has been provided via ethtool_rxfh_param. Fixes: b66a972abb6b ("ice: Refactor ice_set/get_rss into LUT and key specific functions") Signed-off-by: Cody Haas <chaas@riotgames.com> Closes: https://lore.kernel.org/intel-wired-lan/CAH7f-UKkJV8MLY7zCdgCrGE55whRhbGAXvgkDnwgiZ9gUZT7_w@mail.gmail.com/ Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Tested-by: Rinitha S <sx.rinitha@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-01-30igc: Reduce TSN TX packet buffer from 7KB to 5KB per queueChwee-Lin Choong
[ Upstream commit 8ad1b6c1e63d25f5465b7a8aa403bdcee84b86f9 ] The previous 7 KB per queue caused TX unit hangs under heavy timestamping load. Reducing to 5 KB avoids these hangs and matches the TSN recommendation in I225/I226 SW User Manual Section 7.5.4. The 8 KB "freed" by this change is currently unused. This reduction is not expected to impact throughput, as the i226 is PCIe-limited for small TSN packets rather than TX-buffer-limited. Fixes: 0d58cdc902da ("igc: optimize TX packet buffer utilization for TSN mode") Reported-by: Zdenek Bouska <zdenek.bouska@siemens.com> Closes: https://lore.kernel.org/netdev/AS1PR10MB5675DBFE7CE5F2A9336ABFA4EBEAA@AS1PR10MB5675.EURPRD10.PROD.OUTLOOK.COM/ Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de> Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Signed-off-by: Chwee-Lin Choong <chwee.lin.choong@intel.com> Tested-by: Avigail Dahan <avigailx.dahan@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-01-30igc: fix race condition in TX timestamp read for register 0Chwee-Lin Choong
[ Upstream commit 6990dc392a9ab10e52af37e0bee8c7b753756dc4 ] The current HW bug workaround checks the TXTT_0 ready bit first, then reads TXSTMPL_0 twice (before and after reading TXSTMPH_0) to detect whether a new timestamp was captured by timestamp register 0 during the workaround. This sequence has a race: if a new timestamp is captured after checking the TXTT_0 bit but before the first TXSTMPL_0 read, the detection fails because both the "old" and "new" values come from the same timestamp. Fix by reading TXSTMPL_0 first to establish a baseline, then checking the TXTT_0 bit. This ensures any timestamp captured during the race window will be detected. Old sequence: 1. Check TXTT_0 ready bit 2. Read TXSTMPL_0 (baseline) 3. Read TXSTMPH_0 (interrupt workaround) 4. Read TXSTMPL_0 (detect changes vs baseline) New sequence: 1. Read TXSTMPL_0 (baseline) 2. Check TXTT_0 ready bit 3. Read TXSTMPH_0 (interrupt workaround) 4. Read TXSTMPL_0 (detect changes vs baseline) Fixes: c789ad7cbebc ("igc: Work around HW bug causing missing timestamps") Suggested-by: Avi Shalev <avi.shalev@intel.com> Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Co-developed-by: Song Yoong Siang <yoong.siang.song@intel.com> Signed-off-by: Song Yoong Siang <yoong.siang.song@intel.com> Signed-off-by: Chwee-Lin Choong <chwee.lin.choong@intel.com> Tested-by: Avigail Dahan <avigailx.dahan@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-01-30igc: Restore default Qbv schedule when changing channelsKurt Kanzenbach
[ Upstream commit 41a9a6826f20a524242a6c984845c4855f629841 ] The Multi-queue Priority (MQPRIO) and Earliest TxTime First (ETF) offloads utilize the Time Sensitive Networking (TSN) Tx mode. This mode is always coupled to IEEE 802.1Qbv time aware shaper (Qbv). Therefore, the driver sets a default Qbv schedule of all gates opened and a cycle time of 1s. This schedule is set during probe. However, the following sequence of events lead to Tx issues: - Boot a dual core system igc_probe(): igc_tsn_clear_schedule(): -> Default Schedule is set Note: At this point the driver has allocated two Tx/Rx queues, because there are only two CPUs. - ethtool -L enp3s0 combined 4 igc_ethtool_set_channels(): igc_reinit_queues() -> Default schedule is gone, per Tx ring start and end time are zero - tc qdisc replace dev enp3s0 handle 100 parent root mqprio \ num_tc 4 map 3 3 2 2 0 1 1 1 3 3 3 3 3 3 3 3 \ queues 1@0 1@1 1@2 1@3 hw 1 igc_tsn_offload_apply(): igc_tsn_enable_offload(): -> Writes zeros to IGC_STQT(i) and IGC_ENDQT(i), causing Tx to stall/fail Therefore, restore the default Qbv schedule after changing the number of channels. Furthermore, add a restriction to not allow queue reconfiguration when TSN/Qbv is enabled, because it may lead to inconsistent states. Fixes: c814a2d2d48f ("igc: Use default cycle 'start' and 'end' values for queues") Signed-off-by: Kurt Kanzenbach <kurt@linutronix.de> Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Tested-by: Avigail Dahan <avigailx.dahan@intel.com> Acked-by: Vinicius Costa Gomes <vinicius.gomes@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-01-30ice: Fix incorrect timeout ice_release_res()Ding Hui
[ Upstream commit 01139a2ce532d77379e1593230127caa261a8036 ] The commit 5f6df173f92e ("ice: implement and use rd32_poll_timeout for ice_sq_done timeout") converted ICE_CTL_Q_SQ_CMD_TIMEOUT from jiffies to microseconds. But the ice_release_res() function was missed, and its logic still treats ICE_CTL_Q_SQ_CMD_TIMEOUT as a jiffies value. So correct the issue by usecs_to_jiffies(). Found by inspection of the DDP downloading process. Compile and modprobe tested only. Fixes: 5f6df173f92e ("ice: implement and use rd32_poll_timeout for ice_sq_done timeout") Signed-off-by: Ding Hui <dinghui@sangfor.com.cn> Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de> Tested-by: Rinitha S <sx.rinitha@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-01-30ice: Avoid detrimental cleanup for bond during interface stopDave Ertman
[ Upstream commit a9d45c22ed120cdd15ff56d0a6e4700c46451901 ] When the user issues an administrative down to an interface that is the primary for an aggregate bond, the prune lists are being purged. This breaks communication to the secondary interface, which shares a prune list on the main switch block while bonded together. For the primary interface of an aggregate, avoid deleting these prune lists during stop, and since they are hardcoded to specific values for the default vlan and QinQ vlans, the attempt to re-add them during the up phase will quietly fail without any additional problem. Fixes: 1e0f9881ef79 ("ice: Flesh out implementation of support for SRIOV on bonded interface") Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Marcin Szycik <marcin.szycik@linux.intel.com> Signed-off-by: Dave Ertman <david.m.ertman@intel.com> Tested-by: Rinitha S <sx.rinitha@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-01-30ice: initialize ring_stats->syncpJacob Keller
[ Upstream commit 8439016c3b8b5ab687c2420317b1691585106611 ] The u64_stats_sync structure is empty on 64-bit systems. However, on 32-bit systems it contains a seqcount_t which needs to be initialized. While the memory is zero-initialized, a lack of u64_stats_init means that lockdep won't get initialized properly. Fix this by adding u64_stats_init() calls to the rings just after allocation. Fixes: 2b245cb29421 ("ice: Implement transmit and NAPI support") Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Tested-by: Rinitha S <sx.rinitha@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-01-17idpf: fix aux device unplugging when rdma is not supported by vportLarysa Zaremba
[ Upstream commit 4648fb2f2e7210c53b85220ee07d42d1e4bae3f9 ] If vport flags do not contain VIRTCHNL2_VPORT_ENABLE_RDMA, driver does not allocate vdev_info for this vport. This leads to kernel NULL pointer dereference in idpf_idc_vport_dev_down(), which references vdev_info for every vport regardless. Check, if vdev_info was ever allocated before unplugging aux device. Fixes: be91128c579c ("idpf: implement RDMA vport auxiliary dev create, init, and destroy") Reviewed-by: Madhu Chittim <madhu.chittim@intel.com> Signed-off-by: Larysa Zaremba <larysa.zaremba@intel.com> Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de> Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Tested-by: Krishneil Singh <krishneil.k.singh@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-01-17idpf: cap maximum Rx buffer sizeJoshua Hay
[ Upstream commit 086efe0a1ecc36cffe46640ce12649a4cd3ff171 ] The HW only supports a maximum Rx buffer size of 16K-128. On systems using large pages, the libeth logic can configure the buffer size to be larger than this. The upper bound is PAGE_SIZE while the lower bound is MTU rounded up to the nearest power of 2. For example, ARM systems with a 64K page size and an mtu of 9000 will set the Rx buffer size to 16K, which will cause the config Rx queues message to fail. Initialize the bufq/fill queue buf_len field to the maximum supported size. This will trigger the libeth logic to cap the maximum Rx buffer size by reducing the upper bound. Fixes: 74d1412ac8f37 ("idpf: use libeth Rx buffer management for payload buffer") Signed-off-by: Joshua Hay <joshua.a.hay@intel.com> Acked-by: Alexander Lobakin <aleksander.lobakin@intel.com> Reviewed-by: Madhu Chittim <madhu.chittim@intel.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Reviewed-by: David Decotigny <ddecotig@google.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-01-17idpf: Fix error handling in idpf_vport_open()Sreedevi Joshi
[ Upstream commit 87b8ee64685bc096a087af833d4594b2332bfdb1 ] Fix error handling to properly cleanup interrupts when idpf_vport_queue_ids_init() or idpf_rx_bufs_init_all() fail. Jump to 'intr_deinit' instead of 'queues_rel' to ensure interrupts are cleaned up before releasing other resources. Fixes: d4d558718266 ("idpf: initialize interrupts and enable vport") Signed-off-by: Sreedevi Joshi <sreedevi.joshi@intel.com> Reviewed-by: Madhu Chittim <madhu.chittim@intel.com> Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Tested-by: Samuel Salin <Samuel.salin@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-01-17idpf: Fix RSS LUT NULL ptr issue after soft resetSreedevi Joshi
[ Upstream commit ebecca5b093895da801b3eba1a55b4ec4027d196 ] During soft reset, the RSS LUT is freed and not restored unless the interface is up. If an ethtool command that accesses the rss lut is attempted immediately after reset, it will result in NULL ptr dereference. Also, there is no need to reset the rss lut if the soft reset does not involve queue count change. After soft reset, set the RSS LUT to default values based on the updated queue count only if the reset was a result of a queue count change and the LUT was not configured by the user. In all other cases, don't touch the LUT. Steps to reproduce: ** Bring the interface down (if up) ifconfig eth1 down ** update the queue count (eg., 27->20) ethtool -L eth1 combined 20 ** display the RSS LUT ethtool -x eth1 [82375.558338] BUG: kernel NULL pointer dereference, address: 0000000000000000 [82375.558373] #PF: supervisor read access in kernel mode [82375.558391] #PF: error_code(0x0000) - not-present page [82375.558408] PGD 0 P4D 0 [82375.558421] Oops: Oops: 0000 [#1] SMP NOPTI <snip> [82375.558516] RIP: 0010:idpf_get_rxfh+0x108/0x150 [idpf] [82375.558786] Call Trace: [82375.558793] <TASK> [82375.558804] rss_prepare.isra.0+0x187/0x2a0 [82375.558827] rss_prepare_data+0x3a/0x50 [82375.558845] ethnl_default_doit+0x13d/0x3e0 [82375.558863] genl_family_rcv_msg_doit+0x11f/0x180 [82375.558886] genl_rcv_msg+0x1ad/0x2b0 [82375.558902] ? __pfx_ethnl_default_doit+0x10/0x10 [82375.558920] ? __pfx_genl_rcv_msg+0x10/0x10 [82375.558937] netlink_rcv_skb+0x58/0x100 [82375.558957] genl_rcv+0x2c/0x50 [82375.558971] netlink_unicast+0x289/0x3e0 [82375.558988] netlink_sendmsg+0x215/0x440 [82375.559005] __sys_sendto+0x234/0x240 [82375.559555] __x64_sys_sendto+0x28/0x30 [82375.560068] x64_sys_call+0x1909/0x1da0 [82375.560576] do_syscall_64+0x7a/0xfa0 [82375.561076] ? clear_bhb_loop+0x60/0xb0 [82375.561567] entry_SYSCALL_64_after_hwframe+0x76/0x7e <snip> Fixes: 02cbfba1add5 ("idpf: add ethtool callbacks") Signed-off-by: Sreedevi Joshi <sreedevi.joshi@intel.com> Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Reviewed-by: Sridhar Samudrala <sridhar.samudrala@intel.com> Reviewed-by: Emil Tantilov <emil.s.tantilov@intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Tested-by: Samuel Salin <Samuel.salin@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-01-17idpf: Fix RSS LUT configuration on down interfacesSreedevi Joshi
[ Upstream commit 445b49d13787da2fe8d51891ee196e5077feef44 ] RSS LUT provisioning and queries on a down interface currently return silently without effect. Users should be able to configure RSS settings even when the interface is down. Fix by maintaining RSS configuration changes in the driver's soft copy and deferring HW programming until the interface comes up. Fixes: 02cbfba1add5 ("idpf: add ethtool callbacks") Signed-off-by: Sreedevi Joshi <sreedevi.joshi@intel.com> Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Reviewed-by: Sridhar Samudrala <sridhar.samudrala@intel.com> Reviewed-by: Emil Tantilov <emil.s.tantilov@intel.com> Tested-by: Samuel Salin <Samuel.salin@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-01-17idpf: Fix RSS LUT NULL pointer crash on early ethtool operationsSreedevi Joshi
[ Upstream commit 83f38f210b85676f40ba8586b5a8edae19b56995 ] The RSS LUT is not initialized until the interface comes up, causing the following NULL pointer crash when ethtool operations like rxhash on/off are performed before the interface is brought up for the first time. Move RSS LUT initialization from ndo_open to vport creation to ensure LUT is always available. This enables RSS configuration via ethtool before bringing the interface up. Simplify LUT management by maintaining all changes in the driver's soft copy and programming zeros to the indirection table when rxhash is disabled. Defer HW programming until the interface comes up if it is down during rxhash and LUT configuration changes. Steps to reproduce: ** Load idpf driver; interfaces will be created modprobe idpf ** Before bringing the interfaces up, turn rxhash off ethtool -K eth2 rxhash off [89408.371875] BUG: kernel NULL pointer dereference, address: 0000000000000000 [89408.371908] #PF: supervisor read access in kernel mode [89408.371924] #PF: error_code(0x0000) - not-present page [89408.371940] PGD 0 P4D 0 [89408.371953] Oops: Oops: 0000 [#1] SMP NOPTI <snip> [89408.372052] RIP: 0010:memcpy_orig+0x16/0x130 [89408.372310] Call Trace: [89408.372317] <TASK> [89408.372326] ? idpf_set_features+0xfc/0x180 [idpf] [89408.372363] __netdev_update_features+0x295/0xde0 [89408.372384] ethnl_set_features+0x15e/0x460 [89408.372406] genl_family_rcv_msg_doit+0x11f/0x180 [89408.372429] genl_rcv_msg+0x1ad/0x2b0 [89408.372446] ? __pfx_ethnl_set_features+0x10/0x10 [89408.372465] ? __pfx_genl_rcv_msg+0x10/0x10 [89408.372482] netlink_rcv_skb+0x58/0x100 [89408.372502] genl_rcv+0x2c/0x50 [89408.372516] netlink_unicast+0x289/0x3e0 [89408.372533] netlink_sendmsg+0x215/0x440 [89408.372551] __sys_sendto+0x234/0x240 [89408.372571] __x64_sys_sendto+0x28/0x30 [89408.372585] x64_sys_call+0x1909/0x1da0 [89408.372604] do_syscall_64+0x7a/0xfa0 [89408.373140] ? clear_bhb_loop+0x60/0xb0 [89408.373647] entry_SYSCALL_64_after_hwframe+0x76/0x7e [89408.378887] </TASK> <snip> Fixes: a251eee62133 ("idpf: add SRIOV support and other ndo_ops") Signed-off-by: Sreedevi Joshi <sreedevi.joshi@intel.com> Reviewed-by: Sridhar Samudrala <sridhar.samudrala@intel.com> Reviewed-by: Emil Tantilov <emil.s.tantilov@intel.com> Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de> Reviewed-by: Simon Horman <horms@kernel.org> Tested-by: Samuel Salin <Samuel.salin@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-01-17idpf: fix issue with ethtool -n command displayErik Gabriel Carrillo
[ Upstream commit 36aae2ea6bd76b8246caa50e34a4f4824f0a3be8 ] When ethtool -n is executed on an interface to display the flow steering rules, "rxclass: Unknown flow type" error is generated. The flow steering list maintained in the driver currently stores only the location and q_index but other fields of the ethtool_rx_flow_spec are not stored. This may be enough for the virtchnl command to delete the entry. However, when the ethtool -n command is used to query the flow steering rules, the ethtool_rx_flow_spec returned is not complete causing the error below. Resolve this by storing the flow spec (fsp) when rules are added and returning the complete flow spec when rules are queried. Also, change the return value from EINVAL to ENOENT when flow steering entry is not found during query by location or when deleting an entry. Add logic to detect and reject duplicate filter entries at the same location and change logic to perform upfront validation of all error conditions before adding flow rules through virtchnl. This avoids the need for additional virtchnl delete messages when subsequent operations fail, which was missing in the original upstream code. Example: Before the fix: ethtool -n eth1 2 RX rings available Total 2 rules rxclass: Unknown flow type rxclass: Unknown flow type After the fix: ethtool -n eth1 2 RX rings available Total 2 rules Filter: 0 Rule Type: TCP over IPv4 Src IP addr: 10.0.0.1 mask: 0.0.0.0 Dest IP addr: 0.0.0.0 mask: 255.255.255.255 TOS: 0x0 mask: 0xff Src port: 0 mask: 0xffff Dest port: 0 mask: 0xffff Action: Direct to queue 0 Filter: 1 Rule Type: UDP over IPv4 Src IP addr: 10.0.0.1 mask: 0.0.0.0 Dest IP addr: 0.0.0.0 mask: 255.255.255.255 TOS: 0x0 mask: 0xff Src port: 0 mask: 0xffff Dest port: 0 mask: 0xffff Action: Direct to queue 0 Fixes: ada3e24b84a0 ("idpf: add flow steering support") Signed-off-by: Erik Gabriel Carrillo <erik.g.carrillo@intel.com> Co-developed-by: Sreedevi Joshi <sreedevi.joshi@intel.com> Signed-off-by: Sreedevi Joshi <sreedevi.joshi@intel.com> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Tested-by: Mina Almasry <almasrymina@google.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-01-17idpf: fix memory leak of flow steer list on rmmodSreedevi Joshi
[ Upstream commit f9841bd28b600526ca4f6713b0ca49bf7bb98452 ] The flow steering list maintains entries that are added and removed as ethtool creates and deletes flow steering rules. Module removal with active entries causes memory leak as the list is not properly cleaned up. Prevent this by iterating through the remaining entries in the list and freeing the associated memory during module removal. Add a spinlock (flow_steer_list_lock) to protect the list access from multiple threads. Fixes: ada3e24b84a0 ("idpf: add flow steering support") Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Signed-off-by: Sreedevi Joshi <sreedevi.joshi@intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Tested-by: Mina Almasry <almasrymina@google.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>