<feed xmlns='http://www.w3.org/2005/Atom'>
<title>kernel/drivers/net/ethernet/microsoft/mana, branch linux-rolling-stable</title>
<subtitle>Hosts the 0x221E linux distro kernel.</subtitle>
<id>https://universe.0xinfinity.dev/distro/kernel/atom?h=linux-rolling-stable</id>
<link rel='self' href='https://universe.0xinfinity.dev/distro/kernel/atom?h=linux-rolling-stable'/>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/'/>
<updated>2026-03-19T15:15:18Z</updated>
<entry>
<title>net: mana: Ring doorbell at 4 CQ wraparounds</title>
<updated>2026-03-19T15:15:18Z</updated>
<author>
<name>Long Li</name>
<email>longli@microsoft.com</email>
</author>
<published>2026-02-26T19:28:33Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=e07c00d20342d4af174e7f9094219b5e22635fee'/>
<id>urn:sha1:e07c00d20342d4af174e7f9094219b5e22635fee</id>
<content type='text'>
commit dabffd08545ffa1d7183bc45e387860984025291 upstream.

MANA hardware requires at least one doorbell ring every 8 wraparounds
of the CQ. The driver rings the doorbell as a form of flow control to
inform hardware that CQEs have been consumed.

The NAPI poll functions mana_poll_tx_cq() and mana_poll_rx_cq() can
poll up to CQE_POLLING_BUFFER (512) completions per call. If the CQ
has fewer than 512 entries, a single poll call can process more than
4 wraparounds without ringing the doorbell. The doorbell threshold
check also uses "&gt;" instead of "&gt;=", delaying the ring by one extra
CQE beyond 4 wraparounds. Combined, these issues can cause the driver
to exceed the 8-wraparound hardware limit, leading to missed
completions and stalled queues.

Fix this by capping the number of CQEs polled per call to 4 wraparounds
of the CQ in both TX and RX paths. Also change the doorbell threshold
from "&gt;" to "&gt;=" so the doorbell is rung as soon as 4 wraparounds are
reached.

Cc: stable@vger.kernel.org
Fixes: 58a63729c957 ("net: mana: Fix doorbell out of order violation and avoid unnecessary doorbell rings")
Signed-off-by: Long Li &lt;longli@microsoft.com&gt;
Reviewed-by: Haiyang Zhang &lt;haiyangz@microsoft.com&gt;
Reviewed-by: Vadim Fedorenko &lt;vadim.fedorenko@linux.dev&gt;
Link: https://patch.msgid.link/20260226192833.1050807-1-longli@microsoft.com
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>net/mana: Null service_wq on setup error to prevent double destroy</title>
<updated>2026-03-19T15:14:58Z</updated>
<author>
<name>Shiraz Saleem</name>
<email>shirazsaleem@microsoft.com</email>
</author>
<published>2026-03-09T17:24:43Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=6c92392602b451e3869f15ab685f8f650e942b13'/>
<id>urn:sha1:6c92392602b451e3869f15ab685f8f650e942b13</id>
<content type='text'>
[ Upstream commit 87c2302813abc55c46485711a678e3c312b00666 ]

In mana_gd_setup() error path, set gc-&gt;service_wq to NULL after
destroy_workqueue() to match the cleanup in mana_gd_cleanup().
This prevents a use-after-free if the workqueue pointer is checked
after a failed setup.

Fixes: f975a0955276 ("net: mana: Fix double destroy_workqueue on service rescan PCI path")
Signed-off-by: Shiraz Saleem &lt;shirazsaleem@microsoft.com&gt;
Signed-off-by: Konstantin Taranov &lt;kotaranov@microsoft.com&gt;
Reviewed-by: Simon Horman &lt;horms@kernel.org&gt;
Link: https://patch.msgid.link/20260309172443.688392-1-kotaranov@linux.microsoft.com
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>net: mana: Fix double destroy_workqueue on service rescan PCI path</title>
<updated>2026-03-04T12:20:57Z</updated>
<author>
<name>Dipayaan Roy</name>
<email>dipayanroy@linux.microsoft.com</email>
</author>
<published>2026-02-24T12:38:36Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=a9a7c3203fdc4d4a8d8a7a3b1ed05d2bb4c6e77e'/>
<id>urn:sha1:a9a7c3203fdc4d4a8d8a7a3b1ed05d2bb4c6e77e</id>
<content type='text'>
[ Upstream commit f975a0955276579e2176a134366ed586071c7c6a ]

While testing corner cases in the driver, a use-after-free crash
was found on the service rescan PCI path.

When mana_serv_reset() calls mana_gd_suspend(), mana_gd_cleanup()
destroys gc-&gt;service_wq. If the subsequent mana_gd_resume() fails
with -ETIMEDOUT or -EPROTO, the code falls through to
mana_serv_rescan() which triggers pci_stop_and_remove_bus_device().
This invokes the PCI .remove callback (mana_gd_remove), which calls
mana_gd_cleanup() a second time, attempting to destroy the already-
freed workqueue. Fix this by NULL-checking gc-&gt;service_wq in
mana_gd_cleanup() and setting it to NULL after destruction.

Call stack of issue for reference:
[Sat Feb 21 18:53:48 2026] Call Trace:
[Sat Feb 21 18:53:48 2026]  &lt;TASK&gt;
[Sat Feb 21 18:53:48 2026]  mana_gd_cleanup+0x33/0x70 [mana]
[Sat Feb 21 18:53:48 2026]  mana_gd_remove+0x3a/0xc0 [mana]
[Sat Feb 21 18:53:48 2026]  pci_device_remove+0x41/0xb0
[Sat Feb 21 18:53:48 2026]  device_remove+0x46/0x70
[Sat Feb 21 18:53:48 2026]  device_release_driver_internal+0x1e3/0x250
[Sat Feb 21 18:53:48 2026]  device_release_driver+0x12/0x20
[Sat Feb 21 18:53:48 2026]  pci_stop_bus_device+0x6a/0x90
[Sat Feb 21 18:53:48 2026]  pci_stop_and_remove_bus_device+0x13/0x30
[Sat Feb 21 18:53:48 2026]  mana_do_service+0x180/0x290 [mana]
[Sat Feb 21 18:53:48 2026]  mana_serv_func+0x24/0x50 [mana]
[Sat Feb 21 18:53:48 2026]  process_one_work+0x190/0x3d0
[Sat Feb 21 18:53:48 2026]  worker_thread+0x16e/0x2e0
[Sat Feb 21 18:53:48 2026]  kthread+0xf7/0x130
[Sat Feb 21 18:53:48 2026]  ? __pfx_worker_thread+0x10/0x10
[Sat Feb 21 18:53:48 2026]  ? __pfx_kthread+0x10/0x10
[Sat Feb 21 18:53:48 2026]  ret_from_fork+0x269/0x350
[Sat Feb 21 18:53:48 2026]  ? __pfx_kthread+0x10/0x10
[Sat Feb 21 18:53:48 2026]  ret_from_fork_asm+0x1a/0x30
[Sat Feb 21 18:53:48 2026]  &lt;/TASK&gt;

Fixes: 505cc26bcae0 ("net: mana: Add support for auxiliary device servicing events")
Reviewed-by: Haiyang Zhang &lt;haiyangz@microsoft.com&gt;
Signed-off-by: Dipayaan Roy &lt;dipayanroy@linux.microsoft.com&gt;
Reviewed-by: Simon Horman &lt;horms@kernel.org&gt;
Link: https://patch.msgid.link/aZ2bzL64NagfyHpg@linuxonhyperv3.guj3yctzbm1etfxqx2vob5hsef.xx.internal.cloudapp.net
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>net: mana: Fix use-after-free in reset service rescan path</title>
<updated>2025-12-28T09:34:00Z</updated>
<author>
<name>Dipayaan Roy</name>
<email>dipayanroy@linux.microsoft.com</email>
</author>
<published>2025-12-18T13:10:54Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=3387a7ad478b46970ae8254049167d166e398aeb'/>
<id>urn:sha1:3387a7ad478b46970ae8254049167d166e398aeb</id>
<content type='text'>
When mana_serv_reset() encounters -ETIMEDOUT or -EPROTO from
mana_gd_resume(), it performs a PCI rescan via mana_serv_rescan().

mana_serv_rescan() calls pci_stop_and_remove_bus_device(), which can
invoke the driver's remove path and free the gdma_context associated
with the device. After returning, mana_serv_reset() currently jumps to
the out label and attempts to clear gc-&gt;in_service, dereferencing a
freed gdma_context.

The issue was observed with the following call logs:
[  698.942636] BUG: unable to handle page fault for address: ff6c2b638088508d
[  698.943121] #PF: supervisor write access in kernel mode
[  698.943423] #PF: error_code(0x0002) - not-present page
[S[  698.943793] Pat Dec  6 07:GD5 100000067 P4D 1002f7067 PUD 1002f8067 PMD 101bef067 PTE 0
0:56 2025] hv_[n e 698.944283] Oops: Oops: 0002 [#1] SMP NOPTI
tvsc f8615163-00[  698.944611] CPU: 28 UID: 0 PID: 249 Comm: kworker/28:1
...
[Sat Dec  6 07:50:56 2025] R10: [  699.121594] mana 7870:00:00.0 enP30832s1: Configured vPort 0 PD 18 DB 16
000000000000001b R11: 0000000000000000 R12: ff44cf3f40270000
[Sat Dec  6 07:50:56 2025] R13: 0000000000000001 R14: ff44cf3f402700c8 R15: ff44cf3f4021b405
[Sat Dec  6 07:50:56 2025] FS:  0000000000000000(0000) GS:ff44cf7e9fcf9000(0000) knlGS:0000000000000000
[Sat Dec  6 07:50:56 2025] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[Sat Dec  6 07:50:56 2025] CR2: ff6c2b638088508d CR3: 000000011fe43001 CR4: 0000000000b73ef0
[Sat Dec  6 07:50:56 2025] Call Trace:
[Sat Dec  6 07:50:56 2025]  &lt;TASK&gt;
[Sat Dec  6 07:50:56 2025]  mana_serv_func+0x24/0x50 [mana]
[Sat Dec  6 07:50:56 2025]  process_one_work+0x190/0x350
[Sat Dec  6 07:50:56 2025]  worker_thread+0x2b7/0x3d0
[Sat Dec  6 07:50:56 2025]  kthread+0xf3/0x200
[Sat Dec  6 07:50:56 2025]  ? __pfx_worker_thread+0x10/0x10
[Sat Dec  6 07:50:56 2025]  ? __pfx_kthread+0x10/0x10
[Sat Dec  6 07:50:56 2025]  ret_from_fork+0x21a/0x250
[Sat Dec  6 07:50:56 2025]  ? __pfx_kthread+0x10/0x10
[Sat Dec  6 07:50:56 2025]  ret_from_fork_asm+0x1a/0x30
[Sat Dec  6 07:50:56 2025]  &lt;/TASK&gt;

Fix this by returning immediately after mana_serv_rescan() to avoid
accessing GC state that may no longer be valid.

Fixes: 9bf66036d686 ("net: mana: Handle hardware recovery events when probing the device")
Reviewed-by: Simon Horman &lt;horms@kernel.org&gt;
Reviewed-by: Long Li &lt;longli@microsoft.com&gt;
Signed-off-by: Dipayaan Roy &lt;dipayanroy@linux.microsoft.com&gt;
Link: https://patch.msgid.link/20251218131054.GA3173@linuxonhyperv3.guj3yctzbm1etfxqx2vob5hsef.xx.internal.cloudapp.net
Signed-off-by: Paolo Abeni &lt;pabeni@redhat.com&gt;

</content>
</entry>
<entry>
<title>net: mana: Handle hardware recovery events when probing the device</title>
<updated>2025-12-01T21:53:53Z</updated>
<author>
<name>Long Li</name>
<email>longli@microsoft.com</email>
</author>
<published>2025-11-26T21:45:52Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=9bf66036d686b9a67000ba22bd94be13a4ea79ac'/>
<id>urn:sha1:9bf66036d686b9a67000ba22bd94be13a4ea79ac</id>
<content type='text'>
When MANA is being probed, it's possible that hardware is in recovery
mode and the device may get GDMA_EQE_HWC_RESET_REQUEST over HWC in the
middle of the probe. Detect such condition and go through the recovery
service procedure.

Signed-off-by: Long Li &lt;longli@microsoft.com&gt;
Reviewed-by: Haiyang Zhang &lt;haiyangz@microsoft.com&gt;
Link: https://patch.msgid.link/1764193552-9712-1-git-send-email-longli@linux.microsoft.com
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</content>
</entry>
<entry>
<title>net: mana: Drop TX skb on post_work_request failure and unmap resources</title>
<updated>2025-11-20T04:11:57Z</updated>
<author>
<name>Aditya Garg</name>
<email>gargaditya@linux.microsoft.com</email>
</author>
<published>2025-11-18T11:11:09Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=45120304e84171fd215c1b57b15b285446d15106'/>
<id>urn:sha1:45120304e84171fd215c1b57b15b285446d15106</id>
<content type='text'>
Drop TX packets when posting the work request fails and ensure DMA
mappings are always cleaned up.

Signed-off-by: Aditya Garg &lt;gargaditya@linux.microsoft.com&gt;
Reviewed-by: Haiyang Zhang &lt;haiyangz@microsoft.com&gt;
Link: https://patch.msgid.link/1763464269-10431-3-git-send-email-gargaditya@linux.microsoft.com
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</content>
</entry>
<entry>
<title>net: mana: Handle SKB if TX SGEs exceed hardware limit</title>
<updated>2025-11-20T04:11:57Z</updated>
<author>
<name>Aditya Garg</name>
<email>gargaditya@linux.microsoft.com</email>
</author>
<published>2025-11-18T11:11:08Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=934fa943b53795339486cc0026b3ab7ad39dc600'/>
<id>urn:sha1:934fa943b53795339486cc0026b3ab7ad39dc600</id>
<content type='text'>
The MANA hardware supports a maximum of 30 scatter-gather entries (SGEs)
per TX WQE. Exceeding this limit can cause TX failures.
Add ndo_features_check() callback to validate SKB layout before
transmission. For GSO SKBs that would exceed the hardware SGE limit, clear
NETIF_F_GSO_MASK to enforce software segmentation in the stack.
Add a fallback in mana_start_xmit() to linearize non-GSO SKBs that still
exceed the SGE limit.

Also, Add ethtool counter for SKBs linearized

Co-developed-by: Dipayaan Roy &lt;dipayanroy@linux.microsoft.com&gt;
Signed-off-by: Dipayaan Roy &lt;dipayanroy@linux.microsoft.com&gt;
Signed-off-by: Aditya Garg &lt;gargaditya@linux.microsoft.com&gt;
Reviewed-by: Eric Dumazet &lt;edumazet@google.com&gt;
Reviewed-by: Haiyang Zhang &lt;haiyangz@microsoft.com&gt;
Link: https://patch.msgid.link/1763464269-10431-2-git-send-email-gargaditya@linux.microsoft.com
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</content>
</entry>
<entry>
<title>net: mana: Add standard counter rx_missed_errors</title>
<updated>2025-11-18T03:52:30Z</updated>
<author>
<name>Erni Sri Satya Vennela</name>
<email>ernis@linux.microsoft.com</email>
</author>
<published>2025-11-14T11:43:19Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=be4f1d67ec56f23f37714ac73c01094e63c7ff28'/>
<id>urn:sha1:be4f1d67ec56f23f37714ac73c01094e63c7ff28</id>
<content type='text'>
Report standard counter stats-&gt;rx_missed_errors
using hc_rx_discards_no_wqe from the hardware.

Add a global workqueue to periodically run
mana_query_gf_stats every 2 seconds to get the latest
info in eth_stats and define a driver capability flag
to notify hardware of the periodic queries.

To avoid repeated failures and log flooding, the workqueue
is not rescheduled if mana_query_gf_stats fails on HWC timeout
error and the stats are reset to 0. Other errors are transient
which will not need a VF reset for recovery.

Signed-off-by: Erni Sri Satya Vennela &lt;ernis@linux.microsoft.com&gt;
Reviewed-by: Haiyang Zhang &lt;haiyangz@microsoft.com&gt;
Link: https://patch.msgid.link/1763120599-6331-3-git-send-email-ernis@linux.microsoft.com
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</content>
</entry>
<entry>
<title>net: mana: Move hardware counter stats from per-port to per-VF context</title>
<updated>2025-11-18T03:52:30Z</updated>
<author>
<name>Erni Sri Satya Vennela</name>
<email>ernis@linux.microsoft.com</email>
</author>
<published>2025-11-14T11:43:18Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=e275d9091c01b3b46f3ec534ce4ac77cffc9e3ae'/>
<id>urn:sha1:e275d9091c01b3b46f3ec534ce4ac77cffc9e3ae</id>
<content type='text'>
Move hardware counter (HC) statistics from mana_port_context to
mana_context to enable sharing stats across multiple network ports
on the same MANA VF. Previously, each network port queried
hardware counters independently using MANA_QUERY_GF_STAT command
(GF = Generic Function stats from GDMA hardware), resulting in
redundant queries when multiple ports existed on the same device.

Isolate hardware counter stats by introducing mana_ethtool_hc_stats
in mana_context and update the code to ensure all stats are properly
reported via ethtool -S &lt;interface&gt;, maintaining consistency with
previous behavior.

Signed-off-by: Erni Sri Satya Vennela &lt;ernis@linux.microsoft.com&gt;
Reviewed-by: Haiyang Zhang &lt;haiyangz@microsoft.com&gt;
Link: https://patch.msgid.link/1763120599-6331-2-git-send-email-ernis@linux.microsoft.com
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</content>
</entry>
<entry>
<title>net: mana: Fix incorrect speed reported by debugfs</title>
<updated>2025-11-08T02:49:14Z</updated>
<author>
<name>Erni Sri Satya Vennela</name>
<email>ernis@linux.microsoft.com</email>
</author>
<published>2025-11-05T19:04:28Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=140039580efa96d18576a6c9fa6b1158be8a1d0f'/>
<id>urn:sha1:140039580efa96d18576a6c9fa6b1158be8a1d0f</id>
<content type='text'>
Once the netshaper is created for MANA, the current bandwidth
is reported in debugfs like this:

$ sudo ./tools/net/ynl/pyynl/cli.py \
  --spec Documentation/netlink/specs/net_shaper.yaml \
  --do set \
  --json '{"ifindex":'3',
           "handle":{ "scope": "netdev", "id":'1' },
           "bw-max": 200000000 }'
None

$ sudo cat /sys/kernel/debug/mana/1/vport0/current_speed
200

After the shaper  is deleted, it is expected to report
the maximum speed supported by the SKU. But currently it is
reporting 0, which is incorrect.

Fix this inconsistency, by resetting apc-&gt;speed to apc-&gt;max_speed
during deletion of the shaper object. This will improve
readability and debuggability.

Signed-off-by: Erni Sri Satya Vennela &lt;ernis@linux.microsoft.com&gt;
Reviewed-by: Jacob Keller &lt;jacob.e.keller@intel.com&gt;
Link: https://patch.msgid.link/1762369468-32570-1-git-send-email-ernis@linux.microsoft.com
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</content>
</entry>
</feed>
