<feed xmlns='http://www.w3.org/2005/Atom'>
<title>kernel/drivers/net/ethernet/intel/ixgbe/ixgbe_lib.c, branch linux-6.2.y</title>
<subtitle>Hosts the 0x221E linux distro kernel.</subtitle>
<id>https://universe.0xinfinity.dev/distro/kernel/atom?h=linux-6.2.y</id>
<link rel='self' href='https://universe.0xinfinity.dev/distro/kernel/atom?h=linux-6.2.y'/>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/'/>
<updated>2023-05-17T11:58:43Z</updated>
<entry>
<title>ixgbe: Fix panic during XDP_TX with &gt; 64 CPUs</title>
<updated>2023-05-17T11:58:43Z</updated>
<author>
<name>John Hickey</name>
<email>jjh@daedalian.us</email>
</author>
<published>2023-04-25T17:03:08Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=785b2b5b47b1aa4c31862948b312ea845401c5ec'/>
<id>urn:sha1:785b2b5b47b1aa4c31862948b312ea845401c5ec</id>
<content type='text'>
[ Upstream commit c23ae5091a8b3e50fe755257df020907e7c029bb ]

Commit 4fe815850bdc ("ixgbe: let the xdpdrv work with more than 64 cpus")
adds support to allow XDP programs to run on systems with more than
64 CPUs by locking the XDP TX rings and indexing them using cpu % 64
(IXGBE_MAX_XDP_QS).

Upon trying this out patch on a system with more than 64 cores,
the kernel paniced with an array-index-out-of-bounds at the return in
ixgbe_determine_xdp_ring in ixgbe.h, which means ixgbe_determine_xdp_q_idx
was just returning the cpu instead of cpu % IXGBE_MAX_XDP_QS.  An example
splat:

 ==========================================================================
 UBSAN: array-index-out-of-bounds in
 /var/lib/dkms/ixgbe/5.18.6+focal-1/build/src/ixgbe.h:1147:26
 index 65 is out of range for type 'ixgbe_ring *[64]'
 ==========================================================================
 BUG: kernel NULL pointer dereference, address: 0000000000000058
 #PF: supervisor read access in kernel mode
 #PF: error_code(0x0000) - not-present page
 PGD 0 P4D 0
 Oops: 0000 [#1] SMP NOPTI
 CPU: 65 PID: 408 Comm: ksoftirqd/65
 Tainted: G          IOE     5.15.0-48-generic #54~20.04.1-Ubuntu
 Hardware name: Dell Inc. PowerEdge R640/0W23H8, BIOS 2.5.4 01/13/2020
 RIP: 0010:ixgbe_xmit_xdp_ring+0x1b/0x1c0 [ixgbe]
 Code: 3b 52 d4 cf e9 42 f2 ff ff 66 0f 1f 44 00 00 0f 1f 44 00 00 55 b9
 00 00 00 00 48 89 e5 41 57 41 56 41 55 41 54 53 48 83 ec 08 &lt;44&gt; 0f b7
 47 58 0f b7 47 5a 0f b7 57 54 44 0f b7 76 08 66 41 39 c0
 RSP: 0018:ffffbc3fcd88fcb0 EFLAGS: 00010282
 RAX: ffff92a253260980 RBX: ffffbc3fe68b00a0 RCX: 0000000000000000
 RDX: ffff928b5f659000 RSI: ffff928b5f659000 RDI: 0000000000000000
 RBP: ffffbc3fcd88fce0 R08: ffff92b9dfc20580 R09: 0000000000000001
 R10: 3d3d3d3d3d3d3d3d R11: 3d3d3d3d3d3d3d3d R12: 0000000000000000
 R13: ffff928b2f0fa8c0 R14: ffff928b9be20050 R15: 000000000000003c
 FS:  0000000000000000(0000) GS:ffff92b9dfc00000(0000)
 knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: 0000000000000058 CR3: 000000011dd6a002 CR4: 00000000007706e0
 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
 PKRU: 55555554
 Call Trace:
  &lt;TASK&gt;
  ixgbe_poll+0x103e/0x1280 [ixgbe]
  ? sched_clock_cpu+0x12/0xe0
  __napi_poll+0x30/0x160
  net_rx_action+0x11c/0x270
  __do_softirq+0xda/0x2ee
  run_ksoftirqd+0x2f/0x50
  smpboot_thread_fn+0xb7/0x150
  ? sort_range+0x30/0x30
  kthread+0x127/0x150
  ? set_kthread_struct+0x50/0x50
  ret_from_fork+0x1f/0x30
  &lt;/TASK&gt;

I think this is how it happens:

Upon loading the first XDP program on a system with more than 64 CPUs,
ixgbe_xdp_locking_key is incremented in ixgbe_xdp_setup.  However,
immediately after this, the rings are reconfigured by ixgbe_setup_tc.
ixgbe_setup_tc calls ixgbe_clear_interrupt_scheme which calls
ixgbe_free_q_vectors which calls ixgbe_free_q_vector in a loop.
ixgbe_free_q_vector decrements ixgbe_xdp_locking_key once per call if
it is non-zero.  Commenting out the decrement in ixgbe_free_q_vector
stopped my system from panicing.

I suspect to make the original patch work, I would need to load an XDP
program and then replace it in order to get ixgbe_xdp_locking_key back
above 0 since ixgbe_setup_tc is only called when transitioning between
XDP and non-XDP ring configurations, while ixgbe_xdp_locking_key is
incremented every time ixgbe_xdp_setup is called.

Also, ixgbe_setup_tc can be called via ethtool --set-channels, so this
becomes another path to decrement ixgbe_xdp_locking_key to 0 on systems
with more than 64 CPUs.

Since ixgbe_xdp_locking_key only protects the XDP_TX path and is tied
to the number of CPUs present, there is no reason to disable it upon
unloading an XDP program.  To avoid confusion, I have moved enabling
ixgbe_xdp_locking_key into ixgbe_sw_init, which is part of the probe path.

Fixes: 4fe815850bdc ("ixgbe: let the xdpdrv work with more than 64 cpus")
Signed-off-by: John Hickey &lt;jjh@daedalian.us&gt;
Reviewed-by: Maciej Fijalkowski &lt;maciej.fijalkowski@intel.com&gt;
Tested-by: Chandan Kumar Rout &lt;chandanx.rout@intel.com&gt; (A Contingent Worker at Intel)
Signed-off-by: Tony Nguyen &lt;anthony.l.nguyen@intel.com&gt;
Link: https://lore.kernel.org/r/20230425170308.2522429-1-anthony.l.nguyen@intel.com
Signed-off-by: Paolo Abeni &lt;pabeni@redhat.com&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>net: drop the weight argument from netif_napi_add</title>
<updated>2022-09-29T01:57:14Z</updated>
<author>
<name>Jakub Kicinski</name>
<email>kuba@kernel.org</email>
</author>
<published>2022-09-27T13:27:53Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=b48b89f9c189d24eb5e2b4a0ac067da5a24ee86d'/>
<id>urn:sha1:b48b89f9c189d24eb5e2b4a0ac067da5a24ee86d</id>
<content type='text'>
We tell driver developers to always pass NAPI_POLL_WEIGHT
as the weight to netif_napi_add(). This may be confusing
to newcomers, drop the weight argument, those who really
need to tweak the weight can use netif_napi_add_weight().

Acked-by: Marc Kleine-Budde &lt;mkl@pengutronix.de&gt; # for CAN
Link: https://lore.kernel.org/r/20220927132753.750069-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</content>
</entry>
<entry>
<title>ixgbe: let the xdpdrv work with more than 64 cpus</title>
<updated>2021-09-30T12:38:08Z</updated>
<author>
<name>Jason Xing</name>
<email>xingwanli@kuaishou.com</email>
</author>
<published>2021-09-29T17:56:05Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=4fe815850bdc8d4cc94e06fe1de069424a895826'/>
<id>urn:sha1:4fe815850bdc8d4cc94e06fe1de069424a895826</id>
<content type='text'>
Originally, ixgbe driver doesn't allow the mounting of xdpdrv if the
server is equipped with more than 64 cpus online. So it turns out that
the loading of xdpdrv causes the "NOMEM" failure.

Actually, we can adjust the algorithm and then make it work through
mapping the current cpu to some xdp ring with the protect of @tx_lock.

Here are some numbers before/after applying this patch with xdp-example
loaded on the eth0X:

As client (tx path):
                     Before    After
TCP_STREAM send-64   734.14    714.20
TCP_STREAM send-128  1401.91   1395.05
TCP_STREAM send-512  5311.67   5292.84
TCP_STREAM send-1k   9277.40   9356.22 (not stable)
TCP_RR     send-1    22559.75  21844.22
TCP_RR     send-128  23169.54  22725.13
TCP_RR     send-512  21670.91  21412.56

As server (rx path):
                     Before    After
TCP_STREAM send-64   1416.49   1383.12
TCP_STREAM send-128  3141.49   3055.50
TCP_STREAM send-512  9488.73   9487.44
TCP_STREAM send-1k   9491.17   9356.22 (not stable)
TCP_RR     send-1    23617.74  23601.60
...

Notice: the TCP_RR mode is unstable as the official document explains.

I tested many times with different parameters combined through netperf.
Though the result is not that accurate, I cannot see much influence on
this patch. The static key is places on the hot path, but it actually
shouldn't cause a huge regression theoretically.

Co-developed-by: Shujin Li &lt;lishujin@kuaishou.com&gt;
Signed-off-by: Shujin Li &lt;lishujin@kuaishou.com&gt;
Signed-off-by: Jason Xing &lt;xingwanli@kuaishou.com&gt;
Tested-by: Sandeep Penigalapati &lt;sandeep.penigalapati@intel.com&gt;
Signed-off-by: Tony Nguyen &lt;anthony.l.nguyen@intel.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>ixgbe: Fix fall-through warnings for Clang</title>
<updated>2021-03-23T18:34:02Z</updated>
<author>
<name>Gustavo A. R. Silva</name>
<email>gustavoars@kernel.org</email>
</author>
<published>2021-03-05T09:00:00Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=27e40255e5aca08220893f3a7441741042af072d'/>
<id>urn:sha1:27e40255e5aca08220893f3a7441741042af072d</id>
<content type='text'>
In preparation to enable -Wimplicit-fallthrough for Clang, fix multiple
warnings by explicitly adding multiple break statements instead of just
letting the code fall through to the next case.

Link: https://github.com/KSPP/linux/issues/115
Signed-off-by: Gustavo A. R. Silva &lt;gustavoars@kernel.org&gt;
Signed-off-by: Tony Nguyen &lt;anthony.l.nguyen@intel.com&gt;
</content>
</entry>
<entry>
<title>net: remove napi_hash_del() from driver-facing API</title>
<updated>2020-09-10T20:08:46Z</updated>
<author>
<name>Jakub Kicinski</name>
<email>kuba@kernel.org</email>
</author>
<published>2020-09-09T17:37:51Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=5198d545dba8ad893f5e5a029ca8d43ee7bcf011'/>
<id>urn:sha1:5198d545dba8ad893f5e5a029ca8d43ee7bcf011</id>
<content type='text'>
We allow drivers to call napi_hash_del() before calling
netif_napi_del() to batch RCU grace periods. This makes
the API asymmetric and leaks internal implementation details.
Soon we will want the grace period to protect more than just
the NAPI hash table.

Restructure the API and have drivers call a new function -
__netif_napi_del() if they want to take care of RCU waits.

Note that only core was checking the return status from
napi_hash_del() so the new helper does not report if the
NAPI was actually deleted.

Some notes on driver oddness:
 - veth observed the grace period before calling netif_napi_del()
   but that should not matter
 - myri10ge observed normal RCU flavor
 - bnx2x and enic did not actually observe the grace period
   (unless they did so implicitly)
 - virtio_net and enic only unhashed Rx NAPIs

The last two points seem to indicate that the calls to
napi_hash_del() were a left over rather than an optimization.
Regardless, it's easy enough to correct them.

This patch may introduce extra synchronize_net() calls for
interfaces which set NAPI_STATE_NO_BUSY_POLL and depend on
free_netdev() to call netif_napi_del(). This seems inevitable
since we want to use RCU for netpoll dev-&gt;napi_list traversal,
and almost no drivers set IFF_DISABLE_NETPOLL.

Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>ixgbe: protect ring accesses with READ- and WRITE_ONCE</title>
<updated>2020-06-19T05:30:04Z</updated>
<author>
<name>Ciara Loftus</name>
<email>ciara.loftus@intel.com</email>
</author>
<published>2020-06-09T13:19:43Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=f140ad9fe2ae16f385f8fe4dc9cf67bb4c51d794'/>
<id>urn:sha1:f140ad9fe2ae16f385f8fe4dc9cf67bb4c51d794</id>
<content type='text'>
READ_ONCE should be used when reading rings prior to accessing the
statistics pointer. Introduce this as well as the corresponding WRITE_ONCE
usage when allocating and freeing the rings, to ensure protected access.

Signed-off-by: Ciara Loftus &lt;ciara.loftus@intel.com&gt;
Tested-by: Andrew Bowers &lt;andrewx.bowers@intel.com&gt;
Signed-off-by: Jeff Kirsher &lt;jeffrey.t.kirsher@intel.com&gt;
</content>
</entry>
<entry>
<title>ixgbe: Make use of cpumask_local_spread to improve RSS locality</title>
<updated>2019-11-04T21:12:15Z</updated>
<author>
<name>Alexander Duyck</name>
<email>alexander.h.duyck@linux.intel.com</email>
</author>
<published>2019-09-21T00:18:50Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=780e354dcdb911351c624e11a86e9b3155d4e7ab'/>
<id>urn:sha1:780e354dcdb911351c624e11a86e9b3155d4e7ab</id>
<content type='text'>
This patch is meant to address locality issues present in the ixgbe driver
when it is loaded on a system supporting multiple NUMA nodes and more CPUs
then the device can map in a 1:1 fashion. Instead of just arbitrarily
mapping itself to CPUs 0-62 it would make much more sense to map itself to
the local CPUs first, and then map itself to any remaining CPUs that might
be used.

The first effect of this is that queue 0 should always be allocated on the
local CPU/NUMA node. This is important as it is the default destination if
a packet doesn't match any existing flow director filter or RSS rule and as
such having it local should help to reduce QPI cross-talk in the event of
an unrecognized traffic type.

In addition this should increase the likelihood of the RSS queues being
allocated and used on CPUs local to the device while the ATR/Flow Director
queues would be able to route traffic directly to the CPU that is likely to
be processing it.

Signed-off-by: Alexander Duyck &lt;alexander.h.duyck@linux.intel.com&gt;
Tested-by: Andrew Bowers &lt;andrewx.bowers@intel.com&gt;
Signed-off-by: Jeff Kirsher &lt;jeffrey.t.kirsher@intel.com&gt;
</content>
</entry>
<entry>
<title>ixgbe: Use struct_size() helper</title>
<updated>2019-02-09T07:03:48Z</updated>
<author>
<name>Gustavo A. R. Silva</name>
<email>gustavo@embeddedor.com</email>
</author>
<published>2019-02-08T04:22:58Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=439bb9edd430f606cce80b791307190ba23f7266'/>
<id>urn:sha1:439bb9edd430f606cce80b791307190ba23f7266</id>
<content type='text'>
One of the more common cases of allocation size calculations is finding
the size of a structure that has a zero-sized array at the end, along
with memory for some number of elements for that array. For example:

struct foo {
    int stuff;
    struct boo entry[];
};

size = sizeof(struct foo) + count * sizeof(struct boo);
instance = kzalloc(size, GFP_KERNEL);

Instead of leaving these open-coded and prone to type mistakes, we can
now use the new struct_size() helper:

instance = kzalloc(struct_size(instance, entry, count), GFP_KERNEL);

Notice that, in this case, variable size is not necessary, hence
it is removed.

This code was detected with the help of Coccinelle.

Signed-off-by: Gustavo A. R. Silva &lt;gustavo@embeddedor.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>ixgbe: add AF_XDP zero-copy Rx support</title>
<updated>2018-10-03T19:51:14Z</updated>
<author>
<name>Björn Töpel</name>
<email>bjorn.topel@intel.com</email>
</author>
<published>2018-10-02T08:00:32Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=d0bcacd0a130974f58a56318db7a5ca6a7ba1d5a'/>
<id>urn:sha1:d0bcacd0a130974f58a56318db7a5ca6a7ba1d5a</id>
<content type='text'>
This patch adds zero-copy Rx support for AF_XDP sockets. Instead of
allocating buffers of type MEM_TYPE_PAGE_SHARED, the Rx frames are
allocated as MEM_TYPE_ZERO_COPY when AF_XDP is enabled for a certain
queue.

All AF_XDP specific functions are added to a new file, ixgbe_xsk.c.

Note that when AF_XDP zero-copy is enabled, the XDP action XDP_PASS
will allocate a new buffer and copy the zero-copy frame prior passing
it to the kernel stack.

Signed-off-by: Björn Töpel &lt;bjorn.topel@intel.com&gt;
Tested-by: William Tu &lt;u9012063@gmail.com&gt;
Tested-by: Andrew Bowers &lt;andrewx.bowers@intel.com&gt;
Signed-off-by: Jeff Kirsher &lt;jeffrey.t.kirsher@intel.com&gt;
</content>
</entry>
<entry>
<title>ixgbe: Fix setting of TC configuration for macvlan case</title>
<updated>2018-06-11T14:49:55Z</updated>
<author>
<name>Alexander Duyck</name>
<email>alexander.h.duyck@intel.com</email>
</author>
<published>2018-06-04T15:07:24Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=646bb57ce86e4d7b0bd9d33244450ae009411e48'/>
<id>urn:sha1:646bb57ce86e4d7b0bd9d33244450ae009411e48</id>
<content type='text'>
When we were enabling macvlan interfaces we weren't correctly configuring
things until ixgbe_setup_tc was called a second time either by tweaking the
number of queues or increasing the macvlan count past 15.

The issue came down to the fact that num_rx_pools is not populated until
after the queues and interrupts are reinitialized.

Instead of trying to set it sooner we can just move the call to setup at
least 1 traffic class to the SR-IOV/VMDq setup function so that we just set
it for this one case. We already had a spot that was configuring the queues
for TC 0 in the code here anyway so it makes sense to also set the number
of TCs here as well.

Fixes: 49cfbeb7a95c ("ixgbe: Fix handling of macvlan Tx offload")
Signed-off-by: Alexander Duyck &lt;alexander.h.duyck@intel.com&gt;
Tested-by: Andrew Bowers &lt;andrewx.bowers@intel.com&gt;
Signed-off-by: Jeff Kirsher &lt;jeffrey.t.kirsher@intel.com&gt;
</content>
</entry>
</feed>
