<feed xmlns='http://www.w3.org/2005/Atom'>
<title>kernel/include/net/pkt_sched.h, branch linux-5.11.y</title>
<subtitle>Hosts the 0x221E linux distro kernel.</subtitle>
<id>https://universe.0xinfinity.dev/distro/kernel/atom?h=linux-5.11.y</id>
<link rel='self' href='https://universe.0xinfinity.dev/distro/kernel/atom?h=linux-5.11.y'/>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/'/>
<updated>2020-11-07T18:41:15Z</updated>
<entry>
<title>net: sched: convert tasklets to use new tasklet_setup() API</title>
<updated>2020-11-07T18:41:15Z</updated>
<author>
<name>Allen Pais</name>
<email>apais@linux.microsoft.com</email>
</author>
<published>2020-11-03T09:18:21Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=6e1978a9a941d342a8b2e6432f52e07d4af86e21'/>
<id>urn:sha1:6e1978a9a941d342a8b2e6432f52e07d4af86e21</id>
<content type='text'>
In preparation for unconditionally passing the
struct tasklet_struct pointer to all tasklet
callbacks, switch to using the new tasklet_setup()
and from_tasklet() to pass the tasklet pointer explicitly.

Signed-off-by: Romain Perier &lt;romain.perier@gmail.com&gt;
Signed-off-by: Allen Pais &lt;apais@linux.microsoft.com&gt;
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</content>
</entry>
<entry>
<title>net/sched: get rid of qdisc-&gt;padded</title>
<updated>2020-10-09T15:08:08Z</updated>
<author>
<name>Eric Dumazet</name>
<email>edumazet@google.com</email>
</author>
<published>2020-10-07T16:51:11Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=846e463a70e910f2a831aea19f9a361422a2ff5b'/>
<id>urn:sha1:846e463a70e910f2a831aea19f9a361422a2ff5b</id>
<content type='text'>
kmalloc() of sufficiently big portion of memory is cache-aligned
in regular conditions. If some debugging options are used,
there is no reason qdisc structures would need 64-byte alignment
if most other kernel structures are not aligned.

This get rid of QDISC_ALIGN and QDISC_ALIGNTO.

Addition of privdata field will help implementing
the reverse of qdisc_priv() and documents where
the private data is.

Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Cc: Allen Pais &lt;allen.lkml@gmail.com&gt;
Acked-by: Cong Wang &lt;xiyou.wangcong@gmail.com&gt;
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</content>
</entry>
<entry>
<title>sched: consistently handle layer3 header accesses in the presence of VLANs</title>
<updated>2020-07-03T21:34:53Z</updated>
<author>
<name>Toke Høiland-Jørgensen</name>
<email>toke@redhat.com</email>
</author>
<published>2020-07-03T20:26:43Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=d7bf2ebebc2bd61ab95e2a8e33541ef282f303d4'/>
<id>urn:sha1:d7bf2ebebc2bd61ab95e2a8e33541ef282f303d4</id>
<content type='text'>
There are a couple of places in net/sched/ that check skb-&gt;protocol and act
on the value there. However, in the presence of VLAN tags, the value stored
in skb-&gt;protocol can be inconsistent based on whether VLAN acceleration is
enabled. The commit quoted in the Fixes tag below fixed the users of
skb-&gt;protocol to use a helper that will always see the VLAN ethertype.

However, most of the callers don't actually handle the VLAN ethertype, but
expect to find the IP header type in the protocol field. This means that
things like changing the ECN field, or parsing diffserv values, stops
working if there's a VLAN tag, or if there are multiple nested VLAN
tags (QinQ).

To fix this, change the helper to take an argument that indicates whether
the caller wants to skip the VLAN tags or not. When skipping VLAN tags, we
make sure to skip all of them, so behaviour is consistent even in QinQ
mode.

To make the helper usable from the ECN code, move it to if_vlan.h instead
of pkt_sched.h.

v3:
- Remove empty lines
- Move vlan variable definitions inside loop in skb_protocol()
- Also use skb_protocol() helper in IP{,6}_ECN_decapsulate() and
  bpf_skb_ecn_set_ce()

v2:
- Use eth_type_vlan() helper in skb_protocol()
- Also fix code that reads skb-&gt;protocol directly
- Change a couple of 'if/else if' statements to switch constructs to avoid
  calling the helper twice

Reported-by: Ilya Ponetayev &lt;i.ponetaev@ndmsystems.com&gt;
Fixes: d8b9605d2697 ("net: sched: fix skb-&gt;protocol use in case of accelerated vlan path")
Signed-off-by: Toke Høiland-Jørgensen &lt;toke@redhat.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>net_sched: add qdisc_watchdog_schedule_range_ns()</title>
<updated>2020-03-18T04:16:34Z</updated>
<author>
<name>Eric Dumazet</name>
<email>edumazet@google.com</email>
</author>
<published>2020-03-17T02:12:49Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=efe074c2cc1c67021481f56ba87cc61d1ab03e33'/>
<id>urn:sha1:efe074c2cc1c67021481f56ba87cc61d1ab03e33</id>
<content type='text'>
Some packet schedulers might want to add a slack
when programming hrtimers. This can reduce number
of interrupts and increase batch sizes and thus
give good xmit_more savings.

This commit adds qdisc_watchdog_schedule_range_ns()
helper, with an extra delta_ns parameter.

Legacy qdisc_watchdog_schedule_n() becomes an inline
passing a zero slack.

Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>net: sched: Replace zero-length array with flexible-array member</title>
<updated>2020-03-01T05:27:02Z</updated>
<author>
<name>Gustavo A. R. Silva</name>
<email>gustavo@embeddedor.com</email>
</author>
<published>2020-02-27T20:58:44Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=b90feaff2a2cbf339069adec4bfd6091cfb44b50'/>
<id>urn:sha1:b90feaff2a2cbf339069adec4bfd6091cfb44b50</id>
<content type='text'>
The current codebase makes use of the zero-length array language
extension to the C90 standard, but the preferred mechanism to declare
variable-length types such as these ones is a flexible array member[1][2],
introduced in C99:

struct foo {
        int stuff;
        struct boo array[];
};

By making use of the mechanism above, we will get a compiler warning
in case the flexible array does not occur last in the structure, which
will help us prevent some kind of undefined behavior bugs from being
inadvertently introduced[3] to the codebase from now on.

Also, notice that, dynamic memory allocations won't be affected by
this change:

"Flexible array members have incomplete type, and so the sizeof operator
may not be applied. As a quirk of the original implementation of
zero-length arrays, sizeof evaluates to zero."[1]

This issue was found with the help of Coccinelle.

[1] https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html
[2] https://github.com/KSPP/linux/issues/21
[3] commit 76497732932f ("cxgb3/l2t: Fix undefined behaviour")

Signed-off-by: Gustavo A. R. Silva &lt;gustavo@embeddedor.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>Merge ra.kernel.org:/pub/scm/linux/kernel/git/netdev/net</title>
<updated>2019-09-17T21:51:10Z</updated>
<author>
<name>David S. Miller</name>
<email>davem@davemloft.net</email>
</author>
<published>2019-09-17T21:51:10Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=1bab8d4c488be22d57f9dd09968c90a0ddc413bf'/>
<id>urn:sha1:1bab8d4c488be22d57f9dd09968c90a0ddc413bf</id>
<content type='text'>
Pull in bug fixes from 'net' tree for the merge window.

Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>taprio: Add support for hardware offloading</title>
<updated>2019-09-16T19:32:57Z</updated>
<author>
<name>Vinicius Costa Gomes</name>
<email>vinicius.gomes@intel.com</email>
</author>
<published>2019-09-15T01:59:58Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=9c66d15646760eb8982242b4531c4d4fd36118fd'/>
<id>urn:sha1:9c66d15646760eb8982242b4531c4d4fd36118fd</id>
<content type='text'>
This allows taprio to offload the schedule enforcement to capable
network cards, resulting in more precise windows and less CPU usage.

The gate mask acts on traffic classes (groups of queues of same
priority), as specified in IEEE 802.1Q-2018, and following the existing
taprio and mqprio semantics.
It is up to the driver to perform conversion between tc and individual
netdev queues if for some reason it needs to make that distinction.

Full offload is requested from the network interface by specifying
"flags 2" in the tc qdisc creation command, which in turn corresponds to
the TCA_TAPRIO_ATTR_FLAG_FULL_OFFLOAD bit.

The important detail here is the clockid which is implicitly /dev/ptpN
for full offload, and hence not configurable.

A reference counting API is added to support the use case where Ethernet
drivers need to keep the taprio offload structure locally (i.e. they are
a multi-port switch driver, and configuring a port depends on the
settings of other ports as well). The refcount_t variable is kept in a
private structure (__tc_taprio_qopt_offload) and not exposed to drivers.

In the future, the private structure might also be expanded with a
backpointer to taprio_sched *q, to implement the notification system
described in the patch (of when admin became oper, or an error occurred,
etc, so the offload can be monitored with 'tc qdisc show').

Signed-off-by: Vinicius Costa Gomes &lt;vinicius.gomes@intel.com&gt;
Signed-off-by: Voon Weifeng &lt;weifeng.voon@intel.com&gt;
Signed-off-by: Vladimir Oltean &lt;olteanv@gmail.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>net/sched: fix race between deactivation and dequeue for NOLOCK qdisc</title>
<updated>2019-09-15T18:35:55Z</updated>
<author>
<name>Paolo Abeni</name>
<email>pabeni@redhat.com</email>
</author>
<published>2019-09-12T10:02:42Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=d518d2ed8640c1cbbbb6f63939e3e65471817367'/>
<id>urn:sha1:d518d2ed8640c1cbbbb6f63939e3e65471817367</id>
<content type='text'>
The test implemented by some_qdisc_is_busy() is somewhat loosy for
NOLOCK qdisc, as we may hit the following scenario:

CPU1						CPU2
// in net_tx_action()
clear_bit(__QDISC_STATE_SCHED...);
						// in some_qdisc_is_busy()
						val = (qdisc_is_running(q) ||
						       test_bit(__QDISC_STATE_SCHED,
								&amp;q-&gt;state));
						// here val is 0 but...
qdisc_run(q)
// ... CPU1 is going to run the qdisc next

As a conseguence qdisc_run() in net_tx_action() can race with qdisc_reset()
in dev_qdisc_reset(). Such race is not possible for !NOLOCK qdisc as
both the above bit operations are under the root qdisc lock().

After commit 021a17ed796b ("pfifo_fast: drop unneeded additional lock on dequeue")
the race can cause use after free and/or null ptr dereference, but the root
cause is likely older.

This patch addresses the issue explicitly checking for deactivation under
the seqlock for NOLOCK qdisc, so that the qdisc_run() in the critical
scenario becomes a no-op.

Note that the enqueue() op can still execute concurrently with dev_qdisc_reset(),
but that is safe due to the skb_array() locking, and we can't avoid that
for NOLOCK qdiscs.

Fixes: 021a17ed796b ("pfifo_fast: drop unneeded additional lock on dequeue")
Reported-by: Li Shuang &lt;shuali@redhat.com&gt;
Reported-and-tested-by: Davide Caratti &lt;dcaratti@redhat.com&gt;
Signed-off-by: Paolo Abeni &lt;pabeni@redhat.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>net: sched: extend Qdisc with rcu</title>
<updated>2018-09-26T03:17:35Z</updated>
<author>
<name>Vlad Buslov</name>
<email>vladbu@mellanox.com</email>
</author>
<published>2018-09-24T16:22:51Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=3a7d0d07a386716b459b00783b11a8211cefcc0f'/>
<id>urn:sha1:3a7d0d07a386716b459b00783b11a8211cefcc0f</id>
<content type='text'>
Currently, Qdisc API functions assume that users have rtnl lock taken. To
implement rtnl unlocked classifiers update interface, Qdisc API must be
extended with functions that do not require rtnl lock.

Extend Qdisc structure with rcu. Implement special version of put function
qdisc_put_unlocked() that is called without rtnl lock taken. This function
only takes rtnl lock if Qdisc reference counter reached zero and is
intended to be used as optimization.

Signed-off-by: Vlad Buslov &lt;vladbu@mellanox.com&gt;
Acked-by: Jiri Pirko &lt;jiri@mellanox.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>net/sched: Add HW offloading capability to ETF</title>
<updated>2018-07-04T13:30:27Z</updated>
<author>
<name>Jesus Sanchez-Palencia</name>
<email>jesus.sanchez-palencia@intel.com</email>
</author>
<published>2018-07-03T22:42:54Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=88cab77162e86e0f6a2b7e4f859c1435c4e24feb'/>
<id>urn:sha1:88cab77162e86e0f6a2b7e4f859c1435c4e24feb</id>
<content type='text'>
Add infra so etf qdisc supports HW offload of time-based transmission.

For hw offload, the time sorted list is still used, so packets are
dequeued always in order of txtime.

Example:

$ tc qdisc replace dev enp2s0 parent root handle 100 mqprio num_tc 3 \
           map 2 2 1 0 2 2 2 2 2 2 2 2 2 2 2 2 queues 1@0 1@1 2@2 hw 0

$ tc qdisc add dev enp2s0 parent 100:1 etf offload delta 100000 \
	   clockid CLOCK_REALTIME

In this example, the Qdisc will use HW offload for the control of the
transmission time through the network adapter. The hrtimer used for
packets scheduling inside the qdisc will use the clockid CLOCK_REALTIME
as reference and packets leave the Qdisc "delta" (100000) nanoseconds
before their transmission time. Because this will be using HW offload and
since dynamic clocks are not supported by the hrtimer, the system clock
and the PHC clock must be synchronized for this mode to behave as
expected.

Signed-off-by: Jesus Sanchez-Palencia &lt;jesus.sanchez-palencia@intel.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
</feed>
