<feed xmlns='http://www.w3.org/2005/Atom'>
<title>kernel/net/ipv4/tcp_bbr.c, branch linux-6.2.y</title>
<subtitle>Hosts the 0x221E Linux distro kernel.</subtitle>
<id>https://universe.0xinfinity.dev/distro/kernel/atom?h=linux-6.2.y</id>
<link rel='self' href='https://universe.0xinfinity.dev/distro/kernel/atom?h=linux-6.2.y'/>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/'/>
<updated>2022-11-18T01:15:15Z</updated>
<entry>
<title>treewide: use get_random_u32_below() instead of deprecated function</title>
<updated>2022-11-18T01:15:15Z</updated>
<author>
<name>Jason A. Donenfeld</name>
<email>Jason@zx2c4.com</email>
</author>
<published>2022-10-10T02:44:02Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=8032bf1233a74627ce69b803608e650f3f35971c'/>
<id>urn:sha1:8032bf1233a74627ce69b803608e650f3f35971c</id>
<content type='text'>
This is a simple mechanical transformation, done with the following
Coccinelle semantic patch:

@@
expression E;
@@
- prandom_u32_max
+ get_random_u32_below
  (E)
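
For illustration, a hypothetical call site that picks a uniform value
in [0, n) changes like this (both helpers return a u32 uniformly
distributed below their argument):

- delay = prandom_u32_max(10 * HZ);
+ delay = get_random_u32_below(10 * HZ);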

Reviewed-by: Kees Cook &lt;keescook@chromium.org&gt;
Reviewed-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
Acked-by: Darrick J. Wong &lt;djwong@kernel.org&gt; # for xfs
Reviewed-by: SeongJae Park &lt;sj@kernel.org&gt; # for damon
Reviewed-by: Jason Gunthorpe &lt;jgg@nvidia.com&gt; # for infiniband
Reviewed-by: Russell King (Oracle) &lt;rmk+kernel@armlinux.org.uk&gt; # for arm
Acked-by: Ulf Hansson &lt;ulf.hansson@linaro.org&gt; # for mmc
Signed-off-by: Jason A. Donenfeld &lt;Jason@zx2c4.com&gt;
</content>
</entry>
<entry>
<title>bpf: Switch to new kfunc flags infrastructure</title>
<updated>2022-07-22T03:59:42Z</updated>
<author>
<name>Kumar Kartikeya Dwivedi</name>
<email>memxor@gmail.com</email>
</author>
<published>2022-07-21T13:42:35Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=a4703e3184320d6e15e2bc81d2ccf1c8c883f9d1'/>
<id>urn:sha1:a4703e3184320d6e15e2bc81d2ccf1c8c883f9d1</id>
<content type='text'>
Instead of populating multiple sets to indicate some attribute and then
searching for the same BTF ID in each of them, prepare a single unified
BTF set which indicates whether a kfunc is allowed to be called, and
also its attributes, if any, at the same time. Now only one lookup is
needed to determine both kfunc availability and its attributes.
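
A minimal sketch of the resulting registration style (the kfunc names
here are hypothetical; the set and flag macros follow this patch):

BTF_SET8_START(example_kfunc_ids)
BTF_ID_FLAGS(func, bpf_example_acquire, KF_ACQUIRE | KF_RET_NULL)
BTF_ID_FLAGS(func, bpf_example_release, KF_RELEASE)
BTF_SET8_END(example_kfunc_ids)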

Signed-off-by: Kumar Kartikeya Dwivedi &lt;memxor@gmail.com&gt;
Link: https://lore.kernel.org/r/20220721134245.2450-4-memxor@gmail.com
Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
</content>
</entry>
<entry>
<title>net: allow gso_max_size to exceed 65536</title>
<updated>2022-05-16T09:18:55Z</updated>
<author>
<name>Alexander Duyck</name>
<email>alexanderduyck@fb.com</email>
</author>
<published>2022-05-13T18:33:57Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=7c4e983c4f3cf94fcd879730c6caa877e0768a4d'/>
<id>urn:sha1:7c4e983c4f3cf94fcd879730c6caa877e0768a4d</id>
<content type='text'>
The gso_max_size code was originally added to allow debugging and
working around buggy devices that couldn't support TSO with blocks 64K
in size. The value was limited to 64K because that was the existing
limit of the IPv4 and non-jumbogram IPv6 length fields.

With the addition of Big TCP we can remove this limit and allow the value
to potentially go up to UINT_MAX and instead be limited by the tso_max_size
value.

To support this, go through and clean up the remaining users of the
gso_max_size value so that those values cap at 64K for non-TCPv6
flows. In addition, clean up the GSO_MAX_SIZE value so that 64K
becomes GSO_LEGACY_MAX_SIZE and UINT_MAX becomes the upper limit for
GSO_MAX_SIZE.
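
A sketch of the resulting constants and the legacy cap (names per the
description above; the exact clamping sites vary by call path):

#define GSO_LEGACY_MAX_SIZE	65536u
#define GSO_MAX_SIZE		UINT_MAX

/* non-TCPv6 flows must still cap at the legacy 64K limit */
gso_size = min(gso_size, GSO_LEGACY_MAX_SIZE);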

v6: (edumazet) fixed a compile error if CONFIG_IPV6=n,
               in a new sk_trim_gso_size() helper.
               netif_set_tso_max_size() caps the requested TSO size
               with GSO_MAX_SIZE.

Signed-off-by: Alexander Duyck &lt;alexanderduyck@fb.com&gt;
Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>tcp: add accessors to read/set tp-&gt;snd_cwnd</title>
<updated>2022-04-06T19:05:41Z</updated>
<author>
<name>Eric Dumazet</name>
<email>edumazet@google.com</email>
</author>
<published>2022-04-05T23:35:38Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=40570375356c874b1578e05c1dcc3ff7c1322dbe'/>
<id>urn:sha1:40570375356c874b1578e05c1dcc3ff7c1322dbe</id>
<content type='text'>
We had various bugs over the years with code
breaking the assumption that tp-&gt;snd_cwnd is greater
than zero.

Lately, syzbot reported that the WARN_ON_ONCE(!tp-&gt;prior_cwnd) added
in commit 8b8a321ff72c ("tcp: fix zero cwnd in tcp_cwnd_reduction")
can trigger, and without a repro we would have to spend considerable
time finding the bug.

Instead of complaining too late, we want to catch where
and when tp-&gt;snd_cwnd is set to an illegal value.
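
A sketch of the accessor pair this introduces (per the title above;
the WARN catches any writer installing an illegal value):

static inline u32 tcp_snd_cwnd(const struct tcp_sock *tp)
{
	return tp-&gt;snd_cwnd;
}

static inline void tcp_snd_cwnd_set(struct tcp_sock *tp, u32 val)
{
	WARN_ON_ONCE((int)val &lt;= 0);
	tp-&gt;snd_cwnd = val;
}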

Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Suggested-by: Yuchung Cheng &lt;ycheng@google.com&gt;
Cc: Neal Cardwell &lt;ncardwell@google.com&gt;
Acked-by: Yuchung Cheng &lt;ycheng@google.com&gt;
Link: https://lore.kernel.org/r/20220405233538.947344-1-eric.dumazet@gmail.com
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</content>
</entry>
<entry>
<title>bpf: Remove check_kfunc_call callback and old kfunc BTF ID API</title>
<updated>2022-01-18T22:26:41Z</updated>
<author>
<name>Kumar Kartikeya Dwivedi</name>
<email>memxor@gmail.com</email>
</author>
<published>2022-01-14T16:39:46Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=b202d84422223b7222cba5031d182f20b37e146e'/>
<id>urn:sha1:b202d84422223b7222cba5031d182f20b37e146e</id>
<content type='text'>
Completely remove the old check_kfunc_call code that made kfunc calls
work with modules, and also the callback itself.

The previous commit adds infrastructure to register all sets and put
them in vmlinux or module BTF, and concatenates all related sets
organized by the hook and the type. Once populated, these sets remain
immutable for the lifetime of the struct btf.

Also, since we don't need the 'owner' module anywhere when doing
check_kfunc_call, drop the 'btf_modp' module parameter from
find_kfunc_desc_btf.

Signed-off-by: Kumar Kartikeya Dwivedi &lt;memxor@gmail.com&gt;
Link: https://lore.kernel.org/r/20220114163953.1455836-4-memxor@gmail.com
Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
</content>
</entry>
<entry>
<title>bpf: Enable TCP congestion control kfunc from modules</title>
<updated>2021-10-06T00:07:41Z</updated>
<author>
<name>Kumar Kartikeya Dwivedi</name>
<email>memxor@gmail.com</email>
</author>
<published>2021-10-02T01:17:53Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=0e32dfc80bae53b05e9eda7eaf259f30ab9ba43a'/>
<id>urn:sha1:0e32dfc80bae53b05e9eda7eaf259f30ab9ba43a</id>
<content type='text'>
This commit moves the BTF ID lookup into the newly added registration
helper, so that the bbr, cubic, and dctcp implementations set up their
sets in the bpf_tcp_ca kfunc_btf_set list, while the ones not dependent
on modules are looked up from the wrapper function.

This lifts the restriction that they be compiled as built-in objects;
they can now be loaded as modules if required. Also modify
Makefile.modfinal to call resolve_btfids for each module.

Note that since kernel kfunc_ids never overlap with module kfunc_ids, we
only match the owner for module btf id sets.
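
The registration in tcp_bbr.c is then roughly (a sketch; the ifdef
guards discussed below are omitted, and only two of the kfuncs are
shown):

BTF_SET_START(tcp_bbr_kfunc_ids)
BTF_ID(func, bbr_init)
BTF_ID(func, bbr_main)
BTF_SET_END(tcp_bbr_kfunc_ids)

static DEFINE_KFUNC_BTF_ID_SET(&amp;tcp_bbr_kfunc_ids, tcp_bbr_kfunc_btf_set);

/* in bbr_register(), mirrored by an unregister in bbr_unregister() */
register_kfunc_btf_id_set(&amp;bpf_tcp_ca_kfunc_list, &amp;tcp_bbr_kfunc_btf_set);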

See the following commits for background on the use of:

 CONFIG_X86 ifdef:
 569c484f9995 (bpf: Limit static tcp-cc functions in the .BTF_ids list to x86)

 CONFIG_DYNAMIC_FTRACE ifdef:
 7aae231ac93b (bpf: tcp: Limit calling some tcp cc functions to CONFIG_DYNAMIC_FTRACE)

Signed-off-by: Kumar Kartikeya Dwivedi &lt;memxor@gmail.com&gt;
Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
Link: https://lore.kernel.org/bpf/20211002011757.311265-6-memxor@gmail.com
</content>
</entry>
<entry>
<title>tcp_bbr: fix u32 wrap bug in round logic if bbr_init() called after 2B packets</title>
<updated>2021-08-11T22:00:15Z</updated>
<author>
<name>Neal Cardwell</name>
<email>ncardwell@google.com</email>
</author>
<published>2021-08-11T02:40:56Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=6de035fec045f8ae5ee5f3a02373a18b939e91fb'/>
<id>urn:sha1:6de035fec045f8ae5ee5f3a02373a18b939e91fb</id>
<content type='text'>
Currently, if BBR congestion control is initialized after more than 2B
packets have been delivered, then depending on the phase of the
tp-&gt;delivered counter, the tracking of BBR round trips can get stuck.

The bug arises because if tp-&gt;delivered is between 2^31 and 2^32 at
the time the BBR congestion control module is initialized, then the
initialization of bbr-&gt;next_rtt_delivered to 0 will cause the logic to
believe that the end of the round trip is still billions of packets in
the future. More specifically, the following check will fail
repeatedly:

  !before(rs-&gt;prior_delivered, bbr-&gt;next_rtt_delivered)

and thus the connection will take up to 2B more delivered packets
before that check passes and the connection sets:

  bbr-&gt;round_start = 1;

This could cause many mechanisms in BBR to fail to trigger, for
example bbr_check_full_bw_reached() would likely never exit STARTUP.
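
The fix is to start round tracking from the current counter rather
than zero (a sketch of the bbr_init() change; before(a, b) is TCP's
wrap-safe u32 comparison, (s32)(a - b) &lt; 0):

-	bbr-&gt;next_rtt_delivered = 0;
+	bbr-&gt;next_rtt_delivered = tp-&gt;delivered;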

This bug is 5 years old and has not been observed in practice; it
would rarely trigger, since it would require transferring at least 2B
packets, or likely more than 3 terabytes of data, before switching
congestion control algorithms to BBR.

This patch is a stable candidate for kernels as far back as v4.9,
when tcp_bbr.c was added.

Fixes: 0f8782ea1497 ("tcp_bbr: add BBR congestion control")
Signed-off-by: Neal Cardwell &lt;ncardwell@google.com&gt;
Reviewed-by: Yuchung Cheng &lt;ycheng@google.com&gt;
Reviewed-by: Kevin Yang &lt;yyd@google.com&gt;
Reviewed-by: Eric Dumazet &lt;edumazet@google.com&gt;
Link: https://lore.kernel.org/r/20210811024056.235161-1-ncardwell@google.com
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</content>
</entry>
<entry>
<title>tcp: only postpone PROBE_RTT if RTT is &lt; current min_rtt estimate</title>
<updated>2020-11-17T19:03:22Z</updated>
<author>
<name>Ryan Sharpelletti</name>
<email>sharpelletti@google.com</email>
</author>
<published>2020-11-16T17:44:13Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=1b9e2a8c99a5c021041bfb2d512dc3ed92a94ffd'/>
<id>urn:sha1:1b9e2a8c99a5c021041bfb2d512dc3ed92a94ffd</id>
<content type='text'>
During loss recovery, retransmitted packets are forced to use TCP
timestamps to calculate the RTT samples, which have a millisecond
granularity. BBR is designed using a microsecond granularity. As a
result, multiple RTT samples could be truncated to the same RTT value
during loss recovery. This is problematic, as BBR will not enter
PROBE_RTT if the RTT sample is &lt;= the current min_rtt sample, meaning
that if there are persistent losses, PROBE_RTT will constantly be
pushed off and potentially never re-entered. This patch makes sure
that BBR enters PROBE_RTT by checking whether the RTT sample is &lt; the
current min_rtt sample, rather than &lt;=.
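
In bbr_update_min_rtt() the condition becomes strict (a sketch of the
change):

-	if (rs-&gt;rtt_us &gt;= 0 &amp;&amp; (rs-&gt;rtt_us &lt;= bbr-&gt;min_rtt_us ||
+	if (rs-&gt;rtt_us &gt;= 0 &amp;&amp; (rs-&gt;rtt_us &lt; bbr-&gt;min_rtt_us ||
 	     (filter_expired &amp;&amp; !rs-&gt;is_ack_delayed))) {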

The Netflix transport/TCP team discovered this bug in the Linux TCP
BBR code during lab tests.

Fixes: 0f8782ea1497 ("tcp_bbr: add BBR congestion control")
Signed-off-by: Ryan Sharpelletti &lt;sharpelletti@google.com&gt;
Signed-off-by: Neal Cardwell &lt;ncardwell@google.com&gt;
Signed-off-by: Soheil Hassas Yeganeh &lt;soheil@google.com&gt;
Signed-off-by: Yuchung Cheng &lt;ycheng@google.com&gt;
Link: https://lore.kernel.org/r/20201116174412.1433277-1-sharpelletti.kdev@gmail.com
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</content>
</entry>
<entry>
<title>tcp_bbr: improve arithmetic division in bbr_update_bw()</title>
<updated>2020-01-21T09:45:49Z</updated>
<author>
<name>Wen Yang</name>
<email>wenyang@linux.alibaba.com</email>
</author>
<published>2020-01-20T10:04:56Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=5b2f1f3070b6447b76174ea8bfb7390dc6253ebd'/>
<id>urn:sha1:5b2f1f3070b6447b76174ea8bfb7390dc6253ebd</id>
<content type='text'>
do_div() does a 64-by-32 division. Use div64_long() instead when the
divisor is a long, to avoid truncating it to 32 bits. As a nice side
effect, this also cleans up the function a bit.
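
In bbr_update_bw() this turns the two-step pattern (a sketch):

	bw = (u64)rs-&gt;delivered * BW_UNIT;
	do_div(bw, rs-&gt;interval_us);	/* 64-by-32: divisor truncated */

into a single full-width division:

	bw = div64_long((u64)rs-&gt;delivered * BW_UNIT, rs-&gt;interval_us);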

Signed-off-by: Wen Yang &lt;wenyang@linux.alibaba.com&gt;
Cc: Eric Dumazet &lt;edumazet@google.com&gt;
Cc: "David S. Miller" &lt;davem@davemloft.net&gt;
Cc: Alexey Kuznetsov &lt;kuznet@ms2.inr.ac.ru&gt;
Cc: Hideaki YOSHIFUJI &lt;yoshfuji@linux-ipv6.org&gt;
Cc: netdev@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>net: annotate lockless accesses to sk-&gt;sk_pacing_shift</title>
<updated>2019-12-18T06:09:52Z</updated>
<author>
<name>Eric Dumazet</name>
<email>edumazet@google.com</email>
</author>
<published>2019-12-17T02:51:03Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=7c68fa2bddda6d942bd387c9ba5b4300737fd991'/>
<id>urn:sha1:7c68fa2bddda6d942bd387c9ba5b4300737fd991</id>
<content type='text'>
sk-&gt;sk_pacing_shift can be read and written without lock
synchronization. This patch adds annotations to
document this fact and avoid future syzbot complaints.

This might also avoid unexpected false sharing
in sk_pacing_shift_update(), as the compiler
could otherwise remove the conditional check and always
write over sk-&gt;sk_pacing_shift:

if (sk-&gt;sk_pacing_shift != val)
	sk-&gt;sk_pacing_shift = val;
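
With the annotations, the same update reads (a sketch of the
annotated form):

if (READ_ONCE(sk-&gt;sk_pacing_shift) != val)
	WRITE_ONCE(sk-&gt;sk_pacing_shift, val);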

Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
</feed>
