<feed xmlns='http://www.w3.org/2005/Atom'>
<title>kernel/net/ipv4/udp.c, branch linux-6.2.y</title>
<subtitle>Hosts the 0x221E linux distro kernel.</subtitle>
<id>https://universe.0xinfinity.dev/distro/kernel/atom?h=linux-6.2.y</id>
<link rel='self' href='https://universe.0xinfinity.dev/distro/kernel/atom?h=linux-6.2.y'/>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/'/>
<updated>2022-11-21T13:05:39Z</updated>
<entry>
<title>net: Return errno in sk-&gt;sk_prot-&gt;get_port().</title>
<updated>2022-11-21T13:05:39Z</updated>
<author>
<name>Kuniyuki Iwashima</name>
<email>kuniyu@amazon.com</email>
</author>
<published>2022-11-18T18:25:06Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=7a7160edf1bfde25422262fb26851cef65f695d3'/>
<id>urn:sha1:7a7160edf1bfde25422262fb26851cef65f695d3</id>
<content type='text'>
We assume the correct errno is -EADDRINUSE when sk-&gt;sk_prot-&gt;get_port()
fails, so some -&gt;get_port() functions return just 1 on failure and the
callers return -EADDRINUSE instead.

However, mptcp_get_port() can return -EINVAL.  Let's not ignore the error.

Note the only exception is inet_autobind(), all of whose callers return
-EAGAIN instead.

Fixes: cec37a6e41aa ("mptcp: Handle MP_CAPABLE options for outgoing connections")
Signed-off-by: Kuniyuki Iwashima &lt;kuniyu@amazon.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>udp: Introduce optional per-netns hash table.</title>
<updated>2022-11-16T09:43:35Z</updated>
<author>
<name>Kuniyuki Iwashima</name>
<email>kuniyu@amazon.com</email>
</author>
<published>2022-11-14T21:57:57Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=9804985bf27f8fbcf0d96c7435b5ad94a2a6ea20'/>
<id>urn:sha1:9804985bf27f8fbcf0d96c7435b5ad94a2a6ea20</id>
<content type='text'>
The maximum hash table size is 64K due to the nature of the protocol. [0]
It's smaller than TCP, and fewer sockets can cause a performance drop.

On an EC2 c5.24xlarge instance (192 GiB memory), after running iperf3 in
different netns, creating 32Mi sockets without data transfer in the root
netns causes regression for the iperf3's connection.

  uhash_entries		sockets		length		Gbps
	    64K		      1		     1		5.69
			    1Mi		    16		5.27
			    2Mi		    32		4.90
			    4Mi		    64		4.09
			    8Mi		   128		2.96
			   16Mi		   256		2.06
			   32Mi		   512		1.12

The per-netns hash table breaks the lengthy lists into shorter ones.  It is
useful on a multi-tenant system with thousands of netns.  With smaller hash
tables, we can look up sockets faster, isolate noisy neighbours, and reduce
lock contention.

The max size of the per-netns table is 64K as well.  This is because the
possible hash range by udp_hashfn() always fits in 64K within the same
netns and we cannot make full use of the whole buckets larger than 64K.

  /* 0 &lt; num &lt; 64K  -&gt;  X &lt; hash &lt; X + 64K */
  (num + net_hash_mix(net)) &amp; mask;

Also, the min size is 128.  We use a bitmap to search for an available
port in udp_lib_get_port().  To keep the bitmap on the stack and not
fire the CONFIG_FRAME_WARN error at build time, we round up the table
size to 128.

The sysctl usage is the same with TCP:

  $ dmesg | cut -d ' ' -f 6- | grep "UDP hash"
  UDP hash table entries: 65536 (order: 9, 2097152 bytes, vmalloc)

  # sysctl net.ipv4.udp_hash_entries
  net.ipv4.udp_hash_entries = 65536  # can be changed by uhash_entries

  # sysctl net.ipv4.udp_child_hash_entries
  net.ipv4.udp_child_hash_entries = 0  # disabled by default

  # ip netns add test1
  # ip netns exec test1 sysctl net.ipv4.udp_hash_entries
  net.ipv4.udp_hash_entries = -65536  # share the global table

  # sysctl -w net.ipv4.udp_child_hash_entries=100
  net.ipv4.udp_child_hash_entries = 100

  # ip netns add test2
  # ip netns exec test2 sysctl net.ipv4.udp_hash_entries
  net.ipv4.udp_hash_entries = 128  # own a per-netns table with 2^n buckets

We could optimise the hash table lookup/iteration further by removing
the netns comparison for the per-netns one in the future.  Also, we
could optimise the sparse udp_hslot layout by putting it in udp_table.

[0]: https://lore.kernel.org/netdev/4ACC2815.7010101@gmail.com/

Signed-off-by: Kuniyuki Iwashima &lt;kuniyu@amazon.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>udp: Access &amp;udp_table via net.</title>
<updated>2022-11-16T09:43:35Z</updated>
<author>
<name>Kuniyuki Iwashima</name>
<email>kuniyu@amazon.com</email>
</author>
<published>2022-11-14T21:57:56Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=ba6aac1516779dd0ced22c136a2c2c4a9c70cf29'/>
<id>urn:sha1:ba6aac1516779dd0ced22c136a2c2c4a9c70cf29</id>
<content type='text'>
We will soon introduce an optional per-netns hash table
for UDP.

This means we cannot use udp_table directly in most places.

Instead, access it via net-&gt;ipv4.udp_table.

The access will be valid only while initialising udp_table
itself and creating/destroying each netns.

Signed-off-by: Kuniyuki Iwashima &lt;kuniyu@amazon.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>udp: Set NULL to udp_seq_afinfo.udp_table.</title>
<updated>2022-11-16T09:43:35Z</updated>
<author>
<name>Kuniyuki Iwashima</name>
<email>kuniyu@amazon.com</email>
</author>
<published>2022-11-14T21:57:55Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=478aee5d6bf617c932f4e9c2981f17e86e093fc5'/>
<id>urn:sha1:478aee5d6bf617c932f4e9c2981f17e86e093fc5</id>
<content type='text'>
We will soon introduce an optional per-netns hash table
for UDP.

This means we cannot use the global udp_seq_afinfo.udp_table
to fetch a UDP hash table.

Instead, set NULL to udp_seq_afinfo.udp_table for UDP and get
a proper table from net-&gt;ipv4.udp_table.

Note that we still need udp_seq_afinfo.udp_table for UDP LITE.

Signed-off-by: Kuniyuki Iwashima &lt;kuniyu@amazon.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>udp: Set NULL to sk-&gt;sk_prot-&gt;h.udp_table.</title>
<updated>2022-11-16T09:43:35Z</updated>
<author>
<name>Kuniyuki Iwashima</name>
<email>kuniyu@amazon.com</email>
</author>
<published>2022-11-14T21:57:54Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=67fb43308f4b354f13aabcc66dd5d99bfbb7e838'/>
<id>urn:sha1:67fb43308f4b354f13aabcc66dd5d99bfbb7e838</id>
<content type='text'>
We will soon introduce an optional per-netns hash table
for UDP.

This means we cannot use the global sk-&gt;sk_prot-&gt;h.udp_table
to fetch a UDP hash table.

Instead, set NULL to sk-&gt;sk_prot-&gt;h.udp_table for UDP and get
a proper table from net-&gt;ipv4.udp_table.

Note that we still need sk-&gt;sk_prot-&gt;h.udp_table for UDP LITE.

Signed-off-by: Kuniyuki Iwashima &lt;kuniyu@amazon.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>udp: Clean up some functions.</title>
<updated>2022-11-16T09:43:35Z</updated>
<author>
<name>Kuniyuki Iwashima</name>
<email>kuniyu@amazon.com</email>
</author>
<published>2022-11-14T21:57:53Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=919dfa0b20ae56060dce0436eb710717f8987d18'/>
<id>urn:sha1:919dfa0b20ae56060dce0436eb710717f8987d18</id>
<content type='text'>
This patch adds no functional change and cleans up some functions
that the following patches touch around so that we make them tidy
and easy to review/revert.  The change is mainly to keep reverse
christmas tree order.

Signed-off-by: Kuniyuki Iwashima &lt;kuniyu@amazon.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>net: Change the udp encap_err_rcv to allow use of {ip,ipv6}_icmp_error()</title>
<updated>2022-11-08T16:42:28Z</updated>
<author>
<name>David Howells</name>
<email>dhowells@redhat.com</email>
</author>
<published>2022-10-12T07:49:29Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=42fb06b391ace2aec5cdb1ebb8ff668f0a34332f'/>
<id>urn:sha1:42fb06b391ace2aec5cdb1ebb8ff668f0a34332f</id>
<content type='text'>
Change the udp encap_err_rcv signature to match ip_icmp_error() and
ipv6_icmp_error() so that those can be used from the called function and
export them.

Signed-off-by: David Howells &lt;dhowells@redhat.com&gt;
cc: Marc Dionne &lt;marc.dionne@auristor.com&gt;
cc: linux-afs@lists.infradead.org
cc: netdev@vger.kernel.org
</content>
</entry>
<entry>
<title>Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net</title>
<updated>2022-10-24T20:44:11Z</updated>
<author>
<name>Jakub Kicinski</name>
<email>kuba@kernel.org</email>
</author>
<published>2022-10-24T20:44:11Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=96917bb3a377e1ed103fd5c420a95e5fb25ca104'/>
<id>urn:sha1:96917bb3a377e1ed103fd5c420a95e5fb25ca104</id>
<content type='text'>
include/linux/net.h
  a5ef058dc4d9 ("net: introduce and use custom sockopt socket flag")
  e993ffe3da4b ("net: flag sockets supporting msghdr originated zerocopy")

Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</content>
</entry>
<entry>
<title>udp: track the forward memory release threshold in an hot cacheline</title>
<updated>2022-10-24T09:52:50Z</updated>
<author>
<name>Paolo Abeni</name>
<email>pabeni@redhat.com</email>
</author>
<published>2022-10-20T17:48:52Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=8a3854c7b8e4532063b14bed34115079b7d0cb36'/>
<id>urn:sha1:8a3854c7b8e4532063b14bed34115079b7d0cb36</id>
<content type='text'>
When the receiver process and the BH runs on different cores,
udp_rmem_release() experience a cache miss while accessing sk_rcvbuf,
as the latter shares the same cacheline with sk_forward_alloc, written
by the BH.

With this patch, UDP tracks the rcvbuf value and its update via custom
SOL_SOCKET socket options, and copies the forward memory threshold value
used by udp_rmem_release() in a different cacheline, already accessed by
the above function and uncontended.

Since the UDP socket init operation grown a bit, factor out the common
code between v4 and v6 in a shared helper.

Overall the above give a 10% peek throughput increase under UDP flood.

Signed-off-by: Paolo Abeni &lt;pabeni@redhat.com&gt;
Reviewed-by: Eric Dumazet &lt;edumazet@google.com&gt;
Acked-by: Kuniyuki Iwashima &lt;kuniyu@amazon.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>Merge tag 'io_uring-6.1-2022-10-22' of git://git.kernel.dk/linux</title>
<updated>2022-10-23T16:55:50Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2022-10-23T16:55:50Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=942e01ab90151a16b79b5c0cb8e77530d1ee3dbb'/>
<id>urn:sha1:942e01ab90151a16b79b5c0cb8e77530d1ee3dbb</id>
<content type='text'>
Pull io_uring follow-up from Jens Axboe:
 "Currently the zero-copy has automatic fallback to normal transmit, and
  it was decided that it'd be cleaner to return an error instead if the
  socket type doesn't support it.

  Zero-copy does work with UDP and TCP, it's more of a future proofing
  kind of thing (eg for samba)"

* tag 'io_uring-6.1-2022-10-22' of git://git.kernel.dk/linux:
  io_uring/net: fail zc sendmsg when unsupported by socket
  io_uring/net: fail zc send when unsupported by socket
  net: flag sockets supporting msghdr originated zerocopy
</content>
</entry>
</feed>
