<feed xmlns='http://www.w3.org/2005/Atom'>
<title>kernel/net/ipv4/sysctl_net_ipv4.c, branch linux-5.1.y</title>
<subtitle>Hosts the 0x221E linux distro kernel.</subtitle>
<id>https://universe.0xinfinity.dev/distro/kernel/atom?h=linux-5.1.y</id>
<link rel='self' href='https://universe.0xinfinity.dev/distro/kernel/atom?h=linux-5.1.y'/>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/'/>
<updated>2019-06-17T17:50:37Z</updated>
<entry>
<title>tcp: add tcp_min_snd_mss sysctl</title>
<updated>2019-06-17T17:50:37Z</updated>
<author>
<name>Eric Dumazet</name>
<email>edumazet@google.com</email>
</author>
<published>2019-06-06T16:15:31Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=2efabe3e1491f10bf3cf82ae1a371755ba054a1b'/>
<id>urn:sha1:2efabe3e1491f10bf3cf82ae1a371755ba054a1b</id>
<content type='text'>
commit 5f3e2bf008c2221478101ee72f5cb4654b9fc363 upstream.

Some TCP peers announce a very small MSS option in their SYN and/or
SYN/ACK messages.

This forces the stack to send packets with a very high network/cpu
overhead.

Linux has enforced a minimal value of 48. Since this value includes
the size of TCP options, and that the options can consume up to 40
bytes, this means that each segment can include only 8 bytes of payload.

In some cases, it can be useful to increase the minimal value
to a saner value.

We still let the default to 48 (TCP_MIN_SND_MSS), for compatibility
reasons.

Note that TCP_MAXSEG socket option enforces a minimal value
of (TCP_MIN_MSS). David Miller increased this minimal value
in commit c39508d6f118 ("tcp: Make TCP_MAXSEG minimum more correct.")
from 64 to 88.

We might in the future merge TCP_MIN_SND_MSS and TCP_MIN_MSS.

CVE-2019-11479 -- tcp mss hardcoded to 48

Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Suggested-by: Jonathan Looney &lt;jtl@netflix.com&gt;
Acked-by: Neal Cardwell &lt;ncardwell@google.com&gt;
Cc: Yuchung Cheng &lt;ycheng@google.com&gt;
Cc: Tyler Hicks &lt;tyhicks@canonical.com&gt;
Cc: Bruce Curtis &lt;brucec@netflix.com&gt;
Cc: Jonathan Lemon &lt;jonathan.lemon@gmail.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>ipv4: set the tcp_min_rtt_wlen range from 0 to one day</title>
<updated>2019-04-17T20:57:11Z</updated>
<author>
<name>ZhangXiaoxu</name>
<email>zhangxiaoxu5@huawei.com</email>
</author>
<published>2019-04-16T01:47:24Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=19fad20d15a6494f47f85d869f00b11343ee5c78'/>
<id>urn:sha1:19fad20d15a6494f47f85d869f00b11343ee5c78</id>
<content type='text'>
There is a UBSAN report as below:
UBSAN: Undefined behaviour in net/ipv4/tcp_input.c:2877:56
signed integer overflow:
2147483647 * 1000 cannot be represented in type 'int'
CPU: 3 PID: 0 Comm: swapper/3 Not tainted 5.1.0-rc4-00058-g582549e #1
Call Trace:
 &lt;IRQ&gt;
 dump_stack+0x8c/0xba
 ubsan_epilogue+0x11/0x60
 handle_overflow+0x12d/0x170
 ? ttwu_do_wakeup+0x21/0x320
 __ubsan_handle_mul_overflow+0x12/0x20
 tcp_ack_update_rtt+0x76c/0x780
 tcp_clean_rtx_queue+0x499/0x14d0
 tcp_ack+0x69e/0x1240
 ? __wake_up_sync_key+0x2c/0x50
 ? update_group_capacity+0x50/0x680
 tcp_rcv_established+0x4e2/0xe10
 tcp_v4_do_rcv+0x22b/0x420
 tcp_v4_rcv+0xfe8/0x1190
 ip_protocol_deliver_rcu+0x36/0x180
 ip_local_deliver+0x15b/0x1a0
 ip_rcv+0xac/0xd0
 __netif_receive_skb_one_core+0x7f/0xb0
 __netif_receive_skb+0x33/0xc0
 netif_receive_skb_internal+0x84/0x1c0
 napi_gro_receive+0x2a0/0x300
 receive_buf+0x3d4/0x2350
 ? detach_buf_split+0x159/0x390
 virtnet_poll+0x198/0x840
 ? reweight_entity+0x243/0x4b0
 net_rx_action+0x25c/0x770
 __do_softirq+0x19b/0x66d
 irq_exit+0x1eb/0x230
 do_IRQ+0x7a/0x150
 common_interrupt+0xf/0xf
 &lt;/IRQ&gt;

It can be reproduced by:
  echo 2147483647 &gt; /proc/sys/net/ipv4/tcp_min_rtt_wlen

Fixes: f672258391b42 ("tcp: track min RTT using windowed min-filter")
Signed-off-by: ZhangXiaoxu &lt;zhangxiaoxu5@huawei.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>net: provide a sysctl raw_l3mdev_accept for raw socket lookup with VRFs</title>
<updated>2018-11-08T00:12:38Z</updated>
<author>
<name>Mike Manning</name>
<email>mmanning@vyatta.att-mail.com</email>
</author>
<published>2018-11-07T15:36:05Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=6897445fb194c8ad046df4a13e1ee9f080a5a21e'/>
<id>urn:sha1:6897445fb194c8ad046df4a13e1ee9f080a5a21e</id>
<content type='text'>
Add a sysctl raw_l3mdev_accept to control raw socket lookup in a manner
similar to use of tcp_l3mdev_accept for stream and of udp_l3mdev_accept
for datagram sockets. Have this default to enabled for reasons of
backwards compatibility. This is so as to specify the output device
with cmsg and IP_PKTINFO, but using a socket not bound to the
corresponding VRF. This allows e.g. older ping implementations to be
run with specifying the device but without executing it in the VRF.
If the option is disabled, packets received in a VRF context are only
handled by a raw socket bound to the VRF, and correspondingly packets
in the default VRF are only handled by a socket not bound to any VRF.

Signed-off-by: Mike Manning &lt;mmanning@vyatta.att-mail.com&gt;
Reviewed-by: David Ahern &lt;dsahern@gmail.com&gt;
Tested-by: David Ahern &lt;dsahern@gmail.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>net-tcp: /proc/sys/net/ipv4/tcp_probe_interval is a u32 not int</title>
<updated>2018-09-27T03:33:21Z</updated>
<author>
<name>Maciej Żenczykowski</name>
<email>maze@google.com</email>
</author>
<published>2018-09-26T04:59:28Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=d4ce58082f206bf6e7d697380c7bc5480a8b0264'/>
<id>urn:sha1:d4ce58082f206bf6e7d697380c7bc5480a8b0264</id>
<content type='text'>
(fix documentation and sysctl access to treat it as such)

Tested:
  # zcat /proc/config.gz | egrep ^CONFIG_HZ
  CONFIG_HZ_1000=y
  CONFIG_HZ=1000
  # echo $[(1&lt;&lt;32)/1000 + 1] | tee /proc/sys/net/ipv4/tcp_probe_interval
  4294968
  tee: /proc/sys/net/ipv4/tcp_probe_interval: Invalid argument
  # echo $[(1&lt;&lt;32)/1000] | tee /proc/sys/net/ipv4/tcp_probe_interval
  4294967
  # echo 0 | tee /proc/sys/net/ipv4/tcp_probe_interval
  # echo -1 | tee /proc/sys/net/ipv4/tcp_probe_interval
  -1
  tee: /proc/sys/net/ipv4/tcp_probe_interval: Invalid argument

Signed-off-by: Maciej Żenczykowski &lt;maze@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>net: ipv4: Notify about changes to ip_forward_update_priority</title>
<updated>2018-08-01T16:52:30Z</updated>
<author>
<name>Petr Machata</name>
<email>petrm@mellanox.com</email>
</author>
<published>2018-07-31T22:36:42Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=d18c5d1995aa322b722fa731397e28ebcd00b3c6'/>
<id>urn:sha1:d18c5d1995aa322b722fa731397e28ebcd00b3c6</id>
<content type='text'>
Drivers may make offloading decision based on whether
ip_forward_update_priority is enabled or not. Therefore distribute
netevent notifications to give them a chance to react to a change.

Signed-off-by: Petr Machata &lt;petrm@mellanox.com&gt;
Reviewed-by: Ido Schimmel &lt;idosch@mellanox.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>net: ipv4: Control SKB reprioritization after forwarding</title>
<updated>2018-08-01T16:52:30Z</updated>
<author>
<name>Petr Machata</name>
<email>petrm@mellanox.com</email>
</author>
<published>2018-07-31T22:36:03Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=432e05d328921c68c35bfdeff7d7b7400b8e3d1a'/>
<id>urn:sha1:432e05d328921c68c35bfdeff7d7b7400b8e3d1a</id>
<content type='text'>
After IPv4 packets are forwarded, the priority of the corresponding SKB
is updated according to the TOS field of IPv4 header. This overrides any
prioritization done earlier by e.g. an skbedit action or ingress-qos-map
defined at a vlan device.

Such overriding may not always be desirable. Even if the packet ends up
being routed, which implies this is an L3 network node, an administrator
may wish to preserve whatever prioritization was done earlier on in the
pipeline.

Therefore introduce a sysctl that controls this behavior. Keep the
default value at 1 to maintain backward-compatible behavior.

Signed-off-by: Petr Machata &lt;petrm@mellanox.com&gt;
Reviewed-by: Ido Schimmel &lt;idosch@mellanox.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>ipv4: Return EINVAL when ping_group_range sysctl doesn't map to user ns</title>
<updated>2018-07-06T02:51:18Z</updated>
<author>
<name>Tyler Hicks</name>
<email>tyhicks@canonical.com</email>
</author>
<published>2018-07-05T18:49:23Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=70ba5b6db96ff7324b8cfc87e0d0383cf59c9677'/>
<id>urn:sha1:70ba5b6db96ff7324b8cfc87e0d0383cf59c9677</id>
<content type='text'>
The low and high values of the net.ipv4.ping_group_range sysctl were
being silently forced to the default disabled state when a write to the
sysctl contained GIDs that didn't map to the associated user namespace.
Confusingly, the sysctl's write operation would return success and then
a subsequent read of the sysctl would indicate that the low and high
values are the overflowgid.

This patch changes the behavior by clearly returning an error when the
sysctl write operation receives a GID range that doesn't map to the
associated user namespace. In such a situation, the previous value of
the sysctl is preserved and that range will be returned in a subsequent
read of the sysctl.

Signed-off-by: Tyler Hicks &lt;tyhicks@canonical.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>tcp: fix Fast Open key endianness</title>
<updated>2018-06-30T09:40:46Z</updated>
<author>
<name>Yuchung Cheng</name>
<email>ycheng@google.com</email>
</author>
<published>2018-06-27T23:04:48Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=c860e997e9170a6d68f9d1e6e2cf61f572191aaf'/>
<id>urn:sha1:c860e997e9170a6d68f9d1e6e2cf61f572191aaf</id>
<content type='text'>
Fast Open key could be stored in different endian based on the CPU.
Previously hosts in different endianness in a server farm using
the same key config (sysctl value) would produce different cookies.
This patch fixes it by always storing it as little endian to keep
same API for LE hosts.

Reported-by: Daniele Iamartino &lt;danielei@google.com&gt;
Signed-off-by: Yuchung Cheng &lt;ycheng@google.com&gt;
Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Signed-off-by: Neal Cardwell &lt;ncardwell@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>net-tcp: extend tcp_tw_reuse sysctl to enable loopback only optimization</title>
<updated>2018-06-04T21:13:35Z</updated>
<author>
<name>Maciej Żenczykowski</name>
<email>maze@google.com</email>
</author>
<published>2018-06-03T17:41:17Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=79e9fed460385a3d8ba0b5782e9e74405cb199b1'/>
<id>urn:sha1:79e9fed460385a3d8ba0b5782e9e74405cb199b1</id>
<content type='text'>
This changes the /proc/sys/net/ipv4/tcp_tw_reuse from a boolean
to an integer.

It now takes the values 0, 1 and 2, where 0 and 1 behave as before,
while 2 enables timewait socket reuse only for sockets that we can
prove are loopback connections:
  ie. bound to 'lo' interface or where one of source or destination
  IPs is 127.0.0.0/8, ::ffff:127.0.0.0/104 or ::1.

This enables quicker reuse of ephemeral ports for loopback connections
- where tcp_tw_reuse is 100% safe from a protocol perspective
(this assumes no artificially induced packet loss on 'lo').

This also makes estblishing many loopback connections *much* faster
(allocating ports out of the first half of the ephemeral port range
is significantly faster, then allocating from the second half)

Without this change in a 32K ephemeral port space my sample program
(it just establishes and closes [::1]:ephemeral -&gt; [::1]:server_port
connections in a tight loop) fails after 32765 connections in 24 seconds.
With it enabled 50000 connections only take 4.7 seconds.

This is particularly problematic for IPv6 where we only have one local
address and cannot play tricks with varying source IP from 127.0.0.0/8
pool.

Signed-off-by: Maciej Żenczykowski &lt;maze@google.com&gt;
Cc: Neal Cardwell &lt;ncardwell@google.com&gt;
Cc: Yuchung Cheng &lt;ycheng@google.com&gt;
Cc: Wei Wang &lt;weiwan@google.com&gt;
Change-Id: I0377961749979d0301b7b62871a32a4b34b654e1
Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>tcp: add tcp_comp_sack_nr sysctl</title>
<updated>2018-05-18T15:40:27Z</updated>
<author>
<name>Eric Dumazet</name>
<email>edumazet@google.com</email>
</author>
<published>2018-05-17T21:47:29Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=9c21d2fc41c0c8930600c9dd83eadac2e336fcfa'/>
<id>urn:sha1:9c21d2fc41c0c8930600c9dd83eadac2e336fcfa</id>
<content type='text'>
This per netns sysctl allows for TCP SACK compression fine-tuning.

This limits number of SACK that can be compressed.
Using 0 disables SACK compression.

Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Acked-by: Neal Cardwell &lt;ncardwell@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
</feed>
