<feed xmlns='http://www.w3.org/2005/Atom'>
<title>kernel/net/ipv6/reassembly.c, branch linux-5.1.y</title>
<subtitle>Hosts the 0x221E linux distro kernel.</subtitle>
<id>https://universe.0xinfinity.dev/distro/kernel/atom?h=linux-5.1.y</id>
<link rel='self' href='https://universe.0xinfinity.dev/distro/kernel/atom?h=linux-5.1.y'/>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/'/>
<updated>2019-02-26T16:27:05Z</updated>
<entry>
<title>net: remove unused struct inet_frag_queue.fragments field</title>
<updated>2019-02-26T16:27:05Z</updated>
<author>
<name>Peter Oskolkov</name>
<email>posk@google.com</email>
</author>
<published>2019-02-26T01:43:46Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=d8cf757fbd3ee96a449f656707e773c91ca805b8'/>
<id>urn:sha1:d8cf757fbd3ee96a449f656707e773c91ca805b8</id>
<content type='text'>
Now that all users of struct inet_frag_queue have been converted
to use 'rb_fragments', remove the unused 'fragments' field.

Build with `make allyesconfig` succeeded. ip_defrag selftest passed.

Signed-off-by: Peter Oskolkov &lt;posk@google.com&gt;
Acked-by: Stefan Schmidt &lt;stefan@datenfreihafen.org&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>net: IP6 defrag: use rbtrees for IPv6 defrag</title>
<updated>2019-01-26T05:37:11Z</updated>
<author>
<name>Peter Oskolkov</name>
<email>posk@google.com</email>
</author>
<published>2019-01-22T18:02:51Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=d4289fcc9b16b89619ee1c54f829e05e56de8b9a'/>
<id>urn:sha1:d4289fcc9b16b89619ee1c54f829e05e56de8b9a</id>
<content type='text'>
Currently, IPv6 defragmentation code drops non-last fragments that
are smaller than 1280 bytes: see
commit 0ed4229b08c1 ("ipv6: defrag: drop non-last frags smaller than min mtu")

This behavior is not specified in IPv6 RFCs and appears to break
compatibility with some IPv6 implemenations, as reported here:
https://www.spinics.net/lists/netdev/msg543846.html

This patch re-uses common IP defragmentation queueing and reassembly
code in IPv6, removing the 1280 byte restriction.

Signed-off-by: Peter Oskolkov &lt;posk@google.com&gt;
Reported-by: Tom Herbert &lt;tom@herbertland.com&gt;
Cc: Eric Dumazet &lt;edumazet@google.com&gt;
Cc: Florian Westphal &lt;fw@strlen.de&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>ipv6: fix typo in net/ipv6/reassembly.c</title>
<updated>2018-12-30T21:02:46Z</updated>
<author>
<name>Su Yanjun</name>
<email>suyj.fnst@cn.fujitsu.com</email>
</author>
<published>2018-12-29T19:07:55Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=7f334a7e1ae113212e39aafba51352ea9ab8c9f9'/>
<id>urn:sha1:7f334a7e1ae113212e39aafba51352ea9ab8c9f9</id>
<content type='text'>
Signed-off-by: Su Yanjun &lt;suyj.fnst@cn.fujitsu.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>ipv6: frags: Fix bogus skb-&gt;sk in reassembled packets</title>
<updated>2018-12-21T00:31:36Z</updated>
<author>
<name>Herbert Xu</name>
<email>herbert@gondor.apana.org.au</email>
</author>
<published>2018-12-20T13:20:10Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=d15f5ac8deea936d3adf629421a66a88b42b8a2f'/>
<id>urn:sha1:d15f5ac8deea936d3adf629421a66a88b42b8a2f</id>
<content type='text'>
It was reported that IPsec would crash when it encounters an IPv6
reassembled packet because skb-&gt;sk is non-zero and not a valid
pointer.

This is because skb-&gt;sk is now a union with ip_defrag_offset.

This patch fixes this by resetting skb-&gt;sk when exiting from
the reassembly code.

Reported-by: Xiumei Mu &lt;xmu@redhat.com&gt;
Fixes: 219badfaade9 ("ipv6: frags: get rid of ip6frag_skb_cb/...")
Signed-off-by: Herbert Xu &lt;herbert@gondor.apana.org.au&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>ipv4: ipv6: netfilter: Adjust the frag mem limit when truesize changes</title>
<updated>2018-12-06T04:44:46Z</updated>
<author>
<name>Jiri Wiesner</name>
<email>jwiesner@suse.com</email>
</author>
<published>2018-12-05T15:55:29Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=ebaf39e6032faf77218220707fc3fa22487784e0'/>
<id>urn:sha1:ebaf39e6032faf77218220707fc3fa22487784e0</id>
<content type='text'>
The *_frag_reasm() functions are susceptible to miscalculating the byte
count of packet fragments in case the truesize of a head buffer changes.
The truesize member may be changed by the call to skb_unclone(), leaving
the fragment memory limit counter unbalanced even if all fragments are
processed. This miscalculation goes unnoticed as long as the network
namespace which holds the counter is not destroyed.

Should an attempt be made to destroy a network namespace that holds an
unbalanced fragment memory limit counter the cleanup of the namespace
never finishes. The thread handling the cleanup gets stuck in
inet_frags_exit_net() waiting for the percpu counter to reach zero. The
thread is usually in running state with a stacktrace similar to:

 PID: 1073   TASK: ffff880626711440  CPU: 1   COMMAND: "kworker/u48:4"
  #5 [ffff880621563d48] _raw_spin_lock at ffffffff815f5480
  #6 [ffff880621563d48] inet_evict_bucket at ffffffff8158020b
  #7 [ffff880621563d80] inet_frags_exit_net at ffffffff8158051c
  #8 [ffff880621563db0] ops_exit_list at ffffffff814f5856
  #9 [ffff880621563dd8] cleanup_net at ffffffff814f67c0
 #10 [ffff880621563e38] process_one_work at ffffffff81096f14

It is not possible to create new network namespaces, and processes
that call unshare() end up being stuck in uninterruptible sleep state
waiting to acquire the net_mutex.

The bug was observed in the IPv6 netfilter code by Per Sundstrom.
I thank him for his analysis of the problem. The parts of this patch
that apply to IPv4 and IPv6 fragment reassembly are preemptive measures.

Signed-off-by: Jiri Wiesner &lt;jwiesner@suse.com&gt;
Reported-by: Per Sundstrom &lt;per.sundstrom@redqube.se&gt;
Acked-by: Peter Oskolkov &lt;posk@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>net/ipfrag: let ip[6]frag_high_thresh in ns be higher than in init_net</title>
<updated>2018-09-22T02:45:52Z</updated>
<author>
<name>Peter Oskolkov</name>
<email>posk@google.com</email>
</author>
<published>2018-09-21T18:17:16Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=83619623929815a76fa7af49309d2cdfcf024fd3'/>
<id>urn:sha1:83619623929815a76fa7af49309d2cdfcf024fd3</id>
<content type='text'>
Currently, ip[6]frag_high_thresh sysctl values in new namespaces are
hard-limited to those of the root/init ns.

There are at least two use cases when it would be desirable to
set the high_thresh values higher in a child namespace vs the global hard
limit:

- a security/ddos protection policy may lower the thresholds in the
  root/init ns but allow for a special exception in a child namespace
- testing: a test running in a namespace may want to set these
  thresholds higher in its namespace than what is in the root/init ns

The new behavior:

 # ip netns add testns
 # ip netns exec testns bash

 # sysctl -w net.ipv4.ipfrag_high_thresh=9000000
 net.ipv4.ipfrag_high_thresh = 9000000

 # sysctl net.ipv4.ipfrag_high_thresh
 net.ipv4.ipfrag_high_thresh = 9000000

 # sysctl -w net.ipv6.ip6frag_high_thresh=9000000
 net.ipv6.ip6frag_high_thresh = 9000000

 # sysctl net.ipv6.ip6frag_high_thresh
 net.ipv6.ip6frag_high_thresh = 9000000

The old behavior:

 # ip netns add testns
 # ip netns exec testns bash

 # sysctl -w net.ipv4.ipfrag_high_thresh=9000000
 net.ipv4.ipfrag_high_thresh = 9000000

 # sysctl net.ipv4.ipfrag_high_thresh
 net.ipv4.ipfrag_high_thresh = 4194304

 # sysctl -w net.ipv6.ip6frag_high_thresh=9000000
 net.ipv6.ip6frag_high_thresh = 9000000

 # sysctl net.ipv6.ip6frag_high_thresh
 net.ipv6.ip6frag_high_thresh = 4194304

Signed-off-by: Peter Oskolkov &lt;posk@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>ipv6: discard IP frag queue on more errors</title>
<updated>2018-09-22T02:45:52Z</updated>
<author>
<name>Peter Oskolkov</name>
<email>posk@google.com</email>
</author>
<published>2018-09-21T18:17:15Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=2475f59c618ea58e9f72ae5ded2db392ee47810d'/>
<id>urn:sha1:2475f59c618ea58e9f72ae5ded2db392ee47810d</id>
<content type='text'>
This is similar to how ipv4 now behaves:
commit 0ff89efb5246 ("ip: fail fast on IP defrag errors").

Signed-off-by: Peter Oskolkov &lt;posk@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>net: Add and use skb_mark_not_on_list().</title>
<updated>2018-09-10T17:06:54Z</updated>
<author>
<name>David S. Miller</name>
<email>davem@davemloft.net</email>
</author>
<published>2018-07-30T03:42:53Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=a8305bff685252e80b7c60f4f5e7dd2e63e38218'/>
<id>urn:sha1:a8305bff685252e80b7c60f4f5e7dd2e63e38218</id>
<content type='text'>
An SKB is not on a list if skb-&gt;next is NULL.

Codify this convention into a helper function and use it
where we are dequeueing an SKB and need to mark it as such.

Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>ipv6: defrag: drop non-last frags smaller than min mtu</title>
<updated>2018-08-06T00:21:14Z</updated>
<author>
<name>Florian Westphal</name>
<email>fw@strlen.de</email>
</author>
<published>2018-08-03T00:22:20Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=0ed4229b08c13c84a3c301a08defdc9e7f4467e6'/>
<id>urn:sha1:0ed4229b08c13c84a3c301a08defdc9e7f4467e6</id>
<content type='text'>
don't bother with pathological cases, they only waste cycles.
IPv6 requires a minimum MTU of 1280 so we should never see fragments
smaller than this (except last frag).

v3: don't use awkward "-offset + len"
v2: drop IPv4 part, which added same check w. IPV4_MIN_MTU (68).
    There were concerns that there could be even smaller frags
    generated by intermediate nodes, e.g. on radio networks.

Cc: Peter Oskolkov &lt;posk@google.com&gt;
Cc: Eric Dumazet &lt;edumazet@google.com&gt;
Signed-off-by: Florian Westphal &lt;fw@strlen.de&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>ip: use rb trees for IP frag queue.</title>
<updated>2018-08-06T00:16:46Z</updated>
<author>
<name>Peter Oskolkov</name>
<email>posk@google.com</email>
</author>
<published>2018-08-02T23:34:39Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=fa0f527358bd900ef92f925878ed6bfbd51305cc'/>
<id>urn:sha1:fa0f527358bd900ef92f925878ed6bfbd51305cc</id>
<content type='text'>
Similar to TCP OOO RX queue, it makes sense to use rb trees to store
IP fragments, so that OOO fragments are inserted faster.

Tested:

- a follow-up patch contains a rather comprehensive ip defrag
  self-test (functional)
- ran neper `udp_stream -c -H &lt;host&gt; -F 100 -l 300 -T 20`:
    netstat --statistics
    Ip:
        282078937 total packets received
        0 forwarded
        0 incoming packets discarded
        946760 incoming packets delivered
        18743456 requests sent out
        101 fragments dropped after timeout
        282077129 reassemblies required
        944952 packets reassembled ok
        262734239 packet reassembles failed
   (The numbers/stats above are somewhat better re:
    reassemblies vs a kernel without this patchset. More
    comprehensive performance testing TBD).

Reported-by: Jann Horn &lt;jannh@google.com&gt;
Reported-by: Juha-Matti Tilli &lt;juha-matti.tilli@iki.fi&gt;
Suggested-by: Eric Dumazet &lt;edumazet@google.com&gt;
Signed-off-by: Peter Oskolkov &lt;posk@google.com&gt;
Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Cc: Florian Westphal &lt;fw@strlen.de&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
</feed>
