<feed xmlns='http://www.w3.org/2005/Atom'>
<title>kernel/include/net/ip6_fib.h, branch linux-4.16.y</title>
<subtitle>Hosts the 0x221E linux distro kernel.</subtitle>
<id>https://universe.0xinfinity.dev/distro/kernel/atom?h=linux-4.16.y</id>
<link rel='self' href='https://universe.0xinfinity.dev/distro/kernel/atom?h=linux-4.16.y'/>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/'/>
<updated>2018-01-10T20:14:44Z</updated>
<entry>
<title>ipv6: Add support for non-equal-cost multipath</title>
<updated>2018-01-10T20:14:44Z</updated>
<author>
<name>Ido Schimmel</name>
<email>idosch@mellanox.com</email>
</author>
<published>2018-01-09T14:40:28Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=398958ae48f44bb036d0fa9829cd489270bf1fc2'/>
<id>urn:sha1:398958ae48f44bb036d0fa9829cd489270bf1fc2</id>
<content type='text'>
The use of hash-threshold instead of modulo-N makes it trivial to add
support for non-equal-cost multipath.

Instead of dividing the multipath hash function's output space equally
between the nexthops, each nexthop is assigned a region size which is
proportional to its weight.

Signed-off-by: Ido Schimmel &lt;idosch@mellanox.com&gt;
Acked-by: David Ahern &lt;dsahern@gmail.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>ipv6: Calculate hash thresholds for IPv6 nexthops</title>
<updated>2018-01-10T20:14:44Z</updated>
<author>
<name>Ido Schimmel</name>
<email>idosch@mellanox.com</email>
</author>
<published>2018-01-09T14:40:25Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=d7dedee184e775f77d321cfa1c660a7680cf6588'/>
<id>urn:sha1:d7dedee184e775f77d321cfa1c660a7680cf6588</id>
<content type='text'>
Before we convert IPv6 to use hash-threshold instead of modulo-N, we
first need each nexthop to store its region boundary in the hash
function's output space.

The boundary is calculated by dividing the output space equally between
the different active nexthops. That is, nexthops that are not dead or
linkdown.

The boundaries are rebalanced whenever a nexthop is added or removed to
a multipath route and whenever a nexthop becomes active or inactive.

Signed-off-by: Ido Schimmel &lt;idosch@mellanox.com&gt;
Acked-by: David Ahern &lt;dsahern@gmail.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>ipv6: Export sernum update function</title>
<updated>2018-01-08T02:29:40Z</updated>
<author>
<name>Ido Schimmel</name>
<email>idosch@mellanox.com</email>
</author>
<published>2018-01-07T10:45:13Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=4a8e56ee2c8551e674f69ba007aabede8f0b88d9'/>
<id>urn:sha1:4a8e56ee2c8551e674f69ba007aabede8f0b88d9</id>
<content type='text'>
We are going to allow dead routes to stay in the FIB tree (e.g., when
they are part of a multipath route, directly connected route with no
carrier) and revive them when their nexthop device gains carrier or when
it is put administratively up.

This is equivalent to the addition of the route to the FIB tree and we
should therefore take care of updating the sernum of all the parent
nodes of the node where the route is stored. Otherwise, we risk sockets
caching and using sub-optimal dst entries.

Export the function that performs the above, so that it could be invoked
from fib6_ifup() later on.

Signed-off-by: Ido Schimmel &lt;idosch@mellanox.com&gt;
Acked-by: David Ahern &lt;dsahern@gmail.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>ipv6: Add explicit flush indication to routes</title>
<updated>2018-01-08T02:29:40Z</updated>
<author>
<name>Ido Schimmel</name>
<email>idosch@mellanox.com</email>
</author>
<published>2018-01-07T10:45:11Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=a2c554d3f8f65dd9c46a55811efad8156e328f85'/>
<id>urn:sha1:a2c554d3f8f65dd9c46a55811efad8156e328f85</id>
<content type='text'>
When routes that are a part of a multipath route are evaluated by
fib6_ifdown() in response to NETDEV_DOWN and NETDEV_UNREGISTER events
the state of their sibling routes is not considered.

This will change in subsequent patches in order to align IPv6 with
IPv4's behavior. For example, when the last sibling in a multipath route
becomes dead, the entire multipath route needs to be removed.

To prevent the tree walker from re-evaluating all the sibling routes
each time, we can simply evaluate them once - when the first sibling is
traversed.

If we determine the entire multipath route needs to be removed, then the
'should_flush' bit is set in all the siblings, which will cause the
walker to flush them when it traverses them.

Signed-off-by: Ido Schimmel &lt;idosch@mellanox.com&gt;
Acked-by: David Ahern &lt;dsahern@gmail.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>ipv6: Move dst-&gt;from into struct rt6_info.</title>
<updated>2017-11-30T14:54:26Z</updated>
<author>
<name>David Miller</name>
<email>davem@davemloft.net</email>
</author>
<published>2017-11-28T20:40:40Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=3a2232e92e87166a8a5113e918b8c7b7bdce4d83'/>
<id>urn:sha1:3a2232e92e87166a8a5113e918b8c7b7bdce4d83</id>
<content type='text'>
The dst-&gt;from value is only used by ipv6 routes to track where
a route "came from".

Any time we clone or copy a core ipv6 route in the ipv6 routing
tables, we have the copy/clone's -&gt;from point to the base route.

This is used to handle route expiration properly.

Only ipv6 uses this mechanism, and only ipv6 code references
it.  So it is safe to move it into rt6_info.

Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
Reviewed-by: Eric Dumazet &lt;edumazet@google.com&gt;
</content>
</entry>
<entry>
<title>ipv6: Move rt6_next from dst_entry into ipv6 route structure.</title>
<updated>2017-11-30T14:54:25Z</updated>
<author>
<name>David Miller</name>
<email>davem@davemloft.net</email>
</author>
<published>2017-11-28T20:40:15Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=071fb37ec43dcd88937a669c5f97bd37f7d29dea'/>
<id>urn:sha1:071fb37ec43dcd88937a669c5f97bd37f7d29dea</id>
<content type='text'>
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
Reviewed-by: Eric Dumazet &lt;edumazet@google.com&gt;
</content>
</entry>
<entry>
<title>ipv6: take care of rt6_stats</title>
<updated>2017-10-07T20:22:58Z</updated>
<author>
<name>Wei Wang</name>
<email>weiwan@google.com</email>
</author>
<published>2017-10-06T19:06:11Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=81eb8447daae3b62247aa66bb17b82f8fef68249'/>
<id>urn:sha1:81eb8447daae3b62247aa66bb17b82f8fef68249</id>
<content type='text'>
Currently, most of the rt6_stats are not hooked up correctly. As the
last part of this patch series, hook up all existing rt6_stats and add
one new stat fib_rt_uncache to indicate the number of routes in the
uncached list.
For details of the stats, please refer to the comments added in
include/net/ip6_fib.h.

Note: fib_rt_alloc and fib_rt_uncache are not guaranteed to be modified
under a lock. So atomic_t is used for them.

Signed-off-by: Wei Wang &lt;weiwan@google.com&gt;
Signed-off-by: Martin KaFai Lau &lt;kafai@fb.com&gt;
Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>ipv6: replace rwlock with rcu and spinlock in fib6_table</title>
<updated>2017-10-07T20:22:58Z</updated>
<author>
<name>Wei Wang</name>
<email>weiwan@google.com</email>
</author>
<published>2017-10-06T19:06:10Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=66f5d6ce53e665477d2a33e8f539d4fa4ca81c83'/>
<id>urn:sha1:66f5d6ce53e665477d2a33e8f539d4fa4ca81c83</id>
<content type='text'>
With all the preparation work before, we are now ready to replace rwlock
with rcu and spinlock in fib6_table.
That means now all fib6_node in fib6_table are protected by rcu. And
when freeing fib6_node, call_rcu() is used to wait for the rcu grace
period before releasing the memory.
When accessing fib6_node, corresponding rcu APIs need to be used.
And all previous sessions protected by the write lock will now be
protected by the spin lock per table.
All previous sessions protected by read lock will now be protected by
rcu_read_lock().

A couple of things to note here:
1. As part of the work of replacing rwlock with rcu, the linked list of
fn-&gt;leaf now has to be rcu protected as well. So both fn-&gt;leaf and
rt-&gt;dst.rt6_next are now __rcu tagged and corresponding rcu APIs are
used when manipulating them.

2. For fn-&gt;rr_ptr, first of all, it also needs to be rcu protected now
and is tagged with __rcu and rcu APIs are used in corresponding places.
Secondly, fn-&gt;rr_ptr is changed in rt6_select() which is a reader
thread. This makes the issue a bit complicated. We think a valid
solution for it is to let rt6_select() grab the tb6_lock if it decides
to change it. As it is not in the normal operation and only happens when
there is no valid neighbor cache for the route, we think the performance
impact should be low.

3. fib6_walk_continue() has to be called with tb6_lock held even in the
route dumping related functions, e.g. inet6_dump_fib(),
fib6_tables_dump() and ipv6_route_seq_ops. It is because
fib6_walk_continue() makes modifications to the walker structure, and so
are fib6_repair_tree() and fib6_del_route(). In order to do proper
syncing between them, we need to let fib6_walk_continue() hold the lock.
We may be able to do further improvement on the way we do the tree walk
to get rid of the need for holding the spin lock. But not for now.

4. When fib6_del_route() removes a route from the tree, we no longer
mark rt-&gt;dst.rt6_next to NULL to make simultaneous reader be able to
further traverse the list with rcu. However, rt-&gt;dst.rt6_next is only
valid within this same rcu period. No one should access it later.

5. All the operation of atomic_inc(rt-&gt;rt6i_ref) is changed to be
performed before we publish this route (either by linking it to fn-&gt;leaf
or insert it in the list pointed by fn-&gt;leaf) just to be safe because as
soon as we publish the route, some read thread will be able to access it.

Signed-off-by: Wei Wang &lt;weiwan@google.com&gt;
Signed-off-by: Martin KaFai Lau &lt;kafai@fb.com&gt;
Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>ipv6: update fn_sernum after route is inserted to tree</title>
<updated>2017-10-07T20:22:58Z</updated>
<author>
<name>Wei Wang</name>
<email>weiwan@google.com</email>
</author>
<published>2017-10-06T19:06:07Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=bbd63f06d114a52be33f6982fc89ca2768cdeb62'/>
<id>urn:sha1:bbd63f06d114a52be33f6982fc89ca2768cdeb62</id>
<content type='text'>
fib6_add() logic currently calls fib6_add_1() to figure out what node
should be used for the newly added route and then call
fib6_add_rt2node() to insert the route to the node.
And during the call of fib6_add_1(), fn_sernum is updated for all nodes
that share the same prefix as the new route.
This does not have issue in the current code because reader thread will
not be able to access the tree while writer thread is inserting new
route to it. However, it is not the case once we transition to use RCU.
Reader thread could potentially see the new fn_sernum before the new
route is inserted. As a result, reader thread's route lookup will return
a stale route with the new fn_sernum.

In order to solve this issue, we remove all the update of fn_sernum in
fib6_add_1(), and instead, introduce a new function that updates fn_sernum
for all related nodes and call this functions once the route is
successfully inserted to the tree.
Also, smp_wmb() is used after a route is successfully inserted into the
fib tree and right before the updated of fn-&gt;sernum. And smp_rmb() is
used right after fn-&gt;sernum is accessed in rt6_get_cookie_safe(). This
is to guarantee that when the reader thread sees the new fn-&gt;sernum, the
new route is already inserted in the tree in memory.

Signed-off-by: Wei Wang &lt;weiwan@google.com&gt;
Signed-off-by: Martin KaFai Lau &lt;kafai@fb.com&gt;
Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>ipv6: hook up exception table to store dst cache</title>
<updated>2017-10-07T20:22:57Z</updated>
<author>
<name>Wei Wang</name>
<email>weiwan@google.com</email>
</author>
<published>2017-10-06T19:06:03Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=2b760fcf5cfb34e8610df56d83745b2b74ae1379'/>
<id>urn:sha1:2b760fcf5cfb34e8610df56d83745b2b74ae1379</id>
<content type='text'>
This commit makes use of the exception hash table implementation to
store dst caches created by pmtu discovery and ip redirect into the hash
table under the rt_info and no longer inserts these routes into fib6
tree.
This makes the fib6 tree only contain static configured routes and could
now be protected by rcu instead of a rw lock.
With this change, in the route lookup related functions, after finding
the rt6_info with the longest prefix, we also need to search for the
exception table before doing backtracking.
In the route delete function, if the route being deleted is not a dst
cache, deletion of this route also need to flush the whole hash table
under it. If it is a dst cache, then only delete the cached dst in the
hash table.

Note: for fib6_walk_continue() function, w-&gt;root now is always pointing
to a root node considering that fib6_prune_clones() is removed from the
code. So we add a WARN_ON() msg to make sure w-&gt;root always points to a
root node and also removed the update of w-&gt;root in fib6_repair_tree().
This is a prerequisite for later patch because we don't need to make
w-&gt;root as rcu protected when replacing rwlock with RCU.
Also, we remove all prune related variables as it is no longer used.

Signed-off-by: Wei Wang &lt;weiwan@google.com&gt;
Signed-off-by: Martin KaFai Lau &lt;kafai@fb.com&gt;
Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
</feed>
