<feed xmlns='http://www.w3.org/2005/Atom'>
<title>kernel/net/ipv4/nexthop.c, branch linux-6.2.y</title>
<subtitle>Hosts the 0x221E linux distro kernel.</subtitle>
<id>https://universe.0xinfinity.dev/distro/kernel/atom?h=linux-6.2.y</id>
<link rel='self' href='https://universe.0xinfinity.dev/distro/kernel/atom?h=linux-6.2.y'/>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/'/>
<updated>2022-10-27T17:17:40Z</updated>
<entry>
<title>nh: fix scope used to find saddr when adding non gw nh</title>
<updated>2022-10-27T17:17:40Z</updated>
<author>
<name>Nicolas Dichtel</name>
<email>nicolas.dichtel@6wind.com</email>
</author>
<published>2022-10-20T10:09:52Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=bac0f937c343d651874f83b265ca8f5070ed4f06'/>
<id>urn:sha1:bac0f937c343d651874f83b265ca8f5070ed4f06</id>
<content type='text'>
As explained by Julian, fib_nh_scope is related to fib_nh_gw4, but
fib_info_update_nhc_saddr() needs the scope of the route, which is
the scope "before" fib_nh_scope, ie fib_nh_scope - 1.

This patch fixes the problem described in commit 747c14307214 ("ip: fix
dflt addr selection for connected nexthop").

Fixes: 597cfe4fc339 ("nexthop: Add support for IPv4 nexthops")
Link: https://lore.kernel.org/netdev/6c8a44ba-c2d5-cdf-c5c7-5baf97cba38@ssi.bg/
Signed-off-by: Nicolas Dichtel &lt;nicolas.dichtel@6wind.com&gt;
Reviewed-by: Julian Anastasov &lt;ja@ssi.bg&gt;
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</content>
</entry>
<entry>
<title>nexthop: Fix data-races around nexthop_compat_mode.</title>
<updated>2022-07-13T11:56:50Z</updated>
<author>
<name>Kuniyuki Iwashima</name>
<email>kuniyu@amazon.com</email>
</author>
<published>2022-07-12T00:15:33Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=bdf00bf24bef9be1ca641a6390fd5487873e0d2e'/>
<id>urn:sha1:bdf00bf24bef9be1ca641a6390fd5487873e0d2e</id>
<content type='text'>
While reading nexthop_compat_mode, it can be changed concurrently.
Thus, we need to add READ_ONCE() to its readers.

Fixes: 4f80116d3df3 ("net: ipv4: add sysctl for nexthop api compatibility mode")
Signed-off-by: Kuniyuki Iwashima &lt;kuniyu@amazon.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>nexthop: change nexthop_net_exit() to nexthop_net_exit_batch()</title>
<updated>2022-02-09T04:41:33Z</updated>
<author>
<name>Eric Dumazet</name>
<email>edumazet@google.com</email>
</author>
<published>2022-02-08T04:50:31Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=fea7b201320ca222d532398685d23db2b8d55558'/>
<id>urn:sha1:fea7b201320ca222d532398685d23db2b8d55558</id>
<content type='text'>
cleanup_net() is competing with other rtnl users.

nexthop_net_exit() seems a good candidate for exit_batch(),
as this gives chance for cleanup_net() to progress much faster,
holding rtnl a bit longer.

Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Reviewed-by: David Ahern &lt;dsahern@kernel.org&gt;
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</content>
</entry>
<entry>
<title>net: Don't include filter.h from net/sock.h</title>
<updated>2021-12-29T16:48:14Z</updated>
<author>
<name>Jakub Kicinski</name>
<email>kuba@kernel.org</email>
</author>
<published>2021-12-29T00:49:13Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=b6459415b384cb829f0b2a4268f211c789f6cf0b'/>
<id>urn:sha1:b6459415b384cb829f0b2a4268f211c789f6cf0b</id>
<content type='text'>
sock.h is pretty heavily used (5k objects rebuilt on x86 after
it's touched). We can drop the include of filter.h from it and
add a forward declaration of struct sk_filter instead.
This decreases the number of rebuilt objects when bpf.h
is touched from ~5k to ~1k.

There's a lot of missing includes this was masking. Primarily
in networking tho, this time.

Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
Acked-by: Marc Kleine-Budde &lt;mkl@pengutronix.de&gt;
Acked-by: Florian Fainelli &lt;f.fainelli@gmail.com&gt;
Acked-by: Nikolay Aleksandrov &lt;nikolay@nvidia.com&gt;
Acked-by: Stefano Garzarella &lt;sgarzare@redhat.com&gt;
Link: https://lore.kernel.org/bpf/20211229004913.513372-1-kuba@kernel.org
</content>
</entry>
<entry>
<title>net: nexthop: reduce rcu synchronizations when replacing resilient groups</title>
<updated>2021-11-30T11:59:18Z</updated>
<author>
<name>Nikolay Aleksandrov</name>
<email>nikolay@nvidia.com</email>
</author>
<published>2021-11-29T12:09:24Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=7709efa62c4fd2a79d154579ea19be34f9fa9a31'/>
<id>urn:sha1:7709efa62c4fd2a79d154579ea19be34f9fa9a31</id>
<content type='text'>
We can optimize resilient nexthop group replaces by reducing the number of
synchronize_net calls. After commit 1005f19b9357 ("net: nexthop: release
IPv6 per-cpu dsts when replacing a nexthop group") we always do a
synchronize_net because we must ensure no new dsts can be created for the
replaced group's removed nexthops, but we already did that when replacing
resilient groups, so if we always call synchronize_net after any group
type replacement we'll take care of both cases and reduce synchronize_net
calls for resilient groups.

Suggested-by: Ido Schimmel &lt;idosch@idosch.org&gt;
Signed-off-by: Nikolay Aleksandrov &lt;nikolay@nvidia.com&gt;
Reviewed-by: Ido Schimmel &lt;idosch@nvidia.com&gt;
Tested-by: Ido Schimmel &lt;idosch@nvidia.com&gt;
Reviewed-by: David Ahern &lt;dsahern@kernel.org&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>net: nexthop: fix null pointer dereference when IPv6 is not enabled</title>
<updated>2021-11-23T11:39:53Z</updated>
<author>
<name>Nikolay Aleksandrov</name>
<email>nikolay@nvidia.com</email>
</author>
<published>2021-11-23T10:27:19Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=1c743127cc54b112b155f434756bd4b5fa565a99'/>
<id>urn:sha1:1c743127cc54b112b155f434756bd4b5fa565a99</id>
<content type='text'>
When we try to add an IPv6 nexthop and IPv6 is not enabled
(!CONFIG_IPV6) we'll hit a NULL pointer dereference[1] in the error path
of nh_create_ipv6() due to calling ipv6_stub-&gt;fib6_nh_release. The bug
has been present since the beginning of IPv6 nexthop gateway support.
Commit 1aefd3de7bc6 ("ipv6: Add fib6_nh_init and release to stubs") tells
us that only fib6_nh_init has a dummy stub because fib6_nh_release should
not be called if fib6_nh_init returns an error, but the commit below added
a call to ipv6_stub-&gt;fib6_nh_release in its error path. To fix it return
the dummy stub's -EAFNOSUPPORT error directly without calling
ipv6_stub-&gt;fib6_nh_release in nh_create_ipv6()'s error path.

[1]
 Output is a bit truncated, but it clearly shows the error.
 BUG: kernel NULL pointer dereference, address: 000000000000000000
 #PF: supervisor instruction fetch in kernel modede
 #PF: error_code(0x0010) - not-present pagege
 PGD 0 P4D 0
 Oops: 0010 [#1] PREEMPT SMP NOPTI
 CPU: 4 PID: 638 Comm: ip Kdump: loaded Not tainted 5.16.0-rc1+ #446
 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-4.fc34 04/01/2014
 RIP: 0010:0x0
 Code: Unable to access opcode bytes at RIP 0xffffffffffffffd6.
 RSP: 0018:ffff888109f5b8f0 EFLAGS: 00010286^Ac
 RAX: 0000000000000000 RBX: ffff888109f5ba28 RCX: 0000000000000000
 RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff8881008a2860
 RBP: ffff888109f5b9d8 R08: 0000000000000000 R09: 0000000000000000
 R10: ffff888109f5b978 R11: ffff888109f5b948 R12: 00000000ffffff9f
 R13: ffff8881008a2a80 R14: ffff8881008a2860 R15: ffff8881008a2840
 FS:  00007f98de70f100(0000) GS:ffff88822bf00000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: ffffffffffffffd6 CR3: 0000000100efc000 CR4: 00000000000006e0
 Call Trace:
  &lt;TASK&gt;
  nh_create_ipv6+0xed/0x10c
  rtm_new_nexthop+0x6d7/0x13f3
  ? check_preemption_disabled+0x3d/0xf2
  ? lock_is_held_type+0xbe/0xfd
  rtnetlink_rcv_msg+0x23f/0x26a
  ? check_preemption_disabled+0x3d/0xf2
  ? rtnl_calcit.isra.0+0x147/0x147
  netlink_rcv_skb+0x61/0xb2
  netlink_unicast+0x100/0x187
  netlink_sendmsg+0x37f/0x3a0
  ? netlink_unicast+0x187/0x187
  sock_sendmsg_nosec+0x67/0x9b
  ____sys_sendmsg+0x19d/0x1f9
  ? copy_msghdr_from_user+0x4c/0x5e
  ? rcu_read_lock_any_held+0x2a/0x78
  ___sys_sendmsg+0x6c/0x8c
  ? asm_sysvec_apic_timer_interrupt+0x12/0x20
  ? lockdep_hardirqs_on+0xd9/0x102
  ? sockfd_lookup_light+0x69/0x99
  __sys_sendmsg+0x50/0x6e
  do_syscall_64+0xcb/0xf2
  entry_SYSCALL_64_after_hwframe+0x44/0xae
 RIP: 0033:0x7f98dea28914
 Code: 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b5 0f 1f 80 00 00 00 00 48 8d 05 e9 5d 0c 00 8b 00 85 c0 75 13 b8 2e 00 00 00 0f 05 &lt;48&gt; 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 41 89 d4 55 48 89 f5 53
 RSP: 002b:00007fff859f5e68 EFLAGS: 00000246 ORIG_RAX: 000000000000002e2e
 RAX: ffffffffffffffda RBX: 00000000619cb810 RCX: 00007f98dea28914
 RDX: 0000000000000000 RSI: 00007fff859f5ed0 RDI: 0000000000000003
 RBP: 0000000000000000 R08: 0000000000000001 R09: 0000000000000008
 R10: fffffffffffffce6 R11: 0000000000000246 R12: 0000000000000001
 R13: 000055c0097ae520 R14: 000055c0097957fd R15: 00007fff859f63a0
 &lt;/TASK&gt;
 Modules linked in: bridge stp llc bonding virtio_net

Cc: stable@vger.kernel.org
Fixes: 53010f991a9f ("nexthop: Add support for IPv6 gateways")
Signed-off-by: Nikolay Aleksandrov &lt;nikolay@nvidia.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>net: nexthop: release IPv6 per-cpu dsts when replacing a nexthop group</title>
<updated>2021-11-22T15:44:49Z</updated>
<author>
<name>Nikolay Aleksandrov</name>
<email>nikolay@nvidia.com</email>
</author>
<published>2021-11-22T15:15:13Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=1005f19b9357b81aa64e1decd08d6e332caaa284'/>
<id>urn:sha1:1005f19b9357b81aa64e1decd08d6e332caaa284</id>
<content type='text'>
When replacing a nexthop group, we must release the IPv6 per-cpu dsts of
the removed nexthop entries after an RCU grace period because they
contain references to the nexthop's net device and to the fib6 info.
With specific series of events[1] we can reach net device refcount
imbalance which is unrecoverable. IPv4 is not affected because dsts
don't take a refcount on the route.

[1]
 $ ip nexthop list
  id 200 via 2002:db8::2 dev bridge.10 scope link onlink
  id 201 via 2002:db8::3 dev bridge scope link onlink
  id 203 group 201/200
 $ ip -6 route
  2001:db8::10 nhid 203 metric 1024 pref medium
     nexthop via 2002:db8::3 dev bridge weight 1 onlink
     nexthop via 2002:db8::2 dev bridge.10 weight 1 onlink

Create rt6_info through one of the multipath legs, e.g.:
 $ taskset -a -c 1  ./pkt_inj 24 bridge.10 2001:db8::10
 (pkt_inj is just a custom packet generator, nothing special)

Then remove that leg from the group by replace (let's assume it is id
200 in this case):
 $ ip nexthop replace id 203 group 201

Now remove the IPv6 route:
 $ ip -6 route del 2001:db8::10/128

The route won't be really deleted due to the stale rt6_info holding 1
refcnt in nexthop id 200.
At this point we have the following reference count dependency:
 (deleted) IPv6 route holds 1 reference over nhid 203
 nh 203 holds 1 ref over id 201
 nh 200 holds 1 ref over the net device and the route due to the stale
 rt6_info

Now to create circular dependency between nh 200 and the IPv6 route, and
also to get a reference over nh 200, restore nhid 200 in the group:
 $ ip nexthop replace id 203 group 201/200

And now we have a permanent circular dependncy because nhid 203 holds a
reference over nh 200 and 201, but the route holds a ref over nh 203 and
is deleted.

To trigger the bug just delete the group (nhid 203):
 $ ip nexthop del id 203

It won't really be deleted due to the IPv6 route dependency, and now we
have 2 unlinked and deleted objects that reference each other: the group
and the IPv6 route. Since the group drops the reference it holds over its
entries at free time (i.e. its own refcount needs to drop to 0) that will
never happen and we get a permanent ref on them, since one of the entries
holds a reference over the IPv6 route it will also never be released.

At this point the dependencies are:
 (deleted, only unlinked) IPv6 route holds reference over group nh 203
 (deleted, only unlinked) group nh 203 holds reference over nh 201 and 200
 nh 200 holds 1 ref over the net device and the route due to the stale
 rt6_info

This is the last point where it can be fixed by running traffic through
nh 200, and specifically through the same CPU so the rt6_info (dst) will
get released due to the IPv6 genid, that in turn will free the IPv6
route, which in turn will free the ref count over the group nh 203.

If nh 200 is deleted at this point, it will never be released due to the
ref from the unlinked group 203, it will only be unlinked:
 $ ip nexthop del id 200
 $ ip nexthop
 $

Now we can never release that stale rt6_info, we have IPv6 route with ref
over group nh 203, group nh 203 with ref over nh 200 and 201, nh 200 with
rt6_info (dst) with ref over the net device and the IPv6 route. All of
these objects are only unlinked, and cannot be released, thus they can't
release their ref counts.

 Message from syslogd@dev at Nov 19 14:04:10 ...
  kernel:[73501.828730] unregister_netdevice: waiting for bridge.10 to become free. Usage count = 3
 Message from syslogd@dev at Nov 19 14:04:20 ...
  kernel:[73512.068811] unregister_netdevice: waiting for bridge.10 to become free. Usage count = 3

Fixes: 7bf4796dd099 ("nexthops: add support for replace")
Signed-off-by: Nikolay Aleksandrov &lt;nikolay@nvidia.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>nexthop: Fix memory leaks in nexthop notification chain listeners</title>
<updated>2021-09-23T11:33:22Z</updated>
<author>
<name>Ido Schimmel</name>
<email>idosch@nvidia.com</email>
</author>
<published>2021-09-22T10:25:40Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=3106a0847525befe3e22fc723909d1b21eb0d520'/>
<id>urn:sha1:3106a0847525befe3e22fc723909d1b21eb0d520</id>
<content type='text'>
syzkaller discovered memory leaks [1] that can be reduced to the
following commands:

 # ip nexthop add id 1 blackhole
 # devlink dev reload pci/0000:06:00.0

As part of the reload flow, mlxsw will unregister its netdevs and then
unregister from the nexthop notification chain. Before unregistering
from the notification chain, mlxsw will receive delete notifications for
nexthop objects using netdevs registered by mlxsw or their uppers. mlxsw
will not receive notifications for nexthops using netdevs that are not
dismantled as part of the reload flow. For example, the blackhole
nexthop above that internally uses the loopback netdev as its nexthop
device.

One way to fix this problem is to have listeners flush their nexthop
tables after unregistering from the notification chain. This is
error-prone as evident by this patch and also not symmetric with the
registration path where a listener receives a dump of all the existing
nexthops.

Therefore, fix this problem by replaying delete notifications for the
listener being unregistered. This is symmetric to the registration path
and also consistent with the netdev notification chain.

The above means that unregister_nexthop_notifier(), like
register_nexthop_notifier(), will have to take RTNL in order to iterate
over the existing nexthops and that any callers of the function cannot
hold RTNL. This is true for mlxsw and netdevsim, but not for the VXLAN
driver. To avoid a deadlock, change the latter to unregister its nexthop
listener without holding RTNL, making it symmetric to the registration
path.

[1]
unreferenced object 0xffff88806173d600 (size 512):
  comm "syz-executor.0", pid 1290, jiffies 4295583142 (age 143.507s)
  hex dump (first 32 bytes):
    41 9d 1e 60 80 88 ff ff 08 d6 73 61 80 88 ff ff  A..`......sa....
    08 d6 73 61 80 88 ff ff 01 00 00 00 00 00 00 00  ..sa............
  backtrace:
    [&lt;ffffffff81a6b576&gt;] kmemleak_alloc_recursive include/linux/kmemleak.h:43 [inline]
    [&lt;ffffffff81a6b576&gt;] slab_post_alloc_hook+0x96/0x490 mm/slab.h:522
    [&lt;ffffffff81a716d3&gt;] slab_alloc_node mm/slub.c:3206 [inline]
    [&lt;ffffffff81a716d3&gt;] slab_alloc mm/slub.c:3214 [inline]
    [&lt;ffffffff81a716d3&gt;] kmem_cache_alloc_trace+0x163/0x370 mm/slub.c:3231
    [&lt;ffffffff82e8681a&gt;] kmalloc include/linux/slab.h:591 [inline]
    [&lt;ffffffff82e8681a&gt;] kzalloc include/linux/slab.h:721 [inline]
    [&lt;ffffffff82e8681a&gt;] mlxsw_sp_nexthop_obj_group_create drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c:4918 [inline]
    [&lt;ffffffff82e8681a&gt;] mlxsw_sp_nexthop_obj_new drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c:5054 [inline]
    [&lt;ffffffff82e8681a&gt;] mlxsw_sp_nexthop_obj_event+0x59a/0x2910 drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c:5239
    [&lt;ffffffff813ef67d&gt;] notifier_call_chain+0xbd/0x210 kernel/notifier.c:83
    [&lt;ffffffff813f0662&gt;] blocking_notifier_call_chain kernel/notifier.c:318 [inline]
    [&lt;ffffffff813f0662&gt;] blocking_notifier_call_chain+0x72/0xa0 kernel/notifier.c:306
    [&lt;ffffffff8384b9c6&gt;] call_nexthop_notifiers+0x156/0x310 net/ipv4/nexthop.c:244
    [&lt;ffffffff83852bd8&gt;] insert_nexthop net/ipv4/nexthop.c:2336 [inline]
    [&lt;ffffffff83852bd8&gt;] nexthop_add net/ipv4/nexthop.c:2644 [inline]
    [&lt;ffffffff83852bd8&gt;] rtm_new_nexthop+0x14e8/0x4d10 net/ipv4/nexthop.c:2913
    [&lt;ffffffff833e9a78&gt;] rtnetlink_rcv_msg+0x448/0xbf0 net/core/rtnetlink.c:5572
    [&lt;ffffffff83608703&gt;] netlink_rcv_skb+0x173/0x480 net/netlink/af_netlink.c:2504
    [&lt;ffffffff833de032&gt;] rtnetlink_rcv+0x22/0x30 net/core/rtnetlink.c:5590
    [&lt;ffffffff836069de&gt;] netlink_unicast_kernel net/netlink/af_netlink.c:1314 [inline]
    [&lt;ffffffff836069de&gt;] netlink_unicast+0x5ae/0x7f0 net/netlink/af_netlink.c:1340
    [&lt;ffffffff83607501&gt;] netlink_sendmsg+0x8e1/0xe30 net/netlink/af_netlink.c:1929
    [&lt;ffffffff832fde84&gt;] sock_sendmsg_nosec net/socket.c:704 [inline]
    [&lt;ffffffff832fde84&gt;] sock_sendmsg net/socket.c:724 [inline]
    [&lt;ffffffff832fde84&gt;] ____sys_sendmsg+0x874/0x9f0 net/socket.c:2409
    [&lt;ffffffff83304a44&gt;] ___sys_sendmsg+0x104/0x170 net/socket.c:2463
    [&lt;ffffffff83304c01&gt;] __sys_sendmsg+0x111/0x1f0 net/socket.c:2492
    [&lt;ffffffff83304d5d&gt;] __do_sys_sendmsg net/socket.c:2501 [inline]
    [&lt;ffffffff83304d5d&gt;] __se_sys_sendmsg net/socket.c:2499 [inline]
    [&lt;ffffffff83304d5d&gt;] __x64_sys_sendmsg+0x7d/0xc0 net/socket.c:2499

Fixes: 2a014b200bbd ("mlxsw: spectrum_router: Add support for nexthop objects")
Signed-off-by: Ido Schimmel &lt;idosch@nvidia.com&gt;
Reviewed-by: Petr Machata &lt;petrm@nvidia.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>nexthop: Fix division by zero while replacing a resilient group</title>
<updated>2021-09-20T08:45:14Z</updated>
<author>
<name>Ido Schimmel</name>
<email>idosch@nvidia.com</email>
</author>
<published>2021-09-17T13:02:18Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=563f23b002534176f49524b5ca0e1d94d8906c40'/>
<id>urn:sha1:563f23b002534176f49524b5ca0e1d94d8906c40</id>
<content type='text'>
The resilient nexthop group torture tests in fib_nexthop.sh exposed a
possible division by zero while replacing a resilient group [1]. The
division by zero occurs when the data path sees a resilient nexthop
group with zero buckets.

The tests replace a resilient nexthop group in a loop while traffic is
forwarded through it. The tests do not specify the number of buckets
while performing the replacement, resulting in the kernel allocating a
stub resilient table (i.e, 'struct nh_res_table') with zero buckets.

This table should never be visible to the data path, but the old nexthop
group (i.e., 'oldg') might still be used by the data path when the stub
table is assigned to it.

Fix this by only assigning the stub table to the old nexthop group after
making sure the group is no longer used by the data path.

Tested with fib_nexthops.sh:

Tests passed: 222
Tests failed:   0

[1]
 divide error: 0000 [#1] PREEMPT SMP KASAN
 CPU: 0 PID: 1850 Comm: ping Not tainted 5.14.0-custom-10271-ga86eb53057fe #1107
 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-4.fc34 04/01/2014
 RIP: 0010:nexthop_select_path+0x2d2/0x1a80
[...]
 Call Trace:
  fib_select_multipath+0x79b/0x1530
  fib_select_path+0x8fb/0x1c10
  ip_route_output_key_hash_rcu+0x1198/0x2da0
  ip_route_output_key_hash+0x190/0x340
  ip_route_output_flow+0x21/0x120
  raw_sendmsg+0x91d/0x2e10
  inet_sendmsg+0x9e/0xe0
  __sys_sendto+0x23d/0x360
  __x64_sys_sendto+0xe1/0x1b0
  do_syscall_64+0x35/0x80
  entry_SYSCALL_64_after_hwframe+0x44/0xae

Cc: stable@vger.kernel.org
Fixes: 283a72a5599e ("nexthop: Add implementation of resilient next-hop groups")
Signed-off-by: Ido Schimmel &lt;idosch@nvidia.com&gt;
Reviewed-by: Petr Machata &lt;petrm@nvidia.com&gt;
Reviewed-by: David Ahern &lt;dsahern@kernel.org&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>Set fc_nlinfo in nh_create_ipv4, nh_create_ipv6</title>
<updated>2021-09-02T10:42:13Z</updated>
<author>
<name>Ryoga Saito</name>
<email>contact@proelbtn.com</email>
</author>
<published>2021-09-02T05:20:14Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=9aca491e0dccf8a9d84a5b478e5eee3c6ea7803b'/>
<id>urn:sha1:9aca491e0dccf8a9d84a5b478e5eee3c6ea7803b</id>
<content type='text'>
This patch fixes kernel NULL pointer dereference when creating nexthop
which is bound with SRv6 decapsulation. In the creation of nexthop,
__seg6_end_dt_vrf_build is called. __seg6_end_dt_vrf_build expects
fc_lninfo in fib6_config is set correctly, but it isn't set in
nh_create_ipv6, which causes kernel crash.

Here is steps to reproduce kernel crash:

1. modprobe vrf
2. ip -6 nexthop add encap seg6local action End.DT4 vrftable 1 dev eth0

We got the following message:

[  901.370336] BUG: kernel NULL pointer dereference, address: 0000000000000ba0
[  901.371658] #PF: supervisor read access in kernel mode
[  901.372672] #PF: error_code(0x0000) - not-present page
[  901.373672] PGD 0 P4D 0
[  901.374248] Oops: 0000 [#1] SMP PTI
[  901.374944] CPU: 0 PID: 8593 Comm: ip Not tainted 5.14-051400-generic #202108310811-Ubuntu
[  901.376404] Hardware name: Red Hat KVM, BIOS 1.11.1-4.module_el8.2.0+320+13f867d7 04/01/2014
[  901.377907] RIP: 0010:vrf_ifindex_lookup_by_table_id+0x19/0x90 [vrf]
[  901.379182] Code: c1 e9 72 ff ff ff e8 96 49 01 c2 66 0f 1f 44 00 00 0f 1f 44 00 00 55 48 89 e5 41 56 41 55 41 89 f5 41 54 53 8b 05 47 4c 00 00 &lt;48&gt; 8b 97 a0 0b 00 00 48 8b 1c c2 e8 57 27 53 c1 4c 8d a3 88 00 00
[  901.382652] RSP: 0018:ffffbf2d02043590 EFLAGS: 00010282
[  901.383746] RAX: 000000000000000b RBX: ffff990808255e70 RCX: ffffbf2d02043aa8
[  901.385436] RDX: 0000000000000001 RSI: 0000000000000001 RDI: 0000000000000000
[  901.386924] RBP: ffffbf2d020435b0 R08: 00000000000000c0 R09: ffff990808255e40
[  901.388537] R10: ffffffff83b08c90 R11: 0000000000000009 R12: 0000000000000000
[  901.389937] R13: 0000000000000001 R14: 0000000000000000 R15: 000000000000000b
[  901.391226] FS:  00007fe49381f740(0000) GS:ffff99087dc00000(0000) knlGS:0000000000000000
[  901.392737] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  901.393803] CR2: 0000000000000ba0 CR3: 000000000e3e8003 CR4: 0000000000770ef0
[  901.395122] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  901.396496] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  901.397833] PKRU: 55555554
[  901.398578] Call Trace:
[  901.399144]  l3mdev_ifindex_lookup_by_table_id+0x3b/0x70
[  901.400179]  __seg6_end_dt_vrf_build+0x34/0xd0
[  901.401067]  seg6_end_dt4_build+0x16/0x20
[  901.401904]  seg6_local_build_state+0x271/0x430
[  901.402797]  lwtunnel_build_state+0x81/0x130
[  901.403645]  fib_nh_common_init+0x82/0x100
[  901.404465]  ? sock_def_readable+0x4b/0x80
[  901.405285]  fib6_nh_init+0x115/0x7c0
[  901.406033]  nh_create_ipv6.isra.0+0xe1/0x140
[  901.406932]  rtm_new_nexthop+0x3b7/0xeb0
[  901.407828]  rtnetlink_rcv_msg+0x152/0x3a0
[  901.408663]  ? rtnl_calcit.isra.0+0x130/0x130
[  901.409535]  netlink_rcv_skb+0x55/0x100
[  901.410319]  rtnetlink_rcv+0x15/0x20
[  901.411026]  netlink_unicast+0x1a8/0x250
[  901.411813]  netlink_sendmsg+0x238/0x470
[  901.412602]  ? _copy_from_user+0x2b/0x60
[  901.413394]  sock_sendmsg+0x65/0x70
[  901.414112]  ____sys_sendmsg+0x218/0x290
[  901.414929]  ? copy_msghdr_from_user+0x5c/0x90
[  901.415814]  ___sys_sendmsg+0x81/0xc0
[  901.416559]  ? fsnotify_destroy_marks+0x27/0xf0
[  901.417447]  ? call_rcu+0xa4/0x230
[  901.418153]  ? kmem_cache_free+0x23f/0x410
[  901.418972]  ? dentry_free+0x37/0x70
[  901.419705]  ? mntput_no_expire+0x4c/0x260
[  901.420574]  __sys_sendmsg+0x62/0xb0
[  901.421297]  __x64_sys_sendmsg+0x1f/0x30
[  901.422057]  do_syscall_64+0x5c/0xc0
[  901.422756]  ? syscall_exit_to_user_mode+0x27/0x50
[  901.423675]  ? __x64_sys_close+0x12/0x40
[  901.424462]  ? do_syscall_64+0x69/0xc0
[  901.425219]  ? irqentry_exit_to_user_mode+0x9/0x20
[  901.426149]  ? irqentry_exit+0x19/0x30
[  901.426901]  ? exc_page_fault+0x89/0x160
[  901.427709]  ? asm_exc_page_fault+0x8/0x30
[  901.428536]  entry_SYSCALL_64_after_hwframe+0x44/0xae
[  901.429514] RIP: 0033:0x7fe493945747
[  901.430248] Code: 64 89 02 48 c7 c0 ff ff ff ff eb bb 0f 1f 80 00 00 00 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 2e 00 00 00 0f 05 &lt;48&gt; 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 89 54 24 1c 48 89 74 24 10
[  901.433549] RSP: 002b:00007ffe9932cf68 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
[  901.434981] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fe493945747
[  901.436303] RDX: 0000000000000000 RSI: 00007ffe9932cfe0 RDI: 0000000000000003
[  901.437607] RBP: 00000000613053f7 R08: 0000000000000001 R09: 00007ffe9932d07c
[  901.438990] R10: 000055f4a903a010 R11: 0000000000000246 R12: 0000000000000001
[  901.440340] R13: 0000000000000001 R14: 000055f4a802b163 R15: 000055f4a8042020
[  901.441630] Modules linked in: vrf nls_utf8 isofs nls_iso8859_1 dm_multipath scsi_dh_rdac scsi_dh_emc scsi_dh_alua intel_rapl_msr intel_rapl_common isst_if_mbox_msr isst_if_common nfit rapl input_leds joydev serio_raw qemu_fw_cfg mac_hid sch_fq_codel drm virtio_rng ip_tables x_tables autofs4 btrfs blake2b_generic zstd_compress raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel crypto_simd virtio_net net_failover cryptd psmouse virtio_blk failover i2c_piix4 pata_acpi floppy
[  901.450808] CR2: 0000000000000ba0
[  901.451514] ---[ end trace c27b934b99ade304 ]---
[  901.452403] RIP: 0010:vrf_ifindex_lookup_by_table_id+0x19/0x90 [vrf]
[  901.453626] Code: c1 e9 72 ff ff ff e8 96 49 01 c2 66 0f 1f 44 00 00 0f 1f 44 00 00 55 48 89 e5 41 56 41 55 41 89 f5 41 54 53 8b 05 47 4c 00 00 &lt;48&gt; 8b 97 a0 0b 00 00 48 8b 1c c2 e8 57 27 53 c1 4c 8d a3 88 00 00
[  901.456910] RSP: 0018:ffffbf2d02043590 EFLAGS: 00010282
[  901.457912] RAX: 000000000000000b RBX: ffff990808255e70 RCX: ffffbf2d02043aa8
[  901.459238] RDX: 0000000000000001 RSI: 0000000000000001 RDI: 0000000000000000
[  901.460552] RBP: ffffbf2d020435b0 R08: 00000000000000c0 R09: ffff990808255e40
[  901.461882] R10: ffffffff83b08c90 R11: 0000000000000009 R12: 0000000000000000
[  901.463208] R13: 0000000000000001 R14: 0000000000000000 R15: 000000000000000b
[  901.464529] FS:  00007fe49381f740(0000) GS:ffff99087dc00000(0000) knlGS:0000000000000000
[  901.466058] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  901.467189] CR2: 0000000000000ba0 CR3: 000000000e3e8003 CR4: 0000000000770ef0
[  901.468515] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  901.469858] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  901.471139] PKRU: 55555554

Signed-off-by: Ryoga Saito &lt;contact@proelbtn.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
</feed>
