kernel/drivers/net/bonding, branch linux-4.1.y

bonding: do not set slave_dev npinfo before slave_enable_netpoll in bond_enslave

2018-05-23T01:36:36Z

[ Upstream commit ddea788c63094f7c483783265563dd5b50052e28 ] After Commit 8a8efa22f51b ("bonding: sync netpoll code with bridge"), it would set slave_dev npinfo in slave_enable_netpoll when enslaving a dev if bond->dev->npinfo was set. However now slave_dev npinfo is set with bond->dev->npinfo before calling slave_enable_netpoll. With slave_dev npinfo set, __netpoll_setup called in slave_enable_netpoll will not call slave dev's .ndo_netpoll_setup(). It causes that the lower dev of this slave dev can't set its npinfo. One way to reproduce it: # modprobe bonding # brctl addbr br0 # brctl addif br0 eth1 # ifconfig bond0 192.168.122.1/24 up # ifenslave bond0 eth2 # systemctl restart netconsole # ifenslave bond0 br0 # ifconfig eth2 down # systemctl restart netconsole The netpoll won't really work. This patch is to remove that slave_dev npinfo setting in bond_enslave(). Fixes: 8a8efa22f51b ("bonding: sync netpoll code with bridge") Signed-off-by: Xin Long Signed-off-by: David S. Miller Signed-off-by: Sasha Levin

bonding: process the err returned by dev_set_allmulti properly in bond_enslave

2018-05-23T01:36:31Z

[ Upstream commit 9f5a90c107741b864398f4ac0014711a8c1d8474 ] When dev_set_promiscuity(1) succeeds but dev_set_allmulti(1) fails, dev_set_promiscuity(-1) should be done before going to the err path. Otherwise, dev->promiscuity will leak. Fixes: 7e1a1ac1fbaa ("bonding: Check return of dev_set_promiscuity/allmulti") Signed-off-by: Xin Long Acked-by: Andy Gospodarek Signed-off-by: David S. Miller Signed-off-by: Sasha Levin

bonding: move dev_mc_sync after master_upper_dev_link in bond_enslave

2018-05-23T01:36:30Z

[ Upstream commit ae42cc62a9f07f1f6979054ed92606b9c30f4a2e ] Beniamino found a crash when adding vlan as slave of bond which is also the parent link: ip link add bond1 type bond ip link set bond1 up ip link add link bond1 vlan1 type vlan id 80 ip link set vlan1 master bond1 The call trace is as below: [] queued_spin_lock_slowpath+0xb/0xf [] _raw_spin_lock+0x20/0x30 [] dev_mc_sync+0x37/0x80 [] vlan_dev_set_rx_mode+0x1c/0x30 [8021q] [] __dev_set_rx_mode+0x5a/0xa0 [] dev_mc_sync_multiple+0x78/0x80 [] bond_enslave+0x67c/0x1190 [bonding] [] do_setlink+0x9c9/0xe50 [] rtnl_newlink+0x522/0x880 [] rtnetlink_rcv_msg+0xa7/0x260 [] netlink_rcv_skb+0xab/0xc0 [] rtnetlink_rcv+0x28/0x30 [] netlink_unicast+0x170/0x210 [] netlink_sendmsg+0x308/0x420 [] sock_sendmsg+0xb6/0xf0 This is actually a dead lock caused by sync slave hwaddr from master when the master is the slave's 'slave'. This dead loop check is actually done by netdev_master_upper_dev_link. However, Commit 1f718f0f4f97 ("bonding: populate neighbour's private on enslave") moved it after dev_mc_sync. This patch is to fix it by moving dev_mc_sync after master_upper_dev_link, so that this loop check would be earlier than dev_mc_sync. It also moves if (mode == BOND_MODE_8023AD) into if (!bond_uses_primary) clause as an improvement. Note team driver also has this issue, I will fix it in another patch. Fixes: 1f718f0f4f97 ("bonding: populate neighbour's private on enslave") Reported-by: Beniamino Galvani Signed-off-by: Xin Long Acked-by: Andy Gospodarek Signed-off-by: David S. Miller Signed-off-by: Sasha Levin

bonding: fix the err path for dev hwaddr sync in bond_enslave

2018-05-23T01:36:30Z

[ Upstream commit 5c78f6bfae2b10ff70e21d343e64584ea6280c26 ] vlan_vids_add_by_dev is called right after dev hwaddr sync, so on the err path it should unsync dev hwaddr. Otherwise, the slave dev's hwaddr will never be unsync when this err happens. Fixes: 1ff412ad7714 ("bonding: change the bond's vlan syncing functions with the standard ones") Signed-off-by: Xin Long Reviewed-by: Nikolay Aleksandrov Acked-by: Andy Gospodarek Signed-off-by: David S. Miller Signed-off-by: Sasha Levin

bonding: Don't update slave->link until ready to commit

2018-05-23T01:36:26Z

[ Upstream commit 797a93647a48d6cb8a20641a86a71713a947f786 ] In the loadbalance arp monitoring scheme, when a slave link change is detected, the slave->link is immediately updated and slave_state_changed is set. Later down the function, the rtnl_lock is acquired and the changes are committed, updating the bond link state. However, the acquisition of the rtnl_lock can fail. The next time the monitor runs, since slave->link is already updated, it determines that link is unchanged. This results in the bond link state permanently out of sync with the slave link. This patch modifies bond_loadbalance_arp_mon() to handle link changes identical to bond_ab_arp_{inspect/commit}(). The new link state is maintained in slave->new_link until we're ready to commit at which point it's copied into slave->link. NOTE: miimon_{inspect/commit}() has a more complex state machine requiring the use of the bond_{propose,commit}_link_state() functions which maintains the intermediate state in slave->link_new_state. The arp monitors don't require that. Testing: This bug is very easy to reproduce with the following steps. 1. In a loop, toggle a slave link of a bond slave interface. 2. In a separate loop, do ifconfig up/down of an unrelated interface to create contention for rtnl_lock. Within a few iterations, the bond link goes out of sync with the slave link. Signed-off-by: Nithin Nayak Sujir Cc: Mahesh Bandewar Cc: Jay Vosburgh Acked-by: Mahesh Bandewar Signed-off-by: David S. Miller Signed-off-by: Sasha Levin

bonding: refine bond_fold_stats() wrap detection

2018-05-23T01:33:44Z

[ Upstream commit 142c6594acbcc32391af9c15f8cd65c6c177698f ] Some device drivers reset their stats at down/up events, possibly fooling bonding stats, since they operate with relative deltas. It is nearly not possible to fix drivers, since some of them compute the tx/rx counters based on per rx/tx queue stats, and the queues can be reconfigured (ethtool -L) between the down/up sequence. Lets avoid accumulating 'negative' values that render bonding stats useless. It is better to lose small deltas, assuming the bonding stats are fetched at a reasonable frequency. Fixes: 5f0c5f73e5ef ("bonding: make global bonding stats more reliable") Signed-off-by: Eric Dumazet Signed-off-by: David S. Miller Signed-off-by: Sasha Levin

bonding: discard lowest hash bit for 802.3ad layer3+4

2017-12-07T02:20:14Z

[ Upstream commit b5f862180d7011d9575d0499fa37f0f25b423b12 ] After commit 07f4c90062f8 ("tcp/dccp: try to not exhaust ip_local_port_range in connect()"), we will try to use even ports for connect(). Then if an application (seen clearly with iperf) opens multiple streams to the same destination IP and port, each stream will be given an even source port. So the bonding driver's simple xmit_hash_policy based on layer3+4 addressing will always hash all these streams to the same interface. And the total throughput will limited to a single slave. Change the tcp code will impact the whole tcp behavior, only for bonding usage. Paolo Abeni suggested fix this by changing the bonding code only, which should be more reasonable, and less impact. Fix this by discarding the lowest hash bit because it contains little entropy. After the fix we can re-balance between slaves. Signed-off-by: Paolo Abeni Signed-off-by: Hangbin Liu Signed-off-by: David S. Miller Signed-off-by: Sasha Levin

bonding: fix bond_get_stats()

2016-07-11T03:07:05Z

[ Upstream commit fe30937b65354c7fec244caebbdaae68e28ca797 ] bond_get_stats() can be called from rtnetlink (with RTNL held) or from /proc/net/dev seq handler (with RCU held) The logic added in commit 5f0c5f73e5ef ("bonding: make global bonding stats more reliable") kind of assumed only one cpu could run there. If multiple threads are reading /proc/net/dev, stats can be really messed up after a while. A second problem is that some fields are 32bit, so we need to properly handle the wrap around problem. Given that RTNL is not always held, we need to use bond_for_each_slave_rcu(). Fixes: 5f0c5f73e5ef ("bonding: make global bonding stats more reliable") Signed-off-by: Eric Dumazet Cc: Andy Gospodarek Cc: Jay Vosburgh Cc: Veaceslav Falico Reviewed-by: Nikolay Aleksandrov Signed-off-by: David S. Miller Signed-off-by: Sasha Levin

bonding: Fix ARP monitor validation

2016-03-04T15:25:50Z

[ Upstream commit 21a75f0915dde8674708b39abfcda113911c49b1 ] The current logic in bond_arp_rcv will accept an incoming ARP for validation if (a) the receiving slave is either "active" (which includes the currently active slave, or the current ARP slave) or, (b) there is a currently active slave, and it has received an ARP since it became active. For case (b), the receiving slave isn't the currently active slave, and is receiving the original broadcast ARP request, not an ARP reply from the target. This logic can fail if there is no currently active slave. In this situation, the ARP probe logic cycles through all slaves, assigning each in turn as the "current_arp_slave" for one arp_interval, then setting that one as "active," and sending an ARP probe from that slave. The current logic expects the ARP reply to arrive on the sending current_arp_slave, however, due to switch FDB updating delays, the reply may be directed to another slave. This can arise if the bonding slaves and switch are working, but the ARP target is not responding. When the ARP target recovers, a condition may result wherein the ARP target host replies faster than the switch can update its forwarding table, causing each ARP reply to be sent to the previous current_arp_slave. This will never pass the logic in bond_arp_rcv, as neither of the above conditions (a) or (b) are met. Some experimentation on a LAN shows ARP reply round trips in the 200 usec range, but my available switches never update their FDB in less than 4000 usec. This patch changes the logic in bond_arp_rcv to additionally accept an ARP reply for validation on any slave if there is a current ARP slave and it sent an ARP probe during the previous arp_interval. Fixes: aeea64ac717a ("bonding: don't trust arp requests unless active slave really works") Cc: Veaceslav Falico Cc: Andy Gospodarek Signed-off-by: Jay Vosburgh Signed-off-by: David S. Miller Signed-off-by: Sasha Levin

bonding: Prevent IPv6 link local address on enslaved devices

2016-01-31T19:23:36Z

[ Upstream commit 03d84a5f83a67e692af00a3d3901e7820e3e84d5 ] Commit 1f718f0f4f97 ("bonding: populate neighbour's private on enslave") undoes the fix provided by commit c2edacf80e15 ("bonding / ipv6: no addrconf for slaves separately from master") by effectively setting the slave flag after the slave has been opened. If the slave comes up quickly enough, it will go through the IPv6 addrconf before the slave flag has been set and will get a link local IPv6 address. In order to ensure that addrconf knows to ignore the slave devices on state change, set IFF_SLAVE before dev_open() during bonding enslavement. Fixes: 1f718f0f4f97 ("bonding: populate neighbour's private on enslave") Signed-off-by: Karl Heiss Signed-off-by: Jay Vosburgh Reviewed-by: Jarod Wilson Signed-off-by: Andy Gospodarek Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman