<feed xmlns='http://www.w3.org/2005/Atom'>
<title>kernel/net/smc/af_smc.c, branch linux-5.16.y</title>
<subtitle>Hosts the 0x221E linux distro kernel.</subtitle>
<id>https://universe.0xinfinity.dev/distro/kernel/atom?h=linux-5.16.y</id>
<link rel='self' href='https://universe.0xinfinity.dev/distro/kernel/atom?h=linux-5.16.y'/>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/'/>
<updated>2022-04-13T18:03:24Z</updated>
<entry>
<title>net/smc: send directly on setting TCP_NODELAY</title>
<updated>2022-04-13T18:03:24Z</updated>
<author>
<name>Dust Li</name>
<email>dust.li@linux.alibaba.com</email>
</author>
<published>2022-03-01T09:43:59Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=00183be98b9d3dfdca7f473084a2495ace8fc733'/>
<id>urn:sha1:00183be98b9d3dfdca7f473084a2495ace8fc733</id>
<content type='text'>
commit b70a5cc045197aad9c159042621baf3c015f6cc7 upstream.

In commit ea785a1a573b("net/smc: Send directly when
TCP_CORK is cleared"), we don't use delayed work
to implement cork.

This patch use the same algorithm, removes the
delayed work when setting TCP_NODELAY and send
directly in setsockopt(). This also makes the
TCP_NODELAY the same as TCP.

Cc: Tony Lu &lt;tonylu@linux.alibaba.com&gt;
Signed-off-by: Dust Li &lt;dust.li@linux.alibaba.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>net/smc: Send directly when TCP_CORK is cleared</title>
<updated>2022-04-13T18:02:59Z</updated>
<author>
<name>Tony Lu</name>
<email>tonylu@linux.alibaba.com</email>
</author>
<published>2022-01-30T18:02:55Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=5d79fbe06177bd9b6e4bc65d1c8cb24a09f47933'/>
<id>urn:sha1:5d79fbe06177bd9b6e4bc65d1c8cb24a09f47933</id>
<content type='text'>
[ Upstream commit ea785a1a573b390a150010b3c5b81e1ccd8c98a8 ]

According to the man page of TCP_CORK [1], if set, don't send out
partial frames. All queued partial frames are sent when option is
cleared again.

When applications call setsockopt to disable TCP_CORK, this call is
protected by lock_sock(), and tries to mod_delayed_work() to 0, in order
to send pending data right now. However, the delayed work smc_tx_work is
also protected by lock_sock(). There introduces lock contention for
sending data.

To fix it, send pending data directly which acts like TCP, without
lock_sock() protected in the context of setsockopt (already lock_sock()ed),
and cancel unnecessary dealyed work, which is protected by lock.

[1] https://linux.die.net/man/7/tcp

Signed-off-by: Tony Lu &lt;tonylu@linux.alibaba.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>net/smc: fix connection leak</title>
<updated>2022-03-08T18:14:09Z</updated>
<author>
<name>D. Wythe</name>
<email>alibuda@linux.alibaba.com</email>
</author>
<published>2022-02-24T15:26:19Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=e98d46ccfa84b35a9e4b1ccdd83961b41a5d7ce5'/>
<id>urn:sha1:e98d46ccfa84b35a9e4b1ccdd83961b41a5d7ce5</id>
<content type='text'>
commit 9f1c50cf39167ff71dc5953a3234f3f6eeb8fcb5 upstream.

There's a potential leak issue under following execution sequence :

smc_release  				smc_connect_work
if (sk-&gt;sk_state == SMC_INIT)
					send_clc_confirim
	tcp_abort();
					...
					sk.sk_state = SMC_ACTIVE
smc_close_active
switch(sk-&gt;sk_state) {
...
case SMC_ACTIVE:
	smc_close_final()
	// then wait peer closed

Unfortunately, tcp_abort() may discard CLC CONFIRM messages that are
still in the tcp send buffer, in which case our connection token cannot
be delivered to the server side, which means that we cannot get a
passive close message at all. Therefore, it is impossible for the to be
disconnected at all.

This patch tries a very simple way to avoid this issue, once the state
has changed to SMC_ACTIVE after tcp_abort(), we can actively abort the
smc connection, considering that the state is SMC_INIT before
tcp_abort(), abandoning the complete disconnection process should not
cause too much problem.

In fact, this problem may exist as long as the CLC CONFIRM message is
not received by the server. Whether a timer should be added after
smc_close_final() needs to be discussed in the future. But even so, this
patch provides a faster release for connection in above case, it should
also be valuable.

Fixes: 39f41f367b08 ("net/smc: common release code for non-accepted sockets")
Signed-off-by: D. Wythe &lt;alibuda@linux.alibaba.com&gt;
Acked-by: Karsten Graul &lt;kgraul@linux.ibm.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>net/smc: Avoid overwriting the copies of clcsock callback functions</title>
<updated>2022-02-23T11:05:58Z</updated>
<author>
<name>Wen Gu</name>
<email>guwen@linux.alibaba.com</email>
</author>
<published>2022-02-09T14:10:53Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=f00b6c976ae0dfbd9b891175f713f59095d23842'/>
<id>urn:sha1:f00b6c976ae0dfbd9b891175f713f59095d23842</id>
<content type='text'>
commit 1de9770d121ee9294794cca0e0be8fbfa0134ee8 upstream.

The callback functions of clcsock will be saved and replaced during
the fallback. But if the fallback happens more than once, then the
copies of these callback functions will be overwritten incorrectly,
resulting in a loop call issue:

clcsk-&gt;sk_error_report
 |- smc_fback_error_report() &lt;------------------------------|
     |- smc_fback_forward_wakeup()                          | (loop)
         |- clcsock_callback()  (incorrectly overwritten)   |
             |- smc-&gt;clcsk_error_report() ------------------|

So this patch fixes the issue by saving these function pointers only
once in the fallback and avoiding overwriting.

Reported-by: syzbot+4de3c0e8a263e1e499bc@syzkaller.appspotmail.com
Fixes: 341adeec9ada ("net/smc: Forward wakeup to smc socket waitqueue after fallback")
Link: https://lore.kernel.org/r/0000000000006d045e05d78776f6@google.com
Signed-off-by: Wen Gu &lt;guwen@linux.alibaba.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>net/smc: Forward wakeup to smc socket waitqueue after fallback</title>
<updated>2022-02-08T17:35:14Z</updated>
<author>
<name>Wen Gu</name>
<email>guwen@linux.alibaba.com</email>
</author>
<published>2022-01-26T15:33:04Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=504078fbe9dd570d685361b57784a6050bc40aaa'/>
<id>urn:sha1:504078fbe9dd570d685361b57784a6050bc40aaa</id>
<content type='text'>
commit 341adeec9adad0874f29a0a1af35638207352a39 upstream.

When we replace TCP with SMC and a fallback occurs, there may be
some socket waitqueue entries remaining in smc socket-&gt;wq, such
as eppoll_entries inserted by userspace applications.

After the fallback, data flows over TCP/IP and only clcsocket-&gt;wq
will be woken up. Applications can't be notified by the entries
which were inserted in smc socket-&gt;wq before fallback. So we need
a mechanism to wake up smc socket-&gt;wq at the same time if some
entries remaining in it.

The current workaround is to transfer the entries from smc socket-&gt;wq
to clcsock-&gt;wq during the fallback. But this may cause a crash
like this:

 general protection fault, probably for non-canonical address 0xdead000000000100: 0000 [#1] PREEMPT SMP PTI
 CPU: 3 PID: 0 Comm: swapper/3 Kdump: loaded Tainted: G E     5.16.0+ #107
 RIP: 0010:__wake_up_common+0x65/0x170
 Call Trace:
  &lt;IRQ&gt;
  __wake_up_common_lock+0x7a/0xc0
  sock_def_readable+0x3c/0x70
  tcp_data_queue+0x4a7/0xc40
  tcp_rcv_established+0x32f/0x660
  ? sk_filter_trim_cap+0xcb/0x2e0
  tcp_v4_do_rcv+0x10b/0x260
  tcp_v4_rcv+0xd2a/0xde0
  ip_protocol_deliver_rcu+0x3b/0x1d0
  ip_local_deliver_finish+0x54/0x60
  ip_local_deliver+0x6a/0x110
  ? tcp_v4_early_demux+0xa2/0x140
  ? tcp_v4_early_demux+0x10d/0x140
  ip_sublist_rcv_finish+0x49/0x60
  ip_sublist_rcv+0x19d/0x230
  ip_list_rcv+0x13e/0x170
  __netif_receive_skb_list_core+0x1c2/0x240
  netif_receive_skb_list_internal+0x1e6/0x320
  napi_complete_done+0x11d/0x190
  mlx5e_napi_poll+0x163/0x6b0 [mlx5_core]
  __napi_poll+0x3c/0x1b0
  net_rx_action+0x27c/0x300
  __do_softirq+0x114/0x2d2
  irq_exit_rcu+0xb4/0xe0
  common_interrupt+0xba/0xe0
  &lt;/IRQ&gt;
  &lt;TASK&gt;

The crash is caused by privately transferring waitqueue entries from
smc socket-&gt;wq to clcsock-&gt;wq. The owners of these entries, such as
epoll, have no idea that the entries have been transferred to a
different socket wait queue and still use original waitqueue spinlock
(smc socket-&gt;wq.wait.lock) to make the entries operation exclusive,
but it doesn't work. The operations to the entries, such as removing
from the waitqueue (now is clcsock-&gt;wq after fallback), may cause a
crash when clcsock waitqueue is being iterated over at the moment.

This patch tries to fix this by no longer transferring wait queue
entries privately, but introducing own implementations of clcsock's
callback functions in fallback situation. The callback functions will
forward the wakeup to smc socket-&gt;wq if clcsock-&gt;wq is actually woken
up and smc socket-&gt;wq has remaining entries.

Fixes: 2153bd1e3d3d ("net/smc: Transfer remaining wait queue entries during fallback")
Suggested-by: Karsten Graul &lt;kgraul@linux.ibm.com&gt;
Signed-off-by: Wen Gu &lt;guwen@linux.alibaba.com&gt;
Acked-by: Karsten Graul &lt;kgraul@linux.ibm.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>net/smc: Transitional solution for clcsock race issue</title>
<updated>2022-02-01T16:29:16Z</updated>
<author>
<name>Wen Gu</name>
<email>guwen@linux.alibaba.com</email>
</author>
<published>2022-01-22T09:43:09Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=4284225cd8001e134f5cf533a7cd244bbb654d0f'/>
<id>urn:sha1:4284225cd8001e134f5cf533a7cd244bbb654d0f</id>
<content type='text'>
[ Upstream commit c0bf3d8a943b6f2e912b7c1de03e2ef28e76f760 ]

We encountered a crash in smc_setsockopt() and it is caused by
accessing smc-&gt;clcsock after clcsock was released.

 BUG: kernel NULL pointer dereference, address: 0000000000000020
 #PF: supervisor read access in kernel mode
 #PF: error_code(0x0000) - not-present page
 PGD 0 P4D 0
 Oops: 0000 [#1] PREEMPT SMP PTI
 CPU: 1 PID: 50309 Comm: nginx Kdump: loaded Tainted: G E     5.16.0-rc4+ #53
 RIP: 0010:smc_setsockopt+0x59/0x280 [smc]
 Call Trace:
  &lt;TASK&gt;
  __sys_setsockopt+0xfc/0x190
  __x64_sys_setsockopt+0x20/0x30
  do_syscall_64+0x34/0x90
  entry_SYSCALL_64_after_hwframe+0x44/0xae
 RIP: 0033:0x7f16ba83918e
  &lt;/TASK&gt;

This patch tries to fix it by holding clcsock_release_lock and
checking whether clcsock has already been released before access.

In case that a crash of the same reason happens in smc_getsockopt()
or smc_switch_to_fallback(), this patch also checkes smc-&gt;clcsock
in them too. And the caller of smc_switch_to_fallback() will identify
whether fallback succeeds according to the return value.

Fixes: fd57770dd198 ("net/smc: wait for pending work before clcsock release_sock")
Link: https://lore.kernel.org/lkml/5dd7ffd1-28e2-24cc-9442-1defec27375e@linux.ibm.com/T/
Signed-off-by: Wen Gu &lt;guwen@linux.alibaba.com&gt;
Acked-by: Karsten Graul &lt;kgraul@linux.ibm.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>net/smc: Reset conn-&gt;lgr when link group registration fails</title>
<updated>2022-01-27T11:01:56Z</updated>
<author>
<name>Wen Gu</name>
<email>guwen@linux.alibaba.com</email>
</author>
<published>2022-01-06T12:42:08Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=ac2e01b7e82a9f1f2185c86b2791e654a62b784e'/>
<id>urn:sha1:ac2e01b7e82a9f1f2185c86b2791e654a62b784e</id>
<content type='text'>
[ Upstream commit 36595d8ad46d9e4c41cc7c48c4405b7c3322deac ]

SMC connections might fail to be registered in a link group due to
unable to find a usable link during its creation. As a result,
smc_conn_create() will return a failure and most resources related
to the connection won't be applied or initialized, such as
conn-&gt;abort_work or conn-&gt;lnk.

If smc_conn_free() is invoked later, it will try to access the
uninitialized resources related to the connection, thus causing
a warning or crash.

This patch tries to fix this by resetting conn-&gt;lgr to NULL if an
abnormal exit occurs in smc_lgr_register_conn(), thus avoiding the
access to uninitialized resources in smc_conn_free().

Meanwhile, the new created link group should be terminated if smc
connections can't be registered in it. So smc_lgr_cleanup_early() is
modified to take care of link group only and invoked to terminate
unusable link group by smc_conn_create(). The call to smc_conn_free()
is moved out from smc_lgr_cleanup_early() to smc_conn_abort().

Fixes: 56bc3b2094b4 ("net/smc: assign link to a new connection")
Suggested-by: Karsten Graul &lt;kgraul@linux.ibm.com&gt;
Signed-off-by: Wen Gu &lt;guwen@linux.alibaba.com&gt;
Acked-by: Karsten Graul &lt;kgraul@linux.ibm.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>net/smc: Prevent smc_release() from long blocking</title>
<updated>2021-12-16T16:11:05Z</updated>
<author>
<name>D. Wythe</name>
<email>alibuda@linux.alibaba.com</email>
</author>
<published>2021-12-15T12:29:21Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=5c15b3123f65f8fbb1b445d9a7e8812e0e435df2'/>
<id>urn:sha1:5c15b3123f65f8fbb1b445d9a7e8812e0e435df2</id>
<content type='text'>
In nginx/wrk benchmark, there's a hung problem with high probability
on case likes that: (client will last several minutes to exit)

server: smc_run nginx

client: smc_run wrk -c 10000 -t 1 http://server

Client hangs with the following backtrace:

0 [ffffa7ce8Of3bbf8] __schedule at ffffffff9f9eOd5f
1 [ffffa7ce8Of3bc88] schedule at ffffffff9f9eløe6
2 [ffffa7ce8Of3bcaO] schedule_timeout at ffffffff9f9e3f3c
3 [ffffa7ce8Of3bd2O] wait_for_common at ffffffff9f9el9de
4 [ffffa7ce8Of3bd8O] __flush_work at ffffffff9fOfeOl3
5 [ffffa7ce8øf3bdfO] smc_release at ffffffffcO697d24 [smc]
6 [ffffa7ce8Of3be2O] __sock_release at ffffffff9f8O2e2d
7 [ffffa7ce8Of3be4ø] sock_close at ffffffff9f8ø2ebl
8 [ffffa7ce8øf3be48] __fput at ffffffff9f334f93
9 [ffffa7ce8Of3be78] task_work_run at ffffffff9flOlff5
10 [ffffa7ce8Of3beaO] do_exit at ffffffff9fOe5Ol2
11 [ffffa7ce8Of3bflO] do_group_exit at ffffffff9fOe592a
12 [ffffa7ce8Of3bf38] __x64_sys_exit_group at ffffffff9fOe5994
13 [ffffa7ce8Of3bf4O] do_syscall_64 at ffffffff9f9d4373
14 [ffffa7ce8Of3bfsO] entry_SYSCALL_64_after_hwframe at ffffffff9fa0007c

This issue dues to flush_work(), which is used to wait for
smc_connect_work() to finish in smc_release(). Once lots of
smc_connect_work() was pending or all executing work dangling,
smc_release() has to block until one worker comes to free, which
is equivalent to wait another smc_connnect_work() to finish.

In order to fix this, There are two changes:

1. For those idle smc_connect_work(), cancel it from the workqueue; for
   executing smc_connect_work(), waiting for it to finish. For that
   purpose, replace flush_work() with cancel_work_sync().

2. Since smc_connect() hold a reference for passive closing, if
   smc_connect_work() has been cancelled, release the reference.

Fixes: 24ac3a08e658 ("net/smc: rebuild nonblocking connect")
Reported-by: Tony Lu &lt;tonylu@linux.alibaba.com&gt;
Tested-by: Dust Li &lt;dust.li@linux.alibaba.com&gt;
Reviewed-by: Dust Li &lt;dust.li@linux.alibaba.com&gt;
Reviewed-by: Tony Lu &lt;tonylu@linux.alibaba.com&gt;
Signed-off-by: D. Wythe &lt;alibuda@linux.alibaba.com&gt;
Acked-by: Karsten Graul &lt;kgraul@linux.ibm.com&gt;
Link: https://lore.kernel.org/r/1639571361-101128-1-git-send-email-alibuda@linux.alibaba.com
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</content>
</entry>
<entry>
<title>net/smc: Don't call clcsock shutdown twice when smc shutdown</title>
<updated>2021-11-26T19:23:35Z</updated>
<author>
<name>Tony Lu</name>
<email>tonylu@linux.alibaba.com</email>
</author>
<published>2021-11-26T02:41:35Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=bacb6c1e47691cda4a95056c21b5487fb7199fcc'/>
<id>urn:sha1:bacb6c1e47691cda4a95056c21b5487fb7199fcc</id>
<content type='text'>
When applications call shutdown() with SHUT_RDWR in userspace,
smc_close_active() calls kernel_sock_shutdown(), and it is called
twice in smc_shutdown().

This fixes this by checking sk_state before do clcsock shutdown, and
avoids missing the application's call of smc_shutdown().

Link: https://lore.kernel.org/linux-s390/1f67548e-cbf6-0dce-82b5-10288a4583bd@linux.ibm.com/
Fixes: 606a63c9783a ("net/smc: Ensure the active closing peer first closes clcsock")
Signed-off-by: Tony Lu &lt;tonylu@linux.alibaba.com&gt;
Reviewed-by: Wen Gu &lt;guwen@linux.alibaba.com&gt;
Acked-by: Karsten Graul &lt;kgraul@linux.ibm.com&gt;
Link: https://lore.kernel.org/r/20211126024134.45693-1-tonylu@linux.alibaba.com
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</content>
</entry>
<entry>
<title>net/smc: Fix loop in smc_listen</title>
<updated>2021-11-25T03:02:21Z</updated>
<author>
<name>Guo DaXing</name>
<email>guodaxing@huawei.com</email>
</author>
<published>2021-11-24T12:32:38Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=9ebb0c4b27a6158303b791b5b91e66d7665ee30e'/>
<id>urn:sha1:9ebb0c4b27a6158303b791b5b91e66d7665ee30e</id>
<content type='text'>
The kernel_listen function in smc_listen will fail when all the available
ports are occupied.  At this point smc-&gt;clcsock-&gt;sk-&gt;sk_data_ready has
been changed to smc_clcsock_data_ready.  When we call smc_listen again,
now both smc-&gt;clcsock-&gt;sk-&gt;sk_data_ready and smc-&gt;clcsk_data_ready point
to the smc_clcsock_data_ready function.

The smc_clcsock_data_ready() function calls lsmc-&gt;clcsk_data_ready which
now points to itself resulting in an infinite loop.

This patch restores smc-&gt;clcsock-&gt;sk-&gt;sk_data_ready with the old value.

Fixes: a60a2b1e0af1 ("net/smc: reduce active tcp_listen workers")
Signed-off-by: Guo DaXing &lt;guodaxing@huawei.com&gt;
Acked-by: Tony Lu &lt;tonylu@linux.alibaba.com&gt;
Signed-off-by: Karsten Graul &lt;kgraul@linux.ibm.com&gt;
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</content>
</entry>
</feed>
