From da3627c30d229fea1e070e984366f80a1c4d9166 Mon Sep 17 00:00:00 2001
From: Gang He
Date: Tue, 29 May 2018 11:09:22 +0800
Subject: dlm: remove O_NONBLOCK flag in sctp_connect_to_sock

We should remove the O_NONBLOCK flag when calling sock->ops->connect()
in the sctp_connect_to_sock() function.

Why?
1. Up to now, the SCTP socket connect() function has ignored the flags
   argument, which means the O_NONBLOCK flag does not take effect. We
   should remove it to avoid confusion (but this is not urgent).
2. In the future, a patch will fix this problem and the flags argument
   will take effect. That patch has been queued at
   https://git.kernel.org/pub/scm/linux/kernel/git/davem/net.git/commit/net/sctp?id=644fbdeacf1d3edd366e44b8ba214de9d1dd66a9.
   Once it is applied, the O_NONBLOCK flag will make
   sock->ops->connect() return immediately without waiting, so the
   connection will not be established and the DLM kernel module will
   call sock->ops->connect() again and again. As a result, CPU usage
   reaches almost 100%, and a soft lockup can even be triggered if the
   related configuration options are enabled. The DLM kernel module
   also prints lots of messages like:

[Fri Apr 27 11:23:43 2018] dlm: connecting to 172167592
[Fri Apr 27 11:23:43 2018] dlm: connecting to 172167592
[Fri Apr 27 11:23:43 2018] dlm: connecting to 172167592
[Fri Apr 27 11:23:43 2018] dlm: connecting to 172167592

The upper application (e.g. the ocfs2 mount command) hangs in
new_lockspace(). The whole backtrace is below:

tb0307-nd2:~ # cat /proc/2935/stack
[<0>] new_lockspace+0x957/0xac0 [dlm]
[<0>] dlm_new_lockspace+0xae/0x140 [dlm]
[<0>] user_cluster_connect+0xc3/0x3a0 [ocfs2_stack_user]
[<0>] ocfs2_cluster_connect+0x144/0x220 [ocfs2_stackglue]
[<0>] ocfs2_dlm_init+0x215/0x440 [ocfs2]
[<0>] ocfs2_fill_super+0xcb0/0x1290 [ocfs2]
[<0>] mount_bdev+0x173/0x1b0
[<0>] mount_fs+0x35/0x150
[<0>] vfs_kern_mount.part.23+0x54/0x100
[<0>] do_mount+0x59a/0xc40
[<0>] SyS_mount+0x80/0xd0
[<0>] do_syscall_64+0x76/0x140
[<0>] entry_SYSCALL_64_after_hwframe+0x42/0xb7
[<0>] 0xffffffffffffffff

So I think we should remove the O_NONBLOCK flag here, since the DLM
kernel module cannot handle a non-blocking socket in connect()
properly.

Signed-off-by: Gang He
Signed-off-by: David Teigland
---
 fs/dlm/lowcomms.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/dlm/lowcomms.c b/fs/dlm/lowcomms.c
index d31e9abfb9f1..a5e4a221435c 100644
--- a/fs/dlm/lowcomms.c
+++ b/fs/dlm/lowcomms.c
@@ -1092,7 +1092,7 @@ static void sctp_connect_to_sock(struct connection *con)
 	kernel_setsockopt(sock, SOL_SOCKET, SO_SNDTIMEO, (char *)&tv,
 			  sizeof(tv));
 	result = sock->ops->connect(sock, (struct sockaddr *)&daddr, addr_len,
-				   O_NONBLOCK);
+				   0);
 	memset(&tv, 0, sizeof(tv));
 	kernel_setsockopt(sock, SOL_SOCKET, SO_SNDTIMEO, (char *)&tv,
 			  sizeof(tv));
--
cgit v1.2.3-58-ga151