Skip to content

Conversation

gou4shi1
Copy link
Contributor

@gou4shi1 gou4shi1 commented May 17, 2023

I experienced a deadlock issue, the CRcvQueue::worker thread was blocked in processCtrlShutdown()->releaseSynch()->pthread_join() forever:

(gdb) thread 37 
[Switching to thread 37 (LWP 2609)]
#0  0x00007fd2f29cd98d in pthread_join (threadid=140541933123328, thread_return=0x0) at pthread_join.c:90
90	pthread_join.c: No such file or directory.
(gdb) bt
#0  0x00007fd2f29cd98d in pthread_join (threadid=140541933123328, thread_return=0x0) at pthread_join.c:90
#1  0x00007fd2f22a0c33 in std::thread::join() () from /home/proxy_server/lib/libstdc++.so.6
#2  0x000055dd734a4242 in srt::CUDT::releaseSynch (this=this@entry=0x7fd241ee63f8) at /worker/build/dc588689bec99c48/root/external/srt/srtcore/core.cpp:7645
#3  0x000055dd734ad456 in srt::CUDT::updateBrokenConnection (this=this@entry=0x7fd241ee63f8) at /worker/build/dc588689bec99c48/root/external/srt/srtcore/core.cpp:11426
#4  0x000055dd734ad8d0 in srt::CUDT::processCtrlShutdown (this=0x7fd241ee63f8) at /worker/build/dc588689bec99c48/root/external/srt/srtcore/core.cpp:8916
#5  srt::CUDT::processCtrl (this=0x7fd241ee63f8, ctrlpkt=...) at /worker/build/dc588689bec99c48/root/external/srt/srtcore/core.cpp:8994
#6  0x000055dd734c6cda in srt::CRcvQueue::worker_ProcessAddressedPacket (this=0x55dd746c3e90, id=<optimized out>, unit=0xca, addr=...) at /worker/build/dc588689bec99c48/root/external/srt/srtcore/queue.cpp:1476
#7  0x000055dd734c5e21 in srt::CRcvQueue::worker (param=0x55dd746c3e90) at /worker/build/dc588689bec99c48/root/external/srt/srtcore/queue.cpp:1255
#8  0x00007fd2f22a0a00 in ?? () from /home/proxy_server/lib/libstdc++.so.6
#9  0x00007fd2f29cc6ba in start_thread (arg=0x7fd24e7fc700) at pthread_create.c:333
#10 0x00007fd2f1cf551d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109
(gdb) p this->m_SocketID
$1 = 94643394

While the corresponding tsbpd thread of @94643394 was blocked in wait() forever:

(gdb) thread 67
[Switching to thread 67 (LWP 3397)]
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
185	../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S: No such file or directory.
(gdb) bt
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
#1  0x00007fd2f229b0fc in std::condition_variable::wait(std::unique_lock<std::mutex>&) () from /home/proxy_server/lib/libstdc++.so.6
#2  0x000055dd734a26f1 in srt::sync::CSync::wait (this=<optimized out>) at /worker/build/dc588689bec99c48/root/external/srt/srtcore/sync.h:519
#3  srt::CUDT::tsbpd (param=0x7fd241ee63f8) at /worker/build/dc588689bec99c48/root/external/srt/srtcore/core.cpp:5495
#4  0x00007fd2f22a0a00 in ?? () from /home/proxy_server/lib/libstdc++.so.6
#5  0x00007fd2f29cc6ba in start_thread (arg=0x7fd277fff700) at pthread_create.c:333
#6  0x00007fd2f1cf551d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109
(gdb) p ((srt::CUDT*)param)->m_SocketID
$59 = 94643394

The only reason I can find is that the

CSync::lock_notify_one(m_RcvTsbPdCond, m_RecvLock);

happened while the tsbpd thread was under
InvertedLock unrecv(self->m_RecvLock);

The order was
TsbPd: unlock(m_RecvLock) ->
RcvQ: lock_notify(m_RcvTsbPdCond, m_RecvLock); m_RcvTsbPdThread.join() ->
TsbPd: lock(m_RecvLock); tsbpd_cc.wait()

The proposed solution is simple. We just need to check m_bClosing again after re-locking m_RecvLock:

  • If m_bClosing = true happens before this check, the tsbpd thread will exit after this check.
  • If m_bClosing = true happens after this check, the following lock_notify(m_RcvTsbPdCond, m_RecvLock) will happen after tsbpd_cc.wait() (because of the m_RecvLock), so the tsbpd thread will be woken up and cleared.

@maxsharabayko maxsharabayko added Type: Bug Indicates an unexpected problem or unintended behavior [core] Area: Changes in SRT library core labels May 17, 2023
@maxsharabayko maxsharabayko added this to the v1.6.0 milestone May 17, 2023
@maxsharabayko maxsharabayko modified the milestones: v1.6.0, v1.5.3 Aug 9, 2023
@maxsharabayko maxsharabayko merged commit 7cfe12b into Haivision:master Aug 9, 2023
@gou4shi1 gou4shi1 deleted the fix-tsbpd-deadlock branch August 9, 2023 17:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
[core] Area: Changes in SRT library core Type: Bug Indicates an unexpected problem or unintended behavior
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants