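The key is to find which thread of the process is using too much CPU, then attach gdb to the process and debug that thread to see whether it is stuck in an infinite loop, a deadlock, or a similar problem. The steps are as follows:
1. Use ps + grep to find the PID of the offending process, e.g. 1706.
2. Run top -H -p 1706 to list all threads of that process (alternatively, in a running top press Shift+H; a message in the top-left corner reports that threads are "on", meaning the per-thread view is active). In this view the PID column holds thread IDs. See which thread uses the most CPU and note its ID, e.g. 1723.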
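The ranking that top -H gives can also be approximated by reading Linux's /proc directly. The sketch below is a hypothetical helper and is not part of the original procedure. It reads each thread's utime + stime from /proc/<pid>/task/<tid>/stat twice, one interval apart, and ranks threads by the difference in clock ticks. It assumes Linux and permission to read the target's /proc entries.

```python
import os
import time


def parse_stat_line(line):
    """Return (tid, utime, stime) from one /proc/<pid>/task/<tid>/stat line.

    The 2nd field (comm) is wrapped in parentheses and may contain spaces or
    ')', so split on the LAST ')' rather than on whitespace alone.
    """
    head, _, rest = line.rpartition(")")
    tid = int(head.split("(", 1)[0])
    fields = rest.split()
    # fields[0] is field 3 (state), so field n is fields[n - 3]:
    # utime is field 14, stime is field 15 (see proc(5)).
    return tid, int(fields[11]), int(fields[12])


def sample(pid):
    """Map each thread ID of `pid` to its cumulative CPU ticks (utime + stime)."""
    ticks = {}
    task_dir = f"/proc/{pid}/task"
    for name in os.listdir(task_dir):
        try:
            with open(f"{task_dir}/{name}/stat") as f:
                tid, utime, stime = parse_stat_line(f.read())
        except (FileNotFoundError, ProcessLookupError):
            continue  # the thread exited while we were reading it
        ticks[tid] = utime + stime
    return ticks


def rank_deltas(before, after):
    """Sort threads by ticks consumed between two samples, busiest first."""
    deltas = {tid: t - before.get(tid, 0) for tid, t in after.items()}
    return sorted(deltas.items(), key=lambda kv: kv[1], reverse=True)


def busiest_threads(pid, interval=1.0):
    """Sample twice, `interval` seconds apart, and rank threads by CPU used."""
    before = sample(pid)
    time.sleep(interval)
    return rank_deltas(before, sample(pid))
```

For example, busiest_threads(1706)[:5] should list the five busiest threads of process 1706 as (thread ID, ticks) pairs. The first thread ID is the one to carry into gdb (1723 in this walkthrough).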
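3. Attach gdb to the process (1706), e.g. with gdb -p 1706, or with attach 1706 from the gdb prompt.
4. (Still in gdb) run info threads. The output looks roughly like this: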
(gdb) info threads
8 Thread 0x7f9fa9366700 (LWP 1716) 0x0000003cec00b98e in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
7 Thread 0x7f9fa8965700 (LWP 1720) 0x0000003cec00b98e in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
6 Thread 0x7f9fa7f64700 (LWP 1721) 0x0000003cec00f4b5 in sigwait () from /lib64/libpthread.so.0
5 Thread 0x7f9fa7563700 (LWP 1722) 0x0000003cec00b98e in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
4 Thread 0x7f9fa6b62700 (LWP 1723) 0x0000003cec00b5bc in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
3 Thread 0x7f9fa6161700 (LWP 1724) 0x0000003cebce9163 in epoll_wait () from /lib64/libc.so.6
2 Thread 0x7f9fa1159700 (LWP 1887) 0x0000003cebce9163 in epoll_wait () from /lib64/libc.so.6
* 1 Thread 0x7f9fa95ad820 (LWP 1706) 0x0000003cec00b5bc in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
Find the entry whose LWP matches the thread ID noted in step 2 (LWP 1723).
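When a process has many threads, finding the matching LWP by eye gets tedious. Here is a minimal sketch, assuming the info threads line format shown above: gdb's thread number, then "Thread 0x...", then "(LWP n)", with an optional leading "*" on the selected thread. gdb_thread_for_lwp is a hypothetical helper name.

```python
import re

# Matches the start of an `info threads` line, e.g.
#   "4 Thread 0x7f9fa6b62700 (LWP 1723) 0x... in pthread_cond_wait ..."
#   "* 1 Thread 0x7f9fa95ad820 (LWP 1706) ..."
# Group 1 is gdb's thread number, group 2 is the kernel thread ID (LWP).
THREAD_LINE = re.compile(
    r"^\s*\*?\s*(\d+)\s+Thread\s+0x[0-9a-fA-F]+\s+\(LWP\s+(\d+)\)"
)


def gdb_thread_for_lwp(info_threads_text, lwp):
    """Return gdb's thread number for the given LWP, or None if absent."""
    for line in info_threads_text.splitlines():
        m = THREAD_LINE.match(line)
        if m and int(m.group(2)) == lwp:
            return int(m.group(1))
    return None
```

With the output above, gdb_thread_for_lwp(text, 1723) returns 4, which is the number step 5 passes to the thread command.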
5. (Still in gdb) switch to that thread with thread <n>. Here <n> must be gdb's own thread number from the first column of info threads, not the LWP. LWP 1723 is gdb thread 4, so run thread 4:
(gdb) thread 4
[Switching to thread 4 (Thread 0x7f9fa6b62700 (LWP 1723))]#0 0x0000003cec00b5bc in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
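Steps 3, 5 and 6 can also be run non-interactively. This helps when the process should not stay paused for long, because it is stopped while gdb is attached. Below is a sketch, assuming gdb is on PATH. The options -p, -batch and -ex are standard gdb options, while gdb_backtrace_cmd and gdb_backtrace are hypothetical helper names.

```python
import subprocess


def gdb_backtrace_cmd(pid, gdb_thread):
    """Build a gdb command line: attach, switch thread, print its backtrace."""
    return [
        "gdb", "-p", str(pid), "-batch",
        "-ex", f"thread {gdb_thread}",  # gdb's thread number, not the LWP
        "-ex", "bt",
    ]


def gdb_backtrace(pid, gdb_thread):
    """Run the command and return gdb's stdout (the backtrace text)."""
    # -batch makes gdb exit after the -ex commands; on exit it detaches,
    # so the target process resumes running.
    result = subprocess.run(
        gdb_backtrace_cmd(pid, gdb_thread),
        capture_output=True, text=True, check=False,
    )
    return result.stdout
```

For instance, gdb_backtrace(1706, 4) should print the same stack that step 6 shows interactively.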
6. (Still in gdb) run bt to print the thread's call stack:
(gdb) bt
#0 0x0000003cec00b5bc in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x00007f9fa9f7144d in IceUtil::Cond::waitImpl (this=0x263f4c8, mutex=...) at ../../include/IceUtil/Cond.h:215
#2 0x00007f9fa9f9a4b1 in IceUtil::Monitor::wait (this=0x263f4c8) at ../../include/IceUtil/Monitor.h:152
#3 0x00007f9fa9fd7567 in IceInternal::EndpointHostResolver::run (this=0x263f480) at EndpointI.cpp:161
#4 0x00007f9fa9b1b975 in startHook (arg=0x263f480) at Thread.cpp:413
#5 0x0000003cec0079d1 in start_thread () from /lib64/libpthread.so.0
#6 0x0000003cebce8b6d in clone () from /lib64/libc.so.6
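7. From this output you can usually tell which code the thread is executing and check it for an infinite loop. For a deadlock, print the current thread's stack several times, or print the stacks of all threads (thread apply all bt). Some threads will consistently differ from the others. Map those back to the source code to locate and fix the problem.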
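The deadlock check in step 7 can be partly automated by dumping all stacks a few times with thread apply all bt and comparing the dumps. The sketch below is one possible heuristic, not the original author's method. It reduces each thread's stack to its function names and reports threads whose stack never changes across snapshots. Idle workers waiting in pthread_cond_wait will also show up, so the result is a shortlist to read, not a diagnosis. The sketch assumes gdb's usual headers of the form "Thread N (Thread 0x... (LWP n)):" and frames like "#k 0x... in func (...)". All function names are hypothetical.

```python
import re
import subprocess
import time

THREAD_HDR = re.compile(r"^Thread\s+\d+\s+\(.*\(LWP\s+(\d+)\)")
# Frame lines look like "#1 0x... in func (args) at file:line" or "#0 func (...)".
FRAME = re.compile(r"^#\d+\s+(?:0x[0-9a-fA-F]+\s+in\s+)?(\S+)")


def parse_all_bt(text):
    """Split `thread apply all bt` output into {lwp: (func, func, ...)}."""
    stacks, current = {}, None
    for line in text.splitlines():
        line = line.strip()
        hdr = THREAD_HDR.match(line)
        if hdr:
            current = int(hdr.group(1))
            stacks[current] = []
            continue
        frame = FRAME.match(line)
        if frame and current is not None:
            stacks[current].append(frame.group(1))
    return {lwp: tuple(funcs) for lwp, funcs in stacks.items()}


def unchanged_threads(snapshots):
    """LWPs whose stack is identical in every snapshot (possibly stuck)."""
    first, rest = snapshots[0], snapshots[1:]
    return sorted(
        lwp for lwp, stack in first.items()
        if all(s.get(lwp) == stack for s in rest)
    )


def take_snapshots(pid, count=3, interval=1.0):
    """Attach gdb `count` times and parse each all-threads backtrace."""
    snaps = []
    for i in range(count):
        if i:
            time.sleep(interval)  # let the process run between snapshots
        out = subprocess.run(
            ["gdb", "-p", str(pid), "-batch", "-ex", "thread apply all bt"],
            capture_output=True, text=True, check=False,
        ).stdout
        snaps.append(parse_all_bt(out))
    return snaps
```

For example, unchanged_threads(take_snapshots(1706)) lists LWPs worth reading. A thread sitting in pthread_mutex_lock in every snapshot is a stronger deadlock candidate than one waiting on a condition variable.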