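When analyzing system stability problems such as a system hang, we often run "kill -3 pid" to dump the Java call stacks of every thread in the system_server process, then use the thread states and stacks to narrow down the fault. The same command can capture Java stacks when an application's UI stays janky for a long time. Note that kill -3 cannot produce a trace for a native process; use debuggerd for that. Sometimes, however, no trace is produced at all, and to find out why we need to understand how "kill -3 pid" works.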
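The "Signal Catcher" thread. Every process forked from Zygote starts a "Signal Catcher" thread, whose sole job is to receive and handle the SIGQUIT and SIGUSR1 signals delivered to the process. Note that the Zygote process itself has no "Signal Catcher" thread, so it cannot produce a trace. "ps -t pid" lists all threads of process pid, and the "Signal Catcher" thread appears among them.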
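When "ps -t" is not available, the same check can be done by reading /proc directly. The following is a minimal sketch, assuming a Linux-style /proc filesystem where each thread's name is exposed in /proc/<pid>/task/<tid>/comm. The helper names ListThreadNames and HasSignalCatcherThread are hypothetical and not part of Android.

```cpp
#include <dirent.h>
#include <sys/types.h>
#include <unistd.h>

#include <cassert>
#include <fstream>
#include <string>
#include <vector>

// Hypothetical helper: returns the names of all threads of process `pid`,
// read from /proc/<pid>/task/<tid>/comm. Returns an empty vector if the
// task directory cannot be opened (no such process, no permission, no /proc).
std::vector<std::string> ListThreadNames(pid_t pid) {
  std::vector<std::string> names;
  const std::string task_dir = "/proc/" + std::to_string(pid) + "/task";
  DIR* dir = opendir(task_dir.c_str());
  if (dir == nullptr) return names;
  while (dirent* entry = readdir(dir)) {
    const std::string tid = entry->d_name;
    if (tid == "." || tid == "..") continue;
    std::ifstream comm(task_dir + "/" + tid + "/comm");
    std::string name;
    if (std::getline(comm, name)) names.push_back(name);
  }
  closedir(dir);
  return names;
}

// Hypothetical helper: true if the process has a thread named "Signal Catcher".
bool HasSignalCatcherThread(pid_t pid) {
  for (const std::string& name : ListThreadNames(pid)) {
    if (name == "Signal Catcher") return true;
  }
  return false;
}
```

If HasSignalCatcherThread(pid) returns false for a process, as the text above says it will for Zygote, sending SIGQUIT to it cannot be expected to produce a Java trace.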
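Starting the "Signal Catcher" thread. The startup flow is simple, as shown in the sequence diagram below; you can walk through the code along this flow yourself (based on Android 5.1).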
In the sequence diagram above, the main logic lives in art/runtime/signal_catcher.cc. The sections below analyze three functions from the diagram: Run(), HandleSigQuit(), and Output().
1. Run()
void* SignalCatcher::Run(void* arg) {
  SignalCatcher* signal_catcher = reinterpret_cast<SignalCatcher*>(arg);
  CHECK(signal_catcher != NULL);
  Runtime* runtime = Runtime::Current();
  // Attach this thread to the runtime under the name "Signal Catcher". For a more detailed
  // explanation of this function see: http://www.th7.cn/Program/java/201405/195472.shtml
  CHECK(runtime->AttachCurrentThread("Signal Catcher", true, runtime->GetSystemThreadGroup(),
                                     !runtime->IsCompiler()));
  Thread* self = Thread::Current();
  DCHECK_NE(self->GetState(), kRunnable);
  {
    MutexLock mu(self, signal_catcher->lock_);
    signal_catcher->thread_ = self;
    signal_catcher->cond_.Broadcast(self);
  }
  // Set up mask with signals we want to handle.
  SignalSet signals;
  // The signals handled are SIGQUIT and SIGUSR1. SIGQUIT dumps the trace;
  // SIGUSR1 (kill -10) triggers a forced GC.
  signals.Add(SIGQUIT);
  signals.Add(SIGUSR1);
  while (true) {
    // Block until SIGQUIT or SIGUSR1 arrives. WaitForSignal() calls SignalSet::Wait()
    // (art/runtime/signal_set.h), which in turn calls sigwait() to wait for the signals.
    int signal_number = signal_catcher->WaitForSignal(self, signals);
    // If the SignalCatcher destructor has already run, detach the thread and exit;
    // normally this condition is false.
    if (signal_catcher->ShouldHalt()) {
      runtime->DetachCurrentThread();
      return NULL;
    }
    switch (signal_number) {
      case SIGQUIT:  // kill -3 pid: call HandleSigQuit() to dump the stacks of all threads.
        signal_catcher->HandleSigQuit();
        break;
      case SIGUSR1:  // kill -10 pid: call HandleSigUsr1() to trigger a forced GC.
        signal_catcher->HandleSigUsr1();
        break;
      default:
        LOG(ERROR) << "Unexpected signal %d" << signal_number;
        break;
    }
  }
}
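The wait-and-dispatch pattern in Run() can be tried outside ART with plain POSIX calls. The following is a minimal sketch, not ART's implementation. The counters stand in for HandleSigQuit() and HandleSigUsr1(), and all names (CatcherMain, StartCatcher, StopCatcher, WaitForCount, the g_ globals) are hypothetical. It assumes the signals are blocked in the creating thread before the catcher starts, which sigwait() needs in order to receive them reliably.

```cpp
#include <pthread.h>
#include <signal.h>
#include <unistd.h>

#include <atomic>
#include <cassert>

// Hypothetical stand-ins for the real handlers and the halt flag.
std::atomic<int> g_sigquit_count{0};
std::atomic<int> g_sigusr1_count{0};
std::atomic<bool> g_should_halt{false};

// Builds the set of signals the catcher thread waits for.
sigset_t CatcherSignalSet() {
  sigset_t set;
  sigemptyset(&set);
  sigaddset(&set, SIGQUIT);
  sigaddset(&set, SIGUSR1);
  return set;
}

// Thread body: blocks in sigwait() and dispatches on the signal number.
void* CatcherMain(void*) {
  sigset_t set = CatcherSignalSet();
  while (true) {
    int signal_number = 0;
    if (sigwait(&set, &signal_number) != 0) continue;
    if (g_should_halt.load()) return nullptr;  // Woken only to shut down.
    switch (signal_number) {
      case SIGQUIT: ++g_sigquit_count; break;  // Real code would dump stacks.
      case SIGUSR1: ++g_sigusr1_count; break;  // Real code would force a GC.
      default: break;
    }
  }
}

// Blocks the signals in the calling thread (threads created afterwards
// inherit the mask), then starts the catcher thread.
pthread_t StartCatcher() {
  sigset_t set = CatcherSignalSet();
  pthread_sigmask(SIG_BLOCK, &set, nullptr);
  pthread_t thread;
  pthread_create(&thread, nullptr, CatcherMain, nullptr);
  return thread;
}

// Sets the halt flag, wakes the catcher with a signal, and joins it.
void StopCatcher(pthread_t thread) {
  g_should_halt.store(true);
  pthread_kill(thread, SIGQUIT);
  pthread_join(thread, nullptr);
}

// Polls for up to about two seconds until `counter` reaches `expected`.
bool WaitForCount(const std::atomic<int>& counter, int expected) {
  for (int i = 0; i < 2000; ++i) {
    if (counter.load() >= expected) return true;
    usleep(1000);
  }
  return false;
}
```

Because every signal is handled synchronously on one dedicated thread, the handlers are ordinary code rather than asynchronous signal handlers, so they can take locks and allocate memory. This is also why a dump depends on that thread existing and being able to run.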
2. HandleSigQuit()
void SignalCatcher::HandleSigQuit() {
  Runtime* runtime = Runtime::Current();
  ThreadList* thread_list = runtime->GetThreadList();  // Get the runtime's thread list.
  // Grab exclusively the mutator lock, set state to Runnable without checking for a pending
  // suspend request as we're going to suspend soon anyway. We set the state to Runnable to avoid
  // giving away the mutator lock.
  // SuspendAll() suspends all other threads. The comment above describes the Signal Catcher
  // thread itself: it holds the mutator lock exclusively and switches its own state to
  // Runnable through SetStateUnsafe() below, so this thread typically shows up as Runnable
  // in the trace.
  thread_list->SuspendAll();
  Thread* self = Thread::Current();
  Locks::mutator_lock_->AssertExclusiveHeld(self);
  const char* old_cause = self->StartAssertNoThreadSuspension("Handling SIGQUIT");
  ThreadState old_state = self->SetStateUnsafe(kRunnable);
  std::ostringstream os;  // String stream used to assemble and format the output.
  os << "\n"
     << "----- pid " << getpid() << " at " << GetIsoDate() << " -----\n";
  DumpCmdLine(os);  // Print the contents of the process's cmdline.
  // Note: The string "ABI:" is chosen to match the format used by debuggerd.
  os << "ABI: " << GetInstructionSetString(runtime->GetInstructionSet()) << "\n";
  os << "Build type: " << (kIsDebugBuild ? "debug" : "optimized") << "\n";
  runtime->DumpForSigQuit(os);
  if (false) {
    std::string maps;
    if (ReadFileToString("/proc/self/maps", &maps)) {
      os << "/proc/self/maps:\n" << maps;
    }
  }
  os << "----- end " << getpid() << " -----\n";  // End-of-trace marker.
  CHECK_EQ(self->SetStateUnsafe(old_state), kRunnable);
  self->EndAssertNoThreadSuspension(old_cause);
  thread_list->ResumeAll();  // Resume all suspended threads.
  // Run the checkpoints after resuming the threads to prevent deadlocks if the checkpoint function
  // acquires the mutator lock.
  if (self->ReadFlag(kCheckpointRequest)) {
    self->RunCheckpointFunction();
  }
  Output(os.str());  // Call Output() to write the stream's contents to traces.txt.
}
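The header and footer written by HandleSigQuit() give every dump a recognizable envelope, which is handy when one traces file holds several dumps. The following is a minimal sketch, assuming only the "----- pid N at DATE -----" and "----- end N -----" markers seen in the code above. BuildDumpEnvelope and ExtractDumpPids are hypothetical helpers, and the "Cmd line: " prefix is an assumption about what DumpCmdLine() prints.

```cpp
#include <cassert>
#include <cstdlib>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: wraps `body` in the same envelope HandleSigQuit() writes.
// The "Cmd line: " prefix is an assumption about DumpCmdLine()'s output.
std::string BuildDumpEnvelope(int pid, const std::string& iso_date,
                              const std::string& cmd_line, const std::string& body) {
  std::ostringstream os;
  os << "\n"
     << "----- pid " << pid << " at " << iso_date << " -----\n";
  os << "Cmd line: " << cmd_line << "\n";
  os << body;
  os << "----- end " << pid << " -----\n";
  return os.str();
}

// Hypothetical helper: scans the text of a traces file and returns the pid of
// every dump, in file order, by looking for "----- pid " header lines.
std::vector<int> ExtractDumpPids(const std::string& traces) {
  std::vector<int> pids;
  std::istringstream in(traces);
  std::string line;
  const std::string prefix = "----- pid ";
  while (std::getline(in, line)) {
    if (line.compare(0, prefix.size(), prefix) == 0) {
      pids.push_back(std::atoi(line.c_str() + prefix.size()));
    }
  }
  return pids;
}
```

A header with no matching "----- end" line suggests the dump was cut short. ExtractDumpPids() does not check for that and only lists the headers.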
3. Output()
void SignalCatcher::Output(const std::string& s) {
  if (stack_trace_file_.empty()) {
    LOG(INFO) << s;
    return;
  }
  ScopedThreadStateChange tsc(Thread::Current(), kWaitingForSignalCatcherOutput);
  // Open /data/anr/traces.txt write-only, creating it if needed and appending to it.
  int fd = open(stack_trace_file_.c_str(), O_APPEND | O_CREAT | O_WRONLY, 0666);
  if (fd == -1) {
    PLOG(ERROR) << "Unable to open stack trace file '" << stack_trace_file_ << "'";
    return;
  }
  std::unique_ptr<File> file(new File(fd, stack_trace_file_));
  // Write the assembled string to /data/anr/traces.txt.
  if (!file->WriteFully(s.data(), s.size())) {
    PLOG(ERROR) << "Failed to write stack traces to '" << stack_trace_file_ << "'";
  } else {
    LOG(INFO) << "Wrote stack traces to '" << stack_trace_file_ << "'";
  }
}
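The open-and-append step in Output() can be reproduced with plain POSIX calls. The following is a minimal sketch, not ART's File class. AppendToFile and ReadWholeFile are hypothetical helpers, and the retry loop is an assumption about what a "write fully" operation has to do, since write() may return after writing only part of the buffer.

```cpp
#include <fcntl.h>
#include <unistd.h>

#include <cassert>
#include <cerrno>
#include <fstream>
#include <sstream>
#include <string>

// Hypothetical helper: appends `data` to `path`, creating the file if needed.
// Returns true only if the file could be opened and every byte was written.
bool AppendToFile(const std::string& path, const std::string& data) {
  int fd = open(path.c_str(), O_APPEND | O_CREAT | O_WRONLY, 0666);
  if (fd == -1) return false;  // For example: missing directory or no permission.
  const char* p = data.data();
  size_t remaining = data.size();
  bool ok = true;
  while (remaining > 0) {
    ssize_t n = write(fd, p, remaining);
    if (n == -1) {
      if (errno == EINTR) continue;  // Interrupted before writing; retry.
      ok = false;
      break;
    }
    p += n;  // Partial write: advance and keep going.
    remaining -= static_cast<size_t>(n);
  }
  if (close(fd) == -1) ok = false;
  return ok;
}

// Hypothetical helper: reads a whole file into a string (empty if unreadable).
std::string ReadWholeFile(const std::string& path) {
  std::ifstream in(path, std::ios::binary);
  std::ostringstream out;
  out << in.rdbuf();
  return out.str();
}
```

Because the file is opened with O_APPEND, each dump is added after the previous ones and nothing is overwritten. A failed open(), for example because of a missing directory or insufficient permissions, is one concrete way for a trace to go missing, and it corresponds to the "Unable to open stack trace file" log line above.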
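Summary: once you are familiar with this flow, when a trace fails to appear you can use the logs to roughly locate where it went wrong (for example, the "Unable to open stack trace file" or "Wrote stack traces to" messages from Output()). Finally, to restate how the two signals are handled: SIGQUIT (kill -3 pid) dumps the trace of a Java process, and SIGUSR1 (kill -10 pid) triggers a forced GC in the process.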