Types and Causes of Android ANRs

Conditions That Trigger an Android ANR

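In Android, an ANR (Application Not Responding) essentially means that the main thread cannot respond. The main reasons the main thread becomes unresponsive are roughly as follows:

  • The main thread performs network requests, database access, or file I/O. These are slow operations that leave the main thread blocked, and if the wait exceeds the timeout an ANR occurs;
  • The CPU is starved, so the main thread does not get to run, which causes an ANR;
  • Other processes or threads hold on to CPU resources and do not release them, so the main thread cannot run, which causes an ANR;
  • Deadlock: the lock the main thread is waiting for is held by another thread and is never released.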

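ANR problems usually originate in app code. The InputDispatcher thread in the system_server process continuously monitors how long the app takes to respond to input events. An ANR is raised if a key or touch event gets no response within 5 s, if a BroadcastReceiver does not respond within 10 s, or if a Service times out. ActivityManagerService prints the direct cause of the ANR to aplog and also has signal 3 (SIGQUIT) sent to the affected process, so that the stack traces of all its threads are dumped to data/anr/traces.txt. ANR analysis therefore relies mainly on aplog and traces.txt. The individual types are broken down below: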

1. Input event timeout (no response to input events)

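When an application window is active and able to receive input events (key events, touch events, and so on), events reported by the lower layers of the system are dispatched to that application by InputDispatcher. For most windows, "active" can be read as "focusable and currently focused". The exception is windows with the FLAG_NOT_FOCUSABLE attribute: such a window never takes input focus, so the user cannot send key or button events to it, and those events go instead to the focusable window behind it.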

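The application's main thread reads input events through an InputChannel and hands them to the view hierarchy. The hierarchy is a tree with DecorView at its root, and events travel from the root down, level by level, to the focused control (for example a Button). Developers usually register listeners to receive and handle events, or create custom views that handle them.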

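InputDispatcher runs on a child thread of the system_server process. Whenever a new input event arrives, InputDispatcher checks whether the previous input event already sent to the application has been fully handled. If that event has timed out, a chain of callbacks ends in WMS's notifyANR function, which reports the ANR.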

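Note that this type of ANR requires an input event. Without one, no ANR is reported even if the main thread is blocked. From a design point of view, the system assumes the user is not paying attention to the phone and hopes the blockage will clear by itself after a while, so it temporarily keeps quiet. From an implementation point of view, InputDispatcher has not dispatched any event to the application, so it neither checks for a handling timeout nor reports an ANR.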
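The "no new event, no ANR" behaviour described above can be illustrated with a toy model. This is only a sketch: the class DispatchTimeoutModel and its methods are hypothetical names invented for illustration, timestamps are passed in explicitly, and the real InputDispatcher is considerably more involved.

```java
import java.util.OptionalLong;

/** Toy model of the "is the previous event still pending?" check (hypothetical, not AOSP code). */
final class DispatchTimeoutModel {
    private final long timeoutMs;
    // Time at which the currently unfinished event was delivered, if any.
    private OptionalLong pendingSince = OptionalLong.empty();

    DispatchTimeoutModel(long timeoutMs) {
        this.timeoutMs = timeoutMs;
    }

    /**
     * Called when a new input event arrives at nowMs.
     * Returns true if, in this model, an ANR would be reported.
     */
    boolean onNewEvent(long nowMs) {
        if (pendingSince.isPresent()) {
            // The previous event is unfinished: report only once it has exceeded the timeout.
            return nowMs - pendingSince.getAsLong() >= timeoutMs;
        }
        // Nothing pending: deliver this event and start the clock for it.
        pendingSince = OptionalLong.of(nowMs);
        return false;
    }

    /** Called when the application signals that it has finished the pending event. */
    void onEventFinished() {
        pendingSince = OptionalLong.empty();
    }
}
```

In this model the timeout is evaluated only inside onNewEvent, so a blocked application that receives no further input is never flagged, which mirrors the behaviour the paragraph describes.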

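When this type of ANR occurs, the message is: Reason: Input dispatching timed out (Waiting because the focused window has not finished processing the input events that were previously delivered to it.) Take care to distinguish it from the window focus timeout, which also belongs to the Input dispatching timed out family: the text inside the parentheses differs between the two.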

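The timeout for this type of ANR is defined in ActivityManagerService.java and defaults to 5 seconds. If necessary, the code can be modified so that the timeout on low-memory devices is longer than 5 seconds, or the parameter can be set to a suitable value for a limited period of time.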

2. Get focus timeout (window focus acquisition timeout)

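A window focus timeout is a subtype of the input event timeout; both are reported to AMS by InputDispatcher. When an application window is "active" and able to receive input events, events reported by the lower layers of the system are dispatched to it by InputDispatcher. If, for whatever reason, the window takes too long to become "active" and cannot receive input events, InputDispatcher reports a "window focus timeout".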

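When this type of ANR occurs, the message is: Reason: Input dispatching timed out (Waiting because no window has focus but there is a focused application that may eventually add a window when it finishes starting up.) Take care to distinguish it from the input event timeout, which also belongs to the Input dispatching timed out family: the text inside the parentheses differs between the two.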

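To understand why a window can time out while acquiring focus, we need a brief look at how the focused application and the focused window change during a window switch. Suppose we are currently in application A and are about to launch application B. During the launch, the focused application and focused window change as follows:

  • Start: the focused application is A and the focused window is A (one of its windows);
  • After A begins its onPause flow: the focused application is A and the focused window is null;
  • After zygote has finished creating B's process: the focused application is B and the focused window is null;
  • After B's onResume flow has completed: the focused application is B and the focused window is B (one of its windows).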

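During this process the focused window is null in two stages. If the time spent with a null focused window exceeds 5 seconds, the application is reported with a window focus timeout ANR. Because the focused application is A in one of those stages and B in the other, the application the system blames is not necessarily the one that actually caused the ANR. When analysing a window focus timeout, always check whether the current focused application and focused window match, and establish which application is really responsible before analysing further.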
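Why the blamed application can differ from the slow one is easier to see with a small timeline model. This is a deliberately simplified sketch: FocusTimeline, State, and blamedApp are hypothetical names, the model measures a single 5-second window from the moment the focused window first becomes null, and it ignores any timer resets the real dispatcher may perform.

```java
import java.util.List;

/** Simplified timeline of focused application / focused window (hypothetical model). */
final class FocusTimeline {
    static final class State {
        final long atMs;      // time at which this state takes effect
        final String app;     // focused application
        final String window;  // focused window, or null if none

        State(long atMs, String app, String window) {
            this.atMs = atMs;
            this.app = app;
            this.window = window;
        }
    }

    /**
     * Returns the application that is focused at the moment the null-window
     * period exceeds timeoutMs, or null if that never happens.
     * The states must be sorted by atMs; the last state lasts indefinitely.
     */
    static String blamedApp(List<State> states, long timeoutMs) {
        long nullSince = -1;
        for (int i = 0; i < states.size(); i++) {
            State s = states.get(i);
            if (s.window != null) {
                nullSince = -1; // a window has focus again, so the null period ends
                continue;
            }
            if (nullSince < 0) {
                nullSince = s.atMs; // the null period starts here
            }
            long deadline = nullSince + timeoutMs;
            long end = (i + 1 < states.size()) ? states.get(i + 1).atMs : Long.MAX_VALUE;
            if (deadline < end) {
                return s.app; // the deadline expires while this state is in effect
            }
        }
        return null;
    }
}
```

For example, if A's onPause keeps the window null from 1 s to 5 s and B then needs until 7 s to show its window, the deadline at 6 s falls in B's stage and B is blamed, although A used up four of the five seconds.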

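So why would the focused window stay null for more than 5 seconds? It is usually one of the following:

  • Slow application creation. The application's onCreate/onStart/onResume methods run slowly, deadlock, or loop forever, so onResume does not finish in time and the timeout causes an ANR.
  • Slow onPause. Within a single application, the next onResume does not run until the previous onPause has finished. Different applications do not affect each other in this way.
  • Slow overall system performance. System-level problems such as high CPU usage, a long average run queue, memory fragmentation, a high page-fault rate, slow GC, frozen user space, or processes stuck in uninterruptible sleep slow everything down and make ANRs frequent.
  • WMS malfunction. Because of a stock bug in Android 4.4, focus sometimes still has not switched 8 seconds after the application's onResume has finished, which causes an ANR.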

3. Broadcast timeout (broadcast receiver timeout)

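When the application's main thread is executing a BroadcastReceiver's onReceive method and does not finish in time, a broadcast timeout ANR is reported. The timeout is 10 seconds for foreground broadcasts and 60 seconds for background broadcasts. If a time-consuming job has to be done, it should be handed off by sending an Intent to the application's Service rather than occupying the main thread in onReceive for a long time. Unlike the previous two types, the system does not show a dialog for this type of ANR; it only writes the error to slog.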

When this type of ANR occurs, the message looks like: Reason: Broadcast of Intent  { act=android.intent.action.NEW_OUTGOING_CALL  flg=0x10000010 cmp=com.qualcomm.location/.GpsNetInitiatedHandler$OutgoingCallReceiver (has extras) }

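On low-memory Android devices, the kernel's LowMemoryKiller frequently kills background applications to free memory. If an application happens to be killed by LMK just as it starts executing onReceive, then when BroadcastQueue checks on the broadcast 60 seconds later, that application is certain to be reported with an ANR. The telltale sign of this scenario is that System.log shows a PID of 0 for the application when the ANR is reported.

To avoid this problem and lengthen the time to first failure in Monkey tests, code can be added to BroadcastQueue that checks the PID on a broadcast timeout and skips the ANR report when the PID is 0.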

4. Service timeout

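A Service's lifecycle callbacks, such as onCreate, onStart, and onDestroy, also run on the main thread. If one of them does not return within 20 seconds, an ANR is triggered. As with broadcasts, the system shows no dialog for this kind of ANR and only writes to the log.

The message for this type of ANR looks like: Reason: Executing service com.ysxj.RenHeDao/.Service.PollingService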

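5. ContentProvider timeout

The main thread does not finish a ContentProvider-related operation within the allowed time. The log looks like: Reason: ContentProvider not responding. No ANR dialog is shown.

This type of ANR arises at application startup, when AMS.attachApplicationLocked() is called and all ContentProviders of the starting process are being published.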
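When scanning logs, the five types above can be told apart by the text of the Reason line. The following is a minimal sketch that assumes the message prefixes are exactly the ones quoted in this article; AnrReasonClassifier and its Type enum are hypothetical names, and messages from other Android versions may differ.

```java
/** Classifies an ANR "Reason:" line by the prefixes quoted in this article (illustrative only). */
final class AnrReasonClassifier {
    enum Type {
        INPUT_EVENT_TIMEOUT,
        FOCUS_TIMEOUT,
        BROADCAST_TIMEOUT,
        SERVICE_TIMEOUT,
        PROVIDER_TIMEOUT,
        UNKNOWN
    }

    static Type classify(String line) {
        String reason = line.trim();
        if (reason.startsWith("Reason:")) {
            reason = reason.substring("Reason:".length()).trim();
        }
        if (reason.startsWith("Input dispatching timed out")) {
            // Both input subtypes share this prefix; only the parenthesised text differs.
            // Any parenthesised text other than the focus message is treated as an
            // input event timeout in this sketch.
            return reason.contains("no window has focus")
                    ? Type.FOCUS_TIMEOUT
                    : Type.INPUT_EVENT_TIMEOUT;
        }
        if (reason.startsWith("Broadcast of Intent")) {
            return Type.BROADCAST_TIMEOUT;
        }
        if (reason.startsWith("Executing service")) {
            return Type.SERVICE_TIMEOUT;
        }
        if (reason.startsWith("ContentProvider not responding")) {
            return Type.PROVIDER_TIMEOUT;
        }
        return Type.UNKNOWN;
    }
}
```

Checking the parenthesised text before anything else keeps the two Input dispatching timed out subtypes from being confused, which is the distinction the sections above warn about.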

In Android 5.1, the ANR messages above come from the following code (ActivityManagerService.java and InputDispatcher.cpp):

    public boolean inputDispatchingTimedOut(final ProcessRecord proc,
            final ActivityRecord activity, final ActivityRecord parent,
            final boolean aboveSystem, String reason) {
        if (checkCallingPermission(android.Manifest.permission.FILTER_EVENTS)
                != PackageManager.PERMISSION_GRANTED) {
            throw new SecurityException("Requires permission "
                    + android.Manifest.permission.FILTER_EVENTS);
        }

        final String annotation;
        if (reason == null) {
            annotation = "Input dispatching timed out";
        } else {
            annotation = "Input dispatching timed out (" + reason + ")";
        }
    ......
int32_t InputDispatcher::findFocusedWindowTargetsLocked(nsecs_t currentTime,
        const EventEntry* entry, Vector<InputTarget>& inputTargets, nsecs_t* nextWakeupTime) {
    int32_t injectionResult;
    String8 reason;

    // If there is no currently focused window and no focused application
    // then drop the event.
    if (mFocusedWindowHandle == NULL) {
        if (mFocusedApplicationHandle != NULL) {
            injectionResult = handleTargetsNotReadyLocked(currentTime, entry,
                    mFocusedApplicationHandle, NULL, nextWakeupTime,
                    "Waiting because no window has focus but there is a "
                    "focused application that may eventually add a window "
                    "when it finishes starting up.");
            goto Unresponsive;
        }

        ALOGI("Dropping event because there is no focused window or focused application.");
        injectionResult = INPUT_EVENT_INJECTION_FAILED;
        goto Failed;
    }
    ......
String8 InputDispatcher::checkWindowReadyForMoreInputLocked(nsecs_t currentTime,
        const sp<InputWindowHandle>& windowHandle, const EventEntry* eventEntry,
        const char* targetType) {
    // If the window is paused then keep waiting.
    if (windowHandle->getInfo()->paused) {
        return String8::format("Waiting because the %s window is paused.", targetType);
    }

    // If the window's connection is not registered then keep waiting.
    ssize_t connectionIndex = getConnectionIndexLocked(windowHandle->getInputChannel());
    if (connectionIndex < 0) {
        return String8::format("Waiting because the %s window's input channel is not "
                "registered with the input dispatcher.  The window may be in the process "
                "of being removed.", targetType);
    }

    // If the connection is dead then keep waiting.
    sp<Connection> connection = mConnectionsByFd.valueAt(connectionIndex);
    if (connection->status != Connection::STATUS_NORMAL) {
        return String8::format("Waiting because the %s window's input connection is %s."
                "The window may be in the process of being removed.", targetType,
                connection->getStatusLabel());
    }

    // If the connection is backed up then keep waiting.
    if (connection->inputPublisherBlocked) {
        return String8::format("Waiting because the %s window's input channel is full.  "
                "Outbound queue length: %d.  Wait queue length: %d.",
                targetType, connection->outboundQueue.count(), connection->waitQueue.count());
    }

    // Ensure that the dispatch queues aren't too far backed up for this event.
    if (eventEntry->type == EventEntry::TYPE_KEY) {
        // If the event is a key event, then we must wait for all previous events to
        // complete before delivering it because previous events may have the
        // side-effect of transferring focus to a different window and we want to
        // ensure that the following keys are sent to the new window.
        //
        // Suppose the user touches a button in a window then immediately presses "A".
        // If the button causes a pop-up window to appear then we want to ensure that
        // the "A" key is delivered to the new pop-up window.  This is because users
        // often anticipate pending UI changes when typing on a keyboard.
        // To obtain this behavior, we must serialize key events with respect to all
        // prior input events.
        if (!connection->outboundQueue.isEmpty() || !connection->waitQueue.isEmpty()) {
            return String8::format("Waiting to send key event because the %s window has not "
                    "finished processing all of the input events that were previously "
                    "delivered to it.  Outbound queue length: %d.  Wait queue length: %d.",
                    targetType, connection->outboundQueue.count(), connection->waitQueue.count());
        }
    } else {
        // Touch events can always be sent to a window immediately because the user intended
        // to touch whatever was visible at the time.  Even if focus changes or a new
        // window appears moments later, the touch event was meant to be delivered to
        // whatever window happened to be on screen at the time.
        //
        // Generic motion events, such as trackball or joystick events are a little trickier.
        // Like key events, generic motion events are delivered to the focused window.
        // Unlike key events, generic motion events don't tend to transfer focus to other
        // windows and it is not important for them to be serialized.  So we prefer to deliver
        // generic motion events as soon as possible to improve efficiency and reduce lag
        // through batching.
        //
        // The one case where we pause input event delivery is when the wait queue is piling
        // up with lots of events because the application is not responding.
        // This condition ensures that ANRs are detected reliably.
        if (!connection->waitQueue.isEmpty()
                && currentTime >= connection->waitQueue.head->deliveryTime
                        + STREAM_AHEAD_EVENT_TIMEOUT) {
            return String8::format("Waiting to send non-key event because the %s window has not "
                    "finished processing certain input events that were delivered to it over "
                    "%0.1fms ago.  Wait queue length: %d.  Wait queue head age: %0.1fms.",
                    targetType, STREAM_AHEAD_EVENT_TIMEOUT * 0.000001f,
                    connection->waitQueue.count(),
                    (currentTime - connection->waitQueue.head->deliveryTime) * 0.000001f);
        }
    }
    return String8::empty();
}
    public void appNotRespondingViaProvider(IBinder connection) {
        enforceCallingPermission(
                android.Manifest.permission.REMOVE_TASKS, "appNotRespondingViaProvider()");

        final ContentProviderConnection conn = (ContentProviderConnection) connection;
        if (conn == null) {
            Slog.w(TAG, "ContentProviderConnection is null");
            return;
        }

        final ProcessRecord host = conn.provider.proc;
        if (host == null) {
            Slog.w(TAG, "Failed to find hosting ProcessRecord");
            return;
        }

        final long token = Binder.clearCallingIdentity();
        try {
            appNotResponding(host, null, null, false, "ContentProvider not responding");
        } finally {
            Binder.restoreCallingIdentity(token);
        }
    }

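Overall, the ANR forms described above can be traced back to the following situations.

First, ANRs are mainly caused by poor application design, chiefly in the following ways:

  • The main thread waits too long because it calls a thread's join(), sleep(), or wait() method, because another thread holds a lock, or because another thread has terminated or crashed;
  • The number of service binders reaches its limit, a WatchDog ANR occurs in system server, or a service is too busy and times out without responding;
  • The main thread performs very slow operations, such as slow network access, heavy data reads and writes, database operations, hardware operations (for example the camera), or expensive computation such as bitmap processing.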
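For the lock-related cause in the list above, one common mitigation is to bound how long a thread is willing to wait for a lock. The sketch below uses java.util.concurrent.locks.ReentrantLock; BoundedLockWait and runWithLock are hypothetical helper names, and it only illustrates the idea rather than fixing an actual deadlock.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

/** Runs an action under a lock, but gives up instead of waiting indefinitely (illustrative helper). */
final class BoundedLockWait {
    /**
     * Tries to acquire the lock for at most maxWaitMs milliseconds.
     * Returns true if the action ran, false if the lock could not be
     * obtained in time or the waiting thread was interrupted.
     */
    static boolean runWithLock(ReentrantLock lock, long maxWaitMs, Runnable action) {
        boolean acquired = false;
        try {
            acquired = lock.tryLock(maxWaitMs, TimeUnit.MILLISECONDS);
            if (!acquired) {
                return false; // another thread still holds the lock
            }
            action.run();
            return true;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt status for the caller
            return false;
        } finally {
            if (acquired) {
                lock.unlock();
            }
        }
    }
}
```

A caller on the main thread can use the false return value to postpone or skip the work instead of blocking until the system's timeout expires.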

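In addition, other processes may use so much CPU that the application's process cannot get a CPU time slice. For example, frequent file reads and writes can drive the CPU usage of I/O processes so high that the application hits an ANR.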

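Specifically, the main cases are:

  • A faulty driver for a peripheral used by the application makes it unstable, which eventually shows up as an ANR at the application layer.
  • The kernel freezes user space, so no program can run.
  • Low I/O throughput makes the application wait a long time for I/O.
  • A real-time process in the HAL layer occupies the CPU for a long time, so the scheduling queue grows too long.
  • A stock AMS bug prevents system focus from switching correctly.

Taken together, these cases are system-level causes: the system fails to guarantee the application the running time it needs.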

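To avoid ANRs, keep the following in mind:

  • Avoid complex, time-consuming operations on the main thread, especially file reads and database operations;
  • Avoid frequent real-time UI updates;
  • When a BroadcastReceiver needs to do complex work, start a Service from onReceive() to handle it;
  • Avoid starting an Activity from an IntentReceiver, because it creates a new screen and steals focus from whatever the user is currently running. If your application needs to show the user something in response to an Intent broadcast, use the Notification Manager instead;
  • During design and coding, avoid synchronization problems, deadlocks, and improper error handling.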
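The first point in the list above, keeping slow work off the main thread, can be sketched in plain Java with two executors. OffMainThread and run are hypothetical names. On Android the mainThread argument would be an Executor that posts to the UI thread (for example the one returned by ContextCompat.getMainExecutor), which is an assumption of this sketch rather than something the article prescribes.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;
import java.util.function.Consumer;
import java.util.function.Supplier;

/** Runs slow work on a worker executor and delivers the result on another executor (illustrative). */
final class OffMainThread {
    /**
     * Executes slowWork on the worker executor, then passes its result to onDone
     * on the mainThread executor. The calling thread is never blocked.
     */
    static <T> CompletableFuture<Void> run(Supplier<T> slowWork, Executor worker,
                                           Executor mainThread, Consumer<T> onDone) {
        return CompletableFuture
                .supplyAsync(slowWork, worker)       // file, database, or network work runs here
                .thenAcceptAsync(onDone, mainThread); // only the cheap result handling runs here
    }
}
```

The returned future is there for tests and error handling; calling join() on it from the main thread would reintroduce the blocking the helper is meant to avoid.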

That concludes the overview of ANR causes and types. Next, we look at how to analyse ANR problems.

Android ANR Problem Analysis, Part 1
