[Dev Notes] Java Thread Pools: The "Swallowed" Thread Exception (with Source Code Analysis and Fixes)

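Preface

Today I ran into a bug: a task handed to a thread pool seemingly "never ran", and no log lines were printed either.

After debugging locally, I found that the first half of the task logic threw a NullPointerException (NPE), and because the code had no surrounding try-catch, the exception was swallowed.

The fix itself is simple: wrap the task body in a try-catch. But if the exception is not caught, how does the thread pool handle it internally, and where does it finally end up?

Driven by that question, I read through the relevant thread pool source code and wrote up the summary below.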

Source Code Analysis

The problematic code in our project looked roughly like this:

ExecutorService threadPool = Executors.newFixedThreadPool(3);

threadPool.submit(() -> {
    String pennyStr = null;
    // Double.valueOf(String) throws a NullPointerException when given null
    Double penny = Double.valueOf(pennyStr);
    ...
});
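One way to confirm the symptom is to keep the Future returned by submit() and ask it for the result. The sketch below is a minimal reproduction, not the project's code; the class name SwallowedExceptionDemo and the method submitAndInspect are made up for illustration. Nothing is printed when the task fails, and the exception only shows up, wrapped in an ExecutionException, once get() is called.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class SwallowedExceptionDemo {

    /** Submits a failing task and returns the cause reported by Future#get(), or null. */
    static Throwable submitAndInspect() {
        ExecutorService threadPool = Executors.newFixedThreadPool(3);
        try {
            Future<?> future = threadPool.submit(() -> {
                String pennyStr = null;
                // Throws NullPointerException; nothing is logged at this point.
                Double penny = Double.valueOf(pennyStr);
                System.out.println("never reached: " + penny);
            });
            try {
                future.get();        // blocks until the task has finished
                return null;         // the task completed normally
            } catch (ExecutionException e) {
                return e.getCause(); // the original exception thrown by the task
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        } finally {
            threadPool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("cause = " + submitAndInspect());
    }
}
```

If the Future is discarded, as in the snippet above, there is no other place where the failure becomes visible.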

Start with the newFixedThreadPool factory method to see which concrete class it creates: it is ThreadPoolExecutor.

public static ExecutorService newFixedThreadPool(int nThreads) {
        return new ThreadPoolExecutor(nThreads, nThreads,
                                      0L, TimeUnit.MILLISECONDS,
                                      new LinkedBlockingQueue<Runnable>());
    }

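Next, look at the class hierarchy: ThreadPoolExecutor extends AbstractExecutorService, which implements ExecutorService, which in turn extends Executor.

Now step into submit. The method is declared in the ExecutorService interface but implemented in AbstractExecutorService; ThreadPoolExecutor does not override it.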

 public Future<?> submit(Runnable task) {
        if (task == null) throw new NullPointerException();
        RunnableFuture<Void> ftask = newTaskFor(task, null);
        execute(ftask);
        return ftask;
    }

protected <T> RunnableFuture<T> newTaskFor(Runnable runnable, T value) {
        return new FutureTask<T>(runnable, value);
    }

The corresponding FutureTask constructor:

public FutureTask(Runnable runnable, V result) {
        this.callable = Executors.callable(runnable, result);
        this.state = NEW;       // state is declared volatile, which guarantees visibility across threads
    }

The Executors.callable method that builds the Callable:

public static <T> Callable<T> callable(Runnable task, T result) {
        if (task == null)
            throw new NullPointerException();
        return new RunnableAdapter<T>(task, result);
    }

The RunnableAdapter class it returns:

 /**
     * A callable that runs given task and returns given result
     */
    static final class RunnableAdapter<T> implements Callable<T> {
        final Runnable task;
        final T result;
        RunnableAdapter(Runnable task, T result) {
            this.task = task;
            this.result = result;
        }
        public T call() {
            task.run();
            return result;
        }
    }

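To sum up, newTaskFor wraps the Runnable we submitted in a FutureTask, which is both a Runnable and a Future. This matters later: what a worker thread eventually runs is FutureTask#run(), not our lambda directly.

Next, the task is handed to the thread pool for scheduling, either directly to a new worker or via the work queue: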
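The effect of this wrapping can be observed without any thread pool. The following sketch (the class name FutureTaskCaptureDemo and the method runFailingTask are illustrative, not from the JDK) runs a failing FutureTask on the calling thread. FutureTask#run() catches the Throwable internally and records it, so run() returns normally and the failure is only reported by get().

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.FutureTask;

class FutureTaskCaptureDemo {

    /** Runs a failing FutureTask on the calling thread and reports what happened. */
    static String runFailingTask() {
        Runnable failing = () -> {
            throw new IllegalStateException("boom");
        };
        // Same wrapping that newTaskFor(task, null) performs.
        FutureTask<Void> ftask = new FutureTask<Void>(failing, null);

        ftask.run(); // does not throw: the exception is stored inside the FutureTask

        try {
            ftask.get();
            return "completed normally";
        } catch (ExecutionException e) {
            return "captured: " + e.getCause().getMessage();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return "interrupted";
        }
    }

    public static void main(String[] args) {
        System.out.println(runFailingTask()); // prints "captured: boom"
    }
}
```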

public void execute(Runnable command) {
        if (command == null)
            throw new NullPointerException();
    
        int c = ctl.get();
        if (workerCountOf(c) < corePoolSize) {
            if (addWorker(command, true))
                return;
            c = ctl.get();
        }
        if (isRunning(c) && workQueue.offer(command)) {
            int recheck = ctl.get();
            if (! isRunning(recheck) && remove(command))
                reject(command);
            else if (workerCountOf(recheck) == 0)
                addWorker(null, false);
        }
        else if (!addWorker(command, false))
            reject(command);
    }

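Since what we care about is how the task gets executed and how exceptions are thrown and handled, we will skip the rest of the logic for now. It is easy to see that the worker which runs a task is created in addWorker, so let's step into it. addWorker is long, so I only quote the key parts: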

private boolean addWorker(Runnable firstTask, boolean core) {

   ...
   boolean workerStarted = false;
   boolean workerAdded = false;
   Worker w = null;
   try {
      // Instantiate a Worker object
      w = new Worker(firstTask);
      final Thread t = w.thread;
      if (t != null) {
          final ReentrantLock mainLock = this.mainLock;
          mainLock.lock();
          try {
            
              int rs = runStateOf(ctl.get());

              if (rs < SHUTDOWN ||
                  (rs == SHUTDOWN && firstTask == null)) {
                  if (t.isAlive()) // precheck that t is startable
                      throw new IllegalThreadStateException();
                  workers.add(w);
                  int s = workers.size();
                  if (s > largestPoolSize)
                      largestPoolSize = s;
                  workerAdded = true;
              }
          } finally {
              mainLock.unlock();
          }
          if (workerAdded) {
              // As the Worker constructor shows, once this thread is started,
              // it ends up calling the Worker object's run()
              t.start();
              workerStarted = true;
          }
      }
   } finally {
      if (! workerStarted)
          addWorkerFailed(w);
   }
   return workerStarted;
}

// The Worker constructor
  Worker(Runnable firstTask) {
            setState(-1); // inhibit interrupts until runWorker
            this.firstTask = firstTask;
            this.thread = getThreadFactory().newThread(this);
        }

Now look at Worker, an inner class of ThreadPoolExecutor:

private final class Worker
        extends AbstractQueuedSynchronizer
        implements Runnable
   {
        ...

        /** Delegates main run loop to outer runWorker  */
        public void run() {
            runWorker(this);
        }

      ...
   }

So the task is actually executed in the outer class's runWorker method. Let's see how that method drives the worker thread.

final void runWorker(Worker w) {
    Thread wt = Thread.currentThread();
    Runnable task = w.firstTask;
    w.firstTask = null;
    w.unlock(); // allow interrupts
    boolean completedAbruptly = true;
    try {
        while (task != null || (task = getTask()) != null) {
            w.lock();
   
            if ((runStateAtLeast(ctl.get(), STOP) ||
                 (Thread.interrupted() &&
                  runStateAtLeast(ctl.get(), STOP))) &&
                !wt.isInterrupted())
                wt.interrupt();
            try {
                beforeExecute(wt, task);
                Throwable thrown = null;
                // ==== key code start ====
                try {
                    // Simply calls the task's run method
                    task.run();
                } catch (RuntimeException x) {
                    thrown = x; throw x;
                } catch (Error x) {
                    thrown = x; throw x;
                } catch (Throwable x) {
                    thrown = x; throw new Error(x);
                } finally {
                    afterExecute(task, thrown);
                }
                 // ==== key code end ====
            } finally {
                task = null;
                w.completedTasks++;
                w.unlock();
            }
        }
        completedAbruptly = false;
    } finally {
        processWorkerExit(w, completedAbruptly);
    }
}

We have finally reached the bottom. Inside the try-catch block of the key code, the run method of the current task is invoked.

// ==== key code start ====
try {
  // Simply calls the task's run method
  task.run();
} catch (RuntimeException x) {
  thrown = x; throw x;
} catch (Error x) {
  thrown = x; throw x;
} catch (Throwable x) {
  thrown = x; throw new Error(x);
} finally {
  afterExecute(task, thrown);
}
// ==== key code end ====

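As you can see, an exception caught here is rethrown. However, the finally block calls afterExecute(), which looks like a place where the exception could be handled. Let's take a look: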

protected void afterExecute(Runnable r, Throwable t) { }

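ThreadPoolExecutor#afterExecute() does nothing by default. It is evidently a hook that users override to customize what happens after a task runs, including exception handling.

So where does the exception actually end up? There are two cases. With submit(), the task is a FutureTask, whose run() catches any Throwable and stores it in the Future; nothing propagates out of task.run(), and the exception only resurfaces, wrapped in an ExecutionException, when someone calls Future#get(). That is why our task failed silently. With execute(), the raw Runnable is run directly, the exception escapes runWorker, and the worker thread terminates with an uncaught exception. For that second case, I found the HotSpot JVM logic for handling uncaught thread exceptions in an article by an expert: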
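Going back to the rethrow in runWorker: when a task started with execute() throws, the exception leaves the while loop with completedAbruptly still true, and processWorkerExit retires that worker. A rough way to observe this is sketched below, assuming a single-thread pool; the class name WorkerReplacementDemo and its method are hypothetical. The failing task and the task after it record the thread they ran on, and the two threads turn out to be different. The default handler also prints the stack trace of the first task to System.err.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;

class WorkerReplacementDemo {

    /** Returns true if the second task ran on a different thread than the failing first task. */
    static boolean workerIsReplacedAfterFailure() {
        ExecutorService pool = Executors.newFixedThreadPool(1);
        AtomicReference<Thread> first = new AtomicReference<>();
        AtomicReference<Thread> second = new AtomicReference<>();
        CountDownLatch done = new CountDownLatch(1);
        try {
            pool.execute(() -> {
                first.set(Thread.currentThread());
                // Escapes runWorker, so this worker thread terminates.
                throw new IllegalStateException("boom");
            });
            pool.execute(() -> {
                second.set(Thread.currentThread());
                done.countDown();
            });
            if (!done.await(5, TimeUnit.SECONDS)) {
                throw new IllegalStateException("second task did not run");
            }
            return first.get() != second.get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("worker replaced: " + workerIsReplacedAfterFailure());
    }
}
```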

if (!destroy_vm || JDK_Version::is_jdk12x_version()) {
    // JSR-166: change call from from ThreadGroup.uncaughtException to
    // java.lang.Thread.dispatchUncaughtException
    if (uncaught_exception.not_null()) {
      // If there is an uncaught exception
      Handle group(this, java_lang_Thread::threadGroup(threadObj()));
      {
        KlassHandle recvrKlass(THREAD, threadObj->klass());
        CallInfo callinfo;
        KlassHandle thread_klass(THREAD, SystemDictionary::Thread_klass());
        /*  
         This works like a method table; it ends up calling Thread#dispatchUncaughtException
         template(dispatchUncaughtException_name,            "dispatchUncaughtException")                
        */
        LinkResolver::resolve_virtual_call(callinfo, threadObj, recvrKlass, thread_klass,
                                           vmSymbols::dispatchUncaughtException_name(),
                                           vmSymbols::throwable_void_signature(),
                                           KlassHandle(), false, false, THREAD);
        CLEAR_PENDING_EXCEPTION;
        methodHandle method = callinfo.selected_method();
        if (method.not_null()) {
          JavaValue result(T_VOID);
          JavaCalls::call_virtual(&result,
                                  threadObj, thread_klass,
                                  vmSymbols::dispatchUncaughtException_name(),
                                  vmSymbols::throwable_void_signature(),
                                  uncaught_exception,
                                  THREAD);
        } else {
          KlassHandle thread_group(THREAD, SystemDictionary::ThreadGroup_klass());
          JavaValue result(T_VOID);
          JavaCalls::call_virtual(&result,
                                  group, thread_group,
                                  vmSymbols::uncaughtException_name(),
                                  vmSymbols::thread_throwable_void_signature(),
                                  threadObj,           // Arg 1
                                  uncaught_exception,  // Arg 2
                                  THREAD);
        }
        if (HAS_PENDING_EXCEPTION) {
          ResourceMark rm(this);
          jio_fprintf(defaultStream::error_stream(),
                "\nException: %s thrown from the UncaughtExceptionHandler"
                " in thread \"%s\"\n",
                pending_exception()->klass()->external_name(),
                get_thread_name());
          CLEAR_PENDING_EXCEPTION;
        }
      }
    }

The code is written in C++. If you are interested, read the full source; the English comments make it reasonably easy to follow.

http://hg.openjdk.java.net/jdk7/jdk7/hotspot/file/tip/src/share/vm/runtime/thread.cpp

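As you can see, the JVM ends up calling Thread#dispatchUncaughtException, which, when the thread has no handler of its own, falls back to the thread's ThreadGroup and its uncaughtException method: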

/**
     * Dispatch an uncaught exception to the handler. This method is
     * intended to be called only by the JVM.
     */
    private void dispatchUncaughtException(Throwable e) {
        getUncaughtExceptionHandler().uncaughtException(this, e);
    }

/**
 * Called by the Java Virtual Machine when a thread in this
 * thread group stops because of an uncaught exception, and the thread
 * does not have a specific {@link Thread.UncaughtExceptionHandler}
 * installed.
 *
 */
public void uncaughtException(Thread t, Throwable e) {
        if (parent != null) {
            parent.uncaughtException(t, e);
        } else {
            Thread.UncaughtExceptionHandler ueh =
                Thread.getDefaultUncaughtExceptionHandler();
            if (ueh != null) {
                ueh.uncaughtException(t, e);
            } else if (!(e instanceof ThreadDeath)) {
               // As you can see, the output goes to System.err
                System.err.print("Exception in thread \""
                                 + t.getName() + "\" ");
                e.printStackTrace(System.err);
            }
        }
    }

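The JDK comments make this clear: when a thread terminates because of an uncaught exception, the JVM calls dispatchUncaughtException. If the thread has no UncaughtExceptionHandler of its own, getUncaughtExceptionHandler() returns the thread's ThreadGroup instead. The group delegates to its parent group, then to the default handler registered with Thread.setDefaultUncaughtExceptionHandler, and if there is none either, the stack trace is printed to System.err.

This IBM article likewise recommends using the uncaughtException handler provided by ThreadGroup to detect when a thread terminates abnormally.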
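The dispatch path above can be exercised from a thread pool. The sketch below is an illustration under assumptions (the class HandlerDemo and the method messagesSeenByHandler are invented names): a ThreadFactory installs a handler on every worker thread, then one failing task is started with execute() and another with submit(). Only the exception from execute() reaches the handler, because the one from submit() is kept inside the FutureTask.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.TimeUnit;

class HandlerDemo {

    /** Runs one failing task via execute() and one via submit(); returns what the handler saw. */
    static List<String> messagesSeenByHandler() {
        List<String> seen = new CopyOnWriteArrayList<>();
        CountDownLatch handled = new CountDownLatch(1);
        ThreadFactory factory = r -> {
            Thread t = new Thread(r);
            t.setUncaughtExceptionHandler((thread, e) -> {
                seen.add(e.getMessage());
                handled.countDown();
            });
            return t;
        };
        Runnable failsInExecute = () -> { throw new IllegalStateException("from execute"); };
        Runnable failsInSubmit = () -> { throw new IllegalStateException("from submit"); };

        ExecutorService pool = Executors.newFixedThreadPool(2, factory);
        try {
            pool.execute(failsInExecute);
            Future<?> future = pool.submit(failsInSubmit);
            try {
                future.get();
            } catch (ExecutionException expected) {
                // The submit() failure is only visible here, never in the handler.
            }
            // The handler runs on the dying worker thread, so wait for it.
            handled.await(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(messagesSeenByHandler()); // prints "[from execute]"
    }
}
```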

https://www.ibm.com/developerworks/cn/java/j-jtp0924/index.html

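Summary (Fixes)

From the source code analysis above, there are several ways to deal with the "swallowed" exception discussed in this post:

Catch the exception with try-catch inside the task; this is what is normally done.

Set an UncaughtExceptionHandler on the thread or thread group. Note that this only takes effect for tasks started with execute(); with submit(), the exception is stored in the Future and never reaches the handler.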

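  // Typically placed in a ThreadFactory so that every worker thread gets the handler
  ThreadFactory factory = r -> {
      Thread t = new Thread(r);
      t.setUncaughtExceptionHandler(
          (t1, e) -> LOGGER.error(t1 + " throws exception: " + e));
      return t;
  };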

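Override the thread pool's afterExecute method. For tasks passed to submit(), the Throwable argument is null, so the hook has to call get() on the finished Future to recover the exception.

Although this post proposes fixes for the problem, its main purpose is the source code analysis: understanding the path an exception takes through the whole process. I hope it helps.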
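A possible shape for such an override is sketched here, following the pattern described in the ThreadPoolExecutor#afterExecute Javadoc. The class name LoggingThreadPool, the failures list and the helper method are assumptions made for this example. For tasks started with execute() the Throwable arrives as the argument t, while for tasks started with submit() the hook unwraps it from the completed Future.

```java
import java.util.List;
import java.util.concurrent.CancellationException;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

class LoggingThreadPool extends ThreadPoolExecutor {

    /** Failures observed by afterExecute, kept only so the example can be inspected. */
    final List<Throwable> failures = new CopyOnWriteArrayList<>();

    LoggingThreadPool(int nThreads) {
        super(nThreads, nThreads, 0L, TimeUnit.MILLISECONDS,
              new LinkedBlockingQueue<Runnable>());
    }

    @Override
    protected void afterExecute(Runnable r, Throwable t) {
        super.afterExecute(r, t);
        // submit() case: t is null and the exception is stored in the Future.
        if (t == null && r instanceof Future<?> && ((Future<?>) r).isDone()) {
            try {
                ((Future<?>) r).get();
            } catch (CancellationException ce) {
                t = ce;
            } catch (ExecutionException ee) {
                t = ee.getCause();
            } catch (InterruptedException ie) {
                Thread.currentThread().interrupt();
            }
        }
        if (t != null) {
            failures.add(t);
            System.err.println("task failed: " + t);
        }
    }

    /** Submits one failing task and returns the failure recorded by afterExecute, or null. */
    static Throwable firstFailureOfSubmittedTask() {
        LoggingThreadPool pool = new LoggingThreadPool(1);
        Runnable failing = () -> { throw new IllegalStateException("boom"); };
        pool.submit(failing);
        pool.shutdown(); // already submitted tasks still run
        try {
            if (!pool.awaitTermination(5, TimeUnit.SECONDS)) {
                throw new IllegalStateException("pool did not terminate");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return pool.failures.isEmpty() ? null : pool.failures.get(0);
    }

    public static void main(String[] args) {
        System.out.println("recorded: " + firstFailureOfSubmittedTask());
    }
}
```

In real code the println would be replaced by the project's logger; the list exists only to make the behaviour checkable.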
