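Less Common Things in Java (4) - Fork/Join

Introduction

It has been a long while since the last post in the "Less Common Things in Java" series. Today's topic is Fork/Join, the parallel processing framework built into Java. Fork/Join was introduced in JDK 1.7, and to some extent it lets you implement simple map-reduce style operations. The blog posts I have collected so far cover topics that come up very frequently in interviews; you can find them at http://blog.csdn.net/u012403290.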

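Technical points

1. map-reduce
A programming model for processing large data sets, split into a "Map" part and a "Reduce" part. It is used in distributed programming to improve throughput and speed. Put simply, one very large task is split into many small tasks, each small task is handled by its own worker (a thread, in the case of Fork/Join), and the partial results are combined at the end.

2. Background
Fork/Join is meant for processing a sizeable amount of data on today's multi-core machines, and it expresses the idea of making full use of the available resources. Multi-core processors have long been mainstream, and concurrent programming relies on multiple threads to solve a problem, which pushes the utilization of machine resources to a new level.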
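As a small illustration of the split-then-combine idea described above, here is a minimal sketch using JDK 8 parallel streams, which run on the common ForkJoinPool. The class name MapReduceSketch and the sum-of-squares workload are assumptions made up for this example.

```java
import java.util.stream.LongStream;

// Hypothetical demo class; the name and the workload are illustrative only.
class MapReduceSketch {

    // Map each number to its square, then reduce the squares to a single sum.
    static long sumOfSquares(long n, boolean parallel) {
        LongStream numbers = LongStream.rangeClosed(1, n);
        if (parallel) {
            numbers = numbers.parallel(); // parallel streams use the common ForkJoinPool
        }
        return numbers
                .map(x -> x * x)          // "map" step: transform every element
                .reduce(0L, Long::sum);   // "reduce" step: combine the partial results
    }

    public static void main(String[] args) {
        System.out.println(sumOfSquares(1_000, false));
        System.out.println(sumOfSquares(1_000, true));
    }
}
```

Both calls should print the same value, because the reduction is associative and the order in which partial sums are combined does not matter.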

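Fork/Join structure

Using the Fork/Join framework correctly requires some familiarity with its structure. Any parallel job needs two things: ① task scheduling; ② task execution. In Fork/Join, tasks are submitted and scheduled through its own thread pool, ForkJoinPool, and they are executed through its own task class, ForkJoinTask.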

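However, we do not use ForkJoinTask directly to split and execute tasks. We normally use one of its two subclasses, RecursiveAction and RecursiveTask: the former is for tasks that return no result, the latter for tasks that return one. In summary, the basic Fork/Join model is a ForkJoinPool that schedules ForkJoinTask instances, which in practice are RecursiveAction or RecursiveTask subclasses.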
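To make the difference between the two subclasses concrete, here is a minimal sketch. The class names (TaskKindsSketch, DoubleAll, SumAll) and the THRESHOLD value are assumptions chosen for illustration, not part of the JDK.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveAction;
import java.util.concurrent.RecursiveTask;

// Hypothetical demo class; all names below are illustrative.
class TaskKindsSketch {
    static final int THRESHOLD = 1_000; // assumed cut-off below which we stop splitting

    // RecursiveAction: no result, it doubles every element of data[from, to) in place.
    static class DoubleAll extends RecursiveAction {
        private final long[] data;
        private final int from, to;
        DoubleAll(long[] data, int from, int to) { this.data = data; this.from = from; this.to = to; }

        @Override
        protected void compute() {
            if (to - from <= THRESHOLD) {
                for (int i = from; i < to; i++) data[i] *= 2;
            } else {
                int mid = from + (to - from) / 2;
                // invokeAll forks the subtasks and waits for both to finish
                invokeAll(new DoubleAll(data, from, mid), new DoubleAll(data, mid, to));
            }
        }
    }

    // RecursiveTask<Long>: returns a result, here the sum of data[from, to).
    static class SumAll extends RecursiveTask<Long> {
        private final long[] data;
        private final int from, to;
        SumAll(long[] data, int from, int to) { this.data = data; this.from = from; this.to = to; }

        @Override
        protected Long compute() {
            if (to - from <= THRESHOLD) {
                long sum = 0;
                for (int i = from; i < to; i++) sum += data[i];
                return sum;
            }
            int mid = from + (to - from) / 2;
            SumAll left = new SumAll(data, from, mid);
            SumAll right = new SumAll(data, mid, to);
            left.fork();                           // run the left half asynchronously
            return right.compute() + left.join();  // compute the right half here, then combine
        }
    }

    static long doubleThenSum(long[] data) {
        ForkJoinPool pool = new ForkJoinPool();
        try {
            pool.invoke(new DoubleAll(data, 0, data.length));     // nothing useful is returned
            return pool.invoke(new SumAll(data, 0, data.length)); // the task's result is returned
        } finally {
            pool.shutdown();
        }
    }
}
```

The only structural difference is the return type of compute(): void for RecursiveAction, and the type parameter for RecursiveTask.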

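Next, let us look at the structure of each part in turn:

① ForkJoinPool:
Many explanations of the ForkJoinPool source code found online are badly out of date. In JDK 1.8 the pool no longer maintains separate ForkJoinTask and ForkJoinWorkerThread arrays (the former held the tasks, the latter the threads that run them). Instead it uses an inner class, WorkQueue. Here is its source in JDK 1.8: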

    /**
     * Queues supporting work-stealing as well as external task
     * submission. See above for descriptions and algorithms.
     * Performance on most platforms is very sensitive to placement of
     * instances of both WorkQueues and their arrays -- we absolutely
     * do not want multiple WorkQueue instances or multiple queue
     * arrays sharing cache lines. The @Contended annotation alerts
     * JVMs to try to keep instances apart.
     */
    @sun.misc.Contended
    static final class WorkQueue {

        // Instance fields
        volatile int scanState;    // versioned, <0: inactive; odd:scanning
        int stackPred;             // pool stack (ctl) predecessor
        int nsteals;               // number of steals
        int hint;                  // randomization and stealer index hint
        int config;                // pool index and mode
        volatile int qlock;        // 1: locked, < 0: terminate; else 0
        volatile int base;         // index of next slot for poll
        int top;                   // index of next slot for push
        ForkJoinTask<?>[] array;   // the elements (initially unallocated)
        final ForkJoinPool pool;   // the containing pool (may be null)
        final ForkJoinWorkerThread owner; // owning thread or null if shared
        volatile Thread parker;    // == owner during call to park; else null
        volatile ForkJoinTask<?> currentJoin;  // task being joined in awaitJoin
        volatile ForkJoinTask<?> currentSteal; // mainly used by helpStealer

    }

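Reading the source, we find that the current structure is completely different from the old one. Previously, tasks had to be handed out from the ForkJoinTask array to ForkJoinWorkerThread instances. Now the inner class WorkQueue does this job: each WorkQueue holds a ForkJoinWorkerThread (the owner field) that executes the queue, and among its fields there is a ForkJoinTask array (the array field) holding the tasks that this thread needs to run.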

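The class comment also tells us that this queue supports work stealing. What is work stealing? Imagine you and a friend are each eating a share of fruit: you finish yours while your friend still has some left, so you quietly take a few pieces from your friend's share. In Fork/Join terms, suppose two WorkQueues, A and B, are running tasks. A finishes all of its tasks while B still has work left, so A's thread takes tasks from B's ForkJoinTask array. The thief takes them from the end opposite to the one B itself works on (the base rather than the top), which keeps the two threads out of each other's way and improves overall throughput.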
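The two ends of the queue can be pictured with a toy model. The sketch below is single-threaded and uses a plain ArrayDeque, so it is only an illustration of which end each party takes from, not how WorkQueue is actually implemented; the names StealingSketch and ToyQueue are made up.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Toy, single-threaded illustration of the two ends of a work-stealing queue.
// This is NOT the JDK's WorkQueue; it is not thread-safe and the names are hypothetical.
class StealingSketch {

    static class ToyQueue {
        private final Deque<String> tasks = new ArrayDeque<>();

        void push(String task) { tasks.addLast(task); }   // owner adds at the top
        String pop()           { return tasks.pollLast(); }  // owner takes the newest task (top)
        String steal()         { return tasks.pollFirst(); } // thief takes the oldest task (base)
    }

    static String demo() {
        ToyQueue b = new ToyQueue();
        b.push("t1");
        b.push("t2");
        b.push("t3");
        String stolenByA = b.steal(); // A is idle and steals from the base of B's queue
        String takenByB = b.pop();    // B keeps working from the top of its own queue
        return stolenByA + "," + takenByB;
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints "t1,t3"
    }
}
```

Because the owner and the thief work on opposite ends, they only contend when the queue is nearly empty.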

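We will not go deeper into the source, since that is not the purpose of this post. Next, let us look at the methods ForkJoinPool offers for submitting tasks:

a. submit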

    /**
     * Submits a ForkJoinTask for execution.
     *
     * @param task the task to submit
     * @param <T> the type of the task's result
     * @return the task
     * @throws NullPointerException if the task is null
     * @throws RejectedExecutionException if the task cannot be
     *         scheduled for execution
     */
    public <T> ForkJoinTask<T> submit(ForkJoinTask<T> task) {
        if (task == null)
            throw new NullPointerException();
        externalPush(task);
        return task;
    }

b. execute

    /**
     * Arranges for (asynchronous) execution of the given task.
     *
     * @param task the task
     * @throws NullPointerException if the task is null
     * @throws RejectedExecutionException if the task cannot be
     *         scheduled for execution
     */
    public void execute(ForkJoinTask<?> task) {
        if (task == null)
            throw new NullPointerException();
        externalPush(task);
    }

c. invoke

    /**
     * Performs the given task, returning its result upon completion.
     * If the computation encounters an unchecked Exception or Error,
     * it is rethrown as the outcome of this invocation.  Rethrown
     * exceptions behave in the same way as regular exceptions, but,
     * when possible, contain stack traces (as displayed for example
     * using {@code ex.printStackTrace()}) of both the current thread
     * as well as the thread actually encountering the exception;
     * minimally only the latter.
     *
     * @param task the task
     * @param <T> the type of the task's result
     * @return the task's result
     * @throws NullPointerException if the task is null
     * @throws RejectedExecutionException if the task cannot be
     *         scheduled for execution
     */
    public <T> T invoke(ForkJoinTask<T> task) {
        if (task == null)
            throw new NullPointerException();
        externalPush(task);
        return task.join();
    }

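These three submission methods differ. submit arranges asynchronous execution and returns the task itself, which you can later ask for the result. execute also arranges asynchronous execution but returns nothing. invoke submits the task and then blocks until it completes (note the task.join() call), returning the result directly.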
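A minimal sketch of the three calls side by side. The Square task and the class name SubmitKindsSketch are assumptions made up for this example.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.ForkJoinTask;
import java.util.concurrent.RecursiveTask;

// Hypothetical demo class; Square is a trivial task used only for illustration.
class SubmitKindsSketch {

    static class Square extends RecursiveTask<Integer> {
        private final int n;
        Square(int n) { this.n = n; }
        @Override
        protected Integer compute() { return n * n; }
    }

    static int[] runAll() {
        ForkJoinPool pool = new ForkJoinPool();
        try {
            // submit: returns the task immediately; join() later to get the result
            ForkJoinTask<Integer> submitted = pool.submit(new Square(3));
            int a = submitted.join();

            // execute: returns nothing, so keep your own reference to the task
            Square executed = new Square(4);
            pool.execute(executed);
            int b = executed.join();

            // invoke: blocks until the task is done and returns its result
            int c = pool.invoke(new Square(5));

            return new int[] {a, b, c};
        } finally {
            pool.shutdown();
        }
    }
}
```

In practice, invoke is the usual choice when the caller has nothing else to do while waiting, and submit or execute when it does.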

② ForkJoinTask:
For ForkJoinTask we only cover the fork and join operations briefly. Here is the source of the fork method:

    // public methods

    /**
     * Arranges to asynchronously execute this task in the pool the
     * current task is running in, if applicable, or using the {@link
     * ForkJoinPool#commonPool()} if not {@link #inForkJoinPool}.  While
     * it is not necessarily enforced, it is a usage error to fork a
     * task more than once unless it has completed and been
     * reinitialized.  Subsequent modifications to the state of this
     * task or any data it operates on are not necessarily
     * consistently observable by any thread other than the one
     * executing it unless preceded by a call to {@link #join} or
     * related methods, or a call to {@link #isDone} returning {@code
     * true}.
     *
     * @return {@code this}, to simplify usage
     */
    public final ForkJoinTask<V> fork() {
        Thread t;
        if ((t = Thread.currentThread()) instanceof ForkJoinWorkerThread)
            ((ForkJoinWorkerThread)t).workQueue.push(this); // push this task onto the current worker's WorkQueue
        else
            ForkJoinPool.common.externalPush(this); // not a worker thread: submit this task to the common pool
        return this;
    }

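The fork method first checks whether the current thread is a ForkJoinWorkerThread. If it is, the task (not the thread) is pushed onto that worker's WorkQueue. Otherwise the task is submitted to the common pool, ForkJoinPool.common. In both cases the task is only queued for asynchronous execution; fork does not run it on the spot.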
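Both branches of fork can be exercised with a short sketch. The class and task names below (ForkOutsidePoolSketch, Answer) are hypothetical, and the pool size of 2 is an arbitrary choice.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

// Hypothetical demo class showing the two branches of fork().
class ForkOutsidePoolSketch {

    static class Answer extends RecursiveTask<Integer> {
        @Override
        protected Integer compute() { return 6 * 7; }
    }

    // Called from an ordinary thread: fork() hands the task to the common pool.
    static int forkFromCaller() {
        Answer task = new Answer();
        task.fork();
        return task.join();
    }

    // Called from inside a pool: fork() pushes the child onto the current worker's own queue.
    static int forkFromWorker() {
        ForkJoinPool pool = new ForkJoinPool(2);
        try {
            return pool.invoke(new RecursiveTask<Integer>() {
                @Override
                protected Integer compute() {
                    Answer child = new Answer();
                    child.fork();            // queued on this worker's WorkQueue
                    return child.join() + 1; // wait for the child, then use its result
                }
            });
        } finally {
            pool.shutdown();
        }
    }
}
```

Either way the caller gets the result through join(); the only difference is which queue the task was placed on.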

Here is the join method:

    /**
     * Returns the result of the computation when it {@link #isDone is
     * done}.  This method differs from {@link #get()} in that
     * abnormal completion results in {@code RuntimeException} or
     * {@code Error}, not {@code ExecutionException}, and that
     * interrupts of the calling thread do <em>not</em> cause the
     * method to abruptly return by throwing {@code
     * InterruptedException}.
     *
     * @return the computed result
     */
    public final V join() {
        int s;
        if ((s = doJoin() & DONE_MASK) != NORMAL) // if the task did not complete normally, report the exception
            reportException(s);
        return getRawResult(); // return the result
    }



    /**
     * Implementation for join, get, quietlyJoin. Directly handles
     * only cases of already-completed, external wait, and
     * unfork+exec.  Others are relayed to ForkJoinPool.awaitJoin.
     *
     * @return status upon completion
     */
    private int doJoin() {
        int s; Thread t; ForkJoinWorkerThread wt; ForkJoinPool.WorkQueue w;
        return (s = status) < 0 ? s :
            ((t = Thread.currentThread()) instanceof ForkJoinWorkerThread) ?
            (w = (wt = (ForkJoinWorkerThread)t).workQueue).
            tryUnpush(this) && (s = doExec()) < 0 ? s :
            wt.pool.awaitJoin(w, this, 0L) :
            externalAwaitDone();
    }

    final int doExec() {
        int s; boolean completed;
        if ((s = status) >= 0) {
            try {
                completed = exec();
            } catch (Throwable rex) {
                return setExceptionalCompletion(rex);
            }
            if (completed)
                s = setCompletion(NORMAL); // the task finished, so mark it NORMAL
        }
        return s;
    }

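The join operation mainly checks the task's execution status and returns its result. There are four status values: completed (NORMAL), cancelled (CANCELLED), signal (SIGNAL, which marks that a waiting thread needs to be notified) and failed with an exception (EXCEPTIONAL).
doJoin() first looks at the task's status and returns it immediately if the task is already done (status < 0). Otherwise, if the caller is a worker thread, it tries to take the task back off its own queue (tryUnpush) and run it right away through doExec; if that is not possible it waits in awaitJoin, and a non-worker caller waits in externalAwaitDone. doExec sets the status to NORMAL when the task completes successfully and records the exception otherwise, which join then reports.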
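The Javadoc quoted above says that join rethrows a failure as a RuntimeException or Error, while get wraps it in an ExecutionException. A minimal sketch of that difference, with a made-up failing task (the names JoinStatusSketch and Failing are assumptions):

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

// Hypothetical demo class; Failing is a task that always throws.
class JoinStatusSketch {

    static class Failing extends RecursiveTask<Integer> {
        @Override
        protected Integer compute() {
            throw new IllegalStateException("boom");
        }
    }

    static String describe() {
        ForkJoinPool pool = new ForkJoinPool();
        try {
            Failing task = new Failing();
            pool.execute(task);

            String viaJoin;
            try {
                task.join();                       // rethrows the failure unchecked
                viaJoin = "no exception";
            } catch (RuntimeException e) {
                viaJoin = e.getClass().getSimpleName();
            }

            String viaGet;
            try {
                task.get();                        // wraps the failure in ExecutionException
                viaGet = "no exception";
            } catch (ExecutionException | InterruptedException e) {
                viaGet = e.getClass().getSimpleName();
            }

            // isCompletedAbnormally() is true for cancelled or failed tasks
            return viaJoin + "/" + viaGet + "/" + task.isCompletedAbnormally();
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(describe()); // prints "IllegalStateException/ExecutionException/true"
    }
}
```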

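Implementing a large computation with Fork/Join

A really detailed walk through the Fork/Join source would take more study, since much of the low-level code relies on optimistic locking. We will stop here and instead use Fork/Join to sum a long series of numbers, comparing it with a plain loop to see which is faster.

Requirement:
Compute the sum 1+2+3+...+N

Below is my implementation using Fork/Join. The core idea is to split one very large computation into many small ones: the range is halved recursively until each piece is small enough to sum directly, and the partial sums are then added back together.

The code is as follows: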

package com.brickworkers;

import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

public class ForkJoinTest extends RecursiveTask<Long> { // extend RecursiveTask because the task returns a result
    // Largest range that one task sums directly, without splitting
    private static final int DEFAULT_CAPACITY = 10000;


    // The inclusive range [start, end] that this task sums
    private final int start;

    private final int end;

    public ForkJoinTest(int start, int end) {
        this.start = start;
        this.end = end;
    }

    @Override
    protected Long compute() { // implement compute
        long sum = 0;
        // Case 1: the range fits within the capacity, so sum it directly
        if (end - start < DEFAULT_CAPACITY) {
            for (int i = start; i <= end; i++) { // end is inclusive
                sum += i;
            }
        } else { // Case 2: the range is too large, so split it in two
            // Midpoint of the range, written this way to avoid int overflow
            int middle = start + (end - start) / 2;
            // Recurse on [start, middle] and [middle + 1, end]
            ForkJoinTest left = new ForkJoinTest(start, middle);
            ForkJoinTest right = new ForkJoinTest(middle + 1, end);
            // Queue the left half and compute the right half on this thread
            left.fork();
            long rightSum = right.compute();
            // Wait for the left half and combine the two results
            sum = left.join() + rightSum;
        }

        return sum;
    }


    public static void main(String[] args) {

        ForkJoinPool forkJoinPool = new ForkJoinPool();
        ForkJoinTest forkJoinTest = new ForkJoinTest(1, 100000000);
        long forkJoinStartTime = System.currentTimeMillis();
        // As noted earlier, invoke returns the result directly
        long result = forkJoinPool.invoke(forkJoinTest);
        System.out.println("Fork/Join time (ms): " + (System.currentTimeMillis() - forkJoinStartTime));

        long sum = 0;
        long normalStartTime = System.currentTimeMillis();
        for (int i = 1; i <= 100000000; i++) { // same range as the Fork/Join task
            sum += i;
        }
        System.out.println("Plain loop time (ms): " + (System.currentTimeMillis() - normalStartTime));

        // Both approaches must agree on the sum
        if (result != sum) {
            throw new AssertionError("Fork/Join result " + result + " != loop result " + sum);
        }
    }

}


// Output (timings vary by machine):
// Fork/Join time (ms): 33
// Plain loop time (ms): 141

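Note that in the example above the performance depends on the DEFAULT_CAPACITY you choose. If the capacity is too small, the work is broken into a huge number of subtasks and the scheduling overhead lowers efficiency. A somewhat larger capacity improves efficiency, although a capacity so large that the range is never split leaves the other cores idle. The relationship I measured between run time and DEFAULT_CAPACITY is shown in the figure:
[Figure: run time versus DEFAULT_CAPACITY]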
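One way to see why a small capacity hurts is to count how many task objects get created. The sketch below is a variant of the summing task with the cut-off passed in as a parameter; the class names (ThresholdSketch, RangeSum) and the counter are assumptions added for illustration, and it measures task counts rather than time.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical demo class: counts how many tasks a given threshold produces.
class ThresholdSketch {

    static class RangeSum extends RecursiveTask<Long> {
        private final int start, end, threshold; // [start, end] is inclusive
        private final AtomicLong created;        // shared counter of task objects

        RangeSum(int start, int end, int threshold, AtomicLong created) {
            this.start = start;
            this.end = end;
            this.threshold = threshold;
            this.created = created;
            created.incrementAndGet();
        }

        @Override
        protected Long compute() {
            if (end - start + 1 <= threshold) {  // small enough: sum directly
                long sum = 0;
                for (int i = start; i <= end; i++) sum += i;
                return sum;
            }
            int mid = start + (end - start) / 2;
            RangeSum left = new RangeSum(start, mid, threshold, created);
            RangeSum right = new RangeSum(mid + 1, end, threshold, created);
            left.fork();
            return right.compute() + left.join();
        }
    }

    // Returns {sum of 1..n, number of task objects created}.
    static long[] run(int n, int threshold) {
        AtomicLong created = new AtomicLong();
        ForkJoinPool pool = new ForkJoinPool();
        try {
            long sum = pool.invoke(new RangeSum(1, n, threshold, created));
            return new long[] {sum, created.get()};
        } finally {
            pool.shutdown();
        }
    }
}
```

For example, run(16, 4) splits 1..16 into two halves and then four quarters, so it creates 7 tasks in total; halving the threshold roughly doubles the number of tasks while the amount of real arithmetic stays the same.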

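Closing notes

In day-to-day development, many jobs can be implemented in this parallel fashion, provided that you have resources to spare. Scheduled jobs that run in the middle of the night, when resources are plentiful, are one case where this can speed up computation. Exporting project report files is another: a very large number of rows can be split into chunks and processed piece by piece, which also improves throughput. Give it a try.

I hope this helps.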
