A Brief Analysis of the Alibaba Sentinel Core Source Code

Alibaba-Sentinel

Comparison of Two Circuit-Breaking and Degradation Frameworks

Overall Flow of Sentinel

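In Sentinel, to apply flow control, circuit breaking, or similar protection to a block of code, you define a resource and wrap the code to be protected inside it.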

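In Sentinel, every resource has a resource name (resourceName), and each call to a resource creates an Entry object. An Entry can be created automatically through the adapters for mainstream frameworks, or explicitly through annotations or the SphU API. When an Entry is created, a series of functional slots (the slot chain) is set up along with it. The analysis below uses SphU.entry(String name) as the example of acquiring a resource.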

// SphU.entry simply delegates to CtSph.entry
public static Entry entry(String name) throws BlockException {
        return Env.sph.entry(name, EntryType.OUT, 1, OBJECTS0);
}

The call eventually reaches CtSph.entryWithPriority. As the code shows, every call to SphU.entry creates a new Entry.

private Entry entryWithPriority(ResourceWrapper resourceWrapper, int count, boolean prioritized, Object... args)
        throws BlockException {
        // 1. Get the context from the contextHolder (a ThreadLocal-based implementation)
        Context context = ContextUtil.getContext();
        if (context instanceof NullContext) {
            // The {@link NullContext} indicates that the amount of context has exceeded the threshold,
            // so here init the entry only. No rule checking will be done.
            return new CtEntry(resourceWrapper, null, context);
        }

        if (context == null) {
            // 2. Create or fetch a default context. If no context is specified, every thread
            // ends up in a context with the same default name, Constants.CONTEXT_DEFAULT_NAME.
            context = InternalContextUtil.internalEnter(Constants.CONTEXT_DEFAULT_NAME);
        }

        // Global switch is close, no rule checking will do.
        if (!Constants.ON) {
            return new CtEntry(resourceWrapper, null, context);
        }
        
        // 3. Look up the slot chain. It uses the chain-of-responsibility pattern
        // and is the core of Sentinel's features.
        ProcessorSlot<Object> chain = lookProcessChain(resourceWrapper);

        /*
         * Means amount of resources (slot chain) exceeds {@link Constants.MAX_SLOT_CHAIN_SIZE},
         * so no rule checking will be done.
         */
        if (chain == null) {
            return new CtEntry(resourceWrapper, null, context);
        }
        // Create the Entry. Every SphU.entry call creates one and sets it on the context.
        // The contents and data structure of the context are covered later.
        Entry e = new CtEntry(resourceWrapper, chain, context);
        try {
            // Enter the chain of responsibility, where Sentinel's features are applied
            chain.entry(context, resourceWrapper, null, count, prioritized, args);
        } catch (BlockException e1) {
            e.exit(count, args);
            throw e1;
        } catch (Throwable e1) {
            // This should not happen, unless there are errors existing in Sentinel internal.
            RecordLog.info("Sentinel unexpected exception", e1);
        }
        return e;
    }

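The method above builds an initial Context. Note that every thread that has no context when it acquires a resource creates a new Context, so these are distinct objects, although their contextName may well be the same. The EntranceNode is tied to the contextName, so there is only one EntranceNode per contextName. As the later analysis shows, a DefaultNode hangs under the EntranceNode, and this DefaultNode also corresponds to the contextName. DefaultNode extends StatisticNode, so it can hold traffic data. Because the contextName and the DefaultNode map one to one (for a given resource), a flow-control rule can be applied to a context; such a rule is evaluated against the DefaultNode under the EntranceNode. A flow-control rule set on a resource as a whole, by contrast, is evaluated against the resource's ClusterNode. See the individual slots below for details.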
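The relationship between per-thread Context objects and the shared EntranceNode can be illustrated with a minimal sketch. It uses only the JDK; ContextSketch, its nested classes, and the helper enterOnNewThread are hypothetical names made up for this illustration and are not Sentinel's real classes.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical sketch: one Context per thread, one entrance node per context name. */
class ContextSketch {
    static final class EntranceNode {
        final String name;
        EntranceNode(String name) { this.name = name; }
    }

    static final class Context {
        final String name;
        final EntranceNode entranceNode;
        Context(String name, EntranceNode entranceNode) {
            this.name = name;
            this.entranceNode = entranceNode;
        }
    }

    // Each thread holds its own Context object.
    private static final ThreadLocal<Context> HOLDER = new ThreadLocal<>();
    // Entrance nodes are shared across threads and keyed by context name.
    private static final Map<String, EntranceNode> ENTRANCE_NODES = new ConcurrentHashMap<>();

    static Context enter(String name) {
        Context ctx = HOLDER.get();
        if (ctx == null) {
            EntranceNode node = ENTRANCE_NODES.computeIfAbsent(name, EntranceNode::new);
            ctx = new Context(name, node);
            HOLDER.set(ctx);
        }
        return ctx;
    }

    static void exit() {
        HOLDER.remove();
    }

    /** Helper for the demo: enters the context on a fresh thread and returns it. */
    static Context enterOnNewThread(String name) {
        Context[] result = new Context[1];
        Thread t = new Thread(() -> {
            result[0] = enter(name);
            exit();
        });
        t.start();
        try {
            t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return result[0];
    }

    public static void main(String[] args) {
        Context a = enterOnNewThread("demo_context");
        Context b = enterOnNewThread("demo_context");
        System.out.println("same Context object: " + (a == b));
        System.out.println("same EntranceNode:   " + (a.entranceNode == b.entranceNode));
    }
}
```

Two threads that enter under the same name get different Context objects but the same entrance node, which is why statistics attached below the entrance node can be shared by every thread using that context name.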

Code that creates the slot chain

ProcessorSlot<Object> lookProcessChain(ResourceWrapper resourceWrapper) {
        // ResourceWrapper overrides hashCode and equals based on name,
        // so the same resource name always maps to the same chain
        ProcessorSlotChain chain = chainMap.get(resourceWrapper);
        if (chain == null) {
            synchronized (LOCK) {
                chain = chainMap.get(resourceWrapper);
                if (chain == null) {
                    // Entry size limit.
                    if (chainMap.size() >= Constants.MAX_SLOT_CHAIN_SIZE) {
                        return null;
                    }
                    // This call builds the chain
                    chain = SlotChainProvider.newSlotChain();
                    Map<ResourceWrapper, ProcessorSlotChain> newMap = new HashMap<ResourceWrapper, ProcessorSlotChain>(
                        chainMap.size() + 1);
                    newMap.putAll(chainMap);
                    newMap.put(resourceWrapper, chain);
                    chainMap = newMap;
                }
            }
        }
        return chain;
    }
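The lookup above combines double-checked locking with a copy-on-write map: reads take no lock, and a write replaces the whole map. A minimal JDK-only sketch of that pattern follows. ChainRegistrySketch and its fields are hypothetical names, the chain is represented by a plain Object, and the map field is declared volatile here so that lock-free readers see the replaced map.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of a copy-on-write registry guarded by double-checked locking. */
class ChainRegistrySketch {
    private final int maxChains;
    private final Object lock = new Object();
    // Readers never mutate this map; writers swap in a fresh copy.
    private volatile Map<String, Object> chainMap = new HashMap<>();

    ChainRegistrySketch(int maxChains) {
        this.maxChains = maxChains;
    }

    Object lookChain(String resourceName) {
        Object chain = chainMap.get(resourceName);      // lock-free fast path
        if (chain == null) {
            synchronized (lock) {
                chain = chainMap.get(resourceName);     // re-check under the lock
                if (chain == null) {
                    if (chainMap.size() >= maxChains) {
                        return null;                    // cap reached: caller skips rule checking
                    }
                    chain = new Object();               // stands in for building a slot chain
                    Map<String, Object> newMap = new HashMap<>(chainMap);
                    newMap.put(resourceName, chain);
                    chainMap = newMap;                  // publish the new map in one write
                }
            }
        }
        return chain;
    }

    public static void main(String[] args) {
        ChainRegistrySketch registry = new ChainRegistrySketch(2);
        Object first = registry.lookChain("resourceA");
        System.out.println(first == registry.lookChain("resourceA"));
        registry.lookChain("resourceB");
        System.out.println(registry.lookChain("resourceC") == null);
    }
}
```

Copying the map on every insert is affordable because a new resource name appears rarely compared with how often existing names are looked up.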

The slots have different responsibilities, for example:

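NodeSelectorSlot collects the call paths of resources and stores them in a tree structure, which is used for flow control and degradation by call path;
ClusterBuilderSlot stores the statistics of a resource together with caller information, such as the resource's RT, QPS, and thread count, which serve as the basis for multi-dimensional flow control and degradation;
StatisticSlot records and aggregates runtime metrics along different dimensions;
FlowSlot performs flow control according to the preset flow rules and the state collected by the preceding slots;
AuthoritySlot enforces black and white lists according to the configured lists and the call origin;
DegradeSlot performs circuit breaking and degradation according to the statistics and the preset rules;
SystemSlot controls the total inbound traffic according to system status such as load1.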

Through the slot chain, Sentinel implements flow control, circuit breaking, degradation, and system protection.
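How a chain of slots cooperates can be shown with a small chain-of-responsibility sketch. Everything here is hypothetical and JDK-only: SketchSlot, SketchStatisticSlot, SketchFlowSlot, and SketchBlockException are simplified stand-ins, and the flow slot limits the total number of passed calls rather than a rate within a time window.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical, simplified stand-in for a processor slot. */
abstract class SketchSlot {
    private SketchSlot next;

    void setNext(SketchSlot next) { this.next = next; }

    /** Do this slot's work and call fireEntry to hand over to the next slot. */
    abstract void entry(String resource, List<String> trace);

    void fireEntry(String resource, List<String> trace) {
        if (next != null) {
            next.entry(resource, trace);
        }
    }
}

class SketchBlockException extends RuntimeException {
    SketchBlockException(String message) { super(message); }
}

/** Records pass and block counts around the slots that follow it. */
class SketchStatisticSlot extends SketchSlot {
    int pass;
    int block;

    @Override
    void entry(String resource, List<String> trace) {
        trace.add("statistic");
        try {
            fireEntry(resource, trace);   // rule-checking slots run first
            pass++;                       // reached only if nothing blocked
        } catch (SketchBlockException e) {
            block++;
            throw e;
        }
    }
}

/** Rejects the call once the recorded pass count reaches the limit. */
class SketchFlowSlot extends SketchSlot {
    private final SketchStatisticSlot stats;
    private final int limit;

    SketchFlowSlot(SketchStatisticSlot stats, int limit) {
        this.stats = stats;
        this.limit = limit;
    }

    @Override
    void entry(String resource, List<String> trace) {
        trace.add("flow");
        if (stats.pass >= limit) {
            throw new SketchBlockException("limit reached for " + resource);
        }
        fireEntry(resource, trace);
    }
}

class SlotChainSketch {
    static SketchStatisticSlot buildChain(int limit) {
        SketchStatisticSlot stats = new SketchStatisticSlot();
        stats.setNext(new SketchFlowSlot(stats, limit));
        return stats;
    }

    public static void main(String[] args) {
        SketchStatisticSlot chain = buildChain(2);
        for (int i = 0; i < 3; i++) {
            List<String> trace = new ArrayList<>();
            try {
                chain.entry("demo", trace);
                System.out.println("passed  " + trace);
            } catch (SketchBlockException e) {
                System.out.println("blocked " + trace);
            }
        }
    }
}
```

Each slot decides whether to continue by calling fireEntry, so a blocking slot stops the chain simply by throwing, and an earlier slot can observe the outcome of the slots after it.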

NodeSelectorSlot

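NodeSelectorSlot builds the call chain. Specifically, it wraps the call path of each resource into nodes and assembles them into a tree that forms the complete call chain. (A resource is what you acquire before entering the code or interface you want to protect; the QPS of resource acquisition is the measure used for flow control.)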

@Override
    public void entry(Context context, ResourceWrapper resourceWrapper, Object obj, int count, boolean prioritized, Object... args)
        throws Throwable {
        // The same context always maps to the same DefaultNode, so look up the DefaultNode by contextName first
        DefaultNode node = map.get(context.getName());
        if (node == null) {
            synchronized (this) {
                node = map.get(context.getName());
                if (node == null) {
                    node = new DefaultNode(resourceWrapper, null);
                    HashMap<String, DefaultNode> cacheMap = new HashMap<String, DefaultNode>(map.size());
                    cacheMap.putAll(map);
                    cacheMap.put(context.getName(), node);
                    map = cacheMap;
                    // Build the call tree
                    ((DefaultNode) context.getLastNode()).addChild(node);
                }

            }
        }

        context.setCurNode(node);
        fireEntry(context, resourceWrapper, node, count, prioritized, args);
    }

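When different Contexts enter the same resource, the DefaultNodes are different and the Entries are different. A DefaultNode can also store the QPS of a context, which is used for flow control.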

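The other case is entering different resources within the same Context, that is, calling SphU.entry(resourceName) with different resource names under the same Context.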

ClusterBuilderSlot

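ClusterBuilderSlot finds or creates the ClusterNode for a resource and attaches it to the DefaultNode. Each resource has exactly one ClusterNode, and all statistics for that resource are kept in it. Because the slot chain is looked up per resource, the DefaultNode that reaches this slot belongs to the current resource in the current Context, so the DefaultNodes created for one resource in different Contexts all point at the same ClusterNode.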

StatisticSlot

This slot collects the QPS statistics of a resource. It mainly operates on the ClusterNode described above, which in turn extends StatisticNode. Take a look at StatisticNode.addPassRequest.

public void addPassRequest(int count) {
        rollingCounterInSecond.addPass(count);
        rollingCounterInMinute.addPass(count);
}

There are two granularities, per second and per minute, and both are counted with a sliding window.

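/**  Tracing the constructor shows that ArrayMetric wraps a LeapArray; the LeapArray constructor gives the meaning of the parameters.
     * Holds statistics of the recent {@code INTERVAL} seconds. The {@code INTERVAL} is divided into time spans
     * by given {@code sampleCount}. sampleCount is the number of windows and INTERVAL is the total time span: the array
     * retains INTERVAL worth of traffic data, split into sampleCount windows of length INTERVAL / sampleCount each.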
     */
    private transient volatile Metric rollingCounterInSecond = new ArrayMetric(SampleCountProperty.SAMPLE_COUNT,
        IntervalProperty.INTERVAL);

    /**
     * Holds statistics of the recent 60 seconds. The windowLengthInMs is deliberately set to 1000 milliseconds,
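     * meaning each bucket per second, in this way we can get accurate statistics of each second. 60 is the number of
     * windows, 60 * 1000 is the total time in milliseconds that the array retains, and 60 * 1000 / 60 = 1000 ms is the
     * window length, i.e. each window covers one second.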
     */
    private transient Metric rollingCounterInMinute = new ArrayMetric(60, 60 * 1000, false);
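The relationship between the two constructor arguments and the window length is plain division, as the short sketch below shows. WindowMathSketch is a hypothetical helper, and the pair (2, 1000) is only an example input, not a statement about Sentinel's configured defaults.

```java
/** Hypothetical helper: derives the window length from the sample count and the interval. */
class WindowMathSketch {
    static int windowLengthInMs(int sampleCount, int intervalInMs) {
        if (sampleCount <= 0 || intervalInMs % sampleCount != 0) {
            throw new IllegalArgumentException("interval must divide evenly into windows");
        }
        return intervalInMs / sampleCount;
    }

    public static void main(String[] args) {
        // The minute-level counter from the text: 60 windows over 60 * 1000 ms.
        System.out.println(windowLengthInMs(60, 60 * 1000));
        // An example second-level setup: 2 windows over 1000 ms.
        System.out.println(windowLengthInMs(2, 1000));
    }
}
```

With 60 windows over 60 * 1000 ms each window is 1000 ms long, so the minute-level counter keeps one bucket per second.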

LeapArray is the sliding-window implementation; it can be used to obtain the current window. Its sliding-window code is analyzed below.

public WindowWrap<T> currentWindow(long timeMillis) {
        if (timeMillis < 0) {
            return null;
        }
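        // Work out which slot of the window array the given time maps to:
        // divide the time by the window length, then take the result modulo the array length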
        int idx = calculateTimeIdx(timeMillis);
        // Work out the start time of the current window
        long windowStart = calculateWindowStart(timeMillis);

        /*
         * Get bucket item at given time from the array.
         *
         * (1) Bucket is absent, then just create a new bucket and CAS update to circular array.
         * (2) Bucket is up-to-date, then just return the bucket.
         * (3) Bucket is deprecated, then reset current bucket and clean all deprecated buckets.
         */
        while (true) {
            WindowWrap<T> old = array.get(idx);
            if (old == null) {
                // The window does not exist yet, so create it
                /*
                 *     B0       B1      B2    NULL      B4
                 * ||_______|_______|_______|_______|_______||___
                 * 200     400     600     800     1000    1200  timestamp
                 *                             ^
                 *                          time=888
                 *            bucket is empty, so create new and update
                 *
                 * If the old bucket is absent, then we create a new bucket at {@code windowStart},
                 * then try to update circular array via a CAS operation. Only one thread can
                 * succeed to update, while other threads yield its time slice.
                 */
                WindowWrap<T> window = new WindowWrap<T>(windowLengthInMs, windowStart, newEmptyBucket(timeMillis));
                if (array.compareAndSet(idx, null, window)) {
                    // CAS guarantees thread safety
                    // Successfully updated, return the created bucket.
                    return window;
                } else {
                    // Contention failed, the thread will yield its time slice to wait for bucket available.
                    Thread.yield();
                }
            } else if (windowStart == old.windowStart()) {
                // The existing window is still current, so return it directly
                /*
                 *     B0       B1      B2     B3      B4
                 * ||_______|_______|_______|_______|_______||___
                 * 200     400     600     800     1000    1200  timestamp
                 *                             ^
                 *                          time=888
                 *            startTime of Bucket 3: 800, so it's up-to-date
                 *
                 * If current {@code windowStart} is equal to the start timestamp of old bucket,
                 * that means the time is within the bucket, so directly return the bucket.
                 */
                return old;
            } else if (windowStart > old.windowStart()) {
                // The old window is stale and must be overwritten
                /*
                 *   (old)
                 *             B0       B1      B2    NULL      B4
                 * |_______||_______|_______|_______|_______|_______||___
                 * ...    1200     1400    1600    1800    2000    2200  timestamp
                 *                              ^
                 *                           time=1676
                 *          startTime of Bucket 2: 400, deprecated, should be reset
                 *
                 * If the start timestamp of old bucket is behind provided time, that means
                 * the bucket is deprecated. We have to reset the bucket to current {@code windowStart}.
                 * Note that the reset and clean-up operations are hard to be atomic,
                 * so we need a update lock to guarantee the correctness of bucket update.
                 *
                 * The update lock is conditional (tiny scope) and will take effect only when
                 * bucket is deprecated, so in most cases it won't lead to performance loss.
                 */
                if (updateLock.tryLock()) {
                    // Resetting the window is a contended operation, so it needs the lock
                    try {
                        // Successfully get the update lock, now we reset the bucket.
                        return resetWindowTo(old, windowStart);
                    } finally {
                        updateLock.unlock();
                    }
                } else {
                    // Contention failed, the thread will yield its time slice to wait for bucket available.
                    Thread.yield();
                }
            } else if (windowStart < old.windowStart()) {
                // Should not go through here, as the provided time is already behind.
                return new WindowWrap<T>(windowLengthInMs, windowStart, newEmptyBucket(timeMillis));
            }
        }
    }

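The code for obtaining the current window is fairly easy to follow. It covers the three cases above, plus a fourth that cannot happen when the code is working correctly.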
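The three cases can be reproduced in a much smaller sketch. LeapArraySketch below is a hypothetical, JDK-only simplification: it swaps the CAS and tryLock coordination for plain synchronized methods, and it counts only passed requests. The rule used in passInInterval to decide which buckets are still valid is an assumption of this sketch.

```java
/** Hypothetical sliding-window counter; synchronized replaces the CAS and lock logic. */
class LeapArraySketch {
    /** One bucket: its start time plus a pass counter. */
    static final class Window {
        long start;
        long pass;
        Window(long start) { this.start = start; }
    }

    private final int windowLengthInMs;
    private final int intervalInMs;
    private final Window[] array;

    LeapArraySketch(int sampleCount, int intervalInMs) {
        this.windowLengthInMs = intervalInMs / sampleCount;
        this.intervalInMs = intervalInMs;
        this.array = new Window[sampleCount];
    }

    int indexOf(long timeMillis) {
        return (int) ((timeMillis / windowLengthInMs) % array.length);
    }

    long windowStartOf(long timeMillis) {
        return timeMillis - timeMillis % windowLengthInMs;
    }

    synchronized Window currentWindow(long timeMillis) {
        int idx = indexOf(timeMillis);
        long start = windowStartOf(timeMillis);
        Window old = array[idx];
        if (old == null) {                  // case 1: bucket absent, create it
            array[idx] = new Window(start);
            return array[idx];
        }
        if (old.start == start) {           // case 2: bucket is up to date
            return old;
        }
        if (old.start < start) {            // case 3: bucket is stale, reset and reuse
            old.start = start;
            old.pass = 0;
            return old;
        }
        return new Window(start);           // clock went backwards: throwaway bucket
    }

    synchronized void addPass(long timeMillis, long count) {
        currentWindow(timeMillis).pass += count;
    }

    /** Sums the buckets that started less than one interval before timeMillis. */
    synchronized long passInInterval(long timeMillis) {
        long sum = 0;
        for (Window w : array) {
            if (w != null && timeMillis - w.start < intervalInMs) {
                sum += w.pass;
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        LeapArraySketch counter = new LeapArraySketch(2, 1000);
        counter.addPass(100, 1);    // creates the bucket starting at 0
        counter.addPass(600, 2);    // creates the bucket starting at 500
        System.out.println(counter.passInInterval(700));
        counter.addPass(1100, 5);   // resets the first bucket to start at 1000
        System.out.println(counter.passInInterval(1200));
    }
}
```

Because the array is circular, the bucket for time 1100 reuses the slot that held the bucket for time 100, which is exactly the stale-bucket case.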

How Sentinel Works with Feign

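The SentinelFeignAutoConfiguration auto-configuration class creates the dynamic proxy when feign.sentinel.enabled is true.
SentinelInvocationHandler is the invocation handler behind that proxy; part of its source is examined below.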
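The general idea of guarding an interface through a JDK dynamic proxy can be sketched without Feign or Sentinel on the classpath. GuardedProxySketch, GreetingClient, and GuardHandler are hypothetical names, and the guard is a simple total call limit with a fallback object; this is an illustration of the proxy mechanism, not the logic of SentinelInvocationHandler.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical sketch: a dynamic proxy that guards every interface call. */
class GuardedProxySketch {
    /** Stands in for a declarative client interface. */
    interface GreetingClient {
        String greet(String name);
    }

    /** Routes calls to the target until the limit is used up, then to the fallback. */
    static final class GuardHandler implements InvocationHandler {
        private final Object target;
        private final Object fallback;
        private final int limit;
        private final AtomicInteger calls = new AtomicInteger();

        GuardHandler(Object target, Object fallback, int limit) {
            this.target = target;
            this.fallback = fallback;
            this.limit = limit;
        }

        @Override
        public Object invoke(Object proxy, Method method, Object[] args) throws Throwable {
            if (method.getDeclaringClass() == Object.class) {
                return method.invoke(target, args);   // toString, hashCode, equals are not guarded
            }
            Object delegate = calls.incrementAndGet() <= limit ? target : fallback;
            try {
                return method.invoke(delegate, args);
            } catch (InvocationTargetException e) {
                throw e.getCause();                   // surface the original exception
            }
        }
    }

    static GreetingClient guarded(GreetingClient target, GreetingClient fallback, int limit) {
        return (GreetingClient) Proxy.newProxyInstance(
            GreetingClient.class.getClassLoader(),
            new Class<?>[] {GreetingClient.class},
            new GuardHandler(target, fallback, limit));
    }

    public static void main(String[] args) {
        GreetingClient client = guarded(n -> "hello " + n, n -> "fallback for " + n, 1);
        System.out.println(client.greet("first"));
        System.out.println(client.greet("second"));
    }
}
```

The caller only sees the interface, so the guard and the fallback can be added or removed through configuration without touching the calling code.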
