Liantong's Road to a Big Data Scheduling Platform

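Hello to all the IT MAN of Liantong! The Conductor recently received an original submission from Zhang Yongqing of the big data team. The author of several best-selling books, he shares today how Liantong built its big data scheduling platform from 0 to 1 on top of incubator-dolphinscheduler.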

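Liantong is an intelligent platform for the maternal, infant, and child industry. With many years in that industry and in internet technology, it has extensive experience in running maternal and infant stores and in system development. In member operations and merchandise operations, it starts from member needs and goes deep into concrete scenarios, staying close to partners and consumers to offer the best services and products. The company is committed to driving the industry forward with technology, and hopes to use big data to give customers more intelligent data analysis and decision support. Big data is a key area of investment: the company set up a big data team in its early days, and once that team existed, building a big data scheduling platform was naturally the most basic and most important step.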

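Why we chose incubator-dolphinscheduler

1. incubator-dolphinscheduler is an open source project started by a company in China. Community members in China are very active, which makes communication with the community easier. Liantong also hopes to join this community, which was founded mainly by local members, and help make it even better.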

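2. incubator-dolphinscheduler supports a wide range of application scenarios

· Tasks are linked by their dependencies in a DAG, and their running status can be monitored visually in real time

· Supports many task types: Shell, MR, Spark, SQL (mysql, postgresql, hive, sparksql), Python, Sub_Process, Procedure, flink, datax, sqoop, http, and more

· Supports timed scheduling, dependency scheduling, manual scheduling, and manual pause/stop/resume of workflows, as well as failure retry/alerting, recovery from a specified node after a failure, killing tasks, and similar operations

· Supports workflow priority, task priority, task failover, and alerting or failing on task timeout

· Supports workflow global parameters and custom node parameters

· Supports online upload, download, and management of resource files, plus online file creation and editing

· Supports online viewing and scrolling of task logs, and online log download

· Implements cluster HA, with decentralized Master and Worker clusters coordinated through Zookeeper

· Supports online viewing of cpu load, memory, and cpu for Master/Worker

· Supports tree and Gantt chart views of workflow run history, plus task status statistics and process status statistics

· Supports backfilling data

· Supports multi-tenancy

· Supports internationalization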

In the DAG model of dolphinscheduler, one workflow can contain multiple tasks, and each task corresponds to one node in the DAG.
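To make the workflow-to-DAG mapping concrete, here is a minimal sketch in Java. The WorkflowDag class, its method names, and the task names are assumptions made up for illustration; this is not how dolphinscheduler models workflows internally. It only shows tasks as nodes, dependencies as edges, and one valid execution order derived with Kahn's topological sort.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical workflow model: tasks are DAG nodes, dependencies are edges. */
class WorkflowDag {
    // task name -> names of the tasks it depends on
    private final Map<String, List<String>> upstream = new LinkedHashMap<>();

    /** Registers a task together with the tasks that must finish before it. */
    void addTask(String task, String... dependsOn) {
        List<String> deps = upstream.computeIfAbsent(task, k -> new ArrayList<>());
        for (String dep : dependsOn) {
            upstream.computeIfAbsent(dep, k -> new ArrayList<>());
            deps.add(dep);
        }
    }

    /** Returns one valid execution order (Kahn's algorithm); fails on a cycle. */
    List<String> executionOrder() {
        Map<String, Integer> pending = new LinkedHashMap<>();
        Map<String, List<String>> downstream = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> e : upstream.entrySet()) {
            pending.put(e.getKey(), e.getValue().size());
            for (String dep : e.getValue()) {
                downstream.computeIfAbsent(dep, k -> new ArrayList<>()).add(e.getKey());
            }
        }
        // Tasks without unfinished dependencies are ready to run.
        Deque<String> ready = new ArrayDeque<>();
        pending.forEach((task, count) -> {
            if (count == 0) {
                ready.add(task);
            }
        });
        List<String> order = new ArrayList<>();
        while (!ready.isEmpty()) {
            String task = ready.poll();
            order.add(task);
            for (String next : downstream.getOrDefault(task, Collections.emptyList())) {
                if (pending.merge(next, -1, Integer::sum) == 0) {
                    ready.add(next);
                }
            }
        }
        if (order.size() != upstream.size()) {
            throw new IllegalStateException("workflow contains a dependency cycle");
        }
        return order;
    }
}
```

For example, registering "extract", then "transform" depending on "extract", then "load" depending on "transform" yields the order extract, transform, load.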

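3. While designed for high concurrency and high availability, incubator-dolphinscheduler keeps its architecture relatively simple. It does not pull in many complex technical components, which lowers the cost of learning and maintenance.

Note: this architecture diagram is taken from the community's official website

Apart from zookeeper, the design of incubator-dolphinscheduler introduces few complex technical components. The whole architecture uses zookeeper for cluster management and follows a decentralized design.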

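Where incubator-dolphinscheduler falls short

1. No support for a serial scheduling strategy

incubator-dolphinscheduler was originally designed to support only parallel scheduling, not serial scheduling. At Liantong, however, most scenarios need serial execution: each workflow may have only one instance running at a time, and the next instance of the same workflow can start only after the previous one has finished. This mostly comes up in near-real-time tasks.

2. Task dependencies are not powerful enough: a workflow can only wait passively for its dependencies to succeed, and cannot actively trigger downstream workflow instances

As shown in the figure below, the only option, configured when the task is created, is to wait passively for a dependency to succeed. A task that has just succeeded has no way to actively trigger other workflows.

3. User experience is lacking in some modules, and some modules query data slowly when the data volume is large

4. No reasonably complete monitoring system

incubator-dolphinscheduler provides only some simple monitoring. With as many as several thousand tasks running, complete monitoring is hard to achieve, and performance analysis of each individual task run is missing altogether.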

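Our feature development on incubator-dolphinscheduler

1. Added support for serial scheduling

As shown in the figure below, we added a serial execution mode alongside the existing parallel one.

For serial execution we also added a queue, and each task can specify the length of its queue.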
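The serial mode with a bounded queue can be pictured with the following minimal sketch. It is an illustration under stated assumptions rather than Liantong's implementation: the SerialWorkflowQueue class and its method names are hypothetical, and rejecting a submission when the queue is full is an assumed policy that the article does not specify.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical per-workflow gate: at most one running instance, bounded waiting queue. */
class SerialWorkflowQueue {
    enum Result { STARTED, QUEUED, REJECTED }

    private final int maxQueueLength;
    private final Deque<String> waiting = new ArrayDeque<>();
    private String running; // null while no instance is running

    SerialWorkflowQueue(int maxQueueLength) {
        this.maxQueueLength = maxQueueLength;
    }

    /** Starts the instance if the workflow is idle, otherwise queues it if there is room. */
    synchronized Result submit(String instanceId) {
        if (running == null) {
            running = instanceId;
            return Result.STARTED;
        }
        if (waiting.size() < maxQueueLength) {
            waiting.addLast(instanceId);
            return Result.QUEUED;
        }
        // Assumed policy: drop the submission when the queue is full.
        return Result.REJECTED;
    }

    /** Marks the running instance as finished and starts the next queued one, if any. */
    synchronized String finishRunning() {
        running = waiting.pollFirst();
        return running;
    }

    synchronized String currentlyRunning() {
        return running;
    }
}
```

With a queue length of 1, a first submission starts immediately, a second waits, and a third is rejected until the running instance finishes.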

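2. Added triggering of downstream workflow instances

As shown in the figure below, building on the existing parallel execution, we added active triggering of one or more downstream workflow instances.

The result after a run looks like this: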
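The active-trigger behaviour described in this section can be sketched in a few lines of Java. Everything here is hypothetical: the DownstreamTrigger class, the launcher callback, and the rule that only a successful run triggers downstream workflows are assumptions for illustration, not the code that was added to the scheduler.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

/** Hypothetical registry that starts downstream workflows when an upstream one succeeds. */
class DownstreamTrigger {
    // upstream workflow name -> workflows to start after it succeeds
    private final Map<String, List<String>> downstream = new HashMap<>();
    // Callback that actually starts a workflow instance (assumed to exist elsewhere).
    private final Consumer<String> launcher;

    DownstreamTrigger(Consumer<String> launcher) {
        this.launcher = launcher;
    }

    /** Declares one or more workflows to trigger after the given workflow. */
    void register(String workflow, String... downstreamWorkflows) {
        List<String> targets = downstream.computeIfAbsent(workflow, k -> new ArrayList<>());
        Collections.addAll(targets, downstreamWorkflows);
    }

    /** Called when a workflow instance ends; triggers downstream only on success. */
    void onFinished(String workflow, boolean success) {
        if (!success) {
            return;
        }
        for (String target : downstream.getOrDefault(workflow, Collections.emptyList())) {
            launcher.accept(target);
        }
    }
}
```

The contrast with the passive model is that the upstream workflow pushes the start signal at the moment it finishes, instead of each downstream workflow polling for its dependencies.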

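3. Fixes for some larger bugs

Liantong also hit quite a few pitfalls while using incubator-dolphinscheduler. Here is one example. In internal use, the problem colleagues reported most often was that scheduled task logs did not refresh promptly; sometimes it took a long time for new log lines to appear. Source code analysis later showed that some not very robust handling in the source caused this.

Part of the source of AbstractCommandExecutor.java in incubator-dolphinscheduler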

/**
 * abstract command executor
 */
public abstract class AbstractCommandExecutor {

    ..........

    /**
     * build process
     *
     * @param commandFile command file
     * @throws IOException IO Exception
     */
    private void buildProcess(String commandFile) throws IOException {
        // setting up user to run commands
        List<String> command = new LinkedList<>();

        //init process builder
        ProcessBuilder processBuilder = new ProcessBuilder();
        // setting up a working directory
        processBuilder.directory(new File(taskExecutionContext.getExecutePath()));
        // merge error information to standard output stream
        processBuilder.redirectErrorStream(true);

        // setting up user to run commands
        command.add("sudo");
        command.add("-u");
        command.add(taskExecutionContext.getTenantCode());
        command.add(commandInterpreter());
        command.addAll(commandOptions());
        command.add(commandFile);

        // setting commands
        processBuilder.command(command);
        process = processBuilder.start();

        // print command
        printCommand(command);
    }

    ..........

    /**
     * get the standard output of the process
     *
     * @param process process
     */
    private void parseProcessOutput(Process process) {
        String threadLoggerInfoName = String.format(LoggerUtils.TASK_LOGGER_THREAD_NAME + "-%s", taskExecutionContext.getTaskAppId());
        ExecutorService parseProcessOutputExecutorService = ThreadUtils.newDaemonSingleThreadExecutor(threadLoggerInfoName);
        parseProcessOutputExecutorService.submit(new Runnable() {
            @Override
            public void run() {
                BufferedReader inReader = null;

                try {
                    inReader = new BufferedReader(new InputStreamReader(process.getInputStream()));
                    String line;

                    long lastFlushTime = System.currentTimeMillis();

                    while ((line = inReader.readLine()) != null) {
                        if (line.startsWith("${setValue(")) {
                            varPool.append(line.substring("${setValue(".length(), line.length() - 2));
                            varPool.append("$VarPool$");
                        } else {
                            logBuffer.add(line);
                            lastFlushTime = flush(lastFlushTime);
                        }
                    }
                } catch (Exception e) {
                    logger.error(e.getMessage(), e);
                } finally {
                    clear();
                    close(inReader);
                }
            }
        });
        parseProcessOutputExecutorService.shutdown();
    }

    ................

    /**
     * when log buffer siz or flush time reach condition , then flush
     *
     * @param lastFlushTime last flush time
     * @return last flush time
     */
    private long flush(long lastFlushTime) {
        long now = System.currentTimeMillis();

        /**
         * when log buffer siz or flush time reach condition , then flush
         */
        if (logBuffer.size() >= Constants.DEFAULT_LOG_ROWS_NUM || now - lastFlushTime > Constants.DEFAULT_LOG_FLUSH_INTERVAL) {
            lastFlushTime = now;
            /** log handle */
            logHandler.accept(logBuffer);

            logBuffer.clear();
        }
        return lastFlushTime;
    }

    /**
     * close buffer reader
     *
     * @param inReader in reader
     */
    private void close(BufferedReader inReader) {
        if (inReader != null) {
            try {
                inReader.close();
            } catch (IOException e) {
                logger.error(e.getMessage(), e);
            }
        }
    }

    protected List<String> commandOptions() {
        return Collections.emptyList();
    }

    protected abstract String buildCommandFilePath();

    protected abstract String commandInterpreter();

    protected abstract void createCommandFileIfNotExists(String execCommand, String commandFile) throws IOException;
}

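In this source, parseProcessOutput(Process process) is responsible for fetching task logs and flushing them. It reads the task process's output from process.getInputStream() with the readLine() method of BufferedReader, and readLine() is a blocking method.

flush(long lastFlushTime) checks the condition if (logBuffer.size() >= Constants.DEFAULT_LOG_ROWS_NUM || now - lastFlushTime > Constants.DEFAULT_LOG_FLUSH_INTERVAL), so it flushes only when the buffer holds 64 lines or more than 1s has passed since the last flush. The intent is evidently to flush logs at least once every 1s. But because readLine() blocks, the loop does not keep running: flush is called only after readLine() has read new data. The bug is therefore triggered whenever fewer than 64 log lines are produced within 1s and the task then emits no new log for a long time. Take the shell script task below: each step produces little log output and runs for a long time, so the gaps between lines are large, and the most recently produced lines go unrefreshed for a long time.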

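#!/bin/bash

echo "hello world"
# placeholder: a step that runs for about 10 minutes
sleep 100000s

echo "hello world2"
# placeholder: a step that runs for about 10 minutes
sleep 100000s

echo "hello world3"
# placeholder: a step that runs for about 10 minutes
sleep 100000s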
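The effect of checking the flush condition only when a line arrives can be reproduced without any process or thread. The sketch below is a simplified model, not the dolphinscheduler code: the FlushOnReadOnly class is hypothetical, time is passed in explicitly so the behaviour is deterministic, and the thresholds 64 and 1000 ms are taken from the description above.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model of a log buffer whose flush check runs only when a line is read. */
class FlushOnReadOnly {
    static final int ROWS_THRESHOLD = 64;     // flush when this many lines are buffered
    static final long INTERVAL_MS = 1000L;    // or when this much time has passed

    private final List<String> buffer = new ArrayList<>();
    /** Lines that have been flushed and are visible to the user. */
    final List<String> visible = new ArrayList<>();
    private long lastFlushTime;

    FlushOnReadOnly(long startMillis) {
        this.lastFlushTime = startMillis;
    }

    /** Called once per line read; this is the only place the flush condition is evaluated. */
    void onLine(String line, long nowMillis) {
        buffer.add(line);
        if (buffer.size() >= ROWS_THRESHOLD || nowMillis - lastFlushTime > INTERVAL_MS) {
            lastFlushTime = nowMillis;
            visible.addAll(buffer);
            buffer.clear();
        }
    }
}
```

If "hello world" arrives 10 ms after start, it stays in the buffer because neither threshold is met. Nothing re-evaluates the condition while the task is silent, so the line becomes visible only when the next line arrives, however much later that is.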

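We then rewrote this code to use two threads, one responsible for readLine() and one for flush, so that when the thread calling readLine() blocks, the flush thread is unaffected. We also contributed the modified code to the community, and it has been merged into the dev branch.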

public abstract class AbstractCommandExecutor {
    /**
     * rules for extracting application ID
     */
    protected static final Pattern APPLICATION_REGEX = Pattern.compile(Constants.APPLICATION_REGEX);

    /**
     * process
     */
    private Process process;

    /**
     * log handler
     */
    protected Consumer<List<String>> logHandler;

    /**
     * logger
     */
    protected Logger logger;

    /**
     * log list
     */
    protected final List<String> logBuffer;

    protected boolean logOutputIsScuccess = false;

    /**
     * taskExecutionContext
     */
    protected TaskExecutionContext taskExecutionContext;

    /**
     * taskExecutionContextCacheManager
     */
    private TaskExecutionContextCacheManager taskExecutionContextCacheManager;

    .........

    /**
     * get the standard output of the process
     *
     * @param process process
     */
    private void parseProcessOutput(Process process) {
        String threadLoggerInfoName = String.format(LoggerUtils.TASK_LOGGER_THREAD_NAME + "-%s", taskExecutionContext.getTaskAppId());
        ExecutorService getOutputLogService = ThreadUtils.newDaemonSingleThreadExecutor(threadLoggerInfoName + "-" + "getOutputLogService");
        getOutputLogService.submit(() -> {
            BufferedReader inReader = null;

            try {
                inReader = new BufferedReader(new InputStreamReader(process.getInputStream()));
                String line;
                while ((line = inReader.readLine()) != null) {
                    logBuffer.add(line);
                }
            } catch (Exception e) {
                logger.error(e.getMessage(), e);
            } finally {
                logOutputIsScuccess = true;
                close(inReader);
            }
        });
        getOutputLogService.shutdown();

        ExecutorService parseProcessOutputExecutorService = ThreadUtils.newDaemonSingleThreadExecutor(threadLoggerInfoName);
        parseProcessOutputExecutorService.submit(() -> {
            try {
                long lastFlushTime = System.currentTimeMillis();
                while (logBuffer.size() > 0 || !logOutputIsScuccess) {
                    if (logBuffer.size() > 0) {
                        lastFlushTime = flush(lastFlushTime);
                    } else {
                        Thread.sleep(Constants.DEFAULT_LOG_FLUSH_INTERVAL);
                    }
                }
            } catch (Exception e) {
                logger.error(e.getMessage(), e);
            } finally {
                clear();
            }
        });
        parseProcessOutputExecutorService.shutdown();
    }

    .......

    /**
     * when log buffer siz or flush time reach condition , then flush
     *
     * @param lastFlushTime last flush time
     * @return last flush time
     */
    private long flush(long lastFlushTime) throws InterruptedException {
        long now = System.currentTimeMillis();

        /**
         * when log buffer siz or flush time reach condition , then flush
         */
        if (logBuffer.size() >= Constants.DEFAULT_LOG_ROWS_NUM || now - lastFlushTime > Constants.DEFAULT_LOG_FLUSH_INTERVAL) {
            lastFlushTime = now;
            /** log handle */
            logHandler.accept(logBuffer);

            logBuffer.clear();
        }
        return lastFlushTime;
    }

    .......

}
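For readers who want to try the two-thread idea outside the scheduler, here is a self-contained sketch. It is not the merged patch: the TwoThreadLogPump class and its method names are hypothetical, and it makes its own choices that the listing above does not show, namely a volatile completion flag and a synchronized block around the shared buffer, so that the hand-off between the reader thread and the flush thread is safe.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical log pump: one thread blocks on readLine(), another flushes on a timer. */
class TwoThreadLogPump {
    private final List<String> buffer = new ArrayList<>(); // guarded by "this"
    private volatile boolean readerDone = false;

    /** Starts both threads and returns the flusher, which ends after the final flush. */
    Thread start(InputStream in, Consumer<List<String>> sink, long flushIntervalMs) {
        Thread reader = new Thread(() -> {
            try (BufferedReader r = new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8))) {
                String line;
                while ((line = r.readLine()) != null) { // may block for a long time
                    synchronized (this) {
                        buffer.add(line);
                    }
                }
            } catch (IOException e) {
                // Sketch only: a read error is treated as the end of the output.
            } finally {
                readerDone = true;
            }
        }, "log-reader");

        Thread flusher = new Thread(() -> {
            try {
                while (true) {
                    // Read the flag before draining so lines added just before
                    // the reader finished are still picked up by this drain.
                    boolean done = readerDone;
                    List<String> batch;
                    synchronized (this) {
                        batch = new ArrayList<>(buffer);
                        buffer.clear();
                    }
                    if (!batch.isEmpty()) {
                        sink.accept(batch);
                    }
                    if (done) {
                        break;
                    }
                    Thread.sleep(flushIntervalMs);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "log-flusher");

        reader.setDaemon(true);
        flusher.setDaemon(true);
        reader.start();
        flusher.start();
        return flusher;
    }

    /** Convenience helper: pumps the whole stream and returns every flushed line in order. */
    static List<String> collect(InputStream in, long flushIntervalMs) {
        List<String> out = Collections.synchronizedList(new ArrayList<>());
        Thread flusher = new TwoThreadLogPump().start(in, out::addAll, flushIntervalMs);
        try {
            flusher.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return new ArrayList<>(out);
    }
}
```

Because the flusher wakes up on its own schedule, buffered lines reach the sink within roughly one flush interval even while the reader is blocked waiting for the next line.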

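4. Connected scheduling system monitoring to prometheus and grafana

incubator-dolphinscheduler provides only the simple real-time monitoring shown below, and monitoring of tasks in particular is missing.

On top of this, Liantong introduced prometheus and grafana.

With prometheus and grafana we can monitor not only the overall running of tasks in the scheduling system but also details such as the run-time curve of an individual task.

5. Performance optimization of incubator-dolphinscheduler

To be continued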
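As a rough idea of what per-task monitoring data can look like, the sketch below renders task run times in the Prometheus text exposition format using only the Java standard library. The TaskMetricsExporter class, the metric name workflow_task_duration_seconds, and the task label are assumptions for illustration; the article does not say which metrics or client library Liantong uses, and serving this text over an HTTP endpoint for prometheus to scrape is left out.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical exporter that renders the last run time of each task as a gauge. */
class TaskMetricsExporter {
    // task name -> duration of its most recent run, in seconds
    private final Map<String, Double> lastDurationSeconds = new TreeMap<>();

    synchronized void record(String taskName, double seconds) {
        lastDurationSeconds.put(taskName, seconds);
    }

    /** Renders all recorded values in the Prometheus text exposition format. */
    synchronized String render() {
        StringBuilder sb = new StringBuilder();
        sb.append("# HELP workflow_task_duration_seconds Duration of the last run of each task.\n");
        sb.append("# TYPE workflow_task_duration_seconds gauge\n");
        for (Map.Entry<String, Double> e : lastDurationSeconds.entrySet()) {
            sb.append("workflow_task_duration_seconds{task=\"")
              .append(escape(e.getKey()))
              .append("\"} ")
              .append(e.getValue())
              .append('\n');
        }
        return sb.toString();
    }

    /** Escapes backslash, double quote, and newline as the label-value syntax requires. */
    private static String escape(String value) {
        return value.replace("\\", "\\\\").replace("\"", "\\\"").replace("\n", "\\n");
    }
}
```

A grafana panel plotting such a gauge per task label over time would give the kind of per-task run-time curve mentioned above.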

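First, the Conductor sincerely thanks the big data team for sharing, and applauds the spirit of building and sharing together. We also hope every team can accumulate experience in its work, review and summarize it, and eventually turn it into valuable output.

Submissions are welcome from everyone; share your insights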

Hello World→Change Our World

This article is shared from the WeChat official account 海豚調度 (dolphin-scheduler).