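In this installment:
Demystifying the Spark Streaming Job architecture and runtime mechanism
Demystifying the Spark Streaming fault-tolerance architecture and runtime mechanism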
I. Spark Streaming Job architecture and runtime mechanism:
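Understanding the overall architecture and runtime mechanism of Spark Streaming Jobs is essential to mastering Spark Streaming. In an ordinary Spark application, it is an action on an RDD that triggers a Job. How, then, do Jobs run in Spark Streaming? When we write a Spark Streaming program we set a batchDuration, and a Job is triggered automatically once every batchDuration. The Spark Streaming framework does this with a timer: each time the interval elapses, the logic we wrote is submitted to Spark and runs as a Spark job.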
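The timer-driven triggering described above can be sketched with a plain JDK scheduler. This is a minimal sketch and not Spark's own code: `BatchTimerSketch` and `runBatches` are hypothetical names introduced for illustration, and the callback is assumed not to throw (a throwing callback would silently cancel the schedule).

```scala
import java.util.concurrent.{CountDownLatch, Executors, TimeUnit}
import java.util.concurrent.atomic.AtomicInteger

// Hypothetical illustration only: a fixed-rate timer that fires once per batch interval.
object BatchTimerSketch {

  /** Calls onBatch(1), onBatch(2), ... every intervalMs milliseconds until
    * `batches` calls have been made, then returns the number of calls.
    * Assumption: onBatch does not throw.
    */
  def runBatches(intervalMs: Long, batches: Int)(onBatch: Int => Unit): Int = {
    val timer = Executors.newSingleThreadScheduledExecutor()
    val fired = new AtomicInteger(0)
    val done = new CountDownLatch(batches)

    val tick = new Runnable {
      override def run(): Unit = {
        // Ignore ticks that arrive after the requested number of batches.
        if (done.getCount > 0) {
          onBatch(fired.incrementAndGet())
          done.countDown()
        }
      }
    }

    timer.scheduleAtFixedRate(tick, intervalMs, intervalMs, TimeUnit.MILLISECONDS)
    try done.await()
    finally timer.shutdownNow()
    fired.get()
  }
}
```

In a streaming framework the callback would not run the work itself; it would only hand a "generate jobs for this batch time" request to the scheduler, so that a slow batch never delays the next tick.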
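Two different notions of Job are involved here:
The first is the Job produced for each batch interval. This Job is not the Job of Spark Core; it is only the RDD DAG generated from the DStreamGraph. From a Java point of view it is comparable to an instance of the Runnable interface. To run it, it has to be submitted to the JobScheduler, which takes a separate thread from a thread pool to submit the Job to the cluster (in fact, it is the RDD action executed inside that thread that triggers the real job). Why use a thread pool?
a) Jobs are generated continuously, so a thread pool is needed for efficiency; this is the same idea as an Executor running Tasks through a thread pool.
b) The Jobs may be configured to use FAIR scheduling, which also requires multiple threads.
The second is the Spark job that the Job above submits. Viewed at that moment alone, it is no different from a Job in Spark Core.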
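The "Job as a Runnable handed to a thread pool" idea can be sketched as follows. This is a simplified illustration under stated assumptions, not Spark's implementation: `SketchJob`, `JobPoolSketch`, and `runAll` are hypothetical names, and the job body stands in for the place where an RDD action would run.

```scala
import java.util.concurrent.{Executors, TimeUnit}
import scala.collection.mutable.ListBuffer

// Hypothetical stand-in for a streaming Job: a batch time plus a deferred body.
// In a real application the body is where an RDD action would be invoked.
final class SketchJob(val batchTime: Long, body: () => Unit) extends Runnable {
  override def run(): Unit = body()
}

object JobPoolSketch {

  /** Submits one job per batch time to a fixed-size pool and returns the
    * batch times whose jobs completed, in ascending order.
    */
  def runAll(batchTimes: Seq[Long], poolSize: Int): List[Long] = {
    val pool = Executors.newFixedThreadPool(poolSize)
    val completed = ListBuffer.empty[Long]

    batchTimes.foreach { t =>
      // The pool thread, not the submitting thread, runs the job body.
      pool.execute(new SketchJob(t, () => completed.synchronized { completed += t; () }))
    }

    pool.shutdown()
    pool.awaitTermination(10, TimeUnit.SECONDS)
    completed.synchronized(completed.toList.sorted)
  }
}
```

With `poolSize = 1` the jobs run strictly one after another; a larger pool lets jobs of different batches overlap, which is what reasons a) and b) above rely on.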
Now let us walk through how a Job runs:
1. First, instantiate SparkConf and set the runtime parameters.
val conf = new SparkConf().setAppName("UpdateStateByKeyDemo")
2. Instantiate StreamingContext, which is the entry point of Spark Streaming execution, and set batchDuration to control how often Jobs are generated.
val ssc = new StreamingContext(conf,Seconds(20))
3. While StreamingContext is being instantiated, JobScheduler and JobGenerator are instantiated as well.
StreamingContext.scala, line 183:
private[streaming] val scheduler = new JobScheduler(this)
JobScheduler.scala, line 50:
private val jobGenerator = new JobGenerator(this)
4. StreamingContext calls its start method.
def start(): Unit = synchronized {
  state match {
    case INITIALIZED =>
      startSite.set(DStream.getCreationSite())
      StreamingContext.ACTIVATION_LOCK.synchronized {
        StreamingContext.assertNoOtherContextIsActive()
        try {
          validate()

          // Start the streaming scheduler in a new thread, so that thread local properties
          // like call sites and job groups can be reset without affecting those of the
          // current thread.
          ThreadUtils.runInNewThread("streaming-start") {
            sparkContext.setCallSite(startSite.get)
            sparkContext.clearJobGroup()
            sparkContext.setLocalProperty(SparkContext.SPARK_JOB_INTERRUPT_ON_CANCEL, "false")
            scheduler.start()
          }
          state = StreamingContextState.ACTIVE
        } catch {
          case NonFatal(e) =>
            logError("Error starting the context, marking it as stopped", e)
            scheduler.stop(false)
            state = StreamingContextState.STOPPED
            throw e
        }
        StreamingContext.setActiveContext(this)
      }
      shutdownHookRef = ShutdownHookManager.addShutdownHook(
        StreamingContext.SHUTDOWN_HOOK_PRIORITY)(stopOnShutdown)
      // Registering Streaming Metrics at the start of the StreamingContext
      assert(env.metricsSystem != null)
      env.metricsSystem.registerSource(streamingSource)
      uiTab.foreach(_.attach())
      logInfo("StreamingContext started")
    case ACTIVE =>
      logWarning("StreamingContext has already been started")
    case STOPPED =>
      throw new IllegalStateException("StreamingContext has already been stopped")
  }
}
5. Inside StreamingContext.start(), the start method of JobScheduler is called.
scheduler.start()
Inside JobScheduler.start(), an EventLoop is instantiated and EventLoop.start() is called to run the message loop.
JobScheduler.start() also constructs the ReceiverTracker and calls the start methods of both JobGenerator and ReceiverTracker:
def start(): Unit = synchronized {
  if (eventLoop != null) return // scheduler has already been started

  logDebug("Starting JobScheduler")
  eventLoop = new EventLoop[JobSchedulerEvent]("JobScheduler") {
    override protected def onReceive(event: JobSchedulerEvent): Unit = processEvent(event)

    override protected def onError(e: Throwable): Unit = reportError("Error in job scheduler", e)
  }
  eventLoop.start()

  // attach rate controllers of input streams to receive batch completion updates
  for {
    inputDStream <- ssc.graph.getInputStreams
    rateController <- inputDStream.rateController
  } ssc.addStreamingListener(rateController)

  listenerBus.start(ssc.sparkContext)
  receiverTracker = new ReceiverTracker(ssc)
  inputInfoTracker = new InputInfoTracker(ssc)
  receiverTracker.start()
  jobGenerator.start()
  logInfo("Started JobScheduler")
}
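The message loop that JobScheduler.start() sets up follows a common pattern: one background thread drains a queue and dispatches each event to a handler. The sketch below illustrates that pattern only; `SketchEventLoop` is a hypothetical, simplified class, and the real EventLoop in Spark differs in its details.

```scala
import java.util.concurrent.LinkedBlockingQueue
import java.util.concurrent.atomic.AtomicBoolean

// Hypothetical, simplified event loop: one daemon thread takes events off a
// queue and hands each one to onReceive; handler failures go to onError.
abstract class SketchEventLoop[E](name: String) {
  private val queue = new LinkedBlockingQueue[E]()
  private val stopped = new AtomicBoolean(false)

  private val thread = new Thread(name) {
    setDaemon(true)
    override def run(): Unit = {
      try {
        while (!stopped.get) {
          val event = queue.take() // blocks until an event is posted
          try onReceive(event)
          catch { case e: Exception => onError(e) }
        }
      } catch {
        case _: InterruptedException => // stop() interrupted a blocked take(); exit quietly
      }
    }
  }

  def start(): Unit = thread.start()

  /** Enqueues an event; the caller never runs the handler itself. */
  def post(event: E): Unit = queue.put(event)

  def stop(): Unit = if (stopped.compareAndSet(false, true)) thread.interrupt()

  protected def onReceive(event: E): Unit
  protected def onError(e: Throwable): Unit
}
```

The benefit of this design is that posting an event is cheap and non-blocking for the caller, while all event handling is serialized on a single thread, so the handler needs no extra locking of its own.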
6. Once started, JobGenerator keeps generating Jobs, one set per batchDuration.
/** Generate jobs and perform checkpoint for the given `time`. */
private def generateJobs(time: Time) {
  // Set the SparkEnv in this thread, so that job generation code can access the environment
  // Example: BlockRDDs are created in this thread, and it needs to access BlockManager
  // Update: This is probably redundant after threadlocal stuff in SparkEnv has been removed.
  SparkEnv.set(ssc.env)
  Try {
    jobScheduler.receiverTracker.allocateBlocksToBatch(time) // allocate received blocks to batch
    graph.generateJobs(time) // generate jobs using allocated block
  } match {
    case Success(jobs) =>
      val streamIdToInputInfos = jobScheduler.inputInfoTracker.getInfo(time)
      jobScheduler.submitJobSet(JobSet(time, jobs, streamIdToInputInfos))
    case Failure(e) => jobScheduler.reportError("Error generating jobs for time " + time, e)
  }
  eventLoop.post(DoCheckpoint(time, clearCheckpointDataLater = false))
}
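The control flow of generateJobs is a Try around job construction, followed by either submission or error reporting. A stripped-down sketch of that shape, with hypothetical names (`GenerateJobsSketch`, `Submitted`, `Reported`, `build`) and jobs represented as plain strings, might look like this:

```scala
import scala.util.{Failure, Success, Try}

// Hypothetical illustration of the "try to build jobs, then submit or report" shape.
object GenerateJobsSketch {
  sealed trait Outcome
  final case class Submitted(time: Long, jobCount: Int) extends Outcome
  final case class Reported(time: Long, message: String) extends Outcome

  /** Builds the jobs for one batch time. A failure while building never
    * escapes to the caller; it is turned into a Reported outcome instead.
    */
  def generate(time: Long)(build: Long => Seq[String]): Outcome =
    Try(build(time)) match {
      case Success(jobs) => Submitted(time, jobs.size)
      case Failure(e) =>
        Reported(time, "Error generating jobs for time " + time + ": " + e.getMessage)
    }
}
```

The point of wrapping the construction in Try is that one bad batch does not kill the generator thread: the error is reported and the loop is free to handle the next batch time.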
7. After ReceiverTracker starts, it first launches the Receivers in the Spark cluster (in fact, a ReceiverSupervisor is started first inside the Executor). When a Receiver receives data, it stores the data in the Executor through the ReceiverSupervisor and sends the metadata of that data to the ReceiverTracker in the Driver. Inside ReceiverTracker, the received metadata is managed by ReceivedBlockTracker.
/** Start the endpoint and receiver execution thread. */
def start(): Unit = synchronized {
  if (isTrackerStarted) {
    throw new SparkException("ReceiverTracker already started")
  }

  if (!receiverInputStreams.isEmpty) {
    endpoint = ssc.env.rpcEnv.setupEndpoint(
      "ReceiverTracker", new ReceiverTrackerEndpoint(ssc.env.rpcEnv))
    if (!skipReceiverLaunch) launchReceivers()
    logInfo("ReceiverTracker started")
    trackerState = Started
  }
}
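The driver-side bookkeeping in step 7 (receivers report block metadata, and each batch later claims the blocks that arrived for it) can be illustrated with a small in-memory tracker. This is a sketch under stated assumptions: `BlockMeta` and `BlockTrackerSketch` are hypothetical, and the real ReceivedBlockTracker does considerably more than this.

```scala
import scala.collection.mutable

// Hypothetical metadata record: which input stream a block came from, and its id.
final case class BlockMeta(streamId: Int, blockId: String)

// Hypothetical driver-side bookkeeping: reported blocks wait in a per-stream
// queue until a batch claims them.
final class BlockTrackerSketch {
  private val unallocated = mutable.Map.empty[Int, mutable.Queue[BlockMeta]]
  private val byBatch = mutable.Map.empty[Long, Map[Int, Seq[BlockMeta]]]

  /** Called when a receiver reports that it has stored a block. */
  def addBlock(meta: BlockMeta): Unit = synchronized {
    unallocated.getOrElseUpdate(meta.streamId, mutable.Queue.empty[BlockMeta]) += meta
    ()
  }

  /** Moves every pending block into the batch at batchTime.
    * A repeated call for an already allocated batch is ignored.
    */
  def allocateBlocksToBatch(batchTime: Long): Unit = synchronized {
    if (!byBatch.contains(batchTime)) {
      val claimed = unallocated.map { case (id, q) => id -> q.toList }.toMap
      unallocated.values.foreach(_.clear())
      byBatch(batchTime) = claimed
    }
  }

  /** Blocks that belong to the given batch, grouped by stream id. */
  def blocksOfBatch(batchTime: Long): Map[Int, Seq[BlockMeta]] = synchronized {
    byBatch.getOrElse(batchTime, Map.empty)
  }
}
```

Only metadata moves through the driver here; the block contents stay on the executors, which is why the tracker can be a handful of maps.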
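II. Spark Streaming fault-tolerance mechanism:
The relationship between a DStream and RDDs is that the DStream keeps producing RDDs as time passes, and an operation on a DStream is an operation on the RDD of each fixed time interval. In that sense, the DStream-based fault tolerance of Spark Streaming comes down to the fault tolerance of the RDD produced for each batch, and this is what makes the design of Spark Streaming so clever.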
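To make the "a DStream is a series of per-batch RDDs" view concrete, here is a toy model in which a stream is just a function from batch time to that batch's data. `ToyDStream` is a hypothetical class for illustration, and a plain `Seq` stands in for an RDD.

```scala
// Hypothetical toy model: a stream is a function from batch time to the data
// of that batch. A Seq stands in for the RDD of the batch.
final class ToyDStream[A](batchAt: Long => Seq[A]) {

  /** A transformation on the stream is applied to every batch's data. */
  def map[B](f: A => B): ToyDStream[B] =
    new ToyDStream[B](t => batchAt(t).map(f))

  /** The data (the "RDD") this stream yields for one batch time. */
  def batch(time: Long): Seq[A] = batchAt(time)
}
```

Because every batch is an ordinary immutable dataset derived from its inputs, whatever recovery mechanism works for a single dataset applies to the stream batch by batch.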
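Fault tolerance in Spark Streaming has to cover two aspects:
Recovery when the Driver fails
Use a checkpoint to record the runtime state of the Driver; after a failure, the checkpoint can be read back and the Driver state restored.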
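The "restore from the checkpoint if there is one, otherwise start fresh" logic can be sketched with a small file-backed example. Everything here is hypothetical and heavily simplified: `DriverState`, `DriverCheckpointSketch`, and the two-line text format are assumptions made for illustration, and a real Driver checkpoint records far more than this.

```scala
import java.nio.charset.StandardCharsets.UTF_8
import java.nio.file.{Files, Path}

// Hypothetical driver state, reduced to the last finished batch and the
// batches that were still pending when the state was saved.
final case class DriverState(lastCompletedBatch: Long, pendingBatches: List[Long])

object DriverCheckpointSketch {

  /** Writes the state as two lines: last completed batch, then pending batches. */
  def save(path: Path, state: DriverState): Unit = {
    val text = state.lastCompletedBatch.toString + "\n" + state.pendingBatches.mkString(",")
    Files.write(path, text.getBytes(UTF_8))
    ()
  }

  /** Restores the state from the checkpoint file if it exists; otherwise
    * builds a fresh state with `create`.
    */
  def getOrCreate(path: Path)(create: () => DriverState): DriverState =
    if (Files.exists(path) && Files.size(path) > 0) {
      val lines = new String(Files.readAllBytes(path), UTF_8).split("\n", -1)
      val pending =
        if (lines.length > 1 && lines(1).nonEmpty) lines(1).split(",").map(_.toLong).toList
        else Nil
      DriverState(lines(0).toLong, pending)
    } else create()
}
```

A restarted driver would call `getOrCreate` on startup, so the same entry point serves both the first launch and every recovery.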
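Recovery when an individual Job fails
Both Receiver failure and failure of the RDD computation have to be considered. Receivers can be protected by writing a write-ahead log (WAL). RDD fault tolerance is provided natively by Spark Core; based on the properties of RDDs, there are two main mechanisms: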
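01. Checkpoint-based fault tolerance:
Between stages the dependencies are wide and involve a shuffle. When the lineage chain becomes too complex and too long, a checkpoint is needed.
02. Lineage-based fault tolerance:
In general Spark prefers lineage, because checkpointing a large data set is expensive. Given the dependency structure of RDDs, the dependencies inside each stage are all narrow, so lineage-based recovery is usually used there because it is convenient and efficient.
Summary: use lineage inside a stage, and checkpoint between stages.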
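The trade-off between the two mechanisms can be shown with a toy dataset that either recomputes itself from its parent (lineage) or reads a materialized copy (checkpoint). This is not Spark's RDD; `ToyRdd`, `ToySource`, and `ToyMapped` are hypothetical classes, and the checkpoint is simply kept in memory.

```scala
// Hypothetical toy dataset, not Spark's RDD: each node can recompute itself
// from its parent (lineage) unless a checkpoint has materialized it.
sealed trait ToyRdd {
  private var checkpointed: Option[Vector[Int]] = None

  protected def computeFromParent(): Vector[Int]
  def parent: Option[ToyRdd]

  /** Uses the checkpoint when present, otherwise walks back up the lineage. */
  def compute(): Vector[Int] = checkpointed.getOrElse(computeFromParent())

  /** Materializes the data once, so later recovery stops at this node. */
  def checkpoint(): Unit = {
    checkpointed = Some(computeFromParent())
  }

  /** Number of nodes that must be recomputed to rebuild this dataset. */
  def lineageDepth: Int =
    if (checkpointed.isDefined) 0
    else 1 + parent.map(_.lineageDepth).getOrElse(0)
}

final class ToySource(data: Vector[Int]) extends ToyRdd {
  val parent: Option[ToyRdd] = None
  protected def computeFromParent(): Vector[Int] = data
}

final class ToyMapped(prev: ToyRdd, f: Int => Int) extends ToyRdd {
  val parent: Option[ToyRdd] = Some(prev)
  protected def computeFromParent(): Vector[Int] = prev.compute().map(f)
}
```

Recomputing through lineage costs nothing up front but grows with the length of the chain, while a checkpoint pays a storage cost once and then cuts the chain at that point.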
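Notes:
1. WeChat public account of DT大數據夢工廠 (DT Big Data Dream Factory): DT_Spark
2. IMF 8 p.m. big data hands-on sessions, YY live channel number: 68917580
3. Sina Weibo: http://www.weibo.com/ilovepains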