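Executor Source Code Analysis
What the Executor does
The Executor is the component that runs tasks. It is a JVM process that runs Tasks on a thread pool. Here is how the official documentation and the code comments describe the Executor.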
- Official documentation
  A process launched for an application on a worker node, that runs tasks and keeps data in memory or disk storage across them. Each application has its own executors.
- Executor code comment
  Spark executor, backed by a threadpool to run tasks.
  This can be used with Mesos, YARN, and the standalone scheduler.
  An internal RPC interface is used for communication with the driver, except in the case of Mesos fine-grained mode.
Executor
Main call stacks
- CoarseGrainedExecutorBackend
  - CoarseGrainedExecutorBackend#receive: triggered by messages of type LaunchTask and by messages of type KillTask

        override def receive: PartialFunction[Any, Unit] = {
          case LaunchTask(data) =>
            if (executor == null) {
              exitExecutor(1, "Received LaunchTask command but executor was null")
            } else {
              val taskDesc = TaskDescription.decode(data.value)
              logInfo("Got assigned task " + taskDesc.taskId)
              executor.launchTask(this, taskDesc)
            }

          case KillTask(taskId, _, interruptThread, reason) =>
            if (executor == null) {
              exitExecutor(1, "Received KillTask command but executor was null")
            } else {
              executor.killTask(taskId, interruptThread, reason)
            }
        }
- LocalEndpoint
  - LocalEndpoint#receive: triggered by messages of type KillTask

        case KillTask(taskId, interruptThread, reason) =>
          executor.killTask(taskId, interruptThread, reason)

  - LocalEndpoint#receiveAndReply: triggered by messages of type StopExecutor

        override def receiveAndReply(context: RpcCallContext): PartialFunction[Any, Unit] = {
          case StopExecutor =>
            executor.stop()
            context.reply(true)
        }

  - LocalEndpoint#reviveOffers

        def reviveOffers() {
          val offers = IndexedSeq(new WorkerOffer(localExecutorId, localExecutorHostname, freeCores,
            Some(rpcEnv.address.hostPort)))
          for (task <- scheduler.resourceOffers(offers).flatten) {
            freeCores -= scheduler.CPUS_PER_TASK
            executor.launchTask(executorBackend, task)
          }
        }
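
The receive methods above all follow the same pattern: a PartialFunction matches the incoming message by type and forwards it to the Executor. The following is a minimal sketch of that pattern only, not Spark's actual classes. `DispatchSketch`, `RecordingExecutor`, and `SketchBackend` are hypothetical names, and the message types are simplified stand-ins that carry just enough fields to show the dispatch.

```scala
object DispatchSketch {
  // Hypothetical, simplified message types (not Spark's real definitions).
  sealed trait ExecutorMessage
  final case class LaunchTask(taskId: Long) extends ExecutorMessage
  final case class KillTask(taskId: Long, interruptThread: Boolean, reason: String)
    extends ExecutorMessage
  case object StopExecutor extends ExecutorMessage

  // Stand-in for the Executor: it only records what it was asked to do.
  final class RecordingExecutor {
    val log = scala.collection.mutable.ArrayBuffer.empty[String]
    def launchTask(taskId: Long): Unit = log += s"launch $taskId"
    def killTask(taskId: Long, interruptThread: Boolean, reason: String): Unit =
      log += s"kill $taskId interrupt=$interruptThread reason=$reason"
    def stop(): Unit = log += "stop"
  }

  final class SketchBackend(executor: RecordingExecutor) {
    // Each case matches one message type and delegates to the executor.
    // Messages that match no case are simply not defined for this function.
    val receive: PartialFunction[Any, Unit] = {
      case LaunchTask(taskId) =>
        executor.launchTask(taskId)
      case KillTask(taskId, interruptThread, reason) =>
        executor.killTask(taskId, interruptThread, reason)
      case StopExecutor =>
        executor.stop()
    }
  }
}
```

Because `receive` is a PartialFunction, a caller can check `isDefinedAt` before applying it, which is how unknown message types can be told apart from handled ones.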
Key functions
- launchTask: starts a Task. It wraps the task in a TaskRunner, registers it in runningTasks, and hands it to the thread pool threadPool, which executes TaskRunner#run().

      def launchTask(context: ExecutorBackend, taskDescription: TaskDescription): Unit = {
        val tr = new TaskRunner(context, taskDescription)
        runningTasks.put(taskDescription.taskId, tr)
        threadPool.execute(tr)
      }
- killTask: if taskReaperEnabled is set, schedules TaskReaper#run() on taskReaperPool, which in turn calls TaskRunner#kill; otherwise calls TaskRunner#kill directly.

      def killTask(taskId: Long, interruptThread: Boolean, reason: String): Unit = {
        val taskRunner = runningTasks.get(taskId)
        if (taskRunner != null) {
          if (taskReaperEnabled) {
            val maybeNewTaskReaper: Option[TaskReaper] = taskReaperForTask.synchronized {
              val shouldCreateReaper = taskReaperForTask.get(taskId) match {
                case None => true
                case Some(existingReaper) => interruptThread && !existingReaper.interruptThread
              }
              if (shouldCreateReaper) {
                val taskReaper = new TaskReaper(
                  taskRunner, interruptThread = interruptThread, reason = reason)
                taskReaperForTask(taskId) = taskReaper
                Some(taskReaper)
              } else {
                None
              }
            }
            // Execute the TaskReaper from outside of the synchronized block.
            maybeNewTaskReaper.foreach(taskReaperPool.execute)
          } else {
            taskRunner.kill(interruptThread = interruptThread, reason = reason)
          }
        }
      }
- startDriverHeartbeater: starts the heartbeat by scheduling reportHeartBeat(). It is called when the Executor is initialized.

      /**
       * Schedules a task to report heartbeat and partial metrics for active tasks to driver.
       */
      private def startDriverHeartbeater(): Unit = {
        val intervalMs = HEARTBEAT_INTERVAL_MS

        // Wait a random interval so the heartbeats don't end up in sync
        val initialDelay = intervalMs + (math.random * intervalMs).asInstanceOf[Int]

        val heartbeatTask = new Runnable() {
          override def run(): Unit = Utils.logUncaughtExceptions(reportHeartBeat())
        }
        heartbeater.scheduleAtFixedRate(heartbeatTask, initialDelay, intervalMs, TimeUnit.MILLISECONDS)
      }
- reportHeartBeat: periodically reports the heartbeat and metrics to the driver. It is scheduled by startDriverHeartbeater().

      /** Reports heartbeat and metrics for active tasks to the driver. */
      private def reportHeartBeat(): Unit = {
        // list of (task id, accumUpdates) to send back to the driver
        val accumUpdates = new ArrayBuffer[(Long, Seq[AccumulatorV2[_, _]])]()
        val curGCTime = computeTotalGcTime()

        for (taskRunner <- runningTasks.values().asScala) {
          if (taskRunner.task != null) {
            taskRunner.task.metrics.mergeShuffleReadMetrics()
            taskRunner.task.metrics.setJvmGCTime(curGCTime - taskRunner.startGCTime)
            accumUpdates += ((taskRunner.taskId, taskRunner.task.metrics.accumulators()))
          }
        }

        val message = Heartbeat(executorId, accumUpdates.toArray, env.blockManager.blockManagerId)
        try {
          val response = heartbeatReceiverRef.askSync[HeartbeatResponse](
            message, new RpcTimeout(HEARTBEAT_INTERVAL_MS.millis, EXECUTOR_HEARTBEAT_INTERVAL.key))
          if (response.reregisterBlockManager) {
            logInfo("Told to re-register on heartbeat")
            env.blockManager.reregister()
          }
          heartbeatFailures = 0
        } catch {
          case NonFatal(e) =>
            logWarning("Issue communicating with driver in heartbeater", e)
            heartbeatFailures += 1
            if (heartbeatFailures >= HEARTBEAT_MAX_FAILURES) {
              logError(s"Exit as unable to send heartbeats to driver " +
                s"more than $HEARTBEAT_MAX_FAILURES times")
              System.exit(ExecutorExitCode.HEARTBEAT_FAILURE)
            }
        }
      }
- computeTotalGcTime: computes the total time this JVM process has spent in GC.

      /** Returns the total amount of time this JVM process has spent in garbage collection. */
      private def computeTotalGcTime(): Long = {
        ManagementFactory.getGarbageCollectorMXBeans.asScala.map(_.getCollectionTime).sum
      }
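
The heartbeat functions above combine two ideas: a randomized initial delay so that executors do not report in lockstep, and a counter of consecutive failures that triggers an exit once it reaches a limit. The sketch below isolates both ideas under stated assumptions. `HeartbeatSketch`, `initialDelayMs`, `FailureTracker`, and `start` are hypothetical names, the `sendHeartbeat` callback stands in for the RPC call to the driver, and `onGiveUp` stands in for `System.exit`.

```scala
import java.util.concurrent.{Executors, ScheduledExecutorService, ThreadLocalRandom, TimeUnit}
import scala.util.control.NonFatal

object HeartbeatSketch {
  // Initial delay = one full interval plus a random fraction of it, so that
  // executors started together do not all report at the same instant.
  // `random01` is expected to lie in [0, 1).
  def initialDelayMs(intervalMs: Long, random01: Double): Long =
    intervalMs + (random01 * intervalMs).toLong

  // Counts consecutive failures; any success resets the count.
  // Not thread-safe: intended to be touched only by the single heartbeat thread.
  final class FailureTracker(maxFailures: Int) {
    private var failures = 0

    // Returns true when the caller should give up.
    def record(success: Boolean): Boolean =
      if (success) {
        failures = 0
        false
      } else {
        failures += 1
        failures >= maxFailures
      }
  }

  // Runs `sendHeartbeat` at a fixed rate on a single-threaded scheduler.
  // The caller is responsible for shutting the returned scheduler down.
  def start(intervalMs: Long, tracker: FailureTracker)(sendHeartbeat: () => Boolean)(
      onGiveUp: () => Unit): ScheduledExecutorService = {
    val scheduler = Executors.newSingleThreadScheduledExecutor()
    val task = new Runnable {
      override def run(): Unit = {
        // A thrown non-fatal exception counts as a failed heartbeat.
        val ok = try sendHeartbeat() catch { case NonFatal(_) => false }
        if (tracker.record(ok)) onGiveUp()
      }
    }
    val delay = initialDelayMs(intervalMs, ThreadLocalRandom.current().nextDouble())
    scheduler.scheduleAtFixedRate(task, delay, intervalMs, TimeUnit.MILLISECONDS)
    scheduler
  }
}
```

Keeping the delay computation and the failure counter as plain functions makes them easy to check without waiting on real timers.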
Member variables
- threadPool: a ThreadPoolExecutor created with Executors.newCachedThreadPool. It runs the TaskRunner threads, whose names are prefixed with "Executor task launch worker".
- taskReaperPool: a ThreadPoolExecutor created with Executors.newCachedThreadPool. The threads it runs supervise the kill and cancel of Tasks.
- runningTasks: maintains the mapping between the identifier (taskId) of each running Task and its TaskRunner.
- heartbeater: a ScheduledThreadPoolExecutor with a single thread, named driver-heartbeater. It is triggered at an interval of spark.executor.heartbeatInterval milliseconds to run reportHeartBeat(), which reports the heartbeat and metrics of active tasks to the driver.
- heartbeatReceiverRef: the RpcEndpointRef of the HeartbeatReceiver, obtained by calling the setupEndpointRef method of RpcEnv.
- executorId: the identifier of the current Executor.
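
To make the relationship between threadPool and runningTasks concrete, here is a minimal sketch using only java.util.concurrent. It is an illustration under assumptions, not Spark's implementation: `LaunchSketch`, `MiniExecutor`, and `namedDaemonFactory` are hypothetical names, and having each runner remove itself from the map when it finishes is a choice made for this sketch.

```scala
import java.util.concurrent.{ConcurrentHashMap, ExecutorService, Executors, ThreadFactory, TimeUnit}
import java.util.concurrent.atomic.AtomicInteger

object LaunchSketch {
  // Thread factory that creates daemon threads numbered under a common prefix.
  def namedDaemonFactory(prefix: String): ThreadFactory = new ThreadFactory {
    private val counter = new AtomicInteger(0)
    override def newThread(r: Runnable): Thread = {
      val t = new Thread(r, s"$prefix-${counter.getAndIncrement()}")
      t.setDaemon(true)
      t
    }
  }

  final class MiniExecutor {
    // Cached pool: threads are created on demand and reused when idle.
    private val threadPool: ExecutorService =
      Executors.newCachedThreadPool(namedDaemonFactory("Executor task launch worker"))

    // taskId -> runner, for tasks that have been launched and not yet finished.
    val runningTasks = new ConcurrentHashMap[Long, Runnable]()

    def launchTask(taskId: Long)(body: => Unit): Unit = {
      val runner = new Runnable {
        // Sketch assumption: the runner unregisters itself when it completes.
        override def run(): Unit =
          try body finally runningTasks.remove(taskId)
      }
      runningTasks.put(taskId, runner)
      threadPool.execute(runner)
    }

    // Stops accepting new tasks and waits briefly for submitted ones to finish.
    def stop(): Unit = {
      threadPool.shutdown()
      threadPool.awaitTermination(5, TimeUnit.SECONDS)
    }
  }
}
```

Registering the runner in the map before submitting it to the pool means a kill request that arrives right after launch can still find the task.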
Inner classes
- TaskRunner: a class that implements java.lang.Runnable
- TaskReaper:
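
The killTask excerpt shown earlier decides whether a new TaskReaper is needed: one is created when the task has no reaper yet, or when the new request wants to interrupt the thread and the existing reaper does not. The sketch below isolates just that decision. `ReaperSketch`, `Reaper`, and `ReaperRegistry` are hypothetical stand-ins, and the real TaskReaper carries more state than the single flag modelled here.

```scala
import scala.collection.mutable

object ReaperSketch {
  // Hypothetical stand-in for TaskReaper: only the flag that drives the decision.
  final case class Reaper(taskId: Long, interruptThread: Boolean)

  final class ReaperRegistry {
    private val reaperForTask = mutable.HashMap.empty[Long, Reaper]

    // Returns Some(newReaper) when a reaper should be started for this kill
    // request, or None when an existing reaper already covers it.
    def register(taskId: Long, interruptThread: Boolean): Option[Reaper] =
      reaperForTask.synchronized {
        val shouldCreate = reaperForTask.get(taskId) match {
          case None => true
          // Only escalate: replace a non-interrupting reaper with an interrupting one.
          case Some(existing) => interruptThread && !existing.interruptThread
        }
        if (shouldCreate) {
          val reaper = Reaper(taskId, interruptThread)
          reaperForTask(taskId) = reaper
          Some(reaper)
        } else {
          None
        }
      }
  }
}
```

Returning the new reaper as an Option lets the caller start it after leaving the synchronized block, which mirrors the comment in the killTask excerpt.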