Spark Source Code: Executor Source Code Analysis

Executor Source Code Analysis

What does the Executor do?

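The Executor is the component that executes tasks. It is a JVM process that runs Tasks on a thread pool. Here is how the official documentation and the code comments describe it.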

From the official documentation

A process launched for an application on a worker node, that runs tasks and keeps data in memory or disk storage across them. Each application has its own executors.

The code comment on the Executor class reads:

Spark executor, backed by a threadpool to run tasks.
This can be used with Mesos, YARN, and the standalone scheduler.
An internal RPC interface is used for communication with the driver, except in the case of Mesos fine-grained mode.

Executor

Main call stack

CoarseGrainedExecutorBackend

CoarseGrainedExecutorBackend#receive(): triggered by messages of type LaunchTask and of type KillTask.

  override def receive: PartialFunction[Any, Unit] = {
    case LaunchTask(data) =>
      if (executor == null) {
        exitExecutor(1, "Received LaunchTask command but executor was null")
      } else {
        val taskDesc = TaskDescription.decode(data.value)
        logInfo("Got assigned task " + taskDesc.taskId)
        executor.launchTask(this, taskDesc)
      }

    case KillTask(taskId, _, interruptThread, reason) =>
      if (executor == null) {
        exitExecutor(1, "Received KillTask command but executor was null")
      } else {
        executor.killTask(taskId, interruptThread, reason)
      }
  }
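The receive method above is a PartialFunction that dispatches on the message type. The sketch below reproduces only that dispatch pattern under simplified assumptions: LaunchTask and KillTask here are hypothetical stand-in case classes (Spark's real LaunchTask carries a serialized TaskDescription), and SketchBackend records what it would do instead of calling a real Executor.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical, simplified messages; not Spark's CoarseGrainedClusterMessages.
sealed trait Msg
final case class LaunchTask(taskId: Long) extends Msg
final case class KillTask(taskId: Long, interruptThread: Boolean, reason: String) extends Msg

// Hypothetical backend: `executorReady = false` plays the role of `executor == null`.
final class SketchBackend(executorReady: Boolean) {
  val log = ArrayBuffer.empty[String]

  val receive: PartialFunction[Any, Unit] = {
    case LaunchTask(id) =>
      if (!executorReady) log += "exit: executor was null on LaunchTask"
      else log += s"launch $id"
    case KillTask(id, interrupt, reason) =>
      if (!executorReady) log += "exit: executor was null on KillTask"
      else log += s"kill $id interrupt=$interrupt reason=$reason"
  }
}
```

Because receive is a PartialFunction, isDefinedAt returns false for messages it does not match, which lets the calling RPC layer detect unhandled messages separately.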

LocalEndpoint

LocalEndpoint#receive: triggered by messages of type KillTask.

    case KillTask(taskId, interruptThread, reason) =>
      executor.killTask(taskId, interruptThread, reason)

LocalEndpoint#receiveAndReply: triggered by messages of type StopExecutor.

  override def receiveAndReply(context: RpcCallContext): PartialFunction[Any, Unit] = {
    case StopExecutor =>
      executor.stop()
      context.reply(true)
  }

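LocalEndpoint#reviveOffers: offers all free cores as a single WorkerOffer, asks the scheduler for tasks, and launches each returned task through executor.launchTask, deducting CPUS_PER_TASK from freeCores for every task.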

  def reviveOffers() {
    val offers = IndexedSeq(new WorkerOffer(localExecutorId, localExecutorHostname, freeCores,
      Some(rpcEnv.address.hostPort)))
    for (task <- scheduler.resourceOffers(offers).flatten) {
      freeCores -= scheduler.CPUS_PER_TASK
      executor.launchTask(executorBackend, task)
    }
  }
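The reviveOffers loop is mostly core bookkeeping: offer every free core, then subtract CPUS_PER_TASK (configured by spark.task.cpus, default 1) for each task the scheduler hands back. The sketch below models only that arithmetic. ReviveOffersSketch and its greedy resourceOffers are hypothetical simplifications; Spark's TaskSchedulerImpl.resourceOffers applies locality, fairness, and other constraints.

```scala
object ReviveOffersSketch {
  // Hypothetical greedy scheduler: returns as many pending tasks as the free cores allow.
  def resourceOffers(freeCores: Int, cpusPerTask: Int, pending: List[String]): List[String] =
    pending.take(freeCores / cpusPerTask)

  // Launches whatever the scheduler returns; yields (launched tasks, cores left afterwards).
  def reviveOffers(freeCores: Int, cpusPerTask: Int, pending: List[String]): (List[String], Int) = {
    require(cpusPerTask > 0, "cpusPerTask must be positive")
    var cores = freeCores
    val launched = List.newBuilder[String]
    for (task <- resourceOffers(freeCores, cpusPerTask, pending)) {
      cores -= cpusPerTask // mirrors `freeCores -= scheduler.CPUS_PER_TASK`
      launched += task     // stands in for `executor.launchTask(executorBackend, task)`
    }
    (launched.result(), cores)
  }
}
```

For example, with 4 free cores and 2 CPUs per task, only two of three pending tasks are launched and no cores remain.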

Key functions

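launchTask: starts a Task. It wraps the TaskDescription in a TaskRunner, registers the runner in runningTasks, and submits it to threadPool, which executes TaskRunner#run().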

  def launchTask(context: ExecutorBackend, taskDescription: TaskDescription): Unit = {
    val tr = new TaskRunner(context, taskDescription)
    runningTasks.put(taskDescription.taskId, tr)
    threadPool.execute(tr)
  }
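A minimal sketch of the register-then-submit pattern in launchTask, using only java.util.concurrent. SketchExecutor and SketchTaskRunner are hypothetical names; the task body is a plain function rather than a deserialized Spark Task, and removing the runner from the map when it finishes is an assumption of this sketch.

```scala
import java.util.concurrent.{ConcurrentHashMap, ExecutorService, Executors, TimeUnit}

// Hypothetical stand-in for TaskRunner: runs a plain function instead of a Spark Task.
final class SketchTaskRunner(
    val taskId: Long,
    running: ConcurrentHashMap[Long, SketchTaskRunner],
    body: () => Unit) extends Runnable {
  override def run(): Unit =
    try body()
    finally running.remove(taskId) // sketch assumption: deregister when the task ends
}

// Hypothetical executor holding the two pieces of state launchTask touches.
final class SketchExecutor {
  val runningTasks = new ConcurrentHashMap[Long, SketchTaskRunner]()
  private val threadPool: ExecutorService = Executors.newCachedThreadPool()

  def launchTask(taskId: Long, body: () => Unit): Unit = {
    val tr = new SketchTaskRunner(taskId, runningTasks, body)
    runningTasks.put(taskId, tr) // register first so a concurrent kill can find the runner
    threadPool.execute(tr)
  }

  def shutdown(): Unit = {
    threadPool.shutdown()
    threadPool.awaitTermination(10, TimeUnit.SECONDS)
  }
}
```

Putting the runner into the map before calling execute matters: a KillTask message that arrives immediately after LaunchTask can still look the task up.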

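killTask: if taskReaperEnabled is true, it schedules TaskReaper#run() on taskReaperPool, which in turn calls TaskRunner#kill; otherwise it calls taskRunner#kill directly. A new TaskReaper is created only if none exists for the task yet, or if this request asks to interrupt the thread and the existing reaper does not.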

  def killTask(taskId: Long, interruptThread: Boolean, reason: String): Unit = {
    val taskRunner = runningTasks.get(taskId)
    if (taskRunner != null) {
      if (taskReaperEnabled) {
        val maybeNewTaskReaper: Option[TaskReaper] = taskReaperForTask.synchronized {
          val shouldCreateReaper = taskReaperForTask.get(taskId) match {
            case None => true
            case Some(existingReaper) => interruptThread && !existingReaper.interruptThread
          }
          if (shouldCreateReaper) {
            val taskReaper = new TaskReaper(
              taskRunner, interruptThread = interruptThread, reason = reason)
            taskReaperForTask(taskId) = taskReaper
            Some(taskReaper)
          } else {
            None
          }
        }
        // Execute the TaskReaper from outside of the synchronized block.
        maybeNewTaskReaper.foreach(taskReaperPool.execute)
      } else {
        taskRunner.kill(interruptThread = interruptThread, reason = reason)
      }
    }
  }
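The reaper-creation rule inside killTask is a small predicate, evaluated while holding the taskReaperForTask lock. The sketch isolates it as a pure function so it can be checked on its own; the object and parameter names are hypothetical, and the Option stands in for the lookup taskReaperForTask.get(taskId).

```scala
object KillDecisionSketch {
  /**
   * Whether killTask should create a new TaskReaper.
   * existingInterrupt is None when no reaper exists for the task yet,
   * or Some(flag) holding the existing reaper's interruptThread value.
   */
  def shouldCreateReaper(existingInterrupt: Option[Boolean], interruptThread: Boolean): Boolean =
    existingInterrupt match {
      case None           => true                      // first kill request for this task
      case Some(existing) => interruptThread && !existing // escalate only to an interrupting kill
    }
}
```

Repeated kill requests therefore do not pile up reapers; only an escalation from a non-interrupting kill to an interrupting one spawns a second reaper.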

startDriverHeartbeater: starts the heartbeat. It is called when the Executor is initialized and schedules reportHeartBeat() to run periodically.

  /**
   * Schedules a task to report heartbeat and partial metrics for active tasks to driver.
   */
  private def startDriverHeartbeater(): Unit = {
    val intervalMs = HEARTBEAT_INTERVAL_MS

    // Wait a random interval so the heartbeats don't end up in sync
    val initialDelay = intervalMs + (math.random * intervalMs).asInstanceOf[Int]

    val heartbeatTask = new Runnable() {
      override def run(): Unit = Utils.logUncaughtExceptions(reportHeartBeat())
    }
    heartbeater.scheduleAtFixedRate(heartbeatTask, initialDelay, intervalMs, TimeUnit.MILLISECONDS)
  }
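The same scheduling idea can be reproduced with plain java.util.concurrent: a single-threaded scheduler, a first run delayed by a random amount in [interval, 2 × interval) so executors do not heartbeat in sync, then a fixed rate. HeartbeaterSketch is a hypothetical name, and the NonFatal guard is a choice of this sketch: per the ScheduledExecutorService Javadoc, an exception escaping a scheduleAtFixedRate task suppresses all later runs.

```scala
import java.util.concurrent.{Executors, ScheduledExecutorService, ThreadFactory, TimeUnit}
import scala.util.Random
import scala.util.control.NonFatal

object HeartbeaterSketch {
  // Delay in [intervalMs, 2 * intervalMs), same formula as initialDelay above.
  def initialDelay(intervalMs: Long, random: Double = Random.nextDouble()): Long =
    intervalMs + (random * intervalMs).toLong

  // Single daemon thread named "driver-heartbeater".
  def newHeartbeater(): ScheduledExecutorService =
    Executors.newSingleThreadScheduledExecutor(new ThreadFactory {
      override def newThread(r: Runnable): Thread = {
        val t = new Thread(r, "driver-heartbeater")
        t.setDaemon(true)
        t
      }
    })

  def start(heartbeater: ScheduledExecutorService, intervalMs: Long, report: () => Unit): Unit = {
    val task = new Runnable {
      // Swallow non-fatal errors: an escaping exception would cancel all later runs.
      override def run(): Unit = try report() catch { case NonFatal(_) => () }
    }
    heartbeater.scheduleAtFixedRate(task, initialDelay(intervalMs), intervalMs, TimeUnit.MILLISECONDS)
  }
}
```

With a 10-second interval, for instance, the first heartbeat fires somewhere between 10 and 20 seconds after start, and every 10 seconds after that.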

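reportHeartBeat: periodically reports the heartbeat and the metrics of active tasks to the driver; it is scheduled by startDriverHeartbeater(). If the heartbeat fails HEARTBEAT_MAX_FAILURES times in a row, the executor exits.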

  /** Reports heartbeat and metrics for active tasks to the driver. */
  private def reportHeartBeat(): Unit = {
    // list of (task id, accumUpdates) to send back to the driver
    val accumUpdates = new ArrayBuffer[(Long, Seq[AccumulatorV2[_, _]])]()
    val curGCTime = computeTotalGcTime()

    for (taskRunner <- runningTasks.values().asScala) {
      if (taskRunner.task != null) {
        taskRunner.task.metrics.mergeShuffleReadMetrics()
        taskRunner.task.metrics.setJvmGCTime(curGCTime - taskRunner.startGCTime)
        accumUpdates += ((taskRunner.taskId, taskRunner.task.metrics.accumulators()))
      }
    }

    val message = Heartbeat(executorId, accumUpdates.toArray, env.blockManager.blockManagerId)
    try {
      val response = heartbeatReceiverRef.askSync[HeartbeatResponse](
          message, new RpcTimeout(HEARTBEAT_INTERVAL_MS.millis, EXECUTOR_HEARTBEAT_INTERVAL.key))
      if (response.reregisterBlockManager) {
        logInfo("Told to re-register on heartbeat")
        env.blockManager.reregister()
      }
      heartbeatFailures = 0
    } catch {
      case NonFatal(e) =>
        logWarning("Issue communicating with driver in heartbeater", e)
        heartbeatFailures += 1
        if (heartbeatFailures >= HEARTBEAT_MAX_FAILURES) {
          logError(s"Exit as unable to send heartbeats to driver " +
            s"more than $HEARTBEAT_MAX_FAILURES times")
          System.exit(ExecutorExitCode.HEARTBEAT_FAILURE)
        }
    }
  }
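The failure handling at the end of reportHeartBeat is a consecutive-failure counter: any success resets it, and reaching the limit ends the process. The sketch below keeps that logic but, as a deliberate change, returns a Boolean instead of calling System.exit(ExecutorExitCode.HEARTBEAT_FAILURE) so it can be tested. HeartbeatFailureTracker is a hypothetical name.

```scala
import scala.util.control.NonFatal

// Hypothetical tracker modelled on the try/catch in reportHeartBeat.
final class HeartbeatFailureTracker(maxFailures: Int) {
  private var failures = 0

  /** Runs one heartbeat attempt; returns true when the executor should exit. */
  def attempt(send: () => Unit): Boolean =
    try {
      send()
      failures = 0 // a successful heartbeat resets the consecutive-failure count
      false
    } catch {
      case NonFatal(_) =>
        failures += 1
        failures >= maxFailures
    }

  def consecutiveFailures: Int = failures
}
```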

computeTotalGcTime: computes the total time this JVM process has spent in garbage collection.

  /** Returns the total amount of time this JVM process has spent in garbage collection. */
  private def computeTotalGcTime(): Long = {
    ManagementFactory.getGarbageCollectorMXBeans.asScala.map(_.getCollectionTime).sum
  }
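computeTotalGcTime is plain JMX, so it can be tried outside Spark. This sketch assumes Scala 2.13+ for scala.jdk.CollectionConverters (older Spark code imports scala.collection.JavaConverters instead) and adds one guard: GarbageCollectorMXBean#getCollectionTime returns -1 when a collector does not report its time, so those values are skipped. gcTimeSince shows the per-task pattern used in reportHeartBeat, where the GC time recorded at task start is subtracted from the current total; both names are hypothetical.

```scala
import java.lang.management.ManagementFactory
import scala.jdk.CollectionConverters._

object GcTimeSketch {
  // Total GC time (ms) over all collectors; a bean reporting -1 is "undefined", so skip it.
  def totalGcTimeMs(): Long =
    ManagementFactory.getGarbageCollectorMXBeans.asScala
      .map(_.getCollectionTime)
      .filter(_ >= 0)
      .sum

  // Per-task GC time in the style of setJvmGCTime(curGCTime - taskRunner.startGCTime).
  def gcTimeSince(startGcTimeMs: Long): Long = totalGcTimeMs() - startGcTimeMs
}
```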

Member variables

threadPool: a ThreadPoolExecutor created with Executors.newCachedThreadPool. It runs the TaskRunner threads, whose names are prefixed with "Executor task launch worker".
taskReaperPool: a ThreadPoolExecutor created with Executors.newCachedThreadPool. Its threads supervise the killing and cancellation of Tasks.
runningTasks: maps the identifier (taskId) of each running Task to its TaskRunner.
heartbeater: a single-threaded ScheduledThreadPoolExecutor whose thread is named driver-heartbeater. It invokes reportHeartBeat() at the interval set by spark.executor.heartbeatInterval (default 10s), reporting the heartbeat and metrics of active tasks to the driver.
heartbeatReceiverRef: the RpcEndpointRef of the HeartbeatReceiver, obtained by calling RpcEnv's setupEndpointRef method.
executorId: the identifier of the current Executor.
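To see how a cached pool ends up with thread names like "Executor task launch worker-0", here is a sketch using a hand-written ThreadFactory. NamedPoolSketch and its methods are hypothetical; the cast to ThreadPoolExecutor works because Executors.newCachedThreadPool returns a ThreadPoolExecutor instance behind the ExecutorService interface. Whether Spark marks these threads as daemons is not stated above, so setDaemon(true) is a choice of this sketch.

```scala
import java.util.concurrent.{Executors, ThreadFactory, ThreadPoolExecutor, TimeUnit}
import java.util.concurrent.atomic.AtomicInteger

object NamedPoolSketch {
  // Threads named "<prefix>-0", "<prefix>-1", ... (daemon is a sketch choice).
  def namedFactory(prefix: String): ThreadFactory = new ThreadFactory {
    private val counter = new AtomicInteger(0)
    override def newThread(r: Runnable): Thread = {
      val t = new Thread(r, s"$prefix-${counter.getAndIncrement()}")
      t.setDaemon(true)
      t
    }
  }

  // Cached pool exposed as a ThreadPoolExecutor, as described for threadPool.
  def newTaskLaunchPool(): ThreadPoolExecutor =
    Executors.newCachedThreadPool(namedFactory("Executor task launch worker"))
      .asInstanceOf[ThreadPoolExecutor]
}
```

A cached pool creates threads on demand and reclaims idle ones, which suits an executor whose number of concurrent tasks varies with the cores it is offered.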

Inner classes

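TaskRunner: a class implementing java.lang.Runnable. Each instance runs one Task; launchTask submits it to threadPool, and killTask ends up calling its kill method.
TaskReaper: a Runnable that supervises the killing of a Task. killTask creates it and runs it on taskReaperPool, and it calls TaskRunner#kill.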
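Both killTask paths end in TaskRunner#kill, which roughly records a kill reason and forwards the request to the running Task, optionally interrupting its thread. The sketch below is a heavily simplified, hypothetical Runnable (KillableRunner is not a Spark class) showing only the general idea: a volatile cooperative kill flag checked by the work loop, plus an optional Thread.interrupt() for work that is blocked.

```scala
import java.util.concurrent.{CountDownLatch, TimeUnit}

// Hypothetical cooperative-kill Runnable, loosely modelled on TaskRunner#kill.
final class KillableRunner(work: () => Unit) extends Runnable {
  @volatile private var reasonIfKilled: Option[String] = None
  @volatile private var runner: Thread = null
  @volatile var outcome: String = "not started"
  val finished = new CountDownLatch(1)

  def kill(interruptThread: Boolean, reason: String): Unit = {
    reasonIfKilled = Some(reason) // cooperative flag, checked before each unit of work
    val t = runner
    if (interruptThread && t != null) t.interrupt() // wakes work blocked in sleep/wait/IO
  }

  override def run(): Unit = {
    runner = Thread.currentThread()
    try {
      while (reasonIfKilled.isEmpty) work()
      outcome = s"killed: ${reasonIfKilled.get}"
    } catch {
      case _: InterruptedException =>
        outcome = s"interrupted: ${reasonIfKilled.getOrElse("unknown")}"
    } finally finished.countDown()
  }
}
```

The flag alone is enough for work that checks it regularly; the interrupt exists for work stuck in a blocking call, which is why killTask lets the caller choose interruptThread.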
