Spark Source Code: Executor Source Code Analysis

What does an Executor do?

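An Executor is the component that runs tasks. It is a JVM process that executes Tasks on a thread pool. Here is how the official documentation and the code comments describe it: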

  • Official documentation

A process launched for an application on a worker node, that runs tasks and keeps data in memory or disk storage across them. Each application has its own executors.

  • Comment on the Executor class in the source code:

Spark executor, backed by a threadpool to run tasks.
This can be used with Mesos, YARN, and the standalone scheduler.
An internal RPC interface is used for communication with the driver, except in the case of Mesos fine-grained mode.

Executor

Main call stacks

  • CoarseGrainedExecutorBackend

    • CoarseGrainedExecutorBackend#receive(): triggered by LaunchTask messages and by KillTask messages
      override def receive: PartialFunction[Any, Unit] = {
        case LaunchTask(data) =>
          if (executor == null) {
            exitExecutor(1, "Received LaunchTask command but executor was null")
          } else {
            val taskDesc = TaskDescription.decode(data.value)
            logInfo("Got assigned task " + taskDesc.taskId)
            executor.launchTask(this, taskDesc)
          }
    
        case KillTask(taskId, _, interruptThread, reason) =>
          if (executor == null) {
            exitExecutor(1, "Received KillTask command but executor was null")
          } else {
            executor.killTask(taskId, interruptThread, reason)
          }
      }
    
  • LocalEndpoint

    • LocalEndpoint#receive: triggered by KillTask messages
        case KillTask(taskId, interruptThread, reason) =>
             executor.killTask(taskId, interruptThread, reason)
    
    • LocalEndpoint#receiveAndReply: triggered by StopExecutor messages
      override def receiveAndReply(context: RpcCallContext): PartialFunction[Any, Unit] = {
        case StopExecutor =>
          executor.stop()
          context.reply(true)
      }
    
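    • LocalEndpoint#reviveOffers: offers the local executor's free cores to the scheduler and launches every task the scheduler returns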
      def reviveOffers() {
        val offers = IndexedSeq(new WorkerOffer(localExecutorId, localExecutorHostname, freeCores,
          Some(rpcEnv.address.hostPort)))
        for (task <- scheduler.resourceOffers(offers).flatten) {
          freeCores -= scheduler.CPUS_PER_TASK
          executor.launchTask(executorBackend, task)
        }
      }
    

Key functions

  • launchTask: launches a Task. It wraps the task in a TaskRunner and submits it to the thread pool threadPool, which then executes TaskRunner#run().

      def launchTask(context: ExecutorBackend, taskDescription: TaskDescription): Unit = {
        val tr = new TaskRunner(context, taskDescription)
        runningTasks.put(taskDescription.taskId, tr)
        threadPool.execute(tr)
      }
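
The launch path can be tried outside Spark with a small stand-in. The sketch below is only an illustration under stated assumptions: DemoTaskRunner and DemoExecutor are hypothetical names, the task body is a plain closure rather than a decoded TaskDescription, and the shutdown helper exists only so the example terminates.

```scala
import java.util.concurrent.{ConcurrentHashMap, Executors, TimeUnit}

// Hypothetical stand-in for Spark's TaskRunner: a Runnable tagged with a task id.
final class DemoTaskRunner(val taskId: Long, body: () => Unit) extends Runnable {
  @volatile var finished: Boolean = false
  override def run(): Unit = {
    try body() finally finished = true
  }
}

// Hypothetical miniature executor; only the launch path is modelled.
final class DemoExecutor {
  private val threadPool = Executors.newCachedThreadPool()
  val runningTasks = new ConcurrentHashMap[Long, DemoTaskRunner]()

  def launchTask(taskId: Long)(body: => Unit): DemoTaskRunner = {
    val tr = new DemoTaskRunner(taskId, () => body)
    runningTasks.put(taskId, tr) // register the runner before handing it to the pool
    threadPool.execute(tr)       // the pool thread ends up calling tr.run()
    tr
  }

  // Not part of the original code: lets the demo wait for its tasks and exit.
  def shutdown(): Boolean = {
    threadPool.shutdown()
    threadPool.awaitTermination(5, TimeUnit.SECONDS)
  }
}
```

Registering the runner in runningTasks before calling execute means a kill request that arrives right after launch can already find the task.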
    
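  • killTask: kills a running Task. If the task reaper is enabled, it schedules a TaskReaper on taskReaperPool (TaskReaper#run() -> TaskRunner#kill); otherwise it calls taskRunner#kill directly.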

      def killTask(taskId: Long, interruptThread: Boolean, reason: String): Unit = {
        val taskRunner = runningTasks.get(taskId)
        if (taskRunner != null) {
          if (taskReaperEnabled) {
            val maybeNewTaskReaper: Option[TaskReaper] = taskReaperForTask.synchronized {
              val shouldCreateReaper = taskReaperForTask.get(taskId) match {
                case None => true
                case Some(existingReaper) => interruptThread && !existingReaper.interruptThread
              }
              if (shouldCreateReaper) {
                val taskReaper = new TaskReaper(
                  taskRunner, interruptThread = interruptThread, reason = reason)
                taskReaperForTask(taskId) = taskReaper
                Some(taskReaper)
              } else {
                None
              }
            }
            // Execute the TaskReaper from outside of the synchronized block.
            maybeNewTaskReaper.foreach(taskReaperPool.execute)
          } else {
            taskRunner.kill(interruptThread = interruptThread, reason = reason)
          }
        }
      }
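
The rule for when a new reaper is created is easy to miss in the full function, so here is a reduced sketch of that decision alone. DemoReaper and ReaperRegistry are hypothetical names introduced for illustration; the real TaskReaper also carries the TaskRunner and the kill reason.

```scala
import scala.collection.mutable

// Hypothetical stand-in for TaskReaper: only the flag that the decision reads.
final case class DemoReaper(taskId: Long, interruptThread: Boolean)

final class ReaperRegistry {
  private val reaperForTask = mutable.HashMap[Long, DemoReaper]()

  /** Returns Some(reaper) when a new reaper should be started, otherwise None. */
  def maybeNewReaper(taskId: Long, interruptThread: Boolean): Option[DemoReaper] =
    reaperForTask.synchronized {
      val shouldCreate = reaperForTask.get(taskId) match {
        case None => true // first kill request for this task
        // Only escalate: a non-interrupting reaper may be superseded by an interrupting one.
        case Some(existing) => interruptThread && !existing.interruptThread
      }
      if (shouldCreate) {
        val reaper = DemoReaper(taskId, interruptThread)
        reaperForTask(taskId) = reaper
        Some(reaper)
      } else {
        None
      }
    }
}
```

Repeated kill requests with the same or a weaker setting therefore do not start additional reapers; only a request that newly asks for thread interruption does.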
    
  • startDriverHeartbeater: starts the heartbeat by scheduling reportHeartBeat() at a fixed rate. It is called when the Executor is initialized.

      /**
       * Schedules a task to report heartbeat and partial metrics for active tasks to driver.
       */
      private def startDriverHeartbeater(): Unit = {
        val intervalMs = HEARTBEAT_INTERVAL_MS
    
        // Wait a random interval so the heartbeats don't end up in sync
        val initialDelay = intervalMs + (math.random * intervalMs).asInstanceOf[Int]
    
        val heartbeatTask = new Runnable() {
          override def run(): Unit = Utils.logUncaughtExceptions(reportHeartBeat())
        }
        heartbeater.scheduleAtFixedRate(heartbeatTask, initialDelay, intervalMs, TimeUnit.MILLISECONDS)
      }
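
The scheduling pattern, a randomized initial delay followed by a fixed rate, can be reproduced with the JDK alone. The following is a sketch, not Spark code: HeartbeatScheduleDemo, jitteredInitialDelay and runBeats are hypothetical names, scala.util.Random stands in for math.random, and the latch exists only so the demo can stop after a few beats.

```scala
import java.util.concurrent.{CountDownLatch, Executors, TimeUnit}
import scala.util.Random

object HeartbeatScheduleDemo {
  /** Initial delay in [intervalMs, 2 * intervalMs), same shape as the formula above. */
  def jitteredInitialDelay(intervalMs: Long): Long =
    intervalMs + (Random.nextDouble() * intervalMs).toInt

  /** Schedules `beat` at a fixed rate; returns true if it ran `beats` times within 5 s. */
  def runBeats(intervalMs: Long, beats: Int)(beat: () => Unit): Boolean = {
    val heartbeater = Executors.newSingleThreadScheduledExecutor()
    val latch = new CountDownLatch(beats)
    val task = new Runnable {
      override def run(): Unit = { beat(); latch.countDown() }
    }
    heartbeater.scheduleAtFixedRate(
      task, jitteredInitialDelay(intervalMs), intervalMs, TimeUnit.MILLISECONDS)
    try latch.await(5, TimeUnit.SECONDS) finally heartbeater.shutdownNow()
  }
}
```

Because each process draws its own delay, executors started at the same moment do not all send their first heartbeat at the same moment.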
    
    
  • reportHeartBeat: periodically reports the heartbeat and metrics to the driver. It is scheduled by startDriverHeartbeater().

      /** Reports heartbeat and metrics for active tasks to the driver. */
      private def reportHeartBeat(): Unit = {
        // list of (task id, accumUpdates) to send back to the driver
        val accumUpdates = new ArrayBuffer[(Long, Seq[AccumulatorV2[_, _]])]()
        val curGCTime = computeTotalGcTime()
    
        for (taskRunner <- runningTasks.values().asScala) {
          if (taskRunner.task != null) {
            taskRunner.task.metrics.mergeShuffleReadMetrics()
            taskRunner.task.metrics.setJvmGCTime(curGCTime - taskRunner.startGCTime)
            accumUpdates += ((taskRunner.taskId, taskRunner.task.metrics.accumulators()))
          }
        }
    
        val message = Heartbeat(executorId, accumUpdates.toArray, env.blockManager.blockManagerId)
        try {
          val response = heartbeatReceiverRef.askSync[HeartbeatResponse](
              message, new RpcTimeout(HEARTBEAT_INTERVAL_MS.millis, EXECUTOR_HEARTBEAT_INTERVAL.key))
          if (response.reregisterBlockManager) {
            logInfo("Told to re-register on heartbeat")
            env.blockManager.reregister()
          }
          heartbeatFailures = 0
        } catch {
          case NonFatal(e) =>
            logWarning("Issue communicating with driver in heartbeater", e)
            heartbeatFailures += 1
            if (heartbeatFailures >= HEARTBEAT_MAX_FAILURES) {
              logError(s"Exit as unable to send heartbeats to driver " +
                s"more than $HEARTBEAT_MAX_FAILURES times")
              System.exit(ExecutorExitCode.HEARTBEAT_FAILURE)
            }
        }
      }
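
The failure handling at the end of reportHeartBeat amounts to a counter of consecutive failures. A minimal sketch of that counter follows, under these assumptions: HeartbeatFailureTracker is a hypothetical class, `send` stands in for the RPC to the driver, and `onGiveUp` stands in for the System.exit call.

```scala
import scala.util.control.NonFatal

// Hypothetical tracker for consecutive heartbeat failures.
final class HeartbeatFailureTracker(maxFailures: Int, onGiveUp: () => Unit) {
  private var failures = 0
  def consecutiveFailures: Int = failures

  def report(send: () => Unit): Unit = {
    try {
      send()
      failures = 0 // any successful heartbeat resets the counter
    } catch {
      case NonFatal(_) =>
        failures += 1
        if (failures >= maxFailures) onGiveUp()
    }
  }
}
```

Since a success resets the counter, only an unbroken run of failed heartbeats reaches the limit; occasional timeouts do not accumulate.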
    
    
  • computeTotalGcTime: computes the total time this JVM process has spent in GC

      /** Returns the total amount of time this JVM process has spent in garbage collection. */
      private def computeTotalGcTime(): Long = {
        ManagementFactory.getGarbageCollectorMXBeans.asScala.map(_.getCollectionTime).sum
      }
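
reportHeartBeat uses this total as a baseline: the GC time attributed to a task is the current total minus the total recorded when the task started. The sketch below shows that subtraction in isolation. GcTimeDemo and gcTimeDuring are hypothetical names, and the import assumes Scala 2.13 or later (the original code uses the older JavaConverters).

```scala
import java.lang.management.ManagementFactory
import scala.jdk.CollectionConverters._

object GcTimeDemo {
  /** Total milliseconds the collectors of this JVM report having spent so far. */
  def computeTotalGcTime(): Long =
    ManagementFactory.getGarbageCollectorMXBeans.asScala.map(_.getCollectionTime).sum

  /** Runs `body` and returns its result with the GC time that elapsed meanwhile. */
  def gcTimeDuring[A](body: => A): (A, Long) = {
    val start = computeTotalGcTime()
    val result = body
    (result, computeTotalGcTime() - start)
  }
}
```

The figure is JVM-wide, so when several tasks run concurrently each of them is charged for all GC activity during its lifetime, not only the collections it caused.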
    

Member variables

  • threadPool: a ThreadPoolExecutor created with Executors.newCachedThreadPool. It runs the TaskRunner threads, whose names are prefixed with "Executor task launch worker".

  • taskReaperPool: a ThreadPoolExecutor created with Executors.newCachedThreadPool. Its threads supervise the killing and cancellation of Tasks.

  • runningTasks: maintains the mapping from the ID of each running Task (taskId) to its TaskRunner.

  • heartbeater: a single-threaded ScheduledThreadPoolExecutor whose thread is named driver-heartbeater. It invokes reportHeartBeat() every spark.executor.heartbeatInterval
    milliseconds to report the heartbeat and metrics of active tasks to the driver.

  • heartbeatReceiverRef: the RpcEndpointRef of the HeartbeatReceiver, obtained by calling the setupEndpointRef method of RpcEnv.

  • executorId: the identifier of the current Executor
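
For readers who want to see how a cached pool ends up with prefixed thread names, here is one possible construction using only the JDK. It is a sketch under assumptions: NamedPoolDemo and its two methods are hypothetical, the "-N" numbering suffix and the daemon flag are choices made for this example, and Spark's own thread factory may be built differently.

```scala
import java.util.concurrent.{Executors, ThreadFactory, ThreadPoolExecutor}
import java.util.concurrent.atomic.AtomicInteger

object NamedPoolDemo {
  /** Thread factory that names threads "<prefix>-0", "<prefix>-1", ... */
  def namedDaemonFactory(prefix: String): ThreadFactory = new ThreadFactory {
    private val counter = new AtomicInteger(0)
    override def newThread(r: Runnable): Thread = {
      val t = new Thread(r, s"$prefix-${counter.getAndIncrement()}")
      t.setDaemon(true) // assumption for this sketch: workers should not block JVM exit
      t
    }
  }

  /** Cached pool whose workers carry the given name prefix. */
  def newNamedCachedPool(prefix: String): ThreadPoolExecutor =
    Executors.newCachedThreadPool(namedDaemonFactory(prefix))
      .asInstanceOf[ThreadPoolExecutor]
}
```

Named threads make thread dumps much easier to read, since task workers and the heartbeat thread can be told apart at a glance.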

Inner classes

  • TaskRunner: a class that implements java.lang.Runnable

  • TaskReaper: the Runnable that killTask submits to taskReaperPool to supervise the kill of a Task
