spark-core_25: Source analysis of how the Master tells a Worker to launch the CoarseGrainedExecutorBackend process, and how that process initializes

This post continues from the previous one (spark-core_24: AppClient's ClientEndpoint registers with RegisterApplication).

The previous post showed the master calling launchExecutor(){worker.endpoint.send(LaunchExecutor(masterUrl,
exec.application.id, exec.id, exec.application.desc, exec.cores, exec.memory))} so that the worker launches CoarseGrainedExecutorBackend.

19, The Worker uses the JDK's ProcessBuilder.start to launch the CoarseGrainedExecutorBackend process

override def receive: PartialFunction[Any, Unit] = synchronized {
…..
    /**
      * appDesc: contains the command information, including the launch class CoarseGrainedExecutorBackend.
      * It was put there during SparkContext initialization: after TaskSchedulerImpl.start() runs, the RpcEndpoint of
      * SparkDeploySchedulerBackend's AppClient calls registerApplication.
      * The master then calls startExecutorsOnWorkers ==> allocateWorkerResourceToExecutors ==> launchExecutor(worker, exec) ==> worker.endpoint.send(LaunchExecutor(masterUrl, ...))
      *
      * masterUrl: spark://luyl152:7077
      * appId: app-20180404172558-0000
      * execId: an auto-incrementing number, starting from 0 by default
      * cores_: the value of --executor-cores or of "spark.executor.cores" in SparkConf; if it is not set, only one CoarseGrainedExecutorBackend is launched on the worker and it gets all of the worker's available cores
      * memory_: corresponds to sc.executorMemory, 1024 MB by default
      */
case LaunchExecutor(masterUrl, appId, execId, appDesc, cores_, memory_) =>
    /** On LaunchExecutor the worker creates an ExecutorRunner and calls its start() method, which in turn calls fetchAndRunExecutor().
      * fetchAndRunExecutor() contains the following code:
        val builder = CommandUtils.buildProcessBuilder(appDesc.command, new SecurityManager(conf), memory, sparkHome.getAbsolutePath, substituteVariables)
        process = builder.start()
      */
    if (masterUrl != activeMasterUrl) {
      logWarning("Invalid Master (" + masterUrl + ") attempted to launch executor.")
   } else {
      try {
        logInfo("Asked to launch executor %s/%d for %s".format(appId, execId, appDesc.name))

        // Create the executor's working directory (the working directory of CoarseGrainedExecutorBackend).
        // workDir is initialized in WorkerArguments from SPARK_WORKER_DIR; if that variable is not set, a work directory is created under spark_home when the worker starts.
        // executorDir: /data/spark-1.6.0-bin-hadoop2.6/work/app-20180508234845-0000/4
        val executorDir = new File(workDir, appId + "/" + execId)
        if (!executorDir.mkdirs()) {
          throw new IOException("Failed to create directory " + executorDir)
        }

        // Create local dirs for the executor. These are passed to the executor via the
        // SPARK_EXECUTOR_DIRS environment variable, and deleted by the Worker when the
        // application finishes.
        // ExecutorRunner sets SPARK_EXECUTOR_DIRS on the environment of the CoarseGrainedExecutorBackend process (see fetchAndRunExecutor).
        // appDirectories: HashMap[String, Seq[String]]; on the first launch for an application it has no entry yet
        // appLocalDirs: Seq("/tmp/spark-b7c124be-813a-4c06-8f8e-1e04fd2b5056/executor-ed6c2e1e-c448-4883-8f34-5efdde76521b")
        val appLocalDirs = appDirectories.get(appId).getOrElse {
          // getOrCreateLocalRootDirs() returns: Array(/tmp/spark-e72251ed-96b6-4fe6-b704-1772b5fc5a8b)
          Utils.getOrCreateLocalRootDirs(conf).map { dir =>
            // returns /tmp/spark-e72251ed-96b6-4fe6-b704-1772b5fc5a8b/executor-7ab80469-4222-40c9-87cf-a6f2f00e30c6
            val appDir = Utils.createDirectory(dir, namePrefix = "executor")
            Utils.chmod700(appDir)
            appDir.getAbsolutePath()
          }.toSeq
        }
        // appDirectories: HashMap["app-20180404172558-0000", Seq("/tmp/spark-e72251ed-96b6-4fe6-b704-1772b5fc5a8b/executor-7ab80469-4222-40c9-87cf-a6f2f00e30c6")]
        appDirectories(appId) = appLocalDirs
        // ExecutorRunner is what actually launches the CoarseGrainedExecutorBackend process on each Worker node,
        // using the JDK's ProcessBuilder.start()
        val manager = new ExecutorRunner(
          appId, //app-20180404172558-0000
          execId,
          appDesc.copy(command = Worker.maybeUpdateSSLSettings(appDesc.command, conf)),
          cores_,
          memory_,
          self,
          workerId, // worker-20180321165947-luyl153-<value of RpcAddress.port>
          host, // the worker's host
          webUi.boundPort, // the worker's web UI port is 8081; the master's is 8080
          publicAddress, // the current worker's hostname
          sparkHome,
          executorDir, // /data/spark-1.6.0-bin-hadoop2.6/work/app-20180508234845-0000/4
          workerUri, // spark://sparkWorker@luyl153:RpcAddress.port
          conf,
          appLocalDirs, ExecutorState.RUNNING)
        //executors: HashMap[String, ExecutorRunner]
        executors(appId + "/" + execId) = manager
        manager.start()
        coresUsed += cores_
        memoryUsed += memory_
        sendToMaster(ExecutorStateChanged(appId, execId, manager.state, None, None))
      } catch {
        case e: Exception => {
          logError(s"Failed to launch executor $appId/$execId for ${appDesc.name}.", e)
          if (executors.contains(appId + "/" + execId)) {
            executors(appId + "/" + execId).kill()
            executors -= appId + "/" + execId
          }
          sendToMaster(ExecutorStateChanged(appId, execId, ExecutorState.FAILED,
            Some(e.toString), None))
        }
      }
    }
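
The directory step in the handler above can be tried in isolation. The following is only a minimal sketch: `ExecutorDirSketch` and `createExecutorDir` are names made up for this illustration, and a temp directory stands in for the real worker directory. It shows why a second launch with the same appId/execId would fail: `mkdirs()` returns false when the directory already exists.

```scala
import java.io.{File, IOException}
import java.nio.file.Files

object ExecutorDirSketch {
  /** Creates <workDir>/<appId>/<execId> and fails loudly if that is not possible. */
  def createExecutorDir(workDir: File, appId: String, execId: Int): File = {
    val executorDir = new File(workDir, appId + "/" + execId)
    // mkdirs() returns false both when creation fails and when the directory already exists
    if (!executorDir.mkdirs()) {
      throw new IOException("Failed to create directory " + executorDir)
    }
    executorDir
  }

  def main(args: Array[String]): Unit = {
    // Assumption: a temp directory plays the role of SPARK_WORKER_DIR
    val workDir = Files.createTempDirectory("work").toFile
    val dir = createExecutorDir(workDir, "app-20180404172558-0000", 0)
    println(dir.getAbsolutePath)
  }
}
```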

20, The best part is coming up: Spark uses ProcessBuilder to launch CoarseGrainedExecutorBackend

private[deploy] class ExecutorRunner(
    val appId: String, // app-20180404172558-0000
    val execId: Int,
    val appDesc: ApplicationDescription,
    val cores: Int,
    val memory: Int, // corresponds to sc.executorMemory, 1024 MB by default
    val worker: RpcEndpointRef,
    val workerId: String,
    val host: String,
    val webUiPort: Int, // the worker's web UI port is 8081; the master's is 8080
    val publicAddress: String, // the current worker's hostname
    val sparkHome: File,
    val executorDir: File, // $spark_home$/work/0
    val workerUrl: String, // spark://sparkWorker@luyl153:RpcAddress.port
    conf: SparkConf,
    val appLocalDirs: Seq[String], // Seq("/tmp/spark-e72251ed-96b6-4fe6-b704-1772b5fc5a8b/executor-7ab80469-4222-40c9-87cf-a6f2f00e30c6")
    @volatile var state: ExecutorState.Value)
extends Logging {

private val fullId = appId + "/" + execId
private var workerThread: Thread = null
private var process: Process = null
private var stdoutAppender: FileAppender = null
private var stderrAppender: FileAppender = null

// NOTE: This is now redundant with the automated shut-down enforced by the Executor. It might make sense to remove this in the future.
private var shutdownHook: AnyRef = null

private[worker] def start() {
    workerThread = new Thread("ExecutorRunner for " + fullId) {
      override def run() { fetchAndRunExecutor() }
    }
    workerThread.start()
    // Shutdown hook that kills actors on shutdown.
    // This hook runs when the JVM of the main class exits. It is analyzed in a later post; it is quite useful and worth studying.
    shutdownHook = ShutdownHookManager.addShutdownHook { () =>
      // It's possible that we arrive here before calling `fetchAndRunExecutor`, then `state` will
      // be `ExecutorState.RUNNING`. In this case, we should set `state` to `FAILED`.
      if (state == ExecutorState.RUNNING) {
        state = ExecutorState.FAILED
      }
      killProcess(Some("Worker shutting down")) }
}

===> Look at the fetchAndRunExecutor method

/**
* Download and run the executor described in our ApplicationDescription
* Uses the JDK's ProcessBuilder.start() to launch CoarseGrainedExecutorBackend
* https://blog.csdn.net/u013256816/article/details/54603910
*/
private def fetchAndRunExecutor() {
try {
    // Launch the process; this call returns a JDK ProcessBuilder
    val builder = CommandUtils.buildProcessBuilder(appDesc.command, new SecurityManager(conf), memory, sparkHome.getAbsolutePath, substituteVariables)
    ...}

===> First see how CommandUtils.buildProcessBuilder puts the java command that runs the main class into a ProcessBuilder

object CommandUtils extends Logging {

/**
   * Build a ProcessBuilder based on the given parameters.
   * The `env` argument is exposed for testing.
   * command: Command(org.apache.spark.executor.CoarseGrainedExecutorBackend,
   *   List(--driver-url, spark://[email protected]:49972,
   *     --executor-id, {{EXECUTOR_ID}},
   *     --hostname, {{HOSTNAME}},
   *     --cores, {{CORES}}, --app-id, {{APP_ID}}, --worker-url, {{WORKER_URL}}),
   *   Map(SPARK_USER -> root, SPARK_EXECUTOR_MEMORY -> 1024m),
   *   List(), List(), ArraySeq(-Dspark.driver.port=49972, -XX:+PrintGCDetails, -Dkey=value, -Dnumbers=one two three))
   */
def buildProcessBuilder(
      command: Command,
      securityMgr: SecurityManager,
      memory: Int, // 1024 MB
      sparkHome: String,
      substituteArguments: String => String, // turns the placeholders in command, such as {{EXECUTOR_ID}} and {{CORES}}, into concrete values
      classPaths: Seq[String] = Seq[String](),
      env: Map[String, String] = sys.env): ProcessBuilder = {
    val localCommand = buildLocalCommand(
      command, securityMgr, substituteArguments, classPaths, env)
    val commandSeq = buildCommandSeq(localCommand, memory, sparkHome)
    /** The following command is handed to the ProcessBuilder constructor; it is simply a "java -cp <jars> <main class> ..." launch command:
      * "/usr/local/java/jdk1.8.0_91/bin/java" "-cp" "/data/spark-1.6.0-bin-hadoop2.6/conf/:/data/spark-1.6.0-bin-hadoop2.6/lib/spark-assembly-1.6.0-hadoop2.6.0.jar:/data/spark-1.6.0-bin-hadoop2.6/lib/datanucleus-api-jdo-3.2.6.jar:/data/spark-1.6.0-bin-hadoop2.6/lib/datanucleus-rdbms-3.2.9.jar:/data/spark-1.6.0-bin-hadoop2.6/lib/datanucleus-core-3.2.10.jar:/data/hadoop-2.6.5/etc/hadoop/" "-Xms1024M" "-Xmx1024M" "-Dspark.driver.port=47218" "-XX:+PrintGCDetails" "-Dkey=value" "-Dnumbers=one two three" "org.apache.spark.executor.CoarseGrainedExecutorBackend"
      * "--driver-url" "spark://[email protected]:47218" "--executor-id" "0" "--hostname" "192.168.1.153" "--cores" "4" "--app-id" "app-20180503193934-0000" "--worker-url" "spark://[email protected]:44713"
      */
    val builder = new ProcessBuilder(commandSeq: _*)
    // environment() returns the environment variables of the process to be started as a Map that can be modified
    val environment = builder.environment()
    for ((key, value) <- localCommand.environment) {
      environment.put(key, value)
    }
    builder
}
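
To make the placeholder substitution and command assembly more concrete, here is a small sketch. It is not Spark's CommandUtils: `CommandSketch`, `substitute` and `format` are hypothetical names, the classpath handling is left out, and the main class `example.Main` is a placeholder. It only shows the idea of replacing {{...}} arguments, passing the sequence to the JDK's ProcessBuilder, and copying extra environment variables into the builder.

```scala
object CommandSketch {
  /** Hypothetical helper: returns a function that swaps known {{NAME}} placeholders for values. */
  def substitute(values: Map[String, String]): String => String =
    arg => values.getOrElse(arg, arg)

  /** Assembles "java -Xms.. -Xmx.. <main class> <args>" and copies env into the builder. */
  def buildProcessBuilder(
      javaBin: String,
      mainClass: String,
      arguments: Seq[String],
      memoryMb: Int,
      env: Map[String, String],
      substituteArguments: String => String): ProcessBuilder = {
    val commandSeq =
      Seq(javaBin, s"-Xms${memoryMb}M", s"-Xmx${memoryMb}M", mainClass) ++
        arguments.map(substituteArguments)
    val builder = new ProcessBuilder(commandSeq: _*)
    // environment() is a mutable map of the variables the child process will see
    val environment = builder.environment()
    for ((key, value) <- env) environment.put(key, value)
    builder
  }

  /** Quotes every element the same way the "Launch command" log line does. */
  def format(command: Seq[String]): String =
    command.mkString("\"", "\" \"", "\"")

  def main(args: Array[String]): Unit = {
    val sub = substitute(Map("{{EXECUTOR_ID}}" -> "0", "{{CORES}}" -> "4"))
    val arguments = Seq("--executor-id", "{{EXECUTOR_ID}}", "--cores", "{{CORES}}")
    val builder = buildProcessBuilder(
      "java", "example.Main", arguments, 1024, Map("SPARK_USER" -> "root"), sub)
    println(builder.command())
    println(format(Seq("java", "-Dnumbers=one two three")))
  }
}
```

Nothing is started here; the builder only records the command and the environment until start() is called.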

===> With the ProcessBuilder object in hand, return to fetchAndRunExecutor() and continue

private def fetchAndRunExecutor() {
try {

...
    // builder.command() returns this process builder's operating system program and arguments.
    val command = builder.command()
    val formattedCommand = command.asScala.mkString("\"", "\" \"", "\"")
    /**
      * 94:18/05/03 19:44:17 INFO worker.ExecutorRunner: Launch command: "/usr/local/java/jdk1.8.0_91/bin/java" "-cp" "/data/spark-1.6.0-bin-hadoop2.6/conf/:/data/spark-1.6.0-bin-hadoop2.6/lib/spark-assembly-1.6.0-hadoop2.6.0.jar:/data/spark-1.6.0-bin-hadoop2.6/lib/datanucleus-api-jdo-3.2.6.jar:/data/spark-1.6.0-bin-hadoop2.6/lib/datanucleus-rdbms-3.2.9.jar:/data/spark-1.6.0-bin-hadoop2.6/lib/datanucleus-core-3.2.10.jar:/data/hadoop-2.6.5/etc/hadoop/"
      * "-Xms1024M" "-Xmx1024M" "-Dspark.driver.port=47218" "-XX:+PrintGCDetails" "-Dkey=value"
      * "-Dnumbers=one two three" "org.apache.spark.executor.CoarseGrainedExecutorBackend"
      * "--driver-url" "spark://[email protected]:47218" "--executor-id" "0"
      * "--hostname" "192.168.1.153" "--cores" "4" "--app-id" "app-20180503193934-0000"
      * "--worker-url" "spark://[email protected]:44713"
      */
    logInfo(s"Launch command: $formattedCommand")
    // Set the working directory of the new process: $spark_home$/work/0
    builder.directory(executorDir)
    // appLocalDirs: Seq("/tmp/spark-e72251ed-96b6-4fe6-b704-1772b5fc5a8b/executor-7ab80469-4222-40c9-87cf-a6f2f00e30c6")
    // builder.environment returns a Map whose entries can be modified
    builder.environment.put("SPARK_EXECUTOR_DIRS", appLocalDirs.mkString(File.pathSeparator))
    // In case we are running this from within the Spark Shell, avoid creating a "scala"
    // parent process for the executor command
    builder.environment.put("SPARK_LAUNCH_WITH_SCALA", "0")

    // Add webUI log urls
    // The worker's web UI port is 8081 (the master's is 8080); these URLs expose the executor's stderr and stdout logs on the worker's web UI
    val baseUrl =
      s"http://$publicAddress:$webUiPort/logPage/?appId=$appId&executorId=$execId&logType="
    builder.environment.put("SPARK_LOG_URL_STDERR", s"${baseUrl}stderr")
    builder.environment.put("SPARK_LOG_URL_STDOUT", s"${baseUrl}stdout")
    // After all the preparation, this is where CoarseGrainedExecutorBackend is actually launched
    process = builder.start()
    val header = "Spark Executor Command: %s\n%s\n\n".format(
      formattedCommand, "=" * 40)

    // Redirect its stdout and stderr to files
    // stdout goes to the file /$spark_home$/work/0/stdout
    val stdout = new File(executorDir, "stdout")
    stdoutAppender = FileAppender(process.getInputStream, stdout, conf)
    // stderr goes to the file /$spark_home$/work/0/stderr
    val stderr = new File(executorDir, "stderr")
    Files.write(header, stderr, UTF_8)
    stderrAppender = FileAppender(process.getErrorStream, stderr, conf)

    // Wait for it to exit; executor may exit with code 0 (when driver instructs it to shutdown) or with nonzero exit code
    // process.waitFor() blocks the current thread until the process finishes, unless an exception is thrown
    val exitCode = process.waitFor()
    state = ExecutorState.EXITED
    val message = "Command exited with code " + exitCode
    worker.send(ExecutorStateChanged(appId, execId, state, Some(message), Some(exitCode)))
} catch {
    case interrupted: InterruptedException => {
      logInfo("Runner thread for executor " + fullId + " interrupted")
      state = ExecutorState.KILLED
      killProcess(None)
    }
    case e: Exception => {
      logError("Error running executor", e)
      state = ExecutorState.FAILED
      killProcess(Some(e.toString))
    }
}
}
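
The launch-and-wait part of fetchAndRunExecutor() can be reproduced with nothing but the JDK. The sketch below is an approximation under stated assumptions: `LaunchSketch`, `runAndWait` and `currentJava` are names invented here, `java -version` stands in for the executor command, and ProcessBuilder's own `redirectOutput`/`redirectError` replace Spark's FileAppender (which copies the streams on separate threads).

```scala
import java.io.File
import java.nio.file.Files

object LaunchSketch {
  /** Starts `command` in `workDir`, writes its stdout/stderr to files there, blocks until exit. */
  def runAndWait(command: Seq[String], workDir: File): Int = {
    val builder = new ProcessBuilder(command: _*)
    builder.directory(workDir)
    // Stand-in for FileAppender: let the JDK redirect the child's streams to files
    builder.redirectOutput(new File(workDir, "stdout"))
    builder.redirectError(new File(workDir, "stderr"))
    val process = builder.start()
    // waitFor() blocks the calling thread until the child process has exited
    process.waitFor()
  }

  /** Path of the java binary belonging to the JVM that runs this sketch. */
  def currentJava: String =
    new File(new File(System.getProperty("java.home"), "bin"), "java").getPath

  def main(args: Array[String]): Unit = {
    val workDir = Files.createTempDirectory("executor-0").toFile
    val exitCode = runAndWait(Seq(currentJava, "-version"), workDir)
    println("Command exited with code " + exitCode)
  }
}
```

Because waitFor() blocks, ExecutorRunner runs this logic on its own thread (workerThread) rather than on the Worker's message loop.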

21, The CoarseGrainedExecutorBackend process starts in main, which parses the arguments passed in by the Worker and then calls run

private[spark] object CoarseGrainedExecutorBackend extends Logging {

…..
def main(args: Array[String]) {
    var driverUrl: String = null // the address of CoarseGrainedSchedulerBackend's DriverEndpoint
    var executorId: String = null // the id of the ExecutorDesc, an auto-incrementing number, one per CoarseGrainedExecutorBackend
    var hostname: String = null // the worker's IP
    var cores: Int = 0 // determined by "spark.executor.cores" in SparkConf (I set it to 1, so it is 1); if it is not set, only one CoarseGrainedExecutorBackend is launched and it gets all of the worker's available cores
    var appId: String = null // app-20180503193934-0000
    var workerUrl: Option[String] = None // spark://[email protected]:44713
    val userClassPath = new mutable.ListBuffer[URL]()

    var argv= args.toList

    // Parse the arguments into the local variables declared above
    while (!argv.isEmpty) {
      argv match {
        case ("--driver-url") :: value :: tail =>
          driverUrl = value
          argv = tail
        case ("--executor-id") :: value :: tail =>
          executorId = value
          argv = tail
       ...

          printUsageAndExit()
      }
    }
    // If any required value is missing, print the usage and exit main
    if (driverUrl == null || executorId == null || hostname == null || cores <= 0 ||
      appId == null) {
      printUsageAndExit()
    }

    run(driverUrl, executorId, hostname, cores, appId, workerUrl, userClassPath)
}
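
The list-pattern style of argument parsing used in main can be written as a small standalone function. This is a sketch, not the Spark code: `ArgParseSketch` and `BackendArgs` are made-up names, the flags are the ones visible in the launch command shown earlier, and instead of calling printUsageAndExit() it returns None for bad input.

```scala
object ArgParseSketch {
  final case class BackendArgs(
      driverUrl: Option[String] = None,
      executorId: Option[String] = None,
      hostname: Option[String] = None,
      cores: Int = 0,
      appId: Option[String] = None,
      workerUrl: Option[String] = None)

  /** Consumes "--flag value" pairs from the head of the list until it is empty. */
  @scala.annotation.tailrec
  def parse(argv: List[String], acc: BackendArgs = BackendArgs()): Option[BackendArgs] =
    argv match {
      case "--driver-url" :: value :: tail => parse(tail, acc.copy(driverUrl = Some(value)))
      case "--executor-id" :: value :: tail => parse(tail, acc.copy(executorId = Some(value)))
      case "--hostname" :: value :: tail => parse(tail, acc.copy(hostname = Some(value)))
      case "--cores" :: value :: tail => parse(tail, acc.copy(cores = value.toInt))
      case "--app-id" :: value :: tail => parse(tail, acc.copy(appId = Some(value)))
      case "--worker-url" :: value :: tail => parse(tail, acc.copy(workerUrl = Some(value)))
      case Nil =>
        // Same required-field check as main: everything except the worker URL must be present
        val complete = acc.driverUrl.isDefined && acc.executorId.isDefined &&
          acc.hostname.isDefined && acc.appId.isDefined && acc.cores > 0
        if (complete) Some(acc) else None
      case _ => None // unknown flag, or a flag without a value
    }

  def main(args: Array[String]): Unit =
    println(parse(args.toList))
}
```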

22, See what the run method initializes

private[spark] object CoarseGrainedExecutorBackend extends Logging {
/**
    * "--driver-url" "spark://[email protected]:56522": the address of CoarseGrainedSchedulerBackend's DriverEndpoint
    * "--executor-id" "4": the id of the ExecutorDesc, an auto-incrementing number, one per CoarseGrainedExecutorBackend
    * "--hostname" "192.168.1.153"
    * "--cores" "1": determined by "spark.executor.cores" in SparkConf (I set it to 1, so it is 1);
    *                 if it is not set, only one CoarseGrainedExecutorBackend is launched and it gets all of the worker's available cores
    * "--app-id" "app-20180508234845-0000"
    * "--worker-url" "spark://[email protected]:53403"
    */
private def run(
      driverUrl: String,
      executorId: String,
      hostname: String,
      cores: Int,
      appId: String,
      workerUrl: Option[String],
      userClassPath: Seq[URL]) {
    // Registers logging of Linux signals; for example, Ctrl+C delivers the INT signal, which also terminates the CoarseGrainedExecutorBackend process
    SignalLogger.register(log)

    SparkHadoopUtil.get.runAsSparkUser { () =>
      // Debug code
      Utils.checkHost(hostname)

      // Bootstrap to fetch the driver's Spark properties.
      val executorConf = new SparkConf
      val port = executorConf.getInt("spark.executor.port", 0)
      // Creating an RpcEnv is comparable to creating an ActorSystem; this one is named driverPropsFetcher
      val fetcher = RpcEnv.create(
        "driverPropsFetcher",
        hostname,
        port,
        executorConf,
        new SecurityManager(executorConf),
        clientMode = true)
      // Obtain the DriverEndpointRef of CoarseGrainedSchedulerBackend
      val driver = fetcher.setupEndpointRefByURI(driverUrl)
      // The driver replies with a Seq[(String, String)] holding every SparkConf property whose key starts with "spark"; ("spark.app.id", "app-20180508234845-0000") is appended to that Seq as well
      val props = driver.askWithRetry[Seq[(String, String)]](RetrieveSparkProps) ++
        Seq[(String, String)](("spark.app.id", appId))
      // Then shut down the fetcher RpcEnv
      fetcher.shutdown()

      // Create SparkEnv using properties we fetched from the driver.
      // Create a fresh default SparkConf() and copy into it the Seq[(String, String)] fetched through the DriverEndpointRef
      val driverConf = new SparkConf()
      for ((key, value) <- props) {
        // this is required for SSL in standalone mode
        if (SparkConf.isExecutorStartupConf(key)){
          driverConf.setIfMissing(key, value)
        } else {
          driverConf.set(key, value)
        }
      }
      if (driverConf.contains("spark.yarn.credentials.file")) {
        logInfo("Will periodically update credentials from: " +
          driverConf.get("spark.yarn.credentials.file"))
        SparkHadoopUtil.get.startExecutorDelegationTokenRenewer(driverConf)
      }
      // Create the SparkEnv for this CoarseGrainedExecutorBackend. Because its RpcEnv is created in client mode, rpcEnv.address has no value
      val env = SparkEnv.createExecutorEnv(
        driverConf, executorId, hostname, port, cores, isLocal = false)

      // SparkEnv will set spark.executor.port if the rpc env is listening for incoming
      // connections (e.g., if it's using akka). Otherwise, the executor is running in
      // client mode only, and does not accept incoming connections.
      val sparkHostPort = env.conf.getOption("spark.executor.port").map { port =>
          hostname + ":" + port
        }.orNull

      /**
        * Arguments handed to the CoarseGrainedExecutorBackend instance:
        * env.rpcEnv: the RpcEnv named sparkExecutor
        * driverUrl "spark://[email protected]:56522": the address of CoarseGrainedSchedulerBackend's DriverEndpoint
        * executorId "4": the id of the ExecutorDesc, an auto-incrementing number, one per CoarseGrainedExecutorBackend
        * sparkHostPort: null, because the RpcEnv was created in client mode and rpcEnv.address has no value
        * cores "1": determined by "spark.executor.cores" in SparkConf (I set it to 1, so it is 1); if it is not set, only one CoarseGrainedExecutorBackend is launched and it gets all of the worker's available cores
        * userClassPath: an empty collection
        * env: the SparkEnv given to the CoarseGrainedExecutorBackend instance
        */
      env.rpcEnv.setupEndpoint("Executor", new CoarseGrainedExecutorBackend(
        env.rpcEnv, driverUrl, executorId, sparkHostPort, cores, userClassPath, env))
      // A WorkerWatcher is also created; url: "spark://[email protected]:53403"
      workerUrl.foreach { url =>
        env.rpcEnv.setupEndpoint("WorkerWatcher", new WorkerWatcher(env.rpcEnv, url))
      }
      env.rpcEnv.awaitTermination()
      SparkHadoopUtil.get.stopExecutorDelegationTokenRenewer()
    }
}
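
The loop that copies the driver's properties into driverConf treats two kinds of keys differently: "executor startup" keys keep a locally set value (setIfMissing), everything else is overwritten by the driver's value (set). A plain-Map sketch of that rule follows. `ConfMergeSketch` and `isStartupKey` are hypothetical, and the prefixes inside `isStartupKey` are an assumption chosen for illustration, not a statement of what SparkConf.isExecutorStartupConf checks.

```scala
import scala.collection.mutable

object ConfMergeSketch {
  // Assumption for illustration only: keys with these prefixes count as startup keys
  def isStartupKey(key: String): Boolean =
    key.startsWith("spark.ssl") || key.startsWith("spark.auth")

  /** Merges the driver's properties into a copy of the local configuration. */
  def merge(local: Map[String, String], fromDriver: Seq[(String, String)]): Map[String, String] = {
    val conf = mutable.Map.empty[String, String]
    conf ++= local
    for ((key, value) <- fromDriver) {
      if (isStartupKey(key)) {
        // setIfMissing: a value the executor already has wins
        if (!conf.contains(key)) conf(key) = value
      } else {
        // set: the driver's value wins
        conf(key) = value
      }
    }
    conf.toMap
  }

  def main(args: Array[String]): Unit = {
    val merged = merge(
      Map("spark.ssl.enabled" -> "true"),
      Seq("spark.ssl.enabled" -> "false", "spark.app.id" -> "app-20180508234845-0000"))
    println(merged)
  }
}
```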

23, A CoarseGrainedExecutorBackend is instantiated; it is also an RpcEndpoint

/** This instance is created from CoarseGrainedExecutorBackend's main (through run) with:
* rpcEnv: the RpcEnv named sparkExecutor
* driverUrl "spark://[email protected]:56522": the address of CoarseGrainedSchedulerBackend's DriverEndpoint
* executorId "4": the id of the ExecutorDesc, an auto-incrementing number, one per CoarseGrainedExecutorBackend
* hostPort: null, because the RpcEnv was created in client mode and rpcEnv.address has no value
* cores "1": determined by "spark.executor.cores" in SparkConf (I set it to 1, so it is 1); if it is not set, only one CoarseGrainedExecutorBackend is launched and it gets all of the worker's available cores
* userClassPath: an empty collection
* env: the SparkEnv given to the CoarseGrainedExecutorBackend instance
*/
private[spark] class CoarseGrainedExecutorBackend(
    override val rpcEnv: RpcEnv,
    driverUrl: String,
    executorId: String,
    hostPort: String,
    cores: Int,
    userClassPath: Seq[URL],
    env: SparkEnv)
extends ThreadSafeRpcEndpoint with ExecutorBackend with Logging {

var executor: Executor = null
// a reference to the DriverEndpoint
@volatile var driver: Option[RpcEndpointRef] = None

// If this CoarseGrainedExecutorBackend is changed to support multiple threads, then this may need
// to be changed so that we don't share the serializer instance across threads
private[this] val ser: SerializerInstance = env.closureSerializer.newInstance()

override def onStart() {
    // driverUrl: spark://[email protected]:49972; the RpcEndpointRef obtained from it is a reference to the DriverEndpoint
    logInfo("Connecting to driver: " + driverUrl)
    rpcEnv.asyncSetupEndpointRefByURI(driverUrl).flatMap { ref =>
      // This is a very fast action so we can use "ThreadUtils.sameThread"
      driver = Some(ref)
      // This notifies the DriverEndpoint, which replies with RegisteredExecutor so that CoarseGrainedExecutorBackend creates the Executor
      /* executorId "4": the id of the ExecutorDesc, an auto-incrementing number, one per CoarseGrainedExecutorBackend
      self: CoarseGrainedExecutorBackend
      hostPort: null, because the RpcEnv was created in client mode and rpcEnv.address has no value
      cores: determined by "spark.executor.cores" in SparkConf (I set it to 1, so it is 1); if it is not set, only one CoarseGrainedExecutorBackend is launched and it gets all of the worker's available cores
      extractLogUrls: filters the environment variables for entries whose key contains "SPARK_LOG_URL_", then maps each entry so that the key is what remains after removing "SPARK_LOG_URL_"; the value is left unchanged
       */
      ref.ask[RegisterExecutorResponse](
        RegisterExecutor(executorId, self, hostPort, cores, extractLogUrls))
    }(ThreadUtils.sameThread).onComplete {
      // This is a very fast action so we can use "ThreadUtils.sameThread"
      case Success(msg) => Utils.tryLogNonFatalError {
        Option(self).foreach(_.send(msg)) // msg must be RegisterExecutorResponse
      }
      case Failure(e) => {
        logError(s"Cannot register with driver: $driverUrl", e)
        System.exit(1)
      }
    }(ThreadUtils.sameThread)
}
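
The extractLogUrls step described in the comments above is easy to sketch on an explicit Map instead of the real process environment. `LogUrlSketch` is a made-up name. Two details are assumptions drawn from the sample output in this post rather than from the comment text: the sketch matches keys that start with the prefix, and it lower-cases the remainder, since the keys seen by the driver are `stdout` and `stderr` while the variables are named SPARK_LOG_URL_STDOUT and SPARK_LOG_URL_STDERR.

```scala
object LogUrlSketch {
  private val Prefix = "SPARK_LOG_URL_"

  /** Keeps only the log-URL variables and strips the prefix from their names. */
  def extractLogUrls(env: Map[String, String]): Map[String, String] =
    env.collect {
      case (key, url) if key.startsWith(Prefix) =>
        // SPARK_LOG_URL_STDOUT -> stdout; the URL itself is passed through untouched
        key.substring(Prefix.length).toLowerCase -> url
    }

  def main(args: Array[String]): Unit =
    // sys.env is the environment of the current JVM; it is usually empty of such keys outside Spark
    println(extractLogUrls(sys.env))
}
```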

24, CoarseGrainedExecutorBackend talks to the DriverEndpoint of CoarseGrainedSchedulerBackend: it sends RegisterExecutor, and the reply makes it create the Executor

class DriverEndpoint(override val rpcEnv: RpcEnv, sparkProperties: Seq[(String, String)])
extends ThreadSafeRpcEndpoint with Logging {
.....
override def receiveAndReply(context: RpcCallContext): PartialFunction[Any, Unit] = {
    // This case is hit while CoarseGrainedExecutorBackend is initializing
    /* executorId "4": the id of the ExecutorDesc, an auto-incrementing number, one per CoarseGrainedExecutorBackend
       executorRef: CoarseGrainedExecutorBackend
       hostPort: null, because the RpcEnv was created in client mode and rpcEnv.address has no value
       cores: determined by "spark.executor.cores" in SparkConf (I set it to 1, so it is 1); if it is not set, only one CoarseGrainedExecutorBackend is launched and it gets all of the worker's available cores
       logUrls: filtered from the environment variables whose key contains "SPARK_LOG_URL_", with that prefix removed from the key and the value unchanged, for example:

(stdout,http://192.168.1.154:8081/logPage/?appId=app-20180516150725-0000&executorId=1&logType=stdout)
(stderr,http://192.168.1.154:8081/logPage/?appId=app-20180516150725-0000&executorId=1&logType=stderr)
     */
    case RegisterExecutor(executorId, executorRef, hostPort, cores, logUrls) =>

      // executorDataMap: HashMap[String, ExecutorData]; it is empty at the beginning
      if (executorDataMap.contains(executorId)) {
        context.reply(RegisterExecutorFailed("Duplicate executor ID: " + executorId))
      } else {
        // If the executor's rpc env is not listening for incoming connections, `hostPort`
        // will be null, and the client connection should be used to contact the executor.
        val executorAddress = if (executorRef.address != null) {
            executorRef.address
          } else {
            // In standalone mode the executor's RpcEnv is in client mode, so this branch is taken and the RpcAddress of the CoarseGrainedExecutorBackend is read from the connection, e.g. "luyl155:53561"
            context.senderAddress
          }
        // 17/11/12 20:31:22 INFO cluster.SparkDeploySchedulerBackend: Registered executor NettyRpcEndpointRef(null) (luyl155:53561) with ID 2
        logInfo(s"Registered executor $executorRef ($executorAddress) with ID $executorId")
        // addressToExecutorId: HashMap[RpcAddress, String]; the key is the RpcAddress of the CoarseGrainedExecutorBackend, the value is its executor id
        addressToExecutorId(executorAddress) = executorId
        // totalCoreCount: AtomicInteger(0), the sum of the cores of all registered CoarseGrainedExecutorBackends
        totalCoreCount.addAndGet(cores)
        // totalRegisteredExecutors: AtomicInteger(0), counts how many CoarseGrainedExecutorBackends have registered
        totalRegisteredExecutors.addAndGet(1)
        /* executorRef: CoarseGrainedExecutorBackend
           executorRef.address: null, because the RpcEnv was created in client mode and rpcEnv.address has no value
           executorAddress.host: the IP of the worker hosting the CoarseGrainedExecutorBackend
           cores: the number of cores the CoarseGrainedExecutorBackend owns
           logUrls: filtered from the environment variables whose key contains "SPARK_LOG_URL_", with that prefix removed from the key and the value unchanged
         */
        val data = new ExecutorData(executorRef, executorRef.address, executorAddress.host,
          cores, cores, logUrls)
        // This must be synchronized because variables mutated in this block are read when requesting executors
        CoarseGrainedSchedulerBackend.this.synchronized {
          // executorDataMap: HashMap[String, ExecutorData]; stores the executor id together with its ExecutorData (ref, IP address, core count)
          executorDataMap.put(executorId, data)
          // numPendingExecutors starts at 0
          if (numPendingExecutors > 0) {
            numPendingExecutors -= 1
            logDebug(s"Decremented number of pending executors ($numPendingExecutors left)")
          }
        }
        // Note: some tests expect the reply to come after we put the executor in the map
        // The reply tells CoarseGrainedExecutorBackend to create its Executor (including the task thread pool); the driver then calls makeOffers
        // executorAddress.host: the IP where the CoarseGrainedExecutorBackend runs
        context.reply(RegisteredExecutor(executorAddress.host))
        listenerBus.post(
          SparkListenerExecutorAdded(System.currentTimeMillis(), executorId, data))

        // makeOffers is analyzed later; it is what hands tasks out for execution
        makeOffers()
}
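
Stripped of RPC details, the RegisterExecutor case is bookkeeping: reject a duplicate id, otherwise update the counters and the map, then reply. The sketch below mirrors only that shape. `RegistrySketch`, `Registry`, `ExecutorInfo` and the two response classes are hypothetical stand-ins for DriverEndpoint, ExecutorData, RegisteredExecutor and RegisterExecutorFailed, and the listener bus and makeOffers() are left out.

```scala
import java.util.concurrent.atomic.AtomicInteger
import scala.collection.mutable

object RegistrySketch {
  sealed trait Response
  final case class Registered(host: String) extends Response
  final case class Failed(reason: String) extends Response

  final case class ExecutorInfo(host: String, totalCores: Int)

  class Registry {
    private val executors = mutable.HashMap[String, ExecutorInfo]()
    val totalCoreCount = new AtomicInteger(0)
    val totalRegisteredExecutors = new AtomicInteger(0)

    /** Registers an executor once; a second registration with the same id is refused. */
    def register(executorId: String, host: String, cores: Int): Response = synchronized {
      if (executors.contains(executorId)) {
        Failed("Duplicate executor ID: " + executorId)
      } else {
        totalCoreCount.addAndGet(cores)
        totalRegisteredExecutors.addAndGet(1)
        executors.put(executorId, ExecutorInfo(host, cores))
        // Reply only after the executor is in the map, as the source comment requires
        Registered(host)
      }
    }
  }

  def main(args: Array[String]): Unit = {
    val registry = new Registry
    println(registry.register("2", "luyl155", 1))
    println(registry.register("2", "luyl155", 1))
  }
}
```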

25, The DriverEndpoint replies to CoarseGrainedExecutorBackend with a RegisteredExecutor message, upon which CoarseGrainedExecutorBackend instantiates the Executor

private[spark] class CoarseGrainedExecutorBackend(
    override val rpcEnv: RpcEnv,
    driverUrl: String,
    executorId: String,
    hostPort: String,
    cores: Int,
    userClassPath: Seq[URL],
    env: SparkEnv)
extends ThreadSafeRpcEndpoint with ExecutorBackend with Logging {

……

override def receive: PartialFunction[Any, Unit] = {
    // hostname is executorAddress.host: the IP where the CoarseGrainedExecutorBackend runs
    case RegisteredExecutor(hostname) =>
      logInfo("Successfully registered with driver")
      /** executorId: the id of the ExecutorDesc, an auto-incrementing number, one per CoarseGrainedExecutorBackend
        * hostname: the IP where the CoarseGrainedExecutorBackend runs
        * env: the SparkEnv given to the CoarseGrainedExecutorBackend instance
        * userClassPath: an empty collection
        */
      executor = new Executor(executorId, hostname, env, userClassPath, isLocal = false)

26, Look at Executor initialization, which calls BlockManager's initialize method

private[spark] class Executor(
    executorId: String,
    executorHostname: String,
    env: SparkEnv,
    userClassPath: Seq[URL] = Nil,
    isLocal: Boolean = false)
extends Logging {

logInfo(s"Starting executor ID $executorId on host $executorHostname")

// Application dependencies (added through SparkContext) that we've fetched so far on this node. Each map holds the master's timestamp for the version of that file or JAR we got.
// These are presumably the dependencies handed to the Executor through --jars and --files
private val currentFiles: HashMap[String, Long] = new HashMap[String, Long]()
private val currentJars: HashMap[String, Long] = new HashMap[String, Long]()
// An empty ByteBuffer: limit and capacity equal the array length (0), position is 0, and no mark is set
private val EMPTY_BYTE_BUFFER = ByteBuffer.wrap(new Array[Byte](0))
// Populated when CoarseGrainedExecutorBackend's main ran; only properties whose key starts with "spark" were copied in
private val conf = env.conf

// No ip or host:port - just hostname
Utils.checkHost(executorHostname, "Expected executed slave to be a hostname")
// must not have port specified.
assert (0 == Utils.parseHostPort(executorHostname)._2)

// Make sure the local hostname we report matches the cluster scheduler's name for this host
Utils.setCustomHostname(executorHostname)
// isLocal defaults to false
if (!isLocal) {
    // Setup an uncaught exception handler for non-local mode.
    // Make any thread terminations due to uncaught exceptions kill the entire
    // executor process to avoid surprising stalls.
    Thread.setDefaultUncaughtExceptionHandler(SparkUncaughtExceptionHandler)
}

// Start worker thread pool (a cached thread pool).
private val threadPool = ThreadUtils.newDaemonCachedThreadPool("Executor task launch worker")
// ExecutorSource feeds the metrics system.
private val executorSource = new ExecutorSource(threadPool, executorId)

if (!isLocal) {
    env.metricsSystem.registerSource(executorSource)
    // initialize is called once when the Executor initializes; the driver also calls it during SparkContext initialization.
    // The app id returned by conf.getAppId was put into the conf when CoarseGrainedExecutorBackend's main ran; its value is app-20180508234845-0000.
    // (See spark-core_28: source analysis of env.blockManager.initialize(conf.getAppId) and NettyBlockTransferService.init() during Executor initialization.)
    env.blockManager.initialize(conf.getAppId)
}

===> Next, the Executor starts sending heartbeats and metrics to the driver

// must be initialized before running startDriverHeartbeat()
// Refers to the HeartbeatReceiver endpoint created during SparkContext initialization. ENDPOINT_NAME: HeartbeatReceiver
private val heartbeatReceiverRef =
RpcUtils.makeDriverRef(HeartbeatReceiver.ENDPOINT_NAME, conf, env.rpcEnv)
startDriverHeartbeater()

Now take a closer look at the heartbeat the Executor sends every 10s

/**
* Schedules a task to report heartbeat and partial metrics for active tasks to driver.
*/
private def startDriverHeartbeater(): Unit = {
// The default interval is 10 seconds; it is converted to milliseconds
val intervalMs = conf.getTimeAsMs("spark.executor.heartbeatInterval", "10s")

// Wait a random interval so the heartbeats don't end up in sync
// With the default interval the initial delay is at least 10s and less than 20s
val initialDelay = intervalMs + (math.random * intervalMs).asInstanceOf[Int]

val heartbeatTask = new Runnable() {
override def run(): Unit = Utils.logUncaughtExceptions(reportHeartBeat())
}
// Wait for the initial delay (under 20s), then run every 10s
heartbeater.scheduleAtFixedRate(heartbeatTask, initialDelay, intervalMs, TimeUnit.MILLISECONDS)
}
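
The jittered start of the heartbeat can be checked with a few lines of plain JDK scheduling. This is a sketch under assumptions: `HeartbeatSketch`, `initialDelayMs` and `start` are names invented here, the random source is passed in so the arithmetic is testable, and a single-thread ScheduledExecutorService stands in for the Executor's `heartbeater`.

```scala
import java.util.concurrent.{Executors, ScheduledExecutorService, TimeUnit}

object HeartbeatSketch {
  /** One full interval plus a random fraction of another: in [intervalMs, 2 * intervalMs). */
  def initialDelayMs(intervalMs: Long, random: Double = scala.util.Random.nextDouble()): Long =
    intervalMs + (random * intervalMs).toLong

  /** Schedules `beat` after the jittered delay and then once per interval. */
  def start(intervalMs: Long)(beat: () => Unit): ScheduledExecutorService = {
    val scheduler = Executors.newSingleThreadScheduledExecutor()
    val task = new Runnable {
      override def run(): Unit = beat()
    }
    scheduler.scheduleAtFixedRate(task, initialDelayMs(intervalMs), intervalMs, TimeUnit.MILLISECONDS)
    scheduler
  }

  def main(args: Array[String]): Unit = {
    // A 100 ms interval keeps the demo short; the Spark default discussed above is 10s
    val scheduler = start(100L)(() => println("heartbeat at " + System.currentTimeMillis()))
    Thread.sleep(600L)
    scheduler.shutdownNow()
  }
}
```

The random offset spreads executors that start at the same moment across one interval, so their heartbeats do not all reach the driver at once.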
