spark-core_25: Source analysis of the Master telling the Worker to launch the CoarseGrainedExecutorBackend process and of its initialization

Continuing from the previous article (spark-core_24: AppClient's ClientEndpoint registering RegisterApplication).

As mentioned there, the Master calls launchExecutor(), which runs worker.endpoint.send(LaunchExecutor(masterUrl, exec.application.id, exec.id, exec.application.desc, exec.cores, exec.memory)) to tell the Worker to launch a CoarseGrainedExecutorBackend.

19. The Worker uses the JDK's ProcessBuilder.start() to launch the CoarseGrainedExecutorBackend process

override def receive: PartialFunction[Any, Unit] = synchronized {

  .....

  /**
    * appDesc carries the Command, which contains the launch class CoarseGrainedExecutorBackend.
    * It was put there during SparkContext initialization: after TaskSchedulerImpl.start(), the AppClient
    * RpcEndpoint of SparkDeploySchedulerBackend called registerApplication, and the Master then went through
    * startExecutorsOnWorkers ==> allocateWorkerResourceToExecutors ==> launchExecutor(worker, exec) ==>
    * worker.endpoint.send(LaunchExecutor(masterUrl, ...)).
    *
    * masterUrl: spark://luyl152:7077
    * appId: app-20180404172558-0000
    * execId: an auto-incremented number, starting from 0 by default
    * cores_: taken from SparkConf's "spark.executor.cores" (or the corresponding executor-cores option); if not
    *         set, only one CoarseGrainedExecutorBackend is launched and it gets all of the worker's available cores
    * memory_: corresponds to sc.executorMemory, 1024MB by default
    */
  case LaunchExecutor(masterUrl, appId, execId, appDesc, cores_, memory_) =>
    /** On LaunchExecutor the Worker creates an ExecutorRunner and calls its start() method, which in turn
      * calls fetchAndRunExecutor(). That method contains the following code:
      *   val builder = CommandUtils.buildProcessBuilder(appDesc.command, new SecurityManager(conf),
      *     memory, sparkHome.getAbsolutePath, substituteVariables)
      *   process = builder.start()
      */
    if (masterUrl != activeMasterUrl) {
      logWarning("Invalid Master (" + masterUrl + ") attempted to launch executor.")
    } else {
      try {
        logInfo("Asked to launch executor %s/%d for %s".format(appId, execId, appDesc.name))

        // Create the executor's working directory for CoarseGrainedExecutorBackend.
        // workDir: initialized from SPARK_WORKER_DIR in WorkerArguments; if that variable is not set,
        // the worker creates a "work" directory under SPARK_HOME at startup.
        // executorDir: /data/spark-1.6.0-bin-hadoop2.6/work/app-20180508234845-0000/4
        val executorDir = new File(workDir, appId + "/" + execId)
        if (!executorDir.mkdirs()) {
          throw new IOException("Failed to create directory " + executorDir)
        }

        // Create local dirs for the executor. These are passed to the executor via the
        // SPARK_EXECUTOR_DIRS environment variable (initialized in WorkerArguments), and deleted by
        // the Worker when the application finishes.
        // appDirectories: HashMap[String, Seq[String]]; empty the first time through.
        // appLocalDirs: e.g. Seq("/tmp/spark-b7c124be-813a-4c06-8f8e-1e04fd2b5056/executor-ed6c2e1e-c448-4883-8f34-5efdde76521b")
        val appLocalDirs = appDirectories.get(appId).getOrElse {
          // getOrCreateLocalRootDirs() returns e.g. Array(/tmp/spark-e72251ed-96b6-4fe6-b704-1772b5fc5a8b)
          Utils.getOrCreateLocalRootDirs(conf).map { dir =>
            // returns e.g. /tmp/spark-e72251ed-96b6-4fe6-b704-1772b5fc5a8b/executor-7ab80469-4222-40c9-87cf-a6f2f00e30c6
            val appDir = Utils.createDirectory(dir, namePrefix = "executor")
            Utils.chmod700(appDir)
            appDir.getAbsolutePath()
          }.toSeq
        }
        // appDirectories: HashMap["app-20180404172558-0000", Seq("/tmp/spark-e72251ed-96b6-4fe6-b704-1772b5fc5a8b/executor-7ab80469-4222-40c9-87cf-a6f2f00e30c6")]
        appDirectories(appId) = appLocalDirs

        // ExecutorRunner is what actually launches the CoarseGrainedExecutorBackend process on each Worker node,
        // using the JDK's ProcessBuilder.start().
        val manager = new ExecutorRunner(
          appId,              // app-20180404172558-0000
          execId,
          appDesc.copy(command = Worker.maybeUpdateSSLSettings(appDesc.command, conf)),
          cores_,
          memory_,
          self,
          workerId,           // worker-20180321165947-luyl153-RpcAddress.port
          host,               // the worker's host
          webUi.boundPort,    // the worker's WebUI port is 8081, the master's is 8080
          publicAddress,      // the hostname of the current worker
          sparkHome,
          executorDir,        // /data/spark-1.6.0-bin-hadoop2.6/work/app-20180508234845-0000/4
          workerUri,          // spark://sparkWorker@luyl153:RpcAddress.port
          conf,
          appLocalDirs, ExecutorState.RUNNING)
        // executors: HashMap[String, ExecutorRunner]
        executors(appId + "/" + execId) = manager
        manager.start()
        coresUsed += cores_
        memoryUsed += memory_
        sendToMaster(ExecutorStateChanged(appId, execId, manager.state, None, None))
      } catch {
        case e: Exception => {
          logError(s"Failed to launch executor $appId/$execId for ${appDesc.name}.", e)
          if (executors.contains(appId + "/" + execId)) {
            executors(appId + "/" + execId).kill()
            executors -= appId + "/" + execId
          }
          sendToMaster(ExecutorStateChanged(appId, execId, ExecutorState.FAILED,
            Some(e.toString), None))
        }
      }
    }
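The directory bookkeeping above (one working directory per executor, one set of scratch dirs per application, cached for reuse) can be illustrated with a small standalone sketch. It uses only the JDK and hypothetical names (workDir, appId, execId, ExecutorDirsDemo), not Spark's APIs:

import java.io.{File, IOException}
import java.nio.file.Files
import scala.collection.mutable

object ExecutorDirsDemo {
  def main(args: Array[String]): Unit = {
    val workDir = new File(sys.props("java.io.tmpdir"), "demo-work")
    val appId   = "app-20180404172558-0000"
    val execId  = 0
    val appDirectories = mutable.HashMap[String, Seq[String]]()

    // Per-executor working directory: <workDir>/<appId>/<execId>, like the Worker's executorDir
    val executorDir = new File(workDir, appId + "/" + execId)
    if (!executorDir.isDirectory && !executorDir.mkdirs()) {
      throw new IOException("Failed to create directory " + executorDir)
    }

    // Per-application scratch dirs, cached so later executors of the same app reuse them
    val appLocalDirs = appDirectories.getOrElseUpdate(appId,
      Seq(Files.createTempDirectory("executor").toFile.getAbsolutePath))

    println(s"executorDir=$executorDir, appLocalDirs=$appLocalDirs")
  }
}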

20. Now for the most interesting part: Spark uses ProcessBuilder to launch CoarseGrainedExecutorBackend

private[deploy] class ExecutorRunner(
    val appId: String,            // app-20180404172558-0000
    val execId: Int,
    val appDesc: ApplicationDescription,
    val cores: Int,
    val memory: Int,              // corresponds to sc.executorMemory, 1024MB by default
    val worker: RpcEndpointRef,
    val workerId: String,
    val host: String,
    val webUiPort: Int,           // the worker's WebUI port is 8081, the master's is 8080
    val publicAddress: String,    // the hostname of the current worker
    val sparkHome: File,
    val executorDir: File,        // $SPARK_HOME/work/<appId>/<execId>
    val workerUrl: String,        // spark://sparkWorker@luyl153:RpcAddress.port
    conf: SparkConf,
    val appLocalDirs: Seq[String], // Seq("/tmp/spark-e72251ed-96b6-4fe6-b704-1772b5fc5a8b/executor-7ab80469-4222-40c9-87cf-a6f2f00e30c6")
    @volatile var state: ExecutorState.Value)
  extends Logging {

  private val fullId = appId + "/" + execId
  private var workerThread: Thread = null
  private var process: Process = null
  private var stdoutAppender: FileAppender = null
  private var stderrAppender: FileAppender = null

  // NOTE: This is now redundant with the automated shut-down enforced by the Executor. It might
  // make sense to remove this in the future.
  private var shutdownHook: AnyRef = null

  private[worker] def start() {
    workerThread = new Thread("ExecutorRunner for " + fullId) {
      override def run() { fetchAndRunExecutor() }
    }
    workerThread.start()
    // Shutdown hook that kills actors on shutdown.
    // This is the JVM's shutdown-hook mechanism, run when the main class exits; analyzed later, and worth knowing about.
    shutdownHook = ShutdownHookManager.addShutdownHook { () =>
      // It's possible that we arrive here before calling `fetchAndRunExecutor`, then `state` will
      // be `ExecutorState.RUNNING`. In this case, we should set `state` to `FAILED`.
      if (state == ExecutorState.RUNNING) {
        state = ExecutorState.FAILED
      }
      killProcess(Some("Worker shutting down")) }
  }
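The ShutdownHookManager.addShutdownHook call above is built on the JVM's standard shutdown-hook mechanism. A minimal plain-Scala sketch of the same idea, using java.lang.Runtime directly rather than Spark's ShutdownHookManager (all names here are illustrative):

object ShutdownHookDemo {
  @volatile private var state = "RUNNING"

  def main(args: Array[String]): Unit = {
    // Register a hook that runs when the JVM exits (normal exit, Ctrl+C, SIGTERM)
    Runtime.getRuntime.addShutdownHook(new Thread("demo-shutdown-hook") {
      override def run(): Unit = {
        if (state == "RUNNING") state = "FAILED"
        println(s"shutdown hook fired, final state = $state")
      }
    })
    println("main finished; the hook fires as the JVM exits")
  }
}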

===> Look at the fetchAndRunExecutor method

/**
 * Download and run the executor described in our ApplicationDescription.
 * Uses the JDK's ProcessBuilder.start() to launch CoarseGrainedExecutorBackend.
 * https://blog.csdn.net/u013256816/article/details/54603910
 */
private def fetchAndRunExecutor() {
  try {
    // Launch the process; buildProcessBuilder simply returns a JDK ProcessBuilder
    val builder = CommandUtils.buildProcessBuilder(appDesc.command, new SecurityManager(conf),
      memory, sparkHome.getAbsolutePath, substituteVariables)
    ...
}

===> First, look at how CommandUtils.buildProcessBuilder assembles the java command that runs the main class and hands it to a ProcessBuilder

object CommandUtils extends Logging {

  /**
   * Build a ProcessBuilder based on the given parameters.
   * The `env` argument is exposed for testing.
   *
   * command: Command(org.apache.spark.executor.CoarseGrainedExecutorBackend,
   *   List(--driver-url, spark://[email protected]:49972,
   *        --executor-id, {{EXECUTOR_ID}},
   *        --hostname, {{HOSTNAME}},
   *        --cores, {{CORES}}, --app-id, {{APP_ID}}, --worker-url, {{WORKER_URL}}),
   *   Map(SPARK_USER -> root, SPARK_EXECUTOR_MEMORY -> 1024m),
   *   List(), List(), ArraySeq(-Dspark.driver.port=49972, -XX:+PrintGCDetails, -Dkey=value, -Dnumbers=one two three))
   */
  def buildProcessBuilder(
      command: Command,
      securityMgr: SecurityManager,
      memory: Int,                            // 1024MB
      sparkHome: String,
      substituteArguments: String => String,  // replaces placeholders such as {{EXECUTOR_ID}} and {{CORES}} with concrete values
      classPaths: Seq[String] = Seq[String](),
      env: Map[String, String] = sys.env): ProcessBuilder = {
    val localCommand = buildLocalCommand(
      command, securityMgr, substituteArguments, classPaths, env)
    val commandSeq = buildCommandSeq(localCommand, memory, sparkHome)
    /** The following command line is handed to the ProcessBuilder constructor; it is simply a
      * "java -cp <jars> <main class> ..." launch command:
      * "/usr/local/java/jdk1.8.0_91/bin/java" "-cp" "/data/spark-1.6.0-bin-hadoop2.6/conf/:/data/spark-1.6.0-bin-hadoop2.6/lib/spark-assembly-1.6.0-hadoop2.6.0.jar:/data/spark-1.6.0-bin-hadoop2.6/lib/datanucleus-api-jdo-3.2.6.jar:/data/spark-1.6.0-bin-hadoop2.6/lib/datanucleus-rdbms-3.2.9.jar:/data/spark-1.6.0-bin-hadoop2.6/lib/datanucleus-core-3.2.10.jar:/data/hadoop-2.6.5/etc/hadoop/"
      * "-Xms1024M" "-Xmx1024M" "-Dspark.driver.port=47218" "-XX:+PrintGCDetails" "-Dkey=value" "-Dnumbers=one two three"
      * "org.apache.spark.executor.CoarseGrainedExecutorBackend"
      * "--driver-url" "spark://[email protected]:47218" "--executor-id" "0" "--hostname" "192.168.1.153"
      * "--cores" "4" "--app-id" "app-20180503193934-0000" "--worker-url" "spark://[email protected]:44713"
      */
    val builder = new ProcessBuilder(commandSeq: _*)
    // environment() returns the environment of the process to be launched as a mutable Map
    val environment = builder.environment()
    for ((key, value) <- localCommand.environment) {
      environment.put(key, value)
    }
    builder
  }
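The same JDK ProcessBuilder pattern can be exercised outside Spark. The sketch below (hypothetical object name ProcessBuilderDemo, illustrative environment key and flags) assembles a "java ..." command line, mutates the child's environment via environment(), and launches it:

import java.io.File
import scala.io.Source

object ProcessBuilderDemo {
  def main(args: Array[String]): Unit = {
    // Build the command sequence: the java binary of the current JVM plus some arguments
    val javaBin = new File(new File(sys.props("java.home"), "bin"), "java").getAbsolutePath
    val commandSeq = Seq(javaBin, "-Xmx64m", "-version")

    val builder = new ProcessBuilder(commandSeq: _*)
    builder.environment().put("SPARK_EXECUTOR_DIRS", "/tmp/demo-executor-dir") // illustrative key/value
    builder.redirectErrorStream(true) // merge stderr into stdout for this demo

    val process = builder.start()
    val output  = Source.fromInputStream(process.getInputStream).mkString
    val exit    = process.waitFor()
    println(s"exit=$exit\n$output")
  }
}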

===> With the ProcessBuilder object in hand, return to fetchAndRunExecutor() and continue

private def fetchAndRunExecutor() {
  try {

    ...

    // command() returns the operating-system program and arguments of this process builder
    val command = builder.command()
    val formattedCommand = command.asScala.mkString("\"", "\" \"", "\"")
    /**
      * 18/05/03 19:44:17 INFO worker.ExecutorRunner: Launch command: "/usr/local/java/jdk1.8.0_91/bin/java" "-cp" "/data/spark-1.6.0-bin-hadoop2.6/conf/:/data/spark-1.6.0-bin-hadoop2.6/lib/spark-assembly-1.6.0-hadoop2.6.0.jar:/data/spark-1.6.0-bin-hadoop2.6/lib/datanucleus-api-jdo-3.2.6.jar:/data/spark-1.6.0-bin-hadoop2.6/lib/datanucleus-rdbms-3.2.9.jar:/data/spark-1.6.0-bin-hadoop2.6/lib/datanucleus-core-3.2.10.jar:/data/hadoop-2.6.5/etc/hadoop/"
      * "-Xms1024M" "-Xmx1024M" "-Dspark.driver.port=47218" "-XX:+PrintGCDetails" "-Dkey=value"
      * "-Dnumbers=one two three" "org.apache.spark.executor.CoarseGrainedExecutorBackend"
      * "--driver-url" "spark://[email protected]:47218" "--executor-id" "0"
      * "--hostname" "192.168.1.153" "--cores" "4" "--app-id" "app-20180503193934-0000"
      * "--worker-url" "spark://[email protected]:44713"
      */
    logInfo(s"Launch command: $formattedCommand")

    // Set the working directory of the child process: $SPARK_HOME/work/<appId>/<execId>
    builder.directory(executorDir)
    // appLocalDirs: Seq("/tmp/spark-e72251ed-96b6-4fe6-b704-1772b5fc5a8b/executor-7ab80469-4222-40c9-87cf-a6f2f00e30c6")
    // builder.environment returns a mutable Map through which environment variables can be changed
    builder.environment.put("SPARK_EXECUTOR_DIRS", appLocalDirs.mkString(File.pathSeparator))
    // In case we are running this from within the Spark Shell, avoid creating a "scala"
    // parent process for the executor command
    builder.environment.put("SPARK_LAUNCH_WITH_SCALA", "0")

    // Add webUI log urls
    // The worker's WebUI port is 8081 (the master's is 8080); the executor's stdout and stderr
    // streams are exposed on the worker's web page
    val baseUrl =
      s"http://$publicAddress:$webUiPort/logPage/?appId=$appId&executorId=$execId&logType="
    builder.environment.put("SPARK_LOG_URL_STDERR", s"${baseUrl}stderr")
    builder.environment.put("SPARK_LOG_URL_STDOUT", s"${baseUrl}stdout")

    // Here it is at last: this is where CoarseGrainedExecutorBackend is actually launched
    process = builder.start()
    val header = "Spark Executor Command: %s\n%s\n\n".format(
      formattedCommand, "=" * 40)

    // Redirect its stdout and stderr to files
    // stdout goes to $SPARK_HOME/work/<appId>/<execId>/stdout
    val stdout = new File(executorDir, "stdout")
    stdoutAppender = FileAppender(process.getInputStream, stdout, conf)
    // stderr goes to $SPARK_HOME/work/<appId>/<execId>/stderr
    val stderr = new File(executorDir, "stderr")
    Files.write(header, stderr, UTF_8)
    stderrAppender = FileAppender(process.getErrorStream, stderr, conf)

    // Wait for it to exit; executor may exit with code 0 (when driver instructs it to shutdown) or with nonzero exit code
    // process.waitFor() blocks the current thread until the process finishes (barring exceptions)
    val exitCode = process.waitFor()
    state = ExecutorState.EXITED
    val message = "Command exited with code " + exitCode
    worker.send(ExecutorStateChanged(appId, execId, state, Some(message), Some(exitCode)))
  } catch {
    case interrupted: InterruptedException => {
      logInfo("Runner thread for executor " + fullId + " interrupted")
      state = ExecutorState.KILLED
      killProcess(None)
    }
    case e: Exception => {
      logError("Error running executor", e)
      state = ExecutorState.FAILED
      killProcess(Some(e.toString))
    }
  }
}
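The launch pattern above (working directory, stdout/stderr capture, waitFor for the exit code) can be sketched with the JDK alone. Spark streams the output through FileAppender threads; the sketch below uses ProcessBuilder.redirectOutput/redirectError instead, which has a similar effect, and all paths and the object name LaunchAndCollectLogs are illustrative:

import java.io.File

object LaunchAndCollectLogs {
  def main(args: Array[String]): Unit = {
    val executorDir = new File(sys.props("java.io.tmpdir"), "demo-executor-0") // illustrative path
    executorDir.mkdirs()

    val javaBin = new File(new File(sys.props("java.home"), "bin"), "java").getAbsolutePath
    val builder = new ProcessBuilder(javaBin, "-version")
    builder.directory(executorDir)                            // child's working directory
    builder.redirectOutput(new File(executorDir, "stdout"))   // capture stdout to a file
    builder.redirectError(new File(executorDir, "stderr"))    // capture stderr to a file

    val process  = builder.start()
    val exitCode = process.waitFor()   // blocks the current thread until the child exits
    println(s"Command exited with code $exitCode; logs in $executorDir")
  }
}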

21. CoarseGrainedExecutorBackend's main method parses the arguments passed over from the Worker and then calls run()

private[spark] object CoarseGrainedExecutorBackend extends Logging {

  .....

  def main(args: Array[String]) {
    var driverUrl: String = null    // the ref of CoarseGrainedSchedulerBackend's DriverEndpoint
    var executorId: String = null   // the ExecutorDesc id, an auto-incremented number, one per CoarseGrainedExecutorBackend
    var hostname: String = null     // the worker's ip
    var cores: Int = 0              // taken from SparkConf's "spark.executor.cores" (1 in the author's setup); if not set, only one
                                    // CoarseGrainedExecutorBackend is launched and it gets all of the worker's available cores
    var appId: String = null        // app-20180503193934-0000
    var workerUrl: Option[String] = None // spark://[email protected]:44713
    val userClassPath = new mutable.ListBuffer[URL]()

    var argv = args.toList
    // Parse the arguments into the local variables above
    while (!argv.isEmpty) {
      argv match {
        case ("--driver-url") :: value :: tail =>
          driverUrl = value
          argv = tail
        case ("--executor-id") :: value :: tail =>
          executorId = value
          argv = tail
        ....
          printUsageAndExit()
      }
    }
    // If any required value is missing, print the usage and exit main
    if (driverUrl == null || executorId == null || hostname == null || cores <= 0 ||
      appId == null) {
      printUsageAndExit()
    }

    run(driverUrl, executorId, hostname, cores, appId, workerUrl, userClassPath)
  }
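The parser above is the standard Scala idiom of pattern matching on a List of arguments. A self-contained sketch of the same idiom with made-up option names (ArgParseDemo is not Spark code):

object ArgParseDemo {
  def main(args: Array[String]): Unit = {
    var driverUrl: String = null
    var cores: Int = 0

    var argv = args.toList
    while (argv.nonEmpty) {
      argv match {
        case "--driver-url" :: value :: tail =>
          driverUrl = value
          argv = tail
        case "--cores" :: value :: tail =>
          cores = value.toInt
          argv = tail
        case unknown :: _ =>
          sys.error(s"Unknown argument: $unknown")
        case Nil => // unreachable: the loop condition guarantees a non-empty list
      }
    }
    println(s"driverUrl=$driverUrl, cores=$cores")
  }
}

For example, ArgParseDemo.main(Array("--driver-url", "spark://CoarseGrainedScheduler@driver:47218", "--cores", "2")) prints the parsed values.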

22. What the run method initializes

private[spark] object CoarseGrainedExecutorBackend extends Logging {
  /**
    * "--driver-url" "spark://[email protected]:56522"  -- the ref of CoarseGrainedSchedulerBackend's DriverEndpoint
    * "--executor-id" "4"  -- the ExecutorDesc id, an auto-incremented number, one per CoarseGrainedExecutorBackend
    * "--hostname" "192.168.1.153"
    * "--cores" "1"  -- taken from SparkConf's "spark.executor.cores" (1 in the author's setup);
    *                   if not set, only one CoarseGrainedExecutorBackend is launched and it gets all of the worker's available cores
    * "--app-id" "app-20180508234845-0000"
    * "--worker-url" "spark://[email protected]:53403"
    */
  private def run(
      driverUrl: String,
      executorId: String,
      hostname: String,
      cores: Int,
      appId: String,
      workerUrl: Option[String],
      userClassPath: Seq[URL]) {
    // Log Linux signals; e.g. Ctrl+C sends an INT signal, which also terminates the
    // CoarseGrainedExecutorBackend process
    SignalLogger.register(log)

    SparkHadoopUtil.get.runAsSparkUser { () =>
      // Debug code
      Utils.checkHost(hostname)

      // Bootstrap to fetch the driver's Spark properties.
      val executorConf = new SparkConf
      val port = executorConf.getInt("spark.executor.port", 0)
      // Create an RpcEnv (conceptually like creating an ActorSystem), named driverPropsFetcher
      val fetcher = RpcEnv.create(
        "driverPropsFetcher",
        hostname,
        port,
        executorConf,
        new SecurityManager(executorConf),
        clientMode = true)
      // Obtain the ref of CoarseGrainedSchedulerBackend's DriverEndpoint
      val driver = fetcher.setupEndpointRefByURI(driverUrl)
      // The driver replies with a Seq[(String, String)] containing every SparkConf property whose key
      // starts with "spark"; ("spark.app.id", "app-20180508234845-0000") is then appended to that Seq
      val props = driver.askWithRetry[Seq[(String, String)]](RetrieveSparkProps) ++
        Seq[(String, String)](("spark.app.id", appId))
      // Shut the fetcher RpcEnv down again
      fetcher.shutdown()

      // Create SparkEnv using properties we fetched from the driver.
      // A fresh default SparkConf() is created and the Seq[(String, String)] obtained from the
      // DriverEndpoint ref is copied into it
      val driverConf = new SparkConf()
      for ((key, value) <- props) {
        // this is required for SSL in standalone mode
        if (SparkConf.isExecutorStartupConf(key)) {
          driverConf.setIfMissing(key, value)
        } else {
          driverConf.set(key, value)
        }
      }
      if (driverConf.contains("spark.yarn.credentials.file")) {
        logInfo("Will periodically update credentials from: " +
          driverConf.get("spark.yarn.credentials.file"))
        SparkHadoopUtil.get.startExecutorDelegationTokenRenewer(driverConf)
      }
      // Create the SparkEnv for this CoarseGrainedExecutorBackend; since the RpcEnv here is created
      // in client mode, rpcEnv.address has no value
      val env = SparkEnv.createExecutorEnv(
        driverConf, executorId, hostname, port, cores, isLocal = false)

      // SparkEnv will set spark.executor.port if the rpc env is listening for incoming
      // connections (e.g., if it's using akka). Otherwise, the executor is running in
      // client mode only, and does not accept incoming connections.
      val sparkHostPort = env.conf.getOption("spark.executor.port").map { port =>
          hostname + ":" + port
        }.orNull

      /**
        * Passed to the CoarseGrainedExecutorBackend instance:
        * env.rpcEnv: the executor-side RpcEnv
        * driverUrl "spark://[email protected]:56522" -- the ref of CoarseGrainedSchedulerBackend's DriverEndpoint
        * executorId "4" -- the ExecutorDesc id, an auto-incremented number, one per CoarseGrainedExecutorBackend
        * sparkHostPort: null, because the RpcEnv was created in client mode and rpcEnv.address has no value
        * cores "1" -- from SparkConf's "spark.executor.cores" (1 in the author's setup); if not set, one backend gets all of the worker's available cores
        * userClassPath: an empty collection
        * env: the SparkEnv for this CoarseGrainedExecutorBackend instance
        */
      env.rpcEnv.setupEndpoint("Executor", new CoarseGrainedExecutorBackend(
        env.rpcEnv, driverUrl, executorId, sparkHostPort, cores, userClassPath, env))
      // A WorkerWatcher is also constructed, with url "spark://[email protected]:53403"
      workerUrl.foreach { url =>
        env.rpcEnv.setupEndpoint("WorkerWatcher", new WorkerWatcher(env.rpcEnv, url))
      }
      env.rpcEnv.awaitTermination()
      SparkHadoopUtil.get.stopExecutorDelegationTokenRenewer()
    }
  }
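The property-merging step above (setIfMissing for startup-sensitive keys, set for everything else) can be reproduced with public SparkConf API alone. In the sketch below, isStartupConf is a hypothetical stand-in for the private SparkConf.isExecutorStartupConf, so its key prefixes are illustrative, not Spark's actual rule:

import org.apache.spark.SparkConf

object MergeDriverProps {
  // Hypothetical predicate; Spark's real check lives in the private SparkConf.isExecutorStartupConf
  private def isStartupConf(key: String): Boolean =
    key.startsWith("spark.auth") || key.startsWith("spark.ssl") || key.startsWith("spark.rpc")

  def merge(props: Seq[(String, String)]): SparkConf = {
    val driverConf = new SparkConf()
    for ((key, value) <- props) {
      if (isStartupConf(key)) driverConf.setIfMissing(key, value) // keep any locally set value
      else driverConf.set(key, value)                             // the driver's value always wins
    }
    driverConf
  }

  def main(args: Array[String]): Unit = {
    val conf = merge(Seq("spark.app.id" -> "app-20180508234845-0000", "spark.executor.cores" -> "1"))
    println(conf.toDebugString)
  }
}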

23. Instantiating CoarseGrainedExecutorBackend, which is itself an RpcEndpoint

/** This instance is created by CoarseGrainedExecutorBackend.main, which passes in:
  * rpcEnv: the executor-side RpcEnv
  * driverUrl "spark://[email protected]:56522" -- the ref of CoarseGrainedSchedulerBackend's DriverEndpoint
  * executorId "4" -- the ExecutorDesc id, an auto-incremented number, one per CoarseGrainedExecutorBackend
  * hostPort: null, because the RpcEnv was created in client mode and rpcEnv.address has no value
  * cores "1" -- from SparkConf's "spark.executor.cores" (1 in the author's setup); if not set, one backend gets all of the worker's available cores
  * userClassPath: an empty collection
  * env: the SparkEnv for this CoarseGrainedExecutorBackend instance
  */
private[spark] class CoarseGrainedExecutorBackend(
    override val rpcEnv: RpcEnv,
    driverUrl: String,
    executorId: String,
    hostPort: String,
    cores: Int,
    userClassPath: Seq[URL],
    env: SparkEnv)
  extends ThreadSafeRpcEndpoint with ExecutorBackend with Logging {

  var executor: Executor = null
  // A reference to the DriverEndpoint
  @volatile var driver: Option[RpcEndpointRef] = None

  // If this CoarseGrainedExecutorBackend is changed to support multiple threads, then this may need
  // to be changed so that we don't share the serializer instance across threads
  private[this] val ser: SerializerInstance = env.closureSerializer.newInstance()

  override def onStart() {
    // driverUrl: spark://[email protected]:49972; the RpcEndpointRef obtained here is the DriverEndpoint's ref
    logInfo("Connecting to driver: " + driverUrl)
    rpcEnv.asyncSetupEndpointRefByURI(driverUrl).flatMap { ref =>
      // This is a very fast action so we can use "ThreadUtils.sameThread"
      driver = Some(ref)
      // This notifies the DriverEndpoint, which then replies with RegisteredExecutor to this
      // CoarseGrainedExecutorBackend, telling it to create its Executor.
      /* executorId "4" -- the ExecutorDesc id, an auto-incremented number, one per CoarseGrainedExecutorBackend
         self: this CoarseGrainedExecutorBackend
         hostPort: null, because the RpcEnv was created in client mode and rpcEnv.address has no value
         cores: from SparkConf's "spark.executor.cores" (1 in the author's setup); if not set, one backend gets all of the worker's available cores
         extractLogUrls: filters environment variables whose key contains "SPARK_LOG_URL_", then maps each entry
         so the key keeps only what remains after stripping "SPARK_LOG_URL_", leaving the value unchanged
       */
      ref.ask[RegisterExecutorResponse](
        RegisterExecutor(executorId, self, hostPort, cores, extractLogUrls))
    }(ThreadUtils.sameThread).onComplete {
      // This is a very fast action so we can use "ThreadUtils.sameThread"
      case Success(msg) => Utils.tryLogNonFatalError {
        Option(self).foreach(_.send(msg)) // msg must be RegisterExecutorResponse
      }
      case Failure(e) => {
        logError(s"Cannot register with driver: $driverUrl", e)
        System.exit(1)
      }
    }(ThreadUtils.sameThread)
  }
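The registration handshake in onStart is just the standard Scala Future flatMap/onComplete pattern with an explicit ExecutionContext. A plain-Scala sketch without any Spark RPC, where lookupDriver/register are illustrative stand-ins for asyncSetupEndpointRefByURI and ref.ask:

import java.util.concurrent.Executors
import scala.concurrent.{ExecutionContext, Future}
import scala.util.{Failure, Success}

object HandshakeDemo {
  // Stand-in for ThreadUtils.sameThread: here simply a single-thread executor
  implicit val ec: ExecutionContext =
    ExecutionContext.fromExecutor(Executors.newSingleThreadExecutor())

  def lookupDriver(url: String): Future[String] = Future(s"ref($url)")      // like asyncSetupEndpointRefByURI
  def register(ref: String): Future[String]     = Future("RegisteredExecutor") // like ref.ask(RegisterExecutor(...))

  def main(args: Array[String]): Unit = {
    lookupDriver("spark://CoarseGrainedScheduler@driver:49972")
      .flatMap(ref => register(ref))   // chain the two async steps, as onStart does
      .onComplete {
        case Success(msg) => println(s"got reply: $msg")
        case Failure(e)   => println(s"cannot register: $e"); sys.exit(1)
      }
    Thread.sleep(500) // give the async callbacks time to run in this demo
    sys.exit(0)
  }
}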

24. CoarseGrainedExecutorBackend sends RegisterExecutor to the DriverEndpoint (inside CoarseGrainedSchedulerBackend), whose reply lets the backend create its Executor

class DriverEndpoint(override val rpcEnv: RpcEnv, sparkProperties: Seq[(String, String)])
  extends ThreadSafeRpcEndpoint with Logging {

  .....

  override def receiveAndReply(context: RpcCallContext): PartialFunction[Any, Unit] = {
    // This case is hit while CoarseGrainedExecutorBackend is initializing (its onStart)
    /* executorId "4" -- the ExecutorDesc id, an auto-incremented number, one per CoarseGrainedExecutorBackend
       executorRef: the CoarseGrainedExecutorBackend endpoint ref
       hostPort: null, because the executor's RpcEnv was created in client mode and rpcEnv.address has no value
       cores: from SparkConf's "spark.executor.cores" (1 in the author's setup); if not set, one backend gets all of the worker's available cores
       logUrls: environment variables whose key contains "SPARK_LOG_URL_", with that prefix stripped from the key and the value unchanged, e.g.
         (stdout, http://192.168.1.154:8081/logPage/?appId=app-20180516150725-0000&executorId=1&logType=stdout)
         (stderr, http://192.168.1.154:8081/logPage/?appId=app-20180516150725-0000&executorId=1&logType=stderr)
     */
    case RegisterExecutor(executorId, executorRef, hostPort, cores, logUrls) =>
      // executorDataMap: HashMap[String, ExecutorData], certainly empty at the very beginning
      if (executorDataMap.contains(executorId)) {
        context.reply(RegisterExecutorFailed("Duplicate executor ID: " + executorId))
      } else {
        // If the executor's rpc env is not listening for incoming connections, `hostPort`
        // will be null, and the client connection should be used to contact the executor.
        val executorAddress = if (executorRef.address != null) {
            executorRef.address
          } else {
            // Standalone client mode takes this branch; the executor's RpcAddress,
            // e.g. "luyl155:53561", is taken from the sender of the message
            context.senderAddress
          }
        // 17/11/12 20:31:22 INFO cluster.SparkDeploySchedulerBackend: Registered executor NettyRpcEndpointRef(null) (luyl155:53561) with ID 2
        logInfo(s"Registered executor $executorRef ($executorAddress) with ID $executorId")
        // addressToExecutorId: HashMap[RpcAddress, String]; key is the executor's RpcAddress,
        // value is the executor's own id
        addressToExecutorId(executorAddress) = executorId
        // totalCoreCount: AtomicInteger(0), the total number of cores across all CoarseGrainedExecutorBackends
        totalCoreCount.addAndGet(cores)
        // totalRegisteredExecutors: AtomicInteger(0), counts how many CoarseGrainedExecutorBackends there are
        totalRegisteredExecutors.addAndGet(1)
        /* executorRef: the CoarseGrainedExecutorBackend endpoint ref
           executorRef.address: null, because the executor's RpcEnv was created in client mode
           executorAddress.host: the ip of the worker the CoarseGrainedExecutorBackend runs on
           cores: the number of cores this CoarseGrainedExecutorBackend owns
           logUrls: the SPARK_LOG_URL_* environment variables with the prefix stripped, as described above
         */
        val data = new ExecutorData(executorRef, executorRef.address, executorAddress.host,
          cores, cores, logUrls)
        // This must be synchronized because variables mutated in this block are read when requesting executors
        CoarseGrainedSchedulerBackend.this.synchronized {
          // executorDataMap: HashMap[String, ExecutorData]; maps the executor id to its ExecutorData,
          // which holds the ref, the address, and the core counts
          executorDataMap.put(executorId, data)
          // numPendingExecutors starts at 0
          if (numPendingExecutors > 0) {
            numPendingExecutors -= 1
            logDebug(s"Decremented number of pending executors ($numPendingExecutors left)")
          }
        }
        // Note: some tests expect the reply to come after we put the executor in the map
        // The reply tells CoarseGrainedExecutorBackend to initialize its Executor thread pool; then makeOffers() is called
        // executorAddress.host: the ip the CoarseGrainedExecutorBackend runs on
        context.reply(RegisteredExecutor(executorAddress.host))
        listenerBus.post(
          SparkListenerExecutorAdded(System.currentTimeMillis(), executorId, data))
        // makeOffers() is analyzed later; it is what launches tasks
        makeOffers()
      }
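The bookkeeping in this case clause boils down to atomic counters for cluster-wide totals plus a synchronized map keyed by executor id. A small standalone sketch of that pattern, with hypothetical names (ExecutorInfo, ExecutorRegistry, registerExecutor) rather than Spark's classes:

import java.util.concurrent.atomic.AtomicInteger
import scala.collection.mutable

case class ExecutorInfo(host: String, totalCores: Int, var freeCores: Int)

object ExecutorRegistry {
  private val executorDataMap = mutable.HashMap[String, ExecutorInfo]()
  private val totalCoreCount = new AtomicInteger(0)
  private val totalRegisteredExecutors = new AtomicInteger(0)

  def registerExecutor(executorId: String, host: String, cores: Int): Boolean = synchronized {
    if (executorDataMap.contains(executorId)) {
      false // duplicate executor ID
    } else {
      executorDataMap.put(executorId, ExecutorInfo(host, cores, cores))
      totalCoreCount.addAndGet(cores)
      totalRegisteredExecutors.incrementAndGet()
      true
    }
  }

  def main(args: Array[String]): Unit = {
    registerExecutor("0", "192.168.1.153", 4)
    registerExecutor("1", "192.168.1.154", 1)
    println(s"executors=${totalRegisteredExecutors.get}, cores=${totalCoreCount.get}")
  }
}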

25. The DriverEndpoint replies to CoarseGrainedExecutorBackend with a RegisteredExecutor message, which makes CoarseGrainedExecutorBackend instantiate an Executor

private[spark] class CoarseGrainedExecutorBackend(
    override val rpcEnv: RpcEnv,
    driverUrl: String,
    executorId: String,
    hostPort: String,
    cores: Int,
    userClassPath: Seq[URL],
    env: SparkEnv)
  extends ThreadSafeRpcEndpoint with ExecutorBackend with Logging {

  ......

  override def receive: PartialFunction[Any, Unit] = {
    // hostname is executorAddress.host: the ip this CoarseGrainedExecutorBackend runs on
    case RegisteredExecutor(hostname) =>
      logInfo("Successfully registered with driver")
      /** executorId: the ExecutorDesc id, an auto-incremented number, one per CoarseGrainedExecutorBackend
        * hostname: the ip this CoarseGrainedExecutorBackend runs on
        * env: the SparkEnv of this CoarseGrainedExecutorBackend instance
        * userClassPath: an empty collection
        */
      executor = new Executor(executorId, hostname, env, userClassPath, isLocal = false)

26. The Executor initialization process, which calls BlockManager's initialize method

private[spark] class Executor(
    executorId: String,
    executorHostname: String,
    env: SparkEnv,
    userClassPath: Seq[URL] = Nil,
    isLocal: Boolean = false)
  extends Logging {

  logInfo(s"Starting executor ID $executorId on host $executorHostname")

  // Application dependencies (added through SparkContext) that we've fetched so far on this node.
  // Each map holds the master's timestamp for the version of that file or JAR we got.
  // These correspond to the dependencies handed to the Executor via --jars and --files.
  private val currentFiles: HashMap[String, Long] = new HashMap[String, Long]()
  private val currentJars: HashMap[String, Long] = new HashMap[String, Long]()

  // An empty ByteBuffer: its limit and capacity equal the length of the wrapped array (0 here),
  // its position is 0, and no mark is set
  private val EMPTY_BYTE_BUFFER = ByteBuffer.wrap(new Array[Byte](0))

  // Populated when CoarseGrainedExecutorBackend.main ran; only properties whose keys start with
  // "spark" were put in
  private val conf = env.conf

  // No ip or host:port - just hostname
  Utils.checkHost(executorHostname, "Expected executed slave to be a hostname")
  // must not have port specified.
  assert (0 == Utils.parseHostPort(executorHostname)._2)

  // Make sure the local hostname we report matches the cluster scheduler's name for this host
  Utils.setCustomHostname(executorHostname)

  // isLocal defaults to false
  if (!isLocal) {
    // Setup an uncaught exception handler for non-local mode.
    // Make any thread terminations due to uncaught exceptions kill the entire
    // executor process to avoid surprising stalls.
    Thread.setDefaultUncaughtExceptionHandler(SparkUncaughtExceptionHandler)
  }

  // Start worker thread pool: a cached daemon thread pool
  private val threadPool = ThreadUtils.newDaemonCachedThreadPool("Executor task launch worker")
  // ExecutorSource is used by the metrics system
  private val executorSource = new ExecutorSource(threadPool, executorId)

  if (!isLocal) {
    env.metricsSystem.registerSource(executorSource)
    // Called once when the Executor is initialized; the driver also calls it during SparkContext
    // initialization. conf.getAppId was set when CoarseGrainedExecutorBackend.main ran; its value is
    // app-20180508234845-0000.
    // (See spark-core_28: Executor initialization - env.blockManager.initialize(conf.getAppId) -
    // NettyBlockTransferService.init() source analysis.)
    env.blockManager.initialize(conf.getAppId)
  }
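ThreadUtils.newDaemonCachedThreadPool is a Spark-internal helper. A rough JDK-only equivalent (a cached pool whose threads are daemons and carry a numbered name prefix) can be sketched as follows; the object name DaemonPools is illustrative:

import java.util.concurrent.{Executors, ExecutorService, ThreadFactory}
import java.util.concurrent.atomic.AtomicInteger

object DaemonPools {
  def newDaemonCachedThreadPool(prefix: String): ExecutorService = {
    val counter = new AtomicInteger(0)
    val factory = new ThreadFactory {
      override def newThread(r: Runnable): Thread = {
        val t = new Thread(r, s"$prefix-${counter.getAndIncrement()}")
        t.setDaemon(true) // daemon threads don't keep the JVM alive after main exits
        t
      }
    }
    Executors.newCachedThreadPool(factory)
  }

  def main(args: Array[String]): Unit = {
    val pool = newDaemonCachedThreadPool("Executor task launch worker")
    pool.submit(new Runnable { override def run(): Unit = println(Thread.currentThread().getName) })
    pool.shutdown()
  }
}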

===> Next, heartbeats and task metrics are sent to the Driver

// must be initialized before running startDriverHeartbeat()
// HeartbeatReceiver is created during SparkContext initialization; ENDPOINT_NAME is "HeartbeatReceiver"
private val heartbeatReceiverRef =
  RpcUtils.makeDriverRef(HeartbeatReceiver.ENDPOINT_NAME, conf, env.rpcEnv)
startDriverHeartbeater()

Let's take a closer look at the heartbeat the Executor fires every 10 seconds.

/**
 * Schedules a task to report heartbeat and partial metrics for active tasks to driver.
 */
private def startDriverHeartbeater(): Unit = {
  // Defaults to 10 seconds, converted to milliseconds
  val intervalMs = conf.getTimeAsMs("spark.executor.heartbeatInterval", "10s")

  // Wait a random interval so the heartbeats don't end up in sync
  // i.e. some value below 20 seconds
  val initialDelay = intervalMs + (math.random * intervalMs).asInstanceOf[Int]

  val heartbeatTask = new Runnable() {
    override def run(): Unit = Utils.logUncaughtExceptions(reportHeartBeat())
  }
  // First wait the initial delay (under 20s), then run every 10s
  heartbeater.scheduleAtFixedRate(heartbeatTask, initialDelay, intervalMs, TimeUnit.MILLISECONDS)
}
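The scheduling pattern itself (a periodic task with a randomized initial delay so many executors don't all report at the same instant) is plain java.util.concurrent. A minimal sketch with illustrative interval values and a println standing in for reportHeartBeat:

import java.util.concurrent.{Executors, TimeUnit}

object HeartbeatDemo {
  def main(args: Array[String]): Unit = {
    val heartbeater = Executors.newSingleThreadScheduledExecutor()
    val intervalMs = 10000L
    val initialDelay = intervalMs + (math.random * intervalMs).toLong // somewhere between 10s and 20s

    val heartbeatTask = new Runnable {
      override def run(): Unit = println(s"${System.currentTimeMillis()}: heartbeat sent")
    }
    heartbeater.scheduleAtFixedRate(heartbeatTask, initialDelay, intervalMs, TimeUnit.MILLISECONDS)

    Thread.sleep(35000) // let a few heartbeats fire, then stop
    heartbeater.shutdownNow()
  }
}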

