Spark on YARN: yarn-client Mode Source Code Walkthrough

Spark version: 2.4.0

 

 

During SparkContext initialization, the TaskScheduler is selected according to the configured master/deploy mode, and it is through the concrete TaskScheduler (and SchedulerBackend) chosen here that each mode's behavior is implemented.

case masterUrl =>
  val cm = getClusterManager(masterUrl) match {
    case Some(clusterMgr) => clusterMgr
    case None => throw new SparkException("Could not parse Master URL: '" + master + "'")
  }
  try {
    val scheduler = cm.createTaskScheduler(sc, masterUrl)
    val backend = cm.createSchedulerBackend(sc, masterUrl, scheduler)
    cm.initialize(scheduler, backend)
    (backend, scheduler)
  } catch {
    case se: SparkException => throw se
    case NonFatal(e) =>
      throw new SparkException("External scheduler cannot be instantiated", e)
  }

The snippet above is from SparkContext's createTaskScheduler() method. When the yarn master is chosen, the corresponding ClusterManager is loaded here to create the TaskScheduler. In the yarn-client mode that this article's title refers to, a YarnScheduler and a YarnClientSchedulerBackend are created here to act as the schedulers for the running Spark job.
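For reference, the "yarn" master resolves through getClusterManager (a ServiceLoader lookup) to YarnClusterManager in the spark-yarn module, which dispatches on sc.deployMode. The following is a paraphrased sketch of that class as of the 2.4.x line, not a verbatim copy:

private[spark] class YarnClusterManager extends ExternalClusterManager {

  // Only handles the "yarn" master URL.
  override def canCreate(masterURL: String): Boolean = masterURL == "yarn"

  override def createTaskScheduler(sc: SparkContext, masterURL: String): TaskScheduler = {
    sc.deployMode match {
      case "cluster" => new YarnClusterScheduler(sc)
      case "client" => new YarnScheduler(sc)
      case _ => throw new SparkException(s"Unknown deploy mode '${sc.deployMode}' for Yarn")
    }
  }

  override def createSchedulerBackend(
      sc: SparkContext,
      masterURL: String,
      scheduler: TaskScheduler): SchedulerBackend = {
    sc.deployMode match {
      case "cluster" =>
        new YarnClusterSchedulerBackend(scheduler.asInstanceOf[TaskSchedulerImpl], sc)
      case "client" =>
        new YarnClientSchedulerBackend(scheduler.asInstanceOf[TaskSchedulerImpl], sc)
      case _ => throw new SparkException(s"Unknown deploy mode '${sc.deployMode}' for Yarn")
    }
  }

  override def initialize(scheduler: TaskScheduler, backend: SchedulerBackend): Unit = {
    scheduler.asInstanceOf[TaskSchedulerImpl].initialize(backend)
  }
}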

 

YarnScheduler simply extends TaskSchedulerImpl, the same scheduler that local mode uses; since in yarn-client mode the Driver runs locally just as in local mode, YarnScheduler adds almost nothing of its own.
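For illustration, a paraphrased sketch of YarnScheduler as of 2.4.x; apart from delegating rack resolution to Hadoop's RackResolver, it is just TaskSchedulerImpl:

private[spark] class YarnScheduler(sc: SparkContext) extends TaskSchedulerImpl(sc) {

  // RackResolver logs an INFO message every time it resolves a rack, which is too noisy.
  if (Logger.getLogger(classOf[RackResolver]).getLevel == null) {
    Logger.getLogger(classOf[RackResolver]).setLevel(Level.WARN)
  }

  // Resolve a host's rack through YARN's topology configuration; unknown by default.
  override def getRackForHost(hostPort: String): Option[String] = {
    val host = Utils.parseHostPort(hostPort)._1
    Option(RackResolver.resolve(sc.hadoopConfiguration, host).getNetworkLocation)
  }
}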

The backend, by contrast, implements the interaction with yarn, so its implementation naturally differs substantially.

 

When the TaskScheduler is actually started, the start() method of YarnClientSchedulerBackend also initializes a yarn client and, right here, completes the flow of registering with yarn's ResourceManager and submitting the application.

override def start() {
  val driverHost = conf.get("spark.driver.host")
  val driverPort = conf.get("spark.driver.port")
  val hostport = driverHost + ":" + driverPort
  sc.ui.foreach { ui => conf.set("spark.driver.appUIAddress", ui.webUrl) }

  val argsArrayBuf = new ArrayBuffer[String]()
  argsArrayBuf += ("--arg", hostport)

  logDebug("ClientArguments called with: " + argsArrayBuf.mkString(" "))
  val args = new ClientArguments(argsArrayBuf.toArray)
  totalExpectedExecutors = SchedulerBackendUtils.getInitialTargetExecutorNumber(conf)
  client = new Client(args, conf)
  bindToYarn(client.submitApplication(), None)

  // SPARK-8687: Ensure all necessary properties have already been set before
  // we initialize our driver scheduler backend, which serves these properties
  // to the executors
  super.start()
  waitForApplication()

  monitorThread = asyncMonitorApplication()
  monitorThread.start()
}

Above is the start() method of YarnClientSchedulerBackend. Its two core steps can be seen here: constructing the Client, which encapsulates the connection to and operations on yarn, and then submitting the application through the initialized Client via submitApplication().

 

Let us focus on Client's submitApplication() method.

yarnClient.init(hadoopConf)
yarnClient.start()

First, yarnClient is initialized from the project's Hadoop configuration; all subsequent operations are performed through this yarnClient.

val newApp = yarnClient.createApplication()
val newAppResponse = newApp.getNewApplicationResponse()
appId = newAppResponse.getApplicationId()

Next, createApplication() requests a new application from yarn. The newAppResponse obtained here not only contains yarn's deployment information and resource limits, but more importantly returns the ApplicationId for the application being requested; in yarn mode the appId is assigned by yarn.
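The calls above are plain Hadoop YARN client API. As a stand-alone illustration of the same three steps (assuming a reachable ResourceManager configured through yarn-site.xml on the classpath; the object name YarnClientDemo is made up for this example):

import org.apache.hadoop.yarn.client.api.YarnClient
import org.apache.hadoop.yarn.conf.YarnConfiguration

object YarnClientDemo {
  def main(args: Array[String]): Unit = {
    // Initialize and start a YARN client, just as Client.submitApplication() does.
    val conf = new YarnConfiguration()
    val yarnClient = YarnClient.createYarnClient
    yarnClient.init(conf)
    yarnClient.start()

    // Ask the ResourceManager for a new application; the response carries the ApplicationId
    // plus cluster limits such as the maximum container memory we may request.
    val newApp = yarnClient.createApplication()
    val newAppResponse = newApp.getNewApplicationResponse()
    println(s"application id: ${newAppResponse.getApplicationId}")
    println(s"max container memory: ${newAppResponse.getMaximumResourceCapability.getMemory} MB")

    yarnClient.stop()
  }
}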

val containerContext = createContainerLaunchContext(newAppResponse)

The next important step is to build the containerContext, yarn's ContainerLaunchContext, via the createContainerLaunchContext() method.

 

 

All the important attributes needed to construct the Container on yarn are produced inside the createContainerLaunchContext() method.

val appId = newAppResponse.getApplicationId
val appStagingDirPath = new Path(appStagingBaseDir, getAppStagingDir(appId))

Here, based on the configured HDFS settings, the submitting user, and the appId just obtained, the HDFS path to which the jars and other resources will later be uploaded is created.
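The staging directory name itself is simply ".sparkStaging/<appId>" under the submitting user's home directory on HDFS; a paraphrased sketch of the helpers in Client (the example path in the comment is illustrative):

// Resulting path looks like: hdfs://nn:8020/user/alice/.sparkStaging/application_1546300800000_0001
val SPARK_STAGING: String = ".sparkStaging"

private def getAppStagingDir(appId: ApplicationId): String =
  buildPath(SPARK_STAGING, appId.toString)

private def buildPath(components: String*): String =
  components.mkString(Path.SEPARATOR)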

val launchEnv = setupLaunchEnv(appStagingDirPath, pySparkArchives)
val localResources = prepareLocalResources(appStagingDirPath, pySparkArchives)

val amContainer = Records.newRecord(classOf[ContainerLaunchContext])
amContainer.setLocalResources(localResources.asJava)
amContainer.setEnvironment(launchEnv.asJava)

With that path available, the local resources destined for HDFS are prepared and uploaded to the corresponding HDFS path, and then, together with the launch environment, set on the amContainer record.

val javaOpts = ListBuffer[String]()

// Set the environment variable through a command prefix
// to append to the existing value of the variable
var prefixEnv: Option[String] = None

// Add Xmx for AM memory
javaOpts += "-Xmx" + amMemory + "m"

val tmpDir = new Path(Environment.PWD.$$(), YarnConfiguration.DEFAULT_CONTAINER_TEMP_DIR)
javaOpts += "-Djava.io.tmpdir=" + tmpDir

// TODO: Remove once cpuset version is pushed out.
// The context is, default gc for server class machines ends up using all cores to do gc -
// hence if there are multiple containers in same node, Spark GC affects all other containers'
// performance (which can be that of other Spark containers)
// Instead of using this, rely on cpusets by YARN to enforce "proper" Spark behavior in
// multi-tenant environments. Not sure how default Java GC behaves if it is limited to subset
// of cores on a node.
val useConcurrentAndIncrementalGC = launchEnv.get("SPARK_USE_CONC_INCR_GC").exists(_.toBoolean)
if (useConcurrentAndIncrementalGC) {
  // In our expts, using (default) throughput collector has severe perf ramifications in
  // multi-tenant machines
  javaOpts += "-XX:+UseConcMarkSweepGC"
  javaOpts += "-XX:MaxTenuringThreshold=31"
  javaOpts += "-XX:SurvivorRatio=8"
  javaOpts += "-XX:+CMSIncrementalMode"
  javaOpts += "-XX:+CMSIncrementalPacing"
  javaOpts += "-XX:CMSIncrementalDutyCycleMin=0"
  javaOpts += "-XX:CMSIncrementalDutyCycle=10"
}

The fragment above clearly builds part of the java command-line options that will be used when launching on yarn; it is only a slice of that logic, as the full method handles many more options and is considerably longer.

val amClass =
  if (isClusterMode) {
    Utils.classForName("org.apache.spark.deploy.yarn.ApplicationMaster").getName
  } else {
    Utils.classForName("org.apache.spark.deploy.yarn.ExecutorLauncher").getName
  }

It is worth pointing out that in yarn-client mode the ApplicationMaster implementation handed to yarn will be ExecutorLauncher.

Everything mentioned above will be used as part of the context when yarn later creates the container.
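To see where these pieces end up, here is a condensed paraphrase of the tail of createContainerLaunchContext(): javaOpts, the chosen amClass and the propagated Spark properties are joined into the literal shell command that the NodeManager will execute to start the AM container, with stdout/stderr redirected into the container log directory (user-class and Python/R arguments are omitted here):

val amArgs =
  Seq(amClass) ++ userArgs ++
  Seq("--properties-file",
    buildPath(Environment.PWD.$$(), LOCALIZED_CONF_DIR, SPARK_CONF_FILE))

// Command for the ApplicationMaster (ExecutorLauncher in yarn-client mode).
val commands = prefixEnv ++
  Seq(Environment.JAVA_HOME.$$() + "/bin/java", "-server") ++
  javaOpts ++ amArgs ++
  Seq(
    "1>", ApplicationConstants.LOG_DIR_EXPANSION_VAR + "/stdout",
    "2>", ApplicationConstants.LOG_DIR_EXPANSION_VAR + "/stderr")

amContainer.setCommands(commands.map(s => if (s == null) "null" else s).toList.asJava)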

val appContext = createApplicationSubmissionContext(newApp, containerContext)

Once the containerContext has been built, createApplicationSubmissionContext() is called to create the appContext, the application context that is used directly to submit the app to yarn.

yarnClient.submitApplication(appContext)

Inside createApplicationSubmissionContext(), the submission is further packaged according to yarn's requirements, with the containerContext built earlier wrapped in as one part; finally the app is submitted through yarnClient, which completes the submission.
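A condensed sketch of what createApplicationSubmissionContext() fills in (application tags, node labels, retry settings and log-aggregation patterns are left out):

val appContext = newApp.getApplicationSubmissionContext
appContext.setApplicationName(sparkConf.get("spark.app.name", "Spark"))
appContext.setQueue(sparkConf.get(QUEUE_NAME))
appContext.setAMContainerSpec(containerContext)
appContext.setApplicationType("SPARK")

// Memory and cores the ResourceManager must reserve for the AM container itself.
val capability = Records.newRecord(classOf[Resource])
capability.setMemory(amMemory + amMemoryOverhead)
capability.setVirtualCores(amCores)
appContext.setResource(capability)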

At this point, the yarn-client side has finished submitting the ApplicationMaster to yarn's ResourceManager.

 

After the submission, yarn first starts an ExecutorLauncher on some NodeManager to communicate with the Spark side; because this is yarn-client mode, tasks are created on yarn according to the scheduling of the Driver running locally.

object ExecutorLauncher {

  def main(args: Array[String]): Unit = {
    ApplicationMaster.main(args)
  }

}

ExecutorLauncher is in fact implemented through ApplicationMaster, just as in yarn-cluster mode, but the concrete logic inside ApplicationMaster branches according to the mode.

In yarn-client mode, the main logic of ApplicationMaster is implemented in the runExecutorLauncher() method.
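Which role ApplicationMaster plays is decided by whether a user class was passed on its command line; a paraphrased sketch of the dispatch in its run() path (registration and setup elided):

// yarn-cluster passes --class <user class>; ExecutorLauncher in yarn-client mode does not.
private val isClusterMode = args.userClass != null

final def run(): Int = {
  // ... registration, security and shutdown-hook setup elided ...
  if (isClusterMode) {
    runDriver()            // cluster mode: run the user class (the Driver) inside the AM
  } else {
    runExecutorLauncher()  // client mode: connect back to the remote Driver and allocate executors
  }
  exitCode
}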

val (driverHost, driverPort) = Utils.parseHostPort(args.userArgs(0))
val driverRef = rpcEnv.setupEndpointRef(
  RpcAddress(driverHost, driverPort),
  YarnSchedulerBackend.ENDPOINT_NAME)
addAmIpFilter(Some(driverRef))
createAllocator(driverRef, sparkConf)

In runExecutorLauncher(), the RPC connection back to the Driver is established first, and then a YarnAllocator is created, ready to request resources from yarn for running tasks.
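After createAllocator() registers the AM with the ResourceManager, container allocation is driven by a background reporter thread. A loosely paraphrased sketch of that loop (the real code in ApplicationMaster.launchReporterThread() also handles failure counting, interval back-off and interruption):

val reporterThread = new Thread("Reporter") {
  override def run(): Unit = {
    while (!finished) {
      // Heartbeat to the ResourceManager; containers granted since the last call are handed
      // to ExecutorRunnable instances, which launch CoarseGrainedExecutorBackend JVMs that
      // register back with the Driver over RPC.
      allocator.allocateResources()
      Thread.sleep(heartbeatInterval)
    }
  }
}
reporterThread.setDaemon(true)
reporterThread.start()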

override def receiveAndReply(context: RpcCallContext): PartialFunction[Any, Unit] = {
  case r: RequestExecutors =>
    Option(allocator) match {
      case Some(a) =>
        if (a.requestTotalExecutorsWithPreferredLocalities(r.requestedTotal,
          r.localityAwareTasks, r.hostToLocalTaskCount, r.nodeBlacklist)) {
          resetAllocatorInterval()
        }
        context.reply(true)

      case None =>
        logWarning("Container allocator is not ready to request executors yet.")
        context.reply(false)
    }

  case KillExecutors(executorIds) =>
    logInfo(s"Driver requested to kill executor(s) ${executorIds.mkString(", ")}.")
    Option(allocator) match {
      case Some(a) => executorIds.foreach(a.killExecutor)
      case None => logWarning("Container allocator is not ready to kill executors yet.")
    }
    context.reply(true)

  case GetExecutorLossReason(eid) =>
    Option(allocator) match {
      case Some(a) =>
        a.enqueueGetLossReasonRequest(eid, context)
        resetAllocatorInterval()
      case None =>
        logWarning("Container allocator is not ready to find executor loss reasons yet.")
    }
}

Through this channel with the Driver, the AM keeps listening for the Driver's requests (RequestExecutors, KillExecutors, and so on) and asks yarn for resources accordingly so that tasks can be executed.

override def onDisconnected(remoteAddress: RpcAddress): Unit = {
  // In cluster mode, do not rely on the disassociated event to exit
  // This avoids potentially reporting incorrect exit codes if the driver fails
  if (!isClusterMode) {
    logInfo(s"Driver terminated or disconnected! Shutting down. $remoteAddress")
    finish(FinalApplicationStatus.SUCCEEDED, ApplicationMaster.EXIT_SUCCESS)
  }
}

This is also where yarn-client mode implements the rule that shutting down the local Driver shuts down the whole application: when the connection to the Driver is lost, the run on yarn is finished.

 

 

Finally, back to the Driver side. As mentioned above, YarnClientSchedulerBackend extends YarnSchedulerBackend, and it is in YarnSchedulerBackend that, during scheduling and execution, requests are sent down to the ApplicationMaster on yarn.

/**
 * Request executors from the ApplicationMaster by specifying the total number desired.
 * This includes executors already pending or running.
 */
override def doRequestTotalExecutors(requestedTotal: Int): Future[Boolean] = {
  yarnSchedulerEndpointRef.ask[Boolean](prepareRequestExecutors(requestedTotal))
}

/**
 * Request that the ApplicationMaster kill the specified executors.
 */
override def doKillExecutors(executorIds: Seq[String]): Future[Boolean] = {
  yarnSchedulerEndpointRef.ask[Boolean](KillExecutors(executorIds))
}

In the end, executor requests are all sent from here over the network to the ApplicationMaster on yarn, which handles the remote allocation.
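The messages carried by these ask calls are defined in CoarseGrainedClusterMessages; their fields line up with the handler shown earlier in the AM's receiveAndReply (paraphrased):

case class RequestExecutors(
    requestedTotal: Int,
    localityAwareTasks: Int,
    hostToLocalTaskCount: Map[String, Int],
    nodeBlacklist: Set[String])
  extends CoarseGrainedClusterMessage

case class KillExecutors(executorIds: Seq[String]) extends CoarseGrainedClusterMessage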

 

That concludes this source code walkthrough of yarn-client mode for Spark on YARN.
