Continued from the previous post (spark-core_24: AppClient's ClientEndpoint registers the application with RegisterApplication).
As described there, the master calls launchExecutor(), which runs worker.endpoint.send(LaunchExecutor(masterUrl, exec.application.id, exec.id, exec.application.desc, exec.cores, exec.memory)) to make the worker start a CoarseGrainedExecutorBackend.
19. The Worker launches the CoarseGrainedExecutorBackend process with the JDK's ProcessBuilder.start()
override def receive: PartialFunction[Any, Unit] = synchronized {
…..
/**
 * appDesc holds the command info, including the launch class CoarseGrainedExecutorBackend.
 * It was put there during SparkContext initialization: after TaskSchedulerImpl.start(), the RpcEndpoint of SparkDeploySchedulerBackend's AppClient calls registerApplication.
 * The master then calls startExecutorsOnWorkers ==> allocateWorkerResourceToExecutors ==> launchExecutor(worker, exec) ==> worker.endpoint.send(LaunchExecutor(masterUrl, ...))
 * masterUrl: spark://luyl152:7077
 * appId: app-20180404172558-0000
 * execId: an auto-incrementing number, starting from 0 by default
 * cores_: the value of --executor-cores or SparkConf "spark.executor.cores"; if unset, only one CoarseGrainedExecutorBackend is started and it gets all of the worker's available cores
 * memory_: corresponds to sc.executorMemory, 1024MB by default
 */
case LaunchExecutor(masterUrl, appId, execId, appDesc, cores_, memory_) =>
/** The worker's LaunchExecutor handler creates an ExecutorRunner and calls its start() method, which calls fetchAndRunExecutor().
 * fetchAndRunExecutor() contains the following code:
 * val builder = CommandUtils.buildProcessBuilder(appDesc.command, new SecurityManager(conf), memory, sparkHome.getAbsolutePath, substituteVariables)
 * process = builder.start()
 */
if (masterUrl != activeMasterUrl) {
logWarning("Invalid Master (" + masterUrl + ") attempted to launch executor.")
} else {
try {
logInfo("Asked to launch executor %s/%d for %s".format(appId, execId, appDesc.name))
// Create the executor's working directory (the working directory of CoarseGrainedExecutorBackend).
// workDir comes from SPARK_WORKER_DIR, initialized in WorkerArguments; if that variable is not set, the worker creates a "work" directory under SPARK_HOME when it starts
// executorDir: /data/spark-1.6.0-bin-hadoop2.6/work/app-20180508234845-0000/4
val executorDir = new File(workDir, appId + "/" + execId)
if (!executorDir.mkdirs()) {
throw new IOException("Failed to create directory " + executorDir)
}
// Create local dirs for the executor. These are passed to the executor via the
// SPARK_EXECUTOR_DIRS environment variable, and deleted by the Worker when the
// application finishes.
// appDirectories: HashMap[String, Seq[String]]; the first time an app launches an executor on this worker, it has no entry
// appLocalDirs: e.g. Seq("/tmp/spark-b7c124be-813a-4c06-8f8e-1e04fd2b5056/executor-ed6c2e1e-c448-4883-8f34-5efdde76521b")
val appLocalDirs = appDirectories.get(appId).getOrElse {
// getOrCreateLocalRootDirs() returns e.g. Array(/tmp/spark-e72251ed-96b6-4fe6-b704-1772b5fc5a8b)
Utils.getOrCreateLocalRootDirs(conf).map { dir =>
// returns e.g. /tmp/spark-e72251ed-96b6-4fe6-b704-1772b5fc5a8b/executor-7ab80469-4222-40c9-87cf-a6f2f00e30c6
val appDir = Utils.createDirectory(dir, namePrefix = "executor")
Utils.chmod700(appDir)
appDir.getAbsolutePath()
}.toSeq
}
//appDirectories:HashMap["app-20180404172558-0000",Seq("/tmp/spark-e72251ed-96b6-4fe6-b704-1772b5fc5a8b/executor-7ab80469-4222-40c9-87cf-a6f2f00e30c6")]
appDirectories(appId) = appLocalDirs
// ExecutorRunner is what actually launches the CoarseGrainedExecutorBackend process on each Worker node,
// using the JDK's ProcessBuilder.start()
val manager = new ExecutorRunner(
appId, //app-20180404172558-0000
execId,
appDesc.copy(command = Worker.maybeUpdateSSLSettings(appDesc.command, conf)),
cores_,
memory_,
self,
workerId, // worker-20180321165947-luyl153-<RpcAddress.port value>
host, // the worker's host
webUi.boundPort, // the worker's web UI port is 8081 (the master's is 8080)
publicAddress, // hostname of this worker
sparkHome,
executorDir, // /data/spark-1.6.0-bin-hadoop2.6/work/app-20180508234845-0000/4
workerUri, // spark://sparkWorker@luyl153:RpcAddress.port
conf,
appLocalDirs, ExecutorState.RUNNING)
//executors: HashMap[String, ExecutorRunner]
executors(appId + "/" + execId) = manager
manager.start()
coresUsed += cores_
memoryUsed += memory_
sendToMaster(ExecutorStateChanged(appId, execId, manager.state, None, None))
} catch {
case e: Exception => {
logError(s"Failed to launch executor $appId/$execId for ${appDesc.name}.", e)
if (executors.contains(appId + "/" + execId)) {
executors(appId + "/" + execId).kill()
executors -= appId + "/" + execId
}
sendToMaster(ExecutorStateChanged(appId, execId, ExecutorState.FAILED,
Some(e.toString), None))
}
}
}
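The directory handling above can be tried out in isolation. What follows is a minimal sketch under stated assumptions. `ExecutorDirsSketch`, `createExecutorDir` and `localDirsFor` are hypothetical names. A plain `mutable.HashMap` stands in for the Worker's `appDirectories`. `Files.createTempDirectory` replaces Spark's `Utils.createDirectory`, so the suffix is random digits rather than a UUID. This is not the Worker's actual code.

```scala
import java.io.{File, IOException}
import java.nio.file.Files
import scala.collection.mutable

// Hypothetical sketch of the Worker-side directory setup; names are illustrative, not Spark's.
object ExecutorDirsSketch {
  // appId -> local dirs, playing the role of the Worker's appDirectories map
  val appDirectories = mutable.HashMap[String, Seq[String]]()

  // Owner-only rwx (like chmod 700) via java.io.File permission setters.
  def chmod700(f: File): Boolean =
    f.setReadable(false, false) && f.setReadable(true, true) &&
      f.setWritable(false, false) && f.setWritable(true, true) &&
      f.setExecutable(false, false) && f.setExecutable(true, true)

  // workDir/appId/execId; like the Worker, fail if mkdirs() returns false (e.g. it already exists)
  def createExecutorDir(workDir: File, appId: String, execId: Int): File = {
    val dir = new File(workDir, appId + "/" + execId)
    if (!dir.mkdirs()) throw new IOException("Failed to create directory " + dir)
    dir
  }

  // Reuse the app's local dirs if present; otherwise create one "executor-*" dir under each root.
  def localDirsFor(appId: String, roots: Seq[File]): Seq[String] =
    appDirectories.getOrElseUpdate(appId, roots.map { root =>
      val dir = Files.createTempDirectory(root.toPath, "executor-").toFile
      chmod700(dir)
      dir.getAbsolutePath
    })
}
```

Calling `localDirsFor` twice for the same appId returns the same paths. This mirrors why the Worker looks in `appDirectories` before creating anything: all executors of one app on a worker share the same local dirs.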
20. Now for the most interesting code: Spark uses ProcessBuilder to launch CoarseGrainedExecutorBackend
private[deploy] class ExecutorRunner(
val appId: String, // app-20180404172558-0000
val execId: Int,
val appDesc: ApplicationDescription,
val cores: Int,
val memory: Int, // corresponds to sc.executorMemory, 1024MB by default
val worker: RpcEndpointRef,
val workerId: String,
val host: String,
val webUiPort: Int, // the worker's web UI port is 8081 (the master's is 8080)
val publicAddress: String, // hostname of this worker
val sparkHome: File,
val executorDir: File, // $SPARK_HOME/work/<appId>/<execId>
val workerUrl: String, // spark://sparkWorker@luyl153:RpcAddress.port
conf: SparkConf,
val appLocalDirs: Seq[String], // Seq("/tmp/spark-e72251ed-96b6-4fe6-b704-1772b5fc5a8b/executor-7ab80469-4222-40c9-87cf-a6f2f00e30c6")
@volatile var state: ExecutorState.Value)
extends Logging {
private val fullId = appId + "/" + execId
private var workerThread: Thread = null
private var process: Process = null
private var stdoutAppender: FileAppender = null
private var stderrAppender: FileAppender = null
// NOTE: This is now redundant with the automated shut-down enforced by the Executor. It might make sense to remove this in the future.
private var shutdownHook: AnyRef = null
private[worker] def start() {
workerThread = new Thread("ExecutorRunner for " + fullId) {
override def run() {fetchAndRunExecutor() }
}
workerThread.start()
// Shutdown hook that kills actors on shutdown.
// A JVM shutdown hook, run when the JVM exits (analyzed later; it is quite useful and worth knowing)
shutdownHook = ShutdownHookManager.addShutdownHook { () =>
// It's possible that we arrive here before calling `fetchAndRunExecutor`, then `state` will
// be `ExecutorState.RUNNING`. In this case, we should set `state` to `FAILED`.
if (state == ExecutorState.RUNNING) {
state = ExecutorState.FAILED
}
killProcess(Some("Worker shutting down")) }
}
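Stripped down, `start()` does two things: it runs the launch logic on a dedicated, named thread, and it registers a hook that runs if the JVM exits first. Below is a hypothetical sketch using plain JDK calls. `Runtime.addShutdownHook` takes the place of Spark's `ShutdownHookManager`, and a `String` takes the place of `ExecutorState`.

```scala
// Hypothetical sketch of the start() pattern; RunnerSketch is not a Spark class.
object RunnerSketch {
  @volatile var state: String = "RUNNING"

  // Returns the worker thread and the registered hook so callers can join/remove them.
  def start(fullId: String, work: () => Unit): (Thread, Thread) = {
    val workerThread = new Thread("ExecutorRunner for " + fullId) {
      override def run(): Unit = work()
    }
    workerThread.start()
    // Plain JDK shutdown hook; Spark registers through its own ShutdownHookManager instead.
    val hook = new Thread(new Runnable {
      override def run(): Unit = if (state == "RUNNING") state = "FAILED"
    })
    Runtime.getRuntime.addShutdownHook(hook)
    (workerThread, hook)
  }
}
```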
===> Now look at fetchAndRunExecutor
/**
* Download and run the executor described in our ApplicationDescription
* It launches CoarseGrainedExecutorBackend with the JDK's ProcessBuilder.start()
* https://blog.csdn.net/u013256816/article/details/54603910
*/
private def fetchAndRunExecutor() {
try {
// Launch the process: buildProcessBuilder returns a JDK ProcessBuilder
val builder = CommandUtils.buildProcessBuilder(appDesc.command, new SecurityManager(conf), memory, sparkHome.getAbsolutePath, substituteVariables)
...}
===> First, a look at CommandUtils.buildProcessBuilder, which puts the java command that runs the main class into a ProcessBuilder
object CommandUtils extends Logging {
/**
* Build a ProcessBuilder based on the given parameters.
* The `env` argument is exposed for testing.
* command: Command(org.apache.spark.executor.CoarseGrainedExecutorBackend,
*   List(--driver-url, spark://CoarseGrainedScheduler@<driver-host>:49972,
*     --executor-id, {{EXECUTOR_ID}},
*     --hostname, {{HOSTNAME}},
*     --cores, {{CORES}}, --app-id, {{APP_ID}}, --worker-url, {{WORKER_URL}}),
*   Map(SPARK_USER -> root, SPARK_EXECUTOR_MEMORY -> 1024m),
*   List(), List(), ArraySeq(-Dspark.driver.port=49972, -XX:+PrintGCDetails, -Dkey=value, -Dnumbers=one two three))
*/
def buildProcessBuilder(
command: Command,
securityMgr: SecurityManager,
memory: Int, // 1024MB
sparkHome: String,
substituteArguments: String => String, // replaces placeholder variables in the command, such as {{EXECUTOR_ID}} and {{CORES}}, with concrete values
classPaths: Seq[String] = Seq[String](),
env: Map[String, String] = sys.env): ProcessBuilder = {
val localCommand = buildLocalCommand(
command, securityMgr, substituteArguments, classPaths, env)
val commandSeq = buildCommandSeq(localCommand, memory, sparkHome)
/** The following command is passed to the ProcessBuilder constructor; it is just a "java -cp *.jar <main class> ..." launch command:
* "/usr/local/java/jdk1.8.0_91/bin/java" "-cp" "/data/spark-1.6.0-bin-hadoop2.6/conf/:/data/spark-1.6.0-bin-hadoop2.6/lib/spark-assembly-1.6.0-hadoop2.6.0.jar:/data/spark-1.6.0-bin-hadoop2.6/lib/datanucleus-api-jdo-3.2.6.jar:/data/spark-1.6.0-bin-hadoop2.6/lib/datanucleus-rdbms-3.2.9.jar:/data/spark-1.6.0-bin-hadoop2.6/lib/datanucleus-core-3.2.10.jar:/data/hadoop-2.6.5/etc/hadoop/" "-Xms1024M" "-Xmx1024M" "-Dspark.driver.port=47218" "-XX:+PrintGCDetails" "-Dkey=value" "-Dnumbers=one two three" "org.apache.spark.executor.CoarseGrainedExecutorBackend"
* "--driver-url" "spark://CoarseGrainedScheduler@<driver-host>:47218" "--executor-id" "0" "--hostname" "192.168.1.153" "--cores" "4" "--app-id" "app-20180503193934-0000" "--worker-url" "spark://Worker@<worker-host>:44713"
*/
val builder = new ProcessBuilder(commandSeq: _*)
// environment() returns the child process's environment as a mutable Map, so environment variables can be modified
val environment = builder.environment()
for ((key, value) <- localCommand.environment) {
environment.put(key, value)
}
builder
}
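Here is a simplified sketch of what `buildProcessBuilder` amounts to, under two assumptions. First, it does a plain string replace for `{{...}}` placeholders. Spark's `substituteVariables`, as far as I can tell, matches each whole argument instead. Second, it omits the classpath, memory and java-option handling that `buildLocalCommand`/`buildCommandSeq` add. `CommandSketch` and its methods are hypothetical.

```scala
// Hypothetical, simplified stand-in for buildProcessBuilder.
object CommandSketch {
  // Replace every {{KEY}} occurrence with its value (not Spark's substituteVariables).
  def substitute(values: Map[String, String])(arg: String): String =
    values.foldLeft(arg) { case (acc, (k, v)) => acc.replace("{{" + k + "}}", v) }

  def build(javaCmd: Seq[String], mainClass: String, args: Seq[String],
            env: Map[String, String], subst: String => String): ProcessBuilder = {
    val commandSeq = javaCmd ++ Seq(mainClass) ++ args.map(subst)
    val builder = new ProcessBuilder(commandSeq: _*)
    // environment() is a mutable view of the child's environment (initially a copy of ours)
    val environment = builder.environment()
    env.foreach { case (k, v) => environment.put(k, v) }
    builder
  }
}
```

Nothing is started here. The builder only records the command line and environment, which is why the Worker can log `builder.command()` before calling `start()`.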
===> With the ProcessBuilder in hand, go back to fetchAndRunExecutor() and continue
private def fetchAndRunExecutor() {
try {
...
// Returns this process builder's operating-system program and arguments.
val command = builder.command()
val formattedCommand = command.asScala.mkString("\"", "\" \"", "\"")
/**
* 18/05/03 19:44:17 INFO worker.ExecutorRunner: Launch command: "/usr/local/java/jdk1.8.0_91/bin/java" "-cp" "/data/spark-1.6.0-bin-hadoop2.6/conf/:/data/spark-1.6.0-bin-hadoop2.6/lib/spark-assembly-1.6.0-hadoop2.6.0.jar:/data/spark-1.6.0-bin-hadoop2.6/lib/datanucleus-api-jdo-3.2.6.jar:/data/spark-1.6.0-bin-hadoop2.6/lib/datanucleus-rdbms-3.2.9.jar:/data/spark-1.6.0-bin-hadoop2.6/lib/datanucleus-core-3.2.10.jar:/data/hadoop-2.6.5/etc/hadoop/"
* "-Xms1024M" "-Xmx1024M" "-Dspark.driver.port=47218" "-XX:+PrintGCDetails" "-Dkey=value"
* "-Dnumbers=one two three" "org.apache.spark.executor.CoarseGrainedExecutorBackend"
* "--driver-url" "spark://CoarseGrainedScheduler@<driver-host>:47218" "--executor-id" "0"
* "--hostname" "192.168.1.153" "--cores" "4" "--app-id" "app-20180503193934-0000"
* "--worker-url" "spark://Worker@<worker-host>:44713"
*/
logInfo(s"Launch command: $formattedCommand")
// Set the child process's working directory: $SPARK_HOME/work/<appId>/<execId>
builder.directory(executorDir)
// appLocalDirs: Seq("/tmp/spark-e72251ed-96b6-4fe6-b704-1772b5fc5a8b/executor-7ab80469-4222-40c9-87cf-a6f2f00e30c6")
// builder.environment returns a Map through which environment variable values can be modified
builder.environment.put("SPARK_EXECUTOR_DIRS", appLocalDirs.mkString(File.pathSeparator))
// In case we are running this from within the Spark Shell, avoid creating a "scala"
// parent process for the executor command
builder.environment.put("SPARK_LAUNCH_WITH_SCALA", "0")
// Add webUI log urls
// The worker's web UI port is 8081 (the master's is 8080); these URLs expose the executor's stderr and stdout on the worker's web UI
val baseUrl =
s"http://$publicAddress:$webUiPort/logPage/?appId=$appId&executorId=$execId&logType="
builder.environment.put("SPARK_LOG_URL_STDERR", s"${baseUrl}stderr")
builder.environment.put("SPARK_LOG_URL_STDOUT", s"${baseUrl}stdout")
// And here, at last, is where CoarseGrainedExecutorBackend is launched
process = builder.start()
val header = "Spark Executor Command: %s\n%s\n\n".format(
formattedCommand, "=" * 40)
// Redirect its stdout and stderr to files
// Write standard output to $SPARK_HOME/work/<appId>/<execId>/stdout
val stdout = new File(executorDir, "stdout")
stdoutAppender = FileAppender(process.getInputStream, stdout, conf)
// Write standard error to $SPARK_HOME/work/<appId>/<execId>/stderr
val stderr = new File(executorDir, "stderr")
Files.write(header, stderr, UTF_8)
stderrAppender = FileAppender(process.getErrorStream, stderr, conf)
// Wait for it to exit; executor may exit with code 0 (when driver instructs it to shutdown) or with nonzero exit code
// process.waitFor() blocks the current thread until the process terminates (unless the thread is interrupted)
val exitCode = process.waitFor()
state = ExecutorState.EXITED
val message = "Command exited with code " + exitCode
worker.send(ExecutorStateChanged(appId, execId, state, Some(message), Some(exitCode)))
} catch {
case interrupted: InterruptedException => {
logInfo("Runner thread for executor " + fullId + " interrupted")
state = ExecutorState.KILLED
killProcess(None)
}
case e: Exception => {
logError("Error running executor", e)
state = ExecutorState.FAILED
killProcess(Some(e.toString))
}
}
}
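The launch-and-wait part of `fetchAndRunExecutor` can be reproduced with nothing but the JDK. In this hypothetical sketch, `java -version` stands in for the real executor command. `ProcessBuilder.Redirect` replaces Spark's `FileAppender` threads. The header written to `stderr` first mirrors the one above.

```scala
import java.io.File
import java.nio.charset.StandardCharsets.UTF_8
import java.nio.file.Files

// Hypothetical sketch; `java -version` is an assumed stand-in for the executor command.
object LaunchSketch {
  def launch(workDir: File): (Int, File, File) = {
    val javaBin = new File(new File(System.getProperty("java.home"), "bin"), "java").getAbsolutePath
    val builder = new ProcessBuilder(javaBin, "-version")
    builder.directory(workDir)                                // like builder.directory(executorDir)
    builder.environment().put("SPARK_LAUNCH_WITH_SCALA", "0")
    val stdout = new File(workDir, "stdout")
    val stderr = new File(workDir, "stderr")
    // Header first, then the child's stderr is appended after it.
    val header = "Spark Executor Command: " + builder.command() + "\n" + "=" * 40 + "\n\n"
    Files.write(stderr.toPath, header.getBytes(UTF_8))
    // Spark copies the streams with FileAppender threads; file redirects are a simpler stand-in.
    builder.redirectOutput(ProcessBuilder.Redirect.to(stdout))
    builder.redirectError(ProcessBuilder.Redirect.appendTo(stderr))
    val process = builder.start()
    val exitCode = process.waitFor()                          // blocks until the child exits
    (exitCode, stdout, stderr)
  }
}
```

Because `waitFor()` blocks, this runs on the dedicated ExecutorRunner thread in Spark. Interrupting that thread is what lands in the `InterruptedException` branch above.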
21. CoarseGrainedExecutorBackend's main() parses the arguments passed from the Worker and then calls run()
private[spark] object CoarseGrainedExecutorBackend extends Logging {
…..
def main(args: Array[String]) {
var driverUrl: String = null // URL of CoarseGrainedSchedulerBackend's DriverEndpoint
var executorId: String = null // id of the ExecutorDesc: an auto-incrementing number, one per CoarseGrainedExecutorBackend
var hostname: String = null // IP of the worker
var cores: Int = 0 // determined by SparkConf "spark.executor.cores" (I set it to 1, so it is 1); if unset, only one CoarseGrainedExecutorBackend is started and it gets all of the worker's available cores
var appId: String = null // app-20180503193934-0000
var workerUrl: Option[String] = None // spark://Worker@<worker-host>:44713
val userClassPath = new mutable.ListBuffer[URL]()
var argv = args.toList
// Parse the arguments into the local variables
while (!argv.isEmpty) {
argv match {
case ("--driver-url") :: value :: tail =>
driverUrl = value
argv = tail
case ("--executor-id") :: value :: tail =>
executorId = value
argv = tail
...
printUsageAndExit()
}
}
// If any required value is missing, print usage and exit main
if (driverUrl == null || executorId == null || hostname == null || cores <= 0 ||
appId == null) {
printUsageAndExit()
}
run(driverUrl, executorId, hostname, cores, appId, workerUrl, userClassPath)
}
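`main` walks the argument list with List pattern matching, consuming an option and its value per step. Below is a hypothetical, immutable variant of that loop. It returns `Either` instead of calling `printUsageAndExit()`, and covers only the six options visible in the launch command.

```scala
// Hypothetical sketch of main()'s parsing style; not Spark's code.
object ArgsSketch {
  final case class Parsed(driverUrl: String = null, executorId: String = null,
                          hostname: String = null, cores: Int = 0,
                          appId: String = null, workerUrl: Option[String] = None)

  @annotation.tailrec
  def parse(args: List[String], acc: Parsed = Parsed()): Either[String, Parsed] = args match {
    case "--driver-url" :: value :: tail  => parse(tail, acc.copy(driverUrl = value))
    case "--executor-id" :: value :: tail => parse(tail, acc.copy(executorId = value))
    case "--hostname" :: value :: tail    => parse(tail, acc.copy(hostname = value))
    case "--cores" :: value :: tail       => parse(tail, acc.copy(cores = value.toInt))
    case "--app-id" :: value :: tail      => parse(tail, acc.copy(appId = value))
    case "--worker-url" :: value :: tail  => parse(tail, acc.copy(workerUrl = Some(value)))
    case Nil =>
      // Same required-argument check as main(); workerUrl stays optional.
      if (acc.driverUrl == null || acc.executorId == null || acc.hostname == null ||
          acc.cores <= 0 || acc.appId == null) Left("missing required argument")
      else Right(acc)
    case other => Left("unrecognized option: " + other.head)
  }
}
```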
22. What run() initializes
private[spark] object CoarseGrainedExecutorBackend extends Logging {
/**
* "--driver-url" "spark://CoarseGrainedScheduler@<driver-host>:56522" is CoarseGrainedSchedulerBackend's DriverEndpoint
* "--executor-id" "4" // id of the ExecutorDesc: an auto-incrementing number, one per CoarseGrainedExecutorBackend
* "--hostname" "192.168.1.153"
* "--cores" "1" // determined by SparkConf "spark.executor.cores" (I set it to 1, so it is 1);
* if unset, only one CoarseGrainedExecutorBackend is started and it gets all of the worker's available cores
* "--app-id" "app-20180508234845-0000"
* "--worker-url" "spark://Worker@<worker-host>:53403"
*/
private def run(
driverUrl: String,
executorId: String,
hostname: String,
cores: Int,
appId: String,
workerUrl: Option[String],
userClassPath: Seq[URL]) {
// Logs Linux signals such as INT (sent on Ctrl+C); such a signal still terminates the CoarseGrainedExecutorBackend process
SignalLogger.register(log)
SparkHadoopUtil.get.runAsSparkUser { () =>
// Debug code
Utils.checkHost(hostname)
// Bootstrap to fetch the driver's Spark properties.
val executorConf = new SparkConf
val port = executorConf.getInt("spark.executor.port", 0)
// Create an RpcEnv (the equivalent of creating an ActorSystem) named driverPropsFetcher
val fetcher = RpcEnv.create(
"driverPropsFetcher",
hostname,
port,
executorConf,
new SecurityManager(executorConf),
clientMode = true)
// Get a reference to CoarseGrainedSchedulerBackend's DriverEndpoint
val driver = fetcher.setupEndpointRefByURI(driverUrl)
// The driver replies with a Seq[(String, String)] holding every SparkConf property whose key starts with "spark"; ("spark.app.id", "app-20180508234845-0000") is appended to that Seq
val props = driver.askWithRetry[Seq[(String, String)]](RetrieveSparkProps) ++
Seq[(String, String)](("spark.app.id", appId))
// Then shut down the fetcher's RpcEnv
fetcher.shutdown()
// Create SparkEnv using properties we fetched from the driver.
// Create a new default SparkConf() and copy the Seq[(String, String)] obtained from the DriverEndpoint into it
val driverConf = new SparkConf()
for ((key, value) <- props) {
// this is required for SSL in standalone mode
if (SparkConf.isExecutorStartupConf(key)) {
driverConf.setIfMissing(key, value)
} else {
driverConf.set(key, value)
}
}
if (driverConf.contains("spark.yarn.credentials.file")) {
logInfo("Will periodically update credentials from: " +
driverConf.get("spark.yarn.credentials.file"))
SparkHadoopUtil.get.startExecutorDelegationTokenRenewer(driverConf)
}
// Create the SparkEnv for CoarseGrainedExecutorBackend; its RpcEnv is created in client mode, so rpcEnv.address is null
val env = SparkEnv.createExecutorEnv(
driverConf, executorId, hostname, port, cores, isLocal = false)
// SparkEnv will set spark.executor.port if the rpc env is listening for incoming
// connections (e.g., if it's using akka). Otherwise, the executor is running in
// client mode only, and does not accept incoming connections.
val sparkHostPort = env.conf.getOption("spark.executor.port").map { port =>
hostname + ":" + port
}.orNull
/**
* Registers the "Executor" endpoint on this executor's rpcEnv, passing:
* driverUrl "spark://CoarseGrainedScheduler@<driver-host>:56522": CoarseGrainedSchedulerBackend's DriverEndpoint
* executorId "4": id of the ExecutorDesc, an auto-incrementing number, one per CoarseGrainedExecutorBackend
* sparkHostPort: the RpcEnv was created in client mode, so rpcEnv.address is null and this value is null
* cores "1": determined by SparkConf "spark.executor.cores" (I set it to 1, so it is 1); if unset, only one CoarseGrainedExecutorBackend is started and it gets all of the worker's available cores
* userClassPath: an empty collection
* env: the SparkEnv handed to the CoarseGrainedExecutorBackend instance
*/
env.rpcEnv.setupEndpoint("Executor", new CoarseGrainedExecutorBackend(
env.rpcEnv, driverUrl, executorId, sparkHostPort, cores, userClassPath, env))
// A WorkerWatcher is also set up, url: "spark://Worker@<worker-host>:53403"
workerUrl.foreach { url =>
env.rpcEnv.setupEndpoint("WorkerWatcher", new WorkerWatcher(env.rpcEnv, url))
}
env.rpcEnv.awaitTermination()
SparkHadoopUtil.get.stopExecutorDelegationTokenRenewer()
}
}
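The property loop in `run` follows one subtle rule. For executor startup settings, the locally configured value wins (`setIfMissing`). For everything else, the driver's value overwrites (`set`). Below is a sketch with plain maps, where "local" stands for what `new SparkConf()` already picked up (e.g. `-D` options on the launch command). `isStartupConf` is a simplified, assumed stand-in for `SparkConf.isExecutorStartupConf`, not its real definition.

```scala
import scala.collection.mutable

// Hypothetical sketch of the driverConf merge; not Spark's code.
object PropsMergeSketch {
  // Assumption: a simplified predicate; the real isExecutorStartupConf checks more keys.
  def isStartupConf(key: String): Boolean =
    key.startsWith("spark.ssl") || key.startsWith("spark.auth") || key.startsWith("spark.rpc")

  def merge(local: Map[String, String], fetched: Seq[(String, String)], appId: String): Map[String, String] = {
    val conf = mutable.LinkedHashMap(local.toSeq: _*)
    for ((key, value) <- fetched ++ Seq("spark.app.id" -> appId)) {
      if (isStartupConf(key)) { if (!conf.contains(key)) conf(key) = value } // like setIfMissing
      else conf(key) = value                                                  // like set (overwrite)
    }
    conf.toMap
  }
}
```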
23. CoarseGrainedExecutorBackend is instantiated; it is also an RpcEndpoint
/** This instance is created from CoarseGrainedExecutorBackend's main() (via run()), with:
* rpcEnv: the executor's rpcEnv
* driverUrl "spark://CoarseGrainedScheduler@<driver-host>:56522": CoarseGrainedSchedulerBackend's DriverEndpoint
* executorId "4": id of the ExecutorDesc, an auto-incrementing number, one per CoarseGrainedExecutorBackend
* hostPort: the RpcEnv was created in client mode, so rpcEnv.address is null and this value is null
* cores "1": determined by SparkConf "spark.executor.cores" (I set it to 1, so it is 1); if unset, only one CoarseGrainedExecutorBackend is started and it gets all of the worker's available cores
* userClassPath: an empty collection
* env: the SparkEnv handed to the CoarseGrainedExecutorBackend instance
*/
private[spark] class CoarseGrainedExecutorBackend(
override val rpcEnv: RpcEnv,
driverUrl: String,
executorId: String,
hostPort: String,
cores: Int,
userClassPath: Seq[URL],
env: SparkEnv)
extends ThreadSafeRpcEndpoint with ExecutorBackend with Logging {
var executor: Executor = null
// the reference to the DriverEndpoint
@volatile var driver: Option[RpcEndpointRef] = None
// If this CoarseGrainedExecutorBackend is changed to support multiple threads, then this may need
// to be changed so that we don't share the serializer instance across threads
private[this] val ser: SerializerInstance = env.closureSerializer.newInstance()
override def onStart() {
// driverUrl: spark://CoarseGrainedScheduler@<driver-host>:49972; the RpcEndpointRef obtained here is the DriverEndpoint reference
logInfo("Connecting to driver: " + driverUrl)
rpcEnv.asyncSetupEndpointRefByURI(driverUrl).flatMap { ref =>
// This is a very fast action so we can use "ThreadUtils.sameThread"
driver = Some(ref)
// Notifies the DriverEndpoint, which replies with RegisteredExecutor so that CoarseGrainedExecutorBackend creates an Executor
/* executorId "4": id of the ExecutorDesc, an auto-incrementing number, one per CoarseGrainedExecutorBackend
self: this CoarseGrainedExecutorBackend
hostPort: the RpcEnv was created in client mode, so rpcEnv.address is null and this value is null
cores: determined by SparkConf "spark.executor.cores" (I set it to 1, so it is 1); if unset, only one CoarseGrainedExecutorBackend is started and it gets all of the worker's available cores
extractLogUrls: takes the environment variables whose keys start with "SPARK_LOG_URL_", strips that prefix from each key (lower-casing the remainder) and keeps the values unchanged
*/
ref.ask[RegisterExecutorResponse](
RegisterExecutor(executorId, self, hostPort, cores, extractLogUrls))
}(ThreadUtils.sameThread).onComplete {
// This is a very fast action so we can use "ThreadUtils.sameThread"
case Success(msg) => Utils.tryLogNonFatalError {
Option(self).foreach(_.send(msg)) // msg must be RegisterExecutorResponse
}
case Failure(e) => {
logError(s"Cannot register with driver: $driverUrl", e)
System.exit(1)
}
}(ThreadUtils.sameThread)
}
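`onStart` chains two asynchronous steps and forwards the reply to itself. First it resolves the driver ref, then it `ask`s that ref to register. Below is a hypothetical sketch with plain `scala.concurrent.Future`s. `String`s stand in for endpoint refs and messages. The `sameThread` context imitates the idea behind `ThreadUtils.sameThread`. The failure branch reports instead of calling `System.exit(1)`.

```scala
import scala.concurrent.{ExecutionContext, Future}
import scala.util.{Failure, Success}

// Hypothetical sketch of onStart's control flow; not Spark's RPC API.
object RegisterSketch {
  // Runs callbacks on the calling thread, in the spirit of ThreadUtils.sameThread.
  val sameThread: ExecutionContext = new ExecutionContext {
    override def execute(runnable: Runnable): Unit = runnable.run()
    override def reportFailure(cause: Throwable): Unit = cause.printStackTrace()
  }

  def register(resolve: String => Future[String],          // like rpcEnv.asyncSetupEndpointRefByURI
               ask: (String, String) => Future[String],     // like ref.ask[RegisterExecutorResponse]
               driverUrl: String, executorId: String)
              (onReply: String => Unit): Future[String] = {
    val reply = resolve(driverUrl).flatMap { ref =>
      ask(ref, s"RegisterExecutor($executorId)")
    }(sameThread)
    reply.onComplete {
      case Success(msg) => onReply(msg)                       // Spark: self.send(msg)
      case Failure(e)   => onReply("FAILED: " + e.getMessage) // Spark: logError(...); System.exit(1)
    }(sameThread)
    reply
  }
}
```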
24. CoarseGrainedExecutorBackend communicates with the DriverEndpoint (of CoarseGrainedSchedulerBackend), sending RegisterExecutor so that an Executor gets created
class DriverEndpoint(override val rpcEnv: RpcEnv, sparkProperties: Seq[(String, String)])
extends ThreadSafeRpcEndpoint with Logging {
...
override def receiveAndReply(context: RpcCallContext): PartialFunction[Any, Unit] = {
// This case is invoked when a CoarseGrainedExecutorBackend is initializing
/* executorId "4": id of the ExecutorDesc, an auto-incrementing number, one per CoarseGrainedExecutorBackend
executorRef: the CoarseGrainedExecutorBackend
hostPort: the executor's RpcEnv was created in client mode, so rpcEnv.address is null and this value is null
cores: determined by SparkConf "spark.executor.cores" (I set it to 1, so it is 1); if unset, only one CoarseGrainedExecutorBackend is started and it gets all of the worker's available cores
logUrls (from extractLogUrls): the environment variables whose keys start with "SPARK_LOG_URL_", with that prefix stripped from each key (the rest lower-cased) and the values unchanged, e.g.
(stdout,http://192.168.1.154:8081/logPage/?appId=app-20180516150725-0000&executorId=1&logType=stdout)
(stderr,http://192.168.1.154:8081/logPage/?appId=app-20180516150725-0000&executorId=1&logType=stderr)
*/
case RegisterExecutor(executorId, executorRef, hostPort, cores, logUrls) =>
// executorDataMap: HashMap[String, ExecutorData]; empty at first
if (executorDataMap.contains(executorId)) {
context.reply(RegisterExecutorFailed("Duplicate executor ID: " + executorId))
} else {
// If the executor's rpc env is not listening for incoming connections, `hostPort`
// will be null, and the client connection should be used to contact the executor.
val executorAddress = if (executorRef.address != null) {
executorRef.address
} else {
// Standalone client mode takes this branch and uses the sender's (the CoarseGrainedExecutorBackend's) RpcAddress, e.g. "luyl155:53561"
context.senderAddress
}
// 17/11/12 20:31:22 INFO cluster.SparkDeploySchedulerBackend: Registered executor NettyRpcEndpointRef(null) (luyl155:53561) with ID 2
logInfo(s"Registered executor $executorRef ($executorAddress) with ID $executorId")
// addressToExecutorId: HashMap[RpcAddress, String]; the key is the CoarseGrainedExecutorBackend's RpcAddress, the value is its executor id
addressToExecutorId(executorAddress) = executorId
// totalCoreCount: AtomicInteger(0), the sum of cores across all CoarseGrainedExecutorBackends
totalCoreCount.addAndGet(cores)
// totalRegisteredExecutors: AtomicInteger(0), the number of registered CoarseGrainedExecutorBackends
totalRegisteredExecutors.addAndGet(1)
/* executorRef: the CoarseGrainedExecutorBackend
executorRef.address: null, because the executor's RpcEnv was created in client mode
executorAddress.host: IP of the worker hosting the CoarseGrainedExecutorBackend
cores: number of cores this CoarseGrainedExecutorBackend owns
logUrls: the SPARK_LOG_URL_* environment entries, with the prefix stripped from each key
*/
val data = new ExecutorData(executorRef, executorRef.address, executorAddress.host,
cores, cores, logUrls)
// This must be synchronized because variables mutated in this block are read when requesting executors
CoarseGrainedSchedulerBackend.this.synchronized {
// executorDataMap: HashMap[String, ExecutorData]; store the executor id with its ExecutorData (ref, IP address, number of cores)
executorDataMap.put(executorId, data)
// numPendingExecutors starts at 0
if (numPendingExecutors > 0) {
numPendingExecutors -= 1
logDebug(s"Decremented number of pending executors ($numPendingExecutors left)")
}
}
// Note: some tests expect the reply to come after we put the executor in the map
// Tells CoarseGrainedExecutorBackend to initialize its Executor (and thread pool); makeOffers() follows
// executorAddress.host: IP of the host running the CoarseGrainedExecutorBackend
context.reply(RegisteredExecutor(executorAddress.host))
listenerBus.post(
SparkListenerExecutorAdded(System.currentTimeMillis(), executorId, data))
// makeOffers() is what gets tasks running; analyzed later
makeOffers()
}
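Stripped of RPC types, the `RegisterExecutor` branch comes down to four steps: a duplicate check, an address fallback, counter updates, and a map update. Below is a hypothetical model. `RegistrySketch` uses `"host:port"` strings as addresses. `None` for the ref's address means the executor's RpcEnv is client-only, so the sender's address is used.

```scala
import java.util.concurrent.atomic.AtomicInteger
import scala.collection.mutable

// Hypothetical, simplified model of DriverEndpoint's RegisterExecutor bookkeeping.
object RegistrySketch {
  final case class ExecutorData(address: String, host: String, totalCores: Int, freeCores: Int)

  val executorDataMap = mutable.HashMap[String, ExecutorData]()
  val addressToExecutorId = mutable.HashMap[String, String]()
  val totalCoreCount = new AtomicInteger(0)
  val totalRegisteredExecutors = new AtomicInteger(0)

  // Left = RegisterExecutorFailed message, Right = host sent back as RegisteredExecutor(host)
  def register(executorId: String, refAddress: Option[String], senderAddress: String,
               cores: Int): Either[String, String] = synchronized {
    if (executorDataMap.contains(executorId)) Left("Duplicate executor ID: " + executorId)
    else {
      val address = refAddress.getOrElse(senderAddress) // client-mode fallback
      val host = address.takeWhile(_ != ':')
      addressToExecutorId(address) = executorId
      totalCoreCount.addAndGet(cores)
      totalRegisteredExecutors.incrementAndGet()
      executorDataMap(executorId) = ExecutorData(address, host, cores, cores)
      Right(host)
    }
  }
}
```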
25. The DriverEndpoint replies to CoarseGrainedExecutorBackend with a RegisteredExecutor message, upon which CoarseGrainedExecutorBackend instantiates an Executor
private[spark] class CoarseGrainedExecutorBackend(
override val rpcEnv: RpcEnv,
driverUrl: String,
executorId: String,
hostPort: String,
cores: Int,
userClassPath: Seq[URL],
env: SparkEnv)
extends ThreadSafeRpcEndpoint with ExecutorBackend with Logging {
……
override def receive: PartialFunction[Any, Unit] = {
// hostname is executorAddress.host: IP of the host running the CoarseGrainedExecutorBackend
case RegisteredExecutor(hostname) =>
logInfo("Successfully registered with driver")
/** executorId: id of the ExecutorDesc, an auto-incrementing number, one per CoarseGrainedExecutorBackend
* hostname: IP of the host running the CoarseGrainedExecutorBackend
* env: the SparkEnv handed to the CoarseGrainedExecutorBackend instance
* userClassPath: an empty collection
*/
executor = new Executor(executorId, hostname, env, userClassPath, isLocal = false)
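On the executor side, `receive` is just a pattern match on the driver's reply. Below is a hypothetical sketch with stand-in message classes and a `FakeExecutor` in place of `Executor`. As I read the 1.6 source, Spark logs a registration failure and exits the JVM. This version throws instead, so it can be exercised.

```scala
// Hypothetical sketch of dispatching on registration replies; not Spark's classes.
object ReceiveSketch {
  sealed trait RegisterExecutorResponse
  final case class RegisteredExecutor(hostname: String) extends RegisterExecutorResponse
  final case class RegisterExecutorFailed(message: String) extends RegisterExecutorResponse

  // Stand-in for org.apache.spark.executor.Executor
  final class FakeExecutor(val executorId: String, val hostname: String)

  var executor: FakeExecutor = null

  val receive: PartialFunction[Any, Unit] = {
    case RegisteredExecutor(hostname) =>
      executor = new FakeExecutor("4", hostname)   // Spark: new Executor(executorId, hostname, env, ...)
    case RegisterExecutorFailed(message) =>
      throw new IllegalStateException("Slave registration failed: " + message)
  }
}
```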
26. The Executor initialization process, which calls BlockManager.initialize()
private[spark] class Executor(
executorId: String,
executorHostname: String,
env: SparkEnv,
userClassPath: Seq[URL] = Nil,
isLocal: Boolean = false)
extends Logging {
logInfo(s"Starting executor ID $executorId on host $executorHostname")
// Application dependencies (added through SparkContext) that we've fetched so far on this node. Each map holds the master's timestamp for the version of that file or JAR we got.
// These presumably track the dependencies given to the Executor via --jars and --files
private val currentFiles: HashMap[String, Long] = new HashMap[String, Long]()
private val currentJars: HashMap[String, Long] = new HashMap[String, Long]()
// An empty ByteBuffer: wrap() sets its limit and capacity to the array length (0) and its position to 0, with no mark set
private val EMPTY_BYTE_BUFFER = ByteBuffer.wrap(new Array[Byte](0))
// Built when CoarseGrainedExecutorBackend's main() ran; it holds only the properties whose keys start with "spark"
private val conf = env.conf
// No ip or host:port - just hostname
Utils.checkHost(executorHostname, "Expected executed slave to be a hostname")
// must not have port specified.
assert (0 == Utils.parseHostPort(executorHostname)._2)
// Make sure the local hostname we report matches the cluster scheduler's name for this host
Utils.setCustomHostname(executorHostname)
// isLocal defaults to false
if (!isLocal) {
// Setup an uncaught exception handler for non-local mode.
// Make any thread terminations due to uncaught exceptions kill the entire
// executor process to avoid surprising stalls.
Thread.setDefaultUncaughtExceptionHandler(SparkUncaughtExceptionHandler)
}
// Start worker thread pool (a cached thread pool).
private val threadPool = ThreadUtils.newDaemonCachedThreadPool("Executor task launch worker")
// ExecutorSource reports executor metrics to the metrics system.
private val executorSource = new ExecutorSource(threadPool, executorId)
if (!isLocal) {
env.metricsSystem.registerSource(executorSource)
// Called once when the Executor initializes; the driver also calls it during SparkContext initialization
// conf.getAppId was set when CoarseGrainedExecutorBackend's main() ran; its value is app-20180508234845-0000
// (See spark-core_28: Executor initialization, env.blockManager.initialize(conf.getAppId) - NettyBlockTransferService.init() source analysis)
env.blockManager.initialize(conf.getAppId)}
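`ThreadUtils.newDaemonCachedThreadPool` gives the Executor a cached pool of named daemon threads. Below is a hand-rolled equivalent using only `java.util.concurrent`. It is hypothetical; Spark builds its thread factory differently. The point is the combination: a cached pool grows and shrinks with the number of running tasks, and daemon threads do not keep the executor JVM alive on their own.

```scala
import java.util.concurrent.{ExecutorService, Executors, ThreadFactory}
import java.util.concurrent.atomic.AtomicInteger

// Hypothetical sketch of a named, daemon, cached thread pool.
object DaemonPoolSketch {
  def newDaemonCachedThreadPool(prefix: String): ExecutorService = {
    val counter = new AtomicInteger(0)
    val factory = new ThreadFactory {
      override def newThread(r: Runnable): Thread = {
        val t = new Thread(r, prefix + "-" + counter.getAndIncrement())
        t.setDaemon(true) // daemon threads do not block JVM shutdown
        t
      }
    }
    Executors.newCachedThreadPool(factory)
  }
}
```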
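===> Next, the Executor sends heartbeats and metrics to the driver
// must be initialized before running startDriverHeartbeat()
// The HeartbeatReceiver endpoint is created during SparkContext initialization; HeartbeatReceiver.ENDPOINT_NAME is "HeartbeatReceiver"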
private val heartbeatReceiverRef =
RpcUtils.makeDriverRef(HeartbeatReceiver.ENDPOINT_NAME, conf, env.rpcEnv)
startDriverHeartbeater()
Next, a closer look at the heartbeat the Executor sends every 10s
/**
* Schedules a task to report heartbeat and partial metrics for active tasks to driver.
*/
private def startDriverHeartbeater(): Unit = {
// Defaults to 10 seconds, converted to milliseconds
val intervalMs = conf.getTimeAsMs("spark.executor.heartbeatInterval", "10s")
// Wait a random interval so the heartbeats don't end up in sync
// The initial delay falls in [intervalMs, 2 * intervalMs), i.e. under 20s with the default
val initialDelay = intervalMs + (math.random * intervalMs).asInstanceOf[Int]
val heartbeatTask = new Runnable() {
override def run(): Unit = Utils.logUncaughtExceptions(reportHeartBeat())
}
// First run after an initial delay of less than 20s, then every 10s
heartbeater.scheduleAtFixedRate(heartbeatTask, initialDelay, intervalMs, TimeUnit.MILLISECONDS)
}
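The scheduling arithmetic is easy to check in isolation. With the default 10s interval, the first heartbeat fires somewhere in [10s, 20s), then every 10s after that. Below is a hypothetical sketch with a plain single-thread `ScheduledExecutorService`, and `ThreadLocalRandom` in place of `math.random`.

```scala
import java.util.concurrent.{Executors, ScheduledExecutorService, ThreadLocalRandom, TimeUnit}

// Hypothetical sketch of startDriverHeartbeater's scheduling; not Spark's code.
object HeartbeatSketch {
  // For random in [0, 1) the result lies in [intervalMs, 2 * intervalMs).
  def initialDelay(intervalMs: Long, random: Double): Long =
    intervalMs + (random * intervalMs).toLong

  def start(intervalMs: Long, beat: () => Unit): ScheduledExecutorService = {
    val scheduler = Executors.newSingleThreadScheduledExecutor()
    val task = new Runnable { override def run(): Unit = beat() }
    val delay = initialDelay(intervalMs, ThreadLocalRandom.current().nextDouble())
    scheduler.scheduleAtFixedRate(task, delay, intervalMs, TimeUnit.MILLISECONDS)
    scheduler
  }
}
```

The random offset is the design choice worth noting. Without it, executors launched together would heartbeat at the same instants and hit the driver's HeartbeatReceiver in bursts.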