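Comparing how Apps on Yarn control task failures, from a source-code perspective

Yarn, MRV2 and Spark all have settings for retrying after a failure. Yarn is a resource-management framework, while MRV2 and Spark are compute frameworks, so how do their retry mechanisms differ? Which parameters control the retries? And what effect do those parameters have when Spark runs on Yarn?

AM failure retries in Yarn

As a resource-management framework, Yarn has the RM manage the AM (ApplicationMaster), while the individual tasks are the AM's own responsibility. Yarn therefore retries a Job only at the AM level. The parameter is yarn.resourcemanager.am.max-attempts (or, under its older name, yarn.resourcemanager.am.max-retries), with a default of 2: if a Job's AM dies, the RM allocates a new container and restarts the AM once. Failed containers, by contrast, are handled by the specific AppMaster implementation. The check for this parameter looks like this: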

// RMAppImpl.java

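public RMAppState transition(RMAppImpl app, RMAppEvent event) {
  int numberOfFailure = app.getNumFailedAppAttempts();
  // other code: while numberOfFailure is still below app.maxAppAttempts,
  // a new AM attempt is started here instead of failing the application
  if (numberOfFailure >= app.maxAppAttempts) {
    app.isNumAttemptsBeyondThreshold = true;
  }
  // Persist the final state, then move the application to FAILED
  app.rememberTargetTransitionsAndStoreState(event,
      new AttemptFailedFinalStateSavedTransition(),
      RMAppState.FAILED, RMAppState.FAILED);
  // return statement omitted in this excerpt
}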

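Note: a failed Job does not necessarily trigger this retry. A Job failing does not mean its AM failed; a Job can fail for many reasons.

Task failure retries in MRV2

For the number of AM failures, Yarn lets each application set its own limit, overriding Yarn's default. In MRV2 this is the mapreduce.am.max-attempts parameter, whose default is also 2. The effective AM limit is decided by it together with yarn.resourcemanager.am.max-attempts, using the following logic: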

// RMAppImpl.java

// yarn.resourcemanager.am.max-attempts
int globalMaxAppAttempts = conf.getInt(YarnConfiguration.RM_AM_MAX_ATTEMPTS, YarnConfiguration.DEFAULT_RM_AM_MAX_ATTEMPTS);
// mapreduce.am.max-attempts, or spark.yarn.maxAppAttempts in Spark
int individualMaxAppAttempts = submissionContext.getMaxAppAttempts();
if (individualMaxAppAttempts <= 0 || individualMaxAppAttempts > globalMaxAppAttempts) {
  this.maxAppAttempts = globalMaxAppAttempts;
  LOG.warn("The specific max attempts: " + individualMaxAppAttempts
          + " for application: " + applicationId.getId()
          + " is invalid, because it is out of the range [1, "
          + globalMaxAppAttempts + "]. Use the global max attempts instead.");
} else {
  this.maxAppAttempts = individualMaxAppAttempts;
}

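That is, if the value the user sets in MRV2 or Spark lies between 1 and yarn.resourcemanager.am.max-attempts (inclusive), the user's value is used; otherwise the value of yarn.resourcemanager.am.max-attempts (explicitly set or default) is used.

The AM controls the execution of each Job, and a Job consists of Map Tasks and Reduce Tasks, so whether a Job fails depends on its Tasks. MRV2 provides mapreduce.map.maxattempts and mapreduce.reduce.maxattempts to limit the number of attempts of an MR Task. Both default to 4, but in Uber mode both are set to 1.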
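
The range check and the retry decision can be condensed into a small stand-alone sketch. It is written in Java to match the Hadoop excerpts; the class AmAttemptLimits and its two methods are hypothetical names invented for illustration, not Hadoop APIs, and the real RMAppImpl transition takes more conditions into account than shown here.

```java
// Illustrative sketch only: AmAttemptLimits and its methods are hypothetical
// names, not part of the Hadoop API.
final class AmAttemptLimits {

    // Mirrors the range check in the RMAppImpl excerpt: a per-application
    // value is honoured only when it lies in [1, globalMax].
    static int resolveMaxAppAttempts(int individualMax, int globalMax) {
        if (individualMax <= 0 || individualMax > globalMax) {
            return globalMax;
        }
        return individualMax;
    }

    // Simplified assumption: another AM attempt is started only while the
    // number of failed attempts is still below the effective limit.
    static boolean shouldStartNewAttempt(int failedAttempts, int maxAppAttempts) {
        return failedAttempts < maxAppAttempts;
    }

    public static void main(String[] args) {
        int global = 2; // assumed value of yarn.resourcemanager.am.max-attempts
        System.out.println(resolveMaxAppAttempts(1, global)); // within range: user value
        System.out.println(resolveMaxAppAttempts(5, global)); // above the cap: global value
        System.out.println(resolveMaxAppAttempts(0, global)); // unset or invalid: global value
        System.out.println(shouldStartNewAttempt(1, global)); // one failure, limit 2
    }
}
```

With a global limit of 2, a per-application value of 5 is silently capped to 2, so raising only the application-side parameter does not buy more AM attempts.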

// ReduceTaskImpl.java
protected int getMaxAttempts() {
  return conf.getInt(MRJobConfig.REDUCE_MAX_ATTEMPTS, 4);
}

// MapTaskImpl.java
protected int getMaxAttempts() {
  return conf.getInt(MRJobConfig.MAP_MAX_ATTEMPTS, 4);
}

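These two parameters mean that a single map task or reduce task may be attempted at most 4 times. If a task has still not succeeded after 4 attempts, that Task has failed, and so has the whole Job. Because the AM itself is healthy in this case, Yarn does not retry the Job.

Both parameters apply to a single task, not to the total number of attempts across all tasks. So if several tasks fail, the Job does not fail as long as no single task reaches 4 failed attempts. This is why you sometimes see a Job with dozens or even hundreds of failed attempts that still finishes successfully. The relevant code: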

// TaskImpl.java

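// TaskImpl is an abstract class with separate Map and Reduce implementations. It represents
// a single Task, so the count checked here is that Task's own number of failed attempts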
if (attemptState == TaskAttemptState.FAILED) {
  failedAttempts.add(attempt.getID());
  if (failedAttempts.size() >= maxAttempts) {
    taces = TaskAttemptCompletionEventStatus.TIPFAILED;
  }
}
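
To make the per-task nature of the limit concrete, here is a minimal simulation in Java. TaskAttemptTracker is a hypothetical class written for this illustration and only imitates the counting in the excerpt above; it is not how the MR AppMaster is actually structured.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch only: TaskAttemptTracker is a hypothetical class.
final class TaskAttemptTracker {
    private final int maxAttempts;
    private final Map<String, Integer> failedAttemptsByTask = new HashMap<>();
    private int totalFailures = 0;

    TaskAttemptTracker(int maxAttempts) {
        this.maxAttempts = maxAttempts;
    }

    // Records one failed attempt and returns true when that particular task
    // has used up all of its attempts (which would fail the whole Job).
    boolean recordFailure(String taskId) {
        totalFailures++;
        int failures = failedAttemptsByTask.merge(taskId, 1, Integer::sum);
        return failures >= maxAttempts;
    }

    int totalFailures() {
        return totalFailures;
    }

    public static void main(String[] args) {
        TaskAttemptTracker tracker = new TaskAttemptTracker(4);
        boolean jobFailed = false;
        // 30 tasks fail 3 times each: 90 failures overall, none reaches 4.
        for (int task = 0; task < 30; task++) {
            for (int attempt = 0; attempt < 3; attempt++) {
                jobFailed |= tracker.recordFailure("task_" + task);
            }
        }
        System.out.println(tracker.totalFailures() + " failures, job failed: " + jobFailed);
        // A fourth failure of one single task is what fails the Job.
        jobFailed |= tracker.recordFailure("task_0");
        System.out.println(tracker.totalFailures() + " failures, job failed: " + jobFailed);
    }
}
```

The first line printed reports 90 failures with the job still alive; only the 91st failure, the fourth for task_0, flips the result.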

Spark on Yarn

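With Spark on Yarn, Yarn only launches and manages the AM and allocates resources. Spark has its own AM implementation, and once the Executors are running, tasks are controlled by the Driver. So Yarn is responsible only for retrying the AM, and the retry parameters do not conflict.

As in MRV2, Spark can use the spark.yarn.maxAppAttempts parameter to control the number of AM attempts. It has no default: if unset, Yarn's setting applies; if set, it goes through the same check as MRV2's mapreduce.am.max-attempts.

In addition, Spark's ApplicationMaster implementation provides the spark.yarn.max.executor.failures parameter to limit the number of Executor failures. Its default is numExecutors * 2 (with dynamic allocation enabled, twice the maximum number of Executors), and never less than 3. When the number of failed Executors reaches this value, the whole Spark Job fails (Job here means the entire Spark application, not a Job/Stage/Task in the DAG). The logic is as follows: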

// ApplicationMaster.scala

private val maxNumExecutorFailures = {
  val effectiveNumExecutors =
    if (Utils.isDynamicAllocationEnabled(sparkConf)) {
      sparkConf.get(DYN_ALLOCATION_MAX_EXECUTORS)
    } else {
      sparkConf.get(EXECUTOR_INSTANCES).getOrElse(0)
    }
  // By default, effectiveNumExecutors is Int.MaxValue if dynamic allocation is enabled. We need
  // avoid the integer overflow here.
  val defaultMaxNumExecutorFailures = math.max(3,
    if (effectiveNumExecutors > Int.MaxValue / 2) Int.MaxValue else (2 * effectiveNumExecutors))

  sparkConf.get(MAX_EXECUTOR_FAILURES).getOrElse(defaultMaxNumExecutorFailures)
}

// other code ...

// check the failure count against the limit
if (allocator.getNumExecutorsFailed >= maxNumExecutorFailures) {
  finish(FinalApplicationStatus.FAILED,
    ApplicationMaster.EXIT_MAX_EXECUTOR_FAILURES,
    s"Max number of executor failures ($maxNumExecutorFailures) reached")
} else {
  // ...
}

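An Executor can fail for many reasons, such as an OOM or a heartbeat timeout. A failed Task does not necessarily bring down its Executor.

For Task failures, Spark also provides the spark.task.maxFailures parameter, which defaults to 4. The same Task may not fail 4 times, otherwise the Spark Job fails (Job again meaning the Spark application, not a Job in the DAG). This parameter does not limit the total number of Task failures: if several Tasks fail, the Spark Job still succeeds as long as no single Task reaches 4 failures. The relevant code: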
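
The default limit is easy to misread, so here is the same arithmetic as a stand-alone Java sketch. ExecutorFailureLimit and defaultMaxExecutorFailures are hypothetical names; only the formula is taken from the ApplicationMaster excerpt above.

```java
// Illustrative sketch only: ExecutorFailureLimit is a hypothetical class that
// reproduces the default computed in the ApplicationMaster excerpt.
final class ExecutorFailureLimit {

    // max(3, 2 * effectiveNumExecutors), guarding against integer overflow
    // when the executor count is close to Integer.MAX_VALUE.
    static int defaultMaxExecutorFailures(int effectiveNumExecutors) {
        int doubled = effectiveNumExecutors > Integer.MAX_VALUE / 2
                ? Integer.MAX_VALUE
                : 2 * effectiveNumExecutors;
        return Math.max(3, doubled);
    }

    public static void main(String[] args) {
        System.out.println(defaultMaxExecutorFailures(0));                 // floor of 3 applies
        System.out.println(defaultMaxExecutorFailures(1));                 // floor of 3 applies
        System.out.println(defaultMaxExecutorFailures(10));                // 2 * 10
        System.out.println(defaultMaxExecutorFailures(Integer.MAX_VALUE)); // capped, no overflow
    }
}
```

For 10 executors the default is 20 failures. With dynamic allocation enabled and no explicit maximum, the executor count is Int.MaxValue, so the default limit is effectively unbounded unless spark.yarn.max.executor.failures is set.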

// TaskSetManager.scala

// numFailures is an array of size numTasks holding each task's failure count
if (numFailures(index) >= maxTaskFailures) {
  logError("Task %d in stage %s failed %d times; aborting job".format(
    index, taskSet.id, maxTaskFailures))
  abort("Task %d in stage %s failed %d times, most recent failure: %s\nDriver stacktrace:"
    .format(index, taskSet.id, maxTaskFailures, failureReason), failureException)
  return
}

Through the DAGScheduler, a Spark app is broken down into multiple Jobs, Stages and Tasks, but the task retry limit does not depend on the Job or Stage.

Parameter summary

Parameter	Default	Notes	Where to set
yarn.resourcemanager.am.max-attempts (yarn.resourcemanager.am.max-retries)	2	Controls AppMaster retries	Yarn RM
mapreduce.am.max-attempts	2	Overrides Yarn's default number of AppMaster attempts	MRV2 App or hive
mapreduce.map.maxattempts	4	Maximum attempts for a single map task	MRV2 App or hive
mapreduce.reduce.maxattempts	4	Maximum attempts for a single reduce task	MRV2 App or hive
spark.yarn.maxAppAttempts	none	Overrides Yarn's default number of AppMaster attempts	Spark on Yarn App
spark.yarn.max.executor.failures	numExecutors * 2 (twice the maximum number of executors if dynamic allocation is enabled), with a minimum of 3	Maximum number of Spark executor failures before the app fails	Spark on Yarn App
spark.task.maxFailures	4	Maximum number of failures for a single Spark task	Spark on Yarn App or Spark standalone App

The code above is based on Hadoop 2.6.0 and Spark 2.0.0.

Feel free to read and repost; when reposting, please credit the source: https://my.oschina.net/kavn/blog/1543769
