YARN 3.2 Source Code Analysis: The ZK-based HA Mechanism of the ResourceManager

Overview

ResourceManager#serviceInit() method

1. Determine whether HA is enabled. If the yarn.resourcemanager.ha.enabled configuration parameter is true, HA is enabled.

2. If HA is enabled, determine whether automatic failover is enabled. If the yarn.resourcemanager.ha.automatic-failover.enabled configuration parameter is true, automatic failover is enabled. When automatic failover is enabled, an EmbeddedElector is created.
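Both checks are plain boolean lookups on the configuration. A minimal sketch of the equivalent logic (the class name is made up for illustration; the constant names come from YarnConfiguration, and the real HAUtil methods may differ slightly):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class HaConfigChecks {
  // yarn.resourcemanager.ha.enabled, default false
  static boolean isHAEnabled(Configuration conf) {
    return conf.getBoolean(YarnConfiguration.RM_HA_ENABLED,
        YarnConfiguration.DEFAULT_RM_HA_ENABLED);
  }

  // yarn.resourcemanager.ha.automatic-failover.enabled, default true
  static boolean isAutomaticFailoverEnabled(Configuration conf) {
    return conf.getBoolean(YarnConfiguration.AUTO_FAILOVER_ENABLED,
        YarnConfiguration.DEFAULT_AUTO_FAILOVER_ENABLED);
  }
}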

There are two types of EmbeddedElector:

  • CuratorBasedElectorService
  • ActiveStandbyElectorBasedElectorService

If the yarn.resourcemanager.ha.curator-leader-elector.enabled parameter is true, the EmbeddedElector is a CuratorBasedElectorService; this parameter defaults to false.
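The choice between the two is made when the ResourceManager creates the elector. A simplified, paraphrased sketch of the selection logic (the real createEmbeddedElector also sets up and starts the Curator ZK client in the curator-enabled branch; that part is omitted here):

protected EmbeddedElector createEmbeddedElector() throws IOException {
  // yarn.resourcemanager.ha.curator-leader-elector.enabled, default false
  boolean curatorEnabled =
      conf.getBoolean(YarnConfiguration.CURATOR_LEADER_ELECTOR,
          YarnConfiguration.DEFAULT_CURATOR_LEADER_ELECTOR_ENABLED);
  if (curatorEnabled) {
    // Curator-based election, built on the LeaderLatch recipe
    return new CuratorBasedElectorService(this);
  } else {
    // default: election based on ActiveStandbyElector (as used by the HDFS ZKFC)
    return new ActiveStandbyElectorBasedElectorService(this);
  }
}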

CuratorBasedElectorService implements the LeaderLatchListener interface of the Curator framework.

Curator's LeaderLatch encapsulates ZK-based leader election: when the current process is elected leader, LeaderLatch calls back the LeaderLatchListener's isLeader method.

The isLeader method holds the logic the RM executes after being elected leader. Once the RM is elected leader, it starts RMActiveServices.

 // Set HA configuration should be done before login
    this.rmContext.setHAEnabled(HAUtil.isHAEnabled(this.conf));
    if (this.rmContext.isHAEnabled()) {
      HAUtil.verifyAndSetConfiguration(this.conf);
    }

// elector must be added post adminservice
    if (this.rmContext.isHAEnabled()) {
      // If the RM is configured to use an embedded leader elector,
      // initialize the leader elector.
      if (HAUtil.isAutomaticFailoverEnabled(conf)
          && HAUtil.isAutomaticFailoverEmbedded(conf)) {
        EmbeddedElector elector = createEmbeddedElector();
        addIfService(elector);
        rmContext.setLeaderElectorService(elector);
      }
    }

The RM's ZK-based leader election process

CuratorBasedElectorService starts the LeaderLatch

 private void initAndStartLeaderLatch() throws Exception {
    leaderLatch = new LeaderLatch(curator, latchPath, rmId);
    leaderLatch.addListener(this);
    leaderLatch.start();
  }

The LeaderLatch takes part in the leader election and, when this RM is elected leader, calls back the LeaderLatchListener's isLeader method.

Details omitted here; see "Curator 2.1 Source Code Analysis: LeaderLatch Encapsulates ZK Leader Election".
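For readers unfamiliar with the recipe, a minimal standalone LeaderLatch example (the connect string, latch path, and participant id are hypothetical; this is not YARN code) illustrates the callback flow that CuratorBasedElectorService relies on:

import org.apache.curator.framework.CuratorFramework;
import org.apache.curator.framework.CuratorFrameworkFactory;
import org.apache.curator.framework.recipes.leader.LeaderLatch;
import org.apache.curator.framework.recipes.leader.LeaderLatchListener;
import org.apache.curator.retry.ExponentialBackoffRetry;

public class LeaderLatchDemo {
  public static void main(String[] args) throws Exception {
    CuratorFramework client = CuratorFrameworkFactory.newClient(
        "localhost:2181", new ExponentialBackoffRetry(1000, 3));
    client.start();

    // each participant gets a unique id, similar to rmId in the RM
    LeaderLatch latch = new LeaderLatch(client, "/demo-election", "member-1");
    latch.addListener(new LeaderLatchListener() {
      @Override
      public void isLeader() {
        // called when this process wins the election
        System.out.println("elected leader, transitioning to active");
      }

      @Override
      public void notLeader() {
        // called when this process loses leadership
        System.out.println("lost leadership, transitioning to standby");
      }
    });
    latch.start();

    Thread.sleep(Long.MAX_VALUE); // keep the process alive for the demo
  }
}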

CuratorBasedElectorService#isLeader() method

The logic the RM runs after being elected leader

public void isLeader() {
    LOG.info(rmId + "is elected leader, transitioning to active");
    try {
      rm.getRMContext().getRMAdminService()
          .transitionToActive(
          new HAServiceProtocol.StateChangeRequestInfo(
              HAServiceProtocol.RequestSource.REQUEST_BY_ZKFC));
    } catch (Exception e) {
      LOG.info(rmId + " failed to transition to active, giving up leadership",
          e);
      notLeader();
      rejoinElection();
    }
  }

AdminService#transitionToActive() method

public synchronized void transitionToActive(
      HAServiceProtocol.StateChangeRequestInfo reqInfo) throws IOException {
    if (isRMActive()) {
      return;
    }
    // call refreshAdminAcls before HA state transition
    // for the case that adminAcls have been updated in previous active RM
    try {
      refreshAdminAcls(false);
    } catch (YarnException ex) {
      throw new ServiceFailedException("Can not execute refreshAdminAcls", ex);
    }

    UserGroupInformation user = checkAccess("transitionToActive");
    checkHaStateChange(reqInfo);

    try {
      // call all refresh*s for active RM to get the updated configurations.
      refreshAll();
    } catch (Exception e) {
      rm.getRMContext()
          .getDispatcher()
          .getEventHandler()
          .handle(
              new RMFatalEvent(RMFatalEventType.TRANSITION_TO_ACTIVE_FAILED,
                  e, "failure to refresh configuration settings"));
      throw new ServiceFailedException(
          "Error on refreshAll during transition to Active", e);
    }

    try {
      rm.transitionToActive();
    } catch (Exception e) {
      RMAuditLogger.logFailure(user.getShortUserName(), "transitionToActive",
          "", "RM",
          "Exception transitioning to active");
      throw new ServiceFailedException(
          "Error when transitioning to Active mode", e);
    }

    RMAuditLogger.logSuccess(user.getShortUserName(), "transitionToActive",
        "RM");
  }

ResourceManager#transitionToActive() method

synchronized void transitionToActive() throws Exception {
    if (rmContext.getHAServiceState() == HAServiceProtocol.HAServiceState.ACTIVE) {
      LOG.info("Already in active state");
      return;
    }
    LOG.info("Transitioning to active state");

    this.rmLoginUGI.doAs(new PrivilegedExceptionAction<Void>() {
      @Override
      public Void run() throws Exception {
        try {
          // start RMActiveServices
          startActiveServices();
          return null;
        } catch (Exception e) {
          reinitialize(true);
          throw e;
        }
      }
    });

    rmContext.setHAServiceState(HAServiceProtocol.HAServiceState.ACTIVE);
    LOG.info("Transitioned to active state");
  }

ResourceManager#startActiveServices() method

void startActiveServices() throws Exception {
    if (activeServices != null) {
      clusterTimeStamp = System.currentTimeMillis();
      activeServices.start();
    }
  }

RM state storage based on ZK

ResourceManager.RMActiveServices#serviceInit() method

When RMActiveServices is initialized, it uses the yarn.resourcemanager.recovery.enabled parameter to decide whether the active RM should recover its state after starting. If the parameter is true, RMStateStoreFactory#getStore() is called to initialize the RMStateStore. RMStateStoreFactory instantiates the appropriate RMStateStore via reflection, based on the yarn.resourcemanager.store.class parameter.

Therefore, if yarn.resourcemanager.recovery.enabled is true, the yarn.resourcemanager.store.class parameter must also be set.
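The factory itself is small; below is a sketch of the reflection it performs (a paraphrase of the behavior described above — the class name of the sketch and the fallback store class shown are assumptions, not quotes from the 3.2 source):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.util.ReflectionUtils;
import org.apache.hadoop.yarn.conf.YarnConfiguration;
import org.apache.hadoop.yarn.server.resourcemanager.recovery.MemoryRMStateStore;
import org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore;

public class RMStateStoreFactorySketch {
  public static RMStateStore getStore(Configuration conf) {
    // yarn.resourcemanager.store.class decides the concrete implementation;
    // the fallback class used here (MemoryRMStateStore) is an assumption
    Class<? extends RMStateStore> storeClass =
        conf.getClass(YarnConfiguration.RM_STORE,
            MemoryRMStateStore.class, RMStateStore.class);
    return ReflectionUtils.newInstance(storeClass, conf);
  }
}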

RMStateStore has several implementations, as listed below:

  • NullRMStateStore
  • MemoryRMStateStore
  • FileSystemRMStateStore
  • LeveldbRMStateStore
  • ZKRMStateStore

If yarn.resourcemanager.store.class is set to ZKRMStateStore, the ResourceManager uses the ZK-based state store.
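As an illustration, the relevant settings expressed programmatically (a minimal sketch; the class name, the ZooKeeper quorum key, and the address are assumptions, and in practice these values are normally set in yarn-site.xml):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class ZkStateStoreConfigSketch {
  public static Configuration zkRecoveryConf() {
    Configuration conf = new YarnConfiguration();
    // yarn.resourcemanager.recovery.enabled
    conf.setBoolean(YarnConfiguration.RECOVERY_ENABLED, true);
    // yarn.resourcemanager.store.class
    conf.set(YarnConfiguration.RM_STORE,
        "org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore");
    // ZooKeeper quorum used by the state store (key and value are illustrative)
    conf.set("hadoop.zk.address", "zk1:2181,zk2:2181,zk3:2181");
    return conf;
  }
}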

recoveryEnabled = conf.getBoolean(YarnConfiguration.RECOVERY_ENABLED,
          YarnConfiguration.DEFAULT_RM_RECOVERY_ENABLED);

      RMStateStore rmStore = null;
      if (recoveryEnabled) {
        rmStore = RMStateStoreFactory.getStore(conf);
        boolean isWorkPreservingRecoveryEnabled =
            conf.getBoolean(
              YarnConfiguration.RM_WORK_PRESERVING_RECOVERY_ENABLED,
              YarnConfiguration.DEFAULT_RM_WORK_PRESERVING_RECOVERY_ENABLED);
        rmContext
            .setWorkPreservingRecoveryEnabled(isWorkPreservingRecoveryEnabled);
      } else {
        rmStore = new NullRMStateStore();
      }

ResourceManager.RMActiveServices#serviceStart() method

@Override
    protected void serviceStart() throws Exception {
      RMStateStore rmStore = rmContext.getStateStore();
      // The state store needs to start irrespective of recoveryEnabled as apps
      // need events to move to further states.
      rmStore.start();
      // whether to recover the active RM's state after it starts
      if(recoveryEnabled) {
        try {
          LOG.info("Recovery started");
          rmStore.checkVersion();
          if (rmContext.isWorkPreservingRecoveryEnabled()) {
            rmContext.setEpoch(rmStore.getAndIncrementEpoch());
          }
          RMState state = rmStore.loadState();
          recover(state);
          LOG.info("Recovery ended");
        } catch (Exception e) {
          // the Exception from loadState() needs to be handled for
          // HA and we need to give up master status if we got fenced
          LOG.error("Failed to load/recover state", e);
          throw e;
        }
      } else {
        if (HAUtil.isFederationEnabled(conf)) {
          long epoch = conf.getLong(YarnConfiguration.RM_EPOCH,
              YarnConfiguration.DEFAULT_RM_EPOCH);
          rmContext.setEpoch(epoch);
          LOG.info("Epoch set for Federation: " + epoch);
        }
      }

      super.serviceStart();
    }

ResourceManager#recover() method

@Override
  public void recover(RMState state) throws Exception {
    // recover RMdelegationTokenSecretManager
    rmContext.getRMDelegationTokenSecretManager().recover(state);

    // recover AMRMTokenSecretManager
    rmContext.getAMRMTokenSecretManager().recover(state);

    // recover reservations
    if (reservationSystem != null) {
      reservationSystem.recover(state);
    }
    // recover applications
    rmAppManager.recover(state);

    setSchedulerRecoveryStartAndWaitTime(state, conf);
  }

ZKRMStateStore#getAndIncrementEpoch() method

public synchronized long getAndIncrementEpoch() throws Exception {
    String epochNodePath = getNodePath(zkRootNodePath, EPOCH_NODE);
    long currentEpoch = baseEpoch;

    if (exists(epochNodePath)) {
      // load current epoch
      byte[] data = getData(epochNodePath);
      Epoch epoch = new EpochPBImpl(EpochProto.parseFrom(data));
      currentEpoch = epoch.getEpoch();
      // increment epoch and store it
      byte[] storeData = Epoch.newInstance(nextEpoch(currentEpoch)).getProto()
          .toByteArray();
      zkManager.safeSetData(epochNodePath, storeData, -1, zkAcl,
          fencingNodePath);
    } else {
      // initialize epoch node with 1 for the next time.
      byte[] storeData = Epoch.newInstance(nextEpoch(currentEpoch)).getProto()
          .toByteArray();
      zkManager.safeCreate(epochNodePath, storeData, zkAcl,
          CreateMode.PERSISTENT, zkAcl, fencingNodePath);
    }

    return currentEpoch;
  }

To be continued...
