Overview
ResourceManager#serviceInit() method
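1. Determine whether HA is enabled. HA is on when yarn.resourcemanager.ha.enabled is true.
2. If HA is enabled, determine whether automatic failover is enabled. It is on when yarn.resourcemanager.ha.automatic-failover.enabled is true. If automatic failover is enabled and the elector is embedded in the RM (yarn.resourcemanager.ha.automatic-failover.embedded, checked by HAUtil.isAutomaticFailoverEmbedded() in the code below), an EmbeddedElector is created.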
There are two kinds of EmbeddedElector:
- CuratorBasedElectorService
- ActiveStandbyElectorBasedElectorService
If yarn.resourcemanager.ha.curator-leader-elector.enabled is true, the EmbeddedElector is a CuratorBasedElectorService; otherwise it is an ActiveStandbyElectorBasedElectorService. The parameter defaults to false.
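CuratorBasedElectorService implements the LeaderLatchListener interface from the Curator framework.
Curator's LeaderLatch wraps ZooKeeper leader election: when the current process is elected leader, LeaderLatch calls back LeaderLatchListener#isLeader().
isLeader() holds the logic the RM runs once it has been elected leader; after election, the RM starts RMActiveServices.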
// Set HA configuration should be done before login
this.rmContext.setHAEnabled(HAUtil.isHAEnabled(this.conf));
if (this.rmContext.isHAEnabled()) {
  HAUtil.verifyAndSetConfiguration(this.conf);
}
// elector must be added post adminservice
if (this.rmContext.isHAEnabled()) {
  // If the RM is configured to use an embedded leader elector,
  // initialize the leader elector.
  if (HAUtil.isAutomaticFailoverEnabled(conf)
      && HAUtil.isAutomaticFailoverEmbedded(conf)) {
    EmbeddedElector elector = createEmbeddedElector();
    addIfService(elector);
    rmContext.setLeaderElectorService(elector);
  }
}
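The decision chain above (HA on, automatic failover on and embedded, then Curator or not) can be sketched as a small pure function. This is a hypothetical illustration that reads a plain Map instead of Hadoop's Configuration; the class and method names are invented, and the defaults (false for HA, true for the two failover flags, false for the Curator flag) are assumptions chosen to match the description above, not values read from Hadoop.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: decides which embedded elector an RM would create, using a
// plain Map instead of Hadoop's Configuration. The default values are assumptions.
class ElectorChoice {
    static boolean getBoolean(Map<String, String> conf, String key, boolean dflt) {
        String v = conf.get(key);
        return v == null ? dflt : Boolean.parseBoolean(v.trim());
    }

    // Returns the elector class name, or null if no embedded elector is created.
    static String chooseElector(Map<String, String> conf) {
        if (!getBoolean(conf, "yarn.resourcemanager.ha.enabled", false)) {
            return null; // HA off: no elector at all
        }
        boolean autoFailover = getBoolean(conf,
            "yarn.resourcemanager.ha.automatic-failover.enabled", true);
        boolean embedded = getBoolean(conf,
            "yarn.resourcemanager.ha.automatic-failover.embedded", true);
        if (!(autoFailover && embedded)) {
            return null; // no embedded elector is created
        }
        boolean useCurator = getBoolean(conf,
            "yarn.resourcemanager.ha.curator-leader-elector.enabled", false);
        return useCurator ? "CuratorBasedElectorService"
                          : "ActiveStandbyElectorBasedElectorService";
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put("yarn.resourcemanager.ha.enabled", "true");
        System.out.println(chooseElector(conf)); // ActiveStandbyElectorBasedElectorService
        conf.put("yarn.resourcemanager.ha.curator-leader-elector.enabled", "true");
        System.out.println(chooseElector(conf)); // CuratorBasedElectorService
    }
}
```

Returning a class name rather than an object keeps the sketch focused on the configuration logic; the real createEmbeddedElector() instantiates the service itself.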
ZK-based leader election in the RM
CuratorBasedElectorService starts the LeaderLatch
private void initAndStartLeaderLatch() throws Exception {
  leaderLatch = new LeaderLatch(curator, latchPath, rmId);
  leaderLatch.addListener(this);
  leaderLatch.start();
}
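The LeaderLatch competes for leadership and, once this process is elected leader, calls back LeaderLatchListener#isLeader().
Details are omitted here; see the earlier post "Curator 2.1 source analysis: how LeaderLatch wraps ZK leader election".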
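The core idea behind LeaderLatch is that every participant registers a sequential node under the latch path and the participant holding the smallest sequence number is the leader. The toy below models only that ordering rule in memory, with callbacks shaped like LeaderLatchListener; it is a hypothetical illustration, not Curator code. It does not model ZooKeeper sessions, ephemeral nodes, or watches, and its close() always fires notLeader() on the departing leader, which is a simplification of how the real latch reports lost leadership.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory model of the LeaderLatch ordering rule (no ZooKeeper).
class ToyLeaderLatch {
    // Modeled on Curator's LeaderLatchListener callbacks.
    interface Listener {
        void isLeader();
        void notLeader();
    }

    private final TreeMap<Long, String> nodes = new TreeMap<>();  // seq -> participant id
    private final Map<String, Listener> listeners = new HashMap<>();
    private long nextSeq = 0;
    private String leader;                                        // current leader, or null

    // Comparable to creating a sequential node under the latch path.
    synchronized void start(String id, Listener listener) {
        listeners.put(id, listener);
        nodes.put(nextSeq++, id);
        reelect();
    }

    // Comparable to the participant's node disappearing.
    synchronized void close(String id) {
        nodes.values().removeIf(id::equals);
        if (id.equals(leader)) {
            leader = null;
            listeners.get(id).notLeader();
        }
        listeners.remove(id);
        reelect();
    }

    // The participant holding the smallest sequence number becomes leader.
    private void reelect() {
        if (nodes.isEmpty() || leader != null) {
            return;
        }
        leader = nodes.firstEntry().getValue();
        listeners.get(leader).isLeader();
    }

    synchronized String leader() {
        return leader;
    }

    public static void main(String[] args) {
        ToyLeaderLatch latch = new ToyLeaderLatch();
        List<String> events = new ArrayList<>();
        for (String rm : new String[] {"rm1", "rm2"}) {
            latch.start(rm, new Listener() {
                public void isLeader()  { events.add(rm + " isLeader"); }
                public void notLeader() { events.add(rm + " notLeader"); }
            });
        }
        latch.close("rm1");          // rm1 goes away, rm2 takes over
        System.out.println(events);  // [rm1 isLeader, rm1 notLeader, rm2 isLeader]
    }
}
```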
CuratorBasedElectorService#isLeader() method
What the RM does after it is elected leader
public void isLeader() {
  LOG.info(rmId + " is elected leader, transitioning to active");
  try {
    rm.getRMContext().getRMAdminService()
        .transitionToActive(
            new HAServiceProtocol.StateChangeRequestInfo(
                HAServiceProtocol.RequestSource.REQUEST_BY_ZKFC));
  } catch (Exception e) {
    LOG.info(rmId + " failed to transition to active, giving up leadership",
        e);
    notLeader();
    rejoinElection();
  }
}
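The important detail in isLeader() is the error path: an RM that wins the election but cannot become active does not hold on to leadership; it steps down (notLeader()) and rejoins the election so that another RM gets a turn. The toy below models that pattern with a queue whose head is the leader. It is a hypothetical sketch: the Transition interface stands in for AdminService#transitionToActive, and "rejoin" is modeled as moving to the tail of the queue, which only approximates what a fresh latch registration means.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Toy model of the error path in CuratorBasedElectorService#isLeader(); names are hypothetical.
class GiveUpOnFailure {
    interface Transition {               // stands in for AdminService#transitionToActive
        void toActive(String rmId) throws Exception;
    }

    private final Deque<String> queue = new ArrayDeque<>();  // head of the queue is leader
    private final Transition transition;
    final List<String> log = new ArrayList<>();

    GiveUpOnFailure(Transition transition) {
        this.transition = transition;
    }

    void join(String rmId) {
        queue.addLast(rmId);
    }

    // Gives each participant at most one turn as leader; returns the RM that
    // became active, or null if every transition failed.
    String elect() {
        for (int turn = 0; turn < queue.size(); turn++) {
            String leader = queue.peekFirst();
            log.add(leader + " elected");
            try {
                transition.toActive(leader);
                log.add(leader + " active");
                return leader;
            } catch (Exception e) {
                // notLeader() + rejoinElection(): step down, queue up again at the tail
                log.add(leader + " gave up");
                queue.addLast(queue.pollFirst());
            }
        }
        return null;
    }

    public static void main(String[] args) {
        GiveUpOnFailure elector = new GiveUpOnFailure(rmId -> {
            if (rmId.equals("rm1")) {
                throw new Exception("refreshAll failed");
            }
        });
        elector.join("rm1");
        elector.join("rm2");
        System.out.println(elector.elect()); // rm2
        System.out.println(elector.log);     // [rm1 elected, rm1 gave up, rm2 elected, rm2 active]
    }
}
```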
AdminService#transitionToActive() method
public synchronized void transitionToActive(
    HAServiceProtocol.StateChangeRequestInfo reqInfo) throws IOException {
  if (isRMActive()) {
    return;
  }
  // call refreshAdminAcls before HA state transition
  // for the case that adminAcls have been updated in previous active RM
  try {
    refreshAdminAcls(false);
  } catch (YarnException ex) {
    throw new ServiceFailedException("Can not execute refreshAdminAcls", ex);
  }
  UserGroupInformation user = checkAccess("transitionToActive");
  checkHaStateChange(reqInfo);
  try {
    // call all refresh*s for active RM to get the updated configurations.
    refreshAll();
  } catch (Exception e) {
    rm.getRMContext()
        .getDispatcher()
        .getEventHandler()
        .handle(
            new RMFatalEvent(RMFatalEventType.TRANSITION_TO_ACTIVE_FAILED,
                e, "failure to refresh configuration settings"));
    throw new ServiceFailedException(
        "Error on refreshAll during transition to Active", e);
  }
  try {
    rm.transitionToActive();
  } catch (Exception e) {
    RMAuditLogger.logFailure(user.getShortUserName(), "transitionToActive",
        "", "RM",
        "Exception transitioning to active");
    throw new ServiceFailedException(
        "Error when transitioning to Active mode", e);
  }
  RMAuditLogger.logSuccess(user.getShortUserName(), "transitionToActive",
      "RM");
}
ResourceManager#transitionToActive() method
synchronized void transitionToActive() throws Exception {
  if (rmContext.getHAServiceState() == HAServiceProtocol.HAServiceState.ACTIVE) {
    LOG.info("Already in active state");
    return;
  }
  LOG.info("Transitioning to active state");
  this.rmLoginUGI.doAs(new PrivilegedExceptionAction<Void>() {
    @Override
    public Void run() throws Exception {
      try {
        // start RMActiveServices
        startActiveServices();
        return null;
      } catch (Exception e) {
        reinitialize(true);
        throw e;
      }
    }
  });
  rmContext.setHAServiceState(HAServiceProtocol.HAServiceState.ACTIVE);
  LOG.info("Transitioned to active state");
}
ResourceManager#startActiveServices() method
void startActiveServices() throws Exception {
  if (activeServices != null) {
    clusterTimeStamp = System.currentTimeMillis();
    activeServices.start();
  }
}
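Two properties of ResourceManager#transitionToActive() are worth isolating: the call is idempotent (an RM that is already ACTIVE returns immediately), and the HA state is set to ACTIVE only after startActiveServices() has succeeded; on failure the RM reinitializes and rethrows. The toy below models just those two properties. It is a hypothetical sketch: Services stands in for RMActiveServices, and the reinitializations counter stands in for reinitialize(true), whose internals are not modeled.

```java
// Toy model of ResourceManager#transitionToActive(); all names are hypothetical.
class ToyResourceManager {
    enum HaState { STANDBY, ACTIVE }

    interface Services {                  // stands in for RMActiveServices
        void start() throws Exception;
    }

    private HaState state = HaState.STANDBY;
    private final Services activeServices;
    int reinitializations = 0;            // counts calls to the reinitialize(true) stand-in

    ToyResourceManager(Services activeServices) {
        this.activeServices = activeServices;
    }

    synchronized void transitionToActive() throws Exception {
        if (state == HaState.ACTIVE) {
            return;                       // already active: nothing to do
        }
        try {
            activeServices.start();       // startActiveServices()
        } catch (Exception e) {
            reinitializations++;          // reinitialize(true), then propagate the failure
            throw e;
        }
        state = HaState.ACTIVE;           // reached only if start() succeeded
    }

    synchronized HaState state() {
        return state;
    }

    public static void main(String[] args) throws Exception {
        ToyResourceManager rm =
            new ToyResourceManager(() -> System.out.println("active services started"));
        rm.transitionToActive();
        rm.transitionToActive();          // no-op: already ACTIVE
        System.out.println(rm.state());   // ACTIVE
    }
}
```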
ZK-based state storage in the RM
ResourceManager.RMActiveServices#serviceInit() method
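When RMActiveServices initializes, it reads yarn.resourcemanager.recovery.enabled to decide whether the RM should recover its previous state once it becomes active. If the parameter is true, it calls RMStateStoreFactory#getStore() to create the RMStateStore; RMStateStoreFactory instantiates, via reflection, the RMStateStore class named by yarn.resourcemanager.store.class.
Therefore, when yarn.resourcemanager.recovery.enabled is true, yarn.resourcemanager.store.class should be set explicitly, since it decides which store the RM uses.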
RMStateStore has the following implementations:
- NullRMStateStore
- MemoryRMStateStore
- FileSystemRMStateStore
- LeveldbRMStateStore
- ZKRMStateStore
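If yarn.resourcemanager.store.class is set to ZKRMStateStore (fully qualified name org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore), the ResourceManager uses ZK-based state storage.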
recoveryEnabled = conf.getBoolean(YarnConfiguration.RECOVERY_ENABLED,
    YarnConfiguration.DEFAULT_RM_RECOVERY_ENABLED);
RMStateStore rmStore = null;
if (recoveryEnabled) {
  rmStore = RMStateStoreFactory.getStore(conf);
  boolean isWorkPreservingRecoveryEnabled =
      conf.getBoolean(
          YarnConfiguration.RM_WORK_PRESERVING_RECOVERY_ENABLED,
          YarnConfiguration.DEFAULT_RM_WORK_PRESERVING_RECOVERY_ENABLED);
  rmContext
      .setWorkPreservingRecoveryEnabled(isWorkPreservingRecoveryEnabled);
} else {
  rmStore = new NullRMStateStore();
}
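The store selection described above (recovery off gives a null store, recovery on instantiates the configured class reflectively) can be sketched with plain Java reflection. Everything here is a hypothetical stand-in: DemoStateStore, NullDemoStore and MemoryDemoStore are not Hadoop classes, the configuration is a plain Map, and this sketch throws when no store class is configured rather than guessing a default, which is a design choice of the sketch and not necessarily what RMStateStoreFactory does.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-ins for RMStateStore and two of its implementations.
interface DemoStateStore {
    String kind();
}

class NullDemoStore implements DemoStateStore {
    public String kind() { return "null"; }
}

class MemoryDemoStore implements DemoStateStore {
    public String kind() { return "memory"; }
}

// Sketch of the selection logic: recovery off -> null store; recovery on ->
// instantiate the class named in the config via reflection.
class DemoStoreFactory {
    static DemoStateStore getStore(Map<String, String> conf)
            throws ReflectiveOperationException {
        boolean recovery = Boolean.parseBoolean(
            conf.getOrDefault("yarn.resourcemanager.recovery.enabled", "false"));
        if (!recovery) {
            return new NullDemoStore();
        }
        String className = conf.get("yarn.resourcemanager.store.class");
        if (className == null) {
            // This sketch refuses to guess a default store class.
            throw new IllegalArgumentException("yarn.resourcemanager.store.class is not set");
        }
        Class<? extends DemoStateStore> cls =
            Class.forName(className).asSubclass(DemoStateStore.class);
        return cls.getDeclaredConstructor().newInstance();
    }

    public static void main(String[] args) throws ReflectiveOperationException {
        Map<String, String> conf = new HashMap<>();
        System.out.println(getStore(conf).kind()); // null
        conf.put("yarn.resourcemanager.recovery.enabled", "true");
        conf.put("yarn.resourcemanager.store.class", MemoryDemoStore.class.getName());
        System.out.println(getStore(conf).kind()); // memory
    }
}
```

asSubclass() makes a misconfigured class name fail fast with a ClassCastException instead of producing an object of the wrong type.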
ResourceManager#serviceStart() method
@Override
protected void serviceStart() throws Exception {
  RMStateStore rmStore = rmContext.getStateStore();
  // The state store needs to start irrespective of recoveryEnabled as apps
  // need events to move to further states.
  rmStore.start();
  // recover the RM's previous state only if recovery is enabled
  if (recoveryEnabled) {
    try {
      LOG.info("Recovery started");
      rmStore.checkVersion();
      if (rmContext.isWorkPreservingRecoveryEnabled()) {
        rmContext.setEpoch(rmStore.getAndIncrementEpoch());
      }
      RMState state = rmStore.loadState();
      recover(state);
      LOG.info("Recovery ended");
    } catch (Exception e) {
      // the Exception from loadState() needs to be handled for
      // HA and we need to give up master status if we got fenced
      LOG.error("Failed to load/recover state", e);
      throw e;
    }
  } else {
    if (HAUtil.isFederationEnabled(conf)) {
      long epoch = conf.getLong(YarnConfiguration.RM_EPOCH,
          YarnConfiguration.DEFAULT_RM_EPOCH);
      rmContext.setEpoch(epoch);
      LOG.info("Epoch set for Federation: " + epoch);
    }
  }
  super.serviceStart();
}
ResourceManager#recover() method
@Override
public void recover(RMState state) throws Exception {
  // recover RMdelegationTokenSecretManager
  rmContext.getRMDelegationTokenSecretManager().recover(state);
  // recover AMRMTokenSecretManager
  rmContext.getAMRMTokenSecretManager().recover(state);
  // recover reservations
  if (reservationSystem != null) {
    reservationSystem.recover(state);
  }
  // recover applications
  rmAppManager.recover(state);
  setSchedulerRecoveryStartAndWaitTime(state, conf);
}
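Read together, serviceStart() and recover() define a fixed order: the store always starts, and only when recovery is enabled does the RM check the store version, optionally bump the epoch (work-preserving recovery), load the state, and recover token secret managers before reservations and applications. The toy below only records that order as strings so it can be inspected; it is a hypothetical trace, not Hadoop code, and it deliberately leaves out the Federation branch that sets the epoch from configuration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical trace of ResourceManager#serviceStart() followed by recover().
// It only records which steps would run, in order; the Federation branch is omitted.
class RecoveryTrace {
    static List<String> serviceStart(boolean recoveryEnabled, boolean workPreserving,
                                     boolean hasReservationSystem) {
        List<String> steps = new ArrayList<>();
        steps.add("rmStore.start");                    // always, even without recovery
        if (recoveryEnabled) {
            steps.add("rmStore.checkVersion");
            if (workPreserving) {
                steps.add("rmStore.getAndIncrementEpoch");
            }
            steps.add("rmStore.loadState");
            // recover(state): secret managers first, then reservations and apps
            steps.add("RMDelegationTokenSecretManager.recover");
            steps.add("AMRMTokenSecretManager.recover");
            if (hasReservationSystem) {
                steps.add("reservationSystem.recover");
            }
            steps.add("rmAppManager.recover");
            steps.add("setSchedulerRecoveryStartAndWaitTime");
        }
        steps.add("super.serviceStart");
        return steps;
    }

    public static void main(String[] args) {
        System.out.println(serviceStart(false, false, false)); // [rmStore.start, super.serviceStart]
        System.out.println(serviceStart(true, true, false));
    }
}
```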
ZKRMStateStore#getAndIncrementEpoch() method
public synchronized long getAndIncrementEpoch() throws Exception {
  String epochNodePath = getNodePath(zkRootNodePath, EPOCH_NODE);
  long currentEpoch = baseEpoch;
  if (exists(epochNodePath)) {
    // load current epoch
    byte[] data = getData(epochNodePath);
    Epoch epoch = new EpochPBImpl(EpochProto.parseFrom(data));
    currentEpoch = epoch.getEpoch();
    // increment epoch and store it
    byte[] storeData = Epoch.newInstance(nextEpoch(currentEpoch)).getProto()
        .toByteArray();
    zkManager.safeSetData(epochNodePath, storeData, -1, zkAcl,
        fencingNodePath);
  } else {
    // initialize epoch node with 1 for the next time.
    byte[] storeData = Epoch.newInstance(nextEpoch(currentEpoch)).getProto()
        .toByteArray();
    zkManager.safeCreate(epochNodePath, storeData, zkAcl,
        CreateMode.PERSISTENT, zkAcl, fencingNodePath);
  }
  return currentEpoch;
}
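The contract of getAndIncrementEpoch() is "return the current epoch, persist the next one": the first call returns baseEpoch and creates the node, and each later call returns the stored value and overwrites it with the next one, so every RM run that recovers gets a distinct epoch. The in-memory sketch below mirrors the two branches of the method. It is hypothetical: a HashMap stands in for the znodes, the "/epoch" path is invented, the fencing path passed to safeSetData/safeCreate is not modeled, and nextEpoch() is assumed to be a plain increment.

```java
import java.util.HashMap;
import java.util.Map;

// In-memory stand-in for ZKRMStateStore#getAndIncrementEpoch(); names are hypothetical.
class ToyEpochStore {
    private final Map<String, Long> znodes = new HashMap<>();  // path -> stored epoch
    private final long baseEpoch;

    ToyEpochStore(long baseEpoch) {
        this.baseEpoch = baseEpoch;
    }

    // Assumption: the next epoch is simply current + 1.
    private long nextEpoch(long current) {
        return current + 1;
    }

    // Returns the epoch for this RM run and stores the value for the next run.
    synchronized long getAndIncrementEpoch(String epochNodePath) {
        long currentEpoch = baseEpoch;
        if (znodes.containsKey(epochNodePath)) {
            currentEpoch = znodes.get(epochNodePath);            // load current epoch
            znodes.put(epochNodePath, nextEpoch(currentEpoch));  // like safeSetData
        } else {
            znodes.put(epochNodePath, nextEpoch(currentEpoch));  // like safeCreate
        }
        return currentEpoch;
    }

    public static void main(String[] args) {
        ToyEpochStore store = new ToyEpochStore(0);
        for (int run = 0; run < 3; run++) {
            System.out.println(store.getAndIncrementEpoch("/epoch")); // 0, then 1, then 2
        }
    }
}
```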
To be continued...