Background:
With the official Saturn component, after a shard fails over, its items cannot execute immediately: they have to wait until the currently healthy shards finish, and the failed-over items then run one at a time on a single executor. Our requirement was to execute them immediately, and to run the failed-over shards in parallel. The solution is described in: 解決saturn executor失敗分片轉移立即執行之源碼分析 (a source-code analysis of making Saturn executor failover shards execute immediately).
Problem:
After solving the above, we found that once the failed-over shards finish executing, the Saturn console displays the job's sharding items incorrectly (code analysis confirms there is simply no logic handling this case): it still shows the original item assignment. So how do we update the actual per-node assignment according to the failed-over shards, so that the console displays it correctly?
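To make the symptom concrete, here is a hypothetical illustration of the per-executor sharding data in ZooKeeper (the exact node paths and executor names are assumptions for illustration; they depend on the namespace and Saturn version):

```text
# Assumed layout, for illustration only
/$NAMESPACE/$JOB_NAME/servers/executor-1/sharding  ->  "0,1"
/$NAMESPACE/$JOB_NAME/servers/executor-2/sharding  ->  "2,3"   # executor-2 has crashed
# After failover, executor-1 actually runs items 0,1,2,3,
# but the console still renders the stale "0,1" / "2,3" assignment
# because nothing rewrites these nodes.
```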
Solution:
Manually rewrite the node information using the failed-over shard items. Add the following method to com/vip/saturn/job/internal/sharding/ShardingService.java:
/**
 * Rewrite the per-executor sharding nodes so that the failed-over items are
 * removed from their original executors and appended to the local executor.
 *
 * @param getLocalHostFailoverItems the failover items taken over by the local executor
 * @throws Exception if acquiring the lock or committing the ZooKeeper transaction fails
 */
public synchronized void removeAndCreateShardingInfo(List<Integer> getLocalHostFailoverItems) throws Exception {
    LogUtils.info(log, jobName, "removeAndCreateShardingInfo start.");
    // Take a distributed lock to prevent multiple executors from updating the same nodes concurrently
    CuratorFramework client = getJobNodeStorage().getClient();
    InterProcessMutex mutex = new InterProcessMutex(client, "/saturnSharding/lock");
    try {
        mutex.acquire();
        // The check + create operations for all job servers are committed as one transaction
        CuratorTransactionFinal curatorTransactionFinal = getJobNodeStorage().getClient().inTransaction()
                .check().forPath("/").and();
        // Walk the sharding items under every servers/<executorName>; if a failed item
        // is assigned to that executor, remove it there
        Set<Integer> getLocalHostFailoverItemSets = new HashSet<>(getLocalHostFailoverItems);
        for (String each : serverService.getAllServers()) {
            if (StringUtils.equals(each, executorName)) {
                continue;
            }
            String value = getJobNodeStorage().getJobNodeDataDirectly(ShardingNode.getShardingNode(each));
            if (StringUtils.isEmpty(value)) {
                continue;
            }
            List<Integer> getShardingItemsByexecutorName = ItemUtils.toItemList(value);
            Set<Integer> getShardingItemSetsByexecutorName = new HashSet<>(getShardingItemsByexecutorName);
            getShardingItemSetsByexecutorName.removeAll(getLocalHostFailoverItemSets);
            getJobNodeStorage().removeJobNodeIfExisted(ShardingNode.getShardingNode(each));
            curatorTransactionFinal.create().forPath(
                    JobNodePath.getNodeFullPath(jobName, ShardingNode.getShardingNode(each)),
                    ItemUtils.toItemsString(new ArrayList<>(getShardingItemSetsByexecutorName))
                            .getBytes(StandardCharsets.UTF_8)).and();
        }
        LogUtils.info(log, jobName, "removeAndCreateShardingInfo delete LocalHostFailoverItems.");
        // Append the failed items under the local executorName
        String getLocalHostItems = getJobNodeStorage().getJobNodeDataDirectly(ShardingNode.getShardingNode(executorName));
        List<Integer> getLocalHostItemsList;
        if (StringUtils.isEmpty(getLocalHostItems)) {
            getLocalHostItemsList = getLocalHostFailoverItems;
        } else {
            getLocalHostItemsList = ItemUtils.toItemList(getLocalHostItems);
            getLocalHostItemsList.addAll(getLocalHostFailoverItems);
        }
        getJobNodeStorage().removeJobNodeIfExisted(ShardingNode.getShardingNode(executorName));
        curatorTransactionFinal.create().forPath(
                JobNodePath.getNodeFullPath(jobName, ShardingNode.getShardingNode(executorName)),
                ItemUtils.toItemsString(getLocalHostItemsList).getBytes(StandardCharsets.UTF_8)).and();
        curatorTransactionFinal.commit();
        LogUtils.info(log, jobName, "removeAndCreateShardingInfo append LocalHostFailoverItems.");
    } catch (Exception e) {
        // Concurrent sharding tasks may make the computed result lag behind; if a server
        // node has already been deleted, the commit fails.
        // This usually does not affect the final outcome: the shards are still assigned
        // correctly, because a resharding event will still be handled later.
        // Log at warn level to avoid unnecessary alerts.
        LogUtils.warn(log, jobName, "Commit shards failed", e);
    } finally {
        mutex.release();
    }
}
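The core of the method above is plain set arithmetic on the comma-separated item strings stored on the ZK nodes: subtract the failed-over items from every other executor's list, then append them to the local executor's list. A minimal, self-contained sketch of that logic follows; parseItems and joinItems are hypothetical stand-ins for ItemUtils.toItemList and ItemUtils.toItemsString, written here only so the example compiles on its own.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// Sketch of the item redistribution performed by removeAndCreateShardingInfo.
class FailoverRedistribution {

    // Parse a comma-separated item string such as "0,1,2" into a list of ints.
    static List<Integer> parseItems(String value) {
        if (value == null || value.isEmpty()) {
            return new ArrayList<>();
        }
        return Arrays.stream(value.split(","))
                .map(String::trim)
                .map(Integer::valueOf)
                .collect(Collectors.toList());
    }

    // Join items back into the comma-separated string stored on the ZK node.
    static String joinItems(List<Integer> items) {
        return items.stream().map(String::valueOf).collect(Collectors.joining(","));
    }

    // Remove the failed-over items from another executor's item string.
    static String removeFailoverItems(String otherExecutorItems, List<Integer> failoverItems) {
        Set<Integer> remaining = new LinkedHashSet<>(parseItems(otherExecutorItems));
        remaining.removeAll(failoverItems);
        return joinItems(new ArrayList<>(remaining));
    }

    // Append the failed-over items to the local executor's item string.
    static String appendFailoverItems(String localItems, List<Integer> failoverItems) {
        List<Integer> merged = parseItems(localItems);
        merged.addAll(failoverItems);
        return joinItems(merged);
    }
}
```

For example, if the crashed executor held "2,3" and the local executor held "0,1", removeFailoverItems("2,3", items 2 and 3) yields an empty string for the dead node, and appendFailoverItems("0,1", items 2 and 3) yields "0,1,2,3" for the local node, which is exactly what the transaction writes back.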
Then, in the execute method of com/vip/saturn/job/basic/AbstractElasticJob.java, add the following code before the line executeJobInternal(shardingContext);
// Update the sharding items for the failed-over shards
if (!failoverService.getLocalHostFailoverItems().isEmpty()) {
    shardingService.removeAndCreateShardingInfo(failoverService.getLocalHostFailoverItems());
}
That completes the change; it has been tested and works as expected.