[HDFS] Delete operations in the HDFS file system

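What is the difference between the commonly used rm and rmr commands, and how are they implemented? And what is Trash? Let's dig into the 1.0.3 code to find out.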

elif [ "$COMMAND" = "fs" ] ; then
  CLASS=org.apache.hadoop.fs.FsShell
  HADOOP_OPTS="$HADOOP_OPTS $HADOOP_CLIENT_OPTS"
elif [ "$COMMAND" = "dfs" ] ; then
  CLASS=org.apache.hadoop.fs.FsShell
  HADOOP_OPTS="$HADOOP_OPTS $HADOOP_CLIENT_OPTS"

The hadoop launcher script shows that the command-line handler class is org.apache.hadoop.fs.FsShell. Let's take a look inside:

      } else if ("-rm".equals(cmd)) {
        exitCode = doall(cmd, argv, i);
      } else if ("-rmr".equals(cmd)) {
        exitCode = doall(cmd, argv, i);
Both rm and rmr end up in the doall method:
        } else if ("-rm".equals(cmd)) {
          delete(argv[i], false, rmSkipTrash);
        } else if ("-rmr".equals(cmd)) {
          delete(argv[i], true, rmSkipTrash);

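The difference between rm and rmr is simple: whether the delete recurses into subdirectories and files. Naturally, a directory can only be deleted with rmr.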

  void delete(String srcf, final boolean recursive, final boolean skipTrash) 
                                                            throws IOException {
    Path srcPattern = new Path(srcf);
    new DelayedExceptionThrowing() {
      @Override
      void process(Path p, FileSystem srcFs) throws IOException {
        delete(p, srcFs, recursive, skipTrash);
      }
    }.globAndProcess(srcPattern, srcPattern.getFileSystem(getConf()));
  }

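See that? The middle parameter is recursive. An abstract class, DelayedExceptionThrowing, shows up here. Never mind the details for now; judging by the name, it defers throwing exceptions while the matched paths are processed, and even without reading further it is obvious that it will call back into the process method.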
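To make the "delayed exception" idea concrete, here is a minimal sketch of the pattern: process every matched path, collect the failures, and only throw once the loop is done. It is an illustration under my own assumptions, not the FsShell source; DelayedThrowSketch, processAll and demo are hypothetical names, and plain strings stand in for Path and FileSystem.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the pattern; not the real FsShell inner class.
abstract class DelayedThrowSketch {
    // Called once per matched path; may fail for an individual path.
    abstract void process(String path) throws IOException;

    // Process every path first, then report the failures afterwards.
    final void processAll(List<String> paths) throws IOException {
        List<IOException> failures = new ArrayList<IOException>();
        for (String p : paths) {
            try {
                process(p);
            } catch (IOException e) {
                failures.add(e); // remember it and keep going
            }
        }
        if (failures.size() == 1) {
            throw failures.get(0);
        } else if (failures.size() > 1) {
            throw new IOException("Multiple IOExceptions: " + failures);
        }
    }

    // Demo helper: returns the paths handled successfully, plus the error text.
    static List<String> demo(List<String> paths) {
        final List<String> done = new ArrayList<String>();
        try {
            new DelayedThrowSketch() {
                @Override
                void process(String path) throws IOException {
                    if (path.contains("missing")) {
                        throw new IOException("No such file: " + path);
                    }
                    done.add(path);
                }
            }.processAll(paths);
        } catch (IOException e) {
            done.add("error: " + e.getMessage());
        }
        return done;
    }
}
```

The point of the design is that one bad path in a glob does not stop the remaining paths from being deleted.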
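delete runs next. This is still client-side code, which first does some basic sanity checks: if the target is a directory but you used rm, that will not work; if the file does not exist it cannot be deleted either, and an exception is thrown.
Then it checks whether you chose to skip trash. If so, it calls the HDFS delete method directly; otherwise it tries Trash first. Note that Trash only takes effect when it is enabled in the configuration (fs.trash.interval set to a non-zero value); when moveToTrash returns false, the code falls through to the direct delete as well.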

    if(!skipTrash) {
      try {
        Trash trashTmp = new Trash(srcFs, getConf());
        if (trashTmp.moveToTrash(src)) {
          System.out.println("Moved to trash: " + src);
          return;
        }
      } catch (IOException e) {
        Exception cause = (Exception) e.getCause();
        String msg = "";
        if(cause != null) {
          msg = cause.getLocalizedMessage();
        }
        System.err.println("Problem with Trash." + msg +". Consider using -skipTrash option");        
        throw e;
      }
    }

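Look at the code above: it creates a Trash object and calls moveToTrash; the catch block after it can be ignored for now. How Trash works is left for the next post. Here is the path that skips trash: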

    if (srcFs.delete(src, true)) {
      System.out.println("Deleted " + src);
    } else {
      throw new IOException("Delete failed " + src);
    }
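The client-side flow so far (sanity checks, then Trash, then the direct delete) can be summarized in a toy model. This is only a sketch under stated assumptions: ClientDeleteSketch, its namespace map and the trashEnabled flag are hypothetical stand-ins, the error messages are illustrative, and nothing here calls a real Hadoop API.

```java
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

// Toy model of the client-side decision flow; none of these names are Hadoop APIs.
class ClientDeleteSketch {
    // path -> true if it is a directory (stand-in for the real namespace)
    final Map<String, Boolean> namespace = new HashMap<String, Boolean>();
    // stand-in for "trash is enabled in the configuration"
    boolean trashEnabled = true;

    String delete(String src, boolean recursive, boolean skipTrash) throws IOException {
        Boolean isDir = namespace.get(src);
        if (isDir == null) {
            // a missing path cannot be deleted
            throw new IOException("No such file or directory: " + src);
        }
        if (isDir && !recursive) {
            // rm on a directory is refused; rmr is required
            throw new IOException("Cannot remove directory " + src + " without recursion");
        }
        namespace.remove(src);
        if (!skipTrash && trashEnabled) {
            return "Moved to trash: " + src;
        }
        // trash skipped or disabled: fall through to the direct delete
        return "Deleted " + src;
    }
}
```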

DistributedFileSystem implements this abstract method, and the HDFS client (DFSClient) issues the RPC call:

  public boolean delete(String src, boolean recursive) throws IOException {
    checkOpen();
    try {
      return namenode.delete(src, recursive);
    } catch(RemoteException re) {
      throw re.unwrapRemoteException(AccessControlException.class);
    }
  }

checkOpen checks whether the client has been closed unexpectedly. Skip it for now and look at how the namenode performs the delete.

  public boolean delete(String src, boolean recursive) throws IOException {
    if (stateChangeLog.isDebugEnabled()) {
      stateChangeLog.debug("*DIR* Namenode.delete: src=" + src
          + ", recursive=" + recursive);
    }
    boolean ret = namesystem.delete(src, recursive);
    if (ret) 
      myMetrics.incrNumDeleteFileOps();
    return ret;
  }

The namenode still hands the real work to its chief steward, FSNamesystem:

    public boolean delete(String src, boolean recursive) throws IOException {
      if ((!recursive) && (!dir.isDirEmpty(src))) {
        throw new IOException(src + " is non empty");
      }
      boolean status = deleteInternal(src, true);
      getEditLog().logSync();
      if (status && auditLog.isInfoEnabled() && isExternalInvocation()) {
        logAuditEvent(UserGroupInformation.getCurrentUser(),
                      Server.getRemoteIp(),
                      "delete", src, null, null);
      }
      return status;
    }

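See that? It calls deleteInternal to carry out the actual delete, then syncs the edit log, and, if the audit log is enabled, records the delete there as well. The audit log is a great thing: it is what you rely on to trace every operation. Yet some people never turn it on, or mix it into the general log; keeping it separate is better. But I digress. Note that an operation should always be logged after it has completed, never before. Normally the order makes no difference, but if the operation fails there is nothing worth recording.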
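The two rules in FSNamesystem.delete (refuse a non-recursive delete of a non-empty directory, and audit only after success) can be shown with a toy model. NamesystemDeleteSketch, childCount and the audit line format are hypothetical and exist only for illustration.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model only; the real checks live in FSNamesystem and FSDirectory.
class NamesystemDeleteSketch {
    // path -> number of children (0 for a file or an empty directory)
    final Map<String, Integer> childCount = new HashMap<String, Integer>();
    final List<String> auditLog = new ArrayList<String>();

    boolean delete(String src, boolean recursive) throws IOException {
        Integer n = childCount.get(src);
        if (!recursive && n != null && n > 0) {
            throw new IOException(src + " is non empty");
        }
        // do the operation first...
        boolean status = childCount.remove(src) != null;
        // ...and record it only if it actually succeeded
        if (status) {
            auditLog.add("cmd=delete src=" + src);
        }
        return status;
    }
}
```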

  synchronized boolean deleteInternal(String src, 
      boolean enforcePermission) throws IOException {
    if (NameNode.stateChangeLog.isDebugEnabled()) {
      NameNode.stateChangeLog.debug("DIR* NameSystem.delete: " + src);
    }
    if (isInSafeMode())
      throw new SafeModeException("Cannot delete " + src, safeMode);
    if (enforcePermission && isPermissionEnabled) {
      checkPermission(src, false, null, FsAction.WRITE, null, FsAction.ALL);
    }

    return dir.delete(src);
  }

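See that? Deletes are not allowed while the namenode is in safe mode.
The steward delegates directories and files to its deputy, dir (FSDirectory), so dir does the deleting: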

boolean delete(String src) {
    if (NameNode.stateChangeLog.isDebugEnabled()) {
      NameNode.stateChangeLog.debug("DIR* FSDirectory.delete: "+src);
    }
    waitForReady(); // thread-safety and consistency check: wait until the directory is ready
    long now = FSNamesystem.now();
    int filesRemoved = unprotectedDelete(src, now);
    if (filesRemoved <= 0) {
      return false;
    }
    incrDeletedFileCount(filesRemoved);
    fsImage.getEditLog().logDelete(src, now);
    return true;
  }

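waitForReady waits until dir is ready. Many operations go through dir, and it uses a ready flag to indicate whether it is ready or still busy. The actual work is done by the unprotectedDelete method.
http://blog.csdn.net/tracymkgld/article/details/17553173 explains how getExistingPathINodes works.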

  int unprotectedDelete(String src, long modificationTime) {
    src = normalizePath(src);
    synchronized (rootDir) {
      INode[] inodes =  rootDir.getExistingPathINodes(src);
      // Find every inode along the path. For a file such as /a/b/c/d this yields
      // the root followed by the inodes for a, b, c and d.
      INode targetNode = inodes[inodes.length-1];
      // The last element is the target inode.
      if (targetNode == null) { // non-existent src
        NameNode.stateChangeLog.debug("DIR* FSDirectory.unprotectedDelete: "
            +"failed to remove "+src+" because it does not exist");
        return 0;
      } else if (inodes.length == 1) { // src is the root
        NameNode.stateChangeLog.warn("DIR* FSDirectory.unprotectedDelete: " +
            "failed to remove " + src +
            " because the root is not allowed to be deleted");
        // Deleting the root itself is refused, but deleting everything under the
        // root is allowed. I have run into that kind of accident; it is disastrous.
        return 0;
      } else {
        try {
          // Remove the node from the namespace
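          removeChild(inodes, inodes.length-1);
          // For a path such as /a/b/c/d, d may be a file or a directory. How is the child removed?
          // Find c and have c (an INodeDirectory) erase d's inode from its List<INode> children.
          // Before erasing, a binary search locates the entry, confirming it really is c's child.
          // Is that check strictly necessary?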
          // set the parent's modification time
          inodes[inodes.length-2].setModificationTime(modificationTime);
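          // c then updates its own mtime. Note that deleting a file changes the mtime of
          // its parent inode, which is useful when analyzing cold data.
          // Everything so far is a namespace (inode) operation; the real data deletion starts here.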
          // GC all the blocks underneath the node.
          ArrayList<Block> v = new ArrayList<Block>();
          int filesRemoved = targetNode.collectSubtreeBlocksAndClear(v);
          // Put the blocks to delete into v.
          namesystem.removePathAndBlocks(src, v);
          // FSNamesystem on the namenode starts deleting the blocks; how it does so is covered below.
          if (NameNode.stateChangeLog.isDebugEnabled()) {
            NameNode.stateChangeLog.debug("DIR* FSDirectory.unprotectedDelete: "
              +src+" is removed");
          }
          return filesRemoved;
        } catch (IOException e) {
          NameNode.stateChangeLog.warn("DIR* FSDirectory.unprotectedDelete: " +
              "failed to remove " + src + " because " + e.getMessage());
          return 0;
        }
      }
    }
  }
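The namespace half of unprotectedDelete (resolve the path into an inode array, refuse the root, binary-search the parent's sorted children, update the parent's mtime) can be reproduced on a toy tree. This is a sketch under my own assumptions: ToyINode and its methods are hypothetical, and the real INode classes carry far more state.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Toy inode; children are kept sorted by name so binary search works.
class ToyINode implements Comparable<ToyINode> {
    final String name;
    long mtime;
    final List<ToyINode> children = new ArrayList<ToyINode>();

    ToyINode(String name) { this.name = name; }

    public int compareTo(ToyINode o) { return name.compareTo(o.name); }

    ToyINode addChild(String childName) {
        ToyINode c = new ToyINode(childName);
        int pos = Collections.binarySearch(children, c);
        if (pos >= 0) return children.get(pos);
        children.add(-pos - 1, c); // insert at the sorted position
        return c;
    }

    ToyINode getChild(String childName) {
        int pos = Collections.binarySearch(children, new ToyINode(childName));
        return pos >= 0 ? children.get(pos) : null;
    }

    // Resolve "/a/b" into [root, a, b]; entries past a missing component stay null.
    static ToyINode[] resolve(ToyINode root, String path) {
        String[] parts = path.equals("/") ? new String[0] : path.substring(1).split("/");
        ToyINode[] inodes = new ToyINode[parts.length + 1];
        inodes[0] = root;
        for (int i = 0; i < parts.length && inodes[i] != null; i++) {
            inodes[i + 1] = inodes[i].getChild(parts[i]);
        }
        return inodes;
    }

    // Namespace part of a delete: unlink the target and touch the parent's mtime.
    static boolean delete(ToyINode root, String path, long now) {
        ToyINode[] inodes = resolve(root, path);
        ToyINode target = inodes[inodes.length - 1];
        if (target == null || inodes.length == 1) {
            return false; // path does not exist, or it is the root
        }
        ToyINode parent = inodes[inodes.length - 2];
        int pos = Collections.binarySearch(parent.children, target);
        if (pos < 0) return false;
        parent.children.remove(pos);
        parent.mtime = now;
        return true;
    }
}
```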

collectSubtreeBlocksAndClear is an abstract method that both INodeDirectory and INodeFile implement. Let's look at each, starting with INodeDirectory:

  int collectSubtreeBlocksAndClear(List<Block> v) {
    int total = 1;
    if (children == null) {
      return total;
    }
    for (INode child : children) { // children is the List in which this directory keeps all of its child inodes
      total += child.collectSubtreeBlocksAndClear(v); // clearly a recursive call
    }
    parent = null;
    children = null;
    return total;
  }

Now the INodeFile version:

  int collectSubtreeBlocksAndClear(List<Block> v) {
    parent = null; // every INodeFile has a parent directory
    // blocks (protected BlockInfo blocks[] = null;) holds this file's block info:
    // block ID, the datanodes holding it, replica count, and other details.
    for (Block blk : blocks) {
      v.add(blk);
    }
    blocks = null;
    return 1;
  }
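The two implementations above can be mimicked on a toy tree to see what the recursion returns. This is only a sketch: SketchNode, SketchDir and SketchFile are hypothetical names, and plain long IDs stand in for Block objects.

```java
import java.util.ArrayList;
import java.util.List;

// Toy counterparts of INode, INodeDirectory and INodeFile.
abstract class SketchNode {
    SketchNode parent;
    // Appends this subtree's block IDs to out and returns the number of inodes removed.
    abstract int collectBlocksAndClear(List<Long> out);
}

class SketchDir extends SketchNode {
    List<SketchNode> children = new ArrayList<SketchNode>();

    SketchDir add(SketchNode c) {
        c.parent = this;
        children.add(c);
        return this;
    }

    int collectBlocksAndClear(List<Long> out) {
        int total = 1; // count this directory itself
        if (children == null) {
            return total;
        }
        for (SketchNode c : children) {
            total += c.collectBlocksAndClear(out); // recurse into each child
        }
        parent = null;
        children = null;
        return total;
    }
}

class SketchFile extends SketchNode {
    long[] blocks;

    SketchFile(long... blocks) { this.blocks = blocks; }

    int collectBlocksAndClear(List<Long> out) {
        parent = null;
        for (long b : blocks) {
            out.add(b); // only files contribute blocks
        }
        blocks = null;
        return 1;
    }
}
```

For a directory holding one two-block file and one subdirectory with a one-block file, the call returns 4 (two directories plus two files) and collects three block IDs.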
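So collectSubtreeBlocksAndClear gathers the BlockInfo objects (BlockInfo extends Block) of every file at every level under the directory being deleted. Directories themselves contribute nothing, because a directory has no blocks; once it is cleaned out of the namespace, it no longer exists. With the blocks in hand, the deletion can proceed. Here is how: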
  void removePathAndBlocks(String src, List<Block> blocks) throws IOException {
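    leaseManager.removeLeaseWithPrefixPath(src); // drop the leases on these files; leases are a topic for another time
    for(Block b : blocks) {
      blocksMap.removeINode(b); // FSNamesystem keeps a blocksMap caching the info of every block on HDFS; peta distributes it, because the namenode cannot afford the memory for that many entries
      corruptReplicas.removeFromCorruptReplicasMap(b); // a corrupt block being deleted will be listed in corruptReplicas; since the file is going away, it no longer needs to be reported as corrupt
      addToInvalidates(b); // as the name suggests, mark these blocks as invalid, i.e. deletable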
    }
  }

Now the addToInvalidates method:

private void addToInvalidates(Block b) {
    for (Iterator<DatanodeDescriptor> it = 
                                blocksMap.nodeIterator(b); it.hasNext();) {
      DatanodeDescriptor node = it.next();
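      addToInvalidates(b, node); // use the BlockInfo to find every datanode that holds the block
      // (the two-argument overload is not shown here; the bookkeeping is done by addToInvalidatesNoLog below)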
    }
  }
  void addToInvalidatesNoLog(Block b, DatanodeInfo n) {
    Collection<Block> invalidateSet = recentInvalidateSets.get(n.getStorageID());
    if (invalidateSet == null) {
      invalidateSet = new HashSet<Block>();
      recentInvalidateSets.put(n.getStorageID(), invalidateSet);
    } // build the cache of invalidated blocks; together with pendingDeletionBlocksCount below, it records which blocks currently need deleting

    if (invalidateSet.add(b)) {
      pendingDeletionBlocksCount++;
    }
  }
  // Keeps a Collection for every named machine containing
  // blocks that have recently been invalidated and are thought to live
  // on the machine in question.
  // Mapping: StorageID -> ArrayList<Block>
  //
  private Map<String, Collection<Block>> recentInvalidateSets = 
    new TreeMap<String, Collection<Block>>();

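As the code above shows, recentInvalidateSets holds the invalidated blocks, that is, the blocks awaiting deletion, which the datanodes will later be told to remove. A counter, pendingDeletionBlocksCount, tracks how many there are.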
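The bookkeeping is easy to try out in isolation. Below is a minimal sketch that keeps the same map-of-sets shape and the same counter rule (count only when the set actually grows); InvalidateSetsSketch and its accessor methods are hypothetical, and long IDs stand in for Block objects.

```java
import java.util.Collection;
import java.util.HashSet;
import java.util.Map;
import java.util.TreeMap;

// Toy version of the pending-deletion bookkeeping: storage ID -> blocks to invalidate.
class InvalidateSetsSketch {
    private final Map<String, Collection<Long>> recentInvalidateSets =
        new TreeMap<String, Collection<Long>>();
    private long pendingDeletionBlocksCount = 0;

    void addToInvalidates(long blockId, String storageId) {
        Collection<Long> invalidateSet = recentInvalidateSets.get(storageId);
        if (invalidateSet == null) {
            invalidateSet = new HashSet<Long>();
            recentInvalidateSets.put(storageId, invalidateSet);
        }
        // A replica queued twice for the same datanode is counted once.
        if (invalidateSet.add(blockId)) {
            pendingDeletionBlocksCount++;
        }
    }

    long pending() {
        return pendingDeletionBlocksCount;
    }

    int pendingOn(String storageId) {
        Collection<Long> s = recentInvalidateSets.get(storageId);
        return s == null ? 0 : s.size();
    }
}
```

Note that the counter counts replicas, not logical blocks: one block with three replicas adds three to it.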
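At this point we can see what a client-initiated delete actually does:
1. Find the inode to delete by matching from the root, collecting all of its ancestor inodes back up to "/" along the way. The parent alone would be enough for the delete itself, but quota management is involved later, so all the ancestors are needed.
2. The parent inode removes this inode from the children it manages and updates its own mtime.
3. Recurse down from the inode to find every block of every file beneath it, and clear the leases on the target and everything under it.
4. Remove the records of the blocks being deleted from the namenode's blocksMap.
5. Check whether these blocks are in the namenode's corrupt-replica map, and remove them if so.
6. Add the blocks to the invalidated-block cache and increase the pending-deletion block count.
Steps 1 and 2 are namespace operations; steps 3 through 6 are file-management operations, which in peta are handled by fms.
So a delete does not make the namenode locate the blocks and immediately order the datanodes to remove the data. It only updates the namespace, drops the file's blocks into a bucket (recentInvalidateSets), and adds to the count of blocks awaiting deletion.
To be continued...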
