f2fs series, part 10: how exactly does f2fs avoid the wandering tree problem?

Original

2020-05-17 13:21

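Glossary

f2fs
Flash-Friendly File System, a file system designed to be friendly to flash/SSD storage, originally developed by Samsung.
block
The smallest unit in which f2fs reads and writes data, usually 4 KB. Every block has a number, and each number corresponds to a physical slice at a fixed offset. Blocks come in two kinds:

data block: stores file data only.
node block: stores index entries only, that is, block numbers. When an entry refers to another node block, the number can be regarded as a node ID (NID).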

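The f2fs index hierarchy

There are four levels:

inode
The index node of a file or directory, with the same meaning as an ordinary inode. It can be thought of as the top-level index of the file or directory.
direct pointer
Stored inside the inode; the address points directly to a data block.
indirect node
A node block whose entries are node IDs. Each node ID must be looked up in the NAT to find the node block that holds the data block addresses.
double indirect node
A node block whose entries are node IDs. Each node ID must be looked up in the NAT to find the corresponding indirect node.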
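To make the hierarchy concrete, here is a minimal sketch that classifies a file block index by the level of index that reaches it. It follows the direct(2), indirect(2), double_indirect(1) layout noted in the `i_nid` comment of the inode structure later in this post. The fan-out constants (923 addresses in the inode, 1018 entries per node block) are assumed values for a 4 KiB block, and every name in the sketch is hypothetical rather than taken from f2fs code.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed fan-out values for a 4 KiB block; illustrative constants only. */
#define TOY_ADDRS_PER_INODE 923u  /* data block addresses held in the inode */
#define TOY_ADDRS_PER_BLOCK 1018u /* data block addresses per direct node */
#define TOY_NIDS_PER_BLOCK  1018u /* node IDs per indirect node */

enum index_level {
    IN_INODE,                 /* reached by a direct pointer in the inode */
    VIA_DIRECT_NODE,          /* inode -> direct node -> data */
    VIA_INDIRECT_NODE,        /* inode -> indirect -> direct -> data */
    VIA_DOUBLE_INDIRECT_NODE, /* inode -> double indirect -> indirect -> ... */
    OUT_OF_RANGE
};

/* Classify a file block index by the level of index that reaches it. */
enum index_level classify_block(uint64_t idx)
{
    /* Two direct nodes, two indirect nodes, one double indirect node. */
    const uint64_t direct = 2ull * TOY_ADDRS_PER_BLOCK;
    const uint64_t indirect = 2ull * TOY_NIDS_PER_BLOCK * TOY_ADDRS_PER_BLOCK;
    const uint64_t dindirect =
        (uint64_t)TOY_NIDS_PER_BLOCK * TOY_NIDS_PER_BLOCK * TOY_ADDRS_PER_BLOCK;

    if (idx < TOY_ADDRS_PER_INODE)
        return IN_INODE;
    idx -= TOY_ADDRS_PER_INODE;
    if (idx < direct)
        return VIA_DIRECT_NODE;
    idx -= direct;
    if (idx < indirect)
        return VIA_INDIRECT_NODE;
    idx -= indirect;
    if (idx < dindirect)
        return VIA_DOUBLE_INDIRECT_NODE;
    return OUT_OF_RANGE;
}
```

Under these assumed constants, the first 923 blocks of a file need no node block at all, which is why small files are cheap to index.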

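NAT
As the name suggests, the NAT is the node address table: the lookup table that translates the logical addresses (node IDs) used inside a file or directory into physical addresses.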

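Background

A file system built on LFS (log-structured file system) suffers from the wandering tree problem if it uses an ordinary multi-level index. For example, suppose the following multi-level index is used:

When data reached through an indirect pointer is updated, an LFS-based implementation must also update the index tree above it, as shown in the figure below:

Clearly, a single data update forces three levels of index structure to be rewritten, so write amplification is high. This is the wandering tree problem of LFS. How does f2fs solve it?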
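The propagation can be sketched in a few lines of C. This is a toy model, not f2fs code: it assumes a hypothetical four-level path (inode, indirect, direct, data) in which every index block stores the physical address of its child, and all names are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_LEVELS 4 /* inode, indirect, direct, data (hypothetical layout) */

struct toy_lfs_path {
    uint32_t addr[TOY_LEVELS]; /* addr[i]: physical address of the level-i block */
    uint32_t log_head;         /* next free address at the end of the log */
};

/*
 * Update the data block out of place and return how many blocks had to be
 * written. Relocating a block changes the pointer stored in its parent, and
 * the parent cannot be patched in place either, so the rewrite climbs all
 * the way from the leaf to the root.
 */
int toy_lfs_update_data(struct toy_lfs_path *p)
{
    int written = 0;

    assert(p != NULL);
    for (int level = TOY_LEVELS - 1; level >= 0; level--) {
        p->addr[level] = p->log_head++; /* append a new copy to the log */
        written++;
    }
    return written;
}
```

In this model one logical data write costs four block writes: the data block plus the three index blocks above it.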

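How it works

Looking at the process above, the root cause of the wandering tree is that an upper-level index points directly at the lower-level index, and an LFS cannot update an index in place. As soon as a lower-level index changes, the upper-level index has to be updated as well. The remedy is to insert a "firewall" layer between the upper and lower levels, so that the two no longer affect each other and an update does not propagate upward.

In f2fs this layer is the NAT (Node Address Table), a single shared table that maps each node ID to the address of its node block.
See the figure below:

For example, to read the data block for block 11 of a file: follow pointer11 to the corresponding indirect block; use the inode's index hierarchy to work out which entry of that indirect block covers the target block; read the node ID stored in that entry; look the node ID up in the NAT to get the final physical address; and finally use that physical address to access the data.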
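The lookup and the matching out-of-place update can be sketched with a small in-memory model. Everything here is an assumption made for illustration: the "disk" is an array, the fan-out is 4 instead of roughly a thousand, and the `toy_*` names do not exist in f2fs. The point it tries to show is that the indirect node stores a node ID rather than an address, so relocating a direct node changes only the NAT entry.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define TOY_DISK_BLOCKS 64
#define TOY_MAX_NODES   16
#define TOY_FANOUT      4 /* entries per node block; tiny for illustration */

struct toy_direct_node   { uint32_t addr[TOY_FANOUT]; }; /* data block addresses */
struct toy_indirect_node { uint32_t nid[TOY_FANOUT]; };  /* node IDs of direct nodes */

union toy_block {
    struct toy_direct_node dn;
    struct toy_indirect_node in;
};

struct toy_fs {
    union toy_block disk[TOY_DISK_BLOCKS]; /* indexed by physical address */
    uint32_t nat[TOY_MAX_NODES];           /* node ID -> physical address */
    uint32_t log_head;                     /* next free physical address */
};

/* Build one indirect node (NID 1) that references one direct node (NID 2). */
void toy_fs_init(struct toy_fs *fs)
{
    memset(fs, 0, sizeof(*fs));
    fs->nat[1] = 1; /* indirect node lives at address 1 */
    fs->nat[2] = 2; /* direct node lives at address 2 */
    fs->disk[1].in.nid[0] = 2;
    for (uint32_t i = 0; i < TOY_FANOUT; i++)
        fs->disk[2].dn.addr[i] = 10 + i; /* data blocks at addresses 10..13 */
    fs->log_head = 20;
}

/* Resolve block `idx` below an indirect node to a data block address. */
uint32_t toy_lookup(const struct toy_fs *fs, uint32_t indirect_nid, uint32_t idx)
{
    assert(indirect_nid < TOY_MAX_NODES && idx < TOY_FANOUT * TOY_FANOUT);
    const struct toy_indirect_node *in = &fs->disk[fs->nat[indirect_nid]].in;
    uint32_t direct_nid = in->nid[idx / TOY_FANOUT];      /* entry -> node ID */
    const struct toy_direct_node *dn = &fs->disk[fs->nat[direct_nid]].dn;
    return dn->addr[idx % TOY_FANOUT];                    /* NAT -> address */
}

/* Rewrite block `idx` out of place; returns the number of blocks appended. */
int toy_update(struct toy_fs *fs, uint32_t indirect_nid, uint32_t idx)
{
    assert(fs->log_head + 2 <= TOY_DISK_BLOCKS);
    uint32_t direct_nid = fs->disk[fs->nat[indirect_nid]].in.nid[idx / TOY_FANOUT];
    uint32_t new_data_addr = fs->log_head++;
    uint32_t new_node_addr = fs->log_head++;

    /* Copy the direct node to its new location and patch one address. */
    fs->disk[new_node_addr] = fs->disk[fs->nat[direct_nid]];
    fs->disk[new_node_addr].dn.addr[idx % TOY_FANOUT] = new_data_addr;
    /* Only the NAT entry changes; the indirect node still holds NID 2. */
    fs->nat[direct_nid] = new_node_addr;
    return 2; /* the data block and the direct node */
}
```

Compared with a tree of raw addresses, the update stops at the direct node: the indirect node and everything above it stay where they are, at the cost of one extra table lookup per node on reads.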

Related data structures

The main data structures are:

inode

struct f2fs_inode {
    ......
    __le32 i_flags;                 /* file attributes */
    __le32 i_pino;                  /* parent inode number */
    __le32 i_namelen;               /* file name length */
    __u8 i_name[F2FS_NAME_LEN];     /* file name for SPOR */
    __u8 i_dir_level;               /* dentry_level for large dir */
    ......
    union {
            struct { /* extra attributes, overlaid on the start of i_addr */
                    __le16 i_extra_isize;   /* extra inode attribute size */
                    __le32 i_inode_checksum;/* inode meta checksum */
                    __le64 i_crtime;        /* creation time */
                    __le32 i_crtime_nsec;   /* creation time in nano scale */
                    __le32 i_extra_end[0];  /* for attribute size calculation */
            } __packed;
            __le32 i_addr[DEF_ADDRS_PER_INODE];     /* Pointers to data blocks */
    };
    __le32 i_nid[DEF_NIDS_PER_INODE];       /* direct(2), indirect(2),
                                            double_indirect(1) node id */
} __packed;

node

struct direct_node {
        __le32 addr[ADDRS_PER_BLOCK];   /* array of data block address */
} __packed;

struct indirect_node {
        __le32 nid[NIDS_PER_BLOCK];     /* array of node IDs */
} __packed;

struct f2fs_node {
        /* can be one of three types: inode, direct, and indirect types */
        union {
                struct f2fs_inode i;
                struct direct_node dn;
                struct indirect_node in;
        };
        struct node_footer footer;
} __packed;

/*
 * For NAT entries
 */
#define NAT_ENTRY_PER_BLOCK (PAGE_SIZE / sizeof(struct f2fs_nat_entry))

struct f2fs_nat_entry {
        __u8 version;           /* latest version of cached nat entry */
        __le32 ino;             /* inode number */
        __le32 block_addr;      /* block address */
} __packed;

struct f2fs_nat_block {
        struct f2fs_nat_entry entries[NAT_ENTRY_PER_BLOCK];
} __packed;
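As a worked example of the `NAT_ENTRY_PER_BLOCK` definition above, the sketch below mirrors the entry layout with standard integer types and computes where a node ID would land. It assumes a 4 KiB page and a GCC/Clang-style `__attribute__((packed))`. The `toy_*` names are hypothetical, and placing node IDs in NAT blocks by simple division is an assumption of this sketch.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_PAGE_SIZE 4096u /* assumed 4 KiB page/block size */

/* Same three fields as f2fs_nat_entry, using standard integer types. */
struct toy_nat_entry {
    uint8_t  version;
    uint32_t ino;
    uint32_t block_addr;
} __attribute__((packed));

/* Packed, the entry is 1 + 4 + 4 = 9 bytes with no padding. */
_Static_assert(sizeof(struct toy_nat_entry) == 9, "unexpected entry size");

/* 4096 / 9 = 455 entries per block; the last byte of the block is unused. */
#define TOY_NAT_ENTRY_PER_BLOCK \
    ((uint32_t)(TOY_PAGE_SIZE / sizeof(struct toy_nat_entry)))

/* Index of the NAT block that holds the entry for `nid`. */
uint32_t toy_nat_block_of(uint32_t nid)
{
    return nid / TOY_NAT_ENTRY_PER_BLOCK;
}

/* Slot of `nid` inside its NAT block. */
uint32_t toy_nat_slot_of(uint32_t nid)
{
    return nid % TOY_NAT_ENTRY_PER_BLOCK;
}

/* First node ID covered by NAT block `blk`. */
uint32_t toy_nat_first_nid(uint32_t blk)
{
    assert(blk <= UINT32_MAX / TOY_NAT_ENTRY_PER_BLOCK);
    return blk * TOY_NAT_ENTRY_PER_BLOCK;
}
```

With these numbers, node ID 1000 falls in NAT block 2 at slot 90, since block 2 starts at node ID 910.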

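Updating the NAT

As the flow above shows, writing a file also updates the corresponding NAT entries. Does the NAT have to be written on every I/O, then? That would itself be a source of write amplification, so in practice the updates can be batched and written to disk together, reducing the total number of writes.

What happens on power loss if the NAT is not updated immediately? The NAT updates can themselves be appended to disk in log-structured form. When f2fs takes a checkpoint, it writes the accumulated updates into the NAT in one pass and then reclaims the space used by the earlier NAT update log. For the flow, see: http://xiaqichao.cn/wordpress/?p=211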
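The batching idea can be sketched as a small append-only journal that is folded into the table at checkpoint time. This is a simplified in-memory model under assumed sizes, with hypothetical `toy_*` names. It does not reproduce the real on-disk journal format or the recovery path.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_NAT_SIZE    32
#define TOY_JOURNAL_CAP 8

struct toy_nat_update {
    uint32_t nid;
    uint32_t blkaddr;
};

struct toy_nat_state {
    uint32_t nat[TOY_NAT_SIZE];                     /* NAT as of last checkpoint */
    struct toy_nat_update journal[TOY_JOURNAL_CAP]; /* appended update log */
    int journal_len;
    int nat_writes;                                 /* times the NAT was rewritten */
};

/* Fold all journaled updates into the NAT, then reclaim the journal. */
int toy_nat_checkpoint(struct toy_nat_state *s)
{
    int applied = s->journal_len;

    for (int i = 0; i < s->journal_len; i++)
        s->nat[s->journal[i].nid] = s->journal[i].blkaddr;
    s->journal_len = 0;
    if (applied > 0)
        s->nat_writes++; /* one batched NAT write instead of one per update */
    return applied;
}

/* Record a node relocation by appending to the journal. */
void toy_nat_record(struct toy_nat_state *s, uint32_t nid, uint32_t blkaddr)
{
    assert(nid < TOY_NAT_SIZE);
    if (s->journal_len == TOY_JOURNAL_CAP)
        toy_nat_checkpoint(s); /* journal full: force a flush */
    s->journal[s->journal_len].nid = nid;
    s->journal[s->journal_len].blkaddr = blkaddr;
    s->journal_len++;
}

/* Current address of a node: the newest journal entry wins over the NAT. */
uint32_t toy_nat_current(const struct toy_nat_state *s, uint32_t nid)
{
    assert(nid < TOY_NAT_SIZE);
    for (int i = s->journal_len - 1; i >= 0; i--)
        if (s->journal[i].nid == nid)
            return s->journal[i].blkaddr;
    return s->nat[nid];
}
```

In this model three relocations cost a single NAT write at checkpoint time, and a reader that consults the journal first always sees the newest address.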

// First published at http://xiaqichao.cn. Visitors are welcome.
