HBase in Action 2.2.7 Compactions: HBase housekeeping




The Delete command doesn’t delete the value immediately. Instead, it marks the
record for deletion. That is, a new “tombstone” record is written for that value, marking
it as deleted. The tombstone is used to indicate that the deleted value should no
longer be included in Get or Scan results. Because HFiles are immutable, it’s not until
a major compaction runs that these tombstone records are reconciled and space is truly
recovered from deleted records.

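The Delete command does not remove the data immediately. It only marks the record as deleted (where is the marker stored?). In other words, a tombstone record is written for that data, marking it as deleted. The tombstone indicates that the record should not be included in Get or Scan results. Because HFiles are immutable, the tombstone records are not reconciled, and the space held by deleted data is not reclaimed, until a major compaction runs.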
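The tombstone mechanism can be sketched with a toy model. This is not the HBase API: `ToyStore`, `TOMBSTONE`, and the method names are hypothetical, and the model ignores timestamps and versions. It assumes, as in HBase, that a delete marker is written like any other cell, first to memory and then flushed to an immutable file, which is also one answer to the question of where the marker lives.

```python
# Toy model only: this is not the HBase API. ToyStore and TOMBSTONE are
# hypothetical names that exist just for this illustration.
TOMBSTONE = "<deleted>"  # stand-in for a delete marker


class ToyStore:
    def __init__(self):
        self.files = []      # flushed, never-modified "HFiles" (oldest first)
        self.memstore = {}   # in-memory writes not yet flushed

    def put(self, key, value):
        self.memstore[key] = value

    def delete(self, key):
        # A delete is just another write: it records a tombstone.
        self.memstore[key] = TOMBSTONE

    def flush(self):
        # Freeze the current writes into a new file; old files are untouched.
        if self.memstore:
            self.files.append(dict(self.memstore))
            self.memstore = {}

    def get(self, key):
        # Check the newest data first: memstore, then files newest to oldest.
        for layer in [self.memstore] + self.files[::-1]:
            if key in layer:
                value = layer[key]
                return None if value == TOMBSTONE else value
        return None


store = ToyStore()
store.put("row1", "v1")
store.flush()
store.delete("row1")
store.flush()

print(store.get("row1"))   # the tombstone hides the value from reads
print(store.files[0])      # the old value still occupies space in its file
```

The read path returns nothing for `row1`, yet the first file still contains the original value. Nothing is reclaimed until the files are rewritten.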

Compactions come in two flavors: minor and major. Both types result in a consolidation
of the data persisted in HFiles. A minor compaction folds HFiles together,
creating a larger HFile from multiple smaller HFiles, as shown in figure 2.3. Restricting
the number of HFiles is important for read performance, because all of them
must be referenced to read a complete row. During the compaction, HBase reads the
content of the existing HFiles, writing records into a new one. Then, it swaps in the
new HFile as the current active one and deletes the old ones that formed the new
one. HBase decides which HFiles to compact based on their number and relative
sizes. Minor compactions are designed to be minimally detrimental to HBase performance,
so there is an upper limit on the number of HFiles involved. All of these settings
are configurable.

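There are two kinds of compaction: major and minor. Both consolidate the data stored in HFiles. A minor compaction combines several small HFiles into one larger HFile. Limiting the number of HFiles matters for read performance, because reading a complete row has to reference all of them (does this mean the matching blocks must be located across the blocks of every HFile, so that more HFiles means slower lookups?). During a compaction, HBase reads the existing HFiles and writes their records into a new HFile, then switches over to the new HFile and deletes the old ones. HBase chooses which HFiles to compact based on how many there are and their relative sizes. Minor compactions are designed to affect HBase performance only slightly, so the number of HFiles involved is capped. All of these settings are configurable.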
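A minimal sketch of the minor-compaction steps described above, as a toy model rather than HBase code. The `(key, seq, value)` cell layout, the `seq` write counter, and the "smallest files first" policy in `select_files` are all assumptions made for illustration. HBase's real selection logic weighs file count and relative sizes and is configurable.

```python
import heapq

# Toy model only, not HBase code. Each "HFile" is a list of (key, seq, value)
# tuples sorted by key; seq is a hypothetical write counter where a larger
# seq means a newer write.


def select_files(files, max_files=3):
    # Hypothetical policy: pick the smallest files, capped at max_files.
    return sorted(files, key=len)[:max_files]


def minor_compact(files, max_files=3):
    chosen = select_files(files, max_files)
    chosen_ids = {id(f) for f in chosen}
    untouched = [f for f in files if id(f) not in chosen_ids]
    # Stream the sorted inputs into one new sorted file, then swap it in
    # for the files that formed it.
    merged = list(heapq.merge(*chosen))
    return untouched + [merged]


def read_row(files, key):
    # A read consults every file, because any of them may hold the key.
    hits = [cell for f in files for cell in f if cell[0] == key]
    return max(hits, key=lambda c: c[1])[2] if hits else None


files = [
    [("a", 1, "v1"), ("c", 2, "v2")],
    [("b", 3, "v3")],
    [("a", 4, "v4"), ("d", 5, "v5")],
    [("a", 6, "x"), ("b", 7, "x"), ("c", 8, "x"), ("e", 9, "x")],  # large file
]
after = minor_compact(files, max_files=3)

print(len(files), "->", len(after))               # fewer files to consult
print(read_row(files, "a"), read_row(after, "a"))  # same answer either way
```

The three small files are folded into one and the large file is left alone, so a later read of any row touches two files instead of four while returning the same result.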

When a compaction operates over all HFiles in a column family in a given region, it’s
called a major compaction. Upon completion of a major compaction, all HFiles in the
column family are merged into a single file. Major compactions can also be triggered for the entire table (or a particular region) manually from the shell.

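When a compaction covers all HFiles of one column family in one region, it is a major compaction. When a major compaction finishes, all HFiles of that column family are merged into a single file (judging from the previous sentence, a major compaction should not span regions?). A major compaction can be triggered manually from the shell, either for an entire table or for a particular region.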

This is a relatively expensive operation and isn’t done often. Minor compactions, on the other hand, are relatively lightweight and happen more frequently.

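Major compactions are relatively resource-intensive and are not run often. Minor compactions, by contrast, are relatively lightweight and run more frequently.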

Major compactions are the only chance HBase has to clean up deleted records. Resolving a delete requires removing both the deleted record and the deletion marker. There’s no guarantee that both the record and marker are in the same HFile. A major compaction is the only time when
HBase is guaranteed to have access to both of these entries at the same time.
The compaction process is described in greater detail, along with incremental
illustrations, in a post on the NGDATA blog.

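A major compaction is the only chance HBase has to purge deleted records. Resolving a delete means removing both the deleted record and its delete marker. The deleted record and the marker are not necessarily in the same HFile (if delete markers are not stored in some special place, doesn't looking up a record require searching every HFile?). A major compaction is the only time HBase is guaranteed access to both entries at once. The compaction process is described in detail, with illustrations, on the NGDATA blog (www.ngdata.com/site/blog/74-ng.html).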
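The reason only a major compaction can resolve a delete can be sketched as follows. This is a toy model, not HBase code: the `compact` function, the `TOMBSTONE` stand-in, and the `seq` write counter are hypothetical, and the model keeps only the newest entry per key instead of tracking versions.

```python
# Toy model only, not HBase code. Each file maps key -> (seq, value); a larger
# seq (a hypothetical write counter) means a newer write.
TOMBSTONE = "<deleted>"  # stand-in for a delete marker


def compact(files, major):
    newest = {}
    for f in files:
        for key, (seq, value) in f.items():
            if key not in newest or seq > newest[key][0]:
                newest[key] = (seq, value)
    if major:
        # Every file of the column family is an input, so no older copy of a
        # tombstoned key can survive elsewhere: drop record and marker alike.
        newest = {k: c for k, c in newest.items() if c[1] != TOMBSTONE}
    # Otherwise the tombstones are kept: a file outside this compaction may
    # still hold the value they are meant to hide.
    return newest


f1 = {"row1": (1, "v1"), "row2": (2, "v2")}
f2 = {"row3": (3, "v3")}
f3 = {"row1": (4, TOMBSTONE)}

partial = compact([f2, f3], major=False)    # f1 is not part of this run
full = compact([f1, f2, f3], major=True)    # all files are inputs

print(partial)  # still carries the tombstone for row1
print(full)     # row1 and its tombstone are both gone
```

In the partial run the marker for `row1` has to survive, because the value it hides sits in `f1`, which was not an input. Only when all three files are compacted together can both entries be dropped safely.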

