leveldb之cache

當向leveldb寫入數據時，首先是將數據寫入leveldb的Memtable（Memtable可能轉化爲IMMemtable）中，Memtable是存儲在內存中的。只有經過compaction操作後，纔會將內存中的數據寫入到磁盤中的sstable中。

當要讀數據時，首先在Memtable中查找，若沒有找到，則在sstable中繼續查找。而sstable是存儲在磁盤中的，這樣就需要進行多次磁盤操作，速度會非常慢。爲了加快查找速度，leveldb在採用了Cache的方式，盡最大可能減少隨機讀操作。

cache分爲Table Cache和 Block Cache兩種，其中Table Cache中緩存的是sstable的索引數據，Block Cache緩存的是Block數據，Block Cache是可選的，即可以在配置中來選擇是否打開這個功能。

當要進行Compaction操作調用CompactMemTable()時，會調用WriteLevel0Table()，此時則會創建一個Meta File，並保存在Table Cache中，然後可通過Table Cache進行讀取。

leveldb中的Cache主要用到了雙向鏈表、哈希表和LRU（least recently used）思想。

1、LRUHandle

LRUHandle表示了Cache中的每一個元素，通過指針形成一個雙向循環鏈表：

struct LRUHandle {
  void* value;//cache存儲的數據
  void (*deleter)(const Slice&, void* value);//是將數據從Cache中清除時執行的函數
  LRUHandle* next_hash;//解決hash碰撞，指向下一個hash值相同的節點
  LRUHandle* next;//next和prev構成雙向循環鏈表
  LRUHandle* prev;
  size_t charge;      // 所佔內存大小
  size_t key_length;
  uint32_t refs;
  uint32_t hash;      // Hash of key(); used for fast sharding and comparisons
  char key_data[1];   // Beginning of key

  Slice key() const {
    if (next == this) {
      return *(reinterpret_cast<Slice*>(value));
    } else {
      return Slice(key_data, key_length);
    }
  }
};

在Table Cache中，Cache的key值是SSTable的文件名稱，Value部分包含兩部分，一個是指向磁盤打開的SSTable文件的文件指針，這是爲了方便讀取內容；另外一個是指向內存中這個SSTable文件對應的Table結構指針。這樣就將不同的sstable文件像cache一樣進行管理。

leveldb通過LRUHandle 結構將hash值相同的所有元素串聯成一個雙向循環鏈表，通過指針next_hash來解決hash 碰撞。

2、HandleTable

leveldb通過HandleTable維護一個哈希表：

class HandleTable {
 public:
  HandleTable() : length_(0), elems_(0), list_(NULL) { Resize(); }
  ~HandleTable() { delete[] list_; }

  LRUHandle* Lookup(const Slice& key, uint32_t hash);
  LRUHandle* Insert(LRUHandle* h);
  LRUHandle* Remove(const Slice& key, uint32_t hash);
 private:
  uint32_t length_;
  uint32_t elems_;
  LRUHandle** list_;

  LRUHandle** FindPointer(const Slice& key, uint32_t hash) ;
  void Resize() ;
};

HandleTable只有3個成員變量： uint32_t length_;   //hash表長，相當於二維數組的元素個數
                          uint32_t elems_;   //hash表中元素總個數
                        LRUHandle** list_; //hash表頭，相當於一個雙向循環鏈表數組，其中每一個元素指向一個循環鏈表

初始時，length_=elems_=0, list_=NULL；此時hash表爲空
然後調用resize()爲hash表分配空間，初始時分配4個元素，當空間不足時，則成倍增加。此時hash表的結構如下：

3、LRUCache

LRUCache顧名思義是指一個緩存，同時它用到了LRU的思想

class LRUCache {
 public:
  LRUCache();
  ~LRUCache();

  void SetCapacity(size_t capacity) { capacity_ = capacity; }
  Cache::Handle* Insert(const Slice& key, uint32_t hash,
                        void* value, size_t charge,
                        void (*deleter)(const Slice& key, void* value));
  Cache::Handle* Lookup(const Slice& key, uint32_t hash);
  void Release(Cache::Handle* handle);
  void Erase(const Slice& key, uint32_t hash);
 private:
  void LRU_Remove(LRUHandle* e);
  void LRU_Append(LRUHandle* e);
  void Unref(LRUHandle* e);

  // Initialized before use.
  size_t capacity_;

  // mutex_ protects the following state.
  port::Mutex mutex_;
  size_t usage_;

  LRUHandle lru_;
  HandleTable table_;
};

LRUCache維護了一個雙向循環鏈表lru_和一個hash表table，當要插入一個元素時，首先將其插入到鏈表lru的尾部，然後根據hash值將其插入到hash表中。

當hash表中已存在hash值與要插入元素的hash值相同的元素時，將原有元素從鏈表中移除，這樣就可以保證最近使用的元素在鏈表的最尾部，這也意味着最近最少使用的元素在鏈表的頭部，這樣即可實現LRU的思想。

capacity_：當前Cache的最大容量
usage_：當前Cache已使用的內存大小，當超過最大容量時，從鏈表的頭部開始移除元素
mutex_：保證每個Cache都是線程安全的

4、Cache和ShardedLRUCache

Class Cache採用虛函數定義了Cache的接口：

class Cache {
 public:
  Cache() { }
  virtual ~Cache();
  struct Handle { };//handle corresponds to a mapping

  virtual Handle* Insert(const Slice& key, void* value, size_t charge,
                         void (*deleter)(const Slice& key, void* value)) = 0;
  virtual Handle* Lookup(const Slice& key) = 0;
  virtual void Release(Handle* handle) = 0;
  virtual void* Value(Handle* handle) = 0;//handle returned by Lookup()
  virtual void Erase(const Slice& key) = 0;

  virtual uint64_t NewId() = 0;
 private:
  void LRU_Remove(Handle* e);
  void LRU_Append(Handle* e);
  void Unref(Handle* e);

  struct Rep;
  Rep* rep_;
};

具體實現的ShardedLRUCache繼承自Cache，實現了相應的功能：

class ShardedLRUCache : public Cache {
 private:
  LRUCache shard_[kNumShards];  //kNumShards = 2^4 = 16
  port::Mutex id_mutex_;
  uint64_t last_id_;
 public:
  virtual Handle* Insert(const Slice& key, void* value, size_t charge,
                         void (*deleter)(const Slice& key, void* value));
  virtual Handle* Lookup(const Slice& key);
  virtual void Release(Handle* handle) ;
  virtual void Erase(const Slice& key) ;
  virtual void* Value(Handle* handle) ;
};

由於每一個LRUCache是線程安全的，爲了多線程訪問，儘可能快速，減少鎖開銷，ShardedLRUCache內部將所有Cache根據hash值的高4位分爲16份，即有16個LRUCache分片。

查找Key時首先計算key屬於哪一個分片，分片的計算方法是取32位hash值的高4位，找到對應的LRUCache分配，然後在相應的LRUCache中進行查找，這樣就大大減少了多線程的訪問鎖的開銷。以Insert爲例：

  virtual Handle* Insert(const Slice& key, void* value, size_t charge,
                         void (*deleter)(const Slice& key, void* value)) {
    const uint32_t hash = HashSlice(key);//首先得到hash值的高4位,用於找到相應的分片
    return shard_[Shard(hash)].Insert(key, hash, value, charge, deleter);//然後在對應的LRUCache中進行相應的插入、移除等操作
  }

由上可知，ShardedLRUCache最終都是調用LRUCache的相應操作來實現的，因此下面將詳細介紹LRUCache中的相關操作

5、LRUCache的具體操作

假設此時剛剛完成LRUCache的初始化和Resize()

插入元素：

Cache::Handle* LRUCache::Insert(
    const Slice& key, uint32_t hash, void* value, size_t charge,
    void (*deleter)(const Slice& key, void* value)) {
  MutexLock l(&mutex_);

  LRUHandle* e = reinterpret_cast<LRUHandle*>(
      malloc(sizeof(LRUHandle)-1 + key.size()));
  e->value = value; //首先填充結構體LRUHandle
  e->deleter = deleter;
  e->charge = charge;
  e->key_length = key.size();
  e->hash = hash;
  e->refs = 2;  // One from LRUCache, one for the returned handle
  memcpy(e->key_data, key.data(), key.size());
  LRU_Append(e); //然後將元素加入到雙向循環鏈表中
  usage_ += charge;

  LRUHandle* old = table_.Insert(e); //將元素加入到hash表中
  if (old != NULL) {//當hash表中已存在hash值相同的元素時，將原有的元素移除
    LRU_Remove(old);
    Unref(old);
  }

 //當所有佔用空間超過最大容量時，則從雙向鏈表的表頭開始移除元素
 //由於表頭元素是最早加入的，因此是實現了LRU思想
  while (usage_ > capacity_ && lru_.next != &lru_) {
    LRUHandle* old = lru_.next;
    LRU_Remove(old);
    table_.Remove(old->key(), old->hash);
    Unref(old);
  }
  return reinterpret_cast<Cache::Handle*>(e);
}

1）調用LRU_Append(e) 將元素加入到雙向循環鏈表中：

雙向鏈表的插入操作示意圖如下：

當插入節點D時：

void LRUCache::LRU_Append(LRUHandle* e) {
  // Make "e" newest entry by inserting just before lru_
  e->next = &lru_;//D->next 指向表頭
  e->prev = lru_.prev; //D->prev=A->prev=C
  e->prev->next = e; //D->prev->next = C->next = D
  e->next->prev = e; //D->next->prev = A->prev = D
}

這樣就將一個節點插入到雙向鏈表的尾部了

2）調用 LRUHandle* old = table_.Insert(e); 向hash表中插入一個元素：

  LRUHandle* Insert(LRUHandle* h) {
    LRUHandle** ptr = FindPointer(h->key(), h->hash);//首先在hash表中進行查找
    LRUHandle* old = *ptr;
    h->next_hash = (old == NULL ? NULL : old->next_hash);//當old不爲NULL時，表示hash表中已存在一個與要插入的元素的hash值完全相同的元素，這樣只是替換元素，不會改變元素個數
    *ptr = h;
    if (old == NULL) {//old爲空表示插入一個新元素，要增加元素個數並調整hash表大小
      ++elems_;
      if (elems_ > length_) {
        Resize();
      }
    }
    return old;
  }

當要插入一個元素時，首先進行查找：

  LRUHandle** FindPointer(const Slice& key, uint32_t hash) {
    LRUHandle** ptr = &list_[hash & (length_ - 1)];
    while (*ptr != NULL &&
           ((*ptr)->hash != hash || key != (*ptr)->key())) {
      ptr = &(*ptr)->next_hash;
    }
    return ptr;
  }

假設首先要插入元素的hash值爲1：
此時length_ = 4，則 [ hash & (length_ - 1) ]=[01 & 11]= [1]，即ptr= &list_[1]，此時*ptr = NULL
則返回ptr並完成插入操作，此時hash表爲：

當要繼續插入元素，且元素的hash值爲5時：
此時length_ = 4，則 [ hash & (length_ - 1) ]=[101 & 11]= [1]，ptr= &list_[1]，此時*ptr = e(1)，繼續向後查找，ptr=&(*ptr)->next_hash; 最終*ptr = NULL ，則返回ptr並完成插入操作，此時hash表爲：

插入操作的基本操作爲：首先構造LRUHandle結構體，然後將其插入到雙向鏈表中，然後根據hash值將其插入到hash表中，並調整hash表的大小，最後根據內存大小，移除雙向鏈表表頭後第一個元素，來實現LRU算法。

其它操作如Lookup()、Erase()、Release()等，都是先根據hash值的高四位來找到對應的LRUCache，然後在相應的hash表中根據hash值找到相應元素，最終來對找到的元素執行相關操作。

6、總結

1）一個SharedLRUCache根據hash值的高4位，將所有的Cache分爲16份，每一份爲一個LRUCache。
2）LRUCache用來維護所有元素（LRUHandle），通過鎖mutex來保證線程安全，同時記錄當前
Cache的最大容量和已用內存。
3）每一個LRUCache中有一個循環鏈表（LRUHandle）和一個hash表（HandleTable），通過循環鏈
表實現LRU思想，通過hash表來加快查找速度。

假定設置的最大容量爲4，即capacity_=4，然後依次向其中插入元素1、5、2、4、9，簡單的示意圖如下：

當插入元素1，5, 2, 4後，usage=4。每個元素在hash表中的位置爲 list [ hash&length_-1 ] ，由此可知，1和5,2和4有着相同的hash值，分別對應list[1]，list[2]
此時得到如下結果（爲簡化，與循環鏈表頭部相連的部分指針線未畫出）：

此時已達到最大容量，假設要繼續插入一個元素9（對應list[1]），插入9後，usage>capacity，則要從鏈表頭部開始移除元素，即元素1會被移除，得到結果如下：

原文地址：http://blog.csdn.net/u012658346/article/details/45486051

站內首發文章

wolfcc

發佈了6 篇原創文章 · 獲贊 72 · 訪問量 65萬+

他的留言板關注

1、LRUHandle

2、HandleTable

3、LRUCache

4、Cache和ShardedLRUCache

5、LRUCache的具體操作

6、總結

再談23種設計模式（3）：行爲型模式（學習筆記）

Power Automate Desktop 安裝完，登錄後老是提示one driver 錯誤

微前端學習筆記(4):從微前端到微模塊之EMP與hel-micro方案探索

微前端學習筆記（1）：微前端總體架構概述，從微服務發微

985 碩士程序員，空窗 4 個月沒有 Offer！

一文搞懂 Spring 循環依賴

賽博鬥地主——使用大語言模型扮演Agent智能體玩牌類遊戲。

VScode右鍵打開(添加到右鍵)

記一次 .NET某工控視覺自動化系統卡死分析

WindowsServer--SQL Server搭建主從同步實現讀寫分離 - 事務性分發

Linux C++內存映射方式讀寫文件

uWSGI定時器導致web.py的內存泄露問題

linux和STL 常用頭文件及說明

百家爭鳴|國內外NLP領域學術界和工業界的牛人和團隊

centos 7上nginx+uwsgi 性能調優

Mac下配置sublime實現LaTeX

https://yachay.unat.edu.pe/blog/index.php?comment_area=format_blog&comment_component=blog&comment_co

linux以太網驅動總結