An Analysis of LRUCache in LevelDB

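Background:

Anyone who has studied operating systems will know the LRU cache algorithm, short for Least Recently Used. The motivation is that a cache has limited capacity and cannot store data without bound, so when the capacity is used up and new data has to be added, some existing entries must be evicted, and the entries chosen are the least recently used ones. (To my mind, "longest unused" is the more vivid name, because what the algorithm replaces each time is whatever has gone the longest without being used.)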

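Technical implementation:

A typical LRU implementation is a hash map plus a doubly linked list. The hash map lets a lookup in the cache return in O(1) time. The doubly linked list carries the least-recently-used ordering: every time an entry is accessed, it is moved to the head of the list. The closer an entry is to the head, the more recently it was used; the closer it is to the tail, the longer it has gone unused, so the tail entries are the ones to delete next. When the cache is full and something has to go, we simply walk the list from the tail and remove entries, which evicts exactly the least recently used data.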

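Here is an example. Suppose the head node is head and 1 has just been used. 1 is now the most recently used entry, so it is inserted at head's next:

head -> 1

Next, 2 is used and is also inserted at the head, which gives:

head -> 2 -> 1

Now 2 is the most recently used entry and 1 comes second. When something has to be deleted, 1, the entry that has gone unused the longest, is chosen first. That is the core idea of least recently used.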
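To make the textbook scheme concrete, here is a minimal sketch of a hash map plus doubly linked list LRU. It is not LevelDB code: the class name SimpleLRU, the int keys and the string values are all assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical textbook LRU cache: a hash map for O(1) lookup plus a
// doubly linked list ordered from most to least recently used.
class SimpleLRU {
 public:
  explicit SimpleLRU(size_t capacity) : capacity_(capacity) {}

  // Copies the value into *value and marks the key as most recently used.
  // Returns false if the key is not cached.
  bool Get(int key, std::string* value) {
    auto it = index_.find(key);
    if (it == index_.end()) return false;
    order_.splice(order_.begin(), order_, it->second);  // move to head
    *value = it->second->second;
    return true;
  }

  // Inserts or updates a key, evicting from the tail when over capacity.
  void Put(int key, std::string value) {
    auto it = index_.find(key);
    if (it != index_.end()) {
      it->second->second = std::move(value);
      order_.splice(order_.begin(), order_, it->second);
      return;
    }
    order_.emplace_front(key, std::move(value));
    index_[key] = order_.begin();
    if (index_.size() > capacity_) {
      index_.erase(order_.back().first);  // tail = least recently used
      order_.pop_back();
    }
  }

  bool Contains(int key) const { return index_.count(key) != 0; }

 private:
  using Item = std::pair<int, std::string>;
  size_t capacity_;
  std::list<Item> order_;  // front = newest, back = oldest
  std::unordered_map<int, std::list<Item>::iterator> index_;
};
```

With a capacity of 2, putting 1, 2 and then 3 evicts 1, just like the head -> 2 -> 1 example above. std::list::splice relinks a node without invalidating the iterator stored in the map, which is what keeps every operation O(1).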

LevelDB's LRUCache implementation:

First, look at the skeleton of the class:

// A single shard of sharded cache.
class LRUCache {
 public:
  LRUCache();
  ~LRUCache();

  // Separate from constructor so caller can easily make an array of LRUCache
  void SetCapacity(size_t capacity) { capacity_ = capacity; }

  // Like Cache methods, but with an extra "hash" parameter.
  Cache::Handle* Insert(const Slice& key, uint32_t hash,
                        void* value, size_t charge,
                        void (*deleter)(const Slice& key, void* value));
  Cache::Handle* Lookup(const Slice& key, uint32_t hash);
  void Release(Cache::Handle* handle);
  void Erase(const Slice& key, uint32_t hash);
  void Prune();
  size_t TotalCharge() const {
    MutexLock l(&mutex_);
    return usage_;
  }

 private:
  void LRU_Remove(LRUHandle* e);
  void LRU_Append(LRUHandle* list, LRUHandle* e);
  void Ref(LRUHandle* e);
  void Unref(LRUHandle* e);
  bool FinishErase(LRUHandle* e) EXCLUSIVE_LOCKS_REQUIRED(mutex_);

  // Initialized before use.
  size_t capacity_;

  // mutex_ protects the following state.
  mutable port::Mutex mutex_;
  size_t usage_ GUARDED_BY(mutex_);

  // Dummy head of LRU list.
  // lru.prev is newest entry, lru.next is oldest entry.
  // Entries have refs==1 and in_cache==true.
  LRUHandle lru_ GUARDED_BY(mutex_);      // lru_ is the "cold" list: cached entries that no client currently holds

  // Dummy head of in-use list.
  // Entries are in use by clients, and have refs >= 2 and in_cache==true.
  LRUHandle in_use_ GUARDED_BY(mutex_);  // in_use_ is the "hot" list: entries that clients currently hold

  HandleTable table_ GUARDED_BY(mutex_);
};

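The main data members are table_ of type HandleTable (presumably hash-related), in_use_ of type LRUHandle (as the name suggests, entries currently in use), lru_ of type LRUHandle (as the name suggests, the least recently used entries), plus the size_t members usage_ (capacity currently used) and capacity_ (total capacity). From in_use_ and lru_ we can guess that there are two doubly linked lists, one tracking entries that are in use and one tracking entries that are least recently used, and that eviction picks its victims from lru_.

The member functions are built around the private helpers LRU_Remove, LRU_Append, Ref and Unref. The public methods are SetCapacity, Insert, Lookup, Release, Erase, Prune and TotalCharge; their details are analyzed below. The plan is to go through what each function does and what role each data member plays in it, and then to step back and look at how thread safety is achieved, how this LRUCache differs from a plain LRU, and what is notable about the implementation.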

Cache::Handle* LRUCache::Insert(
    const Slice& key, uint32_t hash, void* value, size_t charge,
    void (*deleter)(const Slice& key, void* value)) {
  MutexLock l(&mutex_);

  LRUHandle* e = reinterpret_cast<LRUHandle*>(
      malloc(sizeof(LRUHandle)-1 + key.size()));
  e->value = value;
  e->deleter = deleter;
  e->charge = charge;
  e->key_length = key.size();
  e->hash = hash;
  e->in_cache = false;
  e->refs = 1;  // for the returned handle.
  memcpy(e->key_data, key.data(), key.size());

  if (capacity_ > 0) {
    e->refs++;  // for the cache's reference.
    e->in_cache = true;
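    LRU_Append(&in_use_, e);  // link the entry into the hot (in_use_) list
    usage_ += charge;         // account for the capacity it uses
    // Debug trace; %zu is the correct format specifier for size_t.
    fprintf(stderr, "fun(%s) line(%d) usage_(%zu) capacity_(%zu)\n", __FILE__, __LINE__, usage_, capacity_);
    // On an update, table_.Insert returns the old entry that the new one
    // displaced from the hash table; FinishErase finishes removing it from the cache.
    FinishErase(table_.Insert(e));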
  } else {  // don't cache. (capacity_==0 is supported and turns off caching.)
    // next is read by key() in an assert, so it must be initialized
    e->next = nullptr;
  }
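  // Usage exceeds capacity: evict the least recently used entries.
  while (usage_ > capacity_ && lru_.next != &lru_) {
    // While over capacity and the cold list is not empty, remove entries
    // from the cold list until usage_ <= capacity_.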
    LRUHandle* old = lru_.next;
    assert(old->refs == 1);
    bool erased = FinishErase(table_.Remove(old->key(), old->hash));
    if (!erased) {  // to avoid unused variable when compiled NDEBUG
      assert(erased);
    }
  }

  return reinterpret_cast<Cache::Handle*>(e);
}

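Insert goes through roughly these steps:

1. Dynamically allocate an LRUHandle, pointed to by e, of size sizeof(LRUHandle)-1 + key.size().
2. Initialize e from the arguments. Note that refs starts at 1 and in_cache starts as false, meaning the entry is not in the cache yet.
3. If the capacity is greater than 0, increment refs and set in_cache to true to mark the entry as cached, then do three things: (a) link e into the in_use_ list, since it is currently in use; (b) add its charge to usage_; (c) insert e into the hash table table_, passing any older entry with the same key to FinishErase.
4. If the capacity equals 0, caching is turned off and nothing is stored.
5. While usage_ exceeds capacity_ and lru_ is not empty, keep evicting entries from lru_ until usage_ no longer exceeds capacity_.
6. Cast e to Cache::Handle* and return it.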
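Step 1 is worth a closer look: the handle and its key live in one malloc'd block, with the key bytes starting at a one-byte trailing array. The sketch below shows only that layout trick. MiniHandle, NewMiniHandle and KeyOf are hypothetical names, and the struct keeps just two of LRUHandle's fields. Writing past a one-element trailing array is a long-standing C idiom that mainstream compilers accept, but it is not blessed by the C++ standard, so treat this as an illustration rather than a pattern to copy blindly.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <string>

// Hypothetical cut-down handle: only the fields needed to show the layout.
struct MiniHandle {
  size_t key_length;
  char key_data[1];  // first byte of the key; the rest follows in memory
};

// Allocates header and key in one block, sized the same way Insert does:
// sizeof(handle) - 1 + key.size(). The "- 1" accounts for key_data[1].
MiniHandle* NewMiniHandle(const std::string& key) {
  void* mem = std::malloc(sizeof(MiniHandle) - 1 + key.size());
  if (mem == nullptr) return nullptr;
  MiniHandle* h = static_cast<MiniHandle*>(mem);
  h->key_length = key.size();
  std::memcpy(h->key_data, key.data(), key.size());
  return h;
}

// Rebuilds the key from the bytes stored behind the header.
std::string KeyOf(const MiniHandle* h) {
  return std::string(h->key_data, h->key_length);
}
```

The payoff is one allocation and one free per entry, and the key sits next to the header in memory. The caller releases the block with std::free.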

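Those are the broad steps, but looking closer raises a few questions: 1. What is refs for? 2. When does the lru_ list contain entries? 3. Why split into two doubly linked lists, in_use_ and lru_?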

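What is refs for?

refs tracks the state of an entry; it can also be read as the number of holders that currently have a pointer to it (the cache itself counts as one). For an entry that is in the cache, refs greater than 1 means it is on the in_use_ list and refs equal to 1 means it is on the lru_ list; an entry whose refs reaches 0 is destroyed. Loosely speaking it reflects how "hot" an entry is right now: the more clients hold it at the same moment, the larger refs is. Note that it counts outstanding handles, not accesses over time. Once refs falls back to 1, nobody but the cache is using the entry, so it becomes a candidate for eviction.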

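When does the lru_ list contain entries?

An entry lands on lru_ when it is still in the cache but no client holds it any more: when the last client handle is released, Unref drops refs to 1 and, because in_cache is still true, moves the entry from in_use_ to lru_. By contrast, when an entry is overwritten (Insert finds that the key is already stored), erased or evicted, FinishErase unlinks it from whichever list it is on and clears in_cache before calling Unref, so it is not moved to lru_; it is actually destroyed once refs reaches 0.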

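Why split into two doubly linked lists, in_use_ and lru_?

With a single list, if the entry at the tail has a reference count greater than 1, that node cannot be evicted. We would have to search forward from the tail every time until we found the first entry whose reference count is 1, which is inefficient.

So the list is split in two: in_use_ and lru_. in_use_ holds the entries that are being used; their reference count is greater than 1 and they cannot be evicted. When an entry's reference count drops to 1 it is moved to lru_, so lru_ contains only entries with refs equal to 1, all of which may be evicted. (Entries move back and forth between the two lists as their reference counts change. Memory is freed when an entry is evicted from lru_.)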
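The movement between the two lists can be modelled in a few lines. The following is a toy sketch under my own assumptions, not LevelDB's code: TwoListModel and its methods are hypothetical names, keys are plain ints, values and charges are omitted, and std::list stands in for the intrusive lists. It only shows how refs decides which list an entry is on and why eviction never has to skip over pinned entries.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <list>
#include <unordered_map>

// Hypothetical toy model of the in_use_ / lru_ split.
class TwoListModel {
 public:
  // Adds a new key. The caller holds one handle and the cache holds
  // another, so refs starts at 2 and the entry goes on in_use_.
  void Insert(int key) {
    assert(entries_.count(key) == 0);
    in_use_.push_back(key);
    entries_.emplace(key, Entry{2, std::prev(in_use_.end())});
  }

  // A client takes a handle (like Lookup + Ref). An entry that was on
  // lru_ becomes hot again and moves back to in_use_.
  bool Pin(int key) {
    auto it = entries_.find(key);
    if (it == entries_.end()) return false;
    Entry& e = it->second;
    if (e.refs == 1) Move(e, lru_, in_use_);
    ++e.refs;
    return true;
  }

  // A client drops a handle. When only the cache's reference is left,
  // the entry moves to the newest end of lru_.
  void Release(int key) {
    Entry& e = entries_.at(key);
    assert(e.refs >= 2);
    if (--e.refs == 1) Move(e, in_use_, lru_);
  }

  // Evicts the oldest unpinned entry. Returns false if every entry is
  // still held by a client. No scanning past pinned entries is needed.
  bool EvictOne() {
    if (lru_.empty()) return false;
    entries_.erase(lru_.front());
    lru_.pop_front();
    return true;
  }

  size_t InUseCount() const { return in_use_.size(); }
  size_t LruCount() const { return lru_.size(); }
  bool Contains(int key) const { return entries_.count(key) != 0; }

 private:
  struct Entry {
    int refs;
    std::list<int>::iterator pos;  // position in whichever list holds the key
  };

  // splice relinks the node; the stored iterator stays valid.
  void Move(Entry& e, std::list<int>& from, std::list<int>& to) {
    to.splice(to.end(), from, e.pos);
  }

  std::list<int> in_use_;  // refs >= 2
  std::list<int> lru_;     // refs == 1, front = oldest
  std::unordered_map<int, Entry> entries_;
};
```

One thing the model leaves out: in LevelDB an entry that is erased while a client still holds it stays alive, outside both lists, until the last handle is released.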

With Insert covered, the other methods are easy to follow.

Cache::Handle* LRUCache::Lookup(const Slice& key, uint32_t hash) {
  MutexLock l(&mutex_);
  LRUHandle* e = table_.Lookup(key, hash);
  if (e != nullptr) {
    Ref(e);
  }
  return reinterpret_cast<Cache::Handle*>(e);
}

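The cache's Lookup simply calls Lookup on the hash table table_ to find the entry quickly, and takes a reference on it if found. The table's type, HandleTable, is described in detail later.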

void LRUCache::Release(Cache::Handle* handle) {
  MutexLock l(&mutex_);
  Unref(reinterpret_cast<LRUHandle*>(handle));
}

Release means the caller is done with the entry (it performs one Unref). Note that this does not necessarily destroy the data: the deleter runs only when refs reaches 0.

void LRUCache::Erase(const Slice& key, uint32_t hash) {
  MutexLock l(&mutex_);
  FinishErase(table_.Remove(key, hash));
}

Erase removes the entry stored in the cache for the given key, using the key together with the hash value computed from it.

void LRUCache::Prune() {
  MutexLock l(&mutex_);
  while (lru_.next != &lru_) {
    LRUHandle* e = lru_.next;
    assert(e->refs == 1);
    bool erased = FinishErase(table_.Remove(e->key(), e->hash));
    if (!erased) {  // to avoid unused variable when compiled NDEBUG
      assert(erased);
    }
  }
}

Prune removes every entry on the lru_ list.

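That completes the walk through the cache's methods. As we have seen, the implementation leans heavily on the hash table class HandleTable, whose main job is to make lookups O(1). The class is defined as follows: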

// We provide our own simple hash table since it removes a whole bunch
// of porting hacks and is also faster than some of the built-in hash
// table implementations in some of the compiler/runtime combinations
// we have tested.  E.g., readrandom speeds up by ~5% over the g++
// 4.4.3's builtin hashtable.
class HandleTable {
 public:
  HandleTable() : length_(0), elems_(0), list_(nullptr) { Resize(); }
  ~HandleTable() { delete[] list_; }

  LRUHandle* Lookup(const Slice& key, uint32_t hash) {
    return *FindPointer(key, hash);
  }

  LRUHandle* Insert(LRUHandle* h) {
    LRUHandle** ptr = FindPointer(h->key(), h->hash);
    LRUHandle* old = *ptr;
    h->next_hash = (old == nullptr ? nullptr : old->next_hash);
    *ptr = h;
    if (old == nullptr) {
      ++elems_;
      if (elems_ > length_) {
        // Since each cache entry is fairly large, we aim for a small
        // average linked list length (<= 1).
        Resize();
      }
    }
    return old;
  }

  LRUHandle* Remove(const Slice& key, uint32_t hash) {
    LRUHandle** ptr = FindPointer(key, hash);
    LRUHandle* result = *ptr;
    if (result != nullptr) {
      *ptr = result->next_hash;
      --elems_;
    }
    return result;
  }

 private:
  // The table consists of an array of buckets where each bucket is
  // a linked list of cache entries that hash into the bucket.
  uint32_t length_;   // current number of hash buckets
  uint32_t elems_;    // total number of elements stored in the table
  LRUHandle** list_;  // array of pointers, each pointing to the head of one bucket's chain

  // Return a pointer to slot that points to a cache entry that
  // matches key/hash.  If there is no such cache entry, return a
  // pointer to the trailing slot in the corresponding linked list.
  LRUHandle** FindPointer(const Slice& key, uint32_t hash) {
    LRUHandle** ptr = &list_[hash & (length_ - 1)];
    while (*ptr != nullptr &&
           ((*ptr)->hash != hash || key != (*ptr)->key())) {
      ptr = &(*ptr)->next_hash;
    }
    return ptr;
  }

  void Resize() {
    uint32_t new_length = 4;
    while (new_length < elems_) {
      new_length *= 2;
    }
    LRUHandle** new_list = new LRUHandle*[new_length];
    memset(new_list, 0, sizeof(new_list[0]) * new_length);
    uint32_t count = 0;
    for (uint32_t i = 0; i < length_; i++) {
      LRUHandle* h = list_[i];
      while (h != nullptr) {
        LRUHandle* next = h->next_hash;
        uint32_t hash = h->hash;
        LRUHandle** ptr = &new_list[hash & (new_length - 1)];
        h->next_hash = *ptr;
        *ptr = h;  // h becomes the new head of its bucket and h->next_hash the old head, i.e. nodes are pushed onto the front of the new chain one by one
        h = next;
        count++;
      }
    }
    assert(elems_ == count);
    delete[] list_;
    list_ = new_list;
    length_ = new_length;
  }
};

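The class's data members are the bucket array list_, the number of buckets length_, and the total number of stored elements elems_. Its methods are straightforward; here is what each does: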

HandleTable() : length_(0), elems_(0), list_(nullptr) { Resize(); }

void Resize() {
    uint32_t new_length = 4;
    while (new_length < elems_) {
      new_length *= 2;
    }
    LRUHandle** new_list = new LRUHandle*[new_length];
    memset(new_list, 0, sizeof(new_list[0]) * new_length);
    uint32_t count = 0;
    for (uint32_t i = 0; i < length_; i++) {
      LRUHandle* h = list_[i];
      while (h != nullptr) {
        LRUHandle* next = h->next_hash;
        uint32_t hash = h->hash;
        LRUHandle** ptr = &new_list[hash & (new_length - 1)];
        h->next_hash = *ptr;
        *ptr = h;  // h becomes the new head of its bucket and h->next_hash the old head, i.e. nodes are pushed onto the front of the new chain one by one
        h = next;
        count++;
      }
    }
    assert(elems_ == count);
    delete[] list_;
    list_ = new_list;
    length_ = new_length;
  }

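The constructor starts by calling Resize(). The first Resize() creates a bucket array of length 4 whose elements are pointers of type LRUHandle*; since nothing has been stored yet, every bucket pointer is nullptr. Later Resize() calls grow the bucket array (doubling the length until it is at least the number of elements) and redistribute the entries from the old array into the new one. This keeps the average chain length at no more than one entry per bucket, so lookups stay O(1) on average.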
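The sizing rule is easy to check in isolation. The sketch below lifts the loop out of Resize() into a free function and pairs it with the bucket index computation; NextTableLength and BucketIndex are names I made up for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Same loop as in Resize(): the smallest power of two that is at least 4
// and not smaller than elems. (Like the original loop, it assumes elems
// is small enough that doubling does not overflow.)
uint32_t NextTableLength(uint32_t elems) {
  uint32_t new_length = 4;
  while (new_length < elems) {
    new_length *= 2;
  }
  return new_length;
}

// Because the length is always a power of two, masking with length - 1
// gives the same result as hash % length, without a division.
uint32_t BucketIndex(uint32_t hash, uint32_t length) {
  assert(length != 0 && (length & (length - 1)) == 0);
  return hash & (length - 1);
}
```

Insert triggers Resize() once elems_ exceeds length_, so a table with 4 buckets grows to 8 when the fifth element arrives.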

// Return a pointer to slot that points to a cache entry that
  // matches key/hash.  If there is no such cache entry, return a
  // pointer to the trailing slot in the corresponding linked list.
  LRUHandle** FindPointer(const Slice& key, uint32_t hash) {
    LRUHandle** ptr = &list_[hash & (length_ - 1)];
    while (*ptr != nullptr &&
           ((*ptr)->hash != hash || key != (*ptr)->key())) {
      ptr = &(*ptr)->next_hash;
    }
    return ptr;
  }

LRUHandle* Lookup(const Slice& key, uint32_t hash) {
    return *FindPointer(key, hash);
  }

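FindPointer looks up an entry in the bucket array list_ by hash and key. It first picks the bucket at index hash & (length_ - 1) and then walks that bucket's chain. It returns a pointer to the slot that points at the matching entry; if there is no match, it returns the trailing slot of the chain, whose content is nullptr.

Lookup is just a wrapper around FindPointer: it dereferences the returned slot, so a miss yields nullptr.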

LRUHandle* Insert(LRUHandle* h) {
    LRUHandle** ptr = FindPointer(h->key(), h->hash);
    LRUHandle* old = *ptr;
    h->next_hash = (old == nullptr ? nullptr : old->next_hash);
    *ptr = h;
    if (old == nullptr) {
      ++elems_;
      if (elems_ > length_) {
        // Since each cache entry is fairly large, we aim for a small
        // average linked list length (<= 1).
        Resize();
      }
    }
    return old;
  }

  LRUHandle* Remove(const Slice& key, uint32_t hash) {
    LRUHandle** ptr = FindPointer(key, hash);
    LRUHandle* result = *ptr;
    if (result != nullptr) {
      *ptr = result->next_hash;
      --elems_;
    }
    return result;
  }

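Insert first uses the key and hash to check whether the table already holds an entry with that key. If so, the new entry takes its place in the chain and the old entry is returned (an update). If not, the new entry is linked into the trailing slot of the matching bucket's chain and nullptr is returned. Note that once the number of elements exceeds the number of buckets, Resize() runs again to redistribute the entries.

Remove is simpler still. It uses the key and hash to check whether the table holds the entry; if it does, the slot that pointed at the entry (the bucket head or the previous node's next_hash) is redirected to the entry's successor, and the entry is returned so that LRUCache can unlink it from the lru_ or in_use_ list.

That completes the analysis of the core of LRUCache.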
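What makes Insert and Remove so short is that FindPointer returns a pointer to the slot rather than to the node, so the bucket head and an interior next_hash field are handled by the same code. Here is a stripped-down sketch of that idiom on a single chain. Node, FindSlot, InsertNode and RemoveNode are hypothetical names, and an integer key replaces the Slice plus hash comparison.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical node type standing in for LRUHandle's hash-chain fields.
struct Node {
  uint32_t key;
  Node* next_hash;
};

// Returns the slot (the chain head or some node's next_hash field) that
// points at the node with `key`, or the trailing null slot if absent.
Node** FindSlot(Node** head, uint32_t key) {
  Node** ptr = head;
  while (*ptr != nullptr && (*ptr)->key != key) {
    ptr = &(*ptr)->next_hash;
  }
  return ptr;
}

// Links n into the chain, replacing any node with the same key in place.
// Returns the replaced node, or nullptr if the key was new.
Node* InsertNode(Node** head, Node* n) {
  Node** ptr = FindSlot(head, n->key);
  Node* old = *ptr;
  n->next_hash = (old == nullptr ? nullptr : old->next_hash);
  *ptr = n;
  return old;
}

// Unlinks and returns the node with `key`, or nullptr if absent.
Node* RemoveNode(Node** head, uint32_t key) {
  Node** ptr = FindSlot(head, key);
  Node* result = *ptr;
  if (result != nullptr) {
    *ptr = result->next_hash;
  }
  return result;
}
```

Neither function needs a "previous node" variable or a special case for the first element: writing through the slot covers both.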

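Although the LRUCache implementation is now mostly covered, in leveldb it is only one building block. To raise concurrency, leveldb also provides the class ShardedLRUCache, which maintains 16 LRUCache instances. When a cache of capacity kCacheSize is requested, the capacity is split evenly into 16 parts (rounded up), giving each LRUCache a capacity of about kCacheSize / 16. Operations such as insert and erase then do not lock the whole cache; they lock only the shard involved. This raises concurrency: with per-shard locks, operations on different shards can run in parallel, whereas with a single lock no two threads could perform cache operations at the same time.

The class definition and implementation are simple: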

static const int kNumShardBits = 4;
static const int kNumShards = 1 << kNumShardBits;

class ShardedLRUCache : public Cache {
 private:
  LRUCache shard_[kNumShards];
  port::Mutex id_mutex_;
  uint64_t last_id_;

  static inline uint32_t HashSlice(const Slice& s) {
    return Hash(s.data(), s.size(), 0);
  }

  static uint32_t Shard(uint32_t hash) {
    // Shift the hash right by 28 bits to keep its top 4 bits; the result is in [0, 2^4 - 1].
    return hash >> (32 - kNumShardBits);
  }

 public:
  explicit ShardedLRUCache(size_t capacity)
      : last_id_(0) {
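    // Why kNumShards - 1? It makes the division round up. Take a total capacity of 16:
    // with 16 shards, each shard should get capacity 1. Adding the full 16 would give
    // (16 + 16) / 16 = 2 per shard, which over-allocates, while (16 + (16 - 1)) / 16 = 1 is exact.
    // The per-shard capacity grows by one only when capacity passes another multiple of kNumShards.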
    const size_t per_shard = (capacity + (kNumShards - 1)) / kNumShards;
    for (int s = 0; s < kNumShards; s++) {
      shard_[s].SetCapacity(per_shard);
    }
  }
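/*
The top 4 bits of the hash route each key to one of 2^4 independent shards (0-15);
each shard has its own mutex for concurrency control.
On insertion, the hash table checks how full it is and grows automatically, keeping lookups O(1).
*/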
  virtual ~ShardedLRUCache() { }
  virtual Handle* Insert(const Slice& key, void* value, size_t charge,
                         void (*deleter)(const Slice& key, void* value)) {
    const uint32_t hash = HashSlice(key);
    return shard_[Shard(hash)].Insert(key, hash, value, charge, deleter);
  }
  virtual Handle* Lookup(const Slice& key) {
    const uint32_t hash = HashSlice(key);
    // Debug trace; Shard() returns uint32_t, so %u is the matching format specifier.
    printf("hash->%u, Shard(hash)->%u\n", hash, Shard(hash));
    return shard_[Shard(hash)].Lookup(key, hash);
  }
  virtual void Release(Handle* handle) {
    LRUHandle* h = reinterpret_cast<LRUHandle*>(handle);
    shard_[Shard(h->hash)].Release(handle);
  }
  virtual void Erase(const Slice& key) {
    const uint32_t hash = HashSlice(key);
    shard_[Shard(hash)].Erase(key, hash);
  }
  virtual void* Value(Handle* handle) {
    return reinterpret_cast<LRUHandle*>(handle)->value;
  }
  virtual uint64_t NewId() {
    MutexLock l(&id_mutex_);
    return ++(last_id_);
  }
  virtual void Prune() {
    for (int s = 0; s < kNumShards; s++) {
      shard_[s].Prune();
    }
  }
  virtual size_t TotalCharge() const {
    size_t total = 0;
    for (int s = 0; s < kNumShards; s++) {
      total += shard_[s].TotalCharge();
    }
    return total;
  }
};
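The two pieces of arithmetic in this class, shard routing and per-shard capacity, can be tried out on their own. The sketch below copies the two expressions into free functions; PerShardCapacity is a name I introduced, and the constants are redeclared as constexpr so the snippet stands alone.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr int kNumShardBits = 4;
constexpr int kNumShards = 1 << kNumShardBits;

// Picks a shard from the top kNumShardBits bits of the hash.
uint32_t Shard(uint32_t hash) { return hash >> (32 - kNumShardBits); }

// Ceiling division: splits the total capacity across the shards,
// rounding up so the shards together never hold less than requested.
size_t PerShardCapacity(size_t capacity) {
  return (capacity + (kNumShards - 1)) / kNumShards;
}
```

One plausible reason for routing on the high bits is that HandleTable picks its bucket from the low bits of the same hash (hash & (length_ - 1)), so the two decisions draw on different parts of the hash value.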

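NewId() deserves a mention here. It returns a unique id. When several clients or threads share one cache, each can obtain an id and prepend it to its own keys, which partitions the key space so that they do not overwrite each other's entries.

That finally completes the analysis of LevelDB's LRUCache. The code is easy to read yet full of good ideas, and I learned a great deal from it: from the choice of data structures and algorithms to the thread-safety design and the techniques used to raise concurrency.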
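As a sketch of how such an id might be used, the snippet below prefixes each client's keys with a fixed-width encoding of its id. IdSource and MakeCacheKey are hypothetical helpers written for this post; only the mutex-protected counter mirrors NewId() above, and the little-endian 8-byte prefix is just one possible encoding.

```cpp
#include <cassert>
#include <cstdint>
#include <mutex>
#include <string>

// Hypothetical stand-in for the cache's id generator: a counter guarded
// by a mutex, like NewId() in ShardedLRUCache.
class IdSource {
 public:
  uint64_t NewId() {
    std::lock_guard<std::mutex> lock(mu_);
    return ++last_id_;
  }

 private:
  std::mutex mu_;
  uint64_t last_id_ = 0;
};

// Builds a cache key as an 8-byte little-endian id prefix followed by
// the client's own key. The fixed width keeps prefixes unambiguous.
std::string MakeCacheKey(uint64_t client_id, const std::string& local_key) {
  std::string key;
  for (int shift = 0; shift < 64; shift += 8) {
    key.push_back(static_cast<char>((client_id >> shift) & 0xff));
  }
  key += local_key;
  return key;
}
```

Two clients that both cache something under "block-7" end up with different cache keys, so neither can evict or overwrite the other's entry by accident.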
