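1. The keys command

Most of you have probably used the keys command. It walks the entire Redis dictionary (the keyspace), matches every key against the given pattern, and returns the matches.

As the official documentation says, be very careful when using it in production: while the Redis server is executing this command, read and write commands from all other clients are blocked.

Usage:

KEYS pattern

Example: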

127.0.0.1:6379> set why1 1
OK
127.0.0.1:6379> set why2 2
OK
127.0.0.1:6379> set why3 3
OK
127.0.0.1:6379> set why4 4
OK
127.0.0.1:6379> keys why*   
1) "why3"
2) "why4"
3) "why2"
4) "why1"
127.0.0.1:6379>

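2. The Redis hash table (dictionary)

The keys command traverses the whole database, and Redis is a key-value in-memory database. Key-value storage naturally brings Java's HashMap to mind. So what does the Redis "hashtable" data structure look like?

1. The data structures around the hash table

Running redis-server in debug mode, we can see that the initServer function in redis.c initializes the databases.

The value of dbnum comes from the databases configuration option, which defaults to 16.

redis.h defines the structure of each database instance: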

/* Redis database representation. There are multiple databases identified
 * by integers from 0 (the default database) up to the max configured
 * database. The database number is the 'id' field in the structure. */
typedef struct redisDb {
    dict *dict;                 /* The keyspace for this DB */
    // some fields omitted ...
} redisDb;

So dict looks like the hash table implementation. Let's look at its structure:

typedef struct dict {
    // functions that operate on the keyspace
    dictType *type;
    // private data
    void *privdata;

    dictht ht[2];
    int rehashidx; /* rehashing not in progress if rehashidx == -1 */

    int iterators; /* number of iterators currently running */

} dict;

It turns out dict is not the hash table itself. Let's go on to dictht:

/* This is our hash table structure. Every dictionary has two of this as we
 * implement incremental rehashing, for the old to the new table. */
typedef struct dictht {

    // the bucket array of the hash table
    dictEntry **table;

    // size of the table
    unsigned long size;

    // size - 1, used to compute the bucket index
    unsigned long sizemask;

    // number of entries stored in the hash table
    unsigned long used;

} dictht;

So dictht is the actual hash table implementation. It defines table, an array of dictEntry pointers, together with the bookkeeping fields that go with it.

Next, the structure of dictEntry:

typedef struct dictEntry {
    void *key;
    union {
        void *val;
        uint64_t u64;
        int64_t s64;
    } v;

    struct dictEntry *next;

} dictEntry;

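dictEntry is where our data is stored. The next pointer suggests that Redis resolves hash collisions the same way HashMap does: by separate chaining.

To summarize so far:

dict is the outermost layer of the hash table. It holds the whole keyspace and, through dictType, defines the functions that operate on keys and values.
dictht is the hash table implementation and defines the table's data structure.
dictEntry defines how each key-value pair is stored.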
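
The relationship between the table, its bookkeeping fields, and the chained entries can be sketched as a tiny hash table. This is a minimal illustration and not Redis code: the names mini_ht, mini_entry, mini_init, mini_put, mini_get and the djb2-style hash are assumptions made for the example, and keys are plain C strings with int values.

```c
#include <stdlib.h>
#include <string.h>

/* One key-value node; collisions in a bucket are chained through next. */
typedef struct mini_entry {
    const char *key;
    int val;
    struct mini_entry *next;
} mini_entry;

/* The table itself: bucket array plus bookkeeping, in the spirit of dictht. */
typedef struct mini_ht {
    mini_entry **table;
    unsigned long size;      /* number of buckets, a power of two */
    unsigned long sizemask;  /* size - 1, used to compute the index */
    unsigned long used;      /* number of stored entries */
} mini_ht;

/* Illustrative string hash (djb2); Redis uses a different function. */
static unsigned long mini_hash(const char *s) {
    unsigned long h = 5381;
    while (*s) h = h * 33 + (unsigned char)*s++;
    return h;
}

/* Allocate a table; size must be a power of two. Returns 0 on success. */
static int mini_init(mini_ht *ht, unsigned long size) {
    ht->table = calloc(size, sizeof(mini_entry *));
    if (ht->table == NULL) return -1;
    ht->size = size;
    ht->sizemask = size - 1;
    ht->used = 0;
    return 0;
}

/* Look up a key by walking the chain of its bucket; returns NULL if absent. */
static mini_entry *mini_get(mini_ht *ht, const char *key) {
    mini_entry *e = ht->table[mini_hash(key) & ht->sizemask];
    while (e != NULL && strcmp(e->key, key) != 0) e = e->next;
    return e;
}

/* Insert or overwrite; new entries are pushed at the head of the chain. */
static int mini_put(mini_ht *ht, const char *key, int val) {
    mini_entry *e = mini_get(ht, key);
    if (e != NULL) { e->val = val; return 0; }
    e = malloc(sizeof(*e));
    if (e == NULL) return -1;
    unsigned long idx = mini_hash(key) & ht->sizemask;
    e->key = key;            /* the caller keeps the key string alive */
    e->val = val;
    e->next = ht->table[idx];
    ht->table[idx] = e;
    ht->used++;
    return 0;
}
```

With only two buckets, storing why1, why2 and why3 is guaranteed to produce at least one chain longer than one entry, which is exactly the case the next pointer exists for.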

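2. Incremental rehash

One notable design difference between the Redis hash table and HashMap is the rehash operation. Redis is a database, so it can hold a very large amount of data. To keep the server from blocking for a long time while a large batch of entries is migrated during a rehash, Redis rehashes incrementally.

Look at the dict struct again: it defines two hash tables, and ht[1] is the one that makes incremental rehashing possible.

typedef struct dict {
    // functions that operate on the keyspace
    dictType *type;
    // private data
    void *privdata;

    dictht ht[2];

    int rehashidx; /* rehashing not in progress if rehashidx == -1 */

    int iterators; /* number of iterators currently running */

} dict;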

1. Triggering a rehash

Take insertion as an example. Before every insert, _dictExpandIfNeeded is called to check whether the table needs to grow:

static int _dictExpandIfNeeded(dict *d)
{
    /* Incremental rehashing already in progress. Return. */
   
    if (dictIsRehashing(d)) return DICT_OK;

    /* If the hash table is empty expand it to the initial size. */
    // T = O(1)
    if (d->ht[0].size == 0) return dictExpand(d, DICT_HT_INITIAL_SIZE);

    /* If we reached the 1:1 ratio, and we are allowed to resize the hash
     * table (global setting) or we should avoid it but the ratio between
     * elements/buckets is over the "safe" threshold, we resize doubling
     * the number of buckets. */

    if (d->ht[0].used >= d->ht[0].size &&
        (dict_can_resize ||
         d->ht[0].used/d->ht[0].size > dict_force_resize_ratio))
    {
        // grow to twice the number of stored entries
        return dictExpand(d, d->ht[0].used*2);
    }

    return DICT_OK;
}

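dictExpand is called when both of the following hold:

The number of stored entries (used) is greater than or equal to the current table size (size).
Either dict_can_resize is 1, or the load factor used/size exceeds the forced-resize threshold dict_force_resize_ratio.

In that case dictExpand is asked to grow the dictionary to twice the number of entries currently stored.

In my debug run, the expansion was triggered when I added the 16th element.

dictExpand performs the initialization for the expansion (we only look at the assignments relevant to growing):

It calls _dictNextPower to adjust size so that the table size is always a power of two.
It then installs the enlarged table as ht[1] of the dict and sets rehashidx from -1 to 0.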

int dictExpand(dict *d, unsigned long size)
{
  
    dictht n; /* the new hash table */
    unsigned long realsize = _dictNextPower(size);
    /* the size is invalid if it is smaller than the number of
     * elements already inside the hash table */
    if (dictIsRehashing(d) || d->ht[0].used > size)
        return DICT_ERR;

    /* Allocate the new hash table and initialize all pointers to NULL */
  
    n.size = realsize;
    n.sizemask = realsize-1;
    // T = O(N)
    n.table = zcalloc(realsize*sizeof(dictEntry*));
    n.used = 0;
    /* Is this the first initialization? If so it's not really a rehashing
     * we just set the first hash table so that it can accept keys. */
    if (d->ht[0].table == NULL) {
        d->ht[0] = n;
        return DICT_OK;
    }
    /* Prepare a second hash table for incremental rehashing */
  
    d->ht[1] = n;
    d->rehashidx = 0;
    return DICT_OK;
}
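
The trigger condition and the power-of-two rounding can be isolated into two small pure functions. This is a sketch written for this article, not the Redis source: should_expand and next_power are hypothetical names, and INITIAL_SIZE is set to 4 on the assumption that it matches DICT_HT_INITIAL_SIZE in the version being read.

```c
#include <limits.h>

#define INITIAL_SIZE 4  /* assumed initial bucket count for this sketch */

/* Returns 1 when the table should grow: an empty table is always
 * initialized; otherwise used must have reached size, and either resizing
 * is allowed or the load factor exceeds the forced-resize ratio. */
static int should_expand(unsigned long used, unsigned long size,
                         int can_resize, unsigned long force_ratio) {
    if (size == 0) return 1;
    return used >= size &&
           (can_resize || used / size > force_ratio);
}

/* Round the requested size up to the next power of two, never going
 * below INITIAL_SIZE. */
static unsigned long next_power(unsigned long size) {
    unsigned long i = INITIAL_SIZE;
    if (size >= LONG_MAX) return LONG_MAX;
    while (i < size) i *= 2;
    return i;
}
```

For example, a full table of 16 buckets with resizing enabled asks for next_power(16 * 2), which is 32. With resizing disabled, should_expand(20, 4, 0, 5) is still 0 because the integer ratio 20 / 4 equals the threshold rather than exceeding it, while should_expand(24, 4, 0, 5) is 1.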

2. dict_force_resize_ratio and dict_can_resize

The second condition allows expansion when dict_can_resize is 1. What is this flag for, and when is it set to 0?

With these two questions in mind, let's read the comment on dict_can_resize:

/* Using dictEnableResize() / dictDisableResize() we make possible to
 * enable/disable resizing of the hash table as needed. This is very important
 * for Redis, as we use copy-on-write and don't want to move too much memory
 * around when there is a child performing saving operations.
 * Note that even when dict_can_resize is set to 0, not all resizes are
 * prevented: a hash table is still allowed to grow if the ratio between
 * the number of elements and the buckets > dict_force_resize_ratio.
 */
static int dict_can_resize = 1;

static unsigned int dict_force_resize_ratio = 5;

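The comment is clear: Redis does not want to move too much memory around while copy-on-write is in effect.

My understanding: a save operation such as bgsave forks a child process, which shares memory with the parent through copy-on-write. During the save, the child process reads a large amount of memory. If the Redis server (the parent process) receives many writes at the same time, those writes hit the write-protected shared pages and raise page faults, which in turn trigger page copies and page table updates, and the system load becomes heavy. To reduce that load, Redis tries to hold off data migration for the time being, since migration itself reads and writes memory.

However, setting dict_can_resize to 0 does not switch migration off completely. If the load factor (the ratio of used to size) exceeds dict_force_resize_ratio = 5, a rehash is forced anyway.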

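3. Handling the incremental rehash

1. Helping the rehash before add, delete, update, and lookup operations

The function that performs a rehash step is _dictRehashStep. It is called from dictAddRaw, dictGenericDelete, dictFind, and dictGetRandomKey. In other words, whenever Redis runs one of these dictionary operations while a rehash is in progress, it tries to do one step of data migration.

The check looks like this:

// Remember? rehashidx was set to 0 when the rehash started.
// If rehashidx is not -1, an expansion is in progress.
if (dictIsRehashing(d)) {
    _dictRehashStep(d);
}

// dictIsRehashing simply tests whether rehashidx differs from -1
#define dictIsRehashing(ht) ((ht)->rehashidx != -1)

The code that helps with the migration is shown below.

One migration step is performed only when no safe iterator exists.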

static void _dictRehashStep(dict *d) {
    if (d->iterators == 0) 
      dictRehash(d,1);
}

dictRehash is the function that actually migrates the data; n controls the number of steps. So before each add, delete, update, or lookup, Redis migrates all entries under one hash bucket to the new hash table:

int dictRehash(dict *d, int n) {
    if (!dictIsRehashing(d)) return 0;
    // n is the number of buckets to migrate
    while(n--) {
        dictEntry *de, *nextde;
        /* Check if we already rehashed the whole table... */
        // As shown further down, used is decremented for every migrated
        // entry, so used == 0 means the migration is complete.
        if (d->ht[0].used == 0) {
            // Free the old table and replace it with the new one.
            zfree(d->ht[0].table);
            d->ht[0] = d->ht[1];
            _dictReset(&d->ht[1]);
            // Mark the dict as no longer rehashing.
            d->rehashidx = -1;
            // Returning 0 means the rehash has finished.
            return 0;
        }

        // Bounds check
        assert(d->ht[0].size > (unsigned)d->rehashidx);
        // Find the next non-empty chain in the old table.
        while(d->ht[0].table[d->rehashidx] == NULL) d->rehashidx++;

        de = d->ht[0].table[d->rehashidx];

        // Start migrating. The loop below takes each entry off the old
        // chain, recomputes its index, inserts it into the new table, and
        // updates used on both tables.
        while(de) {
            unsigned int h;
            nextde = de->next;
            h = dictHashKey(d, de->key) & d->ht[1].sizemask;
            de->next = d->ht[1].table[h];
            d->ht[1].table[h] = de;
            d->ht[0].used--;
            d->ht[1].used++;
            de = nextde;
        }
        d->ht[0].table[d->rehashidx] = NULL;
        // Advance rehashidx. While rehashidx is not -1, it points to the
        // next bucket to be migrated.
        d->rehashidx++;
    }

    // Returning 1 means more rehashing is still needed.
    return 1;
}
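
To see the bucket-by-bucket migration in isolation, here is a stripped-down sketch of the same idea. It is not the Redis implementation: twin_dict, table, node, table_push, rehash_begin and rehash_step are hypothetical names, keys are plain integers, and the key itself serves as the hash value.

```c
#include <stdlib.h>

typedef struct node {
    unsigned long key;
    struct node *next;
} node;

typedef struct table {
    node **buckets;
    unsigned long size;   /* number of buckets, a power of two */
    unsigned long used;   /* number of stored nodes */
} table;

typedef struct twin_dict {
    table ht[2];          /* ht[0]: old table, ht[1]: rehash target */
    long rehashidx;       /* -1 when no rehash is in progress */
} twin_dict;

/* Push a node at the head of its bucket in table t. */
static void table_push(table *t, node *n) {
    unsigned long idx = n->key & (t->size - 1);
    n->next = t->buckets[idx];
    t->buckets[idx] = n;
    t->used++;
}

/* Start a rehash into a new table with new_size buckets (power of two). */
static int rehash_begin(twin_dict *d, unsigned long new_size) {
    d->ht[1].buckets = calloc(new_size, sizeof(node *));
    if (d->ht[1].buckets == NULL) return -1;
    d->ht[1].size = new_size;
    d->ht[1].used = 0;
    d->rehashidx = 0;
    return 0;
}

/* Migrate up to n non-empty buckets. Returns 1 if more work remains,
 * 0 when the rehash has finished or none was in progress. */
static int rehash_step(twin_dict *d, int n) {
    if (d->rehashidx == -1) return 0;
    while (n--) {
        if (d->ht[0].used == 0) {          /* everything moved: swap tables */
            free(d->ht[0].buckets);
            d->ht[0] = d->ht[1];
            d->ht[1].buckets = NULL;
            d->ht[1].size = d->ht[1].used = 0;
            d->rehashidx = -1;
            return 0;
        }
        /* used > 0 guarantees a non-empty bucket at or after rehashidx */
        while (d->ht[0].buckets[d->rehashidx] == NULL) d->rehashidx++;
        node *e = d->ht[0].buckets[d->rehashidx];
        while (e != NULL) {                /* move the whole chain */
            node *next = e->next;
            table_push(&d->ht[1], e);
            d->ht[0].used--;
            e = next;
        }
        d->ht[0].buckets[d->rehashidx] = NULL;
        d->rehashidx++;
    }
    return 1;
}
```

Calling rehash_step(d, 1) repeatedly spreads the migration over many calls, which is the point of the design: no single call has to touch more than one bucket's chain.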

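2. Rehashing in the timer event

If the server is in the middle of an expansion but receives hardly any read or write requests, the expansion should not simply stall. So besides the single step performed before commands, Redis also rehashes in its timer event:

void databasesCron(void) {
    // code unrelated to rehashing omitted ...

    // Help with rehashing only when no background child process is running.
    if (server.rdb_child_pid == -1 && server.aof_child_pid == -1) {
        if (server.activerehashing) {
            for (j = 0; j < dbs_per_call; j++) {
                int work_done = incrementallyRehash(rehash_db % server.dbnum);
                rehash_db++;
                if (work_done) {
                    /* If the function did some work, stop here, we'll do
                     * more at the next cron loop. */
                    break;
                }
            }
        }
    }
}

The timer event migrates data only if:

No RDB save or AOF rewrite child process is running.
The activerehashing option in redis.conf is enabled. The configuration file describes the option as follows: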

# Active rehashing uses 1 millisecond every 100 milliseconds of CPU time in
# order to help rehashing the main Redis hash table (the one mapping top-level
# keys to values). The hash table implementation Redis uses (see dict.c)
# performs a lazy rehashing: the more operation you run into a hash table
# that is rehashing, the more rehashing "steps" are performed, so if the
# server is idle the rehashing is never complete and some more memory is used
# by the hash table.
# 
# The default is to use this millisecond 10 times every second in order to
# active rehashing the main dictionaries, freeing memory when possible.
#
# If unsure:
# use "activerehashing no" if you have hard latency requirements and it is
# not a good thing in your environment that Redis can reply form time to time
# to queries with 2 milliseconds delay.
#
# use "activerehashing yes" if you don't have such hard requirements but
# want to free memory asap when possible.

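The timed rehash goes through dictRehashMilliseconds, which migrates the entries under 100 hash buckets per batch until the time budget is used up: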

int dictRehashMilliseconds(dict *d, int ms) {
    long long start = timeInMilliseconds();
    int rehashes = 0;
    while(dictRehash(d,100)) {
        rehashes += 100;
        if (timeInMilliseconds()-start > ms) break;
    }

    return rehashes;
}
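
The batch-then-check-the-clock pattern is not specific to dictionaries. A generic sketch, assuming a hypothetical step callback and using the standard clock() function as the time source (Redis uses its own timeInMilliseconds helper), could look like this:

```c
#include <time.h>

/* Milliseconds of CPU time used so far; the clock source is an assumption
 * made for this sketch. */
static long long now_ms(void) {
    return (long long)clock() * 1000 / CLOCKS_PER_SEC;
}

/* step(ctx, n) performs up to n units of work and returns 1 while more
 * work remains, 0 once everything is done. */
typedef int (*step_fn)(void *ctx, int n);

/* Run step in batches until the work is done or the budget of ms
 * milliseconds is exceeded. Returns the number of units accounted for,
 * counted the same way as in dictRehashMilliseconds: a batch is added
 * only when the step reports that more work remains. */
static int run_for_ms(step_fn step, void *ctx, int batch, int ms) {
    long long start = now_ms();
    int done = 0;
    while (step(ctx, batch)) {
        done += batch;
        if (now_ms() - start > ms) break;
    }
    return done;
}

/* Example step: counts a work counter down by n per call. */
static int countdown_step(void *ctx, int n) {
    int *remaining = ctx;
    *remaining = (*remaining > n) ? *remaining - n : 0;
    return *remaining > 0;
}
```

The clock is read once per batch rather than once per unit of work, so the overhead of timing stays small while the overrun is bounded by the cost of a single batch.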

3. How the keys command is handled

With the dictionary's data structure and expansion covered, let's return to the keys command and look at how it is handled. Its handler is keysCommand in src/db.c:

void keysCommand(redisClient *c) {
    dictIterator *di;
    dictEntry *de;
    // the pattern to match against
    sds pattern = c->argv[1]->ptr;
    int plen = sdslen(pattern), allkeys;
    unsigned long numkeys = 0;
    void *replylen = addDeferredMultiBulkLength(c);
    // get a safe iterator over the whole db of the current connection
    di = dictGetSafeIterator(c->db->dict);
    allkeys = (pattern[0] == '*' && pattern[1] == '\0');
    while((de = dictNext(di)) != NULL) {
        sds key = dictGetKey(de);
        robj *keyobj;
        // compare the key name with the pattern
        if (allkeys || stringmatchlen(pattern,plen,key,sdslen(key),0)) {
            // create a string object holding the key name
            keyobj = createStringObject(key,sdslen(key));
            // delete the key if it has expired; reply only with live keys
            if (expireIfNeeded(c->db,keyobj) == 0) {
                addReplyBulk(c,keyobj);
                numkeys++;
            }
            decrRefCount(keyobj);
        }
    }
    // release the safe iterator
    dictReleaseIterator(di);
    setDeferredMultiBulkLength(c,replylen,numkeys);
}

The logic is simple: parse the command, iterate over the db of the current connection, check each key against the pattern, check whether the key has expired, and finally return the data.
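
The match-every-key loop can be reproduced without any Redis types. The sketch below is a simplification and not stringmatchlen: glob_match is a hypothetical helper that understands only '*' and '?', and collect_matches scans a plain array of C strings instead of a dict.

```c
#include <stddef.h>

/* Glob match supporting only '*' (any run of characters, possibly empty)
 * and '?' (exactly one character). Returns 1 on match, 0 otherwise. */
static int glob_match(const char *pat, const char *str) {
    while (*pat) {
        if (*pat == '*') {
            while (pat[1] == '*') pat++;       /* collapse repeated stars */
            if (pat[1] == '\0') return 1;      /* trailing star matches the rest */
            for (; *str; str++)
                if (glob_match(pat + 1, str)) return 1;
            return 0;
        }
        if (*str == '\0') return 0;
        if (*pat != '?' && *pat != *str) return 0;
        pat++;
        str++;
    }
    return *str == '\0';
}

/* Visit every key, as KEYS does, and store the matching ones in out
 * (up to outcap). Returns the total number of matches. */
static size_t collect_matches(const char **keys, size_t nkeys,
                              const char *pattern,
                              const char **out, size_t outcap) {
    size_t found = 0;
    /* Same shortcut as keysCommand: a lone "*" skips the matcher. */
    int allkeys = (pattern[0] == '*' && pattern[1] == '\0');
    for (size_t i = 0; i < nkeys; i++) {
        if (allkeys || glob_match(pattern, keys[i])) {
            if (found < outcap) out[found] = keys[i];
            found++;
        }
    }
    return found;
}
```

Whatever the pattern, the loop touches all nkeys keys, so the cost grows linearly with the size of the keyspace. That is why the command is risky on a large production database.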

Notice, though, that this code obtains a safe iterator. Why does a safe iterator exist? Safe in what sense? Thread-safe?

4. Safe and unsafe iterators

1. The iterator's context

First, the fields defined in the iterator struct:

/* If safe is set to 1 this is a safe iterator, that means, you can call
 * dictAdd, dictFind, and other functions against the dictionary even while
 * iterating. Otherwise it is a non safe iterator, and only dictNext()
 * should be called while iterating. */

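typedef struct dictIterator {
    // the dictionary being iterated
    dict *d;
    int table,   // which hash table is being iterated; during a rehash there
                 // are two, so the iterator must track which one it is on
        index,   // bucket position within that hash table
        safe;    // whether this is a safe iterator
    // entry: the node currently being iterated
    // nextEntry: the node after the current one
    dictEntry *entry, *nextEntry;
    long long fingerprint; // verified when an unsafe iterator is released
} dictIterator;

From the author's comment we learn that the distinction between safe and unsafe iterators is not about concurrency. It determines whether the dictionary may be modified during the traversal.

A safe iterator allows other dictionary operations while iterating (the most typical being the removal of expired keys).

An unsafe iterator may only be used for traversal.

2. Creating a safe iterator

Let's first look at how a safe iterator is created. The function is dictGetSafeIterator:

dictIterator *dictGetSafeIterator(dict *d) {
    dictIterator *i = dictGetIterator(d);

    // mark the iterator as safe
    i->safe = 1;

    return i;
}

It calls dictGetIterator internally, which initializes the iterator: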

dictIterator *dictGetIterator(dict *d)
{
    dictIterator *iter = zmalloc(sizeof(*iter));

    iter->d = d;
    iter->table = 0;
    iter->index = -1;
    iter->safe = 0;
    iter->entry = NULL;
    iter->nextEntry = NULL;

    return iter;
}

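To sum up, a safe iterator is initialized in two steps:

Allocate the iterator and initialize its fields.
Mark the iterator as safe.

3. Creating an unsafe iterator

An unsafe iterator is created the same way, minus the step that sets safe = 1.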

dictIterator *dictGetIterator(dict *d)
{
    dictIterator *iter = zmalloc(sizeof(*iter));

    iter->d = d;
    iter->table = 0;
    iter->index = -1;
    iter->safe = 0;
    iter->entry = NULL;
    iter->nextEntry = NULL;

    return iter;
}

4. Using the iterator

The iterator is consumed through dictNext:

dictEntry *dictNext(dictIterator *iter)
{
    while (1) {
        // This branch is taken when entry is NULL, that is, on the first
        // call and whenever the current chain has been exhausted.
        if (iter->entry == NULL) {
            dictht *ht = &iter->d->ht[iter->table];
            // index == -1 together with table == 0 only happens on the
            // very first call; this is where iterators is updated.
            if (iter->index == -1 && iter->table == 0) {
                if (iter->safe)
                    // Remember the iterators field of the dict struct?
                    // A safe iterator increments it on its first step.
                    iter->d->iterators++;
                else
                    // unsafe iterator: record the fingerprint instead
                    iter->fingerprint = dictFingerprint(iter->d);
            }
            iter->index++;
            if (iter->index >= (signed) ht->size) {
                // Before finishing, check whether a rehash is in progress.
                // If so, continue with index = 0 and table = 1.
                if (dictIsRehashing(iter->d) && iter->table == 0) {
                    iter->table++;
                    iter->index = 0;
                    ht = &iter->d->ht[1];
                } else {
                    break;
                }
            }
            // Reached whenever the iterator moves on to a new bucket,
            // including the first bucket of ht[0] and, during a rehash,
            // the first bucket of ht[1].
            iter->entry = ht->table[iter->index];
        } else {
            iter->entry = iter->nextEntry;
        }
        if (iter->entry) {
            // remember the node that follows the one being returned
            iter->nextEntry = iter->entry->next;
            return iter->entry;
        }
    }
    return NULL;
}

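In essence, this function uses the iterator to fetch the next entry from a dictionary.
For a safe iterator, iterators is incremented on the first call.
For an unsafe iterator, a fingerprint is computed instead (not covered in detail here; put simply, if the dictionary is modified while an unsafe iterator is in use, the fingerprint changes, and the change is detected when the iterator is released).
If a rehash is in progress, ht[1] is traversed as well.
Before returning, iter->nextEntry = iter->entry->next records the entry that follows the current one, and the next call continues from iter->nextEntry.

What does iterators++ do?

Remember the check in the single-step rehash function?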

static void _dictRehashStep(dict *d) {
    if (d->iterators == 0) dictRehash(d,1);
}

In other words, single-step rehashing is disabled while a safe iterator exists.
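
The gating effect of the counter can be shown with a toy model. Everything here is hypothetical and exists only for the illustration: gated_dict stands in for dict, pending counts buckets that still have to be migrated, and the three functions model the start of a safe iteration, its end, and one rehash step.

```c
typedef struct gated_dict {
    int iterators;   /* number of safe iterators currently running */
    int pending;     /* buckets still waiting to be migrated */
} gated_dict;

/* Migrate one bucket unless a safe iterator is active.
 * Returns 1 if a bucket was migrated, 0 otherwise. */
static int gated_rehash_step(gated_dict *d) {
    if (d->iterators == 0 && d->pending > 0) {
        d->pending--;
        return 1;
    }
    return 0;
}

/* A safe iterator registers itself when it starts iterating... */
static void safe_iter_begin(gated_dict *d) { d->iterators++; }

/* ...and unregisters when it is released. */
static void safe_iter_end(gated_dict *d) { d->iterators--; }
```

Because the field is a counter rather than a flag, several safe iterators can overlap, and rehash steps resume only after the last one has been released.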

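Why record the next node to visit?

By definition, a safe iterator allows reads and writes during the traversal. If the node just returned has an expiration time, it may be removed from the chain because it has expired. Without the saved pointer, the iterator could no longer reach the rest of the chain through the removed node, so the traversal would stop early and miss data.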
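
The same precaution applies to any singly linked list that is modified while being walked. A minimal sketch, with hypothetical names (item, item_push, sum_live_and_purge) and an expired flag standing in for a real expiration check:

```c
#include <stdlib.h>

typedef struct item {
    int value;
    int expired;          /* stand-in for a real expiration check */
    struct item *next;
} item;

/* Prepend a new item and return the new head (the old head on failure). */
static item *item_push(item *head, int value, int expired) {
    item *n = malloc(sizeof(*n));
    if (n == NULL) return head;
    n->value = value;
    n->expired = expired;
    n->next = head;
    return n;
}

/* Walk the chain, unlinking and freeing expired items along the way.
 * Returns the sum of the values of the items that survive. */
static int sum_live_and_purge(item **head) {
    int sum = 0;
    item **link = head;   /* the pointer that currently points at cur */
    item *cur = *head;
    while (cur != NULL) {
        item *next = cur->next;   /* remember the successor first */
        if (cur->expired) {
            *link = next;         /* unlink cur from the chain */
            free(cur);            /* cur->next must not be read after this */
        } else {
            sum += cur->value;
            link = &cur->next;
        }
        cur = next;               /* continue from the saved pointer */
    }
    return sum;
}
```

Reading cur->next after free(cur) would be undefined behavior, so the successor has to be captured before the current node can disappear. dictNext does the equivalent by filling nextEntry before returning an entry to the caller.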

5. Releasing the iterator

The iterator is released by dictReleaseIterator:

void dictReleaseIterator(dictIterator *iter)
{
    if (!(iter->index == -1 && iter->table == 0)) {
        // Lift the safe iterator's block on incremental rehashing.
        if (iter->safe)
            iter->d->iterators--;
        // For an unsafe iterator, check whether the fingerprint changed.
        // A mismatch fails the assertion and produces a report like this:
        /*
        === REDIS BUG REPORT START: Cut & paste starting from here ===
        [23085] 20 Jan 22:45:08.802 * DB saved on disk
        [23086] 20 Jan 22:45:08.804 # === ASSERTION FAILED ===
        [23086] 20 Jan 22:45:08.808 # ==> dict.c:1029 'iter->fingerprint == dictFingerprint(iter->d)' is not true
        */
        else
            assert(iter->fingerprint == dictFingerprint(iter->d));
    }
    zfree(iter);
}
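
The fingerprint idea can be sketched as follows. This is a simplified stand-in, not dictFingerprint: it covers a single table instead of both, the names fp_table, mix64 and table_fingerprint are hypothetical, and the mixing step is the splitmix64 finalizer chosen for the example.

```c
#include <stdint.h>

typedef struct fp_table {
    void *buckets;        /* address of the bucket array */
    unsigned long size;   /* number of buckets */
    unsigned long used;   /* number of stored entries */
} fp_table;

/* splitmix64 finalizer: an invertible 64-bit mixing step. */
static uint64_t mix64(uint64_t h) {
    h ^= h >> 30;
    h *= UINT64_C(0xbf58476d1ce4e5b9);
    h ^= h >> 27;
    h *= UINT64_C(0x94d049bb133111eb);
    h ^= h >> 31;
    return h;
}

/* Fold the fields that describe the table's state into one 64-bit value.
 * Any insert, delete, or reallocation changes at least one field. */
static uint64_t table_fingerprint(const fp_table *t) {
    uint64_t parts[3];
    uint64_t h = 0;
    parts[0] = (uint64_t)(uintptr_t)t->buckets;
    parts[1] = t->size;
    parts[2] = t->used;
    for (int i = 0; i < 3; i++) {
        h += parts[i];
        h = mix64(h);
    }
    return h;
}
```

An unsafe iterator would store table_fingerprint when it starts and compare it with a fresh value on release. A difference means the table was modified during the traversal, which is a misuse of the iterator rather than a recoverable condition.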

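5. Summary

To wrap up: starting from the keys command, we first analyzed the Redis dictionary structure and incremental rehashing.

The incremental rehash makes progress in two ways: one step is performed when Redis reads or writes the dictionary, and the timer event periodically helps out as well (provided the option is enabled and no RDB save or AOF rewrite is running).

Then, starting from the handler of the keys command, we analyzed the two kinds of Redis iterators:

Safe iterator: a safe iterator pauses incremental rehashing and allows entries to be added or removed during iteration, which guarantees that no entry is visited twice.

Besides keys, RDB persistence and BGREWRITEAOF also traverse the data with safe iterators, to avoid writing duplicate or expired entries.

My understanding is that safe iterators mainly serve the background processes that persist data. I said above that the single-step rehash is disabled while a safe iterator exists. But the timer event also performs rehashing, and it does not check if (d->iterators == 0).

It does, however, check if (server.rdb_child_pid == -1 && server.aof_child_pid == -1), so the hash table is rehashed only when no BGSAVE or BGREWRITEAOF is running.

Unsafe iterator: an unsafe iterator only allows traversal and may visit the same entry more than once (rehashing is not restricted, so if a rehash step runs, an entry that has already been visited can be moved to a position that has not been visited yet). An unsafe iterator also carries a fingerprint, which is checked for changes every time the iterator is released.

My personal understanding is that unsafe iterators are meant for the main Redis process. Because of the fingerprint, if a background process used an unsafe iterator and the main process modified a large amount of data in the meantime, the fingerprint check would fail when the iterator is released.

End of article.

References:
Redis official documentation: https://redis.io/
Redis source code: https://github.com/redis/redis
Book: 《Redis設計與實現》 (Redis Design and Implementation)

Finally, my abilities are limited, so some of the analysis may be incomplete or wrong; my apologies in advance.
If you find any mistakes, please point them out. Thank you!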

