HashMap深度解析：一文讓你徹底瞭解HashMap

寫在前面

HashMap是Map族中最爲常用的一種，也是 Java Collection Framework 的重要成員。本文首先給出了 HashMap 的實質並概述了其與 Map、HashSet 的關係，緊接着給出了 HashMap 在 JDK 中的定義，並結合源碼分析了其四種構造方式。最後，通過對 HashMap 的數據結構、實現原理、源碼實現三個方面的剖析，深入到它底層 Hash 存儲機制，解釋了其底層數組長度總是 2 的 n 次方的原因，也揭示了其快速存取、擴容及擴容後的重哈希的原理與實現。

本文所有關於HashMap的源碼都是基於 JDK 1.6 的，不同 JDK 版本之間也許會有些許差異，但不影響我們對 HashMap 的數據結構、原理等整體的把握和了解。

HashMap 概述

Map 是 Key-Value 對映射的抽象接口，該映射不包括重複的鍵，即一個鍵對應一個值。HashMap 是 Java Collection Framework 的重要成員，也是Map族(如下圖所示)中我們最爲常用的一種。簡單地說，HashMap 是基於哈希表的 Map 接口的實現，以 Key-Value 的形式存在，即存儲的對象是 Entry (同時包含了 Key 和 Value) 。在HashMap中，其會根據hash算法來計算key-value的存儲位置並進行快速存取。特別地，HashMap最多隻允許一條Entry的鍵爲Null(多條會覆蓋)，但允許多條Entry的值爲Null。此外，HashMap 是 Map 的一個非同步的實現。

雖然 HashMap 和 HashSet 實現的接口規範不同，但是它們底層的 Hash 存儲機制完全相同。實際上，HashSet 本身就是在 HashMap 的基礎上實現的

HashMap 在 JDK 中的定義

HashMap實現了Map接口，並繼承 AbstractMap 抽象類，其中 Map 接口定義了鍵值映射規則。和 AbstractCollection抽象類在 Collection 族的作用類似， AbstractMap 抽象類提供了 Map 接口的骨幹實現，以最大限度地減少實現Map接口所需的工作。HashMap 在JDK中的定義爲：

public class HashMap<K, V> extends AbstractMap<K, V>implements Map<K, V>, Cloneable, Serializable {
	...}

HashMap 的構造函數

HashMap 一共提供了四個構造函數，其中默認無參的構造函數和參數爲Map的構造函數爲 Java Collection Framework 規範的推薦實現，其餘兩個構造函數則是 HashMap 專門提供的。

1、HashMap()

該構造函數意在構造一個具有> 默認初始容量 (16) 和默認負載因子(0.75) 的空 HashMap，是 Java Collection Framework 規範推薦提供的，其源碼如下：

 /** * Constructs an empty HashMap with the default initial capacity * (16) and the default load factor (0.75). */
 public HashMap() {
 //負載因子:用於衡量的是一個散列表的空間的使用程度 this.loadFactor = DEFAULT_LOAD_FACTOR; 
 //HashMap進行擴容的閾值，它的值等於 HashMap 的容量乘以負載因子 threshold = (int)(DEFAULT_INITIAL_CAPACITY * DEFAULT_LOAD_FACTOR);
 // HashMap的底層實現仍是數組，只是數組的每一項都是一條鏈 table = new Entry[DEFAULT_INITIAL_CAPACITY];
 init();
 }

2、HashMap(int initialCapacity, float loadFactor)

該構造函數意在構造一個指定初始容量和指定負載因子的空 HashMap，其源碼如下：

 /** * Constructs an empty HashMap with the specified initial capacity and load factor. */
 public HashMap(int initialCapacity, float loadFactor) {
 //初始容量不能小於 0 if (initialCapacity < 0)
 throw new IllegalArgumentException("Illegal initial capacity: " + initialCapacity);
 //初始容量不能超過 2^30 if (initialCapacity > MAXIMUM_CAPACITY)
 initialCapacity = MAXIMUM_CAPACITY;
 //負載因子不能小於 0 
 if (loadFactor <= 0 || Float.isNaN(loadFactor))
 throw new IllegalArgumentException("Illegal load factor: " +
 loadFactor);
 // HashMap 的容量必須是2的冪次方，超過 initialCapacity 的最小 2^n 
 int capacity = 1;
 while (capacity < initialCapacity)
 capacity <<= 1; 
 //負載因子 this.loadFactor = loadFactor;
 //設置HashMap的容量極限，當HashMap的容量達到該極限時就會進行自動擴容操作 threshold = (int)(capacity * loadFactor);
 // HashMap的底層實現仍是數組，只是數組的每一項都是一條鏈 table = new Entry[capacity];
 init();
 }

3、HashMap(int initialCapacity)

該構造函數意在構造一個指定初始容量和默認負載因子 (0.75)的空 HashMap，其源碼如下：

 // Constructs an empty HashMap with the specified initial capacity and the default load factor (0.75) public HashMap(int initialCapacity) {
 this(initialCapacity, DEFAULT_LOAD_FACTOR); // 直接調用上述構造函數 }

4、HashMap(Map<? extends K, ? extends V> m)

該構造函數意在構造一個與指定 Map 具有相同映射的 HashMap，其初始容量不小於 16 (具體依賴於指定Map的大小)，負載因子是 0.75，是 Java Collection Framework 規範推薦提供的，其源碼如下：

 // Constructs a new HashMap with the same mappings as the specified Map. 
 // The HashMap is created with default load factor (0.75) and an initial capacity // sufficient to hold the mappings in the specified Map. public HashMap(Map<? extends K, ? extends V> m) {
 // 初始容量不小於 16 
 this(Math.max((int) (m.size() / DEFAULT_LOAD_FACTOR) + 1,
 DEFAULT_INITIAL_CAPACITY), DEFAULT_LOAD_FACTOR);
 putAllForCreate(m);
 }

在這裏，我們提到了兩個非常重要的參數：初始容量和負載因子，這兩個參數是影響HashMap性能的重要參數。其中，容量表示哈希表中桶的數量 (table 數組的大小)，初始容量是創建哈希表時桶的數量；負載因子是哈希表在其容量自動增加之前可以達到多滿的一種尺度，它衡量的是一個散列表的空間的使用程度，負載因子越大表示散列表的裝填程度越高，反之愈小。

HashMap 的數據結構

哈希的相關概念

Hash 就是把任意長度的輸入(又叫做預映射， pre-image)，通過哈希算法，變換成固定長度的輸出(通常是整型)，該輸出就是哈希值。這種轉換是一種壓縮映射，也就是說，散列值的空間通常遠小於輸入的空間。不同的輸入可能會散列成相同的輸出，從而不可能從散列值來唯一的確定輸入值。簡單的說，就是一種將任意長度的消息壓縮到某一固定長度的息摘要函數。

哈希的應用：數據結構

我們知道，數組的特點是：尋址容易，插入和刪除困難；而鏈表的特點是：尋址困難，插入和刪除容易。那麼我們能不能綜合兩者的特性，做出一種尋址容易，插入和刪除也容易的數據結構呢？答案是肯定的，這就是我們要提起的哈希表。事實上，哈希表有多種不同的實現方法，我們接下來解釋的是最經典的一種方法 —— 拉鍊法，我們可以將其理解爲鏈表的數組，如下圖所示：

我們可以從上圖看到，左邊很明顯是個數組，數組的每個成員是一個鏈表。該數據結構所容納的所有元素均包含一個指針，用於元素間的鏈接。我們根據元素的自身特徵把元素分配到不同的鏈表中去，反過來我們也正是通過這些特徵找到正確的鏈表，再從鏈表中找出正確的元素。其中，根據元素特徵計算元素數組下標的方法就是哈希算法。

總的來說，哈希表適合用作快速查找、刪除的基本數據結構，通常需要總數據量可以放入內存。在使用哈希表時，有以下幾個關鍵點：

hash 函數（哈希算法）的選擇：針對不同的對象(字符串、整數等)具體的哈希方法；
碰撞處理：常用的有兩種方式，一種是open hashing，即 >拉鍊法；
另一種就是 closed hashing，即開地址法(opened addressing)。

HashMap 的數據結構

我們知道，在Java中最常用的兩種結構是數組和鏈表，幾乎所有的數據結構都可以利用這兩種來組合實現，HashMap 就是這種應用的一個典型。實際上，HashMap 就是一個鏈表數組，如下是它數據結構：

從上圖中，我們可以形象地看出HashMap底層實現還是數組，只是數組的每一項都是一條鏈。其中參數initialCapacity 就代表了該數組的長度，也就是桶的個數。在第三節我們已經瞭解了HashMap 的默認構造函數的源碼：

 /** * Constructs an empty HashMap with the default initial capacity * (16) and the default load factor (0.75). */
 public HashMap() {
 //負載因子:用於衡量的是一個散列表的空間的使用程度 this.loadFactor = DEFAULT_LOAD_FACTOR; 
 //HashMap進行擴容的閾值，它的值等於 HashMap 的容量乘以負載因子 threshold = (int)(DEFAULT_INITIAL_CAPACITY * DEFAULT_LOAD_FACTOR);
 // HashMap的底層實現仍是數組，只是數組的每一項都是一條鏈 table = new Entry[DEFAULT_INITIAL_CAPACITY];
 init();
 }

從上述源碼中我們可以看出，每次新建一個HashMap時，都會初始化一個Entry類型的table數組，其中 Entry類型的定義如下：

static class Entry<K,V> implements Map.Entry<K,V> {
 final K key; // 鍵值對的鍵 V value; // 鍵值對的值 Entry<K,V> next; // 下一個節點 final int hash; // hash(key.hashCode())方法的返回值 /** * Creates new entry. */
 Entry(int h, K k, V v, Entry<K,V> n) { // Entry 的構造函數 value = v;
 next = n;
 key = k;
 hash = h;
 }
 ......}

其中，Entry爲HashMap的內部類，實現了 Map.Entry 接口，其包含了鍵key、值value、下一個節點next，以及hash值四個屬性。事實上，Entry 是構成哈希表的基石，是哈希表所存儲的元素的具體形式。

HashMap 的快速存取

下面我們結合JDK源碼看HashMap 的存取實現。

HashMap 的存儲實現

在 HashMap 中，鍵值對的存儲是通過 put(key,vlaue) 方法來實現的，其源碼如下：

 /** * Associates the specified value with the specified key in this map. * If the map previously contained a mapping for the key, the old * value is replaced. * * @param key key with which the specified value is to be associated * @param value value to be associated with the specified key * @return the previous value associated with key, or null if there was no mapping for key. * Note that a null return can also indicate that the map previously associated null with key. */
 public V put(K key, V value) {
 //當key爲null時，調用putForNullKey方法，並將該鍵值對保存到table的第一個位置 
 if (key == null)
 return putForNullKey(value); 
 //根據key的hashCode計算hash值 int hash = hash(key.hashCode()); // ------- (1) //計算該鍵值對在數組中的存儲位置（哪個桶） int i = indexFor(hash, table.length); // ------- (2) //在table的第i個桶上進行迭代，尋找 key 保存的位置 for (Entry<K,V> e = table[i]; e != null; e = e.next) { // ------- (3) Object k;
 //判斷該條鏈上是否存在hash值相同且key值相等的映射，若存在，則直接覆蓋 value，並返回舊value if (e.hash == hash && ((k = e.key) == key || key.equals(k))) {
 V oldValue = e.value;
 e.value = value;
 e.recordAccess(this);
 return oldValue; // 返回舊值 }
 }
 modCount++; //修改次數增加1，快速失敗機制 //原HashMap中無該映射，將該添加至該鏈的鏈頭 addEntry(hash, key, value, i); 
 return null;
 }

對NULL鍵的特別處理：putForNullKey()

我們直接看其源碼：

 /** * Offloaded version of put for null keys */
 private V putForNullKey(V value) {
 // 若key==null，則將其放入table的第一個桶，即 table[0] for (Entry<K,V> e = table[0]; e != null; e = e.next) { 
 if (e.key == null) { // 若已經存在key爲null的鍵，則替換其值，並返回舊值 V oldValue = e.value;
 e.value = value;
 e.recordAccess(this);
 return oldValue;
 }
 }
 modCount++; // 快速失敗 addEntry(0, null, value, 0); // 否則，將其添加到 table[0] 的桶中 return null;
 }

HashMap 中的哈希策略（算法）

/**
 * Applies a supplemental hash function to a given hashCode, which
 * defends against poor quality hash functions. This is critical
 * because HashMap uses power-of-two length hash tables, that
 * otherwise encounter collisions for hashCodes that do not differ
 * in lower bits.
 *
 * Note: Null keys always map to hash 0, thus index 0.
 */
static int hash( int h )
{
/*
 * This function ensures that hashCodes that differ only by
 * constant multiples at each bit position have a bounded
 * number of collisions (approximately 8 at default load factor).
 */
	h ^= (h >>> 20) ^ (h >>> 12);
	return(h ^ (h >>> 7) ^ (h >>> 4) );
}

正如JDK官方對該方法的描述那樣，使用hash()方法對一個對象的hashCode進行重新計算是爲了防止質量低下的hashCode()函數實現。由於hashMap的支撐數組長度總是 2 的冪次，通過右移可以使低位的數據儘量的不同，從而使hash值的分佈儘量均勻。

通過上述hash()方法計算得到 Key 的 hash值後，怎麼才能保證元素均勻分佈到table的每個桶中呢？我們會想到取模，但是由於取模的效率較低，HashMap 是通過調用上面的indexFor()方法處理的，其不但簡單而且效率很高，對應源碼如下所示：

/** * * Returns index for hash code h. * */static int indexFor( int h, int length ){
	return(h & (length - 1) ); /* 作用等價於取模運算，但這種方式效率更高 */}

HashMap 中鍵值對的添加：addEntry()

我們直接看其源碼：

 /** * Adds a new entry with the specified key, value and hash code to * the specified bucket. It is the responsibility of this * method to resize the table if appropriate. * * Subclass overrides this to alter the behavior of put method. * 
 * 永遠都是在鏈表的表頭添加新元素 */
 void addEntry(int hash, K key, V value, int bucketIndex) {
 //獲取bucketIndex處的鏈表 Entry<K,V> e = table[bucketIndex];
 //將新創建的 Entry 鏈入 bucketIndex處的鏈表的表頭 
 table[bucketIndex] = new Entry<K,V>(hash, key, value, e);
 //若HashMap中元素的個數超過極限值 threshold，則容量擴大兩倍 if (size++ >= threshold)
 resize(2 * table.length);
 }

HashMap 的擴容：resize()

隨着HashMap中元素的數量越來越多，發生碰撞的概率將越來越大，所產生的子鏈長度就會越來越長，這樣勢必會影響HashMap的存取速度。爲了保證HashMap的效率，系統必須要在某個臨界點進行擴容處理，該臨界點就是HashMap中元素的數量在數值上等於threshold（table數組長度*加載因子）。但是，不得不說，擴容是一個非常耗時的過程，因爲它需要重新計算這些元素在新table數組中的位置並進行復制處理。所以，如果我們能夠提前預知HashMap 中元素的個數，那麼在構造HashMap時預設元素的個數能夠有效的提高HashMap的性能。我們直接看其源碼：

/** * * Rehashes the contents of this map into a new array with a * * larger capacity. This method is called automatically when the * * number of keys in this map reaches its threshold. * * * * If current capacity is MAXIMUM_CAPACITY, this method does not * * resize the map, but sets threshold to Integer.MAX_VALUE. * * This has the effect of preventing future calls. * * * * @param newCapacity the new capacity, MUST be a power of two; * * must be greater than current capacity unless current * * capacity is MAXIMUM_CAPACITY (in which case value * * is irrelevant). * */void resize( int newCapacity ){
	Entry[] oldTable = table;
	int oldCapacity = oldTable.length;
	/* 若 oldCapacity 已達到最大值，直接將 threshold 設爲 Integer.MAX_VALUE */
	if ( oldCapacity == MAXIMUM_CAPACITY )
	{
		threshold = Integer.MAX_VALUE;
		return; /* 直接返回 */
	}
	/* 否則，創建一個更大的數組 */
	Entry[] newTable = new Entry[newCapacity];
	/* 將每條Entry重新哈希到新的數組中 */
	transfer( newTable );
	table = newTable;
	threshold = (int) (newCapacity * loadFactor); /* 重新設定 threshold */}

HashMap 的重哈希：transfer()

重哈希的主要是一個重新計算原HashMap中的元素在新table數組中的位置並進行復制處理的過程，我們直接看其源碼：

/** * * Transfers all entries from current table to newTable. * */void transfer( Entry[] newTable ){/* 將原數組 table 賦給數組 src */
	Entry[] src = table;
	int newCapacity = newTable.length;/* 將數組 src 中的每條鏈重新添加到 newTable 中 */
	for ( int j = 0; j < src.length; j++ )
	{
		Entry<K, V> e = src[j];
		if ( e != null )
		{
			src[j] = null; /* src 回收 *//* 將每條鏈的每個元素依次添加到 newTable 中相應的桶中 */
			do
			{
				Entry<K, V> next = e.next;/* e.hash指的是 hash(key.hashCode())的返回值; *//* 計算在newTable中的位置，注意原來在同一條子鏈上的元素可能被分配到不同的子鏈 */
				int i = indexFor( e.hash, newCapacity );
				e.next		= newTable[i];
				newTable[i]	= e;
				e		= next;
			}
			while ( e != null );
		}
	}}

特別需要注意的是，在重哈希的過程中，原屬於一個桶中的Entry對象可能被分到不同的桶，因爲HashMap 的容量發生了變化，那麼 h&(length - 1) 的值也會發生相應的變化。極端地說，如果重哈希後，原屬於一個桶中的Entry對象仍屬於同一桶，那麼重哈希也就失去了意義。

HashMap 的讀取實現

相對於HashMap的存儲而言，讀取就顯得比較簡單了。因爲，HashMap只需通過key的hash值定位到table數組的某個特定的桶，然後查找並返回該key對應的value即可，源碼如下：

/** * * Returns the value to which the specified key is mapped, * * or {@code null} if this map contains no mapping for the key. * * <p>More formally, if this map contains a mapping from a key * * {@code k} to a value {@code v} such that {@code (key==null ? k==null : * * key.equals(k))}, then this method returns {@code v}; otherwise * * it returns {@code null}. (There can be at most one such mapping.) * <p>A return value of {@code null} does not <i>necessarily</i> * * indicate that the map contains no mapping for the key; it's also * * possible that the map explicitly maps the key to {@code null}. * * The {@link #containsKey containsKey} operation may be used to * * distinguish these two cases. * @see #put(Object, Object) * */public V get( Object key ){/* 若爲null，調用getForNullKey方法返回相對應的value */
	if ( key == null )/* 從table的第一個桶中尋找 key 爲 null 的映射；若不存在，直接返回null */
		return(getForNullKey() );/* 根據該 key 的 hashCode 值計算它的 hash 碼 */
	int hash = hash( key.hashCode() );/* 找出 table 數組中對應的桶 */
	for ( Entry<K, V> e = table[indexFor( hash, table.length )];
	      e != null;
	      e = e.next )
	{
		Object k;/* 若搜索的key與查找的key相同，則返回相對應的value */
		if ( e.hash == hash && ( (k = e.key) == key || key.equals( k ) ) )
			return(e.value);
	}
	return(null);}

針對鍵爲NULL的鍵值對，HashMap 提供了專門的處理：getForNullKey()，其源碼如下：

/** * * Offloaded version of get() to look up null keys. Null keys map * * to index 0. This null case is split out into separate methods * * for the sake of performance in the two most commonly used * * operations (get and put), but incorporated with conditionals in * * others. * */private V getForNullKey(){
	/* 鍵爲NULL的鍵值對若存在，則必定在第一個桶中 */
	for ( Entry<K, V> e = table[0]; e != null; e = e.next )
	{
		if ( e.key == null )
			return(e.value);
	}
	/* 鍵爲NULL的鍵值對若不存在，則直接返回 null */
	return(null);}

因此，調用HashMap的get(Object key)方法後，若返回值是 NULL，則存在如下兩種可能：

該 key 對應的值就是 null;
HashMap 中不存在該 key。

HashMap深度解析：一文讓你徹底瞭解HashMap

寫在前面

HashMap 概述

HashMap 在 JDK 中的定義

HashMap 的構造函數

HashMap 的數據結構

HashMap 的快速存取

PDManer [元數建模]-v4.9.0 發佈：一款簡單好用的數據庫建模平臺

使用neovim打造go ide(支持代碼跳轉, 代碼補全, 實時語法檢查)

cs01 CSS Syntax

挑戰程序設計競賽 2.3章習題 poj 3046 Ant Counting

[MASM拾遺]Offset僞指令

h30 HTML Layout Elements

瞭解顯卡

一款基於C#開發的通訊調試工具（支持Modbus RTU、MQTT調試）

Linux/Golang/glibC系統調用

cs04 CSS Measurement Units

關於阿里開發者招聘節 |這5道筆試真題你會嗎！？？？

HashMap深度解析：一文讓你徹底瞭解HashMap

某java程序員連拿3份offer，每份高達45k？他說他作對了這些。

8年開發老司機告訴你完成一個java Web完整項目要用到的專業技能

看到“java單例模式”腦殼疼，學會這幾招分分鐘搞定

https://yachay.unat.edu.pe/blog/index.php?comment_area=format_blog&comment_component=blog&comment_co

linux以太網驅動總結