HashMap: Javadoc translation and explanation

HashMap in JDK 1.7
Translation of the HashMap class header comment:

Hash table based implementation of the Map interface. This implementation provides all of the optional map operations, and permits null values and the null key. (The HashMap class is roughly equivalent to Hashtable, except that it is unsynchronized and permits nulls.) This class makes no guarantees as to the order of the map; in particular, it does not guarantee that the order will remain constant over time.
↓↓↓↓↓
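A hash-table-based implementation of the Map interface. It provides all of the optional map operations and permits null values (any number of them) and the null key (at most one). [HashMap is roughly equivalent to Hashtable, except that it is not thread-safe (unsynchronized) and permits nulls.] This class makes no guarantees about the order of the map[1]; in particular, it does not guarantee that the order will stay the same over time[2].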
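To see the null handling in practice, here is a small sketch (NullKeyDemo and acceptsNulls are hypothetical names) that puts a null key and null values into a HashMap and into a Hashtable. HashMap accepts them, while Hashtable throws NullPointerException.

```java
import java.util.HashMap;
import java.util.Hashtable;
import java.util.Map;

class NullKeyDemo {
    // Returns true if the map accepts a null key and null values without throwing.
    static boolean acceptsNulls(Map<String, String> map) {
        try {
            map.put(null, "first");   // the null key
            map.put(null, "second");  // same null key again: replaces the value
            map.put("a", null);       // null values are allowed...
            map.put("b", null);       // ...any number of them
            return true;
        } catch (NullPointerException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        Map<String, String> hashMap = new HashMap<>();
        System.out.println("HashMap accepts nulls: " + acceptsNulls(hashMap));
        // one null key plus "a" and "b": size 3, and get(null) returns "second"
        System.out.println("size=" + hashMap.size() + ", get(null)=" + hashMap.get(null));
        System.out.println("Hashtable accepts nulls: " + acceptsNulls(new Hashtable<>()));
    }
}
```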

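[1]. A HashMap is unordered: the iteration order has nothing to do with the order in which entries were put.
[2].

  • Every put checks whether a resize is needed.
  • During a resize, HashMap checks the system configuration (in JDK 7, the jdk.map.althashing.threshold system property, compared against the new capacity) to decide whether the hash computation should be adjusted, i.e. whether hashSeed must change.
  • If so, hashSeed changes,
  • so the computed hash values change,
  • so the bucket each key lands in changes.
  • For example, with hashSeed=0 the hashes place A before B; with hashSeed=1 they place B before A.
  • Even when hashSeed stays the same, a resize doubles the table length, and because the bucket index is hash & (length - 1), keys can move to different buckets.
  • Hence the iterator's order is effectively arbitrary; you can think of it as random.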
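The bucket change on resize can be seen with plain arithmetic. This sketch assumes two hypothetical, already-computed hash values and applies the h & (length - 1) bucket formula that JDK 1.7 uses in indexFor; it skips hashSeed and the supplemental hash step. The iterator walks buckets from index 0 upward, so with 16 buckets A comes before B, and with 32 buckets B comes before A.

```java
class ResizeOrderDemo {
    // Bucket index formula: h & (length - 1), which works because length is a power of two.
    static int indexFor(int h, int length) {
        return h & (length - 1);
    }

    public static void main(String[] args) {
        int hashA = 17, hashB = 3; // hypothetical final hash values of keys A and B
        for (int length : new int[] {16, 32}) {
            int ia = indexFor(hashA, length);
            int ib = indexFor(hashB, length);
            // the iterator visits buckets from index 0 upward
            String order = ia < ib ? "A before B" : "B before A";
            System.out.println("length=" + length + ": A->" + ia + ", B->" + ib + " (" + order + ")");
        }
    }
}
```

Running it prints `length=16: A->1, B->3 (A before B)` followed by `length=32: A->17, B->3 (B before A)`.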

 

This implementation provides constant-time performance for the basic operations (get and put), assuming the hash function disperses the elements properly among the buckets. Iteration over collection views requires time proportional to the "capacity" of the HashMap instance (the number of buckets) plus its size (the number of key-value mappings). Thus, it's very important not to set the initial capacity too high (or the load factor too low) if iteration performance is important.
↓↓↓↓↓
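This class provides constant-time performance for the basic operations (get and put), assuming the hash function disperses the elements evenly among the buckets[3].
Iterating over the collection views takes time proportional to the "capacity" of the HashMap instance (the number of buckets) plus its size (the number of key-value mappings)[4].
Therefore, if iteration performance matters, do not set the initial capacity too high (or the load factor too low)[5].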

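[3]. The constant-time claim (for get and put only) holds only under certain conditions:

  1. The load factor is reasonable. The default of 0.75 works well in most scenarios; a higher value causes more hash collisions, a lower one wastes space.
  2. No resize is triggered. Resizing is relatively expensive, so provide an initial capacity that is as accurate as possible at construction time; later puts will then trigger as few resizes as possible.
  3. With both of the above in place, the hash function spreads keys fairly evenly across the array, so no matter how much data there is, get and put go directly to the right bucket and stay fast.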
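To illustrate condition 3, the following sketch counts the longest chain produced when 12 hash codes are spread over 16 buckets. It assumes the h & (capacity - 1) bucket formula and ignores HashMap's extra bit-mixing step; DispersionDemo, longestChain and BadKey are hypothetical names, and BadKey deliberately returns a constant hashCode.

```java
import java.util.ArrayList;
import java.util.List;

class DispersionDemo {
    // Hypothetical key type whose hashCode ignores its contents: every instance collides.
    static final class BadKey {
        final int id;
        BadKey(int id) { this.id = id; }
        @Override public boolean equals(Object o) { return o instanceof BadKey && ((BadKey) o).id == id; }
        @Override public int hashCode() { return 42; }
    }

    // Longest chain when the hash codes are placed into `capacity` buckets via h & (capacity - 1).
    static int longestChain(List<Integer> hashes, int capacity) {
        int[] chainLength = new int[capacity];
        int longest = 0;
        for (int h : hashes) {
            int index = h & (capacity - 1);
            longest = Math.max(longest, ++chainLength[index]);
        }
        return longest;
    }

    public static void main(String[] args) {
        List<Integer> good = new ArrayList<>();
        List<Integer> bad = new ArrayList<>();
        for (int i = 0; i < 12; i++) {
            good.add(Integer.valueOf(i).hashCode()); // Integer.hashCode() is the value itself
            bad.add(new BadKey(i).hashCode());
        }
        System.out.println("well-dispersed keys, longest chain: " + longestChain(good, 16));
        System.out.println("colliding keys, longest chain: " + longestChain(bad, 16));
    }
}
```

With good dispersion every lookup touches a chain of length 1; with BadKey every lookup has to walk one chain of 12 entries.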

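[4]. Capacity and size are two different concepts.

  1. Capacity is the length of the bucket array. For example, the default initial capacity is 16; even if nothing has been put, the capacity is still 16.
  2. Size is the number of key-value mappings: 0 before any put, 1 after one put, and so on.
  3. So with no data, capacity=16 and size=0; after one put, capacity=16 and size=1.
  4. So the iterator loops over every bucket (the capacity), plus the chain in each bucket. Even with size=1, the iterator still scans all 16 buckets.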
HashIterator() {
    expectedModCount = modCount;
    if (size > 0) { // advance to first entry
        Entry[] t = table;
        // Scan the hash table until a non-null bucket is found, and assign it to next
        while (index < t.length && (next = t[index++]) == null);
    }
}
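The "capacity plus size" cost can be sketched with a stripped-down table. IterationCostDemo, Entry, iterationWork and tableWith are hypothetical names, not the JDK's classes; the loop counts one step per bucket and one per entry, which approximates the work of one full pass of HashIterator.

```java
class IterationCostDemo {
    // Minimal stand-in for HashMap.Entry: only the chain link matters here.
    static final class Entry {
        Entry next;
    }

    // Full pass over the table: one step per bucket (empty or not) plus one per chained entry.
    static int iterationWork(Entry[] table) {
        int work = 0;
        for (Entry head : table) {
            work++;                                   // examine this bucket
            for (Entry e = head; e != null; e = e.next) {
                work++;                               // visit one entry in the chain
            }
        }
        return work;
    }

    // Builds a table with `capacity` buckets and chains `size` entries into bucket 0.
    static Entry[] tableWith(int capacity, int size) {
        Entry[] table = new Entry[capacity];
        for (int i = 0; i < size; i++) {
            Entry e = new Entry();
            e.next = table[0];
            table[0] = e;
        }
        return table;
    }

    public static void main(String[] args) {
        System.out.println(iterationWork(tableWith(16, 1)));   // 16 buckets + 1 entry = 17
        System.out.println(iterationWork(tableWith(1024, 1))); // 1024 buckets + 1 entry = 1025
    }
}
```

The same single mapping costs 17 steps in a 16-bucket table but 1025 steps in a 1024-bucket table, which is why [5] warns against oversized capacities.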

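[5]. The official advice: keep the capacity appropriate rather than too large. A load factor that is too low has the same effect, leaving the capacity far larger than what is actually used. In both cases the iterator does a lot of wasted work scanning empty buckets.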

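My translation is still rough and it was heavy going: I can understand the text myself but can't quite explain it... so from here on I'll mostly just post the original.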

 

An instance of HashMap has two parameters that affect its performance: initial capacity and load factor.[6] The capacity is the number of buckets in the hash table, and the initial capacity is simply the capacity at the time the hash table is created.[7] The load factor is a measure of how full the hash table is allowed to get before its capacity is automatically increased.[8] When the number of entries in the hash table exceeds the product of the load factor and the current capacity, the hash table is rehashed (that is, internal data structures are rebuilt) so that the hash table has approximately twice the number of buckets.[9]

[6]. HashMap has two parameters that matter most for performance: initial capacity and load factor.
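[7]. Capacity is the number of buckets in the hash table (the buckets are the slots of the internal entry array, table). The initial capacity is the value specified when the HashMap is created; it is not necessarily the actual capacity. (The actual capacity is computed on the first put: it is the smallest power of two greater than or equal to the initial capacity. For example, with an initial capacity of 20 the actual capacity is 32.)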
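The rounding in [7] can be written with Integer.highestOneBit. This sketch is modeled on the roundUpToPowerOf2 helper in JDK 1.7's HashMap, but treat it as an illustration rather than a copy of the JDK source; CapacityDemo is a hypothetical class name.

```java
class CapacityDemo {
    static final int MAXIMUM_CAPACITY = 1 << 30; // the upper limit HashMap uses

    // Smallest power of two >= number, capped at MAXIMUM_CAPACITY.
    static int roundUpToPowerOf2(int number) {
        if (number >= MAXIMUM_CAPACITY) return MAXIMUM_CAPACITY;
        // (number - 1) << 1 has its highest bit at the target power of two
        return number > 1 ? Integer.highestOneBit((number - 1) << 1) : 1;
    }

    public static void main(String[] args) {
        for (int requested : new int[] {1, 16, 17, 20, 100}) {
            System.out.println(requested + " -> " + roundUpToPowerOf2(requested));
        }
    }
}
```

This prints 1 -> 1, 16 -> 16, 17 -> 32, 20 -> 32 and 100 -> 128.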
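[8]. The load factor is an important parameter for tuning HashMap performance.
It is compared against the fill ratio $\frac{\text{size}}{\text{capacity}}$: when the ratio is greater than or equal to loadFactor, a resize is triggered; while it stays below loadFactor, put does not trigger a resize.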
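[9]. When $\text{size} \ge \text{capacity} \times \text{loadFactor}$, a resize is triggered and all entries are rehashed (the internal data structures are rebuilt). The new capacity is twice the old one, i.e. $2 \times \text{capacity}$.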
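A rough simulation of [9], assuming the simplified rule stated there (resize when size >= capacity × loadFactor before inserting a new key) and ignoring any extra conditions the real JDK code may check. ResizeDemo and simulate are hypothetical names.

```java
class ResizeDemo {
    // Simplified rule from [9]: before inserting a new key, if size >= capacity * loadFactor,
    // double the capacity. Returns {number of resizes, final capacity}.
    static int[] simulate(int capacity, float loadFactor, int entries) {
        int threshold = (int) (capacity * loadFactor);
        int resizes = 0;
        for (int size = 0; size < entries; size++) { // size = entries already stored
            if (size >= threshold) {
                capacity *= 2;
                threshold = (int) (capacity * loadFactor);
                resizes++;
            }
        }
        return new int[] {resizes, capacity};
    }

    public static void main(String[] args) {
        for (int n : new int[] {12, 13, 100}) {
            int[] r = simulate(16, 0.75f, n);
            System.out.println(n + " entries: " + r[0] + " resize(s), final capacity " + r[1]);
        }
    }
}
```

With the defaults (16, 0.75) the threshold is 12, so the 13th insert triggers the first resize, and 100 inserts cause four resizes (16 → 32 → 64 → 128 → 256).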

As a general rule, the default load factor (.75) offers a good tradeoff between time and space costs. Higher values decrease the space overhead but increase the lookup cost (reflected in most of the operations of the HashMap class, including get and put). The expected number of entries in the map and its load factor should be taken into account when setting its initial capacity, so as to minimize the number of rehash operations. If the initial capacity is greater than the maximum number of entries divided by the load factor, no rehash operations will ever occur.[10]

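[10]. This paragraph is mainly about the load factor. The default of 0.75 is a compromise between time and space costs. A larger value reduces wasted space, but raises the probability of hash collisions and the length of the chains, and therefore the cost of lookups.
So when setting the initial capacity, take the expected number of entries and the load factor into account to minimize the number of rehash operations: if the initial capacity is greater than the maximum number of entries divided by the load factor, no rehash will ever happen.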
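Turning the last sentence of the Javadoc paragraph into a formula: choose an initial capacity strictly greater than expectedEntries / loadFactor. initialCapacityFor is a hypothetical helper; HashMap(int, float) is the real constructor.

```java
import java.util.HashMap;
import java.util.Map;

class InitialCapacityDemo {
    // Hypothetical helper: returns an initial capacity strictly greater than
    // expectedEntries / loadFactor, so up to expectedEntries mappings fit without a rehash.
    static int initialCapacityFor(int expectedEntries, float loadFactor) {
        return (int) (expectedEntries / loadFactor) + 1;
    }

    public static void main(String[] args) {
        int expected = 100;
        int capacity = initialCapacityFor(expected, 0.75f); // 100 / 0.75 = 133.33..., so 134
        Map<Integer, String> map = new HashMap<>(capacity, 0.75f);
        for (int i = 0; i < expected; i++) {
            map.put(i, "v" + i);
        }
        System.out.println("requested capacity " + capacity + ", size " + map.size());
    }
}
```

HashMap then rounds 134 up to 256 buckets, whose threshold (192) is comfortably above the 100 expected entries.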

If many mappings are to be stored in a HashMap instance, creating it with a sufficiently large capacity will allow the mappings to be stored more efficiently than letting it perform automatic rehashing as needed to grow the table.[11]

[11]. If you need to store a lot of data in the map, a sufficiently large initial capacity is best, because it avoids resizing.

Now for the important part!
Note that this implementation is not synchronized.[12] If multiple threads access a hash map concurrently, and at least one of the threads modifies the map structurally, it must be synchronized externally.[13] (A structural modification is any operation that adds or deletes one or more mappings; merely changing the value associated with a key that an instance already contains is not a structural modification.) This is typically accomplished by synchronizing on some object that naturally encapsulates the map.[14]

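[12]. This implementation is not thread-safe.
[13]. If multiple threads access a HashMap concurrently and at least one of them modifies it structurally, access must be synchronized externally. (Structural modifications are operations that add or remove mappings, e.g. putting a new key, putAll, remove; changing the value of an existing key is not one.)
[14]. This is usually done by synchronizing (synchronized) on some object that encapsulates the map.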
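A minimal sketch of [14], assuming a hypothetical WordCounter class that owns the map: all access goes through methods synchronized on the WordCounter instance, so the read-then-write in add happens under one lock and no wrapper is needed.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical class that "naturally encapsulates" a HashMap.
class WordCounter {
    private final Map<String, Integer> counts = new HashMap<>(); // never exposed directly

    synchronized void add(String word) {
        Integer old = counts.get(word);
        counts.put(word, old == null ? 1 : old + 1); // read-modify-write under one lock
    }

    synchronized int count(String word) {
        Integer c = counts.get(word);
        return c == null ? 0 : c;
    }

    public static void main(String[] args) throws InterruptedException {
        WordCounter counter = new WordCounter();
        Runnable task = () -> {
            for (int i = 0; i < 10_000; i++) counter.add("hash");
        };
        Thread t1 = new Thread(task);
        Thread t2 = new Thread(task);
        t1.start();
        t2.start();
        t1.join();
        t2.join();
        System.out.println(counter.count("hash")); // 20000
    }
}
```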

If no such object exists, the map should be "wrapped" using the Collections.synchronizedMap method. This is best done at creation time, to prevent accidental unsynchronized access to the map: Map m = Collections.synchronizedMap(new HashMap(...));[15]

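[15]. You can wrap the map with Collections.synchronizedMap to make it thread-safe; ideally, wrap it right when it is created.
This is a blunt approach: every method call on the HashMap is wrapped in a synchronized block on a single lock, so performance is poor. For concurrent access, ConcurrentHashMap is the better choice.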
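A sketch of both options from [15]. Note that synchronizedMap only makes individual calls atomic: a get-then-put sequence and any iteration still need a synchronized block on the wrapper (its Javadoc requires manual locking when iterating). ConcurrentHashMap.merge performs the update atomically without an external lock. SyncMapDemo and run are hypothetical names.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class SyncMapDemo {
    // Starts `threads` threads that each increment key "k" `perThread` times in both maps,
    // then returns {synchronizedMap total, ConcurrentHashMap total}.
    static int[] run(int threads, int perThread) throws InterruptedException {
        // Wrap at creation time so no unsynchronized reference to the HashMap escapes.
        Map<String, Integer> sync = Collections.synchronizedMap(new HashMap<>());
        ConcurrentHashMap<String, Integer> concurrent = new ConcurrentHashMap<>();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    synchronized (sync) { // get-then-put is two calls: hold the wrapper's lock
                        sync.put("k", sync.getOrDefault("k", 0) + 1);
                    }
                    concurrent.merge("k", 1, Integer::sum); // atomic, no external lock
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) w.join();
        int syncTotal = 0;
        synchronized (sync) { // iterating a synchronizedMap requires manual locking
            for (int v : sync.values()) syncTotal += v;
        }
        return new int[] {syncTotal, concurrent.get("k")};
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(Arrays.toString(run(4, 10_000))); // [40000, 40000]
    }
}
```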

The iterators returned by all of this class's "collection view methods" are fail-fast: if the map is structurally modified at any time after the iterator is created, in any way except through the iterator's own remove method, the iterator will throw a ConcurrentModificationException. Thus, in the face of concurrent modification, the iterator fails quickly and cleanly, rather than risking arbitrary, non-deterministic behavior at an undetermined time in the future.[16]

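[16]. Fail-fast iterators: when one thread is iterating and another thread structurally modifies the map (put, putAll, remove), the iterator throws ConcurrentModificationException and the iteration stops. The only allowed modification is through the iterator's own remove method. (The same check can also fire in a single thread, e.g. calling map.remove inside a for-each loop over the map.)
Open question: if threads A and B are both iterating the map and A calls iterator.remove(), will B throw? Each iterator keeps its own expectedModCount, and A's remove only updates A's copy, so B is expected to throw on its next call to next(); as the next paragraph explains, though, this is not guaranteed across unsynchronized threads.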
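The fail-fast checks can be reproduced deterministically in a single thread (FailFastDemo and its methods are hypothetical names). The third method mirrors the A/B question: two iterators over one map, where A removes through its own iterator. In real multithreaded code, detection is only best effort.

```java
import java.util.ConcurrentModificationException;
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

class FailFastDemo {
    static Map<String, Integer> sample() {
        Map<String, Integer> map = new HashMap<>();
        map.put("a", 1);
        map.put("b", 2);
        map.put("c", 3);
        return map;
    }

    // Case 1: removing through the iterator itself is allowed.
    static int removeViaIterator() {
        Map<String, Integer> map = sample();
        for (Iterator<String> it = map.keySet().iterator(); it.hasNext(); ) {
            if (it.next().equals("b")) it.remove();
        }
        return map.size();
    }

    // Case 2: structural change through the map itself during a for-each loop.
    static boolean directRemoveThrows() {
        Map<String, Integer> map = sample();
        try {
            for (String key : map.keySet()) {
                map.remove(key); // entries remain, so the following next() detects the change
            }
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    // Case 3: iterators A and B over the same map; A removes via its own iterator.
    static boolean otherIteratorThrows() {
        Map<String, Integer> map = sample();
        Iterator<String> a = map.keySet().iterator();
        Iterator<String> b = map.keySet().iterator();
        a.next();
        a.remove(); // legal for A: A's expectedModCount is updated, B's is not
        try {
            b.next();
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("size after iterator.remove: " + removeViaIterator());    // 2
        System.out.println("map.remove in for-each throws: " + directRemoveThrows()); // true
        System.out.println("other iterator throws: " + otherIteratorThrows());        // true
    }
}
```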

Note that the fail-fast behavior of an iterator cannot be guaranteed as it is, generally speaking, impossible to make any hard guarantees in the presence of unsynchronized concurrent modification. Fail-fast iterators throw ConcurrentModificationException on a best-effort basis. Therefore, it would be wrong to write a program that depended on this exception for its correctness: the fail-fast behavior of iterators should be used only to detect bugs.

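  • The previous paragraph made a nice promise; this one backpedals.
  • Meaning: when concurrent modification happens, there is no guarantee that an exception is thrown (i.e. fail-fast behavior is not guaranteed). With unsynchronized multithreaded access, nobody can guarantee anything.
  • The iterator makes a best effort to throw ConcurrentModificationException, so you must not rely on this exception for program correctness; it is only useful for detecting bugs. (You get the idea...)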

That's all. Done!
