Why a Map Bucket (Linked List) in Java Converts to a Red-Black Tree Only After Its Length Exceeds 8

Original

stuqbx

2020-07-01 09:21

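Why convert at all?

The elements in a Map bucket are initially stored as a linked list, whose lookup cost is O(n), whereas a tree structure brings lookup down to O(log(n)). While the list is short, even a full traversal is very fast. As the list keeps growing, though, lookups slow down noticeably, and that is why the list is converted to a tree.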
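To make the O(n) versus O(log(n)) gap concrete, here is a minimal sketch that counts comparisons for a worst-case lookup. It is not HashMap code: the class `LookupCost` and its methods are hypothetical, and binary search over a sorted list is used only as a stand-in for a lookup in a balanced tree.

```java
import java.util.ArrayList;
import java.util.List;

class LookupCost {
    // Comparisons needed when walking the list from the front, as a linked-list bin does.
    static int linearComparisons(List<Integer> sorted, int target) {
        int count = 0;
        for (int value : sorted) {
            count++;
            if (value == target) {
                break;
            }
        }
        return count;
    }

    // Probes needed by binary search over the same data (stand-in for a balanced tree lookup).
    static int binaryComparisons(List<Integer> sorted, int target) {
        int lo = 0;
        int hi = sorted.size() - 1;
        int count = 0;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            count++;
            int cmp = Integer.compare(sorted.get(mid), target);
            if (cmp == 0) {
                break;
            }
            if (cmp < 0) {
                lo = mid + 1;
            } else {
                hi = mid - 1;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        for (int n : new int[] {4, 8, 64, 1024}) {
            List<Integer> data = new ArrayList<>();
            for (int i = 0; i < n; i++) {
                data.add(i);
            }
            int target = n - 1; // last element: worst case for the linear walk
            System.out.println("n=" + n
                    + " linear=" + linearComparisons(data, target)
                    + " binary=" + binaryComparisons(data, target));
        }
    }
}
```

With 8 elements the difference is only 8 comparisons versus 4, which is why a short list is perfectly acceptable. The gap only becomes significant as the bin grows.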

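Why is the threshold 8?

The TreeNodes used after conversion take about twice the space of regular Nodes, so a bin is converted to TreeNodes only when it contains enough nodes to justify the cost. What counts as "enough" is determined by the value of TREEIFY_THRESHOLD.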
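The decision can be sketched roughly as follows. This is a simplified model based on my reading of the JDK 8 insert path, not the actual HashMap code: the class `TreeifyDecision`, the enum `Action`, and the method `afterInsert` are hypothetical names. The two constants carry the values they have in `java.util.HashMap`. `MIN_TREEIFY_CAPACITY` is not discussed in this article, but in JDK 8 a small table is resized instead of treeified.

```java
class TreeifyDecision {
    // Same values as the constants of the same names in java.util.HashMap (JDK 8).
    static final int TREEIFY_THRESHOLD = 8;
    static final int MIN_TREEIFY_CAPACITY = 64;

    enum Action { KEEP_LIST, RESIZE_TABLE, TREEIFY }

    // Hypothetical helper: what happens to a list bin that holds
    // nodesAfterInsert entries once a new entry has been appended to it.
    static Action afterInsert(int nodesAfterInsert, int tableLength) {
        if (nodesAfterInsert <= TREEIFY_THRESHOLD) {
            return Action.KEEP_LIST;    // short list: cheap to scan, cheap to store
        }
        if (tableLength < MIN_TREEIFY_CAPACITY) {
            return Action.RESIZE_TABLE; // small table: spreading entries out is preferred
        }
        return Action.TREEIFY;          // long list in a large table: switch to TreeNodes
    }

    public static void main(String[] args) {
        System.out.println(afterInsert(8, 64));
        System.out.println(afterInsert(9, 16));
        System.out.println(afterInsert(9, 64));
    }
}
```

In this model a bin stays a list up to and including 8 nodes and is treeified only when it grows past 8, which matches the "exceeds 8" wording of the title.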

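When hashCodes are well distributed, tree bins are used very rarely (a bin, or bucket, is where HashMap stores the entries whose hashes map to the same table index). The data spreads evenly across the bins, so almost no bin's list ever reaches the threshold. In fact, under random hashCodes, the frequency of nodes in bins follows a Poisson distribution (http://en.wikipedia.org/wiki/Poisson_distribution).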

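With the default resizing threshold (load factor) of 0.75, the distribution has a parameter of about 0.5 on average, although with a large variance because of resizing granularity. Ignoring variance, the expected frequency of a list of size k is

$$P(k) = \frac{e^{-0.5} \cdot 0.5^{k}}{k!}$$

which gives the following probabilities: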

Length	Probability
0	0.60653066
1	0.30326533
2	0.07581633
3	0.01263606
4	0.00157952
5	0.00015795
6	0.00001316
7	0.00000094
8	0.00000006

As the table shows, the probability that the list in a bin reaches 8 elements is 0.00000006, which makes it a practically impossible event.
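The figures in the table can be recomputed directly from the Poisson formula quoted in the JDK comment, exp(-0.5) * pow(0.5, k) / factorial(k). The sketch below does that. The class `BinSizePoisson` and its method names are made up for illustration and are not part of the JDK.

```java
import java.util.Locale;

class BinSizePoisson {
    // Poisson probability mass: P(k) = exp(-lambda) * lambda^k / k!
    static double probability(double lambda, int k) {
        double factorial = 1.0;
        for (int i = 2; i <= k; i++) {
            factorial *= i;
        }
        return Math.exp(-lambda) * Math.pow(lambda, k) / factorial;
    }

    // Format with 8 decimal places, the same precision as the table in the JDK comment.
    static String format(double p) {
        return String.format(Locale.ROOT, "%.8f", p);
    }

    public static void main(String[] args) {
        double lambda = 0.5; // average parameter stated in the JDK comment
        for (int k = 0; k <= 8; k++) {
            System.out.println(k + "\t" + format(probability(lambda, k)));
        }
    }
}
```

By my calculation the output agrees with the table except for the entry at length 4, where the formula gives 0.00157951 and the JDK comment lists 0.00157952. The difference is one unit in the last decimal place and does not affect the argument.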

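In the vast majority of cases, linked-list storage saves space while still giving good lookup performance. In the rare case where a bin grows to 8 nodes, converting it to a red-black tree gives better lookup performance, and because this happens so rarely, the extra memory needed overall is small.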

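So the threshold of 8 is a trade-off between time and space, chosen on the basis of probability and statistics. It is hard not to admire how rigorous and well-grounded the changes and optimizations in Java have been over its roughly three decades of development.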

Appendix: relevant comments in the JDK (1.8.0_45)

HashMap class, lines 174 to 197

     * Because TreeNodes are about twice the size of regular nodes, we
     * use them only when bins contain enough nodes to warrant use
     * (see TREEIFY_THRESHOLD). And when they become too small (due to
     * removal or resizing) they are converted back to plain bins.  In
     * usages with well-distributed user hashCodes, tree bins are
     * rarely used.  Ideally, under random hashCodes, the frequency of
     * nodes in bins follows a Poisson distribution
     * (http://en.wikipedia.org/wiki/Poisson_distribution) with a
     * parameter of about 0.5 on average for the default resizing
     * threshold of 0.75, although with a large variance because of
     * resizing granularity. Ignoring variance, the expected
     * occurrences of list size k are (exp(-0.5) * pow(0.5, k) /
     * factorial(k)). The first values are:
     *
     * 0:    0.60653066
     * 1:    0.30326533
     * 2:    0.07581633
     * 3:    0.01263606
     * 4:    0.00157952
     * 5:    0.00015795
     * 6:    0.00001316
     * 7:    0.00000094
     * 8:    0.00000006
     * more: less than 1 in ten million

ConcurrentHashMap, lines 327 to 349, contains a very similar explanation.

     * The main disadvantage of per-bin locks is that other update
     * operations on other nodes in a bin list protected by the same
     * lock can stall, for example when user equals() or mapping
     * functions take a long time.  However, statistically, under
     * random hash codes, this is not a common problem.  Ideally, the
     * frequency of nodes in bins follows a Poisson distribution
     * (http://en.wikipedia.org/wiki/Poisson_distribution) with a
     * parameter of about 0.5 on average, given the resizing threshold
     * of 0.75, although with a large variance because of resizing
     * granularity. Ignoring variance, the expected occurrences of
     * list size k are (exp(-0.5) * pow(0.5, k) / factorial(k)). The
     * first values are:
     *
     * 0:    0.60653066
     * 1:    0.30326533
     * 2:    0.07581633
     * 3:    0.01263606
     * 4:    0.00157952
     * 5:    0.00015795
     * 6:    0.00001316
     * 7:    0.00000094
     * 8:    0.00000006
     * more: less than 1 in ten million
