How HashMap Works

Implementing a Simple Map

A few days ago I set out to understand how HashMap is implemented, and I started by sketching a naive Map of my own. The code is as follows:

public class KeyValuePair<K,V> {
	
	public  K Key;
	public  V Value;
	
	public K getKey() {
		return Key;
	}
	
	public void setKey(K key) {
		Key = key;
	}
	
	public V getValue() {
		return Value;
	}
	
	public void setValue(V value) {
		Value = value;
	}
}

A List then serves as the container that stores the data. The core of the implementation looks like this:

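import java.util.ArrayList;
import java.util.List;

public class MyMap<K, V> {
	// All entries live in a single list, in insertion order.
	private List<KeyValuePair<K, V>> map;
	
	public MyMap() {
		map = new ArrayList<KeyValuePair<K, V>>();
	}
	
	// Appends a new entry; an existing entry with the same key is not replaced.
	public V put(K k, V v) {
		KeyValuePair<K, V> keyValuePair = new KeyValuePair<K, V>();
		keyValuePair.setKey(k);
		keyValuePair.setValue(v);
		map.add(keyValuePair);
		return v;
	}
	
	// Scans the list from the start, so a single lookup costs O(n).
	public V get(K k) {
		for (KeyValuePair<K, V> pair : map) {
			if (pair.getKey().equals(k)) {
				return pair.getValue();
			}
		}
		return null;
	}
}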

This does behave like a map, but every lookup is O(n), so performance becomes very poor once the collection grows large. Here is a comparison test:

@Test
public void MapTest(){
	
	// List-backed MyMap: 10,000 inserts followed by 10,000 lookups
	long start=System.currentTimeMillis();
	MyMap<String,String> map =new MyMap<>();
	for (int i=0;i<10000;i++){
		map.put("Key"+i,"value"+i);
	}
	for (int i=0;i<10000;i++){
		map.get("Key"+i);
	}
	long end=System.currentTimeMillis();
	System.out.println("Elapsed (ms): "+(end-start));
	
	// java.util.HashMap: the same workload
	start=System.currentTimeMillis();
	Map<String,String> hashMap =new HashMap<>();
	for (int i=0;i<10000;i++){
		hashMap.put("Key"+i,"value"+i);
	}
	for (int i=0;i<10000;i++){
		hashMap.get("Key"+i);
	}
	end=System.currentTimeMillis();
	System.out.println("Elapsed (ms): "+(end-start));
}

The output is:

Elapsed (ms): 1815
Elapsed (ms): 14

More than 100 times slower!
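A rough count of key comparisons helps explain the gap. The sketch below is not part of the original test; it assumes keys are looked up in the same order they were inserted, so the key at position i is found after i + 1 calls to equals. The class and method names are made up for illustration.

```java
// Counts equals() calls for a list-backed map (hypothetical helper, not from the article).
class LinearScanCost {

    // Total comparisons needed to look up each of n distinct keys once,
    // assuming lookups happen in insertion order.
    static long totalComparisons(int n) {
        long total = 0;
        for (int i = 0; i < n; i++) {
            total += i + 1; // the key at position i is reached after i + 1 comparisons
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(totalComparisons(10000)); // prints 50005000
    }
}
```

About fifty million comparisons for ten thousand lookups is what the O(n) cost per lookup adds up to.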

How HashMap Is Implemented

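In the code above, the slowest part is finding the entry for a given key. For an ArrayList, insertion can also be costly at times, because the backing array has to be copied whenever it grows. The JDK instead stores entries in an array whose index is derived from the key's hash value, and each array slot holds a linked list of entries.

[Figure: an array of buckets, each slot pointing to a linked list of key-value entries]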

Let's walk through how the JDK approach works:

Storing a value

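1. Define an array whose element type is a key-value pair.
2. Use the key's hash value to determine the index.
3. Check whether the slot at that index is already occupied; if it is, insert the new entry in front of the existing one, at the head of the list.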
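The index computation in step 2 can be sketched on its own. This is a minimal illustration, assuming a bucket array of length 16 and using Math.floorMod to keep the index non-negative; that is one of several ways to handle negative hash codes and is not necessarily what the JDK does internally. The class and method names are hypothetical.

```java
// Maps keys to bucket indexes (illustrative sketch, names are not from the article).
class BucketIndexDemo {

    // Maps a key to a slot in an array of the given capacity.
    // Math.floorMod keeps the result in [0, capacity) even for negative hash codes.
    static int indexFor(Object key, int capacity) {
        return Math.floorMod(key.hashCode(), capacity);
    }

    public static void main(String[] args) {
        int capacity = 16; // assumed bucket-array length
        for (String key : new String[] {"Key0", "Key1", "Key16", "Key17"}) {
            System.out.println(key + " -> bucket " + indexFor(key, capacity));
        }
    }
}
```

Equal keys always produce the same index, which is what lets a later lookup go straight to the right bucket.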

Retrieving a value

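1. Use the key's hash value to determine the index, then use the index to find the head node of the linked list.

2. Traverse the list, find the node whose key matches, and return its value.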
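Step 2 matters because different keys can share a hash value. As a small illustration using java.util.HashMap itself, the strings "Aa" and "BB" have the same hashCode, so they land in the same bucket and can only be told apart by equals. The class name below is made up for the example.

```java
import java.util.HashMap;
import java.util.Map;

// Shows two distinct keys with identical hash codes coexisting in a HashMap.
class CollisionDemo {

    static Map<String, String> build() {
        Map<String, String> map = new HashMap<>();
        map.put("Aa", "first");
        map.put("BB", "second");
        return map;
    }

    public static void main(String[] args) {
        // "Aa" and "BB" both hash to 2112, so they share a bucket.
        System.out.println("Aa".hashCode() == "BB".hashCode()); // prints true
        Map<String, String> map = build();
        System.out.println(map.get("Aa")); // prints first
        System.out.println(map.get("BB")); // prints second
    }
}
```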

The concrete implementation is as follows (it reuses the KeyValuePair class from earlier, extended with a next pointer):

public class KeyValuePair<K,V> {
	
	public  K Key;
	public  V Value;
	// Next entry in the same bucket, or null at the end of the list.
	public KeyValuePair<K, V> next;
	
	public KeyValuePair<K, V> getNext() {
		return next;
	}
	
	public void setNext(KeyValuePair<K, V> next) {
		this.next = next;
	}
	public KeyValuePair(){
	
	}
	public KeyValuePair(K k, V v){
		this.Key=k;
		this.Value=v;
	}
	public K getKey() {
		return Key;
	}
	
	public void setKey(K key) {
		Key = key;
	}
	
	public V getValue() {
		return Value;
	}
	
	public void setValue(V value) {
		Value = value;
	}
}

The HashMap implementation:

public class MyHashMap<K, V> {
	
	private int defalutLength = 16; // number of buckets (fixed in this version)
	private int size;               // number of stored entries
	private KeyValuePair<K, V>[] arr;
	
	@SuppressWarnings("unchecked")
	public MyHashMap() {
		arr = new KeyValuePair[defalutLength];
		size = 0;
	}
	
	public V put(K k, V v) {
		int index = findIndex(k);
		// Insert the new entry at the head of the bucket's list.
		// An existing entry with the same key is not replaced.
		KeyValuePair<K, V> pair = new KeyValuePair<>(k, v);
		pair.setNext(arr[index]);
		arr[index] = pair;
		size++;
		return v;
	}
	
	// Maps the key's hash code to a bucket index in [0, defalutLength).
	private int findIndex(K key) {
		int index=key.hashCode() % defalutLength;
		return index>0?index:(-1)*index;
	}
 
	public V get(K k) {
		KeyValuePair<K, V> current = arr[findIndex(k)];
		// Walk the whole list, including its last node.
		while (current != null) {
			if (current.getKey().equals(k)) {
				return current.getValue();
			}
			current = current.getNext();
		}
		return null;
	}
	
	public int size(){
		return this.size;
	}
	
}

We extend the test code in the same way:

@Test
public void MapTest(){
		
	// List-backed MyMap
	long start=System.currentTimeMillis();
	MyMap<String,String> map =new MyMap<>();
	for (int i=0;i<10000;i++){
		map.put("Key"+i,"value"+i);
	}
	for (int i=0;i<10000;i++){
		map.get("Key"+i);
	}
	long end=System.currentTimeMillis();
	System.out.println("Elapsed (ms): "+(end-start));
	
	// java.util.HashMap
	start=System.currentTimeMillis();
	Map<String,String> hashMap =new HashMap<>();
	for (int i=0;i<10000;i++){
		hashMap.put("Key"+i,"value"+i);
	}
	for (int i=0;i<10000;i++){
		hashMap.get("Key"+i);
	}
	end=System.currentTimeMillis();
	System.out.println("Elapsed (ms): "+(end-start));
	
	// Our array-plus-linked-list MyHashMap
	start=System.currentTimeMillis();
	MyHashMap<String,String> myhashMap =new MyHashMap<>();
	for (int i=0;i<10000;i++){
		myhashMap.put("Key"+i,"value"+i);
	}
	for (int i=0;i<10000;i++){
		myhashMap.get("Key"+i);
	}
	end=System.currentTimeMillis();
	System.out.println("Elapsed (ms): "+(end-start));
	 
}

Output:

Elapsed (ms): 2337
Elapsed (ms): 26
Elapsed (ms): 337

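Sorting entries into per-bucket linked lists at insertion time greatly improves the Map's efficiency, but there is still a large gap compared with the JDK.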
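One way to see where the remaining time goes is to count how long the bucket lists get. The following sketch is an illustration only: it assumes the same "Key0" to "Key9999" keys as the test and spreads them by hash code modulo the capacity (using Math.floorMod for a non-negative index). The 16384-bucket case is an arbitrary larger capacity chosen for comparison, and the class name is hypothetical.

```java
// Measures bucket-list lengths for the test keys (illustrative sketch).
class ChainLengthDemo {

    // Length of the longest bucket list when keys "Key0".."Key(n-1)"
    // are spread over `capacity` buckets by hash code modulo capacity.
    static int longestChain(int n, int capacity) {
        int[] counts = new int[capacity];
        int longest = 0;
        for (int i = 0; i < n; i++) {
            int index = Math.floorMod(("Key" + i).hashCode(), capacity);
            counts[index]++;
            longest = Math.max(longest, counts[index]);
        }
        return longest;
    }

    public static void main(String[] args) {
        int n = 10000;
        System.out.println("average list length with 16 buckets: " + (n / 16));
        System.out.println("longest list with 16 buckets: " + longestChain(n, 16));
        System.out.println("longest list with 16384 buckets: " + longestChain(n, 16384));
    }
}
```

With only 16 buckets, 10,000 entries average 625 per list, so each lookup still walks hundreds of nodes.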

Optimizing the Hashing

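The bottleneck for lookups is the final linked-list traversal. We can enlarge the bucket array so that keys are spread more evenly, which reduces the number of list nodes visited per lookup. The idea: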

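Add a threshold ratio (a load factor): when the fraction of occupied slots in the bucket array exceeds it, expand the array.
After expanding, rehash the existing keys into the new array.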
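Rehashing is needed because a key's index depends on the array length. The sketch below assumes the index is computed as the hash code modulo the capacity (via Math.floorMod) and that the capacity is doubled; under those assumptions a key either keeps its old index or moves exactly one old-capacity further along. The class and method names are made up for illustration.

```java
// Shows how a key's bucket index changes when the capacity doubles (illustrative sketch).
class ResizeIndexDemo {

    static int indexFor(Object key, int capacity) {
        return Math.floorMod(key.hashCode(), capacity);
    }

    // True if, after doubling, the key either stays at its old index
    // or moves exactly oldCapacity slots further.
    static boolean staysOrShifts(Object key, int oldCapacity) {
        int oldIndex = indexFor(key, oldCapacity);
        int newIndex = indexFor(key, oldCapacity * 2);
        return newIndex == oldIndex || newIndex == oldIndex + oldCapacity;
    }

    public static void main(String[] args) {
        for (int i = 0; i < 5; i++) {
            String key = "Key" + i;
            System.out.println(key + ": " + indexFor(key, 16) + " -> " + indexFor(key, 32));
        }
    }
}
```

Since entries that shared a list may end up in different buckets, every node has to be rehashed, not just the head of each list.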

The modified code is as follows:

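public class MyHashMap<K, V> {
	
	private int defalutLength = 16;          // current number of buckets
	private final double defaultAlfa = 0.75; // threshold ratio that triggers expansion
	private int size;                        // number of stored entries
	private int arrLength;                   // number of occupied buckets
	private KeyValuePair<K, V>[] arr;
	
	@SuppressWarnings("unchecked")
	public MyHashMap() {
		arr = new KeyValuePair[defalutLength];
		size = 0;
		arrLength=0;
	}
	
	public V put(K k, V v) {
		// Expand first, then compute the index, so the index matches the current capacity.
		if(arrLength>defalutLength*defaultAlfa){
			extentArr();
		}
		int index = findIndex(k);
		if (arr[index] == null) {
			arrLength++;
		}
		// Insert at the head of the bucket's list; existing keys are not replaced.
		KeyValuePair<K, V> pair = new KeyValuePair<>(k, v);
		pair.setNext(arr[index]);
		arr[index] = pair;
		size++;
		return v;
	}
	
	// Maps the key's hash code to a bucket index in [0, defalutLength).
	private int findIndex(K key) {
		int index=key.hashCode() % defalutLength;
		return index>0?index:(-1)*index;
	}
	
	@SuppressWarnings("unchecked")
	private void extentArr(){
		KeyValuePair<K, V>[] oldArr = arr;
		defalutLength=defalutLength*2;
		arr = new KeyValuePair[defalutLength];
		arrLength = 0;
		// Rehash every node, not just each list's head: nodes that shared a
		// bucket before may belong to different buckets after the expansion.
		for (KeyValuePair<K, V> head : oldArr) {
			KeyValuePair<K, V> current = head;
			while (current != null) {
				KeyValuePair<K, V> next = current.getNext();
				int index = findIndex(current.getKey());
				if (arr[index] == null) {
					arrLength++;
				}
				current.setNext(arr[index]);
				arr[index] = current;
				current = next;
			}
		}
	}
	
	public V get(K k) {
		KeyValuePair<K, V> current = arr[findIndex(k)];
		// Walk the whole list, including its last node.
		while (current != null) {
			if (current.getKey().equals(k)) {
				return current.getValue();
			}
			current = current.getNext();
		}
		return null;
	}
	
	public int size(){
		return this.size;
	}
	
}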

The final benchmark results are:

Elapsed (ms): 2263
Elapsed (ms): 23
Elapsed (ms): 33

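Performance is now very close. As for the remaining difference, the JDK probably applies further optimizations (for example, since Java 8 a bucket's linked list is converted to a red-black tree once it grows beyond 8 entries), but this article stops here.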

(End of article)

Author: 老付. If you find any intellectual property or copyright issues, or any theoretical errors, please point them out. Free to repost under non-commercial, no-derivatives, attribution terms; please follow the Creative Commons 3.0 license.
