Go map internals

Key data structures

  • hmap: the map header
  • bmap: a map bucket
  • mapextra: extra fields that not every map carries

Go's map is implemented as a hash map. We first look at how the hash map itself is built, and then at how the runtime wraps it into the map type.

Bucket

// A bucket for a Go map.
type bmap struct {
	// tophash generally contains the top byte of the hash value
	// for each key in this bucket. If tophash[0] < minTopHash,
	// tophash[0] is a bucket evacuation state instead.
	tophash [bucketCnt]uint8
	// Followed by bucketCnt keys and then bucketCnt elems.
	// NOTE: packing all the keys together and then all the elems together makes the
	// code a bit more complicated than alternating key/elem/key/elem/... but it allows
	// us to eliminate padding which would be needed for, e.g., map[int64]int8.
	// Followed by an overflow pointer.
}

bucketCnt here is a constant:

const (
    // Maximum number of key/elem pairs a bucket can hold.
    bucketCntBits = 3
    bucketCnt     = 1 << bucketCntBits
    
    ... ...
)

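tophash holds the top 8 bits of each key's hash. A bucket has 8 slots, so the tophash array is 8 × 8 bits (8 bytes). A tophash entry can also hold one of the following special states: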

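const (
	emptyRest      = 0 // slot is empty, and so are all later slots (including overflow buckets)
	emptyOne       = 1 // slot is empty; delete sets this state
	evacuatedX     = 2 // entry is valid and has been evacuated to the first half of the new bucket array
	evacuatedY     = 3 // entry is valid and has been evacuated to the second half of the new bucket array
	evacuatedEmpty = 4 // slot is empty and the bucket has been evacuated
	minTopHash     = 5 // minimum tophash of a normally filled slot, which keeps it apart from the five states above
)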

Hmap

// A header for a Go map.
type hmap struct {
	// Note: the format of the hmap is also encoded in cmd/compile/internal/gc/reflect.go.
	// Make sure this stays in sync with the compiler's definition.
	count     int // # live cells == size of map.  Must be first (used by len() builtin)
	flags     uint8
	B         uint8  // log_2 of # of buckets (can hold up to loadFactor * 2^B items)
	noverflow uint16 // approximate number of overflow buckets; see incrnoverflow for details
	hash0     uint32 // hash seed

	buckets    unsafe.Pointer // array of 2^B Buckets. may be nil if count==0.
	oldbuckets unsafe.Pointer // previous bucket array of half the size, non-nil only when growing
	nevacuate  uintptr        // progress counter for evacuation (buckets less than this have been evacuated)

	extra *mapextra // optional fields: the nextOverflow pointer and the overflow buckets already allocated
}

const (
	// flags
	iterator     = 1 // there may be an iterator using buckets
	oldIterator  = 2 // there may be an iterator using oldbuckets
	hashWriting  = 4 // a goroutine is writing to the map
	sameSizeGrow = 8 // the current map growth is to a new map of the same size
)


// mapextra holds fields that are not present on all maps.
type mapextra struct {
	// If both key and elem do not contain pointers and are inline, then we mark bucket
	// type as containing no pointers. This avoids scanning such maps.
	// However, bmap.overflow is a pointer. In order to keep overflow buckets
	// alive, we store pointers to all overflow buckets in hmap.extra.overflow and hmap.extra.oldoverflow.
	// overflow and oldoverflow are only used if key and elem do not contain pointers.
	// overflow contains overflow buckets for hmap.buckets.
	// oldoverflow contains overflow buckets for hmap.oldbuckets.
	// The indirection allows to store a pointer to the slice in hiter.
	overflow    *[]*bmap
	oldoverflow *[]*bmap

	// nextOverflow holds a pointer to a free overflow bucket.
	nextOverflow *bmap
}

Map walkthrough

make map

In Go, make(map[key]value, hint) creates a map instance. In the runtime package it is implemented by the following function:

// makemap implements Go map creation for make(map[k]v, hint).
// If the compiler has determined that the map or the first bucket
// can be created on the stack, h and/or bucket may be non-nil.
// If h != nil, the map can be created directly in h.
// If h.buckets != nil, bucket pointed to can be used as the first bucket.
func makemap(t *maptype, hint int, h *hmap) *hmap {
    mem, overflow := math.MulUintptr(uintptr(hint), t.bucket.size)
	if overflow || mem > maxAlloc {
		hint = 0
	}

	// 1. initialize hmap
	if h == nil {
		h = new(hmap)
	}
	h.hash0 = fastrand() // pick a random value as the hash seed

	// Find the size parameter B which will hold the requested # of elements.
	// For hint < 0 overLoadFactor returns false since hint < bucketCnt.
	B := uint8(0)
	for overLoadFactor(hint, B) {
		B++
	}
	h.B = B
    
    // allocate initial hash table
	// if B == 0, the buckets field is allocated lazily later (in mapassign)
	// If hint is large zeroing this memory could take a while.
	// allocate the bucket array
	if h.B != 0 {
		var nextOverflow *bmap
		h.buckets, nextOverflow = makeBucketArray(t, h.B, nil)
		if nextOverflow != nil {
			h.extra = new(mapextra)
			h.extra.nextOverflow = nextOverflow // remember the start of the preallocated overflow area
		}
	}

	return h
}


// makeBucketArray initializes a backing array for map buckets.
// 1<<b is the minimum number of buckets to allocate.
// dirtyalloc should either be nil or a bucket array previously
// allocated by makeBucketArray with the same t and b parameters.
// If dirtyalloc is nil a new backing array will be alloced and
// otherwise dirtyalloc will be cleared and reused as backing array.
func makeBucketArray(t *maptype, b uint8, dirtyalloc unsafe.Pointer) (buckets unsafe.Pointer, nextOverflow *bmap) {
	// 1. compute the number of buckets to allocate
	base := bucketShift(b) // base = 2^b', where b' is the low 5 bits of b on 32-bit systems (max 31) and the low 6 bits on 64-bit systems (max 63)
	nbuckets := base
	// For small b, overflow buckets are unlikely.
	// Avoid the overhead of the calculation.
	// preallocate overflow buckets
	if b >= 4 {
		// Add on the estimated number of overflow buckets
		// required to insert the median number of elements
		// used with this value of b.
		nbuckets += bucketShift(b - 4) // nbuckets = base + 2^(b-4)
		sz := t.bucket.size * nbuckets
		up := roundupsize(sz) // round up to the block size the allocator would actually return for sz
		if up != sz {
			nbuckets = up / t.bucket.size
		}
	}
    
	// 2. allocate a new array or reuse dirtyalloc
	if dirtyalloc == nil {
		// nothing to reuse, so allocate a new array
		buckets = newarray(t.bucket, int(nbuckets))
	} else {
		// dirtyalloc was previously generated by
		// the above newarray(t.bucket, int(nbuckets))
		// but may not be empty.
		buckets = dirtyalloc
		size := t.bucket.size * nbuckets
		if t.bucket.ptrdata != 0 {
			memclrHasPointers(buckets, size)
		} else {
			memclrNoHeapPointers(buckets, size)
		}
	}
    
	// 3. set up the overflow pointers
	if base != nbuckets {
		// We preallocated some overflow buckets.
		// To keep the overhead of tracking these overflow buckets to a minimum,
		// we use the convention that if a preallocated overflow bucket's overflow
		// pointer is nil, then there are more available by bumping the pointer.
		// We need a safe non-nil pointer for the last overflow bucket; just use buckets.
		nextOverflow = (*bmap)(add(buckets, base*uintptr(t.bucketsize)))
		last := (*bmap)(add(buckets, (nbuckets-1)*uintptr(t.bucketsize)))
		last.setoverflow(t, (*bmap)(buckets)) // store the buckets pointer in the overflow field of the last preallocated bucket as a non-nil sentinel
	}
	return buckets, nextOverflow
}

// overLoadFactor reports whether count items placed in 1<<B buckets is over loadFactor.
// It checks whether the element count exceeds the load factor.
func overLoadFactor(count int, B uint8) bool {
	return count > bucketCnt && uintptr(count) > loadFactorNum*(bucketShift(B)/loadFactorDen)
}

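makemap returns a *hmap. hmap.B is the smallest value for which the following condition (the second half of overLoadFactor) is false; when hint <= 8, B stays 0:

    loadFactorNum = 13
    loadFactorDen = 2
    count = hint
    uintptr(count) > loadFactorNum*(bucketShift(B)/loadFactorDen)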
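
The choice of B can be replayed outside the runtime. The sketch below copies the overLoadFactor condition and wraps the makemap loop in a helper; pickB is a hypothetical name used only for illustration, and bucketShift(B) is written as 1<<B.

```go
package main

import "fmt"

const (
	bucketCnt     = 8
	loadFactorNum = 13
	loadFactorDen = 2
)

// overLoadFactor repeats the runtime check quoted above,
// with bucketShift(B) written as 1<<B.
func overLoadFactor(count int, B uint8) bool {
	return count > bucketCnt && uintptr(count) > loadFactorNum*((uintptr(1)<<B)/loadFactorDen)
}

// pickB is a hypothetical helper that reproduces the loop in makemap:
// it grows B until hint no longer exceeds the load factor.
func pickB(hint int) uint8 {
	B := uint8(0)
	for overLoadFactor(hint, B) {
		B++
	}
	return B
}

func main() {
	for _, hint := range []int{0, 8, 9, 13, 14, 100} {
		fmt.Printf("hint=%d -> B=%d\n", hint, pickB(hint))
	}
}
```

For example, a hint of 9 already needs B = 1 (two buckets), while a hint of 100 needs B = 4 because 13 * 2^3 = 104 is the first limit that is not below 100.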

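Memory allocated for hmap.buckets: roundupsize(bucketsize * (2^B + 2^(B-4))), where the 2^(B-4) preallocated overflow buckets are only added when B >= 4. bucketsize = tophash_size + 8*key_size + 8*elem_size + ptr_size, where tophash_size is 8 bytes. ptr_size differs between 32-bit and 64-bit systems, so the bucket size differs too.
As the code above shows, the hash table is stored in memory as a single array, laid out roughly like this: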

# bucket
|bmap|key1~8|elem1~8|

# buckets
|bucket1~N|overflow|

# overflow
|nextOverflow|...|last|

# last
|bmap|...|ptr_buckets|

bucket1~N is the base area; overflow is a reserved area that reduces the number of repeated memory allocations.
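
As a rough illustration of this layout, the sketch below estimates the size of one bucket and the number of buckets allocated up front. Both helper names are hypothetical, and the sketch assumes keys and elems are stored inline and ignores alignment padding and the final size-class rounding.

```go
package main

import (
	"fmt"
	"unsafe"
)

const bucketCnt = 8

// bucketSize estimates the bytes used by one bucket: the tophash array,
// 8 keys, 8 elems and the trailing overflow pointer. Padding and
// indirect (pointer-stored) keys or elems are ignored in this sketch.
func bucketSize(keySize, elemSize uintptr) uintptr {
	ptrSize := unsafe.Sizeof(uintptr(0))
	return bucketCnt + bucketCnt*keySize + bucketCnt*elemSize + ptrSize
}

// bucketCounts returns the number of base buckets and the number of
// preallocated overflow buckets before size-class rounding.
func bucketCounts(B uint8) (base, overflow uintptr) {
	base = uintptr(1) << B
	if B >= 4 {
		overflow = uintptr(1) << (B - 4)
	}
	return base, overflow
}

func main() {
	// map[int64]int8: 8 + 8*8 + 8*1 + pointer size (88 on a 64-bit platform).
	fmt.Println(bucketSize(8, 1))
	fmt.Println(bucketCounts(3)) // 8 0
	fmt.Println(bucketCounts(5)) // 32 2
}
```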

insert into map

// Like mapaccess, but allocates a slot for the key if it is not present in the map.
func mapassign(t *maptype, h *hmap, key unsafe.Pointer) unsafe.Pointer {
    ... ...
    // 1. hash the key
    alg := t.key.alg
	hash := alg.hash(key, uintptr(h.hash0))
	
	if h.buckets == nil {
		h.buckets = newobject(t.bucket) // newarray(t.bucket, 1)
	}

again:
    // 2. pick the bucket
    // the low B bits of the hash are the bucket number
	bucket := hash & bucketMask(h.B)
	if h.growing() {
		growWork(t, h, bucket)
	}
	
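	// address of the selected bucket
	b := (*bmap)(unsafe.Pointer(uintptr(h.buckets) + bucket*uintptr(t.bucketsize)))
	// top 8 bits of the hash; values below minTopHash (5) are shifted up by 5
	top := tophash(hash)

	var inserti *uint8         // where the tophash will be written in the bmap
	var insertk unsafe.Pointer // where the key will be written
	var elem unsafe.Pointer    // where the elem will be written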
bucketloop:
	for {
		for i := uintptr(0); i < bucketCnt; i++ { // scan the slots of this bucket
			if b.tophash[i] != top {
				if isEmpty(b.tophash[i]) && inserti == nil {
					inserti = &b.tophash[i]
					insertk = add(unsafe.Pointer(b), dataOffset+i*uintptr(t.keysize))
					elem = add(unsafe.Pointer(b), dataOffset+bucketCnt*uintptr(t.keysize)+i*uintptr(t.elemsize))
				}
				if b.tophash[i] == emptyRest { // nothing is stored after this slot, stop searching
					break bucketloop
				}
				continue
			}
			// same tophash, compare the keys
			k := add(unsafe.Pointer(b), dataOffset+i*uintptr(t.keysize))
			if t.indirectkey() {
				k = *((*unsafe.Pointer)(k))
			}
			if !alg.equal(key, k) { // keys differ, keep searching
				continue
			}
			// already have a mapping for key. Update it.
			// keys match: locate the elem slot and jump to done
			if t.needkeyupdate() {
				typedmemmove(t.key, k, key) 
			}
			elem = add(unsafe.Pointer(b), dataOffset+bucketCnt*uintptr(t.keysize)+i*uintptr(t.elemsize))
			goto done
		}
		
		// not found in this bucket, continue with its overflow bucket
		ovf := b.overflow(t)
		if ovf == nil {
			break // no overflow bucket, leave the loop
		}
		b = ovf
	}

	// Did not find mapping for key. Allocate new cell & add entry.

	// If we hit the max load factor or we have too many overflow buckets,
	// and we're not already in the middle of growing, start growing.
	// check whether the map needs to grow
	if !h.growing() && (overLoadFactor(h.count+1, h.B) || tooManyOverflowBuckets(h.noverflow, h.B)) {
		hashGrow(t, h)
		goto again // Growing the table invalidates everything, so try again
	}

	if inserti == nil { // no free slot was found, allocate a new overflow bucket
		// all current buckets are full, allocate a new one.
		newb := h.newoverflow(t, b)
		inserti = &newb.tophash[0]
		insertk = add(unsafe.Pointer(newb), dataOffset)
		elem = add(insertk, bucketCnt*uintptr(t.keysize))
	}

	// store new key/elem at insert position
	if t.indirectkey() {
		kmem := newobject(t.key)
		*(*unsafe.Pointer)(insertk) = kmem
		insertk = kmem
	}
	if t.indirectelem() {
		vmem := newobject(t.elem)
		*(*unsafe.Pointer)(elem) = vmem
	}
	typedmemmove(t.key, insertk, key)
	*inserti = top
	h.count++

done:
	if h.flags&hashWriting == 0 {
		throw("concurrent map writes")
	}
	h.flags &^= hashWriting
	if t.indirectelem() {
		elem = *((*unsafe.Pointer)(elem))
	}
	return elem	
	
}

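From the code above, the write path is:

  1. Hash the key and take the low B bits of the hash as the bucket number N;
  2. Walk the slots of bucket N, find a free slot, and write the tophash (the top 8 bits of the hash) there;
  3. If bucket N already holds the same tophash, fetch the corresponding key and compare. If the keys are equal, update the elem; otherwise keep walking and write into a free slot. This is how collisions are resolved.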
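
The steps above can be sketched with a toy bucket. This is not the runtime code: the bucket type, its string keys and int elems, and the assign function are simplifications introduced here, and growth, indirect keys and the hashWriting flag are left out.

```go
package main

import "fmt"

const (
	bucketCnt = 8
	emptyRest = 0 // slot empty, and nothing follows it
	emptyOne  = 1 // slot empty
)

// bucket is a simplified stand-in for bmap with string keys and int elems.
type bucket struct {
	tophash  [bucketCnt]uint8
	keys     [bucketCnt]string
	elems    [bucketCnt]int
	overflow *bucket
}

// assign stores key/elem in the chain starting at b, given the key's tophash.
// It returns true when a new entry was added and false when an existing
// key was updated.
func assign(b *bucket, top uint8, key string, elem int) bool {
	var free *bucket
	freeIdx := 0
	last := b
search:
	for cur := b; cur != nil; cur = cur.overflow {
		last = cur
		for i := 0; i < bucketCnt; i++ {
			if cur.tophash[i] != top {
				if cur.tophash[i] <= emptyOne && free == nil {
					free, freeIdx = cur, i // remember the first free slot
				}
				if cur.tophash[i] == emptyRest {
					break search // nothing stored after this slot
				}
				continue
			}
			if cur.keys[i] == key { // same tophash: compare the full keys
				cur.elems[i] = elem
				return false
			}
		}
	}
	if free == nil { // the whole chain is full: link a new overflow bucket
		free, freeIdx = &bucket{}, 0
		last.overflow = free
	}
	free.tophash[freeIdx] = top
	free.keys[freeIdx] = key
	free.elems[freeIdx] = elem
	return true
}

func main() {
	b := &bucket{}
	for i := 0; i < 10; i++ {
		assign(b, 42, fmt.Sprintf("k%d", i), i) // same tophash forces collisions
	}
	fmt.Println(b.overflow != nil)       // true: 10 entries need a second bucket
	fmt.Println(assign(b, 42, "k3", 30)) // false: existing key updated
}
```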

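What if bucket N is full? The probability is low, but it does happen, since a bmap only holds 8 keys. This is where the preallocated overflow area mentioned earlier comes in. The logic that allocates an overflow bucket is: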

func (h *hmap) newoverflow(t *maptype, b *bmap) *bmap {
	var ovf *bmap
	if h.extra != nil && h.extra.nextOverflow != nil { // a preallocated overflow area exists
		// We have preallocated overflow buckets available.
		// See makeBucketArray for more details.
		ovf = h.extra.nextOverflow
		if ovf.overflow(t) == nil { // ovf is not the last one: advance nextOverflow
			// We're not at the end of the preallocated overflow buckets. Bump the pointer.
			h.extra.nextOverflow = (*bmap)(add(unsafe.Pointer(ovf), uintptr(t.bucketsize)))
		} else { // last preallocated overflow bucket: clear nextOverflow
			// This is the last preallocated overflow bucket.
			// Reset the overflow pointer on this bucket,
			// which was set to a non-nil sentinel value.
			ovf.setoverflow(t, nil) // clear the sentinel pointer
			h.extra.nextOverflow = nil
		}
	} else { // no preallocated overflow bucket left, allocate a new one
		ovf = (*bmap)(newobject(t.bucket))
	}
	h.incrnoverflow() // bump the overflow bucket counter
	if t.bucket.ptrdata == 0 {
		h.createOverflow()
		*h.extra.overflow = append(*h.extra.overflow, ovf)
	}
	b.setoverflow(t, ovf) // link ovf from the overflow pointer at the end of b
	return ovf
}

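This function shows how bucket N is extended once it is used up: an unused bucket is taken from the reserved area and its pointer is stored at the end of bucket N, making it the extension of bucket N:

# bucket N extended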
|bmap|key1~8|elem1~8|ptr_extra_bucket|......|extra_bucket|
                            ^---------------^

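We can then store the new key-value pair in extra_bucket. Combined with the previous section, the in-memory structure of the hash table becomes clear: a contiguous array plus link pointers. This is also why access is so fast: most of it is pointer arithmetic over contiguous memory.

At this point we have a rough picture of how map is implemented, but only part of it. hmap also has oldbuckets and other fields whose meaning we have not covered yet.

The section above gave a solution for bucket overflow, but what if a bucket overflows too much (that is, a single bucket chain holds too much data)? Consider the extreme case where everything in the map except bucket1 is one of its extra_buckets: a lookup in bucket1 must walk a long chain and, in the worst case, the entire map. How can this be solved?

  • Sort the keys. Insertion sort is simple to implement but moves a lot of memory. With sorted keys, binary search brings lookup down to O(log n). Other sorting algorithms could reduce the amount of memory movement, but because the underlying storage is an array, it is hard to avoid.
  • Rebuild: repartition the buckets to fix over-full bucket chains.

The code above shows that Go uses the second approach. So when is a rebuild triggered, and how is it done?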

map Grow

The two functions tooManyOverflowBuckets and overLoadFactor decide whether the Grow flow must run:

func mapassign(t *maptype, h *hmap, key unsafe.Pointer) unsafe.Pointer {
    ... ...
    // Did not find mapping for key. Allocate new cell & add entry.

	// If we hit the max load factor or we have too many overflow buckets,
	// and we're not already in the middle of growing, start growing.
	if !h.growing() && (overLoadFactor(h.count+1, h.B) || tooManyOverflowBuckets(h.noverflow, h.B)) {
		hashGrow(t, h)
		goto again // Growing the table invalidates everything, so try again
	}
	... ...
}

const(
    loadFactorNum = 13
    loadFactorDen = 2
)

// overLoadFactor reports whether count items placed in 1<<B buckets is over loadFactor.
func overLoadFactor(count int, B uint8) bool {
	return count > bucketCnt && uintptr(count) > loadFactorNum*(bucketShift(B)/loadFactorDen)
}

// tooManyOverflowBuckets reports whether noverflow buckets is too many for a map with 1<<B buckets.
// Note that most of these overflow buckets must be in sparse use;
// if use was dense, then we'd have already triggered regular map growth.
func tooManyOverflowBuckets(noverflow uint16, B uint8) bool {
	// If the threshold is too low, we do extraneous work.
	// If the threshold is too high, maps that grow and shrink can hold on to lots of unused memory.
	// "too many" means (approximately) as many overflow buckets as regular buckets.
	// See incrnoverflow for more details.
	if B > 15 {
		B = 15
	}
	// The compiler doesn't see here that B < 16; mask B to generate shorter shift code.
	return noverflow >= uint16(1)<<(B&15)
}

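First, overLoadFactor: Grow is triggered when key_count + 1 > 8 and key_count + 1 > 13 * (2^B / 2). Second, tooManyOverflowBuckets: Grow is triggered when noverflow >= 2^min(B, 15), that is, when the overflow bucket counter reaches that threshold. There are two cases:

  1. B < 16: growth is triggered when the number of overflow buckets reaches the number of base buckets.
  2. B >= 16: growth is triggered when noverflow reaches 2^15.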
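
The two thresholds can be tabulated with a small sketch. maxCountBeforeGrow and overflowLimit are hypothetical helpers derived from the conditions above; they are not runtime functions.

```go
package main

import "fmt"

const (
	bucketCnt     = 8
	loadFactorNum = 13
	loadFactorDen = 2
)

// maxCountBeforeGrow is a hypothetical helper: the largest element count
// for which overLoadFactor(count, B) is still false.
func maxCountBeforeGrow(B uint8) uintptr {
	limit := loadFactorNum * ((uintptr(1) << B) / loadFactorDen)
	if limit < bucketCnt {
		return bucketCnt
	}
	return limit
}

// overflowLimit is the noverflow value at which tooManyOverflowBuckets
// starts returning true; B is capped at 15 as in the runtime code.
func overflowLimit(B uint8) uint16 {
	if B > 15 {
		B = 15
	}
	return uint16(1) << B
}

func main() {
	for _, B := range []uint8{0, 1, 2, 3, 16, 20} {
		fmt.Printf("B=%d maxCount=%d overflowLimit=%d\n", B, maxCountBeforeGrow(B), overflowLimit(B))
	}
}
```

With B = 3, for instance, the map doubles when the 53rd element is inserted, and a same-size grow starts once 8 overflow buckets are in use.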

Note that noverflow is not an exact count. When the number is large it only approximates the real value:

// incrnoverflow increments h.noverflow.
// noverflow counts the number of overflow buckets.
// This is used to trigger same-size map growth.
// See also tooManyOverflowBuckets.
// To keep hmap small, noverflow is a uint16.
// When there are few buckets, noverflow is an exact count.
// When there are many buckets, noverflow is an approximate count.
func (h *hmap) incrnoverflow() {
	// We trigger same-size map growth if there are
	// as many overflow buckets as buckets.
	// We need to be able to count to 1<<h.B.
	if h.B < 16 {
		h.noverflow++
		return
	}
	// Increment with probability 1/(1<<(h.B-15)).
	// When we reach 1<<15 - 1, we will have approximately
	// as many overflow buckets as buckets.
	mask := uint32(1)<<(h.B-15) - 1
	// Example: if h.B == 18, then mask == 7,
	// and fastrand & 7 == 0 with probability 1/8.
	if fastrand()&mask == 0 {
		h.noverflow++
	}
}

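Clearly, when h.B >= 16 the counter is not incremented on every call. At that point there are at least 2^16 base buckets, which can hold 2^19 = 524288 entries, plus the preallocated overflow buckets. The more buckets there are, the more spread out the keys are, so overflow is less likely, and even when it happens a long single-bucket chain is less likely too.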
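
A short sketch of the sampling rate may help here. incrProbability is a hypothetical helper that only restates the mask computed in incrnoverflow: with probability 1/2^(B-15), reaching noverflow = 2^15 corresponds to roughly 2^15 * 2^(B-15) = 2^B real overflow buckets on average.

```go
package main

import "fmt"

// incrProbability returns the chance that one call to incrnoverflow
// actually increments the counter for a map with 2^B base buckets.
func incrProbability(B uint8) float64 {
	if B < 16 {
		return 1 // exact counting
	}
	return 1 / float64(uint32(1)<<(B-15))
}

func main() {
	for _, B := range []uint8{10, 16, 18, 20} {
		fmt.Printf("B=%d p=%v\n", B, incrProbability(B))
	}
}
```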

func hashGrow(t *maptype, h *hmap) {
	// If we've hit the load factor, get bigger.
	// Otherwise, there are too many overflow buckets,
	// so keep the same number of buckets and "grow" laterally.
	bigger := uint8(1)
	// check whether the load factor is exceeded
	if !overLoadFactor(h.count+1, h.B) { // not exceeded: keep the current size
		bigger = 0
		h.flags |= sameSizeGrow
	}
	oldbuckets := h.buckets
	newbuckets, nextOverflow := makeBucketArray(t, h.B+bigger, nil)
    
	// update flags
	flags := h.flags &^ (iterator | oldIterator)
	if h.flags&iterator != 0 {
		flags |= oldIterator
	}
	// commit the grow (atomic wrt gc)
	h.B += bigger
	h.flags = flags
	// swap in the new buckets
	h.oldbuckets = oldbuckets
	h.buckets = newbuckets
	h.nevacuate = 0
	h.noverflow = 0

	if h.extra != nil && h.extra.overflow != nil {
		// Promote current overflow buckets to the old generation.
		if h.extra.oldoverflow != nil {
			throw("oldoverflow is not nil")
		}
		h.extra.oldoverflow = h.extra.overflow
		h.extra.overflow = nil
	}
	if nextOverflow != nil {
		if h.extra == nil {
			h.extra = new(mapextra)
		}
		h.extra.nextOverflow = nextOverflow
	}

	// the actual copying of the hash table data is done incrementally
	// by growWork() and evacuate().
}

From hashGrow we can see how the map grows: when overLoadFactor holds, h.B = h.B + 1, so the number of base buckets doubles; otherwise the size stays the same. In both cases a new buckets array is created.

func mapassign(t *maptype, h *hmap, key unsafe.Pointer) unsafe.Pointer{
    ... ...
    bucket := hash & bucketMask(h.B)
    if h.growing() {
		growWork(t, h, bucket)
	}
	... ...
}


func growWork(t *maptype, h *hmap, bucket uintptr) {
	// make sure we evacuate the oldbucket corresponding
	// to the bucket we're about to use
	
	evacuate(t, h, bucket&h.oldbucketmask())

	// evacuate one more oldbucket to make progress on growing
	if h.growing() {
		evacuate(t, h, h.nevacuate)
	}
}

// evacuate migrates the entries of one old bucket
func evacuate(t *maptype, h *hmap, oldbucket uintptr) {
	b := (*bmap)(add(h.oldbuckets, oldbucket*uintptr(t.bucketsize)))
	newbit := h.noldbuckets() // number of old buckets
	if !evacuated(b) {
		// TODO: reuse overflow buckets instead of using new ones, if there
		// is no iterator using the old buckets.  (If !oldIterator.)

		// xy contains the x and y (low and high) evacuation destinations.
		// 1. set up the two candidate destinations, X and Y
		// X is the bucket with index oldbucket in the new buckets array
		var xy [2]evacDst
		x := &xy[0]
		x.b = (*bmap)(add(h.buckets, oldbucket*uintptr(t.bucketsize)))
		x.k = add(unsafe.Pointer(x.b), dataOffset)
		x.e = add(x.k, bucketCnt*uintptr(t.keysize))
        
		if !h.sameSizeGrow() { // only when the table is doubling in size
			// Only calculate y pointers if we're growing bigger.
			// Otherwise GC can see bad pointers.
			// Y is the bucket with index oldbucket + noldbuckets in the new array, i.e. the second half
			y := &xy[1]
			y.b = (*bmap)(add(h.buckets, (oldbucket+newbit)*uintptr(t.bucketsize)))
			y.k = add(unsafe.Pointer(y.b), dataOffset)
			y.e = add(y.k, bucketCnt*uintptr(t.keysize))
		}
        
		// 2. migrate the data, including the entries in overflow buckets
		for ; b != nil; b = b.overflow(t) {
			k := add(unsafe.Pointer(b), dataOffset)
			e := add(k, bucketCnt*uintptr(t.keysize))
			for i := 0; i < bucketCnt; i, k, e = i+1, add(k, uintptr(t.keysize)), add(e, uintptr(t.elemsize)) {
				top := b.tophash[i]
				if isEmpty(top) {
					b.tophash[i] = evacuatedEmpty
					continue
				}
				if top < minTopHash {
					throw("bad map state")
				}
				k2 := k
				if t.indirectkey() {
					k2 = *((*unsafe.Pointer)(k2))
				}
				var useY uint8
				// decide between the first half and the second half
				if !h.sameSizeGrow() {
					// Compute hash to make our evacuation decision (whether we need
					// to send this key/elem to bucket x or bucket y).
					hash := t.key.alg.hash(k2, uintptr(h.hash0))
					if h.flags&iterator != 0 && !t.reflexivekey() && !t.key.alg.equal(k2, k2) {
						// If key != key (NaNs), then the hash could be (and probably
						// will be) entirely different from the old hash. Moreover,
						// it isn't reproducible. Reproducibility is required in the
						// presence of iterators, as our evacuation decision must
						// match whatever decision the iterator made.
						// Fortunately, we have the freedom to send these keys either
						// way. Also, tophash is meaningless for these kinds of keys.
						// We let the low bit of tophash drive the evacuation decision.
						// We recompute a new random tophash for the next level so
						// these keys will get evenly distributed across all buckets
						// after multiple grows.
						useY = top & 1
						top = tophash(hash)
					} else {
						if hash&newbit != 0 {
							useY = 1
						}
					}
				}

				if evacuatedX+1 != evacuatedY || evacuatedX^1 != evacuatedY {
					throw("bad evacuatedN")
				}
				// set tophash to evacuatedX or evacuatedY to mark the entry as migrated
				b.tophash[i] = evacuatedX + useY // evacuatedX + 1 == evacuatedY
				dst := &xy[useY]                 // evacuation destination
				// move the entry to its destination
				if dst.i == bucketCnt {
					dst.b = h.newoverflow(t, dst.b)
					dst.i = 0
					dst.k = add(unsafe.Pointer(dst.b), dataOffset)
					dst.e = add(dst.k, bucketCnt*uintptr(t.keysize))
				}
				dst.b.tophash[dst.i&(bucketCnt-1)] = top // mask dst.i as an optimization, to avoid a bounds check
				if t.indirectkey() {
					*(*unsafe.Pointer)(dst.k) = k2 // copy pointer
				} else {
					typedmemmove(t.key, dst.k, k) // copy elem
				}
				if t.indirectelem() {
					*(*unsafe.Pointer)(dst.e) = *(*unsafe.Pointer)(e)
				} else {
					typedmemmove(t.elem, dst.e, e)
				}
				dst.i++
				// These updates might push these pointers past the end of the
				// key or elem arrays.  That's ok, as we have the overflow pointer
				// at the end of the bucket to protect against pointing past the
				// end of the bucket.
				dst.k = add(dst.k, uintptr(t.keysize))
				dst.e = add(dst.e, uintptr(t.elemsize))
			}
		}
		// Unlink the overflow buckets & clear key/elem to help GC.
		// clean up the old bucket
		if h.flags&oldIterator == 0 && t.bucket.ptrdata != 0 {
			b := add(h.oldbuckets, oldbucket*uintptr(t.bucketsize))
			// Preserve b.tophash because the evacuation
			// state is maintained there.
			ptr := add(b, dataOffset)
			n := uintptr(t.bucketsize) - dataOffset
			memclrHasPointers(ptr, n)
		}
	}
    
	if oldbucket == h.nevacuate {
		// track progress; once everything is migrated, oldbuckets is released (set to nil)
		advanceEvacuationMark(h, t, newbit)
	}
}

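As mentioned above, in the Grow flow the new buckets array may keep the same size (same_size) or become twice the size of oldbuckets (double_size). In the double_size case it is split into two halves, first and second: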

# oldbuckets
|bucket1~N|

# newbuckets
|bucket1~N|bucketN+1~2N|
  first        second

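Taking double_size as the example, inserting a new key that triggers Grow runs as follows:

  1. Take the low B bits of hash(key) as the bucket number N and set bucket_i = bucket_N. Check whether the map is currently growing: if it is not in Grow_State, go to the next step; otherwise jump to step 6;
  2. If an insert position is found in bucket_i, insert and finish; otherwise go to the next step;
  3. Check whether bucket_i has an overflow bucket. If it does, set bucket_i = overflow_bucket and jump to step 2; otherwise go to the next step;
  4. Check whether Grow is needed. If so, go to the next step; otherwise go to step 7;
  5. Create a new buckets array of double_size, enter Grow_State, and jump to step 1;
  6. Migrate oldbucket from oldbuckets to newbuckets, where oldbucket = N & (2^(B-1) - 1) and B is the new value. Its entries are spread over the first and second halves (detailed below). Jump to step 2;
  7. Allocate an overflow bucket for bucket_i, insert the key, and finish.

Step 6 in more detail: suppose B = 3 before the grow and B = 4 after it, and the key's bucket number under the new B is N = 12. Then X = N & (2^(B-1) - 1) = 12 & 7 = 4 and Y = X + 2^(B-1) = 12, so the new key is inserted into the second half. If N = 4 it is inserted into the first half. Either way it lands in a bucket whose data was migrated from the same oldbucket, which keeps the hashing consistent. This is how the hash map is rebuilt.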
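
The X/Y decision for a doubling grow can be checked with a few lines of code. evacDest is a hypothetical helper written for this article; it takes the value of B before the grow and mirrors the hash&newbit test in evacuate, ignoring the NaN special case.

```go
package main

import "fmt"

// evacDest is a hypothetical helper for a doubling grow. oldB is the value
// of B before the grow and hash is the key's hash. It returns the old
// bucket index and the new bucket index the key moves to.
func evacDest(hash uintptr, oldB uint8) (oldBucket, newBucket uintptr) {
	newbit := uintptr(1) << oldB // number of old buckets
	oldBucket = hash & (newbit - 1)
	newBucket = oldBucket // X: first half
	if hash&newbit != 0 {
		newBucket += newbit // Y: second half
	}
	return oldBucket, newBucket
}

func main() {
	fmt.Println(evacDest(12, 3)) // 4 12
	fmt.Println(evacDest(4, 3))  // 4 4
}
```

The new index always equals the low B+1 bits of the hash, which is why a lookup after the grow finds the key without any extra bookkeeping.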

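In the same_size case the flow is identical. Because the total number of buckets does not change, each oldbucket migrates to the bucket with the same index in new_buckets.

One detail is worth noting: slots marked emptyOne in a bmap hold deleted entries and are simply skipped during migration, so the migrated data ends up more compact.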

# before migration
# oldbucket_4
# emptyRest = 0, emptyOne = 1
|xx|yy|1|ww|zz|0|0|0|key1~8|elem1~8|

# after migration
# newbucket
# newbucket_4
|xx|ww|0|0|0|0|0|0|key_xx|key_ww|...|elem_xx|elem_ww|...|

# newbucket_12
|yy|zz|0|0|0|0|0|0|key_yy|key_zz|...|elem_yy|elem_zz|...|

# oldbucket_4
# evacuatedX = 2, evacuatedY = 3, evacuatedEmpty = 4
|2|3|4|2|3|4|4|4|key1~8|elem1~8|

map access

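There are three ways to access data in a Go map. Taking map[int]int as the example:

    sets := map[int]int{1: 2, 3: 4, 5: 6}
    value := sets[1]          // returns the value
    value, isExist := sets[1] // returns the value and whether the key exists
    for key, value := range sets { // iteration
    }

The first two forms work the same way: both look up by key and differ only in what they return.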


func mapaccess2(t *maptype, h *hmap, key unsafe.Pointer) (unsafe.Pointer, bool) {
	if raceenabled && h != nil {
		callerpc := getcallerpc()
		pc := funcPC(mapaccess2)
		racereadpc(unsafe.Pointer(h), callerpc, pc)
		raceReadObjectPC(t.key, key, callerpc, pc)
	}
	if msanenabled && h != nil {
		msanread(key, t.key.size)
	}
	if h == nil || h.count == 0 {
		if t.hashMightPanic() {
			t.key.alg.hash(key, 0) // see issue 23734
		}
		return unsafe.Pointer(&zeroVal[0]), false
	}
	if h.flags&hashWriting != 0 {
		throw("concurrent map read and map write")
	}
	
	// 1. compute the bucket number and get the bucket address
	alg := t.key.alg
	hash := alg.hash(key, uintptr(h.hash0))
	m := bucketMask(h.B)
	b := (*bmap)(unsafe.Pointer(uintptr(h.buckets) + (hash&m)*uintptr(t.bucketsize))) // the low B bits of the hash are the bucket number
	if c := h.oldbuckets; c != nil { // old buckets exist, so the data may not have been migrated yet
		if !h.sameSizeGrow() { // for a double_size grow, the old bucket number is the low B-1 bits of the hash
			// There used to be half as many buckets; mask down one more power of two.
			m >>= 1
		}

		oldb := (*bmap)(unsafe.Pointer(uintptr(c) + (hash&m)*uintptr(t.bucketsize)))
		if !evacuated(oldb) { // not migrated yet, read from the old bucket
			b = oldb
		}
	}
	top := tophash(hash)
bucketloop:
	// 2. walk the chain looking for the key
	for ; b != nil; b = b.overflow(t) {
		for i := uintptr(0); i < bucketCnt; i++ {
			if b.tophash[i] != top {
				if b.tophash[i] == emptyRest {
					break bucketloop
				}
				continue
			}
			k := add(unsafe.Pointer(b), dataOffset+i*uintptr(t.keysize))
			if t.indirectkey() {
				k = *((*unsafe.Pointer)(k))
			}
			if alg.equal(key, k) {
				e := add(unsafe.Pointer(b), dataOffset+bucketCnt*uintptr(t.keysize)+i*uintptr(t.elemsize))
				if t.indirectelem() {
					e = *((*unsafe.Pointer)(e))
				}
				return e, true
			}
		}
	}
	return unsafe.Pointer(&zeroVal[0]), false
}

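The lookup flow is fairly simple and should be easy to follow if you have read this far in order:

  1. Take the low B bits of key_hash as the bucket number N and set bucket_i = bucket_N. If old buckets exist, go to step 2; otherwise go to step 4;
  2. Locate the oldbucket according to the Grow type: for same_size the number is still N; for double_size the low B-1 bits of key_hash are the old bucket number. Go to the next step;
  3. Check whether oldbucket has been migrated. If not, set bucket_i = old_bucket. Go to the next step;
  4. Search bucket_i and its overflow buckets for a tophash equal to key_topHash. On a match, compare the stored key with equal: if they are equal, return the corresponding elem; otherwise keep walking until the end.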
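
Steps 1 to 3 can be condensed into a small sketch. lookupIndex is a hypothetical helper, and its evacuated callback stands in for the runtime's evacuated(oldb) check; the sketch only decides which array and index to read, not how the bucket is scanned.

```go
package main

import "fmt"

// lookupIndex mirrors the bucket selection at the top of mapaccess2.
// It reports whether the lookup should read the old bucket array and
// which bucket index to use there.
func lookupIndex(hash uintptr, B uint8, growing, sameSize bool, evacuated func(old uintptr) bool) (useOld bool, idx uintptr) {
	m := uintptr(1)<<B - 1
	idx = hash & m
	if !growing {
		return false, idx
	}
	if !sameSize {
		m >>= 1 // the old array had half as many buckets
	}
	if old := hash & m; !evacuated(old) {
		return true, old
	}
	return false, idx
}

func main() {
	notYet := func(uintptr) bool { return false }
	done := func(uintptr) bool { return true }
	fmt.Println(lookupIndex(12, 4, true, false, notYet)) // true 4
	fmt.Println(lookupIndex(12, 4, true, false, done))   // false 12
	fmt.Println(lookupIndex(12, 4, false, false, done))  // false 12
}
```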

Compared with access by key, iteration is more involved:


// mapiterinit initializes the hiter struct used for ranging over maps.
// The hiter struct pointed to by 'it' is allocated on the stack
// by the compilers order pass or on the heap by reflect_mapiterinit.
// Both need to have zeroed hiter since the struct contains pointers.
func mapiterinit(t *maptype, h *hmap, it *hiter) {
	if raceenabled && h != nil {
		callerpc := getcallerpc()
		racereadpc(unsafe.Pointer(h), callerpc, funcPC(mapiterinit))
	}

	if h == nil || h.count == 0 {
		return
	}

	if unsafe.Sizeof(hiter{})/sys.PtrSize != 12 {
		throw("hash_iter size incorrect") // see cmd/compile/internal/gc/reflect.go
	}
	it.t = t
	it.h = h

	// grab snapshot of bucket state
	// 1. snapshot the current map state
	it.B = h.B
	it.buckets = h.buckets
	if t.bucket.ptrdata == 0 {
		// Allocate the current slice and remember pointers to both current and old.
		// This preserves all relevant overflow buckets alive even if
		// the table grows and/or overflow buckets are added to the table
		// while we are iterating.
		h.createOverflow()
		it.overflow = h.extra.overflow
		it.oldoverflow = h.extra.oldoverflow
	}

	// decide where to start
	// 2. choose the starting position
	// draw a random value
	r := uintptr(fastrand())
	if h.B > 31-bucketCntBits {
		r += uintptr(fastrand()) << 31
	}
	// pick a random starting bucket
	it.startBucket = r & bucketMask(h.B)
	// pick a random offset within the bucket
	it.offset = uint8(r >> h.B & (bucketCnt - 1))

	// iterator state
	it.bucket = it.startBucket

	// Remember we have an iterator.
	// Can run concurrently with another mapiterinit().
	// set the flag bits
	if old := h.flags; old&(iterator|oldIterator) != iterator|oldIterator {
		atomic.Or8(&h.flags, iterator|oldIterator)
	}

	mapiternext(it)
}


func mapiternext(it *hiter) {
	h := it.h
	if raceenabled {
		callerpc := getcallerpc()
		racereadpc(unsafe.Pointer(h), callerpc, funcPC(mapiternext))
	}
	if h.flags&hashWriting != 0 {
		throw("concurrent map iteration and map write")
	}
	t := it.t
	bucket := it.bucket
	b := it.bptr
	i := it.i
	checkBucket := it.checkBucket
	alg := t.key.alg

next:
	if b == nil {
		// check whether every bucket has been visited
		if bucket == it.startBucket && it.wrapped {
			// end of iteration
			it.key = nil
			it.elem = nil
			return
		}
		
		if h.growing() && it.B == h.B {
			// the map is growing and the iterator was started during this grow (or the grow is same_size)
			// Iterator was started in the middle of a grow, and the grow isn't done yet.
			// If the bucket we're looking at hasn't been filled in yet (i.e. the old
			// bucket hasn't been evacuated) then we need to iterate through the old
			// bucket and only return the ones that will be migrated to this bucket.
			oldbucket := bucket & it.h.oldbucketmask()
			b = (*bmap)(add(h.oldbuckets, oldbucket*uintptr(t.bucketsize)))
			if !evacuated(b) { // the old bucket has not been migrated yet
				checkBucket = bucket
			} else {
				b = (*bmap)(add(it.buckets, bucket*uintptr(t.bucketsize)))
				checkBucket = noCheck
			}
		} else {
			// the map is not growing, or the iterator was started before a double_size grow
			b = (*bmap)(add(it.buckets, bucket*uintptr(t.bucketsize)))
			checkBucket = noCheck
		}
		bucket++
		if bucket == bucketShift(it.B) {
			bucket = 0
			it.wrapped = true
		}
		i = 0
	}
	for ; i < bucketCnt; i++ {
		offi := (i + it.offset) & (bucketCnt - 1)
		// skip empty slots
		if isEmpty(b.tophash[offi]) || b.tophash[offi] == evacuatedEmpty {
			// TODO: emptyRest is hard to use here, as we start iterating
			// in the middle of a bucket. It's feasible, just tricky.
			continue
		}
		k := add(unsafe.Pointer(b), dataOffset+uintptr(offi)*uintptr(t.keysize))
		if t.indirectkey() {
			k = *((*unsafe.Pointer)(k))
		}
		e := add(unsafe.Pointer(b), dataOffset+bucketCnt*uintptr(t.keysize)+uintptr(offi)*uintptr(t.elemsize))
		
		if checkBucket != noCheck && !h.sameSizeGrow() { // skip entries that will not migrate to this bucket
			// Special case: iterator was started during a grow to a larger size
			// and the grow is not done yet. We're working on a bucket whose
			// oldbucket has not been evacuated yet. Or at least, it wasn't
			// evacuated when we started the bucket. So we're iterating
			// through the oldbucket, skipping any keys that will go
			// to the other new bucket (each oldbucket expands to two
			// buckets during a grow).
			if t.reflexivekey() || alg.equal(k, k) {
				// If the item in the oldbucket is not destined for
				// the current new bucket in the iteration, skip it.
				hash := alg.hash(k, uintptr(h.hash0))
				if hash&bucketMask(it.B) != checkBucket {
					continue
				}
			} else {
				// Hash isn't repeatable if k != k (NaNs).  We need a
				// repeatable and randomish choice of which direction
				// to send NaNs during evacuation. We'll use the low
				// bit of tophash to decide which way NaNs go.
				// NOTE: this case is why we need two evacuate tophash
				// values, evacuatedX and evacuatedY, that differ in
				// their low bit.
				if checkBucket>>(it.B-1) != uintptr(b.tophash[offi]&1) {
					continue
				}
			}
		}
		if (b.tophash[offi] != evacuatedX && b.tophash[offi] != evacuatedY) ||
			!(t.reflexivekey() || alg.equal(k, k)) { // entry not migrated, return it directly
			// This is the golden data, we can return it.
			// OR
			// key!=key, so the entry can't be deleted or updated, so we can just return it.
			// That's lucky for us because when key!=key we can't look it up successfully.
			it.key = k
			if t.indirectelem() {
				e = *((*unsafe.Pointer)(e))
			}
			it.elem = e
		} else { // entry already migrated, look it up by key
			// The hash table has grown since the iterator was started.
			// The golden data for this key is now somewhere else.
			// Check the current hash table for the data.
			// This code handles the case where the key
			// has been deleted, updated, or deleted and reinserted.
			// NOTE: we need to regrab the key as it has potentially been
			// updated to an equal() but not identical key (e.g. +0.0 vs -0.0).
			rk, re := mapaccessK(t, h, k)
			if rk == nil {
				continue // key has been deleted
			}
			it.key = rk
			it.elem = re
		}
		it.bucket = bucket
		if it.bptr != b { // avoid unnecessary write barrier; see issue 14921
			it.bptr = b
		}
		it.i = i + 1
		it.checkBucket = checkBucket
		return
	}
	b = b.overflow(t) // continue with the overflow bucket
	i = 0
	goto next
}

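When iterating over a map, one of three situations applies:

  1. No Grow happens between iterInit and iterNext: it.buckets is simply walked in order;
  2. Both iterInit and iterNext run during the Grow state: as the code above shows, iterInit takes a snapshot of h.buckets and h.B, so iterNext walks the new bucket array. A given bucket_N is then in one of two states: already evacuated or not yet evacuated. If it has been evacuated, bucket_N itself is walked directly. If not, the corresponding oldbucket is walked instead, skipping the entries that cannot end up in bucket_N (under double_size an old bucket splits into a first half and a second half);
  3. iterInit runs before the Grow state and iterNext runs during it: this case is more involved. Because iterInit ran before the Grow, it.buckets now corresponds to h.oldbuckets, so iteration is based on the old buckets. Again there are two states: entries that have not been evacuated are returned directly, while evacuated entries are looked up by key in the current table, because they may have been updated or deleted in the new buckets in the meantime.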
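
The filter in case 2 can be sketched as follows. This is a simplified model, not the runtime code: bucketMask mirrors the runtime helper of the same name, while belongsTo and the sample hash values are assumptions made for illustration.

```go
package main

import "fmt"

// bucketMask returns 2^b - 1, the mask that selects the low b bits of a hash.
func bucketMask(b uint8) uintptr {
	return uintptr(1)<<b - 1
}

// belongsTo (hypothetical helper) reports whether an entry found in an old
// bucket will land in new bucket checkBucket once the table has 2^newB buckets.
func belongsTo(hash uintptr, newB uint8, checkBucket uintptr) bool {
	return hash&bucketMask(newB) == checkBucket
}

func main() {
	const newB = 3 // the table is growing from 4 to 8 buckets
	// All of these hashes have low two bits 0b10, so they share old bucket 2.
	hashes := []uintptr{0b0010, 0b0110, 0b1010, 0b1110}
	for _, h := range hashes {
		fmt.Printf("hash %04b: new bucket 2? %v, new bucket 6? %v\n",
			h, belongsTo(h, newB, 2), belongsTo(h, newB, 6))
	}
}
```

When the iterator is positioned on new bucket 2 but has to read old bucket 2, only the entries for which belongsTo returns true for bucket 2 are produced; the rest are produced later, when the iterator reaches new bucket 6.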

map delete

As mentioned above, removing an entry from a map is done by setting its topHash to emptyOne. The delete logic is much simpler than the insert logic:

func mapdelete(t *maptype, h *hmap, key unsafe.Pointer) {
	if raceenabled && h != nil {
		callerpc := getcallerpc()
		pc := funcPC(mapdelete)
		racewritepc(unsafe.Pointer(h), callerpc, pc)
		raceReadObjectPC(t.key, key, callerpc, pc)
	}
	if msanenabled && h != nil {
		msanread(key, t.key.size)
	}
	if h == nil || h.count == 0 {
		if t.hashMightPanic() {
			t.key.alg.hash(key, 0) // see issue 23734
		}
		return
	}
	if h.flags&hashWriting != 0 {
		throw("concurrent map writes")
	}

	alg := t.key.alg
	hash := alg.hash(key, uintptr(h.hash0))

	// Set hashWriting after calling alg.hash, since alg.hash may panic,
	// in which case we have not actually done a write (delete).
	h.flags ^= hashWriting

	bucket := hash & bucketMask(h.B)
	if h.growing() { // if a grow is in progress, evacuate the corresponding oldbucket first
		growWork(t, h, bucket)
	}
	b := (*bmap)(add(h.buckets, bucket*uintptr(t.bucketsize)))
	bOrig := b // the base bucket; the key itself may live in one of its overflow buckets
	top := tophash(hash)
search:
	for ; b != nil; b = b.overflow(t) {
		for i := uintptr(0); i < bucketCnt; i++ {
			if b.tophash[i] != top {
				if b.tophash[i] == emptyRest { // reached emptyRest: nothing further to search
					break search
				}
				continue
			}
			k := add(unsafe.Pointer(b), dataOffset+i*uintptr(t.keysize))
			k2 := k
			if t.indirectkey() {
				k2 = *((*unsafe.Pointer)(k2))
			}
			if !alg.equal(key, k2) {
				continue
			}
			// Only clear key if there are pointers in it.
			// Clear the memory of the stored key and elem.
			if t.indirectkey() {
				*(*unsafe.Pointer)(k) = nil
			} else if t.key.ptrdata != 0 {
				memclrHasPointers(k, t.key.size) 
			}
			e := add(unsafe.Pointer(b), dataOffset+bucketCnt*uintptr(t.keysize)+i*uintptr(t.elemsize))
			if t.indirectelem() {
				*(*unsafe.Pointer)(e) = nil
			} else if t.elem.ptrdata != 0 {
				memclrHasPointers(e, t.elem.size)
			} else {
				memclrNoHeapPointers(e, t.elem.size)
			}
			
			b.tophash[i] = emptyOne
			// If the bucket now ends in a bunch of emptyOne states,
			// change those to emptyRest states.
			// It would be nice to make this a separate function, but
			// for loops are not currently inlineable.
			// If this emptyOne is immediately followed by emptyRest, turn it into emptyRest as well.
			if i == bucketCnt-1 {
				if b.overflow(t) != nil && b.overflow(t).tophash[0] != emptyRest {
					goto notLast
				}
			} else {
				if b.tophash[i+1] != emptyRest {
					goto notLast
				}
			}
			for {
				b.tophash[i] = emptyRest
				if i == 0 {
					if b == bOrig {
						break // beginning of initial bucket, we're done.
					}
					// Find previous bucket, continue at its last entry.
					c := b
					for b = bOrig; b.overflow(t) != c; b = b.overflow(t) { // find the bucket that precedes b in the chain
					}
					i = bucketCnt - 1
				} else {
					i--
				}
				if b.tophash[i] != emptyOne {
					break
				}
			}
		notLast:
			h.count--
			break search
		}
	}

	if h.flags&hashWriting == 0 {
		throw("concurrent map writes")
	}
	h.flags &^= hashWriting
}
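
The hashWriting handling at the start and end of mapdelete can be isolated into a small sketch. Everything here is a simplified model: header is a hypothetical stand-in for hmap that keeps only the flags field, and the helpers return an error where the runtime calls throw.

```go
package main

import (
	"errors"
	"fmt"
)

// hashWriting mirrors the flag bit of the same name in the runtime.
const hashWriting = 4

// header is a hypothetical stand-in for hmap with only the flags field.
type header struct {
	flags uint8
}

var errConcurrentWrite = errors.New("concurrent map writes")

// beginWrite sets the writing bit, or reports that a write is already in progress.
func (h *header) beginWrite() error {
	if h.flags&hashWriting != 0 {
		return errConcurrentWrite
	}
	h.flags ^= hashWriting // bit is known to be 0 here, so XOR sets it
	return nil
}

// endWrite clears the writing bit, or reports that it was already cleared.
func (h *header) endWrite() error {
	if h.flags&hashWriting == 0 {
		return errConcurrentWrite
	}
	h.flags &^= hashWriting // AND NOT clears the bit
	return nil
}

func main() {
	h := &header{}
	fmt.Println(h.beginWrite()) // <nil>
	fmt.Println(h.beginWrite()) // concurrent map writes
	fmt.Println(h.endWrite())   // <nil>
	fmt.Println(h.endWrite())   // concurrent map writes
}
```

Note that this flag is a best-effort check for misuse, not a lock: it does not make concurrent writes safe.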

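The flow is as follows:

  1. Locate the bucket that holds the key, set the topHash of the matching slot to emptyOne, and clear the stored key and elem;
  2. If the next slot is emptyRest (the next slot may be the first slot of the following overflow_bucket; a missing overflow_bucket counts the same way), change this emptyOne to emptyRest and continue; otherwise stop;
  3. Step back to the previous slot. Stop if the start of the base_bucket has already been reached or the previous slot is not emptyOne; otherwise go back to step 2.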
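
The three steps above can be simulated on a plain slice. This is a sketch under a simplifying assumption: one logical bucket (the base bucket followed by its overflow buckets) is flattened into a single slice of topHash values, and clearSlot is a made-up helper name.

```go
package main

import "fmt"

const (
	emptyRest = 0 // slot is empty and so is everything after it
	emptyOne  = 1 // slot is empty
)

// clearSlot (hypothetical helper) applies the topHash updates of a delete
// to slot i of a flattened logical bucket.
func clearSlot(slots []uint8, i int) {
	slots[i] = emptyOne // step 1
	// Step 2: stop unless everything after slot i is already emptyRest.
	if i+1 < len(slots) && slots[i+1] != emptyRest {
		return
	}
	// Steps 2 and 3: walk backwards, turning the trailing run of
	// emptyOne slots into emptyRest.
	for ; i >= 0 && slots[i] == emptyOne; i-- {
		slots[i] = emptyRest
	}
}

func main() {
	a := []uint8{7, emptyOne, 9, emptyRest}
	clearSlot(a, 2) // the deleted slot is followed by emptyRest
	fmt.Println(a)  // [7 0 0 0]

	b := []uint8{7, 8, 9, emptyRest}
	clearSlot(b, 0) // the deleted slot is followed by a live entry
	fmt.Println(b)  // [1 8 9 0]
}
```

In the first call the earlier emptyOne at index 1 is upgraded too, which is what lets a later search stop at the first emptyRest it meets.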

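Summary

  1. golang map is implemented as a hash table. Each bucket can hold 8 key-elem pairs. The low B bits of key_hash select the bucket (there are 2^B buckets in total), and entries inside a bucket are told apart by topHash (the top 8 bits of key_hash);
  2. The hash table is stored in memory as a contiguous array plus link pointers, where each link pointer leads to an overflow_bucket. A lookup by key works mostly on contiguous memory, which is what keeps the expected time complexity at O(1);
  3. A Grow reclaims the space held by deleted entries and makes the data more compact. It also changes the ordering of the overflow_buckets: the overflow_buckets of buckets evacuated earlier get earlier addresses. There are two forms: same_size (the bucket count is the same before and after the Grow), where the assignment of keys to buckets does not change; and double_size (the bucket count doubles), where bit B-1 of key_hash (counting from 0, with B being the value after the Grow) decides whether an entry goes to the first half or the second half, which redistributes the buckets;
  4. On delete, the topHash of the slot is set to emptyOne. If the trailing slots of a bucket (here meaning the logical bucket: the base_bucket plus its overflow_buckets) are emptyOne, they are changed to emptyRest.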
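
Points 1 and 3 can be made concrete with a little bit arithmetic. This is a sketch assuming a 64-bit hash value; tophash and minTopHash follow the definitions quoted earlier, while bucketIndex and goesToSecondHalf are hypothetical helper names.

```go
package main

import "fmt"

const minTopHash = 5 // smallest topHash value of a live entry

// tophash takes the top 8 bits of the hash and moves them past the
// reserved state values (assumes a 64-bit hash).
func tophash(hash uint64) uint8 {
	top := uint8(hash >> 56)
	if top < minTopHash {
		top += minTopHash
	}
	return top
}

// bucketIndex (hypothetical helper) selects a bucket from the low B bits.
func bucketIndex(hash uint64, B uint8) uint64 {
	return hash & (uint64(1)<<B - 1)
}

// goesToSecondHalf (hypothetical helper) reports where an entry lands when
// a table with 2^oldB buckets doubles: the bit just above the old mask decides.
func goesToSecondHalf(hash uint64, oldB uint8) bool {
	return hash&(uint64(1)<<oldB) != 0
}

func main() {
	h1 := uint64(0xAB00000000000016) // low bits 0b10110
	h2 := uint64(0x020000000000001E) // low bits 0b11110

	for _, h := range []uint64{h1, h2} {
		fmt.Printf("tophash=%d old bucket=%d second half=%v new bucket=%d\n",
			tophash(h), bucketIndex(h, 3), goesToSecondHalf(h, 3), bucketIndex(h, 4))
	}
}
```

Both sample hashes share bucket 6 while there are 8 buckets. After doubling, the first stays at index 6 and the second moves to index 14, that is, 6 plus the old bucket count.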