1. Introduction
A map is one of the most fundamental data structures in computer science. Basic usage of Go maps is not covered here; see the official documentation.
Go's map is implemented as a hash table, and hash collisions are resolved by chaining: each bucket links to overflow buckets.
2. Environment
- Go version: go1.15.4 darwin/amd64
3. Go map data structures
The struct underlying a map is `hmap`, defined in `runtime/map.go`. The `hmap` source:
```go
// A header for a Go map.
type hmap struct {
	// Note: the format of the hmap is also encoded in cmd/compile/internal/gc/reflect.go.
	// Make sure this stays in sync with the compiler's definition.
	count     int // # live cells == size of map.  Must be first (used by len() builtin)
	flags     uint8
	B         uint8  // log_2 of # of buckets (can hold up to loadFactor * 2^B items)
	noverflow uint16 // approximate number of overflow buckets; see incrnoverflow for details
	hash0     uint32 // hash seed

	buckets    unsafe.Pointer // array of 2^B Buckets. may be nil if count==0.
	oldbuckets unsafe.Pointer // previous bucket array of half the size, non-nil only when growing
	nevacuate  uintptr        // progress counter for evacuation (buckets less than this have been evacuated)

	extra *mapextra // optional fields
}

// mapextra holds fields that are not present on all maps.
type mapextra struct {
	// If both key and elem do not contain pointers and are inline, then we mark bucket
	// type as containing no pointers. This avoids scanning such maps.
	// However, bmap.overflow is a pointer. In order to keep overflow buckets
	// alive, we store pointers to all overflow buckets in hmap.extra.overflow and hmap.extra.oldoverflow.
	// overflow and oldoverflow are only used if key and elem do not contain pointers.
	// overflow contains overflow buckets for hmap.buckets.
	// oldoverflow contains overflow buckets for hmap.oldbuckets.
	// The indirection allows to store a pointer to the slice in hiter.
	overflow    *[]*bmap
	oldoverflow *[]*bmap

	// nextOverflow holds a pointer to a free overflow bucket.
	nextOverflow *bmap
}
```
- `count`: the number of key/value pairs in the map.
- `flags`: flag bits recording the map's state.
- `B`: the map has 2^B buckets, which means it can hold up to loadFactor * 2^B entries; beyond that it must grow.
- `noverflow`: approximate number of overflow buckets.
- `hash0`: the hash seed.
- `buckets`: pointer to the bucket array (2^B buckets in total); may be nil when count == 0.
- `oldbuckets`: pointer to the old bucket array; non-nil only while the map is growing.
- `nevacuate`: evacuation progress counter; buckets below this index have already been moved from the old buckets to the new ones.
- `extra`: optional extension fields.
`bmap` is the struct of an actual bucket:
```go
// A bucket for a Go map.
type bmap struct {
	// tophash generally contains the top byte of the hash value
	// for each key in this bucket. If tophash[0] < minTopHash,
	// tophash[0] is a bucket evacuation state instead.
	tophash [bucketCnt]uint8
	// Followed by bucketCnt keys and then bucketCnt elems.
	// NOTE: packing all the keys together and then all the elems together makes the
	// code a bit more complicated than alternating key/elem/key/elem/... but it allows
	// us to eliminate padding which would be needed for, e.g., map[int64]int8.
	// Followed by an overflow pointer.
}
```
- `tophash`: stores the top 8 bits of each key's hash.
- `keys`: the key array (a hidden field, not declared in the struct).
- `values`: the value array (hidden field).
- `overflow`: pointer to the overflow bucket (hidden field).
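The NOTE in the struct above is worth a closer look: keys and values are stored as two packed arrays instead of interleaved pairs in order to eliminate padding. A minimal standalone sketch of the padding cost for `map[int64]int8` (the struct names `pair` and `packed` are made up for illustration):

```go
package main

import (
	"fmt"
	"unsafe"
)

// pair is the hypothetical interleaved layout: each int8 value
// is followed by 7 bytes of padding to keep int64 alignment.
type pair struct {
	k int64
	v int8
}

// packed mirrors the runtime's layout: all keys, then all elems.
type packed struct {
	keys  [8]int64
	elems [8]int8
}

func main() {
	fmt.Println(unsafe.Sizeof([8]pair{})) // 128: 7 wasted bytes per pair
	fmt.Println(unsafe.Sizeof(packed{}))  // 72: no per-pair padding
}
```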
Besides the top 8 bits of a hash, `bmap.tophash` can also store a few state codes:
```go
// Possible tophash values. We reserve a few possibilities for special marks.
// Each bucket (including its overflow buckets, if any) will have either all or none of its
// entries in the evacuated* states (except during the evacuate() method, which only happens
// during map writes and thus no one else can observe the map during that time).
emptyRest      = 0 // this cell is empty, and there are no more non-empty cells at higher indexes or overflows.
emptyOne       = 1 // this cell is empty
evacuatedX     = 2 // key/elem is valid.  Entry has been evacuated to first half of larger table.
evacuatedY     = 3 // same as above, but evacuated to second half of larger table.
evacuatedEmpty = 4 // cell is empty, bucket is evacuated.
minTopHash     = 5 // minimum tophash for a normal filled cell.
```
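Because the values below `minTopHash` are reserved for these states, a normal entry's top byte is bumped past the reserved range. The helper that derives the byte, also in `runtime/map.go`:

```go
// tophash calculates the tophash value for hash.
func tophash(hash uintptr) uint8 {
	top := uint8(hash >> (sys.PtrSize*8 - 8))
	if top < minTopHash {
		top += minTopHash
	}
	return top
}
```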
(Figure: bmap memory layout)
(Figure: hmap memory layout)
Below we focus on map creation and the insert/update/delete/lookup operations. We show the source code and annotate it with comments as our analysis. The Go compiler calls different functions in different situations; what we analyze below are the generic functions in `runtime/map.go`. Optimized variants, such as those in `runtime/map_faststr.go` for `map[string]type` maps, are left for interested readers to explore.
3.1. Creating a map
Example code:
```go
package main

func main() {
	m1 := make(map[string]string)
	m2 := make(map[string]string, 9)
	_, _ = m1, m2 // silence "declared but not used"
}
```
By inspecting the compiled assembly we can see that map creation calls the runtime function `makemap`, defined in `runtime/map.go`. In fact, the standard Go compiler picks different functions for different declaration forms: for `m1 := make(map[string]string)` it calls `runtime.makemap_small`, but in most cases it calls `makemap`. Let's analyze `makemap`:
```go
// makemap implements Go map creation for make(map[k]v, hint).
// If the compiler has determined that the map or the first bucket
// can be created on the stack, h and/or bucket may be non-nil.
// If h != nil, the map can be created directly in h.
// If h.buckets != nil, bucket pointed to can be used as the first bucket.
func makemap(t *maptype, hint int, h *hmap) *hmap {
	// Check whether the requested map space exceeds the memory limit.
	mem, overflow := math.MulUintptr(uintptr(hint), t.bucket.size)
	if overflow || mem > maxAlloc {
		hint = 0
	}

	// initialize Hmap
	if h == nil {
		h = new(hmap)
	}
	// Initial hash seed.
	h.hash0 = fastrand()

	// Compute B.
	// Find the size parameter B which will hold the requested # of elements.
	// For hint < 0 overLoadFactor returns false since hint < bucketCnt.
	B := uint8(0)
	for overLoadFactor(hint, B) {
		B++
	}
	h.B = B

	// allocate initial hash table
	// if B == 0, the buckets field is allocated lazily later (in mapassign)
	// If hint is large zeroing this memory could take a while.
	if h.B != 0 {
		var nextOverflow *bmap
		// Call makeBucketArray to allocate the buckets and overflow buckets.
		h.buckets, nextOverflow = makeBucketArray(t, h.B, nil)
		if nextOverflow != nil {
			h.extra = new(mapextra)
			h.extra.nextOverflow = nextOverflow
		}
	}

	return h
}

// makeBucketArray initializes a backing array for map buckets.
// 1<<b is the minimum number of buckets to allocate.
// dirtyalloc should either be nil or a bucket array previously
// allocated by makeBucketArray with the same t and b parameters.
// If dirtyalloc is nil a new backing array will be alloced and
// otherwise dirtyalloc will be cleared and reused as backing array.
func makeBucketArray(t *maptype, b uint8, dirtyalloc unsafe.Pointer) (buckets unsafe.Pointer, nextOverflow *bmap) {
	base := bucketShift(b)
	nbuckets := base
	// If b >= 4, the requested map is relatively large, so we preallocate
	// some overflow buckets up front for efficiency.
	// For small b, overflow buckets are unlikely.
	// Avoid the overhead of the calculation.
	if b >= 4 {
		// Add on the estimated number of overflow buckets
		// required to insert the median number of elements
		// used with this value of b.
		nbuckets += bucketShift(b - 4)
		sz := t.bucket.size * nbuckets
		up := roundupsize(sz)
		if up != sz {
			nbuckets = up / t.bucket.size
		}
	}

	if dirtyalloc == nil {
		buckets = newarray(t.bucket, int(nbuckets))
	} else {
		// dirtyalloc was previously generated by
		// the above newarray(t.bucket, int(nbuckets))
		// but may not be empty.
		buckets = dirtyalloc
		size := t.bucket.size * nbuckets
		if t.bucket.ptrdata != 0 {
			memclrHasPointers(buckets, size)
		} else {
			memclrNoHeapPointers(buckets, size)
		}
	}

	if base != nbuckets {
		// We preallocated some overflow buckets.
		// To keep the overhead of tracking these overflow buckets to a minimum,
		// we use the convention that if a preallocated overflow bucket's overflow
		// pointer is nil, then there are more available by bumping the pointer.
		// We need a safe non-nil pointer for the last overflow bucket; just use buckets.
		// nextOverflow is the address of the first overflow bucket;
		// last is the address of the last overflow bucket.
		// Each preallocated overflow bucket's overflow pointer is nil,
		// except that last's overflow pointer is set to the start of buckets.
		nextOverflow = (*bmap)(add(buckets, base*uintptr(t.bucketsize)))
		last := (*bmap)(add(buckets, (nbuckets-1)*uintptr(t.bucketsize)))
		last.setoverflow(t, (*bmap)(buckets))
	}
	return buckets, nextOverflow
}
```
Summary:
- `makemap` decides the map's initial capacity based on the declaration form and its arguments.
- All key/value pairs are stored in buckets; each bucket holds up to 8 pairs.
- If `len(map) > 0`, the map has at least one bucket, so a map's slot count is always a multiple of 8.
- If the requested capacity is large (B >= 4, i.e. at least 16 buckets holding 128 slots), key collisions become likely, so some overflow buckets are created up front for later use (see the sketch below for how B is derived from the size hint).
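As a sketch of that sizing logic, here is a standalone reimplementation of `overLoadFactor` and the B-search loop, using the constants from `runtime/map.go` (`bucketCnt = 8`, load factor 13/2); it only illustrates the arithmetic:

```go
package main

import "fmt"

const (
	bucketCnt     = 8
	loadFactorNum = 13
	loadFactorDen = 2
)

// overLoadFactor reports whether count items in 1<<B buckets would
// exceed the load factor, mirroring the runtime's check.
func overLoadFactor(count int, B uint8) bool {
	return count > bucketCnt && uintptr(count) > loadFactorNum*((uintptr(1)<<B)/loadFactorDen)
}

func main() {
	for _, hint := range []int{0, 8, 9, 52, 53, 1000} {
		B := uint8(0)
		for overLoadFactor(hint, B) {
			B++
		}
		fmt.Printf("hint=%4d -> B=%d (%d buckets)\n", hint, B, 1<<B)
	}
}
```

Running this shows, for example, that a hint of 52 yields B=3 while a hint of 53 crosses the load factor and yields B=4, the threshold at which `makeBucketArray` starts preallocating overflow buckets.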
3.2. Looking up elements
Example code:
```go
package main

import "fmt"

func main() {
	m1 := make(map[int8]int)
	m1[1] = 1
	v, ok := m1[1]
	fmt.Println(v, ok)
}
```
Map lookups call `mapaccess1` or `mapaccess2` under the hood, both defined in `runtime/map.go`. The two functions are essentially identical, except that `mapaccess2` additionally returns a bool reporting whether the key exists. In fact, for different key types the compiler calls different functions to implement map operations; the optimized functions for special key types live in `runtime/map_fast32.go`, `runtime/map_fast64.go` and `runtime/map_faststr.go`. For example, if the key type is string, lookups call the optimized `mapaccess1_faststr` and `mapaccess2_faststr`. This article only analyzes the generic functions; interested readers can study the optimized variants on their own. Let's analyze `mapaccess2`:
```go
func mapaccess2(t *maptype, h *hmap, key unsafe.Pointer) (unsafe.Pointer, bool) {
	// Race detector hooks.
	if raceenabled && h != nil {
		callerpc := getcallerpc()
		pc := funcPC(mapaccess2)
		racereadpc(unsafe.Pointer(h), callerpc, pc)
		raceReadObjectPC(t.key, key, callerpc, pc)
	}
	// Memory sanitizer (-msan) hooks.
	if msanenabled && h != nil {
		msanread(key, t.key.size)
	}
	if h == nil || h.count == 0 {
		if t.hashMightPanic() {
			t.hasher(key, 0) // see issue 23734
		}
		return unsafe.Pointer(&zeroVal[0]), false
	}
	// Maps are not safe for concurrent use; a concurrent read and write throws.
	if h.flags&hashWriting != 0 {
		throw("concurrent map read and map write")
	}
	// Compute the hash of the key.
	hash := t.hasher(key, uintptr(h.hash0))
	// m is the bucket mask (number of buckets minus one).
	m := bucketMask(h.B)
	// hash&m is the bucket index; b is the address of that bucket.
	b := (*bmap)(unsafe.Pointer(uintptr(h.buckets) + (hash&m)*uintptr(t.bucketsize)))
	// If the map is in the middle of growing, look in oldbuckets first.
	if c := h.oldbuckets; c != nil {
		if !h.sameSizeGrow() {
			// For a size-doubling grow, oldbuckets has half as many buckets,
			// so mask down one more power of two.
			m >>= 1
		}
		// Locate the corresponding bucket in oldbuckets.
		oldb := (*bmap)(unsafe.Pointer(uintptr(c) + (hash&m)*uintptr(t.bucketsize)))
		// If the old bucket has not been evacuated yet, search it instead.
		if !evacuated(oldb) {
			b = oldb
		}
	}
	// top is the top 8 bits of the hash. Values below minTopHash (5) mark
	// special states, so tophash bumps small top bytes up by minTopHash;
	// every normal key's top value is greater than or equal to 5.
	top := tophash(hash)
bucketloop:
	// Walk the bucket and then its overflow buckets.
	for ; b != nil; b = b.overflow(t) {
		// Each bucket holds 8 key/value pairs; check them one by one.
		for i := uintptr(0); i < bucketCnt; i++ {
			if b.tophash[i] != top {
				// emptyRest means all remaining cells are empty, so stop.
				if b.tophash[i] == emptyRest {
					break bucketloop
				}
				continue
			}
			// Compute the address of the candidate key.
			k := add(unsafe.Pointer(b), dataOffset+i*uintptr(t.keysize))
			if t.indirectkey() {
				k = *((*unsafe.Pointer)(k))
			}
			// Compare the keys.
			if t.key.equal(key, k) {
				// Keys match: compute the address of the value.
				e := add(unsafe.Pointer(b), dataOffset+bucketCnt*uintptr(t.keysize)+i*uintptr(t.elemsize))
				if t.indirectelem() {
					e = *((*unsafe.Pointer)(e))
				}
				return e, true
			}
		}
	}
	// Not found: return the zero value.
	return unsafe.Pointer(&zeroVal[0]), false
}
```
Summary:
- Check for a concurrent read/write; if detected, the runtime throws.
- Compute the hash; the low-order bits locate the bucket, and the top 8 bits locate the candidate cells within it.
- If the map is mid-evacuation, look in the (unevacuated) old bucket first.
- Compare keys; on a match return the value, otherwise return the zero value.
- The full locating arithmetic is illustrated in the sketch below.
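A minimal standalone sketch of that locating arithmetic, using an assumed 64-bit hash value (nothing here touches a real map):

```go
package main

import "fmt"

const minTopHash = 5

func main() {
	var hash uintptr = 0xA1B2C3D4E5F60718 // assumed hash of some key
	B := uint8(5)                         // the map currently has 2^5 = 32 buckets

	bucketMask := uintptr(1)<<B - 1
	bucket := hash & bucketMask // low B bits pick the bucket

	top := uint8(hash >> 56) // top 8 bits pick the candidate cells
	if top < minTopHash {
		top += minTopHash // values below minTopHash are reserved states
	}

	fmt.Printf("bucket index: %d, tophash: 0x%x\n", bucket, top)
}
```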
3.3. Inserting and updating elements
Example code:
```go
package main

import "fmt"

func main() {
	m1 := make(map[int8]int)
	m1[1] = 1
	m1[2] = 2
	m1[1] = 11
	fmt.Println(m1)
}
```
Both inserting and updating elements call `mapassign`, defined in `runtime/map.go`.
```go
// Like mapaccess, but allocates a slot for the key if it is not present in the map.
func mapassign(t *maptype, h *hmap, key unsafe.Pointer) unsafe.Pointer {
	if h == nil {
		panic(plainError("assignment to entry in nil map"))
	}
	if raceenabled {
		callerpc := getcallerpc()
		pc := funcPC(mapassign)
		racewritepc(unsafe.Pointer(h), callerpc, pc)
		raceReadObjectPC(t.key, key, callerpc, pc)
	}
	if msanenabled {
		msanread(key, t.key.size)
	}
	// Maps do not support concurrent reads and writes.
	if h.flags&hashWriting != 0 {
		throw("concurrent map writes")
	}
	// Compute the hash of the key.
	hash := t.hasher(key, uintptr(h.hash0))

	// Mark the map as being written to.
	// Set hashWriting after calling t.hasher, since t.hasher may panic,
	// in which case we have not actually done a write.
	h.flags ^= hashWriting

	// If the map's buckets were not allocated at creation, allocate them now.
	if h.buckets == nil {
		h.buckets = newobject(t.bucket) // newarray(t.bucket, 1)
	}

again:
	// Compute the bucket index from the hash.
	bucket := hash & bucketMask(h.B)
	// If the map is growing...
	if h.growing() {
		// ...growWork evacuates the corresponding bucket from h.oldbuckets
		// into the new buckets.
		growWork(t, h, bucket)
	}
	// Address of the target bucket.
	b := (*bmap)(unsafe.Pointer(uintptr(h.buckets) + bucket*uintptr(t.bucketsize)))
	top := tophash(hash)

	var inserti *uint8
	var insertk unsafe.Pointer
	var elem unsafe.Pointer
bucketloop:
	for {
		// Walk tophash to check whether the key already exists,
		// or whether there is an empty slot to insert into.
		for i := uintptr(0); i < bucketCnt; i++ {
			if b.tophash[i] != top {
				// There may be several empty cells; remember the first one
				// and skip the rest.
				if isEmpty(b.tophash[i]) && inserti == nil {
					inserti = &b.tophash[i]
					insertk = add(unsafe.Pointer(b), dataOffset+i*uintptr(t.keysize))
					elem = add(unsafe.Pointer(b), dataOffset+bucketCnt*uintptr(t.keysize)+i*uintptr(t.elemsize))
				}
				// emptyRest means every remaining cell is empty, so an equal
				// key cannot appear in a later slot: this operation must be
				// an insert, not an update. Stop scanning.
				if b.tophash[i] == emptyRest {
					break bucketloop
				}
				continue
			}
			k := add(unsafe.Pointer(b), dataOffset+i*uintptr(t.keysize))
			if t.indirectkey() {
				k = *((*unsafe.Pointer)(k))
			}
			if !t.key.equal(key, k) {
				continue
			}
			// already have a mapping for key. Update it.
			if t.needkeyupdate() {
				typedmemmove(t.key, k, key)
			}
			elem = add(unsafe.Pointer(b), dataOffset+bucketCnt*uintptr(t.keysize)+i*uintptr(t.elemsize))
			goto done
		}
		// The bucket is full and no equal key was found:
		// continue with the overflow buckets.
		ovf := b.overflow(t)
		if ovf == nil {
			break
		}
		b = ovf
	}

	// Did not find mapping for key. Allocate new cell & add entry.

	// If we hit the max load factor or we have too many overflow buckets,
	// and we're not already in the middle of growing, start growing.
	// Reaching this point means no equal key was found, so this is an
	// insert, not an update. Before inserting we must decide whether the
	// map needs to grow: overLoadFactor reports whether the map holds too
	// much data and needs a size-doubling grow; tooManyOverflowBuckets
	// reports whether a same-size grow is needed because deletions left
	// many overflow buckets with sparsely distributed data, in which case
	// re-packing the entries saves space. See the analysis of evacuate below.
	if !h.growing() && (overLoadFactor(h.count+1, h.B) || tooManyOverflowBuckets(h.noverflow, h.B)) {
		// hashGrow marks hmap.flags as growing, allocates the new bucket
		// array, and moves the old array to hmap.oldbuckets.
		hashGrow(t, h)
		goto again // Growing the table invalidates everything, so try again
	}

	// inserti == nil means there was no free slot: allocate an overflow bucket.
	if inserti == nil {
		// all current buckets are full, allocate a new one.
		newb := h.newoverflow(t, b)
		inserti = &newb.tophash[0]
		insertk = add(unsafe.Pointer(newb), dataOffset)
		elem = add(insertk, bucketCnt*uintptr(t.keysize))
	}

	// store new key/elem at insert position
	if t.indirectkey() {
		kmem := newobject(t.key)
		*(*unsafe.Pointer)(insertk) = kmem
		insertk = kmem
	}
	if t.indirectelem() {
		vmem := newobject(t.elem)
		*(*unsafe.Pointer)(elem) = vmem
	}
	typedmemmove(t.key, insertk, key)
	*inserti = top
	h.count++

done:
	// Clear the writing flag and return the slot where the value goes.
	if h.flags&hashWriting == 0 {
		throw("concurrent map writes")
	}
	h.flags &^= hashWriting
	if t.indirectelem() {
		elem = *((*unsafe.Pointer)(elem))
	}
	return elem
}

// Evacuate the bucket in oldbuckets corresponding to the bucket in use.
func growWork(t *maptype, h *hmap, bucket uintptr) {
	// make sure we evacuate the oldbucket corresponding
	// to the bucket we're about to use
	evacuate(t, h, bucket&h.oldbucketmask())

	// evacuate one more oldbucket to make progress on growing
	if h.growing() {
		evacuate(t, h, h.nevacuate)
	}
}

// Bucket evacuation.
func evacuate(t *maptype, h *hmap, oldbucket uintptr) {
	// Address of the old bucket.
	b := (*bmap)(add(h.oldbuckets, oldbucket*uintptr(t.bucketsize)))
	// For a same-size grow, newbit is the bucket count; for a size-doubling
	// grow, newbit is the bucket count before doubling.
	newbit := h.noldbuckets()
	// Skip buckets that were already evacuated.
	if !evacuated(b) {
		// TODO: reuse overflow buckets instead of using new ones, if there
		// is no iterator using the old buckets.  (If !oldIterator.)

		// xy contains the x and y (low and high) evacuation destinations.
		// The same hash may map to different bucket indexes in the old and
		// new bucket arrays. For example, for hash 1001 with 8 old buckets
		// and 16 new buckets, the old index is 1 and the new index is 9.
		// x is the destination when the index stays the same;
		// y is the destination when the index increases by newbit.
		var xy [2]evacDst
		x := &xy[0]
		x.b = (*bmap)(add(h.buckets, oldbucket*uintptr(t.bucketsize)))
		x.k = add(unsafe.Pointer(x.b), dataOffset)
		x.e = add(x.k, bucketCnt*uintptr(t.keysize))

		if !h.sameSizeGrow() {
			// Only calculate y pointers if we're growing bigger.
			// Otherwise GC can see bad pointers.
			y := &xy[1]
			y.b = (*bmap)(add(h.buckets, (oldbucket+newbit)*uintptr(t.bucketsize)))
			y.k = add(unsafe.Pointer(y.b), dataOffset)
			y.e = add(y.k, bucketCnt*uintptr(t.keysize))
		}

		for ; b != nil; b = b.overflow(t) {
			// Addresses of the keys and values in the bucket being evacuated.
			k := add(unsafe.Pointer(b), dataOffset)
			e := add(k, bucketCnt*uintptr(t.keysize))
			for i := 0; i < bucketCnt; i, k, e = i+1, add(k, uintptr(t.keysize)), add(e, uintptr(t.elemsize)) {
				top := b.tophash[i]
				// Skip empty cells; this is what re-packs the data and
				// reclaims wasted memory.
				if isEmpty(top) {
					b.tophash[i] = evacuatedEmpty
					continue
				}
				if top < minTopHash {
					throw("bad map state")
				}
				k2 := k
				if t.indirectkey() {
					k2 = *((*unsafe.Pointer)(k2))
				}
				var useY uint8
				if !h.sameSizeGrow() {
					// Compute hash to make our evacuation decision (whether we need
					// to send this key/elem to bucket x or bucket y).
					hash := t.hasher(k2, uintptr(h.hash0))
					if h.flags&iterator != 0 && !t.reflexivekey() && !t.key.equal(k2, k2) {
						// If key != key (NaNs), then the hash could be (and probably
						// will be) entirely different from the old hash. Moreover,
						// it isn't reproducible. Reproducibility is required in the
						// presence of iterators, as our evacuation decision must
						// match whatever decision the iterator made.
						// Fortunately, we have the freedom to send these keys either
						// way. Also, tophash is meaningless for these kinds of keys.
						// We let the low bit of tophash drive the evacuation decision.
						// We recompute a new random tophash for the next level so
						// these keys will get evenly distributed across all buckets
						// after multiple grows.
						useY = top & 1
						top = tophash(hash)
					} else {
						if hash&newbit != 0 {
							useY = 1
						}
					}
				}

				// Sanity-check the evacuation state constants.
				if evacuatedX+1 != evacuatedY || evacuatedX^1 != evacuatedY {
					throw("bad evacuatedN")
				}

				// Record the evacuation state in tophash.
				b.tophash[i] = evacuatedX + useY // evacuatedX + 1 == evacuatedY
				dst := &xy[useY]                 // evacuation destination

				// dst receives the key/value pairs from the bucket being
				// evacuated (including its overflow buckets); once 8 live
				// pairs have moved in, dst allocates an overflow bucket.
				if dst.i == bucketCnt {
					dst.b = h.newoverflow(t, dst.b)
					dst.i = 0
					dst.k = add(unsafe.Pointer(dst.b), dataOffset)
					dst.e = add(dst.k, bucketCnt*uintptr(t.keysize))
				}
				dst.b.tophash[dst.i&(bucketCnt-1)] = top // mask dst.i as an optimization, to avoid a bounds check
				if t.indirectkey() {
					*(*unsafe.Pointer)(dst.k) = k2 // copy pointer
				} else {
					typedmemmove(t.key, dst.k, k) // copy elem
				}
				if t.indirectelem() {
					*(*unsafe.Pointer)(dst.e) = *(*unsafe.Pointer)(e)
				} else {
					typedmemmove(t.elem, dst.e, e)
				}
				dst.i++
				// These updates might push these pointers past the end of the
				// key or elem arrays.  That's ok, as we have the overflow pointer
				// at the end of the bucket to protect against pointing past the
				// end of the bucket.
				dst.k = add(dst.k, uintptr(t.keysize))
				dst.e = add(dst.e, uintptr(t.elemsize))
			}
		}
		// After evacuation, clear the bucket's keys/values and unlink the
		// overflow buckets to help GC; keep tophash, which records the
		// evacuation state.
		// Unlink the overflow buckets & clear key/elem to help GC.
		if h.flags&oldIterator == 0 && t.bucket.ptrdata != 0 {
			b := add(h.oldbuckets, oldbucket*uintptr(t.bucketsize))
			// Preserve b.tophash because the evacuation
			// state is maintained there.
			ptr := add(b, dataOffset)
			n := uintptr(t.bucketsize) - dataOffset
			memclrHasPointers(ptr, n)
		}
	}

	// Advance hmap.nevacuate.
	if oldbucket == h.nevacuate {
		advanceEvacuationMark(h, t, newbit)
	}
}

func advanceEvacuationMark(h *hmap, t *maptype, newbit uintptr) {
	h.nevacuate++
	// Experiments suggest that 1024 is overkill by at least an order of magnitude.
	// Put it in there as a safeguard anyway, to ensure O(1) behavior.
	stop := h.nevacuate + 1024
	if stop > newbit {
		stop = newbit
	}
	for h.nevacuate != stop && bucketEvacuated(t, h, h.nevacuate) {
		h.nevacuate++
	}
	if h.nevacuate == newbit { // newbit == # of oldbuckets
		// Growing is all done. Free old main bucket array.
		h.oldbuckets = nil
		// Can discard old overflow buckets as well.
		// If they are still referenced by an iterator,
		// then the iterator holds a pointers to the slice.
		if h.extra != nil {
			h.extra.oldoverflow = nil
		}
		h.flags &^= sameSizeGrow
	}
}

func hashGrow(t *maptype, h *hmap) {
	// If we've hit the load factor, get bigger.
	// Otherwise, there are too many overflow buckets,
	// so keep the same number of buckets and "grow" laterally.
	bigger := uint8(1)
	if !overLoadFactor(h.count+1, h.B) {
		bigger = 0
		h.flags |= sameSizeGrow
	}
	oldbuckets := h.buckets
	newbuckets, nextOverflow := makeBucketArray(t, h.B+bigger, nil)

	flags := h.flags &^ (iterator | oldIterator)
	if h.flags&iterator != 0 {
		flags |= oldIterator
	}
	// commit the grow (atomic wrt gc)
	h.B += bigger
	h.flags = flags
	h.oldbuckets = oldbuckets
	h.buckets = newbuckets
	h.nevacuate = 0
	h.noverflow = 0

	if h.extra != nil && h.extra.overflow != nil {
		// Promote current overflow buckets to the old generation.
		if h.extra.oldoverflow != nil {
			throw("oldoverflow is not nil")
		}
		h.extra.oldoverflow = h.extra.overflow
		h.extra.overflow = nil
	}
	if nextOverflow != nil {
		if h.extra == nil {
			h.extra = new(mapextra)
		}
		h.extra.nextOverflow = nextOverflow
	}

	// the actual copying of the hash table data is done incrementally
	// by growWork() and evacuate().
}
```
Summary:
- The map first checks whether an equal key already exists; if so, the operation is an update.
- If no equal key exists, the operation is an insert: if there is an empty slot, the pair goes into the first empty slot; otherwise a new overflow bucket is allocated and the pair is inserted there. An insert may trigger a grow.
- A map does not grow all at once but incrementally: when an insert finds the map growing, it evacuates the target bucket (including its overflow buckets) into the new bucket array, and evacuates one extra bucket to make progress (my understanding is that this speeds the grow along; otherwise rarely touched buckets would keep the map holding both old and new bucket arrays indefinitely).
- Map growth comes in two flavors: a same-size grow re-packs sparse data more compactly (deletions can leave a map very sparse), while a size-doubling grow allocates twice the space because too much data was inserted; the doubling grow also drops empty slots, keeping the data compact. The sketch below shows how an old bucket splits during a size-doubling grow.
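A standalone sketch of the x/y destination decision during a size-doubling grow; the hash values are assumed for illustration and the arithmetic mirrors the `hash&newbit != 0` test in `evacuate`:

```go
package main

import "fmt"

func main() {
	oldB := uint8(1)             // 2 old buckets
	newbit := uintptr(1) << oldB // 2: old bucket count, also the deciding hash bit

	// Assumed hashes of three keys that all lived in old bucket 0.
	for _, hash := range []uintptr{0b0100, 0b0110, 0b1000} {
		oldIdx := hash & (newbit - 1) // index with 2 buckets
		newIdx := oldIdx              // x destination: index unchanged
		if hash&newbit != 0 {
			newIdx += newbit // y destination: old index + old bucket count
		}
		fmt.Printf("hash=%04b: old bucket %d -> new bucket %d\n", hash, oldIdx, newIdx)
	}
}
```

Running it shows the entries of old bucket 0 splitting between new buckets 0 and 2, exactly the b0 -> b0/b2 split described in the evacuate comments.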
3.4. Deleting elements
Example code:
```go
package main

func main() {
	m1 := make(map[int8]int)
	m1[1] = 1
	delete(m1, 1)
}
```
Deleting an element calls `mapdelete` under the hood, defined in `runtime/map.go`.
```go
func mapdelete(t *maptype, h *hmap, key unsafe.Pointer) {
	if raceenabled && h != nil {
		callerpc := getcallerpc()
		pc := funcPC(mapdelete)
		racewritepc(unsafe.Pointer(h), callerpc, pc)
		raceReadObjectPC(t.key, key, callerpc, pc)
	}
	if msanenabled && h != nil {
		msanread(key, t.key.size)
	}
	// When h == nil || h.count == 0, return immediately.
	// However, if the map's key type is not comparable this should still
	// fail with "runtime error: hash of unhashable type xxx", so t.hasher
	// is called once to raise the appropriate panic.
	// See issue 23734: https://github.com/golang/go/issues/23734
	if h == nil || h.count == 0 {
		if t.hashMightPanic() {
			t.hasher(key, 0) // see issue 23734
		}
		return
	}
	// Maps do not support concurrent reads and writes.
	if h.flags&hashWriting != 0 {
		throw("concurrent map writes")
	}

	hash := t.hasher(key, uintptr(h.hash0))

	// Set hashWriting after calling t.hasher, since t.hasher may panic,
	// in which case we have not actually done a write (delete).
	h.flags ^= hashWriting

	// Bucket index.
	bucket := hash & bucketMask(h.B)
	// If the map is in the middle of growing, evacuate first;
	// each write evacuates up to two buckets.
	if h.growing() {
		growWork(t, h, bucket)
	}
	b := (*bmap)(add(h.buckets, bucket*uintptr(t.bucketsize)))
	bOrig := b
	top := tophash(hash)
search:
	for ; b != nil; b = b.overflow(t) {
		for i := uintptr(0); i < bucketCnt; i++ {
			if b.tophash[i] != top {
				// emptyRest means all later cells are empty: the key is
				// not in the map, so return.
				if b.tophash[i] == emptyRest {
					break search
				}
				continue
			}
			k := add(unsafe.Pointer(b), dataOffset+i*uintptr(t.keysize))
			k2 := k
			if t.indirectkey() {
				k2 = *((*unsafe.Pointer)(k2))
			}
			if !t.key.equal(key, k2) {
				continue
			}
			// Delete the key/value pair.
			// Only clear key if there are pointers in it.
			if t.indirectkey() {
				*(*unsafe.Pointer)(k) = nil
			} else if t.key.ptrdata != 0 {
				memclrHasPointers(k, t.key.size)
			}
			e := add(unsafe.Pointer(b), dataOffset+bucketCnt*uintptr(t.keysize)+i*uintptr(t.elemsize))
			if t.indirectelem() {
				*(*unsafe.Pointer)(e) = nil
			} else if t.elem.ptrdata != 0 {
				memclrHasPointers(e, t.elem.size)
			} else {
				memclrNoHeapPointers(e, t.elem.size)
			}
			// After deleting the pair, first set the cell's tophash to
			// emptyOne; if no non-empty cells follow, it is then changed
			// to emptyRest, which may require fixing up earlier cells too.
			b.tophash[i] = emptyOne
			// If the bucket now ends in a bunch of emptyOne states,
			// change those to emptyRest states.
			// It would be nice to make this a separate function, but
			// for loops are not currently inlineable.
			if i == bucketCnt-1 {
				// Last cell of this bucket: also look at the overflow bucket.
				if b.overflow(t) != nil && b.overflow(t).tophash[0] != emptyRest {
					goto notLast
				}
			} else {
				// If the next cell is not emptyRest, the current cell must
				// stay emptyOne.
				if b.tophash[i+1] != emptyRest {
					goto notLast
				}
			}
			// Sweep backwards, turning trailing emptyOne cells into
			// emptyRest; setting the current cell to emptyRest may mean
			// the previous cell (possibly the last cell of the previous
			// bucket in the chain) must change as well.
			for {
				b.tophash[i] = emptyRest
				if i == 0 {
					if b == bOrig {
						break // beginning of initial bucket, we're done.
					}
					// Find previous bucket, continue at its last entry.
					c := b
					for b = bOrig; b.overflow(t) != c; b = b.overflow(t) {
					}
					i = bucketCnt - 1
				} else {
					i--
				}
				if b.tophash[i] != emptyOne {
					break
				}
			}
		notLast:
			h.count--
			break search
		}
	}

	if h.flags&hashWriting == 0 {
		throw("concurrent map writes")
	}
	h.flags &^= hashWriting
}
```
Summary:
- Deletion is not safe for concurrent use either.
- A delete can also trigger evacuation. My understanding is that Go maps deliberately avoid evacuating everything at once, which would concentrate the cost in one pause; instead, user operations keep triggering incremental evacuation (up to two buckets per write). The old buckets stick around longer, but the impact on responsiveness stays small, a reasonable trade-off.
- A delete walks backwards through the tophash values, turning trailing emptyOne cells into emptyRest, as in the sketch below.
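A standalone sketch of that backward sweep over a single bucket's tophash array (overflow-chain handling omitted; the helper name `markDeleted` is made up for illustration):

```go
package main

import "fmt"

const (
	emptyRest = 0 // this cell and everything after it are empty
	emptyOne  = 1 // this cell is empty
)

// markDeleted mimics the tophash bookkeeping mapdelete performs
// after clearing cell i of a single bucket.
func markDeleted(tophash []uint8, i int) {
	tophash[i] = emptyOne
	// Only sweep if everything after i is already emptyRest.
	if i != len(tophash)-1 && tophash[i+1] != emptyRest {
		return
	}
	// Walk backwards, turning trailing emptyOne cells into emptyRest.
	for i >= 0 && tophash[i] == emptyOne {
		tophash[i] = emptyRest
		i--
	}
}

func main() {
	tophash := []uint8{9, emptyOne, 10, emptyOne, 11, emptyRest, emptyRest, emptyRest}
	markDeleted(tophash, 4) // delete the entry in cell 4
	fmt.Println(tophash)    // [9 1 10 0 0 0 0 0]: cells 3 and 4 became emptyRest
}
```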
3.5. Iterating over a map
Example code:
```go
package main

import "fmt"

func main() {
	m1 := make(map[int8]int)
	m1[1] = 1
	for k, v := range m1 {
		fmt.Println(k, v)
	}
}
```
Iterating over a map takes two steps: first `mapiterinit` is called to initialize the iterator struct `hiter`; then `mapiternext` is called in a loop to walk the key/value pairs. Below we first look at the layout of `hiter`, then analyze the source of `mapiterinit` and `mapiternext`; both live in `runtime/map.go`.
The iterator struct `hiter`:
```go
// A hash iteration structure.
// If you modify hiter, also change cmd/compile/internal/gc/reflect.go to indicate
// the layout of this structure.
type hiter struct {
	key         unsafe.Pointer // Must be in first position.  Write nil to indicate iteration end (see cmd/internal/gc/range.go).
	elem        unsafe.Pointer // Must be in second position (see cmd/internal/gc/range.go).
	t           *maptype
	h           *hmap
	buckets     unsafe.Pointer // bucket ptr at hash_iter initialization time
	bptr        *bmap          // current bucket
	overflow    *[]*bmap       // keeps overflow buckets of hmap.buckets alive
	oldoverflow *[]*bmap       // keeps overflow buckets of hmap.oldbuckets alive
	startBucket uintptr        // bucket iteration started at
	offset      uint8          // intra-bucket offset to start from during iteration (should be big enough to hold bucketCnt-1)
	wrapped     bool           // already wrapped around from end of bucket array to beginning
	B           uint8
	i           uint8
	bucket      uintptr
	checkBucket uintptr
}
```
- `key`: pointer to the current key.
- `elem`: pointer to the current element.
- `bptr`: pointer to the bucket currently being iterated.
- `startBucket`: the index of the bucket where iteration started.
- `offset`: the cell index within each bucket to start from.
- `wrapped`: iteration usually starts at a bucket somewhere in the middle of the array and walks towards the end; once it reaches the end it wraps around and continues from the beginning, and this flag is set to true when that happens.
- `i`: the current cell index.
- `bucket`: the current bucket index.
- `checkBucket`: the bucket index that needs filtering. If the map was already growing when iteration started, then for each new bucket about to be iterated we first locate the corresponding old bucket; if that old bucket has not been evacuated yet, we iterate it instead. Once evacuation completes, that old bucket's entries will be spread over two new buckets (for example, with B=1 the old buckets are b0 and b1; after growing, B=2 and the new buckets are b0, b1, b2, b3; per the evacuation analysis above, old b0 splits into new b0 and b2). The iterator must therefore only return the entries that will eventually migrate into the new bucket currently being visited; `checkBucket` is that new bucket's index. The sketch after this list shows the filtering arithmetic.
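A minimal sketch of that filtering check with assumed values; it mirrors the `hash&bucketMask(it.B) != checkBucket` test in `mapiternext`:

```go
package main

import "fmt"

func main() {
	itB := uint8(2)           // iterator sees the grown table: 4 new buckets
	checkBucket := uintptr(2) // iterating new bucket 2, backed by unevacuated old bucket 0
	newMask := uintptr(1)<<itB - 1

	// Assumed hashes of two entries found in old bucket 0 (hash&1 == 0).
	for _, hash := range []uintptr{0b0100, 0b0110} {
		if hash&newMask != checkBucket {
			fmt.Printf("hash=%04b -> new bucket %d: skipped\n", hash, hash&newMask)
			continue
		}
		fmt.Printf("hash=%04b -> new bucket %d: returned\n", hash, hash&newMask)
	}
}
```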
Source of the iteration functions:
```go
// mapiterinit initializes the hiter struct used for ranging over maps.
// The hiter struct pointed to by 'it' is allocated on the stack
// by the compilers order pass or on the heap by reflect_mapiterinit.
// Both need to have zeroed hiter since the struct contains pointers.
func mapiterinit(t *maptype, h *hmap, it *hiter) {
	if raceenabled && h != nil {
		callerpc := getcallerpc()
		racereadpc(unsafe.Pointer(h), callerpc, funcPC(mapiterinit))
	}
	// Ranging over a nil or empty map is not an error.
	if h == nil || h.count == 0 {
		return
	}

	if unsafe.Sizeof(hiter{})/sys.PtrSize != 12 {
		throw("hash_iter size incorrect") // see cmd/compile/internal/gc/reflect.go
	}
	it.t = t
	it.h = h

	// grab snapshot of bucket state
	it.B = h.B
	it.buckets = h.buckets
	if t.bucket.ptrdata == 0 {
		// Allocate the current slice and remember pointers to both current and old.
		// This preserves all relevant overflow buckets alive even if
		// the table grows and/or overflow buckets are added to the table
		// while we are iterating.
		h.createOverflow()
		it.overflow = h.extra.overflow
		it.oldoverflow = h.extra.oldoverflow
	}

	// The starting bucket and the starting cell within each bucket are
	// random, because both are derived from a random number here.
	// decide where to start
	r := uintptr(fastrand())
	if h.B > 31-bucketCntBits {
		r += uintptr(fastrand()) << 31
	}
	it.startBucket = r & bucketMask(h.B)
	it.offset = uint8(r >> h.B & (bucketCnt - 1))

	// iterator state
	it.bucket = it.startBucket

	// Mark the hmap as having an iterator; atomic update.
	// Remember we have an iterator.
	// Can run concurrently with another mapiterinit().
	if old := h.flags; old&(iterator|oldIterator) != iterator|oldIterator {
		atomic.Or8(&h.flags, iterator|oldIterator)
	}

	mapiternext(it)
}

func mapiternext(it *hiter) {
	h := it.h
	if raceenabled {
		callerpc := getcallerpc()
		racereadpc(unsafe.Pointer(h), callerpc, funcPC(mapiternext))
	}
	if h.flags&hashWriting != 0 {
		throw("concurrent map iteration and map write")
	}
	t := it.t
	bucket := it.bucket
	b := it.bptr
	i := it.i
	checkBucket := it.checkBucket

next:
	if b == nil {
		// bucket is the current bucket index; wrapped records whether we
		// wrapped back to the start of the bucket array. Iteration
		// generally starts at a bucket in the middle; after reaching the
		// end, wrapped is set to true, bucket resets to 0, and iteration
		// continues from the start. So if the condition below holds,
		// iteration is finished.
		if bucket == it.startBucket && it.wrapped {
			// end of iteration
			it.key = nil
			it.elem = nil
			return
		}
		// After iteration started, h.B may have grown further.
		if h.growing() && it.B == h.B {
			// Iterator was started in the middle of a grow, and the grow isn't done yet.
			// If the bucket we're looking at hasn't been filled in yet (i.e. the old
			// bucket hasn't been evacuated) then we need to iterate through the old
			// bucket and only return the ones that will be migrated to this bucket.
			oldbucket := bucket & it.h.oldbucketmask()
			b = (*bmap)(add(h.oldbuckets, oldbucket*uintptr(t.bucketsize)))
			if !evacuated(b) {
				checkBucket = bucket
			} else {
				b = (*bmap)(add(it.buckets, bucket*uintptr(t.bucketsize)))
				checkBucket = noCheck
			}
		} else {
			b = (*bmap)(add(it.buckets, bucket*uintptr(t.bucketsize)))
			checkBucket = noCheck
		}
		bucket++
		// After reaching the end of the bucket array, wrap around and
		// continue from the start.
		if bucket == bucketShift(it.B) {
			bucket = 0
			it.wrapped = true
		}
		i = 0
	}
	for ; i < bucketCnt; i++ {
		offi := (i + it.offset) & (bucketCnt - 1)
		if isEmpty(b.tophash[offi]) || b.tophash[offi] == evacuatedEmpty {
			// TODO: emptyRest is hard to use here, as we start iterating
			// in the middle of a bucket. It's feasible, just tricky.
			continue
		}
		k := add(unsafe.Pointer(b), dataOffset+uintptr(offi)*uintptr(t.keysize))
		if t.indirectkey() {
			k = *((*unsafe.Pointer)(k))
		}
		e := add(unsafe.Pointer(b), dataOffset+bucketCnt*uintptr(t.keysize)+uintptr(offi)*uintptr(t.elemsize))
		if checkBucket != noCheck && !h.sameSizeGrow() {
			// Special case: iterator was started during a grow to a larger size
			// and the grow is not done yet. We're working on a bucket whose
			// oldbucket has not been evacuated yet. Or at least, it wasn't
			// evacuated when we started the bucket. So we're iterating
			// through the oldbucket, skipping any keys that will go
			// to the other new bucket (each oldbucket expands to two
			// buckets during a grow).
			if t.reflexivekey() || t.key.equal(k, k) {
				// If the item in the oldbucket is not destined for
				// the current new bucket in the iteration, skip it.
				hash := t.hasher(k, uintptr(h.hash0))
				// Skip entries that will not migrate to the current bucket.
				if hash&bucketMask(it.B) != checkBucket {
					continue
				}
			} else {
				// Hash isn't repeatable if k != k (NaNs).  We need a
				// repeatable and randomish choice of which direction
				// to send NaNs during evacuation. We'll use the low
				// bit of tophash to decide which way NaNs go.
				// NOTE: this case is why we need two evacuate tophash
				// values, evacuatedX and evacuatedY, that differ in
				// their low bit.
				if checkBucket>>(it.B-1) != uintptr(b.tophash[offi]&1) {
					continue
				}
			}
		}
		if (b.tophash[offi] != evacuatedX && b.tophash[offi] != evacuatedY) ||
			!(t.reflexivekey() || t.key.equal(k, k)) {
			// This is the golden data, we can return it.
			// OR
			// key!=key, so the entry can't be deleted or updated, so we can just return it.
			// That's lucky for us because when key!=key we can't look it up successfully.
			it.key = k
			if t.indirectelem() {
				e = *((*unsafe.Pointer)(e))
			}
			it.elem = e
		} else {
			// The hash table has grown since the iterator was started.
			// The golden data for this key is now somewhere else.
			// Check the current hash table for the data.
			// This code handles the case where the key
			// has been deleted, updated, or deleted and reinserted.
			// NOTE: we need to regrab the key as it has potentially been
			// updated to an equal() but not identical key (e.g. +0.0 vs -0.0).
			rk, re := mapaccessK(t, h, k)
			if rk == nil {
				continue // key has been deleted
			}
			it.key = rk
			it.elem = re
		}
		it.bucket = bucket
		if it.bptr != b { // avoid unnecessary write barrier; see issue 14921
			it.bptr = b
		}
		it.i = i + 1
		it.checkBucket = checkBucket
		return
	}
	b = b.overflow(t)
	i = 0
	goto next
}

// returns both key and elem. Used by map iterator
func mapaccessK(t *maptype, h *hmap, key unsafe.Pointer) (unsafe.Pointer, unsafe.Pointer) {
	if h == nil || h.count == 0 {
		return nil, nil
	}
	hash := t.hasher(key, uintptr(h.hash0))
	m := bucketMask(h.B)
	b := (*bmap)(unsafe.Pointer(uintptr(h.buckets) + (hash&m)*uintptr(t.bucketsize)))
	if c := h.oldbuckets; c != nil {
		if !h.sameSizeGrow() {
			// There used to be half as many buckets; mask down one more power of two.
			m >>= 1
		}
		oldb := (*bmap)(unsafe.Pointer(uintptr(c) + (hash&m)*uintptr(t.bucketsize)))
		if !evacuated(oldb) {
			b = oldb
		}
	}
	top := tophash(hash)
bucketloop:
	for ; b != nil; b = b.overflow(t) {
		for i := uintptr(0); i < bucketCnt; i++ {
			if b.tophash[i] != top {
				if b.tophash[i] == emptyRest {
					break bucketloop
				}
				continue
			}
			k := add(unsafe.Pointer(b), dataOffset+i*uintptr(t.keysize))
			if t.indirectkey() {
				k = *((*unsafe.Pointer)(k))
			}
			if t.key.equal(key, k) {
				e := add(unsafe.Pointer(b), dataOffset+bucketCnt*uintptr(t.keysize)+i*uintptr(t.elemsize))
				if t.indirectelem() {
					e = *((*unsafe.Pointer)(e))
				}
				return k, e
			}
		}
	}
	return nil, nil
}
```
Summary:
- Iterating a map first initializes the iterator `hiter`, then calls `mapiternext` repeatedly.
- The starting bucket and the starting cell within each bucket are both random, so iteration order is random (see the demo below).
- If the map entered a grow before iteration started, iterating behaves as if that grow had already fully completed: a new bucket may not have received its data yet, but the iterator still returns the entries that will eventually migrate into it.
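A quick demonstration of the random start: the loop below typically prints the keys in a different order across runs (for a very small map the order can occasionally repeat):

```go
package main

import "fmt"

func main() {
	m := map[int]string{1: "a", 2: "b", 3: "c", 4: "d", 5: "e"}
	for round := 0; round < 3; round++ {
		for k := range m {
			fmt.Print(k, " ")
		}
		fmt.Println()
	}
}
```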
4. Miscellaneous
How to find out which concrete `map` functions are called. Prepare the code:
```go
package main

import (
	"fmt"
)

func main() {
	m1 := make(map[string]string, 9)
	fmt.Println(m1)
	for i := 0; i < 20; i++ {
		str := fmt.Sprintf("%d", i)
		m1[str] = str
	}
	a := m1["0"]
	b, ok := m1["0"]
	fmt.Println(a, b, ok)
}
```
Command to print the assembly:
```
go tool compile -N -l -S main.go > main.txt
```
- Search the assembly for the runtime calls, e.g. `CALL runtime.makemap(SB)`.
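For example, one quick way to list the map-related runtime calls in the dump (assuming a Unix-like shell; the exact function names in the output depend on the key type and Go version):

```
grep -nE "runtime\.(makemap|mapassign|mapaccess|mapiter|mapdelete)" main.txt
```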