1. Introduction

The map is one of the most fundamental data structures in computer science. This article does not cover basic usage of Go maps; see the official documentation for that.
Go's map is implemented as a hash table, and hash collisions are resolved by chaining (each bucket links to overflow buckets).

2. Environment

  • Go version: go1.15.4 darwin/amd64

3. The Go map data structures

The struct at the core of a map is hmap, defined in runtime/map.go.
hmap source:

// A header for a Go map.
type hmap struct {
    // Note: the format of the hmap is also encoded in cmd/compile/internal/gc/reflect.go.
    // Make sure this stays in sync with the compiler's definition.
    count     int // # live cells == size of map.  Must be first (used by len() builtin)
    flags     uint8
    B         uint8  // log_2 of # of buckets (can hold up to loadFactor * 2^B items)
    noverflow uint16 // approximate number of overflow buckets; see incrnoverflow for details
    hash0     uint32 // hash seed

    buckets    unsafe.Pointer // array of 2^B Buckets. may be nil if count==0.
    oldbuckets unsafe.Pointer // previous bucket array of half the size, non-nil only when growing
    nevacuate  uintptr        // progress counter for evacuation (buckets less than this have been evacuated)

    extra *mapextra // optional fields
}

// mapextra holds fields that are not present on all maps.
type mapextra struct {
    // If both key and elem do not contain pointers and are inline, then we mark bucket
    // type as containing no pointers. This avoids scanning such maps.
    // However, bmap.overflow is a pointer. In order to keep overflow buckets
    // alive, we store pointers to all overflow buckets in hmap.extra.overflow and hmap.extra.oldoverflow.
    // overflow and oldoverflow are only used if key and elem do not contain pointers.
    // overflow contains overflow buckets for hmap.buckets.
    // oldoverflow contains overflow buckets for hmap.oldbuckets.
    // The indirection allows to store a pointer to the slice in hiter.
    overflow    *[]*bmap
    oldoverflow *[]*bmap

    // nextOverflow holds a pointer to a free overflow bucket.
    nextOverflow *bmap
}
  • count: the number of key/value pairs in the map; this is what len() returns;
  • flags: status flag bits for the map;
  • B: the map has 2^B buckets; it can hold up to loadFactor * 2^B entries before it must grow;
  • noverflow: an approximate count of overflow buckets;
  • hash0: the seed for the hash function;
  • buckets: pointer to the array of 2^B buckets; may be nil when count == 0;
  • oldbuckets: pointer to the previous bucket array; non-nil only while the map is growing;
  • nevacuate: evacuation progress counter; buckets with an index below it have already been evacuated;
  • extra: optional fields; a sketch below peeks at the leading hmap fields from user code.
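To make these fields concrete, here is a small sketch (my own, version-specific illustration) that peeks at the leading hmap fields through unsafe. The mirrored struct layout is an assumption that only holds for go1.15-era runtimes; do not rely on it in real code:

package main

import (
    "fmt"
    "unsafe"
)

// hmapHeader mirrors the first fields of runtime.hmap in go1.15.
// The layout is an assumption tied to that version; teaching sketch only.
type hmapHeader struct {
    count     int
    flags     uint8
    B         uint8
    noverflow uint16
    hash0     uint32
}

// peek reinterprets a map value, which is a pointer to the runtime's hmap.
func peek(m map[int]int) *hmapHeader {
    return *(**hmapHeader)(unsafe.Pointer(&m))
}

func main() {
    m := make(map[int]int, 100)
    h := peek(m)
    // With hint=100 we expect B=4 (16 buckets), since 100 <= 6.5*16.
    fmt.Printf("count=%d B=%d buckets=%d\n", h.count, h.B, 1<<h.B)
}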

bmap is the struct that actually represents a bucket:

// A bucket for a Go map.
type bmap struct {
    // tophash generally contains the top byte of the hash value
    // for each key in this bucket. If tophash[0] < minTopHash,
    // tophash[0] is a bucket evacuation state instead.
    tophash [bucketCnt]uint8
    // Followed by bucketCnt keys and then bucketCnt elems.
    // NOTE: packing all the keys together and then all the elems together makes the
    // code a bit more complicated than alternating key/elem/key/elem/... but it allows
    // us to eliminate padding which would be needed for, e.g., map[int64]int8.
    // Followed by an overflow pointer.
}
  • tophash: the top 8 bits of the hash of each key in the bucket;
  • keys: the array of 8 keys (an implicit field, not declared in the struct);
  • values: the array of 8 values (implicit field);
  • overflow: pointer to the next overflow bucket (implicit field); the sketch below shows the pointer arithmetic used to reach the implicit fields.
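Because keys and values are implicit, the runtime reaches them by pointer arithmetic. The following standalone sketch (my own illustration, with sizes hard-coded for map[int64]int8 on a 64-bit platform) reproduces the offset formulas we will meet again in mapaccess2:

package main

import "fmt"

const bucketCnt = 8

// keyOffset: key i lives right after the tophash header.
func keyOffset(dataOffset, keysize, i uintptr) uintptr {
    return dataOffset + i*keysize
}

// elemOffset: all 8 keys are packed first, then all 8 values; this packing
// is what eliminates the padding mentioned in the bmap comment.
func elemOffset(dataOffset, keysize, elemsize, i uintptr) uintptr {
    return dataOffset + bucketCnt*keysize + i*elemsize
}

func main() {
    // map[int64]int8 on 64-bit: 8-byte tophash header, 8-byte keys, 1-byte values.
    var dataOffset, keysize, elemsize uintptr = 8, 8, 1
    for i := uintptr(0); i < 3; i++ {
        fmt.Printf("cell %d: key at +%d, value at +%d\n",
            i, keyOffset(dataOffset, keysize, i),
            elemOffset(dataOffset, keysize, elemsize, i))
    }
}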

Besides the top 8 bits of a hash, bmap.tophash can also hold a few state codes.

// Possible tophash values. We reserve a few possibilities for special marks.
// Each bucket (including its overflow buckets, if any) will have either all or none of its
// entries in the evacuated* states (except during the evacuate() method, which only happens
// during map writes and thus no one else can observe the map during that time).
emptyRest      = 0 // this cell is empty, and there are no more non-empty cells at higher indexes or overflows.
emptyOne       = 1 // this cell is empty
evacuatedX     = 2 // key/elem is valid.  Entry has been evacuated to first half of larger table.
evacuatedY     = 3 // same as above, but evacuated to second half of larger table.
evacuatedEmpty = 4 // cell is empty, bucket is evacuated.
minTopHash     = 5 // minimum tophash for a normal filled cell.
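The tophash helper itself is tiny. A standalone re-implementation, assuming a 64-bit platform (in the runtime the shift is derived from sys.PtrSize), looks like this:

package main

import "fmt"

const minTopHash = 5 // values 0..4 are reserved for the states above

// tophash mirrors the go1.15 helper: take the top byte of the hash and
// bump it past the reserved state values.
func tophash(hash uintptr) uint8 {
    top := uint8(hash >> (64 - 8)) // sys.PtrSize*8 - 8 on a 64-bit platform
    if top < minTopHash {
        top += minTopHash
    }
    return top
}

func main() {
    fmt.Println(tophash(0x03FF_FFFF_FFFF_FFFF)) // top byte 0x03 < 5, so prints 8
    fmt.Println(tophash(0xAB00_0000_0000_0000)) // prints 171 (0xAB)
}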
(Figure: bmap layout)

(Figure: hmap layout)

Next we take a close look at map creation and at the insert, delete, update, and lookup operations. We quote the runtime source and add inline comments as our analysis. The Go compiler picks different functions for different situations; what we analyze below are the generic functions in runtime/map.go. Optimized variants, such as the ones in runtime/map_faststr.go for map[string]T, work along the same lines and are left for interested readers to explore.

3.1. Creating a map

Sample code

func main() {
    m1 := make(map[string]string)    // no size hint: the compiler emits makemap_small
    m2 := make(map[string]string, 9) // with a size hint: the compiler emits makemap
    _, _ = m1, m2
}

Inspecting the compiled assembly shows that map creation bottoms out in makemap, defined in runtime/map.go. In fact, the compiler chooses different functions for different declaration forms: for m1 := make(map[string]string) it emits a call to runtime.makemap_small, but in most cases makemap is used. Let's analyze makemap.

// makemap implements Go map creation for make(map[k]v, hint).
// If the compiler has determined that the map or the first bucket
// can be created on the stack, h and/or bucket may be non-nil.
// If h != nil, the map can be created directly in h.
// If h.buckets != nil, bucket pointed to can be used as the first bucket.
func makemap(t *maptype, hint int, h *hmap) *hmap {
    // Check whether the requested map size exceeds the allocation limit.
    mem, overflow := math.MulUintptr(uintptr(hint), t.bucket.size)
    if overflow || mem > maxAlloc {
        hint = 0
    }

    // initialize Hmap
    if h == nil {
        h = new(hmap)
    }
    // Pick the hash seed.
    h.hash0 = fastrand()

    // Find the size parameter B which will hold the requested # of elements.
    // For hint < 0 overLoadFactor returns false since hint < bucketCnt.
    B := uint8(0)
    for overLoadFactor(hint, B) {
        B++
    }
    h.B = B

    // allocate initial hash table
    // if B == 0, the buckets field is allocated lazily later (in mapassign)
    // If hint is large zeroing this memory could take a while.
    if h.B != 0 {
        var nextOverflow *bmap
        // makeBucketArray allocates the buckets plus any preallocated overflow buckets.
        h.buckets, nextOverflow = makeBucketArray(t, h.B, nil)
        if nextOverflow != nil {
            h.extra = new(mapextra)
            h.extra.nextOverflow = nextOverflow
        }
    }

    return h
}

// makeBucketArray initializes a backing array for map buckets.
// 1<<b is the minimum number of buckets to allocate.
// dirtyalloc should either be nil or a bucket array previously
// allocated by makeBucketArray with the same t and b parameters.
// If dirtyalloc is nil a new backing array will be alloced and
// otherwise dirtyalloc will be cleared and reused as backing array.
func makeBucketArray(t *maptype, b uint8, dirtyalloc unsafe.Pointer) (buckets unsafe.Pointer, nextOverflow *bmap) {
    base := bucketShift(b)
    nbuckets := base
    // b >= 4 means a fairly large map was requested, so overflow buckets
    // become likely; preallocate some of them up front for efficiency.
    // For small b, overflow buckets are unlikely.
    // Avoid the overhead of the calculation.
    if b >= 4 {
        // Add on the estimated number of overflow buckets
        // required to insert the median number of elements
        // used with this value of b.
        nbuckets += bucketShift(b - 4)
        sz := t.bucket.size * nbuckets
        up := roundupsize(sz)
        if up != sz {
            nbuckets = up / t.bucket.size
        }
    }

    if dirtyalloc == nil {
        buckets = newarray(t.bucket, int(nbuckets))
    } else {
        // dirtyalloc was previously generated by
        // the above newarray(t.bucket, int(nbuckets))
        // but may not be empty.
        buckets = dirtyalloc
        size := t.bucket.size * nbuckets
        if t.bucket.ptrdata != 0 {
            memclrHasPointers(buckets, size)
        } else {
            memclrNoHeapPointers(buckets, size)
        }
    }

    if base != nbuckets {
        // We preallocated some overflow buckets.
        // To keep the overhead of tracking these overflow buckets to a minimum,
        // we use the convention that if a preallocated overflow bucket's overflow
        // pointer is nil, then there are more available by bumping the pointer.
        // We need a safe non-nil pointer for the last overflow bucket; just use buckets.
        //
        // nextOverflow is the address of the first preallocated overflow bucket;
        // last is the address of the last one. Every preallocated overflow
        // bucket keeps a nil overflow pointer, except last, whose overflow
        // pointer is set to the start of buckets.
        nextOverflow = (*bmap)(add(buckets, base*uintptr(t.bucketsize)))
        last := (*bmap)(add(buckets, (nbuckets-1)*uintptr(t.bucketsize)))
        last.setoverflow(t, (*bmap)(buckets))
    }
    return buckets, nextOverflow
}

Summary:

  • makemap decides the map's initial size from the declaration form and the size hint;
  • all key/value pairs are stored in buckets, and each bucket holds up to 8 pairs;
  • if len(map) > 0 the map has at least one bucket, so a map's capacity is always a multiple of 8;
  • if the requested size is large (a hint above 52, which makes B >= 4), key collisions become likely, so some overflow buckets are preallocated for later use; the sketch below shows how the hint maps to B.
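To see how the size hint determines B, here is a standalone re-implementation of the overLoadFactor loop from makemap, with the constants copied from go1.15's runtime/map.go (an illustration, not the runtime itself):

package main

import "fmt"

const (
    bucketCnt     = 8  // entries per bucket
    loadFactorNum = 13 // loadFactorNum/loadFactorDen == 6.5, the max average load
    loadFactorDen = 2
)

func bucketShift(b uint8) uintptr { return uintptr(1) << b }

// overLoadFactor reports whether count items in 1<<B buckets would exceed
// the 6.5 load factor (mirrors go1.15 runtime/map.go).
func overLoadFactor(count int, B uint8) bool {
    return count > bucketCnt && uintptr(count) > loadFactorNum*(bucketShift(B)/loadFactorDen)
}

func main() {
    for _, hint := range []int{0, 9, 52, 53, 128} {
        B := uint8(0)
        for overLoadFactor(hint, B) {
            B++
        }
        fmt.Printf("make(map, %3d) => B=%d (%d buckets)\n", hint, B, 1<<B)
    }
}

Running this shows that a hint of 53 is the first to produce B=4, which is exactly the threshold at which makeBucketArray starts preallocating overflow buckets.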

3.2. Looking up elements

Sample code

package main

import "fmt"

func main() {
    m1 := make(map[int8]int)
    m1[1] = 1
    v, ok := m1[1]
    fmt.Println(v, ok)
}

A map lookup calls mapaccess1 or mapaccess2, both defined in runtime/map.go. The two functions are essentially the same, except that mapaccess2 also returns a bool reporting whether the key exists. As noted above, the compiler selects specialized functions for certain key types; these optimized variants live in runtime/map_fast32.go, runtime/map_fast64.go, and runtime/map_faststr.go. For example, lookups with string keys go through mapaccess1_faststr and mapaccess2_faststr. This article only analyzes the generic functions; readers interested in the optimized variants can consult the source. Below we analyze mapaccess2.

func mapaccess2(t *maptype, h *hmap, key unsafe.Pointer) (unsafe.Pointer, bool) {
    // Race detector hook.
    if raceenabled && h != nil {
        callerpc := getcallerpc()
        pc := funcPC(mapaccess2)
        racereadpc(unsafe.Pointer(h), callerpc, pc)
        raceReadObjectPC(t.key, key, callerpc, pc)
    }
    // Memory sanitizer (-msan) hook.
    if msanenabled && h != nil {
        msanread(key, t.key.size)
    }
    if h == nil || h.count == 0 {
        if t.hashMightPanic() {
            t.hasher(key, 0) // see issue 23734
        }
        return unsafe.Pointer(&zeroVal[0]), false
    }
    // Maps are not safe for concurrent use; a concurrent read and write panics.
    if h.flags&hashWriting != 0 {
        throw("concurrent map read and map write")
    }
    // Compute the hash of the key.
    hash := t.hasher(key, uintptr(h.hash0))
    // m is the bucket mask (number of buckets minus one).
    m := bucketMask(h.B)
    // hash&m is the bucket index; b is the address of that bucket.
    b := (*bmap)(unsafe.Pointer(uintptr(h.buckets) + (hash&m)*uintptr(t.bucketsize)))
    // If the map is in the middle of growing, the key may still live in oldbuckets.
    if c := h.oldbuckets; c != nil {
        // For a doubling grow, oldbuckets holds half as many buckets as buckets,
        // so mask down one more power of two.
        if !h.sameSizeGrow() {
            // There used to be half as many buckets; mask down one more power of two.
            m >>= 1
        }
        // Locate the corresponding bucket in oldbuckets.
        oldb := (*bmap)(unsafe.Pointer(uintptr(c) + (hash&m)*uintptr(t.bucketsize)))
        // If that old bucket has not been evacuated yet, search it instead.
        if !evacuated(oldb) {
            b = oldb
        }
    }
    // top is the high 8 bits of the hash. Values below minTopHash (5) are
    // reserved for special states, so tophash() bumps small values up by
    // minTopHash; a normal filled cell always has a top value greater than 5.
    top := tophash(hash)
bucketloop:
    // Walk the bucket and its overflow buckets.
    for ; b != nil; b = b.overflow(t) {
        // Each bucket holds up to 8 key/value pairs; scan them one by one.
        for i := uintptr(0); i < bucketCnt; i++ {
            if b.tophash[i] != top {
                // emptyRest means every remaining cell (and overflow) is empty,
                // so we can stop scanning.
                if b.tophash[i] == emptyRest {
                    break bucketloop
                }
                continue
            }
            // Compute the address of the candidate key.
            k := add(unsafe.Pointer(b), dataOffset+i*uintptr(t.keysize))
            if t.indirectkey() {
                k = *((*unsafe.Pointer)(k))
            }
            // Compare the keys.
            if t.key.equal(key, k) {
                // Keys match: compute the address of the value.
                e := add(unsafe.Pointer(b), dataOffset+bucketCnt*uintptr(t.keysize)+i*uintptr(t.elemsize))
                if t.indirectelem() {
                    e = *((*unsafe.Pointer)(e))
                }
                return e, true
            }
        }
    }
    // Not found: return the zero value.
    return unsafe.Pointer(&zeroVal[0]), false
}

Summary:

  • check for a concurrent read and write; if detected, throw a panic (a racy sketch that trips this check follows the list);
  • compute the hash, use the low bits to locate the bucket and the top 8 bits to locate the cell inside it;
  • while the map is growing, look in the corresponding old bucket first if it has not been evacuated;
  • if a key compares equal, return its value; otherwise return the zero value;
  • (Figure: the key/value location process)
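As promised above, here is a deliberately racy sketch (my own demo, not runtime code) that trips the check in mapaccess2. Expect it to die quickly with "fatal error: concurrent map read and map write":

package main

// WARNING: intentionally broken demo; never write code like this.
func main() {
    m := make(map[int]int)
    go func() {
        for i := 0; ; i++ {
            m[i] = i // concurrent writer: sets the hashWriting flag
        }
    }()
    for {
        _ = m[0] // concurrent reader: sees hashWriting and throws
    }
}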

3.3. Inserting and updating elements

Sample code

package main

import "fmt"

func main() {
    m1 := make(map[int8]int)
    m1[1] = 1
    m1[2] = 2
    m1[1] = 11 // updates the existing key
    fmt.Println(m1)
}

Both inserting a new element and updating an existing one go through mapassign, defined in runtime/map.go.

// Like mapaccess, but allocates a slot for the key if it is not present in the map.
func mapassign(t *maptype, h *hmap, key unsafe.Pointer) unsafe.Pointer {
    if h == nil {
        panic(plainError("assignment to entry in nil map"))
    }
    if raceenabled {
        callerpc := getcallerpc()
        pc := funcPC(mapassign)
        racewritepc(unsafe.Pointer(h), callerpc, pc)
        raceReadObjectPC(t.key, key, callerpc, pc)
    }
    if msanenabled {
        msanread(key, t.key.size)
    }
    // Maps do not support concurrent read/write.
    if h.flags&hashWriting != 0 {
        throw("concurrent map writes")
    }
    // Compute the hash of the key.
    hash := t.hasher(key, uintptr(h.hash0))

    // Set hashWriting after calling t.hasher, since t.hasher may panic,
    // in which case we have not actually done a write.
    h.flags ^= hashWriting

    // If the bucket array was never allocated (B == 0), allocate it now.
    if h.buckets == nil {
        h.buckets = newobject(t.bucket) // newarray(t.bucket, 1)
    }

again:
    // Compute the bucket index from the hash.
    bucket := hash & bucketMask(h.B)
    // If the map is growing, push the migration forward first.
    if h.growing() {
        // growWork evacuates the corresponding bucket from h.oldbuckets into the new buckets.
        growWork(t, h, bucket)
    }
    // Address of the target bucket.
    b := (*bmap)(unsafe.Pointer(uintptr(h.buckets) + bucket*uintptr(t.bucketsize)))
    top := tophash(hash)

    var inserti *uint8
    var insertk unsafe.Pointer
    var elem unsafe.Pointer
bucketloop:
    for {
        // Scan tophash to see whether the key already exists, or where a free cell is.
        for i := uintptr(0); i < bucketCnt; i++ {
            if b.tophash[i] != top {
                // There may be several empty cells; remember the first one and skip the rest.
                if isEmpty(b.tophash[i]) && inserti == nil {
                    inserti = &b.tophash[i]
                    insertk = add(unsafe.Pointer(b), dataOffset+i*uintptr(t.keysize))
                    elem = add(unsafe.Pointer(b), dataOffset+bucketCnt*uintptr(t.keysize)+i*uintptr(t.elemsize))
                }
                // emptyRest means everything after this cell is empty, so an equal key
                // cannot exist further on: this operation must be an insert, not an update.
                if b.tophash[i] == emptyRest {
                    break bucketloop
                }
                continue
            }
            k := add(unsafe.Pointer(b), dataOffset+i*uintptr(t.keysize))
            if t.indirectkey() {
                k = *((*unsafe.Pointer)(k))
            }
            if !t.key.equal(key, k) {
                continue
            }
            // already have a mapping for key. Update it.
            if t.needkeyupdate() {
                typedmemmove(t.key, k, key)
            }
            elem = add(unsafe.Pointer(b), dataOffset+bucketCnt*uintptr(t.keysize)+i*uintptr(t.elemsize))
            goto done
        }
        // Bucket is full and the key was not found: continue with the overflow bucket.
        ovf := b.overflow(t)
        if ovf == nil {
            break
        }
        b = ovf
    }

    // Did not find mapping for key. Allocate new cell & add entry.

    // If we hit the max load factor or we have too many overflow buckets,
    // and we're not already in the middle of growing, start growing.
    //
    // Reaching this point means the key was not found, so this is an insert;
    // before inserting we check whether the map needs to grow.
    // overLoadFactor reports whether the map holds too much data and needs a
    // doubling grow; tooManyOverflowBuckets reports whether a same-size grow
    // is needed: after many deletes a map can have lots of overflow buckets
    // but sparse data, and a same-size grow repacks the data to save space.
    // See the analysis of evacuate below.
    if !h.growing() && (overLoadFactor(h.count+1, h.B) || tooManyOverflowBuckets(h.noverflow, h.B)) {
        // hashGrow flips hmap.flags into the growing state, allocates the new
        // bucket array, and moves the old one to hmap.oldbuckets.
        hashGrow(t, h)
        goto again // Growing the table invalidates everything, so try again
    }

    // inserti == nil means there was no free cell: allocate an overflow bucket.
    if inserti == nil {
        // all current buckets are full, allocate a new one.
        newb := h.newoverflow(t, b)
        inserti = &newb.tophash[0]
        insertk = add(unsafe.Pointer(newb), dataOffset)
        elem = add(insertk, bucketCnt*uintptr(t.keysize))
    }

    // store new key/elem at insert position
    if t.indirectkey() {
        kmem := newobject(t.key)
        *(*unsafe.Pointer)(insertk) = kmem
        insertk = kmem
    }
    if t.indirectelem() {
        vmem := newobject(t.elem)
        *(*unsafe.Pointer)(elem) = vmem
    }
    typedmemmove(t.key, insertk, key)
    *inserti = top
    h.count++

done:
    // Clear the hashWriting flag and return the slot the caller writes the value into.
    if h.flags&hashWriting == 0 {
        throw("concurrent map writes")
    }
    h.flags &^= hashWriting
    if t.indirectelem() {
        elem = *((*unsafe.Pointer)(elem))
    }
    return elem
}

// growWork evacuates the old bucket corresponding to the bucket in use.
func growWork(t *maptype, h *hmap, bucket uintptr) {
    // make sure we evacuate the oldbucket corresponding
    // to the bucket we're about to use
    evacuate(t, h, bucket&h.oldbucketmask())

    // evacuate one more oldbucket to make progress on growing
    if h.growing() {
        evacuate(t, h, h.nevacuate)
    }
}

// evacuate migrates one old bucket into the new bucket array.
func evacuate(t *maptype, h *hmap, oldbucket uintptr) {
    // Address of the old bucket.
    b := (*bmap)(add(h.oldbuckets, oldbucket*uintptr(t.bucketsize)))
    // For a same-size grow, newbit is the bucket count; for a doubling grow,
    // newbit is the bucket count before doubling.
    newbit := h.noldbuckets()
    // Skip buckets that have already been evacuated.
    if !evacuated(b) {
        // TODO: reuse overflow buckets instead of using new ones, if there
        // is no iterator using the old buckets.  (If !oldIterator.)

        // xy contains the x and y (low and high) evacuation destinations.
        // The same hash can map to different bucket indexes in the old and new
        // arrays. For example, with hash 1001 (binary), 8 old buckets and 16 new
        // buckets, the old index is 1 but the new index is 9.
        // x is the destination when the index stays the same; y is the destination
        // when the index grows by newbit.
        var xy [2]evacDst
        x := &xy[0]
        x.b = (*bmap)(add(h.buckets, oldbucket*uintptr(t.bucketsize)))
        x.k = add(unsafe.Pointer(x.b), dataOffset)
        x.e = add(x.k, bucketCnt*uintptr(t.keysize))

        if !h.sameSizeGrow() {
            // Only calculate y pointers if we're growing bigger.
            // Otherwise GC can see bad pointers.
            y := &xy[1]
            y.b = (*bmap)(add(h.buckets, (oldbucket+newbit)*uintptr(t.bucketsize)))
            y.k = add(unsafe.Pointer(y.b), dataOffset)
            y.e = add(y.k, bucketCnt*uintptr(t.keysize))
        }

        for ; b != nil; b = b.overflow(t) {
            // Addresses of the first key and value in the bucket being evacuated.
            k := add(unsafe.Pointer(b), dataOffset)
            e := add(k, bucketCnt*uintptr(t.keysize))
            for i := 0; i < bucketCnt; i, k, e = i+1, add(k, uintptr(t.keysize)), add(e, uintptr(t.elemsize)) {
                top := b.tophash[i]
                // Skip empty cells; not copying them is what packs the data
                // tighter and saves memory.
                if isEmpty(top) {
                    b.tophash[i] = evacuatedEmpty
                    continue
                }
                if top < minTopHash {
                    throw("bad map state")
                }
                k2 := k
                if t.indirectkey() {
                    k2 = *((*unsafe.Pointer)(k2))
                }
                var useY uint8
                if !h.sameSizeGrow() {
                    // Compute hash to make our evacuation decision (whether we need
                    // to send this key/elem to bucket x or bucket y).
                    hash := t.hasher(k2, uintptr(h.hash0))
                    if h.flags&iterator != 0 && !t.reflexivekey() && !t.key.equal(k2, k2) {
                        // If key != key (NaNs), then the hash could be (and probably
                        // will be) entirely different from the old hash. Moreover,
                        // it isn't reproducible. Reproducibility is required in the
                        // presence of iterators, as our evacuation decision must
                        // match whatever decision the iterator made.
                        // Fortunately, we have the freedom to send these keys either
                        // way. Also, tophash is meaningless for these kinds of keys.
                        // We let the low bit of tophash drive the evacuation decision.
                        // We recompute a new random tophash for the next level so
                        // these keys will get evenly distributed across all buckets
                        // after multiple grows.
                        useY = top & 1
                        top = tophash(hash)
                    } else {
                        if hash&newbit != 0 {
                            useY = 1
                        }
                    }
                }

                // Sanity-check the evacuation state constants.
                if evacuatedX+1 != evacuatedY || evacuatedX^1 != evacuatedY {
                    throw("bad evacuatedN")
                }

                // Record the evacuation state in the old tophash.
                b.tophash[i] = evacuatedX + useY // evacuatedX + 1 == evacuatedY
                dst := &xy[useY]                 // evacuation destination

                // dst receives the key/value pairs migrated from this bucket chain;
                // once 8 live pairs have been moved in, dst allocates an overflow bucket.
                if dst.i == bucketCnt {
                    dst.b = h.newoverflow(t, dst.b)
                    dst.i = 0
                    dst.k = add(unsafe.Pointer(dst.b), dataOffset)
                    dst.e = add(dst.k, bucketCnt*uintptr(t.keysize))
                }
                dst.b.tophash[dst.i&(bucketCnt-1)] = top // mask dst.i as an optimization, to avoid a bounds check
                if t.indirectkey() {
                    *(*unsafe.Pointer)(dst.k) = k2 // copy pointer
                } else {
                    typedmemmove(t.key, dst.k, k) // copy elem
                }
                if t.indirectelem() {
                    *(*unsafe.Pointer)(dst.e) = *(*unsafe.Pointer)(e)
                } else {
                    typedmemmove(t.elem, dst.e, e)
                }
                dst.i++
                // These updates might push these pointers past the end of the
                // key or elem arrays.  That's ok, as we have the overflow pointer
                // at the end of the bucket to protect against pointing past the
                // end of the bucket.
                dst.k = add(dst.k, uintptr(t.keysize))
                dst.e = add(dst.e, uintptr(t.elemsize))
            }
        }
        // After the migration, clear the keys/values and the overflow pointer to
        // help GC; the tophash array is preserved because it records the
        // evacuation state.
        // Unlink the overflow buckets & clear key/elem to help GC.
        if h.flags&oldIterator == 0 && t.bucket.ptrdata != 0 {
            b := add(h.oldbuckets, oldbucket*uintptr(t.bucketsize))
            // Preserve b.tophash because the evacuation
            // state is maintained there.
            ptr := add(b, dataOffset)
            n := uintptr(t.bucketsize) - dataOffset
            memclrHasPointers(ptr, n)
        }
    }

    // Advance hmap.nevacuate.
    if oldbucket == h.nevacuate {
        advanceEvacuationMark(h, t, newbit)
    }
}

func advanceEvacuationMark(h *hmap, t *maptype, newbit uintptr) {
    h.nevacuate++
    // Experiments suggest that 1024 is overkill by at least an order of magnitude.
    // Put it in there as a safeguard anyway, to ensure O(1) behavior.
    stop := h.nevacuate + 1024
    if stop > newbit {
        stop = newbit
    }
    for h.nevacuate != stop && bucketEvacuated(t, h, h.nevacuate) {
        h.nevacuate++
    }
    if h.nevacuate == newbit { // newbit == # of oldbuckets
        // Growing is all done. Free old main bucket array.
        h.oldbuckets = nil
        // Can discard old overflow buckets as well.
        // If they are still referenced by an iterator,
        // then the iterator holds a pointers to the slice.
        if h.extra != nil {
            h.extra.oldoverflow = nil
        }
        h.flags &^= sameSizeGrow
    }
}

func hashGrow(t *maptype, h *hmap) {
    // If we've hit the load factor, get bigger.
    // Otherwise, there are too many overflow buckets,
    // so keep the same number of buckets and "grow" laterally.
    bigger := uint8(1)
    if !overLoadFactor(h.count+1, h.B) {
        bigger = 0
        h.flags |= sameSizeGrow
    }
    oldbuckets := h.buckets
    newbuckets, nextOverflow := makeBucketArray(t, h.B+bigger, nil)

    flags := h.flags &^ (iterator | oldIterator)
    if h.flags&iterator != 0 {
        flags |= oldIterator
    }
    // commit the grow (atomic wrt gc)
    h.B += bigger
    h.flags = flags
    h.oldbuckets = oldbuckets
    h.buckets = newbuckets
    h.nevacuate = 0
    h.noverflow = 0

    if h.extra != nil && h.extra.overflow != nil {
        // Promote current overflow buckets to the old generation.
        if h.extra.oldoverflow != nil {
            throw("oldoverflow is not nil")
        }
        h.extra.oldoverflow = h.extra.overflow
        h.extra.overflow = nil
    }
    if nextOverflow != nil {
        if h.extra == nil {
            h.extra = new(mapextra)
        }
        h.extra.nextOverflow = nextOverflow
    }

    // the actual copying of the hash table data is done incrementally
    // by growWork() and evacuate().
}

Summary:

  • mapassign first checks whether an equal key already exists; if so, the operation is an update;
  • if no equal key exists, the operation is an insert: the pair goes into the first free cell found, or into a newly allocated overflow bucket if the whole chain is full; an insert may trigger a grow;
  • a map does not grow in one step but incrementally: when an insert touches a bucket, the corresponding old bucket (with its overflow chain) is evacuated into the new array, plus one extra bucket per operation (my understanding is that this speeds the migration up; otherwise a bucket that is never touched would keep the map maintaining both old and new bucket arrays indefinitely);
  • growing comes in two forms, same-size and doubling: a same-size grow repacks sparse data more compactly (a map can become sparse after many deletes), while a doubling grow allocates twice the buckets when there is too much data, likewise dropping empty cells as it evacuates; a sketch of the same-size trigger follows this list.
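As referenced in the last bullet, here is a standalone re-implementation of the tooManyOverflowBuckets check that triggers a same-size grow, mirroring go1.15's runtime/map.go (treat it as an illustration, not the runtime itself):

package main

import "fmt"

// tooManyOverflowBuckets mirrors the go1.15 runtime check: "too many" means
// approximately as many overflow buckets as regular buckets.
func tooManyOverflowBuckets(noverflow uint16, B uint8) bool {
    if B > 15 {
        B = 15
    }
    return noverflow >= uint16(1)<<(B&15)
}

func main() {
    B := uint8(5) // 32 regular buckets
    for n := uint16(30); n <= 33; n++ {
        fmt.Printf("B=%d noverflow=%d => same-size grow: %v\n",
            B, n, tooManyOverflowBuckets(n, B))
    }
}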

3.4. Deleting elements

Sample code

package main

func main() {
    m1 := make(map[int8]int)
    m1[1] = 1
    delete(m1, 1)
}

Deleting an element calls mapdelete, defined in runtime/map.go.

func mapdelete(t *maptype, h *hmap, key unsafe.Pointer) {
    if raceenabled && h != nil {
        callerpc := getcallerpc()
        pc := funcPC(mapdelete)
        racewritepc(unsafe.Pointer(h), callerpc, pc)
        raceReadObjectPC(t.key, key, callerpc, pc)
    }
    if msanenabled && h != nil {
        msanread(key, t.key.size)
    }
    // Return immediately when h == nil || h.count == 0. If the key type is not
    // hashable, however, t.hasher is still called once so that the proper panic
    // ("runtime error: hash of unhashable type xxx") is raised;
    // see issue 23734: https://github.com/golang/go/issues/23734
    if h == nil || h.count == 0 {
        if t.hashMightPanic() {
            t.hasher(key, 0) // see issue 23734
        }
        return
    }
    // Maps do not support concurrent read/write.
    if h.flags&hashWriting != 0 {
        throw("concurrent map writes")
    }

    hash := t.hasher(key, uintptr(h.hash0))

    // Set hashWriting after calling t.hasher, since t.hasher may panic,
    // in which case we have not actually done a write (delete).
    h.flags ^= hashWriting

    // Bucket index.
    bucket := hash & bucketMask(h.B)
    // If the map is growing, push the migration forward first (growWork
    // evacuates up to two buckets per call).
    if h.growing() {
        growWork(t, h, bucket)
    }
    b := (*bmap)(add(h.buckets, bucket*uintptr(t.bucketsize)))
    bOrig := b
    top := tophash(hash)
search:
    for ; b != nil; b = b.overflow(t) {
        for i := uintptr(0); i < bucketCnt; i++ {
            if b.tophash[i] != top {
                // emptyRest means all later cells are empty, so stop searching.
                if b.tophash[i] == emptyRest {
                    break search
                }
                continue
            }
            k := add(unsafe.Pointer(b), dataOffset+i*uintptr(t.keysize))
            k2 := k
            if t.indirectkey() {
                k2 = *((*unsafe.Pointer)(k2))
            }
            if !t.key.equal(key, k2) {
                continue
            }
            // Found the pair: clear the key and the value.
            // Only clear key if there are pointers in it.
            if t.indirectkey() {
                *(*unsafe.Pointer)(k) = nil
            } else if t.key.ptrdata != 0 {
                memclrHasPointers(k, t.key.size)
            }
            e := add(unsafe.Pointer(b), dataOffset+bucketCnt*uintptr(t.keysize)+i*uintptr(t.elemsize))
            if t.indirectelem() {
                *(*unsafe.Pointer)(e) = nil
            } else if t.elem.ptrdata != 0 {
                memclrHasPointers(e, t.elem.size)
            } else {
                memclrNoHeapPointers(e, t.elem.size)
            }
            // After clearing the pair, mark the cell emptyOne first. If no occupied
            // cells follow it, the cell is upgraded to emptyRest below, which in
            // turn may let earlier cells become emptyRest as well.
            b.tophash[i] = emptyOne
            // If the bucket now ends in a bunch of emptyOne states,
            // change those to emptyRest states.
            // It would be nice to make this a separate function, but
            // for loops are not currently inlineable.

            // If this is the last cell of the bucket, look at the overflow bucket.
            if i == bucketCnt-1 {
                if b.overflow(t) != nil && b.overflow(t).tophash[0] != emptyRest {
                    goto notLast
                }
            } else {
                // If the next cell is not emptyRest, this cell must stay emptyOne.
                if b.tophash[i+1] != emptyRest {
                    goto notLast
                }
            }
            // Walk backwards converting trailing emptyOne cells to emptyRest;
            // marking this cell emptyRest may allow the previous cell (possibly
            // the last cell of the previous bucket in the chain) to follow suit.
            for {
                b.tophash[i] = emptyRest
                if i == 0 {
                    if b == bOrig {
                        break // beginning of initial bucket, we're done.
                    }
                    // Find previous bucket, continue at its last entry.
                    c := b
                    for b = bOrig; b.overflow(t) != c; b = b.overflow(t) {
                    }
                    i = bucketCnt - 1
                } else {
                    i--
                }
                if b.tophash[i] != emptyOne {
                    break
                }
            }
        notLast:
            h.count--
            break search
        }
    }

    if h.flags&hashWriting == 0 {
        throw("concurrent map writes")
    }
    h.flags &^= hashWriting
}

Summary:

  • deletes must not race with other map accesses either;
  • a delete also pushes an in-progress grow forward (up to two buckets per call); my understanding is that Go deliberately avoids migrating everything in one go, since that would cost a noticeable latency spike; spreading the work across user operations keeps the old buckets alive longer but keeps the map responsive, a reasonable trade-off;
  • a delete walks the tophash values backwards, turning trailing emptyOne cells into emptyRest; a sketch of delete's visible behavior follows this list.
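The early-return path of mapdelete (h == nil || h.count == 0) is easy to observe from user code. A minimal sketch of delete's visible behavior:

package main

import "fmt"

func main() {
    m := make(map[int8]int)
    m[1] = 1

    delete(m, 42)       // deleting a missing key is a no-op
    fmt.Println(len(m)) // 1

    var nilMap map[string]int
    delete(nilMap, "x") // mapdelete returns early when h == nil; no panic

    delete(m, 1)
    fmt.Println(len(m)) // 0; the bucket array itself is not freed or shrunk
}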

3.5. Iterating over a map

Sample code

package main

import "fmt"

func main() {
    m1 := make(map[int8]int)
    m1[1] = 1
    for k, v := range m1 {
        fmt.Println(k, v)
    }
}

Iterating over a map happens in two steps: mapiterinit initializes the iterator struct hiter, and then mapiternext is called repeatedly to yield key/value pairs. Below we first look at the hiter struct, then analyze mapiterinit and mapiternext; both functions live in runtime/map.go.
The hiter struct:

// A hash iteration structure.
// If you modify hiter, also change cmd/compile/internal/gc/reflect.go to indicate
// the layout of this structure.
type hiter struct {
    key         unsafe.Pointer // Must be in first position.  Write nil to indicate iteration end (see cmd/internal/gc/range.go).
    elem        unsafe.Pointer // Must be in second position (see cmd/internal/gc/range.go).
    t           *maptype
    h           *hmap
    buckets     unsafe.Pointer // bucket ptr at hash_iter initialization time
    bptr        *bmap          // current bucket
    overflow    *[]*bmap       // keeps overflow buckets of hmap.buckets alive
    oldoverflow *[]*bmap       // keeps overflow buckets of hmap.oldbuckets alive
    startBucket uintptr        // bucket iteration started at
    offset      uint8          // intra-bucket offset to start from during iteration (should be big enough to hold bucketCnt-1)
    wrapped     bool           // already wrapped around from end of bucket array to beginning
    B           uint8
    i           uint8
    bucket      uintptr
    checkBucket uintptr
}
  • key: pointer to the current key;
  • elem: pointer to the current value;
  • bptr: the bucket currently being iterated;
  • startBucket: index of the bucket where iteration started;
  • offset: the cell index to start from within each bucket;
  • wrapped: iteration starts at some bucket in the middle of the array and walks toward the end; when it reaches the end it wraps around to bucket 0, and this flag records that the wrap has happened;
  • i: current cell index;
  • bucket: current bucket index;
  • checkBucket: a bucket index whose contents need filtering. If the map was already growing when iteration started, the new bucket we want to iterate may be backed by an old bucket that has not been evacuated yet; in that case we iterate the old bucket instead. But the old bucket's contents will eventually split across two new buckets (for example, with B=1 the old buckets are b0 and b1; after doubling to B=2 the new buckets are b0..b3, and per the evacuation analysis above, old b0 splits into new b0 and b2). The iterator must return only the keys that will land in the new bucket currently being iterated, and checkBucket is that new bucket's index.

Source of the iteration functions:

// mapiterinit initializes the hiter struct used for ranging over maps.
// The hiter struct pointed to by 'it' is allocated on the stack
// by the compilers order pass or on the heap by reflect_mapiterinit.
// Both need to have zeroed hiter since the struct contains pointers.
func mapiterinit(t *maptype, h *hmap, it *hiter) {
    if raceenabled && h != nil {
        callerpc := getcallerpc()
        racereadpc(unsafe.Pointer(h), callerpc, funcPC(mapiterinit))
    }
    // Ranging over a nil or empty map is not an error; it simply yields nothing.
    if h == nil || h.count == 0 {
        return
    }

    if unsafe.Sizeof(hiter{})/sys.PtrSize != 12 {
        throw("hash_iter size incorrect") // see cmd/compile/internal/gc/reflect.go
    }
    it.t = t
    it.h = h

    // grab snapshot of bucket state
    it.B = h.B
    it.buckets = h.buckets
    if t.bucket.ptrdata == 0 {
        // Allocate the current slice and remember pointers to both current and old.
        // This preserves all relevant overflow buckets alive even if
        // the table grows and/or overflow buckets are added to the table
        // while we are iterating.
        h.createOverflow()
        it.overflow = h.extra.overflow
        it.oldoverflow = h.extra.oldoverflow
    }

    // The starting bucket and the starting cell within each bucket are both
    // derived from a random number; this is why iteration order is random.
    // decide where to start
    r := uintptr(fastrand())
    if h.B > 31-bucketCntBits {
        r += uintptr(fastrand()) << 31
    }
    it.startBucket = r & bucketMask(h.B)
    it.offset = uint8(r >> h.B & (bucketCnt - 1))

    // iterator state
    it.bucket = it.startBucket

    // Mark hmap as being iterated; this is an atomic update of the flags.
    // Remember we have an iterator.
    // Can run concurrently with another mapiterinit().
    if old := h.flags; old&(iterator|oldIterator) != iterator|oldIterator {
        atomic.Or8(&h.flags, iterator|oldIterator)
    }

    mapiternext(it)
}

func mapiternext(it *hiter) {
    h := it.h
    if raceenabled {
        callerpc := getcallerpc()
        racereadpc(unsafe.Pointer(h), callerpc, funcPC(mapiternext))
    }
    if h.flags&hashWriting != 0 {
        throw("concurrent map iteration and map write")
    }
    t := it.t
    bucket := it.bucket
    b := it.bptr
    i := it.i
    checkBucket := it.checkBucket

next:
    if b == nil {
        // bucket is the current bucket index and wrapped records whether we have
        // wrapped around to the beginning. Iteration starts from a bucket in the
        // middle, so reaching startBucket again with wrapped set means we are done.
        if bucket == it.startBucket && it.wrapped {
            // end of iteration
            it.key = nil
            it.elem = nil
            return
        }
        // The map may keep growing after iteration started, so h.B can differ from it.B.
        if h.growing() && it.B == h.B {
            // Iterator was started in the middle of a grow, and the grow isn't done yet.
            // If the bucket we're looking at hasn't been filled in yet (i.e. the old
            // bucket hasn't been evacuated) then we need to iterate through the old
            // bucket and only return the ones that will be migrated to this bucket.
            oldbucket := bucket & it.h.oldbucketmask()
            b = (*bmap)(add(h.oldbuckets, oldbucket*uintptr(t.bucketsize)))
            if !evacuated(b) {
                checkBucket = bucket
            } else {
                b = (*bmap)(add(it.buckets, bucket*uintptr(t.bucketsize)))
                checkBucket = noCheck
            }
        } else {
            b = (*bmap)(add(it.buckets, bucket*uintptr(t.bucketsize)))
            checkBucket = noCheck
        }
        bucket++
        // After the last bucket, wrap around and continue from bucket 0.
        if bucket == bucketShift(it.B) {
            bucket = 0
            it.wrapped = true
        }
        i = 0
    }
    for ; i < bucketCnt; i++ {
        offi := (i + it.offset) & (bucketCnt - 1)
        if isEmpty(b.tophash[offi]) || b.tophash[offi] == evacuatedEmpty {
            // TODO: emptyRest is hard to use here, as we start iterating
            // in the middle of a bucket. It's feasible, just tricky.
            continue
        }
        k := add(unsafe.Pointer(b), dataOffset+uintptr(offi)*uintptr(t.keysize))
        if t.indirectkey() {
            k = *((*unsafe.Pointer)(k))
        }
        e := add(unsafe.Pointer(b), dataOffset+bucketCnt*uintptr(t.keysize)+uintptr(offi)*uintptr(t.elemsize))
        if checkBucket != noCheck && !h.sameSizeGrow() {
            // Special case: iterator was started during a grow to a larger size
            // and the grow is not done yet. We're working on a bucket whose
            // oldbucket has not been evacuated yet. Or at least, it wasn't
            // evacuated when we started the bucket. So we're iterating
            // through the oldbucket, skipping any keys that will go
            // to the other new bucket (each oldbucket expands to two
            // buckets during a grow).
            if t.reflexivekey() || t.key.equal(k, k) {
                // If the item in the oldbucket is not destined for
                // the current new bucket in the iteration, skip it.
                hash := t.hasher(k, uintptr(h.hash0))
                // Skip keys that will not migrate into the bucket being iterated.
                if hash&bucketMask(it.B) != checkBucket {
                    continue
                }
            } else {
                // Hash isn't repeatable if k != k (NaNs).  We need a
                // repeatable and randomish choice of which direction
                // to send NaNs during evacuation. We'll use the low
                // bit of tophash to decide which way NaNs go.
                // NOTE: this case is why we need two evacuate tophash
                // values, evacuatedX and evacuatedY, that differ in
                // their low bit.
                if checkBucket>>(it.B-1) != uintptr(b.tophash[offi]&1) {
                    continue
                }
            }
        }
        if (b.tophash[offi] != evacuatedX && b.tophash[offi] != evacuatedY) ||
            !(t.reflexivekey() || t.key.equal(k, k)) {
            // This is the golden data, we can return it.
            // OR
            // key!=key, so the entry can't be deleted or updated, so we can just return it.
            // That's lucky for us because when key!=key we can't look it up successfully.
            it.key = k
            if t.indirectelem() {
                e = *((*unsafe.Pointer)(e))
            }
            it.elem = e
        } else {
            // The hash table has grown since the iterator was started.
            // The golden data for this key is now somewhere else.
            // Check the current hash table for the data.
            // This code handles the case where the key
            // has been deleted, updated, or deleted and reinserted.
            // NOTE: we need to regrab the key as it has potentially been
            // updated to an equal() but not identical key (e.g. +0.0 vs -0.0).
            rk, re := mapaccessK(t, h, k)
            if rk == nil {
                continue // key has been deleted
            }
            it.key = rk
            it.elem = re
        }
        it.bucket = bucket
        if it.bptr != b { // avoid unnecessary write barrier; see issue 14921
            it.bptr = b
        }
        it.i = i + 1
        it.checkBucket = checkBucket
        return
    }
    b = b.overflow(t)
    i = 0
    goto next
}

// returns both key and elem. Used by map iterator
func mapaccessK(t *maptype, h *hmap, key unsafe.Pointer) (unsafe.Pointer, unsafe.Pointer) {
    if h == nil || h.count == 0 {
        return nil, nil
    }
    hash := t.hasher(key, uintptr(h.hash0))
    m := bucketMask(h.B)
    b := (*bmap)(unsafe.Pointer(uintptr(h.buckets) + (hash&m)*uintptr(t.bucketsize)))
    if c := h.oldbuckets; c != nil {
        if !h.sameSizeGrow() {
            // There used to be half as many buckets; mask down one more power of two.
            m >>= 1
        }
        oldb := (*bmap)(unsafe.Pointer(uintptr(c) + (hash&m)*uintptr(t.bucketsize)))
        if !evacuated(oldb) {
            b = oldb
        }
    }
    top := tophash(hash)
bucketloop:
    for ; b != nil; b = b.overflow(t) {
        for i := uintptr(0); i < bucketCnt; i++ {
            if b.tophash[i] != top {
                if b.tophash[i] == emptyRest {
                    break bucketloop
                }
                continue
            }
            k := add(unsafe.Pointer(b), dataOffset+i*uintptr(t.keysize))
            if t.indirectkey() {
                k = *((*unsafe.Pointer)(k))
            }
            if t.key.equal(key, k) {
                e := add(unsafe.Pointer(b), dataOffset+bucketCnt*uintptr(t.keysize)+i*uintptr(t.elemsize))
                if t.indirectelem() {
                    e = *((*unsafe.Pointer)(e))
                }
                return k, e
            }
        }
    }
    return nil, nil
}

Summary:

  • iteration first initializes the hiter iterator via mapiterinit, then repeatedly calls mapiternext;
  • both the starting bucket and the starting cell within each bucket are random (the sketch below makes this visible);
  • if the map is in the middle of a grow when iteration starts, iteration behaves as if that grow had already finished: a new bucket may not have received its data yet, but the iterator still returns the keys that will eventually migrate into it.
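A minimal sketch showing the randomized iteration order (the exact output naturally differs between runs, which is the point):

package main

import "fmt"

func main() {
    m := map[int]string{1: "a", 2: "b", 3: "c", 4: "d", 5: "e", 6: "f", 7: "g", 8: "h", 9: "i"}
    // Two identical range loops: because mapiterinit derives startBucket and
    // offset from fastrand(), the two lines usually print different orders.
    for run := 0; run < 2; run++ {
        for k := range m {
            fmt.Print(k, " ")
        }
        fmt.Println()
    }
}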

4. Miscellaneous

  • How to see which map functions are actually called

    • Prepare a test program

      package main

      import (
          "fmt"
      )

      func main() {
          m1 := make(map[string]string, 9)
          fmt.Println(m1)
          for i := 0; i < 20; i++ {
              str := fmt.Sprintf("%d", i)
              m1[str] = str
          }
          a := m1["0"]
          b, ok := m1["0"]
          fmt.Println(a, b, ok)
      }
    • Dump the assembly with:

      go tool compile -N -l -S main.go > main.txt
    • In the assembly, search for map calls such as CALL runtime.makemap(SB); a grep sketch follows.
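      Since the keys in the test program are strings, the assign and lookup calls should be the _faststr variants from runtime/map_faststr.go. A quick way to list them (the exact set of calls varies with the Go version, so treat the expected list as an assumption):

      grep 'CALL' main.txt | grep 'runtime\.'
      # expected calls include, roughly:
      #   runtime.makemap(SB)
      #   runtime.mapassign_faststr(SB)
      #   runtime.mapaccess1_faststr(SB)
      #   runtime.mapaccess2_faststr(SB)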
