关于golang:Go源码学习map

3次阅读

共计 28361 个字符,预计需要花费 71 分钟才能阅读完成。

1. 前言

map 是 CS 中十分根底的数据结构,对于 golang map 的根本应用,这里不再赘述,能够参考官网文档。
golang 的 map 实现是基于 hash 查找表,并且基于链表来解决 hash 碰撞问题。

2. 环境信息

  • go 版本:go1.15.4 darwin/amd64

3. go map 数据结构剖析

map 的根底构造体是 hmap,该构造体存在文件runtime/map.go
hmap源码:

// A header for a Go map.
type hmap struct {
    // Note: the format of the hmap is also encoded in cmd/compile/internal/gc/reflect.go.
    // Make sure this stays in sync with the compiler's definition.
    count     int // # live cells == size of map.  Must be first (used by len() builtin)
    flags     uint8
    B         uint8  // log_2 of # of buckets (can hold up to loadFactor * 2^B items)
    noverflow uint16 // approximate number of overflow buckets; see incrnoverflow for details
    hash0     uint32 // hash seed
    buckets    unsafe.Pointer // array of 2^B Buckets. may be nil if count==0.
    oldbuckets unsafe.Pointer // previous bucket array of half the size, non-nil only when growing
    nevacuate  uintptr        // progress counter for evacuation (buckets less than this have been evacuated)
    extra *mapextra // optional fields
}

// mapextra holds fields that are not present on all maps.
type mapextra struct {
    // If both key and elem do not contain pointers and are inline, then we mark bucket
    // type as containing no pointers. This avoids scanning such maps.
    // However, bmap.overflow is a pointer. In order to keep overflow buckets
    // alive, we store pointers to all overflow buckets in hmap.extra.overflow and hmap.extra.oldoverflow.
    // overflow and oldoverflow are only used if key and elem do not contain pointers.
    // overflow contains overflow buckets for hmap.buckets.
    // oldoverflow contains overflow buckets for hmap.oldbuckets.
    // The indirection allows to store a pointer to the slice in hiter.
    overflow    *[]*bmap
    oldoverflow *[]*bmap

    // nextOverflow holds a pointer to a free overflow bucket.
    nextOverflow *bmap
}
  • count:map 中 kv 对的数量;
  • flags:map 的一些标记位;
  • B:map 中 bucket 数量为 2^B 个;意味着此时 map 数据结构中能够存储 loadFactor * 2^B 个数据,如果超过,则须要扩容;todo
  • noverflow:map 中溢出 bucket 的近似数量;todo
  • hash0:hash 函数的种子;
  • buckets:map 中 bucket 的首指针,map 中一共有 2^B 个 bucket;如果 count== 0 的状况下,该字段可能是 nil;
  • oldbuckets:map 中旧 bucket 的首指针,该字段只有在 map 扩容的时候,才不等于 nil;todo
  • nevacuate:map 中 bucket 迁徙数量,至少有此数量的 bucket 从旧 bucket 迁徙到新 bucket;todo
  • extra:扩大字段;

bmap是 bucket 真正的构造体

// A bucket for a Go map.
type bmap struct {
    // tophash generally contains the top byte of the hash value
    // for each key in this bucket. If tophash[0] < minTopHash,
    // tophash[0] is a bucket evacuation state instead.
    tophash [bucketCnt]uint8
    // Followed by bucketCnt keys and then bucketCnt elems.
    // NOTE: packing all the keys together and then all the elems together makes the
    // code a bit more complicated than alternating key/elem/key/elem/... but it allows
    // us to eliminate padding which would be needed for, e.g., map[int64]int8.
    // Followed by an overflow pointer.
}
  • tophash:存储 hash 值的高 8 位;
  • keys:key 数组,暗藏字段;
  • values:value 数组,暗藏字段;
  • overflow:溢出 buceket 指针,暗藏字段;

bmap.tophash中除了存储 hash 值的高 8 位,也能够用来存储一些状态码。

    // Possible tophash values. We reserve a few possibilities for special marks.
    // Each bucket (including its overflow buckets, if any) will have either all or none of its
    // entries in the evacuated* states (except during the evacuate() method, which only happens
    // during map writes and thus no one else can observe the map during that time).
    emptyRest      = 0 // this cell is empty, and there are no more non-empty cells at higher indexes or overflows.
    emptyOne       = 1 // this cell is empty
    evacuatedX     = 2 // key/elem is valid.  Entry has been evacuated to first half of larger table.
    evacuatedY     = 3 // same as above, but evacuated to second half of larger table.
    evacuatedEmpty = 4 // cell is empty, bucket is evacuated.
    minTopHash     = 5 // minimum tophash for a normal filled cell.
bmap 结构图

hmap 结构图

上面咱们重点剖析一下 map 的创立和增删改查操作,咱们会展现源码,同时在源码上减少中文正文,作为对源码的剖析;golang 编译器会依据不同状况,调用不同的函数,咱们上面剖析的是 runtime/map.go 文件中的根本函数;一些其余优化函数,例如 runtime/map_faststr.go 中对 map[string]type 类型的优化,感兴趣的同学能够自行查看。

3.1. map 创立

示例代码

func main() {m1 := make(map[string]string)
    m2 := make(map[string]string, 9)
}

咱们能够通过汇编编译代码看到 go map 创立调用的底层函数是 makemap,该函数存在文件runtime/map.go 中;事实上,不同的 map 申明形式,go 规范编译器抉择不同的函数调用,例如 m1 := make(map[string]string) 代码,编译器会调用函数runtime.makemap_small,然而大部分场景下都是调用makemap。上面咱们剖析下函数makemap

// makemap implements Go map creation for make(map[k]v, hint).
// If the compiler has determined that the map or the first bucket
// can be created on the stack, h and/or bucket may be non-nil.
// If h != nil, the map can be created directly in h.
// If h.buckets != nil, bucket pointed to can be used as the first bucket.
func makemap(t *maptype, hint int, h *hmap) *hmap {
    // 查看申请的 map 空间是否超过内存限度
    mem, overflow := math.MulUintptr(uintptr(hint), t.bucket.size)
    if overflow || mem > maxAlloc {hint = 0}
    // 初始化 hmap
    // initialize Hmap
    if h == nil {h = new(hmap)
    }
    // hash 初始种子
    h.hash0 = fastrand()
    // 计算 B
    // Find the size parameter B which will hold the requested # of elements.
    // For hint < 0 overLoadFactor returns false since hint < bucketCnt.
    B := uint8(0)
    for overLoadFactor(hint, B) {B++}
    h.B = B

    // allocate initial hash table
    // if B == 0, the buckets field is allocated lazily later (in mapassign)
    // If hint is large zeroing this memory could take a while.
    if h.B != 0 {
        var nextOverflow *bmap
        // 调用函数 makeBucketArray,调配 bucket 和溢出 bucket 的内存
        h.buckets, nextOverflow = makeBucketArray(t, h.B, nil)
        if nextOverflow != nil {h.extra = new(mapextra)
            h.extra.nextOverflow = nextOverflow
        }
    }

    return h
}

// makeBucketArray initializes a backing array for map buckets.
// 1<<b is the minimum number of buckets to allocate.
// dirtyalloc should either be nil or a bucket array previously
// allocated by makeBucketArray with the same t and b parameters.
// If dirtyalloc is nil a new backing array will be alloced and
// otherwise dirtyalloc will be cleared and reused as backing array.
func makeBucketArray(t *maptype, b uint8, dirtyalloc unsafe.Pointer) (buckets unsafe.Pointer, nextOverflow *bmap) {base := bucketShift(b)
    nbuckets := base
    // 如果 b >= 4,则示意申请的 map 空间较大,咱们当时申请一些溢出 bucket,这样能够提高效率
    // For small b, overflow buckets are unlikely.
    // Avoid the overhead of the calculation.
    if b >= 4 {
        // Add on the estimated number of overflow buckets
        // required to insert the median number of elements
        // used with this value of b.
        nbuckets += bucketShift(b - 4)
        sz := t.bucket.size * nbuckets
        up := roundupsize(sz)
        if up != sz {nbuckets = up / t.bucket.size}
    }

    if dirtyalloc == nil {buckets = newarray(t.bucket, int(nbuckets))
    } else {
        // dirtyalloc was previously generated by
        // the above newarray(t.bucket, int(nbuckets))
        // but may not be empty.
        buckets = dirtyalloc
        size := t.bucket.size * nbuckets
        if t.bucket.ptrdata != 0 {memclrHasPointers(buckets, size)
        } else {memclrNoHeapPointers(buckets, size)
        }
    }

    if base != nbuckets {
        // We preallocated some overflow buckets.
        // To keep the overhead of tracking these overflow buckets to a minimum,
        // we use the convention that if a preallocated overflow bucket's overflow
        // pointer is nil, then there are more available by bumping the pointer.
        // We need a safe non-nil pointer for the last overflow bucket; just use buckets.
        // nextOverflow 是溢出 bucket 的首地址;// last 是最初一个溢出 bucket 的首地址;// 每个溢出 bucket 对应的 bmap 构造体中的溢出 bucket 指针都是 nil;然而 last 的溢出 bucket 指针是 bucket 的起始地址;nextOverflow = (*bmap)(add(buckets, base*uintptr(t.bucketsize)))
        last := (*bmap)(add(buckets, (nbuckets-1)*uintptr(t.bucketsize)))
        last.setoverflow(t, (*bmap)(buckets))
    }
    return buckets, nextOverflow
}

总结:

  • 函数 makemap 会依据不同的申明形式和参数,决定 map 的初始化空间大小;
  • map 中 kv 都存储在 bucket 中,每个 bucket 能够存 8 对 kv;
  • 如果len(map) > 0,则 map 中至多存在一个 bucket,所以 map 的空间都是 8 的整数倍;
  • 如果 map 申请空间较大(大于等于 128),示意呈现 key 碰撞的几率较大,则会当时创立一些溢出 bucket,以备前期应用;

3.2. map 查找元素

示例代码

func main() {m1 := make(map[int8]int)
    m1[1] = 1
    v, ok := m1[1]
    fmt.Println(v, ok)
}

map 查找元素操作,底层调用的函数 mapaccess1mapaccess2,该函数存在文件 runtime/map.go 中;这两个函数基本一致,只是函数 mapaccess2 会返回 bool 类型,示意 key 是否存在。事实上,对于不同的 map key 类型,编译器会调用不同的函数来实现 map 的增删改查操作,其中针对非凡 key 类型的优化函数,存在文件 runtime/map_fast32.goruntime/map_fast64.goruntime/map_faststr.go中;例如,如果 key 的类型是 string,map 的查找操作会调用优化函数 mapaccess1_faststrmapaccess2_faststr。本文只剖析根本的函数,对于优化函数,感兴趣的同学能够自行查看源码。上面咱们剖析函数mapaccess2

func mapaccess2(t *maptype, h *hmap, key unsafe.Pointer) (unsafe.Pointer, bool) {
    // 启用数据竞争检测
    if raceenabled && h != nil {callerpc := getcallerpc()
        pc := funcPC(mapaccess2)
        racereadpc(unsafe.Pointer(h), callerpc, pc)
        raceReadObjectPC(t.key, key, callerpc, pc)
    }
    // 启用 -msan 检测
    if msanenabled && h != nil {msanread(key, t.key.size)
    }
    if h == nil || h.count == 0 {if t.hashMightPanic() {t.hasher(key, 0) // see issue 23734
        }
        return unsafe.Pointer(&zeroVal[0]), false
    }
    // map 不反对并发平安,并发读写会产生 panic
    if h.flags&hashWriting != 0 {throw("concurrent map read and map write")
    }
    // 计算 hash 值
    hash := t.hasher(key, uintptr(h.hash0))
    // m 示意 map 中 bucket 数量
    m := bucketMask(h.B)
    // 利用 `hash mod m` 能够计算 bucket 索引,b 示意对应 bucket 的首地址
    b := (*bmap)(unsafe.Pointer(uintptr(h.buckets) + (hash&m)*uintptr(t.bucketsize)))
    // map 正在迁徙的场景,如果 map 正在迁徙,则优先从 oldbuckets 中查找 kv
    if c := h.oldbuckets; c != nil {
        // map 是否在扩容迁徙,如果是扩容迁徙,则 oldbuckets 理论的 bucket 数量是 m 的一半(扩容会让 bucket 数量增加一倍)if !h.sameSizeGrow() {
            // There used to be half as many buckets; mask down one more power of two.
            m >>= 1
        }
        // 依据 hash 值,查找 oldbuckets 中对应的 bucket 地址
        oldb := (*bmap)(unsafe.Pointer(uintptr(c) + (hash&m)*uintptr(t.bucketsize)))
        // 如果 oldb 的标记位不是撤退状态,则咱们从 oldb 中查找 kv
        if !evacuated(oldb) {b = oldb}
    }
    // top 示意 hash 的高 8 位,如果 hash 高 8 位小于 5,则 top 须要加上 5;因为 5 示意 `minTopHash`,top 如果是小于等于 5,都是示意非凡状态;失常的 key 的 top 值都是大于 5 的
    top := tophash(hash)
bucketloop:
    // 一一查找对应 bucket 和其溢出 bucket
    for ; b != nil; b = b.overflow(t) {
        // 一个 bucket 有 8 对 kv,一一查找
        for i := uintptr(0); i < bucketCnt; i++ {if b.tophash[i] != top {// 如果 b.tophash[i] == emptyRest,示意剩下的 kv 对都是空的,所以间接跳出循环
                if b.tophash[i] == emptyRest {break bucketloop}
                continue
            }
            // 查找对应的 key 的地址
            k := add(unsafe.Pointer(b), dataOffset+i*uintptr(t.keysize))
            if t.indirectkey() {k = *((*unsafe.Pointer)(k))
            }
            // 比拟 key 是否相等 
            if t.key.equal(key, k) {
               // 如果 key 相等,则找到对应的 value 
                e := add(unsafe.Pointer(b), dataOffset+bucketCnt*uintptr(t.keysize)+i*uintptr(t.elemsize))
                if t.indirectelem() {e = *((*unsafe.Pointer)(e))
                }
               // 返回 value
                return e, true
            }
        }
    }
    // 返回对应的 0 值
    return unsafe.Pointer(&zeroVal[0]), false
}

总结:

  • 判断是否并发读写,如果是,则抛出 panic;
  • 计算 hash 值,依据 hash 的位置找到对应的 bucket,依据高 8 位,找到对应的 kv 槽位;
  • map 迁徙场景下,优先从 oldbuckets 中查找 kv;
  • 比拟 key,相等则返回 value,不等则返回 0 值;
  • map kv 定位过程如下图:

3.4. map 新增元素和更新元素

示例代码

func main() {m1 := make(map[int8]int)
    m1[1] = 1
    m1[2] = 2
    m1[1] = 11
    fmt.Println(m1)
}

map 的新增和更新元素操作,都会调用函数 mapassign,该函数存在文件runtime/map.go 中。

// Like mapaccess, but allocates a slot for the key if it is not present in the map.
func mapassign(t *maptype, h *hmap, key unsafe.Pointer) unsafe.Pointer {
    if h == nil {panic(plainError("assignment to entry in nil map"))
    }
    if raceenabled {callerpc := getcallerpc()
        pc := funcPC(mapassign)
        racewritepc(unsafe.Pointer(h), callerpc, pc)
        raceReadObjectPC(t.key, key, callerpc, pc)
    }
    if msanenabled {msanread(key, t.key.size)
    }
    // map 不反对并发读写
    if h.flags&hashWriting != 0 {throw("concurrent map writes")
    }
    // 计算 hash 值 
    hash := t.hasher(key, uintptr(h.hash0))
    // map 状态设置为 hashWriting
    // Set hashWriting after calling t.hasher, since t.hasher may panic,
    // in which case we have not actually done a write.
    h.flags ^= hashWriting
    // 如果 map 没有初始化 bucket,此时会申请 bucket 空间
    if h.buckets == nil {h.buckets = newobject(t.bucket) // newarray(t.bucket, 1)
    }

again:
    // 依据 hash 值,计算 bucket 索引
    bucket := hash & bucketMask(h.B)
    // 判断 map 是否正在扩容
    if h.growing() {
        // 函数 growWork 是将 hmp.oldbuckets 中对应的 bucket 迁徙到新的 buckets 中
        growWork(t, h, bucket)
    }
    // 指标 bucket 的首地址 
    b := (*bmap)(unsafe.Pointer(uintptr(h.buckets) + bucket*uintptr(t.bucketsize)))
    top := tophash(hash)

    var inserti *uint8
    var insertk unsafe.Pointer
    var elem unsafe.Pointer
bucketloop:
    for {
        // 遍历 tophash 查找 key 是否曾经存在,或者是否有空位插入 kv
        for i := uintptr(0); i < bucketCnt; i++ {if b.tophash[i] != top {
                // tophash 中可能有多个空位,咱们记录第一个空位的索引,前面的空位跳过
                if isEmpty(b.tophash[i]) && inserti == nil {inserti = &b.tophash[i]
                    insertk = add(unsafe.Pointer(b), dataOffset+i*uintptr(t.keysize))
                    elem = add(unsafe.Pointer(b), dataOffset+bucketCnt*uintptr(t.keysize)+i*uintptr(t.elemsize))
                }
                // tophash 值示意残余都是空位,则间接完结循环,因为前面全是空位,不会有雷同的 key 在前面的槽位,此次操作必然是插入,而不是更新
                if b.tophash[i] == emptyRest {break bucketloop}
                continue
            }
            k := add(unsafe.Pointer(b), dataOffset+i*uintptr(t.keysize))
            if t.indirectkey() {k = *((*unsafe.Pointer)(k))
            }
            if !t.key.equal(key, k) {continue}
            // already have a mapping for key. Update it.
            if t.needkeyupdate() {typedmemmove(t.key, k, key)
            }
            elem = add(unsafe.Pointer(b), dataOffset+bucketCnt*uintptr(t.keysize)+i*uintptr(t.elemsize))
            goto done
        }
        // 如果 bucket 是满的,而且没有发现雷同的 key,则持续查找溢出 bucket
        ovf := b.overflow(t)
        if ovf == nil {break}
        b = ovf
    }

    // Did not find mapping for key. Allocate new cell & add entry.

    // If we hit the max load factor or we have too many overflow buckets,
    // and we're not already in the middle of growing, start growing.
    // 程序运行到此处,必然是因为没有找到雷同的 key,此次操作是插入,不是更新;// 插入一对 kv,咱们须要判断 map 是否须要扩容;// overLoadFactor 函数用来判断 map 是否因为数据太多,须要增量 1 倍扩容;// tooManyOverflowBuckets 函数用来判断 map 是否须要等量迁徙,map 因为删除操作,溢出 bucket 很多,然而数据分布很稠密,咱们能够通过等量迁徙,将数据更加紧凑的存储在一起,节约空间;// 具体能够看 evacuate 函数剖析;if !h.growing() && (overLoadFactor(h.count+1, h.B) || tooManyOverflowBuckets(h.noverflow, h.B)) {
        //  hashGrow 函数次要是设置 hmap.flags 为扩容状态,申请新的内存空间用来扩容,同时设置 hmap.oldbuckets 为原来的 hmap.buckets
        hashGrow(t, h)
        goto again // Growing the table invalidates everything, so try again
    }
    // inserti == nil 示意没有插入的槽位,须要申请溢出 bucket
    if inserti == nil {
        // all current buckets are full, allocate a new one.
        newb := h.newoverflow(t, b)
        inserti = &newb.tophash[0]
        insertk = add(unsafe.Pointer(newb), dataOffset)
        elem = add(insertk, bucketCnt*uintptr(t.keysize))
    }

    // store new key/elem at insert position
    if t.indirectkey() {kmem := newobject(t.key)
        *(*unsafe.Pointer)(insertk) = kmem
        insertk = kmem
    }
    if t.indirectelem() {vmem := newobject(t.elem)
        *(*unsafe.Pointer)(elem) = vmem
    }
    typedmemmove(t.key, insertk, key)
    *inserti = top
    h.count++

done:
    // 设置 flags,并写入 value
    if h.flags&hashWriting == 0 {throw("concurrent map writes")
    }
    h.flags &^= hashWriting
    if t.indirectelem() {elem = *((*unsafe.Pointer)(elem))
    }
    return elem
}
// 迁徙 oldbucket 中的对应 bucket
func growWork(t *maptype, h *hmap, bucket uintptr) {
    // make sure we evacuate the oldbucket corresponding
    // to the bucket we're about to use
    evacuate(t, h, bucket&h.oldbucketmask())

    // evacuate one more oldbucket to make progress on growing
    if h.growing() {evacuate(t, h, h.nevacuate)
    }
}
// bucket 迁徙函数
func evacuate(t *maptype, h *hmap, oldbucket uintptr) {
    // old bucket 索引
    b := (*bmap)(add(h.oldbuckets, oldbucket*uintptr(t.bucketsize)))
    // 如果是等量迁徙,则 newbit 示意 bucket 数量;如果是增量迁徙,newbit 示意增量前的 bucket 数量;newbit := h.noldbuckets()
    // 待迁徙 bucket 是否是迁徙状态
    if !evacuated(b) {
        // TODO: reuse overflow buckets instead of using new ones, if there
        // is no iterator using the old buckets.  (If !oldIterator.)

        // xy contains the x and y (low and high) evacuation destinations.
        // 同一个 hash 值,在新旧 buckets 中对应的 bucket 索引可能是不一样的;// 例如 hash 值是 1001,旧 buckets 数量是 8,新 buckets 数量是 16,那么该 hash 值在旧 buckets 中索引是 1,新 buckets 中索引是 9;// x 示意新旧索引不变的状况下,新 bucket 的索引;y 示意新索引减少 newbit 状况下,新 bucket 的索引;var xy [2]evacDst
        x := &xy[0]
        x.b = (*bmap)(add(h.buckets, oldbucket*uintptr(t.bucketsize)))
        x.k = add(unsafe.Pointer(x.b), dataOffset)
        x.e = add(x.k, bucketCnt*uintptr(t.keysize))

        if !h.sameSizeGrow() {
            // Only calculate y pointers if we're growing bigger.
            // Otherwise GC can see bad pointers.
            y := &xy[1]
            y.b = (*bmap)(add(h.buckets, (oldbucket+newbit)*uintptr(t.bucketsize)))
            y.k = add(unsafe.Pointer(y.b), dataOffset)
            y.e = add(y.k, bucketCnt*uintptr(t.keysize))
        }

        for ; b != nil; b = b.overflow(t) {
            // 待迁徙 bucket 中 kv 的首地址
            k := add(unsafe.Pointer(b), dataOffset)
            e := add(k, bucketCnt*uintptr(t.keysize))
            for i := 0; i < bucketCnt; i, k, e = i+1, add(k, uintptr(t.keysize)), add(e, uintptr(t.elemsize)) {top := b.tophash[i]
                // 如果 tophash 为空,则跳过,这样就能够让数据紧凑,节约内存空间;if isEmpty(top) {b.tophash[i] = evacuatedEmpty
                    continue
                }
                if top < minTopHash {throw("bad map state")
                }
                k2 := k
                if t.indirectkey() {k2 = *((*unsafe.Pointer)(k2))
                }
                var useY uint8
                if !h.sameSizeGrow() {
                    // Compute hash to make our evacuation decision (whether we need
                    // to send this key/elem to bucket x or bucket y).
                    hash := t.hasher(k2, uintptr(h.hash0))
                    if h.flags&iterator != 0 && !t.reflexivekey() && !t.key.equal(k2, k2) {// If key != key (NaNs), then the hash could be (and probably
                        // will be) entirely different from the old hash. Moreover,
                        // it isn't reproducible. Reproducibility is required in the
                        // presence of iterators, as our evacuation decision must
                        // match whatever decision the iterator made.
                        // Fortunately, we have the freedom to send these keys either
                        // way. Also, tophash is meaningless for these kinds of keys.
                        // We let the low bit of tophash drive the evacuation decision.
                        // We recompute a new random tophash for the next level so
                        // these keys will get evenly distributed across all buckets
                        // after multiple grows.
                        useY = top & 1
                        top = tophash(hash)
                    } else {
                        if hash&newbit != 0 {useY = 1}
                    }
                }
                // 查看迁徙状态
                if evacuatedX+1 != evacuatedY || evacuatedX^1 != evacuatedY {throw("bad evacuatedN")
                }
                // 设置 tophash 值
                b.tophash[i] = evacuatedX + useY // evacuatedX + 1 == evacuatedY
                dst := &xy[useY]                 // evacuation destination
                // dst 用来接管迁徙的 bucket(包含溢出 bucket)中的 kv;// 迁徙过去的无效 kv 数量达到 8 之后,dst 会申请溢出 bucket;if dst.i == bucketCnt {dst.b = h.newoverflow(t, dst.b)
                    dst.i = 0
                    dst.k = add(unsafe.Pointer(dst.b), dataOffset)
                    dst.e = add(dst.k, bucketCnt*uintptr(t.keysize))
                }
                dst.b.tophash[dst.i&(bucketCnt-1)] = top // mask dst.i as an optimization, to avoid a bounds check
                if t.indirectkey() {*(*unsafe.Pointer)(dst.k) = k2 // copy pointer
                } else {typedmemmove(t.key, dst.k, k) // copy elem
                }
                if t.indirectelem() {*(*unsafe.Pointer)(dst.e) = *(*unsafe.Pointer)(e)
                } else {typedmemmove(t.elem, dst.e, e)
                }
                dst.i++
                // These updates might push these pointers past the end of the
                // key or elem arrays.  That's ok, as we have the overflow pointer
                // at the end of the bucket to protect against pointing past the
                // end of the bucket.
                dst.k = add(dst.k, uintptr(t.keysize))
                dst.e = add(dst.e, uintptr(t.elemsize))
            }
        }
        // 迁徙实现后,清理 bucket kv 和溢出 bucket 的指针;保留 tophash;
        // Unlink the overflow buckets & clear key/elem to help GC.
        if h.flags&oldIterator == 0 && t.bucket.ptrdata != 0 {b := add(h.oldbuckets, oldbucket*uintptr(t.bucketsize))
            // Preserve b.tophash because the evacuation
            // state is maintained there.
            ptr := add(b, dataOffset)
            n := uintptr(t.bucketsize) - dataOffset
            memclrHasPointers(ptr, n)
        }
    }
    //  hmap.nevacuate 累加
    if oldbucket == h.nevacuate {advanceEvacuationMark(h, t, newbit)
    }
}

func advanceEvacuationMark(h *hmap, t *maptype, newbit uintptr) {
    h.nevacuate++
    // Experiments suggest that 1024 is overkill by at least an order of magnitude.
    // Put it in there as a safeguard anyway, to ensure O(1) behavior.
    stop := h.nevacuate + 1024
    if stop > newbit {stop = newbit}
    for h.nevacuate != stop && bucketEvacuated(t, h, h.nevacuate) {h.nevacuate++}
    if h.nevacuate == newbit { // newbit == # of oldbuckets
        // Growing is all done. Free old main bucket array.
        h.oldbuckets = nil
        // Can discard old overflow buckets as well.
        // If they are still referenced by an iterator,
        // then the iterator holds a pointers to the slice.
        if h.extra != nil {h.extra.oldoverflow = nil}
        h.flags &^= sameSizeGrow
    }
}

func hashGrow(t *maptype, h *hmap) {
    // If we've hit the load factor, get bigger.
    // Otherwise, there are too many overflow buckets,
    // so keep the same number of buckets and "grow" laterally.
    bigger := uint8(1)
    if !overLoadFactor(h.count+1, h.B) {
        bigger = 0
        h.flags |= sameSizeGrow
    }
    oldbuckets := h.buckets
    newbuckets, nextOverflow := makeBucketArray(t, h.B+bigger, nil)

    flags := h.flags &^ (iterator | oldIterator)
    if h.flags&iterator != 0 {flags |= oldIterator}
    // commit the grow (atomic wrt gc)
    h.B += bigger
    h.flags = flags
    h.oldbuckets = oldbuckets
    h.buckets = newbuckets
    h.nevacuate = 0
    h.noverflow = 0

    if h.extra != nil && h.extra.overflow != nil {
        // Promote current overflow buckets to the old generation.
        if h.extra.oldoverflow != nil {throw("oldoverflow is not nil")
        }
        h.extra.oldoverflow = h.extra.overflow
        h.extra.overflow = nil
    }
    if nextOverflow != nil {
        if h.extra == nil {h.extra = new(mapextra)
        }
        h.extra.nextOverflow = nextOverflow
    }

    // the actual copying of the hash table data is done incrementally
    // by growWork() and evacuate().
}

总结:

  • map 优先查看是否有雷同的 key,如果有,则示意是更新操作;
  • 如果没有雷同的 key,则示意是插入操作;如果有空位,则在第一个空位处插入;如果没有空位,则减少一个溢出 bucket,在溢出 bucket 中插入;插入操作可能会触发扩容操作;
  • map 不是一次性实现扩容的,而是逐渐实现扩容的;当在一个 bucket 中执行插入操作的时候,如果发现须要扩容,则会把这个 bucket(蕴含溢出 bucket)全副迁徙到新申请的 buckets 空间中,同时多扩容一个 bucket(集体了解是减速扩容速度,否则因为个别 bucket 始终没有应用,导致 map 始终保护新旧两个 buckets);
  • map 库容分为等量迁徙和加倍扩容:等量迁徙是为了让稠密的数据分布更加紧凑(因为删除操作,map 可能会很稠密),加倍扩容是因为插入数据过多,申请一个加倍的空间来存储 kv,同时加倍扩容也会删除空的槽位,让数据分布紧凑;

3.5. map 删除元素

示例代码

func main() {m1 := make(map[int8]int)
    m1[1] = 1
    delete(m1, 1)
}

map 删除元素操作调用的底层函数是 mapdelete 该函数存在文件 runtime/map.go 中.

func mapdelete(t *maptype, h *hmap, key unsafe.Pointer) {
    if raceenabled && h != nil {callerpc := getcallerpc()
        pc := funcPC(mapdelete)
        racewritepc(unsafe.Pointer(h), callerpc, pc)
        raceReadObjectPC(t.key, key, callerpc, pc)
    }
    if msanenabled && h != nil {msanread(key, t.key.size)
    }
    // h == nil || h.count == 0 的时候,间接返回;// 不过如果 map 的 key 类型是无奈比拟的话,这里会报错 runtime error: hash of unhashable type xxx
    // 所以会调用一次 t.hasher 函数,该函数会报适合的 panic,能够参考 issue 23734:https://github.com/golang/go/issues/23734
    if h == nil || h.count == 0 {if t.hashMightPanic() {t.hasher(key, 0) // see issue 23734
        }
        return
    }
    // map 不反对并发读写
    if h.flags&hashWriting != 0 {throw("concurrent map writes")
    }

    hash := t.hasher(key, uintptr(h.hash0))

    // Set hashWriting after calling t.hasher, since t.hasher may panic,
    // in which case we have not actually done a write (delete).
    h.flags ^= hashWriting
    // bucket 索引
    bucket := hash & bucketMask(h.B)
    // 如果 map 正在扩容过程中,此时会优先扩容,一次扩容 2 个 bucket;if h.growing() {growWork(t, h, bucket)
    }
    b := (*bmap)(add(h.buckets, bucket*uintptr(t.bucketsize)))
    bOrig := b
    top := tophash(hash)
search:
    for ; b != nil; b = b.overflow(t) {for i := uintptr(0); i < bucketCnt; i++ {if b.tophash[i] != top {
                // 如果 top=emptyRest,则示意前面的槽位都是空的,所以间接返回;if b.tophash[i] == emptyRest {break search}
                continue
            }
            k := add(unsafe.Pointer(b), dataOffset+i*uintptr(t.keysize))
            k2 := k
            if t.indirectkey() {k2 = *((*unsafe.Pointer)(k2))
            }
            if !t.key.equal(key, k2) {continue}
            // 删除 kv
            // Only clear key if there are pointers in it.
            if t.indirectkey() {*(*unsafe.Pointer)(k) = nil
            } else if t.key.ptrdata != 0 {memclrHasPointers(k, t.key.size)
            }
            e := add(unsafe.Pointer(b), dataOffset+bucketCnt*uintptr(t.keysize)+i*uintptr(t.elemsize))
            if t.indirectelem() {*(*unsafe.Pointer)(e) = nil
            } else if t.elem.ptrdata != 0 {memclrHasPointers(e, t.elem.size)
            } else {memclrNoHeapPointers(e, t.elem.size)
            }
            // 删除 kv 之后,首先将 top 值批改为 emptyOne,如果后续的 kv 都没有,会将以后 top 值批改为 emptyRest;// 同时,以后 top 值批改,可能会导致之前的 top 值也须要相应批改;b.tophash[i] = emptyOne
            // If the bucket now ends in a bunch of emptyOne states,
            // change those to emptyRest states.
            // It would be nice to make this a separate function, but
            // for loops are not currently inlineable.
              // 如果曾经是以后 bucket 的最初一个元素,则会持续寻找溢出 bucket;if i == bucketCnt-1 {if b.overflow(t) != nil && b.overflow(t).tophash[0] != emptyRest {goto notLast}
            } else { // 如果下一个 top 值不是 emptyRest,则示意以后的 top 值不须要批改成 emptyRest;if b.tophash[i+1] != emptyRest {goto notLast}
            }
            // 循环批改 top 值;// 因为以后 top 值批改为 emptyRest,可能导致前一个 top 值或者前一个 bucket 的最初一个 top 值也要相应批改;for {b.tophash[i] = emptyRest
                if i == 0 {
                    if b == bOrig {break // beginning of initial bucket, we're done.}
                    // Find previous bucket, continue at its last entry.
                    c := b
                    for b = bOrig; b.overflow(t) != c; b = b.overflow(t) { }
                    i = bucketCnt - 1
                } else {i--}
                if b.tophash[i] != emptyOne {break}
            }
        notLast:
            h.count--
            break search
        }
    }

    if h.flags&hashWriting == 0 {throw("concurrent map writes")
    }
    h.flags &^= hashWriting
}

总结:

  • 删除操作也不能够并发;
  • 删除时候,也会触发扩容迁徙;集体了解,go map 不会一次性实现扩容迁徙,这样应该比拟耗费工夫和性能,go map 通过用户行为一直触发扩容迁徙(一次就会扩容迁徙 2 个 bucket),这样尽管会有较长时间保留着 old buckets,然而对 map 响应和用户体验影响较小,所以应该是一种折中和均衡的计划;
  • 删除时候,会顺次遍历扭转 top 值;

3.6. map 遍历元素

示例代码

func main() {m1 := make(map[int8]int)
    m1[1] = 1
    for k,v := range m1 {fmt.Println(k,v)
    }
}

map遍历元素分为两步,首先调用函数 mapiterinit,初始化迭代器构造体hiter;而后调用函数mapiternext 来循环遍历 kv;上面咱们首先看下迭代器 hiter 的构造,而后剖析一下函数 mapiterinit 和函数 mapiternext 源码,这两个函数都存在于文件 runtime/map.go 中。
迭代器 hiter 的构造

// A hash iteration structure.
// If you modify hiter, also change cmd/compile/internal/gc/reflect.go to indicate
// the layout of this structure.
type hiter struct {key         unsafe.Pointer // Must be in first position.  Write nil to indicate iteration end (see cmd/internal/gc/range.go).
    elem        unsafe.Pointer // Must be in second position (see cmd/internal/gc/range.go).
    t           *maptype
    h           *hmap
    buckets     unsafe.Pointer // bucket ptr at hash_iter initialization time
    bptr        *bmap          // current bucket
    overflow    *[]*bmap       // keeps overflow buckets of hmap.buckets alive
    oldoverflow *[]*bmap       // keeps overflow buckets of hmap.oldbuckets alive
    startBucket uintptr        // bucket iteration started at
    offset      uint8          // intra-bucket offset to start from during iteration (should be big enough to hold bucketCnt-1)
    wrapped     bool           // already wrapped around from end of bucket array to beginning
    B           uint8
    i           uint8
    bucket      uintptr
    checkBucket uintptr
}
  • key:key 指针;
  • elem:elem 指针;
  • bptr:以后正待遍历的 bucket 指针;
  • startBucket:遍历起始的 bucket 索引;
  • offset:遍历每个 bucket 的时候,起始的 cell 索引;
  • wrapped:map 遍历个别是从两头的 bucket 开始往开端 bucket 遍历,如果曾经到了开端,则会持续从头开始遍历;该标记位为真时候,示意开始从头开始遍历;
  • i:以后 cell 索引;
  • bucket:以后 bucket 索引;
  • checkBucket:须要查看的 bucket 索引;当 map 遍历的之前,map 正在扩容迁徙 (growing) 过程中,此时找到一个待遍历的 bucket,咱们会先找到旧 bucket,如果旧 bucket 还没有迁徙,同时咱们晓得,如果迁徙完结,该 bucket 中的 kv 必定会迁徙到 2 个 bucket(例如 B =1,旧的 buckets 是 b0 和 b1;扩容后 B =2,新的 buckets 是 b0,b1,b2,b3,依据之前扩容迁徙的过程剖析,旧的 b0 会迁徙到新的 b0 和 b2);所以 map 只会返回最终会迁徙到新 bucket 的 kv;checkBucket 就是上述场景下的 bucket 索引;

迭代函数源码:

// mapiterinit initializes the hiter struct used for ranging over maps.
// The hiter struct pointed to by 'it' is allocated on the stack
// by the compilers order pass or on the heap by reflect_mapiterinit.
// Both need to have zeroed hiter since the struct contains pointers.
func mapiterinit(t *maptype, h *hmap, it *hiter) {
    if raceenabled && h != nil {callerpc := getcallerpc()
        racereadpc(unsafe.Pointer(h), callerpc, funcPC(mapiterinit))
    }
    // 遍历没有初始化的 map 不会报错
    if h == nil || h.count == 0 {return}

    if unsafe.Sizeof(hiter{})/sys.PtrSize != 12 {throw("hash_iter size incorrect") // see cmd/compile/internal/gc/reflect.go
    }
    it.t = t
    it.h = h

    // grab snapshot of bucket state
    it.B = h.B
    it.buckets = h.buckets
    if t.bucket.ptrdata == 0 {
        // Allocate the current slice and remember pointers to both current and old.
        // This preserves all relevant overflow buckets alive even if
        // the table grows and/or overflow buckets are added to the table
        // while we are iterating.
        h.createOverflow()
        it.overflow = h.extra.overflow
        it.oldoverflow = h.extra.oldoverflow
    }
    // 每次 map 遍历的起始 bucket 槽位和起始 cell 槽位都是随机的,起因就是这两个槽位是依据随机数来产生的
    // decide where to start
    r := uintptr(fastrand())
    if h.B > 31-bucketCntBits {r += uintptr(fastrand()) << 31
    }
    it.startBucket = r & bucketMask(h.B)
    it.offset = uint8(r >> h.B & (bucketCnt - 1))

    // iterator state
    it.bucket = it.startBucket

    // 批改 hmap 状态,原子操作
    // Remember we have an iterator.
    // Can run concurrently with another mapiterinit().
    if old := h.flags; old&(iterator|oldIterator) != iterator|oldIterator {atomic.Or8(&h.flags, iterator|oldIterator)
    }

    mapiternext(it)
}

func mapiternext(it *hiter) {
    h := it.h
    if raceenabled {callerpc := getcallerpc()
        racereadpc(unsafe.Pointer(h), callerpc, funcPC(mapiternext))
    }
    if h.flags&hashWriting != 0 {throw("concurrent map iteration and map write")
    }
    t := it.t
    bucket := it.bucket
    b := it.bptr
    i := it.i
    checkBucket := it.checkBucket

next:
    if b == nil {
        // bucket 示意以后的 `bucket` 索引,wrapped 示意是否从头遍历了;// map 个别是从两头 `bucket` 开始遍历,如果遍历到开端则 wrapped=true,bucket=0,从头开始持续遍历;// 所以上面的判断条件如果为真,就是曾经遍历完结了;if bucket == it.startBucket && it.wrapped {
            // end of iteration
            it.key = nil
            it.elem = nil
            return
        }
        // 遍历之后,h.B 可能持续变大
        if h.growing() && it.B == h.B {
            // Iterator was started in the middle of a grow, and the grow isn't done yet.
            // If the bucket we're looking at hasn't been filled in yet (i.e. the old
            // bucket hasn't been evacuated) then we need to iterate through the old
            // bucket and only return the ones that will be migrated to this bucket.
            oldbucket := bucket & it.h.oldbucketmask()
            b = (*bmap)(add(h.oldbuckets, oldbucket*uintptr(t.bucketsize)))
            if !evacuated(b) {checkBucket = bucket} else {b = (*bmap)(add(it.buckets, bucket*uintptr(t.bucketsize)))
                checkBucket = noCheck
            }
        } else {b = (*bmap)(add(it.buckets, bucket*uintptr(t.bucketsize)))
            checkBucket = noCheck
        }
        bucket++
        // bucket 遍历到开端后,从头开始持续遍历
        if bucket == bucketShift(it.B) {
            bucket = 0
            it.wrapped = true
        }
        i = 0
    }
    for ; i < bucketCnt; i++ {offi := (i + it.offset) & (bucketCnt - 1)
        if isEmpty(b.tophash[offi]) || b.tophash[offi] == evacuatedEmpty {
            // TODO: emptyRest is hard to use here, as we start iterating
            // in the middle of a bucket. It's feasible, just tricky.
            continue
        }
        k := add(unsafe.Pointer(b), dataOffset+uintptr(offi)*uintptr(t.keysize))
        if t.indirectkey() {k = *((*unsafe.Pointer)(k))
        }
        e := add(unsafe.Pointer(b), dataOffset+bucketCnt*uintptr(t.keysize)+uintptr(offi)*uintptr(t.elemsize))
        if checkBucket != noCheck && !h.sameSizeGrow() {
            // Special case: iterator was started during a grow to a larger size
            // and the grow is not done yet. We're working on a bucket whose
            // oldbucket has not been evacuated yet. Or at least, it wasn't
            // evacuated when we started the bucket. So we're iterating
            // through the oldbucket, skipping any keys that will go
            // to the other new bucket (each oldbucket expands to two
            // buckets during a grow).
            if t.reflexivekey() || t.key.equal(k, k) {
                // If the item in the oldbucket is not destined for
                // the current new bucket in the iteration, skip it.
                hash := t.hasher(k, uintptr(h.hash0))
                // 跳过不会迁徙到以后 bucket 的 kv
                if hash&bucketMask(it.B) != checkBucket {continue}
            } else {// Hash isn't repeatable if k != k (NaNs).  We need a
                // repeatable and randomish choice of which direction
                // to send NaNs during evacuation. We'll use the low
                // bit of tophash to decide which way NaNs go.
                // NOTE: this case is why we need two evacuate tophash
                // values, evacuatedX and evacuatedY, that differ in
                // their low bit.
                if checkBucket>>(it.B-1) != uintptr(b.tophash[offi]&1) {continue}
            }
        }
        if (b.tophash[offi] != evacuatedX && b.tophash[offi] != evacuatedY) ||
            !(t.reflexivekey() || t.key.equal(k, k)) {
            // This is the golden data, we can return it.
            // OR
            // key!=key, so the entry can't be deleted or updated, so we can just return it.
            // That's lucky for us because when key!=key we can't look it up successfully.
            it.key = k
            if t.indirectelem() {e = *((*unsafe.Pointer)(e))
            }
            it.elem = e
        } else {
            // The hash table has grown since the iterator was started.
            // The golden data for this key is now somewhere else.
            // Check the current hash table for the data.
            // This code handles the case where the key
            // has been deleted, updated, or deleted and reinserted.
            // NOTE: we need to regrab the key as it has potentially been
            // updated to an equal() but not identical key (e.g. +0.0 vs -0.0).
            rk, re := mapaccessK(t, h, k)
            if rk == nil {continue // key has been deleted}
            it.key = rk
            it.elem = re
        }
        it.bucket = bucket
        if it.bptr != b { // avoid unnecessary write barrier; see issue 14921
            it.bptr = b
        }
        it.i = i + 1
        it.checkBucket = checkBucket
        return
    }
    b = b.overflow(t)
    i = 0
    goto next
}
// returns both key and elem. Used by map iterator
func mapaccessK(t *maptype, h *hmap, key unsafe.Pointer) (unsafe.Pointer, unsafe.Pointer) {
    if h == nil || h.count == 0 {return nil, nil}
    hash := t.hasher(key, uintptr(h.hash0))
    m := bucketMask(h.B)
    b := (*bmap)(unsafe.Pointer(uintptr(h.buckets) + (hash&m)*uintptr(t.bucketsize)))
    if c := h.oldbuckets; c != nil {if !h.sameSizeGrow() {
            // There used to be half as many buckets; mask down one more power of two.
            m >>= 1
        }
        oldb := (*bmap)(unsafe.Pointer(uintptr(c) + (hash&m)*uintptr(t.bucketsize)))
        if !evacuated(oldb) {b = oldb}
    }
    top := tophash(hash)
bucketloop:
    for ; b != nil; b = b.overflow(t) {for i := uintptr(0); i < bucketCnt; i++ {if b.tophash[i] != top {if b.tophash[i] == emptyRest {break bucketloop}
                continue
            }
            k := add(unsafe.Pointer(b), dataOffset+i*uintptr(t.keysize))
            if t.indirectkey() {k = *((*unsafe.Pointer)(k))
            }
            if t.key.equal(key, k) {e := add(unsafe.Pointer(b), dataOffset+bucketCnt*uintptr(t.keysize)+i*uintptr(t.elemsize))
                if t.indirectelem() {e = *((*unsafe.Pointer)(e))
                }
                return k, e
            }
        }
    }
    return nil, nil
}

总结:

  • map遍历首先会初始化迭代器hiter,而后调用遍历函数mapiternext
  • map遍历的起始 bucket 和起始 cell 都是随机的;
  • 如果 map 遍历前,map进入一个 growing 过程,则 map 遍历成果等效于该 growing 全副完结后的的成果;也就是说,一个新 bucket,可能还没有迁徙进数据,然而 map 能够失常返回将来会迁徙进入该 bucket 的数据;

4. 其余

  • 如何获取调用的具体 map 函数

    • 筹备代码

      package main
      import ("fmt")
      
      func main() {m1 := make(map[string]string, 9)
      fmt.Println(m1)
      for i := 0; i < 20; i++ {str := fmt.Sprintf("%d", i)
         m1[str] = str
      }
         a := m1["0"]
      b, ok := m1["0"]
      fmt.Println(a,b,ok)
      }
    • 打印汇编代码命令

      go tool compile -N -l -S main.go > main.txt
    • 依据汇编代码,查找调用函数CALL runtime.makemap(SB)

5. 参考

  • Golang map 源码详解
  • go 根底之 map
  • 年度最佳【golang】map 详解
正文完
 0