A Deep Dive into Go Memory Allocation


Go ships with a runtime that abandons the traditional malloc-style approach in favor of managing memory itself. The allocator was originally based on tcmalloc, although it has since diverged considerably. Self-management enables patterns such as memory pools and pre-allocation, avoiding the performance cost of frequent system calls.

Before looking at Go's allocator in detail, it helps to review the basic strategy shared by most memory allocators.

The basic strategy:

  1. Request one large chunk of memory from the operating system at a time, to reduce system calls.
  2. Pre-slice each chunk into blocks of specific sizes and link them into free lists.
  3. To allocate memory for an object, take a block from a list of a suitable size.
  4. When an object is destroyed, return its block to the original list for reuse.
  5. If too much memory sits idle, return some of it to the operating system to lower overall overhead.
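
Here is a toy sketch of the strategy above in ordinary Go, just to fix the free-list idea before we read the runtime; all names are invented for illustration and none of this is runtime code:

package main

import "fmt"

// block is one fixed-size slot carved out of a larger chunk.
type block struct {
    next *block
    buf  [64]byte
}

// pool keeps freed blocks on a linked list for reuse.
type pool struct {
    free *block
}

// alloc pops a block from the free list, falling back to a fresh
// allocation (standing in for a big request to the OS) when empty.
func (p *pool) alloc() *block {
    if b := p.free; b != nil {
        p.free = b.next
        return b
    }
    return new(block)
}

// release returns a block to the free list instead of handing it
// back to the OS, so the next alloc can reuse it.
func (p *pool) release(b *block) {
    b.next = p.free
    p.free = b
}

func main() {
    var p pool
    b := p.alloc()
    p.release(b)
    fmt.Println(p.alloc() == b) // true: the block was reused
}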

Below we walk the source code to see how Go's allocation strategy compares.

Preparation

Before tracing the source, we need a few concepts and structs.

  • span: a large block of memory made up of multiple pages at contiguous addresses
  • object: a span is sliced into small blocks of one specific size; each small block holds one object

Object classification

  • tiny objects: size < 16 bytes (the tiny path additionally requires the object to contain no pointers, as mallocgc shows below)
  • ordinary objects: 16 bytes ~ 32 KB
  • large objects: size > 32 KB

Size conversion
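
The runtime converts between an object's size, its size class, and the class's rounded-up allocation size through lookup tables generated in runtime/sizeclasses.go; mallocgc (shown later) performs exactly the arithmetic below. A runnable sketch of the table indexing, with the table contents themselves elided:

package main

import "fmt"

// Constants as in the runtime; the size_to_class8 and
// size_to_class128 tables themselves are elided here.
const (
    smallSizeDiv = 8
    smallSizeMax = 1024
    largeSizeDiv = 128
)

// tableIndex shows which slot of which lookup table a given request
// size would hit; mallocgc does exactly this arithmetic, and
// class_to_size[class] then yields the rounded-up allocation size.
func tableIndex(size uintptr) (table string, idx uintptr) {
    if size <= smallSizeMax-8 {
        return "size_to_class8", (size + smallSizeDiv - 1) / smallSizeDiv
    }
    return "size_to_class128", (size - smallSizeMax + largeSizeDiv - 1) / largeSizeDiv
}

func main() {
    for _, sz := range []uintptr{1, 8, 9, 1024, 2048} {
        t, i := tableIndex(sz)
        fmt.Println(sz, "->", t, "index", i)
    }
}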

Structs

mheap

Represents all heap space held by the Go program. The runtime manages the heap through a single global mheap object, mheap_.

type mheap struct {
    lock      mutex
    free      [_MaxMHeapList]mSpanList // free spans with fewer than 128 pages, indexed by page count
    freelarge mTreap                   // free spans with 128 or more pages, kept in a treap
    busy      [_MaxMHeapList]mSpanList // in-use spans with fewer than 128 pages, indexed by page count
    busylarge mSpanList                // in-use spans with 128 or more pages

    // allspans is a slice of all mspans ever created. Each mspan
    // appears exactly once.
    allspans []*mspan // all spans out there

    // arenas is the heap arena map. It points to the metadata for
    // the heap for every arena frame of the entire usable virtual
    // address space.
    //
    // Use arenaIndex to compute indexes into this array.
    //
    // For regions of the address space that are not backed by the
    // Go heap, the arena map contains nil.
    //
    // Modifications are protected by mheap_.lock. Reads can be
    // performed without locking; however, a given entry can
    // transition from nil to non-nil at any time when the lock
    // isn't held. (Entries never transitions back to nil.)
    //
    // In general, this is a two-level mapping consisting of an L1
    // map and possibly many L2 maps. This saves space when there
    // are a huge number of arena frames. However, on many
    // platforms (even 64-bit), arenaL1Bits is 0, making this
    // effectively a single-level map. In this case, arenas[0]
    // will never be nil.
    // An array of heapArenas; each heapArena covers pagesPerArena consecutive pages.
    // This map serves mheap's span management and the garbage collector;
    // heapArena is described below.
    arenas [1 << arenaL1Bits]*[1 << arenaL2Bits]*heapArena

    // heapArenaAlloc is pre-reserved space for allocating heapArena
    // objects. This is only used on 32-bit, where we pre-reserve
    // this space to avoid interleaving it with the heap itself.
    // pre-reserved address space for heapArena objects
    heapArenaAlloc linearAlloc

    // arenaHints is a list of addresses at which to attempt to
    // add more heap arenas. This is initially populated with a
    // set of general hint addresses, and grown with the bounds of
    // actual heap arena ranges.
    arenaHints *arenaHint

    // arena is a pre-reserved space for allocating heap arenas
    // (the actual arenas). This is only used on 32-bit.
    // 32-bit only
    arena linearAlloc

    //_ uint32 // ensure 64-bit alignment of central

    // central free lists for small size classes.
    // the padding makes sure that the MCentrals are
    // spaced CacheLineSize bytes apart, so that each MCentral.lock
    // gets its own cache line.
    // central is indexed by spanClass.
    // mcentral is the central allocation hub: when an mcache has no
    // memory left to hand out, it refills from the matching mcentral
    central [numSpanClasses]struct {
        mcentral mcentral
        pad      [sys.CacheLineSize - unsafe.Sizeof(mcentral{})%sys.CacheLineSize]byte
    }

    spanalloc             fixalloc // allocator for span*
    cachealloc            fixalloc // allocator for mcache*
    treapalloc            fixalloc // allocator for treapNodes* used by large objects
    specialfinalizeralloc fixalloc // allocator for specialfinalizer*
    specialprofilealloc   fixalloc // allocator for specialprofile*
    speciallock           mutex    // lock for special record allocators.
    arenaHintAlloc        fixalloc // allocator for arenaHints

    unused *specialfinalizer // never set, just here to force the specialfinalizer type into DWARF
}

mSpanList

A linked list of mspans. The spans on free, busy, and busylarge are all chained together through this structure.

type mSpanList struct {
    first *mspan // first span in list, or nil if none
    last  *mspan // last span in list, or nil if none
}

mSpan

The basic unit of memory management in Go: a large block made up of contiguous 8 KB pages. Note that these pages are not the operating system's pages; a Go page is typically a multiple of the OS page size. In one sentence: an mspan is a doubly linked list node that records the span's start address, its size class, its page count, and so on.

type mspan struct {
    next *mspan     // next span in list, or nil if none
    prev *mspan     // previous span in list, or nil if none
    list *mSpanList // For debugging. TODO: Remove.

    startAddr uintptr // address of first byte of span aka s.base()
    npages    uintptr // number of pages in the span

    manualFreeList gclinkptr // list of free objects in _MSpanManual spans

    // freeindex is the slot index between 0 and nelems at which to begin scanning
    // for the next free object in this span.
    // Each allocation scans allocBits starting at freeindex until it encounters a 0
    // indicating a free object. freeindex is then adjusted so that subsequent scans begin
    // just past the newly discovered free object.
    //
    // If freeindex == nelem, this span has no free objects.
    //
    // allocBits is a bitmap of objects in this span.
    // If n >= freeindex and allocBits[n/8] & (1<<(n%8)) is 0
    // then object n is free;
    // otherwise, object n is allocated. Bits starting at nelem are
    // undefined and should never be referenced.
    //
    // Object n starts at address n*elemsize + (start << pageShift).
    // index at which to start scanning for the next free object; ranges over 0..nelems
    freeindex uintptr
    // TODO: Look up nelems from sizeclass and remove this field if it
    // helps performance.
    nelems uintptr // number of objects in the span

    // Cache of the allocBits at freeindex. allocCache is shifted
    // such that the lowest bit corresponds to the bit freeindex.
    // allocCache holds the complement of allocBits, thus allowing
    // ctz (count trailing zero) to use it directly.
    // allocCache may contain bits beyond s.nelems; the caller must ignore
    // these.
    // caches the allocBits starting at freeindex, with each bit inverted so that
    // ctz (count trailing zeros) can find the index of the next free object directly
    allocCache uint64

    // allocation bitmap: each bit records whether the corresponding block is allocated
    allocBits  *gcBits

    allocCount  uint16     // number of allocated objects

    elemsize    uintptr    // computed from sizeclass or from npages

}
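
To make the freeindex/allocBits comments above concrete, here is the same arithmetic in isolation; this is an illustrative sketch, not runtime code:

package main

import "fmt"

// objectAddr mirrors the comment in mspan: object n of a span
// starts at base + n*elemsize.
func objectAddr(base, elemsize, n uintptr) uintptr {
    return base + n*elemsize
}

// isFree reads the allocation bitmap the way the mspan comment
// describes: bit n clear means object n is free.
func isFree(allocBits []byte, n uintptr) bool {
    return allocBits[n/8]&(1<<(n%8)) == 0
}

func main() {
    bits := []byte{0b00000101} // objects 0 and 2 are allocated
    fmt.Println(isFree(bits, 1), isFree(bits, 2))  // true false
    fmt.Printf("%#x\n", objectAddr(0x1000, 16, 3)) // 0x1030
}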

spanClass

The class ID in the size class table, related to the object's size class.

type spanClass uint8
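
A spanClass packs two things into one byte: the size class and whether the class is noscan (pointer-free). A runnable sketch of the encoding, which matches how the runtime's makeSpanClass, sizeclass, and noscan helpers behave:

package main

import "fmt"

type spanClass uint8

// makeSpanClass packs the size class into the high 7 bits and the
// noscan flag into the lowest bit.
func makeSpanClass(sizeclass uint8, noscan bool) spanClass {
    sc := spanClass(sizeclass << 1)
    if noscan {
        sc |= 1
    }
    return sc
}

func (sc spanClass) sizeclass() int8 { return int8(sc >> 1) }
func (sc spanClass) noscan() bool    { return sc&1 != 0 }

func main() {
    sc := makeSpanClass(5, true)
    fmt.Println(sc.sizeclass(), sc.noscan()) // 5 true
}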

mTreap

A treap of mspans, used by freelarge: when looking for a large span with the required page count, searching this tree structure is faster than walking a linked list.

type mTreap struct {
    treap *treapNode
}

treapNode

A node of the mTreap. Each node carries an mspan together with its left and right children and related metadata.

type treapNode struct {
    right     *treapNode // all treapNodes > this treap node
    left      *treapNode // all treapNodes < this treap node
    parent    *treapNode // direct parent of this node, nil if root
    npagesKey uintptr    // number of pages in spanKey, used as primary sort key
    spanKey   *mspan     // span of size npagesKey, used as secondary sort key
    priority  uint32     // random number used by treap algorithm to keep tree probabilistically balanced
}

heapArena

heapArena stores an arena's metadata. The arenas array is built from heapArenas, and all allocated heap memory lives inside arenas; roughly, arenas[L1][L2] = heapArena. Given the address of any allocated memory, arenaIndex computes L1 and L2, which locate arenas[L1][L2], i.e. the heapArena covering that address.
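
As a sketch of that index computation (the constants here assume a 64-bit Linux layout with 64 MB arenas and a single-level arena map; other platforms differ):

package main

import "fmt"

const (
    heapArenaBytes  = 64 << 20 // 64 MB per arena on 64-bit Linux
    arenaBaseOffset = 0        // non-zero on some platforms
    arenaL2Bits     = 22       // assumed; platform-dependent
)

// arenaIndex maps an address to its arena frame number, and l1/l2
// split that number into the two map levels, in the spirit of the
// runtime's arenaIndex / arenaIdx.l1 / arenaIdx.l2.
func arenaIndex(p uintptr) uintptr { return (p + arenaBaseOffset) / heapArenaBytes }
func l1(i uintptr) uintptr         { return i >> arenaL2Bits }
func l2(i uintptr) uintptr         { return i & (1<<arenaL2Bits - 1) }

func main() {
    p := uintptr(0x00c000000000) // a typical Go heap address
    i := arenaIndex(p)
    fmt.Println(i, l1(i), l2(i)) // arenas[l1][l2] covers p
}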

type heapArena struct {
    // bitmap stores the pointer/scalar bitmap for the words in
    // this arena. See mbitmap.go for a description. Use the
    // heapBits type to access this.
    bitmap [heapArenaBitmapBytes]byte

    // spans maps from virtual address page ID within this arena to *mspan.
    // For allocated spans, their pages map to the span itself.
    // For free spans, only the lowest and highest pages map to the span itself.
    // Internal pages map to an arbitrary span.
    // For pages that have never been allocated, spans entries are nil.
    //
    // Modifications are protected by mheap.lock. Reads can be
    // performed without locking, but ONLY from indexes that are
    // known to contain in-use or stack spans. This means there
    // must not be a safe-point between establishing that an
    // address is live and looking it up in the spans array.
    spans [pagesPerArena]*mspan
}

arenaHint

Records an address at which the arena can be grown.

type arenaHint struct {
    addr uintptr
    // down reports whether the arena grows downward, toward lower addresses
    down bool
    next *arenaHint
}

mcentral

mcentral is a global resource serving all threads: when a thread runs short of memory it refills from mcentral, and when a thread frees memory it flows back into mcentral.

type mcentral struct {
    lock      mutex
    spanclass spanClass
    nonempty  mSpanList // list of spans with a free object, ie a nonempty free list
    empty     mSpanList // list of spans with no free objects (or cached in an mcache)

    // nmalloc is the cumulative count of objects allocated from
    // this mcentral, assuming all spans in mcaches are
    // fully-allocated. Written atomically, read under STW.
    nmalloc uint64
}

Structure diagram

Before going further, let's relate the structs above with a high-level diagram and get a rough preview of the allocation flow. Once we have walked through everything, coming back to this picture should make Go's memory allocation much clearer.

Initialization

func mallocinit() {
    // Initialize the heap.
    mheap_.init()
    _g_ := getg()
    // Fetch the mcache of the m that the current g runs on, and initialize it.
    _g_.m.mcache = allocmcache()

    for i := 0x7f; i >= 0; i-- {
        var p uintptr
        switch {
        case GOARCH == "arm64" && GOOS == "darwin":
            p = uintptr(i)<<40 | uintptrMask&(0x0013<<28)
        case GOARCH == "arm64":
            p = uintptr(i)<<40 | uintptrMask&(0x0040<<32)
        case raceenabled:
            // The TSAN runtime requires the heap
            // to be in the range [0x00c000000000,
            // 0x00e000000000).
            p = uintptr(i)<<32 | uintptrMask&(0x00c0<<32)
            if p >= uintptrMask&0x00e000000000 {
                continue
            }
        default:
            p = uintptr(i)<<40 | uintptrMask&(0x00c0<<32)
        }
        // Record the arena hint address.
        hint := (*arenaHint)(mheap_.arenaHintAlloc.alloc())
        hint.addr = p
        hint.next, mheap_.arenaHints = mheap_.arenaHints, hint
    }
}

mheap.init

func (h *mheap) init() {
    h.treapalloc.init(unsafe.Sizeof(treapNode{}), nil, nil, &memstats.other_sys)
    h.spanalloc.init(unsafe.Sizeof(mspan{}), recordspan, unsafe.Pointer(h), &memstats.mspan_sys)
    h.cachealloc.init(unsafe.Sizeof(mcache{}), nil, nil, &memstats.mcache_sys)
    h.specialfinalizeralloc.init(unsafe.Sizeof(specialfinalizer{}), nil, nil, &memstats.other_sys)
    h.specialprofilealloc.init(unsafe.Sizeof(specialprofile{}), nil, nil, &memstats.other_sys)
    h.arenaHintAlloc.init(unsafe.Sizeof(arenaHint{}), nil, nil, &memstats.other_sys)

    // Don't zero mspan allocations. Background sweeping can
    // inspect a span concurrently with allocating it, so it's
    // important that the span's sweepgen survive across freeing
    // and re-allocating a span to prevent background sweeping
    // from improperly cas'ing it from 0.
    //
    // This is safe because mspan contains no heap pointers.
    h.spanalloc.zero = false

    // h->mapcache needs no init
    for i := range h.free {
        h.free[i].init()
        h.busy[i].init()
    }

    h.busylarge.init()
    for i := range h.central {
        h.central[i].mcentral.init(spanClass(i))
    }
}

mcentral.init

Initializes the mcentral for a single span class.

// Initialize a single central free list.
func (c *mcentral) init(spc spanClass) {
    c.spanclass = spc
    c.nonempty.init()
    c.empty.init()
}

allocmcache

Initializes the mcache.

func allocmcache() *mcache {
    lock(&mheap_.lock)
    c := (*mcache)(mheap_.cachealloc.alloc())
    unlock(&mheap_.lock)
    for i := range c.alloc {
        c.alloc[i] = &emptymspan
    }
    c.next_sample = nextSample()
    return c
}

fixalloc.alloc

fixalloc is a fixed-size allocator, used mainly for the bookkeeping structures that wrap memory, such as mspan and mcache, even though the memory those structures describe comes from other allocators. The idea: allocate one big chunk up front, hand out one fixed-size piece per request, and push freed pieces onto a free list for reuse. Since the size never varies, there is no fragmentation.

func (f *fixalloc) alloc() unsafe.Pointer {
    if f.size == 0 {
        print("runtime: use of FixAlloc_Alloc before FixAlloc_Init\n")
        throw("runtime: internal error")
    }

    // If the free list is not empty, take from it directly.
    if f.list != nil {
        v := unsafe.Pointer(f.list)
        f.list = f.list.next
        f.inuse += f.size
        if f.zero {
            memclrNoHeapPointers(v, f.size)
        }
        return v
    }
    // If the current chunk is exhausted, allocate a new chunk from the system.
    if uintptr(f.nchunk) < f.size {
        f.chunk = uintptr(persistentalloc(_FixAllocChunk, 0, f.stat))
        f.nchunk = _FixAllocChunk
    }
    // Carve a fixed-size piece off the chunk; when freed, it returns to the list.
    v := unsafe.Pointer(f.chunk)
    if f.first != nil {
        f.first(f.arg, v)
    }
    f.chunk = f.chunk + f.size
    f.nchunk -= uint32(f.size)
    f.inuse += f.size
    return v
}

Initialization is quite simple:

  1. Initialize the heap: the free and busy span lists for each page count, and the busylarge list.
  2. Initialize the mcentral for every span class.
  3. Initialize the mcache, pointing each of its per-class slots at the placeholder empty span.
  4. Initialize arenaHints with a set of candidate addresses, to be grown later along the actual arena boundaries.

Allocation

newobject

func newobject(typ *_type) unsafe.Pointer {
    return mallocgc(typ.size, typ, true)
}

mallocgc

func mallocgc(size uintptr, typ *_type, needzero bool) unsafe.Pointer {

    // Set mp.mallocing to keep from being preempted by GC.
    mp := acquirem()
    if mp.mallocing != 0 {
        throw("malloc deadlock")
    }
    if mp.gsignal == getg() {
        throw("malloc during signal")
    }
    mp.mallocing = 1

    shouldhelpgc := false
    dataSize := size
    // Fetch the current thread's mcache.
    c := gomcache()
    var x unsafe.Pointer

    // True if the type is nil or contains no pointers.
    noscan := typ == nil || typ.kind&kindNoPointers != 0
    if size <= maxSmallSize {
        if noscan && size < maxTinySize {
            // Tiny-object allocation starts here.

            // Align: adjust the offset into the current tiny block.
            off := c.tinyoffset
            // Align tiny pointer for required (conservative) alignment.
            if size&7 == 0 {
                off = round(off, 8)
            } else if size&3 == 0 {
                off = round(off, 4)
            } else if size&1 == 0 {
                off = round(off, 2)
            }
            // If the tiny block bound to this mcache has enough room,
            // allocate from it directly and return.
            if off+size <= maxTinySize && c.tiny != 0 {
                // The object fits into existing tiny block.
                x = unsafe.Pointer(c.tiny + off)
                c.tinyoffset = off + size
                c.local_tinyallocs++
                mp.mallocing = 0
                releasem(mp)
                return x
            }
            // Allocate a new maxTinySize block.
            // The current tiny block is too small; get a new one.
            span := c.alloc[tinySpanClass]

            // Try to grab memory via the allocCache; 0 means failure.
            v := nextFreeFast(span)
            if v == 0 {
                // The allocCache had nothing; nextFree fetches a fresh span
                // of the right class from mcentral, swaps it in for the full
                // one, and allocates from it. nextFree is analyzed below.
                v, _, shouldhelpgc = c.nextFree(tinySpanClass)
            }
            x = unsafe.Pointer(v)
            (*[2]uint64)(x)[0] = 0
            (*[2]uint64)(x)[1] = 0
            // See if we need to replace the existing tiny block with the new one
            // based on amount of remaining free space.
            if size < c.tinyoffset || c.tiny == 0 {
                c.tiny = uintptr(x)
                c.tinyoffset = size
            }
            size = maxTinySize
        } else {
            // Ordinary (small) object allocation starts here.

            // First consult the lookup tables to determine the size class.
            var sizeclass uint8
            if size <= smallSizeMax-8 {
                sizeclass = size_to_class8[(size+smallSizeDiv-1)/smallSizeDiv]
            } else {
                sizeclass = size_to_class128[(size-smallSizeMax+largeSizeDiv-1)/largeSizeDiv]
            }
            size = uintptr(class_to_size[sizeclass])
            spc := makeSpanClass(sizeclass, noscan)
            // Find the span for this span class.
            span := c.alloc[spc]
            // As with tiny objects, try the allocCache first; 0 means failure.
            v := nextFreeFast(span)
            if v == 0 {
                v, span, shouldhelpgc = c.nextFree(spc)
            }
            x = unsafe.Pointer(v)
            if needzero && span.needzero != 0 {
                memclrNoHeapPointers(unsafe.Pointer(v), size)
            }
        }
    } else {
        // Large-object allocation starts here.
        // Unlike tiny and ordinary objects, large objects are allocated
        // directly from the mheap.
        var s *mspan
        shouldhelpgc = true
        systemstack(func() {
            s = largeAlloc(size, needzero, noscan)
        })
        s.freeindex = 1
        s.allocCount = 1
        x = unsafe.Pointer(s.base())
        size = s.elemsize
    }

    // bitmap marking omitted...
    // GC trigger checks and GC start omitted...

    return x
}
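
The alignment dance at the top of the tiny path is easy to miss, so here is the same rounding logic in standalone, illustrative form (round is reproduced from the runtime helper used above; the rest is invented for the demo):

package main

import "fmt"

// round rounds n up to a multiple of a; a must be a power of two.
// This matches the runtime helper used by the tiny path above.
func round(n, a uintptr) uintptr { return (n + a - 1) &^ (a - 1) }

func main() {
    // Pack a 4-byte and then an 8-byte pointer-free object into one
    // 16-byte tiny block, the way mallocgc's tiny path positions them.
    const maxTinySize = 16
    var off uintptr
    for _, size := range []uintptr{4, 8} {
        switch {
        case size&7 == 0:
            off = round(off, 8)
        case size&3 == 0:
            off = round(off, 4)
        case size&1 == 0:
            off = round(off, 2)
        }
        fmt.Printf("%d-byte object placed at offset %d\n", size, off)
        off += size
    }
    fmt.Println("both fit in one tiny block:", off <= maxTinySize)
}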

To summarize the flow of this code:

  1. First decide whether the object is large, ordinary, or tiny.
  2. For a tiny object:

    1. Find the mspan of the matching size class in the mcache's alloc array.
    2. If the current mspan has enough space, allocate and update the mspan's bookkeeping (done in nextFreeFast).
    3. If it does not, fetch a new mspan of the matching size class from mcentral, swap it in for the old one, then allocate and update the bookkeeping.
  3. For an ordinary object, the logic largely mirrors the tiny case:

    1. First consult the lookup tables to determine the object's size class, and find the mspan of that class.
    2. If the current mspan has enough space, allocate and update the mspan's bookkeeping (done in nextFreeFast).
    3. If it does not, fetch a new mspan of the matching size class from mcentral, swap it in for the old one, then allocate and update the bookkeeping.
  4. For a large object, allocate directly from the mheap. This is implemented by largeAlloc, so let's follow that function first.

largeAlloc

func largeAlloc(size uintptr, needzero bool, noscan bool) *mspan {
    // print("largeAlloc size=", size, "\n")

    // Overflow check.
    if size+_PageSize < size {
        throw("out of memory")
    }

    // Compute the number of pages the object needs.
    npages := size >> _PageShift
    if size&_PageMask != 0 {
        npages++
    }

    // Deduct credit for this span allocation and sweep if
    // necessary. mHeap_Alloc will also sweep npages, so this only
    // pays the debt down to npage pages.
    deductSweepCredit(npages*_PageSize, npages)

    // The actual allocation.
    s := mheap_.alloc(npages, makeSpanClass(0, noscan), true, needzero)
    if s == nil {
        throw("out of memory")
    }
    s.limit = s.base() + size
    // Record the allocated span in the bitmap.
    heapBitsForAddr(s.base()).initSpan(s)
    return s
}
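
For example, with 8 KB pages (_PageShift = 13) a 33 KB object needs five pages: 33 KB >> 13 is 4 with 1 KB left over, so npages is rounded up. The page-count computation in isolation:

package main

import "fmt"

const (
    pageShift = 13 // 8 KB pages, as in the runtime
    pageSize  = 1 << pageShift
    pageMask  = pageSize - 1
)

// pagesFor mirrors the npages computation in largeAlloc above.
func pagesFor(size uintptr) uintptr {
    npages := size >> pageShift
    if size&pageMask != 0 {
        npages++
    }
    return npages
}

func main() {
    fmt.Println(pagesFor(33 << 10)) // 5
}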

mheap.alloc

func (h *mheap) alloc(npage uintptr, spanclass spanClass, large bool, needzero bool) *mspan {
    // Don't do any operations that lock the heap on the G stack.
    // It might trigger stack growth, and the stack growth code needs
    // to be able to allocate heap.
    var s *mspan
    systemstack(func() {
        s = h.alloc_m(npage, spanclass, large)
    })

    if s != nil {
        if needzero && s.needzero != 0 {
            memclrNoHeapPointers(unsafe.Pointer(s.base()), s.npages<<_PageShift)
        }
        s.needzero = 0
    }
    return s
}

mheap.alloc_m

Allocates a new span of the requested page count from the heap, and records the object's size class in the HeapMap and HeapMapCache.

func (h *mheap) alloc_m(npage uintptr, spanclass spanClass, large bool) *mspan {
    _g_ := getg()
    if _g_ != _g_.m.g0 {
        throw("_mheap_alloc not on g0 stack")
    }
    lock(&h.lock)

    // Sweeping and span-state accounting omitted...

    // Get a span of the requested page count from the heap.
    s := h.allocSpanLocked(npage, &memstats.heap_inuse)
    if s != nil {
        // Record span info, because gc needs to be
        // able to map interior pointer to containing span.
        atomic.Store(&s.sweepgen, h.sweepgen)
        h.sweepSpans[h.sweepgen/2%2].push(s) // Add to swept in-use list.
        s.state = _MSpanInUse
        s.allocCount = 0
        s.spanclass = spanclass
        // Reset the span's size-class-derived fields.
        if sizeclass := spanclass.sizeclass(); sizeclass == 0 {
            s.elemsize = s.npages << _PageShift
            s.divShift = 0
            s.divMul = 0
            s.divShift2 = 0
            s.baseMask = 0
        } else {
            s.elemsize = uintptr(class_to_size[sizeclass])
            m := &class_to_divmagic[sizeclass]
            s.divShift = m.shift
            s.divMul = m.mul
            s.divShift2 = m.shift2
            s.baseMask = m.baseMask
        }

        // update stats, sweep lists
        h.pagesInUse += uint64(npage)
        if large {
            // Update the heap's large-object statistics.
            memstats.heap_objects++
            mheap_.largealloc += uint64(s.elemsize)
            mheap_.nlargealloc++
            atomic.Xadd64(&memstats.heap_live, int64(npage<<_PageShift))
            // Swept spans are at the end of lists.
            // Append to busy or busylarge depending on the page count.
            if s.npages < uintptr(len(h.busy)) {
                h.busy[s.npages].insertBack(s)
            } else {
                h.busylarge.insertBack(s)
            }
        }
    }
    // GC trace marking omitted...
    unlock(&h.lock)
    return s
}

mheap.allocSpanLocked

Allocates a span of the given page count and removes it from the free list.

func (h *mheap) allocSpanLocked(npage uintptr, stat *uint64) *mspan {
    var list *mSpanList
    var s *mspan

    // Try in fixed-size lists up to max.
    // Try the list for the exact page count first; if it is empty,
    // move on to lists of larger page counts.
    for i := int(npage); i < len(h.free); i++ {
        list = &h.free[i]
        if !list.isEmpty() {
            s = list.first
            list.remove(s)
            goto HaveSpan
        }
    }
    // Best fit in list of large spans.
    // Find a suitable span in the freelarge treap; analyzed below.
    s = h.allocLarge(npage) // allocLarge removed s from h.freelarge for us
    if s == nil {
        // Nothing suitable in freelarge either, so the heap must grow
        // by asking the system for memory; h.grow is analyzed below.
        if !h.grow(npage) {
            return nil
        }
        // After growing, search freelarge again.
        s = h.allocLarge(npage)
        if s == nil {
            return nil
        }
    }

HaveSpan:
    // We got a span with a suitable page count from the free lists.
    // Mark span in use. Omitted...

    if s.npages > npage {
        // Trim extra and put it back in the heap.
        // Create a span of s.npages - npage pages and return it to the heap.
        t := (*mspan)(h.spanalloc.alloc())
        t.init(s.base()+npage<<_PageShift, s.npages-npage)
        // Shrink the span s we are handing out.
        s.npages = npage
        h.setSpan(t.base()-1, s)
        h.setSpan(t.base(), t)
        h.setSpan(t.base()+t.npages*pageSize-1, t)
        t.needzero = s.needzero
        s.state = _MSpanManual // prevent coalescing with s
        t.state = _MSpanManual
        h.freeSpanLocked(t, false, false, s.unusedsince)
        s.state = _MSpanFree
    }
    s.unusedsince = 0
    // Record s in the spans and arenas maps.
    h.setSpans(s.base(), npage, s)

    *stat += uint64(npage << _PageShift)
    memstats.heap_idle -= uint64(npage << _PageShift)

    //println("spanalloc", hex(s.start<<_PageShift))
    if s.inList() {
        throw("still in list")
    }
    return s
}

mheap.allocLarge

Finds the smallest span with at least the requested number of pages in mheap's freelarge treap and removes it from the tree; returns nil if there is none.

func (h *mheap) allocLarge(npage uintptr) *mspan {
    // Search treap for smallest span with >= npage pages.
    return h.freelarge.remove(npage)
}

// h.freelarge.remove above calls this function:
// a classic binary-search-tree descent.
func (root *mTreap) remove(npages uintptr) *mspan {
    t := root.treap
    for t != nil {
        if t.spanKey == nil {
            throw("treap node with nil spanKey found")
        }
        if t.npagesKey < npages {
            t = t.right
        } else if t.left != nil && t.left.npagesKey >= npages {
            t = t.left
        } else {
            result := t.spanKey
            root.removeNode(t)
            return result
        }
    }
    return nil
}

Note: at the time 《Go 语言学习笔记》 was written, this lookup was still a linear scan over a linked list.

mheap.grow

In mheap.allocSpanLocked above, if no suitable span can be found on freelarge, the only remaining option is to request fresh memory from the system; let's analyze how that is implemented.

func (h *mheap) grow(npage uintptr) bool {
    ask := npage << _PageShift
    // Request memory from the system; sysAlloc is traced below.
    v, size := h.sysAlloc(ask)
    if v == nil {
        print("runtime: out of memory: cannot allocate ", ask, "-byte block (", memstats.heap_sys, " in use)\n")
        return false
    }

    // Create a fake "in use" span and free it, so that the
    // right coalescing happens.
    // Create a span to manage the memory we just acquired.
    s := (*mspan)(h.spanalloc.alloc())
    s.init(uintptr(v), size/pageSize)
    h.setSpans(s.base(), s.npages, s)
    atomic.Store(&s.sweepgen, h.sweepgen)
    s.state = _MSpanInUse
    h.pagesInUse += uint64(s.npages)
    // Freeing the fake span links the new memory into the free lists.
    h.freeSpanLocked(s, false, true, 0)
    return true
}

mheap.sysAlloc

func (h *mheap) sysAlloc(n uintptr) (v unsafe.Pointer, size uintptr) {
    n = round(n, heapArenaBytes)

    // First, try the arena pre-reservation.
    // Take n bytes from the pre-reserved arena space; nil on failure.
    v = h.arena.alloc(n, heapArenaBytes, &memstats.heap_sys)
    if v != nil {
        // Got what we need from the reservation; jump to mapped.
        size = n
        goto mapped
    }

    // Try to grow the heap at a hint address.
    for h.arenaHints != nil {
        hint := h.arenaHints
        p := hint.addr
        if hint.down {
            p -= n
        }
        if p+n < p {
            // The address range wraps around; we can't use this hint.
            v = nil
        } else if arenaIndex(p+n-1) >= 1<<arenaBits {
            // Beyond the heap's addressable range; can't use it.
            // Outside addressable heap. Can't use.
            v = nil
        } else {
            // The hint looks usable; reserve the region from the OS via mmap.
            v = sysReserve(unsafe.Pointer(p), n)
        }
        if p == uintptr(v) {
            // Success. Update the hint.
            if !hint.down {
                p += n
            }
            hint.addr = p
            size = n
            break
        }
        // Failed. Discard this hint and try the next.
        //
        // TODO: This would be cleaner if sysReserve could be
        // told to only return the requested address. In
        // particular, this is already how Windows behaves, so
        // it would simplify things there.
        if v != nil {
            sysFree(v, n, nil)
        }
        h.arenaHints = hint.next
        h.arenaHintAlloc.free(unsafe.Pointer(hint))
    }

    if size == 0 {
        if raceenabled {
            // The race detector assumes the heap lives in
            // [0x00c000000000, 0x00e000000000), but we
            // just ran out of hints in this region. Give
            // a nice failure.
            throw("too many address space collisions for -race mode")
        }

        // All of the hints failed, so we'll take any
        // (sufficiently aligned) address the kernel will give
        // us.
        v, size = sysReserveAligned(nil, n, heapArenaBytes)
        if v == nil {
            return nil, 0
        }

        // Create new hints for extending this region.
        hint := (*arenaHint)(h.arenaHintAlloc.alloc())
        hint.addr, hint.down = uintptr(v), true
        hint.next, mheap_.arenaHints = mheap_.arenaHints, hint
        hint = (*arenaHint)(h.arenaHintAlloc.alloc())
        hint.addr = uintptr(v) + size
        hint.next, mheap_.arenaHints = mheap_.arenaHints, hint
    }

    // Check for bad pointers or pointers we can't use.
    {
        var bad string
        p := uintptr(v)
        if p+size < p {
            bad = "region exceeds uintptr range"
        } else if arenaIndex(p) >= 1<<arenaBits {
            bad = "base outside usable address space"
        } else if arenaIndex(p+size-1) >= 1<<arenaBits {
            bad = "end outside usable address space"
        }
        if bad != "" {
            // This should be impossible on most architectures,
            // but it would be really confusing to debug.
            print("runtime: memory allocated by OS [", hex(p), ", ", hex(p+size), ") not in usable address space: ", bad, "\n")
            throw("memory reservation exceeds address space limit")
        }
    }

    if uintptr(v)&(heapArenaBytes-1) != 0 {
        throw("misrounded allocation in sysAlloc")
    }

    // Back the reservation.
    sysMap(v, size, &memstats.heap_sys)

mapped:
    // Create arena metadata.
    // Compute the arenas L1/L2 indexes from v's address.
    for ri := arenaIndex(uintptr(v)); ri <= arenaIndex(uintptr(v)+size-1); ri++ {
        l2 := h.arenas[ri.l1()]
        if l2 == nil {
            // The L2 map is missing; allocate arenas[L1].
            // Allocate an L2 arena map.
            l2 = (*[1 << arenaL2Bits]*heapArena)(persistentalloc(unsafe.Sizeof(*l2), sys.PtrSize, nil))
            if l2 == nil {
                throw("out of memory allocating heap arena map")
            }
            atomic.StorepNoWB(unsafe.Pointer(&h.arenas[ri.l1()]), unsafe.Pointer(l2))
        }

        // If arenas[ri.l1()][ri.l2()] is non-nil, it was already initialized.
        if l2[ri.l2()] != nil {
            throw("arena already initialized")
        }
        var r *heapArena
        // Allocate the heapArena metadata from the pre-reserved space.
        r = (*heapArena)(h.heapArenaAlloc.alloc(unsafe.Sizeof(*r), sys.PtrSize, &memstats.gc_sys))
        if r == nil {
            r = (*heapArena)(persistentalloc(unsafe.Sizeof(*r), sys.PtrSize, &memstats.gc_sys))
            if r == nil {
                throw("out of memory allocating heap arena metadata")
            }
        }

        // Store atomically just in case an object from the
        // new heap arena becomes visible before the heap lock
        // is released (which shouldn't happen, but there's
        // little downside to this).
        atomic.StorepNoWB(unsafe.Pointer(&l2[ri.l2()]), unsafe.Pointer(r))
    }
    // Some code omitted...
    return
}

That completes the allocation flow for large objects. Next we look at how tiny and ordinary objects are allocated.

Tiny and ordinary object allocation

The snippet below is the heart of the lookup-and-allocate path for tiny and ordinary objects. We saw it in mallocgc already, so now we focus on these two functions.

span := c.alloc[spc]
v := nextFreeFast(span)
if v == 0 {
    v, _, shouldhelpgc = c.nextFree(spc)
}

nextFreeFast

This function returns the address of a free slot in the span, or 0 if none is available.

func nextFreeFast(s *mspan) gclinkptr {
    // Count how many trailing zeros s.allocCache has.
    theBit := sys.Ctz64(s.allocCache) // Is there a free object in the allocCache?
    if theBit < 64 {
        result := s.freeindex + uintptr(theBit)
        if result < s.nelems {
            freeidx := result + 1
            if freeidx%64 == 0 && freeidx != s.nelems {
                return 0
            }
            // Update the cached bitmap and the next free slot index.
            s.allocCache >>= uint(theBit + 1)
            s.freeindex = freeidx
            s.allocCount++
            // Return the address of the object we found.
            return gclinkptr(result*s.elemsize + s.base())
        }
    }
    return 0
}
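
The allocCache trick deserves a standalone look: because the cache stores the complement of allocBits, a set bit means "free", and counting trailing zeros jumps straight to the next free slot. A sketch using math/bits in place of the runtime's sys.Ctz64:

package main

import (
    "fmt"
    "math/bits"
)

func main() {
    // allocBits for 8 objects: bits 0, 1, and 3 are allocated.
    allocBits := uint64(0b00001011)

    // allocCache holds the complement, so a set bit means "free"
    // and TrailingZeros64 finds the first free object directly.
    allocCache := ^allocBits

    theBit := bits.TrailingZeros64(allocCache)
    fmt.Println("next free object index:", theBit) // 2

    // Consume it, as nextFreeFast does with s.allocCache >>= theBit+1.
    allocCache >>= uint(theBit + 1)
    fmt.Println("after that:", bits.TrailingZeros64(allocCache)+theBit+1) // 4
}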

mcache.nextFree

If nextFreeFast finds no suitable memory, allocation falls through to this function.

nextFree returns an unused object from the cached span if one exists; otherwise it calls refill to fetch a span of the matching size class from mcentral, and then returns an unused object from that new span.

func (c *mcache) nextFree(spc spanClass) (v gclinkptr, s *mspan, shouldhelpgc bool) {
    // Find the mcache's span for this span class.
    s = c.alloc[spc]
    shouldhelpgc = false
    // Look for a free slot index in the current span.
    freeIndex := s.nextFreeIndex()
    if freeIndex == s.nelems {
        // The span is full.
        // freeIndex == nelems means the current span has no free slots.
        if uintptr(s.allocCount) != s.nelems {
            println("runtime: s.allocCount=", s.allocCount, "s.nelems=", s.nelems)
            throw("s.allocCount != s.nelems && freeIndex == s.nelems")
        }
        // refill fetches a usable span from mcentral and swaps it
        // into the mcache in place of the full one.
        systemstack(func() {
            c.refill(spc)
        })
        shouldhelpgc = true
        s = c.alloc[spc]

        // Search the new span for a free slot index.
        freeIndex = s.nextFreeIndex()
    }

    if freeIndex >= s.nelems {
        throw("freeIndex is not valid")
    }

    // Compute the object's address and update the span's bookkeeping.
    v = gclinkptr(freeIndex*s.elemsize + s.base())
    s.allocCount++
    if uintptr(s.allocCount) > s.nelems {
        println("s.allocCount=", s.allocCount, "s.nelems=", s.nelems)
        throw("s.allocCount > s.nelems")
    }
    return
}

mcache.refill

refill obtains a span of the given span class and installs it as the mcache's new span for that class.

func (c *mcache) refill(spc spanClass) {
    _g_ := getg()

    _g_.m.locks++
    // Return the current cached span to the central lists.
    s := c.alloc[spc]

    if uintptr(s.allocCount) != s.nelems {
        throw("refill of span with free space remaining")
    }

    // Check whether s is the placeholder empty span.
    if s != &emptymspan {
        s.incache = false
    }
    // Try to get a new span from mcentral to replace the old one.
    // Get a new cached span from the central lists.
    s = mheap_.central[spc].mcentral.cacheSpan()
    if s == nil {
        throw("out of memory")
    }

    if uintptr(s.allocCount) == s.nelems {
        throw("span has no free space")
    }
    // Install the new span in the mcache.
    c.alloc[spc] = s
    _g_.m.locks--
}

mcentral.cacheSpan

func (c *mcentral) cacheSpan() *mspan {
    // Deduct credit for this span allocation and sweep if necessary.
    spanBytes := uintptr(class_to_allocnpages[c.spanclass.sizeclass()]) * _PageSize
    // Sweep accounting omitted...
    lock(&c.lock)

    sg := mheap_.sweepgen
retry:
    var s *mspan
    for s = c.nonempty.first; s != nil; s = s.next {
        // if sweepgen == h->sweepgen - 2, the span needs sweeping
        // if sweepgen == h->sweepgen - 1, the span is currently being swept
        // if sweepgen == h->sweepgen, the span is swept and ready to use
        // h->sweepgen is incremented by 2 after every GC
        // This span needs sweeping: sweep it, then use it.
        if s.sweepgen == sg-2 && atomic.Cas(&s.sweepgen, sg-2, sg-1) {
            c.nonempty.remove(s)
            c.empty.insertBack(s)
            unlock(&c.lock)
            s.sweep(true)
            goto havespan
        }
        if s.sweepgen == sg-1 {
            // the span is being swept by background sweeper, skip
            continue
        }
        // we have a nonempty span that does not require sweeping, allocate from it
        // Found a span that needs no sweeping; take it and jump to havespan.
        c.nonempty.remove(s)
        c.empty.insertBack(s)
        unlock(&c.lock)
        goto havespan
    }

    // Spans swept during the loop above may have freed up usable space,
    // so walk the empty list and check once more.
    for s = c.empty.first; s != nil; s = s.next {
        if s.sweepgen == sg-2 && atomic.Cas(&s.sweepgen, sg-2, sg-1) {
            // we have an empty span that requires sweeping,
            // sweep it and see if we can free some space in it
            c.empty.remove(s)
            // swept spans are at the end of the list
            c.empty.insertBack(s)
            unlock(&c.lock)
            s.sweep(true)
            freeIndex := s.nextFreeIndex()
            if freeIndex != s.nelems {
                s.freeindex = freeIndex
                goto havespan
            }
            lock(&c.lock)
            // the span is still empty after sweep
            // it is already in the empty list, so just retry
            goto retry
        }
        if s.sweepgen == sg-1 {
            // the span is being swept by background sweeper, skip
            continue
        }
        // already swept empty span,
        // all subsequent ones must also be either swept or in process of sweeping
        break
    }

    unlock(&c.lock)

    // Replenish central list if empty.
    // No suitable span found: replenish with a span of this class.
    // c.grow calls mheap.alloc, analyzed above, so we won't repeat it.
    s = c.grow()
    if s == nil {
        return nil
    }
    lock(&c.lock)
    // Queue the new span at the end of the empty list.
    c.empty.insertBack(s)
    unlock(&c.lock)

    // At this point s is a non-empty span, queued at the end of the empty list,
    // c is unlocked.
havespan:

    cap := int32((s.npages << _PageShift) / s.elemsize)
    n := cap - int32(s.allocCount)
    if n == 0 || s.freeindex == s.nelems || uintptr(s.allocCount) == s.nelems {
        throw("span has no free objects")
    }
    // Assume all objects from this span will be allocated in the
    // mcache. If it gets uncached, we'll adjust this.
    atomic.Xadd64(&c.nmalloc, int64(n))
    usedBytes := uintptr(s.allocCount) * s.elemsize
    atomic.Xadd64(&memstats.heap_live, int64(spanBytes)-int64(usedBytes))
    // Mark the span as cached in an mcache.
    s.incache = true
    freeByteBase := s.freeindex &^ (64 - 1)
    whichByte := freeByteBase / 8
    // Refresh the allocCache from the allocation bitmap.
    // Init alloc bits cache.
    s.refillAllocCache(whichByte)

    // Adjust the allocCache so that s.freeindex corresponds to the low bit in
    // s.allocCache.
    s.allocCache >>= s.freeindex % 64

    return s
}

At this point, if mcentral cannot supply a span of the matching class, the heap-growing journey analyzed above begins, i.e. mheap.alloc; from there the flow is the same as before.

Allocation summary

Putting it all together, Go's allocation flow looks roughly like this:

  1. First decide whether the object is large, ordinary, or tiny.
  2. For a tiny object:

    1. Find the mspan of the matching size class in the mcache's alloc array.
    2. If the current mspan has enough space, allocate and update the mspan's bookkeeping (nextFreeFast).
    3. If it does not, fetch a new mspan of the matching size class from mcentral, swap it in, then allocate and update the bookkeeping.
    4. If mcentral has no span of that class with free space, it asks mheap.
    5. If mheap has no span of the needed page count left, it finds a span with a larger page count, splits it, and allocates from it.
    6. If no such span can be found, mheap requests memory from the system and replenishes itself.
  3. For an ordinary object, the logic largely mirrors the tiny case:

    1. First consult the tables to determine the object's size class, and find the mspan of that class.
    2. If the current mspan has enough space, allocate and update the mspan's bookkeeping (nextFreeFast).
    3. If it does not, fetch a new mspan of the matching size class from mcentral, swap it in, then allocate and update the bookkeeping.
    4. If mcentral has no span of that class with free space, it asks mheap.
    5. If mheap has no span of the needed page count left, it finds a span with a larger page count, splits it, and allocates from it.
    6. If no such span can be found, mheap requests memory from the system and replenishes itself.
  4. For a large object, allocate directly from mheap:

    1. If no span of the needed page count is available, find a span with a larger page count, split it, and allocate from it.
    2. If none can be found, request memory from the system and replenish mheap.
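
These paths can be observed, very roughly, from ordinary user code by sampling heap statistics around allocations of different sizes. This is illustrative only: GC timing and the runtime version affect the exact numbers, which is why the sketch disables GC first.

package main

import (
    "fmt"
    "runtime"
    "runtime/debug"
)

// heapAlloc samples the current number of live heap bytes.
func heapAlloc() uint64 {
    var m runtime.MemStats
    runtime.ReadMemStats(&m)
    return m.HeapAlloc
}

var sink interface{} // keep allocations live so they are not optimized away

func main() {
    debug.SetGCPercent(-1) // keep GC from running between samples (demo only)

    before := heapAlloc()
    sink = make([]byte, 8) // tiny/small path: served from the mcache
    small := heapAlloc() - before

    before = heapAlloc()
    sink = make([]byte, 64<<10) // > 32 KB: large path, straight from mheap
    large := heapAlloc() - before

    fmt.Println("small alloc cost ~", small, "bytes; large alloc cost ~", large, "bytes")
}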

References

《Go 语言学习笔记》

《图解 Go 语言内存分配》

《探索 Go 内存管理(分配)》

《Golang 内存管理》
