Golang memory components: the mspan, mcache, mcentral, and mheap data structures

For the latest version, see the original post: https://blog.haohtml.com/arch…

The figure below shows how Go's memory components relate to one another.

[Figure: Golang memory allocation components]

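Anyone studying Go's memory management keeps running into a handful of important data structures, and the material is hard to follow without knowing them. This article therefore introduces the data structures behind the main memory components.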

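In Go, mcache, mcentral, and mheap are the three main components of memory management. An mcache manages the mspans cached locally for a running thread (one mcache per P), while an mcentral manages mspans globally and serves all mcaches.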

Depending on the size of the object being allocated, the runtime takes a different allocation path; see the function mallocgc() for details.

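< 16 B: uses the tiny allocator (for objects that contain no pointers), mainly through the mcache.tinyXXX fields
16 B to 32 KB: allocated from the mcache of the current P
> 32 KB: allocated directly from mheap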
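The three size ranges above can be sketched as a small classifier. This is only an illustration of the decision, not the runtime's code: allocPath is a hypothetical helper, and the constants mirror the 16 B and 32 KB thresholds named above.

```go
package main

import "fmt"

const (
	maxTinySize  = 16       // objects below this size may use the tiny allocator
	maxSmallSize = 32 << 10 // 32 KB: upper bound for size-class allocation
)

// allocPath is a hypothetical helper that reports which allocation path a
// request of the given size would take. noscan means the object contains no
// pointers, which is a precondition for the tiny allocator.
func allocPath(size uintptr, noscan bool) string {
	switch {
	case size > maxSmallSize:
		return "large (mheap)"
	case noscan && size < maxTinySize:
		return "tiny (mcache.tiny)"
	default:
		return "small (mcache.alloc)"
	}
}

func main() {
	fmt.Println(allocPath(8, true))       // tiny path
	fmt.Println(allocPath(8, false))      // has pointers, so a normal small object
	fmt.Println(allocPath(1024, true))    // small path
	fmt.Println(allocPath(64<<10, false)) // large path
}
```

Note that an 8-byte object containing a pointer does not qualify for the tiny path in this sketch, because the tiny allocator packs several objects into one block that the GC does not scan.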

The overall allocation flow in Go is widely covered elsewhere, so it is not described in detail here.

In the GPM model, every P has an mcache field that holds its memory allocation state.

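Before Go 1.1 the scheduler used the GM model and kept the mcache in M, which caused a number of problems, one of them being a large waste of memory. Every M held an mcache and a stack alloc, but that memory is needed only while the M is running Go code (each mcache can be as large as 2 MB); it is not needed while the M is blocked in a syscall or a network request. Because many Ms may be created, the waste adds up. Starting with Go 1.1 the scheduler uses the GPM model. Under high concurrency a G now uses this memory only while it is running, and a running G is always attached, through its M, to a P, so running goroutines use one mcache per P. The number of mcaches therefore equals the number of Ps, and concurrent access needs no lock.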

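Besides the memory waste described above, the GM model had other problems, such as the single global lock sched.Lock, the hand-off of goroutines between Ms, and poor memory locality.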

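Within a P, the mcache not only caches small objects but also holds some local allocation statistics. Because every P has its own mcache, goroutines running on different Ps can allocate memory concurrently without taking a lock.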

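When 16 B of memory is requested, the runtime first looks in the mcache of the P running the current G for an mspan of the matching size class; in the figure, the best fit is the mspan3 class.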

The mcache itself is allocated from non-GC memory, so any heap pointer it holds must be handled specially. Source file: https://github.com/golang/go/blob/go1.16.2/src/runtime/mcache.go

type mcache struct {
    // The fields below are accessed on every malloc, so they are grouped here for better cache behavior.
    nextSample uintptr // trigger heap sample after allocating this many bytes
    scanAlloc  uintptr // bytes of scannable heap allocated

    // Cache for tiny objects (< 16 B). See the "Tiny allocator" comment in malloc.go.
    tiny       uintptr
    tinyoffset uintptr
    tinyAllocs uintptr

    // The fields below are not accessed on every malloc.
    alloc [numSpanClasses]*mspan // spans to allocate from, indexed by spanClass

    stackcache [_NumStackOrders]stackfreelist

    flushGen uint32
}

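nextSample: trigger a heap sample after allocating this many bytes;
scanAlloc: bytes of scannable heap allocated;
tiny: a heap pointer to the start of the current tiny block, or nil if there is no current tiny block. It is cleared during mark termination by calling mcache.releaseAll();
tinyoffset: the current offset within the tiny block;
tinyAllocs: the number of tiny allocations performed by the P that owns this mcache;
alloc [numSpanClasses]*mspan: the spans the current P allocates from, indexed by spanClass; there are numSpanClasses = _NumSizeClasses << 1 entries;
stackcache: a per-P cache of free stacks, indexed by stack order;
flushGen: the sweepgen (sweep generation) at which this mcache was last flushed. If flushGen != mheap_.sweepgen, the spans in this mcache are stale and must be flushed so that they can be swept. This is done in acquirep;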

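mcache.tiny is a pointer. When the requested object is smaller than 16 B (and contains no pointers), the tiny allocator is used, and the request is served according to the state of the tiny, tinyoffset, and tinyAllocs fields.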
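To make the role of the three tiny fields more concrete, here is a simplified sketch of bump allocation inside a 16-byte block. It is an assumption-laden model, not the runtime's code: tinyCache, its blocks counter, and the use of offsets instead of real addresses are all made up for illustration, and the real allocator has extra rules for choosing which block to keep.

```go
package main

import "fmt"

const tinyBlockSize = 16 // size of one tiny block in bytes

// tinyCache is a hypothetical model of the tiny fields in mcache. Offsets are
// relative to the current block rather than real addresses.
type tinyCache struct {
	hasBlock   bool    // stands in for tiny != nil
	tinyoffset uintptr // first free byte in the current block
	tinyAllocs uintptr // number of tiny allocations served
	blocks     int     // illustration only: blocks obtained so far
}

// alignUp rounds off up to a multiple of align, which must be a power of two.
func alignUp(off, align uintptr) uintptr {
	return (off + align - 1) &^ (align - 1)
}

// alloc places size bytes (0 < size < 16) and returns the block number and
// the offset inside that block.
func (c *tinyCache) alloc(size uintptr) (block int, off uintptr) {
	// Choose an alignment from the size, as the tiny allocator does.
	align := uintptr(1)
	switch {
	case size&7 == 0:
		align = 8
	case size&3 == 0:
		align = 4
	case size&1 == 0:
		align = 2
	}
	off = alignUp(c.tinyoffset, align)
	if !c.hasBlock || off+size > tinyBlockSize {
		// The object does not fit: start a fresh 16-byte block.
		c.hasBlock = true
		c.blocks++
		off = 0
	}
	c.tinyoffset = off + size
	c.tinyAllocs++
	return c.blocks, off
}

func main() {
	var c tinyCache
	for _, size := range []uintptr{1, 8, 1} {
		block, off := c.alloc(size)
		fmt.Printf("size=%d -> block %d, offset %d\n", size, block, off)
	}
	fmt.Println("tinyAllocs:", c.tinyAllocs)
}
```

The point of the design is that several very small objects share one 16-byte block, which reduces per-object overhead at the cost of freeing the block only when all of its objects are dead.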

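There are 67 span size classes in use. The source defines _NumSizeClasses = 68, but that count includes a class of size 0, which stands for large objects (> 32 KB). Such objects are allocated directly from the heap, so they are never served from the spans cached in mcache.alloc.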

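mcache.alloc is an array of *mspan values; the mspan is the basic unit of memory management in Go. Objects from 16 B to 32 KB are allocated from the spans in this array. Each size class appears twice: once for a list of objects that contain no pointers (noscan) and once for a list of objects that contain pointers (scan). This split makes the garbage collector's job easier, because it does not have to scan spans that hold no pointers.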
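The "each size class appears twice" layout can be sketched as a packed index: the size class in the upper bits and a noscan flag in the lowest bit. The helper names below follow the runtime's naming as I recall it, but treat the block as an illustrative sketch rather than a copy of the source.

```go
package main

import "fmt"

const (
	numSizeClasses = 68                  // includes the size-0 class for large objects
	numSpanClasses = numSizeClasses << 1 // every size class has a scan and a noscan variant
)

// spanClass packs a size class and a noscan bit into one byte.
type spanClass uint8

// makeSpanClass builds the index used for arrays such as mcache.alloc.
func makeSpanClass(sizeclass uint8, noscan bool) spanClass {
	sc := spanClass(sizeclass << 1)
	if noscan {
		sc |= 1
	}
	return sc
}

// sizeclass extracts the size class from the packed value.
func (sc spanClass) sizeclass() uint8 { return uint8(sc >> 1) }

// noscan reports whether spans of this class hold pointer-free objects.
func (sc spanClass) noscan() bool { return sc&1 != 0 }

func main() {
	fmt.Println("numSpanClasses:", numSpanClasses)
	sc := makeSpanClass(3, true)
	fmt.Println(sc, sc.sizeclass(), sc.noscan())
}
```

With this encoding the scan and noscan variants of one size class sit next to each other in the array, at indexes 2*class and 2*class+1.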

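mspan is the basic unit of memory allocation. To allocate, the runtime looks in the mcache for an available mspan of a suitable size class; no lock is needed, so allocation is very efficient.
Go defines 67 size classes for memory blocks. Memory is obtained as runs of contiguous pages called spans (the mspan mentioned above), and each span is then divided into equally sized slots belonging to one of these 67 classes.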

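When an object is allocated, the span whose size class is closest to the object's size, without being smaller than it, is chosen.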
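Choosing the closest size class amounts to rounding the request up to the next class size. The sketch below uses a linear search over a short table; the table lists only the first few class sizes as I recall them from the runtime's generated size-class table, so treat the values as illustrative. The real runtime uses precomputed lookup tables, and sizeClassFor is a hypothetical name.

```go
package main

import "fmt"

// classToSize is an assumed excerpt of the size-class table: index is the
// size class, value is the slot size in bytes. Class 0 is reserved for large
// objects.
var classToSize = []uintptr{0, 8, 16, 24, 32, 48, 64, 80, 96, 112, 128}

// sizeClassFor returns the smallest class whose slot size can hold size
// bytes. ok is false when the size is beyond this excerpt of the table.
func sizeClassFor(size uintptr) (class int, ok bool) {
	for i := 1; i < len(classToSize); i++ {
		if size <= classToSize[i] {
			return i, true
		}
	}
	return 0, false
}

func main() {
	for _, size := range []uintptr{16, 17, 100} {
		class, _ := sizeClassFor(size)
		fmt.Printf("%d bytes -> class %d (%d-byte slots)\n", size, class, classToSize[class])
	}
}
```

The difference between the requested size and the slot size is internal fragmentation; the spacing of the classes is what keeps that waste bounded.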

The relationship between spans and the mcache is shown in the figure below.

// mSpanList heads a linked list of spans.
//go:notinheap
type mSpanList struct {
    first *mspan // first span in list, or nil if none
    last  *mspan // last span in list, or nil if none
}

//go:notinheap
type mspan struct {
    next *mspan     // next span in list, or nil if none
    prev *mspan     // previous span in list, or nil if none
    list *mSpanList // For debugging. TODO: Remove.

    startAddr uintptr // address of first byte of span aka s.base()
    npages    uintptr // number of pages in span

    manualFreeList gclinkptr // list of free objects in mSpanManual spans

    freeindex uintptr

    nelems uintptr // number of object in the span.

    allocCache uint64

    allocBits  *gcBits
    gcmarkBits *gcBits

    // sweep generation:
    // if sweepgen == h->sweepgen - 2, the span needs sweeping
    // if sweepgen == h->sweepgen - 1, the span is currently being swept
    // if sweepgen == h->sweepgen, the span is swept and ready to use
    // if sweepgen == h->sweepgen + 1, the span was cached before sweep began and is still cached, and needs sweeping
    // if sweepgen == h->sweepgen + 3, the span was swept and then cached and is still cached
    // h->sweepgen is incremented by 2 after every GC

    sweepgen    uint32
    divMul      uint16        // for divide by elemsize - divMagic.mul
    baseMask    uint16        // if non-0, elemsize is a power of 2, & this will get object allocation base
    allocCount  uint16        // number of allocated objects
    spanclass   spanClass     // size class and noscan (uint8)
    state       mSpanStateBox // mSpanInUse etc; accessed atomically (get/set methods)
    needzero    uint8         // needs to be zeroed before allocation
    divShift    uint8         // for divide by elemsize - divMagic.shift
    divShift2   uint8         // for divide by elemsize - divMagic.shift2
    elemsize    uintptr       // computed from sizeclass or from npages
    limit       uintptr       // end of data in span
    speciallock mutex         // guards specials list
    specials    *special      // linked list of special records sorted by offset.
}
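The sweepgen comment in the struct above describes a small state machine relative to the heap's sweep generation. As a reading aid, here is a sketch that turns that comment into a function. sweepState is a hypothetical helper, and the returned strings are paraphrases of the comment.

```go
package main

import "fmt"

// sweepState interprets a span's sweepgen relative to the heap's sweepgen,
// following the comment in the mspan struct. heapGen is assumed to be >= 2.
func sweepState(spanGen, heapGen uint32) string {
	switch spanGen {
	case heapGen - 2:
		return "needs sweeping"
	case heapGen - 1:
		return "being swept"
	case heapGen:
		return "swept, ready to use"
	case heapGen + 1:
		return "cached before sweep began, needs sweeping"
	case heapGen + 3:
		return "swept, then cached"
	default:
		return "unknown"
	}
}

func main() {
	const heapGen = 4 // the heap's sweepgen grows by 2 after every GC
	for _, g := range []uint32{2, 3, 4, 5, 7} {
		fmt.Printf("span sweepgen %d: %s\n", g, sweepState(g, heapGen))
	}
}
```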

mSpanList is a linked list of mspans, which is easy to understand. The mspan struct deserves a closer look.

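next: pointer to the next span in the list, or nil if there is none
prev: pointer to the previous span in the list, or nil if there is none
list: points to the owning mSpanList; used for debugging and slated for removal
startAddr: address of the first byte of the span, available through s.base()
npages: number of pages in the span (a span is made up of one or more pages; these are not the same concept as pages in Linux)
manualFreeList: list of free objects in mSpanManual spans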

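freeindex: a slot index between 0 and nelems that marks where to begin scanning for the next free object in the span;

Each allocation scans `allocBits` starting at index `freeindex` until it finds a `0`, which indicates a free object, and then advances `freeindex` so that the next scan begins just past the newly allocated object. If `freeindex == nelems`, the span has no free objects left. allocBits is a bitmap of the objects in the span: if `n >= freeindex and allocBits[n/8] & (1<<(n%8)) == 0`, object n is free; otherwise object n is allocated. Bits at index nelems and beyond are undefined and should never be referenced;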

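nelems: number of objects in the span (the page is the basic unit of memory storage; a span consists of one or more pages, and a single object may itself occupy one or more pages)
allocCache: a cache of allocBits starting at freeindex
allocBits: marks which elements in the span are in use and which are free. After a sweep, the old allocBits is released and allocBits is set to gcmarkBits.
gcmarkBits: marks which elements in the span have been marked by the GC and which have not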
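The interplay of freeindex, nelems, and allocBits described above can be sketched with a toy span. This is a simplification under stated assumptions: toySpan and nextFree are hypothetical, the bitmap is scanned bit by bit, and the allocCache shortcut that the real runtime uses to avoid such a loop is left out.

```go
package main

import "fmt"

// toySpan is a hypothetical, stripped-down span holding only the fields
// needed to find the next free object.
type toySpan struct {
	allocBits []uint8 // bit n set means object n was live after the last sweep
	freeindex uintptr // scanning for a free object starts here
	nelems    uintptr // number of objects in the span
}

// isFree applies the rule from the text: object n is free if n >= freeindex
// and its bit in allocBits is 0.
func (s *toySpan) isFree(n uintptr) bool {
	return n >= s.freeindex && s.allocBits[n/8]&(1<<(n%8)) == 0
}

// nextFree returns the index of the next free object and advances freeindex
// past it. ok is false when the span is full (freeindex == nelems).
func (s *toySpan) nextFree() (n uintptr, ok bool) {
	for n = s.freeindex; n < s.nelems; n++ {
		if s.isFree(n) {
			s.freeindex = n + 1 // the next scan skips this object
			return n, true
		}
	}
	s.freeindex = s.nelems
	return s.nelems, false
}

func main() {
	// Objects 0, 1, and 3 are in use (bits 0x0b); the span has 6 slots.
	s := &toySpan{allocBits: []uint8{0x0b}, nelems: 6}
	for {
		n, ok := s.nextFree()
		if !ok {
			fmt.Println("span is full, freeindex =", s.freeindex)
			break
		}
		fmt.Println("allocated object", n)
	}
}
```

Note that allocating does not set a bit in allocBits in this sketch; everything below freeindex counts as allocated, which is why the free test includes n >= freeindex.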

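mcentral is a central free list.

In fact, an mcentral does not hold a list of free objects; what it holds are mspans.

Each mcentral consists of two lists of mspans: spans that still have free objects (c->nonempty) and fully allocated spans (c->empty), as shown in the figure. In the Go 1.16 source quoted below, these correspond to the partial and full fields.

When 16 B of memory is requested and p.mcache has no span with free space of that size, the runtime goes to the mcentral of the best-fitting size class. As the figure shows, it then searches the nonempty list of the mcentral that holds 16 B spans.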

Source file: https://github.com/golang/go/blob/go1.16.2/src/runtime/mcentral.go

type mcentral struct {
    spanclass spanClass
    partial [2]spanSet // list of spans with a free object
    full    [2]spanSet // list of spans with no free objects
}

spanclass: the span class this mcentral serves
partial: the set of spans that still have free objects
full: the set of spans that have no free objects

Both partial and full are arrays of two span sets: one holds swept spans and the other holds unswept spans. The two sets trade roles on every GC cycle, because mheap_.sweepgen is incremented by 2 on each cycle.
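The role swap follows directly from sweepgen growing by 2 per cycle: the swept set can be selected with sweepgen/2%2 and the unswept set with the opposite index. The sketch below shows only that arithmetic; sweptIndex and unsweptIndex are hypothetical names for illustration.

```go
package main

import "fmt"

// sweptIndex selects which of the two span sets holds swept spans for the
// given heap sweepgen. Because sweepgen grows by 2 per GC cycle, the result
// alternates between 0 and 1 from one cycle to the next.
func sweptIndex(sweepgen uint32) uint32 { return sweepgen / 2 % 2 }

// unsweptIndex selects the other set, which holds spans not yet swept.
func unsweptIndex(sweepgen uint32) uint32 { return 1 - sweepgen/2%2 }

func main() {
	for gen := uint32(0); gen <= 6; gen += 2 {
		fmt.Printf("sweepgen=%d swept=partial[%d] unswept=partial[%d]\n",
			gen, sweptIndex(gen), unsweptIndex(gen))
	}
}
```

Swapping roles by index means that when a new cycle starts, every span swept in the previous cycle is automatically treated as unswept, without moving any span between sets.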

partial and full are of type spanSet, which represents a set of *mspan values.

type spanSet struct {
    spineLock mutex
    spine     unsafe.Pointer // *[N]*spanSetBlock, accessed atomically
    spineLen  uintptr        // Spine array length, accessed atomically
    spineCap  uintptr        // Spine array cap, accessed under lock

    index headTailIndex
}

An mcentral is initialized as follows:

// Initialize a single central free list.
func (c *mcentral) init(spc spanClass) {
    c.spanclass = spc
    lockInit(&c.partial[0].spineLock, lockRankSpanSetSpine)
    lockInit(&c.partial[1].spineLock, lockRankSpanSetSpine)
    lockInit(&c.full[0].spineLock, lockRankSpanSetSpine)
    lockInit(&c.full[1].spineLock, lockRankSpanSetSpine)
}

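Continuing the example above: if a 16 B request finds no suitable free memory in either the mcache or the mcentral, the runtime requests a block of memory from mheap. The block is divided into slots of the required size class, and the resulting span is added to the list of spans with free objects in the mcentral of the same size class.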
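The fallback chain described above (mcache, then mcentral, then mheap) can be modeled with three toy types. Everything here is hypothetical and heavily simplified: spans only count free slots, locking is omitted, and a fresh span is handed straight to the cache instead of being queued in the central list first.

```go
package main

import "fmt"

// toySpan only counts its free slots.
type toySpan struct{ free int }

// toyHeap stands in for mheap: it carves out new spans on demand.
type toyHeap struct{ grown int }

func (h *toyHeap) allocSpan(slots int) *toySpan {
	h.grown++
	return &toySpan{free: slots}
}

// toyCentral stands in for one mcentral: a list of spans with free slots.
type toyCentral struct {
	partial []*toySpan
	heap    *toyHeap
}

// cacheSpan hands a span to a cache, growing the heap if none is available.
func (c *toyCentral) cacheSpan() *toySpan {
	if n := len(c.partial); n > 0 {
		s := c.partial[n-1]
		c.partial = c.partial[:n-1]
		return s
	}
	return c.heap.allocSpan(2) // each new span has 2 slots in this toy model
}

// toyCache stands in for one slot of mcache.alloc.
type toyCache struct {
	cur     *toySpan
	central *toyCentral
	refills int
}

// alloc takes one slot, refilling from the central list when the current
// span is missing or exhausted.
func (m *toyCache) alloc() {
	if m.cur == nil || m.cur.free == 0 {
		m.cur = m.central.cacheSpan()
		m.refills++
	}
	m.cur.free--
}

func main() {
	h := &toyHeap{}
	c := &toyCentral{heap: h, partial: []*toySpan{{free: 1}}}
	m := &toyCache{central: c}
	for i := 0; i < 3; i++ {
		m.alloc()
		fmt.Printf("alloc %d: refills=%d heap spans=%d\n", i+1, m.refills, h.grown)
	}
}
```

Only the refill path touches shared state, which is why the common case (a slot is available in the cached span) needs no lock.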

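Go cannot use the per-thread local cache (mcache) or the global central cache (mcentral) to manage allocations larger than 32 KB. Requests above 32 KB are therefore served directly from the heap (mheap), which hands the program the required number of memory pages (each page is 8 KB).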
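For a large allocation, the number of pages is simply the size rounded up to a whole number of 8 KB pages. A minimal sketch of that arithmetic, where pagesFor is a hypothetical helper name:

```go
package main

import "fmt"

const (
	pageShift = 13             // 8 KB pages, as stated in the text
	pageSize  = 1 << pageShift // 8192 bytes
)

// pagesFor returns how many 8 KB pages are needed to hold size bytes.
func pagesFor(size uintptr) uintptr {
	n := size >> pageShift
	if size&(pageSize-1) != 0 {
		n++ // a partial page still needs a whole page
	}
	return n
}

func main() {
	for _, size := range []uintptr{32<<10 + 1, 40000, 64 << 10} {
		fmt.Printf("%d bytes -> %d pages\n", size, pagesFor(size))
	}
}
```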

type mheap struct {
    // lock must only be acquired on the system stack, otherwise a g
    // could self-deadlock if its stack grows with the lock held.
    lock      mutex
    pages     pageAlloc // page allocation data structure
    sweepgen  uint32    // sweep generation, see comment in mspan; written during STW
    sweepdone uint32    // all spans are swept
    sweepers  uint32    // number of active sweepone calls

    allspans []*mspan // all spans out there

    _ uint32 // align uint64 fields on 32-bit for atomics

    // Proportional sweep
    pagesInUse         uint64  // pages of spans in stats mSpanInUse; updated atomically
    pagesSwept         uint64  // pages swept this cycle; updated atomically
    pagesSweptBasis    uint64  // pagesSwept to use as the origin of the sweep ratio; updated atomically
    sweepHeapLiveBasis uint64  // value of heap_live to use as the origin of sweep ratio; written with lock, read without
    sweepPagesPerByte  float64 // proportional sweep ratio; written with lock, read without

    scavengeGoal uint64

    // Page reclaimer state
    // This is accessed atomically.
    reclaimIndex uint64

    // This is accessed atomically.
    reclaimCredit uintptr

    arenas [1 << arenaL1Bits]*[1 << arenaL2Bits]*heapArena

    heapArenaAlloc linearAlloc

    arenaHints *arenaHint

    arena linearAlloc

    allArenas []arenaIdx

    sweepArenas []arenaIdx

    markArenas []arenaIdx

    curArena struct {base, end uintptr}

    _ uint32 // ensure 64-bit alignment of central

    central [numSpanClasses]struct {
        mcentral mcentral
        pad      [cpu.CacheLinePadSize - unsafe.Sizeof(mcentral{})%cpu.CacheLinePadSize]byte
    }

    spanalloc             fixalloc // allocator for span*
    cachealloc            fixalloc // allocator for mcache*
    specialfinalizeralloc fixalloc // allocator for specialfinalizer*
    specialprofilealloc   fixalloc // allocator for specialprofile*
    speciallock           mutex    // lock for special record allocators.
    arenaHintAlloc        fixalloc // allocator for arenaHints

    unused *specialfinalizer // never set, just here to force the specialfinalizer type into DWARF
}

var mheap_ mheap

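lock: the global heap lock that guards concurrent access; because of it, allocating directly from mheap should be avoided where possible
pages: the page allocation data structure
sweepgen: the sweep generation (see the comment in mspan)
sweepdone: set when all spans have been swept
sweepers: number of active sweepone calls
allspans: all spans are obtained through mheap_, and every mspan ever created is recorded in allspans. The lock field in the struct is what keeps this safe under concurrency.
pagesInUse: number of pages in spans whose state is mSpanInUse
pagesSwept: number of pages swept in this cycle
pagesSweptBasis: the pagesSwept value used as the origin of the sweep ratio
sweepHeapLiveBasis: the heap_live value used as the origin of the sweep ratio
sweepPagesPerByte: the proportional sweep ratio
scavengeGoal: the amount of total retained heap memory (a preset goal) that the runtime tries to maintain by returning memory to the OS
reclaimIndex: the page index in allArenas of the next page to reclaim. Specifically, it refers to page (i % pagesPerArena) of arena allArenas[i / pagesPerArena]
reclaimCredit: spare credit for extra pages. Because the page reclaimer works in large chunks, it may reclaim more than requested; any spare pages released go to this credit pool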

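arenas [1 << arenaL1Bits]*[1 << arenaL2Bits]*heapArena: the heap arena map. It points to the heap metadata for every arena frame of the entire usable virtual address space;

Use arenaIndex to compute indexes into this array. For regions of the address space that are not backed by the Go heap, the arena map contains `nil`. In general this is a two-level mapping consisting of one L1 map and many L2 maps, which saves space when there are a great many arena frames. On many 64-bit platforms, however, arenaL1Bits is 0, which makes this effectively a single-level map. In that case arenas[0] is never nil.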
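The two-level lookup can be sketched as follows. This is a simplified model with loudly stated assumptions: the 64 MB arena size is what I understand to be used on 64-bit Linux, the platform-specific address offset that the runtime subtracts first is ignored, and splitArenaIndex is a hypothetical helper that takes the bit widths as parameters so both layouts can be shown.

```go
package main

import "fmt"

// heapArenaBytes is an assumed arena size of 64 MB.
const heapArenaBytes = 64 << 20

// arenaIndex maps an address to its arena frame number. The real runtime
// also subtracts a platform-specific base offset, which is omitted here.
func arenaIndex(p uintptr) uint { return uint(p / heapArenaBytes) }

// splitArenaIndex splits an arena index into L1 and L2 indexes. When l1Bits
// is 0 the map is effectively single-level and the L1 index is always 0.
func splitArenaIndex(i, l1Bits, l2Bits uint) (l1, l2 uint) {
	if l1Bits == 0 {
		return 0, i
	}
	return i >> l2Bits, i & (1<<l2Bits - 1)
}

func main() {
	p := uintptr(3*heapArenaBytes + 5) // an address inside arena frame 3
	i := arenaIndex(p)
	l1, l2 := splitArenaIndex(i, 0, 22)
	fmt.Println("single-level:", i, l1, l2)

	// A made-up two-level layout with 6 L1 bits and 4 L2 bits.
	l1, l2 = splitArenaIndex(0x45, 6, 4)
	fmt.Println("two-level:", l1, l2)
}
```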

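heapArenaAlloc: pre-reserved space for allocating heapArena objects. Used only on 32-bit systems.
arenaHints: a list of addresses at which to attempt to add more heap arenas. It is initially populated with a set of general hint addresses and grows with the bounds of the actual heap arena ranges.
arena: pre-reserved space for allocating the heap arenas themselves. Used only on 32-bit systems.
allArenas []arenaIdx: the arenaIndex of every mapped arena. It can be used to iterate over the address space.
sweepArenas []arenaIdx: a snapshot of allArenas taken at the beginning of the sweep cycle
markArenas []arenaIdx: a snapshot of allArenas taken at the beginning of the mark cycle
curArena: the arena the heap is currently growing into; it is always aligned to physPageSize.
central: an important field. This is the mcentral described above; there is one mcentral for each span class. pad is byte padding used to avoid false sharing.
spanalloc: of type fixalloc, a free-list allocator for blocks of one fixed size; here it allocates mspan objects. Likewise, cachealloc allocates blocks the size of an mcache.
cachealloc: same as above
The remaining fields are not covered here.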
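The pad field in central uses a compile-time array length to round each element up to a multiple of the cache line size, so that neighboring mcentrals do not share a cache line. A minimal sketch of the same trick, assuming a 64-byte cache line and using a made-up toyCentral type in place of mcentral:

```go
package main

import (
	"fmt"
	"unsafe"
)

// cacheLineSize is an assumption; the runtime takes the value from its cpu
// package for the target architecture.
const cacheLineSize = 64

// toyCentral is a hypothetical 24-byte stand-in for mcentral.
type toyCentral struct{ a, b, c uint64 }

// paddedCentral pads toyCentral up to a multiple of the cache line size,
// using the same array-length expression as the central field in mheap.
type paddedCentral struct {
	central toyCentral
	pad     [cacheLineSize - unsafe.Sizeof(toyCentral{})%cacheLineSize]byte
}

// paddedSize reports the size of one padded element.
func paddedSize() uintptr { return unsafe.Sizeof(paddedCentral{}) }

func main() {
	fmt.Println("raw size:   ", unsafe.Sizeof(toyCentral{}))
	fmt.Println("padded size:", paddedSize())
}
```

Because the element size is a multiple of the cache line, two Ps refilling from different size classes do not invalidate each other's cache lines when they update their own mcentral.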

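The mheap struct has many fields. A few are used very frequently, such as allspans, arenas, allArenas, sweepArenas, markArenas, and central. Some relate to the GC and others to memory management and maintenance. The more time you spend reading the runtime, the clearer the role of each field becomes.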
