What is the Head block?

Before v2.19, the most recent 2 hours of metric data were kept entirely in memory.
v2.19 introduced the Head block: the most recent metric data is still stored in memory, but when the head block fills up, the data is flushed to disk and referenced via mmap.
The Head block is made up of a number of chunks; the head chunk is a memChunk, and it receives incoming time series writes.

When time series data is written, the write is acknowledged as successful once the sample has been written to both the head chunk and the WAL.

What is mmap?

Regular file I/O:

  • The file is first read into kernel space;
  • The file content is then copied into user space;
  • The user program operates on the copy in user space;

File I/O with mmap (a minimal Go sketch follows this list):

  • Once the file is mapped into kernel space, user space can read and write it directly;
  • Compared with regular file I/O, this saves one system call and one copy of the file content;
  • When multiple processes share the same file read-only, it saves a large amount of memory;
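Below is a minimal Go sketch of mmap-based reading on a Unix-like system (the file path and the use of the low-level syscall package are only for illustration):

package main

import (
    "fmt"
    "os"
    "syscall"
)

func main() {
    // Open any readable, non-empty file (the path is only an example).
    f, err := os.Open("/etc/hosts")
    if err != nil {
        panic(err)
    }
    defer f.Close()

    fi, err := f.Stat()
    if err != nil {
        panic(err)
    }

    // Map the file into the process address space, read-only and shared.
    // The returned []byte is backed by the page cache: reading it needs no
    // extra copy of the file content into a separate user-space buffer.
    data, err := syscall.Mmap(int(f.Fd()), 0, int(fi.Size()),
        syscall.PROT_READ, syscall.MAP_SHARED)
    if err != nil {
        panic(err)
    }
    defer syscall.Munmap(data)

    fmt.Printf("mapped %d bytes\n", len(data))
}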

The life cycle of the Head block

1) Initial state

Time series data is written to the head chunk and the WAL; after that, the write is acknowledged as successful.

2) The head chunk becomes full

For each series, the headChunk keeps at most the most recent 120 samples:

const samplesPerChunk = 120

With a scrape interval of 15s, a headChunk therefore covers about 30 minutes of metric data (120 samples × 15 s = 1800 s = 30 min).
Once the head chunk is full, a new head chunk is created to accept incoming samples.

At the same time, the old head chunk is flushed to disk and referenced via mmap.

3) The mmapped chunks become full

When the mmapped chunks reach 3/2 of chunkRange (2 hours), i.e. about 3 hours of data, the oldest chunkRange (2 hours) of mmapped data is persisted into a block; at the same time a checkpoint is created and old WAL segments are truncated.
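In the source, the trigger for this step is (roughly) a check that the head spans more than 1.5x the chunk range. A simplified sketch follows, with fields abridged from prometheus/tsdb/head.go (the real code uses accessor methods and atomics; the Head type here is a trimmed stand-in):

package tsdb

// Head is a trimmed-down stand-in for the real Head type, keeping only
// what is needed to show the compaction condition.
type Head struct {
    minTime, maxTime int64 // min/max sample timestamps currently in the head
    chunkRange       int64 // default 2h, in milliseconds
}

// compactable reports whether the head covers more than 3/2 of chunkRange
// (more than 3h with the default 2h chunkRange). When it does, the oldest
// chunkRange worth of data is compacted into a persistent block, a
// checkpoint is written, and old WAL segments are truncated.
func (h *Head) compactable() bool {
    return h.maxTime-h.minTime > h.chunkRange/2*3
}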

Source code analysis of the Head block

Each memSeries struct contains a headChunk, which holds one series' in-memory data:

// prometheus/tsdb/head.go

// memSeries is the in-memory representation of a series.
type memSeries struct {
    ...
    ref       uint64
    lset      labels.Labels
    ...
    headChunk *memChunk
}

type memChunk struct {
    chunk            chunkenc.Chunk
    minTime, maxTime int64
}

Appending a sample to a memSeries:

// prometheus/tsdb/head.go

// append adds the sample (t, v) to the series.
func (s *memSeries) append(t int64, v float64, appendID uint64, chunkDiskMapper *chunks.ChunkDiskMapper) (sampleInOrder, chunkCreated bool) {
    // A chunk holds at most 120 samples.
    const samplesPerChunk = 120

    // c is the current head chunk (its assignment is elided here).
    numSamples := c.chunk.NumSamples()

    // If we reach 25% of a chunk's desired sample count, set a definitive time
    // at which to start the next chunk.
    // At the 1/4 mark, recompute nextAt (the estimated time at which the chunk will hold 120 samples).
    if numSamples == samplesPerChunk/4 {
        s.nextAt = computeChunkEndTime(c.minTime, c.maxTime, s.nextAt)
    }
    // Once that time is reached, cut a new headChunk.
    if t >= s.nextAt {
        c = s.cutNewHeadChunk(t, chunkDiskMapper)
        chunkCreated = true
    }
    // Append the (t, v) sample to the headChunk.
    s.app.Append(t, v)
    ......
}
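For reference, the computeChunkEndTime helper used above looks roughly like this in prometheus/tsdb/head.go (reproduced as a sketch, not verbatim): since it is called when the chunk is 1/4 full, (cur - start) * 4 approximates the span of a full 120-sample chunk, and the range up to max is split into that many evenly sized pieces:

// prometheus/tsdb/head.go (sketch)
func computeChunkEndTime(start, cur, max int64) int64 {
    // How many chunks of the estimated full span (cur-start)*4 fit between start and max.
    a := (max - start) / ((cur - start + 1) * 4)
    if a == 0 {
        // Less than one full chunk would fit: end the chunk at max.
        return max
    }
    // Split [start, max] into a evenly sized chunks and end at the first boundary.
    return start + (max-start)/a
}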

When nextAt is reached, the old headChunk is written out and a new headChunk is created:

// prometheus/tsdb/head.go
func (s *memSeries) cutNewHeadChunk(mint int64, chunkDiskMapper *chunks.ChunkDiskMapper) *memChunk {
    // Write the old headChunk to disk and mmap it.
    s.mmapCurrentHeadChunk(chunkDiskMapper)

    // Create a new headChunk.
    s.headChunk = &memChunk{
        chunk:   chunkenc.NewXORChunk(),
        minTime: mint,
        maxTime: math.MinInt64,
    }
    s.nextAt = rangeForTimestamp(mint, s.chunkRange)

    app, err := s.headChunk.chunk.Appender()
    s.app = app
    return s.headChunk
}
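rangeForTimestamp, used above to set nextAt, caps the new chunk's end at the next chunkRange boundary; in the source it is roughly:

// prometheus/tsdb/head.go (sketch)
func rangeForTimestamp(t int64, width int64) (maxt int64) {
    // Round t up to the next multiple of width (e.g. with width = 2h, a
    // timestamp inside 14:00-16:00 maps to 16:00), so a head chunk never
    // crosses a chunkRange boundary.
    return (t/width)*width + width
}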

Writing the headChunk to disk and mmapping it:

// prometheus/tsdb/head.go
func (s *memSeries) mmapCurrentHeadChunk(chunkDiskMapper *chunks.ChunkDiskMapper) {
    chunkRef, err := chunkDiskMapper.WriteChunk(s.ref, s.headChunk.minTime, s.headChunk.maxTime, s.headChunk.chunk)
    s.mmappedChunks = append(s.mmappedChunks, &mmappedChunk{
        ref:        chunkRef,
        numSamples: uint16(s.headChunk.chunk.NumSamples()),
        minTime:    s.headChunk.minTime,
        maxTime:    s.headChunk.maxTime,
    })
}

// prometheus/tsdb/chunks/head_chunks.go

// WriteChunk writes the chunk to the disk.
func (cdm *ChunkDiskMapper) WriteChunk(seriesRef uint64, mint, maxt int64, chk chunkenc.Chunk) (chkRef uint64, err error) {
    ....
    // Write the header (series ref, min/max time, encoding, length).
    if err := cdm.writeAndAppendToCRC32(cdm.byteBuf[:bytesWritten]); err != nil {
        return 0, err
    }
    // Write the chunk data.
    if err := cdm.writeAndAppendToCRC32(chk.Bytes()); err != nil {
        return 0, err
    }
    // Write the CRC32 checksum.
    if err := cdm.writeCRC32(); err != nil {
        return 0, err
    }
    // writeBufferSize = 4MB: if the chunk (plus metadata) exceeds the write
    // buffer, flush the buffer to disk directly.
    if len(chk.Bytes())+MaxHeadChunkMetaSize >= writeBufferSize {
        if err := cdm.flushBuffer(); err != nil {
            return 0, err
        }
    }
    return chkRef, nil
}
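For context, the record that WriteChunk appends to a head chunk file has roughly the following layout (as described in the referenced Ganesh Vernekar posts and the head_chunks.go documentation; treat the exact field widths as an approximation):

| series ref <8 bytes> | mint <8 bytes> | maxt <8 bytes> | encoding <1 byte> | len <uvarint> | chunk data | CRC32 <4 bytes> |

The first writeAndAppendToCRC32 call above writes the fields before the chunk data, and the trailing CRC32 covers the header and the chunk data.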

Benefits of the Head block

The Prometheus 2.19.0 release notes mention memory-mapping full chunks of the Head (in-memory) block from disk.

From this, the benefits the Head block brings are:

  • Reduced user-space memory usage:

    • Previously, the most recent 2 hours of chunks were all kept in memory;
    • With the head block, full chunks are referenced via mmap and no longer occupy user-space memory;
  • Faster data recovery when a Prometheus instance restarts:

    • Without the head block, recovery requires replaying the entire WAL into memory;
    • With the head block, recovery only needs to read the mmapped chunks and then replay the part of the WAL that has not been mmapped;

References:

1. https://ganeshvernekar.com/bl...
2. https://ganeshvernekar.com/bl...