QuerySeries 办法实现

本篇就讲QuerySeries办法的实现,这儿会用到一些额定常识~

  • Roaring Bitmap
  • 来自 Prometheus优化的正则匹配器

tsdb.go

func (db *TSDB) QuerySeries(matcherList LabelMatcherList, start, end int64) ([]map[string]string, error) {    labelList := make([]LabelList, 0)  // 从 内存的有序List中获取segment    for _, segment := range db.segments.Get(start, end) {  // 将其加载成为segment(磁盘或者内存)        segment = segment.Load()    // 依据传入的正则匹配器查问Series,匹配器能够为nil        data, err := segment.QuerySeries(matcherList)        if err != nil {            return nil, err        }        labelList = append(labelList, data...)    }  // 合并series的所有label返回    return db.mergeQuerySeries(labelList...), nil}

从内存中查问一个segment之前咱们曾经实现过,就不过多介绍了,Load()函数也上一篇写过,也不去叙述细节,间接看外围办法QuerySeries()

同样,QuerySeries()分为从内存查问和从磁盘查问,具体从哪儿查问,次要看之前的Load()办法获取的segment是磁盘的还是内存的

memtable 内存

memtable.go

func (m *memtable) QuerySeries(matcherList LabelMatcherList) ([]LabelList, error) {    matchSeriesIds := m.indexMap.MatchSids(m.labelVs, matcherList)    ret := make([]LabelList, 0)    for _, seriesID := range matchSeriesIds {        s, _ := m.segment.Load(seriesID)        series := s.(*memSeries)        ret = append(ret, series.labels)    }    return ret, nil}

indexMap保留的就是key=labelName,value = seriesList的数据,所以上述代码,外围就是MatchSids,查问出以后segment中所有我须要的labelName对应的series。

index.go

func (mim *memtableIndexMap) MatchSids(valueList *labelValueList, matcherList LabelMatcherList) []string {    mim.mutex.Lock()    defer mim.mutex.Unlock()    seriesIdList := newMemtableSidList()    var got bool    for i := len(matcherList) - 1; i >= 0; i-- {        temp := newMemtableSidList()        vs := valueList.Match(matcherList[i])        for _, value := range vs {            mimIndex := mim.index[joinSeprator(matcherList[i].Name, value)]            if mimIndex == nil || mimIndex.Size() <= 0 {                continue            }            temp.Union(mimIndex.Copy())        }        if temp == nil || temp.Size() <= 0 {            return nil        }        if !got {            seriesIdList = temp            got = true            continue        }        seriesIdList.DeleteSids(temp.Copy())    }    return seriesIdList.List()}

这儿两个点

  1. 通过优化的正则优化器去查问符合条件的label

      这儿差一点,Prometheus算是tsdb的 鼻祖了,举荐浏览一下它的论文,很优良
  2. 对于雷同的labelName,求并集获取所有的sid,对于不同的labelName,求交加去获取符合条件的sid,这样就能够获取咱们所须要的series了
正则优化器实现

label.gofastMatcher

留神,咱们传入的是labelMatcher,而去进行正则优化的是fastMatcher,所以须要定义两个构造体

// LabelMatcher 标签匹配器type LabelMatcher struct {    Name      string    Value     string    IsRegular bool}type fastMatcher struct {    regular  *regexp.Regexp    prefix   string    suffix   string    contains string}

查找过程 Match()办法

func (lvl *labelValueList) Match(matcher LabelMatcher) []string {    ret := make([]string, 0)    if matcher.IsRegular {        pattern, err := newFastRegularMatcher(matcher.Value)        if err != nil {            return []string{matcher.Value}        }        for _, value := range lvl.Get(matcher.Name) {            if pattern.MatchStr(value) {                ret = append(ret, value)            }        }        return ret    }    return []string{matcher.Value}}

外围步骤

  1. 结构fastMatcher

    func newFastRegularMatcher(value string) (*fastMatcher, error) { ret, err := regexp.Compile("^(?:" + value + ")$") if err != nil {     return nil, err } parse, err := syntax.Parse(value, syntax.Perl) if err != nil {     return nil, err } frm := &fastMatcher{     regular: ret, } if parse.Op == syntax.OpConcat {     frm.prefix, frm.suffix, frm.contains = optimizeRegular(parse) } return frm, nil}func optimizeRegular(regular *syntax.Regexp) (prefix, sufiix, contains string) { sub := regular.Sub if len(sub) > 0 && sub[0].Op == syntax.OpBeginText {     sub = sub[1:] } if len(sub) > 0 && sub[len(sub)-1].Op == syntax.OpEndText {     sub = sub[:len(sub)-1] } if len(sub) <= 0 {     return } if sub[0].Op == syntax.OpLiteral && (sub[0].Flags&syntax.FoldCase) == 0 {     prefix = string(sub[0].Rune) } if last := len(sub) - 1; sub[last].Op == syntax.OpLiteral && (sub[last].Flags&syntax.FoldCase) == 0 {     sufiix = string(sub[last].Rune) } for i := 1; i < len(sub)-1; i++ {     if sub[i].Op == syntax.OpLiteral && (sub[i].Flags&syntax.FoldCase) == 0 {         contains = string(sub[i].Rune)         break     } } return}
  2. 获取以后的lableList的所有labelName去通过fastMatcher匹配值

    func (fm *fastMatcher) MatchStr(s string) bool { if !strings.EqualFold(fm.prefix, "") && !strings.HasPrefix(s, fm.prefix) {     return false } if !strings.EqualFold(fm.suffix, "") && !strings.HasSuffix(s, fm.suffix) {     return false } if !strings.EqualFold(fm.contains, "") && !strings.Contains(s, fm.contains) {     return false } return fm.regular.MatchString(s)}

这样就能够拿到以后的segment中的所有的labelName标签,而后就能够通过index索引获取所有的seriesID,而后将其全副保存起来,就是求并集的过程

func (msl *memtableSidList) Union(other *memtableSidList) {    msl.mutex.Lock()    defer msl.mutex.Unlock()    for key := range msl.container {        msl.container[key] = struct{}{}    }}

而后对于不同的labelName的seriesID,求交加

func (msl *memtableSidList) DeleteSids(other *memtableSidList) {    msl.mutex.Lock()    defer msl.mutex.Unlock()    for key := range msl.container {        _, ok := other.container[key]        if !ok {            delete(msl.container, key)        }    }}

留神,container中保留的key=sid, value = struct{}

至此,咱们就把内存的MatchSids办法实现了,接下来实现磁盘的

disk 磁盘

disk.go

func (ds *diskSegment) QuerySeries(matcherList LabelMatcherList) ([]LabelList, error) {    seriesIDList := ds.indexMap.MatchSids(ds.labelVs, matcherList)    ret := make([]LabelList, 0)    for _, seriesID := range seriesIDList {        ret = append(ret, ds.indexMap.MatchLabels(ds.series[seriesID].Labels...))    }    return ret, nil}

MatchSids()

func (dim *diskIndexMap) MatchSids(lvl *labelValueList, matcherList LabelMatcherList) []uint32 {    dim.mutex.Lock()    defer dim.mutex.Unlock()    bitmapList := make([]*roaring.Bitmap, 0)    for i := len(matcherList) - 1; i >= 0; i-- {        bitmapTemp := make([]*roaring.Bitmap, 0)        values := lvl.Match(matcherList[i])        for _, value := range values {            didIndex := dim.label2sids[joinSeprator(matcherList[i].Name, value)]            if didIndex == nil || didIndex.list.IsEmpty() {                continue            }            bitmapTemp = append(bitmapTemp, didIndex.list)        }        union := roaring.ParOr(4, bitmapTemp...)        if union.IsEmpty() {            return nil        }        bitmapList = append(bitmapList, union)    }    return roaring.ParAnd(4, bitmapList...).ToArray()}

逻辑和下面memtable.go的一样,然而这儿通过roaing.bitmap进行存储,通过union := roaring.ParOr(4, bitmapTemp...) 执行并集,通过roaring.ParAnd(4, bitmapList...).ToArray()求交加

这样获取所有的sids后,memtable就是基于sids找到所有的series而后返回
memtable.go

func (m *memtable) QuerySeries(matcherList LabelMatcherList) ([]LabelList, error) {    matchSeriesIds := m.indexMap.MatchSids(m.labelVs, matcherList)    ret := make([]LabelList, 0)    for _, seriesID := range matchSeriesIds {        s, _ := m.segment.Load(seriesID)        series := s.(*memSeries)        ret = append(ret, series.labels)    }    return ret, nil}

磁盘则是通过seriesID拿到labels索引,而后通过索引获取label的值,将其拼接成为Label返回,可能磁盘这儿说着有点点难懂,看下数据的流转流程就清晰了。

当咱们在执行Load()办法将数据从磁盘中解码进去时,能够拿到meta.Labels, 而后让label2sids保留labelName和sids,labelOrdered保留索引和labelName

所以下面咱们是通过seriesID获取labelOrdered的索引,而后通过labelOrdered获取labelName,其实这个labelName之前存储时就是 labelName+Value存储的,所以就能够获取label的值了

github:https://github.com/azhsmesos/...

本文由mdnice多平台公布