QuerySeries 办法实现
本篇就讲QuerySeries办法的实现,这儿会用到一些额定常识~
- Roaring Bitmap
- 来自 Prometheus优化的正则匹配器
tsdb.go
func (db *TSDB) QuerySeries(matcherList LabelMatcherList, start, end int64) ([]map[string]string, error) { labelList := make([]LabelList, 0) // 从 内存的有序List中获取segment for _, segment := range db.segments.Get(start, end) { // 将其加载成为segment(磁盘或者内存) segment = segment.Load() // 依据传入的正则匹配器查问Series,匹配器能够为nil data, err := segment.QuerySeries(matcherList) if err != nil { return nil, err } labelList = append(labelList, data...) } // 合并series的所有label返回 return db.mergeQuerySeries(labelList...), nil}
从内存中查问一个segment之前咱们曾经实现过,就不过多介绍了,Load()
函数也上一篇写过,也不去叙述细节,间接看外围办法QuerySeries()
同样,QuerySeries()
分为从内存查问和从磁盘查问,具体从哪儿查问,次要看之前的Load()
办法获取的segment是磁盘的还是内存的
memtable 内存
memtable.go
func (m *memtable) QuerySeries(matcherList LabelMatcherList) ([]LabelList, error) { matchSeriesIds := m.indexMap.MatchSids(m.labelVs, matcherList) ret := make([]LabelList, 0) for _, seriesID := range matchSeriesIds { s, _ := m.segment.Load(seriesID) series := s.(*memSeries) ret = append(ret, series.labels) } return ret, nil}
indexMap
保留的就是key=labelName,value = seriesList
的数据,所以上述代码,外围就是MatchSids,查问出以后segment中所有我须要的labelName对应的series。
index.go
func (mim *memtableIndexMap) MatchSids(valueList *labelValueList, matcherList LabelMatcherList) []string { mim.mutex.Lock() defer mim.mutex.Unlock() seriesIdList := newMemtableSidList() var got bool for i := len(matcherList) - 1; i >= 0; i-- { temp := newMemtableSidList() vs := valueList.Match(matcherList[i]) for _, value := range vs { mimIndex := mim.index[joinSeprator(matcherList[i].Name, value)] if mimIndex == nil || mimIndex.Size() <= 0 { continue } temp.Union(mimIndex.Copy()) } if temp == nil || temp.Size() <= 0 { return nil } if !got { seriesIdList = temp got = true continue } seriesIdList.DeleteSids(temp.Copy()) } return seriesIdList.List()}
这儿两个点
通过优化的正则优化器去查问符合条件的label
这儿差一点,Prometheus算是tsdb的 鼻祖了,举荐浏览一下它的论文,很优良
- 对于雷同的labelName,求并集获取所有的sid,对于不同的labelName,求交加去获取符合条件的sid,这样就能够获取咱们所须要的series了
正则优化器实现
label.go
的fastMatcher
留神,咱们传入的是labelMatcher,而去进行正则优化的是fastMatcher,所以须要定义两个构造体
// LabelMatcher 标签匹配器type LabelMatcher struct { Name string Value string IsRegular bool}type fastMatcher struct { regular *regexp.Regexp prefix string suffix string contains string}
查找过程 Match()
办法
func (lvl *labelValueList) Match(matcher LabelMatcher) []string { ret := make([]string, 0) if matcher.IsRegular { pattern, err := newFastRegularMatcher(matcher.Value) if err != nil { return []string{matcher.Value} } for _, value := range lvl.Get(matcher.Name) { if pattern.MatchStr(value) { ret = append(ret, value) } } return ret } return []string{matcher.Value}}
外围步骤
结构fastMatcher
func newFastRegularMatcher(value string) (*fastMatcher, error) { ret, err := regexp.Compile("^(?:" + value + ")$") if err != nil { return nil, err } parse, err := syntax.Parse(value, syntax.Perl) if err != nil { return nil, err } frm := &fastMatcher{ regular: ret, } if parse.Op == syntax.OpConcat { frm.prefix, frm.suffix, frm.contains = optimizeRegular(parse) } return frm, nil}func optimizeRegular(regular *syntax.Regexp) (prefix, sufiix, contains string) { sub := regular.Sub if len(sub) > 0 && sub[0].Op == syntax.OpBeginText { sub = sub[1:] } if len(sub) > 0 && sub[len(sub)-1].Op == syntax.OpEndText { sub = sub[:len(sub)-1] } if len(sub) <= 0 { return } if sub[0].Op == syntax.OpLiteral && (sub[0].Flags&syntax.FoldCase) == 0 { prefix = string(sub[0].Rune) } if last := len(sub) - 1; sub[last].Op == syntax.OpLiteral && (sub[last].Flags&syntax.FoldCase) == 0 { sufiix = string(sub[last].Rune) } for i := 1; i < len(sub)-1; i++ { if sub[i].Op == syntax.OpLiteral && (sub[i].Flags&syntax.FoldCase) == 0 { contains = string(sub[i].Rune) break } } return}
获取以后的lableList的所有labelName去通过fastMatcher匹配值
func (fm *fastMatcher) MatchStr(s string) bool { if !strings.EqualFold(fm.prefix, "") && !strings.HasPrefix(s, fm.prefix) { return false } if !strings.EqualFold(fm.suffix, "") && !strings.HasSuffix(s, fm.suffix) { return false } if !strings.EqualFold(fm.contains, "") && !strings.Contains(s, fm.contains) { return false } return fm.regular.MatchString(s)}
这样就能够拿到以后的segment中的所有的labelName标签,而后就能够通过index索引获取所有的seriesID,而后将其全副保存起来,就是求并集的过程
func (msl *memtableSidList) Union(other *memtableSidList) { msl.mutex.Lock() defer msl.mutex.Unlock() for key := range msl.container { msl.container[key] = struct{}{} }}
而后对于不同的labelName的seriesID,求交加
func (msl *memtableSidList) DeleteSids(other *memtableSidList) { msl.mutex.Lock() defer msl.mutex.Unlock() for key := range msl.container { _, ok := other.container[key] if !ok { delete(msl.container, key) } }}
留神,container中保留的key=sid, value = struct{}
至此,咱们就把内存的MatchSids
办法实现了,接下来实现磁盘的
disk 磁盘
disk.go
func (ds *diskSegment) QuerySeries(matcherList LabelMatcherList) ([]LabelList, error) { seriesIDList := ds.indexMap.MatchSids(ds.labelVs, matcherList) ret := make([]LabelList, 0) for _, seriesID := range seriesIDList { ret = append(ret, ds.indexMap.MatchLabels(ds.series[seriesID].Labels...)) } return ret, nil}
MatchSids()
func (dim *diskIndexMap) MatchSids(lvl *labelValueList, matcherList LabelMatcherList) []uint32 { dim.mutex.Lock() defer dim.mutex.Unlock() bitmapList := make([]*roaring.Bitmap, 0) for i := len(matcherList) - 1; i >= 0; i-- { bitmapTemp := make([]*roaring.Bitmap, 0) values := lvl.Match(matcherList[i]) for _, value := range values { didIndex := dim.label2sids[joinSeprator(matcherList[i].Name, value)] if didIndex == nil || didIndex.list.IsEmpty() { continue } bitmapTemp = append(bitmapTemp, didIndex.list) } union := roaring.ParOr(4, bitmapTemp...) if union.IsEmpty() { return nil } bitmapList = append(bitmapList, union) } return roaring.ParAnd(4, bitmapList...).ToArray()}
逻辑和下面memtable.go
的一样,然而这儿通过roaing.bitmap进行存储,通过union := roaring.ParOr(4, bitmapTemp...)
执行并集,通过roaring.ParAnd(4, bitmapList...).ToArray()
求交加
这样获取所有的sids后,memtable就是基于sids找到所有的series而后返回memtable.go
func (m *memtable) QuerySeries(matcherList LabelMatcherList) ([]LabelList, error) { matchSeriesIds := m.indexMap.MatchSids(m.labelVs, matcherList) ret := make([]LabelList, 0) for _, seriesID := range matchSeriesIds { s, _ := m.segment.Load(seriesID) series := s.(*memSeries) ret = append(ret, series.labels) } return ret, nil}
磁盘则是通过seriesID拿到labels索引,而后通过索引获取label的值,将其拼接成为Label返回,可能磁盘这儿说着有点点难懂,看下数据的流转流程就清晰了。
当咱们在执行Load()
办法将数据从磁盘中解码进去时,能够拿到meta.Labels, 而后让label2sids
保留labelName和sids,labelOrdered
保留索引和labelName
所以下面咱们是通过seriesID获取labelOrdered的索引,而后通过labelOrdered获取labelName,其实这个labelName之前存储时就是 labelName+Value存储的,所以就能够获取label的值了
github:https://github.com/azhsmesos/...
本文由mdnice多平台公布