Prometheus: analysis of the preferLocalStorage logic in remote read

Once remote_read is configured, Prometheus can read from remote TSDB storage:

remote_read:
  - url: "http://storage01:9090/api/v1/read"
    read_recent: true
  - url: "http://storage02:9090/api/v1/read"
    read_recent: true

When Prometheus executes a query, how does it choose between the local TSDB and the remote TSDB?

Conclusion:

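  • First, check the remote-read read_recent setting (readRecent in the code), which defaults to false;
  • If readRecent=true, the local TSDB and the remote TSDB are queried together, and the results are merged before being returned to the client;
  • If readRecent=false:

    • If Prometheus has already produced at least one block, a query whose start time is more than 4 hours after the minTime of the oldest block is served by the local TSDB only; the remote TSDB is not queried;
    • In all other cases, both the local TSDB and the remote TSDB are queried and the results are merged before being returned to the client; the remote query only covers the part of the range up to localStartTime (computed below).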

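In other words, once Prometheus has been running for a while, it applies an optimization to reduce API query latency: data that the local TSDB can cover is read from the local TSDB whenever possible.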
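The decision described above can be condensed into one function. This is a minimal sketch, assuming all times are Unix milliseconds and that block minTimes are sorted oldest first; the names `remotePlan` and `planRemote` are hypothetical and not Prometheus APIs. It only combines the rules quoted later in this article (readRecent, StartTime(), preferLocalStorage()); the local TSDB is always queried and is not modeled here.

```go
package main

import "fmt"

// remotePlan says whether the remote TSDB is queried and over which range.
// Hypothetical type for illustration only.
type remotePlan struct {
	QueryRemote bool
	Mint, Maxt  int64 // remote query range in Unix milliseconds
}

// planRemote combines the rules: readRecent skips the optimization; otherwise
// localStartTime = (oldest block minT, or now if no blocks) + margin, and the
// remote range is either skipped or clamped to end at localStartTime.
func planRemote(readRecent bool, blockMinTimes []int64, nowMs, marginMs, mint, maxt int64) remotePlan {
	if readRecent {
		return remotePlan{QueryRemote: true, Mint: mint, Maxt: maxt}
	}
	start := nowMs
	if len(blockMinTimes) > 0 {
		start = blockMinTimes[0] // assumed sorted: index 0 is the oldest block
	}
	localStartTime := start + marginMs
	if mint > localStartTime {
		return remotePlan{QueryRemote: false} // local TSDB only
	}
	if maxt > localStartTime {
		maxt = localStartTime // remote only for the part before localStartTime
	}
	return remotePlan{QueryRemote: true, Mint: mint, Maxt: maxt}
}

func main() {
	const hour = int64(3600 * 1000)
	blocks := []int64{0} // oldest block starts at t=0, so localStartTime = 4h
	// Query [5h, 6h]: starts after localStartTime, remote is skipped.
	fmt.Println(planRemote(false, blocks, 10*hour, 4*hour, 5*hour, 6*hour))
	// Query [1h, 6h]: remote is read only for [1h, 4h].
	fmt.Println(planRemote(false, blocks, 10*hour, 4*hour, 1*hour, 6*hour))
}
```

Note the no-blocks case: localStartTime lands 4 hours in the future, so practically every query reaches the remote TSDB, which matches the "all other cases" rule.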

1. Loading the remote read configuration

For each remote-read configuration, one SampleAndChunkQueryableClient is created, and it holds one HTTP client:

// storage/remote/storage.go
func (s *Storage) ApplyConfig(conf *config.Config) error {
    ...
    readHashes := make(map[string]struct{})
    queryables := make([]storage.SampleAndChunkQueryable, 0, len(conf.RemoteReadConfigs))
    for _, rrConf := range conf.RemoteReadConfigs {
        hash, err := toHash(rrConf)
        ...
        readHashes[hash] = struct{}{}
        ...
        name := hash[:6]
        // httpclient
        c, err := newReadClient(name, &ClientConfig{
            URL:              rrConf.URL,
            Timeout:          rrConf.RemoteTimeout,
            HTTPClientConfig: rrConf.HTTPClientConfig,
        })
        ...
        queryables = append(queryables, NewSampleAndChunkQueryableClient(
            c,
            conf.GlobalConfig.ExternalLabels,
            labelsToEqualityMatchers(rrConf.RequiredMatchers),
            rrConf.ReadRecent,                // the readRecent setting
            s.localStartTimeCallback,
        ))
    }
    s.queryables = queryables
    return nil
}
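The per-config setup can be sketched in isolation. This is a simplified illustration under stated assumptions: `readConfig`, `readClient` and `buildClients` are hypothetical stand-ins for config.RemoteReadConfig and the real client types, the hash (MD5 over the JSON encoding) is an assumed scheme for toHash, and rejecting duplicate configs is shown as one use of a set like readHashes. The point it shows is that every remote_read entry gets its own client, named by the first 6 characters of its hash, and carries its own readRecent flag.

```go
package main

import (
	"crypto/md5"
	"encoding/hex"
	"encoding/json"
	"fmt"
)

// readConfig is a hypothetical, trimmed-down remote_read entry.
type readConfig struct {
	URL        string `json:"url"`
	ReadRecent bool   `json:"read_recent"`
}

// readClient is a hypothetical per-config client: one per remote_read entry.
type readClient struct {
	Name       string // first 6 hex chars of the config hash
	URL        string
	ReadRecent bool // carried along, like rrConf.ReadRecent
}

// toHash hashes the JSON encoding of a config (assumed scheme).
func toHash(v interface{}) (string, error) {
	b, err := json.Marshal(v)
	if err != nil {
		return "", err
	}
	sum := md5.Sum(b)
	return hex.EncodeToString(sum[:]), nil
}

// buildClients creates one client per config and rejects exact duplicates.
func buildClients(confs []readConfig) ([]readClient, error) {
	seen := make(map[string]struct{})
	clients := make([]readClient, 0, len(confs))
	for _, c := range confs {
		h, err := toHash(c)
		if err != nil {
			return nil, err
		}
		if _, dup := seen[h]; dup {
			return nil, fmt.Errorf("duplicate remote read config for URL %s", c.URL)
		}
		seen[h] = struct{}{}
		clients = append(clients, readClient{Name: h[:6], URL: c.URL, ReadRecent: c.ReadRecent})
	}
	return clients, nil
}

func main() {
	clients, err := buildClients([]readConfig{
		{URL: "http://storage01:9090/api/v1/read", ReadRecent: true},
		{URL: "http://storage02:9090/api/v1/read", ReadRecent: true},
	})
	if err != nil {
		panic(err)
	}
	for _, c := range clients {
		fmt.Println(c.Name, c.URL, c.ReadRecent)
	}
}
```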

2. The preferLocalStorage logic

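Remote queries are executed through the object returned by sampleAndChunkQueryableClient.Querier():

  • When readRecent=true, the optimization is skipped and the remote TSDB is queried directly;
  • When readRecent=false, whether the remote TSDB is queried depends on the return value of c.preferLocalStorage():

    • If it returns noop=true, the remote querier is storage.NoopQuerier(), i.e. the remote TSDB is not queried;
    • Otherwise, the remote TSDB is queried, with q.maxt set to the cmaxt returned by c.preferLocalStorage();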
// storage/remote/read.go
func (c *sampleAndChunkQueryableClient) Querier(ctx context.Context, mint, maxt int64) (storage.Querier, error) {
    q := &querier{
        ctx:              ctx,
        mint:             mint,
        maxt:             maxt,
        client:           c.client,
        externalLabels:   c.externalLabels,
        requiredMatchers: c.requiredMatchers,
    }
    // readRecent=true: skip the optimization and query the remote TSDB directly
    if c.readRecent {
        return q, nil
    }
    var (
        noop bool
        err  error
    )
    q.maxt, noop, err = c.preferLocalStorage(mint, maxt)
    if err != nil {
        return nil, err
    }
    // noop=true: do not query the remote TSDB
    if noop {
        return storage.NoopQuerier(), nil
    }
    return q, nil
}

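The implementation of c.preferLocalStorage():

  • mint and maxt are the start and end times passed in with the query request;
  • For the query range [mint, maxt]:

    • If the whole range lies after localStartTime (mint > localStartTime), the remote TSDB need not be queried; the local TSDB alone is enough;
    • Otherwise, both the local TSDB and the remote TSDB are queried and the results are merged; if maxt > localStartTime, the remote query is cut off at localStartTime (cmaxt = localStartTime).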
// storage/remote/read.go
func (c *sampleAndChunkQueryableClient) preferLocalStorage(mint, maxt int64) (cmaxt int64, noop bool, err error) {
    localStartTime, err := c.callback()
    if err != nil {
        return 0, false, err
    }
    cmaxt = maxt

    // Avoid queries whose time range is later than the first timestamp in local DB.
    if mint > localStartTime {
        return 0, true, nil
    }
    // Query only samples older than the first timestamp in local DB.
    if maxt > localStartTime {
        cmaxt = localStartTime
    }
    return cmaxt, false, nil
}
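The boundaries of preferLocalStorage() are easiest to see with concrete numbers. This is a standalone port for illustration; `preferLocal` and its callback parameter are hypothetical names, not the real method. It uses a fixed localStartTime of 1000 ms and checks four ranges, including the edge case mint == localStartTime, which is not strictly later and therefore still reaches the remote TSDB.

```go
package main

import "fmt"

// preferLocal is a standalone port of preferLocalStorage(); callback returns localStartTime.
func preferLocal(callback func() (int64, error), mint, maxt int64) (cmaxt int64, noop bool, err error) {
	localStartTime, err := callback()
	if err != nil {
		return 0, false, err
	}
	cmaxt = maxt
	// Strictly later than localStartTime: the local TSDB is assumed to cover it.
	if mint > localStartTime {
		return 0, true, nil
	}
	// Clamp the remote range so it ends at localStartTime.
	if maxt > localStartTime {
		cmaxt = localStartTime
	}
	return cmaxt, false, nil
}

func main() {
	cb := func() (int64, error) { return 1000, nil } // localStartTime = 1000 ms
	cases := []struct{ mint, maxt int64 }{
		{1500, 2000}, // entirely after localStartTime
		{500, 2000},  // straddles localStartTime
		{100, 800},   // entirely before localStartTime
		{1000, 2000}, // mint == localStartTime: not strictly later
	}
	for _, c := range cases {
		cmaxt, noop, _ := preferLocal(cb, c.mint, c.maxt)
		fmt.Printf("[%d,%d] -> noop=%v remote maxt=%d\n", c.mint, c.maxt, noop, cmaxt)
	}
}
```

The four cases yield, in order: noop; remote [500, 1000]; remote [100, 800] unchanged; remote [1000, 1000].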

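How localStartTime is computed:

  • If blocks exist, localStartTime = block0.minTime + 4 hours, where block0 is the oldest block (blocks are sorted by MinTime);

    • 4 hours = startTimeMargin;
  • Otherwise, localStartTime = time.Now() + 4 hours;
  • If the local TSDB is not ready yet, StartTime() returns math.MaxInt64 together with tsdb.ErrNotReady: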
// cmd/prometheus/main.go
func (s *readyStorage) StartTime() (int64, error) {
    if x := s.get(); x != nil {
        var startTime int64
        if len(x.Blocks()) > 0 {
            startTime = x.Blocks()[0].Meta().MinTime
        } else {
            startTime = time.Now().Unix() * 1000
        }
        // Add a safety margin as it may take a few minutes for everything to spin up.
        // s.startTimeMargin=4hour
        return startTime + s.startTimeMargin, nil
    }
    return math.MaxInt64, tsdb.ErrNotReady
}

startTimeMargin is twice the minimum block duration, i.e. 4 hours with the default settings:

// cfg.tsdb.MinBlockDuration defaults to 2h, so startTimeMargin = 4h (in milliseconds)
startTimeMargin := int64(2 * time.Duration(cfg.tsdb.MinBlockDuration).Seconds() * 1000)

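In summary, the logic of c.preferLocalStorage() (originally illustrated by a figure that is not preserved here) is:

  • mint/maxt are the start and end times passed in with the query;
  • localStartTime is the time computed by the code above;
  • mint > localStartTime: the remote TSDB is skipped (noop);
  • mint <= localStartTime < maxt: the remote TSDB is queried over [mint, localStartTime];
  • maxt <= localStartTime: the remote TSDB is queried over the full [mint, maxt].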
