Once Prometheus is configured with remote_read, it can read from remote TSDB storage:
```yaml
remote_read:
  - url: "http://storage01:9090/api/v1/read"
    read_recent: true
  - url: "http://storage02:9090/api/v1/read"
    read_recent: true
```
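When Prometheus executes a query, how does it decide between the local TSDB and the remote TSDB?
Conclusion:
- First, Prometheus checks the remote-read readRecent setting (read_recent in YAML), which defaults to false;
- If readRecent=true, the local TSDB and the remote TSDB are queried together, and the results are merged and returned to the client;
- If readRecent=false:
  - If Prometheus has already produced at least one block, a query whose start time (mint) is later than the first block's minTime + 4 hours is served by the local TSDB only; the remote TSDB is not queried;
  - In every other case, both the local and the remote TSDB are queried, and the results are merged and returned to the client;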
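The rules above can be condensed into a small, self-contained Go model. This is a sketch, not Prometheus code: planQuery, Plan, hasBlocks and firstBlockMinT are hypothetical names, times are plain millisecond integers, and startTimeMargin is hard-coded to the 4h default derived later in this article. It assumes, as the conclusion states, that the local TSDB is always queried.

```go
package main

import "fmt"

const hourMs = int64(60 * 60 * 1000)

// Assumed default: startTimeMargin = 2 * MinBlockDuration = 2 * 2h = 4h, in ms.
const startTimeMargin = 4 * hourMs

// Plan records which stores a query touches in this simplified model.
type Plan struct {
	QueryLocal  bool
	QueryRemote bool
	RemoteMaxt  int64 // upper time bound (ms) sent to the remote TSDB
}

// planQuery is a hypothetical model of the local/remote decision.
// hasBlocks reports whether any local block exists; firstBlockMinT is its minTime.
func planQuery(readRecent, hasBlocks bool, firstBlockMinT, nowMs, mint, maxt int64) Plan {
	if readRecent {
		// read_recent: true skips the optimization; remote gets the full range.
		return Plan{QueryLocal: true, QueryRemote: true, RemoteMaxt: maxt}
	}
	// No blocks yet: localStartTime = now + 4h, so practically every query also hits remote.
	localStartTime := nowMs + startTimeMargin
	if hasBlocks {
		localStartTime = firstBlockMinT + startTimeMargin
	}
	if mint > localStartTime {
		// The whole range lies after localStartTime: local TSDB only.
		return Plan{QueryLocal: true}
	}
	remoteMaxt := maxt
	if maxt > localStartTime {
		// Remote only needs to supply samples older than localStartTime.
		remoteMaxt = localStartTime
	}
	return Plan{QueryLocal: true, QueryRemote: true, RemoteMaxt: remoteMaxt}
}

func main() {
	now, firstBlock := 100*hourMs, 10*hourMs // localStartTime = 14h
	fmt.Printf("%+v\n", planQuery(false, true, firstBlock, now, 20*hourMs, 30*hourMs)) // local only
	fmt.Printf("%+v\n", planQuery(false, true, firstBlock, now, 5*hourMs, 30*hourMs))  // both, remote capped at 14h
	fmt.Printf("%+v\n", planQuery(true, true, firstBlock, now, 20*hourMs, 30*hourMs))  // both, full range
}
```

The design point the model makes visible: with readRecent=false the remote store is never asked for data newer than localStartTime, because the local TSDB is assumed to hold it already.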
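In other words, once Prometheus has been running for a while, it applies an optimization to reduce API query latency: data that the local storage can cover is served from the local TSDB whenever possible.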
1. Loading the remote read configuration
For each remote-read entry, one SampleAndChunkQueryableClient is created, wrapping one HTTP client:
```go
// storage/remote/storage.go
func (s *Storage) ApplyConfig(conf *config.Config) error {
	...
	readHashes := make(map[string]struct{})
	queryables := make([]storage.SampleAndChunkQueryable, 0, len(conf.RemoteReadConfigs))
	for _, rrConf := range conf.RemoteReadConfigs {
		hash, err := toHash(rrConf)
		...
		readHashes[hash] = struct{}{}
		...
		name := hash[:6]
		// HTTP client
		c, err := newReadClient(name, &ClientConfig{
			URL:              rrConf.URL,
			Timeout:          rrConf.RemoteTimeout,
			HTTPClientConfig: rrConf.HTTPClientConfig,
		})
		...
		queryables = append(queryables, NewSampleAndChunkQueryableClient(
			c,
			conf.GlobalConfig.ExternalLabels,
			labelsToEqualityMatchers(rrConf.RequiredMatchers),
			rrConf.ReadRecent, // the readRecent parameter
			s.localStartTimeCallback,
		))
	}
	s.queryables = queryables
	return nil
}
```
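A stripped-down model of this loop, runnable on its own, makes the one-client-per-entry shape concrete. Everything here is hypothetical: RemoteReadConfig and readClient are trimmed stand-ins for the real config and client types, and hashing the JSON encoding with SHA-256 is an assumption of the sketch, not necessarily what Prometheus's toHash does. The parts elided in the excerpt (handling around the hash, timeouts, external labels, required matchers) are left out rather than guessed.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"encoding/json"
	"fmt"
)

// RemoteReadConfig is a hypothetical, trimmed-down stand-in for one remote_read entry.
type RemoteReadConfig struct {
	URL        string `json:"url"`
	ReadRecent bool   `json:"read_recent"`
}

// readClient is a hypothetical stand-in for SampleAndChunkQueryableClient.
type readClient struct {
	Name       string
	URL        string
	ReadRecent bool
}

// toHash hashes the JSON encoding of a config (SHA-256 is an assumption here).
func toHash(c RemoteReadConfig) (string, error) {
	b, err := json.Marshal(c)
	if err != nil {
		return "", err
	}
	sum := sha256.Sum256(b)
	return hex.EncodeToString(sum[:]), nil
}

// applyConfig builds one client per remote_read entry, named after the first
// 6 hex characters of the config hash, and carries ReadRecent through unchanged.
func applyConfig(confs []RemoteReadConfig) ([]readClient, error) {
	clients := make([]readClient, 0, len(confs))
	for _, rr := range confs {
		hash, err := toHash(rr)
		if err != nil {
			return nil, err
		}
		clients = append(clients, readClient{Name: hash[:6], URL: rr.URL, ReadRecent: rr.ReadRecent})
	}
	return clients, nil
}

func main() {
	clients, err := applyConfig([]RemoteReadConfig{
		{URL: "http://storage01:9090/api/v1/read", ReadRecent: true},
		{URL: "http://storage02:9090/api/v1/read", ReadRecent: true},
	})
	if err != nil {
		panic(err)
	}
	for _, c := range clients {
		fmt.Println(c.Name, c.URL, c.ReadRecent)
	}
}
```

Feeding it the two entries from the YAML at the top yields two clients, each with a 6-character name and ReadRecent=true; that flag is what later decides whether the local-preference shortcut applies.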
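2. The preferLocalStorage logic
Remote queries are executed through the object returned by sampleAndChunkQueryableClient.Querier():
- When readRecent=true, the optimization is skipped and the remote TSDB is queried directly;
- When readRecent=false, whether the remote TSDB is queried is decided by the return value of c.preferLocalStorage():
  - If it returns noop=true, the remote querier is storage.NoopQuerier(), i.e. the remote TSDB is not queried;
  - Otherwise, the remote TSDB is queried, with the querier's maxt set to the cmaxt returned by preferLocalStorage();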
```go
// storage/remote/read.go
func (c *sampleAndChunkQueryableClient) Querier(ctx context.Context, mint, maxt int64) (storage.Querier, error) {
	q := &querier{
		ctx:              ctx,
		mint:             mint,
		maxt:             maxt,
		client:           c.client,
		externalLabels:   c.externalLabels,
		requiredMatchers: c.requiredMatchers,
	}
	// readRecent=true: skip the optimization and query the remote TSDB directly
	if c.readRecent {
		return q, nil
	}

	var (
		noop bool
		err  error
	)
	q.maxt, noop, err = c.preferLocalStorage(mint, maxt)
	if err != nil {
		return nil, err
	}
	// noop=true: do not query the remote TSDB at all
	if noop {
		return storage.NoopQuerier(), nil
	}
	return q, nil
}
```
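To isolate the branching in Querier(), the sketch below injects the preference check as a function field instead of calling the real preferLocalStorage. Querier, noopQuerier, remoteQuerier, client and prefer are all hypothetical simplifications; a real storage.Querier has a much larger interface (Select, LabelValues, and so on).

```go
package main

import (
	"errors"
	"fmt"
)

// Querier is a hypothetical one-method stand-in for storage.Querier.
type Querier interface{ Describe() string }

// noopQuerier plays the role of storage.NoopQuerier(): it never touches remote storage.
type noopQuerier struct{}

func (noopQuerier) Describe() string { return "noop" }

// remoteQuerier records the time range that would be sent to the remote TSDB.
type remoteQuerier struct{ mint, maxt int64 }

func (q *remoteQuerier) Describe() string { return fmt.Sprintf("remote[%d,%d]", q.mint, q.maxt) }

// client is a hypothetical simplification of sampleAndChunkQueryableClient.
type client struct {
	readRecent bool
	// prefer stands in for preferLocalStorage: returns (cmaxt, noop, err).
	prefer func(mint, maxt int64) (int64, bool, error)
}

func (c *client) Querier(mint, maxt int64) (Querier, error) {
	q := &remoteQuerier{mint: mint, maxt: maxt}
	if c.readRecent {
		return q, nil // bypass the local-preference check entirely
	}
	cmaxt, noop, err := c.prefer(mint, maxt)
	if err != nil {
		return nil, err
	}
	if noop {
		return noopQuerier{}, nil // remote TSDB is skipped
	}
	q.maxt = cmaxt // remote range may be capped at localStartTime
	return q, nil
}

func main() {
	c := &client{prefer: func(mint, maxt int64) (int64, bool, error) { return 0, true, nil }}
	q, _ := c.Querier(10, 20)
	fmt.Println(q.Describe())

	c.readRecent = true
	q, _ = c.Querier(10, 20)
	fmt.Println(q.Describe())

	failing := &client{prefer: func(int64, int64) (int64, bool, error) { return 0, false, errors.New("tsdb not ready") }}
	_, err := failing.Querier(10, 20)
	fmt.Println(err)
}
```

Returning a no-op querier rather than nil keeps the caller's merge path uniform: the fan-out can always iterate over its queriers, and a skipped remote simply contributes nothing.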
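Implementation of c.preferLocalStorage():
- mint and maxt are the time bounds passed in with the query request;
- For the query range mint~maxt:
  - If the range starts after localStartTime (mint > localStartTime), the remote TSDB is not needed; querying the local TSDB alone is enough;
  - Otherwise, both the local and the remote TSDB are queried and the results merged; if maxt > localStartTime, the remote query's maxt is capped at localStartTime, so the remote TSDB only supplies samples older than localStartTime;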
```go
// storage/remote/read.go
func (c *sampleAndChunkQueryableClient) preferLocalStorage(mint, maxt int64) (cmaxt int64, noop bool, err error) {
	localStartTime, err := c.callback()
	if err != nil {
		return 0, false, err
	}
	cmaxt = maxt
	// Avoid queries whose time range is later than the first timestamp in local DB.
	if mint > localStartTime {
		return 0, true, nil
	}
	// Query only samples older than the first timestamp in local DB.
	if maxt > localStartTime {
		cmaxt = localStartTime
	}
	return cmaxt, false, nil
}
```
How localStartTime is computed:
- If blocks exist, localStartTime = block0.minTime + 4 hours;
  - the 4 hours is startTimeMargin;
- Otherwise, localStartTime = time.Now() + 4 hours:
```go
// cmd/prometheus/main.go
func (s *readyStorage) StartTime() (int64, error) {
	if x := s.get(); x != nil {
		var startTime int64
		if len(x.Blocks()) > 0 {
			startTime = x.Blocks()[0].Meta().MinTime
		} else {
			startTime = time.Now().Unix() * 1000
		}
		// Add a safety margin as it may take a few minutes for everything to spin up.
		// s.startTimeMargin = 4h
		return startTime + s.startTimeMargin, nil
	}
	return math.MaxInt64, tsdb.ErrNotReady
}
```
Computing startTimeMargin (4 hours): twice MinBlockDuration, which defaults to 2 hours, converted to milliseconds:
```go
// cfg.tsdb.MinBlockDuration = 2h
startTimeMargin := int64(2 * time.Duration(cfg.tsdb.MinBlockDuration).Seconds() * 1000)
```
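In summary, the logic of c.preferLocalStorage() is shown in the figure:
- mint/maxt are the start and end times passed in with the query;
- localStartTime is the time computed by the code above.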