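A Retention Policy (RP) defines how long data is kept: once data is older than the configured duration, it is deleted automatically.
Combining CQ (Continuous Query) with RP lets you keep historical data at low precision and recent data at high precision, which reduces storage usage.
RP syntax: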
CREATE RETENTION POLICY <retention_policy_name> ON <database_name> DURATION <duration> REPLICATION <n> [SHARD DURATION <duration>] [DEFAULT]
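Where:
- duration specifies how long data is kept; data older than this is deleted automatically;
- replication specifies the number of replicas of each shard; the default is 1, and cluster deployments need >= 2;
- shard duration specifies the length of time covered by each shardGroup; it may be omitted, in which case the system computes a value from duration;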
- default specifies whether this RP is the database's default RP; the default RP is the one used when a write or query does not name an RP explicitly;
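To make the clauses above concrete, here is a minimal sketch that assembles a statement from its parts. `buildCreateRP` is a hypothetical helper written for this article, not an InfluxDB API, and the names `six_month` and `mydatabase` are only sample values.

```go
package main

import (
	"fmt"
	"strings"
)

// buildCreateRP is a hypothetical helper (not part of InfluxDB) that assembles
// a CREATE RETENTION POLICY statement. An empty shardDuration omits the
// optional SHARD DURATION clause. %q is only adequate for simple identifiers
// without special characters.
func buildCreateRP(name, db, duration string, replication int, shardDuration string, isDefault bool) string {
	var b strings.Builder
	fmt.Fprintf(&b, "CREATE RETENTION POLICY %q ON %q DURATION %s REPLICATION %d",
		name, db, duration, replication)
	if shardDuration != "" {
		fmt.Fprintf(&b, " SHARD DURATION %s", shardDuration)
	}
	if isDefault {
		b.WriteString(" DEFAULT")
	}
	return b.String()
}

func main() {
	// All clauses present.
	fmt.Println(buildCreateRP("six_month", "mydatabase", "26w", 1, "1d", true))
	// Optional clauses omitted.
	fmt.Println(buildCreateRP("one_day", "mydatabase", "1d", 1, "", false))
}
```

The first call prints a statement with every clause; the second leaves out `SHARD DURATION` and `DEFAULT`, so the shard group duration would be derived from the RP duration.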
InfluxDB ships with a built-in default policy, autogen:
- duration=0s means the data never expires;
- shardGroupDuration=168h means one shardGroup holds 7 days of data;
    > show retention policies;
    name    duration shardGroupDuration replicaN default
    ----    -------- ------------------ -------- -------
    autogen 0s       168h0m0s           1        true
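shardGroup vs shard:
- a shardGroup contains one or more shards;
- a shardGroup holds the data for one time range, and the data of every shard in the group falls within that range;
Here are the shard groups of a single-node InfluxDB with rp=autogen and replica=1: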
    > show shard groups;
    name: shard groups
    id database retention_policy start_time           end_time             expiry_time
    -- -------- ---------------- ----------           --------             -----------
    25 falcon   autogen          2020-04-27T00:00:00Z 2020-05-04T00:00:00Z 2020-05-04T00:00:00Z
    33 falcon   autogen          2020-05-04T00:00:00Z 2020-05-11T00:00:00Z 2020-05-11T00:00:00Z
    42 falcon   autogen          2020-05-11T00:00:00Z 2020-05-18T00:00:00Z 2020-05-18T00:00:00Z
    51 falcon   autogen          2020-05-18T00:00:00Z 2020-05-25T00:00:00Z 2020-05-25T00:00:00Z
    60 falcon   autogen          2020-05-25T00:00:00Z 2020-06-01T00:00:00Z 2020-06-01T00:00:00Z
    69 falcon   autogen          2020-06-01T00:00:00Z 2020-06-08T00:00:00Z 2020-06-08T00:00:00Z
    78 falcon   autogen          2020-06-08T00:00:00Z 2020-06-15T00:00:00Z 2020-06-15T00:00:00Z
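The start and end times in this output can be reproduced with a small sketch. It assumes that group boundaries fall on multiples of the shard group duration counted from Go's zero time, which is consistent with the Monday-to-Monday ranges listed above; `shardGroupWindow`, `week` and `sample` are names made up for this illustration.

```go
package main

import (
	"fmt"
	"time"
)

// week matches the 168h shardGroupDuration of the autogen policy.
const week = 168 * time.Hour

// sample is an arbitrary timestamp inside one of the groups listed above.
var sample = time.Date(2020, 5, 6, 12, 0, 0, 0, time.UTC)

// shardGroupWindow returns the [start, end) range of the shard group that
// would hold timestamp t, assuming boundaries are multiples of d.
func shardGroupWindow(t time.Time, d time.Duration) (start, end time.Time) {
	start = t.UTC().Truncate(d) // round down to a multiple of d
	end = start.Add(d)
	return start, end
}

func main() {
	start, end := shardGroupWindow(sample, week)
	fmt.Println(start.Format(time.RFC3339), end.Format(time.RFC3339))
	// prints: 2020-05-04T00:00:00Z 2020-05-11T00:00:00Z
}
```

The sample timestamp 2020-05-06T12:00:00Z lands in the window that corresponds to shard group 33 in the listing.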
Each shardGroup holds 7 days of data, and each shardGroup contains one shard:
    > show shards;
    name: falcon
    id database retention_policy shard_group start_time           end_time             expiry_time          owners
    -- -------- ---------------- ----------- ----------           --------             -----------          ------
    25 falcon   autogen          25          2020-04-27T00:00:00Z 2020-05-04T00:00:00Z 2020-05-04T00:00:00Z
    33 falcon   autogen          33          2020-05-04T00:00:00Z 2020-05-11T00:00:00Z 2020-05-11T00:00:00Z
    42 falcon   autogen          42          2020-05-11T00:00:00Z 2020-05-18T00:00:00Z 2020-05-18T00:00:00Z
    51 falcon   autogen          51          2020-05-18T00:00:00Z 2020-05-25T00:00:00Z 2020-05-25T00:00:00Z
    60 falcon   autogen          60          2020-05-25T00:00:00Z 2020-06-01T00:00:00Z 2020-06-01T00:00:00Z
    69 falcon   autogen          69          2020-06-01T00:00:00Z 2020-06-08T00:00:00Z 2020-06-08T00:00:00Z
    78 falcon   autogen          78          2020-06-08T00:00:00Z 2020-06-15T00:00:00Z 2020-06-15T00:00:00Z
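1. How is the number of shards in a shardGroup determined?

    // replicaN is the replica count specified when the RP was created
    shardN := len(data.DataNodes) / replicaN

This is integer division: with 3 data nodes and 2 replicas per shard, each shardGroup has only 3/2=1 shard;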
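The division can be tried out with a few node and replica counts. This is a minimal sketch: `shardCount` is a hypothetical function, and the two clamping steps (treating a replica count below 1 as 1, and never returning fewer than one shard) are assumptions added so the sketch is safe to call, not something the snippet above shows.

```go
package main

import "fmt"

// shardCount sketches the formula shardN = dataNodes / replicaN using
// integer division.
func shardCount(dataNodes, replicaN int) int {
	if replicaN < 1 {
		replicaN = 1 // assumption: avoid dividing by zero
	}
	n := dataNodes / replicaN
	if n < 1 {
		n = 1 // assumption: a shard group always has at least one shard
	}
	return n
}

func main() {
	fmt.Println(shardCount(1, 1)) // single node: 1 shard
	fmt.Println(shardCount(3, 2)) // 3/2 = 1 shard
	fmt.Println(shardCount(4, 2)) // 4/2 = 2 shards
	fmt.Println(shardCount(6, 2)) // 6/2 = 3 shards
}
```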
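2. When time-series data is written, the timestamp first determines the shardGroup. How is the shard within that group chosen?
A hash is computed from the series key of the written point, and hash % shardN decides which shard receives it;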
    // HashID returns a non-cryptographic checksum of the point's key.
    func (p *point) HashID() uint64 {
        h := NewInlineFNV64a()
        h.Write(p.key) // p.key is measurement + tags
        sum := h.Sum64()
        return sum
    }

    func (sgi *ShardGroupInfo) ShardFor(hash uint64) ShardInfo {
        return sgi.Shards[hash%uint64(len(sgi.Shards))]
    }
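The routing above can be imitated in a standalone program. This sketch uses the standard library's `hash/fnv` as a stand-in for `NewInlineFNV64a`; the series keys and the `shardIndex` function are made up for illustration, and the resulting shard numbers are not claimed to match what a real InfluxDB node would choose.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// shardIndex hashes a series key (measurement + tags) with 64-bit FNV-1a and
// maps the hash onto one of shardN shards. shardN must be greater than 0.
func shardIndex(seriesKey string, shardN int) int {
	h := fnv.New64a()
	h.Write([]byte(seriesKey)) // Write on a hash.Hash never returns an error
	return int(h.Sum64() % uint64(shardN))
}

func main() {
	keys := []string{"cpu,host=a", "cpu,host=b", "mem,host=a"}
	for _, k := range keys {
		fmt.Printf("%s -> shard index %d of 2\n", k, shardIndex(k, 2))
	}
}
```

Because the hash depends only on the series key, all points of one series within a shardGroup end up in the same shard.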
Each shard corresponds to a directory on the OS; the directory name is the shardId, an increasing integer:
    /var/lib/influxdb/data/mydatabase/six_month # ls -alh
    total 0
    drwx------ 6 root root 42 Sep  6 08:00 .
    drwx------ 4 root root 38 Sep  3 14:59 ..
    drwxr-xr-x 3 root root 68 Sep  6 08:41 1
    drwxr-xr-x 3 root root 68 Sep  3 16:48 3
    drwxr-xr-x 3 root root 68 Sep  3 17:02 4
    drwxr-xr-x 3 root root 68 Sep 15 09:59 6
The shard directory is: datapath/database/retentionPolicy/shardId
    func (s *Store) CreateShard(database, retentionPolicy string, shardID uint64, enabled bool) error {
        ......
        path := filepath.Join(s.path, database, retentionPolicy, strconv.FormatUint(shardID, 10))
        shard := NewShard(shardID, path, walPath, sfile, opt)
        ......
    }
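The path construction can be run on its own with the values from the directory listing above. `shardPath` is a hypothetical wrapper around the same `filepath.Join` and `strconv.FormatUint` calls; the printed result assumes a Unix-style path separator.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strconv"
)

// shardPath joins datapath/database/retentionPolicy/shardId, formatting the
// numeric shard ID as a base-10 string.
func shardPath(dataPath, database, retentionPolicy string, shardID uint64) string {
	return filepath.Join(dataPath, database, retentionPolicy, strconv.FormatUint(shardID, 10))
}

func main() {
	fmt.Println(shardPath("/var/lib/influxdb/data", "mydatabase", "six_month", 6))
	// On Linux this prints: /var/lib/influxdb/data/mydatabase/six_month/6
}
```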
3. How the default shardGroupDuration is computed:
If the shardGroup duration is not specified, InfluxDB derives one from the RP's duration as follows:
RP's DURATION | shardGroup duration |
---|---|
< 2 days | 1 hour |
>= 2 days and <= 6 months | 1 day |
> 6 months | 7 days |
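The code that implements this calculation is shown below. Note that in the code a duration of exactly 180 days, as well as a duration of 0 (never expire), falls into the 7-day case: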
    // shardGroupDuration returns the duration for a shard group based on a policy duration.
    func shardGroupDuration(d time.Duration) time.Duration {
        if d >= 180*24*time.Hour || d == 0 { // 6 months or 0
            return 7 * 24 * time.Hour
        } else if d >= 2*24*time.Hour { // 2 days
            return 1 * 24 * time.Hour
        }
        return 1 * time.Hour
    }
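To see where the thresholds fall, the function above can be copied into a standalone program and probed at its boundaries. The sample durations and the `day` constant are chosen for this sketch only.

```go
package main

import (
	"fmt"
	"time"
)

// day is a convenience constant for this sketch.
const day = 24 * time.Hour

// shardGroupDuration is copied from the snippet above.
func shardGroupDuration(d time.Duration) time.Duration {
	if d >= 180*day || d == 0 { // 6 months or 0
		return 7 * day
	} else if d >= 2*day { // 2 days
		return 1 * day
	}
	return 1 * time.Hour
}

func main() {
	samples := []time.Duration{
		0,                     // never expire
		time.Hour,             // well below 2 days
		2*day - time.Second,   // just below 2 days
		2 * day,               // exactly 2 days
		180*day - time.Second, // just below 180 days
		180 * day,             // exactly 180 days
	}
	for _, d := range samples {
		fmt.Printf("RP duration %v -> shard group duration %v\n", d, shardGroupDuration(d))
	}
}
```

Running it shows each sample jumping to the next tier exactly at 2 days and at 180 days, and the 0 duration of autogen mapping to 7 days, which matches the 168h0m0s shown by `show retention policies`.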