JD Logistics: 康睿, 姚再毅, 李振, 刘斌, 王北永
Note: everything below is based on Elasticsearch 8.1.
I. Cross-cluster search (CCS)
Official documentation:
https://www.elastic.co/guide/en/elasticsearch/reference/8.1/modules-cross-cluster-search.html
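Background and motivation for cross-cluster search
Definition of cross-cluster search
Setting up a cross-cluster search environment
Official documentation: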
https://www.elastic.co/guide/en/elasticsearch/reference/8.1/modules-cross-cluster-search.html
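Step 1: Set up two local single-node clusters. For local practice you can disable the security configuration.
Step 2: Run the following command on each cluster
PUT _cluster/settings
{
"persistent": {
"cluster": {
"remote": {
"cluster_one": {"seeds": ["172.21.0.14:9301"]},
"cluster_two": {"seeds": ["172.21.0.14:9302"]}
}
}
}
}
Step 3: Verify that the clusters can reach each other
Option 1: Check in Kibana: Stack Management -> Remote Clusters. The status should be "connected" and show a green check mark.
Option 2: GET _remote/info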
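As a rough companion to steps 2 and 3, the Python sketch below builds the settings body from a mapping of cluster alias to seed addresses and then inspects a dictionary shaped like a `GET _remote/info` response. The function names and the trimmed sample response are illustrative assumptions; the check relies only on the `connected` field of each entry.

```python
import json


def remote_settings_body(seeds_by_alias):
    """Build the body for PUT _cluster/settings from {alias: [seed, ...]}."""
    remote = {alias: {"seeds": list(seeds)} for alias, seeds in seeds_by_alias.items()}
    return {"persistent": {"cluster": {"remote": remote}}}


def disconnected_clusters(remote_info):
    """Return the aliases whose entry is not marked as connected."""
    return sorted(
        alias for alias, info in remote_info.items()
        if not info.get("connected", False)
    )


body = remote_settings_body({
    "cluster_one": ["172.21.0.14:9301"],
    "cluster_two": ["172.21.0.14:9302"],
})
print(json.dumps(body, indent=2))

# Hypothetical response, trimmed to the single field this check reads.
sample_info = {
    "cluster_one": {"connected": True},
    "cluster_two": {"connected": False},
}
print(disconnected_clusters(sample_info))
```

An empty list from `disconnected_clusters` corresponds to every remote showing "connected" in Kibana.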
Cross-cluster search walkthrough
# Step 1: add the following data to cluster 1
PUT test01/_bulk
{"index":{"_id":1}}
{"title":"this is from cluster01..."}
# Step 2: add the following data to cluster 2
PUT test01/_bulk
{"index":{"_id":1}}
{"title":"this is from cluster02..."}
# Step 3: run the cross-cluster search. Syntax: POST <cluster 1 alias>:<index>,<cluster 2 alias>:<index>/_search
POST cluster_one:test01,cluster_two:test01/_search
# Response:
{
"took" : 7,
"timed_out" : false,
"num_reduce_phases" : 3,
"_shards" : {
"total" : 2,
"successful" : 2,
"skipped" : 0,
"failed" : 0
},
"_clusters" : {
"total" : 2,
"successful" : 2,
"skipped" : 0
},
"hits" : {
"total" : {
"value" : 2,
"relation" : "eq"
},
"max_score" : 1.0,
"hits" : [
{
"_index" : "cluster_two:test01",
"_id" : "1",
"_score" : 1.0,
"_source" : {"title" : "this is from cluster02..."}
},
{
"_index" : "cluster_one:test01",
"_id" : "1",
"_score" : 1.0,
"_source" : {"title" : "this is from cluster01..."}
}
]
}
}
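The request syntax and the `_index` values in the response follow the same `<cluster alias>:<index>` convention. A minimal Python sketch of that convention, assuming hypothetical helper names and a response trimmed to the fields used here, might look like this:

```python
def ccs_search_path(targets):
    """Build the search path from (cluster_alias, index_name) pairs."""
    targets = list(targets)
    if not targets:
        raise ValueError("at least one (cluster, index) pair is required")
    return ",".join(f"{cluster}:{index}" for cluster, index in targets) + "/_search"


def hits_by_cluster(response):
    """Group hit sources by the cluster alias that prefixes each _index."""
    grouped = {}
    for hit in response["hits"]["hits"]:
        cluster, _, _index = hit["_index"].partition(":")
        grouped.setdefault(cluster, []).append(hit["_source"])
    return grouped


path = ccs_search_path([("cluster_one", "test01"), ("cluster_two", "test01")])
print(path)

# Trimmed copy of the response shown above.
response = {"hits": {"hits": [
    {"_index": "cluster_two:test01", "_source": {"title": "this is from cluster02..."}},
    {"_index": "cluster_one:test01", "_source": {"title": "this is from cluster01..."}},
]}}
print(hits_by_cluster(response))
```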
II. Cross-cluster replication (CCR): requires a paid license
Official documentation:
https://www.elastic.co/guide/en/elasticsearch/reference/current/xpack-ccr.html
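How to keep a cluster highly available
- Replica shards
- Snapshot and restore
- Cross-cluster replication (similar to MySQL primary/replica synchronization)
Cross-cluster replication overview
Configuring cross-cluster replication
1. Prepare two clusters that can reach each other over the network
2. Activate a license; a 30-day trial is available
a. Location: Stack Management -> License management
3. Decide which cluster is the leader and which is the follower
4. On the follower cluster, register the leader cluster as a remote cluster
5. On the follower cluster, configure the replication rule for the leader's indices (in the Kibana UI)
a. Stack Management -> Cross-Cluster Replication -> Create a follower index
6. Enable the configuration from step 5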
III. Index templates
Official documentation:
https://www.elastic.co/guide/en/elasticsearch/reference/8.1/index-templates.html
Component templates in 8.x
1. Create a component template for index settings
# Component template: index settings
PUT _component_template/template_sttting_part
{
"template": {
"settings": {
"number_of_shards": 3,
"number_of_replicas": 0
}
}
}
2. Create a component template for index mappings
# Component template: index mappings
PUT _component_template/template_mapping_part
{
"template": {
"mappings": {
"properties": {
"hosr_name":{"type": "keyword"},
"cratet_at":{
"type": "date",
"format": "EEE MMM dd HH:mm:ss Z yyyy"
}
}
}
}
}
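3. Create the index template that ties the component templates to indices
# Note: if several component templates in composed_of define the same option, the later one overrides the earlier one, so the order of the list matters
# Build the index template from the component templates
# Any index created with a name matching tem_* will have the rules below applied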
PUT _index_template/template_1
{
"index_patterns": ["tem_*"],
"composed_of": [
"template_sttting_part",
"template_mapping_part"
]
}
4. Test
# Create a test index
PUT tem_001
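To make the override rule for `composed_of` concrete, here is a simplified Python model of merging component templates in list order. It is only a sketch of the "last one wins" behaviour, not Elasticsearch's actual merge code, and the template names `settings_a` and `settings_b` are made up for the example.

```python
def merge(base, override):
    """Recursively merge override into a copy of base; override wins on conflicts."""
    result = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(result.get(key), dict):
            result[key] = merge(result[key], value)
        else:
            result[key] = value
    return result


def compose(component_templates, composed_of):
    """Apply component templates in the order listed in composed_of."""
    merged = {}
    for name in composed_of:
        merged = merge(merged, component_templates[name])
    return merged


# Hypothetical component templates that both set number_of_replicas.
components = {
    "settings_a": {"settings": {"number_of_shards": 3, "number_of_replicas": 0}},
    "settings_b": {"settings": {"number_of_replicas": 1}},
}

a_then_b = compose(components, ["settings_a", "settings_b"])
b_then_a = compose(components, ["settings_b", "settings_a"])
print(a_then_b["settings"]["number_of_replicas"])
print(b_then_a["settings"]["number_of_replicas"])
```

Swapping the two names changes the resulting replica count, which is why the order of the list deserves attention.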
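Basic index template operations
Hands-on practice
Requirement 1: Without an explicit mapping, whole numbers are dynamically mapped to long by default. In practice the business values are small, so this wastes storage. Map them to integer by default instead.
Index templates, official documentation:
https://www.elastic.co/guide/en/elasticsearch/reference/8.1/index-templates.html
Mapping dynamic templates, official documentation: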
https://www.elastic.co/guide/en/elasticsearch/reference/8.1/dynamic-templates.html
# Combine mapping dynamic templates with index templates
# 1. Create the component template for the mapping
PUT _component_template/template_mapping_part_01
{
"template": {
"mappings": {
"dynamic_templates": [
{
"integers": {
"match_mapping_type": "long",
"mapping": {"type": "integer"}
}
}
]
}
}
}
# 2. Create the index template that applies the component template
PUT _index_template/template_2
{"index_patterns": ["tem1_*"],
"composed_of": ["template_mapping_part_01"]
}
# 3. Create test data
POST tem1_001/_doc/1
{"age":18}
# 4. Check the mapping to verify
GET tem1_001/_mapping
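Requirement 2: Fields whose names start with date_ should all be mapped to the date type.
Index templates, official documentation:
https://www.elastic.co/guide/en/elasticsearch/reference/8.1/index-templates.html
Mapping dynamic templates, official documentation: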
https://www.elastic.co/guide/en/elasticsearch/reference/8.1/dynamic-templates.html
# Combine mapping dynamic templates with index templates
# 1. Create the component template for the mapping
PUT _component_template/template_mapping_part_01
{
"template": {
"mappings": {
"dynamic_templates": [
{
"integers": {
"match_mapping_type": "long",
"mapping": {"type": "integer"}
}
},
{
"date_type_process": {
"match": "date_*",
"mapping": {
"type": "date",
"format":"yyyy-MM-dd HH:mm:ss"
}
}
}
]
}
}
}
# 2. Create the index template that applies the component template
PUT _index_template/template_2
{"index_patterns": ["tem1_*"],
"composed_of": ["template_mapping_part_01"]
}
# 3. Create test data
POST tem1_001/_doc/2
{
"age":19,
"date_aoe":"2022-01-01 18:18:00"
}
# 4. Check the mapping to verify
GET tem1_001/_mapping
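The two dynamic templates can be read as an ordered rule list: the first template whose conditions all match decides the mapping. The Python sketch below models that reading for the two rules used in this section. It is a simplification under stated assumptions: `fnmatchcase` stands in for the `*` wildcard of `match`, the detected types (`long`, `string`) are passed in by hand, and `resolve_mapping` is a hypothetical name.

```python
from fnmatch import fnmatchcase

# Same two rules as template_mapping_part_01 above.
DYNAMIC_TEMPLATES = [
    {"integers": {"match_mapping_type": "long", "mapping": {"type": "integer"}}},
    {"date_type_process": {
        "match": "date_*",
        "mapping": {"type": "date", "format": "yyyy-MM-dd HH:mm:ss"},
    }},
]


def resolve_mapping(field_name, detected_type, templates=DYNAMIC_TEMPLATES):
    """Return the mapping of the first matching template, or None if none match."""
    for entry in templates:
        rule = next(iter(entry.values()))
        if "match_mapping_type" in rule and rule["match_mapping_type"] != detected_type:
            continue
        if "match" in rule and not fnmatchcase(field_name, rule["match"]):
            continue
        return rule["mapping"]
    return None  # the default dynamic mapping would apply


print(resolve_mapping("age", "long"))
print(resolve_mapping("date_aoe", "string"))
print(resolve_mapping("title", "string"))
```

A `None` result here simply means neither rule applies, so the field would fall back to the default dynamic mapping.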
IV. ILM: index lifecycle management
Official documentation:
https://www.elastic.co/guide/en/elasticsearch/reference/8.1/index-lifecycle-management.html
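What is the index lifecycle?
The life of an index: birth -> aging -> sickness -> death
Have you considered what happens if an index is created and then never managed again?
What is index lifecycle management?
What happens when an index grows too large?
- Recovering a large index takes far longer than recovering a small one
- Once an index is large, searches slow down, and writes and updates are affected to varying degrees
- Once an index is large enough, a health problem in that index can make the cluster's core business unavailable
Best practices
- Upper limit on the number of documents in a single shard: about 2^31 - 1, roughly 2 billion
- Official recommendation: keep shard size within 30GB-50GB. If an index's data grows without bound, it will certainly exceed this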
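The sizing guidance reduces to simple arithmetic. The sketch below, with a hypothetical helper name and example data volumes chosen only for illustration, shows how many primary shards a given data volume needs to stay under a target shard size, and that 2^31 - 1 is the value that matches "about 2 billion".

```python
import math

# 2**31 - 1 is about 2.1 billion, consistent with the "about 2 billion" figure.
APPROX_MAX_DOCS_PER_SHARD = 2**31 - 1


def primary_shards_needed(total_gb, target_shard_gb=50):
    """Smallest number of primary shards that keeps each shard at or under the target size."""
    if total_gb <= 0 or target_shard_gb <= 0:
        raise ValueError("sizes must be positive")
    return math.ceil(total_gb / target_shard_gb)


print(APPROX_MAX_DOCS_PER_SHARD)
print(primary_shards_needed(500))                      # 50GB target
print(primary_shards_needed(500, target_shard_gb=30))  # 30GB target
```

Because the primary shard count of an existing index is fixed at creation, an index that keeps growing eventually breaks any such budget, which is the motivation for rollover.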
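Users do not care about the full data set
- In some business scenarios only recent data matters, such as the last 3 days or the last 7 days
- A large index lumps all historical data together, which does not suit this kind of query
History of index lifecycle management
ILM prelude: rollover
Official documentation: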
https://www.elastic.co/guide/en/elasticsearch/reference/8.1/index-rollover.html
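# 0. Prerequisite for testing: shorten the ILM poll interval, which controls how often rollover conditions are checked (10 minutes by default)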
PUT _cluster/settings
{
"persistent": {"indices.lifecycle.poll_interval": "1s"}
}
# 1. Create the index and assign an alias
PUT test_index-0001
{
"aliases": {
"my-test-index-alias": {"is_write_index": true}
}
}
# 2. Bulk import data
PUT my-test-index-alias/_bulk
{"index":{"_id":1}}
{"title":"testing 01"}
{"index":{"_id":2}}
{"title":"testing 02"}
{"index":{"_id":3}}
{"title":"testing 03"}
{"index":{"_id":4}}
{"title":"testing 04"}
{"index":{"_id":5}}
{"title":"testing 05"}
# 3. Roll over with conditions
POST my-test-index-alias/_rollover
{
"conditions": {
"max_age": "7d",
"max_docs": 5,
"max_primary_shard_size": "50gb"
}
}
# 4. With a condition met, the rollover has created a new index; write to it through the alias
PUT my-test-index-alias/_bulk
{"index":{"_id":7}}
{"title":"testing 07"}
# 5. Search to verify that the rollover succeeded
POST my-test-index-alias/_search
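Two details of the rollover request are easy to miss: the index rolls over as soon as any one of the listed conditions is met, and the new index name increments the trailing number and pads it to six digits. The Python sketch below models both. The function names and the numeric condition keys (`max_age_days`, `max_primary_shard_size_gb`) are assumptions made to avoid parsing unit strings.

```python
import re


def should_rollover(age_days, docs, largest_primary_shard_gb, conditions):
    """True when any supplied condition is met; no conditions means roll over unconditionally."""
    if not conditions:
        return True
    checks = []
    if "max_age_days" in conditions:
        checks.append(age_days >= conditions["max_age_days"])
    if "max_docs" in conditions:
        checks.append(docs >= conditions["max_docs"])
    if "max_primary_shard_size_gb" in conditions:
        checks.append(largest_primary_shard_gb >= conditions["max_primary_shard_size_gb"])
    return any(checks)


def next_rollover_name(index_name):
    """Increment the trailing number and zero-pad it to six digits."""
    match = re.fullmatch(r"(.*-)(\d+)", index_name)
    if match is None:
        raise ValueError("index name must end with '-' followed by a number")
    prefix, number = match.groups()
    return f"{prefix}{int(number) + 1:06d}"


# Same thresholds as the request in step 3.
conditions = {"max_age_days": 7, "max_docs": 5, "max_primary_shard_size_gb": 50}
print(should_rollover(age_days=0, docs=5, largest_primary_shard_gb=0.001, conditions=conditions))
print(next_rollover_name("test_index-0001"))
```

With the five documents from step 2, only `max_docs` is satisfied, and that alone is enough to trigger the rollover.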
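ILM prelude: shrink
Official documentation:
https://www.elastic.co/guide/en/elasticsearch/reference/8.1/ilm-shrink.html
Core steps:
1. Relocate a copy of every shard to a single node
2. Block writes to the index
3. Only then can the index be shrunk
# 1. Prepare test data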
DELETE kibana_sample_data_logs_ext
PUT kibana_sample_data_logs_ext
{
"settings": {
"number_of_shards": 5,
"number_of_replicas": 0
}
}
POST _reindex
{
"source": {"index": "kibana_sample_data_logs"},
"dest": {"index": "kibana_sample_data_logs_ext"}
}
# 2. Settings required before shrinking
# index.number_of_replicas: set replicas to 0
# index.routing.allocation.include._tier_preference: route all shards to the hot node
# index.blocks.write: block further writes to the index
PUT kibana_sample_data_logs_ext/_settings
{
"settings": {
"index.number_of_replicas": 0,
"index.routing.allocation.include._tier_preference": "data_hot",
"index.blocks.write": true
}
}
# 3. Run the shrink
POST kibana_sample_data_logs_ext/_shrink/kibana_sample_data_logs_ext_shrink
{
"settings":{
"index.number_of_replicas": 0,
"index.number_of_shards": 1,
"index.codec":"best_compression"
},
"aliases":{"kibana_sample_data_logs_alias":{}
}
}
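The shrink request above goes from 5 primary shards to 1. The target count cannot be chosen freely: it has to be a factor of the source shard count. A small Python sketch (the helper name is hypothetical) lists the allowed targets:

```python
def valid_shrink_targets(source_shards):
    """Shard counts smaller than the source that divide it evenly."""
    if source_shards < 1:
        raise ValueError("source_shards must be at least 1")
    return [n for n in range(1, source_shards) if source_shards % n == 0]


print(valid_shrink_targets(5))  # the 5-shard index used in this example
print(valid_shrink_targets(8))
```

Since 5 is prime, the example index can only be shrunk to a single shard; an 8-shard index could also be shrunk to 4 or 2.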
ILM in practice
The big picture: four phases
Official documentation:
https://www.elastic.co/guide/en/elasticsearch/reference/8.1/overview-index-lifecycle-management.html
Lifecycle phases (policy):
https://www.elastic.co/guide/en/elasticsearch/reference/8.1/ilm-index-lifecycle.html
Hot phase (birth)
Set priority
Unfollow
Rollover
Read-only
Shrink
Force merge
Searchable snapshot
Warm phase (aging)
Set priority
Unfollow
Read-only
Allocate
Migrate
Shrink
Force merge
Cold phase (sickness)
Searchable snapshot
Delete phase (death)
Delete
Walkthrough
1. Create the policy
- Hot phase: rollover with max_age: 3d, max_docs: 5, max_size: 50gb; priority: 100
- Warm phase: min_age: 15s; force merge segments, move from hot nodes to warm nodes, set replicas to 0; priority: 50
- Cold phase: min_age: 30s; move from warm to cold
- Delete phase: min_age: 45s; delete the index
PUT _ilm/policy/kr_20221114_policy
{
"policy": {
"phases": {
"hot": {
"min_age": "0ms",
"actions": {
"set_priority": {"priority": 100},
"rollover": {
"max_size": "50gb",
"max_primary_shard_size": "50gb",
"max_age": "3d",
"max_docs": 5
}
}
},
"warm": {
"min_age": "15s",
"actions": {
"forcemerge": {"max_num_segments": 1},
"set_priority": {"priority": 50},
"allocate": {"number_of_replicas": 0}
}
},
"cold": {
"min_age": "30s",
"actions": {
"set_priority": {"priority": 0}
}
},
"delete": {
"min_age": "45s",
"actions": {
"delete": {"delete_searchable_snapshot": true}
}
}
}
}
}
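The `min_age` values in kr_20221114_policy define a timeline. The Python sketch below computes which phase an index is eligible for at a given number of seconds after rollover. It is a simplified model: real transitions also wait for the ILM poll interval and for the previous phase's actions to finish, and the helper names are hypothetical.

```python
import re

_UNIT_SECONDS = {"ms": 0.001, "s": 1, "m": 60, "h": 3600, "d": 86400}

# min_age values copied from the policy above.
PHASE_MIN_AGE = {"hot": "0ms", "warm": "15s", "cold": "30s", "delete": "45s"}


def to_seconds(value):
    """Convert a time value such as '15s' or '3d' to seconds."""
    match = re.fullmatch(r"(\d+)(ms|s|m|h|d)", value)
    if match is None:
        raise ValueError(f"unsupported time value: {value!r}")
    return int(match.group(1)) * _UNIT_SECONDS[match.group(2)]


def eligible_phase(seconds_since_rollover, min_ages=PHASE_MIN_AGE):
    """Latest phase whose min_age has been reached, checked in lifecycle order."""
    current = None
    for phase in ("hot", "warm", "cold", "delete"):
        if phase in min_ages and seconds_since_rollover >= to_seconds(min_ages[phase]):
            current = phase
    return current


for elapsed in (5, 20, 40, 50):
    print(elapsed, eligible_phase(elapsed))
```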
2. Create the index template
PUT _index_template/kr_20221114_template
{"index_patterns": ["kr_index-**"],
"template": {
"settings": {
"index": {
"lifecycle": {
"name": "kr_20221114_policy",
"rollover_alias": "kr-index-alias"
},
"routing": {
"allocation": {
"include": {"_tier_preference": "data_hot"}
}
},
"number_of_shards": "3",
"number_of_replicas": "1"
}
},
"aliases": {},
"mappings": {}}
}
3. For testing, shorten the ILM poll interval
PUT _cluster/settings
{
"persistent": {"indices.lifecycle.poll_interval": "1s"}
}
4. Run the test
# Create the index and assign a writable alias
PUT kr_index-0001
{
"aliases": {
"kr-index-alias": {"is_write_index": true}
}
}
# Add data through the alias
PUT kr-index-alias/_bulk
{"index":{"_id":1}}
{"title":"testing 01"}
{"index":{"_id":2}}
{"title":"testing 02"}
{"index":{"_id":3}}
{"title":"testing 03"}
{"index":{"_id":4}}
{"title":"testing 04"}
{"index":{"_id":5}}
{"title":"testing 05"}
# Add more data through the alias to trigger the rollover
PUT kr-index-alias/_bulk
{"index":{"_id":6}}
{"title":"testing 06"}
# Check the indices
GET kr_index-0001
GET _cat/indices?v
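Process summary
Step 1: Configure the ILM policy
- Horizontal: phases (Hot, Warm, Cold, Delete), i.e. birth, aging, sickness, death
- Vertical: actions (rollover, forcemerge, readonly, delete)
Step 2: Create a template that binds the policy and specifies the alias
Step 3: Create the initial index
Step 4: The index rolls over according to the policy from step 1
V. Data streams
Official documentation: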
https://www.elastic.co/guide/en/elasticsearch/reference/8.1/ilm-actions.html
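Feature overview
A data stream lets us store time series data across multiple indices while exposing a single named resource for requests (the data stream name)
- Write and search requests are sent to the data stream
- The data stream routes these requests to its backing indices
Backing indices
Each data stream consists of multiple hidden backing indices
- Created automatically
- Require a matching index template
Rollover is the mechanism that generates new backing indices
- The new backing index becomes the data stream's write index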
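Backing indices follow a predictable naming pattern: `.ds-<data stream>-<yyyy.MM.dd>-<generation>`, where the generation is a six-digit, zero-padded counter that increases with each rollover. A minimal Python sketch of that pattern, with a hypothetical helper name and an arbitrary example date:

```python
from datetime import date


def backing_index_name(data_stream, created, generation):
    """Format a backing index name as .ds-<stream>-<yyyy.MM.dd>-<generation>."""
    if generation < 1:
        raise ValueError("generation starts at 1")
    return f".ds-{data_stream}-{created:%Y.%m.%d}-{generation:06d}"


first = backing_index_name("my-data-stream", date(2099, 5, 6), 1)
second = backing_index_name("my-data-stream", date(2099, 5, 7), 2)
print(first)
print(second)
```

After a rollover, the index with the highest generation is the one that receives new writes.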
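Use cases
- Logs, events, metrics, and other continuously generated (rarely updated) business data
- Two core characteristics
- Time series data
- Data that is rarely or never updated
Core steps for creating a data stream
Official documentation: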
https://www.elastic.co/guide/en/elasticsearch/reference/8.1/set-up-a-data-stream.html
Set up a data stream
To set up a data stream, follow these steps:
- Create an index lifecycle policy
- Create component templates
- Create an index template
- Create the data stream
- Secure the data stream
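Walkthrough
1. Create a data stream named my-data-stream
2. Name the index template my-index-template
3. The template applies to every index matching the pattern ["my-data-stream*"]
4. New data is written to data_hot nodes
5. After 3 minutes, roll over and move to data_warm nodes
6. After 5 more minutes, move to data_cold nodes
# Step 1: create the ILM policy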
PUT _ilm/policy/my-lifecycle-policy
{
"policy": {
"phases": {
"hot": {
"actions": {
"rollover": {
"max_size": "50gb",
"max_age": "3m",
"max_docs": 5
},
"set_priority": {"priority": 100}
}
},
"warm": {
"min_age": "5m",
"actions": {
"allocate": {"number_of_replicas": 0},
"forcemerge": {"max_num_segments": 1},
"set_priority": {"priority": 50}
}
},
"cold": {
"min_age": "6m",
"actions": {"freeze":{}
}
},
"delete": {
"min_age": "45s",
"actions": {"delete": {}
}
}
}
}
}
# Step 2: create the component template for mappings
PUT _component_template/my-mappings
{
"template": {
"mappings": {
"properties": {
"@timestamp": {
"type": "date",
"format": "date_optional_time||epoch_millis"
},
"message": {"type": "wildcard"}
}
}
},
"_meta": {
"description": "Mappings for @timestamp and message fields",
"my-custom-meta-field": "More arbitrary metadata"
}
}
# Step 3: create the component template for settings
PUT _component_template/my-settings
{
"template": {
"settings": {
"index.lifecycle.name": "my-lifecycle-policy",
"index.routing.allocation.include._tier_preference":"data_hot"
}
},
"_meta": {
"description": "Settings for ILM",
"my-custom-meta-field": "More arbitrary metadata"
}
}
# Step 4: create the index template
PUT _index_template/my-index-template
{"index_patterns": ["my-data-stream*"],
"data_stream": { },
"composed_of": ["my-mappings", "my-settings"],
"priority": 500,
"_meta": {
"description": "Template for my time series data",
"my-custom-meta-field": "More arbitrary metadata"
}
}
# Step 5: create the data stream and write test data
PUT my-data-stream/_bulk
{"create":{} }
{"@timestamp": "2099-05-06T16:21:15.000Z", "message": "192.0.2.42 - - [06/May/2099:16:21:15 +0000] \"GET /images/bg.jpg HTTP/1.0\" 200 24736" }
{"create":{} }
{"@timestamp": "2099-05-06T16:25:42.000Z", "message": "192.0.2.255 - - [06/May/2099:16:25:42 +0000] \"GET /favicon.ico HTTP/1.0\" 200 3638" }
POST my-data-stream/_doc
{
"@timestamp": "2099-05-06T16:21:15.000Z",
"message": "192.0.2.42 - - [06/May/2099:16:21:15 +0000] \"GET /images/bg.jpg HTTP/1.0\" 200 24736"
}
# Step 6: view the data stream's backing indices
GET /_resolve/index/my-data-stream*
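The bulk request in step 5 uses `create` actions, and every document carries an `@timestamp` field. As a sketch of how a client might assemble that newline-delimited body, the Python helper below (a hypothetical name, not part of any Elasticsearch client library) pairs each document with a `create` action line and rejects documents without `@timestamp`.

```python
import json


def data_stream_bulk_body(docs):
    """Build an NDJSON _bulk body that pairs each document with a create action."""
    lines = []
    for doc in docs:
        if "@timestamp" not in doc:
            raise ValueError("data stream documents need an @timestamp field")
        lines.append(json.dumps({"create": {}}))
        lines.append(json.dumps(doc))
    # The bulk API expects the body to end with a newline.
    return "\n".join(lines) + "\n"


docs = [
    {"@timestamp": "2099-05-06T16:21:15.000Z", "message": "first log line"},
    {"@timestamp": "2099-05-06T16:25:42.000Z", "message": "second log line"},
]
bulk_body = data_stream_bulk_body(docs)
print(bulk_body)
```

The resulting string can be sent as the body of `PUT my-data-stream/_bulk` with any HTTP client.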