关于elasticsearch:ElasticSearch-必知必会-进阶篇

京东物流:康睿 姚再毅 李振 刘斌 王北永

阐明:以下全副均基于 ElasticSearch 8.1 版本

一.跨集群检索 – ccr

官网文档地址:
https://www.elastic.co/guide/en/elasticsearch/reference/8.1/modules-cross-cluster-search.html

跨集群检索的背景和意义

跨集群检索定义

跨集群检索环境搭建

官网文档地址:
https://www.elastic.co/guide/en/elasticsearch/reference/8.1/modules-cross-cluster-search.html

步骤1:搭建两个本地单节点集群,本地练习可勾销平安配置

步骤2:每个集群都执行以下命令

PUT \_cluster/settings { “persistent”: { “cluster”: { “remote”: { “cluster\_one”: { “seeds”: [ “172.21.0.14:9301″ ] },”cluster_two”: { “seeds”: [ “172.21.0.14:9302” ] } } } } }

步骤3:验证集群之间是否互通

计划1:Kibana 可视化查看:stack Management -> Remote Clusters -> status 应该是 connected! 且必须打上绿色的对号。

​ 计划2:GET _remote/info

跨集群查问演练

# 步骤1 在集群 1 中增加数据如下
PUT test01/_bulk
{"index":{"_id":1}}
{"title":"this is from cluster01..."}

# 步骤2 在集群 2 中增加数据如下:
PUT test01/_bulk
{"index":{"_id":1}}
{"title":"this is from cluster02..."}

# 步骤 3:执行跨集群检索如下: 语法:POST 集群名称1:索引名称,集群名称2:索引名称/_search
POST cluster_one:test01,cluster_two:test01/_search
{
  "took" : 7,
  "timed_out" : false,
  "num_reduce_phases" : 3,
  "_shards" : {
    "total" : 2,
    "successful" : 2,
    "skipped" : 0,
    "failed" : 0
  },
  "_clusters" : {
    "total" : 2,
    "successful" : 2,
    "skipped" : 0
  },
  "hits" : {
    "total" : {
      "value" : 2,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "cluster_two:test01",
        "_id" : "1",
        "_score" : 1.0,
        "_source" : {
          "title" : "this is from cluster02..."
        }
      },
      {
        "_index" : "cluster_one:test01",
        "_id" : "1",
        "_score" : 1.0,
        "_source" : {
          "title" : "this is from cluster01..."
        }
      }
    ]
  }
}


二.跨集群复制 – ccs – 该性能需付费

官网文档地址:
https://www.elastic.co/guide/en/elasticsearch/reference/current/xpack-ccr.html

如何保障集群的高可用

  1. 正本机制
  2. 快照和复原
  3. 跨集群复制(相似mysql 主从同步)

跨集群复制概述

跨集群复制配置

  1. 筹备两个集群,网络互通
  2. 开启 license 应用,可试用30天
  • 开启地位:Stack Management -> License mangement.

3.定义好谁是Leads集群,谁是follower集群

4.在follower集群配置Leader集群

5.在follower集群配置Leader集群的索引同步规定(kibana页面配置)

a.stack Management -> Cross Cluster Replication -> create a follower index.

6.启用步骤5的配置


三索引模板

官网文档地址:
https://www.elastic.co/guide/en/elasticsearch/reference/8.1/index-templates.html

8.X之组件模板

1.创立组件模板-索引setting相干

# 组件模板 - 索引setting相干
PUT _component_template/template_sttting_part
{
  "template": {
    "settings": {
      "number_of_shards": 3,
      "number_of_replicas": 0
    }
  }
}

2.创立组件模板-索引mapping相干

# 组件模板 - 索引mapping相干
PUT _component_template/template_mapping_part
{
  "template": {
    "mappings": {
      "properties": {
        "hosr_name":{
          "type": "keyword"
        },
        "cratet_at":{
          "type": "date",
          "format": "EEE MMM dd HH:mm:ss Z yyyy"
        }
      }
    }
  }
}

3.创立组件模板-配置模板和索引之间的关联

// **留神:composed_of 如果多个组件模板中的配置项有反复,前面的会笼罩后面的,和配置的程序无关**
# 基于组件模板,配置模板和索引之间的关联
# 也就是所有 tem_* 该表达式相干的索引创立时,都会应用到以下规定
PUT _index_template/template_1
{
  "index_patterns": [
    "tem_*"
  ],
  "composed_of": [
    "template_sttting_part",
    "template_mapping_part"
  ]
}

4.测试

# 创立测试
PUT tem_001

索引模板基本操作

实战演练

需要1:默认如果不显式指定Mapping,数值类型会被动静映射为long类型,但实际上业务数值都比拟小,会存在存储节约。须要将默认值指定为Integer

索引模板,官网文档地址:
https://www.elastic.co/guide/en/elasticsearch/reference/8.1/index-templates.html

mapping-动静模板,官网文档地址:
https://www.elastic.co/guide/en/elasticsearch/reference/8.1/dynamic-templates.html

# 联合mapping 动静模板 和 索引模板
# 1.创立组件模板之 - mapping模板
PUT _component_template/template_mapping_part_01
{
  "template": {
    "mappings": {
      "dynamic_templates": [
        {
          "integers": {
            "match_mapping_type": "long",
            "mapping": {
              "type": "integer"
            }
          }
        }
      ]
    }
  }
}

# 2. 创立组件模板与索引关联配置
PUT _index_template/template_2
{
  "index_patterns": ["tem1_*"],
  "composed_of": ["template_mapping_part_01"]
}

# 3.创立测试数据
POST tem1_001/_doc/1
{
  "age":18
}

# 4.查看mapping构造验证
get tem1_001/_mapping


需要2:date_*结尾的字段,对立匹配为date日期类型。

索引模板,官网文档地址:
https://www.elastic.co/guide/en/elasticsearch/reference/8.1/index-templates.html

mapping-动静模板,官网文档地址:
https://www.elastic.co/guide/en/elasticsearch/reference/8.1/dynamic-templates.html

# 联合mapping 动静模板 和 索引模板
# 1.创立组件模板之 - mapping模板
PUT _component_template/template_mapping_part_01
{
  "template": {
    "mappings": {
      "dynamic_templates": [
        {
          "integers": {
            "match_mapping_type": "long",
            "mapping": {
              "type": "integer"
            }
          }
        },
        {
        "date_type_process": {
          "match": "date_*",
          "mapping": {
            "type": "date",
            "format":"yyyy-MM-dd HH:mm:ss"
          }
        }
      }
      ]
    }
  }
}

# 2. 创立组件模板与索引关联配置
PUT _index_template/template_2
{
  "index_patterns": ["tem1_*"],
  "composed_of": ["template_mapping_part_01"]
}


# 3.创立测试数据
POST tem1_001/_doc/2
{
  "age":19,
  "date_aoe":"2022-01-01 18:18:00"
}

# 4.查看mapping构造验证
get tem1_001/_mapping

四.LIM 索引生命周期治理

官网文档地址:
https://www.elastic.co/guide/en/elasticsearch/reference/8.1/index-lifecycle-management.html

什么是索引生命周期

索引的 生-> 老 -> 病 -\> 死

是否有过思考,如果一个索引,创立之后,就不再去治理了?会产生什么?

什么是索引生命周期治理

索引太大了会如何?

大索引的复原工夫,要远比小索引复原慢的多的多索引大了当前,检索会很慢,写入和更新也会受到不同水平的影响索引大到肯定水平,当索引呈现衰弱问题,会导致整个集群外围业务不可用

最佳实际

集群的单个分片最大文档数下限:2的32次幂减1,即20亿左右官网倡议:分片大小管制在30GB-50GB,若索引数据量有限增大,必定会超过这个值

用户不关注全量

某些业务场景,业务更关注近期的数据,如近3天、近7天大索引会将全副历史数据会集在一起,不利于这种场景的查问

索引生命周期治理的历史演变

LIM前奏 – rollover 滚动索引

官网文档地址:
https://www.elastic.co/guide/en/elasticsearch/reference/8.1/index-rollover.html

# 0.自测前提,lim生命周期rollover频率。默认10分钟
PUT _cluster/settings
{
  "persistent": {
    "indices.lifecycle.poll_interval": "1s"
  }
}

# 1. 创立索引,并指定别名
PUT test_index-0001
{
  "aliases": {
    "my-test-index-alias": {
      "is_write_index": true
    }
  }
}

# 2.批量导入数据
PUT my-test-index-alias/_bulk
{"index":{"_id":1}}
{"title":"testing 01"}
{"index":{"_id":2}}
{"title":"testing 02"}
{"index":{"_id":3}}
{"title":"testing 03"}
{"index":{"_id":4}}
{"title":"testing 04"}
{"index":{"_id":5}}
{"title":"testing 05"}

# 3.rollover 滚动规定配置
POST my-test-index-alias/_rollover
{
  "conditions": {
    "max_age": "7d",
    "max_docs": 5,
    "max_primary_shard_size": "50gb"
  }
}

# 4.在满足条件的前提下创立滚动索引
PUT my-test-index-alias/_bulk
{"index":{"_id":7}}
{"title":"testing 07"}

# 5.查问验证滚动是否胜利
POST my-test-index-alias/_search

LIM前奏 – shrink 索引压缩

官网文档地址:
https://www.elastic.co/guide/en/elasticsearch/reference/8.1/ilm-shrink.html

外围步骤:

1. 将数据全副迁徙至一个独立的节点

2. 索引禁止写入

3. 方可进行压缩

# 1.筹备测试数据
DELETE kibana_sample_data_logs_ext
PUT kibana_sample_data_logs_ext
{
  "settings": {
    "number_of_shards": 5,
    "number_of_replicas": 0
  }
}
POST _reindex
{
  "source": {
    "index": "kibana_sample_data_logs"
  },
  "dest": {
    "index": "kibana_sample_data_logs_ext"
  }
}


# 2.压缩前必要的条件设置
# number_of_replicas :压缩后正本为0
# index.routing.allocation.include._tier_preference 数据分片全副路由到hot节点
# "index.blocks.write 压缩后索引不再容许数据写入
PUT kibana_sample_data_logs_ext/_settings
{
  "settings": {
    "index.number_of_replicas": 0,
    "index.routing.allocation.include._tier_preference": "data_hot",
    "index.blocks.write": true
  }
}

# 3.施行压缩
POST kibana_sample_data_logs_ext/_shrink/kibana_sample_data_logs_ext_shrink
{
  "settings":{
    "index.number_of_replicas": 0,
    "index.number_of_shards": 1,
    "index.codec":"best_compression"
  },
  "aliases":{
    "kibana_sample_data_logs_alias":{}
  }
}

LIM实战

全局认知建设 – 四大阶段

官网文档地址:
https://www.elastic.co/guide/en/elasticsearch/reference/8.1/overview-index-lifecycle-management.html

生命周期治理阶段(Policy):
https://www.elastic.co/guide/en/elasticsearch/reference/8.1/ilm-index-lifecycle.html

Hot阶段(生)

Set priority

Unfollow

Rollover

Read-only

Shrink

Force Merge

Search snapshot

Warm阶段(老)

Set priority

Unfollow

Read-only

Allocate

migrate

Shirink

Force Merge

Cold阶段(病)

Search snapshot

Delete阶段(死)

delete

演练

1.创立policy
  • Hot阶段设置,rollover: max\_age:3d,max\_docs:5, max_size:50gb, 优先级:100
  • Warm阶段设置:min_age:15s , forcemerage段合并,热节点迁徙到warm节点,正本数设置0,优先级:50
  • Cold阶段设置: min_age 30s, warm迁徙到cold阶段
  • Delete阶段设置:min_age 45s,执行删除操作
PUT _ilm/policy/kr_20221114_policy
{
  "policy": {
    "phases": {
      "hot": {
        "min_age": "0ms",
        "actions": {
          "set_priority": {
            "priority": 100
          },
          "rollover": {
            "max_size": "50gb",
            "max_primary_shard_size": "50gb",
            "max_age": "3d",
            "max_docs": 5
          }
        }
      },
      "warm": {
        "min_age": "15s",
        "actions": {
          "forcemerge": {
            "max_num_segments": 1
          },
          "set_priority": {
            "priority": 50
          },
          "allocate": {
            "number_of_replicas": 0
          }
        }
      },
      "cold": {
        "min_age": "30s",
        "actions": {
          "set_priority": {
            "priority": 0
          }
        }
      },
      "delete": {
        "min_age": "45s",
        "actions": {
          "delete": {
            "delete_searchable_snapshot": true
          }
        }
      }
    }
  }
}

2.创立index template
PUT _index_template/kr_20221114_template
{
  "index_patterns": ["kr_index-**"],
  "template": {
    "settings": {
      "index": {
        "lifecycle": {
          "name": "kr_20221114_policy",
          "rollover_alias": "kr-index-alias"
        },
        "routing": {
          "allocation": {
            "include": {
              "_tier_preference": "data-hot"
            }
          }
        },
        "number_of_shards": "3",
        "number_of_replicas": "1"
      }
    },
    "aliases": {},
    "mappings": {}
  }
}


3.测试须要批改lim rollover刷新频率
PUT _cluster/settings
{
  "persistent": {
    "indices.lifecycle.poll_interval": "1s"
  }
}

4.进行测试
# 创立索引,并制订可写别名
PUT kr_index-0001
{
  "aliases": {
    "kr-index-alias": {
      "is_write_index": true
    }
  }
}
# 通过别名新增数据
PUT kr-index-alias/_bulk
{"index":{"_id":1}}
{"title":"testing 01"}
{"index":{"_id":2}}
{"title":"testing 02"}
{"index":{"_id":3}}
{"title":"testing 03"}
{"index":{"_id":4}}
{"title":"testing 04"}
{"index":{"_id":5}}
{"title":"testing 05"}
# 通过别名新增数据,触发rollover
PUT kr-index-alias/_bulk
{"index":{"_id":6}}
{"title":"testing 06"}
# 查看索引状况
GET kr_index-0001

get _cat/indices?v

过程总结

第一步:配置 lim pollicy

  • 横向:Phrase 阶段(Hot、Warm、Cold、Delete) 生老病死
  • 纵向:Action 操作(rollover、forcemerge、readlyonly、delete)

第二步:创立模板 绑定policy,指定别名

第三步:创立起始索引

第四步:索引基于第一步指定的policy进行滚动


五.Data Stream

官网文档地址:
https://www.elastic.co/guide/en/elasticsearch/reference/8.1/ilm-actions.html

个性解析

Data Stream让咱们跨多个索引存储时序数据,同时给了惟一的对外接口(data stream名称)

  • 写入和检索申请发给data stream
  • data stream将这些申请路由至 backing index(后盾索引)

Backing indices

每个data stream由多个暗藏的后盾索引形成

  • 主动创立
  • 要求模板索引

rollover 滚动索引机制用于主动生成后盾索引

  • 将成为data stream 新的写入索引

利用场景

  1. 日志、事件、指标等其余继续创立(少更新)的业务数据
  2. 两大外围特点
  3. 时序性数据
  4. 数据极少更新或没有更新

创立Data Stream 外围步骤

官网文档地址:
https://www.elastic.co/guide/en/elasticsearch/reference/8.1/set-up-a-data-stream.html

Set up a data stream

To set up a data stream, follow these steps:

  1. Create an index lifecycle policy
  2. Create component templates
  3. Create an index template
  4. Create the data stream
  5. Secure the data stream

演练

1. 创立一个data stream,名称为my-data-stream

2. index_template 名称为 my-index-template

3. 满足index格局【”my-data-stream*”】的索引都要被利用到

4. 数据插入的时候,在data_hot节点

5. 过3分钟之后要rollover到data_warm节点

6. 再过5分钟要到data_cold节点

# 步骤1 。创立 lim policy
PUT _ilm/policy/my-lifecycle-policy
{
  "policy": {
    "phases": {
      "hot": {
        "actions": {
          "rollover": {
            "max_size": "50gb",
            "max_age": "3m",
            "max_docs": 5
          },
          "set_priority": {
            "priority": 100
          }
        }
      },
      "warm": {
        "min_age": "5m",
        "actions": {
          "allocate": {
            "number_of_replicas": 0
          }, 
          "forcemerge": {
            "max_num_segments": 1
          },
          "set_priority": {
            "priority": 50
          }
        }
      },
      "cold": {
        "min_age": "6m",
        "actions": {
          "freeze":{}
        }
      },
      "delete": {
        "min_age": "45s",
        "actions": {
          "delete": {}
        }
      }
    }
  }
}

# 步骤2 创立组件模板 - mapping
PUT _component_template/my-mappings
{
  "template": {
    "mappings": {
      "properties": {
        "@timestamp": {
          "type": "date",
          "format": "date_optional_time||epoch_millis"
        },
        "message": {
          "type": "wildcard"
        }
      }
    }
  },
  "_meta": {
    "description": "Mappings for @timestamp and message fields",
    "my-custom-meta-field": "More arbitrary metadata"
  }
}

# 步骤3 创立组件模板 - setting
PUT _component_template/my-settings
{
  "template": {
    "settings": {
      "index.lifecycle.name": "my-lifecycle-policy",
      "index.routing.allocation.include._tier_preference":"data_hot"
    }
  },
  "_meta": {
    "description": "Settings for ILM",
    "my-custom-meta-field": "More arbitrary metadata"
  }
}

# 步骤4 创立索引模板
PUT _index_template/my-index-template
{
  "index_patterns": ["my-data-stream*"],
  "data_stream": { },
  "composed_of": [ "my-mappings", "my-settings" ],
  "priority": 500,
  "_meta": {
    "description": "Template for my time series data",
    "my-custom-meta-field": "More arbitrary metadata"
  }
}

# 步骤5 创立 data stream  并 写入数据测试
PUT my-data-stream/_bulk
{ "create":{ } }
{ "@timestamp": "2099-05-06T16:21:15.000Z", "message": "192.0.2.42 - - [06/May/2099:16:21:15 +0000] \"GET /images/bg.jpg HTTP/1.0\" 200 24736" }
{ "create":{ } }
{ "@timestamp": "2099-05-06T16:25:42.000Z", "message": "192.0.2.255 - - [06/May/2099:16:25:42 +0000] \"GET /favicon.ico HTTP/1.0\" 200 3638" }

POST my-data-stream/_doc
{
  "@timestamp": "2099-05-06T16:21:15.000Z",
  "message": "192.0.2.42 - - [06/May/2099:16:21:15 +0000] \"GET /images/bg.jpg HTTP/1.0\" 200 24736"
}


# 步骤6 查看data stream 后盾索引信息
GET /_resolve/index/my-data-stream*

评论

发表回复

您的邮箱地址不会被公开。 必填项已用 * 标注

这个站点使用 Akismet 来减少垃圾评论。了解你的评论数据如何被处理