Elasticsearch study notes (9): the shard & replica mechanism and issues with ES cluster nodes

jiezi

6 years ago

1. Summary of shard and replica concepts
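(1) An index consists of multiple shards.
(2) Each shard is the smallest unit of work. It holds part of the data and is in effect a Lucene instance, fully able to build an index and handle requests on its own.
(3) Whenever a node is added to an ES cluster, shards are automatically rebalanced across the nodes.
(4) Every document lives in exactly one primary shard and in that primary shard's replica shards. It can never be stored in more than one primary shard.
(5) A replica shard is a copy of a primary shard. It provides fault tolerance and also shares the load of read requests.
(6) The number of primary shards is fixed when the index is created, but the number of replica shards can be changed at any time.
(7) By default an index gets 5 primary shards and 1 replica per primary, i.e. 5 primary shards and 5 replica shards. (This is the default before Elasticsearch 7.0; from 7.0 on, the default is 1 primary shard.)
(8) A primary shard is never placed on the same node as its own replica shard. Otherwise a node failure would lose both the primary and its copy, which defeats fault tolerance. It can, however, share a node with replica shards of other primary shards.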
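Points (4) and (6) follow from how a document is routed to a shard. Elasticsearch computes roughly shard = hash(_routing) % number_of_primary_shards, where _routing defaults to the document's _id. The Python sketch below mimics only the shape of that formula. It uses zlib.crc32 as a stand-in for the murmur3 hash Elasticsearch actually uses, and the name pick_shard is hypothetical.

```python
import zlib

# Assumption: zlib.crc32 stands in for Elasticsearch's real murmur3 hash;
# only the "hash modulo primary-shard count" idea is being illustrated.
def pick_shard(routing: str, number_of_primary_shards: int) -> int:
    """Return the primary shard that owns a document with this routing value."""
    return zlib.crc32(routing.encode("utf-8")) % number_of_primary_shards

doc_ids = [f"doc-{i}" for i in range(10)]

# Each id maps to exactly one primary shard; that shard's replicas hold copies of it.
placement_5 = {doc_id: pick_shard(doc_id, 5) for doc_id in doc_ids}

# With a different primary count, ids may hash to a different shard, so documents
# indexed earlier would no longer be where the formula looks for them.
placement_6 = {doc_id: pick_shard(doc_id, 6) for doc_id in doc_ids}
moved = [d for d in doc_ids if placement_5[d] != placement_6[d]]

print(placement_5)
print(f"{len(moved)} of {len(doc_ids)} documents would map to a different shard")
```

This is why the primary count cannot simply be edited on a live index, while the replica count (which does not enter the formula) can.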
2. Problems with a single-node ES cluster
Suppose we create an index named test_index:
PUT /test_index
{
  "settings": {
    "number_of_shards": 5,
    "number_of_replicas": 1
  }
}
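For scripting instead of typing the request into a console, here is a minimal Python sketch that builds the same request with the standard library. It assumes a node at http://localhost:9200. Nothing is sent until urllib.request.urlopen is called. The helpers build_create_index_request and total_shard_copies are illustrative names, not part of any Elasticsearch client.

```python
import json
import urllib.request

# Assumption: a local node on the default HTTP port.
ES_URL = "http://localhost:9200"

def build_create_index_request(index: str, shards: int, replicas: int) -> urllib.request.Request:
    """Build (but do not send) the PUT /<index> request with shard/replica settings."""
    body = {"settings": {"number_of_shards": shards, "number_of_replicas": replicas}}
    return urllib.request.Request(
        url=f"{ES_URL}/{index}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )

def total_shard_copies(shards: int, replicas: int) -> int:
    """Each primary shard plus `replicas` copies of it."""
    return shards * (1 + replicas)

req = build_create_index_request("test_index", shards=5, replicas=1)
print(req.get_method(), req.full_url)  # PUT http://localhost:9200/test_index
print(total_shard_copies(5, 1))        # 10
# To actually send it: urllib.request.urlopen(req)
```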
(1) In a single-node environment, we create an index called test_index with 5 primary shards and 5 replica shards.
(2) The cluster status is yellow:
GET /_cat/health?v
epoch timestamp cluster status node.total node.data shards pri relo init unassign pending_tasks max_task_wait_time active_shards_percent
1555495650 10:07:30 elasticsearch yellow 1 1 8 8 0 0 6 0 - 57.1%

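(3) In this case only the 5 primary shards are allocated, all to the single node. The 5 replica shards cannot be allocated, because a replica may not sit on the same node as its primary.
(4) The cluster still works, but there is no redundancy. If the node goes down, the cluster becomes unavailable and cannot serve any requests, and because no copy exists on another node, the data can be lost. The layout is: the one node holds P0 to P4, and R0 to R4 stay unassigned.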
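The yellow status and the 57.1% in the _cat/health row can be checked by hand. The row reports 8 active shards, 0 initializing and 6 unassigned, and 8 / (8 + 0 + 6) ≈ 57.1%. These counts exceed what test_index alone would produce, so the cluster apparently held other indices as well. The sketch below encodes the usual color rules: red if some primary is not active, yellow if only replicas are missing, green otherwise. The function names are hypothetical, and the percentage formula is an assumption that happens to match this row.

```python
def cluster_status(inactive_primaries: int, inactive_replicas: int) -> str:
    """red: a primary is not active; yellow: only replicas missing; green: everything active."""
    if inactive_primaries > 0:
        return "red"
    if inactive_replicas > 0:
        return "yellow"
    return "green"

def active_shards_percent(active: int, initializing: int, unassigned: int) -> float:
    """Share of all shard copies that are currently active (assumed formula)."""
    total = active + initializing + unassigned
    return 100.0 * active / total if total else 100.0

# Values from the health row: shards=8 (all primaries), init=0, unassign=6.
# The status is yellow, so all 6 unassigned copies must be replicas.
print(cluster_status(inactive_primaries=0, inactive_replicas=6))                 # yellow
print(round(active_shards_percent(active=8, initializing=0, unassigned=6), 1))  # 57.1
```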
3. A two-node ES cluster
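With a second node, the shard layout of test_index above becomes: the 5 primary shards stay on the first node, and the 5 replica shards are now allocated to the second node.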
Now, even if the node holding the primary shards fails, ES triggers shard allocation and promotes a replica of each lost primary shard to primary, so the data remains available.
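The single-node and two-node scenarios, including the failover, can be replayed in a toy model. This is a minimal sketch under heavy simplification. The allocator just gives each unassigned copy the first node that holds no copy of the same shard, which mirrors rule (8) above and nothing else. Elasticsearch's real allocator weighs many more factors. All names here (ShardCopy, allocate, fail_node, status) are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class ShardCopy:
    shard: int                  # shard number: P0 and R0 share shard 0
    primary: bool               # True for the primary copy, False for a replica
    node: Optional[str] = None  # None means the copy is unassigned

def new_index(num_shards: int, num_replicas: int) -> List[ShardCopy]:
    """All copies start unassigned; each primary is listed before its replicas."""
    return [ShardCopy(shard=s, primary=(i == 0))
            for s in range(num_shards) for i in range(1 + num_replicas)]

def allocate(copies: List[ShardCopy], nodes: List[str]) -> None:
    """Toy allocator: never put two copies of the same shard on one node."""
    for copy in copies:
        if copy.node is None:
            taken = {c.node for c in copies if c.shard == copy.shard and c.node is not None}
            free = [n for n in nodes if n not in taken]
            if free:
                copy.node = free[0]

def fail_node(copies: List[ShardCopy], failed: str) -> None:
    """Unassign copies on the failed node; promote a surviving replica where a primary was lost."""
    for copy in copies:
        if copy.node == failed:
            copy.node = None
    for shard in {c.shard for c in copies}:
        group = [c for c in copies if c.shard == shard]
        if any(c.primary and c.node is not None for c in group):
            continue  # this shard's primary is still alive
        survivors = [c for c in group if c.node is not None]
        if survivors:
            for c in group:
                c.primary = False
            survivors[0].primary = True  # the replica becomes the new primary

def status(copies: List[ShardCopy]) -> str:
    for s in {c.shard for c in copies}:
        if not any(c.primary and c.node is not None for c in copies if c.shard == s):
            return "red"
    return "yellow" if any(c.node is None for c in copies) else "green"

index = new_index(num_shards=5, num_replicas=1)
allocate(index, ["node-1"])
print(status(index))  # yellow: replicas cannot share node-1 with their primaries
allocate(index, ["node-1", "node-2"])
print(status(index))  # green
fail_node(index, "node-1")
print(status(index))  # yellow: replicas promoted, but no spare node for new replicas
print(sorted(c.shard for c in index if c.primary and c.node == "node-2"))  # [0, 1, 2, 3, 4]
```

After the failover the cluster is yellow rather than green. The promoted primaries have no replicas until another node joins.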
Shard allocation is triggered when:
an index is created or deleted; a node joins or leaves the cluster; the Reroute command is run manually; the replica setting is changed.
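The last trigger is a change to the replica setting. On a live index it goes through PUT /test_index/_settings. Whether the new copies can actually be allocated depends on the node count, because each node holds at most one copy of a given shard. The sketch below builds the settings body and counts the copies that would stay unassigned. It assumes every node is a data node and that only the same-node rule limits placement. The helper names are hypothetical.

```python
import json

def replica_update_body(replicas: int) -> str:
    """JSON body for PUT /<index>/_settings that changes the replica count of a live index."""
    return json.dumps({"index": {"number_of_replicas": replicas}})

def unassigned_copies(primaries: int, replicas: int, nodes: int) -> int:
    """Each node can hold at most one copy of a shard, so copies beyond the node count stay unassigned."""
    copies_per_shard = 1 + replicas
    return primaries * max(0, copies_per_shard - nodes)

print(replica_update_body(2))  # {"index": {"number_of_replicas": 2}}
for replicas in (0, 1, 2):
    # With 2 nodes: 0 and 1 replicas fit, 2 replicas leave one copy per shard unassigned.
    print(replicas, unassigned_copies(primaries=5, replicas=replicas, nodes=2))
```

So raising test_index to 2 replicas on the two-node cluster above would turn it yellow again, with 5 replica copies waiting for a third node.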