Elasticsearch Series: Performance Tuning Best Practices

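Overview

Performance tuning is an essential topic for every component in a system architecture, and Elasticsearch is no exception. Elasticsearch's default configuration is already very good, but that doesn't make it perfect, so a few practices are still worth knowing.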

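Enable the slow query log

The slow query log is an important tool for diagnosing performance problems. The usual routine is to set slow query thresholds and have the operations team review the slow logs every day. Any especially slow query is reported immediately and handled as an incident. For the rest, the top N entries are pulled out periodically for optimization.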

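In Elasticsearch 6.3.1, slow logs are configured through the index settings API, and read and write operations can be configured separately. Choose the thresholds according to your actual requirements and performance targets: some people consider 5 seconds slow, while others find 3 seconds unacceptable. We use 3 seconds as the example: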

PUT /_all/_settings
{
"index.search.slowlog.threshold.query.warn":"3s",
"index.search.slowlog.threshold.query.info":"2s",
"index.search.slowlog.threshold.query.debug":"1s",
"index.search.slowlog.threshold.query.trace":"500ms",

"index.search.slowlog.threshold.fetch.warn":"1s",
"index.search.slowlog.threshold.fetch.info":"800ms",
"index.search.slowlog.threshold.fetch.debug":"500ms",
"index.search.slowlog.threshold.fetch.trace":"200ms",

"index.indexing.slowlog.threshold.index.warn":"3s",
"index.indexing.slowlog.threshold.index.info":"2s",
"index.indexing.slowlog.threshold.index.debug":"1s",
"index.indexing.slowlog.threshold.index.trace":"500ms",
"index.indexing.slowlog.level":"info",
"index.indexing.slowlog.source":"1000"
}

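The three groups set the slow log thresholds for three kinds of operation: the query phase of searches, the fetch phase of searches, and indexing (writes). _all applies the settings to all existing indices, so indices created later need them in an index template. You can also target a specific index with PUT /<index>/_settings. index.indexing.slowlog.source controls how many characters of the document's _source are included in each indexing slow log entry (1000 here).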
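The following minimal Python sketch is our own illustration, not Elasticsearch's code. It shows how an operation's duration maps to a log level. As we understand the behavior, each entry is written once, at the most severe level whose threshold the operation exceeded, and a threshold of -1 disables that level. slowlog_level and QUERY_THRESHOLDS_MS are hypothetical names, and the values mirror the query thresholds above.

```python
# Query-phase thresholds from the settings above, in milliseconds.
# A value of -1 disables a level, as it does in Elasticsearch's own settings.
QUERY_THRESHOLDS_MS = {"warn": 3000, "info": 2000, "debug": 1000, "trace": 500}

# Levels ordered from most to least severe.
LEVELS = ("warn", "info", "debug", "trace")


def slowlog_level(took_ms, thresholds):
    """Return the level an operation taking took_ms would be logged at, or None.

    Levels are checked from most to least severe; the first enabled threshold
    (>= 0) that the duration exceeds decides the level.
    """
    for level in LEVELS:
        limit = thresholds.get(level, -1)
        if limit >= 0 and took_ms > limit:
            return level
    return None


if __name__ == "__main__":
    for took in (300, 700, 1500, 2500, 4000):
        print(took, "ms ->", slowlog_level(took, QUERY_THRESHOLDS_MS))
```

With these settings, a 700ms query would be logged at TRACE and a 4s query at WARN. A 300ms query would not be logged at all.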

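Then make sure log4j2.properties contains the following. The default file shipped with Elasticsearch usually already has equivalent entries, so check before adding them: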

# Search slow log output
appender.index_search_slowlog_rolling.type = RollingFile
appender.index_search_slowlog_rolling.name = index_search_slowlog_rolling
appender.index_search_slowlog_rolling.fileName = ${sys:es.logs.base_path}${sys:file.separator}${sys:es.logs.cluster_name}_index_search_slowlog.log
appender.index_search_slowlog_rolling.layout.type = PatternLayout
appender.index_search_slowlog_rolling.layout.pattern = [%d{ISO8601}][%-5p][%-25c] %.10000m%n
appender.index_search_slowlog_rolling.filePattern = ${sys:es.logs.base_path}${sys:file.separator}${sys:es.logs.cluster_name}_index_search_slowlog-%d{yyyy-MM-dd}.log
appender.index_search_slowlog_rolling.policies.type = Policies
appender.index_search_slowlog_rolling.policies.time.type = TimeBasedTriggeringPolicy
appender.index_search_slowlog_rolling.policies.time.interval = 1
appender.index_search_slowlog_rolling.policies.time.modulate = true

logger.index_search_slowlog_rolling.name = index.search.slowlog
logger.index_search_slowlog_rolling.level = trace
logger.index_search_slowlog_rolling.appenderRef.index_search_slowlog_rolling.ref = index_search_slowlog_rolling
logger.index_search_slowlog_rolling.additivity = false

# Indexing slow log output
appender.index_indexing_slowlog_rolling.type = RollingFile
appender.index_indexing_slowlog_rolling.name = index_indexing_slowlog_rolling
appender.index_indexing_slowlog_rolling.fileName = ${sys:es.logs.base_path}${sys:file.separator}${sys:es.logs.cluster_name}_index_indexing_slowlog.log
appender.index_indexing_slowlog_rolling.layout.type = PatternLayout
appender.index_indexing_slowlog_rolling.layout.pattern = [%d{ISO8601}][%-5p][%-25c] %marker%.10000m%n
appender.index_indexing_slowlog_rolling.filePattern = ${sys:es.logs.base_path}${sys:file.separator}${sys:es.logs.cluster_name}_index_indexing_slowlog-%d{yyyy-MM-dd}.log
appender.index_indexing_slowlog_rolling.policies.type = Policies
appender.index_indexing_slowlog_rolling.policies.time.type = TimeBasedTriggeringPolicy
appender.index_indexing_slowlog_rolling.policies.time.interval = 1
appender.index_indexing_slowlog_rolling.policies.time.modulate = true

logger.index_indexing_slowlog.name = index.indexing.slowlog.index
logger.index_indexing_slowlog.level = trace
logger.index_indexing_slowlog.appenderRef.index_indexing_slowlog_rolling.ref = index_indexing_slowlog_rolling
logger.index_indexing_slowlog.additivity = false

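After restarting the Elasticsearch instance, the two log files appear in the logs directory (/home/esuser/esdata/log in our setup). They are named <cluster_name>_index_search_slowlog.log and <cluster_name>_index_indexing_slowlog.log.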
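The periodic top N review described at the start of this section can be scripted. The sketch below assumes each search slow log line carries a took_millis[...] field, as the 6.x search slow log does in our experience. top_n_slowest is a hypothetical helper and the sample lines are made up, so adjust the regular expression to your actual log format.

```python
import heapq
import re

# Matches the "took_millis[1234]" field; the surrounding line format is an assumption.
TOOK_RE = re.compile(r"took_millis\[(\d+)\]")


def top_n_slowest(lines, n=10):
    """Return the n entries with the largest took_millis, slowest first, as (millis, line)."""
    parsed = []
    for line in lines:
        match = TOOK_RE.search(line)
        if match:  # skip lines without a took_millis field
            parsed.append((int(match.group(1)), line.rstrip("\n")))
    return heapq.nlargest(n, parsed, key=lambda item: item[0])


if __name__ == "__main__":
    # Hypothetical sample lines standing in for <cluster_name>_index_search_slowlog.log.
    sample = [
        "[WARN ][index.search.slowlog.query] [orders][0] took[3.5s], took_millis[3500], ...",
        "[INFO ][index.search.slowlog.query] [orders][1] took[2.1s], took_millis[2100], ...",
        "[DEBUG][index.search.slowlog.query] [users][0] took[1.2s], took_millis[1200], ...",
    ]
    for millis, line in top_n_slowest(sample, n=2):
        print(millis, line)
```

The function only needs an iterable of lines, so you can pass an open file object directly. For example: with open(path, encoding="utf-8") as f: top_n_slowest(f, 20).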

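Optimization recommendations

Basic usage guidelines

Don't return overly large result sets

Large result sets consume a lot of I/O and network bandwidth, so they cannot be fast. Elasticsearch is a search engine. The ideal search is an exact or near-exact query, and what matters most is the handful of top-ranked results, not every match. Refining the search conditions and limiting the number of results is a prerequisite for high performance.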
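The helper below builds a search body that requests one page of hits and only the _source fields the caller needs. bounded_search_body is a hypothetical helper. The 10000 limit it checks is the default of the index.max_result_window setting, which caps from + size.

```python
import json

# Default value of the index.max_result_window setting (limit on from + size).
DEFAULT_MAX_RESULT_WINDOW = 10000


def bounded_search_body(query, page=1, page_size=10, fields=None,
                        max_window=DEFAULT_MAX_RESULT_WINDOW):
    """Build a search body for one small page, optionally limiting _source fields."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be positive")
    start = (page - 1) * page_size
    if start + page_size > max_window:
        # Paging this deep is expensive and rejected by default; use the scroll API instead.
        raise ValueError("from + size exceeds the result window")
    body = {"query": query, "from": start, "size": page_size}
    if fields is not None:
        body["_source"] = list(fields)  # return only these fields of each hit
    return body


if __name__ == "__main__":
    body = bounded_search_body({"match": {"title": "elasticsearch"}},
                               page=2, page_size=20, fields=["title", "create_time"])
    print(json.dumps(body))
```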

If you genuinely need to read a large volume of data, use the scroll API.
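Here is a minimal sketch of the scroll loop. An initial search opens a scroll context, each POST /_search/scroll call returns the next page, and the context is cleared at the end. To stay self-contained, it does not use a real client. send(method, path, body) is a hypothetical transport you would back with your HTTP client of choice, and scroll_all is likewise our own name.

```python
def scroll_all(send, index, query, page_size=1000, keep_alive="1m"):
    """Yield every hit matching query, page by page, via the scroll API.

    send(method, path, body) is a hypothetical transport that performs one HTTP
    request and returns the decoded JSON response as a dict.
    """
    # Sorting by _doc is the cheapest order when the order of hits doesn't matter.
    resp = send("POST", "/%s/_search?scroll=%s" % (index, keep_alive),
                {"query": query, "size": page_size, "sort": ["_doc"]})
    scroll_id = resp.get("_scroll_id")
    try:
        while resp["hits"]["hits"]:  # an empty page means the scroll is exhausted
            for hit in resp["hits"]["hits"]:
                yield hit
            resp = send("POST", "/_search/scroll",
                        {"scroll": keep_alive, "scroll_id": scroll_id})
            scroll_id = resp.get("_scroll_id", scroll_id)
    finally:
        if scroll_id:
            # Release the search context instead of waiting for keep_alive to expire.
            send("DELETE", "/_search/scroll", {"scroll_id": [scroll_id]})


if __name__ == "__main__":
    # A fake transport standing in for Elasticsearch: two pages, then an empty one.
    pages = [[{"_id": "1"}, {"_id": "2"}], [{"_id": "3"}], []]

    def fake_send(method, path, body):
        if method == "DELETE":
            return {"succeeded": True}
        return {"_scroll_id": "demo-scroll-id", "hits": {"hits": pages.pop(0)}}

    print([hit["_id"] for hit in scroll_all(fake_send, "orders", {"match_all": {}}, page_size=2)])
```

Because scroll_all is a generator, the clear-scroll request is sent when iteration finishes or when the generator is closed.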

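Avoid very large documents

http.max_content_length defaults to 100mb. Any request larger than that, and therefore any single document larger than that, is rejected by Elasticsearch. You can change this setting, but it is not recommended: Lucene, the engine underneath Elasticsearch, still has a hard limit of about 2GB.

Oversized documents consume a great deal of resources and are a bad idea from every angle. If the business really does involve very large content, such as searching the text of books, split it into chapters or paragraphs and store those as separate documents.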
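One way to apply the chapter and paragraph split is sketched below. Each paragraph becomes its own document carrying the book id and its position, so hits can be grouped back by book. split_book, the field names, the blank-line paragraph separator and the 1MB per-document cap are our own assumptions for illustration, not anything Elasticsearch requires.

```python
import json

# Assumed per-document size cap, kept far below the 100mb HTTP limit.
MAX_DOC_BYTES = 1024 * 1024


def split_book(book_id, chapters):
    """Turn one book into many small documents, one per paragraph.

    chapters is a list of (chapter_title, text) pairs; paragraphs are assumed
    to be separated by blank lines.
    """
    docs = []
    for chapter_no, (title, text) in enumerate(chapters, start=1):
        paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
        for para_no, para in enumerate(paragraphs, start=1):
            doc = {
                "book_id": book_id,  # lets hits be grouped back by book
                "chapter_no": chapter_no,
                "chapter_title": title,
                "paragraph_no": para_no,
                "content": para,
            }
            size = len(json.dumps(doc, ensure_ascii=False).encode("utf-8"))
            if size > MAX_DOC_BYTES:
                raise ValueError("paragraph %d of chapter %d is still too large"
                                 % (para_no, chapter_no))
            docs.append(doc)
    return docs


if __name__ == "__main__":
    book = [("Chapter 1", "First paragraph.\n\nSecond paragraph."), ("Chapter 2", "Third.")]
    for doc in split_book("book-1", book):
        print(doc)
```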

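Avoid sparse data

Document design fundamentally affects index performance. Sparse data, where most documents have values for only a few of an index's fields, is a classic bad design: it wastes storage and hurts both read and write performance.

Here are some suggestions for designing document structure: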

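Avoid putting unrelated data into the same index

Unrelated data usually has different structures. Forcing it into one index makes the index very sparse, so put such data in separate indices instead.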

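Normalize document structure

Keep document structure and field naming as consistent as possible. For a creation-time field, for example, don't call it timestamp in some documents and create_time in others. Pick one name.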
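When data arrives from several sources with inconsistent names, one option is to normalize field names before indexing. The alias table and normalize_fields below are hypothetical. The goal is that each concept ends up under a single field name in the mapping.

```python
# Hypothetical alias table: each variant on the left is stored under one canonical name.
FIELD_ALIASES = {
    "timestamp": "create_time",
    "created_at": "create_time",
    "createTime": "create_time",
}


def normalize_fields(doc, aliases=FIELD_ALIASES):
    """Return a copy of doc with aliased field names renamed to their canonical form."""
    normalized = {}
    for key, value in doc.items():
        canonical = aliases.get(key, key)
        if canonical in normalized:
            # Two source fields map to the same name; refuse to silently drop one.
            raise ValueError("fields collide after normalization: %r" % canonical)
        normalized[canonical] = value
    return normalized


if __name__ == "__main__":
    print(normalize_fields({"timestamp": "2019-01-01T00:00:00Z", "title": "hello"}))
```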

Disable norms and doc_values on some fields

If a field never needs relevance scoring, you can disable norms on it. If a field is never used for sorting or aggregations, you can disable its doc_values.
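Here is what this looks like in a mapping, written as a Python dict and printed as the JSON body for index creation. The field names are hypothetical. The _doc type name assumes 6.x, where an index has a single mapping type.

```python
import json

# Mapping body for PUT /<index> on 6.x; all field names are hypothetical.
mapping = {
    "mappings": {
        "_doc": {
            "properties": {
                # Full-text searched, but relevance scores don't matter: drop norms.
                "remark": {"type": "text", "norms": False},
                # Exact-match filtering only, never sorted or aggregated: drop doc_values.
                "session_id": {"type": "keyword", "doc_values": False},
                # Sorted and aggregated on: keep the defaults.
                "create_time": {"type": "date"},
            }
        }
    }
}

if __name__ == "__main__":
    print(json.dumps(mapping, indent=2))
```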

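Server level

Hardware is the most fundamental part of performance: better hardware gives you a higher starting point.

Use faster hardware

Within budget, use SSDs rather than spinning hard drives;

Get the CPU clock speed and core count as high as the budget allows;

For memory, 64GB per machine is the usual ceiling; beyond that, keep adding machines as far as the budget allows;

Prefer local storage over network storage such as NFS; local disks are cheap anyway.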

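Give the filesystem cache more memory

Elasticsearch search relies heavily on the operating system's filesystem cache. If all the data fits in the filesystem cache, searches are generally very fast, typically returning within about a second.

Given real-world constraints, the practical ideal is for your machines' memory to hold at least half of your total data.

There are two ways to get there. One is to spend money on more machines with more memory. The other is to slim down documents by putting only the fields that need to be searched into Elasticsearch. The filesystem cache can then hold more documents, and memory is used more efficiently. The remaining fields can be stored in Redis, MySQL, HBase, Hadoop and similar systems and loaded in a second step.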
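A rough back-of-the-envelope calculation is enough to check the "at least half" target. The sketch makes two assumptions. First, all RAM outside the JVM heap is available to the filesystem cache, which is an overestimate because the OS and other processes need some. Second, "total data" means the on-disk index size including replicas. cache_coverage and the example numbers (3 nodes, 64GB RAM, a 31GB heap, 300GB of data) are illustrative.

```python
def cache_coverage(nodes, ram_per_node_gb, heap_per_node_gb, total_data_gb):
    """Rough fraction of the on-disk data that fits in the filesystem cache.

    Assumes all RAM outside the JVM heap is available to the OS page cache,
    which slightly overestimates it.
    """
    if total_data_gb <= 0:
        raise ValueError("total_data_gb must be positive")
    cache_gb = nodes * (ram_per_node_gb - heap_per_node_gb)
    return cache_gb / total_data_gb


if __name__ == "__main__":
    # Example: 3 nodes with 64GB RAM and a 31GB heap each, 300GB of index data.
    ratio = cache_coverage(nodes=3, ram_per_node_gb=64, heap_per_node_gb=31, total_data_gb=300)
    print("filesystem cache covers about %.0f%% of the data" % (ratio * 100))
    print("meets the at-least-half target:", ratio >= 0.5)
```

In this example, the cache covers only about a third of the data. Reaching the target would take more memory or slimmer documents.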

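Disable swapping

Disable swapping, either with swapoff -a or by setting bootstrap.memory_lock: true in elasticsearch.yml so the JVM heap is locked in RAM. If the Elasticsearch JVM's memory is swapped out to disk and later swapped back in, it causes heavy disk I/O and very poor performance.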
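On Linux, a quick way to confirm that swap is actually off is to read SwapTotal from /proc/meminfo. The parser below is a small sketch of that check, and swap_total_kb is our own helper. When bootstrap.memory_lock is enabled, GET _nodes?filter_path=**.mlockall should also report whether the lock took effect.

```python
import os


def swap_total_kb(meminfo_text):
    """Return SwapTotal in kB from the contents of /proc/meminfo."""
    for line in meminfo_text.splitlines():
        if line.startswith("SwapTotal:"):
            # Line format: "SwapTotal:       2097148 kB"
            return int(line.split()[1])
    raise ValueError("SwapTotal not found in meminfo text")


if __name__ == "__main__":
    if os.path.exists("/proc/meminfo"):  # Linux only
        with open("/proc/meminfo") as f:
            total = swap_total_kb(f.read())
        print("swap disabled" if total == 0 else "swap enabled: %d kB" % total)
    else:
        print("/proc/meminfo not found; this check only works on Linux")
```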

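Elasticsearch level

Index buffer

For write-heavy, high-concurrency workloads, you can enlarge the index buffer with indices.memory.index_buffer_size, a node-level setting in elasticsearch.yml. It defaults to 10% of the JVM heap and is shared by all shards on the node. Dividing it by the number of shards gives the average memory available to each shard. A common recommendation is at most 512mb per shard.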
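The per-shard arithmetic takes three steps: take the configured fraction of the heap, divide it by the number of shards on the node, and cap the result at the suggested 512mb. index_buffer_per_shard is a hypothetical helper. It ignores the details of how Elasticsearch actually redistributes the buffer between actively and inactively indexing shards.

```python
MB = 1024 * 1024
PER_SHARD_CAP = 512 * MB  # the per-shard ceiling suggested above


def index_buffer_per_shard(heap_bytes, active_shards, buffer_ratio=0.10):
    """Average indexing buffer per shard on one node, capped at 512mb.

    buffer_ratio stands for indices.memory.index_buffer_size expressed as a
    fraction of the heap (the default is 10%).
    """
    if active_shards <= 0:
        raise ValueError("active_shards must be positive")
    shared_buffer = heap_bytes * buffer_ratio  # one buffer shared by all shards on the node
    return min(shared_buffer / active_shards, PER_SHARD_CAP)


if __name__ == "__main__":
    heap = 31 * 1024 * MB  # example: a 31GB heap
    for shards in (5, 20, 60):
        per_shard = index_buffer_per_shard(heap, shards)
        print("%2d shards -> %.0f MB per shard" % (shards, per_shard / MB))
```

If the result is well below the cap for your shard count, that is the signal to raise indices.memory.index_buffer_size, for example to 20%.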

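Disable the _all field

The _all field concatenates the values of all fields in a document and indexes them together. It takes up a lot of disk space yet is of little real use, so it is best disabled in production. In 6.x, _all is deprecated and disabled by default for newly created indices, so this mainly matters for indices created in 5.x.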

Use best_compression

_source and other stored fields take up a lot of disk space. Compress them with the best_compression codec, set through the index.codec index setting.
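For a new index, the codec goes into the index settings at creation time. The snippet below only builds the JSON body for PUT /<index>. As far as we know, index.codec is a static setting, so on an existing index it can only be changed while the index is closed. The change then applies to newly written segments.

```python
import json

# Index settings body for PUT /<index>; the index name is up to you.
create_index_body = {
    "settings": {
        # DEFLATE-based compression for stored fields such as _source,
        # instead of the default LZ4: smaller on disk, a bit more CPU.
        "index.codec": "best_compression",
    }
}

if __name__ == "__main__":
    print(json.dumps(create_index_body, indent=2))
```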

Use the smallest suitable numeric type

Elasticsearch has four integer types: byte, short, integer and long. If the smallest type is enough, use it to save disk space.
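Choosing the type from a field's known value range can be automated as follows. smallest_integer_type is a hypothetical helper. The ranges are the standard signed 8-, 16-, 32- and 64-bit ranges that byte, short, integer and long correspond to.

```python
# Integer field types with their signed 8/16/32/64-bit ranges, smallest first.
INTEGER_TYPES = (
    ("byte", -2 ** 7, 2 ** 7 - 1),
    ("short", -2 ** 15, 2 ** 15 - 1),
    ("integer", -2 ** 31, 2 ** 31 - 1),
    ("long", -2 ** 63, 2 ** 63 - 1),
)


def smallest_integer_type(min_value, max_value):
    """Return the smallest integer field type whose range covers [min_value, max_value]."""
    for name, low, high in INTEGER_TYPES:
        if low <= min_value and max_value <= high:
            return name
    raise ValueError("range does not fit in a long")


if __name__ == "__main__":
    print(smallest_integer_type(0, 5))          # e.g. an order status code
    print(smallest_integer_type(0, 150))        # e.g. an age in years
    print(smallest_integer_type(0, 10 ** 12))   # e.g. a byte counter
```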

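Disable features you don't need

Build doc_values (the column-oriented "forward index") only for fields that are aggregated or sorted on;
Build an inverted index only for fields that are searched;
Disable norms on fields whose scores you don't care about;
Disable positions (via index_options) on fields that never need phrase queries or proximity matching;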
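In mapping terms, the four rules above correspond roughly to the doc_values, index, norms and index_options parameters. The sketch below uses hypothetical log-style fields and again assumes a 6.x _doc mapping.

```python
import json

# Hypothetical log-style fields, each keeping only the index structures it needs.
properties = {
    # Filtered on, never sorted or aggregated: inverted index only.
    "request_id": {"type": "keyword", "doc_values": False},
    # Only aggregated or sorted, never searched: doc_values only, no inverted index.
    "response_bytes": {"type": "long", "index": False},
    # Full-text search without scoring or phrase queries: no norms, no positions.
    "message": {"type": "text", "norms": False, "index_options": "freqs"},
}
mapping = {"mappings": {"_doc": {"properties": properties}}}

if __name__ == "__main__":
    print(json.dumps(mapping, indent=2))
```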

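Don't use the default dynamic string mapping

The default dynamic mapping maps every string field as a text field with a keyword sub-field. Most of the time you need only one of the two, and the other just wastes disk space. An id field, for example, probably needs only keyword, while a body field probably needs only text.

So decide at design time whether each field should be keyword or text, rather than storing everything both ways.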
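One way to avoid the default text-plus-keyword mapping is a dynamic template that maps new string fields to keyword only. Fields that need full-text search are then declared explicitly as text. The field and template names are hypothetical, and the body assumes a 6.x _doc mapping.

```python
import json

# Map dynamically detected strings to keyword only, instead of text + keyword sub-field.
dynamic_mapping = {
    "mappings": {
        "_doc": {
            "dynamic_templates": [
                {
                    "strings_as_keywords": {
                        "match_mapping_type": "string",
                        "mapping": {"type": "keyword"},
                    }
                }
            ],
            # Fields known to need full-text search are declared explicitly.
            "properties": {"body": {"type": "text"}},
        }
    }
}

if __name__ == "__main__":
    print(json.dumps(dynamic_mapping, indent=2))
```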

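Warm up the filesystem cache

After a restart, the filesystem cache may be cold (after a machine reboot it is always empty), and data is only loaded into it as queries come in. We can run some queries in advance to load commonly used data into memory. Real users' queries are then served from memory and respond quickly.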
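A warm-up step can be as simple as replaying a list of frequent queries before traffic reaches the node. In the sketch below, send(method, path, body) is a hypothetical transport that performs one request. The query list is made up and would in practice come from your access logs or slow logs. As far as we know, Elasticsearch also has an expert index.store.preload setting for eagerly loading index files, which may be worth a look.

```python
import time

# Hypothetical list of frequent (index, search body) pairs.
WARMUP_QUERIES = [
    ("orders", {"query": {"term": {"status": "paid"}}, "size": 10}),
    ("products", {"query": {"match": {"title": "phone"}}, "size": 10}),
]


def warm_up(send, queries, rounds=1):
    """Run each query rounds times so the index files it touches are read into the page cache.

    send(method, path, body) is a hypothetical transport returning decoded JSON.
    Returns the elapsed seconds per round, so you can watch latency drop as the cache fills.
    """
    timings = []
    for _ in range(rounds):
        start = time.perf_counter()
        for index, body in queries:
            send("POST", "/%s/_search" % index, body)
        timings.append(time.perf_counter() - start)
    return timings


if __name__ == "__main__":
    def fake_send(method, path, body):
        return {"hits": {"hits": []}}  # stands in for a real Elasticsearch response

    print(warm_up(fake_send, WARMUP_QUERIES, rounds=2))
```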

Development level

Prefer bulk for writes

When we use Java as the client, all write operations go through the bulk API.
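The bulk API takes newline-delimited JSON: an action line followed by the document source, with a trailing newline at the end of the body. The Python sketch below stands in for whichever client you use, including the Java one, and only builds the request bodies in batches. bulk_bodies and the batch size of 1000 are assumptions to tune. The action lines omit _id, so Elasticsearch generates the ids. The path uses the typed /<index>/_doc/_bulk endpoint of 6.x, and each body should be sent with Content-Type: application/x-ndjson.

```python
import json


def bulk_bodies(index, docs, batch_size=1000):
    """Yield (path, ndjson_body) pairs, one bulk request per batch of documents.

    Each document gets an {"index": {}} action line without _id, so
    Elasticsearch generates the id. Every body ends with a newline.
    """
    path = "/%s/_doc/_bulk" % index  # typed bulk endpoint, as on 6.x
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {}}))
        lines.append(json.dumps(doc, ensure_ascii=False))
        if len(lines) == 2 * batch_size:
            yield path, "\n".join(lines) + "\n"
            lines = []
    if lines:  # the last, possibly smaller, batch
        yield path, "\n".join(lines) + "\n"


if __name__ == "__main__":
    for path, body in bulk_bodies("orders", [{"order_no": i} for i in range(3)], batch_size=2):
        print(path)
        print(body, end="")
```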

Write data with multiple threads

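Concurrent writes can be sketched with a thread pool. Each worker sends one bulk request, and the function returns the requests whose responses set the bulk response's errors flag, so they can be inspected or retried. send is a hypothetical thread-safe transport, and the worker count is an assumption to tune. If nodes start rejecting bulk requests with HTTP 429, you are sending more concurrent writes than they can absorb.

```python
from concurrent.futures import ThreadPoolExecutor


def send_bulk_concurrently(send, requests, workers=4):
    """Send (path, ndjson_body) bulk requests from a thread pool.

    send(method, path, body) is a hypothetical thread-safe transport returning the
    decoded JSON response. Returns the requests whose response has errors set,
    because a bulk call can succeed as a whole while individual items fail.
    """
    def send_one(request):
        path, body = request
        resp = send("POST", path, body)
        return request if resp.get("errors") else None

    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map submits every request up front and returns results in input order.
        results = list(pool.map(send_one, requests))
    return [request for request in results if request is not None]


if __name__ == "__main__":
    def fake_send(method, path, body):
        # Pretend any batch containing "bad" had a failing item.
        return {"errors": "bad" in body}

    batches = [("/orders/_doc/_bulk", '{"index": {}}\n{"n": %d}\n' % i) for i in range(3)]
    batches.append(("/orders/_doc/_bulk", '{"index": {}}\n{"bad": true}\n'))
    print(send_bulk_concurrently(fake_send, batches, workers=2))
```

Because pool.map submits everything at once, a very large stream of batches should be fed in bounded chunks.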
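Use auto-generated document IDs

If you set a document's id yourself, Elasticsearch has to check each time whether that id already exists, which is relatively expensive. With auto-generated ids, Elasticsearch can skip this check, so writes are faster.

A table's primary key from a relational database can be stored as an ordinary field of the Elasticsearch document.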

Pay attention to document structure design

This is the top priority in application development: a good document structure leads to excellent performance.

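Avoid scripts

Make full use of caching

In time-based queries, don't use functions like now. A query containing now resolves to a different value every time it runs, so its results are hard to reuse from the cache. Instead, convert the time to a normalized format on the client side and then query Elasticsearch, which improves cache utilization.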
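As a sketch of client-side time handling, the helper below computes the range bounds itself and rounds them down to the hour. Every request within the same hour then sends an identical filter. last_hours_filter and the hourly rounding are our own choices, and the field name is up to the caller. The window ends at the most recent full hour, so the newest data is excluded. Round to a smaller unit if freshness matters more than cache hits.

```python
from datetime import datetime, timedelta, timezone


def last_hours_filter(field, hours, now=None):
    """Range filter for the last `hours` hours, with bounds computed on the client.

    Both bounds are rounded down to the hour, so every request made within the
    same hour sends an identical query whose cached filter result can be reused.
    now, if given, must be timezone-aware.
    """
    now = (now or datetime.now(timezone.utc)).astimezone(timezone.utc)
    end = now.replace(minute=0, second=0, microsecond=0)
    start = end - timedelta(hours=hours)
    fmt = "%Y-%m-%dT%H:%M:%SZ"  # ISO 8601 in UTC, accepted by the default date format
    return {
        "query": {
            "bool": {
                # Filter context: no scoring, and eligible for the node query cache.
                "filter": [
                    {"range": {field: {"gte": start.strftime(fmt), "lt": end.strftime(fmt)}}}
                ]
            }
        }
    }


if __name__ == "__main__":
    print(last_hours_filter("create_time", 24))
```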

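Summary

This article has covered common practices for Elasticsearch performance tuning, from the server and instance level down to the code level, and can serve as a reference. There is no one-size-fits-all recipe for performance tuning, though, and every change needs repeated verification. Thanks for reading.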

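Focused on Java high concurrency and distributed architecture. For more technical articles and insights, follow the WeChat public account: Java 架构社区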