ElasticSearch: 23 Mapping Parameters Explained

This is the fifth article in our ElasticSearch tutorial series (the first four are already out). Today let's look at the 23 common mapping parameters in ES.

Song Ge recorded a video tutorial dedicated to these 23 mapping parameters:

Video link: https://pan.baidu.com/s/1J23m... Extraction code: 6k2a

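This article is a set of notes on that video. The notes are brief, so refer to the video for the complete content.

The examples below reuse the index names blog and users. Creating an index that already exists fails, so delete the previous one first (for example, DELETE blog) before running the next example.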

1. ElasticSearch mapping parameters

1.1 analyzer

Defines the analyzer for a text field. By default the analyzer is used at both index time and search time.

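First, let's see what gets indexed when no analyzer is configured. Create an index and add a document. The sample sentence means "Defines the analyzer for a text field. By default it applies to both indexing and search.":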

```
PUT blog

PUT blog/_doc/1
{
  "title": "定义文本字段的分词器。默认对索引和查询都是有效的。"
}
```

View the term vectors:

```
GET blog/_termvectors/1
{
  "fields": ["title"]
}
```

The result looks like this:

```
{
  "_index" : "blog",
  "_type" : "_doc",
  "_id" : "1",
  "_version" : 1,
  "found" : true,
  "took" : 0,
  "term_vectors" : {
    "title" : {
      "field_statistics" : {
        "sum_doc_freq" : 22,
        "doc_count" : 1,
        "sum_ttf" : 23
      },
      "terms" : {
        "义" : { "term_freq" : 1, "tokens" : [ { "position" : 1, "start_offset" : 1, "end_offset" : 2 } ] },
        "分" : { "term_freq" : 1, "tokens" : [ { "position" : 7, "start_offset" : 7, "end_offset" : 8 } ] },
        "和" : { "term_freq" : 1, "tokens" : [ { "position" : 15, "start_offset" : 16, "end_offset" : 17 } ] },
        "器" : { "term_freq" : 1, "tokens" : [ { "position" : 9, "start_offset" : 9, "end_offset" : 10 } ] },
        "字" : { "term_freq" : 1, "tokens" : [ { "position" : 4, "start_offset" : 4, "end_offset" : 5 } ] },
        "定" : { "term_freq" : 1, "tokens" : [ { "position" : 0, "start_offset" : 0, "end_offset" : 1 } ] },
        "对" : { "term_freq" : 1, "tokens" : [ { "position" : 12, "start_offset" : 13, "end_offset" : 14 } ] },
        "引" : { "term_freq" : 1, "tokens" : [ { "position" : 14, "start_offset" : 15, "end_offset" : 16 } ] },
        "效" : { "term_freq" : 1, "tokens" : [ { "position" : 21, "start_offset" : 22, "end_offset" : 23 } ] },
        "文" : { "term_freq" : 1, "tokens" : [ { "position" : 2, "start_offset" : 2, "end_offset" : 3 } ] },
        "是" : { "term_freq" : 1, "tokens" : [ { "position" : 19, "start_offset" : 20, "end_offset" : 21 } ] },
        "有" : { "term_freq" : 1, "tokens" : [ { "position" : 20, "start_offset" : 21, "end_offset" : 22 } ] },
        "本" : { "term_freq" : 1, "tokens" : [ { "position" : 3, "start_offset" : 3, "end_offset" : 4 } ] },
        "查" : { "term_freq" : 1, "tokens" : [ { "position" : 16, "start_offset" : 17, "end_offset" : 18 } ] },
        "段" : { "term_freq" : 1, "tokens" : [ { "position" : 5, "start_offset" : 5, "end_offset" : 6 } ] },
        "的" : { "term_freq" : 2, "tokens" : [ { "position" : 6, "start_offset" : 6, "end_offset" : 7 }, { "position" : 22, "start_offset" : 23, "end_offset" : 24 } ] },
        "索" : { "term_freq" : 1, "tokens" : [ { "position" : 13, "start_offset" : 14, "end_offset" : 15 } ] },
        "认" : { "term_freq" : 1, "tokens" : [ { "position" : 11, "start_offset" : 12, "end_offset" : 13 } ] },
        "词" : { "term_freq" : 1, "tokens" : [ { "position" : 8, "start_offset" : 8, "end_offset" : 9 } ] },
        "询" : { "term_freq" : 1, "tokens" : [ { "position" : 17, "start_offset" : 18, "end_offset" : 19 } ] },
        "都" : { "term_freq" : 1, "tokens" : [ { "position" : 18, "start_offset" : 19, "end_offset" : 20 } ] },
        "默" : { "term_freq" : 1, "tokens" : [ { "position" : 10, "start_offset" : 11, "end_offset" : 12 } ] }
      }
    }
  }
}
```

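As you can see, by default Chinese text is split into single characters, which is of little use. With this tokenization, queries can only match one character at a time, like this: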

```
GET blog/_search
{
  "query": {
    "term": {
      "title": "定"
    }
  }
}
```

Not useful at all!!!
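To see where the positions and offsets in the output above come from, here is a toy Python model of per-character tokenization. It is a hypothetical sketch, not the standard analyzer itself (which follows Unicode text-segmentation rules). It only assumes two things: each Chinese character becomes one token, and punctuation produces no token. As a result, positions skip the full stop while character offsets do not.

```python
from collections import defaultdict


def char_term_vectors(text):
    """Toy model (not ES code): one token per character, punctuation dropped."""
    terms = defaultdict(list)
    position = 0
    for offset, ch in enumerate(text):
        if not ch.isalnum():
            continue  # "。" yields no token and does not consume a position
        terms[ch].append(
            {"position": position, "start_offset": offset, "end_offset": offset + 1}
        )
        position += 1
    return dict(terms)


tv = char_term_vectors("定义文本字段的分词器。默认对索引和查询都是有效的。")
print(len(tv), sum(len(tokens) for tokens in tv.values()))  # 22 23
print(tv["的"])  # positions 6 and 22, offsets 6-7 and 23-24
```

The 22 distinct terms and 23 tokens line up with sum_doc_freq and sum_ttf in the response.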

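So we should configure an analyzer that suits the actual data.

Set an analyzer on the field. ik_smart comes from the IK Analysis plugin, which must be installed first: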

```
PUT blog
{
  "mappings": {
    "properties": {
      "title": {
        "type": "text",
        "analyzer": "ik_smart"
      }
    }
  }
}
```

Index the document:

```
PUT blog/_doc/1
{
  "title": "定义文本字段的分词器。默认对索引和查询都是有效的。"
}
```

View the term vectors:

```
GET blog/_termvectors/1
{
  "fields": ["title"]
}
```

The result is as follows:

```
{
  "_index" : "blog",
  "_type" : "_doc",
  "_id" : "1",
  "_version" : 1,
  "found" : true,
  "took" : 1,
  "term_vectors" : {
    "title" : {
      "field_statistics" : {
        "sum_doc_freq" : 12,
        "doc_count" : 1,
        "sum_ttf" : 13
      },
      "terms" : {
        "分词器" : { "term_freq" : 1, "tokens" : [ { "position" : 4, "start_offset" : 7, "end_offset" : 10 } ] },
        "和" : { "term_freq" : 1, "tokens" : [ { "position" : 8, "start_offset" : 16, "end_offset" : 17 } ] },
        "字段" : { "term_freq" : 1, "tokens" : [ { "position" : 2, "start_offset" : 4, "end_offset" : 6 } ] },
        "定义" : { "term_freq" : 1, "tokens" : [ { "position" : 0, "start_offset" : 0, "end_offset" : 2 } ] },
        "对" : { "term_freq" : 1, "tokens" : [ { "position" : 6, "start_offset" : 13, "end_offset" : 14 } ] },
        "文本" : { "term_freq" : 1, "tokens" : [ { "position" : 1, "start_offset" : 2, "end_offset" : 4 } ] },
        "有效" : { "term_freq" : 1, "tokens" : [ { "position" : 11, "start_offset" : 21, "end_offset" : 23 } ] },
        "查询" : { "term_freq" : 1, "tokens" : [ { "position" : 9, "start_offset" : 17, "end_offset" : 19 } ] },
        "的" : { "term_freq" : 2, "tokens" : [ { "position" : 3, "start_offset" : 6, "end_offset" : 7 }, { "position" : 12, "start_offset" : 23, "end_offset" : 24 } ] },
        "索引" : { "term_freq" : 1, "tokens" : [ { "position" : 7, "start_offset" : 14, "end_offset" : 16 } ] },
        "都是" : { "term_freq" : 1, "tokens" : [ { "position" : 10, "start_offset" : 19, "end_offset" : 21 } ] },
        "默认" : { "term_freq" : 1, "tokens" : [ { "position" : 5, "start_offset" : 11, "end_offset" : 13 } ] }
      }
    }
  }
}
```

Now the field can be searched by whole words:

```
GET blog/_search
{
  "query": {
    "term": {
      "title": "索引"
    }
  }
}
```
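IK's real segmentation logic is more involved. As a swapped-in illustration of dictionary-based segmentation in general, the sketch below uses greedy forward maximum matching. Its dictionary is small and hypothetical: it holds only the words visible in the ik_smart output. On this sentence it happens to yield the same 13 tokens, which is why the term query for 索引 now finds the document.

```python
# Hypothetical dictionary: just the multi-character words seen in the ik_smart output.
DICTIONARY = {"定义", "文本", "字段", "分词器", "默认", "索引", "查询", "都是", "有效"}
TEXT = "定义文本字段的分词器。默认对索引和查询都是有效的。"


def forward_max_match(text, dictionary, max_len=3):
    """Greedy forward maximum matching; an illustration, not IK's algorithm."""
    tokens = []
    i = 0
    while i < len(text):
        if not text[i].isalnum():
            i += 1  # punctuation is skipped
            continue
        # Try the longest candidate first; a single character always matches.
        for size in range(min(max_len, len(text) - i), 0, -1):
            word = text[i:i + size]
            if size == 1 or word in dictionary:
                tokens.append(word)
                i += size
                break
    return tokens


tokens = forward_max_match(TEXT, DICTIONARY)
print(tokens)
print("索引" in tokens)  # True: a term query for 索引 can now match
```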

1.2 search_analyzer

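The analyzer used at query time. When a query runs, ES first checks whether the field has a search_analyzer. If it does, the query text is analyzed with it. If not, ES falls back to the field's analyzer. If that is not set either, the index's default analyzer (standard, unless configured otherwise) is used.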
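The fallback order can be written as a small function. This is a simplified sketch. It ignores an analyzer passed explicitly in the query, as well as index-level defaults such as default_search. The dictionary shape of field_mapping is an assumption that mirrors a field's mapping JSON.

```python
def resolve_search_analyzer(field_mapping, index_default="standard"):
    """Pick the analyzer for query text on one field (simplified).

    Order: the field's search_analyzer, then its analyzer, then the default.
    """
    return (
        field_mapping.get("search_analyzer")
        or field_mapping.get("analyzer")
        or index_default
    )


mapping = {"type": "text", "analyzer": "ik_max_word", "search_analyzer": "ik_smart"}
print(resolve_search_analyzer(mapping))  # ik_smart
```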

1.3 normalizer

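The normalizer parameter configures normalization that is applied before a keyword value is indexed or searched. It works like an analyzer, except that it always produces a single token.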

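For example, strings we don't want split are usually mapped as keyword in ES, and searches then match the whole value. Suppose the data was not cleaned before indexing and the letter case is inconsistent, for example javaboy and JAVABOY. A normalizer can then normalize the values before indexing and before querying.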

First, a counterexample. Create an index named blog and map the author field as keyword:

```
PUT blog
{
  "mappings": {
    "properties": {
      "author": {
        "type": "keyword"
      }
    }
  }
}
```

Add two documents:

```
PUT blog/_doc/1
{ "author": "javaboy" }

PUT blog/_doc/2
{ "author": "JAVABOY" }
```

Then search:

```
GET blog/_search
{
  "query": {
    "term": {
      "author": "JAVABOY"
    }
  }
}
```

An uppercase keyword finds only the uppercase document, and a lowercase keyword finds only the lowercase one.

With a normalizer, documents are preprocessed at index time and query values at search time.

A normalizer is defined like this:

```
PUT blog
{
  "settings": {
    "analysis": {
      "normalizer": {
        "my_normalizer": {
          "type": "custom",
          "filter": ["lowercase"]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "author": {
        "type": "keyword",
        "normalizer": "my_normalizer"
      }
    }
  }
}
```

Define the normalizer in settings, then reference it in mappings.

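Test it the same way as before. This time an uppercase keyword also finds the lowercase document, because values are lowercased both at index time and at query time.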
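Here is a toy Python model of why this works. The same normalizer function runs on the value at index time and on the query value at search time, so both sides end up as the same term. str.lower stands in for the lowercase filter. build_index and term_query are hypothetical helpers, not ES APIs.

```python
def build_index(docs, normalizer=None):
    """Map each (normalized) keyword value to the ids of documents holding it."""
    index = {}
    for doc_id, value in docs.items():
        term = normalizer(value) if normalizer else value
        index.setdefault(term, set()).add(doc_id)
    return index


def term_query(index, value, normalizer=None):
    """The same normalizer is applied to the query value."""
    term = normalizer(value) if normalizer else value
    return sorted(index.get(term, set()))


docs = {1: "javaboy", 2: "JAVABOY"}
print(term_query(build_index(docs), "JAVABOY"))  # [2]
lowercase = str.lower
print(term_query(build_index(docs, lowercase), "JAVABOY", lowercase))  # [1, 2]
```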

1.4 boost

The boost parameter sets the weight of a field.

There are two ways to use boost. You can set it in the mapping when defining the field type, or you can set it at query time.

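In practice, query-time boost is recommended. Mapping-time boost has a drawback: the weight cannot be changed without reindexing the documents.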

Using boost in the mapping (not recommended):

```
PUT blog
{
  "mappings": {
    "properties": {
      "content": {
        "type": "text",
        "boost": 2
      }
    }
  }
}
```

The other way is to specify boost at query time:

```
GET blog/_search
{
  "query": {
    "match": {
      "content": {
        "query": "你好",
        "boost": 2
      }
    }
  }
}
```
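Real relevance scoring is more involved, but the effect of a query-time boost can be sketched like this. A boosted clause's score is multiplied by its boost, and a bool should query adds up the scores of its matching clauses. The per-clause scores below are made-up numbers, and the helpers are hypothetical.

```python
def bool_should_score(clause_scores, boosts):
    """Toy: multiply each matching clause's score by its boost, then add them up."""
    return sum(score * boosts.get(field, 1.0) for field, score in clause_scores.items())


def rank(docs, boosts):
    """Order document names by their toy score, best first."""
    return sorted(docs, key=lambda name: bool_should_score(docs[name], boosts), reverse=True)


# Made-up per-clause scores for two documents.
docs = {
    "a": {"title": 1.0, "content": 0.2},
    "b": {"title": 0.3, "content": 1.0},
}
print(rank(docs, {}))              # ['b', 'a']
print(rank(docs, {"title": 2.0}))  # ['a', 'b']: boosting title flips the order
```

Because the boost lives in the request, changing it only means sending a different query, with no reindexing.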

1.5 coerce

coerce cleans up dirty data by converting values to the field's type. It defaults to true.

For example, a user might write a number incorrectly in JSON:

```
{"age": "99"}
```

or:

```
{"age": "99.0"}
```

Neither of these is a properly formatted number.

coerce handles this.

By default the following works, because coerce is in effect: the string "99.0" is indexed as the integer 99.

```
PUT blog
{
  "mappings": {
    "properties": {
      "age": {
        "type": "integer"
      }
    }
  }
}

POST blog/_doc
{ "age": "99.0" }
```

To change coerce, do the following:

```
PUT blog
{
  "mappings": {
    "properties": {
      "age": {
        "type": "integer",
        "coerce": false
      }
    }
  }
}

POST blog/_doc
{ "age": 99 }
```

Once coerce is set to false, a number must be an actual number rather than a string. Passing a string to this field produces an error.
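According to the ES documentation, when coerce is enabled, strings are converted to numbers and floating-point values are truncated for integer fields. The toy function below mimics that for a single value. Its handling of floats with coerce disabled is a simplification, not a claim about ES.

```python
def parse_integer(value, coerce=True):
    """Toy model of how a value lands in an integer field."""
    if isinstance(value, bool):
        raise ValueError(f"{value!r} is not a number")
    if isinstance(value, int):
        return value
    if not coerce:
        # Simplification: with coerce off, this sketch rejects anything that is not an int.
        raise ValueError(f"{value!r} rejected because coerce is false")
    if isinstance(value, str):
        return int(float(value))  # "99" -> 99, "99.0" -> 99; "abc" raises ValueError
    if isinstance(value, float):
        return int(value)  # the fractional part is truncated
    raise ValueError(f"{value!r} is not a number")


print(parse_integer("99"), parse_integer("99.0"), parse_integer(99.9))  # 99 99 99
```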

1.6 copy_to

This parameter copies the values of several fields into a single field.

It is defined like this:

```
PUT blog
{
  "mappings": {
    "properties": {
      "title": {
        "type": "text",
        "copy_to": "full_content"
      },
      "content": {
        "type": "text",
        "copy_to": "full_content"
      },
      "full_content": {
        "type": "text"
      }
    }
  }
}

PUT blog/_doc/1
{
  "title": "你好江南一点雨",
  "content": "当 coerce 修改为 false 之后，数字就只能是数字了，不可以是字符串，该字段传入字符串会报错。"
}

GET blog/_search
{
  "query": {
    "term": {
      "full_content": "当"
    }
  }
}
```
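Here is a toy model of copy_to. Each value is indexed under its own field and also appended to the target field. _source is left untouched, so full_content does not appear in the stored document. The helper and the sample values are hypothetical, and the sketch handles only a single copy_to target per field.

```python
def apply_copy_to(source, mapping):
    """Toy: return what gets indexed per field; _source itself is not modified."""
    indexed = {field: [value] for field, value in source.items()}
    for field, value in source.items():
        target = mapping.get(field, {}).get("copy_to")
        if target:
            indexed.setdefault(target, []).append(value)
    return indexed


mapping = {
    "title": {"type": "text", "copy_to": "full_content"},
    "content": {"type": "text", "copy_to": "full_content"},
    "full_content": {"type": "text"},
}
source = {"title": "hello", "content": "world"}
indexed = apply_copy_to(source, mapping)
print(indexed["full_content"])   # ['hello', 'world']
print("full_content" in source)  # False
```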

1.7 doc_values and fielddata

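Search in ES relies mainly on the inverted index. doc_values exists to speed up sorting and aggregations: while the inverted index is being built, ES also stores an additional column-oriented structure that maps each document to its values.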

doc_values is enabled by default. If you are sure a field will never be used for sorting or aggregations, you can disable doc_values on it.

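Most field types generate doc_values at index time; text is the exception. A text field instead gets a data structure called fielddata, built at query time the first time the field is used in an aggregation or sort. This only happens if fielddata has been enabled on that field.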

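| doc_values | fielddata |
| --- | --- |
| Built at index time | Built on demand at query time |
| Stored on disk | Held in memory |
| Does not use heap memory | Does not use disk |
| Slightly slower indexing | Slow to build and memory-hungry when there are many documents |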

doc_values is enabled by default; fielddata is disabled by default.
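A toy Python model makes the difference concrete. The inverted index maps values to documents. doc_values is a column built at index time that maps documents to values, which is what sorting needs. fielddata has to rebuild that column from the inverted index on first use. The data mirrors the users example that follows; the helpers are hypothetical.

```python
# Toy data mirroring the users example: doc id -> age.
docs = {1: 100, 2: 99, 3: 98, 4: 101}

# Inverted index (value -> doc ids): answers "which documents contain this value?".
inverted = {}
for doc_id, age in docs.items():
    inverted.setdefault(age, []).append(doc_id)

# doc_values (doc id -> value), built at index time: answers "what does this doc hold?".
doc_values = dict(docs)


def sort_desc(column):
    """Sorting reads the doc -> value column directly."""
    return sorted(column, key=column.get, reverse=True)


def uninvert(index):
    """Roughly what fielddata does on first use: rebuild doc -> values in memory."""
    column = {}
    for term, doc_ids in index.items():
        for doc_id in doc_ids:
            column.setdefault(doc_id, []).append(term)
    return column


print(sort_desc(doc_values))  # [4, 1, 2, 3]
print(uninvert(inverted))
```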

doc_values demo:

```
PUT users

PUT users/_doc/1
{ "age": 100 }

PUT users/_doc/2
{ "age": 99 }

PUT users/_doc/3
{ "age": 98 }

PUT users/_doc/4
{ "age": 101 }

GET users/_search
{
  "query": {
    "match_all": {}
  },
  "sort": [
    {
      "age": {
        "order": "desc"
      }
    }
  ]
}
```

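Because doc_values is enabled by default, the field can be used for sorting directly. To disable doc_values, do the following. The sort request at the end then fails, because age no longer has doc values to sort on: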

```
PUT users
{
  "mappings": {
    "properties": {
      "age": {
        "type": "integer",
        "doc_values": false
      }
    }
  }
}

PUT users/_doc/1
{ "age": 100 }

PUT users/_doc/2
{ "age": 99 }

PUT users/_doc/3
{ "age": 98 }

PUT users/_doc/4
{ "age": 101 }

GET users/_search
{
  "query": {
    "match_all": {}
  },
  "sort": [
    {
      "age": {
        "order": "desc"
      }
    }
  ]
}
```

1.8 dynamic

1.9 enabled

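By default ES indexes every field, but some fields only need to be kept, not indexed. The enabled parameter controls this. It can only be applied to the top-level mapping and to object fields, which is why title below is declared without a type: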

```
PUT blog
{
  "mappings": {
    "properties": {
      "title": {
        "enabled": false
      }
    }
  }
}

PUT blog/_doc/1
{ "title": "javaboy" }

GET blog/_search
{
  "query": {
    "term": {
      "title": "javaboy"
    }
  }
}
```

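Once enabled is set to false, the field can no longer be searched. ES skips parsing it, so the term query above returns no hits, although the value is still kept in _source.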
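Here is a toy model of that behavior. A field with enabled set to false is skipped during indexing, but the original JSON is still returned from _source. index_document and term_search are hypothetical helpers.

```python
def index_document(source, mapping):
    """Toy model: fields with enabled=false are never parsed or indexed."""
    indexed = {}
    for field, value in source.items():
        if mapping.get(field, {}).get("enabled", True) is False:
            continue  # skipped: no inverted index, no doc values
        indexed[field] = value
    # The original JSON is kept as-is in _source.
    return {"_source": dict(source), "indexed": indexed}


def term_search(store, field, value):
    """Only indexed values can be matched."""
    return [doc_id for doc_id, doc in store.items() if doc["indexed"].get(field) == value]


mapping = {"title": {"enabled": False}}
store = {1: index_document({"title": "javaboy"}, mapping)}
print(term_search(store, "title", "javaboy"))  # []
print(store[1]["_source"])  # {'title': 'javaboy'}
```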

1.10 format

Date formats. format constrains how dates are written, and several formats can be defined at once.

```
PUT users
{
  "mappings": {
    "properties": {
      "birthday": {
        "type": "date",
        "format": "yyyy-MM-dd||yyyy-MM-dd HH:mm:ss"
      }
    }
  }
}

PUT users/_doc/1
{ "birthday": "2020-11-11" }

PUT users/_doc/2
{ "birthday": "2020-11-11 11:11:11" }
```

Separate multiple date formats with ||, without spaces.
If no format is specified, the default date format is strict_date_optional_time||epoch_millis.

All available date formats are listed at https://www.elastic.co/guide/...
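ES parses these patterns with Java date-time syntax. The Python sketch below only imitates the || semantics, trying each pattern in turn until one parses. Its translation of the two patterns into strptime codes is hand-written and assumed. It is not how ES parses dates internally.

```python
from datetime import datetime

# Assumed mapping from the two Java-style patterns above to Python strptime codes.
JAVA_TO_STRPTIME = {
    "yyyy-MM-dd": "%Y-%m-%d",
    "yyyy-MM-dd HH:mm:ss": "%Y-%m-%d %H:%M:%S",
}
FORMAT = "yyyy-MM-dd||yyyy-MM-dd HH:mm:ss"


def parse_date(value, fmt=FORMAT):
    """Try each ||-separated pattern in order; the first one that parses wins."""
    for pattern in fmt.split("||"):
        try:
            return datetime.strptime(value, JAVA_TO_STRPTIME[pattern])
        except ValueError:
            continue
    raise ValueError(f"{value!r} matches none of {fmt!r}")


print(parse_date("2020-11-11"))           # 2020-11-11 00:00:00
print(parse_date("2020-11-11 11:11:11"))  # 2020-11-11 11:11:11
```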

1.11 ignore_above

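ignore_above sets the maximum length of a string that will be indexed. A longer value is not indexed, although it is still kept in _source. This parameter applies to keyword fields. In the example below, the term query for the 21-character string returns no hits: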

```
PUT blog
{
  "mappings": {
    "properties": {
      "title": {
        "type": "keyword",
        "ignore_above": 10
      }
    }
  }
}

PUT blog/_doc/1
{ "title": "javaboy" }

PUT blog/_doc/2
{ "title": "javaboyjavaboyjavaboy" }

GET blog/_search
{
  "query": {
    "term": {
      "title": "javaboyjavaboyjavaboy"
    }
  }
}
```
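Here is a toy keyword index that applies the same length check. The helper is hypothetical, and the length comparison is a plain Python len.

```python
def index_keyword(docs, ignore_above):
    """Toy keyword index: values longer than ignore_above are left out."""
    index = {}
    for doc_id, value in docs.items():
        if len(value) > ignore_above:
            continue  # still in _source, just not indexed
        index.setdefault(value, []).append(doc_id)
    return index


docs = {1: "javaboy", 2: "javaboyjavaboyjavaboy"}
index = index_keyword(docs, ignore_above=10)
print(index.get("javaboy", []))                # [1]
print(index.get("javaboyjavaboyjavaboy", []))  # []
```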

1.12 ignore_malformed

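ignore_malformed skips malformed values instead of rejecting the whole document. It defaults to false. In the example below, the second request succeeds even though age is "abc": only that value is left out of the index. The third request fails, because birthday does not set ignore_malformed and "2020-11-11 11:11:11aaa" matches none of its formats.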

```
PUT users
{
  "mappings": {
    "properties": {
      "birthday": {
        "type": "date",
        "format": "yyyy-MM-dd||yyyy-MM-dd HH:mm:ss"
      },
      "age": {
        "type": "integer",
        "ignore_malformed": true
      }
    }
  }
}

PUT users/_doc/1
{ "birthday": "2020-11-11", "age": 99 }

PUT users/_doc/2
{ "birthday": "2020-11-11 11:11:11", "age": "abc" }

PUT users/_doc/2
{ "birthday": "2020-11-11 11:11:11aaa", "age": "abc" }
```
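Here is a toy model of the three requests. A parse failure drops the value only if that field allows it; otherwise the whole document is rejected. ES records the names of such dropped fields in the _ignored metadata field, which the _ignored list mimics. The parsers and helper are hypothetical.

```python
from datetime import datetime


def parse_birthday(value):
    """Accepts the two formats from the mapping above."""
    for pattern in ("%Y-%m-%d", "%Y-%m-%d %H:%M:%S"):
        try:
            return datetime.strptime(value, pattern)
        except ValueError:
            continue
    raise ValueError(f"malformed date {value!r}")


PARSERS = {"birthday": parse_birthday, "age": int}
IGNORE_MALFORMED = {"age"}  # only age sets ignore_malformed: true


def index_user(source):
    """Toy model: a malformed value is dropped only if its field allows it."""
    indexed, ignored = {}, []
    for field, value in source.items():
        try:
            indexed[field] = PARSERS[field](value)
        except ValueError:
            if field not in IGNORE_MALFORMED:
                raise  # the whole document is rejected
            ignored.append(field)  # mimics the _ignored metadata field
    return {"indexed": indexed, "_ignored": ignored}


print(index_user({"birthday": "2020-11-11 11:11:11", "age": "abc"})["_ignored"])  # ['age']
```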

1.13 include_in_all

This parameter belonged to the _all field, which has been removed in ES 7.

1.14 index

index controls whether a field is indexed: true means the field is indexed, false means it is not.

```
PUT users
{
  "mappings": {
    "properties": {
      "age": {
        "type": "integer",
        "index": false
      }
    }
  }
}

PUT users/_doc/1
{ "age": 99 }

GET users/_search
{
  "query": {
    "term": {
      "age": 99
    }
  }
}
```

If index is false, the field cannot be searched; in ES 7 the term query above returns an error.
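index: false and enabled: false are not the same. An unindexed field still gets doc_values unless those are disabled too, so it can still be sorted and aggregated on. The hypothetical Field class below models that distinction. It follows the ES 7 behavior described above; newer ES versions relax the query restriction for some field types.

```python
class Field:
    """Toy model of one field's structures."""

    def __init__(self, docs, index=True, doc_values=True):
        self.index = index
        # index=true -> inverted index (value -> doc ids), used by queries
        self.inverted = {}
        if index:
            for doc_id, value in docs.items():
                self.inverted.setdefault(value, []).append(doc_id)
        # doc_values=true -> column (doc id -> value), used by sorting and aggregations
        self.column = dict(docs) if doc_values else None

    def term_query(self, value):
        if not self.index:
            raise ValueError("cannot search on a field that is not indexed")
        return self.inverted.get(value, [])

    def sort_desc(self):
        if self.column is None:
            raise ValueError("cannot sort without doc values")
        return sorted(self.column, key=self.column.get, reverse=True)


age = Field({1: 99, 2: 100}, index=False)
print(age.sort_desc())  # [2, 1]: sorting still works through doc values
```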

1.15 index_options

index_options controls what information is written to the inverted index at index time. It is mainly used on text fields and takes one of four values:

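| index_options | Notes |
| --- | --- |
| docs | Stores only document numbers |
| freqs | docs, plus term frequencies |
| positions | freqs, plus term positions; this is the default for text fields |
| offsets | positions, plus the start and end character offsets of each term |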
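The four levels nest, and the toy function below shows what one document would contribute to each term's postings at each level. Phrase queries need positions, while offsets can be used to speed up highlighting. The token list is hypothetical.

```python
LEVELS = ["docs", "freqs", "positions", "offsets"]


def postings(tokens, level):
    """What one document contributes to each term's postings at a given level.

    tokens: (term, start_offset, end_offset) tuples in text order.
    """
    depth = LEVELS.index(level)
    result = {}
    for position, (term, start, end) in enumerate(tokens):
        entry = result.setdefault(term, {})  # docs: only the term's presence is kept
        if depth >= 1:
            entry["freq"] = entry.get("freq", 0) + 1
        if depth >= 2:
            entry.setdefault("positions", []).append(position)
        if depth >= 3:
            entry.setdefault("offsets", []).append((start, end))
    return result


tokens = [("java", 0, 4), ("boy", 5, 8), ("java", 9, 13)]
for level in LEVELS:
    print(level, postings(tokens, level))
```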

1.16 norms

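norms store a per-field normalization factor, such as the field length, which is used when scoring. They are enabled by default for text fields. If a field does not need scoring, disable norms on it to save space.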
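To show what norms feed into, here is the textbook BM25 weight of one term. The field length (doc_len) is the part that comes from norms, and k1 = 1.2 and b = 0.75 are the ES defaults. When use_norms is False, this sketch simply drops the length factor. That illustrates that field length stops mattering; it is not a reproduction of Lucene's exact code path.

```python
def bm25_term_score(tf, doc_len, avg_doc_len, idf=1.0, k1=1.2, b=0.75, use_norms=True):
    """Textbook BM25 weight of one term in one field (illustrative)."""
    length_ratio = doc_len / avg_doc_len if use_norms else 1.0
    return idf * tf * (k1 + 1) / (tf + k1 * (1 - b + b * length_ratio))


short = bm25_term_score(tf=1, doc_len=5, avg_doc_len=10)
long_ = bm25_term_score(tf=1, doc_len=20, avg_doc_len=10)
print(short > long_)  # True: a match in a shorter field scores higher
print(bm25_term_score(1, 5, 10, use_norms=False) == bm25_term_score(1, 20, 10, use_norms=False))  # True
```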

1.17 null_value

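In ES, a null value is neither indexed nor searchable. null_value makes an explicit null indexable and searchable by indexing a replacement value in its place. The replacement must have the same type as the field, and _source still shows null: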

```
PUT users
{
  "mappings": {
    "properties": {
      "name": {
        "type": "keyword",
        "null_value": "javaboy_null"
      }
    }
  }
}

PUT users/_doc/1
{ "name": null, "age": 99 }

GET users/_search
{
  "query": {
    "term": {
      "name": "javaboy_null"
    }
  }
}
```
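Here is a toy model of the substitution. Only an explicit null is replaced; a field that is missing from the document is not. The helper is hypothetical.

```python
def prepare_for_indexing(source, null_values):
    """Toy model: swap explicit nulls for the configured null_value before indexing."""
    prepared = {}
    for field, value in source.items():
        if value is None:
            if field not in null_values:
                continue  # a plain null is simply not indexed
            value = null_values[field]
        prepared[field] = value
    return prepared  # _source itself would still contain null


null_values = {"name": "javaboy_null"}
print(prepare_for_indexing({"name": None, "age": 99}, null_values))
# {'name': 'javaboy_null', 'age': 99}
print(prepare_for_indexing({"age": 99}, null_values))  # {'age': 99}: a missing field is not replaced
```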

1.18 position_increment_gap

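Analyzed text fields record term positions so that proximity and phrase queries work. When a text field with multiple values is indexed, a fake gap is inserted between the values, so that phrase queries do not accidentally match across two values. The size of the gap is controlled by position_increment_gap, which defaults to 100.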

```
PUT users

PUT users/_doc/1
{ "name": ["zhang san", "li si"] }

GET users/_search
{
  "query": {
    "match_phrase": {
      "name": {
        "query": "san li"
      }
    }
  }
}
```

The phrase "san li" is not found, because there is a fake gap of 100 positions between the two values.

```
GET users/_search
{
  "query": {
    "match_phrase": {
      "name": {
        "query": "san li",
        "slop": 101
      }
    }
  }
}
```

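slop specifies how far apart the phrase terms are allowed to be. Once it is large enough to bridge the gap, the phrase matches.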

The gap can also be set when defining the index. With a gap of 0, the phrase "san li" matches without any slop:

```
PUT users
{
  "mappings": {
    "properties": {
      "name": {
        "type": "text",
        "position_increment_gap": 0
      }
    }
  }
}

PUT users/_doc/1
{ "name": ["zhang san", "li si"] }

GET users/_search
{
  "query": {
    "match_phrase": {
      "name": {
        "query": "san li"
      }
    }
  }
}
```
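Here is a toy model of the gap. It assumes that the first token of the next value lands gap + 1 positions after the last token of the previous value. The phrase check is a simplified stand-in for Lucene's sloppy phrase matching: it treats slop as the allowed distance from being adjacent. Both helpers are hypothetical.

```python
def positions_for(values, gap):
    """Toy: positions of whitespace tokens in a multi-valued text field."""
    positions = {}
    pos = -1
    for i, value in enumerate(values):
        if i > 0:
            pos += gap  # fake gap between two values
        for token in value.split():
            pos += 1
            positions.setdefault(token, []).append(pos)
    return positions


def phrase_matches(positions, first, second, slop=0):
    """Simplified two-term phrase check: second should follow first within slop."""
    return any(
        abs(p2 - p1 - 1) <= slop
        for p1 in positions.get(first, [])
        for p2 in positions.get(second, [])
    )


name = ["zhang san", "li si"]
default_gap = positions_for(name, 100)
print(default_gap)  # {'zhang': [0], 'san': [1], 'li': [102], 'si': [103]}
print(phrase_matches(default_gap, "san", "li"))             # False
print(phrase_matches(default_gap, "san", "li", slop=101))   # True
print(phrase_matches(positions_for(name, 0), "san", "li"))  # True
```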

1.19 properties

1.20 similarity

similarity specifies the scoring model used for a field. ES has offered three built-in models:

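| similarity | Notes |
| --- | --- |
| BM25 | The default scoring model in ES and Lucene |
| classic | TF/IDF scoring; deprecated in ES 6 and removed in ES 7 |
| boolean | Boolean model: a matching term scores only its query boost |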

1.21 store

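By default, a field is indexed and searchable but not stored separately. Its value is still available because _source keeps a copy of the original document. To store a field separately, set store to true.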

1.22 term_vectors

term_vectors holds information produced by the analyzer, including:

- the list of terms
- the position of each term
- the start and end character offsets that map each term back to the original string

Possible values of term_vectors:

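| Value | Notes |
| --- | --- |
| no | Nothing is stored (the default) |
| yes | Only the terms are stored |
| with_positions | Terms plus positions |
| with_offsets | Terms plus character offsets |
| with_positions_offsets | Terms, positions, and character offsets |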
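Even with no, the _termvectors API used in 1.1 still works, because ES can compute term vectors on the fly. The option only decides what is stored ahead of time. The toy function below filters analyzed tokens down to what each option keeps, using an output shape that imitates the _termvectors response. The OPTIONS table is a hypothetical summary of the table above.

```python
# Which per-token details each term_vectors option keeps (None: nothing stored).
OPTIONS = {
    "no": None,
    "yes": (),
    "with_positions": ("position",),
    "with_offsets": ("start_offset", "end_offset"),
    "with_positions_offsets": ("position", "start_offset", "end_offset"),
}


def stored_term_vector(tokens, option):
    """Toy: filter analyzed tokens down to what the option would store."""
    keys = OPTIONS[option]
    if keys is None:
        return None
    vector = {}
    for token in tokens:
        entry = vector.setdefault(token["term"], {"term_freq": 0})
        entry["term_freq"] += 1
        if keys:
            entry.setdefault("tokens", []).append({k: token[k] for k in keys})
    return vector


tokens = [
    {"term": "java", "position": 0, "start_offset": 0, "end_offset": 4},
    {"term": "boy", "position": 1, "start_offset": 5, "end_offset": 8},
]
print(stored_term_vector(tokens, "yes"))           # terms only
print(stored_term_vector(tokens, "with_offsets"))  # terms plus character offsets
```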

1.23 fields

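fields lets the same field be indexed in several different ways. In the example below, title is analyzed as text, while title.raw indexes the same value as a keyword for exact matching: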

```
PUT blog
{
  "mappings": {
    "properties": {
      "title": {
        "type": "text",
        "fields": {
          "raw": {
            "type": "keyword"
          }
        }
      }
    }
  }
}

PUT blog/_doc/1
{ "title": "javaboy" }

GET blog/_search
{
  "query": {
    "term": {
      "title.raw": "javaboy"
    }
  }
}
```
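Here is a toy model of a multi-field. The same value yields analyzed terms for title and one untouched term for title.raw. Lowercasing plus whitespace splitting is an assumed, rough stand-in for the standard analyzer, and the sample value and helpers are hypothetical.

```python
def index_title(value):
    """Toy multi-field: title is analyzed, title.raw keeps the exact value."""
    return {
        "title": value.lower().split(),  # rough stand-in for the standard analyzer
        "title.raw": [value],            # keyword: one untouched term
    }


def term_query(indexed, field, term):
    """A term query looks the term up as-is, with no analysis."""
    return term in indexed[field]


doc = index_title("Hello JavaBoy")
print(term_query(doc, "title", "javaboy"))            # True
print(term_query(doc, "title.raw", "javaboy"))        # False
print(term_query(doc, "title.raw", "Hello JavaBoy"))  # True
```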

https://www.elastic.co/guide/...

Finally, Song Ge has also collected 50+ project requirement documents. If you want a project to practice on, take a look!

Requirement documents: https://github.com/lenve/javadoc