Elasticsearch Study Notes (25): Elasticsearch Mapping in Detail and Index Internals

First, a brief description of what a mapping is.
When we insert a few documents, Elasticsearch (ES) automatically creates the index for us:

PUT /website/_doc/1
{
  "post_date": "2017-01-01",
  "title": "my first article",
  "content": "this is my first article in this website",
  "author_id": 11400
}

PUT /website/_doc/2
{
  "post_date": "2017-01-02",
  "title": "my second article",
  "content": "this is my second article in this website",
  "author_id": 11400
}

PUT /website/_doc/3
{
  "post_date": "2017-01-03",
  "title": "my third article",
  "content": "this is my third article in this website",
  "author_id": 11400
}

View the mapping:

GET /website/_mapping

{
  "website" : {
    "mappings" : {
      "properties" : {
        "author_id" : {
          "type" : "long"
        },
        "content" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },
        "post_date" : {
          "type" : "date"
        },
        "title" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        }
      }
    }
  }
}

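The mapping above was generated automatically when the data was inserted; a mapping can also be created manually. This data structure and its related configuration, created automatically or manually for the type in an index, is called a mapping.
Below is a manually created mapping.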

PUT /test_mapping
{
  "mappings" : {
    "properties" : {
      "author_id" : {
        "type" : "long"
      },
      "content" : {
        "type" : "text",
        "fields" : {
          "keyword" : {
            "type" : "keyword",
            "ignore_above" : 256
          }
        }
      },
      "post_date" : {
        "type" : "date"
      },
      "title" : {
        "type" : "text",
        "fields" : {
          "keyword" : {
            "type" : "keyword",
            "ignore_above" : 256
          }
        }
      }
    }
  }
}

1. Exact match vs. full-text search

(1) exact value

The whole value of a field must match for the corresponding document to be returned.
Example:

GET /website/_search?q=post_date:2017

{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 0,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  }
}

GET /website/_search?q=post_date:2017-01-01

{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "website",
        "_type" : "doc",
        "_id" : "1",
        "_score" : 1.0,
        "_source" : {
          "post_date" : "2017-01-01",
          "title" : "my first article",
          "content" : "this is my first article in this website",
          "author_id" : 11400
        }
      }
    ]
  }
}

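(2) full text

Full text differs from exact value: instead of matching only one complete value, the value is split into words (tokenized) and matched word by word, and matches can also be found through abbreviations, tenses, letter case, synonyms, and so on.
Example: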

GET /website/_search?q=title:article

{
  "took" : 7,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 3,
      "relation" : "eq"
    },
    "max_score" : 0.087011375,
    "hits" : [
      {
        "_index" : "website",
        "_type" : "doc",
        "_id" : "1",
        "_score" : 0.087011375,
        "_source" : {
          "post_date" : "2017-01-01",
          "title" : "my first article",
          "content" : "this is my first article in this website",
          "author_id" : 11400
        }
      },
      {
        "_index" : "website",
        "_type" : "doc",
        "_id" : "2",
        "_score" : 0.087011375,
        "_source" : {
          "post_date" : "2017-01-02",
          "title" : "my second article",
          "content" : "this is my second in this website",
          "author_id" : 11400
        }
      },
      {
        "_index" : "website",
        "_type" : "doc",
        "_id" : "3",
        "_score" : 0.087011375,
        "_source" : {
          "post_date" : "2017-01-03",
          "title" : "my third article",
          "content" : "this is my third in this website",
          "author_id" : 11400
        }
      }
    ]
  }
}

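2. Core principles of the inverted index

The following walks through a simplified construction of an inverted index; in practice the process is far more complex.
doc1: I really liked my small dogs, and I think my mom also liked them.
doc2: He never liked any dogs, so I hope that my mom will not expect me to liked him.

Tokenize the text and build a first version of the inverted index: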

word     doc1   doc2
I        *      *
really   *
liked    *      *
my       *      *
small    *
dogs     *      *
and      *
think    *
mom      *      *
also     *
them     *
He              *
never           *
any             *
so              *
hope            *
that            *
will            *
not             *
expect          *
me              *
to              *
him             *
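As a rough illustration of this first step, here is a minimal Python sketch that tokenizes the two documents and records which document each word appears in. The tokenizer (a simple letters-only regex) and the names `tokenize`, `build_inverted_index`, and `search` are assumptions made for this example; they are not how ES implements analysis internally.

```python
import re
from collections import defaultdict

docs = {
    "doc1": "I really liked my small dogs, and I think my mom also liked them.",
    "doc2": "He never liked any dogs, so I hope that my mom will not expect me to liked him.",
}

def tokenize(text):
    # Split on anything that is not a letter; keep the original casing.
    return re.findall(r"[A-Za-z]+", text)

def build_inverted_index(docs):
    # Map each word to the set of document ids that contain it.
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in tokenize(text):
            index[token].add(doc_id)
    return index

def search(index, query):
    # Return every document that contains at least one query word verbatim.
    hits = set()
    for term in tokenize(query):
        hits |= index.get(term, set())
    return hits

index = build_inverted_index(docs)
no_hits = search(index, "mother like little dog")
```

Because the words are indexed verbatim, none of mother, like, little, or dog is a key in the index, so `no_hits` is an empty set.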

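Searching this index for "mother like little dog" returns no results, because none of the query words appears in it:
mother
like
little
dog
This is certainly not what we want. For example, mother and mom mean the same thing, yet the documents cannot be retrieved. ES can find them when the field's analyzer is configured for it: while building the inverted index, ES also processes each token produced by tokenization, to raise the chance that later searches find related documents. Such processing includes tense conversion, singular/plural conversion, synonym conversion, and case conversion, and is called normalization. Note that the default standard analyzer only lowercases tokens; stemming and synonyms require an analyzer or token filters set up for them.
mother -> mom
liked -> like
small -> little
dogs -> dog
The inverted index is then rebuilt as follows: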

word     doc1   doc2
I        *      *
really   *
like     *      *
my       *      *
little   *
dog      *      *
and      *
think    *
mom      *      *
also     *
them     *
He              *
never           *
any             *
so              *
hope            *
that            *
will            *
not             *
expect          *
me              *
to              *
him             *

Query: mother like little dog, after tokenization and normalization:
mother -> mom
like -> like
little -> little
dog -> dog
Both doc1 and doc2 are returned:
doc1: I really liked my small dogs, and I think my mom also liked them.
doc2: He never liked any dogs, so I hope that my mom will not expect me to liked him.
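The effect of normalization can be sketched in Python as well. The `NORMALIZE` table below is a hypothetical stand-in built from the four rewrites listed above; a real analyzer would use stemmers and synonym filters rather than a hand-written dictionary. The sketch also lowercases every token, which is the case-conversion part of normalization, and it applies the same `analyze` function at index time and at query time.

```python
import re
from collections import defaultdict

docs = {
    "doc1": "I really liked my small dogs, and I think my mom also liked them.",
    "doc2": "He never liked any dogs, so I hope that my mom will not expect me to liked him.",
}

# Hypothetical normalization table, taken from the rewrites in the text.
NORMALIZE = {"mother": "mom", "liked": "like", "small": "little", "dogs": "dog"}

def analyze(text):
    # Lowercase, split into words, then map each word to its normalized form.
    tokens = re.findall(r"[a-z]+", text.lower())
    return [NORMALIZE.get(token, token) for token in tokens]

def build_inverted_index(docs):
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in analyze(text):
            index[term].add(doc_id)
    return index

def search(index, query):
    # The query goes through the same analysis as the indexed text.
    hits = set()
    for term in analyze(query):
        hits |= index.get(term, set())
    return hits

index = build_inverted_index(docs)
result = search(index, "mother like little dog")
```

Here `result` contains both doc1 and doc2: the query terms become mom, like, little, and dog, and each document contains at least one of them in normalized form.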

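3. Further summary of mapping

(1) When data is inserted directly into ES, ES automatically creates the index, along with the type and its mapping.
(2) The mapping automatically defines the data type of each field.
(3) Depending on the data type (for example text vs. date), some fields are exact value and some are full text.
(4) For an exact value, the whole value is put into the inverted index as a single term, with no tokenization; full text goes through tokenization and normalization (tense, synonym, and case conversion) before it is added to the inverted index.
(5) At search time, whether a field is exact value or full text also determines how it is searched, and that behavior stays consistent with how the inverted index was built: an exact value field is matched against the whole value, while a full text query is tokenized and normalized before the inverted index is looked up.
(6) You can rely on ES dynamic mapping to create the mapping automatically, including the data types; or you can create the mapping for the index and type manually in advance and configure each field yourself, including its data type, indexing behavior, analyzer, and so on.

In essence, a mapping is the metadata of the type in an index; it determines the data types, how the inverted index is built, and how searches behave.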
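The difference described in points (4) and (5) can be sketched in a few lines of Python. This is a simplified model, not ES code: `index_terms` and `matches` are hypothetical helpers, `keyword` stands in for an exact value field and `text` for a full text field, and the "analysis" is just lowercasing plus splitting into words.

```python
import re

def index_terms(value, field_type):
    # Exact value: the whole value becomes a single term.
    if field_type == "keyword":
        return [value]
    # Full text: the value is lowercased and split into separate terms.
    if field_type == "text":
        return re.findall(r"[a-z0-9]+", value.lower())
    raise ValueError(f"unsupported field type: {field_type}")

def matches(query, value, field_type):
    # The query is processed the same way the field was indexed.
    indexed = set(index_terms(value, field_type))
    return any(term in indexed for term in index_terms(query, field_type))

title = "my first article"
text_hit = matches("article", title, "text")
keyword_miss = matches("article", title, "keyword")
keyword_hit = matches("my first article", title, "keyword")
```

Searching for article finds the title when it is treated as full text, but an exact value field only matches the complete string my first article.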

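4. Core mapping data types and dynamic mapping

(1) Core data types
string / text: string type (string before ES 5.x, text since)
byte: byte
short: short integer
integer: integer
long: long integer
float: floating point
boolean: boolean
date: date/time
There are also more complex cases such as arrays and object. An array has no dedicated type: it takes the type of its elements. An object is mapped through the properties of its inner fields.
(2) dynamic mapping
true or false -> boolean
123 -> long
123.45 -> float
2017-01-01 -> date
"hello world" -> string / text
(3) View the mapping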

GET /{index}/_mapping

GET /test/_mapping

{
  "test" : {
    "mappings" : {
      "properties" : {
        "field1" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },
        "field2" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        }
      }
    }
  }
}
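The dynamic mapping rules listed in (2) can be approximated with a small Python function. This is only a sketch under simplifying assumptions: `infer_type` and `infer_mapping` are hypothetical names, date detection here accepts only the yyyy-MM-dd form used in these notes, and the keyword sub-field that ES adds to dynamically mapped strings is left out.

```python
from datetime import datetime

def infer_type(value):
    # bool must be checked before int: in Python, bool is a subclass of int.
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "float"
    if isinstance(value, str):
        try:
            # Assumption: only dates written as yyyy-MM-dd are detected.
            datetime.strptime(value, "%Y-%m-%d")
            return "date"
        except ValueError:
            return "text"
    raise TypeError(f"no rule for {type(value).__name__}")

def infer_mapping(doc):
    # Build a mapping-like dict with one inferred type per top-level field.
    return {"properties": {name: {"type": infer_type(v)} for name, v in doc.items()}}

mapping = infer_mapping({
    "post_date": "2017-01-01",
    "title": "my first article",
    "author_id": 11400,
})
```

For the sample document the inferred types are date, text, and long, which agrees with the types in the automatically generated mapping for the website index shown earlier.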

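5. Creating and modifying a mapping manually, and controlling whether string fields are analyzed

Note: you can define a mapping manually only when creating the index, or add a mapping for a new field; you cannot update the mapping of an existing field.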

# Create the index
PUT /website
{
  "mappings": {
    "properties": {
      "author_id": {
        "type": "long"
      },
      "title": {
        "type": "text",
        "analyzer": "standard"
      },
      "content": {
        "type": "text"
      },
      "post_date": {
        "type": "date"
      },
      "publisher_id": {
        "type": "text",
        "index": false
      }
    }
  }
}

# Try to modify the mapping of an existing field (fails)
PUT /website
{
  "mappings": {
    "properties": {
      "author_id": {
        "type": "text"
      }
    }
  }
}

{
  "error": {
    "root_cause": [
      {
        "type": "resource_already_exists_exception",
        "reason": "index [website/5xLohnJITHqCwRYInmBFmA] already exists",
        "index_uuid": "5xLohnJITHqCwRYInmBFmA",
        "index": "website"
      }
    ],
    "type": "resource_already_exists_exception",
    "reason": "index [website/5xLohnJITHqCwRYInmBFmA] already exists",
    "index_uuid": "5xLohnJITHqCwRYInmBFmA",
    "index": "website"
  },
  "status": 400
}

# Add a new field to the mapping
PUT /website/_mapping
{
  "properties": {
    "new_field": {
      "type": "text"
    }
  }
}

{
  "acknowledged" : true
}
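The rule in the note (new fields may be added, existing fields may not be changed) can be modeled with a short Python sketch. `put_mapping` is a hypothetical function written for this illustration, not part of any ES client, and it is stricter than real Elasticsearch, which does allow a small number of mapping parameters to be updated on existing fields.

```python
def put_mapping(existing, update):
    # Return a new mapping with the update applied; never mutate the input.
    merged = dict(existing)
    for field, definition in update.items():
        if field in merged and merged[field] != definition:
            # Simplified rule from the note: existing fields are immutable.
            raise ValueError(f"cannot change mapping of existing field [{field}]")
        merged[field] = definition
    return merged

website = {
    "author_id": {"type": "long"},
    "title": {"type": "text", "analyzer": "standard"},
}

# Adding a new field succeeds.
website = put_mapping(website, {"new_field": {"type": "text"}})

# Changing the type of an existing field is rejected.
try:
    put_mapping(website, {"author_id": {"type": "text"}})
    rejected = False
except ValueError:
    rejected = True
```

If the type of an existing field really has to change, the usual approach is to create a new index with the desired mapping and reindex the data into it.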

6. Complex mapping types and the underlying structure of object data

(1) multivalue field

{
  "tags": ["tag1", "tag2"]
}

(2) empty field
null, []
(3) object field

PUT /test/_create/1
{
  "address": {
    "country": "china",
    "province": "guangdong",
    "city": "guangzhou"
  },
  "name": "jack",
  "age": 27,
  "join_date": "2017-01-01"
}

GET /test/_mapping

{
  "test" : {
    "mappings" : {
      "properties" : {
        "address" : {
          "properties" : {
            "city" : {
              "type" : "text",
              "fields" : {
                "keyword" : {
                  "type" : "keyword",
                  "ignore_above" : 256
                }
              }
            },
            "country" : {
              "type" : "text",
              "fields" : {
                "keyword" : {
                  "type" : "keyword",
                  "ignore_above" : 256
                }
              }
            },
            "province" : {
              "type" : "text",
              "fields" : {
                "keyword" : {
                  "type" : "keyword",
                  "ignore_above" : 256
                }
              }
            }
          }
        },
        "age" : {
          "type" : "long"
        },
        "join_date" : {
          "type" : "date"
        },
        "name" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        }
      }
    }
  }
}

GET /test/_doc/1

{
  "_index" : "test",
  "_type" : "_doc",
  "_id" : "1",
  "_version" : 1,
  "_seq_no" : 0,
  "_primary_term" : 1,
  "found" : true,
  "_source" : {
    "address" : {
      "country" : "china",
      "province" : "guangdong",
      "city" : "guangzhou"
    },
    "name" : "jack",
    "age" : 27,
    "join_date" : "2017-01-01"
  }
}

Note: a multivalue field is indexed in the same way as a single string value, and the values in it must not mix data types.
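As for the underlying structure of object data: Elasticsearch indexes an inner object as flat key-value pairs whose keys are dotted paths such as address.country. The Python sketch below illustrates that idea; `flatten` is a hypothetical helper, and wrapping every value in a list is a simplification meant to show that single values and multivalue fields end up in the same shape.

```python
def flatten(obj, prefix=""):
    # Turn nested objects into a flat dict keyed by dotted field paths.
    flat = {}
    for key, value in obj.items():
        path = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=path + "."))
        else:
            # Store every field as a list of values, single-valued or not.
            flat[path] = value if isinstance(value, list) else [value]
    return flat

doc = {
    "address": {"country": "china", "province": "guangdong", "city": "guangzhou"},
    "name": "jack",
    "age": 27,
    "join_date": "2017-01-01",
    "tags": ["tag1", "tag2"],
}

flat = flatten(doc)
```

After flattening, the address object is gone as a unit: the index only sees address.country, address.province, and address.city next to the top-level fields.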