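In the previous article we looked at parent-child documents built with the Join type. Today we continue with nested documents, which are also the approach Elasticsearch recommends. First, let's look at the following request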

PUT word_document/_doc/1
{
  "title": "up",
  "user": [
    { "name": "honghong", "sex": "female", "numberOfLikes": 500 },
    { "name": "mingming", "sex": "male", "numberOfLikes": 50 },
    { "name": "lanlan", "sex": "male", "numberOfLikes": 100 }
  ]
}

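For data in the format above, user is an array of nested objects. So how is user stored in Elasticsearch? And if we want to search on the nested child objects, how do we get back exactly the data we need? Let's dig into the Nested data type together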

Environment

  • macOS 10.14.6
  • Elasticsearch 8.1
  • Kibana 8.1

Nested

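To start, let's understand what the Nested type is. It means exactly what the name says: Nested is nesting, that is, the kind of data held by user at the beginning of this article, so it can be seen as a special kind of Object type. Let's keep using the data from the beginning of the article as our example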

PUT word_document/_doc/1
{
  "title": "up",
  "user": [
    { "name": "honghong", "sex": "female", "numberOfLikes": 500 },
    { "name": "mingming", "sex": "male", "numberOfLikes": 50 },
    { "name": "lanlan", "sex": "male", "numberOfLikes": 100 }
  ]
}

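If we have not explicitly defined the field types for the word_document index, Elasticsearch infers them by default when the request above is executed. Internally the content is flattened into roughly the following form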

{
  "title": "up",
  "user.name": ["honghong", "mingming", "lanlan"],
  "user.sex": ["male", "male", "female"],
  "user.numberOfLikes": [500, 50, 100]
}
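The flattening step can be sketched in plain Python. This is only a toy model of the idea, not Elasticsearch's implementation; the helper name `flatten` is made up for illustration.

```python
from collections import defaultdict

def flatten(doc, prefix=""):
    """Flatten objects and arrays of objects into dotted, multi-valued fields."""
    flat = defaultdict(list)
    for key, value in doc.items():
        path = f"{prefix}{key}"
        items = value if isinstance(value, list) else [value]
        for item in items:
            if isinstance(item, dict):
                # Recurse into the object and merge its values under "path.subfield"
                for sub_path, sub_values in flatten(item, path + ".").items():
                    flat[sub_path].extend(sub_values)
            else:
                flat[path].append(item)
    return dict(flat)

doc = {
    "title": "up",
    "user": [
        {"name": "honghong", "sex": "female", "numberOfLikes": 500},
        {"name": "mingming", "sex": "male", "numberOfLikes": 50},
        {"name": "lanlan", "sex": "male", "numberOfLikes": 100},
    ],
}

flat_doc = flatten(doc)
print(flat_doc["user.name"])
print(flat_doc["user.sex"])
```

After flattening, nothing records which name belonged to which sex; each field is just a bag of values.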

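As you have probably noticed, once Elasticsearch converts the document into the structure above, our search results are affected. Suppose we run the following query, looking for name = honghong and sex = male. We expect no matching document, but because Elasticsearch flattened the data, the association between each name and its sex is lost and we get a false match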

GET word_document/_search
{
  "query": {
    "bool": {
      "must": [
        { "match": { "user.name": "honghong" }},
        { "match": { "user.sex": "male" }}
      ]
    }
  }
}

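How do we avoid this? By using the Nested data type that Elasticsearch provides. The Nested type keeps each nested object independent, which means each nested object can be searched on its own, so the situation above no longer occurs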
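The difference between the two behaviours can be sketched in a few lines of Python. This is a simplified model that uses exact equality instead of analyzed match queries, and the function names are invented for illustration.

```python
users = [
    {"name": "honghong", "sex": "female"},
    {"name": "mingming", "sex": "male"},
    {"name": "lanlan", "sex": "male"},
]

def matches_flattened(objects, conditions):
    """Object-array behaviour: each condition may be met by a different object."""
    return all(
        any(obj.get(field) == wanted for obj in objects)
        for field, wanted in conditions.items()
    )

def matches_nested(objects, conditions):
    """Nested behaviour: one single object has to meet every condition."""
    return any(
        all(obj.get(field) == wanted for field, wanted in conditions.items())
        for obj in objects
    )

wanted = {"name": "honghong", "sex": "male"}
flat_hit = matches_flattened(users, wanted)   # false match
nested_hit = matches_nested(users, wanted)    # no match, as expected
print(flat_hit, nested_hit)
```

Swapping the order of `all` and `any` is the whole difference: the flattened form checks each condition against the pooled values, while the nested form checks all conditions against one object at a time.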

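  • We again use the document above as the example, but this time we create the index first and map the user field as nested (if the index already exists from the earlier request, delete it first with DELETE word_document)

    PUT word_document
    {
      "mappings": {
        "properties": {
          "title": { "type": "keyword" },
          "user": {
            "type": "nested",
            "properties": {
              "numberOfLikes": { "type": "integer" }
            }
          }
        }
      }
    }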
  • Next we add our test data so that we can verify the search requests

    PUT word_document/_doc/1
    {
      "title": "up",
      "user": [
        { "name": "honghong", "sex": "female", "numberOfLikes": 500 },
        { "name": "mingming", "sex": "male", "numberOfLikes": 50 },
        { "name": "lanlan", "sex": "male", "numberOfLikes": 100 }
      ]
    }

    PUT word_document/_doc/2
    {
      "title": "up",
      "user": [
        { "name": "honghong", "sex": "female", "numberOfLikes": 20 },
        { "name": "mingming", "sex": "male", "numberOfLikes": 30 },
        { "name": "lanlan", "sex": "male", "numberOfLikes": 50 }
      ]
    }

    PUT word_document/_doc/3
    {
      "title": "up",
      "user": [
        { "name": "honghong", "sex": "female", "numberOfLikes": 50 },
        { "name": "mingming", "sex": "male", "numberOfLikes": 50 },
        { "name": "lanlan", "sex": "male", "numberOfLikes": 50 }
      ]
    }
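  • Now we run the same search conditions as before (name = honghong and sex = male), this time wrapped in a nested query. No document matches and the result is empty

    GET word_document/_search
    {
      "query": {
        "nested": {
          "path": "user",
          "query": {
            "bool": {
              "must": [
                { "match": { "user.name": "honghong" }},
                { "match": { "user.sex": "male" }}
              ]
            }
          }
        }
      }
    }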
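  • So how should nested documents be queried? By using the nested query type; an ordinary query will not find them. The nested query is shown below. This time name = honghong and sex = female belong to the same nested object, so the documents we expect are returned, and inner_hits shows which nested objects matched

    GET word_document/_search
    {
      "query": {
        "nested": {
          "path": "user",
          "query": {
            "bool": {
              "must": [
                { "match": { "user.name": "honghong" }},
                { "match": { "user.sex": "female" }}
              ]
            }
          },
          "inner_hits": {
            "highlight": {
              "fields": {
                "user.name": {}
              }
            }
          }
        }
      }
    }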
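To make the role of inner_hits more concrete, here is a small Python sketch that reports which nested objects of one document satisfy the conditions, together with their position in the array. It is a hypothetical model: the function name and the shape of the returned dictionaries are assumptions for illustration and do not reproduce the real response format.

```python
users_doc_1 = [
    {"name": "honghong", "sex": "female", "numberOfLikes": 500},
    {"name": "mingming", "sex": "male", "numberOfLikes": 50},
    {"name": "lanlan", "sex": "male", "numberOfLikes": 100},
]

def nested_inner_hits(objects, conditions):
    """Return every nested object that meets all conditions, with its array offset."""
    return [
        {"offset": offset, "source": obj}
        for offset, obj in enumerate(objects)
        if all(obj.get(field) == wanted for field, wanted in conditions.items())
    ]

inner = nested_inner_hits(users_doc_1, {"name": "honghong", "sex": "female"})
print(inner)
```

Only the first object (offset 0) is reported, because it is the only one where both conditions hold at once.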
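  • We can also sort by a field inside the nested objects. In ascending order the smallest value among a document's nested objects is used as the sort value; in descending order the largest value is used

    GET word_document/_search
    {
      "query": {
        "nested": {
          "path": "user",
          "query": {
            "match": {
              "user.sex": "male"
            }
          }
        }
      },
      "sort": [
        {
          "user.numberOfLikes": {
            "order": "asc",
            "nested": {
              "path": "user"
            }
          }
        }
      ]
    }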

    The response is shown below. Note that the nested sort here has no filter of its own, so it considers every nested object in a document, not only the male users matched by the query. That is why document 2 is sorted by 20

    {
      "took" : 101,
      "timed_out" : false,
      "_shards" : { "total" : 1, "successful" : 1, "skipped" : 0, "failed" : 0 },
      "hits" : {
        "total" : { "value" : 3, "relation" : "eq" },
        "max_score" : null,
        "hits" : [
          {
            "_index" : "word_document",
            "_id" : "2",
            "_score" : null,
            "_source" : {
              "title" : "up",
              "user" : [
                { "name" : "honghong", "sex" : "female", "numberOfLikes" : 20 },
                { "name" : "mingming", "sex" : "male", "numberOfLikes" : 30 },
                { "name" : "lanlan", "sex" : "male", "numberOfLikes" : 50 }
              ]
            },
            "sort" : [ 20 ]
          },
          {
            "_index" : "word_document",
            "_id" : "1",
            "_score" : null,
            "_source" : {
              "title" : "up",
              "user" : [
                { "name" : "honghong", "sex" : "female", "numberOfLikes" : 500 },
                { "name" : "mingming", "sex" : "male", "numberOfLikes" : 50 },
                { "name" : "lanlan", "sex" : "male", "numberOfLikes" : 100 }
              ]
            },
            "sort" : [ 50 ]
          },
          {
            "_index" : "word_document",
            "_id" : "3",
            "_score" : null,
            "_source" : {
              "title" : "up",
              "user" : [
                { "name" : "honghong", "sex" : "female", "numberOfLikes" : 50 },
                { "name" : "mingming", "sex" : "male", "numberOfLikes" : 50 },
                { "name" : "lanlan", "sex" : "male", "numberOfLikes" : 50 }
              ]
            },
            "sort" : [ 50 ]
          }
        ]
      }
    }
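The choice of sort value can be reproduced with a short Python model of the three test documents. This is a sketch under assumptions: `nested_sort` is a made-up helper, it looks at all nested objects of each document, and the order of ties simply follows Python's stable sort rather than any guarantee made by Elasticsearch.

```python
# (sex, numberOfLikes) of every nested user, keyed by document id
docs = {
    "1": [("female", 500), ("male", 50), ("male", 100)],
    "2": [("female", 20), ("male", 30), ("male", 50)],
    "3": [("female", 50), ("male", 50), ("male", 50)],
}

def nested_sort(docs, order="asc"):
    """Sort documents by min (asc) or max (desc) of their nested numberOfLikes."""
    pick = min if order == "asc" else max
    keyed = [(pick(likes for _, likes in users), doc_id) for doc_id, users in docs.items()]
    return sorted(keyed, key=lambda pair: pair[0], reverse=(order == "desc"))

ascending = nested_sort(docs, "asc")
descending = nested_sort(docs, "desc")
print(ascending)
print(descending)
```

The ascending result uses the sort values 20, 50 and 50, the same values that appear in the sort arrays of the response above.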
  • We can also run aggregations on nested objects. Below we select all documents in the index that contain a user with user.name = honghong and user.sex = female, and compute the minimum of numberOfLikes

    GET word_document/_search
    {
      "query": {
        "nested": {
          "path": "user",
          "query": {
            "bool": {
              "must": [
                { "match": { "user.name": "honghong" }},
                { "match": { "user.sex": "female" }}
              ]
            }
          }
        }
      },
      "aggs": {
        "my_min_value": {
          "nested": {
            "path": "user"
          },
          "aggs": {
            "min_value": {
              "min": {
                "field": "user.numberOfLikes"
              }
            }
          }
        }
      }
    }
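  • The aggregation above only filters at the level of the outer (parent) documents. Suppose instead we need the minimum over only those nested user objects with sex = male. In that case we can use a filter aggregation, as shown below. The request first selects the documents with title = up, then computes the minimum numberOfLikes among nested objects with user.sex = male (and, for comparison, the unfiltered minimum)

    GET /word_document/_search?size=0
    {
      "query": {
        "match": {
          "title": "up"
        }
      },
      "aggs": {
        "my_user": {
          "nested": {
            "path": "user"
          },
          "aggs": {
            "filter_my_user": {
              "filter": {
                "bool": {
                  "filter": [
                    { "match": { "user.sex": "male" }}
                  ]
                }
              },
              "aggs": {
                "min_price": {
                  "min": {
                    "field": "user.numberOfLikes"
                  }
                }
              }
            },
            "no_filter_my_user": {
              "min": {
                "field": "user.numberOfLikes"
              }
            }
          }
        }
      }
    }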
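What the two sub-aggregations compute can be checked by hand with a small Python model of the test data. It is a sketch, not the aggregation framework itself, and `nested_min` is a hypothetical helper name.

```python
# (sex, numberOfLikes) for the nested users of documents 1, 2 and 3
docs = [
    [("female", 500), ("male", 50), ("male", 100)],
    [("female", 20), ("male", 30), ("male", 50)],
    [("female", 50), ("male", 50), ("male", 50)],
]

def nested_min(docs, sex=None):
    """Minimum numberOfLikes over all nested objects, optionally filtered by sex."""
    values = [
        likes
        for users in docs
        for user_sex, likes in users
        if sex is None or user_sex == sex
    ]
    return min(values)

unfiltered_min = nested_min(docs)        # corresponds to no_filter_my_user
male_min = nested_min(docs, sex="male")  # corresponds to filter_my_user
print(unfiltered_min, male_min)
```

With this data the unfiltered minimum is 20 (a female user in document 2), while the minimum among male users is 30.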
  • Finally there is the reverse nested aggregation, which goes from the nested objects back to their parent documents so that we can aggregate on parent document fields

    First we create another index and add a few documents for testing

    PUT /issues
    {
      "mappings": {
        "properties": {
          "tags": { "type": "keyword" },
          "comments": {
            "type": "nested",
            "properties": {
              "username": { "type": "keyword" },
              "comment": { "type": "text" }
            }
          }
        }
      }
    }

    PUT /issues/_doc/1
    {
      "tags": "跳舞",
      "comments": [
        { "username": "小李", "comment": "小李想学跳舞" },
        { "username": "小红", "comment": "小红跳舞很有天才" }
      ]
    }

    PUT /issues/_doc/2
    {
      "tags": "唱歌",
      "comments": [
        { "username": "小李", "comment": "小李会唱歌" },
        { "username": "小李", "comment": "小李唱歌有天才" },
        { "username": "小红", "comment": "小红是歌手" }
      ]
    }

    PUT /issues/_doc/3
    {
      "tags": "跳舞",
      "comments": [
        { "username": "小红", "comment": "小红会跳舞" },
        { "username": "小红", "comment": "小红是舞神" }
      ]
    }

    PUT /issues/_doc/4
    {
      "tags": "唱歌",
      "comments": [
        { "username": "小李", "comment": "小李几乎就是天生歌手" }
      ]
    }

    PUT /issues/_doc/5
    {
      "tags": "跳舞",
      "comments": [
        { "username": "小红", "comment": "小红舞姿很美" }
      ]
    }

    Here issues are the topics being discussed, tags are their labels, username is the name of the commenter and comment is the comment text. In the test data, 跳舞 means dancing, 唱歌 means singing, and 小李 and 小红 are user names.

    Now we use a reverse nested aggregation to get back to the parent documents. The requirements are:

    1. First, aggregate to find the username with the most comments

    2. Then, for each username, aggregate to find the tag they commented on the most

    GET /issues/_search?size=0
    {
      "query": {
        "match_all": {}
      },
      "aggs": {
        "comments": {
          "nested": {
            "path": "comments"
          },
          "aggs": {
            "top_usernames": {
              "terms": {
                "field": "comments.username"
              },
              "aggs": {
                "comment_to_issue": {
                  "reverse_nested": {},
                  "aggs": {
                    "top_tags_per_comment": {
                      "terms": {
                        "field": "tags"
                      }
                    }
                  }
                }
              }
            }
          }
        }
      }
    }

    The result is shown below. The conclusion: 小红 commented the most, with 5 comments spread over 4 issues, and the tag she commented on the most is 跳舞, which appears on 3 of those issues. Keep in mind that the counts below reverse_nested are counts of parent documents, not of comments

    {
      "aggregations" : {
        "comments" : {
          "doc_count" : 9,
          "top_usernames" : {
            "doc_count_error_upper_bound" : 0,
            "sum_other_doc_count" : 0,
            "buckets" : [
              {
                "key" : "小红",
                "doc_count" : 5,
                "comment_to_issue" : {
                  "doc_count" : 4,
                  "top_tags_per_comment" : {
                    "doc_count_error_upper_bound" : 0,
                    "sum_other_doc_count" : 0,
                    "buckets" : [
                      { "key" : "跳舞", "doc_count" : 3 },
                      { "key" : "唱歌", "doc_count" : 1 }
                    ]
                  }
                }
              },
              {
                "key" : "小李",
                "doc_count" : 4,
                "comment_to_issue" : {
                  "doc_count" : 3,
                  "top_tags_per_comment" : {
                    "doc_count_error_upper_bound" : 0,
                    "sum_other_doc_count" : 0,
                    "buckets" : [
                      { "key" : "唱歌", "doc_count" : 2 },
                      { "key" : "跳舞", "doc_count" : 1 }
                    ]
                  }
                }
              }
            ]
          }
        }
      }
    }
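The numbers in this result can be re-derived with a plain-Python model of the five test issues. This is only a sketch of the counting logic; the variable names are invented for illustration.

```python
from collections import Counter, defaultdict

# issue id -> (tag, usernames of its comments)
issues = {
    1: ("跳舞", ["小李", "小红"]),
    2: ("唱歌", ["小李", "小李", "小红"]),
    3: ("跳舞", ["小红", "小红"]),
    4: ("唱歌", ["小李"]),
    5: ("跳舞", ["小红"]),
}

comment_counts = Counter()         # terms on comments.username: counts nested comments
issues_by_user = defaultdict(set)  # reverse_nested: distinct parent issues per username

for issue_id, (tag, usernames) in issues.items():
    for username in usernames:
        comment_counts[username] += 1
        issues_by_user[username].add(issue_id)

# terms on tags, evaluated on the parent issues of each username
tags_by_user = {
    username: Counter(issues[issue_id][0] for issue_id in issue_ids)
    for username, issue_ids in issues_by_user.items()
}

print(comment_counts.most_common())
print({user: len(ids) for user, ids in issues_by_user.items()})
print(tags_by_user)
```

Using a set for the parent issues is what models the jump back to the parent level: two comments by the same user on one issue count as a single issue.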

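Which parameters does Nested support

Nested is just a special kind of Object, and it supports the following parameters

  • dynamic: (optional) Whether fields that are not defined in the index mapping should be added dynamically to an existing nested object. The default is true (new fields are added); false and strict are also supported
  • properties: (optional) The mapping definitions of the fields inside the nested object
  • include_in_parent: (optional) Defaults to false. If true, the fields of the nested object are also added to the parent document as standard (flat) fields
  • include_in_root: (optional) Defaults to false. If true, the fields of the nested object are also added to the root document as standard (flat) fields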
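The three values of dynamic can be illustrated with a toy Python model. It is a sketch under assumptions: `index_nested_object` is a hypothetical helper that only tracks which field names are mapped, and ValueError stands in for the rejection that strict causes.

```python
def index_nested_object(mapped_fields, obj, dynamic="true"):
    """Return the mapped field names after indexing obj under a dynamic setting."""
    new_fields = set(obj) - set(mapped_fields)
    if not new_fields:
        return set(mapped_fields)
    if dynamic == "strict":
        # strict: an unknown field causes the document to be rejected
        raise ValueError(f"unknown fields under strict mapping: {sorted(new_fields)}")
    if dynamic == "false":
        # false: unknown fields are ignored by the mapping (they are not indexed)
        return set(mapped_fields)
    # true: unknown fields are added to the mapping
    return set(mapped_fields) | new_fields

mapped = {"name", "sex"}
new_user = {"name": "honghong", "sex": "female", "age": 18}  # "age" is not mapped

print(sorted(index_nested_object(mapped, new_user, "true")))
print(sorted(index_nested_object(mapped, new_user, "false")))
try:
    index_nested_object(mapped, new_user, "strict")
except ValueError as error:
    print(error)
```

The age field here is invented purely to trigger the unmapped-field case; it does not appear in the article's test data.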

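Limits on the Nested type

From what we have learned so far, we know that each object in a nested field is indexed as a separate Lucene document. A document with 100 nested objects therefore needs 101 Lucene documents: one for the parent document and one for each nested object. Because of this overhead, Elasticsearch limits nested usage through the following settings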

  • index.mapping.nested_fields.limit

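    The maximum number of distinct nested-type fields (nested mappings) allowed in one index. The default is 50. Each of our example indices above uses only one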

  • index.mapping.nested_objects.limit

    The maximum number of nested JSON objects that a single document can contain across all of its nested-type fields. The default is 10000
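The arithmetic behind these limits can be sketched in Python. This is a simplified model that assumes a single level of nesting (no nested fields inside nested fields); the helper names are hypothetical, and only the default value 10000 comes from the setting described above.

```python
NESTED_OBJECTS_LIMIT = 10000  # default of index.mapping.nested_objects.limit

def nested_object_count(doc, nested_fields):
    """Number of nested objects in one document, summed over its nested fields."""
    return sum(len(doc.get(field, [])) for field in nested_fields)

def lucene_doc_count(doc, nested_fields):
    """One Lucene document for the parent plus one per nested object."""
    return 1 + nested_object_count(doc, nested_fields)

def within_limit(doc, nested_fields, limit=NESTED_OBJECTS_LIMIT):
    """True if the document stays within the nested objects limit."""
    return nested_object_count(doc, nested_fields) <= limit

doc = {"title": "up", "user": [{"name": f"user{i}"} for i in range(100)]}
too_big = {"title": "up", "user": [{"name": f"user{i}"} for i in range(10001)]}

print(lucene_doc_count(doc, ["user"]))
print(within_limit(doc, ["user"]), within_limit(too_big, ["user"]))
```

A document with 100 nested users costs 101 Lucene documents, which is why a cap on nested objects per document matters.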

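Summary

From the walkthrough above we can see that the Nested type is what Elasticsearch recommends in preference to the Join type, and that Nested supports querying, aggregation, sorting and more, which covers most day-to-day needs. That's it for this article. If there is anything you would like to explore in more depth, leave a comment. You can also check the official documentation, which is the first-hand source; this post is only an introductory set of notes. Now go and try it out, good luck!

For a detailed look at the Join field, see the author's earlier article on that topic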
