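In the previous article we looked at parent-child documents with the Join type. Today we continue with nested documents, which are also the approach Elasticsearch recommends. Let's start with the following request: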
PUT word_document/_doc/1
{
  "title": "up",
  "user": [
    { "name": "honghong", "sex": "female", "numberOfLikes": 500 },
    { "name": "mingming", "sex": "male", "numberOfLikes": 50 },
    { "name": "lanlan", "sex": "male", "numberOfLikes": 100 }
  ]
}
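In data of this shape, user is an array of nested objects. So how does Elasticsearch store user? And if we want to search on the nested child objects, how do we get back exactly the data we need? Let's study the Nested data type together.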
Environment
- macOS 10.14.6
- Elasticsearch 8.1
- Kibana 8.1
Nested
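Let's first understand what the Nested type is. It means what it says: Nested is for nested data, like the user field at the start of this article, so it can be seen as a special kind of Object type. Let's again use the data from the start of the article as the example: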
PUT word_document/_doc/1
{
  "title": "up",
  "user": [
    { "name": "honghong", "sex": "female", "numberOfLikes": 500 },
    { "name": "mingming", "sex": "male", "numberOfLikes": 50 },
    { "name": "lanlan", "sex": "male", "numberOfLikes": 100 }
  ]
}
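If we have not explicitly defined the field types for the word_document index, Elasticsearch infers them when this request is executed. Internally the content is converted into roughly the following flattened form: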
{
  "title": "up",
  "user.name": ["honghong", "mingming", "lanlan"],
  "user.sex": ["female", "male", "male"],
  "user.numberOfLikes": [500, 50, 100]
}
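As you can probably see, once Elasticsearch has converted the document into this structure, our search results are affected. Suppose we run the query below to find a user whose name is honghong and whose sex is male. We expect no matching document, but because Elasticsearch flattened the data, the link between each name and its sex is lost and the document matches incorrectly.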
GET word_document/_search
{
  "query": {
    "bool": {
      "must": [
        { "match": { "user.name": "honghong" } },
        { "match": { "user.sex": "male" } }
      ]
    }
  }
}
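To make the false match concrete, here is a minimal Python sketch. It is only a simplified model of the flattening described above, not Elasticsearch's actual implementation; the helper names `flatten` and `flat_match` are made up for illustration.

```python
# Simplified model of flattening an array of objects (plain object mapping).
# `flatten` and `flat_match` are hypothetical helpers, not Elasticsearch APIs.

def flatten(doc):
    """Collapse each array of objects into per-field lists of values."""
    flat = {}
    for key, value in doc.items():
        if isinstance(value, list) and value and all(isinstance(v, dict) for v in value):
            for obj in value:
                for sub_key, sub_value in obj.items():
                    flat.setdefault(f"{key}.{sub_key}", []).append(sub_value)
        else:
            flat[key] = value
    return flat


def flat_match(flat_doc, conditions):
    """True if each field's value list contains the wanted value somewhere."""
    return all(wanted in flat_doc.get(field, []) for field, wanted in conditions.items())


doc = {
    "title": "up",
    "user": [
        {"name": "honghong", "sex": "female", "numberOfLikes": 500},
        {"name": "mingming", "sex": "male", "numberOfLikes": 50},
        {"name": "lanlan", "sex": "male", "numberOfLikes": 100},
    ],
}

flat = flatten(doc)
print(flat["user.name"])  # values from all three objects, no longer tied to each other
print(flat_match(flat, {"user.name": "honghong", "user.sex": "male"}))  # True: a false match
```

The match succeeds because "honghong" and "male" each appear somewhere in their lists, even though no single user has both.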
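How do we avoid this? By using the Nested data type that Elasticsearch provides. Nested keeps each nested object independent, so we can search on the contents of individual nested objects and the problem above no longer occurs.
Let's take the same document as the example, but this time we create the index first and map the user field as nested. If the index from the earlier example already exists, delete it first with DELETE word_document.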
PUT word_document
{
  "mappings": {
    "properties": {
      "title": { "type": "keyword" },
      "user": {
        "type": "nested",
        "properties": {
          "numberOfLikes": { "type": "integer" }
        }
      }
    }
  }
}
Next we add our test data so we can verify the search:
PUT word_document/_doc/1
{
  "title": "up",
  "user": [
    { "name": "honghong", "sex": "female", "numberOfLikes": 500 },
    { "name": "mingming", "sex": "male", "numberOfLikes": 50 },
    { "name": "lanlan", "sex": "male", "numberOfLikes": 100 }
  ]
}
PUT word_document/_doc/2
{
  "title": "up",
  "user": [
    { "name": "honghong", "sex": "female", "numberOfLikes": 20 },
    { "name": "mingming", "sex": "male", "numberOfLikes": 30 },
    { "name": "lanlan", "sex": "male", "numberOfLikes": 50 }
  ]
}
PUT word_document/_doc/3
{
  "title": "up",
  "user": [
    { "name": "honghong", "sex": "female", "numberOfLikes": 50 },
    { "name": "mingming", "sex": "male", "numberOfLikes": 50 },
    { "name": "lanlan", "sex": "male", "numberOfLikes": 50 }
  ]
}
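Now we run the same search conditions as before (name honghong, sex male), this time wrapped in a nested query. No document matches, so the result is empty: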
GET word_document/_search
{
  "query": {
    "nested": {
      "path": "user",
      "query": {
        "bool": {
          "must": [
            { "match": { "user.name": "honghong" } },
            { "match": { "user.sex": "male" } }
          ]
        }
      }
    }
  }
}
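So how do we query nested documents? We have to use the nested query type; an ordinary query will not find them. The nested query looks like this, and this time the documents we expect are returned. The inner_hits section additionally returns the nested objects that matched.
GET word_document/_search
{
  "query": {
    "nested": {
      "path": "user",
      "query": {
        "bool": {
          "must": [
            { "match": { "user.name": "honghong" } },
            { "match": { "user.sex": "female" } }
          ]
        }
      },
      "inner_hits": {
        "highlight": {
          "fields": {
            "user.name": {}
          }
        }
      }
    }
  }
}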
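The difference from the flattened case can be sketched in Python. This is a simplified model under the assumption that a nested query evaluates all conditions against one nested object at a time; `nested_match` is a hypothetical helper, and exact equality stands in for the analyzed `match` query.

```python
# Simplified model of nested matching: every condition must hold on the SAME object.
# `nested_match` is a hypothetical helper, not an Elasticsearch API.

def nested_match(doc, path, conditions):
    """Return the nested objects under `path` that satisfy all conditions."""
    return [
        obj
        for obj in doc.get(path, [])
        if all(obj.get(field) == wanted for field, wanted in conditions.items())
    ]


doc = {
    "title": "up",
    "user": [
        {"name": "honghong", "sex": "female", "numberOfLikes": 500},
        {"name": "mingming", "sex": "male", "numberOfLikes": 50},
        {"name": "lanlan", "sex": "male", "numberOfLikes": 100},
    ],
}

# No single object is both honghong and male, so nothing matches.
no_hits = nested_match(doc, "user", {"name": "honghong", "sex": "male"})
# The matching objects play the role of inner_hits in the real response.
inner_hits = nested_match(doc, "user", {"name": "honghong", "sex": "female"})

print(no_hits)
print(inner_hits)
```

A parent document counts as a hit only when the returned list is non-empty, which is why the honghong/male search comes back empty.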
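We can also sort by a field inside the nested objects. With ascending order, the smallest value among a document's nested objects is used as its sort value; with descending order, the largest value is used.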
GET word_document/_search
{
  "query": {
    "nested": {
      "path": "user",
      "query": {
        "match": { "user.sex": "male" }
      }
    }
  },
  "sort": [
    {
      "user.numberOfLikes": {
        "order": "asc",
        "nested": { "path": "user" }
      }
    }
  ]
}
The response is as follows:
{
  "took": 101,
  "timed_out": false,
  "_shards": { "total": 1, "successful": 1, "skipped": 0, "failed": 0 },
  "hits": {
    "total": { "value": 3, "relation": "eq" },
    "max_score": null,
    "hits": [
      {
        "_index": "word_document",
        "_id": "2",
        "_score": null,
        "_source": {
          "title": "up",
          "user": [
            { "name": "honghong", "sex": "female", "numberOfLikes": 20 },
            { "name": "mingming", "sex": "male", "numberOfLikes": 30 },
            { "name": "lanlan", "sex": "male", "numberOfLikes": 50 }
          ]
        },
        "sort": [ 20 ]
      },
      {
        "_index": "word_document",
        "_id": "1",
        "_score": null,
        "_source": {
          "title": "up",
          "user": [
            { "name": "honghong", "sex": "female", "numberOfLikes": 500 },
            { "name": "mingming", "sex": "male", "numberOfLikes": 50 },
            { "name": "lanlan", "sex": "male", "numberOfLikes": 100 }
          ]
        },
        "sort": [ 50 ]
      },
      {
        "_index": "word_document",
        "_id": "3",
        "_score": null,
        "_source": {
          "title": "up",
          "user": [
            { "name": "honghong", "sex": "female", "numberOfLikes": 50 },
            { "name": "mingming", "sex": "male", "numberOfLikes": 50 },
            { "name": "lanlan", "sex": "male", "numberOfLikes": 50 }
          ]
        },
        "sort": [ 50 ]
      }
    ]
  }
}
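The sort values in this response can be reproduced with a small Python sketch. It only models the rule stated above (minimum for ascending, maximum for descending) on the numberOfLikes values of the three test documents; `sort_value` and `sort_docs` are hypothetical helpers, and the order of tied documents is an artifact of Python's stable sort, not something this sketch claims about Elasticsearch.

```python
# numberOfLikes of every nested user object, keyed by document id (test data above).
likes_by_doc = {
    "1": [500, 50, 100],
    "2": [20, 30, 50],
    "3": [50, 50, 50],
}


def sort_value(likes, order):
    """Pick the value a document is sorted by: min for asc, max for desc."""
    return min(likes) if order == "asc" else max(likes)


def sort_docs(docs, order):
    """Return document ids ordered by their nested sort value."""
    return sorted(
        docs,
        key=lambda doc_id: sort_value(docs[doc_id], order),
        reverse=(order == "desc"),
    )


asc_order = sort_docs(likes_by_doc, "asc")
asc_values = [sort_value(likes_by_doc[d], "asc") for d in asc_order]
desc_order = sort_docs(likes_by_doc, "desc")

print(asc_order, asc_values)
print(desc_order)
```

Note that document 2 sorts by 20, which belongs to a female user: the query only selects parent documents, while the sort looks at all nested objects under user.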
We can also run aggregations on nested objects. Below we fetch all documents in the index that contain a user with user.name=honghong and user.sex=female, and compute the minimum of numberOfLikes:
GET word_document/_search
{
  "query": {
    "nested": {
      "path": "user",
      "query": {
        "bool": {
          "must": [
            { "match": { "user.name": "honghong" } },
            { "match": { "user.sex": "female" } }
          ]
        }
      }
    }
  },
  "aggs": {
    "my_min_value": {
      "nested": { "path": "user" },
      "aggs": {
        "min_value": {
          "min": { "field": "user.numberOfLikes" }
        }
      }
    }
  }
}
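The aggregation above only filters the outer (parent) documents; every nested object in the matching documents still takes part in the aggregation. Suppose instead we need the minimum over only those nested user objects whose sex=male. For that we can use a filter aggregation. The request below first restricts the search to documents with title=up, then computes the minimum numberOfLikes over nested objects with user.sex=male:
GET /word_document/_search?size=0
{
  "query": {
    "match": { "title": "up" }
  },
  "aggs": {
    "my_user": {
      "nested": { "path": "user" },
      "aggs": {
        "filter_my_user": {
          "filter": {
            "bool": {
              "filter": [
                { "match": { "user.sex": "male" } }
              ]
            }
          },
          "aggs": {
            "min_price": {
              "min": { "field": "user.numberOfLikes" }
            }
          }
        },
        "no_filter_my_user": {
          "min": { "field": "user.numberOfLikes" }
        }
      }
    }
  }
}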
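The difference between the filtered and unfiltered minimum can be checked by hand with a short Python sketch over the three test documents. This is only a model of what the two sub-aggregations compute, not how Elasticsearch executes them; the variable names mirror the aggregation names purely for readability.

```python
# (name, sex, numberOfLikes) for every nested user object, keyed by document id.
users_by_doc = {
    "1": [("honghong", "female", 500), ("mingming", "male", 50), ("lanlan", "male", 100)],
    "2": [("honghong", "female", 20), ("mingming", "male", 30), ("lanlan", "male", 50)],
    "3": [("honghong", "female", 50), ("mingming", "male", 50), ("lanlan", "male", 50)],
}

# All three documents have title "up", so the nested aggregation sees every user object.
all_users = [user for users in users_by_doc.values() for user in users]

# Models no_filter_my_user: minimum over all nested objects.
no_filter_min = min(likes for _, _, likes in all_users)

# Models filter_my_user: keep only male users, then take the minimum.
male_min = min(likes for _, sex, likes in all_users if sex == "male")

print(no_filter_min)
print(male_min)
```

The unfiltered minimum comes from a female user in document 2, so the two numbers differ; that gap is exactly what the filter aggregation is for.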
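Finally, there is the reverse nested aggregation, which steps back from the nested objects to aggregate on their parent documents and returns parent document information.
First we create an index and add a few documents for testing: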
PUT /issues
{
  "mappings": {
    "properties": {
      "tags": { "type": "keyword" },
      "comments": {
        "type": "nested",
        "properties": {
          "username": { "type": "keyword" },
          "comment": { "type": "text" }
        }
      }
    }
  }
}
PUT /issues/_doc/1
{
  "tags": "跳舞",
  "comments": [
    { "username": "小李", "comment": "小李想学跳舞" },
    { "username": "小红", "comment": "小红跳舞很有天才" }
  ]
}
PUT /issues/_doc/2
{
  "tags": "唱歌",
  "comments": [
    { "username": "小李", "comment": "小李会唱歌" },
    { "username": "小李", "comment": "小李唱歌有天才" },
    { "username": "小红", "comment": "小红是歌手" }
  ]
}
PUT /issues/_doc/3
{
  "tags": "跳舞",
  "comments": [
    { "username": "小红", "comment": "小红会跳舞" },
    { "username": "小红", "comment": "小红是舞神" }
  ]
}
PUT /issues/_doc/4
{
  "tags": "唱歌",
  "comments": [
    { "username": "小李", "comment": "小李几乎就是天生歌手" }
  ]
}
PUT /issues/_doc/5
{
  "tags": "跳舞",
  "comments": [
    { "username": "小红", "comment": "小红舞姿很美" }
  ]
}
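Here issues is the index of issues, tags is the label of an issue, username is the commenter's name, and comment is the comment text. In the test data the tags are 跳舞 (dancing) and 唱歌 (singing), and the users are 小李 (Xiao Li) and 小红 (Xiao Hong).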
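Now let's use reverse nested to aggregate on the parent documents. The requirements are:
1. First find the username values with the most comments.
2. Then, for each username, find the tags on which that user commented the most.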
GET /issues/_search?size=0
{
  "query": {
    "match_all": {}
  },
  "aggs": {
    "comments": {
      "nested": { "path": "comments" },
      "aggs": {
        "top_usernames": {
          "terms": { "field": "comments.username" },
          "aggs": {
            "comment_to_issue": {
              "reverse_nested": {},
              "aggs": {
                "top_tags_per_comment": {
                  "terms": { "field": "tags" }
                }
              }
            }
          }
        }
      }
    }
  }
}
The result is below. The conclusion: 小红 commented the most, 5 times, and the tag 小红 commented on most is 跳舞, in 3 issues.
{
  "aggregations": {
    "comments": {
      "doc_count": 9,
      "top_usernames": {
        "doc_count_error_upper_bound": 0,
        "sum_other_doc_count": 0,
        "buckets": [
          {
            "key": "小红",
            "doc_count": 5,
            "comment_to_issue": {
              "doc_count": 4,
              "top_tags_per_comment": {
                "doc_count_error_upper_bound": 0,
                "sum_other_doc_count": 0,
                "buckets": [
                  { "key": "跳舞", "doc_count": 3 },
                  { "key": "唱歌", "doc_count": 1 }
                ]
              }
            }
          },
          {
            "key": "小李",
            "doc_count": 4,
            "comment_to_issue": {
              "doc_count": 3,
              "top_tags_per_comment": {
                "doc_count_error_upper_bound": 0,
                "sum_other_doc_count": 0,
                "buckets": [
                  { "key": "唱歌", "doc_count": 2 },
                  { "key": "跳舞", "doc_count": 1 }
                ]
              }
            }
          }
        ]
      }
    }
  }
}
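To see where each number in this response comes from, the counting can be redone in plain Python on the five test issues. This is just a sketch of the counting logic (comments per user, then distinct parent issues per user grouped by tag), not of how Elasticsearch runs the aggregation.

```python
from collections import Counter, defaultdict

# issue id -> (tag, usernames of its comments), taken from the test data above.
issues = {
    1: ("跳舞", ["小李", "小红"]),
    2: ("唱歌", ["小李", "小李", "小红"]),
    3: ("跳舞", ["小红", "小红"]),
    4: ("唱歌", ["小李"]),
    5: ("跳舞", ["小红"]),
}

comment_counts = Counter()          # nested level: one count per comment
issues_by_user = defaultdict(set)   # reverse_nested level: distinct parent issues

for issue_id, (tag, usernames) in issues.items():
    for username in usernames:
        comment_counts[username] += 1
        issues_by_user[username].add(issue_id)

# Tags of the parent issues, counted once per issue for each user.
tags_by_user = {
    user: Counter(issues[issue_id][0] for issue_id in issue_ids)
    for user, issue_ids in issues_by_user.items()
}

for user, count in comment_counts.most_common():
    print(user, count, len(issues_by_user[user]), dict(tags_by_user[user]))
```

The inner doc_count values count parent issues rather than comments, which is why 小红 has 5 comments but only 4 issues: two of the comments sit in the same issue.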
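Parameters supported by Nested
Nested is just a special kind of Object, and it supports a few parameters of its own:
- dynamic: (optional) Whether new fields that are not defined in the index mapping may be added dynamically to an existing nested object. The default is true, which adds them; false and strict are also supported.
- properties: (optional) The fields of the nested object and their mapping settings.
- include_in_parent: (optional) Defaults to false. If true, the fields of the nested object are also added to the parent document as standard (flat) fields.
- include_in_root: (optional) Defaults to false. If true, the fields of the nested object are also added to the root document as standard (flat) fields.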
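As an illustration of how these parameters fit together, here is a mapping body built as a Python dict and printed as JSON, which could then be pasted after a PUT request line in Kibana. The index layout and the field types chosen for name and sex are assumptions made for this sketch, not settings taken from the examples above.

```python
import json

# Hypothetical mapping for the user field that combines the parameters listed above.
mapping = {
    "mappings": {
        "properties": {
            "title": {"type": "keyword"},
            "user": {
                "type": "nested",
                "dynamic": "strict",       # reject user objects with undeclared fields
                "include_in_root": True,   # also copy nested fields to the root document
                "properties": {
                    "name": {"type": "keyword"},           # assumed type
                    "sex": {"type": "keyword"},            # assumed type
                    "numberOfLikes": {"type": "integer"},
                },
            },
        }
    }
}

body = json.dumps(mapping, indent=2)
print(body)
```

With dynamic set to strict, every field of the nested object has to be declared under properties, which is why all three user fields appear here.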
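Limits on the Nested type
From what we have learned so far, we know that each nested object is indexed as a separate Lucene document. A document with 100 nested objects therefore needs 101 Lucene documents: one for the parent document and one for each nested object. Because of this overhead, Elasticsearch provides the following settings as safeguards:
- index.mapping.nested_fields.limit: The maximum number of distinct nested fields in an index. The default is 50. Our example above uses only one.
- index.mapping.nested_objects.limit: The maximum number of nested JSON objects that a single document can contain across all of its nested fields. The default is 10000.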
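The 101-document arithmetic can be written down as a short Python sketch. It assumes a single level of nesting (nested objects that do not contain further nested fields), and `lucene_doc_count` and `within_limit` are hypothetical helpers used only for this illustration.

```python
NESTED_OBJECTS_LIMIT = 10000  # default value of index.mapping.nested_objects.limit


def lucene_doc_count(doc, nested_paths):
    """One Lucene document for the parent plus one per nested object."""
    nested_objects = sum(len(doc.get(path, [])) for path in nested_paths)
    return 1 + nested_objects


def within_limit(doc, nested_paths, limit=NESTED_OBJECTS_LIMIT):
    """Check the number of nested objects in one document against the limit."""
    return lucene_doc_count(doc, nested_paths) - 1 <= limit


# A document with 100 nested user objects.
doc = {"title": "up", "user": [{"name": f"user{i}"} for i in range(100)]}

print(lucene_doc_count(doc, ["user"]))  # 101
print(within_limit(doc, ["user"]))      # True
```

A document whose nested objects exceed the limit would be rejected at index time, so the check above is the kind of guard one might run before sending very large arrays.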
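Summary
From the exercises above we can see that the Nested type is what Elasticsearch recommends over the Join type, and that Nested supports querying, aggregating, and sorting, which basically covers day-to-day needs. That's it for now. If there is anything you would like to dig into further, leave a comment, or check the official documentation, which is after all the primary source; this post is only an introductory note. Go try it out, and good luck!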
For details on the Join field, see my earlier article on that topic.