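In the previous article we looked at parent-child documents built with the Join type. Today we continue with nested documents, which, after all, Elasticsearch also recommends. Let's start with the following statement: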
PUT word_document/_doc/1
{
  "title": "up",
  "user": [
    { "name": "honghong", "sex": "female", "numberOfLikes": 500 },
    { "name": "mingming", "sex": "male", "numberOfLikes": 50 },
    { "name": "lanlan", "sex": "male", "numberOfLikes": 100 }
  ]
}
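In data of this shape, user is an array of nested objects. How does Elasticsearch store user? And if we want to search on the nested sub-objects, how do we get back exactly the data we need? Let's dig into the Nested data type together.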
Environment
- macos 10.14.6
- Elasticsearch 8.1
- Kibana 8.1
Nested
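Let's first understand what the Nested type is. It is exactly what the name says: Nested means nested, like the user data at the start of this article, so it can be seen as a special kind of Object type. Take the data from the start of the article again: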
PUT word_document/_doc/1
{
  "title": "up",
  "user": [
    { "name": "honghong", "sex": "female", "numberOfLikes": 500 },
    { "name": "mingming", "sex": "male", "numberOfLikes": 50 },
    { "name": "lanlan", "sex": "male", "numberOfLikes": 100 }
  ]
}
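If we have not explicitly set the data types for the word_document index, Elasticsearch infers them by default once the statement above runs. Internally, the content is converted into roughly the following form, with the data flattened: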
{
  "title": "up",
  "user.name": ["honghong", "mingming", "lanlan"],
  "user.sex": ["female", "male", "male"],
  "user.numberOfLikes": [500, 50, 100]
}
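As you can probably see, once Elasticsearch has converted the document into the structure above, our search results are affected. Suppose we run the query below, searching for a user whose name is honghong and whose sex is male. We expect no matching document, but because Elasticsearch flattened the data, the document matches incorrectly: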
GET word_document/_search
{
  "query": {
    "bool": {
      "must": [
        { "match": { "user.name": "honghong" } },
        { "match": { "user.sex": "male" } }
      ]
    }
  }
}
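To make the false match concrete, here is a minimal Python sketch of the flattening described above. It is a simplified model rather than Elasticsearch code: the `flatten` and `flat_match` helpers are hypothetical names introduced only for this illustration, and exact equality stands in for the `match` query.

```python
# Simplified model of how an object array is flattened when "user" is not
# mapped as nested. This is an illustration, not Elasticsearch code.
doc = {
    "title": "up",
    "user": [
        {"name": "honghong", "sex": "female", "numberOfLikes": 500},
        {"name": "mingming", "sex": "male", "numberOfLikes": 50},
        {"name": "lanlan", "sex": "male", "numberOfLikes": 100},
    ],
}


def flatten(document, path="user"):
    """Collect each inner field of the object array into one multi-value field."""
    flat = {key: value for key, value in document.items() if key != path}
    for obj in document[path]:
        for key, value in obj.items():
            flat.setdefault(f"{path}.{key}", []).append(value)
    return flat


def flat_match(flat, conditions):
    """bool/must over flattened fields: each condition only needs *some* value."""
    return all(value in flat.get(field, []) for field, value in conditions.items())


flat_doc = flatten(doc)
query = {"user.name": "honghong", "user.sex": "male"}

print(flat_doc["user.sex"])         # values of all three users, pairing with names is lost
print(flat_match(flat_doc, query))  # True: "honghong" and "male" come from different users
```

The match succeeds because each condition is checked against the whole value list, so nothing ties "honghong" to the sex of the same user.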
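How do we avoid this? By using the Nested data type that Elasticsearch provides. The Nested data type preserves the independence of each nested object, so we can search within each nested object on its own, and the situation above no longer occurs.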
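- First, using the same document as above, this time we create the index up front and map the user field as nested (if the index already exists from the earlier statement, delete it first)
PUT word_document
{
  "mappings": {
    "properties": {
      "title": { "type": "keyword" },
      "user": {
        "type": "nested",
        "properties": {
          "numberOfLikes": { "type": "integer" }
        }
      }
    }
  }
}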
- Next, add our test data so that we can verify the search statements
PUT word_document/_doc/1
{
  "title": "up",
  "user": [
    { "name": "honghong", "sex": "female", "numberOfLikes": 500 },
    { "name": "mingming", "sex": "male", "numberOfLikes": 50 },
    { "name": "lanlan", "sex": "male", "numberOfLikes": 100 }
  ]
}
PUT word_document/_doc/2
{
  "title": "up",
  "user": [
    { "name": "honghong", "sex": "female", "numberOfLikes": 20 },
    { "name": "mingming", "sex": "male", "numberOfLikes": 30 },
    { "name": "lanlan", "sex": "male", "numberOfLikes": 50 }
  ]
}
PUT word_document/_doc/3
{
  "title": "up",
  "user": [
    { "name": "honghong", "sex": "female", "numberOfLikes": 50 },
    { "name": "mingming", "sex": "male", "numberOfLikes": 50 },
    { "name": "lanlan", "sex": "male", "numberOfLikes": 50 }
  ]
}
- Now run the same search conditions again, this time through a nested query. No document matches, so the result is empty
GET word_document/_search
{
  "query": {
    "nested": {
      "path": "user",
      "query": {
        "bool": {
          "must": [
            { "match": { "user.name": "honghong" } },
            { "match": { "user.sex": "male" } }
          ]
        }
      }
    }
  }
}
- So how do we query nested documents? By using the nested query type; an ordinary query will not find them. The nested query looks like this, and this time it returns the documents we are looking for
GET word_document/_search
{
  "query": {
    "nested": {
      "path": "user",
      "query": {
        "bool": {
          "must": [
            { "match": { "user.name": "honghong" } },
            { "match": { "user.sex": "female" } }
          ]
        }
      },
      "inner_hits": {
        "highlight": {
          "fields": { "user.name": {} }
        }
      }
    }
  }
}
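The difference between the two nested queries above can be sketched in a few lines of Python. This is only a model of the matching rule, assuming exact equality in place of `match`; `nested_match` is a hypothetical helper, not an Elasticsearch API.

```python
# Model of nested matching: all conditions must hold within ONE nested object.
users = [
    {"name": "honghong", "sex": "female", "numberOfLikes": 500},
    {"name": "mingming", "sex": "male", "numberOfLikes": 50},
    {"name": "lanlan", "sex": "male", "numberOfLikes": 100},
]


def nested_match(nested_objects, conditions):
    """True if a single nested object satisfies every condition."""
    return any(
        all(obj.get(field) == value for field, value in conditions.items())
        for obj in nested_objects
    )


print(nested_match(users, {"name": "honghong", "sex": "male"}))    # False
print(nested_match(users, {"name": "honghong", "sex": "female"}))  # True
```

Because each nested object is checked separately, honghong can no longer borrow the sex value of another user.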
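- We can also sort by a field inside the nested objects. In ascending order, the smallest value among the nested objects is used as the sort value; in descending order, the largest value is used
GET word_document/_search
{
  "query": {
    "nested": {
      "path": "user",
      "query": {
        "match": { "user.sex": "male" }
      }
    }
  },
  "sort": [
    {
      "user.numberOfLikes": {
        "order": "asc",
        "nested": { "path": "user" }
      }
    }
  ]
}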
The response is as follows:
{
  "took": 101,
  "timed_out": false,
  "_shards": { "total": 1, "successful": 1, "skipped": 0, "failed": 0 },
  "hits": {
    "total": { "value": 3, "relation": "eq" },
    "max_score": null,
    "hits": [
      {
        "_index": "word_document",
        "_id": "2",
        "_score": null,
        "_source": {
          "title": "up",
          "user": [
            { "name": "honghong", "sex": "female", "numberOfLikes": 20 },
            { "name": "mingming", "sex": "male", "numberOfLikes": 30 },
            { "name": "lanlan", "sex": "male", "numberOfLikes": 50 }
          ]
        },
        "sort": [20]
      },
      {
        "_index": "word_document",
        "_id": "1",
        "_score": null,
        "_source": {
          "title": "up",
          "user": [
            { "name": "honghong", "sex": "female", "numberOfLikes": 500 },
            { "name": "mingming", "sex": "male", "numberOfLikes": 50 },
            { "name": "lanlan", "sex": "male", "numberOfLikes": 100 }
          ]
        },
        "sort": [50]
      },
      {
        "_index": "word_document",
        "_id": "3",
        "_score": null,
        "_source": {
          "title": "up",
          "user": [
            { "name": "honghong", "sex": "female", "numberOfLikes": 50 },
            { "name": "mingming", "sex": "male", "numberOfLikes": 50 },
            { "name": "lanlan", "sex": "male", "numberOfLikes": 50 }
          ]
        },
        "sort": [50]
      }
    ]
  }
}
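The sort values in the response can be reproduced with a small Python sketch. It assumes, as in the query above, that the `nested` sort clause has no filter of its own, so every user object contributes its numberOfLikes (which is why document 2 sorts by 20, a value that belongs to a female user). `sort_value` and `nested_sort` are hypothetical helper names.

```python
# numberOfLikes of every nested user object, keyed by document id
likes_by_doc = {
    "1": [500, 50, 100],
    "2": [20, 30, 50],
    "3": [50, 50, 50],
}


def sort_value(likes, order):
    """Ascending order compares by the minimum, descending by the maximum."""
    return min(likes) if order == "asc" else max(likes)


def nested_sort(docs, order):
    """Return (doc id, sort value) pairs ordered the way the sort clause would."""
    pairs = [(doc_id, sort_value(likes, order)) for doc_id, likes in docs.items()]
    return sorted(pairs, key=lambda pair: pair[1], reverse=(order == "desc"))


print(nested_sort(likes_by_doc, "asc"))   # [('2', 20), ('1', 50), ('3', 50)]
print(nested_sort(likes_by_doc, "desc"))  # document 1 comes first with 500
```

How Elasticsearch orders documents with equal sort values is not modeled here; the sketch simply keeps the input order for ties.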
- We can also run aggregations on nested objects. Below we fetch all documents in the index that have a user with user.name=honghong and user.sex=female, and compute the minimum of numberOfLikes
GET word_document/_search
{
  "query": {
    "nested": {
      "path": "user",
      "query": {
        "bool": {
          "must": [
            { "match": { "user.name": "honghong" } },
            { "match": { "user.sex": "female" } }
          ]
        }
      }
    }
  },
  "aggs": {
    "my_min_value": {
      "nested": { "path": "user" },
      "aggs": {
        "min_value": {
          "min": { "field": "user.numberOfLikes" }
        }
      }
    }
  }
}
- The aggregation above only filters the outer documents. Suppose instead we need the minimum over just the nested user objects with sex=male. For that we can use a filter aggregation, as below. This statement first selects the documents with title=up, then computes the minimum numberOfLikes among users with user.sex=male
GET /word_document/_search?size=0
{
  "query": {
    "match": { "title": "up" }
  },
  "aggs": {
    "my_user": {
      "nested": { "path": "user" },
      "aggs": {
        "filter_my_user": {
          "filter": {
            "bool": {
              "filter": [
                { "match": { "user.sex": "male" } }
              ]
            }
          },
          "aggs": {
            "min_price": {
              "min": { "field": "user.numberOfLikes" }
            }
          }
        },
        "no_filter_my_user": {
          "min": { "field": "user.numberOfLikes" }
        }
      }
    }
  }
}
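What the two aggregations compute on the test data can be checked by hand with a short Python sketch. It is a plain re-computation over the three sample documents, not a call to Elasticsearch, and `nested_min` is a hypothetical helper name.

```python
# The three test documents indexed earlier (title plus nested user objects)
docs = [
    {"title": "up", "user": [
        {"name": "honghong", "sex": "female", "numberOfLikes": 500},
        {"name": "mingming", "sex": "male", "numberOfLikes": 50},
        {"name": "lanlan", "sex": "male", "numberOfLikes": 100},
    ]},
    {"title": "up", "user": [
        {"name": "honghong", "sex": "female", "numberOfLikes": 20},
        {"name": "mingming", "sex": "male", "numberOfLikes": 30},
        {"name": "lanlan", "sex": "male", "numberOfLikes": 50},
    ]},
    {"title": "up", "user": [
        {"name": "honghong", "sex": "female", "numberOfLikes": 50},
        {"name": "mingming", "sex": "male", "numberOfLikes": 50},
        {"name": "lanlan", "sex": "male", "numberOfLikes": 50},
    ]},
]


def nested_min(documents, title, sex=None):
    """Minimum numberOfLikes over nested users of documents with the given title.

    When sex is given, only nested users with that sex are considered,
    which mirrors the filter aggregation inside the nested aggregation.
    """
    likes = [
        user["numberOfLikes"]
        for document in documents
        if document["title"] == title
        for user in document["user"]
        if sex is None or user["sex"] == sex
    ]
    return min(likes)


print(nested_min(docs, "up", sex="male"))  # 30, the filtered minimum
print(nested_min(docs, "up"))              # 20, the unfiltered minimum
```

The unfiltered minimum of 20 comes from a female user in document 2, which is exactly the value the filter on user.sex removes.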
- Finally, there is the reverse nested aggregation, which goes from the nested objects back to their parent documents and returns information about the parents
First, create an index and add a few documents for testing:
PUT /issues
{
  "mappings": {
    "properties": {
      "tags": { "type": "keyword" },
      "comments": {
        "type": "nested",
        "properties": {
          "username": { "type": "keyword" },
          "comment": { "type": "text" }
        }
      }
    }
  }
}
PUT /issues/_doc/1
{
  "tags": "跳舞",
  "comments": [
    { "username": "小李", "comment": "小李想学跳舞" },
    { "username": "小红", "comment": "小红跳舞很有天才" }
  ]
}
PUT /issues/_doc/2
{
  "tags": "唱歌",
  "comments": [
    { "username": "小李", "comment": "小李会唱歌" },
    { "username": "小李", "comment": "小李唱歌有天才" },
    { "username": "小红", "comment": "小红是歌手" }
  ]
}
PUT /issues/_doc/3
{
  "tags": "跳舞",
  "comments": [
    { "username": "小红", "comment": "小红会跳舞" },
    { "username": "小红", "comment": "小红是舞神" }
  ]
}
PUT /issues/_doc/4
{
  "tags": "唱歌",
  "comments": [
    { "username": "小李", "comment": "小李几乎就是天生歌手" }
  ]
}
PUT /issues/_doc/5
{
  "tags": "跳舞",
  "comments": [
    { "username": "小红", "comment": "小红舞姿很美" }
  ]
}
Field meanings: issues are the issues, tags the labels, username the commenter's name, comment the comment text. In the sample data, 小李 and 小红 are user names, 跳舞 means dancing and 唱歌 means singing.
Now we use the reverse nested aggregation to aggregate on the parent documents. The requirements are:
1. First, find the username with the most comments
2. Then, for each username, find the tags that user commented on most
GET /issues/_search?size=0
{
  "query": { "match_all": {} },
  "aggs": {
    "comments": {
      "nested": { "path": "comments" },
      "aggs": {
        "top_usernames": {
          "terms": { "field": "comments.username" },
          "aggs": {
            "comment_to_issue": {
              "reverse_nested": {},
              "aggs": {
                "top_tags_per_comment": {
                  "terms": { "field": "tags" }
                }
              }
            }
          }
        }
      }
    }
  }
}
The result is below. The conclusion: 小红 commented the most, with 5 comments, and the tag 小红 commented on most is 跳舞, with 3 issues carrying that tag. Note that the counts under comment_to_issue are parent documents (issues), not individual comments.
{
  "aggregations": {
    "comments": {
      "doc_count": 9,
      "top_usernames": {
        "doc_count_error_upper_bound": 0,
        "sum_other_doc_count": 0,
        "buckets": [
          {
            "key": "小红",
            "doc_count": 5,
            "comment_to_issue": {
              "doc_count": 4,
              "top_tags_per_comment": {
                "doc_count_error_upper_bound": 0,
                "sum_other_doc_count": 0,
                "buckets": [
                  { "key": "跳舞", "doc_count": 3 },
                  { "key": "唱歌", "doc_count": 1 }
                ]
              }
            }
          },
          {
            "key": "小李",
            "doc_count": 4,
            "comment_to_issue": {
              "doc_count": 3,
              "top_tags_per_comment": {
                "doc_count_error_upper_bound": 0,
                "sum_other_doc_count": 0,
                "buckets": [
                  { "key": "唱歌", "doc_count": 2 },
                  { "key": "跳舞", "doc_count": 1 }
                ]
              }
            }
          }
        ]
      }
    }
  }
}
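The numbers in this response can be re-derived from the five test documents with a short Python sketch. It only models the counting (comments per user, then distinct parent issues per user and tag); the variable names are chosen for this illustration and are not part of any Elasticsearch API.

```python
from collections import Counter, defaultdict

# issue id -> (tag, usernames of its comments), taken from the test data above
issues = {
    1: ("跳舞", ["小李", "小红"]),
    2: ("唱歌", ["小李", "小李", "小红"]),
    3: ("跳舞", ["小红", "小红"]),
    4: ("唱歌", ["小李"]),
    5: ("跳舞", ["小红"]),
}

comment_counts = Counter()          # nested level: one count per comment
issues_by_user = defaultdict(set)   # reverse_nested level: distinct parent issues

for issue_id, (tag, usernames) in issues.items():
    for username in usernames:
        comment_counts[username] += 1
        issues_by_user[username].add(issue_id)

# Count tags over the distinct parent issues of each user
tags_by_user = {
    username: Counter(issues[issue_id][0] for issue_id in issue_ids)
    for username, issue_ids in issues_by_user.items()
}

print(comment_counts.most_common())           # [('小红', 5), ('小李', 4)]
print(len(issues_by_user["小红"]))             # 4
print(tags_by_user["小红"].most_common())      # [('跳舞', 3), ('唱歌', 1)]
print(tags_by_user["小李"].most_common())      # [('唱歌', 2), ('跳舞', 1)]
```

Using a set per user is what makes the second level count issues rather than comments: 小红 wrote two comments on issue 3, yet that issue contributes only once to the 跳舞 bucket.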
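Which parameters does Nested support
Nested is just a special kind of Object, and it supports a few parameters of its own:
- dynamic: (optional) whether new fields that are not defined in the index mapping are added dynamically to an existing nested object. The default is true (they are added); false and strict are also supported
- properties: (optional) the fields inside the nested object and their mapping settings
- include_in_parent: (optional) defaults to false. If true, the fields of the nested object are also added to the parent document as ordinary (flat) fields
- include_in_root: (optional) defaults to false. If true, the fields of the nested object are also added to the root document as ordinary (flat) fields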
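As an illustration of where these parameters go, the following Python sketch builds a mapping body and prints it in console form. The concrete choices here (strict for dynamic, include_in_root switched on, keyword for name and sex) are assumptions made for the example, not the mapping used earlier in this article.

```python
import json

# Hypothetical mapping that sets the nested parameters listed above
mapping_body = {
    "mappings": {
        "properties": {
            "title": {"type": "keyword"},
            "user": {
                "type": "nested",
                "dynamic": "strict",       # reject user fields that are not listed below
                "include_in_root": True,   # also copy user fields to the root document as flat fields
                "properties": {
                    "name": {"type": "keyword"},
                    "sex": {"type": "keyword"},
                    "numberOfLikes": {"type": "integer"},
                },
            },
        }
    }
}

# Print the request in the console form used throughout this article
print("PUT word_document")
print(json.dumps(mapping_body, indent=2))
```

With include_in_root enabled, the flat copies bring back the cross-object matching shown at the start of the article for ordinary queries, so it is worth switching on only when that behavior is actually wanted.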
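Limits on the Nested type
As we have seen, each nested object is indexed as a separate Lucene document. When a document has 100 nested objects, 101 Lucene documents are needed: one for the parent document and one for each nested object. Because of this overhead, Elasticsearch applies limits through the following settings:
- index.mapping.nested_fields.limit: the maximum number of distinct nested-type fields in one index. The default is 50. Our example above uses only one
- index.mapping.nested_objects.limit: the maximum number of nested JSON objects that a single document can contain across all of its nested fields. The default is 10000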
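The document arithmetic behind these limits can be written out in Python. This sketch only handles one level of nesting and assumes the default limit value quoted above; `lucene_doc_count` is a hypothetical helper, not something Elasticsearch exposes.

```python
NESTED_OBJECTS_LIMIT = 10000  # default of index.mapping.nested_objects.limit


def lucene_doc_count(document, nested_paths):
    """One Lucene document for the parent plus one per nested object.

    Simplification: only top-level nested fields are counted.
    """
    return 1 + sum(len(document.get(path, [])) for path in nested_paths)


# A document with 100 nested user objects
doc = {"title": "up", "user": [{"name": f"user{i}"} for i in range(100)]}

total = lucene_doc_count(doc, ["user"])
nested_objects = total - 1

print(total)                                   # 101
print(nested_objects <= NESTED_OBJECTS_LIMIT)  # True: well within the default limit
```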
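Summary
From the walkthrough above, we can see that the Nested type is what Elasticsearch recommends over the Join type, and that Nested supports queries, aggregations, sorting and more, which basically covers day-to-day needs. That's it for now. If there is anything you'd like to explore in more depth, leave a comment, or check the official documentation, which is after all the primary source; this post is only an introductory note. Go try it out, and good luck!
For a detailed look at the Join field, see my earlier article on it.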