Backend: the Nested object type is actually pretty handy


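In the previous article we looked at parent-child documents with the Join type. Today we continue with nested documents, which Elasticsearch itself recommends. Let's start with this request: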

PUT word_document/_doc/1
{
  "title" : "up",
  "user" : [ 
    {
      "name" : "honghong",
      "sex" :  "female",
      "numberOfLikes":500
    },
    {
      "name" : "mingming",
      "sex" :  "male",
      "numberOfLikes":50
    },
    {
      "name" : "lanlan",
      "sex" :  "male",
      "numberOfLikes":100
    }
  ]
}

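In data shaped like this, user is an array of objects. So how does Elasticsearch store user? And if we want to search on the inner objects, how do we get back exactly the data we need? Let's dig into the Nested data type together.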

Environment

  • macOS 10.14.6
  • Elasticsearch 8.1
  • Kibana 8.1

Nested

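Let's first understand what the Nested type is. The name says it all: nested means objects embedded in a document, like the user field at the start of this article, so it can be seen as a special kind of Object type. Take the same data again: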

PUT word_document/_doc/1
{
  "title" : "up",
  "user" : [ 
    {
      "name" : "honghong",
      "sex" :  "female",
      "numberOfLikes":500
    },
    {
      "name" : "mingming",
      "sex" :  "male",
      "numberOfLikes":50
    },
    {
      "name" : "lanlan",
      "sex" :  "male",
      "numberOfLikes":100
    }
  ]
}

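If we don't explicitly define a mapping for the word_document index, Elasticsearch infers the field types when the request above runs. user is then treated as a plain object array, and internally the document is flattened into roughly this form: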

{
  "title":"up",
  "user.name":["honghong","mingming","lanlan"],
  "user.sex":["female","male","male"],
  "user.numberOfLikes":[500,50,100]
}
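
To make the flattening concrete, here is a small Python sketch. It only mimics the idea and says nothing about how Lucene really lays out data; the flatten helper is a name made up for this illustration.

```python
from collections import defaultdict

doc = {
    "title": "up",
    "user": [
        {"name": "honghong", "sex": "female", "numberOfLikes": 500},
        {"name": "mingming", "sex": "male", "numberOfLikes": 50},
        {"name": "lanlan", "sex": "male", "numberOfLikes": 100},
    ],
}


def flatten(document):
    """Collapse arrays of objects into multi-valued dotted fields."""
    flat = defaultdict(list)
    for key, value in document.items():
        if isinstance(value, list) and all(isinstance(v, dict) for v in value):
            # Every inner object contributes its values to shared lists,
            # so the grouping per object is gone afterwards.
            for inner in value:
                for inner_key, inner_value in inner.items():
                    flat[f"{key}.{inner_key}"].append(inner_value)
        else:
            flat[key].append(value)
    return dict(flat)


flat_doc = flatten(doc)
print(flat_doc["user.name"])  # ['honghong', 'mingming', 'lanlan']
print(flat_doc["user.sex"])   # ['female', 'male', 'male']
```

Once the values sit in separate lists, nothing records that honghong belongs with female.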

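As you can probably see, once Elasticsearch has turned the document into this structure, our search results are affected. Suppose we run the query below to find a user whose name is honghong and whose sex is male. We expect no matching document, but because the data was flattened, the link between honghong and female is lost and the document matches incorrectly.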

GET word_document/_search
{
  "query": {
    "bool": {
      "must": [{ "match": { "user.name": "honghong"}},
        {"match": { "user.sex":  "male"}}
      ]
    }
  }
}

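How do we avoid this? By using the Nested data type that Elasticsearch provides. Nested keeps each object in the array independent, so a query is evaluated against one object at a time and the false match above no longer happens.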
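
A plain-Python sketch of the difference, under the simplifying assumption that a match is an exact value comparison (real match queries analyze text first). The functions flat_match and nested_match are hypothetical helpers for this illustration, not Elasticsearch APIs.

```python
users = [
    {"name": "honghong", "sex": "female"},
    {"name": "mingming", "sex": "male"},
    {"name": "lanlan", "sex": "male"},
]


def flat_match(objects, conditions):
    """Flattened object field: each condition may be met by a different object."""
    return all(
        any(obj.get(field) == value for obj in objects)
        for field, value in conditions.items()
    )


def nested_match(objects, conditions):
    """Nested field: one single object has to meet every condition."""
    return any(
        all(obj.get(field) == value for field, value in conditions.items())
        for obj in objects
    )


wanted = {"name": "honghong", "sex": "male"}
print(flat_match(users, wanted))    # True, the false positive
print(nested_match(users, wanted))  # False, as expected
```

The only change between the two functions is the order of any and all, which is exactly the difference between "some object has the name and some object has the sex" and "some object has both".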

  • We use the same document as before, but this time we create the index first and map the user field as nested

    PUT word_document
    {
      "mappings": {
        "properties": {
          "title":{"type": "keyword"},
          "user": {
            "type": "nested",
            "properties": {
              "numberOfLikes":{"type": "integer"}
            }
          }
        }
      }
    }
  • Next we index our test data so we can verify the search

    PUT word_document/_doc/1
    {
      "title" : "up",
      "user" : [ 
        {
          "name" : "honghong",
          "sex" :  "female",
          "numberOfLikes":500
        },
        {
          "name" : "mingming",
          "sex" :  "male",
          "numberOfLikes":50
        },
        {
          "name" : "lanlan",
          "sex" :  "male",
          "numberOfLikes":100
        }
      ]
    }
    PUT word_document/_doc/2
    {
      "title" : "up",
      "user" : [ 
        {
          "name" : "honghong",
          "sex" :  "female",
          "numberOfLikes":20
        },
        {
          "name" : "mingming",
          "sex" :  "male",
          "numberOfLikes":30
        },
        {
          "name" : "lanlan",
          "sex" :  "male",
          "numberOfLikes":50
        }
      ]
    }
    PUT word_document/_doc/3
    {
      "title" : "up",
      "user" : [ 
        {
          "name" : "honghong",
          "sex" :  "female",
          "numberOfLikes":50
        },
        {
          "name" : "mingming",
          "sex" :  "male",
          "numberOfLikes":50
        },
        {
          "name" : "lanlan",
          "sex" :  "male",
          "numberOfLikes":50
        }
      ]
    }
  • Now the same search as before, wrapped in a nested query. This time no document matches and the result is empty

    GET word_document/_search
    {
      "query": {
        "nested": {
          "path": "user",
          "query": {
            "bool": {
              "must": [{ "match": { "user.name": "honghong"}},
                {"match": { "user.sex":  "male"}} 
              ]
            }
          }
        }
      }
    }
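  • So how do we query nested documents? By using the nested query type; an ordinary query will not find them. The nested query below looks for name honghong with sex female, so this time the matching documents are returned, and inner_hits shows which nested objects matched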

    GET word_document/_search
    {
      "query": {
        "nested": {
          "path": "user",
          "query": {
            "bool": {
              "must": [{ "match": { "user.name": "honghong"}},
                {"match": { "user.sex":  "female"}} 
              ]
            }
          },
          "inner_hits": { 
            "highlight": {
              "fields": {"user.name": {}
              }
            }
          }
        }
      }
    }
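  • We can also sort by a field inside the nested objects. In ascending order the smallest value among a document's nested objects is used as its sort value, and in descending order the largest. Note that the nested sort below has no filter of its own, so every user object counts, not only the male ones matched by the query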

    GET word_document/_search
    {
      "query": {
        "nested": {
          "path": "user",
          "query": {
            "match": {"user.sex": "male"}
          }
        }
      },
      "sort":[
        {
          "user.numberOfLikes": {
            "order": "asc", 
            "nested": {"path":"user"}
          }
        }
        ]
    }

    The response:

    {
      "took" : 101,
      "timed_out" : false,
      "_shards" : {
        "total" : 1,
        "successful" : 1,
        "skipped" : 0,
        "failed" : 0
      },
      "hits" : {
        "total" : {
          "value" : 3,
          "relation" : "eq"
        },
        "max_score" : null,
        "hits" : [
          {
            "_index" : "word_document",
            "_id" : "2",
            "_score" : null,
            "_source" : {
              "title" : "up",
              "user" : [
                {
                  "name" : "honghong",
                  "sex" : "female",
                  "numberOfLikes" : 20
                },
                {
                  "name" : "mingming",
                  "sex" : "male",
                  "numberOfLikes" : 30
                },
                {
                  "name" : "lanlan",
                  "sex" : "male",
                  "numberOfLikes" : 50
                }
              ]
            },
            "sort" : [20]
          },
          {
            "_index" : "word_document",
            "_id" : "1",
            "_score" : null,
            "_source" : {
              "title" : "up",
              "user" : [
                {
                  "name" : "honghong",
                  "sex" : "female",
                  "numberOfLikes" : 500
                },
                {
                  "name" : "mingming",
                  "sex" : "male",
                  "numberOfLikes" : 50
                },
                {
                  "name" : "lanlan",
                  "sex" : "male",
                  "numberOfLikes" : 100
                }
              ]
            },
            "sort" : [50]
          },
          {
            "_index" : "word_document",
            "_id" : "3",
            "_score" : null,
            "_source" : {
              "title" : "up",
              "user" : [
                {
                  "name" : "honghong",
                  "sex" : "female",
                  "numberOfLikes" : 50
                },
                {
                  "name" : "mingming",
                  "sex" : "male",
                  "numberOfLikes" : 50
                },
                {
                  "name" : "lanlan",
                  "sex" : "male",
                  "numberOfLikes" : 50
                }
              ]
            },
            "sort" : [50]
          }
        ]
      }
    }
    
  • We can also run aggregations on nested objects. Below we fetch all documents that have a user with user.name=honghong and user.sex=female, then aggregate the minimum numberOfLikes

    GET word_document/_search
    {
      "query": {
        "nested": {
          "path": "user",
          "query": {
            "bool": {
              "must": [
                {
                  "match": {"user.name": "honghong"}
                },
                {
                  "match": {"user.sex": "female"}
                }
              ]
            }
          }
        }
      },
      "aggs": {
        "my_min_value": {
          "nested": {"path": "user"}, 
          "aggs": {
            "min_value": {
              "min": {"field": "user.numberOfLikes"}
            }
          }
        }
      }
    }
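  • The aggregation above only filters the outer documents. Suppose instead we need the minimum numberOfLikes among only those nested user objects with sex=male. For that we can use a filter aggregation: the request below first selects documents with title=up, then computes the minimum numberOfLikes for user.sex=male (no_filter_my_user gives the unfiltered minimum for comparison)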

    GET /word_document/_search?size=0
    {
      "query": {
        "match": {"title": "up"}
      },
      "aggs": {
        "my_user": {
          "nested": {"path": "user"},
          "aggs": {
            "filter_my_user": {
              "filter": {
                "bool": {
                  "filter": [
                    {
                      "match": {"user.sex": "male"}
                    }
                  ]
                }
              },
              "aggs": {
                "min_price": {
                  "min": {"field": "user.numberOfLikes"}
                }
              }
            },
            "no_filter_my_user":{
              "min": {"field": "user.numberOfLikes"}
            }
          }
        }
      }
    }
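  • Finally there is the reverse nested aggregation, which goes from the nested objects back to the parent documents and returns information about the parents

    First we create an index and add a few documents to test with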

    PUT /issues
    {
      "mappings": {
        "properties": {"tags": { "type": "keyword"},
          "comments": {                            
            "type": "nested",
            "properties": {"username": { "type": "keyword"},
              "comment": {"type": "text"}
            }
          }
        }
      }
    }
    PUT /issues/_doc/1
    {
      "tags":"跳舞",
      "comments":[{
        "username":"小李",
        "comment":"小李想学跳舞"
      },
      {
        "username":"小红",
        "comment":"小红跳舞很有天才"
      }
      ]
    }
    PUT /issues/_doc/2
    {
      "tags":"唱歌",
      "comments":[{
        "username":"小李",
        "comment":"小李会唱歌"
      },
      {
        "username":"小李",
        "comment":"小李唱歌有天才"
      },
      {
        "username":"小红",
        "comment":"小红是歌手"
      }
      ]
    }
    PUT /issues/_doc/3
    {
      "tags":"跳舞",
      "comments":[
      {
        "username":"小红",
        "comment":"小红会跳舞"
      },
      {
        "username":"小红",
        "comment":"小红是舞神"
      }
      ]
    }
    PUT /issues/_doc/4
    {
      "tags":"唱歌",
      "comments":[
      {
        "username":"小李",
        "comment":"小李几乎就是天生歌手"
      }
      ]
    }
    PUT /issues/_doc/5
    {
      "tags":"跳舞",
      "comments":[
      {
        "username":"小红",
        "comment":"小红舞姿很美"
      }
      ]
    }

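    Field names: issues holds the issues, tags the tag, username the commenter, comment the comment text. In the sample values, 跳舞 means dancing, 唱歌 means singing, and 小李 and 小红 are the users Xiao Li and Xiao Hong.

    Now we use a reverse nested aggregation to get back to the parent documents. The requirements are:

    1. First, aggregate to find the usernames with the most comments

    2. Then, for each username, aggregate to find the tags they commented on most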

    GET /issues/_search?size=0
    {
      "query": {"match_all": {}
      },
      "aggs": {
        "comments": {
          "nested": {"path": "comments"},
          "aggs": {
            "top_usernames": {
              "terms": {"field": "comments.username"},
              "aggs": {
                "comment_to_issue": {"reverse_nested": {}, 
                  "aggs": {
                    "top_tags_per_comment": {
                      "terms": {"field": "tags"}
                    }
                  }
                }
              }
            }
          }
        }
      }
    }

    The result is below. The conclusion: 小红 commented the most, with 5 comments, and the tag 小红 commented on most is 跳舞, across 3 issues

    {
      "aggregations" : {
        "comments" : {
          "doc_count" : 9,
          "top_usernames" : {
            "doc_count_error_upper_bound" : 0,
            "sum_other_doc_count" : 0,
            "buckets" : [
              {
                "key" : "小红",
                "doc_count" : 5,
                "comment_to_issue" : {
                  "doc_count" : 4,
                  "top_tags_per_comment" : {
                    "doc_count_error_upper_bound" : 0,
                    "sum_other_doc_count" : 0,
                    "buckets" : [
                      {
                        "key" : "跳舞",
                        "doc_count" : 3
                      },
                      {
                        "key" : "唱歌",
                        "doc_count" : 1
                      }
                    ]
                  }
                }
              },
              {
                "key" : "小李",
                "doc_count" : 4,
                "comment_to_issue" : {
                  "doc_count" : 3,
                  "top_tags_per_comment" : {
                    "doc_count_error_upper_bound" : 0,
                    "sum_other_doc_count" : 0,
                    "buckets" : [
                      {
                        "key" : "唱歌",
                        "doc_count" : 2
                      },
                      {
                        "key" : "跳舞",
                        "doc_count" : 1
                      }
                    ]
                  }
                }
              }
            ]
          }
        }
      }
    }
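
The counts in this response can be reproduced by hand. The Python below is only a cross-check on the five sample issues, not a model of how Elasticsearch runs the aggregation; the dictionary layout is an ad hoc choice for this sketch.

```python
from collections import Counter, defaultdict

# Same data as the five issues documents: id -> (tag, [comment authors]).
issues = {
    1: ("跳舞", ["小李", "小红"]),
    2: ("唱歌", ["小李", "小李", "小红"]),
    3: ("跳舞", ["小红", "小红"]),
    4: ("唱歌", ["小李"]),
    5: ("跳舞", ["小红"]),
}

# Step 1: the terms aggregation inside the nested scope counts comments.
comments_per_user = Counter(
    user for _tag, authors in issues.values() for user in authors
)

# Step 2: reverse_nested jumps back to the parent issues, so each issue
# counts once per user, however many comments that user left on it.
tags_per_user = defaultdict(Counter)
for tag, authors in issues.values():
    for user in set(authors):
        tags_per_user[user][tag] += 1

print(comments_per_user.most_common())  # [('小红', 5), ('小李', 4)]
for user, tags in tags_per_user.items():
    print(user, sum(tags.values()), dict(tags))
```

This is why comment_to_issue.doc_count (4 for 小红) is smaller than the bucket's own doc_count (5): the first counts issues, the second counts comments.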
    

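Looking back at the word_document examples, the sort values and the two minimum aggregations can be cross-checked the same way. Again this is only a sanity check on the sample data in plain Python; the tuple layout is made up for this sketch.

```python
# id -> list of (name, sex, numberOfLikes), copied from the three test documents.
docs = {
    1: [("honghong", "female", 500), ("mingming", "male", 50), ("lanlan", "male", 100)],
    2: [("honghong", "female", 20), ("mingming", "male", 30), ("lanlan", "male", 50)],
    3: [("honghong", "female", 50), ("mingming", "male", 50), ("lanlan", "male", 50)],
}

# Ascending nested sort without a filter: each document is represented
# by the smallest numberOfLikes among all of its user objects.
sort_values = {
    doc_id: min(likes for _name, _sex, likes in users)
    for doc_id, users in docs.items()
}

# min aggregation over every nested object versus only the male ones.
all_likes = [likes for users in docs.values() for _name, _sex, likes in users]
male_likes = [
    likes for users in docs.values() for _name, sex, likes in users if sex == "male"
]

print(sort_values)      # {1: 50, 2: 20, 3: 50}
print(min(all_likes))   # 20
print(min(male_likes))  # 30
```

The sort values agree with the "sort" entries in the response shown earlier (20 for document 2, 50 for documents 1 and 3), and the two minimums show why the filter aggregation matters: without it the female user with 20 likes decides the result.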
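Parameters supported by Nested

Nested is just a special kind of Object, and it supports a few parameters of its own

  • dynamic: (optional) whether new fields that are not defined in the index mapping may be added to an existing nested object. The default is true, which adds them; false and strict are also supported
  • properties: (optional) the fields inside the nested object and their mapping settings
  • include_in_parent: (optional) defaults to false. If true, the fields of the nested object are also added to the parent document as ordinary (flat) fields
  • include_in_root: (optional) defaults to false. If true, the fields of the nested object are also added to the root document as ordinary (flat) fields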
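
As a rough illustration of where these parameters go, the Python below builds a mapping body and prints it as JSON. The field names reuse the word_document example, but the parameter values (strict, include_in_root set to true, keyword sub-fields) are arbitrary choices for this sketch, not settings from this article.

```python
import json

# Hypothetical mapping body for illustration only.
mapping = {
    "mappings": {
        "properties": {
            "title": {"type": "keyword"},
            "user": {
                "type": "nested",
                # Reject documents whose user objects carry undeclared fields.
                "dynamic": "strict",
                # Also index user.* as flat fields on the root document.
                "include_in_root": True,
                "properties": {
                    "name": {"type": "keyword"},
                    "sex": {"type": "keyword"},
                    "numberOfLikes": {"type": "integer"},
                },
            },
        }
    }
}

# The printed JSON could be used as the body of PUT word_document.
body = json.dumps(mapping, indent=2)
print(body)
```

With include_in_root enabled, the flattened copies of the fields live next to the nested ones, so both kinds of query can be used on the same index at the cost of extra storage.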

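Limits on the Nested type

As we have seen, each nested object is indexed as a separate Lucene document. A document with 100 nested objects needs 101 Lucene documents: one for the parent and one for each nested object. Because of this overhead, Elasticsearch limits nested usage with the following settings

  • index.mapping.nested_fields.limit

    The maximum number of distinct nested fields in one index, 50 by default. Our example above uses just one

  • index.mapping.nested_objects.limit

    The maximum number of nested JSON objects a single document may contain across all its nested fields, 10000 by default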
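
The arithmetic behind these limits fits in a few lines of Python. The sketch assumes a single level of nesting and that we know which fields are mapped as nested; both helper functions are hypothetical names introduced here.

```python
NESTED_OBJECTS_LIMIT = 10000  # default of index.mapping.nested_objects.limit


def count_nested_objects(document, nested_fields):
    """Number of nested objects in one document (single nesting level only)."""
    return sum(len(document.get(field, [])) for field in nested_fields)


def lucene_doc_count(document, nested_fields):
    """One Lucene document for the parent plus one per nested object."""
    return 1 + count_nested_objects(document, nested_fields)


doc = {"title": "up", "user": [{"name": f"user{i}"} for i in range(100)]}

print(lucene_doc_count(doc, ["user"]))                              # 101
print(count_nested_objects(doc, ["user"]) <= NESTED_OBJECTS_LIMIT)  # True
```

By the same count, each of the three word_document test documents, with three user objects apiece, takes four Lucene documents.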

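Summary

From the walkthrough above we can see that Elasticsearch recommends the Nested type over the Join type, and that Nested supports querying, aggregating and sorting, which covers most day-to-day needs. That's it for now. If there is anything you'd like to go deeper on, leave a comment, or check the official documentation, which is the primary source after all; this post is only a set of introductory notes. Go try it out!

For a detailed look at the Join field, see my earlier article on it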

This article was published to multiple platforms with mdnice
