About Java: Distributed Search Engine Elasticsearch, Advanced Usage (Part 1)

I. Filtered queries (pagination, fuzzy matching, filter)

1. Search for documents that meet a match condition:

Create the data:

PUT account/_doc/1
{"account": 10001, "balance": 10000, "name": "test1"} 

PUT account/_doc/2
{"account": 10002, "balance": 20000, "name": "test2"} 

PUT account/_doc/3
{"account": 10003, "balance": 30000, "name": "张三"} 

PUT account/_doc/4
{"account": 10004, "balance": 30000, "name": "王五"} 
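The four PUT requests above can also be sent as one `_bulk` request, whose body is newline-delimited JSON: an action line followed by the document source, ending with a newline. A minimal Python sketch that only builds that body, assuming you then send it to `POST /_bulk` with `Content-Type: application/x-ndjson` (the helper `build_bulk_body` is hypothetical):

```python
import json

# The same four documents created by the PUT requests above.
DOCS = {
    "1": {"account": 10001, "balance": 10000, "name": "test1"},
    "2": {"account": 10002, "balance": 20000, "name": "test2"},
    "3": {"account": 10003, "balance": 30000, "name": "张三"},
    "4": {"account": 10004, "balance": 30000, "name": "王五"},
}


def build_bulk_body(index, docs):
    """Build an NDJSON body for POST /_bulk (hypothetical helper)."""
    lines = []
    for doc_id, source in docs.items():
        # Action/metadata line: index this document under the given _id.
        lines.append(json.dumps({"index": {"_index": index, "_id": doc_id}}))
        # Source line: the document itself; keep non-ASCII names readable.
        lines.append(json.dumps(source, ensure_ascii=False))
    # The bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(build_bulk_body("account", DOCS), end="")
```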

Look up a document by account number:

GET /account/_search 
{
  "query": { 
    "match": {"account": "10002"}
  }
}

Result:

{
  ...
  "hits" : {
    "total" : {
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "account",
        "_type" : "_doc",
        "_id" : "2",
        "_score" : 1.0,
        "_source" : {
          "account" : 10002,
          "balance" : 20000,
          "name" : "test2"
        }
      }
    ]
  }
  ...
}

The match succeeds and the requested document is returned.
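The console requests in this article can also be sent from a program. A minimal Python sketch using only the standard library, assuming a local, unsecured cluster at `http://localhost:9200` (an assumption; adjust for your setup). `_search` accepts POST as well as GET, which is what `urllib` sends a body with; the helper names are hypothetical:

```python
import json
import urllib.request

# Assumption: a local cluster without authentication or TLS.
ES_URL = "http://localhost:9200"


def build_search_request(index, body):
    """Build a POST /<index>/_search request carrying a JSON query body."""
    return urllib.request.Request(
        f"{ES_URL}/{index}/_search",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def search(index, body):
    """Send the request and return the parsed JSON response."""
    with urllib.request.urlopen(build_search_request(index, body)) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Example body: a match on the account field.
query = {"query": {"match": {"account": 10002}}}
```

With a cluster running, `search("account", query)["hits"]["hits"]` would return the list of matching documents.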

2. Paginated queries:

GET /account/_search 
{
  "query": {"match_all": {}
  },
  "from": 0,
  "size": 2
}

`from` is the offset of the first hit to return and `size` is the number of hits, so this returns 2 documents:

"hits" : {
    "total" : {
      "value" : 4,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "account",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 1.0,
        "_source" : {
          "account" : 10001,
          "balance" : 10000,
          "name" : "test1"
        }
      },
      {
        "_index" : "account",
        "_type" : "_doc",
        "_id" : "2",
        "_score" : 1.0,
        "_source" : {
          "account" : 10002,
          "balance" : 20000,
          "name" : "test2"
        }
      }
    ]
  }
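Since `from` is a zero-based offset and `size` the page length, page *n* (counting from 1) starts at `from = (n - 1) * size`. A toy Python sketch of that arithmetic over an already-sorted in-memory list; the helpers are hypothetical, and real Elasticsearch additionally rejects requests where `from + size` exceeds `index.max_result_window` (10,000 by default):

```python
def page_params(page, size):
    """Return the from/size body fields for a 1-based page number."""
    if page < 1 or size < 1:
        raise ValueError("page and size must be positive")
    return {"from": (page - 1) * size, "size": size}


def paginate(hits, page, size):
    """Toy model: slice an already-sorted hit list the way from/size does."""
    params = page_params(page, size)
    start = params["from"]
    return hits[start:start + params["size"]]


doc_ids = ["1", "2", "3", "4"]
print(page_params(2, 2))        # {'from': 2, 'size': 2}
print(paginate(doc_ids, 2, 2))  # ['3', '4']
```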

3. Fuzzy (full-text) matching:

Numeric fields do not lend themselves to fuzzy matching, so we test with a string field instead:

GET /account/_search 
{
  "query": { 
    "match": {"name": "三四"}
  }
}

Result:

{
  "took" : 2,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : 0.2876821,
    "hits" : [
      {
        "_index" : "account",
        "_type" : "_doc",
        "_id" : "3",
        "_score" : 0.2876821,
        "_source" : {
          "account" : 10003,
          "balance" : 30000,
          "name" : "张三"
        }
      }
    ]
  }
}

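Note that by default Chinese text is tokenized into single characters, so the search keyword "三四" is split into "三" and "四", and each is matched separately. "张三" is returned because it contains "三".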
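A toy Python model of this behaviour, assuming a simplified stand-in for the standard analyzer: each CJK character becomes its own token, a run of ASCII letters and digits becomes one lowercased token, and a `match` query hits a document that shares at least one token with the query (the default `or` operator). Scoring is ignored, and `analyze`/`match` are hypothetical names, not Elasticsearch APIs:

```python
import re

NAMES = {"1": "test1", "2": "test2", "3": "张三", "4": "王五"}

# One token per CJK ideograph, or one token per run of ASCII letters/digits.
TOKEN_RE = re.compile(r"[\u4e00-\u9fff]|[A-Za-z0-9]+")


def analyze(text):
    """Simplified stand-in for the standard analyzer (not the real one)."""
    return [tok.lower() for tok in TOKEN_RE.findall(text)]


def match(query, docs):
    """Return ids of docs sharing at least one token with the query."""
    query_tokens = set(analyze(query))
    return [doc_id for doc_id, text in docs.items()
            if query_tokens & set(analyze(text))]


print(analyze("三四"))       # ['三', '四']
print(match("三四", NAMES))  # ['3']
```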

4. filter queries

GET /account/_search 
{
  "query": { 
    "bool": {
      "filter": [
        {
          "term": {"name": "张三"}
        }
      ]
    }
  }
}

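term performs an exact match: the search value is not analyzed and must equal an indexed term in full. Because it sits in a filter clause, no relevance score is calculated.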

Result:

{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 0,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  }
}

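No documents match. term searches for the whole word "张三", but ES tokenizes Chinese into single characters by default, so "张三" was indexed as "张" and "三" and there is no "张三" term to match. If the index uses default dynamic mapping, the same term query against the `name.keyword` sub-field would match the full value.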
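The difference can be sketched with the same simplified tokenizer as a toy model: a term query on an analyzed text field succeeds only if the unanalyzed value equals one indexed token, while a keyword field (assumed here to be the `name.keyword` sub-field from default dynamic mapping) indexes the raw string as a single token. The helper names are hypothetical:

```python
import re

NAMES = {"1": "test1", "2": "test2", "3": "张三", "4": "王五"}
TOKEN_RE = re.compile(r"[\u4e00-\u9fff]|[A-Za-z0-9]+")


def analyze(text):
    """Simplified stand-in for the standard analyzer (not the real one)."""
    return [tok.lower() for tok in TOKEN_RE.findall(text)]


def term_on_text(value, docs):
    """term is not analyzed: the whole value must equal one indexed token."""
    return [doc_id for doc_id, text in docs.items() if value in analyze(text)]


def term_on_keyword(value, docs):
    """A keyword field indexes the raw string as a single token."""
    return [doc_id for doc_id, text in docs.items() if value == text]


print(term_on_text("张三", NAMES))     # [] because only "张" and "三" are indexed
print(term_on_keyword("张三", NAMES))  # ['3']
```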

II. bool queries (should, must, must_not)

  1. should query: the condition holds as long as any one of the clauses is true.

    GET /movies/_search
    {
      "query":{
       "bool": {
         "should": [{"match": {"title": "good hearts sea"}},
           {"match": {"overview": "good hearts sea"}}
         ]
       }
      }
    }
  2. must query: all of the clauses must be true.

    GET /movies/_search
    {
      "query":{
       "bool": {
         "must": [{"match": {"title": "good hearts sea"}},
           {"match": {"overview": "good hearts sea"}}
         ]
       }
      }
    }
  3. must_not query: none of the clauses may be true.

    GET /movies/_search
    {
      "query":{
       "bool": {
         "must_not": [{"match": {"title": "good hearts sea"}},
           {"match": {"overview": "good hearts sea"}}
         ]
       }
      }
    }
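How the three clause types combine can be sketched with Python sets, treating each clause as the set of document ids it matches. This is a toy model under stated assumptions: scoring is ignored, when a `bool` query has no `must` or `filter` clause at least one `should` clause has to match, and alongside `must` the `should` clauses only influence the score. The ids below are hypothetical, not results from the `movies` index:

```python
def bool_query(all_ids, must=(), should=(), must_not=()):
    """Toy evaluation of a bool query over sets of matching doc ids."""
    result = set(all_ids)
    # must: a document has to satisfy every clause (intersection).
    for clause in must:
        result &= clause
    # should without must: at least one clause has to match (union).
    if should and not must:
        result &= set().union(*should)
    # must_not: drop documents that match any clause.
    for clause in must_not:
        result -= clause
    return result


all_ids = {1, 2, 3, 4, 5}
title_hits = {1, 2}     # hypothetical ids matching the title clause
overview_hits = {2, 3}  # hypothetical ids matching the overview clause

print(sorted(bool_query(all_ids, should=[title_hits, overview_hits])))    # [1, 2, 3]
print(sorted(bool_query(all_ids, must=[title_hits, overview_hits])))      # [2]
print(sorted(bool_query(all_ids, must_not=[title_hits, overview_hits])))  # [4, 5]
```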

III. Aggregation queries (aggs)

  1. Group and count users by their balance:

    GET /account/_search 
    {
      "query": { 
        "bool": {
          "filter": [
            {
              "range": {
                "account": {"gte": 10001}
              }
            }
          ]
        }
      },
      "sort": [
        {
          "balance": {"order": "desc"}
        }
      ],
      "aggs":{
        "group_by_balance": {
          "terms": {"field": "balance"}
        }
      }
    }

    Find the documents whose account number is at least 10001, sort them by balance in descending order, and use aggs to group and count them by balance:

    "aggregations" : {
        "group_by_balance" : {
          "doc_count_error_upper_bound" : 0,
          "sum_other_doc_count" : 0,
          "buckets" : [
            {
              "key" : 30000,
              "doc_count" : 2
            },
            {
              "key" : 10000,
              "doc_count" : 1
            },
            {
              "key" : 20000,
              "doc_count" : 1
            }
          ]
        }
      }

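    As shown, the response ends with the grouped summary: one bucket per distinct balance value, with the number of documents in each.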
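A toy Python sketch of what this request computes over the four sample documents: apply the range filter, then count documents per `balance` value. The `sort` clause only orders the returned hits, not the buckets; here we assume buckets are ordered by `doc_count` descending with ties broken by key ascending, which is consistent with the bucket order above, and keep at most 10 buckets (the `terms` default `size`). `terms_agg` is a hypothetical helper:

```python
from collections import Counter

ACCOUNTS = [
    {"account": 10001, "balance": 10000, "name": "test1"},
    {"account": 10002, "balance": 20000, "name": "test2"},
    {"account": 10003, "balance": 30000, "name": "张三"},
    {"account": 10004, "balance": 30000, "name": "王五"},
]


def terms_agg(docs, field, min_account=10001, size=10):
    """Toy terms aggregation: range-filter, then count docs per field value."""
    # filter: the range clause keeps accounts >= min_account.
    matched = [d for d in docs if d["account"] >= min_account]
    counts = Counter(d[field] for d in matched)
    # Order by doc_count descending, ties broken by key ascending.
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [{"key": k, "doc_count": c} for k, c in ordered[:size]]


for bucket in terms_agg(ACCOUNTS, "balance"):
    print(bucket)
```

Run on the sample data, this yields the same three buckets as the response above: 30000 with 2 documents, then 10000 and 20000 with 1 each.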

This article was written and shared by mirson. Thank you all for your support, and I hope you found it helpful!
To join the group, add WeChat ID: woodblock99
