Elasticsearch Study Notes, Advanced Part 14: Prefix Search, Wildcard Search, and Regex Search in Practice

jiezi

5 years ago

Preparing the data:

PUT /test_index/_create/1
{"test_field": "C3D0-KD345"}
PUT /test_index/_create/2
{"test_field": "C3K5-DFG65"}
PUT /test_index/_create/3
{"test_field": "C4I8-UI365"}
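No explicit mapping is created above, so test_field is dynamically mapped as a text field and each value goes through the standard analyzer before it is indexed. All three searches below run against the resulting terms, not against the raw strings. A rough Python approximation of the inverted index follows. It assumes the analyzer only splits on non-alphanumeric characters and lowercases each token. The real standard analyzer uses Unicode word-segmentation rules, which give the same tokens for these three values. `analyze` and `build_inverted_index` are hypothetical helpers, not Elasticsearch APIs.

```python
import re
from collections import defaultdict

# The three documents indexed above (doc id -> test_field value).
DOCS = {
    "1": "C3D0-KD345",
    "2": "C3K5-DFG65",
    "3": "C4I8-UI365",
}


def analyze(text):
    """Rough stand-in for the standard analyzer (an assumption):
    split on runs of non-alphanumeric characters and lowercase each token."""
    return [tok.lower() for tok in re.split(r"[^0-9A-Za-z]+", text) if tok]


def build_inverted_index(docs):
    """Map each term to the set of doc ids whose value contains it."""
    index = defaultdict(set)
    for doc_id, value in docs.items():
        for term in analyze(value):
            index[term].add(doc_id)
    return dict(index)


if __name__ == "__main__":
    # Print the term dictionary in sorted order, as Lucene stores it.
    for term, ids in sorted(build_inverted_index(DOCS).items()):
        print(term, sorted(ids))
```

Each value yields two lowercase terms. For example, C3D0-KD345 becomes c3d0 and kd345. This is why the later patterns are written in lowercase.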

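Prefix search:

How it works: a prefix query does not compute relevance scores. Its only difference from a prefix filter is that the filter caches its result as a bitset. To find the matching documents, it scans the inverted index for every term that starts with the prefix. The shorter the prefix, the more terms and documents have to be processed and the worse the performance. Use as long a prefix as you can.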
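The distinction drawn above between the prefix query and the cached prefix filter can be illustrated with a toy model. This is not how Elasticsearch implements its filter cache. The term dictionary is a hypothetical hard-coded dict of the lowercased terms of the three documents. The "bitset" is a plain Python int with bit i set for doc id i, and the cache is an ordinary dict keyed by prefix. `PrefixFilterCache` is a hypothetical name.

```python
# Hypothetical term dictionary: term -> doc ids (lowercased terms of the 3 docs).
TERMS = {
    "c3d0": [1], "kd345": [1],
    "c3k5": [2], "dfg65": [2],
    "c4i8": [3], "ui365": [3],
}


class PrefixFilterCache:
    """Toy model: the first lookup for a prefix scans every term and records
    the matching docs in a bitset; repeated lookups reuse the cached bitset."""

    def __init__(self, terms):
        self.terms = terms
        self.cache = {}          # prefix -> bitset stored as an int
        self.terms_scanned = 0   # counts term-dictionary work actually done

    def bitset(self, prefix):
        if prefix in self.cache:
            return self.cache[prefix]   # cache hit: no scan at all
        bits = 0
        for term, doc_ids in self.terms.items():
            self.terms_scanned += 1
            if term.startswith(prefix):
                for doc_id in doc_ids:
                    bits |= 1 << doc_id  # set the bit for this doc
        self.cache[prefix] = bits
        return bits

    @staticmethod
    def doc_ids(bits):
        """Decode a bitset back into a sorted list of doc ids."""
        return [i for i in range(bits.bit_length()) if bits >> i & 1]
```

The first call to `bitset("c3")` scans all six terms and returns a bitset for docs 1 and 2. A second call with the same prefix returns the cached value without touching the terms again.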
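Example: search for documents whose test_field starts with C3. This example uses match_phrase_prefix rather than a plain prefix query. match_phrase_prefix analyzes its input (lowercasing C3 to c3) and scores the hits, which is why the _score values in the result are not 1.0.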

GET /test_index/_search
{
  "query": {
    "match_phrase_prefix": {"test_field": "C3"}
  }
}

Result:

{
  "took" : 16,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 2,
      "relation" : "eq"
    },
    "max_score" : 0.9808292,
    "hits" : [
      {
        "_index" : "test_index",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 0.9808292,
        "_source" : {"test_field" : "C3D0-KD345"}
      },
      {
        "_index" : "test_index",
        "_type" : "_doc",
        "_id" : "2",
        "_score" : 0.9808292,
        "_source" : {"test_field" : "C3K5-DFG65"}
      }
    ]
  }
}
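The claim that a shorter prefix costs more can be checked with a minimal Python sketch of prefix expansion over a sorted term list. This follows the same idea as walking Lucene's sorted term dictionary, but is greatly simplified. It assumes the lowercase terms produced from the three documents. The query text is lowercased before matching, mimicking the analysis match_phrase_prefix applies to its input. `expand_prefix` and `prefix_search` are hypothetical helpers.

```python
import bisect

# Term dictionary: term -> doc ids (lowercased terms of the three documents).
TERM_DICT = {
    "c3d0": {"1"}, "c3k5": {"2"}, "c4i8": {"3"},
    "dfg65": {"2"}, "kd345": {"1"}, "ui365": {"3"},
}
SORTED_TERMS = sorted(TERM_DICT)


def expand_prefix(prefix):
    """Return every term that starts with `prefix`.

    Terms sharing a prefix are contiguous in sorted order, so binary-search
    the first candidate and walk forward until a term stops matching."""
    start = bisect.bisect_left(SORTED_TERMS, prefix)
    matched = []
    for term in SORTED_TERMS[start:]:
        if not term.startswith(prefix):
            break
        matched.append(term)
    return matched


def prefix_search(text):
    """Lowercase the input (mimicking analysis), expand it, and union the doc ids."""
    docs = set()
    for term in expand_prefix(text.lower()):
        docs |= TERM_DICT[term]
    return sorted(docs)
```

`prefix_search("C3")` expands to the two terms c3d0 and c3k5 and returns docs 1 and 2, matching the result above. The one-character prefix "C" already expands to three terms. On a real index with millions of terms, that gap grows accordingly.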

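Wildcard search:

Wildcard search is similar to prefix search but more powerful. It also has to scan the whole inverted index, so its performance is poor as well. Two wildcard characters are supported:
?: matches exactly one arbitrary character
*: matches zero or more arbitrary characters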

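Example: search for documents matching the wildcard pattern *4?. The pattern is matched against each indexed (lowercased) term rather than the whole field value, so only document 1 matches, through its term kd345.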

GET /test_index/_search
{
  "query": {
    "wildcard": {
      "test_field": {"value": "*4?"}
    }
  }
}

Output:

{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "test_index",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 1.0,
        "_source" : {"test_field" : "C3D0-KD345"}
      }
    ]
  }
}
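A minimal Python sketch of how the two wildcard characters behave when a pattern is matched against each indexed term follows. It assumes the same lowercase terms as the earlier examples. It translates ? to the regex `.` and * to `.*`, escapes every other character, and requires the pattern to match a whole term. `wildcard_to_regex` and `wildcard_search` are hypothetical helpers, not Elasticsearch APIs.

```python
import re

# Lowercased terms of each document (what the wildcard is matched against).
DOC_TERMS = {
    "1": ["c3d0", "kd345"],
    "2": ["c3k5", "dfg65"],
    "3": ["c4i8", "ui365"],
}


def wildcard_to_regex(pattern):
    """Translate a wildcard pattern: ? -> exactly one char, * -> any run of
    chars (possibly empty). All other characters are matched literally."""
    parts = []
    for ch in pattern:
        if ch == "?":
            parts.append(".")
        elif ch == "*":
            parts.append(".*")
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.DOTALL)


def wildcard_search(pattern):
    """Return ids of docs having at least one term fully matched by `pattern`."""
    rx = wildcard_to_regex(pattern)
    return sorted(doc_id for doc_id, terms in DOC_TERMS.items()
                  if any(rx.fullmatch(t) for t in terms))
```

`wildcard_search("*4?")` returns only document 1. The term kd345 has a 4 followed by exactly one more character, whereas c4i8 has two characters after its 4.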

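Regex search:

regexp is even more powerful than wildcard search, but it too scans the whole inverted index, so its performance is also very poor. Commonly used operators:
[0-9]: a digit in the given range
[a-z]: a letter in the given range
.: any single character
+: the preceding expression occurs one or more times
*: the preceding expression occurs zero or more times
{n}: the preceding expression occurs exactly n times (n is a non-negative integer)
Example: search for documents matching .*[a-z]{3}[0-9]{2}. Lucene regular expressions are anchored, so the pattern must match an entire indexed term. Here only the term dfg65 of document 2 qualifies.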

GET /test_index/_search
{
  "query": {
    "regexp": {
      "test_field": {"value": ".*[a-z]{3}[0-9]{2}"}
    }
  }
}

Output:

{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "test_index",
        "_type" : "_doc",
        "_id" : "2",
        "_score" : 1.0,
        "_source" : {"test_field" : "C3K5-DFG65"}
      }
    ]
  }
}
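Two details of the regexp example can be reproduced with a minimal Python sketch. First, Lucene regexps are anchored, modeled here with `re.fullmatch`. Second, the lowercase [a-z] class only matches because it runs against the analyzed, lowercased terms, not the original uppercase value. The sketch assumes that Python's `re` syntax agrees with Lucene's for the operators listed above. It does for these, although the two dialects differ elsewhere. `terms` and `regexp_search` are hypothetical helpers.

```python
import re

DOCS = {
    "1": "C3D0-KD345",
    "2": "C3K5-DFG65",
    "3": "C4I8-UI365",
}


def terms(value):
    """Approximate the standard analyzer: split on non-alphanumerics, lowercase."""
    return [t.lower() for t in re.split(r"[^0-9A-Za-z]+", value) if t]


def regexp_search(pattern, analyzed=True):
    """Ids of docs where some term (or the raw value, if analyzed=False) is
    matched in full by `pattern`, mimicking Lucene's anchored regexps."""
    rx = re.compile(pattern)
    hits = []
    for doc_id, value in DOCS.items():
        candidates = terms(value) if analyzed else [value]
        if any(rx.fullmatch(c) for c in candidates):
            hits.append(doc_id)
    return hits
```

`regexp_search(".*[a-z]{3}[0-9]{2}")` returns only document 2, as in the output above. Run against the raw uppercase values, the same pattern matches nothing. Because of the anchoring, a pattern like `[a-z]{3}` matches no term here, even though dfg65 contains three consecutive letters.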
