Elasticsearch Search Suggestions and Context Hints

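Search Suggestions

Search suggestions are implemented with the Suggesters API.

The idea is to break the input text into tokens and then look up similar terms in the index's term dictionary, which are returned as suggestions.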
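To make the lookup step concrete, here is a minimal Python sketch, not Elasticsearch's actual implementation. It compares an input token against a small hand-written dictionary using plain Levenshtein distance. The names `edit_distance` and `suggest_terms`, the sample dictionary, and the score formula are assumptions for illustration; the formula was picked because it reproduces the 0.9166667 score that appears in the sample response later in this post.

```python
def edit_distance(a: str, b: str) -> int:
    """Plain Levenshtein distance computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def suggest_terms(token, dictionary, max_edits=2):
    """Return (term, score) pairs for terms within max_edits of token."""
    results = []
    for term in dictionary:
        distance = edit_distance(token, term)
        if 0 < distance <= max_edits:
            # Assumed similarity score: 1 - distance / shorter length.
            score = 1.0 - distance / min(len(token), len(term))
            results.append((term, score))
    return sorted(results, key=lambda pair: pair[1], reverse=True)


# Hypothetical term dictionary, loosely based on the sample documents.
dictionary = ["lucene", "elasticsearch", "elastic", "elk",
              "stack", "rock", "rocks", "solid"]

for token in "elasticserch rocks".split():
    print(token, suggest_terms(token, dictionary))
```

A real index stores its terms in a far more compact structure and avoids scanning every term, so this linear scan only illustrates the matching rule.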

Elasticsearch provides four kinds of suggesters for different scenarios.

  • Term Suggester
  • Phrase Suggester
  • Completion Suggester
  • Context Suggester

Term Suggester

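This works like the Google search engine: given a misspelled word such as elasticserch, the engine still offers a helpful search suggestion.

Implementing this in Elasticsearch is straightforward.

  1. Create an index and write some documents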

    POST articles/_bulk
    {"index" : {} }
    {"body": "lucene is very cool"}
    {"index" : {} }
    {"body": "Elasticsearch builds on top of lucene"}
    {"index" : {} }
    {"body": "Elasticsearch rocks"}
    {"index" : {} }
    {"body": "elastic is the company behind ELK stack"}
    {"index" : {} }
    {"body": "Elk stack rocks"}
    {"index" : {} }
    {"body": "elasticsearch is rock solid"}
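  2. Search the documents and call the suggest API.

    There are three suggest modes (suggest_mode):

    • missing: only suggest for input terms that are not in the index; a term that already exists gets no suggestions
    • popular: only suggest terms that occur in more documents than the input term
    • always: suggest regardless of whether the input term exists in the index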

      POST /articles/_search
      {
        "size": 1,
        "query": {
          "match": {"body": "elasticserch"}
        },
        "suggest": {
          "term-suggestion": {
            "text": "elasticserch",
            "term": {
              "suggest_mode": "missing",
              "field": "body"
            }
          }
        }
      }
  3. Response

    {
      "took" : 6,
      "timed_out" : false,
      "_shards" : {
        "total" : 1,
        "successful" : 1,
        "skipped" : 0,
        "failed" : 0
      },
      "hits" : {
        "total" : {
          "value" : 0,
          "relation" : "eq"
        },
        "max_score" : null,
        "hits" : []},
      "suggest" : {
        "term-suggestion" : [
          {
            "text" : "elasticserch",
            "offset" : 0,
            "length" : 12,
            "options" : [
              {
                "text" : "elasticsearch",
                "score" : 0.9166667,
                "freq" : 3
              }
            ]
          }
        ]
      }
    }
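The three suggest modes listed in step 2 can be sketched as a small filter over candidate terms. This is a simplified illustration, not Elasticsearch's code: the function name `apply_suggest_mode` is hypothetical, the candidate lists are assumed to have been found already, and the document frequencies are counted by hand from the six sample documents.

```python
# Document frequencies counted by hand from the six sample documents.
DOC_FREQ = {"elasticsearch": 3, "lucene": 2, "rocks": 2, "rock": 1, "elastic": 1}


def apply_suggest_mode(term, candidates, doc_freq, mode="missing"):
    """Filter candidate corrections for `term` according to the suggest mode."""
    term_freq = doc_freq.get(term, 0)
    if mode == "missing":
        # Only terms that are absent from the index get suggestions.
        return [] if term_freq > 0 else list(candidates)
    if mode == "popular":
        # Keep only candidates found in more documents than the input term.
        return [c for c in candidates if doc_freq.get(c, 0) > term_freq]
    if mode == "always":
        return list(candidates)
    raise ValueError(f"unknown suggest mode: {mode}")


# "elasticserch" is not indexed, so "missing" lets the correction through.
print(apply_suggest_mode("elasticserch", ["elasticsearch"], DOC_FREQ, "missing"))

# "rock" is indexed: "missing" suppresses suggestions, while "popular" still
# proposes "rocks" because it occurs in more documents.
for mode in ("missing", "popular", "always"):
    print(mode, apply_suggest_mode("rock", ["rocks"], DOC_FREQ, mode))
```

This explains why the request above returns a suggestion with `suggest_mode` set to `missing`: the misspelled term does not occur in the index at all.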

Phrase Suggester

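The Phrase Suggester adds extra logic on top of the Term Suggester: it selects whole corrected phrases rather than correcting individual tokens in isolation.

Some of its parameters:

  • max_errors: the maximum number of terms that may be misspelled (a value below 1 is treated as a fraction of the terms in the phrase)
  • confidence: a factor applied to the score of the input phrase to form a threshold; only candidates that score above the threshold are returned, which limits the number of results. Defaults to 1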

    POST /articles/_search
    {
      "suggest": {
        "my-suggestion": {
          "text": "lucne and elasticsear rock hello world",
          "phrase": {
            "field": "body",
            "max_errors":2,
            "confidence":2,
            "direct_generator":[{
              "field":"body",
              "suggest_mode":"missing"
            }],
            "highlight": {
              "pre_tag": "<em>",
              "post_tag": "</em>"
            }
          }
        }
      }
    }
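The roles of `max_errors` and `confidence` can be illustrated with a toy model. The sketch below is a deliberate simplification and an assumption throughout: it scores phrases with a smoothed unigram model over the six sample documents, whereas the real Phrase Suggester relies on an n-gram language model. All function names are hypothetical, and the shorter input phrase is used only to keep the example small.

```python
from itertools import product

DOCS = [
    "lucene is very cool",
    "Elasticsearch builds on top of lucene",
    "Elasticsearch rocks",
    "elastic is the company behind ELK stack",
    "Elk stack rocks",
    "elasticsearch is rock solid",
]

# Term frequencies over the lowercased, whitespace-tokenized documents.
FREQ = {}
for doc in DOCS:
    for tok in doc.lower().split():
        FREQ[tok] = FREQ.get(tok, 0) + 1
TOTAL = sum(FREQ.values())


def edit_distance(a, b):
    """Plain Levenshtein distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def candidates(token, max_edits=2, prefix_length=1):
    """The token itself plus indexed terms that share its prefix and are close."""
    close = [t for t in FREQ
             if t != token
             and t[:prefix_length] == token[:prefix_length]
             and edit_distance(token, t) <= max_edits]
    return [token] + close


def phrase_score(tokens, alpha=0.5):
    """Toy score: product of additively smoothed unigram probabilities."""
    score = 1.0
    for tok in tokens:
        score *= (FREQ.get(tok, 0) + alpha) / (TOTAL + alpha * (len(FREQ) + 1))
    return score


def phrase_suggest(text, max_errors=2, confidence=2.0):
    tokens = text.lower().split()
    threshold = confidence * phrase_score(tokens)
    results = []
    for combo in product(*(candidates(t) for t in tokens)):
        changed = sum(1 for a, b in zip(tokens, combo) if a != b)
        # Keep phrases that fix at most max_errors terms and beat the threshold.
        if 0 < changed <= max_errors and phrase_score(combo) > threshold:
            results.append((phrase_score(combo), " ".join(combo)))
    return [phrase for _, phrase in sorted(results, reverse=True)]


print(phrase_suggest("lucne and elasticsear rock"))
```

Raising `confidence` removes weaker corrections first, and lowering `max_errors` to 1 rules out any phrase that changes two terms at once.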

Completion Suggester

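Autocomplete has to send a query to the backend on every character the user types and find matches immediately.

That makes it very demanding on performance.

Elasticsearch encodes the analyzed data into an FST (finite state transducer) that is stored alongside the index. The FST is loaded entirely into memory, so lookups are very fast.

An FST only supports prefix lookups.

The result is a hint feature similar to the suggestions in Baidu's search box.

Implementing this in Elasticsearch is also straightforward.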

  1. Create the index

    PUT titles
    {
      "mappings": {
        "properties": {
          "title_completion":{"type": "completion"}
        }
      }
    }
  2. Index the documents

    POST titles/_bulk
    {"index" : {} }
    {"title_completion": "php 是什么"}
    {"index" : {} }
    {"title_completion": "php 是世界上最好的语言"}
    {"index" : {} }
    {"title_completion": "php 货币"}
    {"index" : {} }
    {"title_completion": "php 面试题 2019"}
  3. Search the data

    POST titles/_search?pretty
    {
      "size": 0,
      "suggest": {
        "article-suggester": {
          "prefix": "php",
          "completion": {"field": "title_completion"}
        }
      }
    }
  4. Response

    {
      "took" : 173,
      "timed_out" : false,
      "_shards" : {
        "total" : 1,
        "successful" : 1,
        "skipped" : 0,
        "failed" : 0
      },
      "hits" : {
        "total" : {
          "value" : 0,
          "relation" : "eq"
        },
        "max_score" : null,
        "hits" : []},
      "suggest" : {
        "article-suggester" : [
          {
            "text" : "php",
            "offset" : 0,
            "length" : 3,
            "options" : [
              {
                "text" : "php 是世界上最好的语言",
                "_index" : "titles",
                "_type" : "_doc",
                "_id" : "pv8V8WwBISxFcLcZfDXl",
                "_score" : 1.0,
                "_source" : {"title_completion" : "php 是世界上最好的语言"}
              },
              {
                "text" : "php 是什么",
                "_index" : "titles",
                "_type" : "_doc",
                "_id" : "pf8V8WwBISxFcLcZfDXl",
                "_score" : 1.0,
                "_source" : {"title_completion" : "php 是什么"}
              },
              {
                "text" : "php 货币",
                "_index" : "titles",
                "_type" : "_doc",
                "_id" : "p_8V8WwBISxFcLcZfDXl",
                "_score" : 1.0,
                "_source" : {"title_completion" : "php 货币"}
              },
              {
                "text" : "php 面试题 2019",
                "_index" : "titles",
                "_type" : "_doc",
                "_id" : "qP8V8WwBISxFcLcZfDXl",
                "_score" : 1.0,
                "_source" : {"title_completion" : "php 面试题 2019"}
              }
            ]
          }
        ]
      }
    }
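The prefix-only behaviour mentioned earlier can be illustrated in a few lines of Python. This sketch does not build an FST: a sorted list with binary search stands in for it, and the class name `PrefixIndex` is hypothetical. What it shares with the real structure is that lookups start from the beginning of the stored input.

```python
import bisect


class PrefixIndex:
    """Sorted-list stand-in for an in-memory prefix structure."""

    def __init__(self, inputs):
        self._keys = sorted(inputs)

    def complete(self, prefix, size=5):
        """Return up to `size` stored inputs that start with `prefix`."""
        start = bisect.bisect_left(self._keys, prefix)
        matches = []
        for key in self._keys[start:]:
            if len(matches) == size or not key.startswith(prefix):
                break
            matches.append(key)
        return matches


index = PrefixIndex([
    "php 是什么",
    "php 是世界上最好的语言",
    "php 货币",
    "php 面试题 2019",
])

print(index.complete("php"))     # all four titles start with "php"
print(index.complete("面试题"))  # empty: the text occurs mid-title, not as a prefix
```

If suggestions must also match words in the middle of a title, one option is to index several inputs per document, since a completion field accepts an array of inputs.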

Context Suggester

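The Context Suggester extends the Completion Suggester with context information.

For example:

In an electronics store, a user who types 苹果 (apple) wants to find Apple laptops …
In a fruit store, a user who types 苹果 wants to find red apples, green apples …

  1. Create the index with a custom mapping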

    PUT comments
    {
      "mappings": {
        "properties": {
          "comment_autocomplete": {
            "type": "completion",
            "contexts": [
              {
                "type": "category",
                "name": "comment_category"
              }
            ]
          }
        }
      }
    }
  2. Add context information to each document

    POST comments/_doc
    {
      "comment":"苹果电脑",
      "comment_autocomplete":{"input":["苹果电脑"],
        "contexts":{"comment_category":"电器商城"}
      }
    }
    
    POST comments/_doc
    {
      "comment":"红红的冰糖心苹果",
      "comment_autocomplete":{"input":["苹果"],
        "contexts":{"comment_category":"水果商城"}
      }
    }
  3. Run a suggestion query with a context

    POST comments/_search
    {
      "suggest": {
        "MY_SUGGESTION": {
          "prefix": "苹",
          "completion":{
            "field":"comment_autocomplete",
            "contexts":{"comment_category":"电器商城"}
          }
        }
      }
    }
  4. Response

    {
      "took" : 1,
      "timed_out" : false,
      "_shards" : {
        "total" : 1,
        "successful" : 1,
        "skipped" : 0,
        "failed" : 0
      },
      "hits" : {
        "total" : {
          "value" : 0,
          "relation" : "eq"
        },
        "max_score" : null,
        "hits" : []},
      "suggest" : {
        "MY_SUGGESTION" : [
          {
            "text" : "苹",
            "offset" : 0,
            "length" : 1,
            "options" : [
              {
                "text" : "苹果电脑",
                "_index" : "comments",
                "_type" : "_doc",
                "_id" : "qf_s9WwBISxFcLcZszWh",
                "_score" : 1.0,
                "_source" : {
                  "comment" : "苹果电脑",
                  "comment_autocomplete" : {
                    "input" : ["苹果电脑"],
                    "contexts" : {"comment_category" : "电器商城"}
                  }
                },
                "contexts" : {
                  "comment_category" : ["电器商城"]
                }
              }
            ]
          }
        ]
      }
    }
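Conceptually, a category context acts as a filter applied alongside the prefix match. The sketch below models only that idea and is not how Elasticsearch stores contexts internally; the `ENTRIES` list mirrors the two documents indexed above, and the function name `context_suggest` is hypothetical.

```python
# Each entry pairs a completion input with its category context,
# mirroring the two documents indexed in step 2.
ENTRIES = [
    {"input": "苹果电脑", "category": "电器商城"},
    {"input": "苹果", "category": "水果商城"},
]


def context_suggest(prefix, category, entries=ENTRIES):
    """Return inputs that match the prefix within the given category only."""
    return [
        entry["input"]
        for entry in entries
        if entry["category"] == category and entry["input"].startswith(prefix)
    ]


print(context_suggest("苹", "电器商城"))  # only the electronics-store entry
print(context_suggest("苹", "水果商城"))  # only the fruit-store entry
```

The same prefix therefore yields different suggestions depending on which store the user is browsing.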

Appendix

  • Official suggesters documentation