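Elasticsearch Search Suggestions and Context Hints
Search Suggestions
Search suggestions are implemented through the Suggester API.
The idea is to break the input text into tokens, then look up similar terms in the term dictionary and return them.
Elasticsearch provides four kinds of suggesters for different scenarios: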
- Term Suggester
- Phrase Suggester
- Completion Suggester
- Context Suggester
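As a rough illustration of the principle described above (tokenize the input, then look for similar terms in a dictionary), here is a minimal Python sketch. It uses plain Levenshtein distance and a hand-written dictionary; `edit_distance`, `suggest_terms`, and `max_edits` are names invented for this example, and the real implementation inside Elasticsearch is more elaborate.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed row by row with dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def suggest_terms(text, dictionary, max_edits=2):
    """Split text into tokens and list dictionary terms close to each token."""
    suggestions = {}
    for token in text.lower().split():
        close = sorted(
            (edit_distance(token, term), term)
            for term in dictionary
            if term != token and edit_distance(token, term) <= max_edits
        )
        suggestions[token] = [term for _, term in close]
    return suggestions


# Hypothetical term dictionary, standing in for the terms of an index
dictionary = {"lucene", "elasticsearch", "elastic", "rocks", "stack"}
print(suggest_terms("elasticserch rocks", dictionary))
# {'elasticserch': ['elasticsearch'], 'rocks': []}
```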
Term Suggester
This works like the Google search engine: I typed a misspelled word, elasticserch, and the engine helpfully offered a search suggestion.
Implementing this in Elasticsearch is simple.
- Create an index and write some documents
POST articles/_bulk
{"index" : {} }
{"body": "lucene is very cool"}
{"index" : {} }
{"body": "Elasticsearch builds on top of lucene"}
{"index" : {} }
{"body": "Elasticsearch rocks"}
{"index" : {} }
{"body": "elastic is the company behind ELK stack"}
{"index" : {} }
{"body": "Elk stack rocks"}
{"index" : {} }
{"body": "elasticsearch is rock solid"}
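- Search the documents and call the suggest API.
There are three suggestion modes (suggest_mode):
  - missing: suggest only for input terms that are not in the index (the default)
  - popular: suggest only terms that occur in more documents than the input term
  - always: suggest regardless of whether the input term exists in the index
POST /articles/_search
{
  "size": 1,
  "query": {
    "match": {"body": "elasticserch"}
  },
  "suggest": {
    "term-suggestion": {
      "text": "elasticserch",
      "term": {
        "suggest_mode": "missing",
        "field": "body"
      }
    }
  }
}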
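The three suggestion modes can be pictured as a filter applied to the candidate corrections of each input term. The sketch below is only an approximation of that behaviour: `filter_by_mode` is a hypothetical helper, and the document frequencies are counted by hand from the six sample documents, assuming an analyzer that lowercases but does not stem.

```python
def filter_by_mode(token, candidates, doc_freq, mode="missing"):
    """Apply a suggest_mode-like rule to the candidate corrections of one token."""
    token_freq = doc_freq.get(token, 0)
    if mode == "missing":
        # The token already exists in the index: offer nothing
        return [] if token_freq > 0 else list(candidates)
    if mode == "popular":
        # Keep only candidates found in more documents than the token itself
        return [c for c in candidates if doc_freq.get(c, 0) > token_freq]
    if mode == "always":
        return list(candidates)
    raise ValueError(f"unknown suggest mode: {mode}")


# Number of sample documents containing each term (hand-counted assumption)
doc_freq = {"elasticsearch": 3, "rocks": 2, "rock": 1}

for mode in ("missing", "popular", "always"):
    print(mode,
          filter_by_mode("rock", ["rocks"], doc_freq, mode),
          filter_by_mode("elasticserch", ["elasticsearch"], doc_freq, mode))
# missing [] ['elasticsearch']
# popular ['rocks'] ['elasticsearch']
# always ['rocks'] ['elasticsearch']
```

The term rock is the interesting case: it exists in the index, so missing stays silent, while popular and always still propose rocks.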
- Response
{
  "took" : 6,
  "timed_out" : false,
  "_shards" : { "total" : 1, "successful" : 1, "skipped" : 0, "failed" : 0 },
  "hits" : {
    "total" : { "value" : 0, "relation" : "eq" },
    "max_score" : null,
    "hits" : []
  },
  "suggest" : {
    "term-suggestion" : [
      {
        "text" : "elasticserch",
        "offset" : 0,
        "length" : 12,
        "options" : [
          { "text" : "elasticsearch", "score" : 0.9166667, "freq" : 3 }
        ]
      }
    ]
  }
}
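One possible way to use such a response is to build a "did you mean" string by replacing each flagged span with its best option. The helper below is a sketch, not part of any Elasticsearch client: `apply_term_suggestions` is a made-up name, and it assumes that `offset` and `length` can be used directly as Python string indices.

```python
def apply_term_suggestions(text, entries):
    """Replace each flagged span of text with its highest-scoring option."""
    corrected = text
    # Go from right to left so earlier offsets stay valid after a replacement
    for entry in sorted(entries, key=lambda e: e["offset"], reverse=True):
        if not entry["options"]:
            continue  # nothing suggested for this token, keep it unchanged
        best = max(entry["options"], key=lambda o: o["score"])
        start = entry["offset"]
        end = start + entry["length"]
        corrected = corrected[:start] + best["text"] + corrected[end:]
    return corrected


# The "suggest" part of the response shown above
response = {
    "suggest": {
        "term-suggestion": [
            {
                "text": "elasticserch",
                "offset": 0,
                "length": 12,
                "options": [
                    {"text": "elasticsearch", "score": 0.9166667, "freq": 3}
                ],
            }
        ]
    }
}

print(apply_term_suggestions("elasticserch", response["suggest"]["term-suggestion"]))
# elasticsearch
```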
Phrase Suggester
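The Phrase Suggester adds extra logic on top of the Term Suggester: it corrects a whole phrase rather than individual terms.
Some of its parameters:
- max_errors: the maximum number of terms that may be misspelled (a value below 1 is read as a fraction of the query terms); the default is 1
- confidence: a factor applied to the input phrase's score to form a threshold; only candidates that score above the threshold are returned, so a higher value yields fewer results. The default is 1
POST /articles/_search
{
  "suggest": {
    "my-suggestion": {
      "text": "lucne and elasticsear rock hello world",
      "phrase": {
        "field": "body",
        "max_errors": 2,
        "confidence": 2,
        "direct_generator": [
          { "field": "body", "suggest_mode": "missing" }
        ],
        "highlight": { "pre_tag": "<em>", "post_tag": "</em>" }
      }
    }
  }
}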
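To make the effect of confidence more concrete, the following sketch models it as a threshold on candidate scores. The scores are invented purely for illustration and `filter_by_confidence` is a hypothetical helper; the real scoring is done inside Elasticsearch by a language model and is not reproduced here.

```python
def filter_by_confidence(input_score, candidates, confidence=1.0):
    """Keep candidates whose score exceeds confidence * score of the input phrase."""
    threshold = confidence * input_score
    kept = [(phrase, score) for phrase, score in candidates if score > threshold]
    # Best-scoring correction first
    return sorted(kept, key=lambda item: item[1], reverse=True)


# Made-up scores, only to show how the threshold moves
input_score = 0.010
candidates = [
    ("lucne and elasticsearch rock hello world", 0.012),
    ("lucene and elasticsearch rock hello world", 0.025),
]

print(len(filter_by_confidence(input_score, candidates, confidence=1)))  # 2
print(len(filter_by_confidence(input_score, candidates, confidence=2)))  # 1
```

With these numbers, raising confidence from 1 to 2 drops the weaker correction and keeps only the phrase in which both misspelled terms are fixed.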
Completion Suggester
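This is the autocomplete feature: every time the user types a character, a query must be sent to the backend immediately to find matches.
That makes it very demanding on performance.
Elasticsearch encodes the analyzed data into an FST (finite state transducer) stored alongside the index; the FST is loaded entirely into memory, so lookups are very fast.
An FST supports prefix lookups only.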
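The prefix-only restriction can be illustrated with a much simpler structure than an FST. In the sketch below a sorted list with binary search stands in for the FST; `PrefixIndex` and its methods are invented for this example and say nothing about how Lucene actually lays out its data.

```python
import bisect


class PrefixIndex:
    """In-memory prefix lookup over a sorted list (a stand-in for an FST)."""

    def __init__(self, entries):
        self._entries = sorted(entries)

    def complete(self, prefix, size=5):
        # Binary search jumps to the first entry that is >= prefix
        start = bisect.bisect_left(self._entries, prefix)
        results = []
        for entry in self._entries[start:]:
            if len(results) >= size or not entry.startswith(prefix):
                break
            results.append(entry)
        return results


index = PrefixIndex([
    "php 是什么",
    "php 是世界上最好的语言",
    "php 货币",
    "php 面试题 2019",
])

print(index.complete("php 是"))   # the two entries starting with "php 是"
print(index.complete("面试题"))   # [] because "面试题" is not a prefix of any entry
```

The second call shows the limitation: text in the middle of an entry cannot be found, only text at its start.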
This resembles the search hints that Baidu shows.
Implementing it in Elasticsearch is also simple.
- Create the index
PUT titles
{
  "mappings": {
    "properties": {
      "title_completion": {"type": "completion"}
    }
  }
}
- Write the documents
POST titles/_bulk
{"index" : {} }
{"title_completion": "php 是什么"}
{"index" : {} }
{"title_completion": "php 是世界上最好的语言"}
{"index" : {} }
{"title_completion": "php 货币"}
{"index" : {} }
{"title_completion": "php 面试题 2019"}
- Search the data
POST titles/_search?pretty
{
  "size": 0,
  "suggest": {
    "article-suggester": {
      "prefix": "php",
      "completion": {"field": "title_completion"}
    }
  }
}
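Since a request goes out on every keystroke, it may help to see which request bodies that produces. The sketch below only builds the JSON bodies and sends nothing; `completion_request` and `requests_while_typing` are hypothetical helper names, while the field and suggester names are the ones used in the request above.

```python
import json


def completion_request(prefix, field="title_completion", name="article-suggester"):
    """Build the body of a completion suggest request for one prefix."""
    return {
        "size": 0,
        "suggest": {
            name: {
                "prefix": prefix,
                "completion": {"field": field},
            }
        },
    }


def requests_while_typing(typed):
    """One request body per keystroke: "p", "ph", "php", ..."""
    return [completion_request(typed[:i]) for i in range(1, len(typed) + 1)]


for body in requests_while_typing("php"):
    print(json.dumps(body, ensure_ascii=False))
```

Typing php therefore triggers three requests, which is why lookup speed matters so much for this suggester.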
- Response
{
  "took" : 173,
  "timed_out" : false,
  "_shards" : { "total" : 1, "successful" : 1, "skipped" : 0, "failed" : 0 },
  "hits" : {
    "total" : { "value" : 0, "relation" : "eq" },
    "max_score" : null,
    "hits" : []
  },
  "suggest" : {
    "article-suggester" : [
      {
        "text" : "php",
        "offset" : 0,
        "length" : 3,
        "options" : [
          {
            "text" : "php 是世界上最好的语言",
            "_index" : "titles",
            "_type" : "_doc",
            "_id" : "pv8V8WwBISxFcLcZfDXl",
            "_score" : 1.0,
            "_source" : {"title_completion" : "php 是世界上最好的语言"}
          },
          {
            "text" : "php 是什么",
            "_index" : "titles",
            "_type" : "_doc",
            "_id" : "pf8V8WwBISxFcLcZfDXl",
            "_score" : 1.0,
            "_source" : {"title_completion" : "php 是什么"}
          },
          {
            "text" : "php 货币",
            "_index" : "titles",
            "_type" : "_doc",
            "_id" : "p_8V8WwBISxFcLcZfDXl",
            "_score" : 1.0,
            "_source" : {"title_completion" : "php 货币"}
          },
          {
            "text" : "php 面试题 2019",
            "_index" : "titles",
            "_type" : "_doc",
            "_id" : "qP8V8WwBISxFcLcZfDXl",
            "_score" : 1.0,
            "_source" : {"title_completion" : "php 面试题 2019"}
          }
        ]
      }
    ]
  }
}
Context Suggester
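The Context Suggester extends the Completion Suggester with context information.
For example:
In an electronics store, you type 苹果 (apple) and want to find an Apple laptop …
In a fruit store, you type 苹果 and want to find red apples, green apples …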
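The two scenarios above amount to a prefix lookup that is restricted to entries tagged with the current context. Here is a minimal pure-Python sketch of that idea; `suggest_in_context` and the in-memory `entries` list are assumptions made for illustration, not how Elasticsearch stores contexts.

```python
def suggest_in_context(prefix, category, entries):
    """Return inputs that start with prefix and are tagged with category."""
    return [
        text
        for text, categories in entries
        if category in categories and text.startswith(prefix)
    ]


# (suggestion input, set of categories it belongs to)
entries = [
    ("苹果电脑", {"电器商城"}),  # Apple computer, electronics store
    ("苹果", {"水果商城"}),      # apple, fruit store
]

print(suggest_in_context("苹", "电器商城", entries))  # ['苹果电脑']
print(suggest_in_context("苹", "水果商城", entries))  # ['苹果']
```

The same prefix gives different suggestions depending on which store the user is in.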
- Create the index with a custom mapping
PUT comments
{
  "mappings": {
    "properties": {
      "comment_autocomplete": {
        "type": "completion",
        "contexts": [
          { "type": "category", "name": "comment_category" }
        ]
      }
    }
  }
}
- Add context information to each document
POST comments/_doc
{
  "comment": "苹果电脑",
  "comment_autocomplete": {
    "input": ["苹果电脑"],
    "contexts": {"comment_category": "电器商城"}
  }
}

POST comments/_doc
{
  "comment": "红红的冰糖心苹果",
  "comment_autocomplete": {
    "input": ["苹果"],
    "contexts": {"comment_category": "水果商城"}
  }
}
- Run a suggestion query with a context
POST comments/_search
{
  "suggest": {
    "MY_SUGGESTION": {
      "prefix": "苹",
      "completion": {
        "field": "comment_autocomplete",
        "contexts": {"comment_category": "电器商城"}
      }
    }
  }
}
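If the query has to cover more than one store, the category context also accepts a list of values. The helper below is a sketch that only builds such a request body; `context_suggest_request` is an invented name, and the defaults simply repeat the field, context, and suggester names from the request above.

```python
import json


def context_suggest_request(prefix, categories,
                            field="comment_autocomplete",
                            context_name="comment_category",
                            name="MY_SUGGESTION"):
    """Build a completion suggest body filtered by one or more categories."""
    return {
        "suggest": {
            name: {
                "prefix": prefix,
                "completion": {
                    "field": field,
                    # A list lets suggestions match any of the given categories
                    "contexts": {context_name: list(categories)},
                },
            }
        }
    }


body = context_suggest_request("苹", ["电器商城", "水果商城"])
print(json.dumps(body, ensure_ascii=False, indent=2))
```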
- Response
{
  "took" : 1,
  "timed_out" : false,
  "_shards" : { "total" : 1, "successful" : 1, "skipped" : 0, "failed" : 0 },
  "hits" : {
    "total" : { "value" : 0, "relation" : "eq" },
    "max_score" : null,
    "hits" : []
  },
  "suggest" : {
    "MY_SUGGESTION" : [
      {
        "text" : "苹",
        "offset" : 0,
        "length" : 1,
        "options" : [
          {
            "text" : "苹果",
            "_index" : "comments",
            "_type" : "_doc",
            "_id" : "qf_s9WwBISxFcLcZszWh",
            "_score" : 1.0,
            "_source" : {
              "comment" : "苹果电脑",
              "comment_autocomplete" : {
                "input" : ["苹果电脑"],
                "contexts" : {"comment_category" : "电器商城"}
              }
            },
            "contexts" : {
              "comment_category" : ["电器商城"]
            }
          }
        ]
      }
    ]
  }
}
Appendix
- Official suggesters documentation