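Preface
- Recommended course: 阮一鸣 (Ruan Yiming), 《Elasticsearch 核心技术与实战》
- This article applies to Elasticsearch 7.x
- Synonyms can be applied either when the index is built (index-time synonyms) or when searching (search-time synonyms); they are usually applied at search time
- This article covers search-time synonyms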
Synonym file format
ipod, i-pod, i pod => ipod
马铃薯, 土豆, potato
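The two rule styles above behave differently, and a small parser makes the difference concrete. The following is only a minimal sketch, assuming the Solr-style syntax shown in the sample file: a line containing `=>` is an explicit mapping, while a plain comma-separated line is a group of equivalent terms (with the filter's default `expand` setting, each term maps to the whole group). The function name `parse_synonym_rules` is made up for illustration and is not part of Elasticsearch.

```python
def parse_synonym_rules(text):
    """Parse Solr-style synonym rules into (inputs, outputs) pairs."""
    rules = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        if "=>" in line:
            # Explicit mapping: terms on the left are replaced by the right side.
            left, right = line.split("=>", 1)
            inputs = [t.strip() for t in left.split(",") if t.strip()]
            outputs = [t.strip() for t in right.split(",") if t.strip()]
        else:
            # Equivalent synonyms: every term maps to the whole group.
            inputs = [t.strip() for t in line.split(",") if t.strip()]
            outputs = list(inputs)
        rules.append((inputs, outputs))
    return rules


SAMPLE = "ipod, i-pod, i pod => ipod\n马铃薯, 土豆, potato\n"
rules = parse_synonym_rules(SAMPLE)
for inputs, outputs in rules:
    print(inputs, "->", outputs)
```

Under this reading, the first rule collapses three spellings into `ipod`, while the second keeps all three terms interchangeable.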
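Experiment steps
Add the synonym file
- In the Elasticsearch config directory, create an analysis directory and add the synonym file synonym.txt under analysis
- Because the synonyms are applied at search time, there is no need to restart Elasticsearch or to rebuild the index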
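The file placement described above can be scripted. This is a rough sketch, not an official tool: `write_synonym_file` is a hypothetical helper, and the Elasticsearch home directory is replaced by a temporary directory so that the example runs anywhere. On a real cluster, the file would presumably need to exist under the config directory of every node.

```python
import tempfile
from pathlib import Path


def write_synonym_file(es_home, rules, relative_path="analysis/synonym.txt"):
    """Write synonym rules to <es_home>/config/<relative_path> and return the path."""
    target = Path(es_home) / "config" / relative_path
    target.parent.mkdir(parents=True, exist_ok=True)  # creates config/analysis if missing
    # Written as UTF-8 so that the Chinese terms are stored unchanged.
    target.write_text("\n".join(rules) + "\n", encoding="utf-8")
    return target


# A temporary directory stands in for the real Elasticsearch home (an assumption).
with tempfile.TemporaryDirectory() as fake_es_home:
    path = write_synonym_file(
        fake_es_home,
        ["ipod, i-pod, i pod => ipod", "马铃薯, 土豆, potato"],
    )
    written = path.read_text(encoding="utf-8")
    relative = path.relative_to(fake_es_home).as_posix()

print(relative)
```

The relative part after `config/` is the value that goes into `synonyms_path` in the index settings.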
Create the index
PUT my_index
{
  "settings": {
    "analysis": {
      "filter": {
        "word_syn": {
          "type": "synonym_graph",
          "synonyms_path": "analysis/synonym.txt"
        }
      },
      "analyzer": {
        "ik_max_word_syn": {
          "filter": [ "word_syn" ],
          "type": "custom",
          "tokenizer": "ik_max_word"
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "title": {
        "type": "text",
        "analyzer": "ik_smart",
        "search_analyzer": "ik_max_word"
      },
      "author": {
        "type": "keyword"
      }
    }
  }
}
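For readers who prefer to create the index from a script rather than from the console, the same request body can be assembled in Python. This is a sketch under assumptions: the helper names `build_index_body` and `create_index` are invented here, the node URL in the final comment is a hypothetical unsecured local node, and the IK plugin that provides `ik_smart` and `ik_max_word` is assumed to be installed. Only the standard library is used.

```python
import json
import urllib.request


def build_index_body(synonyms_path="analysis/synonym.txt"):
    """Build the settings and mappings used in this article."""
    return {
        "settings": {
            "analysis": {
                "filter": {
                    # Synonym filter backed by the file under config/
                    "word_syn": {"type": "synonym_graph", "synonyms_path": synonyms_path}
                },
                "analyzer": {
                    # Custom analyzer: IK tokenizer followed by the synonym filter
                    "ik_max_word_syn": {
                        "type": "custom",
                        "tokenizer": "ik_max_word",
                        "filter": ["word_syn"],
                    }
                },
            }
        },
        "mappings": {
            "properties": {
                "title": {
                    "type": "text",
                    "analyzer": "ik_smart",
                    "search_analyzer": "ik_max_word",
                },
                "author": {"type": "keyword"},
            }
        },
    }


def create_index(base_url, index_name, body):
    """Send PUT <base_url>/<index_name>; requires a reachable cluster."""
    request = urllib.request.Request(
        url=f"{base_url}/{index_name}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


body = build_index_body()
print(json.dumps(body, ensure_ascii=False, indent=2))
# create_index("http://localhost:9200", "my_index", body)  # hypothetical local node, no auth
```

Note that the mapping itself does not reference `ik_max_word_syn`; in this article the synonym analyzer is named explicitly in each query instead.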
Test the analyzer directly
GET my_index/_analyze
{
  "analyzer": "ik_max_word_syn",
  "text": "马铃薯"
}

{
  "tokens" : [
    { "token" : "马铃薯", "start_offset" : 0, "end_offset" : 3, "type" : "CN_WORD", "position" : 0 },
    { "token" : "土豆", "start_offset" : 0, "end_offset" : 3, "type" : "SYNONYM", "position" : 0 },
    { "token" : "potato", "start_offset" : 0, "end_offset" : 3, "type" : "SYNONYM", "position" : 0 }
  ]
}
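One way to read the response above is to group the tokens by `position`: the original word and its synonyms all sit at position 0 and share the same offsets, which is how the synonym filter stacks alternatives on one term. The sketch below hard-codes the response from this article; `tokens_by_position` is a hypothetical helper name.

```python
from collections import defaultdict

# The _analyze response shown in this article, copied as a Python literal.
ANALYZE_RESPONSE = {
    "tokens": [
        {"token": "马铃薯", "start_offset": 0, "end_offset": 3, "type": "CN_WORD", "position": 0},
        {"token": "土豆", "start_offset": 0, "end_offset": 3, "type": "SYNONYM", "position": 0},
        {"token": "potato", "start_offset": 0, "end_offset": 3, "type": "SYNONYM", "position": 0},
    ]
}


def tokens_by_position(response):
    """Group (token, type) pairs by their position in the token stream."""
    grouped = defaultdict(list)
    for tok in response["tokens"]:
        grouped[tok["position"]].append((tok["token"], tok["type"]))
    return dict(grouped)


grouped = tokens_by_position(ANALYZE_RESPONSE)
for position, tokens in sorted(grouped.items()):
    print(position, tokens)
```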
Add test data
POST my_index/_doc/1
{
  "title": "马铃薯",
  "author": "土豆"
}
Search test
GET my_index/_search
{
  "query": {
    "query_string": {
      "analyzer": "ik_max_word_syn",
      "query": "title:potato AND author:potato"
    }
  }
}

{
  "took" : 38,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : 0.5753642,
    "hits" : [
      {
        "_index" : "my_index",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 0.5753642,
        "_source" : {
          "title" : "马铃薯",
          "author" : "土豆"
        }
      }
    ]
  }
}
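Why does a query for `potato` hit a document that contains only 马铃薯 and 土豆? The toy model below illustrates the idea of search-time expansion. It is deliberately simplified and is not how Lucene evaluates queries: each query term is expanded to its synonym group, and a field matches if any expanded term appears among that field's indexed terms. All names (`SYNONYM_GROUPS`, `expand`, `matches`) are invented for this illustration.

```python
# Equivalent-synonym groups, mirroring the second line of synonym.txt.
SYNONYM_GROUPS = [{"马铃薯", "土豆", "potato"}]


def expand(term):
    """Return the synonym group containing term, or just the term itself."""
    for group in SYNONYM_GROUPS:
        if term in group:
            return set(group)
    return {term}


def matches(doc, field_terms):
    """AND semantics: every (field, term) clause must match after expansion."""
    return all(bool(expand(term) & doc[field]) for field, term in field_terms)


# Indexed terms per field for document 1 (no synonyms stored at index time).
DOC = {"title": {"马铃薯"}, "author": {"土豆"}}

hit = matches(DOC, [("title", "potato"), ("author", "potato")])
miss = matches(DOC, [("title", "tomato"), ("author", "potato")])
print(hit, miss)
```

Because the expansion happens on the query side, the stored document never needs to contain the synonyms, which is consistent with the earlier point that search-time synonyms do not require rebuilding the index.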
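Related documents
- CSDN blog: Elasticsearch: using synonyms to improve search efficiency
- Official blog: The same, but different: Boosting the power of Elasticsearch with synonyms
- Synonym filters: Synonym token filter, Synonym graph token filter
This article comes from qbit snap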