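Preface
- Recommended course: Ruan Yiming's 《Elasticsearch 核心技术与实战》 (Elasticsearch Core Technology and Practice)
- This article applies to Elasticsearch 7.x
- Synonyms can be applied when the index is built (index-time synonyms) or when a search runs (search-time synonyms); they are generally applied at search time
- This article covers search-time synonyms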
Synonym file format
- One-way synonyms
ipod, i-pod, i pod => ipod
- Two-way synonyms
马铃薯, 土豆, potato
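To make the difference between the two rule types concrete, here is a minimal Python sketch that parses the two example lines and expands a term. It is only an illustration, not Elasticsearch's implementation: the function names `parse_synonym_rules` and `expand` are hypothetical, and it assumes the default behaviour in which a two-way rule expands every term to the whole group.

```python
def _add(mapping, source, targets):
    """Append targets to mapping[source] without duplicates."""
    bucket = mapping.setdefault(source, [])
    for target in targets:
        if target not in bucket:
            bucket.append(target)


def parse_synonym_rules(text):
    """Parse synonym rules into a dict: term -> list of terms it expands to."""
    mapping = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        if "=>" in line:
            # One-way rule: each term on the left is replaced by the right side.
            left, right = line.split("=>", 1)
            sources = [t.strip() for t in left.split(",") if t.strip()]
            targets = [t.strip() for t in right.split(",") if t.strip()]
            for source in sources:
                _add(mapping, source, targets)
        else:
            # Two-way rule: every term expands to all terms in the group
            # (assumed default expansion behaviour).
            group = [t.strip() for t in line.split(",") if t.strip()]
            for source in group:
                _add(mapping, source, group)
    return mapping


RULES = "ipod, i-pod, i pod => ipod\n马铃薯, 土豆, potato"
SYNONYM_MAP = parse_synonym_rules(RULES)


def expand(term):
    """Return the terms a query term expands to (itself if no rule applies)."""
    return SYNONYM_MAP.get(term, [term])


print(expand("i-pod"))
print(expand("potato"))
```

With a one-way rule only the right-hand side survives, whereas a two-way rule keeps every member of the group.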
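Steps
Add the synonym file
- Under Elasticsearch's config directory, create an analysis directory, then add the synonym file synonym.txt inside analysis
- When synonyms are applied at search time, there is no need to restart Elasticsearch or to rebuild the index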
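The file can of course be created by hand; as a rough sketch, the same step in Python might look like the following. The helper `write_synonym_file` is hypothetical, and a temporary directory stands in for the real Elasticsearch config directory, whose location depends on the installation. The file is written as UTF-8, which is the encoding Elasticsearch expects for synonym files.

```python
import tempfile
from pathlib import Path


def write_synonym_file(config_dir, rules):
    """Create <config_dir>/analysis/synonym.txt containing one rule per line."""
    analysis_dir = Path(config_dir) / "analysis"
    analysis_dir.mkdir(parents=True, exist_ok=True)
    synonym_file = analysis_dir / "synonym.txt"
    synonym_file.write_text("\n".join(rules) + "\n", encoding="utf-8")
    return synonym_file


# Assumption: a temporary directory plays the role of the Elasticsearch
# config directory; point this at the real config path when using it.
demo_config_dir = tempfile.mkdtemp()
synonym_path = write_synonym_file(
    demo_config_dir,
    ["ipod, i-pod, i pod => ipod", "马铃薯, 土豆, potato"],
)
print(synonym_path.read_text(encoding="utf-8"))
```

The relative path analysis/synonym.txt used in the index settings is resolved against the config directory, which is why the file goes there.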
Create the index
PUT my_index
{
  "settings": {
    "analysis": {
      "filter": {
        "word_syn": {
          "type": "synonym_graph",
          "synonyms_path": "analysis/synonym.txt"
        }
      },
      "analyzer": {
        "ik_max_word_syn": {
          "filter": ["word_syn"],
          "type": "custom",
          "tokenizer": "ik_max_word"
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "title": {
        "type": "text",
        "analyzer": "ik_smart",
        "search_analyzer": "ik_max_word"
      },
      "author": {"type": "keyword"}
    }
  }
}
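For readers who would rather script this step than paste it into the Kibana console, the request can be sketched with the Python standard library as below. The address http://localhost:9200 (no authentication) and the helper name `build_create_index_request` are assumptions for illustration; the sketch only builds the request, and the IK analysis plugin must be installed on the cluster for the ik_* names to resolve.

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"  # assumption: local cluster without authentication

INDEX_BODY = {
    "settings": {
        "analysis": {
            "filter": {
                "word_syn": {
                    "type": "synonym_graph",
                    "synonyms_path": "analysis/synonym.txt",
                }
            },
            "analyzer": {
                "ik_max_word_syn": {
                    "type": "custom",
                    "tokenizer": "ik_max_word",
                    "filter": ["word_syn"],
                }
            },
        }
    },
    "mappings": {
        "properties": {
            "title": {
                "type": "text",
                "analyzer": "ik_smart",
                "search_analyzer": "ik_max_word",
            },
            "author": {"type": "keyword"},
        }
    },
}


def build_create_index_request(index_name, body, base_url=ES_URL):
    """Build (but do not send) the PUT request that creates the index."""
    data = json.dumps(body, ensure_ascii=False).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/{index_name}",
        data=data,
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


request = build_create_index_request("my_index", INDEX_BODY)
print(request.get_method(), request.full_url)
# To actually send it against a running cluster:
# urllib.request.urlopen(request)
```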
Test the analyzer directly
- Query
GET my_index/_analyze
{
  "analyzer": "ik_max_word_syn",
  "text": "马铃薯"
}
- Output
{
  "tokens" : [
    {
      "token" : "马铃薯",
      "start_offset" : 0,
      "end_offset" : 3,
      "type" : "CN_WORD",
      "position" : 0
    },
    {
      "token" : "土豆",
      "start_offset" : 0,
      "end_offset" : 3,
      "type" : "SYNONYM",
      "position" : 0
    },
    {
      "token" : "potato",
      "start_offset" : 0,
      "end_offset" : 3,
      "type" : "SYNONYM",
      "position" : 0
    }
  ]
}
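The key point in this output is that all three tokens share position 0 and the same offsets: the synonyms are stacked on top of the original word instead of following it. A small Python sketch makes that easier to see by grouping the tokens by position; the helper `tokens_by_position` is hypothetical and the response is copied from the output above.

```python
from collections import defaultdict

# Response copied from the _analyze output shown in this article.
analyze_response = {
    "tokens": [
        {"token": "马铃薯", "start_offset": 0, "end_offset": 3,
         "type": "CN_WORD", "position": 0},
        {"token": "土豆", "start_offset": 0, "end_offset": 3,
         "type": "SYNONYM", "position": 0},
        {"token": "potato", "start_offset": 0, "end_offset": 3,
         "type": "SYNONYM", "position": 0},
    ]
}


def tokens_by_position(response):
    """Group (token, type) pairs by their position in the token stream."""
    grouped = defaultdict(list)
    for tok in response["tokens"]:
        grouped[tok["position"]].append((tok["token"], tok["type"]))
    return dict(grouped)


for position, tokens in tokens_by_position(analyze_response).items():
    print(position, tokens)
```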
Add test data
POST my_index/_doc/1
{
  "title": "马铃薯",
  "author": "土豆"
}
Search test
- Query
GET my_index/_search
{
  "query": {
    "query_string": {
      "analyzer": "ik_max_word_syn",
      "query": "title:potato AND author:potato"
    }
  }
}
- Result
{
  "took" : 38,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : 0.5753642,
    "hits" : [
      {
        "_index" : "my_index",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 0.5753642,
        "_source" : {
          "title" : "马铃薯",
          "author" : "土豆"
        }
      }
    ]
  }
}
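Why does a query for potato hit a document that contains neither "potato" in title nor in author? The toy Python model below sketches the reasoning: the query term is expanded to its synonym group, and each field matches if any expanded term equals one of its indexed terms. This is a simplification, not how Elasticsearch scores or executes the query, and it assumes that ik_smart indexes the title 马铃薯 as a single token and that the keyword field stores 土豆 verbatim.

```python
# Two-way synonym group from synonym.txt.
SYNONYM_GROUP = {"马铃薯", "土豆", "potato"}

# Assumed indexed terms for document 1 (see the assumptions above).
INDEXED_DOC = {
    "title": {"马铃薯"},   # text field, analyzed with ik_smart
    "author": {"土豆"},    # keyword field, stored as-is
}


def expand(term):
    """Expand a query term to its synonym group, or to itself."""
    return set(SYNONYM_GROUP) if term in SYNONYM_GROUP else {term}


def field_matches(field, query_term):
    """A field matches if any expanded term is among its indexed terms."""
    return bool(expand(query_term) & INDEXED_DOC[field])


def matches(query_term):
    """Model of 'title:<term> AND author:<term>'."""
    return field_matches("title", query_term) and field_matches("author", query_term)


print(matches("potato"))
print(matches("tomato"))
```

Because the expansion happens on the query side, the indexed document itself never needs to contain the synonyms.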
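Related documents
- CSDN blog: Elasticsearch: Using synonyms to improve search efficiency
- Official blog: The same, but different: Boosting the power of Elasticsearch with synonyms
- Synonym filters: Synonym token filter, Synonym graph token filter
This article is from qbit snap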
Posted in: elasticsearch
2020-09-04