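Preface

  • Recommended course: Ruan Yiming (阮一鸣), "Elasticsearch 核心技术与实战"
  • This article applies to Elasticsearch 7.x
  • Synonyms can be applied at index time (index-time synonyms) or at search time (search-time synonyms); they are usually applied at search time
  • This article covers search-time synonyms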

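Synonym file format

  • One-way synonyms (every term on the left is rewritten to the term on the right)
ipod, i-pod, i pod => ipod
  • Two-way synonyms (all terms are treated as equivalent)
马铃薯, 土豆, potato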
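
To make the difference between the two rule styles concrete, here is a minimal Python sketch that expands rules of this format into a lookup table. It is only an illustration, not how Elasticsearch implements the filter: the function name `parse_synonym_rules` is hypothetical, terms are handled as raw strings (no tokenization, so "i pod" stays one entry), and two-way rules are assumed to expand every term to the whole group.

```python
def parse_synonym_rules(lines):
    """Map each source term to the list of terms it is rewritten to."""
    mapping = {}
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):  # skip blank lines and comments
            continue
        if "=>" in line:
            # One-way rule: terms on the left are replaced by terms on the right.
            left, right = line.split("=>", 1)
            sources = [t.strip() for t in left.split(",") if t.strip()]
            targets = [t.strip() for t in right.split(",") if t.strip()]
        else:
            # Two-way rule: every term maps to the whole group (itself included).
            sources = [t.strip() for t in line.split(",") if t.strip()]
            targets = sources
        for term in sources:
            bucket = mapping.setdefault(term, [])
            for target in targets:
                if target not in bucket:
                    bucket.append(target)
    return mapping


rules = parse_synonym_rules([
    "ipod, i-pod, i pod => ipod",
    "马铃薯, 土豆, potato",
])
print(rules["i-pod"])   # ['ipod']
print(rules["potato"])  # ['马铃薯', '土豆', 'potato']
```

With the one-way rule, "i-pod" maps only to "ipod"; with the two-way rule, "potato" maps to all three terms in the group.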

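Steps

Add the synonym file

  • In the Elasticsearch config directory, create an analysis directory, then add the synonym file synonym.txt under analysis
  • Because the synonyms are applied only at search time, when the query is analyzed, there is no need to restart Elasticsearch or rebuild the index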
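
If you prefer to script this step, the following sketch creates the analysis directory and writes the file with the Python standard library. The helper name `write_synonym_file` and the `es_home` argument are hypothetical; the demo writes into a temporary directory standing in for the Elasticsearch home, so adapt the path to your installation. The file is written as UTF-8, the encoding Elasticsearch expects for synonym files.

```python
import tempfile
from pathlib import Path


def write_synonym_file(es_home, rules, name="synonym.txt"):
    """Write synonym rules to <es_home>/config/analysis/<name> as UTF-8."""
    analysis_dir = Path(es_home) / "config" / "analysis"
    analysis_dir.mkdir(parents=True, exist_ok=True)  # create analysis/ if missing
    target = analysis_dir / name
    target.write_text("\n".join(rules) + "\n", encoding="utf-8")
    return target


# Demo against a throwaway directory standing in for the Elasticsearch home.
with tempfile.TemporaryDirectory() as fake_es_home:
    path = write_synonym_file(
        fake_es_home,
        ["ipod, i-pod, i pod => ipod", "马铃薯, 土豆, potato"],
    )
    written = path.read_text(encoding="utf-8")
    relative = path.relative_to(fake_es_home).as_posix()

print(relative)
```

On a multi-node cluster the same file would need to be present on every node.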

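Create the index

The custom analyzer ik_max_word_syn combines the ik_max_word tokenizer (from the IK analysis plugin) with the word_syn synonym_graph filter. It is not bound to any field in the mapping; it is selected per query in the search step.

PUT my_index
{
  "settings": {
    "analysis": {
      "filter": {
        "word_syn": {
          "type": "synonym_graph",
          "synonyms_path": "analysis/synonym.txt"
        }
      },
      "analyzer": {
        "ik_max_word_syn": {
          "filter": [
            "word_syn"
          ],
          "type": "custom",
          "tokenizer": "ik_max_word"
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "title": {
        "type": "text",
        "analyzer": "ik_smart",
        "search_analyzer": "ik_max_word"
      },
      "author": {
        "type": "keyword"
      }
    }
  }
}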

Test the analyzer directly

  • Request
GET my_index/_analyze
{
  "analyzer": "ik_max_word_syn",
  "text": "马铃薯"
}
  • Output
{
  "tokens" : [
    {
      "token" : "马铃薯",
      "start_offset" : 0,
      "end_offset" : 3,
      "type" : "CN_WORD",
      "position" : 0
    },
    {
      "token" : "土豆",
      "start_offset" : 0,
      "end_offset" : 3,
      "type" : "SYNONYM",
      "position" : 0
    },
    {
      "token" : "potato",
      "start_offset" : 0,
      "end_offset" : 3,
      "type" : "SYNONYM",
      "position" : 0
    }
  ]
}

Add test data

POST my_index/_doc/1
{
  "title": "马铃薯",
  "author": "土豆"
}

Search test

  • Query
GET my_index/_search
{
  "query": {
    "query_string": {
      "analyzer": "ik_max_word_syn",
      "query": "title:potato AND author:potato"
    }
  }
}
  • Result
{
  "took" : 38,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : 0.5753642,
    "hits" : [
      {
        "_index" : "my_index",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 0.5753642,
        "_source" : {
          "title" : "马铃薯",
          "author" : "土豆"
        }
      }
    ]
  }
}
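
The same search can also be issued from a script. Below is a minimal sketch using only the Python standard library; the base URL http://localhost:9200 and the helper name `build_search_request` are assumptions for illustration, and the request is only built here, not sent (it uses POST, which the `_search` endpoint also accepts).

```python
import json
import urllib.request


def build_search_request(base_url, index, query_text, analyzer):
    """Build (but do not send) a query_string search with a per-query analyzer."""
    body = {
        "query": {
            "query_string": {
                "analyzer": analyzer,  # overrides the field's search analyzer
                "query": query_text,
            }
        }
    }
    return urllib.request.Request(
        url=f"{base_url}/{index}/_search",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_search_request(
    "http://localhost:9200",  # assumed local single-node cluster
    "my_index",
    "title:potato AND author:potato",
    "ik_max_word_syn",
)
print(request.get_method(), request.full_url)

# To run it against a live cluster (not executed here):
# with urllib.request.urlopen(request) as response:
#     hits = json.load(response)["hits"]["hits"]
```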

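Related documents

  • CSDN blog: Elasticsearch: Using synonyms to improve search efficiency
  • Official blog: The same, but different: Boosting the power of Elasticsearch with synonyms
  • Synonym filters: Synonym token filter, Synonym graph token filter
This article is from qbit snap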