Elasticsearch 7.x: Configuring Synonyms (qbit)


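Preface

  • Recommended study material: Ruan Yiming's course 《Elasticsearch 核心技术与实战》 (Elasticsearch Core Technology and Practice)
  • This article applies to Elasticsearch 7.x
  • Synonyms can be applied when documents are indexed (index-time synonyms) or when queries are run (search-time synonyms); they are usually applied at search time
  • This article covers search-time synonyms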

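Synonym file format

  • One-way synonyms (every term on the left is rewritten to the term on the right)
ipod, i-pod, i pod => ipod
  • Two-way synonyms (all listed terms are treated as equivalent)
马铃薯, 土豆, potato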
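
To see how the two rule formats behave, the rules can be tried out inline before they go into a file. The following is a minimal sketch, not part of the original walkthrough: the index name syn_format_demo, the filter name demo_syn and the analyzer name demo_analyzer are hypothetical, and the whitespace tokenizer is used only so that the example does not depend on the IK plugin.

```console
# Hypothetical demo index with the two rules defined inline via "synonyms"
PUT syn_format_demo
{
  "settings": {
    "analysis": {
      "filter": {
        "demo_syn": {
          "type": "synonym_graph",
          "synonyms": [
            "ipod, i-pod, i pod => ipod",
            "马铃薯, 土豆, potato"
          ]
        }
      },
      "analyzer": {
        "demo_analyzer": {
          "type": "custom",
          "tokenizer": "whitespace",
          "filter": ["demo_syn"]
        }
      }
    }
  }
}

# One-way rule: "i-pod" should be rewritten to the single token "ipod"
GET syn_format_demo/_analyze
{
  "analyzer": "demo_analyzer",
  "text": "i-pod"
}

# Two-way rule: "土豆" should be expanded to include "马铃薯" and "potato"
GET syn_format_demo/_analyze
{
  "analyzer": "demo_analyzer",
  "text": "土豆"
}
```

Defining the rules inline with "synonyms" is convenient for quick experiments; a file referenced by "synonyms_path" is easier to maintain once the list grows.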

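Test steps

Add the synonym file

  • Create an analysis directory under the Elasticsearch config directory, then add the synonym file synonym.txt inside analysis
  • Because the synonyms are applied at search time, there is no need to restart Elasticsearch or to rebuild the index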
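
If synonym.txt is edited later, the running analyzers still have to pick up the new rules. One option in Elasticsearch 7.3 and later is the reload search analyzers API. The sketch below rests on assumptions that go beyond the original setup: the synonym filter is declared with "updateable": true, and the analyzer that uses it is only applied at search time.

```console
# Assumption: the synonym filter was declared as updateable, for example
#   "word_syn": {
#     "type": "synonym_graph",
#     "synonyms_path": "analysis/synonym.txt",
#     "updateable": true
#   }
# After editing config/analysis/synonym.txt on every node, reload the search analyzers:
POST my_index/_reload_search_analyzers
```

On versions without this API, closing and reopening the index is the usual way to make Elasticsearch re-read the file.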

Create the index

PUT my_index
{
  "settings": {
    "analysis": {
      "filter": {
        "word_syn": {
          "type": "synonym_graph",
          "synonyms_path": "analysis/synonym.txt"
        }
      },
      "analyzer": {
        "ik_max_word_syn": {
          "filter": ["word_syn"],
          "type": "custom",
          "tokenizer": "ik_max_word"
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "title": {
        "type": "text",
        "analyzer": "ik_smart",
        "search_analyzer": "ik_max_word"
      },
      "author": {"type": "keyword"}
    }
  }
}
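
In the index above, ik_max_word_syn is defined but not attached to any field, so a query has to name it explicitly. As an alternative, the synonym analyzer can be set as the field's search_analyzer so that every search on title uses it. This is a sketch of that variant, with a hypothetical index name my_index_auto; it is not the setup used in the rest of this article.

```console
# Hypothetical variant: synonyms are applied to every search on "title"
PUT my_index_auto
{
  "settings": {
    "analysis": {
      "filter": {
        "word_syn": {
          "type": "synonym_graph",
          "synonyms_path": "analysis/synonym.txt"
        }
      },
      "analyzer": {
        "ik_max_word_syn": {
          "type": "custom",
          "tokenizer": "ik_max_word",
          "filter": ["word_syn"]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "title": {
        "type": "text",
        "analyzer": "ik_smart",
        "search_analyzer": "ik_max_word_syn"
      }
    }
  }
}
```

Documents are still indexed with ik_smart, so the stored terms contain no synonyms; only the query text is expanded.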

Test the analyzer directly

  • Query
GET my_index/_analyze
{
  "analyzer": "ik_max_word_syn",
  "text": "马铃薯"
}
  • Output
{
  "tokens" : [
    {
      "token" : "马铃薯",
      "start_offset" : 0,
      "end_offset" : 3,
      "type" : "CN_WORD",
      "position" : 0
    },
    {
      "token" : "土豆",
      "start_offset" : 0,
      "end_offset" : 3,
      "type" : "SYNONYM",
      "position" : 0
    },
    {
      "token" : "potato",
      "start_offset" : 0,
      "end_offset" : 3,
      "type" : "SYNONYM",
      "position" : 0
    }
  ]
}

Add test data

POST my_index/_doc/1
{
    "title": "马铃薯",
    "author": "土豆"
}

Search test

  • Query
GET my_index/_search
{
  "query": {
    "query_string": {
      "analyzer": "ik_max_word_syn", 
      "query": "title:potato AND author:potato"
    }
  }
}
  • 后果输入
{
  "took" : 38,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : 0.5753642,
    "hits" : [
      {
        "_index" : "my_index",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 0.5753642,
        "_source" : {
          "title" : "马铃薯",
          "author" : "土豆"
        }
      }
    ]
  }
}
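
The same analyzer override also works with a match query, which may be easier to read when only one field is searched. A minimal sketch against the my_index index and document created above; it should find document 1 through the potato and 马铃薯 synonym pair.

```console
# Search "title" for "potato", expanding the query text with the synonym analyzer
GET my_index/_search
{
  "query": {
    "match": {
      "title": {
        "query": "potato",
        "analyzer": "ik_max_word_syn"
      }
    }
  }
}
```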

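Related documents

  • CSDN blog: Elasticsearch: Using synonyms to improve search efficiency
  • Official Elastic blog: The same, but different: Boosting the power of Elasticsearch with synonyms
  • Synonym filters: Synonym token filter, Synonym graph token filter

This article is from qbit snap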
