Search engines: Elasticsearch must-know basics

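Business Development and Functional Technology Department - Experience Assurance R&D Group: Kang Rui, Yao Zaiyi, Li Zhen, Liu Bin, Wang Beiyong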

Note: everything below is based on Elasticsearch 8.1.

Elasticsearch	MySQL
Index	Table
Type (removed)	Table (removed)
Document	Row
Field	Column
Mapping	Schema
Everything is indexed	Index
Query DSL	SQL
GET http://…	SELECT * FROM …
POST http://…	UPDATE table SET …
Aggregations	GROUP BY, SUM and similar
cardinality	DISTINCT (deduplication)
reindex	Data migration

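Definition: an index is a collection of documents that share the same document structure (mapping).
- It is identified by a unique index name.
- A cluster can contain many indices.
- Different indices hold data for different business types.

Notes:
- Index names must be lowercase.
- Index names can be at most 255 bytes long.
- Field names may contain uppercase letters, but all-lowercase names are recommended.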

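Note: static index settings cannot be changed after the index is created; dynamic settings can.

Questions to consider:

1. Why can the number of primary shards not be changed after creation?

A document is routed to a particular shard in an index using the following formula:

shard_num = hash(_routing) % num_primary_shards

The default value used for _routing is the document's _id.

When Elasticsearch writes a document, it uses this formula to decide which shard stores it, and later reads use the same formula. If the number of shards changed, existing documents could no longer be found. Put simply: hash the ID, divide by the number of primary shards, and take the remainder; once the divisor changes, the result changes too.

2. What if the business genuinely needs more primary shards?

Use reindex to migrate the data into another index: https://www.elastic.co/guide/…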
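
To see why the shard count has to stay fixed, the routing formula can be tried out in a few lines of Python. This is only a sketch: `zlib.crc32` is a stand-in hash chosen for illustration (Elasticsearch uses its own hash function), and the helper name `shard_for` is hypothetical.

```python
import zlib


def shard_for(routing: str, num_primary_shards: int) -> int:
    """Toy version of shard_num = hash(_routing) % num_primary_shards."""
    # zlib.crc32 is an assumed stand-in hash, not the one Elasticsearch uses.
    return zlib.crc32(routing.encode("utf-8")) % num_primary_shards


doc_ids = [str(i) for i in range(1, 21)]

# Where each document lands with 5 primary shards, and where a lookup
# would go if the same index suddenly had 6 primary shards.
placed_with_5 = {doc_id: shard_for(doc_id, 5) for doc_id in doc_ids}
placed_with_6 = {doc_id: shard_for(doc_id, 6) for doc_id in doc_ids}

moved = [d for d in doc_ids if placed_with_5[d] != placed_with_6[d]]
print(f"{len(moved)} of {len(doc_ids)} documents would be looked up on a different shard")
```

Every document in `moved` was written to one shard but would be searched for on another, which is why growing the shard count means reindexing into a new index.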

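Dynamic mapping detects the type of a new field automatically and adds the field to the mapping. In other words, even if you never defined the field in the index mapping, Elasticsearch infers its type for you.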

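// Delete the test01 index so that we start from a clean state
DELETE test01

// Index one document without defining a mapping first
POST test01/_doc/1
{"name":"kangrui10"}

// Now inspect the mapping of test01 to see which type was assigned to name.
// The result shows that name is mapped as text, with a keyword sub-field.
// That is not what we want: in our business queries name is never searched as analyzed text,
// it is usually matched exactly (and name = xxx).
// With this mapping, an exact match has to query name.keyword = xxx, which is more cumbersome.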
GET test01/_mapping
{
  "test01" : {
    "mappings" : {
      "properties" : {
        "name" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        }
      }
    }
  }
}

Value	Description	Explanation
true	New fields are added to the mapping (default).	If dynamic is not specified when the mapping is created, it defaults to true: a field with no explicitly defined type gets a type detected by Elasticsearch.
false	New fields are ignored. These fields will not be indexed or searchable, but will still appear in the _source field of returned hits. These fields will not be added to the mapping, and new fields must be added explicitly.	With false, a field that is not defined in the mapping can still be written, but it cannot be searched and does not appear in the mapping; the field is kept in _source but is not indexed.
strict	If new fields are detected, an exception is thrown and the document is rejected. New fields must be explicitly added to the mapping.	With strict, writing a document that contains a field not defined in the mapping fails with an error. Recommended for production because it is stricter. To add a field, you must add it to the mapping explicitly.

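- The detected field type is reasonably accurate, but not necessarily what the user wants.
  For example, a text field gets the default standard analyzer, whereas we usually need the ik Chinese analyzer.
- Extra storage is consumed.
  A string is mapped as both text and keyword, which takes more storage space.
- Mapping explosion.
  If a request is written incorrectly, for example PUT is used where GET was intended, many fields can be created by mistake.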

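Doc values are an ordered forward index (a mapping of document => field value) that Lucene builds in addition to the inverted index at index time. In essence, doc values are a serialized columnar store, a structure well suited to aggregations, sorting, and script access to field values. This layout also compresses well, especially for numeric types, which reduces disk usage and speeds up access. Almost all field types support doc values, except text and annotated_text fields.

A forward index works much like a database table: data is associated with a document ID, and looking up the document ID returns the corresponding data.

true: the default; doc values are enabled
false: must be set explicitly. With false, sorting, aggregations, and script access to the field are no longer available, but disk space is saved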
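
The difference between the two structures can be sketched in plain Python. This is a toy model, not how Lucene stores data on disk; the sample documents and variable names are made up for illustration.

```python
from collections import defaultdict

# Hypothetical sample documents keyed by document id.
docs = {
    1: {"speaker": "HAMLET", "speech_number": 3},
    2: {"speaker": "OPHELIA", "speech_number": 1},
    3: {"speaker": "HAMLET", "speech_number": 2},
}

# Inverted index: term -> ids of the documents containing it (used for search).
inverted = defaultdict(list)
for doc_id, doc in docs.items():
    inverted[doc["speaker"]].append(doc_id)

# Doc-values style column: document id -> field value (used for sort/aggregate).
speech_number_column = {doc_id: doc["speech_number"] for doc_id, doc in docs.items()}

hits = inverted["HAMLET"]                                   # search step
sorted_hits = sorted(hits, key=lambda d: speech_number_column[d])  # sort step
print(hits, sorted_hits)
```

Searching uses the term-to-documents direction, while sorting needs the document-to-value direction; a field with doc_values set to false has only the first.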

// Create an index named test03 whose fields satisfy the following
//     1. speaker: keyword
//     2. line_id: keyword and not aggregatable
//     3. speech_number: integer
PUT test03
{
  "mappings": {
    "properties": {
      "speaker": {"type": "keyword"},
      "line_id":{
        "type": "keyword",
        "doc_values": false
      },
      "speech_number":{"type": "integer"}
    }
  }
}

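Character filters: zero or more can be configured. Official documentation: https://www.elastic.co/guide/…

HTML Strip Character Filter: strips HTML elements such as <b> and decodes HTML entities such as &amp;

Mapping Character Filter: replaces specified characters

Pattern Replace Character Filter: replaces characters that match a regular expression

Tokenizer: exactly one must be configured; it splits the text into tokens. Official documentation: https://www.elastic.co/guide/…

Token filters: zero or more can be configured; they post-process the tokens, for example lowercasing, removing stop words, or adding synonyms. Official documentation: https://www.elastic.co/guide/…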

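Task: a document has content like "doc & cat". Index it so that a match_phrase query for either "doc & cat" or "doc and cat" finds it.

Analysis:
1. What match_phrase does: it analyzes the search text into tokens; every token must appear in the analyzed field, in the same order, and by default the tokens must be adjacent.
2. To make & and "and" equivalent in search results, a custom analyzer is needed.
3. How to define a custom analyzer: https://www.elastic.co/guide/…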
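
Before looking at the Elasticsearch settings, the idea can be sketched in Python: normalize & to "and" before tokenizing, then require the query tokens to appear adjacent and in order. This is a simplified model under stated assumptions; the regex tokenizer and the helper names `analyze` and `phrase_match` are illustrative and are not Elasticsearch APIs.

```python
import re

# Mirrors a mapping character filter rule "& => and".
CHAR_MAPPINGS = {"&": "and"}


def analyze(text):
    """Apply the character mappings, then split into word tokens."""
    for source, target in CHAR_MAPPINGS.items():
        text = text.replace(source, target)
    # Rough stand-in for a tokenizer: keep runs of word characters.
    return re.findall(r"\w+", text)


def phrase_match(field_text, query_text):
    """True if the analyzed query tokens occur adjacent and in order."""
    doc_tokens = analyze(field_text)
    query_tokens = analyze(query_text)
    n = len(query_tokens)
    if n == 0:
        return False
    return any(doc_tokens[i:i + n] == query_tokens
               for i in range(len(doc_tokens) - n + 1))


print(phrase_match("doc & cat", "doc and cat"))   # both sides normalize to doc/and/cat
print(phrase_match("cat and doc", "doc & cat"))   # same tokens, wrong order
```

Because the same analysis runs on the indexed text and on the query text, either spelling finds either document, while a reordered phrase does not match.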

# Create the index
PUT /test01
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "char_filter": ["my_mappings_char_filter"],
          "tokenizer": "standard"
        }
      },
      "char_filter": {
        "my_mappings_char_filter": {
          "type": "mapping",
          "mappings": ["& => and"]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "content":{
        "type": "text",
        "analyzer": "my_analyzer"
      }
    }
  }
}
// Notes
// Stage 1, character filters: a char_filter replaces & with and
// Stage 2, tokenizer: the default (standard) tokenizer is used
// Stage 3, token filters: none configured
// The content field uses the custom analyzer my_analyzer

# Load test data
PUT test01/_bulk
{"index":{"_id":1}}
{"content":"doc & cat"}
{"index":{"_id":2}}
{"content":"doc and cat"}

# Run the test: searching for "doc & cat" or "doc and cat" returns both documents
POST test01/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "match_phrase": {"content": "doc & cat"}
        }
      ]
    }
  }
}

# Approach: treat & and "and" as synonyms by using a token filter
# Create the index
PUT /test02
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_synonym_analyzer": {
          "tokenizer": "whitespace",
          "filter": ["my_synonym"]
        }
      },
      "filter": {
        "my_synonym": {
          "type": "synonym",
          "lenient": true,
          "synonyms": ["& => and"]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "content": {
        "type": "text",
        "analyzer": "my_synonym_analyzer"
      }
    }
  }
}
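// Notes
// Stage 1, character filters: none configured
// Stage 2, tokenizer: the whitespace tokenizer is used. Why not the default tokenizer? Because the standard tokenizer drops & during tokenization, leaving nothing for the token filter to work on
// Stage 3, token filters: a synonym token filter makes & and "and" synonyms
// The content field uses the custom analyzer my_synonym_analyzer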

# Load test data
PUT test02/_bulk
{"index":{"_id":1}}
{"content":"doc & cat"}
{"index":{"_id":2}}
{"content":"doc and cat"}

# Run the test
POST test02/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "match_phrase": {"content": "doc & cat"}
        }
      ]
    }
  }
}

// Multi-fields: one field indexed in several ways, for example with two different analyzers
PUT my-index-000001
{
  "mappings": {
    "properties": {
      "city": {
        "type": "text",
        "analyzer":"standard",
        "fields": {
          "fieldText": {
            "type":  "text",
            "analyzer":"ik_smart"
          }
        }
      }
    }
  }
}

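Suppose the business needs to sort by the difference between two numeric fields, that is, by a field that does not exist. What can be done? You could backfill the data and add a new field that stores the difference. But what if backfilling and adding a field is not allowed?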

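Use cases:
- Add fields to existing documents without reindexing
- Work with data without knowing its structure in advance
- Override the value returned from an indexed field at query time
- Define fields for a specific purpose without modifying the underlying schema

Features:
- Invisible to Lucene: runtime fields are not indexed and have no doc_values
- No scoring support, because there is no inverted index
- Break with the traditional define-before-use approach
- Can prevent mapping explosion
- Make the API more flexible
- Note: they make searches slower

Usage:
- Defined in the search request, so available at query time (a field can be queried even if it is not in the mapping)
- Defined in a dynamic or static mapping, so available at mapping time (a runtime field is added to the mapping)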

# Assume the following index and data
PUT test03
{
  "mappings": {
    "properties": {
      "emotion": {"type": "integer"}
    }
  }
}
POST test03/_bulk
{"index":{"_id":1}}
{"emotion":2}
{"index":{"_id":2}}
{"emotion":5}
{"index":{"_id":3}}
{"emotion":10}
{"index":{"_id":4}}
{"emotion":3}

# Requirement: if emotion > 5, return emotion_falg = '1'
# Requirement: if emotion < 5, return emotion_falg = '-1'
# Requirement: if emotion = 5, return emotion_falg = '0'

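Defining a runtime field in the search request: https://www.elastic.co/guide/… The field does not actually exist in the stored documents, so the request must ask for it with "fields": ["*"].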

GET test03/_search
{
  "fields": ["*"], 
  "runtime_mappings": {
    "emotion_falg": {
      "type": "keyword",
      "script": {"source": """if(doc['emotion'].value>5)emit('1');
          if(doc['emotion'].value<5)emit('-1');
          if(doc['emotion'].value==5)emit('0');
          """
      }
    }
  }
}

Defining a runtime field when creating the index: https://www.elastic.co/guide/… A field defined this way can also be used in searches.

# Create the index with a runtime field
PUT test03_01
{
  "mappings": {
    "runtime": {
      "emotion_falg": {
        "type": "keyword",
        "script": {"source": """if(doc['emotion'].value>5)emit('1');
          if(doc['emotion'].value<5)emit('-1');
          if(doc['emotion'].value==5)emit('0');
          """
        }
      }
    },
    "properties": {
      "emotion": {"type": "integer"}
    }
  }
}
# Load test data
POST test03_01/_bulk
{"index":{"_id":1}}
{"emotion":2}
{"index":{"_id":2}}
{"emotion":5}
{"index":{"_id":3}}
{"emotion":10}
{"index":{"_id":4}}
{"emotion":3}
# Query to verify
GET test03_01/_search
{
  "fields": ["*"]
}

# Given the following index and data
PUT task04
{
  "mappings": {
    "properties": {
      "A":{"type": "long"},
      "B":{"type": "long"}
    }
  }
}
PUT task04/_bulk
{"index":{"_id":1}}
{"A":100,"B":2}
{"index":{"_id":2}}
{"A":120,"B":2}
{"index":{"_id":3}}
{"A":120,"B":25}
{"index":{"_id":4}}
{"A":21,"B":25}

# Requirement: in the task04 index, create a runtime field named A_B whose value is A - B; then build a range aggregation with three buckets (below 0, 0 to 100, 100 and above) and return the document counts
// Topics used:
// 1. Runtime fields in a search request: https://www.elastic.co/guide/en/elasticsearch/reference/8.1/runtime-search-request.html
// 2. Range aggregation: https://www.elastic.co/guide/en/elasticsearch/reference/8.1/search-aggregations-bucket-range-aggregation.html

# Test the result
GET task04/_search
{
  "fields": ["*"], 
  "size": 0, 
  "runtime_mappings": {
    "A_B": {
      "type": "long",
      "script": {"source": """emit(doc['A'].value - doc['B'].value);"""
      }
    }
  },
  "aggs": {
    "price_ranges_A_B": {
      "range": {
        "field": "A_B",
        "ranges": [{ "to": 0},
          {"from": 0, "to": 100},
          {"from": 100}
        ]
      }
    }
  }
}
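
As a cross-check of what the request above should return, the same computation can be done by hand in Python on the four test documents. This is a sketch only: the bucket labels are hypothetical, and it relies on the range aggregation including the `from` value and excluding the `to` value of each range.

```python
# The four test documents loaded into task04.
docs = [
    {"A": 100, "B": 2},
    {"A": 120, "B": 2},
    {"A": 120, "B": 25},
    {"A": 21, "B": 25},
]

# (label, from, to); None means the range is open on that side.
ranges = [("below 0", None, 0), ("0 to 100", 0, 100), ("100 and above", 100, None)]


def bucket_counts(docs, ranges):
    """Compute A - B per document and count documents per range bucket."""
    counts = {label: 0 for label, _, _ in ranges}
    for doc in docs:
        a_b = doc["A"] - doc["B"]  # what the runtime field A_B emits
        for label, lower, upper in ranges:
            if (lower is None or a_b >= lower) and (upper is None or a_b < upper):
                counts[label] += 1
    return counts


print(bucket_counts(docs, ranges))
# {'below 0': 1, '0 to 100': 2, '100 and above': 1}
```

The differences are 98, 118, 95 and -4, so the three buckets should report 1, 2 and 1 documents respectively.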

// Note: text fields cannot be sorted or aggregated on by default; to do so you must enable fielddata
GET /kibana_sample_data_ecommerce/_search
{
  "query": {
    "match": {"customer_last_name": "wood"}
  },
  "highlight": {
    "number_of_fragments": 3,
    "fragment_size": 150,
    "fields": {
      "customer_last_name": {
        "pre_tags": ["<em>"],
        "post_tags": ["</em>"]
      }
    }
  },
  "sort": [
    {"currency": {"order": "desc"}},
    {"_score": {"order": "asc"}}
  ]
}

# Note: from starts at 0, not 1
GET kibana_sample_data_ecommerce/_search
{
  "from": 5,
  "size": 20,
  "query": {
    "match": {"customer_last_name": "wood"}
  }
}
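
The zero-based from/size semantics behave like a list slice. The sketch below is only an analogy on an in-memory list; the helper name `page` and the document ids are hypothetical.

```python
def page(hits, from_, size):
    """Return the slice that a from/size request selects from an ordered hit list."""
    return hits[from_:from_ + size]


# Hypothetical ranked result list: doc-0 is the best hit.
ranked_hits = [f"doc-{i}" for i in range(100)]

requested = page(ranked_hits, 5, 20)
print(requested[0], requested[-1], len(requested))
# doc-5 doc-24 20
```

With from set to 5, the first five hits (positions 0 to 4) are skipped, so the page starts at the sixth hit.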

# Task
In the spoken lines of the play, highlight the word Hamlet (in the text_entry field), starting the highlight with "#aaa#" and ending it with "#bbb#".
Return all of the speech_number field lines in reverse order; 20 speech lines per page, starting from line 40.

# highlight: applied to the text_entry field, keyword Hamlet
# paging: from: 40, size: 20
# speech_number: descending order

POST test09/_search
{
  "from": 40,
  "size": 20,
  "query": {
    "bool": {
      "must": [
        {
          "match": {"text_entry": "Hamlet"}
        }
      ]
    }
  },
  "highlight": {
    "fields": {
      "text_entry": {
        "pre_tags": ["#aaa#"],
        "post_tags": ["#bbb#"]
      }
    }
  },
  "sort": [
    {
      "speech_number.keyword": {"order": "desc"}
    }
  ]
}

Run an async search
POST /sales*/_async_search?size=0
Retrieve the async search results
GET /_async_search/<id>
Check the async search status
GET /_async_search/status/<id>
Delete or cancel the async search
DELETE /_async_search/<id>

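Field	Meaning
id	Unique identifier returned for the async search
is_partial	Once the search is no longer running, indicates whether it succeeded or failed on all shards. While the search is still running, is_partial is true
is_running	Whether the search is still running
total	Number of shards the search will run on
successful	Number of shards that have completed the search successfully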

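In Elasticsearch, an index alias works like a shortcut or symbolic link that can point to one or more indices. Aliases give us a great deal of flexibility. With them we can:

- Switch seamlessly from one index to another on a running cluster, with no downtime
- Group several indices; for example, with one index per month, an alias can represent the indices of the last 3 months
- Expose a subset of the data in an index, much like a database view

Without aliases, searching several indices looks like this:
Option 1: POST index_01,index_02,index_03/_search
Option 2: POST index*/_search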

Specify the alias when creating the index

# Give test05 the alias test05_aliases
PUT test05
{
  "mappings": {
    "properties": {
      "name":{"type": "keyword"}
    }
  },
  "aliases": {"test05_aliases": {}
  }
}

Specify the alias through an index template

PUT _index_template/template_1
{"index_patterns": ["te*", "bar*"],
  "template": {
    "settings": {"number_of_shards": 1},
    "mappings": {
      "_source": {"enabled": true},
      "properties": {
        "host_name": {"type": "keyword"},
        "created_at": {
          "type": "date",
          "format": "EEE MMM dd HH:mm:ss Z yyyy"
        }
      }
    },
    "aliases": {"mydata": {}
    }
  },
  "priority": 500,
  "composed_of": ["component_template1", "runtime_component_template"], 
  "version": 3,
  "_meta": {"description": "my custom"}
}

Add an alias to an existing index

POST _aliases
{
  "actions": [
    {
      "add": {
        "index": "logs-nginx.access-prod",
        "alias": "logs"
      }
    }
  ]
}

POST _aliases
{
  "actions": [
    {
      "remove": {
        "index": "logs-nginx.access-prod",
        "alias": "logs"
      }
    }
  ]
}

# Define an index alias for 'accounts-row' called 'accounts-male'; apply a filter so that only male account owners are shown

POST _aliases
{
  "actions": [
    {
      "add": {
        "index": "accounts-row",
        "alias": "accounts-male",
        "filter": {
          "bool": {
            "filter": [
              {
                "term": {"gender.keyword": "male"}
              }
            ]
          }
        }
      }
    }
  ]
}

# Create the search template
PUT _scripts/my-search-template
{
  "script": {
    "lang": "mustache",
    "source": {
      "query": {
        "match": {"{{query_key}}": "{{query_value}}"
        }
      },
      "from": "{{from}}",
      "size": "{{size}}"
    }
  }
}

# Search with the template
GET my-index/_search/template
{
  "id": "my-search-template",
  "params": {
    "query_key": "your field",
    "query_value": "your field value",
    "from": 0,
    "size": 10
  }
}

PUT _scripts/my-search-template
{
  "script": {
    "lang": "mustache",
    "source": {
      "query": {
        "match": {"message": "{{query_string}}"
        }
      },
      "from": "{{from}}",
      "size": "{{size}}"
    },
    "params": {"query_string": "My query string"}
  }
}

POST _render/template
{
  "id": "my-search-template",
  "params": {
    "query_string": "hello world",
    "from": 20,
    "size": 10
  }
}

GET my-index/_search/template
{
  "id": "my-search-template",
  "params": {
    "query_string": "hello world",
    "from": 0,
    "size": 10
  }
}

GET _cluster/state/metadata?pretty&filter_path=metadata.stored_scripts

DELETE _scripts/my-search-template

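// A batch of records carries different labels but shares one data structure; records with different labels are stored in different indices (A, B, C). If results must be displayed strictly grouped by label, which query works best?
// Requirement: show class A first, then class B, then class C

# Test data
PUT /index_a_123/_doc/1
{"title":"this is index_a..."}
PUT /index_b_123/_doc/1
{"title":"this is index_b..."}
PUT /index_c_123/_doc/1
{"title":"this is index_c..."}
# A plain query with no boost: the three hits come back with identical scores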
POST index_*_123/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {"title": "this"}
        }
      ]
    }
  }
}

Official documentation: https://www.elastic.co/guide/en/elasticsearch/reference/8.1/search-search.html
indices_boost
# Boosts relevance at the index level
POST index_*_123/_search
{
  "indices_boost": [
    {"index_a_123": 10},
    {"index_b_123": 5},
    {"index_c_123": 1}
  ], 
  "query": {
    "bool": {
      "must": [
        {
          "match": {"title": "this"}
        }
      ]
    }
  }
}
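
The effect of indices_boost on ordering can be illustrated with simple arithmetic. The sketch below assumes each hit's score is multiplied by the boost of its index, and the base score of 0.2 is a made-up value standing in for the identical scores of the unboosted query.

```python
indices_boost = {"index_a_123": 10, "index_b_123": 5, "index_c_123": 1}

# Hypothetical hits that all matched equally well before boosting.
hits = [
    {"_index": "index_c_123", "_score": 0.2},
    {"_index": "index_a_123", "_score": 0.2},
    {"_index": "index_b_123", "_score": 0.2},
]


def apply_indices_boost(hits, boosts):
    """Multiply each hit's score by its index boost and sort by score, best first."""
    boosted = [{**hit, "_score": hit["_score"] * boosts.get(hit["_index"], 1)}
               for hit in hits]
    return sorted(boosted, key=lambda hit: hit["_score"], reverse=True)


ordered = [hit["_index"] for hit in apply_indices_boost(hits, indices_boost)]
print(ordered)
# ['index_a_123', 'index_b_123', 'index_c_123']
```

This ordering only holds strictly when the base scores are close; a very strong match in a low-boost index could still outrank a weak match in a high-boost one.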

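An index index_a has several fields. Implement the following query:
1) The title field must match 'ssas' or 'sasa'.
2) For the tags field (an array field), raise the score if tags contains 'pingpang'.
Task: write the DSL.
# Test data
PUT index_a/_bulk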
{"index":{"_id":1}}
{"title":"ssas","tags":"basketball"}
{"index":{"_id":2}}
{"title":"sasa","tags":"pingpang; football"}

# Solution 1
POST index_a/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "bool": {
            "should": [
              {
                "match": {"title": "ssas"}
              },
              {
                "match": {"title": "sasa"}
              }
            ]
          }
        }
      ],
      "should": [
        {
          "match": {
            "tags": {
              "query": "pingpang",
              "boost": 1
            }
            
          }
        }
      ]
    }
  }
}
# Solution 2
// https://www.elastic.co/guide/en/elasticsearch/reference/8.1/query-dsl-function-score-query.html
POST index_a/_search
{
  "query": {
    "bool": {
      "should": [
        {
          "function_score": {
            "query": {
              "match": {
                "tags": {"query": "pingpang"}
              }
            },
            "boost": 1
          }
        }
      ],
      "must": [
        {
          "bool": {
            "should": [
              {
                "match": {"title": "ssas"}
              },
              {
                "match": {"title": "sasa"}
              }
            ]
          }
        }
      ]
    }
  }
}

If some results are undesirable but you do not want to exclude them with must_not, consider the negative_boost parameter of the boosting query, which lowers their score.
negative_boost
(Required, float) Floating point number between 0 and 1.0 used to decrease the relevance scores of documents matching the negative query.
Official documentation: https://www.elastic.co/guide/en/elasticsearch/reference/8.1/query-dsl-boosting-query.html

POST index_a/_search
{
  "query": {
    "boosting": {
      "positive": {
        "term": {"tags": "football"}
      },
      "negative": {
        "term": {"tags": "pingpang"}
      },
      "negative_boost": 0.5
    }
  }
}
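
The scoring rule behind the boosting query is easy to state as arithmetic: a document that matches the positive query keeps its score, unless it also matches the negative query, in which case the score is multiplied by negative_boost. A minimal sketch with made-up scores (the helper name `boosting_score` is hypothetical):

```python
def boosting_score(positive_score, matches_negative, negative_boost):
    """Score of a document that matched the positive query of a boosting query."""
    if matches_negative:
        return positive_score * negative_boost
    return positive_score


# Hypothetical positive-query score of 2.0 for two documents.
only_football = boosting_score(2.0, matches_negative=False, negative_boost=0.5)
football_and_pingpang = boosting_score(2.0, matches_negative=True, negative_boost=0.5)
print(only_football, football_and_pingpang)
# 2.0 1.0
```

The document that also matches the negative query is still returned, just ranked lower, which is the difference from must_not.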

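How can relevance be boosted by both sales and views at the same time? Problem: for products, we want a relevance calculation that takes both the sales count and the view count into account, for example oldScore * (sales + views).
**************************
Product     Sales       Views
A         10           10
B         20           20
C         30           30
**************************
# Sample data
PUT goods_index/_bulk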
{"index":{"_id":1}}
{"name":"A","sales_count":10,"view_count":10}
{"index":{"_id":2}}
{"name":"B","sales_count":20,"view_count":20}
{"index":{"_id":3}}
{"name":"C","sales_count":30,"view_count":30}

Official documentation: https://www.elastic.co/guide/en/elasticsearch/reference/8.1/query-dsl-function-score-query.html
Topic: script_score

POST goods_index/_search
{
  "query": {
    "function_score": {
      "query": {"match_all": {}
      },
      "script_score": {
        "script": {"source": "_score * (doc['sales_count'].value+doc['view_count'].value)"
        }
      }
    }
  }
}
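
The expected scores can be worked out by hand. The sketch below models two steps: the script computes _score * (sales_count + view_count), and function_score then combines that value with the query score according to boost_mode, which to my understanding defaults to multiply. With match_all the query score is 1.0, so the extra multiplication changes nothing here; with a real text query it would, and "boost_mode": "replace" is worth considering. The helper name is hypothetical.

```python
goods = [
    {"name": "A", "sales_count": 10, "view_count": 10},
    {"name": "B", "sales_count": 20, "view_count": 20},
    {"name": "C", "sales_count": 30, "view_count": 30},
]


def function_score(query_score, doc, boost_mode="multiply"):
    """Toy model of function_score with the script_score used in the request."""
    # Value returned by the script: _score * (sales_count + view_count).
    script_value = query_score * (doc["sales_count"] + doc["view_count"])
    if boost_mode == "multiply":
        return query_score * script_value
    if boost_mode == "replace":
        return script_value
    raise ValueError(f"unsupported boost_mode in this sketch: {boost_mode}")


# match_all gives every document a query score of 1.0.
print([function_score(1.0, doc) for doc in goods])
# [20.0, 40.0, 60.0]
```

Product C therefore ranks first, then B, then A.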

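Write a query that requires a keyword to appear in at least two of a document's four fields.
Topics: bool query, should / minimum_should_match
    1. The bool query
    2. The detail to watch: minimum_should_match
Note: when the bool query also contains must or filter clauses, minimum_should_match defaults to 0; when it contains only should clauses, it defaults to 1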

POST test_index/_search
{
  "query": {
    "bool": {
      "should": [
        {
          "match": {"filed1": "kr"}
        },
        {
          "match": {"filed2": "kr"}
        },
        {
          "match": {"filed3": "kr"}
        },
        {
          "match": {"filed4": "kr"}
        }
      ],
      "minimum_should_match": 2
    }
  }
}
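
The rule that the query above expresses is simply "count the should clauses that match and compare with the threshold". A small Python sketch, using whitespace splitting as a crude stand-in for text analysis and made-up documents:

```python
FIELDS = ["filed1", "filed2", "filed3", "filed4"]


def count_matching_fields(doc, keyword, fields=FIELDS):
    """Number of fields whose whitespace-separated tokens contain the keyword."""
    return sum(1 for field in fields if keyword in str(doc.get(field, "")).split())


def bool_should_matches(doc, keyword, minimum_should_match=2):
    """True if at least minimum_should_match of the should clauses match."""
    return count_matching_fields(doc, keyword) >= minimum_should_match


# Hypothetical documents.
doc_two_fields = {"filed1": "kr", "filed2": "kr team", "filed3": "x", "filed4": "y"}
doc_one_field = {"filed1": "kr", "filed2": "a", "filed3": "b", "filed4": "c"}

print(bool_should_matches(doc_two_fields, "kr"), bool_should_matches(doc_one_field, "kr"))
# True False
```

Lowering the threshold to 1 would make the second document match as well, which is the default behaviour when the bool query has only should clauses.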

Official documentation: https://www.elastic.co/guide/en/elasticsearch/reference/8.1/search-aggregations-bucket-terms-aggregation.html
# Count documents per author
POST bilili_elasticsearch/_search
{
  "size": 0,
  "aggs": {
    "agg_user": {
      "terms": {
        "field": "user",
        "size": 1
      }
    }
  }
}

Official documentation: https://www.elastic.co/guide/en/elasticsearch/reference/8.1/search-aggregations-bucket-datehistogram-aggregation.html
# Count documents per month by up_time
POST bilili_elasticsearch/_search
{
  "size": 0,
  "aggs": {
    "agg_up_time": {
      "date_histogram": {
        "field": "up_time",
        "calendar_interval": "month"
      }
    }
  }
}

Official documentation: https://www.elastic.co/guide/en/elasticsearch/reference/8.1/search-aggregations-metrics-max-aggregation.html
# Get the maximum up_time
POST bilili_elasticsearch/_search
{
  "size": 0,
  "aggs": {
    "agg_max_up_time": {
      "max": {"field": "up_time"}
    }
  }
}

Official documentation: https://www.elastic.co/guide/en/elasticsearch/reference/8.1/search-aggregations-metrics-top-hits-aggregation.html
# Aggregate by user and keep a single bucket; within it, return the top 3 matching documents sorted by the specified field
POST bilili_elasticsearch/_search
{
  "size": 0,
  "aggs": {
    "terms_agg_user": {
      "terms": {
        "field": "user",
        "size": 1
      },
      "aggs": {
        "top_user_hits": {
          "top_hits": {
            "_source": {
              "includes": [
                "video_time",
                "title",
                "see",
                "user",
                "up_time"
              ]
            }, 
            "sort": [
              {
                "see":{"order": "desc"}
              }
            ], 
            "size": 3
          }
        }
      }
    }
  }
}

// The response looks like this
{
  "took" : 91,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1000,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : []},
  "aggregations" : {
    "terms_agg_user" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 975,
      "buckets" : [
        {
          "key" : "Elastic 搜寻",
          "doc_count" : 25,
          "top_user_hits" : {
            "hits" : {
              "total" : {
                "value" : 25,
                "relation" : "eq"
              },
              "max_score" : null,
              "hits" : [
                {
                  "_index" : "bilili_elasticsearch",
                  "_id" : "5ccCVoQBUyqsIDX6wIcm",
                  "_score" : null,
                  "_source" : {
                    "video_time" : "03:45",
                    "see" : "92",
                    "up_time" : "2021-03-19",
                    "title" : "Elastic 社区大会 2021: 用加 Gatling 进行 Elasticsearch 的负载测试，寓教于乐。",
                    "user" : "Elastic 搜寻"
                  },
                  "sort" : ["92"]
                },
                {
                  "_index" : "bilili_elasticsearch",
                  "_id" : "8scCVoQBUyqsIDX6wIgn",
                  "_score" : null,
                  "_source" : {
                    "video_time" : "10:18",
                    "see" : "79",
                    "up_time" : "2020-10-20",
                    "title" : "为 Elasticsearch 启动 htpps 拜访",
                    "user" : "Elastic 搜寻"
                  },
                  "sort" : ["79"]
                },
                {
                  "_index" : "bilili_elasticsearch",
                  "_id" : "7scCVoQBUyqsIDX6wIcm",
                  "_score" : null,
                  "_source" : {
                    "video_time" : "04:41",
                    "see" : "71",
                    "up_time" : "2021-03-19",
                    "title" : "Elastic 社区大会 2021: Elasticsearch 作为一个天文空间的数据库",
                    "user" : "Elastic 搜寻"
                  },
                  "sort" : ["71"]
                }
              ]
            }
          }
        }
      ]
    }
  }
}

# Group by order_date per month and keep only the months whose sales total is greater than 1000
POST kibana_sample_data_ecommerce/_search
{
  "size": 0,
  "aggs": {
    "date_his_aggs": {
      "date_histogram": {
        "field": "order_date",
        "calendar_interval": "month"
      },
      "aggs": {
        "sum_aggs": {
          "sum": {"field": "total_unique_products"}
        },
        "sales_bucket_filter": {
          "bucket_selector": {
            "buckets_path": {"totalSales": "sum_aggs"},
            "script": "params.totalSales > 1000"
          }
        }
      }
    }
  }
}

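The earthquakes index contains earthquake data for the past 30 months. With a single query, obtain the following:
- The average mag of each month over the past 30 months
- The month with the highest average mag over the past 30 months, together with that average
- The search must not return any documents

max_bucket official documentation: https://www.elastic.co/guide/en/elasticsearch/reference/8.1/search-aggregations-pipeline-max-bucket-aggregation.html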

POST earthquakes/_search
{
  "size": 0, 
  "query": {
    "range": {
      "time": {
        "gte": "now-30M/d",
        "lte": "now"
      }
    }
  },
  "aggs": {
    "agg_time_his": {
      "date_histogram": {
        "field": "time",
        "calendar_interval": "month"
      },
      "aggs": {
        "avg_aggs": {
          "avg": {"field": "mag"}
        }
      }
    },
    "max_mag_sales": {
      "max_bucket": {"buckets_path": "agg_time_his>avg_aggs"}
    }
  }
}
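
What the date_histogram, avg and max_bucket combination computes can be mirrored in Python: group by month, average each group, then pick the group with the largest average. The readings below are made up for illustration and are not taken from the earthquakes index.

```python
from collections import defaultdict

# Hypothetical (month, mag) readings.
quakes = [
    ("2023-01", 4.0), ("2023-01", 5.0),
    ("2023-02", 6.0), ("2023-02", 6.5),
    ("2023-03", 3.0),
]

# date_histogram + avg: one average per monthly bucket.
mags_by_month = defaultdict(list)
for month, mag in quakes:
    mags_by_month[month].append(mag)
monthly_avg = {month: sum(mags) / len(mags) for month, mags in mags_by_month.items()}

# max_bucket: the bucket whose avg metric is the highest.
best_month = max(monthly_avg, key=monthly_avg.get)
print(monthly_avg)
print(best_month, monthly_avg[best_month])
# 2023-02 6.25
```

max_bucket is a sibling pipeline aggregation, which is why in the request it sits next to the date_histogram rather than inside it and points at the metric with the path agg_time_his>avg_aggs.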

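1. Index definition
Overview of indices
Index definition
Creating an index
index-settings parameters explained
Basic index operations

2. Mapping parameter: dynamic
Core function
Introducing dynamic
Possible values of dynamic
Drawbacks of dynamic mapping

3. Mapping parameter: doc_values
Core function
What a forward index is
Possible values of doc_values
Practice question

4. Analyzers
Installing the ik Chinese analyzer
What an inverted index is
How data is indexed
Types of analyzers

5. Custom analyzers
The three stages of a custom analyzer
1. Character filters
2. Tokenizer: splits text into tokens
3. Token filters: post-process the tokens
Practice question
Solution 1
Solution 2

6. multi-fields

7. runtime_field: runtime fields
Background
Solution
Use cases
Features
Usage
Practice question 1
Solution 1
Solution 2
Practice question 2
Solution

8. Search-highlighted
Introducing the highlight syntax

9. Search-Order
Introducing the sort syntax

10. Search-Page
Introducing the paging syntax
Practice question 1

11. Search-AsyncSearch
Release version
Use cases
Common commands
Async search result fields

12. Aliases: index aliases
What aliases are for
Searching multiple indices without aliases
Three ways to create an alias
Removing an alias
Practice question 1

13. Search-template
Features
Introducing search-template
Search template operations
Creating a search template
Validating a search template
Running a search template
Listing all search templates
Deleting a search template

14. Search-dsl: simple search
Choosing a search type
Search categories
Custom scoring
How to customize scoring
1. index boost: adjusting relevance at the index level
2. boosting: adjusting document relevance
3. negative_boost: lowering relevance
4. function_score: custom scoring

15. Search-dsl Bool: compound search
Basic syntax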
