关于数据库:MySQL80-优化器介绍三

GreatSQL 社区原创内容未经受权不得随便应用，转载请分割小编并注明起源。
GreatSQL 是 MySQL 的国产分支版本，应用上与 MySQL 统一。
作者：奥特曼爱小怪兽
文章起源：GreatSQL 社区原创

往期回顾

MySQL8.0 优化器介绍（一）

MySQL8.0 优化器介绍（二）

本篇将进一步深刻介绍优化器相干的 join 优化

为更好的了解本篇内容须要提前看一下以下内容：

单表拜访的办法，参考《MySQL 是怎么运行的：从根儿上了解 MySQL》第 10 章 ” 单表拜访办法 ”
更多 select 语句级别的优化细节见（https://dev.mysql.com/doc/refman/8.0/en/select-optimization.html）

为了让读者对 join 优化有更深的理解，章节里的 sql 例子，留了一些思考和入手的问题。可能大家失去的答案会不同，但摸索未知的过程，形式应该是一样的。

MySQL 能够应用 Join Optimizations 来改良上次分享过的 join algorithms，或者决定如何执行局部查问。本次次要介绍三种常常用到的 join Optimizations，更多的 join type 见上面的链接：（https://dev.mysql.com/doc/refman/8.0/en/explain-output.html#e…）

通常 MySQL 只会对每个表应用一个索引。然而，如果对同一表中的多个列在 where 后有条件限度，并且没有笼罩所有列的单个索引，无论选哪个索引都不是最佳的。对于这些状况，MySQL 反对索引合并 (index merge)。select a,b,c from t where a=x1 and b=x2 and c=x3, 这种状况下，建设一个多列的复合索引 index_abc 比应用 index merge +index_a+index_b+index_c 性能更好。

Index merge 反对三种算法见下表

查问打算应用 index merge 时，会在 explain sql 的 access type 列有 ”index_merge”，key 列会蕴含所有参加 merge 的列，key_length 蕴含一个所用索引的最长要害局部的列表。举个 Intersection 例子：

以下代码块正文中提到的知识点略多

## 无论 optimizer 是否抉择  index merge 取决于 index  statistics. 
## index  statistics 是从哪个试图取得呢？mysql.innodb_index_stats  还是 information_schema.statistics 
## 还是 information_schema.INNODB_SYS_TABLESTATS？## 能够参考 https://www.cnblogs.com/ClassicMan/articles/15871403.html
## index_dive  eq_range_index_dive_limit 这两个参数有什么作用？## 意味着即便返回雷同 STATEMENT_DIGEST_TEXT 的 sql 查问语句, WHERE 语句前面跟不同的值，失去的查问打算可能是不一样的  ## 比方 select * from people where name='惟一值'；##select * from people where name='超级多的反复值'

## 同理 index statistics 的扭转会让同一个查问走不同的执行打算，## 体现在 select a,b from t where a=1 and b=1  有时走了 index merges，有时没走。CREATE TABLE `payment` (
 `payment_id` smallint unsigned NOT NULL,
 `customer_id` smallint unsigned NOT NULL,
 `staff_id` tinyint unsigned NOT NULL,
 `rental_id` int(DEFAULT NULL,
 `amount` decimal(5,2) NOT NULL,
 `payment_date` datetime NOT NULL,
 `last_update` timestamp NULL,
 PRIMARY KEY (`payment_id`),
 KEY `idx_fk_staff_id` (`staff_id`),
 KEY `idx_fk_customer_id` (`customer_id`),
 KEY `fk_payment_rental` (`rental_id`)
) ENGINE=InnoDB;

## case1  等值查问
SELECT *
  FROM sakila.payment
 WHERE staff_id = 1
   AND customer_id = 75;
 
 mysql> EXPLAIN SELECT *
                  FROM sakila.payment
                 WHERE staff_id = 1
                   AND customer_id = 75\G
**************************** 1. row *****************************
 id: 1
 select_type: SIMPLE
 table: payment
 partitions: NULL
 type: index_merge
 possible_keys: idx_fk_staff_id,idx_fk_customer_id
 key: idx_fk_customer_id,idx_fk_staff_id
 key_len: 2,1
 ref: NULL
 rows: 20
 filtered: 100
 Extra: Using intersect(idx_fk_customer_id,idx_fk_staff_id); Using 
where 1 row in set, 1 warning (0.0007 sec)
 
mysql> EXPLAIN FORMAT=TREE
 SELECT *
 FROM sakila.payment
 WHERE staff_id = 1
 AND customer_id = 75\G
**************************** 1. row ****************************
EXPLAIN: -> Filter: ((sakila.payment.customer_id = 75) and (sakila.payment.staff_id = 1)) (cost=14.48 rows=20)
             -> Index range scan on payment using intersect(idx_fk_customer_id,idx_fk_staff_id) (cost=14.48 rows=20)
1 row in set (0.0004 sec)

## 留神 "Index range scan on payment", 两个等值查问条件，为啥触发了 rang scan？## case2  上面的 sql 范畴查问也能用到 index  merge 吗？执行打算 本人上来测试验证
SELECT *
 FROM sakila.payment
 WHERE payment_id > 10
 AND customer_id = 318;

##case1 等值查问
SELECT *
 FROM sakila.payment
 WHERE staff_id = 1
 OR customer_id = 318;
 
 mysql> EXPLAIN
 SELECT *
 FROM sakila.payment
 WHERE staff_id = 1
 OR customer_id = 318\G
**************************** 1. row *****************************
 id: 1
 select_type: SIMPLE
 table: payment
 partitions: NULL
 type: index_merge
 possible_keys: idx_fk_staff_id,idx_fk_customer_id
 key: idx_fk_staff_id,idx_fk_customer_id
 key_len: 1,2
 ref: NULL
 rows: 8069
 filtered: 100
 Extra: Using union(idx_fk_staff_id,idx_fk_customer_id); Using where
1 row in set, 1 warning (0.0008 sec)
 
 mysql> EXPLAIN FORMAT=TREE
 SELECT *
 FROM sakila.payment
 WHERE staff_id = 1
 OR customer_id = 318\G
**************************** 1. row ****************************
EXPLAIN: -> Filter: ((sakila.payment.staff_id = 1) or (sakila.payment.customer_id = 318)) (cost=2236.18 rows=8069)
             -> Index range scan on payment using union(idx_fk_staff_id,idx_fk_customer_id) (cost=2236.18 rows=8069)
1 row in set (0.0010 sec)

 ## case2 范畴查问也能用到 index  merge 吗？执行打算 本人上来测试验证，## 有主键参加后，和 Intersection 章节的 case2 执行打算中用到的索引个数有啥不同？SELECT *
 FROM sakila.payment
 WHERE payment_id > 15000
 OR customer_id = 318;

SELECT *
 FROM sakila.payment
 WHERE customer_id < 30
 OR rental_id < 10;
 
 mysql> EXPLAIN
 SELECT *
 FROM sakila.payment
 WHERE customer_id < 30
 OR rental_id < 10\G
**************************** 1. row *****************************
 id: 1
 select_type: SIMPLE
 table: payment
 partitions: NULL
 type: index_merge
possible_keys: idx_fk_customer_id,fk_payment_rental
 key: idx_fk_customer_id,fk_payment_rental
 key_len: 2,5
 ref: NULL
 rows: 826
 filtered: 100
 Extra: Using sort_union(idx_fk_customer_id,fk_payment_rental); 
Using where 1 row in set, 1 warning (0.0009 sec)

mysql> EXPLAIN FORMAT=TREE
 SELECT *
 FROM sakila.payment
 WHERE customer_id < 30
 OR rental_id < 10\G
**************************** 1. row *****************************
EXPLAIN: -> Filter: ((sakila.payment.customer_id < 30) or (sakila.payment.rental_id < 10)) (cost=1040.52 rows=826)
             -> Index range scan on payment using sort_union(idx_fk_customer_id,fk_payment_rental) (cost=1040.52 rows=826)
1 row in set (0.0005 sec)

多范畴读取（MRR）优化旨在缩小对辅助索引进行范畴扫描所导致的随机 I / O 量。优化读取索引

首先，依据行 id（InnoDB 的汇集索引）对键进行排序，而后按行的存储程序检索行。多量程读取优化

能够用于范畴扫描和应用索引的等值连贯。不反对虚构生成列上的辅助索引。

应用 InnoDB 进行多范畴读取优化的次要用例是用于没有笼罩索引的磁盘绑定查问（disk-bound queries 另外一个层面对 disk-bound 的优化，具体可见：https://dev.mysql.com/doc/refman/8.0/en/optimizing-innodb-dis…）。优化的成果取决于须要多少行以及存储器的查找时间。MySQL 将会估算（estimate）是否有用。然而，老本估算在过于乐观而不是过于乐观的一面，因而可能有必要提供帮忙优化器做出正确决策的信息。

有两个 optimizer switches 管制 MRR 优化

mrr: Whether the optimizer is allowed to use the Multi-Range Read optimization. The default is ON.
mrr_cost_based: Whether the decision to use the Multi-Range Read optimization is cost based. You can disable this option to always use the optimization when it is supported. The default is ON

能够用 MRR() 和 NO_MRR() 两个 optimizer switches 来管制表级别 or 索引级别的 Multi-Range Read，举个例子：

mysql> EXPLAIN
 SELECT /*+ MRR(city) */
 *
 FROM world.city
 WHERE CountryCode BETWEEN 'AUS' AND 'CHN'\G
**************************** 1. row *****************************
 id: 1
 select_type: SIMPLE
 table: city
 partitions: NULL
 type: range
possible_keys: CountryCode
 key: CountryCode
 key_len: 3
 ref: NULL
 rows: 812
 filtered: 100
 Extra: Using index condition; Using MRR
1 row in set, 1 warning (0.0006 sec)


有必要应用 MRR（）优化器提醒或禁用基于 MRR_cost_based 的优化器开关。示例中查问的预计行数太小，没有 MRR 的 hint 时，基于老本的优化无奈应用 MRR。只能显示用 hint 来干涉查问打算应用 MRR。当 MRR 的优化被应用时, MySQL 须要用到 random read buffer 来存储 indexes. 
有一个参数能够影响 MRR 的性能 read_rnd_buffer_size.

能够简略认为 BKA=BNL+MRR . 这使得能够以与非索引连贯相似的形式将连贯缓冲区用于索引连贯，并应用多范畴读取优化来缩小随机 I / O 的数量。BKA 用于大量 disk-bound 查问的场景。然而，没有明确的阐明来确定优化何时有帮忙，何时会导致性能降落。

能够借鉴一下国外出名 dba 在 MySQL 优化方面的 blog（http://oysteing.blogspot.com/2012/04/improved-dbt-3-results-w…）

MRR 在最优时，查问耗时缩小 20%, 最蹩脚时查问耗时减少 2 /3。

BKA 次要的获益在一个绝对较窄的查问范畴，而其余查问的性能可能会升高，因而默认状况下禁用该优化。

（能够简略了解为 MySQL5.6 时，bka 优化带来的收益小于 bka 带来的老本开销）除非确定开启 bka 能来晋升时，再用 hint BKA() 来启用。session 级别开启：

SET SESSION optimizer_switch = 'mrr=on,mrr_cost_based=off,batched_key_access=on';

一个应用 BKA 的例子

mysql> EXPLAIN
 SELECT /*+ BKA(ci) */
 co.Code, co.Name AS Country,
 ci.Name AS City
 FROM world.country co
 INNER JOIN world.city ci
 ON ci.CountryCode = co.Code\G
**************************** 1. row *****************************
 id: 1
 select_type: SIMPLE
 table: co
 partitions: NULL
 type: ALL
 possible_keys: PRIMARY
 key: NULL
 key_len: NULL
 ref: NULL
 rows: 239
 filtered: 100
 Extra: NULL
**************************** 2. row *****************************
 id: 1
 select_type: SIMPLE
 table: ci
 partitions: NULL
 type: ref
 possible_keys: CountryCode
 key: CountryCode
 key_len: 3
 ref: world.co.Code
 rows: 18
 filtered: 100
 Extra: Using join buffer (Batched Key Access)
2 rows in set, 1 warning (0.0007 sec)

留神看执行打算中 Extra 的关键字 “Using join buffer”, 阐明 join_buffer_size 会影响 BKA 个性的性能。从全局怎么调整 join_buffer_size，并又能充分利用上 BKA，是一个极大的挑战。调优最常见的问题，搞定了 A sql, 又引出了其余问题，比方内存使用率过高。

MySQL 还主动反对其余 join 优化，一旦对查问有性能帮忙，优化器会主动抉择他们，个别不须要手动。

理解一下其余 join 的优化形式，有助于咱们在遇到 sql 性能问题时，能够适当给与优化器，一些有用的 hint。

具体有哪些 join 优化形式，能够查看 explain 输入中的 Extra 的内容阐明。本文可能列举的不全，精力有限只做了一些简略的介绍，具体细节须要查看官网，以及大量的实际。

Condition Filtering 条件过滤当一个表有两个或多个与之相关联的条件，并且一个索引能够用于局部条件时，应用条件过滤优化。启用条件过滤后，在预计表的总体过滤时，将思考其余条件的过滤成果。
- Optimizer Switch: condition_fanout_filter – enabled by default
- Optimizer Hints: None
- EXPLAIN Output: None
Derived Merge 优化器能够将派生表(derived table)、视图援用和公共表表达式合并到它们所属的查问块中。优化的代替办法是物化表（materialize the table）、视图援用或公共表表达式。
- Optimizer Switch: derived_merge – enabled by default.
- Optimizer Hints: MERGE(), NO_MERGE().
- EXPLAIN Output: The query plan reflects that the derived table has been merged
Engine Condition Pushdown 此优化将条件向下推到存储引擎。目前仅 NDBCluster 存储引擎反对它。
Index Condition Pushdown

官网文档中给的例子和解释如下：people 表中（zipcode，lastname，firstname）形成一个索引 SELECT * FROM people WHERE zipcode=’95054′ AND lastname LIKE ‘%etrunia%’ AND address LIKE ‘%Main Street%’;

如果没有应用索引下推技术，则 MySQL 会通过 zipcode=’95054’ 从存储引擎中查问对应的数据，返回到 MySQL 服务端，而后 MySQL 服务端基于 lastname LIKE ‘%etrunia%’ 和 address LIKE ‘%Main Street%’ 来判断数据是否符合条件。

如果应用了索引下推技术，则 MySQL 首先会返回合乎 zipcode=’95054’ 的索引，而后依据 lastname LIKE ‘%etrunia%’ 和 address LIKE ‘%Main Street%’ 来判断索引是否符合条件。如果符合条件，则依据该索引来定位对应的数据，如果不合乎，则间接 reject 掉。

有了索引下推优化，能够在有 like 条件查问的状况下，缩小回表次数。

该优化也用于二级索引的范畴条件。
- Optimizer Switch: index_condition_pushdown – enabled by default.
- Optimizer Hints: NO_ICP().
- EXPLAIN Output: The traditional format has Using index condition in the Extra column, and the JSON format sets the index_condition field with the index condition that is pushed
Index Extensions InnoDB 中的所有二级非惟一索引都将主键列附加到索引中。当启用索引扩大优化时，MySQL 会将主键列视为索引的一部分。
- Optimizer Switch: use_index_extensions – enabled by default
- Optimizer Hints: None
- EXPLAIN Output: None
Index Visibility 当表具备不可见的索引（invisible index）时，默认状况下，优化器在创立查问打算时不会思考它。如果启用了索引可见性优化器开关，则将思考不可见的索引。例如，这能够用于测试已增加但尚未可见的索引的成果。
- Optimizer Switch: use_invisible_indexes – disabled by default
  - Optimizer Hints: None
  - EXPLAIN Output: None
Loose Index Scan 在某些状况下，MySQL 能够应用局部索引来进步聚合数据或蕴含 DISTINCT 子句的查问的性能。这要求列用于通过造成多列索引的左前缀以及不用于分组的其余列来分组数据。当存在 GROUP BY 子句时，只有 MIN（）和 MAX（）聚合函数能力应用这个个性。

distinct 效率更高还是 group by 效率更高？
- Optimizer Switch: None.
  - Optimizer Hints: NO_RANGE_OPTIMIZATION() disables the loose index scan optimization as well as index merges and range scans.
  - EXPLAIN Output: The traditional format has Using index for group-by in the Extra column. The JSON format sets the using_index_for_group_by field to true.
Range Access Method 范畴优化与其余优化略有不同，因为它被认为是一种拜访办法。MySQL 将只扫描表或索引的一个或多个局部，而不是执行残缺的表或索引扫描。范畴拜访办法通常用于波及运算符 >、>=、<、=<、BETWEEN、IN（），为 NULL、LIKE 等 range-Optimization 与 index merge Optimization 具备同样的重要性。（https://dev.mysql.com/doc/refman/8.0/en/range-optimization.html）
- Optimizer Switch: None.
  - Optimizer Hints: NO_RANGE_OPTIMIZATION() – this also disables the loose index scan and index merge optimizations. It does however not disable the skip scan optimization even though that also uses range access.
  - EXPLAIN Output: The access method is set to range. range_optimizer_max_mem_size 能够限度 range access 应用的内存。默认是 8M
Semijoin 半联接优化用于 IN 和 EXIST 条件。反对四个策略:

当 subquery materialization 开启时，Semijoin 会尽可能的应用 materialization 策略。EXISTS 在 MySQL8.0.16 当前反对半联接。NOT EXISTS 在 MySQL8.0.17 当前反对半联接。每种策略，都能够以参数的模式，用于 SEMIJOIN() and NO_SEMIJOIN() hint

SEMIJOIN(DUPSWEEDOUT):The duplicate weedout strategy executes the semijoin as if it is a normal join and removes the duplicates using a temporary table. EXPLAIN Output: The traditional format has Start temporary and End temporary in the Extra column for the tables involved. The JSON-formatted output uses a block named duplicates_removal

SEMIJOIN(FIRSTMATCH)：The first match strategy returns the first match for each value rather than all values. EXPLAIN Output: The traditional format has FirstMatch(…) in the Extra column where the value between parentheses is the name of the reference table. The JSON format sets the value of the first_match field to the name of the reference table

SEMIJOIN(LOOSESCAN)：The loose scan strategy uses an index to choose a single value from each of the subquery’s value groups. EXPLAIN Output: The traditional format has LooseScan(m..n) in the Extra column where m and n indicate which parts of the index are used for the loose scan. The JSON format sets the loosescan field equal to true

半连贯特地怕 null 值，Oracle 常常在以下方面出问题：
- where null in (a,b,c,null), null exists (null) .
  - sum（null) 返回 null,count(null) 返回 0
  - materialization
  - duplicate weedout
  - first match
  - loose scan (不要和 loose index scan optimization 混同)。
Skip Scan

Skip Scan MySQL 8.0.13 引入，工作形式相似 loose index scan. 当多列索引的第二列上存在范畴条件，但第一列上没有条件时应用。Skip Scan 将整个索引扫描转换为一系列范畴扫描（对索引中第一列的每个值进行一次范畴扫描）。

scan in the Extra column, and the JSON format sets the using_index_for_skip_scan field to true
- Optimizer Switch: skip_scan – enabled by default.
  - Optimizer Hints: SKIP_SCAN(), NO_SKIP_SCAN().
  - EXPLAIN Output: The traditional format has Using index for skip
Subquery Materialization

子查问物化策略将子查问的后果存储在外部长期表中。如果可能的话，优化器将在长期表上增加一个主动生成的哈希索引，将使其疾速连贯到查问的其余部分。

当启用了 subquery_materialization_cost_based 优化器开关（默认开）时，优化器将应用预计的老本来决定是应用 Subquery Materialization 还是应用 IN-to-EXIST 子查问转换（将 IN 条件重写为 EXISTS）。

当开关敞开时，优化器总是抉择 Subquery Materialization。
- Optimizer Switch: materialization – enabled by default.
  - Optimizer Hints: SUBQUERY(MATERIALIZATION).
  - EXPLAIN Output: The traditional format has MATERIALIZED as the select type. The JSON format creates a block named materialized_from_subquery.

还有咱们能够用哪些办法影响优化器，下篇文章再续。

Enjoy GreatSQL :)

## 对于 GreatSQL

GreatSQL 是由万里数据库保护的 MySQL 分支，专一于晋升 MGR 可靠性及性能，反对 InnoDB 并行查问个性，是实用于金融级利用的 MySQL 分支版本。

相干链接：GreatSQL 社区 Gitee GitHub Bilibili

社区博客有奖征稿详情：https://greatsql.cn/thread-100-1-1.html

微信：扫码增加 GreatSQL 社区助手 微信好友，发送验证信息 加群。

关于数据库:MySQL80-优化器介绍三

join 优化（Join Optimizations）

index merge

Intersection

Union Algorithm

Sort-Union Algorithm

Multi-Range Read (MRR)

Batched Key Access (BKA)

其余 join 优化

GreatSQL 社区：

技术交换群：

Just My Socks（注册教程内含优惠码）

关于数据库:MySQL80-优化器介绍三

join 优化（Join Optimizations）

index merge

Intersection

Union Algorithm

Sort-Union Algorithm

Multi-Range Read (MRR)

Batched Key Access (BKA)

其余 join 优化

GreatSQL 社区：

技术交换群：

Just My Socks（注册教程 内含优惠码）

Just My Socks（注册教程内含优惠码）