关于数据库:MySQL80-优化器介绍三

GreatSQL社区原创内容未经受权不得随便应用，转载请分割小编并注明起源。
GreatSQL是MySQL的国产分支版本，应用上与MySQL统一。
作者：奥特曼爱小怪兽
文章起源：GreatSQL社区原创

往期回顾

MySQL8.0 优化器介绍（一）

MySQL8.0 优化器介绍（二）

本篇将进一步深刻介绍优化器相干的join优化

为更好的了解本篇内容须要提前看一下以下内容：

单表拜访的办法，参考《MySQL 是怎么运行的：从根儿上了解 MySQL》第10章"单表拜访办法"
更多select语句级别的优化细节见（https://dev.mysql.com/doc/refman/8.0/en/select-optimization.html）

为了让读者对join优化有更深的理解，章节里的sql例子，留了一些思考和入手的问题。可能大家失去的答案会不同，但摸索未知的过程，形式应该是一样的。

join优化（Join Optimizations）

MySQL能够应用Join Optimizations来改良上次分享过的join algorithms，或者决定如何执行局部查问。本次次要介绍三种常常用到的join Optimizations，更多的 join type 见上面的链接：（https://dev.mysql.com/doc/refman/8.0/en/explain-output.html#e...）

index merge

通常MySQL只会对每个表应用一个索引。然而，如果对同一表中的多个列在where后有条件限度，并且没有笼罩所有列的单个索引，无论选哪个索引都不是最佳的。对于这些状况，MySQL反对索引合并 (index merge)。select a,b,c from t where a=x1 and b=x2 and c=x3,这种状况下，建设一个多列的复合索引 index_abc 比应用 index merge +index_a+index_b+index_c 性能更好。

Index merge 反对三种算法见下表

查问打算应用index merge 时，会在explain sql 的 access type 列有"index_merge"，key 列会蕴含所有参加merge的列， key_length 蕴含一个所用索引的最长要害局部的列表。举个Intersection例子：

Intersection

以下代码块正文中提到的知识点略多

##无论optimizer 是否抉择  index merge 取决于index  statistics. ## index  statistics 是从哪个试图取得呢？mysql.innodb_index_stats  还是 information_schema.statistics ## 还是 information_schema.INNODB_SYS_TABLESTATS？ ## 能够参考 https://www.cnblogs.com/ClassicMan/articles/15871403.html## index_dive  eq_range_index_dive_limit 这两个参数有什么作用？##意味着即便返回雷同STATEMENT_DIGEST_TEXT的sql查问语句, WHERE语句前面跟不同的值，失去的查问打算可能是不一样的  ##比方select * from people where name='惟一值'；##select * from people where name='超级多的反复值'## 同理index statistics 的扭转会让同一个查问走不同的执行打算，## 体现在 select a,b from t where a=1 and b=1  有时走了 index merges，有时没走。CREATE TABLE `payment` ( `payment_id` smallint unsigned NOT NULL, `customer_id` smallint unsigned NOT NULL, `staff_id` tinyint unsigned NOT NULL, `rental_id` int(DEFAULT NULL, `amount` decimal(5,2) NOT NULL, `payment_date` datetime NOT NULL, `last_update` timestamp NULL, PRIMARY KEY (`payment_id`), KEY `idx_fk_staff_id` (`staff_id`), KEY `idx_fk_customer_id` (`customer_id`), KEY `fk_payment_rental` (`rental_id`)) ENGINE=InnoDB;## case1  等值查问SELECT *  FROM sakila.payment WHERE staff_id = 1   AND customer_id = 75;  mysql> EXPLAIN SELECT *                  FROM sakila.payment                 WHERE staff_id = 1                   AND customer_id = 75\G**************************** 1. row ***************************** id: 1 select_type: SIMPLE table: payment partitions: NULL type: index_merge possible_keys: idx_fk_staff_id,idx_fk_customer_id key: idx_fk_customer_id,idx_fk_staff_id key_len: 2,1 ref: NULL rows: 20 filtered: 100 Extra: Using intersect(idx_fk_customer_id,idx_fk_staff_id); Using where 1 row in set, 1 warning (0.0007 sec) mysql> EXPLAIN FORMAT=TREE SELECT * FROM sakila.payment WHERE staff_id = 1 AND customer_id = 75\G**************************** 1. row ****************************EXPLAIN: -> Filter: ((sakila.payment.customer_id = 75) and (sakila.payment.staff_id = 1)) (cost=14.48 rows=20)             -> Index range scan on payment using intersect(idx_fk_customer_id,idx_fk_staff_id) (cost=14.48 rows=20)1 row in set (0.0004 sec)##留神"Index range scan on payment",两个等值查问条件，为啥触发了rang scan？## case2  上面的sql范畴查问也能用到index  merge 吗？执行打算 本人上来测试验证SELECT * FROM sakila.payment WHERE payment_id > 10 AND customer_id = 318;

Union Algorithm

##case1 等值查问SELECT * FROM sakila.payment WHERE staff_id = 1 OR customer_id = 318;  mysql> EXPLAIN SELECT * FROM sakila.payment WHERE staff_id = 1 OR customer_id = 318\G**************************** 1. row ***************************** id: 1 select_type: SIMPLE table: payment partitions: NULL type: index_merge possible_keys: idx_fk_staff_id,idx_fk_customer_id key: idx_fk_staff_id,idx_fk_customer_id key_len: 1,2 ref: NULL rows: 8069 filtered: 100 Extra: Using union(idx_fk_staff_id,idx_fk_customer_id); Using where1 row in set, 1 warning (0.0008 sec)  mysql> EXPLAIN FORMAT=TREE SELECT * FROM sakila.payment WHERE staff_id = 1 OR customer_id = 318\G**************************** 1. row ****************************EXPLAIN: -> Filter: ((sakila.payment.staff_id = 1) or (sakila.payment.customer_id = 318)) (cost=2236.18 rows=8069)             -> Index range scan on payment using union(idx_fk_staff_id,idx_fk_customer_id) (cost=2236.18 rows=8069)1 row in set (0.0010 sec) ## case2 范畴查问也能用到index  merge 吗？执行打算 本人上来测试验证，  ## 有主键参加后，和Intersection 章节的case2 执行打算中用到的索引个数有啥不同？SELECT * FROM sakila.payment WHERE payment_id > 15000 OR customer_id = 318;

Sort-Union Algorithm

SELECT * FROM sakila.payment WHERE customer_id < 30 OR rental_id < 10;  mysql> EXPLAIN SELECT * FROM sakila.payment WHERE customer_id < 30 OR rental_id < 10\G**************************** 1. row ***************************** id: 1 select_type: SIMPLE table: payment partitions: NULL type: index_mergepossible_keys: idx_fk_customer_id,fk_payment_rental key: idx_fk_customer_id,fk_payment_rental key_len: 2,5 ref: NULL rows: 826 filtered: 100 Extra: Using sort_union(idx_fk_customer_id,fk_payment_rental); Using where 1 row in set, 1 warning (0.0009 sec)mysql> EXPLAIN FORMAT=TREE SELECT * FROM sakila.payment WHERE customer_id < 30 OR rental_id < 10\G**************************** 1. row *****************************EXPLAIN: -> Filter: ((sakila.payment.customer_id < 30) or (sakila.payment.rental_id < 10)) (cost=1040.52 rows=826)             -> Index range scan on payment using sort_union(idx_fk_customer_id,fk_payment_rental) (cost=1040.52 rows=826)1 row in set (0.0005 sec)

Multi-Range Read (MRR)

多范畴读取（MRR）优化旨在缩小对辅助索引进行范畴扫描所导致的随机I/O量。优化读取索引

首先，依据行id（InnoDB的汇集索引）对键进行排序，而后按行的存储程序检索行。多量程读取优化

能够用于范畴扫描和应用索引的等值连贯。不反对虚构生成列上的辅助索引。

应用InnoDB进行多范畴读取优化的次要用例是用于没有笼罩索引的磁盘绑定查问（ disk-bound queries 另外一个层面对disk-bound 的优化，具体可见：https://dev.mysql.com/doc/refman/8.0/en/optimizing-innodb-dis...）。优化的成果取决于须要多少行以及存储器的查找时间。MySQL将会估算（estimate）是否有用。然而，老本估算在过于乐观而不是过于乐观的一面，因而可能有必要提供帮忙优化器做出正确决策的信息。

有两个 optimizer switches 管制MRR优化

mrr: Whether the optimizer is allowed to use the Multi-Range Read optimization. The default is ON.
mrr_cost_based: Whether the decision to use the Multi-Range Read optimization is cost based. You can disable this option to always use the optimization when it is supported. The default is ON

能够用MRR() 和NO_MRR() 两个optimizer switches 来管制表级别or 索引级别的 Multi-Range Read ，举个例子：

mysql> EXPLAIN SELECT /*+ MRR(city) */ * FROM world.city WHERE CountryCode BETWEEN 'AUS' AND 'CHN'\G**************************** 1. row ***************************** id: 1 select_type: SIMPLE table: city partitions: NULL type: rangepossible_keys: CountryCode key: CountryCode key_len: 3 ref: NULL rows: 812 filtered: 100 Extra: Using index condition; Using MRR1 row in set, 1 warning (0.0006 sec)有必要应用MRR（）优化器提醒或禁用基于MRR_cost_based的优化器开关。示例中查问的预计行数太小，没有MRR的hint时，基于老本的优化无奈应用MRR。只能显示用hint来干涉查问打算应用MRR。当MRR的优化被应用时, MySQL须要用到random read buffer来存储indexes. 有一个参数能够影响MRR的性能 read_rnd_buffer_size.

Batched Key Access (BKA)

能够简略认为 BKA=BNL+MRR .这使得能够以与非索引连贯相似的形式将连贯缓冲区用于索引连贯，并应用多范畴读取优化来缩小随机I/O的数量。BKA 用于大量 disk-bound 查问的场景。然而，没有明确的阐明来确定优化何时有帮忙，何时会导致性能降落。

能够借鉴一下国外出名dba在MySQL 优化方面的blog（http://oysteing.blogspot.com/2012/04/improved-dbt-3-results-w...）

MRR 在最优时，查问耗时缩小20%,最蹩脚时查问耗时减少2/3。

BKA 次要的获益在一个绝对较窄的查问范畴，而其余查问的性能可能会升高，因而默认状况下禁用该优化。

（能够简略了解为 MySQL5.6时，bka优化带来的收益小于bka带来的老本开销）除非确定开启bka能来晋升时，再用hint BKA() 来启用。session级别开启：

SET SESSION optimizer_switch = 'mrr=on,mrr_cost_based=off,batched_key_access=on';

一个应用BKA的例子

mysql> EXPLAIN SELECT /*+ BKA(ci) */ co.Code, co.Name AS Country, ci.Name AS City FROM world.country co INNER JOIN world.city ci ON ci.CountryCode = co.Code\G**************************** 1. row ***************************** id: 1 select_type: SIMPLE table: co partitions: NULL type: ALL possible_keys: PRIMARY key: NULL key_len: NULL ref: NULL rows: 239 filtered: 100 Extra: NULL**************************** 2. row ***************************** id: 1 select_type: SIMPLE table: ci partitions: NULL type: ref possible_keys: CountryCode key: CountryCode key_len: 3 ref: world.co.Code rows: 18 filtered: 100 Extra: Using join buffer (Batched Key Access)2 rows in set, 1 warning (0.0007 sec)

留神看执行打算中Extra 的关键字 "Using join buffer",阐明 join_buffer_size 会影响BKA 个性的性能。从全局怎么调整join_buffer_size，并又能充分利用上BKA，是一个极大的挑战。调优最常见的问题，搞定了A sql,又引出了其余问题，比方内存使用率过高。

其余join优化

MySQL 还主动反对其余join 优化，一旦对查问有性能帮忙，优化器会主动抉择他们，个别不须要手动。

理解一下其余join的优化形式，有助于咱们在遇到sql性能问题时，能够适当给与优化器，一些有用的hint。

具体有哪些join 优化形式，能够查看explain 输入中的Extra 的内容阐明。本文可能列举的不全，精力有限只做了一些简略的介绍，具体细节须要查看官网，以及大量的实际。

Condition Filtering 条件过滤当一个表有两个或多个与之相关联的条件，并且一个索引能够用于局部条件时，应用条件过滤优化。启用条件过滤后，在预计表的总体过滤时，将思考其余条件的过滤成果。
- Optimizer Switch: condition_fanout_filter – enabled by default
- Optimizer Hints: None
- EXPLAIN Output: None
Derived Merge 优化器能够将派生表(derived table)、视图援用和公共表表达式合并到它们所属的查问块中。优化的代替办法是物化表（materialize the table）、视图援用或公共表表达式。
- Optimizer Switch: derived_merge – enabled by default.
- Optimizer Hints: MERGE(), NO_MERGE().
- EXPLAIN Output: The query plan reflects that the derived table has been merged
Engine Condition Pushdown 此优化将条件向下推到存储引擎。目前仅NDBCluster存储引擎反对它。
Index Condition Pushdown
官网文档中给的例子和解释如下：people表中（zipcode，lastname，firstname）形成一个索引 SELECT * FROM people WHERE zipcode='95054' AND lastname LIKE '%etrunia%' AND address LIKE '%Main Street%';
如果没有应用索引下推技术，则MySQL会通过zipcode='95054'从存储引擎中查问对应的数据，返回到MySQL服务端，而后MySQL服务端基于lastname LIKE '%etrunia%'和address LIKE '%Main Street%'来判断数据是否符合条件。
如果应用了索引下推技术，则MySQL首先会返回合乎zipcode='95054'的索引，而后依据lastname LIKE '%etrunia%'和address LIKE '%Main Street%'来判断索引是否符合条件。如果符合条件，则依据该索引来定位对应的数据，如果不合乎，则间接reject掉。
有了索引下推优化，能够在有like条件查问的状况下，缩小回表次数。
该优化也用于二级索引的范畴条件。
- Optimizer Switch: index_condition_pushdown – enabled by default.
- Optimizer Hints: NO_ICP().
- EXPLAIN Output: The traditional format has Using index condition in the Extra column, and the JSON format sets the index_condition field with the index condition that is pushed
Index Extensions InnoDB中的所有二级非惟一索引都将主键列附加到索引中。当启用索引扩大优化时，MySQL会将主键列视为索引的一部分。
- Optimizer Switch: use_index_extensions – enabled by default
- Optimizer Hints: None
- EXPLAIN Output: None
Index Visibility 当表具备不可见的索引（ invisible index）时，默认状况下，优化器在创立查问打算时不会思考它。如果启用了索引可见性优化器开关，则将思考不可见的索引。例如，这能够用于测试已增加但尚未可见的索引的成果。
- Optimizer Switch: use_invisible_indexes – disabled by default
  - Optimizer Hints: None
  - EXPLAIN Output: None
Loose Index Scan 在某些状况下，MySQL能够应用局部索引来进步聚合数据或蕴含DISTINCT子句的查问的性能。这要求列用于通过造成多列索引的左前缀以及不用于分组的其余列来分组数据。当存在GROUP BY子句时，只有MIN（）和MAX（）聚合函数能力应用这个个性。
distinct效率更高还是group by效率更高？
- Optimizer Switch: None.
  - Optimizer Hints: NO_RANGE_OPTIMIZATION() disables the loose index scan optimization as well as index merges and range scans.
  - EXPLAIN Output: The traditional format has Using index for group-by in the Extra column. The JSON format sets the using_index_for_group_by field to true.
Range Access Method 范畴优化与其余优化略有不同，因为它被认为是一种拜访办法。MySQL将只扫描表或索引的一个或多个局部，而不是执行残缺的表或索引扫描。范畴拜访办法通常用于波及运算符>、>=、<、=<、BETWEEN、IN（），为NULL、LIKE等 range-Optimization 与 index merge Optimization 具备同样的重要性。（https://dev.mysql.com/doc/refman/8.0/en/range-optimization.html）
- Optimizer Switch: None.
  - Optimizer Hints: NO_RANGE_OPTIMIZATION() – this also disables the loose index scan and index merge optimizations. It does however not disable the skip scan optimization even though that also uses range access.
  - EXPLAIN Output: The access method is set to range. range_optimizer_max_mem_size 能够限度 range access应用的内存。默认是8M
Semijoin 半联接优化用于IN和EXIST条件。反对四个策略:
当subquery materialization 开启时， Semijoin 会尽可能的应用 materialization策略。EXISTS 在MySQL8.0.16 当前反对半联接。NOT EXISTS 在MySQL8.0.17 当前反对半联接。每种策略，都能够以参数的模式，用于SEMIJOIN() and NO_SEMIJOIN() hint
SEMIJOIN(DUPSWEEDOUT):The duplicate weedout strategy executes the semijoin as if it is a normal join and removes the duplicates using a temporary table. EXPLAIN Output: The traditional format has Start temporary and End temporary in the Extra column for the tables involved. The JSON-formatted output uses a block named duplicates_removal
SEMIJOIN(FIRSTMATCH)：The first match strategy returns the first match for each value rather than all values. EXPLAIN Output: The traditional format has FirstMatch(...) in the Extra column where the value between parentheses is the name of the reference table. The JSON format sets the value of the first_match field to the name of the reference table
SEMIJOIN(LOOSESCAN)：The loose scan strategy uses an index to choose a single value from each of the subquery’s value groups. EXPLAIN Output: The traditional format has LooseScan(m..n) in the Extra column where m and n indicate which parts of the index are used for the loose scan. The JSON format sets the loosescan field equal to true
半连贯特地怕 null 值，Oracle 常常在以下方面出问题：
- where null in (a,b,c,null), null exists (null) .
  - sum（null) 返回null,count(null) 返回 0
  - materialization
  - duplicate weedout
  - first match
  - loose scan (不要和 loose index scan optimization混同)。
Skip Scan
Skip Scan MySQL 8.0.13 引入，工作形式相似loose index scan.当多列索引的第二列上存在范畴条件，但第一列上没有条件时应用。Skip Scan将整个索引扫描转换为一系列范畴扫描（对索引中第一列的每个值进行一次范畴扫描）。
scan in the Extra column, and the JSON format sets the using_index_for_skip_scan field to true
- Optimizer Switch: skip_scan – enabled by default.
  - Optimizer Hints: SKIP_SCAN(), NO_SKIP_SCAN().
  - EXPLAIN Output: The traditional format has Using index for skip
Subquery Materialization
子查问物化策略将子查问的后果存储在外部长期表中。如果可能的话，优化器将在长期表上增加一个主动生成的哈希索引，将使其疾速连贯到查问的其余部分。
当启用了subquery_materialization_cost_based优化器开关（默认开）时，优化器将应用预计的老本来决定是应用Subquery Materialization 还是应用 IN-to-EXIST子查问转换（将IN条件重写为EXISTS）。
当开关敞开时，优化器总是抉择Subquery Materialization。
- Optimizer Switch: materialization – enabled by default.
  - Optimizer Hints: SUBQUERY(MATERIALIZATION).
  - EXPLAIN Output: The traditional format has MATERIALIZED as the select type. The JSON format creates a block named materialized_from_subquery.

还有咱们能够用哪些办法影响优化器，下篇文章再续。

Enjoy GreatSQL :)

## 对于 GreatSQL

GreatSQL是由万里数据库保护的MySQL分支，专一于晋升MGR可靠性及性能，反对InnoDB并行查问个性，是实用于金融级利用的MySQL分支版本。

相干链接： GreatSQL社区 Gitee GitHub Bilibili

GreatSQL社区：

社区博客有奖征稿详情：https://greatsql.cn/thread-100-1-1.html

技术交换群：

微信：扫码增加GreatSQL社区助手微信好友，发送验证信息加群。