关于数据库:MySQL80-优化器介绍二

GreatSQL 社区原创内容未经受权不得随便应用，转载请分割小编并注明起源。
GreatSQL 是 MySQL 的国产分支版本，应用上与 MySQL 统一。
作者：奥特曼爱小怪兽
文章起源：GreatSQL 社区投稿

上一篇 MySQL8.0 优化器介绍（一）介绍了老本优化模型的三要素：表关联程序，与每张表返回的行数（过滤效率），查问老本。而 join 算法又是影响表关联效率的首要因素。

join 在 MySQL 是一个如此重要的章节，毫不夸大的说，everything is a join。

截止到本文写作时，MySQL8.0.32 GA 曾经发行，这里次要介绍三大 join：NL（Nested Loop），BNL（Block Nested Loop）,HASH JOIN

MySQL5.7 之前，都只有 NL，它实现简略，适宜索引查找。

简直每个人都能够手动实现一个 NL。

SELECT CountryCode, 
       country.Name AS Country,  
       city.Name AS City, 
       city.District  
  FROM world.country  
  INNER JOIN world.city  
    ON city.CountryCode = country.Code  
  WHERE Continent = 'Asia'; 
  
  ## 执行打算相似如下：-> Nested loop inner join
     -> Filter: (country.Continent = 'Asia')
         -> Table scan on country
     -> Index lookup on city using CountryCode
     (CountryCode=country.`Code`)
     
 ##python 代码实现一个 NL
 result = []
 for country_row in country:
     if country_row.Continent == 'Asia':
         for city_row in city.CountryCode['country_row.Code']:
             result.append(join_rows(country_row, city_row))

图示化一个 NL

NL 的限度：通常多个表 join，小表在前做驱动表，被驱动表有索引检索，效率会高一些。（官网手册上没有 full outer join ,full join 语法，理论反对 full join）

举个例子多表 join 且关联表不走索引：

# 人为干涉打算，走性能最差的执行门路。SELECT /*+ NO_BNL(city) */
       CountryCode, 
       country.Name AS Country,
       city.Name AS City, 
       city.District
  FROM world.country IGNORE INDEX (Primary)
  INNER JOIN world.city IGNORE INDEX (CountryCode)
    ON city.CountryCode = country.Code
  WHERE Continent = 'Asia';
  
  SELECT rows_examined, rows_sent,
         last_statement_latency AS latency
    FROM sys.session
   WHERE thd_id =  PS_CURRENT_THREAD_ID()\G
**************************** 1. row ****************************
rows_examined: 208268
 rows_sent: 1766
 latency: 44.83 ms
 
 ## 比照一下优化器 主动优化后的
 SELECT CountryCode, 
       country.Name AS Country,
       city.Name AS City, 
       city.District
  FROM world.country 
  INNER JOIN world.city 
    ON city.CountryCode = country.Code
  WHERE Continent = 'Asia';
  
  SELECT rows_examined, rows_sent,
         last_statement_latency AS latency
    FROM sys.session
   WHERE thd_id =  PS_CURRENT_THREAD_ID()\G
*************************** 1. row ***************************
rows_examined: 2005
 rows_sent: 1766
 latency: 4.36 ms
1 row in set (0.0539 sec)

块嵌套循环算法是嵌套循环算法的扩大。它也被称为 BNL 算法。连贯缓冲区用于收集尽可能多的行，并在第二个表的一次扫描中比拟所有行，而不是一一提交第一个表中的行。这能够大大提高 NL 在某些查问上的性能。

hash join 是在 MySQL8.0.18 引进的，上面的 sql，应用了 NO_HASH_JOIN(country,city) 的提醒，并且两表的 join 字段上的索引被疏忽，目标是为了介绍 BNL 个性。

SELECT /*+ NO_HASH_JOIN(country,city) */
       CountryCode, 
       country.Name AS Country,
       city.Name AS City, 
       city.District
 FROM world.country IGNORE INDEX (Primary)
 INNER JOIN world.city IGNORE INDEX (CountryCode)
   ON city.CountryCode = country.Code
WHERE Continent = 'Asia';

## 应用 python 伪代码来解释一下 BNL
result = []
join_buffer = []
for country_row in country:
     if country_row.Continent == 'Asia':
     join_buffer.append(country_row.Code)
     if is_full(join_buffer):
         for city_row in city:
             CountryCode = city_row.CountryCode
             if CountryCode in join_buffer:
                 country_row = get_row(CountryCode)
                 result.append(join_rows(country_row, city_row))
             join_buffer = []
if len(join_buffer) > 0:
     for city_row in city:
     CountryCode = city_row.CountryCode
     if CountryCode in join_buffer:
         country_row = get_row(CountryCode)
         result.append(join_rows(country_row, city_row))
   join_buffer = []

图示化一个 BNL

留神图里的 join_buffer, 在 MySQL5.7 上应用 sysbench 压测读写场景，压力上不去，次要就是因为 BNL 算法下，join_buffer_size 的设置为默认值。适当调整几个 buffer 后，tps 失去显著进步。join buffer 对查问影响，也能够用上面的例子做一个量化阐明。

SELECT /*+ NO_HASH_JOIN(country,city) */
       CountryCode, 
       country.Name AS Country,
       city.Name AS City, 
       city.District
 FROM world.country IGNORE INDEX (Primary)
 INNER JOIN world.city IGNORE INDEX (CountryCode)
   ON city.CountryCode = country.Code
WHERE Continent = 'Asia';

 SELECT rows_examined, rows_sent,
         last_statement_latency AS latency
    FROM sys.session
   WHERE thd_id =  PS_CURRENT_THREAD_ID()\G
*************************** 1. row ***************************
rows_examined: 4318
 rows_sent: 1766
 latency: 16.87 ms
1 row in set (0.0490 sec)

#人为干涉打算，走性能最差的执行门路。SELECT /*+ NO_BNL(city) */
       CountryCode, 
       country.Name AS Country,
       city.Name AS City, 
       city.District
  FROM world.country IGNORE INDEX (Primary)
  INNER JOIN world.city IGNORE INDEX (CountryCode)
    ON city.CountryCode = country.Code
  WHERE Continent = 'Asia';
  
  SELECT rows_examined, rows_sent,
         last_statement_latency AS latency
    FROM sys.session
   WHERE thd_id =  PS_CURRENT_THREAD_ID()\G
**************************** 1. row ****************************
rows_examined: 208268
 rows_sent: 1766
 latency: 44.83 ms
 
 在两表 join，且 join 字段不应用索引的前提下，BNL +join_buffer 性能远大于 NL

应用 BNL 有几个点须要留神。（我切实懒得全文翻译官网文档了）

Only the columns required for the join are stored in the join buffer. This means that you will need less memory for the join buffer than you may at first expect.（不须要配置太高 buffer)
The size of the join buffer is configured with the join_buffer_size variable. The value of join_buffer_size is the minimum size of the buffer! Even if less than 1 KiB of country code values will be stored in the join buffer in the discussed example, if join_buffer_size is set to 1 GiB, then 1 GiB will be allocated. For this reason, keep the value of join_buffer_size low and only increase it as needed.
One join buffer is allocated per join using the block nested loop algorithm.
Each join buffer is allocated for the entire duration of the query.
The block nested loop algorithm can be used for full table scans, full index scans, and range scans.(实用 table access 形式）
The block nested loop algorithm will never be used for constant tables as well as the first nonconstant table. This means that it requires a join between two tables with more than one row after filtering by unique indexes to use the block nested loop algorithm

能够通过 optimizer switch 配置 BNL()、NO_BNL()

BNL 特地善于在 non-indexed joins 的场景，很多时候性能优于 hash join。As of version 8.0.20, block nested-loop joins are no longer used; instead, a hash join has replaced it.

Hash Join 作为大杀器在 MySQL8.0.18 引入，期间有过引起内存和文件句柄大量耗费的线上问题，然而不障碍它作为一个 join 算法的重大突破，特地适宜大表与大表的无索引 join。某些场景甚至比 NL+index join 更快。（当然比起 oracle 上的 hash join 仍然弱爆了，40w * 40w 的大表查问，MySQL 优化到极致在 10s 左右，oracle 在 1s 程度，相差一个数量级。

思考：MySQL、Oracle 都是 hash join，为何差距如此大？MySQL hash join 能够哪些方面进行性能进步？

业界次要有两大方向

单线程 hash 优化算法和数据结构
NUMA 架构下，多线程 Hash Join 的优化次要是可能让读写数据尽量产生在以后 NUMA node

参考文档（https://zhuanlan.zhihu.com/p/589601705）

大家无妨看看 MySQL 工程师的 worklog，内容很精彩（https://dev.mysql.com/worklog/task/?id=2241）

能够看出国外大厂弱小的标准化的 it 生产能力，一个性能从需要到实现经验了哪些关键步骤。

MySQL 的 Hash join 是一个内存 hash+ 磁盘 hash 的混合扩大。为了不让 hash join 溢出 join buffer, 须要加大内存设置，应用磁盘 hash 时，须要配置更多的文件句柄数。只管有 disk hash，但理论干活的还是 in-memory hash。

内存 hash 有两个阶段：

build phase. One of the tables in the join is chosen as the build table. The hash is calculated for the columns required for the join and loaded into memory.
probe phase. The other table in the join is the probe input. For this table, rows are read one at a time, and the hash is calculated. Then a hash key lookup is performed on the hashes calculated from the build table, and the result of the join is generated from the matching rows.

当 hashes of build table 不足以全副放进内存时，MySQL 会主动切换到 on-disk 的扩大实现（基于 GRACE hash join algorithm）。在 build phase 阶段，join buffer 满，就会产生 in-mem hash 向 on-disk hash 转换。

on-disk algorithm 蕴含 3 个步骤：

Calculate the hashes of all rows in both the build and probe tables and store them on disk in several small files partitioned by the hash. The number of partitions is chosen to make each partition of the probe table fit into the join buffer but with a limit of at most 128 partitions.
Load the first partition of the build table into memory and iterate over the hashes from the probe table in the same way as for the probe phase for the in-memory algorithm. Since the partitioning in step 1 uses the same hash function for both the build and probe tables, it is only necessary to iterate over the first partition of the probe table.
Clear the in-memory buffer and continue with the rest of the partitions one by one

无论是内存 hash 还是磁盘 hash，都应用 xxHash64 hash function。xxHash64 有足够快，hash 品质好（reducing the number of hash collisions）的特点

BNL 不会被选中的时候，MySQL 就会选用 hash join。

在整顿这篇材料时，对要应用的哈希连贯算法存在以下要求：

The join must be an inner join.
The join cannot be performed using an index, either because there is no available index or because the indexes have been disabled for the query.
All joins in the query must have at least one equi-join condition between the two tables in the join, and only columns from the two tables as well as constants are referenced in the condition.（查问中的所有联接必须在联接中的两个表之间至多有一个等联接条件，并且在该条件中仅援用两个表中的列以及常量）
As of 8.0.20, anti, semi, and outer joins are also supported. If you join the two tables t1 and t2, then examples of join conditions that are supported for hash join include
t1.t1_val = t2.t2_val
t1.t1_val = t2.t2_val + 2
t1.t1_val1 = t2.t2_val AND t1.t1_val2 > 100
MONTH(t1.t1_val) = MONTH(t2.t2_val)

用一个例子来阐明一下 hash join：

SELECT CountryCode, 
       country.Name AS Country,
       city.Name AS City, 
       city.District
 FROM world.country IGNORE INDEX (Primary)
 INNER JOIN world.city IGNORE INDEX (CountryCode)
   ON city.CountryCode = country.Code
WHERE Continent = 'Asia';

#用一段伪代码翻译一下
result = []
join_buffer = []
partitions = 0
on_disk = False
for country_row in country:
     if country_row.Continent == 'Asia':
         hash = xxHash64(country_row.Code)
         if not on_disk:
             join_buffer.append(hash)
             if is_full(join_buffer):
             # Create partitions on disk
               on_disk = True
               partitions = write_buffer_to_disk(join_buffer)
               join_buffer = []
         else
             write_hash_to_disk(hash)
             
if not on_disk:
     for city_row in city:
         hash = xxHash64(city_row.CountryCode)
         if hash in join_buffer:
             country_row = get_row(hash)
             city_row = get_row(hash)
             result.append(join_rows(country_row, city_row))
else:
     for city_row in city:
         hash = xxHash64(city_row.CountryCode)
         write_hash_to_disk(hash)
         
     for partition in range(partitions):
         join_buffer = load_build_from_disk(partition)
         for hash in load_hash_from_disk(partition):
         if hash in join_buffer:
             country_row = get_row(hash)
             city_row = get_row(hash)
             result.append(join_rows(country_row, city_row))
 join_buffer = []

与所应用的理论算法相比，所形容的算法略微简化了一些。真正的算法必须思考哈希抵触，而对于磁盘上的算法，某些分区可能会变得太大而无奈放入连贯缓冲区，在这种状况下，它们会被分块解决，以防止应用比配置的内存更多的内存

图示化一下 in-mem hash 算法：

量化一下 hash join 的老本

SELECT CountryCode, 
       country.Name AS Country,
       city.Name AS City, 
       city.District
 FROM world.country IGNORE INDEX (Primary)
 INNER JOIN world.city IGNORE INDEX (CountryCode)
   ON city.CountryCode = country.Code
WHERE Continent = 'Asia';

SELECT rows_examined, rows_sent,
         last_statement_latency AS latency
    FROM sys.session
   WHERE thd_id =  PS_CURRENT_THREAD_ID()\G
**************************** 1. row ****************************
rows_examined: 4318
 rows_sent: 1766
 latency: 3.53 ms

从本文的例子中，rows_examined 的角度来看 index_join 下的 NL（2005）优于无索引 join 条件下的 BNL (4318）= 无索引 join 条件下的 hash join。然而当数据量发生变化，可能后果就不一样了，事实中也没有相对的性能好坏规定（如果有，基于规定的成本计算就能很好解决的查问问题，实际上更值得信赖的是老本估算），hash join 与 NL,BNL 的优越比拟，列出几点，当作夸夸其谈：

For a join without using an index, the hash join will usually be much faster than a block nested join unless a LIMIT clause has been added. Improvements of more than a factor of 1000 have been observed.

参考:

（https://mysqlserverteam.com/hash-join-in-mysql-8/）
（https://dev.mysql.com/blog-archive/hash-join-in-mysql-8/）
（http://www.slideshare.net/NorvaldRyeng/mysql-8018-latest-upda…）
For a join without an index where there is a LIMIT clause, a block nested loop can exit when enough rows have been found, whereas a hash join will complete the entire join (but can skip fetching the rows). If the number of rows included due to the LIMIT clause is small compared to the total number of rows found by the join, a block nested loop may be faster.
For joins supporting an index, the hash join algorithm can be faster if the index has a low selectivity.

Hash Join 最大的益处在于晋升多表无索引关联时的查问性能。具体 NL,BNL,HASH JOIN 谁更适宜用于查问打算，实际才是最好的证实。

同样能够应用 HASH_JOIN() 和 NO_HASH_JOIN() 的 hint 来影响查问打算。

MySQL8.0 对于反对的三种 high-level 连贯策略的探讨临时到此结束。下来能够本人去查一下 anti, semi, and outer joins。

更多细节参考

（https://dev.mysql.com/doc/refman/8.0/en/select-optimization.html）（https://dev.mysql.com/doc/refman/8.0/en/semijoins.html）

还有一些对于 join 的 lower-level 优化值得思考，下篇文章合成。

Enjoy GreatSQL :)

## 对于 GreatSQL

GreatSQL 是由万里数据库保护的 MySQL 分支，专一于晋升 MGR 可靠性及性能，反对 InnoDB 并行查问个性，是实用于金融级利用的 MySQL 分支版本。

相干链接：GreatSQL 社区 Gitee GitHub Bilibili

社区博客有奖征稿详情：https://greatsql.cn/thread-100-1-1.html

微信：扫码增加 GreatSQL 社区助手 微信好友，发送验证信息 加群。

关于数据库:MySQL80-优化器介绍二

join 算法（Join Algorithms）

嵌套循环（Nested Loop）

块嵌套循环（Block Nested Loop）

哈希 join（Hash Join）

GreatSQL 社区：

技术交换群：