简介: 本篇介绍子查问、剖析表和JOIN的简单转换过程
一 背景和架构
在《庖丁解牛-图解MySQL 8.0优化器查问解析篇》一文中咱们重点介绍了MySQL最新版本8.0.25对于SQL根本元素表、列、函数、聚合、分组、排序等元素的解析、设置和转换过程,本篇咱们持续来介绍更为简单的子查问、分区表和JOIN的简单转换过程,纲要如下:
Transformation
remove_redundant_subquery_clause :Permanently remove redundant parts from the query if 1) This is a subquery 2) Not normalizing a view. Removal should take place when a query involving a view is optimized, not when the view is created.
remove_base_options:Remove SELECT_DISTINCT options from a query block if can skip distinct
resolve_subquery :Resolve predicate involving subquery, perform early unconditional subquery transformations
Convert subquery predicate into semi-join, or
Mark the subquery for execution using materialization, or
Perform IN->EXISTS transformation, or
Perform more/less ALL/ANY -> MIN/MAX rewrite
Substitute trivial scalar-context subquery with its value
transform_scalar_subqueries_to_join_with_derived:Transform eligible scalar subqueries to derived tables.
flatten_subqueries :Convert semi-join subquery predicates into semi-join join nests. Convert candidate subquery predicates into semi-join join nests. This transformation is performed once in query lifetime and is irreversible.
apply_local_transforms :
delete_unused_merged_columns : If query block contains one or more merged derived tables/views, walk through lists of columns in select lists and remove unused columns.
simplify_joins : Convert all outer joins to inner joins if possible.
prune_partitions :Perform partition pruning for a given table and condition.
push_conditions_to_derived_tables :Pushing conditions down to derived tables must be done after validity checks of grouped queries done by apply_local_transforms();
Window::eliminate_unused_objects:Eliminate unused window definitions, redundant sorts etc.
二 具体转换过程
1 解析子查问(resolve_subquery)
解析条件中带有子查问的语句,做一些晚期的无限度的子查问转换,包含:
标记subquery是否变成semi-join
转换判断条件
- 查看OPTIMIZER_SWITCH_SEMIJOIN和HINT没有限度
- 子查问是IN/=ANY和EXIST subquery的谓词
- 子查问是简略查问块而不是UNION
- 子查问无隐形和显性的GROUP BY
- 子查问没有HAVING、WINDOW函数
- Resolve的阶段是Query_block::RESOLVE_CONDITION和Query_block::RESOLVE_JOIN_NEST并且没有用到最新的Hyper optimizer优化器。
- 外查问块能够反对semijoins
- 至多要一个表,而不是相似"SELECT 1"
- 子查问的策略还没有指定Subquery_strategy::UNSPECIFIED
- 父查问也至多有一个表
- 父查问和子查问都不能有straight join
- 父查问块不禁止semijoin
- IN谓词返回值是否是确定的,不是RAND
- 依据子查问判断后果是否须要转成true还是false以及是否为NULL,判断是能够做antijoin还是semijoin
- Antijoin是能够反对的,或者是semijoin
- offset和limit对于semjoin是无效的,offset是从第一行开始,limit也不是0
设置Subquery_strategy::CANDIDATE_FOR_SEMIJOIN并增加sj_candidates
标记subquery是否执行时采纳materialization计划
- 如果不合乎转换semijoin,尝试应用物化形式,转换判断条件
- Optimzier开关subquery_to_derived=on
- 子查问是IN/=ANY or EXISTS谓词
- 子查问是简略查问块而不是UNION
- 如果是[NOT] EXISTS,必须没有聚合
- Subquery谓词在WHERE子句(目前没有在ON子句实现),而且是ANDs or ORs的表达式tree
- 父查问块反对semijoins
- 子查问的策略还没有指定Subquery_strategy::UNSPECIFIED
- 父查问也至多有一个表,而后能够做LEFT JOIN
- 父查问块不禁止semijoin
- IN谓词返回值是否是确定的,不是RAND
- 依据子查问判断后果是否须要转成true还是false以及是否为NULL,判断是能够做antijoin还是semijoin
- 不反对右边参数不是multi-column子查问(WHERE (outer_subq) = ROW(derived.col1,derived.col2))
- 该子查问不反对转换为Derived table(m_subquery_to_derived_is_impossible)
- 设置Subquery_strategy::CANDIDATE_FOR_DERIVED_TABLE并增加sj_candidates
如果下面两个策略无奈应用,依据类型抉择transformer
Item_singlerow_subselect::select_transformer
对于简略的标量子查问,在查问中间接用执行后果代替
select * from t1 where a = (select 1); =>select * from t1 where a = 1;
Item_in_subselect/Item_allany_subselect::select_transformer->select_in_like_transformer
select_in_like_transformer函数来解决 IN/ALL/ANY/SOME子查问转换transformation
解决"SELECT 1"(Item_in_optimizer)
如果目前还没有子查问的执行形式,也就是无奈应用semijoin/antijoin执行的子查问,会做IN->EXISTS的转换,实质是在物化执行和迭代式循环执行中做抉择。IN语法代表非相干子查问仅执行一次,将查问后果物化成长期表,之后须要后果时候就去物化表中查找;EXISTS代表对于表面的每一条记录,子查问都会执行一次,是迭代式循环执行。子查问策略设定为Subquery_strategy::CANDIDATE_FOR_IN2EXISTS_OR_MAT
重写single-column的IN/ALL/ANY子查问(single_value_transformer)
oe $cmp$ (SELECT ie FROM ... WHERE subq_where ... HAVING subq_having)=>- oe $cmp$ (SELECT MAX(...) ) // handled by Item_singlerow_subselect- oe $cmp$ \<max\>(SELECT ...) // handled by Item_maxmin_subselectfails=>Item_in_optimizer- 对于曾经是materialized计划,不转换- 通过equi-join转换IN到EXISTS
如果是ALL/ANY单值subquery谓词,尝试用MIN/MAX子查问转换
SELECT * FROM t1 WHERE a < ANY (SELECT a FROM t1); =>SELECT * FROM t1 WHERE a < (SELECT MAX(a) FROM t1)
不满足下面,调用single_value_in_to_exists_transformer转换IN到EXISTS
转换将要将子查问设置为相干子查问,设置UNCACHEABLE_DEPENDENT标识
如果子查问蕴含聚合函数、窗口函数、GROUP语法、HAVING语法,将判断条件退出到HAVING子句中,另外通过ref_or_null_helper来辨别NULL和False的后果,如须要解决NULL IN (SELECT ...)还须要封装到Item_func_trig_cond触发器中。
SELECT ... FROM t1 WHERE t1.b IN (SELECT <expr of SUM(t1.a)> FROM t2)=>SELECT ... FROM t1 WHERE t1.b IN (SELECT <expr of SUM(t1.a)> FROM t2 [trigcond] HAVING t1.b=ref-to-<expr of SUM(t1.a)>)
如果子查问不蕴含聚合函数、窗口函数、GROUP语法,会放在WHERE查问条件中,当然如果须要解决NULL状况还是要放入HAVING子句(Item_func_trig_cond+Item_is_not_null_test)。
不须要辨别NULL和FALSE的子查问:SELECT 1 FROM ... WHERE (oe $cmp$ ie) AND subq_where须要辨别的子查问:SELECT 1 FROM ... WHERE subq_where AND trigcond((oe $cmp$ ie) OR (ie IS NULL)) HAVING trigcond(@<is_not_null_test@>(ie))JOIN::optimize()会计算materialization和EXISTS转换的代价进行抉择,设置m_subquery_to_derived_is_impossible = true
ROW值转换,通过Item_in_optimizer,不反对ALL/ANY/SOME(row_value_transformer)
Item_in_subselect::row_value_in_to_exists_transformer
for (each left operand) create the equi-join condition if (is_having_used || !abort_on_null) create the "is null" and is_not_null_test items if (is_having_used) add the equi-join and the null tests to HAVING else add the equi-join and the "is null" to WHERE add the is_not_null_test to HAVING
没有HAVING表达式
(l1, l2, l3) IN (SELECT v1, v2, v3 ... WHERE where) =>EXISTS (SELECT ... WHERE where and (l1 = v1 or is null v1) and (l2 = v2 or is null v2) and (l3 = v3 or is null v3) [ HAVING is_not_null_test(v1) and is_not_null_test(v2) and is_not_null_test(v3)) ] <-- 保障不为NULL能够去掉HAVING
有HAVING表达式
(l1, l2, l3) IN (SELECT v1, v2, v3 ... HAVING having) =>EXISTS (SELECT ... HAVING having and (l1 = v1 or is null v1) and (l2 = v2 or is null v2) and (l3 = v3 or is null v3) and is_not_null_test(v1) and is_not_null_test(v2) and is_not_null_test(v3))
2 转换的标量子查问转换成Derived Table(transform_scalar_subqueries_to_join_with_derived)
该个性是官网在8.0.16中为了更好的反对Secondary Engine(Heapwave)的剖析下推,加强了子查问的转换能力。能够先直观的看下转换和不转换的执行打算的不同:
root:test> set optimizer_switch = 'subquery_to_derived=off';Query OK, 0 rows affected (0.00 sec)root:test> EXPLAIN SELECT b, MAX(a) AS ma FROM t4 GROUP BY b HAVING ma < (SELECT MAX(t2.a) FROM t2 WHERE t2.b=t4.b);+----+--------------------+-------+------------+------+---------------+------+---------+------+------+----------+-----------------+| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |+----+--------------------+-------+------------+------+---------------+------+---------+------+------+----------+-----------------+| 1 | PRIMARY | t4 | NULL | ALL | NULL | NULL | NULL | NULL | 10 | 100.00 | Using temporary || 2 | DEPENDENT SUBQUERY | t2 | NULL | ALL | NULL | NULL | NULL | NULL | 3 | 33.33 | Using where |+----+--------------------+-------+------------+------+---------------+------+---------+------+------+----------+-----------------+2 rows in set, 3 warnings (0.00 sec)root:test> set optimizer_switch = 'subquery_to_derived=on';Query OK, 0 rows affected (0.00 sec)root:test> EXPLAIN SELECT b, MAX(a) AS ma FROM t4 GROUP BY b HAVING ma < (SELECT MAX(t2.a) FROM t2 WHERE t2.b=t4.b);+----+-------------+------------+------------+------+---------------+------+---------+------+------+----------+--------------------------------------------+| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |+----+-------------+------------+------------+------+---------------+------+---------+------+------+----------+--------------------------------------------+| 1 | PRIMARY | t4 | NULL | ALL | NULL | NULL | NULL | NULL | 10 | 100.00 | Using temporary || 1 | PRIMARY | <derived2> | NULL | ALL | NULL | NULL | NULL | NULL | 3 | 100.00 | Using where; Using join buffer (hash join) || 2 | DERIVED | t2 | NULL | ALL | NULL | NULL | NULL | NULL | 3 | 100.00 | Using temporary |+----+-------------+------------+------------+------+---------------+------+---------+------+------+----------+--------------------------------------------+3 rows in set, 3 warnings (0.01 sec)
transform_scalar_subqueries_to_join_with_derived具体转换的过程如下:
首先从JOIN条件、WHERE条件、HAVING条件和SELECT list中收集能够转换的标量子查问(Item::collect_scalar_subqueries)。
遍历这些子查问,判断是否能够减少一个额定的转换(transform_grouped_to_derived):把隐性的GROUP BY标量子查问变成Derived Table。
SELECT SUM(c1), (SELECT SUM(c1) FROM t3) scalar FROM t1;转换为=>SELECT derived0.summ, derived1.scalarFROM (SELECT SUM(a) AS summ FROM t1) AS derived0 LEFT JOIN (SELECT SUM(b) AS scalar FROM t3) AS derived1 ON TRUE执行打算如下:explain SELECT SUM(a), (SELECT SUM(c1) FROM t3) scalar FROM t1;+----+-------------+------------+------------+------+---------------+------+---------+------+------+----------+--------------------------------------------+| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |+----+-------------+------------+------------+------+---------------+------+---------+------+------+----------+--------------------------------------------+| 1 | PRIMARY | <derived3> | NULL | ALL | NULL | NULL | NULL | NULL | 1 | 100.00 | NULL || 1 | PRIMARY | <derived4> | NULL | ALL | NULL | NULL | NULL | NULL | 1 | 100.00 | Using where; Using join buffer (hash join) || 4 | DERIVED | t3 | NULL | ALL | NULL | NULL | NULL | NULL | 1 | 100.00 | NULL || 3 | DERIVED | t1 | NULL | ALL | NULL | NULL | NULL | NULL | 2 | 100.00 | NULL |+----+-------------+------------+------------+------+---------------+------+---------+------+------+----------+--------------------------------------------+
收集惟一的聚合函数Item列表(collect_aggregates),这些Item将会被新的Derived Table的列代替。
还须要增加所有援用到这些Item的fields,包含间接在SELECT列表的,Window函数参数、ORDER by、Partition by蕴含的,还有该查问块中ORDER BY的列,因为他们都会引动到Derived Table里。
创立Derived Table须要的Query_expression/Query_block(create_query_expr_and_block)。
增加Derived Table到查问块和top_join_list中。
保留旧的子查问单元块,如果蕴含能够转化的Derived的移到Derived Table上面的Query_block,如果不蕴含,保留到原来的子查问块中。
将之前的聚合函数Item列表插入到Derived Table的查问块中。
收集除GROUP AGG表达式中的列,因为这些fields曾经挪动到Derived Table中,删除不合理的fields援用。
收集所有惟一的列和View的援用后,将他们加到新的Derived Table列表中。
对新的新的Derived Table进行flatten_subqueries/setup_tables
从新resolve_placeholder_tables,不解决进行转换后的子查问。
解决Derived Table中,新退出的HAVING条件中的聚合函数Item,并通过Item_aggregate_refs援用到new_derived->base_ref_items而不是之前的父查问块base_ref_items。
永恒代替父查问块中的聚合函数列表,变成Derived Table的列,并删除他们。
之前保留和退出到Derived Table的惟一的列和View的援用,也要替换新的fields代替他们的援用。
但目前不反对HAVING表达式中蕴含该子查问,其实也是能够转换的。
SELECT SUM(a), (SELECT SUM(b) FROM t3) scalarFROM t1HAVING SUM(a) > scalar;转换为=>SELECT derived0.summ, derived1.scalarFROM (SELECT SUM(a) AS summ FROM t1) AS derived0 LEFT JOIN (SELECT SUM(b) AS scalar FROM t3) AS derived1 ON TRUEWHERE derived0.sum > derived1.scalar;
接下来遍历所有能够转换的子查问,把他们转换成derived tables,并替换相应的表达式变成列(transform_subquery_to_derived)。
- 生成derived table的TABLE_LIST(synthesize_derived)。
- 将能够挪动到derived table的where_cond设置到join_cond上。
- 增加derived table到查问块的表汇合中。
- decorrelate_derived_scalar_subquery_pre
- 增加非相干援用列(NCF)到SELECT list,这些条件被JOIN条件所援用,并且还有另外一个fields蕴含了外查问相干的列,咱们称之为'lifted_where'
- 增加COUNT(*)到SELECT list,这样转换的查问块能够进行cardinality的查看。比方没有任何聚合函数在子查问中。如果确定蕴含聚合函数,返回一行肯定是NCF同时在GROUP BY列表中。
- 增加NCF到子查问的GROUP列表中,如果曾经在了,须要加到最初,如果产生GROUP BY的列因为依赖性查看失败,还要加Item_func_any_value(非聚合列)到SELECT list。对于NCF会创立 derived.field和derived.
count(field)
。
- 设置物化的一些筹备(setup_materialized_derived)。
- decorrelate_derived_scalar_subquery_post:
- 创立对应的'lifted_fields'。
- 更新JOIN条件中相干列的援用,不在援用外查问而换成Derived table相干的列。
代替WHERE、JOIN、HAVING条件和SELECT list中的子查问的表达式变成对应的Derived Table外面列。
上面图解该函数的转换过程和后果:
3 扁平化子查问(flatten_subqueries)
该函数次要是将Semi-join子查问转换为nested JOIN,这个过程只有一次,并且不可逆。
简略来讲步骤能够简化了解为:
- 创立SEMI JOIN (it1 ... itN)语以局部,并退出到外层查问块的执行打算中。
- 将子查问的WHERE条件以及JOIN条件,退出到父查问的WHERE条件中。
- 将子查问谓词从父查问的判断谓词中打消。
因为MySQL在一个query block中可能join的tables数是无限的(MAX_TABLES),不是所有sj_candidates都能够做因而做flatten_subqueries 的,因而须要有优先级决定的先后顺序先unnesting掉,优先级规定如下:
- 相干子查问优先于非相干的
- inner tables多的子查问大于inner tables少的
- 地位前的子查问大于地位后的
subq_item->sj_convert_priority = (((dependent * MAX_TABLES_FOR_SIZE) + // dependent subqueries first child_query_block->leaf_table_count) * 65536) + // then with many tables (65536 - subq_no); // then based on position
另外,因为递归调用flatten_subqueries是bottom-up,顺次把上层的子查问开展到外层查问块中。
for SELECT#1 WHERE X IN (SELECT #2 WHERE Y IN (SELECT#3)) : Query_block::prepare() (select#1) -> fix_fields() on IN condition -> Query_block::prepare() on subquery (select#2) -> fix_fields() on IN condition -> Query_block::prepare() on subquery (select#3) <- Query_block::prepare() <- fix_fields() -> flatten_subqueries: merge #3 in #2 <- flatten_subqueries <- Query_block::prepare() <- fix_fields() -> flatten_subqueries: merge #2 in #1
遍历子查问列表,删除Item::clean_up_after_removal标记为Subquery_strategy::DELETED的子查问,并且依据优先级规定设置sj_convert_priority。依据优先级进行排序。
遍历排序后的子查问列表,对于Subquery_strategy::CANDIDATE_FOR_DERIVED_TABLE策略的子查问,转换子查问([NOT] {IN, EXISTS})为JOIN的Derived table(transform_table_subquery_to_join_with_derived)
FROM [tables] WHERE ... AND/OR oe IN (SELECT ie FROM it) ...=>FROM (tables) LEFT JOIN (SELECT DISTINCT ie FROM it) AS derived ON oe = derived.ie WHERE ... AND/OR derived.ie IS NOT NULL ...
设置策略为Subquery_strategy::DERIVED_TABLE
- semijoin子查问不能和antijoin子查问互相嵌套,或者外查问表曾经超过MAX_TABLE,不做转换,否则标记为Subquery_strategy::SEMIJOIN策略。
- 判断子查问的WHERE条件是否为常量。如果判断条件永远为FALSE,那么子查问后果永远为空。该状况下,调用Item::clean_up_after_removal标记为Subquery_strategy::DELETED,删除该子查问。
- 如果无奈标记为Subquery_strategy::DELETED/设置Subquery_strategy::SEMIJOIN策略的从新标记会Subquery_strategy::UNSPECIFIED持续下一个。
- 替换外层查问的WHERE条件中子查问判断的条件(replace_subcondition)
- 子查问内条件并不永远为FALSE,或者永远为FALSE的状况下,须要改写为antijoin(antijoin状况下,子查问后果永远为空,外层查问条件永远通过)。此时将条件改为永远为True。
- 子查问永远为FALSE,且不是antijoin。那么将外层查问中的条件改成永远为False。
- Item_subselect::EXISTS_SUBS不反对有聚合操作
- convert_subquery_to_semijoin函数解析如下模式的SQL
- IN/=ANY谓词
- 如果条件满足解关联,解关联decorrelate_condition
- 增加解关联的内表表达式到 SELECT list
- 收集FROM子句中的表面相干的 derived table或join条件
- 去掉关联标识UNCACHEABLE_DEPENDENT,更新used table
- Derived table子查问减少SELECT_DISTINCT标识
- 转换子查问成为一个derived table,并且插入到所属于的查问块FROM后(transform_subquery_to_derived)
- 创立derived table及其join条件
- 遍历父查问块的WHERE,替换该子查问的Item代替成derived table(replace_subcondition)
- 遍历排序后的子查问列表,对于Subquery_strategy::CANDIDATE_FOR_SEMIJOIN策略的子查问。
- 判断是否能够转换为semijoin
- 遍历排序后的子查问列表,对于Subquery_strategy::SEMIJOIN的子查问,开始转换为semijoin/antijoin(convert_subquery_to_semijoin)
- convert_subquery_to_semijoin函数解析如下模式的SQL
IN/=ANY谓词
SELECT ... FROM ot1 ... otN WHERE (oe1, ... oeM) IN (SELECT ie1, ..., ieM FROM it1 ... itK [WHERE inner-cond]) [AND outer-cond] [GROUP BY ...] [HAVING ...] [ORDER BY ...]=> SELECT ... FROM (ot1 ... otN) SJ (it1 ... itK) ON (oe1, ... oeM) = (ie1, ..., ieM) [AND inner-cond] [WHERE outer-cond] [GROUP BY ...] [HAVING ...] [ORDER BY ...]
EXISTS谓词
SELECT ... FROM ot1 ... otN WHERE EXISTS (SELECT expressions FROM it1 ... itK [WHERE inner-cond]) [AND outer-cond] [GROUP BY ...] [HAVING ...] [ORDER BY ...]=> SELECT ... FROM (ot1 ... otN) SJ (it1 ... itK) [ON inner-cond] [WHERE outer-cond] [GROUP BY ...] [HAVING ...] [ORDER BY ...]
NOT EXISTS谓词
SELECT ... FROM ot1 ... otN WHERE NOT EXISTS (SELECT expressions FROM it1 ... itK [WHERE inner-cond]) [AND outer-cond] [GROUP BY ...] [HAVING ...] [ORDER BY ...]=> SELECT ... FROM (ot1 ... otN) AJ (it1 ... itK) [ON inner-cond] [WHERE outer-cond AND is-null-cond(it1)] [GROUP BY ...] [HAVING ...] [ORDER BY ...]
NOT IN谓词
SELECT ... FROM ot1 ... otN WHERE (oe1, ... oeM) NOT IN (SELECT ie1, ..., ieM FROM it1 ... itK [WHERE inner-cond]) [AND outer-cond] [GROUP BY ...] [HAVING ...] [ORDER BY ...]=> SELECT ... FROM (ot1 ... otN) AJ (it1 ... itK) ON (oe1, ... oeM) = (ie1, ..., ieM) [AND inner-cond] [WHERE outer-cond] [GROUP BY ...] [HAVING ...] [ORDER BY ...]
- 查找能够插入semi-join嵌套和其生成的条件的地位,比方对于 t1 LEFT JOIN t2, embedding_join_nest为t2,t2也能够是nested,如t1 LEFT JOIN (t2 JOIN t3))
- 生成一个新的semijoin嵌套的TABLE_LIST表
- 解决Antijoin
- 将子查问中潜在的表合并到上述join表(TABLE_LIST::merge_underlying_tables)
- 将子查问的叶子表插入到以后查问块的叶子表前面,从新设置子查问的叶子表的序号和依赖的表面。将子查问的叶子表重置。
- 如果是outer join的话,在join链表中传递可空性(propagate_nullability)
- 将内层子查问中的关联条件去关联化,这些条件被退出到semijoin的列表里。这些条件必须是确定的,仅反对简略判断条件或者由简略判断条件组成的AND条件(Query_block::decorrelate_condition)
- 判断左右条件是否仅依赖于内外层表,将其表达式别离退出到semijoin内表面的表达式列表中(decorrelate_equality)
- 解关联内层查问的join条件(Query_block::decorrelate_condition)
- 移除该子查问表达式在父查问的AST(Query_express::exclude_level)
- 依据semi-join嵌套产生的WHERE/JOIN条件更新对应的table bitmap(Query_block::fix_tables_after_pullout)
- 将子查问的WHERE条件上拉,更新应用表的信息(Item_cond_and::fix_after_pullout())
- 依据semijoin的条件列表创立AND条件,如果有条件为常量True,则去除该条件;如果常量为False,则整个条件都去除(Query_block::build_sj_cond)
- 将创立进去的semijoin条件退出到外层查问的WHERE条件中
- 最初遍历排序后的子查问列表,对于没有转换的子查问,对于Subquery_strategy::UNSPECIFIED的策略,执行IN->EXISTS改写(select_transformer),如果的确原有的子查问曾经有代替的Item,调用replace_subcondition解析并把他们退出到适合的WHERE或者ON子句。
- 革除所有的sj_candidates列表
- Semi-join有5中执行形式,本文并不介绍Optimizer和Execution过程,具体能够参考援用文章中对于semijoin的介绍,最初引入下管制semijoin优化和执行的优化器开关,其中semijoin=on/off是总开关。
SELECT @@optimizer_switch\G*************************** 1. row ***************************@@optimizer_switch: ...... materialization=on,semijoin=on,loosescan=on, firstmatch=on, subquery_materialization_cost_based=on, ......
下图举例说明该转换过程:
SELECT * FROM t1 WHERE t1.a in (SELECT t2.c1 FROM t2 where t2.c1 > 0);=>/* select#1 */SELECT `t1`.`a` AS `a`FROM `t1`SEMI JOIN (`t2`)WHERE ((`t1`.`a` = `t2`.`c1`) and (`t2`.`c1` > 0))执行打算如下:explain SELECT * FROM t1 WHERE t1.a in (SELECT t2.c1 FROM t2 where t2.c1 > 0);+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+-----------------------------------------------------------+| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+-----------------------------------------------------------+| 1 | SIMPLE | t2 | NULL | ALL | NULL | NULL | NULL | NULL | 1 | 100.00 | Using where; Start temporary || 1 | SIMPLE | t1 | NULL | ALL | NULL | NULL | NULL | NULL | 2 | 50.00 | Using where; End temporary; Using join buffer (hash join) |+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+-----------------------------------------------------------+
4 利用以后查问块转换(apply_local_transforms)
该函数在flattern subqueries之后,bottom-up调用,次要分几个步骤:
删除无用列(delete_unused_merged_columns)
如果查问块曾经删除了一些derived tables/views,遍历SELECT列表的列,删除不必要的列
简化JOIN(simplify_joins)
该函数会把Query_block中的top_join_list的嵌套join的简化为扁平化的join list。嵌套连贯包含table1 join table2,也蕴含table1, (table2, table3)这种模式。如果所示的简化过程:
分区表的动态剪枝(prune_partitions)
因为剪枝依据HASH/RANGE/LIST及二级分区都有不同,这里简略介绍下剪枝过程,现有prune_partitions是在prepare和optimize阶段会被调用,某些常量子查问被评估执行完。
struct TABLE { ...... partition_info *part_info{nullptr}; /* Partition related information */ /* If true, all partitions have been pruned away */ bool all_partitions_pruned_away{false}; ......} SQL tranformation phaseSELECT_LEX::apply_local_transforms--> prune_partitionsfor example, select * from employee where company_id = 1000 ;SQL optimizer phaseJOIN::prune_table_partitions--> prune_partitions ------> based on tbl->join_cond_optim() or JOIN::where_condfor example, explain select * from employee where company_id = (select c1 from t1);
举例上面RANGE剪枝的过程:
root:ref> CREATE TABLE R2 ( -> a INT, -> d INT -> ) PARTITION BY RANGE(a) ( -> PARTITION p20 VALUES LESS THAN (20), -> PARTITION p40 VALUES LESS THAN (40), -> PARTITION p60 VALUES LESS THAN (60), -> PARTITION p80 VALUES LESS THAN (80), -> PARTITION p100 VALUES LESS THAN MAXVALUE -> );Query OK, 0 rows affected (0.09 sec)root:ref> Select * From R2 where a > 40 and a < 80;
剪枝具体过程如下:
因为剪枝须要依据不同条件产生的pruning后果进行交加,因而剪枝过程中须要应用read_partitions这样的bitmap来保留是否应用该对应分区。另外剪枝过程相似迭代判断,因而引入了part_iterator来保留开始、完结和以后,以及对应须要获取区间范畴的endpoint函数和获取下一个值next的迭代器函数。这里奇妙的使用了指针,来兼容不同分区类型Hash/Range/List类型,如下图所示:
获取join_cond或者m_where_cond的SEL_TREE红黑树(get_mm_tree)
- 调用find_used_partitions来获取满足的分区,对于SEL_TREE的每个区间(interval):1. 获取区间的左右端点 2.从右边持续获取下一个满足的分区,直到到左边端点完结,每次调用完满足条件的分区须要应用bitmap_set_bit设置该分区在part_info->read_partitions上的位点。
- find_used_partitions是依据SEL_TREE的构造进行递归,如图从左到右遍历next_key_part(and condition),而后再遍历SEL_TREE的左右(也就是高低方向,or condition)深度递归。
(start) | $ | Partitioning keyparts $ subpartitioning keyparts | $ | ... ... $ | | | $ | +---------+ +---------+ $ +-----------+ +-----------+ \-| par1=c1 |--| par2=c2 |-----| subpar1=c3|--| subpar2=c5| +---------+ +---------+ $ +-----------+ +-----------+ | $ | | | $ | +-----------+ | $ | | subpar2=c6| | $ | +-----------+ | $ | | $ +-----------+ +-----------+ | $ | subpar1=c4|--| subpar2=c8| | $ +-----------+ +-----------+ | $ | $ +---------+ $ +------------+ +------------+ | par1=c2 |------------------| subpar1=c10|--| subpar2=c12| +---------+ $ +------------+ +------------+ | $ ... $例如第一行(par1=c1 and par2=c2 and subpar1=c3 and subpar2=c5)的遍历的stack将是:in find_used_partitions(key_tree = "subpar2=c5") (***)in find_used_partitions(key_tree = "subpar1=c3")in find_used_partitions(key_tree = "par2=c2") (**)in find_used_partitions(key_tree = "par1=c1")in prune_partitions(...)而后是持续上面的条件,以此类推or(par1=c1 and par2=c2 and subpar1=c3 and subpar2=c6)or(par1=c1 and par2=c2 and subpar1=c4 and subpar2=c8)or(par1=c2 and subpar1=c10 and subpar2=c12)
下图来展现了pruning的构造和过程:
5 下推条件到Derived Table(push_conditions_to_derived_tables)
该函数将条件下推到derived tables,具体见WL#8084 - Condition pushdown to materialized derived table。
root:test> set optimizer_switch = 'derived_merge=off'; // 敞开dervied_merge 测试下推能力Query OK, 0 rows affected (0.00 sec)root:test> EXPLAIN FORMAT=tree SELECT * FROM (SELECT c1,c2 FROM t1) as dt WHERE c1 > 10;+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+| EXPLAIN |+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+| -> Table scan on dt (cost=2.51..2.51 rows=1) -> Materialize (cost=2.96..2.96 rows=1) -> Filter: (t1.c1 > 10) (cost=0.35 rows=1) -> Table scan on t1 (cost=0.35 rows=1) |+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
过程如下:
遍历derived table列表,判断是否能够下推(can_push_condition_to_derived),如果包含上面的状况则不能下推:
- Derived table有UNION
- Derived table有LIMIT
- Derived table不能是outer join中的内表,会导致更多NULL弥补的行
不能是CTE蕴含的Derived table - 创立能够下推到的Derived table的where cond(Condition_pushdown::make_cond_for_derived)
- 保留残余不能下推的条件(Condition_pushdown::get_remainder_cond)
- Top-down递归调用push_conditions_to_derived_tables
具体图解该过程如下:
三 综述
两篇文章重点介绍了下优化器的基于规定的优化局部,并没有波及更多的基于代价的优化,能够看到对于间接使用规定优化带来执行的减速,那么能够间接转换,尤其是对于查问构造下面的变动类转换,如merge_derived。对于使用规定优化无奈判断是否带来执行的减速,那么优化器会保留一些长期构造,为后续的代价估算提供更多抉择,如IN/EXIST/Materialized转换。当然还有一些,又扭转查问构造又无奈断定是否规定转换带来的执行减速,MySQL目前还不反对。文章尽管详尽,但无奈笼罩全副状况,也是为了抛砖引玉,还须要读者本人通过调试的办法更进一步理解某一类SQL的具体过程。
四 参考资料
《MySQL 8.0 Server层最新架构详解》
《WL#13520: Transform correlated scalar subqueries》
《WL#8084 - Condition pushdown to materialized derived table》
《WL#2980: Subquery optimization: Semijoin》
WL#3740: Subquery optimization: Semijoin: Pull-out of inner tables
WL#3741: Subquery optimization: Semijoin: Duplicate elimination strategy
WL#3750: Subquery optimization: Semijoin: First-match strategy
WL#3751: Subquery optimization: Semijoin: Inside-out strategy
《WL#4389: Subquery optimizations: Make IN optimizations also handle EXISTS》
《WL#4245: Subquery optimization: Transform NOT EXISTS and NOT IN to anti-join》
《WL#2985: Perform Partition Pruning of Range conditions》
《MySQL · 源码剖析 · Semi-join优化执行代码剖析》
《MySQL·源码剖析·子查问优化源码剖析》《Optimizing Subqueries, Derived Tables, View References, and Common Table Expressions》
原文链接
本文为阿里云原创内容,未经容许不得转载。