摘要:
阐明如何优化exists的join查问优化器的解决
外围函数:
TwoDimensionalJoiner::ChooseJoinAlgorithm
JoinAlgType TwoDimensionalJoiner::ChooseJoinAlgorithm([[maybe_unused]] MultiIndex &mind, Condition &cond) { JoinAlgType join_alg = JoinAlgType::JTYPE_GENERAL; if (cond[0].IsType_JoinSimple() && cond[0].op == common::Operator::O_EQ) { if ((cond.Size() == 1) && !stonedb_sysvar_force_hashjoin) join_alg = JoinAlgType::JTYPE_MAP; // available types checked inside else join_alg = JoinAlgType::JTYPE_HASH; } else { if (cond[0].IsType_JoinSimple() && (cond[0].op == common::Operator::O_MORE_EQ || cond[0].op == common::Operator::O_MORE || cond[0].op == common::Operator::O_LESS_EQ || cond[0].op == common::Operator::O_LESS)) join_alg = JoinAlgType::JTYPE_SORT; } return join_alg;}
抉择join优化器问题剖析:
- 仅断定join simple场景,未判断exists子句
- cond[0].IsType_JoinSimple() 如果走入了else分支,相当于被执行了两次
ChooseJoinAlgorithm函数优化:
- 退出exists的断定, 以 IsType_JoinSimple 和 == common::Operator::O_EQ条件看待
- 优化代码构造, 清理冗余的cond[0].IsType_JoinSimple()执行
- 其余逻辑不做任何批改
JoinAlgType TwoDimensionalJoiner::ChooseJoinAlgorithm([[maybe_unused]] MultiIndex &mind, Condition &cond) { do { if (cond[0].IsExists()) { break; } if (!cond[0].IsType_JoinSimple()) { return JoinAlgType::JTYPE_GENERAL; } if (cond[0].op == common::Operator::O_EQ) { break; } if (cond[0].op == common::Operator::O_MORE_EQ || cond[0].op == common::Operator::O_MORE || cond[0].op == common::Operator::O_LESS_EQ || cond[0].op == common::Operator::O_LESS) { return JoinAlgType::JTYPE_SORT; } } while (0); JoinAlgType join_alg = JoinAlgType::JTYPE_HASH; if ((!stonedb_sysvar_force_hashjoin) && (cond.Size() == 1)) join_alg = JoinAlgType::JTYPE_MAP; // available types checked inside return join_alg;}
代码优化后exists场景剖析:
- 如果未开启强制hash join查问, 且cond.Size() == 1, 则进行JTYPE_MAP查问
- 须要强制开启hash join才可进入hash join查问, 以后测试不开启强制的hash join. 以JTYPE_MAP进行测试
优化走JTYPE_MAP查问测试:
MAP子查问耗时:
mysql> select-> o_orderpriority,-> count(*) as order_count-> from-> orders-> where-> o_orderdate >= date '1993-07-01'-> and o_orderdate < date '1993-07-01' + interval '3' month-> and exists ( -> select -> * -> from -> lineitem -> where -> l_orderkey = o_orderkey -> and l_commitdate < l_receiptdate -> ) -> group by -> o_orderpriority -> order by -> o_orderpriority ; +-----------------+-------------+ | o_orderpriority | order_count | +-----------------+-------------+ | 1-URGENT | 1147477 | | 2-HIGH | 1146447 | | 3-MEDIUM | 1146770 | | 4-NOT SPECIFIED | 1146281 | | 5-LOW | 1146801 | +-----------------+-------------+ 5 rows in set (27.36 sec)
MAP子查问比照之前的子查问耗时:
JTYPE_MAP逻辑的火焰图
强制走JTYPE_HASH查问测试:
强制开启hash join优化, 比照同样场景下与map查问的区别
HASH子查问耗时:
mysql> select -> o_orderpriority, -> count(*) as order_count -> from -> orders -> where -> o_orderdate >= date '1993-07-01' -> and o_orderdate < date '1993-07-01' + interval '3' month -> and exists ( -> select -> * -> from -> lineitem -> where -> l_orderkey = o_orderkey -> and l_commitdate < l_receiptdate -> ) -> group by -> o_orderpriority -> order by -> o_orderpriority ;+-----------------+-------------+| o_orderpriority | order_count |+-----------------+-------------+| 1-URGENT | 1147477 || 2-HIGH | 1146447 || 3-MEDIUM | 1146770 || 4-NOT SPECIFIED | 1146281 || 5-LOW | 1146801 |+-----------------+-------------+5 rows in set (27.60 sec)