摘要:

阐明如何优化exists的join查问优化器的解决

外围函数:

TwoDimensionalJoiner::ChooseJoinAlgorithm

JoinAlgType TwoDimensionalJoiner::ChooseJoinAlgorithm([[maybe_unused]] MultiIndex &mind, Condition &cond) {  JoinAlgType join_alg = JoinAlgType::JTYPE_GENERAL;  if (cond[0].IsType_JoinSimple() && cond[0].op == common::Operator::O_EQ) {    if ((cond.Size() == 1) && !stonedb_sysvar_force_hashjoin)      join_alg = JoinAlgType::JTYPE_MAP;  // available types checked inside    else      join_alg = JoinAlgType::JTYPE_HASH;  } else {    if (cond[0].IsType_JoinSimple() &&        (cond[0].op == common::Operator::O_MORE_EQ || cond[0].op == common::Operator::O_MORE ||         cond[0].op == common::Operator::O_LESS_EQ || cond[0].op == common::Operator::O_LESS))      join_alg = JoinAlgType::JTYPE_SORT;  }  return join_alg;}

抉择join优化器问题剖析:

  1. 仅断定join simple场景,未判断exists子句
  2. cond[0].IsType_JoinSimple() 如果走入了else分支,相当于被执行了两次

ChooseJoinAlgorithm函数优化:

  1. 退出exists的断定, 以 IsType_JoinSimple 和 == common::Operator::O_EQ条件看待
  2. 优化代码构造, 清理冗余的cond[0].IsType_JoinSimple()执行
  3. 其余逻辑不做任何批改
JoinAlgType TwoDimensionalJoiner::ChooseJoinAlgorithm([[maybe_unused]] MultiIndex &mind, Condition &cond) {  do {    if (cond[0].IsExists()) {      break;    }    if (!cond[0].IsType_JoinSimple()) {      return JoinAlgType::JTYPE_GENERAL;    }    if (cond[0].op == common::Operator::O_EQ) {      break;    }    if (cond[0].op == common::Operator::O_MORE_EQ || cond[0].op == common::Operator::O_MORE ||             cond[0].op == common::Operator::O_LESS_EQ || cond[0].op == common::Operator::O_LESS) {      return JoinAlgType::JTYPE_SORT;    }  } while (0);  JoinAlgType join_alg = JoinAlgType::JTYPE_HASH;  if  ((!stonedb_sysvar_force_hashjoin) && (cond.Size() == 1))      join_alg = JoinAlgType::JTYPE_MAP;  // available types checked inside  return join_alg;}

代码优化后exists场景剖析:

  1. 如果未开启强制hash join查问, 且cond.Size() == 1, 则进行JTYPE_MAP查问
  2. 须要强制开启hash join才可进入hash join查问, 以后测试不开启强制的hash join. 以JTYPE_MAP进行测试

优化走JTYPE_MAP查问测试:

MAP子查问耗时:

mysql> select->                             o_orderpriority,->                             count(*) as order_count->                         from->                             orders->                         where->                             o_orderdate >= date '1993-07-01'->                             and o_orderdate < date '1993-07-01' + interval '3' month->                             and exists (  ->                                 select  ->                                     *  ->                                 from  ->                                     lineitem  ->                                 where  ->                                     l_orderkey = o_orderkey  ->                                     and l_commitdate < l_receiptdate  ->                             )  ->                         group by  ->                             o_orderpriority  ->                         order by  ->                             o_orderpriority ;  +-----------------+-------------+  | o_orderpriority | order_count |  +-----------------+-------------+  | 1-URGENT        |     1147477 |  | 2-HIGH          |     1146447 |  | 3-MEDIUM        |     1146770 |  | 4-NOT SPECIFIED |     1146281 |  | 5-LOW           |     1146801 |  +-----------------+-------------+  5 rows in set (27.36 sec)

MAP子查问比照之前的子查问耗时:

JTYPE_MAP逻辑的火焰图

强制走JTYPE_HASH查问测试:

强制开启hash join优化, 比照同样场景下与map查问的区别

HASH子查问耗时:

mysql> select    ->                             o_orderpriority,    ->                             count(*) as order_count    ->                         from    ->                             orders    ->                         where    ->                             o_orderdate >= date '1993-07-01'    ->                             and o_orderdate < date '1993-07-01' + interval '3' month    ->                             and exists (    ->                                 select    ->                                     *    ->                                 from    ->                                     lineitem    ->                                 where    ->                                     l_orderkey = o_orderkey    ->                                     and l_commitdate < l_receiptdate    ->                             )    ->                         group by    ->                             o_orderpriority    ->                         order by    ->                             o_orderpriority ;+-----------------+-------------+| o_orderpriority | order_count |+-----------------+-------------+| 1-URGENT        |     1147477 || 2-HIGH          |     1146447 || 3-MEDIUM        |     1146770 || 4-NOT SPECIFIED |     1146281 || 5-LOW           |     1146801 |+-----------------+-------------+5 rows in set (27.60 sec)

HASH子查问的火焰图: