关于数据库:GaussDBDWS性能调优系列实现篇六十八般武艺Plan-hint运用

摘要：本文介绍 GaussDB(DWS) 另一种能够人工干预打算生成的性能 –plan hint。

数据库的使用者在书写 SQL 语句时，会依据本人已知的状况尽力写出性能很高的 SQL 语句。然而当须要写大量 SQL 语句，且有些 SQL 语句的逻辑极为简单时，数据库使用者就很难写出性能较高的 SQL 语句。

而每个数据库都有一个相似人的大脑的查问优化器模块，它接管来自语法分析模块传递过去的查问树，在这个查问树的根底上进行逻辑上的等价变换、物理执行门路的筛选，并且把抉择出的最优的执行门路传递给数据库的执行器模块。查问优化器是晋升查问效率十分重要的一个伎俩。

数据库查问优化器的分类详见博文《GaussDB(DWS) 性能调优系列根底篇一：万物之始 analyze 统计信息》性能调优系列根底篇一：万物之始 analyze 统计信息 ”)。

因为优化器基于统计信息和估算模型生成打算，当估算呈现偏差时，打算可能呈现问题，性能较差，使语句的执行变得奇慢无比。

通常，查问优化器的优化过程对数据库使用者是通明的。在上一篇博文《GaussDB(DWS) 性能调优系列实战篇五：十八般武艺之门路干涉》性能调优系列实战篇五：十八般武艺之门路干涉 ”) 中，Gauss DB(DWS) 提供了可通过配置 GUC 参数的形式，全局的干涉查问打算的门路生成。本次，将介绍另一种能够人工干预打算生成的性能 –plan hint。Hint 是一种通过 SQL 语句中的正文传递给优化器的指令，优化器应用 hint 为语句抉择执行打算。在测试或开发环境中，hint 对于测试特定拜访门路的性能十分有用。例如，您可能晓得某些表优先进行连贯，能够无效缩小两头后果集大小，在这种状况下，能够应用提醒来批示优化器应用更好的执行打算。

Plan hint 性能属于语句级的调控，仅对以后语句的以后档次失效，能够帮忙咱们在调优的过程中，针对特定的语句，通过 plan hint 进行人工干预，抉择更高效的执行打算。

GaussDB(DWS) 的 Plan hint 有以下品种:

Join 程序的 hint：调整 join 程序
Scan/Join 办法的 hint：指定或防止 scan/join 的办法
Stream 办法的 hint：指定或防止 redistribute/broadcast
行数 hint：对于给定后果集，指定行数，或对原有估算值进行计算调整
歪斜值 hint：在歪斜优化时，指定须要歪斜解决的非凡值

上面别离对以上几种 plan hint 的性能及其在理论中的使用做一下介绍。在上面几节的介绍中，除歪斜值 hint 外，都以 tpcds 中的 Q6 作为示例。为了能显著看到 hint 在查问优化过程中的作用，咱们将 store_sales 表的统计信息删除。原始语句和生成的初始打算如下。

示例语句：

explain performance
select a.ca_state state, count(*) cnt
  from customer_address a
     ,customer c
     ,store_sales s
    ,date_dim d
     ,item i
 where a.ca_address_sk = c.c_current_addr_sk
 and c.c_customer_sk = s.ss_customer_sk
 and s.ss_sold_date_sk = d.d_date_sk
 and s.ss_item_sk = i.i_item_sk
 and d.d_month_seq =
     (select distinct (d_month_seq)
       from date_dim
               where d_year = 2000
         and d_moy = 2 )
 and i.i_current_price > 1.2 *
             (select avg(j.i_current_price)
      from item j
      where j.i_category = i.i_category)
 group by a.ca_state
 having count(*) >= 10
 order by cnt
 limit 100;

语法：

格局 1：

leading(table_list)

仅指定 join 程序，不指定内表面程序

格局 2：

leading((table_list))

同时指定 join 程序和内表面程序，内表面程序仅在最外层失效

阐明：

table_list 为要调整 join 程序的表名列表，表之间应用空格分隔。能够蕴含以后层的任意个表（别名），或对于子查问晋升的场景，也能够蕴含子查问的 hint 别名，同时任意表能够应用括号指定优先级。

留神：

表只能用单个字符串示意，不能带 schema。
表如果存在别名，须要优先应用别名来示意该表。
list 中的表在以后层或晋升的子查问中必须是惟一的。如果不惟一，须要应用不同的别名进行辨别。
同一个表只能在 list 里呈现一次。

示例 1：

对于示例中的打算，能够看出，17-22 号算子时 store_sales 表和 item 表 join 后生成 hash 表，store_sales 表的数据量很大，store_sales 和 item 表 join 后未过滤掉任何数据，所以这两个表 join 并生成 hash 表的工夫都比拟长。依据对 tpcds 各表中数据分布的理解，咱们晓得，store_sales 表和 date_dim 进行 join，能够过滤掉较多数据，所以，能够应用 hint 来提醒优化器优将 store_sales 表和 date_dim 表先进行 join，store_sales 作为表面，date_dim 作为内表，缩小两头后果集大小。语句改写如下：

explain performance
select /*+ leading((s d)) */ a.ca_state state, count(*) cnt
 from customer_address a
     ,customer c
     ,store_sales s
     ,date_dim d
     ,item i
 where a.ca_address_sk = c.c_current_addr_sk
   and c.c_customer_sk = s.ss_customer_sk
   and s.ss_sold_date_sk = d.d_date_sk
   and s.ss_item_sk = i.i_item_sk
   and d.d_month_seq =
      (select distinct (d_month_seq)
       from date_dim
      where d_year = 2000
        and d_moy = 2 )
 and i.i_current_price > 1.2 *
             (select avg(j.i_current_price)
              from item j
             where j.i_category = i.i_category)
 group by a.ca_state
 having count(*) >= 10
 order by cnt
 limit 100;

减少了 join 程序 hint 的查问打算如下：

通过调整 join 程序，使得之后各 join 的两头后果集都大幅缩小，执行工夫由 34268.322ms 降为 11095.046ms。

用于批示优化器应用那种 scan 办法或 join 办法。

语法：

Join 办法的 hint 格局：

[no] nestloop|hashjoin|mergejoin(table_list)

Scan 办法的 hint 格局：

[no] tablescan|indexscan|indexonlyscan(table [index])

阐明：

no 示意提醒优化器不应用这种办法。
table 示意 hint 指定的表，只能指定一个表，如果表存在别名应优先应用别名进行 hint。
index 示意应用 indexscan 或 indexonlyscan 的 hint 时，指定的索引名称，以后只能指定一个。

示例 2 -1：

示例 1 中失去的执行打算，因为 store_sales 表的行数估算不准，store_sales 和 date_dim 采纳了效率不好的 nestloop 形式进行 join。当初通过本节的 hint 办法来批示优化器不应用 nestloop 形式进行 join。

explain performance
select /*+ leading((s d)) no nestloop(s d) */ a.ca_state state, count(*) cnt
 from customer_address a
     ,customer c
     ,store_sales s
     ,date_dim d
     ,item i
 where a.ca_address_sk = c.c_current_addr_sk
   and c.c_customer_sk = s.ss_customer_sk
   and s.ss_sold_date_sk = d.d_date_sk
   and s.ss_item_sk = i.i_item_sk
   and d.d_month_seq =
      (select distinct (d_month_seq)
       from date_dim
      where d_year = 2000
        and d_moy = 2 )
 and i.i_current_price > 1.2 *
             (select avg(j.i_current_price)
              from item j
             where j.i_category = i.i_category)
 group by a.ca_state
 having count(*) >= 10
 order by cnt
 limit 100;

减少了 join 形式 hint 后的打算如下：

从下面的打算中能够看到，优化器对 store_sales 和 date_dim 表之间的 join 办法曾经由 nestloop 改为了 hashjoin，且这条语句的执行工夫也由 11095.046ms 降为 4644.409ms。

示例 2 -2：

为了演示 scan 形式的 hint 应用，如下在 item 表的 i_item_sk 列上创立一个名称为 i_item 的索引。

create index i_item on item(i_item_sk);

通过上面的语句批示优化器拜访别名为 i 的 item 表时，应用索引 i_item 做索引扫描。

explain performance
select /*+ leading((s d)) no nestloop(s d) indexscan(i i_item) */ a.ca_state state, count(*) cnt
 from customer_address a
     ,customer c
     ,store_sales s
     ,date_dim d
     ,item i
 where a.ca_address_sk = c.c_current_addr_sk
 and c.c_customer_sk = s.ss_customer_sk
 and s.ss_sold_date_sk = d.d_date_sk
 and s.ss_item_sk = i.i_item_sk
 and d.d_month_seq =
      (select distinct (d_month_seq)
       from date_dim
      where d_year = 2000
        and d_moy = 2 )
 and i.i_current_price > 1.2 *
             (select avg(j.i_current_price)
              from item j
             where j.i_category = i.i_category)
 group by a.ca_state
 having count(*) >= 10
 order by cnt
 limit 100;

应用 scan 的 hint 批示扫描 item 表时采纳 indexscan 后的查问打算如下：

从下面的执行后果看，应用索引扫描后（s 和 d join 后，再和 item 的 join 采纳了 mergejoin 形式）反而使性能略有降落，所以前面的用例中，咱们将不对 item 表采纳索引扫描的办法。

用于批示优化器采纳哪种 stream 办法，能够为 broadcast 和 redistribute。

语法：

[no] broadcast|redistribute(table_list)

阐明：

no 示意不应用 hint 的 stream 形式。
table_list 为进行 stream 操作的单表或多表 join 后果集

示例 3：

此处作为演示，批改语句如下，通过 hint 批示优化器对 item 表扫描的后果应用 broadcast 形式进行散布。

explain performance
select /*+ leading((s d)) no nestloop(s d) broadcast(i) */ a.ca_state state, count(*) cnt
 from customer_address a
     ,customer c
     ,store_sales s
     ,date_dim d
     ,item i
 Where a.ca_address_sk = c.c_current_addr_sk
   and c.c_customer_sk = s.ss_customer_sk
   and s.ss_sold_date_sk = d.d_date_sk
   and s.ss_item_sk = i.i_item_sk
 and d.d_month_seq =
      (select distinct (d_month_seq)
       from date_dim
      where d_year = 2000
        and d_moy = 2 )
 and i.i_current_price > 1.2 *
             (select avg(j.i_current_price)
               from item j
             where j.i_category = i.i_category)
 group by a.ca_state
 having count(*) >= 10
 order by cnt
 limit 100;

批示优化器应用 broadcast 形式散布 item 后果的查问打算如下：

能够看出，之前在 item 扫描后的后果上是 redistribute 散布形式，当初曾经变为了 broadcast 散布形式。Broadcast 散布形式个别用于数据量比拟小的后果集上，相同 redistribute 用于数据量比拟大的后果集上。所以，依据执行打算中单表或表 join 后的后果集大小，能够通过这种形式，调整后果集的散布形式，从而晋升查问的性能。

用于指明两头后果集的大小，反对绝对值和相对值的 hint。

语法：

rows(table_list #|+|-|* const)1. #,+,-,，进行行数估算 hint 的四种操作符号。# 示意间接应用前面的行数进行 hint。+,-, 示意对原来估算的行数进行加、减、乘操作，运算后的行数最小值为 1 行。

阐明：

1.#,+,-,，进行行数估算 hint 的四种操作符号。# 示意间接应用前面的行数进行 hint。+,-, 示意对原来估算的行数进行加、减、乘操作，运算后的行数最小值为 1 行。

2.const 能够是任意非正数，反对迷信计数法。

因为 store_sales 表没有统计信息，所以在下面的各个打算中能够看到，store_sales 表的预计行数和理论行数相差十分大，这就会导致生成了最后的效率比拟低的打算。上面咱们看看应用行数 hint 的成果。

示例 4：

explain performance
select /*+ rows(s #2880404) */ a.ca_state state, count(*) cnt
 from customer_address a
     ,customer c
    ,store_sales s
     ,date_dim d
     ,item i
 Where a.ca_address_sk = c.c_current_addr_sk
 and c.c_customer_sk = s.ss_customer_sk
 and s.ss_sold_date_sk = d.d_date_sk
 and s.ss_item_sk = i.i_item_sk
 and d.d_month_seq =
      (select distinct (d_month_seq)
       from date_dim
      where d_year = 2000
         and d_moy = 2 )
 and i.i_current_price > 1.2 *
             (select avg(j.i_current_price)
               from item j
              where j.i_category = i.i_category)
 group by a.ca_state
 having count(*) >= 10
 order by cnt
 limit 100;

具体查问打算如下：

指定了 store_sales 表的精确行数后，优化器生成的打算执行工夫间接从最后的 34268.322ms 将为 1991.843ms，晋升了 17 倍。这也充沛的阐明了优化器对统计信息准确性的强烈依赖。

除了能够指明单表的行数，还可指明两头后果集的行数。比方上例中 8 号算子的理论行数和预计行数也相差较大，咱们指明 8 号算子的后果集行数看看成果。在上面这个例子中，还应用了子链接块名的 hint，为子链接指定了一个别名，便于在行数 hint 中指定子链接。

explain performance
select /*+ rows(s #2880404) rows(s i tt c a d #2512) */ a.ca_state state, count(*) cnt
 from customer_address a
     ,customer c
     ,store_sales s
     ,date_dim d
     ,item i
 where       a.ca_address_sk = c.c_current_addr_sk
 and c.c_customer_sk = s.ss_customer_sk
 and s.ss_sold_date_sk = d.d_date_sk
 and s.ss_item_sk = i.i_item_sk
 and d.d_month_seq =
      (select distinct (d_month_seq)
       from date_dim
               where d_year = 2000
         and d_moy = 2 )
 and i.i_current_price > 1.2 *
             (select /*+ blockname (tt)*/ avg(j.i_current_price)
      from item j
      where j.i_category = i.i_category)
 group by a.ca_state
 having count(*) >= 10
 order by cnt
 limit 100;

查问打算如下：

8 号算子的预计行数曾经和理论行数统一。因为 8 号算子不是打算的瓶颈点，所以性能晋升并不显著。

用于指明查问运行时重散布过程中存在歪斜的重散布键和歪斜值，针对 Join 和 HashAgg 运算中的重散布进行优化。

语法：

指定单表歪斜

skew(table (column) [(value)])

指定两头后果歪斜

skew((join_rel) (column) [(values)])

阐明：

table 示意存在歪斜的单个表名。
join_rel 示意参加 join 的两个或多个表，如（t1 t2）示意 t1 和 t2 join 后的后果存在歪斜。
column 示意歪斜表中存在歪斜的一个或多个列。
value 示意歪斜的列中存在歪斜的一个或多个值。

示例 5：

本节，咱们用 tpcds 中的 Q1 作为示例，未应用 hint 前的查问及打算如下：

explain performance
with customer_total_return as
    (select sr_customer_sk as ctr_customer_sk
          ,sr_store_sk as ctr_store_sk
          ,sum(SR_FEE) as ctr_total_return
     from store_returns
         ,date_dim
     where sr_returned_date_sk = d_date_sk
       and d_year =2000
     group by sr_customer_sk
             ,sr_store_sk)
select  c_customer_id
  from customer_total_return ctr1
      ,store
      ,customer
where ctr1.ctr_total_return > (select avg(ctr_total_return)*1.2
                                 from customer_total_return ctr2
                                where ctr1.ctr_store_sk = ctr2.ctr_store_sk)
  and s_store_sk = ctr1.ctr_store_sk
  and s_state = 'NM'
  and ctr1.ctr_customer_sk = c_customer_sk
order by c_customer_id
limit 100;

with 表达式中 group by 在做 HashAgg 中进行重散布时存在歪斜，对应上图中的 10 和 27 号算子。对 with 表达式中的 hashagg 进行 hint 指定，查问和打算如下：

explain performance
with customer_total_return as
    (select /*+ skew(store_returns(sr_store_sk sr_customer_sk)) */sr_customer_sk as ctr_customer_sk
          ,sr_store_sk as ctr_store_sk
          ,sum(SR_FEE) as ctr_total_return
     from store_returns
         ,date_dim
     where sr_returned_date_sk = d_date_sk
       and d_year =2000
     group by sr_customer_sk
             ,sr_store_sk)
select  c_customer_id
  from customer_total_return ctr1
      ,store
      ,customer
where ctr1.ctr_total_return > (select avg(ctr_total_return)*1.2
                                 from customer_total_return ctr2
                                where ctr1.ctr_store_sk = ctr2.ctr_store_sk)
  and s_store_sk = ctr1.ctr_store_sk
  and s_state = 'NM'
  and ctr1.ctr_customer_sk = c_customer_sk
order by c_customer_id
limit 100;

作了歪斜 hint 的查问打算如下：

从优化后的打算能够看出：对于 HashAgg，因为其重散布存在歪斜，所以优化为双层 Agg。

通过下面各节中的示例，展现了 Gauss DB(DWS) 中 plan hint 的应用办法，及其对执行打算的影响。数据库使用者联合本人对数据库对象、数据分布状况及数据量等信息的理解，或者依据 SQL 语句的查问打算剖析出其中采纳了不正确打算的局部，正确的利用 plan hint，提醒优化器采纳更高效的打算，能够使查问执行的性能取得大幅的晋升，成为性能调优的一件无利的工具。

本文分享自华为云社区《GaussDB(DWS) 性能调优系列实现篇六：十八般武艺 Plan hint 使用》，原文作者：wangxiaojuan8。

点击关注，第一工夫理解华为云陈腐技术~

关于数据库:GaussDBDWS性能调优系列实现篇六十八般武艺Plan-hint运用

前言

Plan hint 的引入

Plan hint 的利用

Join 程序的 hint

Scan/Join 办法的 hint

Stream 办法的 hint

行数 hint

歪斜值 hint

结语

Just My Socks（注册教程内含优惠码）

关于数据库:GaussDBDWS性能调优系列实现篇六十八般武艺Plan-hint运用

前言

Plan hint 的引入

Plan hint 的利用

Join 程序的 hint

Scan/Join 办法的 hint

Stream 办法的 hint

行数 hint

歪斜值 hint

结语

Just My Socks（注册教程 内含优惠码）

Just My Socks（注册教程内含优惠码）