关于数据库:GaussDBDWS性能调优系列实现篇六十八般武艺Plan-hint运用

摘要：本文介绍GaussDB(DWS)另一种能够人工干预打算生成的性能--plan hint。

前言

数据库的使用者在书写SQL语句时，会依据本人已知的状况尽力写出性能很高的SQL语句。然而当须要写大量SQL语句，且有些SQL语句的逻辑极为简单时，数据库使用者就很难写出性能较高的SQL语句。

而每个数据库都有一个相似人的大脑的查问优化器模块，它接管来自语法分析模块传递过去的查问树，在这个查问树的根底上进行逻辑上的等价变换、物理执行门路的筛选，并且把抉择出的最优的执行门路传递给数据库的执行器模块。查问优化器是晋升查问效率十分重要的一个伎俩。

数据库查问优化器的分类详见博文《GaussDB(DWS)性能调优系列根底篇一：万物之始analyze统计信息》性能调优系列根底篇一：万物之始analyze统计信息")。

Plan hint的引入

因为优化器基于统计信息和估算模型生成打算，当估算呈现偏差时，打算可能呈现问题，性能较差，使语句的执行变得奇慢无比。

通常，查问优化器的优化过程对数据库使用者是通明的。在上一篇博文《GaussDB(DWS)性能调优系列实战篇五：十八般武艺之门路干涉》性能调优系列实战篇五：十八般武艺之门路干涉")中，Gauss DB(DWS)提供了可通过配置GUC参数的形式，全局的干涉查问打算的门路生成。本次，将介绍另一种能够人工干预打算生成的性能--plan hint。Hint是一种通过SQL语句中的正文传递给优化器的指令，优化器应用hint为语句抉择执行打算。在测试或开发环境中，hint对于测试特定拜访门路的性能十分有用。例如，您可能晓得某些表优先进行连贯，能够无效缩小两头后果集大小，在这种状况下，能够应用提醒来批示优化器应用更好的执行打算。

Plan hint性能属于语句级的调控，仅对以后语句的以后档次失效，能够帮忙咱们在调优的过程中，针对特定的语句，通过plan hint进行人工干预，抉择更高效的执行打算。

GaussDB(DWS)的Plan hint有以下品种:

Join程序的hint：调整join程序
Scan/Join办法的hint：指定或防止scan/join的办法
Stream办法的hint：指定或防止redistribute/broadcast
行数hint：对于给定后果集，指定行数，或对原有估算值进行计算调整
歪斜值hint：在歪斜优化时，指定须要歪斜解决的非凡值

上面别离对以上几种plan hint的性能及其在理论中的使用做一下介绍。在上面几节的介绍中，除歪斜值hint外，都以tpcds中的Q6作为示例。为了能显著看到hint在查问优化过程中的作用，咱们将store_sales表的统计信息删除。原始语句和生成的初始打算如下。

示例语句：

explain performanceselect a.ca_state state, count(*) cnt  from customer_address a     ,customer c     ,store_sales s    ,date_dim d     ,item i where a.ca_address_sk = c.c_current_addr_sk and c.c_customer_sk = s.ss_customer_sk and s.ss_sold_date_sk = d.d_date_sk and s.ss_item_sk = i.i_item_sk and d.d_month_seq =     (select distinct (d_month_seq)       from date_dim               where d_year = 2000         and d_moy = 2 ) and i.i_current_price > 1.2 *             (select avg(j.i_current_price)      from item j      where j.i_category = i.i_category) group by a.ca_state having count(*) >= 10 order by cnt limit 100;

Plan hint的利用

Join 程序的hint

语法：

格局1：

leading(table_list)

仅指定join程序，不指定内表面程序

格局2：

leading((table_list))

同时指定join程序和内表面程序，内表面程序仅在最外层失效

阐明：

table_list为要调整join程序的表名列表，表之间应用空格分隔。能够蕴含以后层的任意个表（别名），或对于子查问晋升的场景，也能够蕴含子查问的hint别名，同时任意表能够应用括号指定优先级。

留神：

表只能用单个字符串示意，不能带schema。
表如果存在别名，须要优先应用别名来示意该表。
list中的表在以后层或晋升的子查问中必须是惟一的。如果不惟一，须要应用不同的别名进行辨别。
同一个表只能在list里呈现一次。

示例1：

对于示例中的打算，能够看出，17-22号算子时store_sales表和item表join后生成hash表，store_sales表的数据量很大，store_sales和item表join后未过滤掉任何数据，所以这两个表join并生成hash表的工夫都比拟长。依据对tpcds各表中数据分布的理解，咱们晓得，store_sales表和date_dim进行join，能够过滤掉较多数据，所以，能够应用hint来提醒优化器优将store_sales表和date_dim表先进行join，store_sales作为表面，date_dim作为内表，缩小两头后果集大小。语句改写如下：

explain performanceselect /*+ leading((s d)) */ a.ca_state state, count(*) cnt from customer_address a     ,customer c     ,store_sales s     ,date_dim d     ,item i where a.ca_address_sk = c.c_current_addr_sk   and c.c_customer_sk = s.ss_customer_sk   and s.ss_sold_date_sk = d.d_date_sk   and s.ss_item_sk = i.i_item_sk   and d.d_month_seq =      (select distinct (d_month_seq)       from date_dim      where d_year = 2000        and d_moy = 2 ) and i.i_current_price > 1.2 *             (select avg(j.i_current_price)              from item j             where j.i_category = i.i_category) group by a.ca_state having count(*) >= 10 order by cnt limit 100;

减少了join程序hint的查问打算如下：

通过调整join程序，使得之后各join的两头后果集都大幅缩小，执行工夫由34268.322ms降为11095.046ms。

Scan/Join办法的hint

用于批示优化器应用那种scan办法或join办法。

语法：

Join办法的hint格局：

[no] nestloop|hashjoin|mergejoin(table_list)

Scan办法的hint格局：

[no] tablescan|indexscan|indexonlyscan(table [index])

阐明：

no示意提醒优化器不应用这种办法。
table示意hint指定的表，只能指定一个表，如果表存在别名应优先应用别名进行hint。
index示意应用indexscan或indexonlyscan的hint时，指定的索引名称，以后只能指定一个。

示例2-1：

示例1中失去的执行打算，因为store_sales表的行数估算不准，store_sales和date_dim采纳了效率不好的nestloop形式进行join。当初通过本节的hint办法来批示优化器不应用nestloop形式进行join。

explain performanceselect /*+ leading((s d)) no nestloop(s d) */ a.ca_state state, count(*) cnt from customer_address a     ,customer c     ,store_sales s     ,date_dim d     ,item i where a.ca_address_sk = c.c_current_addr_sk   and c.c_customer_sk = s.ss_customer_sk   and s.ss_sold_date_sk = d.d_date_sk   and s.ss_item_sk = i.i_item_sk   and d.d_month_seq =      (select distinct (d_month_seq)       from date_dim      where d_year = 2000        and d_moy = 2 ) and i.i_current_price > 1.2 *             (select avg(j.i_current_price)              from item j             where j.i_category = i.i_category) group by a.ca_state having count(*) >= 10 order by cnt limit 100;

减少了join形式hint后的打算如下：

从下面的打算中能够看到，优化器对store_sales和date_dim表之间的join办法曾经由nestloop改为了hashjoin，且这条语句的执行工夫也由11095.046ms降为4644.409ms。

示例2-2：

为了演示scan形式的hint应用，如下在item表的i_item_sk列上创立一个名称为i_item的索引。

create index i_item on item(i_item_sk);

通过上面的语句批示优化器拜访别名为i的item表时，应用索引i_item做索引扫描。

explain performanceselect /*+ leading((s d)) no nestloop(s d) indexscan(i i_item) */ a.ca_state state, count(*) cnt from customer_address a     ,customer c     ,store_sales s     ,date_dim d     ,item i where a.ca_address_sk = c.c_current_addr_sk and c.c_customer_sk = s.ss_customer_sk and s.ss_sold_date_sk = d.d_date_sk and s.ss_item_sk = i.i_item_sk and d.d_month_seq =      (select distinct (d_month_seq)       from date_dim      where d_year = 2000        and d_moy = 2 ) and i.i_current_price > 1.2 *             (select avg(j.i_current_price)              from item j             where j.i_category = i.i_category) group by a.ca_state having count(*) >= 10 order by cnt limit 100;

应用scan的hint批示扫描item表时采纳indexscan后的查问打算如下：

从下面的执行后果看，应用索引扫描后（s 和 d join后，再和item的join采纳了mergejoin形式）反而使性能略有降落，所以前面的用例中，咱们将不对item表采纳索引扫描的办法。

Stream办法的hint

用于批示优化器采纳哪种stream办法，能够为broadcast和redistribute。

语法：

[no] broadcast|redistribute(table_list)

阐明：

no示意不应用hint的stream形式。
table_list为进行stream操作的单表或多表join后果集

示例3：

此处作为演示，批改语句如下，通过hint批示优化器对item表扫描的后果应用broadcast形式进行散布。

explain performanceselect /*+ leading((s d)) no nestloop(s d) broadcast(i) */ a.ca_state state, count(*) cnt from customer_address a     ,customer c     ,store_sales s     ,date_dim d     ,item i Where a.ca_address_sk = c.c_current_addr_sk   and c.c_customer_sk = s.ss_customer_sk   and s.ss_sold_date_sk = d.d_date_sk   and s.ss_item_sk = i.i_item_sk and d.d_month_seq =      (select distinct (d_month_seq)       from date_dim      where d_year = 2000        and d_moy = 2 ) and i.i_current_price > 1.2 *             (select avg(j.i_current_price)               from item j             where j.i_category = i.i_category) group by a.ca_state having count(*) >= 10 order by cnt limit 100;

批示优化器应用broadcast形式散布item后果的查问打算如下：

能够看出，之前在item扫描后的后果上是redistribute散布形式，当初曾经变为了broadcast散布形式。Broadcast散布形式个别用于数据量比拟小的后果集上，相同redistribute用于数据量比拟大的后果集上。所以，依据执行打算中单表或表join后的后果集大小，能够通过这种形式，调整后果集的散布形式，从而晋升查问的性能。

行数hint

用于指明两头后果集的大小，反对绝对值和相对值的hint。

语法：

rows(table_list #|+|-|* const)1. #,+,-,，进行行数估算hint的四种操作符号。#示意间接应用前面的行数进行hint。+,-,示意对原来估算的行数进行加、减、乘操作，运算后的行数最小值为1行。

阐明：

1.#,+,-,，进行行数估算hint的四种操作符号。#示意间接应用前面的行数进行hint。+,-,示意对原来估算的行数进行加、减、乘操作，运算后的行数最小值为1行。

2.const能够是任意非正数，反对迷信计数法。

因为store_sales表没有统计信息，所以在下面的各个打算中能够看到，store_sales表的预计行数和理论行数相差十分大，这就会导致生成了最后的效率比拟低的打算。上面咱们看看应用行数hint的成果。

示例4：

explain performanceselect /*+ rows(s #2880404) */ a.ca_state state, count(*) cnt from customer_address a     ,customer c    ,store_sales s     ,date_dim d     ,item i Where a.ca_address_sk = c.c_current_addr_sk and c.c_customer_sk = s.ss_customer_sk and s.ss_sold_date_sk = d.d_date_sk and s.ss_item_sk = i.i_item_sk and d.d_month_seq =      (select distinct (d_month_seq)       from date_dim      where d_year = 2000         and d_moy = 2 ) and i.i_current_price > 1.2 *             (select avg(j.i_current_price)               from item j              where j.i_category = i.i_category) group by a.ca_state having count(*) >= 10 order by cnt limit 100;

具体查问打算如下：

指定了store_sales表的精确行数后，优化器生成的打算执行工夫间接从最后的34268.322ms将为1991.843ms，晋升了17倍。这也充沛的阐明了优化器对统计信息准确性的强烈依赖。

除了能够指明单表的行数，还可指明两头后果集的行数。比方上例中8号算子的理论行数和预计行数也相差较大，咱们指明8号算子的后果集行数看看成果。在上面这个例子中，还应用了子链接块名的hint，为子链接指定了一个别名，便于在行数hint中指定子链接。

explain performanceselect /*+ rows(s #2880404) rows(s i tt c a d #2512) */ a.ca_state state, count(*) cnt from customer_address a     ,customer c     ,store_sales s     ,date_dim d     ,item i where       a.ca_address_sk = c.c_current_addr_sk and c.c_customer_sk = s.ss_customer_sk and s.ss_sold_date_sk = d.d_date_sk and s.ss_item_sk = i.i_item_sk and d.d_month_seq =      (select distinct (d_month_seq)       from date_dim               where d_year = 2000         and d_moy = 2 ) and i.i_current_price > 1.2 *             (select /*+ blockname (tt)*/ avg(j.i_current_price)      from item j      where j.i_category = i.i_category) group by a.ca_state having count(*) >= 10 order by cnt limit 100;

查问打算如下：

8号算子的预计行数曾经和理论行数统一。因为8号算子不是打算的瓶颈点，所以性能晋升并不显著。

歪斜值hint

用于指明查问运行时重散布过程中存在歪斜的重散布键和歪斜值，针对Join和HashAgg运算中的重散布进行优化。

语法：

指定单表歪斜

skew(table (column) [(value)])

指定两头后果歪斜

skew((join_rel) (column) [(values)])

阐明：

table示意存在歪斜的单个表名。
join_rel示意参加join的两个或多个表，如（t1 t2）示意t1和t2 join后的后果存在歪斜。
column示意歪斜表中存在歪斜的一个或多个列。
value示意歪斜的列中存在歪斜的一个或多个值。

示例5：

本节，咱们用tpcds中的Q1作为示例，未应用hint前的查问及打算如下：

explain performancewith customer_total_return as    (select sr_customer_sk as ctr_customer_sk          ,sr_store_sk as ctr_store_sk          ,sum(SR_FEE) as ctr_total_return     from store_returns         ,date_dim     where sr_returned_date_sk = d_date_sk       and d_year =2000     group by sr_customer_sk             ,sr_store_sk)select  c_customer_id  from customer_total_return ctr1      ,store      ,customerwhere ctr1.ctr_total_return > (select avg(ctr_total_return)*1.2                                 from customer_total_return ctr2                                where ctr1.ctr_store_sk = ctr2.ctr_store_sk)  and s_store_sk = ctr1.ctr_store_sk  and s_state = 'NM'  and ctr1.ctr_customer_sk = c_customer_skorder by c_customer_idlimit 100;

with表达式中group by在做HashAgg中进行重散布时存在歪斜，对应上图中的10和27号算子。对with表达式中的hashagg进行hint指定，查问和打算如下：

explain performancewith customer_total_return as    (select /*+ skew(store_returns(sr_store_sk sr_customer_sk)) */sr_customer_sk as ctr_customer_sk          ,sr_store_sk as ctr_store_sk          ,sum(SR_FEE) as ctr_total_return     from store_returns         ,date_dim     where sr_returned_date_sk = d_date_sk       and d_year =2000     group by sr_customer_sk             ,sr_store_sk)select  c_customer_id  from customer_total_return ctr1      ,store      ,customerwhere ctr1.ctr_total_return > (select avg(ctr_total_return)*1.2                                 from customer_total_return ctr2                                where ctr1.ctr_store_sk = ctr2.ctr_store_sk)  and s_store_sk = ctr1.ctr_store_sk  and s_state = 'NM'  and ctr1.ctr_customer_sk = c_customer_skorder by c_customer_idlimit 100;

作了歪斜hint的查问打算如下：

从优化后的打算能够看出：对于HashAgg，因为其重散布存在歪斜，所以优化为双层Agg。

结语

通过下面各节中的示例，展现了Gauss DB(DWS)中plan hint的应用办法，及其对执行打算的影响。数据库使用者联合本人对数据库对象、数据分布状况及数据量等信息的理解，或者依据SQL语句的查问打算剖析出其中采纳了不正确打算的局部，正确的利用plan hint，提醒优化器采纳更高效的打算，能够使查问执行的性能取得大幅的晋升，成为性能调优的一件无利的工具。

本文分享自华为云社区《GaussDB(DWS)性能调优系列实现篇六：十八般武艺Plan hint使用》，原文作者：wangxiaojuan8 。

点击关注，第一工夫理解华为云陈腐技术~