关于sql优化:第39期MySQL-时间类分区写-SQL-注意事项

上篇《MySQL 工夫类分区具体实现》介绍了工夫类分区的实现办法，本篇是对上篇的一个延长，介绍基于此类分区的相干 SQL 编写注意事项。

对于分区表的检索无非有两种，一种是带分区键，另一种则不带分区键。一般来讲检索条件带分区键则执行速度快，不带分区键则执行速度变慢。这种论断适应于大多数场景，但不能以偏概全，要针对不同的分区表定义来写最合适的 SQL 语句。用分区表的目标是为了缩小 SQL 语句检索时的记录数，如果没有达到预期成果，则分区表只能带来副作用。接下来我列举几个经典的 SQL 语句：

仔细的读者在浏览完上篇可能心中就有一些疑难，基于表 ytt_p1 的 SQL 语句如下：

select count(*) from ytt_pt1 where log_date >='2018-01-01' and log_date <'2019-01-01';

同样是分区表 ytt_pt1_month1，基于这张表的 SQL 语句如下：

select count(*) from ytt_pt1_month1 where log_date in ('2020-01-01','2020-01-02','2020-01-03','2020-01-04','2020-01-05','2020-01-06','2020-01-07','2020-01-08','2020-01-09','2020-01-10','2020-01-11','2020-01-12','2020-01-13','2020-01-14','2020-01-15');

两张表的检索需要相似，为何写法差别不小？后者为何要写成列表模式而不持续写成简略的范畴检索模式？带着这点疑难，咱们持续。

MySQL 针对分区表有一项优化技术叫 partition pruning，翻译过去就是分区裁剪。其大抵含意是 MySQL 会依据 SQL 语句的过滤条件对应的分区函数进行计算，并把计算结果穿透到底层分区表从而减小扫描记录数的一种优化策略。对于工夫类型（DATE,TIMESTAMP,TIME，DATETIME），MySQL 仅反对局部函数的分区裁剪：to_days，to_seconds，year，unix_timestamp。那么咱们再来看之前的疑难：表 ytt_pt1_month1 分区函数为 month，MySQL 分区表尽管反对 month 函数，然而分区裁剪技术却不蕴含这个函数。接下来，分两局部来介绍本篇内容。

第一、来体验下 MySQL 的分区裁剪技术，新建一张表 pt_pruning：分区函数为 to_days。

create table pt_pruning (
id int,
r1 int,
r2 int,
log_date date)
partition by range(to_days(log_date))
(PARTITION p_01 VALUES LESS THAN (to_days('2020-02-01')) ENGINE = InnoDB,
 PARTITION p_02 VALUES LESS THAN (to_days('2020-03-01')) ENGINE = InnoDB,
 PARTITION p_03 VALUES LESS THAN (to_days('2020-04-01')) ENGINE = InnoDB,
 PARTITION p_04 VALUES LESS THAN (to_days('2020-05-01')) ENGINE = InnoDB,
 PARTITION p_05 VALUES LESS THAN (to_days('2020-06-01')) ENGINE = InnoDB,
 PARTITION p_06 VALUES LESS THAN (to_days('2020-07-01')) ENGINE = InnoDB,
 PARTITION p_07 VALUES LESS THAN (to_days('2020-08-01')) ENGINE = InnoDB,
 PARTITION p_08 VALUES LESS THAN (to_days('2020-09-01')) ENGINE = InnoDB,
 PARTITION p_09 VALUES LESS THAN (to_days('2020-10-01')) ENGINE = InnoDB,
 PARTITION p_10 VALUES LESS THAN (to_days('2020-11-01')) ENGINE = InnoDB,
 PARTITION p_11 VALUES LESS THAN (to_days('2020-12-01')) ENGINE = InnoDB,
 PARTITION p_12 VALUES LESS THAN (to_days('2021-01-01')) ENGINE = InnoDB,
 PARTITION p_max VALUES LESS THAN MAXVALUE ENGINE = InnoDB
)

此表蕴含 2020 年一整年的数据，大略 100W 条，此处省略造数据过程。

(localhost:ytt)<mysql>select min(log_date),max(log_date),count(*) from pt_pruning;
+---------------+---------------+----------+
| min(log_date) | max(log_date) | count(*) |
+---------------+---------------+----------+
| 2020-01-02    | 2020-12-31    |  1000000 |
+---------------+---------------+----------+
1 row in set (0.72 sec)

别离执行上面几条 SQL :

SQL 1：求日期蕴含 ‘2020-01-02’ 的记录条数。

SQL 1：select count(*) from pt_pruning where log_date <= '2020-01-02';

SQL 2 和 SQL 3：求 2020 年 1 月份的记录条数。

SQL 2：select count(*) from pt_pruning where log_date < '2020-02-01';

SQL 3:  select count(*) from pt_pruning where log_date between '2020-01-01' and '2020-01-31';

SQL 1 和 SQL 2 执行工夫为 0.04 秒，SQL 3 执行工夫为 0.06 秒。在没有应用索引的条件下成果还是比拟现实的。

(localhost:ytt)<mysql> select count(*) from pt_pruning where log_date <= '2020-01-02';
+----------+
| count(*) |
+----------+
|     2621 |
+----------+
1 row in set (0.04 sec)

(localhost:ytt)<mysql>select count(*) from pt_pruning where log_date < '2020-02-01';
+----------+
| count(*) |
+----------+
|    82410 |
+----------+
1 row in set (0.04 sec)

(localhost:ytt)<mysql>select count(*) from pt_pruning where log_date between '2020-01-01' and '2020-01-31';
+----------+
| count(*) |
+----------+
|    82410 |
+----------+
1 row in set (0.06 sec)

所以切记应用 MySQL 分区裁剪技术规定的分区函数来建设分区表，这样写 SQL 就会绝对随便些。如果因为历史起因，分区表没有应用以上规定的分区函数，能够有以下两项可能的优化策略：

手工改 SQL 语句让其达到最优。
加 HINT 来提醒 MySQL 应用具体的分区。

第二、如果分区表应用的分区函数未满足 MySQL 分区裁剪技术的规定，该如何优化此类 SQL 语句？

为防止和上篇内容混同，建张新表 pt_month，复制表 ytt_pt1_month1 的表定义。表 pt_month 和表 pt_pruning 一样，寄存了 2020 年一整年的记录，总条数也为 100W。

(localhost:ytt)<mysql>select min(log_date),max(log_date),count(*) from pt_month;
+---------------+---------------+----------+
| min(log_date) | max(log_date) | count(*) |
+---------------+---------------+----------+
| 2020-01-02    | 2020-12-31    |  1000000 |
+---------------+---------------+----------+
1 row in set (0.72 sec)

再次执行之前的三条 SQL，并把表名替换为 pt_month：

SQL 1 执行工夫为 1.26 秒，相比之前慢了不少。查看执行打算，发现未应用 MySQL 分区裁剪技术，扫描了不必要的表分区。（这里是全副表分区）

(localhost:ytt)<mysql>select count(*) from pt_month where log_date <= '2020-01-02';
+----------+
| count(*) |
+----------+
|     2621 |
+----------+
1 row in set (1.26 sec)

(localhost:ytt)<mysql>explain 
    -> select count(*) from pt_month where log_date <= '2020-01-02'\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: pt_month
   partitions: p_01,p_02,p_03,p_04,p_05,p_06,p_07,p_08,p_09,p_10,p_11,p_max
...
         rows: 992805
     filtered: 33.33
        Extra: Using where
1 row in set, 1 warning (0.00 sec)

接下来对 SQL 1 进行一个简略的优化：既然是求日期为’2020-01-02‘那天的记录，那就不要应用 <= 来过滤，间接用 = 过滤：执行工夫 0.03 秒。查看执行打算，改后的 SQL 间接定位到表分区 p_01，达到了分区裁剪的成果。

(localhost:ytt)<mysql>select count(*) from pt_month where log_date = '2020-01-02';
+----------+
| count(*) |
+----------+
|     2621 |
+----------+
1 row in set (0.03 sec)

(localhost:ytt)<mysql>explain 
    -> select count(*) from pt_month where log_date = '2020-01-02'\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: pt_month
   partitions: p_01
         type: ALL
...
         rows: 82522
     filtered: 10.00
        Extra: Using where
1 row in set, 1 warning (0.00 sec)

继续执行 SQL 2 和 SQL 3：执行工夫都是 1 秒到 2 秒之间，效率很差，也未应用 MySQL 分区裁剪技术。

(localhost:ytt)<mysql>select count(*) from pt_month where log_date < '2020-02-01';
+----------+
| count(*) |
+----------+
|    82410 |
+----------+
1 row in set (1.35 sec)

(localhost:ytt)<mysql>select count(*) from pt_month where log_date between '2020-01-01' and '2020-01-31';
+----------+
| count(*) |
+----------+
|    82410 |
+----------+
1 row in set (1.93 sec)

来持续优化 SQL 2 和 SQL 3，因为两个需要统一，能够把范畴检索改为指定列表检索：执行工夫仅为 0.04 秒。

(localhost:ytt)<mysql>select count(*) from pt_month where log_date in ('2020-01-01','2020-01-02','2020-01-03','2020-01-04','2020-01-05','2020-01-06','2020-01-07','2020-01-08','2020-01-09','2020-01-10','2020-01-11','2020-01-12','2020-01-13','2020-01-14','2020-01-15','2020-01-16','2020-01-17','2020-01-18','2020-01-19','2020-01-20','2020-01-21','2020-01-22','2020-01-23','2020-01-24','2020-01-25','2020-01-26','2020-01-27','2020-01-28','2020-01-29','2020-01-30','2020-01-31');
+----------+
| count(*) |
+----------+
|    82410 |
+----------+
1 row in set (0.04 sec)

把范畴查问改为 IN 列表后，效率失去很大晋升，查问打算显示 MySQL 优化器只在分区 p_01 上检索记录。

...
   partitions: p_01
...

除了革新 SQL 语句，还能够给语句加 HINT 的形式来让 MySQL 应用分区裁剪技术：比方给 SQL 2 加上 HINT 后，执行工夫为 0.04 秒，和之前革新后的语句执行效率相当。

(localhost:ytt)<mysql>select count(*) from pt_month partition (p_01) where log_date < '2020-02-01';
+----------+
| count(*) |
+----------+
|    82410 |
+----------+
1 row in set (0.04 sec)

总结：

如果因为历史起因分区表未应用 MySQL 分区裁剪技术，能够依照以下规定来手动对分区表进行裁剪优化：

select * from tbname where partition_key = value;
select * from tbname where partition_key in (value1,value2,…,valueN);
以上两种规定对于多表 JOIN 仍然实用。