之前的篇章咱们探讨的都是基于单列的分区表,那有无必要建设基于多列的分区表?这种分区表数据分布是否平均?有无非凡的利用场景?有无非凡的优化策略?本篇基于这些问题来进行重点解读。
MySQL 不仅反对基于单列分区,也反对基于多列分区。比方基于字段(f1,f2,f3) 来建设分区表,应用办法和应用场景都有些相似于联结索引。比方上面查问语句,同时对列 (f1,f2,f3) 进行过滤。
select * from p1 where f1 = 2 and f2 = 2 and f3 = 2;
多列分区表的前提是参加分区的列检索频率均等,如果不均等,就没有必要应用多列分区。
咱们还是以具体实例来验证下多列分区的优缺点以及实用场景,这样了解起来更加透彻。
建设一张表 p1,字段 r1,r2,r3 别离取值为 1 -8,1-5,1-5.
create table p1(r1 int,r2 int,r3 int,log_date datetime);
依照字段 (r1,r2,r3) 的散布范畴,我来写个存储过程解决下表 p1,变为分区表。存储过程代码如下:
DELIMITER $$
USE `ytt_new`$$
DROP PROCEDURE IF EXISTS `sp_add_partition_ytt_new_p1`$$
CREATE DEFINER=`root`@`%` PROCEDURE `sp_add_partition_ytt_new_p1`()
BEGIN
DECLARE i,j,k INT UNSIGNED DEFAULT 1;
SET @stmt = '';
SET @stmt_begin = 'ALTER TABLE p1 PARTITION BY RANGE COLUMNS (r1,r2,r3)(';
WHILE i <= 8 DO
set j = 1;
while j <= 5 do
set k = 1;
while k <= 5 do
SET @stmt = CONCAT(@stmt,'PARTITION p',i,j,k,'VALUES LESS THAN (',i,',',j,',',k,'),');
set k = k + 1;
end while;
set j = j + 1;
end while;
SET i = i + 1;
END WHILE;
SET @stmt_end = 'PARTITION p_max VALUES LESS THAN (maxvalue,maxvalue,maxvalue))';
SET @stmt = CONCAT(@stmt_begin,@stmt,@stmt_end);
PREPARE s1 FROM @stmt;
EXECUTE s1;
DROP PREPARE s1;
SET @stmt = NULL;
SET @stmt_begin = NULL;
SET @stmt_end = NULL;
END$$
DELIMITER ;
调用存储过程,变更表 p1 为多列分区表,此时表 p1 有 201 个分区,记录数为 500W 条。
mysql> call sp_add_partition_ytt_new_p1;
Query OK, 0 rows affected (14.89 sec)
mysql> select count(partition_name) as partition_count from information_schema.partitions where table_schema = 'ytt_new' and table_name ='p1';
+-----------------+
| partition_count |
+-----------------+
| 201 |
+-----------------+
1 row in set (0.00 sec)
mysql> select count(*) from p1;
+----------+
| count(*) |
+----------+
| 5000000 |
+----------+
1 row in set (12.01 sec)
用同样的办法建设一张分区表 p2,来对单列分区表与多列分区表在一些场景下的性能做下比照:
分区表 p2 依照字段 r1 分区,仅仅分了 9 个。
mysql> CREATE TABLE `p2` (
`r1` int DEFAULT NULL,
`r2` int DEFAULT NULL,
`r3` int DEFAULT NULL,
`log_date` datetime DEFAULT NULL
) ENGINE=InnoDB
PARTITION BY RANGE COLUMNS(r1)
(PARTITION p1 VALUES LESS THAN (1) ,
PARTITION p2 VALUES LESS THAN (2) ,
PARTITION p3 VALUES LESS THAN (3) ,
PARTITION p4 VALUES LESS THAN (4) ,
PARTITION p5 VALUES LESS THAN (5) ,
PARTITION p6 VALUES LESS THAN (6) ,
PARTITION p7 VALUES LESS THAN (7) ,
PARTITION p8 VALUES LESS THAN (8) ,
PARTITION p_max VALUES LESS THAN (MAXVALUE)
)
1 row in set (0.00 sec)
mysql> insert into p2 select * from p1;
Query OK, 5000000 rows affected (1 min 37.92 sec)
Records: 5000000 Duplicates: 0 Warnings: 0
多个字段等值过滤的性能比照:同样的查问条件,表 p1(执行工夫 0.02 秒)比 p2(执行工夫 0.49 秒)要快几十倍。
mysql> select count(*) from p1 where r1 = 2 and r2 = 2 and r3 = 2;
+----------+
| count(*) |
+----------+
| 24992 |
+----------+
1 row in set (0.02 sec)
mysql> select count(*) from p2 where r1 = 2 and r2 = 2 and r3 = 2;
+----------+
| count(*) |
+----------+
| 24992 |
+----------+
1 row in set (0.49 sec)
查看两者执行打算比照:同样的查问,表 p1 扫描行数只有 2W 多,而表 p2 扫描行数有 62W 行,相差微小。
mysql> explain select count(*) from p1 where r1 = 2 and r2 = 2 and r3 = 2\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: p1
partitions: p223
type: ALL
...
rows: 24711
filtered: 0.10
Extra: Using where
1 row in set, 1 warning (0.00 sec)
mysql> explain select count(*) from p2 where r1 = 2 and r2 = 2 and r3 = 2\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: p2
partitions: p3
type: ALL
...
rows: 623239
filtered: 0.10
Extra: Using where
1 row in set, 1 warning (0.00 sec)
如果过滤字段不残缺呢?比方不检索最初一列,再次做下比照:同样表 p1(0.1 秒)比表 p2(0.52 秒)执行工夫要少几倍。
mysql> select count(*) from p1 where r1 = 2 and r2 = 2;
+----------+
| count(*) |
+----------+
| 124649 |
+----------+
1 row in set (0.10 sec)
mysql> select count(*) from p2 where r1 = 2 and r2 = 2;
+----------+
| count(*) |
+----------+
| 124649 |
+----------+
1 row in set (0.52 sec)
那只检索第一列呢:这次表 p1 和 p2 执行工夫上差不多,p2 略微占优势。
mysql> select count(*) from p1 where r1 = 2 ;
+----------+
| count(*) |
+----------+
| 624599 |
+----------+
1 row in set (0.56 sec)
mysql> select count(*) from p2 where r1 = 2 ;
+----------+
| count(*) |
+----------+
| 624599 |
+----------+
1 row in set (0.45 sec)
看下执行打算比照:表 p1 扫描的分区数为 26 个,表 p2 仅扫描 1 个分区,分区数量上表 p2 绝对少很多。
mysql> explain select count(*) from p1 where r1 = 2 \G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: p1
partitions: p211,p212,p213,p214,p215,p221,p222,p223,p224,p225,p231,p232,p233,p234,p235,p241,p242,p243,p244,p245,p251,p252,p253,p254,p255,p311
type: ALL
...
rows: 648074
filtered: 10.00
Extra: Using where
1 row in set, 1 warning (0.00 sec)
mysql> explain select count(*) from p2 where r1 = 2 \G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: p2
partitions: p3
type: ALL
...
rows: 623239
filtered: 10.00
Extra: Using where
1 row in set, 1 warning (0.00 sec)
如果把字段 r1 拿掉呢?执行工夫也相差无几,表 p1 和表 p2 都会扫描所有分区。
mysql> select count(*) from p1 where r2 = 2;
+----------+
| count(*) |
+----------+
| 998700 |
+----------+
1 row in set (3.87 sec)
mysql> select count(*) from p2 where r2 = 2;
+----------+
| count(*) |
+----------+
| 998700 |
+----------+
1 row in set (3.75 sec)
那鉴于此,再来探讨一个问题:对于多列分区,字段的排列程序是否重要?
对于这个程序要和咱们查问语句对应的过滤条件来一一阐明。相似上面两类 SQL:
SQL 1:select * from p1 where r1 = 2 and r2 = 2 and r3 = 2;
对于 SQL 1,程序无关紧要,因为三个列在查问时都已蕴含;
SQL 2: select * from p1 where r1 = 2 and r2 = 2;
对于 SQL 2,(r1,r2,r3) 和 (r2,r1,r3) 都能够满足。
SQL 3:select * from p1 where r2 = 2 and r3 = 2;
对于 SQL 3,(r2,r3,r1) 和 (r3,r2,r1) 也都能够满足。
用同样的办法来建设分区表 p3,分区字段程序为 (r2,r3,r1):
mysql> show create table p3\G
*************************** 1. row ***************************
Table: p3
Create Table: CREATE TABLE `p3` (
`r1` int DEFAULT NULL,
`r2` int DEFAULT NULL,
`r3` int DEFAULT NULL,
`log_date` datetime DEFAULT NULL
) ENGINE=InnoDB
/*!50500 PARTITION BY RANGE COLUMNS(r2,r3,r1)
(PARTITION p111 VALUES LESS THAN (1,1,1) ENGINE = InnoDB,
...
对于表 p3 来讲:上面这条 SQL 执行工夫比表 p1 要快几十倍,因为分区字段程序不同,表 p1 要扫描所有分区能力出后果。
mysql> select count(*) from p3 where r2 = 1 and r3 = 4 ;
+----------+
| count(*) |
+----------+
| 199648 |
+----------+
1 row in set (0.22 sec)
mysql> select count(*) from p1 where r2 = 1 and r3 = 4 ;
+----------+
| count(*) |
+----------+
| 199648 |
+----------+
1 row in set (5.05 sec)
所以对于多列分区表,正如结尾讲的一样,它和联结索引的应用办法、注意事项、应用场景也都很相似。对于某些特定的场景,应用多列分区能显著放慢查问性能。