About MySQL | Issue 43: How to Make Good Use of Partitioned Tables in Multi-Table Joins

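How can partitioned tables be used properly to improve query performance when several tables are joined? After the previous articles on partitioned tables, you should already have a fairly complete picture of MySQL partitioned tables: they exist to reduce the amount of data each lookup has to examine, and thereby improve overall performance.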

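The previous articles covered partitioned tables used on their own, in single-table queries. Do partitioned tables necessarily bring a performance gain when several tables are joined? People often ask questions like these: I switched to a partitioned table, but my queries are not faster at all, they are actually slower. Why? Is partitioning itself flawed, or have I misunderstood which scenarios partitioned tables are suited for? Today I will answer these questions using several typical query scenarios.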

Scenario 1: two tables joined on the partition key, with no filter condition.

For example: select * from t1 inner join t2 using(id);

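In this scenario a partitioned table only makes the query slower; it does not speed it up.

Without partitioning, only two tables take part in the join. With partitioning, the join involves not just two tables but a large number of table partitions, and the more partitions there are, the worse the query performs.

A simple example: table t1 is a hash-partitioned table with 1000 partitions and 500,000 rows.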

localhost:ytt>show create table t1\G
*************************** 1. row ***************************
       Table: t1
Create Table: CREATE TABLE `t1` (
  `id` int DEFAULT NULL,
  `r1` int DEFAULT NULL,
  `r2` int DEFAULT NULL,
  `log_date` date DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 
/*!50100 PARTITION BY HASH (`id`)
PARTITIONS 1000 */
1 row in set (0.00 sec)

Table t1_no_pt is a regular (non-partitioned) table: a clone of t1 with the partitioning removed, also holding 500,000 rows.

localhost:ytt>show create table t1_no_pt\G
*************************** 1. row ***************************
       Table: t1_no_pt
Create Table: CREATE TABLE `t1_no_pt` (
  `id` int DEFAULT NULL,
  `r1` int DEFAULT NULL,
  `r2` int DEFAULT NULL,
  `log_date` date DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 
1 row in set (0.00 sec)

Here is how the two tables compare in this scenario. Joining the partitioned table with the regular table takes 6.76 seconds.

localhost:ytt>select count(*) from t1_no_pt a inner join t1 b using(id);
+----------+
| count(*) |
+----------+
|  1014068 |
+----------+
1 row in set (6.76 sec)

Joining the two partitioned tables takes 4.32 seconds.

localhost:ytt>select count(*) from t1 a inner join t1 b using(id);
+----------+
| count(*) |
+----------+
|  1014068 |
+----------+
1 row in set (4.32 sec)

Joining the two regular tables takes only 0.87 seconds.

localhost:ytt>select count(*) from t1_no_pt a inner join t1_no_pt b using(id);
+----------+
| count(*) |
+----------+
|  1014068 |
+----------+
1 row in set (0.87 sec)

For the same query, partitioned tables actually make things worse in this scenario.

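Scenario 2: two tables joined on the partition key, with a filter condition.

This splits into two sub-scenarios:

1. The filter condition is on the partition key

For example: select * from t1 inner join t2 using(id) where t1.id = xxx;

Partitioned tables are recommended for this scenario! The filter is an equality condition on the partition key, so the optimizer can narrow the search down to one specific partition, which greatly reduces the number of rows examined. This is a perfect fit for partitioned tables.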
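To make the pruning step concrete, here is a small Python model of plain HASH partitioning. It assumes the routing rule MySQL documents for `PARTITION BY HASH(expr) PARTITIONS num` (not `LINEAR HASH`), where a row lands in partition `MOD(expr, num)`, and it only handles non-negative integer keys. The function names are hypothetical; this is a sketch of the idea, not the optimizer's implementation.

```python
def hash_partition(key: int, num_partitions: int) -> int:
    """Partition number under plain HASH partitioning: MOD(key, num_partitions).

    Assumption: non-negative integer keys only (Python's % and MySQL's MOD()
    disagree on negative operands, so those are rejected here).
    """
    if key < 0:
        raise ValueError("this sketch only models non-negative keys")
    return key % num_partitions


def partitions_to_scan(filter_column: str, filter_value: int,
                       partition_column: str, num_partitions: int) -> list[int]:
    """Partitions left to read after pruning with a single equality filter."""
    if filter_column == partition_column:
        # Equality on the partition key: only one partition can hold matches.
        return [hash_partition(filter_value, num_partitions)]
    # Equality on any other column: no pruning, every partition must be read.
    return list(range(num_partitions))


# t1 is PARTITION BY HASH(id) PARTITIONS 1000
print(partitions_to_scan("id", 19172, "id", 1000))    # [172]
print(len(partitions_to_scan("r1", 10, "id", 1000)))  # 1000
```

The second call corresponds to the other sub-scenario (a filter on `r1`): with no condition on `id`, nothing can be pruned and all 1000 partitions stay in play.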

Again, a simple example with tables t1 and t1_no_pt:

Joining the two partitioned tables with a filter on the partition key takes 0.01 seconds.

localhost:ytt>select count(*) from t1 a inner join t1 b using(id) where a.id = 19172;
+----------+
| count(*) |
+----------+
|       81 |
+----------+
1 row in set (0.01 sec)

Joining the two regular tables with the same condition takes 0.55 seconds, many times slower than joining the two partitioned tables.

localhost:ytt>select count(*) from t1_no_pt a inner join t1_no_pt b using(id) where a.id = 19172;
+----------+
| count(*) |
+----------+
|       81 |
+----------+
1 row in set (0.55 sec)

Joining the partitioned table with the regular table takes 0.32 seconds, somewhere between the previous two.

localhost:ytt>select count(*) from t1 a inner join t1_no_pt b using(id) where a.id = 19172;
+----------+
| count(*) |
+----------+
|       81 |
+----------+
1 row in set (0.32 sec)

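Comparing the execution plans of the two-partitioned-table join and the two-regular-table join makes the difference even clearer: the partitioned join has an estimated cost of 381.90 and an estimated 280 rows, while the regular-table join has an estimated cost of 249264389.78 and an estimated 249125777 rows. Here partitioning gives a very significant performance gain!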

localhost:ytt>explain  format=tree select count(*) from t1 a inner join t1 b using(id) where a.id = 19172\G
*************************** 1. row ***************************
EXPLAIN: -> Aggregate: count(0)
    -> Inner hash join (no condition)  (cost=381.90 rows=280)
        -> Filter: (b.id = 19172)  (cost=1.02 rows=53)
            -> Table scan on b  (cost=1.02 rows=529)
        -> Hash
            -> Filter: (a.id = 19172)  (cost=53.65 rows=53)
                -> Table scan on a  (cost=53.65 rows=529)

1 row in set (0.00 sec)

localhost:ytt>explain  format=tree select count(*) from t1_no_pt a inner join t1_no_pt b using(id) where a.id = 19172\G
*************************** 1. row ***************************
EXPLAIN: -> Aggregate: count(0)
    -> Inner hash join (no condition)  (cost=249264389.78 rows=249125777)
        -> Filter: (b.id = 19172)  (cost=1.87 rows=49913)
            -> Table scan on b  (cost=1.87 rows=499125)
        -> Hash
            -> Filter: (a.id = 19172)  (cost=50257.25 rows=49913)
                -> Table scan on a  (cost=50257.25 rows=499125)

1 row in set (0.00 sec)
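The row estimates in the two plans can be cross-checked with a little arithmetic. The sketch below only copies numbers from the two plans above. In both plans the join estimate works out to roughly 0.1 times the product of the two filtered inputs, so pruning each side to one partition shrinks the join estimate roughly quadratically. Treating that 0.1 as a fixed factor is an interpretation of these two plans only, not documented optimizer behaviour.

```python
# Row estimates copied from the two EXPLAIN FORMAT=TREE plans above.
plans = {
    "t1 join t1 (partitioned)": {"a": 53, "b": 53, "join": 280},
    "t1_no_pt join t1_no_pt": {"a": 49913, "b": 49913, "join": 249125777},
}

for name, p in plans.items():
    # Ratio between the join estimate and the product of its two inputs.
    factor = p["join"] / (p["a"] * p["b"])
    print(f"{name}: join / (a * b) = {factor:.3f}")

part = plans["t1 join t1 (partitioned)"]
plain = plans["t1_no_pt join t1_no_pt"]
per_side = plain["a"] / part["a"]           # shrink per side from pruning
join_ratio = plain["join"] / part["join"]   # shrink of the join estimate
print(f"per side: {per_side:.0f}x, join: {join_ratio:.0f}x")
```

Each side shrinks by about 942x, and the join estimate by about 890,000x, close to 942 squared.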

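2. The filter condition is not on the partition key

For example: select * from t1 inner join t2 using(id) where t1.r1 = xxx;

In this scenario partitioned tables bring no performance gain; instead, performance drops sharply.

Again using t1 and t1_no_pt as the example: joining the two partitioned tables takes 6.16 seconds.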

localhost:ytt>select count(*) from t1 a inner join t1 b using(id) where a.r1 = 10;
+----------+
| count(*) |
+----------+
|    50552 |
+----------+
1 row in set (6.16 sec)

Joining the two regular tables takes 0.70 seconds, much faster than the partitioned tables.

localhost:ytt>select count(*) from t1_no_pt a inner join t1_no_pt b using(id) where a.r1 = 10;
+----------+
| count(*) |
+----------+
|    50552 |
+----------+
1 row in set (0.70 sec)

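Scenario 3: two tables joined on a column that is not the partition key, with a filter on the partition key.

Partitioned tables do not bring a performance gain in this scenario either!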
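A small simulation shows why pruning one side does not help here. It uses made-up data (ids 0 to 19999 with `r1 = id % 7`, chosen only so that `r1` is unrelated to the hash layout) and the same assumed `MOD(id, 1000)` routing as t1; none of it is the article's real data. The filter `a.id = 19172` leaves a single partition of `a`, but a join on `r1` gives no condition on `b.id` to prune with, and in this data the matching `b` rows really are spread over every partition.

```python
# Hypothetical synthetic data, NOT the article's tables: ids 0..19999 and
# r1 = id % 7, picked only so that r1 has nothing to do with the hash layout.
NUM_PARTITIONS = 1000  # like t1: PARTITION BY HASH(id) PARTITIONS 1000
rows = [{"id": i, "r1": i % 7} for i in range(20_000)]


def partition_of(key: int, num_partitions: int = NUM_PARTITIONS) -> int:
    """Plain HASH routing, MOD(key, num_partitions); assumes key >= 0."""
    return key % num_partitions


# Side a: WHERE a.id = 19172 is on the partition key, so one partition suffices.
a_rows = [r for r in rows if r["id"] == 19172]
a_partitions = {partition_of(r["id"]) for r in a_rows}

# Side b: joined USING(r1), so there is no condition on b.id to prune with;
# collect the partitions that actually hold matching r1 values.
wanted_r1 = {r["r1"] for r in a_rows}
b_partitions = {partition_of(r["id"]) for r in rows if r["r1"] in wanted_r1}

print(len(a_partitions), len(b_partitions))  # 1 1000
```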

Joining the two partitioned tables performs poorly: 6.05 seconds.

localhost:ytt>select count(*) from t1 a inner join t1 b using(r1) where a.id = 19172;
+----------+
| count(*) |
+----------+
|   225868 |
+----------+
1 row in set (6.05 sec)

Joining the two regular tables performs much better: 0.54 seconds.

localhost:ytt>select count(*) from t1_no_pt a inner join t1_no_pt b using(r1) where a.id = 19172;
+----------+
| count(*) |
+----------+
|   225868 |
+----------+
1 row in set (0.54 sec)

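Since the filter is on the partition key, consider joining the partitioned table with a regular table instead.

Rewriting the earlier SQL so that the pre-filtered rows of the partitioned table are joined with the regular table performs somewhat better than joining two regular tables: 0.39 seconds.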

localhost:ytt>select count(*) from (select  * from t1 a where a.id = 19172) t inner join t1_no_pt b using(r1);
+----------+
| count(*) |
+----------+
|   225868 |
+----------+
1 row in set (0.39 sec)

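Scenario 4: two partitioned tables joined on the partition key, but the two tables differ in partitioning method or number of partitions.

Table t2 has the same structure and the same number of rows as t1, but a different number of partitions: t1 has 1000 partitions, while t2 has only 50: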

localhost:ytt>show create table t2\G
*************************** 1. row ***************************
       Table: t2
Create Table: CREATE TABLE `t2` (
  `id` int DEFAULT NULL,
  `r1` int DEFAULT NULL,
  `r2` int DEFAULT NULL,
  `log_date` date DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 
/*!50100 PARTITION BY HASH (`id`)
PARTITIONS 50 */
1 row in set (0.01 sec)
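Assuming plain HASH routing (`MOD(id, num)`, non-negative ids), the sketch below shows that the two layouts do not pair up one-to-one. Because 50 divides 1000, each t1 partition feeds exactly one t2 partition, but each t2 partition collects rows from 20 different t1 partitions. It only illustrates the data layout; it does not model how MySQL actually executes the join.

```python
T1_PARTITIONS = 1000  # t1: PARTITION BY HASH(id) PARTITIONS 1000
T2_PARTITIONS = 50    # t2: PARTITION BY HASH(id) PARTITIONS 50


def hash_partition(key: int, num_partitions: int) -> int:
    """Plain HASH routing, MOD(key, num_partitions); assumes key >= 0."""
    return key % num_partitions


key = 19172
print(hash_partition(key, T1_PARTITIONS), hash_partition(key, T2_PARTITIONS))  # 172 22

# Which t1 partitions hold keys that end up in t2's partition 22?
# Since 50 divides 1000, key % 50 == (key % 1000) % 50, so this depends
# only on the t1 partition number.
feeding = [p for p in range(T1_PARTITIONS) if p % T2_PARTITIONS == 22]
print(len(feeding), feeding[:3])  # 20 [22, 72, 122]
```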

Joining these two partitioned tables takes 6.43 seconds.

localhost:ytt>select count(*) from t1 a inner join t2 b using(id);
+----------+
| count(*) |
+----------+
|  1014068 |
+----------+
1 row in set (6.43 sec)

Likewise, joining the two regular tables (t1_no_pt and t2_no_pt) takes 1.98 seconds, faster than the partitioned tables.

localhost:ytt>select count(*) from t1_no_pt a inner join t2_no_pt b using(id);
+----------+
| count(*) |
+----------+
|  1014068 |
+----------+
1 row in set (1.98 sec)

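The reasons for these performance differences were partly covered in earlier articles, so I will not repeat them here.

To sum up whether partitioned tables should be used in joins:

When joining partitioned tables, the following conditions should ideally hold; otherwise partitioning is likely to backfire:

The partition key is the join condition.
If the partition key is not the join condition, the filter condition must be on the partition key.
Both partitioned tables must use the same partitioning method and the same number of partitions.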
