About MySQL: Issue 37, Making Appropriate Use of MySQL Native Table Partitioning

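MySQL's main storage engine today is InnoDB. InnoDB has no native table-splitting scheme comparable to the MERGE engine, but it does support native partitioned tables, which split a table's rows horizontally in a way that is transparent to the application.

Partitioned tables offer an additional option for querying and routinely managing very large tables. Used appropriately, they can improve database performance considerably.

Partitioned tables have the following main advantages:

1. They greatly improve the performance of certain queries.
2. They reduce routine data-maintenance work and improve operational efficiency.
3. They allow parallel queries and spread write I/O evenly.
4. They are transparent to the application, so no routing or middle layer has to be deployed at the application tier.

Next, we use practical examples to make the first two advantages clearer.

For queries:

###### Optimizing query performance (range queries)

With a suitably partitioned table, the same query scans far fewer rows than it would on a non-partitioned table, and so runs much more efficiently.

In the following example, t1 is a non-partitioned table and p1 is the corresponding partitioned table. Both hold the same number of rows, 10 million.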

localhost:ytt> show create table t1\G
*************************** 1. row ***************************
       Table: t1
Create Table: CREATE TABLE `t1` (
  `id` int NOT NULL,
  `r1` date DEFAULT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci
1 row in set (0.00 sec)


localhost:ytt> show create table p1\G
*************************** 1. row ***************************
       Table: p1
Create Table: CREATE TABLE `p1` (
  `id` int NOT NULL,
  `r1` date DEFAULT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci
/*!50100 PARTITION BY RANGE (`id`)
(PARTITION p0 VALUES LESS THAN (1000000) ENGINE = InnoDB,
 PARTITION p1 VALUES LESS THAN (2000000) ENGINE = InnoDB,
 PARTITION p2 VALUES LESS THAN (3000000) ENGINE = InnoDB,
 PARTITION p3 VALUES LESS THAN (4000000) ENGINE = InnoDB,
 PARTITION p4 VALUES LESS THAN (5000000) ENGINE = InnoDB,
 PARTITION p5 VALUES LESS THAN (6000000) ENGINE = InnoDB,
 PARTITION p6 VALUES LESS THAN (7000000) ENGINE = InnoDB,
 PARTITION p7 VALUES LESS THAN (8000000) ENGINE = InnoDB,
 PARTITION p8 VALUES LESS THAN (9000000) ENGINE = InnoDB,
 PARTITION p9 VALUES LESS THAN MAXVALUE ENGINE = InnoDB) */
1 row in set (0.00 sec)

localhost:ytt> select count(*) from t1;
+----------+
| count(*) |
+----------+
| 10000000 |
+----------+
1 row in set (0.94 sec)

localhost:ytt> select count(*) from p1;
+----------+
| count(*) |
+----------+
| 10000000 |
+----------+
1 row in set (0.92 sec)
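
The listings above show the finished definitions but not how p1 was built. The following is a minimal sketch of one way to create an equivalent table and see how its rows are spread across partitions. The name `p1_demo` is hypothetical, and the `TABLE_ROWS` values that InnoDB reports are estimates, not exact counts.

```sql
-- Hypothetical copy of p1: same columns and RANGE boundaries as shown above.
CREATE TABLE p1_demo (
  id int NOT NULL,
  r1 date DEFAULT NULL,
  PRIMARY KEY (id)
) ENGINE=InnoDB
PARTITION BY RANGE (id) (
  PARTITION p0 VALUES LESS THAN (1000000),
  PARTITION p1 VALUES LESS THAN (2000000),
  PARTITION p2 VALUES LESS THAN (3000000),
  PARTITION p3 VALUES LESS THAN (4000000),
  PARTITION p4 VALUES LESS THAN (5000000),
  PARTITION p5 VALUES LESS THAN (6000000),
  PARTITION p6 VALUES LESS THAN (7000000),
  PARTITION p7 VALUES LESS THAN (8000000),
  PARTITION p8 VALUES LESS THAN (9000000),
  PARTITION p9 VALUES LESS THAN MAXVALUE
);

-- Copy the rows from the non-partitioned table. This is one large statement;
-- on a busy server it would normally be split into smaller batches.
INSERT INTO p1_demo SELECT id, r1 FROM t1;

-- Per-partition upper bound and estimated row count.
SELECT PARTITION_NAME, PARTITION_DESCRIPTION, TABLE_ROWS
FROM information_schema.PARTITIONS
WHERE TABLE_SCHEMA = DATABASE()
  AND TABLE_NAME = 'p1_demo'
ORDER BY PARTITION_ORDINAL_POSITION;
```

The partitioning column has to be part of every unique key on the table, which holds here because `id` is the primary key.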

Let's run a range query against each table. Here are the execution plans:

localhost:ytt> explain format=tree select count(*) from t1 where id < 1000000\G
*************************** 1. row ***************************
EXPLAIN: -> Aggregate: count(0)
    -> Filter: (t1.id < 1000000)  (cost=407495.19 rows=2030006)
        -> Index range scan on t1 using PRIMARY  (cost=407495.19 rows=2030006)

1 row in set (0.00 sec)

localhost:ytt> explain format=tree select count(*) from p1 where id < 1000000\G
*************************** 1. row ***************************
EXPLAIN: -> Aggregate: count(0)
    -> Filter: (p1.id < 1000000)  (cost=99980.09 rows=499369)
        -> Index range scan on p1 using PRIMARY  (cost=99980.09 rows=499369)

1 row in set (0.00 sec)

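Comparing the two plans, the estimated cost and estimated row count for t1 (407495.19 and 2030006) are roughly four times those for p1 (99980.09 and 499369), so the partitioned table is clearly the more efficient of the two for this query.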
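
The tree-format plans above do not name the partitions that are read. One way to confirm pruning, sketched here against the same p1 table, is the traditional EXPLAIN output, whose `partitions` column lists the partitions the optimizer intends to touch. With the boundaries defined earlier, the predicate `id < 1000000` should map to p0 only.

```sql
-- Traditional (tabular) EXPLAIN: check the "partitions" column in the result.
EXPLAIN SELECT count(*) FROM p1 WHERE id < 1000000;

-- Explicit partition selection: rows stored outside p0 are never considered,
-- regardless of what the WHERE clause says.
SELECT count(*) FROM p1 PARTITION (p0) WHERE id < 1000000;
```

Explicit selection is mainly useful for checking a hypothesis like this one; in application queries, leaving pruning to the optimizer keeps the SQL independent of the partition layout.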

Now look at the execution plans for a not-equal query on both tables:

localhost:ytt> explain format=tree select count(*) from t1 where id != 2000001\G
*************************** 1. row ***************************
EXPLAIN: -> Aggregate: count(0)
    -> Filter: (t1.id <> 2000001)  (cost=1829866.58 rows=9117649)
        -> Index range scan on t1 using PRIMARY  (cost=1829866.58 rows=9117649)

1 row in set (0.00 sec)

localhost:ytt> explain format=tree select count(*) from p1 where id != 2000001\G
*************************** 1. row ***************************
EXPLAIN: -> Aggregate: count(0)
    -> Filter: (p1.id <> 2000001)  (cost=1002750.23 rows=4993691)
        -> Index range scan on p1 using PRIMARY  (cost=1002750.23 rows=4993691)

1 row in set (0.00 sec)

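Even for an inefficient statement like this one, the plans show the partitioned table ahead of the non-partitioned table on both estimated cost and estimated rows scanned, by a factor of about 1.8.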

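###### Optimizing write performance (UPDATE with a filter condition)

For update requests of this kind, a partitioned table is likewise more efficient than a non-partitioned one.

Below are the execution plans for an update with a range filter (BETWEEN) on the non-partitioned and partitioned tables. The estimated row count alone tells the story: the partitioned table reads only partitions p1 and p2 and scans far fewer rows than the non-partitioned table (998738 versus 3938068).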

localhost:ytt> explain update t1 set r1 = date_sub(current_date,interval ceil(rand()*5000) day) where id between 1000001 and 2990000\G
*************************** 1. row ***************************
           id: 1
  select_type: UPDATE
        table: t1
   partitions: NULL
         type: range
possible_keys: PRIMARY
          key: PRIMARY
      key_len: 4
          ref: const
         rows: 3938068
     filtered: 100.00
        Extra: Using where
1 row in set, 1 warning (0.00 sec)

localhost:ytt> explain update p1 set r1 = date_sub(current_date,interval ceil(rand()*5000) day) where id between 1000001 and 2990000\G
*************************** 1. row ***************************
           id: 1
  select_type: UPDATE
        table: p1
   partitions: p1,p2
         type: range
possible_keys: PRIMARY
          key: PRIMARY
      key_len: 4
          ref: const
         rows: 998738
     filtered: 100.00
        Extra: Using where
1 row in set, 1 warning (0.00 sec)

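For operations:

###### Exchanging data between a partition and a non-partitioned table

The data in a given partition can be exported and imported easily, and can be swapped quickly with the data of a non-partitioned table.

Create a table t_p1 to exchange data with partition p1 of table p1.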

localhost:ytt> create table t_p1 like t1;
Query OK, 0 rows affected (0.06 sec)

Partition p1 holds 1 million rows. Using the native partition exchange feature, swapping the data took only 0.07 seconds.

localhost:ytt> alter table p1 exchange partition p1 with table t_p1;
Query OK, 0 rows affected (0.07 sec)

Check the data after the exchange: table p1 has 1 million fewer rows because partition p1 has been emptied, and table t_p1 has gained 1 million rows.

localhost:ytt> select count(*) from p1;
+----------+
| count(*) |
+----------+
|  9000000 |
+----------+
1 row in set (0.79 sec)

localhost:ytt> select count(*) from t_p1;
+----------+
| count(*) |
+----------+
|  1000000 |
+----------+
1 row in set (0.13 sec)

The data can be swapped back at any time, which leaves the exchange table empty.

localhost:ytt> alter table p1 exchange partition p1 with table t_p1;
Query OK, 0 rows affected (0.77 sec)

localhost:ytt> select count(*) from p1;
+----------+
| count(*) |
+----------+
| 10000000 |
+----------+
1 row in set (0.91 sec)

localhost:ytt> select count(*) from t_p1;
+----------+
| count(*) |
+----------+
|        0 |
+----------+
1 row in set (0.00 sec)
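
Swapping the data back took 0.77 seconds, against 0.07 seconds for the first exchange. One possible contributor (an assumption, not something the article measured) is that by default MySQL checks every row of the incoming table against the partition's range before accepting it. The sketch below shows the `WITHOUT VALIDATION` option, available since MySQL 5.7, which skips that check. It is only safe when the rows are already known to belong to the partition, so a manual pre-check is included.

```sql
-- Pre-check: count rows in t_p1 that fall outside partition p1's range
-- (1000000 <= id < 2000000). The result should be 0 before proceeding.
SELECT count(*) FROM t_p1 WHERE id < 1000000 OR id >= 2000000;

-- Exchange without the row-by-row validation pass.
ALTER TABLE p1 EXCHANGE PARTITION p1 WITH TABLE t_p1 WITHOUT VALIDATION;
```

If out-of-range rows do slip in this way, they sit in the wrong partition and can be missed by pruned queries, so the default `WITH VALIDATION` is the safer choice when in doubt.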

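By comparison, exchanging data out of a non-partitioned table takes these steps:

1. Choose the table that will receive the data.
2. Select the data from the original table and insert it into that table.
3. Delete the affected data from the original table.

If the swapped-out data later has to go back into the original table, the steps above must be repeated in reverse, which makes the operation harder to run and inefficient.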
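
The manual steps above can be sketched as follows for the same id range that partition p1 covers. The table name `t1_swap` is hypothetical and chosen only for illustration.

```sql
-- Step 1: create the receiving table. DDL commits implicitly in MySQL,
-- so it is kept outside the transaction below.
CREATE TABLE t1_swap LIKE t1;

-- Steps 2 and 3: copy the rows, then delete them from the original table.
-- Wrapping both in one transaction keeps the move all-or-nothing.
START TRANSACTION;
INSERT INTO t1_swap SELECT * FROM t1 WHERE id >= 1000000 AND id < 2000000;
DELETE FROM t1 WHERE id >= 1000000 AND id < 2000000;
COMMIT;
```

Both statements touch every affected row and log the change, which is where the extra time and binary log volume of the manual approach come from.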

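Partition exchange has one more major advantage: it writes far less to the binary log than the non-partitioned approach. Let's repeat the exchange above, starting by deleting all binary logs.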

localhost:ytt>reset master;

Query OK, 0 rows affected (0.02 sec)

Perform one partition exchange:

localhost:ytt>alter table p1 exchange partition p2 with table t_p1;
Query OK, 0 rows affected (2.42 sec)

Exchange again, which moves the data back and empties table t_p1:

localhost:ytt>alter table p1 exchange partition p2 with table t_p1;
Query OK, 0 rows affected (0.45 sec)

Both exchange operations are now recorded in binary log ytt1.000001.

localhost:ytt>show master status;
...
 ytt1.000001 : 47d6eda0-6468-11ea-a026-9cb6d0e27d15:1-2

Flush the logs, then record the equivalent copy for the non-partitioned approach.

localhost:ytt>flush logs;
Query OK, 0 rows affected (0.01 sec)


localhost:ytt>insert into t_p1 select * from p1 partition (p2) ;
Query OK, 934473 rows affected (5.25 sec)
Records: 934473  Duplicates: 0  Warnings: 0


localhost:ytt>show master status;
...
 ytt1.000002 : 47d6eda0-6468-11ea-a026-9cb6d0e27d15:1-3

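Now look at the log files themselves: ytt1.000001 takes only 588 bytes, whereas ytt1.000002 takes 7.2M, and that file covers only the INSERT ... SELECT step, without the DELETE that a full manual exchange would also need.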

root@ytt-pc:/var/lib/mysql/3306# ls -sihl ytt1.00000*
2109882 4.0K -rw-r----- 1 mysql mysql  588 7月  23 11:13 ytt1.000001
2109868 7.2M -rw-r----- 1 mysql mysql 7.2M 7月  23 11:14 ytt1.000002
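
The same comparison can be made from inside the mysql client, without shell access to the data directory. This is a small sketch using the log file names from the listing above; on another server the names will differ.

```sql
-- Lists every binary log file together with its size in bytes.
SHOW BINARY LOGS;

-- Shows the first events recorded in each file, to see what was logged.
SHOW BINLOG EVENTS IN 'ytt1.000001' LIMIT 10;
SHOW BINLOG EVENTS IN 'ytt1.000002' LIMIT 10;
```

The `LIMIT` clause keeps the output short for the larger file, which holds the row changes for every copied row.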

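###### Quickly clearing the data of a single partition

Clearing a single partition performs better than deleting a range of rows from a non-partitioned table.

For example, to empty partition p0 of the partitioned table p1, simply truncate that one partition.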

localhost:ytt> alter table p1 truncate partition p0;
Query OK, 0 rows affected (0.07 sec)

localhost:ytt> select count(*) from p1;
+----------+
| count(*) |
+----------+
|  9000001 |
+----------+
1 row in set (0.92 sec)

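A non-partitioned table can only be truncated as a whole, so there is no fast way to clear part of its data; the rows have to be removed with a filtered DELETE, which performs far worse. In this run the same cleanup was several hundred times slower on the non-partitioned table (26.80 seconds versus 0.07 seconds).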

localhost:ytt> delete from t1 where id < 1000000;
Query OK, 999999 rows affected (26.80 sec)
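
To confirm that a truncate affected only the intended partition, the partition can be counted directly with explicit partition selection. This is a minimal sketch against the same p1 table.

```sql
-- Row count of partition p0 only, before the cleanup.
SELECT count(*) FROM p1 PARTITION (p0);

ALTER TABLE p1 TRUNCATE PARTITION p0;

-- The same count afterwards is 0; the other partitions are untouched.
SELECT count(*) FROM p1 PARTITION (p0);
```

Counting one partition reads far fewer rows than counting the whole table, so this check stays cheap even on very large tables.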

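Summary:

MySQL partitioned tables are very efficient in many scenarios. This article covered their basic advantages for simple queries and for operations; later articles will look at more use cases one by one.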

