关于mysql:技术分享-在MySQL对于批量更新操作的一种优化方式

欢送来到 GreatSQL 社区分享的 MySQL 技术文章，如有疑难或想学习的内容，能够在下方评论区留言，看到后会进行解答

作者：景云丽、卢浩、宋源栋

GreatSQL 社区原创内容未经受权不得随便应用，转载请分割小编并注明起源。

批量更新数据，不同于这种 update a=a+1 where pk > 500，而是须要对每一行进行独自更新 update a=1 where pk=1;update a=12 where pk=7;… 这样间断多行 update 语句的场景，是少见的。

能够说是偶尔也是一种必然，在 GreatDB 5.0 的开发过程中，咱们须要对多语句批量 update 的场景进行优化。

两种多行更新操作的耗时比照
在咱们对表做多行更新的时候通常会遇到以下两种状况

1. 单语句批量更新（update a=a+1 where pk > 500）

2. 多语句批量更新（update a=1 where pk=1;update a=12 where pk=7;…）

上面咱们进行实际操作比拟两种场景，在更新雷同行数时所耗费的工夫。

数据库版本：MySQL 8.0.23

t1 表，建表语句以及筹备初始数据 1000 行

create  database if not exists test;
use test
## 建表
create table t1(c1 int primary key,c2 int);
## 创立存储过程用于生成初始数据
DROP PROCEDURE IF EXISTS insdata;
DELIMITER $$
CREATE PROCEDURE insdata(IN beg INT, IN end INT) BEGIN
 WHILE beg <= end
 DO
 INSERT INTO test.t1 values (beg, end);
SET beg = beg+1;
END WHILE;
END $$
DELIMITER ;
## 插入初始数据 1000 行
call insdata(1,1000)

更新语句

update  t1 set c2=10 where c1 <=1000;

执行后果

mysql> update  t1 set c2=10 where c1 <=1000;
Query OK, 1000 rows affected (0.02 sec)
Rows matched: 1000  Changed: 1000  Warnings: 0

以下脚本用于生成 1000 行 update 语句，更新 c2 的值等于 1000 以内的随机数

#!/bin/bash

for i in {1..1000}
do
        echo "update t1 set c2=$((RANDOM%1000+1)) where c1=$i;" >> update.sql
done

生成 sql 语句如下

update t1 set c2=292 where c1=1;
update t1 set c2=475 where c1=2;
update t1 set c2=470 where c1=3;
update t1 set c2=68 where c1=4;
update t1 set c2=819 where c1=5;

... ....
update t1 set c2=970  where c1=1000;

因为 source /ssd/tmp/tmp/1000/update.sql; 执行后果如下，执行工夫不易统计：

Query OK, 1 row affected (0.00 sec)
Rows matched: 1  Changed: 1  Warnings: 0

Query OK, 1 row affected (0.01 sec)
Rows matched: 1  Changed: 1  Warnings: 0

Query OK, 1 row affected (0.00 sec)
Rows matched: 1  Changed: 1  Warnings: 0

Query OK, 1 row affected (0.00 sec)
Rows matched: 1  Changed: 1  Warnings: 0

Query OK, 1 row affected (0.01 sec)
Rows matched: 1  Changed: 1  Warnings: 0

Query OK, 1 row affected (0.00 sec)
Rows matched: 1  Changed: 1  Warnings: 0

Query OK, 1 row affected (0.01 sec)
Rows matched: 1  Changed: 1  Warnings: 0

所以利用 Linux 工夫戳进行统计：

#!/bin/bash

start_time=`date +%s%3N`
/ssd/tmp/mysql/bin/mysql -h127.0.0.1 -uroot -P3316 -pabc123 -e "use test;source /ssd/tmp/tmp/1000/update.sql;"
end_time=`date +%s%3N`
echo "执行工夫为："$(($end_time-$start_time))"ms"

执行后果：

[root@computer-42 test]# bash update.sh
mysql: [Warning] Using a password on the command line interface can be insecure.
执行工夫为：4246ms

执行所用工夫为：4246ms=4.246 sec

后果比拟

总结

由上述例子咱们能够看到，同样是更新 1000 行数据。单语句批量更新与多语句批量更新的执行效率差距很大。

而产生这种微小差别的起因，除了 1000 行 sql 语句自身的网络与语句解析开销外，影响性能的中央次要是以下几个方面：

1. 如果会话是 auto_commit=1，每次执行 update 语句后都要执行 commit 操作。commit 操作消耗工夫较久，会产生两次磁盘同步（写 binlog 和写 redo 日志）。在进行比对测试时，尽量将多个语句放到一个事务内，保障只提交一次事务。

2. 向后端发送多语句时，后端每解决一个语句均会向 client 返回一个 response 包，进行一次交互。如果多语句应用一个事务的话，网络 io 交互应该是影响性能的次要方面。之前在性能测试时发现网卡驱动占用 cpu 很高。

咱们的指标是心愿在更新 1000 行时，第二种场景的耗时可能缩小到一秒以内。

接下来咱们来摸索对更新表中多行为不同值时，如何进步它的执行效率。

从执行的 update 语句自身来说，两种场景所用的表构造都进行了最大水平的简化，update 语句也非常简略，且 where 条件为主键，实践上曾经没有优化的空间。

如果从其余方面来思考，根据上述起因剖析会有这样三个优化思路：

1. 缩小执行语句的解析工夫来进步执行效率
2. 缩小 commit 操作对性能的影响，尽量将多个语句放到一个事务内，保障只提交一次事务。
3. 将多条语句合并成一条来进步执行效率

以下脚本用于生成 prepare 执行语句

#!/bin/bash

echo "prepare pr1 from'update test.t1 set c2=? where c1=?';" > prepare.sql
for i in {1..1000}
do
echo "set @a=$((RANDOM%1000+1)),@b=$i;" >>prepare.sql
echo "execute pr1 using @a,@b;" >> prepare.sql
done
echo "deallocate prepare pr1;" >> prepare.sql

生成语句如下

prepare pr1 from 'update test.t1 set c2=? where c1=?';
set @a=276,@b=1;
execute pr1 using @a,@b;
set @a=341,@b=2;
execute pr1 using @a,@b;
set @a=803,@b=3;
execute pr1 using @a,@b;
... ...
set @a=582,@b=1000;
execute pr1 using @a,@b;
deallocate prepare pr1;

执行语句

#!/bin/bash

start_time=`date +%s%3N`
/ssd/tmp/mysql/bin/mysql -h127.0.0.1 -uroot -P3316 -pabc123 -e "use test;source /ssd/tmp/tmp/test/prepare.sql;"
end_time=`date +%s%3N`
echo "执行工夫为："$(($end_time-$start_time))"ms"

执行后果：

[root@computer-42 test]# bash prepare_update_id.sh
mysql: [Warning] Using a password on the command line interface can be insecure.
执行工夫为：4518ms

与优化前相比

很遗憾，执行总耗时反而减少了。

这里笔者有一点揣测是因为本来一条 update 语句，被拆分成了两条语句：

set @a=276,@b=1;
execute pr1 using @a,@b;

这样在 MySQL 客户端和 MySQL 过程之间的通信次数减少了，所以减少了总耗时。

因为 prepare 预处理语句执行时只能应用用户变量传递，以下执行语句会报错

mysql> execute pr1 using 210,5;
ERROR 1064 (42000): You have an error in your SQL syntax;
check the manual that corresponds to your MySQL server version
for the right syntax to use near '210,5' at line 1

所以无奈在语法方面将两条语句从新合并，笔者便应用了以下另外一种执行形式

执行语句

#!/bin/bash

start_time=`date +%s%3N`
/ssd/tmp/mysql/bin/mysql -h127.0.0.1 -uroot -P3316 -pabc123  <<EOF
use test;
DROP PROCEDURE IF EXISTS pre_update;
DELIMITER $$
CREATE PROCEDURE pre_update(IN beg INT, IN end INT) BEGIN
 prepare pr1 from 'update test.t1 set c2=? where c1=?';
 WHILE beg <= end
 DO
 set  @a=beg+1,@b=beg;
 execute pr1 using @a,@b;
 SET beg = beg+1;
END WHILE;
deallocate prepare pr1;
END $$
DELIMITER ;
call pre_update(1,1000);
EOF
end_time=`date +%s%3N`
echo "执行工夫为："$(($end_time-$start_time))"ms"

执行后果：

[root@computer-42 test]# bash prepare_update_id.sh
mysql: [Warning] Using a password on the command line interface can be insecure.
执行工夫为：3862ms

与优化前相比：

这样的优化幅度合乎 prepare 语句的实践预期，但仍旧不够现实。

以下脚本用于生成 1000 行 update 语句在一个事务内，更新 c2 的值等于 1000 以内的随机数

#!/bin/bash
echo "begin;" > update.sql
for i in {1..1000}
do
        echo "update t1 set c2=$((RANDOM%1000+1)) where c1=$i;" >> update.sql
done
echo "commit;" >> update.sql

生成 sql 语句如下

begin;
update t1 set c2=279 where c1=1;
update t1 set c2=425 where c1=2;
update t1 set c2=72 where c1=3;
update t1 set c2=599 where c1=4;
update t1 set c2=161 where c1=5;

... ....
update t1 set c2=775  where c1=1000;
commit;

执行工夫统计的办法，同上

[root@computer-42 test]# bash update.sh 
mysql: [Warning] Using a password on the command line interface can be insecure.
执行工夫为：194ms

执行工夫为 194ms=0.194sec

与优化前相比：

能够看出屡次 commit 操作对性能的影响还是很大的。

合并多条 update 语句
在这里咱们引入一种并不罕用的 MySQL 语法：

1）优化前：

update 多行执行语句相似“update xxx; update xxx;update xxx;… …”

2）优化后：

改成先把要更新的语句拼成一个视图（后果集表)，而后用后果集表和源表进行关联更新。这种更新形式有个隐式限度“按主键或惟一索引关联更新”。

UPDATE t1 m, (
    SELECT 1 AS c1, 2 AS c2
    UNION ALL
    SELECT 2, 2
    UNION ALL
    SELECT 3, 3
    ... ...
    UNION ALL
    SELECT n, 2
  ) r
SET m.c1 = r.c1, m.c2 = r.c2
WHERE m.c1 = r.c1;

3）具体的例子：

### 建表
create table t1(c1 int primary key,c2 int);
### 插入 5 行数据
insert into t1 values(1,1),(2,1),(3,1),(4,1),(5,1);
select  * from t1;
### 更新 c2 为 c1+1
UPDATE t1 m, (
  SELECT 1 AS c1, 2 AS c2
  UNION ALL
  SELECT 2, 3
  UNION ALL
  SELECT 3, 4
  UNION ALL
  SELECT 4, 5
  UNION ALL
  SELECT 5, 6
 ) r
SET m.c1 = r.c1, m.c2 = r.c2
WHERE m.c1 = r.c1;
### 查问后果
select * from t1;

执行后果：

  mysql> create table t1(c1 int primary key,c2 int);
  Query OK, 0 rows affected (0.03 sec)

  mysql> insert into t1 values(1,1),(2,1),(3,1),(4,1),(5,1);
  Query OK, 5 rows affected (0.00 sec)
  Records: 5  Duplicates: 0  Warnings: 0

  mysql> select * from t1;
  +----+------+
  | c1 | c2   |
  +----+------+
  |  1 |    1 |
  |  2 |    1 |
  |  3 |    1 |
  |  4 |    1 |
  |  5 |    1 |
  +----+------+
  5 rows in set (0.00 sec)

  mysql> update  t1 m,(select 1 as c1,2 as c2 union all select 2,3 union all select 3,4 union all select 4,5 union all select 5,6) r set m.c1=r.c1,m.c2=r.c2  where m.c1=r.c1;
Query OK, 5 rows affected (0.01 sec)
  Rows matched: 5  Changed: 5  Warnings: 0

  mysql> select * from t1;
  +----+------+
  | c1 | c2   |
+----+------+
  |  1 |    2 |
  |  2 |    3 |
  |  3 |    4 |
  |  4 |    5 |
  |  5 |    6 |
  +----+------+
  5 rows in set (0.00 sec)

4）更进一步的证实

在这里笔者抉择通过观察语句执行生成的 binlog，来证实优化形式的正确性。

首先是未经优化的语句:

begin;
update t1 set c2=2 where c1=1;
update t1 set c2=3 where c1=2;
update t1 set c2=4 where c1=3;
update t1 set c2=5 where c1=4;
update t1 set c2=6 where c1=5;
commit;

......
### UPDATE `test`.`t1`
### WHERE
###   @1=1 /* INT meta=0 nullable=0 is_null=0 */
###   @2=1 /* INT meta=0 nullable=1 is_null=0 */
### SET
###   @1=1 /* INT meta=0 nullable=0 is_null=0 */
###   @2=2 /* INT meta=0 nullable=1 is_null=0 */
......
### UPDATE `test`.`t1`
### WHERE
###   @1=2 /* INT meta=0 nullable=0 is_null=0 */
###   @2=1 /* INT meta=0 nullable=1 is_null=0 */
### SET
###   @1=2 /* INT meta=0 nullable=0 is_null=0 */
###   @2=3 /* INT meta=0 nullable=1 is_null=0 */
......
### UPDATE `test`.`t1`
### WHERE
###   @1=3 /* INT meta=0 nullable=0 is_null=0 */
###   @2=1 /* INT meta=0 nullable=1 is_null=0 */
### SET
###   @1=3 /* INT meta=0 nullable=0 is_null=0 */
###   @2=4 /* INT meta=0 nullable=1 is_null=0 */
......
### UPDATE `test`.`t1`
### WHERE
###   @1=4 /* INT meta=0 nullable=0 is_null=0 */
###   @2=1 /* INT meta=0 nullable=1 is_null=0 */
### SET
###   @1=4 /* INT meta=0 nullable=0 is_null=0 */
###   @2=5 /* INT meta=0 nullable=1 is_null=0 */
......
### UPDATE `test`.`t1`
### WHERE
###   @1=5 /* INT meta=0 nullable=0 is_null=0 */
###   @2=1 /* INT meta=0 nullable=1 is_null=0 */
### SET
###   @1=5 /* INT meta=0 nullable=0 is_null=0 */
###   @2=6 /* INT meta=0 nullable=1 is_null=0 */
......

而后是优化后的语句：

UPDATE t1 m, (
  SELECT 1 AS c1, 2 AS c2
  UNION ALL
  SELECT 2, 3
  UNION ALL
  SELECT 3, 4
  UNION ALL
  SELECT 4, 5
  UNION ALL
  SELECT 5, 6
 ) r
SET m.c1 = r.c1, m.c2 = r.c2
WHERE m.c1 = r.c1;

### UPDATE `test`.`t1`
### WHERE
###   @1=1 /* INT meta=0 nullable=0 is_null=0 */
###   @2=1 /* INT meta=0 nullable=1 is_null=0 */
### SET
###   @1=1 /* INT meta=0 nullable=0 is_null=0 */
###   @2=2 /* INT meta=0 nullable=1 is_null=0 */
### UPDATE `test`.`t1`
### WHERE
###   @1=2 /* INT meta=0 nullable=0 is_null=0 */
###   @2=1 /* INT meta=0 nullable=1 is_null=0 */
### SET
###   @1=2 /* INT meta=0 nullable=0 is_null=0 */
###   @2=3 /* INT meta=0 nullable=1 is_null=0 */
### UPDATE `test`.`t1`
### WHERE
###   @1=3 /* INT meta=0 nullable=0 is_null=0 */
###   @2=1 /* INT meta=0 nullable=1 is_null=0 */
### SET
###   @1=3 /* INT meta=0 nullable=0 is_null=0 */
###   @2=4 /* INT meta=0 nullable=1 is_null=0 */
### UPDATE `test`.`t1`
### WHERE
###   @1=4 /* INT meta=0 nullable=0 is_null=0 */
###   @2=1 /* INT meta=0 nullable=1 is_null=0 */
### SET
###   @1=4 /* INT meta=0 nullable=0 is_null=0 */
###   @2=5 /* INT meta=0 nullable=1 is_null=0 */
### UPDATE `test`.`t1`
### WHERE
###   @1=5 /* INT meta=0 nullable=0 is_null=0 */
###   @2=1 /* INT meta=0 nullable=1 is_null=0 */
### SET
###   @1=5 /* INT meta=0 nullable=0 is_null=0 */
###   @2=6 /* INT meta=0 nullable=1 is_null=0 */

能够看到，优化前后 binlog 中记录的 SQL 语句是统一的。这也阐明了咱们优化后语句与原执行语句是等效的。

5）从语法角度的剖析

UPDATE t1 m, -- 被更新的 t1 表设置别名为 m
(
  SELECT 1 AS c1, 2 AS c2
  UNION ALL
  SELECT 2, 3
  UNION ALL
  SELECT 3, 4
  UNION ALL
  SELECT 4, 5
  UNION ALL
  SELECT 5, 6
) r -- 通过子查问构建的长期表 r
SET m.c1 = r.c1, m.c2 = r.c2
WHERE m.c1 = r.c1

将子查问长期表 r 独自拿进去，咱们看一下执行后果：

mysql> select 1 as c1,2 as c2 union all select 2,3 union all select 3,4 union all select 4,5 union all select 5,6;
+----+----+
| c1 | c2 |
+----+----+
|  1 |  2 |
|  2 |  3 |
|  3 |  4 |
|  4 |  5 |
|  5 |  6 |
+----+----+
5 rows in set (0.00 sec)

能够看到，这就是咱们想要更新的那局部数据，在更新之后的样子。通过 t1 表与 r 表进行 join update，就能够将 t1 表中相应的那局部数据，更新成咱们想要的样子，实现了应用一条语句实现多行更新的操作。

6）看一下执行打算

以下为 explain 执行打算，应用了嵌套循环连贯，外循环表 t1 as m 依据条件 m.c1=r.c1 过滤出 5 条数据，每更新一行数据须要扫描一次内循环表 r，共循环 5 次：

如果光看执行打算，仿佛这条语句的执行效率不是很高，所以咱们接下来真正执行一下。

7）实际测验

以下脚本用于生成优化后 update 语句，更新 c2 的值等于 1000 以内的随机数

#!/bin/bash

echo "update t1 as m,(select 1 as c1,2 as c2" >> update-union-all.sql

for j in {2..1000}
do
        echo "union all select $j,$((RANDOM%1000+1))" >> update-union-all.sql

done

echo ") as r set m.c2=r.c2 where m.c1=r.c1" >> update-union-all.sql

生成 SQL 语句如下

update t1 as m,(select 1 as c1,2 as c2
union all select 2,644
union all select 3,322
union all select 4,660
union all select 5,857
union all select 6,752
... ...
union all select 999,225
union all select 1000,77
) as r set m.c2=r.c2 where m.c1=r.c1

执行语句

#!/bin/bash

start_time=`date +%s%3N`
/ssd/tmp/mysql/bin/mysql -h127.0.0.1 -uroot -P3316 -pabc123 -e \
"use test;source /ssd/tmp/tmp/1000/update-union-all.sql;"
end_time=`date +%s%3N`
echo "执行工夫为："$(($end_time-$start_time))"ms"

执行后果：

[root@computer-42 test]# bash update-union-all.sh
mysql: [Warning] Using a password on the command line interface can be insecure.
执行工夫为：58ms

与优化前相比:

屡次测试比照后果如下：

依据以上实践剖析与理论验证，咱们找到了一种对批量更新场景的优化形式。

Enjoy GreatSQL :)

技术分享 | MGR 最佳实际(MGR Best Practice)
https://mp.weixin.qq.com/s/66…

技术分享 | 万里数据库 MGR Bug 修复之路
https://mp.weixin.qq.com/s/Ia…

Macos 零碎编译 percona 及局部函数在 Macos 零碎上运算差别
https://mp.weixin.qq.com/s/jA…

技术分享 | 利用 systemd 治理 MySQL 单机多实例
https://mp.weixin.qq.com/s/iJ…

产品 | GreatSQL，打造更好的 MGR 生态
https://mp.weixin.qq.com/s/By…

产品 | GreatSQL MGR 优化参考
https://mp.weixin.qq.com/s/5m…

GreatSQL 是由万里数据库保护的 MySQL 分支，专一于晋升 MGR 可靠性及性能，反对 InnoDB 并行查问个性，是实用于金融级利用的 MySQL 分支版本。

Gitee:
https://gitee.com/GreatSQL/Gr…

GitHub:
https://github.com/GreatSQL/G…

微信 &QQ 群：

可搜寻增加 GreatSQL 社区助手微信好友，发送验证信息“加群”退出 GreatSQL/MGR 交换微信群

QQ 群：533341697
微信小助手：wanlidbc

本文由博客一文多发平台 OpenWrite 公布！

关于mysql:技术分享-在MySQL对于批量更新操作的一种优化方式

引言

数据筹备

1. 单语句批量更新

2. 多语句批量更新

对第二种场景的优化

简略剖析

计划一：应用 prepare 语句，减小解析工夫

计划二：多个 update 语句放到一个事务内执行，最终 commit 一次

计划三：应用非凡 SQL 语法，将多个 update 语句合并

总结

文章举荐：

对于 GreatSQL