1. Background
As a DBA you will inevitably run into performance problems, so how should we go about diagnosing them? For example, under a high-concurrency workload the business responds slowly and requests take a long time to complete; where do we start? This article walks through how to analyze and pinpoint the problem when IO is high.
2. Reproducing the environment
- Environment: the test uses a server with 128 CPU cores, 512 GB of RAM and a 4 TB SSD; the MySQL version is 8.0.27.
- Scenario: use sysbench to create 5 tables with 200 million rows each, then run a SQL statement that produces a Cartesian-product-style join to generate IO and simulate business pressure.
First, load the test data with sysbench:
shell> sysbench --test=/usr/local/share/sysbench/oltp_insert.lua --mysql-host=XXX --mysql-port=3306 --mysql-user=pcms --mysql-password=abc123 --mysql-db=sysbench --percentile=99 --table-size=2000000000 --tables=5 --threads=1000 prepare
Next, use sysbench to simulate a high-concurrency workload:
shell> sysbench --test=/usr/local/share/sysbench/oltp_write_only.lua --mysql-host=xxx --mysql-port=3306 --mysql-user=pcms --mysql-password=abc123 --mysql-db=sysbench --percentile=99 --table-size=2000000000 --tables=5 --threads=1000 --max-time=60000 --report-interval=1 --max-requests=0 --mysql-ignore-errors=all run
Then execute the Cartesian-product SQL statement:
mysql> select SQL_NO_CACHE b.id,a.k from sbtest_a a left join sbtest_b b on a.id=b.id group by a.k order by b.c desc;
3. System-level troubleshooting
3.1 Check the current server status
shell> top
top - 19:49:05 up 10 days, 8:16, 2 users, load average: 72.56, 40.21, 17.08
Tasks: 1288 total, 1 running, 586 sleeping, 0 stopped, 0 zombie
%Cpu(s): 19.7 us, 4.2 sy, 0.0 ni, 75.9 id, 1.0 wa, 0.0 hi, 0.2 si, 0.0 st
KiB Mem : 53542118+total, 23667507+free, 22735366+used, 71392448 buff/cache
KiB Swap: 0 total, 0 free, 0 used. 23128006+avail Mem
From the above: the one-minute load average is 72.56 and trending upward (the 1-minute value is higher than the 5- and 15-minute values), and there is some IO wait pressure.
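As a quick cross-check (not part of the original capture), vmstat separates CPU pressure from IO pressure: the b column counts processes blocked in uninterruptible sleep (usually waiting on IO) and wa is the IO-wait percentage.
shell> vmstat 1 5    # five one-second samples; watch the r/b columns and the wa column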
3.2 Check the IO status of each disk device
shell> iostat -m -x 1
Linux 4.14.0-115.el7a.0.1.aarch64 (mysql-4) 01/08/2022 _aarch64_ (128 CPU)
Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
sda 0.00 0.00 14213.00 27430.00 222.08 465.15 33.80 5.39 0.13 0.14 0.12 0.02 86.00
sdb 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
dm-0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
dm-1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
dm-2 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
From the above: the server has several physical disks, and the IO pressure is concentrated on sda.
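If you need to confirm how the idle device-mapper volumes (dm-0/dm-1/dm-2) relate to the physical disks, lsblk shows the block-device hierarchy; this is an optional cross-check rather than part of the original investigation.
shell> lsblk /dev/sda /dev/sdb    # list partitions and LVM volumes sitting on top of each physical disk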
3.3 Check the current IO reads and writes on the sda disk
shell> iostat -d /dev/sda -m -x 1
Linux 4.14.0-115.el7a.0.1.aarch64 (mysql-4) 01/08/2022 _aarch64_ (128 CPU)
Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
sda 0.00 0.17 7.86 81.23 0.29 3.96 97.88 0.23 2.53 0.22 2.76 0.04 0.33
Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
sda 0.00 0.00 21109.00 42839.00 329.81 710.90 33.33 19.47 0.30 0.16 0.37 0.02 96.00
Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
sda 0.00 0.00 19287.00 41404.00 301.36 692.29 33.53 15.73 0.26 0.18 0.30 0.02 93.00
Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
sda 0.00 0.00 22135.00 43044.00 345.86 1165.18 47.48 100.87 1.55 0.20 2.24 0.01 97.00
From the above: sda is under heavy pressure, and the write rate is far higher than the read rate, which shows that a large amount of data is being written.
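If iotop happens to be installed, it gives a live per-process (or per-thread) view of disk IO and can shortcut the next two steps; this is an alternative tool, not the one used in the original investigation.
shell> iotop -oP    # -o: only show tasks actually doing IO, -P: aggregate per process instead of per thread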
3.4 Find which application is generating the high IO on sda
shell> pidstat -d 1
Linux 4.14.0-115.el7a.0.1.aarch64 (mysql-4) 01/08/2022 _aarch64_ (128 CPU)
08:01:43 PM UID PID kB_rd/s kB_wr/s kB_ccwr/s Command
08:01:44 PM 1000 73739 62018.35 171346.79 0.00 mysqld
08:01:44 PM UID PID kB_rd/s kB_wr/s kB_ccwr/s Command
08:01:45 PM 1000 73739 145328.00 435216.00 0.00 mysqld
08:01:45 PM UID PID kB_rd/s kB_wr/s kB_ccwr/s Command
08:01:46 PM 1000 73739 141488.00 433232.00 0.00 mysqld
From the above: the application with the highest IO usage is mysqld, and its PID is 73739.
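The per-process IO counters in procfs confirm the same thing without any extra tooling; 73739 is the PID found above.
shell> cat /proc/73739/io    # read_bytes / write_bytes are cumulative bytes that actually reached the block layer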
3.5 Find which thread inside the application is generating the high IO
shell> pidstat -dt -p 73739 1    # let it run for two or three seconds, then stop it
Average: 1000 - 73823 0.00 233133.98 0.00 |__mysqld
Average: 1000 - 74674 0.00 174291.26 0.00 |__mysqld
11:56:18 PM 1000 - 74770 124928.00 74688.00 0.00 |__mysqld
11:56:17 PM 1000 - 74770 124603.77 73358.49 0.00 |__mysqld
Average: 1000 - 74770 124761.17 74003.88 0.00 |__mysqld
From the above: thread 74770 is the one generating the high IO.
3.6 What is this thread doing?
shell> perf trace -t 74770 -o /tmp/tmp_aa.pstrace
shell> cat /tmp/tmp_aa.pstrace
2850.656 (1.915 ms): futex(uaddr: 0x653ae9c4, op: WAIT|PRIVATE_FLAG, val: 1) = 0
2852.572 (0.001 ms): futex(uaddr: 0x653ae990, op: WAKE|PRIVATE_FLAG, val: 1) = 0
2852.601 (0.001 ms): clock_gettime(which_clock: MONOTONIC, tp: 0xfff7bd470f68) = 0
2852.690 (0.040 ms): write(fd: 159, buf: 0xd7a30020, count: 65536) = 65536
2852.796 (0.001 ms): clock_gettime(which_clock: MONOTONIC, tp: 0xfff7bd470f68) = 0
2852.798 (0.001 ms): clock_gettime(which_clock: MONOTONIC, tp: 0xfff7bd470f58) = 0
2852.939 (0.001 ms): clock_gettime(which_clock: MONOTONIC, tp: 0xfff7bd470f38) = 0
2852.950 (0.001 ms): clock_gettime(which_clock: MONOTONIC, tp: 0xfff7bd470f68) = 0
2852.977 (0.001 ms): clock_gettime(which_clock: MONOTONIC, tp: 0xfff7bd470f68) = 0
2853.029 (0.035 ms): write(fd: 64, buf: 0xcd51e020, count: 65536) = 65536
2853.164 (0.001 ms): clock_gettime(which_clock: MONOTONIC, tp: 0xfff7bd470f68) = 0
2853.167 (0.001 ms): clock_gettime(which_clock: MONOTONIC, tp: 0xfff7bd470f58) = 0
2853.302 (0.001 ms): clock_gettime(which_clock: MONOTONIC, tp: 0xfff7bd470f38) = 0
From the above: the thread is writing to several files; fd is the file descriptor, and the descriptors involved include 64 and 159.
3.7 Check what these file descriptors point to
shell> lsof -p 73739|grep 159u
mysqld 73739 mysql 159u REG 8,0 212143246 7046482357 /mysql/mysqldata/16320fff-5fd5-4c47-889a-a9e1a8591d0d/tmp/#7046482357 (deleted)
shell> lsof -p 73739|grep 64u
mysqld 73739 mysql 64u REG 8,0 211872724 6979323031 /mysql/mysqldata/16320fff-5fd5-4c47-889a-a9e1a8591d0d/tmp/#6979323031 (deleted)
From the above: the thread is writing large amounts of data to temporary files.
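The same fd-to-file mapping can be read straight from procfs, which is handy when lsof is slow on a busy server; 159 is one of the descriptors seen in the perf trace above.
shell> ls -l /proc/73739/fd/159    # the symlink target shows the (deleted) temporary file path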
4. Analyzing the MySQL application
4.1 Check the current session list
mysql> select * from information_schema.processlist where command !='sleep';
| 9 | pcms | 172.16.76.12:57596 | sysbench | Query | 67 | executing | select SQL_NO_CACHE b.id,a.k from sbtest_a a left join sbtest_b b on a.id=b.id group by a.k order by b.c desc | 66477 | 0 | 0 |
From the above: this SQL statement has already been running for 67 seconds, and it uses both GROUP BY and ORDER BY, which is bound to generate IO.
4.2 Look up the session by OS thread id
mysql> select * from performance_schema.threads where thread_os_id=74770\G
*************************** 1. row ***************************
THREAD_ID: 95
NAME: thread/sql/one_connection
TYPE: FOREGROUND
PROCESSLIST_ID: 9
PROCESSLIST_USER: pcms
PROCESSLIST_HOST: 172.16.76.12
PROCESSLIST_DB: sysbench
PROCESSLIST_COMMAND: Query
PROCESSLIST_TIME: 91
PROCESSLIST_STATE: NULL
PROCESSLIST_INFO: select SQL_NO_CACHE b.id,a.k from sbtest_a a left join sbtest_b b on a.id=b.id group by a.k order by b.c desc
PARENT_THREAD_ID: 1
ROLE: NULL
INSTRUMENTED: YES
HISTORY: YES
CONNECTION_TYPE: TCP/IP
THREAD_OS_ID: 74770
RESOURCE_GROUP: USR_default
1 row in set (0.00 sec)
From the above: the threads table confirms it: the frequent temporary-table creation on this thread is caused by this SQL statement.
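The lookup also works in the opposite direction: given a session's PROCESSLIST_ID, performance_schema.threads returns the OS thread id that pidstat and perf report. A minimal example using session id 9 found above:
mysql> select thread_os_id, processlist_id from performance_schema.threads where processlist_id=9;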
4.3 Check the execution plan of this SQL statement for further verification
mysql> explain select SQL_NO_CACHE b.id,a.k from sbtest_a a left join sbtest_b b on a.id=b.id group by a.k order by b.c desc\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: a
partitions: NULL
type: ALL
possible_keys: NULL
key: NULL
key_len: NULL
ref: NULL
rows: 1
filtered: 100.00
Extra: Using temporary; Using filesort
*************************** 2. row ***************************
id: 1
select_type: SIMPLE
table: b
partitions: NULL
type: eq_ref
possible_keys: PRIMARY
key: PRIMARY
key_len: 4
ref: sysbench.a.id
rows: 1
filtered: 100.00
Extra: NULL
2 rows in set, 2 warnings (0.00 sec)
From the above: the execution plan for this SQL uses a temporary table and a filesort (Using temporary; Using filesort), which is consistent with the temporary-file writes we saw at the OS level.
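On MySQL 8.0 (8.0.27 here), EXPLAIN FORMAT=TREE shows the executor plan and makes the hash join visible, which lines up with the HashJoinIterator frames captured later with pstack. Note that EXPLAIN ANALYZE would give real row counts but actually executes the query, so avoid it on a statement that is already causing IO problems.
mysql> explain format=tree select SQL_NO_CACHE b.id,a.k from sbtest_a a left join sbtest_b b on a.id=b.id group by a.k order by b.c desc\G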
4.4 Check global status for further confirmation
mysql> show global status like '%tmp%';
+-------------------------+-------+
| Variable_name | Value |
+-------------------------+-------+
| Created_tmp_disk_tables | 3 |
| Created_tmp_files | 22 |
| Created_tmp_tables | 8 |
+-------------------------+-------+
Run this a few more times and you will see Created_tmp_files and Created_tmp_disk_tables keep growing, which confirms that temporary files and on-disk temporary tables are being created in large numbers and matches the behavior of the thread we traced.
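These global counters cover the whole instance. To attribute the temporary tables to specific statements, the sys schema view statements_with_temp_tables (built on performance_schema statement digests and shipped with MySQL 8.0) can be queried, for example:
mysql> select query, exec_count, memory_tmp_tables, disk_tmp_tables from sys.statements_with_temp_tables order by disk_tmp_tables desc limit 5;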
5. Fault resolution
Through the investigation above we have established that the sda disk has the highest IO utilization and that the mysqld process accounts for most of it. One of its threads is frequently creating temporary tables and temporary files, and by checking the session list and the thread view inside MySQL we can trace this back to a single slow SQL statement. The execution plan of that statement likewise shows that it creates temporary tables and temporary files, matching what the earlier steps predicted.
At this point the slow SQL needs to be optimized. The optimization work itself is a DBA task and is skipped here; once the slow SQL has been optimized, keep observing the IO to confirm that it drops. One possible direction is sketched below.
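Purely as an illustration (not a recommendation from the original article): indexing the GROUP BY column is one direction to try, and because the statement also sorts on b.c the filesort may well remain, so the plan has to be re-checked afterwards. The index name idx_k below is made up for the example.
mysql> alter table sbtest_a add index idx_k (k);
mysql> explain select SQL_NO_CACHE b.id,a.k from sbtest_a a left join sbtest_b b on a.id=b.id group by a.k order by b.c desc\G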
6. Code analysis
We can run pstack against the thread id to capture its current stack. Keep in mind that pstack invokes gdb to do the actual debugging.
shell> pstack 74770 >/tmp/74770.pstack
Thread 1 (process 74770):
#0 ha_innobase::general_fetch (this=0xea654228, buf=0xea662028 "\212t\317\030\002", direction=1, match_mode=0) at /builds/naiwei.fang/percona-server/storage/innobase/handler/ha_innodb.cc:11159
#1 0x0000000000d9913c in handler::ha_rnd_next (this=0xea654228, buf=0xea662028 "\212t\317\030\002") at /builds/naiwei.fang/percona-server/sql/handler.cc:3173
#2 0x0000000000f77db0 in TableScanIterator::Read (this=0xd256d5e8) at /builds/naiwei.fang/percona-server/sql/row_iterator.h:208
#3 0x000000000124c714 in WriteRowsToChunks (xxhash_seed=899339, write_to_build_chunk=true, write_rows_with_null_in_join_key=false, join_key_buffer=0xd01fdb98, tables_to_get_rowid_for=0, chunks=0xd01fdb58, join_conditions=..., tables=..., iterator=0xd256d5e8, thd=0xdb888000) at /builds/naiwei.fang/percona-server/sql/hash_join_iterator.cc:282
#4 HashJoinIterator::BuildHashTable (this=this@entry=0xd01fd028) at /builds/naiwei.fang/percona-server/sql/hash_join_iterator.cc:495
#5 0x000000000124c8ac in Init (this=0xd01fd028) at /builds/naiwei.fang/percona-server/sql/hash_join_iterator.cc:203
#6 HashJoinIterator::Init (this=0xd01fd028) at /builds/naiwei.fang/percona-server/sql/hash_join_iterator.cc:145
#7 0x00000000010eca14 in Query_expression::ExecuteIteratorQuery (this=0xdec3a8b8, thd=thd@entry=0xdb888000) at /builds/naiwei.fang/percona-server/sql/sql_union.cc:1224
#8 0x00000000010ecccc in Query_expression::execute (this=this@entry=0xdec3a8b8, thd=thd@entry=0xdb888000) at /builds/naiwei.fang/percona-server/sql/sql_union.cc:1284
#9 0x0000000001083db0 in Sql_cmd_dml::execute_inner (this=0xd256bcb0, thd=0xdb888000) at /builds/naiwei.fang/percona-server/sql/sql_select.cc:791
#10 0x000000000108cac8 in Sql_cmd_dml::execute (this=0xd256bcb0, thd=0xdb888000) at /builds/naiwei.fang/percona-server/sql/sql_select.cc:575
#11 0x00000000010384e8 in mysql_execute_command (thd=thd@entry=0xdb888000, first_level=first_level@entry=true) at /builds/naiwei.fang/percona-server/sql/sql_parse.cc:4677
#12 0x000000000103b314 in dispatch_sql_command (thd=thd@entry=0xdb888000, parser_state=parser_state@entry=0xfff7bd4735b0, update_userstat=update_userstat@entry=false) at /builds/naiwei.fang/percona-server/sql/sql_parse.cc:5273
#13 0x000000000103ccf0 in dispatch_command (thd=thd@entry=0xdb888000, com_data=0xffffb467c4d0, com_data@entry=0xfff7bd474640, command=COM_QUERY) at /builds/naiwei.fang/percona-server/sql/sql_parse.cc:1938
#14 0x000000000103da40 in do_command (thd=thd@entry=0xdb888000) at /builds/naiwei.fang/percona-server/sql/sql_parse.cc:1386
#15 0x0000000001152ca8 in handle_connection (arg=arg@entry=0xda53ab10) at /builds/naiwei.fang/percona-server/sql/conn_handler/connection_handler_per_thread.cc:307
#16 0x00000000022bc3ec in pfs_spawn_thread (arg=<optimized out>) at /builds/naiwei.fang/percona-server/storage/perfschema/pfs.cc:2899
#17 0x0000ffffb43c7c48 in start_thread () from /lib64/libpthread.so.0
#18 0x0000ffffb3c0f600 in thread_start () from /lib64/libc.so.6
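The interesting frames are #3 and #4: HashJoinIterator::BuildHashTable is calling WriteRowsToChunks with write_to_build_chunk=true, meaning the hash join's build input does not fit in memory (bounded by join_buffer_size) and is being spilled to on-disk chunk files, which is exactly the temporary-file writing observed at the OS level. If pstack is not installed, roughly the same information can be captured by attaching gdb to the mysqld process and dumping all thread stacks (note that attaching briefly pauses the server):
shell> gdb -p 73739 -batch -ex "thread apply all bt" > /tmp/mysqld.bt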
Enjoy GreatSQL :)