共计 10615 个字符,预计需要花费 27 分钟才能阅读完成。
作者:孙桥
爱可生 DBA 团队成员,次要负责 MySQL 和 DMP 平台日常的维持工作,对数据库自动化运维存有浓厚兴趣。
本文起源:原创投稿
* 爱可生开源社区出品,原创内容未经受权不得随便应用,转载请分割小编并注明起源。
前言
对于刚接触 MySQL 数据库不久的小伙伴来说,或多或少会遇到一些辣手的小问题;比方在数据库磁盘无损坏的状况下,咱们常常会遇到数据库服务器的磁盘 IO 压力很大,但因为服务器上只有 MySQL 程序,咱们第一想法必定是某个连贯用户在连到数据库后,背着咱们在瞎搞着什么;然而间接通过 show processlist 命令,看到的后果不肯定精确。上面模仿一个简略场景,让咱们确切地定位问题所在。
场景模仿,制作 IO 压力
## 生成一张大表,用于测试 [root@tidb-master ~]# /usr/share/sysbench/oltp_write_only.lua --mysql-host=10.186.61.36 --mysql-user=sun --mysql-password='xx' --mysql-port=3306 --mysql-db=testdb --tables=1 --table_size=15000000 --auto_inc=off --report-interval=2 --db-ps-mode=disable --time=100000 --threads=1 prepare
sysbench 1.0.17 (using system LuaJIT 2.0.4)
Creating table 'sbtest1'...
Inserting 15000000 records into 'sbtest1'
Creating a secondary index on 'sbtest1'...## 对生成的表进行写操作
[root@tidb-master ~]# /usr/share/sysbench/oltp_write_only.lua --mysql-host=10.186.61.36 --mysql-user=sun --mysql-password='xx' --mysql-port=3306 --mysql-db=testdb --tables=1 --table_size=15000000 --auto_inc=off --report-interval=2 --db-ps-mode=disable --time=100000 --threads=1 run
sysbench 1.0.17 (using system LuaJIT 2.0.4)
Running the test with following options:
Number of threads: 1
Report intermediate results every 2 second(s)
Initializing random number generator from current time
Initializing worker threads...
Threads started!
[2s] thds: 1 tps: 24.47 qps: 148.33 (r/w/o: 0.00/98.89/49.44) lat (ms,95%): 86.00 err/s: 0.00 reconn/s: 0.00
[4s] thds: 1 tps: 53.02 qps: 318.62 (r/w/o: 0.00/212.58/106.04) lat (ms,95%): 48.34 err/s: 0.00 reconn/s: 0.00
[6s] thds: 1 tps: 60.50 qps: 363.50 (r/w/o: 0.00/242.50/121.00) lat (ms,95%): 43.39 err/s: 0.00 reconn/s: 0.00
[8s] thds: 1 tps: 65.50 qps: 391.02 (r/w/o: 0.00/260.01/131.01) lat (ms,95%): 39.65 err/s: 0.00 reconn/s: 0.00
[10s] thds: 1 tps: 55.50 qps: 334.51 (r/w/o: 0.00/223.51/111.00) lat (ms,95%): 44.98 err/s: 0.00 reconn/s: 0.00
[12s] thds: 1 tps: 57.00 qps: 340.99 (r/w/o: 0.00/227.00/114.00) lat (ms,95%): 38.25 err/s: 0.00 reconn/s: 0.00
问题剖析
1、通过 top 命令发现 io wait 比拟高,咦,What is going on?(伪装不晓得我在压测 (●’◡’●))
top - 17:05:59 up 179 days, 7:05, 2 users, load average: 0.84, 0.70, 0.93
Tasks: 181 total, 1 running, 180 sleeping, 0 stopped, 0 zombie
%Cpu0 : 1.3 us, 0.7 sy, 0.0 ni, 95.3 id, 2.7 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu1 : 6.9 us, 3.8 sy, 0.0 ni, 60.3 id, 29.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu2 : 2.4 us, 2.0 sy, 0.0 ni, 90.8 id, 4.7 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu3 : 1.7 us, 1.3 sy, 0.0 ni, 95.3 id, 1.7 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu4 : 1.4 us, 1.0 sy, 0.0 ni, 95.6 id, 2.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu5 : 3.1 us, 1.4 sy, 0.0 ni, 81.4 id, 14.2 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu6 : 0.7 us, 1.0 sy, 0.0 ni, 96.6 id, 1.7 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu7 : 1.0 us, 0.3 sy, 0.0 ni, 96.6 id, 2.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu8 : 0.3 us, 1.0 sy, 0.0 ni, 94.6 id, 4.1 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu9 : 2.4 us, 1.0 sy, 0.0 ni, 91.2 id, 5.4 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu10 : 14.7 us, 2.7 sy, 0.0 ni, 82.6 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu11 : 2.0 us, 1.0 sy, 0.0 ni, 96.0 id, 1.0 wa, 0.0 hi, 0.0 si, 0.0 st
KiB Mem : 12137024 total, 185984 free, 3987936 used, 7963104 buff/cache
KiB Swap: 0 total, 0 free, 0 used. 7150088 avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
8858 actiont+ 20 0 7249940 1.1g 11892 S 30.0 9.3 210:38.75 mysqld
26447 tidb 20 0 4000364 1.0g 4740 S 5.3 9.0 10018:23 TiFlashMain
22761 actiont+ 20 0 2526348 36188 9756 S 4.0 0.3 1197:43 ustats
29004 root 20 0 3784084 103676 2012 S 2.0 0.9 1802:29 beam.s
2、通过 iostat 命令进一步剖析,咱们看到磁盘在频繁的写操作,维持在 26M/ s 左右(%uitl 值比拟大,一般来说,值大于 75% 是表明磁盘压力比拟大了)
[root@tidb-tikv01 ~]# iostat -xm 1
Linux 3.10.0-862.14.4.el7.x86_64 (tidb-tikv01) 04/22/2022 _x86_64_ (12 CPU)
avg-cpu: %user %nice %system %iowait %steal %idle
1.21 0.00 0.44 0.18 0.10 98.07
Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
vda 0.01 0.18 3.53 22.99 0.08 0.25 25.84 0.19 7.16 5.64 7.40 0.71 1.88
scd0 0.00 0.00 0.00 0.00 0.00 0.00 6.00 0.00 0.50 0.50 0.00 0.50 0.00
avg-cpu: %user %nice %system %iowait %steal %idle
2.22 0.00 1.19 8.28 0.09 88.23
Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
vda 0.00 0.00 328.00 1800.00 5.12 26.24 30.17 3.18 1.49 4.68 0.91 0.41 88.30
scd0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
avg-cpu: %user %nice %system %iowait %steal %idle
1.63 0.00 1.03 6.68 0.09 90.57
Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
vda 0.00 0.00 352.00 1918.00 5.50 27.82 30.06 2.96 1.31 3.21 0.96 0.37 83.60
scd0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
avg-cpu: %user %nice %system %iowait %steal %idle
5.54 0.00 1.96 8.77 0.09 83.65
Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
vda 0.00 0.00 300.00 1682.00 4.69 25.68 31.38 3.09 1.55 3.68 1.17 0.45 88.60
scd0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
3、通过 iotop 命令能够看到与 MySQL 相干线程 ID 22233 的 IO 耗费为 52.59%,读为 2.56M/s,同时线程 ID 8879 的写为 27.31M/s(一般来说,IO 列值越大,阐明 IO 压力最有可能与其相干)
Total DISK READ : 5.29 M/s | Total DISK WRITE : 29.42 M/s
Actual DISK READ: 5.29 M/s | Actual DISK WRITE: 26.83 M/s
TID PRIO USER DISK READ DISK WRITE SWAPIN IO> COMMAND
22233 be/4 actionte 2.56 M/s 473.07 K/s 0.00 % 52.59 % mysqld --defaults-file=/opt/mysql/etc/3306/my.cnf --d~ --socket=/opt/mysql/data/3306/mysqld.sock --port=3306
8879 be/4 actionte 0.00 B/s 27.31 M/s 0.00 % 17.76 % mysqld --defaults-file=/opt/mysql/etc/3306/my.cnf --d~ --socket=/opt/mysql/data/3306/mysqld.sock --port=3306
8887 be/4 actionte 0.00 B/s 0.00 B/s 0.00 % 10.10 % mysqld --defaults-file=/opt/mysql/etc/3306/my.cnf --d~ --socket=/opt/mysql/data/3306/mysqld.sock --port=3306
8907 be/4 actionte 116.33 K/s 0.00 B/s 0.00 % 6.27 % mysqld --defaults-file=/opt/mysql/etc/3306/my.cnf --d~ --socket=/opt/mysql/data/3306/mysqld.sock --port=3306
8930 be/4 actionte 7.76 K/s 0.00 B/s 0.00 % 3.45 % mysqld --defaults-file=/opt/mysql/etc/3306/my.cnf --d~ --socket=/opt/mysql/data/3306/mysqld.sock --port=3306
8910 be/4 actionte 1217.57 K/s 0.00 B/s 0.00 % 3.33 % mysqld --defaults-file=/opt/mysql/etc/3306/my.cnf --d~ --socket=/opt/mysql/data/3306/mysqld.sock --port=3306
8911 be/4 actionte 372.25 K/s 0.00 B/s 0.00 % 1.53 % mysqld --defaults-file=/opt/mysql/etc/3306/my.cnf --d~ --socket=/opt/mysql/data/3306/mysqld.sock --port=3306
2946 be/4 actionte 0.00 B/s 1985.34 B/s 0.00 % 1.52 % ustats
8909 be/4 actionte 418.78 K/s 0.00 B/s 0.00 % 1.08 % mysqld --defaults-file=/opt/mysql/etc/3306/my.cnf --d~ --socket=/opt/mysql/data/3306/mysqld.sock --port=3306
4、将下面获取的零碎线程 ID 带入到 MySQL 中,咱们便能够查到连贯用户 sun 始终在频繁写操作,当初我很狐疑你,但我先保持沉默
[root@tidb-tikv01 ~]# /opt/mysql/base/8.0.21/bin/mysql -S /opt/mysql/data/3306/mysqld.sock -pxx
mysql: [Warning] Using a password on the command line interface can be insecure.
Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 15041
Server version: 8.0.21 MySQL Community Server - GPL
Copyright (c) 2000, 2020, Oracle and/or its affiliates. All rights reserved.
Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
mysql> select * from performance_schema.threads where THREAD_OS_ID=22233\G
*************************** 1. row ***************************
THREAD_ID: 14880
NAME: thread/sql/one_connection
TYPE: FOREGROUND
PROCESSLIST_ID: 14830 ### ID 为 148830 连贯在疯狂的进行写操作,show processlist 能够看到正在进行的连贯
PROCESSLIST_USER: sun
PROCESSLIST_HOST: 10.186.61.16
PROCESSLIST_DB: testdb
PROCESSLIST_COMMAND: Query
PROCESSLIST_TIME: 0
PROCESSLIST_STATE: updating
PROCESSLIST_INFO: UPDATE sbtest1 SET k=k+1 WHERE id=7899611
PARENT_THREAD_ID: NULL
ROLE: NULL
INSTRUMENTED: YES
HISTORY: YES
CONNECTION_TYPE: TCP/IP
THREAD_OS_ID: 22233 ### THREAD_OS_ID 为 linux 零碎中的线程 ID
RESOURCE_GROUP: USR_default
1 row in set (0.00 sec)
mysql> select * from performance_schema.threads where THREAD_OS_ID=8879\G
*************************** 1. row ***************************
THREAD_ID: 21
NAME: thread/innodb/page_flush_coordinator_thread ### MySQL 本身刷脏数据线程,因为用户的频繁写操作,导致该线程始终进行落盘操作
TYPE: BACKGROUND
PROCESSLIST_ID: NULL
PROCESSLIST_USER: NULL
PROCESSLIST_HOST: NULL
PROCESSLIST_DB: NULL
PROCESSLIST_COMMAND: NULL
PROCESSLIST_TIME: 264523
PROCESSLIST_STATE: NULL
PROCESSLIST_INFO: NULL
PARENT_THREAD_ID: NULL
ROLE: NULL
INSTRUMENTED: YES
HISTORY: YES
CONNECTION_TYPE: NULL
THREAD_OS_ID: 8879
RESOURCE_GROUP: SYS_default
1 row in set (0.00 sec)
mysql> show processlist;
+-------+-----------------+--------------------+--------+------------------+--------+---------------------------------------------------------------+------------------------------------------------------------------------------------------------------+
| Id | User | Host | db | Command | Time | State | Info |
+-------+-----------------+--------------------+--------+------------------+--------+---------------------------------------------------------------+------------------------------------------------------------------------------------------------------+
| 5 | event_scheduler | localhost | NULL | Daemon | 264544 | Waiting on empty queue | NULL |
| 8 | universe_op | 127.0.0.1:38362 | NULL | Sleep | 0 | | NULL |
| 9 | universe_op | 127.0.0.1:38374 | NULL | Sleep | 0 | | NULL |
| 10 | universe_op | 127.0.0.1:38376 | NULL | Sleep | 0 | | NULL |
| 11 | universe_op | 127.0.0.1:38380 | NULL | Sleep | 0 | | NULL |
| 12 | universe_op | 127.0.0.1:38378 | NULL | Sleep | 0 | | NULL |
| 13 | universe_op | 127.0.0.1:38384 | NULL | Sleep | 0 | | NULL |
| 14 | universe_op | 127.0.0.1:38382 | NULL | Sleep | 0 | | NULL |
| 15 | universe_op | 10.186.61.45:50322 | NULL | Binlog Dump GTID | 264542 | Master has sent all binlog to slave; waiting for more updates | NULL |
| 17 | universe_op | 127.0.0.1:38454 | NULL | Sleep | 14 | | NULL |
| 18 | universe_op | 127.0.0.1:38456 | NULL | Sleep | 14 | | NULL |
| 19 | universe_op | 127.0.0.1:38458 | NULL | Sleep | 14 | | NULL |
| 20 | universe_op | 127.0.0.1:38460 | NULL | Sleep | 14 | | NULL |
| 21 | universe_op | 127.0.0.1:38462 | NULL | Sleep | 14 | | NULL |
| 22 | universe_op | 127.0.0.1:38464 | NULL | Sleep | 14 | | NULL |
| 23 | universe_op | 127.0.0.1:38452 | NULL | Sleep | 14 | | NULL |
| 24 | universe_op | 127.0.0.1:38466 | NULL | Sleep | 14 | | NULL |
| 142 | universe_op | 127.0.0.1:33390 | NULL | Sleep | 0 | | NULL |
| 14830 | sun | 10.186.61.16:60730 | testdb | Query | 0 | update | INSERT INTO sbtest1 (id, k, c, pad) VALUES (7493994, 6518751, '15795471662-35813531247-93886706048-2 |
| 15041 | root | localhost | NULL | Query | 0 | starting | show processlist |
+-------+-----------------+--------------------+--------+------------------+--------+---------------------------------------------------------------+------------------------------------------------------------------------------------------------------+
20 rows in set (0.00 sec)
5、通过 kill 掉 14830 后,发现磁盘压力果然 over 了,你这“小阳人”(~ε(#~)☆╰╮o(~皿~///)
mysql> kill 14830;
Query OK, 0 rows affected (0.00 sec)
mysql> exit
Bye
Segmentation fault
[root@tidb-tikv01 ~]# iostat -xm 1
Linux 3.10.0-862.14.4.el7.x86_64 (tidb-tikv01) 04/22/2022 _x86_64_ (12 CPU)
avg-cpu: %user %nice %system %iowait %steal %idle
1.21 0.00 0.44 0.18 0.10 98.06
Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
vda 0.01 0.18 3.83 24.55 0.09 0.27 26.12 0.19 6.78 5.46 6.99 0.69 1.95
scd0 0.00 0.00 0.00 0.00 0.00 0.00 6.00 0.00 0.50 0.50 0.00 0.50 0.00
avg-cpu: %user %nice %system %iowait %steal %idle
1.85 0.00 0.59 0.00 0.00 97.56
Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
vda 0.00 0.00 0.00 61.00 0.00 0.46 15.52 0.01 0.20 0.00 0.20 0.18 1.10
scd0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
avg-cpu: %user %nice %system %iowait %steal %idle
3.36 0.00 0.84 0.08 0.00 95.71
Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
vda 0.00 10.00 1.00 53.00 0.02 0.48 18.81 2.87 0.22 1.00 0.21 1.31 7.10
scd0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
备注阐明:
生产环境可不要随随便便 kill 哦,出问题是要被鞭尸的呦~
总结
对于一些新入门的小可爱来说,以上办法能够循序渐进的帮忙你定位到是哪些连接线程把你的服务器 IO 折腾高的,能够让你更加理解 Linux 零碎与 MySQL 之间的联动关系。老鸟嘛,emmm…