Author: Sun Qiao
A member of the ActionSky (爱可生) DBA team, mainly responsible for day-to-day maintenance of MySQL and the DMP platform, with a strong interest in database operations automation.
Source: original contribution
*Produced by the ActionSky open source community. Original content may not be used without authorization; for reprints, please contact the editor and credit the source.
Introduction
For anyone who has only recently started working with MySQL, tricky little problems are bound to come up sooner or later. A common one: the database server's disks are perfectly healthy, yet disk I/O pressure on the machine is very high. Since MySQL is the only service running on the box, our first suspicion is naturally that some connected user is up to something behind our backs; however, the output of `show processlist` alone is not necessarily conclusive. Below we simulate a simple scenario and track the problem down precisely.
Simulating the scenario: generating I/O pressure
```shell
## Generate a large table for testing
[root@tidb-master ~]# /usr/share/sysbench/oltp_write_only.lua --mysql-host=10.186.61.36 --mysql-user=sun --mysql-password='xx' --mysql-port=3306 --mysql-db=testdb --tables=1 --table_size=15000000 --auto_inc=off --report-interval=2 --db-ps-mode=disable --time=100000 --threads=1 prepare
sysbench 1.0.17 (using system LuaJIT 2.0.4)

Creating table 'sbtest1'...
Inserting 15000000 records into 'sbtest1'
Creating a secondary index on 'sbtest1'...

## Run a write-only workload against the table
[root@tidb-master ~]# /usr/share/sysbench/oltp_write_only.lua --mysql-host=10.186.61.36 --mysql-user=sun --mysql-password='xx' --mysql-port=3306 --mysql-db=testdb --tables=1 --table_size=15000000 --auto_inc=off --report-interval=2 --db-ps-mode=disable --time=100000 --threads=1 run
sysbench 1.0.17 (using system LuaJIT 2.0.4)

Running the test with following options:
Number of threads: 1
Report intermediate results every 2 second(s)
Initializing random number generator from current time

Initializing worker threads...

Threads started!

[ 2s ] thds: 1 tps: 24.47 qps: 148.33 (r/w/o: 0.00/98.89/49.44) lat (ms,95%): 86.00 err/s: 0.00 reconn/s: 0.00
[ 4s ] thds: 1 tps: 53.02 qps: 318.62 (r/w/o: 0.00/212.58/106.04) lat (ms,95%): 48.34 err/s: 0.00 reconn/s: 0.00
[ 6s ] thds: 1 tps: 60.50 qps: 363.50 (r/w/o: 0.00/242.50/121.00) lat (ms,95%): 43.39 err/s: 0.00 reconn/s: 0.00
[ 8s ] thds: 1 tps: 65.50 qps: 391.02 (r/w/o: 0.00/260.01/131.01) lat (ms,95%): 39.65 err/s: 0.00 reconn/s: 0.00
[ 10s ] thds: 1 tps: 55.50 qps: 334.51 (r/w/o: 0.00/223.51/111.00) lat (ms,95%): 44.98 err/s: 0.00 reconn/s: 0.00
[ 12s ] thds: 1 tps: 57.00 qps: 340.99 (r/w/o: 0.00/227.00/114.00) lat (ms,95%): 38.25 err/s: 0.00 reconn/s: 0.00
```
Problem analysis
1. The `top` command shows fairly high I/O wait. Huh, what is going on? (pretending I don't know it's my own benchmark (●'◡'●))
```
top - 17:05:59 up 179 days,  7:05,  2 users,  load average: 0.84, 0.70, 0.93
Tasks: 181 total,   1 running, 180 sleeping,   0 stopped,   0 zombie
%Cpu0  :  1.3 us,  0.7 sy,  0.0 ni, 95.3 id,  2.7 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu1  :  6.9 us,  3.8 sy,  0.0 ni, 60.3 id, 29.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu2  :  2.4 us,  2.0 sy,  0.0 ni, 90.8 id,  4.7 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu3  :  1.7 us,  1.3 sy,  0.0 ni, 95.3 id,  1.7 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu4  :  1.4 us,  1.0 sy,  0.0 ni, 95.6 id,  2.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu5  :  3.1 us,  1.4 sy,  0.0 ni, 81.4 id, 14.2 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu6  :  0.7 us,  1.0 sy,  0.0 ni, 96.6 id,  1.7 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu7  :  1.0 us,  0.3 sy,  0.0 ni, 96.6 id,  2.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu8  :  0.3 us,  1.0 sy,  0.0 ni, 94.6 id,  4.1 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu9  :  2.4 us,  1.0 sy,  0.0 ni, 91.2 id,  5.4 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu10 : 14.7 us,  2.7 sy,  0.0 ni, 82.6 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu11 :  2.0 us,  1.0 sy,  0.0 ni, 96.0 id,  1.0 wa,  0.0 hi,  0.0 si,  0.0 st
KiB Mem : 12137024 total,   185984 free,  3987936 used,  7963104 buff/cache
KiB Swap:        0 total,        0 free,        0 used.  7150088 avail Mem

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
 8858 actiont+  20   0 7249940   1.1g  11892 S  30.0  9.3 210:38.75 mysqld
26447 tidb      20   0 4000364   1.0g   4740 S   5.3  9.0  10018:23 TiFlashMain
22761 actiont+  20   0 2526348  36188   9756 S   4.0  0.3   1197:43 ustats
29004 root      20   0 3784084 103676   2012 S   2.0  0.9   1802:29 beam.s
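When this check has to be repeated across many samples or hosts, the per-core `wa` (iowait) figures from `top -bn1` can be extracted programmatically. The sketch below is a minimal helper of my own (the function name and threshold are assumptions, not part of the article's workflow); it flags cores whose iowait exceeds a cutoff, using two `%Cpu` lines from the capture above as sample input.

```python
import re

# Match top's per-core lines, e.g.
# "%Cpu1 : 6.9 us, 3.8 sy, 0.0 ni, 60.3 id, 29.0 wa, ..."
# capturing the core number and the "wa" (iowait) percentage.
WA_RE = re.compile(r"%Cpu(\d+)\s*:.*?([\d.]+) wa")

def high_iowait_cores(lines, threshold=10.0):
    """Return (core, iowait%) pairs whose iowait exceeds the threshold."""
    hits = []
    for line in lines:
        m = WA_RE.search(line)
        if m and float(m.group(2)) > threshold:
            hits.append((int(m.group(1)), float(m.group(2))))
    return hits

sample = [
    "%Cpu1 : 6.9 us, 3.8 sy, 0.0 ni, 60.3 id, 29.0 wa, 0.0 hi, 0.0 si, 0.0 st",
    "%Cpu5 : 3.1 us, 1.4 sy, 0.0 ni, 81.4 id, 14.2 wa, 0.0 hi, 0.0 si, 0.0 st",
    "%Cpu10 : 14.7 us, 2.7 sy, 0.0 ni, 82.6 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st",
]
print(high_iowait_cores(sample))  # -> [(1, 29.0), (5, 14.2)]
```

With the capture above, cores 1 and 5 stand out, which matches what we eyeball in the `top` output.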
2. `iostat` digs deeper: the disk is doing sustained writes at around 26 MB/s, and %util is high (as a rule of thumb, a %util above 75% indicates the disk is under significant pressure).
```
[root@tidb-tikv01 ~]# iostat -xm 1
Linux 3.10.0-862.14.4.el7.x86_64 (tidb-tikv01)  04/22/2022  _x86_64_  (12 CPU)

avg-cpu:  %user %nice %system %iowait %steal %idle
           1.21  0.00    0.44    0.18   0.10 98.07

Device: rrqm/s wrqm/s    r/s     w/s rMB/s wMB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
vda       0.01   0.18   3.53   22.99  0.08  0.25    25.84     0.19  7.16    5.64    7.40  0.71  1.88
scd0      0.00   0.00   0.00    0.00  0.00  0.00     6.00     0.00  0.50    0.50    0.00  0.50  0.00

avg-cpu:  %user %nice %system %iowait %steal %idle
           2.22  0.00    1.19    8.28   0.09 88.23

Device: rrqm/s wrqm/s    r/s     w/s rMB/s wMB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
vda       0.00   0.00 328.00 1800.00  5.12 26.24    30.17     3.18  1.49    4.68    0.91  0.41 88.30
scd0      0.00   0.00   0.00    0.00  0.00  0.00     0.00     0.00  0.00    0.00    0.00  0.00  0.00

avg-cpu:  %user %nice %system %iowait %steal %idle
           1.63  0.00    1.03    6.68   0.09 90.57

Device: rrqm/s wrqm/s    r/s     w/s rMB/s wMB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
vda       0.00   0.00 352.00 1918.00  5.50 27.82    30.06     2.96  1.31    3.21    0.96  0.37 83.60
scd0      0.00   0.00   0.00    0.00  0.00  0.00     0.00     0.00  0.00    0.00    0.00  0.00  0.00

avg-cpu:  %user %nice %system %iowait %steal %idle
           5.54  0.00    1.96    8.77   0.09 83.65

Device: rrqm/s wrqm/s    r/s     w/s rMB/s wMB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
vda       0.00   0.00 300.00 1682.00  4.69 25.68    31.38     3.09  1.55    3.68    1.17  0.45 88.60
scd0      0.00   0.00   0.00    0.00  0.00  0.00     0.00     0.00  0.00    0.00    0.00  0.00  0.00
```
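The 75% rule of thumb is easy to script when you want to monitor this continuously. Below is a minimal sketch (the helper names are my own, not from any tool) that maps an `iostat -x` device line onto its column headers and flags disks over the threshold; the sample lines are taken from the second interval above.

```python
# Column headers printed by `iostat -x` on this system (see capture above).
IOSTAT_HEADER = ("Device", "rrqm/s", "wrqm/s", "r/s", "w/s", "rMB/s", "wMB/s",
                 "avgrq-sz", "avgqu-sz", "await", "r_await", "w_await",
                 "svctm", "%util")

def parse_device_line(line):
    """Map one iostat device line onto the header columns."""
    fields = line.split()
    return dict(zip(IOSTAT_HEADER, [fields[0]] + [float(x) for x in fields[1:]]))

def busy_disks(lines, threshold=75.0):
    """Return (device, %util) pairs whose utilization exceeds the threshold."""
    stats = (parse_device_line(l) for l in lines if l.strip())
    return [(s["Device"], s["%util"]) for s in stats if s["%util"] > threshold]

sample = [
    "vda 0.00 0.00 328.00 1800.00 5.12 26.24 30.17 3.18 1.49 4.68 0.91 0.41 88.30",
    "scd0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00",
]
print(busy_disks(sample))  # -> [('vda', 88.3)]
```

Only `vda` crosses the line, which is exactly the device carrying the MySQL datadir in this scenario.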
3. `iotop` shows that MySQL-related thread TID 22233 accounts for 52.59% of I/O with 2.56 MB/s of reads, while thread TID 8879 is writing at 27.31 MB/s (generally, the larger the value in the IO> column, the more likely that thread is responsible for the I/O pressure).
```
Total DISK READ :  5.29 M/s | Total DISK WRITE : 29.42 M/s
Actual DISK READ:  5.29 M/s | Actual DISK WRITE: 26.83 M/s
  TID PRIO USER       DISK READ   DISK WRITE  SWAPIN     IO>    COMMAND
22233 be/4 actionte    2.56 M/s  473.07 K/s  0.00 %  52.59 %  mysqld --defaults-file=/opt/mysql/etc/3306/my.cnf --d~ --socket=/opt/mysql/data/3306/mysqld.sock --port=3306
 8879 be/4 actionte    0.00 B/s   27.31 M/s  0.00 %  17.76 %  mysqld --defaults-file=/opt/mysql/etc/3306/my.cnf --d~ --socket=/opt/mysql/data/3306/mysqld.sock --port=3306
 8887 be/4 actionte    0.00 B/s    0.00 B/s  0.00 %  10.10 %  mysqld --defaults-file=/opt/mysql/etc/3306/my.cnf --d~ --socket=/opt/mysql/data/3306/mysqld.sock --port=3306
 8907 be/4 actionte  116.33 K/s    0.00 B/s  0.00 %   6.27 %  mysqld --defaults-file=/opt/mysql/etc/3306/my.cnf --d~ --socket=/opt/mysql/data/3306/mysqld.sock --port=3306
 8930 be/4 actionte    7.76 K/s    0.00 B/s  0.00 %   3.45 %  mysqld --defaults-file=/opt/mysql/etc/3306/my.cnf --d~ --socket=/opt/mysql/data/3306/mysqld.sock --port=3306
 8910 be/4 actionte 1217.57 K/s    0.00 B/s  0.00 %   3.33 %  mysqld --defaults-file=/opt/mysql/etc/3306/my.cnf --d~ --socket=/opt/mysql/data/3306/mysqld.sock --port=3306
 8911 be/4 actionte  372.25 K/s    0.00 B/s  0.00 %   1.53 %  mysqld --defaults-file=/opt/mysql/etc/3306/my.cnf --d~ --socket=/opt/mysql/data/3306/mysqld.sock --port=3306
 2946 be/4 actionte    0.00 B/s 1985.34 B/s  0.00 %   1.52 %  ustats
 8909 be/4 actionte  418.78 K/s    0.00 B/s  0.00 %   1.08 %  mysqld --defaults-file=/opt/mysql/etc/3306/my.cnf --d~ --socket=/opt/mysql/data/3306/mysqld.sock --port=3306
```
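`iotop` can also run non-interactively (batch mode, `iotop -b -n 1`) and be post-processed, which is handy when the box is too loaded for an interactive session. The sketch below is my own helper, assuming the per-thread line layout shown in the capture above; it ranks threads by the IO> column so the hottest TIDs fall out automatically.

```python
def parse_iotop_line(line):
    """Split one iotop data line (layout as in the capture above)."""
    f = line.split()
    return {
        "tid": int(f[0]),           # OS thread ID
        "user": f[2],
        "read": " ".join(f[3:5]),   # e.g. "2.56 M/s"
        "write": " ".join(f[5:7]),
        "io_pct": float(f[9]),      # the IO> column
        "command": " ".join(f[11:]),
    }

def hottest_tids(lines, n=3):
    """Return (tid, io%) pairs sorted by IO share, highest first."""
    rows = sorted((parse_iotop_line(l) for l in lines),
                  key=lambda r: r["io_pct"], reverse=True)
    return [(r["tid"], r["io_pct"]) for r in rows[:n]]

sample = [
    "22233 be/4 actionte 2.56 M/s 473.07 K/s 0.00 % 52.59 % mysqld --defaults-file=/opt/mysql/etc/3306/my.cnf",
    "8879 be/4 actionte 0.00 B/s 27.31 M/s 0.00 % 17.76 % mysqld --defaults-file=/opt/mysql/etc/3306/my.cnf",
    "2946 be/4 actionte 0.00 B/s 1985.34 B/s 0.00 % 1.52 % ustats",
]
print(hottest_tids(sample))  # -> [(22233, 52.59), (8879, 17.76), (2946, 1.52)]
```

The two TIDs at the top, 22233 and 8879, are the ones we carry into MySQL in the next step.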
4. Plugging the OS thread IDs obtained above into MySQL, we find that user sun's connection has been writing non-stop. I already suspect you, but I'll keep quiet for now.
```
[root@tidb-tikv01 ~]# /opt/mysql/base/8.0.21/bin/mysql -S /opt/mysql/data/3306/mysqld.sock -pxx
mysql: [Warning] Using a password on the command line interface can be insecure.
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 15041
Server version: 8.0.21 MySQL Community Server - GPL

Copyright (c) 2000, 2020, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> select * from performance_schema.threads where THREAD_OS_ID=22233\G
*************************** 1. row ***************************
          THREAD_ID: 14880
               NAME: thread/sql/one_connection
               TYPE: FOREGROUND
     PROCESSLIST_ID: 14830     ### connection 14830 is the one writing frantically; it is visible in show processlist
   PROCESSLIST_USER: sun
   PROCESSLIST_HOST: 10.186.61.16
     PROCESSLIST_DB: testdb
PROCESSLIST_COMMAND: Query
   PROCESSLIST_TIME: 0
  PROCESSLIST_STATE: updating
   PROCESSLIST_INFO: UPDATE sbtest1 SET k=k+1 WHERE id=7899611
   PARENT_THREAD_ID: NULL
               ROLE: NULL
       INSTRUMENTED: YES
            HISTORY: YES
    CONNECTION_TYPE: TCP/IP
       THREAD_OS_ID: 22233     ### THREAD_OS_ID is the thread ID seen in the Linux system
     RESOURCE_GROUP: USR_default
1 row in set (0.00 sec)

mysql> select * from performance_schema.threads where THREAD_OS_ID=8879\G
*************************** 1. row ***************************
          THREAD_ID: 21
               NAME: thread/innodb/page_flush_coordinator_thread   ### MySQL's own dirty-page flush thread; the user's constant writes keep it flushing to disk
               TYPE: BACKGROUND
     PROCESSLIST_ID: NULL
   PROCESSLIST_USER: NULL
   PROCESSLIST_HOST: NULL
     PROCESSLIST_DB: NULL
PROCESSLIST_COMMAND: NULL
   PROCESSLIST_TIME: 264523
  PROCESSLIST_STATE: NULL
   PROCESSLIST_INFO: NULL
   PARENT_THREAD_ID: NULL
               ROLE: NULL
       INSTRUMENTED: YES
            HISTORY: YES
    CONNECTION_TYPE: NULL
       THREAD_OS_ID: 8879
     RESOURCE_GROUP: SYS_default
1 row in set (0.00 sec)

mysql> show processlist;
+-------+-----------------+--------------------+--------+------------------+--------+---------------------------------------------------------------+------------------------------------------------------------------------------------------------------+
| Id    | User            | Host               | db     | Command          | Time   | State                                                         | Info                                                                                                 |
+-------+-----------------+--------------------+--------+------------------+--------+---------------------------------------------------------------+------------------------------------------------------------------------------------------------------+
| 5     | event_scheduler | localhost          | NULL   | Daemon           | 264544 | Waiting on empty queue                                        | NULL                                                                                                 |
| 8     | universe_op     | 127.0.0.1:38362    | NULL   | Sleep            | 0      |                                                               | NULL                                                                                                 |
| 9     | universe_op     | 127.0.0.1:38374    | NULL   | Sleep            | 0      |                                                               | NULL                                                                                                 |
| 10    | universe_op     | 127.0.0.1:38376    | NULL   | Sleep            | 0      |                                                               | NULL                                                                                                 |
| 11    | universe_op     | 127.0.0.1:38380    | NULL   | Sleep            | 0      |                                                               | NULL                                                                                                 |
| 12    | universe_op     | 127.0.0.1:38378    | NULL   | Sleep            | 0      |                                                               | NULL                                                                                                 |
| 13    | universe_op     | 127.0.0.1:38384    | NULL   | Sleep            | 0      |                                                               | NULL                                                                                                 |
| 14    | universe_op     | 127.0.0.1:38382    | NULL   | Sleep            | 0      |                                                               | NULL                                                                                                 |
| 15    | universe_op     | 10.186.61.45:50322 | NULL   | Binlog Dump GTID | 264542 | Master has sent all binlog to slave; waiting for more updates | NULL                                                                                                 |
| 17    | universe_op     | 127.0.0.1:38454    | NULL   | Sleep            | 14     |                                                               | NULL                                                                                                 |
| 18    | universe_op     | 127.0.0.1:38456    | NULL   | Sleep            | 14     |                                                               | NULL                                                                                                 |
| 19    | universe_op     | 127.0.0.1:38458    | NULL   | Sleep            | 14     |                                                               | NULL                                                                                                 |
| 20    | universe_op     | 127.0.0.1:38460    | NULL   | Sleep            | 14     |                                                               | NULL                                                                                                 |
| 21    | universe_op     | 127.0.0.1:38462    | NULL   | Sleep            | 14     |                                                               | NULL                                                                                                 |
| 22    | universe_op     | 127.0.0.1:38464    | NULL   | Sleep            | 14     |                                                               | NULL                                                                                                 |
| 23    | universe_op     | 127.0.0.1:38452    | NULL   | Sleep            | 14     |                                                               | NULL                                                                                                 |
| 24    | universe_op     | 127.0.0.1:38466    | NULL   | Sleep            | 14     |                                                               | NULL                                                                                                 |
| 142   | universe_op     | 127.0.0.1:33390    | NULL   | Sleep            | 0      |                                                               | NULL                                                                                                 |
| 14830 | sun             | 10.186.61.16:60730 | testdb | Query            | 0      | update                                                        | INSERT INTO sbtest1 (id, k, c, pad) VALUES (7493994, 6518751, '15795471662-35813531247-93886706048-2 |
| 15041 | root            | localhost          | NULL   | Query            | 0      | starting                                                      | show processlist                                                                                     |
+-------+-----------------+--------------------+--------+------------------+--------+---------------------------------------------------------------+------------------------------------------------------------------------------------------------------+
20 rows in set (0.00 sec)
```
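The logic of step 4, telling a killable client connection apart from an internal background thread, can be expressed as a small filter. The sketch below uses hand-built rows standing in for `performance_schema.threads` (no live server is assumed; the helper name is my own): only FOREGROUND threads carry a PROCESSLIST_ID you can pass to KILL, while background threads like the page flush coordinator do not.

```python
def killable_connections(thread_rows, hot_os_tids):
    """Return (PROCESSLIST_ID, user) for foreground threads whose OS TID
    showed up in iotop; background threads have PROCESSLIST_ID = NULL/None
    and are filtered out, since they cannot (and should not) be killed."""
    return [(r["PROCESSLIST_ID"], r["PROCESSLIST_USER"])
            for r in thread_rows
            if r["THREAD_OS_ID"] in hot_os_tids
            and r["PROCESSLIST_ID"] is not None]

# Rows mirroring the two performance_schema.threads queries above.
rows = [
    {"THREAD_OS_ID": 22233, "PROCESSLIST_ID": 14830, "PROCESSLIST_USER": "sun"},
    {"THREAD_OS_ID": 8879, "PROCESSLIST_ID": None, "PROCESSLIST_USER": None},
]
print(killable_connections(rows, {22233, 8879}))  # -> [(14830, 'sun')]
```

Out of the two hot TIDs, only 22233 maps to a client connection (id 14830, user sun); 8879 is InnoDB's own flush coordinator, busy only because of that client's writes.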
5. After killing connection 14830, the disk pressure is indeed gone. Caught you red-handed! ( ̄(# ̄)☆╰╮o( ̄皿 ̄///)
```
mysql> kill 14830;
Query OK, 0 rows affected (0.00 sec)

mysql> exit
Bye
Segmentation fault
[root@tidb-tikv01 ~]# iostat -xm 1
Linux 3.10.0-862.14.4.el7.x86_64 (tidb-tikv01)  04/22/2022  _x86_64_  (12 CPU)

avg-cpu:  %user %nice %system %iowait %steal %idle
           1.21  0.00    0.44    0.18   0.10 98.06

Device: rrqm/s wrqm/s   r/s    w/s rMB/s wMB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
vda       0.01   0.18  3.83  24.55  0.09  0.27    26.12     0.19  6.78    5.46    6.99  0.69  1.95
scd0      0.00   0.00  0.00   0.00  0.00  0.00     6.00     0.00  0.50    0.50    0.00  0.50  0.00

avg-cpu:  %user %nice %system %iowait %steal %idle
           1.85  0.00    0.59    0.00   0.00 97.56

Device: rrqm/s wrqm/s   r/s    w/s rMB/s wMB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
vda       0.00   0.00  0.00  61.00  0.00  0.46    15.52     0.01  0.20    0.00    0.20  0.18  1.10
scd0      0.00   0.00  0.00   0.00  0.00  0.00     0.00     0.00  0.00    0.00    0.00  0.00  0.00

avg-cpu:  %user %nice %system %iowait %steal %idle
           3.36  0.00    0.84    0.08   0.00 95.71

Device: rrqm/s wrqm/s   r/s    w/s rMB/s wMB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
vda       0.00  10.00  1.00  53.00  0.02  0.48    18.81     2.87  0.22    1.00    0.21  1.31  7.10
scd0      0.00   0.00  0.00   0.00  0.00  0.00     0.00     0.00  0.00    0.00    0.00  0.00  0.00
```
Note:
Don't casually kill connections in production; if something breaks, you will never hear the end of it~
Summary
For newcomers, the steps above walk you methodically from "the disk is busy" down to the exact connection threads driving your server's I/O, and along the way give you a better feel for how Linux and MySQL fit together. As for the veterans, well... emmm...