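Overview
IO latency is a problem that nearly every production system runs into sooner or later. NVMe SSDs can now reach about 10 GB/s of throughput at an affordable price, but IO latency problems will not go away, because:
- Some network-based storage solutions, such as Ceph, are inherently prone to latency jitter
- SSDs and RAID controllers can themselves be unstable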
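On Linux, we have traditionally used tools such as iostat and sar to look at problems at the system level and the storage-device level. But none of them can tell you the following:
- How long a process / thread was actually suspended when IO latency occurred
- How many times a process / thread was suspended because of IO latency
On Linux these questions are even harder to answer, because writes are flushed to disk asynchronously by kernel writeback threads (pdflush in older kernels). You could of course reach for a newer technology such as BPF, but you do not necessarily need such a heavy tool.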
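Some readers may ask: what is the use of knowing the actual impact of IO latency on threads? The answer is in two other articles of mine:
- Using eBPF to verify a rumor: can Java GC logging stall an entire JVM service?
- Using eBPF to verify a rumor: can mmap + Java Safepoint stall an entire JVM service?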
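How it works
Linux has many little-known features. Among them, the Per-task statistics interface and the Delay accounting facility underlying it can add a number of delay metrics for every thread. The kernel source tree also ships a demo program that reads these metrics. Here is an example session: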
sudo sysctl kernel.task_delayacct=1
# run your app
sudo ./getdelays -t 608452 -d -i
print delayacct stats ON
printing IO accounting
TGID 608452
CPU             count     real total  virtual total    delay total  delay average
                60043     4332000000     4670325102       49110971        0.001ms
IO              count    delay total  delay average
                  199   200090856599         1005ms
SWAP            count    delay total  delay average
                    0              0            0ms
RECLAIM         count    delay total  delay average
                    0              0            0ms
THRASHING       count    delay total  delay average
                    0              0            0ms
COMPACT         count    delay total  delay average
                    0              0            0ms
WPCOPY          count    delay total  delay average
                  201         313245            0ms
: read=0, write=0, cancelled_write=0
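To make the IO row of the output above easier to read, here is a minimal sketch of the arithmetic behind the "delay average" column. It assumes that "delay total" is reported in nanoseconds (as the taskstats delay fields are documented) and that the average is simply total divided by count; the function name is my own and not part of getdelays.

```python
def average_delay_ms(delay_total_ns: int, count: int) -> float:
    """Average delay per event in milliseconds; 0.0 when nothing was counted."""
    if count == 0:
        return 0.0
    # Assumption: the total is in nanoseconds, so divide by 1e6 to get ms.
    return delay_total_ns / count / 1_000_000


# Values copied from the IO row of the getdelays output above.
io_avg = average_delay_ms(200090856599, 199)
print(f"IO average delay: {io_avg:.0f} ms over 199 blocking events")
```

In other words, this task was blocked on block IO 199 times, for roughly one second each time, which is exactly the kind of per-thread detail iostat and sar cannot show.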
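Ubuntu does not seem to ship an apt package for getdelays, so you have to build it yourself. I am lazy, so I chose another way to look at these metrics.
As we know, /proc/pid/stat on Linux already contains this IO delay metric: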
https://man7.org/linux/man-pages/man5/proc.5.html
/proc/pid/stat
    Status information about the process. This is used by ps(1). It is defined in the kernel source file fs/proc/array.c.
    ...
    (42) delayacct_blkio_ticks %llu (since Linux 2.6.18)
        Aggregated block I/O delays, measured in clock ticks (centiseconds).
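If you want to read this field yourself without any extra tool, the following is a minimal sketch in Python. The function names are hypothetical, and it assumes the field layout described in the man page excerpt above (field 42 is delayacct_blkio_ticks). The only subtle part is that field 2 (the command name) is wrapped in parentheses and may contain spaces, so the line is split after the last `)`.

```python
import os


def blkio_delay_ticks(stat_line: str) -> int:
    """Return field 42 (delayacct_blkio_ticks) from one /proc/<pid>/stat line."""
    # Field 2 (comm) may contain spaces or ')', so split after the LAST ')'
    # instead of naively splitting the whole line on whitespace.
    rest = stat_line.rsplit(")", 1)[1].split()
    # rest[0] is field 3 (state), so field N lives at index N - 3.
    return int(rest[42 - 3])


def read_blkio_delay_seconds(pid: int) -> float:
    """Aggregated block IO delay of a process, converted from ticks to seconds."""
    with open(f"/proc/{pid}/stat") as f:
        ticks = blkio_delay_ticks(f.read())
    # SC_CLK_TCK is the number of clock ticks per second (commonly 100).
    return ticks / os.sysconf("SC_CLK_TCK")
```

Calling `read_blkio_delay_seconds(some_pid)` twice and subtracting the results gives the time the process spent blocked between the two samples. With kernel.task_delayacct set to 0, I would expect the value to stay at zero.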
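So all we need is an existing program that already uses this metric. Conveniently, both htop and pidstat do. Let's look at pidstat, the more commonly used of the two.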
https://manpages.ubuntu.com/manpages/focal/en/man1/pidstat.1….
-d     Report I/O statistics (kernels 2.6.20 and later only). The following values may be displayed:
       ...
       iodelay
              Block I/O delay of the task being monitored, measured in clock ticks. This metric includes the delays spent waiting for sync block I/O completion and for swapin block I/O completion.
If you are curious, you can read its source code: https://github.dev/sysstat/sysstat/blob/master/pidstat.c
Here is the output of one of my runs:
sudo sysctl kernel.task_delayacct=1
# run your app
sudo pidstat -d -r -p 202484 -u 2

05:08:35 PM   UID       PID    %usr %system  %guest   %wait    %CPU   CPU  Command
05:08:37 PM     0    202484    9.50   66.00    0.00    0.00   75.50     1  dd

05:08:35 PM   UID       PID     kB_rd/s   kB_wr/s kB_ccwr/s iodelay  Command
05:08:37 PM     0    202484  1347648.00      0.00      0.00      49  dd

05:08:37 PM   UID       PID    %usr %system  %guest   %wait    %CPU   CPU  Command
05:08:39 PM     0    202484   10.00   82.00    0.00    0.00   92.00     1  dd

05:08:37 PM   UID       PID     kB_rd/s   kB_wr/s kB_ccwr/s iodelay  Command
05:08:39 PM     0    202484  1598144.00      0.00      0.00      15  dd
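The iodelay column is in clock ticks, which is not very intuitive. Below is a rough sketch of how to turn it into "share of the sampling interval spent blocked on IO". It rests on two assumptions that you should verify on your own system: that pidstat reports iodelay as the delta within each interval, and that one tick is 1/100 second (the "centiseconds" mentioned in the proc man page). The function name is hypothetical.

```python
def blocked_fraction(iodelay_ticks: int, interval_s: float, hz: int = 100) -> float:
    """Fraction of the sampling interval that the task spent blocked on block IO.

    Assumes iodelay_ticks is the per-interval delta and that hz ticks equal one second.
    """
    return (iodelay_ticks / hz) / interval_s


# iodelay values from the two 2-second samples above.
for ticks in (49, 15):
    blocked_ms = ticks * 1000 // 100  # ticks -> milliseconds at 100 ticks per second
    share = blocked_fraction(ticks, interval_s=2)
    print(f"iodelay={ticks}: about {blocked_ms} ms blocked, {share:.1%} of the interval")
```

Under these assumptions, the first sample means dd was blocked for about 490 ms out of 2 seconds, which fits reasonably well with the 75.50 %CPU reported for the same interval.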
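If you are curious how the Per-task statistics interface, the Delay accounting facility underlying it, and /proc/pid/stat relate to each other in the kernel design and source code, see the diagram I put together:
If the diagram above does not render properly, it can also be opened in Draw.io, where some elements carry interactive links and hover tips.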
sysctl kernel.task_delayacct
- [kernel commit: [tip: sched/core] delayacct: Add sysctl to enable at runtime. CommitterDate: Wed, 12 May 2021](https://www.spinics.net/lists/linux-tip-commits/msg57566.html)
iotop
documentation:
https://kaisenlinux.org/manpages/iotop.html
iotop watches I/O usage information output by the Linux kernel (requires 2.6.20 or later) and displays a table of current I/O usage by processes or threads on the system. At least the CONFIG_TASK_DELAY_ACCT, CONFIG_TASK_IO_ACCOUNTING, CONFIG_TASKSTATS and CONFIG_VM_EVENT_COUNTERS options need to be enabled in your Linux kernel build configuration and since Linux kernel 5.14, the kernel.task_delayacct sysctl enabled.
Starting with Linux kernel 5.14.x, kernel.task_delayacct can be configured at runtime and is disabled by default.
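Because the switch is off by default on newer kernels, it is worth checking it before trusting any zero values. Here is a small sketch that reads the sysctl through procfs; it assumes the usual mapping of kernel.task_delayacct to /proc/sys/kernel/task_delayacct, and the function name is my own.

```python
from pathlib import Path
from typing import Optional

SYSCTL_PATH = "/proc/sys/kernel/task_delayacct"


def delayacct_enabled(path: str = SYSCTL_PATH) -> Optional[bool]:
    """Return True/False for the sysctl value, or None if the file is absent.

    The file is expected to be absent on kernels older than 5.14,
    where this runtime switch does not exist.
    """
    p = Path(path)
    if not p.exists():
        return None
    return p.read_text().strip() == "1"


state = delayacct_enabled()
if state is None:
    print("kernel.task_delayacct not found (older kernel or non-Linux system)")
elif state:
    print("delay accounting is enabled")
else:
    print("delay accounting is disabled; run: sudo sysctl kernel.task_delayacct=1")
```

Reading the file does not require root; only changing the value does.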
Building getdelays on Ubuntu
https://www.kimullaa.com/posts/202112072130/
cd kernel_src/tools/accounting/
gcc -I/usr/src/linux-headers-5.19.0-50-generic/include/uapi -I/usr/src/linux-headers-5.19.0-50-generic/include/generated/uapi -I/usr/src/linux-headers-5.19.0-50-generic/include getdelays.c -o getdelays
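Closing remarks
Later I will write two more related articles: "Using eBPF to verify a rumor: can Java GC logging stall an entire JVM service?" and "Using eBPF to verify a rumor: can mmap + Java Safepoint stall an entire JVM service?"
References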
- The struct taskstats
- Control Groupstats
- Linux Delay Accounting
- getdelays – get delay accounting information from the kernel
- Trying out delay accounting – delay accounting を使ってみる
- delayacct: Default disabled