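We often find that a production server's CPU usage is too high, or that it is running out of memory. On Linux you can inspect the resource usage directly and trace it back to the specific error.
Find the process using the most CPU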
[log@task-a-shprod-1 ~]$ top
top - 12:00:19 up 20 days, 19:46, 1 user, load average: 2.42, 1.71, 2.40
Tasks: 98 total, 2 running, 96 sleeping, 0 stopped, 0 zombie
%Cpu(s): 53.1 us, 16.9 sy, 0.0 ni, 27.8 id, 0.0 wa, 0.0 hi, 2.3 si, 0.0 st
KiB Mem : 16267724 total, 353600 free, 8349840 used, 7564284 buff/cache
KiB Swap: 0 total, 0 free, 0 used. 7557728 avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
15980 root 20 0 8375964 3.148g 14824 S 175.7 20.3 3329:10 java
348 root 20 0 62560 20068 19612 R 85.3 0.1 13592:20 systemd-journal
8978 root 20 0 7898304 1.205g 14804 S 22.3 7.8 56:33.51 java
10214 root 20 0 8065148 1.695g 14800 S 1.3 10.9 15:43.18 java
1038 root 10 -10 128800 12248 9300 S 1.0 0.1 294:46.14 AliYunDun
9605 root 20 0 7970496 1.689g 14784 S 1.0 10.9 4:29.24 java
3 root 20 0 0 0 0 S 0.3 0.0 12:25.10 ksoftirqd/0
9 root 20 0 0 0 0 S 0.3 0.0 43:15.84 rcu_sched
13 root 20 0 0 0 0 S 0.3 0.0 14:10.83 ksoftirqd/1
18 root 20 0 0 0 0 S 0.3 0.0 13:32.39 ksoftirqd/2
23 root 20 0 0 0 0 S 0.3 0.0 16:09.67 ksoftirqd/3
1044 root 20 0 263504 41520 5936 S 0.3 0.3 40:29.56 ilogtail
1 root 20 0 43384 3788 2496 S 0.0 0.0 0:25.62 systemd
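As shown, the biggest consumer is the java process with PID 15980, at 175.7% CPU. By default top reports %CPU relative to a single core, so a value above 100% means the process is keeping more than one core busy.
Find the threads inside that process that use the most CPU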
[log@task-a-shprod-1 ~]$ top -Hp 15980
top - 12:01:25 up 20 days, 19:48, 1 user, load average: 4.98, 2.55, 2.64
Threads: 58 total, 2 running, 56 sleeping, 0 stopped, 0 zombie
%Cpu(s): 65.4 us, 15.2 sy, 0.0 ni, 17.2 id, 0.1 wa, 0.0 hi, 2.1 si, 0.0 st
KiB Mem : 16267724 total, 322392 free, 8380124 used, 7565208 buff/cache
KiB Swap: 0 total, 0 free, 0 used. 7527436 avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
16131 root 20 0 8375964 3.223g 14824 S 44.9 20.8 651:07.49 java
16130 root 20 0 8375964 3.223g 14824 S 35.9 20.8 628:32.23 java
16132 root 20 0 8375964 3.223g 14824 R 30.9 20.8 569:00.13 java
16133 root 20 0 8375964 3.223g 14824 S 25.9 20.8 638:04.25 java
16129 root 20 0 8375964 3.223g 14824 R 12.0 20.8 678:13.62 java
15982 root 20 0 8375964 3.223g 14824 S 0.7 20.8 12:06.16 java
15983 root 20 0 8375964 3.223g 14824 S 0.7 20.8 12:07.24 java
16149 root 20 0 8375964 3.223g 14824 S 0.7 20.8 25:09.56 java
15984 root 20 0 8375964 3.223g 14824 S 0.3 20.8 12:07.52 java
15985 root 20 0 8375964 3.223g 14824 S 0.3 20.8 12:04.10 java
15987 root 20 0 8375964 3.223g 14824 S 0.3 20.8 5:59.05 java
Convert the ID of the busiest thread to hexadecimal
[log@task-a-shprod-1 ~]$ printf "%x\n" 16131
3f03
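The conversion above can be wrapped in a small helper when several threads need checking. This is only a sketch: `tid_to_nid` is a hypothetical name, and it assumes the JVM's thread dump prints the native thread ID in hexadecimal as `nid=0x...` (the format this article relies on; some newer JDK releases may print it differently).

```shell
# Hypothetical helper: print the nid token expected in a jstack dump
# for a decimal thread ID taken from `top -Hp`.
tid_to_nid() {
    printf 'nid=0x%x\n' "$1"
}

# The five busiest thread IDs from the top -Hp output above.
for tid in 16131 16130 16132 16133 16129; do
    tid_to_nid "$tid"
done
# prints:
# nid=0x3f03
# nid=0x3f02
# nid=0x3f04
# nid=0x3f05
# nid=0x3f01
```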
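Find the code the thread is executing: dump the process's threads with jstack and search for the hexadecimal thread ID, which appears in each thread's nid field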
[log@task-a-shprod-1 ~]$ jstack 15980 | grep -A 30 'nid=0x3f03'
The stack trace printed for that thread shows which code is consuming the CPU.
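The last two steps can be combined into one function, sketched here under some assumptions: `thread_stack` is a hypothetical name, `jstack` must be on the PATH and usually has to run as the same user as the target JVM, and the dump is assumed to print native thread IDs as `nid=0x...`.

```shell
# Hypothetical helper: print the stack of one thread of a Java process.
#   $1 = process ID (from top)
#   $2 = decimal thread ID (from top -Hp)
#   $3 = number of lines to show after the thread header (default 30)
thread_stack() {
    if [ "$#" -lt 2 ]; then
        echo "usage: thread_stack PID TID [LINES]" >&2
        return 2
    fi
    # Convert the decimal thread ID to the hex form used in the dump,
    # then print the matching thread header plus the following lines.
    # -w keeps nid=0x3f03 from also matching a longer ID such as nid=0x3f031.
    jstack "$1" | grep -w -A "${3:-30}" "nid=0x$(printf '%x' "$2")"
}

# Example with the IDs from this article (not run here):
#   thread_stack 15980 16131
```

Matching on the full `nid=0x...` token rather than the bare hex digits avoids accidental hits on other hexadecimal values in the dump, such as the `tid=` addresses.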