1. Background

I'm sure many of you know how to grab a dump through Task Manager. It's crude but simple, yet it can't cover the countless ways a program can die, for example:

  • Memory bloat: the program blows up
  • CPU spikes: the program works itself to death
  • Application not responding: users are driven mad
  • Unexpected exit: just like life

Since the manual approach is so feeble, what better tools are there? Besides adplus, this article recommends a real gem: procdump. Download it from https://docs.microsoft.com/zh... It even supports Linux; I won't go into installation details here.

2. Memory bloat: the program blows up

I'm sure many of you have run into memory bloat. The most common case I've seen is a small static cache that is never released, accumulating without bound until the process finally blows up. So how do you catch this with procdump?
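The leak pattern described above can be sketched in a few lines. The cache class and its member names here are hypothetical, purely for illustration:

```csharp
using System;
using System.Collections.Generic;

// A hypothetical "small cache" that only grows: values are added on every
// miss but nothing ever evicts them, so the static dictionary roots every
// entry for the lifetime of the process.
static class ReportCache
{
    private static readonly Dictionary<string, string> _cache
        = new Dictionary<string, string>();

    public static int Count => _cache.Count;

    public static string Get(string key)
    {
        if (!_cache.TryGetValue(key, out var value))
        {
            value = new string('x', 10_000); // stand-in for an expensive result
            _cache[key] = value;             // cached forever, never released
        }
        return value;
    }
}

class Demo
{
    static void Main()
    {
        // Every distinct key pins another ~20 KB of char data to a static root.
        for (int i = 0; i < 1000; i++)
            ReportCache.Get("report-" + i);

        Console.WriteLine(ReportCache.Count); // 1000 entries, none collectable
    }
}
```

With unbounded keys (user ids, timestamps, report names) this grows forever, which is exactly the kind of process a commit-threshold dump catches red-handed.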

For the demo, let me first write an example that allocates memory without bound.

        static void Main(string[] args)
        {
            List<string> list = new List<string>();
            for (int i = 0; i < int.MaxValue; i++)
            {
                list.Add(string.Join(",", Enumerable.Range(0, 10000)));
            }
            Console.ReadLine();
        }

With the program running, configure procdump to automatically capture a full memory dump once the process commits more than 1 GB: -m 1024 sets the commit threshold in MB, and -ma requests a full memory dump. Use the following command.

C:\Windows\system32>procdump  ConsoleApp2 -m 1024 -ma E:\net5\ConsoleApp1\ConsoleApp2\bin\Debug

ProcDump v10.0 - Sysinternals process dump utility
Copyright (C) 2009-2020 Mark Russinovich and Andrew Richards
Sysinternals - www.sysinternals.com

Process:               ConsoleApp2.exe (24112)
Process image:         E:\net5\ConsoleApp1\ConsoleApp2\bin\Debug\ConsoleApp2.exe
CPU threshold:         n/a
Performance counter:   n/a
Commit threshold:      >= 1024 MB
Threshold seconds:     10
Hung window check:     Disabled
Log debug strings:     Disabled
Exception monitor:     Disabled
Exception filter:      [Includes]
                       *
                       [Excludes]
Terminate monitor:     Disabled
Cloning type:          Disabled
Concurrent limit:      n/a
Avoid outage:          n/a
Number of dumps:       1
Dump folder:           E:\net5\ConsoleApp1\ConsoleApp2\bin\Debug\
Dump filename/mask:    PROCESSNAME_YYMMDD_HHMMSS
Queue to WER:          Disabled
Kill after dump:       Disabled

Press Ctrl-C to end monitoring without terminating the process.

[21:23:43] Commit:    1087Mb
[21:23:43] Dump 1 initiated: E:\net5\ConsoleApp1\ConsoleApp2\bin\Debug\ConsoleApp2.exe_210323_212343.dmp
[21:23:43] Dump 1 writing: Estimated dump file size is 1179 MB.
[21:23:44] Dump 1 complete: 1179 MB written in 1.3 seconds
[21:23:44] Dump count reached.

The last five lines show that a dump file was generated automatically when committed memory reached 1087 MB. Next, let's take a look with windbg.

  • To check the process's current memory usage, run !address -summary
0:000> !address -summary
Mapping file section regions...
Mapping module regions...
Mapping PEB regions...
Mapping TEB and stack regions...
Mapping heap regions...
Mapping page heap regions...
Mapping other regions...
Mapping stack trace database regions...
Mapping activation context regions...

--- Usage Summary ---------------- RgnCount ----------- Total Size -------- %ofBusy %ofTotal
Free                                     63          b30b4000 (   2.798 GB)           69.94%
<unknown>                               228          48547000 (   1.130 GB)  93.99%   28.25%
Image                                   210           4115000 (  65.082 MB)   5.29%    1.59%
Stack                                    21            700000 (   7.000 MB)   0.57%    0.17%
Heap                                     12            170000 (   1.438 MB)   0.12%    0.04%
Other                                     7             5a000 ( 360.000 kB)   0.03%    0.01%
TEB                                       7             13000 (  76.000 kB)   0.01%    0.00%
PEB                                       1              3000 (  12.000 kB)   0.00%    0.00%

--- Type Summary (for busy) ------ RgnCount ----------- Total Size -------- %ofBusy %ofTotal
MEM_PRIVATE                             250          47121000 (   1.110 GB)  92.36%   27.76%
MEM_IMAGE                               217           411e000 (  65.117 MB)   5.29%    1.59%
MEM_MAPPED                               19           1cfd000 (  28.988 MB)   2.35%    0.71%

--- State Summary ---------------- RgnCount ----------- Total Size -------- %ofBusy %ofTotal
MEM_FREE                                 63          b30b4000 (   2.798 GB)           69.94%
MEM_COMMIT                              357          47f12000 (   1.124 GB)  93.49%   28.10%
MEM_RESERVE                             129           502a000 (  80.164 MB)   6.51%    1.96%

--- Protect Summary (for commit) - RgnCount ----------- Total Size -------- %ofBusy %ofTotal
PAGE_READWRITE                          177          437d5000 (   1.055 GB)  87.70%   26.36%
PAGE_EXECUTE_READ                        35           33c7000 (  51.777 MB)   4.21%    1.26%
PAGE_READONLY                            90            c41000 (  12.254 MB)   1.00%    0.30%
PAGE_WRITECOPY                           34            70b000 (   7.043 MB)   0.57%    0.17%
PAGE_READWRITE|PAGE_GUARD                14             23000 ( 140.000 kB)   0.01%    0.00%
PAGE_EXECUTE_READWRITE                    7              7000 (  28.000 kB)   0.00%    0.00%

--- Largest Region by Usage ----------- Base Address -------- Region Size ----------
Free                                        80010000          7f130000 (   1.986 GB)
<unknown>                                   438e1000           200f000 (  32.059 MB)
Image                                       660e0000            f55000 (  15.332 MB)
Stack                                         e00000             fd000 (1012.000 kB)
Heap                                          c97000             98000 ( 608.000 kB)
Other                                       ff2c0000             33000 ( 204.000 kB)
TEB                                           990000              3000 (  12.000 kB)
PEB                                           98d000              3000 (  12.000 kB)

See the (1.055 GB) on the PAGE_READWRITE line above? That tallies with the 1087 MB procdump reported in the console earlier — no surprises there.

  • To find the large objects on the managed heap, run !dumpheap -stat -min 1024
0:000> !dumpheap -stat -min 1024
Statistics:
      MT    Count    TotalSize Class Name
65d42788        2        13044 System.Object[]
65d42d74        2        98328 System.String[]
65d42c60       73      1082988 System.Char[]
65d424e4    11452   1119913984 System.String

The last line of the output shows more than 10,000 System.String instances. Next, we can narrow the search with -type (plus -min 10240) to list only the strings larger than 10 KB.

0:000> !dumpheap -type System.String -min 10240
 Address       MT     Size
03c75568 65d424e4    97792
03c8d378 65d424e4    97792
4a855060 65d424e4    97792

Statistics:
      MT    Count    TotalSize Class Name
65d424e4    11452   1119913984 System.String
Total 11452 objects

0:000> !gcroot 4a855060
Thread 36e4:
*** WARNING: Unable to verify checksum for ConsoleApp2.exe
    00b3f358 012108d1 ConsoleApp2.Program.Main(System.String[]) [E:\net5\ConsoleApp1\ConsoleApp2\Program.cs @ 18]
        ebp+18: 00b3f370
            ->  02c71fd8 System.Collections.Generic.List`1[[System.String, mscorlib]]
            ->  02cce2ec System.String[]
            ->  4a855060 System.String
Found 1 unique roots (run '!GCRoot -all' to see all roots).

The !gcroot output confirms that the string really is held by the List at line 18 of Program.cs, and with that the culprit is nailed down.
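As a sanity check, the 97792-byte size that !dumpheap reported for the big strings matches what the demo allocates: joining 0..9999 with commas yields 38890 digits plus 9999 separators = 48889 characters. The arithmetic below assumes the usual 32-bit CLR string layout (a roughly 12-byte header plus 2 bytes per character including the null terminator); the header figure is stated as an assumption, not read from the dump.

```csharp
using System;
using System.Linq;

class StringSizeCheck
{
    static void Main()
    {
        // The exact expression the demo adds to the list on every iteration.
        string s = string.Join(",", Enumerable.Range(0, 10000));
        Console.WriteLine(s.Length); // 48889 characters

        // Rough 32-bit CLR layout: object header (4) + method table ptr (4)
        // + length field (4) + 2 bytes per char, incl. the null terminator.
        int approxSize = 12 + 2 * (s.Length + 1);
        Console.WriteLine(approxSize); // 97792 — the size !dumpheap showed
    }
}
```

Being able to tie a size in the dump back to a specific allocation in source like this is often the fastest way to confirm you're looking at the right objects.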

3. CPU spikes: the program works itself to death

When it comes to high-CPU cases, I find that most of them originate on the unmanaged side — GC collections, lock contention, and the like; few people are careless enough to peg the CPU in plain managed code.

By the way, there's a handy trick for analyzing high CPU: capture dump snapshots back to back and compare the thread activity between the two dumps — a job procdump is perfect for. First, the test code.

    class Program
    {
        static void Main(string[] args)
        {
            Parallel.For(0, int.MaxValue, (i) =>
            {
                while (true)
                {
                }
            });

            Console.ReadLine();
        }
    }

Now I'll have procdump capture a dump whenever CPU usage stays above 70% (-c 70) for 5 consecutive seconds (-s 5), stopping after 2 dumps (-n 2).

C:\Windows\system32>procdump  ConsoleApp2 -s 5 -n 2 -c 70 E:\net5\ConsoleApp1\ConsoleApp2\bin\Debug

ProcDump v10.0 - Sysinternals process dump utility
Copyright (C) 2009-2020 Mark Russinovich and Andrew Richards
Sysinternals - www.sysinternals.com

Process:               ConsoleApp2.exe (22152)
Process image:         E:\net5\ConsoleApp1\ConsoleApp2\bin\Debug\ConsoleApp2.exe
CPU threshold:         >= 70% of system
Performance counter:   n/a
Commit threshold:      n/a
Threshold seconds:     5
Hung window check:     Disabled
Log debug strings:     Disabled
Exception monitor:     Disabled
Exception filter:      [Includes]
                       *
                       [Excludes]
Terminate monitor:     Disabled
Cloning type:          Disabled
Concurrent limit:      n/a
Avoid outage:          n/a
Number of dumps:       2
Dump folder:           E:\net5\ConsoleApp1\ConsoleApp2\bin\Debug\
Dump filename/mask:    PROCESSNAME_YYMMDD_HHMMSS
Queue to WER:          Disabled
Kill after dump:       Disabled

Press Ctrl-C to end monitoring without terminating the process.

[22:25:47] CPU: 95% 1s
[22:25:48] CPU: 100% 2s
[22:25:50] CPU: 96% 3s
[22:25:51] CPU: 98% 4s
[22:25:52] CPU: 99% 5s (Trigger)
[22:25:53] Dump 1 initiated: E:\net5\ConsoleApp1\ConsoleApp2\bin\Debug\ConsoleApp2.exe_210323_222553.dmp
[22:25:54] Dump 1 complete: 5 MB written in 0.3 seconds
[22:25:56] CPU: 88% 1s
[22:25:58] CPU: 93% 2s
[22:26:00] CPU: 89% 3s
[22:26:02] CPU: 89% 4s
[22:26:04] CPU: 95% 5s (Trigger)
[22:26:05] Dump 2 initiated: E:\net5\ConsoleApp1\ConsoleApp2\bin\Debug\ConsoleApp2.exe_210323_222605.dmp
[22:26:06] Dump 2 complete: 5 MB written in 0.4 seconds
[22:26:07] Dump count reached.

As the output shows, a dump was captured each time the CPU stayed above 70% for 5 consecutive seconds — two dumps in total.

Now that we have the dumps, open them in two windbg instances and verify their creation times, as shown in the figure below:

The figure shows that the two dumps were generated 12 seconds apart, and !runaway reveals that the following threads:

  • 14:2cb8
  • 19:3f8c
  • ...

each accumulated as much as 10 seconds of CPU time. What does that tell us? These threads must be stuck in a dead loop somewhere... right?
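The reasoning here boils down to a simple ratio: if a thread's CPU time between the two dumps grew by almost the whole wall-clock interval, it never left the CPU. A tiny helper makes that explicit — the 0.8 cutoff is an arbitrary illustrative threshold, not anything !runaway itself computes:

```csharp
using System;

class BusyThreadCheck
{
    // A thread looks like a busy loop if its CPU time grew by most of the
    // wall-clock interval separating the two dumps.
    public static bool LooksLikeBusyLoop(double cpuDeltaSeconds, double wallClockSeconds)
        => cpuDeltaSeconds / wallClockSeconds > 0.8;

    static void Main()
    {
        // Numbers from this session: dumps taken 12 s apart, with threads
        // 14:2cb8 and 19:3f8c each gaining roughly 10 s of CPU time.
        Console.WriteLine(LooksLikeBusyLoop(10, 12));  // True  -> suspicious
        Console.WriteLine(LooksLikeBusyLoop(0.2, 12)); // False -> mostly idle
    }
}
```

A healthy worker thread spends most of its life blocked on I/O or locks, so its !runaway delta stays far below the wall-clock gap; a spinning one tracks it almost 1:1.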

Switch to thread 14 and dump its call stack with !clrstack: it's stuck inside ConsoleApp2.Program+<>c.<Main>b__0_0(Int32) — the compiler-generated name for the lambda passed to Parallel.For — with no way out...

4. Summary

This post is getting a bit long, so I'll stop here. If you're interested, pull down procdump and have a play with it yourself.