How to Effectively Generate Dumps for the Countless Ways a .NET Program Can Die (Part 1)

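1: Background

I'm sure many people know that you can capture a dump through Task Manager. It is simple and crude, but it cannot cover the countless ways a program can die, for example:

  • Memory bloats until the program blows up
  • CPU spikes until the program works itself to death
  • The application stops responding and the user is furious
  • The process exits unexpectedly, just like life

Since doing it by hand is that weak, is there a better tool? Besides adplus, this article recommends a handy one, procdump. Download it here: https://docs.microsoft.com/zh… . It even supports Linux 😘😘😘. I won't go into the installation details.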

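2: Memory bloats, the program blows up

I believe many of you have run into memory bloat. The case I have seen most often is a small static cache whose entries, intentionally or not, are never released, so they pile up without bound until the process blows up. How do you catch that with procdump?

To make the demo easy, I'll first write an example that allocates memory without limit.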


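        // Needs: using System; using System.Collections.Generic; using System.Linq;
        static void Main(string[] args)
        {
            // The list stays reachable for the whole run, so nothing added to it can be collected.
            List<string> list = new List<string>();

            for (int i = 0; i < int.MaxValue; i++)
            {
                // Each string joins the numbers 0..9999 with commas (48,889 characters).
                list.Add(string.Join(",", Enumerable.Range(0, 10000)));
            }

            Console.ReadLine();
        }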

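Once the program is running, tell procdump to capture a full-memory dump automatically when memory exceeds 1 GB. Here -m 1024 sets the commit threshold in MB and -ma asks for a full dump. The command is as follows.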


C:\Windows\system32>procdump  ConsoleApp2 -m 1024 -ma E:\net5\ConsoleApp1\ConsoleApp2\bin\Debug

ProcDump v10.0 - Sysinternals process dump utility
Copyright (C) 2009-2020 Mark Russinovich and Andrew Richards
Sysinternals - www.sysinternals.com

Process:               ConsoleApp2.exe (24112)
Process image:         E:\net5\ConsoleApp1\ConsoleApp2\bin\Debug\ConsoleApp2.exe
CPU threshold:         n/a
Performance counter:   n/a
Commit threshold:      >= 1024 MB
Threshold seconds:     10
Hung window check:     Disabled
Log debug strings:     Disabled
Exception monitor:     Disabled
Exception filter:      [Includes]
                       *
                       [Excludes]
Terminate monitor:     Disabled
Cloning type:          Disabled
Concurrent limit:      n/a
Avoid outage:          n/a
Number of dumps:       1
Dump folder:           E:\net5\ConsoleApp1\ConsoleApp2\bin\Debug\
Dump filename/mask:    PROCESSNAME_YYMMDD_HHMMSS
Queue to WER:          Disabled
Kill after dump:       Disabled


Press Ctrl-C to end monitoring without terminating the process.

[21:23:43] Commit:    1087Mb
[21:23:43] Dump 1 initiated: E:\net5\ConsoleApp1\ConsoleApp2\bin\Debug\ConsoleApp2.exe_210323_212343.dmp
[21:23:43] Dump 1 writing: Estimated dump file size is 1179 MB.
[21:23:44] Dump 1 complete: 1179 MB written in 1.3 seconds
[21:23:44] Dump count reached.

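The last five lines show that a dump file was generated automatically once commit reached 1087 MB. Next, let's take a look at it in windbg.

  • To check the current process's memory usage, use !address -summary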

0:000> !address -summary

                                     
Mapping file section regions...
Mapping module regions...
Mapping PEB regions...
Mapping TEB and stack regions...
Mapping heap regions...
Mapping page heap regions...
Mapping other regions...
Mapping stack trace database regions...
Mapping activation context regions...

--- Usage Summary ---------------- RgnCount ----------- Total Size -------- %ofBusy %ofTotal
Free                                     63          b30b4000 (2.798 GB)           69.94%
<unknown>                               228          48547000 (1.130 GB)  93.99%   28.25%
Image                                   210           4115000 (65.082 MB)   5.29%    1.59%
Stack                                    21            700000 (7.000 MB)   0.57%    0.17%
Heap                                     12            170000 (1.438 MB)   0.12%    0.04%
Other                                     7             5a000 (360.000 kB)   0.03%    0.01%
TEB                                       7             13000 (76.000 kB)   0.01%    0.00%
PEB                                       1              3000 (12.000 kB)   0.00%    0.00%

--- Type Summary (for busy) ------ RgnCount ----------- Total Size -------- %ofBusy %ofTotal
MEM_PRIVATE                             250          47121000 (1.110 GB)  92.36%   27.76%
MEM_IMAGE                               217           411e000 (65.117 MB)   5.29%    1.59%
MEM_MAPPED                               19           1cfd000 (28.988 MB)   2.35%    0.71%

--- State Summary ---------------- RgnCount ----------- Total Size -------- %ofBusy %ofTotal
MEM_FREE                                 63          b30b4000 (2.798 GB)           69.94%
MEM_COMMIT                              357          47f12000 (1.124 GB)  93.49%   28.10%
MEM_RESERVE                             129           502a000 (80.164 MB)   6.51%    1.96%

--- Protect Summary (for commit) - RgnCount ----------- Total Size -------- %ofBusy %ofTotal
PAGE_READWRITE                          177          437d5000 (1.055 GB)  87.70%   26.36%
PAGE_EXECUTE_READ                        35           33c7000 (51.777 MB)   4.21%    1.26%
PAGE_READONLY                            90            c41000 (12.254 MB)   1.00%    0.30%
PAGE_WRITECOPY                           34            70b000 (7.043 MB)   0.57%    0.17%
PAGE_READWRITE|PAGE_GUARD                14             23000 (140.000 kB)   0.01%    0.00%
PAGE_EXECUTE_READWRITE                    7              7000 (28.000 kB)   0.00%    0.00%

--- Largest Region by Usage ----------- Base Address -------- Region Size ----------
Free                                        80010000          7f130000 (1.986 GB)
<unknown>                                   438e1000           200f000 (32.059 MB)
Image                                       660e0000            f55000 (15.332 MB)
Stack                                         e00000             fd000 (1012.000 kB)
Heap                                          c97000             98000 (608.000 kB)
Other                                       ff2c0000             33000 (204.000 kB)
TEB                                           990000              3000 (12.000 kB)
PEB                                           98d000              3000 (12.000 kB)

See the (1.055 GB) on the PAGE_READWRITE row above? It lines up closely with the 1087 MB reported in the console, so everything checks out.
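To check that comparison by hand, the hexadecimal byte counts in the Total Size column can be converted to MB or GB. The following is only a small sketch: `RegionSize` and its method names are made up for illustration and are not part of windbg or procdump.

```csharp
using System;
using System.Globalization;

// Hypothetical helper for illustration only; not part of any debugger API.
static class RegionSize
{
    // Parses a hexadecimal byte count as printed by !address -summary (no 0x prefix).
    public static long ParseHexBytes(string hex) =>
        long.Parse(hex, NumberStyles.HexNumber, CultureInfo.InvariantCulture);

    // Converts bytes to MB using 1 MB = 1024 * 1024 bytes.
    public static double ToMegabytes(long bytes) => bytes / (1024.0 * 1024.0);

    // Converts bytes to GB using 1 GB = 1024 * 1024 * 1024 bytes.
    public static double ToGigabytes(long bytes) => bytes / (1024.0 * 1024.0 * 1024.0);
}
```

For the PAGE_READWRITE row, `RegionSize.ParseHexBytes("437d5000")` gives 1132285952 bytes, which is about 1080 MB, or 1.055 GB, the same figure windbg prints in parentheses.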

  • To find large objects on the managed heap, use !dumpheap -stat -min 1024

||0:0:000> !dumpheap -stat -min 1024
Statistics:
      MT    Count    TotalSize Class Name
65d42788        2        13044 System.Object[]
65d42d74        2        98328 System.String[]
65d42c60       73      1082988 System.Char[]
65d424e4    11452   1119913984 System.String

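The last line of the output shows more than 10,000 System.String instances (11452). Next, add -type to restrict the listing to System.String and raise -min to 10240 so that only strings larger than 10 KB remain.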


0:000> !dumpheap -type System.String -min 10240
 Address       MT     Size
03c75568 65d424e4    97792     
03c8d378 65d424e4    97792    
4a855060 65d424e4    97792     

Statistics:
      MT    Count    TotalSize Class Name
65d424e4    11452   1119913984 System.String
Total 11452 objects

0:000> !gcroot 4a855060
Thread 36e4:
*** WARNING: Unable to verify checksum for ConsoleApp2.exe
    00b3f358 012108d1 ConsoleApp2.Program.Main(System.String[]) [E:\net5\ConsoleApp1\ConsoleApp2\Program.cs @ 18]
        ebp+18: 00b3f370
            ->  02c71fd8 System.Collections.Generic.List`1[[System.String, mscorlib]]
            ->  02cce2ec System.String[]
            ->  4a855060 System.String

Found 1 unique roots (run '!GCRoot -all' to see all roots).

The final !gcroot confirms that the string is held by the List at Program.cs line 18, which settles the question.

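3: CPU spikes, the program works itself to death

Speaking of CPU spikes, most of the cases I have seen come from the unmanaged side, such as GC activity or lock contention. Few people are careless enough to drive the CPU up from managed code.

By the way, there is a small trick for analyzing CPU spikes: capture several dump snapshots in a row and compare what the threads are doing across two of them. procdump is a great fit for this. First, the test code.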


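    // Needs: using System; using System.Threading.Tasks;
    class Program
    {
        static void Main(string[] args)
        {
            // Every worker spins forever, so Parallel.For never returns.
            Parallel.For(0, int.MaxValue, (i) =>
            {
                while (true) { }
            });

            Console.ReadLine();
        }
    }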

Now I tell procdump to capture a dump whenever CPU stays above 70% for 5 consecutive seconds (-c 70 -s 5), stopping after 2 dumps (-n 2).


C:\Windows\system32>procdump  ConsoleApp2 -s 5 -n 2 -c 70 E:\net5\ConsoleApp1\ConsoleApp2\bin\Debug

ProcDump v10.0 - Sysinternals process dump utility
Copyright (C) 2009-2020 Mark Russinovich and Andrew Richards
Sysinternals - www.sysinternals.com

Process:               ConsoleApp2.exe (22152)
Process image:         E:\net5\ConsoleApp1\ConsoleApp2\bin\Debug\ConsoleApp2.exe
CPU threshold:         >= 70% of system
Performance counter:   n/a
Commit threshold:      n/a
Threshold seconds:     5
Hung window check:     Disabled
Log debug strings:     Disabled
Exception monitor:     Disabled
Exception filter:      [Includes]
                       *
                       [Excludes]
Terminate monitor:     Disabled
Cloning type:          Disabled
Concurrent limit:      n/a
Avoid outage:          n/a
Number of dumps:       2
Dump folder:           E:\net5\ConsoleApp1\ConsoleApp2\bin\Debug\
Dump filename/mask:    PROCESSNAME_YYMMDD_HHMMSS
Queue to WER:          Disabled
Kill after dump:       Disabled


Press Ctrl-C to end monitoring without terminating the process.

[22:25:47] CPU: 95% 1s
[22:25:48] CPU: 100% 2s
[22:25:50] CPU: 96% 3s
[22:25:51] CPU: 98% 4s
[22:25:52] CPU: 99% 5s (Trigger)
[22:25:53] Dump 1 initiated: E:\net5\ConsoleApp1\ConsoleApp2\bin\Debug\ConsoleApp2.exe_210323_222553.dmp
[22:25:54] Dump 1 complete: 5 MB written in 0.3 seconds
[22:25:56] CPU: 88% 1s
[22:25:58] CPU: 93% 2s
[22:26:00] CPU: 89% 3s
[22:26:02] CPU: 89% 4s
[22:26:04] CPU: 95% 5s (Trigger)
[22:26:05] Dump 2 initiated: E:\net5\ConsoleApp1\ConsoleApp2\bin\Debug\ConsoleApp2.exe_210323_222605.dmp
[22:26:06] Dump 2 complete: 5 MB written in 0.4 seconds
[22:26:07] Dump count reached.

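The output shows that a dump was captured each time CPU stayed above 70% for 5 consecutive seconds, 2 in total.

Now that we have the dumps, open them in two windbg instances and check when each one was generated, as shown in the screenshot below:

The screenshot shows that the two dumps were generated 12s apart, and !runaway reveals the following threads:

  • 14:2cb8
  • 19:3f8c

Both of them ran for as long as 10s. What does that tell us? These two threads are most likely stuck in an infinite loop somewhere.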

Switch to thread 14 (~14s) and look at the call stack with !clrstack. The threads are all stuck in ConsoleApp2.Program+<>c.<Main>b__0_0(Int32) and never get out.
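The two-snapshot comparison can be sketched in code as well. This is only an illustration and makes several assumptions: the per-thread CPU times have already been copied out of the two !runaway listings by hand, and `RunawayDiff`, `FindBusyThreads` and `minShare` are hypothetical names, not part of windbg or SOS.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper for illustration only; windbg has no such API.
static class RunawayDiff
{
    // Returns the threads whose CPU time grew by at least `minShare` of the
    // wall-clock gap between two snapshots. Keys are thread labels such as "14:2cb8".
    public static List<string> FindBusyThreads(
        IReadOnlyDictionary<string, TimeSpan> first,
        IReadOnlyDictionary<string, TimeSpan> second,
        TimeSpan wallClockGap,
        double minShare)
    {
        var busy = new List<string>();
        foreach (var pair in second)
        {
            // A thread missing from the first snapshot started in between; count it from zero.
            TimeSpan before = first.TryGetValue(pair.Key, out var t) ? t : TimeSpan.Zero;
            TimeSpan delta = pair.Value - before;
            if (delta.TotalSeconds >= wallClockGap.TotalSeconds * minShare)
            {
                busy.Add(pair.Key);
            }
        }
        busy.Sort(StringComparer.Ordinal);
        return busy;
    }
}
```

With a 12s gap and a `minShare` of 0.8, a thread would have to accumulate at least 9.6s of CPU time between the dumps to be reported. A thread that burned about 10s in that window, like the two above, would be flagged, while an idle thread would not.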

4: Summary

This is getting a bit long, so I'll stop here. If you're interested, grab procdump and play around with it 🤭.
