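Background
A C program I had been developing recently dumped core in production, but when I loaded the core file into GDB, the debug information was incomplete.
Missing debug information of this kind comes down to missing debug symbols, and there are usually two causes: the program was compiled without -g, or the DWARF version is wrong.
The first is easy to rule out: I wrote the build script myself, and -g is there. That leaves the DWARF version as the only place where things could have gone wrong.
The DWARF mismatch in turn comes from the build environment. To also build a separate C++ project that needs C++17, I had containerized the build and installed GCC 11.3 in the image, while the GDB used in production is still the distribution's old 7.6.1 build. That GDB is too old for the debug information GCC 11.3 emits by default, which is what caused the problem.
To verify this, I reproduced the behavior on a physical machine: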
[root@ck08 ctest]# gcore `pidof flow`
Dwarf Error: wrong version in compilation unit header (is 5, should be 2, 3, or 4) [in module /root/chenyc/src/flow/flow]
[New LWP 3048]
[New LWP 3047]
[New LWP 3046]
[New LWP 3045]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
0x00007f50dfd850e3 in epoll_wait () from /lib64/libc.so.6
warning: target file /proc/3044/cmdline contained unexpected null characters
Saved corefile core.3044
[Inferior 1 (process 3044) detached]
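The GDB on my physical machine is the same 7.6.1 build. When I generated a core file with gcore, it printed the warning above: Dwarf Error: wrong version in compilation unit header (is 5, should be 2, 3, or 4). The message is easy to read literally: this GDB understands DWARF versions 2, 3 and 4, but the debug information in the binary is version 5, so GDB cannot use it.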
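When a saved GDB session log is all that is available, the affected binaries and the DWARF version GDB found in them can be pulled out of these messages. The following is a minimal sketch that assumes the message format shown above; the function name dwarf_error_modules is hypothetical.

```shell
# Reads GDB output on stdin and prints "<module> <dwarf-version>" once for
# each module that triggered the "wrong version" Dwarf Error.
# Assumption: the message layout matches the one printed by the GDB above.
dwarf_error_modules() {
    sed -n 's/.*Dwarf Error: wrong version in compilation unit header (is \([0-9][0-9]*\),.*\[in module \(.*\)].*/\2 \1/p' |
        sort -u
}
```

A possible invocation is gdb -batch ./flow core.3044 2>&1 | dwarf_error_modules, which for the session above would list /root/chenyc/src/flow/flow with version 5.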
So what is DWARF, and what is a DWARF version?
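What is DWARF
DWARF is a debugging data format; you can think of it simply as the way debug information is organized inside the binary. Other well-known debugging formats include stabs, COFF and PDB.
Apart from PDB, which is specific to Windows, most of these formats were used on Unix systems. Over time DWARF displaced the rest and is now supported by all mainstream compilers; the other formats still turn up occasionally but are effectively legacy.
DWARF itself has gone through several stages and five versions since it first appeared in 1992. DWARF 1, the first version, was not compact and was functionally immature, and many compilers no longer support it. DWARF 2 followed in 1993, when PLSIG reworked the first version to shrink the debug information, but it only ever existed as a draft and was never formally published.
The first formally published version was DWARF 3, released by the Free Standards Group in 2005. DWARF 4 followed in 2010, and the latest version at the time of writing is DWARF 5, released in 2017.
The GCC manual describes the option that selects the version as follows: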
Produce debugging information in DWARF format (if that is supported). The value of version may be either 2, 3, 4 or 5; the default version for most targets is 5 (with the exception of VxWorks, TPF and Darwin/Mac OS X, which default to version 2, and AIX, which defaults to version 4).
Note that with DWARF Version 2, some ports require and always use some non-conflicting DWARF 3 extensions in the unwind tables.
Version 4 may require GDB 7.0 and -fvar-tracking-assignments for maximum benefit. Version 5 requires GDB 8.0 or higher.
GCC no longer supports DWARF Version 1, which is substantially different than Version 2 and later. For historical reasons, some other DWARF-related options (such as -fno-dwarf2-cfi-asm) retain a reference to DWARF Version 2 in their names, but apply to all currently-supported versions of DWARF.
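This article does not go into the DWARF file format itself; a whole series would not be enough to cover it. What matters here is that the data layout differs between DWARF versions, so a debugger that predates a version cannot read it. That incompatibility is what produced the error shown at the beginning of this article.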
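The GDB 8.0 threshold quoted from the GCC manual can be turned into a quick check on a target machine. This is only a sketch: the function names are hypothetical, it looks at the major version alone, and it assumes the version number is the last space-separated token on the first line of gdb --version, as it is in the banners shown in this article. Distribution builds with backported patches may behave differently.

```shell
# Reads `gdb --version` output on stdin and prints the major version number.
gdb_major_version() {
    sed -n '1s/.* \([0-9][0-9]*\)\.[0-9].*/\1/p'
}

# Succeeds when the GDB described on stdin is 8.x or newer, the version the
# GCC manual names as the minimum for DWARF 5.
gdb_reads_dwarf5() {
    major=$(gdb_major_version)
    [ -n "$major" ] && [ "$major" -ge 8 ]
}
```

For example, gdb --version | gdb_reads_dwarf5 || echo "build with -gdwarf-4" prints the hint on a machine whose GDB is older than 8.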
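How to specify the DWARF version
With the cause identified, how do we fix it?
Do I have to downgrade GCC? Or force customers to upgrade the GDB in their production environments? Neither is realistic.
Fortunately, GCC has an option for choosing the DWARF version: add -gdwarf-version to the compile command, where version is the version number (for example, -gdwarf-4).
To demonstrate, I prepared a small demo.
The C program: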
//hello.c
#include <stdio.h>

int main(void){
    char *p = "hello";
    printf("p = %s\n", p);
    p[3] = 'M';   /* deliberate write to a string literal, to force a crash */
    printf("p = %s\n", p);
    return 0;
}
The GCC version inside the container:
[root@5b2c03891f42 tmp]# gcc -v
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/local/libexec/gcc/x86_64-pc-linux-gnu/11.3.0/lto-wrapper
Target: x86_64-pc-linux-gnu
Configured with: ./configure --enable-languages=c,c++
Thread model: posix
Supported LTO compression algorithms: zlib
gcc version 11.3.0 (GCC)
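Compile inside the container:
gcc -o hello hello.c -g
The program writes to a string literal, which is undefined behavior and in practice segfaults on Linux, so it reliably produces a core file. When it is run outside the container, the resulting core file cannot be debugged: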
[root@ck08 ctest]# ulimit -c unlimited
[root@ck08 ctest]# ./hello
p = hello
Segmentation fault (core dumped)
[root@ck08 ctest]# gdb ./hello core.30856
GNU gdb (GDB) Red Hat Enterprise Linux 7.6.1-120.el7
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /root/chenyc/src/ctest/hello...Dwarf Error: wrong version in compilation unit header (is 5, should be 2, 3, or 4) [in module /root/chenyc/src/ctest/hello]
(no debugging symbols found)...done.
[New LWP 30856]
Core was generated by `./hello'.
Program terminated with signal 11, Segmentation fault.
#0  0x0000000000401164 in main ()
Missing separate debuginfos, use: debuginfo-install glibc-2.17-326.el7_9.x86_64
(gdb) bt
#0  0x0000000000401164 in main ()
(gdb)
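Now compile again with an explicit DWARF version:
gcc -gdwarf-4 -gstrict-dwarf -fvar-tracking-assignments -o hello hello.c
where:
- -gdwarf-4 sets the DWARF version to 4.
- -fvar-tracking-assignments annotates assignments to user variables early in the compilation and tries to carry the annotations through to the end, in order to improve debug information in optimized code.
- -gstrict-dwarf disallows extensions from DWARF versions later than the one selected.
With this build, the core file can be debugged normally. In theory, then, all we need to do is specify the DWARF version when building the project.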
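If the problem were that simple, though, it would hardly deserve an article. In practice I ran into something far more puzzling.
Mystery upon mystery
The build output confirms that I really was compiling with -gdwarf-4. Yet GDB still reported a Dwarf Error when I loaded the core file: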
[root@ck08 flow]# gdb ./flow core.10772
GNU gdb (GDB) Red Hat Enterprise Linux 7.6.1-120.el7
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /root/chenyc/src/flow/flow...Dwarf Error: wrong version in compilation unit header (is 5, should be 2, 3, or 4) [in module /root/chenyc/src/flow/flow]
(no debugging symbols found)...done.
[New LWP 10773]
[New LWP 10774]
[New LWP 10775]
[New LWP 10776]
[New LWP 10772]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `./flow'.
#0  0x00007f13b9ae7a35 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
Missing separate debuginfos, use: debuginfo-install glibc-2.17-326.el7_9.x86_64
(gdb) bt
#0  0x00007f13b9ae7a35 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00000000004117d5 in nxlog_worker_thread ()
#2  0x000000000040cdd5 in _thread_helper ()
#3  0x00007f13b9ae3ea5 in start_thread () from /lib64/libpthread.so.0
#4  0x00007f13b9400b0d in clone () from /lib64/libc.so.6
(gdb)
So where is the problem? Why does the DWARF version setting appear to have no effect?
To confirm that the setting really was applied, I inspected the binary with objdump:
[root@ck08 flow]# objdump --dwarf=info ./flow|more
./flow:     file format elf64-x86-64
Contents of the .debug_info section:
  Compilation Unit @ offset 0x0:
   Length:        0x3e07 (32-bit)
   Version:       4
   Abbrev Offset: 0x0
   Pointer Size:  8
 <0><b>: Abbrev Number: 1 (DW_TAG_compile_unit)
    <c>   DW_AT_producer    : (indirect string, offset: 0x31f): GNU C17 11.3.0 -mtune=generic -march=x86-64 -g -gdwarf-4 -gstrict-dwarf -O2 -fPIC
    <10>   DW_AT_language    : 12 (ANSI C99)
    <11>   DW_AT_name        : (indirect string, offset: 0x16ac): src/core/protocol.c
    <15>   DW_AT_comp_dir    : (indirect string, offset: 0x1c15): /tmp
    <19>   DW_AT_low_pc      : 0x4090c0
    <21>   DW_AT_high_pc     : 0x127c
    <29>   DW_AT_stmt_list   : 0x0
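This shows that the debug information generated for src/core/protocol.c really is DWARF 4. So why does GDB still complain that the version is 5?
Could a third-party library that the program depends on be using DWARF 5?
With that question in mind, I checked the Version field of every compilation unit and found that some of them were indeed DWARF 5.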
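One way to get such an overview, not necessarily the command used originally, is to count the compilation units per DWARF version. The sketch below assumes binutils objdump output in the layout shown above; the function name dwarf_version_summary is hypothetical. Anchoring the match to the start of the line keeps symbol names that merely contain the word Version out of the count.

```shell
# Reads `objdump --dwarf=info` output on stdin and prints one line per DWARF
# version found in the compilation unit headers: "<version> <unit count>".
dwarf_version_summary() {
    awk '/^[ \t]*Version:/ { count[$2]++ }
         END { for (v in count) print v, count[v] }' |
        sort -n
}
```

It would be used as objdump --dwarf=info ./flow | dwarf_version_summary; any line whose first field is 5 means that at least one unit was compiled without the version flag.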
First, dump the .debug_info section to a file:
objdump --dwarf=info ./flow > dwarf.info
Jumping straight to line 754527 of the dump shows that the DWARF 5 units come from the bzip2 library.
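Instead of searching the dump by line number, the source files behind the offending units can be listed directly. This is a sketch under two assumptions: the objdump layout is the one shown earlier, and the first DW_AT_name after each unit header belongs to that unit's DW_TAG_compile_unit entry. The function name dwarf_units_with_version is hypothetical.

```shell
# Reads `objdump --dwarf=info` output on stdin and prints the source file
# name of every compilation unit whose header carries the given version.
dwarf_units_with_version() {
    awk -v want="$1" '
        # A unit header: remember its version and wait for the unit name.
        /^[ \t]*Version:/ { ver = $2; pending = 1; next }
        # First DW_AT_name after the header: keep the text after the last ": ".
        pending && /DW_AT_name/ {
            name = $0
            sub(/.*: /, "", name)
            if (ver == want) print name
            pending = 0
        }
    '
}
```

Running objdump --dwarf=info ./flow | dwarf_units_with_version 5 would then print only the files that were compiled as DWARF 5.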
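To verify this, I located libbz2 inside the container, and it was indeed the culprit (the final DW_AT_name line appears only because the symbol name BZ2_bzlibVersion also matches the grep pattern):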
[root@5703f261ff2b lib]# objdump --dwarf=info libbz2.a|grep Version
   Version:       5
   Version:       5
   Version:       5
   Version:       5
   Version:       5
   Version:       5
   Version:       5
    <1760>   DW_AT_name        : (indirect string, offset: 0x650): BZ2_bzlibVersion
[root@5703f261ff2b lib]#
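This raises a question: the third-party dependencies are all built inside the container, and the CC environment variable is set once before any of them is compiled: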
[root@5703f261ff2b tmp]# echo $CC
gcc -gdwarf-4 -gstrict-dwarf -fvar-tracking-assignments
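According to the Dockerfile, CC is set first, and then openssl, libapr and bzip2 are built in that order. So why are the other dependencies fine while the setting has no effect on bzip2 alone? libssl, for example, is DWARF 4 throughout: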
[root@5703f261ff2b lib]# objdump --dwarf=info libssl.a|grep Version
   Version:       4
   Version:       4
   Version:       4
   Version:       4
   Version:       4
   Version:       4
   Version:       4
   Version:       4
   Version:       4
   Version:       4
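So the cause had to be in the bzip2 source itself. I unpacked the bzip2 source tarball again and found that it ships no configure script, only a Makefile. Opening the Makefile gave the game away: it assigns CC=gcc itself.
We had set CC in the environment, but the assignment inside the Makefile overrides it, so bzip2 was compiled with plain gcc and therefore with GCC's default DWARF version. Our GCC is 11.3, so the default was DWARF 5.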
The bzip2 developers took a shortcut here. A more defensive way to write it would be:
CC ?= gcc
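The precedence at work here can be reproduced in isolation. The sketch below assumes GNU make is installed; the throwaway Makefile and the helper name demo_cc_override are made up for illustration and are not taken from bzip2. It shows that a CC set in the environment loses to an assignment inside the Makefile, whereas a CC given on the make command line wins.

```shell
demo_cc_override() {
    tmpdir=$(mktemp -d) || return 1
    # Hypothetical Makefile that hard-codes CC and has one target printing it.
    printf 'CC=gcc\nshow:\n\t@echo $(CC)\n' > "$tmpdir/Makefile"
    # CC from the environment: the assignment in the Makefile takes precedence.
    echo "env: $(CC='gcc -gdwarf-4' make -s -f "$tmpdir/Makefile" show)"
    # CC on the make command line: overrides the assignment in the Makefile.
    echo "arg: $(make -s -f "$tmpdir/Makefile" show CC='gcc -gdwarf-4')"
    rm -rf "$tmpdir"
}
# demo_cc_override prints:
#   env: gcc
#   arg: gcc -gdwarf-4
```

For a Makefile that hard-codes CC, passing CC on the make command line is therefore an alternative to editing the file.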
After changing the Makefile accordingly and rebuilding, the result is correct:
[root@5703f261ff2b bzip2-1.0.8]# objdump --dwarf=info libbz2.a|grep Version
   Version:       4
   Version:       4
   Version:       4
   Version:       4
   Version:       4
   Version:       4
   Version:       4
    <1482>   DW_AT_name        : (indirect string, offset: 0x60c): BZ2_bzlibVersion
I rebuilt the program against the new bzip2 library. Generating a core file with gcore no longer reports a Dwarf Error:
[root@ck08 flow]# gcore `pidof flow`
[New LWP 25963]
[New LWP 25962]
[New LWP 25961]
[New LWP 25960]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
0x00007f704555fb43 in select () from /lib64/libc.so.6
warning: target file /proc/25959/cmdline contained unexpected null characters
Saved corefile core.25959
[Inferior 1 (process 25959) detached]
Debugging this core file with GDB now gives full debug information as well:
[root@ck08 flow]# gdb ./flow core.25959
GNU gdb (GDB) Red Hat Enterprise Linux 7.6.1-120.el7
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /root/chenyc/src/flow/flow...done.
[New LWP 25960]
[New LWP 25961]
[New LWP 25962]
[New LWP 25963]
[New LWP 25959]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `./flow'.
#0  0x00007f7045c52efd in open64 () from /lib64/libpthread.so.0
Missing separate debuginfos, use: debuginfo-install glibc-2.17-326.el7_9.x86_64
(gdb) bt
#0  0x00007f7045c52efd in open64 () from /lib64/libpthread.so.0
#1  0x000000000049b731 in apr_file_open (new=0x7f7034003320,
    fname=0x7f7034002ad0 "/root/chenyc/test/dc/mave/probes/itoa-flow/data/utf-8_nolb.log", flag=1, perm=<optimized out>,
    pool=0x7f7034003288) at file_io/unix/open.c:176
#2  0x000000000041c1b9 in im_file_ext_input_open (module=0x2313a00, file=0x7f7045253fd8, finfo=0x7f704524eaa0, readfromlast=false,
    existed=true) at src/modules/input/fileExt/im_fileExt.c:976
#3  0x000000000041f51f in im_file_ext_check_file (module=<optimized out>, file=<optimized out>, fname=<optimized out>,
    pool=<optimized out>) at src/modules/input/fileExt/im_fileExt.c:1315
#4  0x0000000000420294 in im_file_ext_check_files (module=0x2313a00, active_only=<optimized out>)
    at src/modules/input/fileExt/im_fileExt.c:1475
#5  0x000000000042076b in im_file_ext_read (module=0x2313a00) at src/modules/input/fileExt/im_fileExt.c:2981
#6  0x00000000004208f8 in im_file_ext_event (module=0x2313a00, event=0x7f702c0008c0) at src/modules/input/fileExt/im_fileExt.c:3583
#7  0x00000000004118da in nxlog_worker_thread (thd=0x22f1c08, data=<optimized out>) at src/core/nxlog.c:552
#8  0x000000000040cdd5 in _thread_helper (thd=0x22f1c08, d=0x7ffc646c4050) at src/core/core.c:85
#9  0x00007f7045c4bea5 in start_thread () from /lib64/libpthread.so.0
#10 0x00007f7045568b0d in clone () from /lib64/libc.so.6
(gdb)
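Summary
Much of what is written online about Dwarf Error is vague or only half understood, and there are plenty of pitfalls once you dig in. In general, the following two points are enough to find a fix:
- Dwarf Error usually means that the GCC in the build environment and the GDB in the debugging environment do not match; it can usually be resolved by specifying the DWARF version at compile time.
- Besides our own source code, the third-party libraries the program depends on must also be compiled with the specified DWARF version.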
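Both points can be enforced mechanically at the end of a build, so that a dependency which ignores the flags is caught before it reaches production. A minimal sketch, assuming binutils objdump is available in the build image; the function name assert_max_dwarf_version is hypothetical.

```shell
# Reads `objdump --dwarf=info` output on stdin and fails (exit status 1) when
# any compilation unit header carries a DWARF version above the given maximum.
assert_max_dwarf_version() {
    awk -v max="$1" '
        /^[ \t]*Version:/ && $2 + 0 > max + 0 { bad++ }
        END {
            if (bad) {
                printf "%d unit(s) newer than DWARF %d\n", bad, max
                exit 1
            }
        }
    '
}
```

A build script could run objdump --dwarf=info libbz2.a | assert_max_dwarf_version 4 for every static library and for the final binary, and stop on the first failure.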
References
- https://gcc.gnu.org/onlinedocs/gcc/Debugging-Options.html
- https://zhuanlan.zhihu.com/p/419908664