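Background

Recently a C program I develop produced a core dump in production, but when I debugged the core file, the debug information gdb printed was incomplete.

Missing debug information, put plainly, means the symbol information is missing. There are generally two causes: the program was compiled without the -g option, or the DWARF version is one the debugger cannot read.

The first can be ruled out right away: I wrote the build script myself, and -g is there. That leaves the DWARF version as the only likely culprit.

The DWARF version mismatch, in turn, comes from the build environment. To also build another project written in C++ that needs the C++17 standard, I containerized the build environment and installed gcc 11.3 in the image, while the gdb used in production is still the old 7.6.1 that ships with the distribution. The gcc and gdb versions do not match, and that is what causes the problem.

To verify this, I reproduced the behavior on a physical machine: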

    [root@ck08 ctest]# gcore `pidof flow`
    Dwarf Error: wrong version in compilation unit header (is 5, should be 2, 3, or 4) [in module /root/chenyc/src/flow/flow]
    [New LWP 3048]
    [New LWP 3047]
    [New LWP 3046]
    [New LWP 3045]
    [Thread debugging using libthread_db enabled]
    Using host libthread_db library "/lib64/libthread_db.so.1".
    0x00007f50dfd850e3 in epoll_wait () from /lib64/libc.so.6
    warning: target file /proc/3044/cmdline contained unexpected null characters
    Saved corefile core.3044
    [Inferior 1 (process 3044) detached]

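The gdb on my physical machine is the same version. When I generated a core file with the gcore command, the warning above appeared: Dwarf Error: wrong version in compilation unit header (is 5, should be 2, 3, or 4). The message is easy to read literally: this gdb supports DWARF versions 2, 3, and 4, but the binary uses DWARF version 5, so it cannot be debugged.

So what is DWARF, and what is a DWARF version?

What is DWARF

DWARF is a debugging data format. You can think of it simply as the way debug information is organized. Besides DWARF, common debugging formats include stabs, COFF, and PDB.

Apart from PDB, which is specific to Windows, most debugging formats support Unix systems. Over time, though, DWARF has come to dominate and is supported by all the mainstream compilers. The other formats still turn up here and there, but they are barely hanging on and survive in name only.

DWARF itself has gone through several stages: since DWARF 1 appeared in 1992, it has been through 5 versions. DWARF 1, the first version, was neither compact nor mature, and many compilers no longer support it. DWARF 2 was produced in 1993 by the PLSIG, which optimized the first version and reduced the size of the debug information, but it only ever existed as a draft and was never formally released.

The first formally released version was DWARF 3, published by the Free Standards Group in 2005. DWARF 4 followed in 2010. The latest version at the time of writing is DWARF 5, published in 2017.

The GCC documentation describes the -gdwarf-version option as follows: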
Produce debugging information in DWARF format (if that is supported). The value of version may be either 2, 3, 4 or 5; the default version for most targets is 5 (with the exception of VxWorks, TPF and Darwin/Mac OS X, which default to version 2, and AIX, which defaults to version 4).
Note that with DWARF Version 2, some ports require and always use some non-conflicting DWARF 3 extensions in the unwind tables.
Version 4 may require GDB 7.0 and -fvar-tracking-assignments for maximum benefit. Version 5 requires GDB 8.0 or higher.
GCC no longer supports DWARF Version 1, which is substantially different than Version 2 and later. For historical reasons, some other DWARF-related options such as -fno-dwarf2-cfi-asm retain a reference to DWARF Version 2 in their names, but apply to all currently-supported versions of DWARF.
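
I will not go into the DWARF file format itself here; covering it properly would take far more than one article. What matters is that the data format differs between DWARF versions, which makes them incompatible with one another, and that is what produces the problem described at the start of this article.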
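
The version requirements quoted above can be turned into a small helper for build scripts. The sketch below is only illustrative: dwarf_flag_for_gdb is a hypothetical function name, the thresholds come straight from the GCC text quoted above (DWARF 5 requires GDB 8.0 or higher, DWARF 4 may require GDB 7.0), and the fallback to DWARF 2 for anything older is a conservative assumption rather than something the documentation states.

```shell
#!/bin/sh
# dwarf_flag_for_gdb (hypothetical helper): print a -gdwarf-N flag suited to
# the gdb that will later debug the binary.
# Argument: the first line of `gdb --version` from the target machine.
dwarf_flag_for_gdb() {
    # Extract the major number of the first dotted version in the banner.
    major=$(printf '%s\n' "$1" | sed -E 's/^[^0-9]*([0-9]+)\..*$/\1/')
    case "$major" in
        ''|*[!0-9]*)
            echo "cannot parse gdb version: $1" >&2
            return 1 ;;
    esac
    if [ "$major" -ge 8 ]; then
        echo "-gdwarf-5"
    elif [ "$major" -eq 7 ]; then
        echo "-gdwarf-4"
    else
        echo "-gdwarf-2"   # assumption: stay conservative for pre-7.0 gdb
    fi
}

# The gdb banner seen in this article:
dwarf_flag_for_gdb "GNU gdb (GDB) Red Hat Enterprise Linux 7.6.1-120.el7"
```

For the gdb 7.6.1 banner this prints -gdwarf-4, which matches the versions that gdb itself lists in its error message.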
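
How to specify the DWARF version

So the cause has been located. How do we fix it?

Do I have to downgrade gcc? And we can hardly force customers to upgrade the gdb in their production environments. Neither is realistic.

Fortunately, gcc provides an option for choosing the DWARF version: just add -gdwarf-version at compile time, where version is the DWARF version number (for example, -gdwarf-4).

To demonstrate, I prepared a small demo.

The C program is as follows: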

    //hello.c
    #include <stdio.h>

    int main(void)
    {
        char *p = "hello";
        printf("p = %s\n", p);
        p[3] = 'M';   /* deliberate bug: writing to a string literal crashes the program */
        printf("p = %s\n", p);
        return 0;
    }

The gcc version inside the container:

    [root@5b2c03891f42 tmp]# gcc -v
    Using built-in specs.
    COLLECT_GCC=gcc
    COLLECT_LTO_WRAPPER=/usr/local/libexec/gcc/x86_64-pc-linux-gnu/11.3.0/lto-wrapper
    Target: x86_64-pc-linux-gnu
    Configured with: ./configure --enable-languages=c,c++
    Thread model: posix
    Supported LTO compression algorithms: zlib
    gcc version 11.3.0 (GCC)

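Compile it inside the container:

    gcc -o hello hello.c -g

This program is guaranteed to crash and dump core. We run it outside the container, and the resulting core file cannot be debugged: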

    [root@ck08 ctest]# ulimit -c unlimited
    [root@ck08 ctest]# ./hello
    p = hello
    Segmentation fault (core dumped)
    [root@ck08 ctest]# gdb ./hello core.30856
    GNU gdb (GDB) Red Hat Enterprise Linux 7.6.1-120.el7
    Copyright (C) 2013 Free Software Foundation, Inc.
    License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
    This is free software: you are free to change and redistribute it.
    There is NO WARRANTY, to the extent permitted by law. Type "show copying"
    and "show warranty" for details.
    This GDB was configured as "x86_64-redhat-linux-gnu".
    For bug reporting instructions, please see:
    <http://www.gnu.org/software/gdb/bugs/>...
    Reading symbols from /root/chenyc/src/ctest/hello...Dwarf Error: wrong version in compilation unit header (is 5, should be 2, 3, or 4) [in module /root/chenyc/src/ctest/hello]
    (no debugging symbols found)...done.
    [New LWP 30856]
    Core was generated by `./hello'.
    Program terminated with signal 11, Segmentation fault.
    #0  0x0000000000401164 in main ()
    Missing separate debuginfos, use: debuginfo-install glibc-2.17-326.el7_9.x86_64
    (gdb) bt
    #0  0x0000000000401164 in main ()
    (gdb)

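Now let's compile with an explicit DWARF version:

    gcc -gdwarf-4 -gstrict-dwarf -fvar-tracking-assignments -o hello hello.c

where:

- -gdwarf-4: sets the DWARF version to 4.
- -fvar-tracking-assignments: annotates assignments to user variables early in compilation and tries to carry those annotations through to the end of compilation, in an attempt to improve debug information while optimizing.
- -gstrict-dwarf: disallows extensions from DWARF versions later than the one selected with -gdwarf-version.

With this build, the core file can be debugged normally.

So in theory, all we need to do is specify the DWARF version when building the project, and debugging works.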
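
Before shipping a build, it can be worth checking what the compiler actually emitted rather than trusting the flags. The following is a minimal sketch of such a check; it assumes gcc and binutils objdump are on the PATH, and the temporary file names and the tiny source file are made up for illustration. It compiles to an object file (with -c) so that only this one compilation unit is inspected.

```shell
#!/bin/sh
# Sketch: compile one object file with DWARF 4 and read back the version
# recorded in its compilation unit header.
# Assumes gcc and binutils objdump are installed; file names are made up.
work=$(mktemp -d)
printf 'int answer(void) { return 42; }\n' > "$work/t.c"

# -c stops after compilation, so no other objects or libraries are mixed in.
gcc -gdwarf-4 -gstrict-dwarf -c -o "$work/t.o" "$work/t.c"

# The "Version:" field of the first compilation unit header in .debug_info.
cu_version=$(objdump --dwarf=info "$work/t.o" | awk '$1 == "Version:" { print $2; exit }')
echo "t.o uses DWARF version $cu_version"
rm -rf "$work"
```

The same objdump pipeline can be pointed at any object file, static library, or executable produced by the real build.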
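
But if the problem were that easy to solve, there would be no need for a whole article. In practice, I ran into something rather mysterious.

Mystery upon mystery

An excerpt of the build output shows that I really did use DWARF 4:

Yet at run time, gdb still reports Dwarf Error: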

    [root@ck08 flow]# gdb ./flow core.10772
    GNU gdb (GDB) Red Hat Enterprise Linux 7.6.1-120.el7
    Copyright (C) 2013 Free Software Foundation, Inc.
    License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
    This is free software: you are free to change and redistribute it.
    There is NO WARRANTY, to the extent permitted by law. Type "show copying"
    and "show warranty" for details.
    This GDB was configured as "x86_64-redhat-linux-gnu".
    For bug reporting instructions, please see:
    <http://www.gnu.org/software/gdb/bugs/>...
    Reading symbols from /root/chenyc/src/flow/flow...Dwarf Error: wrong version in compilation unit header (is 5, should be 2, 3, or 4) [in module /root/chenyc/src/flow/flow]
    (no debugging symbols found)...done.
    [New LWP 10773]
    [New LWP 10774]
    [New LWP 10775]
    [New LWP 10776]
    [New LWP 10772]
    [Thread debugging using libthread_db enabled]
    Using host libthread_db library "/lib64/libthread_db.so.1".
    Core was generated by `./flow'.
    #0  0x00007f13b9ae7a35 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
    Missing separate debuginfos, use: debuginfo-install glibc-2.17-326.el7_9.x86_64
    (gdb) bt
    #0  0x00007f13b9ae7a35 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
    #1  0x00000000004117d5 in nxlog_worker_thread ()
    #2  0x000000000040cdd5 in _thread_helper ()
    #3  0x00007f13b9ae3ea5 in start_thread () from /lib64/libpthread.so.0
    #4  0x00007f13b9400b0d in clone () from /lib64/libc.so.6
    (gdb)

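So where is the problem? Why does the DWARF version setting appear not to take effect?

To confirm that the DWARF version we set really did take effect, I inspected the binary with objdump: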

    [root@ck08 flow]# objdump --dwarf=info ./flow|more

    ./flow:     file format elf64-x86-64

    Contents of the .debug_info section:

      Compilation Unit @ offset 0x0:
       Length:        0x3e07 (32-bit)
       Version:       4
       Abbrev Offset: 0x0
       Pointer Size:  8
     <0><b>: Abbrev Number: 1 (DW_TAG_compile_unit)
        <c>   DW_AT_producer    : (indirect string, offset: 0x31f): GNU C17 11.3.0 -mtune=generic -march=x86-64 -g -gdwarf-4 -gstrict-dwarf -O2 -fPIC
        <10>   DW_AT_language    : 12 (ANSI C99)
        <11>   DW_AT_name        : (indirect string, offset: 0x16ac): src/core/protocol.c
        <15>   DW_AT_comp_dir    : (indirect string, offset: 0x1c15): /tmp
        <19>   DW_AT_low_pc      : 0x4090c0
        <21>   DW_AT_high_pc     : 0x127c
        <29>   DW_AT_stmt_list   : 0x0

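Here we can see that the code compiled from src/core/protocol.c does carry DWARF version 4. So why does gdb still report DWARF version 5?

Could it be that a third-party library the program depends on was built with DWARF 5?

With that question in mind, I looked at every Version entry:

It turns out that some of the code in the binary does use DWARF 5.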
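
One way to get such an overview is to count the Version: lines that objdump --dwarf=info prints at the top of each compilation unit, as seen in the dump above. A minimal sketch follows; dwarf_version_summary is a hypothetical helper name, and the input is assumed to have the objdump layout shown in this article.

```shell
#!/bin/sh
# dwarf_version_summary (hypothetical helper): read the text produced by
# `objdump --dwarf=info FILE` on stdin and print how many compilation
# units use each DWARF version.
dwarf_version_summary() {
    # Only compilation unit header lines have "Version:" as their first
    # field; attribute lines such as DW_AT_name start with an offset.
    awk '$1 == "Version:" { count[$2]++ }
         END { for (v in count) print "DWARF " v ": " count[v] " CU(s)" }' |
        sort
}

# Typical use (assumes binutils objdump is installed and ./flow exists):
#   objdump --dwarf=info ./flow | dwarf_version_summary
```

Any line reporting a version that the target gdb does not list as supported points to code that was built without the intended -gdwarf option.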
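
First, dump the DWARF .debug_info section to a file:

    objdump --dwarf=info ./flow > dwarf.info

Jumping straight to line 754527:

This shows that DWARF 5 came in when the bzip2 library was compiled.

To verify my guess, I went into the container and located libbz2. Sure enough, it was the culprit.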

    [root@5703f261ff2b lib]# objdump --dwarf=info libbz2.a|grep Version
       Version:       5
       Version:       5
       Version:       5
       Version:       5
       Version:       5
       Version:       5
       Version:       5
        <1760>   DW_AT_name        : (indirect string, offset: 0x650): BZ2_bzlibVersion
    [root@5703f261ff2b lib]#

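So here is the puzzle: I build the third-party dependencies inside the container, and before building them I set the CC environment variable once for all of them: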

    [root@5703f261ff2b tmp]# echo $CC
    gcc -gdwarf-4 -gstrict-dwarf -fvar-tracking-assignments

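An excerpt of the Dockerfile:

As the Dockerfile shows, we set CC first and then build openssl, libapr, and bzip2 in turn. So why are the other dependencies fine, while the setting fails to take effect for bzip2 alone?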

    [root@5703f261ff2b lib]# objdump --dwarf=info libssl.a|grep Version
       Version:       4
       Version:       4
       Version:       4
       Version:       4
       Version:       4
       Version:       4
       Version:       4
       Version:       4
       Version:       4
       Version:       4

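So it seems the answer has to be found in the bzip2 source itself. I unpacked the bzip2 source tarball again and found that it has no configure script, only a Makefile. Opening the Makefile revealed the clue:

Although we set CC outside, the Makefile assigns CC again and overrides our value, because an assignment inside a makefile takes precedence over an environment variable of the same name. The build therefore used gcc's default DWARF version, and since our gcc is 11.3, that default is DWARF 5.

Clearly the bzip2 developers took a shortcut here. A safer way to write it would be:

    CC ?= gcc

After changing the Makefile and rebuilding, the result is correct: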

    [root@5703f261ff2b bzip2-1.0.8]# objdump --dwarf=info libbz2.a|grep Version
       Version:       4
       Version:       4
       Version:       4
       Version:       4
       Version:       4
       Version:       4
       Version:       4
        <1482>   DW_AT_name        : (indirect string, offset: 0x60c): BZ2_bzlibVersion

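
An alternative that leaves the upstream Makefile untouched is to pass CC on the make command line. In GNU make, a plain assignment such as CC=gcc inside the makefile wins over an environment variable, but a variable given on the command line wins over the makefile. The sketch below demonstrates that precedence with a throwaway makefile in a temporary directory; the directory, the show target, and the use of echo in place of a real compile are all made up for illustration.

```shell
#!/bin/sh
# Throwaway demo of how GNU make resolves CC from different sources.
demo_dir=$(mktemp -d)

# A makefile that hard-codes CC, and a target that just prints it.
printf 'CC=gcc\nshow:\n\t@echo "$(CC)"\n' > "$demo_dir/Makefile"

# 1. CC from the environment: the makefile assignment overrides it.
from_env=$(CC="gcc -gdwarf-4" make -s -f "$demo_dir/Makefile" show)

# 2. CC on the make command line: this overrides the makefile.
from_cmdline=$(make -s -f "$demo_dir/Makefile" show CC="gcc -gdwarf-4")

echo "environment : $from_env"
echo "command line: $from_cmdline"
rm -rf "$demo_dir"
```

With GNU make, the first line reports gcc and the second reports gcc -gdwarf-4. Applied to a build like the one described here, that would mean invoking make with CC set on the command line to the same value as the CC environment variable, instead of editing the Makefile.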
I rebuilt the program against the new bzip2 library. Generating a core file with gcore no longer reports Dwarf Error:

    [root@ck08 flow]# gcore `pidof flow`
    [New LWP 25963]
    [New LWP 25962]
    [New LWP 25961]
    [New LWP 25960]
    [Thread debugging using libthread_db enabled]
    Using host libthread_db library "/lib64/libthread_db.so.1".
    0x00007f704555fb43 in select () from /lib64/libc.so.6
    warning: target file /proc/25959/cmdline contained unexpected null characters
    Saved corefile core.25959
    [Inferior 1 (process 25959) detached]

Debugging this core file with gdb now yields detailed debug information as well:

    [root@ck08 flow]# gdb ./flow core.25959
    GNU gdb (GDB) Red Hat Enterprise Linux 7.6.1-120.el7
    Copyright (C) 2013 Free Software Foundation, Inc.
    License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
    This is free software: you are free to change and redistribute it.
    There is NO WARRANTY, to the extent permitted by law. Type "show copying"
    and "show warranty" for details.
    This GDB was configured as "x86_64-redhat-linux-gnu".
    For bug reporting instructions, please see:
    <http://www.gnu.org/software/gdb/bugs/>...
    Reading symbols from /root/chenyc/src/flow/flow...done.
    [New LWP 25960]
    [New LWP 25961]
    [New LWP 25962]
    [New LWP 25963]
    [New LWP 25959]
    [Thread debugging using libthread_db enabled]
    Using host libthread_db library "/lib64/libthread_db.so.1".
    Core was generated by `./flow'.
    #0  0x00007f7045c52efd in open64 () from /lib64/libpthread.so.0
    Missing separate debuginfos, use: debuginfo-install glibc-2.17-326.el7_9.x86_64
    (gdb) bt
    #0  0x00007f7045c52efd in open64 () from /lib64/libpthread.so.0
    #1  0x000000000049b731 in apr_file_open (new=0x7f7034003320, fname=0x7f7034002ad0 "/root/chenyc/test/dc/mave/probes/itoa-flow/data/utf-8_nolb.log", flag=1, perm=<optimized out>, pool=0x7f7034003288) at file_io/unix/open.c:176
    #2  0x000000000041c1b9 in im_file_ext_input_open (module=0x2313a00, file=0x7f7045253fd8, finfo=0x7f704524eaa0, readfromlast=false, existed=true) at src/modules/input/fileExt/im_fileExt.c:976
    #3  0x000000000041f51f in im_file_ext_check_file (module=<optimized out>, file=<optimized out>, fname=<optimized out>, pool=<optimized out>) at src/modules/input/fileExt/im_fileExt.c:1315
    #4  0x0000000000420294 in im_file_ext_check_files (module=0x2313a00, active_only=<optimized out>) at src/modules/input/fileExt/im_fileExt.c:1475
    #5  0x000000000042076b in im_file_ext_read (module=0x2313a00) at src/modules/input/fileExt/im_fileExt.c:2981
    #6  0x00000000004208f8 in im_file_ext_event (module=0x2313a00, event=0x7f702c0008c0) at src/modules/input/fileExt/im_fileExt.c:3583
    #7  0x00000000004118da in nxlog_worker_thread (thd=0x22f1c08, data=<optimized out>) at src/core/nxlog.c:552
    #8  0x000000000040cdd5 in _thread_helper (thd=0x22f1c08, d=0x7ffc646c4050) at src/core/core.c:85
    #9  0x00007f7045c4bea5 in start_thread () from /lib64/libpthread.so.0
    #10 0x00007f7045568b0d in clone () from /lib64/libc.so.6
    (gdb)

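Summary

Much of the material online about Dwarf Error is vague, and most of it reflects only a partial understanding; dig into it and there are plenty of pitfalls. In general, though, the following two points should lead you to a solution:

- Dwarf Error usually appears when the gcc version of the build environment does not match the gdb version of the debugging environment. It can usually be fixed by specifying the DWARF version at compile time.
- Besides our own source code, the third-party libraries the program depends on also need to be compiled with the specified DWARF version.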
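
Both points can be checked mechanically in a build pipeline. The sketch below is one possible guard, not something taken from the build described in this article: check_dwarf_max is a hypothetical helper that reads objdump --dwarf=info output on stdin and fails if any compilation unit uses a DWARF version newer than the given limit. The library path in the usage comment is only an example.

```shell
#!/bin/sh
# check_dwarf_max MAX (hypothetical helper): exit status 0 if every
# "Version:" line on stdin is <= MAX, 1 otherwise. The input is expected
# to be the output of `objdump --dwarf=info FILE`.
check_dwarf_max() {
    awk -v max="$1" '
        $1 == "Version:" && $2 + 0 > max + 0 { bad++ }
        END {
            if (bad) {
                print bad " compilation unit(s) newer than DWARF " max
                exit 1
            }
        }'
}

# Typical use in a build script (assumes objdump and the listed files exist;
# the paths are examples only):
#   for f in ./flow /usr/local/lib/*.a; do
#       objdump --dwarf=info "$f" | check_dwarf_max 4 || echo "offender: $f"
#   done
```

Running such a check over the final binary and every static library it links would have flagged libbz2.a immediately in the case described above.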

References
- https://gcc.gnu.org/onlinedocs/gcc/Debugging-Options.html
- https://zhuanlan.zhihu.com/p/419908664