Background

A C program I wrote recently produced a core dump in production, but when I debugged the core file, the debug information GDB printed was incomplete.

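Missing debug information like this boils down to missing symbol data. It usually has one of two causes: the program was compiled without the -g option, or the DWARF version is wrong.
The first cause can be ruled out right away, because I wrote the build script myself and -g is definitely there. That leaves the DWARF version as the only likely culprit.
The DWARF mismatch, in turn, comes from the build environment. To accommodate another C++ project that needs C++17, I containerized the build environment and installed GCC 11.3 in the image. The production hosts, however, still run the distribution's stock toolchain from the GCC 4.8.5 era, whose GDB is 7.6.1 (see the session output later in this article). The new GCC and the old GDB do not match, and that is what causes the problem.
To verify this, I reproduced the situation on a physical machine: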

[root@ck08 ctest]# gcore `pidof flow`
Dwarf Error: wrong version in compilation unit header (is 5, should be 2, 3, or 4) [in module /root/chenyc/src/flow/flow]
[New LWP 3048]
[New LWP 3047]
[New LWP 3046]
[New LWP 3045]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
0x00007f50dfd850e3 in epoll_wait () from /lib64/libc.so.6
warning: target file /proc/3044/cmdline contained unexpected null characters
Saved corefile core.3044
[Inferior 1 (process 3044) detached]

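The GDB on my physical machine is also the old stock version (GDB 7.6.1, alongside GCC 4.8.5). When I generated a core file with gcore, the warning above appeared: Dwarf Error: wrong version in compilation unit header (is 5, should be 2, 3, or 4). The message is easy to read literally: this GDB supports DWARF versions 2, 3, or 4, but the binary's DWARF version is 5, so it cannot be debugged.
So what is DWARF, and what is a DWARF version?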

What is DWARF

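DWARF is a debugging-information format. You can think of it simply as the way debug information is organized. Besides DWARF, common debugging formats include stabs, COFF, and PDB.
Apart from PDB, which is specific to Windows, most debugging formats support Unix systems. Over time, though, DWARF has taken over and is now supported by all mainstream compilers. The other formats still turn up occasionally, but they are barely hanging on and exist in name only.
DWARF itself has gone through several stages: since its introduction in 1992 it has reached version 5. DWARF 1, the first version, was not compact and was functionally immature, and many compilers no longer support it. DWARF 2 followed in 1993, when PLSIG reworked the first version to reduce the size of the debug information, but it only ever existed as a draft and was never formally released.
The first formally released version was DWARF 3, published by the Free Standards Group in 2005; DWARF 4 followed in 2010. The latest version is DWARF 5, published in 2017.
The official GCC documentation describes the -gdwarf-version option as follows: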

Produce debugging information in DWARF format (if that is supported). The value of version may be either 2, 3, 4 or 5; the default version for most targets is 5 (with the exception of VxWorks, TPF and Darwin/Mac OS X, which default to version 2, and AIX, which defaults to version 4).

Note that with DWARF Version 2, some ports require and always use some non-conflicting DWARF 3 extensions in the unwind tables.

Version 4 may require GDB 7.0 and -fvar-tracking-assignments for maximum benefit. Version 5 requires GDB 8.0 or higher.

GCC no longer supports DWARF Version 1, which is substantially different than Version 2 and later. For historical reasons, some other DWARF-related options such as -fno-dwarf2-cfi-asm retain a reference to DWARF Version 2 in their names, but apply to all currently-supported versions of DWARF.

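This article will not go into the DWARF file format itself; a full treatment would take far more than one article. What matters here is that the data layout differs between DWARF versions, which makes them mutually incompatible, and that is exactly why the problem at the start of this article occurs.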
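As a rough aid, the version requirements in the GCC text quoted above can be turned into a small lookup. This is only a sketch: the function name max_dwarf_for_gdb is hypothetical, and the thresholds come solely from that quote (GDB 8.0 or higher for DWARF 5, GDB 7.0 for DWARF 4). Distribution builds of GDB may behave differently.

```shell
# Hypothetical helper: given a GDB version string such as "7.6.1",
# print the highest DWARF version it can be expected to read,
# following the thresholds in the GCC documentation quoted above.
max_dwarf_for_gdb() {
    major=${1%%.*}            # keep the text before the first dot
    case $major in
        ''|*[!0-9]*) echo "unknown"; return 1 ;;
    esac
    if [ "$major" -ge 8 ]; then
        echo 5
    elif [ "$major" -ge 7 ]; then
        echo 4
    else
        echo "unknown"        # the quoted text says nothing about GDB before 7.0
        return 1
    fi
}

max_dwarf_for_gdb 7.6.1    # prints 4
max_dwarf_for_gdb 12.1     # prints 5
```

For the GDB 7.6.1 seen in this article the answer is 4, which agrees with the "should be 2, 3, or 4" wording of the error message.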

How to specify the DWARF version

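Now that the cause is identified, how do we fix it?
Should I downgrade GCC? Or make the customer upgrade GDB in production? Neither is realistic.
Fortunately, GCC lets you choose the DWARF version: just add the -gdwarf-version option at compile time.
To demonstrate, I prepared a small demo.
The C program is as follows: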

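//hello.c
#include <stdio.h>

int main(void)
{
        char *p = "hello";
        printf("p = %s\n", p);
        /* Writing to a string literal is undefined behavior; here it is
           done on purpose to trigger a segmentation fault. */
        p[3] = 'M';
        printf("p = %s\n", p);
        return 0;
}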

The GCC version inside the container:

[root@5b2c03891f42 tmp]# gcc -v
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/local/libexec/gcc/x86_64-pc-linux-gnu/11.3.0/lto-wrapper
Target: x86_64-pc-linux-gnu
Configured with: ./configure --enable-languages=c,c++
Thread model: posix
Supported LTO compression algorithms: zlib
gcc version 11.3.0 (GCC)

Compile inside the container:

gcc -o hello hello.c -g

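This program is guaranteed to crash and produce a core file. We run it outside the container, and at this point the core file cannot be debugged: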

[root@ck08 ctest]# ulimit -c unlimited
[root@ck08 ctest]# ./hello
p = hello
Segmentation fault (core dumped)
[root@ck08 ctest]# gdb ./hello core.30856
GNU gdb (GDB) Red Hat Enterprise Linux 7.6.1-120.el7
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /root/chenyc/src/ctest/hello...Dwarf Error: wrong version in compilation unit header (is 5, should be 2, 3, or 4) [in module /root/chenyc/src/ctest/hello]
(no debugging symbols found)...done.
[New LWP 30856]
Core was generated by `./hello'.
Program terminated with signal 11, Segmentation fault.
#0  0x0000000000401164 in main ()
Missing separate debuginfos, use: debuginfo-install glibc-2.17-326.el7_9.x86_64
(gdb) bt
#0  0x0000000000401164 in main ()
(gdb)

Now let's compile with an explicit DWARF version:

gcc -gdwarf-4 -gstrict-dwarf -fvar-tracking-assignments -o hello hello.c

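Where:

  • -gdwarf-4 sets the DWARF version to 4.
  • -fvar-tracking-assignments annotates assignments to user variables early in compilation and tries to carry those annotations through to the end, in an attempt to improve debug information while optimizing.
  • -gstrict-dwarf disallows extensions from DWARF versions later than the one selected.

With these options, debugging works normally.

So in theory, all we need to do is specify the DWARF version when building the project.
However, if the problem were that simple, there would be little point in writing a whole article about it. In practice I ran into something rather baffling.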
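A build script could centralize these options so that every component gets the same DWARF settings. The sketch below assumes a hypothetical helper, dwarf_cflags, and rejects versions outside the 2 to 5 range listed in the GCC documentation. It is an illustration, not the build script used for this project.

```shell
# Hypothetical helper: print the debug flags for a target DWARF version.
dwarf_cflags() {
    case $1 in
        2|3|4|5) ;;
        *) echo "unsupported DWARF version: $1" >&2; return 1 ;;
    esac
    # -gdwarf-N selects the version; -gstrict-dwarf forbids extensions
    # taken from later versions.
    echo "-gdwarf-$1 -gstrict-dwarf"
}

# Example: build a compiler command that carries the flags along.
CC="gcc $(dwarf_cflags 4) -fvar-tracking-assignments"
echo "$CC"   # prints: gcc -gdwarf-4 -gstrict-dwarf -fvar-tracking-assignments
```

Keeping the version in one place makes it harder for a single component to drift back to the compiler default.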

A deeper mystery

An excerpt of the build output confirms that I was indeed building with DWARF 4.

Yet when we run GDB, it still reports the Dwarf Error:

[root@ck08 flow]# gdb ./flow core.10772
GNU gdb (GDB) Red Hat Enterprise Linux 7.6.1-120.el7
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /root/chenyc/src/flow/flow...Dwarf Error: wrong version in compilation unit header (is 5, should be 2, 3, or 4) [in module /root/chenyc/src/flow/flow]
(no debugging symbols found)...done.
[New LWP 10773]
[New LWP 10774]
[New LWP 10775]
[New LWP 10776]
[New LWP 10772]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `./flow'.
#0  0x00007f13b9ae7a35 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
Missing separate debuginfos, use: debuginfo-install glibc-2.17-326.el7_9.x86_64
(gdb) bt
#0  0x00007f13b9ae7a35 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00000000004117d5 in nxlog_worker_thread ()
#2  0x000000000040cdd5 in _thread_helper ()
#3  0x00007f13b9ae3ea5 in start_thread () from /lib64/libpthread.so.0
#4  0x00007f13b9400b0d in clone () from /lib64/libc.so.6
(gdb)

So where is the problem? Why does the DWARF version we set not take effect?
To confirm that the setting really was applied, I inspected the binary with objdump:

[root@ck08 flow]# objdump --dwarf=info ./flow|more
./flow:     file format elf64-x86-64
Contents of the .debug_info section:
  Compilation Unit @ offset 0x0:
   Length:        0x3e07 (32-bit)
   Version:       4
   Abbrev Offset: 0x0
   Pointer Size:  8
 <0><b>: Abbrev Number: 1 (DW_TAG_compile_unit)
    <c>   DW_AT_producer    : (indirect string, offset: 0x31f): GNU C17 11.3.0 -mtune=generic -march=x86-64 -g -gdwarf-4 -gstrict-dwarf -O2 -fPIC
    <10>   DW_AT_language    : 12       (ANSI C99)
    <11>   DW_AT_name        : (indirect string, offset: 0x16ac): src/core/protocol.c
    <15>   DW_AT_comp_dir    : (indirect string, offset: 0x1c15): /tmp
    <19>   DW_AT_low_pc      : 0x4090c0
    <21>   DW_AT_high_pc     : 0x127c
    <29>   DW_AT_stmt_list   : 0x0

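Here we can see that the code compiled from src/core/protocol.c really is DWARF 4. So why does GDB still complain that the version is 5?
Could a third-party library that the program depends on be using DWARF 5?
With that question in mind, I looked at every Version entry.

Sure enough, part of the binary was built with DWARF 5.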
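One way to get such an overview is to count compilation units per DWARF version. This is a sketch, not necessarily the command used here; dwarf_version_histogram is a hypothetical name, and the function expects the text produced by objdump --dwarf=info on standard input. Matching the first field exactly avoids false hits on symbol names such as BZ2_bzlibVersion, which a plain grep Version also matches.

```shell
# Count compilation units per DWARF version.
# Reads `objdump --dwarf=info` output on standard input.
dwarf_version_histogram() {
    awk '$1 == "Version:" { count[$2]++ }
         END { for (v in count) print "DWARF " v ": " count[v] " unit(s)" }' | sort
}

# Typical use (requires binutils):
#   objdump --dwarf=info ./flow | dwarf_version_histogram
```

A binary built consistently should produce a single line of output; more than one line means some objects were compiled with different settings.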
First, dump the contents of .debug_info to a file:

objdump --dwarf=info ./flow > dwarf.info

Jumping straight to line 754527 of the dump shows that the DWARF 5 units come from the build of the bzip2 library.
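Instead of scrolling through the dump by line number, the offending compilation units can also be listed directly. The sketch below assumes the objdump --dwarf=info layout shown earlier, in which each unit header's Version line is followed by the DW_AT_name of its DW_TAG_compile_unit. The helper name list_units_with_version is hypothetical.

```shell
# Print the source file name of every compilation unit whose DWARF
# version equals $1. Reads `objdump --dwarf=info` output on stdin.
list_units_with_version() {
    awk -v want="$1" '
        $1 == "Version:" { ver = $2; pending = 1; next }
        # The first DW_AT_name after a unit header names the unit itself.
        pending && $2 == "DW_AT_name" {
            name = $0
            sub(/.*: /, "", name)   # keep the text after the last ": "
            if (ver == want) print name
            pending = 0
        }'
}

# Typical use (requires binutils):
#   objdump --dwarf=info ./flow | list_units_with_version 5
```

The file names it prints point at the library whose build ignored the requested DWARF version.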
To verify my guess, I went into the container and looked at libbz2 directly. Sure enough, it was the culprit.

[root@5703f261ff2b lib]# objdump --dwarf=info libbz2.a|grep Version
   Version:       5
   Version:       5
   Version:       5
   Version:       5
   Version:       5
   Version:       5
   Version:       5
    <1760>   DW_AT_name        : (indirect string, offset: 0x650): BZ2_bzlibVersion
[root@5703f261ff2b lib]#

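Here is the puzzle: I build the third-party dependencies inside the container, and the CC environment variable was set once for all of them before building: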

[root@5703f261ff2b tmp]# echo $CC
gcc -gdwarf-4 -gstrict-dwarf -fvar-tracking-assignments

According to the Dockerfile, we set CC first and then build openssl, libapr, and bzip2 in turn. So why are the other dependencies fine while bzip2 alone ignores the setting?

[root@5703f261ff2b lib]# objdump --dwarf=info libssl.a|grep Version
   Version:       4
   Version:       4
   Version:       4
   Version:       4
   Version:       4
   Version:       4
   Version:       4
   Version:       4
   Version:       4
   Version:       4

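So the cause had to be in the bzip2 source itself. I unpacked the bzip2 tarball again and found that it has no configure script, only a Makefile, and the Makefile gave the game away.

Although we set CC in the environment, the Makefile assigns CC itself (CC=gcc) and overrides our value, so the build falls back to GCC's default DWARF version. Our GCC is 11.3, so the default is DWARF 5.
The bzip2 developers clearly took a shortcut here. A safer way to write it would be: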

CC ?= gcc
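Before building a third-party package, it can help to check whether its Makefile hard-codes the compiler. The function below is a rough sketch with a hypothetical name, find_forced_cc. It only looks for plain CC= or CC:= assignments and does not follow includes or conditionals. When the Makefile cannot be patched, passing the variable on the command line, for example make CC="$CC", also takes precedence over an ordinary assignment inside the Makefile.

```shell
# Print (with line numbers) every line of the given Makefile that assigns
# CC unconditionally. "CC ?= gcc" is not reported, because it keeps a value
# that was already set in the environment.
# Exit status: 0 if such a line exists, 1 otherwise.
find_forced_cc() {
    grep -nE '^[[:space:]]*CC[[:space:]]*:?=' "$1"
}

# Typical use, run from the unpacked source tree:
#   if find_forced_cc Makefile; then
#       make CC="$CC"      # command-line variables win over the Makefile
#   else
#       make
#   fi
```

Running a check like this for each dependency in the image build would have flagged bzip2 before any debugging was needed.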

After changing the Makefile and rebuilding, the result is correct:

[root@5703f261ff2b bzip2-1.0.8]# objdump --dwarf=info libbz2.a|grep Version
   Version:       4
   Version:       4
   Version:       4
   Version:       4
   Version:       4
   Version:       4
   Version:       4
    <1482>   DW_AT_name        : (indirect string, offset: 0x60c): BZ2_bzlibVersion

I rebuilt the program against the new bzip2 library. Generating a core file with gcore no longer reports the Dwarf Error:

[root@ck08 flow]# gcore `pidof flow`
[New LWP 25963]
[New LWP 25962]
[New LWP 25961]
[New LWP 25960]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
0x00007f704555fb43 in select () from /lib64/libc.so.6
warning: target file /proc/25959/cmdline contained unexpected null characters
Saved corefile core.25959
[Inferior 1 (process 25959) detached]

Debugging this core file with GDB now yields detailed debug information as well:

[root@ck08 flow]# gdb ./flow core.25959
GNU gdb (GDB) Red Hat Enterprise Linux 7.6.1-120.el7
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /root/chenyc/src/flow/flow...done.
[New LWP 25960]
[New LWP 25961]
[New LWP 25962]
[New LWP 25963]
[New LWP 25959]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `./flow'.
#0  0x00007f7045c52efd in open64 () from /lib64/libpthread.so.0
Missing separate debuginfos, use: debuginfo-install glibc-2.17-326.el7_9.x86_64
(gdb) bt
#0  0x00007f7045c52efd in open64 () from /lib64/libpthread.so.0
#1  0x000000000049b731 in apr_file_open (new=0x7f7034003320,
    fname=0x7f7034002ad0 "/root/chenyc/test/dc/mave/probes/itoa-flow/data/utf-8_nolb.log", flag=1, perm=<optimized out>,
    pool=0x7f7034003288) at file_io/unix/open.c:176
#2  0x000000000041c1b9 in im_file_ext_input_open (module=0x2313a00, file=0x7f7045253fd8, finfo=0x7f704524eaa0, readfromlast=false,
    existed=true) at src/modules/input/fileExt/im_fileExt.c:976
#3  0x000000000041f51f in im_file_ext_check_file (module=<optimized out>, file=<optimized out>, fname=<optimized out>,
    pool=<optimized out>) at src/modules/input/fileExt/im_fileExt.c:1315
#4  0x0000000000420294 in im_file_ext_check_files (module=0x2313a00, active_only=<optimized out>)
    at src/modules/input/fileExt/im_fileExt.c:1475
#5  0x000000000042076b in im_file_ext_read (module=0x2313a00) at src/modules/input/fileExt/im_fileExt.c:2981
#6  0x00000000004208f8 in im_file_ext_event (module=0x2313a00, event=0x7f702c0008c0) at src/modules/input/fileExt/im_fileExt.c:3583
#7  0x00000000004118da in nxlog_worker_thread (thd=0x22f1c08, data=<optimized out>) at src/core/nxlog.c:552
#8  0x000000000040cdd5 in _thread_helper (thd=0x22f1c08, d=0x7ffc646c4050) at src/core/core.c:85
#9  0x00007f7045c4bea5 in start_thread () from /lib64/libpthread.so.0
#10 0x00007f7045568b0d in clone () from /lib64/libc.so.6
(gdb)

Summary

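Much of the material online about Dwarf Error is vague, and a lot of it reflects only a partial understanding; dig deeper and there are plenty of pitfalls. In short, approaching the problem from the following angles will usually point you toward a fix:

  • A Dwarf Error usually arises when the GCC version of the build environment does not match the GDB version of the debugging environment. It can generally be solved by specifying the DWARF version at compile time.
  • Besides your own source code, the third-party libraries the program depends on must also be compiled with the specified DWARF version.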
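Both points can be enforced mechanically at the end of a build. The sketch below, with the hypothetical name check_max_dwarf, reads objdump --dwarf=info output and fails if any compilation unit uses a DWARF version newer than the given limit. The LIB_DIR variable in the usage comment is an assumption for illustration only.

```shell
# Usage: objdump --dwarf=info FILE | check_max_dwarf 4
# Returns non-zero and reports the count if any unit exceeds the limit.
check_max_dwarf() {
    awk -v max="$1" '
        $1 == "Version:" && $2 + 0 > max + 0 { bad++ }
        END {
            if (bad) {
                print bad " compilation unit(s) newer than DWARF " max
                exit 1
            }
        }'
}

# Possible use over a directory of static libraries (requires binutils):
#   for lib in "$LIB_DIR"/*.a; do
#       objdump --dwarf=info "$lib" | check_max_dwarf 4 || echo "rebuild: $lib"
#   done
```

Running such a check on the final binary as well catches any object that slipped through with the compiler's default version.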

References

  • https://gcc.gnu.org/onlinedocs/gcc/Debugging-Options.html
  • https://zhuanlan.zhihu.com/p/419908664