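A few days ago rmmod failed on one of my test machines, and lsmod showed the module's reference count as 1. I was sure nothing was using the module any more, so this had to be a bug in the code.
There is a very good article online that analyzes rmmod failures. It also provides source code that builds into a .ko file. Load that module, pass it the name of the problem module, and it forcibly clears the problem module's reference count, after which the ordinary rmmod command can remove it.
The article is at https://blog.csdn.net/gatieme...
The author also gives the GitHub address of the code: https://github.com/gatieme/LD...
However, the source code provided there targets kernel 4.x and later, and my machine runs 3.10, so the build fails with a compile error: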

    [root@controller22860 force_rmmod]# make
    echo /root/force_rmmod
    /root/force_rmmod
    echo 3.10.0-957.10.2.el7.x86_64
    3.10.0-957.10.2.el7.x86_64
    echo /lib/modules/3.10.0-957.10.2.el7.x86_64/build
    /lib/modules/3.10.0-957.10.2.el7.x86_64/build
    make -C /lib/modules/3.10.0-957.10.2.el7.x86_64/build M=/root/force_rmmod modules
    make[1]: Entering directory `/usr/src/kernels/3.10.0-957.10.2.el7.x86_64'
      CC [M]  /root/force_rmmod/force_rmmod.o
    /root/force_rmmod/force_rmmod.c: In function ‘force_cleanup_module’:
    /root/force_rmmod/force_rmmod.c:85:17: warning: format ‘%u’ expects argument of type ‘unsigned int’, but argument 4 has type ‘long unsigned int’ [-Wformat=]
                     mod->name ,mod->state, module_refcount(mod));
                     ^
    In file included from /root/force_rmmod/force_rmmod.c:6:0:
    /root/force_rmmod/force_rmmod.c:110:46: error: ‘struct module’ has no member named ‘refcnt’
             local_set((local_t*)per_cpu_ptr(&(mod->refcnt), cpu), 0);
                                                  ^
    include/asm-generic/local.h:29:43: note: in definition of macro ‘local_set’
     #define local_set(l,i) atomic_long_set((&(l)->a),(i))
                                               ^
    include/asm-generic/percpu.h:46:2: note: in expansion of macro ‘__verify_pcpu_ptr’
      __verify_pcpu_ptr((__p));     \
      ^
    include/linux/percpu.h:149:31: note: in expansion of macro ‘SHIFT_PERCPU_PTR’
     #define per_cpu_ptr(ptr, cpu) SHIFT_PERCPU_PTR((ptr), per_cpu_offset((cpu)))
                                   ^
    /root/force_rmmod/force_rmmod.c:110:29: note: in expansion of macro ‘per_cpu_ptr’
             local_set((local_t*)per_cpu_ptr(&(mod->refcnt), cpu), 0);
                                 ^
    compilation terminated due to -Wfatal-errors.

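So I took a quick look at the 3.10 source and found that its struct module really does not define a refcnt member. force_rmmod.c therefore needs to be modified. I changed the part that fails to compile to the following: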

    /* Clear the module's reference count. */
    int ref_cnt = 0;
    int i;

    for_each_possible_cpu(cpu) {
        /* Disabled: struct module has no refcnt member on 3.10. */
        //local_set((local_t*)per_cpu_ptr(&(mod->refcnt), cpu), 0);
        //local_set(__module_ref_addr(mod, cpu), 0);
        if (per_cpu_ptr(mod->refptr, cpu)->decs) {
            /* incs and decs are unsigned long on 3.10, hence %lu. */
            printk("module has dec %lu on cpu %d\n",
                   per_cpu_ptr(mod->refptr, cpu)->decs, cpu);
            ref_cnt -= per_cpu_ptr(mod->refptr, cpu)->decs;
        }
        if (per_cpu_ptr(mod->refptr, cpu)->incs) {
            //module_put(mod);
            printk("module has inc %lu on cpu %d\n",
                   per_cpu_ptr(mod->refptr, cpu)->incs, cpu);
            ref_cnt += per_cpu_ptr(mod->refptr, cpu)->incs;
        }
    }

    /* Drop one reference per outstanding count. */
    for (i = 0; i < ref_cnt; i++)
        module_put(mod);

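The idea is to add up the module's per-CPU reference counters (incs minus decs on every CPU) to get the current count, then call module_put that many times. This brings the reference count down to 0, after which rmmod works normally.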
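To make the arithmetic concrete, here is a minimal userspace sketch of the same counting scheme. It is only a simulation, not kernel code: struct ref_sim, NR_CPUS, net_refcount, put_sim and force_release are hypothetical names invented for this illustration, and put_sim merely stands in for module_put by recording one more decrement.

```c
#include <assert.h>
#include <stdio.h>

#define NR_CPUS 4 /* hypothetical CPU count for this simulation */

/* Simulated per-CPU counter pair, modelled on what the patch reads via mod->refptr. */
struct ref_sim {
    unsigned long incs; /* references taken on this CPU */
    unsigned long decs; /* references dropped on this CPU */
};

/* Net reference count: increments minus decrements, summed over all CPUs. */
static long net_refcount(const struct ref_sim *refs, int ncpus)
{
    long total = 0;
    int cpu;

    for (cpu = 0; cpu < ncpus; cpu++)
        total += (long)refs[cpu].incs - (long)refs[cpu].decs;
    return total;
}

/* Stand-in for module_put(): record one decrement on the given CPU. */
static void put_sim(struct ref_sim *refs, int cpu)
{
    refs[cpu].decs++;
}

/* Issue one put per outstanding reference; return how many puts were issued. */
static long force_release(struct ref_sim *refs, int ncpus)
{
    long pending = net_refcount(refs, ncpus);
    long i;

    for (i = 0; i < pending; i++)
        put_sim(refs, 0); /* which CPU records the put does not change the sum */

    if (pending > 0)
        assert(net_refcount(refs, ncpus) == 0);
    return pending > 0 ? pending : 0;
}
```

With counters such as {2, 1} on CPU 0 and {1, 1} on CPU 2, net_refcount returns 1, so force_release issues a single put and the net count drops to 0. Only the sum across CPUs matters, which is why a reference taken on one CPU can be released on another.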

I forked the author's repo and made this change. The new file is at https://github.com/yzx1983/LD...