
FRR Study Notes, Day 10: Analyzing the zebra Process

Starting the zebra process

sudo zebra -d

Check the state of the zebra process:

ubuntu@ubuntu:~$ sudo top -b -n 1 -H -p `pidof zebra`
top - 03:23:38 up 10 min,  1 user,  load average: 0.05, 0.17, 0.19
Threads:   2 total,   0 running,   2 sleeping,   0 stopped,   0 zombie
%Cpu(s):  6.2 us,  0.0 sy,  0.0 ni, 93.8 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
MiB Mem :   3918.7 total,   2696.1 free,    607.3 used,    615.4 buff/cache
MiB Swap:   2048.0 total,   2048.0 free,      0.0 used.   3054.7 avail Mem 

   PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
  1474 frr       20   0   84008   5596   3076 S   0.0   0.1   0:00.00 zebra
  1475 frr       20   0   84008   5596   3076 S   0.0   0.1   0:00.00 Zebra dplane
ubuntu@ubuntu:~$ 

The output shows that zebra, started on its own, spawns one additional thread named Zebra dplane.
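The same per-thread view can be read straight from /proc, which is handy in scripts. The following is a minimal sketch, assuming a Linux system; the function name list_threads is made up for this example, and the per-thread comm file holds the name shown in the COMMAND column (the kernel limits it to 15 characters).

```python
import os
from pathlib import Path


def list_threads(pid):
    """Return (tid, name) pairs for every thread of `pid`, read from Linux /proc.

    Returns an empty list when /proc/<pid>/task does not exist
    (non-Linux system or the process has already exited).
    """
    task_dir = Path("/proc") / str(pid) / "task"
    if not task_dir.is_dir():
        return []
    threads = []
    for entry in sorted(task_dir.iterdir(), key=lambda p: int(p.name)):
        try:
            # Each thread has its own directory; `comm` holds its name.
            name = (entry / "comm").read_text().strip()
        except FileNotFoundError:
            # The thread exited between listing and reading; skip it.
            continue
        threads.append((int(entry.name), name))
    return threads


if __name__ == "__main__":
    # Inspects this script's own process; pass zebra's PID to mirror the article.
    for tid, name in list_threads(os.getpid()):
        print(tid, name)
```

Reading another user's process (zebra runs as frr here) may require root, just as the article uses sudo for top.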

Starting staticd

sudo staticd -d

Check the threads of the zebra process:

ubuntu@ubuntu:~$ sudo top -b -n 1 -H -p `pidof zebra`     
top - 03:26:53 up 14 min,  1 user,  load average: 0.00, 0.08, 0.15
Threads:   3 total,   0 running,   3 sleeping,   0 stopped,   0 zombie
%Cpu(s):  0.0 us,  6.2 sy,  0.0 ni, 93.8 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
MiB Mem :   3918.7 total,   2692.0 free,    610.8 used,    615.9 buff/cache
MiB Swap:   2048.0 total,   2048.0 free,      0.0 used.   3051.1 avail Mem 

   PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
  1474 frr       20   0  158156   6368   3696 S   0.0   0.2   0:00.00 zebra
  1475 frr       20   0  158156   6368   3696 S   0.0   0.2   0:00.00 Zebra dplane
  1706 frr       20   0  158156   6368   3696 S   0.0   0.2   0:00.00 zebra_apic
ubuntu@ubuntu:~$ 

After staticd starts, zebra gains one more thread, zebra_apic. A reasonable guess is that staticd has connected to zebra and zebra created a thread to handle staticd's requests.

Starting bgpd

sudo bgpd -d

Check the number of zebra threads:

ubuntu@ubuntu:~$ sudo top -b -n 1 -H -p `pidof zebra`
top - 03:31:05 up 18 min,  2 users,  load average: 0.00, 0.03, 0.10
Threads:   5 total,   0 running,   5 sleeping,   0 stopped,   0 zombie
%Cpu(s):  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
MiB Mem :   3918.7 total,   2677.2 free,    621.3 used,    620.2 buff/cache
MiB Swap:   2048.0 total,   2048.0 free,      0.0 used.   3040.5 avail Mem 

   PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
  1474 frr       20   0  305620   6368   3696 S   0.0   0.2   0:00.00 zebra
  1475 frr       20   0  305620   6368   3696 S   0.0   0.2   0:00.00 Zebra dplane
  1706 frr       20   0  305620   6368   3696 S   0.0   0.2   0:00.00 zebra_apic
  1882 frr       20   0  305620   6368   3696 S   0.0   0.2   0:00.00 zebra_apic
  1883 frr       20   0  305620   6368   3696 S   0.0   0.2   0:00.00 zebra_apic
ubuntu@ubuntu:~$ 

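After bgpd starts, zebra gains two more zebra_apic threads. A reasonable guess is that bgpd has connected to zebra and zebra created threads to handle bgpd's requests. Two new threads suggest that bgpd holds two sessions with zebra, although the output alone does not confirm this.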
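The guess above, one server-side thread per client connection, can be modeled in a few lines. This is a toy sketch, not zebra's implementation: the class ApiServer, the method attach_client, and the client labels are all invented for illustration, and socket.socketpair() stands in for the real client socket.

```python
import socket
import threading


class ApiServer:
    """Toy daemon that serves every client connection on its own thread."""

    def __init__(self):
        self.workers = []    # one thread per attached client
        self.handled = []    # (client name, message) pairs seen so far
        self._lock = threading.Lock()

    def attach_client(self, name):
        """Simulate a client connecting; returns the client's end of the socket."""
        server_end, client_end = socket.socketpair()
        worker = threading.Thread(
            target=self._serve, args=(name, server_end), name=f"apic-{name}"
        )
        worker.start()
        self.workers.append(worker)
        return client_end

    def _serve(self, name, conn):
        # Per-client loop: read requests until the client closes its end.
        with conn:
            while True:
                data = conn.recv(1024)
                if not data:
                    return
                with self._lock:
                    self.handled.append((name, data.decode()))
                conn.sendall(b"ok")
```

Attaching one client for staticd and two for bgpd leaves three worker threads, which matches the three zebra_apic rows in the output above, under the assumption that each row corresponds to one connection.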

Starting vtysh

sudo vtysh

Check the number of zebra threads:

ubuntu@ubuntu:~$ sudo top -b -n 1 -H -p `pidof zebra`
top - 03:32:59 up 20 min,  2 users,  load average: 0.16, 0.05, 0.10
Threads:   5 total,   0 running,   5 sleeping,   0 stopped,   0 zombie
%Cpu(s):  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
MiB Mem :   3918.7 total,   2669.6 free,    628.9 used,    620.3 buff/cache
MiB Swap:   2048.0 total,   2048.0 free,      0.0 used.   3032.9 avail Mem 

   PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
  1474 frr       20   0  305620   6368   3696 S   0.0   0.2   0:00.00 zebra
  1475 frr       20   0  305620   6368   3696 S   0.0   0.2   0:00.00 Zebra dplane
  1706 frr       20   0  305620   6368   3696 S   0.0   0.2   0:00.00 zebra_apic
  1882 frr       20   0  305620   6368   3696 S   0.0   0.2   0:00.00 zebra_apic
  1883 frr       20   0  305620   6368   3696 S   0.0   0.2   0:00.00 zebra_apic
ubuntu@ubuntu:~$ 

Starting vtysh does not change the number of zebra threads.

RIB add, delete, and update flow

Set a breakpoint in the function rib_process_add_fib, then trigger a route addition (a neighbor advertises a route with the network command).

Thread 1 "zebra" hit Breakpoint 2, rib_process_add_fib (new=0x55c3cabb3b50, rn=0x55c3cabb3c30, zvrf=0x55c3caba2150) at zebra/zebra_rib.c:1709
1709                    rib_process_add_fib(zvrf, rn, new_fib);
(gdb) bt
#0  rib_process_add_fib (new=0x55c3cabb3b50, rn=0x55c3cabb3c30, zvrf=0x55c3caba2150) at zebra/zebra_rib.c:1709
#1  rib_process (rn=0x55c3cabb3c30) at zebra/zebra_rib.c:1709
#2  process_subq (qindex=0 '\000', subq=0x55c3cab6f4f0) at zebra/zebra_rib.c:2137
#3  meta_queue_process (dummy=<optimized out>, data=0x55c3cab6fff0) at zebra/zebra_rib.c:2198
#4  0x00007ff6f9c03163 in work_queue_run (thread=0x7ffe38158a90) at lib/workqueue.c:291
#5  0x00007ff6f9bfb968 in thread_call (thread=thread@entry=0x7ffe38158a90) at lib/thread.c:1547
#6  0x00007ff6f9bd8257 in frr_run (master=0x55c3caad4aa0) at lib/libfrr.c:1021
#7  0x000055c3ca32b1be in main (argc=2, argv=0x7ffe38158e58) at zebra/main.c:475
(gdb) 

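Reading the code shows that the flow above is only part of the picture: the main thread submits the work to a queue, and the zebra_dplane thread then processes it. Setting a breakpoint in the function netlink_talk_info gives: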

#0  netlink_talk_info (filter=0x55c3ca334d05 <netlink_talk_filter>, n=n@entry=0x7ff6f95a7ad0, dp_info=0x55c3cad18d18, startup=startup@entry=0)
    at zebra/kernel_netlink.c:949
#1  0x000055c3ca33a965 in netlink_route_multipath (cmd=cmd@entry=24, ctx=ctx@entry=0x55c3cad18c40) at zebra/rt_netlink.c:1750
#2  0x000055c3ca33bd5f in kernel_route_update (ctx=ctx@entry=0x55c3cad18c40) at zebra/rt_netlink.c:1850
#3  0x000055c3ca342283 in kernel_dplane_route_update (ctx=0x55c3cad18c40) at zebra/zebra_dplane.c:2120
#4  kernel_dplane_process_func (prov=0x55c3cab70030) at zebra/zebra_dplane.c:2194
#5  0x000055c3ca341493 in dplane_thread_loop (event=<optimized out>) at zebra/zebra_dplane.c:2607
#6  0x00007ff6f9bfb968 in thread_call (thread=thread@entry=0x7ff6f95abe30) at lib/thread.c:1547
#7  0x00007ff6f9bcf4aa in fpt_run (arg=0x55c3cac17ee0) at lib/frr_pthread.c:268
#8  0x00007ff6f9b52182 in start_thread (arg=<optimized out>) at pthread_create.c:486
#9  0x00007ff6f9a7bb1f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
(gdb) 
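The hand-off between the two backtraces, where the main thread queues work and a separate thread carries it out, is a standard producer/consumer arrangement. The sketch below illustrates only that pattern; run_pipeline, the queue names, and the "installed" result are hypothetical, and the results queue is a convenience of this example rather than something taken from the article.

```python
import queue
import threading


def run_pipeline(routes):
    """Main thread enqueues route updates; a worker thread 'installs' them."""
    pending = queue.Queue()    # main thread -> dataplane worker
    results = queue.Queue()    # worker -> main thread (sketch-only convenience)
    stop = object()            # sentinel telling the worker to exit

    def dataplane_worker():
        while True:
            ctx = pending.get()
            if ctx is stop:
                break
            # Stand-in for the kernel programming step (netlink in the article).
            results.put((ctx, "installed"))

    worker = threading.Thread(target=dataplane_worker, name="dplane-sketch")
    worker.start()

    # The producer side: the main thread only enqueues and never touches the kernel.
    for prefix in routes:
        pending.put(prefix)
    pending.put(stop)
    worker.join()

    installed = []
    while not results.empty():
        installed.append(results.get())
    return installed
```

Keeping the slow kernel interaction on its own thread lets the main thread keep handling messages while updates are being installed, which fits the division of labor the backtraces show.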

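From this analysis, the roles of the three kinds of threads are fairly clear:

Main thread zebra: the central place where all kinds of messages are processed; to be analyzed in detail later.

Thread zebra_dplane: carries out the final step of installing route information into the kernel.

zebra_apic: handles requests from client processes and passes what it receives to the main thread for processing.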

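Of particular interest is the point where SONiC's fpm interacts with zebra:

The function rib_process_add_fib calls hook_call(rib_update, rn, "new route selected");, and this is where zfpm_trigger_update, the function registered by fpm, gets invoked.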
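The hook relationship described here can be pictured as a small callback registry. FRR implements its hooks in C; the Python below is only a conceptual sketch, and HookRegistry, fpm_trigger_update, and process_add_fib are stand-in names chosen to echo the functions named above, not FRR's API.

```python
from collections import defaultdict


class HookRegistry:
    """Maps a hook name to the callbacks registered for it."""

    def __init__(self):
        self._hooks = defaultdict(list)

    def register(self, hook_name, callback):
        self._hooks[hook_name].append(callback)

    def call(self, hook_name, *args):
        # Invoke every registered callback in registration order.
        for callback in self._hooks[hook_name]:
            callback(*args)


hooks = HookRegistry()
triggered = []    # records what the FPM-side callback was told


def fpm_trigger_update(route_node, reason):
    """Stand-in for the callback the FPM module registers."""
    triggered.append((route_node, reason))


# The FPM side subscribes once, at module initialization.
hooks.register("rib_update", fpm_trigger_update)


def process_add_fib(route_node):
    """Stand-in for the RIB code path that fires the hook after route selection."""
    hooks.call("rib_update", route_node, "new route selected")
```

The point of the indirection is that the RIB code never names the FPM module: any module that registers for the hook is notified when a new route is selected.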

A detailed walk-through of the whole code flow will follow in later notes.
