Start the zebra process
sudo zebra -d
Check the zebra process status:
ubuntu@ubuntu:~$ sudo top -b -n 1 -H -p `pidof zebra`
top - 03:23:38 up 10 min, 1 user, load average: 0.05, 0.17, 0.19
Threads: 2 total, 0 running, 2 sleeping, 0 stopped, 0 zombie
%Cpu(s): 6.2 us, 0.0 sy, 0.0 ni, 93.8 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
MiB Mem : 3918.7 total, 2696.1 free, 607.3 used, 615.4 buff/cache
MiB Swap: 2048.0 total, 2048.0 free, 0.0 used. 3054.7 avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
1474 frr 20 0 84008 5596 3076 S 0.0 0.1 0:00.00 zebra
1475 frr 20 0 84008 5596 3076 S 0.0 0.1 0:00.00 Zebra dplane
ubuntu@ubuntu:~$
The output shows that when zebra is started on its own, it runs two threads: the main zebra thread and a second thread named Zebra dplane.
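top -H is one way to list a process's threads. The same thread ID and name pairs can also be read straight from /proc, which makes the before/after comparisons in this post easy to script. A minimal sketch, assuming Linux /proc; list_threads is a name made up for this example, not an FRR tool:

```shell
# list_threads PID: print "TID NAME" for every thread of process PID.
# Reads /proc directly, so it shows the same PID/COMMAND pairs as
# `top -H -p PID`. Hypothetical helper written for this post.
list_threads() {
    for lt_task in /proc/"$1"/task/*; do
        # The directory name is the thread ID; comm holds the thread name
        # (the kernel truncates it to 15 characters).
        printf '%s %s\n' "${lt_task##*/}" "$(cat "$lt_task/comm")"
    done
}
```

For example, list_threads "$(pidof zebra)" should print the same two threads as the top output above (zebra and Zebra dplane).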
Start staticd
sudo staticd -d
Check zebra's threads again:
ubuntu@ubuntu:~$ sudo top -b -n 1 -H -p `pidof zebra`
top - 03:26:53 up 14 min, 1 user, load average: 0.00, 0.08, 0.15
Threads: 3 total, 0 running, 3 sleeping, 0 stopped, 0 zombie
%Cpu(s): 0.0 us, 6.2 sy, 0.0 ni, 93.8 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
MiB Mem : 3918.7 total, 2692.0 free, 610.8 used, 615.9 buff/cache
MiB Swap: 2048.0 total, 2048.0 free, 0.0 used. 3051.1 avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
1474 frr 20 0 158156 6368 3696 S 0.0 0.2 0:00.00 zebra
1475 frr 20 0 158156 6368 3696 S 0.0 0.2 0:00.00 Zebra dplane
1706 frr 20 0 158156 6368 3696 S 0.0 0.2 0:00.00 zebra_apic
ubuntu@ubuntu:~$
After staticd starts, zebra has one additional thread, zebra_apic. A reasonable guess is that staticd connected to zebra and zebra created a thread to handle staticd's requests.
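To follow how many zebra_apic threads exist as each daemon starts, counting threads by name is enough. A minimal sketch, assuming Linux /proc; count_threads_named is a hypothetical helper, not part of FRR:

```shell
# count_threads_named PID NAME: print how many threads of PID are named
# exactly NAME (for example zebra_apic). Hypothetical helper for this post.
count_threads_named() {
    # -x: whole-line match, -F: NAME is a fixed string, -c: print the count.
    # grep exits 1 when the count is 0; "|| true" keeps that from looking
    # like an error while grep has still printed 0.
    cat /proc/"$1"/task/*/comm | grep -cxF "$2" || true
}
```

Run count_threads_named "$(pidof zebra)" zebra_apic before and after starting each daemon; matching the top output in this post, the count would go from 0 to 1 after staticd and to 3 after bgpd.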
Start bgpd
sudo bgpd -d
Check zebra's thread count:
ubuntu@ubuntu:~$ sudo top -b -n 1 -H -p `pidof zebra`
top - 03:31:05 up 18 min, 2 users, load average: 0.00, 0.03, 0.10
Threads: 5 total, 0 running, 5 sleeping, 0 stopped, 0 zombie
%Cpu(s): 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
MiB Mem : 3918.7 total, 2677.2 free, 621.3 used, 620.2 buff/cache
MiB Swap: 2048.0 total, 2048.0 free, 0.0 used. 3040.5 avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
1474 frr 20 0 305620 6368 3696 S 0.0 0.2 0:00.00 zebra
1475 frr 20 0 305620 6368 3696 S 0.0 0.2 0:00.00 Zebra dplane
1706 frr 20 0 305620 6368 3696 S 0.0 0.2 0:00.00 zebra_apic
1882 frr 20 0 305620 6368 3696 S 0.0 0.2 0:00.00 zebra_apic
1883 frr 20 0 305620 6368 3696 S 0.0 0.2 0:00.00 zebra_apic
ubuntu@ubuntu:~$
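After bgpd starts, zebra gains two more zebra_apic threads. By the same reasoning, bgpd appears to have opened two connections to zebra, with zebra creating one thread per connection to handle bgpd's requests.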
Start vtysh
sudo vtysh
Check zebra's thread count:
ubuntu@ubuntu:~$ sudo top -b -n 1 -H -p `pidof zebra`
top - 03:32:59 up 20 min, 2 users, load average: 0.16, 0.05, 0.10
Threads: 5 total, 0 running, 5 sleeping, 0 stopped, 0 zombie
%Cpu(s): 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
MiB Mem : 3918.7 total, 2669.6 free, 628.9 used, 620.3 buff/cache
MiB Swap: 2048.0 total, 2048.0 free, 0.0 used. 3032.9 avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
1474 frr 20 0 305620 6368 3696 S 0.0 0.2 0:00.00 zebra
1475 frr 20 0 305620 6368 3696 S 0.0 0.2 0:00.00 Zebra dplane
1706 frr 20 0 305620 6368 3696 S 0.0 0.2 0:00.00 zebra_apic
1882 frr 20 0 305620 6368 3696 S 0.0 0.2 0:00.00 zebra_apic
1883 frr 20 0 305620 6368 3696 S 0.0 0.2 0:00.00 zebra_apic
ubuntu@ubuntu:~$
After vtysh starts, zebra's thread count stays at 5: vtysh does not get a zebra_apic thread of its own.
RIB add/delete/update flow
Set a breakpoint in rib_process_add_fib, then trigger a route addition (have the BGP neighbor advertise a route with the network command):
Thread 1 "zebra" hit Breakpoint 2, rib_process_add_fib (new=0x55c3cabb3b50, rn=0x55c3cabb3c30, zvrf=0x55c3caba2150) at zebra/zebra_rib.c:1709
1709 rib_process_add_fib(zvrf, rn, new_fib);
(gdb) bt
#0 rib_process_add_fib (new=0x55c3cabb3b50, rn=0x55c3cabb3c30, zvrf=0x55c3caba2150) at zebra/zebra_rib.c:1709
#1 rib_process (rn=0x55c3cabb3c30) at zebra/zebra_rib.c:1709
#2 process_subq (qindex=0 '\000', subq=0x55c3cab6f4f0) at zebra/zebra_rib.c:2137
#3 meta_queue_process (dummy=<optimized out>, data=0x55c3cab6fff0) at zebra/zebra_rib.c:2198
#4 0x00007ff6f9c03163 in work_queue_run (thread=0x7ffe38158a90) at lib/workqueue.c:291
#5 0x00007ff6f9bfb968 in thread_call (thread=thread@entry=0x7ffe38158a90) at lib/thread.c:1547
#6 0x00007ff6f9bd8257 in frr_run (master=0x55c3caad4aa0) at lib/libfrr.c:1021
#7 0x000055c3ca32b1be in main (argc=2, argv=0x7ffe38158e58) at zebra/main.c:475
(gdb)
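The session above was interactive. The same check can be scripted with gdb's batch mode: attach to the running process, stop once at a function, print the backtrace, and detach so zebra keeps running. A minimal sketch, assuming gdb is installed and the zebra binary has debug symbols; bt_at is a hypothetical name:

```shell
# bt_at PID FUNCTION: attach gdb to PID, wait until FUNCTION is hit once,
# print the backtrace of the thread that hit it, then detach.
# Hypothetical helper; assumes gdb is installed and the binary has symbols.
bt_at() {
    if [ $# -ne 2 ]; then
        echo "usage: bt_at PID FUNCTION" >&2
        return 2
    fi
    # -batch runs the -ex commands in order and then exits;
    # "continue" blocks until the breakpoint fires.
    sudo gdb -p "$1" -batch \
        -ex "break $2" \
        -ex continue \
        -ex bt \
        -ex detach
}
```

Usage: run bt_at "$(pidof zebra)" rib_process_add_fib, then have the neighbor advertise a route as described above. zebra is paused only between the breakpoint hit and the detach.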
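Reading the code shows that the call chain above is only part of the picture: the main thread puts the work on a queue, and the Zebra dplane thread then processes it. With a breakpoint set at netlink_talk_info, the backtrace is: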
#0 netlink_talk_info (filter=0x55c3ca334d05 <netlink_talk_filter>, n=n@entry=0x7ff6f95a7ad0, dp_info=0x55c3cad18d18, startup=startup@entry=0)
at zebra/kernel_netlink.c:949
#1 0x000055c3ca33a965 in netlink_route_multipath (cmd=cmd@entry=24, ctx=ctx@entry=0x55c3cad18c40) at zebra/rt_netlink.c:1750
#2 0x000055c3ca33bd5f in kernel_route_update (ctx=ctx@entry=0x55c3cad18c40) at zebra/rt_netlink.c:1850
#3 0x000055c3ca342283 in kernel_dplane_route_update (ctx=0x55c3cad18c40) at zebra/zebra_dplane.c:2120
#4 kernel_dplane_process_func (prov=0x55c3cab70030) at zebra/zebra_dplane.c:2194
#5 0x000055c3ca341493 in dplane_thread_loop (event=<optimized out>) at zebra/zebra_dplane.c:2607
#6 0x00007ff6f9bfb968 in thread_call (thread=thread@entry=0x7ff6f95abe30) at lib/thread.c:1547
#7 0x00007ff6f9bcf4aa in fpt_run (arg=0x55c3cac17ee0) at lib/frr_pthread.c:268
#8 0x00007ff6f9b52182 in start_thread (arg=<optimized out>) at pthread_create.c:486
#9 0x00007ff6f9a7bb1f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
(gdb)
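To confirm which thread runs netlink_talk_info, ask gdb for the thread list at the moment the breakpoint fires: info threads marks the current thread with * and, on Linux, shows each thread's name, so if the analysis above is right the current thread should be Zebra dplane. A minimal sketch, again assuming gdb and debug symbols; which_thread is a hypothetical name. Since netlink_talk_info might also be reached from other code paths, trigger a route change right after attaching:

```shell
# which_thread PID FUNCTION: stop PID once at FUNCTION and list all threads;
# the line starting with "*" is the thread that hit the breakpoint.
# Hypothetical helper; assumes gdb is installed and the binary has symbols.
which_thread() {
    if [ $# -ne 2 ]; then
        echo "usage: which_thread PID FUNCTION" >&2
        return 2
    fi
    sudo gdb -p "$1" -batch \
        -ex "break $2" \
        -ex continue \
        -ex "info threads" \
        -ex detach
}
```

Usage: which_thread "$(pidof zebra)" netlink_talk_info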
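From this analysis, the role of each of the three kinds of threads is fairly clear:
Main thread (zebra): central processing of the various messages; analyzed in detail later.
Zebra dplane thread: carries out the final step of pushing route information into the kernel.
zebra_apic threads: handle requests from client processes and interact with them, passing the information they receive to the main thread for processing.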
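One way to sanity-check these roles is to watch per-thread CPU time while routes are being learned: if the roles above are right, the zebra_apic thread serving bgpd and the Zebra dplane thread should accumulate time as routes arrive and are installed. A minimal sketch that reads utime and stime (fields 14 and 15 of /proc/PID/task/TID/stat, in clock ticks); thread_cpu_ticks is a hypothetical name:

```shell
# thread_cpu_ticks PID: print "TID TICKS NAME" for each thread, where TICKS
# is user + system CPU time in clock ticks (see `getconf CLK_TCK`).
# Hypothetical helper written for this post.
thread_cpu_ticks() {
    for tc_task in /proc/"$1"/task/*; do
        # The stat line is "pid (comm) state ppid ...". The name can contain
        # spaces ("Zebra dplane"), so strip through the last ") " first;
        # the remaining text starts at field 3, which makes utime (14) and
        # stime (15) its 12th and 13th fields.
        tc_ticks=$(sed 's/.*) //' "$tc_task/stat" | awk '{ print $12 + $13 }')
        printf '%s %s %s\n' "${tc_task##*/}" "$tc_ticks" "$(cat "$tc_task/comm")"
    done
}
```

Run thread_cpu_ticks "$(pidof zebra)" before and after a burst of route updates and compare the TICKS column per thread.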
The place I focused on, where SONiC's FPM interacts with zebra, is:
rib_process_add_fib calls hook_call(rib_update, rn, "new route selected");, which invokes zfpm_trigger_update, the function that FPM registered on that hook.
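The hook only reaches zfpm_trigger_update if the FPM code is part of zebra. In FRR the FPM support is a loadable zebra module requested with -M fpm (optionally with options, e.g. -M fpm:protobuf), so a quick check is whether zebra's command line asks for it. A minimal sketch; uses_module is a hypothetical helper, and it only shows that the module was requested, not that it loaded successfully:

```shell
# uses_module CMDLINE_FILE MODULE: succeed if the NUL-separated command line
# in CMDLINE_FILE requests MODULE as "-M MODULE", "-MMODULE",
# "--module MODULE" or "--module=MODULE", each optionally followed by
# ":options". Hypothetical helper written for this post.
uses_module() {
    um_mod=$2
    # /proc/PID/cmdline separates arguments with NUL bytes; one per line here.
    tr '\0' '\n' < "$1" | {
        um_prev=
        while IFS= read -r um_arg; do
            case "$um_prev" in
                -M|--module) um_val=$um_arg ;;
                *) case "$um_arg" in
                       --module=*) um_val=${um_arg#--module=} ;;
                       -M?*) um_val=${um_arg#-M} ;;
                       *) um_val= ;;
                   esac ;;
            esac
            # Drop any ":options" suffix before comparing module names.
            [ -n "$um_val" ] && [ "${um_val%%:*}" = "$um_mod" ] && exit 0
            um_prev=$um_arg
        done
        exit 1
    }
}
```

Usage: uses_module /proc/"$(pidof zebra)"/cmdline fpm && echo "zebra was started with the fpm module"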
The complete code flow will be analyzed in detail later.