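Startup process of the SONiC swss container
SONiC service processes all run inside containers. So how does a container start its services once the container itself is up?
To answer this, we first need to understand how the container is built. We use the docker-orchagent container as the example.
The Dockerfile
In SONiC, each Dockerfile is generated from a Dockerfile.j2 template.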
docker-orchagent/Dockerfile.j2
FROM docker-config-engine

ARG docker_container_name
RUN [ -f /etc/rsyslog.conf ] && sed -ri "s/%syslogtag%/$docker_container_name\/%syslogtag%/;" /etc/rsyslog.conf

## Make apt-get non-interactive
ENV DEBIAN_FRONTEND=noninteractive

RUN apt-get update

RUN apt-get install -f -y ifupdown arping libdbus-1-3 libdaemon0 libjansson4

## Install redis-tools dependencies
## TODO: implicitly install dependencies
RUN apt-get -y install libjemalloc1

COPY \
{% for deb in docker_orchagent_debs.split(' ') -%}
debs/{{ deb }}{{' '}}
{%- endfor -%}
debs/

RUN dpkg -i \
{% for deb in docker_orchagent_debs.split(' ') -%}
debs/{{ deb }}{{' '}}
{%- endfor %}

## Clean up
RUN apt-get clean -y; apt-get autoclean -y; apt-get autoremove -y
RUN rm -rf /debs

COPY ["files/arp_update", "/usr/bin"]
COPY ["enable_counters.py", "/usr/bin"]
COPY ["start.sh", "orchagent.sh", "swssconfig.sh", "/usr/bin/"]
COPY ["supervisord.conf", "/etc/supervisor/conf.d/"]

## Copy all Jinja2 template files into the templates folder
COPY ["*.j2", "/usr/share/sonic/templates/"]

# Program entry point
ENTRYPOINT ["/usr/bin/supervisord"]
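The two Jinja2 loops in the template are the only non-obvious part. As a rough illustration, the sketch below reproduces in plain Python what they should expand to once whitespace control (`-%}` and `{%-`) has removed the line breaks. It does not run Jinja2 itself, and the package names are hypothetical.

```python
def expand_deb_list(prefix, debs, suffix=""):
    """Mimic the template loop: one 'debs/<name> ' entry per package."""
    entries = "".join("debs/{} ".format(deb) for deb in debs.split(" "))
    # The backslash and newline after the prefix come from the template text.
    return "{} \\\n{}{}".format(prefix, entries, suffix)


# Hypothetical package names, for illustration only.
debs = "swss_1.0.0_amd64.deb libswsscommon_1.0.0_amd64.deb"

copy_line = expand_deb_list("COPY", debs, "debs/")
dpkg_line = expand_deb_list("RUN dpkg -i", debs)

print(copy_line)
print(dpkg_line)
```

Under these assumptions every package listed in docker_orchagent_debs is copied into debs/ and then installed by a single dpkg -i call.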
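According to this configuration, the program specified to run when the container starts is /usr/bin/supervisord.
How the host starts the container
The host starts the docker-orchagent container as the swss service, as the output of the following command shows: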
admin@sonic:~$ sudo config reload -y
Running command: systemctl stop swss
Running command: systemctl stop pmon
Running command: systemctl stop teamd
Running command: sonic-cfggen -j /etc/sonic/config_db.json --write-to-db
Running command: systemctl restart hostname-config
Running command: systemctl restart interfaces-config
Running command: systemctl restart ntp-config
Running command: systemctl restart rsyslog-config
Running command: systemctl restart swss
Running command: systemctl restart teamd
Running command: systemctl restart pmon
admin@sonic:~$
Let's look at the service file for swss:
admin@sonic:~$ cat /etc/systemd/system/swss.service
[Unit]
Description=switch state service
Requires=database.service updategraph.service
After=database.service updategraph.service
After=interfaces-config.service

[Service]
User=root
# Wait for redis server start before database clean
ExecStartPre=/bin/bash -c 'until [[ $(/usr/bin/docker exec database redis-cli ping | grep -c PONG) -gt 0 ]]; do sleep 1; done'
ExecStartPre=/usr/bin/docker exec database redis-cli -n 0 FLUSHDB
ExecStartPre=/usr/bin/docker exec database redis-cli -n 1 FLUSHDB
ExecStartPre=/usr/bin/docker exec database redis-cli -n 2 FLUSHDB
ExecStartPre=/usr/bin/docker exec database redis-cli -n 5 FLUSHDB
ExecStartPre=/usr/bin/docker exec database redis-cli -n 6 FLUSHDB
ExecStartPre=/usr/bin/swss.sh start
ExecStartPre=/usr/bin/syncd.sh start
ExecStart=/usr/bin/swss.sh attach
ExecStop=/usr/bin/swss.sh stop
ExecStopPost=/usr/bin/syncd.sh stop

[Install]
WantedBy=multi-user.target
The main command of the swss service is /usr/bin/swss.sh attach. Before it runs, the following commands are executed:
# Wait for redis server start before database clean
# Wait until redis is available: once ping returns PONG, grep -c PONG prints 1, which is greater than 0
ExecStartPre=/bin/bash -c 'until [[ $(/usr/bin/docker exec database redis-cli ping | grep -c PONG) -gt 0 ]]; do sleep 1; done'
ExecStartPre=/usr/bin/docker exec database redis-cli -n 0 FLUSHDB
ExecStartPre=/usr/bin/docker exec database redis-cli -n 1 FLUSHDB
ExecStartPre=/usr/bin/docker exec database redis-cli -n 2 FLUSHDB
ExecStartPre=/usr/bin/docker exec database redis-cli -n 5 FLUSHDB
ExecStartPre=/usr/bin/docker exec database redis-cli -n 6 FLUSHDB
ExecStartPre=/usr/bin/swss.sh start
ExecStartPre=/usr/bin/syncd.sh start
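These commands flush databases 0, 1, 2, 5 and 6 but not database 4 (config_db), so the configuration is preserved. They also run /usr/bin/swss.sh start and /usr/bin/syncd.sh start.
Now let's look at the /usr/bin/swss.sh script: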
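The pre-start sequence can be sketched in Python as a polling loop followed by a list of flush commands. This is only an illustration of the logic: the ping function is injected so the loop can be exercised without a running redis, and nothing here is executed against docker.

```python
import time

FLUSHED_DBS = [0, 1, 2, 5, 6]  # taken from the ExecStartPre lines above


def wait_for_pong(ping, interval=1.0, sleep=time.sleep):
    """Poll ping() until its reply contains PONG; return the number of retries."""
    retries = 0
    while "PONG" not in ping():
        retries += 1
        sleep(interval)
    return retries


def flush_commands(dbs=FLUSHED_DBS):
    """Build the argv of each FLUSHDB call without executing anything."""
    return [
        ["docker", "exec", "database", "redis-cli", "-n", str(db), "FLUSHDB"]
        for db in dbs
    ]


# Simulated replies: redis is not reachable for the first two polls.
replies = iter(["", "", "PONG\n"])
retries = wait_for_pong(lambda: next(replies), sleep=lambda seconds: None)

print(retries)
for argv in flush_commands():
    print(" ".join(argv))
```

Database 4 never appears in the generated commands, which is why the configuration survives a restart of the service.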
#!/bin/bash

function getMountPoint()
{
    echo $1 | python -c "import sys, json, os; mnts = [x for x in json.load(sys.stdin)[0]['Mounts'] if x['Destination'] == '/usr/share/sonic/hwsku']; print '' if len(mnts) == 0 else os.path.basename(mnts[0]['Source'])" 2>/dev/null
}

function postStartAction()
{
    docker exec swss rm -f /ready   # remove cruft
    if [[ -d /host/fast-reboot ]]; then
        test -e /host/fast-reboot/fdb.json && docker cp /host/fast-reboot/fdb.json swss:/
        test -e /host/fast-reboot/arp.json && docker cp /host/fast-reboot/arp.json swss:/
        test -e /host/fast-reboot/default_routes.json && docker cp /host/fast-reboot/default_routes.json swss:/
        rm -fr /host/fast-reboot
    fi
    docker exec swss touch /ready   # signal swssconfig.sh to go
}

# Obtain our platform as we will mount directories with these names in each docker
PLATFORM=`sonic-cfggen -H -v DEVICE_METADATA.localhost.platform`

# Obtain our HWSKU as we will mount directories with these names in each docker
HWSKU=`sonic-cfggen -d -v 'DEVICE_METADATA["localhost"]["hwsku"]'`

# Start the container
start() {
    DOCKERCHECK=`docker inspect --type container swss 2>/dev/null`
    if [ "$?" -eq "0" ]; then
        DOCKERMOUNT=`getMountPoint "$DOCKERCHECK"`
        if [ "$DOCKERMOUNT" == "$HWSKU" ]; then
            echo "Starting existing swss container with HWSKU $HWSKU"
            docker start swss
            postStartAction
            exit 0
        fi

        # docker created with a different HWSKU, remove and recreate
        echo "Removing obsolete swss container with HWSKU $DOCKERMOUNT"
        docker rm -f swss
    fi
    echo "Starting new swss container with HWSKU $HWSKU"
    docker run -d --net=host --privileged -t -v /etc/network/interfaces:/etc/network/interfaces:ro -v /etc/network/interfaces.d/:/etc/network/interfaces.d/:ro -v /host/machine.conf:/host/machine.conf:ro -v /etc/sonic:/etc/sonic:ro -v /var/log/swss:/var/log/swss:rw \
        --log-opt max-size=2M --log-opt max-file=5 \
        -v /var/run/redis:/var/run/redis:rw \
        -v /usr/share/sonic/device/$PLATFORM:/usr/share/sonic/platform:ro \
        -v /usr/share/sonic/device/$PLATFORM/$HWSKU:/usr/share/sonic/hwsku:ro \
        --tmpfs /tmp \
        --tmpfs /var/tmp \
        --name=swss docker-orchagent-bfn:latest || {
            echo "Failed to docker run" >&1
            exit 4
    }

    postStartAction
}

attach() {
    docker attach --no-stdin swss
}

stop() {
    docker stop swss
}

case "$1" in
    start|stop|attach)
        $1
        ;;
    *)
        echo "Usage: $0 {start|stop|attach}"
        exit 1
        ;;
esac
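The decision made by start() is easier to see outside of shell. The following is a minimal Python 3 sketch of the same logic, assuming the `docker inspect` output is passed in as a string (or None when the container does not exist). The function names plan_start and mounted_hwsku, the platform name and the HWSKU name are hypothetical.

```python
import json
import os

HWSKU_DEST = "/usr/share/sonic/hwsku"


def mounted_hwsku(inspect_output):
    """Return the basename of the source mounted at HWSKU_DEST, or ''."""
    try:
        mounts = json.loads(inspect_output)[0]["Mounts"]
    except (ValueError, KeyError, IndexError, TypeError):
        return ""  # the shell version also yields an empty string on error
    sources = [m.get("Source", "") for m in mounts
               if m.get("Destination") == HWSKU_DEST]
    return os.path.basename(sources[0]) if sources else ""


def plan_start(inspect_output, hwsku):
    """Return the docker actions start() would take, in order."""
    if inspect_output is None:          # docker inspect failed: no container yet
        return ["docker run"]
    if mounted_hwsku(inspect_output) == hwsku:
        return ["docker start"]         # reuse the existing container
    return ["docker rm -f", "docker run"]  # HWSKU changed: recreate


# Hypothetical inspect output for an existing container.
existing = json.dumps([{"Mounts": [{
    "Source": "/usr/share/sonic/device/x86_64-example-r0/Example-SKU",
    "Destination": HWSKU_DEST,
}]}])

print(plan_start(None, "Example-SKU"))
print(plan_start(existing, "Example-SKU"))
print(plan_start(existing, "Other-SKU"))
```

In the script, both the reuse path and the create path end with postStartAction, which copies any fast-reboot files into the container and then touches /ready.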
As the script shows, the host starts the container with the following command:
docker run -d --net=host --privileged -t -v /etc/network/interfaces:/etc/network/interfaces:ro -v /etc/network/interfaces.d/:/etc/network/interfaces.d/:ro -v /host/machine.conf:/host/machine.conf:ro -v /etc/sonic:/etc/sonic:ro -v /var/log/swss:/var/log/swss:rw \
    --log-opt max-size=2M --log-opt max-file=5 \
    -v /var/run/redis:/var/run/redis:rw \
    -v /usr/share/sonic/device/$PLATFORM:/usr/share/sonic/platform:ro \
    -v /usr/share/sonic/device/$PLATFORM/$HWSKU:/usr/share/sonic/hwsku:ro \
    --tmpfs /tmp \
    --tmpfs /var/tmp \
    --name=swss docker-orchagent-bfn:latest
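- -d, --detach: Run container in background and print container ID. Without -d, docker run stays in the foreground, attached to the container, so swss.sh start would not return and the docker exec calls in postStartAction would not run until the container had exited. Passing --detach (or -d) keeps the container running in the background.
- --net=host: share the network namespace with the host.
- --privileged: Give extended privileges to this container. With this flag, root inside the container has nearly the same privileges as root on the host.
- -v: mount host directories and files into the container.
- --name=swss: name the container swss.
- docker-orchagent-bfn:latest: the image to run.
No command (CMD) is given after the image name.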
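To make the role of $PLATFORM and $HWSKU in the mount list explicit, here is a small sketch that assembles the same argument list in Python. The option order differs slightly from the script, which should not matter because all options precede the image name. The function name swss_run_args and the platform and HWSKU values are hypothetical.

```python
def swss_run_args(platform, hwsku, image="docker-orchagent-bfn:latest"):
    """Assemble the docker run argv used for the swss container."""
    device = "/usr/share/sonic/device/" + platform
    mounts = [
        ("/etc/network/interfaces", "/etc/network/interfaces", "ro"),
        ("/etc/network/interfaces.d/", "/etc/network/interfaces.d/", "ro"),
        ("/host/machine.conf", "/host/machine.conf", "ro"),
        ("/etc/sonic", "/etc/sonic", "ro"),
        ("/var/log/swss", "/var/log/swss", "rw"),
        ("/var/run/redis", "/var/run/redis", "rw"),
        (device, "/usr/share/sonic/platform", "ro"),
        (device + "/" + hwsku, "/usr/share/sonic/hwsku", "ro"),
    ]
    args = ["docker", "run", "-d", "--net=host", "--privileged", "-t"]
    for source, destination, mode in mounts:
        args += ["-v", "{}:{}:{}".format(source, destination, mode)]
    args += ["--log-opt", "max-size=2M", "--log-opt", "max-file=5"]
    args += ["--tmpfs", "/tmp", "--tmpfs", "/var/tmp"]
    args += ["--name=swss", image]  # nothing follows the image: no CMD
    return args


# Hypothetical platform and HWSKU names.
args = swss_run_args("x86_64-example-r0", "Example-SKU")
print(" ".join(args))
```

The last element of the list is the image name, which matches the observation that the command line carries no CMD.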
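Container entry point
As Dockerfile.j2 shows, the container's entry point is ENTRYPOINT ["/usr/bin/supervisord"], so supervisord is used to supervise the processes.
Before looking at its configuration files, we enter the swss container and check which processes have been started.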
root@switch:/# ps -ef
UID        PID  PPID  C STIME TTY          TIME CMD
root         1     0  0 09:48 ?        00:00:01 /usr/bin/python /usr/bin/supervisord
root        20     1  0 09:48 ?        00:00:00 /usr/bin/watcherd
root        42     1  0 09:48 ?        00:00:00 /usr/sbin/rsyslogd -n
root        47     1  0 09:48 ?        00:00:00 /usr/bin/orchagent -d /var/log/swss -b 8192 -m 00:90:fb:60:e2:8b
root        59     1  1 09:48 ?        00:00:23 /usr/bin/portsyncd -p /usr/share/sonic/hwsku/port_config.ini
root        62     1  0 09:48 ?        00:00:00 /usr/bin/intfsyncd
root        65     1  0 09:48 ?        00:00:00 /usr/bin/neighsyncd
root        77     1  0 09:49 ?        00:00:00 /usr/bin/vlanmgrd
root        94     1  0 09:49 ?        00:00:00 /usr/bin/intfmgrd
root       102     1  0 09:49 ?        00:00:00 /usr/bin/buffermgrd -l /usr/share/sonic/hwsku/pg_profile_lookup.ini
root       112     1  0 09:49 ?        00:00:00 /bin/bash /usr/bin/arp_update
root       335   112  0 10:24 ?        00:00:00 sleep 300
root       344     0  1 10:25 ?        00:00:00 bash
root       349   344  0 10:25 ?        00:00:00 ps -ef
root@switch:/#
The first row, /usr/bin/python /usr/bin/supervisord, shows that supervisord was started without a configuration file argument, so it falls back to a default location, here /etc/supervisor/supervisord.conf:
; supervisor config file

[unix_http_server]
file=/var/run/supervisor.sock   ; (the path to the socket file)
chmod=0700                      ; sockef file mode (default 0700)
username=dummy
password=dummy

[supervisord]
logfile=/var/log/supervisor/supervisord.log ; (main log file;default $CWD/supervisord.log)
pidfile=/var/run/supervisord.pid ; (supervisord pidfile;default supervisord.pid)
childlogdir=/var/log/supervisor ; ('AUTO' child log dir, default $TEMP)
user=root

; the below section must remain in the config file for RPC
; (supervisorctl/web interface) to work, additional interfaces may be
; added by defining them in separate rpcinterface: sections
[rpcinterface:supervisor]
supervisor.rpcinterface_factory = supervisor.rpcinterface:make_main_rpcinterface

[supervisorctl]
serverurl=unix:///var/run/supervisor.sock ; use a unix:// URL for a unix socket
username=dummy
password=dummy

; The [include] section can just contain the "files" setting. This
; setting can list multiple files (separated by whitespace or
; newlines). It can also contain wildcards. The filenames are
; interpreted as relative to this file. Included files *cannot*
; include files themselves.
[include]
files = /etc/supervisor/conf.d/*.conf
The [include] section pulls in the child configuration files: files = /etc/supervisor/conf.d/*.conf
The /etc/supervisor/conf.d/ directory contains a single file, supervisord.conf:
[supervisord]
logfile_maxbytes=1MB
logfile_backups=2
nodaemon=true

# Run start.sh, priority 1
[program:start.sh]
command=/usr/bin/start.sh
priority=1
autostart=true
autorestart=false
stdout_logfile=syslog
stderr_logfile=syslog

# rsyslogd, priority 2
[program:rsyslogd]
command=/usr/sbin/rsyslogd -n
priority=2
autostart=false
autorestart=false
stdout_logfile=syslog
stderr_logfile=syslog

# orchagent, priority 3
[program:orchagent]
command=/usr/bin/orchagent.sh
priority=3
autostart=false
autorestart=false
stdout_logfile=syslog
stderr_logfile=syslog

# portsyncd, priority 4
[program:portsyncd]
command=/usr/bin/portsyncd -p /usr/share/sonic/hwsku/port_config.ini
priority=4
autostart=false
autorestart=false
stdout_logfile=syslog
stderr_logfile=syslog

# intfsyncd, priority 5
[program:intfsyncd]
command=/usr/bin/intfsyncd
priority=5
autostart=false
autorestart=false
stdout_logfile=syslog
stderr_logfile=syslog

# neighsyncd, priority 6
[program:neighsyncd]
command=/usr/bin/neighsyncd
priority=6
autostart=false
autorestart=false
stdout_logfile=syslog
stderr_logfile=syslog

# swssconfig.sh, priority 7
[program:swssconfig]
command=/usr/bin/swssconfig.sh
priority=7
autostart=false
autorestart=unexpected
startretries=0
stdout_logfile=syslog
stderr_logfile=syslog

# arp_update, priority 8
[program:arp_update]
command=/usr/bin/arp_update
priority=8
autostart=false
autorestart=unexpected
stdout_logfile=syslog
stderr_logfile=syslog

# vlanmgrd, priority 9
[program:vlanmgrd]
command=/usr/bin/vlanmgrd
priority=9
autostart=false
autorestart=false
stdout_logfile=syslog
stderr_logfile=syslog

[program:intfmgrd]
command=/usr/bin/intfmgrd
priority=10
autostart=false
autorestart=false
stdout_logfile=syslog
stderr_logfile=syslog

[program:buffermgrd]
command=/usr/bin/buffermgrd -l /usr/share/sonic/hwsku/pg_profile_lookup.ini
priority=10
autostart=false
autorestart=false
stdout_logfile=syslog
stderr_logfile=syslog

[program:enable_counters]
command=/usr/bin/enable_counters.py
priority=11
autostart=false
autorestart=false
stdout_logfile=syslog
stderr_logfile=syslog

[eventlistener:mylistener]
command=/usr/bin/watcherd
events=PROCESS_STATE
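The key detail in this file is that only start.sh has autostart=true; every other program waits to be started explicitly. A quick way to check this is to parse the file with Python's configparser, as in the sketch below. The embedded SAMPLE is an abbreviated copy of the file above, and the helper name programs is hypothetical.

```python
import configparser

# Abbreviated copy of /etc/supervisor/conf.d/supervisord.conf (first 3 programs).
SAMPLE = """
[supervisord]
nodaemon=true

[program:start.sh]
command=/usr/bin/start.sh
priority=1
autostart=true
autorestart=false

[program:rsyslogd]
command=/usr/sbin/rsyslogd -n
priority=2
autostart=false
autorestart=false

[program:orchagent]
command=/usr/bin/orchagent.sh
priority=3
autostart=false
autorestart=false
"""


def programs(conf_text):
    """Return (priority, name, autostart) for each [program:x] section."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(conf_text)
    found = []
    for section in parser.sections():
        if section.startswith("program:"):
            name = section.split(":", 1)[1]
            found.append((parser.getint(section, "priority"),
                          name,
                          parser.getboolean(section, "autostart")))
    return sorted(found)


autostarted = [name for _, name, auto in programs(SAMPLE) if auto]
print(programs(SAMPLE))
print(autostarted)
```

With the full file the list of autostarted programs should still contain only start.sh, which is why that script decides what runs next.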
start.sh
#!/usr/bin/env bash

mkdir -p /etc/swss/config.d/

sonic-cfggen -d -t /usr/share/sonic/templates/switch.json.j2 > /etc/swss/config.d/switch.json
sonic-cfggen -d -t /usr/share/sonic/templates/ipinip.json.j2 > /etc/swss/config.d/ipinip.json
sonic-cfggen -d -t /usr/share/sonic/templates/ports.json.j2 > /etc/swss/config.d/ports.json

export platform=`sonic-cfggen -H -v DEVICE_METADATA.localhost.platform`

rm -f /var/run/rsyslogd.pid

supervisorctl start rsyslogd
supervisorctl start orchagent
supervisorctl start portsyncd
supervisorctl start intfsyncd
supervisorctl start neighsyncd
supervisorctl start swssconfig
supervisorctl start vlanmgrd
supervisorctl start intfmgrd
supervisorctl start buffermgrd
supervisorctl start enable_counters

# Start arp_update when VLAN exists
VLAN=`sonic-cfggen -d -v 'VLAN.keys() | join(" ") if VLAN'`
if [ "$VLAN" != "" ]; then
    supervisorctl start arp_update
fi
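The order in which start.sh brings up the services, and the condition for arp_update, can be summarized in a short Python sketch. It only builds the list of supervisorctl commands and does not execute them. The function name start_sequence and the VLAN name are hypothetical.

```python
# Programs started unconditionally, in the order used by start.sh.
ALWAYS_STARTED = [
    "rsyslogd", "orchagent", "portsyncd", "intfsyncd", "neighsyncd",
    "swssconfig", "vlanmgrd", "intfmgrd", "buffermgrd", "enable_counters",
]


def start_sequence(vlan_names):
    """Return the supervisorctl commands start.sh would issue.

    vlan_names stands for the string produced by the sonic-cfggen VLAN
    query: space-separated VLAN names, or empty when no VLAN is configured.
    """
    names = list(ALWAYS_STARTED)
    if vlan_names.strip() != "":
        names.append("arp_update")  # only started when a VLAN exists
    return ["supervisorctl start " + name for name in names]


without_vlan = start_sequence("")
with_vlan = start_sequence("Vlan1000")  # hypothetical VLAN name

print(len(without_vlan), len(with_vlan))
print(with_vlan[-1])
```

This matches the ps output shown earlier, where arp_update is running, which suggests that at least one VLAN was configured on that switch.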