Lua: Containerizing Lua OpenResty, an Archaeology Log

Original article: Containerizing Lua OpenResty (An Archaeology Log)

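Background

The company has several "ancient" projects that have always been fairly stable. However, at certain times every day the request volume of one of them peaks at around 800K requests per minute, which pushes the machines into a performance bottleneck. Every peak period triggers an alert, which is a real headache. Worse, this is only one of the ancient projects, and all of them are deployed on physical machines, close to 100 machines in total.

For the sake of stability (peak shaving) and cost, we finally decided to move all of the Lua OpenResty projects onto a k8s cluster.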

Choosing a suitable OpenResty base image

First, check the OpenResty version that is running in production:

/usr/local/openresty/nginx/sbin/nginx -V
nginx version: openresty/1.13.6.2
built by gcc 4.8.5 20150623 (Red Hat 4.8.5-16) (GCC)
built with OpenSSL 1.1.0h  27 Mar 2018 (running with OpenSSL 1.1.0k  28 May 2019)
TLS SNI support enabled
configure arguments: --prefix=/usr/local/openresty/nginx ...

lua -v
Lua 5.1.4  Copyright (C) 1994-2008 Lua.org, PUC-Rio

So production runs openresty/1.13.6.2 and Lua 5.1.4, and the matching image is:

docker pull openresty/openresty:1.13.6.2-2-centos

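Q: Could we use the smaller alpine images instead?

A: No. The project depends on many .so libraries that were all compiled against glibc, while Alpine uses musl libc, which is not compatible.

Q: Why not recompile them?

A: Partly because of the risk, and partly because some of the .so libraries cannot necessarily be tracked down.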

Tracing the project's shared-library dependencies

Nginx configuration files

$ tree -L 3 nginx/conf
nginx/conf
├── vhosts/
│    ├── inner.prometheus.nginx.conf
│    └── project.nginx.conf
└── nginx.conf

Self-compiled C shared libraries, such as `binary_protocol.so`

After writing the Dockerfile and packaging the project into the container, run:

/usr/local/openresty/nginx/sbin/nginx -t

Sure enough, it fails:

/usr/local/openresty/nginx/lua/init.lua:1: module 'binary_protocol' not found:
no field package.preload['binary_protocol']
no file '/usr/local/openresty/nginx/lua/binary_protocol.lua'
no file '/usr/local/openresty/nginx/lua_lib/binary_protocol.lua'
no file '/usr/local/openresty/nginx/luarocks/share/lua/5.1/binary_protocol.lua'
no file '/usr/local/openresty/site/lualib/binary_protocol.ljbc'
…… ……
no file '/usr/local/openresty/nginx/luarocks/lib64/lua/5.1/binary_protocol.so'
no file '/usr/local/openresty/site/lualib/binary_protocol.so'
no file '/usr/local/openresty/lualib/binary_protocol.so'
no file '/usr/local/openresty/site/lualib/binary_protocol.so'
no file '/usr/local/openresty/lualib/binary_protocol.so'
no file './binary_protocol.so'
no file '/usr/local/lib/lua/5.1/binary_protocol.so'
no file '/usr/local/openresty/luajit/lib/lua/5.1/binary_protocol.so'
no file '/usr/local/lib/lua/5.1/loadall.so'
no file '/usr/local/openresty/luajit/lib/lua/5.1/binary_protocol.so'

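Q: On closer inspection, the .so shared libraries were compiled in-house and exposed for Lua to call. How do we find them all?

A: With ldd, pldd, or lsof, all of which can show which shared libraries are involved.

The ldd and pldd commands list the dependencies of a .so file: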

ldd binary_protocol.so
linux-vdso.so.1 =>  (0x00007fff40bd4000)
libtolua++.so => not found        ## ldd tells us this dependency is missing
libcrypto.so.6 => not found
liblog4cplus.so.2 => not found
libstdc++.so.6 => /lib64/libstdc++.so.6 (0x00007f458d9ef000)
libm.so.6 => /lib64/libm.so.6 (0x00007f458d6ed000)
libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x00007f458d4d7000)
libc.so.6 => /lib64/libc.so.6 (0x00007f458d10a000)
/lib64/ld-linux-x86-64.so.2 (0x00007f458df1e000)

Using these tools, trace the dependencies one step at a time until every required library has been found.
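When there are many libraries to check, it helps to reduce the ldd output to just the missing names. The following is a minimal sketch, not the procedure we actually used; the function name `missing_libs` and the example path are hypothetical.

```shell
#!/bin/sh
# List the libraries that ldd reports as "not found".
# Reads ldd output on stdin, so it also works on a saved log.
missing_libs() {
    awk '/=> not found/ { print $1 }' | sort -u
}

# Usage (hypothetical path):
#   ldd /path/to/binary_protocol.so | missing_libs
```

Each name it prints is a library that still has to be copied into the image. Once copied, that library should be run through ldd as well, since it may have missing dependencies of its own.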

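LuaRocks external packages

From the production nginx.conf, find the LuaRocks paths listed in lua_package_path and lua_package_cpath. Under those paths there is a manifest file that describes which LuaRocks packages are installed.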
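If the luarocks CLI is still available on the old host, the same information can be read without opening the manifest by hand, using `luarocks --tree=<path> list --porcelain`. The sketch below turns that listing into pinned install commands. It assumes the porcelain output has the rock name and version in its first two whitespace-separated columns, which is worth verifying against your LuaRocks version. The function name `rocks_to_install_cmds` is hypothetical.

```shell
#!/bin/sh
# Convert `luarocks list --porcelain` output (read from stdin) into
# version-pinned install commands targeting the given tree.
# Assumption: column 1 is the rock name, column 2 is its version.
rocks_to_install_cmds() {
    tree="$1"
    awk -v tree="$tree" 'NF >= 2 {
        printf "luarocks --tree=%s install %s %s\n", tree, $1, $2
    }'
}

# Usage (hypothetical paths):
#   luarocks --tree=/usr/local/openresty/nginx/luarocks list --porcelain \
#       | rocks_to_install_cmds '${WORK_DIR}/luarocks'
```

Pinning versions this way keeps the container closer to what production actually runs, at the cost of possibly having to locate old rock versions.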

Installing the LuaRocks external dependencies

RUN luarocks --tree=${WORK_DIR}/luarocks install lua-cjson \
    && luarocks --tree=${WORK_DIR}/luarocks install penlight \
    && luarocks --tree=${WORK_DIR}/luarocks install version \
    && luarocks --tree=${WORK_DIR}/luarocks install lua-resty-http \
    && luarocks --tree=${WORK_DIR}/luarocks install luaunit \
    && luarocks --tree=${WORK_DIR}/luarocks install ldoc \
    && luarocks --tree=${WORK_DIR}/luarocks install lua-discount \
    && luarocks --tree=${WORK_DIR}/luarocks install serpent \
    && luarocks --tree=${WORK_DIR}/luarocks install luacov \
    && luarocks --tree=${WORK_DIR}/luarocks install cluacov \
    && luarocks --tree=${WORK_DIR}/luarocks install mmdblua \
    && luarocks --tree=${WORK_DIR}/luarocks install lua-resty-jit-uuid \
    && luarocks --tree=${WORK_DIR}/luarocks install luasocket
RUN luarocks --tree=/usr/local/openresty/nginx/luarocks install nginx-lua-prometheus

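Problems encountered and how we solved them

Problem 1: the container keeps getting OOM-killed

Analysis confirmed that it really was using a very large amount of memory:

The ps command showed a very large number of worker processes.

Fix:

Cap the number of workers: worker_processes 4;

Q: Why were so many workers spawned?

A: On k8s, the number of worker processes nginx starts does not follow the limit we set on the Pod. It follows the node the Pod is running on.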
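This is the behaviour you would expect from `worker_processes auto`, which sizes the worker pool from the CPU count nginx detects. Inside a container that is the node's CPU count. Instead of hard-coding 4, the count could be derived from the container's CPU quota at startup. This is a minimal sketch rather than what we deployed. It assumes cgroup v1 file paths, and the helper name `workers_from_quota` is hypothetical.

```shell
#!/bin/sh
# Derive a worker count from a CFS quota/period pair (microseconds),
# rounding up so that a fractional CPU still gets one worker.
# A quota of -1 (cgroup v1) or "max" (cgroup v2) means "no limit",
# in which case the fallback value is returned.
workers_from_quota() {
    quota="$1"; period="$2"; fallback="$3"
    case "$quota" in
        -1|max|"") echo "$fallback"; return ;;
    esac
    echo $(( ($quota + $period - 1) / $period ))
}

# Usage inside the container (cgroup v1 paths, an assumption):
#   quota=$(cat /sys/fs/cgroup/cpu/cpu.cfs_quota_us)
#   period=$(cat /sys/fs/cgroup/cpu/cpu.cfs_period_us)
#   workers_from_quota "$quota" "$period" 4
```

An entrypoint script could write the result into nginx.conf before starting nginx. The fixed `worker_processes 4;` is simpler and is what this article uses.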

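Problem 2: nginx worker process exited on signal 9

This happens because the memory quota set in the Deployment is too small.

Fix: raise the memory quota under resources (it is the limit, not the request, that triggers the kill):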

resources:
  limits:
    cpu: "2000m"
    memory: "1Gi"
  requests:
    cpu: "1000m"
    memory: "512Mi"

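PS: starting 4 workers consumes roughly 200Mi.

Problem 3: attempt to index upvalue 'result_dict' (a nil value)

The cause is that the production nginx.conf contains the relevant definition, but the configuration kept with the code does not. Adding it fixes the error: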

lua_shared_dict monitor_status 150m;
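This class of error can be caught before deployment by comparing the shared dicts the Lua code references against those the config declares. The sketch below assumes the code accesses dicts as `ngx.shared.<name>`; presumably `result_dict` is obtained from `ngx.shared.monitor_status`, though the article does not show that line. Dynamic lookups such as `ngx.shared[name]` would not be detected. The function name `undeclared_shared_dicts` is hypothetical.

```shell
#!/bin/sh
# Print every dict name referenced as ngx.shared.<name> under a Lua source
# directory that has no `lua_shared_dict <name> ...;` line in the given
# nginx config files.
# Usage: undeclared_shared_dicts <lua_dir> <conf_file>...
undeclared_shared_dicts() {
    lua_dir="$1"; shift
    used=$(grep -rhoE 'ngx\.shared\.[A-Za-z0-9_]+' "$lua_dir" \
        | sed 's/^ngx\.shared\.//' | sort -u)
    for name in $used; do
        grep -qE "^[[:space:]]*lua_shared_dict[[:space:]]+${name}[[:space:]]" "$@" \
            || echo "$name"
    done
}
```

Running it against the repository's Lua directory and nginx.conf as a CI step would report `monitor_status` before the container ever starts.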

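A small trick for shrinking the image size

Borrow someone else's hen to lay your eggs (借鸡生蛋)

How to hook up Prometheus monitoring

Integrating Prometheus into OpenResty: https://github.com/knyar/ngin...

Install the dependency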

luarocks --tree=/usr/local/openresty/nginx/luarocks install nginx-lua-prometheus

Add configuration

Add the following to nginx/conf/vhosts/project.nginx.conf:

lua_shared_dict prometheus_metrics 10M;
log_by_lua_block {
    metric_requests:inc(1, {ngx.var.server_name, ngx.var.status})
    metric_latency:observe(tonumber(ngx.var.request_time), {ngx.var.server_name})
}

Add a configuration file

Create nginx/conf/vhosts/inner.prometheus.nginx.conf:

server {
    listen 8099;
    location /metrics {
        content_by_lua_block {
            metric_connections:set(ngx.var.connections_reading, {"reading"})
            metric_connections:set(ngx.var.connections_waiting, {"waiting"})
            metric_connections:set(ngx.var.connections_writing, {"writing"})
            prometheus:collect()
        }
    }
}
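Both snippets rely on the globals `prometheus`, `metric_requests`, `metric_latency`, and `metric_connections` having been created elsewhere, typically in an `init_worker_by_lua_block` as the nginx-lua-prometheus README describes. The article does not show that block. Once everything is wired up, a quick smoke test is to fetch the endpoint and confirm a metric is present. A minimal sketch follows; the metric name in the usage line is an assumption taken from the library's README example, and `has_metric` is a hypothetical helper.

```shell
#!/bin/sh
# Succeed if the Prometheus text-format input on stdin contains at least
# one sample line for the given metric name (with or without labels).
# Comment lines such as "# HELP ..." do not count.
has_metric() {
    grep -qE "^$1(\{| )"
}

# Usage (metric name is an assumption, use whatever your init block registers):
#   curl -s http://127.0.0.1:8099/metrics | has_metric nginx_http_requests_total \
#       && echo "metrics endpoint OK"
```

A check like this fits naturally into a container readiness script or a post-deploy step.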

Update the Deployment configuration

apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: ${name}
  namespace: ${namespace}
  labels:
    test-app: test-server
spec:
  replicas: ${replicas}
  template:
    metadata:
      labels:
        test-app: test-server
      annotations: # <----------------------- added
        prometheus.io/scrape: "true"
        prometheus.io/path: "/metrics"
        prometheus.io/port: "8099"

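Summary

With that, the containerization of one Lua project is complete. We ran into quite a few problems along the way, and the above records only the main steps and issues.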
