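Preface
This is the first article in a series on Kubernetes (K8S) performance optimization. It covers best practices for OS sysctl performance-tuning parameters.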
Parameter overview
The full list of sysctl tuning parameters:
```
# Kubernetes Settings
vm.max_map_count = 262144
kernel.softlockup_panic = 1
kernel.softlockup_all_cpu_backtrace = 1
net.ipv4.ip_local_reserved_ports = 30000-32767
# Increase the number of connections
net.core.somaxconn = 32768
# Maximum Socket Receive Buffer
net.core.rmem_max = 16777216
# Maximum Socket Send Buffer
net.core.wmem_max = 16777216
# Increase the maximum total buffer-space allocatable
net.ipv4.tcp_wmem = 4096 87380 16777216
net.ipv4.tcp_rmem = 4096 87380 16777216
# Increase the number of outstanding syn requests allowed
net.ipv4.tcp_max_syn_backlog = 8096
# For persistent HTTP connections
net.ipv4.tcp_slow_start_after_idle = 0
# Allow to reuse TIME_WAIT sockets for new connections
# when it is safe from protocol viewpoint
net.ipv4.tcp_tw_reuse = 1
# Max number of packets that can be queued on interface input
# If kernel is receiving packets faster than can be processed
# this queue increases
net.core.netdev_max_backlog = 16384
# Increase size of file handles and inode cache
fs.file-max = 2097152
# Max number of inotify instances and watches for a user
# Since dockerd runs as a single user, the default instances value of 128 per user is too low
# e.g. uses of inotify: nginx ingress controller, kubectl logs -f
fs.inotify.max_user_instances = 8192
fs.inotify.max_user_watches = 524288
# Additional sysctl flags that kubelet expects
vm.overcommit_memory = 1
kernel.panic = 10
kernel.panic_on_oops = 1
# Prevent docker from changing iptables: https://github.com/kubernetes/kubernetes/issues/40182
net.ipv4.ip_forward=1
```
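Each sysctl key corresponds to a file under /proc/sys, with the dots turned into path separators. This makes it easy to check whether a node actually carries the values above.

The following is a minimal Python sketch that assumes the settings are kept in a sysctl.conf-style file. The helper names `parse_sysctl_conf`, `proc_path` and `find_mismatches` are hypothetical and not part of any existing tool.

```python
from __future__ import annotations

from pathlib import Path


def parse_sysctl_conf(text: str) -> dict[str, str]:
    """Parse sysctl.conf-style text into {key: value}.

    Blank lines and lines starting with '#' or ';' are skipped. Whitespace
    inside values is normalised so "4096 87380 16777216" compares cleanly
    with the tab-separated form the kernel prints.
    """
    settings: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", ";")):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"not a key=value line: {raw!r}")
        settings[key.strip()] = " ".join(value.split())
    return settings


def proc_path(key: str, root: str = "/proc/sys") -> Path:
    """Map a dotted key such as net.core.somaxconn to its /proc/sys file.

    Assumes no path component itself contains a dot (a VLAN interface
    name such as eth0.100 would break this simple mapping).
    """
    return Path(root, *key.split("."))


def find_mismatches(wanted: dict[str, str], root: str = "/proc/sys") -> dict[str, str | None]:
    """Return {key: live value} for every key whose live value differs.

    None means the file is missing or unreadable (other kernel version,
    module not loaded, or a restricted container).
    """
    diffs: dict[str, str | None] = {}
    for key, value in wanted.items():
        try:
            current = " ".join(proc_path(key, root).read_text().split())
        except OSError:
            diffs[key] = None
            continue
        if current != value:
            diffs[key] = current
    return diffs
```

For example, `find_mismatches(parse_sysctl_conf(Path("/etc/sysctl.d/99-k8s.conf").read_text()))` would list the keys whose live values differ. The file name is only an illustration.

Run the check on the node itself. Some net.* keys are per network namespace, so values read inside a pod may differ from the node's.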
On AWS, additionally add the following:
```
# AWS settings
# Issue #23395
net.ipv4.neigh.default.gc_thresh1=0
```
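The gc_thresh values control the kernel's neighbour (ARP) table garbage collector:

- Below gc_thresh1, entries are not purged.
- Above gc_thresh2, purging becomes more aggressive.
- gc_thresh3 is the hard limit. When it is hit, the kernel logs "neighbor table overflow".

The sketch below classifies the current table size against those thresholds. It assumes IPv4 entries are read from /proc/net/arp. The helpers `count_arp_entries`, `read_thresholds` and `neigh_pressure` are hypothetical names.

```python
from __future__ import annotations

from pathlib import Path

# Location of the IPv4 neighbour-table GC thresholds on a Linux host.
NEIGH_DIR = Path("/proc/sys/net/ipv4/neigh/default")


def count_arp_entries(arp_text: str) -> int:
    """Count rows in /proc/net/arp; the first non-empty line is a header."""
    rows = [line for line in arp_text.splitlines() if line.strip()]
    return max(len(rows) - 1, 0)


def read_thresholds(base: Path = NEIGH_DIR) -> dict[str, int]:
    """Read gc_thresh1..3 (only works where these files exist)."""
    return {
        name: int((base / name).read_text())
        for name in ("gc_thresh1", "gc_thresh2", "gc_thresh3")
    }


def neigh_pressure(entries: int, thresh: dict[str, int]) -> str:
    """Roughly classify the table size against the GC thresholds.

    Approximation: gc_thresh3 limits non-PERMANENT entries, while
    /proc/net/arp also lists permanent ones.
    """
    if entries >= thresh["gc_thresh3"]:
        return "at hard limit: expect 'neighbor table overflow'"
    if entries > thresh["gc_thresh2"]:
        return "above gc_thresh2: aggressive garbage collection"
    if entries < thresh["gc_thresh1"]:
        return "below gc_thresh1: garbage collector will not purge"
    return "normal"
```

On a node, `neigh_pressure(count_arp_entries(Path("/proc/net/arp").read_text()), read_thresholds())` gives a rough reading. With gc_thresh1=0, the "will not purge" branch can never trigger.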
If IPv6 is enabled, additionally add the following:
```
# Enable IPv6 forwarding for network plugins that don't do it themselves
net.ipv6.conf.all.forwarding=1
```
Parameter explanations
Category | Kernel parameter | Description | Reference |
---|---|---|---|
Kubernetes | vm.max_map_count = 262144 | Limits the number of VMAs (virtual memory areas) a single process may own. A larger value is useful for Elasticsearch, MongoDB and other heavy mmap users. | ES Configuration |
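Kubernetes | kernel.softlockup_panic = 1 | Panics the kernel when a soft lockup is detected, instead of only logging a warning. Used to debug the kernel soft-lockup bug seen on K8S nodes. | root cause kernel soft lockups · Issue #37853 · kubernetes/kubernetes (github.com) |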
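Kubernetes | kernel.softlockup_all_cpu_backtrace = 1 | When a soft lockup is detected, captures stack traces from all CPUs rather than only the stuck one. Used to debug the same K8S soft-lockup bug. | root cause kernel soft lockups · Issue #37853 · kubernetes/kubernetes (github.com) |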
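Kubernetes | net.ipv4.ip_local_reserved_ports = 30000-32767 | Reserves the default K8S NodePort range so the kernel does not hand these ports out as automatically chosen (ephemeral) local ports. | service-node-port-range and ip_local_port_range collision · Issue #6342 · kubernetes/kops (github.com) |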
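Network | net.core.somaxconn = 32768 | Upper limit on a listening socket's backlog. The backlog is the socket's listen queue, where a connection waits until the application accepts it. A larger backlog passed to listen() is silently capped at this value. Raising it allows more pending connections. | Image: We should tweak our sysctls · Issue #261 · kubernetes-retired/kube-deploy (github.com) |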
Network | net.core.rmem_max = 16777216 | Maximum socket receive buffer size in bytes that an application can request (SO_RCVBUF). Raises the ceiling for the socket receive buffer. | Image: We should tweak our sysctls · Issue #261 · kubernetes-retired/kube-deploy (github.com) |
Network | net.core.wmem_max = 16777216 | Maximum socket send buffer size in bytes that an application can request (SO_SNDBUF). Raises the ceiling for the socket send buffer. | Image: We should tweak our sysctls · Issue #261 · kubernetes-retired/kube-deploy (github.com) |
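Network | net.ipv4.tcp_wmem = 4096 87380 16777216<br/>net.ipv4.tcp_rmem = 4096 87380 16777216 | Per-socket TCP send/receive buffer sizes in bytes, given as min, default and max. Raising the max lets TCP autotuning grow buffers up to 16 MiB. | Image: We should tweak our sysctls · Issue #261 · kubernetes-retired/kube-deploy (github.com) |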
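Network | net.ipv4.tcp_max_syn_backlog = 8096 | Length of the queue of connection requests (SYN received) that have not yet been acknowledged by the client. The kernel scales the default with system memory (e.g. 1024). Raising it allows more outstanding SYN requests. | Image: We should tweak our sysctls · Issue #261 · kubernetes-retired/kube-deploy (github.com) |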
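Network | net.ipv4.tcp_slow_start_after_idle = 0 | Disables TCP slow start after an idle period. Persistent (keep-alive) HTTP connections then keep their congestion window instead of ramping up again. | Image: We should tweak our sysctls · Issue #261 · kubernetes-retired/kube-deploy (github.com) |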
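Network | net.ipv4.tcp_tw_reuse = 1 | Allows sockets in TIME_WAIT to be reused for new TCP connections when this is safe from a protocol viewpoint. It only affects outgoing connections. The default is 0 (off) on older kernels and 2 (loopback only) on newer ones. | Image: We should tweak our sysctls · Issue #261 · kubernetes-retired/kube-deploy (github.com) |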
Network | net.core.netdev_max_backlog = 16384 | Maximum number of packets queued on the input side when the interface receives packets faster than the kernel can process them. | Image: We should tweak our sysctls · Issue #261 · kubernetes-retired/kube-deploy (github.com) |
File system | fs.file-max = 2097152 | System-wide maximum number of file handles the kernel will allocate, i.e. how many files can be open at once on the whole Linux system. | Image: We should tweak our sysctls · Issue #261 · kubernetes-retired/kube-deploy (github.com) |
File system | fs.inotify.max_user_instances = 8192<br/>fs.inotify.max_user_watches = 524288 | Maximum number of inotify instances and watches per user.<br/>Because dockerd runs as a single user, the default of 128 instances per user is too low.<br/>Typical inotify users: nginx ingress controller, kubectl logs -f | Image: We should tweak our sysctls · Issue #261 · kubernetes-retired/kube-deploy (github.com) |
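kubelet | vm.overcommit_memory = 1 | Memory overcommit policy. 1 means "always overcommit": the kernel grants allocations without checking whether enough memory is available. kubelet expects this value. | Image: We should tweak our sysctls · Issue #261 · kubernetes-retired/kube-deploy (github.com) |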
kubelet | kernel.panic = 10 | After a kernel panic, reboot automatically after 10 seconds. | Image: We should tweak our sysctls · Issue #261 · kubernetes-retired/kube-deploy (github.com) |
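kubelet | kernel.panic_on_oops = 1 | Calls panic() when a kernel oops occurs, so the node reboots (per kernel.panic) instead of running on in a possibly corrupted state. | Image: We should tweak our sysctls · Issue #261 · kubernetes-retired/kube-deploy (github.com) |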
Network | net.ipv4.ip_forward=1 | Enables IP forwarding. This also prevents Docker from changing iptables. | Upgrading docker 1.13 on nodes causes outbound container traffic to stop working · Issue #40182 · kubernetes/kubernetes (github.com) |
Network | net.ipv4.neigh.default.gc_thresh1=0 | Fixes the "arp_cache: neighbor table overflow!" error on AWS. | arp_cache: neighbor table overflow! · Issue #4533 · kubernetes/kops (github.com) |
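
Reserving 30000-32767 only matters when ip_local_port_range overlaps the NodePort range. That overlap is the collision described in kops issue #6342. On many current kernels the default range is 32768-60999, so there is no overlap, but widening it (for example to 1024-65535) creates one.

The minimal sketch below computes which NodePorts could still be handed out as ephemeral ports. It assumes the kernel's comma-separated range syntax for ip_local_reserved_ports (e.g. "1,2-4,10-10"). The helper names `parse_port_list` and `exposed_node_ports` are hypothetical.

```python
from __future__ import annotations


def parse_port_list(text: str) -> set[int]:
    """Parse ip_local_reserved_ports syntax, e.g. '8080,30000-32767'."""
    ports: set[int] = set()
    for part in text.replace(" ", "").split(","):
        if not part:
            continue  # empty string means "nothing reserved"
        lo, _, hi = part.partition("-")
        start, end = int(lo), int(hi or lo)
        ports.update(range(start, end + 1))
    return ports


def exposed_node_ports(local_range: str, reserved: str,
                       node_ports: str = "30000-32767") -> set[int]:
    """NodePorts inside ip_local_port_range that are not reserved.

    local_range uses the two-number form the kernel prints for
    ip_local_port_range, e.g. "32768\t60999".
    """
    low, high = (int(x) for x in local_range.split())
    ephemeral = set(range(low, high + 1))
    return (parse_port_list(node_ports) & ephemeral) - parse_port_list(reserved)
```

For instance, `exposed_node_ports("1024 65535", "30000-32767")` returns an empty set, meaning no NodePort can be picked as an ephemeral port. With an empty reservation, all 2768 NodePorts would be exposed.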