Prometheus: node_exporter host disk monitoring, source code analysis and troubleshooting


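node_exporter is deployed as a Pod and monitors host metrics such as CPU, memory and disk.
Because the Pod's isolated runtime environment interferes with monitoring the host, the Pod should share the host's namespaces as far as possible; the usual configuration is: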

hostNetwork: true
hostPID: true

This article focuses on how node_exporter monitors the usage of the host's disk partitions.

The user node_exporter runs as

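In a Dockerfile, the USER instruction sets the user the container runs as; if it is not specified, the user is root.
As the Dockerfile below shows, node_exporter's default user is nobody, with UID 65534: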

......
COPY ./node_exporter /bin/node_exporter

EXPOSE      9100
USER        nobody
ENTRYPOINT  ["/bin/node_exporter"]

The securityContext configured in node_exporter's daemonset.yaml is:

......
    hostNetwork: true
    hostPID: true
    securityContext:
        runAsNonRoot: true
        runAsUser: 65534
......

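About the runAsNonRoot setting:

  • runAsNonRoot does not choose the user: the user comes from runAsUser if it is set, otherwise from the image's USER (root if the image sets none);
  • if runAsNonRoot is true, the kubelet checks at startup that the container will not run as UID 0 and refuses to start it otherwise; if it is unset or false, no check is made.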
# kubectl explain daemonset.spec.template.spec.securityContext.runAsNonRoot
KIND:     DaemonSet
VERSION:  apps/v1

FIELD:    runAsNonRoot <boolean>

DESCRIPTION:
     Indicates that the container must run as a non-root user. If true, the
     Kubelet will validate the image at runtime to ensure that it does not run
     as UID 0 (root) and fail to start the container if it does. If unset or
     false, no such validation will be performed. May also be set in
     SecurityContext. If set in both SecurityContext and PodSecurityContext, the
     value specified in SecurityContext takes precedence.

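In other words, node_exporter runs as the non-root user nobody (UID 65534): runAsUser: 65534 selects the user, and runAsNonRoot: true makes the kubelet refuse to start the container as root.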
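The interplay between the image's USER, runAsUser and runAsNonRoot can be sketched as a small decision function. This is a simplified, hypothetical model written for illustration (effectiveUID is not a kubelet function): it assumes the image user has already been resolved to a numeric UID (nobody resolves to 65534) and ignores container-level overrides and runAsGroup.

```go
package main

import (
	"errors"
	"fmt"
)

// effectiveUID is a hypothetical, simplified model of how the container UID is
// chosen and checked; it is not taken from the kubelet source.
//   imageUID:     UID of the Dockerfile USER (0 when USER is absent)
//   runAsUser:    securityContext.runAsUser, nil when unset
//   runAsNonRoot: securityContext.runAsNonRoot (unset treated as false)
func effectiveUID(imageUID int64, runAsUser *int64, runAsNonRoot bool) (int64, error) {
	uid := imageUID
	if runAsUser != nil {
		// runAsUser overrides the USER instruction of the image.
		uid = *runAsUser
	}
	// runAsNonRoot does not pick a user; it only validates the final UID.
	if runAsNonRoot && uid == 0 {
		return 0, errors.New("container would run as UID 0 but runAsNonRoot is true")
	}
	return uid, nil
}

func main() {
	nobody, root := int64(65534), int64(0)

	// The DaemonSet above: runAsUser 65534 with runAsNonRoot true.
	fmt.Println(effectiveUID(65534, &nobody, true))
	// runAsUser 0 while runAsNonRoot is still true: rejected.
	fmt.Println(effectiveUID(65534, &root, true))
	// runAsUser 0 with runAsNonRoot unset or false: runs as root.
	fmt.Println(effectiveUID(65534, &root, false))
}
```

The second case is why switching node_exporter to root means dropping runAsNonRoot: true as well as setting runAsUser: 0.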

How node_exporter monitors host disk partitions

1) Mount the host's /proc and / directories

The host's /proc directory is mounted at /host/proc in the container;
the host's / directory is mounted at /host/root in the container:

spec:
  template:
    spec:
      containers:
      - name: node-exporter
        volumeMounts:
        - mountPath: /host/proc
          name: proc
        - mountPath: /host/root
          mountPropagation: HostToContainer
          name: root
      volumes:
      - hostPath:
          path: /proc
        name: proc
      - hostPath:
          path: /
        name: root
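Inside node_exporter these mount points are used as path prefixes. The sketch below mirrors, in simplified form, the helpers procFilePath and rootfsFilePath that appear in the code further down. It assumes the container is started with --path.procfs=/host/proc and --path.rootfs=/host/root (the container args are not part of the YAML excerpt above), and hard-codes those values as constants instead of parsing flags.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// Assumed flag values: --path.procfs=/host/proc and --path.rootfs=/host/root.
const (
	procPath   = "/host/proc"
	rootfsPath = "/host/root"
)

// Simplified versions of node_exporter's procFilePath and rootfsFilePath:
// each joins a host path onto the directory where the host fs is mounted.
func procFilePath(name string) string   { return filepath.Join(procPath, name) }
func rootfsFilePath(name string) string { return filepath.Join(rootfsPath, name) }

func main() {
	fmt.Println(procFilePath("1/mounts"))          // /host/proc/1/mounts
	fmt.Println(rootfsFilePath("/"))               // /host/root
	fmt.Println(rootfsFilePath("/root/workspace")) // /host/root/root/workspace
}
```

The last line is the path that matters for the problem analysed below: the host's /root/workspace is looked up as /host/root/root/workspace inside the container.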

2) node_exporter reads the disk partitions

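node_exporter reads /host/proc/1/mounts inside the container. Because /host/proc is the host's /proc, this is the mount table of process 1 on the host (the host's init process), i.e. the host's mount points: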

// node_exporter/collector/filesystem_linux.go
func mountPointDetails() ([]filesystemLabels, error) {
    file, err := os.Open(procFilePath("1/mounts"))
    if os.IsNotExist(err) {
        // Fallback to `/proc/mounts` if `/proc/1/mounts` is missing due to hidepid.
        log.Debugf("Got %q reading root mounts, falling back to system mounts", err)
        file, err = os.Open(procFilePath("mounts"))
    }
    if err != nil {
        return nil, err
    }
    defer file.Close()

    return parseFilesystemLabels(file)
}

File contents:

cat /host/proc/1/mounts
/dev/sda1 /root/workspace xfs rw,relatime,attr2,inode64,noquota 0 0
/dev/nvme0n1p2 /boot ext3 rw,relatime 0 0
/dev/nvme0n1p1 /boot/efi vfat rw,relatime,fmask=0077,dmask=0077,codepage=936,iocharset=cp936,shortname=winnt,errors=remount-ro 0 0
/dev/nvme0n1p3 / xfs rw,relatime,attr2,inode64,noquota 0 0

Parsing the file contents:

func parseFilesystemLabels(r io.Reader) ([]filesystemLabels, error) {
    var filesystems []filesystemLabels
    scanner := bufio.NewScanner(r)
    for scanner.Scan() {
        parts := strings.Fields(scanner.Text())
        if len(parts) < 4 {
            return nil, fmt.Errorf("malformed mount point information: %q", scanner.Text())
        }
        // Ensure we handle the translation of \040 and \011
        // as per fstab(5).
        parts[1] = strings.Replace(parts[1], "\\040", " ", -1)
        parts[1] = strings.Replace(parts[1], "\\011", "\t", -1)
        filesystems = append(filesystems, filesystemLabels{
            device:     parts[0],
            mountPoint: parts[1],
            fsType:     parts[2],
            options:    parts[3],
        })
    }
    return filesystems, scanner.Err()
}

3) Query partition size and usage

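  • First, read the mounted partitions (mountPointDetails);
  • then, for each mount point, call the statfs(2) system call on the corresponding path under the rootfs prefix to get its size and usage;
  • if statfs fails, record deviceError=1 for that partition instead of its size and usage;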
// node_exporter/collector/filesystem_linux.go
// GetStats returns filesystem stats.
func (c *filesystemCollector) GetStats() ([]filesystemStats, error) {
    mps, err := mountPointDetails()
    if err != nil {
        return nil, err
    }
    stats := []filesystemStats{}
    for _, labels := range mps {
        ......

        // The success channel is used to tell the "watcher" that the stat
        // finished successfully. The channel is closed on success.
        success := make(chan struct{})
        go stuckMountWatcher(labels.mountPoint, success)

        // Call statfs(2) on the mount point (under the rootfs prefix) and store the result in buf
        buf := new(syscall.Statfs_t)
        err = syscall.Statfs(rootfsFilePath(labels.mountPoint), buf)

        close(success)

        if err != nil {
            stats = append(stats, filesystemStats{
                labels:      labels,
                deviceError: 1,
            })
            log.Debugf("Error on statfs() system call for %q: %s", rootfsFilePath(labels.mountPoint), err)
            continue
        }

        var ro float64
        for _, option := range strings.Split(labels.options, ",") {
            if option == "ro" {
                ro = 1
                break
            }
        }
        stats = append(stats, filesystemStats{
            labels:    labels,
            size:      float64(buf.Blocks) * float64(buf.Bsize),
            free:      float64(buf.Bfree) * float64(buf.Bsize),
            avail:     float64(buf.Bavail) * float64(buf.Bsize),
            files:     float64(buf.Files),
            filesFree: float64(buf.Ffree),
            ro:        ro,
        })
    }
    return stats, nil
}
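The per-mount-point arithmetic in GetStats can be tried on its own. The sketch below (statMountPoint and fsStats are hypothetical names) calls syscall.Statfs directly on a local path, without the rootfs prefix and without the stuck-mount watcher, and converts block counts to bytes the same way GetStats does. Like filesystem_linux.go, it is Linux-only.

```go
package main

import (
	"fmt"
	"strings"
	"syscall"
)

// fsStats holds the values GetStats derives from one statfs call.
type fsStats struct {
	size, free, avail, files, filesFree, ro float64
}

// statMountPoint repeats the GetStats calculation for a single path.
func statMountPoint(path, options string) (fsStats, error) {
	var buf syscall.Statfs_t
	if err := syscall.Statfs(path, &buf); err != nil {
		// node_exporter records deviceError=1 at this point.
		return fsStats{}, err
	}
	s := fsStats{
		size:      float64(buf.Blocks) * float64(buf.Bsize),
		free:      float64(buf.Bfree) * float64(buf.Bsize),  // all free blocks
		avail:     float64(buf.Bavail) * float64(buf.Bsize), // free blocks usable by non-root
		files:     float64(buf.Files),
		filesFree: float64(buf.Ffree),
	}
	// Mark the filesystem read-only if "ro" is among its mount options.
	for _, opt := range strings.Split(options, ",") {
		if opt == "ro" {
			s.ro = 1
			break
		}
	}
	return s, nil
}

func main() {
	s, err := statMountPoint("/", "rw,relatime")
	if err != nil {
		fmt.Println("statfs failed:", err)
		return
	}
	fmt.Printf("size=%.0f free=%.0f avail=%.0f files=%.0f filesFree=%.0f ro=%.0f\n",
		s.size, s.free, s.avail, s.files, s.filesFree, s.ro)
}
```

The size, free and avail values correspond to node_filesystem_size_bytes, node_filesystem_free_bytes and node_filesystem_avail_bytes; avail is smaller than free on filesystems that reserve blocks for root.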

Problem: node_exporter cannot monitor the host partition /root/workspace

On the host, /dev/sda1 is mounted at /root/workspace, but querying node_filesystem_size_bytes returns no size for this partition;
node_filesystem_device_error, however, does return it:

node_filesystem_device_error{device="/dev/sda1",endpoint="https",fstype="xfs",instance="master1",job="node-exporter",mountpoint="/root/workspace",namespace="monitoring",pod="node-exporter-69hpl",service="node-exporter"}    1

The code above shows that the partition was found in the mount table, but the statfs call on it failed;

Let's look inside the container:

  • the host's / is mounted at /host/root in the container;
  • so the host's /root/workspace should be at /host/root/root/workspace:
/host/root $ ls root
ls: can't open 'root': Permission denied

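The cause: the container runs as nobody, which has no permission on the host's /root directory (typically mode 0700 or 0550, owned by root). statfs(2) needs search permission on every directory in the path, so statfs on /host/root/root/workspace fails with EACCES.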

Solution: monitoring the host partition /root/workspace with node_exporter

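Modify node-exporter-daemonset.yaml so that the Pod runs the container as root (runAsNonRoot: true must be removed as well, otherwise the kubelet refuses to start a container that runs as UID 0):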

......
      securityContext:
        runAsUser: 0
......

This fixes the problem, but running node_exporter as root may introduce security risks.

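Note that if the host partition is not mounted under /root/…, node_exporter does not need to run as root and the configuration above can stay unchanged, because statfs can read the partition's size and usage as the nobody user.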

In practice, partitions are rarely mounted under /root/…, so this problem is uncommon.

