Operation Orchestration Scenario Series: Adding Instances to an SLS Machine Group

Scenario Overview

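A common operations scenario: after scaling out a batch of machines, we need to set up SLS log collection on them. If the SLS Logstore is already configured, all that remains is to add the new machines to the machine group.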

Solution

The traditional approach is to log in to each ECS instance and install Logtail by running the following commands:

wget http://logtail-release-{{ACS::RegionId}}.oss-{{ACS::RegionId}}-internal.aliyuncs.com/linux64/logtail.sh -O logtail.sh; chmod 755 logtail.sh; ./logtail.sh install {{ACS::RegionId}};
echo {{LogTailUserDefinedId}} > /etc/ilogtail/user_defined_id
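
To make the placeholders concrete, here is a minimal Python sketch of what the command could look like once {{ACS::RegionId}} and {{LogTailUserDefinedId}} are filled in. OOS performs this substitution itself; the render_command helper and the sample values cn-hangzhou and web-servers are illustrative assumptions, not part of OOS.

```python
import re

# The install command from the article, with OOS-style placeholders left in.
COMMAND_TEMPLATE = (
    "wget http://logtail-release-{{ACS::RegionId}}.oss-{{ACS::RegionId}}"
    "-internal.aliyuncs.com/linux64/logtail.sh -O logtail.sh; "
    "chmod 755 logtail.sh; ./logtail.sh install {{ACS::RegionId}}; "
    "echo {{LogTailUserDefinedId}} > /etc/ilogtail/user_defined_id"
)

# Matches {{Name}} and {{ Name }}, where Name may contain ':' and '.'.
PLACEHOLDER = re.compile(r"\{\{\s*([A-Za-z0-9_:.]+)\s*\}\}")


def render_command(template: str, values: dict) -> str:
    """Replace every placeholder with its value (hypothetical helper)."""
    def substitute(match):
        name = match.group(1)
        if name not in values:
            raise KeyError(f"no value supplied for placeholder {name!r}")
        return values[name]
    return PLACEHOLDER.sub(substitute, template)


# Sample values, chosen only for illustration.
command = render_command(
    COMMAND_TEMPLATE,
    {"ACS::RegionId": "cn-hangzhou", "LogTailUserDefinedId": "web-servers"},
)
print(command)
```

Raising on an unknown placeholder, rather than leaving it in the string, keeps a half-rendered command from ever reaching an instance.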

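Breaking this down into tasks, we need the following steps:
1. Check that the instance is in the Running state
2. Call the Cloud Assistant CreateCommand API to create the command above
3. Call InvokeCommand to run it
4. Wait for the execution to succeed
5. Delete the command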
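
The steps above can be sketched in Python against an in-memory stand-in for the ECS API. FakeEcs and its methods are hypothetical and only mimic the order of the calls; a real run would go through OOS or an SDK.

```python
import base64
import itertools


class FakeEcs:
    """Hypothetical in-memory stand-in for the ECS API; not a real SDK client."""

    def __init__(self, instance_status):
        self.instance_status = instance_status  # instance id -> status
        self.commands = {}                      # command id -> Base64 content
        self.invocations = {}                   # invoke id -> polls remaining
        self._ids = itertools.count(1)

    def describe_instance_status(self, instance_id):
        return self.instance_status[instance_id]

    def create_command(self, content):
        command_id = f"c-{next(self._ids)}"
        # Store the content Base64-encoded, as the template does.
        self.commands[command_id] = base64.b64encode(content.encode()).decode()
        return command_id

    def invoke_command(self, command_id, instance_id):
        invoke_id = f"t-{next(self._ids)}"
        self.invocations[invoke_id] = 2  # pretend it finishes on the 3rd poll
        return invoke_id

    def describe_invocation_status(self, invoke_id):
        if self.invocations[invoke_id] > 0:
            self.invocations[invoke_id] -= 1
            return "Running"
        return "Finished"

    def delete_command(self, command_id):
        del self.commands[command_id]


def install_agent(ecs, instance_id, content, max_attempts=5):
    # 1. The instance must be Running before a command is sent to it.
    if ecs.describe_instance_status(instance_id) != "Running":
        raise RuntimeError(f"{instance_id} is not Running")
    # 2. Create the command, then 3. invoke it on the instance.
    command_id = ecs.create_command(content)
    try:
        invoke_id = ecs.invoke_command(command_id, instance_id)
        # 4. Poll until the invocation reports Finished.
        for _ in range(max_attempts):
            if ecs.describe_invocation_status(invoke_id) == "Finished":
                return invoke_id
        raise TimeoutError(f"{invoke_id} did not finish in {max_attempts} polls")
    finally:
        # 5. Delete the command so it does not pile up in the account.
        ecs.delete_command(command_id)


ecs = FakeEcs({"i-abc123": "Running"})
invoke_id = install_agent(ecs, "i-abc123", "echo hello")
print(invoke_id, ecs.commands)
```

The finally block is a choice made in this sketch so that the command is removed even when polling fails.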

Next, convert these steps into an OOS template and create it under the name installSlsAgent:

{
  "FormatVersion": "OOS-2019-06-01",
  "Description": "Install Logtail agent on the ECS Instance.",
  "Parameters": {
    "InstanceId": {
      "Type": "String",
      "Description": "the Instance Id to install ilogtail",
      "AllowedPattern": "i-[A-Za-z0-9]*",
      "MinLength": 1,
      "MaxLength": 30
    },
    "LogTailUserDefinedId": {
      "Type": "String",
      "Description": "the user defined Id write to /etc/ilogtail/user_defined_id",
      "AllowedPattern": "[A-Za-z0-9\\-_]*",
      "MinLength": 1,
      "MaxLength": 30
    },
    "OOSAssumeRole": {
      "Type": "String",
      "Description": "The RAM role to be assumed by OOS.",
      "Default": "OOSServiceRole"
    }
  },
  "RamRole": "{{OOSAssumeRole}}",
  "Tasks": [
    {
      "Name": "checkInstanceReady",
      "Action": "ACS::CheckFor",
      "Description": "describe instances with specified parameters, refer them here: https://help.aliyun.com/document_detail/63440.html",
      "Properties": {
        "API": "DescribeInstances",
        "Service": "ECS",
        "PropertySelector": "Instances.Instance[].Status",
        "DesiredValues": ["Running"],
        "Parameters": {
          "InstanceIds": ["{{ InstanceId }}"]
        }
      },
      "Outputs": {
        "InstanceIds": {
          "ValueSelector": "Instances.Instance[].InstanceId",
          "Type": "List"
        }
      }
    },
    {
      "Name": "createCommand",
      "Action": "ACS::ExecuteApi",
      "Description": "create the command to install logtail agent.",
      "Properties": {
        "API": "CreateCommand",
        "Service": "ECS",
        "Parameters": {
          "CommandContent": {
            "Fn::Base64Encode": "wget http://logtail-release-{{ACS::RegionId}}.oss-{{ACS::RegionId}}-internal.aliyuncs.com/linux64/logtail.sh -O logtail.sh; chmod 755 logtail.sh; ./logtail.sh install {{ACS::RegionId}}; echo {{LogTailUserDefinedId}} > /etc/ilogtail/user_defined_id"
          },
          "Name": "oos-{{ACS::TemplateName}}",
          "Type": "RunShellScript"
        }
      },
      "Outputs": {
        "CommandId": {
          "Type": "String",
          "ValueSelector": "CommandId"
        }
      }
    },
    {
      "Name": "invokeCommand",
      "Action": "ACS::ExecuteApi",
      "Description": "invoke the command to install ilogtail",
      "Properties": {
        "Service": "ECS",
        "API": "InvokeCommand",
        "Parameters": {
          "CommandId": "{{ createCommand.CommandId }}",
          "InstanceIds": ["{{ InstanceId }}"]
        }
      },
      "Outputs": {
        "InvokeId": {
          "Type": "String",
          "ValueSelector": "InvokeId"
        }
      }
    },
    {
      "Name": "untilInvocationDone",
      "Action": "ACS::WaitFor",
      "Description": "until invocation ready",
      "MaxAttempts": 5,
      "Properties": {
        "Service": "ECS",
        "API": "DescribeInvocations",
        "Parameters": {
          "InvokeId": "{{ invokeCommand.InvokeId }}"
        },
        "DesiredValues": ["Finished"],
        "PropertySelector": "Invocations.Invocation[].InvokeStatus"
      }
    },
    {
      "Name": "describeInvocationResult",
      "Action": "ACS::ExecuteApi",
      "Description": "get the command invocation result",
      "Properties": {
        "Service": "ECS",
        "API": "DescribeInvocationResults",
        "Parameters": {
          "InvokeId": "{{ invokeCommand.InvokeId }}"
        }
      },
      "Outputs": {
        "InvocationResult": {
          "Type": "String",
          "ValueSelector": "Invocation.InvocationResults.InvocationResult[].Output"
        },
        "ExitCode": {
          "Type": "Number",
          "ValueSelector": "Invocation.InvocationResults.InvocationResult[].ExitCode"
        }
      }
    },
    {
      "Name": "deleteCommand",
      "Action": "ACS::ExecuteApi",
      "Description": "clean up the install ilogtail command",
      "Properties": {
        "Service": "ECS",
        "Risk": "Normal",
        "API": "DeleteCommand",
        "Parameters": {
          "CommandId": "{{ createCommand.CommandId }}"
        }
      }
    }
  ],
  "Outputs": {
    "InvocationResult": {
      "Type": "String",
      "Value": {
        "Fn::Base64Decode": "{{ describeInvocationResult.InvocationResult }}"
      }
    },
    "ExitCode": {
      "Type": "String",
      "Value": "{{describeInvocationResult.ExitCode}}"
    }
  }
}
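
The template wraps the shell command in Fn::Base64Encode before passing it to CreateCommand as CommandContent, and applies Fn::Base64Decode to the Output returned by DescribeInvocationResults. A minimal Python sketch of that round trip follows; the helper names and the sample strings are made up for illustration.

```python
import base64


def encode_command(script: str) -> str:
    """Roughly what Fn::Base64Encode does to the shell script."""
    return base64.b64encode(script.encode("utf-8")).decode("ascii")


def decode_output(output: str) -> str:
    """Roughly what Fn::Base64Decode does to the invocation output."""
    return base64.b64decode(output).decode("utf-8")


encoded = encode_command("echo web-servers > /etc/ilogtail/user_defined_id")

# Pretend the instance printed "ok"; the API would hand it back encoded.
# (Sample text invented for this sketch, not real Logtail output.)
returned = base64.b64encode(b"ok\n").decode("ascii")

print(encoded)
print(decode_output(returned))
```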

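The template above handles installing the SLS agent on a single machine. What about many machines? The OOS Loop feature solves this nicely, and because OOS supports nested template execution, we only need to build a parent template that takes a list of instance IDs and runs installSlsAgent for each one: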

{
  "FormatVersion": "OOS-2019-06-01",
  "Parameters": {
    "InstanceIds": {
      "Type": "List",
      "Description": "the instance id list"
    },
    "LogTailUserDefinedId": {
      "Type": "String",
      "Description": "log tail user defined id",
      "MinLength": 1,
      "MaxLength": 30
    }
  },
  "Tasks": [
    {
      "Name": "installSLSAgent",
      "Action": "ACS::Template",
      "Properties": {
        "TemplateName": "installSlsAgent",
        "Parameters": {
          "InstanceId": "{{ ACS::TaskLoopItem }}",
          "LogTailUserDefinedId": "{{ LogTailUserDefinedId }}"
        }
      },
      "Outputs": {
        "ExitCode": {
          "ValueSelector": "ExitCode",
          "Type": "Number"
        }
      },
      "Loop": {
        "Items": "{{ InstanceIds }}",
        "MaxErrors": 100,
        "Concurrency": 10,
        "Outputs": {}
      }
    }
  ],
  "Outputs": {}
}

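The Loop block in the parent template runs the child template once per item, with Concurrency limiting how many runs are in flight and MaxErrors limiting how many failures are tolerated. The Python sketch below imitates that behaviour with a thread pool. run_loop and fake_install are hypothetical, and the real OOS engine may schedule items and count errors differently.

```python
from concurrent.futures import ThreadPoolExecutor


def run_loop(items, task, concurrency=10, max_errors=100):
    """Run task(item) for every item with bounded concurrency.

    Returns (results, errors). Raises RuntimeError when more than
    max_errors items fail. Simplification: every item is attempted
    before the error budget is checked.
    """
    results, errors = {}, {}
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        futures = {item: pool.submit(task, item) for item in items}
        for item, future in futures.items():
            try:
                results[item] = future.result()
            except Exception as exc:  # collect per-item failures
                errors[item] = exc
    if len(errors) > max_errors:
        raise RuntimeError(f"{len(errors)} items failed, limit is {max_errors}")
    return results, errors


def fake_install(instance_id):
    """Hypothetical stand-in for one installSlsAgent run; returns an exit code."""
    if instance_id.endswith("bad"):
        raise RuntimeError(f"install failed on {instance_id}")
    return 0


results, errors = run_loop(["i-001", "i-002", "i-bad"], fake_install)
print(results)
print(sorted(errors))
```

With the values from the template (Concurrency 10, MaxErrors 100), a single failing instance is recorded but does not abort the batch.
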
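Create an execution.

After it runs, open the execution details: the execution has succeeded, and the status of each loop child task is visible.

Because each child task is a nested execution, clicking it shows how the nested template ran.

Finally, check the machine status in the machine group. A status of OK means the execution succeeded.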

Summary

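We have shown how to use Operation Orchestration Service (OOS) to install the SLS agent in batches and add the machines to a machine group. More scenarios remain to be explored. OOS is currently in private beta; you are welcome to try it and share feedback.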


Author: 巴梨

This article is original content from the Yunqi Community (云栖社区) and may not be reproduced without permission.