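About Spark: Setting Up and Using MLflow

Background

MLflow is an open-source machine learning management platform from Databricks. It cleanly decouples model training from model serving, so algorithm engineers can focus on training models without paying much attention to how they are served.
At our company, more than ten services built on it have been running stably for over two years.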

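Setup

Setting up MLflow mainly means setting up the MLflow tracking server, which stores both model metadata and the model data itself.
Here we use MinIO as the storage backend for model data and MySQL for model metadata, because this combination can meet production needs and is not limited to testing.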

Setting up MinIO
See my earlier article on setting up and using MinIO, and create a bucket named mlflow for the later steps.
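
The bucket can also be created from Python instead of the MinIO console. The following is a minimal sketch, assuming boto3 is installed and MinIO is reachable at the endpoint used later in this article; `ensure_bucket` and `make_minio_client` are hypothetical helper names, not part of MLflow or MinIO.

```python
def ensure_bucket(s3_client, bucket_name):
    """Create bucket_name if it does not exist; return True if it was created."""
    existing = {b["Name"] for b in s3_client.list_buckets().get("Buckets", [])}
    if bucket_name in existing:
        return False
    s3_client.create_bucket(Bucket=bucket_name)
    return True


def make_minio_client(endpoint_url, access_key, secret_key):
    """Build a boto3 S3 client pointed at a MinIO endpoint (values are assumptions)."""
    import boto3  # imported lazily so ensure_bucket has no hard dependency on boto3

    return boto3.client(
        "s3",
        endpoint_url=endpoint_url,
        aws_access_key_id=access_key,
        aws_secret_access_key=secret_key,
    )
```

With a running MinIO instance, something like `ensure_bucket(make_minio_client("http://localhost:9001", access_key, secret_key), "mlflow")` should leave a bucket named mlflow in place, whether or not it existed before.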

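Setting up MLflow

Installing conda
Follow the conda installation guide and install the conda distribution that matches your operating system.

Installing the MLflow tracking server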

# Create a conda environment with Python 3.6
conda create -n mlflow-1.11.0 python==3.6
# Activate the conda environment
conda activate mlflow-1.11.0
# Install the Python packages the MLflow tracking server depends on
pip install mlflow==1.11.0
pip install mysqlclient
pip install boto3

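Starting the MLflow tracking server

Export the MinIO URL together with the required access key ID and secret key, because the MLflow tracking server needs them when uploading model files: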
export AWS_ACCESS_KEY_ID=AKIAIOSFODNN7EXAMPLE
export AWS_SECRET_ACCESS_KEY=wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
export MLFLOW_S3_ENDPOINT_URL=http://localhost:9001
mlflow server \
   --backend-store-uri mysql://root:AO,h07ObIeH-@localhost/mlflow_test \
   --host 0.0.0.0 -p 5002 \
   --default-artifact-root s3://mlflow
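
The value passed to `--backend-store-uri` is a SQLAlchemy-style database URL, so a password containing characters such as `@`, `:` or `/` has to be percent-encoded before it is placed in the URL. The sketch below shows one way to build such a URI; `build_backend_store_uri` is a hypothetical helper, and the credentials in the usage note are placeholders rather than values from this setup.

```python
from urllib.parse import quote_plus


def build_backend_store_uri(user, password, host, database, scheme="mysql"):
    """Return a database URI with the user name and password percent-encoded."""
    return "{}://{}:{}@{}/{}".format(
        scheme, quote_plus(user), quote_plus(password), host, database
    )
```

For example, `build_backend_store_uri("root", "p@ss/word", "localhost", "mlflow_test")` yields `mysql://root:p%40ss%2Fword@localhost/mlflow_test`, which can then be passed to `mlflow server`.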

Open localhost:5002 in a browser and you should see the MLflow tracking UI.

Usage

Copy the following wine.py file:

import os
import warnings
import sys

import pandas as pd
import numpy as np
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.linear_model import ElasticNet
import mlflow.sklearn


def eval_metrics(actual, pred):
  rmse = np.sqrt(mean_squared_error(actual, pred))
  mae = mean_absolute_error(actual, pred)
  r2 = r2_score(actual, pred)
  return rmse, mae, r2


if __name__ == "__main__":
  warnings.filterwarnings("ignore")
  np.random.seed(40)

  # Read the wine-quality csv file (expected in the same directory as this script)
  wine_path = os.path.join(os.path.dirname(os.path.abspath(__file__)), "wine-quality.csv")
  data = pd.read_csv(wine_path)

  # Split the data into training and test sets. (0.75, 0.25) split.
  train, test = train_test_split(data)

  # The predicted column is "quality" which is a scalar from [3, 9]
  train_x = train.drop(["quality"], axis=1)
  test_x = test.drop(["quality"], axis=1)
  train_y = train[["quality"]]
  test_y = test[["quality"]]

  alpha = float(sys.argv[1]) if len(sys.argv) > 1 else 0.5
  l1_ratio = float(sys.argv[2]) if len(sys.argv) > 2 else 0.5
  mlflow.set_tracking_uri("http://localhost:5002")
  client = mlflow.tracking.MlflowClient()
  mlflow.set_experiment('http_metrics_test')
  with mlflow.start_run():
      lr = ElasticNet(alpha=alpha, l1_ratio=l1_ratio, random_state=42)
      lr.fit(train_x, train_y)

      predicted_qualities = lr.predict(test_x)

      (rmse, mae, r2) = eval_metrics(test_y, predicted_qualities)

      print("Elasticnet model (alpha=%f, l1_ratio=%f):" % (alpha, l1_ratio))
      print("  RMSE: %s" % rmse)
      print("  MAE: %s" % mae)
      print("  R2: %s" % r2)

      mlflow.log_param("alpha", alpha)
      mlflow.log_param("l1_ratio", l1_ratio)
      mlflow.log_metric("rmse", rmse)
      mlflow.log_metric("r2", r2)
      mlflow.log_metric("mae", mae)

      mlflow.sklearn.log_model(lr, "model")

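Notes:

  1. `mlflow.set_tracking_uri("http://localhost:5002")` must point to the MLflow tracking server started above
  2. `mlflow.set_experiment('http_metrics_test')` sets the experiment name
  3. Install the Python packages this program depends on
  4. If you are not running in the same environment in which the variables were exported for the server, you also need to run
   ```
    export AWS_ACCESS_KEY_ID=AKIAIOSFODNN7EXAMPLE
    export AWS_SECRET_ACCESS_KEY=wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
    export MLFLOW_S3_ENDPOINT_URL=http://localhost:9001
   ```
   so that the Python client can upload the model files and model metadata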
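
The same three variables can be set from inside the Python process instead of the shell, as long as this happens before the first artifact upload. This is a minimal sketch; `configure_minio_env` is a hypothetical helper, and the `environ` parameter exists only so the function can be exercised against a plain dict.

```python
import os


def configure_minio_env(access_key, secret_key, endpoint_url, environ=os.environ):
    """Set the variables read for S3/MinIO artifact access; return the previous values."""
    new_values = {
        "AWS_ACCESS_KEY_ID": access_key,
        "AWS_SECRET_ACCESS_KEY": secret_key,
        "MLFLOW_S3_ENDPOINT_URL": endpoint_url,
    }
    # Remember what was there before so a caller can restore it if needed.
    previous = {name: environ.get(name) for name in new_values}
    environ.update(new_values)
    return previous
```

In wine.py this would be called near the top of the `__main__` block, before `mlflow.start_run()`.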

Run python wine.py directly. If it succeeds, the new run appears in the MLflow tracking server UI.

Click the run dated 2020-10-30 10:34:38 to open its details page.

Starting the MLflow model service

Run the following commands in the same conda environment:

export MLFLOW_TRACKING_URI=http://localhost:5002 
mlflow models serve -m runs:/e69aed0b22fb45debd115dfc09dbc75a/model -p 1234 --no-conda

Here e69aed0b22fb45debd115dfc09dbc75a is the run id shown in the MLflow tracking server UI.
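
The run id does not have to be copied from the UI by hand; it can also be looked up through the tracking API. The sketch below picks the run with the lowest value of a metric in an experiment. `best_run_id` is a hypothetical helper, and it assumes the `client` argument behaves like `mlflow.tracking.MlflowClient`.

```python
def best_run_id(client, experiment_name, metric="rmse"):
    """Return the id of the run with the lowest `metric`, or None if nothing matches."""
    experiment = client.get_experiment_by_name(experiment_name)
    if experiment is None:
        return None
    runs = client.search_runs(
        [experiment.experiment_id],
        order_by=["metrics.{} ASC".format(metric)],
        max_results=1,
    )
    return runs[0].info.run_id if runs else None
```

With the server from this article running, a call such as `best_run_id(MlflowClient("http://localhost:5002"), "http_metrics_test")` (after `from mlflow.tracking import MlflowClient`) should return an id that can be used in the `runs:/<run id>/model` URI.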

If you hit ModuleNotFoundError: No module named 'sklearn',
run pip install scikit-learn==0.19.1
If you hit ModuleNotFoundError: No module named 'scipy',
run pip install scipy

Send a request to the service started for this model:

curl -X POST -H "Content-Type:application/json; format=pandas-split" --data '{"columns":["alcohol", "chlorides", "citric acid", "density", "fixed acidity", "free sulfur dioxide", "pH", "residual sugar", "sulphates", "total sulfur dioxide", "volatile acidity"],"data":[[12.8, 0.029, 0.48, 0.98, 6.2, 29, 3.33, 1.2, 0.39, 75, 0.66]]}' http://127.0.0.1:1234/invocations

An output of [5.455573233630147] indicates that the model service was deployed successfully.
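
The same request can be sent from Python. The sketch below uses only the standard library and mirrors the curl call: a JSON body in the pandas "split" layout and the same Content-Type header. `build_split_payload` and `invoke` are hypothetical helper names, and the timeout value is an arbitrary assumption.

```python
import json
import urllib.request

CONTENT_TYPE = "application/json; format=pandas-split"


def build_split_payload(columns, rows):
    """Serialize rows in the pandas 'split' layout used by the curl example."""
    for row in rows:
        if len(row) != len(columns):
            raise ValueError("each row must have one value per column")
    return json.dumps({"columns": list(columns), "data": [list(r) for r in rows]})


def invoke(url, payload, timeout=10):
    """POST the payload to a model server's /invocations URL and parse the JSON reply."""
    request = urllib.request.Request(
        url,
        data=payload.encode("utf-8"),
        headers={"Content-Type": CONTENT_TYPE},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `invoke("http://127.0.0.1:1234/invocations", build_split_payload(columns, rows))` with the column names and row from the curl command should return the same list of predictions.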

That covers basic MLflow usage. For algorithms that MLflow does not support out of the box, refer to custom models.
