TensorFlow Rust in Practice, Part 1

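One application of machine learning is recognizing objects in photos. This includes picking out features such as animals, buildings, and even human faces. This article walks you through using an existing model to perform face detection with Rust and TensorFlow. We will use a pre-trained model called MTCNN for the face detection (note: training a new model is not something this article covers).

We want to read a photo, detect the faces in it, and return an image with bounding boxes drawn on it.
In other words, we want to convert this (image used with permission from RustFest, taken by Fiona Castiñeira):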

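The original MTCNN model was written using Caffe, but luckily there are a number of TensorFlow Python implementations of MTCNN. I am going to choose tf-mtcnn because it is a direct conversion into a single graph model file.

First, we want to add tensorflow rust as a dependency. We start with Cargo.toml: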

[package]
name = "mtcnn"
version = "0.1.0"
edition = "2018"

[dependencies]
tensorflow = "0.12.0"

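What we are going to do is load a Graph, which is the pre-trained MTCNN, and run a session. The gist is that the Graph is the model used for computation, and a Session is one run of the Graph. More information about these concepts is available in the TensorFlow documentation. I like to think of the Graph as an artificial brain in a vat, there mostly to provide some great imagery once we get to inputs and outputs.

So let's start by grabbing the existing mtcnn.pb model and trying to load it. TensorFlow graphs are serialized in protobuf format and can be loaded using Graph::import_graph_def. Note that include_bytes! resolves its path relative to the source file, so mtcnn.pb needs to sit next to main.rs.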

use std::error::Error;

use tensorflow::Graph;
use tensorflow::ImportGraphDefOptions;

fn main() -> Result<(), Box<dyn Error>> {

    //First, we load up the graph as a byte array
    let model = include_bytes!("mtcnn.pb");

    //Then we create a tensorflow graph from the model
    let mut graph = Graph::new();
    graph.import_graph_def(&*model, &ImportGraphDefOptions::new())?;

    Ok(())
}

Running cargo run, we should see no errors:

$ cargo run
   Compiling mtcnn v0.1.0 (~/mtcnn)
    Finished dev [unoptimized + debuginfo] target(s) in 0.89s
     Running `target/debug/mtcnn`

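Great! It looks like we can load this graph!

We want to test image generation, so let's use structopt to take two arguments: an input and an output.
If you haven't used structopt before: structopt is like combining clap with serde. The input argument is the path to the image file. The output is the location where we save the output image.

So our struct looks like this: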

use std::path::PathBuf;
use structopt::StructOpt;

#[derive(StructOpt)]
struct Opt {
    #[structopt(parse(from_os_str))]
    input: PathBuf,

    #[structopt(parse(from_os_str))]
    output: PathBuf
}

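The parse(from_os_str) attribute converts the command-line argument directly into a PathBuf, which saves some boilerplate. We can then use it to get a struct populated from the command-line arguments:

fn main() -> Result<(), Box<dyn Error>> {
    let opt = Opt::from_args();
    // ...
}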
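To make it a bit clearer what structopt is doing for us here, the following is a rough, standard-library-only sketch of the same idea: take the raw OS strings from the command line and turn them into PathBufs without any UTF-8 validation. The helper name parse_paths and its error messages are made up for illustration; this is not how structopt is implemented internally.

```rust
use std::ffi::OsString;
use std::path::PathBuf;

/// Hypothetical helper: turns the raw arguments (without the program name)
/// into an (input, output) pair of paths.
fn parse_paths<I>(mut args: I) -> Result<(PathBuf, PathBuf), String>
where
    I: Iterator<Item = OsString>,
{
    // PathBuf::from accepts an OsString directly, so no UTF-8 check is needed.
    let input = args
        .next()
        .map(PathBuf::from)
        .ok_or("missing <input> path")?;
    let output = args
        .next()
        .map(PathBuf::from)
        .ok_or("missing <output> path")?;
    Ok((input, output))
}
```

In a real program this would be called as parse_paths(std::env::args_os().skip(1)). structopt additionally gives us help text and error reporting for free, which is why the article sticks with it.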

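We need to feed the TensorFlow graph with our image data. So how do we do that? We use a Tensor! Tensors represent the data in our graph; they remind me of sending vertices to a GPU. You have a large chunk of data, and you send it in the format TensorFlow expects.

The input tensor in this graph is an array of floats with dimensions height x width x 3 (for the 3 colour channels).

Let's use the image crate to load the image: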

let input_image = image::open(&opt.input)?;

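Next, we want to turn this image into its raw pixels using the GenericImage::pixels function and send them to our graph. All multidimensional Tensor arrays are flattened and stored in row-major order. The model uses BGR rather than the conventional RGB colour order, so we need to reverse the pixel values as we iterate.

Putting it together: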

let mut flattened: Vec<f32> = Vec::new();

for (_x, _y, rgb) in input_image.pixels() {
    flattened.push(rgb[2] as f32);
    flattened.push(rgb[1] as f32);
    flattened.push(rgb[0] as f32);
}

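This simply iterates over the pixels and appends them to a flattened Vec. We can then load it into a tensor, specifying the image height and width as the dimensions:

let input = Tensor::new(&[input_image.height() as u64, input_image.width() as u64, 3])
    .with_values(&flattened)?;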
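If the row-major layout feels abstract, here is a small sketch without any crates that shows where a given pixel channel ends up in the flattened buffer. The helpers flatten_bgr and flat_index are hypothetical names used only for this illustration, and the input is assumed to be a plain slice of RGB triples already in row-major order.

```rust
/// Hypothetical helper: flattens row-major RGB pixels into BGR floats,
/// mirroring the loop used for the real image above.
fn flatten_bgr(pixels: &[[u8; 3]]) -> Vec<f32> {
    let mut flattened = Vec::with_capacity(pixels.len() * 3);
    for rgb in pixels {
        flattened.push(rgb[2] as f32); // blue
        flattened.push(rgb[1] as f32); // green
        flattened.push(rgb[0] as f32); // red
    }
    flattened
}

/// Position of channel `c` (0 = B, 1 = G, 2 = R) of pixel (x, y)
/// inside a height x width x 3 row-major buffer.
fn flat_index(x: usize, y: usize, c: usize, width: usize) -> usize {
    (y * width + x) * 3 + c
}
```

For a 2 x 2 image, the blue channel of the bottom-right pixel (x = 1, y = 1) sits at index (1 * 2 + 1) * 3 + 0 = 9, which is why the tensor shape has to list height first, then width, then channels.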

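Great! We have loaded the image into a format the graph understands. Let's try running a session!

We have a graph and an input image, but now we need a session. We will create one with the default options: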

let mut session = Session::new(&SessionOptions::new(), &graph)?;

Before we run the session, there are a few other inputs that the mtcnn model expects. We will use the same defaults as the mtcnn library for these values:

let min_size = Tensor::new(&[]).with_values(&[40f32])?;
let thresholds = Tensor::new(&[3]).with_values(&[0.6f32, 0.7f32, 0.7f32])?;
let factor = Tensor::new(&[]).with_values(&[0.709f32])?;

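A graph can define multiple inputs and outputs that are required before it runs, and which ones depends on the specific neural network. For MTCNN, these are described in the original implementation. Probably the most important one is min_size, which describes the minimum size of a face to look for.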
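To get a feel for how min_size and factor interact, here is a sketch of the image pyramid computation as it is commonly written in reference MTCNN implementations. This happens inside the graph, so we never call anything like it ourselves; the function name pyramid_scales and the 12-pixel window constant are assumptions based on those implementations, not something taken from mtcnn.pb.

```rust
/// Sketch of the scale pyramid used by common MTCNN implementations.
/// Assumes 0 < factor < 1 and min_size > 0.
fn pyramid_scales(height: u32, width: u32, min_size: f64, factor: f64) -> Vec<f64> {
    assert!(factor > 0.0 && factor < 1.0, "factor must shrink the image");
    assert!(min_size > 0.0, "min_size must be positive");

    // The first stage scans 12x12 windows, so a face of `min_size` pixels
    // has to be scaled down to 12 pixels.
    let mut scale = 12.0 / min_size;
    let mut min_layer = f64::from(height.min(width)) * scale;

    let mut scales = Vec::new();
    // Keep shrinking by `factor` until the image is smaller than one window.
    while min_layer >= 12.0 {
        scales.push(scale);
        scale *= factor;
        min_layer *= factor;
    }
    scales
}
```

With min_size = 40 and factor = 0.709, an image whose shorter side is 120 pixels yields four scales, starting at 0.3. A smaller min_size means more and larger pyramid levels, so more tiny faces are found at the cost of more computation.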

Now we build the input arguments for the session:

let mut args = SessionRunArgs::new();

//Load our parameters for the model
args.add_feed(&graph.operation_by_name_required("min_size")?, 0, &min_size);
args.add_feed(&graph.operation_by_name_required("thresholds")?, 0, &thresholds);
args.add_feed(&graph.operation_by_name_required("factor")?, 0, &factor);

//Load our input image
args.add_feed(&graph.operation_by_name_required("input")?, 0, &input);

OK, what about the outputs? There are two we want to fetch when the session finishes: the bounding boxes and the probabilities.

let bbox = args.request_fetch(&graph.operation_by_name_required("box")?, 0);
let prob = args.request_fetch(&graph.operation_by_name_required("prob")?, 0);

Very cool, we have defined our inputs and outputs, so we can run it!

session.run(&mut args)?;

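The model outputs the following values:

Bounding boxes of the faces
Landmarks of the faces
Probability of being a face, from 0 to 1

To make this easier to work with, we will define a bounding box struct that encodes these values in a more readable way: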

#[derive(Copy, Clone, Debug)]
pub struct BBox {
    pub x1: f32,
    pub y1: f32,
    pub x2: f32,
    pub y2: f32,
    pub prob: f32,
}

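For simplicity we will leave out the landmarks, but we can always add them later if needed. Our job is to convert the arrays we get back from the TensorFlow session into this struct so that they are more meaningful.

Right, now let's grab the outputs. Just like the inputs, the outputs are Tensors: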

let bbox_res: Tensor<f32> = args.fetch(bbox)?;
let prob_res: Tensor<f32> = args.fetch(prob)?;

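What is in bbox? It is a flattened multidimensional array in which each bounding box takes up 4 floats representing its extents. prob is an array with one float per face: a probability from 0 to 1. So we should expect the length of bbox_res to be the number of faces x 4, and the length of prob_res to equal the number of faces.

Let's do some basic iteration and store the results in a Vec: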

//Let's store the results as a Vec<BBox>
let mut bboxes = Vec::new();

let mut i = 0;
let mut j = 0;

//While we have responses, iterate through
while i < bbox_res.len() {

    //Add in the 4 floats from the `bbox_res` array.
    //Notice the y1, x1, etc.. is ordered differently to our struct definition.
    bboxes.push(BBox {
        y1: bbox_res[i],
        x1: bbox_res[i + 1],
        y2: bbox_res[i + 2],
        x2: bbox_res[i + 3],
        prob: prob_res[j], // Add in the facial probability
    });

    //Step `i` ahead by 4.
    i += 4;
    //Step `j` ahead by 1.
    j += 1;
}
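The manual index bookkeeping works, but the same decoding can be sketched more compactly with slice iterators. The version below operates on plain f32 slices so it can run without TensorFlow; decode_bboxes is a hypothetical helper name, and passing the tensors to it as slices is an assumption based on the fact that the loop above already indexes them like slices.

```rust
#[derive(Copy, Clone, Debug, PartialEq)]
pub struct BBox {
    pub x1: f32,
    pub y1: f32,
    pub x2: f32,
    pub y2: f32,
    pub prob: f32,
}

/// Hypothetical helper: pairs every group of 4 box floats with its probability.
/// The box floats arrive as (y1, x1, y2, x2), as in the loop above.
fn decode_bboxes(bbox_res: &[f32], prob_res: &[f32]) -> Vec<BBox> {
    bbox_res
        .chunks_exact(4) // one chunk per face; a trailing partial chunk is ignored
        .zip(prob_res) // stops at the shorter of the two inputs
        .map(|(b, &prob)| BBox {
            y1: b[0],
            x1: b[1],
            y2: b[2],
            x2: b[3],
            prob,
        })
        .collect()
}
```

Because zip stops at the shorter input, a mismatch between the two arrays cannot cause an out-of-bounds panic here, whereas the while loop would panic if prob_res were shorter than expected.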

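OK, we haven't drawn the bounding boxes onto the image yet. But let's print some debug output to make sure we are getting results:

println!("BBox Length: {}, BBoxes:{:#?}", bboxes.len(), bboxes);

Running the code above gives the following output: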

BBox Length: 120, BBoxes:[
    BBox {
        x1: 471.4591,
        y1: 287.59888,
        x2: 495.3053,
        y2: 317.25327,
        prob: 0.9999908
    },
    ....

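Wow! 120 faces! Very nice!

Great, we have some bounding boxes. Let's draw them on the image and save the output to a file.
To draw the bounding boxes, we can use the imageproc library to draw a simple border around each one.
First, we define the line colour outside of the main function:

const LINE_COLOUR: Rgba<u8> = Rgba {
    data: [0, 255, 0, 0],
};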

Back to the image. We loaded the input image as immutable, so let's clone it first and then modify the clone:

let mut output_image = input_image.clone();

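Then we iterate over the bboxes vector:

for bbox in &bboxes {
    //Drawing Happens Here!
}

Next, we use the draw_hollow_rect_mut function. It takes a mutable image reference and draws a hollow rectangle (an outline) specified by the input Rect, overwriting any existing pixels.

A Rect takes its x and y coordinates through the at function and its width and height through the of_size function. We use a little geometry to convert our bounding box into this format: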

let rect = Rect::at(bbox.x1 as i32, bbox.y1 as i32)
    .of_size((bbox.x2 - bbox.x1) as u32, (bbox.y2 - bbox.y1) as u32);

Then we draw the Rect:

draw_hollow_rect_mut(&mut output_image, rect, LINE_COLOUR);
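One thing worth keeping in mind with the geometry above: the float-to-integer casts truncate, so a very thin box can end up with a width or height of 0, and as far as I know imageproc's Rect::of_size rejects zero-sized rectangles. Here is a crate-free sketch of a guard for that case. The helper to_rect_params is a hypothetical name; it only computes the numbers we would then hand to Rect::at and of_size.

```rust
/// Hypothetical helper: converts corner coordinates into (x, y, width, height),
/// or returns None when the box would be empty after truncation.
fn to_rect_params(x1: f32, y1: f32, x2: f32, y2: f32) -> Option<(i32, i32, u32, u32)> {
    // `as u32` truncates towards zero and saturates, so negative
    // differences (x2 < x1) and NaN both become 0.
    let width = (x2 - x1) as u32;
    let height = (y2 - y1) as u32;

    if width == 0 || height == 0 {
        return None;
    }
    Some((x1 as i32, y1 as i32, width, height))
}
```

For the first box printed earlier (x1: 471.4591, y1: 287.59888, x2: 495.3053, y2: 317.25327) this gives a rectangle at (471, 287) that is 23 pixels wide and 29 pixels high. Boxes that come back as None can simply be skipped in the drawing loop.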

Once the for loop finishes, we save the image to the output file:

output_image.save(&opt.output)?;

OK, done! Let's run it:

$ cargo run rustfest.jpg output.jpg
   Compiling mtcnn v0.1.0 (~/mtcnn)
    Finished dev [unoptimized + debuginfo] target(s) in 5.12s
     Running `target/debug/mtcnn rustfest.jpg output.jpg`
2019-03-28 16:15:48.194933: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.2 AVX AVX2 FMA
BBox Length: 154, BBoxes:[
    BBox {
        x1: 951.46875,
        y1: 274.00577,
        x2: 973.68304,
        y2: 301.93915,
        prob: 0.9999999
    },
....

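Great! No errors!

Let's recap what we did in this small application:

Loaded a pre-trained TensorFlow graph
Parsed the command-line arguments
Read in the image data
Extracted the faces by running a TensorFlow session
Drew the results of that session back onto the image
Wrote the image file

Hopefully this gives you a good introduction to using TensorFlow in Rust.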

