About artificial intelligence: How to train an image classification model in PyTorch and TensorFlow

Author | PULKIT SHARMA
Compiled by | Flin
Source | analyticsvidhya

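Image classification is one of the most important applications of computer vision. Its uses range from classifying objects in self-driving cars to identifying blood cells in healthcare, and from spotting defective items in manufacturing to building systems that can classify whether or not a person is wearing a mask. Image classification is used in one form or another across all of these industries. How do they do it? Which framework do they use?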

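You have probably read a lot about the differences between deep learning frameworks such as TensorFlow, PyTorch, and Keras. TensorFlow and PyTorch are undoubtedly the most popular frameworks in the industry, and I am sure you will find no shortage of resources for learning how they compare.

Here is one resource for you: 5 Amazing Deep Learning Frameworks Every Data Scientist Must Know!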

https://www.analyticsvidhya.c…

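In this article, we will see how to build a basic image classification model in both PyTorch and TensorFlow. We will start with a brief overview of each framework. We will then take the MNIST handwritten digit classification dataset and build an image classification model with a CNN (convolutional neural network) in PyTorch and in TensorFlow.

This will be your starting point. From there you can pick whichever framework you prefer and start building other computer vision models.

If you are new to deep learning and interested in computer vision (who isn't?), check out the "Certified Computer Vision Master's Program".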

https://courses.analyticsvidh…

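Overview of PyTorch
Overview of TensorFlow
Understanding the problem statement: MNIST
Implementing a convolutional neural network (CNN) in PyTorch
Implementing a convolutional neural network (CNN) in TensorFlow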

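PyTorch is becoming increasingly popular in the deep learning community and is widely used by deep learning practitioners. PyTorch is a Python package that provides tensor computation. Tensors are multidimensional arrays, much like NumPy's ndarrays, that can also run on a GPU.

A distinctive feature of PyTorch is its use of dynamic computation graphs. Rather than relying on a predefined graph with fixed functionality, PyTorch's Autograd package builds the computation graph from tensors as operations run and computes gradients automatically.

PyTorch gives us a framework for building computation graphs on the fly and even changing them at run time. This is particularly useful when we do not know in advance how much memory a neural network will need.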
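
The dynamic-graph behavior described above can be illustrated with a small sketch. The function name `scaled_square` and the input values are made up for illustration; only `torch.tensor`, `requires_grad`, `backward()`, and `.grad` come from PyTorch itself.

```python
import torch

def scaled_square(x: torch.Tensor) -> torch.Tensor:
    # The graph is recorded while this Python code runs, so ordinary
    # control flow decides which operations end up in it.
    y = x * x
    if y.sum() > 10:      # data-dependent branch, evaluated at run time
        y = y * 2
    return y.sum()

x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
out = scaled_square(x)    # 1 + 4 + 9 = 14 > 10, so the branch doubles y
out.backward()            # Autograd walks the recorded graph backwards

print(out.item())         # 28.0
print(x.grad)             # gradient of 2 * sum(x^2) is 4 * x, i.e. 4, 8, 12
```

With a smaller input such as `[1.0, 1.0, 1.0]` the branch would be skipped and the gradient would be `2 * x` instead, which is what "changing the graph at run time" means in practice.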

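You can use PyTorch to tackle a variety of deep learning challenges, including:

Images (detection, classification, etc.)
Text (classification, generation, etc.)
Reinforcement learning

If you want to learn PyTorch from scratch, here are some detailed resources:

A beginner's guide to PyTorch
- https://www.analyticsvidhya.c…
Building an image classification model using convolutional neural networks in PyTorch
- https://www.analyticsvidhya.c…
Deep learning for everyone: mastering the powerful art of transfer learning using PyTorch
- https://www.analyticsvidhya.c…
Image augmentation for deep learning using PyTorch: feature engineering for images
- https://www.analyticsvidhya.c…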

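TensorFlow was developed by researchers and engineers on the Google Brain team. It is by far the most commonly used software library in the field of deep learning (although other libraries are catching up quickly).

One of the biggest reasons for TensorFlow's popularity is that it supports multiple languages for creating deep learning models, such as Python, C++, and R. It also comes with detailed documentation and walkthroughs for guidance.

TensorFlow is made up of many components. Two that stand out are:

TensorBoard: helps visualize data effectively using data flow graphs
TensorFlow: useful for rapid deployment of new algorithms and experiments

At the time of writing, TensorFlow is at version 2.0, which was officially released in September 2019. We will also implement our CNN in the 2.x series.

If you want to learn more about this new version of TensorFlow, check out the TensorFlow 2.0 deep learning tutorial: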

https://www.analyticsvidhya.c…

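I hope you now have a basic understanding of both PyTorch and TensorFlow. Let's now build a deep learning model in each framework and understand how they work internally. Before that, let's look at the problem statement we will be working on in this article.

Before we begin, let's understand the dataset. In this article, we will solve the popular MNIST problem. It is a digit recognition task in which we have to classify images of handwritten digits into one of 10 classes, 0 through 9.

The MNIST dataset consists of digit images taken from a variety of scanned documents, normalized in size and centered. Each image is a 28 x 28 pixel square (784 pixels in total). A standard split of the dataset is used to evaluate and compare models: 60,000 images are used for training and a separate set of 10,000 images is used for testing.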


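Now that we understand the dataset, let's build the image classification model with a CNN in PyTorch and TensorFlow. We will start with the PyTorch implementation. We will implement these models in Google Colab, which provides a free GPU for running deep learning models.

I hope you are familiar with convolutional neural networks (CNNs). If not, feel free to refer to the following article:

A comprehensive tutorial to learn convolutional neural networks from scratch: https://www.analyticsvidhya.c…

Let's first import all the libraries: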

# importing the libraries
import numpy as np
import torch
import torchvision
import matplotlib.pyplot as plt
from time import time
from torchvision import datasets, transforms
from torch import nn, optim

Let's also check the version of PyTorch on Google Colab:

# version of pytorch
print(torch.__version__)

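I am using version 1.5.1 of PyTorch. With a different version you may see some warnings or errors, in which case you can switch to this version. We will apply some transformations to the images, such as normalizing the pixel values, so let's define those transformations as well: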

# transformations to be applied on images
transform = transforms.Compose([transforms.ToTensor(),
                              transforms.Normalize((0.5,), (0.5,)),
                              ])
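
To see what these two transforms do to a pixel value, here is a framework-free sketch. It relies on ToTensor rescaling 8-bit intensities to the range 0 to 1 and on Normalize applying `(value - mean) / std` per channel; the helper name `normalize` and the three sample intensities are made up for illustration.

```python
def normalize(pixel: float, mean: float = 0.5, std: float = 0.5) -> float:
    """Apply (pixel - mean) / std, the per-channel formula used by Normalize."""
    return (pixel - mean) / std

# ToTensor first rescales 8-bit intensities from 0..255 to 0.0..1.0
raw_values = [0, 128, 255]
scaled = [v / 255 for v in raw_values]

# Normalize((0.5,), (0.5,)) then maps 0.0..1.0 onto -1.0..1.0
normalized = [normalize(v) for v in scaled]

print(normalized[0])   # -1.0 (black)
print(normalized[-1])  # 1.0 (white)
```

Centering the inputs around zero in this way is a common choice for neural network training; the mean and standard deviation of 0.5 are the values used in the transform above, not statistics computed from MNIST.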

Now, let's load the training and test sets of the MNIST dataset:

# defining the training and testing set
trainset = datasets.MNIST('./data', download=True, train=True, transform=transform)
testset = datasets.MNIST('./', download=True, train=False, transform=transform)

Next, I define the training and test loaders, which will help us load the training and test sets in batches. I set the batch size to 64:

# defining trainloader and testloader
trainloader = torch.utils.data.DataLoader(trainset, batch_size=64, shuffle=True)
testloader = torch.utils.data.DataLoader(testset, batch_size=64, shuffle=True)
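
As a quick sanity check on the loaders above, the number of batches per epoch can be worked out by hand. This sketch assumes the DataLoader default of keeping the last, smaller batch (`drop_last=False`); the helper `num_batches` is a hypothetical name used only here.

```python
import math

def num_batches(num_samples: int, batch_size: int) -> int:
    """Number of batches when the last, smaller batch is kept."""
    return math.ceil(num_samples / batch_size)

train_batches = num_batches(60_000, 64)   # what len(trainloader) should report
test_batches = num_batches(10_000, 64)    # what len(testloader) should report
last_train_batch = 60_000 - (train_batches - 1) * 64

print(train_batches, test_batches, last_train_batch)  # 938 157 32
```

So not every batch holds exactly 64 images: the final training batch has only 32, which matters for any code that hard-codes the batch dimension.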

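First, let's look at a summary of the training set:

# shape of training data
dataiter = iter(trainloader)
# use the built-in next(); the .next() method is not available in newer PyTorch versions
images, labels = next(dataiter)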

print(images.shape)
print(labels.shape)


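So each batch contains 64 images, each of size 28 x 28 with a single channel, and every image has a corresponding label. Let's visualize a training image and see what it looks like: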

# visualizing the training images
plt.imshow(images[0].numpy().squeeze(), cmap='gray')


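It is an image of the digit 0 (the loader shuffles the data, so your first image may be a different digit). Similarly, let's check the shape of a batch from the test set:

# shape of test data
dataiter = iter(testloader)
# use the built-in next(); the .next() method is not available in newer PyTorch versions
images, labels = next(dataiter)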

print(images.shape)
print(labels.shape)


The test set also comes in batches of size 64. Now let's define the architecture.

We will use a CNN model here. So, let's define the model and train it:

# defining the model architecture
class Net(nn.Module):   
  def __init__(self):
      super(Net, self).__init__()

      self.cnn_layers = nn.Sequential(
          # Defining a 2D convolution layer
          nn.Conv2d(1, 4, kernel_size=3, stride=1, padding=1),
          nn.BatchNorm2d(4),
          nn.ReLU(inplace=True),
          nn.MaxPool2d(kernel_size=2, stride=2),
          # Defining another 2D convolution layer
          nn.Conv2d(4, 4, kernel_size=3, stride=1, padding=1),
          nn.BatchNorm2d(4),
          nn.ReLU(inplace=True),
          nn.MaxPool2d(kernel_size=2, stride=2),
      )

      self.linear_layers = nn.Sequential(nn.Linear(4 * 7 * 7, 10)
      )

  # Defining the forward pass    
  def forward(self, x):
      x = self.cnn_layers(x)
      x = x.view(x.size(0), -1)
      x = self.linear_layers(x)
      return x
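
The `4 * 7 * 7` input size of the linear layer above follows from how each layer changes the spatial size of the image. The sketch below traces it with the standard output-size formula for convolution and pooling layers; the helper name `conv_out` is made up for illustration.

```python
def conv_out(size: int, kernel: int, stride: int = 1, padding: int = 0) -> int:
    """Output length along one spatial axis for a convolution or pooling layer."""
    return (size + 2 * padding - kernel) // stride + 1

size = 28
size = conv_out(size, kernel=3, stride=1, padding=1)  # first Conv2d keeps 28
size = conv_out(size, kernel=2, stride=2)             # first MaxPool2d -> 14
size = conv_out(size, kernel=3, stride=1, padding=1)  # second Conv2d keeps 14
size = conv_out(size, kernel=2, stride=2)             # second MaxPool2d -> 7

channels = 4                        # output channels of the last Conv2d
flattened = channels * size * size  # features seen by nn.Linear
print(size, flattened)              # 7 196
```

If you change the kernel size, padding, or number of pooling layers, rerunning this arithmetic tells you what the first argument of `nn.Linear` needs to be.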

Let's also define the optimizer and the loss function, and then look at a summary of the model:

# defining the model
model = Net()
# defining the optimizer
optimizer = optim.Adam(model.parameters(), lr=0.01)
# defining the loss function
criterion = nn.CrossEntropyLoss()
# checking if GPU is available
if torch.cuda.is_available():
    model = model.cuda()
    criterion = criterion.cuda()
    
print(model)


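So we have 2 convolutional layers that help extract features from the images. The features from these convolutional layers are passed to a fully connected layer, which classifies the images into their respective classes. Now that our model architecture is ready, let's train the model for ten epochs: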

for i in range(10):
    running_loss = 0
    for images, labels in trainloader:

        if torch.cuda.is_available():
          images = images.cuda()
          labels = labels.cuda()

        # Training pass
        optimizer.zero_grad()
        
        output = model(images)
        loss = criterion(output, labels)
        
        #This is where the model learns by backpropagating
        loss.backward()
        
        #And optimizes its weights here
        optimizer.step()
        
        running_loss += loss.item()
    # report the average loss per batch once all batches of the epoch are done
    print("Epoch {} - Training loss: {}".format(i+1, running_loss/len(trainloader)))
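
The structure of the loop above (reset the gradient, run the forward pass, compute the loss, compute the gradient, update the weights) can be seen in isolation in a framework-free sketch. This is only an analogy under stated assumptions: it fits a single weight with plain gradient descent rather than Adam, the gradient is derived by hand instead of by Autograd, and the data, learning rate, and variable names are made up for illustration.

```python
# Fit y = w * x to a few points with plain gradient descent.
data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]  # generated with w = 2
w = 0.0
lr = 0.05
losses = []

for epoch in range(10):
    running_loss = 0.0
    for x, y in data:
        grad = 0.0                    # plays the role of optimizer.zero_grad()
        pred = w * x                  # forward pass
        loss = (pred - y) ** 2        # squared error for this sample
        grad += 2 * (pred - y) * x    # plays the role of loss.backward()
        w -= lr * grad                # plays the role of optimizer.step()
        running_loss += loss
    losses.append(running_loss / len(data))  # average loss for the epoch

print(losses[0] > losses[-1])  # True: the average loss goes down
print(round(w, 3))             # 2.0: the weight that generated the data
```

The per-epoch average in `losses` corresponds to `running_loss/len(trainloader)` in the PyTorch loop.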


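In the output of the training loop above, the training loss decreases as the number of epochs increases. This means our model is learning patterns from the training set. Let's check the model's performance on the test set: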

# getting predictions on test set and measuring the performance
model.eval()  # make BatchNorm use its running statistics instead of per-batch statistics
correct_count, all_count = 0, 0
with torch.no_grad():  # gradients are not needed for evaluation
    for images, labels in testloader:
        if torch.cuda.is_available():
            images = images.cuda()
            labels = labels.cuda()
        # the model outputs one raw score (logit) per class for every image
        output = model(images)
        # the class with the largest score is the prediction
        pred_labels = output.argmax(dim=1)
        correct_count += (pred_labels == labels).sum().item()
        all_count += labels.size(0)

print("Number Of Images Tested =", all_count)
print("\nModel Accuracy =", (correct_count/all_count))


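So we tested 10,000 images in total, and the model's accuracy in predicting the labels of the test images is about 96%.

This is how you can build a convolutional neural network in PyTorch. In the next section, we will look at how to implement the same architecture in TensorFlow.

Now, let's solve the same MNIST problem using a convolutional neural network in TensorFlow. As always, we will start by importing the libraries: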

# importing the libraries
import tensorflow as tf

from tensorflow.keras import datasets, layers, models
from tensorflow.keras.utils import to_categorical
import matplotlib.pyplot as plt

Let's check the version of TensorFlow we are using:


# version of tensorflow
print(tf.__version__)

So we are using version 2.2.0 of TensorFlow. Now let's load the MNIST dataset using the datasets module of tensorflow.keras:


(train_images, train_labels), (test_images, test_labels) = datasets.mnist.load_data(path='mnist.npz')
# Normalize pixel values to be between 0 and 1
train_images, test_images = train_images / 255.0, test_images / 255.0

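Here, we have loaded the training and test sets of the MNIST dataset, and we have scaled the pixel values of the training and test images to the range 0 to 1. Next, let's visualize a few images from the dataset: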

# visualizing a few images
plt.figure(figsize=(10,10))
for i in range(9):
    plt.subplot(3,3,i+1)
    plt.xticks([])
    plt.yticks([])
    plt.grid(False)
    plt.imshow(train_images[i], cmap='gray')
plt.show()


This is what our dataset looks like: images of handwritten digits. Let's also look at the shapes of the training and test sets:

# shape of the training and test set
# print explicitly so the shapes also show up outside a notebook
print(train_images.shape, train_labels.shape)
print(test_images.shape, test_labels.shape)


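So we have 60,000 images of size 28 by 28 in the training set and 10,000 images of the same shape in the test set. Next, we will reshape the images to add a channel dimension and one-hot encode the target variable: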

# reshaping the images
train_images = train_images.reshape((60000, 28, 28, 1))
test_images = test_images.reshape((10000, 28, 28, 1))

# one hot encoding the target variable
train_labels = to_categorical(train_labels)
test_labels = to_categorical(test_labels)
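
For readers new to one-hot encoding, here is a plain-Python sketch of what the encoding above does to a single label. The helper `one_hot` is hypothetical and returns a list, whereas `to_categorical` returns a NumPy array covering all labels at once.

```python
def one_hot(label: int, num_classes: int = 10) -> list:
    """Return a list with 1.0 at the label's index and 0.0 elsewhere."""
    encoded = [0.0] * num_classes
    encoded[label] = 1.0
    return encoded

print(one_hot(3))
# [0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
```

After encoding, each label is a vector of length 10, which matches the 10 output neurons of the model's final layer.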

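Now we will define the architecture of the model. We will use an architecture similar to the one defined in PyTorch, here without batch normalization or padding. Our model will combine 2 convolutional layers with max pooling layers, followed by a Flatten layer and finally a fully connected layer with 10 neurons, since we have 10 classes.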

# defining the model architecture
model = models.Sequential()
model.add(layers.Conv2D(4, (3, 3), activation='relu', input_shape=(28, 28, 1)))
model.add(layers.MaxPooling2D((2, 2), strides=2))
model.add(layers.Conv2D(4, (3, 3), activation='relu'))
model.add(layers.MaxPooling2D((2, 2), strides=2))
model.add(layers.Flatten())
model.add(layers.Dense(10, activation='softmax'))
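
The layer shapes and parameter counts of the model above can be worked out by hand. The sketch below does so under the assumption that the convolutions use the Keras default of 'valid' padding, as in the code above where no padding argument is given; the helper names `conv2d_params` and `dense_params` are made up for illustration.

```python
def conv2d_params(kernel: int, in_channels: int, filters: int) -> int:
    """Kernel weights plus one bias per filter."""
    return kernel * kernel * in_channels * filters + filters

def dense_params(inputs: int, units: int) -> int:
    """One weight per input-unit pair plus one bias per unit."""
    return inputs * units + units

# With 'valid' padding a 3x3 convolution shrinks each side by 2,
# and each 2x2 pooling with stride 2 halves it (rounding down).
size = 28
size = (size - 3 + 1) // 2   # Conv2D -> 26, MaxPooling2D -> 13
size = (size - 3 + 1) // 2   # Conv2D -> 11, MaxPooling2D -> 5
flattened = size * size * 4  # 5 * 5 * 4 = 100 features after Flatten

total = (
    conv2d_params(3, 1, 4)         # 40
    + conv2d_params(3, 4, 4)       # 148
    + dense_params(flattened, 10)  # 1010
)
print(total)  # 1198
```

Note that the Flatten layer here produces 100 features, compared with 196 in the PyTorch model, because the PyTorch convolutions use `padding=1` and so preserve the spatial size.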

Let's quickly look at the summary of the model:

# summary of the model
model.summary()


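To summarize, we have 2 convolutional layers, 2 max pooling layers, a Flatten layer, and a fully connected layer. The total number of parameters in the model is 1,198. Now that our model is ready, we will compile it: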

# compiling the model
model.compile(optimizer='adam',
              loss='categorical_crossentropy',
              metrics=['accuracy'])
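
To make the `categorical_crossentropy` loss in the compile call above more concrete, here is a minimal plain-Python sketch of the underlying formula: softmax turns scores into probabilities, and the loss is the negative log-probability of the true class. It uses three classes and made-up scores for brevity, and it omits the numerical safeguards a real framework applies, so treat it as an illustration rather than the Keras implementation.

```python
import math

def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    shifted = [s - max(scores) for s in scores]   # subtract the max for stability
    exps = [math.exp(s) for s in shifted]
    total = sum(exps)
    return [e / total for e in exps]

def categorical_crossentropy(one_hot_target, probabilities):
    """Negative log-probability assigned to the true class."""
    return -sum(t * math.log(p) for t, p in zip(one_hot_target, probabilities))

probs = softmax([2.0, 1.0, 0.1])
confident = categorical_crossentropy([1, 0, 0], probs)  # true class has the top score
mistaken = categorical_crossentropy([0, 0, 1], probs)   # true class has the lowest score

print(confident < mistaken)  # True: wrong predictions are penalized more
```

This is also why the labels were one-hot encoded earlier: the target vector selects which probability the loss looks at.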

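We are using the Adam optimizer, which you can change if you like. The loss function is set to categorical cross-entropy, since we are solving a multi-class classification problem, and the metric is 'accuracy'. Now let's train the model for 10 epochs: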

# training the model
history = model.fit(train_images, train_labels, epochs=10, validation_data=(test_images, test_labels))


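To summarize, the training loss started at about 0.46 and dropped to 0.08 after 10 epochs. The training and validation accuracies after 10 epochs are 97.31% and 97.48%, respectively.

So this is how we can train a CNN in TensorFlow.

To wrap up, in this article we first went through a brief overview of PyTorch and TensorFlow. We then looked at the MNIST handwritten digit classification challenge, and finally built an image classification model with a CNN (convolutional neural network) in both PyTorch and TensorFlow. I hope you are now familiar with both frameworks. As a next step, take on another image classification challenge and try to solve it with both PyTorch and TensorFlow.

Below is a problem you can use to practice and sharpen your image classification skills:

Identify the apparel (Fashion MNIST): https://datahack.analyticsvid…

Original article: https://www.analyticsvidhya.c…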

