Machine Learning: Perceptron Principles and Hands-On Handwritten Digit Recognition

For the record: 🕚 at 11 a.m. on February 15, 2022, my writing journey on SegmentFault began! 🎉🎉🎉

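Most of the time, machine learning in practice means taking an existing model, making a few small modifications, and then starting to "refine elixirs" (that is, train). The main job is hyperparameter tuning, which is why practitioners are nicknamed "parameter tuners" or "alchemists". So I want to organize and summarize some commonly used machine learning models: first as my personal study notes, and second so that anyone who clicks in can copy the code and start "refining" right away. The goal is for everything to work out of the box.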

A note before you read: this is my first article on SegmentFault and my skill is limited, so apologies in advance to the experts 🙏.

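The order of this series is roughly chronological and broadly follows the development of machine learning algorithms. Every model comes with a PyTorch implementation and a brief introduction to its principle. This article covers the ancestor of neural networks: the perceptron. The main text starts below 👇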

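The perceptron, also called an "artificial neuron" or "naive perceptron", is the basic unit of a neural network. This article first introduces the basic principle of the perceptron and then gives a PyTorch implementation of the model on concrete classification tasks.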

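Rosenblatt is the founding father of neural networks. He proposed the theory of the perceptron in 1957, and in 1960 he built a neural network in hardware. The work was then criticized by Marvin Minsky and Seymour Papert, which left the perceptron dormant for nearly 20 years, until the backpropagation (BP) algorithm, popularized in the 1980s by Hinton and his co-authors, made neural networks a hot topic again.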

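Suppose the input space (feature space) is $x\in R^n$ and the output space is $y\in\{1,-1\}$. The function from the input space to the output space, $f(x)=sign(wx+b)$, is called a perceptron. Here w is the weight (or weight vector), b is the bias, and sign is the sign function: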

$$
sign(x)=\begin{cases}1,x\geq0\\-1,x<0\end{cases}
$$
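As a minimal sketch of the definition above, the decision function can be written in a few lines of NumPy. The helper names `sign` and `perceptron_predict` and the numeric values of `w`, `b`, and `X` are illustrative assumptions, not taken from the dataset used later in this article.

```python
import numpy as np

def sign(z):
    """Sign function as defined above: +1 for z >= 0, -1 otherwise."""
    return np.where(z >= 0, 1, -1)

def perceptron_predict(X, w, b):
    """Evaluate f(x) = sign(w . x + b) for every row x of X."""
    return sign(X @ w + b)

# Illustrative values (assumptions for this sketch only)
w = np.array([1.0, -1.0])
b = 0.5
X = np.array([[2.0, 1.0],    # w.x + b =  1.5 -> +1
              [0.0, 2.0],    # w.x + b = -1.5 -> -1
              [0.5, 1.0]])   # w.x + b =  0.0 -> +1, because sign(0) = 1 here
print(perceptron_predict(X, w, b))  # [ 1 -1  1]
```

Geometrically, `w` and `b` define a hyperplane, and the prediction only says on which side of it a point lies.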

Given a dataset $T=\{(x_1,y_1),(x_2,y_2),\cdots,(x_n,y_n)\}$, learning a classifier with the perceptron is equivalent to solving the following minimization problem:

$$
\min_{w,b} L(w,b)=-\sum_{x_i\in M}y_i(wx_i+b)
$$

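Here M is the set of misclassified points, which means the perceptron is driven by its mistakes. The parameters w and b are updated with stochastic gradient descent (SGD):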

$$
\begin{aligned}
w^{i+1}&=w^i - \eta\frac{\partial L(w,b)}{\partial w}\\
b^{i+1}&=b^i - \eta\frac{\partial L(w,b)}{\partial b}
\end{aligned}
$$

Here $\eta$ is called the learning rate.
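For a single misclassified point $(x_i, y_i)$ the gradients of the loss are $-y_i x_i$ with respect to w and $-y_i$ with respect to b, so one SGD step reduces to adding $\eta y_i x_i$ to w and $\eta y_i$ to b. The sketch below applies that rule with labels in {-1, +1}. The function name `train_perceptron` and the four toy points are assumptions made for illustration, and treating a point that lies exactly on the boundary as misclassified is a common convention rather than something stated above.

```python
import numpy as np

def train_perceptron(X, y, eta=1.0, epochs=10):
    """SGD on the perceptron loss; labels y must be in {-1, +1}."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        mistakes = 0
        for x_i, y_i in zip(X, y):
            # Misclassified (or on the boundary) when y_i * (w . x_i + b) <= 0
            if y_i * (x_i @ w + b) <= 0:
                w += eta * y_i * x_i   # w <- w - eta * dL/dw, with dL/dw = -y_i * x_i
                b += eta * y_i         # b <- b - eta * dL/db, with dL/db = -y_i
                mistakes += 1
        if mistakes == 0:              # every point is on the correct side: stop early
            break
    return w, b

# Hypothetical, linearly separable toy data
X = np.array([[2.0, 2.0], [1.0, 3.0], [-1.0, -2.0], [-2.0, -1.0]])
y = np.array([1, 1, -1, -1])

w, b = train_perceptron(X, y)
preds = np.where(X @ w + b >= 0, 1, -1)
print(w, b, preds)
```

On this toy data a single update on the first point is enough to separate all four points, so the loop stops after the second pass finds no mistakes.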

Classifying toy data with a single-layer perceptron

Imports

import numpy as np
import matplotlib.pyplot as plt
import torch
%matplotlib inline

Load the data

data = np.genfromtxt('../data/perceptron_toydata.txt', delimiter='\t')
X, y = data[:, :2], data[:, 2]
y = y.astype(int)  # np.int was removed in NumPy 1.24; the builtin int is equivalent

print('Class label counts:', np.bincount(y))
print('X.shape:', X.shape)
print('y.shape:', y.shape)

The output is as follows 👇

Class label counts: [50 50]
X.shape: (100, 2)
y.shape: (100,)

Shuffle the data and randomly split it into a training set and a test set

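shuffle_idx = np.arange(y.shape[0])
shuffle_rng = np.random.RandomState(123)  # fixed seed so every run produces the same shuffle
shuffle_rng.shuffle(shuffle_idx)
X, y = X[shuffle_idx], y[shuffle_idx]
# Note: X and y are already shuffled at this point, so indexing with shuffle_idx again
# applies the permutation a second time. The result is still a valid random 70/30 split.
X_train, X_test = X[shuffle_idx[:70]], X[shuffle_idx[70:]]
y_train, y_test = y[shuffle_idx[:70]], y[shuffle_idx[70:]]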

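Apply Z-score standardization to the data. After standardization each feature of the training set has mean 0 and variance 1, while the shape of each feature's distribution is unchanged.

Models that depend on distances between samples or on a weighted sum of the features generally need normalization / standardization, for example KNN (K-nearest neighbors), K-means clustering, the perceptron, and SVM.

Decision trees and tree-based Boosting and Bagging ensembles, such as random forests, XGBoost, and LightGBM, are insensitive to the scale of feature values, and the same holds for naive Bayes. These models generally do not need normalization / standardization.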

# Normalize (mean zero, unit variance)
mu, sigma = X_train.mean(axis=0), X_train.std(axis=0)
X_train = (X_train - mu) / sigma
X_test = (X_test - mu) / sigma
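To make the effect of this step concrete, here is a small check on synthetic data (the arrays `X_tr` and `X_te` are random stand-ins, not the toy dataset loaded above). It illustrates that the statistics are estimated on the training split only and then reused for the test split, so only the training features end up with exactly zero mean and unit variance.

```python
import numpy as np

rng = np.random.RandomState(0)
# Synthetic stand-ins for a 70/30 split with two features (assumed values)
X_tr = rng.normal(loc=5.0, scale=3.0, size=(70, 2))
X_te = rng.normal(loc=5.0, scale=3.0, size=(30, 2))

# Estimate mean and standard deviation on the training split only
mu, sigma = X_tr.mean(axis=0), X_tr.std(axis=0)
X_tr_std = (X_tr - mu) / sigma
X_te_std = (X_te - mu) / sigma   # reuse the training statistics

print('train mean/std:', X_tr_std.mean(axis=0), X_tr_std.std(axis=0))
print('test  mean/std:', X_te_std.mean(axis=0), X_te_std.std(axis=0))
```

The test-set statistics come out close to, but not exactly, 0 and 1. That is expected: computing separate statistics on the test set would leak information about it into preprocessing.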

Scatter plot of the data 👇. The two classes are clearly visible.

plt.scatter(X_train[y_train==0, 0], X_train[y_train==0, 1], label='class 0', marker='o')
plt.scatter(X_train[y_train==1, 0], X_train[y_train==1, 1], label='class 1', marker='s')
plt.xlabel('feature 1')
plt.ylabel('feature 2')
plt.legend()
plt.show()

Model definition

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")


def custom_where(cond, x_1, x_2):
    return (cond * x_1) + ((~cond) * x_2)


class Perceptron():
    def __init__(self, num_features):
        self.num_features = num_features
        self.weights = torch.zeros(num_features, 1, 
                                   dtype=torch.float32, device=device)
        self.bias = torch.zeros(1, dtype=torch.float32, device=device)

    def forward(self, x):
        linear = torch.add(torch.mm(x, self.weights), self.bias)
        predictions = custom_where(linear > 0., 1, 0).float()
        return predictions
        
    def backward(self, x, y):  
        predictions = self.forward(x)
        errors = y - predictions
        return errors
        
    def train(self, x, y, epochs):
        for e in range(epochs):
            
            for i in range(y.size()[0]):
                # use view because backward expects a matrix (i.e., 2D tensor)
                errors = self.backward(x[i].view(1, self.num_features), y[i]).view(-1)
                self.weights += (errors * x[i]).view(self.num_features, 1)
                self.bias += errors
                
    def evaluate(self, x, y):
        predictions = self.forward(x).view(-1)
        accuracy = torch.sum(predictions == y).float() / y.size()[0]
        return accuracy

Model training

ppn = Perceptron(num_features=2)

X_train_tensor = torch.tensor(X_train, dtype=torch.float32, device=device)
y_train_tensor = torch.tensor(y_train, dtype=torch.float32, device=device)

ppn.train(X_train_tensor, y_train_tensor, epochs=10)

print('Model parameters:')
print('Weights: %s' % ppn.weights)
print('Bias: %s' % ppn.bias)

The output is as follows 👇

Model parameters:
Weights: tensor([[1.2734], [1.3464]])
Bias: tensor([-1.])

Model evaluation

X_test_tensor = torch.tensor(X_test, dtype=torch.float32, device=device)
y_test_tensor = torch.tensor(y_test, dtype=torch.float32, device=device)

test_acc = ppn.evaluate(X_test_tensor, y_test_tensor)
print('Test set accuracy: %.2f%%' % (test_acc*100))

The output is as follows 👇

Test set accuracy: 93.33%

Decision boundary plot

w, b = ppn.weights.cpu(), ppn.bias.cpu()  # move to the CPU so matplotlib can plot them after GPU training

x_min = -2
y_min = ((-(w[0] * x_min) - b[0]) 
          / w[1] )

x_max = 2
y_max = ((-(w[0] * x_max) - b[0]) 
          / w[1] )


fig, ax = plt.subplots(1, 2, sharex=True, figsize=(7, 3))

ax[0].plot([x_min, x_max], [y_min, y_max])
ax[1].plot([x_min, x_max], [y_min, y_max])

ax[0].scatter(X_train[y_train==0, 0], X_train[y_train==0, 1], label='class 0', marker='o')
ax[0].scatter(X_train[y_train==1, 0], X_train[y_train==1, 1], label='class 1', marker='s')

ax[1].scatter(X_test[y_test==0, 0], X_test[y_test==0, 1], label='class 0', marker='o')
ax[1].scatter(X_test[y_test==1, 0], X_test[y_test==1, 1], label='class 1', marker='s')

ax[1].legend(loc='upper left')
plt.show()

Multilayer perceptron & handwritten digit recognition

Imports

import time
import numpy as np
from torchvision import datasets
from torchvision import transforms
from torch.utils.data import DataLoader
import torch.nn.functional as F
import torch


if torch.cuda.is_available():
  torch.backends.cudnn.deterministic = True

Settings

# Device
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")  # first GPU if available; change the index on multi-GPU machines

# Hyperparameters
random_seed = 1
learning_rate = 0.1
num_epochs = 10
batch_size = 64

# Architecture
num_features = 784
num_hidden_1 = 128
num_hidden_2 = 256
num_classes = 10

Load the data

train_dataset = datasets.MNIST(root='data', 
                             train=True, 
                             transform=transforms.ToTensor(),
                             download=True)

test_dataset = datasets.MNIST(root='data', 
                            train=False, 
                            transform=transforms.ToTensor())


train_loader = DataLoader(dataset=train_dataset, 
                        batch_size=batch_size, 
                        shuffle=True)

test_loader = DataLoader(dataset=test_dataset, 
                       batch_size=batch_size, 
                       shuffle=False)

# Checking the dataset
for images, labels in train_loader:  
  print('Image batch dimensions:', images.shape)
  print('Image label dimensions:', labels.shape)
  break

transforms.ToTensor() scales the input images to the 0-1 range. The output is as follows 👇

Image batch dimensions: torch.Size([64, 1, 28, 28])
Image label dimensions: torch.Size([64])

Model definition

class MultilayerPerceptron(torch.nn.Module):

    def __init__(self, num_features, num_classes):
        super(MultilayerPerceptron, self).__init__()
        
        ### 1st hidden layer
        self.linear_1 = torch.nn.Linear(num_features, num_hidden_1)
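        # Custom weight initialization (optional). By default, torch.nn.Linear already
        # initializes its weights from a uniform distribution scaled by the fan-in.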
        self.linear_1.weight.detach().normal_(0.0, 0.1)
        self.linear_1.bias.detach().zero_()
        #self.linear_1_bn = torch.nn.BatchNorm1d(num_hidden_1)
        
        ### 2nd hidden layer
        self.linear_2 = torch.nn.Linear(num_hidden_1, num_hidden_2)
        self.linear_2.weight.detach().normal_(0.0, 0.1)
        self.linear_2.bias.detach().zero_()
        
        ### Output layer
        self.linear_out = torch.nn.Linear(num_hidden_2, num_classes)
        self.linear_out.weight.detach().normal_(0.0, 0.1)
        self.linear_out.bias.detach().zero_()
        
    def forward(self, x):
        out = self.linear_1(x)
        out = F.relu(out)
        #out = self.linear_1_bn(out)
        
        out = self.linear_2(out)
        out = F.relu(out)
        #out = F.dropout(out, p=dropout_prob, training=self.training)
        
        logits = self.linear_out(out)
        # Note: log_softmax returns log-probabilities, not probabilities; the argmax is the same
        probas = F.log_softmax(logits, dim=1)
        return logits, probas

    
torch.manual_seed(random_seed)
model = MultilayerPerceptron(num_features=num_features,
                             num_classes=num_classes)

model = model.to(device)

optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate)

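The usage of BatchNorm and Dropout is shown in the commented-out (#) lines of the code above. BatchNorm was proposed to speed up the training of deep networks by reducing internal covariate shift. Dropout randomly zeroes some elements of the input tensor with probability p, using samples from a Bernoulli distribution, and is a common remedy for overfitting.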
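The following sketch, assuming only the standard `torch.nn.functional.dropout` and `torch.nn.BatchNorm1d` APIs, shows what the two layers do to a tensor in training mode versus evaluation mode. The tensor shapes and the value `p = 0.5` are arbitrary choices for illustration and are not the settings of the model above.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# --- Dropout: active only when training=True ---
x = torch.ones(1, 1000)
p = 0.5
out_train = F.dropout(x, p=p, training=True)
out_eval = F.dropout(x, p=p, training=False)

kept = out_train[out_train != 0]
zero_frac = (out_train == 0).float().mean().item()
print(kept.unique())              # survivors are scaled by 1 / (1 - p): tensor([2.])
print(zero_frac)                  # roughly p
print(torch.equal(out_eval, x))   # True: dropout is the identity at evaluation time

# --- BatchNorm: normalizes each feature over the batch in training mode ---
bn = torch.nn.BatchNorm1d(3)
h = torch.randn(64, 3) * 5 + 10   # features with mean ~10 and std ~5
bn.train()
h_bn = bn(h)
print(h_bn.mean(dim=0))                  # close to 0 for every feature
print(h_bn.std(dim=0, unbiased=False))   # close to 1 for every feature
```

This is why the model code passes `training=self.training` to `F.dropout`, and why `model.train()` / `net.eval()` are called before training and evaluation respectively.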

Model training

def compute_accuracy(net, data_loader):
  net.eval()
  correct_pred, num_examples = 0, 0
  with torch.no_grad():
      for features, targets in data_loader:
          features = features.view(-1, 28*28).to(device)
          targets = targets.to(device)
          logits, probas = net(features)
          _, predicted_labels = torch.max(probas, 1)
          num_examples += targets.size(0)
          correct_pred += (predicted_labels == targets).sum()
      return correct_pred.float()/num_examples * 100

☝ Helper function for computing the accuracy

start_time = time.time()
minibatch_cost = []
epoch_acc = []
for epoch in range(num_epochs):
  model.train()
  for batch_idx, (features, targets) in enumerate(train_loader):
      
      features = features.view(-1, 28*28).to(device)
      targets = targets.to(device)
          
      ### FORWARD AND BACK PROP
      logits, probas = model(features)
      cost = F.cross_entropy(logits, targets)
      optimizer.zero_grad()
      
      cost.backward()
      
      ### UPDATE MODEL PARAMETERS
      optimizer.step()
      
      ### LOGGING
      minibatch_cost.append(cost)
      if not batch_idx % 50:
          print ('Epoch: %03d/%03d | Batch %03d/%03d | Cost: %.4f' 
                 %(epoch+1, num_epochs, batch_idx, 
                   len(train_loader), cost))

  with torch.set_grad_enabled(False):
      acc = compute_accuracy(model, train_loader)
      epoch_acc.append(acc)
      print('Epoch: %03d/%03d training accuracy: %.2f%%' % (epoch+1, num_epochs, acc))
      
  print('Time elapsed: %.2f min' % ((time.time() - start_time)/60))
  
print('Total Training Time: %.2f min' % ((time.time() - start_time)/60))

Visualizing the training process

import matplotlib
import matplotlib.pyplot as plt
%matplotlib inline

plt.plot(range(len(minibatch_cost)), minibatch_cost)
plt.ylabel('Train loss')
plt.xlabel('Minibatch')
plt.show()

plt.plot(range(len(epoch_acc)), epoch_acc)
plt.ylabel('Train Acc')
plt.xlabel('Epoch')
plt.show()

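The code above ☝ raises an error when run, because every element of minibatch_cost is a tensor that still carries gradient information and cannot be converted to NumPy directly. The fix is to add the following line before the plotting code:

minibatch_cost = [a.detach().cpu().numpy() for a in minibatch_cost]  # .cpu() is only needed after GPU training and is harmless on the CPU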
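An alternative that avoids the conversion step altogether is to store plain Python floats while logging, using `Tensor.item()`. The sketch below reproduces the error on a tiny stand-in tensor (the variable `w` and the quadratic cost are assumptions for illustration) and shows both ways around it.

```python
import torch

w = torch.tensor([1.0, 2.0], requires_grad=True)
cost = (w ** 2).sum()          # 0-dim tensor that is part of the autograd graph

try:
    cost.numpy()               # fails because the tensor requires grad
except RuntimeError as err:
    print('RuntimeError:', err)

as_float = cost.item()         # plain Python float, detached from the graph
print(type(as_float), as_float)   # <class 'float'> 5.0

as_numpy = cost.detach().cpu().numpy()   # the conversion used in the fix above
print(as_numpy)
```

With this approach the logging line in the training loop would read `minibatch_cost.append(cost.item())`, which also avoids keeping every minibatch's computation graph alive in the list.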

The loss and accuracy curves after running 50 epochs (rather than the num_epochs = 10 set above) look as follows 👇

Model evaluation

Accuracy on the test set

print('Test accuracy: %.2f%%' % (compute_accuracy(model, test_loader)))

The result is as follows 👇

Test accuracy: 98.04%

for features, targets in test_loader:
    break

with torch.no_grad():  # inference only; move the inputs to the model's device
    _, predictions = model(features[:4].view(-1, 28*28).to(device))
predictions = torch.argmax(predictions, dim=1)
predictions = predictions.tolist()

fig, ax = plt.subplots(1, 4)
for i in range(4):
    ax[i].imshow(features[i].view(28, 28), cmap=matplotlib.cm.binary)
    ax[i].set_title("Predicted:" + str(predictions[i]))

plt.show()

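Thank you for reading this far. If you found this article helpful:

Give it a like so that more people can see it.
Feel free to share your thoughts in the comments, and to record your own reasoning there as well.

Thanks again for your support and encouragement 🌹🌹🌹

P.S. I originally published this article on Juejin, so the images carry the Juejin watermark 😅.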
