On Machine Learning: Perceptron Principles and Hands-On Handwritten Digit Recognition


A little keepsake: 🕚 11 a.m., February 15, 2022, my writing journey on SegmentFault begins here! 🎉🎉🎉

Preface

In machine learning, most of the time you take an existing model, make a few simple modifications, and start "alchemy": the main work is tuning hyperparameters, which is why practitioners are jokingly called "parameter tuners" or "alchemists". I therefore want to organize and summarize some commonly used machine learning models, partly as my personal study notes, and partly so that readers can copy the code and start "refining" right away. The goal is to make everything work out of the box.

A note before reading: this is my first article on SegmentFault, and my skill is limited, so I apologize to the experts in advance 🙏.

The models are organized roughly in chronological order, which largely matches the development of machine learning algorithms. Every model comes with its PyTorch implementation and a brief explanation of its principles. This article covers the ancestor of neural networks: the perceptron. Let's begin 👇

Perceptron Preliminaries

The perceptron, also known as an "artificial neuron" or "naive perceptron", is the basic building block of neural networks. This article first introduces the basic principles of the perceptron and then gives PyTorch implementations of perceptron models on concrete classification tasks.

1. Rosenblatt

Rosenblatt is the founding father of neural networks: he proposed the theory of the perceptron in 1957 and, in 1960, built a neural network in hardware. However, this work was challenged by Marvin Minsky and Seymour Papert, and the perceptron lay dormant for nearly 20 years, until the backpropagation (BP) algorithm popularized by Hinton in the 1980s brought it back into the spotlight.

2. Basic Principles

Suppose the input space (feature space) is \(x\in R^n \) and the output space is \(y\in\{1,-1\} \). The function from the input space to the output space, \(f(x)=sign(w\cdot x+b) \), is called a perceptron. Here w is called the weight (or weight vector), b is called the bias, and sign is the sign function:

$$
sign(x)=\begin{cases}1, & x\geq0\\-1, & x<0\end{cases}
$$

Given a dataset \(T=\{(x_1,y_1),(x_2,y_2),\cdots,(x_n,y_n)\} \), learning to classify with a perceptron is equivalent to solving the following minimization problem:

$$
\min_{w,b} L(w,b)=-\sum_{x_i\in M}y_i(w\cdot x_i+b)
$$

where M is the set of misclassified points; in other words, the perceptron is driven by its misclassified points. The parameters w and b are updated with stochastic gradient descent (SGD):

$$
w^{i+1}=w^i - \eta\frac{\partial L(w,b)}{\partial w}\\
b^{i+1}=b^i - \eta\frac{\partial L(w,b)}{\partial b}
$$

where \(\eta \) is called the learning rate.
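
Differentiating \(L(w,b) \) term by term gives the gradients behind these updates:

$$
\frac{\partial L(w,b)}{\partial w}=-\sum_{x_i\in M}y_ix_i,\qquad
\frac{\partial L(w,b)}{\partial b}=-\sum_{x_i\in M}y_i
$$

so on a single misclassified sample \((x_i,y_i) \) the SGD step reduces to \(w\leftarrow w+\eta y_ix_i \) and \(b\leftarrow b+\eta y_i \). The single-layer implementation below applies exactly this rule with \(\eta=1 \), except that its labels live in \(\{0,1\} \) rather than \(\{1,-1\} \), so the error \(y_i-\hat{y}_i\in\{-1,0,1\} \) plays the role of \(y_i \).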

Classifying Toy Data with a Single-Layer Perceptron

  • Imports

    import numpy as np
    import matplotlib.pyplot as plt
    import torch
    %matplotlib inline
  • Load the data

    data = np.genfromtxt('../data/perceptron_toydata.txt', delimiter='\t')
    X, y = data[:, :2], data[:, 2]
    y = y.astype(int)  # np.int was removed in NumPy 1.24; use the built-in int
    
    print('Class label counts:', np.bincount(y))
    print('X.shape:', X.shape)
    print('y.shape:', y.shape)

Output 👇

Class label counts: [50 50]
X.shape: (100, 2)
y.shape: (100,)

Shuffle the data and randomly split it into training and test sets

shuffle_idx = np.arange(y.shape[0])
shuffle_rng = np.random.RandomState(123)  # fixed seed so every run generates the same shuffle
shuffle_rng.shuffle(shuffle_idx)
X, y = X[shuffle_idx], y[shuffle_idx]
X_train, X_test = X[shuffle_idx[:70]], X[shuffle_idx[70:]]
y_train, y_test = y[shuffle_idx[:70]], y[shuffle_idx[70:]]

Next, apply Z-score standardization: the standardized data has mean 0 and variance 1, and the shape of each feature's distribution is unchanged.

Linear and distance-based models generally require normalization/standardization, e.g. KNN (k-nearest neighbors), K-means clustering, the perceptron, and SVM.

Decision trees and tree-based Boosting/Bagging ensembles are insensitive to feature scale; tree models such as random forest, XGBoost, and LightGBM, as well as naive Bayes, generally do not need normalization/standardization.

# Normalize (mean zero, unit variance)
mu, sigma = X_train.mean(axis=0), X_train.std(axis=0)
X_train = (X_train - mu) / sigma
X_test = (X_test - mu) / sigma

Scatter plot of the data 👇; the two classes are clearly distinguishable.

plt.scatter(X_train[y_train==0, 0], X_train[y_train==0, 1], label='class 0', marker='o')
plt.scatter(X_train[y_train==1, 0], X_train[y_train==1, 1], label='class 1', marker='s')
plt.xlabel('feature 1')
plt.ylabel('feature 2')
plt.legend()
plt.show()

  • Model definition
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")


def custom_where(cond, x_1, x_2):
    # element-wise select: x_1 where cond is True, x_2 where it is False
    return (cond * x_1) + ((~cond) * x_2)


class Perceptron():
    def __init__(self, num_features):
        self.num_features = num_features
        self.weights = torch.zeros(num_features, 1, 
                                   dtype=torch.float32, device=device)
        self.bias = torch.zeros(1, dtype=torch.float32, device=device)

    def forward(self, x):
        linear = torch.add(torch.mm(x, self.weights), self.bias)
        predictions = custom_where(linear > 0., 1, 0).float()
        return predictions
        
    def backward(self, x, y):  
        predictions = self.forward(x)
        errors = y - predictions  # in {-1, 0, 1}; nonzero only for misclassified points
        return errors
        
    def train(self, x, y, epochs):
        for e in range(epochs):
            
            for i in range(y.size()[0]):
                # use view because backward expects a matrix (i.e., 2D tensor)
                errors = self.backward(x[i].view(1, self.num_features), y[i]).view(-1)
                # perceptron update rule (learning rate fixed at 1)
                self.weights += (errors * x[i]).view(self.num_features, 1)
                self.bias += errors
                
    def evaluate(self, x, y):
        predictions = self.forward(x).view(-1)
        accuracy = torch.sum(predictions == y).float() / y.size()[0]
        return accuracy
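
As an aside, custom_where mirrors what PyTorch provides natively as torch.where; a quick equivalence check (a minimal sketch, assuming the definitions above):

    cond = torch.tensor([True, False, True])
    ones, zeros = torch.ones(3), torch.zeros(3)
    print(custom_where(cond, ones, zeros))  # tensor([1., 0., 1.])
    print(torch.where(cond, ones, zeros))   # tensor([1., 0., 1.])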
  • Model training
ppn = Perceptron(num_features=2)

X_train_tensor = torch.tensor(X_train, dtype=torch.float32, device=device)
y_train_tensor = torch.tensor(y_train, dtype=torch.float32, device=device)

ppn.train(X_train_tensor, y_train_tensor, epochs=10)

print('Model parameters:')
print('Weights: %s' % ppn.weights)
print('Bias: %s' % ppn.bias)

Output 👇

Model parameters:
Weights: tensor([[1.2734], [1.3464]])
Bias: tensor([-1.])

  • Model evaluation
X_test_tensor = torch.tensor(X_test, dtype=torch.float32, device=device)
y_test_tensor = torch.tensor(y_test, dtype=torch.float32, device=device)

test_acc = ppn.evaluate(X_test_tensor, y_test_tensor)
print('Test set accuracy: %.2f%%' % (test_acc*100))

Output 👇

Test set accuracy: 93.33%

Plotting the result
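
The separating line is the set of points where the perceptron's linear score vanishes; solving \(w_1x_1+w_2x_2+b=0 \) for \(x_2 \) gives the endpoints computed in the code below:

$$
x_2=-\frac{w_1x_1+b}{w_2}
$$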

w, b = ppn.weights.cpu(), ppn.bias.cpu()  # move to CPU for matplotlib

x_min = -2
y_min = (-(w[0] * x_min) - b[0]) / w[1]

x_max = 2
y_max = (-(w[0] * x_max) - b[0]) / w[1]


fig, ax = plt.subplots(1, 2, sharex=True, figsize=(7, 3))

ax[0].plot([x_min, x_max], [y_min, y_max])
ax[1].plot([x_min, x_max], [y_min, y_max])

ax[0].scatter(X_train[y_train==0, 0], X_train[y_train==0, 1], label='class 0', marker='o')
ax[0].scatter(X_train[y_train==1, 0], X_train[y_train==1, 1], label='class 1', marker='s')

ax[1].scatter(X_test[y_test==0, 0], X_test[y_test==0, 1], label='class 0', marker='o')
ax[1].scatter(X_test[y_test==1, 0], X_test[y_test==1, 1], label='class 1', marker='s')

ax[1].legend(loc='upper left')
plt.show()

Multilayer Perceptron & Handwritten Digit Recognition

  • Imports

    import time
    import numpy as np
    from torchvision import datasets
    from torchvision import transforms
    from torch.utils.data import DataLoader
    import torch.nn.functional as F
    import torch
    
    
    if torch.cuda.is_available():
      torch.backends.cudnn.deterministic = True
  • Parameter settings

    # Device
    device = torch.device("cuda:3" if torch.cuda.is_available() else "cpu")
    
    # Hyperparameters
    random_seed = 1
    learning_rate = 0.1
    num_epochs = 10
    batch_size = 64
    
    # Architecture
    num_features = 784
    num_hidden_1 = 128
    num_hidden_2 = 256
    num_classes = 10
  • Load the data

    train_dataset = datasets.MNIST(root='data', 
                                 train=True, 
                                 transform=transforms.ToTensor(),
                                 download=True)
    
    test_dataset = datasets.MNIST(root='data', 
                                train=False, 
                                transform=transforms.ToTensor())
    
    
    train_loader = DataLoader(dataset=train_dataset, 
                            batch_size=batch_size, 
                            shuffle=True)
    
    test_loader = DataLoader(dataset=test_dataset, 
                           batch_size=batch_size, 
                           shuffle=False)
    
    # Checking the dataset
    for images, labels in train_loader:  
      print('Image batch dimensions:', images.shape)
      print('Image label dimensions:', labels.shape)
      break

    transforms.ToTensor() scales the input images to the range [0, 1]. Output 👇

Image batch dimensions: torch.Size([64, 1, 28, 28])
Image label dimensions: torch.Size([64])

  • Model definition
class MultilayerPerceptron(torch.nn.Module):

    def __init__(self, num_features, num_classes):
        super(MultilayerPerceptron, self).__init__()
        
        ### 1st hidden layer
        self.linear_1 = torch.nn.Linear(num_features, num_hidden_1)
        # Weight initialization: override PyTorch's default (Kaiming-uniform for nn.Linear) with N(0.0, 0.1)
        self.linear_1.weight.detach().normal_(0.0, 0.1)
        self.linear_1.bias.detach().zero_()
        #self.linear_1_bn = torch.nn.BatchNorm1d(num_hidden_1)
        
        ### 2nd hidden layer
        self.linear_2 = torch.nn.Linear(num_hidden_1, num_hidden_2)
        self.linear_2.weight.detach().normal_(0.0, 0.1)
        self.linear_2.bias.detach().zero_()
        
        ### Output layer
        self.linear_out = torch.nn.Linear(num_hidden_2, num_classes)
        self.linear_out.weight.detach().normal_(0.0, 0.1)
        self.linear_out.bias.detach().zero_()
        
    def forward(self, x):
        out = self.linear_1(x)
        out = F.relu(out)
        #out = self.linear_1_bn(out)
        
        out = self.linear_2(out)
        out = F.relu(out)
        #out = F.dropout(out, p=dropout_prob, training=self.training)
        
        logits = self.linear_out(out)
        probas = F.log_softmax(logits, dim=1)  # log-probabilities, despite the name
        return logits, probas

    
torch.manual_seed(random_seed)
model = MultilayerPerceptron(num_features=num_features,
                             num_classes=num_classes)

model = model.to(device)

optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate)

The usage of BatchNorm and Dropout is shown at the commented-out (#) lines in the code above. BatchNorm speeds up deep-network training by reducing internal covariate shift; Dropout randomly zeroes elements of the input tensor with probability p using samples from a Bernoulli distribution, and is a common way to combat overfitting.
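
For concreteness, here is a minimal sketch of forward with those two commented-out lines enabled; dropout_prob is not defined in the hyperparameter settings above, so a typical value of 0.5 is assumed:

    dropout_prob = 0.5  # assumed value; not part of the original hyperparameters
    
    def forward(self, x):
        out = self.linear_1(x)
        out = F.relu(out)
        out = self.linear_1_bn(out)  # BatchNorm1d layer defined in __init__
        
        out = self.linear_2(out)
        out = F.relu(out)
        # active only in train() mode; model.eval() disables it automatically
        out = F.dropout(out, p=dropout_prob, training=self.training)
        
        logits = self.linear_out(out)
        probas = F.log_softmax(logits, dim=1)
        return logits, probas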

  • Model training

    def compute_accuracy(net, data_loader):
      net.eval()
      correct_pred, num_examples = 0, 0
      with torch.no_grad():
          for features, targets in data_loader:
              features = features.view(-1, 28*28).to(device)
              targets = targets.to(device)
              logits, probas = net(features)
              _, predicted_labels = torch.max(probas, 1)
              num_examples += targets.size(0)
              correct_pred += (predicted_labels == targets).sum()
          return correct_pred.float()/num_examples * 100
      

    Computing the accuracy ☝

    start_time = time.time()
    minibatch_cost = []
    epoch_acc = []
    for epoch in range(num_epochs):
      model.train()
      for batch_idx, (features, targets) in enumerate(train_loader):
          
          features = features.view(-1, 28*28).to(device)
          targets = targets.to(device)
              
          ### FORWARD AND BACK PROP
          logits, probas = model(features)
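          # F.cross_entropy applies log_softmax internally, so it takes the raw logits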
          cost = F.cross_entropy(logits, targets)
          optimizer.zero_grad()
          
          cost.backward()
          
          ### UPDATE MODEL PARAMETERS
          optimizer.step()
          
          ### LOGGING
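          # note: each appended cost is a tensor with gradient history (see the plotting fix below)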
          minibatch_cost.append(cost)
          if not batch_idx % 50:
              print ('Epoch: %03d/%03d | Batch %03d/%03d | Cost: %.4f' 
                     %(epoch+1, num_epochs, batch_idx, 
                       len(train_loader), cost))
    
      with torch.set_grad_enabled(False):
          acc = compute_accuracy(model, train_loader)
          epoch_acc.append(acc)
          print('Epoch: %03d/%03d training accuracy: %.2f%%' % (epoch+1, num_epochs, acc))
          
      print('Time elapsed: %.2f min' % ((time.time() - start_time)/60))
      
    print('Total Training Time: %.2f min' % ((time.time() - start_time)/60))

    Visualizing the training process

    import matplotlib
    import matplotlib.pyplot as plt
    %matplotlib inline
    
    plt.plot(range(len(minibatch_cost)), minibatch_cost)
    plt.ylabel('Train loss')
    plt.xlabel('Minibatch')
    plt.show()
    
    plt.plot(range(len(epoch_acc)), epoch_acc)
    plt.ylabel('Train Acc')
    plt.xlabel('Epoch')
    plt.show()

    The code above ☝ raises an error when executed: every element of minibatch_cost is a tensor carrying gradient history, which cannot be converted to numpy. The fix is to add the line below first (with .cpu() included so it also works when the tensors live on a GPU):

    minibatch_cost = [a.detach().cpu().numpy() for a in minibatch_cost]
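
    compute_accuracy likewise returns tensors on device, so if a GPU was used the elements of epoch_acc need the same treatment before the second plot:

    epoch_acc = [a.cpu() for a in epoch_acc]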

    The loss and accuracy curves over 50 epochs look like this 👇

  • Model evaluation

Accuracy on the test set

print('Test accuracy: %.2f%%' % (compute_accuracy(model, test_loader)))

Result 👇

Test accuracy: 98.04%

for features, targets in test_loader:
    break

_, predictions = model(features[:4].view(-1, 28*28).to(device))
predictions = torch.argmax(predictions, dim=1)
predictions = predictions.tolist()

fig, ax = plt.subplots(1, 4)
for i in range(4):
    ax[i].imshow(features[i].view(28, 28), cmap=matplotlib.cm.binary)
    ax[i].set_title("Predicted:" + str(predictions[i]))

plt.show()

❤️ Thank You All

Thank you for reading this far. If you found this article helpful:

  1. Give it a like so that more people can see this content.
  2. Feel free to share your thoughts in the comments; you are also welcome to record your reasoning process there.

Thanks again for all your support and encouragement 🌹🌹🌹

P.S. This article was previously published by me on Juejin (click here to jump), so the images carry the Juejin watermark 😅.
