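A keepsake: 🕚 at 11 a.m. on 15 February 2022, my writing journey on SegmentFault (思否) began! 🎉🎉🎉
Preface
In practice, machine learning mostly means taking an existing model, making a few small modifications, and then starting to "refine the elixir" (slang for training). The main job is tuning hyperparameters, hence the nicknames "parameter tuner" and "alchemist". I therefore want to survey and summarize some commonly used machine learning models, partly as my own study notes and partly so that readers who land here can copy the code and start training right away, aiming for something that works out of the box.
A note before you read: this is my first article on SegmentFault and my skill is limited, so apologies in advance to the experts 🙏.
The survey roughly follows chronological order, which broadly matches how machine learning algorithms developed. Every model comes with a PyTorch implementation and a brief introduction to its principle. This article covers the ancestor of neural networks: the perceptron. The main text starts below 👇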
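Background on the perceptron
The perceptron, also called the "artificial neuron" or "naive perceptron", is the basic unit of a neural network. This article first introduces the basic principle of the perceptron and then gives a PyTorch implementation on a concrete classification task.
1. Rosenblatt
Rosenblatt is a founding figure of neural networks. He proposed the theory of the perceptron in 1957, and in 1960 he built a neural network in hardware. The work was, however, challenged by Marvin Minsky and Seymour Papert, which left the perceptron dormant for nearly 20 years, until the 1980s, when the backpropagation (BP) algorithm, popularized by Hinton and his co-authors, made neural networks a hot topic again.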
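2. Basic principle
Suppose the input space (feature space) is \(x\in R^n \) and the output space is \(y\in\{1,-1\} \). The function from the input space to the output space \(f(x)=sign(wx+b) \) is then called a perceptron. Here w is the weight (or weight vector), b is the bias, and sign is the sign function:
$$
sign(x)=\begin{cases}1, & x\geq 0\\-1, & x<0\end{cases}
$$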
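Given a dataset \(T=\{(x_1,y_1),(x_2,y_2),\cdots,(x_N,y_N)\} \), learning a classifier with the perceptron is equivalent to solving the following minimization problem:
$$
\min_{w,b} L(w,b)=-\sum_{x_i\in M}y_i(wx_i+b)
$$
Here M is the set of misclassified points, which means the perceptron is driven by its mistakes. For a misclassified point \(y_i(wx_i+b)\leq 0 \), so every term of the loss is non-negative and the loss reaches 0 when no point is misclassified. The parameters w and b are updated with stochastic gradient descent (SGD):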
$$
\begin{aligned}
w^{i+1}&=w^i - \eta\frac{\partial L(w,b)}{\partial w}\\
b^{i+1}&=b^i - \eta\frac{\partial L(w,b)}{\partial b}
\end{aligned}
$$
Here \(\eta \) is the learning rate, and the superscript counts update steps.
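For a single misclassified point \((x_i,y_i) \), the gradients are \(\partial L/\partial w=-y_ix_i \) and \(\partial L/\partial b=-y_i \), so one SGD step is \(w\leftarrow w+\eta y_ix_i \) and \(b\leftarrow b+\eta y_i \). The following is a minimal NumPy sketch of that rule, assuming labels in {+1, -1}. The function name `perceptron_sgd`, the `max_epochs` parameter, and the four toy points are illustrative and not part of the original article.

```python
import numpy as np

def sign(z):
    # sign function as defined above: +1 for z >= 0, -1 otherwise
    return np.where(z >= 0, 1, -1)

def perceptron_sgd(X, y, eta=1.0, max_epochs=100):
    """Train a perceptron on labels in {+1, -1}; returns (w, b)."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(max_epochs):
        n_errors = 0
        for x_i, y_i in zip(X, y):
            # treat y_i * (w x_i + b) <= 0 as a mistake, so that the
            # all-zero initialization also triggers an update
            if y_i * (np.dot(w, x_i) + b) <= 0:
                w += eta * y_i * x_i
                b += eta * y_i
                n_errors += 1
        if n_errors == 0:
            # a full pass without mistakes: the data are separated
            break
    return w, b

# hypothetical, linearly separable toy points
X_demo = np.array([[2.0, 2.0], [1.0, 3.0], [-1.0, -2.0], [-2.0, -1.0]])
y_demo = np.array([1, 1, -1, -1])

w_demo, b_demo = perceptron_sgd(X_demo, y_demo)
pred_demo = sign(X_demo @ w_demo + b_demo)
print(w_demo, b_demo, pred_demo)
```

The loop only stops early on linearly separable data; otherwise it runs for `max_epochs` passes.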
Classifying toy data with a single-layer perceptron
- Import packages
import numpy as np
import matplotlib.pyplot as plt
import torch
%matplotlib inline
- Load the data
data = np.genfromtxt('../data/perceptron_toydata.txt', delimiter='\t')
X, y = data[:, :2], data[:, 2]
y = y.astype(int)  # np.int was removed in NumPy 1.24; the built-in int is equivalent

print('Class label counts:', np.bincount(y))
print('X.shape:', X.shape)
print('y.shape:', y.shape)
The output is as follows 👇
Class label counts: [50 50]
X.shape: (100, 2)
y.shape: (100,)
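Shuffle the data and randomly split it into a training set and a test set
shuffle_idx = np.arange(y.shape[0])
shuffle_rng = np.random.RandomState(123)  # fixed seed so that every run produces the same shuffle
shuffle_rng.shuffle(shuffle_idx)
X, y = X[shuffle_idx], y[shuffle_idx]
# X and y are already shuffled at this point; indexing with shuffle_idx again only applies
# a second permutation, so the 70/30 split below is still random and disjoint
X_train, X_test = X[shuffle_idx[:70]], X[shuffle_idx[70:]]
y_train, y_test = y[shuffle_idx[:70]], y[shuffle_idx[70:]]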
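Apply Z-score standardization to the data. After standardization each training feature has mean 0 and variance 1, while the shape of its distribution is unchanged.
Models that depend on distances or on a linear combination of the features generally need normalization or standardization, for example KNN (k-nearest neighbors), K-means clustering, the perceptron, and SVM.
Decision trees and the Boosting and Bagging ensembles built on them, such as random forests, XGBoost, and LightGBM, are insensitive to the scale of feature values. These tree models, as well as naive Bayes, generally do not need normalization or standardization.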
# Normalize (mean zero, unit variance)
mu, sigma = X_train.mean(axis=0), X_train.std(axis=0)
X_train = (X_train - mu) / sigma
X_test = (X_test - mu) / sigma
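As a quick sanity check of the standardization step, here is a small sketch on synthetic data (the feature scales and array names are made up for illustration). It computes the mean and standard deviation on the training split only and reuses them for the test split, as the code above does.

```python
import numpy as np

rng = np.random.default_rng(0)
# hypothetical raw features on very different scales
raw_train = rng.normal(loc=[10.0, -300.0], scale=[2.0, 50.0], size=(70, 2))
raw_test = rng.normal(loc=[10.0, -300.0], scale=[2.0, 50.0], size=(30, 2))

# statistics come from the training split only
mu_demo = raw_train.mean(axis=0)
sigma_demo = raw_train.std(axis=0)

z_train = (raw_train - mu_demo) / sigma_demo
z_test = (raw_test - mu_demo) / sigma_demo

# training features: mean ~0 and std ~1 by construction
print(z_train.mean(axis=0), z_train.std(axis=0))
# test features: close to 0 and 1, but not exactly
print(z_test.mean(axis=0), z_test.std(axis=0))
```

Reusing the training statistics keeps information from the test set out of preprocessing, which is why the test features are only approximately standardized.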
Scatter plot of the data 👇; the two classes are clearly separated.
plt.scatter(X_train[y_train==0, 0], X_train[y_train==0, 1], label='class 0', marker='o')
plt.scatter(X_train[y_train==1, 0], X_train[y_train==1, 1], label='class 1', marker='s')
plt.xlabel('feature 1')
plt.ylabel('feature 2')
plt.legend()
plt.show()
- Model definition
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

def custom_where(cond, x_1, x_2):
    # element-wise select: x_1 where cond is True, x_2 otherwise
    return (cond * x_1) + ((~cond) * x_2)

class Perceptron():
    def __init__(self, num_features):
        self.num_features = num_features
        self.weights = torch.zeros(num_features, 1,
                                   dtype=torch.float32, device=device)
        self.bias = torch.zeros(1, dtype=torch.float32, device=device)

    def forward(self, x):
        linear = torch.add(torch.mm(x, self.weights), self.bias)
        # threshold at 0: class 1 if the linear output is positive, else class 0
        predictions = custom_where(linear > 0., 1, 0).float()
        return predictions

    def backward(self, x, y):
        # error is +1, 0 or -1 because labels and predictions are in {0, 1}
        predictions = self.forward(x)
        errors = y - predictions
        return errors

    def train(self, x, y, epochs):
        for e in range(epochs):
            for i in range(y.size()[0]):
                # use view because backward expects a matrix (i.e., 2D tensor)
                errors = self.backward(x[i].view(1, self.num_features), y[i]).view(-1)
                self.weights += (errors * x[i]).view(self.num_features, 1)
                self.bias += errors

    def evaluate(self, x, y):
        predictions = self.forward(x).view(-1)
        accuracy = torch.sum(predictions == y).float() / y.size()[0]
        return accuracy
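The class above uses labels in {0, 1} and the update `weights += (y - prediction) * x`, whereas the theory section uses labels in {+1, -1} and the update \(w\leftarrow w+\eta y_ix_i \) for misclassified points. The small check below (helper names are hypothetical) suggests the two coincide for \(\eta=1 \): whenever a point is misclassified, `y - prediction` equals the ±1 label, and otherwise both updates are zero. The only remaining difference is the tie at a linear output of exactly 0, which the formula assigns to +1 and the code to class 0.

```python
def error_01(y, y_hat):
    # update coefficient used by the class above: labels and predictions in {0, 1}
    return y - y_hat

def step_pm1(y, y_hat):
    # rule from the theory section with eta = 1: map to {+1, -1} and
    # update by the label only when the point is misclassified
    y_pm, y_hat_pm = 2 * y - 1, 2 * y_hat - 1
    return y_pm if y_pm != y_hat_pm else 0

for y in (0, 1):
    for y_hat in (0, 1):
        print(y, y_hat, error_01(y, y_hat), step_pm1(y, y_hat))
```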
- Model training
ppn = Perceptron(num_features=2)
X_train_tensor = torch.tensor(X_train, dtype=torch.float32, device=device)
y_train_tensor = torch.tensor(y_train, dtype=torch.float32, device=device)
ppn.train(X_train_tensor, y_train_tensor, epochs=10)
print('Model parameters:')
print('Weights: %s' % ppn.weights)
print('Bias: %s' % ppn.bias)
The output is as follows 👇
Model parameters:
Weights: tensor([[1.2734], [1.3464]])
Bias: tensor([-1.])
- Model evaluation
X_test_tensor = torch.tensor(X_test, dtype=torch.float32, device=device)
y_test_tensor = torch.tensor(y_test, dtype=torch.float32, device=device)
test_acc = ppn.evaluate(X_test_tensor, y_test_tensor)
print('Test set accuracy: %.2f%%' % (test_acc*100))
The output is as follows 👇
Test set accuracy: 93.33%
Result plot
w, b = ppn.weights.cpu(), ppn.bias.cpu()  # move to the CPU so that matplotlib can plot the values
x_min = -2
y_min = ((-(w[0] * x_min) - b[0])
/ w[1] )
x_max = 2
y_max = ((-(w[0] * x_max) - b[0])
/ w[1] )
fig, ax = plt.subplots(1, 2, sharex=True, figsize=(7, 3))
ax[0].plot([x_min, x_max], [y_min, y_max])
ax[1].plot([x_min, x_max], [y_min, y_max])
ax[0].scatter(X_train[y_train==0, 0], X_train[y_train==0, 1], label='class 0', marker='o')
ax[0].scatter(X_train[y_train==1, 0], X_train[y_train==1, 1], label='class 1', marker='s')
ax[1].scatter(X_test[y_test==0, 0], X_test[y_test==0, 1], label='class 0', marker='o')
ax[1].scatter(X_test[y_test==1, 0], X_test[y_test==1, 1], label='class 1', marker='s')
ax[1].legend(loc='upper left')
plt.show()
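The line drawn above is the decision boundary, the set of points where the linear output is zero: \(w_0x_1+w_1x_2+b=0 \), i.e. \(x_2=(-w_0x_1-b)/w_1 \). A short sketch using the parameters printed in the training output illustrates this. The helper name `boundary_x2` is illustrative, and the formula assumes \(w_1\neq 0 \).

```python
import numpy as np

# parameters copied from the training output printed earlier
w_demo = np.array([1.2734, 1.3464])
b_demo = -1.0

def boundary_x2(x1, w, b):
    # solve w[0]*x1 + w[1]*x2 + b = 0 for x2 (requires w[1] != 0)
    return (-w[0] * x1 - b) / w[1]

x1_ends = np.array([-2.0, 2.0])
x2_ends = boundary_x2(x1_ends, w_demo, b_demo)

# every point on the line should give a (numerically) zero linear output
residual = w_demo[0] * x1_ends + w_demo[1] * x2_ends + b_demo
print(x2_ends, residual)
```

Points on the side where the linear output is positive are predicted as class 1, the others as class 0.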
Multilayer perceptron & handwritten digit recognition
- Import packages
import time
import numpy as np
from torchvision import datasets
from torchvision import transforms
from torch.utils.data import DataLoader
import torch.nn.functional as F
import torch

if torch.cuda.is_available():
    torch.backends.cudnn.deterministic = True
- Settings
# Device
# "cuda:3" selects the fourth GPU; adjust the index to your machine (e.g. "cuda:0")
device = torch.device("cuda:3" if torch.cuda.is_available() else "cpu")

# Hyperparameters
random_seed = 1
learning_rate = 0.1
num_epochs = 10
batch_size = 64

# Architecture
num_features = 784
num_hidden_1 = 128
num_hidden_2 = 256
num_classes = 10
- Load the data
train_dataset = datasets.MNIST(root='data',
                               train=True,
                               transform=transforms.ToTensor(),
                               download=True)

test_dataset = datasets.MNIST(root='data',
                              train=False,
                              transform=transforms.ToTensor())

train_loader = DataLoader(dataset=train_dataset,
                          batch_size=batch_size,
                          shuffle=True)

test_loader = DataLoader(dataset=test_dataset,
                         batch_size=batch_size,
                         shuffle=False)

# Checking the dataset
for images, labels in train_loader:
    print('Image batch dimensions:', images.shape)
    print('Image label dimensions:', labels.shape)
    break
transforms.ToTensor() scales the input images to the 0-1 range. The output is as follows 👇
Image batch dimensions: torch.Size([64, 1, 28, 28])
Image label dimensions: torch.Size([64])
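To make the shapes and the 0-1 scaling concrete, here is a NumPy-only sketch of what happens to one 8-bit grayscale image. It is a stand-in for illustration rather than the torchvision implementation, and the random image is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
# hypothetical stand-in for one 28x28 grayscale MNIST image (H, W), uint8 in [0, 255]
img_uint8 = rng.integers(0, 256, size=(28, 28), dtype=np.uint8)

# same idea as ToTensor for uint8 images:
# add a channel axis (C, H, W), cast to float32, divide by 255
img_float = img_uint8[np.newaxis, :, :].astype(np.float32) / 255.0

# stacking 64 such images gives the batch shape printed above
batch = np.stack([img_float] * 64)
print(img_float.shape, img_float.min(), img_float.max(), batch.shape)
```

The training loop later flattens each image with `view(-1, 28*28)`, which is where `num_features = 784` comes from.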
- Model definition
class MultilayerPerceptron(torch.nn.Module):

    def __init__(self, num_features, num_classes):
        super(MultilayerPerceptron, self).__init__()

        ### 1st hidden layer
        self.linear_1 = torch.nn.Linear(num_features, num_hidden_1)
        # Manual weight initialization, shown here for demonstration. By default,
        # torch.nn.Linear uses a Kaiming-uniform scheme (not Xavier/Glorot).
        self.linear_1.weight.detach().normal_(0.0, 0.1)
        self.linear_1.bias.detach().zero_()
        #self.linear_1_bn = torch.nn.BatchNorm1d(num_hidden_1)

        ### 2nd hidden layer
        self.linear_2 = torch.nn.Linear(num_hidden_1, num_hidden_2)
        self.linear_2.weight.detach().normal_(0.0, 0.1)
        self.linear_2.bias.detach().zero_()

        ### Output layer
        self.linear_out = torch.nn.Linear(num_hidden_2, num_classes)
        self.linear_out.weight.detach().normal_(0.0, 0.1)
        self.linear_out.bias.detach().zero_()

    def forward(self, x):
        out = self.linear_1(x)
        out = F.relu(out)
        #out = self.linear_1_bn(out)
        out = self.linear_2(out)
        out = F.relu(out)
        #out = F.dropout(out, p=dropout_prob, training=self.training)
        logits = self.linear_out(out)
        # note: despite the name, probas holds log-probabilities (log-softmax)
        probas = F.log_softmax(logits, dim=1)
        return logits, probas
torch.manual_seed(random_seed)
model = MultilayerPerceptron(num_features=num_features,
num_classes=num_classes)
model = model.to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate)
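The forward pass above returns both the raw logits and their log-softmax, and the training loop further down feeds the logits to `F.cross_entropy`. As a rough NumPy illustration of what these two operations compute (a sketch written for this explanation, not PyTorch's implementation), the cross-entropy is the mean negative log-softmax value of the correct class:

```python
import numpy as np

def log_softmax(logits):
    # subtract the row-wise max for numerical stability
    shifted = logits - logits.max(axis=1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))

def cross_entropy(logits, targets):
    # mean negative log-probability of the correct class
    log_probs = log_softmax(logits)
    return -log_probs[np.arange(len(targets)), targets].mean()

# hypothetical logits for 2 samples and 3 classes
logits_demo = np.array([[2.0, 0.5, -1.0],
                        [0.0, 0.0, 0.0]])
targets_demo = np.array([0, 2])

log_probs_demo = log_softmax(logits_demo)
loss_demo = cross_entropy(logits_demo, targets_demo)

# exponentiated log-probabilities sum to 1 in every row
print(np.exp(log_probs_demo).sum(axis=1), loss_demo)
```

Because log-softmax is monotonic, taking the argmax of the log-probabilities yields the same predicted class as taking the argmax of the logits.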
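The usage of BatchNorm and Dropout is shown in the lines commented out with # in the code above (to enable the Dropout line, dropout_prob must be defined first). BatchNorm speeds up the training of deep networks by reducing internal covariate shift. Dropout randomly zeroes some elements of the input tensor with probability p, using samples from a Bernoulli distribution, and is a common remedy for overfitting.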
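The Dropout behaviour described above can be sketched in a few lines of NumPy. This is an illustrative re-implementation under the usual "inverted dropout" convention, in which the surviving elements are rescaled by 1/(1-p) during training; the function name `dropout_sketch` is hypothetical, and it assumes p < 1.

```python
import numpy as np

def dropout_sketch(x, p, rng, training=True):
    # at evaluation time dropout is the identity
    if not training or p == 0.0:
        return x
    # keep each element with probability 1 - p (Bernoulli mask)
    keep_mask = rng.random(x.shape) >= p
    # rescale survivors by 1 / (1 - p) so the expected value is unchanged
    return x * keep_mask / (1.0 - p)

rng = np.random.default_rng(0)
x_demo = np.ones((1000, 100))

out_train = dropout_sketch(x_demo, p=0.5, rng=rng)
out_eval = dropout_sketch(x_demo, p=0.5, rng=rng, training=False)

# about half of the entries are zeroed, and the mean stays close to 1
print((out_train == 0).mean(), out_train.mean(), out_eval.mean())
```

This is why the commented-out line in the model passes `training=self.training`: the mask should only be applied while `model.train()` is active, not after `model.eval()`.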
- Model training
def compute_accuracy(net, data_loader):
    net.eval()
    correct_pred, num_examples = 0, 0
    with torch.no_grad():
        for features, targets in data_loader:
            features = features.view(-1, 28*28).to(device)
            targets = targets.to(device)
            logits, probas = net(features)
            _, predicted_labels = torch.max(probas, 1)
            num_examples += targets.size(0)
            correct_pred += (predicted_labels == targets).sum()
        return correct_pred.float()/num_examples * 100
The function above ☝ computes the accuracy.
start_time = time.time()
minibatch_cost = []
epoch_acc = []
for epoch in range(num_epochs):
    model.train()
    for batch_idx, (features, targets) in enumerate(train_loader):

        features = features.view(-1, 28*28).to(device)
        targets = targets.to(device)

        ### FORWARD AND BACK PROP
        logits, probas = model(features)
        cost = F.cross_entropy(logits, targets)
        optimizer.zero_grad()

        cost.backward()

        ### UPDATE MODEL PARAMETERS
        optimizer.step()

        ### LOGGING
        minibatch_cost.append(cost)
        if not batch_idx % 50:
            print ('Epoch: %03d/%03d | Batch %03d/%03d | Cost: %.4f'
                   %(epoch+1, num_epochs, batch_idx,
                     len(train_loader), cost))

    with torch.set_grad_enabled(False):
        acc = compute_accuracy(model, train_loader)
        epoch_acc.append(acc)
        print('Epoch: %03d/%03d training accuracy: %.2f%%' % (
              epoch+1, num_epochs, acc))

    print('Time elapsed: %.2f min' % ((time.time() - start_time)/60))

print('Total Training Time: %.2f min' % ((time.time() - start_time)/60))
Visualizing the training process
import matplotlib
import matplotlib.pyplot as plt
%matplotlib inline

plt.plot(range(len(minibatch_cost)), minibatch_cost)
plt.ylabel('Train loss')
plt.xlabel('Minibatch')
plt.show()

plt.plot(range(len(epoch_acc)), epoch_acc)
plt.ylabel('Train Acc')
plt.xlabel('Epoch')
plt.show()
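The code above ☝ raises an error, because every element of minibatch_cost is a tensor that still carries gradient information and therefore cannot be converted to a NumPy array. The fix is to add the following line before plotting: minibatch_cost = [a.detach().cpu().numpy() for a in minibatch_cost] (the .cpu() call is needed when training ran on a GPU). Alternatively, append cost.item() instead of cost in the training loop.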
The loss and accuracy curves after running 50 epochs are shown below 👇
- Model evaluation
Accuracy on the test set
print('Test accuracy: %.2f%%' % (compute_accuracy(model, test_loader)))
The result is as follows 👇
Test accuracy: 98.04%
# grab the first batch of the test set
for features, targets in test_loader:
    break

# move the inputs to the same device as the model before the forward pass
_, predictions = model(features[:4].view(-1, 28*28).to(device))
predictions = torch.argmax(predictions, dim=1)
predictions = predictions.tolist()

fig, ax = plt.subplots(1, 4)
for i in range(4):
    ax[i].imshow(features[i].view(28, 28), cmap=matplotlib.cm.binary)
    ax[i].set_title("Predicted:" + str(predictions[i]))
plt.show()
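❤️ Thank you
Thank you for reading this far. If you found this article helpful:
- Give it a like, so that more people can see it.
- Feel free to share your thoughts in the comments, or to use the comments to record your own reasoning.
Thanks again for your support and encouragement 🌹🌹🌹
P.S. I originally published this article on Juejin (掘金), which is why the images carry the Juejin watermark 😅.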