Hyperparameter Grid Search for PyTorch Models with scikit-learn

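scikit-learn is one of the most widely used machine learning libraries in Python, and PyTorch provides convenient building blocks for constructing models. Can their strengths be combined? This article shows how to use the grid search capability in scikit-learn to tune the hyperparameters of PyTorch deep learning models. It covers:

How to wrap PyTorch models for use in scikit-learn and how to use grid search
How to grid search common neural network parameters, such as learning rate, dropout rate, epochs, and number of neurons
How to define your own hyperparameter tuning experiments on your own projects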

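One of the simplest ways to make a PyTorch model usable in scikit-learn is the skorch package, which provides a scikit-learn-compatible API for PyTorch models. skorch offers NeuralNetClassifier for classification networks and NeuralNetRegressor for regression networks.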

 pip install skorch

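To use these wrappers, define your PyTorch model as a class that subclasses nn.Module, then pass the class itself to the module argument when constructing NeuralNetClassifier. For example: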

 class MyClassifier(nn.Module):
     def __init__(self):
         super().__init__()
         ...
 
     def forward(self, x):
         ...
         return x
 
 # create the skorch wrapper
 model = NeuralNetClassifier(module=MyClassifier)

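The NeuralNetClassifier constructor also accepts the settings that control training when model.fit() is called (fit() is how scikit-learn models run their training loop), such as the number of epochs and the batch size. For example: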

 model = NeuralNetClassifier(
     module=MyClassifier,
     max_epochs=150,
     batch_size=10
 )

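The NeuralNetClassifier constructor can also accept extra arguments that are forwarded to the constructor of your model class, provided their names are prefixed with module__ (two underscores). These arguments may have default values in your model's constructor, but the defaults are overridden when the wrapper instantiates the model. For example: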

 import torch.nn as nn
 from skorch import NeuralNetClassifier
 
 class SonarClassifier(nn.Module):
     def __init__(self, n_layers=3):
         super().__init__()
         self.layers = []
         self.acts = []
         for i in range(n_layers):
             self.layers.append(nn.Linear(60, 60))
             self.acts.append(nn.ReLU())
             self.add_module(f"layer{i}", self.layers[-1])
             self.add_module(f"act{i}", self.acts[-1])
         self.output = nn.Linear(60, 1)
 
     def forward(self, x):
         for layer, act in zip(self.layers, self.acts):
             x = act(layer(x))
         x = self.output(x)
         return x
 
 model = NeuralNetClassifier(
     module=SonarClassifier,
     max_epochs=150,
     batch_size=10,
     module__n_layers=2
 )

We can verify the result by initializing the model and printing it:

 print(model.initialize())
 
 # Output:
 <class 'skorch.classifier.NeuralNetClassifier'>[initialized](
   module_=SonarClassifier(
     (layer0): Linear(in_features=60, out_features=60, bias=True)
     (act0): ReLU()
     (layer1): Linear(in_features=60, out_features=60, bias=True)
     (act1): ReLU()
     (output): Linear(in_features=60, out_features=1, bias=True)
   ),
 )
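
The double-underscore naming convention can be pictured as splitting each keyword at its first `__` and handing the remainder to the named component. The following is a minimal pure-Python sketch of that idea. `route_params` is a hypothetical helper written only for illustration, and it is not how skorch is actually implemented.

```python
def route_params(known_prefixes, **kwargs):
    """Group keyword arguments by their 'prefix__' part.

    Hypothetical helper for illustration only; skorch's real
    parameter handling is more involved.
    """
    routed = {prefix: {} for prefix in known_prefixes}
    routed["wrapper"] = {}  # arguments without a known prefix stay with the wrapper
    for name, value in kwargs.items():
        prefix, sep, rest = name.partition("__")
        if sep and prefix in known_prefixes:
            routed[prefix][rest] = value
        else:
            routed["wrapper"][name] = value
    return routed


routed = route_params(
    ["module", "optimizer"],
    max_epochs=150,
    batch_size=10,
    module__n_layers=2,
    optimizer__lr=0.01,
)
print(routed["module"])     # arguments for the model class constructor
print(routed["optimizer"])  # arguments for the optimizer constructor
print(routed["wrapper"])    # arguments kept by the wrapper itself
```

Names such as max_epochs contain only single underscores, so they are left with the wrapper, while module__n_layers ends up as n_layers for the model class.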

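Grid search is a model hyperparameter optimization technique. It simply exhausts all combinations of the hyperparameter values and finds the combination that gives the best score. In scikit-learn, this technique is provided by the GridSearchCV class. When constructing this class, you must provide a dictionary of hyperparameters in the param_grid argument. This maps each model parameter name to a list of values to try.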
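
To make "exhausting all combinations" concrete, here is a small sketch that uses only the standard library. The grid values and the `toy_score` function are made up so the example runs on its own. In a real search the score would come from training and validating a model.

```python
from itertools import product

param_grid = {
    "batch_size": [10, 20, 40],
    "max_epochs": [10, 50],
}


def toy_score(params):
    # Stand-in for "train a model and return its validation score".
    # This formula is invented purely so the example is runnable.
    return params["max_epochs"] / 100 - abs(params["batch_size"] - 20) / 100


# Build one dict per combination: the Cartesian product of all value lists.
names = list(param_grid)
combinations = [dict(zip(names, values)) for values in product(*param_grid.values())]

best = max(combinations, key=toy_score)
print(len(combinations))  # 3 batch sizes x 2 epoch settings
print(best)
```

The number of combinations is the product of the list lengths, which is why grids grow quickly as parameters are added.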

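By default, accuracy is the score that is optimized for a classifier, but other scores can be specified in the scoring argument of the GridSearchCV constructor. GridSearchCV builds and evaluates one model for each combination of parameters using cross-validation. The default is 5-fold cross-validation in scikit-learn 0.22 and later (earlier versions used 3-fold); the examples in this article set cv=3 explicitly. All of this can be configured through constructor arguments.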
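
The cross-validation step can be sketched in plain Python as follows. This is a simplified, unshuffled split written for illustration. `kfold_indices` and `cross_val_mean` are hypothetical names, and scikit-learn's own splitters additionally support shuffling and stratification, which this sketch leaves out.

```python
def kfold_indices(n_samples, n_splits):
    """Yield (train, test) index lists for a simple, unshuffled k-fold split."""
    fold_sizes = [n_samples // n_splits] * n_splits
    for i in range(n_samples % n_splits):
        fold_sizes[i] += 1  # spread the remainder over the first folds
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size


def cross_val_mean(score_fn, n_samples, n_splits=3):
    """Average a score over the folds; score_fn(train, test) returns a float."""
    scores = [score_fn(train, test) for train, test in kfold_indices(n_samples, n_splits)]
    return sum(scores) / len(scores)


# Example: 10 samples and 3 folds give test folds of size 4, 3 and 3.
folds = list(kfold_indices(10, 3))
print([len(test) for _, test in folds])

# Dummy score: the fraction of samples held out in each fold.
mean_test_fraction = cross_val_mean(lambda train, test: len(test) / 10, 10)
print(mean_test_fraction)
```

Each sample appears in exactly one test fold, so every combination of parameters is scored on data it was not trained on.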

Below is an example of defining a simple grid search:

 param_grid = {
     'max_epochs': [10, 20, 30]
 }
 grid = GridSearchCV(estimator=model, param_grid=param_grid, n_jobs=-1, cv=3)
 grid_result = grid.fit(X, y)

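Setting the n_jobs argument of the GridSearchCV constructor to -1 uses all the cores on your machine. Otherwise, the grid search runs in a single process, which is slower on multi-core CPUs.

Once the search completes, the results are available on the object returned by grid.fit(). best_score_ gives the best score observed during the optimization, and best_params_ describes the combination of parameters that achieved it.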

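All examples are demonstrated on a small standard machine learning dataset, the Pima Indians onset of diabetes classification dataset. It is a small dataset in which all attributes are numeric, so it is easy to work with.

In this first simple example, we look at tuning the batch size and the number of epochs used when fitting the network.

We will simply evaluate a range of batch sizes from 10 to 100. The code listing is shown below: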

 import numpy as np
 import torch
 import torch.nn as nn
 import torch.optim as optim
 from skorch import NeuralNetClassifier
 from sklearn.model_selection import GridSearchCV
 
 # load the dataset, split into input (X) and output (y) variables
 dataset = np.loadtxt('pima-indians-diabetes.csv', delimiter=',')
 X = dataset[:,0:8]
 y = dataset[:,8]
 X = torch.tensor(X, dtype=torch.float32)
 y = torch.tensor(y, dtype=torch.float32).reshape(-1, 1)
 
 # PyTorch classifier
 class PimaClassifier(nn.Module):
     def __init__(self):
         super().__init__()
         self.layer = nn.Linear(8, 12)
         self.act = nn.ReLU()
         self.output = nn.Linear(12, 1)
         self.prob = nn.Sigmoid()
 
     def forward(self, x):
         x = self.act(self.layer(x))
         x = self.prob(self.output(x))
         return x
 
 # create model with skorch
 model = NeuralNetClassifier(
     PimaClassifier,
     criterion=nn.BCELoss,
     optimizer=optim.Adam,
     verbose=False
 )
 
 # define the grid search parameters
 param_grid = {'batch_size': [10, 20, 40, 60, 80, 100],
     'max_epochs': [10, 50, 100]
 }
 grid = GridSearchCV(estimator=model, param_grid=param_grid, n_jobs=-1, cv=3)
 grid_result = grid.fit(X, y)
 
 # summarize results
 print("Best: %f using %s" % (grid_result.best_score_, grid_result.best_params_))
 means = grid_result.cv_results_['mean_test_score']
 stds = grid_result.cv_results_['std_test_score']
 params = grid_result.cv_results_['params']
 for mean, stdev, param in zip(means, stds, params):
     print("%f (%f) with: %r" % (mean, stdev, param))

The results are as follows:

 Best: 0.714844 using {'batch_size': 10, 'max_epochs': 100}
 0.665365 (0.020505) with: {'batch_size': 10, 'max_epochs': 10}
 0.588542 (0.168055) with: {'batch_size': 10, 'max_epochs': 50}
 0.714844 (0.032369) with: {'batch_size': 10, 'max_epochs': 100}
 0.671875 (0.022326) with: {'batch_size': 20, 'max_epochs': 10}
 0.696615 (0.008027) with: {'batch_size': 20, 'max_epochs': 50}
 0.714844 (0.019918) with: {'batch_size': 20, 'max_epochs': 100}
 0.666667 (0.009744) with: {'batch_size': 40, 'max_epochs': 10}
 0.687500 (0.033603) with: {'batch_size': 40, 'max_epochs': 50}
 0.707031 (0.024910) with: {'batch_size': 40, 'max_epochs': 100}
 0.667969 (0.014616) with: {'batch_size': 60, 'max_epochs': 10}
 0.694010 (0.036966) with: {'batch_size': 60, 'max_epochs': 50}
 0.694010 (0.042473) with: {'batch_size': 60, 'max_epochs': 100}
 0.670573 (0.023939) with: {'batch_size': 80, 'max_epochs': 10}
 0.674479 (0.020752) with: {'batch_size': 80, 'max_epochs': 50}
 0.703125 (0.026107) with: {'batch_size': 80, 'max_epochs': 100}
 0.680990 (0.014382) with: {'batch_size': 100, 'max_epochs': 10}
 0.670573 (0.013279) with: {'batch_size': 100, 'max_epochs': 50}
 0.687500 (0.017758) with: {'batch_size': 100, 'max_epochs': 100}

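We can see that a batch size of 10 with 100 epochs achieved the best result, an accuracy of about 71%. A batch size of 20 with 100 epochs tied on mean accuracy.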
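
When several settings score almost the same, it can help to look at the spread across folds as well as the mean. The sketch below re-ranks a few rows copied from the output above, breaking ties on the mean by the smaller standard deviation. That tie-break rule is an assumption of this sketch and not something GridSearchCV does for you.

```python
# (mean accuracy, standard deviation, parameters), copied from the output above
rows = [
    (0.714844, 0.032369, {"batch_size": 10, "max_epochs": 100}),
    (0.714844, 0.019918, {"batch_size": 20, "max_epochs": 100}),
    (0.707031, 0.024910, {"batch_size": 40, "max_epochs": 100}),
    (0.703125, 0.026107, {"batch_size": 80, "max_epochs": 100}),
    (0.696615, 0.008027, {"batch_size": 20, "max_epochs": 50}),
]

# Highest mean first; among equal means, prefer the smaller spread across folds.
ranked = sorted(rows, key=lambda row: (-row[0], row[1]))

for mean, std, params in ranked[:3]:
    print(f"{mean:.6f} ({std:.6f}) with: {params}")
```

Under this rule the batch size of 20 comes out ahead of 10, since both reach the same mean accuracy but the former varies less between folds.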

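Next, let's look at tuning the optimizer. There are many optimizers to choose from, such as SGD and Adam, so how do we pick one?

The complete code is as follows: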

 import numpy as np
 import torch
 import torch.nn as nn
 import torch.optim as optim
 from skorch import NeuralNetClassifier
 from sklearn.model_selection import GridSearchCV
 
 # load the dataset, split into input (X) and output (y) variables
 dataset = np.loadtxt('pima-indians-diabetes.csv', delimiter=',')
 X = dataset[:,0:8]
 y = dataset[:,8]
 X = torch.tensor(X, dtype=torch.float32)
 y = torch.tensor(y, dtype=torch.float32).reshape(-1, 1)
 
 # PyTorch classifier
 class PimaClassifier(nn.Module):
     def __init__(self):
         super().__init__()
         self.layer = nn.Linear(8, 12)
         self.act = nn.ReLU()
         self.output = nn.Linear(12, 1)
         self.prob = nn.Sigmoid()
 
     def forward(self, x):
         x = self.act(self.layer(x))
         x = self.prob(self.output(x))
         return x
 
 # create model with skorch
 model = NeuralNetClassifier(
     PimaClassifier,
     criterion=nn.BCELoss,
     max_epochs=100,
     batch_size=10,
     verbose=False
 )
 
 # define the grid search parameters
 param_grid = {
     'optimizer': [optim.SGD, optim.RMSprop, optim.Adagrad, optim.Adadelta,
                   optim.Adam, optim.Adamax, optim.NAdam],
 }
 grid = GridSearchCV(estimator=model, param_grid=param_grid, n_jobs=-1, cv=3)
 grid_result = grid.fit(X, y)
 
 # summarize results
 print("Best: %f using %s" % (grid_result.best_score_, grid_result.best_params_))
 means = grid_result.cv_results_['mean_test_score']
 stds = grid_result.cv_results_['std_test_score']
 params = grid_result.cv_results_['params']
 for mean, stdev, param in zip(means, stds, params):
     print("%f (%f) with: %r" % (mean, stdev, param))

The output is as follows:

 Best: 0.721354 using {'optimizer': <class 'torch.optim.adamax.Adamax'>}
 0.674479 (0.036828) with: {'optimizer': <class 'torch.optim.sgd.SGD'>}
 0.700521 (0.043303) with: {'optimizer': <class 'torch.optim.rmsprop.RMSprop'>}
 0.682292 (0.027126) with: {'optimizer': <class 'torch.optim.adagrad.Adagrad'>}
 0.572917 (0.051560) with: {'optimizer': <class 'torch.optim.adadelta.Adadelta'>}
 0.714844 (0.030758) with: {'optimizer': <class 'torch.optim.adam.Adam'>}
 0.721354 (0.019225) with: {'optimizer': <class 'torch.optim.adamax.Adamax'>}
 0.709635 (0.024360) with: {'optimizer': <class 'torch.optim.nadam.NAdam'>}

We can see that, for our model and dataset, the Adamax optimization algorithm performed best, with an accuracy of about 72%.

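Although the learning rate schedulers in PyTorch let us adjust the learning rate dynamically across epochs, as an example we will treat the learning rate and momentum as grid search parameters. In PyTorch, the learning rate and momentum are set as follows:

 optimizer = optim.SGD(model.parameters(), lr=0.001, momentum=0.9)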
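
To get a feel for what these two numbers do, here is a plain-Python sketch of SGD with momentum on the one-dimensional function f(x) = x**2. The update rule follows the form PyTorch's SGD uses with its default settings (no dampening, no Nesterov momentum). The function, starting point and step counts are assumptions chosen for illustration.

```python
def sgd_momentum(grad_fn, x0, lr, momentum, steps):
    """Minimise a 1-D function with SGD plus momentum.

    Update rule:
        velocity = momentum * velocity + gradient
        x        = x - lr * velocity
    """
    x, velocity = x0, 0.0
    for _ in range(steps):
        velocity = momentum * velocity + grad_fn(x)
        x = x - lr * velocity
    return x


def grad(x):
    return 2.0 * x  # gradient of f(x) = x**2, whose minimum is at x = 0


small_lr = sgd_momentum(grad, x0=1.0, lr=0.1, momentum=0.0, steps=20)
large_lr = sgd_momentum(grad, x0=1.0, lr=1.5, momentum=0.0, steps=20)
with_momentum = sgd_momentum(grad, x0=1.0, lr=0.1, momentum=0.9, steps=200)

print(small_lr)       # shrinks toward 0
print(large_lr)       # grows in magnitude: the step size is too large
print(with_momentum)  # also approaches 0, after some oscillation
```

A learning rate that is too large makes every step overshoot the minimum, which is one plausible reason why some settings in a learning rate grid can score far worse than others.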

In skorch, parameters are routed to the optimizer with the prefix optimizer__.

 import numpy as np
 import torch
 import torch.nn as nn
 import torch.optim as optim
 from skorch import NeuralNetClassifier
 from sklearn.model_selection import GridSearchCV
 
 # load the dataset, split into input (X) and output (y) variables
 dataset = np.loadtxt('pima-indians-diabetes.csv', delimiter=',')
 X = dataset[:,0:8]
 y = dataset[:,8]
 X = torch.tensor(X, dtype=torch.float32)
 y = torch.tensor(y, dtype=torch.float32).reshape(-1, 1)
 
 # PyTorch classifier
 class PimaClassifier(nn.Module):
     def __init__(self):
         super().__init__()
         self.layer = nn.Linear(8, 12)
         self.act = nn.ReLU()
         self.output = nn.Linear(12, 1)
         self.prob = nn.Sigmoid()
 
     def forward(self, x):
         x = self.act(self.layer(x))
         x = self.prob(self.output(x))
         return x
 
 # create model with skorch
 model = NeuralNetClassifier(
     PimaClassifier,
     criterion=nn.BCELoss,
     optimizer=optim.SGD,
     max_epochs=100,
     batch_size=10,
     verbose=False
 )
 
 # define the grid search parameters
 param_grid = {'optimizer__lr': [0.001, 0.01, 0.1, 0.2, 0.3],
     'optimizer__momentum': [0.0, 0.2, 0.4, 0.6, 0.8, 0.9],
 }
 grid = GridSearchCV(estimator=model, param_grid=param_grid, n_jobs=-1, cv=3)
 grid_result = grid.fit(X, y)
 
 # summarize results
 print("Best: %f using %s" % (grid_result.best_score_, grid_result.best_params_))
 means = grid_result.cv_results_['mean_test_score']
 stds = grid_result.cv_results_['std_test_score']
 params = grid_result.cv_results_['params']
 for mean, stdev, param in zip(means, stds, params):
     print("%f (%f) with: %r" % (mean, stdev, param))

The results are as follows:

 Best: 0.682292 using {'optimizer__lr': 0.001, 'optimizer__momentum': 0.9}
 0.648438 (0.016877) with: {'optimizer__lr': 0.001, 'optimizer__momentum': 0.0}
 0.671875 (0.017758) with: {'optimizer__lr': 0.001, 'optimizer__momentum': 0.2}
 0.674479 (0.022402) with: {'optimizer__lr': 0.001, 'optimizer__momentum': 0.4}
 0.677083 (0.011201) with: {'optimizer__lr': 0.001, 'optimizer__momentum': 0.6}
 0.679688 (0.027621) with: {'optimizer__lr': 0.001, 'optimizer__momentum': 0.8}
 0.682292 (0.026557) with: {'optimizer__lr': 0.001, 'optimizer__momentum': 0.9}
 0.671875 (0.019918) with: {'optimizer__lr': 0.01, 'optimizer__momentum': 0.0}
 0.648438 (0.024910) with: {'optimizer__lr': 0.01, 'optimizer__momentum': 0.2}
 0.546875 (0.143454) with: {'optimizer__lr': 0.01, 'optimizer__momentum': 0.4}
 0.567708 (0.153668) with: {'optimizer__lr': 0.01, 'optimizer__momentum': 0.6}
 0.552083 (0.141790) with: {'optimizer__lr': 0.01, 'optimizer__momentum': 0.8}
 0.451823 (0.144561) with: {'optimizer__lr': 0.01, 'optimizer__momentum': 0.9}
 0.348958 (0.001841) with: {'optimizer__lr': 0.1, 'optimizer__momentum': 0.0}
 0.450521 (0.142719) with: {'optimizer__lr': 0.1, 'optimizer__momentum': 0.2}
 0.450521 (0.142719) with: {'optimizer__lr': 0.1, 'optimizer__momentum': 0.4}
 0.450521 (0.142719) with: {'optimizer__lr': 0.1, 'optimizer__momentum': 0.6}
 0.348958 (0.001841) with: {'optimizer__lr': 0.1, 'optimizer__momentum': 0.8}
 0.348958 (0.001841) with: {'optimizer__lr': 0.1, 'optimizer__momentum': 0.9}
 0.444010 (0.136265) with: {'optimizer__lr': 0.2, 'optimizer__momentum': 0.0}
 0.450521 (0.142719) with: {'optimizer__lr': 0.2, 'optimizer__momentum': 0.2}
 0.348958 (0.001841) with: {'optimizer__lr': 0.2, 'optimizer__momentum': 0.4}
 0.552083 (0.141790) with: {'optimizer__lr': 0.2, 'optimizer__momentum': 0.6}
 0.549479 (0.142719) with: {'optimizer__lr': 0.2, 'optimizer__momentum': 0.8}
 0.651042 (0.001841) with: {'optimizer__lr': 0.2, 'optimizer__momentum': 0.9}
 0.552083 (0.141790) with: {'optimizer__lr': 0.3, 'optimizer__momentum': 0.0}
 0.348958 (0.001841) with: {'optimizer__lr': 0.3, 'optimizer__momentum': 0.2}
 0.450521 (0.142719) with: {'optimizer__lr': 0.3, 'optimizer__momentum': 0.4}
 0.552083 (0.141790) with: {'optimizer__lr': 0.3, 'optimizer__momentum': 0.6}
 0.450521 (0.142719) with: {'optimizer__lr': 0.3, 'optimizer__momentum': 0.8}
 0.450521 (0.142719) with: {'optimizer__lr': 0.3, 'optimizer__momentum': 0.9}

For SGD, the best result was obtained with a learning rate of 0.001 and a momentum of 0.9, with an accuracy of about 68%.

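The activation function controls the nonlinearity of individual neurons. We will demonstrate evaluating some of the activation functions available in PyTorch.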

 import numpy as np
 import torch
 import torch.nn as nn
 import torch.nn.init as init
 import torch.optim as optim
 from skorch import NeuralNetClassifier
 from sklearn.model_selection import GridSearchCV
 
 # load the dataset, split into input (X) and output (y) variables
 dataset = np.loadtxt('pima-indians-diabetes.csv', delimiter=',')
 X = dataset[:,0:8]
 y = dataset[:,8]
 X = torch.tensor(X, dtype=torch.float32)
 y = torch.tensor(y, dtype=torch.float32).reshape(-1, 1)
 
 # PyTorch classifier
 class PimaClassifier(nn.Module):
     def __init__(self, activation=nn.ReLU):
         super().__init__()
         self.layer = nn.Linear(8, 12)
         self.act = activation()
         self.output = nn.Linear(12, 1)
         self.prob = nn.Sigmoid()
         # manually init weights
         init.kaiming_uniform_(self.layer.weight)
         init.kaiming_uniform_(self.output.weight)
 
     def forward(self, x):
         x = self.act(self.layer(x))
         x = self.prob(self.output(x))
         return x
 
 # create model with skorch
 model = NeuralNetClassifier(
     PimaClassifier,
     criterion=nn.BCELoss,
     optimizer=optim.Adamax,
     max_epochs=100,
     batch_size=10,
     verbose=False
 )
 
 # define the grid search parameters
 param_grid = {
     'module__activation': [nn.Identity, nn.ReLU, nn.ELU, nn.ReLU6,
                            nn.GELU, nn.Softplus, nn.Softsign, nn.Tanh,
                            nn.Sigmoid, nn.Hardsigmoid]
 }
 grid = GridSearchCV(estimator=model, param_grid=param_grid, n_jobs=-1, cv=3)
 grid_result = grid.fit(X, y)
 
 # summarize results
 print("Best: %f using %s" % (grid_result.best_score_, grid_result.best_params_))
 means = grid_result.cv_results_['mean_test_score']
 stds = grid_result.cv_results_['std_test_score']
 params = grid_result.cv_results_['params']
 for mean, stdev, param in zip(means, stds, params):
     print("%f (%f) with: %r" % (mean, stdev, param))

The results are as follows:

 Best: 0.699219 using {'module__activation': <class 'torch.nn.modules.activation.ReLU'>}
 0.687500 (0.025315) with: {'module__activation': <class 'torch.nn.modules.linear.Identity'>}
 0.699219 (0.011049) with: {'module__activation': <class 'torch.nn.modules.activation.ReLU'>}
 0.674479 (0.035849) with: {'module__activation': <class 'torch.nn.modules.activation.ELU'>}
 0.621094 (0.063549) with: {'module__activation': <class 'torch.nn.modules.activation.ReLU6'>}
 0.674479 (0.017566) with: {'module__activation': <class 'torch.nn.modules.activation.GELU'>}
 0.558594 (0.149189) with: {'module__activation': <class 'torch.nn.modules.activation.Softplus'>}
 0.675781 (0.014616) with: {'module__activation': <class 'torch.nn.modules.activation.Softsign'>}
 0.619792 (0.018688) with: {'module__activation': <class 'torch.nn.modules.activation.Tanh'>}
 0.643229 (0.019225) with: {'module__activation': <class 'torch.nn.modules.activation.Sigmoid'>}
 0.636719 (0.022326) with: {'module__activation': <class 'torch.nn.modules.activation.Hardsigmoid'>}

The ReLU activation function achieved the best result, with an accuracy of about 70%.
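
For reference, the activations compared above can be written as scalar functions in plain Python. This is a sketch of the standard formulas with default parameters (for example alpha = 1 for ELU) and is independent of PyTorch's tensor implementations.

```python
import math


def relu(x):
    return max(0.0, x)


def relu6(x):
    return min(max(0.0, x), 6.0)


def elu(x, alpha=1.0):
    return x if x > 0 else alpha * (math.exp(x) - 1.0)


def gelu(x):
    # x times the standard normal cumulative distribution function
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def softplus(x):
    return math.log1p(math.exp(x))


def softsign(x):
    return x / (1.0 + abs(x))


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def hardsigmoid(x):
    # piecewise-linear approximation of the sigmoid
    return min(max(x / 6.0 + 0.5, 0.0), 1.0)


activations = {
    "Identity": lambda x: x,
    "ReLU": relu,
    "ELU": elu,
    "ReLU6": relu6,
    "GELU": gelu,
    "Softplus": softplus,
    "Softsign": softsign,
    "Tanh": math.tanh,
    "Sigmoid": sigmoid,
    "Hardsigmoid": hardsigmoid,
}

for name, fn in activations.items():
    values = [round(fn(x), 3) for x in (-2.0, 0.0, 2.0)]
    print(f"{name:>11}: {values}")
```

Printing each function at a negative, zero and positive input shows the main differences: which ones pass negative values through, which saturate, and which are bounded.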

In this example, we will try dropout rates between 0.0 and 0.9 (1.0 does not make sense) and MaxNorm weight constraint values between 1 and 5.

 import numpy as np
 import torch
 import torch.nn as nn
 import torch.nn.init as init
 import torch.optim as optim
 from skorch import NeuralNetClassifier
 from sklearn.model_selection import GridSearchCV
 
 # load the dataset, split into input (X) and output (y) variables
 dataset = np.loadtxt('pima-indians-diabetes.csv', delimiter=',')
 X = dataset[:,0:8]
 y = dataset[:,8]
 X = torch.tensor(X, dtype=torch.float32)
 y = torch.tensor(y, dtype=torch.float32).reshape(-1, 1)
 
 # PyTorch classifier
 class PimaClassifier(nn.Module):
     def __init__(self, dropout_rate=0.5, weight_constraint=1.0):
         super().__init__()
         self.layer = nn.Linear(8, 12)
         self.act = nn.ReLU()
         self.dropout = nn.Dropout(dropout_rate)
         self.output = nn.Linear(12, 1)
         self.prob = nn.Sigmoid()
         self.weight_constraint = weight_constraint
         # manually init weights
         init.kaiming_uniform_(self.layer.weight)
         init.kaiming_uniform_(self.output.weight)
 
     def forward(self, x):
         # maxnorm weight before actual forward pass
         with torch.no_grad():
             norm = self.layer.weight.norm(2, dim=0, keepdim=True).clamp(min=self.weight_constraint / 2)
             desired = torch.clamp(norm, max=self.weight_constraint)
             self.layer.weight *= (desired / norm)
         # actual forward pass
         x = self.act(self.layer(x))
         x = self.dropout(x)
         x = self.prob(self.output(x))
         return x
 
 # create model with skorch
 model = NeuralNetClassifier(
     PimaClassifier,
     criterion=nn.BCELoss,
     optimizer=optim.Adamax,
     max_epochs=100,
     batch_size=10,
     verbose=False
 )
 
 # define the grid search parameters
 param_grid = {'module__weight_constraint': [1.0, 2.0, 3.0, 4.0, 5.0],
     'module__dropout_rate': [0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
 }
 grid = GridSearchCV(estimator=model, param_grid=param_grid, n_jobs=-1, cv=3)
 grid_result = grid.fit(X, y)
 
 # summarize results
 print("Best: %f using %s" % (grid_result.best_score_, grid_result.best_params_))
 means = grid_result.cv_results_['mean_test_score']
 stds = grid_result.cv_results_['std_test_score']
 params = grid_result.cv_results_['params']
 for mean, stdev, param in zip(means, stds, params):
     print("%f (%f) with: %r" % (mean, stdev, param))

The results are as follows:

 Best: 0.701823 using {'module__dropout_rate': 0.1, 'module__weight_constraint': 2.0}
 0.669271 (0.015073) with: {'module__dropout_rate': 0.0, 'module__weight_constraint': 1.0}
 0.692708 (0.035132) with: {'module__dropout_rate': 0.0, 'module__weight_constraint': 2.0}
 0.589844 (0.170180) with: {'module__dropout_rate': 0.0, 'module__weight_constraint': 3.0}
 0.561198 (0.151131) with: {'module__dropout_rate': 0.0, 'module__weight_constraint': 4.0}
 0.688802 (0.021710) with: {'module__dropout_rate': 0.0, 'module__weight_constraint': 5.0}
 0.697917 (0.009744) with: {'module__dropout_rate': 0.1, 'module__weight_constraint': 1.0}
 0.701823 (0.016367) with: {'module__dropout_rate': 0.1, 'module__weight_constraint': 2.0}
 0.694010 (0.010253) with: {'module__dropout_rate': 0.1, 'module__weight_constraint': 3.0}
 0.686198 (0.025976) with: {'module__dropout_rate': 0.1, 'module__weight_constraint': 4.0}
 0.679688 (0.026107) with: {'module__dropout_rate': 0.1, 'module__weight_constraint': 5.0}
 0.701823 (0.029635) with: {'module__dropout_rate': 0.2, 'module__weight_constraint': 1.0}
 0.682292 (0.014731) with: {'module__dropout_rate': 0.2, 'module__weight_constraint': 2.0}
 0.701823 (0.009744) with: {'module__dropout_rate': 0.2, 'module__weight_constraint': 3.0}
 0.701823 (0.026557) with: {'module__dropout_rate': 0.2, 'module__weight_constraint': 4.0}
 0.687500 (0.015947) with: {'module__dropout_rate': 0.2, 'module__weight_constraint': 5.0}
 0.686198 (0.006639) with: {'module__dropout_rate': 0.3, 'module__weight_constraint': 1.0}
 0.656250 (0.006379) with: {'module__dropout_rate': 0.3, 'module__weight_constraint': 2.0}
 0.565104 (0.155608) with: {'module__dropout_rate': 0.3, 'module__weight_constraint': 3.0}
 0.700521 (0.028940) with: {'module__dropout_rate': 0.3, 'module__weight_constraint': 4.0}
 0.669271 (0.012890) with: {'module__dropout_rate': 0.3, 'module__weight_constraint': 5.0}
 0.661458 (0.018688) with: {'module__dropout_rate': 0.4, 'module__weight_constraint': 1.0}
 0.669271 (0.017566) with: {'module__dropout_rate': 0.4, 'module__weight_constraint': 2.0}
 0.652344 (0.006379) with: {'module__dropout_rate': 0.4, 'module__weight_constraint': 3.0}
 0.680990 (0.037783) with: {'module__dropout_rate': 0.4, 'module__weight_constraint': 4.0}
 0.692708 (0.042112) with: {'module__dropout_rate': 0.4, 'module__weight_constraint': 5.0}
 0.666667 (0.006639) with: {'module__dropout_rate': 0.5, 'module__weight_constraint': 1.0}
 0.652344 (0.011500) with: {'module__dropout_rate': 0.5, 'module__weight_constraint': 2.0}
 0.662760 (0.007366) with: {'module__dropout_rate': 0.5, 'module__weight_constraint': 3.0}
 0.558594 (0.146610) with: {'module__dropout_rate': 0.5, 'module__weight_constraint': 4.0}
 0.552083 (0.141826) with: {'module__dropout_rate': 0.5, 'module__weight_constraint': 5.0}
 0.548177 (0.141826) with: {'module__dropout_rate': 0.6, 'module__weight_constraint': 1.0}
 0.653646 (0.013279) with: {'module__dropout_rate': 0.6, 'module__weight_constraint': 2.0}
 0.661458 (0.008027) with: {'module__dropout_rate': 0.6, 'module__weight_constraint': 3.0}
 0.553385 (0.142719) with: {'module__dropout_rate': 0.6, 'module__weight_constraint': 4.0}
 0.669271 (0.035132) with: {'module__dropout_rate': 0.6, 'module__weight_constraint': 5.0}
 0.662760 (0.015733) with: {'module__dropout_rate': 0.7, 'module__weight_constraint': 1.0}
 0.636719 (0.024910) with: {'module__dropout_rate': 0.7, 'module__weight_constraint': 2.0}
 0.550781 (0.146818) with: {'module__dropout_rate': 0.7, 'module__weight_constraint': 3.0}
 0.537760 (0.140094) with: {'module__dropout_rate': 0.7, 'module__weight_constraint': 4.0}
 0.542969 (0.138144) with: {'module__dropout_rate': 0.7, 'module__weight_constraint': 5.0}
 0.565104 (0.148654) with: {'module__dropout_rate': 0.8, 'module__weight_constraint': 1.0}
 0.657552 (0.008027) with: {'module__dropout_rate': 0.8, 'module__weight_constraint': 2.0}
 0.428385 (0.111418) with: {'module__dropout_rate': 0.8, 'module__weight_constraint': 3.0}
 0.549479 (0.142719) with: {'module__dropout_rate': 0.8, 'module__weight_constraint': 4.0}
 0.648438 (0.005524) with: {'module__dropout_rate': 0.8, 'module__weight_constraint': 5.0}
 0.540365 (0.136861) with: {'module__dropout_rate': 0.9, 'module__weight_constraint': 1.0}
 0.605469 (0.053083) with: {'module__dropout_rate': 0.9, 'module__weight_constraint': 2.0}
 0.553385 (0.139948) with: {'module__dropout_rate': 0.9, 'module__weight_constraint': 3.0}
 0.549479 (0.142719) with: {'module__dropout_rate': 0.9, 'module__weight_constraint': 4.0}
 0.595052 (0.075566) with: {'module__dropout_rate': 0.9, 'module__weight_constraint': 5.0}

We can see that a dropout rate of 10% and a weight constraint of 2.0 achieved the best accuracy of about 70%.
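
The two regularizers tuned above can be sketched on plain Python lists. The dropout function below is the usual "inverted dropout" form, where surviving values are scaled by 1 / (1 - p) during training, and the max-norm function is the textbook rescaling. The forward() method in the listing above uses a slightly different clamping expression, so treat this as an illustration of the idea and not as a line-by-line equivalent.

```python
import math
import random


def dropout(values, p, rng):
    """Inverted dropout in training mode: zero each value with probability p
    and scale the survivors by 1 / (1 - p) so the expected sum is unchanged."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must be in [0, 1); p = 1 would zero every activation")
    keep_scale = 1.0 / (1.0 - p)
    return [0.0 if rng.random() < p else v * keep_scale for v in values]


def max_norm(weights, max_value):
    """Rescale a weight vector so its L2 norm is at most max_value."""
    norm = math.sqrt(sum(w * w for w in weights))
    if norm <= max_value:
        return list(weights)
    return [w * max_value / norm for w in weights]


rng = random.Random(0)
activations = [1.0] * 8
dropped = dropout(activations, p=0.5, rng=rng)
print(dropped)  # each entry is either 0.0 or 2.0

constrained = max_norm([3.0, 4.0], max_value=2.0)  # the original norm is 5
print(constrained)
```

The guard on p also shows why a dropout rate of 1.0 is excluded from the grid: it would discard every activation.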

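The number of neurons in a layer is an important parameter to tune. Generally, the number of neurons in a layer controls the representational capacity of the network, at least at that point in the topology.

In theory, by the universal approximation theorem, a sufficiently large network with a single hidden layer can approximate any continuous function on a bounded input range, including the function computed by another neural network.

In this example, we will try values from 1 to 30 in steps of roughly 5 (1, 5, 10, and so on up to 30). A larger network requires more training, so ideally at least the batch size and the number of epochs should be optimized together with the number of neurons.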

 import numpy as np
 import torch
 import torch.nn as nn
 import torch.nn.init as init
 import torch.optim as optim
 from skorch import NeuralNetClassifier
 from sklearn.model_selection import GridSearchCV
 
 # load the dataset, split into input (X) and output (y) variables
 dataset = np.loadtxt('pima-indians-diabetes.csv', delimiter=',')
 X = dataset[:,0:8]
 y = dataset[:,8]
 X = torch.tensor(X, dtype=torch.float32)
 y = torch.tensor(y, dtype=torch.float32).reshape(-1, 1)
 
 class PimaClassifier(nn.Module):
     def __init__(self, n_neurons=12):
         super().__init__()
         self.layer = nn.Linear(8, n_neurons)
         self.act = nn.ReLU()
         self.dropout = nn.Dropout(0.1)
         self.output = nn.Linear(n_neurons, 1)
         self.prob = nn.Sigmoid()
         self.weight_constraint = 2.0
         # manually init weights
         init.kaiming_uniform_(self.layer.weight)
         init.kaiming_uniform_(self.output.weight)
 
     def forward(self, x):
         # maxnorm weight before actual forward pass
         with torch.no_grad():
             norm = self.layer.weight.norm(2, dim=0, keepdim=True).clamp(min=self.weight_constraint / 2)
             desired = torch.clamp(norm, max=self.weight_constraint)
             self.layer.weight *= (desired / norm)
         # actual forward pass
         x = self.act(self.layer(x))
         x = self.dropout(x)
         x = self.prob(self.output(x))
         return x
 
 # create model with skorch
 model = NeuralNetClassifier(
     PimaClassifier,
     criterion=nn.BCELoss,
     optimizer=optim.Adamax,
     max_epochs=100,
     batch_size=10,
     verbose=False
 )
 
 # define the grid search parameters
 param_grid = {'module__n_neurons': [1, 5, 10, 15, 20, 25, 30]
 }
 grid = GridSearchCV(estimator=model, param_grid=param_grid, n_jobs=-1, cv=3)
 grid_result = grid.fit(X, y)
 
 # summarize results
 print("Best: %f using %s" % (grid_result.best_score_, grid_result.best_params_))
 means = grid_result.cv_results_['mean_test_score']
 stds = grid_result.cv_results_['std_test_score']
 params = grid_result.cv_results_['params']
 for mean, stdev, param in zip(means, stds, params):
     print("%f (%f) with: %r" % (mean, stdev, param))

The results are as follows:

 Best: 0.708333 using {'module__n_neurons': 30}
 0.654948 (0.003683) with: {'module__n_neurons': 1}
 0.666667 (0.023073) with: {'module__n_neurons': 5}
 0.694010 (0.014382) with: {'module__n_neurons': 10}
 0.682292 (0.014382) with: {'module__n_neurons': 15}
 0.707031 (0.028705) with: {'module__n_neurons': 20}
 0.703125 (0.030758) with: {'module__n_neurons': 25}
 0.708333 (0.015733) with: {'module__n_neurons': 30}

You can see that the network with 30 neurons in the hidden layer achieved the best result, with an accuracy of about 71%.
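
To give a sense of how the model grows across the values tried above, the following sketch counts the trainable parameters of a network with 8 inputs, one hidden layer and a single output, matching the layer shapes in the listing. `count_parameters` is a hypothetical helper, not a PyTorch function.

```python
def count_parameters(n_inputs, n_neurons, n_outputs=1):
    """Weights plus biases for one hidden layer followed by an output layer."""
    hidden = n_inputs * n_neurons + n_neurons    # Linear(n_inputs, n_neurons)
    output = n_neurons * n_outputs + n_outputs   # Linear(n_neurons, n_outputs)
    return hidden + output


for n in [1, 5, 10, 15, 20, 25, 30]:
    print(n, count_parameters(8, n))
```

With 8 inputs the count is 10 * n + 1, so even the largest network in this grid has only a few hundred parameters.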

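In this article, we covered how to tune the hyperparameters of deep learning networks in Python using PyTorch and scikit-learn. If you are interested in skorch, take a look at its documentation.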

https://avoid.overfit.cn/post/fda8764b85174b6ca3c9eac4fc6d0db9

Author: Jason Brownlee
