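Hi everyone, I'm eriktse, and I've recently been learning computer vision. Anyone with a little CV background knows that cat-vs-dog recognition is an entry-level project. Even so, writing the neural network yourself is not that simple.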

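In this article I'll walk you through four steps (data processing, model building, model training and model evaluation) to build, by hand, a convolutional neural network that tells cats from dogs with over 70% accuracy.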

Environment

The deep learning framework is Baidu's PaddlePaddle, and the code runs on the AI Studio platform.

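On the AI Studio website, click Projects in the top menu to open the project page, then click Create Project and create an environment with the following settings.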

Choose the Notebook type.


Choose the BML Codelab version.

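Note that in the dataset settings you have to upload the Kaggle cats-vs-dogs dataset yourself. I have already uploaded it, so you can simply search the platform for the Kaggle cat/dog recognition dataset to find mine.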

AI Studio also supports importing datasets from Baidu Netdisk, which is very nice!

Data processing

Our data is in the folder shown below; unzip train.zip and test.zip.

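The images in train are named cat.xxx.jpg and dog.xxx.jpg. The processing plan is: use the os module to collect every image path and assign a label from the file name, load each image with PIL.Image, resize it to a fixed size, normalize it, convert it to a tensor, and then group several (img, lbl) pairs into a batch.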

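Building suitable batches speeds up training. A batch feeds many inputs through the network at once and produces all their outputs in one pass, which is much faster than processing images one at a time.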
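As a framework-free illustration of the batching idea, here is a minimal sketch in plain Python. The helper `make_batches` is hypothetical (not part of Paddle); like the article's own batching code it keeps only full batches and drops a final partial one.

```python
def make_batches(samples, batch_size):
    """Split a list of (image, label) pairs into full batches of batch_size.

    Leftover samples that cannot fill a whole batch are dropped.
    """
    batches = []
    for start in range(0, len(samples) - batch_size + 1, batch_size):
        chunk = samples[start:start + batch_size]
        images = [img for img, _ in chunk]
        labels = [lbl for _, lbl in chunk]
        batches.append((images, labels))
    return batches


# 4500 dummy samples, the size of the training split used in this article
dummy = [(f"img{i}", i % 2) for i in range(4500)]
batches = make_batches(dummy, 16)
print(len(batches))        # 281 full batches; the last 4 samples are dropped
print(len(batches[0][0]))  # 16 images per batch
```

With 4500 samples and a batch size of 16, this gives 281 batches, the same count that appears in the training log.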

os.listdir() returns a list with the names of all files and subdirectories under a path.

os.path.join(path1, path2) joins two path components.
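The listing-and-labeling step can be sketched with just the standard library. This is a minimal, hypothetical helper (`list_labeled_images`) run on a temporary directory of empty placeholder files; it uses the article's convention (0 = dog, 1 = cat) and sorts the names for reproducibility, whereas the article iterates in whatever order os.listdir() returns.

```python
import os
import tempfile


def list_labeled_images(data_dir):
    """Return (full_path, label) pairs for files named like cat.123.jpg / dog.456.jpg.

    Label convention follows the article: 0 = dog, 1 = cat.
    """
    samples = []
    for name in sorted(os.listdir(data_dir)):      # names of entries in data_dir
        full_path = os.path.join(data_dir, name)  # directory + file name
        prefix = name.split('.')[0]               # 'cat' or 'dog'
        if prefix not in ('cat', 'dog'):
            continue                              # skip files that are not training images
        samples.append((full_path, 0 if prefix == 'dog' else 1))
    return samples


# Demo on a throwaway directory with empty placeholder files
with tempfile.TemporaryDirectory() as tmp:
    for name in ["cat.0.jpg", "dog.0.jpg", "cat.1.jpg", "notes.txt"]:
        open(os.path.join(tmp, name), "w").close()
    samples = list_labeled_images(tmp)

for path, label in samples:
    print(os.path.basename(path), label)
```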

We write a loadImagetoTensor() function to make image handling easier.

This part assumes some image-processing basics, such as how to use PIL's Image module and numpy.

Throughout, an image has shape [3, 150, 150]: 3 channels, each of size [150, 150].
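To make that layout concrete, here is the normalization and channel-reordering step sketched in NumPy rather than Paddle, so it runs without a GPU. A random uint8 array stands in for a real photo (an assumption for illustration); PIL/NumPy give images in HWC order, while the conv layers want CHW.

```python
import numpy as np

# A fake 150x150 RGB image in HWC layout (height, width, channels)
rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(150, 150, 3), dtype=np.uint8)

# Scale pixel values to [0, 1] as float32, like `img /= 255.0` in the article
x = img.astype(np.float32) / 255.0

# HWC -> CHW, the same axis order as paddle.transpose(img, [2, 0, 1])
chw = np.transpose(x, (2, 0, 1))

print(chw.shape)  # (3, 150, 150)
```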
```python
import os

import numpy as np
import paddle
from PIL import Image
from paddle.vision.transforms import Resize

print(paddle.get_device())
paddle.set_device('gpu:0')

data_dir = "/home/aistudio/data/data195536/train"
siz = 150         # images are resized to (siz, siz)
batch_size = 16

data_paths = os.listdir(data_dir)


def loadImagetoTensor(path: str):
    # Read an image, resize it and scale it to [0, 1]; returns an HWC tensor of shape (siz, siz, 3)
    img = np.array(Image.open(path).convert('RGB'), dtype='float32')  # force 3 RGB channels
    img = Resize(size=(siz, siz))(img)  # resize to (siz, siz)
    img /= 255.0                        # normalize pixel values to [0, 1]
    return paddle.to_tensor(img)


# Label convention: 0 = dog, 1 = cat
train_data = []  # list of (image tensor, int label) tuples
# To speed up the experiment, only the first 5000 training images are used
for path in data_paths:
    img_path = os.path.join(data_dir, path)
    img = loadImagetoTensor(img_path)
    img = paddle.transpose(img, [2, 0, 1])  # HWC -> CHW
    # The label comes from the file name: cat.xxx.jpg / dog.xxx.jpg
    lbl = path.split('.')[0]
    lbl = 0 if lbl == 'dog' else 1
    train_data.append((img, lbl))
    if len(train_data) == 5000:
        break

# Hold out the first 500 of these images as the test set
test_data = train_data[:500]
train_data = train_data[500:]


def getBatch(data: list):
    # Pack data into batches of batch_size; a final partial batch is dropped
    imgs, lbls = [], []
    for start in range(0, len(data) - batch_size + 1, batch_size):
        chunk = data[start:start + batch_size]
        imgs.append(paddle.stack([img for img, _ in chunk]))
        lbls.append(paddle.to_tensor([lbl for _, lbl in chunk], dtype='int64'))
    return paddle.stack(imgs), paddle.stack(lbls)


train_imgs, train_lbls = getBatch(train_data)
print(train_imgs.shape)
print("Data loading complete")
```

The output is shown in the figure below:

Model building

The model uses 3 convolutional layers + 3 pooling layers + 3 linear (fully connected) layers.
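The in_features=1250 of the first linear layer follows from the standard output-size rule for convolution and pooling without dilation, out = ⌊(n + 2p − k) / s⌋ + 1. This plain-Python sketch (the helper `conv_out` is hypothetical) walks a 150×150 input through the layer settings used in the model below.

```python
def conv_out(n, kernel, stride=1, padding=0):
    """Output side length of a convolution or pooling window (no dilation)."""
    return (n + 2 * padding - kernel) // stride + 1


n = 150
n = conv_out(n, 12)           # conv0: kernel 12          -> 139
n = conv_out(n, 4, stride=4)  # pool0: kernel 4, stride 4 -> 34
n = conv_out(n, 5)            # conv1: kernel 5           -> 30
n = conv_out(n, 2, stride=2)  # pool1: kernel 2, stride 2 -> 15
n = conv_out(n, 5)            # conv2: kernel 5           -> 11
n = conv_out(n, 2, stride=2)  # pool2: kernel 2, stride 2 -> 5

channels = 50                 # out_channels of conv2
flat = channels * n * n
print(n, flat)                # 5 1250
```

Flooring here matches Paddle's default pooling behavior (ceil_mode=False).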

```python
import matplotlib.pyplot as plt


class Model(paddle.nn.Layer):
    def __init__(self):
        super().__init__()
        self.conv0 = paddle.nn.Conv2D(in_channels=3, out_channels=20, kernel_size=12, padding=0)
        self.pool0 = paddle.nn.MaxPool2D(kernel_size=4, stride=4)
        self.conv1 = paddle.nn.Conv2D(in_channels=20, out_channels=50, kernel_size=5, padding=0)
        self.pool1 = paddle.nn.MaxPool2D(kernel_size=2, stride=2)
        self.conv2 = paddle.nn.Conv2D(in_channels=50, out_channels=50, kernel_size=5, padding=0)
        self.pool2 = paddle.nn.MaxPool2D(kernel_size=2, stride=2)
        # After the three conv + pool stages a 150x150 input becomes 50 x 5 x 5 = 1250 features
        self.fc1 = paddle.nn.Linear(in_features=1250, out_features=512)
        self.fc2 = paddle.nn.Linear(in_features=512, out_features=64)
        self.fc3 = paddle.nn.Linear(in_features=64, out_features=2)  # 2 classes: dog, cat

    def forward(self, x):
        x = paddle.reshape(x, shape=[-1, 3, 150, 150])  # ensure [N, C, H, W]
        # ReLU before each pooling layer; a second ReLU after max pooling would be
        # a no-op because its input is already non-negative
        x = self.pool0(paddle.nn.functional.relu(self.conv0(x)))
        x = self.pool1(paddle.nn.functional.relu(self.conv1(x)))
        x = self.pool2(paddle.nn.functional.relu(self.conv2(x)))
        x = paddle.flatten(x, start_axis=1)  # [N, 1250]
        x = paddle.nn.functional.relu(self.fc1(x))
        x = paddle.nn.functional.relu(self.fc2(x))
        return self.fc3(x)  # raw logits; the loss applies softmax


model = Model()
paddle.summary(model, (1, 3, 150, 150))  # print the model structure

losser = paddle.nn.loss.CrossEntropyLoss()
# Keep the learning rate small to avoid loss oscillation or non-convergence
opt = paddle.optimizer.Adam(learning_rate=0.0001, parameters=model.parameters())
```

When a batch of image tensors is passed to the model, as in model(img), the forward function is called automatically.
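The automatic call works because paddle.nn.Layer defines `__call__`, which (roughly) runs registered hooks and then `forward`. The toy classes below (`MiniLayer`, `Doubler`) are hypothetical and far simpler than Paddle's real code; they only show the Python mechanism.

```python
class MiniLayer:
    """Toy stand-in for a framework layer base class (not Paddle's implementation)."""

    def __call__(self, *args, **kwargs):
        # A real framework also runs hooks and bookkeeping around forward()
        return self.forward(*args, **kwargs)

    def forward(self, *args, **kwargs):
        raise NotImplementedError


class Doubler(MiniLayer):
    def forward(self, batch):
        return [2 * v for v in batch]


layer = Doubler()
print(layer([1, 2, 3]))  # [2, 4, 6]: calling the object runs forward()
```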

Training

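Set the number of epochs, epoches. Before an img goes into the model, its shape has to match the 4-D [N, C, H, W] input the conv layers expect. Each training batch is already [16, 3, 150, 150], and forward() also reshapes its input to [-1, 3, 150, 150]; a single image (as in evaluation) needs a batch dimension added with paddle.unsqueeze first.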
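The two shapes involved can be shown with NumPy standing in for Paddle (an assumption so the sketch runs anywhere): `np.expand_dims` plays the role of paddle.unsqueeze for one image, and `np.stack` builds a batch along a new leading axis.

```python
import numpy as np

# One image in CHW layout, as produced by the data-processing step
single = np.zeros((3, 150, 150), dtype=np.float32)

# One image at a time: add a batch axis of size 1 (NumPy analogue of paddle.unsqueeze(img, 0))
one_batch = np.expand_dims(single, 0)
print(one_batch.shape)  # (1, 3, 150, 150)

# 16 images at once: stack them along a new leading axis
batch = np.stack([single] * 16)
print(batch.shape)      # (16, 3, 150, 150)
```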

The rest follows the standard pattern: forward pass, compute the loss, loss.backward(), opt.step(), opt.clear_grad().

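About 10 epochs is enough to reach over 70% accuracy. I use more epochs here because the learning rate is low, which I chose to keep the loss from failing to converge.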
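Why a smaller learning rate helps convergence can be seen on a toy problem. This sketch uses plain gradient descent on f(w) = w², not Adam and not the real network, so it only illustrates the step-size intuition: each update multiplies w by (1 − 2·lr), which shrinks w only when that factor has magnitude below 1.

```python
def gradient_descent(lr, steps=50, w0=1.0):
    """Minimize f(w) = w**2 with plain gradient descent; the gradient is 2*w."""
    w = w0
    for _ in range(steps):
        w -= lr * 2 * w  # each step multiplies w by (1 - 2*lr)
    return w


print(gradient_descent(0.1))  # |1 - 0.2| < 1: w shrinks toward the minimum at 0
print(gradient_descent(1.1))  # |1 - 2.2| > 1: w flips sign every step and blows up
```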

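```python
# Save / load model and optimizer parameters (uncomment as needed)
# model.set_state_dict(paddle.load("linear_net.pdparams"))
# opt.set_state_dict(paddle.load("adam.pdopt"))
# paddle.save(model.state_dict(), "linear_net.pdparams")
# paddle.save(opt.state_dict(), "adam.pdopt")

loss_pic = []    # mean loss of each 50-batch window, for the plot
acc_cnt = 0      # correctly classified training samples, accumulated over all epochs
seen_cnt = 0     # training samples seen so far, accumulated over all epochs
epoches = 30     # number of epochs
log_every = 50   # report every 50 batches

model.train()
for epoch in range(epoches):
    epoch_loss, epoch_batches = 0.0, 0
    losssum, window = 0.0, 0  # reset per epoch so a window never spans two epochs
    for idx, (img, lbl) in enumerate(zip(train_imgs, train_lbls)):
        pred = model(img)         # [batch_size, 2] logits
        loss = losser(pred, lbl)
        loss.backward()
        opt.step()
        opt.clear_grad()

        loss_val = loss.item()    # mean loss of this batch as a Python float
        losssum += loss_val
        window += 1
        epoch_loss += loss_val
        epoch_batches += 1

        # Count correct predictions in this batch
        acc_cnt += int((pred.argmax(axis=1) == lbl).astype('int64').sum())
        seen_cnt += lbl.shape[0]

        if idx > 0 and idx % log_every == 0:
            mean_loss = losssum / window
            print("epoch:[{}/{}], batch:[{}/{}] acc: {:.3f} mean_loss: {:.5f}, epoch_loss:{:.5f}".format(
                epoch + 1, epoches, idx, len(train_imgs), acc_cnt / seen_cnt,
                mean_loss, epoch_loss / epoch_batches))
            loss_pic.append(mean_loss)
            losssum, window = 0.0, 0

# Plot how the loss decreases
plt.figure()
plt.plot(range(len(loss_pic)), loss_pic, 'r')
plt.show()
```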

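Part of the training output from a 10-epoch run is shown below. In this run, acc is the running training accuracy accumulated since the first batch, epoch_loss was divided by the number of samples rather than the number of batches (so it reads about mean_loss / 16), and the first mean_loss of every epoch after the first is inflated because the loss sum from the end of the previous epoch had not been reset: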

```text
epoch:[1/10], batch:[50/281] acc: 0.540 mean_loss: 0.70554, epoch_loss:0.04323
epoch:[1/10], batch:[100/281] acc: 0.530 mean_loss: 0.70643, epoch_loss:0.04369
epoch:[1/10], batch:[150/281] acc: 0.553 mean_loss: 0.66813, epoch_loss:0.04305
epoch:[1/10], batch:[200/281] acc: 0.563 mean_loss: 0.67019, epoch_loss:0.04276
epoch:[1/10], batch:[250/281] acc: 0.577 mean_loss: 0.65668, epoch_loss:0.04242
epoch:[2/10], batch:[50/281] acc: 0.585 mean_loss: 1.06030, epoch_loss:0.04058
epoch:[2/10], batch:[100/281] acc: 0.584 mean_loss: 0.66430, epoch_loss:0.04104
epoch:[2/10], batch:[150/281] acc: 0.592 mean_loss: 0.62010, epoch_loss:0.04029
epoch:[2/10], batch:[200/281] acc: 0.601 mean_loss: 0.60328, epoch_loss:0.03964
epoch:[2/10], batch:[250/281] acc: 0.607 mean_loss: 0.60943, epoch_loss:0.03933
epoch:[3/10], batch:[50/281] acc: 0.615 mean_loss: 0.98787, epoch_loss:0.03771
epoch:[3/10], batch:[100/281] acc: 0.619 mean_loss: 0.60885, epoch_loss:0.03788
epoch:[3/10], batch:[150/281] acc: 0.624 mean_loss: 0.57734, epoch_loss:0.03728
epoch:[3/10], batch:[200/281] acc: 0.631 mean_loss: 0.54101, epoch_loss:0.03642
epoch:[3/10], batch:[250/281] acc: 0.637 mean_loss: 0.54732, epoch_loss:0.03598
epoch:[4/10], batch:[50/281] acc: 0.644 mean_loss: 0.89757, epoch_loss:0.03418
epoch:[4/10], batch:[100/281] acc: 0.647 mean_loss: 0.55520, epoch_loss:0.03444
epoch:[4/10], batch:[150/281] acc: 0.652 mean_loss: 0.52098, epoch_loss:0.03382
epoch:[4/10], batch:[200/281] acc: 0.657 mean_loss: 0.49120, epoch_loss:0.03304
epoch:[4/10], batch:[250/281] acc: 0.663 mean_loss: 0.48655, epoch_loss:0.03252
epoch:[5/10], batch:[50/281] acc: 0.670 mean_loss: 0.80389, epoch_loss:0.03140
epoch:[5/10], batch:[100/281] acc: 0.673 mean_loss: 0.50279, epoch_loss:0.03141
epoch:[5/10], batch:[150/281] acc: 0.677 mean_loss: 0.46048, epoch_loss:0.03054
epoch:[5/10], batch:[200/281] acc: 0.682 mean_loss: 0.43338, epoch_loss:0.02968
epoch:[5/10], batch:[250/281] acc: 0.686 mean_loss: 0.43409, epoch_loss:0.02917
epoch:[6/10], batch:[50/281] acc: 0.692 mean_loss: 0.72375, epoch_loss:0.02844
```
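The mean_loss values near 0.70 at the very start of this log are close to ln 2 ≈ 0.693, which is what a two-class model guessing 50/50 scores under cross-entropy. With its default settings, paddle.nn.loss.CrossEntropyLoss applies softmax to the raw logits and averages the negative log-likelihood over the batch. The pure-Python helper below (`softmax_cross_entropy`, hypothetical and not Paddle's implementation) computes the same quantity for hard integer labels.

```python
import math


def softmax_cross_entropy(logits, labels):
    """Mean softmax cross-entropy over a batch.

    logits: list of per-sample score lists, e.g. [dog_score, cat_score]
    labels: list of integer class indices
    """
    total = 0.0
    for scores, y in zip(logits, labels):
        m = max(scores)  # subtract the max for numerical stability
        log_sum_exp = m + math.log(sum(math.exp(s - m) for s in scores))
        total += log_sum_exp - scores[y]  # equals -log(softmax(scores)[y])
    return total / len(labels)


# Equal scores give each class probability 0.5, so the loss is ln 2
print(softmax_cross_entropy([[0.0, 0.0]], [1]))
print(softmax_cross_entropy([[2.0, -1.0], [0.5, 1.5]], [0, 1]))
```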

The loss curve is shown below:

Model evaluation

For evaluation we can feed the test images in one at a time and compute metrics such as accuracy and loss.
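Accuracy here is simply the fraction of images whose highest-scoring class matches the label. This minimal pure-Python sketch (the `accuracy` helper and the example logits are hypothetical) uses the article's label convention, 0 = dog and 1 = cat.

```python
def accuracy(logits, labels):
    """Fraction of samples whose highest-scoring class equals the label."""
    correct = 0
    for scores, y in zip(logits, labels):
        pred = max(range(len(scores)), key=scores.__getitem__)  # argmax
        correct += int(pred == y)
    return correct / len(labels)


# Label convention from the article: 0 = dog, 1 = cat
logits = [[2.0, -1.0], [0.3, 0.9], [1.2, 0.4], [-0.5, 0.1]]
labels = [0, 1, 1, 1]
print(accuracy(logits, labels))  # 0.75 (the third sample is misclassified)
```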

Evaluation code:

```python
model.eval()
acc_cnt = 0
test_cnt = 0
losssum = 0.0
with paddle.no_grad():  # no gradients are needed for evaluation
    for idx, (img, lbl) in enumerate(test_data):
        img = paddle.unsqueeze(img, 0)  # [3, 150, 150] -> [1, 3, 150, 150]
        pred = model(img)               # [1, 2] logits
        loss = losser(pred, paddle.to_tensor([lbl], dtype='int64'))
        test_cnt += 1
        losssum += loss.item()
        if int(pred.argmax(axis=1)) == lbl:
            acc_cnt += 1
        if idx > 0 and idx % 100 == 0:
            # Show the current image together with its prediction
            img_np = (paddle.squeeze(img, 0).numpy() * 255).astype('uint8')
            plt.figure()
            plt.imshow(Image.fromarray(img_np.transpose(1, 2, 0)))  # CHW -> HWC
            plt.title("(0 dog, 1 cat) pred:{}, lbl:{}".format(int(pred.argmax(axis=1)), lbl))
            plt.show()
            print("batch:[{}/{}] acc: {:.4f} mean_loss: {:.5f}".format(
                idx, len(test_data), acc_cnt / test_cnt, losssum / test_cnt))

print("test acc: {:.4f}, mean_loss: {:.5f}".format(acc_cnt / test_cnt, losssum / test_cnt))
```

The final evaluated accuracy was 71.82%.

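With a large enough dataset and enough training, the main factor limiting accuracy is the model architecture, so reaching higher accuracy requires a more powerful network.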

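I'm a second-year undergraduate who has only just started learning machine learning, so there are many things I don't fully understand and some explanations may not be clear. If anything is poorly written, corrections are welcome!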