I have annotated his block diagram to make it easier to follow: the red circles mark the yolo_block modules, and the dark-red notes give each module's input; please read it alongside the code.
Compared with YOLOv1 and YOLOv2, YOLOv3 is a substantial redesign. Its main improvements are:
**1. It uses a residual network (Residual). A residual convolution first performs a 3X3 convolution and keeps that layer, then performs a 1X1 convolution and another 3X3 convolution, and adds the result back to the saved layer as the final output. Residual networks are easy to optimize and can raise accuracy by adding considerable depth; the skip connections inside the residual blocks alleviate the vanishing-gradient problem that comes with making deep neural networks deeper.
2. It extracts multiple feature maps for object detection, three in total (the pink boxes in the diagram), with shapes (13,13,75), (26,26,75) and (52,52,75). The last dimension is 75 because the diagram is based on the VOC dataset, which has 20 classes; YOLOv3 assigns 3 prior (anchor) boxes to each feature map cell, so the last dimension is 3 × 25, where 25 = 20 class scores + 4 box coordinates + 1 objectness score.
3. It uses an UpSampling2d design (implemented here with a nearest-neighbor resize rather than a true deconvolution). Upsampling performs the reverse spatial operation of a strided convolution in the network's forward and backward passes, letting deeper, coarser features be merged with finer-grained ones.**
# Batch normalization followed by leaky ReLU (the convs carry the L2 regularization)
def _batch_normalization_layer(self, input_layer, name = None, training = True, norm_decay = 0.99, norm_epsilon = 1e-3):
    '''
    Introduction
    ------------
    Apply batch normalization to the feature map produced by a convolution layer
    Parameters
    ----------
    input_layer: input 4-D tensor
    name: name of the batchnorm layer
    training: whether this is the training phase
    norm_decay: decay rate used when computing the moving average for inference
    norm_epsilon: small constant added to the variance to avoid division by zero
    Returns
    -------
    bn_layer: feature map after batch normalization
    '''
    bn_layer = tf.layers.batch_normalization(inputs = input_layer,
        momentum = norm_decay, epsilon = norm_epsilon, center = True,
        scale = True, training = training, name = name)
    return tf.nn.leaky_relu(bn_layer, alpha = 0.1)
# Convolution layer
def _conv2d_layer(self, inputs, filters_num, kernel_size, name, use_bias = False, strides = 1):
    """
    Introduction
    ------------
    Use tf.layers.conv2d to avoid manually initializing the weight and bias
    matrices and adding the bias term after the convolution.
    The convolution is followed by batch norm and finally a leaky ReLU activation.
    The stride controls downsampling: if the stride is 2, the image is downsampled.
    For example, with a 416*416 input, kernel size 3 and stride 2,
    (416 - 3 + 2) / 2 + 1 gives 208, which is equivalent to a pooling step.
    Therefore, when the stride is greater than 1, we first pad the input
    explicitly (one extra pixel on the top/left) instead of using 'same' padding.
    Parameters
    ----------
    inputs: input tensor
    filters_num: number of conv kernels
    strides: conv stride
    name: name of the conv layer
    use_bias: whether to use a bias term
    kernel_size: kernel size
    Returns
    -------
    conv: feature map after the convolution
    """
    conv = tf.layers.conv2d(
        inputs = inputs, filters = filters_num,
        kernel_size = kernel_size, strides = [strides, strides], kernel_initializer = tf.glorot_uniform_initializer(),
        padding = ('SAME' if strides == 1 else 'VALID'), kernel_regularizer = tf.contrib.layers.l2_regularizer(scale = 5e-4), use_bias = use_bias, name = name)
    return conv
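The size arithmetic described in the docstring (pad one pixel on the top/left, then a VALID stride-2 convolution) can be checked with a small sketch; the helper `conv_out_size` is mine, not from the code:

```python
def conv_out_size(n, kernel, stride, pad=0):
    # Spatial output size of a VALID convolution applied after
    # padding the input with `pad` extra pixels along that axis.
    return (n + pad - kernel) // stride + 1

# The stride-2 downsampling conv used in this network:
# pad 1 pixel on the top/left, then a VALID 3x3 conv with stride 2.
for n in [416, 208, 104, 52, 26]:
    print(n, '->', conv_out_size(n, kernel=3, stride=2, pad=1))
# 416 -> 208, 208 -> 104, 104 -> 52, 52 -> 26, 26 -> 13
```

Each stride-2 conv exactly halves the spatial size, which produces the 416 -> 208 -> 104 -> 52 -> 26 -> 13 progression annotated in `_darknet53` below.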
# Residual convolution: perform a 3X3 convolution and keep that layer,
# then perform a 1X1 convolution and another 3X3 convolution, and add
# the result back to the saved layer as the final output
def _Residual_block(self, inputs, filters_num, blocks_num, conv_index, training = True, norm_decay = 0.99, norm_epsilon = 1e-3):
    """
    Introduction
    ------------
    Darknet residual block, similar to the two-layer conv structure of resnet,
    using 1x1 and 3x3 kernels; the 1x1 conv reduces the channel dimension
    Parameters
    ----------
    inputs: input tensor
    filters_num: number of conv kernels
    training: whether this is the training phase
    blocks_num: number of blocks
    conv_index: running index for uniform layer naming, so pretrained weights can be loaded by name
    norm_decay: decay rate used when computing the moving average for inference
    norm_epsilon: small constant added to the variance to avoid division by zero
    Returns
    -------
    layer: result after the residual blocks
    conv_index: updated conv layer counter
    """
    # pad the height and width dimensions of the input feature map
    inputs = tf.pad(inputs, paddings=[[0, 0], [1, 0], [1, 0], [0, 0]], mode='CONSTANT')
    layer = self._conv2d_layer(inputs, filters_num, kernel_size = 3, strides = 2, name = "conv2d_" + str(conv_index))
    layer = self._batch_normalization_layer(layer, name = "batch_normalization_" + str(conv_index), training = training, norm_decay = norm_decay, norm_epsilon = norm_epsilon)
    conv_index += 1
    for _ in range(blocks_num):
        shortcut = layer
        layer = self._conv2d_layer(layer, filters_num // 2, kernel_size = 1, strides = 1, name = "conv2d_" + str(conv_index))
        layer = self._batch_normalization_layer(layer, name = "batch_normalization_" + str(conv_index), training = training, norm_decay = norm_decay, norm_epsilon = norm_epsilon)
        conv_index += 1
        layer = self._conv2d_layer(layer, filters_num, kernel_size = 3, strides = 1, name = "conv2d_" + str(conv_index))
        layer = self._batch_normalization_layer(layer, name = "batch_normalization_" + str(conv_index), training = training, norm_decay = norm_decay, norm_epsilon = norm_epsilon)
        conv_index += 1
        layer += shortcut
    return layer, conv_index
#---------------------------------------#
# Build the _darknet53 backbone (the upsampling layers come later)
#---------------------------------------#
def _darknet53(self, inputs, conv_index, training = True, norm_decay = 0.99, norm_epsilon = 1e-3):
    """
    Introduction
    ------------
    Build the darknet53 network structure used by yolo3
    Parameters
    ----------
    inputs: model input tensor
    conv_index: conv layer counter, so pretrained weights can be loaded by name
    training: whether this is the training phase
    norm_decay: decay rate used when computing the moving average for inference
    norm_epsilon: small constant added to the variance to avoid division by zero
    Returns
    -------
    conv: result after 52 conv layers; for a 416x416x3 input the output shape is 13x13x1024
    route1: output of conv layer 26, 52x52x256, kept for later use
    route2: output of conv layer 43, 26x26x512, kept for later use
    conv_index: conv layer counter, used when loading pretrained weights
    """
    with tf.variable_scope('darknet53'):
        # 416,416,3 -> 416,416,32
        conv = self._conv2d_layer(inputs, filters_num = 32, kernel_size = 3, strides = 1, name = "conv2d_" + str(conv_index))
        conv = self._batch_normalization_layer(conv, name = "batch_normalization_" + str(conv_index), training = training, norm_decay = norm_decay, norm_epsilon = norm_epsilon)
        conv_index += 1
        # 416,416,32 -> 208,208,64
        conv, conv_index = self._Residual_block(conv, conv_index = conv_index, filters_num = 64, blocks_num = 1, training = training, norm_decay = norm_decay, norm_epsilon = norm_epsilon)
        # 208,208,64 -> 104,104,128
        conv, conv_index = self._Residual_block(conv, conv_index = conv_index, filters_num = 128, blocks_num = 2, training = training, norm_decay = norm_decay, norm_epsilon = norm_epsilon)
        # 104,104,128 -> 52,52,256
        conv, conv_index = self._Residual_block(conv, conv_index = conv_index, filters_num = 256, blocks_num = 8, training = training, norm_decay = norm_decay, norm_epsilon = norm_epsilon)
        # route1 = 52,52,256
        route1 = conv
        # 52,52,256 -> 26,26,512
        conv, conv_index = self._Residual_block(conv, conv_index = conv_index, filters_num = 512, blocks_num = 8, training = training, norm_decay = norm_decay, norm_epsilon = norm_epsilon)
        # route2 = 26,26,512
        route2 = conv
        # 26,26,512 -> 13,13,1024
        conv, conv_index = self._Residual_block(conv, conv_index = conv_index, filters_num = 1024, blocks_num = 4, training = training, norm_decay = norm_decay, norm_epsilon = norm_epsilon)
        # conv = 13,13,1024
        return route1, route2, conv, conv_index
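The "52 conv layers" mentioned in the docstring can be verified by counting: one stem convolution, and for each residual stage one stride-2 downsampling convolution plus two convolutions per block:

```python
# Blocks per residual stage in darknet53, matching the calls above.
blocks_num = [1, 2, 8, 8, 4]

# 1 stem conv + per stage: 1 downsampling conv + 2 convs per block.
total_convs = 1 + sum(1 + 2 * b for b in blocks_num)
print(total_convs)  # 52
```

(The full darknet53 classifier has 53 layers; the 53rd is the fully connected layer, which this detection backbone drops.)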
# Returns two results:
# the first, after 5 convolutions (1X1, 3X3, 1X1, 3X3, 1X1), feeds the next upsampling step;
# the second, after 5 + 2 convolutions (1X1, 3X3, 1X1, 3X3, 1X1, 3X3, 1X1), is one output feature map
def _yolo_block(self, inputs, filters_num, out_filters, conv_index, training = True, norm_decay = 0.99, norm_epsilon = 1e-3):
    """
    Introduction
    ------------
    On top of the feature maps extracted by Darknet53, yolo3 adds blocks for the
    3 different-scale feature maps, improving the detection rate for small objects
    Parameters
    ----------
    inputs: input features
    filters_num: number of conv kernels
    out_filters: number of conv kernels in the final output layer
    conv_index: conv layer counter, so pretrained weights can be loaded by name
    training: whether this is the training phase
    norm_decay: decay rate used when computing the moving average for inference
    norm_epsilon: small constant added to the variance to avoid division by zero
    Returns
    -------
    route: output of the layer before the final convolution
    conv: output of the final convolution
    conv_index: conv layer counter
    """
    conv = self._conv2d_layer(inputs, filters_num = filters_num, kernel_size = 1, strides = 1, name = "conv2d_" + str(conv_index))
    conv = self._batch_normalization_layer(conv, name = "batch_normalization_" + str(conv_index), training = training, norm_decay = norm_decay, norm_epsilon = norm_epsilon)
    conv_index += 1
    conv = self._conv2d_layer(conv, filters_num = filters_num * 2, kernel_size = 3, strides = 1, name = "conv2d_" + str(conv_index))
    conv = self._batch_normalization_layer(conv, name = "batch_normalization_" + str(conv_index), training = training, norm_decay = norm_decay, norm_epsilon = norm_epsilon)
    conv_index += 1
    conv = self._conv2d_layer(conv, filters_num = filters_num, kernel_size = 1, strides = 1, name = "conv2d_" + str(conv_index))
    conv = self._batch_normalization_layer(conv, name = "batch_normalization_" + str(conv_index), training = training, norm_decay = norm_decay, norm_epsilon = norm_epsilon)
    conv_index += 1
    conv = self._conv2d_layer(conv, filters_num = filters_num * 2, kernel_size = 3, strides = 1, name = "conv2d_" + str(conv_index))
    conv = self._batch_normalization_layer(conv, name = "batch_normalization_" + str(conv_index), training = training, norm_decay = norm_decay, norm_epsilon = norm_epsilon)
    conv_index += 1
    conv = self._conv2d_layer(conv, filters_num = filters_num, kernel_size = 1, strides = 1, name = "conv2d_" + str(conv_index))
    conv = self._batch_normalization_layer(conv, name = "batch_normalization_" + str(conv_index), training = training, norm_decay = norm_decay, norm_epsilon = norm_epsilon)
    conv_index += 1
    route = conv
    conv = self._conv2d_layer(conv, filters_num = filters_num * 2, kernel_size = 3, strides = 1, name = "conv2d_" + str(conv_index))
    conv = self._batch_normalization_layer(conv, name = "batch_normalization_" + str(conv_index), training = training, norm_decay = norm_decay, norm_epsilon = norm_epsilon)
    conv_index += 1
    # the final prediction conv: no batch norm, uses a bias
    conv = self._conv2d_layer(conv, filters_num = out_filters, kernel_size = 1, strides = 1, name = "conv2d_" + str(conv_index), use_bias = True)
    conv_index += 1
    return route, conv, conv_index
# Returns the three output feature maps
def yolo_inference(self, inputs, num_anchors, num_classes, training = True):
    """
    Introduction
    ------------
    Build the yolo model structure
    Parameters
    ----------
    inputs: model input tensor
    num_anchors: number of anchors each grid cell is responsible for
    num_classes: number of classes
    training: whether this is training mode
    Returns
    -------
    the three detection feature maps [conv2d_59, conv2d_67, conv2d_75]
    """
    conv_index = 1
    # conv2d_26 = 52,52,256, conv2d_43 = 26,26,512, conv = 13,13,1024
    conv2d_26, conv2d_43, conv, conv_index = self._darknet53(inputs, conv_index, training = training, norm_decay = self.norm_decay, norm_epsilon = self.norm_epsilon)
    with tf.variable_scope('yolo'):
        #--------------------------------------#
        # Obtain the first feature map: conv2d_59
        #--------------------------------------#
        # conv2d_57 = 13,13,512, conv2d_59 = 13,13,255 (3 x (80 + 5) for COCO)
        conv2d_57, conv2d_59, conv_index = self._yolo_block(conv, 512, num_anchors * (num_classes + 5), conv_index = conv_index, training = training, norm_decay = self.norm_decay, norm_epsilon = self.norm_epsilon)
        #--------------------------------------#
        # Obtain the second feature map: conv2d_67
        #--------------------------------------#
        conv2d_60 = self._conv2d_layer(conv2d_57, filters_num = 256, kernel_size = 1, strides = 1, name = "conv2d_" + str(conv_index))
        conv2d_60 = self._batch_normalization_layer(conv2d_60, name = "batch_normalization_" + str(conv_index), training = training, norm_decay = self.norm_decay, norm_epsilon = self.norm_epsilon)
        conv_index += 1
        # unSample_0 = 26,26,256
        unSample_0 = tf.image.resize_nearest_neighbor(conv2d_60, [2 * tf.shape(conv2d_60)[1], 2 * tf.shape(conv2d_60)[2]], name='upSample_0')
        # route0 = 26,26,768
        route0 = tf.concat([unSample_0, conv2d_43], axis = -1, name = 'route_0')
        # conv2d_65 = 26,26,256, conv2d_67 = 26,26,255
        conv2d_65, conv2d_67, conv_index = self._yolo_block(route0, 256, num_anchors * (num_classes + 5), conv_index = conv_index, training = training, norm_decay = self.norm_decay, norm_epsilon = self.norm_epsilon)
        #--------------------------------------#
        # Obtain the third feature map: conv2d_75
        #--------------------------------------#
        conv2d_68 = self._conv2d_layer(conv2d_65, filters_num = 128, kernel_size = 1, strides = 1, name = "conv2d_" + str(conv_index))
        conv2d_68 = self._batch_normalization_layer(conv2d_68, name = "batch_normalization_" + str(conv_index), training = training, norm_decay = self.norm_decay, norm_epsilon = self.norm_epsilon)
        conv_index += 1
        # unSample_1 = 52,52,128
        unSample_1 = tf.image.resize_nearest_neighbor(conv2d_68, [2 * tf.shape(conv2d_68)[1], 2 * tf.shape(conv2d_68)[2]], name='upSample_1')
        # route1 = 52,52,384
        route1 = tf.concat([unSample_1, conv2d_26], axis = -1, name = 'route_1')
        # conv2d_75 = 52,52,255
        _, conv2d_75, _ = self._yolo_block(route1, 128, num_anchors * (num_classes + 5), conv_index = conv_index, training = training, norm_decay = self.norm_decay, norm_epsilon = self.norm_epsilon)
        return [conv2d_59, conv2d_67, conv2d_75]
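The channel count of each detection head, num_anchors * (num_classes + 5), ties together the 75 (VOC) and 255 (COCO) numbers that appear in the shape comments; the 5 is the 4 box coordinates plus 1 objectness score:

```python
def head_channels(num_anchors, num_classes):
    # Per anchor: 4 box coordinates + 1 objectness score + class scores.
    return num_anchors * (num_classes + 5)

print(head_channels(3, 20))  # 75  (VOC, 20 classes)
print(head_channels(3, 80))  # 255 (COCO, 80 classes)
```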