共计 12198 个字符,预计需要花费 31 分钟才能阅读完成。
背景
源码克隆于 https://github.com/ultralytic…,刚克隆下来训练本人的数据集有点无从下手,看其 github 源码主页提供的教程也是摸不着头脑,钻研了两天搞定,记录一下,心愿能够帮忙到和我雷同状况的小伙伴,感觉有用的话能够点赞珍藏一下,有谬误评论区斧正!另外介绍下我是 mac 作为本地电脑,在有 4 块 gpu 的服务器上近程训练的。
1. 克隆 yolov3 源码,筹备本人数据集,筹备预训练权重文件
这里应用 voc 数据集,记得解压到你想放的地位
wget -c http://host.robots.ox.ac.uk/pascal/VOC/voc2012/VOCtrainval_11-May-2012.tar
预训练权重文件下载
wget -c https://pjreddie.com/media/files/darknet53.conv.74
我是间接把它解压至我的项目根目录下的,你们能够本人决定门路,然而前面会有些代码局部须要相应批改对应此门路。
我略微解释一下这个 VOC2012 上面几个文件夹,最初两个不必管,其实用不到,labels 文件夹和 lables.npy 文件是我前面跑脚本生成的,你解压的是没有的,临时不论,Annotaions 文件夹是寄存对应图片的 xml 文件,该文件是用标签语言来对图片的阐明,ImageSets 里只关系 Main 这个文件夹,能够先把外面的文件删了,因为前面是要生成本人的 txt 文件的,外面记录的是对应训练集、验证集、测试集、训练验证集的图片名称。
2. 新增一个脚本
从数据集中随机生成训练集、验证集、测试集、训练验证集,并生成对应的 txt 文件,存到 Voc2012/Imagesets/Main 中,如图:
get_train_val_txt.py(脚本代码如下,自行拷贝,并适当批改)
import os
import random
random.seed(0)
xmlfilepath=r'./VOC2012/Annotations' # 改成你本人的 Annotations 门路
saveBasePath=r"./VOC2012/ImageSets/Main/" # 改成你本人的 ImageSets/Main 门路
# 值能够本人取,决定你各个数据集的占比数
trainval_percent=.8
train_percent=.7
temp_xml = os.listdir(xmlfilepath)
total_xml = []
for xml in temp_xml:
if xml.endswith(".xml"):
total_xml.append(xml)
num=len(total_xml)
list=range(num)
tv=int(num*trainval_percent)
tr=int(tv*train_percent)
trainval= random.sample(list,tv)
train=random.sample(trainval,tr)
ftrainval = open(os.path.join(saveBasePath,'trainval.txt'), 'w')
ftest = open(os.path.join(saveBasePath,'test.txt'), 'w')
ftrain = open(os.path.join(saveBasePath,'train.txt'), 'w')
fval = open(os.path.join(saveBasePath,'val.txt'), 'w')
for i in list:
name=total_xml[i][:-4]+'\n'
if i in trainval:
ftrainval.write(name)
if i in train:
ftrain.write(name)
else:
fval.write(name)
else:
ftest.write(name)
ftrainval.close()
ftrain.close()
fval.close()
ftest .close()
3. 在 data 文件夹里新建两个文件
新建 voc2012.data 文件,外面配置如下
classes=20 # 改成你数据集总共的类别
train=data/train.txt
valid=data/val.txt
names=data/voc2012.names
backup=backup/
新建 voc2012.names 文件,因为我是 voc 数据集,所以配置如下,你须要改成你本人数据集的类别名字
person
bird
cat
cow
dog
horse
sheep
aeroplane
bicycle
boat
bus
car
motorbike
train
bottle
chair
diningtable
pottedplant
sofa
tvmonitor
4. 新建 voc_label.py 脚本
# -*- coding: utf-8 -*-
"""
须要批改的中央:1. sets 中替换为本人的数据集
2. classes 中替换为本人的类别
"""
import xml.etree.ElementTree as ET
import pickle
import os
from os import listdir, getcwd
from os.path import join
sets = [('2012','train'), ('2012','test'),('2012','val')] #替换为本人的数据集, 格局[(year,image_id)]
classes = ["aeroplane", "bicycle", "bird", "boat", "bottle", "bus", "car", "cat", "chair", "cow", "diningtable", "dog", "horse", "motorbike", "person", "pottedplant", "sheep", "sofa", "train", "tvmonitor"] #批改为本人的类别名称
# 转化为 yolov3 的数据格式提供的办法
def convert(size, box):
dw = 1./(size[0])
dh = 1./(size[1])
x = (box[0] + box[1])/2.0 - 1
y = (box[2] + box[3])/2.0 - 1
w = box[1] - box[0]
h = box[3] - box[2]
x = x*dw
w = w*dw
y = y*dh
h = h*dh
return (x,y,w,h)
# 依据 annotation 的文件转换成 yolov3 数据格式
def convert_annotation(year, image_id):
in_file = open('VOC%s/Annotations/%s.xml'%(year, image_id)) # 改成本人 annotation 的 xml 文件地位
out_file = open('VOC%s/labels/%s.txt'%(year, image_id), 'w') # 存生成 labels 文件夹的门路,依据本人状况批改,我是打算存在./VOC2012/labels
tree=ET.parse(in_file)
root = tree.getroot()
size = root.find('size')
w = int(size.find('width').text)
h = int(size.find('height').text)
for obj in root.iter('object'):
difficult = obj.find('difficult').text
cls = obj.find('name').text
if cls not in classes or int(difficult)==1:
continue
cls_id = classes.index(cls)
xmlbox = obj.find('bndbox')
b = (float(xmlbox.find('xmin').text), float(xmlbox.find('xmax').text), float(xmlbox.find('ymin').text), float(xmlbox.find('ymax').text))
bb = convert((w,h), b)
out_file.write(str(cls_id) + "" +" ".join([str(a) for a in bb]) +'\n')
wd = getcwd()
# 上面代码门路要相应你本人的数据集寄存门路、labels 寄存门路去批改
for year,image_set in sets:
if not os.path.exists('VOC%s/labels/'%(year)):
os.makedirs('VOC%s/labels/'%(year))
image_ids = open('VOC%s/ImageSets/Main/%s.txt'%(year, image_set)).read().strip().split()
list_file = open('data/%s.txt'%(image_set), 'w')
for image_id in image_ids:
list_file.write('VOC%s/JPEGImages/%s.jpg\n'%(year, image_id))
convert_annotation(year, image_id)
list_file.close()
运行该脚本,就会在我设定的 VOC2012/lables 下生成各个图片对应的 txt 文件
轻易关上一个文件,内容大抵如下
这里的内容其实就是 yolov3 意识的 labels 标示格局:类别所在的行数(对应 voc2012.names 里的行数,从 0 开始算)归一化 x 坐标 归一化 y 坐标 归一化宽 归一化高
5. 阅览一下 train.py 文件和 datasets.py 文件,留神一下是否有些门路处要批改
我就是 labels 所在的中央要批改,没批改前始终报错 labels 找不到,这里说一下我本人的批改处
我本人把 tian.py 的 attempt_download(weights)
代码正文了,因为我曾经下好了预训练权重文件,你们能够本人在 train.py 里搜寻这个函数,也给其正文
另外我的 datasets.py 文件 labels 文件门路批改如下:
因为我的 labels 里的 txt 文件门路是在 VOC2012/labels/ 下的,我的图片文件是在 VOC2012/JPEGImages/ 下的,所以应该是把 JPEGImages 替换成 labels
6. 批改网络配置文件 yolov3.cfg
次要批改这几个中央:
如果你有 gpu,就像我一样批改
batch=64
subdivisions=16
没有就都是 1 这两项
在在该文件搜寻 classes,一共有三处,数值都批改成你数据集里的类别总数,并把 classes 所在层的上一层的 [convolutional] 网络的 filters 批改,批改值按此公式计算:$(classes+5)*3$,我把本人的 cfg 文件贴在上面
[net]
# Testing
#batch=1
#subdivisions=1
# Training
batch=64
subdivisions=16
width=608
height=608
channels=3
momentum=0.9
decay=0.0005
angle=0
saturation = 1.5
exposure = 1.5
hue=.1
learning_rate=0.001
burn_in=1000
max_batches = 500200
policy=steps
steps=400000,450000
scales=.1,.1
[convolutional]
batch_normalize=1
filters=32
size=3
stride=1
pad=1
activation=leaky
# Downsample
[convolutional]
batch_normalize=1
filters=64
size=3
stride=2
pad=1
activation=leaky
[convolutional]
batch_normalize=1
filters=32
size=1
stride=1
pad=1
activation=leaky
[convolutional]
batch_normalize=1
filters=64
size=3
stride=1
pad=1
activation=leaky
[shortcut]
from=-3
activation=linear
# Downsample
[convolutional]
batch_normalize=1
filters=128
size=3
stride=2
pad=1
activation=leaky
[convolutional]
batch_normalize=1
filters=64
size=1
stride=1
pad=1
activation=leaky
[convolutional]
batch_normalize=1
filters=128
size=3
stride=1
pad=1
activation=leaky
[shortcut]
from=-3
activation=linear
[convolutional]
batch_normalize=1
filters=64
size=1
stride=1
pad=1
activation=leaky
[convolutional]
batch_normalize=1
filters=128
size=3
stride=1
pad=1
activation=leaky
[shortcut]
from=-3
activation=linear
# Downsample
[convolutional]
batch_normalize=1
filters=256
size=3
stride=2
pad=1
activation=leaky
[convolutional]
batch_normalize=1
filters=128
size=1
stride=1
pad=1
activation=leaky
[convolutional]
batch_normalize=1
filters=256
size=3
stride=1
pad=1
activation=leaky
[shortcut]
from=-3
activation=linear
[convolutional]
batch_normalize=1
filters=128
size=1
stride=1
pad=1
activation=leaky
[convolutional]
batch_normalize=1
filters=256
size=3
stride=1
pad=1
activation=leaky
[shortcut]
from=-3
activation=linear
[convolutional]
batch_normalize=1
filters=128
size=1
stride=1
pad=1
activation=leaky
[convolutional]
batch_normalize=1
filters=256
size=3
stride=1
pad=1
activation=leaky
[shortcut]
from=-3
activation=linear
[convolutional]
batch_normalize=1
filters=128
size=1
stride=1
pad=1
activation=leaky
[convolutional]
batch_normalize=1
filters=256
size=3
stride=1
pad=1
activation=leaky
[shortcut]
from=-3
activation=linear
[convolutional]
batch_normalize=1
filters=128
size=1
stride=1
pad=1
activation=leaky
[convolutional]
batch_normalize=1
filters=256
size=3
stride=1
pad=1
activation=leaky
[shortcut]
from=-3
activation=linear
[convolutional]
batch_normalize=1
filters=128
size=1
stride=1
pad=1
activation=leaky
[convolutional]
batch_normalize=1
filters=256
size=3
stride=1
pad=1
activation=leaky
[shortcut]
from=-3
activation=linear
[convolutional]
batch_normalize=1
filters=128
size=1
stride=1
pad=1
activation=leaky
[convolutional]
batch_normalize=1
filters=256
size=3
stride=1
pad=1
activation=leaky
[shortcut]
from=-3
activation=linear
[convolutional]
batch_normalize=1
filters=128
size=1
stride=1
pad=1
activation=leaky
[convolutional]
batch_normalize=1
filters=256
size=3
stride=1
pad=1
activation=leaky
[shortcut]
from=-3
activation=linear
# Downsample
[convolutional]
batch_normalize=1
filters=512
size=3
stride=2
pad=1
activation=leaky
[convolutional]
batch_normalize=1
filters=256
size=1
stride=1
pad=1
activation=leaky
[convolutional]
batch_normalize=1
filters=512
size=3
stride=1
pad=1
activation=leaky
[shortcut]
from=-3
activation=linear
[convolutional]
batch_normalize=1
filters=256
size=1
stride=1
pad=1
activation=leaky
[convolutional]
batch_normalize=1
filters=512
size=3
stride=1
pad=1
activation=leaky
[shortcut]
from=-3
activation=linear
[convolutional]
batch_normalize=1
filters=256
size=1
stride=1
pad=1
activation=leaky
[convolutional]
batch_normalize=1
filters=512
size=3
stride=1
pad=1
activation=leaky
[shortcut]
from=-3
activation=linear
[convolutional]
batch_normalize=1
filters=256
size=1
stride=1
pad=1
activation=leaky
[convolutional]
batch_normalize=1
filters=512
size=3
stride=1
pad=1
activation=leaky
[shortcut]
from=-3
activation=linear
[convolutional]
batch_normalize=1
filters=256
size=1
stride=1
pad=1
activation=leaky
[convolutional]
batch_normalize=1
filters=512
size=3
stride=1
pad=1
activation=leaky
[shortcut]
from=-3
activation=linear
[convolutional]
batch_normalize=1
filters=256
size=1
stride=1
pad=1
activation=leaky
[convolutional]
batch_normalize=1
filters=512
size=3
stride=1
pad=1
activation=leaky
[shortcut]
from=-3
activation=linear
[convolutional]
batch_normalize=1
filters=256
size=1
stride=1
pad=1
activation=leaky
[convolutional]
batch_normalize=1
filters=512
size=3
stride=1
pad=1
activation=leaky
[shortcut]
from=-3
activation=linear
[convolutional]
batch_normalize=1
filters=256
size=1
stride=1
pad=1
activation=leaky
[convolutional]
batch_normalize=1
filters=512
size=3
stride=1
pad=1
activation=leaky
[shortcut]
from=-3
activation=linear
# Downsample
[convolutional]
batch_normalize=1
filters=1024
size=3
stride=2
pad=1
activation=leaky
[convolutional]
batch_normalize=1
filters=512
size=1
stride=1
pad=1
activation=leaky
[convolutional]
batch_normalize=1
filters=1024
size=3
stride=1
pad=1
activation=leaky
[shortcut]
from=-3
activation=linear
[convolutional]
batch_normalize=1
filters=512
size=1
stride=1
pad=1
activation=leaky
[convolutional]
batch_normalize=1
filters=1024
size=3
stride=1
pad=1
activation=leaky
[shortcut]
from=-3
activation=linear
[convolutional]
batch_normalize=1
filters=512
size=1
stride=1
pad=1
activation=leaky
[convolutional]
batch_normalize=1
filters=1024
size=3
stride=1
pad=1
activation=leaky
[shortcut]
from=-3
activation=linear
[convolutional]
batch_normalize=1
filters=512
size=1
stride=1
pad=1
activation=leaky
[convolutional]
batch_normalize=1
filters=1024
size=3
stride=1
pad=1
activation=leaky
[shortcut]
from=-3
activation=linear
######################
[convolutional]
batch_normalize=1
filters=512
size=1
stride=1
pad=1
activation=leaky
[convolutional]
batch_normalize=1
size=3
stride=1
pad=1
filters=1024
activation=leaky
[convolutional]
batch_normalize=1
filters=512
size=1
stride=1
pad=1
activation=leaky
[convolutional]
batch_normalize=1
size=3
stride=1
pad=1
filters=1024
activation=leaky
[convolutional]
batch_normalize=1
filters=512
size=1
stride=1
pad=1
activation=leaky
[convolutional]
batch_normalize=1
size=3
stride=1
pad=1
filters=1024
activation=leaky
[convolutional]
size=1
stride=1
pad=1
filters=75
activation=linear
[yolo]
mask = 6,7,8
anchors = 10,13, 16,30, 33,23, 30,61, 62,45, 59,119, 116,90, 156,198, 373,326
classes=20
num=9
jitter=.3
ignore_thresh = .7
truth_thresh = 1
random=1
[route]
layers = -4
[convolutional]
batch_normalize=1
filters=256
size=1
stride=1
pad=1
activation=leaky
[upsample]
stride=2
[route]
layers = -1, 61
[convolutional]
batch_normalize=1
filters=256
size=1
stride=1
pad=1
activation=leaky
[convolutional]
batch_normalize=1
size=3
stride=1
pad=1
filters=512
activation=leaky
[convolutional]
batch_normalize=1
filters=256
size=1
stride=1
pad=1
activation=leaky
[convolutional]
batch_normalize=1
size=3
stride=1
pad=1
filters=512
activation=leaky
[convolutional]
batch_normalize=1
filters=256
size=1
stride=1
pad=1
activation=leaky
[convolutional]
batch_normalize=1
size=3
stride=1
pad=1
filters=512
activation=leaky
[convolutional]
size=1
stride=1
pad=1
filters=75
activation=linear
[yolo]
mask = 3,4,5
anchors = 10,13, 16,30, 33,23, 30,61, 62,45, 59,119, 116,90, 156,198, 373,326
classes=20
num=9
jitter=.3
ignore_thresh = .7
truth_thresh = 1
random=1
[route]
layers = -4
[convolutional]
batch_normalize=1
filters=128
size=1
stride=1
pad=1
activation=leaky
[upsample]
stride=2
[route]
layers = -1, 36
[convolutional]
batch_normalize=1
filters=128
size=1
stride=1
pad=1
activation=leaky
[convolutional]
batch_normalize=1
size=3
stride=1
pad=1
filters=256
activation=leaky
[convolutional]
batch_normalize=1
filters=128
size=1
stride=1
pad=1
activation=leaky
[convolutional]
batch_normalize=1
size=3
stride=1
pad=1
filters=256
activation=leaky
[convolutional]
batch_normalize=1
filters=128
size=1
stride=1
pad=1
activation=leaky
[convolutional]
batch_normalize=1
size=3
stride=1
pad=1
filters=256
activation=leaky
[convolutional]
size=1
stride=1
pad=1
filters=75
activation=linear
[yolo]
mask = 0,1,2
anchors = 10,13, 16,30, 33,23, 30,61, 62,45, 59,119, 116,90, 156,198, 373,326
classes=20
num=9
jitter=.3
ignore_thresh = .7
truth_thresh = 1
random=1
7. 进到 yolov3 文件夹,就是你本人的我的项目根文件夹下,运行脚本
留神运行命令中的文件门路,请依据本人的理论状况批改,–epochs 迭代次数你们可自行批改
python3 train.py --data ./data/voc2012.data --cfg ./cfg/yolov3.cfg --epochs 3 --weights ./weights/darknet53.conv.74
搞定!训练后果如图,我执行了三次迭代,为了示例快点有后果,另外训练好模型的权重文件存在./weights/best.pt 里,如果你也是用近程服务器训练的,想在本地浏览器查看 tensorboard 监控面板,能够看我的另一篇博文
https://segmentfault.com/a/11…。