Background

The source was cloned from https://github.com/ultralytic... . Right after cloning I had no idea how to train it on my own dataset, and the tutorial on the project's GitHub page didn't help much either. After two days of digging I got it working, so I'm writing it down in the hope that it helps others in the same situation. If you find it useful, please like and bookmark; if you spot a mistake, point it out in the comments. For context: my local machine is a Mac, and I trained remotely on a server with 4 GPUs.
1. Clone the yolov3 source, prepare your dataset, and prepare the pretrained weights
This walkthrough uses the VOC dataset; remember to extract it to wherever you want it:

wget -c http://host.robots.ox.ac.uk/pascal/VOC/voc2012/VOCtrainval_11-May-2012.tar

Download the pretrained weights:

wget -c https://pjreddie.com/media/files/darknet53.conv.74

I extracted it directly into the project root. You can choose your own path, but some code later on will need to be changed to match it.

A quick tour of the folders under VOC2012. The last two can be ignored; they are never used. The labels folder and the labels.npy file were generated by scripts I ran later, so they won't exist right after extraction; don't worry about them for now. The Annotations folder holds one XML file per image, describing that image in markup. Inside ImageSets, only the Main folder matters; you can delete the files in it first, because we will generate our own txt files there, listing the image names for the train, val, test, and trainval splits.
2. Add a script
It randomly splits the dataset into train, val, test, and trainval sets and writes the corresponding txt files into VOC2012/ImageSets/Main, as shown:

get_train_val_txt.py (copy the script below and adjust it as needed)

import os
import random

random.seed(0)
xmlfilepath = r'./VOC2012/Annotations'        # change to your own Annotations path
saveBasePath = r"./VOC2012/ImageSets/Main/"   # change to your own ImageSets/Main path

# tune these to decide the size of each split
trainval_percent = .8
train_percent = .7

temp_xml = os.listdir(xmlfilepath)
total_xml = []
for xml in temp_xml:
    if xml.endswith(".xml"):
        total_xml.append(xml)

num = len(total_xml)
indices = range(num)
tv = int(num * trainval_percent)
tr = int(tv * train_percent)
trainval = random.sample(indices, tv)
train = random.sample(trainval, tr)

ftrainval = open(os.path.join(saveBasePath, 'trainval.txt'), 'w')
ftest = open(os.path.join(saveBasePath, 'test.txt'), 'w')
ftrain = open(os.path.join(saveBasePath, 'train.txt'), 'w')
fval = open(os.path.join(saveBasePath, 'val.txt'), 'w')

for i in indices:
    name = total_xml[i][:-4] + '\n'
    if i in trainval:
        ftrainval.write(name)
        if i in train:
            ftrain.write(name)
        else:
            fval.write(name)
    else:
        ftest.write(name)

ftrainval.close()
ftrain.close()
fval.close()
ftest.close()
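As a sanity check on the ratios above: with, say, 1000 annotation files (a made-up count for illustration), the split sizes work out as follows, using the same arithmetic as the script:

```python
num = 1000                        # suppose the script found 1000 .xml files
trainval_percent = .8
train_percent = .7

tv = int(num * trainval_percent)  # size of the trainval split
tr = int(tv * train_percent)      # size of the train split

print(tv, tr, num - tv, tv - tr)  # -> 800 560 200 240
```

So 80% of the images go to trainval, 70% of those to train, the remaining 30% of trainval to val, and the other 20% of all images to test.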

3. Create two files in the data folder
Create a voc2012.data file with the following contents:

# classes: set to the total number of classes in your dataset
classes=20
train=data/train.txt
valid=data/val.txt
names=data/voc2012.names
backup=backup/

Create a voc2012.names file. Since I'm using the VOC dataset, mine looks like this; replace it with your own class names:

person
bird
cat
cow
dog
horse
sheep
aeroplane
bicycle
boat
bus
car
motorbike
train
bottle
chair
diningtable
pottedplant
sofa
tvmonitor

4. Create the voc_label.py script

# -*- coding: utf-8 -*-
"""
Things to change:
1. replace sets with your own dataset
2. replace classes with your own class names
"""
import xml.etree.ElementTree as ET
import pickle
import os
from os import listdir, getcwd
from os.path import join

sets = [('2012', 'train'), ('2012', 'test'), ('2012', 'val')]  # replace with your own dataset, format [(year, image_set)]
classes = ["aeroplane", "bicycle", "bird", "boat", "bottle", "bus", "car", "cat", "chair", "cow", "diningtable", "dog", "horse", "motorbike", "person", "pottedplant", "sheep", "sofa", "train", "tvmonitor"]  # replace with your own class names

# converts a VOC box to the yolov3 label format
def convert(size, box):
    dw = 1./(size[0])
    dh = 1./(size[1])
    x = (box[0] + box[1])/2.0 - 1
    y = (box[2] + box[3])/2.0 - 1
    w = box[1] - box[0]
    h = box[3] - box[2]
    x = x*dw
    w = w*dw
    y = y*dh
    h = h*dh
    return (x, y, w, h)

# converts one annotation XML file into a yolov3 label file
def convert_annotation(year, image_id):
    in_file = open('VOC%s/Annotations/%s.xml' % (year, image_id))   # change to where your annotation XML files live
    out_file = open('VOC%s/labels/%s.txt' % (year, image_id), 'w')  # where the label files go; adjust to taste, I store mine under ./VOC2012/labels
    tree = ET.parse(in_file)
    root = tree.getroot()
    size = root.find('size')
    w = int(size.find('width').text)
    h = int(size.find('height').text)
    for obj in root.iter('object'):
        difficult = obj.find('difficult').text
        cls = obj.find('name').text
        if cls not in classes or int(difficult) == 1:
            continue
        cls_id = classes.index(cls)
        xmlbox = obj.find('bndbox')
        b = (float(xmlbox.find('xmin').text), float(xmlbox.find('xmax').text), float(xmlbox.find('ymin').text), float(xmlbox.find('ymax').text))
        bb = convert((w, h), b)
        out_file.write(str(cls_id) + " " + " ".join([str(a) for a in bb]) + '\n')

wd = getcwd()

# adjust the paths below to match where your dataset and labels live
for year, image_set in sets:
    if not os.path.exists('VOC%s/labels/' % (year)):
        os.makedirs('VOC%s/labels/' % (year))
    image_ids = open('VOC%s/ImageSets/Main/%s.txt' % (year, image_set)).read().strip().split()
    list_file = open('data/%s.txt' % (image_set), 'w')
    for image_id in image_ids:
        list_file.write('VOC%s/JPEGImages/%s.jpg\n' % (year, image_id))
        convert_annotation(year, image_id)
    list_file.close()

Run the script and it will generate one txt file per image under my chosen VOC2012/labels directory.

Open any one of them and the contents look roughly like this:

Each line is a label in the format yolov3 expects: class index (the line number of the class in voc2012.names, counting from 0), then the normalized x center, normalized y center, normalized width, and normalized height.
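To make the format concrete, here is the convert function from voc_label.py applied to a box; the image size and corner coordinates below are made up for illustration:

```python
def convert(size, box):
    # size is (width, height); box is (xmin, xmax, ymin, ymax)
    dw = 1. / size[0]
    dh = 1. / size[1]
    x = (box[0] + box[1]) / 2.0 - 1   # box center, with darknet's off-by-one shift
    y = (box[2] + box[3]) / 2.0 - 1
    w = box[1] - box[0]
    h = box[3] - box[2]
    return (x * dw, y * dh, w * dw, h * dh)

# a 500x375 image with a box spanning x in [48, 371] and y in [240, 330]
x, y, w, h = convert((500, 375), (48, 371, 240, 330))
print(x, y, w, h)   # roughly 0.417, 0.7573, 0.646, 0.24
```

If that object were a dog (class index 11 in voc2012.names above), its label line would start with 11 followed by those four numbers.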
5. Skim train.py and datasets.py and check whether any paths need changing
For me it was the labels location; until I changed it I kept getting a "labels not found" error. Here is what I modified.
I commented out the attempt_download(weights) call in train.py, since I had already downloaded the pretrained weights; search for that function in train.py and comment it out too.
I also changed the labels path in datasets.py as follows:

Since my label txt files live under VOC2012/labels/ and my image files under VOC2012/JPEGImages/, the fix is to replace JPEGImages with labels in the path.
6. Edit the network config file yolov3.cfg
The main changes are these.
If you have a GPU, set them as I did:

batch=64
subdivisions=16

If you don't, set both of these to 1.
Then search the file for classes; it appears three times. Change each occurrence to the number of classes in your dataset, and also change filters in the [convolutional] layer directly above each classes line, computed as $(classes+5)*3$; with 20 classes that gives $(20+5)*3=75$. The relevant parts of my cfg file are shown below.

[net]
# Testing
#batch=1
#subdivisions=1
# Training
batch=64
subdivisions=16
width=608
height=608
channels=3
momentum=0.9
decay=0.0005
angle=0
saturation = 1.5
exposure = 1.5
hue=.1
learning_rate=0.001
burn_in=1000
max_batches = 500200
policy=steps
steps=400000,450000
scales=.1,.1

The backbone that follows is identical to the stock yolov3.cfg, so I only reproduce the parts I changed. Each of the three detection heads ends with a 1x1 [convolutional] layer followed by a [yolo] layer; in all three I set filters=75 and classes=20:

[convolutional]
size=1
stride=1
pad=1
filters=75
activation=linear

[yolo]
mask = 6,7,8
anchors = 10,13,  16,30,  33,23,  30,61,  62,45,  59,119,  116,90,  156,198,  373,326
classes=20
num=9
jitter=.3
ignore_thresh = .7
truth_thresh = 1
random=1

The second head is identical except mask = 3,4,5, and the third except mask = 0,1,2.
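As a quick check of the filters formula, assuming yolov3's usual 3 anchors per detection scale:

```python
def yolo_filters(classes, anchors_per_scale=3):
    # each anchor predicts 4 box coordinates + 1 objectness score + one score per class
    return (classes + 5) * anchors_per_scale

for n in (1, 20, 80):
    print(n, yolo_filters(n))   # -> 1 18, 20 75, 80 255
```

So 20 classes gives filters=75, and the stock COCO config (80 classes) uses 255.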

7. cd into the yolov3 folder, i.e. your project root, and run the training script
Mind the file paths in the command and adjust them to your own setup; you can also change the number of epochs via --epochs.

python3 train.py --data ./data/voc2012.data --cfg ./cfg/yolov3.cfg --epochs 3 --weights ./weights/darknet53.conv.74

Done! The training output is shown above; I ran only three epochs so the example would finish quickly. The trained model's weights are saved to ./weights/best.pt. If you are also training on a remote server and want to view the TensorBoard dashboard in a local browser, see my other post:
https://segmentfault.com/a/11...