
An Introduction to the R-CNN Object Detection Algorithm

Author: 高雨茁

Introduction to Object Detection

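The task of object detection is to find all objects of interest in an image and determine their classes and locations.
Computer vision has four main kinds of image recognition tasks:
1. Classification: answers "what is it?" Given an image or a video clip, determine which classes of objects it contains.
2. Localization: answers "where is it?" Locate the position of the object.
3. Detection: answers "what is it, and where is it?" Locate the object and also identify what it is.
4. Segmentation: divided into instance-level and scene-level segmentation; answers "which object or scene does each pixel belong to?"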

Current Categories of Object Detection Algorithms

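1. Two-stage object detection algorithms
First generate region proposals (RP), i.e. candidate boxes that may contain the object to be detected, then classify the samples with a convolutional neural network.
Pipeline: feature extraction -> RP generation -> classification / localization regression.
Common two-stage object detection algorithms include R-CNN, SPP-Net, Fast R-CNN, Faster R-CNN, and R-FCN.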

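2. One-stage object detection algorithms
No RP is used; features are extracted directly in the network to predict object classes and locations.
Pipeline: feature extraction -> classification / localization regression.
Common one-stage object detection algorithms include OverFeat, YOLOv1, YOLOv2, YOLOv3, SSD, and RetinaNet.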

The rest of this article introduces one of the classic algorithms, R-CNN, and gives a corresponding code implementation.

R-CNN

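R-CNN (Regions with CNN features) is a milestone in applying CNNs to object detection. It relies on the strong feature extraction and classification performance of CNNs, and uses region proposals to recast the object detection problem.
The algorithm has four steps: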

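  1. Generate candidate regions (RoI proposals) from the original image
  2. Feed the candidate regions into a CNN for feature extraction
  3. Send the features to the SVM detector of each class to decide whether the region belongs to that class
  4. Obtain a precise object region through bounding-box regression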

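The forward pass of the algorithm is shown in the figure below (the numbers in the figure correspond to the four steps above):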

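Below we walk through model construction in the order of these four steps, and afterwards explain how to train the model.
Before getting into the details, let's take a brief look at the datasets used for training.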

Datasets

The datasets used in the original paper are:
1. ImageNet ILSVRC (a larger recognition dataset): about 1.2 million training images, 1000 classes.
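2. PASCAL VOC 2007 (a smaller detection dataset): about ten thousand images, 20 classes.
During training, the recognition dataset is used for pre-training; the detection dataset is then used to fine-tune the parameters, and the model is evaluated on the detection dataset.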

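Because the original datasets are large, training the model could take dozens of hours. To simplify training, we replaced the training datasets.
As in the original paper, our data consists of two parts:
1. Flower images in 17 classes
2. Flower images in 2 classes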

We will pre-train the model on the 17-class data, fine-tune it on the 2-class data to obtain the final prediction model, and evaluate it on the 2-class images.

Model Construction

Step 1

The part of the algorithm we implement in this step is marked by the numbers in the figure below:

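R-CNN uses the selective search algorithm for region proposal. The algorithm first initializes the original regions with a graph-based image segmentation method, that is, it splits the image into many small blocks. It then applies a greedy strategy: compute the similarity of every pair of adjacent regions and repeatedly merge the two most similar ones, until only a single block covering the whole image remains. Every image block produced along the way, including the merged ones, is kept as the final set of RoIs (Regions of Interest). The detailed procedure is as follows: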

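Region merging uses a diversification strategy. With a single strategy it is easy to wrongly merge dissimilar regions; for example, if only texture is considered, regions of different colors are easily merged by mistake. Selective search uses three diversification strategies to increase the number of candidate regions and ensure recall:

  • Multiple color spaces: RGB, grayscale, HSV, and their variants
  • Multiple similarity measures: color similarity as well as texture, size, and overlap
  • Varying the threshold used to initialize the original regions; the larger the threshold, the fewer regions the segmentation produces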

Many machine learning frameworks ship a built-in implementation of selective search.
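The greedy merging loop can be illustrated on toy data. The sketch below is not the real selective search: it assumes each region is just a bounding box plus a mean gray value, uses the negative difference of mean values as the similarity, and treats two regions as adjacent when their boxes touch or overlap. The names `Region`, `adjacent`, `similarity`, `merge`, and `hierarchical_grouping` are hypothetical and exist only for this illustration.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Region:
    x0: int      # left edge (inclusive)
    y0: int      # top edge (inclusive)
    x1: int      # right edge (exclusive)
    y1: int      # bottom edge (exclusive)
    size: int    # number of pixels in the region
    mean: float  # mean gray value, a stand-in for real color/texture features


def adjacent(a, b):
    # Toy adjacency: the two boxes touch or overlap.
    return a.x0 <= b.x1 and b.x0 <= a.x1 and a.y0 <= b.y1 and b.y0 <= a.y1


def similarity(a, b):
    # Toy similarity: the closer the mean values, the higher the score.
    return -abs(a.mean - b.mean)


def merge(a, b):
    # The merged region covers both boxes; its mean is the size-weighted mean.
    size = a.size + b.size
    mean = (a.mean * a.size + b.mean * b.size) / size
    return Region(min(a.x0, b.x0), min(a.y0, b.y0),
                  max(a.x1, b.x1), max(a.y1, b.y1), size, mean)


def hierarchical_grouping(initial):
    regions = list(initial)
    proposals = list(initial)  # every region ever seen becomes a proposal
    while len(regions) > 1:
        pairs = [(a, b) for a, b in combinations(regions, 2) if adjacent(a, b)]
        if not pairs:
            break  # nothing left that can be merged
        a, b = max(pairs, key=lambda p: similarity(*p))
        regions.remove(a)
        regions.remove(b)
        merged = merge(a, b)
        regions.append(merged)
        proposals.append(merged)
    return proposals


# Three blocks side by side: two dark ones and a bright one.
blocks = [Region(0, 0, 5, 10, 50, 10.0),
          Region(5, 0, 10, 10, 50, 12.0),
          Region(10, 0, 15, 10, 50, 200.0)]
for p in hierarchical_grouping(blocks):
    print((p.x0, p.y0, p.x1, p.y1), p.size)
```

The two dark blocks are merged first because they are the most similar adjacent pair, and the bright block only joins in the final merge. All intermediate boxes are kept, which is why selective search yields proposals at several scales.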

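Step 2

The part of the algorithm we implement in this step is marked by the numbers in the figure below:

Step 1 gave us the region proposals generated by selective search, but the proposals generally differ in size. Since the region proposals will later be fed into the ConvNet for feature extraction, they all need to be resized to a uniform standard size that matches the ConvNet architecture. The code is as follows: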

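import cv2
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.patches as mpatches
import skimage.io
# selective_search is assumed to come from the `selectivesearch` package
from selectivesearch import selective_search

# Clip Image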
def clip_pic(img, rect):
    x = rect[0]
    y = rect[1]
    w = rect[2]
    h = rect[3]
    x_1 = x + w
    y_1 = y + h
    # return img[x:x_1, y:y_1, :], [x, y, x_1, y_1, w, h]   
    return img[y:y_1, x:x_1, :], [x, y, x_1, y_1, w, h]

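# Resize Image
def resize_image(in_image, new_width, new_height, out_image=None, resize_mode=cv2.INTER_CUBIC):
    # pass the mode by keyword: the third positional argument of cv2.resize is dst, not interpolation
    img = cv2.resize(in_image, (new_width, new_height), interpolation=resize_mode)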
    if out_image:
        cv2.imwrite(out_image, img)
    return img

def image_proposal(img_path):
    img = cv2.imread(img_path)
    img_lbl, regions = selective_search(img, scale=500, sigma=0.9, min_size=10)
    candidates = set()
    images = []
    vertices = []
    for r in regions:
        # excluding same rectangle (with different segments)
        if r['rect'] in candidates:
            continue
        # excluding small regions
        if r['size'] < 220:
            continue
        if (r['rect'][2] * r['rect'][3]) < 500:
            continue
        # crop the proposal; it is resized to 224 * 224 for input below
        proposal_img, proposal_vertice = clip_pic(img, r['rect'])
        # Delete Empty array
        if len(proposal_img) == 0:
            continue
        # Ignore things contain 0 or not C contiguous array
        x, y, w, h = r['rect']
        if w == 0 or h == 0:
            continue
        # Check if any 0-dimension exist
        [a, b, c] = np.shape(proposal_img)
        if a == 0 or b == 0 or c == 0:
            continue
        resized_proposal_img = resize_image(proposal_img,224, 224)
        candidates.add(r['rect'])
        img_float = np.asarray(resized_proposal_img, dtype="float32")
        images.append(img_float)
        vertices.append(r['rect'])
    return images, vertices

Let's pick an image and look at the result of the selective search algorithm.

img_path = './17flowers/jpg/7/image_0591.jpg' 
imgs, verts = image_proposal(img_path)
fig, ax = plt.subplots(ncols=1, nrows=1, figsize=(6, 6))
img = skimage.io.imread(img_path)
ax.imshow(img)
for x, y, w, h in verts:
    rect = mpatches.Rectangle((x, y), w, h, fill=False, edgecolor='red', linewidth=1)
    ax.add_patch(rect)
plt.show()


Once the proposals have a uniform size, we can feed them into the ConvNet for feature extraction. The ConvNet architecture we use here is AlexNet. Its structure is as follows:

import tflearn
from tflearn.layers.core import input_data, dropout, fully_connected
from tflearn.layers.conv import conv_2d, max_pool_2d
from tflearn.layers.normalization import local_response_normalization
from tflearn.layers.estimator import regression

# Building 'AlexNet'
def create_alexnet(num_classes, restore = True):
    # Building 'AlexNet'
    network = input_data(shape=[None, 224, 224, 3])
    network = conv_2d(network, 96, 11, strides=4, activation='relu')
    network = max_pool_2d(network, 3, strides=2)
    network = local_response_normalization(network)
    network = conv_2d(network, 256, 5, activation='relu')
    network = max_pool_2d(network, 3, strides=2)
    network = local_response_normalization(network)
    network = conv_2d(network, 384, 3, activation='relu')
    network = conv_2d(network, 384, 3, activation='relu')
    network = conv_2d(network, 256, 3, activation='relu')
    network = max_pool_2d(network, 3, strides=2)
    network = local_response_normalization(network)
    network = fully_connected(network, 4096, activation='tanh')
    network = dropout(network, 0.5)
    network = fully_connected(network, 4096, activation='tanh')
    network = dropout(network, 0.5)
    network = fully_connected(network, num_classes, activation='softmax', restore=restore)
    network = regression(network, optimizer='momentum',
                         loss='categorical_crossentropy',
                         learning_rate=0.001)
    return network

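At this point we have built the ConvNet part of the architecture; with it we can extract a feature map from each proposal.

Steps 3 and 4

The part of the algorithm we implement in this step is marked by the numbers in the figure below:

Once we have the feature map extracted from each proposal, we can feed it into the SVMs for classification. (Note that there is more than one SVM classifier: we need to train one SVM for each class. In our dataset the flowers fall into two classes, so we have 2 SVMs.)
Proposals judged positive (non-background) are then passed to the Bbox reg, which fine-tunes the bounding box and outputs the final box prediction.
Now that we understand the whole pipeline, let's turn to training the model.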

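Model Training

Training the R-CNN model takes two steps:

  1. Initialize the ConvNet and pre-train it on the large dataset to obtain the pre-trained model, then fine-tune the pre-trained model on the small dataset to obtain the final ConvNet
  2. Feed images into the model, use the ConvNet from the first step to extract a feature map for each proposal, and use the feature maps to train the classifiers (SVMs) and the regressor (Bbox reg). (The ConvNet does not learn during this process, i.e. its parameters stay fixed.)

First, pre-train on the large dataset. During training, the input X is the original image and the ground-truth label Y is the class of the original image. The code is as follows: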

import codecs
import os
import pickle

def load_data(datafile, num_class, save=False, save_path='dataset.pkl'):
    fr = codecs.open(datafile, 'r', 'utf-8')
    train_list = fr.readlines()
    labels = []
    images = []
    for line in train_list:
        tmp = line.strip().split(' ')
        fpath = tmp[0]
        img = cv2.imread(fpath)
        img = resize_image(img, 224, 224)
        np_img = np.asarray(img, dtype="float32")
        images.append(np_img)

        index = int(tmp[1])
        label = np.zeros(num_class)
        label[index] = 1
        labels.append(label)
    if save:
        pickle.dump((images, labels), open(save_path, 'wb'))
    fr.close()
    return images, labels

def train(network, X, Y, save_model_path):
    # Training
    model = tflearn.DNN(network, checkpoint_path='model_alexnet',
                        max_checkpoints=1, tensorboard_verbose=2, tensorboard_dir='output')
    if os.path.isfile(save_model_path + '.index'):
        model.load(save_model_path)
        print('load model...')
    for _ in range(5):
        model.fit(X, Y, n_epoch=1, validation_set=0.1, shuffle=True,
                  show_metric=True, batch_size=64, snapshot_step=200,
                  snapshot_epoch=False, run_id='alexnet_oxflowers17') # epoch = 1000
        # Save the model
        model.save(save_model_path)
        print('save model...')
        
X, Y = load_data('./train_list.txt', 17)
net = create_alexnet(17)
train(net, X, Y,'./pre_train_model/model_save.model')

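Next, fine-tune the pre-trained model on the small dataset. This stage differs from the previous one in two ways:
1. The input is the RoIs generated by region proposal rather than the original images.
2. The ground-truth label Y of each RoI is determined by computing the IoU (Intersection over Union) between the RoI and the ground truth (the annotated bounding box of the object in the original image).
IoU is the area of the intersection of the two boxes divided by the area of their union, as shown in the figure below: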


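IoU takes values in [0, 1], and a larger value means a smaller gap between the RoI and the ground truth. Candidate regions with an IoU greater than 0.5 are defined as positive samples, and the rest as negative samples.
The code for computing IoU is as follows: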

# IOU Part 1
def if_intersection(xmin_a, xmax_a, ymin_a, ymax_a, xmin_b, xmax_b, ymin_b, ymax_b):
    if_intersect = False
    if xmin_a < xmax_b <= xmax_a and (ymin_a < ymax_b <= ymax_a or ymin_a <= ymin_b < ymax_a):
        if_intersect = True
    elif xmin_a <= xmin_b < xmax_a and (ymin_a < ymax_b <= ymax_a or ymin_a <= ymin_b < ymax_a):
        if_intersect = True
    elif xmin_b < xmax_a <= xmax_b and (ymin_b < ymax_a <= ymax_b or ymin_b <= ymin_a < ymax_b):
        if_intersect = True
    elif xmin_b <= xmin_a < xmax_b and (ymin_b < ymax_a <= ymax_b or ymin_b <= ymin_a < ymax_b):
        if_intersect = True
    else:
        return if_intersect
    if if_intersect:
        x_sorted_list = sorted([xmin_a, xmax_a, xmin_b, xmax_b])
        y_sorted_list = sorted([ymin_a, ymax_a, ymin_b, ymax_b])
        x_intersect_w = x_sorted_list[2] - x_sorted_list[1]
        y_intersect_h = y_sorted_list[2] - y_sorted_list[1]
        area_inter = x_intersect_w * y_intersect_h
        return area_inter


# IOU Part 2
def IOU(ver1, vertice2):
    # vertices in four points
    vertice1 = [ver1[0], ver1[1], ver1[0]+ver1[2], ver1[1]+ver1[3]]
    area_inter = if_intersection(vertice1[0], vertice1[2], vertice1[1], vertice1[3], vertice2[0], vertice2[2], vertice2[1], vertice2[3])
    if area_inter:
        area_1 = ver1[2] * ver1[3]
        area_2 = vertice2[4] * vertice2[5]
        iou = float(area_inter) / (area_1 + area_2 - area_inter)
        return iou
    return False
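For comparison, IoU can also be computed without enumerating overlap cases, by clamping the intersection width and height at zero. The sketch below is an alternative, not the code used in this article; `iou_xywh` is a hypothetical helper that takes both boxes as `[x, y, w, h]`. It also covers the case where a wide, short box and a tall, narrow box cross each other like a plus sign, which the case-based check above reports as no intersection.

```python
def iou_xywh(box_a, box_b):
    """IoU of two boxes given as [x, y, w, h] (top-left corner, width, height)."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    # Overlap along each axis; a negative value means no overlap, so clamp to 0.
    inter_w = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


print(iou_xywh([0, 0, 10, 10], [5, 0, 10, 10]))  # half-overlapping boxes
print(iou_xywh([0, 4, 10, 2], [4, 0, 2, 10]))    # boxes crossing like a plus sign
```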

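Before fine-tuning on the small dataset, let's read in the relevant training data (RoI labels, the corresponding images, box annotations, and so on). The code below also reads and saves the data used for SVM training and bounding-box regression.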

# Read in data and save data for Alexnet
def load_train_proposals(datafile, num_clss, save_path, threshold=0.5, is_svm=False, save=False):
    fr = open(datafile, 'r')
    train_list = fr.readlines()
    # random.shuffle(train_list)
    for num, line in enumerate(train_list):
        labels = []
        images = []
        rects = []
        tmp = line.strip().split(' ')
        # tmp0 = image address
        # tmp1 = label
        # tmp2 = rectangle vertices
        img = cv2.imread(tmp[0])
        # run selective search to get candidate boxes
        img_lbl, regions = selective_search(img, scale=500, sigma=0.9, min_size=10)
        candidates = set()
        ref_rect = tmp[2].split(',')
        ref_rect_int = [int(i) for i in ref_rect]
        Gx = ref_rect_int[0]
        Gy = ref_rect_int[1]
        Gw = ref_rect_int[2]
        Gh = ref_rect_int[3]
        for r in regions:
            # excluding same rectangle (with different segments)
            if r['rect'] in candidates:
                continue
            # excluding small regions
            if r['size'] < 220:
                continue
            if (r['rect'][2] * r['rect'][3]) < 500:
                continue
            # crop the target region
            proposal_img, proposal_vertice = clip_pic(img, r['rect'])
            # Delete Empty array
            if len(proposal_img) == 0:
                continue
            # Ignore things contain 0 or not C contiguous array
            x, y, w, h = r['rect']
            if w == 0 or h == 0:
                continue
            # Check if any 0-dimension exist
            [a, b, c] = np.shape(proposal_img)
            if a == 0 or b == 0 or c == 0:
                continue
            resized_proposal_img = resize_image(proposal_img, 224, 224)
            candidates.add(r['rect'])
            img_float = np.asarray(resized_proposal_img, dtype="float32")
            images.append(img_float)
            # IOU
            iou_val = IOU(ref_rect_int, proposal_vertice)
            # offsets in x, y, w, h, used as bounding-box regression targets
            rects.append([(Gx-x)/w, (Gy-y)/h, math.log(Gw/w), math.log(Gh/h)])
            # propasal_rect = [proposal_vertice[0], proposal_vertice[1], proposal_vertice[4], proposal_vertice[5]]
            # print(iou_val)
            # labels, let 0 represent default class, which is background
            index = int(tmp[1])
            if is_svm:
                # IoU below the threshold: background, label 0
                if iou_val < threshold:
                    labels.append(0)
                else:
                    labels.append(index)
            else:
                label = np.zeros(num_clss + 1)
                if iou_val < threshold:
                    label[0] = 1
                else:
                    label[index] = 1
                labels.append(label)


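        if is_svm:
            # also add the ground-truth box itself as a positive sample;
            # read the class here so it is defined even when no region passed the filters
            index = int(tmp[1])
            ref_img, ref_vertice = clip_pic(img, ref_rect_int)
            resized_ref_img = resize_image(ref_img, 224, 224)
            img_float = np.asarray(resized_ref_img, dtype="float32")
            images.append(img_float)
            rects.append([0, 0, 0, 0])
            labels.append(index)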
        view_bar("processing image of %s" % datafile.split('\\')[-1].strip(), num + 1, len(train_list))

        if save:
            if is_svm:
                # strip() removes leading and trailing whitespace
                np.save((os.path.join(save_path, tmp[0].split('/')[-1].split('.')[0].strip()) + '_data.npy'), [images, labels, rects])
            else:
                # strip() removes leading and trailing whitespace
                np.save((os.path.join(save_path, tmp[0].split('/')[-1].split('.')[0].strip()) + '_data.npy'),
                        [images, labels])
    print(' ')
    fr.close()
    
# load data
def load_from_npy(data_set):
    images, labels = [], []
    data_list = os.listdir(data_set)
    # random.shuffle(data_list)
    for ind, d in enumerate(data_list):
        i, l = np.load(os.path.join(data_set, d),allow_pickle=True)
        images.extend(i)
        labels.extend(l)
        view_bar("load data of %s" % d, ind + 1, len(data_list))
    print(' ')
    return images, labels

import math
import sys
#Progress bar 
def view_bar(message, num, total):
    rate = num / total
    rate_num = int(rate * 40)
    rate_nums = math.ceil(rate * 100)
    r = '\r%s:[%s%s]%d%%\t%d/%d' % (message, ">" * rate_num, " " * (40 - rate_num), rate_nums, num, total,)
    sys.stdout.write(r)
    sys.stdout.flush()
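The labeling rule inside `load_train_proposals` can be isolated into a small helper to make it easier to see. This is a minimal sketch under the same convention as the code above (index 0 is reserved for background, and an IoU below the threshold means background); `make_label` is a hypothetical name and is not part of the original code.

```python
def make_label(iou, class_index, num_classes, threshold=0.5, one_hot=True):
    """Return the training label of one RoI.

    one_hot=True mirrors the fine-tuning labels (a vector of length num_classes + 1);
    one_hot=False mirrors the integer labels used for SVM training.
    """
    label_id = class_index if iou >= threshold else 0  # 0 means background
    if not one_hot:
        return label_id
    label = [0.0] * (num_classes + 1)
    label[label_id] = 1.0
    return label


print(make_label(0.7, 2, 2))                                # RoI that matches class 2
print(make_label(0.2, 2, 2))                                # RoI treated as background
print(make_label(0.4, 1, 2, threshold=0.3, one_hot=False))  # SVM-style label
```

The fine-tuning call uses the default threshold of 0.5, while the SVM data is generated with a threshold of 0.3, so the same RoI can be a positive sample for the SVM stage and background for fine-tuning.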

With these preparations in place we can start the fine-tuning stage. The code is as follows:

def fine_tune_Alexnet(network, X, Y, save_model_path, fine_tune_model_path):
    # Training
    model = tflearn.DNN(network, checkpoint_path='rcnn_model_alexnet',
                        max_checkpoints=1, tensorboard_verbose=2, tensorboard_dir='output_RCNN')
    if os.path.isfile(fine_tune_model_path + '.index'):
        print("Loading the fine tuned model")
        model.load(fine_tune_model_path)
    elif os.path.isfile(save_model_path + '.index'):
        print("Loading the alexnet")
        model.load(save_model_path)
    else:
        print("No file to load, error")
        return False

    model.fit(X, Y, n_epoch=1, validation_set=0.1, shuffle=True,
              show_metric=True, batch_size=64, snapshot_step=200,
              snapshot_epoch=False, run_id='alexnet_rcnnflowers2')
    # Save the model
    model.save(fine_tune_model_path)
        
data_set = './data_set'
if len(os.listdir('./data_set')) == 0:
    print("Reading Data")
    load_train_proposals('./fine_tune_list.txt', 2, save=True, save_path=data_set)
print("Loading Data")
X, Y = load_from_npy(data_set)
restore = False
if os.path.isfile('./fine_tune_model/fine_tune_model_save.model' + '.index'):
    restore = True
    print("Continue fine-tune")
# three classes include background
net = create_alexnet(3, restore=restore)
fine_tune_Alexnet(net, X, Y, './pre_train_model/model_save.model', './fine_tune_model/fine_tune_model_save.model')

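Training step 2

In this step we train the SVMs and the Bbox reg, as marked by the numbers in the figure below:
First we extract feature maps with the CNN model from step 1. Note that the ConvNet used here has one layer fewer than the one used during training: the final softmax layer is removed, because what we need now are the features extracted from the RoIs, whereas during training the softmax layer was needed for classification. The code is as follows: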

def create_alexnet():
    # Building 'AlexNet'
    network = input_data(shape=[None, 224, 224, 3])
    network = conv_2d(network, 96, 11, strides=4, activation='relu')
    network = max_pool_2d(network, 3, strides=2)
    network = local_response_normalization(network)
    network = conv_2d(network, 256, 5, activation='relu')
    network = max_pool_2d(network, 3, strides=2)
    network = local_response_normalization(network)
    network = conv_2d(network, 384, 3, activation='relu')
    network = conv_2d(network, 384, 3, activation='relu')
    network = conv_2d(network, 256, 3, activation='relu')
    network = max_pool_2d(network, 3, strides=2)
    network = local_response_normalization(network)
    network = fully_connected(network, 4096, activation='tanh')
    network = dropout(network, 0.5)
    network = fully_connected(network, 4096, activation='tanh')
    network = regression(network, optimizer='momentum',
                         loss='categorical_crossentropy',
                         learning_rate=0.001)
    return network

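We need to train one SVM for each class. Our final task distinguishes two kinds of flowers, so we need to train 2 SVMs.
The input for SVM training is the feature map extracted from each RoI, and the labels cover n + 1 classes (the extra one being background); for our dataset this gives three label classes.
The code is as follows: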

from sklearn import svm
# sklearn.externals.joblib was removed from recent scikit-learn releases; use the standalone package
import joblib

# Construct cascade svms
def train_svms(train_file_folder, model):
    files = os.listdir(train_file_folder)
    svms = []
    train_features = []
    bbox_train_features = []
    rects = []
    for train_file in files:
        if train_file.split('.')[-1] == 'txt':
            X, Y, R = generate_single_svm_train(os.path.join(train_file_folder, train_file))
            Y1 = []
            features1 = []
            features_hard = []
            for ind, i in enumerate(X):
                # extract features
                feats = model.predict([i])
                train_features.append(feats[0])
                # add every positive and negative sample to features1 and Y1
                if Y[ind]>=0:
                    Y1.append(Y[ind])
                    features1.append(feats[0])
                    # non-background samples (label > 0) go into the bounding-box training set
                    if Y[ind]>0:
                        bbox_train_features.append(feats[0])
                        # keep the matching regression target so features and targets stay aligned
                        rects.append(R[ind])
                view_bar("extract features of %s" % train_file, ind + 1, len(X))

            clf = svm.SVC(probability=True)

            clf.fit(features1, Y1)
            print(' ')
            print("feature dimension")
            print(np.shape(features1))
            svms.append(clf)
            # serialize clf to save the SVM classifier
            joblib.dump(clf, os.path.join(train_file_folder, str(train_file.split('.')[0]) + '_svm.pkl'))

    # save the bounding-box regression training set
    np.save((os.path.join(train_file_folder, 'bbox_train.npy')),
            [bbox_train_features, rects])
    return svms

# Load training images
def generate_single_svm_train(train_file):
    save_path = train_file.rsplit('.', 1)[0].strip()
    if len(os.listdir(save_path)) == 0:
        print("reading %s's svm dataset"% train_file.split('\\')[-1])
        load_train_proposals(train_file, 2, save_path, threshold=0.3, is_svm=True, save=True)
    print("restoring svm dataset")
    images, labels,rects = load_from_npy_(save_path)

    return images, labels,rects

# load data
def load_from_npy_(data_set):
    images, labels ,rects= [], [], []
    data_list = os.listdir(data_set)
    # random.shuffle(data_list)
    for ind, d in enumerate(data_list):
        i, l, r = np.load(os.path.join(data_set, d),allow_pickle=True)
        images.extend(i)
        labels.extend(l)
        rects.extend(r)
        view_bar("load data of %s" % d, ind + 1, len(data_list))
    print(' ')
    return images, labels ,rects

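The regressor is linear, and its input is N pairs of values $\{(P_i, G_i)\}_{i=1,2,\dots,N}$, which are the box coordinates of the candidate region and of the ground truth, respectively. The code is as follows: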

from sklearn.linear_model import Ridge

# Show bounding boxes on the image
def show_rect(img_path, regions):
    fig, ax = plt.subplots(ncols=1, nrows=1, figsize=(6, 6))
    img = skimage.io.imread(img_path)
    ax.imshow(img)
    for x, y, w, h in regions:
        rect = mpatches.Rectangle((x, y), w, h, fill=False, edgecolor='red', linewidth=1)
        ax.add_patch(rect)
    plt.show()
    

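# Train the bounding-box regressor
def train_bbox(npy_path):
    features, rects = np.load((os.path.join(npy_path, 'bbox_train.npy')),allow_pickle=True)
    # Calling np.array() on the loaded data directly does not work. The elements have to be copied one by one
    # into fresh lists, because features and rects were built with append and cannot be converted to a matrix as they are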
    X = []
    Y = []
    for ind, i in enumerate(features):
        X.append(i)
    X_train = np.array(X)

    for ind, i in enumerate(rects):
        Y.append(i)
    Y_train = np.array(Y)

    # train the linear (ridge) regression model
    clf = Ridge(alpha=1.0)
    clf.fit(X_train, Y_train)
    # serialize and save the bbox regressor
    joblib.dump(clf, os.path.join(npy_path,'bbox_train.pkl'))
    return clf
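The regression targets built in `load_train_proposals` and the decoding applied to the regressor's output at prediction time are inverse operations. The sketch below writes both out and checks the round trip. Like the code in this article, it measures the offset from the top-left corner of the box, whereas the original R-CNN paper parameterizes the offset by box centers. `encode_box` and `decode_box` are hypothetical names introduced only for this illustration.

```python
import math


def encode_box(proposal, ground_truth):
    """Regression targets (tx, ty, tw, th) for a proposal, both boxes as [x, y, w, h]."""
    px, py, pw, ph = proposal
    gx, gy, gw, gh = ground_truth
    return [(gx - px) / pw,      # horizontal shift, in units of proposal width
            (gy - py) / ph,      # vertical shift, in units of proposal height
            math.log(gw / pw),   # log of the width ratio
            math.log(gh / ph)]   # log of the height ratio


def decode_box(proposal, deltas):
    """Apply predicted (tx, ty, tw, th) to a proposal and return the refined [x, y, w, h]."""
    px, py, pw, ph = proposal
    tx, ty, tw, th = deltas
    return [tx * pw + px, ty * ph + py, math.exp(tw) * pw, math.exp(th) * ph]


proposal = [10, 20, 100, 50]
ground_truth = [20, 25, 80, 60]
deltas = encode_box(proposal, ground_truth)
print(deltas)
print(decode_box(proposal, deltas))  # recovers the ground truth up to rounding error
```

Dividing the shifts by the proposal size and taking the log of the size ratios keeps the targets on a similar scale for small and large proposals, which suits a single linear regressor.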

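Start training the SVM classifiers and the bounding-box regressor.

train_file_folder = './svm_train'
# build the model (network)
net = create_alexnet()
model = tflearn.DNN(net)
# load the fine-tuned AlexNet parameters
model.load('./fine_tune_model/fine_tune_model_save.model')
# load / train the SVM classifiers and the bounding-box regressor
svms = []
bbox_fit = []
# whether a saved bounding-box regressor exists
bbox_fit_exit = 0
# load the SVM classifiers and the bounding-box regressor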
for file in os.listdir(train_file_folder):
    if file.split('_')[-1] == 'svm.pkl':
        svms.append(joblib.load(os.path.join(train_file_folder, file)))
    if file == 'bbox_train.pkl':
        bbox_fit = joblib.load(os.path.join(train_file_folder, file))
        bbox_fit_exit = 1
if len(svms) == 0:
    svms = train_svms(train_file_folder, model)
if bbox_fit_exit == 0:
    bbox_fit = train_bbox(train_file_folder)

print("Done fitting svms")

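At this point the model is fully trained.

Checking the Model's Results

Let's pick an image and follow the model's forward pass to see how it behaves in practice. First, look at the RoIs produced by region proposal.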

img_path = './2flowers/jpg/1/image_1282.jpg'  
image = cv2.imread(img_path)
im_width = image.shape[1]
im_height = image.shape[0]
# extract region proposals
imgs, verts = image_proposal(img_path)
show_rect(img_path, verts)


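Feed the RoIs into the ConvNet to obtain features, pass the features to the SVMs and the regressor, and apply bounding-box regression to the samples that the SVMs classify as positive.

# extract RoI features from the CNN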
features = model.predict(imgs)
print("predict image:")
# print(np.shape(features))
results = []
results_label = []
results_score = []
count = 0
print(len(features))
for f in features:
    # the loop variable is named clf so it does not shadow the imported sklearn `svm` module
    for clf in svms:
        pred = clf.predict([f.tolist()])
        # not background
        if pred[0] != 0:
            # bounding-box regression
            bbox = bbox_fit.predict([f.tolist()])
            tx, ty, tw, th = bbox[0][0], bbox[0][1], bbox[0][2], bbox[0][3]
            px, py, pw, ph = verts[count]
            gx = tx * pw + px
            gy = ty * ph + py
            gw = math.exp(tw) * pw
            gh = math.exp(th) * ph
            # clip the predicted box to the image boundaries
            if gx < 0:
                gw = gw - (0 - gx)
                gx = 0
            if gx + gw > im_width:
                gw = im_width - gx
            if gy < 0:
                # shrink the height by the amount the box sticks out above the image
                gh = gh - (0 - gy)
                gy = 0
            if gy + gh > im_height:
                gh = im_height - gy
            results.append([gx, gy, gw, gh])
            results_label.append(pred[0])
            results_score.append(clf.predict_proba([f.tolist()])[0][1])
    count += 1
print(results)
print(results_label)
print(results_score)
show_rect(img_path, results)


As we can see, more than one box may be returned. In that case we use NMS (Non-Maximum Suppression) to select the relatively best results.
The code is as follows:

results_final = []
results_final_label = []

# Non-maximum suppression
# remove candidate boxes with a score below 0.5
delete_index1 = []
for ind in range(len(results_score)):
    if results_score[ind] < 0.5:
        delete_index1.append(ind)
num1 = 0
for idx in delete_index1:
    results.pop(idx - num1)
    results_score.pop(idx - num1)
    results_label.pop(idx - num1)
    num1 += 1

while len(results) > 0:
    # find the highest-scoring box in the list
    max_index = results_score.index(max(results_score))
    max_x, max_y, max_w, max_h = results[max_index]
    max_vertice = [max_x, max_y, max_x + max_w, max_y + max_h, max_w, max_h]
    # add this candidate box to the final results
    results_final.append(results[max_index])
    results_final_label.append(results_label[max_index])
    # remove this candidate box from results
    results.pop(max_index)
    results_label.pop(max_index)
    results_score.pop(max_index)
    # print(len(results_score))
    # remove the other candidate boxes whose IoU with the highest-scoring box is > 0.5
    delete_index = []
    for ind, i in enumerate(results):
        iou_val = IOU(i, max_vertice)
        if iou_val > 0.5:
            delete_index.append(ind)
    num = 0
    for idx in delete_index:
        # print('\n')
        # print(idx)
        # print(len(results))
        results.pop(idx - num)
        results_score.pop(idx - num)
        results_label.pop(idx - num)
        num += 1

print("result:",results_final)
print("result label:",results_final_label)
show_rect(img_path, results_final)
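The same procedure can be packaged as a function that leaves its inputs untouched. This is a sketch rather than the article's code: `nms` and its inner `iou` helper are hypothetical names, boxes are `[x, y, w, h]`, and both thresholds default to the 0.5 used above. It visits boxes in descending score order and keeps a box only if it does not overlap any already kept box by more than the IoU threshold, which yields the same selection as repeatedly taking the best remaining box.

```python
def nms(boxes, scores, score_threshold=0.5, iou_threshold=0.5):
    """Greedy non-maximum suppression; returns the indices of the boxes to keep."""

    def iou(a, b):
        ax, ay, aw, ah = a
        bx, by, bw, bh = b
        inter_w = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
        inter_h = max(0.0, min(ay + ah, by + bh) - max(ay, by))
        inter = inter_w * inter_h
        union = aw * ah + bw * bh - inter
        return inter / union if union > 0 else 0.0

    # Drop low-scoring boxes, then visit the rest from highest to lowest score.
    order = sorted((i for i, s in enumerate(scores) if s >= score_threshold),
                   key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep the box only if no already kept box overlaps it too much.
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


boxes = [[0, 0, 10, 10], [1, 1, 10, 10], [50, 50, 10, 10], [0, 0, 10, 10]]
scores = [0.9, 0.8, 0.7, 0.3]
print(nms(boxes, scores))
```

Returning indices instead of boxes makes it easy to look up the matching labels and scores afterwards.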

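Summary

At this point we have a rough R-CNN model.
R-CNN made flexible use of the more advanced tools and techniques of its time, absorbed them thoroughly, adapted them to its own logic, and achieved a large improvement as a result. It also has a number of obvious drawbacks:

  1. Training is cumbersome: fine-tuning the network, training the SVMs, and bounding-box regression involve many disk reads and writes, which is inefficient.
  2. Every RoI has to go through the CNN for feature extraction, which causes a large amount of redundant computation (imagine two overlapping RoIs: the overlapping part is convolved twice, when in theory once would be enough).
  3. It runs slowly: independent per-region feature extraction and the use of selective search for region proposal are both time-consuming.

Fortunately, these problems were largely addressed in the later Fast R-CNN and Faster R-CNN.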

Project URL

https://momodel.cn/workspace/5f1ec0505607a4070d65203b?type=app
