Evaluating Object Detection Models with mAP

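This article explains how precision and recall are used to calculate the mean average precision (mAP). The mAP compares the ground-truth bounding boxes with the detected boxes and returns a score. The higher the score, the more accurate the model's detections.

Earlier we looked in detail at the confusion matrix, model accuracy, precision, and recall, and used the Scikit-learn library to calculate these metrics. Here the discussion is extended to show how precision and recall are used to calculate the mAP.

This section quickly reviews how a class label is derived from a prediction score. Given two classes, positive and negative, here are the ground-truth labels of 10 samples.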

y_true = ["positive", "negative", "negative", "positive", "positive", "positive", "negative", "positive", "negative", "positive"]

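When these samples are fed to the model, it returns the following prediction scores. Based on these scores, how do we classify the samples (i.e. assign a class label to each sample)?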

pred_scores = [0.7, 0.3, 0.5, 0.6, 0.55, 0.9, 0.4, 0.2, 0.4, 0.3]

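To convert the scores into class labels, a threshold is used. A sample whose score is greater than or equal to the threshold is classified as positive; otherwise it is classified as negative. The next block of code converts the scores into class labels with a threshold of 0.5.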

import numpy
import sklearn.metrics

pred_scores = [0.7, 0.3, 0.5, 0.6, 0.55, 0.9, 0.4, 0.2, 0.4, 0.3]
y_true = ["positive", "negative", "negative", "positive", "positive", "positive", "negative", "positive", "negative", "positive"]

threshold = 0.5
y_pred = ["positive" if score >= threshold else "negative" for score in pred_scores]
print(y_pred)

Result

['positive', 'negative', 'positive', 'positive', 'positive', 'positive', 'negative', 'negative', 'negative', 'negative']

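Now both the ground-truth and the predicted labels are available in the y_true and y_pred variables. Based on these labels, the confusion matrix, precision, and recall can be calculated.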

r = numpy.flip(sklearn.metrics.confusion_matrix(y_true, y_pred))
print(r)

precision = sklearn.metrics.precision_score(y_true=y_true, y_pred=y_pred, pos_label="positive")
print(precision)

recall = sklearn.metrics.recall_score(y_true=y_true, y_pred=y_pred, pos_label="positive")
print(recall)

Result

# Confusion Matrix (From Left to Right & Top to Bottom: True Positive, False Negative, False Positive, True Negative)
[[4 2]
 [1 3]]

# Precision = 4/(4+1)
0.8

# Recall = 4/(4+2)
0.6666666666666666
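The numbers above can also be checked by counting the four confusion-matrix cells directly. The following is a minimal sketch in plain Python; confusion_counts() is a hypothetical helper introduced only for illustration and is not part of Scikit-learn.

```python
# Minimal sketch: count the confusion-matrix cells without scikit-learn.
# confusion_counts() is a hypothetical helper, not a library function.
y_true = ["positive", "negative", "negative", "positive", "positive",
          "positive", "negative", "positive", "negative", "positive"]
y_pred = ["positive", "negative", "positive", "positive", "positive",
          "positive", "negative", "negative", "negative", "negative"]

def confusion_counts(y_true, y_pred, pos_label="positive"):
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == pos_label and truth == pos_label:
            tp += 1  # predicted positive, really positive
        elif pred == pos_label:
            fp += 1  # predicted positive, really negative
        elif truth == pos_label:
            fn += 1  # predicted negative, really positive
        else:
            tn += 1  # predicted negative, really negative
    return tp, fp, fn, tn

tp, fp, fn, tn = confusion_counts(y_true, y_pred)
precision = tp / (tp + fp)  # 4 / (4 + 1)
recall = tp / (tp + fn)     # 4 / (4 + 2)
print(tp, fn, fp, tn)
print(precision, recall)
```

The counts come out in the same order as the flipped matrix printed above: true positives, false negatives, false positives, true negatives.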

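After this quick review of calculating the precision and recall, the next section discusses creating the precision-recall curve.

From the definitions of precision and recall given in Part 1, remember that the higher the precision, the more confident the model is when it classifies a sample as positive. The higher the recall, the more of the positive samples the model correctly classifies as positive.

When a model has high recall but low precision, it classifies most of the positive samples correctly but has many false positives (i.e. it classifies many negative samples as positive). When a model has high precision but low recall, it is accurate when it classifies a sample as positive, but it may identify only some of the positive samples.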
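Both situations can be reproduced with the 10 samples from the previous section by moving the threshold. This is a small sketch, assuming two illustrative thresholds (0.2 and 0.7) that are not taken from the article; precision_recall_at() is a hypothetical helper.

```python
# Sketch: the same scores give high recall or high precision depending on the threshold.
y_true = ["positive", "negative", "negative", "positive", "positive",
          "positive", "negative", "positive", "negative", "positive"]
pred_scores = [0.7, 0.3, 0.5, 0.6, 0.55, 0.9, 0.4, 0.2, 0.4, 0.3]

def precision_recall_at(threshold):
    pairs = list(zip(y_true, pred_scores))
    tp = sum(1 for t, s in pairs if s >= threshold and t == "positive")
    fp = sum(1 for t, s in pairs if s >= threshold and t == "negative")
    fn = sum(1 for t, s in pairs if s < threshold and t == "positive")
    # Assumes at least one sample is predicted positive, so tp + fp > 0.
    return tp / (tp + fp), tp / (tp + fn)

loose = precision_recall_at(0.2)   # everything is labeled positive
strict = precision_recall_at(0.7)  # only the two highest scores are labeled positive
print("threshold 0.2 (precision, recall):", loose)
print("threshold 0.7 (precision, recall):", strict)
```

With the loose threshold every positive sample is found but 4 of the 10 predictions are false positives; with the strict threshold both predictions are correct but only 2 of the 6 positive samples are found.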

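Because both precision and recall matter, the precision-recall curve shows the trade-off between the precision and recall values at different thresholds. This curve helps select the best threshold to maximize both metrics.

Creating the precision-recall curve requires some inputs:

The ground-truth labels.
The prediction scores of the samples.
Some thresholds for converting the prediction scores into class labels.

The next block of code creates the y_true list to hold the ground-truth labels and the pred_scores list for the prediction scores. The thresholds are kept in a list named thresholds.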

import numpy

y_true = ["positive", "negative", "negative", "positive", "positive", "positive", "negative", "positive", "negative", "positive", "positive", "positive", "positive", "negative", "negative", "negative"]

pred_scores = [0.7, 0.3, 0.5, 0.6, 0.55, 0.9, 0.4, 0.2, 0.4, 0.3, 0.7, 0.5, 0.8, 0.2, 0.3, 0.35]

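Here are the thresholds saved in the thresholds list, created with numpy.arange() as in the complete code later in this article. Because there are 10 thresholds, 10 values of precision and recall will be created.

thresholds = numpy.arange(start=0.2, stop=0.7, step=0.05)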

[0.2, 
 0.25, 
 0.3, 
 0.35, 
 0.4, 
 0.45, 
 0.5, 
 0.55, 
 0.6, 
 0.65]

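Here is a function named precision_recall_curve() that accepts the ground-truth labels, the prediction scores, and the thresholds. It returns two equal-length lists holding the precision and recall values.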

import sklearn.metrics

def precision_recall_curve(y_true, pred_scores, thresholds):
    precisions = []
    recalls = []
    
    for threshold in thresholds:
        y_pred = ["positive" if score >= threshold else "negative" for score in pred_scores]

        precision = sklearn.metrics.precision_score(y_true=y_true, y_pred=y_pred, pos_label="positive")
        recall = sklearn.metrics.recall_score(y_true=y_true, y_pred=y_pred, pos_label="positive")
        
        precisions.append(precision)
        recalls.append(recall)

    return precisions, recalls

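The next code calls the precision_recall_curve() function with the three previously prepared lists. It returns the precisions and recalls lists, which hold all the precision and recall values, respectively.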

precisions, recalls = precision_recall_curve(y_true=y_true, 
                                             pred_scores=pred_scores,
                                             thresholds=thresholds)

Here are the returned values in the precisions list.

[0.5625,
 0.5714285714285714,
 0.5714285714285714,
 0.6363636363636364,
 0.7,
 0.875,
 0.875,
 1.0,
 1.0,
 1.0]

Here are the values in the recalls list.

[1.0,
 0.8888888888888888,
 0.8888888888888888,
 0.7777777777777778,
 0.7777777777777778,
 0.7777777777777778,
 0.7777777777777778,
 0.6666666666666666,
 0.5555555555555556,
 0.4444444444444444]
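The two lists can be cross-checked without Scikit-learn. The sketch below is an illustration, not the article's implementation: pr_curve_plain() is a hypothetical name, and the thresholds are built with round() instead of numpy.arange() so that each threshold compares exactly with the two-decimal scores.

```python
# Sketch: plain-Python version of the threshold sweep (hypothetical helper name).
y_true = ["positive", "negative", "negative", "positive", "positive", "positive",
          "negative", "positive", "negative", "positive", "positive", "positive",
          "positive", "negative", "negative", "negative"]
pred_scores = [0.7, 0.3, 0.5, 0.6, 0.55, 0.9, 0.4, 0.2,
               0.4, 0.3, 0.7, 0.5, 0.8, 0.2, 0.3, 0.35]

# 0.2, 0.25, ..., 0.65; round() avoids values such as 0.30000000000000004.
thresholds = [round(0.2 + 0.05 * i, 2) for i in range(10)]

def pr_curve_plain(y_true, pred_scores, thresholds, pos_label="positive"):
    n_pos = sum(1 for t in y_true if t == pos_label)
    precisions, recalls = [], []
    for threshold in thresholds:
        # Ground-truth labels of the samples predicted positive at this threshold.
        kept = [t for t, s in zip(y_true, pred_scores) if s >= threshold]
        tp = sum(1 for t in kept if t == pos_label)
        # Assumes every threshold keeps at least one sample (true for this data).
        precisions.append(tp / len(kept))
        recalls.append(tp / n_pos)
    return precisions, recalls

precisions, recalls = pr_curve_plain(y_true, pred_scores, thresholds)
for th, p, r in zip(thresholds, precisions, recalls):
    print(th, round(p, 3), round(r, 3))
```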

Given the two lists of equal length, their values can be plotted in a 2D plot as shown below.

import matplotlib.pyplot

matplotlib.pyplot.plot(recalls, precisions, linewidth=4, color="red")
matplotlib.pyplot.xlabel("Recall", fontsize=12, fontweight='bold')
matplotlib.pyplot.ylabel("Precision", fontsize=12, fontweight='bold')
matplotlib.pyplot.title("Precision-Recall Curve", fontsize=15, fontweight="bold")
matplotlib.pyplot.show()

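The precision-recall curve is shown in the next figure. Note that as the recall increases, the precision decreases. The reason is that a lower threshold labels more samples as positive: more of the true positives are found (higher recall), but more negative samples are also wrongly labeled positive (lower precision).

The precision-recall curve makes it easy to spot the point where both precision and recall are high. According to the previous figure, the best point is (recall, precision)=(0.778, 0.875).

Determining the best values of precision and recall graphically from the figure may work here because the curve is not complex. A better way is to use a metric called the f1 score, which is calculated as follows.

f1 = 2 × (precision × recall) / (precision + recall)

The f1 metric measures the balance between precision and recall. When f1 is high, both the precision and recall are high. A lower f1 score means that at least one of the two is low, i.e. a greater imbalance between precision and recall.

Following the previous example, f1 is calculated by the code below. Among the values in the f1 list, the highest score is 0.82352941. It is the 6th element in the list (i.e. index 5). The 6th elements in the recalls and precisions lists are 0.778 and 0.875, respectively. The corresponding threshold is 0.45.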

f1 = 2 * ((numpy.array(precisions) * numpy.array(recalls)) / (numpy.array(precisions) + numpy.array(recalls)))
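To locate the best threshold programmatically rather than by reading the list, the index of the largest f1 value can be looked up. The sketch below uses the precision and recall values listed earlier in this article; the variable names best_index and best_threshold are only illustrative.

```python
# Sketch: pick the threshold with the highest f1 from the lists shown earlier.
precisions = [0.5625, 0.5714285714285714, 0.5714285714285714, 0.6363636363636364,
              0.7, 0.875, 0.875, 1.0, 1.0, 1.0]
recalls = [1.0, 0.8888888888888888, 0.8888888888888888, 0.7777777777777778,
           0.7777777777777778, 0.7777777777777778, 0.7777777777777778,
           0.6666666666666666, 0.5555555555555556, 0.4444444444444444]
thresholds = [0.2, 0.25, 0.3, 0.35, 0.4, 0.45, 0.5, 0.55, 0.6, 0.65]

f1 = [2 * p * r / (p + r) for p, r in zip(precisions, recalls)]

# max() returns the first index when several f1 values tie.
best_index = max(range(len(f1)), key=lambda i: f1[i])
best_threshold = thresholds[best_index]
print(best_index, best_threshold, round(f1[best_index], 8))
```

Indices 5 and 6 share the same precision and recall, so they also share the same f1; the lookup reports the first of the two.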

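The next figure shows, in blue, the location of the point that corresponds to the best balance between recall and precision. In conclusion, the best threshold for balancing precision and recall is 0.45, at which the precision is 0.875 and the recall is 0.778.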

matplotlib.pyplot.plot(recalls, precisions, linewidth=4, color="red", zorder=0)
matplotlib.pyplot.scatter(recalls[5], precisions[5], zorder=1, linewidth=6)

matplotlib.pyplot.xlabel("Recall", fontsize=12, fontweight='bold')
matplotlib.pyplot.ylabel("Precision", fontsize=12, fontweight='bold')
matplotlib.pyplot.title("Precision-Recall Curve", fontsize=15, fontweight="bold")
matplotlib.pyplot.show()

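The average precision (AP) is a way to summarize the precision-recall curve into a single value representing the average of all precisions. The AP is calculated according to the next equation. Using a loop that goes through all precisions/recalls, the difference between the current and next recalls is calculated and then multiplied by the current precision. In other words, the AP is the weighted sum of the precisions at each threshold, where the weight is the increase in recall.

AP = Σ_{k=0}^{n-1} [Recall(k) − Recall(k+1)] × Precision(k), where Recall(n) = 0, Precision(n) = 1, and n is the number of thresholds

It is important to append 0 and 1 to the recalls and precisions lists, respectively. For example, if the recalls list is [0.8, 0.6], then 0 is appended to give [0.8, 0.6, 0.0]. The same happens for the precisions list, but 1 is appended rather than 0 (e.g. [0.8, 0.2, 1.0]).

Given that both recalls and precisions are NumPy arrays, the previous equation is modeled by the following Python code.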

AP = numpy.sum((recalls[:-1] - recalls[1:]) * precisions[:-1])
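The same weighted sum can be written as an explicit loop, which makes the role of the appended values easier to see. This is a sketch in plain Python; average_precision() is a hypothetical helper, and the two-point toy input is made up for illustration.

```python
# Sketch: AP as an explicit weighted sum (hypothetical helper, plain Python).
def average_precision(precisions, recalls):
    # Append the closing point (precision 1, recall 0) without mutating the inputs.
    p = list(precisions) + [1.0]
    r = list(recalls) + [0.0]
    # Each precision is weighted by the drop in recall to the next threshold.
    return sum((r[k] - r[k + 1]) * p[k] for k in range(len(p) - 1))

# Toy input: (1.0 - 0.5) * 0.5 + (0.5 - 0.0) * 1.0
toy_ap = average_precision([0.5, 1.0], [1.0, 0.5])
print(toy_ap)  # 0.75

# The precision and recall values listed earlier in this article.
precisions = [0.5625, 0.5714285714285714, 0.5714285714285714, 0.6363636363636364,
              0.7, 0.875, 0.875, 1.0, 1.0, 1.0]
recalls = [1.0, 0.8888888888888888, 0.8888888888888888, 0.7777777777777778,
           0.7777777777777778, 0.7777777777777778, 0.7777777777777778,
           0.6666666666666666, 0.5555555555555556, 0.4444444444444444]
ap = average_precision(precisions, recalls)
print(round(ap, 4))
```

For the listed values the result is roughly 0.89. Thresholds at which the recall does not change contribute nothing to the sum, whatever their precision.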

Here is the complete code that calculates the AP.

import numpy
import sklearn.metrics

def precision_recall_curve(y_true, pred_scores, thresholds):
    precisions = []
    recalls = []
    
    for threshold in thresholds:
        y_pred = ["positive" if score >= threshold else "negative" for score in pred_scores]

        precision = sklearn.metrics.precision_score(y_true=y_true, y_pred=y_pred, pos_label="positive")
        recall = sklearn.metrics.recall_score(y_true=y_true, y_pred=y_pred, pos_label="positive")
        
        precisions.append(precision)
        recalls.append(recall)

    return precisions, recalls

y_true = ["positive", "negative", "negative", "positive", "positive", "positive", "negative", "positive", "negative", "positive", "positive", "positive", "positive", "negative", "negative", "negative"]
pred_scores = [0.7, 0.3, 0.5, 0.6, 0.55, 0.9, 0.4, 0.2, 0.4, 0.3, 0.7, 0.5, 0.8, 0.2, 0.3, 0.35]
thresholds=numpy.arange(start=0.2, stop=0.7, step=0.05)

precisions, recalls = precision_recall_curve(y_true=y_true, 
                                             pred_scores=pred_scores, 
                                             thresholds=thresholds)

precisions.append(1)
recalls.append(0)

precisions = numpy.array(precisions)
recalls = numpy.array(recalls)

AP = numpy.sum((recalls[:-1] - recalls[1:]) * precisions[:-1])
print(AP)

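That is all about the average precision. Here is a summary of the steps to calculate the AP:

Generate the prediction scores using the model.
Convert the prediction scores to class labels.
Calculate the confusion matrix.
Calculate the precision and recall metrics.
Create the precision-recall curve.
Measure the average precision.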

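To train an object detection model, there are usually 2 inputs:

An image.
Ground-truth bounding boxes for each object in the image.

The model predicts the bounding boxes of the detected objects. The predicted box is not expected to match the ground-truth box exactly. The next figure shows a cat image. The ground-truth box of the object is in red, while the predicted one is in yellow. Based on the visualization of the 2 boxes, did the model make a good prediction with a high match score?

It is difficult to evaluate the model predictions subjectively. For example, someone may conclude that there is a 50% match, while someone else sees a 60% match.

A better alternative is to use a quantitative measure to score how well the ground-truth and predicted boxes match. This measure is the intersection over union (IoU). The IoU helps to determine whether a region contains an object.

The IoU is calculated according to the next equation, by dividing the area of intersection between the 2 boxes by the area of their union. The higher the IoU, the better the prediction.

IoU = area of intersection / area of union

The next figure shows 3 cases with different IoUs. Note that the IoUs at the top of each case are rough visual estimates and may differ slightly from the exact values, but they are reasonable.
For case A, the predicted box in yellow is far from being aligned with the red ground-truth box, so the IoU score is 0.2 (i.e. there is only a 20% overlap between the 2 boxes).
For case B, the intersection area between the 2 boxes is larger, but the 2 boxes are still not well aligned, so the IoU score is 0.5.
For case C, the coordinates of the 2 boxes are very close, so their IoU is 0.9 (i.e. there is a 90% overlap between the 2 boxes).
Note that the IoU is 0.0 when there is a 0% overlap between the predicted and ground-truth boxes. The IoU is 1.0 when the 2 boxes match each other 100%.

To calculate the IoU for an image, here is a function named intersection_over_union(). It accepts the following 2 parameters:

gt_box: The ground-truth bounding box.
pred_box: The predicted bounding box.

It calculates the intersection and union between the 2 boxes in the intersection and union variables, respectively. The IoU is then calculated in the iou variable. The function returns all 3 of these variables.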

def intersection_over_union(gt_box, pred_box):
    # Boxes are [x, y, width, height], where (x, y) is the top-left corner.
    inter_box_top_left = [max(gt_box[0], pred_box[0]), max(gt_box[1], pred_box[1])]
    inter_box_bottom_right = [min(gt_box[0]+gt_box[2], pred_box[0]+pred_box[2]), min(gt_box[1]+gt_box[3], pred_box[1]+pred_box[3])]

    # Clamp at 0 so that boxes that do not overlap give an empty intersection.
    inter_box_w = max(0, inter_box_bottom_right[0] - inter_box_top_left[0])
    inter_box_h = max(0, inter_box_bottom_right[1] - inter_box_top_left[1])

    intersection = inter_box_w * inter_box_h
    union = gt_box[2] * gt_box[3] + pred_box[2] * pred_box[3] - intersection
    
    iou = intersection / union

    return iou, intersection, union

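The bounding box passed to the function is a list of 4 elements, which are:

The x-coordinate of the top-left corner.
The y-coordinate of the top-left corner.
Width.
Height.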
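A few synthetic boxes in this [x, y, width, height] format make the boundary behavior concrete. The sketch below uses a hypothetical helper, iou_xywh(), written for illustration only; it assumes both boxes have a positive area and clamps the intersection at zero so that boxes that do not overlap score 0.0.

```python
# Sketch: IoU for [x, y, width, height] boxes, checked on three synthetic cases.
def iou_xywh(box_a, box_b):
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    # Width and height of the overlap; clamped at 0 when the boxes are disjoint.
    inter_w = max(0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0, min(ay + ah, by + bh) - max(ay, by))
    intersection = inter_w * inter_h
    union = aw * ah + bw * bh - intersection
    return intersection / union

identical = iou_xywh([0, 0, 10, 10], [0, 0, 10, 10])   # same box twice
disjoint = iou_xywh([0, 0, 10, 10], [20, 20, 10, 10])  # no overlap at all
shifted = iou_xywh([0, 0, 10, 10], [5, 0, 10, 10])     # shifted by half a width
print(identical, disjoint, shifted)
```

Note that shifting a box by half its width already lowers the IoU to one third (an intersection of 50 over a union of 150), so the IoU falls faster than the visible overlap might suggest.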

Here are the ground-truth and predicted bounding boxes of the cat image.

gt_box = [320, 220, 680, 900]
pred_box = [500, 320, 550, 700]

The image is named cat.jpg; below is the complete example that draws the bounding boxes on the image.

import imageio
import matplotlib.pyplot
import matplotlib.patches

def intersection_over_union(gt_box, pred_box):
    inter_box_top_left = [max(gt_box[0], pred_box[0]), max(gt_box[1], pred_box[1])]
    inter_box_bottom_right = [min(gt_box[0]+gt_box[2], pred_box[0]+pred_box[2]), min(gt_box[1]+gt_box[3], pred_box[1]+pred_box[3])]

    # Clamp at 0 so that boxes that do not overlap give an empty intersection.
    inter_box_w = max(0, inter_box_bottom_right[0] - inter_box_top_left[0])
    inter_box_h = max(0, inter_box_bottom_right[1] - inter_box_top_left[1])

    intersection = inter_box_w * inter_box_h
    union = gt_box[2] * gt_box[3] + pred_box[2] * pred_box[3] - intersection
    
    iou = intersection / union

    return iou, intersection, union

im = imageio.imread("cat.jpg")

gt_box = [320, 220, 680, 900]
pred_box = [500, 320, 550, 700]

fig, ax = matplotlib.pyplot.subplots(1)
ax.imshow(im)

gt_rect = matplotlib.patches.Rectangle((gt_box[0], gt_box[1]),
                                       gt_box[2],
                                       gt_box[3],
                                       linewidth=5,
                                       edgecolor='r',
                                       facecolor='none')

pred_rect = matplotlib.patches.Rectangle((pred_box[0], pred_box[1]),
                                         pred_box[2],
                                         pred_box[3],
                                         linewidth=5,
                                         edgecolor=(1, 1, 0),
                                         facecolor='none')
ax.add_patch(gt_rect)
ax.add_patch(pred_rect)

ax.axes.get_xaxis().set_ticks([])
ax.axes.get_yaxis().set_ticks([])

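The next figure shows the image with the bounding boxes.

To calculate the IoU, just call the intersection_over_union() function. Based on the bounding boxes, the IoU score is 0.54.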

iou, intersect, union = intersection_over_union(gt_box, pred_box)
print(iou, intersect, union)

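The call prints an IoU of roughly 0.54, with an intersection area of 350000 and a union area of 647000 (both in squared pixels).

An IoU score of 0.54 means that the intersection of the ground-truth and predicted boxes covers 54% of their union. Looking at the boxes, someone may feel that this is visually good enough to conclude that the model detected the cat object. Someone else may feel the model is not yet accurate, as the predicted box does not fit the ground-truth box well.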

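To judge objectively whether the model predicted the box location correctly, a threshold is used. If the model predicts a box with an IoU score greater than or equal to the threshold, then there is a high overlap between the predicted box and one of the ground-truth boxes. This means the model was able to detect an object successfully. The detected region is classified as positive (i.e. contains an object).

On the other hand, when the IoU score is smaller than the threshold, the model made a bad prediction because the predicted box does not overlap enough with the ground-truth box. This means the detected region is classified as negative (i.e. does not contain an object).

Let's take an example to clarify how IoU scores help classify a region as an object. Assume the object detection model is fed the next image, which has 2 target objects with their ground-truth boxes in red and the predicted boxes in yellow.
The next code reads the image (assumed to be named pets.jpg), draws the boxes, and calculates the IoU for each object. The IoU for the left object is 0.76, while the other object has an IoU score of 0.26.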

import matplotlib.pyplot
import matplotlib.patches
import imageio

def intersection_over_union(gt_box, pred_box):
    inter_box_top_left = [max(gt_box[0], pred_box[0]), max(gt_box[1], pred_box[1])]
    inter_box_bottom_right = [min(gt_box[0]+gt_box[2], pred_box[0]+pred_box[2]), min(gt_box[1]+gt_box[3], pred_box[1]+pred_box[3])]

    # Clamp at 0 so that boxes that do not overlap give an empty intersection.
    inter_box_w = max(0, inter_box_bottom_right[0] - inter_box_top_left[0])
    inter_box_h = max(0, inter_box_bottom_right[1] - inter_box_top_left[1])

    intersection = inter_box_w * inter_box_h
    union = gt_box[2] * gt_box[3] + pred_box[2] * pred_box[3] - intersection
    
    iou = intersection / union

    return iou, intersection, union

im = imageio.imread("pets.jpg")

gt_box = [10, 130, 370, 350]
pred_box = [30, 100, 370, 350]

iou, intersect, union = intersection_over_union(gt_box, pred_box)
print(iou, intersect, union)

fig, ax = matplotlib.pyplot.subplots(1)
ax.imshow(im)

gt_rect = matplotlib.patches.Rectangle((gt_box[0], gt_box[1]),
                                       gt_box[2],
                                       gt_box[3],
                                       linewidth=5,
                                       edgecolor='r',
                                       facecolor='none')

pred_rect = matplotlib.patches.Rectangle((pred_box[0], pred_box[1]),
                                         pred_box[2],
                                         pred_box[3],
                                         linewidth=5,
                                         edgecolor=(1, 1, 0),
                                         facecolor='none')
ax.add_patch(gt_rect)
ax.add_patch(pred_rect)

gt_box = [645, 130, 310, 320]
pred_box = [500, 60, 310, 320]

iou, intersect, union = intersection_over_union(gt_box, pred_box)
print(iou, intersect, union)

gt_rect = matplotlib.patches.Rectangle((gt_box[0], gt_box[1]),
                                       gt_box[2],
                                       gt_box[3],
                                       linewidth=5,
                                       edgecolor='r',
                                       facecolor='none')

pred_rect = matplotlib.patches.Rectangle((pred_box[0], pred_box[1]),
                                         pred_box[2],
                                         pred_box[3],
                                         linewidth=5,
                                         edgecolor=(1, 1, 0),
                                         facecolor='none')
ax.add_patch(gt_rect)
ax.add_patch(pred_rect)

ax.axes.get_xaxis().set_ticks([])
ax.axes.get_yaxis().set_ticks([])

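Given an IoU threshold of 0.6, only the regions with IoU scores greater than or equal to 0.6 are classified as positive (i.e. containing an object). Thus, the box with an IoU score of 0.76 is positive, while the other box with an IoU of 0.26 is negative.

If the threshold is changed to 0.2 rather than 0.6, then both predictions are positive. If the threshold is 0.8, then both predictions are negative.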
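The three cases can be checked in a few lines. The sketch below takes the two IoU scores reported above as given; classify_by_iou() is a hypothetical helper that applies the same "greater than or equal to" rule.

```python
# Sketch: label each detection as positive or negative from its IoU score.
ious = [0.76, 0.26]  # IoU scores of the left and right objects, as reported above

def classify_by_iou(ious, threshold):
    return ["positive" if iou >= threshold else "negative" for iou in ious]

at_02 = classify_by_iou(ious, 0.2)
at_06 = classify_by_iou(ious, 0.6)
at_08 = classify_by_iou(ious, 0.8)
for threshold, labels in ((0.2, at_02), (0.6, at_06), (0.8, at_08)):
    print(threshold, labels)
```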

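As a summary, the IoU score measures how close the predicted box is to the ground-truth box. It ranges from 0.0 to 1.0, where 1.0 is the best result. When the IoU is greater than or equal to the threshold, the box is classified as positive because it surrounds an object. Otherwise, it is classified as negative.

Usually, object detection models are evaluated with different IoU thresholds, where each threshold may give different predictions from the other thresholds. Assume that the model is fed an image that has 10 objects distributed across 2 classes. How is the mAP calculated?

To calculate the mAP, start by calculating the AP for each class. The mean of the APs over all classes is the mAP.

Assume the dataset used has only 2 classes. For the first class, here are the ground-truth labels and predicted scores in the y_true and pred_scores variables, respectively.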

y_true = ["positive", "negative", "positive", "negative", "positive", "positive", "positive", "negative", "positive", "negative"]

pred_scores = [0.7, 0.3, 0.5, 0.6, 0.55, 0.9, 0.75, 0.2, 0.8, 0.3]

Here are the y_true and pred_scores variables of the second class.

y_true = ["negative", "positive", "positive", "negative", "negative", "positive", "positive", "positive", "negative", "positive"]

pred_scores = [0.32, 0.9, 0.5, 0.1, 0.25, 0.9, 0.55, 0.3, 0.35, 0.85]

The list of IoU thresholds runs from 0.2 up to (but not including) 0.9 with a step of 0.05.

thresholds = numpy.arange(start=0.2, stop=0.9, step=0.05)

To calculate the AP for a class, just feed its y_true and pred_scores variables to the next code.

precisions, recalls = precision_recall_curve(y_true=y_true, 
                                             pred_scores=pred_scores, 
                                             thresholds=thresholds)

matplotlib.pyplot.plot(recalls, precisions, linewidth=4, color="red", zorder=0)

matplotlib.pyplot.xlabel("Recall", fontsize=12, fontweight='bold')
matplotlib.pyplot.ylabel("Precision", fontsize=12, fontweight='bold')
matplotlib.pyplot.title("Precision-Recall Curve", fontsize=15, fontweight="bold")
matplotlib.pyplot.show()

precisions.append(1)
recalls.append(0)

precisions = numpy.array(precisions)
recalls = numpy.array(recalls)

AP = numpy.sum((recalls[:-1] - recalls[1:]) * precisions[:-1])
print(AP)

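For the first class, here is its precision-recall curve. Based on this curve, the AP is 0.949.

The precision-recall curve of the second class is shown below. Its AP is 0.958.

Based on the APs of the 2 classes (0.949 and 0.958), the mAP of the object detection model is calculated according to the next equation, where n is the number of classes and AP_k is the AP of class k.

mAP = (1/n) × Σ_{k=1}^{n} AP_k

Based on this equation, the mAP is 0.9535.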

mAP = (0.949 + 0.958)/2 = 0.9535
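The whole calculation for both classes can be put together in one short script. The following is a sketch in plain Python rather than the article's Scikit-learn code: average_precision_for_class() and the class names are hypothetical, and the thresholds are built with round() instead of numpy.arange() so that each threshold compares exactly with the two-decimal scores.

```python
# Sketch: per-class AP and mAP for the two classes above (plain Python).
def average_precision_for_class(y_true, pred_scores, thresholds, pos_label="positive"):
    n_pos = sum(1 for t in y_true if t == pos_label)
    precisions, recalls = [], []
    for threshold in thresholds:
        # Ground-truth labels of the samples predicted positive at this threshold.
        kept = [t for t, s in zip(y_true, pred_scores) if s >= threshold]
        tp = sum(1 for t in kept if t == pos_label)
        # Assumes every threshold keeps at least one sample (true for this data).
        precisions.append(tp / len(kept))
        recalls.append(tp / n_pos)
    # Closing point of the curve: precision 1 at recall 0.
    precisions.append(1.0)
    recalls.append(0.0)
    return sum((recalls[k] - recalls[k + 1]) * precisions[k]
               for k in range(len(recalls) - 1))

# 0.2, 0.25, ..., 0.85 (14 thresholds, 0.9 excluded).
thresholds = [round(0.2 + 0.05 * i, 2) for i in range(14)]

classes = {
    "class_1": (["positive", "negative", "positive", "negative", "positive",
                 "positive", "positive", "negative", "positive", "negative"],
                [0.7, 0.3, 0.5, 0.6, 0.55, 0.9, 0.75, 0.2, 0.8, 0.3]),
    "class_2": (["negative", "positive", "positive", "negative", "negative",
                 "positive", "positive", "positive", "negative", "positive"],
                [0.32, 0.9, 0.5, 0.1, 0.25, 0.9, 0.55, 0.3, 0.35, 0.85]),
}

ap_per_class = {name: average_precision_for_class(labels, scores, thresholds)
                for name, (labels, scores) in classes.items()}
mAP = sum(ap_per_class.values()) / len(ap_per_class)
print({name: round(ap, 3) for name, ap in ap_per_class.items()})
print(round(mAP, 3))
```

Because the script averages the unrounded APs, its mAP can differ in the fourth decimal place from the 0.9535 obtained above from the rounded values.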

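This tutorial discussed how to calculate the mean average precision (mAP) for an object detection model. We started by discussing how to convert a prediction score into a class label. Using different thresholds, a precision-recall curve is created. From that curve, the average precision (AP) is measured.

For an object detection model, the threshold is the IoU used to score the detected objects. Once the AP is measured for each class in the dataset, the mAP is calculated.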

