From Paper to Code: Implementing RoIPooling

RoI Pooling
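Once we have the feature map and the proposals, each proposal is projected onto the feature map and then resized, so that every region yields a feature map of the same size. In Faster RCNN, region proposals are used to predict whether an object is foreground or background, which is the job of the class head, while the regression head learns offsets relative to the anchors, that is, the shift of the center and the scaling of the width and height.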

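During projection, the size and position of a proposal are expressed relative to the input image, not relative to the feature map. They first have to be converted into the proposal's position on the feature map, and only then is the extracted region resized.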

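Given a feature map and a set of proposals, RoI pooling returns the pooled feature representation. The region proposal network is used to predict objectness and the box regression offsets (relative to the anchors). These offsets are combined with the anchors to generate the proposals. The proposals are usually at the scale of the input image rather than the scale of the feature layer, so they need to be scaled down to the feature-map level so that the CNN layers that follow can extract features from them.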
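To make "offsets combined with anchors" concrete, here is a minimal sketch of decoding one box. It assumes the usual Faster R-CNN parameterization (center shifts scaled by the anchor size, log-space width and height); the function name `decode_box` and the sample numbers are hypothetical and not taken from the article's code.

```python
import math

def decode_box(anchor, deltas):
    """Apply regression offsets to one anchor.

    anchor: (cx, cy, w, h) in input-image pixels.
    deltas: (tx, ty, tw, th) as predicted by the regression head.
    Returns the proposal as corner coordinates (x1, y1, x2, y2).
    """
    cx_a, cy_a, w_a, h_a = anchor
    tx, ty, tw, th = deltas
    # center shift is expressed as a fraction of the anchor size
    cx = cx_a + tx * w_a
    cy = cy_a + ty * h_a
    # width and height are scaled in log space
    w = w_a * math.exp(tw)
    h = h_a * math.exp(th)
    return (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)

# hypothetical anchor and offsets
proposal = decode_box((100.0, 100.0, 64.0, 32.0), (0.25, -0.5, 0.0, 0.0))
print(proposal)  # (84.0, 68.0, 148.0, 100.0)
```

The result is still in input-image pixels, which is why the projection step described next is needed.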

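On the original image we have the box's measurements, namely the coordinates of the proposal's center point and its width and height. To project the box, we first divide its coordinates on the original image by the downsampling factor, here 32x downsampling; if a coordinate is not evenly divisible, it is rounded to an integer.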
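The projection step can be sketched in a few lines. This is only an illustration: the helper `project_box` is hypothetical, it takes the box in corner form (x1, y1, x2, y2), and it rounds up with `math.ceil` to match the `torch.ceil` used in the class in this article. Other implementations round differently.

```python
import math

def project_box(box, stride=32):
    """Map an (x1, y1, x2, y2) box from image pixels to feature-map cells."""
    scale = 1.0 / stride  # 32x downsampling corresponds to a scaling factor of 1/32
    return tuple(math.ceil(v * scale) for v in box)

fm_box = project_box((84, 68, 148, 100))
print(fm_box)  # (3, 3, 5, 4)
```

For example, 84 / 32 = 2.625 is not an integer, so it is rounded up to cell 3.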

import numpy as np
import torch
import torch.nn as nn
floattype = torch.cuda.FloatTensor
class TorchROIPool(object):
    def __init__(self, output_size, scaling_factor):
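        # size (height and width) of the pooled output feature map
        self.output_size = output_size
        # ratio of feature-map size to input-image size, e.g. 1/32
        self.scaling_factor = scaling_factor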
    def _roi_pool(self, features):
        """
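        Pool the extracted region features into a fixed-size feature map.
        Args:
            features (torch.Tensor): region features of shape (C, H, W)
        """
        # number of channels, height and width of the extracted region
        num_channels, h, w = features.shape

        # stride of one output bin along each axis (may be fractional)
        w_stride = w/self.output_size
        h_stride = h/self.output_size
        # pooled maxima and the flat index of each maximum within its bin
        res = torch.zeros((num_channels, self.output_size, self.output_size))
        res_idx = torch.zeros((num_channels, self.output_size, self.output_size))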
        
        for i in range(self.output_size):
            for j in range(self.output_size):
                
                # floor the start and ceil the end of each bin, then convert
                # to int, so that the bins together cover the whole region
                w_start = int(np.floor(j*w_stride))
                w_end = int(np.ceil((j+1)*w_stride))
                h_start = int(np.floor(i*h_stride))
                h_end = int(np.ceil((i+1)*h_stride))
                # clamp start and end to the bounds of the region
                w_start = min(max(w_start, 0), w)
                w_end = min(max(w_end, 0), w)
                h_start = min(max(h_start, 0), h)
                h_end = min(max(h_end, 0), h)
                patch = features[:, h_start: h_end, w_start: w_end]
                max_val, max_idx = torch.max(patch.reshape(num_channels, -1), dim=1)
                res[:, i, j] = max_val
                res_idx[:, i, j] = max_idx
        return res, res_idx
    def __call__(self, feature_layer, proposals):
        """Given a feature layer and a set of proposals, return pooled
        representations of the proposals. Proposals are scaled by the scaling
        factor before pooling.
        Args:
            feature_layer (torch.Tensor): feature map of shape (B, C, H, W);
            only the first image in the batch is used
            proposals (torch.Tensor): float tensor of shape (N, 4); each row is
            a bounding box as (x1, y1, x2, y2) in input-image coordinates
        Returns:
            torch.Tensor: shape (num_proposals, num_channels, output_size,
            output_size)
        """
        batch_size, num_channels, _, _ = feature_layer.shape
        # first scale proposals based on self.scaling factor 
        scaled_proposals = torch.zeros_like(proposals)
        # the rounding by torch.ceil is important for ROI pool
        scaled_proposals[:, 0] = torch.ceil(proposals[:, 0] * self.scaling_factor)
        scaled_proposals[:, 1] = torch.ceil(proposals[:, 1] * self.scaling_factor)
        scaled_proposals[:, 2] = torch.ceil(proposals[:, 2] * self.scaling_factor)
        scaled_proposals[:, 3] = torch.ceil(proposals[:, 3] * self.scaling_factor)
        res = torch.zeros((len(proposals), num_channels, self.output_size,
                        self.output_size))
        res_idx = torch.zeros((len(proposals), num_channels, self.output_size,
                        self.output_size))
        
        # iterate over the proposals
        for idx in range(len(proposals)):
            # scaled (x1, y1, x2, y2) of the current proposal
            proposal = scaled_proposals[idx]
            # convert to Python ints; torch.int8 would overflow above 127
            x1, y1, x2, y2 = (int(v) for v in proposal)
            # adding 1 to include the end indices from proposal
            extracted_feat = feature_layer[0, :, y1:y2 + 1, x1:x2 + 1]
            res[idx], res_idx[idx] = self._roi_pool(extracted_feat)
        return res
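To see how the floor/ceil bin boundaries in `_roi_pool` behave when the region size is not divisible by the output size, here is a small worked example. It is a simplified sketch: it uses a single-channel NumPy array instead of a torch tensor, and the helper name `max_pool_region` is hypothetical.

```python
import numpy as np

def max_pool_region(region, output_size):
    """Max-pool a 2-D region into an (output_size, output_size) grid."""
    h, w = region.shape
    h_stride, w_stride = h / output_size, w / output_size
    out = np.empty((output_size, output_size), dtype=region.dtype)
    for i in range(output_size):
        # floor the start and ceil the end, so neighbouring bins may overlap
        h0 = int(np.floor(i * h_stride))
        h1 = int(np.ceil((i + 1) * h_stride))
        for j in range(output_size):
            w0 = int(np.floor(j * w_stride))
            w1 = int(np.ceil((j + 1) * w_stride))
            out[i, j] = region[h0:h1, w0:w1].max()
    return out

# a 5x7 region whose values increase row by row
region = np.arange(35).reshape(5, 7)
pooled = max_pool_region(region, 2)
print(pooled.tolist())  # [[17, 20], [31, 34]]
```

With a 5x7 region and a 2x2 output, the row bins are 0:3 and 2:5 and the column bins are 0:4 and 3:7, so row 2 and column 3 are each shared by two bins. No cell of the region is skipped, which is the point of rounding the start down and the end up.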
