Residual, BottleNeck, Inverted Residual, and MBConv Explained, with PyTorch Implementations

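After the previous ConvNext article, some readers asked about the difference between BottleNeck and Inverted Residual. So I found this article, which explains in detail several commonly used convolution blocks. Consider it striking while the iron is hot.

Before introducing these concepts, let's create a generic conv-norm-act layer, which is the most basic convolution block.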

from functools import partial

import torch
from torch import nn


class ConvNormAct(nn.Sequential):
    def __init__(
        self,
        in_features: int,
        out_features: int,
        kernel_size: int,
        norm: nn.Module = nn.BatchNorm2d,
        act: nn.Module = nn.ReLU,
        **kwargs
    ):
        super().__init__(
            nn.Conv2d(
                in_features,
                out_features,
                kernel_size=kernel_size,
                # keeps H and W unchanged for odd kernel sizes at stride 1
                padding=kernel_size // 2,
                # forward extra Conv2d arguments such as groups or stride
                **kwargs
            ),
            norm(out_features),
            act(),
        )


Conv1X1BnReLU = partial(ConvNormAct, kernel_size=1)
Conv3X3BnReLU = partial(ConvNormAct, kernel_size=3)

x = torch.randn((1, 32, 56, 56))

Conv1X1BnReLU(32, 64)(x).shape

# torch.Size([1, 64, 56, 56])

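ResNet introduced and used residual connections. The idea is to add a layer's input to its output: output = layer(input) + input. The figure below helps visualize it. In the end, it is just a + operator. The residual operation improves the ability of gradients to propagate across many layers, which makes it possible to train networks with more than a hundred layers effectively.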
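To get a feel for why the identity path helps gradients, here is a toy scalar sketch. It is an illustration under a strong assumption (every "layer" is just f(x) = w * x with the same small weight w), not the analysis from the ResNet paper. In a plain stack the chain rule multiplies by w at every layer; with output = layer(input) + input each layer contributes (w + 1) instead.

```python
# Toy scalar model (an assumption for illustration): every layer is f(x) = w * x.


def plain_gradient(w: float, depth: int) -> float:
    """d(output)/d(input) for a plain stack: the chain rule multiplies by w per layer."""
    grad = 1.0
    for _ in range(depth):
        grad *= w
    return grad


def residual_gradient(w: float, depth: int) -> float:
    """Same stack with y = f(x) + x: each layer contributes (w + 1) instead of w."""
    grad = 1.0
    for _ in range(depth):
        grad *= w + 1.0
    return grad


plain = plain_gradient(0.01, 100)        # 0.01 ** 100, vanishingly small
residual = residual_gradient(0.01, 100)  # 1.01 ** 100, stays of order 1
print(plain, residual)
```

In this toy setting the plain gradient collapses towards zero, while the residual one stays usable because the identity term always passes the gradient through.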

In PyTorch, we can easily create a ResidualAdd layer:

from torch import nn
from torch import Tensor


class ResidualAdd(nn.Module):
    def __init__(self, block: nn.Module):
        super().__init__()
        self.block = block

    def forward(self, x: Tensor) -> Tensor:
        res = x
        x = self.block(x)
        # add the untouched input back onto the block's output
        x += res
        return x


ResidualAdd(
    nn.Conv2d(32, 32, kernel_size=1)
)(x).shape

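Sometimes the residual does not have the same dimensions as the output, so the two cannot be added. In that case we project the input with a conv in the shortcut (the colored arrow with the +) so that it matches the output's features.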

from typing import Optional


class ResidualAdd(nn.Module):
    def __init__(self, block: nn.Module, shortcut: Optional[nn.Module] = None):
        super().__init__()
        self.block = block
        self.shortcut = shortcut

    def forward(self, x: Tensor) -> Tensor:
        res = x
        x = self.block(x)
        # project the input only when a shortcut module was provided
        if self.shortcut is not None:
            res = self.shortcut(res)
        x += res
        return x


ResidualAdd(
    nn.Conv2d(32, 64, kernel_size=1),
    shortcut=nn.Conv2d(32, 64, kernel_size=1)
)(x).shape

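Bottleneck blocks were also introduced in Deep Residual Learning for Image Recognition. A BottleNeck block takes an input of size BxCxHxW. It first uses a 1×1 conv to reduce it to BxC/rxHxW, then applies a 3×3 conv, and finally uses another 1×1 conv to map the output back to the same feature dimension as the input, BxCxHxW. This is much faster than using three 3×3 convs. Because the middle layer reduces the input dimension, the block is called a "BottleNeck". The figure below visualizes the block; the original implementation uses r=4.

The first two convs are each followed by batchnorm and a non-linear activation, and there is one more non-linear activation after the addition.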
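The saving can be made concrete by counting conv weights. The sketch below is a back-of-the-envelope calculation that ignores biases and batchnorm parameters; the channel width of 256 is a hypothetical value picked for illustration, and the helper names are my own.

```python
def conv_weights(in_ch: int, out_ch: int, kernel: int) -> int:
    """Weight count of a dense conv layer, ignoring bias and norm parameters."""
    return in_ch * out_ch * kernel * kernel


def bottleneck_weights(channels: int, reduction: int = 4) -> int:
    narrow = channels // reduction
    return (
        conv_weights(channels, narrow, 1)    # 1x1: wide -> narrow
        + conv_weights(narrow, narrow, 3)    # 3x3: narrow -> narrow
        + conv_weights(narrow, channels, 1)  # 1x1: narrow -> wide
    )


def three_3x3_weights(channels: int) -> int:
    """Three dense 3x3 convs that all keep the full channel width."""
    return 3 * conv_weights(channels, channels, 3)


c = 256  # hypothetical channel width chosen for illustration
print(bottleneck_weights(c), three_3x3_weights(c))
```

With these assumptions the bottleneck holds 69,632 weights against 1,769,472 for the three full-width 3×3 convs, roughly a 25x reduction, because the expensive 3×3 conv only ever sees C/r channels.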

from torch import nn


class BottleNeck(nn.Sequential):
    def __init__(self, in_features: int, out_features: int, reduction: int = 4):
        reduced_features = out_features // reduction
        super().__init__(
            nn.Sequential(
                ResidualAdd(
                    nn.Sequential(
                        # wide -> narrow
                        Conv1X1BnReLU(in_features, reduced_features),
                        # narrow -> narrow
                        Conv3X3BnReLU(reduced_features, reduced_features),
                        # narrow -> wide
                        Conv1X1BnReLU(reduced_features, out_features, act=nn.Identity),
                    ),
                    shortcut=Conv1X1BnReLU(in_features, out_features)
                    if in_features != out_features
                    else None,
                ),
                nn.ReLU(),
            )
        )


BottleNeck(32, 64)(x).shape

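Note that the shortcut is applied only when the input and output feature dimensions differ.

In practice, stride=2 is used in the middle conv when we want to reduce the spatial dimensions.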
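The effect of the stride on the spatial size follows the usual convolution output-size formula (dilation 1). A small sketch, with a helper name of my own choosing, applied to the 56×56 input used throughout this article:

```python
def conv_out_size(size: int, kernel: int, stride: int = 1, padding: int = 0) -> int:
    """Spatial output size of a convolution with dilation 1."""
    return (size + 2 * padding - kernel) // stride + 1


same = conv_out_size(56, kernel=3, stride=1, padding=1)    # stride 1 keeps 56
halved = conv_out_size(56, kernel=3, stride=2, padding=1)  # stride 2 gives 28
print(same, halved)
```

So a 3×3 conv with padding 1 keeps the 56×56 map at stride 1 and halves it to 28×28 at stride 2.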

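Linear bottlenecks were introduced in MobileNetV2: Inverted Residuals and Linear Bottlenecks. A linear bottleneck block is a bottleneck block without the last activation. Section 3.2 of the paper explains in detail why a non-linearity before the output hurts performance. In short: a non-linear function like ReLU sets everything < 0 to 0, which destroys information. The authors show empirically that this holds when the input has fewer channels than the output. So all we need to do is remove the nn.ReLU in BottleNeck.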
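The information loss is easy to see on a tiny example. The sketch below is plain Python with made-up feature vectors, not code from the paper: two different inputs become indistinguishable after ReLU, so nothing downstream can tell them apart.

```python
def relu(values):
    """Element-wise ReLU: negative entries are clamped to zero."""
    return [max(0.0, v) for v in values]


a = [-1.0, 2.0]  # made-up feature vector
b = [-3.0, 2.0]  # differs from a only in the negative entry

collapsed = relu(a) == relu(b)
print(relu(a), relu(b), collapsed)
```

Both vectors map to [0.0, 2.0]. With few channels there is little redundancy to recover what was clamped, which is the intuition behind keeping the narrow output of the block linear.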

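MobileNetV2 also introduced inverted residuals. An Inverted Residual block is an inverted BottleNeck layer: the first conv expands the number of features instead of reducing it. The figure below should make this clear.

We go from BxCxHxW -> BxCexHxW -> BxCexHxW -> BxCxHxW, where e is the expansion ratio, set to 4 by default. Instead of going wide -> narrow -> wide like a normal bottleneck block, it does the opposite: narrow -> wide -> narrow.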
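The two width patterns can be traced side by side. This is only a bookkeeping sketch with hypothetical helper names; it lists the channel count at the block input, after each of the first two convs, and at the block output.

```python
def bottleneck_widths(channels: int, reduction: int = 4) -> list:
    """Channel counts through a BottleNeck: wide -> narrow -> narrow -> wide."""
    narrow = channels // reduction
    return [channels, narrow, narrow, channels]


def inverted_residual_widths(channels: int, expansion: int = 4) -> list:
    """Channel counts through an Inverted Residual: narrow -> wide -> wide -> narrow."""
    wide = channels * expansion
    return [channels, wide, wide, channels]


print(bottleneck_widths(64))         # [64, 16, 16, 64]
print(inverted_residual_widths(64))  # [64, 256, 256, 64]
```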

class InvertedResidual(nn.Sequential):
    def __init__(self, in_features: int, out_features: int, expansion: int = 4):
        expanded_features = in_features * expansion
        super().__init__(
            nn.Sequential(
                ResidualAdd(
                    nn.Sequential(
                        # narrow -> wide
                        Conv1X1BnReLU(in_features, expanded_features),
                        # wide -> wide
                        Conv3X3BnReLU(expanded_features, expanded_features),
                        # wide -> narrow
                        Conv1X1BnReLU(expanded_features, out_features, act=nn.Identity),
                    ),
                    shortcut=Conv1X1BnReLU(in_features, out_features)
                    if in_features != out_features
                    else None,
                ),
                nn.ReLU(),
            )
        )


InvertedResidual(32, 64)(x).shape

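In MobileNet, the residual connection is applied only when the input and output features match. When they differ, the block has no skip connection at all, rather than the projection shortcut used earlier.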

class MobileNetLikeBlock(nn.Sequential):
    def __init__(self, in_features: int, out_features: int, expansion: int = 4):
        # use ResidualAdd if features match, otherwise a normal Sequential
        residual = ResidualAdd if in_features == out_features else nn.Sequential
        expanded_features = in_features * expansion
        super().__init__(
            nn.Sequential(
                residual(
                    nn.Sequential(
                        # narrow -> wide
                        Conv1X1BnReLU(in_features, expanded_features),
                        # wide -> wide
                        Conv3X3BnReLU(expanded_features, expanded_features),
                        # wide -> narrow
                        Conv1X1BnReLU(expanded_features, out_features, act=nn.Identity),
                    ),
                ),
                nn.ReLU(),
            )
        )


MobileNetLikeBlock(32, 64)(x).shape
MobileNetLikeBlock(32, 32)(x).shape

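After MobileNetV2, its building block came to be called MBConv. An MBConv is an inverted linear bottleneck layer with depth-wise separable convolution. That sounds convoluted, but it simply combines the blocks introduced above.

1. Depth-Wise Separable Convolutions

Depth-wise separable convolutions are a trick to reduce the number of parameters: a normal 3×3 conv is split into two convs. The first applies a single 3×3 kernel to each input channel, and the second applies a 1×1 kernel across all channels. Together they play the same role as a normal 3×3 conv, with far fewer parameters.

The saving is somewhat moot in practice, though, because on the hardware we currently have it runs much slower than a normal 3×3 conv.

The different colors in the channels represent a separate kernel (filter) applied to each channel.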

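class DepthWiseSeparableConv(nn.Sequential):
    def __init__(self, in_features: int, out_features: int):
        super().__init__(
            # depth-wise: one 3x3 kernel per input channel (padding=1 keeps H and W)
            nn.Conv2d(in_features, in_features, kernel_size=3, padding=1, groups=in_features),
            # point-wise: 1x1 conv that mixes information across channels
            nn.Conv2d(in_features, out_features, kernel_size=1),
        )


DepthWiseSeparableConv(32, 64)(x).shape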

Let's see how many parameters this saves:

sum(p.numel() for p in DepthWiseSeparableConv(32, 64).parameters() if p.requires_grad)
# 2432

Now compare with a normal Conv2d:

sum(p.numel() for p in nn.Conv2d(32, 64, kernel_size=3).parameters() if p.requires_grad)
# 18496

That is a big difference.
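Both counts can be reproduced by hand. The sketch below uses a small helper of my own (not a PyTorch API) that counts weights plus biases of a 2D conv, where a grouped conv connects each output channel to only in_ch / groups input channels.

```python
def conv2d_params(in_ch: int, out_ch: int, kernel: int, groups: int = 1, bias: bool = True) -> int:
    """Trainable parameters of a 2D conv: grouped weights plus one bias per output channel."""
    weights = (in_ch // groups) * out_ch * kernel * kernel
    return weights + (out_ch if bias else 0)


depthwise = conv2d_params(32, 32, kernel=3, groups=32)  # 32 * 1 * 9 + 32 = 320
pointwise = conv2d_params(32, 64, kernel=1)             # 32 * 64 + 64 = 2112
separable = depthwise + pointwise                       # 2432
regular = conv2d_params(32, 64, kernel=3)               # 32 * 64 * 9 + 64 = 18496

print(separable, regular, round(regular / separable, 1))
```

For this layer the separable version needs roughly 7.6 times fewer parameters than the regular 3×3 conv.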

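2. Implementing MBConv

Now we can create a complete MBConv. A few details matter: normalization is applied to both the depth-wise and the point-wise convolutions, while the non-linearity is applied only to the depth-wise convolution (remember linear bottlenecks). The activation used is ReLU6. Let's put everything together.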

class MBConv(nn.Sequential):
    def __init__(self, in_features: int, out_features: int, expansion: int = 4):
        # use ResidualAdd if features match, otherwise a normal Sequential
        residual = ResidualAdd if in_features == out_features else nn.Sequential
        expanded_features = in_features * expansion
        super().__init__(
            nn.Sequential(
                residual(
                    nn.Sequential(
                        # narrow -> wide
                        Conv1X1BnReLU(
                            in_features,
                            expanded_features,
                            act=nn.ReLU6,
                        ),
                        # wide -> wide, depth-wise because groups == channels
                        Conv3X3BnReLU(
                            expanded_features,
                            expanded_features,
                            groups=expanded_features,
                            act=nn.ReLU6,
                        ),
                        # here you can apply SE
                        # wide -> narrow
                        Conv1X1BnReLU(expanded_features, out_features, act=nn.Identity),
                    ),
                ),
                nn.ReLU(),
            )
        )


MBConv(32, 64)(x).shape

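EfficientNet also uses a modified version of this block, with Squeeze and Excitation.

EfficientNetV2: Smaller Models and Faster Training introduced Fused Inverted Residuals to make MBConv faster. To address the slow depth-wise convolution mentioned above, the first and second convs are fused into a single 3×3 conv (Section 3.2).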

class FusedMBConv(nn.Sequential):
    def __init__(self, in_features: int, out_features: int, expansion: int = 4):
        # use ResidualAdd if features match, otherwise a normal Sequential
        residual = ResidualAdd if in_features == out_features else nn.Sequential
        expanded_features = in_features * expansion
        super().__init__(
            nn.Sequential(
                residual(
                    nn.Sequential(
                        # narrow -> wide, expansion and 3x3 conv fused into one layer
                        Conv3X3BnReLU(
                            in_features,
                            expanded_features,
                            act=nn.ReLU6,
                        ),
                        # here you can apply SE
                        # wide -> narrow
                        Conv1X1BnReLU(expanded_features, out_features, act=nn.Identity),
                    ),
                ),
                nn.ReLU(),
            )
        )


FusedMBConv(32, 64)(x).shape
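It is worth noting what the fusion trades away. The rough count below covers conv weights only (no biases or batchnorm) and uses helper names of my own; it suggests the fused block is not smaller, so any speed-up has to come from replacing the depth-wise conv with a dense one that the hardware executes more efficiently.

```python
def mbconv_weights(in_ch: int, out_ch: int, expansion: int = 4) -> int:
    """Conv weights of MBConv: 1x1 expand, 3x3 depth-wise, 1x1 project."""
    wide = in_ch * expansion
    expand = in_ch * wide       # 1x1 dense conv
    depthwise = wide * 3 * 3    # one 3x3 kernel per expanded channel
    project = wide * out_ch     # 1x1 dense conv
    return expand + depthwise + project


def fused_mbconv_weights(in_ch: int, out_ch: int, expansion: int = 4) -> int:
    """Conv weights of Fused MBConv: dense 3x3 expand, 1x1 project."""
    wide = in_ch * expansion
    fused = in_ch * wide * 3 * 3  # dense 3x3 conv replaces expand + depth-wise
    project = wide * out_ch       # 1x1 dense conv
    return fused + project


print(mbconv_weights(32, 32), fused_mbconv_weights(32, 32))
```

For 32 input and output channels this gives 9,344 weights for MBConv against 40,960 for the fused variant.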

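This article covered how these basic convolution blocks work and how to code them. Their architectures come up again and again in CV, so reading the related papers is strongly recommended. If you are interested in the code for this article, see here: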

https://avoid.overfit.cn/post/af49b27f50bb416ca829b4987e902874

Author: Francesco Zuppichini
