Deep Learning: Attention Mechanisms in Convolutional Autoencoders and Hyperparameter Analysis with Linear Models

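One of the recent advances in neural network architecture design is the introduction of attention modules. First presented in NLP, the main idea behind attention is to add weight to the important parts of the data. For convolutional neural networks, one of the best-known attention mechanisms was proposed in the Convolutional Block Attention Module (CBAM), where the attention mechanism is split into two parts: a channel attention module and a spatial attention module.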

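The spatial attention module creates a mask over the feature space by decomposing the feature maps into two channels: a max pooling and an average pooling across the channel axis. These two channels are the input to a convolutional layer that applies a single filter and keeps the output the same size as the input. A sigmoid activation then turns the result into an activation map with values between 0 and 1. The input is multiplied by this map, which enhances the relevant spatial features.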

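import tensorflow as tf
from tensorflow.keras.layers import Layer, Conv2D


class SpatialAttention(Layer):
    '''
    Custom Spatial attention layer
    '''

    def __init__(self, **kwargs):
        # Forward standard Layer arguments (e.g. name) to the base class.
        super(SpatialAttention, self).__init__(**kwargs)

    def build(self, input_shapes):
        # Single filter; 'same' padding keeps the spatial size of the input.
        self.conv = Conv2D(filters=1, kernel_size=5, strides=1, padding='same')

    def call(self, inputs):
        # Pool across the channel axis (channels-last layout) into two maps.
        pooled_channels = tf.concat(
            [tf.math.reduce_max(inputs, axis=3, keepdims=True),
             tf.math.reduce_mean(inputs, axis=3, keepdims=True)],
            axis=3)

        scale = self.conv(pooled_channels)
        scale = tf.math.sigmoid(scale)  # mask with values in (0, 1)

        # The one-channel mask broadcasts over every channel of the input.
        return inputs * scale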
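To make the shapes in the layer above easier to follow, here is a minimal NumPy sketch of the same steps: channel-wise pooling, a single-filter convolution with 'same' padding, a sigmoid, and the final scaling. The function name `spatial_attention_numpy` and the random kernel are assumptions made for illustration only; in the Keras layer the kernel is learned during training.

```python
import numpy as np


def spatial_attention_numpy(inputs, kernel, bias=0.0):
    """Illustrative only. inputs: (batch, height, width, channels); kernel: (k, k, 2), k odd."""
    # Pool across channels: two maps, concatenated to (batch, height, width, 2).
    pooled = np.concatenate(
        [inputs.max(axis=3, keepdims=True),
         inputs.mean(axis=3, keepdims=True)],
        axis=3)

    # Zero padding so the mask keeps the height and width of the input.
    k = kernel.shape[0]
    pad = k // 2
    padded = np.pad(pooled, ((0, 0), (pad, pad), (pad, pad), (0, 0)))

    batch, height, width, _ = inputs.shape
    logits = np.empty((batch, height, width, 1))
    for i in range(height):
        for j in range(width):
            window = padded[:, i:i + k, j:j + k, :]  # (batch, k, k, 2)
            logits[:, i, j, 0] = (window * kernel).sum(axis=(1, 2, 3)) + bias

    mask = 1.0 / (1.0 + np.exp(-logits))  # sigmoid, values in (0, 1)
    return inputs * mask, mask


rng = np.random.default_rng(0)
feature_maps = rng.normal(size=(2, 8, 8, 16))    # toy feature maps
kernel = rng.normal(scale=0.1, size=(5, 5, 2))   # stand-in for the learned filter
scaled, mask = spatial_attention_numpy(feature_maps, kernel)
print(scaled.shape, mask.shape)  # (2, 8, 8, 16) (2, 8, 8, 1)
```

The mask has a single channel, so the same spatial weighting is applied to every channel of the input.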

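We can add it to a dense convolutional block to build the autoencoder model. An option can also be added to control whether the attention module is present.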

from tensorflow.keras.layers import Activation, BatchNormalization, Conv2D, Dropout


def MakeConvolutionBlock(X, Convolutions, BatchNorm=True, Drop=True, SpAttention=True, Act='relu'):
    '''
    Parameters
    ----------
    X : keras functional layer
        Previous layer in the model.
    Convolutions : int
        Number of convolutional filters.
    BatchNorm : bool, optional
        If True a batchnorm layer is added to the convolutional block.
        The default is True.
    Drop : bool, optional
        If True a Dropout layer is added to the model. The default is True.
    SpAttention : bool, optional
        If True a SpatialAttention layer is added to the model. The default is True.
    Act : string, optional
        Controls the kind of activation to be used. The default is 'relu'.

    Returns
    -------
    X : keras functional layer
        Block of layers added to the model.
    '''
    X = Conv2D(Convolutions, (3, 3), padding='same', use_bias=False)(X)

    # Each optional layer is switched on or off by its flag.
    if SpAttention:
        X = SpatialAttention()(X)

    if BatchNorm:
        X = BatchNormalization()(X)

    if Drop:
        X = Dropout(0.2)(X)

    X = Activation(Act)(X)

    return X

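As the number of parameters in a function grows, passing each of them on to the next function becomes a problem. Python's kwargs feature helps here: it collects keyword arguments into a dictionary and unpacks them into another function. Simply add **kwargs to every function that uses the same parameters as the main building block.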
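The forwarding pattern described above can be sketched in plain Python. The functions `make_block` and `make_coder` are hypothetical stand-ins for the building block and the coder; they only record the options they receive and build no Keras layers.

```python
def make_block(x, filters, batch_norm=True, drop=True, sp_attention=True, act='relu'):
    # Stand-in for the building block: records the options it received.
    return x + [(filters, batch_norm, drop, sp_attention, act)]


def make_coder(units, **kwargs):
    # kwargs is a plain dict; ** unpacks it back into keyword arguments,
    # so make_coder never has to list the block's options itself.
    x = []
    for filters in units:
        x = make_block(x, filters, **kwargs)
    return x


config = {'batch_norm': False, 'act': 'elu'}
layers = make_coder([8, 16], **config)
print(layers)
# [(8, False, True, True, 'elu'), (16, False, True, True, 'elu')]
```

Options missing from the dictionary fall back to the defaults of the building block, so a new option added to the block later needs no change in the outer function.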

from tensorflow.keras.layers import Activation, BatchNormalization, Conv2D, Conv2DTranspose, Input
from tensorflow.keras.models import Model


def MakeDenseConvolutionalCoder(InputShape, Units, BlockDepth, UpSampling=False, **kwargs):
    '''
    Parameters
    ----------
    InputShape : tuple
        Input shape of the images.
    Units : Array-like
        Number of convolutional filters to apply per block.
    BlockDepth : int
        Size of the concatenated convolutional block.
    UpSampling : bool, optional
        Controls the upsampling or downsampling behaviour of the network.
        The default is False.
    **kwargs
        keyword arguments from MakeConvolutionBlock.

    Returns
    -------
    InputFunction : Keras functional model input
        input of the network.
    localCoder : Keras functional model
        Coder model, main body of the autoencoder.
    '''
    # The decoder mirrors the encoder, so it walks the units in reverse order.
    if UpSampling:
        denseUnits = Units[::-1]
        Name = "Decoder"
    else:
        denseUnits = Units
        Name = "Encoder"

    nUnits = len(denseUnits)

    InputFunction = Input(shape=InputShape)
    X = Conv2D(denseUnits[0], (3, 3), padding='same', use_bias=False)(InputFunction)
    X = Activation('relu')(X)

    for k in range(1, nUnits - 1):
        # MakeDenseBlock is not shown in this article; kwargs are forwarded to it unchanged.
        X = MakeDenseBlock(X, denseUnits[k], BlockDepth, **kwargs)

        # A stride of 2 doubles (transposed) or halves the spatial size.
        if UpSampling:
            X = Conv2DTranspose(denseUnits[k], (3, 3), padding='same', use_bias=False, strides=(2, 2))(X)
        else:
            X = Conv2D(denseUnits[k], (3, 3), padding='same', use_bias=False, strides=(2, 2))(X)

    if UpSampling:
        # The decoder ends in a single-channel image with values in (0, 1).
        X = Conv2D(1, (3, 3), padding='same', use_bias=False)(X)
        X = BatchNormalization()(X)
        Output = Activation('sigmoid')(X)
    else:
        X = Conv2D(denseUnits[-1], (3, 3), padding='same', use_bias=False)(X)
        X = BatchNormalization()(X)
        Output = Activation('relu')(X)

    localCoder = Model(inputs=InputFunction, outputs=Output, name=Name)

    return InputFunction, localCoder

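The code above builds the main body of the autoencoder; adding a sampling layer between the encoder and the decoder turns it into a variational autoencoder. Trained on samples from the MNIST dataset, the model gives results like the following (figure not included).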
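The article does not show its sampling layer. A common choice in variational autoencoders is the reparameterization trick, sketched below in NumPy under that assumption. The function name `sample_latent` and the inputs `z_mean` and `z_log_var` are hypothetical; in a real model they would be two outputs of the encoder.

```python
import numpy as np


def sample_latent(z_mean, z_log_var, rng):
    # Reparameterization trick: z = mean + std * eps, with eps ~ N(0, 1).
    # The randomness lives in eps, so z stays differentiable with respect
    # to z_mean and z_log_var when the same formula is written in a framework.
    eps = rng.standard_normal(size=z_mean.shape)
    return z_mean + np.exp(0.5 * z_log_var) * eps


rng = np.random.default_rng(0)
z_mean = np.zeros((4, 2))      # toy encoder output: 4 samples, 2 latent dimensions
z_log_var = np.zeros((4, 2))   # log-variance 0 means unit variance
z = sample_latent(z_mean, z_log_var, rng)
print(z.shape)  # (4, 2)
```

Predicting the log-variance instead of the variance keeps the standard deviation positive without constraining the encoder output.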

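With the neural network architecture defined, the next step is to evaluate the remaining hyperparameters. As the number of hyperparameters grows, so does the complexity of the search space. Without clear differences in performance, the many possible parameter combinations can make interpretation difficult. A simple way to sidestep these problems is to fit a simple linear model to the performance data of models trained under different settings.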

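import numpy as np
import statsmodels.api as sm

# configs (one dict of block options per trained model), performanceA (one
# performance value per config) and activationNames come from the training
# runs and are not shown here. activationNames is assumed to follow the order
# of the Activation_* entries in names.
names = ['BatchNorm', 'Dropout', 'SpatialAttention', 'Activation_elu',
         'Activation_relu', 'Activation_selu', 'Activation_sigmoid']

container = []
for conf in configs:
    # Boolean flags become 0/1 columns.
    initial = [int(conf[ky]) for ky in ['BatchNorm', 'Drop', 'SpAttention']]
    # One-hot encode the activation function.
    current = [int(conf['Act'] == val) for val in activationNames]
    initial.extend(current)
    container.append(initial)

linearModel = sm.OLS(performanceA, np.array(container))
results = linearModel.fit()
results.summary(xname=names)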

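The coefficients of this linear model are straightforward to interpret: a positive coefficient means the performance value increases, and a negative one means it decreases. When the performance value is a reconstruction loss, a negative coefficient therefore indicates an improvement.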
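The sign reading described above can be checked on a toy example. The sketch below uses `np.linalg.lstsq` in place of statsmodels OLS to stay dependency-light, and every number in it is made up for illustration: `true_coefs` are invented coefficients used to generate noise-free toy losses, not results from the article.

```python
import itertools
import numpy as np

activations = ['elu', 'relu', 'selu', 'sigmoid']
names = ['BatchNorm', 'Dropout', 'SpatialAttention'] + ['Activation_' + a for a in activations]

# Invented coefficients (NOT results from the article), used to generate toy losses.
true_coefs = np.array([-0.10, -0.05, -0.20, 1.00, 0.90, 0.95, 1.30])

# Full factorial design: every on/off combination for every activation.
rows = []
for bn, drop, att, act in itertools.product([0, 1], [0, 1], [0, 1], activations):
    one_hot = [int(act == a) for a in activations]
    rows.append([bn, drop, att] + one_hot)

design = np.array(rows, dtype=float)   # 32 configurations x 7 columns
loss = design @ true_coefs             # noise-free toy reconstruction losses

# Ordinary least squares fit of loss on the design matrix.
coefs, *_ = np.linalg.lstsq(design, loss, rcond=None)
for name, coef in zip(names, coefs):
    print(f'{name:20s} {coef:+.2f}')
```

Because the toy losses contain no noise, the fit recovers the invented coefficients. The three flag coefficients are negative, so switching those layers on lowers the toy loss. The activation columns sum to one in every row, so their coefficients play the role of a per-activation baseline loss and not of an on/off effect.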

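This simple linear model shows that the three types of layers added to the main building block improve the model's performance, while changing the activation function moves the performance in the opposite direction. Even though the sample used to fit the linear model is small, it can steer the optimization effort in a specific direction.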

The full code for this article is available here:

https://www.overfit.cn/post/9...

Author: Octavio Gonzalez-Lugo