Deep Learning: Implementing Factorization Machines with Convolution

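This article shows how convolution can be used to implement a factorization machine. Convolutional networks have been widely and successfully applied in computer vision thanks to their inductive biases of locality and weight sharing. Convolution can also capture feature interactions among stacked categorical features of shape (B, num_cat, embedding_size) and stacked features of shape (B, num_features, embedding_size).

The figure below shows how a convolutional network creates interaction features.

The figure shows 5 categorical features that have already been embedded, giving a tensor of shape (batch_size, num_categorical=5, embedding_size). Suppose we have a convolution filter of height 3 and width 1. When we apply this filter along the num_categorical dimension (dim=1 of the input), as in the red box in the figure, each application convolves over 3 features, because the filter height is 3. Each individual output of the convolution is therefore an interaction between 3 categorical features. Sliding the filter along num_categorical captures the interaction of every rolling triple of features: each window of 3 adjacent features contributes one entry to the convolution output.

Because the filter has width 1, the rolling-window interaction of three features is computed independently for each embedding dimension, as shown by the red, blue, purple, and green boxes. The height of the convolution output is the total number of interaction features produced, which is 3 in this example (5 - 3 + 1). The width of the output equals the original embedding size, because the filter width is 1.

Because all features share the same embedding size, we can treat this use of a convolutional network as a factorization machine in which feature interactions are captured in a rolling-window fashion.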
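
As a rough sketch of the shape arithmetic described above (the tensor sizes and variable names are illustrative assumptions, not values from the original figure), a (3, 1) filter slid over five embedded features yields three rolling-window interactions per embedding column:

```python
import torch
import torch.nn as nn

# Assumed sizes for illustration: 5 embedded categorical features of size 8.
batch_size, num_categorical, embedding_size = 2, 5, 8
stacked = torch.randn(batch_size, num_categorical, embedding_size)

# nn.Conv2d expects (B, C, H, W), so add a channel dimension of 1.
x = stacked.unsqueeze(1)  # (2, 1, 5, 8)

# Height 3 spans three neighbouring features; width 1 keeps embedding columns independent.
conv = nn.Conv2d(in_channels=1, out_channels=1, kernel_size=(3, 1))
out = conv(x)

print(out.shape)  # torch.Size([2, 1, 3, 8])
```

The output height is 5 - 3 + 1 = 3 and the width stays at the embedding size, matching the description above.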

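We implement this in PyTorch, and first walk through padding, stride, and dilation in convolutional networks.

1. Padding

With "same" padding the input and output have the same size. The code below implements it by hand and compares the result with padding='same' in PyTorch.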

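 import math
 
 import torch
 import torch.nn as nn
 import torch.nn.functional as F
 
 
 class Conv2dSame(nn.Conv2d):
     """Conv2d that pads the input so the output size is ceil(input size / stride)."""
     def __init__(self, in_channels, out_channels, kernel_size, stride=1,
                  padding=0, dilation=1, groups=1, bias=True):
         # the padding argument is ignored: the parent is initialized with padding=0
         super(Conv2dSame, self).__init__(
             in_channels, out_channels, kernel_size, stride, 0, dilation,
             groups, bias)
         nn.init.xavier_uniform_(self.weight)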
     def forward(self, x):
         # input height and width
         ih, iw = x.size()[-2:]
         # filter height and width
         kh, kw = self.weight.size()[-2:]
         # output height
         oh = math.ceil(ih / self.stride[0])
         # output width
         ow = math.ceil(iw / self.stride[1])
         # 2* padding for height
         pad_h = max((oh - 1) * self.stride[0] + (kh - 1) * self.dilation[0] + 1 - ih, 0)
         # 2 * padding for width
         pad_w = max((ow - 1) * self.stride[1] + (kw - 1) * self.dilation[1] + 1 - iw, 0)
         # divide the paddings equally on both sides and pad equally
         # note the ordering of the paddings are reversed for height and width. (it is width then height in the code)
         if pad_h > 0 or pad_w > 0:
             x = F.pad(x, [pad_w // 2, pad_w - pad_w // 2, pad_h // 2, pad_h - pad_h // 2])
         
         # convolve the manually padded input (self.padding is 0 here)
         out = F.conv2d(x, self.weight, self.bias, self.stride,
                        self.padding, self.dilation, self.groups)
         return out
 
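 # sample input (the shape is an arbitrary choice for this check)
 x = torch.randn(10, 1, 5, 5)
 
 # self implementation
 conv_same = Conv2dSame(in_channels=1, out_channels=5, kernel_size=3, stride=1, dilation=2, padding=0)
 conv_same_out = conv_same(x)
 
 # pytorch
 conv_same_pt = nn.Conv2d(in_channels=1, out_channels=5, kernel_size=3, stride=1, dilation=2, padding='same')
 # share the parameters so both layers compute the same function
 conv_same_pt.weight = conv_same.weight
 conv_same_pt.bias = conv_same.bias
 conv_same_pt_out = conv_same_pt(x)
 
 # allclose rather than exact equality, to tolerate floating-point differences
 assert torch.allclose(conv_same_out, conv_same_pt_out, atol=1e-6)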

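Why padding is needed

The two most common padding types are (1) "valid" padding and (2) "same" padding, as shown in the figure above.

With "valid" padding, no padding is added, so convolving along a dimension produces an output that is smaller than the input (the "dropped" example above).

With "same" padding, the input is padded so that, at stride 1, the output dimension equals the input dimension (the "pad" example above).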
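
A minimal sketch of the two padding modes, assuming a toy 5x5 single-channel input and a 3x3 kernel (both chosen only for illustration):

```python
import torch
import torch.nn as nn

# Assumed toy input: one 5x5 single-channel feature map.
x = torch.randn(1, 1, 5, 5)

# "valid": no padding, so the 3x3 kernel only fits in 3 positions per dimension.
valid_conv = nn.Conv2d(1, 1, kernel_size=3, padding="valid")
# "same": zero-pad the input so the output keeps the 5x5 size (stride 1).
same_conv = nn.Conv2d(1, 1, kernel_size=3, padding="same")

print(valid_conv(x).shape)  # torch.Size([1, 1, 3, 3])
print(same_conv(x).shape)   # torch.Size([1, 1, 5, 5])
```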

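2. Stride

Stride is the step size with which the filter (kernel) slides over the input tensor.

A stride of 1 means the filter moves one element at a time, producing a dense computation. A stride greater than 1 means the filter skips elements as it moves, which subsamples the input. Stride directly affects the spatial dimensions of the output feature map: a larger stride gives a smaller output.

With a stride of 2 the output shrinks. We can verify this in PyTorch: setting the stride to 2 for both height and width (with kernel size 3 and padding 1) reduces height and width from 5 to 3. Note that different strides can be specified for height and width.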

 # sample data
 batch_size=10
 channel_in = 1
 height = 5
 width = 5
 x = torch.randn(batch_size, channel_in, height, width)
 x.shape # torch.Size([10, 1, 5, 5])
 
 # padded convolution with padding specified in nn.Conv2d
 pad_conv = nn.Conv2d(in_channels=1, out_channels=5, kernel_size=3, stride=2, dilation=1, padding=1)
 pad_conv(x).shape # torch.Size([10, 5, 3, 3])
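
As a small illustration of the note about per-dimension strides, here is a sketch that passes a tuple to stride; the particular values (2 along height, 1 along width) are an assumption chosen for the example:

```python
import torch
import torch.nn as nn

x = torch.randn(10, 1, 5, 5)

# stride=(2, 1): step 2 along height, step 1 along width
conv = nn.Conv2d(in_channels=1, out_channels=5, kernel_size=3,
                 stride=(2, 1), padding=1)

print(conv(x).shape)  # torch.Size([10, 5, 3, 5])
```

Only the height is subsampled (5 to 3); the width stays at 5 because its stride is 1.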

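3. Dilation

Dilation is the spacing between kernel elements during the convolution. A dilation of 1 means there are no gaps between kernel elements, which is the standard convolution. A dilation greater than 1 inserts gaps between kernel elements, effectively enlarging the receptive field of the convolution.

Dilation is typically used to enlarge the receptive field of a convolutional layer, capturing wider context without adding parameters. It changes how the kernel samples the input elements. It affects the output size only through the wider span of the kernel, which is d * (k - 1) + 1 for kernel size k and dilation d; with "same" padding the output size is unchanged.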
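
A small sketch of how dilation widens the span of the kernel without adding parameters. The helper conv_out_size is a hypothetical name introduced here; it follows the output-shape formula given in the nn.Conv2d documentation. Note that without compensating padding, the wider span also shortens the output:

```python
import torch
import torch.nn as nn


def conv_out_size(size, kernel, stride=1, padding=0, dilation=1):
    """Output length along one dimension (hypothetical helper for illustration)."""
    return (size + 2 * padding - dilation * (kernel - 1) - 1) // stride + 1


x = torch.randn(1, 1, 7, 7)  # assumed toy input
results = []
for dilation in (1, 2):
    conv = nn.Conv2d(1, 1, kernel_size=3, dilation=dilation)
    n_params = sum(p.numel() for p in conv.parameters())  # 9 weights + 1 bias
    span = dilation * (3 - 1) + 1                          # input rows covered by the kernel
    out_hw = tuple(conv(x).shape[-2:])
    results.append((dilation, span, n_params, out_hw))
    print(dilation, span, n_params, out_hw, conv_out_size(7, 3, dilation=dilation))

# 1 3 10 (5, 5) 5
# 2 5 10 (3, 3) 3
```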

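4. Flexible k-max pooling

Max pooling is very popular in computer vision: it reduces the computation a convolutional network needs and has proven successful at identifying the important features of an image. It works well in computer vision, but we cannot use it as-is in a recommender system. Taking only the maximum over a (height, width) window is not meaningful here, because an interaction feature with a large value would appear repeatedly in the pooling output: the filter slides across the input, so each input can show up in several outputs.

Instead we can extend the pooling operation, on the premise that large interaction values matter more than small ones, and use flexible p-max pooling, which keeps only the top-k largest features from each convolutional layer's output. k is determined by the depth of the layer and shrinks as the depth increases. This mirrors max pooling in convolutional networks, where the pooled output is smaller than the input.

The pooling size p_i for each layer is computed by the following code: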

 conv_filters = [100,100,4,5]
 length_conv = len(conv_filters)
 n = 10
 conv_width = [3,5,7]
 for i in range(length_conv):
     if i != length_conv-1:
         p_i = int((1- (i-length_conv) ** (i-length_conv)) * n)
     else:
         p_i = 3
     print(p_i)
 
 
 # 9
 # 10
 # 7
 # 3

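This formula is not perfect. p_i generally tends to decrease, but it can also increase (here from 9 to 10), which is why the code must make sure p_i never increases. It is also possible to get p_i == 0 when n == 1, and the code has to handle that case as well.

We implement k-max pooling in PyTorch, selecting the top-k values along the number-of-features axis: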
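
One way to handle both problems is sketched below. The function name pooling_sizes and the clamping strategy (a floor of 1 plus a running minimum) are assumptions made for this example; the expression for the raw value is copied unchanged from the snippet above:

```python
def pooling_sizes(num_layers, n, last_k=3):
    """Return one pooling size per layer, clamped to be >= 1 and non-increasing."""
    sizes = []
    upper = n  # a layer cannot keep more rows than its input has
    for i in range(num_layers):
        if i != num_layers - 1:
            # same expression as in the snippet above
            raw = int((1 - (i - num_layers) ** (i - num_layers)) * n)
        else:
            raw = last_k
        k = min(upper, max(1, raw))  # floor at 1, never exceed the previous size
        sizes.append(k)
        upper = k
    return sizes


print(pooling_sizes(4, 10))  # [9, 9, 7, 3]
print(pooling_sizes(4, 1))   # [1, 1, 1, 1]
```

With the clamps in place the sequence 9, 10, 7, 3 becomes 9, 9, 7, 3, and n == 1 no longer produces a pooling size of 0.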

 class KMaxPooling(nn.Module):
     """K max pooling that selects the k biggest values along a specific axis.
 
     Input shape
       - nD tensor with shape: ``(batch_size, ..., input_dim)``.
 
     Output shape
       - nD tensor with shape: ``(batch_size, ..., output_dim)``.
     """
     def __init__(self, k, axis, device='cpu'):
         super(KMaxPooling, self).__init__()
         self.k = k
         self.axis = axis
 
     def forward(self, inputs):
         # topk returns (values, indices); keep only the values
         out = torch.topk(inputs, k=self.k, dim=self.axis, sorted=True)[0]
         return out
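
To make the behaviour of the pooling step concrete, here is a minimal sketch that calls torch.topk directly on a hand-written tensor (the values are made up for illustration):

```python
import torch

# toy input: (batch=1, channels=1, num_features=4, embedding_size=2)
x = torch.tensor([[[[1.0, 8.0],
                    [5.0, 2.0],
                    [3.0, 6.0],
                    [7.0, 4.0]]]])

# keep the 2 largest values along the feature axis (dim=2), per embedding column
top2 = torch.topk(x, k=2, dim=2, sorted=True)[0]

print(top2.shape)  # torch.Size([1, 1, 2, 2])
print(top2[0, 0])  # rows: [7., 8.] then [5., 6.]
```

Because sorted=True orders the result by value, the selected entries are returned largest first rather than in their original positions along the feature axis.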

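5. Factorization machine

With these concepts in place we can implement the factorization machine, in 3 steps:

(1) Create a sample x, with num_categories as the number of features.

(2) Compute p_i (that is, k) from the depth of the layer.

(3) Apply k-max pooling to get the final output of the current conv layer.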

 # create sample
 batch_size=12
 num_categories = 10
 in_channel = 1 # must be 1 for it to work
 embedding_size = 11
 sample_x = torch.randn(batch_size, in_channel, num_categories, embedding_size)
 
 # initialize example
 conv_filters = [3,4,5]
 length_conv = len(conv_filters)
 n = num_categories
 conv_width = [3,5,7]
 field_shape = n
 module_list = []
 for i in range(1, length_conv+1):
     if i == 1:
         in_channels = 1
     else:
         in_channels = conv_filters[i-2]
         
     out_channels = conv_filters[i-1]
     width = conv_width[i-1]
     # max because it is possible that the formula is 0 if n == 1
     k = max(1, int((1- (i-length_conv) ** (i-length_conv)) * n)) if i < length_conv else 3
     #if i == 1,  shape = (B,out_channel, num_category, embedding_size)
     if i == 1:
         c = Conv2dSame(in_channels=in_channels, out_channels=out_channels, kernel_size=(width, 1), stride=1)
         first_out = c(sample_x)
         print(first_out.shape) # torch.Size([12, 3, 10, 11]) = (B,out_channel, num_category, embedding_size)
     module_list.append(Conv2dSame(in_channels=in_channels, out_channels=out_channels, kernel_size=(width, 1), stride=1))
     module_list.append(nn.ReLU())
     
     # get the topk values
     module_list.append(KMaxPooling(k=min(k, field_shape), axis=2)) # (B,out_channel, k, embedding_size)
     # we do not want the field_shape to increase
     field_shape = min(field_shape, k)
     
 conv_layer = nn.Sequential(*module_list)
 conv_layer(sample_x).shape # torch.Size([12, 5, 3, 11]) = (B, out_channel, k, embedding_size)

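We first covered some basics of convolution, then showed how convolution can be used to implement a factorization machine. Because applying plain max_pooling to the output of a convolutional layer is not a meaningful way to pick out the important interaction features, we also introduced a different pooling layer, and then combined these pieces into a factorization machine.

Most of the content of this article comes from the paper linked here:

https://avoid.overfit.cn/post/9e333ddb2e814bafacf4d33b1474a499

Author: Ngieng Kianyew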
