Grad-CAM: A Detailed Introduction and PyTorch Implementation

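Grad-CAM (Gradient-weighted Class Activation Mapping) is a technique for visualizing which parts of the input contribute most to a deep neural network's prediction. It localizes the relevant image regions, which makes the network's decision process easier to interpret.

The basic idea of Grad-CAM is that the output feature maps of the last convolutional layer have the greatest influence on the classification result. We can therefore compute a weight for each channel by global-average-pooling the gradients of the class score with respect to that layer's feature maps. These weights are then used to weight the feature maps and produce a Class Activation Map (CAM), in which each pixel represents the importance of the corresponding image region to the classification result.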
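For reference, the two steps described above correspond to the equations of the original Grad-CAM paper. The notation here is a choice made for this sketch: y^c is the score for class c, A^k is the k-th feature map of the chosen layer, and Z is the number of spatial positions in that map.

```latex
\alpha_k^c = \frac{1}{Z} \sum_{i} \sum_{j} \frac{\partial y^c}{\partial A_{ij}^k},
\qquad
L_{\text{Grad-CAM}}^c = \mathrm{ReLU}\!\left( \sum_{k} \alpha_k^c \, A^k \right)
```

The first expression is the global average pooling of the gradients (one weight per channel), and the second is the weighted combination of feature maps, with ReLU keeping only the regions that push the score up.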

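Compared with the original CAM method, Grad-CAM applies to a wide range of convolutional architectures, because it does not require modifying the network or using a specific layer structure. Grad-CAM can also be used to visualize features and to analyze specific layers or units in a network.

In PyTorch, we can use hooks to do this by registering a forward hook and a backward hook on the network. The forward hook records the output feature maps of the target layer, and the backward hook records the gradients at the target layer. In this article, we walk through how to implement Grad-CAM in PyTorch.

Loading and inspecting the pretrained model

To demonstrate the implementation of Grad-CAM, I will use a chest X-ray dataset from Kaggle and a pretrained classifier I built that classifies X-rays as showing pneumonia or not.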

    import torch

    model_path = "your/model/path/"

    # instantiate your model
    model = XRayClassifier()

    # load your model. Here we're loading on CPU since we're not going to do
    # large amounts of inference
    model.load_state_dict(torch.load(model_path, map_location=torch.device('cpu')))

    # put it in evaluation mode for inference
    model.eval()

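First, let's look at the model's architecture. As mentioned earlier, we need to identify the last convolutional layer, and in particular its activation function. This layer represents the most complex features the model has learned, so it is the best place to look when trying to understand the model's behavior. Below is the code for our demo model: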

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    # hyperparameters
    nc = 3  # number of channels
    nf = 64  # number of features to begin with
    dropout = 0.2
    device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

    # setup a resnet block and its forward function
    class ResNetBlock(nn.Module):
        def __init__(self, in_channels, out_channels, stride=1):
            super(ResNetBlock, self).__init__()
            self.conv1 = nn.Conv2d(in_channels, out_channels, kernel_size=3, stride=stride, padding=1, bias=False)
            self.bn1 = nn.BatchNorm2d(out_channels)
            self.conv2 = nn.Conv2d(out_channels, out_channels, kernel_size=3, stride=1, padding=1, bias=False)
            self.bn2 = nn.BatchNorm2d(out_channels)

            self.shortcut = nn.Sequential()
            if stride != 1 or in_channels != out_channels:
                self.shortcut = nn.Sequential(
                    nn.Conv2d(in_channels, out_channels, kernel_size=1, stride=stride, bias=False),
                    nn.BatchNorm2d(out_channels)
                )

        def forward(self, x):
            out = F.relu(self.bn1(self.conv1(x)))
            out = self.bn2(self.conv2(out))
            out += self.shortcut(x)
            out = F.relu(out)
            return out

    # setup the final model structure
    class XRayClassifier(nn.Module):
        def __init__(self, nc=nc, nf=nf, dropout=dropout):
            super(XRayClassifier, self).__init__()

            self.resnet_blocks = nn.Sequential(
                ResNetBlock(nc,   nf,    stride=2),  # (B, C, H, W) -> (B, NF, H/2, W/2), i.e., (64,64,128,128)
                ResNetBlock(nf,   nf*2,  stride=2),  # (64,128,64,64)
                ResNetBlock(nf*2, nf*4,  stride=2),  # (64,256,32,32)
                ResNetBlock(nf*4, nf*8,  stride=2),  # (64,512,16,16)
                ResNetBlock(nf*8, nf*16, stride=2),  # (64,1024,8,8)
            )

            self.classifier = nn.Sequential(
                nn.Conv2d(nf*16, 1, 8, 1, 0, bias=False),
                nn.Dropout(p=dropout),
                nn.Sigmoid(),
            )

        def forward(self, input):
            # the input is moved to `device`; the model's weights must be on the same device
            output = self.resnet_blocks(input.to(device))
            output = self.classifier(output)
            return output

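The model takes 3-channel 256x256 images, so it expects input of shape [batch size, 3, 256, 256]. Each ResNet block ends with a ReLU activation. For our purposes, we need to select the last ResNet block.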
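The shapes in the model's comments can be checked with the standard convolution output-size formula. The sketch below is only an illustration: the helper names `conv_out_size` and `trace_spatial_sizes` are hypothetical, and it assumes the 3x3, padding-1, stride-2 convolutions and the 8x8 classifier convolution shown in the model code.

```python
def conv_out_size(size: int, kernel: int, stride: int, padding: int) -> int:
    """Output size of a 2D convolution along one spatial dimension."""
    return (size + 2 * padding - kernel) // stride + 1


def trace_spatial_sizes(input_size: int = 256, num_blocks: int = 5) -> list:
    """Spatial size after each stride-2 ResNet block (3x3 conv, padding 1)."""
    sizes = []
    size = input_size
    for _ in range(num_blocks):
        size = conv_out_size(size, kernel=3, stride=2, padding=1)
        sizes.append(size)
    return sizes


block_sizes = trace_spatial_sizes()  # one entry per ResNet block
# The classifier head applies an 8x8 convolution with stride 1 and no padding
head_size = conv_out_size(block_sizes[-1], kernel=8, stride=1, padding=0)
print(block_sizes, head_size)  # [128, 64, 32, 16, 8] 1
```

This is why the last ResNet block produces 8x8 feature maps, and why the 8x8 classifier convolution collapses them to a single value per image.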

    XRayClassifier(
      (resnet_blocks): Sequential(
        (0): ResNetBlock(
          (conv1): Conv2d(3, 64, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
          (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
          (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (shortcut): Sequential(
            (0): Conv2d(3, 64, kernel_size=(1, 1), stride=(2, 2), bias=False)
            (1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          )
        )
        (1): ResNetBlock(
          (conv1): Conv2d(64, 128, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
          (bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
          (bn2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (shortcut): Sequential(
            (0): Conv2d(64, 128, kernel_size=(1, 1), stride=(2, 2), bias=False)
            (1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          )
        )
        (2): ResNetBlock(
          (conv1): Conv2d(128, 256, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
          (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
          (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (shortcut): Sequential(
            (0): Conv2d(128, 256, kernel_size=(1, 1), stride=(2, 2), bias=False)
            (1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          )
        )
        (3): ResNetBlock(
          (conv1): Conv2d(256, 512, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
          (bn1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (conv2): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
          (bn2): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (shortcut): Sequential(
            (0): Conv2d(256, 512, kernel_size=(1, 1), stride=(2, 2), bias=False)
            (1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          )
        )
        (4): ResNetBlock(
          (conv1): Conv2d(512, 1024, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
          (bn1): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (conv2): Conv2d(1024, 1024, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
          (bn2): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (shortcut): Sequential(
            (0): Conv2d(512, 1024, kernel_size=(1, 1), stride=(2, 2), bias=False)
            (1): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          )
        )
      )
      (classifier): Sequential(
        (0): Conv2d(1024, 1, kernel_size=(8, 8), stride=(1, 1), bias=False)
        (1): Dropout(p=0.2, inplace=False)
        (2): Sigmoid()
      )
    )

In PyTorch, we can easily select this block through the model's attributes.

    model.resnet_blocks[-1]
    # ResNetBlock(
    #   (conv1): Conv2d(512, 1024, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
    #   (bn1): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    #   (conv2): Conv2d(1024, 1024, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
    #   (bn2): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    #   (shortcut): Sequential(
    #     (0): Conv2d(512, 1024, kernel_size=(1, 1), stride=(2, 2), bias=False)
    #     (1): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    #   )
    # )

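PyTorch hook functions

PyTorch provides a number of hook functions that let you work with the information flowing through a model during forward or backward propagation. We can use them to inspect intermediate gradient values or to change the output of a specific layer.

Here we focus on two methods: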

register_full_backward_hook(hook, prepend=False)

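This method registers a backward hook on the module: the hook runs whenever backward() computes gradients for this module. The backward hook receives the module itself, the gradients with respect to the layer's input, and the gradients with respect to the layer's output.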

 hook(module, grad_input, grad_output) -> tuple(Tensor) or None

It returns a torch.utils.hooks.RemovableHandle, which can be used to remove the hook. We come back to this later.

register_forward_hook(hook, *, prepend=False, with_kwargs=False)

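This is very similar to the previous method, except that the hook runs after the module's forward pass and takes slightly different arguments. It gives you access to the layer's output: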

 hook(module, args, output) -> None or modified output

It also returns a torch.utils.hooks.RemovableHandle.
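To make the two signatures concrete, here is a minimal sketch that attaches both hooks to a toy convolution rather than to the X-ray model. The layer, the `captured` dictionary and the hook names are stand-ins chosen for illustration; only `register_forward_hook`, `register_full_backward_hook` and `remove()` come from PyTorch itself.

```python
import torch
import torch.nn as nn

captured = {}  # hypothetical container for what the hooks observe


def save_activation(module, args, output):
    # Called after the forward pass; `output` is the layer's output tensor
    captured["activation"] = output.detach()


def save_gradient(module, grad_input, grad_output):
    # `grad_output` is a tuple with one entry per module output
    captured["gradient"] = grad_output[0].detach()


layer = nn.Conv2d(3, 4, kernel_size=3, padding=1)  # toy stand-in layer
fwd_handle = layer.register_forward_hook(save_activation)
bwd_handle = layer.register_full_backward_hook(save_gradient)

x = torch.randn(2, 3, 8, 8, requires_grad=True)
layer(x).sum().backward()  # forward hook fires first, then the backward hook

print(captured["activation"].shape)  # torch.Size([2, 4, 8, 8])
print(captured["gradient"].shape)    # torch.Size([2, 4, 8, 8])

# Both registration calls returned RemovableHandle objects
fwd_handle.remove()
bwd_handle.remove()
```

Because the loss here is a plain sum of the layer's output, the captured gradient is a tensor of ones with the same shape as the activation.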

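Adding hooks to the model

To compute Grad-CAM, we need to define a backward hook and a forward hook. We want the gradients with respect to the output of the last convolutional block, as well as its activations, that is, the output of the block's activation function. The hooks extract these values for us during inference and backpropagation.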

    # defines two global scope variables to store our gradients and activations
    gradients = None
    activations = None

    def backward_hook(module, grad_input, grad_output):
        global gradients  # refers to the variable in the global scope
        print('Backward hook running...')
        gradients = grad_output
        # In this case, we expect it to be torch.Size([batch size, 1024, 8, 8])
        print(f'Gradients size: {gradients[0].size()}')
        # We need the 0 index because the tensor containing the gradients comes
        # inside a one element tuple.

    def forward_hook(module, args, output):
        global activations  # refers to the variable in the global scope
        print('Forward hook running...')
        activations = output
        # In this case, we expect it to be torch.Size([batch size, 1024, 8, 8])
        print(f'Activations size: {activations.size()}')

With the hook functions and the variables that store the activations and gradients defined, we can register the hooks on the layer of interest:

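    # note: after these two lines, the names refer to the returned handles,
    # no longer to the hook functions defined above
    backward_hook = model.resnet_blocks[-1].register_full_backward_hook(backward_hook, prepend=False)
    forward_hook = model.resnet_blocks[-1].register_forward_hook(forward_hook, prepend=False)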

Retrieving the gradients and activations

Now that the hooks are set up on the model, let's load an image and compute its Grad-CAM.

    from PIL import Image

    img_path = "/your/image/path/"
    image = Image.open(img_path).convert('RGB')

To run inference, we also need to preprocess the image:

    from torchvision import transforms
    from torchvision.transforms import ToTensor

    image_size = 256
    transform = transforms.Compose([
        transforms.Resize(image_size, antialias=True),
        transforms.CenterCrop(image_size),
        transforms.ToTensor(),
        transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)),
    ])

    img_tensor = transform(image)  # stores the tensor that represents the image

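Now we can run the forward pass, followed immediately by a backward pass on the model's output:

    model(img_tensor.unsqueeze(0)).backward()

The hooks print the following: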

    Forward hook running...
    Activations size: torch.Size([1, 1024, 8, 8])
    Backward hook running...
    Gradients size: torch.Size([1, 1024, 8, 8])

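With the gradients and activations in hand, we can generate the heatmap.

Computing Grad-CAM

To compute Grad-CAM, we follow the equations of the original paper with a few simple modifications: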

    pooled_gradients = torch.mean(gradients[0], dim=[0, 2, 3])

    import torch.nn.functional as F
    import matplotlib.pyplot as plt

    # weight the channels by corresponding gradients
    for i in range(activations.size()[1]):
        activations[:, i, :, :] *= pooled_gradients[i]

    # average the channels of the activations
    heatmap = torch.mean(activations, dim=1).squeeze()

    # relu on top of the heatmap
    heatmap = F.relu(heatmap)

    # normalize the heatmap
    heatmap /= torch.max(heatmap)

    # draw the heatmap
    plt.matshow(heatmap.detach())

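The result is shown below:

The activations contain 1024 feature maps, each capturing a different aspect of the input image at a spatial resolution of 8x8. The gradients obtained through the hook indicate how important each feature map is to the final prediction. Multiplying each feature map by its pooled gradient gives a set of weighted feature maps that emphasizes the most relevant parts of the image. Averaging these weighted feature maps across channels then yields a single heatmap showing which regions of the image mattered most to the model's prediction. This is Grad-CAM: it provides a visual explanation of the model's decision process and helps us interpret and debug the model's behavior.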
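The same computation can be written without the per-channel loop and without modifying the stored activations in place. The function below is a sketch rather than the author's code: the name `grad_cam_heatmap` is hypothetical, it assumes a batch of one image, and it adds a guard for an all-zero heatmap that the original code does not have.

```python
import torch
import torch.nn.functional as F


def grad_cam_heatmap(activations: torch.Tensor, gradients: torch.Tensor) -> torch.Tensor:
    """Return an [H, W] heatmap from [1, C, H, W] activations and gradients."""
    activations = activations.detach()
    gradients = gradients.detach()
    # Global-average-pool the gradients: one weight per channel, shape [1, C, 1, 1]
    weights = gradients.mean(dim=(2, 3), keepdim=True)
    # Weight each feature map, average over channels, keep only positive evidence
    heatmap = F.relu((weights * activations).mean(dim=1)).squeeze(0)
    # Normalize to [0, 1]; skip the division if the map is entirely zero
    max_val = heatmap.max()
    return heatmap / max_val if max_val > 0 else heatmap


# Tiny hand-made example: two 2x2 feature maps, only the first one matters
acts = torch.tensor([[[[1.0, 2.0], [3.0, 4.0]],
                      [[4.0, 3.0], [2.0, 1.0]]]])
grads = torch.tensor([[[[1.0, 1.0], [1.0, 1.0]],
                       [[0.0, 0.0], [0.0, 0.0]]]])
heatmap = grad_cam_heatmap(acts, grads)
print(heatmap)  # tensor([[0.2500, 0.5000], [0.7500, 1.0000]])
```

With the hooks from this article, the call would be `grad_cam_heatmap(activations, gradients[0])`. Averaging over channels instead of summing, as the paper does, only changes the scale, which the final normalization removes.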

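But what does this map actually tell us? Overlaying it on the original image makes it much easier to read.

Combining the original image and the heatmap

The code below overlays the heatmap we generated on the original image: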

    from torchvision.transforms.functional import to_pil_image
    from matplotlib import colormaps
    import matplotlib.pyplot as plt
    import numpy as np
    import PIL.Image

    # Create a figure and plot the first image
    fig, ax = plt.subplots()
    ax.axis('off')  # removes the axis markers

    # First plot the original image. The tensor was normalized to [-1, 1],
    # so map it back to [0, 1] before converting it to a PIL image
    ax.imshow(to_pil_image(img_tensor * 0.5 + 0.5, mode='RGB'))

    # Resize the heatmap to the same size as the input image and defines
    # a resample algorithm for increasing image resolution
    # we need heatmap.detach() because it can't be converted to numpy array while
    # requiring gradients
    overlay = to_pil_image(heatmap.detach(), mode='F').resize((256, 256), resample=PIL.Image.BICUBIC)

    # Apply any colormap you want
    cmap = colormaps['jet']
    overlay = (255 * cmap(np.asarray(overlay) ** 2)[:, :, :3]).astype(np.uint8)

    # Plot the heatmap on the same axes,
    # but with alpha < 1 (this defines the transparency of the heatmap).
    # No explicit extent is passed: both images are 256x256, so the default
    # extent already aligns them
    ax.imshow(overlay, alpha=0.4, interpolation='nearest')

    # Show the plot
    plt.show()

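This is much easier to interpret. Since this is a normal X-ray, there is nothing in particular to point out.

Now consider this example, which is labeled as pneumonia. Grad-CAM highlights the regions of the chest X-ray that a doctor would need to examine to determine whether pneumonia is present. In other words, our model really did learn something (the red regions lie around the lungs).

Removing the hooks

To remove the hooks from the model, simply call the remove() method on the returned handles.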

    backward_hook.remove()
    forward_hook.remove()
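One way to make sure the hooks are always removed is to wrap registration and removal in a context manager. This is an optional pattern, not part of the original walkthrough: the name `capture_grad_cam_inputs` and the toy network are hypothetical, and the sketch assumes the hooked layer returns a single tensor.

```python
from contextlib import contextmanager

import torch
import torch.nn as nn


@contextmanager
def capture_grad_cam_inputs(layer: nn.Module):
    """Temporarily hook `layer`; yields a dict filled during forward/backward."""
    store = {}
    handles = [
        layer.register_forward_hook(
            lambda module, args, output: store.update(activations=output.detach())
        ),
        layer.register_full_backward_hook(
            lambda module, grad_in, grad_out: store.update(gradients=grad_out[0].detach())
        ),
    ]
    try:
        yield store
    finally:
        for handle in handles:
            handle.remove()  # runs even if the forward or backward pass raises


# Toy stand-in network, used only to exercise the context manager
toy = nn.Sequential(nn.Conv2d(1, 2, kernel_size=3, padding=1), nn.ReLU())
with capture_grad_cam_inputs(toy[0]) as store:
    toy(torch.randn(1, 1, 4, 4, requires_grad=True)).sum().backward()

print(sorted(store))  # ['activations', 'gradients']
```

Both lambdas return None (the result of `dict.update`), so the layer's output and gradients pass through unchanged.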

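Summary

This article should help clarify how Grad-CAM works and how to implement it in PyTorch. Because PyTorch provides powerful hook functions, the code here can be adapted to other models by registering the hooks on the layer of interest.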

https://avoid.overfit.cn/post/59ce70fd73cc4110acd4016e992b50ea

Author: Vinícius Almeida