Implementing SimCLR Contrastive Learning in PyTorch for Self-Supervised Pretraining

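SimCLR (Simple Framework for Contrastive Learning of Representations) is a self-supervised technique for learning image representations. Unlike traditional supervised learning methods, SimCLR does not rely on labeled data to learn useful representations. It uses a contrastive learning framework to learn a set of useful features that capture high-level semantic information from unlabeled images.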

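SimCLR has been shown to outperform previous state-of-the-art unsupervised learning methods on a range of image classification benchmarks. Its learned representations also transfer easily to downstream tasks such as object detection, semantic segmentation, and few-shot learning, with only minimal fine-tuning on a smaller labeled dataset.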

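The main idea of SimCLR is to learn good image representations by contrasting an image with other augmented versions of the same image, produced by an augmentation module T. Each augmented image is mapped through an encoder network f(·). A projection head g(·) then maps the learned features to a lower-dimensional space. Finally, a contrastive loss is computed between the representations of two augmented versions of the same image. This loss encourages similar representations for the same image and different representations for different images.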
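In the notation of the SimCLR paper, the pipeline for a single image x can be summarized as follows. Here t and t′ are two augmentations sampled independently from T, σ is a ReLU, and bias terms are omitted. In this article, f is a ResNet18 (so h is 512-dimensional) and g maps 512 → 512 → 256.

$$\tilde{x}_i = t(x), \qquad \tilde{x}_j = t'(x), \qquad t, t' \sim \mathcal{T}$$

$$h_i = f(\tilde{x}_i), \qquad z_i = g(h_i) = W^{(2)}\,\sigma\big(W^{(1)} h_i\big)$$

The same applies to $\tilde{x}_j$, giving $h_j$ and $z_j$. The contrastive loss is computed on the $z$ vectors. The $h$ vectors are what is kept for downstream tasks.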

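In this article we take a closer look at the SimCLR framework and its key components: data augmentation, the contrastive loss function, and the encoder and projection head architectures.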

For the experiments we use the garbage classification dataset from Kaggle.

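The most important part of SimCLR is the augmentation module that transforms the images. The authors of the SimCLR paper show that strong data augmentation is beneficial for unsupervised learning, so we follow the augmentations recommended in the paper:

Random crop with resize
Random horizontal flip with 50% probability
Random color distortion (color jitter with 80% probability, color drop, i.e. random grayscale, with 20% probability)
Random Gaussian blur with 50% probability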

 from torchvision.transforms import (
     ColorJitter, Compose, GaussianBlur, RandomApply,
     RandomGrayscale, RandomHorizontalFlip, RandomResizedCrop, ToTensor,
 )
 
 def get_complete_transform(output_shape, kernel_size, s=1.0):
     """
     SimCLR augmentation pipeline.
     
     Args:
         output_shape: Output size of the random resized crop
         kernel_size: Gaussian blur kernel size (must be odd)
         s: Color distortion strength
         
     Returns:
         A composed image transform
     """
     rnd_crop = RandomResizedCrop(output_shape)
     rnd_flip = RandomHorizontalFlip(p=0.5)
     
     color_jitter = ColorJitter(0.8*s, 0.8*s, 0.8*s, 0.2*s)
     rnd_color_jitter = RandomApply([color_jitter], p=0.8)
     
     rnd_gray = RandomGrayscale(p=0.2)
     gaussian_blur = GaussianBlur(kernel_size=kernel_size)
     rnd_gaussian_blur = RandomApply([gaussian_blur], p=0.5)
     to_tensor = ToTensor()
     image_transform = Compose([
         to_tensor,  # numpy HxWxC image -> CxHxW float tensor in [0, 1]
         rnd_crop,
         rnd_flip,
         rnd_color_jitter,
         rnd_gray,
         rnd_gaussian_blur,
     ])
     return image_transform
 
 class ContrastiveLearningViewGenerator(object):
     """Produce n_views independently augmented views of one image."""
     def __init__(self, base_transform, n_views=2):
         self.base_transform = base_transform
         self.n_views = n_views
         
     def __call__(self, x):
         views = [self.base_transform(x) for _ in range(self.n_views)]
         return views

The next step is to define a PyTorch Dataset.

 import torch
 from skimage import io
 from torch.utils.data import Dataset
 
 class CustomDataset(Dataset):
     def __init__(self, list_images, transform=None):
         """
         Args:
             list_images (list): List of image file paths
             transform (callable, optional): Optional transform to be applied on a sample.
         """
         self.list_images = list_images
         self.transform = transform
         
     def __len__(self):
         return len(self.list_images)
     
     def __getitem__(self, idx):
         if torch.is_tensor(idx):
             idx = idx.tolist()
             
         img_name = self.list_images[idx]
         image = io.imread(img_name)  # numpy array, H x W x C
         if self.transform:
             # With ContrastiveLearningViewGenerator this returns a list of views
             image = self.transform(image)
             
         return image
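It helps to see what the DataLoader does with samples that are lists of views. The toy dataset below is hypothetical: it returns random tensors instead of decoded images and only mimics the output shape of CustomDataset wrapped with ContrastiveLearningViewGenerator.

```python
import torch
from torch.utils.data import DataLoader, Dataset

class TwoViewToyDataset(Dataset):
    """Hypothetical stand-in: each item is a list of two random 'views'."""
    def __init__(self, n_items=8, shape=(3, 32, 32)):
        self.n_items = n_items
        self.shape = shape

    def __len__(self):
        return self.n_items

    def __getitem__(self, idx):
        return [torch.rand(self.shape), torch.rand(self.shape)]

loader = DataLoader(TwoViewToyDataset(), batch_size=4, drop_last=True)
views = next(iter(loader))
# The default collate_fn keeps the outer list and batches each view separately
print(type(views).__name__, len(views), tuple(views[0].shape))

# Concatenating along dim 0 gives the 2N-row batch the model and loss work on:
# rows 0..N-1 are first views, rows N..2N-1 are the matching second views
batch = torch.cat(views, dim=0)
print(tuple(batch.shape))
```

This is why SimCLR.forward receives a list of tensors and concatenates them with torch.cat before the encoder.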

As an example, we use the relatively small ResNet18 as the backbone, whose standard input is a 224×224 image. We set the parameters accordingly and build the DataLoader.

 import glob
 import torch
 
 DEVICE = torch.device("cuda" if torch.cuda.is_available() else "cpu")
 
 out_shape = [224, 224]
 kernel_size = [21, 21]  # ~10% of out_shape, rounded to an odd size as GaussianBlur requires
 
 # Custom transform
 base_transforms = get_complete_transform(output_shape=out_shape, kernel_size=kernel_size, s=1.0)
 custom_transform = ContrastiveLearningViewGenerator(base_transform=base_transforms)
 
 garbage_ds = CustomDataset(
     list_images=glob.glob("/kaggle/input/garbage-classification/garbage_classification/*/*.jpg"),
     transform=custom_transform,
 )
 
 BATCH_SZ = 128
 
 # Build DataLoader; drop_last=True keeps every batch at exactly BATCH_SZ images
 train_dl = torch.utils.data.DataLoader(
     garbage_ds,
     batch_size=BATCH_SZ,
     shuffle=True,
     drop_last=True,
     pin_memory=True)
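The 21×21 blur kernel follows the paper's rule of roughly 10% of the image side. torchvision's GaussianBlur only accepts odd kernel sizes, so the value has to be made odd. The helper below is a hypothetical way to derive it, rounding down to the nearest odd number, which reproduces the 21 used above for 224-pixel crops.

```python
def blur_kernel_size(side: int) -> int:
    """Hypothetical helper: about 10% of the image side, forced to be odd
    because torchvision's GaussianBlur rejects even kernel sizes."""
    k = max(1, int(0.1 * side))
    if k % 2 == 0:
        k -= 1  # round down to the nearest odd size
    return k

for side in (32, 96, 224):
    print(side, blur_kernel_size(side))
```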

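With the data ready, we can build the model. The augmentation module above produces two augmented views of each image, and both are passed forward through the encoder to obtain their representations. SimCLR's goal is to maximize the similarity between these two learned representations, which encourages the model to learn a general representation of the object from two different augmented views.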

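The encoder network is not restricted to any particular architecture. As mentioned above, we use ResNet18 to keep the demonstration simple. The representations learned by the encoder determine the similarity scores. To improve the quality of these representations, SimCLR adds a projection head that maps the encoded vectors into a separate latent space, in which the contrastive loss is computed. Here we project ResNet18's 512-dimensional features into a 256-dimensional space. It sounds complicated, but it is just an MLP with a ReLU.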

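 import torch
 import torch.nn as nn
 from torchvision import models
 
 class Identity(nn.Module):
     """Pass-through layer (equivalent to nn.Identity)."""
     def __init__(self):
         super(Identity, self).__init__()
     def forward(self, x):
         return x
     
 class SimCLR(nn.Module):
     def __init__(self, linear_eval=False):
         super().__init__()
         self.linear_eval = linear_eval
         # Randomly initialized ResNet18; weights=None (torchvision >= 0.13) replaces pretrained=False
         resnet18 = models.resnet18(weights=None)
         resnet18.fc = Identity()  # drop the classifier, keep the 512-d pooled features
         self.encoder = resnet18   # f(.)
         self.projection = nn.Sequential(  # g(.): MLP projection head, 512 -> 512 -> 256
             nn.Linear(512, 512),
             nn.ReLU(),
             nn.Linear(512, 256)
         )
     def forward(self, x):
         if not self.linear_eval:
             # x is a list of views; stack them into one (2N, C, H, W) batch
             x = torch.cat(x, dim=0)
         encoding = self.encoder(x)
         projection = self.projection(encoding)
         return projection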
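The shape bookkeeping of the projection head can be checked in isolation. This sketch reuses the same 512 → 512 → 256 MLP but feeds it random tensors as hypothetical stand-ins for ResNet18 encoder outputs. The real SimCLR.forward concatenates the two views' images before the encoder; here the same 2N-row layout is mimicked on the encoder outputs.

```python
import torch
import torch.nn as nn

# Same MLP projection head as in the SimCLR class: 512 -> 512 -> 256
projection = nn.Sequential(
    nn.Linear(512, 512),
    nn.ReLU(),
    nn.Linear(512, 256),
)

# Hypothetical encoder outputs for two views of a batch of N = 4 images
N = 4
h_view1 = torch.randn(N, 512)
h_view2 = torch.randn(N, 512)

# Rows 0..N-1 hold first views, rows N..2N-1 the matching second views
h = torch.cat([h_view1, h_view2], dim=0)  # (2N, 512)
z = projection(h)                         # (2N, 256)
print(tuple(h.shape), tuple(z.shape))
```

The head is applied row by row, so the first N output rows are exactly the projections of the first views.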

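The contrastive loss function, also called the normalized temperature-scaled cross-entropy loss (NT-Xent), is a key component of SimCLR. It encourages the model to learn similar representations for the same image and different representations for different images.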

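The NT-Xent loss is computed from pairs of augmented views of images, which are passed through the encoder network to obtain their representations. The goal of the contrastive loss is to make the representations of the two augmented views of the same image similar, while pushing the representations of different images apart.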

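NT-Xent applies a softmax to the pairwise (cosine) similarities between the representations of the augmented views. The softmax runs over all pairs of representations within the mini-batch and yields a similarity probability distribution for each image. The temperature parameter scales the pairwise similarities before the softmax is applied, which helps produce better gradients during optimization.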
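The effect of the temperature shows up clearly on a toy example. The similarity values below are made up for illustration, with index 0 standing in for the positive pair. A small temperature sharpens the softmax so that most of the probability lands on the most similar sample. A large temperature, such as the temp=2 used later in this article, flattens it. For comparison, the SimCLR paper reports results with temperatures below 1, for example 0.1 and 0.5.

```python
import torch

# Hypothetical cosine similarities of one anchor to the other samples in a batch;
# index 0 plays the role of the positive pair.
sims = torch.tensor([0.9, 0.5, 0.1, -0.3])

for temp in (0.1, 0.5, 2.0):
    probs = torch.softmax(sims / temp, dim=0)
    print(f"temp={temp}: p(positive)={probs[0].item():.3f}")
```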

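Given this similarity distribution, the NT-Xent loss maximizes the log-likelihood of the matching representation of the same image. Because of the softmax normalization, this simultaneously lowers the probability assigned to the non-matching representations of different images.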
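For reference, the NT-Xent loss from the SimCLR paper can be written as follows. The indexing is adapted to this article's batch layout, where the 2N rows are the N first views followed by the N second views. Sample i's positive is therefore p(i) = i + N for i ≤ N and i − N otherwise. τ is the temperature (temp in the code).

$$\mathrm{sim}(u, v) = \frac{u^\top v}{\lVert u \rVert\,\lVert v \rVert}$$

$$\ell_{i,j} = -\log \frac{\exp\big(\mathrm{sim}(z_i, z_j)/\tau\big)}{\sum_{k=1}^{2N} \mathbb{1}_{[k \neq i]}\,\exp\big(\mathrm{sim}(z_i, z_k)/\tau\big)}$$

$$\mathcal{L} = \frac{1}{2N} \sum_{i=1}^{2N} \ell_{i,\,p(i)}$$

Placing the positive logit in column 0 and applying nn.CrossEntropyLoss with all-zero targets computes exactly this average, provided the projections are L2-normalized so that the dot product equals sim.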

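 import torch
 import torch.nn.functional as F
 
 def ntxent_loss(features, temp):
     """
     Build the logits and targets for the NT-Xent loss.
     
     Args:
         features: Projections of shape (2N, D); rows i and i + N are two views of the same image
         temp: Temperature
     Returns:
         logits of shape (2N, 2N-1) with the positive in column 0, and all-zero targets
     """
     n = features.shape[0] // 2
     device = features.device
     # labels[i, j] = 1 if rows i and j come from the same image
     labels = torch.cat([torch.arange(n) for _ in range(2)], dim=0)
     labels = (labels.unsqueeze(0) == labels.unsqueeze(1)).float().to(device)
     
     # Cosine similarity: L2-normalize before the dot product ("normalized" in NT-Xent)
     features = F.normalize(features, dim=1)
     similarity_matrix = torch.matmul(features, features.T)
     
     # Remove the diagonal (self-similarity)
     mask = torch.eye(labels.shape[0], dtype=torch.bool, device=device)
     labels = labels[~mask].view(labels.shape[0], -1)
     similarity_matrix = similarity_matrix[~mask].view(similarity_matrix.shape[0], -1)
     
     # One positive per row, 2N - 2 negatives per row
     positives = similarity_matrix[labels.bool()].view(labels.shape[0], -1)
     negatives = similarity_matrix[~labels.bool()].view(similarity_matrix.shape[0], -1)
     
     # The positive sits in column 0, so every target is class 0
     logits = torch.cat([positives, negatives], dim=1) / temp
     targets = torch.zeros(logits.shape[0], dtype=torch.long, device=device)
     return logits, targets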
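As a cross-check on the loss, here is a compact alternative formulation (not the article's code). Instead of splitting positives and negatives, it masks the diagonal with -inf and lets F.cross_entropy pick the positive at column (i + N) mod 2N. It should agree with a literal loop over the NT-Xent formula, which is included only for verification. Given L2-normalized inputs, it should also agree with ntxent_loss followed by nn.CrossEntropyLoss.

```python
import torch
import torch.nn.functional as F

def nt_xent(z: torch.Tensor, temp: float) -> torch.Tensor:
    """Compact NT-Xent: rows i and i + N of z are the two views of image i."""
    n2 = z.shape[0]
    n = n2 // 2
    z = F.normalize(z, dim=1)
    sim = z @ z.T / temp
    # Exclude self-similarity from the softmax denominator
    self_mask = torch.eye(n2, dtype=torch.bool, device=z.device)
    sim = sim.masked_fill(self_mask, float("-inf"))
    # The positive for row i sits in column (i + N) mod 2N
    targets = (torch.arange(n2, device=z.device) + n) % n2
    return F.cross_entropy(sim, targets)

def nt_xent_loop(z: torch.Tensor, temp: float) -> torch.Tensor:
    """Slow, literal transcription of the formula, used only as a cross-check."""
    n2 = z.shape[0]
    n = n2 // 2
    z = F.normalize(z, dim=1)
    total = 0.0
    for i in range(n2):
        j = (i + n) % n2
        num = torch.exp(z[i] @ z[j] / temp)
        den = sum(torch.exp(z[i] @ z[k] / temp) for k in range(n2) if k != i)
        total = total + (-torch.log(num / den))
    return total / n2

torch.manual_seed(0)
z = torch.randn(8, 16)
print(nt_xent(z, 0.5).item(), nt_xent_loop(z, 0.5).item())
```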

Everything is ready, so let's train SimCLR and see how it does!

 import time
 import torch
 import torch.nn as nn
 from tqdm import tqdm
 
 simclr_model = SimCLR().to(DEVICE)
 criterion = nn.CrossEntropyLoss().to(DEVICE)
 optimizer = torch.optim.Adam(simclr_model.parameters())
 
 epochs = 10
 with tqdm(total=epochs) as pbar:
     for epoch in range(epochs):
         simclr_model.train()
         t0 = time.time()
         running_loss = 0.0
         for i, views in enumerate(train_dl):
             # views is a list of 2 tensors, each of shape (BATCH_SZ, 3, 224, 224)
             projections = simclr_model([view.to(DEVICE) for view in views])
             # temp=2 is this article's choice; the SimCLR paper uses smaller values such as 0.5
             logits, labels = ntxent_loss(projections, temp=2)
             loss = criterion(logits, labels)
             optimizer.zero_grad()
             loss.backward()
             optimizer.step()
             
             # print stats
             running_loss += loss.item()
             if i % 10 == 9:  # print the average over the last 10 mini-batches
                 print(f"Epoch: {epoch+1} Batch: {i+1} Loss: {(running_loss/10):.4f}")
                 running_loss = 0.0
         pbar.update(1)
         print(f"Time taken: {((time.time()-t0)/60):.3f} mins")
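Before moving on to the downstream task, it is usually worth saving the pretrained encoder weights, for example with torch.save(simclr_model.encoder.state_dict(), ...). The sketch below shows the save/load round trip on a small hypothetical stand-in module rather than the ResNet18, and the file name is made up.

```python
import os
import tempfile

import torch
import torch.nn as nn

def make_encoder() -> nn.Module:
    # Hypothetical stand-in for simclr_model.encoder (really a ResNet18 with fc = Identity)
    return nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 8))

encoder = make_encoder()

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "simclr_encoder.pt")  # hypothetical file name
    # Save only the weights (state_dict), not the pickled module object
    torch.save(encoder.state_dict(), path)

    # Later or elsewhere: rebuild the same architecture and load the weights
    restored = make_encoder()
    restored.load_state_dict(torch.load(path))

x = torch.randn(4, 16)
print(torch.equal(encoder(x), restored(x)))
```

Saving the state_dict rather than the whole module keeps the checkpoint independent of where the class was defined.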

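The code above trains for 10 epochs. Once pretraining is finished, the pretrained encoder can be used for whatever downstream task we want, as in the following code.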

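 import time
 import torch
 import torch.nn as nn
 from torchvision.datasets import ImageFolder
 from torchvision.transforms import CenterCrop, Compose, Resize, ToTensor
 from tqdm import tqdm
 
 # Deterministic preprocessing for evaluation (no augmentation)
 custom_transform = Compose([
     Resize(255),
     CenterCrop(224),
     ToTensor(),
 ])
 
 garbage_ds = ImageFolder(
     root="/kaggle/input/garbage-classification/garbage_classification/",
     transform=custom_transform
 )
 
 classes = len(garbage_ds.classes)
 
 BATCH_SZ = 128
 
 train_dl = torch.utils.data.DataLoader(
     garbage_ds,
     batch_size=BATCH_SZ,
     shuffle=True,
     drop_last=True,
     pin_memory=True,
 )
 
 class Identity(nn.Module):
     def __init__(self):
         super(Identity, self).__init__()
     def forward(self, x):
         return x
     
 class LinearEvaluation(nn.Module):
     def __init__(self, model, classes):
         super().__init__()
         simclr = model
         simclr.linear_eval = True       # forward() now takes a single tensor
         simclr.projection = Identity()  # drop the projection head, keep the 512-d encoder output
         self.simclr = simclr
         for param in self.simclr.parameters():
             param.requires_grad = False  # freeze the pretrained encoder
         self.linear = nn.Linear(512, classes)
     def forward(self, x):
         encoding = self.simclr(x)
         pred = self.linear(encoding)
         return pred
       
 eval_model = LinearEvaluation(simclr_model, classes).to(DEVICE)
 criterion = nn.CrossEntropyLoss().to(DEVICE)
 # Only the linear head is trainable
 optimizer = torch.optim.Adam(eval_model.linear.parameters())
 
 # Train the linear head (the epoch count is an assumption); keep the frozen
 # encoder in eval mode so its BatchNorm running statistics do not change
 eval_epochs = 10
 for epoch in range(eval_epochs):
     eval_model.train()
     eval_model.simclr.eval()
     for img, gt in train_dl:
         pred = eval_model(img.to(DEVICE))
         loss = criterion(pred, gt.to(DEVICE))
         optimizer.zero_grad()
         loss.backward()
         optimizer.step()
 
 # Measure accuracy
 eval_model.eval()
 correct, total = 0, 0
 with torch.no_grad():
     t0 = time.time()
     for img, gt in tqdm(train_dl):
         image = img.to(DEVICE)
         label = gt.to(DEVICE)
         pred = eval_model(image).argmax(dim=1)
         total += label.size(0)
         correct += (pred == label).sum().item()
 
     print(f"Time taken: {((time.time()-t0)/60):.3f} mins")
     
 print(f"Accuracy of the network on the {total} Train images: {100 * correct / total:.2f} %")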

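The key steps in the code above are: take the SimCLR model we just trained, replace its projection head with Identity, freeze all of its weights, and then create a new classification head, self.linear, for the downstream classification task.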
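A quick way to confirm that freezing worked is to count the parameters that still require gradients. Only the classification head should remain trainable. The toy model below is a hypothetical stand-in: a small "encoder" producing 512-d features plus a linear head, with 6 classes chosen arbitrarily for the example.

```python
import torch.nn as nn

def count_trainable(model: nn.Module) -> int:
    """Number of parameters that an optimizer would actually update."""
    return sum(p.numel() for p in model.parameters() if p.requires_grad)

# Hypothetical toy setup: frozen feature extractor + trainable linear head
encoder = nn.Sequential(nn.Linear(64, 512), nn.ReLU())
for p in encoder.parameters():
    p.requires_grad = False  # freeze, as LinearEvaluation does
head = nn.Linear(512, 6)     # 6 classes is an assumption for this example
model = nn.Sequential(encoder, head)

print(count_trainable(model))  # only the head: 512 * 6 weights + 6 biases
```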

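This article introduced the SimCLR framework and used it to pretrain a ResNet18 from randomly initialized weights. Pretraining is a powerful deep learning technique in which a model is trained on a large dataset to learn useful features that transfer to other tasks. The SimCLR paper reports that larger batch sizes give better performance. Our implementation uses a batch size of only 128 and trains for only 10 epochs, so these results do not reflect the model's best achievable performance. A fair performance comparison would require further training.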
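One reason larger batches help in SimCLR is that the other samples in the batch act as the negatives. With N images there are 2N views. Each anchor has one positive and excludes itself, which leaves 2N − 2 negatives. The arithmetic below is only a sketch of that count.

```python
def negatives_per_anchor(batch_size: int) -> int:
    """N images -> 2N views; each anchor has 1 positive and excludes itself,
    so the remaining 2N - 2 views serve as negatives."""
    return 2 * batch_size - 2

for n in (128, 256, 4096):
    print(n, negatives_per_anchor(n))
```

Gradient accumulation over several small batches does not increase this count, because the negatives are drawn only from within each forward pass.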

https://avoid.overfit.cn/post/e105b37642c241b080ae514778b86a6e

Author: Prabowo Yoga Wicaksana
