DataWhale Street-View Character Recognition Project: Data Augmentation

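Deep neural networks generally need large amounts of training data to reach good results. When data is limited, data augmentation can increase the diversity of the training samples, improve model robustness, and reduce overfitting.

Another way to look at data augmentation is that randomly altering the training samples reduces the model's dependence on particular attributes, which improves its ability to generalize.

For example, we can crop an image in different ways so that objects appear at different positions and at different scales, which reduces the model's sensitivity to object position. We can also adjust brightness, contrast, saturation, and hue to reduce the model's sensitivity to color.

Image augmentation is usually applied only to training data and is used less often on test data. Applying it at test time is called TTA (test-time augmentation): for example, take 5 random crops of an image and average the predictions for the 5 crops.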
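
The TTA idea (several random crops, then the mean of the predictions) can be sketched with NumPy alone. This is only an illustration: `toy_model` is a hypothetical stand-in for a real classifier, and `random_crop` and `predict_tta` are names made up for this sketch.

```python
import numpy as np

def random_crop(img, size, rng):
    """Return a random size x size crop of an (H, W, C) array."""
    h, w = img.shape[:2]
    top = int(rng.integers(0, h - size + 1))
    left = int(rng.integers(0, w - size + 1))
    return img[top:top + size, left:left + size]

def toy_model(crop):
    """Hypothetical classifier: two class probabilities from mean brightness."""
    bright = crop.mean() / 255.0
    return np.array([bright, 1.0 - bright])

def predict_tta(img, n_crops=5, size=24, seed=0):
    """Average the predictions over n_crops random crops of one image."""
    rng = np.random.default_rng(seed)
    preds = [toy_model(random_crop(img, size, rng)) for _ in range(n_crops)]
    return np.mean(preds, axis=0)

# a random 32x32 RGB image used only as example input
img = np.random.default_rng(1).integers(0, 256, size=(32, 32, 3), dtype=np.uint8)
probs = predict_tta(img)
print(probs.shape, probs.sum())
```

With a real network, `toy_model` would be replaced by a forward pass, and the crops would go through the same preprocessing as the training data.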

Cropping

Random crop: transforms.RandomCrop(size, padding=None, pad_if_needed=False, fill=0, padding_mode='constant')

"""
Pad the image first, then randomly crop a 100x100 region from it.
Params:
    size(integer or tuple): size of the crop

    padding(integer or tuple): amount of padding

    padding_mode(string): padding mode; 'constant', 'edge', and 'reflect' select constant, edge, and mirror padding respectively
"""
from torchvision import transforms

trans = transforms.RandomCrop(size=(100, 100), padding=5, padding_mode='constant')

# img is a PIL.Image object; the return value is also a PIL.Image object
trans_img = trans(img)

Center crop: transforms.CenterCrop(size)
Crops a region of the given size around the center of the image. If size is larger than the image, the borders are padded with 0 automatically.

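Random-size, random-aspect-ratio crop: transforms.RandomResizedCrop(size, scale=(0.08, 1.0), ratio=(3. / 4., 4. / 3.), interpolation=Image.BILINEAR)

"""
First crop the original image using a relative crop area and an aspect ratio, each drawn at random from its given range. Then resize the crop to the target size using the chosen interpolation.
Params:
  size(integer or tuple): size of the final output image

  scale(float or tuple): area of the crop relative to the original image; for example, (0.08, 1) means the crop covers between 8% and 100% of the original image area

  ratio(float or tuple): range of aspect ratios of the crop

  interpolation: interpolation method
"""
trans = transforms.RandomResizedCrop(size=100, scale=(0.08, 1.0), ratio=(3. / 4., 4. / 3.), interpolation=Image.BILINEAR)

trans_img = trans(img)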
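
To make the roles of scale and ratio concrete, here is a simplified pure-Python sketch of how a crop box could be sampled before resizing. It is not torchvision's code: `sample_crop_box` is a hypothetical name, and the fallback to the whole image is a simplification chosen for this sketch.

```python
import math
import random

def sample_crop_box(img_w, img_h, scale=(0.08, 1.0), ratio=(3 / 4, 4 / 3),
                    rng=None, tries=10):
    """Sample (left, top, width, height) of a crop inside an img_w x img_h image."""
    rng = rng or random.Random()
    area = img_w * img_h
    for _ in range(tries):
        # crop area as a random fraction of the image area
        target_area = area * rng.uniform(*scale)
        # aspect ratio drawn log-uniformly, so r and 1/r are equally likely
        aspect = math.exp(rng.uniform(math.log(ratio[0]), math.log(ratio[1])))
        w = int(round(math.sqrt(target_area * aspect)))
        h = int(round(math.sqrt(target_area / aspect)))
        if 0 < w <= img_w and 0 < h <= img_h:
            left = rng.randint(0, img_w - w)
            top = rng.randint(0, img_h - h)
            return left, top, w, h
    # simplification: give up and use the whole image
    return 0, 0, img_w, img_h

print(sample_crop_box(640, 480, rng=random.Random(0)))
```

The sampled box is then resized to size, which is why the output size is fixed even though the crop itself varies.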

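Corner and center crops: transforms.FiveCrop(size)


"""
Descriptions:
  FiveCrop produces 5 images: the top-left, top-right, bottom-left, and bottom-right corners, and the center
"""
import numpy as np
import matplotlib.pyplot as plt
from PIL import Image
from torchvision import transforms

img_path = r'xx.jpg'
img = Image.open(img_path)

# the image must be at least 700x700 for this crop size
imgs = transforms.FiveCrop(700)(img)

# original image on the left half of the figure
plt.figure(figsize=(16, 8))
plt.subplot(1, 2, 1)
img = np.array(img)
plt.imshow(img)
plt.xticks([])
plt.yticks([])
plt.title('Original Image')

# the five crops fill the right half of a 2x6 grid
crop_title = ['left-top', 'right-top', 'left-bottom', 'right-bottom', 'center']
for i, im in enumerate(imgs):
    plt.subplot(2, 6, (i // 3) * 3 + 4 + i)
    plt.imshow(np.array(im))
    plt.title(crop_title[i])
    plt.xticks([])
    plt.yticks([])
plt.show()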

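Corner and center crops plus their mirrored copies: transforms.TenCrop(size, vertical_flip=False)
Like FiveCrop, but TenCrop produces 10 images: the 4 corner crops and the center crop, plus a flipped copy of each (horizontal flip by default).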
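
The five regions can be written out as plain box arithmetic. The sketch below is an illustration, not the library code; `five_crop_boxes` is a hypothetical helper that returns (left, top, right, bottom) boxes in the order top-left, top-right, bottom-left, bottom-right, center.

```python
def five_crop_boxes(img_w, img_h, crop_w, crop_h):
    """Return the five (left, top, right, bottom) boxes in FiveCrop order."""
    if crop_w > img_w or crop_h > img_h:
        raise ValueError("crop size must not exceed image size")
    # offset of the centered box
    cx = int(round((img_w - crop_w) / 2.0))
    cy = int(round((img_h - crop_h) / 2.0))
    return [
        (0, 0, crop_w, crop_h),                          # top-left
        (img_w - crop_w, 0, img_w, crop_h),              # top-right
        (0, img_h - crop_h, crop_w, img_h),              # bottom-left
        (img_w - crop_w, img_h - crop_h, img_w, img_h),  # bottom-right
        (cx, cy, cx + crop_w, cy + crop_h),              # center
    ]

boxes = five_crop_boxes(1000, 800, 700, 700)
for box in boxes:
    print(box)
```

TenCrop would use the same five boxes on the image and again on its flipped copy, which gives 10 crops.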

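Rotation

"""
degrees(sequence or float or integer): rotation angle. If it is a single number, the angle is sampled from [-degrees, degrees]. If it is a sequence (min, max), the angle is sampled from [min, max]

resample: optional, one of {PIL.Image.NEAREST, PIL.Image.BILINEAR, PIL.Image.BICUBIC}

expand(bool): if True, the output image is enlarged so that it holds the whole rotated image. Otherwise the output keeps the size of the input.
"""
transforms.RandomRotation(degrees, resample=False, expand=False, center=None)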

Flipping
Horizontal flip: transforms.RandomHorizontalFlip(p=0.5)
Vertical flip: transforms.RandomVerticalFlip(p=0.5)

p is the probability of flipping
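Resizing
transforms.Resize(size, interpolation=Image.BILINEAR)

size(integer or tuple): if size is a tuple (h, w), the image is resized to exactly that size. If size is an integer, the shorter side of the image is resized to size and the aspect ratio is kept. For example, if $width>height$, then $height$ is resized to size and $width$ is resized to $size*width/height$.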
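
The size rule can be checked with a few lines of arithmetic. This is a sketch under the assumption that an integer size sets the shorter side and keeps the aspect ratio; `resize_output_size` is a hypothetical helper, not a torchvision function.

```python
def resize_output_size(width, height, size):
    """Return the (width, height) that Resize(size) would aim for."""
    if isinstance(size, (tuple, list)):
        out_h, out_w = size  # a tuple is given as (h, w)
        return out_w, out_h
    if width <= height:
        # width is the shorter side: it becomes size
        return size, int(size * height / width)
    # height is the shorter side: it becomes size
    return int(size * width / height), size

print(resize_output_size(400, 200, 100))        # landscape image
print(resize_output_size(200, 400, 100))        # portrait image
print(resize_output_size(400, 200, (128, 64)))  # explicit (h, w)
```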
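Padding
transforms.Pad(padding, fill=0, padding_mode='constant')

padding(integer or tuple): if it is a tuple of length 2, its two elements give the left/right and the top/bottom padding. If it is a tuple of length 4, its elements give the left, top, right, and bottom padding, in that order. If it is an integer, all 4 borders are padded by that width.

fill(integer or tuple): if it is a tuple, its elements correspond to the R, G, and B channels. Only used when padding_mode is 'constant'

padding_mode(string): padding mode; 'constant', 'edge', and 'reflect' select constant, edge, and mirror padding respectively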
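
How the three forms of padding translate into per-side amounts can be sketched as follows. The helper names `expand_padding` and `padded_size` are hypothetical, and the 4-tuple order (left, top, right, bottom) is assumed to follow the torchvision documentation.

```python
def expand_padding(padding):
    """Normalize padding to a (left, top, right, bottom) tuple."""
    if isinstance(padding, int):
        return padding, padding, padding, padding
    if len(padding) == 2:
        left_right, top_bottom = padding
        return left_right, top_bottom, left_right, top_bottom
    if len(padding) == 4:
        return tuple(padding)
    raise ValueError("padding must be an int or a tuple of length 2 or 4")

def padded_size(width, height, padding):
    """Return the (width, height) of the image after padding."""
    left, top, right, bottom = expand_padding(padding)
    return width + left + right, height + top + bottom

print(padded_size(100, 50, 5))
print(padded_size(100, 50, (2, 3)))
print(padded_size(100, 50, (1, 2, 3, 4)))
```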
Adjusting H, S, V
Randomly changes the brightness, contrast, saturation, and hue of an image
transforms.ColorJitter(brightness=0, contrast=0, saturation=0, hue=0)

brightness (float or tuple of float (min, max)): how much to jitter brightness. If brightness is a float, brightness_factor is sampled uniformly from [max(0, 1 - brightness), 1 + brightness]. Otherwise brightness_factor is sampled uniformly from [min, max]. The values must be non-negative.

contrast (float or tuple of float (min, max)): same as brightness.

saturation (float or tuple of float (min, max)): same as brightness.

hue (float or tuple of float (min, max)): how much to jitter hue. If hue is a float, hue_factor is sampled uniformly from [-hue, hue]. Otherwise hue_factor is sampled uniformly from [min, max].
Note that 0 <= hue <= 0.5 or -0.5 <= min <= max <= 0.5 is required.
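
The sampling ranges described above can be reproduced in a few lines. This is a sketch of the rule as stated in this section, with a hypothetical helper `jitter_range`; it does not touch any image.

```python
import random

def jitter_range(value, center=1.0, lower_bound=0.0):
    """Turn a ColorJitter-style argument into a (min, max) sampling range."""
    if isinstance(value, (int, float)):
        if value < 0:
            raise ValueError("a single number must be non-negative")
        return max(lower_bound, center - value), center + value
    lo, hi = value
    return float(lo), float(hi)

rng = random.Random(0)
# brightness=0.4 -> factor in [0.6, 1.4]
b_lo, b_hi = jitter_range(0.4)
brightness_factor = rng.uniform(b_lo, b_hi)
# hue=0.1 -> factor in [-0.1, 0.1]; hue is centered at 0 and bounded by -0.5
h_lo, h_hi = jitter_range(0.1, center=0.0, lower_bound=-0.5)
hue_factor = rng.uniform(h_lo, h_hi)
print(brightness_factor, hue_factor)
```

A factor of 1 (or 0 for hue) leaves the image unchanged, so the range is always centered on the identity.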
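Grayscale conversion
transforms.RandomGrayscale(p=0.1)

p is the probability of converting to grayscale. If the input has a single channel, the output has a single channel; if the input has 3 channels, the output still has 3 channels (R=G=B, all 3 channels identical)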
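Linear transformation
Applies a linear transformation to the image; it can be used for whitening
transforms.LinearTransformation(transformation_matrix, mean_vector)

transformation_matrix(tensor): shape [D, D], D = C * H * W
mean_vector(tensor): shape [D,]
Whitening works as follows. Let X be the input data with one flattened image per row, so X has shape [N, D]. First make it zero-centered;
then compute its covariance matrix t.mm(X.t(), X), which has shape [D, D], and perform an SVD on that matrix to obtain transformation_matrix.
```
# img is a tensor of shape (C, H, W); t is torch (import torch as t)
C, H, W = img.shape
# t.mm needs 2-D inputs, so the flattened image becomes a [1, D] row vector
trans_img = t.mm((img.flatten() - mean_vector).view(1, -1), transformation_matrix).view(C, H, W)
```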
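
As a worked example of the whitening recipe, the NumPy sketch below builds a ZCA-style transformation_matrix from the SVD of the covariance and checks that the transformed data has (approximately) identity covariance. The data is synthetic and tiny (D = 4 rather than C * H * W), and the small `eps` term is an assumption added for numerical safety.

```python
import numpy as np

rng = np.random.default_rng(0)
# 500 hypothetical flattened samples with D = 4 correlated features
X = rng.normal(size=(500, 4))
X[:, 1] += 0.8 * X[:, 0]
X[:, 3] += 0.5 * X[:, 2]
X += np.array([1.0, -2.0, 0.5, 3.0])

# 1) zero-center the data
mean_vector = X.mean(axis=0)
Xc = X - mean_vector

# 2) covariance matrix of shape [D, D]
cov = Xc.T @ Xc / Xc.shape[0]

# 3) SVD of the covariance gives the whitening matrix
U, S, _ = np.linalg.svd(cov)
eps = 1e-8
transformation_matrix = U @ np.diag(1.0 / np.sqrt(S + eps)) @ U.T

# apply: subtract the mean, then multiply by the matrix
X_white = (X - mean_vector) @ transformation_matrix
cov_white = X_white.T @ X_white / X_white.shape[0]
print(np.round(cov_white, 3))
```

The last two steps mirror what LinearTransformation does per image: subtract mean_vector from the flattened image and multiply by transformation_matrix.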
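Affine transformation
Covers translation (translate), rotation (rotate), scaling (scale), and shearing (shear)
transforms.RandomAffine(degrees, translate=None, scale=None, shear=None, resample=False, fillcolor=0)

degrees(sequence or float or integer): rotation angle. If it is a single number, the angle is sampled from [-degrees, degrees]. If it is a sequence (min, max), the angle is sampled from [min, max]

translate(tuple or optional): maximum absolute fraction for horizontal and vertical translation. For example, with (a, b) the horizontal shift is sampled from [-img_width * a, img_width * a] and the vertical shift from [-img_height * b, img_height * b]. No translation by default.

scale(tuple or optional): scale range. For example, with (a, b) the scale factor is sampled from [a, b]. No scaling by default.

shear(sequence or float or int, optional): if shear is a single number, the shear along the x axis is sampled from [-shear, +shear]. If shear is a sequence of 2 elements, the shear along the x axis is sampled from [shear[0], shear[1]]. If shear is a sequence of 4 elements, the shear along the x axis is sampled from [shear[0], shear[1]] and the shear along the y axis from [shear[2], shear[3]].

resample: optional, one of {PIL.Image.NEAREST, PIL.Image.BILINEAR, PIL.Image.BICUBIC}

fillcolor(int or tuple): fill color for the area outside the transformed image. If it is a tuple, its elements correspond to the R, G, and B channels.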
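Converting to PIL.Image
Converts a numpy.ndarray or torch.tensor to a PIL.Image
transforms.ToPILImage(mode=None)

mode (PIL.Image mode): color space and pixel depth of the input data (optional). If mode is None, it is inferred from the input: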
If the input has 4 channels, the mode is assumed to be RGBA.
If the input has 3 channels, the mode is assumed to be RGB.
If the input has 2 channels, the mode is assumed to be LA.
If the input has 1 channel, the mode is determined by the data type (i.e int, float, short).

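Custom operations
transforms.Lambda(lambd)

# lambd(function): the custom function to apply

# Adjust the hue of the image
import torchvision.transforms.functional as F

hue_factor = 0.1  # example value (an assumption); must lie in [-0.5, 0.5]
trans = transforms.Lambda(lambda img: F.adjust_hue(img, hue_factor))

trans_img = trans(img)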

ToTensor
Converts a PIL.Image to a torch.tensor, converts uint8 values to float scaled to [0, 1], and changes the dimension order from [H, W, C] to [C, H, W]
transforms.ToTensor()
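
The two effects of ToTensor (scaling to [0, 1] and moving the channel axis first) can be imitated on a NumPy array. `to_chw_float` is a hypothetical helper for illustration only and returns an ndarray, not a torch tensor.

```python
import numpy as np

def to_chw_float(img_hwc):
    """Convert an (H, W, C) uint8 array to a (C, H, W) float32 array in [0, 1]."""
    arr = np.asarray(img_hwc)
    if arr.dtype != np.uint8:
        raise TypeError("expected a uint8 image")
    return arr.transpose(2, 0, 1).astype(np.float32) / 255.0

# a 4x6 image that is pure red
img = np.zeros((4, 6, 3), dtype=np.uint8)
img[..., 0] = 255
out = to_chw_float(img)
print(out.shape, out.dtype, out.min(), out.max())
```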
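Normalization
Normalizes the data with the given mean and standard deviation
transforms.Normalize(mean, std, inplace=False)

mean(sequence): one value per image channel
std(sequence): one value per image channel
The normalization is computed as
$$\frac{img - mean}{std}$$

Because we often initialize our models from ImageNet pretrained weights, we usually use the mean and standard deviation of the ImageNet dataset.
For some specific tasks, however, the images differ substantially from natural images, and normalizing with the mean and standard deviation of the task's own dataset may work better. For computing the mean and standard deviation of a training set by hand, see here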
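
Computing the per-channel statistics of a training set by hand takes only a few lines. The sketch below uses a small random batch as a stand-in for real training images and assumes they all fit in memory as one (N, H, W, C) array.

```python
import numpy as np

rng = np.random.default_rng(0)
# stand-in for a training set: 8 RGB images of size 16x16, uint8
images = rng.integers(0, 256, size=(8, 16, 16, 3), dtype=np.uint8)

# scale to [0, 1] first, as ToTensor would
x = images.astype(np.float64) / 255.0

# one mean and one standard deviation per channel
mean = x.mean(axis=(0, 1, 2))
std = x.std(axis=(0, 1, 2))

# (img - mean) / std, broadcast over the channel axis
normalized = (x - mean) / std
print(mean, std)
```

The resulting mean and std would be passed to transforms.Normalize in place of the ImageNet values.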

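Besides a large number of augmentation operations for single images, PyTorch also lets you combine these operations flexibly

Compose
Chains several augmentation operations together
transforms.Compose(transforms)
```
transforms.Compose([transforms.Resize((128, 64)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
])
```
Note that the list passed to Compose is ordered: the operations run in the order given. ToTensor and Normalize usually go last.
RandomChoice
Picks one operation at random from a list and applies it
transforms.RandomChoice(transforms)
RandomOrder
Applies the operations in transforms in a random order
transforms.RandomOrder(transforms)
RandomApply
Applies a list of transforms with probability p and skips them otherwise
transforms.RandomApply(transforms, p=0.5)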
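
The behavior of the four combinators can be illustrated without any image library by treating an "operation" as a plain function. The sketch below is a toy re-implementation under that assumption; the function names are made up and integers stand in for images.

```python
import random

def compose(ops):
    """Apply ops one after another, in the given order."""
    def run(x):
        for op in ops:
            x = op(x)
        return x
    return run

def random_choice(ops, rng):
    """Apply exactly one op, picked at random."""
    return lambda x: rng.choice(ops)(x)

def random_order(ops, rng):
    """Apply all ops, in a random order."""
    def run(x):
        order = list(ops)
        rng.shuffle(order)
        return compose(order)(x)
    return run

def random_apply(ops, p, rng):
    """Apply all ops with probability p, otherwise return x unchanged."""
    return lambda x: compose(ops)(x) if rng.random() < p else x

rng = random.Random(0)
add1 = lambda x: x + 1
double = lambda x: x * 2

print(compose([add1, double])(3))  # (3 + 1) * 2
print(compose([double, add1])(3))  # 3 * 2 + 1
print(random_order([add1, double], rng)(3))
print(random_choice([add1, double], rng)(3))
print(random_apply([add1, double], 0.5, rng)(3))
```

The first two lines show why the order inside Compose matters: the same two operations give different results when swapped.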

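Here we introduce some more complex augmentation methods, whose effect may also be more pronounced

Cutout
Cutout is a data augmentation operation that randomly erases regions of an image, as shown in the figure below

First choose the side length (length) of the square patch and the number of patches to erase (n_holes). Then erase n_holes times; each time, pick a random point in the image as the center and erase a patch-sized region around it.


import torch as t
class Cutout:
    """
    Descriptions:
        Randomly erase one or more patches from an image
    Params:
        n_holes (int): number of patches to erase per image.
        length (int): side length of each square patch, in pixels
    """
    def __init__(self, n_holes, length):
        self.n_holes = n_holes
        self.length = length
        
    def __call__(self, img):
        """
        Params:
            img (Tensor): shape (C, H, W).
        Returns:
            Tensor: the image with n_holes patches set to 0, same shape as img.
        """
        h = img.shape[1]
        w = img.shape[2]
        mask = t.ones(h, w).float()

        for n in range(self.n_holes):
            # random center of the patch
            y = t.randint(h, (1,)).item()
            x = t.randint(w, (1,)).item()

            # clip the patch to the image bounds (plain ints, so use max/min)
            y1 = max(y - self.length // 2, 0)
            y2 = min(y + self.length // 2, h)
            x1 = max(x - self.length // 2, 0)
            x2 = min(x + self.length // 2, w)

            mask[y1: y2, x1: x2] = 0.

        # broadcast the (H, W) mask over all channels
        mask = mask.expand_as(img)
        img = img * mask

        return img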

CutMix
CutMix copies a random region from one image and pastes it at the same position in another image, as shown in the figure below

import numpy as np
import torch

# ------------------------- #
for batch_idx, (input, target) in enumerate(trainloader):
    input = input.cuda()
    target = target.cuda()
    r = np.random.rand(1)
    # apply CutMix only if beta > 0 and the random draw r is below cutmix_prob
    if args.beta > 0 and r < args.cutmix_prob:
       # generate mixed sample
       # lam is sampled from a Beta(beta, beta) distribution
       lam = np.random.beta(args.beta, args.beta)

       rand_index = torch.randperm(input.size()[0]).cuda()
       target_a = target
       target_b = target[rand_index]
       bbx1, bby1, bbx2, bby2 = rand_bbox(input.size(), lam)
       input[:, :, bbx1:bbx2, bby1:bby2] = input[rand_index, :, bbx1:bbx2, bby1:bby2]
       # adjust lambda to exactly match pixel ratio

       lam = 1 - ((bbx2 - bbx1) * (bby2 - bby1) / (input.size()[-1] * input.size()[-2]))
       # compute output
       output = model(input)
       # the loss is weighted by area: lam for the original labels, 1 - lam for the pasted labels
       loss = criterion(output, target_a) * lam + criterion(output, target_b) * (1. - lam)
# ------------------------- #
   
def rand_bbox(size, lam):
    """
    size(tuple): size of the input batch, as returned by input.size()
    
    lam(float): mixing coefficient; the cut region covers about (1 - lam) of the image area
    """
    W = size[2]
    H = size[3]
    # lam is a Python float, so use NumPy rather than torch for the arithmetic
    cut_rat = np.sqrt(1. - lam)
    cut_w = int(W * cut_rat)
    cut_h = int(H * cut_rat)

    # uniform
    cx = np.random.randint(W)
    cy = np.random.randint(H)

    bbx1 = np.clip(cx - cut_w // 2, 0, W)
    bby1 = np.clip(cy - cut_h // 2, 0, H)
    bbx2 = np.clip(cx + cut_w // 2, 0, W)
    bby2 = np.clip(cy + cut_h // 2, 0, H)

    return bbx1, bby1, bbx2, bby2
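
To see why lam is recomputed after the box is drawn, the NumPy sketch below pastes a region of an all-ones image into an all-zeros image and compares the pasted fraction with 1 - lam. The images and the helper signature are assumptions made for this illustration; because the box is clipped at the border, the sampled lam and the adjusted lam generally differ.

```python
import numpy as np

def rand_bbox(width, height, lam, rng):
    """Sample a box covering about (1 - lam) of the image, clipped to the image."""
    cut_rat = np.sqrt(1.0 - lam)
    cut_w, cut_h = int(width * cut_rat), int(height * cut_rat)
    cx, cy = int(rng.integers(width)), int(rng.integers(height))
    x1 = int(np.clip(cx - cut_w // 2, 0, width))
    y1 = int(np.clip(cy - cut_h // 2, 0, height))
    x2 = int(np.clip(cx + cut_w // 2, 0, width))
    y2 = int(np.clip(cy + cut_h // 2, 0, height))
    return x1, y1, x2, y2

rng = np.random.default_rng(0)
a = np.zeros((32, 32))  # image A: all zeros
b = np.ones((32, 32))   # image B: all ones

lam = rng.beta(1.0, 1.0)
x1, y1, x2, y2 = rand_bbox(32, 32, lam, rng)

# paste the box from B into A
mixed = a.copy()
mixed[y1:y2, x1:x2] = b[y1:y2, x1:x2]

# recompute lam from the actual pasted area
lam_adj = 1.0 - (x2 - x1) * (y2 - y1) / (32 * 32)
print(lam, lam_adj, mixed.mean())
```

Since B is all ones, mixed.mean() is exactly the pasted fraction, which equals 1 - lam_adj; this is the weight the loss gives to the labels of the pasted image.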

MixUp
MixUp overlays two images with different weights, similar to the effect of cv2.addWeighted, as shown in the figure below

import numpy as np
import torch

def mixup_data(x, y, alpha=1.0, use_cuda=True):
    '''
    
    Params:
        x(tensor): shape [N, C, H, W]
        y(tensor): shape [N,]
        
    Returns mixed inputs, pairs of targets, and lambda
    '''
    
    # lam is the mixing weight
    if alpha > 0:
        lam = np.random.beta(alpha, alpha)
    else:
        lam = 1

    batch_size = x.size()[0]
    if use_cuda:
        index = torch.randperm(batch_size).cuda()
    else:
        index = torch.randperm(batch_size)

    # blend each sample with a randomly chosen partner from the same batch
    mixed_x = lam * x + (1 - lam) * x[index, :]
    y_a, y_b = y, y[index]
    return mixed_x, y_a, y_b, lam

def mixup_criterion(criterion, pred, y_a, y_b, lam):
    return lam * criterion(pred, y_a) + (1 - lam) * criterion(pred, y_b)
    
    
for batch_idx, (inputs, targets) in enumerate(trainloader):
    if use_cuda:
        inputs, targets = inputs.cuda(), targets.cuda()

    inputs, targets_a, targets_b, lam = mixup_data(inputs, targets, args.alpha, use_cuda)
    
    outputs = net(inputs)
    loss = mixup_criterion(criterion, outputs, targets_a, targets_b, lam)
    # loss.data[0] no longer works on 0-dim tensors; use .item()
    train_loss += loss.item()
    _, predicted = torch.max(outputs.data, 1)
    total += targets.size(0)
    
    # accuracy is weighted the same way as the loss
    correct += (lam * predicted.eq(targets_a.data).cpu().sum().float()
                + (1 - lam) * predicted.eq(targets_b.data).cpu().sum().float())
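
The weighted loss used by MixUp is the same as a cross-entropy against a "soft" label that mixes the two one-hot targets with weight lam. The NumPy sketch below checks this on one hypothetical prediction; the arrays and the 3-class setup are made up for the example.

```python
import numpy as np

def cross_entropy(probs, target_dist):
    """Cross-entropy between predicted probabilities and a target distribution."""
    return float(-(target_dist * np.log(probs)).sum())

rng = np.random.default_rng(0)
lam = rng.beta(1.0, 1.0)

# two hypothetical images and their weighted overlay
x_a, x_b = rng.random((3, 8, 8)), rng.random((3, 8, 8))
mixed_x = lam * x_a + (1 - lam) * x_b

# a hypothetical prediction for the mixed image; labels are class 0 and class 2
probs = np.array([0.7, 0.2, 0.1])
onehot_a, onehot_b = np.eye(3)[0], np.eye(3)[2]

# loss as in mixup_criterion: weighted sum of two ordinary losses
loss_pair = lam * cross_entropy(probs, onehot_a) + (1 - lam) * cross_entropy(probs, onehot_b)
# the same loss written with a single soft label
loss_soft = cross_entropy(probs, lam * onehot_a + (1 - lam) * onehot_b)
print(loss_pair, loss_soft)
```

Keeping the two hard labels and mixing the losses, as the training loop does, avoids building soft labels while giving the same value.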

Hide-and-seek
TODO
Grid-Mask
TODO
Dropout
TODO
DropConnection
TODO
DropBlock
TODO
Style Transfer
TODO

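imgaug is a very powerful and comprehensive data augmentation library.
The code below, taken from the official tutorial, illustrates its use. Typically you first collect all augmentation operations into an iaa.Sequential, which operates on a 4D array.

import numpy as np
import imgaug as ia
import imgaug.augmenters as iaa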

# random example images
images = np.random.randint(0, 255, (16, 128, 128, 3), dtype=np.uint8)

# Sometimes(0.5, ...) applies the given augmenter in 50% of all cases,
# e.g. Sometimes(0.5, GaussianBlur(0.3)) would blur roughly every second image.
sometimes = lambda aug: iaa.Sometimes(0.5, aug)

# Define our sequence of augmentation steps that will be applied to every image
# All augmenters with per_channel=0.5 will sample one value _per image_
# in 50% of all cases. In all other cases they will sample new values
# _per channel_.

seq = iaa.Sequential(
    [
        # apply the following augmenters to most images
        # the flip below is applied to 50% of the images
        iaa.Fliplr(0.5), # horizontally flip 50% of all images
        iaa.Flipud(0.2), # vertically flip 20% of all images
        # crop images by -5% to 10% of their height/width
        sometimes(iaa.CropAndPad(percent=(-0.05, 0.1),
            pad_mode=ia.ALL,
            pad_cval=(0, 255)
        )),
        sometimes(iaa.Affine(scale={"x": (0.8, 1.2), "y": (0.8, 1.2)}, # scale images to 80-120% of their size, individually per axis
            translate_percent={"x": (-0.2, 0.2), "y": (-0.2, 0.2)}, # translate by -20 to +20 percent (per axis)
            rotate=(-45, 45), # rotate by -45 to +45 degrees
            shear=(-16, 16), # shear by -16 to +16 degrees
            order=[0, 1], # use nearest neighbour or bilinear interpolation (fast)
            cval=(0, 255), # if mode is constant, use a cval between 0 and 255
            mode=ia.ALL # use any of scikit-image's warping modes (see 2nd image from the top for examples)
        )),
        # execute 0 to 5 of the following (less important) augmenters per image
        # don't execute all of them, as that would often be way too strong
        iaa.SomeOf((0, 5),
            [sometimes(iaa.Superpixels(p_replace=(0, 1.0), n_segments=(20, 200))), # convert images into their superpixel representation
                iaa.OneOf([iaa.GaussianBlur((0, 3.0)), # blur images with a sigma between 0 and 3.0
                    iaa.AverageBlur(k=(2, 7)), # blur image using local means with kernel sizes between 2 and 7
                    iaa.MedianBlur(k=(3, 11)), # blur image using local medians with kernel sizes between 3 and 11
                ]),
                iaa.Sharpen(alpha=(0, 1.0), lightness=(0.75, 1.5)), # sharpen images
                iaa.Emboss(alpha=(0, 1.0), strength=(0, 2.0)), # emboss images
                # search either for all edges or for directed edges,
                # blend the result with the original image using a blobby mask
                iaa.SimplexNoiseAlpha(iaa.OneOf([iaa.EdgeDetect(alpha=(0.5, 1.0)),
                    iaa.DirectedEdgeDetect(alpha=(0.5, 1.0), direction=(0.0, 1.0)),
                ])),
                iaa.AdditiveGaussianNoise(loc=0, scale=(0.0, 0.05*255), per_channel=0.5), # add gaussian noise to images
                iaa.OneOf([iaa.Dropout((0.01, 0.1), per_channel=0.5), # randomly remove up to 10% of the pixels
                    iaa.CoarseDropout((0.03, 0.15), size_percent=(0.02, 0.05), per_channel=0.2),
                ]),
                iaa.Invert(0.05, per_channel=True), # invert color channels
                iaa.Add((-10, 10), per_channel=0.5), # change brightness of images (by -10 to 10 of original value)
                iaa.AddToHueAndSaturation((-20, 20)), # change hue and saturation
                # either change the brightness of the whole image (sometimes
                # per channel) or change the brightness of subareas
                iaa.OneOf([iaa.Multiply((0.5, 1.5), per_channel=0.5),
                    iaa.FrequencyNoiseAlpha(exponent=(-4, 0),
                        first=iaa.Multiply((0.5, 1.5), per_channel=True),
                        second=iaa.LinearContrast((0.5, 2.0))
                    )
                ]),
                iaa.LinearContrast((0.5, 2.0), per_channel=0.5), # improve or worsen the contrast
                iaa.Grayscale(alpha=(0.0, 1.0)),
                sometimes(iaa.ElasticTransformation(alpha=(0.5, 3.5), sigma=0.25)), # move pixels locally around (with random strengths)
                sometimes(iaa.PiecewiseAffine(scale=(0.01, 0.05))), # sometimes move parts of the image around
                sometimes(iaa.PerspectiveTransform(scale=(0.01, 0.1)))
            ],
            random_order=True
        )
    ],
    random_order=True
)

# images -> numpy.ndarray, shape [16, 128, 128, 3]
# images_aug -> numpy.ndarray, shape [16, 128, 128, 3]
images_aug = seq(images=images)

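albumentations is a powerful, full-featured data augmentation library that covers augmentation for several kinds of tasks, such as image segmentation, object detection, and keypoint detection.

Judging by the officially published benchmarking results, albumentations appears to be considerably faster than other augmentation libraries.

The code below, taken from the official tutorial, illustrates its use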

from albumentations import (Compose, OneOf, RandomRotate90, IAAAdditiveGaussianNoise, GaussNoise)
import numpy as np
p2 = 0.1
p3 = 0.3
def aug(p1):
    return Compose([RandomRotate90(p=p2),
        OneOf([IAAAdditiveGaussianNoise(p=0.9),
            GaussNoise(p=0.6),
        ], p=p3)
    ], p=p1)

image = np.ones((300, 300, 3), dtype=np.uint8)
mask = np.ones((300, 300), dtype=np.uint8)
whatever_data = "my name"
# the parameter of aug is named p1, so pass it by that name
augmentation = aug(p1=0.9)
data = {"image": image, "mask": mask, "whatever_data": whatever_data, "additional": "hello"}
augmented = augmentation(**data)

# after augmentation, the return value is a dict
image, mask, whatever_data, additional = augmented["image"], augmented["mask"], augmented["whatever_data"], augmented["additional"]

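The confusing part of this code is the three probabilities p (p1, p2, p3) and what each of them stands for.

p1 is the outermost probability p of Compose. It decides whether any augmentation is performed at all: the pipeline runs with probability p1.
p2 is the probability p inside an individual operation: that operation is executed with probability p2.
p3 is the outer probability p of OneOf: with probability p3, one of the operations contained in OneOf is executed.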
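
The combined effect of the three probabilities can be worked out by multiplication. The sketch below assumes the levels are independent and that, once OneOf fires, it picks one child with probability proportional to that child's own p (which is how the albumentations documentation describes it); the numbers are the ones from the example above.

```python
p1, p2, p3 = 0.9, 0.1, 0.3
# the p values given to the two children of OneOf
child_ps = {"IAAAdditiveGaussianNoise": 0.9, "GaussNoise": 0.6}

# RandomRotate90 runs only if the pipeline runs (p1) and its own draw succeeds (p2)
p_rotate = p1 * p2

# OneOf fires only if the pipeline runs (p1) and its own draw succeeds (p3)
p_one_of = p1 * p3

# inside OneOf the child p values are normalized to pick exactly one child
total = sum(child_ps.values())
p_child = {name: p_one_of * p / total for name, p in child_ps.items()}

print(p_rotate)   # about 0.09
print(p_one_of)   # about 0.27
print(p_child)    # about 0.162 and 0.108
```

So in this example an image is rotated in roughly 9% of the calls and receives one of the two noise operations in roughly 27% of them.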

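albumentations is used in much the same way as imgaug.
Although albumentations does not integrate seamlessly with PyTorch, it works well inside a custom dataset; you basically only need to add the scaling to [0, 1] and Normalize.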

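Augmentor is also a powerful image augmentation library. It is used mainly for augmenting classification and segmentation data, and it integrates easily with common machine learning frameworks.

Augmentor takes a pipeline-based approach: augmentation operations are added one after another to build a Pipeline. Images are then passed through the Pipeline, and each operation in it is applied to them in turn.
Each operation has its own user-defined probability, and Augmentor applies the operation at random with that probability as an image passes through the Pipeline.

Using Augmentor breaks down into the following stages

Create a new Pipeline. Before any augmentation starts, you need to create an empty Pipeline object that points to the directory containing the dataset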
```
>>> import Augmentor

>>> p = Augmentor.Pipeline("/path/to/images")
Initialised with 100 images found in selected directory.
```

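Add augmentation operations to the Pipeline

# add a rotation to the pipeline; probability is the probability that it is executed
>>> p.rotate(probability=0.7, max_left_rotation=10, max_right_rotation=10)

# add a zoom operation to the pipeline
>>> p.zoom(probability=0.3, min_factor=1.1, max_factor=1.6)

Run all operations in the pipeline on the images and sample the results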
```
# sample 10000 times to obtain 10000 images
>>> p.sample(10000)
```

Once the pipeline is built and the desired augmentation operations have been added, it can also be integrated easily with torchvision.transforms.


>>> trans = transforms.Compose([p.torch_transform(),
    transforms.ToTensor()])

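This article covered

common data augmentation methods in PyTorch
some more advanced data augmentation methods
commonly used third-party data augmentation libraries

Data augmentation falls into 3 categories: geometric transformations, color transformations, and information removal. The augmentation module that ships with torchvision.transforms already contains the standard geometric and color operations. Note that the operations passed to transforms.Compose are applied in order.
The advanced methods introduced here mainly belong to the information removal category. New augmentation methods keep appearing, and whether a given method will work on your own dataset is not known in advance. If you use one, set its parameters sensibly and make sure the augmentation does not destroy too much of the useful information in the image.
imgaug, albumentations, and Augmentor are all excellent third-party augmentation libraries. imgaug and albumentations are both very comprehensive and powerful, but albumentations is faster, while Augmentor plugs smoothly into machine learning frameworks such as TF and Torch. Choose according to your own preferences and needs.
Data augmentation is an indispensable part of data preprocessing and can usually improve a network's accuracy and generalization to some degree. More augmentation is not always better, though: the operations should be chosen to suit the characteristics of the dataset.

That's it for now. I will add more as things come to mind.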


