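Batch-processing code for videos with unequal frame counts

Author | Rahul Varma
Translator | VK
Source | Towards Data Science

One of the most important steps in training and testing an effective machine learning model is collecting a large amount of data and using it to train the model efficiently. Mini-batches help with this: each iteration trains on a small portion of the data.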

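However, with so much machine learning work now being done on video datasets, there is the problem of efficiently batching videos of unequal length. Most approaches rely on cropping the videos to the same length so that an identical number of frames is extracted in each iteration. That is not particularly useful in scenarios where we need information from every frame to predict certain events effectively, especially in self-driving cars and action recognition.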
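To make the cost of cropping concrete, here is a minimal sketch in plain Python. The function name `frames_lost_by_cropping` and the frame counts are made-up illustrations, not part of the original code; the sketch only counts how many frames are thrown away when every video is cut to the length of the shortest one.

```python
def frames_lost_by_cropping(frame_counts):
    """Return how many frames are discarded if every video is cropped to the shortest length."""
    shortest = min(frame_counts)
    return sum(count - shortest for count in frame_counts)


# Hypothetical dataset: three videos with 120, 300 and 450 frames.
counts = [120, 300, 450]
lost = frames_lost_by_cropping(counts)
print(lost, "of", sum(counts), "frames would be discarded")
```

With these illustrative numbers more than half of the frames never reach the model, which is the situation the approach in this article tries to avoid.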

We can build a solution that handles videos of different lengths.

Using LoadStreams from Glenn Jocher's Yolov3 (https://github.com/ultralytic...) as a base, I created the LoadStreamsBatch class.

Class initialization

```python
import os
import time
from threading import Thread

import cv2
import magic  # python-magic, used to detect video files by MIME type
import numpy as np


class LoadStreamsBatch:
    def __init__(self, sources='streams.txt', img_size=416, batch_size=2, subdir_search=False):
        self.mode = 'images'
        self.img_size = img_size
        self.def_img_size = None

        # Collect video paths from a directory (optionally recursive) or a text file
        videos = []
        if os.path.isdir(sources):
            if subdir_search:
                for subdir, dirs, files in os.walk(sources):
                    for file in files:
                        if 'video' in magic.from_file(subdir + os.sep + file, mime=True):
                            videos.append(subdir + os.sep + file)
            else:
                for elements in os.listdir(sources):
                    # Fix: test the full path, not the bare name, when skipping directories
                    path = sources + os.sep + elements
                    if not os.path.isdir(path) and 'video' in magic.from_file(path, mime=True):
                        videos.append(path)
        else:
            with open(sources, 'r') as f:
                videos = [x.strip() for x in f.read().splitlines() if len(x.strip())]

        n = len(videos)
        curr_batch = 0
        self.data = [None] * batch_size
        self.cap = [None] * batch_size
        self.sources = videos
        self.n = n
        self.cur_pos = 0

        # Start a thread per batch slot to read frames from the video stream
        for i, s in enumerate(videos):
            if curr_batch == batch_size:
                break
            print('%g/%g: %s... ' % (self.cur_pos+1, n, s), end='')
            self.cap[curr_batch] = cv2.VideoCapture(s)
            try:
                assert self.cap[curr_batch].isOpened()
            except AssertionError:
                print('Failed to open %s' % s)
                self.cur_pos+=1
                continue
            w = int(self.cap[curr_batch].get(cv2.CAP_PROP_FRAME_WIDTH))
            h = int(self.cap[curr_batch].get(cv2.CAP_PROP_FRAME_HEIGHT))
            fps = self.cap[curr_batch].get(cv2.CAP_PROP_FPS) % 100
            frames = int(self.cap[curr_batch].get(cv2.CAP_PROP_FRAME_COUNT))
            # Fix: index by batch slot (curr_batch), not by list position (i);
            # the two diverge as soon as one video fails to open.
            _, self.data[curr_batch] = self.cap[curr_batch].read()  # guarantee first frame
            thread = Thread(target=self.update, args=([curr_batch, self.cap[curr_batch], self.cur_pos+1]), daemon=True)
            print(' success (%gx%g at %.2f FPS having %g frames).' % (w, h, fps, frames))
            curr_batch+=1
            self.cur_pos+=1
            thread.start()
            print('')  # newline

        if all( v is None for v in self.data ):
            return

        # Check for common shapes (slots left empty are skipped)
        s = np.stack([letterbox(x, new_shape=self.img_size)[0].shape for x in self.data if x is not None], 0)  # inference shapes
        self.rect = np.unique(s, axis=0).shape[0] == 1
        if not self.rect:
            print('WARNING: Different stream shapes detected. For optimal performance supply similarly-shaped streams.')
```

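The __init__ function accepts four arguments. img_size is the same as in the original version; the other three are defined as follows:

sources: takes a directory path or a text file as input.
batch_size: the desired batch size.
subdir_search: toggle this option to make sure all subdirectories are searched for relevant files when a directory is passed as the sources argument.

I first check whether the sources argument is a directory or a text file. If it is a directory, I read everything in it (including subdirectories if subdir_search is True); otherwise I read the video paths from the text file. The video paths are stored in a list, and cur_pos keeps track of the current position in that list.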
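The path-collection step can be sketched on its own. This is a simplified stand-in, not the author's code: the helper name `collect_video_paths` is hypothetical, and it swaps python-magic (which inspects file contents) for the standard-library `mimetypes` module, which only guesses from the file extension.

```python
import mimetypes
import os


def collect_video_paths(sources, subdir_search=False):
    """Return video paths from a directory (optionally recursive) or from a text file of paths."""
    if os.path.isdir(sources):
        if subdir_search:
            # Walk the whole tree and keep every file found
            candidates = [os.path.join(root, name)
                          for root, _dirs, files in os.walk(sources)
                          for name in files]
        else:
            # Only look at regular files directly inside the directory
            candidates = [os.path.join(sources, name)
                          for name in os.listdir(sources)
                          if os.path.isfile(os.path.join(sources, name))]
        # Assumption: the file extension is a good enough hint that a file is a video
        return sorted(p for p in candidates
                      if (mimetypes.guess_type(p)[0] or '').startswith('video'))
    # Otherwise treat `sources` as a text file with one video path per line
    with open(sources, 'r') as f:
        return [line.strip() for line in f if line.strip()]
```

Guessing from the extension is cheaper than reading file headers, but it will accept a mislabeled file, so the content-based check in the article's code is the safer choice for untrusted directories.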

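The list is iterated up to a maximum of batch_size entries, with checks that skip corrupt or missing videos. The first frame of each opened video is sent to the letterbox function to resize the image. None of this differs from the original version, except for the case where every video is faulty or unavailable.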

```python
import cv2
import numpy as np


def letterbox(img, new_shape=(416, 416), color=(114, 114, 114), auto=True, scaleFill=False, scaleup=True):
    # Resize image to a 32-pixel-multiple rectangle https://github.com/ultralytics/yolov3/issues/232
    shape = img.shape[:2]  # current shape [height, width]
    if isinstance(new_shape, int):
        new_shape = (new_shape, new_shape)

    # Scale ratio (new / old)
    r = min(new_shape[0] / shape[0], new_shape[1] / shape[1])
    if not scaleup:  # only scale down, never scale up (for better test mAP)
        r = min(r, 1.0)

    # Compute padding
    ratio = r, r  # width, height ratios
    new_unpad = int(round(shape[1] * r)), int(round(shape[0] * r))
    dw, dh = new_shape[1] - new_unpad[0], new_shape[0] - new_unpad[1]  # width, height padding
    if auto:  # minimum rectangle
        dw, dh = np.mod(dw, 64), np.mod(dh, 64)  # width, height padding
    elif scaleFill:  # stretch
        dw, dh = 0.0, 0.0
        new_unpad = new_shape
        ratio = new_shape[0] / shape[1], new_shape[1] / shape[0]  # width, height ratios

    dw /= 2  # split the padding between the two sides
    dh /= 2

    if shape[::-1] != new_unpad:  # resize
        img = cv2.resize(img, new_unpad, interpolation=cv2.INTER_LINEAR)
    top, bottom = int(round(dh - 0.1)), int(round(dh + 0.1))
    left, right = int(round(dw - 0.1)), int(round(dw + 0.1))
    img = cv2.copyMakeBorder(img, top, bottom, left, right, cv2.BORDER_CONSTANT, value=color)  # add border
    return img, ratio, (dw, dh)
```
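The shape arithmetic inside letterbox is easier to follow with numbers. The sketch below repeats only that arithmetic in plain Python, for the default path (integer `new_shape`, `scaleup=True`, no `scaleFill`); the helper name `letterbox_shape` is hypothetical and it never touches pixel data.

```python
def letterbox_shape(height, width, new_shape=416, auto=True):
    """Return (out_height, out_width) that letterbox would produce for a frame of this size."""
    r = min(new_shape / height, new_shape / width)          # scale ratio (new / old)
    new_w, new_h = int(round(width * r)), int(round(height * r))
    dw, dh = new_shape - new_w, new_shape - new_h            # padding needed to reach a square
    if auto:                                                 # minimum rectangle: keep padding modulo 64
        dw, dh = dw % 64, dh % 64
    dw, dh = dw / 2, dh / 2                                  # split the padding between two sides
    top, bottom = int(round(dh - 0.1)), int(round(dh + 0.1))
    left, right = int(round(dw - 0.1)), int(round(dw + 0.1))
    return new_h + top + bottom, new_w + left + right


print(letterbox_shape(720, 1280))              # -> (288, 416)
print(letterbox_shape(720, 1280, auto=False))  # -> (416, 416)
print(letterbox_shape(480, 640))               # -> (352, 416)
```

A 1280x720 stream and a 640x480 stream end up with different heights when `auto=True`, so their frames cannot be stacked into one array. That is the condition the shape check at the end of __init__ detects before printing its warning and setting `self.rect` to False.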

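Function that retrieves frames at fixed intervals

The update function has one small change: we additionally store the default image size. It is needed once every video has been picked up for processing and, because the lengths differ, one video finishes before another. This becomes clearer in the next part of the code, the __next__ function.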

```python
    # Method of the LoadStreamsBatch class
    def update(self, index, cap, cur_pos):
        # Read the next stream frame in a daemon thread
        n = 0
        while cap.isOpened():
            n += 1
            # _, self.imgs[index] = cap.read()
            cap.grab()
            if n == 4:  # read every 4th frame
                _, self.data[index] = cap.retrieve()
                if self.def_img_size is None:
                    self.def_img_size = self.data[index].shape
                n = 0
            time.sleep(0.01)  # wait
```
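In OpenCV, `grab()` advances the stream without decoding and `retrieve()` decodes the frame that was last grabbed, so the loop above skips three frames cheaply and decodes the fourth. The counter logic can be checked without a video; the function below is a hypothetical stand-in in which each loop iteration plays the role of one `cap.grab()` call.

```python
def retrieved_frame_indices(total_frames, step=4):
    """Indices of the grabbed frames that get decoded when every `step`-th one is retrieved."""
    kept = []
    n = 0
    for frame_index in range(total_frames):   # one iteration stands in for one cap.grab()
        n += 1
        if n == step:                          # this is where cap.retrieve() would be called
            kept.append(frame_index)
            n = 0
    return kept


print(retrieved_frame_indices(12))  # -> [3, 7, 11]
```

The indices count grabs made by the thread. In the loader the very first frame of each video is read separately with `cap.read()` before the thread starts, so it is not part of this sequence.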

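Iterator

If a frame is present, it is passed to the letterbox function as usual. If the frame is None, the video in that slot has been fully processed, and we check whether every video in the list has been processed. If there are more videos to process, the cur_pos pointer is used to find the position of the next available video.

If no more videos can be pulled from the list but some videos are still being processed, a blank frame is sent to the downstream processing components for the finished slot. In other words, the batch is padded dynamically with blank frames of the stored default image size for as long as the other slots still have frames left.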

```python
    # Method of the LoadStreamsBatch class.
    # self.count is assumed to be initialized elsewhere (for example in __iter__, which this article does not show).
    def __next__(self):
        self.count += 1
        img0 = self.data.copy()
        img = []

        for i, x in enumerate(img0):
            if x is not None:
                img.append(letterbox(x, new_shape=self.img_size, auto=self.rect)[0])
            else:
                if self.cur_pos == self.n:
                    # No videos left in the list
                    if all( v is None for v in img0 ):
                        cv2.destroyAllWindows()
                        raise StopIteration
                    else:
                        # Other slots are still running: pad this slot with a blank frame
                        img0[i] = np.zeros(self.def_img_size)
                        img.append(letterbox(img0[i], new_shape=self.img_size, auto=self.rect)[0])
                else:
                    # Refill this slot with the next video in the list
                    print('%g/%g: %s... ' % (self.cur_pos+1, self.n, self.sources[self.cur_pos]), end='')
                    self.cap[i] = cv2.VideoCapture(self.sources[self.cur_pos])
                    fldr_end_flg = 0
                    while not self.cap[i].isOpened():
                        print('Failed to open %s' % self.sources[self.cur_pos])
                        self.cur_pos+=1
                        if self.cur_pos == self.n:
                            img0[i] = np.zeros(self.def_img_size)
                            img.append(letterbox(img0[i], new_shape=self.img_size, auto=self.rect)[0])
                            fldr_end_flg = 1
                            break
                        self.cap[i] = cv2.VideoCapture(self.sources[self.cur_pos])
                    if fldr_end_flg:
                        continue
                    # Fix: `cap` was undefined here; query the capture that belongs to this slot
                    w = int(self.cap[i].get(cv2.CAP_PROP_FRAME_WIDTH))
                    h = int(self.cap[i].get(cv2.CAP_PROP_FRAME_HEIGHT))
                    fps = self.cap[i].get(cv2.CAP_PROP_FPS) % 100
                    frames = int(self.cap[i].get(cv2.CAP_PROP_FRAME_COUNT))
                    _, self.data[i] = self.cap[i].read()  # guarantee first frame
                    img0[i] = self.data[i]
                    img.append(letterbox(self.data[i], new_shape=self.img_size, auto=self.rect)[0])
                    thread = Thread(target=self.update, args=([i, self.cap[i], self.cur_pos+1]), daemon=True)
                    print(' success (%gx%g at %.2f FPS having %g frames).' % (w, h, fps, frames))
                    self.cur_pos+=1
                    thread.start()
                    print('')  # newline

        # Stack
        img = np.stack(img, 0)

        # Convert
        img = img[:, :, :, ::-1].transpose(0, 3, 1, 2)  # BGR to RGB, to bsx3x416x416
        img = np.ascontiguousarray(img)

        return self.sources, img, img0, None
```
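The refill-then-pad behaviour of __next__ can be isolated from OpenCV and threads. The following is a simplified, synchronous sketch under stated assumptions: `batch_unequal_streams` is a hypothetical name, ordinary Python iterables stand in for videos, and `blank` stands in for the zero-filled frame.

```python
_SENTINEL = object()  # marks "this stream has no more items"


def batch_unequal_streams(streams, batch_size, blank=None):
    """Yield fixed-size batches from iterables of unequal length.

    A slot whose stream is exhausted is refilled with the next unopened stream.
    Once none are left, the slot is padded with `blank` until every slot is done.
    """
    pending = [iter(s) for s in streams]                      # streams not yet given a slot
    slots = [pending.pop(0) if pending else None for _ in range(batch_size)]
    while True:
        batch, any_real_item = [], False
        for i in range(batch_size):
            item = _SENTINEL
            while slots[i] is not None and item is _SENTINEL:
                item = next(slots[i], _SENTINEL)
                if item is _SENTINEL:                         # finished: try to refill the slot
                    slots[i] = pending.pop(0) if pending else None
            if item is _SENTINEL:
                batch.append(blank)                           # nothing left for this slot: pad
            else:
                batch.append(item)
                any_real_item = True
        if not any_real_item:                                 # every slot is exhausted
            return
        yield batch


for batch in batch_unequal_streams(['abc', 'de', 'fghi'], batch_size=2, blank='-'):
    print(batch)
```

With the three toy "videos" above, the second slot switches from `'de'` to `'fghi'` without leaving a gap, and the first slot is padded with `'-'` once `'abc'` runs out. Iteration stops only when every slot is exhausted, which mirrors the StopIteration condition in __next__.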

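Conclusion

With so much time spent on data collection and data preprocessing, I believe this helps cut down the time spent fitting videos to the model, so that we can concentrate on fitting the model to the data.

I am attaching the complete source code here. Hope this helps!

Original article: https://towardsdatascience.co...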
