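This article describes how to convert a front-camera image of a race track into a semantically segmented bird's-eye view of the track.
The input image is shown below:
Output:
In short, our task is to take the input image, a front-camera view of the track ahead, and construct a bird's-eye view of the track in which different colors mark the track and the road boundaries.
Extracting information about the track's direction from the input image alone is quite difficult, because much of the information about the upcoming track is compressed into the first 20 pixel rows of the image. A bird's-eye camera can express information about the track ahead in a much clearer form, which we can more easily use to plan the car's behavior.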
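Capturing bird's-eye images during normal driving is very hard to do, so if we can reconstruct them from front-camera images, we can plan paths with clearer information. Another benefit is dimensionality reduction: the whole image is effectively represented as a set of 32 numbers, which takes far less space than the image itself. This low-dimensional data can also serve as the observation space for a reinforcement learning algorithm.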
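To give a rough sense of the size reduction, the sketch below compares the number of values in one input frame with the 32-number latent vector. It assumes the 95 x 512 RGB crop used as the default input size in the model code at the end of this article; the helper name `compression_ratio` is made up for illustration, and other resolutions would change the ratio.

```python
# Assumed input size: the model defaults (im_c=3, im_h=95, im_w=512).
IM_C, IM_H, IM_W = 3, 95, 512
Z_DIM = 32  # size of the latent vector


def compression_ratio(channels, height, width, z_dim):
    """Return (values per image, values per latent, ratio between them)."""
    image_values = channels * height * width
    return image_values, z_dim, image_values / z_dim


image_values, latent_values, ratio = compression_ratio(IM_C, IM_H, IM_W, Z_DIM)
print(f"image: {image_values} values, latent: {latent_values} values, "
      f"ratio: {ratio:.0f}x")
```

This counts raw values only; it says nothing about how much of the image content the 32 numbers actually preserve.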
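We use a tool called a variational autoencoder (VAE) to accomplish this. Simply put, we compress the image into a 32-dimensional latent space and then reconstruct the segmented bird's-eye view from it. The PyTorch code at the end of this article shows the complete model.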
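As a minimal sketch of what compressing to a latent space means in a VAE: the encoder predicts a mean and a log-variance for each latent dimension, and a latent vector is sampled as z = mu + std * eps, with eps drawn from a standard normal. The pure-Python function below mirrors the `reparameterize` method of the model at the end of this article; the name `sample_latent` and the example values are assumptions for illustration.

```python
import math
import random


def sample_latent(mu, logvar, rng=random):
    """Sample z = mu + std * eps, where std = exp(0.5 * logvar)."""
    z = []
    for m, lv in zip(mu, logvar):
        std = math.exp(0.5 * lv)
        eps = rng.gauss(0.0, 1.0)  # noise from a standard normal
        z.append(m + std * eps)
    return z


# Illustrative values for a 32-dimensional latent space.
mu = [0.1] * 32
logvar = [-2.0] * 32
z = sample_latent(mu, logvar, random.Random(0))
print(len(z))
```

Writing the sample this way keeps the randomness in eps, so gradients can flow through mu and logvar during training.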
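To train the model, we collected a series of paired images from the front camera and the bird's-eye camera. Each front-camera image is encoded by the encoder, fully connected layers reduce the result to the target dimensionality, and finally the decoder reconstructs the image with a series of transposed convolution layers.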
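The shape bookkeeping behind this pipeline can be sketched without PyTorch. The helper names below are hypothetical; the layer settings (kernel 4, stride 2, padding 1, five layers, `output_padding=(1, 0)` in the decoder) and the 95 x 512 input are taken from the model code at the end of this article, and the formulas are the standard output-size formulas for convolutions and transposed convolutions with dilation 1.

```python
def conv_out(size, kernel=4, stride=2, padding=1):
    """Output length of a strided convolution along one axis."""
    return (size + 2 * padding - kernel) // stride + 1


def deconv_out(size, kernel=4, stride=2, padding=1, output_padding=0):
    """Output length of a transposed convolution along one axis."""
    return (size - 1) * stride - 2 * padding + kernel + output_padding


# Encoder: five stride-2 convolutions applied to a 95 x 512 image.
h, w = 95, 512
for _ in range(5):
    h, w = conv_out(h), conv_out(w)
encoded = (512, h, w)  # (channels, height, width) after the last conv
flat = 512 * h * w     # features seen by the fully connected layers

# Decoder: five transposed convolutions; height needs output_padding=1
# at every layer to get back to the odd height of 95.
for _ in range(5):
    h = deconv_out(h, output_padding=1)
    w = deconv_out(w, output_padding=0)
decoded = (h, w)

print(encoded, flat, decoded)
```

The asymmetric `output_padding` is needed because the height is odd at every encoder stage, so a plain stride-2 transposed convolution would come back one row short each time.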
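The results are shown below:
Although some noise is visible in the reconstruction, it captures the overall curve of the track well. The code is as follows: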
import cv2
import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F


class BEVVAE(nn.Module):
    """Input should be (bsz, C, H, W) where C=3, H=95, W=512"""

    def __init__(self, im_c=3, im_h=95, im_w=512, z_dim=32):
        super().__init__()
        self.im_c = im_c
        self.im_h = im_h
        self.im_w = im_w

        # Five stride-2 convolutions, each halving the spatial size.
        encoder_list = [
            nn.Conv2d(im_c, 32, kernel_size=4, stride=2, padding=1),
            nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2, padding=1),
            nn.ReLU(),
            nn.Conv2d(64, 128, kernel_size=4, stride=2, padding=1),
            nn.ReLU(),
            nn.Conv2d(128, 256, kernel_size=4, stride=2, padding=1),
            nn.ReLU(),
            nn.Conv2d(256, 512, kernel_size=4, stride=2, padding=1),
            nn.ReLU(),
            nn.Flatten(),
        ]
        self.encoder = nn.Sequential(*encoder_list)
        self.encoder_list = encoder_list

        # Run a dummy image through the conv layers (without Flatten)
        # to find the feature-map shape that the decoder must restore.
        sample_img = torch.zeros([1, im_c, im_h, im_w])
        em_shape = nn.Sequential(*encoder_list[:-1])(sample_img).shape[1:]
        h_dim = int(np.prod(em_shape))

        self.fc1 = nn.Linear(h_dim, z_dim)  # mean
        self.fc2 = nn.Linear(h_dim, z_dim)  # log-variance
        self.fc3 = nn.Linear(z_dim, h_dim)  # latent -> decoder input

        # output_padding=(1, 0) restores the odd height at every stage.
        self.decoder = nn.Sequential(
            nn.Unflatten(1, em_shape),
            nn.ConvTranspose2d(em_shape[0], 256, kernel_size=4, stride=2, padding=1, output_padding=(1, 0)),
            nn.ReLU(),
            nn.ConvTranspose2d(256, 128, kernel_size=4, stride=2, padding=1, output_padding=(1, 0)),
            nn.ReLU(),
            nn.ConvTranspose2d(128, 64, kernel_size=4, stride=2, padding=1, output_padding=(1, 0)),
            nn.ReLU(),
            nn.ConvTranspose2d(64, 32, kernel_size=4, stride=2, padding=1, output_padding=(1, 0)),
            nn.ReLU(),
            nn.ConvTranspose2d(32, im_c, kernel_size=4, stride=2, padding=1, output_padding=(1, 0)),
            nn.Sigmoid(),
        )

    def reparameterize(self, mu, logvar):
        std = logvar.mul(0.5).exp_()
        esp = torch.randn(*mu.size(), device=mu.device)
        z = mu + std * esp
        return z

    def bottleneck(self, h):
        mu, logvar = self.fc1(h), self.fc2(h)
        z = self.reparameterize(mu, logvar)
        return z, mu, logvar

    def representation(self, x):
        return self.bottleneck(self.encoder(x))[0]

    def encode_raw(self, x: np.ndarray, device):
        # assume x is RGB image with shape (bsz, H, W, 3)
        # np.float was removed from recent NumPy releases; use np.float32
        p = np.zeros([x.shape[0], 95, 512, 3], np.float32)
        for i in range(x.shape[0]):
            # keep rows 190..284 of each frame and scale to [0, 1]
            p[i] = x[i][190:285] / 255
        x = p.transpose(0, 3, 1, 2)
        x = torch.as_tensor(x, device=device, dtype=torch.float)
        v = self.representation(x)
        return v, v.detach().cpu().numpy()

    def squish_targets(self, x: np.ndarray) -> np.ndarray:
        # Take in target images and resize them
        p = np.zeros([x.shape[0], 95, 512, 3], np.float32)
        for i in range(x.shape[0]):
            p[i] = cv2.resize(x[i], (512, 95)) / 255
        x = p.transpose(0, 3, 1, 2)
        return x

    def encode(self, x):
        h = self.encoder(x)
        z, mu, logvar = self.bottleneck(h)
        return z, mu, logvar

    def decode(self, z):
        z = self.fc3(z)
        return self.decoder(z)

    def forward(self, x):
        # expects (N, C, H, W)
        z, mu, logvar = self.encode(x)
        z = self.decode(z)
        return z, mu, logvar

    def loss(self, bev, recon, mu, logvar, kld_weight=1.0):
        # reconstruction term plus KL divergence from a standard normal
        bce = F.binary_cross_entropy(recon, bev, reduction="sum")
        kld = -0.5 * torch.sum(1 + logvar - mu ** 2 - logvar.exp())
        return bce + kld * kld_weight
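For readers who want to see the arithmetic of the `loss` method in isolation, here is a small pure-Python sketch of the same two terms: a summed binary cross-entropy between the reconstruction and the target, plus the KL divergence of the predicted Gaussian from a standard normal. The function names and toy values are assumptions for illustration, and the `eps` clamp is a simplification added here to avoid log(0); it is not exactly how PyTorch guards that case.

```python
import math


def bce_sum(recon, target, eps=1e-12):
    """Summed binary cross-entropy over flat lists of pixel values in [0, 1]."""
    total = 0.0
    for r, t in zip(recon, target):
        r = min(max(r, eps), 1.0 - eps)  # keep log() finite
        total -= t * math.log(r) + (1.0 - t) * math.log(1.0 - r)
    return total


def kld(mu, logvar):
    """KL divergence of N(mu, exp(logvar)) from N(0, 1), summed over dims."""
    return -0.5 * sum(
        1.0 + lv - m * m - math.exp(lv) for m, lv in zip(mu, logvar)
    )


def vae_loss(recon, target, mu, logvar, kld_weight=1.0):
    """Reconstruction term plus weighted KL term."""
    return bce_sum(recon, target) + kld_weight * kld(mu, logvar)


# Toy example: one pixel reconstructed as 0.5 against a target of 1.0,
# and a one-dimensional latent with mean 1 and unit variance.
print(vae_loss([0.5], [1.0], [1.0], [0.0]))
```

The KL term is zero only when the predicted mean is 0 and the log-variance is 0, so `kld_weight` controls how strongly the latent codes are pulled toward a standard normal.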
https://avoid.overfit.cn/post/48f129f8e05242128cc55be13433ad0a
Author: Nandan Tumu