TensorFlow: Web AR face effects with tfjs and three.js

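Preface

Augmented Reality (AR) is on the rise, and AR applications keep appearing: AR navigation, AR shopping, AR education, AR games, and more. It is fair to say that AR is profoundly changing our lives.

The underlying technology that supports AR keeps improving as well. As AI becomes widespread, AI capabilities can be integrated into real applications with little effort, and the arrival of TensorFlow.js (tfjs) lets front-end developers work in the AI field too. Upgrades to browsers and mobile devices also open up more possibilities for web applications.

The web browser is the most readily available human-computer interaction terminal. It works out of the box with no app to install, and it runs on phones, tablets, PCs and other devices. In this wave of AR technology, Web AR has great potential.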

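Chapter 1: Tools

TensorFlow.js is a library released by Google for machine learning development in JavaScript, and it has attracted wide attention since its release in 2018. With tfjs we can develop machine learning models in JavaScript and train or run them directly in the browser or in Node.js.

Three.js is a JavaScript library for creating and displaying 3D graphics in the browser, first published on GitHub by Ricardo Cabello in April 2010. It is built on WebGL and can use hardware acceleration, which makes it possible to display complex 3D graphics and animation in the browser.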

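Chapter 2: Key concepts

Let's start with a few important concepts.

Face Mesh

Face Mesh is a face geometry solution that provides 468 facial landmarks. Each landmark has an index, and the landmarks for each part of the face can be looked up by index. (Index reference)

UV Map

UV coordinates are two-dimensional texture coordinates: U is the horizontal direction and V is the vertical direction. A UV map describes how the surface of a 3D object maps to an image texture. With a UV map we can paste a 2D image texture onto the surface of a 3D object.

Matrix

A matrix can describe the translation, rotation and scale of an object. Three.js uses matrices to perform 3D transformations.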
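To make the idea concrete, here is a minimal, dependency-free sketch of how one 4x4 matrix can encode a uniform scale, a rotation about the Z axis, and a translation at the same time. The helper names `makeTransform` and `applyToPoint` are hypothetical and are not three.js APIs; in three.js the `Matrix4` class plays this role.

```typescript
type Vec3 = [number, number, number]
// Row-major 4x4 matrix stored as 16 numbers.
type Mat4 = number[]

// Build a matrix that scales uniformly by `scale`, rotates by `angleZ`
// radians about the Z axis, and finally translates by `t`.
function makeTransform(t: Vec3, angleZ: number, scale: number): Mat4 {
  const c = Math.cos(angleZ) * scale
  const s = Math.sin(angleZ) * scale
  return [
    c, -s, 0, t[0],
    s, c, 0, t[1],
    0, 0, scale, t[2],
    0, 0, 0, 1,
  ]
}

// Multiply the matrix by the point (x, y, z, 1) and drop the last component.
function applyToPoint(m: Mat4, p: Vec3): Vec3 {
  const [x, y, z] = p
  return [
    m[0] * x + m[1] * y + m[2] * z + m[3],
    m[4] * x + m[5] * y + m[6] * z + m[7],
    m[8] * x + m[9] * y + m[10] * z + m[11],
  ]
}

// A quarter turn, doubled in size, then moved: the result is approximately [10, 2, -5].
const quarterTurn = makeTransform([10, 0, -5], Math.PI / 2, 2)
const moved = applyToPoint(quarterTurn, [1, 0, 0])
console.log(moved)
```

Packing all three operations into one matrix is what lets a renderer place every vertex of a model with a single multiplication.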

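Shape keys

In 3D authoring software, shape keys (morph targets) are typically used to create deformation animations, for example facial movements such as blinking or opening the mouth. A shape key takes a value from 0.0 to 1.0, corresponding to the start and end states of the deformation. By changing the value of a shape key, the deformation can be controlled precisely.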
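Conceptually, a shape key is a linear blend between a base shape and a target shape. The sketch below illustrates that idea on plain arrays. The function name `blendMorphTarget` and the tiny one-vertex "eyelid" data are made up for illustration; real engines perform this blend on the GPU and may store targets as offsets rather than absolute positions.

```typescript
// Blend one morph target into a base shape.
// `base` and `target` are flat [x, y, z, x, y, z, ...] arrays of equal length.
function blendMorphTarget(base: number[], target: number[], influence: number): number[] {
  if (base.length !== target.length) {
    throw new Error('base and target must have the same length')
  }
  // Keep the influence inside the 0.0 to 1.0 range described above.
  const t = Math.min(1, Math.max(0, influence))
  return base.map((value, i) => value + (target[i] - value) * t)
}

// Hypothetical one-vertex eyelid: y = 1 when open, y = 0 when closed.
const eyelidOpen = [0, 1, 0]
const eyelidClosed = [0, 0, 0]
const halfClosed = blendMorphTarget(eyelidOpen, eyelidClosed, 0.5) // [0, 0.5, 0]
console.log(halfClosed)
```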

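Chapter 3: Architecture

The architecture of the program is shown in the figure. The process is as follows:
First, access the camera and obtain the camera picture
Then load the face detection model with tfjs and generate the Face Mesh
Build a triangle mesh from the Face Mesh, apply the UV map, and draw the face pattern
Compute the Matrix from the facial landmarks and recognize facial movements
Load the 3D model and apply the Matrix to it so that it appears in the correct position
Drive the model to blink, open its mouth, and perform other facial movements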

Chapter 4: Feature breakdown

Accessing the camera

Use navigator.mediaDevices.enumerateDevices to get the device list and find the videoinput entries, which are the cameras

    export async function getVideoDevices() {
      const devices = await navigator.mediaDevices.enumerateDevices()
      const videoDevices = devices.filter(item => item.kind === 'videoinput')
      return videoDevices
    }

Get the video stream

    export async function getVideoStream(deviceId: string, width?: number, height?: number) {
      try {
        const stream = await navigator.mediaDevices.getUserMedia({
          video: { deviceId, width, height }
        })
        return stream
      } catch (error) {
        return Promise.reject(error)
      }
    }
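One detail worth knowing about `getUserMedia`: a bare `deviceId` string is treated as a preference, so the browser may still pick a different camera. Wrapping it in `{ exact: ... }` makes the choice mandatory. The sketch below builds such a constraints object; `buildVideoConstraints` and the local `VideoConstraints` interface are hypothetical names introduced here so the example runs without DOM typings.

```typescript
// Minimal local types so the sketch does not depend on DOM typings.
interface VideoConstraints {
  deviceId?: { exact: string }
  width?: { ideal: number }
  height?: { ideal: number }
}

// Build a constraints object that insists on one camera but only
// expresses a preference for the resolution.
function buildVideoConstraints(
  deviceId?: string,
  width?: number,
  height?: number
): { video: VideoConstraints | true } {
  const video: VideoConstraints = {}
  if (deviceId) video.deviceId = { exact: deviceId }
  if (width) video.width = { ideal: width }
  if (height) video.height = { ideal: height }
  // With nothing specified, `video: true` asks for any available camera.
  return { video: Object.keys(video).length > 0 ? video : true }
}

// In the browser the result would be passed to
// navigator.mediaDevices.getUserMedia(...).
console.log(JSON.stringify(buildVideoConstraints('camera-1', 640, 480)))
```

With `exact`, the request is rejected if that camera is unavailable, so the caller should be prepared to fall back to another device.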

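We place a <video> element in the page in advance and use it to receive the stream. Later, video frames can be read from this <video> element

    <video id="video" width="1" height="1"></video>

    const video = document.getElementById('video') as HTMLVideoElement
    video.autoplay = true
    video.playsInline = true
    video.srcObject = stream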

Face detection

Here we use face-landmarks-detection, an open-source model provided by TensorFlow

    import * as faceLandmarksDetection from '@tensorflow-models/face-landmarks-detection'
    import * as tf from '@tensorflow/tfjs-core'
    import '@mediapipe/face_mesh'
    import '@tensorflow/tfjs-backend-webgl'

TensorFlow offers several runtime backends (cpu, webgl, wasm); a performance comparison can be found here. We choose webgl

    await tf.setBackend('webgl')
    // await tf.setBackend('wasm')

Load the face detection model

    const model = await faceLandmarksDetection.load(
      faceLandmarksDetection.SupportedPackages.mediapipeFacemesh,
      {
        // Enable iris detection
        shouldLoadIrisModel: true,
        // Number of faces; set to 1 to save resources
        maxFaces: 1,
        // Model URLs
        // modelUrl: '/tfjs/facemesh/model.json',
        // detectorModelUrl: '/tfjs/blazeface/model.json',
        // irisModelUrl: '/tfjs/iris/model.json'
      }
    )

By default tfjs loads models from tfhub, which unfortunately cannot be reached from mainland China. If you do not have a VPN, download the models you need from a mirror site and host them yourself

Pass the <video> element as the image input to run face detection. If detection succeeds, we can read the face data containing the Face Mesh (predictions[0])

    const predictions = await model.estimateFaces({
      input: video,
      predictIrises: true
    })
    if (predictions.length > 0) {
      // console.log(predictions[0])
    }

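Render function

Face detection and rendering are relatively time-consuming. To avoid a stuttering picture, we use requestAnimationFrame. Face detection and the subsequent rendering (render3D) are both placed in the frame callback, and the code is reorganized as follows

    async function render(model: MediaPipeFaceMesh) {
      const predictions = await model.estimateFaces({
        input: video,
        predictIrises: true
      })
      if (predictions.length > 0) {
        // console.log(predictions[0])
      }
      // predictions[0] is undefined when no face is detected
      render3D(predictions[0])
      requestAnimationFrame(() => {
        render(model)
      })
    }

    render(model)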

3D scene

Before rendering a 3D model, we first need to create a 3D scene

    import * as THREE from 'three'

    const scene = new THREE.Scene()

If needed, the camera picture can also be used as the background of the scene

    const vw = video.videoWidth
    const vh = video.videoHeight
    const backgroundTexture = new THREE.VideoTexture(video)
    const background = new THREE.Mesh(
      new THREE.PlaneGeometry(vw, vh),
      new THREE.MeshBasicMaterial({
        map: backgroundTexture
      })
    )
    background.position.set(0, 0, -1000)
    scene.add(background)

Place the camera. An orthographic camera (OrthographicCamera) is used here

    let w = window.innerWidth
    let h = window.innerHeight
    const camera = new THREE.OrthographicCamera(
      w / -2,
      w / 2,
      h / 2,
      h / -2,
      0.1,
      2000
    )
    camera.position.set(0, 0, 1000)
    camera.lookAt(scene.position)

Add some lighting

    const hemiLight = new THREE.HemisphereLight(0xffffff, 0xffffff, 0.3)
    scene.add(hemiLight)
    const ambientLight = new THREE.AmbientLight(0xffffff, 0.7)
    scene.add(ambientLight)
    const directionalLight = new THREE.DirectionalLight(0xffffff, 0.7)
    directionalLight.position.set(0.5, 0, 0.866)
    scene.add(directionalLight)

Create the renderer

    const renderer = new THREE.WebGLRenderer({
      canvas,
      alpha: true,
      antialias: true
    })
    renderer.setPixelRatio(window.devicePixelRatio)
    renderer.setSize(w, h)

The 3D scene is now ready. Finally, the rendering step goes into the render function mentioned above

    function render3D(prediction: AnnotatedPrediction | undefined) {
      renderer.render(scene, camera)
    }

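Face texture

Three.js has various kinds of 3D objects. Among them, Mesh represents a triangle mesh, which can approximate complex 3D objects such as a face. geometry holds the geometric properties of the object and material holds its material properties

    const mesh = new THREE.Mesh(geometry, material)
    scene.add(mesh)

Create the geometry: connect the 468 facial landmarks into a triangle mesh in a fixed order (TRIANGULATION) and load the UV map

    const geometry = new THREE.BufferGeometry()
    geometry.setIndex(TRIANGULATION)
    // uvs is a flat [u, v, u, v, ...] array; the U component is flipped (1 - u)
    geometry.setAttribute('uv', new THREE.Float32BufferAttribute(uvs.map((item, index) => index % 2 ? item : 1 - item), 2))
    geometry.computeVertexNormals()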
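The `map` call in the geometry code flips only the U component of each UV pair and leaves V untouched, which mirrors the texture horizontally. The article does not say why; presumably it compensates for a difference in coordinate conventions. A small sketch on a made-up three-vertex UV array (the name `flipU` is hypothetical):

```typescript
// `uvs` is a flat [u0, v0, u1, v1, ...] array.
// Even indices hold U and are mirrored; odd indices hold V and are kept.
function flipU(uvs: number[]): number[] {
  return uvs.map((value, index) => (index % 2 === 0 ? 1 - value : value))
}

const sampleUvs = [0, 0, 0.25, 1, 1, 0.5]
const flippedUvs = flipU(sampleUvs) // [1, 0, 0.75, 1, 0, 0.5]
console.log(flippedUvs)
```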

Update the geometry in real time from the face mesh

    function updateGeometry(prediction: AnnotatedPrediction) {
      const faceMesh = resolveMesh(prediction.scaledMesh as Coords3D, vw, vh)
      // Flatten [[x, y, z], ...] into [x, y, z, x, y, z, ...]
      const positionBuffer = faceMesh.reduce((acc, pos) => acc.concat(pos), [] as number[])
      geometry.setAttribute('position', new THREE.Float32BufferAttribute(positionBuffer, 3))
      geometry.attributes.position.needsUpdate = true
    }

    // Convert video pixel coordinates (origin at the top-left, y pointing down)
    // into scene coordinates (origin at the centre, y pointing up)
    function resolveMesh(faceMesh: Coords3D, vw: number, vh: number): Coords3D {
      return faceMesh.map(p => [p[0] - vw / 2, vh / 2 - p[1], -p[2]])
    }
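The `reduce` with `concat` in `updateGeometry` allocates a new array for every landmark on every frame. As a possible alternative (not the author's code), the coordinate conversion and the flattening can be done in one pass into a typed array that is reused between frames. `toPositionBuffer` is a hypothetical name, and `Coords3D` is redeclared locally so the sketch is self-contained.

```typescript
type Coords3D = [number, number, number][]

// Convert landmarks from video pixel coordinates to centred scene
// coordinates and write them into a flat Float32Array.
// Pass a previous buffer as `out` to reuse it when the size matches.
function toPositionBuffer(mesh: Coords3D, vw: number, vh: number, out?: Float32Array): Float32Array {
  const buffer = out && out.length === mesh.length * 3 ? out : new Float32Array(mesh.length * 3)
  mesh.forEach(([x, y, z], i) => {
    buffer[i * 3] = x - vw / 2      // move the origin to the horizontal centre
    buffer[i * 3 + 1] = vh / 2 - y  // move the origin to the vertical centre and flip y
    buffer[i * 3 + 2] = -z          // flip the depth axis
  })
  return buffer
}

// For a 640x480 video, the centre pixel maps to the scene origin and
// the top-left corner maps to (-320, 240).
const positions = toPositionBuffer([[320, 240, 0], [0, 0, 10]], 640, 480)
console.log(Array.from(positions))
```

The resulting array could then be handed to `THREE.Float32BufferAttribute` in the same way as `positionBuffer`.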

Create the material

    const textureLoader = new THREE.TextureLoader()
    const texture = textureLoader.load(pathToYourTexturePic)
    // In recent three.js releases this is written as
    // texture.colorSpace = THREE.SRGBColorSpace
    texture.encoding = THREE.sRGBEncoding
    texture.anisotropy = 16
    const material = new THREE.MeshBasicMaterial({
      map: texture,
      transparent: true,
      color: new THREE.Color(0xffffff),
      reflectivity: 0.5
    })

Finally, put the geometry update into the render function

    function render3D(prediction: AnnotatedPrediction | undefined) {
      if (prediction) {
        updateGeometry(prediction)
      }
      renderer.render(scene, camera)
    }

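At this point we can draw all kinds of patterns on the face

If you need to make more assets, you can draw the pattern you want against this standard face model

Next we build the 3D cartoon head effect

Loading the 3D model

We use 3D assets in glTF format. After loading the 3D model, its position, size and other properties need to be adjusted so that it sits in the centre of the picture

    import { GLTFLoader } from 'three/examples/jsm/loaders/GLTFLoader'

    const loader = new GLTFLoader()
    // Container that will be moved, scaled and rotated to follow the face
    const model3D = new THREE.Object3D()
    model3D.position.set(0, 0, 0)
    scene.add(model3D)
    loader.load('/models/animal_head/bear.glb', (gltf) => {
      const object = gltf.scene
      const box = new THREE.Box3().setFromObject(object)
      const center = box.getCenter(new THREE.Vector3())
      // Shift the model so that its bounding-box centre sits near the origin;
      // the +1 and -15 offsets appear to be tuned for this particular model
      object.position.x += (object.position.x - center.x)
      object.position.y += (object.position.y - center.y + 1)
      object.position.z += (object.position.z - center.z - 15)
      model3D.add(object)
    })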
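The centring step in the loader callback relies on the bounding box of the model. As a rough illustration of what `Box3.setFromObject` and `getCenter` compute, here is a dependency-free sketch over a plain list of points. The function names `boundingBoxCenter` and `centeringOffset` and the sample points are hypothetical.

```typescript
type Vec3 = [number, number, number]

// Centre of the axis-aligned bounding box that encloses all points.
function boundingBoxCenter(points: Vec3[]): Vec3 {
  if (points.length === 0) throw new Error('need at least one point')
  const min: Vec3 = [Infinity, Infinity, Infinity]
  const max: Vec3 = [-Infinity, -Infinity, -Infinity]
  for (const p of points) {
    for (let axis = 0; axis < 3; axis++) {
      min[axis] = Math.min(min[axis], p[axis])
      max[axis] = Math.max(max[axis], p[axis])
    }
  }
  return [(min[0] + max[0]) / 2, (min[1] + max[1]) / 2, (min[2] + max[2]) / 2]
}

// Offset to add to the object's position so that the box centre lands on the origin.
function centeringOffset(points: Vec3[]): Vec3 {
  const c = boundingBoxCenter(points)
  return [0 - c[0], 0 - c[1], 0 - c[2]]
}

// A box from (2, 0, 0) to (4, 2, 6) has its centre at (3, 1, 3),
// so the offset is [-3, -1, -3].
const offset = centeringOffset([[2, 0, 0], [4, 2, 6], [3, 1, 1]])
console.log(offset)
```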

Computing the Matrix

From the face detection result, compute the position, scale and rotation of the face, then apply them to the 3D model.
position: the point between the eyebrows (midwayBetweenEyes) is the position reference
scale: the distance between the leftmost point (234) and the rightmost point (454) is the scale reference (index reference)
rotation: the top of the head (10), the left cheek (50) and the right cheek (280) are the rotation reference

    function track(object: Object3D, prediction: AnnotatedPrediction) {
      const annotations: Annotations = (prediction as any).annotations
      const position = annotations.midwayBetweenEyes[0]
      const scale = getScale(prediction.scaledMesh as Coords3D, 234, 454)
      const rotation = getRotation(prediction.scaledMesh as Coords3D, 10, 50, 280)
      object.position.set(...position)
      // 18 is a divisor chosen to fit this model's size
      object.scale.setScalar(scale / 18)
      // Mirror horizontally
      object.scale.x *= -1
      object.rotation.setFromRotationMatrix(rotation)
      object.rotation.y = -object.rotation.y
      object.rotateZ(Math.PI)
    }
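The article does not show the `getScale` and `getRotation` helpers called in `track`. The sketch below is one possible way to derive the same quantities from three landmarks, written without three.js so it can run on its own. The names `landmarkDistance` and `headBasis` are hypothetical, the code assumes y-up coordinates, and it returns three axis vectors instead of a matrix; in three.js such axes could be fed to `Matrix4.makeBasis`.

```typescript
type Vec3 = [number, number, number]

const sub = (a: Vec3, b: Vec3): Vec3 => [a[0] - b[0], a[1] - b[1], a[2] - b[2]]
const cross = (a: Vec3, b: Vec3): Vec3 => [
  a[1] * b[2] - a[2] * b[1],
  a[2] * b[0] - a[0] * b[2],
  a[0] * b[1] - a[1] * b[0],
]
const norm = (v: Vec3): number => Math.hypot(v[0], v[1], v[2])
const normalize = (v: Vec3): Vec3 => {
  const n = norm(v)
  if (n === 0) throw new Error('cannot normalize a zero vector')
  return [v[0] / n, v[1] / n, v[2] / n]
}

// Scale reference: distance between two landmarks, e.g. 234 and 454.
function landmarkDistance(mesh: Vec3[], a: number, b: number): number {
  return norm(sub(mesh[a], mesh[b]))
}

// Rotation reference: an orthonormal basis built from the top of the head
// and the two cheeks, e.g. landmarks 10, 50 and 280.
function headBasis(mesh: Vec3[], top: number, left: number, right: number) {
  const l = mesh[left]
  const r = mesh[right]
  const x = normalize(sub(r, l)) // points from the left cheek to the right cheek
  const mid: Vec3 = [(l[0] + r[0]) / 2, (l[1] + r[1]) / 2, (l[2] + r[2]) / 2]
  const up = normalize(sub(mesh[top], mid)) // roughly "up" along the face
  const z = normalize(cross(x, up)) // perpendicular to the face plane
  const y = cross(z, x) // exact "up", perpendicular to x and z
  return { x, y, z }
}

// A face looking straight ahead: the basis is the identity.
const frontal: Vec3[] = [[0, 2, 0], [-1, 0, 0], [1, 0, 0]]
console.log(landmarkDistance(frontal, 1, 2), headBasis(frontal, 0, 1, 2))
```

Recomputing `y` from `z` and `x` keeps the three axes exactly perpendicular even when the landmark for the top of the head is not perfectly centred.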

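Put the tracking step into the render function and the 3D head effect appears

    function render3D(prediction: AnnotatedPrediction | undefined) {
      if (prediction) {
        // updateGeometry(prediction)
        // model3D is the container created when loading the 3D model
        track(model3D, prediction)
      }
      renderer.render(scene, camera)
    }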

Facial movements

Facial movements are implemented with shape keys. After the model is loaded, recursively find all shape keys inside the model. (Making shape-key animations for a 3D model requires some 3D modelling background; interested readers can look up material on their own. I will also put together a tutorial in a while.)

    const morphTarget = findMorphTarget(gltf.scene)

    // Collect a setter function for every morph target found in the model
    export function findMorphTarget(nodes: THREE.Object3D): Record<string, (value: number) => void> {
      const morphTarget = {} as Record<string, (value: number) => void>
      const traverse = (node: THREE.Object3D) => {
        if (node.type === 'Mesh' && (node as THREE.Mesh).morphTargetInfluences) {
          const mesh = node as THREE.Mesh
          Object.keys(mesh.morphTargetDictionary!).forEach(key => {
            morphTarget[key] = (value: number) => {
              mesh.morphTargetInfluences![mesh.morphTargetDictionary![key]] = value
            }
          })
        }
        node.children.forEach(traverse)
      }
      traverse(nodes)
      return morphTarget
    }

Next we compute how far the eyes and mouth are open, using the third-party library kalidokit

    import * as Kalidokit from "kalidokit"

    export function getFaceRig(prediction: AnnotatedPrediction, video: HTMLVideoElement) {
      const faceRig = Kalidokit.Face.solve(coordsToXYZ(prediction.scaledMesh as Coords3D), {
        runtime: "tfjs", // `mediapipe` or `tfjs`
        video,
        imageSize: { height: 480, width: 640 },
        smoothBlink: false, // smooth left and right eye blink delays
        blinkSettings: [0.25, 0.75], // adjust upper and lower bound blink sensitivity
      })
      return faceRig
    }

    // Convert [x, y, z] tuples into { x, y, z } objects
    export function coordsToXYZ(coords: Coords3D) {
      return coords.map(item => ({
        x: item[0],
        y: item[1],
        z: item[2]
      }))
    }

An example of the result is shown below

    {
        eye: {l: 1, r: 1},
        mouth: {
            x: 0,
            y: 0,
            shape: {A: 0, E: 0, I: 0, O: 0, U: 0}
        },
        head: {
            x: 0,
            y: 0,
            z: 0,
            width: 0.3,
            height: 0.6,
            position: {x: 0.5, y: 0.5, z: 0}
        },
        brow: 0,
        pupil: {x: 0, y: 0}
    }

Set the shape key values from the result to control precisely how far the eyes and mouth are open

    function track(object: Object3D, prediction: AnnotatedPrediction, faceRig: TFace) {
      // ...
      if (morphTarget) {
        // flipped: left and right are swapped
        morphTarget['leftEye'] && morphTarget['leftEye'](1 - faceRig.eye.r)
        morphTarget['rightEye'] && morphTarget['rightEye'](1 - faceRig.eye.l)
        morphTarget['mouth'] && morphTarget['mouth'](faceRig.mouth.shape.A)
      }
    }
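The mapping from the solver result to shape key values can also be isolated into a small pure function, which makes the left/right swap and the `1 - value` inversion easier to check. The sketch below assumes, as the sample result suggests, that an eye value of 1 means fully open, while a shape key value of 1 means fully closed. `FaceRigLike`, `clamp01` and `toMorphValues` are hypothetical names, and the clamping is an extra safeguard that the original code does not apply.

```typescript
// Only the fields this sketch reads from the solver result.
interface FaceRigLike {
  eye: { l: number; r: number }
  mouth: { shape: { A: number } }
}

// Keep a value inside the valid shape key range.
const clamp01 = (value: number): number => Math.min(1, Math.max(0, value))

function toMorphValues(rig: FaceRigLike): Record<'leftEye' | 'rightEye' | 'mouth', number> {
  return {
    // Left and right are swapped because the picture is mirrored,
    // and inverted because 1 means "open" for the rig but "closed" for the shape key.
    leftEye: clamp01(1 - rig.eye.r),
    rightEye: clamp01(1 - rig.eye.l),
    mouth: clamp01(rig.mouth.shape.A),
  }
}

const morphValues = toMorphValues({ eye: { l: 1, r: 0.25 }, mouth: { shape: { A: 0.5 } } })
console.log(morphValues) // { leftEye: 0.75, rightEye: 0, mouth: 0.5 }
```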

Chapter 5: Results

Face texture

3D cartoon head

More effects can be seen in the demo: https://caiwenlie.github.io/A...

That's all.

References

mediapipe: https://google.github.io/medi...
TensorFlow.js: https://www.tensorflow.org/js...
three.js: https://threejs.org/
