About TensorFlow: Web AR face effects based on tfjs and three.js

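Augmented Reality (AR) is arriving in force, and AR applications keep appearing: AR navigation, AR shopping, AR teaching, AR games, and more. It is fair to say that AR is profoundly changing our lives.

The underlying technologies behind AR keep advancing as well. As AI becomes widespread, AI capabilities can be plugged into real applications with little effort, and TensorFlow.js (tfjs) lets front-end developers work in the AI field too. Better browsers and mobile devices also open up more possibilities for web applications.

The web browser is the most readily available human-computer interaction terminal. It needs no app installation, works out of the box, and runs on phones, tablets, and PCs. In this wave of AR technology, Web AR has a great deal of room to grow.

TensorFlow.js is a library released by Google for machine learning development in JavaScript, and it has drawn wide attention since its release in 2018. With tfjs, we can develop machine learning models in JavaScript and train or run them directly in the browser or in Node.js.

Three.js is a JavaScript library for creating and displaying 3D graphics in the browser, first released on GitHub by Ricardo Cabello in April 2010. It is built on WebGL and can use hardware acceleration, which makes it possible to display complex 3D graphics and animation in the browser.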

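Let's first go over a few important concepts.

Face Mesh is a face geometry solution that provides 468 face landmarks. Each landmark has an index, so the landmarks for each part of the face can be looked up by index. (Index reference)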
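
As a minimal sketch of looking landmarks up by index, the helper below picks points out of a 468-entry landmark array. The `Coords3D` alias mirrors the type name used later in this article; `pickLandmarks` and the dummy data are hypothetical and only for illustration. Indices 234 and 454 are the leftmost and rightmost face points used in the tracking section.

```typescript
// Each landmark is an [x, y, z] triple; Face Mesh yields 468 of them.
type Coords3D = [number, number, number][]

// Hypothetical helper: return the landmarks at the given indices.
function pickLandmarks(mesh: Coords3D, indices: number[]): Coords3D {
  return indices.map(i => {
    const p = mesh[i]
    if (!p) throw new RangeError(`landmark index ${i} is out of range`)
    return p
  })
}

// Dummy mesh for illustration only: landmark i sits at [i, 2 * i, 0].
const dummyMesh: Coords3D = Array.from(
  { length: 468 },
  (_, i) => [i, i * 2, 0] as [number, number, number]
)

// Leftmost (234) and rightmost (454) points of the face.
const faceEdges = pickLandmarks(dummyMesh, [234, 454])
```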

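UV refers to 2D texture coordinates: U is the horizontal direction and V is the vertical direction. A UV map describes the mapping between the surface of a 3D object and an image texture. With a UV map, we can paste a 2D image texture onto the surface of a 3D object.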
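
To make the mapping concrete, here is a small sketch that converts a UV coordinate into a position inside a texture image. The function name `uvToPixel` is hypothetical, and the sketch assumes the common convention that V = 0 is the bottom edge of the image while pixel rows are counted from the top.

```typescript
// Map a UV coordinate (both components in [0, 1]) to continuous pixel
// coordinates in a texture of the given size.
// Assumption: V = 0 is the bottom edge, pixel y = 0 is the top edge.
function uvToPixel(
  u: number,
  v: number,
  texWidth: number,
  texHeight: number
): [number, number] {
  return [u * texWidth, (1 - v) * texHeight]
}

const uvTopLeft = uvToPixel(0, 1, 512, 512)      // [0, 0]
const uvCenter = uvToPixel(0.5, 0.5, 512, 512)   // [256, 256]
const uvBottomRight = uvToPixel(1, 0, 512, 512)  // [512, 512]
```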

A matrix can describe the translation, rotation, and scaling of an object. Three.js uses matrices to perform 3D transformations.
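
The sketch below shows, in plain TypeScript, how a 4x4 matrix can carry a uniform scale and a translation and how it is applied to a point. The helper names `makeTransform` and `applyTransform` are hypothetical. The array is laid out in column-major order, which is also how three.js stores `Matrix4.elements`. Rotation is left out to keep the example short.

```typescript
type Vec3 = [number, number, number]

// Column-major 4x4 matrix: scale uniformly first, then translate by t.
function makeTransform(scale: number, t: Vec3): number[] {
  return [
    scale, 0, 0, 0,     // first column
    0, scale, 0, 0,     // second column
    0, 0, scale, 0,     // third column
    t[0], t[1], t[2], 1 // fourth column holds the translation
  ]
}

// Multiply the matrix by the point (x, y, z, 1) and drop the w component.
function applyTransform(m: number[], p: Vec3): Vec3 {
  return [
    m[0] * p[0] + m[4] * p[1] + m[8] * p[2] + m[12],
    m[1] * p[0] + m[5] * p[1] + m[9] * p[2] + m[13],
    m[2] * p[0] + m[6] * p[1] + m[10] * p[2] + m[14]
  ]
}

// Scale by 2, then move by (10, 0, -5): (1, 2, 3) becomes (12, 4, 1).
const movedPoint = applyTransform(makeTransform(2, [10, 0, -5]), [1, 2, 3])
```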

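A shape key (morph target) is commonly used in 3D authoring software to create deformation animations, such as facial movements like blinking or opening the mouth. A shape key takes a value from 0.0 to 1.0, corresponding to the start and end states of the deformation. By changing that value, the deformation can be controlled precisely.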
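
Conceptually, a shape key is a linear blend between two sets of vertex positions. The sketch below illustrates that idea only; it is not how three.js implements morph targets internally, and `blendMorph` and the sample vertices are hypothetical.

```typescript
// Blend between a base shape and a target shape.
// influence 0 gives the base shape, influence 1 gives the target shape.
function blendMorph(base: number[], target: number[], influence: number): number[] {
  const t = Math.min(1, Math.max(0, influence)) // keep the value within [0, 1]
  return base.map((b, i) => b + (target[i] - b) * t)
}

// A single eyelid vertex, stored as flat [x, y, z] values.
const eyelidOpen = [0, 1, 0]
const eyelidShut = [0, 0, 0]

const halfShut = blendMorph(eyelidOpen, eyelidShut, 0.5) // [0, 0.5, 0]
```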

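The architecture of the program is shown in the figure. The process is as follows:
First, access the camera and obtain the camera image
Then load the face recognition model through tfjs and generate the Face Mesh
Build a triangle mesh from the Face Mesh, apply the UV map, and draw the face pattern
Compute the Matrix from the face landmarks and recognize facial movements
Load the 3D model and apply the Matrix so that it appears in the correct position
Drive the model to blink, open its mouth, and perform other facial movements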

Use navigator.mediaDevices.enumerateDevices to get the device list and find the videoinput entries, which are the cameras

export async function getVideoDevices() {
  const devices = await navigator.mediaDevices.enumerateDevices()
  const videoDevices = devices.filter(item => item.kind === 'videoinput')
  return videoDevices
}

Get the video stream

export async function getVideoStream(deviceId: string, width?: number, height?: number) {
  // A rejected getUserMedia promise (for example, permission denied) propagates to the caller
  const stream = await navigator.mediaDevices.getUserMedia({
    video: { deviceId, width, height }
  })
  return stream
}
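
One detail worth knowing: a bare `deviceId`, `width`, or `height` value in the constraints is treated as a preference, so the browser may still pick a different camera or resolution. The sketch below builds a stricter constraint object using the standard `exact` and `ideal` keys. The helper `buildVideoConstraints` and the local `VideoConstraints` interface are hypothetical; the interface is a small subset of the DOM's `MediaTrackConstraints`, declared locally so the snippet stands alone.

```typescript
// Local subset of the DOM MediaTrackConstraints type.
interface VideoConstraints {
  deviceId?: { exact: string }
  width?: { ideal: number }
  height?: { ideal: number }
}

// Require the chosen camera, but only prefer the requested size.
function buildVideoConstraints(
  deviceId?: string,
  width?: number,
  height?: number
): VideoConstraints {
  const constraints: VideoConstraints = {}
  if (deviceId) constraints.deviceId = { exact: deviceId }
  if (width) constraints.width = { ideal: width }
  if (height) constraints.height = { ideal: height }
  return constraints
}

// In the browser this would be passed as:
// navigator.mediaDevices.getUserMedia({ video: buildVideoConstraints(id, 640, 480) })
const sampleConstraints = buildVideoConstraints('cam-1', 640)
```

With `exact`, the request is rejected when the named camera is unavailable instead of silently falling back to another one.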

Place a <video> element on the page in advance and have it receive the stream. From then on, video frames can be read from this <video> element

<video id="video" width="1" height="1"></video>

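// Look up the <video> element declared above and attach the camera stream to it
const video = document.getElementById('video') as HTMLVideoElement
video.autoplay = true
video.playsInline = true
video.srcObject = stream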

Here we use face-landmarks-detection, an open-source model provided by TensorFlow

import * as faceLandmarksDetection from '@tensorflow-models/face-landmarks-detection'
import * as tf from '@tensorflow/tfjs-core'
import '@mediapipe/face_mesh'
import '@tensorflow/tfjs-backend-webgl'

TensorFlow.js offers several backends (cpu, webgl, wasm); a performance comparison is linked here. We use webgl

await tf.setBackend('webgl')
// await tf.setBackend('wasm')

Load the face recognition model

const model = await faceLandmarksDetection.load(
  faceLandmarksDetection.SupportedPackages.mediapipeFacemesh,
  {
    // Enable iris (pupil) detection
    shouldLoadIrisModel: true,
    // Number of faces to detect; set to 1 to save performance
    maxFaces: 1,
    // Model URLs, for self-hosted models
    // modelUrl: '/tfjs/facemesh/model.json',
    // detectorModelUrl: '/tfjs/blazeface/model.json',
    // irisModelUrl: '/tfjs/iris/model.json'
  }
)

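By default, tfjs loads models from tfhub. Unfortunately, that address is not reachable from mainland China. If you do not have a VPN, download the models you need from a mirror site and host them yourself

Use the <video> element as the image input for face recognition. If recognition succeeds, we can read the face information, including the Face Mesh, from predictions[0]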

const predictions = await model.estimateFaces({
  input: video,
  predictIrises: true
})

if (predictions.length > 0) {
  // console.log(predictions[0])
}

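Face recognition and rendering are relatively time-consuming. To keep the picture from stuttering, we use requestAnimationFrame to schedule the work. Put face recognition and the subsequent rendering (render3D) into the frame callback, and reorganize the code as follows

async function render(model: MediaPipeFaceMesh) {
  const predictions = await model.estimateFaces({
    input: video,
    predictIrises: true
  })

  if (predictions.length > 0) {
    // console.log(predictions[0])
  }

  // predictions[0] is undefined when no face is detected
  render3D(predictions[0])

  // Schedule the next detection and render pass
  requestAnimationFrame(() => {
    render(model)
  })
}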

render(model)

Before rendering the 3D model, we first need to create a 3D scene

import * as THREE from 'three'

const scene = new THREE.Scene()

Depending on your needs, you can also use the camera image as the scene background

const vw = video.videoWidth
const vh = video.videoHeight

const backgroundTexture = new THREE.VideoTexture(video)
const background = new THREE.Mesh(
  new THREE.PlaneGeometry(vw, vh),
  new THREE.MeshBasicMaterial({ map: backgroundTexture })
)
background.position.set(0, 0, -1000)
scene.add(background)

Set up the camera. An orthographic camera (OrthographicCamera) is used here

let w = window.innerWidth
let h = window.innerHeight

const camera = new THREE.OrthographicCamera(
  w / -2,
  w / 2,
  h / 2,
  h / -2,
  0.1,
  2000
)
camera.position.set(0, 0, 1000)
camera.lookAt(scene.position)

Add some lighting

const hemiLight = new THREE.HemisphereLight(0xffffff, 0xffffff, 0.3)
scene.add(hemiLight)
const ambientLight = new THREE.AmbientLight(0xffffff, 0.7)
scene.add(ambientLight)
const directionalLight = new THREE.DirectionalLight(0xffffff, 0.7)
directionalLight.position.set(0.5, 0, 0.866)
scene.add(directionalLight)

Create the renderer

const renderer = new THREE.WebGLRenderer({
  canvas,
  alpha: true,
  antialias: true
})
renderer.setPixelRatio(window.devicePixelRatio)
renderer.setSize(w, h)

The 3D scene is now ready. Finally, put the rendering step into the render function mentioned above

function render3D(prediction: AnnotatedPrediction | undefined) {
  renderer.render(scene, camera)
}

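Three.js provides many kinds of 3D objects. Mesh represents a triangle mesh model, which can approximate complex 3D objects, such as a human face, with a mesh of triangles. geometry holds the geometric properties of the object, and material holds its material properties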

const mesh = new THREE.Mesh(geometry, material)
scene.add(mesh)

Create the geometry: connect the 468 face landmarks into a triangle mesh in a fixed order (TRIANGULATION), and load the UV map

const geometry = new THREE.BufferGeometry()
geometry.setIndex(TRIANGULATION)
geometry.setAttribute('uv', new THREE.Float32BufferAttribute(uvs.map((item, index) => index % 2 ? item : 1 - item), 2))
geometry.computeVertexNormals()
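
The `uvs.map(...)` expression above treats `uvs` as a flat `[u0, v0, u1, v1, ...]` array and replaces every even-indexed value (the U component) with `1 - u`, which mirrors the texture horizontally. A standalone sketch of that step, with the hypothetical name `mirrorU` and made-up sample values:

```typescript
// Mirror a flat [u0, v0, u1, v1, ...] array horizontally:
// U values (even indices) are flipped, V values (odd indices) are kept.
function mirrorU(uvs: number[]): number[] {
  return uvs.map((value, index) => (index % 2 === 0 ? 1 - value : value))
}

// Two UV pairs: (0, 0.25) and (1, 0.75).
const mirroredUvs = mirrorU([0, 0.25, 1, 0.75]) // [1, 0.25, 0, 0.75]
```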

Update the geometry in real time from the face mesh

function updateGeometry(prediction: AnnotatedPrediction) {
  const faceMesh = resolveMesh(prediction.scaledMesh as Coords3D, vw, vh)
  // Flatten [[x, y, z], ...] into [x, y, z, x, y, z, ...]
  const positionBuffer = faceMesh.reduce((acc, pos) => acc.concat(pos), [] as number[])
  geometry.setAttribute('position', new THREE.Float32BufferAttribute(positionBuffer, 3))
  geometry.attributes.position.needsUpdate = true
}

// Convert video pixel coordinates (origin at top-left, y down) into
// scene coordinates (origin at the center, y up)
function resolveMesh(faceMesh: Coords3D, vw: number, vh: number): Coords3D {
  return faceMesh.map(p => [p[0] - vw / 2, vh / 2 - p[1], -p[2]])
}
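
The following self-contained sketch works through the coordinate conversion done by `resolveMesh` on a 640x480 frame, and shows one possible way to flatten the result into a `Float32Array` without the repeated `concat` calls. The names `toSceneCoords` and `flattenMesh` and the sample points are hypothetical.

```typescript
type Coords3D = [number, number, number][]

// Same conversion as resolveMesh: move the origin to the frame center,
// flip y so it points up, and negate z.
function toSceneCoords(mesh: Coords3D, vw: number, vh: number): Coords3D {
  return mesh.map(
    ([x, y, z]) => [x - vw / 2, vh / 2 - y, -z] as [number, number, number]
  )
}

// Write every [x, y, z] triple into one preallocated typed array.
function flattenMesh(mesh: Coords3D): Float32Array {
  const out = new Float32Array(mesh.length * 3)
  mesh.forEach((p, i) => out.set(p, i * 3))
  return out
}

// Top-left corner, frame center (10 units deep), bottom-right corner.
const sceneMesh = toSceneCoords([[0, 0, 0], [320, 240, 10], [640, 480, 0]], 640, 480)
// sceneMesh is [[-320, 240, 0], [0, 0, -10], [320, -240, 0]]
const flatPositions = flattenMesh(sceneMesh)
```

A typed array like `flatPositions` can be handed to `new THREE.BufferAttribute(flatPositions, 3)` directly.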

Create the material

const textureLoader = new THREE.TextureLoader()
const texture = textureLoader.load(pathToYourTexturePic)
texture.encoding = THREE.sRGBEncoding
texture.anisotropy = 16
const material = new THREE.MeshBasicMaterial({
  map: texture,
  transparent: true,
  color: new THREE.Color(0xffffff),
  reflectivity: 0.5
});

Finally, put the geometry update into the render function

function render3D(prediction: AnnotatedPrediction | undefined) {
  if (prediction) {
    updateGeometry(prediction)
  }
  renderer.render(scene, camera)
}

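At this point, we can draw all kinds of patterns on the face

If you need to make more textures, you can draw the pattern you want against this standard face model

Next, let's build the 3D cartoon head effect

We use 3D assets in glTF format. After loading the 3D model, adjust the object's position, size, and other properties so that it sits in the center of the picture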

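import { GLTFLoader } from 'three/examples/jsm/loaders/GLTFLoader'

const loader = new GLTFLoader()
// Container object that the tracking code moves, scales, and rotates
const model3D = new THREE.Object3D()
model3D.position.set(0, 0, 0)
// Add the container to the scene so the loaded model is rendered
scene.add(model3D)
loader.load('/models/animal_head/bear.glb', (gltf) => {
  const object = gltf.scene
  const box = new THREE.Box3().setFromObject(object)
  const size = box.getSize(new THREE.Vector3()).length()
  const center = box.getCenter(new THREE.Vector3())
  // Re-center the model, then apply offsets tuned for this asset
  object.position.x += (object.position.x - center.x)
  object.position.y += (object.position.y - center.y + 1)
  object.position.z += (object.position.z - center.z - 15)
  model3D.add(object)
})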

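Based on the face recognition result, compute the position, scale, and rotation of the face, then apply them to the 3D model.
position: the point between the eyebrows (midwayBetweenEyes) is the position reference
scale: the distance between the leftmost point (234) and the rightmost point (454) is the scale reference (index reference)
rotation: the top of the head (10), the left cheek (50), and the right cheek (280) are the rotation reference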

function track(object: THREE.Object3D, prediction: AnnotatedPrediction) {
  const annotations: Annotations = (prediction as any).annotations
  const position = annotations.midwayBetweenEyes[0]
  const scale = getScale(prediction.scaledMesh as Coords3D, 234, 454)
  const rotation = getRotation(prediction.scaledMesh as Coords3D, 10, 50, 280)
  object.position.set(...position)
  object.scale.setScalar(scale / 18)
  object.scale.x *= -1
  object.rotation.setFromRotationMatrix(rotation)
  object.rotation.y = -object.rotation.y
  object.rotateZ(Math.PI)
}
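
The bodies of `getScale` and `getRotation` are not shown in this article. The sketch below is one possible way to compute comparable values in plain TypeScript, not the author's implementation: `landmarkDistance` measures the distance between two landmarks, and `basisFromLandmarks` builds an orthonormal basis from three landmarks and returns it as a column-major 4x4 array. Both names are hypothetical, and the axis directions assume a y-up, right-handed space.

```typescript
type Vec3 = [number, number, number]

const sub = (a: Vec3, b: Vec3): Vec3 => [a[0] - b[0], a[1] - b[1], a[2] - b[2]]
const cross = (a: Vec3, b: Vec3): Vec3 => [
  a[1] * b[2] - a[2] * b[1],
  a[2] * b[0] - a[0] * b[2],
  a[0] * b[1] - a[1] * b[0]
]
const normalize = (a: Vec3): Vec3 => {
  const len = Math.hypot(a[0], a[1], a[2])
  return [a[0] / len, a[1] / len, a[2] / len]
}

// Scale reference: straight-line distance between landmarks i and j.
function landmarkDistance(mesh: Vec3[], i: number, j: number): number {
  const d = sub(mesh[i], mesh[j])
  return Math.hypot(d[0], d[1], d[2])
}

// Rotation reference: orthonormal basis from a top point and two side points.
function basisFromLandmarks(
  mesh: Vec3[],
  topIdx: number,
  leftIdx: number,
  rightIdx: number
): number[] {
  const left = mesh[leftIdx]
  const right = mesh[rightIdx]
  const mid: Vec3 = [
    (left[0] + right[0]) / 2,
    (left[1] + right[1]) / 2,
    (left[2] + right[2]) / 2
  ]
  const x = normalize(sub(right, left))  // across the face
  const roughUp = sub(mesh[topIdx], mid) // roughly toward the top of the head
  const z = normalize(cross(x, roughUp)) // out of the face plane
  const y = cross(z, x)                  // up, re-orthogonalized
  return [
    x[0], x[1], x[2], 0,
    y[0], y[1], y[2], 0,
    z[0], z[1], z[2], 0,
    0, 0, 0, 1
  ]
}
```

With a 468-point mesh, the calls would be `landmarkDistance(mesh, 234, 454)` and `basisFromLandmarks(mesh, 10, 50, 280)`, and the resulting array can be loaded with `new THREE.Matrix4().fromArray(...)`. Raw landmark coordinates have y pointing down, so some axes may need flipping afterwards, much as the `track` function above flips `scale.x` and `rotation.y`.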

Put the tracking step into the render function, and the 3D head effect appears

function render3D(prediction: AnnotatedPrediction | undefined) {
  if (prediction) {
    // updateGeometry(prediction)
    track(model3D, prediction)
  }
  renderer.render(scene, camera)
}

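Facial movements are implemented with shape keys. After the model loads, recursively find all the shape keys inside it. (Creating shape key animations for a 3D model requires some 3D modeling background; interested readers can look up material on their own. I will also put together a tutorial later.)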

const morphTarget = findMorphTarget(gltf.scene)

export function findMorphTarget(nodes: THREE.Object3D): Record<string, (value: number) => void> {
  const morphTarget = {} as Record<string, (value: number) => void>
  const traverse = (node: THREE.Object3D) => {
    if (node.type === 'Mesh' && (node as THREE.Mesh).morphTargetInfluences) {
      const mesh = node as THREE.Mesh
      // Expose one setter per named shape key on this mesh
      Object.keys(mesh.morphTargetDictionary!).forEach(key => {
        morphTarget[key] = (value: number) => {
          mesh.morphTargetInfluences![mesh.morphTargetDictionary![key]] = value
        }
      })
    }
    node.children.forEach(traverse)
  }
  traverse(nodes)
  return morphTarget
}

Next, compute how far the eyes are closed and the mouth is open. A third-party library, kalidokit, is used here

import * as Kalidokit from "kalidokit"

export function getFaceRig(prediction: AnnotatedPrediction, video: HTMLVideoElement) {
  const faceRig = Kalidokit.Face.solve(coordsToXYZ(prediction.scaledMesh as Coords3D), {
    runtime: "tfjs", // `mediapipe` or `tfjs`
    video,
    imageSize: {height: 480, width: 640},
    smoothBlink: false, // smooth left and right eye blink delays
    blinkSettings: [0.25, 0.75], // adjust upper and lower bound blink sensitivity
  })
  return faceRig
}

export function coordsToXYZ(coords: Coords3D) {
  return coords.map(item => ({
    x: item[0],
    y: item[1],
    z: item[2]
  }))
}

A sample result looks like this

{
    eye: {l: 1, r: 1},
    mouth: {
        x: 0,
        y: 0,
        shape: {A:0, E:0, I:0, O:0, U:0}
    },
    head: {
        x: 0,
        y: 0,
        z: 0,
        width: 0.3,
        height: 0.6,
        position: {x: 0.5, y: 0.5, z: 0}
    },
    brow: 0,
    pupil: {x: 0, y: 0}
}

Set the shape key values from the computed result to precisely control how far the eyes blink and the mouth opens

function track(object: Object3D, prediction: AnnotatedPrediction, faceRig: TFace) {
  // ...

  if (morphTarget) {
    // flipped
    morphTarget['leftEye'] && morphTarget['leftEye'](1 - faceRig.eye.r)
    morphTarget['rightEye'] && morphTarget['rightEye'](1 - faceRig.eye.l)
    morphTarget['mouth'] && morphTarget['mouth'](faceRig.mouth.shape.A)
  }
}
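
Since shape key values are expected to stay between 0.0 and 1.0, it can help to clamp the computed values before applying them. The sketch below isolates the eye mapping used above as a pure function; `eyeInfluences`, `clamp01`, and the `EyeRig` interface are hypothetical names, while the `leftEye` and `rightEye` keys come from the code above.

```typescript
// Eye openness as reported by the face rig: 1 = fully open, 0 = closed.
interface EyeRig {
  l: number
  r: number
}

const clamp01 = (v: number): number => Math.min(1, Math.max(0, v))

// Convert eye openness into "closed eye" shape key values (1 = fully closed).
// Left and right are swapped because the model is mirrored.
function eyeInfluences(eye: EyeRig): { leftEye: number; rightEye: number } {
  return {
    leftEye: clamp01(1 - eye.r),
    rightEye: clamp01(1 - eye.l)
  }
}

const blinkSample = eyeInfluences({ l: 1, r: 0.25 }) // { leftEye: 0.75, rightEye: 0 }
```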

Face texture

3D cartoon head

More effects can be seen in the demo: https://caiwenlie.github.io/A…

That's all.

mediapipe: https://google.github.io/medi…
TensorFlow.js: https://www.tensorflow.org/js…
three.js: https://threejs.org/

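Preface

Chapter 1: Tools

Chapter 2: Key concepts

Face Mesh

UV Map

Matrix

Shape keys

Chapter 3: Architecture

Chapter 4: Feature breakdown

Accessing the camera

Face recognition

Render function

3D scene

Face texture

Loading the 3D model

Computing the Matrix

Facial movements

Chapter 5: Results

References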
