关于前端:Vue模板编译原理

写过 Vue 的同学必定体验过，.vue 这种单文件组件有如许不便。然而咱们也晓得，Vue 底层是通过虚构 DOM 来进行渲染的，那么 .vue 文件的模板到底是怎么转换成虚构 DOM 的呢？这一块对我来说始终是个黑盒，之前也没有深入研究过，明天打算一探到底。

Vue 3 公布在即，原本想着间接看看 Vue 3 的模板编译，然而我关上 Vue 3 源码的时候，发现我如同连 Vue 2 是怎么编译模板的都不晓得。从小鲁迅就通知咱们，不能一口吃成一个瘦子，那我只能回头看看 Vue 2 的模板编译源码，至于 Vue 3 就留到正式公布的时候再看。

很多人应用 Vue 的时候，都是间接通过 vue-cli 生成的模板代码，并不知道 Vue 其实提供了两个构建版本。

vue.js：残缺版本，蕴含了模板编译的能力；
vue.runtime.js：运行时版本，不提供模板编译能力，须要通过 vue-loader 进行提前编译。

简略来说，就是如果你用了 vue-loader，就能够应用 vue.runtime.min.js，将模板编译的过程交过 vue-loader，如果你是在浏览器中间接通过 script 标签引入 Vue，须要应用 vue.min.js，运行的时候编译模板。

理解了 Vue 的版本，咱们看看 Vue 完整版的入口文件（src/platforms/web/entry-runtime-with-compiler.js）。

// 省略了局部代码，只保留了要害局部
import {compileToFunctions} from './compiler/index'

const mount = Vue.prototype.$mount
Vue.prototype.$mount = function (el) {
  const options = this.$options
  
  // 如果没有 render 办法，则进行 template 编译
  if (!options.render) {
    let template = options.template
    if (template) {
      // 调用 compileToFunctions，编译 template，失去 render 办法
      const {render, staticRenderFns} = compileToFunctions(template, {
        shouldDecodeNewlines,
        shouldDecodeNewlinesForHref,
        delimiters: options.delimiters,
        comments: options.comments
      }, this)
      // 这里的 render 办法就是生成生成虚构 DOM 的办法
      options.render = render
    }
  }
  return mount.call(this, el, hydrating)
}

再看看 ./compiler/index 文件的 compileToFunctions 办法从何而来。

import {baseOptions} from './options'
import {createCompiler} from 'compiler/index'

// 通过 createCompiler 办法生成编译函数
const {compile, compileToFunctions} = createCompiler(baseOptions)
export {compile, compileToFunctions}

后续的次要逻辑都在 compiler 模块中，这一块有些绕，因为本文不是做源码剖析，就不贴整段源码了。简略看看这一段的逻辑是怎么样的。

export function createCompiler(baseOptions) {const baseCompile = (template, options) => {
    // 解析 html，转化为 ast
    const ast = parse(template.trim(), options)
    // 优化 ast，标记动态节点
    optimize(ast, options)
    // 将 ast 转化为可执行代码
    const code = generate(ast, options)
    return {
      ast,
      render: code.render,
      staticRenderFns: code.staticRenderFns
    }
  }
  const compile = (template, options) => {const tips = []
    const errors = []
    // 收集编译过程中的错误信息
    options.warn = (msg, tip) => {(tip ? tips : errors).push(msg)
    }
    // 编译
    const compiled = baseCompile(template, options)
    compiled.errors = errors
    compiled.tips = tips

    return compiled
  }
  const createCompileToFunctionFn = () => {
    // 编译缓存
    const cache = Object.create(null)
    return (template, options, vm) => {
      // 已编译模板间接走缓存
      if (cache[template]) {return cache[template]
      }
      const compiled = compile(template, options)
        return (cache[key] = compiled)
    }
  }
  return {
    compile,
    compileToFunctions: createCompileToFunctionFn(compile)
  }
}

能够看到次要的编译逻辑根本都在 baseCompile 办法内，次要分为三个步骤：

模板编译，将模板代码转化为 AST；
优化 AST，不便后续虚构 DOM 更新；
生成代码，将 AST 转化为可执行的代码；

const baseCompile = (template, options) => {
  // 解析 html，转化为 ast
  const ast = parse(template.trim(), options)
  // 优化 ast，标记动态节点
  optimize(ast, options)
  // 将 ast 转化为可执行代码
  const code = generate(ast, options)
  return {
    ast,
    render: code.render,
    staticRenderFns: code.staticRenderFns
  }
}

首先看到 parse 办法，该办法的次要作用就是解析 HTML，并转化为 AST（形象语法树），接触过 ESLint、Babel 的同学必定对 AST 不生疏，咱们能够先看看通过 parse 之后的 AST 长什么样。

上面是一段普普通通的 Vue 模板：

new Vue({
  el: '#app',
  template: `
    <div>
      <h2 v-if="message">{{message}}</h2>
      <button @click="showName">showName</button>
    </div>
  `,
  data: {
    name: 'shenfq',
    message: 'Hello Vue!'
  },
  methods: {showName() {alert(this.name)
    }
  }
})

通过 parse 之后的 AST：

AST 为一个树形构造的对象，每一层示意一个节点，第一层就是 div（tag: "div"）。div 的子节点都在 children 属性中，别离是 h2 标签、空行、button 标签。咱们还能够留神到有一个用来标记节点类型的属性：type，这里 div 的 type 为 1，示意是一个元素节点，type 一共有三种类型：

元素节点；
表达式；
文本；

在 h2 和 button 标签之间的空行就是 type 为 3 的文本节点，而 h2 标签下就是一个表达式节点。

parse 的整体逻辑较为简单，咱们能够先简化一下代码，看看 parse 的流程。

import {parseHTML} from './html-parser'

export function parse(template, options) {
  let root
  parseHTML(template, {
    // some options...
    start() {}, // 解析到标签地位开始的回调
    end() {}, // 解析到标签地位完结的回调
    chars() {}, // 解析到文本时的回调
    comment() {} // 解析到正文时的回调
  })
  return root
}

能够看到 parse 次要通过 parseHTML 进行工作，这个 parseHTML 自身来自于开源库：simple html parser，只不过通过了 Vue 团队的一些批改，修复了相干 issue。

上面咱们一起来理一理 parseHTML 的逻辑。

export function parseHTML(html, options) {
  let index = 0
  let last,lastTag
  const stack = []
  while(html) {
    last = html
    let textEnd = html.indexOf('<')

    // "<" 字符在以后 html 字符串开始地位
    if (textEnd === 0) {
      // 1、匹配到正文: <!-- -->
      if (/^<!\--/.test(html)) {const commentEnd = html.indexOf('-->')
        if (commentEnd >= 0) {
          // 调用 options.comment 回调，传入正文内容
          options.comment(html.substring(4, commentEnd))
          // 裁切掉正文局部
          advance(commentEnd + 3)
          continue
        }
      }

      // 2、匹配到条件正文: <![if !IE]>  <![endif]>
      if (/^<!\[/.test(html)) {// ... 逻辑与匹配到正文相似}

      // 3、匹配到 Doctype: <!DOCTYPE html>
      const doctypeMatch = html.match(/^<!DOCTYPE [^>]+>/i)
      if (doctypeMatch) {// ... 逻辑与匹配到正文相似}

      // 4、匹配到完结标签: </div>
      const endTagMatch = html.match(endTag)
      if (endTagMatch) {}

      // 5、匹配到开始标签: <div>
      const startTagMatch = parseStartTag()
      if (startTagMatch) {}}
    // "<" 字符在以后 html 字符串两头地位
    let text, rest, next
    if (textEnd > 0) {
      // 提取两头字符
      rest = html.slice(textEnd)
      // 这一部分当成文本处理
      text = html.substring(0, textEnd)
      advance(textEnd)
    }
    // "<" 字符在以后 html 字符串中不存在
    if (textEnd < 0) {
      text = html
      html = ''
    }
    
    // 如果存在 text 文本
    // 调用 options.chars 回调，传入 text 文本
    if (options.chars && text) {
      // 字符相干回调
      options.chars(text)
    }
  }
  // 向前推动，裁切 html
  function advance(n) {
    index += n
    html = html.substring(n)
  }
}

上述代码为简化后的 parseHTML，while 循环中每次截取一段 html 文本，而后通过正则判断文本的类型进行解决，这就相似于编译原理中罕用的无限状态机。每次拿到 "<" 字符前后的文本，"<" 字符前的就当做文本处理，"<" 字符后的通过正则判断，可推算出无限的几种状态。

其余的逻辑解决都不简单，次要是开始标签与完结标签，咱们先看看对于开始标签与完结标签相干的正则。

const ncname = '[a-zA-Z_][\\w\\-\\.]*'
const qnameCapture = `((?:${ncname}\\:)?${ncname})`
const startTagOpen = new RegExp(`^<${qnameCapture}`)

这段正则看起来很长，然而理清之后也不是很难。这里举荐一个正则可视化工具。咱们到工具上看看 startTagOpen：

这里比拟纳闷的点就是为什么 tagName 会存在 :，这个是 XML 的命名空间，当初曾经很少应用了，咱们能够间接疏忽，所以咱们简化一下这个正则：

const ncname = '[a-zA-Z_][\\w\\-\\.]*'
const startTagOpen = new RegExp(`^<${ncname}`)
const startTagClose = /^\s*(\/?)>/
const endTag = new RegExp(`^<\\/${ncname}[^>]*>`)

除了下面对于标签开始和完结的正则，还有一段用来提取标签属性的正则，真的是又臭又长。

const attribute = /^\s*([^\s"'<>\/=]+)(?:\s*(=)\s*(?:"([^"]*)"+|'([^']*)'+|([^\s"'=<>`]+)))?/

把正则放到工具上就高深莫测了，以 = 为分界，后面为属性的名字，前面为属性的值。

理清正则后能够更加不便咱们看前面的代码。

while(html) {
  last = html
  let textEnd = html.indexOf('<')

  // "<" 字符在以后 html 字符串开始地位
  if (textEnd === 0) {
    // some code ...

    // 4、匹配到标签完结地位: </div>
    const endTagMatch = html.match(endTag)
    if (endTagMatch) {
      const curIndex = index
      advance(endTagMatch[0].length)
      parseEndTag(endTagMatch[1], curIndex, index)
      continue
    }

    // 5、匹配到标签开始地位: <div>
    const startTagMatch = parseStartTag()
    if (startTagMatch) {handleStartTag(startTagMatch)
      continue
    }
  }
}
// 向前推动，裁切 html
function advance(n) {
  index += n
  html = html.substring(n)
}

// 判断是否标签开始地位，如果是，则提取标签名以及相干属性
function parseStartTag () {
  // 提取 <xxx
  const start = html.match(startTagOpen)
  if (start) {const [fullStr, tag] = start
    const match = {attrs: [],
      start: index,
      tagName: tag,
    }
    advance(fullStr.length)
    let end, attr
    // 递归提取属性，直到呈现 ">" 或 "/>" 字符
    while (!(end = html.match(startTagClose)) &&
      (attr = html.match(attribute))
    ) {advance(attr[0].length)
      match.attrs.push(attr)
    }
    if (end) {
      // 如果是 "/>" 示意单标签
      match.unarySlash = end[1]
      advance(end[0].length)
      match.end = index
      return match
    }
  }
}

// 解决开始标签
function handleStartTag (match) {
  const tagName = match.tagName
  const unary = match.unarySlash
  const len = match.attrs.length
  const attrs = new Array(len)
  for (let i = 0; i < l; i++) {const args = match.attrs[i]
    // 这里的 3、4、5 别离对应三种不同复制属性的形式
    // 3: attr="xxx" 双引号
    // 4: attr='xxx' 单引号
    // 5: attr=xxx   省略引号
    const value = args[3] || args[4] || args[5] || ''
    attrs[i] = {name: args[1],
      value
    }
  }

  if (!unary) {
    // 非单标签，入栈
    stack.push({
      tag: tagName,
      lowerCasedTag:
      tagName.toLowerCase(),
      attrs: attrs
    })
    lastTag = tagName
  }

  if (options.start) {
    // 开始标签的回调
    options.start(tagName, attrs, unary, match.start, match.end)
  }
}

// 解决闭合标签
function parseEndTag (tagName, start, end) {
  let pos, lowerCasedTagName
  if (start == null) start = index
  if (end == null) end = index

  if (tagName) {lowerCasedTagName = tagName.toLowerCase()
  }

  // 在栈内查找雷同类型的未闭合标签
  if (tagName) {for (pos = stack.length - 1; pos >= 0; pos--) {if (stack[pos].lowerCasedTag === lowerCasedTagName) {break}
    }
  } else {pos = 0}

  if (pos >= 0) {
    // 敞开该标签内的未闭合标签，更新堆栈
    for (let i = stack.length - 1; i >= pos; i--) {if (options.end) {
        // end 回调
        options.end(stack[i].tag, start, end)
      }
    }

    // 堆栈中删除已敞开标签
    stack.length = pos
    lastTag = pos && stack[pos - 1].tag
  }
}

在解析开始标签的时候，如果该标签不是单标签，会将该标签放入到一个堆栈当中，每次闭合标签的时候，会从栈顶向下查找同名标签，直到找到同名标签，这个操作会闭合同名标签下面的所有标签。接下来咱们举个例子：

<div>
  <h2>test</h2>
  <p>
  <p>
</div>

在解析了 div 和 h2 的开始标签后，栈内就存在了两个元素。h2 闭合后，就会将 h2 出栈。而后会解析两个未闭合的 p 标签，此时，栈内存在三个元素（div、p、p）。如果这个时候，解析了 div 的闭合标签，除了将 div 闭合外，div 内两个未闭合的 p 标签也会追随闭合，此时栈被清空。

为了便于了解，顺便录制了一个动图，如下：

理清了 parseHTML 的逻辑后，咱们回到调用 parseHTML 的地位，调用该办法的时候，一共会传入四个回调，别离对应标签的开始和完结、文本、正文。

parseHTML(template, {
  // some options...

  // 解析到标签地位开始的回调
  start(tag, attrs, unary) {},
  // 解析到标签地位完结的回调
  end(tag) {},
  // 解析到文本时的回调
  chars(text: string) {},
  // 解析到正文时的回调
  comment(text: string) {}})

首先看解析到开始标签时，会生成一个 AST 节点，而后解决标签上的属性，最初将 AST 节点放入树形构造中。

function makeAttrsMap(attrs) {const map = {}
  for (let i = 0, l = attrs.length; i < l; i++) {const { name, value} = attrs[i]
    map[name] = value
  }
  return map
}
function createASTElement(tag, attrs, parent) {
  const attrsList = attrs
  const attrsMap = makeAttrsMap(attrsList)
  return {
    type: 1,       // 节点类型
    tag,           // 节点名称
    attrsMap,      // 节点属性映射
    attrsList,     // 节点属性数组
    parent,        // 父节点
    children: [],  // 子节点}
}

const stack = []
let root // 根节点
let currentParent // 暂存以后的父节点
parseHTML(template, {
  // some options...

  // 解析到标签地位开始的回调
  start(tag, attrs, unary) {
    // 创立 AST 节点
    let element = createASTElement(tag, attrs, currentParent)

    // 解决指令: v-for v-if v-once
    processFor(element)
    processIf(element)
    processOnce(element)
    processElement(element, options)

    // 解决 AST 树
    // 根节点不存在，则设置该元素为根节点
       if (!root) {
      root = element
      checkRootConstraints(root)
    }
    // 存在父节点
    if (currentParent) {
      // 将该元素推入父节点的子节点中
      currentParent.children.push(element)
      element.parent = currentParent
    }
    if (!unary) {
        // 非单标签须要入栈，且切换以后父元素的地位
      currentParent = element
      stack.push(element)
    }
  }
})

标签完结的逻辑就比较简单了，只须要去除栈内最初一个未闭合标签，进行闭合即可。

parseHTML(template, {
  // some options...

  // 解析到标签地位完结的回调
  end() {const element = stack[stack.length - 1]
    const lastNode = element.children[element.children.length - 1]
    // 解决尾部空格的状况
    if (lastNode && lastNode.type === 3 && lastNode.text === ' ') {element.children.pop()
    }
    // 出栈，重置以后的父节点
    stack.length -= 1
    currentParent = stack[stack.length - 1]
  }
})

解决完标签后，还须要对标签内的文本进行解决。文本的解决分两种状况，一种是带表达式的文本，还一种就是纯动态的文本。

parseHTML(template, {
  // some options...

  // 解析到文本时的回调
  chars(text) {if (!currentParent) {
      // 文本节点外如果没有父节点则不解决
      return
    }
    
    const children = currentParent.children
    text = text.trim()
    if (text) {
      // parseText 用来解析表达式
      // delimiters 示意表达式标识符，默认为 ['{{', '}}']
      const res = parseText(text, delimiters))
      if (res) {
        // 表达式
        children.push({
          type: 2,
          expression: res.expression,
          tokens: res.tokens,
          text
        })
      } else {
        // 动态文本
        children.push({
          type: 3,
          text
        })
      }
    }
  }
})

上面咱们看看 parseText 如何解析表达式。

// 结构匹配表达式的正则
const buildRegex = delimiters => {const open = delimiters[0]
  const close = delimiters[1]
  return new RegExp(open + '((?:.|\\n)+?)' + close, 'g')
}

function parseText (text, delimiters){// delimiters 默认为 {{}}
  const tagRE = buildRegex(delimiters || ['{{', '}}'])
  // 未匹配到表达式，间接返回
  if (!tagRE.test(text)) {return}
  const tokens = []
  const rawTokens = []
  let lastIndex = tagRE.lastIndex = 0
  let match, index, tokenValue
  while ((match = tagRE.exec(text))) {
    // 表达式开始的地位
    index = match.index
    // 提取表达式开始地位后面的动态字符，放入 token 中
    if (index > lastIndex) {rawTokens.push(tokenValue = text.slice(lastIndex, index))
      tokens.push(JSON.stringify(tokenValue))
    }
    // 提取表达式外部的内容，应用 _s() 办法包裹
    const exp = match[1].trim()
    tokens.push(`_s(${exp})`)
    rawTokens.push({'@binding': exp})
    lastIndex = index + match[0].length
  }
  // 表达式前面还有其余动态字符，放入 token 中
  if (lastIndex < text.length) {rawTokens.push(tokenValue = text.slice(lastIndex))
    tokens.push(JSON.stringify(tokenValue))
  }
  return {expression: tokens.join('+'),
    tokens: rawTokens
  }
}

首先通过一段正则来提取表达式：

看代码可能有点难，咱们间接看例子，这里有一个蕴含表达式的文本。

<div> 是否登录：{{isLogin ? '是' : '否'}}</div>

通过上述一些列解决，咱们就失去了 Vue 模板的 AST。因为 Vue 是响应式设计，所以拿到 AST 之后还须要进行一系列优化，确保动态的数据不会进入虚构 DOM 的更新阶段，以此来优化性能。

export function optimize (root, options) {if (!root) return
  // 标记动态节点
  markStatic(root)
}

简略来说，就是把所以动态节点的 static 属性设置为 true。

function isStatic (node) {if (node.type === 2) { // 表达式，返回 false
    return false
  }
  if (node.type === 3) { // 动态文本，返回 true
    return true
  }
  // 此处省略了局部条件
  return !!(
    !node.hasBindings && // 没有动静绑定
    !node.if && !node.for && // 没有 v-if/v-for
    !isBuiltInTag(node.tag) && // 不是内置组件 slot/component
    !isDirectChildOfTemplateFor(node) && // 不在 template for 循环内
    Object.keys(node).every(isStaticKey) // 非动态节点
  )
}

function markStatic (node) {node.static = isStatic(node)
  if (node.type === 1) {
    // 如果是元素节点，须要遍历所有子节点
    for (let i = 0, l = node.children.length; i < l; i++) {const child = node.children[i]
      markStatic(child)
      if (!child.static) {
        // 如果有一个子节点不是动态节点，则该节点也必须是动静的
        node.static = false
      }
    }
  }
}

失去优化的 AST 之后，就须要将 AST 转化为 render 办法。还是用之前的模板，先看看生成的代码长什么样：

<div>
  <h2 v-if="message">{{message}}</h2>
  <button @click="showName">showName</button>
</div>

{render: "with(this){return _c('div',[(message)?_c('h2',[_v(_s(message))]):_e(),_v(" "),_c('button',{on:{"click":showName}},[_v("showName")])])}"
}

将生成的代码开展：

with (this) {
    return _c(
      'div',
      [(message) ? _c('h2', [_v(_s(message))]) : _e(),
        _v(' '),
        _c('button', { on: { click: showName} }, [_v('showName')])
      ])
    ;
}

看到这里一堆的下划线必定很懵逼，这里的 _c 对应的是虚构 DOM 中的 createElement 办法。其余的下划线办法在 core/instance/render-helpers 中都有定义，每个办法具体做了什么不做开展。

具体转化办法就是一些简略的字符拼接，上面是简化了逻辑的局部，不做过多讲述。

export function generate(ast, options) {const state = new CodegenState(options)
  const code = ast ? genElement(ast, state) : '_c("div")'
  return {render: `with(this){return ${code}}`,
    staticRenderFns: state.staticRenderFns
  }
}

export function genElement (el, state) {
  let code
  const data = genData(el, state)
  const children = genChildren(el, state, true)
  code = `_c('${el.tag}'${data ? `,${data}` : '' // data
  }${children ? `,${children}` : '' // children
  })`
  return code
}

理清了 Vue 模板编译的整个过程，重点都放在了解析 HTML 生成 AST 的局部。本文只是大抵讲述了次要流程，其中省略了特地多的细节，比方：对 template/slot 的解决、指令的解决等等，如果想理解其中的细节能够间接浏览源码。心愿大家在浏览这篇文章后有所播种。

关于前端:Vue模板编译原理

写在结尾

Vue 的版本

编译入口

主流程

parse

AST

解析 HTML

解决开始标签

解决完结标签

解决文本

optimize

generate

总结