
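    Welcome to part two of the series on creating a custom web editor using TypeScript, React, ANTLR4, and Monaco Editor. Before reading on, I recommend reading "Creating a custom web editor using TypeScript, React, ANTLR4, and Monaco Editor (part one)".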

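    In this article, I will show how to implement the language service. The language service does the heavy lifting of parsing the text typed into the editor. We will use the abstract syntax tree (AST) produced by the Parser to find syntax or lexical errors, format the text, and offer smart suggestions for TODOs as the user types (I will not implement auto-completion in this article). Basically, the language service exposes the following functions: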

  • format(code: string): string
  • validate(code: string): Errors[]
  • autoComplete(code: string, currentPosition: Position): string[]

Add ANTLR, Generate Lexer and Parser From the Grammar

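I will add the ANTLR library and a script that generates the Parser and Lexer from the TodoLangGrammar.g4 grammar file. First, add the two required libraries: antlr4ts and antlr4ts-cli. Parsers generated for the antlr4 TypeScript target have a runtime dependency on the antlr4ts package. antlr4ts-cli, as the name suggests, is the CLI we will use to generate the Parser and Lexer for the language.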

npm add antlr4ts
npm add -D antlr4ts-cli

In the project root, create the file TodoLangGrammar.g4 containing the TodoLang grammar rules:

grammar TodoLangGrammar;

todoExpressions : (addExpression)* (completeExpression)*;

addExpression : ADD TODO STRING;
completeExpression : COMPLETE TODO STRING;

ADD : 'ADD';
TODO : 'TODO';
COMPLETE: 'COMPLETE';
STRING: '"' ~ ["]* '"';
EOL: [\r\n] + -> skip;
WS: [ \t] -> skip;
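To make the lexer rules above more concrete, here is a small hand-written tokenizer in TypeScript that mimics them. It is only an illustration: the names Token, TokenType, and tokenize are hypothetical, and unlike the ANTLR-generated lexer (which reports errors to its listeners and keeps going), this sketch simply throws on unexpected input.

```typescript
// Hand-written illustration of the lexer rules; NOT the ANTLR-generated lexer.
type TokenType = "ADD" | "TODO" | "COMPLETE" | "STRING";

interface Token {
    type: TokenType;
    text: string;
}

const KEYWORDS: TokenType[] = ["ADD", "TODO", "COMPLETE"];

function tokenize(code: string): Token[] {
    const tokens: Token[] = [];
    let index = 0;
    while (index < code.length) {
        const ch = code.charAt(index);
        // EOL and WS rules: whitespace is skipped and produces no token.
        if (ch === " " || ch === "\t" || ch === "\r" || ch === "\n") {
            index++;
            continue;
        }
        // STRING rule: a double quote, any characters except a quote, a double quote.
        if (ch === '"') {
            const close = code.indexOf('"', index + 1);
            if (close === -1) {
                throw new Error(`Unterminated string at offset ${index}`);
            }
            tokens.push({ type: "STRING", text: code.slice(index, close + 1) });
            index = close + 1;
            continue;
        }
        // ADD, TODO and COMPLETE rules: literal keywords.
        let matched = false;
        for (const keyword of KEYWORDS) {
            if (code.slice(index, index + keyword.length) === keyword) {
                tokens.push({ type: keyword, text: keyword });
                index += keyword.length;
                matched = true;
                break;
            }
        }
        if (!matched) {
            throw new Error(`Unexpected character at offset ${index}`);
        }
    }
    return tokens;
}
```

For the input ADD TODO "Create an editor", this yields an ADD token, a TODO token, and a STRING token whose text still includes the surrounding quotes.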

Now add a script to package.json that generates the Parser and Lexer with antlr4ts-cli:

"antlr4ts": "antlr4ts ./TodoLangGrammar.g4 -o ./src/ANTLR"

Run the antlr4ts script, and the generated TypeScript source for the parser will appear in the ./src/ANTLR directory:

npm run antlr4ts

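As we can see, there is a Lexer and a Parser. If you look at the Parser file, you will find that it exports the TodoLangGrammarParser class. Its constructor, constructor(input: TokenStream), takes as its parameter the TokenStream that TodoLangGrammarLexer generates for the given code. TodoLangGrammarLexer in turn has a constructor, constructor(input: CharStream), that takes the code as its parameter.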

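The Parser file contains the method public todoExpressions(): TodoExpressionsContext, which returns the context object for all TodoExpressions defined in the code. Can you guess where the name TodoExpressions comes from? It is derived from the first rule in our grammar file: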

todoExpressions : (addExpression)* (completeExpression)*;

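TodoExpressionsContext is the root of the AST. Each node in it is another context for another rule. A context contains terminals and node contexts, and the terminals hold the final tokens (the ADD token, the TODO token, and the token for the todo's name).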

TodoExpressionsContext contains the lists of addExpressions and completeExpressions, which come from the following three rules:

todoExpressions : (addExpression)* (completeExpression)*;
addExpression : ADD TODO STRING;
completeExpression : COMPLETE TODO STRING;

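Each context class, in turn, contains terminal nodes, which basically hold the text (the code fragment or token, for example ADD, COMPLETE, or the string representing the TODO). The complexity of the AST depends on the grammar rules you write.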

Let's take a look at AddExpressionContext. It contains the ADD, TODO, and STRING terminal nodes, which correspond to the rule:

addExpression : ADD TODO STRING;

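The STRING terminal node holds the text of the Todo we are adding. Let's parse a simple piece of TodoLang code to understand how the AST works. In the ./src/language-service directory, create a file parser.ts with the following content: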

import { TodoLangGrammarParser, TodoExpressionsContext } from "../ANTLR/TodoLangGrammarParser";
import { TodoLangGrammarLexer } from "../ANTLR/TodoLangGrammarLexer";
import { ANTLRInputStream, CommonTokenStream } from "antlr4ts";

export default function parseAndGetASTRoot(code: string): TodoExpressionsContext {
    const inputStream = new ANTLRInputStream(code);
    const lexer = new TodoLangGrammarLexer(inputStream);
    const tokenStream = new CommonTokenStream(lexer);
    const parser = new TodoLangGrammarParser(tokenStream);
    // Parse the input; `todoExpressions` is the entry rule defined in the grammar
    return parser.todoExpressions();
}

The parser.ts file exports the parseAndGetASTRoot(code) method, which takes TodoLang code and generates the corresponding AST. For example, it can parse the following TodoLang code:

parseAndGetASTRoot(`
ADD TODO "Create an editor"
COMPLETE TODO "Create an editor"
`)

Implementing Lexical and Syntax Validation

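In this section, I will walk you step by step through adding syntax validation to the editor. ANTLR generates lexical and syntax errors for us out of the box. We only need to implement the ANTLRErrorListener interface and supply it to the Lexer and the Parser, so that we can collect errors while ANTLR parses the code.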

In the ./src/language-service directory, create the file TodoLangErrorListener.ts. It exports the TodoLangErrorListener class, which implements the ANTLRErrorListener interface:

import { ANTLRErrorListener, RecognitionException, Recognizer } from "antlr4ts";

export interface ITodoLangError {
    startLineNumber: number;
    startColumn: number;
    endLineNumber: number;
    endColumn: number;
    message: string;
    code: string;
}

export default class TodoLangErrorListener implements ANTLRErrorListener<any> {
    private errors: ITodoLangError[] = [];

    syntaxError(recognizer: Recognizer<any, any>, offendingSymbol: any, line: number, charPositionInLine: number, message: string, e: RecognitionException | undefined): void {
        this.errors.push({
            startLineNumber: line,
            endLineNumber: line,
            startColumn: charPositionInLine,
            endColumn: charPositionInLine + 1, // Let's suppose the length of the error is only 1 char for simplicity
            message,
            code: "1" // This is the error code; you can customize it as you want
        });
    }

    getErrors(): ITodoLangError[] {
        return this.errors;
    }
}
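One detail worth keeping in mind when mapping these positions to editor markers: ANTLR reports line starting at 1 but charPositionInLine starting at 0, whereas Monaco's line and column numbers both start at 1. The sketch below shows one possible conversion. The helper toMarkerRange and its length parameter are hypothetical names of mine, not part of ANTLR or Monaco, and the listener in this article does not apply this shift.

```typescript
interface MarkerRange {
    startLineNumber: number;
    startColumn: number;
    endLineNumber: number;
    endColumn: number;
}

// line: 1-based line number as reported by ANTLR
// charPositionInLine: 0-based column as reported by ANTLR
// length: number of characters to underline (assumed to stay on one line)
function toMarkerRange(line: number, charPositionInLine: number, length: number = 1): MarkerRange {
    const startColumn = charPositionInLine + 1; // shift from 0-based to 1-based
    return {
        startLineNumber: line,
        startColumn,
        endLineNumber: line,
        // The end column points just past the last underlined character.
        endColumn: startColumn + Math.max(length, 1),
    };
}
```

With this helper, an error at the very start of the first line (line 1, charPositionInLine 0) maps to columns 1 through 2, which covers the first character.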

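Each time ANTLR encounters an error while parsing the code, it calls this TodoLangErrorListener and passes it information about the error. The listener collects the errors, each with the position in the code where parsing failed and the error message. Now let's attach TodoLangErrorListener to the Lexer and the Parser in parser.ts, for example: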

import { TodoLangGrammarParser, TodoExpressionsContext } from "../ANTLR/TodoLangGrammarParser";
import { TodoLangGrammarLexer } from "../ANTLR/TodoLangGrammarLexer";
import { ANTLRInputStream, CommonTokenStream } from "antlr4ts";
import TodoLangErrorListener, { ITodoLangError } from "./TodoLangErrorListener";

function parse(code: string): { ast: TodoExpressionsContext, errors: ITodoLangError[] } {
    const inputStream = new ANTLRInputStream(code);
    const lexer = new TodoLangGrammarLexer(inputStream);
    lexer.removeErrorListeners();
    const todoLangErrorsListner = new TodoLangErrorListener();
    lexer.addErrorListener(todoLangErrorsListner);
    const tokenStream = new CommonTokenStream(lexer);
    const parser = new TodoLangGrammarParser(tokenStream);
    parser.removeErrorListeners();
    parser.addErrorListener(todoLangErrorsListner);
    const ast = parser.todoExpressions();
    const errors: ITodoLangError[] = todoLangErrorsListner.getErrors();
    return { ast, errors };
}

export function parseAndGetASTRoot(code: string): TodoExpressionsContext {
    const { ast } = parse(code);
    return ast;
}

export function parseAndGetSyntaxErrors(code: string): ITodoLangError[] {
    const { errors } = parse(code);
    return errors;
}

In the ./src/language-service directory, create LanguageService.ts, which exports the following:

import { TodoExpressionsContext } from "../ANTLR/TodoLangGrammarParser";
// The file created earlier is parser.ts, so the import path uses the same casing
import { parseAndGetASTRoot, parseAndGetSyntaxErrors } from "./parser";
import { ITodoLangError } from "./TodoLangErrorListener";

export default class TodoLangLanguageService {
    validate(code: string): ITodoLangError[] {
        const syntaxErrors: ITodoLangError[] = parseAndGetSyntaxErrors(code);
        // Later we will append semantic errors
        return syntaxErrors;
    }
}

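Great, we have implemented error parsing for the editor. Next, I will create the web worker discussed in the previous article and add the worker service proxy, which will call the language service to provide the editor's advanced features.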

Creating the web worker

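First, we call monaco.editor.createWebWorker to create the proxy for TodoLangWorker using built-in ES6 Proxies. TodoLangWorker will use the language service to perform the editor features. The methods that execute in the web worker are proxied by Monaco, so calling a method in the web worker is just a matter of calling the proxied method on the main thread.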

In the ./src/todo-lang folder, create TodoLangWorker.ts with the following content:

import * as monaco from "monaco-editor-core";
import IWorkerContext = monaco.worker.IWorkerContext;
import TodoLangLanguageService from "../language-service/LanguageService";
import { ITodoLangError } from "../language-service/TodoLangErrorListener";

export class TodoLangWorker {
    private _ctx: IWorkerContext;
    private languageService: TodoLangLanguageService;

    constructor(ctx: IWorkerContext) {
        this._ctx = ctx;
        this.languageService = new TodoLangLanguageService();
    }

    doValidation(): Promise<ITodoLangError[]> {
        const code = this.getTextDocument();
        return Promise.resolve(this.languageService.validate(code));
    }

    private getTextDocument(): string {
        // Only a single file is supported, so we read the first mirrored model
        const model = this._ctx.getMirrorModels()[0];
        return model.getValue();
    }
}

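We created a language service instance and added the doValidation method, which in turn calls the language service's validate method. We also added the getTextDocument method, which gets the editor's text value. The TodoLangWorker class could be extended with many more features, for example if you want to support multi-file editing. _ctx: IWorkerContext is the editor's context object, and it holds the model information for the files.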

Now let's create the web worker file todolang.worker.ts in the ./src/todo-lang directory:

import * as worker from 'monaco-editor-core/esm/vs/editor/editor.worker';
// The class lives in TodoLangWorker.ts, so the import path uses the same casing
import { TodoLangWorker } from './TodoLangWorker';

self.onmessage = () => {
    worker.initialize((ctx) => {
        return new TodoLangWorker(ctx);
    });
};

We use the built-in worker.initialize to initialize our worker, and we use TodoLangWorker to proxy the necessary methods.

This is a web worker, so we have to tell webpack to output the corresponding worker file:

// webpack.config.js
    entry: {
        app: './src/index.tsx',
        "editor.worker": 'monaco-editor-core/esm/vs/editor/editor.worker.js',
        "todoLangWorker": './src/todo-lang/todolang.worker.ts'
    },
    output: {
        globalObject: 'self',
        filename: (chunkData) => {
            switch (chunkData.chunk.name) {
                case 'editor.worker':
                    return 'editor.worker.js';
                case 'todoLangWorker':
                    return "todoLangWorker.js"
                default:
                    return 'bundle.[hash].js';
            }
        },
        path: path.resolve(__dirname, 'dist')
    }

We named the worker file todoLangWorker.js. Now add getWorkerUrl in the editor setup function:

    (window as any).MonacoEnvironment = {
        getWorkerUrl: function (moduleId, label) {
            if (label === languageID)
                return "./todoLangWorker.js";
            return './editor.worker.js';
        }
    }

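This is how Monaco gets the URL of the web worker. Note that if the worker's label is the ID of TodoLang, we return the worker of the same name that Webpack bundles and outputs. If you build the project now, you should find a file named todoLangWorker.js (or, in the dev tools, you will find two workers in the threads section).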

Now create a WorkerManager to manage creating the worker and getting the proxied worker client:

import * as monaco from "monaco-editor-core";
import Uri = monaco.Uri;
// The class lives in TodoLangWorker.ts, so the import path uses the same casing
import { TodoLangWorker } from './TodoLangWorker';
import { languageID } from './config';

export class WorkerManager {
    private worker: monaco.editor.MonacoWebWorker<TodoLangWorker>;
    private workerClientProxy: Promise<TodoLangWorker>;

    constructor() {
        this.worker = null;
    }

    private getClientproxy(): Promise<TodoLangWorker> {
        // Create the worker lazily, the first time a proxy is requested
        if (!this.workerClientProxy) {
            this.worker = monaco.editor.createWebWorker<TodoLangWorker>({
                moduleId: 'TodoLangWorker',
                label: languageID,
                createData: {
                    languageId: languageID,
                }
            });
            this.workerClientProxy = <Promise<TodoLangWorker>><any>this.worker.getProxy();
        }
        return this.workerClientProxy;
    }

    async getLanguageServiceWorker(...resources: Uri[]): Promise<TodoLangWorker> {
        const _client: TodoLangWorker = await this.getClientproxy();
        // Make sure the worker sees the latest content of these models
        await this.worker.withSyncedResources(resources);
        return _client;
    }
}

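We use createWebWorker to create the web worker proxied by Monaco, and then we get and return the proxied client object. We use workerClientProxy to call the proxied methods. Let's create the DiagnosticsAdapter class, which connects the Monaco markers API to the errors returned by the language service, so that parse errors are marked correctly in Monaco: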

import * as monaco from "monaco-editor-core";
import { WorkerAccessor } from "./setup";
import { languageID } from "./config";
import { ITodoLangError } from "../language-service/TodoLangErrorListener";

export default class DiagnosticsAdapter {
    constructor(private worker: WorkerAccessor) {
        const onModelAdd = (model: monaco.editor.IModel): void => {
            let handle: any;
            model.onDidChangeContent(() => {
                // Here we are debouncing the user's changes: every time a new change is made, we wait 500ms before validating.
                // If the user is still typing, we cancel the pending validation and start waiting again.
                clearTimeout(handle);
                handle = setTimeout(() => this.validate(model.uri), 500);
            });
            this.validate(model.uri);
        };
        monaco.editor.onDidCreateModel(onModelAdd);
        monaco.editor.getModels().forEach(onModelAdd);
    }

    private async validate(resource: monaco.Uri): Promise<void> {
        const worker = await this.worker(resource);
        const errorMarkers = await worker.doValidation();
        const model = monaco.editor.getModel(resource);
        monaco.editor.setModelMarkers(model, languageID, errorMarkers.map(toDiagnostics));
    }
}

function toDiagnostics(error: ITodoLangError): monaco.editor.IMarkerData {
    return {
        ...error,
        severity: monaco.MarkerSeverity.Error,
    };
}

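The onDidChangeContent listener watches the model. When the model changes, we wait until the user has stopped typing for 500 ms, then call the web worker to validate the code and add error markers. setModelMarkers tells Monaco to add the error markers. To complete the editor's syntax validation, make sure to wire these up in the setup function, and note that we are using WorkerManager to get the proxied worker: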

    monaco.languages.onLanguage(languageID, () => {
        monaco.languages.setMonarchTokensProvider(languageID, monarchLanguage);
        monaco.languages.setLanguageConfiguration(languageID, richLanguageConfiguration);
        const client = new WorkerManager();
        const worker: WorkerAccessor = (...uris: monaco.Uri[]): Promise<TodoLangWorker> => {
            return client.getLanguageServiceWorker(...uris);
        };
        // Call the errors provider
        new DiagnosticsAdapter(worker);
    });
} // end of the setup function

export type WorkerAccessor = (...uris: monaco.Uri[]) => Promise<TodoLangWorker>;
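The debouncing used in DiagnosticsAdapter can also be isolated into a small helper. This is a minimal sketch rather than the article's implementation: debounce, Schedule, and Cancel are hypothetical names, and the scheduler is injectable only so that the behaviour can be exercised without waiting on a real timer.

```typescript
type Schedule = (callback: () => void, delayMs: number) => unknown;
type Cancel = (handle: unknown) => void;

// Returns a function that runs `action` only after `delayMs` have passed
// without another call. By default it relies on setTimeout/clearTimeout.
function debounce(
    action: () => void,
    delayMs: number,
    schedule: Schedule = (cb, ms) => setTimeout(cb, ms),
    cancel: Cancel = (handle) => clearTimeout(handle as ReturnType<typeof setTimeout>)
): () => void {
    let handle: unknown;
    return () => {
        // A new change cancels the run that was still pending.
        if (handle !== undefined) {
            cancel(handle);
        }
        handle = schedule(() => {
            handle = undefined;
            action();
        }, delayMs);
    };
}
```

With such a helper, the content-change listener could simply call the debounced function on every change, and the validation would run once per pause in typing.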

Now everything is ready. Run the project and type some invalid TodoLang code, and you will see the errors underlined in the code.

Implementing Semantic Validation

Now let's add semantic validation to the editor. Remember the two semantic rules I mentioned in the previous article:

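  • If a TODO has already been defined with an ADD TODO instruction, we cannot add it again.
  • A COMPLETE instruction should not be applied to a TODO that has not yet been declared with ADD TODO.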

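To check whether a TODO is defined, all we have to do is traverse the AST, get each ADD expression, and push its TODO into definedTodos. Before pushing, we check whether the TODO already exists in definedTodos. If it does, that is a semantic error, so we get the position of the error from the ADD expression's context and push the error onto the array. The second rule works the same way: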

import { AddExpressionContext, CompleteExpressionContext, TodoExpressionsContext } from "../ANTLR/TodoLangGrammarParser";

function checkSemanticRules(ast: TodoExpressionsContext): ITodoLangError[] {
    const errors: ITodoLangError[] = [];
    const definedTodos: string[] = [];
    ast.children.forEach(node => {
        if (node instanceof AddExpressionContext) {
            // If an Add expression: ADD TODO "STRING"
            const todo = node.STRING().text;
            // If a TODO is already defined using an ADD TODO instruction, we cannot re-add it.
            if (definedTodos.some(todo_ => todo_ === todo)) {
                // The node has everything needed to know the position of this expression in the code
                errors.push({
                    code: "2",
                    // Width of the last token (the original subtracted stopIndex from itself, which is always 0)
                    endColumn: node.stop.charPositionInLine + node.stop.stopIndex - node.stop.startIndex,
                    endLineNumber: node.stop.line,
                    message: `Todo ${todo} already defined`,
                    startColumn: node.stop.charPositionInLine,
                    startLineNumber: node.stop.line
                });
            } else {
                definedTodos.push(todo);
            }
        } else if (node instanceof CompleteExpressionContext) {
            const todoToComplete = node.STRING().text;
            if (definedTodos.every(todo_ => todo_ !== todoToComplete)) {
                // The todo is not yet defined. We only check the todos defined before this expression,
                // which means the order is important
                errors.push({
                    code: "2",
                    endColumn: node.stop.charPositionInLine + node.stop.stopIndex - node.stop.startIndex,
                    endLineNumber: node.stop.line,
                    message: `Todo ${todoToComplete} is not defined`,
                    startColumn: node.stop.charPositionInLine,
                    startLineNumber: node.stop.line
                });
            }
        }
    });
    return errors;
}

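Now call the checkSemanticRules function in the language service's validate method and return the syntax and semantic errors merged together. Our editor now supports semantic validation.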
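One way to wire this up is sketched here. To keep the example self-contained, the parsing and checking functions are passed in as parameters. validateWith and ValidationSteps are hypothetical names of mine, and in the real LanguageService.ts you would presumably call parseAndGetSyntaxErrors, parseAndGetASTRoot, and checkSemanticRules directly inside validate.

```typescript
interface ITodoLangError {
    startLineNumber: number;
    startColumn: number;
    endLineNumber: number;
    endColumn: number;
    message: string;
    code: string;
}

// Stand-ins for the functions defined in the article, so the sketch compiles on its own.
interface ValidationSteps<Ast> {
    parseAndGetSyntaxErrors(code: string): ITodoLangError[];
    parseAndGetASTRoot(code: string): Ast;
    checkSemanticRules(ast: Ast): ITodoLangError[];
}

function validateWith<Ast>(code: string, steps: ValidationSteps<Ast>): ITodoLangError[] {
    const syntaxErrors = steps.parseAndGetSyntaxErrors(code);
    const ast = steps.parseAndGetASTRoot(code);
    const semanticErrors = steps.checkSemanticRules(ast);
    // Syntax errors come first, followed by semantic errors.
    return syntaxErrors.concat(semanticErrors);
}
```

Note that this sketch parses the code twice, once for the errors and once for the AST, just as calling the two exported parser functions would. Returning both from a single parse is a possible refinement.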

Implementing Auto-Formatting

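For the editor's auto-formatting feature, you need to provide and register a formatting provider with Monaco by calling the Monaco API registerDocumentFormattingEditProvider. See the monaco-editor documentation for more details. Parsing the code and traversing the AST gives you the beautified code: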

// LanguageService.ts
    format(code: string): string {
        // If the code contains errors, there is no need to format it, because this way of formatting would remove some of the code.
        // To keep things simple, we only allow formatting valid code
        if (this.validate(code).length > 0)
            return code;
        let formattedCode = "";
        const ast: TodoExpressionsContext = parseAndGetASTRoot(code);
        ast.children.forEach(node => {
            if (node instanceof AddExpressionContext) {
                // If an Add expression: ADD TODO "STRING"
                const todo = node.STRING().text;
                formattedCode += `ADD TODO ${todo}\n`;
            } else if (node instanceof CompleteExpressionContext) {
                // If a Complete expression: COMPLETE TODO "STRING"
                const todoToComplete = node.STRING().text;
                formattedCode += `COMPLETE TODO ${todoToComplete}\n`;
            }
        });
        return formattedCode;
    }

Add a format method to TodoLangWorker. This format method will call the language service's format method.
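A minimal sketch of such a method follows. The FormattingService interface and the FormattingWorker class are hypothetical stand-ins so that the snippet compiles on its own. In the article's code, the method would be added to TodoLangWorker and delegate to TodoLangLanguageService.

```typescript
// Stand-in for the part of the language service used here (hypothetical name).
interface FormattingService {
    format(code: string): string;
}

// Stand-in for TodoLangWorker, reduced to the formatting method (hypothetical name).
class FormattingWorker {
    private languageService: FormattingService;

    constructor(languageService: FormattingService) {
        this.languageService = languageService;
    }

    // Wrapped in a Promise, like doValidation, because the caller reaches
    // this method through the worker proxy and awaits the result.
    format(code: string): Promise<string> {
        return Promise.resolve(this.languageService.format(code));
    }
}
```

The code is passed in as an argument here, which matches how the formatting provider calls worker.format(code).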

Now create the TodoLangFormattingProvider class to implement the DocumentFormattingEditProvider interface:

import * as monaco from "monaco-editor-core";
import { WorkerAccessor } from "./setup";

export default class TodoLangFormattingProvider implements monaco.languages.DocumentFormattingEditProvider {
    constructor(private worker: WorkerAccessor) {
    }

    provideDocumentFormattingEdits(model: monaco.editor.ITextModel, options: monaco.languages.FormattingOptions, token: monaco.CancellationToken): monaco.languages.ProviderResult<monaco.languages.TextEdit[]> {
        return this.format(model.uri, model.getValue());
    }

    private async format(resource: monaco.Uri, code: string): Promise<monaco.languages.TextEdit[]> {
        // Get the worker proxy
        const worker = await this.worker(resource);
        // Call the proxied format method of the language service and get the formatted code
        const formattedCode = await worker.format(code);
        const endLineNumber = code.split("\n").length + 1;
        const endColumn = code.split("\n").map(line => line.length).sort((a, b) => a - b)[0] + 1;
        // Return a single edit that replaces the given range with the formatted code
        return [
            {
                text: formattedCode,
                range: {
                    endColumn,
                    endLineNumber,
                    startColumn: 0,
                    startLineNumber: 0
                }
            }
        ];
    }
}

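TodoLangFormattingProvider calls the format method exposed by the worker, passing model.getValue() as the argument, and gives Monaco the formatted code together with the range of code to replace. Now go to the setup function and register the formatting provider using the Monaco registerDocumentFormattingEditProvider API. Rerun the application, and you will see that the editor now supports auto-formatting: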

monaco.languages.registerDocumentFormattingEditProvider(languageID, new TodoLangFormattingProvider(worker));

Try clicking Format Document or pressing Shift + Alt + F, and the code in the editor should be reformatted.
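If you want the replaced range to cover the document exactly, one option is to compute it from the text using Monaco's 1-based line and column numbers. fullDocumentRange below is a hypothetical helper of mine, and the sketch assumes lines are separated by \n or \r\n. When the model is at hand, its getFullModelRange() method may be a simpler choice.

```typescript
interface DocumentRange {
    startLineNumber: number;
    startColumn: number;
    endLineNumber: number;
    endColumn: number;
}

// Computes a range spanning the whole text, with 1-based lines and columns.
function fullDocumentRange(code: string): DocumentRange {
    const lines = code.split(/\r\n|\n/);
    const lastLine = lines[lines.length - 1];
    return {
        startLineNumber: 1,
        startColumn: 1,
        endLineNumber: lines.length,
        // The end column points just past the last character of the last line.
        endColumn: lastLine.length + 1,
    };
}
```

For an empty document this yields a range that starts and ends at line 1, column 1.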

Implementing Auto-Completion

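To make auto-completion support the defined TODOs, all you have to do is get all the defined TODOs from the AST and provide a completion provider by calling registerCompletionItemProvider in setup. The completion provider gives you the code and the current position of the cursor, so you can check the context in which the user is typing. If they are typing a TODO in a COMPLETE expression, you can suggest the predefined TODOs. Keep in mind that, by default, Monaco Editor supports auto-completion for the predefined tokens in the code. You may want to disable that and implement your own to make it smarter and more contextual.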
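As a rough sketch of that idea (it is not something this article implements), the helper below collects the TODOs added before the cursor's line and suggests them only when the user is typing the argument of a COMPLETE TODO instruction. For brevity it scans the text with regular expressions instead of walking the AST, and suggestTodos and CursorPosition are hypothetical names.

```typescript
interface CursorPosition {
    lineNumber: number; // 1-based
    column: number;     // 1-based
}

function suggestTodos(code: string, position: CursorPosition): string[] {
    const lines = code.split(/\r\n|\n/);
    const currentLine = lines[position.lineNumber - 1] || "";
    const textBeforeCursor = currentLine.slice(0, position.column - 1);

    // Only suggest when the cursor follows COMPLETE TODO, optionally inside
    // a string that has been opened but not yet closed.
    if (!/^\s*COMPLETE\s+TODO\s+"?[^"]*$/.test(textBeforeCursor)) {
        return [];
    }

    // Collect the TODOs added on earlier lines, without duplicates.
    const defined: string[] = [];
    const addPattern = /ADD\s+TODO\s+("[^"]*")/g;
    const earlierText = lines.slice(0, position.lineNumber - 1).join("\n");
    let match: RegExpExecArray | null;
    while ((match = addPattern.exec(earlierText)) !== null) {
        if (defined.indexOf(match[1]) === -1) {
            defined.push(match[1]);
        }
    }
    return defined;
}
```

The returned strings keep their surrounding quotes, like the STRING token text. A completion provider registered with Monaco could turn each of them into a completion item.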
