The RulerZ rule engine: usage and how it works under the hood
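Without further ado, the official RulerZ repository is at: https://github.com/K-Phoen/ru…
Note: this article only analyzes the case where the data source is a plain array.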
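1. Introduction
RulerZ is a Composer package written in PHP that implements a rule engine for filtering data. Besides plain arrays, it supports several common ORMs such as Eloquent and Doctrine, as well as the Solr search engine. It is an open-source package with no official Chinese documentation; given its relatively small number of stars, the author probably did not think one was needed.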
2. Installation
Run the following in the directory that contains your project's composer.json:
composer require 'kphoen/rulerz'
3. Usage: filtering
Given the following array:
$players = [
    ['pseudo' => 'Joe', 'fullname' => 'Joe la frite', 'gender' => 'M', 'points' => 2500],
    ['pseudo' => 'Moe', 'fullname' => 'Moe, from the bar!', 'gender' => 'M', 'points' => 1230],
    ['pseudo' => 'Alice', 'fullname' => 'Alice, from… you know.', 'gender' => 'F', 'points' => 9001],
];
Initialize the engine:
use RulerZ\Compiler\Compiler;
use RulerZ\Target;
use RulerZ\RulerZ;
// compiler
$compiler = Compiler::create();
// RulerZ engine
$rulerz = new RulerZ(
    $compiler, [
        // register the target compiler; for array data sources this is Native
        new Target\Native\Native([
            // custom operators: rules can call length(...), implemented with strlen
            'length' => 'strlen'
        ]),
    ]
);
Create a rule:
$rule = 'gender = :gender and points > :min_points';
Hand the rule and its parameters to the engine:
$parameters = [
    'min_points' => 30,
    'gender' => 'F',
];
$result = iterator_to_array(
    $rulerz->filter($players, $rule, $parameters) // the parameters can be omitted if empty
);
// $result is the filtered array:
array:1 [▼
  0 => array:4 [▼
    "pseudo" => "Alice"
    "fullname" => "Alice, from… you know."
    "gender" => "F"
    "points" => 9001
  ]
]
4. Usage: checking whether a target satisfies the rule
$rulerz->satisfies($player, $rule, $parameters);
// returns a boolean; true means $player satisfies the rule
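To make the semantics of the two calls concrete, here is what they compute for this rule, written by hand in plain PHP without RulerZ. This is only an illustration: the function names playerSatisfies and filterPlayers are hypothetical, and unlike filter(), which returns an iterator (hence the iterator_to_array() call above), filterPlayers returns a plain array.

```php
<?php
declare(strict_types=1);

$players = [
    ['pseudo' => 'Joe', 'fullname' => 'Joe la frite', 'gender' => 'M', 'points' => 2500],
    ['pseudo' => 'Moe', 'fullname' => 'Moe, from the bar!', 'gender' => 'M', 'points' => 1230],
    ['pseudo' => 'Alice', 'fullname' => 'Alice, from… you know.', 'gender' => 'F', 'points' => 9001],
];

// Hand-written equivalent of the rule "gender = :gender and points > :min_points".
function playerSatisfies(array $player, array $parameters): bool
{
    return $player['gender'] == $parameters['gender']
        && $player['points'] > $parameters['min_points'];
}

// Hand-written equivalent of filter(): keep only the rows that satisfy the rule.
function filterPlayers(array $players, array $parameters): array
{
    return array_values(array_filter(
        $players,
        fn (array $player): bool => playerSatisfies($player, $parameters)
    ));
}

$parameters = ['min_points' => 30, 'gender' => 'F'];
$result = filterPlayers($players, $parameters);
// $result holds only Alice's row, like the dump above
```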
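5. Under the hood
Now let's look at what happens from the moment the compiler is created until the result comes out.
1. Compiler::create(): this step creates the compiler backed by a FileEvaluator. By default, FileEvaluator uses the local system temporary directory as the place where the temporary executor class files of the later steps are written and read. The filesystem class it relies on has only two methods, has() and write():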
<?php
declare(strict_types=1);
namespace RulerZ\Compiler;
class NativeFilesystem implements Filesystem
{
    public function has(string $filePath): bool
    {
        return file_exists($filePath);
    }
    public function write(string $filePath, string $content): void
    {
        file_put_contents($filePath, $content, LOCK_EX);
    }
}
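The role of has() and write() is easiest to see in a stripped-down "compile once, cache on disk, then load" helper. The sketch below is hypothetical: the function name evaluateCached and the sketch_executor_ file prefix are made up, and it calls file_exists() and file_put_contents() directly instead of going through a Filesystem object. It only mirrors the idea described here: generate the source when no cached file exists, then load the file.

```php
<?php
declare(strict_types=1);

// Hypothetical sketch: compile PHP source once, cache it in the temp directory, then load it.
function evaluateCached(string $identifier, callable $compiler, ?string $directory = null): void
{
    $directory ??= sys_get_temp_dir();
    $filePath = $directory . DIRECTORY_SEPARATOR . 'sketch_executor_' . $identifier;

    // the has() step: only compile when no cached file exists yet
    if (!file_exists($filePath)) {
        // the write() step: LOCK_EX takes an exclusive lock while writing
        file_put_contents($filePath, "<?php\n" . $compiler(), LOCK_EX);
    }

    require $filePath;
}

$id = bin2hex(random_bytes(4));
evaluateCached($id, fn (): string => 'function sketch_answer(): int { return 42; }');
```

Because the compiler callable is only invoked when the file is missing, a second call with the same identifier skips code generation entirely.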
2. Initializing the RulerZ engine with new RulerZ(). First, the RulerZ constructor:
public function __construct(Compiler $compiler, array $compilationTargets = [])
{
    $this->compiler = $compiler;
    foreach ($compilationTargets as $targetCompiler) {
        $this->registerCompilationTarget($targetCompiler);
    }
}
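The first argument is the compiler created above; the second is the list of target compilers, which are the components that actually handle the data source. Since our data source is an array, the target compiler here is Native. The engine stores each target compiler in its $compilationTargets property: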
public function registerCompilationTarget(CompilationTarget $compilationTarget): void
{
    $this->compilationTargets[] = $compilationTarget;
}
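Registration, together with the target lookup that filter() performs as its first step, can be sketched as follows. Everything here is hypothetical: TargetSketch, ArrayTargetSketch, EngineSketch and the supports() signature are simplified stand-ins, not RulerZ's actual CompilationTarget interface. The sketch needs PHP 8 (for the mixed type).

```php
<?php
declare(strict_types=1);

// Hypothetical, simplified stand-in for a compilation target.
interface TargetSketch
{
    public const MODE_FILTER = 'filter';
    public const MODE_SATISFIES = 'satisfies';

    // Can this target compiler handle the given data source in the given mode?
    public function supports(mixed $target, string $mode): bool;
}

final class ArrayTargetSketch implements TargetSketch
{
    public function supports(mixed $target, string $mode): bool
    {
        // filtering needs a collection; a satisfies check needs a single row (array or object)
        return $mode === self::MODE_FILTER
            ? is_iterable($target)
            : (is_array($target) || is_object($target));
    }
}

final class EngineSketch
{
    /** @var list<TargetSketch> */
    private array $compilationTargets = [];

    public function __construct(array $compilationTargets = [])
    {
        foreach ($compilationTargets as $targetCompiler) {
            $this->registerCompilationTarget($targetCompiler);
        }
    }

    public function registerCompilationTarget(TargetSketch $compilationTarget): void
    {
        $this->compilationTargets[] = $compilationTarget;
    }

    // In this sketch, the first registered target compiler that supports the source and mode wins.
    public function findTargetCompiler(mixed $target, string $mode): TargetSketch
    {
        foreach ($this->compilationTargets as $compilationTarget) {
            if ($compilationTarget->supports($target, $mode)) {
                return $compilationTarget;
            }
        }
        throw new RuntimeException(sprintf('No target compiler supports this data source in mode "%s".', $mode));
    }
}
```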
3. Calling filter or satisfies
This is the core of the process. Taking filter as an example:
public function filter($target, string $rule, array $parameters = [], array $executionContext = [])
{
    $targetCompiler = $this->findTargetCompiler($target, CompilationTarget::MODE_FILTER);
    $compilationContext = $targetCompiler->createCompilationContext($target);
    $executor = $this->compiler->compile($rule, $targetCompiler, $compilationContext);
    return $executor->filter($target, $parameters, $targetCompiler->getOperators()->getOperators(), new ExecutionContext($executionContext));
}
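The first step looks for a target compiler that supports the filter mode (CompilationTarget::MODE_FILTER) for this data source. The second step creates the compilation context, which is generally just a plain Context instance: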
public function createCompilationContext($target): Context
{
    return new Context();
}
The third step calls the compiler's compile() method:
public function compile(string $rule, CompilationTarget $target, Context $context): Executor
{
    $context['rule_identifier'] = $this->getRuleIdentifier($target, $context, $rule);
    $context['executor_classname'] = 'Executor_'.$context['rule_identifier'];
    $context['executor_fqcn'] = '\RulerZ\Compiled\Executor\\Executor_'.$context['rule_identifier'];
    if (!class_exists($context['executor_fqcn'], false)) {
        $compiler = function () use ($rule, $target, $context) {
            return $this->compileToSource($rule, $target, $context);
        };
        $this->evaluator->evaluate($context['rule_identifier'], $compiler);
    }
    return new $context['executor_fqcn']();
}
protected function getRuleIdentifier(CompilationTarget $compilationTarget, Context $context, string $rule): string
{
    return hash('crc32b', get_class($compilationTarget).$rule.$compilationTarget->getRuleIdentifierHint($rule, $context));
}
protected function compileToSource(string $rule, CompilationTarget $compilationTarget, Context $context): string
{
    $ast = $this->parser->parse($rule);
    $executorModel = $compilationTarget->compile($ast, $context);
    $flattenedTraits = implode(PHP_EOL, array_map(function ($trait) {
        return "\t".'use \\'.ltrim($trait, '\\').';';
    }, $executorModel->getTraits()));
    $extraCode = '';
    foreach ($executorModel->getCompiledData() as $key => $value) {
        $extraCode .= sprintf('private $%s = %s;'.PHP_EOL, $key, var_export($value, true));
    }
    $commentedRule = str_replace(PHP_EOL, PHP_EOL.' // ', $rule);
    return <<<EXECUTOR
namespace RulerZ\Compiled\Executor;
use RulerZ\Executor\Executor;
class {$context['executor_classname']} implements Executor
{
    $flattenedTraits
    $extraCode
    // $commentedRule
    protected function execute(\$target, array \$operators, array \$parameters)
    {
        return {$executorModel->getCompiledRule()};
    }
}
EXECUTOR;
}
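The naming scheme used by compile() and getRuleIdentifier() can be reproduced in isolation: the same target class, rule and hint always yield the same class name, which is what lets an executor that has already been generated be reused. In this sketch the helper name executorClassName is hypothetical and the identifier hint is passed in as a plain string. Note that crc32b produces only 32 bits, so two different rules can in principle map to the same name.

```php
<?php
declare(strict_types=1);

// Hypothetical helper mirroring getRuleIdentifier() plus the 'Executor_' prefix from compile().
function executorClassName(string $targetClass, string $rule, string $hint = ''): string
{
    return 'Executor_' . hash('crc32b', $targetClass . $rule . $hint);
}

$a = executorClassName('RulerZ\Target\Native\Native', 'gender = :gender and points > :min_points');
$b = executorClassName('RulerZ\Target\Native\Native', 'gender = :gender and points > :min_points');
$c = executorClassName('RulerZ\Target\Native\Native', 'points > :min_points');
```

Because the name is deterministic, the class_exists() check in compile() is enough to skip recompilation within the same process.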
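This code hashes the target compiler's class name, the rule and the target's identifier hint with the crc32b algorithm, prefixes the hash with Executor_ to form the name of the temporary executor class, and has the evaluator write the executor's code into the temporary directory mentioned above. The generated code looks like this (as its comment shows, this particular file was produced from a variant of the example rule in which the points > :min_points condition appears twice):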
// /private/var/folders/w_/sh4r42wn4_b650l3pc__fh7h0000gp/T/rulerz_executor_ff2800e8
<?php
namespace RulerZ\Compiled\Executor;
use RulerZ\Executor\Executor;
class Executor_ff2800e8 implements Executor
{
    use \RulerZ\Executor\ArrayTarget\FilterTrait;
    use \RulerZ\Executor\ArrayTarget\SatisfiesTrait;
    use \RulerZ\Executor\ArrayTarget\ArgumentUnwrappingTrait;
    // gender = :gender and points > :min_points and points > :min_points
    protected function execute($target, array $operators, array $parameters)
    {
        return ($this->unwrapArgument($target["gender"]) == $parameters["gender"] && ($this->unwrapArgument($target["points"]) > $parameters["min_points"] && $this->unwrapArgument($target["points"]) > $parameters["min_points"]));
    }
}
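Stripped of traits and caching, this code-generation step amounts to: wrap the compiled expression string in a class body, load that class, and instantiate it. The sketch below is hypothetical (the SketchExecutor namespace, the makeExecutor() helper and the public execute() signature are made up) and uses eval() for brevity, whereas RulerZ, as described above, writes the class to a file in the temporary directory.

```php
<?php
declare(strict_types=1);

// Hypothetical: turn a compiled PHP expression into a class and return an instance of it.
function makeExecutor(string $compiledRule): object
{
    $className = 'Executor_' . hash('crc32b', $compiledRule);
    $fqcn = 'SketchExecutor\\' . $className;

    // only generate and load the class the first time this rule is seen
    if (!class_exists($fqcn, false)) {
        $source = <<<PHP
namespace SketchExecutor;

class {$className}
{
    public function execute(\$target, array \$operators, array \$parameters): bool
    {
        return {$compiledRule};
    }
}
PHP;
        eval($source);
    }

    return new $fqcn();
}

$executor = makeExecutor('($target["gender"] == $parameters["gender"] && $target["points"] > $parameters["min_points"])');
```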
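This temporary class is the one that finally performs the filtering. The filter() method from FilterTrait runs first: it iterates over the rows and, based on the boolean returned by execute(), decides whether to yield each row through the iterator. execute() uses the given parameters and operators to check whether the relevant cells of a row satisfy the conditions, and returns true or false accordingly.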
public function filter($target, array $parameters, array $operators, ExecutionContext $context)
{
    return IteratorTools::fromGenerator(function () use ($target, $parameters, $operators) {
        foreach ($target as $row) {
            $targetRow = is_array($row) ? $row : new ObjectContext($row);
            if ($this->execute($targetRow, $operators, $parameters)) {
                yield $row;
            }
        }
    });
}
satisfies works much the same way as filter, except that it evaluates a single target instead of iterating over a collection.
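The division of labor between filter(), satisfies() and execute() can be sketched in plain PHP, with a closure standing in for the generated execute() method. This is a simplified illustration: filterRows and satisfiesRow are hypothetical names, and the ObjectContext wrapping of non-array rows is left out.

```php
<?php
declare(strict_types=1);

// Lazily yield the rows for which the predicate (our stand-in for execute()) returns true.
function filterRows(iterable $target, callable $execute, array $parameters): Generator
{
    foreach ($target as $row) {
        if ($execute($row, $parameters)) {
            yield $row;
        }
    }
}

// satisfies(): the same check, applied to a single row.
function satisfiesRow(array $row, callable $execute, array $parameters): bool
{
    return (bool) $execute($row, $parameters);
}

$execute = fn (array $row, array $p): bool => $row['gender'] == $p['gender'] && $row['points'] > $p['min_points'];
$parameters = ['gender' => 'F', 'min_points' => 30];
$rows = [
    ['pseudo' => 'Joe', 'gender' => 'M', 'points' => 2500],
    ['pseudo' => 'Alice', 'gender' => 'F', 'points' => 9001],
];

$matches = iterator_to_array(filterRows($rows, $execute, $parameters), false);
```

Because filterRows() is a generator, no row is checked until the result is iterated, which is also why the usage example at the top wraps filter() in iterator_to_array().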
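One question remains: how does the compiler know what our rule string $rule means, and how does it parse it? The answer lies in abstract syntax trees (ASTs).
Going further: abstract syntax trees
When the Zend engine processes PHP code, one of its stages is lexical and syntactic analysis, performed by the parser, which turns the source code into an abstract syntax tree. This is the key step that allows the engine to understand the code.
Likewise, when we write a rule string, how does the code understand what we wrote? The answer is, again, an abstract syntax tree.
Take the rule above as an example:
gender = :gender and points > :min_points
Here =, and and > are all operators, but the machine does not know that they are operators, nor what the other parts of the rule mean.
RulerZ therefore uses its own grammar.
First, it defines a set of default operators for the Native target. Most of them are inline operators: instead of computing a value, each closure returns a snippet of PHP code.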
<?php
declare(strict_types=1);
namespace RulerZ\Target\Native;
use RulerZ\Target\Operators\Definitions;
class NativeOperators
{
    public static function create(Definitions $customOperators): Definitions
    {
        $defaultInlineOperators = [
            'and' => function ($a, $b) {
                return sprintf('(%s && %s)', $a, $b);
            },
            'or' => function ($a, $b) {
                return sprintf('(%s || %s)', $a, $b);
            },
            'not' => function ($a) {
                return sprintf('!(%s)', $a);
            },
            '=' => function ($a, $b) {
                return sprintf('%s == %s', $a, $b);
            },
            'is' => function ($a, $b) {
                return sprintf('%s === %s', $a, $b);
            },
            '!=' => function ($a, $b) {
                return sprintf('%s != %s', $a, $b);
            },
            '>' => function ($a, $b) {
                return sprintf('%s > %s', $a, $b);
            },
            '>=' => function ($a, $b) {
                return sprintf('%s >= %s', $a, $b);
            },
            '<' => function ($a, $b) {
                return sprintf('%s < %s', $a, $b);
            },
            '<=' => function ($a, $b) {
                return sprintf('%s <= %s', $a, $b);
            },
            'in' => function ($a, $b) {
                return sprintf('in_array(%s, %s)', $a, $b);
            },
        ];
        $defaultOperators = [
            'sum' => function () {
                return array_sum(func_get_args());
            },
        ];
        $definitions = new Definitions($defaultOperators, $defaultInlineOperators);
        return $definitions->mergeWith($customOperators);
    }
}
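Inline operators do no work when the rule is evaluated: each closure receives the already-compiled PHP code of its operands and returns a larger piece of PHP code. Composing three of the closures above by hand shows how the body of the generated execute() method is assembled. The operand strings here are simplified (there is no unwrapArgument() call), and the final eval() into a closure is only for demonstration.

```php
<?php
declare(strict_types=1);

// A subset of the default inline operators: each one turns operand code into PHP code.
$inline = [
    'and' => fn (string $a, string $b): string => sprintf('(%s && %s)', $a, $b),
    '=' => fn (string $a, string $b): string => sprintf('%s == %s', $a, $b),
    '>' => fn (string $a, string $b): string => sprintf('%s > %s', $a, $b),
];

// Compose "gender = :gender and points > :min_points" bottom-up.
$code = $inline['and'](
    $inline['=']('$target["gender"]', '$parameters["gender"]'),
    $inline['>']('$target["points"]', '$parameters["min_points"]')
);

// Turn the generated code into a callable, for demonstration purposes only.
$predicate = eval('return fn (array $target, array $parameters): bool => ' . $code . ';');
```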
RulerZ\Parser\Parser has the following method, which loads the grammar the first time it is needed:
public function parse($rule)
{
    if ($this->parser === null) {
        $this->parser = Compiler\Llk::load(
            new File\Read(__DIR__.'/../Grammar.pp')
        );
    }
    $this->nextParameterIndex = 0;
    return $this->visit($this->parser->parse($rule));
}
At the heart of it is the following grammar file (Grammar.pp):
//
// Hoa
//
//
// @license
//
// New BSD License
//
// Copyright © 2007-2015, Ivan Enderlin. All rights reserved.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are met:
// * Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
// * Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
// * Neither the name of the Hoa nor the names of its contributors may be
// used to endorse or promote products derived from this software without
// specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS “AS IS”
// AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
// ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDERS AND CONTRIBUTORS BE
// LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
// CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
// SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
// INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
// CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
// ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
// POSSIBILITY OF SUCH DAMAGE.
//
// Inspired from \Hoa\Ruler\Grammar.
//
// @author Stéphane Py <stephane.py@hoa-project.net>
// @author Ivan Enderlin <ivan.enderlin@hoa-project.net>
// @author Kévin Gomez <contact@kevingomez.fr>
// @copyright Copyright © 2007-2015 Stéphane Py, Ivan Enderlin, Kévin Gomez.
// @license New BSD License
%skip space \s
// Scalars.
%token true (?i)true
%token false (?i)false
%token null (?i)null
// Logical operators
%token not (?i)not\b
%token and (?i)and\b
%token or (?i)or\b
%token xor (?i)xor\b
// Value
%token string ("|')(.*?)(?<!\\)\1
%token float -?\d+\.\d+
%token integer -?\d+
%token parenthesis_ \(
%token _parenthesis \)
%token bracket_ \[
%token _bracket \]
%token comma ,
%token dot \.
%token positional_parameter \?
%token named_parameter :[a-z-A-Z0-9_]+
%token identifier [^\s\(\)\[\],\.]+
#expression:
    logical_operation()
logical_operation:
    operation()
    (( ::and:: #and | ::or:: #or | ::xor:: #xor) logical_operation())?
operation:
    operand() ( <identifier> logical_operation() #operation )?
operand:
    ::parenthesis_:: logical_operation() ::_parenthesis::
  | value()
parameter:
    <positional_parameter>
  | <named_parameter>
value:
    ::not:: logical_operation() #not
  | <true> | <false> | <null> | <float> | <integer> | <string>
  | parameter()
  | variable()
  | array_declaration()
  | function_call()
variable:
    <identifier> (object_access() #variable_access )*
object_access:
    ::dot:: <identifier> #attribute_access
#array_declaration:
    ::bracket_:: value() ( ::comma:: value() )* ::_bracket::
#function_call:
    <identifier> ::parenthesis_::
    (logical_operation() (::comma:: logical_operation() )* )?
    ::_parenthesis::
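The Llk::load call above loads this grammar and extracts its tokens. For each %token declaration it keeps the token name and the corresponding regular expression, which are later used to match operators, basic identifiers and so on: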
array:1 [▼
  "default" => array:20 [▼
    "skip" => "\s"
    "true" => "(?i)true"
    "false" => "(?i)false"
    "null" => "(?i)null"
    "not" => "(?i)not\b"
    "and" => "(?i)and\b"
    "or" => "(?i)or\b"
    "xor" => "(?i)xor\b"
    "string" => "("|')(.*?)(?<!\\)\1"
    "float" => "-?\d+\.\d+"
    "integer" => "-?\d+"
    "parenthesis_" => "\("
    "_parenthesis" => "\)"
    "bracket_" => "\["
    "_bracket" => "\]"
    "comma" => ","
    "dot" => "\."
    "positional_parameter" => "\?"
    "named_parameter" => ":[a-z-A-Z0-9_]+"
    "identifier" => "[^\s\(\)\[\],\.]+"
  ]
]
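This token table is already enough to tokenize a rule. The following sketch (a hypothetical lexRule() helper, not Hoa's Lexer) tries the patterns in declaration order at the current offset, takes the first one that matches, drops skip matches, and fails on anything it cannot match. Running it on the example rule reveals a detail that is easy to miss in the grammar: = and > are not dedicated tokens, they come out as identifier, which is exactly what the operation rule (operand() ( <identifier> logical_operation() #operation )?) expects.

```php
<?php
declare(strict_types=1);

// Token patterns from Grammar.pp, in declaration order ('skip' matches are dropped).
$grammarTokens = [
    'skip' => '\s',
    'true' => '(?i)true',
    'false' => '(?i)false',
    'null' => '(?i)null',
    'not' => '(?i)not\b',
    'and' => '(?i)and\b',
    'or' => '(?i)or\b',
    'xor' => '(?i)xor\b',
    'string' => '("|\')(.*?)(?<!\\\\)\1',
    'float' => '-?\d+\.\d+',
    'integer' => '-?\d+',
    'parenthesis_' => '\(',
    '_parenthesis' => '\)',
    'bracket_' => '\[',
    '_bracket' => '\]',
    'comma' => ',',
    'dot' => '\.',
    'positional_parameter' => '\?',
    'named_parameter' => ':[a-z-A-Z0-9_]+',
    'identifier' => '[^\s\(\)\[\],\.]+',
];

// Hypothetical lexer: at each offset, take the first pattern that matches there.
function lexRule(string $rule, array $tokens): array
{
    $result = [];
    $offset = 0;
    $length = strlen($rule);
    while ($offset < $length) {
        foreach ($tokens as $name => $pattern) {
            // \G anchors the match at $offset; (?:...) keeps \1 in 'string' pointing at the quote
            if (preg_match('/\G(?:' . $pattern . ')/', $rule, $match, 0, $offset) === 1) {
                if ($name !== 'skip') {
                    $result[] = [$name, $match[0]];
                }
                $offset += strlen($match[0]);
                continue 2; // resume the outer while loop at the new offset
            }
        }
        throw new RuntimeException(sprintf('Unexpected input at offset %d', $offset));
    }
    return $result;
}

$ruleTokens = lexRule('gender = :gender and points > :min_points', $grammarTokens);
```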
This step also produces rawRules, the raw text of each grammar rule:
array:10 [▼
  "#expression" => " logical_operation()"
  "logical_operation" => " operation() ( ( ::and:: #and | ::or:: #or | ::xor:: #xor) logical_operation())?"
  "operation" => " operand() ( <identifier> logical_operation() #operation )?"
  "operand" => " ::parenthesis_:: logical_operation() ::_parenthesis:: | value()"
  "parameter" => " <positional_parameter> | <named_parameter>"
  "value" => " ::not:: logical_operation() #not | <true> | <false> | <null> | <float> | <integer> | <string> | parameter() | variable() | array_declaration() | function_call(▶"
  "variable" => " <identifier> (object_access() #variable_access )*"
  "object_access" => " ::dot:: <identifier> #attribute_access"
  "#array_declaration" => " ::bracket_:: value() ( ::comma:: value() )* ::_bracket::"
  "#function_call" => " <identifier> ::parenthesis_:: (logical_operation() (::comma:: logical_operation() )* )? ::_parenthesis::"
]
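The analyzeRules method of the Analyzer class then processes rawRules, resolving the placeholders written with :: (such as ::and::). Based on the value of the $_ppLexemes property, the Compiler\Llk\Lexer lexer tokenizes each element of rawRules and pushes the result onto a stack backed by a doubly linked list (SplStack); the analyzer then pushes and pops entries on this stack to build $rules, an array holding all the rule (Concatenation, Choice, Repetition) and Token instances: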
array:54 [▼
  0 => Concatenation {#64 ▶}
  "expression" => Concatenation {#65 ▼
    #_name: "expression"
    #_children: array:1 [▼
      0 => 0
    ]
    #_nodeId: "#expression"
    #_nodeOptions: []
    #_defaultId: "#expression"
    #_defaultOptions: []
    #_pp: " logical_operation()"
    #_transitional: false
  }
  2 => Token {#62 ▶}
  3 => Concatenation {#63 ▼
    #_name: 3
    #_children: array:1 [▼
      0 => 2
    ]
    #_nodeId: "#and"
    #_nodeOptions: []
    #_defaultId: null
    #_defaultOptions: []
    #_pp: null
    #_transitional: true
  }
  4 => Token {#68 ▶}
  5 => Concatenation {#69 ▶}
  6 => Token {#70 ▶}
  7 => Concatenation {#71 ▶}
  8 => Choice {#72 ▶}
  9 => Concatenation {#73 ▶}
  10 => Repetition {#74 ▶}
  "logical_operation" => Concatenation {#75 ▶}
  12 => Token {#66 ▶}
  13 => Concatenation {#67 ▶}
  14 => Repetition {#78 ▶}
  "operation" => Concatenation {#79 ▶}
  16 => Token {#76 ▶}
  17 => Token {#77 ▶}
  18 => Concatenation {#82 ▶}
  "operand" => Choice {#83 ▶}
  20 => Token {#80 ▶}
  21 => Token {#81 ▼
    #_tokenName: "named_parameter"
    #_namespace: null
    #_regex: null
    #_ast: null
    #_value: null
    #_kept: true
    #_unification: -1
    #_name: 21
    #_children: null
    #_nodeId: null
    #_nodeOptions: []
    #_defaultId: null
    #_defaultOptions: []
    #_pp: null
    #_transitional: true
  }
  "parameter" => Choice {#86 ▶}
  23 => Token {#84 ▶}
  24 => Concatenation {#85 ▶}
  25 => Token {#89 ▶}
  26 => Token {#90 ▶}
  27 => Token {#91 ▶}
  28 => Token {#92 ▶}
  29 => Token {#93 ▶}
  30 => Token {#94 ▶}
  "value" => Choice {#95 ▶}
  32 => Token {#87 ▶}
  33 => Concatenation {#88 ▶}
  34 => Repetition {#98 ▶}
  "variable" => Concatenation {#99 ▶}
  36 => Token {#96 ▶}
  37 => Token {#97 ▶}
  "object_access" => Concatenation {#102 ▶}
  39 => Token {#100 ▶}
  40 => Token {#101 ▶}
  41 => Concatenation {#105 ▶}
  42 => Repetition {#106 ▶}
  43 => Token {#107 ▶}
  "array_declaration" => Concatenation {#108 ▶}
  45 => Token {#103 ▶}
  46 => Token {#104 ▶}
  47 => Token {#111 ▶}
  48 => Concatenation {#112 ▶}
  49 => Repetition {#113 ▶}
  50 => Concatenation {#114 ▶}
  51 => Repetition {#115 ▶}
  52 => Token {#116 ▶}
  "function_call" => Concatenation {#117 ▶}
]
Llk::load then returns a Hoa\Compiler\Llk\Parser instance. Its parse method is what actually builds the syntax tree:
public function parse($text, $rule = null, $tree = true)
{
    $k = 1024;
    if (isset($this->_pragmas['parser.lookahead'])) {
        $k = max(0, intval($this->_pragmas['parser.lookahead']));
    }
    $lexer = new Lexer($this->_pragmas);
    $this->_tokenSequence = new Iterator\Buffer(
        $lexer->lexMe($text, $this->_tokens),
        $k
    );
    $this->_tokenSequence->rewind();
    $this->_errorToken = null;
    $this->_trace = [];
    $this->_todo = [];
    if (false === array_key_exists($rule, $this->_rules)) {
        $rule = $this->getRootRule();
    }
    $closeRule = new Rule\Ekzit($rule, 0);
    $openRule = new Rule\Entry($rule, 0, [$closeRule]);
    $this->_todo = [$closeRule, $openRule];
    do {
        $out = $this->unfold();
        if (null !== $out &&
            'EOF' === $this->_tokenSequence->current()['token']) {
            break;
        }
        if (false === $this->backtrack()) {
            $token = $this->_errorToken;
            if (null === $this->_errorToken) {
                $token = $this->_tokenSequence->current();
            }
            $offset = $token['offset'];
            $line = 1;
            $column = 1;
            if (!empty($text)) {
                if (0 === $offset) {
                    $leftnl = 0;
                } else {
                    $leftnl = strrpos($text, "\n", -(strlen($text) - $offset) - 1) ?: 0;
                }
                $rightnl = strpos($text, "\n", $offset);
                $line = substr_count($text, "\n", 0, $leftnl + 1) + 1;
                $column = $offset - $leftnl + (0 === $leftnl);
                if (false !== $rightnl) {
                    $text = trim(substr($text, $leftnl, $rightnl - $leftnl), "\n");
                }
            }
            throw new Compiler\Exception\UnexpectedToken(
                'Unexpected token "%s" (%s) at line %d and column %d:' .
                "\n" . '%s' . "\n" . str_repeat(' ', $column - 1) . '↑',
                0,
                [
                    $token['value'],
                    $token['token'],
                    $line,
                    $column,
                    $text
                ],
                $line,
                $column
            );
        }
    } while (true);
    if (false === $tree) {
        return true;
    }
    $tree = $this->_buildTree();
    if (!($tree instanceof TreeNode)) {
        throw new Compiler\Exception(
            'Parsing error: cannot build AST, the trace is corrupted.',
            1
        );
    }
    return $this->_tree = $tree;
}
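When backtracking fails, parse() turns the offending token's byte offset into a line and column for the error message. The same conversion in its simplest form looks like this; the helper name offsetToLineColumn is hypothetical, and it is a simplified version rather than a line-by-line port of the code above.

```php
<?php
declare(strict_types=1);

// Convert a 0-based byte offset into a 1-based [line, column] pair.
function offsetToLineColumn(string $text, int $offset): array
{
    $before = substr($text, 0, $offset);
    $line = substr_count($before, "\n") + 1;
    $lastNewline = strrpos($before, "\n");
    // column counts from the character right after the last newline (or from the start)
    $column = $lastNewline === false ? $offset + 1 : $offset - $lastNewline;
    return [$line, $column];
}

[$line, $column] = offsetToLineColumn("a = 1\nand b >", 12);
$caret = str_repeat(' ', $column - 1) . '↑';
```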
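The complete syntax tree we end up with looks like this (its two nested and nodes show that it was produced from a rule with three conditions, like the one in the generated class's comment):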
Rule {#120 ▼
  #_root: Operator {#414 ▼
    #_name: "and"
    #_arguments: array:2 [▼
      0 => Operator {#398 ▼
        #_name: "="
        #_arguments: array:2 [▼
          0 => Context {#396 ▼
            #_id: "gender"
            #_dimensions: []
          }
          1 => Parameter {#397 ▼
            -name: "gender"
          }
        ]
        #_function: false
        #_laziness: false
        #_id: null
        #_dimensions: []
      }
      1 => Operator {#413 ▼
        #_name: "and"
        #_arguments: array:2 [▼
          0 => Operator {#401 ▼
            #_name: ">"
            #_arguments: array:2 [▼
              0 => Context {#399 ▶}
              1 => Parameter {#400 ▶}
            ]
            #_function: false
            #_laziness: false
            #_id: null
            #_dimensions: []
          }
          1 => Operator {#412 ▶}
        ]
        #_function: false
        #_laziness: true
        #_id: null
        #_dimensions: []
      }
    ]
    #_function: false
    #_laziness: true
    #_id: null
    #_dimensions: []
  }
}
The tree consists of a root node, child nodes, operator arguments, and Hoa\Ruler\Model\Operator instances.
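How a token stream turns into a tree of this shape can be illustrated with a small hand-written recursive-descent parser. This is a hypothetical sketch: the TinyRuleParser class and its nested-array node format are made up, and it is neither Hoa's LL(k) parser nor a full transcription of Grammar.pp. It only accepts flat rules of comparisons joined by and/or, gives comparisons higher precedence than and/or, and nests and/or chains to the right. For the three-condition rule this reproduces the shape of the dump above: an and node whose second argument is another and node. It requires PHP 8.

```php
<?php
declare(strict_types=1);

// Hypothetical parser over [tokenName, text] pairs; nodes are nested arrays.
final class TinyRuleParser
{
    private int $pos = 0;

    public function __construct(private array $tokens)
    {
    }

    public function parse(): array
    {
        $node = $this->logical();
        if ($this->pos !== count($this->tokens)) {
            throw new RuntimeException('Unexpected trailing tokens');
        }
        return $node;
    }

    // logical := comparison ( ('and' | 'or') logical )?   (right-nested)
    private function logical(): array
    {
        $left = $this->comparison();
        $next = $this->tokens[$this->pos] ?? null;
        if ($next !== null && in_array($next[0], ['and', 'or'], true)) {
            $this->pos++;
            return [strtolower($next[1]), $left, $this->logical()];
        }
        return $left;
    }

    // comparison := operand ( identifier operand )?   ('=' and '>' arrive as identifiers)
    private function comparison(): array
    {
        $left = $this->operand();
        $next = $this->tokens[$this->pos] ?? null;
        if ($next !== null && $next[0] === 'identifier') {
            $this->pos++;
            return [$next[1], $left, $this->operand()];
        }
        return $left;
    }

    private function operand(): array
    {
        $token = $this->tokens[$this->pos] ?? null;
        if ($token === null) {
            throw new RuntimeException('Unexpected end of rule');
        }
        $this->pos++;
        return match ($token[0]) {
            'identifier' => ['var', $token[1]],
            'named_parameter' => ['param', substr($token[1], 1)],
            'integer' => ['int', (int) $token[1]],
            default => throw new RuntimeException("Unexpected token {$token[0]}"),
        };
    }
}

// Tokens for "gender = :gender and points > :min_points and points > :min_points".
$tokens = [
    ['identifier', 'gender'], ['identifier', '='], ['named_parameter', ':gender'],
    ['and', 'and'],
    ['identifier', 'points'], ['identifier', '>'], ['named_parameter', ':min_points'],
    ['and', 'and'],
    ['identifier', 'points'], ['identifier', '>'], ['named_parameter', ':min_points'],
];
$tree = (new TinyRuleParser($tokens))->parse();
```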
At this point, $executorModel = $compilationTarget->compile($ast, $context); can traverse and analyze this syntax tree through NativeVisitor's visit method.
For operator nodes, the traversal goes through visitOperator():
/**
 * {@inheritdoc}
 */
public function visitOperator(AST\Operator $element, &$handle = null, $eldnah = null)
{
    $operatorName = $element->getName();
    // the operator does not exist at all, throw an error before doing anything else.
    if (!$this->operators->hasInlineOperator($operatorName) && !$this->operators->hasOperator($operatorName)) {
        throw new OperatorNotFoundException($operatorName, sprintf('Operator "%s" does not exist.', $operatorName));
    }
    // expand the arguments
    $arguments = array_map(function ($argument) use (&$handle, $eldnah) {
        return $argument->accept($this, $handle, $eldnah);
    }, $element->getArguments());
    // and either inline the operator call
    if ($this->operators->hasInlineOperator($operatorName)) {
        $callable = $this->operators->getInlineOperator($operatorName);
        return call_user_func_array($callable, $arguments);
    }
    $inlinedArguments = empty($arguments) ? '' : ', '.implode(', ', $arguments);
    // or defer it.
    return sprintf('call_user_func($operators["%s"]%s)', $operatorName, $inlinedArguments);
}
The resulting PHP code for the rule can then be retrieved with:
$executorModel->getCompiledRule()
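The logic of visitOperator() (reject unknown operators, compile the arguments first, then either inline the operator or defer it to a call_user_func() on $operators) can be mirrored over a simple nested-array tree. This is a hypothetical sketch: the compileNode() function and the ['var', ...] / ['param', ...] node format are made up, and variables compile to plain $target[...] lookups without the unwrapArgument() call seen in the generated class.

```php
<?php
declare(strict_types=1);

// Hypothetical node format: ['var', name] | ['param', name] | [operatorName, ...arguments]
function compileNode(array $node, array $inline, array $runtime): string
{
    $kind = $node[0];
    if ($kind === 'var') {
        return sprintf('$target[%s]', var_export($node[1], true));
    }
    if ($kind === 'param') {
        return sprintf('$parameters[%s]', var_export($node[1], true));
    }
    // unknown operator: fail before compiling anything else
    if (!isset($inline[$kind]) && !isset($runtime[$kind])) {
        throw new InvalidArgumentException(sprintf('Operator "%s" does not exist.', $kind));
    }
    // compile the arguments first (depth-first)
    $arguments = array_map(
        fn (array $child): string => compileNode($child, $inline, $runtime),
        array_slice($node, 1)
    );
    // inline operators return PHP code directly...
    if (isset($inline[$kind])) {
        return $inline[$kind](...$arguments);
    }
    // ...other operators are deferred to a runtime call through $operators
    $inlinedArguments = $arguments === [] ? '' : ', ' . implode(', ', $arguments);
    return sprintf('call_user_func($operators["%s"]%s)', $kind, $inlinedArguments);
}

$inline = [
    'and' => fn (string $a, string $b): string => sprintf('(%s && %s)', $a, $b),
    '=' => fn (string $a, string $b): string => sprintf('%s == %s', $a, $b),
    '>' => fn (string $a, string $b): string => sprintf('%s > %s', $a, $b),
];
$runtime = ['sum' => fn (...$values) => array_sum($values)];

$ast = ['and', ['=', ['var', 'gender'], ['param', 'gender']], ['>', ['var', 'points'], ['param', 'min_points']]];
$code = compileNode($ast, $inline, $runtime);
```

The split mirrors the one in NativeOperators: inline operators cost nothing at evaluation time because they have already become plain PHP, while regular operators such as sum stay callable values that are looked up in $operators when the rule runs.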