共计 3255 个字符,预计需要花费 9 分钟才能阅读完成。
算法整体流程
算法流程大抵如下:
- 客户端申请主节点执行一些操作(发送申请)
- 主节点将申请多播到所有备份节点
- 各节点执申请,响应客户端
客户端期待 f + 1 个不同节点响应的雷同后果
The algorithm works roughly as follows:
- A client sends a request to invoke a service operation to the primary
- The primary multicasts the request to the backups
- Replicas execute the request and send a reply to the client
- The client waits for f+1 replies from different replicas with the same result; this is the result of the operation.
Three Phase 共识的三个阶段
<m>σi示意 i 节点对音讯 m 进行签名
pre-prepare 预筹备阶段
主节点接管到客户端申请后,为申请调配一个序列号
n
, 将预筹备音讯 (pre-prepare message) 多播给所有备份节点,并将其追加到日志中。
预筹备音讯:<<PRE-PREPARE,v,n,d>σp,m>
- v:视图号
- d:m 的摘要
- n:视图中的音讯序列号
- m:客户端的申请音讯
In the pre-prepare phase, the primary assigns a sequence number, , to the request, multicasts a pre-prepare message with piggybacked to all the backups,and appends the message to its log. The message has the form <<PRE-PREPARE,v,n,d>σp,m>, where indicates the view in which the message is being sent, is theclient’s request message, and is’s digest.
prepare 筹备阶段
如果节点 i 接管到经校验无误的预筹备音讯,就将筹备音讯多播给所有节点进入筹备阶段,并将预筹备音讯和筹备音讯追加到日志中。否则不作任何解决。
筹备音讯:<PREPARE,v,n,d,i>σi
If backup accepts the <<PRE-PREPARE,v,n,d>σp,m> message, it enters the prepare phase by multicasting a <PREPARE,v,n,d,i>σi message to all other replicas and adds both messages to its log. Otherwise, it does nothing.
谓词 (predicate) prepared(m,v,n,i)
为真,当且仅当节点 i 在日志中插入了:
- 申请音讯 m
- m 的预筹备音讯 <<PRE-PREPARE,v,n,d>σp,m>
- 与预筹备音讯匹配的 2f 个节点(包含本人)的筹备音讯
We define the predicate prepared to be true if and only if replica has inserted in its log: the request m, a pre-prepare for in view with sequence number n, and 2f prepares from different backups that match the pre-prepare. The replicas verify whether the prepares match the pre-prepare by checking that they have the same view, sequence number, and digest.
预筹备与筹备阶段保障了所有失常节点在特定视图中申请的排序统一
The pre-prepare and prepare phases of the algorithm guarantee that non-faulty replicas agree on a total order for the requests within a view.
commit 提交阶段
当谓词 prepared
为真时,节点将提交音讯多播给其余节点
提交音讯:<COMMIT,v,n,d,i>σp
Replica multicasts a COMMIT to theother replicas when prepared becomes true.
谓词 committed(m,v,n)
为真,当且仅当某 f + 1 个节点中的谓词 prepared
为真
committed(m,v,n) is true if and only if prepared(m,v,n,i) is true for all in some set of f+1 non-faulty replicas
谓词 committed-local(m,v,n,i)
为真,当且仅当谓词 prepared
为真,且收到了来自不同节点(通常包含本人)2f+ 1 个与 m 的预筹备音讯匹配的提交音讯。即 v, n, d 雷同
committed-local(m,v,n,i) is true if and only if prepared is true and has accepted 2f+1 commits (possibly including its own) from different replicas that match the pre-prepare for m. a commit matches a pre-prepare if they have the same view, sequence number, and digest.
在谓词 committed-local(m,v,n,i)
为真之后,节点执行 m 申请,i 的状态反映了较小序列号申请线性执行的后果。
Each replica executes the operation requested by m after committed-local is true and i’s state reflects the sequential execution of all requests with lower sequence numbers.
算法流程示例
3 个失常节点,1 个离线节点
- A 节点接管到申请后,将 <<PRE-PREPARE,v,n,d>σp,m>(省略为[P-P])多播给所有备份节点
- 各备份节点校验正确后,进入筹备阶段,将 <PREPARE,v,n,d,i>σi(省略为[P])多播给所有节点
- 各节点谓词
prepared
为真后,将 <COMMIT,v,n,d,i>σp(省略为[C])多播给所有节点。谓词prepared
的意义在于节点可借此判断本人是否收与到了正确的信息,与网络中的信息统一。但并不能判断其余节点是否也正确收到了音讯(因为可能存在拜占庭节点向不同的节点发送不同的音讯)。 - 各节点谓词
committed-local
为真后,执行申请,并向客户端响应执行后果。谓词committed-local
的意义在于,能够确认 2f+ 1 个(即 n -f,除拜占庭节点外的所有节点)曾经收到了正确的音讯,因而能够执行申请内容。
须要留神的是
- 第一阶段只有主节点发送音讯,第二阶段主节点不发送音讯
节点会持有本人向其余节点多播的音讯,例如第二阶段后 B 节点只收到了一条 (2f-1)[P],加上本人持有的[P] 即为两条(2f)
3 个失常节点,主节点为拜占庭节点
主节点 A 向 BC 发送了 <<PRE-PREPARE,v,n,d>σp,m>(省略为[P-P],相应的 PREPARE 音讯为[P]),向 D 发送了 <<PRE-PREPARE,v,n,d>σp,m’>(省略为[P-P]’,相应的 PREPARE 音讯为[P]’)
在 prepare 阶段后,失常节点的谓词 prepared 为假,不再进入 commit 阶段
文献
Miguel Castro and Barbara Liskov.“Practical Byzantine fault tolerance”Operating Systems Design and Implementation (1999).