On Spark: Exploring Livy, Part 5: How the Interpreter Is Implemented

In this part we dig into the source code to see how Livy's interpreters are implemented.

ReplDriver

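ReplDriver is the class behind the Driver program that actually ends up running (its base class is RSCDriver, mentioned in Part 3). At this layer, the thing to focus on is the family of handle methods: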

def handle(ctx: ChannelHandlerContext, msg: BaseProtocol.ReplJobRequest): Int = {
    ...
}

def handle(ctx: ChannelHandlerContext, msg: BaseProtocol.CancelReplJobRequest): Unit = {
    ...
}

def handle(ctx: ChannelHandlerContext, msg: BaseProtocol.ReplCompleteRequest): Array[String] = {
    ...
}

def handle(ctx: ChannelHandlerContext, msg: BaseProtocol.GetReplJobResults): ReplJobResults = {
    ...
}

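Each of these methods handles one type of request; for example, the overload taking BaseProtocol.ReplJobRequest handles a request to execute code. The RpcServer mentioned in an earlier part starts the netty-based server and binds the class that handles requests to it. Its internal dispatcher uses reflection to find the matching handle method and invoke it.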
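The reflective lookup described above can be sketched in a few lines of Scala. This is only a toy model, not Livy's dispatcher: RunCode, CancelJob, DemoHandler and DemoDispatcher are hypothetical names invented for illustration, and the ChannelHandlerContext parameter is left out to keep the example self-contained.

```scala
// Hypothetical message types standing in for the BaseProtocol.* requests.
final case class RunCode(code: String)
final case class CancelJob(jobId: Int)

// Hypothetical handler class: one overloaded `handle` per message type.
class DemoHandler {
  def handle(msg: RunCode): String = s"ran: ${msg.code}"
  def handle(msg: CancelJob): String = s"cancelled: ${msg.jobId}"
}

// Toy dispatcher: finds the public `handle` overload whose single parameter
// type equals the runtime class of the message, then invokes it reflectively.
class DemoDispatcher(handler: AnyRef) {
  def dispatch(msg: AnyRef): AnyRef = {
    val method = handler.getClass.getMethods.find { m =>
      m.getName == "handle" &&
        m.getParameterCount == 1 &&
        m.getParameterTypes.head == msg.getClass
    }
    method match {
      case Some(m) => m.invoke(handler, msg)
      case None =>
        throw new IllegalArgumentException(s"no handler for ${msg.getClass.getName}")
    }
  }
}
```

With this shape, supporting a new request type only requires adding another handle overload; the dispatcher itself does not change. A real dispatcher would probably cache the lookup instead of scanning the methods on every message.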

RPC is only mentioned in passing here; later parts will go through the details.

This part is about the REPL, so we follow the handler for BaseProtocol.ReplJobRequest:

def handle(ctx: ChannelHandlerContext, msg: BaseProtocol.ReplJobRequest): Int = {
  session.execute(EOLUtils.convertToSystemEOL(msg.code), msg.codeType)
}

This calls execute on the session object, so the session is the next thing to look at.

Session

ReplDriver holds an instance of Session. The instance is created during ReplDriver initialization, which also calls session.start():

The session creates a SparkInterpreter and calls SparkInterpreter.start.

The session's execute method ultimately calls SparkInterpreter.execute.
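The delegation chain (driver calls session, session calls interpreter) can be modelled roughly as follows. This is a simplified sketch under stated assumptions, not Livy's code: DemoInterpreter, UpperCaseInterpreter and DemoSession are hypothetical names, execution here is synchronous, and the result method is invented only so that the returned Int has something to point at.

```scala
import scala.collection.mutable

// Hypothetical, simplified stand-in for an interpreter interface.
trait DemoInterpreter {
  def start(): Unit
  def execute(code: String): String
}

// Toy interpreter that just upper-cases its input; a real one would evaluate the code.
class UpperCaseInterpreter extends DemoInterpreter {
  private var started = false

  def start(): Unit = {
    started = true
  }

  def execute(code: String): String = {
    require(started, "interpreter must be started before executing code")
    code.toUpperCase
  }
}

// Toy session: owns one interpreter, starts it, and hands out statement ids.
class DemoSession(interpreter: DemoInterpreter) {
  private val results = mutable.Map.empty[Int, String]
  private var nextId = 0

  def start(): Unit = interpreter.start()

  // Runs the code synchronously and returns the id under which the result is stored.
  def execute(code: String): Int = {
    val id = nextId
    nextId += 1
    results(id) = interpreter.execute(code)
    id
  }

  // Looks up the stored result for a statement id, if any.
  def result(id: Int): Option[String] = results.get(id)
}
```

Returning an id rather than the result mirrors the Int return type of the ReplJobRequest handler shown earlier: the caller can come back for the output later, which is presumably what the GetReplJobResults request is for.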

SparkInterpreter

In Livy, SparkInterpreter is one kind of Interpreter (an interface). Other Interpreter implementations include:

PythonInterpreter
SparkRInterpreter
SQLInterpreter
...

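The main job of SparkInterpreter.start is to initialize SparkILoop. SparkILoop is a class in the org.apache.spark.repl package and is the core class with which Spark itself implements its REPL, so Livy is only wrapping functionality that Spark already provides. The other thing start does is bind variables into the interpreter, as mentioned in Part 3. The following code is the binding process: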

The bind and execute methods in the code above are the core methods. Both are implemented by calling the corresponding SparkILoop methods directly:

// execute ultimately calls interpret
// code: the code to execute
override protected def interpret(code: String): Result = {
  sparkILoop.interpret(code)
}

// name: variable name
// tpe: variable type
// value: the actual reference to the variable's object
// modifier: the variable's modifiers
override protected def bind(name: String, tpe: String, value: Object, modifier: List[String]): Unit = {
  sparkILoop.beQuietDuring {
    sparkILoop.bind(name, tpe, value, modifier)
  }
}

At this point the picture is fairly clear, and we arrive at the following layering diagram:

Summary

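This part walked through the source code to show how Livy uses the REPL that Spark already implements to run code interactively. In effect, Livy moves Spark's REPL onto the web.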