Storm

关于storm:storm集群部署和项目部署

storm集群部署和我的项目部署wget http://mirror.bit.edu.cn/apache/storm/apache-storm-1.2.2/apache-storm-1.2.2.tar.gztar -zxvf apache-storm-1.2.2.tar.gz cd apache-storm-1.2.2vim conf/storm.yaml 配置storm.local.dir: "/data/liang/datas/storm"storm.zookeeper.servers: - "10.2.45.3" - "10.2.45.4" - "10.2.45.5"storm.zookeeper.port: 2181nimbus.seeds: ["10.2.45.5"]ui.port: 8081supervisor.slots.ports:- 6700- 6701- 6702- 6703storm我的项目部署应用maven命令把依赖的jar打到target/dependency目录下 clean dependency:copy-dependencies而后把storm-core-1.2.1.jar包去掉,复制到apache-storm-1.2.2/extlib目录下mkdir app应用maven命令 clean install 打的包 sjz-rta-app-1.0-SNAPSHOT.jar把rta.yaml配置文件 sjz-rta-app-1.0-SNAPSHOT.jar放在app目录../bin/storm jar sjz-rta-app-1.0-SNAPSHOT.jar com.liang.ecpm.rta.AppRTA 运行应用插件maven-assembly-plugin 参看gj-bus-server或xinyun的实时计费我的项目 <dependency> <groupId>org.apache.storm</groupId> <artifactId>storm-core</artifactId> <version>1.2.1</version> <scope>provided</scope>打包时这里须要这个配置(本地运行不必) </dependency>应用maven命令 clean install 打的包 sjz-rta-app-1.0-SNAPSHOT-jar-with-dependencies.jar 会把所有依赖的jar包都打在一起间接能够运行../bin/storm jar sjz-rta-app-1.0-SNAPSHOT-jar-with-dependencies.jar com.liang.ecpm.rta.AppRTA../bin/storm kill rta-sjz-v1-daili-fei -w 3../bin/storm kill saveDbStormStart -w 3 ../bin/storm kill RealPassengerServer -w 3 ../bin/storm kill ScheduleRealServer -w 3../bin/storm jar sjz-rta-app-1.0-SNAPSHOT.jar com.liang.ecpm.rta.AppRTA -nimbus carlan152 -port 6627../bin/storm jar sjz-rta-app.jar com.liang.ecpm.rta.AppRTA -nimbus carlan152 -port 6627../bin/storm jar sjz-rta-app-passenger-schedule.jar com.liang.ecpm.rta.AppRTA cluster

大数据学习路线

一、大数据处理流程上图是一个简化的大数据处理流程图，大数据处理的主要流程包括数据收集、数据存储、数据处理、数据应用等主要环节。下面我们逐一对各个环节所需要的技术栈进行讲解： 1.1 数据收集大数据处理的第一步是数据的收集。现在的中大型项目通常采用微服务架构进行分布式部署，所以数据的采集需要在多台服务器上进行，且采集过程不能影响正常业务的开展。基于这种需求，就衍生了多种日志收集工具，如 Flume 、Logstash、Kibana 等，它们都能通过简单的配置完成复杂的数据收集和数据聚合。 1.2 数据存储收集到数据后，下一个问题就是：数据该如何进行存储？通常大家最为熟知是 MySQL、Oracle 等传统的关系型数据库，它们的优点是能够快速存储结构化的数据，并支持随机访问。但大数据的数据结构通常是半结构化（如日志数据）、甚至是非结构化的（如视频、音频数据），为了解决海量半结构化和非结构化数据的存储，衍生了 Hadoop HDFS 、KFS、GFS 等分布式文件系统，它们都能够支持结构化、半结构和非结构化数据的存储，并可以通过增加机器进行横向扩展。分布式文件系统完美地解决了海量数据存储的问题，但是一个优秀的数据存储系统需要同时考虑数据存储和访问两方面的问题，比如你希望能够对数据进行随机访问，这是传统的关系型数据库所擅长的，但却不是分布式文件系统所擅长的，那么有没有一种存储方案能够同时兼具分布式文件系统和关系型数据库的优点，基于这种需求，就产生了 HBase、MongoDB。 1.3 数据分析大数据处理最重要的环节就是数据分析，数据分析通常分为两种：批处理和流处理。批处理：对一段时间内海量的离线数据进行统一的处理，对应的处理框架有 Hadoop MapReduce、Spark、Flink 等；流处理：对运动中的数据进行处理，即在接收数据的同时就对其进行处理，对应的处理框架有 Storm、Spark Streaming、Flink Streaming等。批处理和流处理各有其适用的场景，时间不敏感或者硬件资源有限，可以采用批处理；时间敏感和及时性要求高就可以采用流处理。随着服务器硬件的价格越来越低和大家对及时性的要求越来越高，流处理越来越普遍，如股票价格预测和电商运营数据分析等。上面的框架都是需要通过编程来进行数据分析，那么如果你不是一个后台工程师，是不是就不能进行数据的分析了？当然不是，大数据是一个非常完善的生态圈，有需求就有解决方案。为了能够让熟悉 SQL 的人员也能够进行数据的分析，查询分析框架应运而生，常用的有 Hive 、Spark SQL 、Flink SQL、 Pig、Phoenix 等。这些框架都能够使用标准的 SQL 或者类SQL 语法灵活地进行数据的查询分析。这些 SQL 经过解析优化后转换为对应的作业程序来运行，如 Hive 本质上就是将 SQL 转换为 MapReduce 作业，Spark SQL 将 SQL 转换为一系列的 RDDs 和转换关系（transformations），Phoenix 将 SQL 查询转换为一个或多个HBase Scan。 1.4 数据应用数据分析完成后，接下来就是数据应用的范畴，这取决于你实际的业务需求。比如你可以将数据进行可视化展现，或者将数据用于优化你的推荐算法，这种运用现在很普遍，比如短视频个性化推荐、电商商品推荐、头条新闻推荐等。当然你也可以将数据用于训练你的机器学习模型，这些都属于其他领域的范畴，都有着对应的框架和技术栈进行处理，这里就不一一赘述。 1.5 其他框架上面是一个标准的大数据处理流程所用到的技术框架。但是实际的大数据处理流程比上面复杂很多，针对大数据处理中的各种复杂问题分别衍生了各类框架：单机的处理能力都是存在瓶颈的，所以大数据框架都是采用集群模式进行部署，为了更方便的进行集群的部署、监控和管理，衍生了 Ambari、Cloudera Manager 等集群管理工具；想要保证集群高可用，需要用到 ZooKeeper ，ZooKeeper 是最常用的分布式协调服务，它能够解决大多数集群问题，包括首领选举、失败恢复、元数据存储及其一致性保证。同时针对集群资源管理的需求，又衍生了 Hadoop YARN ;复杂大数据处理的另外一个显著的问题是，如何调度多个复杂的并且彼此之间存在依赖关系的作业？基于这种需求，产生了 Azkaban 和 Oozie 等工作流调度框架；大数据流处理中使用的比较多的另外一个框架是 Kafka，它可以用于消峰，避免在秒杀等场景下并发数据对流处理程序造成冲击；另一个常用的框架是 Sqoop ，主要是解决了数据迁移的问题，它能够通过简单的命令将关系型数据库中的数据导入到 HDFS 、Hive 或 HBase 中，或者从 HDFS 、Hive 导出到关系型数据库上。二、学习路线介绍完大数据框架，接着就可以介绍其对应的学习路线了，主要分为以下几个方面： ...

大数据系列Storm安装和API

1. 实时计算有别于传统的离线批处理操作(对很多数据的集合进行的操作)实时处理，说白就是针对一条一条的数据/记录进行操作实时计算计算的是无界数据2. 有界数据和无界数据2.1 有界数据离线计算面临的操作数据都是有界限的，无论是1G、1T、1P、1EB、1NB数据的有界必然会导致计算的有界2.2 无界数据实时计算面临的操作数据是源源不断的向水流一样，是没有界限的数据的无界必然导致计算的无界3. 计算中心和计算引擎在大数据领域中存在三大计算中心和三大计算引擎 3.1 三大计算中心离线计算计算中心(mapreduce)实时计算中心(storm flink...)准实时计算中心(spark)3.2 三大计算引擎交互式查询计算引擎(hive sparksql)图计算计算引擎机器学习计算引擎4. Storm简介免费开源分布式实时计算系统处理无界的数据流Tiwtter开源的cloujreStorm能实现高频数据和大规模数据的实时处理官网资料显示storm的一个节点1秒钟能够处理100万个100字节的消息(IntelE5645@2.4Ghz的CPU,24GB的内存)storm是毫秒级的实时处理框架Apache Storm是Twitter开源的一个类似于Hadoop的实时数据处理框架，它原来是由BackType开发，后BackType被Twitter收购，将Storm作为Twitter的实时数据分析系统。 5. hadoop与storm的计算数据来源 hadoop HADOOP处理的是HDFS上TB级别的数据(历史数据)storm STORM是处理的是实时新增的某一笔数据(实时数据)处理过程 hadoop HADOOP是分MAP阶段到REDUCE阶段HADOOP最后是要结束的storm STORM是由用户定义处理流程，流程中可以包含多个步骤，每个步骤可以是数据源(SPOUT)或处理逻辑(BOLT)STORM是没有结束状态，到最后一步时，就停在那，直到有新数据进入时再从头开始处理速度 hadoop HADOOP是以处理HDFS上TB级别数据为目的，处理速度慢storm STORM是只要处理新增的某一笔数据即可，可以做到很快 (毫秒级的响应)适用场景 HADOOP是在要处理批量数据时用的，不讲究时效性STORM是要处理某一新增数据时用的，要讲时效性6. Storm的架构Spout Storm认为每个stream都有一个stream源，也就是原始元组的源头，所以它将这个源头称为Spout消息源，是消息生产者，他会从一个外部源读取数据并向topology里面面发出消息Bolt 消息处理者，所有的消息处理逻辑被封装在bolts里面，处理输入的数据流并产生新的输出数据流,可执行过滤，聚合，查询数据库等操作数据流Task 每一个Spout和Bolt会被当作很多task在整个集群里面执行,每一个task对应到一个线程.Stream groupings: 消息分发策略,定义一个Topology的其中一步是定义每个tuple接受什么样的流作为输入,stream grouping就是用来定义一个stream应该如何分配给Bolts们.7. Storm集群的安装准备安装文件 apache-storm-1.0.2.tar.gz 解压[root@uplooking01 /soft] tar -zxvf apache-storm-1.0.2.tar.gz -C /opt mv apache-storm-1.0.2/ storm配置stormstorm-env.sh [root@uplooking01 /soft] export JAVA_HOME=/opt/jdk export STORM_CONF_DIR="/opt/storm/conf"storm.yaml [root@uplooking01 /opt/storm/conf]storm.zookeeper.servers: - "uplooking03" - "uplooking04" - "uplooking05"#配置两个主节点,实现主节点的单点故障nimbus.seeds: ["uplooking01", "uplooking02"]storm.local.dir: "/opt/storm/storm-local"#配置从节点的槽数supervisor.slots.ports: - 6700 - 6701 - 6702 - 6703分发到其他节点[root@uplooking01 /] scp -r /opt/storm uplooking02:/opt scp -r /opt/storm uplooking03:/opt scp -r /opt/storm uplooking04:/opt scp -r /opt/storm uplooking05:/opt启动storm[root@uplooking01 /] #启动主进程和ui进程 nohup /opt/storm/bin/storm nimbus >/dev/null 2>&1 & nohup /opt/storm/bin/storm ui >/dev/null 2>&1 & nohup /opt/storm/bin/storm logviewer >/dev/null 2>&1 &[root@uplooking02 /] #启动主进程(numbus) nohup /opt/storm/bin/storm numbus >/dev/null 2>&1 & nohup /opt/storm/bin/storm logviewer >/dev/null 2>&1 &#启动从节点进程(supervisor)[root@uplooking03 /] nohup /opt/storm/bin/storm supervisor >/dev/null 2>&1 & nohup /opt/storm/bin/storm logviewer >/dev/null 2>&1 &[root@uplooking04 /] nohup /opt/storm/bin/storm supervisor >/dev/null 2>&1 & nohup /opt/storm/bin/storm logviewer >/dev/null 2>&1 &[root@uplooking05 /] nohup /opt/storm/bin/storm supervisor >/dev/null 2>&1 & nohup /opt/storm/bin/storm logviewer >/dev/null 2>&1 &8. Storm集群的启动脚本#!/bin/bash#启动nimbusfor nimbusHost in `cat /opt/shell/nimbus.host`do#-T 进制分配伪终端一般自动化脚本不需要分配伪终端ssh -T root@${nimbusHost} << eeooff nohup /opt/storm/bin/storm nimbus >/dev/null 2>&1 &eeooffdone#启动supervisorfor supervisorHost in `cat /opt/shell/supervisor.host`do#-T 进制分配伪终端一般自动化脚本不需要分配伪终端ssh -T root@${supervisorHost} << eeooff nohup /opt/storm/bin/storm supervisor >/dev/null 2>&1 &eeooffdone#启动logviewerfor logviewerHost in `cat /opt/shell/logviewer.host`do#-T 进制分配伪终端一般自动化脚本不需要分配伪终端ssh -T root@${logviewerHost} << eeooff nohup /opt/storm/bin/storm logviewer >/dev/null 2>&1 &eeooffdone#启动uifor uiHost in `cat /opt/shell/ui.host`do#-T 进制分配伪终端一般自动化脚本不需要分配伪终端ssh -T root@${uiHost} << eeooff nohup /opt/storm/bin/storm ui >/dev/null 2>&1 &eeooffdone9. Storm实现数字累加编写Spoutpublic class MySpout extends BaseRichSpout { private SpoutOutputCollector collector; //初始化累加的数字 int num = 0; @Override public void open(Map conf, TopologyContext context, SpoutOutputCollector collector) { this.collector = collector; } @Override public void nextTuple() { collector.emit(new Values(num)); num++; } @Override public void declareOutputFields(OutputFieldsDeclarer declarer) { declarer.declare(new Fields("mynum")); }}编写Boltpublic class MyBolt extends BaseRichBolt { @Override public void prepare(Map stormConf, TopologyContext context, OutputCollector collector) { } @Override public void execute(Tuple tuple) { Integer num = tuple.getIntegerByField("mynum"); System.out.println(num); } @Override public void declareOutputFields(OutputFieldsDeclarer declarer) { }}编写Topologypublic class MyTopology { public static void main(String[] args) { //创建自定义的spout MySpout mySpout = new MySpout(); //创建自定义的bolt MyBolt myBolt = new MyBolt(); //创建topology名称 String topologyName = "MyNumTopology"; //创建topology的配置对象 Map conf = new Config(); //创建topology的构造器 TopologyBuilder topologyBuilder = new TopologyBuilder(); //为topology设置spout和bolt topologyBuilder.setSpout("myspout", mySpout); topologyBuilder.setBolt("mybolt", myBolt).shuffleGrouping("myspout"); //创建本地的topology提交器 StormTopology stormTopology = topologyBuilder.createTopology(); LocalCluster localCluster = new LocalCluster(); localCluster.submitTopology(topologyName, conf, stormTopology); }}10. 多个Bolt的问题定义下一个Boltpublic class MyBolt02 extends BaseRichBolt { @Override public void prepare(Map stormConf, TopologyContext context, OutputCollector collector) { } @Override public void execute(Tuple tuple) { System.out.println(tuple.getIntegerByField("mynum02") + "....."); } @Override public void declareOutputFields(OutputFieldsDeclarer declarer) { }}第一个Bolt中给第二个Bolt发射数据public class MyBolt extends BaseRichBolt { private OutputCollector collector; @Override public void prepare(Map stormConf, TopologyContext context, OutputCollector collector) { this.collector = collector; } @Override public void execute(Tuple tuple) { Integer num = tuple.getIntegerByField("mynum"); System.out.println(num); collector.emit(new Values(num)); } @Override public void declareOutputFields(OutputFieldsDeclarer declarer) { declarer.declare(new Fields("mynum02")); }}在Topology中配置第二个Boltpublic class MyTopology { public static void main(String[] args) { //创建自定义的spout MySpout mySpout = new MySpout(); //创建自定义的bolt MyBolt myBolt = new MyBolt(); MyBolt02 myBolt02 = new MyBolt02(); //创建topology名称 String topologyName = "MyNumTopology"; //创建topology的配置对象 Map conf = new Config(); //创建topology的构造器 TopologyBuilder topologyBuilder = new TopologyBuilder(); //为topology设置spout和bolt topologyBuilder.setSpout("myspout", mySpout); topologyBuilder.setBolt("mybolt", myBolt).shuffleGrouping("myspout"); topologyBuilder.setBolt("mybolt02", myBolt02).shuffleGrouping("mybolt"); //创建本地的topology提交器 StormTopology stormTopology = topologyBuilder.createTopology(); LocalCluster localCluster = new LocalCluster(); localCluster.submitTopology(topologyName, conf, stormTopology); }}11. 提交作业到集群 StormSubmitter.submitTopology(topologyName, conf, stormTopology);12. Storm的并行度在storm中的并行度说的就是一个进程的运行需要多少个线程来参与，如果storm运行的线程个数+1，则并行度+1 ...

Storm安装笔记

Storm安装笔记安装环境三台服务器10.223.138.[141-143]CentOS release 6.7 (Final) 64位JDK 1.7.0_80It is strongly recommended to use Oracle JDK rather than OpenJDK.CDH 5.7.0本次安装Storm版本为1.1.0官网地址请点击我官方文档 Version: 1.1.0 安装文档 Setting up a Storm Cluster安装Zookeeper集群CDH安装Zookeeper服务，此处略 Zookeeper version:zookeeper-3.4.5-cdh5.7.010.223.138.141:218110.223.138.142:218110.223.138.143:2181官方建议：A few notes about Zookeeper deployment:It’s critical that you run Zookeeper under supervision, since Zookeeper is fail-fast and will exit the process if it encounters any error case. See here for more details.It’s critical that you set up a cron to compact Zookeeper’s data and transaction logs. The Zookeeper daemon does not do this on its own, and if you don’t set up a cron, Zookeeper will quickly run out of disk space. See here for more details.进程监控以避免任何错误导致的程序退出定时压缩数据和日志以避免可能引发的磁盘空间不足安装依赖安装Storm所需要的依赖Java 7Python 2.6.6JDK 7 下载地址请点击我 CenOS 6.7自带Python 2.6.6# python -VPython 2.6.6以上安装方法此处不再赘述，可自行网上查阅。下载Storm包下载地址 apache-storm-1.1.0.tar.gz 将压缩包解压到指定目录tar -zxvf apache-storm-1.1.0.tar.gz -C /your/path/配置storm.yamlThe Storm release contains a file at conf/storm.yaml that configures the Storm daemons. You can see the default configuration values here. storm.yaml overrides anything in defaults.yaml.配置文件位置conf/storm.yaml，以下是用以启动集群的必要配置storm.zookeeper.servers：Zookeeper集群配置storm.zookeeper.servers: - “10.223.138.141” - “10.223.138.142” - “10.223.138.143"如果端口不是默认的2181还需要配置storm.zookeeper.port参数storm.local.dir：存储目录用以保存运行环境，比如jar包、配置文件等，在每台机器上手动创建目录并赋权限storm.local.dir: “/home/storm"nimbus.seeds：master节点，官方建议配置machine’s FQDN，即全域名。维基百科Fully_qualified_domain_name。本机查看命令hostname -f，修改方法请自行网上查阅。（配置多个Nimbus角色的机器可以启动Nimbus H/A。）nimbus.seeds: [“cdh-1”]原来我使用的是10.223.138.141，但是storm ui上Nimbus Summary出现了两行数据，一个是cdh-1，一个是10.223.138.141这种奇怪的问题，但是换成了全域名的这种形式就没问题了。supervisor.slots.ports：对应每一台Worker机器，定义一台机器上跑几个worker就配置几个端口，比如我这边一个机器上跑三个worker，配置如下supervisor.slots.ports:- 6700- 6701- 6702健康度监控# 脚本相关文件的存放目录storm.health.check.dir: “/home/storm/healthchecks”# 脚本执行的超时时间storm.health.check.timeout.ms: 5000配置第三方库和环境变量如果需要扩展第三方库或自定义插件，把jar包放入extlib/或者extlib-daemon/目录，extlib-daemon/这个里面只能被Storm的程序所使用，比如(Nimbus, Supervisor, DRPC, UI, Logviewer)这些，也可以通过环境变量STORM_EXT_CLASSPATH和STORM_EXT_CLASSPATH_DAEMON去配置扩展库classpath目录。启动Nimbus: Nimbus角色的机器执行命令bin/storm nimbus &Supervisor: Worker角色的机器执行命令bin/storm supervisor &，它用于启停Worker进程。UI: 某一台机器执行命令bin/storm ui &，然后可以通过浏览器访问http://{ui host}:8080。logs/ 此目录下可查看运行日志。进程监控在Setting up a Storm Cluster这份官方安装文档中强烈建议run under supervision，引用官方文档的原话（原文地址：Daemon-Fault-Tolerance）What happens when Nimbus or Supervisor daemons die? The Nimbus and Supervisor daemons are designed to be fail-fast (process self-destructs whenever any unexpected situation is encountered) and stateless (all state is kept in Zookeeper or on disk). As described in Setting up a Storm cluster, the Nimbus and Supervisor daemons must be run under supervision using a tool like daemontools or monit. So if the Nimbus or Supervisor daemons die, they restart like nothing happened.当Nimbus或者Supervisor daemon进程挂了会怎样？Nimbus和Supervisor daemon进程设计成快速失败（无论何时遇到任何异常情况执行自毁）和无状态（所有状态保存在Zookeeper或者磁盘上）。正如Setting up a Storm Cluster中描述的，Nimbus和Supervior daemon进程必须在监控下运行，如使用daemontools或者monit工具。所以如果Nimbus或者Supervisor daemon进程挂了，它可以像什么异常也没有发生似的重新启动。这里我使用monit，详细介绍请参阅官网。（官网地址请点击我）安装monit安装命令yum install monit，如果没有找到的话，需要先安装epel源yum install epel-release 配置文件位置 /etc/monit.conf，分为Global section和Services两个部分配置Global section，以下是我个人的Global配置：（邮件功能暂略）set daemon 60 #每60秒检查一次服务 with start delay 240 #Monit启动后第一次检查延迟240秒set logfile /var/log/monit.log #日志输出到单独的文件set pidfile /var/run/.monit.pid #pid文件位置set idfile /var/.monit.id #id文件位置set statefile /var/.monit.state #state文件位置#配置web页面访问set httpd port 2812 and #端口2812 use address 10.223.138.141 #如果配置localhost只能本地访问 allow localhost #允许本地访问 allow 10.223.138.141 #若不配置，monit status命令不可用，后台报错 #error : Denied connection from non-authorized client #error : Cannot read status from the monit daemon allow 10.223.132.0/24 #允许10.223.132网段访问，即我的电脑所在网段 allow admin:monit #用户名与密码中间插一下：如果使用防火墙，要访问的话还需要把2812端口加入防火墙配置，编辑防火墙配置文件/etc/sysconfig/iptables增加一行配置：-A INPUT -m state –state NEW -m tcp -p tcp –dport 2812 -j ACCEPT 注意：需要在相应的配置段中加入，其他位置不生效..-A INPUT -m state –state NEW -m tcp -p tcp –dport 22 -j ACCEPT-A INPUT -m state –state NEW -m tcp -p tcp –dport 2812 -j ACCEPT..重启服务service iptables restart配置Services这里把Services的配置和全局配置分开，在/etc/monit.d/下面新建文件nimbus和supervisor作为监控nimbus和supervisor的配置。这里主要是监控nimbus和supervisor的进程，我们只看monit监控进程的语法 CHECK PROCESS <unique name> <PIDFILE <path> | MATCHING <regex>>有两种方式，一个是pidfile，一个是正则匹配进程名，因为storm的deamon没写pid文件，这里我用第二种方法，使用monit procmatch命令验证是否可以匹配相应的进程，例如nimbus的进程，Command里面最后有org.apache.storm.daemon.nimbus#monit procmatch org.apache.storm.daemon.nimbusList of processes matching pattern “org.apache.storm.daemon.nimbus”:——————————————…此处省略…——————————————Total matches: 1匹配到nimbus的进程，表示OK。#vi /etc/monit.d/nimbuscheck process nimbus matching org.apache.storm.daemon.nimbus start program = “/bin/bash -c ‘/home/storm/apache-storm-1.1.0/bin/nimbus.sh &>/tmp/nimbus.out’” with timeout 60 seconds stop program = “/bin/kill -9 ps -ef|grep daemon.nimbus|grep -v grep|awk '{print $2}'” if 3 restarts within 5 cycles then unmonitor group storm关于monit更为详细的说明请参见官方手册 Monit manual以上设置的很简单，这里得好好说说这个start program，就是启动的命令，它可是实现进程挂掉后重启，但是我一开始试了好几种方式都不行直接执行，/home/storm/apache-storm-1.1.0/bin/storm nimbus，启动超时放入后台，/home/storm/apache-storm-1.1.0/bin/storm nimbus &，程序会自行终结创建脚本文件/home/storm/apache-storm-1.1.0/bin/nimbus.sh/home/storm/apache-storm-1.1.0/bin/storm nimbus &给执行权限chmod 755 /home/storm/apache-storm-1.1.0/bin/nimbus.sh脚本“&”符号还是要的，不然PPID是monit的。不管是/home/storm/apache-storm-1.1.0/bin/nimbus.sh还是/bin/bash /home/storm/apache-storm-1.1.0/bin/nimbus.sh都不能实现自动重启。参考monit的FAQ（请点击我）使用/bin/bash -c执行但是后面必须要有一个输出的文件，没有会报错：error : ’nimbus’ failed to start (exit status 0) – no output所以经过好一番折腾，最终 start program = “/bin/bash -c ‘/home/storm/apache-storm-1.1.0/bin/nimbus.sh &>/tmp/nimbus.out’"，nimbus.sh就是上面提到的那个脚本文件。我觉得这应该不是最正确的配置，但是至少可以实现重启了。supervisor同理#vi /etc/monit.d/supervisorcheck process supervisor matching org.apache.storm.daemon.supervisor.Supervisor start program = “/bin/bash -c ‘/home/storm/apache-storm-1.1.0/bin/supervisor.sh &>/tmp/supervisor.out’” with timeout 60 seconds stop program = “/bin/kill -9 ps -ef|grep daemon.supervisor|grep -v grep|awk '{print $2}'” if 3 restarts within 5 cycles then unmonitor group storm这样就算配置完成了。141配置nimbus，然后141-143配置supervisor，三台机器都装了Monit，虽然M/Monit可以统一管理集群，但是需要收费，这边就没有考虑了。后面再尝试使用其他监控工具试试。接下来就是启动monit，先熟悉monit的基本命令monit 启动monit -t 校验配置文件正确性monit reload 重新加载配置monit status 查看状态monit quit 退出所以当修改完配置文件之后，第一步执行monit -t查看配置是否正确，然后monit reload重新加载配置。monit启动之后可以登录web查看，根据Global里面的配置，我的页面地址是：http://10.223.138.141:2812/启动storm的UI后查看，我在10.223.138.141上启动的，访问地址为：10.223.138.141:8080（不要忘记防火墙！）至此，Storm安装完毕！参考Monit manualMonit FAQMonit：开源服务器监控工具 ...

[case47]聊聊flink的BoltWrapper

序本文主要研究一下flink的BoltWrapperBoltWrapperflink-storm_2.11-1.6.2-sources.jar!/org/apache/flink/storm/wrappers/BoltWrapper.java/** * A {@link BoltWrapper} wraps an {@link IRichBolt} in order to execute the Storm bolt within a Flink Streaming program. * It takes the Flink input tuples of type {@code IN} and transforms them into {@link StormTuple}s that the bolt can * process. Furthermore, it takes the bolt’s output tuples and transforms them into Flink tuples of type {@code OUT} * (see {@link AbstractStormCollector} for supported types). * * Works for single input streams only! See {@link MergedInputsBoltWrapper} for multi-input stream * Bolts. /public class BoltWrapper<IN, OUT> extends AbstractStreamOperator<OUT> implements OneInputStreamOperator<IN, OUT> { @Override public void open() throws Exception { super.open(); this.flinkCollector = new TimestampedCollector<>(this.output); GlobalJobParameters config = getExecutionConfig().getGlobalJobParameters(); StormConfig stormConfig = new StormConfig(); if (config != null) { if (config instanceof StormConfig) { stormConfig = (StormConfig) config; } else { stormConfig.putAll(config.toMap()); } } this.topologyContext = WrapperSetupHelper.createTopologyContext( getRuntimeContext(), this.bolt, this.name, this.stormTopology, stormConfig); final OutputCollector stormCollector = new OutputCollector(new BoltCollector<OUT>( this.numberOfAttributes, this.topologyContext.getThisTaskId(), this.flinkCollector)); if (this.stormTopology != null) { Map<GlobalStreamId, Grouping> inputs = this.topologyContext.getThisSources(); for (GlobalStreamId inputStream : inputs.keySet()) { for (Integer tid : this.topologyContext.getComponentTasks(inputStream .get_componentId())) { this.inputComponentIds.put(tid, inputStream.get_componentId()); this.inputStreamIds.put(tid, inputStream.get_streamId()); this.inputSchemas.put(tid, this.topologyContext.getComponentOutputFields(inputStream)); } } } this.bolt.prepare(stormConfig, this.topologyContext, stormCollector); } @Override public void dispose() throws Exception { super.dispose(); this.bolt.cleanup(); } @Override public void processElement(final StreamRecord<IN> element) throws Exception { this.flinkCollector.setTimestamp(element); IN value = element.getValue(); if (this.stormTopology != null) { Tuple tuple = (Tuple) value; Integer producerTaskId = tuple.getField(tuple.getArity() - 1); this.bolt.execute(new StormTuple<>(value, this.inputSchemas.get(producerTaskId), producerTaskId, this.inputStreamIds.get(producerTaskId), this.inputComponentIds .get(producerTaskId), MessageId.makeUnanchored())); } else { this.bolt.execute(new StormTuple<>(value, this.inputSchemas.get(null), -1, null, null, MessageId.makeUnanchored())); } }}flink用BoltWrapper来包装storm的IRichBolt，它实现了OneInputStreamOperator接口，继承AbstractStreamOperator类OneInputStreamOperator接口继承了StreamOperator接口，额外定义了processElement、processWatermark、processLatencyMarker三个接口AbstractStreamOperator类实现的是StreamOperator接口，但是里头帮忙实现了processWatermark、processLatencyMarker这两个接口BoltWrapper里头主要是实现OneInputStreamOperator接口的processElement方法，然后是覆盖StreamOperator接口定义的open及dispose方法open方法有个要点就是调用bolt的prepare方法，传入包装BoltCollector的OutputCollector，通过BoltCollector来收集bolt发射的数据到flink，它使用的是flink的TimestampedCollectorBoltCollectorflink-storm_2.11-1.6.2-sources.jar!/org/apache/flink/storm/wrappers/BoltCollector.java/* * A {@link BoltCollector} is used by {@link BoltWrapper} to provided an Storm compatible * output collector to the wrapped bolt. It transforms the emitted Storm tuples into Flink tuples * and emits them via the provide {@link Output} object. /class BoltCollector<OUT> extends AbstractStormCollector<OUT> implements IOutputCollector { /* The Flink output Collector. / private final Collector<OUT> flinkOutput; /* * Instantiates a new {@link BoltCollector} that emits Flink tuples to the given Flink output object. If the * number of attributes is negative, any output type is supported (ie, raw type). If the number of attributes is * between 0 and 25, the output type is {@link Tuple0} to {@link Tuple25}, respectively. * * @param numberOfAttributes * The number of attributes of the emitted tuples per output stream. * @param taskId * The ID of the producer task (negative value for unknown). * @param flinkOutput * The Flink output object to be used. * @throws UnsupportedOperationException * if the specified number of attributes is greater than 25 / BoltCollector(final HashMap<String, Integer> numberOfAttributes, final int taskId, final Collector<OUT> flinkOutput) throws UnsupportedOperationException { super(numberOfAttributes, taskId); assert (flinkOutput != null); this.flinkOutput = flinkOutput; } @Override protected List<Integer> doEmit(final OUT flinkTuple) { this.flinkOutput.collect(flinkTuple); // TODO return null; } @Override public void reportError(final Throwable error) { // not sure, if Flink can support this } @Override public List<Integer> emit(final String streamId, final Collection<Tuple> anchors, final List<Object> tuple) { return this.tansformAndEmit(streamId, tuple); } @Override public void emitDirect(final int taskId, final String streamId, final Collection<Tuple> anchors, final List<Object> tuple) { throw new UnsupportedOperationException(“Direct emit is not supported by Flink”); } @Override public void ack(final Tuple input) {} @Override public void fail(final Tuple input) {} @Override public void resetTimeout(Tuple var1) {}}BoltCollector实现了storm的IOutputCollector接口，只是ack、fail、resetTimeout、reportError操作都为空，不支持emitDirect操作doEmit方法调用的是flinkOutput.collect(flinkTuple)emit方法调用的是tansformAndEmit(streamId, tuple)，它由继承的父类AbstractStormCollector实现TimestampedCollector.collectflink-streaming-java_2.11-1.6.2-sources.jar!/org/apache/flink/streaming/api/operators/TimestampedCollector.java/* * Wrapper around an {@link Output} for user functions that expect a {@link Collector}. * Before giving the {@link TimestampedCollector} to a user function you must set * the timestamp that should be attached to emitted elements. Most operators * would set the timestamp of the incoming * {@link org.apache.flink.streaming.runtime.streamrecord.StreamRecord} here. * * @param <T> The type of the elements that can be emitted. /@Internalpublic class TimestampedCollector<T> implements Collector<T> { private final Output<StreamRecord<T>> output; private final StreamRecord<T> reuse; /* * Creates a new {@link TimestampedCollector} that wraps the given {@link Output}. / public TimestampedCollector(Output<StreamRecord<T>> output) { this.output = output; this.reuse = new StreamRecord<T>(null); } @Override public void collect(T record) { output.collect(reuse.replace(record)); } public void setTimestamp(StreamRecord<?> timestampBase) { if (timestampBase.hasTimestamp()) { reuse.setTimestamp(timestampBase.getTimestamp()); } else { reuse.eraseTimestamp(); } } public void setAbsoluteTimestamp(long timestamp) { reuse.setTimestamp(timestamp); } public void eraseTimestamp() { reuse.eraseTimestamp(); } @Override public void close() { output.close(); }}TimestampedCollector实现了flink的Collector接口，这里头额外新增了setTimestamp、setAbsoluteTimestamp、eraseTimestamp方法它使用了StreamRecord对象，它里头有value、timestamp、hasTimestamp三个属性，可以将value与时间戳关联起来这里的collect方法调用了StreamRecord的replace返回的对象，replace方法只是更新了value引用，但是里头的时间戳没有更新AbstractStormCollector.tansformAndEmitflink-storm_2.11-1.6.2-sources.jar!/org/apache/flink/storm/wrappers/AbstractStormCollector.java /* * Transforms a Storm tuple into a Flink tuple of type {@code OUT} and emits this tuple via {@link #doEmit(Object)} * to the specified output stream. * * @param The * The output stream id. * @param tuple * The Storm tuple to be emitted. * @return the return value of {@link #doEmit(Object)} / @SuppressWarnings(“unchecked”) protected final List<Integer> tansformAndEmit(final String streamId, final List<Object> tuple) { List<Integer> taskIds; int numAtt = this.numberOfAttributes.get(streamId); int taskIdIdx = numAtt; if (this.taskId >= 0 && numAtt < 0) { numAtt = 1; taskIdIdx = 0; } if (numAtt >= 0) { assert (tuple.size() == numAtt); Tuple out = this.outputTuple.get(streamId); for (int i = 0; i < numAtt; ++i) { out.setField(tuple.get(i), i); } if (this.taskId >= 0) { out.setField(this.taskId, taskIdIdx); } if (this.split) { this.splitTuple.streamId = streamId; this.splitTuple.value = out; taskIds = doEmit((OUT) this.splitTuple); } else { taskIds = doEmit((OUT) out); } } else { assert (tuple.size() == 1); if (this.split) { this.splitTuple.streamId = streamId; this.splitTuple.value = tuple.get(0); taskIds = doEmit((OUT) this.splitTuple); } else { taskIds = doEmit((OUT) tuple.get(0)); } } this.tupleEmitted = true; return taskIds; }AbstractStormCollector.tansformAndEmit，这里主要处理了split的场景，即一个bolt declare了多个stream，最后都通过子类BoltCollector.doEmit来发射数据如果split为true，则传给doEmit方法的是splitTuple，即SplitStreamType，它记录了streamId及其value如果split为false，则传给doEmit方法的是Tuple类型，即相当于SplitStreamType中的value，相比于SplitStreamType少了streamId信息Task.runflink-runtime_2.11-1.6.2-sources.jar!/org/apache/flink/runtime/taskmanager/Task.java/* * The Task represents one execution of a parallel subtask on a TaskManager. * A Task wraps a Flink operator (which may be a user function) and * runs it, providing all services necessary for example to consume input data, * produce its results (intermediate result partitions) and communicate * with the JobManager. * * The Flink operators (implemented as subclasses of * {@link AbstractInvokable} have only data readers, -writers, and certain event callbacks. * The task connects those to the network stack and actor messages, and tracks the state * of the execution and handles exceptions. * * Tasks have no knowledge about how they relate to other tasks, or whether they * are the first attempt to execute the task, or a repeated attempt. All of that * is only known to the JobManager. All the task knows are its own runnable code, * the task’s configuration, and the IDs of the intermediate results to consume and * produce (if any). * * Each Task is run by one dedicated thread. /public class Task implements Runnable, TaskActions, CheckpointListener { //…… /* * The core work method that bootstraps the task and executes its code. / @Override public void run() { //…… // now load and instantiate the task’s invokable code invokable = loadAndInstantiateInvokable(userCodeClassLoader, nameOfInvokableClass, env); // —————————————————————- // actual task core work // —————————————————————- // we must make strictly sure that the invokable is accessible to the cancel() call // by the time we switched to running. this.invokable = invokable; // switch to the RUNNING state, if that fails, we have been canceled/failed in the meantime if (!transitionState(ExecutionState.DEPLOYING, ExecutionState.RUNNING)) { throw new CancelTaskException(); } // notify everyone that we switched to running notifyObservers(ExecutionState.RUNNING, null); taskManagerActions.updateTaskExecutionState(new TaskExecutionState(jobId, executionId, ExecutionState.RUNNING)); // make sure the user code classloader is accessible thread-locally executingThread.setContextClassLoader(userCodeClassLoader); // run the invokable invokable.invoke(); //…… }}Task的run方法会调用invokable.invoke()，这里的invokable为StreamTaskStreamTask.invokeflink-streaming-java_2.11-1.6.2-sources.jar!/org/apache/flink/streaming/runtime/tasks/StreamTask.java/* * Base class for all streaming tasks. A task is the unit of local processing that is deployed * and executed by the TaskManagers. Each task runs one or more {@link StreamOperator}s which form * the Task’s operator chain. Operators that are chained together execute synchronously in the * same thread and hence on the same stream partition. A common case for these chains * are successive map/flatmap/filter tasks. * * The task chain contains one “head” operator and multiple chained operators. * The StreamTask is specialized for the type of the head operator: one-input and two-input tasks, * as well as for sources, iteration heads and iteration tails. * * The Task class deals with the setup of the streams read by the head operator, and the streams * produced by the operators at the ends of the operator chain. Note that the chain may fork and * thus have multiple ends. * * The life cycle of the task is set up as follows: * <pre>{@code * – setInitialState -> provides state of all operators in the chain * * – invoke() * | * +—-> Create basic utils (config, etc) and load the chain of operators * +—-> operators.setup() * +—-> task specific init() * +—-> initialize-operator-states() * +—-> open-operators() * +—-> run() * +—-> close-operators() * +—-> dispose-operators() * +—-> common cleanup * +—-> task specific cleanup() * }</pre> * * The {@code StreamTask} has a lock object called {@code lock}. All calls to methods on a * {@code StreamOperator} must be synchronized on this lock object to ensure that no methods * are called concurrently. * * @param <OUT> * @param <OP> */@Internalpublic abstract class StreamTask<OUT, OP extends StreamOperator<OUT>> extends AbstractInvokable implements AsyncExceptionHandler { //…… @Override public final void invoke() throws Exception { boolean disposed = false; try { //…… // let the task do its work isRunning = true; run(); // if this left the run() method cleanly despite the fact that this was canceled, // make sure the “clean shutdown” is not attempted if (canceled) { throw new CancelTaskException(); } LOG.debug(“Finished task {}”, getName()); //…… } finally { // clean up everything we initialized isRunning = false; //…… } }}StreamTask的invoke方法里头调用了子类的run方法，这里子类为OneInputStreamTaskOneInputStreamTask.runflink-streaming-java_2.11-1.6.2-sources.jar!/org/apache/flink/streaming/runtime/tasks/OneInputStreamTask.java @Override protected void run() throws Exception { // cache processor reference on the stack, to make the code more JIT friendly final StreamInputProcessor<IN> inputProcessor = this.inputProcessor; while (running && inputProcessor.processInput()) { // all the work happens in the “processInput” method } }该run方法主要是调用inputProcessor.processInput()，这里的inputProcessor为StreamInputProcessorStreamInputProcessor.processInputflink-streaming-java_2.11-1.6.2-sources.jar!/org/apache/flink/streaming/runtime/io/StreamInputProcessor.java public boolean processInput() throws Exception { if (isFinished) { return false; } if (numRecordsIn == null) { try { numRecordsIn = ((OperatorMetricGroup) streamOperator.getMetricGroup()).getIOMetricGroup().getNumRecordsInCounter(); } catch (Exception e) { LOG.warn(“An exception occurred during the metrics setup.”, e); numRecordsIn = new SimpleCounter(); } } while (true) { if (currentRecordDeserializer != null) { DeserializationResult result = currentRecordDeserializer.getNextRecord(deserializationDelegate); if (result.isBufferConsumed()) { currentRecordDeserializer.getCurrentBuffer().recycleBuffer(); currentRecordDeserializer = null; } if (result.isFullRecord()) { StreamElement recordOrMark = deserializationDelegate.getInstance(); if (recordOrMark.isWatermark()) { // handle watermark statusWatermarkValve.inputWatermark(recordOrMark.asWatermark(), currentChannel); continue; } else if (recordOrMark.isStreamStatus()) { // handle stream status statusWatermarkValve.inputStreamStatus(recordOrMark.asStreamStatus(), currentChannel); continue; } else if (recordOrMark.isLatencyMarker()) { // handle latency marker synchronized (lock) { streamOperator.processLatencyMarker(recordOrMark.asLatencyMarker()); } continue; } else { // now we can do the actual processing StreamRecord<IN> record = recordOrMark.asRecord(); synchronized (lock) { numRecordsIn.inc(); streamOperator.setKeyContextElement1(record); streamOperator.processElement(record); } return true; } } } //…… } }该processInput方法，先是通过currentRecordDeserializer.getNextRecord(deserializationDelegate)获取nextRecord，之后有调用到streamOperator.processElement(record)来处理，这里的streamOperator为BoltWrapper小结flink用BoltWrapper来包装storm的IRichBolt，它实现OneInputStreamOperator接口的processElement方法，在该方法中执行bolt.execute方法；另外在实现StreamOperator的open方法中调用bolt的prepare方法，传入包装BoltCollector的OutputCollector，通过BoltCollector来收集bolt.execute时发射的数据到flink，它使用的是flink的TimestampedCollectorBoltCollector的emit方法内部调用了AbstractStormCollector.tansformAndEmit(它最后调用BoltCollector.doEmit方法来发射)，针对多个stream的场景，封装了SplitStreamType的tuple给到doEmit方法；如果只有一个stream，则仅仅将普通的tuple传给doEmit方法flink的Task的run方法会调用StreamTask的invoke方法，而StreamTask的invoke方法会调用子类(这里子类为OneInputStreamTask)的run方法，OneInputStreamTask的run方法是不断循环调用inputProcessor.processInput()，这里的inputProcessor为StreamInputProcessor，它的processInput()会调用currentRecordDeserializer.getNextRecord(deserializationDelegate)获取nextRecord，之后根据条件选择调用streamOperator.processElement(record)方法，这里的streamOperator为BoltWrapper，而BoltWrapper的processElement正好调用storm bolt的execute方法来执行bolt逻辑并使用flink的BoltCollector进行发射docStorm Compatibility Beta ...

聊聊flink的SpoutWrapper

序本文主要研究一下flink的SpoutWrapperSpoutWrapperflink-storm_2.11-1.6.2-sources.jar!/org/apache/flink/storm/wrappers/SpoutWrapper.java/** * A {@link SpoutWrapper} wraps an {@link IRichSpout} in order to execute it within a Flink Streaming program. It * takes the spout’s output tuples and transforms them into Flink tuples of type {@code OUT} (see * {@link SpoutCollector} for supported types). * * Per default, {@link SpoutWrapper} calls the wrapped spout’s {@link IRichSpout#nextTuple() nextTuple()} method in * an infinite loop. * Alternatively, {@link SpoutWrapper} can call {@link IRichSpout#nextTuple() nextTuple()} for a finite number of * times and terminate automatically afterwards (for finite input streams). The number of {@code nextTuple()} calls can * be specified as a certain number of invocations or can be undefined. In the undefined case, {@link SpoutWrapper} * terminates if no record was emitted to the output collector for the first time during a call to * {@link IRichSpout#nextTuple() nextTuple()}. * If the given spout implements {@link FiniteSpout} interface and {@link #numberOfInvocations} is not provided or * is {@code null}, {@link SpoutWrapper} calls {@link IRichSpout#nextTuple() nextTuple()} method until * {@link FiniteSpout#reachedEnd()} returns true. /public final class SpoutWrapper<OUT> extends RichParallelSourceFunction<OUT> implements StoppableFunction { //…… /* The number of {@link IRichSpout#nextTuple()} calls. / private Integer numberOfInvocations; // do not use int -> null indicates an infinite loop /* * Instantiates a new {@link SpoutWrapper} that calls the {@link IRichSpout#nextTuple() nextTuple()} method of * the given {@link IRichSpout spout} a finite number of times. The output type will be one of {@link Tuple0} to * {@link Tuple25} depending on the spout’s declared number of attributes. * * @param spout * The {@link IRichSpout spout} to be used. * @param numberOfInvocations * The number of calls to {@link IRichSpout#nextTuple()}. If value is negative, {@link SpoutWrapper} * terminates if no tuple was emitted for the first time. If value is {@code null}, finite invocation is * disabled. * @throws IllegalArgumentException * If the number of declared output attributes is not with range [0;25]. / public SpoutWrapper(final IRichSpout spout, final Integer numberOfInvocations) throws IllegalArgumentException { this(spout, (Collection<String>) null, numberOfInvocations); } /* * Instantiates a new {@link SpoutWrapper} that calls the {@link IRichSpout#nextTuple() nextTuple()} method of * the given {@link IRichSpout spout} in an infinite loop. The output type will be one of {@link Tuple0} to * {@link Tuple25} depending on the spout’s declared number of attributes. * * @param spout * The {@link IRichSpout spout} to be used. * @throws IllegalArgumentException * If the number of declared output attributes is not with range [0;25]. / public SpoutWrapper(final IRichSpout spout) throws IllegalArgumentException { this(spout, (Collection<String>) null, null); } @Override public final void run(final SourceContext<OUT> ctx) throws Exception { final GlobalJobParameters config = super.getRuntimeContext().getExecutionConfig() .getGlobalJobParameters(); StormConfig stormConfig = new StormConfig(); if (config != null) { if (config instanceof StormConfig) { stormConfig = (StormConfig) config; } else { stormConfig.putAll(config.toMap()); } } final TopologyContext stormTopologyContext = WrapperSetupHelper.createTopologyContext( (StreamingRuntimeContext) super.getRuntimeContext(), this.spout, this.name, this.stormTopology, stormConfig); SpoutCollector<OUT> collector = new SpoutCollector<OUT>(this.numberOfAttributes, stormTopologyContext.getThisTaskId(), ctx); this.spout.open(stormConfig, stormTopologyContext, new SpoutOutputCollector(collector)); this.spout.activate(); if (numberOfInvocations == null) { if (this.spout instanceof FiniteSpout) { final FiniteSpout finiteSpout = (FiniteSpout) this.spout; while (this.isRunning && !finiteSpout.reachedEnd()) { finiteSpout.nextTuple(); } } else { while (this.isRunning) { this.spout.nextTuple(); } } } else { int counter = this.numberOfInvocations; if (counter >= 0) { while ((–counter >= 0) && this.isRunning) { this.spout.nextTuple(); } } else { do { collector.tupleEmitted = false; this.spout.nextTuple(); } while (collector.tupleEmitted && this.isRunning); } } } /* * {@inheritDoc} * * Sets the {@link #isRunning} flag to {@code false}. / @Override public void cancel() { this.isRunning = false; } /* * {@inheritDoc} * * Sets the {@link #isRunning} flag to {@code false}. / @Override public void stop() { this.isRunning = false; } @Override public void close() throws Exception { this.spout.close(); }}SpoutWrapper继承了RichParallelSourceFunction类，实现了StoppableFunction接口的stop方法SpoutWrapper的run方法创建了flink的SpoutCollector作为storm的SpoutOutputCollector的构造器参数，之后调用spout的open方法，把包装了SpoutCollector(flink)的SpoutOutputCollector传递给spout，用来收集spout发射的数据之后就是根据numberOfInvocations参数来调用spout.nextTuple()方法来发射数据；numberOfInvocations是控制调用spout的nextTuple的次数，它可以在创建SpoutWrapper的时候在构造器中设置，如果使用没有numberOfInvocations参数的构造器，则该值为null，表示infinite loopflink对storm的spout有进行封装，提供了FiniteSpout接口，它有个reachedEnd接口用来判断数据是否发送完毕，来将storm的spout改造为finite模式；这里如果使用的是storm原始的spout，则就是一直循环调用nextTuple方法如果有设置numberOfInvocations而且大于等于0，则根据指定的次数来调用nextTuple方法；如果该值小于0，则根据collector.tupleEmitted值来判断是否终止循环SpoutCollectorflink-storm_2.11-1.6.2-sources.jar!/org/apache/flink/storm/wrappers/SpoutCollector.java/* * A {@link SpoutCollector} is used by {@link SpoutWrapper} to provided an Storm * compatible output collector to the wrapped spout. It transforms the emitted Storm tuples into * Flink tuples and emits them via the provide {@link SourceContext} object. /class SpoutCollector<OUT> extends AbstractStormCollector<OUT> implements ISpoutOutputCollector { /* The Flink source context object. / private final SourceContext<OUT> flinkContext; /* * Instantiates a new {@link SpoutCollector} that emits Flink tuples to the given Flink source context. If the * number of attributes is specified as zero, any output type is supported. If the number of attributes is between 0 * to 25, the output type is {@link Tuple0} to {@link Tuple25}, respectively. * * @param numberOfAttributes * The number of attributes of the emitted tuples. * @param taskId * The ID of the producer task (negative value for unknown). * @param flinkContext * The Flink source context to be used. * @throws UnsupportedOperationException * if the specified number of attributes is greater than 25 / SpoutCollector(final HashMap<String, Integer> numberOfAttributes, final int taskId, final SourceContext<OUT> flinkContext) throws UnsupportedOperationException { super(numberOfAttributes, taskId); assert (flinkContext != null); this.flinkContext = flinkContext; } @Override protected List<Integer> doEmit(final OUT flinkTuple) { this.flinkContext.collect(flinkTuple); // TODO return null; } @Override public void reportError(final Throwable error) { // not sure, if Flink can support this } @Override public List<Integer> emit(final String streamId, final List<Object> tuple, final Object messageId) { return this.tansformAndEmit(streamId, tuple); } @Override public void emitDirect(final int taskId, final String streamId, final List<Object> tuple, final Object messageId) { throw new UnsupportedOperationException(“Direct emit is not supported by Flink”); } public long getPendingCount() { return 0; }}SpoutCollector实现了storm的ISpoutOutputCollector接口，实现了该接口定义的emit、emitDirect、getPendingCount、reportError方法；flink目前不支持emitDirect方法，另外getPendingCount也始终返回0，reportError方法是个空操作doEmit里头调用flinkContext.collect(flinkTuple)来发射数据，该方法为protected，主要是给tansformAndEmit调用的tansformAndEmit方法由父类AbstractStormCollector提供AbstractStormCollector.tansformAndEmitflink-storm_2.11-1.6.2-sources.jar!/org/apache/flink/storm/wrappers/AbstractStormCollector.java /* * Transforms a Storm tuple into a Flink tuple of type {@code OUT} and emits this tuple via {@link #doEmit(Object)} * to the specified output stream. * * @param The * The output stream id. * @param tuple * The Storm tuple to be emitted. * @return the return value of {@link #doEmit(Object)} / @SuppressWarnings(“unchecked”) protected final List<Integer> tansformAndEmit(final String streamId, final List<Object> tuple) { List<Integer> taskIds; int numAtt = this.numberOfAttributes.get(streamId); int taskIdIdx = numAtt; if (this.taskId >= 0 && numAtt < 0) { numAtt = 1; taskIdIdx = 0; } if (numAtt >= 0) { assert (tuple.size() == numAtt); Tuple out = this.outputTuple.get(streamId); for (int i = 0; i < numAtt; ++i) { out.setField(tuple.get(i), i); } if (this.taskId >= 0) { out.setField(this.taskId, taskIdIdx); } if (this.split) { this.splitTuple.streamId = streamId; this.splitTuple.value = out; taskIds = doEmit((OUT) this.splitTuple); } else { taskIds = doEmit((OUT) out); } } else { assert (tuple.size() == 1); if (this.split) { this.splitTuple.streamId = streamId; this.splitTuple.value = tuple.get(0); taskIds = doEmit((OUT) this.splitTuple); } else { taskIds = doEmit((OUT) tuple.get(0)); } } this.tupleEmitted = true; return taskIds; }AbstractStormCollector.tansformAndEmit，这里主要处理了split的场景，即一个spout declare了多个stream，最后都通过子类SpoutCollector.doEmit来发射数据如果split为true，则传给doEmit方法的是splitTuple，即SplitStreamType，它记录了streamId及其value如果split为false，则传给doEmit方法的是Tuple类型，即相当于SplitStreamType中的value，相比于SplitStreamType少了streamId信息Task.runflink-runtime_2.11-1.6.2-sources.jar!/org/apache/flink/runtime/taskmanager/Task.java/* * The Task represents one execution of a parallel subtask on a TaskManager. * A Task wraps a Flink operator (which may be a user function) and * runs it, providing all services necessary for example to consume input data, * produce its results (intermediate result partitions) and communicate * with the JobManager. * * The Flink operators (implemented as subclasses of * {@link AbstractInvokable} have only data readers, -writers, and certain event callbacks. * The task connects those to the network stack and actor messages, and tracks the state * of the execution and handles exceptions. * * Tasks have no knowledge about how they relate to other tasks, or whether they * are the first attempt to execute the task, or a repeated attempt. All of that * is only known to the JobManager. All the task knows are its own runnable code, * the task’s configuration, and the IDs of the intermediate results to consume and * produce (if any). * * Each Task is run by one dedicated thread. /public class Task implements Runnable, TaskActions, CheckpointListener { //…… /* * The core work method that bootstraps the task and executes its code. / @Override public void run() { //…… // now load and instantiate the task’s invokable code invokable = loadAndInstantiateInvokable(userCodeClassLoader, nameOfInvokableClass, env); // —————————————————————- // actual task core work // —————————————————————- // we must make strictly sure that the invokable is accessible to the cancel() call // by the time we switched to running. this.invokable = invokable; // switch to the RUNNING state, if that fails, we have been canceled/failed in the meantime if (!transitionState(ExecutionState.DEPLOYING, ExecutionState.RUNNING)) { throw new CancelTaskException(); } // notify everyone that we switched to running notifyObservers(ExecutionState.RUNNING, null); taskManagerActions.updateTaskExecutionState(new TaskExecutionState(jobId, executionId, ExecutionState.RUNNING)); // make sure the user code classloader is accessible thread-locally executingThread.setContextClassLoader(userCodeClassLoader); // run the invokable invokable.invoke(); //…… }}Task的run方法会调用invokable.invoke()，这里的invokable为StreamTaskStreamTaskflink-streaming-java_2.11-1.6.2-sources.jar!/org/apache/flink/streaming/runtime/tasks/StreamTask.java/* * Base class for all streaming tasks. A task is the unit of local processing that is deployed * and executed by the TaskManagers. Each task runs one or more {@link StreamOperator}s which form * the Task’s operator chain. Operators that are chained together execute synchronously in the * same thread and hence on the same stream partition. A common case for these chains * are successive map/flatmap/filter tasks. * * The task chain contains one “head” operator and multiple chained operators. * The StreamTask is specialized for the type of the head operator: one-input and two-input tasks, * as well as for sources, iteration heads and iteration tails. * * The Task class deals with the setup of the streams read by the head operator, and the streams * produced by the operators at the ends of the operator chain. Note that the chain may fork and * thus have multiple ends. * * The life cycle of the task is set up as follows: * <pre>{@code * – setInitialState -> provides state of all operators in the chain * * – invoke() * | * +—-> Create basic utils (config, etc) and load the chain of operators * +—-> operators.setup() * +—-> task specific init() * +—-> initialize-operator-states() * +—-> open-operators() * +—-> run() * +—-> close-operators() * +—-> dispose-operators() * +—-> common cleanup * +—-> task specific cleanup() * }</pre> * * The {@code StreamTask} has a lock object called {@code lock}. All calls to methods on a * {@code StreamOperator} must be synchronized on this lock object to ensure that no methods * are called concurrently. * * @param <OUT> * @param <OP> /@Internalpublic abstract class StreamTask<OUT, OP extends StreamOperator<OUT>> extends AbstractInvokable implements AsyncExceptionHandler { //…… @Override public final void invoke() throws Exception { boolean disposed = false; try { //…… // let the task do its work isRunning = true; run(); // if this left the run() method cleanly despite the fact that this was canceled, // make sure the “clean shutdown” is not attempted if (canceled) { throw new CancelTaskException(); } LOG.debug(“Finished task {}”, getName()); //…… } finally { // clean up everything we initialized isRunning = false; //…… } }}StreamTask的invoke方法里头调用子类的run方法，这里子类为StoppableSourceStreamTaskStoppableSourceStreamTaskflink-streaming-java_2.11-1.6.2-sources.jar!/org/apache/flink/streaming/runtime/tasks/StoppableSourceStreamTask.java/* * Stoppable task for executing stoppable streaming sources. * * @param <OUT> Type of the produced elements * @param <SRC> Stoppable source function /public class StoppableSourceStreamTask<OUT, SRC extends SourceFunction<OUT> & StoppableFunction> extends SourceStreamTask<OUT, SRC, StoppableStreamSource<OUT, SRC>> implements StoppableTask { private volatile boolean stopped; public StoppableSourceStreamTask(Environment environment) { super(environment); } @Override protected void run() throws Exception { if (!stopped) { super.run(); } } @Override public void stop() { stopped = true; if (this.headOperator != null) { this.headOperator.stop(); } }}StoppableSourceStreamTask继承了SourceStreamTask，主要是实现了StoppableTask的stop方法，它的run方法由其直接父类SourceStreamTask来实现SourceStreamTaskflink-streaming-java_2.11-1.6.2-sources.jar!/org/apache/flink/streaming/runtime/tasks/SourceStreamTask.java/* * {@link StreamTask} for executing a {@link StreamSource}. * * One important aspect of this is that the checkpointing and the emission of elements must never * occur at the same time. The execution must be serial. This is achieved by having the contract * with the StreamFunction that it must only modify its state or emit elements in * a synchronized block that locks on the lock Object. Also, the modification of the state * and the emission of elements must happen in the same block of code that is protected by the * synchronized block. * * @param <OUT> Type of the output elements of this source. * @param <SRC> Type of the source function for the stream source operator * @param <OP> Type of the stream source operator /@Internalpublic class SourceStreamTask<OUT, SRC extends SourceFunction<OUT>, OP extends StreamSource<OUT, SRC>> extends StreamTask<OUT, OP> { //…… @Override protected void run() throws Exception { headOperator.run(getCheckpointLock(), getStreamStatusMaintainer()); }}SourceStreamTask主要是调用StreamSource的run方法StreamSourceflink-streaming-java_2.11-1.6.2-sources.jar!/org/apache/flink/streaming/api/operators/StreamSource.java/* * {@link StreamOperator} for streaming sources. * * @param <OUT> Type of the output elements * @param <SRC> Type of the source function of this stream source operator */@Internalpublic class StreamSource<OUT, SRC extends SourceFunction<OUT>> extends AbstractUdfStreamOperator<OUT, SRC> implements StreamOperator<OUT> { //…… public void run(final Object lockingObject, final StreamStatusMaintainer streamStatusMaintainer) throws Exception { run(lockingObject, streamStatusMaintainer, output); } public void run(final Object lockingObject, final StreamStatusMaintainer streamStatusMaintainer, final Output<StreamRecord<OUT>> collector) throws Exception { final TimeCharacteristic timeCharacteristic = getOperatorConfig().getTimeCharacteristic(); final Configuration configuration = this.getContainingTask().getEnvironment().getTaskManagerInfo().getConfiguration(); final long latencyTrackingInterval = getExecutionConfig().isLatencyTrackingConfigured() ? getExecutionConfig().getLatencyTrackingInterval() : configuration.getLong(MetricOptions.LATENCY_INTERVAL); LatencyMarksEmitter<OUT> latencyEmitter = null; if (latencyTrackingInterval > 0) { latencyEmitter = new LatencyMarksEmitter<>( getProcessingTimeService(), collector, latencyTrackingInterval, this.getOperatorID(), getRuntimeContext().getIndexOfThisSubtask()); } final long watermarkInterval = getRuntimeContext().getExecutionConfig().getAutoWatermarkInterval(); this.ctx = StreamSourceContexts.getSourceContext( timeCharacteristic, getProcessingTimeService(), lockingObject, streamStatusMaintainer, collector, watermarkInterval, -1); try { userFunction.run(ctx); // if we get here, then the user function either exited after being done (finite source) // or the function was canceled or stopped. For the finite source case, we should emit // a final watermark that indicates that we reached the end of event-time if (!isCanceledOrStopped()) { ctx.emitWatermark(Watermark.MAX_WATERMARK); } } finally { // make sure that the context is closed in any case ctx.close(); if (latencyEmitter != null) { latencyEmitter.close(); } } }它调用了userFunction.run(ctx)，这里的userFunction为SpoutWrapper，从而完成spout的nextTuple的触发小结flink使用SpoutWrapper来包装storm原始的spout，它在run方法里头创建了flink的SpoutCollector作为storm的SpoutOutputCollector的构造器参数，之后调用spout的open方法，把包装了SpoutCollector(flink)的SpoutOutputCollector传递给spout，用来收集spout发射的数据；之后就是根据numberOfInvocations参数来调用spout.nextTuple()方法来发射数据；numberOfInvocations是控制调用spout的nextTuple的次数，它可以在创建SpoutWrapper的时候在构造器中设置，如果使用没有numberOfInvocations参数的构造器，则该值为null，表示infinite loopSpoutCollector的emit方法内部调用了AbstractStormCollector.tansformAndEmit(它最后调用SpoutCollector.doEmit方法来发射)，针对多个stream的场景，封装了SplitStreamType的tuple给到doEmit方法；如果只有一个stream，则仅仅将普通的tuple传给doEmit方法flink的Task的run方法会调用StreamTask的invoke方法，而StreamTask的invoke方法会调用子类(这里子类为StoppableSourceStreamTask)的run方法，StoppableSourceStreamTask的run方法是直接父类SourceStreamTask来实现的，而它主要是调用了StreamSource的run方法，而StreamSource的run方法调用了userFunction.run(ctx)，这里的userFunction为SpoutWrapper，从而执行spout的nextTuple的逻辑，通过flink的SpoutCollector进行发射docStorm Compatibility Beta ...

聊聊flink如何兼容StormTopology

序本文主要研究一下flink如何兼容StormTopology实例 @Test public void testStormWordCount() throws Exception { //NOTE 1 build Topology the Storm way final TopologyBuilder builder = new TopologyBuilder(); builder.setSpout(“spout”, new RandomWordSpout(), 1); builder.setBolt(“count”, new WordCountBolt(), 5) .fieldsGrouping(“spout”, new Fields(“word”)); builder.setBolt(“print”, new PrintBolt(), 1) .shuffleGrouping(“count”); //NOTE 2 convert StormTopology to FlinkTopology FlinkTopology flinkTopology = FlinkTopology.createTopology(builder); //NOTE 3 execute program locally using FlinkLocalCluster Config conf = new Config(); // only required to stabilize integration test conf.put(FlinkLocalCluster.SUBMIT_BLOCKING, true); final FlinkLocalCluster cluster = FlinkLocalCluster.getLocalCluster(); cluster.submitTopology(“stormWordCount”, conf, flinkTopology); cluster.shutdown(); }这里使用FlinkLocalCluster.getLocalCluster()来创建或获取FlinkLocalCluster，之后调用FlinkLocalCluster.submitTopology来提交topology，结束时通过FlinkLocalCluster.shutdown来关闭cluster这里构建的RandomWordSpout继承自storm的BaseRichSpout，WordCountBolt继承自storm的BaseBasicBolt；PrintBolt继承自storm的BaseRichBolt(由于flink是使用的Checkpoint机制，不会转换storm的ack操作，因而这里用BaseBasicBolt还是BaseRichBolt都无特别要求)FlinkLocalCluster.submitTopology这里使用的topology是StormTopoloy转换后的FlinkTopologyLocalClusterFactoryflink-release-1.6.2/flink-contrib/flink-storm/src/main/java/org/apache/flink/storm/api/FlinkLocalCluster.java // ———————————————————————— // Access to default local cluster // ———————————————————————— // A different {@link FlinkLocalCluster} to be used for execution of ITCases private static LocalClusterFactory currentFactory = new DefaultLocalClusterFactory(); /** * Returns a {@link FlinkLocalCluster} that should be used for execution. If no cluster was set by * {@link #initialize(LocalClusterFactory)} in advance, a new {@link FlinkLocalCluster} is returned. * * @return a {@link FlinkLocalCluster} to be used for execution / public static FlinkLocalCluster getLocalCluster() { return currentFactory.createLocalCluster(); } /* * Sets a different factory for FlinkLocalClusters to be used for execution. * * @param clusterFactory * The LocalClusterFactory to create the local clusters for execution. / public static void initialize(LocalClusterFactory clusterFactory) { currentFactory = Objects.requireNonNull(clusterFactory); } // ———————————————————————— // Cluster factory // ———————————————————————— /* * A factory that creates local clusters. / public interface LocalClusterFactory { /* * Creates a local Flink cluster. * @return A local Flink cluster. / FlinkLocalCluster createLocalCluster(); } /* * A factory that instantiates a FlinkLocalCluster. / public static class DefaultLocalClusterFactory implements LocalClusterFactory { @Override public FlinkLocalCluster createLocalCluster() { return new FlinkLocalCluster(); } }flink在FlinkLocalCluster里头提供了一个静态方法getLocalCluster，用来获取FlinkLocalCluster，它是通过LocalClusterFactory来创建一个FlinkLocalClusterLocalClusterFactory这里使用的是DefaultLocalClusterFactory实现类，它的createLocalCluster方法，直接new了一个FlinkLocalCluster目前的实现来看，每次调用FlinkLocalCluster.getLocalCluster，都会创建一个新的FlinkLocalCluster，这个在调用的时候是需要注意一下的FlinkTopologyflink-release-1.6.2/flink-contrib/flink-storm/src/main/java/org/apache/flink/storm/api/FlinkTopology.java /* * Creates a Flink program that uses the specified spouts and bolts. * @param stormBuilder The Storm topology builder to use for creating the Flink topology. * @return A {@link FlinkTopology} which contains the translated Storm topology and may be executed. / public static FlinkTopology createTopology(TopologyBuilder stormBuilder) { return new FlinkTopology(stormBuilder); } private FlinkTopology(TopologyBuilder builder) { this.builder = builder; this.stormTopology = builder.createTopology(); // extract the spouts and bolts this.spouts = getPrivateField("_spouts"); this.bolts = getPrivateField("_bolts"); this.env = StreamExecutionEnvironment.getExecutionEnvironment(); // Kick off the translation immediately translateTopology(); }FlinkTopology提供了一个静态工厂方法createTopology用来创建FlinkTopologyFlinkTopology先保存一下TopologyBuilder，然后通过getPrivateField反射调用getDeclaredField获取_spouts、_bolts私有属性然后保存起来，方便后面转换topology使用之后先获取到ExecutionEnvironment，最后就是调用translateTopology进行整个StormTopology的转换translateTopologyflink-release-1.6.2/flink-contrib/flink-storm/src/main/java/org/apache/flink/storm/api/FlinkTopology.java /* * Creates a Flink program that uses the specified spouts and bolts. / private void translateTopology() { unprocessdInputsPerBolt.clear(); outputStreams.clear(); declarers.clear(); availableInputs.clear(); // Storm defaults to parallelism 1 env.setParallelism(1); / Translation of topology / for (final Entry<String, IRichSpout> spout : spouts.entrySet()) { final String spoutId = spout.getKey(); final IRichSpout userSpout = spout.getValue(); final FlinkOutputFieldsDeclarer declarer = new FlinkOutputFieldsDeclarer(); userSpout.declareOutputFields(declarer); final HashMap<String, Fields> sourceStreams = declarer.outputStreams; this.outputStreams.put(spoutId, sourceStreams); declarers.put(spoutId, declarer); final HashMap<String, DataStream<Tuple>> outputStreams = new HashMap<String, DataStream<Tuple>>(); final DataStreamSource<?> source; if (sourceStreams.size() == 1) { final SpoutWrapper<Tuple> spoutWrapperSingleOutput = new SpoutWrapper<Tuple>(userSpout, spoutId, null, null); spoutWrapperSingleOutput.setStormTopology(stormTopology); final String outputStreamId = (String) sourceStreams.keySet().toArray()[0]; DataStreamSource<Tuple> src = env.addSource(spoutWrapperSingleOutput, spoutId, declarer.getOutputType(outputStreamId)); outputStreams.put(outputStreamId, src); source = src; } else { final SpoutWrapper<SplitStreamType<Tuple>> spoutWrapperMultipleOutputs = new SpoutWrapper<SplitStreamType<Tuple>>( userSpout, spoutId, null, null); spoutWrapperMultipleOutputs.setStormTopology(stormTopology); @SuppressWarnings({ “unchecked”, “rawtypes” }) DataStreamSource<SplitStreamType<Tuple>> multiSource = env.addSource( spoutWrapperMultipleOutputs, spoutId, (TypeInformation) TypeExtractor.getForClass(SplitStreamType.class)); SplitStream<SplitStreamType<Tuple>> splitSource = multiSource .split(new StormStreamSelector<Tuple>()); for (String streamId : sourceStreams.keySet()) { SingleOutputStreamOperator<Tuple> outStream = splitSource.select(streamId) .map(new SplitStreamMapper<Tuple>()); outStream.getTransformation().setOutputType(declarer.getOutputType(streamId)); outputStreams.put(streamId, outStream); } source = multiSource; } availableInputs.put(spoutId, outputStreams); final ComponentCommon common = stormTopology.get_spouts().get(spoutId).get_common(); if (common.is_set_parallelism_hint()) { int dop = common.get_parallelism_hint(); source.setParallelism(dop); } else { common.set_parallelism_hint(1); } } /* * 1. Connect all spout streams with bolts streams * 2. Then proceed with the bolts stream already connected * * Because we do not know the order in which an iterator steps over a set, we might process a consumer before * its producer * ->thus, we might need to repeat multiple times / boolean makeProgress = true; while (bolts.size() > 0) { if (!makeProgress) { StringBuilder strBld = new StringBuilder(); strBld.append(“Unable to build Topology. Could not connect the following bolts:”); for (String boltId : bolts.keySet()) { strBld.append("\n “); strBld.append(boltId); strBld.append(”: missing input streams ["); for (Entry<GlobalStreamId, Grouping> streams : unprocessdInputsPerBolt .get(boltId)) { strBld.append("’"); strBld.append(streams.getKey().get_streamId()); strBld.append("’ from ‘"); strBld.append(streams.getKey().get_componentId()); strBld.append("’; “); } strBld.append(”]"); } throw new RuntimeException(strBld.toString()); } makeProgress = false; final Iterator<Entry<String, IRichBolt>> boltsIterator = bolts.entrySet().iterator(); while (boltsIterator.hasNext()) { final Entry<String, IRichBolt> bolt = boltsIterator.next(); final String boltId = bolt.getKey(); final IRichBolt userBolt = copyObject(bolt.getValue()); final ComponentCommon common = stormTopology.get_bolts().get(boltId).get_common(); Set<Entry<GlobalStreamId, Grouping>> unprocessedBoltInputs = unprocessdInputsPerBolt.get(boltId); if (unprocessedBoltInputs == null) { unprocessedBoltInputs = new HashSet<>(); unprocessedBoltInputs.addAll(common.get_inputs().entrySet()); unprocessdInputsPerBolt.put(boltId, unprocessedBoltInputs); } // check if all inputs are available final int numberOfInputs = unprocessedBoltInputs.size(); int inputsAvailable = 0; for (Entry<GlobalStreamId, Grouping> entry : unprocessedBoltInputs) { final String producerId = entry.getKey().get_componentId(); final String streamId = entry.getKey().get_streamId(); final HashMap<String, DataStream<Tuple>> streams = availableInputs.get(producerId); if (streams != null && streams.get(streamId) != null) { inputsAvailable++; } } if (inputsAvailable != numberOfInputs) { // traverse other bolts first until inputs are available continue; } else { makeProgress = true; boltsIterator.remove(); } final Map<GlobalStreamId, DataStream<Tuple>> inputStreams = new HashMap<>(numberOfInputs); for (Entry<GlobalStreamId, Grouping> input : unprocessedBoltInputs) { final GlobalStreamId streamId = input.getKey(); final Grouping grouping = input.getValue(); final String producerId = streamId.get_componentId(); final Map<String, DataStream<Tuple>> producer = availableInputs.get(producerId); inputStreams.put(streamId, processInput(boltId, userBolt, streamId, grouping, producer)); } final SingleOutputStreamOperator<?> outputStream = createOutput(boltId, userBolt, inputStreams); if (common.is_set_parallelism_hint()) { int dop = common.get_parallelism_hint(); outputStream.setParallelism(dop); } else { common.set_parallelism_hint(1); } } } }整个转换是先转换spout，再转换bolt，他们根据的spouts及bolts信息是在构造器里头使用反射从storm的TopologyBuilder对象获取到的flink使用FlinkOutputFieldsDeclarer(它实现了storm的OutputFieldsDeclarer接口)来承载storm的IRichSpout及IRichBolt里头配置的declareOutputFields信息，不过要注意的是flink不支持dirct emit；这里通过userSpout.declareOutputFields方法，将原始spout的declare信息设置到FlinkOutputFieldsDeclarerflink使用SpoutWrapper来包装spout，将其转换为RichParallelSourceFunction类型，这里对spout的outputStreams的个数是否大于1进行不同处理；之后就是将RichParallelSourceFunction作为StreamExecutionEnvironment.addSource方法的参数创建flink的DataStreamSource，并添加到availableInputs中，然后根据spout的parallelismHit来设置DataStreamSource的parallelism对于bolt的转换，这里维护了unprocessdInputsPerBolt，key为boltId，value为该bolt要连接的GlobalStreamId及Grouping方式，由于是使用map来进行遍历的，因此转换的bolt可能乱序，如果连接的GlobalStreamId存在则进行转换，然后从bolts中移除，bolt连接的GlobalStreamId不在availableInputs中的时候，需要跳过处理下一个，不会从bolts中移除，因为外层的循环条件是bolts的size大于0，就是依靠这个机制来处理乱序对于bolt的转换有一个重要的方法就是processInput，它把bolt的grouping转换为对spout的DataStream的对应操作(比如shuffleGrouping转换为对DataStream的rebalance操作，fieldsGrouping转换为对DataStream的keyBy操作，globalGrouping转换为global操作，allGrouping转换为broadcast操作)，之后调用createOutput方法转换bolt的执行逻辑，它使用BoltWrapper或者MergedInputsBoltWrapper将bolt转换为flink的OneInputStreamOperator，然后作为参数对stream进行transform操作返回flink的SingleOutputStreamOperator，同时将转换后的SingleOutputStreamOperator添加到availableInputs中，之后根据bolt的parallelismHint对这个SingleOutputStreamOperator设置parallelismFlinkLocalClusterflink-storm_2.11-1.6.2-sources.jar!/org/apache/flink/storm/api/FlinkLocalCluster.java/* * {@link FlinkLocalCluster} mimics a Storm {@link LocalCluster}. /public class FlinkLocalCluster { /* The log used by this mini cluster. / private static final Logger LOG = LoggerFactory.getLogger(FlinkLocalCluster.class); /* The Flink mini cluster on which to execute the programs. / private FlinkMiniCluster flink; /* Configuration key to submit topology in blocking mode if flag is set to {@code true}. */ public static final String SUBMIT_BLOCKING = “SUBMIT_STORM_TOPOLOGY_BLOCKING”; public FlinkLocalCluster() { } public FlinkLocalCluster(FlinkMiniCluster flink) { this.flink = Objects.requireNonNull(flink); } @SuppressWarnings(“rawtypes”) public void submitTopology(final String topologyName, final Map conf, final FlinkTopology topology) throws Exception { this.submitTopologyWithOpts(topologyName, conf, topology, null); } @SuppressWarnings(“rawtypes”) public void submitTopologyWithOpts(final String topologyName, final Map conf, final FlinkTopology topology, final SubmitOptions submitOpts) throws Exception { LOG.info(“Running Storm topology on FlinkLocalCluster”); boolean submitBlocking = false; if (conf != null) { Object blockingFlag = conf.get(SUBMIT_BLOCKING); if (blockingFlag instanceof Boolean) { submitBlocking = ((Boolean) blockingFlag).booleanValue(); } } FlinkClient.addStormConfigToTopology(topology, conf); StreamGraph streamGraph = topology.getExecutionEnvironment().getStreamGraph(); streamGraph.setJobName(topologyName); JobGraph jobGraph = streamGraph.getJobGraph(); if (this.flink == null) { Configuration configuration = new Configuration(); configuration.addAll(jobGraph.getJobConfiguration()); configuration.setString(TaskManagerOptions.MANAGED_MEMORY_SIZE, “0”); configuration.setInteger(TaskManagerOptions.NUM_TASK_SLOTS, jobGraph.getMaximumParallelism()); this.flink = new LocalFlinkMiniCluster(configuration, true); this.flink.start(); } if (submitBlocking) { this.flink.submitJobAndWait(jobGraph, false); } else { this.flink.submitJobDetached(jobGraph); } } public void killTopology(final String topologyName) { this.killTopologyWithOpts(topologyName, null); } public void killTopologyWithOpts(final String name, final KillOptions options) { } public void activate(final String topologyName) { } public void deactivate(final String topologyName) { } public void rebalance(final String name, final RebalanceOptions options) { } public void shutdown() { if (this.flink != null) { this.flink.stop(); this.flink = null; } } //……}FlinkLocalCluster的submitTopology方法调用了submitTopologyWithOpts，而后者主要是设置一些参数，调用topology.getExecutionEnvironment().getStreamGraph()根据transformations生成StreamGraph，再获取JobGraph，然后创建LocalFlinkMiniCluster并start，最后使用LocalFlinkMiniCluster的submitJobAndWait或submitJobDetached来提交整个JobGraph小结flink通过FlinkTopology对storm提供了一定的兼容性，这对于迁移storm到flink非常有帮助要在flink上运行storm的topology，主要有几个步骤，分别是构建storm原生的TopologyBuilder，之后通过FlinkTopology.createTopology(builder)来将StormTopology转换为FlinkTopology，最后是通过FlinkLocalCluster(本地模式)或者FlinkSubmitter(远程提交)的submitTopology方法提交FlinkTopologyFlinkTopology是flink兼容storm的核心，它负责将StormTopology转换为flink对应的结构，比如使用SpoutWrapper将spout转换为RichParallelSourceFunction，然后添加到StreamExecutionEnvironment创建DataStream，把bolt的grouping转换为对spout的DataStream的对应操作(比如shuffleGrouping转换为对DataStream的rebalance操作，fieldsGrouping转换为对DataStream的keyBy操作，globalGrouping转换为global操作，allGrouping转换为broadcast操作)，然后使用BoltWrapper或者MergedInputsBoltWrapper将bolt转换为flink的OneInputStreamOperator，然后作为参数对stream进行transform操作构建完FlinkTopology之后，就使用FlinkLocalCluster提交到本地执行，或者使用FlinkSubmitter提交到远程执行FlinkLocalCluster的submitTopology方法主要是通过FlinkTopology作用的StreamExecutionEnvironment生成StreamGraph，通过它获取JobGraph，然后创建LocalFlinkMiniCluster并start，最后通过LocalFlinkMiniCluster提交JobGraphdocStorm Compatibility Beta ...

聊聊flink LocalEnvironment的execute方法

序本文主要研究一下flink LocalEnvironment的execute方法实例 final ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment(); DataSet<RecordDto> csvInput = env.readCsvFile(csvFilePath) .pojoType(RecordDto.class, “playerName”, “country”, “year”, “game”, “gold”, “silver”, “bronze”, “total”); DataSet<Tuple2<String, Integer>> groupedByCountry = csvInput .flatMap(new FlatMapFunction<RecordDto, Tuple2<String, Integer>>() { private static final long serialVersionUID = 1L; @Override public void flatMap(RecordDto record, Collector<Tuple2<String, Integer>> out) throws Exception { out.collect(new Tuple2<String, Integer>(record.getCountry(), 1)); } }).groupBy(0).sum(1); System.out.println("===groupedByCountry==="); groupedByCountry.print();这里使用DataSet从csv读取数据，然后进行flatMap、groupBy、sum操作，最后调用print输出DataSet.printflink-java-1.6.2-sources.jar!/org/apache/flink/api/java/DataSet.java /** * Prints the elements in a DataSet to the standard output stream {@link System#out} of the JVM that calls * the print() method. For programs that are executed in a cluster, this method needs * to gather the contents of the DataSet back to the client, to print it there. * * The string written for each element is defined by the {@link Object#toString()} method. * * This method immediately triggers the program execution, similar to the * {@link #collect()} and {@link #count()} methods. * * @see #printToErr() * @see #printOnTaskManager(String) / public void print() throws Exception { List<T> elements = collect(); for (T e: elements) { System.out.println(e); } }print方法这里主要是调用collect方法，获取结果，然后挨个打印DataSet.collectflink-java-1.6.2-sources.jar!/org/apache/flink/api/java/DataSet.java /* * Convenience method to get the elements of a DataSet as a List. * As DataSet can contain a lot of data, this method should be used with caution. * * @return A List containing the elements of the DataSet / public List<T> collect() throws Exception { final String id = new AbstractID().toString(); final TypeSerializer<T> serializer = getType().createSerializer(getExecutionEnvironment().getConfig()); this.output(new Utils.CollectHelper<>(id, serializer)).name(“collect()”); JobExecutionResult res = getExecutionEnvironment().execute(); ArrayList<byte[]> accResult = res.getAccumulatorResult(id); if (accResult != null) { try { return SerializedListAccumulator.deserializeList(accResult, serializer); } catch (ClassNotFoundException e) { throw new RuntimeException(“Cannot find type class of collected data type.”, e); } catch (IOException e) { throw new RuntimeException(“Serialization error while deserializing collected data”, e); } } else { throw new RuntimeException(“The call to collect() could not retrieve the DataSet.”); } }这里调用了getExecutionEnvironment().execute()来获取JobExecutionResult；executionEnvironment这里是LocalEnvironmentExecutionEnvironment.executeflink-java-1.6.2-sources.jar!/org/apache/flink/api/java/ExecutionEnvironment.java /* * Triggers the program execution. The environment will execute all parts of the program that have * resulted in a “sink” operation. Sink operations are for example printing results ({@link DataSet#print()}, * writing results (e.g. {@link DataSet#writeAsText(String)}, * {@link DataSet#write(org.apache.flink.api.common.io.FileOutputFormat, String)}, or other generic * data sinks created with {@link DataSet#output(org.apache.flink.api.common.io.OutputFormat)}. * * The program execution will be logged and displayed with a generated default name. * * @return The result of the job execution, containing elapsed time and accumulators. * @throws Exception Thrown, if the program executions fails. / public JobExecutionResult execute() throws Exception { return execute(getDefaultName()); } /* * Gets a default job name, based on the timestamp when this method is invoked. * * @return A default job name. / private static String getDefaultName() { return “Flink Java Job at " + Calendar.getInstance().getTime(); } /* * Triggers the program execution. The environment will execute all parts of the program that have * resulted in a “sink” operation. Sink operations are for example printing results ({@link DataSet#print()}, * writing results (e.g. {@link DataSet#writeAsText(String)}, * {@link DataSet#write(org.apache.flink.api.common.io.FileOutputFormat, String)}, or other generic * data sinks created with {@link DataSet#output(org.apache.flink.api.common.io.OutputFormat)}. * * The program execution will be logged and displayed with the given job name. * * @return The result of the job execution, containing elapsed time and accumulators. * @throws Exception Thrown, if the program executions fails. / public abstract JobExecutionResult execute(String jobName) throws Exception;具体的execute抽象方法由子类去实现，这里我们主要看一下LocalEnvironment的execute方法LocalEnvironment.executeflink-java-1.6.2-sources.jar!/org/apache/flink/api/java/LocalEnvironment.java @Override public JobExecutionResult execute(String jobName) throws Exception { if (executor == null) { startNewSession(); } Plan p = createProgramPlan(jobName); // Session management is disabled, revert this commit to enable //p.setJobId(jobID); //p.setSessionTimeout(sessionTimeout); JobExecutionResult result = executor.executePlan(p); this.lastJobExecutionResult = result; return result; } @Override @PublicEvolving public void startNewSession() throws Exception { if (executor != null) { // we need to end the previous session executor.stop(); // create also a new JobID jobID = JobID.generate(); } // create a new local executor executor = PlanExecutor.createLocalExecutor(configuration); executor.setPrintStatusDuringExecution(getConfig().isSysoutLoggingEnabled()); // if we have a session, start the mini cluster eagerly to have it available across sessions if (getSessionTimeout() > 0) { executor.start(); // also install the reaper that will shut it down eventually executorReaper = new ExecutorReaper(executor); } }这里判断executor为null的话，会调用startNewSession，startNewSession通过PlanExecutor.createLocalExecutor(configuration)来创建executor；如果sessionTimeout大于0，则这里会立马调用executor.start()，默认该值为0之后通过createProgramPlan方法来创建plan最后通过executor.executePlan(p)来获取JobExecutionResultPlanExecutor.createLocalExecutorflink-core-1.6.2-sources.jar!/org/apache/flink/api/common/PlanExecutor.java private static final String LOCAL_EXECUTOR_CLASS = “org.apache.flink.client.LocalExecutor”; /* * Creates an executor that runs the plan locally in a multi-threaded environment. * * @return A local executor. / public static PlanExecutor createLocalExecutor(Configuration configuration) { Class<? extends PlanExecutor> leClass = loadExecutorClass(LOCAL_EXECUTOR_CLASS); try { return leClass.getConstructor(Configuration.class).newInstance(configuration); } catch (Throwable t) { throw new RuntimeException(“An error occurred while loading the local executor (” + LOCAL_EXECUTOR_CLASS + “).”, t); } } private static Class<? extends PlanExecutor> loadExecutorClass(String className) { try { Class<?> leClass = Class.forName(className); return leClass.asSubclass(PlanExecutor.class); } catch (ClassNotFoundException cnfe) { throw new RuntimeException(“Could not load the executor class (” + className + “). Do you have the ‘flink-clients’ project in your dependencies?”); } catch (Throwable t) { throw new RuntimeException(“An error occurred while loading the executor (” + className + “).”, t); } }PlanExecutor.createLocalExecutor方法通过反射创建org.apache.flink.client.LocalExecutorLocalExecutor.executePlanflink-clients_2.11-1.6.2-sources.jar!/org/apache/flink/client/LocalExecutor.java /* * Executes the given program on a local runtime and waits for the job to finish. * * If the executor has not been started before, this starts the executor and shuts it down * after the job finished. If the job runs in session mode, the executor is kept alive until * no more references to the executor exist. * * @param plan The plan of the program to execute. * @return The net runtime of the program, in milliseconds. * * @throws Exception Thrown, if either the startup of the local execution context, or the execution * caused an exception. / @Override public JobExecutionResult executePlan(Plan plan) throws Exception { if (plan == null) { throw new IllegalArgumentException(“The plan may not be null.”); } synchronized (this.lock) { // check if we start a session dedicated for this execution final boolean shutDownAtEnd; if (jobExecutorService == null) { shutDownAtEnd = true; // configure the number of local slots equal to the parallelism of the local plan if (this.taskManagerNumSlots == DEFAULT_TASK_MANAGER_NUM_SLOTS) { int maxParallelism = plan.getMaximumParallelism(); if (maxParallelism > 0) { this.taskManagerNumSlots = maxParallelism; } } // start the cluster for us start(); } else { // we use the existing session shutDownAtEnd = false; } try { // TODO: Set job’s default parallelism to max number of slots final int slotsPerTaskManager = jobExecutorServiceConfiguration.getInteger(TaskManagerOptions.NUM_TASK_SLOTS, taskManagerNumSlots); final int numTaskManagers = jobExecutorServiceConfiguration.getInteger(ConfigConstants.LOCAL_NUMBER_TASK_MANAGER, 1); plan.setDefaultParallelism(slotsPerTaskManager * numTaskManagers); Optimizer pc = new Optimizer(new DataStatistics(), jobExecutorServiceConfiguration); OptimizedPlan op = pc.compile(plan); JobGraphGenerator jgg = new JobGraphGenerator(jobExecutorServiceConfiguration); JobGraph jobGraph = jgg.compileJobGraph(op, plan.getJobId()); return jobExecutorService.executeJobBlocking(jobGraph); } finally { if (shutDownAtEnd) { stop(); } } } }这里当jobExecutorService为null的时候，会调用start方法启动cluster创建jobExecutorService之后创建JobGraphGenerator，然后通过JobGraphGenerator.compileJobGraph方法，将plan构建为JobGraph最后调用jobExecutorService.executeJobBlocking(jobGraph)，执行这个jobGraph，然后返回JobExecutionResultLocalExecutor.startflink-clients_2.11-1.6.2-sources.jar!/org/apache/flink/client/LocalExecutor.java @Override public void start() throws Exception { synchronized (lock) { if (jobExecutorService == null) { // create the embedded runtime jobExecutorServiceConfiguration = createConfiguration(); // start it up jobExecutorService = createJobExecutorService(jobExecutorServiceConfiguration); } else { throw new IllegalStateException(“The local executor was already started.”); } } } private Configuration createConfiguration() { Configuration newConfiguration = new Configuration(); newConfiguration.setInteger(TaskManagerOptions.NUM_TASK_SLOTS, getTaskManagerNumSlots()); newConfiguration.setBoolean(CoreOptions.FILESYTEM_DEFAULT_OVERRIDE, isDefaultOverwriteFiles()); newConfiguration.addAll(baseConfiguration); return newConfiguration; } private JobExecutorService createJobExecutorService(Configuration configuration) throws Exception { final JobExecutorService newJobExecutorService; if (CoreOptions.NEW_MODE.equals(configuration.getString(CoreOptions.MODE))) { if (!configuration.contains(RestOptions.PORT)) { configuration.setInteger(RestOptions.PORT, 0); } final MiniClusterConfiguration miniClusterConfiguration = new MiniClusterConfiguration.Builder() .setConfiguration(configuration) .setNumTaskManagers( configuration.getInteger( ConfigConstants.LOCAL_NUMBER_TASK_MANAGER, ConfigConstants.DEFAULT_LOCAL_NUMBER_TASK_MANAGER)) .setRpcServiceSharing(RpcServiceSharing.SHARED) .setNumSlotsPerTaskManager( configuration.getInteger( TaskManagerOptions.NUM_TASK_SLOTS, 1)) .build(); final MiniCluster miniCluster = new MiniCluster(miniClusterConfiguration); miniCluster.start(); configuration.setInteger(RestOptions.PORT, miniCluster.getRestAddress().getPort()); newJobExecutorService = miniCluster; } else { final LocalFlinkMiniCluster localFlinkMiniCluster = new LocalFlinkMiniCluster(configuration, true); localFlinkMiniCluster.start(); newJobExecutorService = localFlinkMiniCluster; } return newJobExecutorService; }start方法这里先通过createConfiguration创建配置文件，再通过createJobExecutorService创建JobExecutorServicecreateConfiguration主要设置了TaskManagerOptions.NUM_TASK_SLOTS以及CoreOptions.FILESYTEM_DEFAULT_OVERRIDEcreateJobExecutorService方法这里主要是根据configuration.getString(CoreOptions.MODE)的配置来创建不同的newJobExecutorService默认是CoreOptions.NEW_MODE模式，它先创建MiniClusterConfiguration，然后创建MiniCluster(JobExecutorService)，然后调用MiniCluster.start方法启动之后返回非CoreOptions.NEW_MODE模式，则创建的是LocalFlinkMiniCluster(JobExecutorService)，然后调用LocalFlinkMiniCluster.start()启动之后返回MiniCluster.executeJobBlockingflink-runtime_2.11-1.6.2-sources.jar!/org/apache/flink/runtime/minicluster/MiniCluster.java /* * This method runs a job in blocking mode. The method returns only after the job * completed successfully, or after it failed terminally. * * @param job The Flink job to execute * @return The result of the job execution * * @throws JobExecutionException Thrown if anything went amiss during initial job launch, * or if the job terminally failed. */ @Override public JobExecutionResult executeJobBlocking(JobGraph job) throws JobExecutionException, InterruptedException { checkNotNull(job, “job is null”); final CompletableFuture<JobSubmissionResult> submissionFuture = submitJob(job); final CompletableFuture<JobResult> jobResultFuture = submissionFuture.thenCompose( (JobSubmissionResult ignored) -> requestJobResult(job.getJobID())); final JobResult jobResult; try { jobResult = jobResultFuture.get(); } catch (ExecutionException e) { throw new JobExecutionException(job.getJobID(), “Could not retrieve JobResult.”, ExceptionUtils.stripExecutionException(e)); } try { return jobResult.toJobExecutionResult(Thread.currentThread().getContextClassLoader()); } catch (IOException | ClassNotFoundException e) { throw new JobExecutionException(job.getJobID(), e); } }MiniCluster.executeJobBlocking方法，先调用submitJob(job)方法，提交这个JobGraph，它返回一个CompletableFuture(submissionFuture)该CompletableFuture(submissionFuture)通过thenCompose连接了requestJobResult方法来根据jobId请求jobResult(jobResultFuture)最后通过jobResultFuture.get()获取JobExecutionResult小结DataSet的print方法调用了collect方法，而collect方法则调用getExecutionEnvironment().execute()来获取JobExecutionResult，executionEnvironment这里是LocalEnvironmentExecutionEnvironment.execute方法内部调用了抽象方法execute(String jobName)，该抽象方法由子类实现，这里是LocalEnvironment.execute，它先通过startNewSession，使用PlanExecutor.createLocalExecutor创建LocalExecutor，之后通过createProgramPlan创建plan，最后调用LocalExecutor.executePlan来获取JobExecutionResultLocalExecutor.executePlan方法它先判断jobExecutorService，如果为null，则调用start方法创建jobExecutorService(这里根据CoreOptions.MODE配置，如果是CoreOptions.NEW_MODE则创建的jobExecutorService是MiniCluster，否则创建的jobExecutorService是LocalFlinkMiniCluster)，这里创建的jobExecutorService为MiniCluster；之后通过JobGraphGenerator将plan转换为jobGraph；最后调用jobExecutorService.executeJobBlocking(jobGraph)，执行这个jobGraph，然后返回JobExecutionResultdocLocalEnvironmentLocalExecutorMiniCluster ...

聊聊storm的OpaquePartitionedTridentSpoutExecutor

序本文主要研究一下storm的OpaquePartitionedTridentSpoutExecutorTridentTopology.newStreamstorm-core-1.2.2-sources.jar!/org/apache/storm/trident/TridentTopology.java public Stream newStream(String txId, IOpaquePartitionedTridentSpout spout) { return newStream(txId, new OpaquePartitionedTridentSpoutExecutor(spout)); }TridentTopology.newStream方法，对于IOpaquePartitionedTridentSpout类型的spout会使用OpaquePartitionedTridentSpoutExecutor来包装；而KafkaTridentSpoutOpaque则实现了IOpaquePartitionedTridentSpout接口TridentTopologyBuilder.buildTopologystorm-core-1.2.2-sources.jar!/org/apache/storm/trident/topology/TridentTopologyBuilder.java public StormTopology buildTopology(Map<String, Number> masterCoordResources) { TopologyBuilder builder = new TopologyBuilder(); Map<GlobalStreamId, String> batchIdsForSpouts = fleshOutStreamBatchIds(false); Map<GlobalStreamId, String> batchIdsForBolts = fleshOutStreamBatchIds(true); Map<String, List<String>> batchesToCommitIds = new HashMap<>(); Map<String, List<ITridentSpout>> batchesToSpouts = new HashMap<>(); for(String id: _spouts.keySet()) { TransactionalSpoutComponent c = _spouts.get(id); if(c.spout instanceof IRichSpout) { //TODO: wrap this to set the stream name builder.setSpout(id, (IRichSpout) c.spout, c.parallelism); } else { String batchGroup = c.batchGroupId; if(!batchesToCommitIds.containsKey(batchGroup)) { batchesToCommitIds.put(batchGroup, new ArrayList<String>()); } batchesToCommitIds.get(batchGroup).add(c.commitStateId); if(!batchesToSpouts.containsKey(batchGroup)) { batchesToSpouts.put(batchGroup, new ArrayList<ITridentSpout>()); } batchesToSpouts.get(batchGroup).add((ITridentSpout) c.spout); BoltDeclarer scd = builder.setBolt(spoutCoordinator(id), new TridentSpoutCoordinator(c.commitStateId, (ITridentSpout) c.spout)) .globalGrouping(masterCoordinator(c.batchGroupId), MasterBatchCoordinator.BATCH_STREAM_ID) .globalGrouping(masterCoordinator(c.batchGroupId), MasterBatchCoordinator.SUCCESS_STREAM_ID); for(Map<String, Object> m: c.componentConfs) { scd.addConfigurations(m); } Map<String, TridentBoltExecutor.CoordSpec> specs = new HashMap(); specs.put(c.batchGroupId, new CoordSpec()); BoltDeclarer bd = builder.setBolt(id, new TridentBoltExecutor( new TridentSpoutExecutor( c.commitStateId, c.streamName, ((ITridentSpout) c.spout)), batchIdsForSpouts, specs), c.parallelism); bd.allGrouping(spoutCoordinator(id), MasterBatchCoordinator.BATCH_STREAM_ID); bd.allGrouping(masterCoordinator(batchGroup), MasterBatchCoordinator.SUCCESS_STREAM_ID); if(c.spout instanceof ICommitterTridentSpout) { bd.allGrouping(masterCoordinator(batchGroup), MasterBatchCoordinator.COMMIT_STREAM_ID); } for(Map<String, Object> m: c.componentConfs) { bd.addConfigurations(m); } } } //…… return builder.createTopology(); }TridentTopologyBuilder.buildTopology会将IOpaquePartitionedTridentSpout(OpaquePartitionedTridentSpoutExecutor)使用TridentSpoutExecutor包装，然后再使用TridentBoltExecutor包装为boltOpaquePartitionedTridentSpoutExecutorstorm-core-1.2.2-sources.jar!/org/apache/storm/trident/spout/OpaquePartitionedTridentSpoutExecutor.javapublic class OpaquePartitionedTridentSpoutExecutor implements ICommitterTridentSpout<Object> { protected final Logger LOG = LoggerFactory.getLogger(OpaquePartitionedTridentSpoutExecutor.class); IOpaquePartitionedTridentSpout<Object, ISpoutPartition, Object> _spout; //…… public OpaquePartitionedTridentSpoutExecutor(IOpaquePartitionedTridentSpout<Object, ISpoutPartition, Object> spout) { _spout = spout; } @Override public ITridentSpout.BatchCoordinator<Object> getCoordinator(String txStateId, Map conf, TopologyContext context) { return new Coordinator(conf, context); } @Override public ICommitterTridentSpout.Emitter getEmitter(String txStateId, Map conf, TopologyContext context) { return new Emitter(txStateId, conf, context); } @Override public Fields getOutputFields() { return _spout.getOutputFields(); } @Override public Map<String, Object> getComponentConfiguration() { return _spout.getComponentConfiguration(); } }OpaquePartitionedTridentSpoutExecutor实现了ICommitterTridentSpout，这里getCoordinator返回的是ITridentSpout.BatchCoordinator，getEmitter返回的是ICommitterTridentSpout.EmitterITridentSpout.BatchCoordinatorstorm-core-1.2.2-sources.jar!/org/apache/storm/trident/spout/OpaquePartitionedTridentSpoutExecutor.java public class Coordinator implements ITridentSpout.BatchCoordinator<Object> { IOpaquePartitionedTridentSpout.Coordinator _coordinator; public Coordinator(Map conf, TopologyContext context) { _coordinator = _spout.getCoordinator(conf, context); } @Override public Object initializeTransaction(long txid, Object prevMetadata, Object currMetadata) { LOG.debug(“Initialize Transaction. [txid = {}], [prevMetadata = {}], [currMetadata = {}]”, txid, prevMetadata, currMetadata); return _coordinator.getPartitionsForBatch(); } @Override public void close() { LOG.debug(“Closing”); _coordinator.close(); LOG.debug(“Closed”); } @Override public void success(long txid) { LOG.debug(“Success [txid = {}]”, txid); } @Override public boolean isReady(long txid) { boolean ready = _coordinator.isReady(txid); LOG.debug("[isReady = {}], [txid = {}]", ready, txid); return ready; } }包装了spout的_coordinator，它的类型IOpaquePartitionedTridentSpout.Coordinator，这里仅仅是多了debug日志ICommitterTridentSpout.Emitterstorm-core-1.2.2-sources.jar!/org/apache/storm/trident/spout/OpaquePartitionedTridentSpoutExecutor.java public class Emitter implements ICommitterTridentSpout.Emitter { IOpaquePartitionedTridentSpout.Emitter<Object, ISpoutPartition, Object> _emitter; TransactionalState _state; TreeMap<Long, Map<String, Object>> _cachedMetas = new TreeMap<>(); Map<String, EmitterPartitionState> _partitionStates = new HashMap<>(); int _index; int _numTasks; public Emitter(String txStateId, Map conf, TopologyContext context) { _emitter = _spout.getEmitter(conf, context); _index = context.getThisTaskIndex(); _numTasks = context.getComponentTasks(context.getThisComponentId()).size(); _state = TransactionalState.newUserState(conf, txStateId); LOG.debug(“Created {}”, this); } Object _savedCoordinatorMeta = null; boolean _changedMeta = false; @Override public void emitBatch(TransactionAttempt tx, Object coordinatorMeta, TridentCollector collector) { LOG.debug(“Emitting Batch. [transaction = {}], [coordinatorMeta = {}], [collector = {}], [{}]”, tx, coordinatorMeta, collector, this); if(_savedCoordinatorMeta==null || !_savedCoordinatorMeta.equals(coordinatorMeta)) { _partitionStates.clear(); final List<ISpoutPartition> taskPartitions = _emitter.getPartitionsForTask(_index, _numTasks, coordinatorMeta); for (ISpoutPartition partition : taskPartitions) { _partitionStates.put(partition.getId(), new EmitterPartitionState(new RotatingTransactionalState(_state, partition.getId()), partition)); } // refresh all partitions for backwards compatibility with old spout _emitter.refreshPartitions(_emitter.getOrderedPartitions(coordinatorMeta)); _savedCoordinatorMeta = coordinatorMeta; _changedMeta = true; } Map<String, Object> metas = new HashMap<>(); _cachedMetas.put(tx.getTransactionId(), metas); Entry<Long, Map<String, Object>> entry = _cachedMetas.lowerEntry(tx.getTransactionId()); Map<String, Object> prevCached; if(entry!=null) { prevCached = entry.getValue(); } else { prevCached = new HashMap<>(); } for(Entry<String, EmitterPartitionState> e: _partitionStates.entrySet()) { String id = e.getKey(); EmitterPartitionState s = e.getValue(); s.rotatingState.removeState(tx.getTransactionId()); Object lastMeta = prevCached.get(id); if(lastMeta==null) lastMeta = s.rotatingState.getLastState(); Object meta = _emitter.emitPartitionBatch(tx, collector, s.partition, lastMeta); metas.put(id, meta); } LOG.debug(“Emitted Batch. [transaction = {}], [coordinatorMeta = {}], [collector = {}], [{}]”, tx, coordinatorMeta, collector, this); } @Override public void success(TransactionAttempt tx) { for(EmitterPartitionState state: _partitionStates.values()) { state.rotatingState.cleanupBefore(tx.getTransactionId()); } LOG.debug(“Success transaction {}. [{}]”, tx, this); } @Override public void commit(TransactionAttempt attempt) { LOG.debug(“Committing transaction {}. [{}]”, attempt, this); // this code here handles a case where a previous commit failed, and the partitions // changed since the last commit. This clears out any state for the removed partitions // for this txid. // we make sure only a single task ever does this. we’re also guaranteed that // it’s impossible for there to be another writer to the directory for that partition // because only a single commit can be happening at once. this is because in order for // another attempt of the batch to commit, the batch phase must have succeeded in between. // hence, all tasks for the prior commit must have finished committing (whether successfully or not) if(_changedMeta && _index==0) { Set<String> validIds = new HashSet<>(); for(ISpoutPartition p: _emitter.getOrderedPartitions(_savedCoordinatorMeta)) { validIds.add(p.getId()); } for(String existingPartition: _state.list("")) { if(!validIds.contains(existingPartition)) { RotatingTransactionalState s = new RotatingTransactionalState(_state, existingPartition); s.removeState(attempt.getTransactionId()); } } _changedMeta = false; } Long txid = attempt.getTransactionId(); Map<String, Object> metas = _cachedMetas.remove(txid); for(Entry<String, Object> entry: metas.entrySet()) { _partitionStates.get(entry.getKey()).rotatingState.overrideState(txid, entry.getValue()); } LOG.debug(“Exiting commit method for transaction {}. [{}]”, attempt, this); } @Override public void close() { LOG.debug(“Closing”); _emitter.close(); LOG.debug(“Closed”); } @Override public String toString() { return “Emitter{” + “, _state=” + _state + “, _cachedMetas=” + _cachedMetas + “, _partitionStates=” + _partitionStates + “, _index=” + _index + “, _numTasks=” + _numTasks + “, _savedCoordinatorMeta=” + _savedCoordinatorMeta + “, _changedMeta=” + _changedMeta + ‘}’; } } static class EmitterPartitionState { public RotatingTransactionalState rotatingState; public ISpoutPartition partition; public EmitterPartitionState(RotatingTransactionalState s, ISpoutPartition p) { rotatingState = s; partition = p; } }这里对spout的IOpaquePartitionedTridentSpout.Emitter进行了封装，_partitionStates使用了EmitterPartitionStateemitBatch方法首先计算_partitionStates，然后计算prevCached，最后调用_emitter.emitPartitionBatch(tx, collector, s.partition, lastMeta)success方法调用state.rotatingState.cleanupBefore(tx.getTransactionId())，清空该txid之前的状态信息；commit方法主要是更新_partitionStatesKafkaTridentSpoutOpaquestorm-kafka-client-1.2.2-sources.jar!/org/apache/storm/kafka/spout/trident/KafkaTridentSpoutOpaque.javapublic class KafkaTridentSpoutOpaque<K,V> implements IOpaquePartitionedTridentSpout<List<Map<String, Object>>, KafkaTridentSpoutTopicPartition, Map<String, Object>> { private static final long serialVersionUID = -8003272486566259640L; private static final Logger LOG = LoggerFactory.getLogger(KafkaTridentSpoutOpaque.class); private final KafkaTridentSpoutManager<K, V> kafkaManager; public KafkaTridentSpoutOpaque(KafkaSpoutConfig<K, V> conf) { this(new KafkaTridentSpoutManager<>(conf)); } public KafkaTridentSpoutOpaque(KafkaTridentSpoutManager<K, V> kafkaManager) { this.kafkaManager = kafkaManager; LOG.debug(“Created {}”, this.toString()); } @Override public Emitter<List<Map<String, Object>>, KafkaTridentSpoutTopicPartition, Map<String, Object>> getEmitter( Map conf, TopologyContext context) { return new KafkaTridentSpoutEmitter<>(kafkaManager, context); } @Override public Coordinator<List<Map<String, Object>>> getCoordinator(Map conf, TopologyContext context) { return new KafkaTridentSpoutOpaqueCoordinator<>(kafkaManager); } @Override public Map<String, Object> getComponentConfiguration() { return null; } @Override public Fields getOutputFields() { final Fields outputFields = kafkaManager.getFields(); LOG.debug(“OutputFields = {}”, outputFields); return outputFields; } @Override public final String toString() { return super.toString() + “{kafkaManager=” + kafkaManager + ‘}’; }}KafkaTridentSpoutOpaque的getCoordinator返回的是KafkaTridentSpoutOpaqueCoordinator；getEmitter返回的是KafkaTridentSpoutEmitterKafkaTridentSpoutOpaqueCoordinatorstorm-kafka-client-1.2.2-sources.jar!/org/apache/storm/kafka/spout/trident/KafkaTridentSpoutOpaqueCoordinator.javapublic class KafkaTridentSpoutOpaqueCoordinator<K,V> implements IOpaquePartitionedTridentSpout.Coordinator<List<Map<String, Object>>>, Serializable { private static final Logger LOG = LoggerFactory.getLogger(KafkaTridentSpoutOpaqueCoordinator.class); private final TopicPartitionSerializer tpSerializer = new TopicPartitionSerializer(); private final KafkaTridentSpoutManager<K,V> kafkaManager; public KafkaTridentSpoutOpaqueCoordinator(KafkaTridentSpoutManager<K, V> kafkaManager) { this.kafkaManager = kafkaManager; LOG.debug(“Created {}”, this.toString()); } @Override public boolean isReady(long txid) { LOG.debug(“isReady = true”); return true; // the “old” trident kafka spout always returns true, like this } @Override public List<Map<String, Object>> getPartitionsForBatch() { final ArrayList<TopicPartition> topicPartitions = new ArrayList<>(kafkaManager.getTopicPartitions()); LOG.debug(“TopicPartitions for batch {}”, topicPartitions); List<Map<String, Object>> tps = new ArrayList<>(); for(TopicPartition tp : topicPartitions) { tps.add(tpSerializer.toMap(tp)); } return tps; } @Override public void close() { LOG.debug(“Closed”); // the “old” trident kafka spout is no op like this } @Override public final String toString() { return super.toString() + “{kafkaManager=” + kafkaManager + ‘}’; }}这里的isReady始终返回true，getPartitionsForBatch方法主要是将kafkaManager.getTopicPartitions()信息转换为map结构KafkaTridentSpoutEmitterstorm-kafka-client-1.2.2-sources.jar!/org/apache/storm/kafka/spout/trident/KafkaTridentSpoutEmitter.javapublic class KafkaTridentSpoutEmitter<K, V> implements IOpaquePartitionedTridentSpout.Emitter< List<Map<String, Object>>, KafkaTridentSpoutTopicPartition, Map<String, Object>>, Serializable { private static final long serialVersionUID = -7343927794834130435L; private static final Logger LOG = LoggerFactory.getLogger(KafkaTridentSpoutEmitter.class); // Kafka private final KafkaConsumer<K, V> kafkaConsumer; // Bookkeeping private final KafkaTridentSpoutManager<K, V> kafkaManager; // set of topic-partitions for which first poll has already occurred, and the first polled txid private final Map<TopicPartition, Long> firstPollTransaction = new HashMap<>(); // Declare some KafkaTridentSpoutManager references for convenience private final long pollTimeoutMs; private final KafkaSpoutConfig.FirstPollOffsetStrategy firstPollOffsetStrategy; private final RecordTranslator<K, V> translator; private final Timer refreshSubscriptionTimer; private final TopicPartitionSerializer tpSerializer = new TopicPartitionSerializer(); private TopologyContext topologyContext; /** * Create a new Kafka spout emitter. * @param kafkaManager The Kafka consumer manager to use * @param topologyContext The topology context * @param refreshSubscriptionTimer The timer for deciding when to recheck the subscription / public KafkaTridentSpoutEmitter(KafkaTridentSpoutManager<K, V> kafkaManager, TopologyContext topologyContext, Timer refreshSubscriptionTimer) { this.kafkaConsumer = kafkaManager.createAndSubscribeKafkaConsumer(topologyContext); this.kafkaManager = kafkaManager; this.topologyContext = topologyContext; this.refreshSubscriptionTimer = refreshSubscriptionTimer; this.translator = kafkaManager.getKafkaSpoutConfig().getTranslator(); final KafkaSpoutConfig<K, V> kafkaSpoutConfig = kafkaManager.getKafkaSpoutConfig(); this.pollTimeoutMs = kafkaSpoutConfig.getPollTimeoutMs(); this.firstPollOffsetStrategy = kafkaSpoutConfig.getFirstPollOffsetStrategy(); LOG.debug(“Created {}”, this.toString()); } /* * Creates instance of this class with default 500 millisecond refresh subscription timer / public KafkaTridentSpoutEmitter(KafkaTridentSpoutManager<K, V> kafkaManager, TopologyContext topologyContext) { this(kafkaManager, topologyContext, new Timer(500, kafkaManager.getKafkaSpoutConfig().getPartitionRefreshPeriodMs(), TimeUnit.MILLISECONDS)); } //…… @Override public Map<String, Object> emitPartitionBatch(TransactionAttempt tx, TridentCollector collector, KafkaTridentSpoutTopicPartition currBatchPartition, Map<String, Object> lastBatch) { LOG.debug(“Processing batch: [transaction = {}], [currBatchPartition = {}], [lastBatchMetadata = {}], [collector = {}]”, tx, currBatchPartition, lastBatch, collector); final TopicPartition currBatchTp = currBatchPartition.getTopicPartition(); final Set<TopicPartition> assignments = kafkaConsumer.assignment(); KafkaTridentSpoutBatchMetadata lastBatchMeta = lastBatch == null ? null : KafkaTridentSpoutBatchMetadata.fromMap(lastBatch); KafkaTridentSpoutBatchMetadata currentBatch = lastBatchMeta; Collection<TopicPartition> pausedTopicPartitions = Collections.emptySet(); if (assignments == null || !assignments.contains(currBatchPartition.getTopicPartition())) { LOG.warn(“SKIPPING processing batch [transaction = {}], [currBatchPartition = {}], [lastBatchMetadata = {}], " + “[collector = {}] because it is not part of the assignments {} of consumer instance [{}] " + “of consumer group [{}]”, tx, currBatchPartition, lastBatch, collector, assignments, kafkaConsumer, kafkaManager.getKafkaSpoutConfig().getConsumerGroupId()); } else { try { // pause other topic-partitions to only poll from current topic-partition pausedTopicPartitions = pauseTopicPartitions(currBatchTp); seek(currBatchTp, lastBatchMeta, tx.getTransactionId()); // poll if (refreshSubscriptionTimer.isExpiredResetOnTrue()) { kafkaManager.getKafkaSpoutConfig().getSubscription().refreshAssignment(); } final ConsumerRecords<K, V> records = kafkaConsumer.poll(pollTimeoutMs); LOG.debug(“Polled [{}] records from Kafka.”, records.count()); if (!records.isEmpty()) { emitTuples(collector, records); // build new metadata currentBatch = new KafkaTridentSpoutBatchMetadata(currBatchTp, records); } } finally { kafkaConsumer.resume(pausedTopicPartitions); LOG.trace(“Resumed topic-partitions {}”, pausedTopicPartitions); } LOG.debug(“Emitted batch: [transaction = {}], [currBatchPartition = {}], [lastBatchMetadata = {}], " + “[currBatchMetadata = {}], [collector = {}]”, tx, currBatchPartition, lastBatch, currentBatch, collector); } return currentBatch == null ? null : currentBatch.toMap(); } private void emitTuples(TridentCollector collector, ConsumerRecords<K, V> records) { for (ConsumerRecord<K, V> record : records) { final List<Object> tuple = translator.apply(record); collector.emit(tuple); LOG.debug(“Emitted tuple {} for record [{}]”, tuple, record); } } @Override public void refreshPartitions(List<KafkaTridentSpoutTopicPartition> partitionResponsibilities) { LOG.trace(“Refreshing of topic-partitions handled by Kafka. " + “No action taken by this method for topic partitions {}”, partitionResponsibilities); } /* * Computes ordered list of topic-partitions for this task taking into consideration that topic-partitions * for this task must be assigned to the Kafka consumer running on this task. * * @param allPartitionInfo list of all partitions as returned by {@link KafkaTridentSpoutOpaqueCoordinator} * @return ordered list of topic partitions for this task */ @Override public List<KafkaTridentSpoutTopicPartition> getOrderedPartitions(final List<Map<String, Object>> allPartitionInfo) { List<TopicPartition> allTopicPartitions = new ArrayList<>(); for(Map<String, Object> map : allPartitionInfo) { allTopicPartitions.add(tpSerializer.fromMap(map)); } final List<KafkaTridentSpoutTopicPartition> allPartitions = newKafkaTridentSpoutTopicPartitions(allTopicPartitions); LOG.debug(“Returning all topic-partitions {} across all tasks. Current task index [{}]. Total tasks [{}] “, allPartitions, topologyContext.getThisTaskIndex(), getNumTasks()); return allPartitions; } @Override public List<KafkaTridentSpoutTopicPartition> getPartitionsForTask(int taskId, int numTasks, List<Map<String, Object>> allPartitionInfo) { final Set<TopicPartition> assignedTps = kafkaConsumer.assignment(); LOG.debug(“Consumer [{}], running on task with index [{}], has assigned topic-partitions {}”, kafkaConsumer, taskId, assignedTps); final List<KafkaTridentSpoutTopicPartition> taskTps = newKafkaTridentSpoutTopicPartitions(assignedTps); LOG.debug(“Returning topic-partitions {} for task with index [{}]”, taskTps, taskId); return taskTps; } @Override public void close() { kafkaConsumer.close(); LOG.debug(“Closed”); } @Override public final String toString() { return super.toString() + “{kafkaManager=” + kafkaManager + ‘}’; }}这里的refreshSubscriptionTimer的interval取的是kafkaManager.getKafkaSpoutConfig().getPartitionRefreshPeriodMs()，默认是2000emitPartitionBatch方法没调用一次都会判断refreshSubscriptionTimer.isExpiredResetOnTrue()，如果时间到了，就会调用kafkaManager.getKafkaSpoutConfig().getSubscription().refreshAssignment()刷新assignmentemitPartitionBatch方法主要是找到与该batch关联的partition，停止从其他parition拉取消息，然后根据firstPollOffsetStrategy以及lastBatchMeta信息，调用kafkaConsumer的seek相关方法seek到指定位置之后就是用kafkaConsumer.poll(pollTimeoutMs)拉取数据，然后emitTuples；emitTuples方法会是用translator转换数据，然后调用collector.emit发射出去refreshPartitions方法目前仅仅是trace下日志；getOrderedPartitions方法先将allPartitionInfo的数据从map结构反序列化回来，然后转换为KafkaTridentSpoutTopicPartition返回；getPartitionsForTask方法主要是通过kafkaConsumer.assignment()的信息转换为KafkaTridentSpoutTopicPartition返回小结storm-kafka-client提供了KafkaTridentSpoutOpaque这个spout作为trident的kafka spout(旧版的为OpaqueTridentKafkaSpout，在storm-kafka类库中)，它实现了IOpaquePartitionedTridentSpout接口TridentTopology.newStream方法，对于IOpaquePartitionedTridentSpout类型的spout会使用OpaquePartitionedTridentSpoutExecutor来包装；TridentTopologyBuilder.buildTopology会将IOpaquePartitionedTridentSpout(OpaquePartitionedTridentSpoutExecutor)先使用TridentSpoutExecutor包装，然后再使用TridentBoltExecutor包装为boltOpaquePartitionedTridentSpoutExecutor的getCoordinator返回的是ITridentSpout.BatchCoordinator，getEmitter返回的是ICommitterTridentSpout.Emitter；他们分别对KafkaTridentSpoutOpaque这个原始spout返回的KafkaTridentSpoutOpaqueCoordinator以及KafkaTridentSpoutEmitter进行包装再处理；其中对coordinator加了debug日志，对emitter则主要多了对EmitterPartitionState的存取docStorm Kafka Integration (0.10.x+) ...

聊聊storm TridentBoltExecutor的finishBatch方法

序本文主要研究一下storm TridentBoltExecutor的finishBatch方法MasterBatchCoordinator.nextTuplestorm-core-1.2.2-sources.jar!/org/apache/storm/trident/topology/MasterBatchCoordinator.java public void nextTuple() { sync(); } private void sync() { // note that sometimes the tuples active may be less than max_spout_pending, e.g. // max_spout_pending = 3 // tx 1, 2, 3 active, tx 2 is acked. there won’t be a commit for tx 2 (because tx 1 isn’t committed yet), // and there won’t be a batch for tx 4 because there’s max_spout_pending tx active TransactionStatus maybeCommit = _activeTx.get(_currTransaction); if(maybeCommit!=null && maybeCommit.status == AttemptStatus.PROCESSED) { maybeCommit.status = AttemptStatus.COMMITTING; _collector.emit(COMMIT_STREAM_ID, new Values(maybeCommit.attempt), maybeCommit.attempt); LOG.debug(“Emitted on [stream = {}], [tx_status = {}], [{}]”, COMMIT_STREAM_ID, maybeCommit, this); } if(_active) { if(_activeTx.size() < _maxTransactionActive) { Long curr = _currTransaction; for(int i=0; i<_maxTransactionActive; i++) { if(!_activeTx.containsKey(curr) && isReady(curr)) { // by using a monotonically increasing attempt id, downstream tasks // can be memory efficient by clearing out state for old attempts // as soon as they see a higher attempt id for a transaction Integer attemptId = _attemptIds.get(curr); if(attemptId==null) { attemptId = 0; } else { attemptId++; } _attemptIds.put(curr, attemptId); for(TransactionalState state: _states) { state.setData(CURRENT_ATTEMPTS, _attemptIds); } TransactionAttempt attempt = new TransactionAttempt(curr, attemptId); final TransactionStatus newTransactionStatus = new TransactionStatus(attempt); _activeTx.put(curr, newTransactionStatus); _collector.emit(BATCH_STREAM_ID, new Values(attempt), attempt); LOG.debug(“Emitted on [stream = {}], [tx_attempt = {}], [tx_status = {}], [{}]”, BATCH_STREAM_ID, attempt, newTransactionStatus, this); _throttler.markEvent(); } curr = nextTransactionId(curr); } } } }MasterBatchCoordinator是整个trident的真正的spout，它的nextTuple方法会向TridentSpoutCoordinator向MasterBatchCoordinator.BATCH_STREAM_ID($batch)发射tupleTridentSpoutCoordinator.executestorm-core-1.2.2-sources.jar!/org/apache/storm/trident/spout/TridentSpoutCoordinator.java public void execute(Tuple tuple, BasicOutputCollector collector) { TransactionAttempt attempt = (TransactionAttempt) tuple.getValue(0); if(tuple.getSourceStreamId().equals(MasterBatchCoordinator.SUCCESS_STREAM_ID)) { _state.cleanupBefore(attempt.getTransactionId()); _coord.success(attempt.getTransactionId()); } else { long txid = attempt.getTransactionId(); Object prevMeta = _state.getPreviousState(txid); Object meta = _coord.initializeTransaction(txid, prevMeta, _state.getState(txid)); _state.overrideState(txid, meta); collector.emit(MasterBatchCoordinator.BATCH_STREAM_ID, new Values(attempt, meta)); } }TridentSpoutCoordinator接收MasterBatchCoordinator在MasterBatchCoordinator.BATCH_STREAM_ID($batch)发过来的tuple，然后向包装用户spout的TridentBoltExecutor发送batch指令TridentBoltExecutor(TridentSpoutExecutor)storm-core-1.2.2-sources.jar!/org/apache/storm/trident/topology/TridentBoltExecutor.java public void execute(Tuple tuple) { if(TupleUtils.isTick(tuple)) { long now = System.currentTimeMillis(); if(now - _lastRotate > _messageTimeoutMs) { _batches.rotate(); _lastRotate = now; } return; } String batchGroup = _batchGroupIds.get(tuple.getSourceGlobalStreamId()); if(batchGroup==null) { // this is so we can do things like have simple DRPC that doesn’t need to use batch processing _coordCollector.setCurrBatch(null); _bolt.execute(null, tuple); _collector.ack(tuple); return; } IBatchID id = (IBatchID) tuple.getValue(0); //get transaction id //if it already exists and attempt id is greater than the attempt there TrackedBatch tracked = (TrackedBatch) _batches.get(id.getId());// if(_batches.size() > 10 && _context.getThisTaskIndex() == 0) {// System.out.println(“Received in " + _context.getThisComponentId() + " " + _context.getThisTaskIndex()// + " (” + _batches.size() + “)” +// “\ntuple: " + tuple +// “\nwith tracked " + tracked +// “\nwith id " + id + // “\nwith group " + batchGroup// + “\n”);// // } //System.out.println(“Num tracked: " + _batches.size() + " " + _context.getThisComponentId() + " " + _context.getThisTaskIndex()); // this code here ensures that only one attempt is ever tracked for a batch, so when // failures happen you don’t get an explosion in memory usage in the tasks if(tracked!=null) { if(id.getAttemptId() > tracked.attemptId) { _batches.remove(id.getId()); tracked = null; } else if(id.getAttemptId() < tracked.attemptId) { // no reason to try to execute a previous attempt than we’ve already seen return; } } if(tracked==null) { tracked = new TrackedBatch(new BatchInfo(batchGroup, id, _bolt.initBatchState(batchGroup, id)), _coordConditions.get(batchGroup), id.getAttemptId()); _batches.put(id.getId(), tracked); } _coordCollector.setCurrBatch(tracked); //System.out.println(“TRACKED: " + tracked + " " + tuple); TupleType t = getTupleType(tuple, tracked); if(t==TupleType.COMMIT) { tracked.receivedCommit = true; checkFinish(tracked, tuple, t); } else if(t==TupleType.COORD) { int count = tuple.getInteger(1); tracked.reportedTasks++; tracked.expectedTupleCount+=count; checkFinish(tracked, tuple, t); } else { tracked.receivedTuples++; boolean success = true; try { _bolt.execute(tracked.info, tuple); if(tracked.condition.expectedTaskReports==0) { success = finishBatch(tracked, tuple); } } catch(FailedException e) { failBatch(tracked, e); } if(success) { _collector.ack(tuple); } else { _collector.fail(tuple); } } _coordCollector.setCurrBatch(null); } private boolean finishBatch(TrackedBatch tracked, Tuple finishTuple) { boolean success = true; try { _bolt.finishBatch(tracked.info); String stream = COORD_STREAM(tracked.info.batchGroup); for(Integer task: tracked.condition.targetTasks) { _collector.emitDirect(task, stream, finishTuple, new Values(tracked.info.batchId, Utils.get(tracked.taskEmittedTuples, task, 0))); } if(tracked.delayedAck!=null) { _collector.ack(tracked.delayedAck); tracked.delayedAck = null; } } catch(FailedException e) { failBatch(tracked, e); success = false; } _batches.remove(tracked.info.batchId.getId()); return success; }TridentBoltExecutor.execute方法，首先会创建并初始化TrackedBatch(如果TrackedBatch不存在的话)，之后接收到batch指令的时候，对tracked.receivedTuple累加，然后调用_bolt.execute(tracked.info, tuple)对于spout来说，这里的_bolt是TridentSpoutExecutor，它的execute方法会往下游的TridentBoltExecutor发射一个batch的tuples；由于spout的expectedTaskReports==0，所以这里在调用完TridentSpoutExecutor发射batch的tuples时，它就立马调用finishBatchfinishBatch操作，这里会通过COORD_STREAM往下游的TridentBoltExecutor发射[id,count]数据，告知下游TridentBoltExecutor说它一共发射了多少tuplesTridentBoltExecutor(SubtopologyBolt)storm-core-1.2.2-sources.jar!/org/apache/storm/trident/topology/TridentBoltExecutor.java @Override public void execute(Tuple tuple) { if(TupleUtils.isTick(tuple)) { long now = System.currentTimeMillis(); if(now - _lastRotate > _messageTimeoutMs) { _batches.rotate(); _lastRotate = now; } return; } String batchGroup = _batchGroupIds.get(tuple.getSourceGlobalStreamId()); if(batchGroup==null) { // this is so we can do things like have simple DRPC that doesn’t need to use batch processing _coordCollector.setCurrBatch(null); _bolt.execute(null, tuple); _collector.ack(tuple); return; } IBatchID id = (IBatchID) tuple.getValue(0); //get transaction id //if it already exists and attempt id is greater than the attempt there TrackedBatch tracked = (TrackedBatch) _batches.get(id.getId());// if(_batches.size() > 10 && _context.getThisTaskIndex() == 0) {// System.out.println(“Received in " + _context.getThisComponentId() + " " + _context.getThisTaskIndex()// + " (” + _batches.size() + “)” +// “\ntuple: " + tuple +// “\nwith tracked " + tracked +// “\nwith id " + id + // “\nwith group " + batchGroup// + “\n”);// // } //System.out.println(“Num tracked: " + _batches.size() + " " + _context.getThisComponentId() + " " + _context.getThisTaskIndex()); // this code here ensures that only one attempt is ever tracked for a batch, so when // failures happen you don’t get an explosion in memory usage in the tasks if(tracked!=null) { if(id.getAttemptId() > tracked.attemptId) { _batches.remove(id.getId()); tracked = null; } else if(id.getAttemptId() < tracked.attemptId) { // no reason to try to execute a previous attempt than we’ve already seen return; } } if(tracked==null) { tracked = new TrackedBatch(new BatchInfo(batchGroup, id, _bolt.initBatchState(batchGroup, id)), _coordConditions.get(batchGroup), id.getAttemptId()); _batches.put(id.getId(), tracked); } _coordCollector.setCurrBatch(tracked); //System.out.println(“TRACKED: " + tracked + " " + tuple); TupleType t = getTupleType(tuple, tracked); if(t==TupleType.COMMIT) { tracked.receivedCommit = true; checkFinish(tracked, tuple, t); } else if(t==TupleType.COORD) { int count = tuple.getInteger(1); tracked.reportedTasks++; tracked.expectedTupleCount+=count; checkFinish(tracked, tuple, t); } else { tracked.receivedTuples++; boolean success = true; try { _bolt.execute(tracked.info, tuple); if(tracked.condition.expectedTaskReports==0) { success = finishBatch(tracked, tuple); } } catch(FailedException e) { failBatch(tracked, e); } if(success) { _collector.ack(tuple); } else { _collector.fail(tuple); } } _coordCollector.setCurrBatch(null); } private void checkFinish(TrackedBatch tracked, Tuple tuple, TupleType type) { if(tracked.failed) { failBatch(tracked); _collector.fail(tuple); return; } CoordCondition cond = tracked.condition; boolean delayed = tracked.delayedAck==null && (cond.commitStream!=null && type==TupleType.COMMIT || cond.commitStream==null); if(delayed) { tracked.delayedAck = tuple; } boolean failed = false; if(tracked.receivedCommit && tracked.reportedTasks == cond.expectedTaskReports) { if(tracked.receivedTuples == tracked.expectedTupleCount) { finishBatch(tracked, tuple); } else { //TODO: add logging that not all tuples were received failBatch(tracked); _collector.fail(tuple); failed = true; } } if(!delayed && !failed) { _collector.ack(tuple); } } private boolean finishBatch(TrackedBatch tracked, Tuple finishTuple) { boolean success = true; try { _bolt.finishBatch(tracked.info); String stream = COORD_STREAM(tracked.info.batchGroup); for(Integer task: tracked.condition.targetTasks) { _collector.emitDirect(task, stream, finishTuple, new Values(tracked.info.batchId, Utils.get(tracked.taskEmittedTuples, task, 0))); } if(tracked.delayedAck!=null) { _collector.ack(tracked.delayedAck); tracked.delayedAck = null; } } catch(FailedException e) { failBatch(tracked, e); success = false; } _batches.remove(tracked.info.batchId.getId()); return success; }这个TridentBoltExecutor是下游的bolt，它的_bolt是SubtopologyBolt，而且它的tracked.condition.expectedTaskReports不为0，因而它是在接收到TupleType.COORD的tuple的时候，才进行checkFinish操作(这里先忽略TupleType.COMMIT类型)由于BoltExecutor是使用Utils.asyncLoop来挨个消费receiveQueue的数据的，而且emitBatch的时候也是挨个接收batch的tuples，最后再接收到TridentBoltExecutor(TridentSpoutExecutor)在finishBatch的时候通过COORD_STREAM发过来的[id,count]的tuple(注意这里的COORD_STREAM是分发给每个task的，如果TridentBoltExecutor有多个parallel，则他们是按各自的task来接收的)所以TridentBoltExecutor(SubtopologyBolt)先挨个处理每个tuple，处理完之后才轮到TupleType.COORD这个tuple，然后触发checkFinish操作；在没有commitStream的情况下，tracked.receivedCommit默认为true，因而这里只要检测收到的tuples与应收的tuples数一致，就执行_bolt.finishBatch操作完成一个batch，然后再往它的下游TridentBoltExecutor发射它应收的[id,count]的tuple小结对于trident来说，真正的spout是MasterBatchCoordinator，它的nextTuple会触发batch的发送，它将batch指令发送给TridentSpoutCoordinator，而TridentSpoutCoordinator将触发TridentBoltExecutor(TridentSpoutExecutor)的execute方法，进而触发ITridentSpout的emitter的emitBatch，从而发送一个batch的数据TridentBoltExecutor(TridentSpoutExecutor)的expectedTaskReports==0，它在调用完TridentSpoutExecutor发射batch的tuples时，就立马调用finishBatch操作，通过COORD_STREAM往下游的TridentBoltExecutor发射[id,count]数据，告知下游TridentBoltExecutor说它一共发射了多少tuplesspout的下游bolt为TridentBoltExecutor(SubtopologyBolt)，它的tracked.condition.expectedTaskReports不为0，因而它是在接收到TupleType.COORD的tuple的时候，才进行checkFinish操作(这里先忽略TupleType.COMMIT类型)，由于spout是先执行emitBatch操作再最后finishBatch发送[id,count]数据，正常情况下按顺序进入到TridentBoltExecutor(SubtopologyBolt)的receiveQueue队列，然后TridentBoltExecutor(SubtopologyBolt)挨个消费tuple，调用SubtopologyBolt.execute，最后再处理[id,count]数据，触发checkFinish操作，只要检测收到的tuples与应收的tuples数一致，就执行SubtopologyBolt.finishBatch操作完成这个batch，然后再往它的下游TridentBoltExecutor发射它应收的[id,count]的tupledocTrident Tutorial聊聊storm worker的executor与task聊聊storm的AggregateProcessor的execute及finishBatch方法 ...

聊聊storm的AggregateProcessor的execute及finishBatch方法

序本文主要研究一下storm的AggregateProcessor的execute及finishBatch方法实例 TridentTopology topology = new TridentTopology(); topology.newStream(“spout1”, spout) .groupBy(new Fields(“user”)) .aggregate(new Fields(“user”,“score”),new UserCountAggregator(),new Fields(“val”)) .toStream() .parallelismHint(1) .each(new Fields(“val”),new PrintEachFunc(),new Fields());TridentBoltExecutorstorm-core-1.2.2-sources.jar!/org/apache/storm/trident/topology/TridentBoltExecutor.java private void checkFinish(TrackedBatch tracked, Tuple tuple, TupleType type) { if(tracked.failed) { failBatch(tracked); _collector.fail(tuple); return; } CoordCondition cond = tracked.condition; boolean delayed = tracked.delayedAck==null && (cond.commitStream!=null && type==TupleType.COMMIT || cond.commitStream==null); if(delayed) { tracked.delayedAck = tuple; } boolean failed = false; if(tracked.receivedCommit && tracked.reportedTasks == cond.expectedTaskReports) { if(tracked.receivedTuples == tracked.expectedTupleCount) { finishBatch(tracked, tuple); } else { //TODO: add logging that not all tuples were received failBatch(tracked); _collector.fail(tuple); failed = true; } } if(!delayed && !failed) { _collector.ack(tuple); } } private boolean finishBatch(TrackedBatch tracked, Tuple finishTuple) { boolean success = true; try { _bolt.finishBatch(tracked.info); String stream = COORD_STREAM(tracked.info.batchGroup); for(Integer task: tracked.condition.targetTasks) { _collector.emitDirect(task, stream, finishTuple, new Values(tracked.info.batchId, Utils.get(tracked.taskEmittedTuples, task, 0))); } if(tracked.delayedAck!=null) { _collector.ack(tracked.delayedAck); tracked.delayedAck = null; } } catch(FailedException e) { failBatch(tracked, e); success = false; } _batches.remove(tracked.info.batchId.getId()); return success; } public static class TrackedBatch { int attemptId; BatchInfo info; CoordCondition condition; int reportedTasks = 0; int expectedTupleCount = 0; int receivedTuples = 0; Map<Integer, Integer> taskEmittedTuples = new HashMap<>(); //…… }用户的spout以及groupBy操作最后都是被包装为TridentBoltExecutor，而groupBy的TridentBoltExecutor则是包装了SubtopologyBoltTridentBoltExecutor在checkFinish方法里头会调用finishBatch操作(另外接收到REGULAR类型的tuple时，在tracked.condition.expectedTaskReports==0的时候也会调用finishBatch操作，对于spout来说tracked.condition.expectedTaskReports为0，因为它是数据源，所以不用接收COORD_STREAM更新expectedTaskReports以及expectedTupleCount)，而该操作会往COORD_STREAM这个stream发送new Values(tracked.info.batchId, Utils.get(tracked.taskEmittedTuples, task, 0))，也就是new Fields(“id”, “count”)，即batchId以及发送给目的task的tuple数量，告知下游的它给task发送了多少tuple(taskEmittedTuples数据在CoordinatedOutputCollector的emit及emitDirect方法里头维护)下游也是TridentBoltExecutor，它在接收到COORD_STREAM发来的数据时，更新expectedTupleCount，而每个TridentBoltExecutor在checkFinish方法里头会判断，如果receivedTuples等于expectedTupleCount则表示完整接收完上游发过来的tuple，然后触发finishBatch操作SubtopologyBoltstorm-core-1.2.2-sources.jar!/org/apache/storm/trident/planner/SubtopologyBolt.javapublic class SubtopologyBolt implements ITridentBatchBolt { //…… @Override public void execute(BatchInfo batchInfo, Tuple tuple) { String sourceStream = tuple.getSourceStreamId(); InitialReceiver ir = _roots.get(sourceStream); if(ir==null) { throw new RuntimeException(“Received unexpected tuple " + tuple.toString()); } ir.receive((ProcessorContext) batchInfo.state, tuple); } @Override public void finishBatch(BatchInfo batchInfo) { for(TridentProcessor p: _myTopologicallyOrdered.get(batchInfo.batchGroup)) { p.finishBatch((ProcessorContext) batchInfo.state); } } @Override public Object initBatchState(String batchGroup, Object batchId) { ProcessorContext ret = new ProcessorContext(batchId, new Object[_nodes.size()]); for(TridentProcessor p: _myTopologicallyOrdered.get(batchGroup)) { p.startBatch(ret); } return ret; } @Override public void cleanup() { for(String bg: _myTopologicallyOrdered.keySet()) { for(TridentProcessor p: _myTopologicallyOrdered.get(bg)) { p.cleanup(); } } } @Override public void declareOutputFields(OutputFieldsDeclarer declarer) { for(Node n: _nodes) { declarer.declareStream(n.streamId, TridentUtils.fieldsConcat(new Fields("$batchId”), n.allOutputFields)); } } @Override public Map<String, Object> getComponentConfiguration() { return null; } protected static class InitialReceiver { List<TridentProcessor> _receivers = new ArrayList<>(); RootFactory _factory; ProjectionFactory _project; String _stream; public InitialReceiver(String stream, Fields allFields) { // TODO: don’t want to project for non-batch bolts…??? // how to distinguish “batch” streams from non-batch streams? _stream = stream; _factory = new RootFactory(allFields); List<String> projected = new ArrayList<>(allFields.toList()); projected.remove(0); _project = new ProjectionFactory(_factory, new Fields(projected)); } public void receive(ProcessorContext context, Tuple tuple) { TridentTuple t = _project.create(_factory.create(tuple)); for(TridentProcessor r: _receivers) { r.execute(context, _stream, t); } } public void addReceiver(TridentProcessor p) { _receivers.add(p); } public Factory getOutputFactory() { return _project; } }}groupBy操作被包装为一个SubtopologyBolt，它的outputFields的第一个field为$batchIdexecute方法会获取对应的InitialReceiver，然后调用receive方法；InitialReceiver的receive方法调用_receivers的execute，这里的receive为AggregateProcessorfinishBatch方法挨个调用_myTopologicallyOrdered.get(batchInfo.batchGroup)返回的TridentProcessor的finishBatch方法，这里就是AggregateProcessor及EachProcessor；BatchInfo，包含batchId、processorContext及batchGroup信息，这里将processorContext(包含TransactionAttempt类型的batchId以及Object数组state，state里头包含GroupCollector、aggregate累加结果等)传递给finishBatch方法AggregateProcessorstorm-core-1.2.2-sources.jar!/org/apache/storm/trident/planner/processor/AggregateProcessor.javapublic class AggregateProcessor implements TridentProcessor { Aggregator _agg; TridentContext _context; FreshCollector _collector; Fields _inputFields; ProjectionFactory _projection; public AggregateProcessor(Fields inputFields, Aggregator agg) { _agg = agg; _inputFields = inputFields; } @Override public void prepare(Map conf, TopologyContext context, TridentContext tridentContext) { List<Factory> parents = tridentContext.getParentTupleFactories(); if(parents.size()!=1) { throw new RuntimeException(“Aggregate operation can only have one parent”); } _context = tridentContext; _collector = new FreshCollector(tridentContext); _projection = new ProjectionFactory(parents.get(0), _inputFields); _agg.prepare(conf, new TridentOperationContext(context, _projection)); } @Override public void cleanup() { _agg.cleanup(); } @Override public void startBatch(ProcessorContext processorContext) { _collector.setContext(processorContext); processorContext.state[_context.getStateIndex()] = _agg.init(processorContext.batchId, _collector); } @Override public void execute(ProcessorContext processorContext, String streamId, TridentTuple tuple) { _collector.setContext(processorContext); _agg.aggregate(processorContext.state[_context.getStateIndex()], _projection.create(tuple), _collector); } @Override public void finishBatch(ProcessorContext processorContext) { _collector.setContext(processorContext); _agg.complete(processorContext.state[_context.getStateIndex()], _collector); } @Override public Factory getOutputFactory() { return _collector.getOutputFactory(); }}AggregateProcessor在prepare创建了FreshCollector以及ProjectionFactory对于GroupBy操作来说，这里的_agg为GroupedAggregator，_agg.prepare传递的context为TridentOperationContextfinishBatch方法这里调用_agg.complete方法，传入的arr数组，第一个元素为GroupCollector，第二元素为aggregator的累加值；传入的_collector为FreshCollectorGroupedAggregatorstorm-core-1.2.2-sources.jar!/org/apache/storm/trident/operation/impl/GroupedAggregator.javapublic class GroupedAggregator implements Aggregator<Object[]> { ProjectionFactory _groupFactory; ProjectionFactory _inputFactory; Aggregator _agg; ComboList.Factory _fact; Fields _inFields; Fields _groupFields; public GroupedAggregator(Aggregator agg, Fields group, Fields input, int outSize) { _groupFields = group; _inFields = input; _agg = agg; int[] sizes = new int[2]; sizes[0] = _groupFields.size(); sizes[1] = outSize; _fact = new ComboList.Factory(sizes); } @Override public void prepare(Map conf, TridentOperationContext context) { _inputFactory = context.makeProjectionFactory(_inFields); _groupFactory = context.makeProjectionFactory(_groupFields); _agg.prepare(conf, new TridentOperationContext(context, _inputFactory)); } @Override public Object[] init(Object batchId, TridentCollector collector) { return new Object[] {new GroupCollector(collector, _fact), new HashMap(), batchId}; } @Override public void aggregate(Object[] arr, TridentTuple tuple, TridentCollector collector) { GroupCollector groupColl = (GroupCollector) arr[0]; Map<List, Object> val = (Map) arr[1]; TridentTuple group = _groupFactory.create((TridentTupleView) tuple); TridentTuple input = _inputFactory.create((TridentTupleView) tuple); Object curr; if(!val.containsKey(group)) { curr = _agg.init(arr[2], groupColl); val.put((List) group, curr); } else { curr = val.get(group); } groupColl.currGroup = group; _agg.aggregate(curr, input, groupColl); } @Override public void complete(Object[] arr, TridentCollector collector) { Map<List, Object> val = (Map) arr[1]; GroupCollector groupColl = (GroupCollector) arr[0]; for(Entry<List, Object> e: val.entrySet()) { groupColl.currGroup = e.getKey(); _agg.complete(e.getValue(), groupColl); } } @Override public void cleanup() { _agg.cleanup(); } }aggregate方法的arr[0]为GroupCollector；arr[1]为map，key为group字段的TridentTupleView，value为_agg的init返回值用于累加；arr[2]为TransactionAttempt_agg这里为ChainedAggregatorImpl，aggregate首先获取tuple的group字段以及输入的tuple，然后判断arr[1]是否有该group的值，没有就调用_agg的init初始化一个并添加到mapaggregate方法最后调用_agg.aggregate进行累加ChainedAggregatorImplstorm-core-1.2.2-sources.jar!/org/apache/storm/trident/operation/impl/ChainedAggregatorImpl.javapublic class ChainedAggregatorImpl implements Aggregator<ChainedResult> { Aggregator[] _aggs; ProjectionFactory[] _inputFactories; ComboList.Factory _fact; Fields[] _inputFields; public ChainedAggregatorImpl(Aggregator[] aggs, Fields[] inputFields, ComboList.Factory fact) { _aggs = aggs; _inputFields = inputFields; _fact = fact; if(_aggs.length!=_inputFields.length) { throw new IllegalArgumentException(“Require input fields for each aggregator”); } } public void prepare(Map conf, TridentOperationContext context) { _inputFactories = new ProjectionFactory[_inputFields.length]; for(int i=0; i<_inputFields.length; i++) { _inputFactories[i] = context.makeProjectionFactory(_inputFields[i]); _aggs[i].prepare(conf, new TridentOperationContext(context, _inputFactories[i])); } } public ChainedResult init(Object batchId, TridentCollector collector) { ChainedResult initted = new ChainedResult(collector, _aggs.length); for(int i=0; i<_aggs.length; i++) { initted.objs[i] = _aggs[i].init(batchId, initted.collectors[i]); } return initted; } public void aggregate(ChainedResult val, TridentTuple tuple, TridentCollector collector) { val.setFollowThroughCollector(collector); for(int i=0; i<_aggs.length; i++) { TridentTuple projected = _inputFactories[i].create((TridentTupleView) tuple); _aggs[i].aggregate(val.objs[i], projected, val.collectors[i]); } } public void complete(ChainedResult val, TridentCollector collector) { val.setFollowThroughCollector(collector); for(int i=0; i<_aggs.length; i++) { _aggs[i].complete(val.objs[i], val.collectors[i]); } if(_aggs.length > 1) { // otherwise, tuples were emitted directly int[] indices = new int[val.collectors.length]; for(int i=0; i<indices.length; i++) { indices[i] = 0; } boolean keepGoing = true; //emit cross-join of all emitted tuples while(keepGoing) { List[] combined = new List[_aggs.length]; for(int i=0; i< _aggs.length; i++) { CaptureCollector capturer = (CaptureCollector) val.collectors[i]; combined[i] = capturer.captured.get(indices[i]); } collector.emit(_fact.create(combined)); keepGoing = increment(val.collectors, indices, indices.length - 1); } } } //return false if can’t increment anymore private boolean increment(TridentCollector[] lengths, int[] indices, int j) { if(j==-1) return false; indices[j]++; CaptureCollector capturer = (CaptureCollector) lengths[j]; if(indices[j] >= capturer.captured.size()) { indices[j] = 0; return increment(lengths, indices, j-1); } return true; } public void cleanup() { for(Aggregator a: _aggs) { a.cleanup(); } } }init方法返回的是ChainedResult，它的objs字段存放每个_aggs对应的init结果这里的_agg如果是Aggregator类型，则为用户在groupBy之后aggregate方法传入的aggregator；如果是CombinerAggregator类型，它会被CombinerAggregatorCombineImpl包装一下ChainedAggregatorImpl的complete方法，_aggs挨个调用complete，传入的第一个参数为val.objs[i]，即每个_agg对应的累加值小结groupBy被包装为一个SubtopologyBolt，它的execute方法会触发InitialReceiver的receive方法，而receive方法会触发_receivers的execute方法，第一个_receivers为AggregateProcessorAggregateProcessor包装了GroupedAggregator，而GroupedAggregator包装了ChainedAggregatorImpl，而ChainedAggregatorImpl包装了Aggregator数组，本实例只有一个，即在groupBy之后aggregate方法传入的aggregatorTridentBoltExecutor会从coordinator那里接收COORD_STREAM_PREFIX发送过来的应该接收到的tuple的count，然后更新expectedTupleCount，然后进行checkFinish判断，当receivedTuples(每次接收到spout的batch的一个tuple就更新该值)等于expectedTupleCount的时候，会触发finishBatch操作，该操作会调用SubtopologyBolt.finishBatch，进而调用AggregateProcessor.finishBatch，进而调用GroupedAggregator.complete，进而调用ChainedAggregatorImpl.complete，进而调用用户的aggregator的complete对于包装了TridentSpoutExecutor的TridentBoltExecutor来说，它的tracked.condition.expectedTaskReports为0，因为它是数据源，所以不用接收COORD_STREAM更新expectedTaskReports以及expectedTupleCount；当它在execute方法接收到MasterBatchCoordinator的MasterBatchCoordinator.BATCH_STREAM_ID($batch)发来的tuple的时候，调用TridentSpoutExecutor的execute方法，之后就由于tracked.condition.expectedTaskReports==0(本实例两个TridentBoltExecutor的TrackedBatch的condition.commitStream为null，因而receivedCommit为true)，就立即调用finishBatch(里头会调用TridentSpoutExecutor的finishBatch方法，之后通过COORD_STREAM给下游TridentBoltExecutor的task发送batchId及taskEmittedTuples数量；而对于下游TridentBoltExecutor它的expectedTaskReports不为0，则需要在收到COORD_STREAM的tuple的时候才能checkFinish，判断是否可以finishBatch)TridentSpoutExecutor的execute会调用emitter(最后调用用户的spout)发射一个batch；而finishBatch方法目前为空，没有做任何操作；也就是说对于包装了TridentSpoutExecutor的TridentBoltExecutor来说，它接收到发射一个batch的指令之后，调用完TridentSpoutExecutor.execute通过emitter发射一个batch，就立马执行finishBatch操作(发射[id,count]给下游的TridentBoltExecutor，下游TridentBoltExecutor在接收到[id,count]数据时更新expectedTupleCount，然后进行checkFinish判断，如果receivedTuples等于expectedTupleCount，就触发finishBatch操作，进而触发AggregateProcessor的finishBatch操作)docWindowing Support in Core Storm聊聊storm TridentTopology的构建聊聊storm trident的coordinator ...

聊聊storm window trident的FreshCollector

序本文主要研究一下storm window trident的FreshCollector实例 TridentTopology topology = new TridentTopology(); topology.newStream(“spout1”, spout) .partitionBy(new Fields(“user”)) .window(windowConfig,windowsStoreFactory,new Fields(“user”,“score”),new UserCountAggregator(),new Fields(“aggData”)) .parallelismHint(1) .each(new Fields(“aggData”), new PrintEachFunc(),new Fields());WindowTridentProcessorstorm-core-1.2.2-sources.jar!/org/apache/storm/trident/windowing/WindowTridentProcessor.javapublic class WindowTridentProcessor implements TridentProcessor { private FreshCollector collector; //…… public void prepare(Map stormConf, TopologyContext context, TridentContext tridentContext) { this.topologyContext = context; List<TridentTuple.Factory> parents = tridentContext.getParentTupleFactories(); if (parents.size() != 1) { throw new RuntimeException(“Aggregation related operation can only have one parent”); } Long maxTuplesCacheSize = getWindowTuplesCacheSize(stormConf); this.tridentContext = tridentContext; collector = new FreshCollector(tridentContext); projection = new TridentTupleView.ProjectionFactory(parents.get(0), inputFields); windowStore = windowStoreFactory.create(stormConf); windowTaskId = windowId + WindowsStore.KEY_SEPARATOR + topologyContext.getThisTaskId() + WindowsStore.KEY_SEPARATOR; windowTriggerInprocessId = getWindowTriggerInprocessIdPrefix(windowTaskId); tridentWindowManager = storeTuplesInStore ? new StoreBasedTridentWindowManager(windowConfig, windowTaskId, windowStore, aggregator, tridentContext.getDelegateCollector(), maxTuplesCacheSize, inputFields) : new InMemoryTridentWindowManager(windowConfig, windowTaskId, windowStore, aggregator, tridentContext.getDelegateCollector()); tridentWindowManager.prepare(); } public void finishBatch(ProcessorContext processorContext) { Object batchId = processorContext.batchId; Object batchTxnId = getBatchTxnId(batchId); LOG.debug(“Received finishBatch of : [{}] “, batchId); // get all the tuples in a batch and add it to trident-window-manager List<TridentTuple> tuples = (List<TridentTuple>) processorContext.state[tridentContext.getStateIndex()]; tridentWindowManager.addTuplesBatch(batchId, tuples); List<Integer> pendingTriggerIds = null; List<String> triggerKeys = new ArrayList<>(); Iterable<Object> triggerValues = null; if (retriedAttempt(batchId)) { pendingTriggerIds = (List<Integer>) windowStore.get(inprocessTriggerKey(batchTxnId)); if (pendingTriggerIds != null) { for (Integer pendingTriggerId : pendingTriggerIds) { triggerKeys.add(triggerKey(pendingTriggerId)); } triggerValues = windowStore.get(triggerKeys); } } // if there are no trigger values in earlier attempts or this is a new batch, emit pending triggers. if(triggerValues == null) { pendingTriggerIds = new ArrayList<>(); Queue<StoreBasedTridentWindowManager.TriggerResult> pendingTriggers = tridentWindowManager.getPendingTriggers(); LOG.debug(“pending triggers at batch: [{}] and triggers.size: [{}] “, batchId, pendingTriggers.size()); try { Iterator<StoreBasedTridentWindowManager.TriggerResult> pendingTriggersIter = pendingTriggers.iterator(); List<Object> values = new ArrayList<>(); StoreBasedTridentWindowManager.TriggerResult triggerResult = null; while (pendingTriggersIter.hasNext()) { triggerResult = pendingTriggersIter.next(); for (List<Object> aggregatedResult : triggerResult.result) { String triggerKey = triggerKey(triggerResult.id); triggerKeys.add(triggerKey); values.add(aggregatedResult); pendingTriggerIds.add(triggerResult.id); } pendingTriggersIter.remove(); } triggerValues = values; } finally { // store inprocess triggers of a batch in store for batch retries for any failures if (!pendingTriggerIds.isEmpty()) { windowStore.put(inprocessTriggerKey(batchTxnId), pendingTriggerIds); } } } collector.setContext(processorContext); int i = 0; for (Object resultValue : triggerValues) { collector.emit(new ConsList(new TriggerInfo(windowTaskId, pendingTriggerIds.get(i++)), (List<Object>) resultValue)); } collector.setContext(null); }}WindowTridentProcessor在prepare的时候创建了FreshCollectorfinishBatch的时候，调用FreshCollector.emit将窗口的aggregate的结果集传递过去传递的数据结构为ConsList，其实是个AbstractList的实现，由Object类型的first元素，以及List<Object>结构的_elems组成FreshCollectorstorm-core-1.2.2-sources.jar!/org/apache/storm/trident/planner/processor/FreshCollector.javapublic class FreshCollector implements TridentCollector { FreshOutputFactory _factory; TridentContext _triContext; ProcessorContext context; public FreshCollector(TridentContext context) { _triContext = context; _factory = new FreshOutputFactory(context.getSelfOutputFields()); } public void setContext(ProcessorContext pc) { this.context = pc; } @Override public void emit(List<Object> values) { TridentTuple toEmit = _factory.create(values); for(TupleReceiver r: _triContext.getReceivers()) { r.execute(context, _triContext.getOutStreamId(), toEmit); } } @Override public void reportError(Throwable t) { _triContext.getDelegateCollector().reportError(t); } public Factory getOutputFactory() { return _factory; } }FreshCollector在构造器里头根据context的selfOutputFields(第一个field固定为_task_info，之后的几个field为用户在window方法定义的functionFields)构造FreshOutputFactoryemit方法，首先使用FreshOutputFactory根据outputFields构造TridentTupleView，之后获取TupleReceiver，调用TupleReceiver的execute方法把TridentTupleView传递过去这里的TupleReceiver有ProjectedProcessor、PartitionPersistProcessorTridentTupleView.FreshOutputFactorystorm-core-1.2.2-sources.jar!/org/apache/storm/trident/tuple/TridentTupleView.java public static class FreshOutputFactory implements Factory { Map<String, ValuePointer> _fieldIndex; ValuePointer[] _index; public FreshOutputFactory(Fields selfFields) { _fieldIndex = new HashMap<>(); for(int i=0; i<selfFields.size(); i++) { String field = selfFields.get(i); _fieldIndex.put(field, new ValuePointer(0, i, field)); } _index = ValuePointer.buildIndex(selfFields, _fieldIndex); } public TridentTuple create(List<Object> selfVals) { return new TridentTupleView(PersistentVector.EMPTY.cons(selfVals), _index, _fieldIndex); } @Override public Map<String, ValuePointer> getFieldIndex() { return _fieldIndex; } @Override public int numDelegates() { return 1; } @Override public List<String> getOutputFields() { return indexToFieldsList(_index); } }FreshOutputFactory是TridentTupleView的一个静态类，其构造方法主要是计算_index以及_fieldIndex_fieldIndex是一个map，key是field字段，value是ValuePointer，记录其delegateIndex(这里固定为0)、index及field信息；第一个field为_task_info，index为0；之后的fields为用户在window方法定义的functionFields这里的create方法主要是构造TridentTupleView，其构造器第一个值为IPersistentVector，第二个值为_index，第三个值为_fieldIndexValuePointerstorm-core-1.2.2-sources.jar!/org/apache/storm/trident/tuple/ValuePointer.javapublic class ValuePointer { public static Map<String, ValuePointer> buildFieldIndex(ValuePointer[] pointers) { Map<String, ValuePointer> ret = new HashMap<String, ValuePointer>(); for(ValuePointer ptr: pointers) { ret.put(ptr.field, ptr); } return ret; } public static ValuePointer[] buildIndex(Fields fieldsOrder, Map<String, ValuePointer> pointers) { if(fieldsOrder.size()!=pointers.size()) { throw new IllegalArgumentException(“Fields order must be same length as pointers map”); } ValuePointer[] ret = new ValuePointer[pointers.size()]; for(int i=0; i<fieldsOrder.size(); i++) { ret[i] = pointers.get(fieldsOrder.get(i)); } return ret; } public int delegateIndex; protected int index; protected String field; public ValuePointer(int delegateIndex, int index, String field) { this.delegateIndex = delegateIndex; this.index = index; this.field = field; } @Override public String toString() { return ToStringBuilder.reflectionToString(this); } }这里的buildIndex，主要是根据selfOutputFields的顺序返回ValuePointer数组ProjectedProcessorstorm-core-1.2.2-sources.jar!/org/apache/storm/trident/planner/processor/ProjectedProcessor.javapublic class ProjectedProcessor implements TridentProcessor { Fields _projectFields; ProjectionFactory _factory; TridentContext _context; public ProjectedProcessor(Fields projectFields) { _projectFields = projectFields; } @Override public void prepare(Map conf, TopologyContext context, TridentContext tridentContext) { if(tridentContext.getParentTupleFactories().size()!=1) { throw new RuntimeException(“Projection processor can only have one parent”); } _context = tridentContext; _factory = new ProjectionFactory(tridentContext.getParentTupleFactories().get(0), _projectFields); } @Override public void cleanup() { } @Override public void startBatch(ProcessorContext processorContext) { } @Override public void execute(ProcessorContext processorContext, String streamId, TridentTuple tuple) { TridentTuple toEmit = _factory.create(tuple); for(TupleReceiver r: _context.getReceivers()) { r.execute(processorContext, _context.getOutStreamId(), toEmit); } } @Override public void finishBatch(ProcessorContext processorContext) { } @Override public Factory getOutputFactory() { return _factory; }}ProjectedProcessor在prepare的时候，创建了ProjectionFactory，其_projectFields就是window方法定义的functionFields，这里还使用tridentContext.getParentTupleFactories().get(0)提取了parent的第一个Factory，由于是FreshCollector传递过来的，因而这里是TridentTupleView.FreshOutputFactoryexecute的时候，首先调用ProjectionFactory.create方法，对TridentTupleView进行字段提取操作，toEmit就是根据window方法定义的functionFields重新提取的TridentTupleViewexecute方法之后对_context.getReceivers()挨个调用execute操作，将toEmit传递过去，这里的receiver就是window操作之后的各种processor了，比如EachProcessorTridentTupleView.ProjectionFactorystorm-core-1.2.2-sources.jar!/org/apache/storm/trident/tuple/TridentTupleView.javapublic static class ProjectionFactory implements Factory { Map<String, ValuePointer> _fieldIndex; ValuePointer[] _index; Factory _parent; public ProjectionFactory(Factory parent, Fields projectFields) { _parent = parent; if(projectFields==null) projectFields = new Fields(); Map<String, ValuePointer> parentFieldIndex = parent.getFieldIndex(); _fieldIndex = new HashMap<>(); for(String f: projectFields) { _fieldIndex.put(f, parentFieldIndex.get(f)); } _index = ValuePointer.buildIndex(projectFields, _fieldIndex); } public TridentTuple create(TridentTuple parent) { if(_index.length==0) return EMPTY_TUPLE; else return new TridentTupleView(((TridentTupleView)parent)._delegates, _index, _fieldIndex); } @Override public Map<String, ValuePointer> getFieldIndex() { return _fieldIndex; } @Override public int numDelegates() { return _parent.numDelegates(); } @Override public List<String> getOutputFields() { return indexToFieldsList(_index); } }ProjectionFactory是TridentTupleView的静态类，它在构造器里头根据projectFields构造_index及_fieldIndex，这样create方法就能根据所需的字段创建TridentTupleViewEachProcessorstorm-core-1.2.2-sources.jar!/org/apache/storm/trident/planner/processor/EachProcessor.javapublic class EachProcessor implements TridentProcessor { Function _function; TridentContext _context; AppendCollector _collector; Fields _inputFields; ProjectionFactory _projection; public EachProcessor(Fields inputFields, Function function) { _function = function; _inputFields = inputFields; } @Override public void prepare(Map conf, TopologyContext context, TridentContext tridentContext) { List<Factory> parents = tridentContext.getParentTupleFactories(); if(parents.size()!=1) { throw new RuntimeException(“Each operation can only have one parent”); } _context = tridentContext; _collector = new AppendCollector(tridentContext); _projection = new ProjectionFactory(parents.get(0), _inputFields); _function.prepare(conf, new TridentOperationContext(context, _projection)); } @Override public void cleanup() { _function.cleanup(); } @Override public void execute(ProcessorContext processorContext, String streamId, TridentTuple tuple) { _collector.setContext(processorContext, tuple); _function.execute(_projection.create(tuple), _collector); } @Override public void startBatch(ProcessorContext processorContext) { } @Override public void finishBatch(ProcessorContext processorContext) { } @Override public Factory getOutputFactory() { return _collector.getOutputFactory(); } }EachProcessor的execute方法，首先设置_collector的context为processorContext，然后调用_function.execute方法这里调用了_projection.create(tuple)来提取字段，主要是根据_function定义的inputFields来提取这里传递给_function的collector为AppendCollectorAppendCollectorstorm-core-1.2.2-sources.jar!/org/apache/storm/trident/planner/processor/AppendCollector.javapublic class AppendCollector implements TridentCollector { OperationOutputFactory _factory; TridentContext _triContext; TridentTuple tuple; ProcessorContext context; public AppendCollector(TridentContext context) { _triContext = context; _factory = new OperationOutputFactory(context.getParentTupleFactories().get(0), context.getSelfOutputFields()); } public void setContext(ProcessorContext pc, TridentTuple t) { this.context = pc; this.tuple = t; } @Override public void emit(List<Object> values) { TridentTuple toEmit = _factory.create((TridentTupleView) tuple, values); for(TupleReceiver r: _triContext.getReceivers()) { r.execute(context, _triContext.getOutStreamId(), toEmit); } } @Override public void reportError(Throwable t) { _triContext.getDelegateCollector().reportError(t); } public Factory getOutputFactory() { return _factory; }}AppendCollector在构造器里头创建了OperationOutputFactory，其emit方法也是提取OperationOutputFields，然后挨个调用_triContext.getReceivers()的execute方法；如果each之后没有其他操作，那么AppendCollector的_triContext.getReceivers()就为空小结WindowTridentProcessor里头使用的是FreshCollector，WindowTridentProcessor在finishBatch的时候，会从TridentWindowManager提取window创建的pendingTriggers(提取之后会将其数据从pendingTriggers移除)，里头包含了窗口累积的数据，然后使用FreshCollector发射这些数据，默认第一个value为TriggerInfo，第二个value就是窗口累积发射的valuesFreshCollector的emit方法首先使用TridentTupleView.FreshOutputFactory根据selfOutputFields(第一个field固定为_task_info，之后的几个field为用户在window方法定义的functionFields)构建TridentTupleView，然后挨个调用_triContext.getReceivers()的execute方法后续的receivers中有一个ProjectedProcessor，用于根据window方法定义的functionFields重新提取的TridentTupleView，它的execute方法也类似FreshCollector.emit方法，先提取所需字段构造TridentTupleView，然后挨个调用_triContext.getReceivers()的execute方法(比如EachProcessor.execute)EachProcessor使用的collector为AppendCollector，它的emit方法也类似FreshCollector的emit方法，先进行字段提取构造TridentTupleView，然后挨个调用_triContext.getReceivers()的execute方法FreshCollector的emit方法与ProjectedProcessor的execute方法以及AppendCollector的emit方法都非常类似，首先是使用Factory提取所需字段构建TridentTupleView，然后挨个调用_triContext.getReceivers()的execute方法；当一个_triContext没有receiver的时候，tuple的传递也就停止了docWindowing Support in Core Storm ...

聊聊storm TridentWindowManager的pendingTriggers

序本文主要研究一下storm TridentWindowManager的pendingTriggersTridentBoltExecutor.finishBatchstorm-core-1.2.2-sources.jar!/org/apache/storm/trident/topology/TridentBoltExecutor.java private boolean finishBatch(TrackedBatch tracked, Tuple finishTuple) { boolean success = true; try { _bolt.finishBatch(tracked.info); String stream = COORD_STREAM(tracked.info.batchGroup); for(Integer task: tracked.condition.targetTasks) { _collector.emitDirect(task, stream, finishTuple, new Values(tracked.info.batchId, Utils.get(tracked.taskEmittedTuples, task, 0))); } if(tracked.delayedAck!=null) { _collector.ack(tracked.delayedAck); tracked.delayedAck = null; } } catch(FailedException e) { failBatch(tracked, e); success = false; } _batches.remove(tracked.info.batchId.getId()); return success; }这里调用_bolt的finishBatch方法，这个_bolt有两个实现类，分别是TridentSpoutExecutor用于spout，一个是SubtopologyBolt用于普通的boltSubtopologyBolt.finishBatchstorm-core-1.2.2-sources.jar!/org/apache/storm/trident/planner/SubtopologyBolt.java public void finishBatch(BatchInfo batchInfo) { for(TridentProcessor p: _myTopologicallyOrdered.get(batchInfo.batchGroup)) { p.finishBatch((ProcessorContext) batchInfo.state); } }SubtopologyBolt.finishBatch调用了一系列TridentProcessor的finishBatch操作WindowTridentProcessor.finishBatchstorm-core-1.2.2-sources.jar!/org/apache/storm/trident/windowing/WindowTridentProcessor.java public void execute(ProcessorContext processorContext, String streamId, TridentTuple tuple) { // add tuple to the batch state Object state = processorContext.state[tridentContext.getStateIndex()]; ((List<TridentTuple>) state).add(projection.create(tuple)); } public void finishBatch(ProcessorContext processorContext) { Object batchId = processorContext.batchId; Object batchTxnId = getBatchTxnId(batchId); LOG.debug(“Received finishBatch of : [{}] “, batchId); // get all the tuples in a batch and add it to trident-window-manager List<TridentTuple> tuples = (List<TridentTuple>) processorContext.state[tridentContext.getStateIndex()]; tridentWindowManager.addTuplesBatch(batchId, tuples); List<Integer> pendingTriggerIds = null; List<String> triggerKeys = new ArrayList<>(); Iterable<Object> triggerValues = null; if (retriedAttempt(batchId)) { pendingTriggerIds = (List<Integer>) windowStore.get(inprocessTriggerKey(batchTxnId)); if (pendingTriggerIds != null) { for (Integer pendingTriggerId : pendingTriggerIds) { triggerKeys.add(triggerKey(pendingTriggerId)); } triggerValues = windowStore.get(triggerKeys); } } // if there are no trigger values in earlier attempts or this is a new batch, emit pending triggers. if(triggerValues == null) { pendingTriggerIds = new ArrayList<>(); Queue<StoreBasedTridentWindowManager.TriggerResult> pendingTriggers = tridentWindowManager.getPendingTriggers(); LOG.debug(“pending triggers at batch: [{}] and triggers.size: [{}] “, batchId, pendingTriggers.size()); try { Iterator<StoreBasedTridentWindowManager.TriggerResult> pendingTriggersIter = pendingTriggers.iterator(); List<Object> values = new ArrayList<>(); StoreBasedTridentWindowManager.TriggerResult triggerResult = null; while (pendingTriggersIter.hasNext()) { triggerResult = pendingTriggersIter.next(); for (List<Object> aggregatedResult : triggerResult.result) { String triggerKey = triggerKey(triggerResult.id); triggerKeys.add(triggerKey); values.add(aggregatedResult); pendingTriggerIds.add(triggerResult.id); } pendingTriggersIter.remove(); } triggerValues = values; } finally { // store inprocess triggers of a batch in store for batch retries for any failures if (!pendingTriggerIds.isEmpty()) { windowStore.put(inprocessTriggerKey(batchTxnId), pendingTriggerIds); } } } collector.setContext(processorContext); int i = 0; for (Object resultValue : triggerValues) { collector.emit(new ConsList(new TriggerInfo(windowTaskId, pendingTriggerIds.get(i++)), (List<Object>) resultValue)); } collector.setContext(null); }WindowTridentProcessor所在的bolt，ack一个batch的所有tuple之后，会执行finishBatch操作WindowTridentProcessor的execute，接收到一个tuple，堆积到processorContext.statefinishBatch的时候，从processorContext.state取出这一批tuple，然后调用tridentWindowManager.addTuplesBatch(batchId, tuples)之后调用tridentWindowManager.getPendingTriggers()获取pendingTriggerIds存入store，同时获取待触发的triggerValues最后将triggerValues挨个构造TriggerInfo以及resultValue发送出去StoreBasedTridentWindowManager.addTuplesBatchstorm-core-1.2.2-sources.jar!/org/apache/storm/trident/windowing/StoreBasedTridentWindowManager.java public void addTuplesBatch(Object batchId, List<TridentTuple> tuples) { LOG.debug(“Adding tuples to window-manager for batch: [{}]”, batchId); List<WindowsStore.Entry> entries = new ArrayList<>(); for (int i = 0; i < tuples.size(); i++) { String key = keyOf(batchId); TridentTuple tridentTuple = tuples.get(i); entries.add(new WindowsStore.Entry(key+i, tridentTuple.select(inputFields))); } // tuples should be available in store before they are added to window manager windowStore.putAll(entries); for (int i = 0; i < tuples.size(); i++) { String key = keyOf(batchId); TridentTuple tridentTuple = tuples.get(i); addToWindowManager(i, key, tridentTuple); } } private void addToWindowManager(int tupleIndex, String effectiveBatchId, TridentTuple tridentTuple) { TridentTuple actualTuple = null; if (maxCachedTuplesSize == null || currentCachedTuplesSize.get() < maxCachedTuplesSize) { actualTuple = tridentTuple; } currentCachedTuplesSize.incrementAndGet(); windowManager.add(new TridentBatchTuple(effectiveBatchId, System.currentTimeMillis(), tupleIndex, actualTuple)); }StoreBasedTridentWindowManager的addTuplesBatch方法，将这批tuple放入到windowStore，然后挨个addToWindowManager添加到windowManagerWindowManager.addstorm-core-1.2.2-sources.jar!/org/apache/storm/windowing/WindowManager.java private final ConcurrentLinkedQueue<Event<T>> queue; /** * Add an event into the window, with {@link System#currentTimeMillis()} as * the tracking ts. * * @param event the event to add / public void add(T event) { add(event, System.currentTimeMillis()); } /* * Add an event into the window, with the given ts as the tracking ts. * * @param event the event to track * @param ts the timestamp / public void add(T event, long ts) { add(new EventImpl<T>(event, ts)); } /* * Tracks a window event * * @param windowEvent the window event to track / public void add(Event<T> windowEvent) { // watermark events are not added to the queue. if (!windowEvent.isWatermark()) { queue.add(windowEvent); } else { LOG.debug(“Got watermark event with ts {}”, windowEvent.getTimestamp()); } track(windowEvent); compactWindow(); }添加tuple到ConcurrentLinkedQueue中WindowManager.onTriggerstorm-core-1.2.2-sources.jar!/org/apache/storm/windowing/WindowManager.java /* * The callback invoked by the trigger policy. / @Override public boolean onTrigger() { List<Event<T>> windowEvents = null; List<T> expired = null; try { lock.lock(); / * scan the entire window to handle out of order events in * the case of time based windows. / windowEvents = scanEvents(true); expired = new ArrayList<>(expiredEvents); expiredEvents.clear(); } finally { lock.unlock(); } List<T> events = new ArrayList<>(); List<T> newEvents = new ArrayList<>(); for (Event<T> event : windowEvents) { events.add(event.get()); if (!prevWindowEvents.contains(event)) { newEvents.add(event.get()); } } prevWindowEvents.clear(); if (!events.isEmpty()) { prevWindowEvents.addAll(windowEvents); LOG.debug(“invoking windowLifecycleListener onActivation, [{}] events in window.”, events.size()); windowLifecycleListener.onActivation(events, newEvents, expired); } else { LOG.debug(“No events in the window, skipping onActivation”); } triggerPolicy.reset(); return !events.isEmpty(); }onTrigger方法首先调用scanEvents方法获取windowEvents，之后区分为events及newEvents，然后回调windowLifecycleListener.onActivation(events, newEvents, expired)方法WindowManager.scanEventsstorm-core-1.2.2-sources.jar!/org/apache/storm/windowing/WindowManager.java /* * Scan events in the queue, using the expiration policy to check * if the event should be evicted or not. * * @param fullScan if set, will scan the entire queue; if not set, will stop * as soon as an event not satisfying the expiration policy is found * @return the list of events to be processed as a part of the current window / private List<Event<T>> scanEvents(boolean fullScan) { LOG.debug(“Scan events, eviction policy {}”, evictionPolicy); List<T> eventsToExpire = new ArrayList<>(); List<Event<T>> eventsToProcess = new ArrayList<>(); try { lock.lock(); Iterator<Event<T>> it = queue.iterator(); while (it.hasNext()) { Event<T> windowEvent = it.next(); Action action = evictionPolicy.evict(windowEvent); if (action == EXPIRE) { eventsToExpire.add(windowEvent.get()); it.remove(); } else if (!fullScan || action == STOP) { break; } else if (action == PROCESS) { eventsToProcess.add(windowEvent); } } expiredEvents.addAll(eventsToExpire); } finally { lock.unlock(); } eventsSinceLastExpiry.set(0); LOG.debug(”[{}] events expired from window.”, eventsToExpire.size()); if (!eventsToExpire.isEmpty()) { LOG.debug(“invoking windowLifecycleListener.onExpiry”); windowLifecycleListener.onExpiry(eventsToExpire); } return eventsToProcess; }scanEvents方法从ConcurrentLinkedQueue中获取event，然后判断是否过期，将其分为expiredEvents、eventsToProcess两类，返回eventsToProcess的eventsTridentWindowLifeCycleListener.onActivationstorm-core-1.2.2-sources.jar!/org/apache/storm/trident/windowing/AbstractTridentWindowManager.java /* * Listener to reeive any activation/expiry of windowing events and take further action on them. */ class TridentWindowLifeCycleListener implements WindowLifecycleListener<T> { @Override public void onExpiry(List<T> expiredEvents) { LOG.debug(“onExpiry is invoked”); onTuplesExpired(expiredEvents); } @Override public void onActivation(List<T> events, List<T> newEvents, List<T> expired) { LOG.debug(“onActivation is invoked with events size: [{}]”, events.size()); // trigger occurred, create an aggregation and keep them in store int currentTriggerId = triggerId.incrementAndGet(); execAggregatorAndStoreResult(currentTriggerId, events); } } private void execAggregatorAndStoreResult(int currentTriggerId, List<T> tupleEvents) { List<TridentTuple> resultTuples = getTridentTuples(tupleEvents); // run aggregator to compute the result AccumulatedTuplesCollector collector = new AccumulatedTuplesCollector(delegateCollector); Object state = aggregator.init(currentTriggerId, collector); for (TridentTuple resultTuple : resultTuples) { aggregator.aggregate(state, resultTuple, collector); } aggregator.complete(state, collector); List<List<Object>> resultantAggregatedValue = collector.values; ArrayList<WindowsStore.Entry> entries = Lists.newArrayList(new WindowsStore.Entry(windowTriggerCountId, currentTriggerId + 1), new WindowsStore.Entry(WindowTridentProcessor.generateWindowTriggerKey(windowTaskId, currentTriggerId), resultantAggregatedValue)); windowStore.putAll(entries); pendingTriggers.add(new TriggerResult(currentTriggerId, resultantAggregatedValue)); }onActivation方法调用了execAggregatorAndStoreResult，它会调用window的aggregator，然后将结果存到windowStore，同时将resultantAggregatedValue作为TriggerResult添加到pendingTriggers中小结WindowTridentProcessor所在的TridentBoltExecutor，它在接收到spout的tuple的时候，调用processor的execute方法，将tuple缓存到ProcessorContext中；一系列的processor的execute方法执行完之后，就ack该tuple当WindowTridentProcessor所在的TridentBoltExecutor对一个batch的所有tuple ack完之后，会触发checkFinish操作，然后执行finishBatch操作，而finishBatch操作会调用一系列TridentProcessor的finishBatch操作(比如WindowTridentProcessor -> ProjectedProcessor -> PartitionPersistProcessor -> EachProcessor -> AggregateProcessor)WindowTridentProcessor.finishBatch从processorContext.state取出这一批tuple，然后调用tridentWindowManager.addTuplesBatch(batchId, tuples)，将这批tuple放入到windowStore，然后添加到windowManager的ConcurrentLinkedQueue中；之后调用tridentWindowManager.getPendingTriggers()获取pendingTriggerIds存入store，同时获取待触发的triggerValues，将triggerValues挨个构造TriggerInfo以及resultValue发送出去而WindowManager.onTrigger方法，在window操作时间窗口触发时被调用，它从windowManager的ConcurrentLinkedQueue中获取windowEvent，然后传递给TridentWindowLifeCycleListener.onActivationTridentWindowLifeCycleListener.onActivation方法则会执行window的aggregator的init、aggregate、complete操作获取聚合结果resultantAggregatedValue，然后放入pendingTriggers，至此完成window trigger与WindowTridentProcessor的衔接docWindowing Support in Core Storm ...

聊聊storm的window trigger

序本文主要研究一下storm的window triggerWindowTridentProcessor.preparestorm-core-1.2.2-sources.jar!/org/apache/storm/trident/windowing/WindowTridentProcessor.java public void prepare(Map stormConf, TopologyContext context, TridentContext tridentContext) { this.topologyContext = context; List<TridentTuple.Factory> parents = tridentContext.getParentTupleFactories(); if (parents.size() != 1) { throw new RuntimeException(“Aggregation related operation can only have one parent”); } Long maxTuplesCacheSize = getWindowTuplesCacheSize(stormConf); this.tridentContext = tridentContext; collector = new FreshCollector(tridentContext); projection = new TridentTupleView.ProjectionFactory(parents.get(0), inputFields); windowStore = windowStoreFactory.create(stormConf); windowTaskId = windowId + WindowsStore.KEY_SEPARATOR + topologyContext.getThisTaskId() + WindowsStore.KEY_SEPARATOR; windowTriggerInprocessId = getWindowTriggerInprocessIdPrefix(windowTaskId); tridentWindowManager = storeTuplesInStore ? new StoreBasedTridentWindowManager(windowConfig, windowTaskId, windowStore, aggregator, tridentContext.getDelegateCollector(), maxTuplesCacheSize, inputFields) : new InMemoryTridentWindowManager(windowConfig, windowTaskId, windowStore, aggregator, tridentContext.getDelegateCollector()); tridentWindowManager.prepare(); }这里调用了tridentWindowManager.prepare()AbstractTridentWindowManager.preparestorm-core-1.2.2-sources.jar!/org/apache/storm/trident/windowing/AbstractTridentWindowManager.java public AbstractTridentWindowManager(WindowConfig windowConfig, String windowTaskId, WindowsStore windowStore, Aggregator aggregator, BatchOutputCollector delegateCollector) { this.windowTaskId = windowTaskId; this.windowStore = windowStore; this.aggregator = aggregator; this.delegateCollector = delegateCollector; windowTriggerCountId = WindowTridentProcessor.TRIGGER_COUNT_PREFIX + windowTaskId; windowManager = new WindowManager<>(new TridentWindowLifeCycleListener()); WindowStrategy<T> windowStrategy = windowConfig.getWindowStrategy(); EvictionPolicy<T> evictionPolicy = windowStrategy.getEvictionPolicy(); windowManager.setEvictionPolicy(evictionPolicy); triggerPolicy = windowStrategy.getTriggerPolicy(windowManager, evictionPolicy); windowManager.setTriggerPolicy(triggerPolicy); } public void prepare() { preInitialize(); initialize(); postInitialize(); } private void postInitialize() { // start trigger once the initialization is done. triggerPolicy.start(); }AbstractTridentWindowManager在构造器里头调用windowStrategy.getTriggerPolicy获取triggerPolicy；prepare方法调用了postInitialize，而它触发triggerPolicy.start()SlidingDurationWindowStrategy.getTriggerPolicystorm-core-1.2.2-sources.jar!/org/apache/storm/trident/windowing/strategy/SlidingDurationWindowStrategy.java /** * Returns a {@code TriggerPolicy} which triggers for every configured sliding window duration. * * @param triggerHandler * @param evictionPolicy * @return / @Override public TriggerPolicy<T> getTriggerPolicy(TriggerHandler triggerHandler, EvictionPolicy<T> evictionPolicy) { return new TimeTriggerPolicy<>(windowConfig.getSlidingLength(), triggerHandler, evictionPolicy); }以SlidingDurationWindowStrategy为例，这里创建的是TimeTriggerPolicy，其duration为windowConfig.getSlidingLength()，而triggerHandler则为WindowManagerTimeTriggerPolicy.startstorm-core-1.2.2-sources.jar!/org/apache/storm/windowing/TimeTriggerPolicy.java public void start() { executorFuture = executor.scheduleAtFixedRate(newTriggerTask(), duration, duration, TimeUnit.MILLISECONDS); } private Runnable newTriggerTask() { return new Runnable() { @Override public void run() { // do not process current timestamp since tuples might arrive while the trigger is executing long now = System.currentTimeMillis() - 1; try { / * set the current timestamp as the reference time for the eviction policy * to evict the events / if (evictionPolicy != null) { evictionPolicy.setContext(new DefaultEvictionContext(now, null, null, duration)); } handler.onTrigger(); } catch (Throwable th) { LOG.error(“handler.onTrigger failed “, th); / * propagate it so that task gets canceled and the exception * can be retrieved from executorFuture.get() / throw th; } } }; }start方法注册了一个调度任务，每隔duration触发(windowConfig.getSlidingLength())；而run方法是触发handler.onTrigger()，即WindowManager.onTrigger()WindowManager.onTriggerstorm-core-1.2.2-sources.jar!/org/apache/storm/windowing/WindowManager.java /* * The callback invoked by the trigger policy. / @Override public boolean onTrigger() { List<Event<T>> windowEvents = null; List<T> expired = null; try { lock.lock(); / * scan the entire window to handle out of order events in * the case of time based windows. / windowEvents = scanEvents(true); expired = new ArrayList<>(expiredEvents); expiredEvents.clear(); } finally { lock.unlock(); } List<T> events = new ArrayList<>(); List<T> newEvents = new ArrayList<>(); for (Event<T> event : windowEvents) { events.add(event.get()); if (!prevWindowEvents.contains(event)) { newEvents.add(event.get()); } } prevWindowEvents.clear(); if (!events.isEmpty()) { prevWindowEvents.addAll(windowEvents); LOG.debug(“invoking windowLifecycleListener onActivation, [{}] events in window.”, events.size()); windowLifecycleListener.onActivation(events, newEvents, expired); } else { LOG.debug(“No events in the window, skipping onActivation”); } triggerPolicy.reset(); return !events.isEmpty(); }这里调用了windowLifecycleListener.onActivation(events, newEvents, expired)，而windowLifecycleListener为AbstractTridentWindowManager的TridentWindowLifeCycleListenerTridentWindowLifeCycleListener.onActivationstorm-core-1.2.2-sources.jar!/org/apache/storm/trident/windowing/AbstractTridentWindowManager.java /* * Listener to reeive any activation/expiry of windowing events and take further action on them. */ class TridentWindowLifeCycleListener implements WindowLifecycleListener<T> { @Override public void onExpiry(List<T> expiredEvents) { LOG.debug(“onExpiry is invoked”); onTuplesExpired(expiredEvents); } @Override public void onActivation(List<T> events, List<T> newEvents, List<T> expired) { LOG.debug(“onActivation is invoked with events size: [{}]”, events.size()); // trigger occurred, create an aggregation and keep them in store int currentTriggerId = triggerId.incrementAndGet(); execAggregatorAndStoreResult(currentTriggerId, events); } } private void execAggregatorAndStoreResult(int currentTriggerId, List<T> tupleEvents) { List<TridentTuple> resultTuples = getTridentTuples(tupleEvents); // run aggregator to compute the result AccumulatedTuplesCollector collector = new AccumulatedTuplesCollector(delegateCollector); Object state = aggregator.init(currentTriggerId, collector); for (TridentTuple resultTuple : resultTuples) { aggregator.aggregate(state, resultTuple, collector); } aggregator.complete(state, collector); List<List<Object>> resultantAggregatedValue = collector.values; ArrayList<WindowsStore.Entry> entries = Lists.newArrayList(new WindowsStore.Entry(windowTriggerCountId, currentTriggerId + 1), new WindowsStore.Entry(WindowTridentProcessor.generateWindowTriggerKey(windowTaskId, currentTriggerId), resultantAggregatedValue)); windowStore.putAll(entries); pendingTriggers.add(new TriggerResult(currentTriggerId, resultantAggregatedValue)); }TridentWindowLifeCycleListener.onActivation方法主要是execAggregatorAndStoreResult而execAggregatorAndStoreResult则依次调用aggregator的init、aggregate及complete方法最后将TriggerResult放入pendingTriggers小结storm在TimeTriggerPolicy.start的时候注册了定时任务TriggerTask，以SlidingDurationWindowStrategy为例，它的调度间隔为windowConfig.getSlidingLength()TriggerTask定时触发WindowManager.onTrigger方法，该方法会回调windowLifecycleListener.onActivationAbstractTridentWindowManager提供了TridentWindowLifeCycleListener，它的onActivation主要是调用execAggregatorAndStoreResult；而execAggregatorAndStoreResult方法主要完成对aggregator的一系列调用，先是调用init方法，然后遍历resultTuples挨个调用aggregate方法，最后complete方法(从这里可以清晰看到Aggregator接口的各个方法的调用逻辑及顺序)docWindowing Support in Core Storm ...

[case45]聊聊storm-kafka-client的ProcessingGuarantee

序本文主要研究一下storm-kafka-client的ProcessingGuaranteeProcessingGuaranteestorm-kafka-client-1.2.2-sources.jar!/org/apache/storm/kafka/spout/KafkaSpoutConfig.java /** * This enum controls when the tuple with the {@link ConsumerRecord} for an offset is marked as processed, * i.e. when the offset can be committed to Kafka. The default value is AT_LEAST_ONCE. * The commit interval is controlled by {@link KafkaSpoutConfig#getOffsetsCommitPeriodMs() }, if the mode commits on an interval. * NO_GUARANTEE may be removed in a later release without warning, we’re still evaluating whether it makes sense to keep. / @InterfaceStability.Unstable public enum ProcessingGuarantee { /* * An offset is ready to commit only after the corresponding tuple has been processed and acked (at least once). If a tuple fails or * times out it will be re-emitted, as controlled by the {@link KafkaSpoutRetryService}. Commits synchronously on the defined * interval. / AT_LEAST_ONCE, /* * Every offset will be synchronously committed to Kafka right after being polled but before being emitted to the downstream * components of the topology. The commit interval is ignored. This mode guarantees that the offset is processed at most once by * ensuring the spout won’t retry tuples that fail or time out after the commit to Kafka has been done / AT_MOST_ONCE, /* * The polled offsets are ready to commit immediately after being polled. The offsets are committed periodically, i.e. a message may * be processed 0, 1 or more times. This behavior is similar to setting enable.auto.commit=true in the consumer, but allows the * spout to control when commits occur. Commits asynchronously on the defined interval. / NO_GUARANTEE, }storm-kafka-client与旧版的storm-kafka不同之一就是引入了ProcessingGuarantee，是的整个代码更为清晰ProcessingGuarantee.AT_LEAST_ONCE就是开启ack的版本，它类似kafka client的auto commit，在指定interval定期commitProcessingGuarantee.AT_MOST_ONCE，它就不管ack了，在polled out消息的时候同步commit(忽略interval配置)，因而该消息最多被处理一次ProcessingGuarantee.NO_GUARANTEE，这个也是不管ack的，不过它跟ProcessingGuarantee.AT_LEAST_ONCE类似，是在指定interval定期commit，不同的是它是异步提交KafkaSpout.openstorm-kafka-client-1.2.2-sources.jar!/org/apache/storm/kafka/spout/KafkaSpout.javapublic class KafkaSpout<K, V> extends BaseRichSpout { //Initial delay for the commit and subscription refresh timers public static final long TIMER_DELAY_MS = 500; // timer == null only if the processing guarantee is at-most-once private transient Timer commitTimer; // Tuples that were successfully acked/emitted. These tuples will be committed periodically when the commit timer expires, // or after a consumer rebalance, or during close/deactivate. Always empty if processing guarantee is none or at-most-once. private transient Map<TopicPartition, OffsetManager> offsetManagers; // Records that have been polled and are queued to be emitted in the nextTuple() call. One record is emitted per nextTuple() private transient Map<TopicPartition, List<ConsumerRecord<K, V>>> waitingToEmit; //…… @Override public void open(Map conf, TopologyContext context, SpoutOutputCollector collector) { this.context = context; // Spout internals this.collector = collector; // Offset management firstPollOffsetStrategy = kafkaSpoutConfig.getFirstPollOffsetStrategy(); // Retries management retryService = kafkaSpoutConfig.getRetryService(); tupleListener = kafkaSpoutConfig.getTupleListener(); if (kafkaSpoutConfig.getProcessingGuarantee() != KafkaSpoutConfig.ProcessingGuarantee.AT_MOST_ONCE) { // In at-most-once mode the offsets are committed after every poll, and not periodically as controlled by the timer commitTimer = new Timer(TIMER_DELAY_MS, kafkaSpoutConfig.getOffsetsCommitPeriodMs(), TimeUnit.MILLISECONDS); } refreshSubscriptionTimer = new Timer(TIMER_DELAY_MS, kafkaSpoutConfig.getPartitionRefreshPeriodMs(), TimeUnit.MILLISECONDS); offsetManagers = new HashMap<>(); emitted = new HashSet<>(); waitingToEmit = new HashMap<>(); commitMetadataManager = new CommitMetadataManager(context, kafkaSpoutConfig.getProcessingGuarantee()); tupleListener.open(conf, context); if (canRegisterMetrics()) { registerMetric(); } LOG.info(“Kafka Spout opened with the following configuration: {}”, kafkaSpoutConfig); } //……}open的时候判断，只要不是ProcessingGuarantee.AT_MOST_ONCE，那么就初始化commitTimer，period值为kafkaSpoutConfig.getPartitionRefreshPeriodMs()，如果没有设置，默认是2000msTimer.isExpiredResetOnTruestorm-kafka-client-1.2.2-sources.jar!/org/apache/storm/kafka/spout/internal/Timer.javapublic class Timer { private final long delay; private final long period; private final TimeUnit timeUnit; private final long periodNanos; private long start; //…… /* * Checks if a call to this method occurs later than {@code period} since the timer was initiated or reset. If that is the * case the method returns true, otherwise it returns false. Each time this method returns true, the counter is reset * (re-initiated) and a new cycle will start. * * @return true if the time elapsed since the last call returning true is greater than {@code period}. Returns false * otherwise. / public boolean isExpiredResetOnTrue() { final boolean expired = Time.nanoTime() - start >= periodNanos; if (expired) { start = Time.nanoTime(); } return expired; }}Timer有一个重要的方法是isExpiredResetOnTrue，用于判断“调度时间”是否到了，这个在nextTuple里头有调用到KafkaSpout.nextTuplestorm-kafka-client-1.2.2-sources.jar!/org/apache/storm/kafka/spout/KafkaSpout.java // ======== Next Tuple ======= @Override public void nextTuple() { try { if (refreshSubscriptionTimer.isExpiredResetOnTrue()) { kafkaSpoutConfig.getSubscription().refreshAssignment(); } if (commitTimer != null && commitTimer.isExpiredResetOnTrue()) { if (isAtLeastOnceProcessing()) { commitOffsetsForAckedTuples(kafkaConsumer.assignment()); } else if (kafkaSpoutConfig.getProcessingGuarantee() == ProcessingGuarantee.NO_GUARANTEE) { Map<TopicPartition, OffsetAndMetadata> offsetsToCommit = createFetchedOffsetsMetadata(kafkaConsumer.assignment()); kafkaConsumer.commitAsync(offsetsToCommit, null); LOG.debug(“Committed offsets {} to Kafka”, offsetsToCommit); } } PollablePartitionsInfo pollablePartitionsInfo = getPollablePartitionsInfo(); if (pollablePartitionsInfo.shouldPoll()) { try { setWaitingToEmit(pollKafkaBroker(pollablePartitionsInfo)); } catch (RetriableException e) { LOG.error(“Failed to poll from kafka.”, e); } } emitIfWaitingNotEmitted(); } catch (InterruptException e) { throwKafkaConsumerInterruptedException(); } }nextTuple先判断要不要刷新subscription，然后就判断commitTimer，判断是否应该提交commit，这里是调用commitTimer.isExpiredResetOnTrue()ProcessingGuarantee类型如果是NO_GUARANTEE，则调用createFetchedOffsetsMetadata创建待提交的offset及partition信息，然后调用kafkaConsumer.commitAsync进行异步提交；ProcessingGuarantee类型如果是AT_LEAST_ONCE，则调用commitOffsetsForAckedTuples进行提交处理完offset提交之后，通过getPollablePartitionsInfo获取PollablePartitionsInfo，如果shouldPoll则调用pollKafkaBroker拉数据，然后通过setWaitingToEmit方法将拉取的数据放入waitingToEmit最后调用emitIfWaitingNotEmitted方法，当有数据的时候就进行emit或者retry，没有数据时通过while循环进行waitingcreateFetchedOffsetsMetadatastorm-kafka-client-1.2.2-sources.jar!/org/apache/storm/kafka/spout/KafkaSpout.java private Map<TopicPartition, OffsetAndMetadata> createFetchedOffsetsMetadata(Set<TopicPartition> assignedPartitions) { Map<TopicPartition, OffsetAndMetadata> offsetsToCommit = new HashMap<>(); for (TopicPartition tp : assignedPartitions) { offsetsToCommit.put(tp, new OffsetAndMetadata(kafkaConsumer.position(tp), commitMetadataManager.getCommitMetadata())); } return offsetsToCommit; }这里根据kafkaConsumer.assignment()的信息，通过kafkaConsumer.position(tp)提取下一步将要fetch的offset位置，通过commitMetadataManager.getCommitMetadata()提取CommitMetadata的json串作为元信息commitOffsetsForAckedTuplesstorm-kafka-client-1.2.2-sources.jar!/org/apache/storm/kafka/spout/KafkaSpout.java private void commitOffsetsForAckedTuples(Set<TopicPartition> assignedPartitions) { // Find offsets that are ready to be committed for every assigned topic partition final Map<TopicPartition, OffsetManager> assignedOffsetManagers = new HashMap<>(); for (Entry<TopicPartition, OffsetManager> entry : offsetManagers.entrySet()) { if (assignedPartitions.contains(entry.getKey())) { assignedOffsetManagers.put(entry.getKey(), entry.getValue()); } } final Map<TopicPartition, OffsetAndMetadata> nextCommitOffsets = new HashMap<>(); for (Map.Entry<TopicPartition, OffsetManager> tpOffset : assignedOffsetManagers.entrySet()) { final OffsetAndMetadata nextCommitOffset = tpOffset.getValue().findNextCommitOffset(commitMetadataManager.getCommitMetadata()); if (nextCommitOffset != null) { nextCommitOffsets.put(tpOffset.getKey(), nextCommitOffset); } } // Commit offsets that are ready to be committed for every topic partition if (!nextCommitOffsets.isEmpty()) { kafkaConsumer.commitSync(nextCommitOffsets); LOG.debug(“Offsets successfully committed to Kafka [{}]”, nextCommitOffsets); // Instead of iterating again, it would be possible to commit and update the state for each TopicPartition // in the prior loop, but the multiple network calls should be more expensive than iterating twice over a small loop for (Map.Entry<TopicPartition, OffsetAndMetadata> tpOffset : nextCommitOffsets.entrySet()) { //Update the OffsetManager for each committed partition, and update numUncommittedOffsets final TopicPartition tp = tpOffset.getKey(); long position = kafkaConsumer.position(tp); long committedOffset = tpOffset.getValue().offset(); if (position < committedOffset) { / * The position is behind the committed offset. This can happen in some cases, e.g. if a message failed, lots of (more * than max.poll.records) later messages were acked, and the failed message then gets acked. The consumer may only be * part way through “catching up” to where it was when it went back to retry the failed tuple. Skip the consumer forward * to the committed offset and drop the current waiting to emit list, since it’ll likely contain committed offsets. / LOG.debug(“Consumer fell behind committed offset. Catching up. Position was [{}], skipping to [{}]”, position, committedOffset); kafkaConsumer.seek(tp, committedOffset); List<ConsumerRecord<K, V>> waitingToEmitForTp = waitingToEmit.get(tp); if (waitingToEmitForTp != null) { //Discard the pending records that are already committed List<ConsumerRecord<K, V>> filteredRecords = new ArrayList<>(); for (ConsumerRecord<K, V> record : waitingToEmitForTp) { if (record.offset() >= committedOffset) { filteredRecords.add(record); } } waitingToEmit.put(tp, filteredRecords); } } final OffsetManager offsetManager = assignedOffsetManagers.get(tp); offsetManager.commit(tpOffset.getValue()); LOG.debug("[{}] uncommitted offsets for partition [{}] after commit", offsetManager.getNumUncommittedOffsets(), tp); } } else { LOG.trace(“No offsets to commit. {}”, this); } }这里首先通过offsetManagers，获取已经ack的等待commit的partition以及msgId信息，如果是ProcessingGuarantee.AT_MOST_ONCE则该集合为空之后根据CommitMetadata通过OffsetManager.findNextCommitOffset获取这一批待commit的消息的offset然后调用kafkaConsumer.commitSync同步提交offset，之后更新本地的OffsetManager的committed相关信息getPollablePartitionsInfostorm-kafka-client-1.2.2-sources.jar!/org/apache/storm/kafka/spout/KafkaSpout.java private PollablePartitionsInfo getPollablePartitionsInfo() { if (isWaitingToEmit()) { LOG.debug(“Not polling. Tuples waiting to be emitted.”); return new PollablePartitionsInfo(Collections.<TopicPartition>emptySet(), Collections.<TopicPartition, Long>emptyMap()); } Set<TopicPartition> assignment = kafkaConsumer.assignment(); if (!isAtLeastOnceProcessing()) { return new PollablePartitionsInfo(assignment, Collections.<TopicPartition, Long>emptyMap()); } Map<TopicPartition, Long> earliestRetriableOffsets = retryService.earliestRetriableOffsets(); Set<TopicPartition> pollablePartitions = new HashSet<>(); final int maxUncommittedOffsets = kafkaSpoutConfig.getMaxUncommittedOffsets(); for (TopicPartition tp : assignment) { OffsetManager offsetManager = offsetManagers.get(tp); int numUncommittedOffsets = offsetManager.getNumUncommittedOffsets(); if (numUncommittedOffsets < maxUncommittedOffsets) { //Allow poll if the partition is not at the maxUncommittedOffsets limit pollablePartitions.add(tp); } else { long offsetAtLimit = offsetManager.getNthUncommittedOffsetAfterCommittedOffset(maxUncommittedOffsets); Long earliestRetriableOffset = earliestRetriableOffsets.get(tp); if (earliestRetriableOffset != null && earliestRetriableOffset <= offsetAtLimit) { //Allow poll if there are retriable tuples within the maxUncommittedOffsets limit pollablePartitions.add(tp); } else { LOG.debug(“Not polling on partition [{}]. It has [{}] uncommitted offsets, which exceeds the limit of [{}]. “, tp, numUncommittedOffsets, maxUncommittedOffsets); } } } return new PollablePartitionsInfo(pollablePartitions, earliestRetriableOffsets); }这里对于不是ProcessingGuarantee.AT_LEAST_ONCE类型的，则直接根据kafkaConsumer.assignment()信息返回如果是ProcessingGuarantee.AT_LEAST_ONCE类型类型的，这里会获取retryService.earliestRetriableOffsets()，把fail相关的offset信息整合进去这里有一个maxUncommittedOffsets参数，在numUncommittedOffsets<maxUncommittedOffsets时会进行重试，如果大于等于maxUncommittedOffsets，则会进一步判断，如果是earliestRetriableOffset小于等于offsetAtLimit，那么也加入重试pollKafkaBrokerstorm-kafka-client-1.2.2-sources.jar!/org/apache/storm/kafka/spout/KafkaSpout.java // ======== poll ========= private ConsumerRecords<K, V> pollKafkaBroker(PollablePartitionsInfo pollablePartitionsInfo) { doSeekRetriableTopicPartitions(pollablePartitionsInfo.pollableEarliestRetriableOffsets); Set<TopicPartition> pausedPartitions = new HashSet<>(kafkaConsumer.assignment()); Iterator<TopicPartition> pausedIter = pausedPartitions.iterator(); while (pausedIter.hasNext()) { if (pollablePartitionsInfo.pollablePartitions.contains(pausedIter.next())) { pausedIter.remove(); } } try { kafkaConsumer.pause(pausedPartitions); final ConsumerRecords<K, V> consumerRecords = kafkaConsumer.poll(kafkaSpoutConfig.getPollTimeoutMs()); ackRetriableOffsetsIfCompactedAway(pollablePartitionsInfo.pollableEarliestRetriableOffsets, consumerRecords); final int numPolledRecords = consumerRecords.count(); LOG.debug(“Polled [{}] records from Kafka”, numPolledRecords); if (kafkaSpoutConfig.getProcessingGuarantee() == KafkaSpoutConfig.ProcessingGuarantee.AT_MOST_ONCE) { //Commit polled records immediately to ensure delivery is at-most-once. Map<TopicPartition, OffsetAndMetadata> offsetsToCommit = createFetchedOffsetsMetadata(kafkaConsumer.assignment()); kafkaConsumer.commitSync(offsetsToCommit); LOG.debug(“Committed offsets {} to Kafka”, offsetsToCommit); } return consumerRecords; } finally { kafkaConsumer.resume(pausedPartitions); } } private void doSeekRetriableTopicPartitions(Map<TopicPartition, Long> pollableEarliestRetriableOffsets) { for (Entry<TopicPartition, Long> retriableTopicPartitionAndOffset : pollableEarliestRetriableOffsets.entrySet()) { //Seek directly to the earliest retriable message for each retriable topic partition kafkaConsumer.seek(retriableTopicPartitionAndOffset.getKey(), retriableTopicPartitionAndOffset.getValue()); } } private void ackRetriableOffsetsIfCompactedAway(Map<TopicPartition, Long> earliestRetriableOffsets, ConsumerRecords<K, V> consumerRecords) { for (Entry<TopicPartition, Long> entry : earliestRetriableOffsets.entrySet()) { TopicPartition tp = entry.getKey(); List<ConsumerRecord<K, V>> records = consumerRecords.records(tp); if (!records.isEmpty()) { ConsumerRecord<K, V> record = records.get(0); long seekOffset = entry.getValue(); long earliestReceivedOffset = record.offset(); if (seekOffset < earliestReceivedOffset) { //Since we asked for tuples starting at seekOffset, some retriable records must have been compacted away. //Ack up to the first offset received if the record is not already acked or currently in the topology for (long i = seekOffset; i < earliestReceivedOffset; i++) { KafkaSpoutMessageId msgId = retryService.getMessageId(new ConsumerRecord<>(tp.topic(), tp.partition(), i, null, null)); if (!offsetManagers.get(tp).contains(msgId) && !emitted.contains(msgId)) { LOG.debug(“Record at offset [{}] appears to have been compacted away from topic [{}], marking as acked”, i, tp); retryService.remove(msgId); emitted.add(msgId); ack(msgId); } } } } } }如果PollablePartitionsInfo的pollablePartitions不为空，则会调用pollKafkaBroker拉取消息首先调用了doSeekRetriableTopicPartitions，根据要重试的partition及offset信息，进行seek操作，对每个parition移动到要重试的最早的offset位置拉取消息的时候，先pause不符合maxUncommitted等条件的paritions，然后进行poll消息，poll拉取消息之后判断如果是ProcessingGuarantee.AT_MOST_ONCE类型的，则调用kafkaConsumer.commitSync同步提交，然后返回拉取的记录(最后设置到waitingToEmit)，最后再resume之前pause的partitions(通过这样避免拉取不符合提交条件的partitions的消息)；注意这里的pollablePartitionsInfo是根据getPollablePartitionsInfo()获取的，它是遍历kafkaConsumer.assignment()根据offsetManager及maxUncommittedOffsets等相关参数进行过滤，因此可以认为pollablePartitionsInfo.pollablePartitions是kafkaConsumer.assignment()的子集，而pausedPartitions是根据kafkaConsumer.assignment()过滤掉pollablePartitionsInfo.pollablePartitions得来的，因而pausedPartitions就是getPollablePartitionsInfo()中不满足条件被剔除的partitions，针对这些partitions，先pause再调用poll，最后再resume，也就是此次poll不会从pausedPartitions拉取消息在poll消息之后还有一个动作就是调用ackRetriableOffsetsIfCompactedAway，针对已经compacted的消息进行ack处理emitIfWaitingNotEmittedstorm-kafka-client-1.2.2-sources.jar!/org/apache/storm/kafka/spout/KafkaSpout.java private void emitIfWaitingNotEmitted() { Iterator<List<ConsumerRecord<K, V>>> waitingToEmitIter = waitingToEmit.values().iterator(); outerLoop: while (waitingToEmitIter.hasNext()) { List<ConsumerRecord<K, V>> waitingToEmitForTp = waitingToEmitIter.next(); while (!waitingToEmitForTp.isEmpty()) { final boolean emittedTuple = emitOrRetryTuple(waitingToEmitForTp.remove(0)); if (emittedTuple) { break outerLoop; } } waitingToEmitIter.remove(); } }emitIfWaitingNotEmitted主要是判断waitingToEmit有无数据，有则取出来触发emitOrRetryTuple，没有则不断循环进行waitingemitOrRetryTuplestorm-kafka-client-1.2.2-sources.jar!/org/apache/storm/kafka/spout/KafkaSpout.java /* * Creates a tuple from the kafka record and emits it if it was never emitted or it is ready to be retried. * * @param record to be emitted * @return true if tuple was emitted. False if tuple has been acked or has been emitted and is pending ack or fail / private boolean emitOrRetryTuple(ConsumerRecord<K, V> record) { final TopicPartition tp = new TopicPartition(record.topic(), record.partition()); final KafkaSpoutMessageId msgId = retryService.getMessageId(record); if (offsetManagers.containsKey(tp) && offsetManagers.get(tp).contains(msgId)) { // has been acked LOG.trace(“Tuple for record [{}] has already been acked. Skipping”, record); } else if (emitted.contains(msgId)) { // has been emitted and it is pending ack or fail LOG.trace(“Tuple for record [{}] has already been emitted. Skipping”, record); } else { final OffsetAndMetadata committedOffset = kafkaConsumer.committed(tp); if (isAtLeastOnceProcessing() && committedOffset != null && committedOffset.offset() > record.offset() && commitMetadataManager.isOffsetCommittedByThisTopology(tp, committedOffset, Collections.unmodifiableMap(offsetManagers))) { // Ensures that after a topology with this id is started, the consumer fetch // position never falls behind the committed offset (STORM-2844) throw new IllegalStateException(“Attempting to emit a message that has already been committed.” + " This should never occur when using the at-least-once processing guarantee.”); } final List<Object> tuple = kafkaSpoutConfig.getTranslator().apply(record); if (isEmitTuple(tuple)) { final boolean isScheduled = retryService.isScheduled(msgId); // not scheduled <=> never failed (i.e. never emitted), or scheduled and ready to be retried if (!isScheduled || retryService.isReady(msgId)) { final String stream = tuple instanceof KafkaTuple ? ((KafkaTuple) tuple).getStream() : Utils.DEFAULT_STREAM_ID; if (!isAtLeastOnceProcessing()) { if (kafkaSpoutConfig.isTupleTrackingEnforced()) { collector.emit(stream, tuple, msgId); LOG.trace(“Emitted tuple [{}] for record [{}] with msgId [{}]”, tuple, record, msgId); } else { collector.emit(stream, tuple); LOG.trace(“Emitted tuple [{}] for record [{}]”, tuple, record); } } else { emitted.add(msgId); offsetManagers.get(tp).addToEmitMsgs(msgId.offset()); if (isScheduled) { // Was scheduled for retry and re-emitted, so remove from schedule. retryService.remove(msgId); } collector.emit(stream, tuple, msgId); tupleListener.onEmit(tuple, msgId); LOG.trace(“Emitted tuple [{}] for record [{}] with msgId [{}]”, tuple, record, msgId); } return true; } } else { /if a null tuple is not configured to be emitted, it should be marked as emitted and acked immediately * to allow its offset to be commited to Kafka/ LOG.debug(“Not emitting null tuple for record [{}] as defined in configuration.”, record); if (isAtLeastOnceProcessing()) { msgId.setNullTuple(true); offsetManagers.get(tp).addToEmitMsgs(msgId.offset()); ack(msgId); } } } return false; }emitOrRetryTuple是整个nextTuple的核心，这里包含了emit操作以及retry操作由于针对fail的消息，是使用seek方法进行重新拉取的，因而这里要使用offsetManagers(已经acked等待commit)以及emitted(已经emit等待ack)进行去重判断，如果这两者都不包含，才进行emit或者retry进行emit处理时，先通过retryService.isScheduled(msgId)判断是否是失败重试的，如果不是失败重试的，或者是失败重试的且已经到期了，那么就是进行下面的emit处理针对ProcessingGuarantee.AT_LEAST_ONCE类型的，这里要维护emitted以及offsetManagers，然后进行emit操作，回调tupleListener.onEmit(tuple, msgId)方法；如果不是ProcessingGuarantee.AT_LEAST_ONCE类型的，则仅仅是进行collector.emit操作KafkaSpout.ackstorm-kafka-client-1.2.2-sources.jar!/org/apache/storm/kafka/spout/KafkaSpout.java // ======== Ack ======= @Override public void ack(Object messageId) { if (!isAtLeastOnceProcessing()) { return; } // Only need to keep track of acked tuples if commits to Kafka are controlled by // tuple acks, which happens only for at-least-once processing semantics final KafkaSpoutMessageId msgId = (KafkaSpoutMessageId) messageId; if (msgId.isNullTuple()) { //a null tuple should be added to the ack list since by definition is a direct ack offsetManagers.get(msgId.getTopicPartition()).addToAckMsgs(msgId); LOG.debug(“Received direct ack for message [{}], associated with null tuple”, msgId); tupleListener.onAck(msgId); return; } if (!emitted.contains(msgId)) { LOG.debug(“Received ack for message [{}], associated with tuple emitted for a ConsumerRecord that " + “came from a topic-partition that this consumer group instance is no longer tracking " + “due to rebalance/partition reassignment. No action taken.”, msgId); } else { Validate.isTrue(!retryService.isScheduled(msgId), “The message id " + msgId + " is queued for retry while being acked.” + " This should never occur barring errors in the RetryService implementation or the spout code.”); offsetManagers.get(msgId.getTopicPartition()).addToAckMsgs(msgId); emitted.remove(msgId); } tupleListener.onAck(msgId); }ack的时候，如果不是ProcessingGuarantee.AT_LEAST_ONCE类型，就立马返回之后将已经acked的msgId放入到offsetManagers这个map中，等待在nextTuple中进行commit，然后将其从emitted中移除这里有一个emitted的去重判断，如果不是之前emit过的就不处理，这种通常是rebalance/partition reassignment引起的KafkaSpout.failstorm-kafka-client-1.2.2-sources.jar!/org/apache/storm/kafka/spout/KafkaSpout.java // ======== Fail ======= @Override public void fail(Object messageId) { if (!isAtLeastOnceProcessing()) { return; } // Only need to keep track of failed tuples if commits to Kafka are controlled by // tuple acks, which happens only for at-least-once processing semantics final KafkaSpoutMessageId msgId = (KafkaSpoutMessageId) messageId; if (!emitted.contains(msgId)) { LOG.debug(“Received fail for tuple this spout is no longer tracking.” + " Partitions may have been reassigned. Ignoring message [{}]”, msgId); return; } Validate.isTrue(!retryService.isScheduled(msgId), “The message id " + msgId + " is queued for retry while being failed.” + " This should never occur barring errors in the RetryService implementation or the spout code.”); msgId.incrementNumFails(); if (!retryService.schedule(msgId)) { LOG.debug(“Reached maximum number of retries. Message [{}] being marked as acked.”, msgId); // this tuple should be removed from emitted only inside the ack() method. This is to ensure // that the OffsetManager for that TopicPartition is updated and allows commit progression tupleListener.onMaxRetryReached(msgId); ack(msgId); } else { tupleListener.onRetry(msgId); emitted.remove(msgId); } }fail的时候也先判断，如果不是ProcessingGuarantee.AT_LEAST_ONCE类型，就立马返回然后判断emitted中是否存在，如果不存在，则立刻返回，这通常是partition reassigned引起的fail的时候，调用retryService.schedule(msgId)，如果不成功，则触发tupleListener.onMaxRetryReached，然后进行ack；如果成功则调用tupleListener.onRetry回调，然后从emitted中删除KafkaSpoutRetryExponentialBackoff.schedulestorm-kafka-client-1.2.2-sources.jar!/org/apache/storm/kafka/spout/KafkaSpoutRetryExponentialBackoff.java private static final RetryEntryTimeStampComparator RETRY_ENTRY_TIME_STAMP_COMPARATOR = new RetryEntryTimeStampComparator(); //This class assumes that there is at most one retry schedule per message id in this set at a time. private final Set<RetrySchedule> retrySchedules = new TreeSet<>(RETRY_ENTRY_TIME_STAMP_COMPARATOR); /* * Comparator ordering by timestamp */ private static class RetryEntryTimeStampComparator implements Serializable, Comparator<RetrySchedule> { @Override public int compare(RetrySchedule entry1, RetrySchedule entry2) { int result = Long.valueOf(entry1.nextRetryTimeNanos()).compareTo(entry2.nextRetryTimeNanos()); if(result == 0) { //TreeSet uses compareTo instead of equals() for the Set contract //Ensure that we can save two retry schedules with the same timestamp result = entry1.hashCode() - entry2.hashCode(); } return result; } } @Override public boolean schedule(KafkaSpoutMessageId msgId) { if (msgId.numFails() > maxRetries) { LOG.debug(“Not scheduling [{}] because reached maximum number of retries [{}].”, msgId, maxRetries); return false; } else { //Remove existing schedule for the message id remove(msgId); final RetrySchedule retrySchedule = new RetrySchedule(msgId, nextTime(msgId)); retrySchedules.add(retrySchedule); toRetryMsgs.add(msgId); LOG.debug(“Scheduled. {}”, retrySchedule); LOG.trace(“Current state {}”, retrySchedules); return true; } } @Override public Map<TopicPartition, Long> earliestRetriableOffsets() { final Map<TopicPartition, Long> tpToEarliestRetriableOffset = new HashMap<>(); final long currentTimeNanos = Time.nanoTime(); for (RetrySchedule retrySchedule : retrySchedules) { if (retrySchedule.retry(currentTimeNanos)) { final KafkaSpoutMessageId msgId = retrySchedule.msgId; final TopicPartition tpForMessage = new TopicPartition(msgId.topic(), msgId.partition()); final Long currentLowestOffset = tpToEarliestRetriableOffset.get(tpForMessage); if(currentLowestOffset != null) { tpToEarliestRetriableOffset.put(tpForMessage, Math.min(currentLowestOffset, msgId.offset())); } else { tpToEarliestRetriableOffset.put(tpForMessage, msgId.offset()); } } else { break; // Stop searching as soon as passed current time } } LOG.debug(“Topic partitions with entries ready to be retried [{}] “, tpToEarliestRetriableOffset); return tpToEarliestRetriableOffset; } @Override public boolean isReady(KafkaSpoutMessageId msgId) { boolean retry = false; if (isScheduled(msgId)) { final long currentTimeNanos = Time.nanoTime(); for (RetrySchedule retrySchedule : retrySchedules) { if (retrySchedule.retry(currentTimeNanos)) { if (retrySchedule.msgId.equals(msgId)) { retry = true; LOG.debug(“Found entry to retry {}”, retrySchedule); break; //Stop searching if the message is known to be ready for retry } } else { LOG.debug(“Entry to retry not found {}”, retrySchedule); break; // Stop searching as soon as passed current time } } } return retry; }schedule首先判断失败次数是否超过maxRetries，如果超过了则返回false，表示不再调度了，之后KafkaSpout在fail方法回调tupleListener.onMaxRetryReached方法，然后进行ack，表示不再处理了没有超过maxRetries的话，则创建retrySchedule信息，然后添加到retrySchedules中；retrySchedules是一个TreeSet，默认使用RetryEntryTimeStampComparator，根据nextRetryTimeNanos进行排序，如果相等则按hashCode进行排序earliestRetriableOffsets以及isReady都会用到retrySchedules的信息小结storm-kafka-client主要针对kafka0.10及以上版本，它引入了ProcessingGuarantee枚举，该枚举有三个值，分别是ProcessingGuarantee.AT_LEAST_ONCE就是开启ack的版本，它类似kafka client的auto commit，在指定interval定期commit；它会维护已经emitted(已经emitted但尚未ack)，offsetManagers(已经ack但尚未commit)以及fail需要重试的retrySchedulesProcessingGuarantee.AT_MOST_ONCE，它就不管ack了，在polled out消息的时候同步commit(忽略interval配置)，因而该消息最多被处理一次ProcessingGuarantee.NO_GUARANTEE，这个也是不管ack的，不过它跟ProcessingGuarantee.AT_LEAST_ONCE类似，是在指定interval定期commit(都依赖commitTimer)，不同的是它是异步ProcessingGuarantee.AT_LEAST_ONCE它结合了storm的ack机制，在spout的ack方法维护emitted(已经emitted但尚未ack)；在fail方法将msgId放入到retryService进行重试(这个是ProcessingGuarantee.NO_GUARANTEE所没有的)；它跟ProcessingGuarantee.NO_GUARANTEE一样是依赖commitTimer，在initerval期间提交offset信息，不同的是它是commitSync，即同步提交，而且提交的是已经acked的消息；而ProcessingGuarantee.NO_GUARANTEE是异步提交，而且提交的是offset是不管是否在storm spout已经ack，而是以consumer的poll为准的ProcessingGuarantee.AT_MOST_ONCE是在pollKafkaBroker方法里头，在调用完kafkaConsumer.poll之后，调用kafkaConsumer.commitSync进行同步提交commit；它是同步提交，而且不依赖commitTimer，即不是interval提交offsetProcessingGuarantee.NO_GUARANTEE在nextTuple中判断需要commit的时候，调用kafkaConsumer.commitAsync进行异步提交，它跟ProcessingGuarantee.AT_LEAST_ONCE一样，都依赖commitTimer，在initerval期间提交offset，但是它是异步提交，而ProcessingGuarantee.AT_LEAST_ONCE是同步提交nextTuple()方法会pollKafkaBroker会调用kafkaConsumer.poll方法拉取消息，然后将拉取到的消息放入waitingToEmit，之后调用emitIfWaitingNotEmitted方法进行emit或者waiting，如果emit则是调用emitOrRetryTuple方法；由于pollKafkaBroker会执行seek操作将offset移动到每个parition中失败的offset中最小的位置，从那个位置开始重新拉取消息，拉取消息调用了kafkaConsumer.poll方法，KafkaSpoutConfig.ProcessingGuarantee.AT_MOST_ONCE是在这里进行kafkaConsumer.commitSync同步提交offset的；由于包含了要重试的消息，emitOrRetryTuple这里要根据offsetManagers(已经ack等待commit)以及emitted(已经emit等待ack)进行去重判断是否需要调用collector.emit；对于ProcessingGuarantee.AT_LEAST_ONCE类型，这里不仅调用emit方法，还需要维护offsetManagers、emitted及重试信息相关状态，然后回调tupleListener.onEmit方法；对于非ProcessingGuarantee.AT_LEAST_ONCE类型这里仅仅是emit。docStorm Apache Kafka integration using the kafka-client jar ...

聊聊storm trident的coordinator

序本文主要研究一下storm trident的coordinator实例代码示例 @Test public void testDebugTopologyBuild(){ FixedBatchSpout spout = new FixedBatchSpout(new Fields(“user”, “score”), 3, new Values(“nickt1”, 4), new Values(“nickt2”, 7), new Values(“nickt3”, 8), new Values(“nickt4”, 9), new Values(“nickt5”, 7), new Values(“nickt6”, 11), new Values(“nickt7”, 5) ); spout.setCycle(false); TridentTopology topology = new TridentTopology(); Stream stream1 = topology.newStream(“spout1”,spout) .each(new Fields(“user”, “score”), new BaseFunction() { @Override public void execute(TridentTuple tuple, TridentCollector collector) { System.out.println(“tuple:"+tuple); } },new Fields()); topology.build(); }这里使用的spout为FixedBatchSpout，它是IBatchSpout类型拓扑图MasterBatchCoordinatorstorm-1.2.2/storm-core/src/jvm/org/apache/storm/trident/topology/MasterBatchCoordinator.javapublic class MasterBatchCoordinator extends BaseRichSpout { public static final Logger LOG = LoggerFactory.getLogger(MasterBatchCoordinator.class); public static final long INIT_TXID = 1L; public static final String BATCH_STREAM_ID = “$batch”; public static final String COMMIT_STREAM_ID = “$commit”; public static final String SUCCESS_STREAM_ID = “$success”; private static final String CURRENT_TX = “currtx”; private static final String CURRENT_ATTEMPTS = “currattempts”; private List<TransactionalState> _states = new ArrayList(); TreeMap<Long, TransactionStatus> _activeTx = new TreeMap<Long, TransactionStatus>(); TreeMap<Long, Integer> _attemptIds; private SpoutOutputCollector _collector; Long _currTransaction; int _maxTransactionActive; List<ITridentSpout.BatchCoordinator> _coordinators = new ArrayList(); List<String> _managedSpoutIds; List<ITridentSpout> _spouts; WindowedTimeThrottler _throttler; boolean _active = true; public MasterBatchCoordinator(List<String> spoutIds, List<ITridentSpout> spouts) { if(spoutIds.isEmpty()) { throw new IllegalArgumentException(“Must manage at least one spout”); } _managedSpoutIds = spoutIds; _spouts = spouts; LOG.debug(“Created {}”, this); } public List<String> getManagedSpoutIds(){ return _managedSpoutIds; } @Override public void activate() { _active = true; } @Override public void deactivate() { _active = false; } @Override public void open(Map conf, TopologyContext context, SpoutOutputCollector collector) { _throttler = new WindowedTimeThrottler((Number)conf.get(Config.TOPOLOGY_TRIDENT_BATCH_EMIT_INTERVAL_MILLIS), 1); for(String spoutId: _managedSpoutIds) { _states.add(TransactionalState.newCoordinatorState(conf, spoutId)); } _currTransaction = getStoredCurrTransaction(); _collector = collector; Number active = (Number) conf.get(Config.TOPOLOGY_MAX_SPOUT_PENDING); if(active==null) { _maxTransactionActive = 1; } else { _maxTransactionActive = active.intValue(); } _attemptIds = getStoredCurrAttempts(_currTransaction, _maxTransactionActive); for(int i=0; i<_spouts.size(); i++) { String txId = _managedSpoutIds.get(i); _coordinators.add(_spouts.get(i).getCoordinator(txId, conf, context)); } LOG.debug(“Opened {}”, this); } @Override public void close() { for(TransactionalState state: _states) { state.close(); } LOG.debug(“Closed {}”, this); } @Override public void declareOutputFields(OutputFieldsDeclarer declarer) { // in partitioned example, in case an emitter task receives a later transaction than it’s emitted so far, // when it sees the earlier txid it should know to emit nothing declarer.declareStream(BATCH_STREAM_ID, new Fields(“tx”)); declarer.declareStream(COMMIT_STREAM_ID, new Fields(“tx”)); declarer.declareStream(SUCCESS_STREAM_ID, new Fields(“tx”)); } @Override public Map<String, Object> getComponentConfiguration() { Config ret = new Config(); ret.setMaxTaskParallelism(1); ret.registerSerialization(TransactionAttempt.class); return ret; } //……}prepare方法首先从Config.TOPOLOGY_TRIDENT_BATCH_EMIT_INTERVAL_MILLIS(topology.trident.batch.emit.interval.millis，在defaults.yaml默认为500)读取触发batch的频率配置，然后创建WindowedTimeThrottler，其maxAmt值为1这里使用TransactionalState在zookeeper上维护transactional状态之后读取Config.TOPOLOGY_MAX_SPOUT_PENDING(topology.max.spout.pending，在defaults.yaml中默认为null)设置_maxTransactionActive，如果为null，则设置为1MasterBatchCoordinator.nextTuplestorm-1.2.2/storm-core/src/jvm/org/apache/storm/trident/topology/MasterBatchCoordinator.java @Override public void nextTuple() { sync(); } private void sync() { // note that sometimes the tuples active may be less than max_spout_pending, e.g. // max_spout_pending = 3 // tx 1, 2, 3 active, tx 2 is acked. there won’t be a commit for tx 2 (because tx 1 isn’t committed yet), // and there won’t be a batch for tx 4 because there’s max_spout_pending tx active TransactionStatus maybeCommit = _activeTx.get(_currTransaction); if(maybeCommit!=null && maybeCommit.status == AttemptStatus.PROCESSED) { maybeCommit.status = AttemptStatus.COMMITTING; _collector.emit(COMMIT_STREAM_ID, new Values(maybeCommit.attempt), maybeCommit.attempt); LOG.debug(“Emitted on [stream = {}], [tx_status = {}], [{}]”, COMMIT_STREAM_ID, maybeCommit, this); } if(_active) { if(_activeTx.size() < _maxTransactionActive) { Long curr = _currTransaction; for(int i=0; i<_maxTransactionActive; i++) { if(!_activeTx.containsKey(curr) && isReady(curr)) { // by using a monotonically increasing attempt id, downstream tasks // can be memory efficient by clearing out state for old attempts // as soon as they see a higher attempt id for a transaction Integer attemptId = _attemptIds.get(curr); if(attemptId==null) { attemptId = 0; } else { attemptId++; } _attemptIds.put(curr, attemptId); for(TransactionalState state: _states) { state.setData(CURRENT_ATTEMPTS, _attemptIds); } TransactionAttempt attempt = new TransactionAttempt(curr, attemptId); final TransactionStatus newTransactionStatus = new TransactionStatus(attempt); _activeTx.put(curr, newTransactionStatus); _collector.emit(BATCH_STREAM_ID, new Values(attempt), attempt); LOG.debug(“Emitted on [stream = {}], [tx_attempt = {}], [tx_status = {}], [{}]”, BATCH_STREAM_ID, attempt, newTransactionStatus, this); _throttler.markEvent(); } curr = nextTransactionId(curr); } } } }nextTuple就是调用sync方法，该方法在ack及fail中均有调用；sync方法首先根据事务状态，如果需要提交，则会往MasterBatchCoordinator.COMMIT_STREAM_ID($commit)发送tuple；之后根据_maxTransactionActive以及WindowedTimeThrottler限制，符合要求才启动新的TransactionAttempt，往MasterBatchCoordinator.BATCH_STREAM_ID($batch)发送tuple，同时对WindowedTimeThrottler标记下windowEvent数量MasterBatchCoordinator.ackstorm-1.2.2/storm-core/src/jvm/org/apache/storm/trident/topology/MasterBatchCoordinator.java @Override public void ack(Object msgId) { TransactionAttempt tx = (TransactionAttempt) msgId; TransactionStatus status = _activeTx.get(tx.getTransactionId()); LOG.debug(“Ack. [tx_attempt = {}], [tx_status = {}], [{}]”, tx, status, this); if(status!=null && tx.equals(status.attempt)) { if(status.status==AttemptStatus.PROCESSING) { status.status = AttemptStatus.PROCESSED; LOG.debug(“Changed status. [tx_attempt = {}] [tx_status = {}]”, tx, status); } else if(status.status==AttemptStatus.COMMITTING) { _activeTx.remove(tx.getTransactionId()); _attemptIds.remove(tx.getTransactionId()); _collector.emit(SUCCESS_STREAM_ID, new Values(tx)); _currTransaction = nextTransactionId(tx.getTransactionId()); for(TransactionalState state: _states) { state.setData(CURRENT_TX, _currTransaction); } LOG.debug(“Emitted on [stream = {}], [tx_attempt = {}], [tx_status = {}], [{}]”, SUCCESS_STREAM_ID, tx, status, this); } sync(); } }ack主要是根据当前事务状态进行不同操作，如果之前是AttemptStatus.PROCESSING状态，则更新为AttemptStatus.PROCESSED；如果之前是AttemptStatus.COMMITTING，则移除当前事务，然后往MasterBatchCoordinator.SUCCESS_STREAM_ID($success)发送tuple，更新_currTransaction为nextTransactionId；最后再调用sync触发新的TransactionAttemptMasterBatchCoordinator.failstorm-1.2.2/storm-core/src/jvm/org/apache/storm/trident/topology/MasterBatchCoordinator.java @Override public void fail(Object msgId) { TransactionAttempt tx = (TransactionAttempt) msgId; TransactionStatus stored = _activeTx.remove(tx.getTransactionId()); LOG.debug(“Fail. [tx_attempt = {}], [tx_status = {}], [{}]”, tx, stored, this); if(stored!=null && tx.equals(stored.attempt)) { _activeTx.tailMap(tx.getTransactionId()).clear(); sync(); } }fail方法将当前事务从_activeTx中移除，然后清空_activeTx中txId大于这个失败txId的数据，最后再调用sync判断是否该触发新的TransactionAttempt(注意这里没有变更_currTransaction，因而sync方法触发新的TransactionAttempt的_txid还是当前这个失败的_currTransaction)TridentSpoutCoordinatorstorm-1.2.2/storm-core/src/jvm/org/apache/storm/trident/spout/TridentSpoutCoordinator.javapublic class TridentSpoutCoordinator implements IBasicBolt { public static final Logger LOG = LoggerFactory.getLogger(TridentSpoutCoordinator.class); private static final String META_DIR = “meta”; ITridentSpout<Object> _spout; ITridentSpout.BatchCoordinator<Object> _coord; RotatingTransactionalState _state; TransactionalState _underlyingState; String _id; public TridentSpoutCoordinator(String id, ITridentSpout<Object> spout) { _spout = spout; _id = id; } @Override public void prepare(Map conf, TopologyContext context) { _coord = _spout.getCoordinator(_id, conf, context); _underlyingState = TransactionalState.newCoordinatorState(conf, _id); _state = new RotatingTransactionalState(_underlyingState, META_DIR); } @Override public void execute(Tuple tuple, BasicOutputCollector collector) { TransactionAttempt attempt = (TransactionAttempt) tuple.getValue(0); if(tuple.getSourceStreamId().equals(MasterBatchCoordinator.SUCCESS_STREAM_ID)) { _state.cleanupBefore(attempt.getTransactionId()); _coord.success(attempt.getTransactionId()); } else { long txid = attempt.getTransactionId(); Object prevMeta = _state.getPreviousState(txid); Object meta = _coord.initializeTransaction(txid, prevMeta, _state.getState(txid)); _state.overrideState(txid, meta); collector.emit(MasterBatchCoordinator.BATCH_STREAM_ID, new Values(attempt, meta)); } } @Override public void cleanup() { _coord.close(); _underlyingState.close(); } @Override public void declareOutputFields(OutputFieldsDeclarer declarer) { declarer.declareStream(MasterBatchCoordinator.BATCH_STREAM_ID, new Fields(“tx”, “metadata”)); } @Override public Map<String, Object> getComponentConfiguration() { Config ret = new Config(); ret.setMaxTaskParallelism(1); return ret; } }TridentSpoutCoordinator的nextTuple根据streamId分别做不同的处理如果是MasterBatchCoordinator.SUCCESS_STREAM_ID($success)则表示master那边接收到了ack已经成功了，然后coordinator就清除该txId之前的数据，然后回调ITridentSpout.BatchCoordinator的success方法如果是MasterBatchCoordinator.BATCH_STREAM_ID($batch)则要启动新的TransactionAttempt，则往MasterBatchCoordinator.BATCH_STREAM_ID($batch)发送tuple，该tuple会被下游的bolt接收(在本实例就是使用TridentSpoutExecutor包装了用户spout的TridentBoltExecutor)TridentBoltExecutorstorm-1.2.2/storm-core/src/jvm/org/apache/storm/trident/topology/TridentBoltExecutor.javapublic class TridentBoltExecutor implements IRichBolt { public static final String COORD_STREAM_PREFIX = “$coord-”; public static String COORD_STREAM(String batch) { return COORD_STREAM_PREFIX + batch; } RotatingMap<Object, TrackedBatch> _batches; @Override public void prepare(Map conf, TopologyContext context, OutputCollector collector) { _messageTimeoutMs = context.maxTopologyMessageTimeout() * 1000L; _lastRotate = System.currentTimeMillis(); _batches = new RotatingMap<>(2); _context = context; _collector = collector; _coordCollector = new CoordinatedOutputCollector(collector); _coordOutputCollector = new BatchOutputCollectorImpl(new OutputCollector(_coordCollector)); _coordConditions = (Map) context.getExecutorData(”__coordConditions"); if(_coordConditions==null) { _coordConditions = new HashMap<>(); for(String batchGroup: _coordSpecs.keySet()) { CoordSpec spec = _coordSpecs.get(batchGroup); CoordCondition cond = new CoordCondition(); cond.commitStream = spec.commitStream; cond.expectedTaskReports = 0; for(String comp: spec.coords.keySet()) { CoordType ct = spec.coords.get(comp); if(ct.equals(CoordType.single())) { cond.expectedTaskReports+=1; } else { cond.expectedTaskReports+=context.getComponentTasks(comp).size(); } } cond.targetTasks = new HashSet<>(); for(String component: Utils.get(context.getThisTargets(), COORD_STREAM(batchGroup), new HashMap<String, Grouping>()).keySet()) { cond.targetTasks.addAll(context.getComponentTasks(component)); } _coordConditions.put(batchGroup, cond); } context.setExecutorData("_coordConditions", _coordConditions); } _bolt.prepare(conf, context, _coordOutputCollector); } //…… @Override public void cleanup() { _bolt.cleanup(); } @Override public void declareOutputFields(OutputFieldsDeclarer declarer) { _bolt.declareOutputFields(declarer); for(String batchGroup: _coordSpecs.keySet()) { declarer.declareStream(COORD_STREAM(batchGroup), true, new Fields(“id”, “count”)); } } @Override public Map<String, Object> getComponentConfiguration() { Map<String, Object> ret = _bolt.getComponentConfiguration(); if(ret==null) ret = new HashMap<>(); ret.put(Config.TOPOLOGY_TICK_TUPLE_FREQ_SECS, 5); // TODO: Need to be able to set the tick tuple time to the message timeout, ideally without parameterization return ret; }}prepare的时候，先创建了CoordinatedOutputCollector，之后用OutputCollector包装，再最后包装为BatchOutputCollectorImpl，调用ITridentBatchBolt.prepare方法，ITridentBatchBolt这里头使用的实现类为TridentSpoutExecutorprepare初始化了RotatingMap<Object, TrackedBatch> _batches = new RotatingMap<>(2);prepare主要做的是构建CoordCondition，这里主要是计算expectedTaskReports以及targetTasksTridentBoltExecutor.executestorm-1.2.2/storm-core/src/jvm/org/apache/storm/trident/topology/TridentBoltExecutor.java @Override public void execute(Tuple tuple) { if(TupleUtils.isTick(tuple)) { long now = System.currentTimeMillis(); if(now - _lastRotate > _messageTimeoutMs) { _batches.rotate(); _lastRotate = now; } return; } String batchGroup = _batchGroupIds.get(tuple.getSourceGlobalStreamId()); if(batchGroup==null) { // this is so we can do things like have simple DRPC that doesn’t need to use batch processing _coordCollector.setCurrBatch(null); _bolt.execute(null, tuple); _collector.ack(tuple); return; } IBatchID id = (IBatchID) tuple.getValue(0); //get transaction id //if it already exists and attempt id is greater than the attempt there TrackedBatch tracked = (TrackedBatch) _batches.get(id.getId());// if(_batches.size() > 10 && _context.getThisTaskIndex() == 0) {// System.out.println(“Received in " + _context.getThisComponentId() + " " + _context.getThisTaskIndex()// + " (” + _batches.size() + “)” +// “\ntuple: " + tuple +// “\nwith tracked " + tracked +// “\nwith id " + id + // “\nwith group " + batchGroup// + “\n”);// // } //System.out.println(“Num tracked: " + _batches.size() + " " + _context.getThisComponentId() + " " + _context.getThisTaskIndex()); // this code here ensures that only one attempt is ever tracked for a batch, so when // failures happen you don’t get an explosion in memory usage in the tasks if(tracked!=null) { if(id.getAttemptId() > tracked.attemptId) { _batches.remove(id.getId()); tracked = null; } else if(id.getAttemptId() < tracked.attemptId) { // no reason to try to execute a previous attempt than we’ve already seen return; } } if(tracked==null) { tracked = new TrackedBatch(new BatchInfo(batchGroup, id, _bolt.initBatchState(batchGroup, id)), _coordConditions.get(batchGroup), id.getAttemptId()); _batches.put(id.getId(), tracked); } _coordCollector.setCurrBatch(tracked); //System.out.println(“TRACKED: " + tracked + " " + tuple); TupleType t = getTupleType(tuple, tracked); if(t==TupleType.COMMIT) { tracked.receivedCommit = true; checkFinish(tracked, tuple, t); } else if(t==TupleType.COORD) { int count = tuple.getInteger(1); tracked.reportedTasks++; tracked.expectedTupleCount+=count; checkFinish(tracked, tuple, t); } else { tracked.receivedTuples++; boolean success = true; try { _bolt.execute(tracked.info, tuple); if(tracked.condition.expectedTaskReports==0) { success = finishBatch(tracked, tuple); } } catch(FailedException e) { failBatch(tracked, e); } if(success) { _collector.ack(tuple); } else { _collector.fail(tuple); } } _coordCollector.setCurrBatch(null); } private TupleType getTupleType(Tuple tuple, TrackedBatch batch) { CoordCondition cond = batch.condition; if(cond.commitStream!=null && tuple.getSourceGlobalStreamId().equals(cond.commitStream)) { return TupleType.COMMIT; } else if(cond.expectedTaskReports > 0 && tuple.getSourceStreamId().startsWith(COORD_STREAM_PREFIX)) { return TupleType.COORD; } else { return TupleType.REGULAR; } } private void failBatch(TrackedBatch tracked, FailedException e) { if(e!=null && e instanceof ReportedFailedException) { _collector.reportError(e); } tracked.failed = true; if(tracked.delayedAck!=null) { _collector.fail(tracked.delayedAck); tracked.delayedAck = null; } }TridentBoltExecutor的execute方法首先判断是否是tickTuple，如果是判断距离_lastRotate的时间(prepare的时候初始化为当时的时间)是否超过_messageTimeoutMs，如果是则进行_batches.rotate()操作；tickTuple的发射频率为Config.TOPOLOGY_TICK_TUPLE_FREQ_SECS(topology.tick.tuple.freq.secs)，在TridentBoltExecutor中它被设置为5秒；_messageTimeoutMs为context.maxTopologyMessageTimeout() * 1000L，它从整个topology的component的Config.TOPOLOGY_MESSAGE_TIMEOUT_SECS(topology.message.timeout.secs，defaults.yaml中默认为30)最大值1000_batches按TransactionAttempt的txId来存储TrackedBatch信息，如果没有则创建一个新的TrackedBatch；创建TrackedBatch时，会回调_bolt的initBatchState方法之后判断tuple的类型，这里分为TupleType.COMMIT、TupleType.COORD、TupleType.REGULAR；如果是TupleType.COMMIT类型，则设置tracked.receivedCommit为true，然后调用checkFinish方法；如果是TupleType.COORD类型，则更新reportedTasks及expectedTupleCount计数，再调用checkFinish方法；如果是TupleType.REGULAR类型(coordinator发送过来的batch信息)，则更新receivedTuples计数，然后调用_bolt.execute方法(这里的_bolt为TridentSpoutExecutor)，对于tracked.condition.expectedTaskReports==0的则立马调用finishBatch，将该batch从_batches中移除；如果有FailedException则直接failBatch上报error信息，之后对tuple进行ack或者fail；如果下游是each操作，一个batch中如果是部分抛出FailedException异常，则需要等到所有batch中的tuple执行完，等到TupleType.COORD触发检测checkFinish，这个时候才能fail通知到master，也就是有一些滞后性，比如这个batch中有3个tuple，第二个tuple抛出FailedException，还会继续执行第三个tuple，最后该batch的tuple都处理完了，才收到TupleType.COORD触发检测checkFinish。TridentBoltExecutor.checkFinishstorm-1.2.2/storm-core/src/jvm/org/apache/storm/trident/topology/TridentBoltExecutor.java private void checkFinish(TrackedBatch tracked, Tuple tuple, TupleType type) { if(tracked.failed) { failBatch(tracked); _collector.fail(tuple); return; } CoordCondition cond = tracked.condition; boolean delayed = tracked.delayedAck==null && (cond.commitStream!=null && type==TupleType.COMMIT || cond.commitStream==null); if(delayed) { tracked.delayedAck = tuple; } boolean failed = false; if(tracked.receivedCommit && tracked.reportedTasks == cond.expectedTaskReports) { if(tracked.receivedTuples == tracked.expectedTupleCount) { finishBatch(tracked, tuple); } else { //TODO: add logging that not all tuples were received failBatch(tracked); _collector.fail(tuple); failed = true; } } if(!delayed && !failed) { _collector.ack(tuple); } } private void failBatch(TrackedBatch tracked) { failBatch(tracked, null); } private void failBatch(TrackedBatch tracked, FailedException e) { if(e!=null && e instanceof ReportedFailedException) { _collector.reportError(e); } tracked.failed = true; if(tracked.delayedAck!=null) { _collector.fail(tracked.delayedAck); tracked.delayedAck = null; } }TridentBoltExecutor在execute的时候，在tuple是TupleType.COMMIT以及TupleType.COORD的时候都会调用checkFinish一旦_bolt.execute(tracked.info, tuple)方法抛出FailedException，则会调用failBatch，它会标记tracked.failed为truecheckFinish在发现tracked.failed为true的时候，会调用_collector.fail(tuple)，然后回调MasterBatchCoordinator的fail方法TridentSpoutExecutorstorm-1.2.2/storm-core/src/jvm/org/apache/storm/trident/spout/TridentSpoutExecutor.javapublic class TridentSpoutExecutor implements ITridentBatchBolt { public static final String ID_FIELD = “$tx”; public static final Logger LOG = LoggerFactory.getLogger(TridentSpoutExecutor.class); AddIdCollector _collector; ITridentSpout<Object> _spout; ITridentSpout.Emitter<Object> _emitter; String _streamName; String _txStateId; TreeMap<Long, TransactionAttempt> _activeBatches = new TreeMap<>(); public TridentSpoutExecutor(String txStateId, String streamName, ITridentSpout<Object> spout) { _txStateId = txStateId; _spout = spout; _streamName = streamName; } @Override public void prepare(Map conf, TopologyContext context, BatchOutputCollector collector) { _emitter = _spout.getEmitter(_txStateId, conf, context); _collector = new AddIdCollector(_streamName, collector); } @Override public void execute(BatchInfo info, Tuple input) { // there won’t be a BatchInfo for the success stream TransactionAttempt attempt = (TransactionAttempt) input.getValue(0); if(input.getSourceStreamId().equals(MasterBatchCoordinator.COMMIT_STREAM_ID)) { if(attempt.equals(_activeBatches.get(attempt.getTransactionId()))) { ((ICommitterTridentSpout.Emitter) _emitter).commit(attempt); _activeBatches.remove(attempt.getTransactionId()); } else { throw new FailedException(“Received commit for different transaction attempt”); } } else if(input.getSourceStreamId().equals(MasterBatchCoordinator.SUCCESS_STREAM_ID)) { // valid to delete before what’s been committed since // those batches will never be accessed again _activeBatches.headMap(attempt.getTransactionId()).clear(); _emitter.success(attempt); } else { _collector.setBatch(info.batchId); _emitter.emitBatch(attempt, input.getValue(1), _collector); _activeBatches.put(attempt.getTransactionId(), attempt); } } @Override public void cleanup() { _emitter.close(); } @Override public void declareOutputFields(OutputFieldsDeclarer declarer) { List<String> fields = new ArrayList<>(_spout.getOutputFields().toList()); fields.add(0, ID_FIELD); declarer.declareStream(_streamName, new Fields(fields)); } @Override public Map<String, Object> getComponentConfiguration() { return _spout.getComponentConfiguration(); } @Override public void finishBatch(BatchInfo batchInfo) { } @Override public Object initBatchState(String batchGroup, Object batchId) { return null; }}TridentSpoutExecutor使用的BatchOutputCollector为TridentBoltExecutor在prepare方法构造的，经过几层包装，先是CoordinatedOutputCollector，然后是OutputCollector，最后是BatchOutputCollectorImpl；这里最主要的是CoordinatedOutputCollector包装，它维护每个taskId发出的tuple的数量；而在这个executor的prepare方法里头，该collector又被包装为AddIdCollector，主要是添加了batchId信息(即TransactionAttempt信息)TridentSpoutExecutor的ITridentSpout就是包装了用户设置的原始spout(IBatchSpout类型)的BatchSpoutExecutor(假设原始spout是IBatchSpout类型的，因而会通过BatchSpoutExecutor包装为ITridentSpout类型)，其execute方法根据不同stream类型进行不同处理，如果是master发过来的MasterBatchCoordinator.COMMIT_STREAM_ID($commit)则调用emitter的commit方法提交当前TransactionAttempt(本文的实例没有commit信息)，然后将该tx从_activeBatches中移除；如果是master发过来的MasterBatchCoordinator.SUCCESS_STREAM_ID($success)则先把_activeBatches中txId小于该txId的TransactionAttempt移除，然后调用emitter的success方法，标记TransactionAttempt成功，该方法回调原始spout(IBatchSpout类型)的ack方法非MasterBatchCoordinator.COMMIT_STREAM_ID($commit)及MasterBatchCoordinator.SUCCESS_STREAM_ID($success)类型的tuple，则是启动batch的消息，这里设置batchId，然后调用emitter的emitBatch进行数据发送(这里传递的batchId就是TransactionAttempt的txId)，同时将该TransactionAttempt放入_activeBatches中(这里的batch相当于TransactionAttempt)FixedBatchSpoutstorm-1.2.2/storm-core/src/jvm/org/apache/storm/trident/testing/FixedBatchSpout.javapublic class FixedBatchSpout implements IBatchSpout { Fields fields; List<Object>[] outputs; int maxBatchSize; HashMap<Long, List<List<Object>>> batches = new HashMap<Long, List<List<Object>>>(); public FixedBatchSpout(Fields fields, int maxBatchSize, List<Object>… outputs) { this.fields = fields; this.outputs = outputs; this.maxBatchSize = maxBatchSize; } int index = 0; boolean cycle = false; public void setCycle(boolean cycle) { this.cycle = cycle; } @Override public void open(Map conf, TopologyContext context) { index = 0; } @Override public void emitBatch(long batchId, TridentCollector collector) { List<List<Object>> batch = this.batches.get(batchId); if(batch == null){ batch = new ArrayList<List<Object>>(); if(index>=outputs.length && cycle) { index = 0; } for(int i=0; index < outputs.length && i < maxBatchSize; index++, i++) { batch.add(outputs[index]); } this.batches.put(batchId, batch); } for(List<Object> list : batch){ collector.emit(list); } } @Override public void ack(long batchId) { this.batches.remove(batchId); } @Override public void close() { } @Override public Map<String, Object> getComponentConfiguration() { Config conf = new Config(); conf.setMaxTaskParallelism(1); return conf; } @Override public Fields getOutputFields() { return fields; } }用户使用的spout是IBatchSpout类型，这里缓存了每个batchId对应的tuple数据，实现的是transactional spout的语义TridentTopology.newStreamstorm-1.2.2/storm-core/src/jvm/org/apache/storm/trident/TridentTopology.java public Stream newStream(String txId, IRichSpout spout) { return newStream(txId, new RichSpoutBatchExecutor(spout)); } public Stream newStream(String txId, IBatchSpout spout) { Node n = new SpoutNode(getUniqueStreamId(), spout.getOutputFields(), txId, spout, SpoutNode.SpoutType.BATCH); return addNode(n); } public Stream newStream(String txId, ITridentSpout spout) { Node n = new SpoutNode(getUniqueStreamId(), spout.getOutputFields(), txId, spout, SpoutNode.SpoutType.BATCH); return addNode(n); } public Stream newStream(String txId, IPartitionedTridentSpout spout) { return newStream(txId, new PartitionedTridentSpoutExecutor(spout)); } public Stream newStream(String txId, IOpaquePartitionedTridentSpout spout) { return newStream(txId, new OpaquePartitionedTridentSpoutExecutor(spout)); } public Stream newStream(String txId, ITridentDataSource dataSource) { if (dataSource instanceof IBatchSpout) { return newStream(txId, (IBatchSpout) dataSource); } else if (dataSource instanceof ITridentSpout) { return newStream(txId, (ITridentSpout) dataSource); } else if (dataSource instanceof IPartitionedTridentSpout) { return newStream(txId, (IPartitionedTridentSpout) dataSource); } else if (dataSource instanceof IOpaquePartitionedTridentSpout) { return newStream(txId, (IOpaquePartitionedTridentSpout) dataSource); } else { throw new UnsupportedOperationException(“Unsupported stream”); } }用户在TridentTopology.newStream可以直接使用IBatchSpout类似的spout，使用它的好处就是TridentTopology在build的时候会使用BatchSpoutExecutor将其包装为ITridentSpout类型(省得用户再去实现ITridentSpout的相关接口，屏蔽trident spout的相关逻辑，使得之前一直使用普通topology的用户可以快速上手trident topology)BatchSpoutExecutor实现了ITridentSpout接口，将IBatchSpout适配为ITridentSpout，使用的coordinator是EmptyCoordinator，使用的emitter是BatchSpoutEmitter如果用户在TridentTopology.newStream使用的spout是IPartitionedTridentSpout类型，则TridentTopology在newStream方法内部会使用PartitionedTridentSpoutExecutor将其包装为ITridentSpout类型，对于IOpaquePartitionedTridentSpout则使用OpaquePartitionedTridentSpoutExecutor将其包装为ITridentSpout类型小结TridentTopology在newStream或者build方法里头会将ITridentDataSource中不是ITridentSpout类型的IBatchSpout(在build方法)、IPartitionedTridentSpout(在newStream方法)、IOpaquePartitionedTridentSpout(在newStream方法)适配为ITridentSpout类型；分别使用BatchSpoutExecutor、PartitionedTridentSpoutExecutor、OpaquePartitionedTridentSpoutExecutor进行适配(TridentTopologyBuilder在buildTopology的时候，对于ITridentSpout类型的spout先用TridentSpoutExecutor包装，再用TridentBoltExecutor包装，最后转换为bolt，而整个TridentTopology真正的spout就是MasterBatchCoordinator；这里可以看到一个IBatchSpout的spout先经过BatchSpoutExecutor包装为ITridentSpout类型，之后再经过TridentSpoutExecutor及TridentBoltExecutor包装为bolt)IBatchSpout的ack是针对batch维度的，也就是TransactionAttempt维度，注意这里没有fail方法，如果emitBatch方法抛出了FailedException异常，则TridentBoltExecutor会调用failBatch方法(一个batch的tuples会等所有tuple执行完再触发checkFinish)，进行reportError以及标记TrackedBatch的failed为true，之后TridentBoltExecutor在checkFinish的时候，一旦发现tracked.failed为true的时候，会调用_collector.fail(tuple)，然后回调MasterBatchCoordinator的fail方法MasterBatchCoordinator的fail方法会将当前TransactionAttempt从_activeTx移除，然后一并移除txId大于失败的txId的数据，最后调用sync方法继续TransactionAttempt(注意这里没有更改_currTransaction值，因而会继续从失败的txId开始重试，只有在ack方法里头会更改_currTransaction为nextTransactionId)TridentBoltExecutor的execute方法会根据tickTuple来检测距离上次rotate是否超过_messageTimeoutMs(取component中Config.TOPOLOGY_MESSAGE_TIMEOUT_SECS最大值1000，这里1000是将秒转换为毫秒)，超过的话进行rotate操作，_batches的最后一个bucket将会被移除掉；这里的tickTuple的频率为5秒，Config.TOPOLOGY_MESSAGE_TIMEOUT_SECS按30秒算的话，_messageTimeoutMs为301000，相当于每5秒检测一下距离上次rotate时间是否超过30秒，如果超过则进行rotate，丢弃最后一个bucket的数据(TrackedBatch)，这里相当于重置超时的TrackedBatch信息关于MasterBatchCoordinator的fail的情况，有几种情况，一种是下游componnent主动抛出FailException，这个时候会触发master的fail，再次重试TransactionAttempt；一种是下游component处理tuple时间超过Config.TOPOLOGY_MESSAGE_TIMEOUT_SECS(topology.message.timeout.secs，defaults.yaml中默认为30)，这个时候ack会触发master的fail，导致该TransactionAttempt失败继续重试，目前没有对attempt的次数做限制，实际生产过程中要注意，因为只要该batchId的一个tuple失败，整个batchId的tuples都会重发，这个时候下游如果没有做好处理，可能会出现一个batchId中前面部分tuple成功，后面部分失败，导致成功的tuple不断重复处理(要避免失败的batch中tuples部分处理成功部分处理失败这个问题就需要配合使用Trident的State)。docTrident SpoutsTrident State聊聊storm TridentTopology的构建 ...

聊聊storm TridentTopology的构建

序本文主要研究一下storm TridentTopology的构建实例 @Test public void testDebugTopologyBuild(){ FixedBatchSpout spout = new FixedBatchSpout(new Fields(“user”, “score”), 3, new Values(“nickt1”, 4), new Values(“nickt2”, 7), new Values(“nickt3”, 8), new Values(“nickt4”, 9), new Values(“nickt5”, 7), new Values(“nickt6”, 11), new Values(“nickt7”, 5) ); spout.setCycle(false); TridentTopology topology = new TridentTopology(); Stream stream1 = topology.newStream(“spout1”,spout) .each(new Fields(“user”, “score”), new BaseFunction() { @Override public void execute(TridentTuple tuple, TridentCollector collector) { System.out.println(“tuple:"+tuple); } },new Fields()); topology.build(); }后面的分析为了简单起见，很多是依据这个实例来TridentTopology.newStreamstorm-core-1.2.2-sources.jar!/org/apache/storm/trident/TridentTopology.java public Stream newStream(String txId, IRichSpout spout) { return newStream(txId, new RichSpoutBatchExecutor(spout)); } public Stream newStream(String txId, IPartitionedTridentSpout spout) { return newStream(txId, new PartitionedTridentSpoutExecutor(spout)); } public Stream newStream(String txId, IOpaquePartitionedTridentSpout spout) { return newStream(txId, new OpaquePartitionedTridentSpoutExecutor(spout)); } public Stream newStream(String txId, ITridentDataSource dataSource) { if (dataSource instanceof IBatchSpout) { return newStream(txId, (IBatchSpout) dataSource); } else if (dataSource instanceof ITridentSpout) { return newStream(txId, (ITridentSpout) dataSource); } else if (dataSource instanceof IPartitionedTridentSpout) { return newStream(txId, (IPartitionedTridentSpout) dataSource); } else if (dataSource instanceof IOpaquePartitionedTridentSpout) { return newStream(txId, (IOpaquePartitionedTridentSpout) dataSource); } else { throw new UnsupportedOperationException(“Unsupported stream”); } } public Stream newStream(String txId, IBatchSpout spout) { Node n = new SpoutNode(getUniqueStreamId(), spout.getOutputFields(), txId, spout, SpoutNode.SpoutType.BATCH); return addNode(n); } public Stream newStream(String txId, ITridentSpout spout) { Node n = new SpoutNode(getUniqueStreamId(), spout.getOutputFields(), txId, spout, SpoutNode.SpoutType.BATCH); return addNode(n); } protected Stream addNode(Node n) { registerNode(n); return new Stream(this, n.name, n); } protected void registerNode(Node n) { _graph.addVertex(n); if(n.stateInfo!=null) { String id = n.stateInfo.id; if(!_colocate.containsKey(id)) { _colocate.put(id, new ArrayList()); } _colocate.get(id).add(n); } }newStream的第一个参数是txId，第二个参数是ITridentDataSourceITridentDataSource分为好几个类型，分别有IBatchSpout、ITridentSpout、IPartitionedTridentSpout、IOpaquePartitionedTridentSpout最后都是创建SpoutNode，然后registerNode添加到_graph(如果node的stateInfo不为null，还会添加到_colocate，不过SpoutNode该值为null)，注意SpoutNode的SpoutType为SpoutNode.SpoutType.BATCHNodestorm-core-1.2.2-sources.jar!/org/apache/storm/trident/planner/Node.javapublic class Node extends DefaultResourceDeclarer<Node> implements Serializable { private static final AtomicInteger INDEX = new AtomicInteger(0); private String nodeId; public String name = null; public Fields allOutputFields; public String streamId; public Integer parallelismHint = null; public NodeStateInfo stateInfo = null; public int creationIndex; public Node(String streamId, String name, Fields allOutputFields) { this.nodeId = UUID.randomUUID().toString(); this.allOutputFields = allOutputFields; this.streamId = streamId; this.name = name; this.creationIndex = INDEX.incrementAndGet(); } @Override public boolean equals(Object o) { if (this == o) { return true; } return nodeId.equals(((Node) o).nodeId); } @Override public int hashCode() { return nodeId.hashCode(); } @Override public String toString() { return ToStringBuilder.reflectionToString(this, ToStringStyle.MULTI_LINE_STYLE); } public String shortString() { return “nodeId: " + nodeId + “, allOutputFields: " + allOutputFields; }}Node继承了DefaultResourceDeclarer，而它实现了resources相关的接口：ResourceDeclarer以及ITridentResourceNode有几个子类，分别是SpoutNode、ProcessorNode、PartitionNodeSpoutNode就是spout信息的节点描述，ProcessorNode一般是trident的each、map、aggregrate、reduce、project等操作的节点描述，PartitionNode就是partition相关的节点描述TridentTopology.buildstorm-core-1.2.2-sources.jar!/org/apache/storm/trident/TridentTopology.java public StormTopology build() { DefaultDirectedGraph<Node, IndexedEdge> graph = (DefaultDirectedGraph) _graph.clone(); //…… List<SpoutNode> spoutNodes = new ArrayList<>(); // can be regular nodes (static state) or processor nodes Set<Node> boltNodes = new LinkedHashSet<>(); for(Node n: graph.vertexSet()) { if(n instanceof SpoutNode) { spoutNodes.add((SpoutNode) n); } else if(!(n instanceof PartitionNode)) { boltNodes.add(n); } } Set<Group> initialGroups = new LinkedHashSet<>(); //…… for(Node n: boltNodes) { initialGroups.add(new Group(graph, n)); } GraphGrouper grouper = new GraphGrouper(graph, initialGroups); grouper.mergeFully(); Collection<Group> mergedGroups = grouper.getAllGroups(); // add identity partitions between groups for(IndexedEdge<Node> e: new HashSet<>(graph.edgeSet())) { if(!(e.source instanceof PartitionNode) && !(e.target instanceof PartitionNode)) { Group g1 = grouper.nodeGroup(e.source); Group g2 = grouper.nodeGroup(e.target); // g1 being null means the source is a spout node if(g1==null && !(e.source instanceof SpoutNode)) throw new RuntimeException(“Planner exception: Null source group must indicate a spout node at this phase of planning”); if(g1==null || !g1.equals(g2)) { graph.removeEdge(e); PartitionNode pNode = makeIdentityPartition(e.source); graph.addVertex(pNode); graph.addEdge(e.source, pNode, new IndexedEdge(e.source, pNode, 0)); graph.addEdge(pNode, e.target, new IndexedEdge(pNode, e.target, e.index)); } } } //…… // add in spouts as groups so we can get parallelisms for(Node n: spoutNodes) { grouper.addGroup(new Group(graph, n)); } grouper.reindex(); mergedGroups = grouper.getAllGroups(); Map<Node, String> batchGroupMap = new HashMap<>(); List<Set<Node>> connectedComponents = new ConnectivityInspector<>(graph).connectedSets(); for(int i=0; i<connectedComponents.size(); i++) { String groupId = “bg” + i; for(Node n: connectedComponents.get(i)) { batchGroupMap.put(n, groupId); } } // System.out.println(“GRAPH:”);// System.out.println(graph); Map<Group, Integer> parallelisms = getGroupParallelisms(graph, grouper, mergedGroups); TridentTopologyBuilder builder = new TridentTopologyBuilder(); Map<Node, String> spoutIds = genSpoutIds(spoutNodes); Map<Group, String> boltIds = genBoltIds(mergedGroups); for(SpoutNode sn: spoutNodes) { Integer parallelism = parallelisms.get(grouper.nodeGroup(sn)); Map<String, Number> spoutRes = new HashMap<>(_resourceDefaults); spoutRes.putAll(sn.getResources()); Number onHeap = spoutRes.get(Config.TOPOLOGY_COMPONENT_RESOURCES_ONHEAP_MEMORY_MB); Number offHeap = spoutRes.get(Config.TOPOLOGY_COMPONENT_RESOURCES_OFFHEAP_MEMORY_MB); Number cpuLoad = spoutRes.get(Config.TOPOLOGY_COMPONENT_CPU_PCORE_PERCENT); SpoutDeclarer spoutDeclarer = null; if(sn.type == SpoutNode.SpoutType.DRPC) { spoutDeclarer = builder.setBatchPerTupleSpout(spoutIds.get(sn), sn.streamId, (IRichSpout) sn.spout, parallelism, batchGroupMap.get(sn)); } else { ITridentSpout s; if(sn.spout instanceof IBatchSpout) { s = new BatchSpoutExecutor((IBatchSpout)sn.spout); } else if(sn.spout instanceof ITridentSpout) { s = (ITridentSpout) sn.spout; } else { throw new RuntimeException(“Regular rich spouts not supported yet… try wrapping in a RichSpoutBatchExecutor”); // TODO: handle regular rich spout without batches (need lots of updates to support this throughout) } spoutDeclarer = builder.setSpout(spoutIds.get(sn), sn.streamId, sn.txId, s, parallelism, batchGroupMap.get(sn)); } if(onHeap != null) { if(offHeap != null) { spoutDeclarer.setMemoryLoad(onHeap, offHeap); } else { spoutDeclarer.setMemoryLoad(onHeap); } } if(cpuLoad != null) { spoutDeclarer.setCPULoad(cpuLoad); } } for(Group g: mergedGroups) { if(!isSpoutGroup(g)) { Integer p = parallelisms.get(g); Map<String, String> streamToGroup = getOutputStreamBatchGroups(g, batchGroupMap); Map<String, Number> groupRes = g.getResources(_resourceDefaults); Number onHeap = groupRes.get(Config.TOPOLOGY_COMPONENT_RESOURCES_ONHEAP_MEMORY_MB); Number offHeap = groupRes.get(Config.TOPOLOGY_COMPONENT_RESOURCES_OFFHEAP_MEMORY_MB); Number cpuLoad = groupRes.get(Config.TOPOLOGY_COMPONENT_CPU_PCORE_PERCENT); BoltDeclarer d = builder.setBolt(boltIds.get(g), new SubtopologyBolt(graph, g.nodes, batchGroupMap), p, committerBatches(g, batchGroupMap), streamToGroup); if(onHeap != null) { if(offHeap != null) { d.setMemoryLoad(onHeap, offHeap); } else { d.setMemoryLoad(onHeap); } } if(cpuLoad != null) { d.setCPULoad(cpuLoad); } Collection<PartitionNode> inputs = uniquedSubscriptions(externalGroupInputs(g)); for(PartitionNode n: inputs) { Node parent = TridentUtils.getParent(graph, n); String componentId = parent instanceof SpoutNode ? spoutIds.get(parent) : boltIds.get(grouper.nodeGroup(parent)); d.grouping(new GlobalStreamId(componentId, n.streamId), n.thriftGrouping); } } } HashMap<String, Number> combinedMasterCoordResources = new HashMap<String, Number>(_resourceDefaults); combinedMasterCoordResources.putAll(_masterCoordResources); return builder.buildTopology(combinedMasterCoordResources); }这里创建了TridentTopologyBuilder，然后对于spoutNodes，调用TridentTopologyBuilder.setSpout(String id, String streamName, String txStateId, ITridentSpout spout, Integer parallelism, String batchGroup)方法，添加spout对于IBatchSpout类型的spout，通过BatchSpoutExecutor包装为ITridentSpout这里的streamName为streamId，通过UniqueIdGen.getUniqueStreamId生成，以s开头，之后是_streamCounter的计数，比如1，合起来就是s1；txStateId为用户传入的txId；batchGroup以bg开头，之后是connectedComponents的元素的index，比如0，合起来就是bg0；parallelism参数就是用户构建topology时设置的设置完spout之后，就是设置spout的相关资源配置，比如memoryLoad、cpuLoad；之后设置bolt，这里使用的是SubtopologyBolt，然后设置bolt相关的资源配置最后调用TridentTopologyBuilder.buildTopologyTridentTopologyBuilder.setSpoutstorm-core-1.2.2-sources.jar!/org/apache/storm/trident/topology/TridentTopologyBuilder.java Map<GlobalStreamId, String> _batchIds = new HashMap(); Map<String, TransactionalSpoutComponent> _spouts = new HashMap(); public SpoutDeclarer setSpout(String id, String streamName, String txStateId, ITridentSpout spout, Integer parallelism, String batchGroup) { Map<String, String> batchGroups = new HashMap(); batchGroups.put(streamName, batchGroup); markBatchGroups(id, batchGroups); TransactionalSpoutComponent c = new TransactionalSpoutComponent(spout, streamName, parallelism, txStateId, batchGroup); _spouts.put(id, c); return new SpoutDeclarerImpl(c); } private void markBatchGroups(String component, Map<String, String> batchGroups) { for(Map.Entry<String, String> entry: batchGroups.entrySet()) { _batchIds.put(new GlobalStreamId(component, entry.getKey()), entry.getValue()); } }这里调用了markBatchGroups，将新的component添加到_batchIds中，同时也添加到_spouts中TridentTopologyBuilder.setBoltstorm-core-1.2.2-sources.jar!/org/apache/storm/trident/topology/TridentTopologyBuilder.java Map<GlobalStreamId, String> _batchIds = new HashMap(); Map<String, Component> _bolts = new HashMap(); // map from stream name to batch id public BoltDeclarer setBolt(String id, ITridentBatchBolt bolt, Integer parallelism, Set<String> committerBatches, Map<String, String> batchGroups) { markBatchGroups(id, batchGroups); Component c = new Component(bolt, parallelism, committerBatches); _bolts.put(id, c); return new BoltDeclarerImpl(c); } private void markBatchGroups(String component, Map<String, String> batchGroups) { for(Map.Entry<String, String> entry: batchGroups.entrySet()) { _batchIds.put(new GlobalStreamId(component, entry.getKey()), entry.getValue()); } }这里调用了markBatchGroups将新的component添加到_batchIds中，同时也添加到_bolts中；对于trident来说，就是一系列的ProcessorNode(可能也会有PartitionNode)TridentTopologyBuilder.buildTopologystorm-core-1.2.2-sources.jar!/org/apache/storm/trident/topology/TridentTopologyBuilder.java public StormTopology buildTopology(Map<String, Number> masterCoordResources) { TopologyBuilder builder = new TopologyBuilder(); Map<GlobalStreamId, String> batchIdsForSpouts = fleshOutStreamBatchIds(false); Map<GlobalStreamId, String> batchIdsForBolts = fleshOutStreamBatchIds(true); Map<String, List<String>> batchesToCommitIds = new HashMap<>(); Map<String, List<ITridentSpout>> batchesToSpouts = new HashMap<>(); for(String id: _spouts.keySet()) { TransactionalSpoutComponent c = _spouts.get(id); if(c.spout instanceof IRichSpout) { //TODO: wrap this to set the stream name builder.setSpout(id, (IRichSpout) c.spout, c.parallelism); } else { String batchGroup = c.batchGroupId; if(!batchesToCommitIds.containsKey(batchGroup)) { batchesToCommitIds.put(batchGroup, new ArrayList<String>()); } batchesToCommitIds.get(batchGroup).add(c.commitStateId); if(!batchesToSpouts.containsKey(batchGroup)) { batchesToSpouts.put(batchGroup, new ArrayList<ITridentSpout>()); } batchesToSpouts.get(batchGroup).add((ITridentSpout) c.spout); BoltDeclarer scd = builder.setBolt(spoutCoordinator(id), new TridentSpoutCoordinator(c.commitStateId, (ITridentSpout) c.spout)) .globalGrouping(masterCoordinator(c.batchGroupId), MasterBatchCoordinator.BATCH_STREAM_ID) .globalGrouping(masterCoordinator(c.batchGroupId), MasterBatchCoordinator.SUCCESS_STREAM_ID); for(Map<String, Object> m: c.componentConfs) { scd.addConfigurations(m); } Map<String, TridentBoltExecutor.CoordSpec> specs = new HashMap(); specs.put(c.batchGroupId, new CoordSpec()); BoltDeclarer bd = builder.setBolt(id, new TridentBoltExecutor( new TridentSpoutExecutor( c.commitStateId, c.streamName, ((ITridentSpout) c.spout)), batchIdsForSpouts, specs), c.parallelism); bd.allGrouping(spoutCoordinator(id), MasterBatchCoordinator.BATCH_STREAM_ID); bd.allGrouping(masterCoordinator(batchGroup), MasterBatchCoordinator.SUCCESS_STREAM_ID); if(c.spout instanceof ICommitterTridentSpout) { bd.allGrouping(masterCoordinator(batchGroup), MasterBatchCoordinator.COMMIT_STREAM_ID); } for(Map<String, Object> m: c.componentConfs) { bd.addConfigurations(m); } } } //…… Number onHeap = masterCoordResources.get(Config.TOPOLOGY_COMPONENT_RESOURCES_ONHEAP_MEMORY_MB); Number offHeap = masterCoordResources.get(Config.TOPOLOGY_COMPONENT_RESOURCES_OFFHEAP_MEMORY_MB); Number cpuLoad = masterCoordResources.get(Config.TOPOLOGY_COMPONENT_CPU_PCORE_PERCENT); for(String batch: batchesToCommitIds.keySet()) { List<String> commitIds = batchesToCommitIds.get(batch); SpoutDeclarer masterCoord = builder.setSpout(masterCoordinator(batch), new MasterBatchCoordinator(commitIds, batchesToSpouts.get(batch))); if(onHeap != null) { if(offHeap != null) { masterCoord.setMemoryLoad(onHeap, offHeap); } else { masterCoord.setMemoryLoad(onHeap); } } if(cpuLoad != null) { masterCoord.setCPULoad(cpuLoad); } } for(String id: _bolts.keySet()) { Component c = _bolts.get(id); Map<String, CoordSpec> specs = new HashMap<>(); for(GlobalStreamId s: getBoltSubscriptionStreams(id)) { String batch = batchIdsForBolts.get(s); if(!specs.containsKey(batch)) specs.put(batch, new CoordSpec()); CoordSpec spec = specs.get(batch); CoordType ct; if(_batchPerTupleSpouts.containsKey(s.get_componentId())) { ct = CoordType.single(); } else { ct = CoordType.all(); } spec.coords.put(s.get_componentId(), ct); } for(String b: c.committerBatches) { specs.get(b).commitStream = new GlobalStreamId(masterCoordinator(b), MasterBatchCoordinator.COMMIT_STREAM_ID); } BoltDeclarer d = builder.setBolt(id, new TridentBoltExecutor(c.bolt, batchIdsForBolts, specs), c.parallelism); for(Map<String, Object> conf: c.componentConfs) { d.addConfigurations(conf); } for(InputDeclaration inputDecl: c.declarations) { inputDecl.declare(d); } Map<String, Set<String>> batchToComponents = getBoltBatchToComponentSubscriptions(id); for(Map.Entry<String, Set<String>> entry: batchToComponents.entrySet()) { for(String comp: entry.getValue()) { d.directGrouping(comp, TridentBoltExecutor.COORD_STREAM(entry.getKey())); } } for(String b: c.committerBatches) { d.allGrouping(masterCoordinator(b), MasterBatchCoordinator.COMMIT_STREAM_ID); } } return builder.createTopology(); }buildTopology对于非IRichSpout的的spout会在topology中创建TridentSpoutCoordinator这个bolt，它globalGrouping了MasterBatchCoordinator.BATCH_STREAM_ID($batch)、MasterBatchCoordinator.SUCCESS_STREAM_ID($success)这两个stream；同时还创建了TridentBoltExecutor这个bolt，它allGrouping了MasterBatchCoordinator.BATCH_STREAM_ID($batch)、MasterBatchCoordinator.SUCCESS_STREAM_ID($success)，对于spout是ICommitterTridentSpout类型的，还allGrouping了MasterBatchCoordinator.COMMIT_STREAM_ID($commit)；注意这里将非IRichSpout的spout转换为bolt之后对batchesToCommitIds中的每个batch创建MasterBatchCoordinator这个spout，正好前前面的TridentSpoutCoordinator以及TridentBoltExecutor衔接起来对于bolt来说(包装了ProcessorNode的SubtopologyBolt)，这里设置了TridentBoltExecutor这个bolt，它directGrouping了TridentBoltExecutor.COORD_STREAM($coord-)，同时还allGrouping了MasterBatchCoordinator.COMMIT_STREAM_ID($commit)TridentTopologyBuilder.createTopologystorm-core-1.2.2-sources.jar!/org/apache/storm/trident/topology/TridentTopologyBuilder.java public StormTopology createTopology() { Map<String, Bolt> boltSpecs = new HashMap<>(); Map<String, SpoutSpec> spoutSpecs = new HashMap<>(); maybeAddCheckpointSpout(); for(String boltId: _bolts.keySet()) { IRichBolt bolt = _bolts.get(boltId); bolt = maybeAddCheckpointTupleForwarder(bolt); ComponentCommon common = getComponentCommon(boltId, bolt); try{ maybeAddCheckpointInputs(common); boltSpecs.put(boltId, new Bolt(ComponentObject.serialized_java(Utils.javaSerialize(bolt)), common)); }catch(RuntimeException wrapperCause){ if (wrapperCause.getCause() != null && NotSerializableException.class.equals(wrapperCause.getCause().getClass())){ throw new IllegalStateException( “Bolt ‘” + boltId + “’ contains a non-serializable field of type " + wrapperCause.getCause().getMessage() + “, " + “which was instantiated prior to topology creation. " + wrapperCause.getCause().getMessage() + " " + “should be instantiated within the prepare method of ‘” + boltId + " at the earliest.”, wrapperCause); } throw wrapperCause; } } for(String spoutId: _spouts.keySet()) { IRichSpout spout = _spouts.get(spoutId); ComponentCommon common = getComponentCommon(spoutId, spout); try{ spoutSpecs.put(spoutId, new SpoutSpec(ComponentObject.serialized_java(Utils.javaSerialize(spout)), common)); }catch(RuntimeException wrapperCause){ if (wrapperCause.getCause() != null && NotSerializableException.class.equals(wrapperCause.getCause().getClass())){ throw new IllegalStateException( “Spout ‘” + spoutId + “’ contains a non-serializable field of type " + wrapperCause.getCause().getMessage() + “, " + “which was instantiated prior to topology creation. " + wrapperCause.getCause().getMessage() + " " + “should be instantiated within the prepare method of ‘” + spoutId + " at the earliest.”, wrapperCause); } throw wrapperCause; } } StormTopology stormTopology = new StormTopology(spoutSpecs, boltSpecs, new HashMap<String, StateSpoutSpec>()); stormTopology.set_worker_hooks(_workerHooks); return Utils.addVersions(stormTopology); } /** * If the topology has at least one stateful bolt * add a {@link CheckpointSpout} component to the topology. */ private void maybeAddCheckpointSpout() { if (hasStatefulBolt) { setSpout(CHECKPOINT_COMPONENT_ID, new CheckpointSpout(), 1); } }createTopology的时候，判断如果有stateful的bolt，则会添加CheckpointSpout这个spout；同时对每个bolt判断如果是statefulBolt且不是StatefulBoltExecutor，那么会添加CheckpointTupleForwarder经过buildTopology的一系列设置，到了createTopology这里，已经有了3个bolt，一个是包装了ProcessNode的TridentBoltExecutor，一个是TridentSpoutCoordinator，还有一个是包装了原始spout的TridentBoltExecutorspout这里只有一个就是MasterBatchCoordinator，在buildTopology的时候，对于非IRichSpout的的spout，会被转化为TridentSpoutCoordinator这个bolt拓扑结构以前面的实例来讲，经过TridentTopologyBuilder的createTopology，最后的拓扑结构为一个spout为MasterBatchCoordinator($mastercoord-bg0)，3个bolt分别为TridentSpoutCoordinator($spoutcoord-spout-spout1)、包装了非IRichSpout的的spout的TridentBoltExecutor(spout-spout1)、包装了ProcessorNode的TridentBoltExecutor(b-0)；一共涉及到了几个stream，分别为MasterBatchCoordinator.SUCCESS_STREAM_ID($success)、MasterBatchCoordinator.COMMIT_STREAM_ID($commit)、MasterBatchCoordinator.BATCH_STREAM_ID($batch)、TridentBoltExecutor.COORD_STREAM($coord-bg0)、s1、s2$mastercoord-bg0它declare了$success、$commit、$batch这三个stream，outputFields均为tx这个字段$spoutcoord-spout-spout1它接收了$mastercoord-bg0的$success、$batch这两个stream，同时declare了$batch这个stream，outputFields为[tx,metadata]spout-spout1，它allGrouping接收$mastercoord-bg0的$success，以及$spoutcoord-spout-spout1的$batch这两个stream的数据；同时会往$coord-bg0发送[id,count]数据，以及stream(s1)发送数据tupleb-0它接收了spout-spout1的$coord-bg0以及s1这两个stream的数据，之后往stream(s2)发送数据(output_fields:[$batchId, user, score])，同时也会往stream($coord-bg0)发送[id, count]数据小结TridentTopologyBuilder在buildTopology的时候，对于非IRichSpout的的spout，会被转化为TridentBoltExecutor这个bolt，同时会新增一个TridentSpoutCoordinator这个bolt；ProcessorNode则会被包装为TridentBoltExecutor这个bolt；TridentTopology为了方便管理将用户设定的spout包装为bolt，然后创建MasterBatchCoordinator作为真正的spoutTridentBoltExecutor.COORD_STREAM($coord-)这个stream用来在component之间传递[id, count]数据，用于保障tuple在每个component能够完整传输，即spout和bolt都会往该stream发送[id, count]数据MasterBatchCoordinator、TridentSpoutCoordinator、包装原始spout的TridentBoltExecutor(spout-spout1)它们之间的关系如下：master会给spout-spout1发送suceess数据(tuple\指令)，给coordinator发送suceess、batch数据(tuple\指令)；coordinator会给spout-spout1发送batch数据(tuple\指令)docTrident API OverviewTrident Spouts聊聊storm的LinearDRPCTopologyBuilder ...

聊聊storm tuple的序列化

序本文主要研究一下storm tuple的序列化ExecutorTransfer.tryTransferstorm-2.0.0/storm-client/src/jvm/org/apache/storm/executor/ExecutorTransfer.java// Every executor has an instance of this classpublic class ExecutorTransfer { private static final Logger LOG = LoggerFactory.getLogger(ExecutorTransfer.class); private final WorkerState workerData; private final KryoTupleSerializer serializer; private final boolean isDebug; private int indexingBase = 0; private ArrayList<JCQueue> localReceiveQueues; // [taskId-indexingBase] => queue : List of all recvQs local to this worker private AtomicReferenceArray<JCQueue> queuesToFlush; // [taskId-indexingBase] => queue, some entries can be null. : outbound Qs for this executor instance public ExecutorTransfer(WorkerState workerData, Map<String, Object> topoConf) { this.workerData = workerData; this.serializer = new KryoTupleSerializer(topoConf, workerData.getWorkerTopologyContext()); this.isDebug = ObjectReader.getBoolean(topoConf.get(Config.TOPOLOGY_DEBUG), false); } //…… // adds addressedTuple to destination Q if it is not full. else adds to pendingEmits (if its not null) public boolean tryTransfer(AddressedTuple addressedTuple, Queue<AddressedTuple> pendingEmits) { if (isDebug) { LOG.info(“TRANSFERRING tuple {}”, addressedTuple); } JCQueue localQueue = getLocalQueue(addressedTuple); if (localQueue != null) { return tryTransferLocal(addressedTuple, localQueue, pendingEmits); } return workerData.tryTransferRemote(addressedTuple, pendingEmits, serializer); } //……}ExecutorTransfer在构造器里头创建了KryoTupleSerializer这里先判断目标地址是否是在localQueue中，如果是则进行local transfer，否则进行remote transferremote transfer的时候调用了workerData.tryTransferRemote，并传递了serializerWorkerState.tryTransferRemotestorm-2.0.0/storm-client/src/jvm/org/apache/storm/daemon/worker/WorkerState.java /* Not a Blocking call. If cannot emit, will add ’tuple’ to pendingEmits and return ‘false’. ‘pendingEmits’ can be null / public boolean tryTransferRemote(AddressedTuple tuple, Queue<AddressedTuple> pendingEmits, ITupleSerializer serializer) { return workerTransfer.tryTransferRemote(tuple, pendingEmits, serializer); }WorkerState.tryTransferRemote实际上使用的是workerTransfer.tryTransferRemoteworkerTransfer.tryTransferRemotestorm-2.0.0/storm-client/src/jvm/org/apache/storm/daemon/worker/WorkerTransfer.java / Not a Blocking call. If cannot emit, will add ’tuple’ to ‘pendingEmits’ and return ‘false’. ‘pendingEmits’ can be null / public boolean tryTransferRemote(AddressedTuple addressedTuple, Queue<AddressedTuple> pendingEmits, ITupleSerializer serializer) { if (pendingEmits != null && !pendingEmits.isEmpty()) { pendingEmits.add(addressedTuple); return false; } if (!remoteBackPressureStatus[addressedTuple.dest].get()) { TaskMessage tm = new TaskMessage(addressedTuple.getDest(), serializer.serialize(addressedTuple.getTuple())); if (transferQueue.tryPublish(tm)) { return true; } } else { LOG.debug(“Noticed Back Pressure in remote task {}”, addressedTuple.dest); } if (pendingEmits != null) { pendingEmits.add(addressedTuple); } return false; }这里可以看到创建TaskMessage的时候，使用serializer.serialize(addressedTuple.getTuple())对tuple进行了序列化；该serializer为ITupleSerializer类型，它的实现类为KryoTupleSerializerKryoTupleSerializerstorm-2.0.0/storm-client/src/jvm/org/apache/storm/serialization/KryoTupleSerializer.javapublic class KryoTupleSerializer implements ITupleSerializer { KryoValuesSerializer _kryo; SerializationFactory.IdDictionary _ids; Output _kryoOut; public KryoTupleSerializer(final Map<String, Object> conf, final GeneralTopologyContext context) { _kryo = new KryoValuesSerializer(conf); _kryoOut = new Output(2000, 2000000000); _ids = new SerializationFactory.IdDictionary(context.getRawTopology()); } public byte[] serialize(Tuple tuple) { try { _kryoOut.clear(); _kryoOut.writeInt(tuple.getSourceTask(), true); _kryoOut.writeInt(_ids.getStreamId(tuple.getSourceComponent(), tuple.getSourceStreamId()), true); tuple.getMessageId().serialize(_kryoOut); _kryo.serializeInto(tuple.getValues(), _kryoOut); return _kryoOut.toBytes(); } catch (IOException e) { throw new RuntimeException(e); } } // public long crc32(Tuple tuple) { // try { // CRC32OutputStream hasher = new CRC32OutputStream(); // _kryo.serializeInto(tuple.getValues(), hasher); // return hasher.getValue(); // } catch (IOException e) { // throw new RuntimeException(e); // } // }}KryoTupleSerializer创建了KryoValuesSerializer，在serialize tuple的时候调用了_kryo.serializeInto(tuple.getValues(), _kryoOut)KryoValuesSerializerstorm-2.0.0/storm-client/src/jvm/org/apache/storm/serialization/KryoValuesSerializer.javapublic class KryoValuesSerializer { Kryo _kryo; ListDelegate _delegate; Output _kryoOut; public KryoValuesSerializer(Map<String, Object> conf) { _kryo = SerializationFactory.getKryo(conf); _delegate = new ListDelegate(); _kryoOut = new Output(2000, 2000000000); } public void serializeInto(List<Object> values, Output out) { // this ensures that list of values is always written the same way, regardless // of whether it’s a java collection or one of clojure’s persistent collections // (which have different serializers) // Doing this lets us deserialize as ArrayList and avoid writing the class here _delegate.setDelegate(values); _kryo.writeObject(out, _delegate); } public byte[] serialize(List<Object> values) { _kryoOut.clear(); serializeInto(values, _kryoOut); return _kryoOut.toBytes(); } public byte[] serializeObject(Object obj) { _kryoOut.clear(); _kryo.writeClassAndObject(_kryoOut, obj); return _kryoOut.toBytes(); }}KryoValuesSerializer在构造器里头调用SerializationFactory.getKryo(conf)方法创建_kryo这里的_delegate使用的是ListDelegate(即用它来包装一下List<Object> values)，_kryoOut为new Output(2000, 2000000000)serialize方法调用的是serializeInto方法，该方法最后调用的是原生的_kryo.writeObject方法进行序列化SerializationFactory.getKryostorm-2.0.0/storm-client/src/jvm/org/apache/storm/serialization/SerializationFactory.java public static Kryo getKryo(Map<String, Object> conf) { IKryoFactory kryoFactory = (IKryoFactory) ReflectionUtils.newInstance((String) conf.get(Config.TOPOLOGY_KRYO_FACTORY)); Kryo k = kryoFactory.getKryo(conf); k.register(byte[].class); / tuple payload serializer is specified via configuration / String payloadSerializerName = (String) conf.get(Config.TOPOLOGY_TUPLE_SERIALIZER); try { Class serializerClass = Class.forName(payloadSerializerName); Serializer serializer = resolveSerializerInstance(k, ListDelegate.class, serializerClass, conf); k.register(ListDelegate.class, serializer); } catch (ClassNotFoundException ex) { throw new RuntimeException(ex); } k.register(ArrayList.class, new ArrayListSerializer()); k.register(HashMap.class, new HashMapSerializer()); k.register(HashSet.class, new HashSetSerializer()); k.register(BigInteger.class, new BigIntegerSerializer()); k.register(TransactionAttempt.class); k.register(Values.class); k.register(org.apache.storm.metric.api.IMetricsConsumer.DataPoint.class); k.register(org.apache.storm.metric.api.IMetricsConsumer.TaskInfo.class); k.register(ConsList.class); k.register(BackPressureStatus.class); synchronized (loader) { for (SerializationRegister sr : loader) { try { sr.register(k); } catch (Exception e) { throw new RuntimeException(e); } } } kryoFactory.preRegister(k, conf); boolean skipMissing = (Boolean) conf.get(Config.TOPOLOGY_SKIP_MISSING_KRYO_REGISTRATIONS); register(k, conf.get(Config.TOPOLOGY_KRYO_REGISTER), conf, skipMissing); kryoFactory.postRegister(k, conf); if (conf.get(Config.TOPOLOGY_KRYO_DECORATORS) != null) { for (String klassName : (List<String>) conf.get(Config.TOPOLOGY_KRYO_DECORATORS)) { try { Class klass = Class.forName(klassName); IKryoDecorator decorator = (IKryoDecorator) klass.newInstance(); decorator.decorate(k); } catch (ClassNotFoundException e) { if (skipMissing) { LOG.info(“Could not find kryo decorator named " + klassName + “. Skipping registration…”); } else { throw new RuntimeException(e); } } catch (InstantiationException e) { throw new RuntimeException(e); } catch (IllegalAccessException e) { throw new RuntimeException(e); } } } kryoFactory.postDecorate(k, conf); return k; } public static void register(Kryo k, Object kryoRegistrations, Map<String, Object> conf, boolean skipMissing) { Map<String, String> registrations = normalizeKryoRegister(kryoRegistrations); for (Map.Entry<String, String> entry : registrations.entrySet()) { String serializerClassName = entry.getValue(); try { Class klass = Class.forName(entry.getKey()); Class serializerClass = null; if (serializerClassName != null) { serializerClass = Class.forName(serializerClassName); } if (serializerClass == null) { k.register(klass); } else { k.register(klass, resolveSerializerInstance(k, klass, serializerClass, conf)); } } catch (ClassNotFoundException e) { if (skipMissing) { LOG.info(“Could not find serialization or class for " + serializerClassName + “. Skipping registration…”); } else { throw new RuntimeException(e); } } } }SerializationFactory.getKryo静态方法首先根据Config.TOPOLOGY_KRYO_FACTORY创建IKryoFactory，默认是org.apache.storm.serialization.DefaultKryoFactory之后通过IKryoFactory.getKryo创建Kryo，之后就是对Kryo进行一系列配置，这里注册了byte[].class、ListDelegate.class、ArrayList.class、HashMap.class、HashSet.class、BigInteger.class、TransactionAttempt.class、Values.class、org.apache.storm.metric.api.IMetricsConsumer.DataPoint.class、org.apache.storm.metric.api.IMetricsConsumer.TaskInfo.class、ConsList.class、BackPressureStatus.classListDelegate.class为payload的容器，采用Config.TOPOLOGY_TUPLE_SERIALIZER(topology.tuple.serializer，默认是org.apache.storm.serialization.types.ListDelegateSerializer)配置的类进行序列化Config.TOPOLOGY_SKIP_MISSING_KRYO_REGISTRATIONS(topology.skip.missing.kryo.registrations，默认为false)，当kryo找不到配置的要序列化的class对应serializers的时候，是抛出异常还是直接跳过注册；最后通过Config.TOPOLOGY_KRYO_DECORATORS(topology.kryo.decorators)加载自定义的serializationDefaultKryoFactorystorm-2.0.0/storm-client/src/jvm/org/apache/storm/serialization/DefaultKryoFactory.javapublic class DefaultKryoFactory implements IKryoFactory { @Override public Kryo getKryo(Map<String, Object> conf) { KryoSerializableDefault k = new KryoSerializableDefault(); k.setRegistrationRequired(!((Boolean) conf.get(Config.TOPOLOGY_FALL_BACK_ON_JAVA_SERIALIZATION))); k.setReferences(false); return k; } @Override public void preRegister(Kryo k, Map<String, Object> conf) { } public void postRegister(Kryo k, Map<String, Object> conf) { ((KryoSerializableDefault) k).overrideDefault(true); } @Override public void postDecorate(Kryo k, Map<String, Object> conf) { } public static class KryoSerializableDefault extends Kryo { boolean _override = false; public void overrideDefault(boolean value) { _override = value; } @Override public Serializer getDefaultSerializer(Class type) { if (_override) { return new SerializableSerializer(); } else { return super.getDefaultSerializer(type); } } }}这里从配置读取Config.TOPOLOGY_FALL_BACK_ON_JAVA_SERIALIZATION(topology.fall.back.on.java.serialization)，默认该值为true，则registrationRequired这里设置为false，即序列化的时候不要求该class必须在已注册的列表中Kryokryo-4.0.2-sources.jar!/com/esotericsoftware/kryo/Kryo.java /* If the class is not registered and {@link Kryo#setRegistrationRequired(boolean)} is false, it is automatically registered * using the {@link Kryo#addDefaultSerializer(Class, Class) default serializer}. * @throws IllegalArgumentException if the class is not registered and {@link Kryo#setRegistrationRequired(boolean)} is true. * @see ClassResolver#getRegistration(Class) / public Registration getRegistration (Class type) { if (type == null) throw new IllegalArgumentException(“type cannot be null.”); Registration registration = classResolver.getRegistration(type); if (registration == null) { if (Proxy.isProxyClass(type)) { // If a Proxy class, treat it like an InvocationHandler because the concrete class for a proxy is generated. registration = getRegistration(InvocationHandler.class); } else if (!type.isEnum() && Enum.class.isAssignableFrom(type) && !Enum.class.equals(type)) { // This handles an enum value that is an inner class. Eg: enum A {b{}}; registration = getRegistration(type.getEnclosingClass()); } else if (EnumSet.class.isAssignableFrom(type)) { registration = classResolver.getRegistration(EnumSet.class); } else if (isClosure(type)) { registration = classResolver.getRegistration(ClosureSerializer.Closure.class); } if (registration == null) { if (registrationRequired) { throw new IllegalArgumentException(unregisteredClassMessage(type)); } if (warnUnregisteredClasses) { warn(unregisteredClassMessage(type)); } registration = classResolver.registerImplicit(type); } } return registration; } /* Registers the class using the lowest, next available integer ID and the {@link Kryo#getDefaultSerializer(Class) default * serializer}. If the class is already registered, no change will be made and the existing registration will be returned. * Registering a primitive also affects the corresponding primitive wrapper. * * Because the ID assigned is affected by the IDs registered before it, the order classes are registered is important when * using this method. The order must be the same at deserialization as it was for serialization. / public Registration register (Class type) { Registration registration = classResolver.getRegistration(type); if (registration != null) return registration; return register(type, getDefaultSerializer(type)); } /* Returns the best matching serializer for a class. This method can be overridden to implement custom logic to choose a * serializer. / public Serializer getDefaultSerializer (Class type) { if (type == null) throw new IllegalArgumentException(“type cannot be null.”); final Serializer serializerForAnnotation = getDefaultSerializerForAnnotatedType(type); if (serializerForAnnotation != null) return serializerForAnnotation; for (int i = 0, n = defaultSerializers.size(); i < n; i++) { DefaultSerializerEntry entry = defaultSerializers.get(i); if (entry.type.isAssignableFrom(type)) { Serializer defaultSerializer = entry.serializerFactory.makeSerializer(this, type); return defaultSerializer; } } return newDefaultSerializer(type); } /* Called by {@link #getDefaultSerializer(Class)} when no default serializers matched the type. Subclasses can override this * method to customize behavior. The default implementation calls {@link SerializerFactory#makeSerializer(Kryo, Class)} using * the {@link #setDefaultSerializer(Class) default serializer}. / protected Serializer newDefaultSerializer (Class type) { return defaultSerializer.makeSerializer(this, type); } /* Registers the class using the lowest, next available integer ID and the specified serializer. If the class is already * registered, the existing entry is updated with the new serializer. Registering a primitive also affects the corresponding * primitive wrapper. * * Because the ID assigned is affected by the IDs registered before it, the order classes are registered is important when * using this method. The order must be the same at deserialization as it was for serialization. / public Registration register (Class type, Serializer serializer) { Registration registration = classResolver.getRegistration(type); if (registration != null) { registration.setSerializer(serializer); return registration; } return classResolver.register(new Registration(type, serializer, getNextRegistrationId())); } /* Returns the lowest, next available integer ID. / public int getNextRegistrationId () { while (nextRegisterID != -2) { if (classResolver.getRegistration(nextRegisterID) == null) return nextRegisterID; nextRegisterID++; } throw new KryoException(“No registration IDs are available.”); }Kryo的getRegistration方法，当遇到class没有注册时会判断registrationRequired，如果为true，则抛出IllegalArgumentException；如果为false，则调用classResolver.registerImplicit进行隐式注册，同时如果warnUnregisteredClasses为true则会打印warning信息Kryo的register方法如果没有指定Serializer时，会通过getDefaultSerializer获取最匹配的Serializer，如果从已经注册的defaultSerializers没匹配到，则调用newDefaultSerializer创建一个，这里可能存在无法创建的异常，会抛出IllegalArgumentExceptionregister(Class type, Serializer serializer)方法最后是调用ClassResolver.register(Registration registration)方法，对于没有Registration的，这里new了一个，同时通过getNextRegistrationId，给Registration分配一个idDefaultClassResolver.registerkryo-4.0.2-sources.jar!/com/esotericsoftware/kryo/util/DefaultClassResolver.java static public final byte NAME = -1; protected final IntMap<Registration> idToRegistration = new IntMap(); protected final ObjectMap<Class, Registration> classToRegistration = new ObjectMap(); protected IdentityObjectIntMap<Class> classToNameId; public Registration registerImplicit (Class type) { return register(new Registration(type, kryo.getDefaultSerializer(type), NAME)); } public Registration register (Registration registration) { if (registration == null) throw new IllegalArgumentException(“registration cannot be null.”); if (registration.getId() != NAME) { if (TRACE) { trace(“kryo”, “Register class ID " + registration.getId() + “: " + className(registration.getType()) + " (” + registration.getSerializer().getClass().getName() + “)”); } idToRegistration.put(registration.getId(), registration); } else if (TRACE) { trace(“kryo”, “Register class name: " + className(registration.getType()) + " (” + registration.getSerializer().getClass().getName() + “)”); } classToRegistration.put(registration.getType(), registration); if (registration.getType().isPrimitive()) classToRegistration.put(getWrapperClass(registration.getType()), registration); return registration; } public Registration writeClass (Output output, Class type) { if (type == null) { if (TRACE || (DEBUG && kryo.getDepth() == 1)) log(“Write”, null); output.writeVarInt(Kryo.NULL, true); return null; } Registration registration = kryo.getRegistration(type); if (registration.getId() == NAME) writeName(output, type, registration); else { if (TRACE) trace(“kryo”, “Write class " + registration.getId() + “: " + className(type)); output.writeVarInt(registration.getId() + 2, true); } return registration; } protected void writeName (Output output, Class type, Registration registration) { output.writeVarInt(NAME + 2, true); if (classToNameId != null) { int nameId = classToNameId.get(type, -1); if (nameId != -1) { if (TRACE) trace(“kryo”, “Write class name reference " + nameId + “: " + className(type)); output.writeVarInt(nameId, true); return; } } // Only write the class name the first time encountered in object graph. if (TRACE) trace(“kryo”, “Write class name: " + className(type)); int nameId = nextNameId++; if (classToNameId == null) classToNameId = new IdentityObjectIntMap(); classToNameId.put(type, nameId); output.writeVarInt(nameId, true); output.writeString(type.getName()); } public void reset () { if (!kryo.isRegistrationRequired()) { if (classToNameId != null) classToNameId.clear(2048); if (nameIdToClass != null) nameIdToClass.clear(); nextNameId = 0; } }DefaultClassResolver.register(Registration registration)方法里头针对registration的id进行了判断，如果是NAME(这里用-1表示)则注册到ObjectMap<Class, Registration> classToRegistration，如果有id不是NAME的，则注册到IntMap<Registration> idToRegistration前面提到如果registrationRequired是false，则调用classResolver.registerImplicit进行隐式注册，这里可以看到registerImplicit注册的registration的id是NAMEregistration的id是NAME与否具体在writeClass中有体现(如果要序列化的类的字段中不仅仅有基本类型，还有未注册的类，会调用这里的writeClass方法)，从代码可以看到如果是NAME，则使用的是writeName；不是NAME的则直接使用output.writeVarInt(registration.getId() + 2, true)，写入int；writeName方法第一次遇到NAME的class时会给它生成一个nameId，然后放入到IdentityObjectIntMap<Class> classToNameId中，然后写入int，再写入class.getName，第二次遇到该class的时候，由于classToNameId中已经存在nameId，因而直接写入int；但是DefaultClassResolver的reset方法在registrationRequired是false这种情况下会调用classToNameId.clear(2048)，进行清空或者resize，这个时候一旦这个方法被调用，那么下次可能无法利用classToNameId用id替代className来序列化。Kryo.writeObjectkryo-4.0.2-sources.jar!/com/esotericsoftware/kryo/Kryo.java /* Writes an object using the registered serializer. / public void writeObject (Output output, Object object) { if (output == null) throw new IllegalArgumentException(“output cannot be null.”); if (object == null) throw new IllegalArgumentException(“object cannot be null.”); beginObject(); try { if (references && writeReferenceOrNull(output, object, false)) { getRegistration(object.getClass()).getSerializer().setGenerics(this, null); return; } if (TRACE || (DEBUG && depth == 1)) log(“Write”, object); getRegistration(object.getClass()).getSerializer().write(this, output, object); } finally { if (–depth == 0 && autoReset) reset(); } } /* Resets unregistered class names, references to previously serialized or deserialized objects, and the * {@link #getGraphContext() graph context}. If {@link #setAutoReset(boolean) auto reset} is true, this method is called * automatically when an object graph has been completely serialized or deserialized. If overridden, the super method must be * called. */ public void reset () { depth = 0; if (graphContext != null) graphContext.clear(); classResolver.reset(); if (references) { referenceResolver.reset(); readObject = null; } copyDepth = 0; if (originalToCopy != null) originalToCopy.clear(2048); if (TRACE) trace(“kryo”, “Object graph complete.”); }这里要注意一下，writeObject方法在finally的时候判断如果depth为0且autoReset为true，会调用reset方法；而reset方法会调用classResolver.reset()，清空nameIdToClass以及classToNameId(classToNameId.clear(2048))小结storm默认是用kryo来进行tuple的序列化，storm额外注册了byte[].class、ListDelegate.class、ArrayList.class、HashMap.class、HashSet.class、BigInteger.class、TransactionAttempt.class、Values.class、org.apache.storm.metric.api.IMetricsConsumer.DataPoint.class、org.apache.storm.metric.api.IMetricsConsumer.TaskInfo.class、ConsList.class、BackPressureStatus.class等类型Config.TOPOLOGY_FALL_BACK_ON_JAVA_SERIALIZATION(topology.fall.back.on.java.serialization)如果为true，则kryo.setRegistrationRequired(false)，也就是如果一个class没有在kryo进行注册，不会抛异常；这个命名可能存在歧义(不是使用java自身的序列化机制来进行fallback)，它实际上要表达的是对于遇到没有注册的class要不要fallback，如果不fallback则直接抛异常，如果fallback，则会进行隐式注册，在classToNameId不会被reset的前提下，第一次使用className来序列化，同时分配一个id写入classToNameId，第二次则直接使用classToNameId中获取到的id，也就相当于手工注册的效果Config.TOPOLOGY_TUPLE_SERIALIZER(topology.tuple.serializer，默认是org.apache.storm.serialization.types.ListDelegateSerializer)用于配置tuple的payload的序列化类Config.TOPOLOGY_KRYO_DECORATORS(topology.kryo.decorators)用于加载自定义的serialization，可以直接通过Config.registerDecorator注册一个IKryoDecorator，在decorate方法中对Kyro注册要序列化的classConfig.TOPOLOGY_SKIP_MISSING_KRYO_REGISTRATIONS(topology.skip.missing.kryo.registrations，默认为false)这个属性容易跟Config.TOPOLOGY_FALL_BACK_ON_JAVA_SERIALIZATION(topology.fall.back.on.java.serialization)混淆起来，前者是storm自身的属性而后者storm包装的kryo的属性(registrationRequired)；Config.TOPOLOGY_SKIP_MISSING_KRYO_REGISTRATIONS配置的是在有自定义Config.TOPOLOGY_KRYO_DECORATORS的场景下，如果storm加载不到用户自定义的IKryoDecorator类时是skip还是抛异常Kryo的registrationRequired为false的话，则会自动对未注册的class进行隐式注册(注册到classToNameId)，只在第一次序列化的时候使用className，之后都用id替代，来节省空间；不过要注意的是如果Kryo的autoReset为true的话，那么classToNameId会被reset，因而隐式注册在非第一次遇到未注册的class的时候并不能一直走使用id代替className来序列化docSerializationSpark调优之Data SerializationSpark 2.0.2, double[], 使用Kyro序列化加速，和手动注册类名 ...

聊聊storm的LoggingClusterMetricsConsumer

序本文主要研究一下storm的LoggingClusterMetricsConsumerLoggingClusterMetricsConsumerstorm-2.0.0/storm-server/src/main/java/org/apache/storm/metric/LoggingClusterMetricsConsumer.javapublic class LoggingClusterMetricsConsumer implements IClusterMetricsConsumer { public static final Logger LOG = LoggerFactory.getLogger(LoggingClusterMetricsConsumer.class); static private String padding = " “; @Override public void prepare(Object registrationArgument) { } @Override public void handleDataPoints(ClusterInfo clusterInfo, Collection<DataPoint> dataPoints) { StringBuilder sb = new StringBuilder(); String header = String.format("%d\t%15s\t%40s\t”, clusterInfo.getTimestamp(), “<cluster>”, “<cluster>”); sb.append(header); logDataPoints(dataPoints, sb, header); } @Override public void handleDataPoints(SupervisorInfo supervisorInfo, Collection<DataPoint> dataPoints) { StringBuilder sb = new StringBuilder(); String header = String.format("%d\t%15s\t%40s\t", supervisorInfo.getTimestamp(), supervisorInfo.getSrcSupervisorHost(), supervisorInfo.getSrcSupervisorId()); sb.append(header); for (DataPoint p : dataPoints) { sb.delete(header.length(), sb.length()); sb.append(p.getName()) .append(padding).delete(header.length() + 23, sb.length()).append("\t") .append(p.getValue()); LOG.info(sb.toString()); } } @Override public void cleanup() { } private void logDataPoints(Collection<DataPoint> dataPoints, StringBuilder sb, String header) { for (DataPoint p : dataPoints) { sb.delete(header.length(), sb.length()); sb.append(p.getName()) .append(padding).delete(header.length() + 23, sb.length()).append("\t") .append(p.getValue()); LOG.info(sb.toString()); } }}这个是cluster级别的metrics consumer，只能在storm.yaml里头配置它的handleDataPoints供ClusterMetricsConsumerExecutor回调这里handleDataPoints仅仅是打印到日志文件storm.yaml配置## Cluster Metrics Consumersstorm.cluster.metrics.consumer.register: - class: “org.apache.storm.metric.LoggingClusterMetricsConsumer”# - class: “com.example.demo.metric.FixedLoggingClusterMetricsConsumer”# argument:# - endpoint: “metrics-collector.mycompany.org”#storm.cluster.metrics.consumer.publish.interval.secs: 5这里指定了consumer类为LoggingClusterMetricsConsumer，同时指定了publish interval为5秒cluster.xml<?xml version=“1.0” encoding=“UTF-8”?><!– Licensed to the Apache Software Foundation (ASF) under one or more contributor license agreements. See the NOTICE file distributed with this work for additional information regarding copyright ownership. The ASF licenses this file to You under the Apache License, Version 2.0 (the “License”); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an “AS IS” BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.–><configuration monitorInterval=“60” shutdownHook=“disable” packages=“org.apache.logging.log4j.core,io.sentry.log4j2”><properties> <property name=“pattern”>%d{yyyy-MM-dd HH:mm:ss.SSS} %c{1.} %t [%p] %msg%n</property> <property name=“patternMetrics”>%d %-8r %m%n</property></properties><appenders> <RollingFile name=“A1” immediateFlush=“false” fileName="${sys:storm.log.dir}/${sys:logfile.name}" filePattern="${sys:storm.log.dir}/${sys:logfile.name}.%i.gz"> <PatternLayout> <pattern>${pattern}</pattern> </PatternLayout> <Policies> <SizeBasedTriggeringPolicy size=“100 MB”/> <!– Or every 100 MB –> </Policies> <DefaultRolloverStrategy max=“9”/> </RollingFile> <RollingFile name=“WEB-ACCESS” immediateFlush=“false” fileName="${sys:storm.log.dir}/access-web-${sys:daemon.name}.log" filePattern="${sys:storm.log.dir}/access-web-${sys:daemon.name}.log.%i.gz"> <PatternLayout> <pattern>${pattern}</pattern> </PatternLayout> <Policies> <SizeBasedTriggeringPolicy size=“100 MB”/> <!– Or every 100 MB –> </Policies> <DefaultRolloverStrategy max=“9”/> </RollingFile> <RollingFile name=“THRIFT-ACCESS” immediateFlush=“false” fileName="${sys:storm.log.dir}/access-${sys:logfile.name}" filePattern="${sys:storm.log.dir}/access-${sys:logfile.name}.%i.gz"> <PatternLayout> <pattern>${pattern}</pattern> </PatternLayout> <Policies> <SizeBasedTriggeringPolicy size=“100 MB”/> <!– Or every 100 MB –> </Policies> <DefaultRolloverStrategy max=“9”/> </RollingFile> <RollingFile name=“METRICS” fileName="${sys:storm.log.dir}/${sys:logfile.name}.metrics" filePattern="${sys:storm.log.dir}/${sys:logfile.name}.metrics.%i.gz"> <PatternLayout> <pattern>${patternMetrics}</pattern> </PatternLayout> <Policies> <SizeBasedTriggeringPolicy size=“2 MB”/> </Policies> <DefaultRolloverStrategy max=“9”/> </RollingFile> <Syslog name=“syslog” format=“RFC5424” charset=“UTF-8” host=“localhost” port=“514” protocol=“UDP” appName="[${sys:daemon.name}]" mdcId=“mdc” includeMDC=“true” facility=“LOCAL5” enterpriseNumber=“18060” newLine=“true” exceptionPattern="%rEx{full}" messageId="[${sys:user.name}:S0]" id=“storm” immediateFlush=“true” immediateFail=“true”/></appenders><loggers> <Logger name=“org.apache.storm.logging.filters.AccessLoggingFilter” level=“info” additivity=“false”> <AppenderRef ref=“WEB-ACCESS”/> <AppenderRef ref=“syslog”/> </Logger> <Logger name=“org.apache.storm.logging.ThriftAccessLogger” level=“info” additivity=“false”> <AppenderRef ref=“THRIFT-ACCESS”/> <AppenderRef ref=“syslog”/> </Logger> <Logger name=“org.apache.storm.metric.LoggingClusterMetricsConsumer” level=“info” additivity=“false”> <appender-ref ref=“METRICS”/> </Logger> <root level=“info”> <!– We log everything –> <appender-ref ref=“A1”/> <appender-ref ref=“syslog”/> <appender-ref ref=“Sentry” level=“ERROR” /> </root></loggers></configuration>cluster.xml指定了metrics logging的相关配置，这里使用的是METRICS appender，该appender是一个RollingFile，文件名为&dollar;{sys:storm.log.dir}/&dollar;{sys:logfile.name}.metrics，例如nimbus默认的logfile.name为nimbus.log，supervisor默认的logfile.name为supervisor.log，因而写入的文件为nimbus.log.metrics及supervisor.log.metrics输出实例如下2018-11-06 07:51:51,488 18628 1541490711 <cluster> <cluster> supervisors 12018-11-06 07:51:51,488 18628 1541490711 <cluster> <cluster> topologies 02018-11-06 07:51:51,489 18629 1541490711 <cluster> <cluster> slotsTotal 42018-11-06 07:51:51,489 18629 1541490711 <cluster> <cluster> slotsUsed 02018-11-06 07:51:51,489 18629 1541490711 <cluster> <cluster> slotsFree 42018-11-06 07:51:51,489 18629 1541490711 <cluster> <cluster> executorsTotal 02018-11-06 07:51:51,489 18629 1541490711 <cluster> <cluster> tasksTotal 02018-11-06 07:51:51,496 18636 1541490711 192.168.99.100 5bbd576d-218c-4365-ac5e-865b1f6e9b29 slotsTotal 42018-11-06 07:51:51,497 18637 1541490711 192.168.99.100 5bbd576d-218c-4365-ac5e-865b1f6e9b29 slotsUsed 02018-11-06 07:51:51,497 18637 1541490711 192.168.99.100 5bbd576d-218c-4365-ac5e-865b1f6e9b29 totalMem 3072.02018-11-06 07:51:51,497 18637 1541490711 192.168.99.100 5bbd576d-218c-4365-ac5e-865b1f6e9b29 totalCpu 400.02018-11-06 07:51:51,498 18638 1541490711 192.168.99.100 5bbd576d-218c-4365-ac5e-865b1f6e9b29 usedMem 0.02018-11-06 07:51:51,498 18638 1541490711 192.168.99.100 5bbd576d-218c-4365-ac5e-865b1f6e9b29 usedCpu 0.0ClusterMetricsConsumerExecutorstorm-2.0.0/storm-server/src/main/java/org/apache/storm/metric/ClusterMetricsConsumerExecutor.javapublic class ClusterMetricsConsumerExecutor { public static final Logger LOG = LoggerFactory.getLogger(ClusterMetricsConsumerExecutor.class); private static final String ERROR_MESSAGE_PREPARATION_CLUSTER_METRICS_CONSUMER_FAILED = “Preparation of Cluster Metrics Consumer failed. " + “Please check your configuration and/or corresponding systems and relaunch Nimbus. " + “Skipping handle metrics.”; private IClusterMetricsConsumer metricsConsumer; private String consumerClassName; private Object registrationArgument; public ClusterMetricsConsumerExecutor(String consumerClassName, Object registrationArgument) { this.consumerClassName = consumerClassName; this.registrationArgument = registrationArgument; } public void prepare() { try { metricsConsumer = (IClusterMetricsConsumer) Class.forName(consumerClassName).newInstance(); metricsConsumer.prepare(registrationArgument); } catch (Exception e) { LOG.error(“Could not instantiate or prepare Cluster Metrics Consumer with fully qualified name " + consumerClassName, e); if (metricsConsumer != null) { metricsConsumer.cleanup(); } metricsConsumer = null; } } public void handleDataPoints(final IClusterMetricsConsumer.ClusterInfo clusterInfo, final Collection<DataPoint> dataPoints) { if (metricsConsumer == null) { LOG.error(ERROR_MESSAGE_PREPARATION_CLUSTER_METRICS_CONSUMER_FAILED); return; } try { metricsConsumer.handleDataPoints(clusterInfo, dataPoints); } catch (Throwable e) { LOG.error(“Error while handling cluster data points, consumer class: " + consumerClassName, e); } } public void handleDataPoints(final IClusterMetricsConsumer.SupervisorInfo supervisorInfo, final Collection<DataPoint> dataPoints) { if (metricsConsumer == null) { LOG.error(ERROR_MESSAGE_PREPARATION_CLUSTER_METRICS_CONSUMER_FAILED); return; } try { metricsConsumer.handleDataPoints(supervisorInfo, dataPoints); } catch (Throwable e) { LOG.error(“Error while handling cluster data points, consumer class: " + consumerClassName, e); } } public void cleanup() { if (metricsConsumer != null) { metricsConsumer.cleanup(); } }}ClusterMetricsConsumerExecutor在prepare的时候，根据consumerClassName来实例化IClusterMetricsConsumer的实现类，之后传入调用metricsConsumer.prepare(registrationArgument)做些准备ClusterMetricsConsumerExecutor的handleDataPoints方法实际上是代理了metricsConsumer的handleDataPoints该handleDataPoints方法有两个，他们都有共同的参数dataPoints，另外一个不同的参数，是一个传的是ClusterInfo，一个是SupervisorInfo，分别用于nimbus及supervisorNimbus.launchServerstorm-2.0.0/storm-server/src/main/java/org/apache/storm/daemon/nimbus/Nimbus.java public void launchServer() throws Exception { try { BlobStore store = blobStore; IStormClusterState state = stormClusterState; NimbusInfo hpi = nimbusHostPortInfo; LOG.info(“Starting Nimbus with conf {}”, ConfigUtils.maskPasswords(conf)); validator.prepare(conf); //…… timer.scheduleRecurring(0, ObjectReader.getInt(conf.get(DaemonConfig.STORM_CLUSTER_METRICS_CONSUMER_PUBLISH_INTERVAL_SECS)), () -> { try { if (isLeader()) { sendClusterMetricsToExecutors(); } } catch (Exception e) { throw new RuntimeException(e); } }); timer.scheduleRecurring(5, 5, clusterMetricSet); } catch (Exception e) { if (Utils.exceptionCauseIsInstanceOf(InterruptedException.class, e)) { throw e; } if (Utils.exceptionCauseIsInstanceOf(InterruptedIOException.class, e)) { throw e; } LOG.error(“Error on initialization of nimbus”, e); Utils.exitProcess(13, “Error on initialization of nimbus”); } } private boolean isLeader() throws Exception { return leaderElector.isLeader(); }Nimbus的launchServer方法创建了一个定时任务，如果是leader节点，则调用sendClusterMetricsToExecutors方法发送相关metrics到metrics consumer定时任务的调度时间间隔为DaemonConfig.STORM_CLUSTER_METRICS_CONSUMER_PUBLISH_INTERVAL_SECS(storm.cluster.metrics.consumer.publish.interval.secs)，在defaults.yaml文件中默认为60除了发送metrics到metrics consumer，它还有一个定时任务，每隔5秒调用一下ClusterSummaryMetricSet这个线程Nimbus.sendClusterMetricsToExecutorsstorm-2.0.0/storm-server/src/main/java/org/apache/storm/daemon/nimbus/Nimbus.java private void sendClusterMetricsToExecutors() throws Exception { ClusterInfo clusterInfo = mkClusterInfo(); ClusterSummary clusterSummary = getClusterInfoImpl(); List<DataPoint> clusterMetrics = extractClusterMetrics(clusterSummary); Map<IClusterMetricsConsumer.SupervisorInfo, List<DataPoint>> supervisorMetrics = extractSupervisorMetrics(clusterSummary); for (ClusterMetricsConsumerExecutor consumerExecutor : clusterConsumerExceutors) { consumerExecutor.handleDataPoints(clusterInfo, clusterMetrics); for (Entry<IClusterMetricsConsumer.SupervisorInfo, List<DataPoint>> entry : supervisorMetrics.entrySet()) { consumerExecutor.handleDataPoints(entry.getKey(), entry.getValue()); } } }nimbus的sendClusterMetricsToExecutors方法通过extractClusterMetrics及extractSupervisorMetrics提取相关metrics，然后调用consumerExecutor.handleDataPoints传递数据ClusterSummaryMetricSetstorm-2.0.0/storm-server/src/main/java/org/apache/storm/daemon/nimbus/Nimbus.java private class ClusterSummaryMetricSet implements Runnable { private static final int CACHING_WINDOW = 5; private final ClusterSummaryMetrics clusterSummaryMetrics = new ClusterSummaryMetrics(); private final Function<String, Histogram> registerHistogram = (name) -> { //This histogram reflects the data distribution across only one ClusterSummary, i.e., // data distribution across all entities of a type (e.g., data from all nimbus/topologies) at one moment. // Hence we use half of the CACHING_WINDOW time to ensure it retains only data from the most recent update final Histogram histogram = new Histogram(new SlidingTimeWindowReservoir(CACHING_WINDOW / 2, TimeUnit.SECONDS)); clusterSummaryMetrics.put(name, histogram); return histogram; }; private volatile boolean active = false; //NImbus metrics distribution private final Histogram nimbusUptime = registerHistogram.apply(“nimbuses:uptime-secs”); //Supervisor metrics distribution private final Histogram supervisorsUptime = registerHistogram.apply(“supervisors:uptime-secs”); private final Histogram supervisorsNumWorkers = registerHistogram.apply(“supervisors:num-workers”); private final Histogram supervisorsNumUsedWorkers = registerHistogram.apply(“supervisors:num-used-workers”); private final Histogram supervisorsUsedMem = registerHistogram.apply(“supervisors:used-mem”); private final Histogram supervisorsUsedCpu = registerHistogram.apply(“supervisors:used-cpu”); private final Histogram supervisorsFragmentedMem = registerHistogram.apply(“supervisors:fragmented-mem”); private final Histogram supervisorsFragmentedCpu = registerHistogram.apply(“supervisors:fragmented-cpu”); //Topology metrics distribution private final Histogram topologiesNumTasks = registerHistogram.apply(“topologies:num-tasks”); private final Histogram topologiesNumExecutors = registerHistogram.apply(“topologies:num-executors”); private final Histogram topologiesNumWorker = registerHistogram.apply(“topologies:num-workers”); private final Histogram topologiesUptime = registerHistogram.apply(“topologies:uptime-secs”); private final Histogram topologiesReplicationCount = registerHistogram.apply(“topologies:replication-count”); private final Histogram topologiesRequestedMemOnHeap = registerHistogram.apply(“topologies:requested-mem-on-heap”); private final Histogram topologiesRequestedMemOffHeap = registerHistogram.apply(“topologies:requested-mem-off-heap”); private final Histogram topologiesRequestedCpu = registerHistogram.apply(“topologies:requested-cpu”); private final Histogram topologiesAssignedMemOnHeap = registerHistogram.apply(“topologies:assigned-mem-on-heap”); private final Histogram topologiesAssignedMemOffHeap = registerHistogram.apply(“topologies:assigned-mem-off-heap”); private final Histogram topologiesAssignedCpu = registerHistogram.apply(“topologies:assigned-cpu”); private final StormMetricsRegistry metricsRegistry; /** * Constructor to put all items in ClusterSummary in MetricSet as a metric. * All metrics are derived from a cached ClusterSummary object, * expired {@link ClusterSummaryMetricSet#CACHING_WINDOW} seconds after first query in a while from reporters. * In case of {@link com.codahale.metrics.ScheduledReporter}, CACHING_WINDOW should be set shorter than * reporting interval to avoid outdated reporting. / ClusterSummaryMetricSet(StormMetricsRegistry metricsRegistry) { this.metricsRegistry = metricsRegistry; //Break the code if out of sync to thrift protocol assert ClusterSummary._Fields.values().length == 3 && ClusterSummary._Fields.findByName(“supervisors”) == ClusterSummary._Fields.SUPERVISORS && ClusterSummary._Fields.findByName(“topologies”) == ClusterSummary._Fields.TOPOLOGIES && ClusterSummary._Fields.findByName(“nimbuses”) == ClusterSummary._Fields.NIMBUSES; final CachedGauge<ClusterSummary> cachedSummary = new CachedGauge<ClusterSummary>(CACHING_WINDOW, TimeUnit.SECONDS) { @Override protected ClusterSummary loadValue() { try { ClusterSummary newSummary = getClusterInfoImpl(); LOG.debug(“The new summary is {}”, newSummary); / * Update histograms based on the new summary. Most common implementation of Reporter reports Gauges before * Histograms. Because DerivativeGauge will trigger cache refresh upon reporter’s query, histogram will also be * updated before query */ updateHistogram(newSummary); return newSummary; } catch (Exception e) { LOG.warn(“Get cluster info exception.”, e); throw new RuntimeException(e); } } }; clusterSummaryMetrics.put(“cluster:num-nimbus-leaders”, new DerivativeGauge<ClusterSummary, Long>(cachedSummary) { @Override protected Long transform(ClusterSummary clusterSummary) { return clusterSummary.get_nimbuses().stream() .filter(NimbusSummary::is_isLeader) .count(); } }); clusterSummaryMetrics.put(“cluster:num-nimbuses”, new DerivativeGauge<ClusterSummary, Integer>(cachedSummary) { @Override protected Integer transform(ClusterSummary clusterSummary) { return clusterSummary.get_nimbuses_size(); } }); clusterSummaryMetrics.put(“cluster:num-supervisors”, new DerivativeGauge<ClusterSummary, Integer>(cachedSummary) { @Override protected Integer transform(ClusterSummary clusterSummary) { return clusterSummary.get_supervisors_size(); } }); clusterSummaryMetrics.put(“cluster:num-topologies”, new DerivativeGauge<ClusterSummary, Integer>(cachedSummary) { @Override protected Integer transform(ClusterSummary clusterSummary) { return clusterSummary.get_topologies_size(); } }); clusterSummaryMetrics.put(“cluster:num-total-workers”, new DerivativeGauge<ClusterSummary, Integer>(cachedSummary) { @Override protected Integer transform(ClusterSummary clusterSummary) { return clusterSummary.get_supervisors().stream() .mapToInt(SupervisorSummary::get_num_workers) .sum(); } }); clusterSummaryMetrics.put(“cluster:num-total-used-workers”, new DerivativeGauge<ClusterSummary, Integer>(cachedSummary) { @Override protected Integer transform(ClusterSummary clusterSummary) { return clusterSummary.get_supervisors().stream() .mapToInt(SupervisorSummary::get_num_used_workers) .sum(); } }); clusterSummaryMetrics.put(“cluster:total-fragmented-memory-non-negative”, new DerivativeGauge<ClusterSummary, Double>(cachedSummary) { @Override protected Double transform(ClusterSummary clusterSummary) { return clusterSummary.get_supervisors().stream() //Filtered negative value .mapToDouble(supervisorSummary -> Math.max(supervisorSummary.get_fragmented_mem(), 0)) .sum(); } }); clusterSummaryMetrics.put(“cluster:total-fragmented-cpu-non-negative”, new DerivativeGauge<ClusterSummary, Double>(cachedSummary) { @Override protected Double transform(ClusterSummary clusterSummary) { return clusterSummary.get_supervisors().stream() //Filtered negative value .mapToDouble(supervisorSummary -> Math.max(supervisorSummary.get_fragmented_cpu(), 0)) .sum(); } }); } private void updateHistogram(ClusterSummary newSummary) { for (NimbusSummary nimbusSummary : newSummary.get_nimbuses()) { nimbusUptime.update(nimbusSummary.get_uptime_secs()); } for (SupervisorSummary summary : newSummary.get_supervisors()) { supervisorsUptime.update(summary.get_uptime_secs()); supervisorsNumWorkers.update(summary.get_num_workers()); supervisorsNumUsedWorkers.update(summary.get_num_used_workers()); supervisorsUsedMem.update(Math.round(summary.get_used_mem())); supervisorsUsedCpu.update(Math.round(summary.get_used_cpu())); supervisorsFragmentedMem.update(Math.round(summary.get_fragmented_mem())); supervisorsFragmentedCpu.update(Math.round(summary.get_fragmented_cpu())); } for (TopologySummary summary : newSummary.get_topologies()) { topologiesNumTasks.update(summary.get_num_tasks()); topologiesNumExecutors.update(summary.get_num_executors()); topologiesNumWorker.update(summary.get_num_workers()); topologiesUptime.update(summary.get_uptime_secs()); topologiesReplicationCount.update(summary.get_replication_count()); topologiesRequestedMemOnHeap.update(Math.round(summary.get_requested_memonheap())); topologiesRequestedMemOffHeap.update(Math.round(summary.get_requested_memoffheap())); topologiesRequestedCpu.update(Math.round(summary.get_requested_cpu())); topologiesAssignedMemOnHeap.update(Math.round(summary.get_assigned_memonheap())); topologiesAssignedMemOffHeap.update(Math.round(summary.get_assigned_memoffheap())); topologiesAssignedCpu.update(Math.round(summary.get_assigned_cpu())); } } void setActive(final boolean active) { if (this.active != active) { this.active = active; if (active) { metricsRegistry.registerAll(clusterSummaryMetrics); } else { metricsRegistry.removeAll(clusterSummaryMetrics); } } } @Override public void run() { try { setActive(isLeader()); } catch (Exception e) { throw new RuntimeException(e); } } }这个线程主要是调用setActive方法，做的工作的话，就是不断判断节点状态变化，如果leader发生变化，自己是leader则注册clusterSummaryMetrics，如果自己变成不是leader则删除掉clusterSummaryMetricsclusterSummaryMetrics添加了cluster:num-nimbus-leaders、cluster:num-nimbuses、cluster:num-supervisors、cluster:num-topologies、cluster:num-total-workers、cluster:num-total-used-workers、cluster:total-fragmented-memory-non-negative、cluster:total-fragmented-cpu-non-negative这几个指标小结LoggingClusterMetricsConsumer消费的是cluster级别的指标，它消费了指标数据，然后打印到日志文件，log4j2的配置读取的是cluster.xml，最后写入的文件是nimbus.log.metrics、supervisor.log.metricsNimbus在launchServer的时候，会建立一个定时任务，在当前节点是leader的情况下，定时发送metrics指标到ClusterMetricsConsumerExecutor，然后间接回调LoggingClusterMetricsConsumer的handleDataPoints方法，把数据打印到日志handleDataPoints处理两类info，一类是ClusterInfo，一类是SupervisorInfo；这里要注意一下定时任务是在当前节点是leader的情况下才会sendClusterMetricsToExecutors的，正常情况nimbus与supervisor不在同一个节点上，因而supervisor.log.metrics可能是空的docStorm Metrics ...

聊聊storm的IEventLogger

序本文主要研究一下storm的IEventLoggerIEventLoggerstorm-2.0.0/storm-client/src/jvm/org/apache/storm/metric/IEventLogger.java/** * EventLogger interface for logging the event info to a sink like log file or db for inspecting the events via UI for debugging. /public interface IEventLogger { void prepare(Map<String, Object> conf, Map<String, Object> arguments, TopologyContext context); /* * This method would be invoked when the {@link EventLoggerBolt} receives a tuple from the spouts or bolts that has event logging * enabled. * * @param e the event / void log(EventInfo e); void close(); /* * A wrapper for the fields that we would log. / class EventInfo { private long ts; private String component; private int task; private Object messageId; private List<Object> values; public EventInfo(long ts, String component, int task, Object messageId, List<Object> values) { this.ts = ts; this.component = component; this.task = task; this.messageId = messageId; this.values = values; } public long getTs() { return ts; } public String getComponent() { return component; } public int getTask() { return task; } public Object getMessageId() { return messageId; } public List<Object> getValues() { return values; } /* * Returns a default formatted string with fields separated by “,” * * @return a default formatted string with fields separated by “,” / @Override public String toString() { return new Date(ts).toString() + “,” + component + “,” + String.valueOf(task) + “,” + (messageId == null ? "" : messageId.toString()) + “,” + values.toString(); } }}IEventLogger定义了log方法，同时也定义了EventInfo对象FileBasedEventLoggerstorm-2.0.0/storm-client/src/jvm/org/apache/storm/metric/FileBasedEventLogger.javapublic class FileBasedEventLogger implements IEventLogger { private static final Logger LOG = LoggerFactory.getLogger(FileBasedEventLogger.class); private static final int FLUSH_INTERVAL_MILLIS = 1000; private Path eventLogPath; private BufferedWriter eventLogWriter; private ScheduledExecutorService flushScheduler; private volatile boolean dirty = false; private void initLogWriter(Path logFilePath) { try { LOG.info(“logFilePath {}”, logFilePath); eventLogPath = logFilePath; eventLogWriter = Files.newBufferedWriter(eventLogPath, StandardCharsets.UTF_8, StandardOpenOption.CREATE, StandardOpenOption.WRITE, StandardOpenOption.APPEND); } catch (IOException e) { LOG.error(“Error setting up FileBasedEventLogger.”, e); throw new RuntimeException(e); } } private void setUpFlushTask() { ThreadFactory threadFactory = new ThreadFactoryBuilder() .setNameFormat(“event-logger-flush-%d”) .setDaemon(true) .build(); flushScheduler = Executors.newSingleThreadScheduledExecutor(threadFactory); Runnable runnable = new Runnable() { @Override public void run() { try { if (dirty) { eventLogWriter.flush(); dirty = false; } } catch (IOException ex) { LOG.error(“Error flushing " + eventLogPath, ex); throw new RuntimeException(ex); } } }; flushScheduler.scheduleAtFixedRate(runnable, FLUSH_INTERVAL_MILLIS, FLUSH_INTERVAL_MILLIS, TimeUnit.MILLISECONDS); } @Override public void prepare(Map<String, Object> conf, Map<String, Object> arguments, TopologyContext context) { String stormId = context.getStormId(); int port = context.getThisWorkerPort(); / * Include the topology name & worker port in the file name so that * multiple event loggers can log independently. / String workersArtifactRoot = ConfigUtils.workerArtifactsRoot(conf, stormId, port); Path path = Paths.get(workersArtifactRoot, “events.log”); File dir = path.toFile().getParentFile(); if (!dir.exists()) { dir.mkdirs(); } initLogWriter(path); setUpFlushTask(); } @Override public void log(EventInfo event) { try { //TODO: file rotation eventLogWriter.write(buildLogMessage(event)); eventLogWriter.newLine(); dirty = true; } catch (IOException ex) { LOG.error(“Error logging event {}”, event, ex); throw new RuntimeException(ex); } } protected String buildLogMessage(EventInfo event) { return event.toString(); } @Override public void close() { try { eventLogWriter.close(); } catch (IOException ex) { LOG.error(“Error closing event log.”, ex); } closeFlushScheduler(); } private void closeFlushScheduler() { if (flushScheduler != null) { flushScheduler.shutdown(); try { if (!flushScheduler.awaitTermination(2, TimeUnit.SECONDS)) { flushScheduler.shutdownNow(); } } catch (InterruptedException ie) { // (Re-)Cancel if current thread also interrupted flushScheduler.shutdownNow(); // Preserve interrupt status Thread.currentThread().interrupt(); } } }}IEventLogger默认的实现为FileBasedEventLogger，它启动一个定时任务，每隔FLUSH_INTERVAL_MILLIS时间将数据flush到磁盘(如果是dirty的话)默认的文件路径为workersArtifactRoot目录下的events.logStormCommon.addEventLoggerstorm-2.0.0/storm-client/src/jvm/org/apache/storm/daemon/StormCommon.java public static void addEventLogger(Map<String, Object> conf, StormTopology topology) { Integer numExecutors = ObjectReader.getInt(conf.get(Config.TOPOLOGY_EVENTLOGGER_EXECUTORS), ObjectReader.getInt(conf.get(Config.TOPOLOGY_WORKERS))); if (numExecutors == null || numExecutors == 0) { return; } HashMap<String, Object> componentConf = new HashMap<>(); componentConf.put(Config.TOPOLOGY_TASKS, numExecutors); componentConf.put(Config.TOPOLOGY_TICK_TUPLE_FREQ_SECS, ObjectReader.getInt(conf.get(Config.TOPOLOGY_MESSAGE_TIMEOUT_SECS))); Bolt eventLoggerBolt = Thrift.prepareSerializedBoltDetails( eventLoggerInputs(topology), new EventLoggerBolt(), null, numExecutors, componentConf); for (Object component : allComponents(topology).values()) { ComponentCommon common = getComponentCommon(component); common.put_to_streams(EVENTLOGGER_STREAM_ID, Thrift.outputFields(eventLoggerBoltFields())); } topology.put_to_bolts(EVENTLOGGER_COMPONENT_ID, eventLoggerBolt); } public static List<String> eventLoggerBoltFields() { return Arrays.asList(EventLoggerBolt.FIELD_COMPONENT_ID, EventLoggerBolt.FIELD_MESSAGE_ID, EventLoggerBolt.FIELD_TS, EventLoggerBolt.FIELD_VALUES); } public static Map<GlobalStreamId, Grouping> eventLoggerInputs(StormTopology topology) { Map<GlobalStreamId, Grouping> inputs = new HashMap<GlobalStreamId, Grouping>(); Set<String> allIds = new HashSet<String>(); allIds.addAll(topology.get_bolts().keySet()); allIds.addAll(topology.get_spouts().keySet()); for (String id : allIds) { inputs.put(Utils.getGlobalStreamId(id, EVENTLOGGER_STREAM_ID), Thrift.prepareFieldsGrouping(Arrays.asList(“component-id”))); } return inputs; }这里从Config.TOPOLOGY_EVENTLOGGER_EXECUTORS读取numExecutors，如果为null则使用Config.TOPOLOGY_WORKERS的值，默认是0，即禁用event logger这里还读取了Config.TOPOLOGY_MESSAGE_TIMEOUT_SECS作为Config.TOPOLOGY_TICK_TUPLE_FREQ_SECS这里创建了EventLoggerBolt，该bolt使用了fieldsGrouping(“component-id”)以及Utils.getGlobalStreamId(id, EVENTLOGGER_STREAM_ID)将所有的spout及bolt都作为该bolt的inputs，从而接收所有的tuple，其字段为ventLoggerBolt.FIELD_COMPONENT_ID,EventLoggerBolt.FIELD_MESSAGE_ID,EventLoggerBolt.FIELD_TS, EventLoggerBolt.FIELD_VALUES；同时也会对每个spout或bolt添加一个输出到名为EVENTLOGGER_STREAM_ID的stream的声明，这样使得数据得以流向EventLoggerBoltEventLoggerBoltstorm-2.0.0/storm-client/src/jvm/org/apache/storm/metric/EventLoggerBolt.javapublic class EventLoggerBolt implements IBolt { / The below field declarations are also used in common.clj to define the event logger output fields / public static final String FIELD_TS = “ts”; public static final String FIELD_VALUES = “values”; public static final String FIELD_COMPONENT_ID = “component-id”; public static final String FIELD_MESSAGE_ID = “message-id”; private static final Logger LOG = LoggerFactory.getLogger(EventLoggerBolt.class); private List<IEventLogger> eventLoggers; @Override public void prepare(Map<String, Object> topoConf, TopologyContext context, OutputCollector collector) { LOG.info(“EventLoggerBolt prepare called”); eventLoggers = new ArrayList<>(); List<Map<String, Object>> registerInfo = (List<Map<String, Object>>) topoConf.get(Config.TOPOLOGY_EVENT_LOGGER_REGISTER); if (registerInfo != null && !registerInfo.isEmpty()) { initializeEventLoggers(topoConf, context, registerInfo); } else { initializeDefaultEventLogger(topoConf, context); } } @Override public void execute(Tuple input) { LOG.debug(”* EventLoggerBolt got tuple from sourceComponent {}, with values {}", input.getSourceComponent(), input.getValues()); Object msgId = input.getValueByField(FIELD_MESSAGE_ID); EventInfo eventInfo = new EventInfo(input.getLongByField(FIELD_TS), input.getSourceComponent(), input.getSourceTask(), msgId, (List<Object>) input.getValueByField(FIELD_VALUES)); for (IEventLogger eventLogger : eventLoggers) { eventLogger.log(eventInfo); } } @Override public void cleanup() { for (IEventLogger eventLogger : eventLoggers) { eventLogger.close(); } } private void initializeEventLoggers(Map<String, Object> topoConf, TopologyContext context, List<Map<String, Object>> registerInfo) { for (Map<String, Object> info : registerInfo) { String className = (String) info.get(TOPOLOGY_EVENT_LOGGER_CLASS); Map<String, Object> arguments = (Map<String, Object>) info.get(TOPOLOGY_EVENT_LOGGER_ARGUMENTS); IEventLogger eventLogger; try { eventLogger = (IEventLogger) Class.forName(className).newInstance(); } catch (Exception e) { throw new RuntimeException(“Could not instantiate a class listed in config under section " + Config.TOPOLOGY_EVENT_LOGGER_REGISTER + " with fully qualified name " + className, e); } eventLogger.prepare(topoConf, arguments, context); eventLoggers.add(eventLogger); } } private void initializeDefaultEventLogger(Map<String, Object> topoConf, TopologyContext context) { FileBasedEventLogger eventLogger = new FileBasedEventLogger(); eventLogger.prepare(topoConf, null, context); eventLoggers.add(eventLogger); }}EventLoggerBolt在prepare的时候，从topoConf读取Config.TOPOLOGY_EVENT_LOGGER_REGISTER信息，如果registerInfo为空的话则使用默认的FileBasedEventLogger，否则按registerInfo中注册的eventLoggers来初始化这里的execute方法就是挨个遍历eventLoggers，然后调用log方法小结要开启EventLogger的话，要设置Config.TOPOLOGY_EVENTLOGGER_EXECUTORS的值大于0(conf.setNumEventLoggers)，默认为0，即禁用。开启了event logger的话，可以点击spout或bolt的debug，然后打开events链接，就可以在界面上查看debug期间的tuple数据。设置Config.TOPOLOGY_EVENTLOGGER_EXECUTORS大于0了之后，如果没有自己设置Config.TOPOLOGY_EVENT_LOGGER_REGISTER，则默认启用的是FileBasedEventLogger，当开启spout或bolt的debug的时候，会将EventInfo打印到workersArtifactRoot目录下的events.log如果自定义了Config.TOPOLOGY_EVENT_LOGGER_REGISTER(conf.registerEventLogger)，则StormCommon采用的是该配置来初始化EventLogger，默认的FileBasedEventLogger如果没有被设置进去的话，则不会被初始化；StormCommon在addEventLogger的时候，对所有的spout及bolt增加一个declareStream，输出到EVENTLOGGER_STREAM_ID；同时对EventLoggerBolt通过类似fieldsGrouping(componentId,Utils.getGlobalStreamId(id, EVENTLOGGER_STREAM_ID),new Fields(“component-id”))将所有的spout及bolt作为inputs；输入到EventLoggerBolt的tuple的字段为ventLoggerBolt.FIELD_COMPONENT_ID,EventLoggerBolt.FIELD_MESSAGE_ID,EventLoggerBolt.FIELD_TS, EventLoggerBolt.FIELD_VALUESdocSTORM-954 Topology Event InspectorTopology event inspector ...

[case44]聊聊storm trident的operations

序本文主要研究一下storm trident的operationsfunction filter projectionFunctionstorm-core-1.2.2-sources.jar!/org/apache/storm/trident/operation/Function.javapublic interface Function extends EachOperation { /** * Performs the function logic on an individual tuple and emits 0 or more tuples. * * @param tuple The incoming tuple * @param collector A collector instance that can be used to emit tuples / void execute(TridentTuple tuple, TridentCollector collector);}Function定义了execute方法，它发射的字段会追加到input tuple中Filterstorm-core-1.2.2-sources.jar!/org/apache/storm/trident/operation/Filter.javapublic interface Filter extends EachOperation { /* * Determines if a tuple should be filtered out of a stream * * @param tuple the tuple being evaluated * @return false to drop the tuple, true to keep the tuple / boolean isKeep(TridentTuple tuple);}Filter提供一个isKeep方法，用来决定该tuple是否输出projectionstorm-core-1.2.2-sources.jar!/org/apache/storm/trident/Stream.java /* * Filters out fields from a stream, resulting in a Stream containing only the fields specified by keepFields. * * For example, if you had a Stream mystream containing the fields ["a", "b", "c","d"], calling" * * java * mystream.project(new Fields("b", "d")) * * * would produce a stream containing only the fields ["b", "d"]. * * * @param keepFields The fields in the Stream to keep * @return / public Stream project(Fields keepFields) { projectionValidation(keepFields); return _topology.addSourcedNode(this, new ProcessorNode(_topology.getUniqueStreamId(), _name, keepFields, new Fields(), new ProjectedProcessor(keepFields))); }这里使用了ProjectedProcessor来进行projection操作repartitioning operationspartitionstorm-core-1.2.2-sources.jar!/org/apache/storm/trident/Stream.java /* * ## Repartitioning Operation * * @param partitioner * @return / public Stream partition(CustomStreamGrouping partitioner) { return partition(Grouping.custom_serialized(Utils.javaSerialize(partitioner))); }这里使用了CustomStreamGroupingpartitionBystorm-core-1.2.2-sources.jar!/org/apache/storm/trident/Stream.java /* * ## Repartitioning Operation * * @param fields * @return / public Stream partitionBy(Fields fields) { projectionValidation(fields); return partition(Grouping.fields(fields.toList())); }这里使用Grouping.fieldsidentityPartitionstorm-core-1.2.2-sources.jar!/org/apache/storm/trident/Stream.java /* * ## Repartitioning Operation * * @return / public Stream identityPartition() { return partition(new IdentityGrouping()); }这里使用IdentityGroupingshufflestorm-core-1.2.2-sources.jar!/org/apache/storm/trident/Stream.java /* * ## Repartitioning Operation * * Use random round robin algorithm to evenly redistribute tuples across all target partitions * * @return / public Stream shuffle() { return partition(Grouping.shuffle(new NullStruct())); }这里使用Grouping.shufflelocalOrShufflestorm-core-1.2.2-sources.jar!/org/apache/storm/trident/Stream.java /* * ## Repartitioning Operation * * Use random round robin algorithm to evenly redistribute tuples across all target partitions, with a preference * for local tasks. * * @return / public Stream localOrShuffle() { return partition(Grouping.local_or_shuffle(new NullStruct())); }这里使用Grouping.local_or_shuffleglobalstorm-core-1.2.2-sources.jar!/org/apache/storm/trident/Stream.java /* * ## Repartitioning Operation * * All tuples are sent to the same partition. The same partition is chosen for all batches in the stream. * @return / public Stream global() { // use this instead of storm’s built in one so that we can specify a singleemitbatchtopartition // without knowledge of storm’s internals return partition(new GlobalGrouping()); }这里使用GlobalGroupingbatchGlobalstorm-core-1.2.2-sources.jar!/org/apache/storm/trident/Stream.java /* * ## Repartitioning Operation * * All tuples in the batch are sent to the same partition. Different batches in the stream may go to different * partitions. * * @return / public Stream batchGlobal() { // the first field is the batch id return partition(new IndexHashGrouping(0)); }这里使用IndexHashGrouping，是对整个batch维度的repartitionbroadcaststorm-core-1.2.2-sources.jar!/org/apache/storm/trident/Stream.java /* * ## Repartitioning Operation * * Every tuple is replicated to all target partitions. This can useful during DRPC – for example, if you need to do * a stateQuery on every partition of data. * * @return / public Stream broadcast() { return partition(Grouping.all(new NullStruct())); }这里使用Grouping.allgroupBystorm-core-1.2.2-sources.jar!/org/apache/storm/trident/Stream.java /* * ## Grouping Operation * * @param fields * @return */ public GroupedStream groupBy(Fields fields) { projectionValidation(fields); return new GroupedStream(this, fields); }这里返回的是GroupedStreamaggregatorsstorm-core-1.2.2-sources.jar!/org/apache/storm/trident/Stream.java //partition aggregate public Stream partitionAggregate(Aggregator agg, Fields functionFields) { return partitionAggregate(null, agg, functionFields); } public Stream partitionAggregate(CombinerAggregator agg, Fields functionFields) { return partitionAggregate(null, agg, functionFields); } public Stream partitionAggregate(Fields inputFields, CombinerAggregator agg, Fields functionFields) { projectionValidation(inputFields); return chainedAgg() .partitionAggregate(inputFields, agg, functionFields) .chainEnd(); } public Stream partitionAggregate(ReducerAggregator agg, Fields functionFields) { return partitionAggregate(null, agg, functionFields); } public Stream partitionAggregate(Fields inputFields, ReducerAggregator agg, Fields functionFields) { projectionValidation(inputFields); return chainedAgg() .partitionAggregate(inputFields, agg, functionFields) .chainEnd(); } //aggregate public Stream aggregate(Fields inputFields, Aggregator agg, Fields functionFields) { projectionValidation(inputFields); return chainedAgg() .aggregate(inputFields, agg, functionFields) .chainEnd(); } public Stream aggregate(Fields inputFields, CombinerAggregator agg, Fields functionFields) { projectionValidation(inputFields); return chainedAgg() .aggregate(inputFields, agg, functionFields) .chainEnd(); } public Stream aggregate(Fields inputFields, ReducerAggregator agg, Fields functionFields) { projectionValidation(inputFields); return chainedAgg() .aggregate(inputFields, agg, functionFields) .chainEnd(); } //persistent aggregate public TridentState persistentAggregate(StateFactory stateFactory, CombinerAggregator agg, Fields functionFields) { return persistentAggregate(new StateSpec(stateFactory), agg, functionFields); } public TridentState persistentAggregate(StateSpec spec, CombinerAggregator agg, Fields functionFields) { return persistentAggregate(spec, null, agg, functionFields); } public TridentState persistentAggregate(StateFactory stateFactory, Fields inputFields, CombinerAggregator agg, Fields functionFields) { return persistentAggregate(new StateSpec(stateFactory), inputFields, agg, functionFields); } public TridentState persistentAggregate(StateSpec spec, Fields inputFields, CombinerAggregator agg, Fields functionFields) { projectionValidation(inputFields); // replaces normal aggregation here with a global grouping because it needs to be consistent across batches return new ChainedAggregatorDeclarer(this, new GlobalAggScheme()) .aggregate(inputFields, agg, functionFields) .chainEnd() .partitionPersist(spec, functionFields, new CombinerAggStateUpdater(agg), functionFields); } public TridentState persistentAggregate(StateFactory stateFactory, ReducerAggregator agg, Fields functionFields) { return persistentAggregate(new StateSpec(stateFactory), agg, functionFields); } public TridentState persistentAggregate(StateSpec spec, ReducerAggregator agg, Fields functionFields) { return persistentAggregate(spec, null, agg, functionFields); } public TridentState persistentAggregate(StateFactory stateFactory, Fields inputFields, ReducerAggregator agg, Fields functionFields) { return persistentAggregate(new StateSpec(stateFactory), inputFields, agg, functionFields); } public TridentState persistentAggregate(StateSpec spec, Fields inputFields, ReducerAggregator agg, Fields functionFields) { projectionValidation(inputFields); return global().partitionPersist(spec, inputFields, new ReducerAggStateUpdater(agg), functionFields); }trident的aggregators主要分为三类，分别是partitionAggregate、aggregate、persistentAggregate；aggregator操作会改变输出partitionAggregate其作用的粒度为每个partition，而非整个batchaggregrate操作作用的粒度为batch，对每个batch，它先使用global操作将该batch的tuple从所有partition合并到一个partition，最后再对batch进行aggregation操作；这里提供了三类参数，分别是Aggregator、CombinerAggregator、ReducerAggregator；调用stream.aggregrate方法时，相当于一次global aggregation，此时使用Aggregator或ReducerAggregator时，stream会先将tuple划分到一个partition，然后再进行aggregate操作；而使用CombinerAggregator时，trident会进行优化，先对每个partition进行局部的aggregate操作，然后再划分到一个partition，最后再进行aggregate操作，因而相对Aggregator或ReducerAggregator可以节省网络传输耗时persistentAggregate操作会对stream上所有batch的tuple进行aggretation，然后将结果存储在state中Aggregatorstorm-core-1.2.2-sources.jar!/org/apache/storm/trident/operation/Aggregator.javapublic interface Aggregator<T> extends Operation { T init(Object batchId, TridentCollector collector); void aggregate(T val, TridentTuple tuple, TridentCollector collector); void complete(T val, TridentCollector collector);}Aggregator首先会调用init进行初始化，然后通过参数传递给aggregate以及complete方法对于batch partition中的每个tuple执行一次aggregate；当batch partition中的tuple执行完aggregate之后执行complete方法假设自定义Aggregator为累加操作，那么对于[4]、[7]、[8]这批tuple，init为0，对于[4]，val=0，0+4=4；对于[7]，val=4，4+7=11；对于[8]，val=11，11+8=19；然后batch结束，val=19，此时执行complete，可以使用collector发射数据CombinerAggregatorstorm-core-1.2.2-sources.jar!/org/apache/storm/trident/operation/CombinerAggregator.javapublic interface CombinerAggregator<T> extends Serializable { T init(TridentTuple tuple); T combine(T val1, T val2); T zero();}CombinerAggregator每收到一个tuple，就调用init获取当前tuple的值，调用combine操作使用前一个combine的结果(没有的话取zero的值)与init取得的值进行新的combine操作，如果该partition中没有tuple，则返回zero方法的值假设combine为累加操作，zero返回0，那么对于[4]、[7]、[8]这批tuple，init值分别是4、7、8，对于[4]，没有前一个combine结果，于是val1=0，val2=4，combine结果为4；对于[7]，val1=4，val2=7，combine结果为11；对于[8]，val1为11，val2为8，combine结果为19CombinerAggregator操作的网络开销相对较低，因此性能比其他两类aggratator好ReducerAggregatorstorm-core-1.2.2-sources.jar!/org/apache/storm/trident/operation/ReducerAggregator.javapublic interface ReducerAggregator<T> extends Serializable { T init(); T reduce(T curr, TridentTuple tuple);}ReducerAggregator在对一批tuple进行计算时，先调用一次init获取初始值，然后再执行reduce操作，curr值为前一次reduce操作的值，没有的话，就是init值假设reduce为累加操作，init返回0，那么对于[4]、[7]、[8]这批tuple，对于[4]，init为0，然后curr=0，先是0+4=4；对于[7]，curr为4，就是4+7=11；对于[8]，curr为11，最后就是11+8=19topology stream operationsjoinstorm-core-1.2.2-sources.jar!/org/apache/storm/trident/TridentTopology.java public Stream join(Stream s1, Fields joinFields1, Stream s2, Fields joinFields2, Fields outFields) { return join(Arrays.asList(s1, s2), Arrays.asList(joinFields1, joinFields2), outFields); } public Stream join(List<Stream> streams, List<Fields> joinFields, Fields outFields) { return join(streams, joinFields, outFields, JoinType.INNER); } public Stream join(Stream s1, Fields joinFields1, Stream s2, Fields joinFields2, Fields outFields, JoinType type) { return join(Arrays.asList(s1, s2), Arrays.asList(joinFields1, joinFields2), outFields, type); } public Stream join(List<Stream> streams, List<Fields> joinFields, Fields outFields, JoinType type) { return join(streams, joinFields, outFields, repeat(streams.size(), type)); } public Stream join(Stream s1, Fields joinFields1, Stream s2, Fields joinFields2, Fields outFields, List<JoinType> mixed) { return join(Arrays.asList(s1, s2), Arrays.asList(joinFields1, joinFields2), outFields, mixed); } public Stream join(List<Stream> streams, List<Fields> joinFields, Fields outFields, List<JoinType> mixed) { return join(streams, joinFields, outFields, mixed, JoinOutFieldsMode.COMPACT); } public Stream join(Stream s1, Fields joinFields1, Stream s2, Fields joinFields2, Fields outFields, JoinOutFieldsMode mode) { return join(Arrays.asList(s1, s2), Arrays.asList(joinFields1, joinFields2), outFields, mode); } public Stream join(List<Stream> streams, List<Fields> joinFields, Fields outFields, JoinOutFieldsMode mode) { return join(streams, joinFields, outFields, JoinType.INNER, mode); } public Stream join(Stream s1, Fields joinFields1, Stream s2, Fields joinFields2, Fields outFields, JoinType type, JoinOutFieldsMode mode) { return join(Arrays.asList(s1, s2), Arrays.asList(joinFields1, joinFields2), outFields, type, mode); } public Stream join(List<Stream> streams, List<Fields> joinFields, Fields outFields, JoinType type, JoinOutFieldsMode mode) { return join(streams, joinFields, outFields, repeat(streams.size(), type), mode); } public Stream join(Stream s1, Fields joinFields1, Stream s2, Fields joinFields2, Fields outFields, List<JoinType> mixed, JoinOutFieldsMode mode) { return join(Arrays.asList(s1, s2), Arrays.asList(joinFields1, joinFields2), outFields, mixed, mode); } public Stream join(List<Stream> streams, List<Fields> joinFields, Fields outFields, List<JoinType> mixed, JoinOutFieldsMode mode) { switch (mode) { case COMPACT: return multiReduce(strippedInputFields(streams, joinFields), groupedStreams(streams, joinFields), new JoinerMultiReducer(mixed, joinFields.get(0).size(), strippedInputFields(streams, joinFields)), outFields); case PRESERVE: return multiReduce(strippedInputFields(streams, joinFields), groupedStreams(streams, joinFields), new PreservingFieldsOrderJoinerMultiReducer(mixed, joinFields.get(0).size(), getAllOutputFields(streams), joinFields, strippedInputFields(streams, joinFields)), outFields); default: throw new IllegalArgumentException(“Unsupported out-fields mode: " + mode); } }可以看到join最后调用了multiReduce，对于COMPACT类型使用的GroupedMultiReducer是JoinerMultiReducer，对于PRESERVE类型使用的GroupedMultiReducer是PreservingFieldsOrderJoinerMultiReducermergestorm-core-1.2.2-sources.jar!/org/apache/storm/trident/TridentTopology.java public Stream merge(Fields outputFields, Stream… streams) { return merge(outputFields, Arrays.asList(streams)); } public Stream merge(Stream… streams) { return merge(Arrays.asList(streams)); } public Stream merge(List<Stream> streams) { return merge(streams.get(0).getOutputFields(), streams); } public Stream merge(Fields outputFields, List<Stream> streams) { return multiReduce(streams, new IdentityMultiReducer(), outputFields); }可以看到merge最后是调用了multiReduce，使用的MultiReducer是IdentityMultiReducermultiReducestorm-core-1.2.2-sources.jar!/org/apache/storm/trident/TridentTopology.java public Stream multiReduce(Stream s1, Stream s2, MultiReducer function, Fields outputFields) { return multiReduce(Arrays.asList(s1, s2), function, outputFields); } public Stream multiReduce(Fields inputFields1, Stream s1, Fields inputFields2, Stream s2, MultiReducer function, Fields outputFields) { return multiReduce(Arrays.asList(inputFields1, inputFields2), Arrays.asList(s1, s2), function, outputFields); } public Stream multiReduce(GroupedStream s1, GroupedStream s2, GroupedMultiReducer function, Fields outputFields) { return multiReduce(Arrays.asList(s1, s2), function, outputFields); } public Stream multiReduce(Fields inputFields1, GroupedStream s1, Fields inputFields2, GroupedStream s2, GroupedMultiReducer function, Fields outputFields) { return multiReduce(Arrays.asList(inputFields1, inputFields2), Arrays.asList(s1, s2), function, outputFields); } public Stream multiReduce(List<Stream> streams, MultiReducer function, Fields outputFields) { return multiReduce(getAllOutputFields(streams), streams, function, outputFields); } public Stream multiReduce(List<GroupedStream> streams, GroupedMultiReducer function, Fields outputFields) { return multiReduce(getAllOutputFields(streams), streams, function, outputFields); } public Stream multiReduce(List<Fields> inputFields, List<GroupedStream> groupedStreams, GroupedMultiReducer function, Fields outputFields) { List<Fields> fullInputFields = new ArrayList<>(); List<Stream> streams = new ArrayList<>(); List<Fields> fullGroupFields = new ArrayList<>(); for(int i=0; i<groupedStreams.size(); i++) { GroupedStream gs = groupedStreams.get(i); Fields groupFields = gs.getGroupFields(); fullGroupFields.add(groupFields); streams.add(gs.toStream().partitionBy(groupFields)); fullInputFields.add(TridentUtils.fieldsUnion(groupFields, inputFields.get(i))); } return multiReduce(fullInputFields, streams, new GroupedMultiReducerExecutor(function, fullGroupFields, inputFields), outputFields); } public Stream multiReduce(List<Fields> inputFields, List<Stream> streams, MultiReducer function, Fields outputFields) { List<String> names = new ArrayList<>(); for(Stream s: streams) { if(s._name!=null) { names.add(s._name); } } Node n = new ProcessorNode(getUniqueStreamId(), Utils.join(names, “-”), outputFields, outputFields, new MultiReducerProcessor(inputFields, function)); return addSourcedNode(streams, n); }multiReduce方法有个MultiReducer参数，join与merge虽然都调用了multiReduce，但是他们传的MultiReducer值不一样小结trident的操作主要有几类，一类是基本的function、filter、projection操作；一类是repartitioning操作，主要是一些grouping；一类是aggregate操作，包括aggregate、partitionAggregate、persistentAggregate；一类是在topology对stream的join、merge操作function的话，若有emit字段会追加到原始的tuple上；filter用于过滤tuple；projection用于提取字段repartitioning操作有Grouping.local_or_shuffle、Grouping.shuffle、Grouping.all、GlobalGrouping、CustomStreamGrouping、IdentityGrouping、IndexHashGrouping等；partition操作可以理解为将输入的tuple分配到task上，也可以理解为是对stream进行groupingaggregate操作的话，普通的aggregate操作有3类接口，分别是Aggregator、CombinerAggregator、ReducerAggregator，其中Aggregator是最为通用的，它继承了Operation接口，而且在方法参数里头可以使用到collector，这是CombinerAggregator与ReducerAggregator所没有的；而CombinerAggregator与Aggregator及ReducerAggregator不同的是，调用stream.aggregrate方法时，trident会优先在partition进行局部聚合，然后再归一到一个partition做最后聚合，相对来说比较节省网络传输耗时，但是如果将CombinerAggregator与非CombinerAggregator的进行chaining的话，就享受不到这个优化；partitionAggregate主要是在partition维度上进行操作；而persistentAggregate则是在整个stream的维度上对所有batch的tuple进行操作，结果持久化在state上对于stream的join及merge操作，其最后都是依赖multiReduce来实现，只是传递的MultiReducer值不一样；join的话join的话需要字段来进行匹配(字段名可以不一样)，可以选择JoinType，是INNER还是OUTER，不过join是对于spout的small batch来进行join的；merge的话，就是纯粹的几个stream进行tuple的归总。docTrident API OverviewTrident API 综述JStorm Trident 入门精华一页纸Everything You Need to Know about Apache Stormbatch and partition - differences ...

聊聊storm trident的state

序本文主要研究一下storm trident的stateStateTypestorm-2.0.0/storm-client/src/jvm/org/apache/storm/trident/state/StateType.javapublic enum StateType { NON_TRANSACTIONAL, TRANSACTIONAL, OPAQUE}StateType有三种类型，NON_TRANSACTIONAL非事务性，TRANSACTIONAL事务性，OPAQUE不透明事务对应的spout也有三类，non-transactional、transactional以及opaque transactionalStatestorm-2.0.0/storm-client/src/jvm/org/apache/storm/trident/state/State.java/** * There’s 3 different kinds of state: * * 1. non-transactional: ignores commits, updates are permanent. no rollback. a cassandra incrementing state would be like this 2. * repeat-transactional: idempotent as long as all batches for a txid are identical 3. opaque-transactional: the most general kind of state. * updates are always done based on the previous version of the value if the current commit = latest stored commit Idempotent even if the * batch for a txid can change. * * repeat transactional is idempotent for transactional spouts opaque transactional is idempotent for opaque or transactional spouts * * Trident should log warnings when state is idempotent but updates will not be idempotent because of spout */// retrieving is encapsulated in Retrieval interfacepublic interface State { void beginCommit(Long txid); // can be null for things like partitionPersist occuring off a DRPC stream void commit(Long txid);}non-transactional，忽略commits，updates是持久的，没有rollback，cassandra的incrementing state属于这个类型；at-most或者at-least once语义repeat-transactional，简称transactional，要求不管是否replayed，同一个batch的txid始终相同，而且里头的tuple也不变，一个tuple只属于一个batch，各个batch之间不会重叠；对于state更新来说，replay遇到相同的txid，即可跳过；在数据库需要较少的state，但是容错性较差，保证exactly once语义opaque-transactional，简称opaque，是用的比较多的一类，它的容错性比transactional强，它不要求一个tuple始终在同一个batch/txid，也就是说允许一个tuple在这个batch处理失败，但是在其他batch中处理成功，但是它可以保证每个tuple只在某一个batch中exactly成功处理一次；OpaqueTridentKafkaSpout就是这个类型的实现，它能容忍kafka节点丢失的错误；对于state更新来说，replay遇到相同的txid，则需要基于prevValue使用当前的值覆盖掉；在数据库需要更多空间来存储state，但是容错性好，保证exactly once语义MapStatestorm-2.0.0/storm-client/src/jvm/org/apache/storm/trident/state/map/MapState.javapublic interface MapState<T> extends ReadOnlyMapState<T> { List<T> multiUpdate(List<List<Object>> keys, List<ValueUpdater> updaters); void multiPut(List<List<Object>> keys, List<T> vals);}MapState继承了ReadOnlyMapState接口，而ReadOnlyMapState则继承了State接口这里主要举MapState的几个实现类分析一下NonTransactionalMapstorm-2.0.0/storm-client/src/jvm/org/apache/storm/trident/state/map/NonTransactionalMap.javapublic class NonTransactionalMap<T> implements MapState<T> { IBackingMap<T> _backing; protected NonTransactionalMap(IBackingMap<T> backing) { _backing = backing; } public static <T> MapState<T> build(IBackingMap<T> backing) { return new NonTransactionalMap<T>(backing); } @Override public List<T> multiGet(List<List<Object>> keys) { return _backing.multiGet(keys); } @Override public List<T> multiUpdate(List<List<Object>> keys, List<ValueUpdater> updaters) { List<T> curr = _backing.multiGet(keys); List<T> ret = new ArrayList<T>(curr.size()); for (int i = 0; i < curr.size(); i++) { T currVal = curr.get(i); ValueUpdater<T> updater = updaters.get(i); ret.add(updater.update(currVal)); } _backing.multiPut(keys, ret); return ret; } @Override public void multiPut(List<List<Object>> keys, List<T> vals) { _backing.multiPut(keys, vals); } @Override public void beginCommit(Long txid) { } @Override public void commit(Long txid) { }}NonTransactionalMap包装了IBackingMap，beginCommit及commit方法都不做任何操作multiUpdate方法构造List<T> ret，然后使用IBackingMap的multiPut来实现TransactionalMapstorm-2.0.0/storm-client/src/jvm/org/apache/storm/trident/state/map/TransactionalMap.javapublic class TransactionalMap<T> implements MapState<T> { CachedBatchReadsMap<TransactionalValue> _backing; Long _currTx; protected TransactionalMap(IBackingMap<TransactionalValue> backing) { _backing = new CachedBatchReadsMap(backing); } public static <T> MapState<T> build(IBackingMap<TransactionalValue> backing) { return new TransactionalMap<T>(backing); } @Override public List<T> multiGet(List<List<Object>> keys) { List<CachedBatchReadsMap.RetVal<TransactionalValue>> vals = _backing.multiGet(keys); List<T> ret = new ArrayList<T>(vals.size()); for (CachedBatchReadsMap.RetVal<TransactionalValue> retval : vals) { TransactionalValue v = retval.val; if (v != null) { ret.add((T) v.getVal()); } else { ret.add(null); } } return ret; } @Override public List<T> multiUpdate(List<List<Object>> keys, List<ValueUpdater> updaters) { List<CachedBatchReadsMap.RetVal<TransactionalValue>> curr = _backing.multiGet(keys); List<TransactionalValue> newVals = new ArrayList<TransactionalValue>(curr.size()); List<List<Object>> newKeys = new ArrayList(); List<T> ret = new ArrayList<T>(); for (int i = 0; i < curr.size(); i++) { CachedBatchReadsMap.RetVal<TransactionalValue> retval = curr.get(i); TransactionalValue<T> val = retval.val; ValueUpdater<T> updater = updaters.get(i); TransactionalValue<T> newVal; boolean changed = false; if (val == null) { newVal = new TransactionalValue<T>(_currTx, updater.update(null)); changed = true; } else { if (_currTx != null && _currTx.equals(val.getTxid()) && !retval.cached) { newVal = val; } else { newVal = new TransactionalValue<T>(_currTx, updater.update(val.getVal())); changed = true; } } ret.add(newVal.getVal()); if (changed) { newVals.add(newVal); newKeys.add(keys.get(i)); } } if (!newKeys.isEmpty()) { _backing.multiPut(newKeys, newVals); } return ret; } @Override public void multiPut(List<List<Object>> keys, List<T> vals) { List<TransactionalValue> newVals = new ArrayList<TransactionalValue>(vals.size()); for (T val : vals) { newVals.add(new TransactionalValue<T>(_currTx, val)); } _backing.multiPut(keys, newVals); } @Override public void beginCommit(Long txid) { _currTx = txid; _backing.reset(); } @Override public void commit(Long txid) { _currTx = null; _backing.reset(); }}TransactionalMap采取的是CachedBatchReadsMap<TransactionalValue>，这里泛型使用的是TransactionalValue，beginCommit会设置当前的txid，重置_backing，commit的时候会重置txid，然后重置_backingmultiUpdate方法中判断如果_currTx已经存在值，且该值!retval.cached(即不是本次事务中multiPut进去的)，那么不会更新该值(skip the update)，使用newVal = valmultiPut方法构造批量的TransactionalValue，然后使用CachedBatchReadsMap.multiPut(List<List<Object>> keys, List<T> vals)方法，该方法更新值之后会更新到缓存OpaqueMapstorm-2.0.0/storm-client/src/jvm/org/apache/storm/trident/state/map/OpaqueMap.javapublic class OpaqueMap<T> implements MapState<T> { CachedBatchReadsMap<OpaqueValue> _backing; Long _currTx; protected OpaqueMap(IBackingMap<OpaqueValue> backing) { _backing = new CachedBatchReadsMap(backing); } public static <T> MapState<T> build(IBackingMap<OpaqueValue> backing) { return new OpaqueMap<T>(backing); } @Override public List<T> multiGet(List<List<Object>> keys) { List<CachedBatchReadsMap.RetVal<OpaqueValue>> curr = _backing.multiGet(keys); List<T> ret = new ArrayList<T>(curr.size()); for (CachedBatchReadsMap.RetVal<OpaqueValue> retval : curr) { OpaqueValue val = retval.val; if (val != null) { if (retval.cached) { ret.add((T) val.getCurr()); } else { ret.add((T) val.get(_currTx)); } } else { ret.add(null); } } return ret; } @Override public List<T> multiUpdate(List<List<Object>> keys, List<ValueUpdater> updaters) { List<CachedBatchReadsMap.RetVal<OpaqueValue>> curr = _backing.multiGet(keys); List<OpaqueValue> newVals = new ArrayList<OpaqueValue>(curr.size()); List<T> ret = new ArrayList<T>(); for (int i = 0; i < curr.size(); i++) { CachedBatchReadsMap.RetVal<OpaqueValue> retval = curr.get(i); OpaqueValue<T> val = retval.val; ValueUpdater<T> updater = updaters.get(i); T prev; if (val == null) { prev = null; } else { if (retval.cached) { prev = val.getCurr(); } else { prev = val.get(_currTx); } } T newVal = updater.update(prev); ret.add(newVal); OpaqueValue<T> newOpaqueVal; if (val == null) { newOpaqueVal = new OpaqueValue<T>(_currTx, newVal); } else { newOpaqueVal = val.update(_currTx, newVal); } newVals.add(newOpaqueVal); } _backing.multiPut(keys, newVals); return ret; } @Override public void multiPut(List<List<Object>> keys, List<T> vals) { List<ValueUpdater> updaters = new ArrayList<ValueUpdater>(vals.size()); for (T val : vals) { updaters.add(new ReplaceUpdater<T>(val)); } multiUpdate(keys, updaters); } @Override public void beginCommit(Long txid) { _currTx = txid; _backing.reset(); } @Override public void commit(Long txid) { _currTx = null; _backing.reset(); } static class ReplaceUpdater<T> implements ValueUpdater<T> { T _t; public ReplaceUpdater(T t) { _t = t; } @Override public T update(Object stored) { return _t; } }}OpaqueMap采取的是CachedBatchReadsMap<OpaqueValue>，这里泛型使用的是OpaqueValue，beginCommit会设置当前的txid，重置_backing，commit的时候会重置txid，然后重置_backing与TransactionalMap的不同，这里在multiPut的时候，使用的是ReplaceUpdater，然后调用multiUpdate强制覆盖multiUpdate方法与TransactionalMap的不同，它是基于prev值来进行update的，算出newVal小结trident严格按batch的顺序更新state，比如txid为3的batch必须在txid为2的batch处理完之后才能处理state分三种类型，分别是non-transactional、transactional、opaque transactional，对应的spout也是这三种类型non-transactional无法保证exactly once，它可能是at-least once或者at-most once；其state计算参考NonTransactionalMap，对于beginCommit及commit操作都无处理transactional类型能够保证exactly once，但是要求比较严格，要同一个batch的txid及tuple在replayed的时候仍然保持一致，因此容错性差一点，但是它的state计算相对简单，参考TransactionalMap，遇到同一个txid的值，skip掉即可opaque transactional类型也能够保证exactly once，它允许一个tuple处理失败之后，出现在其他batch中处理，因而容错性好，但是state计算要多存储prev值，参考OpaqueMap，遇到同一个txid的值，使用prev值跟当前值进行覆盖trident将保证exactly once的state的计算都封装好了，使用的时候，在persistentAggregate传入相应的StateFactory即可，支持多种StateType的factory可以选择使用StateType属性，通过传入不同的参数构造不同transactional的state；也可以通过实现StateFactory自定义实现state factory，另外也可以通过继承BaseQueryFunction来自定义stateQuery查询，自定义更新的话，可以继承BaseStateUpdater，然后通过partitionPersist传入docTrident TutorialTrident State ...

聊聊storm的CheckpointSpout

序本文主要研究一下storm的CheckpointSpoutTopologyBuilderstorm-2.0.0/storm-client/src/jvm/org/apache/storm/topology/TopologyBuilder.java public StormTopology createTopology() { Map<String, Bolt> boltSpecs = new HashMap<>(); Map<String, SpoutSpec> spoutSpecs = new HashMap<>(); maybeAddCheckpointSpout(); for (String boltId : _bolts.keySet()) { IRichBolt bolt = _bolts.get(boltId); bolt = maybeAddCheckpointTupleForwarder(bolt); ComponentCommon common = getComponentCommon(boltId, bolt); try { maybeAddCheckpointInputs(common); boltSpecs.put(boltId, new Bolt(ComponentObject.serialized_java(Utils.javaSerialize(bolt)), common)); } catch (RuntimeException wrapperCause) { if (wrapperCause.getCause() != null && NotSerializableException.class.equals(wrapperCause.getCause().getClass())) { throw new IllegalStateException( “Bolt ‘” + boltId + “’ contains a non-serializable field of type " + wrapperCause.getCause().getMessage() + “, " + “which was instantiated prior to topology creation. " + wrapperCause.getCause().getMessage() + " " + “should be instantiated within the prepare method of ‘” + boltId + " at the earliest.”, wrapperCause); } throw wrapperCause; } } for (String spoutId : _spouts.keySet()) { IRichSpout spout = _spouts.get(spoutId); ComponentCommon common = getComponentCommon(spoutId, spout); try { spoutSpecs.put(spoutId, new SpoutSpec(ComponentObject.serialized_java(Utils.javaSerialize(spout)), common)); } catch (RuntimeException wrapperCause) { if (wrapperCause.getCause() != null && NotSerializableException.class.equals(wrapperCause.getCause().getClass())) { throw new IllegalStateException( “Spout ‘” + spoutId + “’ contains a non-serializable field of type " + wrapperCause.getCause().getMessage() + “, " + “which was instantiated prior to topology creation. " + wrapperCause.getCause().getMessage() + " " + “should be instantiated within the prepare method of ‘” + spoutId + " at the earliest.”, wrapperCause); } throw wrapperCause; } } StormTopology stormTopology = new StormTopology(spoutSpecs, boltSpecs, new HashMap<>()); stormTopology.set_worker_hooks(_workerHooks); if (!_componentToSharedMemory.isEmpty()) { stormTopology.set_component_to_shared_memory(_componentToSharedMemory); stormTopology.set_shared_memory(_sharedMemory); } return Utils.addVersions(stormTopology); } /** * If the topology has at least one stateful bolt add a {@link CheckpointSpout} component to the topology. / private void maybeAddCheckpointSpout() { if (hasStatefulBolt) { setSpout(CHECKPOINT_COMPONENT_ID, new CheckpointSpout(), 1); } } private void maybeAddCheckpointInputs(ComponentCommon common) { if (hasStatefulBolt) { addCheckPointInputs(common); } } /* * If the topology has at least one stateful bolt all the non-stateful bolts are wrapped in {@link CheckpointTupleForwarder} so that the * checkpoint tuples can flow through the topology. / private IRichBolt maybeAddCheckpointTupleForwarder(IRichBolt bolt) { if (hasStatefulBolt && !(bolt instanceof StatefulBoltExecutor)) { bolt = new CheckpointTupleForwarder(bolt); } return bolt; } /* * For bolts that has incoming streams from spouts (the root bolts), add checkpoint stream from checkpoint spout to its input. For other * bolts, add checkpoint stream from the previous bolt to its input. / private void addCheckPointInputs(ComponentCommon component) { Set<GlobalStreamId> checkPointInputs = new HashSet<>(); for (GlobalStreamId inputStream : component.get_inputs().keySet()) { String sourceId = inputStream.get_componentId(); if (_spouts.containsKey(sourceId)) { checkPointInputs.add(new GlobalStreamId(CHECKPOINT_COMPONENT_ID, CHECKPOINT_STREAM_ID)); } else { checkPointInputs.add(new GlobalStreamId(sourceId, CHECKPOINT_STREAM_ID)); } } for (GlobalStreamId streamId : checkPointInputs) { component.put_to_inputs(streamId, Grouping.all(new NullStruct())); } }TopologyBuilder在createTopology的时候，会调用maybeAddCheckpointSpout，如果是hasStatefulBolt的话，则会自动创建并添加CheckpointSpout如果是hasStatefulBolt，bolt不是StatefulBoltExecutor类型，则会使用CheckpointTupleForwarder进行包装如果是hasStatefulBolt，会调用addCheckPointInputs，配置inputsCheckpointSpoutstorm-2.0.0/storm-client/src/jvm/org/apache/storm/spout/CheckpointSpout.java/* * Emits checkpoint tuples which is used to save the state of the {@link org.apache.storm.topology.IStatefulComponent} across the topology. * If a topology contains Stateful bolts, Checkpoint spouts are automatically added to the topology. There is only one Checkpoint task per * topology. Checkpoint spout stores its internal state in a {@link KeyValueState}. * * @see CheckPointState /public class CheckpointSpout extends BaseRichSpout { public static final String CHECKPOINT_STREAM_ID = “$checkpoint”; public static final String CHECKPOINT_COMPONENT_ID = “$checkpointspout”; public static final String CHECKPOINT_FIELD_TXID = “txid”; public static final String CHECKPOINT_FIELD_ACTION = “action”; private static final Logger LOG = LoggerFactory.getLogger(CheckpointSpout.class); private static final String TX_STATE_KEY = “__state”; private TopologyContext context; private SpoutOutputCollector collector; private long lastCheckpointTs; private int checkpointInterval; private int sleepInterval; private boolean recoveryStepInProgress; private boolean checkpointStepInProgress; private boolean recovering; private KeyValueState<String, CheckPointState> checkpointState; private CheckPointState curTxState; public static boolean isCheckpoint(Tuple input) { return CHECKPOINT_STREAM_ID.equals(input.getSourceStreamId()); } @Override public void open(Map<String, Object> conf, TopologyContext context, SpoutOutputCollector collector) { open(context, collector, loadCheckpointInterval(conf), loadCheckpointState(conf, context)); } // package access for unit test void open(TopologyContext context, SpoutOutputCollector collector, int checkpointInterval, KeyValueState<String, CheckPointState> checkpointState) { this.context = context; this.collector = collector; this.checkpointInterval = checkpointInterval; this.sleepInterval = checkpointInterval / 10; this.checkpointState = checkpointState; this.curTxState = checkpointState.get(TX_STATE_KEY); lastCheckpointTs = 0; recoveryStepInProgress = false; checkpointStepInProgress = false; recovering = true; } @Override public void nextTuple() { if (shouldRecover()) { handleRecovery(); startProgress(); } else if (shouldCheckpoint()) { doCheckpoint(); startProgress(); } else { Utils.sleep(sleepInterval); } } @Override public void ack(Object msgId) { LOG.debug(“Got ack with txid {}, current txState {}”, msgId, curTxState); if (curTxState.getTxid() == ((Number) msgId).longValue()) { if (recovering) { handleRecoveryAck(); } else { handleCheckpointAck(); } } else { LOG.warn(“Ack msgid {}, txState.txid {} mismatch”, msgId, curTxState.getTxid()); } resetProgress(); } @Override public void fail(Object msgId) { LOG.debug(“Got fail with msgid {}”, msgId); if (!recovering) { LOG.debug(“Checkpoint failed, will trigger recovery”); recovering = true; } resetProgress(); } @Override public void declareOutputFields(OutputFieldsDeclarer declarer) { declarer.declareStream(CHECKPOINT_STREAM_ID, new Fields(CHECKPOINT_FIELD_TXID, CHECKPOINT_FIELD_ACTION)); } private int loadCheckpointInterval(Map<String, Object> topoConf) { int interval = 0; if (topoConf.containsKey(Config.TOPOLOGY_STATE_CHECKPOINT_INTERVAL)) { interval = ((Number) topoConf.get(Config.TOPOLOGY_STATE_CHECKPOINT_INTERVAL)).intValue(); } // ensure checkpoint interval is not less than a sane low value. interval = Math.max(100, interval); LOG.info(“Checkpoint interval is {} millis”, interval); return interval; } private boolean shouldCheckpoint() { return !recovering && !checkpointStepInProgress && (curTxState.getState() != COMMITTED || checkpointIntervalElapsed()); } private boolean checkpointIntervalElapsed() { return (System.currentTimeMillis() - lastCheckpointTs) > checkpointInterval; } private void doCheckpoint() { LOG.debug(“In checkpoint”); if (curTxState.getState() == COMMITTED) { saveTxState(curTxState.nextState(false)); lastCheckpointTs = System.currentTimeMillis(); } Action action = curTxState.nextAction(false); emit(curTxState.getTxid(), action); } private void emit(long txid, Action action) { LOG.debug(“Current state {}, emitting txid {}, action {}”, curTxState, txid, action); collector.emit(CHECKPOINT_STREAM_ID, new Values(txid, action), txid); } //……}CheckpointSpout从Config.TOPOLOGY_STATE_CHECKPOINT_INTERVAL(topology.state.checkpoint.interval.ms)读取checkpoint的时间间隔，defaults.yaml中默认是1000，如果没有指定，则使用100，最低值为100nextTuple方法首先判断shouldRecover，如果需要恢复则调用handleRecovery进行恢复，然后startProgress；如果需要checkpoint则进行checkpoint，否则sleepInterval再进行下次判断如果不需要recover，则调用shouldCheckpoint方法判断是否需要进行checkpoint，如果当前状态不是COMMITTED或者当前时间距离上次checkpoint的时间超过了checkpointInterval，则进行doCheckpoint操作，往CHECKPOINT_STREAM_ID发送下一步的actionCheckpointSpout在收到ack之后会进行saveTxState操作，调用checkpointState.commit提交整个checkpoint，然后调用resetProgress重置状态如果是fail的ack，则调用resetProgress重置状态CheckPointStatestorm-2.0.0/storm-client/src/jvm/org/apache/storm/spout/CheckPointState.java /* * Get the next state based on this checkpoint state. * * @param recovering if in recovering phase * @return the next checkpoint state based on this state. / public CheckPointState nextState(boolean recovering) { CheckPointState nextState; switch (state) { case PREPARING: nextState = recovering ? new CheckPointState(txid - 1, COMMITTED) : new CheckPointState(txid, COMMITTING); break; case COMMITTING: nextState = new CheckPointState(txid, COMMITTED); break; case COMMITTED: nextState = recovering ? this : new CheckPointState(txid + 1, PREPARING); break; default: throw new IllegalStateException(“Unknown state " + state); } return nextState; } /* * Get the next action to perform based on this checkpoint state. * * @param recovering if in recovering phase * @return the next action to perform based on this state / public Action nextAction(boolean recovering) { Action action; switch (state) { case PREPARING: action = recovering ? Action.ROLLBACK : Action.PREPARE; break; case COMMITTING: action = Action.COMMIT; break; case COMMITTED: action = recovering ? Action.INITSTATE : Action.PREPARE; break; default: throw new IllegalStateException(“Unknown state " + state); } return action; }CheckPointState提供了nextState方法进行状态的切换，nextAction方法则提供了对应state的的下个动作BaseStatefulBoltExecutorstorm-2.0.0/storm-client/src/jvm/org/apache/storm/topology/BaseStatefulBoltExecutor.java public void execute(Tuple input) { if (CheckpointSpout.isCheckpoint(input)) { processCheckpoint(input); } else { handleTuple(input); } } /* * Invokes handleCheckpoint once checkpoint tuple is received on all input checkpoint streams to this component. / private void processCheckpoint(Tuple input) { CheckPointState.Action action = (CheckPointState.Action) input.getValueByField(CHECKPOINT_FIELD_ACTION); long txid = input.getLongByField(CHECKPOINT_FIELD_TXID); if (shouldProcessTransaction(action, txid)) { LOG.debug(“Processing action {}, txid {}”, action, txid); try { if (txid >= lastTxid) { handleCheckpoint(input, action, txid); if (action == ROLLBACK) { lastTxid = txid - 1; } else { lastTxid = txid; } } else { LOG.debug(“Ignoring old transaction. Action {}, txid {}”, action, txid); collector.ack(input); } } catch (Throwable th) { LOG.error(“Got error while processing checkpoint tuple”, th); collector.fail(input); collector.reportError(th); } } else { LOG.debug(“Waiting for action {}, txid {} from all input tasks. checkPointInputTaskCount {}, " + “transactionRequestCount {}”, action, txid, checkPointInputTaskCount, transactionRequestCount); collector.ack(input); } } /* * Checks if check points have been received from all tasks across all input streams to this component / private boolean shouldProcessTransaction(CheckPointState.Action action, long txid) { TransactionRequest request = new TransactionRequest(action, txid); Integer count; if ((count = transactionRequestCount.get(request)) == null) { transactionRequestCount.put(request, 1); count = 1; } else { transactionRequestCount.put(request, ++count); } if (count == checkPointInputTaskCount) { transactionRequestCount.remove(request); return true; } return false; } protected void declareCheckpointStream(OutputFieldsDeclarer declarer) { declarer.declareStream(CHECKPOINT_STREAM_ID, new Fields(CHECKPOINT_FIELD_TXID, CHECKPOINT_FIELD_ACTION)); }BaseStatefulBoltExecutor的execute方法首先通过CheckpointSpout.isCheckpoint(input)判断是否是CheckpointSpout发来的tuple，如果是则执行processCheckpointprocessCheckpoint首先调用shouldProcessTransaction判断所有输入流的task是否都有给它发送checkpint tuple来决定是否往下处理如果txid大于lastTxid，则调用handleCheckpoint方法，该方法由子类实现StatefulBoltExecutor.handleCheckpointstorm-2.0.0/storm-client/src/jvm/org/apache/storm/topology/StatefulBoltExecutor.javapublic class StatefulBoltExecutor<T extends State> extends BaseStatefulBoltExecutor { //…… protected void handleCheckpoint(Tuple checkpointTuple, Action action, long txid) { LOG.debug(“handleCheckPoint with tuple {}, action {}, txid {}”, checkpointTuple, action, txid); if (action == PREPARE) { if (boltInitialized) { bolt.prePrepare(txid); state.prepareCommit(txid); preparedTuples.addAll(collector.ackedTuples()); } else { / * May be the task restarted in the middle and the state needs be initialized. * Fail fast and trigger recovery. / LOG.debug(“Failing checkpointTuple, PREPARE received when bolt state is not initialized.”); collector.fail(checkpointTuple); return; } } else if (action == COMMIT) { bolt.preCommit(txid); state.commit(txid); ack(preparedTuples); } else if (action == ROLLBACK) { bolt.preRollback(); state.rollback(); fail(preparedTuples); fail(collector.ackedTuples()); } else if (action == INITSTATE) { if (!boltInitialized) { bolt.initState((T) state); boltInitialized = true; LOG.debug(”{} pending tuples to process”, pendingTuples.size()); for (Tuple tuple : pendingTuples) { doExecute(tuple); } pendingTuples.clear(); } else { / * If a worker crashes, the states of all workers are rolled back and an initState message is sent across * the topology so that crashed workers can initialize their state. * The bolts that have their state already initialized need not be re-initialized. / LOG.debug(“Bolt state is already initialized, ignoring tuple {}, action {}, txid {}”, checkpointTuple, action, txid); } } collector.emit(CheckpointSpout.CHECKPOINT_STREAM_ID, checkpointTuple, new Values(txid, action)); collector.delegate.ack(checkpointTuple); } //……}StatefulBoltExecutor继承了BaseStatefulBoltExecutor，实现了handleCheckpoint方法该方法根据不同的action进行相应的处理，PREPARE的话，调用bolt的prePrepare，对state调用prepareCommit；COMMIT的话则调用bolt的preCommit，对state调用commit；ROLLBACK的话，调用bolt的preRollback，对state调用rollback；对于INITSTATE，如果bolt未初始化，则调用bolt的initState根据action执行完之后，继续流转checkpoint tuple，然后调用collector.delegate.ack(checkpointTuple)进行ackCheckpointTupleForwarder.handleCheckpointstorm-2.0.0/storm-client/src/jvm/org/apache/storm/topology/CheckpointTupleForwarder.java/* * Wraps {@link IRichBolt} and forwards checkpoint tuples in a stateful topology. * * When a storm topology contains one or more {@link IStatefulBolt} all non-stateful bolts are wrapped in {@link CheckpointTupleForwarder} * so that the checkpoint tuples can flow through the entire topology DAG. * /public class CheckpointTupleForwarder extends BaseStatefulBoltExecutor { //…… /* * Forwards the checkpoint tuple downstream. * * @param checkpointTuple the checkpoint tuple * @param action the action (prepare, commit, rollback or initstate) * @param txid the transaction id. */ protected void handleCheckpoint(Tuple checkpointTuple, Action action, long txid) { collector.emit(CHECKPOINT_STREAM_ID, checkpointTuple, new Values(txid, action)); collector.ack(checkpointTuple); } //……}CheckpointTupleForwarder用于包装non-stateful bolts，使得checkpoint tuples得以在整个topology DAG中顺利流转小结如果topology有IStatefulBolt的话(IStatefulBolt为bolt提供了存取state的抽象，通过checkpiont机制持久化state并利用ack机制提供at-least once语义)，TopologyBuilder会自动添加CheckpointSpout，对于bolt不是StatefulBoltExecutor类型，则会使用CheckpointTupleForwarder进行包装，这样使得checkpint tuple贯穿整个topology的DAGCheckpointSpout在nextTuple方法先判断是否需要recover，在判断是否需要进行checkpoint，都不是的话则sleep一段时间，sleepInterval为checkpointInterval/10，而checkpointInterval最小为100，从Config.TOPOLOGY_STATE_CHECKPOINT_INTERVAL配置读取，默认是1000；注意该值并不是意味着每隔checkpointInterval就进行checkpoint检测，也就是说不是fixedRate效果而是fixedDelay的效果，即如果当前checkpoint还没有结束，是不会再重复进行checkpoint检测的recover及checkpoint都会往CHECKPOINT_STREAM_ID发送tuple；BaseStatefulBoltExecutor则在execute方法封装了对checkpoint tuple的处理，非checkpint tuple则通过抽象方法handleTuple由子类去实现；具体的handleCheckpoint方法由子类实现，BaseStatefulBoltExecutor只是对其进行前提判断，要求收到所有输入流的task发来的checkpoint tuple，且txid >= lastTxid才可以执行handleCheckpoint操作StatefulBoltExecutor继承了BaseStatefulBoltExecutor，实现了handleCheckpoint方法，对PREPARE、COMMIT、ROLLBACK、INITSTATE这几个action(类似three phase commit protocol)进行相应处理，然后继续流转checkpoint tuple，并进行ackCheckpointSpout在发送checkpoint tuple的时候，使用txid作为msgId来发送可靠的tuple，在所有checkpoint tuple在整个topology的DAG都被ack之后，会收到ack，然后调用checkpointState.commit提交整个checkpoint；如果是fail的话则重置相关状态；一般情况下Config.TOPOLOGY_STATE_CHECKPOINT_INTERVAL(topology.state.checkpoint.interval.ms，默认1000，即1秒)值小于Config.TOPOLOGY_MESSAGE_TIMEOUT_SECS(topology.message.timeout.secs，默认30秒)；如果checkpointInterval设置得太大，中间假设worker crash了恢复后的state就不太实时，这样就失去了checkpoint的意义了。docStorm State ManagementStorm 状态管理What is a checkpoint in databases? ...

聊聊storm client的netty buffer watermark

序本文主要研究一下storm client的netty buffer watermarkConfigstorm-2.0.0/storm-client/src/jvm/org/apache/storm/Config.java /** * Netty based messaging: The netty write buffer high watermark in bytes. * * If the number of bytes queued in the netty’s write buffer exceeds this value, the netty {@code Channel.isWritable()} will start to * return {@code false}. The client will wait until the value falls below the {@linkplain #STORM_MESSAGING_NETTY_BUFFER_LOW_WATERMARK * low water mark}. * / @isInteger @isPositiveNumber public static final String STORM_MESSAGING_NETTY_BUFFER_HIGH_WATERMARK = “storm.messaging.netty.buffer.high.watermark”; /* * Netty based messaging: The netty write buffer low watermark in bytes. * * Once the number of bytes queued in the write buffer exceeded the {@linkplain #STORM_MESSAGING_NETTY_BUFFER_HIGH_WATERMARK high water * mark} and then dropped down below this value, the netty {@code Channel.isWritable()} will start to return true. * / @isInteger @isPositiveNumber public static final String STORM_MESSAGING_NETTY_BUFFER_LOW_WATERMARK = “storm.messaging.netty.buffer.low.watermark”;这里有两个相关的参数，分别是storm.messaging.netty.buffer.high.watermark以及storm.messaging.netty.buffer.low.watermarkdefaults.yaml# The netty write buffer high watermark in bytes.# If the number of bytes queued in the netty’s write buffer exceeds this value, the netty client will block# until the value falls below the low water mark.storm.messaging.netty.buffer.high.watermark: 16777216 # 16 MB# The netty write buffer low watermark in bytes.# Once the number of bytes queued in the write buffer exceeded the high water mark and then# dropped down below this value, any blocked clients will unblock and start processing further messages.storm.messaging.netty.buffer.low.watermark: 8388608 # 8 MB在defaults.yaml文件中，low.watermark默认大小为8388608，即8M；high.watermark默认大小为16777216，即16MClientstorm-2.0.0/storm-client/src/jvm/org/apache/storm/messaging/netty/Client.java Client(Map<String, Object> topoConf, AtomicBoolean[] remoteBpStatus, EventLoopGroup eventLoopGroup, HashedWheelTimer scheduler, String host, int port) { this.topoConf = topoConf; closing = false; this.scheduler = scheduler; int bufferSize = ObjectReader.getInt(topoConf.get(Config.STORM_MESSAGING_NETTY_BUFFER_SIZE)); int lowWatermark = ObjectReader.getInt(topoConf.get(Config.STORM_MESSAGING_NETTY_BUFFER_LOW_WATERMARK)); int highWatermark = ObjectReader.getInt(topoConf.get(Config.STORM_MESSAGING_NETTY_BUFFER_HIGH_WATERMARK)); // if SASL authentication is disabled, saslChannelReady is initialized as true; otherwise false saslChannelReady.set(!ObjectReader.getBoolean(topoConf.get(Config.STORM_MESSAGING_NETTY_AUTHENTICATION), false)); LOG.info(“Creating Netty Client, connecting to {}:{}, bufferSize: {}, lowWatermark: {}, highWatermark: {}”, host, port, bufferSize, lowWatermark, highWatermark); int minWaitMs = ObjectReader.getInt(topoConf.get(Config.STORM_MESSAGING_NETTY_MIN_SLEEP_MS)); int maxWaitMs = ObjectReader.getInt(topoConf.get(Config.STORM_MESSAGING_NETTY_MAX_SLEEP_MS)); retryPolicy = new StormBoundedExponentialBackoffRetry(minWaitMs, maxWaitMs, -1); // Initiate connection to remote destination this.eventLoopGroup = eventLoopGroup; // Initiate connection to remote destination bootstrap = new Bootstrap() .group(this.eventLoopGroup) .channel(NioSocketChannel.class) .option(ChannelOption.TCP_NODELAY, true) .option(ChannelOption.SO_SNDBUF, bufferSize) .option(ChannelOption.SO_KEEPALIVE, true) .option(ChannelOption.WRITE_BUFFER_WATER_MARK, new WriteBufferWaterMark(lowWatermark, highWatermark)) .option(ChannelOption.ALLOCATOR, PooledByteBufAllocator.DEFAULT) .handler(new StormClientPipelineFactory(this, remoteBpStatus, topoConf)); dstAddress = new InetSocketAddress(host, port); dstAddressPrefixedName = prefixedName(dstAddress); launchChannelAliveThread(); scheduleConnect(NO_DELAY_MS); int messageBatchSize = ObjectReader.getInt(topoConf.get(Config.STORM_NETTY_MESSAGE_BATCH_SIZE), 262144); batcher = new MessageBuffer(messageBatchSize); String clazz = (String) topoConf.get(Config.TOPOLOGY_BACKPRESSURE_WAIT_STRATEGY); if (clazz == null) { waitStrategy = new WaitStrategyProgressive(); } else { waitStrategy = ReflectionUtils.newInstance(clazz); } waitStrategy.prepare(topoConf, WAIT_SITUATION.BACK_PRESSURE_WAIT); }这里根据lowWatermark及highWatermark创建了WriteBufferWaterMark对象，设置到ChannelOption.WRITE_BUFFER_WATER_MARKWriteBufferWaterMarknetty-all-4.1.25.Final-sources.jar!/io/netty/channel/WriteBufferWaterMark.java/* * WriteBufferWaterMark is used to set low water mark and high water mark for the write buffer. * * If the number of bytes queued in the write buffer exceeds the * {@linkplain #high high water mark}, {@link Channel#isWritable()} * will start to return {@code false}. * * If the number of bytes queued in the write buffer exceeds the * {@linkplain #high high water mark} and then * dropped down below the {@linkplain #low low water mark}, * {@link Channel#isWritable()} will start to return * {@code true} again. /public final class WriteBufferWaterMark { private static final int DEFAULT_LOW_WATER_MARK = 32 * 1024; private static final int DEFAULT_HIGH_WATER_MARK = 64 * 1024; public static final WriteBufferWaterMark DEFAULT = new WriteBufferWaterMark(DEFAULT_LOW_WATER_MARK, DEFAULT_HIGH_WATER_MARK, false); private final int low; private final int high; /* * Create a new instance. * * @param low low water mark for write buffer. * @param high high water mark for write buffer / public WriteBufferWaterMark(int low, int high) { this(low, high, true); } /* * This constructor is needed to keep backward-compatibility. / WriteBufferWaterMark(int low, int high, boolean validate) { if (validate) { if (low < 0) { throw new IllegalArgumentException(“write buffer’s low water mark must be >= 0”); } if (high < low) { throw new IllegalArgumentException( “write buffer’s high water mark cannot be less than " + " low water mark (” + low + “): " + high); } } this.low = low; this.high = high; } /* * Returns the low water mark for the write buffer. / public int low() { return low; } /* * Returns the high water mark for the write buffer. / public int high() { return high; } @Override public String toString() { StringBuilder builder = new StringBuilder(55) .append(“WriteBufferWaterMark(low: “) .append(low) .append(”, high: “) .append(high) .append(”)”); return builder.toString(); }}从注释里头可以看到这两个参数控制的是Channel.isWritable()方法ChannelOutboundBuffer.bytesBeforeWritablenetty-all-4.1.25.Final-sources.jar!/io/netty/channel/ChannelOutboundBuffer.java private static final AtomicIntegerFieldUpdater<ChannelOutboundBuffer> UNWRITABLE_UPDATER = AtomicIntegerFieldUpdater.newUpdater(ChannelOutboundBuffer.class, “unwritable”); private volatile int unwritable; /* * Returns {@code true} if and only if {@linkplain #totalPendingWriteBytes() the total number of pending bytes} did * not exceed the write watermark of the {@link Channel} and * no {@linkplain #setUserDefinedWritability(int, boolean) user-defined writability flag} has been set to * {@code false}. / public boolean isWritable() { return unwritable == 0; } /* * Get how many bytes must be drained from the underlying buffer until {@link #isWritable()} returns {@code true}. * This quantity will always be non-negative. If {@link #isWritable()} is {@code true} then 0. / public long bytesBeforeWritable() { long bytes = totalPendingSize - channel.config().getWriteBufferLowWaterMark(); // If bytes is negative we know we are writable, but if bytes is non-negative we have to check writability. // Note that totalPendingSize and isWritable() use different volatile variables that are not synchronized // together. totalPendingSize will be updated before isWritable(). if (bytes > 0) { return isWritable() ? 0 : bytes; } return 0; } /* * Decrement the pending bytes which will be written at some point. * This method is thread-safe! / void decrementPendingOutboundBytes(long size) { decrementPendingOutboundBytes(size, true, true); } private void decrementPendingOutboundBytes(long size, boolean invokeLater, boolean notifyWritability) { if (size == 0) { return; } long newWriteBufferSize = TOTAL_PENDING_SIZE_UPDATER.addAndGet(this, -size); if (notifyWritability && newWriteBufferSize < channel.config().getWriteBufferLowWaterMark()) { setWritable(invokeLater); } } private void setWritable(boolean invokeLater) { for (;;) { final int oldValue = unwritable; final int newValue = oldValue & ~1; if (UNWRITABLE_UPDATER.compareAndSet(this, oldValue, newValue)) { if (oldValue != 0 && newValue == 0) { fireChannelWritabilityChanged(invokeLater); } break; } } } private void fireChannelWritabilityChanged(boolean invokeLater) { final ChannelPipeline pipeline = channel.pipeline(); if (invokeLater) { Runnable task = fireChannelWritabilityChangedTask; if (task == null) { fireChannelWritabilityChangedTask = task = new Runnable() { @Override public void run() { pipeline.fireChannelWritabilityChanged(); } }; } channel.eventLoop().execute(task); } else { pipeline.fireChannelWritabilityChanged(); } }bytesBeforeWritable方法先判断totalPendingSize是否大于lowWatermark，如果不大于则返回0，如果大于且isWritable返回true则返回0，否则返回差值decrementPendingOutboundBytes方法会判断，如果notifyWritability为true且newWriteBufferSize < channel.config().getWriteBufferLowWaterMark()，则调用setWritablesetWritable(invokeLater)setWritable会判断是否有变更，有的话，触发fireChannelWritabilityChanged进行通知ChannelOutboundBuffer.bytesBeforeUnwritablenetty-all-4.1.25.Final-sources.jar!/io/netty/channel/ChannelOutboundBuffer.java private static final AtomicIntegerFieldUpdater<ChannelOutboundBuffer> UNWRITABLE_UPDATER = AtomicIntegerFieldUpdater.newUpdater(ChannelOutboundBuffer.class, “unwritable”); private volatile int unwritable; /* * Returns {@code true} if and only if {@linkplain #totalPendingWriteBytes() the total number of pending bytes} did * not exceed the write watermark of the {@link Channel} and * no {@linkplain #setUserDefinedWritability(int, boolean) user-defined writability flag} has been set to * {@code false}. / public boolean isWritable() { return unwritable == 0; } /* * Get how many bytes can be written until {@link #isWritable()} returns {@code false}. * This quantity will always be non-negative. If {@link #isWritable()} is {@code false} then 0. / public long bytesBeforeUnwritable() { long bytes = channel.config().getWriteBufferHighWaterMark() - totalPendingSize; // If bytes is negative we know we are not writable, but if bytes is non-negative we have to check writability. // Note that totalPendingSize and isWritable() use different volatile variables that are not synchronized // together. totalPendingSize will be updated before isWritable(). if (bytes > 0) { return isWritable() ? bytes : 0; } return 0; } /* * Increment the pending bytes which will be written at some point. * This method is thread-safe! */ void incrementPendingOutboundBytes(long size) { incrementPendingOutboundBytes(size, true); } private void incrementPendingOutboundBytes(long size, boolean invokeLater) { if (size == 0) { return; } long newWriteBufferSize = TOTAL_PENDING_SIZE_UPDATER.addAndGet(this, size); if (newWriteBufferSize > channel.config().getWriteBufferHighWaterMark()) { setUnwritable(invokeLater); } } private void setUnwritable(boolean invokeLater) { for (;;) { final int oldValue = unwritable; final int newValue = oldValue | 1; if (UNWRITABLE_UPDATER.compareAndSet(this, oldValue, newValue)) { if (oldValue == 0 && newValue != 0) { fireChannelWritabilityChanged(invokeLater); } break; } } } private void fireChannelWritabilityChanged(boolean invokeLater) { final ChannelPipeline pipeline = channel.pipeline(); if (invokeLater) { Runnable task = fireChannelWritabilityChangedTask; if (task == null) { fireChannelWritabilityChangedTask = task = new Runnable() { @Override public void run() { pipeline.fireChannelWritabilityChanged(); } }; } channel.eventLoop().execute(task); } else { pipeline.fireChannelWritabilityChanged(); } }bytesBeforeUnwritable方法先判断highWatermark与totalPendingSize的差值，totalPendingSize大于等于highWatermark，则返回0；如果小于highWatermark，且isWritable为true，则返回差值，否则返回0incrementPendingOutboundBytes方法判断如果newWriteBufferSize > channel.config().getWriteBufferHighWaterMark()，则调用setUnwritable(invokeLater)setUnwritable会判断是否有变更，有的话，触发fireChannelWritabilityChanged进行通知小结storm client的storm.messaging.netty.buffer.high.watermark(默认16M)以及storm.messaging.netty.buffer.low.watermark(默认8M)其实配置的是netty的ChannelOption.WRITE_BUFFER_WATER_MARKnetty的WriteBufferWaterMark主要是控制ChannelOutboundBuffer的bytesBeforeWritable以及bytesBeforeUnwritable方法，通过lowWatermark及highWatermark参数来控制ChannelOutboundBuffer的buffer的容量lowWatermark及highWatermark分别在decrementPendingOutboundBytes及incrementPendingOutboundBytes方法里头用到，当小于lowWatermark或者大于highWatermark的时候，分别触发setWritable及setUnwritable，更改ChannelOutboundBuffer的unwritable字段，进而影响isWritable方法；在isWritable为true的时候会立马执行写请求，当返回false的时候，写请求会被放入队列等待isWritable为true时才能执行这些堆积的写请求docPipelining and flow controlWriteBufferWaterMarkNetty 4: high and low write watermarksNetty 水位详解Netty 那些事儿 ——— Netty实现“流量整形”原理分析及实战 Set sane WRITE_BUFFER_HIGH_WATER_MARK and WRITE_BUFFER_LOW_WATER_MARK ...

聊聊storm的messageTimeout

序本文主要研究一下storm的messageTimeoutTOPOLOGY_MESSAGE_TIMEOUT_SECSstorm-2.0.0/storm-client/src/jvm/org/apache/storm/Config.java /** * True if Storm should timeout messages or not. Defaults to true. This is meant to be used in unit tests to prevent tuples from being * accidentally timed out during the test. / @isBoolean public static final String TOPOLOGY_ENABLE_MESSAGE_TIMEOUTS = “topology.enable.message.timeouts”; /* * The maximum amount of time given to the topology to fully process a message emitted by a spout. If the message is not acked within * this time frame, Storm will fail the message on the spout. Some spouts implementations will then replay the message at a later time. / @isInteger @isPositiveNumber @NotNull public static final String TOPOLOGY_MESSAGE_TIMEOUT_SECS = “topology.message.timeout.secs”; /* * How often a tick tuple from the “__system” component and “__tick” stream should be sent to tasks. Meant to be used as a * component-specific configuration. */ @isInteger public static final String TOPOLOGY_TICK_TUPLE_FREQ_SECS = “topology.tick.tuple.freq.secs”;defaults.yaml中topology.enable.message.timeouts默认为truedefaults.yaml中topology.message.timeout.secs默认为30defaults.yaml中topology.tick.tuple.freq.secs默认为null，实际是取的topology.message.timeout.secs的值StormCommon.addAckerstorm-2.0.0/storm-client/src/jvm/org/apache/storm/daemon/StormCommon.java public static void addAcker(Map<String, Object> conf, StormTopology topology) { int ackerNum = ObjectReader.getInt(conf.get(Config.TOPOLOGY_ACKER_EXECUTORS), ObjectReader.getInt(conf.get(Config.TOPOLOGY_WORKERS))); Map<GlobalStreamId, Grouping> inputs = ackerInputs(topology); Map<String, StreamInfo> outputStreams = new HashMap<String, StreamInfo>(); outputStreams.put(Acker.ACKER_ACK_STREAM_ID, Thrift.directOutputFields(Arrays.asList(“id”, “time-delta-ms”))); outputStreams.put(Acker.ACKER_FAIL_STREAM_ID, Thrift.directOutputFields(Arrays.asList(“id”, “time-delta-ms”))); outputStreams.put(Acker.ACKER_RESET_TIMEOUT_STREAM_ID, Thrift.directOutputFields(Arrays.asList(“id”, “time-delta-ms”))); Map<String, Object> ackerConf = new HashMap<>(); ackerConf.put(Config.TOPOLOGY_TASKS, ackerNum); ackerConf.put(Config.TOPOLOGY_TICK_TUPLE_FREQ_SECS, ObjectReader.getInt(conf.get(Config.TOPOLOGY_MESSAGE_TIMEOUT_SECS))); Bolt acker = Thrift.prepareSerializedBoltDetails(inputs, makeAckerBolt(), outputStreams, ackerNum, ackerConf); for (Bolt bolt : topology.get_bolts().values()) { ComponentCommon common = bolt.get_common(); common.put_to_streams(Acker.ACKER_ACK_STREAM_ID, Thrift.outputFields(Arrays.asList(“id”, “ack-val”))); common.put_to_streams(Acker.ACKER_FAIL_STREAM_ID, Thrift.outputFields(Arrays.asList(“id”))); common.put_to_streams(Acker.ACKER_RESET_TIMEOUT_STREAM_ID, Thrift.outputFields(Arrays.asList(“id”))); } for (SpoutSpec spout : topology.get_spouts().values()) { ComponentCommon common = spout.get_common(); Map<String, Object> spoutConf = componentConf(spout); spoutConf.put(Config.TOPOLOGY_TICK_TUPLE_FREQ_SECS, ObjectReader.getInt(conf.get(Config.TOPOLOGY_MESSAGE_TIMEOUT_SECS))); common.set_json_conf(JSONValue.toJSONString(spoutConf)); common.put_to_streams(Acker.ACKER_INIT_STREAM_ID, Thrift.outputFields(Arrays.asList(“id”, “init-val”, “spout-task”))); common.put_to_inputs(Utils.getGlobalStreamId(Acker.ACKER_COMPONENT_ID, Acker.ACKER_ACK_STREAM_ID), Thrift.prepareDirectGrouping()); common.put_to_inputs(Utils.getGlobalStreamId(Acker.ACKER_COMPONENT_ID, Acker.ACKER_FAIL_STREAM_ID), Thrift.prepareDirectGrouping()); common.put_to_inputs(Utils.getGlobalStreamId(Acker.ACKER_COMPONENT_ID, Acker.ACKER_RESET_TIMEOUT_STREAM_ID), Thrift.prepareDirectGrouping()); } topology.put_to_bolts(Acker.ACKER_COMPONENT_ID, acker); }storm在addAcker的时候，使用了Config.TOPOLOGY_MESSAGE_TIMEOUT_SECS的值作为Config.TOPOLOGY_TICK_TUPLE_FREQ_SECSExecutor.setupTicksstorm-2.0.0/storm-client/src/jvm/org/apache/storm/executor/Executor.java protected void setupTicks(boolean isSpout) { final Integer tickTimeSecs = ObjectReader.getInt(topoConf.get(Config.TOPOLOGY_TICK_TUPLE_FREQ_SECS), null); if (tickTimeSecs != null) { boolean enableMessageTimeout = (Boolean) topoConf.get(Config.TOPOLOGY_ENABLE_MESSAGE_TIMEOUTS); if ((!Acker.ACKER_COMPONENT_ID.equals(componentId) && Utils.isSystemId(componentId)) || (!enableMessageTimeout && isSpout)) { LOG.info(“Timeouts disabled for executor {}:{}”, componentId, executorId); } else { StormTimer timerTask = workerData.getUserTimer(); timerTask.scheduleRecurring(tickTimeSecs, tickTimeSecs, () -> { TupleImpl tuple = new TupleImpl(workerTopologyContext, new Values(tickTimeSecs), Constants.SYSTEM_COMPONENT_ID, (int) Constants.SYSTEM_TASK_ID, Constants.SYSTEM_TICK_STREAM_ID); AddressedTuple tickTuple = new AddressedTuple(AddressedTuple.BROADCAST_DEST, tuple); try { receiveQueue.publish(tickTuple); receiveQueue.flush(); // avoid buffering } catch (InterruptedException e) { LOG.warn(“Thread interrupted when emitting tick tuple. Setting interrupt flag.”); Thread.currentThread().interrupt(); return; } } ); } } }Executor在setupTicks的时候，使用了Config.TOPOLOGY_TICK_TUPLE_FREQ_SECS作为tickTimeSecs，即tickTuple的调度时间间隔调度tickTuple的前提之一是有开启Config.TOPOLOGY_ENABLE_MESSAGE_TIMEOUTS该定时任务每隔tickTimeSecs发射一个tickTuple，该tuple的srcComponent设置为Constants.SYSTEM_COMPONENT_ID(__system)，taskId设置为Constants.SYSTEM_TASK_ID(-1)，streamId设置为Constants.SYSTEM_TICK_STREAM_ID(__tick)SpoutExecutor.tupleActionFnstorm-2.0.0/storm-client/src/jvm/org/apache/storm/executor/spout/SpoutExecutor.java public void tupleActionFn(int taskId, TupleImpl tuple) throws Exception { String streamId = tuple.getSourceStreamId(); if (Constants.SYSTEM_FLUSH_STREAM_ID.equals(streamId)) { spoutOutputCollector.flush(); } else if (streamId.equals(Constants.SYSTEM_TICK_STREAM_ID)) { pending.rotate(); } else if (streamId.equals(Constants.METRICS_TICK_STREAM_ID)) { metricsTick(idToTask.get(taskId - idToTaskBase), tuple); } else if (streamId.equals(Constants.CREDENTIALS_CHANGED_STREAM_ID)) { Object spoutObj = idToTask.get(taskId - idToTaskBase).getTaskObject(); if (spoutObj instanceof ICredentialsListener) { ((ICredentialsListener) spoutObj).setCredentials((Map<String, String>) tuple.getValue(0)); } } else if (streamId.equals(Acker.ACKER_RESET_TIMEOUT_STREAM_ID)) { Long id = (Long) tuple.getValue(0); TupleInfo pendingForId = pending.get(id); if (pendingForId != null) { pending.put(id, pendingForId); } } else { Long id = (Long) tuple.getValue(0); Long timeDeltaMs = (Long) tuple.getValue(1); TupleInfo tupleInfo = pending.remove(id); if (tupleInfo != null && tupleInfo.getMessageId() != null) { if (taskId != tupleInfo.getTaskId()) { throw new RuntimeException(“Fatal error, mismatched task ids: " + taskId + " " + tupleInfo.getTaskId()); } Long timeDelta = null; if (hasAckers) { long startTimeMs = tupleInfo.getTimestamp(); if (startTimeMs != 0) { timeDelta = timeDeltaMs; } } if (streamId.equals(Acker.ACKER_ACK_STREAM_ID)) { ackSpoutMsg(this, idToTask.get(taskId - idToTaskBase), timeDelta, tupleInfo); } else if (streamId.equals(Acker.ACKER_FAIL_STREAM_ID)) { failSpoutMsg(this, idToTask.get(taskId - idToTaskBase), timeDelta, tupleInfo, “FAIL-STREAM”); } } } }SpoutExecutor在tupleActionFn方法接收到Constants.SYSTEM_TICK_STREAM_ID的tickTuple的时候，触发pending.rotate()方法RotatingMap.rotatestorm-2.0.0/storm-client/src/jvm/org/apache/storm/utils/RotatingMap.java public Map<K, V> rotate() { Map<K, V> dead = _buckets.removeLast(); _buckets.addFirst(new HashMap<K, V>()); if (_callback != null) { for (Entry<K, V> entry : dead.entrySet()) { _callback.expire(entry.getKey(), entry.getValue()); } } return dead; }rotate方法会触发expireSpoutExecutor.RotatingMap.ExpiredCallbackstorm-2.0.0/storm-client/src/jvm/org/apache/storm/executor/spout/SpoutExecutor.java public void init(final ArrayList<Task> idToTask, int idToTaskBase) { this.threadId = Thread.currentThread().getId(); executorTransfer.initLocalRecvQueues(); while (!stormActive.get()) { Utils.sleep(100); } LOG.info(“Opening spout {}:{}”, componentId, taskIds); this.idToTask = idToTask; this.maxSpoutPending = ObjectReader.getInt(topoConf.get(Config.TOPOLOGY_MAX_SPOUT_PENDING), 0) * idToTask.size(); this.spouts = new ArrayList<>(); for (Task task : idToTask) { if (task != null) { this.spouts.add((ISpout) task.getTaskObject()); } } this.pending = new RotatingMap<>(2, new RotatingMap.ExpiredCallback<Long, TupleInfo>() { @Override public void expire(Long key, TupleInfo tupleInfo) { Long timeDelta = null; if (tupleInfo.getTimestamp() != 0) { timeDelta = Time.deltaMs(tupleInfo.getTimestamp()); } failSpoutMsg(SpoutExecutor.this, idToTask.get(tupleInfo.getTaskId() - idToTaskBase), timeDelta, tupleInfo, “TIMEOUT”); } }); //…… }SpoutExecutor在init的时候注册了pending的RotatingMap.ExpiredCallback，里头对过期的tuple调用failSpoutMsg小结spout的messageTimeout相关的参数为Config.TOPOLOGY_ENABLE_MESSAGE_TIMEOUTS(topology.enable.message.timeouts默认true)、Config.TOPOLOGY_MESSAGE_TIMEOUT_SECS(topology.message.timeout.secs默认30)StormCommon在addAcker的时候取Config.TOPOLOGY_MESSAGE_TIMEOUT_SECS作为Config.TOPOLOGY_TICK_TUPLE_FREQ_SECS值，作为tickTimeSecs，即ack的tickTuple的调度时间间隔SpoutExecutor在接收到该tickTuple的时候，触发RotatingMap的rotate操作，进行expire回调，而SpoutExecutor在init的时候，注册了RotatingMap.ExpiredCallback，对过期的tuple进行failSpoutMsg操作，回调spout的fail方法，至此完成messageTimeout的功能doc聊聊storm的ack机制聊聊storm的tickTupleGuaranteeing Message Processing ...

聊聊storm的maxSpoutPending

序本文主要研究一下storm的maxSpoutPendingTOPOLOGY_MAX_SPOUT_PENDINGstorm-2.0.0/storm-client/src/jvm/org/apache/storm/Config.java /** * The maximum number of tuples that can be pending on a spout task at any given time. This config applies to individual tasks, not to * spouts or topologies as a whole. * * A pending tuple is one that has been emitted from a spout but has not been acked or failed yet. Note that this config parameter has * no effect for unreliable spouts that don’t tag their tuples with a message id. */ @isInteger @isPositiveNumber public static final String TOPOLOGY_MAX_SPOUT_PENDING = “topology.max.spout.pending”;TOPOLOGY_MAX_SPOUT_PENDING设置的是一个spout task已经emit等待ack的tuple的最大数量，该配置仅仅对于发射可靠tuple(设置msgId)的spout起作用defaults.yaml文件中topology.max.spout.pending的默认配置为nullSpoutExecutorstorm-2.0.0/storm-client/src/jvm/org/apache/storm/executor/spout/SpoutExecutor.java public void init(final ArrayList<Task> idToTask, int idToTaskBase) { this.threadId = Thread.currentThread().getId(); executorTransfer.initLocalRecvQueues(); while (!stormActive.get()) { Utils.sleep(100); } LOG.info(“Opening spout {}:{}”, componentId, taskIds); this.idToTask = idToTask; this.maxSpoutPending = ObjectReader.getInt(topoConf.get(Config.TOPOLOGY_MAX_SPOUT_PENDING), 0) * idToTask.size(); //…… } public Callable<Long> call() throws Exception { init(idToTask, idToTaskBase); return new Callable<Long>() { final int recvqCheckSkipCountMax = getSpoutRecvqCheckSkipCount(); int recvqCheckSkips = 0; int swIdleCount = 0; // counter for spout wait strategy int bpIdleCount = 0; // counter for back pressure wait strategy int rmspCount = 0; @Override public Long call() throws Exception { int receiveCount = 0; if (recvqCheckSkips++ == recvqCheckSkipCountMax) { receiveCount = receiveQueue.consume(SpoutExecutor.this); recvqCheckSkips = 0; } long currCount = emittedCount.get(); boolean reachedMaxSpoutPending = (maxSpoutPending != 0) && (pending.size() >= maxSpoutPending); boolean isActive = stormActive.get(); if (!isActive) { inactiveExecute(); return 0L; } if (!lastActive.get()) { lastActive.set(true); activateSpouts(); } boolean pendingEmitsIsEmpty = tryFlushPendingEmits(); boolean noEmits = true; long emptyStretch = 0; if (!reachedMaxSpoutPending && pendingEmitsIsEmpty) { for (int j = 0; j < spouts.size(); j++) { // in critical path. don’t use iterators. spouts.get(j).nextTuple(); } noEmits = (currCount == emittedCount.get()); if (noEmits) { emptyEmitStreak.increment(); } else { emptyStretch = emptyEmitStreak.get(); emptyEmitStreak.set(0); } } if (reachedMaxSpoutPending) { if (rmspCount == 0) { LOG.debug(“Reached max spout pending”); } rmspCount++; } else { if (rmspCount > 0) { LOG.debug(“Ended max spout pending stretch of {} iterations”, rmspCount); } rmspCount = 0; } if (receiveCount > 1) { // continue without idling return 0L; } if (!pendingEmits.isEmpty()) { // then facing backpressure backPressureWaitStrategy(); return 0L; } bpIdleCount = 0; if (noEmits) { spoutWaitStrategy(reachedMaxSpoutPending, emptyStretch); return 0L; } swIdleCount = 0; return 0L; } private void backPressureWaitStrategy() throws InterruptedException { long start = Time.currentTimeMillis(); if (bpIdleCount == 0) { // check avoids multiple log msgs when in a idle loop LOG.debug(“Experiencing Back Pressure from downstream components. Entering BackPressure Wait.”); } bpIdleCount = backPressureWaitStrategy.idle(bpIdleCount); spoutThrottlingMetrics.skippedBackPressureMs(Time.currentTimeMillis() - start); } private void spoutWaitStrategy(boolean reachedMaxSpoutPending, long emptyStretch) throws InterruptedException { emptyEmitStreak.increment(); long start = Time.currentTimeMillis(); swIdleCount = spoutWaitStrategy.idle(swIdleCount); if (reachedMaxSpoutPending) { spoutThrottlingMetrics.skippedMaxSpoutMs(Time.currentTimeMillis() - start); } else { if (emptyStretch > 0) { LOG.debug(“Ending Spout Wait Stretch of {}”, emptyStretch); } } } // returns true if pendingEmits is empty private boolean tryFlushPendingEmits() { for (AddressedTuple t = pendingEmits.peek(); t != null; t = pendingEmits.peek()) { if (executorTransfer.tryTransfer(t, null)) { pendingEmits.poll(); } else { // to avoid reordering of emits, stop at first failure return false; } } return true; } }; }这里从topoConf读取Config.TOPOLOGY_MAX_SPOUT_PENDING，如果读取不到则取0，之后乘以task的数量，即为maxSpoutPendingmaxSpoutPending在call方法里头控制的是reachedMaxSpoutPending变量，只有!reachedMaxSpoutPending && pendingEmitsIsEmpty才能够执行nextTuple发射数据MasterBatchCoordinatorstorm-2.0.0/storm-client/src/jvm/org/apache/storm/trident/topology/MasterBatchCoordinator.java public void open(Map<String, Object> conf, TopologyContext context, SpoutOutputCollector collector) { _throttler = new WindowedTimeThrottler((Number) conf.get(Config.TOPOLOGY_TRIDENT_BATCH_EMIT_INTERVAL_MILLIS), 1); for (String spoutId : _managedSpoutIds) { _states.add(TransactionalState.newCoordinatorState(conf, spoutId)); } _currTransaction = getStoredCurrTransaction(); _collector = collector; Number active = (Number) conf.get(Config.TOPOLOGY_MAX_SPOUT_PENDING); if (active == null) { _maxTransactionActive = 1; } else { _maxTransactionActive = active.intValue(); } _attemptIds = getStoredCurrAttempts(_currTransaction, _maxTransactionActive); for (int i = 0; i < _spouts.size(); i++) { String txId = _managedSpoutIds.get(i); _coordinators.add(_spouts.get(i).getCoordinator(txId, conf, context)); } LOG.debug(“Opened {}”, this); } private void sync() { // note that sometimes the tuples active may be less than max_spout_pending, e.g. // max_spout_pending = 3 // tx 1, 2, 3 active, tx 2 is acked. there won’t be a commit for tx 2 (because tx 1 isn’t committed yet), // and there won’t be a batch for tx 4 because there’s max_spout_pending tx active TransactionStatus maybeCommit = _activeTx.get(_currTransaction); if (maybeCommit != null && maybeCommit.status == AttemptStatus.PROCESSED) { maybeCommit.status = AttemptStatus.COMMITTING; _collector.emit(COMMIT_STREAM_ID, new Values(maybeCommit.attempt), maybeCommit.attempt); LOG.debug(“Emitted on [stream = {}], [tx_status = {}], [{}]”, COMMIT_STREAM_ID, maybeCommit, this); } if (_active) { if (_activeTx.size() < _maxTransactionActive) { Long curr = _currTransaction; for (int i = 0; i < _maxTransactionActive; i++) { if (!_activeTx.containsKey(curr) && isReady(curr)) { // by using a monotonically increasing attempt id, downstream tasks // can be memory efficient by clearing out state for old attempts // as soon as they see a higher attempt id for a transaction Integer attemptId = _attemptIds.get(curr); if (attemptId == null) { attemptId = 0; } else { attemptId++; } _attemptIds.put(curr, attemptId); for (TransactionalState state : _states) { state.setData(CURRENT_ATTEMPTS, _attemptIds); } TransactionAttempt attempt = new TransactionAttempt(curr, attemptId); final TransactionStatus newTransactionStatus = new TransactionStatus(attempt); _activeTx.put(curr, newTransactionStatus); _collector.emit(BATCH_STREAM_ID, new Values(attempt), attempt); LOG.debug(“Emitted on [stream = {}], [tx_attempt = {}], [tx_status = {}], [{}]”, BATCH_STREAM_ID, attempt, newTransactionStatus, this); _throttler.markEvent(); } curr = nextTransactionId(curr); } } } }MasterBatchCoordinator的open方法从conf读取Config.TOPOLOGY_MAX_SPOUT_PENDING设置到_maxTransactionActive，如果为null则默认为1这里只有_activeTx.size() < _maxTransactionActive才会往BATCH_STREAM_ID发射数据小结Config.TOPOLOGY_MAX_SPOUT_PENDING(topology.max.spout.pending)，默认为null，只对于开启可靠(msgId)消息的spout起作用对于普通的spout，指的是等待ack的数量的最大值，超过这个值，SpoutExecutor不会调用spout的nextTuple发射数据对于trident的spout来说，指的是同时处理的batches的数量，只有这些batches处理成功或失败之后才能继续下一个batchdocTrident Spouts聊聊storm的IWaitStrategy ...

聊聊storm的IWaitStrategy

序本文主要研究一下storm的IWaitStrategyIWaitStrategystorm-2.0.0/storm-client/src/jvm/org/apache/storm/policy/IWaitStrategy.javapublic interface IWaitStrategy { static IWaitStrategy createBackPressureWaitStrategy(Map<String, Object> topologyConf) { IWaitStrategy producerWaitStrategy = ReflectionUtils.newInstance((String) topologyConf.get(Config.TOPOLOGY_BACKPRESSURE_WAIT_STRATEGY)); producerWaitStrategy.prepare(topologyConf, WAIT_SITUATION.BACK_PRESSURE_WAIT); return producerWaitStrategy; } void prepare(Map<String, Object> conf, WAIT_SITUATION waitSituation); /** * Implementations of this method should be thread-safe (preferably no side-effects and lock-free) * * Supports static or dynamic backoff. Dynamic backoff relies on idleCounter to estimate how long caller has been idling. * * <pre> * <code> * int idleCounter = 0; * int consumeCount = consumeFromQ(); * while (consumeCount==0) { * idleCounter = strategy.idle(idleCounter); * consumeCount = consumeFromQ(); * } * </code> * </pre> * * @param idleCounter managed by the idle method until reset * @return new counter value to be used on subsequent idle cycle / int idle(int idleCounter) throws InterruptedException; enum WAIT_SITUATION {SPOUT_WAIT, BOLT_WAIT, BACK_PRESSURE_WAIT}}这个接口提供了一个工厂方法，默认是读取topology.backpressure.wait.strategy参数值，创建producerWaitStrategy，并使用WAIT_SITUATION.BACK_PRESSURE_WAIT初始化WAIT_SITUATION一共有三类，分别是SPOUT_WAIT, BOLT_WAIT, BACK_PRESSURE_WAIT该接口定义了int idle(int idleCounter)方法，用于static或dynamic backoffSpoutExecutorstorm-2.0.0/storm-client/src/jvm/org/apache/storm/executor/spout/SpoutExecutor.javapublic class SpoutExecutor extends Executor { private static final Logger LOG = LoggerFactory.getLogger(SpoutExecutor.class); private final IWaitStrategy spoutWaitStrategy; private final IWaitStrategy backPressureWaitStrategy; private final AtomicBoolean lastActive; private final MutableLong emittedCount; private final MutableLong emptyEmitStreak; private final SpoutThrottlingMetrics spoutThrottlingMetrics; private final boolean hasAckers; private final SpoutExecutorStats stats; private final BuiltinMetrics builtInMetrics; SpoutOutputCollectorImpl spoutOutputCollector; private Integer maxSpoutPending; private List<ISpout> spouts; private List<SpoutOutputCollector> outputCollectors; private RotatingMap<Long, TupleInfo> pending; private long threadId = 0; public SpoutExecutor(final WorkerState workerData, final List<Long> executorId, Map<String, String> credentials) { super(workerData, executorId, credentials, ClientStatsUtil.SPOUT); this.spoutWaitStrategy = ReflectionUtils.newInstance((String) topoConf.get(Config.TOPOLOGY_SPOUT_WAIT_STRATEGY)); this.spoutWaitStrategy.prepare(topoConf, WAIT_SITUATION.SPOUT_WAIT); this.backPressureWaitStrategy = ReflectionUtils.newInstance((String) topoConf.get(Config.TOPOLOGY_BACKPRESSURE_WAIT_STRATEGY)); this.backPressureWaitStrategy.prepare(topoConf, WAIT_SITUATION.BACK_PRESSURE_WAIT); //…… } //……}这里创建了两个watiStrategy，一个是spoutWaitStrategy，一个是backPressureWaitStrategyspoutWaitStrategy读取的是topology.spout.wait.strategy参数，在defaults.yaml里头值为org.apache.storm.policy.WaitStrategyProgressivebackPressureWaitStrategy读取的是topology.backpressure.wait.strategy参数，在defaults.yaml里头值为org.apache.storm.policy.WaitStrategyProgressiveBoltExecutorstorm-2.0.0/storm-client/src/jvm/org/apache/storm/executor/bolt/BoltExecutor.javapublic class BoltExecutor extends Executor { private static final Logger LOG = LoggerFactory.getLogger(BoltExecutor.class); private final BooleanSupplier executeSampler; private final boolean isSystemBoltExecutor; private final IWaitStrategy consumeWaitStrategy; // employed when no incoming data private final IWaitStrategy backPressureWaitStrategy; // employed when outbound path is congested private final BoltExecutorStats stats; private final BuiltinMetrics builtInMetrics; private BoltOutputCollectorImpl outputCollector; public BoltExecutor(WorkerState workerData, List<Long> executorId, Map<String, String> credentials) { super(workerData, executorId, credentials, ClientStatsUtil.BOLT); this.executeSampler = ConfigUtils.mkStatsSampler(topoConf); this.isSystemBoltExecutor = (executorId == Constants.SYSTEM_EXECUTOR_ID); if (isSystemBoltExecutor) { this.consumeWaitStrategy = makeSystemBoltWaitStrategy(); } else { this.consumeWaitStrategy = ReflectionUtils.newInstance((String) topoConf.get(Config.TOPOLOGY_BOLT_WAIT_STRATEGY)); this.consumeWaitStrategy.prepare(topoConf, WAIT_SITUATION.BOLT_WAIT); } this.backPressureWaitStrategy = ReflectionUtils.newInstance((String) topoConf.get(Config.TOPOLOGY_BACKPRESSURE_WAIT_STRATEGY)); this.backPressureWaitStrategy.prepare(topoConf, WAIT_SITUATION.BACK_PRESSURE_WAIT); this.stats = new BoltExecutorStats(ConfigUtils.samplingRate(this.getTopoConf()), ObjectReader.getInt(this.getTopoConf().get(Config.NUM_STAT_BUCKETS))); this.builtInMetrics = new BuiltinBoltMetrics(stats); } private static IWaitStrategy makeSystemBoltWaitStrategy() { WaitStrategyPark ws = new WaitStrategyPark(); Map<String, Object> conf = new HashMap<>(); conf.put(Config.TOPOLOGY_BOLT_WAIT_PARK_MICROSEC, 5000); ws.prepare(conf, WAIT_SITUATION.BOLT_WAIT); return ws; } //……}这里创建了两个IWaitStrategy，一个是consumeWaitStrategy，一个是backPressureWaitStrategyconsumeWaitStrategy在非SystemBoltExecutor的情况下读取的是topology.bolt.wait.strategy参数，在defaults.yaml里头值为org.apache.storm.policy.WaitStrategyProgressive；如果是SystemBoltExecutor则使用的是WaitStrategyPark策略backPressureWaitStrategy读取的是读取的是topology.backpressure.wait.strategy参数，在defaults.yaml里头值为org.apache.storm.policy.WaitStrategyProgressiveWaitStrategyParkstorm-2.0.0/storm-client/src/jvm/org/apache/storm/policy/WaitStrategyPark.javapublic class WaitStrategyPark implements IWaitStrategy { private long parkTimeNanoSec; public WaitStrategyPark() { // required for instantiation via reflection. must call prepare() thereafter } // Convenience alternative to prepare() for use in Tests public WaitStrategyPark(long microsec) { parkTimeNanoSec = microsec * 1_000; } @Override public void prepare(Map<String, Object> conf, WAIT_SITUATION waitSituation) { if (waitSituation == WAIT_SITUATION.SPOUT_WAIT) { parkTimeNanoSec = 1_000 * ObjectReader.getLong(conf.get(Config.TOPOLOGY_SPOUT_WAIT_PARK_MICROSEC)); } else if (waitSituation == WAIT_SITUATION.BOLT_WAIT) { parkTimeNanoSec = 1_000 * ObjectReader.getLong(conf.get(Config.TOPOLOGY_BOLT_WAIT_PARK_MICROSEC)); } else if (waitSituation == WAIT_SITUATION.BACK_PRESSURE_WAIT) { parkTimeNanoSec = 1_000 * ObjectReader.getLong(conf.get(Config.TOPOLOGY_BACKPRESSURE_WAIT_PARK_MICROSEC)); } else { throw new IllegalArgumentException(“Unknown wait situation : " + waitSituation); } } @Override public int idle(int idleCounter) throws InterruptedException { if (parkTimeNanoSec == 0) { return 1; } LockSupport.parkNanos(parkTimeNanoSec); return idleCounter + 1; }}该策略使用的是LockSupport.parkNanos(parkTimeNanoSec)方法WaitStrategyProgressivestorm-2.0.0/storm-client/src/jvm/org/apache/storm/policy/WaitStrategyProgressive.java/* * A Progressive Wait Strategy * Has three levels of idling. Stays in each level for a configured number of iterations before entering the next level. * Level 1 - No idling. Returns immediately. Stays in this level for level1Count iterations. Level 2 - Calls LockSupport.parkNanos(1). * Stays in this level for level2Count iterations Level 3 - Calls Thread.sleep(). Stays in this level until wait situation changes. * * * The initial spin can be useful to prevent downstream bolt from repeatedly sleeping/parking when the upstream component is a bit * relatively slower. Allows downstream bolt can enter deeper wait states only if the traffic to it appears to have reduced. * */public class WaitStrategyProgressive implements IWaitStrategy { private int level1Count; private int level2Count; private long level3SleepMs; @Override public void prepare(Map<String, Object> conf, WAIT_SITUATION waitSituation) { if (waitSituation == WAIT_SITUATION.SPOUT_WAIT) { level1Count = ObjectReader.getInt(conf.get(Config.TOPOLOGY_SPOUT_WAIT_PROGRESSIVE_LEVEL1_COUNT)); level2Count = ObjectReader.getInt(conf.get(Config.TOPOLOGY_SPOUT_WAIT_PROGRESSIVE_LEVEL2_COUNT)); level3SleepMs = ObjectReader.getLong(conf.get(Config.TOPOLOGY_SPOUT_WAIT_PROGRESSIVE_LEVEL3_SLEEP_MILLIS)); } else if (waitSituation == WAIT_SITUATION.BOLT_WAIT) { level1Count = ObjectReader.getInt(conf.get(Config.TOPOLOGY_BOLT_WAIT_PROGRESSIVE_LEVEL1_COUNT)); level2Count = ObjectReader.getInt(conf.get(Config.TOPOLOGY_BOLT_WAIT_PROGRESSIVE_LEVEL2_COUNT)); level3SleepMs = ObjectReader.getLong(conf.get(Config.TOPOLOGY_BOLT_WAIT_PROGRESSIVE_LEVEL3_SLEEP_MILLIS)); } else if (waitSituation == WAIT_SITUATION.BACK_PRESSURE_WAIT) { level1Count = ObjectReader.getInt(conf.get(Config.TOPOLOGY_BACKPRESSURE_WAIT_PROGRESSIVE_LEVEL1_COUNT)); level2Count = ObjectReader.getInt(conf.get(Config.TOPOLOGY_BACKPRESSURE_WAIT_PROGRESSIVE_LEVEL2_COUNT)); level3SleepMs = ObjectReader.getLong(conf.get(Config.TOPOLOGY_BACKPRESSURE_WAIT_PROGRESSIVE_LEVEL3_SLEEP_MILLIS)); } else { throw new IllegalArgumentException(“Unknown wait situation : " + waitSituation); } } @Override public int idle(int idleCounter) throws InterruptedException { if (idleCounter < level1Count) { // level 1 - no waiting ++idleCounter; } else if (idleCounter < level1Count + level2Count) { // level 2 - parkNanos(1L) ++idleCounter; LockSupport.parkNanos(1L); } else { // level 3 - longer idling with Thread.sleep() Thread.sleep(level3SleepMs); } return idleCounter; }}WaitStrategyProgressive是一个渐进式的wait strategy，它分为3个level的idlinglevel 1是no idling，立刻返回；在level 1经历了level1Count的次数之后进入level 2level 2使用的是LockSupport.parkNanos(1)，在level 2经历了level2Count次数之后进入level 3level 3使用的是Thread.sleep(level3SleepMs)，在wait situation改变的时候跳出不同的WAIT_SITUATION读取不同的LEVEL1_COUNT、LEVEL2_COUNT、LEVEL3_SLEEP_MILLIS参数，对于spout，它们的默认值分别为0、0、1；对于bolt它们的默认值分别为1、1000、1；对于back pressure，它们的默认值分别为1、1000、1SpoutExecutor.callstorm-2.0.0/storm-client/src/jvm/org/apache/storm/executor/spout/SpoutExecutor.java @Override public Callable<Long> call() throws Exception { init(idToTask, idToTaskBase); return new Callable<Long>() { final int recvqCheckSkipCountMax = getSpoutRecvqCheckSkipCount(); int recvqCheckSkips = 0; int swIdleCount = 0; // counter for spout wait strategy int bpIdleCount = 0; // counter for back pressure wait strategy int rmspCount = 0; @Override public Long call() throws Exception { int receiveCount = 0; if (recvqCheckSkips++ == recvqCheckSkipCountMax) { receiveCount = receiveQueue.consume(SpoutExecutor.this); recvqCheckSkips = 0; } long currCount = emittedCount.get(); boolean reachedMaxSpoutPending = (maxSpoutPending != 0) && (pending.size() >= maxSpoutPending); boolean isActive = stormActive.get(); if (!isActive) { inactiveExecute(); return 0L; } if (!lastActive.get()) { lastActive.set(true); activateSpouts(); } boolean pendingEmitsIsEmpty = tryFlushPendingEmits(); boolean noEmits = true; long emptyStretch = 0; if (!reachedMaxSpoutPending && pendingEmitsIsEmpty) { for (int j = 0; j < spouts.size(); j++) { // in critical path. don’t use iterators. spouts.get(j).nextTuple(); } noEmits = (currCount == emittedCount.get()); if (noEmits) { emptyEmitStreak.increment(); } else { emptyStretch = emptyEmitStreak.get(); emptyEmitStreak.set(0); } } if (reachedMaxSpoutPending) { if (rmspCount == 0) { LOG.debug(“Reached max spout pending”); } rmspCount++; } else { if (rmspCount > 0) { LOG.debug(“Ended max spout pending stretch of {} iterations”, rmspCount); } rmspCount = 0; } if (receiveCount > 1) { // continue without idling return 0L; } if (!pendingEmits.isEmpty()) { // then facing backpressure backPressureWaitStrategy(); return 0L; } bpIdleCount = 0; if (noEmits) { spoutWaitStrategy(reachedMaxSpoutPending, emptyStretch); return 0L; } swIdleCount = 0; return 0L; } private void backPressureWaitStrategy() throws InterruptedException { long start = Time.currentTimeMillis(); if (bpIdleCount == 0) { // check avoids multiple log msgs when in a idle loop LOG.debug(“Experiencing Back Pressure from downstream components. Entering BackPressure Wait.”); } bpIdleCount = backPressureWaitStrategy.idle(bpIdleCount); spoutThrottlingMetrics.skippedBackPressureMs(Time.currentTimeMillis() - start); } private void spoutWaitStrategy(boolean reachedMaxSpoutPending, long emptyStretch) throws InterruptedException { emptyEmitStreak.increment(); long start = Time.currentTimeMillis(); swIdleCount = spoutWaitStrategy.idle(swIdleCount); if (reachedMaxSpoutPending) { spoutThrottlingMetrics.skippedMaxSpoutMs(Time.currentTimeMillis() - start); } else { if (emptyStretch > 0) { LOG.debug(“Ending Spout Wait Stretch of {}”, emptyStretch); } } } // returns true if pendingEmits is empty private boolean tryFlushPendingEmits() { for (AddressedTuple t = pendingEmits.peek(); t != null; t = pendingEmits.peek()) { if (executorTransfer.tryTransfer(t, null)) { pendingEmits.poll(); } else { // to avoid reordering of emits, stop at first failure return false; } } return true; } }; }spout维护了pendingEmits队列，即emit没有成功或者等待emit的队列，同时也维护了pending的RotatingMap，即等待ack的tuple的id及数据spout从topology.max.spout.pending读取TOPOLOGY_MAX_SPOUT_PENDING配置，计算maxSpoutPending=ObjectReader.getInt(topoConf.get(Config.TOPOLOGY_MAX_SPOUT_PENDING), 0) * idToTask.size()，默认为null，即maxSpoutPending为0spout在!reachedMaxSpoutPending && pendingEmitsIsEmpty的条件下才调用nextTuple发送数据；在pendingEmits不为空的时候触发backPressureWaitStrategy；在noEmits((currCount == emittedCount.get()))时触发spoutWaitStrategy在每次调用call的时候，在调用nextTuple之间记录currCount = emittedCount.get()；如果有调用nextTuple的话，则会在SpoutOutputCollectorImpl的emit或emitDirect等方法更新emittedCount；之后用noEmits=(currCount == emittedCount.get())判断是否有发射数据spout维护了bpIdleCount以及swIdleCount，分别用于backPressureWaitStrategy.idle(bpIdleCount)、spoutWaitStrategy.idle(swIdleCount)BoltExecutor.callstorm-2.0.0/storm-client/src/jvm/org/apache/storm/executor/bolt/BoltExecutor.java @Override public Callable<Long> call() throws Exception { init(idToTask, idToTaskBase); return new Callable<Long>() { int bpIdleCount = 0; int consumeIdleCounter = 0; private final ExitCondition tillNoPendingEmits = () -> pendingEmits.isEmpty(); @Override public Long call() throws Exception { boolean pendingEmitsIsEmpty = tryFlushPendingEmits(); if (pendingEmitsIsEmpty) { if (bpIdleCount != 0) { LOG.debug(“Ending Back Pressure Wait stretch : {}”, bpIdleCount); } bpIdleCount = 0; int consumeCount = receiveQueue.consume(BoltExecutor.this, tillNoPendingEmits); if (consumeCount == 0) { if (consumeIdleCounter == 0) { LOG.debug(“Invoking consume wait strategy”); } consumeIdleCounter = consumeWaitStrategy.idle(consumeIdleCounter); if (Thread.interrupted()) { throw new InterruptedException(); } } else { if (consumeIdleCounter != 0) { LOG.debug(“Ending consume wait stretch : {}”, consumeIdleCounter); } consumeIdleCounter = 0; } } else { if (bpIdleCount == 0) { // check avoids multiple log msgs when spinning in a idle loop LOG.debug(“Experiencing Back Pressure. Entering BackPressure Wait. PendingEmits = {}”, pendingEmits.size()); } bpIdleCount = backPressureWaitStrategy.idle(bpIdleCount); } return 0L; } // returns true if pendingEmits is empty private boolean tryFlushPendingEmits() { for (AddressedTuple t = pendingEmits.peek(); t != null; t = pendingEmits.peek()) { if (executorTransfer.tryTransfer(t, null)) { pendingEmits.poll(); } else { // to avoid reordering of emits, stop at first failure return false; } } return true; } }; }bolt executor同样也维护了pendingEmits，在pendingEmits不为空的时候，触发backPressureWaitStrategy.idle(bpIdleCount)在pendingEmits为空时，根据receiveQueue.consume(BoltExecutor.this, tillNoPendingEmits)返回的consumeCount，若为0则触发consumeWaitStrategy.idle(consumeIdleCounter)bolt executor维护了bpIdleCount及consumeIdleCounter，分别用于backPressureWaitStrategy.idle(bpIdleCount)以及consumeWaitStrategy.idle(consumeIdleCounter)小结spout和bolt的executor里头都用到了backPressureWaitStrategy，读取的是topology.backpressure.wait.strategy参数(for any producer (spout/bolt/transfer thread) when the downstream Q is full)，使用的实现类为org.apache.storm.policy.WaitStrategyProgressive，在下游component的recv queue满的时候使用的背压策略；具体是使用pendingEmits队列来判断，spout或bolt的call方法里头每次判断pendingEmitsIsEmpty都是调用tryFlushPendingEmits，先尝试发送数据，如果下游成功接收，则pendingEmits队列为空，通过这种机制来动态判断下游负载，决定是否触发backpressurespout使用的spoutWaitStrategy，读取的是topology.spout.wait.strategy参数(employed when there is no data to produce)，使用的实现类为org.apache.storm.policy.WaitStrategyProgressive，在没有数据发射的时候使用；具体是使用emittedCount来判断bolt使用的consumeWaitStrategy，在非SystemBoltExecutor的情况下读取的是topology.bolt.wait.strategy参数(employed when there is no data in its receive buffer to process)，使用的实现类为org.apache.storm.policy.WaitStrategyProgressive，在receive buffer没有数据处理的时候使用；具体是使用receiveQueue.consume(BoltExecutor.this, tillNoPendingEmits)返回的consumeCount来判断spout与bolt不同的还有一点就是spout除了pendingEmitsIsEmpty还多了一个reachedMaxSpoutPending参数，来判断是否继续产生数据，bolt则使用pendingEmitsIsEmpty来判断是否可以继续消费数据IWaitStrategy除了WaitStrategyProgressive实现，还有WaitStrategyPark实现，该策略在bolt是SystemBolt的情况下使用docIWaitStrategyWaitStrategyProgressiveWaitStrategyPark ...

[case43]聊聊storm的LinearDRPCTopologyBuilder

序本文主要研究一下storm的LinearDRPCTopologyBuilder实例manual drpc @Test public void testManualDRPC() throws InvalidTopologyException, AuthorizationException, AlreadyAliveException { TopologyBuilder builder = new TopologyBuilder(); DRPCSpout spout = new DRPCSpout(“exclamation”); //Fields(“args”, “return-info”) //spout为DRPCSpout，组件id为drpc builder.setSpout(“drpc”, spout); builder.setBolt(“exclaim”, new ManualExclaimBolt(), 3).shuffleGrouping(“drpc”); //Fields(“result”, “return-info”) builder.setBolt(“return”, new ReturnResults(), 3).shuffleGrouping(“exclaim”); SubmitHelper.submitRemote(“manualDrpc”,builder.createTopology()); }这里展示了最原始的drpc的topology的构建，开始使用DRPCSpout，结束使用ReturnResultsDRPCSpout的outputFields为Fields(“args”, “return-info”)，ReturnResults接收的fields为Fields(“result”, “return-info”)这里要求自定义的ManualExclaimBolt的outputFields为Fields为Fields(“result”, “return-info”)，其中return-info可以从input中获取，而result则会处理结果使用LinearDRPCTopologyBuilder @Test public void testBasicDRPCTopology() throws InvalidTopologyException, AuthorizationException, AlreadyAliveException { LinearDRPCTopologyBuilder builder = new LinearDRPCTopologyBuilder(“exclamation”); builder.addBolt(new ExclaimBolt(), 3); SubmitHelper.submitRemote(“basicDrpc”,builder.createRemoteTopology()); }LinearDRPCTopologyBuilder自动帮你构建了DRPCSpout、PrepareRequest、CoordinatedBolt、JoinResult、ReturnResults，在使用上极为简洁由于构造的component上下游不同，因而对用户自定义的bolt的要求为输入字段为Fields(“request”, “args”)，输出字段为new Fields(“id”, “result”)，其中前者的request即为requestId，即为后者的id，是long型；args为输入参数，result为输出结果LinearDRPCTopologyBuilderstorm-2.0.0/storm-client/src/jvm/org/apache/storm/drpc/LinearDRPCTopologyBuilder.javapublic class LinearDRPCTopologyBuilder { String function; List<Component> components = new ArrayList<>(); public LinearDRPCTopologyBuilder(String function) { this.function = function; } private static String boltId(int index) { return “bolt” + index; } public LinearDRPCInputDeclarer addBolt(IBatchBolt bolt, Number parallelism) { return addBolt(new BatchBoltExecutor(bolt), parallelism); } public LinearDRPCInputDeclarer addBolt(IBatchBolt bolt) { return addBolt(bolt, 1); } @Deprecated public LinearDRPCInputDeclarer addBolt(IRichBolt bolt, Number parallelism) { if (parallelism == null) { parallelism = 1; } Component component = new Component(bolt, parallelism.intValue()); components.add(component); return new InputDeclarerImpl(component); } @Deprecated public LinearDRPCInputDeclarer addBolt(IRichBolt bolt) { return addBolt(bolt, null); } public LinearDRPCInputDeclarer addBolt(IBasicBolt bolt, Number parallelism) { return addBolt(new BasicBoltExecutor(bolt), parallelism); } public LinearDRPCInputDeclarer addBolt(IBasicBolt bolt) { return addBolt(bolt, null); } public StormTopology createLocalTopology(ILocalDRPC drpc) { return createTopology(new DRPCSpout(function, drpc)); } public StormTopology createRemoteTopology() { return createTopology(new DRPCSpout(function)); } private StormTopology createTopology(DRPCSpout spout) { final String SPOUT_ID = “spout”; final String PREPARE_ID = “prepare-request”; TopologyBuilder builder = new TopologyBuilder(); builder.setSpout(SPOUT_ID, spout); builder.setBolt(PREPARE_ID, new PrepareRequest()) .noneGrouping(SPOUT_ID); int i = 0; for (; i < components.size(); i++) { Component component = components.get(i); Map<String, SourceArgs> source = new HashMap<String, SourceArgs>(); if (i == 1) { source.put(boltId(i - 1), SourceArgs.single()); } else if (i >= 2) { source.put(boltId(i - 1), SourceArgs.all()); } IdStreamSpec idSpec = null; if (i == components.size() - 1 && component.bolt instanceof FinishedCallback) { idSpec = IdStreamSpec.makeDetectSpec(PREPARE_ID, PrepareRequest.ID_STREAM); } BoltDeclarer declarer = builder.setBolt( boltId(i), new CoordinatedBolt(component.bolt, source, idSpec), component.parallelism); for (SharedMemory request : component.sharedMemory) { declarer.addSharedMemory(request); } if (!component.componentConf.isEmpty()) { declarer.addConfigurations(component.componentConf); } if (idSpec != null) { declarer.fieldsGrouping(idSpec.getGlobalStreamId().get_componentId(), PrepareRequest.ID_STREAM, new Fields(“request”)); } if (i == 0 && component.declarations.isEmpty()) { declarer.noneGrouping(PREPARE_ID, PrepareRequest.ARGS_STREAM); } else { String prevId; if (i == 0) { prevId = PREPARE_ID; } else { prevId = boltId(i - 1); } for (InputDeclaration declaration : component.declarations) { declaration.declare(prevId, declarer); } } if (i > 0) { declarer.directGrouping(boltId(i - 1), Constants.COORDINATED_STREAM_ID); } } IRichBolt lastBolt = components.get(components.size() - 1).bolt; OutputFieldsGetter getter = new OutputFieldsGetter(); lastBolt.declareOutputFields(getter); Map<String, StreamInfo> streams = getter.getFieldsDeclaration(); if (streams.size() != 1) { throw new RuntimeException(“Must declare exactly one stream from last bolt in LinearDRPCTopology”); } String outputStream = streams.keySet().iterator().next(); List<String> fields = streams.get(outputStream).get_output_fields(); if (fields.size() != 2) { throw new RuntimeException( “Output stream of last component in LinearDRPCTopology must contain exactly two fields. " + “The first should be the request id, and the second should be the result.”); } builder.setBolt(boltId(i), new JoinResult(PREPARE_ID)) .fieldsGrouping(boltId(i - 1), outputStream, new Fields(fields.get(0))) .fieldsGrouping(PREPARE_ID, PrepareRequest.RETURN_STREAM, new Fields(“request”)); i++; builder.setBolt(boltId(i), new ReturnResults()) .noneGrouping(boltId(i - 1)); return builder.createTopology(); } //……}从createTopology可以看到，构建的spout为DRPCSpout(spout)，之后是PrepareRequest(prepare-request)之后根据用户设置的bolt，包装构建CoordinatedBolt，如果有多个bolt的话，会对第二个及之后的bolt设置directGrouping(boltId(i - 1), Constants.COORDINATED_STREAM_ID)，用emitDirect发射Fields(“id”, “count”)构建完用户设置的bolt之后，构建JoinResult，最后才是ReturnResultsDRPCSpoutstorm-2.0.0/storm-client/src/jvm/org/apache/storm/drpc/DRPCSpout.javapublic class DRPCSpout extends BaseRichSpout { public static final Logger LOG = LoggerFactory.getLogger(DRPCSpout.class); //ANY CHANGE TO THIS CODE MUST BE SERIALIZABLE COMPATIBLE OR THERE WILL BE PROBLEMS static final long serialVersionUID = 2387848310969237877L; final String _function; final String _local_drpc_id; SpoutOutputCollector _collector; List<DRPCInvocationsClient> _clients = new ArrayList<>(); transient LinkedList<Future<Void>> _futures = null; transient ExecutorService _backround = null; public DRPCSpout(String function) { _function = function; if (DRPCClient.isLocalOverride()) { _local_drpc_id = DRPCClient.getOverrideServiceId(); } else { _local_drpc_id = null; } } //…… @Override public void open(Map<String, Object> conf, TopologyContext context, SpoutOutputCollector collector) { _collector = collector; if (_local_drpc_id == null) { _backround = new ExtendedThreadPoolExecutor(0, Integer.MAX_VALUE, 60L, TimeUnit.SECONDS, new SynchronousQueue<Runnable>()); _futures = new LinkedList<>(); int numTasks = context.getComponentTasks(context.getThisComponentId()).size(); int index = context.getThisTaskIndex(); int port = ObjectReader.getInt(conf.get(Config.DRPC_INVOCATIONS_PORT)); List<String> servers = (List<String>) conf.get(Config.DRPC_SERVERS); if (servers == null || servers.isEmpty()) { throw new RuntimeException(“No DRPC servers configured for topology”); } if (numTasks < servers.size()) { for (String s : servers) { _futures.add(_backround.submit(new Adder(s, port, conf))); } } else { int i = index % servers.size(); _futures.add(_backround.submit(new Adder(servers.get(i), port, conf))); } } } @Override public void close() { for (DRPCInvocationsClient client : _clients) { client.close(); } } @Override public void nextTuple() { if (_local_drpc_id == null) { int size = 0; synchronized (_clients) { size = _clients.size(); //This will only ever grow, so no need to worry about falling off the end } for (int i = 0; i < size; i++) { DRPCInvocationsClient client; synchronized (_clients) { client = _clients.get(i); } if (!client.isConnected()) { LOG.warn(“DRPCInvocationsClient [{}:{}] is not connected.”, client.getHost(), client.getPort()); reconnectAsync(client); continue; } try { DRPCRequest req = client.fetchRequest(_function); if (req.get_request_id().length() > 0) { Map<String, Object> returnInfo = new HashMap<>(); returnInfo.put(“id”, req.get_request_id()); returnInfo.put(“host”, client.getHost()); returnInfo.put(“port”, client.getPort()); _collector.emit(new Values(req.get_func_args(), JSONValue.toJSONString(returnInfo)), new DRPCMessageId(req.get_request_id(), i)); break; } } catch (AuthorizationException aze) { reconnectAsync(client); LOG.error(“Not authorized to fetch DRPC request from DRPC server”, aze); } catch (TException e) { reconnectAsync(client); LOG.error(“Failed to fetch DRPC request from DRPC server”, e); } catch (Exception e) { LOG.error(“Failed to fetch DRPC request from DRPC server”, e); } } checkFutures(); } else { //…… } } @Override public void ack(Object msgId) { } @Override public void fail(Object msgId) { DRPCMessageId did = (DRPCMessageId) msgId; DistributedRPCInvocations.Iface client; if (_local_drpc_id == null) { client = _clients.get(did.index); } else { client = (DistributedRPCInvocations.Iface) ServiceRegistry.getService(_local_drpc_id); } int retryCnt = 0; int maxRetries = 3; while (retryCnt < maxRetries) { retryCnt++; try { client.failRequest(did.id); break; } catch (AuthorizationException aze) { LOG.error(“Not authorized to failRequest from DRPC server”, aze); throw new RuntimeException(aze); } catch (TException tex) { if (retryCnt >= maxRetries) { LOG.error(“Failed to fail request”, tex); break; } reconnectSync((DRPCInvocationsClient) client); } } } @Override public void declareOutputFields(OutputFieldsDeclarer declarer) { declarer.declare(new Fields(“args”, “return-info”)); } //……}open的时候准备DRPCInvocationsClientnextTuple方法通过DRPCInvocationsClient.fetchRequest(_function)获取DRPCRequest信息之后构建returnInfo然后emit数据，msgId为DRPCMessageId，tuple为Values(req.get_func_args(), JSONValue.toJSONString(returnInfo))这里重写了fail方法，对于请求失败，进行重试，默认重试3次PrepareRequeststorm-2.0.0/storm-client/src/jvm/org/apache/storm/drpc/PrepareRequest.javapublic class PrepareRequest extends BaseBasicBolt { public static final String ARGS_STREAM = Utils.DEFAULT_STREAM_ID; public static final String RETURN_STREAM = “ret”; public static final String ID_STREAM = “id”; Random rand; @Override public void prepare(Map<String, Object> map, TopologyContext context) { rand = new Random(); } @Override public void execute(Tuple tuple, BasicOutputCollector collector) { String args = tuple.getString(0); String returnInfo = tuple.getString(1); long requestId = rand.nextLong(); collector.emit(ARGS_STREAM, new Values(requestId, args)); collector.emit(RETURN_STREAM, new Values(requestId, returnInfo)); collector.emit(ID_STREAM, new Values(requestId)); } public void declareOutputFields(OutputFieldsDeclarer declarer) { declarer.declareStream(ARGS_STREAM, new Fields(“request”, “args”)); declarer.declareStream(RETURN_STREAM, new Fields(“request”, “return”)); declarer.declareStream(ID_STREAM, new Fields(“request”)); }}PrepareRequest取出args及returnInfo，构造requestId，然后emit到ARGS_STREAM、RETURN_STREAM、ID_STREAM三个streamJoinResult会接收PrepareRequest的RETURN_STREAM，第一个CoordinatedBolt会接收ARGS_STREAMCoordinatedBoltstorm-2.0.0/storm-client/src/jvm/org/apache/storm/coordination/CoordinatedBolt.java/** * Coordination requires the request ids to be globally unique for awhile. This is so it doesn’t get confused in the case of retries. */public class CoordinatedBolt implements IRichBolt { private TimeCacheMap<Object, TrackingInfo> _tracked; //…… public void execute(Tuple tuple) { Object id = tuple.getValue(0); TrackingInfo track; TupleType type = getTupleType(tuple); synchronized (_tracked) { track = _tracked.get(id); if (track == null) { track = new TrackingInfo(); if (_idStreamSpec == null) { track.receivedId = true; } _tracked.put(id, track); } } if (type == TupleType.ID) { synchronized (_tracked) { track.receivedId = true; } checkFinishId(tuple, type); } else if (type == TupleType.COORD) { int count = (Integer) tuple.getValue(1); synchronized (_tracked) { track.reportCount++; track.expectedTupleCount += count; } checkFinishId(tuple, type); } else { synchronized (_tracked) { _delegate.execute(tuple); } } } public void declareOutputFields(OutputFieldsDeclarer declarer) { _delegate.declareOutputFields(declarer); declarer.declareStream(Constants.COORDINATED_STREAM_ID, true, new Fields(“id”, “count”)); } //…… public static class TrackingInfo { int reportCount = 0; int expectedTupleCount = 0; int receivedTuples = 0; boolean failed = false; Map<Integer, Integer> taskEmittedTuples = new HashMap<>(); boolean receivedId = false; boolean finished = false; List<Tuple> ackTuples = new ArrayList<>(); @Override public String toString() { return “reportCount: " + reportCount + “\n” + “expectedTupleCount: " + expectedTupleCount + “\n” + “receivedTuples: " + receivedTuples + “\n” + “failed: " + failed + “\n” + taskEmittedTuples.toString(); } }}CoordinatedBolt在declareOutputFields的时候，除了调用代理bolt的declareOutputFields外，还declareStream，给Constants.COORDINATED_STREAM_ID发射Fields(“id”, “count”)execute方法首先保证每个requestId都有一个TrackingInfo，它记录了expectedTupleCount以及receivedTuples统计数，还有taskEmittedTuples(这里命名有点歧义，其实是这里维护的是当前bolt发射给下游bolt的task的tuple数量，用于emitDirect告知下游bolt的task它应该接收到的tuple数量(具体是在checkFinishId方法中，在finished的时候发送)，下游bolt接收到该统计数之后更新expectedTupleCount)execute方法接收到的tuple有几类，一类是TupleType.ID(_idStreamSpec不为null的情况下)、一类是TupleType.COORD(接收Fields(“id”, “count”)，并执行checkFinishId，判断是否应该结束)、一类是TupleType.REGULAR(正常的执行bolt的execute方法)checkFinishId会判断track.reportCount == _numSourceReports以及track.expectedTupleCount == track.receivedTuples，如果满足条件则标记track.finished = true，同时通知下游bolt它应该接收到多少数量的tuple(如果还有的话)。JoinResultstorm-2.0.0/storm-client/src/jvm/org/apache/storm/drpc/JoinResult.javapublic class JoinResult extends BaseRichBolt { public static final Logger LOG = LoggerFactory.getLogger(JoinResult.class); String returnComponent; Map<Object, Tuple> returns = new HashMap<>(); Map<Object, Tuple> results = new HashMap<>(); OutputCollector _collector; public JoinResult(String returnComponent) { this.returnComponent = returnComponent; } public void prepare(Map<String, Object> map, TopologyContext context, OutputCollector collector) { _collector = collector; } public void execute(Tuple tuple) { Object requestId = tuple.getValue(0); if (tuple.getSourceComponent().equals(returnComponent)) { returns.put(requestId, tuple); } else { results.put(requestId, tuple); } if (returns.containsKey(requestId) && results.containsKey(requestId)) { Tuple result = results.remove(requestId); Tuple returner = returns.remove(requestId); LOG.debug(result.getValue(1).toString()); List<Tuple> anchors = new ArrayList<>(); anchors.add(result); anchors.add(returner); _collector.emit(anchors, new Values(”” + result.getValue(1), returner.getValue(1))); _collector.ack(result); _collector.ack(returner); } } public void declareOutputFields(OutputFieldsDeclarer declarer) { declarer.declare(new Fields(“result”, “return-info”)); }}如果tuple是PrepareRequest发送过来的，则将tuple放入returns，否则放入results之后判断returns及results两个map是否同时都有该requestId，如果有表示匹配出了结果，则往下游emit数据emit的第一个字段为result，第二个为returnInfoReturnResultsstorm-2.0.0/storm-client/src/jvm/org/apache/storm/drpc/ReturnResults.javapublic class ReturnResults extends BaseRichBolt { public static final Logger LOG = LoggerFactory.getLogger(ReturnResults.class); //ANY CHANGE TO THIS CODE MUST BE SERIALIZABLE COMPATIBLE OR THERE WILL BE PROBLEMS static final long serialVersionUID = -774882142710631591L; OutputCollector _collector; boolean local; Map<String, Object> _conf; Map<List, DRPCInvocationsClient> _clients = new HashMap<List, DRPCInvocationsClient>(); @Override public void prepare(Map<String, Object> topoConf, TopologyContext context, OutputCollector collector) { _conf = topoConf; _collector = collector; local = topoConf.get(Config.STORM_CLUSTER_MODE).equals(“local”); } @Override public void execute(Tuple input) { String result = (String) input.getValue(0); String returnInfo = (String) input.getValue(1); if (returnInfo != null) { Map<String, Object> retMap; try { retMap = (Map<String, Object>) JSONValue.parseWithException(returnInfo); } catch (ParseException e) { LOG.error(“Parseing returnInfo failed”, e); _collector.fail(input); return; } final String host = (String) retMap.get(“host”); final int port = ObjectReader.getInt(retMap.get(“port”)); String id = (String) retMap.get(“id”); DistributedRPCInvocations.Iface client; if (local) { client = (DistributedRPCInvocations.Iface) ServiceRegistry.getService(host); } else { List server = new ArrayList() {{ add(host); add(port); }}; if (!_clients.containsKey(server)) { try { _clients.put(server, new DRPCInvocationsClient(_conf, host, port)); } catch (TTransportException ex) { throw new RuntimeException(ex); } } client = _clients.get(server); } int retryCnt = 0; int maxRetries = 3; while (retryCnt < maxRetries) { retryCnt++; try { client.result(id, result); _collector.ack(input); break; } catch (AuthorizationException aze) { LOG.error(“Not authorized to return results to DRPC server”, aze); _collector.fail(input); throw new RuntimeException(aze); } catch (TException tex) { if (retryCnt >= maxRetries) { LOG.error(“Failed to return results to DRPC server”, tex); _collector.fail(input); } reconnectClient((DRPCInvocationsClient) client); } } } } private void reconnectClient(DRPCInvocationsClient client) { if (client instanceof DRPCInvocationsClient) { try { LOG.info(“reconnecting… “); client.reconnectClient(); //Blocking call } catch (TException e2) { LOG.error(“Failed to connect to DRPC server”, e2); } } } @Override public void cleanup() { for (DRPCInvocationsClient c : _clients.values()) { c.close(); } } public void declareOutputFields(OutputFieldsDeclarer declarer) { }}ReturnResults主要是将结果发送给请求的DRPCInvocationsClientreturnInfo里头包含了要将结果发送到的目标host、port，根据host、port构造DRPCInvocationsClient之后调用DRPCInvocationsClient.result(id, result)方法将结果返回，默认重试3次，如果是AuthorizationException则直接fail，如果成功则ack小结LinearDRPCTopologyBuilder在v0.9.1-incubating版本的时候被标记为@Deprecated(2012年月)，当时认为Trident的newDRPCStream的替代，不过这样的话要用drpc就得使用Trident，所以后来(2018年4月)移除掉该标志，在2.0.0, 1.1.3, 1.0.7, 1.2.2版本均已经不是废弃标记LinearDRPCTopologyBuilder包装组合了DRPCSpout、PrepareRequest、CoordinatedBolt、JoinResult、ReturnResults，对外暴露简单的api无需用户在构造这些componentDRPCSpout主要是构造args以及returnInfo信息；PrepareRequest将数据分流，发往ARGS_STREAM、RETURN_STREAM、ID_STREAM；CoordinatedBolt主要是保障这些bolt之间的tuple被完整传递及ack；JoinResult主要是匹配requestId及结果，将请求与响应的数据匹配上，然后发送到下游；ReturnResults根据returnInfo将数据返回给Client端使用LinearDRPCTopologyBuilder，对于第一个bolt，其输入为Fields(“request”, “args”)；对最后一个bolt要求输出字段为new Fields(“id”, “result”)；对于非最后一个bolt要求输出字段的第一个字段为id，即requestId，方便CoordinatedBolt进行追踪统计，确认bolt是否成功接收上游bolt发送的所有tuple。docDistributed RPCLinearDRPCTopologyBuilder DeprecatedLinearDRPCTopologyBuilder is deprecated - what to use insteadLinearDRPCTopologyBuilder shouldn’t be deprecatedTwitter Storm源代码分析之CoordinatedBolt ...

聊聊storm的JoinBolt

序本文主要研究一下storm的JoinBolt实例 @Test public void testJoinBolt() throws InvalidTopologyException, AuthorizationException, AlreadyAliveException { TopologyBuilder builder = new TopologyBuilder(); builder.setSpout(“uuid-spout”, new RandomWordSpout(new String[]{“uuid”, “timestamp”}), 1); builder.setSpout(“word-spout”, new RandomWordSpout(new String[]{“word”, “timestamp”}), 1); JoinBolt joinBolt = new JoinBolt(“uuid-spout”, “timestamp”) //from priorStream inner join newStream on newStream.field = priorStream.field1 .join(“word-spout”, “timestamp”, “uuid-spout”) .select(“uuid,word,timestamp”) .withTumblingWindow(BaseWindowedBolt.Count.of(10)); builder.setBolt(“join”, joinBolt,1) .fieldsGrouping(“uuid-spout”,new Fields(“timestamp”)) .fieldsGrouping(“word-spout”,new Fields(“timestamp”)); builder.setBolt(“fileWriter”,new FilePrinterBolt(),1).globalGrouping(“join”); SubmitHelper.submitRemote(“windowTopology”,builder.createTopology()); }JoinBoltstorm-2.0.0/storm-client/src/jvm/org/apache/storm/bolt/JoinBolt.javapublic class JoinBolt extends BaseWindowedBolt { protected final Selector selectorType; // Map[StreamName -> JoinInfo] protected LinkedHashMap<String, JoinInfo> joinCriteria = new LinkedHashMap<>(); protected FieldSelector[] outputFields; // specified via bolt.select() … used in declaring Output fields // protected String[] dotSeparatedOutputFieldNames; // fieldNames in x.y.z format w/o stream name, used for naming output fields protected String outputStreamName; // Map[StreamName -> Map[Key -> List<Tuple>] ] HashMap<String, HashMap<Object, ArrayList<Tuple>>> hashedInputs = new HashMap<>(); // holds remaining streams private OutputCollector collector; /** * Calls JoinBolt(Selector.SOURCE, sourceId, fieldName) * * @param sourceId Id of source component (spout/bolt) from which this bolt is receiving data * @param fieldName the field to use for joining the stream (x.y.z format) / public JoinBolt(String sourceId, String fieldName) { this(Selector.SOURCE, sourceId, fieldName); } /* * Introduces the first stream to start the join with. Equivalent SQL … select …. from srcOrStreamId … * * @param type Specifies whether ‘srcOrStreamId’ refers to stream name/source component * @param srcOrStreamId name of stream OR source component * @param fieldName the field to use for joining the stream (x.y.z format) / public JoinBolt(Selector type, String srcOrStreamId, String fieldName) { selectorType = type; joinCriteria.put(srcOrStreamId, new JoinInfo(new FieldSelector(srcOrStreamId, fieldName))); } /* * Optional. Allows naming the output stream of this bolt. If not specified, the emits will happen on ‘default’ stream. / public JoinBolt withOutputStream(String streamName) { this.outputStreamName = streamName; return this; } /* * Performs inner Join with the newStream. SQL : from priorStream inner join newStream on newStream.field = priorStream.field1 same * as: new WindowedQueryBolt(priorStream,field1). join(newStream, field, priorStream); * * Note: priorStream must be previously joined. Valid ex: new WindowedQueryBolt(s1,k1). join(s2,k2, s1). join(s3,k3, s2); Invalid ex: * new WindowedQueryBolt(s1,k1). join(s3,k3, s2). join(s2,k2, s1); * * @param newStream Either stream name or name of upstream component * @param field the field on which to perform the join / public JoinBolt join(String newStream, String field, String priorStream) { return joinCommon(newStream, field, priorStream, JoinType.INNER); } /* * Performs left Join with the newStream. SQL : from stream1 left join stream2 on stream2.field = stream1.field1 same as: new * WindowedQueryBolt(stream1, field1). leftJoin(stream2, field, stream1); * * Note: priorStream must be previously joined Valid ex: new WindowedQueryBolt(s1,k1). leftJoin(s2,k2, s1). leftJoin(s3,k3, s2); * Invalid ex: new WindowedQueryBolt(s1,k1). leftJoin(s3,k3, s2). leftJoin(s2,k2, s1); * * @param newStream Either a name of a stream or an upstream component * @param field the field on which to perform the join / public JoinBolt leftJoin(String newStream, String field, String priorStream) { return joinCommon(newStream, field, priorStream, JoinType.LEFT); } private JoinBolt joinCommon(String newStream, String fieldDescriptor, String priorStream, JoinType joinType) { if (hashedInputs.containsKey(newStream)) { throw new IllegalArgumentException("’" + newStream + “’ is already part of join. Cannot join with it more than once.”); } hashedInputs.put(newStream, new HashMap<Object, ArrayList<Tuple>>()); JoinInfo joinInfo = joinCriteria.get(priorStream); if (joinInfo == null) { throw new IllegalArgumentException(“Stream ‘” + priorStream + “’ was not previously declared”); } FieldSelector field = new FieldSelector(newStream, fieldDescriptor); joinCriteria.put(newStream, new JoinInfo(field, priorStream, joinInfo, joinType)); return this; } /* * Specify projection fields. i.e. Specifies the fields to include in the output. e.g: .select(“field1, stream2:field2, field3”) Nested * Key names are supported for nested types: e.g: .select(“outerKey1.innerKey1, outerKey1.innerKey2, stream3:outerKey2.innerKey3)” Inner * types (non leaf) must be Map<> in order to support nested lookup using this dot notation This selected fields implicitly declare the * output fieldNames for the bolt based. * * @param commaSeparatedKeys * @return */ public JoinBolt select(String commaSeparatedKeys) { String[] fieldNames = commaSeparatedKeys.split(","); outputFields = new FieldSelector[fieldNames.length]; for (int i = 0; i < fieldNames.length; i++) { outputFields[i] = new FieldSelector(fieldNames[i]); } return this; } @Override public void declareOutputFields(OutputFieldsDeclarer declarer) { String[] outputFieldNames = new String[outputFields.length]; for (int i = 0; i < outputFields.length; ++i) { outputFieldNames[i] = outputFields[i].getOutputName(); } if (outputStreamName != null) { declarer.declareStream(outputStreamName, new Fields(outputFieldNames)); } else { declarer.declare(new Fields(outputFieldNames)); } } @Override public void prepare(Map<String, Object> topoConf, TopologyContext context, OutputCollector collector) { this.collector = collector; // initialize the hashedInputs data structure int i = 0; for (String stream : joinCriteria.keySet()) { if (i > 0) { hashedInputs.put(stream, new HashMap<Object, ArrayList<Tuple>>()); } ++i; } if (outputFields == null) { throw new IllegalArgumentException(“Must specify output fields via .select() method.”); } } @Override public void execute(TupleWindow inputWindow) { // 1) Perform Join List<Tuple> currentWindow = inputWindow.get(); JoinAccumulator joinResult = hashJoin(currentWindow); // 2) Emit results for (ResultRecord resultRecord : joinResult.getRecords()) { ArrayList<Object> outputTuple = resultRecord.getOutputFields(); if (outputStreamName == null) { // explicit anchoring emits to corresponding input tuples only, as default window anchoring will anchor them to all // tuples in window collector.emit(resultRecord.tupleList, outputTuple); } else { // explicitly anchor emits to corresponding input tuples only, as default window anchoring will anchor them to all tuples // in window collector.emit(outputStreamName, resultRecord.tupleList, outputTuple); } } } //……}JoinBolt继承了BaseWindowedBolt，定义了Selector selectorType、LinkedHashMap<String, JoinInfo> joinCriteria、FieldSelector[] outputFields等属性，用于记录关联类型及关联关系join、leftJoin方法用于设置join关联关系，最后都是调用joinCommon方法，关联关系使用JoinInfo对象，存储在joinCriteria中select方法用于选择结果集的列，最后设置到outputFields，用于declareOutputFieldsexecute就是join的核心逻辑了，这里调用了hashJoinJoinBolt.hashJoinstorm-2.0.0/storm-client/src/jvm/org/apache/storm/bolt/JoinBolt.java protected JoinAccumulator hashJoin(List<Tuple> tuples) { clearHashedInputs(); JoinAccumulator probe = new JoinAccumulator(); // 1) Build phase - Segregate tuples in the Window into streams. // First stream’s tuples go into probe, rest into HashMaps in hashedInputs String firstStream = joinCriteria.keySet().iterator().next(); for (Tuple tuple : tuples) { String streamId = getStreamSelector(tuple); if (!streamId.equals(firstStream)) { Object field = getJoinField(streamId, tuple); ArrayList<Tuple> recs = hashedInputs.get(streamId).get(field); if (recs == null) { recs = new ArrayList<Tuple>(); hashedInputs.get(streamId).put(field, recs); } recs.add(tuple); } else { ResultRecord probeRecord = new ResultRecord(tuple, joinCriteria.size() == 1); probe.insert(probeRecord); // first stream’s data goes into the probe } } // 2) Join the streams in order of streamJoinOrder int i = 0; for (String streamName : joinCriteria.keySet()) { boolean finalJoin = (i == joinCriteria.size() - 1); if (i > 0) { probe = doJoin(probe, hashedInputs.get(streamName), joinCriteria.get(streamName), finalJoin); } ++i; } return probe; }hashJoin方法先遍历一下tuples，把tuples分为两类，firstStream的数据存到JoinAccumulator probe中，其余的存到HashMap<String, HashMap<Object, ArrayList<Tuple>>> hashedInputs之后对剩余的streamId，挨个遍历调用doJoin，把结果整合到JoinAccumulator probeJoinAccumulatorstorm-2.0.0/storm-client/src/jvm/org/apache/storm/bolt/JoinBolt.java protected class JoinAccumulator { ArrayList<ResultRecord> records = new ArrayList<>(); public void insert(ResultRecord tuple) { records.add(tuple); } public Collection<ResultRecord> getRecords() { return records; } }JoinAccumulator就是一个ArrayList<ResultRecord>ResultRecordstorm-2.0.0/storm-client/src/jvm/org/apache/storm/bolt/JoinBolt.java // Join helper to concat fields to the record protected class ResultRecord { ArrayList<Tuple> tupleList = new ArrayList<>(); // contains one Tuple per Stream being joined ArrayList<Object> outFields = null; // refs to fields that will be part of output fields // ‘generateOutputFields’ enables us to avoid projection unless it is the final stream being joined public ResultRecord(Tuple tuple, boolean generateOutputFields) { tupleList.add(tuple); if (generateOutputFields) { outFields = doProjection(tupleList, outputFields); } } public ResultRecord(ResultRecord lhs, Tuple rhs, boolean generateOutputFields) { if (lhs != null) { tupleList.addAll(lhs.tupleList); } if (rhs != null) { tupleList.add(rhs); } if (generateOutputFields) { outFields = doProjection(tupleList, outputFields); } } public ArrayList<Object> getOutputFields() { return outFields; } // ‘stream’ cannot be null, public Object getField(FieldSelector fieldSelector) { for (Tuple tuple : tupleList) { Object result = lookupField(fieldSelector, tuple); if (result != null) { return result; } } return null; } } // Performs projection on the tuples based on ‘projectionFields’ protected ArrayList<Object> doProjection(ArrayList<Tuple> tuples, FieldSelector[] projectionFields) { ArrayList<Object> result = new ArrayList<>(projectionFields.length); // Todo: optimize this computation… perhaps inner loop can be outside to avoid rescanning tuples for (int i = 0; i < projectionFields.length; i++) { boolean missingField = true; for (Tuple tuple : tuples) { Object field = lookupField(projectionFields[i], tuple); if (field != null) { result.add(field); missingField = false; break; } } if (missingField) { // add a null for missing fields (usually in case of outer joins) result.add(null); } } return result; } // Extract the field from tuple. Field may be nested field (x.y.z) protected Object lookupField(FieldSelector fieldSelector, Tuple tuple) { // very stream name matches, it stream name was specified if (fieldSelector.streamName != null && !fieldSelector.streamName.equalsIgnoreCase(getStreamSelector(tuple))) { return null; } Object curr = null; for (int i = 0; i < fieldSelector.field.length; i++) { if (i == 0) { if (tuple.contains(fieldSelector.field[i])) { curr = tuple.getValueByField(fieldSelector.field[i]); } else { return null; } } else { curr = ((Map) curr).get(fieldSelector.field[i]); if (curr == null) { return null; } } } return curr; }ResultRecord用于存储joined之后的数据当joinCriteria.size() == 1或者finalJoin为true的时候，ResultRecord的generateOutputFields为true，会调用doProjection对结果集进行projection操作当遍历joinCriteria调用doJoin的时候，遍历到最后一条记录时为trueJoinBolt.doJoinstorm-2.0.0/storm-client/src/jvm/org/apache/storm/bolt/JoinBolt.java // Dispatches to the right join method (inner/left/right/outer) based on the joinInfo.joinType protected JoinAccumulator doJoin(JoinAccumulator probe, HashMap<Object, ArrayList<Tuple>> buildInput, JoinInfo joinInfo, boolean finalJoin) { final JoinType joinType = joinInfo.getJoinType(); switch (joinType) { case INNER: return doInnerJoin(probe, buildInput, joinInfo, finalJoin); case LEFT: return doLeftJoin(probe, buildInput, joinInfo, finalJoin); case RIGHT: case OUTER: default: throw new RuntimeException(“Unsupported join type : " + joinType.name()); } }doJoin封装了各种join类型的方法，目前仅仅实现了INNER以及LEFT，分别调用doInnerJoin、doLeftJoin方法doInnerJoinstorm-2.0.0/storm-client/src/jvm/org/apache/storm/bolt/JoinBolt.java // inner join - core implementation protected JoinAccumulator doInnerJoin(JoinAccumulator probe, Map<Object, ArrayList<Tuple>> buildInput, JoinInfo joinInfo, boolean finalJoin) { String[] probeKeyName = joinInfo.getOtherField(); JoinAccumulator result = new JoinAccumulator(); FieldSelector fieldSelector = new FieldSelector(joinInfo.other.getStreamName(), probeKeyName); for (ResultRecord rec : probe.getRecords()) { Object probeKey = rec.getField(fieldSelector); if (probeKey != null) { ArrayList<Tuple> matchingBuildRecs = buildInput.get(probeKey); if (matchingBuildRecs != null) { for (Tuple matchingRec : matchingBuildRecs) { ResultRecord mergedRecord = new ResultRecord(rec, matchingRec, finalJoin); result.insert(mergedRecord); } } } } return result; }这里挨个对JoinAccumulator probe的records遍历，然后通过probeKey从buildInput寻找对应的records，如果有找到则进行合并doLeftJoinstorm-2.0.0/storm-client/src/jvm/org/apache/storm/bolt/JoinBolt.java // left join - core implementation protected JoinAccumulator doLeftJoin(JoinAccumulator probe, Map<Object, ArrayList<Tuple>> buildInput, JoinInfo joinInfo, boolean finalJoin) { String[] probeKeyName = joinInfo.getOtherField(); JoinAccumulator result = new JoinAccumulator(); FieldSelector fieldSelector = new FieldSelector(joinInfo.other.getStreamName(), probeKeyName); for (ResultRecord rec : probe.getRecords()) { Object probeKey = rec.getField(fieldSelector); if (probeKey != null) { ArrayList<Tuple> matchingBuildRecs = buildInput.get(probeKey); // ok if its return null if (matchingBuildRecs != null && !matchingBuildRecs.isEmpty()) { for (Tuple matchingRec : matchingBuildRecs) { ResultRecord mergedRecord = new ResultRecord(rec, matchingRec, finalJoin); result.insert(mergedRecord); } } else { ResultRecord mergedRecord = new ResultRecord(rec, null, finalJoin); result.insert(mergedRecord); } } } return result; }left join与inner join的区别就在于没有找到匹配记录的话，仍旧保留左边的记录小结JoinBolt继承了BaseWindowedBolt，目前仅仅支持inner join及left join，而且要求join的字段与fieldsGrouping的字段相同JoinBolt对于多个stream数据的合并，使用分治的方式实现，采用JoinAccumulator不断累加结果集，循环遍历调用doJoin来完成由于JoinBolt是在内存进行操作，又需要匹配数据，需要消耗CPU及内存，有几个点需要注意一下：window的时间窗口不宜过大，否则内存堆积的数据过多，容易OOM，可根据情况调整时间窗口或者通过Config.TOPOLOGY_WORKER_MAX_HEAP_SIZE_MB设置woker的内存大小采取slding window会造成数据重复join，因而需要使用withTumblingWindow如果开启tuple处理超时，则要求Config.TOPOLOGY_MESSAGE_TIMEOUT_SECS大于windowLength + slidingInterval + 处理时间，避免还没有处理完就误判为超时重新replayed由于windowedBolt会自动对tupleWindow的数据进行anchor，数据量过多anchor操作会给整个topology造成压力，如无必要可以关闭ack(设置Config.TOPOLOGY_ACKER_EXECUTORS为0)Config.TOPOLOGY_MAX_SPOUT_PENDING要设置的大一点，给window的join操作及后续操作足够的时间，在一定程度上避免spout发送tuple速度过快，下游bolt消费不过来生产上Config.TOPOLOGY_DEBUG设置为false关闭debug日志，Config.TOPOLOGY_EVENTLOGGER_EXECUTORS设置为0关闭event loggerdocWindowing Support in Core StormJoining Streams in Storm Core ...

聊聊storm的WindowedBoltExecutor

序本文主要研究一下storm的WindowedBoltExecutorWindowedBoltExecutorstorm-2.0.0/storm-client/src/jvm/org/apache/storm/topology/WindowedBoltExecutor.java/** * An {@link IWindowedBolt} wrapper that does the windowing of tuples. /public class WindowedBoltExecutor implements IRichBolt { public static final String LATE_TUPLE_FIELD = “late_tuple”; private static final Logger LOG = LoggerFactory.getLogger(WindowedBoltExecutor.class); private static final int DEFAULT_WATERMARK_EVENT_INTERVAL_MS = 1000; // 1s private static final int DEFAULT_MAX_LAG_MS = 0; // no lag private final IWindowedBolt bolt; // package level for unit tests transient WaterMarkEventGenerator<Tuple> waterMarkEventGenerator; private transient WindowedOutputCollector windowedOutputCollector; private transient WindowLifecycleListener<Tuple> listener; private transient WindowManager<Tuple> windowManager; private transient int maxLagMs; private TimestampExtractor timestampExtractor; private transient String lateTupleStream; private transient TriggerPolicy<Tuple, ?> triggerPolicy; private transient EvictionPolicy<Tuple, ?> evictionPolicy; private transient Duration windowLengthDuration; public WindowedBoltExecutor(IWindowedBolt bolt) { this.bolt = bolt; timestampExtractor = bolt.getTimestampExtractor(); } @Override public void prepare(Map<String, Object> topoConf, TopologyContext context, OutputCollector collector) { doPrepare(topoConf, context, collector, new ConcurrentLinkedQueue<>(), false); } // NOTE: the queue has to be thread safe. protected void doPrepare(Map<String, Object> topoConf, TopologyContext context, OutputCollector collector, Collection<Event<Tuple>> queue, boolean stateful) { Objects.requireNonNull(topoConf); Objects.requireNonNull(context); Objects.requireNonNull(collector); Objects.requireNonNull(queue); this.windowedOutputCollector = new WindowedOutputCollector(collector); bolt.prepare(topoConf, context, windowedOutputCollector); this.listener = newWindowLifecycleListener(); this.windowManager = initWindowManager(listener, topoConf, context, queue, stateful); start(); LOG.info(“Initialized window manager {} “, windowManager); } @Override public void execute(Tuple input) { if (isTupleTs()) { long ts = timestampExtractor.extractTimestamp(input); if (waterMarkEventGenerator.track(input.getSourceGlobalStreamId(), ts)) { windowManager.add(input, ts); } else { if (lateTupleStream != null) { windowedOutputCollector.emit(lateTupleStream, input, new Values(input)); } else { LOG.info(“Received a late tuple {} with ts {}. This will not be processed.”, input, ts); } windowedOutputCollector.ack(input); } } else { windowManager.add(input); } } @Override public void cleanup() { if (waterMarkEventGenerator != null) { waterMarkEventGenerator.shutdown(); } windowManager.shutdown(); bolt.cleanup(); } // for unit tests WindowManager<Tuple> getWindowManager() { return windowManager; } @Override public void declareOutputFields(OutputFieldsDeclarer declarer) { String lateTupleStream = (String) getComponentConfiguration().get(Config.TOPOLOGY_BOLTS_LATE_TUPLE_STREAM); if (lateTupleStream != null) { declarer.declareStream(lateTupleStream, new Fields(LATE_TUPLE_FIELD)); } bolt.declareOutputFields(declarer); } @Override public Map<String, Object> getComponentConfiguration() { return bolt.getComponentConfiguration(); } //……}WindowedBoltExecutor实现了IRichBolt接口，在prepare的时候初始化windowedOutputCollector、listener、windowManager，调用了bolt.prepare；在cleanup的时候对waterMarkEventGenerator、windowManager、bolt进行清理；TopologyBuilder在setBolt的时候，对原始的IWindowedBolt的实现类进行了一次包装，用WindowedBoltExecutor替代declareOutputFields采用的是bolt.declareOutputFields(declarer)；getComponentConfiguration也返回的是bolt.getComponentConfiguration();execute方法主要是将tuple添加到windowManager，对于不纳入window的tuple则立刻进行ackWindowedOutputCollectorstorm-2.0.0/storm-client/src/jvm/org/apache/storm/topology/WindowedBoltExecutor.java /* * Creates an {@link OutputCollector} wrapper that automatically anchors the tuples to inputTuples while emitting. / private static class WindowedOutputCollector extends OutputCollector { private List<Tuple> inputTuples; WindowedOutputCollector(IOutputCollector delegate) { super(delegate); } void setContext(List<Tuple> inputTuples) { this.inputTuples = inputTuples; } @Override public List<Integer> emit(String streamId, List<Object> tuple) { return emit(streamId, inputTuples, tuple); } @Override public void emitDirect(int taskId, String streamId, List<Object> tuple) { emitDirect(taskId, streamId, inputTuples, tuple); } }WindowedOutputCollector继承了OutputCollector，可以看到这里重写了emit计emitDirect方法，默认对inputTuples进行anchorWindowLifecycleListenerstorm-2.0.0/storm-client/src/jvm/org/apache/storm/windowing/WindowLifecycleListener.java/* * A callback for expiry, activation of events tracked by the {@link WindowManager} * * @param <T> The type of Event in the window (e.g. Tuple). /public interface WindowLifecycleListener<T> { /* * Called on expiry of events from the window due to {@link EvictionPolicy} * * @param events the expired events / void onExpiry(List<T> events); /* * Called on activation of the window due to the {@link TriggerPolicy} * * @param events the list of current events in the window. * @param newEvents the newly added events since last activation. * @param expired the expired events since last activation. * @param referenceTime the reference (event or processing) time that resulted in activation / default void onActivation(List<T> events, List<T> newEvents, List<T> expired, Long referenceTime) { throw new UnsupportedOperationException(“Not implemented”); } /* * Called on activation of the window due to the {@link TriggerPolicy}. This is typically invoked when the windows are persisted in * state and is huge to be loaded entirely in memory. * * @param eventsIt a supplier of iterator over the list of current events in the window * @param newEventsIt a supplier of iterator over the newly added events since the last ativation * @param expiredIt a supplier of iterator over the expired events since the last activation * @param referenceTime the reference (event or processing) time that resulted in activation / default void onActivation(Supplier<Iterator<T>> eventsIt, Supplier<Iterator<T>> newEventsIt, Supplier<Iterator<T>> expiredIt, Long referenceTime) { throw new UnsupportedOperationException(“Not implemented”); }}WindowLifecycleListener定义了几个回调方法，分别是onExpiry、onActivation它们分别是由EvictionPolicy、TriggerPolicy两种策略来触发EvictionPolicystorm-2.0.0/storm-client/src/jvm/org/apache/storm/windowing/EvictionPolicy.java/* * Eviction policy tracks events and decides whether an event should be evicted from the window or not. * * @param <T> the type of event that is tracked. /public interface EvictionPolicy<T, S> { /* * Decides if an event should be expired from the window, processed in the current window or kept for later processing. * * @param event the input event * @return the {@link org.apache.storm.windowing.EvictionPolicy.Action} to be taken based on the input event / Action evict(Event<T> event); /* * Tracks the event to later decide whether {@link EvictionPolicy#evict(Event)} should evict it or not. * * @param event the input event to be tracked / void track(Event<T> event); /* * Returns the current context that is part of this eviction policy. * * @return the eviction context / EvictionContext getContext(); /* * Sets a context in the eviction policy that can be used while evicting the events. E.g. For TimeEvictionPolicy, this could be used to * set the reference timestamp. * * @param context the eviction context / void setContext(EvictionContext context); /* * Resets the eviction policy. / void reset(); /* * Return runtime state to be checkpointed by the framework for restoring the eviction policy in case of failures. * * @return the state / S getState(); /* * Restore the eviction policy from the state that was earlier checkpointed by the framework. * * @param state the state / void restoreState(S state); /* * The action to be taken when {@link EvictionPolicy#evict(Event)} is invoked. / public enum Action { /* * expire the event and remove it from the queue. / EXPIRE, /* * process the event in the current window of events. / PROCESS, /* * don’t include in the current window but keep the event in the queue for evaluating as a part of future windows. / KEEP, /* * stop processing the queue, there cannot be anymore events satisfying the eviction policy. / STOP }}EvictionPolicy主要负责追踪event，然后判断event是否该从window中移除EvictionPolicy有几个实现类：CountEvictionPolicy、TimeEvictionPolicy、WatermarkCountEvictionPolicy、WatermarkTimeEvictionPolicyTriggerPolicystorm-2.0.0/storm-client/src/jvm/org/apache/storm/windowing/TriggerPolicy.java/* * Triggers the window calculations based on the policy. * * @param <T> the type of the event that is tracked /public interface TriggerPolicy<T, S> { /* * Tracks the event and could use this to invoke the trigger. * * @param event the input event / void track(Event<T> event); /* * resets the trigger policy. / void reset(); /* * Starts the trigger policy. This can be used during recovery to start the triggers after recovery is complete. / void start(); /* * Any clean up could be handled here. / void shutdown(); /* * Return runtime state to be checkpointed by the framework for restoring the trigger policy in case of failures. * * @return the state / S getState(); /* * Restore the trigger policy from the state that was earlier checkpointed by the framework. * * @param state the state / void restoreState(S state);}TriggerPolicy主要是负责window的计算TriggerPolicy有几个实现类：CountTriggerPolicy、TimeTriggerPolicy、WatermarkCountTriggerPolicy、WatermarkTimeTriggerPolicyWindowedBoltExecutor.newWindowLifecycleListenerstorm-2.0.0/storm-client/src/jvm/org/apache/storm/topology/WindowedBoltExecutor.java protected WindowLifecycleListener<Tuple> newWindowLifecycleListener() { return new WindowLifecycleListener<Tuple>() { @Override public void onExpiry(List<Tuple> tuples) { for (Tuple tuple : tuples) { windowedOutputCollector.ack(tuple); } } @Override public void onActivation(List<Tuple> tuples, List<Tuple> newTuples, List<Tuple> expiredTuples, Long timestamp) { windowedOutputCollector.setContext(tuples); boltExecute(tuples, newTuples, expiredTuples, timestamp); } }; } protected void boltExecute(List<Tuple> tuples, List<Tuple> newTuples, List<Tuple> expiredTuples, Long timestamp) { bolt.execute(new TupleWindowImpl(tuples, newTuples, expiredTuples, getWindowStartTs(timestamp), timestamp)); }这里创建了一个匿名的WindowLifecycleListener实现在onExpiry的时候挨个对tuple进行ack，在onActivation的时候，调用了boltExecute，构造TupleWindowImpl，传递给bolt进行执行WindowedBoltExecutor.initWindowManagerstorm-2.0.0/storm-client/src/jvm/org/apache/storm/topology/WindowedBoltExecutor.java private WindowManager<Tuple> initWindowManager(WindowLifecycleListener<Tuple> lifecycleListener, Map<String, Object> topoConf, TopologyContext context, Collection<Event<Tuple>> queue, boolean stateful) { WindowManager<Tuple> manager = stateful ? new StatefulWindowManager<>(lifecycleListener, queue) : new WindowManager<>(lifecycleListener, queue); Count windowLengthCount = null; Duration slidingIntervalDuration = null; Count slidingIntervalCount = null; // window length if (topoConf.containsKey(Config.TOPOLOGY_BOLTS_WINDOW_LENGTH_COUNT)) { windowLengthCount = new Count(((Number) topoConf.get(Config.TOPOLOGY_BOLTS_WINDOW_LENGTH_COUNT)).intValue()); } else if (topoConf.containsKey(Config.TOPOLOGY_BOLTS_WINDOW_LENGTH_DURATION_MS)) { windowLengthDuration = new Duration( ((Number) topoConf.get(Config.TOPOLOGY_BOLTS_WINDOW_LENGTH_DURATION_MS)).intValue(), TimeUnit.MILLISECONDS); } // sliding interval if (topoConf.containsKey(Config.TOPOLOGY_BOLTS_SLIDING_INTERVAL_COUNT)) { slidingIntervalCount = new Count(((Number) topoConf.get(Config.TOPOLOGY_BOLTS_SLIDING_INTERVAL_COUNT)).intValue()); } else if (topoConf.containsKey(Config.TOPOLOGY_BOLTS_SLIDING_INTERVAL_DURATION_MS)) { slidingIntervalDuration = new Duration(((Number) topoConf.get(Config.TOPOLOGY_BOLTS_SLIDING_INTERVAL_DURATION_MS)).intValue(), TimeUnit.MILLISECONDS); } else { // default is a sliding window of count 1 slidingIntervalCount = new Count(1); } // tuple ts if (timestampExtractor != null) { // late tuple stream lateTupleStream = (String) topoConf.get(Config.TOPOLOGY_BOLTS_LATE_TUPLE_STREAM); if (lateTupleStream != null) { if (!context.getThisStreams().contains(lateTupleStream)) { throw new IllegalArgumentException( “Stream for late tuples must be defined with the builder method withLateTupleStream”); } } // max lag if (topoConf.containsKey(Config.TOPOLOGY_BOLTS_TUPLE_TIMESTAMP_MAX_LAG_MS)) { maxLagMs = ((Number) topoConf.get(Config.TOPOLOGY_BOLTS_TUPLE_TIMESTAMP_MAX_LAG_MS)).intValue(); } else { maxLagMs = DEFAULT_MAX_LAG_MS; } // watermark interval int watermarkInterval; if (topoConf.containsKey(Config.TOPOLOGY_BOLTS_WATERMARK_EVENT_INTERVAL_MS)) { watermarkInterval = ((Number) topoConf.get(Config.TOPOLOGY_BOLTS_WATERMARK_EVENT_INTERVAL_MS)).intValue(); } else { watermarkInterval = DEFAULT_WATERMARK_EVENT_INTERVAL_MS; } waterMarkEventGenerator = new WaterMarkEventGenerator<>(manager, watermarkInterval, maxLagMs, getComponentStreams(context)); } else { if (topoConf.containsKey(Config.TOPOLOGY_BOLTS_LATE_TUPLE_STREAM)) { throw new IllegalArgumentException(“Late tuple stream can be defined only when specifying a timestamp field”); } } // validate validate(topoConf, windowLengthCount, windowLengthDuration, slidingIntervalCount, slidingIntervalDuration); evictionPolicy = getEvictionPolicy(windowLengthCount, windowLengthDuration); triggerPolicy = getTriggerPolicy(slidingIntervalCount, slidingIntervalDuration, manager, evictionPolicy); manager.setEvictionPolicy(evictionPolicy); manager.setTriggerPolicy(triggerPolicy); return manager; } private EvictionPolicy<Tuple, ?> getEvictionPolicy(Count windowLengthCount, Duration windowLengthDuration) { if (windowLengthCount != null) { if (isTupleTs()) { return new WatermarkCountEvictionPolicy<>(windowLengthCount.value); } else { return new CountEvictionPolicy<>(windowLengthCount.value); } } else { if (isTupleTs()) { return new WatermarkTimeEvictionPolicy<>(windowLengthDuration.value, maxLagMs); } else { return new TimeEvictionPolicy<>(windowLengthDuration.value); } } } private TriggerPolicy<Tuple, ?> getTriggerPolicy(Count slidingIntervalCount, Duration slidingIntervalDuration, WindowManager<Tuple> manager, EvictionPolicy<Tuple, ?> evictionPolicy) { if (slidingIntervalCount != null) { if (isTupleTs()) { return new WatermarkCountTriggerPolicy<>(slidingIntervalCount.value, manager, evictionPolicy, manager); } else { return new CountTriggerPolicy<>(slidingIntervalCount.value, manager, evictionPolicy); } } else { if (isTupleTs()) { return new WatermarkTimeTriggerPolicy<>(slidingIntervalDuration.value, manager, evictionPolicy, manager); } else { return new TimeTriggerPolicy<>(slidingIntervalDuration.value, manager, evictionPolicy); } } }对于WindowedBoltExecutor来说，stateful为false，这里创建的是WindowManager这里默认的DEFAULT_MAX_LAG_MS为0，即没有lag，默认的DEFAULT_WATERMARK_EVENT_INTERVAL_MS为1000，即1秒这里根据windowLength及slidingInterval指定的参数类型，来获取相应的EvictionPolicy及TriggerPolicy，对于有配置timestampField的，参数是Duration的，则创建的是WatermarkTimeEvictionPolicy以及WatermarkTimeTriggerPolicyWindowManagerstorm-2.0.0/storm-client/src/jvm/org/apache/storm/windowing/WindowManager.java/* * Tracks a window of events and fires {@link WindowLifecycleListener} callbacks on expiry of events or activation of the window due to * {@link TriggerPolicy}. * * @param <T> the type of event in the window. /public class WindowManager<T> implements TriggerHandler { protected final Collection<Event<T>> queue; private final AtomicInteger eventsSinceLastExpiry; //…… /* * Add an event into the window, with the given ts as the tracking ts. * * @param event the event to track * @param ts the timestamp / public void add(T event, long ts) { add(new EventImpl<T>(event, ts)); } /* * Tracks a window event * * @param windowEvent the window event to track / public void add(Event<T> windowEvent) { // watermark events are not added to the queue. if (!windowEvent.isWatermark()) { queue.add(windowEvent); } else { LOG.debug(“Got watermark event with ts {}”, windowEvent.getTimestamp()); } track(windowEvent); compactWindow(); } /* * feed the event to the eviction and trigger policies for bookkeeping and optionally firing the trigger. / private void track(Event<T> windowEvent) { evictionPolicy.track(windowEvent); triggerPolicy.track(windowEvent); } /* * expires events that fall out of the window every EXPIRE_EVENTS_THRESHOLD so that the window does not grow too big. / protected void compactWindow() { if (eventsSinceLastExpiry.incrementAndGet() >= EXPIRE_EVENTS_THRESHOLD) { scanEvents(false); } } /* * Scan events in the queue, using the expiration policy to check if the event should be evicted or not. * * @param fullScan if set, will scan the entire queue; if not set, will stop as soon as an event not satisfying the expiration policy is * found * @return the list of events to be processed as a part of the current window / private List<Event<T>> scanEvents(boolean fullScan) { LOG.debug(“Scan events, eviction policy {}”, evictionPolicy); List<T> eventsToExpire = new ArrayList<>(); List<Event<T>> eventsToProcess = new ArrayList<>(); try { lock.lock(); Iterator<Event<T>> it = queue.iterator(); while (it.hasNext()) { Event<T> windowEvent = it.next(); Action action = evictionPolicy.evict(windowEvent); if (action == EXPIRE) { eventsToExpire.add(windowEvent.get()); it.remove(); } else if (!fullScan || action == STOP) { break; } else if (action == PROCESS) { eventsToProcess.add(windowEvent); } } expiredEvents.addAll(eventsToExpire); } finally { lock.unlock(); } eventsSinceLastExpiry.set(0); LOG.debug(”[{}] events expired from window.”, eventsToExpire.size()); if (!eventsToExpire.isEmpty()) { LOG.debug(“invoking windowLifecycleListener.onExpiry”); windowLifecycleListener.onExpiry(eventsToExpire); } return eventsToProcess; } //……}WindowedBoltExecutor的execute主要是将tuple添加到windowManagerEventImpl的isWatermark返回false，这里主要是执行track及compactWindow操作track主要是委托给evictionPolicy以及triggerPolicy进行track，compactWindow在events超过指定阈值的时候，会触发scanEvents，不是fullScan的话，检测到一个非过期的event就跳出遍历，然后检测eventsToExpire是否为空如果有则触发windowLifecycleListener.onExpiry(eventsToExpire);WaterMarkEventGeneratorstorm-2.0.0/storm-client/src/jvm/org/apache/storm/windowing/WaterMarkEventGenerator.java/* * Tracks tuples across input streams and periodically emits watermark events. Watermark event timestamp is the minimum of the latest tuple * timestamps across all the input streams (minus the lag). Once a watermark event is emitted any tuple coming with an earlier timestamp can * be considered as late events. /public class WaterMarkEventGenerator<T> implements Runnable { private static final Logger LOG = LoggerFactory.getLogger(WaterMarkEventGenerator.class); private final WindowManager<T> windowManager; private final int eventTsLag; private final Set<GlobalStreamId> inputStreams; private final Map<GlobalStreamId, Long> streamToTs; private final ScheduledExecutorService executorService; private final int interval; private ScheduledFuture<?> executorFuture; private volatile long lastWaterMarkTs; //…… public void start() { this.executorFuture = executorService.scheduleAtFixedRate(this, interval, interval, TimeUnit.MILLISECONDS); } @Override public void run() { try { long waterMarkTs = computeWaterMarkTs(); if (waterMarkTs > lastWaterMarkTs) { this.windowManager.add(new WaterMarkEvent<>(waterMarkTs)); lastWaterMarkTs = waterMarkTs; } } catch (Throwable th) { LOG.error(“Failed while processing watermark event “, th); throw th; } }}WindowedBoltExecutor在start的时候会调用WaterMarkEventGenerator的start方法该方法每隔watermarkInterval时间调度WaterMarkEventGenerator这个任务其run方法就是计算watermark(这批数据最小值-lag)，当大于lastWaterMarkTs时，更新lastWaterMarkTs，往windowManager添加WaterMarkEvent(该event的isWatermark为true)windowManager.add(new WaterMarkEvent<>(waterMarkTs))会触发triggerPolicy.track(windowEvent)以及compactWindow操作WatermarkTimeTriggerPolicy.trackstorm-2.0.0/storm-client/src/jvm/org/apache/storm/windowing/WatermarkTimeTriggerPolicy.java @Override public void track(Event<T> event) { if (started && event.isWatermark()) { handleWaterMarkEvent(event); } } /* * Invokes the trigger all pending windows up to the watermark timestamp. The end ts of the window is set in the eviction policy context * so that the events falling within that window can be processed. / private void handleWaterMarkEvent(Event<T> event) { long watermarkTs = event.getTimestamp(); long windowEndTs = nextWindowEndTs; LOG.debug(“Window end ts {} Watermark ts {}”, windowEndTs, watermarkTs); while (windowEndTs <= watermarkTs) { long currentCount = windowManager.getEventCount(windowEndTs); evictionPolicy.setContext(new DefaultEvictionContext(windowEndTs, currentCount)); if (handler.onTrigger()) { windowEndTs += slidingIntervalMs; } else { / * No events were found in the previous window interval. * Scan through the events in the queue to find the next * window intervals based on event ts. / long ts = getNextAlignedWindowTs(windowEndTs, watermarkTs); LOG.debug(“Next aligned window end ts {}”, ts); if (ts == Long.MAX_VALUE) { LOG.debug(“No events to process between {} and watermark ts {}”, windowEndTs, watermarkTs); break; } windowEndTs = ts; } } nextWindowEndTs = windowEndTs; } /* * Computes the next window by scanning the events in the window and finds the next aligned window between the startTs and endTs. Return * the end ts of the next aligned window, i.e. the ts when the window should fire. * * @param startTs the start timestamp (excluding) * @param endTs the end timestamp (including) * @return the aligned window end ts for the next window or Long.MAX_VALUE if there are no more events to be processed. / private long getNextAlignedWindowTs(long startTs, long endTs) { long nextTs = windowManager.getEarliestEventTs(startTs, endTs); if (nextTs == Long.MAX_VALUE || (nextTs % slidingIntervalMs == 0)) { return nextTs; } return nextTs + (slidingIntervalMs - (nextTs % slidingIntervalMs)); }handleWaterMarkEvent会触发handler.onTrigger()方法WindowManager.onTriggerstorm-2.0.0/storm-client/src/jvm/org/apache/storm/windowing/WindowManager.java /* * The callback invoked by the trigger policy. / @Override public boolean onTrigger() { List<Event<T>> windowEvents = null; List<T> expired = null; try { lock.lock(); / * scan the entire window to handle out of order events in * the case of time based windows. */ windowEvents = scanEvents(true); expired = new ArrayList<>(expiredEvents); expiredEvents.clear(); } finally { lock.unlock(); } List<T> events = new ArrayList<>(); List<T> newEvents = new ArrayList<>(); for (Event<T> event : windowEvents) { events.add(event.get()); if (!prevWindowEvents.contains(event)) { newEvents.add(event.get()); } } prevWindowEvents.clear(); if (!events.isEmpty()) { prevWindowEvents.addAll(windowEvents); LOG.debug(“invoking windowLifecycleListener onActivation, [{}] events in window.”, events.size()); windowLifecycleListener.onActivation(events, newEvents, expired, evictionPolicy.getContext().getReferenceTime()); } else { LOG.debug(“No events in the window, skipping onActivation”); } triggerPolicy.reset(); return !events.isEmpty(); }onTrigger方法主要是计算出三类数据，events、expiredEvents、newEvents当events不为空时，触发windowLifecycleListener.onActivation，也就是调用bolt的execute方法小结WindowedBoltExecutor实现了IRichBolt接口，是一个bolt，TopologyBuilder在setBolt的时候，对用户的IWindowedBolt的实现类进行了一次包装，用WindowedBoltExecutor替代，它改造了execute方法，对于该纳入windows的调用windowManager.add添加，该丢弃的则进行ack，而真正的bolt的execute操作，则需要等待window的触发WindowLifecycleListener有两个回调操作，一个是由EvictionPolicy触发的onExpiry，一个是由TriggerPolicy触发的onActivation操作由于window的windowLength及slidingInterval参数有Duration及Count两个维度，因而EvictionPolicy及TriggerPolicy也有这两类维度，外加watermark属性，因而每个policy分别有4个实现类，EvictionPolicy有几个实现类：CountEvictionPolicy、TimeEvictionPolicy、WatermarkCountEvictionPolicy、WatermarkTimeEvictionPolicy；TriggerPolicy有几个实现类：CountTriggerPolicy、TimeTriggerPolicy、WatermarkCountTriggerPolicy、WatermarkTimeTriggerPolicywindowManager.add除了把tuple保存起来外，还调用了两类trigger的track操作，然后进行compactWindow操作；WatermarkTimeEvictionPolicy的track目前没有操作，而WatermarkTimeTriggerPolicy的track方法在event是WaterMarkEvent的时候会触发window操作，调用WindowManager的onTrigger方法，进而筛选出window的数据，然后触发windowLifecycleListener.onActivation操作，最后触发windowedBolt的execute方法WindowManager的onTrigger方法以及add方法都会调用scanEvents，区别是前者是fullScan，后者不是；scanEvents会调用evictionPolicy.evict来判断是否该剔除tuple，进而触发windowLifecycleListener.onExpiry操作，该操作会对tuple进行ack，即过期的tuple在expired的时候会自动ack(理论上所有tuple都会过期，也就都会自动被ack，因而要求Config.TOPOLOGY_MESSAGE_TIMEOUT_SECS大于windowLength + slidingInterval，避免还没ack就被认为超时)WindowedBoltExecutor在start的时候会启动WaterMarkEventGenerator，它会注册一个定时任务，每隔watermarkInterval时间计算watermark(这批数据最小值-lag)，当大于lastWaterMarkTs时，更新lastWaterMarkTs，往windowManager添加WaterMarkEvent(该event的isWatermark为true)，整个WindowManager的onTrigger方法(即windowLifecycleListener.onActivation操作)就是靠这里来触发的关于ack的话，在WindowedBoltExecutor.execute方法对于未能进入window队列的，没有配置配置Config.TOPOLOGY_BOLTS_LATE_TUPLE_STREAM的话，则立马ack；在tuple过期的时候会自ack；WindowedBoltExecutor使用了WindowedOutputCollector，它继承了OutputCollector，对输入的tuples做anchor操作docWindowing Support in Core Storm ...

聊聊storm的reportError

序本文主要研究一下storm的reportErrorIErrorReporterstorm-2.0.0/storm-client/src/jvm/org/apache/storm/task/IErrorReporter.javapublic interface IErrorReporter { void reportError(Throwable error);}ISpoutOutputCollector、IOutputCollector、IBasicOutputCollector接口均继承了IErrorReporter接口ISpoutOutputCollectorstorm-core/1.2.2/storm-core-1.2.2-sources.jar!/org/apache/storm/spout/ISpoutOutputCollector.javapublic interface ISpoutOutputCollector extends IErrorReporter{ /** Returns the task ids that received the tuples. / List<Integer> emit(String streamId, List<Object> tuple, Object messageId); void emitDirect(int taskId, String streamId, List<Object> tuple, Object messageId); long getPendingCount();}ISpoutOutputCollector的实现类有SpoutOutputCollector、SpoutOutputCollectorImpl等IOutputCollectorstorm-2.0.0/storm-client/src/jvm/org/apache/storm/task/IOutputCollector.javapublic interface IOutputCollector extends IErrorReporter { /* * Returns the task ids that received the tuples. / List<Integer> emit(String streamId, Collection<Tuple> anchors, List<Object> tuple); void emitDirect(int taskId, String streamId, Collection<Tuple> anchors, List<Object> tuple); void ack(Tuple input); void fail(Tuple input); void resetTimeout(Tuple input); void flush();}IOutputCollector的实现类有OutputCollector、BoltOutputCollectorImpl等IBasicOutputCollectorstorm-2.0.0/storm-client/src/jvm/org/apache/storm/topology/IBasicOutputCollector.javapublic interface IBasicOutputCollector extends IErrorReporter { List<Integer> emit(String streamId, List<Object> tuple); void emitDirect(int taskId, String streamId, List<Object> tuple); void resetTimeout(Tuple tuple);}IBasicOutputCollector的实现类有BasicOutputCollectorreportErrorSpoutOutputCollectorImpl.reportErrorstorm-2.0.0/storm-client/src/jvm/org/apache/storm/executor/spout/SpoutOutputCollectorImpl.java @Override public void reportError(Throwable error) { executor.getErrorReportingMetrics().incrReportedErrorCount(); executor.getReportError().report(error); }BoltOutputCollectorImpl.reportErrorstorm-2.0.0/storm-client/src/jvm/org/apache/storm/executor/bolt/BoltOutputCollectorImpl.java @Override public void reportError(Throwable error) { executor.getErrorReportingMetrics().incrReportedErrorCount(); executor.getReportError().report(error); }可以看到SpoutOutputCollectorImpl及BoltOutputCollectorImpl的reportError方法，均调用了executor.getReportError().report(error);ReportError.reportstorm-2.0.0/storm-client/src/jvm/org/apache/storm/executor/error/ReportError.javapublic class ReportError implements IReportError { private static final Logger LOG = LoggerFactory.getLogger(ReportError.class); private final Map<String, Object> topoConf; private final IStormClusterState stormClusterState; private final String stormId; private final String componentId; private final WorkerTopologyContext workerTopologyContext; private int maxPerInterval; private int errorIntervalSecs; private AtomicInteger intervalStartTime; private AtomicInteger intervalErrors; public ReportError(Map<String, Object> topoConf, IStormClusterState stormClusterState, String stormId, String componentId, WorkerTopologyContext workerTopologyContext) { this.topoConf = topoConf; this.stormClusterState = stormClusterState; this.stormId = stormId; this.componentId = componentId; this.workerTopologyContext = workerTopologyContext; this.errorIntervalSecs = ObjectReader.getInt(topoConf.get(Config.TOPOLOGY_ERROR_THROTTLE_INTERVAL_SECS)); this.maxPerInterval = ObjectReader.getInt(topoConf.get(Config.TOPOLOGY_MAX_ERROR_REPORT_PER_INTERVAL)); this.intervalStartTime = new AtomicInteger(Time.currentTimeSecs()); this.intervalErrors = new AtomicInteger(0); } @Override public void report(Throwable error) { LOG.error(“Error”, error); if (Time.deltaSecs(intervalStartTime.get()) > errorIntervalSecs) { intervalErrors.set(0); intervalStartTime.set(Time.currentTimeSecs()); } if (intervalErrors.incrementAndGet() <= maxPerInterval) { try { stormClusterState.reportError(stormId, componentId, Utils.hostname(), workerTopologyContext.getThisWorkerPort().longValue(), error); } catch (UnknownHostException e) { throw Utils.wrapInRuntime(e); } } }}可以看到这里先判断interval是否需要重置，然后再判断error是否超过interval的最大次数，没有超过的话，则调用stormClusterState.reportError写入到存储，比如zkStormClusterStateImpl.reportErrorstorm-2.0.0/storm-client/src/jvm/org/apache/storm/cluster/StormClusterStateImpl.java @Override public void reportError(String stormId, String componentId, String node, Long port, Throwable error) { String path = ClusterUtils.errorPath(stormId, componentId); String lastErrorPath = ClusterUtils.lastErrorPath(stormId, componentId); ErrorInfo errorInfo = new ErrorInfo(ClusterUtils.stringifyError(error), Time.currentTimeSecs()); errorInfo.set_host(node); errorInfo.set_port(port.intValue()); byte[] serData = Utils.serialize(errorInfo); stateStorage.mkdirs(path, defaultAcls); stateStorage.create_sequential(path + ClusterUtils.ZK_SEPERATOR + “e”, serData, defaultAcls); stateStorage.set_data(lastErrorPath, serData, defaultAcls); List<String> childrens = stateStorage.get_children(path, false); Collections.sort(childrens, new Comparator<String>() { public int compare(String arg0, String arg1) { return Long.compare(Long.parseLong(arg0.substring(1)), Long.parseLong(arg1.substring(1))); } }); while (childrens.size() > 10) { String znodePath = path + ClusterUtils.ZK_SEPERATOR + childrens.remove(0); try { stateStorage.delete_node(znodePath); } catch (Exception e) { if (Utils.exceptionCauseIsInstanceOf(KeeperException.NoNodeException.class, e)) { // if the node is already deleted, do nothing LOG.warn(“Could not find the znode: {}”, znodePath); } else { throw e; } } } }这里使用ClusterUtils.errorPath(stormId, componentId)获取写入的目录，再通过ClusterUtils.lastErrorPath(stormId, componentId)获取写入的路径由于zk不适合存储大量数据，因而这里会判断如果childrens超过10的时候，会删除多余的节点，这里先按照节点名substring(1)升序排序，然后挨个删除ClusterUtils.errorPathstorm-2.0.0/storm-client/src/jvm/org/apache/storm/cluster/ClusterUtils.java public static final String ZK_SEPERATOR = “/”; public static final String ERRORS_ROOT = “errors”; public static final String ERRORS_SUBTREE = ZK_SEPERATOR + ERRORS_ROOT; public static String errorPath(String stormId, String componentId) { try { return errorStormRoot(stormId) + ZK_SEPERATOR + URLEncoder.encode(componentId, “UTF-8”); } catch (UnsupportedEncodingException e) { throw Utils.wrapInRuntime(e); } } public static String lastErrorPath(String stormId, String componentId) { return errorPath(stormId, componentId) + “-last-error”; } public static String errorStormRoot(String stormId) { return ERRORS_SUBTREE + ZK_SEPERATOR + stormId; }errorPath的路径为/errors/{stormId}/{componentId}，该目录下创建了以e开头的EPHEMERAL_SEQUENTIAL节点，error信息首先追加到该目录下，然后再判断如果超过10个则删除旧的节点lastErrorPath的路径为/errors/{stormId}/{componentId}-last-error，用于存储该componentId的最后一个errorzkCli查看[zk: localhost:2181(CONNECTED) 21] ls /storm/errors[DRPCStateQuery-1-1540185943, reportErrorDemo-1-1540260375][zk: localhost:2181(CONNECTED) 22] ls /storm/errors/reportErrorDemo-1-1540260375[print, print-last-error][zk: localhost:2181(CONNECTED) 23] ls /storm/errors/reportErrorDemo-1-1540260375/print[e0000000291, e0000000290, e0000000295, e0000000294, e0000000293, e0000000292, e0000000299, e0000000298, e0000000297, e0000000296][zk: localhost:2181(CONNECTED) 24] ls /storm/errors/reportErrorDemo-1-1540260375/print/e0000000299[][zk: localhost:2181(CONNECTED) 25] ls /storm/errors/reportErrorDemo-1-1540260375/print-last-error[]storm-uicurl -i http://192.168.99.100:8080/api/v1/topology/reportErrorDemo-1-1540260375?sys=falsestorm-ui请求了如上的接口，获取了topology相关的数据，其中spout或bolt中包括了lastError，展示了最近一个的error信息StormApiResourcestorm-2.0.0/storm-webapp/src/main/java/org/apache/storm/daemon/ui/resources/StormApiResource.java /* * /api/v1/topology -> topo. / @GET @Path("/topology/{id}") @AuthNimbusOp(value = “getTopology”, needsTopoId = true) @Produces(“application/json”) public Response getTopology(@PathParam(“id”) String id, @DefaultValue(":all-time") @QueryParam(“window”) String window, @QueryParam(“sys”) boolean sys, @QueryParam(callbackParameterName) String callback) throws TException { topologyPageRequestMeter.mark(); try (NimbusClient nimbusClient = NimbusClient.getConfiguredClient(config)) { return UIHelpers.makeStandardResponse( UIHelpers.getTopologySummary( nimbusClient.getClient().getTopologyPageInfo(id, window, sys), window, config, servletRequest.getRemoteUser() ), callback ); } }这里调用了nimbusClient.getClient().getTopologyPageInfo(id, window, sys)方法Nimbus.getTopologyPageInfostorm-2.0.0/storm-server/src/main/java/org/apache/storm/daemon/nimbus/Nimbus.java @Override public TopologyPageInfo getTopologyPageInfo(String topoId, String window, boolean includeSys) throws NotAliveException, AuthorizationException, TException { try { getTopologyPageInfoCalls.mark(); CommonTopoInfo common = getCommonTopoInfo(topoId, “getTopologyPageInfo”); String topoName = common.topoName; IStormClusterState state = stormClusterState; int launchTimeSecs = common.launchTimeSecs; Assignment assignment = common.assignment; Map<List<Integer>, Map<String, Object>> beats = common.beats; Map<Integer, String> taskToComp = common.taskToComponent; StormTopology topology = common.topology; Map<String, Object> topoConf = Utils.merge(conf, common.topoConf); StormBase base = common.base; if (base == null) { throw new WrappedNotAliveException(topoId); } Map<WorkerSlot, WorkerResources> workerToResources = getWorkerResourcesForTopology(topoId); List<WorkerSummary> workerSummaries = null; Map<List<Long>, List<Object>> exec2NodePort = new HashMap<>(); if (assignment != null) { Map<List<Long>, NodeInfo> execToNodeInfo = assignment.get_executor_node_port(); Map<String, String> nodeToHost = assignment.get_node_host(); for (Entry<List<Long>, NodeInfo> entry : execToNodeInfo.entrySet()) { NodeInfo ni = entry.getValue(); List<Object> nodePort = Arrays.asList(ni.get_node(), ni.get_port_iterator().next()); exec2NodePort.put(entry.getKey(), nodePort); } workerSummaries = StatsUtil.aggWorkerStats(topoId, topoName, taskToComp, beats, exec2NodePort, nodeToHost, workerToResources, includeSys, true); //this is the topology page, so we know the user is authorized } TopologyPageInfo topoPageInfo = StatsUtil.aggTopoExecsStats(topoId, exec2NodePort, taskToComp, beats, topology, window, includeSys, state); //…… return topoPageInfo; } catch (Exception e) { LOG.warn(“Get topo page info exception. (topology id=’{}’)”, topoId, e); if (e instanceof TException) { throw (TException) e; } throw new RuntimeException(e); } }这里调用了StatsUtil.aggTopoExecsStats来获取TopologyPageInfoStatsUtil.aggTopoExecsStatsstorm-2.0.0/storm-server/src/main/java/org/apache/storm/stats/StatsUtil.java /* * aggregate topo executors stats. * * @param topologyId topology id * @param exec2nodePort executor -> host+port * @param task2component task -> component * @param beats executor[start, end] -> executor heartbeat * @param topology storm topology * @param window the window to be aggregated * @param includeSys whether to include system streams * @param clusterState cluster state * @return TopologyPageInfo thrift structure / public static TopologyPageInfo aggTopoExecsStats( String topologyId, Map exec2nodePort, Map task2component, Map<List<Integer>, Map<String, Object>> beats, StormTopology topology, String window, boolean includeSys, IStormClusterState clusterState) { List<Map<String, Object>> beatList = extractDataFromHb(exec2nodePort, task2component, beats, includeSys, topology); Map<String, Object> topoStats = aggregateTopoStats(window, includeSys, beatList); return postAggregateTopoStats(task2component, exec2nodePort, topoStats, topologyId, clusterState); }StatsUtil.aggTopoExecsStats方法最后调用了postAggregateTopoStats方法StatsUtil.postAggregateTopoStatsstorm-2.0.0/storm-server/src/main/java/org/apache/storm/stats/StatsUtil.java private static TopologyPageInfo postAggregateTopoStats(Map task2comp, Map exec2nodePort, Map<String, Object> accData, String topologyId, IStormClusterState clusterState) { TopologyPageInfo ret = new TopologyPageInfo(topologyId); ret.set_num_tasks(task2comp.size()); ret.set_num_workers(((Set) accData.get(WORKERS_SET)).size()); ret.set_num_executors(exec2nodePort != null ? exec2nodePort.size() : 0); Map bolt2stats = ClientStatsUtil.getMapByKey(accData, BOLT_TO_STATS); Map<String, ComponentAggregateStats> aggBolt2stats = new HashMap<>(); for (Object o : bolt2stats.entrySet()) { Map.Entry e = (Map.Entry) o; Map m = (Map) e.getValue(); long executed = getByKeyOr0(m, EXECUTED).longValue(); if (executed > 0) { double execLatencyTotal = getByKeyOr0(m, EXEC_LAT_TOTAL).doubleValue(); m.put(EXEC_LATENCY, execLatencyTotal / executed); double procLatencyTotal = getByKeyOr0(m, PROC_LAT_TOTAL).doubleValue(); m.put(PROC_LATENCY, procLatencyTotal / executed); } m.remove(EXEC_LAT_TOTAL); m.remove(PROC_LAT_TOTAL); String id = (String) e.getKey(); m.put(“last-error”, getLastError(clusterState, topologyId, id)); aggBolt2stats.put(id, thriftifyBoltAggStats(m)); } //…… return ret; } private static ErrorInfo getLastError(IStormClusterState stormClusterState, String stormId, String compId) { return stormClusterState.lastError(stormId, compId); }这里有添加last-error，通过getLastError调用，之后再通过thriftifyBoltAggStats转化到thrift对象这里调用了stormClusterState.lastError(stormId, compId)获取last-errorUIHelpers.getTopologySummarystorm-2.0.0/storm-webapp/src/main/java/org/apache/storm/daemon/ui/UIHelpers.java /* * getTopologySummary. * @param topologyPageInfo topologyPageInfo * @param window window * @param config config * @param remoteUser remoteUser * @return getTopologySummary / public static Map<String, Object> getTopologySummary(TopologyPageInfo topologyPageInfo, String window, Map<String, Object> config, String remoteUser) { Map<String, Object> result = new HashMap(); Map<String, Object> topologyConf = (Map<String, Object>) JSONValue.parse(topologyPageInfo.get_topology_conf()); long messageTimeout = (long) topologyConf.get(Config.TOPOLOGY_MESSAGE_TIMEOUT_SECS); Map<String, Object> unpackedTopologyPageInfo = unpackTopologyInfo(topologyPageInfo, window, config); result.putAll(unpackedTopologyPageInfo); result.put(“user”, remoteUser); result.put(“window”, window); result.put(“windowHint”, getWindowHint(window)); result.put(“msgTimeout”, messageTimeout); result.put(“configuration”, topologyConf); result.put(“visualizationTable”, new ArrayList()); result.put(“schedulerDisplayResource”, config.get(DaemonConfig.SCHEDULER_DISPLAY_RESOURCE)); return result; }获取到TopologyPageInfo之后，UIHelpers.getTopologySummary对其进行unpackTopologyInfoUIHelpers.unpackTopologyInfostorm-2.0.0/storm-webapp/src/main/java/org/apache/storm/daemon/ui/UIHelpers.java /* * unpackTopologyInfo. * @param topologyPageInfo topologyPageInfo * @param window window * @param config config * @return unpackTopologyInfo / private static Map<String,Object> unpackTopologyInfo(TopologyPageInfo topologyPageInfo, String window, Map<String,Object> config) { Map<String, Object> result = new HashMap(); result.put(“id”, topologyPageInfo.get_id()); //…… Map<String, ComponentAggregateStats> spouts = topologyPageInfo.get_id_to_spout_agg_stats(); List<Map> spoutStats = new ArrayList(); for (Map.Entry<String, ComponentAggregateStats> spoutEntry : spouts.entrySet()) { spoutStats.add(getTopologySpoutAggStatsMap(spoutEntry.getValue(), spoutEntry.getKey())); } result.put(“spouts”, spoutStats); Map<String, ComponentAggregateStats> bolts = topologyPageInfo.get_id_to_bolt_agg_stats(); List<Map> boltStats = new ArrayList(); for (Map.Entry<String, ComponentAggregateStats> boltEntry : bolts.entrySet()) { boltStats.add(getTopologyBoltAggStatsMap(boltEntry.getValue(), boltEntry.getKey())); } result.put(“bolts”, boltStats); //…… result.put(“samplingPct”, samplingPct); result.put(“replicationCount”, topologyPageInfo.get_replication_count()); result.put(“topologyVersion”, topologyPageInfo.get_topology_version()); result.put(“stormVersion”, topologyPageInfo.get_storm_version()); return result; } /* * getTopologySpoutAggStatsMap. * @param componentAggregateStats componentAggregateStats * @param spoutId spoutId * @return getTopologySpoutAggStatsMap / private static Map<String, Object> getTopologySpoutAggStatsMap(ComponentAggregateStats componentAggregateStats, String spoutId) { Map<String, Object> result = new HashMap(); CommonAggregateStats commonStats = componentAggregateStats.get_common_stats(); result.putAll(getCommonAggStatsMap(commonStats)); result.put(“spoutId”, spoutId); result.put(“encodedSpoutId”, URLEncoder.encode(spoutId)); SpoutAggregateStats spoutAggregateStats = componentAggregateStats.get_specific_stats().get_spout(); result.put(“completeLatency”, spoutAggregateStats.get_complete_latency_ms()); ErrorInfo lastError = componentAggregateStats.get_last_error(); result.put(“lastError”, Objects.isNull(lastError) ? "" : getTruncatedErrorString(lastError.get_error())); return result; } /* * getTopologyBoltAggStatsMap. * @param componentAggregateStats componentAggregateStats * @param boltId boltId * @return getTopologyBoltAggStatsMap / private static Map<String, Object> getTopologyBoltAggStatsMap(ComponentAggregateStats componentAggregateStats, String boltId) { Map<String, Object> result = new HashMap(); CommonAggregateStats commonStats = componentAggregateStats.get_common_stats(); result.putAll(getCommonAggStatsMap(commonStats)); result.put(“boltId”, boltId); result.put(“encodedBoltId”, URLEncoder.encode(boltId)); BoltAggregateStats boltAggregateStats = componentAggregateStats.get_specific_stats().get_bolt(); result.put(“capacity”, StatsUtil.floatStr(boltAggregateStats.get_capacity())); result.put(“executeLatency”, StatsUtil.floatStr(boltAggregateStats.get_execute_latency_ms())); result.put(“executed”, boltAggregateStats.get_executed()); result.put(“processLatency”, StatsUtil.floatStr(boltAggregateStats.get_process_latency_ms())); ErrorInfo lastError = componentAggregateStats.get_last_error(); result.put(“lastError”, Objects.isNull(lastError) ? "" : getTruncatedErrorString(lastError.get_error())); return result; } /* * getTruncatedErrorString. * @param errorString errorString * @return getTruncatedErrorString */ private static String getTruncatedErrorString(String errorString) { return errorString.substring(0, Math.min(errorString.length(), 200)); }注意这里对spout调用了getTopologySpoutAggStatsMap，对bolt调用了getTopologyBoltAggStatsMap这两个方法对lastError都进行了getTruncatedErrorString处理，最大只substring(0,200)crash log2018-10-23 02:53:28.118 o.a.s.util Thread-10-print-executor[7 7] [ERROR] Async loop died!java.lang.RuntimeException: java.lang.ClassCastException: java.lang.String cannot be cast to java.lang.Integer at org.apache.storm.utils.DisruptorQueue.consumeBatchToCursor(DisruptorQueue.java:522) ~[storm-core-1.2.2.jar:1.2.2] at org.apache.storm.utils.DisruptorQueue.consumeBatchWhenAvailable(DisruptorQueue.java:487) ~[storm-core-1.2.2.jar:1.2.2] at org.apache.storm.disruptor$consume_batch_when_available.invoke(disruptor.clj:74) ~[storm-core-1.2.2.jar:1.2.2] at org.apache.storm.daemon.executor$fn__10795$fn__10808$fn__10861.invoke(executor.clj:861) ~[storm-core-1.2.2.jar:1.2.2] at org.apache.storm.util$async_loop$fn__553.invoke(util.clj:484) [storm-core-1.2.2.jar:1.2.2] at clojure.lang.AFn.run(AFn.java:22) [clojure-1.7.0.jar:?] at java.lang.Thread.run(Thread.java:748) [?:1.8.0_171]Caused by: java.lang.ClassCastException: java.lang.String cannot be cast to java.lang.Integer at org.apache.storm.tuple.TupleImpl.getInteger(TupleImpl.java:116) ~[storm-core-1.2.2.jar:1.2.2] at com.example.demo.error.ErrorPrintBolt.execute(ErrorPrintBolt.java:26) ~[stormjar.jar:?] at org.apache.storm.topology.BasicBoltExecutor.execute(BasicBoltExecutor.java:50) ~[storm-core-1.2.2.jar:1.2.2] at org.apache.storm.daemon.executor$fn__10795$tuple_action_fn__10797.invoke(executor.clj:739) ~[storm-core-1.2.2.jar:1.2.2] at org.apache.storm.daemon.executor$mk_task_receiver$fn__10716.invoke(executor.clj:468) ~[storm-core-1.2.2.jar:1.2.2] at org.apache.storm.disruptor$clojure_handler$reify__10135.onEvent(disruptor.clj:41) ~[storm-core-1.2.2.jar:1.2.2] at org.apache.storm.utils.DisruptorQueue.consumeBatchToCursor(DisruptorQueue.java:509) ~[storm-core-1.2.2.jar:1.2.2] … 6 more2018-10-23 02:53:28.129 o.a.s.d.executor Thread-10-print-executor[7 7] [ERROR]java.lang.RuntimeException: java.lang.ClassCastException: java.lang.String cannot be cast to java.lang.Integer at org.apache.storm.utils.DisruptorQueue.consumeBatchToCursor(DisruptorQueue.java:522) ~[storm-core-1.2.2.jar:1.2.2] at org.apache.storm.utils.DisruptorQueue.consumeBatchWhenAvailable(DisruptorQueue.java:487) ~[storm-core-1.2.2.jar:1.2.2] at org.apache.storm.disruptor$consume_batch_when_available.invoke(disruptor.clj:74) ~[storm-core-1.2.2.jar:1.2.2] at org.apache.storm.daemon.executor$fn__10795$fn__10808$fn__10861.invoke(executor.clj:861) ~[storm-core-1.2.2.jar:1.2.2] at org.apache.storm.util$async_loop$fn__553.invoke(util.clj:484) [storm-core-1.2.2.jar:1.2.2] at clojure.lang.AFn.run(AFn.java:22) [clojure-1.7.0.jar:?] at java.lang.Thread.run(Thread.java:748) [?:1.8.0_171]Caused by: java.lang.ClassCastException: java.lang.String cannot be cast to java.lang.Integer at org.apache.storm.tuple.TupleImpl.getInteger(TupleImpl.java:116) ~[storm-core-1.2.2.jar:1.2.2] at com.example.demo.error.ErrorPrintBolt.execute(ErrorPrintBolt.java:26) ~[stormjar.jar:?] at org.apache.storm.topology.BasicBoltExecutor.execute(BasicBoltExecutor.java:50) ~[storm-core-1.2.2.jar:1.2.2] at org.apache.storm.daemon.executor$fn__10795$tuple_action_fn__10797.invoke(executor.clj:739) ~[storm-core-1.2.2.jar:1.2.2] at org.apache.storm.daemon.executor$mk_task_receiver$fn__10716.invoke(executor.clj:468) ~[storm-core-1.2.2.jar:1.2.2] at org.apache.storm.disruptor$clojure_handler$reify__10135.onEvent(disruptor.clj:41) ~[storm-core-1.2.2.jar:1.2.2] at org.apache.storm.utils.DisruptorQueue.consumeBatchToCursor(DisruptorQueue.java:509) ~[storm-core-1.2.2.jar:1.2.2] … 6 more2018-10-23 02:53:28.175 o.a.s.util Thread-10-print-executor[7 7] [ERROR] Halting process: (“Worker died”)java.lang.RuntimeException: (“Worker died”) at org.apache.storm.util$exit_process_BANG_.doInvoke(util.clj:341) [storm-core-1.2.2.jar:1.2.2] at clojure.lang.RestFn.invoke(RestFn.java:423) [clojure-1.7.0.jar:?] at org.apache.storm.daemon.worker$fn__11404$fn__11405.invoke(worker.clj:792) [storm-core-1.2.2.jar:1.2.2] at org.apache.storm.daemon.executor$mk_executor_data$fn__10612$fn__10613.invoke(executor.clj:281) [storm-core-1.2.2.jar:1.2.2] at org.apache.storm.util$async_loop$fn__553.invoke(util.clj:494) [storm-core-1.2.2.jar:1.2.2] at clojure.lang.AFn.run(AFn.java:22) [clojure-1.7.0.jar:?] at java.lang.Thread.run(Thread.java:748) [?:1.8.0_171]2018-10-23 02:53:28.176 o.a.s.d.worker Thread-41 [INFO] Shutting down worker reportErrorDemo-2-1540263136 f9856902-cfe9-45c7-b675-93a29d3d3d36 67002018-10-23 02:53:28.177 o.a.s.d.worker Thread-41 [INFO] Terminating messaging context2018-10-23 02:53:28.177 o.a.s.d.worker Thread-41 [INFO] Shutting down executors2018-10-23 02:53:28.177 o.a.s.d.executor Thread-41 [INFO] Shutting down executor spout:[8 8]2018-10-23 02:53:28.182 o.a.s.util Thread-3-disruptor-executor[8 8]-send-queue [INFO] Async loop interrupted!2018-10-23 02:53:28.186 o.a.s.util Thread-4-spout-executor[8 8] [INFO] Async loop interrupted!2018-10-23 02:53:28.188 o.a.s.d.executor Thread-41 [INFO] Shut down executor spout:[8 8]2018-10-23 02:53:28.188 o.a.s.d.executor Thread-41 [INFO] Shutting down executor spout:[12 12]2018-10-23 02:53:28.189 o.a.s.util Thread-5-disruptor-executor[12 12]-send-queue [INFO] Async loop interrupted!2018-10-23 02:53:28.190 o.a.s.util Thread-6-spout-executor[12 12] [INFO] Async loop interrupted!2018-10-23 02:53:28.190 o.a.s.d.executor Thread-41 [INFO] Shut down executor spout:[12 12]2018-10-23 02:53:28.190 o.a.s.d.executor Thread-41 [INFO] Shutting down executor count:[2 2]2018-10-23 02:53:28.191 o.a.s.util Thread-7-disruptor-executor[2 2]-send-queue [INFO] Async loop interrupted!2018-10-23 02:53:28.193 o.a.s.util Thread-8-count-executor[2 2] [INFO] Async loop interrupted!2018-10-23 02:53:28.194 o.a.s.d.executor Thread-41 [INFO] Shut down executor count:[2 2]2018-10-23 02:53:28.194 o.a.s.d.executor Thread-41 [INFO] Shutting down executor print:[7 7]2018-10-23 02:53:28.196 o.a.s.util Thread-9-disruptor-executor[7 7]-send-queue [INFO] Async loop interrupted!小结spout或bolt的方法里头如果抛出异常会导致整个worker die掉，同时也会自动记录异常到zk但是代价就是worker die掉不断被重启reportError可以通过try catch结合使用，使得有异常之后，worker不会die掉，同时也把error信息记录起来；不过一个topology的同一个component也只记录最近10个异常，采用的是EPHEMERAL_SEQUENTIAL节点来保存，随着worker的die而销毁；lastError采用的是PERSISTENT节点。两者在topology被kill的时候相关信息都会被删掉。storm-ui展示了每个component的lastError信息，展示的时候错误信息的长度最大为200docWhat is the use of OutputCollector class’ reportError(Throwable) method? ...

storm drpc实例

序本文主要演示一下storm drpc实例配置version: ‘2’services: supervisor: image: storm container_name: supervisor command: storm supervisor -c storm.local.hostname=“192.168.99.100” -c drpc.servers=’[“192.168.99.100”]’ -c drpc.port=3772 -c drpc.invocations.port=3773 -c drpc.http.port=3774 depends_on: - nimbus - zookeeper links: - nimbus - zookeeper restart: always ports: - 6700:6700 - 6701:6701 - 6702:6702 - 6703:6703 - 8000:8000 drpc: image: storm container_name: drpc command: storm drpc -c storm.local.hostname=“192.168.99.100” -c drpc.port=3772 -c drpc.invocations.port=3773 -c drpc.http.port=3774 depends_on: - nimbus - supervisor - zookeeper links: - nimbus - supervisor - zookeeper restart: always ports: - 3772:3772 - 3773:3773 - 3774:3774这里对supervisor配置drpc.servers及drpc.port、drpc.invocations.port，好让worker通过drpc.invocations.port去访问drpc节点对于drpc服务，则暴露drpc.port(好让外部的DRPCClient访问)、drpc.invocations.port(让worker访问)TridentTopology @Test public void testDeployDRPCStateQuery() throws InterruptedException, TException { TridentTopology topology = new TridentTopology(); FixedBatchSpout spout = new FixedBatchSpout(new Fields(“sentence”), 3, new Values(“the cow jumped over the moon”), new Values(“the man went to the store and bought some candy”), new Values(“four score and seven years ago”), new Values(“how many apples can you eat”)); spout.setCycle(true); TridentState wordCounts = topology.newStream(“spout1”, spout) .each(new Fields(“sentence”), new Split(), new Fields(“word”)) .groupBy(new Fields(“word”)) //NOTE transforms a Stream into a TridentState object .persistentAggregate(new MemoryMapState.Factory(), new Count(), new Fields(“count”)) .parallelismHint(6); topology.newDRPCStream(“words”) .each(new Fields(“args”), new Split(), new Fields(“word”)) .groupBy(new Fields(“word”)) .stateQuery(wordCounts, new Fields(“word”), new MapGet(), new Fields(“count”)) .each(new Fields(“count”), new FilterNull()) .aggregate(new Fields(“count”), new Sum(), new Fields(“sum”)); StormTopology stormTopology = topology.build(); //远程提交 mvn clean package -Dmaven.test.skip=true //storm默认会使用System.getProperty(“storm.jar”)去取，如果不设定，就不能提交 System.setProperty(“storm.jar”,TOPOLOGY_JAR); Config conf = new Config(); conf.put(Config.NIMBUS_SEEDS,Arrays.asList(“192.168.99.100”)); //配置nimbus连接主机地址，比如：192.168.10.1 conf.put(Config.NIMBUS_THRIFT_PORT,6627);//配置nimbus连接端口，默认 6627 conf.put(Config.STORM_ZOOKEEPER_SERVERS, Arrays.asList(“192.168.99.100”)); //配置zookeeper连接主机地址，可以使用集合存放多个 conf.put(Config.STORM_ZOOKEEPER_PORT,2181); //配置zookeeper连接端口，默认2181 StormSubmitter.submitTopology(“DRPCStateQuery”, conf, stormTopology); }这里newStream创建了一个TridentState，然后newDRPCStream创建了一个DRPCStream，其stateQuery指定为前面创建的TridentState由于TridentState把结果存储到了MemoryMapState，因而这里的DRPCStream通过drpc进行stateQueryDRPCClient @Test public void testLaunchDrpcClient() throws TException { Config conf = new Config(); //NOTE 要设置Config.DRPC_THRIFT_TRANSPORT_PLUGIN属性，不然client直接跑空指针 conf.put(Config.DRPC_THRIFT_TRANSPORT_PLUGIN,SimpleTransportPlugin.class.getName()); conf.put(Config.STORM_NIMBUS_RETRY_TIMES,3); conf.put(Config.STORM_NIMBUS_RETRY_INTERVAL,10000); conf.put(Config.STORM_NIMBUS_RETRY_INTERVAL_CEILING,10000); conf.put(Config.DRPC_MAX_BUFFER_SIZE, 104857600); // 100M DRPCClient client = new DRPCClient(conf, “192.168.99.100”, 3772); System.out.println(client.execute(“words”, “cat dog the man”)); }注意这里的配置项不能少，否则会引发空指针Config.DRPC_THRIFT_TRANSPORT_PLUGIN这里使用的是SimpleTransportPlugin.class.getName()，虽然该类被废弃了，不过还可以跑通由于使用了SimpleTransportPlugin.class，因而这里要配置Config.DRPC_MAX_BUFFER_SIZEDRPCClient配置了drpc的地址及portclient.execute这里要传入newDRPCStream指定的function名称小结使用drpc的时候，需要通过storm drpc启动drpc server服务节点，另外要暴露两个端口，一个drpc.port是供外部DRPCClient调用，一个drpc.invocations.port是给worker来访问；drpc.http.port端口是暴露给http协议调用的(DRPCClient使用的是thrift协议调用)supervisor要配置drpc.servers、drpc.invocations.port，好让worker去访问到drpc serverDRPCClient使用drpc.port指定的端口来访问，另外client.execute这里要传入newDRPCStream指定的function名称docTrident TutorialDistributed RPCRunning Apache Storm Securely ...

[case42]聊聊storm的ack机制

序本文主要研究一下storm的ack机制实例SentenceSpoutpublic class AckSentenceSpout extends BaseRichSpout { private ConcurrentHashMap<UUID, Values> pending; private SpoutOutputCollector collector; private int index = 0; private String[] sentences = { “my dog has fleas”, “i like cold beverages”, “the dog ate my homework”, “don’t have a cow man”, “i don’t think i like fleas” }; @Override public void open(Map conf, TopologyContext context, SpoutOutputCollector collector) { this.collector = collector; this.pending = new ConcurrentHashMap<UUID, Values>(); } @Override public void nextTuple() { Values values = new Values(sentences[index]); UUID msgId = UUID.randomUUID(); this.pending.put(msgId, values);// this.collector.emit(values); //NOTE 这里要传入msgId this.collector.emit(values, msgId); index++; if (index >= sentences.length) { index = 0; } Utils.sleep(100); } @Override public void declareOutputFields(OutputFieldsDeclarer declarer) { declarer.declare(new Fields(“sentence”)); } @Override public void ack(Object msgId) { this.pending.remove(msgId); } //NOTE 对于ack是失败的，要重新发送 @Override public void fail(Object msgId) { this.collector.emit(this.pending.get(msgId), msgId); }}对spout来说，需要在emit的时候要指定msgId，然后需要缓存数据，在ack时删除，在fail的时候重新发送进行重试AckWordCountBoltpublic class AckWordCountBolt extends BaseRichBolt { private static final Logger LOGGER = LoggerFactory.getLogger(AckWordCountBolt.class); private OutputCollector collector; private HashMap<String, Long> counts = null; public void prepare(Map config, TopologyContext context, OutputCollector collector) { this.collector = collector; this.counts = new HashMap<String, Long>(); } public void execute(Tuple tuple) { try{ String word = tuple.getStringByField(“word”); Long count = this.counts.get(word); if(count == null){ count = 0L; } count++; this.counts.put(word, count); //NOTE 传入当前处理的tuple作为anchor this.collector.emit(tuple, new Values(word, count)); //NOTE 这里要自己ack this.collector.ack(tuple); }catch (Exception e){ LOGGER.error(e.getMessage(),e); //NOTE 处理异常要fail this.collector.fail(tuple); } } public void declareOutputFields(OutputFieldsDeclarer declarer) { declarer.declare(new Fields(“word”, “count”)); }}对于bolt来说，要做两件事情，一是要anchor，在emit的时候把输入及输出tuple连接起来，构建tuple tree；而要对处理完的tuple进行ack，失败进行fail操作源码解析SpoutOutputCollectorImpl.emitstorm-2.0.0/storm-client/src/jvm/org/apache/storm/executor/spout/SpoutOutputCollectorImpl.java @Override public List<Integer> emit(String streamId, List<Object> tuple, Object messageId) { try { return sendSpoutMsg(streamId, tuple, messageId, null); } catch (InterruptedException e) { LOG.warn(“Spout thread interrupted during emit().”); throw new RuntimeException(e); } } private List<Integer> sendSpoutMsg(String stream, List<Object> values, Object messageId, Integer outTaskId) throws InterruptedException { emittedCount.increment(); List<Integer> outTasks; if (outTaskId != null) { outTasks = taskData.getOutgoingTasks(outTaskId, stream, values); } else { outTasks = taskData.getOutgoingTasks(stream, values); } final boolean needAck = (messageId != null) && hasAckers; final List<Long> ackSeq = needAck ? new ArrayList<>() : null; final long rootId = needAck ? MessageId.generateId(random) : 0; for (int i = 0; i < outTasks.size(); i++) { // perf critical path. don’t use iterators. Integer t = outTasks.get(i); MessageId msgId; if (needAck) { long as = MessageId.generateId(random); msgId = MessageId.makeRootId(rootId, as); ackSeq.add(as); } else { msgId = MessageId.makeUnanchored(); } final TupleImpl tuple = new TupleImpl(executor.getWorkerTopologyContext(), values, executor.getComponentId(), this.taskId, stream, msgId); AddressedTuple adrTuple = new AddressedTuple(t, tuple); executor.getExecutorTransfer().tryTransfer(adrTuple, executor.getPendingEmits()); } if (isEventLoggers) { taskData.sendToEventLogger(executor, values, executor.getComponentId(), messageId, random, executor.getPendingEmits()); } if (needAck) { boolean sample = executor.samplerCheck(); TupleInfo info = new TupleInfo(); info.setTaskId(this.taskId); info.setStream(stream); info.setMessageId(messageId); if (isDebug) { info.setValues(values); } if (sample) { info.setTimestamp(System.currentTimeMillis()); } pending.put(rootId, info); List<Object> ackInitTuple = new Values(rootId, Utils.bitXorVals(ackSeq), this.taskId); taskData.sendUnanchored(Acker.ACKER_INIT_STREAM_ID, ackInitTuple, executor.getExecutorTransfer(), executor.getPendingEmits()); } else if (messageId != null) { // Reusing TupleInfo object as we directly call executor.ackSpoutMsg() & are not sending msgs. perf critical if (isDebug) { if (spoutExecutorThdId != Thread.currentThread().getId()) { throw new RuntimeException(“Detected background thread emitting tuples for the spout. " + “Spout Output Collector should only emit from the main spout executor thread.”); } } globalTupleInfo.clear(); globalTupleInfo.setStream(stream); globalTupleInfo.setValues(values); globalTupleInfo.setMessageId(messageId); globalTupleInfo.setTimestamp(0); globalTupleInfo.setId(“0:”); Long timeDelta = 0L; executor.ackSpoutMsg(executor, taskData, timeDelta, globalTupleInfo); } return outTasks; }对于needAck的，首先创建rootId，然后调用ackSeq.add(as)，之后触发taskData.sendUnanchored(Acker.ACKER_INIT_STREAM_ID, ackInitTuple, executor.getExecutorTransfer(), executor.getPendingEmits())操作BoltOutputCollectorImpl.ack&failstorm-2.0.0/storm-client/src/jvm/org/apache/storm/executor/bolt/BoltOutputCollectorImpl.java @Override public void ack(Tuple input) { if (!ackingEnabled) { return; } long ackValue = ((TupleImpl) input).getAckVal(); Map<Long, Long> anchorsToIds = input.getMessageId().getAnchorsToIds(); for (Map.Entry<Long, Long> entry : anchorsToIds.entrySet()) { task.sendUnanchored(Acker.ACKER_ACK_STREAM_ID, new Values(entry.getKey(), Utils.bitXor(entry.getValue(), ackValue)), executor.getExecutorTransfer(), executor.getPendingEmits()); } long delta = tupleTimeDelta((TupleImpl) input); if (isDebug) { LOG.info(“BOLT ack TASK: {} TIME: {} TUPLE: {}”, taskId, delta, input); } if (!task.getUserContext().getHooks().isEmpty()) { BoltAckInfo boltAckInfo = new BoltAckInfo(input, taskId, delta); boltAckInfo.applyOn(task.getUserContext()); } if (delta >= 0) { executor.getStats().boltAckedTuple(input.getSourceComponent(), input.getSourceStreamId(), delta, task.getTaskMetrics().getAcked(input.getSourceStreamId())); } } @Override public void fail(Tuple input) { if (!ackingEnabled) { return; } Set<Long> roots = input.getMessageId().getAnchors(); for (Long root : roots) { task.sendUnanchored(Acker.ACKER_FAIL_STREAM_ID, new Values(root), executor.getExecutorTransfer(), executor.getPendingEmits()); } long delta = tupleTimeDelta((TupleImpl) input); if (isDebug) { LOG.info(“BOLT fail TASK: {} TIME: {} TUPLE: {}”, taskId, delta, input); } BoltFailInfo boltFailInfo = new BoltFailInfo(input, taskId, delta); boltFailInfo.applyOn(task.getUserContext()); if (delta >= 0) { executor.getStats().boltFailedTuple(input.getSourceComponent(), input.getSourceStreamId(), delta, task.getTaskMetrics().getFailed(input.getSourceStreamId())); } }BoltOutputCollectorImpl的ack及fail均是调用task.sendUnanchored操作ack发送到Acker.ACKER_ACK_STREAM_ID，fail发送到Acker.ACKER_FAIL_STREAM_IDTask.sendUnanchoredstorm-2.0.0/storm-client/src/jvm/org/apache/storm/daemon/Task.java // Non Blocking call. If cannot emit to destination immediately, such tuples will be added to pendingEmits argument public void sendUnanchored(String stream, List<Object> values, ExecutorTransfer transfer, Queue<AddressedTuple> pendingEmits) { Tuple tuple = getTuple(stream, values); List<Integer> tasks = getOutgoingTasks(stream, values); for (Integer t : tasks) { AddressedTuple addressedTuple = new AddressedTuple(t, tuple); transfer.tryTransfer(addressedTuple, pendingEmits); } }这里调用了ExecutorTransfer.tryTransferExecutorTransfer.tryTransferstorm-2.0.0/storm-client/src/jvm/org/apache/storm/executor/ExecutorTransfer.java // adds addressedTuple to destination Q if it is not full. else adds to pendingEmits (if its not null) public boolean tryTransfer(AddressedTuple addressedTuple, Queue<AddressedTuple> pendingEmits) { if (isDebug) { LOG.info(“TRANSFERRING tuple {}”, addressedTuple); } JCQueue localQueue = getLocalQueue(addressedTuple); if (localQueue != null) { return tryTransferLocal(addressedTuple, localQueue, pendingEmits); } return workerData.tryTransferRemote(addressedTuple, pendingEmits, serializer); } /** * Adds tuple to localQueue (if overflow is empty). If localQueue is full adds to pendingEmits instead. pendingEmits can be null. * Returns false if unable to add to localQueue. */ public boolean tryTransferLocal(AddressedTuple tuple, JCQueue localQueue, Queue<AddressedTuple> pendingEmits) { workerData.checkSerialize(serializer, tuple); if (pendingEmits != null) { if (pendingEmits.isEmpty() && localQueue.tryPublish(tuple)) { queuesToFlush.set(tuple.dest - indexingBase, localQueue); return true; } else { pendingEmits.add(tuple); return false; } } else { return localQueue.tryPublish(tuple); } }这里先根据addressedTuple判断目标队列是否是本地，是的话，调用tryTransferLocal；不是的话，则调用workerData.tryTransferRemotetryTransferLocal操作，执行的localQueue.tryPublish，就是将数据放到JCQueue的recvQueue队列中workerData.tryTransferRemote的话，是通过WorkerTransfer将数据放到TransferDrainer，在flush的时候传输到远程的node节点StormCommon.systemTopologystorm-2.0.0/storm-client/src/jvm/org/apache/storm/daemon/StormCommon.java public static StormTopology systemTopology(Map<String, Object> topoConf, StormTopology topology) throws InvalidTopologyException { return _instance.systemTopologyImpl(topoConf, topology); } protected StormTopology systemTopologyImpl(Map<String, Object> topoConf, StormTopology topology) throws InvalidTopologyException { validateBasic(topology); StormTopology ret = topology.deepCopy(); addAcker(topoConf, ret); if (hasEventLoggers(topoConf)) { addEventLogger(topoConf, ret); } addMetricComponents(topoConf, ret); addSystemComponents(topoConf, ret); addMetricStreams(ret); addSystemStreams(ret); validateStructure(ret); return ret; } public static void addAcker(Map<String, Object> conf, StormTopology topology) { int ackerNum = ObjectReader.getInt(conf.get(Config.TOPOLOGY_ACKER_EXECUTORS), ObjectReader.getInt(conf.get(Config.TOPOLOGY_WORKERS))); Map<GlobalStreamId, Grouping> inputs = ackerInputs(topology); Map<String, StreamInfo> outputStreams = new HashMap<String, StreamInfo>(); outputStreams.put(Acker.ACKER_ACK_STREAM_ID, Thrift.directOutputFields(Arrays.asList(“id”, “time-delta-ms”))); outputStreams.put(Acker.ACKER_FAIL_STREAM_ID, Thrift.directOutputFields(Arrays.asList(“id”, “time-delta-ms”))); outputStreams.put(Acker.ACKER_RESET_TIMEOUT_STREAM_ID, Thrift.directOutputFields(Arrays.asList(“id”, “time-delta-ms”))); Map<String, Object> ackerConf = new HashMap<>(); ackerConf.put(Config.TOPOLOGY_TASKS, ackerNum); ackerConf.put(Config.TOPOLOGY_TICK_TUPLE_FREQ_SECS, ObjectReader.getInt(conf.get(Config.TOPOLOGY_MESSAGE_TIMEOUT_SECS))); Bolt acker = Thrift.prepareSerializedBoltDetails(inputs, makeAckerBolt(), outputStreams, ackerNum, ackerConf); for (Bolt bolt : topology.get_bolts().values()) { ComponentCommon common = bolt.get_common(); common.put_to_streams(Acker.ACKER_ACK_STREAM_ID, Thrift.outputFields(Arrays.asList(“id”, “ack-val”))); common.put_to_streams(Acker.ACKER_FAIL_STREAM_ID, Thrift.outputFields(Arrays.asList(“id”))); common.put_to_streams(Acker.ACKER_RESET_TIMEOUT_STREAM_ID, Thrift.outputFields(Arrays.asList(“id”))); } for (SpoutSpec spout : topology.get_spouts().values()) { ComponentCommon common = spout.get_common(); Map<String, Object> spoutConf = componentConf(spout); spoutConf.put(Config.TOPOLOGY_TICK_TUPLE_FREQ_SECS, ObjectReader.getInt(conf.get(Config.TOPOLOGY_MESSAGE_TIMEOUT_SECS))); common.set_json_conf(JSONValue.toJSONString(spoutConf)); common.put_to_streams(Acker.ACKER_INIT_STREAM_ID, Thrift.outputFields(Arrays.asList(“id”, “init-val”, “spout-task”))); common.put_to_inputs(Utils.getGlobalStreamId(Acker.ACKER_COMPONENT_ID, Acker.ACKER_ACK_STREAM_ID), Thrift.prepareDirectGrouping()); common.put_to_inputs(Utils.getGlobalStreamId(Acker.ACKER_COMPONENT_ID, Acker.ACKER_FAIL_STREAM_ID), Thrift.prepareDirectGrouping()); common.put_to_inputs(Utils.getGlobalStreamId(Acker.ACKER_COMPONENT_ID, Acker.ACKER_RESET_TIMEOUT_STREAM_ID), Thrift.prepareDirectGrouping()); } topology.put_to_bolts(Acker.ACKER_COMPONENT_ID, acker); } public static Map<GlobalStreamId, Grouping> ackerInputs(StormTopology topology) { Map<GlobalStreamId, Grouping> inputs = new HashMap<>(); Set<String> boltIds = topology.get_bolts().keySet(); Set<String> spoutIds = topology.get_spouts().keySet(); for (String id : spoutIds) { inputs.put(Utils.getGlobalStreamId(id, Acker.ACKER_INIT_STREAM_ID), Thrift.prepareFieldsGrouping(Arrays.asList(“id”))); } for (String id : boltIds) { inputs.put(Utils.getGlobalStreamId(id, Acker.ACKER_ACK_STREAM_ID), Thrift.prepareFieldsGrouping(Arrays.asList(“id”))); inputs.put(Utils.getGlobalStreamId(id, Acker.ACKER_FAIL_STREAM_ID), Thrift.prepareFieldsGrouping(Arrays.asList(“id”))); inputs.put(Utils.getGlobalStreamId(id, Acker.ACKER_RESET_TIMEOUT_STREAM_ID), Thrift.prepareFieldsGrouping(Arrays.asList(“id”))); } return inputs; } public static IBolt makeAckerBolt() { return _instance.makeAckerBoltImpl(); } public IBolt makeAckerBoltImpl() { return new Acker(); }WorkerState构造器里头调用了systemTopology方法，添加了一些系统的组件，比如Acker、MetricsConsumerBolt、SystemBoltaddAcker执行了创建ack的逻辑，ackerNum为ObjectReader.getInt(conf.get(Config.TOPOLOGY_ACKER_EXECUTORS), ObjectReader.getInt(conf.get(Config.TOPOLOGY_WORKERS)))，即如果Config.TOPOLOGY_ACKER_EXECUTORS没有配置，则取Config.TOPOLOGY_WORKERS的值这里对ack配置了Config.TOPOLOGY_TICK_TUPLE_FREQ_SECS，值为ObjectReader.getInt(conf.get(Config.TOPOLOGY_MESSAGE_TIMEOUT_SECS))，也就是Acker配置了tickTuple，Config.TOPOLOGY_MESSAGE_TIMEOUT_SECS的时候触发超时操作Thrift.prepareSerializedBoltDetails传入参数的时候，调用makeAckerBolt()方法，创建Ackerack里头对input及output配置了Acker.ACKER_ACK_STREAM_ID、Acker.ACKER_FAIL_STREAM_IDaddAcker对spout配置了Config.TOPOLOGY_TICK_TUPLE_FREQ_SECS，Acker.ACKER_ACK_STREAM_ID、Acker.ACKER_FAIL_STREAM_ID、Acker.ACKER_RESET_TIMEOUT_STREAM_IDAckerstorm-2.0.0/storm-client/src/jvm/org/apache/storm/daemon/Acker.javapublic class Acker implements IBolt { public static final String ACKER_COMPONENT_ID = “__acker”; public static final String ACKER_INIT_STREAM_ID = “__ack_init”; public static final String ACKER_ACK_STREAM_ID = “__ack_ack”; public static final String ACKER_FAIL_STREAM_ID = “__ack_fail”; public static final String ACKER_RESET_TIMEOUT_STREAM_ID = “__ack_reset_timeout”; public static final int TIMEOUT_BUCKET_NUM = 3; private static final Logger LOG = LoggerFactory.getLogger(Acker.class); private static final long serialVersionUID = 4430906880683183091L; private OutputCollector collector; private RotatingMap<Object, AckObject> pending; @Override public void prepare(Map<String, Object> topoConf, TopologyContext context, OutputCollector collector) { this.collector = collector; this.pending = new RotatingMap<>(TIMEOUT_BUCKET_NUM); } @Override public void execute(Tuple input) { if (TupleUtils.isTick(input)) { Map<Object, AckObject> tmp = pending.rotate(); LOG.debug(“Number of timeout tuples:{}”, tmp.size()); return; } boolean resetTimeout = false; String streamId = input.getSourceStreamId(); Object id = input.getValue(0); AckObject curr = pending.get(id); if (ACKER_INIT_STREAM_ID.equals(streamId)) { if (curr == null) { curr = new AckObject(); pending.put(id, curr); } curr.updateAck(input.getLong(1)); curr.spoutTask = input.getInteger(2); } else if (ACKER_ACK_STREAM_ID.equals(streamId)) { if (curr == null) { curr = new AckObject(); pending.put(id, curr); } curr.updateAck(input.getLong(1)); } else if (ACKER_FAIL_STREAM_ID.equals(streamId)) { // For the case that ack_fail message arrives before ack_init if (curr == null) { curr = new AckObject(); } curr.failed = true; pending.put(id, curr); } else if (ACKER_RESET_TIMEOUT_STREAM_ID.equals(streamId)) { resetTimeout = true; if (curr != null) { pending.put(id, curr); } //else if it has not been added yet, there is no reason time it out later on } else if (Constants.SYSTEM_FLUSH_STREAM_ID.equals(streamId)) { collector.flush(); return; } else { LOG.warn(“Unknown source stream {} from task-{}”, streamId, input.getSourceTask()); return; } int task = curr.spoutTask; if (task >= 0 && (curr.val == 0 || curr.failed || resetTimeout)) { Values tuple = new Values(id, getTimeDeltaMillis(curr.startTime)); if (curr.val == 0) { pending.remove(id); collector.emitDirect(task, ACKER_ACK_STREAM_ID, tuple); } else if (curr.failed) { pending.remove(id); collector.emitDirect(task, ACKER_FAIL_STREAM_ID, tuple); } else if (resetTimeout) { collector.emitDirect(task, ACKER_RESET_TIMEOUT_STREAM_ID, tuple); } else { throw new IllegalStateException(“The checks are inconsistent we reach what should be unreachable code.”); } } collector.ack(input); } @Override public void cleanup() { LOG.info(“Acker: cleanup successfully”); } private long getTimeDeltaMillis(long startTimeMillis) { return Time.currentTimeMillis() - startTimeMillis; } private static class AckObject { public long val = 0L; public long startTime = Time.currentTimeMillis(); public int spoutTask = -1; public boolean failed = false; // val xor value public void updateAck(Long value) { val = Utils.bitXor(val, value); } }}对于tickTuple，执行RotatingMap.rotate操作对于成功则调用AckObject的updateAck操作，对于失败的重新放回pending中最后判断，如果AckObject的val为0的话，表示整个tuple tree都操作成功，则往ACKER_ACK_STREAM_ID通知；如果是failed的则往ACKER_FAIL_STREAM_ID通知；如果是resetTimeout的则往ACKER_RESET_TIMEOUT_STREAM_ID通知SpoutExecutorstorm-2.0.0/storm-client/src/jvm/org/apache/storm/executor/spout/SpoutExecutor.javapublic class SpoutExecutor extends Executor { //…… @Override public void tupleActionFn(int taskId, TupleImpl tuple) throws Exception { String streamId = tuple.getSourceStreamId(); if (Constants.SYSTEM_FLUSH_STREAM_ID.equals(streamId)) { spoutOutputCollector.flush(); } else if (streamId.equals(Constants.SYSTEM_TICK_STREAM_ID)) { pending.rotate(); } else if (streamId.equals(Constants.METRICS_TICK_STREAM_ID)) { metricsTick(idToTask.get(taskId - idToTaskBase), tuple); } else if (streamId.equals(Constants.CREDENTIALS_CHANGED_STREAM_ID)) { Object spoutObj = idToTask.get(taskId - idToTaskBase).getTaskObject(); if (spoutObj instanceof ICredentialsListener) { ((ICredentialsListener) spoutObj).setCredentials((Map<String, String>) tuple.getValue(0)); } } else if (streamId.equals(Acker.ACKER_RESET_TIMEOUT_STREAM_ID)) { Long id = (Long) tuple.getValue(0); TupleInfo pendingForId = pending.get(id); if (pendingForId != null) { pending.put(id, pendingForId); } } else { Long id = (Long) tuple.getValue(0); Long timeDeltaMs = (Long) tuple.getValue(1); TupleInfo tupleInfo = pending.remove(id); if (tupleInfo != null && tupleInfo.getMessageId() != null) { if (taskId != tupleInfo.getTaskId()) { throw new RuntimeException(“Fatal error, mismatched task ids: " + taskId + " " + tupleInfo.getTaskId()); } Long timeDelta = null; if (hasAckers) { long startTimeMs = tupleInfo.getTimestamp(); if (startTimeMs != 0) { timeDelta = timeDeltaMs; } } if (streamId.equals(Acker.ACKER_ACK_STREAM_ID)) { ackSpoutMsg(this, idToTask.get(taskId - idToTaskBase), timeDelta, tupleInfo); } else if (streamId.equals(Acker.ACKER_FAIL_STREAM_ID)) { failSpoutMsg(this, idToTask.get(taskId - idToTaskBase), timeDelta, tupleInfo, “FAIL-STREAM”); } } } } public void ackSpoutMsg(SpoutExecutor executor, Task taskData, Long timeDelta, TupleInfo tupleInfo) { try { ISpout spout = (ISpout) taskData.getTaskObject(); int taskId = taskData.getTaskId(); if (executor.getIsDebug()) { LOG.info(“SPOUT Acking message {} {}”, tupleInfo.getId(), tupleInfo.getMessageId()); } spout.ack(tupleInfo.getMessageId()); if (!taskData.getUserContext().getHooks().isEmpty()) { // avoid allocating SpoutAckInfo obj if not necessary new SpoutAckInfo(tupleInfo.getMessageId(), taskId, timeDelta).applyOn(taskData.getUserContext()); } if (hasAckers && timeDelta != null) { executor.getStats().spoutAckedTuple(tupleInfo.getStream(), timeDelta, taskData.getTaskMetrics().getAcked(tupleInfo.getStream())); } } catch (Exception e) { throw Utils.wrapInRuntime(e); } } public void failSpoutMsg(SpoutExecutor executor, Task taskData, Long timeDelta, TupleInfo tupleInfo, String reason) { try { ISpout spout = (ISpout) taskData.getTaskObject(); int taskId = taskData.getTaskId(); if (executor.getIsDebug()) { LOG.info(“SPOUT Failing {} : {} REASON: {}”, tupleInfo.getId(), tupleInfo, reason); } spout.fail(tupleInfo.getMessageId()); new SpoutFailInfo(tupleInfo.getMessageId(), taskId, timeDelta).applyOn(taskData.getUserContext()); if (timeDelta != null) { executor.getStats().spoutFailedTuple(tupleInfo.getStream(), timeDelta, taskData.getTaskMetrics().getFailed(tupleInfo.getStream())); } } catch (Exception e) { throw Utils.wrapInRuntime(e); } }}SpoutExecutor在tupleActionFn里头，如果接收到ACKER_ACK_STREAM_ID，则进行ackSpoutMsg操作；如果接收到ACKER_FAIL_STREAM_ID，则进行failSpoutMsg操作SpoutExecutor的ackSpoutMsg及failSpoutMsg里头分别调用了具体spout的ack及fail方法，将ack的结果通知到原始的spout小结storm通过ack机制保证least once processing的语义storm在WorkerState构造器里头调用了systemTopology方法，对提交的topology添加了一些系统的组件，比如Acker、MetricsConsumerBolt、SystemBolt；addAcker里头添加了acker，也对spout进行了ack相关的配置spout的emit方法如果带messageId的话，则表示需要ack，然后会触发taskData.sendUnanchored(Acker.ACKER_INIT_STREAM_ID, ackInitTuple, executor.getExecutorTransfer(), executor.getPendingEmits())操作bolt通过BoltOutputCollectorImpl的ack或fail方法将ack信息发送出去，里头调用了task.sendUnanchored操作，而该操作是调用ExecutorTransfer.tryTransfer，将addressedTuple发送到目标队列(如果是远程node则会远程进行远程调用)，发送到的stream为Acker.ACKER_ACK_STREAM_ID或者Acker.ACKER_FAIL_STREAM_IDacker接收到Acker.ACKER_ACK_STREAM_ID调用AckObject的updateAck操作，对于Acker.ACKER_FAIL_STREAM_ID则重新放回pending中，然后对AckObject的val进行判断，如果为0的话，表示整个tuple tree都操作成功，则emitDirect往ACKER_ACK_STREAM_ID通知；如果是failed的则emitDirect往ACKER_FAIL_STREAM_ID通知对应的task；如果是resetTimeout的则往ACKER_RESET_TIMEOUT_STREAM_ID通知对应的taskSpoutExecutor接收到接收到ACKER_ACK_STREAM_ID，则进行ackSpoutMsg操作；接收到ACKER_FAIL_STREAM_ID，则进行failSpoutMsg操作；ackSpoutMsg及failSpoutMsg里头分别调用了具体spout的ack及fail方法，将ack的结果通知到原始的spoutdocJStorm Acker详解Guaranteeing Message Processingstorm ack机制流程详解Storm的ack机制在项目应用中的坑Storm可靠性实例解析——ack机制 ...

聊聊storm的tickTuple

序本文主要研究一下storm的tickTuple实例TickWordCountBoltpublic class TickWordCountBolt extends BaseBasicBolt { private static final Logger LOGGER = LoggerFactory.getLogger(TickWordCountBolt.class); Map<String, Integer> counts = new HashMap<String, Integer>(); @Override public Map<String, Object> getComponentConfiguration() { Config conf = new Config(); conf.put(Config.TOPOLOGY_TICK_TUPLE_FREQ_SECS, 10); return conf; } @Override public void execute(Tuple input, BasicOutputCollector collector) { if(TupleUtils.isTick(input)){ //execute tick logic LOGGER.info(“execute tick tuple, emit and clear counts”); counts.entrySet().stream() .forEach(entry -> collector.emit(new Values(entry.getKey(), entry.getValue()))); counts.clear(); }else{ String word = input.getString(0); Integer count = counts.get(word); if (count == null) count = 0; count++; counts.put(word, count); } } @Override public void declareOutputFields(OutputFieldsDeclarer declarer) { declarer.declare(new Fields(“word”, “count”)); }}使用tick的话，在execute方法里头要自己判断tuple类型，然后执行相应处理这里实例是重写getComponentConfiguration方法，直接new了一个conf，设置了Config.TOPOLOGY_TICK_TUPLE_FREQ_SECS参数tickTopology @Test public void testTickTuple() throws InvalidTopologyException, AuthorizationException, AlreadyAliveException { TopologyBuilder builder = new TopologyBuilder(); //并发度10 builder.setSpout(“spout”, new TestWordSpout(), 10); builder.setBolt(“count”, new TickWordCountBolt(), 5)// .addConfiguration(Config.TOPOLOGY_TICK_TUPLE_FREQ_SECS, 3) .fieldsGrouping(“spout”, new Fields(“word”)); builder.setBolt(“print”, new PrintBolt(), 1) .shuffleGrouping(“count”); SubmitHelper.submitRemote(“tickDemo”,builder); }除了重写getComponentConfiguration方法配置Config.TOPOLOGY_TICK_TUPLE_FREQ_SECS参数外，还可以在TopologyBuilder.setBolt之后调用addConfiguration方法在配置，这个配置会覆盖getComponentConfiguration方法的配置另外除了在bolt上配置，还可以在StormSubmitter.submitTopology时，对传入的conf配置Config.TOPOLOGY_TICK_TUPLE_FREQ_SECS参数，不过这个配置是全局的，作用于整个topology的所有bolt；当出现既有全局配置，又有bolt自己的配置时，作用范围小的优先。源码解析TupleUtils.isTickstorm-2.0.0/storm-client/src/jvm/org/apache/storm/utils/TupleUtils.java public static boolean isTick(Tuple tuple) { return tuple != null && Constants.SYSTEM_COMPONENT_ID.equals(tuple.getSourceComponent()) && Constants.SYSTEM_TICK_STREAM_ID.equals(tuple.getSourceStreamId()); }isTick是根据tuple的sourceComponent以及sourceStreamId来判断TopologyBuilder.setBoltstorm-2.0.0/storm-client/src/jvm/org/apache/storm/topology/TopologyBuilder.java /** * Define a new bolt in this topology with the specified amount of parallelism. * * @param id the id of this component. This id is referenced by other components that want to consume this bolt’s * outputs. * @param bolt the bolt * @param parallelism_hint the number of tasks that should be assigned to execute this bolt. Each task will run on a thread in a process * somewhere around the cluster. * @return use the returned object to declare the inputs to this component * * @throws IllegalArgumentException if {@code parallelism_hint} is not positive / public BoltDeclarer setBolt(String id, IRichBolt bolt, Number parallelism_hint) throws IllegalArgumentException { validateUnusedId(id); initCommon(id, bolt, parallelism_hint); _bolts.put(id, bolt); return new BoltGetter(id); } private void initCommon(String id, IComponent component, Number parallelism) throws IllegalArgumentException { ComponentCommon common = new ComponentCommon(); common.set_inputs(new HashMap<GlobalStreamId, Grouping>()); if (parallelism != null) { int dop = parallelism.intValue(); if (dop < 1) { throw new IllegalArgumentException(“Parallelism must be positive.”); } common.set_parallelism_hint(dop); } Map<String, Object> conf = component.getComponentConfiguration(); if (conf != null) { common.set_json_conf(JSONValue.toJSONString(conf)); } commons.put(id, common); }setBolt的时候调用了initCommon，这里调用了bolt的getComponentConfiguration，将其配置写入到commonsBoltGetter.addConfigurationstorm-2.0.0/storm-client/src/jvm/org/apache/storm/topology/TopologyBuilder.java protected class BoltGetter extends ConfigGetter<BoltDeclarer> implements BoltDeclarer { //…… }addConfiguration方法继承自BaseConfigurationDeclarerBaseConfigurationDeclarerstorm-2.0.0/storm-client/src/jvm/org/apache/storm/topology/BaseConfigurationDeclarer.javapublic abstract class BaseConfigurationDeclarer<T extends ComponentConfigurationDeclarer> implements ComponentConfigurationDeclarer<T> { @Override public T addConfiguration(String config, Object value) { Map<String, Object> configMap = new HashMap<>(); configMap.put(config, value); return addConfigurations(configMap); } //……}这里新建一个map，然后调用子类的addConfigurations，这里子类为ConfigGetterConfigGetter.addConfigurations protected class ConfigGetter<T extends ComponentConfigurationDeclarer> extends BaseConfigurationDeclarer<T> { String id; public ConfigGetter(String id) { this.id = id; } @SuppressWarnings(“unchecked”) @Override public T addConfigurations(Map<String, Object> conf) { if (conf != null) { if (conf.containsKey(Config.TOPOLOGY_KRYO_REGISTER)) { throw new IllegalArgumentException(“Cannot set serializations for a component using fluent API”); } if (!conf.isEmpty()) { String currConf = commons.get(id).get_json_conf(); commons.get(id).set_json_conf(mergeIntoJson(parseJson(currConf), conf)); } } return (T) this; } //…… } private static String mergeIntoJson(Map<String, Object> into, Map<String, Object> newMap) { Map<String, Object> res = new HashMap<>(into); res.putAll(newMap); return JSONValue.toJSONString(res); }可以看到这里从common获取配置，然后将自己的配置合并到component自身的配置中，也就是说addConfiguration的配置项会覆盖bolt在getComponentConfiguration方法中的配置Executor.normalizedComponentConfstorm-2.0.0/storm-client/src/jvm/org/apache/storm/executor/Executor.java private Map<String, Object> normalizedComponentConf( Map<String, Object> topoConf, WorkerTopologyContext topologyContext, String componentId) { List<String> keysToRemove = retrieveAllConfigKeys(); keysToRemove.remove(Config.TOPOLOGY_DEBUG); keysToRemove.remove(Config.TOPOLOGY_MAX_SPOUT_PENDING); keysToRemove.remove(Config.TOPOLOGY_MAX_TASK_PARALLELISM); keysToRemove.remove(Config.TOPOLOGY_TRANSACTIONAL_ID); keysToRemove.remove(Config.TOPOLOGY_TICK_TUPLE_FREQ_SECS); keysToRemove.remove(Config.TOPOLOGY_SLEEP_SPOUT_WAIT_STRATEGY_TIME_MS); keysToRemove.remove(Config.TOPOLOGY_SPOUT_WAIT_STRATEGY); keysToRemove.remove(Config.TOPOLOGY_BOLTS_WINDOW_LENGTH_COUNT); keysToRemove.remove(Config.TOPOLOGY_BOLTS_WINDOW_LENGTH_DURATION_MS); keysToRemove.remove(Config.TOPOLOGY_BOLTS_SLIDING_INTERVAL_COUNT); keysToRemove.remove(Config.TOPOLOGY_BOLTS_SLIDING_INTERVAL_DURATION_MS); keysToRemove.remove(Config.TOPOLOGY_BOLTS_TUPLE_TIMESTAMP_MAX_LAG_MS); keysToRemove.remove(Config.TOPOLOGY_BOLTS_MESSAGE_ID_FIELD_NAME); keysToRemove.remove(Config.TOPOLOGY_STATE_PROVIDER); keysToRemove.remove(Config.TOPOLOGY_STATE_PROVIDER_CONFIG); keysToRemove.remove(Config.TOPOLOGY_BOLTS_LATE_TUPLE_STREAM); Map<String, Object> componentConf; String specJsonConf = topologyContext.getComponentCommon(componentId).get_json_conf(); if (specJsonConf != null) { try { componentConf = (Map<String, Object>) JSONValue.parseWithException(specJsonConf); } catch (ParseException e) { throw new RuntimeException(e); } for (Object p : keysToRemove) { componentConf.remove(p); } } else { componentConf = new HashMap<>(); } Map<String, Object> ret = new HashMap<>(); ret.putAll(topoConf); ret.putAll(componentConf); return ret; }Executor在构造器里头会调用normalizedComponentConf合并一下配置对于componentConf移除掉topology的部分配置项，然后对返回值，先putAll(topoConf)再putAll(componentConf)，可以看到如果都有配置Config.TOPOLOGY_TICK_TUPLE_FREQ_SECS的话，componentConf的会覆盖掉topoConf的配置。Executor.setupTicksstorm-2.0.0/storm-client/src/jvm/org/apache/storm/executor/Executor.java protected void setupTicks(boolean isSpout) { final Integer tickTimeSecs = ObjectReader.getInt(topoConf.get(Config.TOPOLOGY_TICK_TUPLE_FREQ_SECS), null); if (tickTimeSecs != null) { boolean enableMessageTimeout = (Boolean) topoConf.get(Config.TOPOLOGY_ENABLE_MESSAGE_TIMEOUTS); if ((!Acker.ACKER_COMPONENT_ID.equals(componentId) && Utils.isSystemId(componentId)) || (!enableMessageTimeout && isSpout)) { LOG.info(“Timeouts disabled for executor {}:{}”, componentId, executorId); } else { StormTimer timerTask = workerData.getUserTimer(); timerTask.scheduleRecurring(tickTimeSecs, tickTimeSecs, () -> { TupleImpl tuple = new TupleImpl(workerTopologyContext, new Values(tickTimeSecs), Constants.SYSTEM_COMPONENT_ID, (int) Constants.SYSTEM_TASK_ID, Constants.SYSTEM_TICK_STREAM_ID); AddressedTuple tickTuple = new AddressedTuple(AddressedTuple.BROADCAST_DEST, tuple); try { receiveQueue.publish(tickTuple); receiveQueue.flush(); // avoid buffering } catch (InterruptedException e) { LOG.warn(“Thread interrupted when emitting tick tuple. Setting interrupt flag.”); Thread.currentThread().interrupt(); return; } } ); } } }这里的topoConf是topoConf与componentConf合并之后的配置，对满足条件的component设置timerTask可以看到这里new的TupleImpl的srcComponent设置为Constants.SYSTEM_COMPONENT_ID(__system)，taskId设置为Constants.SYSTEM_TASK_ID(-1)，streamId设置为Constants.SYSTEM_TICK_STREAM_ID(__tick)timerTask在调度的时候调用JCQueue(receiveQueue).publish(tickTuple)JCQueue.publish private final DirectInserter directInserter = new DirectInserter(this); /* * Blocking call. Retries till it can successfully publish the obj. Can be interrupted via Thread.interrupt(). / public void publish(Object obj) throws InterruptedException { Inserter inserter = getInserter(); inserter.publish(obj); } private Inserter getInserter() { Inserter inserter; if (producerBatchSz > 1) { inserter = thdLocalBatcher.get(); if (inserter == null) { BatchInserter b = new BatchInserter(this, producerBatchSz); inserter = b; thdLocalBatcher.set(b); } } else { inserter = directInserter; } return inserter; } private static class DirectInserter implements Inserter { private JCQueue q; public DirectInserter(JCQueue q) { this.q = q; } /* * Blocking call, that can be interrupted via Thread.interrupt / @Override public void publish(Object obj) throws InterruptedException { boolean inserted = q.tryPublishInternal(obj); int idleCount = 0; while (!inserted) { q.metrics.notifyInsertFailure(); if (idleCount == 0) { // check avoids multiple log msgs when in a idle loop LOG.debug(“Experiencing Back Pressure on recvQueue: ‘{}’. Entering BackPressure Wait”, q.getName()); } idleCount = q.backPressureWaitStrategy.idle(idleCount); if (Thread.interrupted()) { throw new InterruptedException(); } inserted = q.tryPublishInternal(obj); } } //…… } // Non Blocking. returns true/false indicating success/failure. Fails if full. private boolean tryPublishInternal(Object obj) { if (recvQueue.offer(obj)) { metrics.notifyArrivals(1); return true; } return false; }JCQueue.publish的时候调用inserter.publish，这里inserter可能是BatchInserter或DirectInserter，这里看一下DirectInserter的publish方法DirectInserter的publish方法调用了JCQueue.tryPublishInternal，而该方法调用的是recvQueue.offer(obj)，放入到recvQueue队列JCQueue.consumestorm-2.0.0/storm-client/src/jvm/org/apache/storm/utils/JCQueue.java /* * Non blocking. Returns immediately if Q is empty. Runs till Q is empty OR exitCond.keepRunning() return false. Returns number of * elements consumed from Q / public int consume(JCQueue.Consumer consumer, ExitCondition exitCond) { try { return consumeImpl(consumer, exitCond); } catch (InterruptedException e) { throw new RuntimeException(e); } } /* * Non blocking. Returns immediately if Q is empty. Returns number of elements consumed from Q * * @param consumer * @param exitCond */ private int consumeImpl(Consumer consumer, ExitCondition exitCond) throws InterruptedException { int drainCount = 0; while (exitCond.keepRunning()) { Object tuple = recvQueue.poll(); if (tuple == null) { break; } consumer.accept(tuple); ++drainCount; } int overflowDrainCount = 0; int limit = overflowQ.size(); while (exitCond.keepRunning() && (overflowDrainCount < limit)) { // 2nd cond prevents staying stuck with consuming overflow Object tuple = overflowQ.poll(); ++overflowDrainCount; consumer.accept(tuple); } int total = drainCount + overflowDrainCount; if (total > 0) { consumer.flush(); } return total; }在聊聊storm worker的executor与task这篇文章我们有看到executor的asyncLoop主要是调用Executor.call().call()方法，对于BoltExecutor.call则是调用JCQueue.consume方法，该方法调用的是recvQueue.poll()可以看到tickTuple与bolt的业务tuple是共用一个队列的小结关于tick的参数配置，有topology层面，有BoltDeclarer层面，也有bolt的getComponentConfiguration层面，三种方式，BoltDeclarer优先级最高，然后是bolt的getComponentConfiguration，最后是全局的topology层面的配置对于tickTuple，采用的是StormTimer进行调度，调度的时候，往bolt的JCQueue的publish方法，具体是是调用recvQueue.offer(obj)；而executor的asycLoop调用Executor.call().call()方法，对于BoltExecutor.call则是调用JCQueue.consume方法，该方法调用的是recvQueue.poll()因此可以看到timer只负责往队列发送tickTuple，至于触发的时间精度，不一定百分百精确，具体要看recvQueue队列的长度以及executor的消费能力doc关于Storm tickTick tuples within Stormstorm定时的三种方式及tick详解Apache Storm Design Pattern—Micro Batching聊聊storm worker的executor与task ...

聊聊storm的direct grouping

序本文主要研究一下storm的direct groupingdirect groupingdirect grouping是一种特殊的grouping，它是由上游的producer直接指定下游哪个task去接收它发射出来的tuple。direct grouping的使用有如下几个步骤：1、上游在prepare方法保存下游bolt的taskId列表public class SentenceDirectBolt extends BaseRichBolt { private static final Logger LOGGER = LoggerFactory.getLogger(SentenceDirectBolt.class); private OutputCollector collector; private List<Integer> taskIds; private int numCounterTasks; public void prepare(Map config, TopologyContext context, OutputCollector collector) { this.collector = collector; //NOTE 1 这里要取到下游的bolt的taskId，用于emitDirect时指定taskId this.taskIds = context.getComponentTasks(“count-bolt”); this.numCounterTasks = taskIds.size(); } //……}这里保存了下游的bolt的taskId列表，用于emitDirect时选择taskId2、上游在declareOutputFields使用declareStream声明streamIdpublic class SentenceDirectBolt extends BaseRichBolt { //…… public void declareOutputFields(OutputFieldsDeclarer declarer) { declarer.declare(new Fields(“word”)); //NOTE 2 这里要通过declareStream声明direct stream，并指定streamId declarer.declareStream(“directStreamDemo1”,true,new Fields(“word”)); declarer.declareStream(“directStreamDemo2”,true,new Fields(“word”)); }}这里声明了两个streamId，一个是directStreamDemo1，一个是directStreamDemo23、上游采用emitDirect指定下游taskId及streamIdpublic class SentenceDirectBolt extends BaseRichBolt { //…… public void execute(Tuple tuple) { String sentence = tuple.getStringByField(“sentence”); String[] words = sentence.split(" “); for(String word : words){ int targetTaskId = getWordCountTaskId(word); LOGGER.info(“word:{} choose taskId:{}",word,targetTaskId); // NOTE 3 这里指定发送给下游bolt的哪个taskId，同时指定streamId if(targetTaskId % 2 == 0){ this.collector.emitDirect(targetTaskId,“directStreamDemo1”,new Values(word)); }else{ this.collector.emitDirect(targetTaskId,“directStreamDemo2”,new Values(word)); } } this.collector.ack(tuple); }}这里使用emitDirect(int taskId, String streamId, List<Object> tuple)方法指定了下游的taskId以及要发送到的streamId4、下游使用directGrouping连接上游bolt及streamId @Test public void testDirectGrouping() throws InvalidTopologyException, AuthorizationException, AlreadyAliveException { TopologyBuilder builder = new TopologyBuilder(); builder.setSpout(“sentence-spout”, new SentenceSpout()); // SentenceSpout –> SplitSentenceBolt builder.setBolt(“split-bolt”, new SentenceDirectBolt()).shuffleGrouping(“sentence-spout”); // SplitSentenceBolt –> WordCountBolt //NOTE 4这里要指定上游的bolt以及要处理的streamId builder.setBolt(“count-bolt”, new WordCountBolt(),5).directGrouping(“split-bolt”,“directStreamDemo1”); // WordCountBolt –> ReportBolt builder.setBolt(“report-bolt”, new ReportBolt()).globalGrouping(“count-bolt”); submitRemote(builder); }这里count-bolt作为split-bolt的下游，使用了directGrouping，同时指定了要接收的streamId为directStreamDemo1小结direct grouping是一种特殊的grouping，它是由上游的producer直接指定下游哪个task去接收它发射出来的tuple。下游使用directGrouping连接上游同时指定要消费的streamId，上游在prepare的时候保存下游的taskId列表，然后在declareOutputFields的时候使用declareStream来声明streamId，最后在execute方法里头使用emitDirect(int taskId, String streamId, List<Object> tuple)方法指定了下游的taskId以及要发送到的streamIddocConceptsCommon Topology Patterns关于Storm Stream grouping ...

聊聊storm的CustomStreamGrouping

序本文主要研究一下storm的CustomStreamGroupingCustomStreamGroupingstorm-2.0.0/storm-client/src/jvm/org/apache/storm/grouping/CustomStreamGrouping.javapublic interface CustomStreamGrouping extends Serializable { /** * Tells the stream grouping at runtime the tasks in the target bolt. This information should be used in chooseTasks to determine the * target tasks. * * It also tells the grouping the metadata on the stream this grouping will be used on. / void prepare(WorkerTopologyContext context, GlobalStreamId stream, List<Integer> targetTasks); /* * This function implements a custom stream grouping. It takes in as input the number of tasks in the target bolt in prepare and returns * the tasks to send the tuples to. * * @param values the values to group on */ List<Integer> chooseTasks(int taskId, List<Object> values);}这里定义了prepare以及chooseTasks方法GrouperFactory里头定义了FieldsGrouper、GlobalGrouper、NoneGrouper、AllGrouper、BasicLoadAwareCustomStreamGrouping另外org.apache.storm.grouping包里头也定义了ShuffleGrouping、PartialKeyGrouping、LoadAwareShuffleGroupingFieldsGrouperstorm-2.0.0/storm-client/src/jvm/org/apache/storm/daemon/GrouperFactory.java public static class FieldsGrouper implements CustomStreamGrouping { private Fields outFields; private List<List<Integer>> targetTasks; private Fields groupFields; private int numTasks; public FieldsGrouper(Fields outFields, Grouping thriftGrouping) { this.outFields = outFields; this.groupFields = new Fields(Thrift.fieldGrouping(thriftGrouping)); } @Override public void prepare(WorkerTopologyContext context, GlobalStreamId stream, List<Integer> targetTasks) { this.targetTasks = new ArrayList<List<Integer>>(); for (Integer targetTask : targetTasks) { this.targetTasks.add(Collections.singletonList(targetTask)); } this.numTasks = targetTasks.size(); } @Override public List<Integer> chooseTasks(int taskId, List<Object> values) { int targetTaskIndex = TupleUtils.chooseTaskIndex(outFields.select(groupFields, values), numTasks); return targetTasks.get(targetTaskIndex); } }对选中fields的values通过TupleUtils.chooseTaskIndex选择task下标；chooseTaskIndex主要是采用Arrays.deepHashCode取哈希值然后对numTask向下取模GlobalGrouperstorm-2.0.0/storm-client/src/jvm/org/apache/storm/daemon/GrouperFactory.java public static class GlobalGrouper implements CustomStreamGrouping { private List<Integer> targetTasks; public GlobalGrouper() { } @Override public void prepare(WorkerTopologyContext context, GlobalStreamId stream, List<Integer> targetTasks) { this.targetTasks = targetTasks; } @Override public List<Integer> chooseTasks(int taskId, List<Object> values) { if (targetTasks.isEmpty()) { return null; } // It’s possible for target to have multiple tasks if it reads multiple sources return Collections.singletonList(targetTasks.get(0)); } }这里固定取第一个task，即targetTasks.get(0)NoneGrouperstorm-2.0.0/storm-client/src/jvm/org/apache/storm/daemon/GrouperFactory.java public static class NoneGrouper implements CustomStreamGrouping { private final Random random; private List<Integer> targetTasks; private int numTasks; public NoneGrouper() { random = new Random(); } @Override public void prepare(WorkerTopologyContext context, GlobalStreamId stream, List<Integer> targetTasks) { this.targetTasks = targetTasks; this.numTasks = targetTasks.size(); } @Override public List<Integer> chooseTasks(int taskId, List<Object> values) { int index = random.nextInt(numTasks); return Collections.singletonList(targetTasks.get(index)); } }这里通过random.nextInt(numTasks)随机取taskAllGrouperstorm-2.0.0/storm-client/src/jvm/org/apache/storm/daemon/GrouperFactory.java public static class AllGrouper implements CustomStreamGrouping { private List<Integer> targetTasks; @Override public void prepare(WorkerTopologyContext context, GlobalStreamId stream, List<Integer> targetTasks) { this.targetTasks = targetTasks; } @Override public List<Integer> chooseTasks(int taskId, List<Object> values) { return targetTasks; } }这里返回所有的targetTasksShuffleGroupingstorm-2.0.0/storm-client/src/jvm/org/apache/storm/grouping/ShuffleGrouping.javapublic class ShuffleGrouping implements CustomStreamGrouping, Serializable { private ArrayList<List<Integer>> choices; private AtomicInteger current; @Override public void prepare(WorkerTopologyContext context, GlobalStreamId stream, List<Integer> targetTasks) { choices = new ArrayList<List<Integer>>(targetTasks.size()); for (Integer i : targetTasks) { choices.add(Arrays.asList(i)); } current = new AtomicInteger(0); Collections.shuffle(choices, new Random()); } @Override public List<Integer> chooseTasks(int taskId, List<Object> values) { int rightNow; int size = choices.size(); while (true) { rightNow = current.incrementAndGet(); if (rightNow < size) { return choices.get(rightNow); } else if (rightNow == size) { current.set(0); return choices.get(0); } } // race condition with another thread, and we lost. try again }}这里在prepare的时候对ArrayList<List<Integer>> choices进行随机化采用current.incrementAndGet()实现round robbin的效果，超过size的时候重置返回第一个，没有超过则返回incr后的index的值PartialKeyGroupingstorm-2.0.0/storm-client/src/jvm/org/apache/storm/grouping/PartialKeyGrouping.javapublic class PartialKeyGrouping implements CustomStreamGrouping, Serializable { private static final long serialVersionUID = -1672360572274911808L; private List<Integer> targetTasks; private Fields fields = null; private Fields outFields = null; private AssignmentCreator assignmentCreator; private TargetSelector targetSelector; public PartialKeyGrouping() { this(null); } public PartialKeyGrouping(Fields fields) { this(fields, new RandomTwoTaskAssignmentCreator(), new BalancedTargetSelector()); } public PartialKeyGrouping(Fields fields, AssignmentCreator assignmentCreator) { this(fields, assignmentCreator, new BalancedTargetSelector()); } public PartialKeyGrouping(Fields fields, AssignmentCreator assignmentCreator, TargetSelector targetSelector) { this.fields = fields; this.assignmentCreator = assignmentCreator; this.targetSelector = targetSelector; } @Override public void prepare(WorkerTopologyContext context, GlobalStreamId stream, List<Integer> targetTasks) { this.targetTasks = targetTasks; if (this.fields != null) { this.outFields = context.getComponentOutputFields(stream); } } @Override public List<Integer> chooseTasks(int taskId, List<Object> values) { List<Integer> boltIds = new ArrayList<>(1); if (values.size() > 0) { final byte[] rawKeyBytes = getKeyBytes(values); final int[] taskAssignmentForKey = assignmentCreator.createAssignment(this.targetTasks, rawKeyBytes); final int selectedTask = targetSelector.chooseTask(taskAssignmentForKey); boltIds.add(selectedTask); } return boltIds; } //……}这里通过RandomTwoTaskAssignmentCreator来选中两个taskId，然后选择使用次数小的那个LoadAwareCustomStreamGroupingstorm-2.0.0/storm-client/src/jvm/org/apache/storm/grouping/LoadAwareCustomStreamGrouping.javapublic interface LoadAwareCustomStreamGrouping extends CustomStreamGrouping { void refreshLoad(LoadMapping loadMapping);}继承了CustomStreamGrouping接口，然后新定义了refreshLoad方法用于刷新负载，这里的负载主要是executor的receiveQueue的负载(qMetrics.population() / qMetrics.capacity())LoadAwareCustomStreamGrouping有几个实现类，有BasicLoadAwareCustomStreamGrouping以及LoadAwareShuffleGrouping小结storm的CustomStreamGrouping接口定义了chooseTasks方法，用于选择tasks来处理tuplesShuffleGrouping类似round robbin，FieldsGrouper则根据所选字段值采用Arrays.deepHashCode取哈希值然后对numTask向下取模，GlobalGrouper返回index为0的taskId，NoneGrouper则随机返回，AllGrouper不做过滤返回所有taskId，PartialKeyGrouping则使用key的哈希值作为seed，采用Random函数来计算两个taskId的下标，然后选择使用次数少的那个task。LoadAware的grouping有BasicLoadAwareCustomStreamGrouping以及LoadAwareShuffleGrouping，他们都实现了LoadAwareCustomStreamGrouping接口，该接口定义了refreshLoad方法，用于动态刷新负载，这里的负载主要是executor的receiveQueue的负载(qMetrics.population() / qMetrics.capacity())docStream groupings ...

聊聊storm的PartialKeyGrouping

序本文主要研究一下storm的PartialKeyGrouping实例 @Test public void testPartialKeyGrouping() throws InvalidTopologyException, AuthorizationException, AlreadyAliveException { String spoutId = “wordGenerator”; String counterId = “counter”; String aggId = “aggregator”; String intermediateRankerId = “intermediateRanker”; String totalRankerId = “finalRanker”; int TOP_N = 5; TopologyBuilder builder = new TopologyBuilder(); builder.setSpout(spoutId, new TestWordSpout(), 5); //NOTE 通过partialKeyGrouping替代fieldsGrouping，实现较为均衡的负载到countBolt builder.setBolt(counterId, new RollingCountBolt(9, 3), 4).partialKeyGrouping(spoutId, new Fields(“word”)); builder.setBolt(aggId, new RollingCountAggBolt(), 4).fieldsGrouping(counterId, new Fields(“obj”)); builder.setBolt(intermediateRankerId, new IntermediateRankingsBolt(TOP_N), 4).fieldsGrouping(aggId, new Fields(“obj”)); builder.setBolt(totalRankerId, new TotalRankingsBolt(TOP_N)).globalGrouping(intermediateRankerId); submitRemote(builder); }值得注意的是在wordCount的bolt使用PartialKeyGrouping，同一个单词不再固定发给相同的task，因此这里还需要RollingCountAggBolt按fieldsGrouping进行合并。PartialKeyGrouping(1.2.2版)storm-core-1.2.2-sources.jar!/org/apache/storm/grouping/PartialKeyGrouping.javapublic class PartialKeyGrouping implements CustomStreamGrouping, Serializable { private static final long serialVersionUID = -447379837314000353L; private List<Integer> targetTasks; private long[] targetTaskStats; private HashFunction h1 = Hashing.murmur3_128(13); private HashFunction h2 = Hashing.murmur3_128(17); private Fields fields = null; private Fields outFields = null; public PartialKeyGrouping() { //Empty } public PartialKeyGrouping(Fields fields) { this.fields = fields; } @Override public void prepare(WorkerTopologyContext context, GlobalStreamId stream, List<Integer> targetTasks) { this.targetTasks = targetTasks; targetTaskStats = new long[this.targetTasks.size()]; if (this.fields != null) { this.outFields = context.getComponentOutputFields(stream); } } @Override public List<Integer> chooseTasks(int taskId, List<Object> values) { List<Integer> boltIds = new ArrayList<>(1); if (values.size() > 0) { byte[] raw; if (fields != null) { List<Object> selectedFields = outFields.select(fields, values); ByteBuffer out = ByteBuffer.allocate(selectedFields.size() * 4); for (Object o: selectedFields) { if (o instanceof List) { out.putInt(Arrays.deepHashCode(((List)o).toArray())); } else if (o instanceof Object[]) { out.putInt(Arrays.deepHashCode((Object[])o)); } else if (o instanceof byte[]) { out.putInt(Arrays.hashCode((byte[]) o)); } else if (o instanceof short[]) { out.putInt(Arrays.hashCode((short[]) o)); } else if (o instanceof int[]) { out.putInt(Arrays.hashCode((int[]) o)); } else if (o instanceof long[]) { out.putInt(Arrays.hashCode((long[]) o)); } else if (o instanceof char[]) { out.putInt(Arrays.hashCode((char[]) o)); } else if (o instanceof float[]) { out.putInt(Arrays.hashCode((float[]) o)); } else if (o instanceof double[]) { out.putInt(Arrays.hashCode((double[]) o)); } else if (o instanceof boolean[]) { out.putInt(Arrays.hashCode((boolean[]) o)); } else if (o != null) { out.putInt(o.hashCode()); } else { out.putInt(0); } } raw = out.array(); } else { raw = values.get(0).toString().getBytes(); // assume key is the first field } int firstChoice = (int) (Math.abs(h1.hashBytes(raw).asLong()) % this.targetTasks.size()); int secondChoice = (int) (Math.abs(h2.hashBytes(raw).asLong()) % this.targetTasks.size()); int selected = targetTaskStats[firstChoice] > targetTaskStats[secondChoice] ? secondChoice : firstChoice; boltIds.add(targetTasks.get(selected)); targetTaskStats[selected]++; } return boltIds; }}可以看到PartialKeyGrouping是一种CustomStreamGrouping，在prepare的时候，初始化了long[] targetTaskStats用于统计每个taskpartialKeyGrouping如果没有指定fields，则默认按outputFields的第一个field来计算这里使用guava类库提供的Hashing.murmur3_128函数，构造了两个HashFunction，然后计算哈希值的绝对值与targetTasks.size()取余数得到两个可选的taskId下标然后根据targetTaskStats的统计值，取用过的次数小的那个taskId，选中之后更新targetTaskStatsPartialKeyGrouping(2.0.0版)storm-2.0.0/storm-client/src/jvm/org/apache/storm/grouping/PartialKeyGrouping.java/** * A variation on FieldGrouping. This grouping operates on a partitioning of the incoming tuples (like a FieldGrouping), but it can send * Tuples from a given partition to multiple downstream tasks. * * Given a total pool of target tasks, this grouping will always send Tuples with a given key to one member of a subset of those tasks. Each * key is assigned a subset of tasks. Each tuple is then sent to one task from that subset. * * Notes: - the default TaskSelector ensures each task gets as close to a balanced number of Tuples as possible - the default * AssignmentCreator hashes the key and produces an assignment of two tasks /public class PartialKeyGrouping implements CustomStreamGrouping, Serializable { private static final long serialVersionUID = -1672360572274911808L; private List<Integer> targetTasks; private Fields fields = null; private Fields outFields = null; private AssignmentCreator assignmentCreator; private TargetSelector targetSelector; public PartialKeyGrouping() { this(null); } public PartialKeyGrouping(Fields fields) { this(fields, new RandomTwoTaskAssignmentCreator(), new BalancedTargetSelector()); } public PartialKeyGrouping(Fields fields, AssignmentCreator assignmentCreator) { this(fields, assignmentCreator, new BalancedTargetSelector()); } public PartialKeyGrouping(Fields fields, AssignmentCreator assignmentCreator, TargetSelector targetSelector) { this.fields = fields; this.assignmentCreator = assignmentCreator; this.targetSelector = targetSelector; } @Override public void prepare(WorkerTopologyContext context, GlobalStreamId stream, List<Integer> targetTasks) { this.targetTasks = targetTasks; if (this.fields != null) { this.outFields = context.getComponentOutputFields(stream); } } @Override public List<Integer> chooseTasks(int taskId, List<Object> values) { List<Integer> boltIds = new ArrayList<>(1); if (values.size() > 0) { final byte[] rawKeyBytes = getKeyBytes(values); final int[] taskAssignmentForKey = assignmentCreator.createAssignment(this.targetTasks, rawKeyBytes); final int selectedTask = targetSelector.chooseTask(taskAssignmentForKey); boltIds.add(selectedTask); } return boltIds; } /* * Extract the key from the input Tuple. / private byte[] getKeyBytes(List<Object> values) { byte[] raw; if (fields != null) { List<Object> selectedFields = outFields.select(fields, values); ByteBuffer out = ByteBuffer.allocate(selectedFields.size() * 4); for (Object o : selectedFields) { if (o instanceof List) { out.putInt(Arrays.deepHashCode(((List) o).toArray())); } else if (o instanceof Object[]) { out.putInt(Arrays.deepHashCode((Object[]) o)); } else if (o instanceof byte[]) { out.putInt(Arrays.hashCode((byte[]) o)); } else if (o instanceof short[]) { out.putInt(Arrays.hashCode((short[]) o)); } else if (o instanceof int[]) { out.putInt(Arrays.hashCode((int[]) o)); } else if (o instanceof long[]) { out.putInt(Arrays.hashCode((long[]) o)); } else if (o instanceof char[]) { out.putInt(Arrays.hashCode((char[]) o)); } else if (o instanceof float[]) { out.putInt(Arrays.hashCode((float[]) o)); } else if (o instanceof double[]) { out.putInt(Arrays.hashCode((double[]) o)); } else if (o instanceof boolean[]) { out.putInt(Arrays.hashCode((boolean[]) o)); } else if (o != null) { out.putInt(o.hashCode()); } else { out.putInt(0); } } raw = out.array(); } else { raw = values.get(0).toString().getBytes(); // assume key is the first field } return raw; } //……}2.0.0版本将逻辑封装到了RandomTwoTaskAssignmentCreator以及BalancedTargetSelector中RandomTwoTaskAssignmentCreatorstorm-2.0.0/storm-client/src/jvm/org/apache/storm/grouping/PartialKeyGrouping.java /* * This interface is responsible for choosing a subset of the target tasks to use for a given key. * * NOTE: whatever scheme you use to create the assignment should be deterministic. This may be executed on multiple Storm Workers, thus * each of them needs to come up with the same assignment for a given key. / public interface AssignmentCreator extends Serializable { int[] createAssignment(List<Integer> targetTasks, byte[] key); } /========== Implementations ==========*/ /** * This implementation of AssignmentCreator chooses two arbitrary tasks. / public static class RandomTwoTaskAssignmentCreator implements AssignmentCreator { /* * Creates a two task assignment by selecting random tasks. / public int[] createAssignment(List<Integer> tasks, byte[] key) { // It is necessary that this produce a deterministic assignment based on the key, so seed the Random from the key final long seedForRandom = Arrays.hashCode(key); final Random random = new Random(seedForRandom); final int choice1 = random.nextInt(tasks.size()); int choice2 = random.nextInt(tasks.size()); // ensure that choice1 and choice2 are not the same task choice2 = choice1 == choice2 ? (choice2 + 1) % tasks.size() : choice2; return new int[]{ tasks.get(choice1), tasks.get(choice2) }; } }2.0.0版本不再使用guava类库提供的Hashing.murmur3_128哈希函数，转而使用key的哈希值作为seed，采用Random函数来计算两个taskId的下标，这里返回两个值供bolt做负载均衡选择BalancedTargetSelectorstorm-2.0.0/storm-client/src/jvm/org/apache/storm/grouping/PartialKeyGrouping.java /* * This interface chooses one element from a task assignment to send a specific Tuple to. / public interface TargetSelector extends Serializable { Integer chooseTask(int[] assignedTasks); } /* * A basic implementation of target selection. This strategy chooses the task within the assignment that has received the fewest Tuples * overall from this instance of the grouping. / public static class BalancedTargetSelector implements TargetSelector { private Map<Integer, Long> targetTaskStats = Maps.newHashMap(); /* * Chooses one of the incoming tasks and selects the one that has been selected the fewest times so far. */ public Integer chooseTask(int[] assignedTasks) { Integer taskIdWithMinLoad = null; Long minTaskLoad = Long.MAX_VALUE; for (Integer currentTaskId : assignedTasks) { final Long currentTaskLoad = targetTaskStats.getOrDefault(currentTaskId, 0L); if (currentTaskLoad < minTaskLoad) { minTaskLoad = currentTaskLoad; taskIdWithMinLoad = currentTaskId; } } targetTaskStats.put(taskIdWithMinLoad, targetTaskStats.getOrDefault(taskIdWithMinLoad, 0L) + 1); return taskIdWithMinLoad; } }BalancedTargetSelector根据选中的taskId，然后根据targetTaskStats计算taskIdWithMinLoad返回FieldsGrouperstorm-2.0.0/storm-client/src/jvm/org/apache/storm/daemon/GrouperFactory.java public static class FieldsGrouper implements CustomStreamGrouping { private Fields outFields; private List<List<Integer>> targetTasks; private Fields groupFields; private int numTasks; public FieldsGrouper(Fields outFields, Grouping thriftGrouping) { this.outFields = outFields; this.groupFields = new Fields(Thrift.fieldGrouping(thriftGrouping)); } @Override public void prepare(WorkerTopologyContext context, GlobalStreamId stream, List<Integer> targetTasks) { this.targetTasks = new ArrayList<List<Integer>>(); for (Integer targetTask : targetTasks) { this.targetTasks.add(Collections.singletonList(targetTask)); } this.numTasks = targetTasks.size(); } @Override public List<Integer> chooseTasks(int taskId, List<Object> values) { int targetTaskIndex = TupleUtils.chooseTaskIndex(outFields.select(groupFields, values), numTasks); return targetTasks.get(targetTaskIndex); } }这里可以看到FieldsGrouper的chooseTasks方法使用TupleUtils.chooseTaskIndex来选择taskId下标TupleUtils.chooseTaskIndexstorm-2.0.0/storm-client/src/jvm/org/apache/storm/utils/TupleUtils.java public static <T> int chooseTaskIndex(List<T> keys, int numTasks) { return Math.floorMod(listHashCode(keys), numTasks); } private static <T> int listHashCode(List<T> alist) { if (alist == null) { return 1; } else { return Arrays.deepHashCode(alist.toArray()); } }这里先对keys进行listHashCode，然后与numTasks进行Math.floorMod运算，即向下取模listHashCode调用了Arrays.deepHashCode(alist.toArray())进行哈希值计算小结storm的PartialKeyGrouping是解决fieldsGrouping造成的bolt节点skewed load的问题fieldsGrouping采取的是对所选字段进行哈希然后与taskId数量向下取模来选择taskId的下标PartialKeyGrouping在1.2.2版本的实现是使用guava提供的Hashing.murmur3_128哈希函数计算哈希值，然后取绝对值与taskId数量取余数得到两个可选的taskId下标；在2.0.0版本则使用key的哈希值作为seed，采用Random函数来计算两个taskId的下标。注意这里返回两个值供bolt做负载均衡选择，这是与fieldsGrouping的差别。在得到两个候选taskId之后，PartialKeyGrouping额外维护了taskId的使用数，每次选择使用少的，与此同时也更新每次选择的计数。值得注意的是在wordCount的bolt使用PartialKeyGrouping，同一个单词不再固定发给相同的task，因此这里还需要RollingCountAggBolt按fieldsGrouping进行合并。docCommon Topology PatternsThe Power of Both Choices: Practical Load Balancing for Distributed Stream Processing EnginesStorm-源码分析-Streaming Grouping (backtype.storm.daemon.executor) ...

聊聊storm的AssignmentDistributionService

序本文主要研究一下storm的AssignmentDistributionServiceAssignmentDistributionServicestorm-2.0.0/storm-server/src/main/java/org/apache/storm/nimbus/AssignmentDistributionService.java/** * A service for distributing master assignments to supervisors, this service makes the assignments notification * asynchronous. * * We support multiple working threads to distribute assignment, every thread has a queue buffer. * * Master will shuffle its node request to the queues, if the target queue is full, we just discard the request, * let the supervisors sync instead. * * Caution: this class is not thread safe. * * <pre>{@code * Working mode * +——–+ +—————–+ * | queue1 | ==> | Working thread1 | * +——–+ shuffle +——–+ +—————–+ * | Master | ==> * +——–+ +——–+ +—————–+ * | queue2 | ==> | Working thread2 | * +——–+ +—————–+ * } * </pre> /public class AssignmentDistributionService implements Closeable { //…… private ExecutorService service; /* * Assignments request queue. / private volatile Map<Integer, LinkedBlockingQueue<NodeAssignments>> assignmentsQueue; /* * Add an assignments for a node/supervisor for distribution. * @param node node id of supervisor. * @param host host name for the node. * @param serverPort node thrift server port. * @param assignments the {@link org.apache.storm.generated.SupervisorAssignments} / public void addAssignmentsForNode(String node, String host, Integer serverPort, SupervisorAssignments assignments) { try { //For some reasons, we can not get supervisor port info, eg: supervisor shutdown, //Just skip for this scheduling round. if (serverPort == null) { LOG.warn(“Discard an assignment distribution for node {} because server port info is missing.”, node); return; } boolean success = nextQueue().offer(NodeAssignments.getInstance(node, host, serverPort, assignments), 5L, TimeUnit.SECONDS); if (!success) { LOG.warn(“Discard an assignment distribution for node {} because the target sub queue is full.”, node); } } catch (InterruptedException e) { LOG.error(“Add node assignments interrupted: {}”, e.getMessage()); throw new RuntimeException(e); } } private LinkedBlockingQueue<NodeAssignments> nextQueue() { return this.assignmentsQueue.get(nextQueueId()); }}Nimbus通过调用AssignmentDistributionService的addAssignmentsForNode，将任务分配结果通知到supervisoraddAssignmentsForNode主要是将SupervisorAssignments放入到assignmentsQueueAssignmentDistributionService.getInstancestorm-2.0.0/storm-server/src/main/java/org/apache/storm/nimbus/AssignmentDistributionService.java /* * Factory method for initialize a instance. * @param conf config. * @return an instance of {@link AssignmentDistributionService} / public static AssignmentDistributionService getInstance(Map conf) { AssignmentDistributionService service = new AssignmentDistributionService(); service.prepare(conf); return service; } /* * Function for initialization. * * @param conf config / public void prepare(Map conf) { this.conf = conf; this.random = new Random(47); this.threadsNum = ObjectReader.getInt(conf.get(DaemonConfig.NIMBUS_ASSIGNMENTS_SERVICE_THREADS), 10); this.queueSize = ObjectReader.getInt(conf.get(DaemonConfig.NIMBUS_ASSIGNMENTS_SERVICE_THREAD_QUEUE_SIZE), 100); this.assignmentsQueue = new HashMap<>(); for (int i = 0; i < threadsNum; i++) { this.assignmentsQueue.put(i, new LinkedBlockingQueue<NodeAssignments>(queueSize)); } //start the thread pool this.service = Executors.newFixedThreadPool(threadsNum); this.active = true; //start the threads for (int i = 0; i < threadsNum; i++) { this.service.submit(new DistributeTask(this, i)); } // for local cluster localSupervisors = new HashMap<>(); if (ConfigUtils.isLocalMode(conf)) { isLocalMode = true; } }getInstance方法new了一个AssignmentDistributionService，同时调用prepare方法进行初始化prepare的时候，创建了threadsNum数量的LinkedBlockingQueue，队列大小为DaemonConfig.NIMBUS_ASSIGNMENTS_SERVICE_THREAD_QUEUE_SIZE另外通过Executors.newFixedThreadPool(threadsNum)创建一个线程池，然后提交threadsNum数量的DistributeTask，每个queue对应一个DistributeTaskDistributeTaskstorm-2.0.0/storm-server/src/main/java/org/apache/storm/nimbus/AssignmentDistributionService.java /* * Task to distribute assignments. / static class DistributeTask implements Runnable { private AssignmentDistributionService service; private Integer queueIndex; DistributeTask(AssignmentDistributionService service, Integer index) { this.service = service; this.queueIndex = index; } @Override public void run() { while (service.isActive()) { try { NodeAssignments nodeAssignments = this.service.nextAssignments(queueIndex); sendAssignmentsToNode(nodeAssignments); } catch (InterruptedException e) { if (service.isActive()) { LOG.error(“Get an unexpected interrupt when distributing assignments to node, {}”, e.getCause()); } else { // service is off now just interrupt it. Thread.currentThread().interrupt(); } } } } private void sendAssignmentsToNode(NodeAssignments assignments) { if (this.service.isLocalMode) { //local node Supervisor supervisor = this.service.localSupervisors.get(assignments.getNode()); if (supervisor != null) { supervisor.sendSupervisorAssignments(assignments.getAssignments()); } else { LOG.error(“Can not find node {} for assignments distribution”, assignments.getNode()); throw new RuntimeException(“null for node " + assignments.getNode() + " supervisor instance.”); } } else { // distributed mode try (SupervisorClient client = SupervisorClient.getConfiguredClient(service.getConf(), assignments.getHost(), assignments.getServerPort())) { try { client.getClient().sendSupervisorAssignments(assignments.getAssignments()); } catch (Exception e) { //just ignore the exception. LOG.error(“Exception when trying to send assignments to node {}: {}”, assignments.getNode(), e.getMessage()); } } catch (Throwable e) { //just ignore any error/exception. LOG.error(“Exception to create supervisor client for node {}: {}”, assignments.getNode(), e.getMessage()); } } } } /* * Get an assignments from the target queue with the specific index. * @param queueIndex index of the queue * @return an {@link NodeAssignments} * @throws InterruptedException / public NodeAssignments nextAssignments(Integer queueIndex) throws InterruptedException { NodeAssignments target = null; while (true) { target = getQueueById(queueIndex).poll(); if (target != null) { return target; } Time.sleep(100L); } }AssignmentDistributionService在prepare的时候，会往线程池提交DistributeTaskDistributeTask的run方法不断循环，从对应的queue取NodeAssignments，然后调用sendAssignmentsToNode进行远程通信sendAssignmentsToNode调用client.getClient().sendSupervisorAssignments(assignments.getAssignments())Supervisor.launchSupervisorThriftServerstorm-2.0.0/storm-server/src/main/java/org/apache/storm/daemon/supervisor/Supervisor.java private void launchSupervisorThriftServer(Map<String, Object> conf) throws IOException { // validate port int port = getThriftServerPort(); try { ServerSocket socket = new ServerSocket(port); socket.close(); } catch (BindException e) { LOG.error("{} is not available. Check if another process is already listening on {}", port, port); throw new RuntimeException(e); } TProcessor processor = new org.apache.storm.generated.Supervisor.Processor( new org.apache.storm.generated.Supervisor.Iface() { @Override public void sendSupervisorAssignments(SupervisorAssignments assignments) throws AuthorizationException, TException { checkAuthorization(“sendSupervisorAssignments”); LOG.info(“Got an assignments from master, will start to sync with assignments: {}”, assignments); SynchronizeAssignments syn = new SynchronizeAssignments(getSupervisor(), assignments, getReadClusterState()); getEventManger().add(syn); } //…… }); this.thriftServer = new ThriftServer(conf, processor, ThriftConnectionType.SUPERVISOR); this.thriftServer.serve(); }Supervisor.launchSupervisorThriftServer的时候，添加了TProcessor，将SupervisorAssignments包装为SynchronizeAssignments添加到EventManager中SynchronizeAssignments.runstorm-2.0.0/storm-server/src/main/java/org/apache/storm/daemon/supervisor/timer/SynchronizeAssignments.java/* * A runnable which will synchronize assignments to node local and then worker processes. */public class SynchronizeAssignments implements Runnable { //…… @Override public void run() { // first sync assignments to local, then sync processes. if (null == assignments) { getAssignmentsFromMaster(this.supervisor.getConf(), this.supervisor.getStormClusterState(), this.supervisor.getAssignmentId()); } else { assignedAssignmentsToLocal(this.supervisor.getStormClusterState(), assignments); } this.readClusterState.run(); } private static void assignedAssignmentsToLocal(IStormClusterState clusterState, SupervisorAssignments assignments) { if (null == assignments) { //unknown error, just skip return; } Map<String, byte[]> serAssignments = new HashMap<>(); for (Map.Entry<String, Assignment> entry : assignments.get_storm_assignment().entrySet()) { serAssignments.put(entry.getKey(), Utils.serialize(entry.getValue())); } clusterState.syncRemoteAssignments(serAssignments); }}这里调用了assignedAssignmentsToLocal，然后还触发了this.readClusterState.run()assignedAssignmentsToLocal调用了clusterState.syncRemoteAssignments(serAssignments)StormClusterStateImpl.syncRemoteAssignmentsstorm-2.0.0/storm-client/src/jvm/org/apache/storm/cluster/StormClusterStateImpl.java @Override public void syncRemoteAssignments(Map<String, byte[]> remote) { if (null != remote) { this.assignmentsBackend.syncRemoteAssignments(remote); } else { Map<String, byte[]> tmp = new HashMap<>(); List<String> stormIds = this.stateStorage.get_children(ClusterUtils.ASSIGNMENTS_SUBTREE, false); for (String stormId : stormIds) { byte[] assignment = this.stateStorage.get_data(ClusterUtils.assignmentPath(stormId), false); tmp.put(stormId, assignment); } this.assignmentsBackend.syncRemoteAssignments(tmp); } }这里将serAssignments信息更新到assignmentsBackend(即本地内存)如果remote为null，这里则从zk读取分配信息，然后更新到内存；zk地址为ClusterUtils.assignmentPath(stormId)(/assignments/{topologyId})ReadClusterState.runstorm-2.0.0/storm-server/src/main/java/org/apache/storm/daemon/supervisor/ReadClusterState.java @Override public synchronized void run() { try { List<String> stormIds = stormClusterState.assignments(null); Map<String, Assignment> assignmentsSnapshot = getAssignmentsSnapshot(stormClusterState); Map<Integer, LocalAssignment> allAssignments = readAssignments(assignmentsSnapshot); if (allAssignments == null) { //Something odd happened try again later return; } Map<String, List<ProfileRequest>> topoIdToProfilerActions = getProfileActions(stormClusterState, stormIds); HashSet<Integer> assignedPorts = new HashSet<>(); LOG.debug(“Synchronizing supervisor”); LOG.debug(“All assignment: {}”, allAssignments); LOG.debug(“Topology Ids -> Profiler Actions {}”, topoIdToProfilerActions); for (Integer port : allAssignments.keySet()) { if (iSuper.confirmAssigned(port)) { assignedPorts.add(port); } } HashSet<Integer> allPorts = new HashSet<>(assignedPorts); iSuper.assigned(allPorts); allPorts.addAll(slots.keySet()); Map<Integer, Set<TopoProfileAction>> filtered = new HashMap<>(); for (Entry<String, List<ProfileRequest>> entry : topoIdToProfilerActions.entrySet()) { String topoId = entry.getKey(); if (entry.getValue() != null) { for (ProfileRequest req : entry.getValue()) { NodeInfo ni = req.get_nodeInfo(); if (host.equals(ni.get_node())) { Long port = ni.get_port().iterator().next(); Set<TopoProfileAction> actions = filtered.get(port.intValue()); if (actions == null) { actions = new HashSet<>(); filtered.put(port.intValue(), actions); } actions.add(new TopoProfileAction(topoId, req)); } } } } for (Integer port : allPorts) { Slot slot = slots.get(port); if (slot == null) { slot = mkSlot(port); slots.put(port, slot); slot.start(); } slot.setNewAssignment(allAssignments.get(port)); slot.addProfilerActions(filtered.get(port)); } } catch (Exception e) { LOG.error(“Failed to Sync Supervisor”, e); throw new RuntimeException(e); } }这里调用slot的setNewAssignment进行分配，设置slot的AtomicReference<LocalAssignment> newAssignmentSlot的run方法会轮询通过stateMachineStep方法对newAssignment进行判断然后更新nextState小结Nimbus通过调用AssignmentDistributionService的addAssignmentsForNode，将任务分配结果通知到supervisoraddAssignmentsForNode主要是将SupervisorAssignments放入到assignmentsQueue；AssignmentDistributionService默认创建一个指定线程数的线程池，同时创建指定线程数的队列及DistributeTaskDistributeTask不断循环从指定queue拉取SynchronizeAssignments，然后调用sendAssignmentsToNode通知到supervisorSupervisor在启动的时候会launchSupervisorThriftServer，注册了响应sendSupervisorAssignments的processor，将接收到的SupervisorAssignments包装为SynchronizeAssignments添加到EventManager中EventManager处理SynchronizeAssignments时执行其run方法，调用了assignedAssignmentsToLocal，然后还触发了this.readClusterState.run()assignedAssignmentsToLocal调用了clusterState.syncRemoteAssignments(serAssignments)将分配信息更新到本地内存；而readClusterState.run()主要是更新slot的newAssignment值，之后依赖Slot的轮询去感知状态变化，然后触发相应的处理docUnderstanding the Parallelism of a Storm Topology ...

[case41]聊聊storm的GraphiteStormReporter

序本文主要研究一下storm的GraphiteStormReporterGraphiteStormReporterstorm-core-1.2.2-sources.jar!/org/apache/storm/metrics2/reporters/GraphiteStormReporter.javapublic class GraphiteStormReporter extends ScheduledStormReporter { private final static Logger LOG = LoggerFactory.getLogger(GraphiteStormReporter.class); public static final String GRAPHITE_PREFIXED_WITH = “graphite.prefixed.with”; public static final String GRAPHITE_HOST = “graphite.host”; public static final String GRAPHITE_PORT = “graphite.port”; public static final String GRAPHITE_TRANSPORT = “graphite.transport”; @Override public void prepare(MetricRegistry metricsRegistry, Map stormConf, Map reporterConf) { LOG.debug(“Preparing…”); GraphiteReporter.Builder builder = GraphiteReporter.forRegistry(metricsRegistry); TimeUnit durationUnit = MetricsUtils.getMetricsDurationUnit(reporterConf); if (durationUnit != null) { builder.convertDurationsTo(durationUnit); } TimeUnit rateUnit = MetricsUtils.getMetricsRateUnit(reporterConf); if (rateUnit != null) { builder.convertRatesTo(rateUnit); } StormMetricsFilter filter = getMetricsFilter(reporterConf); if(filter != null){ builder.filter(filter); } String prefix = getMetricsPrefixedWith(reporterConf); if (prefix != null) { builder.prefixedWith(prefix); } //defaults to 10 reportingPeriod = getReportPeriod(reporterConf); //defaults to seconds reportingPeriodUnit = getReportPeriodUnit(reporterConf); // Not exposed: // * withClock(Clock) String host = getMetricsTargetHost(reporterConf); Integer port = getMetricsTargetPort(reporterConf); String transport = getMetricsTargetTransport(reporterConf); GraphiteSender sender = null; if (transport.equalsIgnoreCase(“udp”)) { sender = new GraphiteUDP(host, port); } else { sender = new Graphite(host, port); } reporter = builder.build(sender); } private static String getMetricsPrefixedWith(Map reporterConf) { return Utils.getString(reporterConf.get(GRAPHITE_PREFIXED_WITH), null); } private static String getMetricsTargetHost(Map reporterConf) { return Utils.getString(reporterConf.get(GRAPHITE_HOST), null); } private static Integer getMetricsTargetPort(Map reporterConf) { return Utils.getInt(reporterConf.get(GRAPHITE_PORT), null); } private static String getMetricsTargetTransport(Map reporterConf) { return Utils.getString(reporterConf.get(GRAPHITE_TRANSPORT), “tcp”); }}继承了ScheduledStormReporter，实现prepare方法prepare方法根据配置文件创建com.codahale.metrics.graphite.GraphiteSender，然后创建com.codahale.metrics.graphite.GraphiteReporterScheduledStormReporterstorm-core-1.2.2-sources.jar!/org/apache/storm/metrics2/reporters/ScheduledStormReporter.javapublic abstract class ScheduledStormReporter implements StormReporter{ private static final Logger LOG = LoggerFactory.getLogger(ScheduledStormReporter.class); protected ScheduledReporter reporter; protected long reportingPeriod; protected TimeUnit reportingPeriodUnit; @Override public void start() { if (reporter != null) { LOG.debug(“Starting…”); reporter.start(reportingPeriod, reportingPeriodUnit); } else { throw new IllegalStateException(“Attempt to start without preparing " + getClass().getSimpleName()); } } @Override public void stop() { if (reporter != null) { LOG.debug(“Stopping…”); reporter.stop(); } else { throw new IllegalStateException(“Attempt to stop without preparing " + getClass().getSimpleName()); } } public static TimeUnit getReportPeriodUnit(Map<String, Object> reporterConf) { TimeUnit unit = getTimeUnitForConfig(reporterConf, REPORT_PERIOD_UNITS); return unit == null ? TimeUnit.SECONDS : unit; } private static TimeUnit getTimeUnitForConfig(Map reporterConf, String configName) { String rateUnitString = Utils.getString(reporterConf.get(configName), null); if (rateUnitString != null) { return TimeUnit.valueOf(rateUnitString); } return null; } public static long getReportPeriod(Map reporterConf) { return Utils.getInt(reporterConf.get(REPORT_PERIOD), 10).longValue(); } public static StormMetricsFilter getMetricsFilter(Map reporterConf){ StormMetricsFilter filter = null; Map<String, Object> filterConf = (Map)reporterConf.get(“filter”); if(filterConf != null) { String clazz = (String) filterConf.get(“class”); if (clazz != null) { filter = Utils.newInstance(clazz); filter.prepare(filterConf); } } return filter; }}ScheduledStormReporter封装了对reporter的生命周期的控制，启动时调用start，关闭时调用stop小结storm从1.2版本开始启用了新的metrics，即metrics2，新版的metrics基于Dropwizard Metrics默认提供了Console Reporter、CSV Reporter、Ganglia Reporter 、Graphite Reporter、JMX ReporterdocNew Metrics Reporting APIubuntu-graphite-grafana ...

聊聊storm worker的executor与task

序本文主要研究一下storm worker的executor与taskWorkerstorm-2.0.0/storm-client/src/jvm/org/apache/storm/daemon/worker/Worker.java public static void main(String[] args) throws Exception { Preconditions.checkArgument(args.length == 5, “Illegal number of arguments. Expected: 5, Actual: " + args.length); String stormId = args[0]; String assignmentId = args[1]; String supervisorPort = args[2]; String portStr = args[3]; String workerId = args[4]; Map<String, Object> conf = ConfigUtils.readStormConfig(); Utils.setupDefaultUncaughtExceptionHandler(); StormCommon.validateDistributedMode(conf); Worker worker = new Worker(conf, null, stormId, assignmentId, Integer.parseInt(supervisorPort), Integer.parseInt(portStr), workerId); worker.start(); Utils.addShutdownHookWithForceKillIn1Sec(worker::shutdown); }main方法创建Worker，然后调用startWorker.startstorm-2.0.0/storm-client/src/jvm/org/apache/storm/daemon/worker/Worker.java public void start() throws Exception { LOG.info(“Launching worker for {} on {}:{} with id {} and conf {}”, topologyId, assignmentId, port, workerId, ConfigUtils.maskPasswords(conf)); // because in local mode, its not a separate // process. supervisor will register it in this case // if ConfigUtils.isLocalMode(conf) returns false then it is in distributed mode. if (!ConfigUtils.isLocalMode(conf)) { // Distributed mode SysOutOverSLF4J.sendSystemOutAndErrToSLF4J(); String pid = Utils.processPid(); FileUtils.touch(new File(ConfigUtils.workerPidPath(conf, workerId, pid))); FileUtils.writeStringToFile(new File(ConfigUtils.workerArtifactsPidPath(conf, topologyId, port)), pid, Charset.forName(“UTF-8”)); } final Map<String, Object> topologyConf = ConfigUtils.overrideLoginConfigWithSystemProperty(ConfigUtils.readSupervisorStormConf(conf, topologyId)); ClusterStateContext csContext = new ClusterStateContext(DaemonType.WORKER, topologyConf); IStateStorage stateStorage = ClusterUtils.mkStateStorage(conf, topologyConf, csContext); IStormClusterState stormClusterState = ClusterUtils.mkStormClusterState(stateStorage, null, csContext); StormMetricRegistry.start(conf, DaemonType.WORKER); Credentials initialCredentials = stormClusterState.credentials(topologyId, null); Map<String, String> initCreds = new HashMap<>(); if (initialCredentials != null) { initCreds.putAll(initialCredentials.get_creds()); } autoCreds = ClientAuthUtils.getAutoCredentials(topologyConf); subject = ClientAuthUtils.populateSubject(null, autoCreds, initCreds); Subject.doAs(subject, (PrivilegedExceptionAction<Object>) () -> loadWorker(topologyConf, stateStorage, stormClusterState, initCreds, initialCredentials) ); }这里主要是调用loadWorkerWorker.loadWorkerstorm-2.0.0/storm-client/src/jvm/org/apache/storm/daemon/worker/Worker.java private AtomicReference<List<IRunningExecutor>> executorsAtom; private Object loadWorker(Map<String, Object> topologyConf, IStateStorage stateStorage, IStormClusterState stormClusterState, Map<String, String> initCreds, Credentials initialCredentials) throws Exception { workerState = new WorkerState(conf, context, topologyId, assignmentId, supervisorPort, port, workerId, topologyConf, stateStorage, stormClusterState, autoCreds); // Heartbeat here so that worker process dies if this fails // it’s important that worker heartbeat to supervisor ASAP so that supervisor knows // that worker is running and moves on doHeartBeat(); executorsAtom = new AtomicReference<>(null); // launch heartbeat threads immediately so that slow-loading tasks don’t cause the worker to timeout // to the supervisor workerState.heartbeatTimer .scheduleRecurring(0, (Integer) conf.get(Config.WORKER_HEARTBEAT_FREQUENCY_SECS), () -> { try { doHeartBeat(); } catch (IOException e) { throw new RuntimeException(e); } }); workerState.executorHeartbeatTimer .scheduleRecurring(0, (Integer) conf.get(Config.EXECUTOR_METRICS_FREQUENCY_SECS), Worker.this::doExecutorHeartbeats); workerState.registerCallbacks(); workerState.refreshConnections(null); workerState.activateWorkerWhenAllConnectionsReady(); workerState.refreshStormActive(null); workerState.runWorkerStartHooks(); List<Executor> execs = new ArrayList<>(); for (List<Long> e : workerState.getLocalExecutors()) { if (ConfigUtils.isLocalMode(topologyConf)) { Executor executor = LocalExecutor.mkExecutor(workerState, e, initCreds); execs.add(executor); for (int i = 0; i < executor.getTaskIds().size(); ++i) { workerState.localReceiveQueues.put(executor.getTaskIds().get(i), executor.getReceiveQueue()); } } else { Executor executor = Executor.mkExecutor(workerState, e, initCreds); for (int i = 0; i < executor.getTaskIds().size(); ++i) { workerState.localReceiveQueues.put(executor.getTaskIds().get(i), executor.getReceiveQueue()); } execs.add(executor); } } List<IRunningExecutor> newExecutors = new ArrayList<IRunningExecutor>(); for (Executor executor : execs) { newExecutors.add(executor.execute()); } executorsAtom.set(newExecutors); //…… setupFlushTupleTimer(topologyConf, newExecutors); setupBackPressureCheckTimer(topologyConf); LOG.info(“Worker has topology config {}”, ConfigUtils.maskPasswords(topologyConf)); LOG.info(“Worker {} for storm {} on {}:{} has finished loading”, workerId, topologyId, assignmentId, port); return this; }这里通过workerState.getLocalExecutors()获取List<Long> executorId的集合然后通过Executor.mkExecutor创建指定数量的Executor，然后调用execute()方法转换为ExecutorShutdown，然后保存到AtomicReference<List<IRunningExecutor>> executorsAtomWorkerState.getLocalExecutorsstorm-2.0.0/storm-client/src/jvm/org/apache/storm/daemon/worker/WorkerState.java // local executors and localTaskIds running in this worker final Set<List<Long>> localExecutors; public Set<List<Long>> getLocalExecutors() { return localExecutors; } public WorkerState(Map<String, Object> conf, IContext mqContext, String topologyId, String assignmentId, int supervisorPort, int port, String workerId, Map<String, Object> topologyConf, IStateStorage stateStorage, IStormClusterState stormClusterState, Collection<IAutoCredentials> autoCredentials) throws IOException, InvalidTopologyException { this.autoCredentials = autoCredentials; this.conf = conf; this.localExecutors = new HashSet<>(readWorkerExecutors(stormClusterState, topologyId, assignmentId, port)); //…… } private List<List<Long>> readWorkerExecutors(IStormClusterState stormClusterState, String topologyId, String assignmentId, int port) { LOG.info(“Reading assignments”); List<List<Long>> executorsAssignedToThisWorker = new ArrayList<>(); executorsAssignedToThisWorker.add(Constants.SYSTEM_EXECUTOR_ID); Map<List<Long>, NodeInfo> executorToNodePort = getLocalAssignment(conf, stormClusterState, topologyId).get_executor_node_port(); for (Map.Entry<List<Long>, NodeInfo> entry : executorToNodePort.entrySet()) { NodeInfo nodeInfo = entry.getValue(); if (nodeInfo.get_node().equals(assignmentId) && nodeInfo.get_port().iterator().next() == port) { executorsAssignedToThisWorker.add(entry.getKey()); } } return executorsAssignedToThisWorker; } private Assignment getLocalAssignment(Map<String, Object> conf, IStormClusterState stormClusterState, String topologyId) { if (!ConfigUtils.isLocalMode(conf)) { try (SupervisorClient supervisorClient = SupervisorClient.getConfiguredClient(conf, Utils.hostname(), supervisorPort)) { Assignment assignment = supervisorClient.getClient().getLocalAssignmentForStorm(topologyId); return assignment; } catch (Throwable tr1) { //if any error/exception thrown, fetch it from zookeeper return stormClusterState.remoteAssignmentInfo(topologyId, null); } } else { return stormClusterState.remoteAssignmentInfo(topologyId, null); } }WorkerState在构造器里头通过readWorkerExecutors获取在本worker运行的executorIds通过getLocalAssignment方法获取Assignment，然后通过get_executor_node_port方法获取Map<List<Long>, NodeInfo> executorToNodePortgetLocalAssignment通过supervisorClient.getClient().getLocalAssignmentForStorm(topologyId)获取Assignment，如果出现异常则通过stormClusterState.remoteAssignmentInfo从zookeeper获取StormClusterStateImpl.remoteAssignmentInfostorm-2.0.0/storm-client/src/jvm/org/apache/storm/cluster/StormClusterStateImpl.java public Assignment remoteAssignmentInfo(String stormId, Runnable callback) { if (callback != null) { assignmentInfoCallback.put(stormId, callback); } byte[] serialized = stateStorage.get_data(ClusterUtils.assignmentPath(stormId), callback != null); return ClusterUtils.maybeDeserialize(serialized, Assignment.class); }根据topologyId从ClusterUtils.assignmentPath获取路径，然后去zookeeper获取数据数据采用thrift序列化，取回来需要反序列化ClusterUtils.assignmentPathstorm-2.0.0/storm-client/src/jvm/org/apache/storm/cluster/ClusterUtils.java public static final String ZK_SEPERATOR = “/”; public static final String ASSIGNMENTS_ROOT = “assignments”; public static final String ASSIGNMENTS_SUBTREE = ZK_SEPERATOR + ASSIGNMENTS_ROOT; public static String assignmentPath(String id) { return ASSIGNMENTS_SUBTREE + ZK_SEPERATOR + id; }路径为/assignments/{topology}，比如/assignments/DemoTopology-1-1539163962Executor.mkExecutorstorm-2.0.0/storm-client/src/jvm/org/apache/storm/executor/Executor.java public static Executor mkExecutor(WorkerState workerState, List<Long> executorId, Map<String, String> credentials) { Executor executor; WorkerTopologyContext workerTopologyContext = workerState.getWorkerTopologyContext(); List<Integer> taskIds = StormCommon.executorIdToTasks(executorId); String componentId = workerTopologyContext.getComponentId(taskIds.get(0)); String type = getExecutorType(workerTopologyContext, componentId); if (ClientStatsUtil.SPOUT.equals(type)) { executor = new SpoutExecutor(workerState, executorId, credentials); } else { executor = new BoltExecutor(workerState, executorId, credentials); } int minId = Integer.MAX_VALUE; Map<Integer, Task> idToTask = new HashMap<>(); for (Integer taskId : taskIds) { minId = Math.min(minId, taskId); try { Task task = new Task(executor, taskId); idToTask.put(taskId, task); } catch (IOException ex) { throw Utils.wrapInRuntime(ex); } } executor.idToTaskBase = minId; executor.idToTask = Utils.convertToArray(idToTask, minId); return executor; }根据组件类型创建SpoutExecutor或者BoltExecutor然后创建tasks并绑定到executorExecutor.executestorm-2.0.0/storm-client/src/jvm/org/apache/storm/executor/Executor.java /** * separated from mkExecutor in order to replace executor transfer in executor data for testing. / public ExecutorShutdown execute() throws Exception { LOG.info(“Loading executor tasks " + componentId + “:” + executorId); String handlerName = componentId + “-executor” + executorId; Utils.SmartThread handler = Utils.asyncLoop(this, false, reportErrorDie, Thread.NORM_PRIORITY, true, true, handlerName); LOG.info(“Finished loading executor " + componentId + “:” + executorId); return new ExecutorShutdown(this, Lists.newArrayList(handler), idToTask, receiveQueue); }这里使用Utils.asyncLoop创建Utils.SmartThread并且调用start启动Utils.asyncLoopstorm-2.0.0/storm-client/src/jvm/org/apache/storm/utils/Utils.java /* * Creates a thread that calls the given code repeatedly, sleeping for an interval of seconds equal to the return value of the previous * call. * * The given afn may be a callable that returns the number of seconds to sleep, or it may be a Callable that returns another Callable * that in turn returns the number of seconds to sleep. In the latter case isFactory. * * @param afn the code to call on each iteration * @param isDaemon whether the new thread should be a daemon thread * @param eh code to call when afn throws an exception * @param priority the new thread’s priority * @param isFactory whether afn returns a callable instead of sleep seconds * @param startImmediately whether to start the thread before returning * @param threadName a suffix to be appended to the thread name * @return the newly created thread * * @see Thread / public static SmartThread asyncLoop(final Callable afn, boolean isDaemon, final Thread.UncaughtExceptionHandler eh, int priority, final boolean isFactory, boolean startImmediately, String threadName) { SmartThread thread = new SmartThread(new Runnable() { public void run() { try { final Callable<Long> fn = isFactory ? (Callable<Long>) afn.call() : afn; while (true) { if (Thread.interrupted()) { throw new InterruptedException(); } final Long s = fn.call(); if (s == null) { // then stop running it break; } if (s > 0) { Time.sleep(s); } } } catch (Throwable t) { if (Utils.exceptionCauseIsInstanceOf( InterruptedException.class, t)) { LOG.info(“Async loop interrupted!”); return; } LOG.error(“Async loop died!”, t); throw new RuntimeException(t); } } }); if (eh != null) { thread.setUncaughtExceptionHandler(eh); } else { thread.setUncaughtExceptionHandler(new Thread.UncaughtExceptionHandler() { public void uncaughtException(Thread t, Throwable e) { LOG.error(“Async loop died!”, e); Utils.exitProcess(1, “Async loop died!”); } }); } thread.setDaemon(isDaemon); thread.setPriority(priority); if (threadName != null && !threadName.isEmpty()) { thread.setName(thread.getName() + “-” + threadName); } if (startImmediately) { thread.start(); } return thread; }这里run方法无限循环调用fn.call()，也就是调用Executor.call().call()方法BoltExecutor.call主要是调用receiveQueue.consume方法SpoutExecutor.call除了调用receiveQueue.consume方法，还调用了spouts.get(j).nextTuple()receiveQueue.consumestorm-2.0.0/storm-client/src/jvm/org/apache/storm/utils/JCQueue.java /* * Non blocking. Returns immediately if Q is empty. Returns number of elements consumed from Q / public int consume(JCQueue.Consumer consumer) { return consume(consumer, continueRunning); } /* * Non blocking. Returns immediately if Q is empty. Runs till Q is empty OR exitCond.keepRunning() return false. Returns number of * elements consumed from Q / public int consume(JCQueue.Consumer consumer, ExitCondition exitCond) { try { return consumeImpl(consumer, exitCond); } catch (InterruptedException e) { throw new RuntimeException(e); } } /* * Non blocking. Returns immediately if Q is empty. Returns number of elements consumed from Q * * @param consumer * @param exitCond */ private int consumeImpl(Consumer consumer, ExitCondition exitCond) throws InterruptedException { int drainCount = 0; while (exitCond.keepRunning()) { Object tuple = recvQueue.poll(); if (tuple == null) { break; } consumer.accept(tuple); ++drainCount; } int overflowDrainCount = 0; int limit = overflowQ.size(); while (exitCond.keepRunning() && (overflowDrainCount < limit)) { // 2nd cond prevents staying stuck with consuming overflow Object tuple = overflowQ.poll(); ++overflowDrainCount; consumer.accept(tuple); } int total = drainCount + overflowDrainCount; if (total > 0) { consumer.flush(); } return total; }consume方法主要是调用consumer的accept方法Taskstorm-2.0.0/storm-client/src/jvm/org/apache/storm/daemon/Task.javapublic class Task { private static final Logger LOG = LoggerFactory.getLogger(Task.class); private final TaskMetrics taskMetrics; private Executor executor; private WorkerState workerData; private TopologyContext systemTopologyContext; private TopologyContext userTopologyContext; private WorkerTopologyContext workerTopologyContext; private Integer taskId; private String componentId; private Object taskObject; // Spout/Bolt object private Map<String, Object> topoConf; private BooleanSupplier emitSampler; private CommonStats executorStats; private Map<String, Map<String, LoadAwareCustomStreamGrouping>> streamComponentToGrouper; private HashMap<String, ArrayList<LoadAwareCustomStreamGrouping>> streamToGroupers; private boolean debug; public Task(Executor executor, Integer taskId) throws IOException { this.taskId = taskId; this.executor = executor; this.workerData = executor.getWorkerData(); this.topoConf = executor.getTopoConf(); this.componentId = executor.getComponentId(); this.streamComponentToGrouper = executor.getStreamToComponentToGrouper(); this.streamToGroupers = getGroupersPerStream(streamComponentToGrouper); this.executorStats = executor.getStats(); this.workerTopologyContext = executor.getWorkerTopologyContext(); this.emitSampler = ConfigUtils.mkStatsSampler(topoConf); this.systemTopologyContext = mkTopologyContext(workerData.getSystemTopology()); this.userTopologyContext = mkTopologyContext(workerData.getTopology()); this.taskObject = mkTaskObject(); this.debug = topoConf.containsKey(Config.TOPOLOGY_DEBUG) && (Boolean) topoConf.get(Config.TOPOLOGY_DEBUG); this.addTaskHooks(); this.taskMetrics = new TaskMetrics(this.workerTopologyContext, this.componentId, this.taskId); } //……}Executor.acceptstorm-2.0.0/storm-client/src/jvm/org/apache/storm/executor/Executor.java @Override public void accept(Object event) { AddressedTuple addressedTuple = (AddressedTuple) event; int taskId = addressedTuple.getDest(); TupleImpl tuple = (TupleImpl) addressedTuple.getTuple(); if (isDebug) { LOG.info(“Processing received message FOR {} TUPLE: {}”, taskId, tuple); } try { if (taskId != AddressedTuple.BROADCAST_DEST) { tupleActionFn(taskId, tuple); } else { for (Integer t : taskIds) { tupleActionFn(t, tuple); } } } catch (Exception e) { throw new RuntimeException(e); } }accept方法主要是对每个taskId，挨个调用tupleActionFn方法BoltExecutor.tupleActionFn主要是从task获取boltObject，然后调用boltObject.execute(tuple);SpoutExecutor.tupleActionFn主要是从RotatingMap<Long, TupleInfo> pending取出TupleInfo，然后进行成功或失败的ackExecutorShutdownstorm-2.0.0/storm-client/src/jvm/org/apache/storm/executor/ExecutorShutdown.javapublic class ExecutorShutdown implements Shutdownable, IRunningExecutor { private static final Logger LOG = LoggerFactory.getLogger(ExecutorShutdown.class); private final Executor executor; private final List<Utils.SmartThread> threads; private final ArrayList<Task> taskDatas; private final JCQueue receiveQueue; //…… @Override public void credentialsChanged(Credentials credentials) { TupleImpl tuple = new TupleImpl(executor.getWorkerTopologyContext(), new Values(credentials), Constants.SYSTEM_COMPONENT_ID, (int) Constants.SYSTEM_TASK_ID, Constants.CREDENTIALS_CHANGED_STREAM_ID); AddressedTuple addressedTuple = new AddressedTuple(AddressedTuple.BROADCAST_DEST, tuple); try { executor.getReceiveQueue().publish(addressedTuple); executor.getReceiveQueue().flush(); } catch (InterruptedException e) { throw new RuntimeException(e); } } public void loadChanged(LoadMapping loadMapping) { executor.reflectNewLoadMapping(loadMapping); } @Override public JCQueue getReceiveQueue() { return receiveQueue; } @Override public boolean publishFlushTuple() { return executor.publishFlushTuple(); } @Override public void shutdown() { try { LOG.info(“Shutting down executor " + executor.getComponentId() + “:” + executor.getExecutorId()); executor.getReceiveQueue().close(); for (Utils.SmartThread t : threads) { t.interrupt(); } for (Utils.SmartThread t : threads) { LOG.debug(“Executor " + executor.getComponentId() + “:” + executor.getExecutorId() + " joining thread " + t.getName()); t.join(); } executor.getStats().cleanupStats(); for (Task task : taskDatas) { if (task == null) { continue; } TopologyContext userContext = task.getUserContext(); for (ITaskHook hook : userContext.getHooks()) { hook.cleanup(); } } executor.getStormClusterState().disconnect(); if (executor.getOpenOrPrepareWasCalled().get()) { for (Task task : taskDatas) { if (task == null) { continue; } Object object = task.getTaskObject(); if (object instanceof ISpout) { ((ISpout) object).close(); } else if (object instanceof IBolt) { ((IBolt) object).cleanup(); } else { LOG.error(“unknown component object”); } } } LOG.info(“Shut down executor " + executor.getComponentId() + “:” + executor.getExecutorId()); } catch (Exception e) { throw Utils.wrapInRuntime(e); } }}ExecutorShutdown主要包装了一下shutdown的处理小结worker启动之后从去zk的/assignments/{topology}路径，比如/assignments/DemoTopology-1-1539163962读取assignment信息然后根据assignment信息获取Map<List<Long>, NodeInfo> executorToNodePort，然后通过Executor.mkExecutor创建Executor创建Executor的时候根据assignment信息中的task信息创建Task绑定到Executor之后调用executor的execute方法，这个方法启动Utils.SmartThread，该thread循环调用Executor.call().call()方法BoltExecutor.call主要是调用receiveQueue.consume方法；SpoutExecutor.call除了调用receiveQueue.consume方法，还调用了spouts.get(j).nextTuple()receiveQueue.consume方法主要是调用Executor的accept方法，而accept方法主要是对每个taskId，挨个调用tupleActionFn方法BoltExecutor.tupleActionFn主要是从task获取boltObject，然后调用boltObject.execute(tuple)；SpoutExecutor.tupleActionFn主要是从RotatingMap<Long, TupleInfo> pending取出TupleInfo，然后进行成功或失败的ackworker可以理解为进程，executor即为该进程里头的线程数，而task则可以理解为spout或bolt的实例，默认是一个executor对应一个spout或bolt的task增加worker或executor可以对supervisor进行扩容，这个过程称之为rebalance，而task则作为载体及任务的抽象从负载大的worker的executor转到新worker的executor上，实现rebalance(rebalance命令只能重新调整worker、executor数量，无法改变task数量)docStorm-源码分析- Component ,Executor ,Task之间关系Understanding the Parallelism of a Storm Topology ...

聊聊storm supervisor的启动

序本文主要研究一下storm supervisor的启动Supervisor.launchstorm-core-1.2.2-sources.jar!/org/apache/storm/daemon/supervisor/Supervisor.java /** * Launch the supervisor / public void launch() throws Exception { LOG.info(“Starting Supervisor with conf {}”, conf); String path = ConfigUtils.supervisorTmpDir(conf); FileUtils.cleanDirectory(new File(path)); Localizer localizer = getLocalizer(); SupervisorHeartbeat hb = new SupervisorHeartbeat(conf, this); hb.run(); // should synchronize supervisor so it doesn’t launch anything after being down (optimization) Integer heartbeatFrequency = Utils.getInt(conf.get(Config.SUPERVISOR_HEARTBEAT_FREQUENCY_SECS)); heartbeatTimer.scheduleRecurring(0, heartbeatFrequency, hb); this.eventManager = new EventManagerImp(false); this.readState = new ReadClusterState(this); Set<String> downloadedTopoIds = SupervisorUtils.readDownloadedTopologyIds(conf); Map<Integer, LocalAssignment> portToAssignments = localState.getLocalAssignmentsMap(); if (portToAssignments != null) { Map<String, LocalAssignment> assignments = new HashMap<>(); for (LocalAssignment la : localState.getLocalAssignmentsMap().values()) { assignments.put(la.get_topology_id(), la); } for (String topoId : downloadedTopoIds) { LocalAssignment la = assignments.get(topoId); if (la != null) { SupervisorUtils.addBlobReferences(localizer, topoId, conf, la.get_owner()); } else { LOG.warn(“Could not find an owner for topo {}”, topoId); } } } // do this after adding the references so we don’t try to clean things being used localizer.startCleaner(); UpdateBlobs updateBlobsThread = new UpdateBlobs(this); if ((Boolean) conf.get(Config.SUPERVISOR_ENABLE)) { // This isn’t strictly necessary, but it doesn’t hurt and ensures that the machine stays up // to date even if callbacks don’t all work exactly right eventTimer.scheduleRecurring(0, 10, new EventManagerPushCallback(readState, eventManager)); // Blob update thread. Starts with 30 seconds delay, every 30 seconds blobUpdateTimer.scheduleRecurring(30, 30, new EventManagerPushCallback(updateBlobsThread, eventManager)); // supervisor health check eventTimer.scheduleRecurring(300, 300, new SupervisorHealthCheck(this)); } LOG.info(“Starting supervisor with id {} at host {}.”, getId(), getHostName()); }supervisor launch的时候new了一个ReadClusterStateReadClusterStatestorm-core-1.2.2-sources.jar!/org/apache/storm/daemon/supervisor/ReadClusterState.java public ReadClusterState(Supervisor supervisor) throws Exception { this.superConf = supervisor.getConf(); this.stormClusterState = supervisor.getStormClusterState(); this.syncSupEventManager = supervisor.getEventManger(); this.assignmentVersions = new AtomicReference<Map<String, VersionedData<Assignment>>>(new HashMap<String, VersionedData<Assignment>>()); this.assignmentId = supervisor.getAssignmentId(); this.iSuper = supervisor.getiSupervisor(); this.localizer = supervisor.getAsyncLocalizer(); this.host = supervisor.getHostName(); this.localState = supervisor.getLocalState(); this.clusterState = supervisor.getStormClusterState(); this.cachedAssignments = supervisor.getCurrAssignment(); this.launcher = ContainerLauncher.make(superConf, assignmentId, supervisor.getSharedContext()); @SuppressWarnings(“unchecked”) List<Number> ports = (List<Number>)superConf.get(Config.SUPERVISOR_SLOTS_PORTS); for (Number port: ports) { slots.put(port.intValue(), mkSlot(port.intValue())); } try { Collection<String> workers = SupervisorUtils.supervisorWorkerIds(superConf); for (Slot slot: slots.values()) { String workerId = slot.getWorkerId(); if (workerId != null) { workers.remove(workerId); } } if (!workers.isEmpty()) { supervisor.killWorkers(workers, launcher); } } catch (Exception e) { LOG.warn(“Error trying to clean up old workers”, e); } //All the slots/assignments should be recovered now, so we can clean up anything that we don’t expect to be here try { localizer.cleanupUnusedTopologies(); } catch (Exception e) { LOG.warn(“Error trying to clean up old topologies”, e); } for (Slot slot: slots.values()) { slot.start(); } } private Slot mkSlot(int port) throws Exception { return new Slot(localizer, superConf, launcher, host, port, localState, clusterState, iSuper, cachedAssignments); }这里读取SUPERVISOR_SLOTS_PORTS(supervisor.slots.ports)，默认是[6700,6701,6702,6703]通过ContainerLauncher.make(superConf, assignmentId, supervisor.getSharedContext())创建ContainerLauncher根据slots的port配置调用mkSlot创建slot，最后挨个调用slot的start，启动slot线程ContainerLauncher.makestorm-core-1.2.2-sources.jar!/org/apache/storm/daemon/supervisor/ContainerLauncher.java /* * Factory to create the right container launcher * for the config and the environment. * @param conf the config * @param supervisorId the ID of the supervisor * @param sharedContext Used in local mode to let workers talk together without netty * @return the proper container launcher * @throws IOException on any error / public static ContainerLauncher make(Map<String, Object> conf, String supervisorId, IContext sharedContext) throws IOException { if (ConfigUtils.isLocalMode(conf)) { return new LocalContainerLauncher(conf, supervisorId, sharedContext); } if (Utils.getBoolean(conf.get(Config.SUPERVISOR_RUN_WORKER_AS_USER), false)) { return new RunAsUserContainerLauncher(conf, supervisorId); } return new BasicContainerLauncher(conf, supervisorId); }这里根据配置来创建ContainerLauncher的不同子类，local模式的创建的是LocalContainerLauncher；要求runAsUser的创建的是RunAsUserContainerLauncher；其他的创建的是BasicContainerLauncherSlotstorm-core-1.2.2-sources.jar!/org/apache/storm/daemon/supervisor/Slot.java public void run() { try { while(!done) { Set<TopoProfileAction> origProfileActions = new HashSet<>(profiling.get()); Set<TopoProfileAction> removed = new HashSet<>(origProfileActions); DynamicState nextState = stateMachineStep(dynamicState.withNewAssignment(newAssignment.get()) .withProfileActions(origProfileActions, dynamicState.pendingStopProfileActions), staticState); if (LOG.isDebugEnabled() || dynamicState.state != nextState.state) { LOG.info(“STATE {} -> {}”, dynamicState, nextState); } //Save the current state for recovery if (!equivalent(nextState.currentAssignment, dynamicState.currentAssignment)) { LOG.info(“SLOT {}: Changing current assignment from {} to {}”, staticState.port, dynamicState.currentAssignment, nextState.currentAssignment); saveNewAssignment(nextState.currentAssignment); } if (equivalent(nextState.newAssignment, nextState.currentAssignment) && nextState.currentAssignment != null && nextState.currentAssignment.get_owner() == null && nextState.newAssignment != null && nextState.newAssignment.get_owner() != null) { //This is an odd case for a rolling upgrade where the user on the old assignment may be null, // but not on the new one. Although in all other ways they are the same. // If this happens we want to use the assignment with the owner. LOG.info(“Updating assignment to save owner {}”, nextState.newAssignment.get_owner()); saveNewAssignment(nextState.newAssignment); nextState = nextState.withCurrentAssignment(nextState.container, nextState.newAssignment); } // clean up the profiler actions that are not being processed removed.removeAll(dynamicState.profileActions); removed.removeAll(dynamicState.pendingStopProfileActions); for (TopoProfileAction action: removed) { try { clusterState.deleteTopologyProfileRequests(action.topoId, action.request); } catch (Exception e) { LOG.error(“Error trying to remove profiling request, it will be retried”, e); } } Set<TopoProfileAction> orig, copy; do { orig = profiling.get(); copy = new HashSet<>(orig); copy.removeAll(removed); } while (!profiling.compareAndSet(orig, copy)); dynamicState = nextState; } } catch (Throwable e) { if (!Utils.exceptionCauseIsInstanceOf(InterruptedException.class, e)) { LOG.error(“Error when processing event”, e); Utils.exitProcess(20, “Error when processing an event”); } } } private void saveNewAssignment(LocalAssignment assignment) { synchronized(staticState.localState) { Map<Integer, LocalAssignment> assignments = staticState.localState.getLocalAssignmentsMap(); if (assignments == null) { assignments = new HashMap<>(); } if (assignment == null) { assignments.remove(staticState.port); } else { assignments.put(staticState.port, assignment); } staticState.localState.setLocalAssignmentsMap(assignments); } Map<Long, LocalAssignment> update = null; Map<Long, LocalAssignment> orig = null; do { Long lport = new Long(staticState.port); orig = cachedCurrentAssignments.get(); update = new HashMap<>(orig); if (assignment == null) { update.remove(lport); } else { update.put(lport, assignment); } } while (!cachedCurrentAssignments.compareAndSet(orig, update)); } static DynamicState stateMachineStep(DynamicState dynamicState, StaticState staticState) throws Exception { LOG.debug(“STATE {}”, dynamicState.state); switch (dynamicState.state) { case EMPTY: return handleEmpty(dynamicState, staticState); case RUNNING: return handleRunning(dynamicState, staticState); case WAITING_FOR_WORKER_START: return handleWaitingForWorkerStart(dynamicState, staticState); case KILL_AND_RELAUNCH: return handleKillAndRelaunch(dynamicState, staticState); case KILL: return handleKill(dynamicState, staticState); case WAITING_FOR_BASIC_LOCALIZATION: return handleWaitingForBasicLocalization(dynamicState, staticState); case WAITING_FOR_BLOB_LOCALIZATION: return handleWaitingForBlobLocalization(dynamicState, staticState); default: throw new IllegalStateException(“Code not ready to handle a state of “+dynamicState.state); } }不断循环stateMachineStep方法切换state当state是WAITING_FOR_BLOB_LOCALIZATION时，会触发handleWaitingForBlobLocalizationhandleWaitingForBlobLocalizationstorm-core-1.2.2-sources.jar!/org/apache/storm/daemon/supervisor/Slot.java /* * State Transitions for WAITING_FOR_BLOB_LOCALIZATION state. * PRECONDITION: neither pendingLocalization nor pendingDownload is null. * PRECONDITION: The slot should be empty * @param dynamicState current state * @param staticState static data * @return the next state * @throws Exception on any error / static DynamicState handleWaitingForBlobLocalization(DynamicState dynamicState, StaticState staticState) throws Exception { assert(dynamicState.pendingLocalization != null); assert(dynamicState.pendingDownload != null); assert(dynamicState.container == null); //Ignore changes to scheduling while downloading the topology blobs // We don’t support canceling the download through the future yet, // so to keep everything in sync, just wait try { dynamicState.pendingDownload.get(1000, TimeUnit.MILLISECONDS); //Downloading of all blobs finished. if (!equivalent(dynamicState.newAssignment, dynamicState.pendingLocalization)) { //Scheduling changed staticState.localizer.releaseSlotFor(dynamicState.pendingLocalization, staticState.port); return prepareForNewAssignmentNoWorkersRunning(dynamicState, staticState); } Container c = staticState.containerLauncher.launchContainer(staticState.port, dynamicState.pendingLocalization, staticState.localState); return dynamicState.withCurrentAssignment(c, dynamicState.pendingLocalization).withState(MachineState.WAITING_FOR_WORKER_START).withPendingLocalization(null, null); } catch (TimeoutException e) { //We waited for 1 second loop around and try again…. return dynamicState; } }这里通过staticState.containerLauncher.launchContainer去启动containerBasicContainerLauncher.launchContainerstorm-core-1.2.2-sources.jar!/org/apache/storm/daemon/supervisor/BasicContainerLauncher.java @Override public Container launchContainer(int port, LocalAssignment assignment, LocalState state) throws IOException { LocalContainer ret = new LocalContainer(_conf, _supervisorId, port, assignment, _sharedContext); ret.setup(); ret.launch(); return ret; }launchContainer的时候，先调用setup，再调用launch方法Container.setupstorm-core-1.2.2-sources.jar!/org/apache/storm/daemon/supervisor/Container.java /* * Setup the container to run. By default this creates the needed directories/links in the * local file system * PREREQUISITE: All needed blobs and topology, jars/configs have been downloaded and * placed in the appropriate locations * @throws IOException on any error / protected void setup() throws IOException { _type.assertFull(); if (!_ops.doRequiredTopoFilesExist(_conf, _topologyId)) { LOG.info(“Missing topology storm code, so can’t launch worker with assignment {} for this supervisor {} on port {} with id {}”, _assignment, _supervisorId, _port, _workerId); throw new IllegalStateException(“Not all needed files are here!!!!”); } LOG.info(“Setting up {}:{}”, _supervisorId, _workerId); _ops.forceMkdir(new File(ConfigUtils.workerPidsRoot(_conf, _workerId))); _ops.forceMkdir(new File(ConfigUtils.workerTmpRoot(_conf, _workerId))); _ops.forceMkdir(new File(ConfigUtils.workerHeartbeatsRoot(_conf, _workerId))); File workerArtifacts = new File(ConfigUtils.workerArtifactsRoot(_conf, _topologyId, _port)); if (!_ops.fileExists(workerArtifacts)) { _ops.forceMkdir(workerArtifacts); _ops.setupWorkerArtifactsDir(_assignment.get_owner(), workerArtifacts); } String user = getWorkerUser(); writeLogMetadata(user); saveWorkerUser(user); createArtifactsLink(); createBlobstoreLinks(); }setup主要做一些创建目录或链接的准备工作BasicContainer.launchstorm-core-1.2.2-sources.jar!/org/apache/storm/daemon/supervisor/BasicContainer.java public void launch() throws IOException { _type.assertFull(); LOG.info(“Launching worker with assignment {} for this supervisor {} on port {} with id {}”, _assignment, _supervisorId, _port, _workerId); String logPrefix = “Worker Process " + _workerId; ProcessExitCallback processExitCallback = new ProcessExitCallback(logPrefix); _exitedEarly = false; final WorkerResources resources = _assignment.get_resources(); final int memOnheap = getMemOnHeap(resources); final String stormRoot = ConfigUtils.supervisorStormDistRoot(_conf, _topologyId); final String jlp = javaLibraryPath(stormRoot, _conf); List<String> commandList = mkLaunchCommand(memOnheap, stormRoot, jlp); Map<String, String> topEnvironment = new HashMap<String, String>(); @SuppressWarnings(“unchecked”) Map<String, String> environment = (Map<String, String>) _topoConf.get(Config.TOPOLOGY_ENVIRONMENT); if (environment != null) { topEnvironment.putAll(environment); } topEnvironment.put(“LD_LIBRARY_PATH”, jlp); LOG.info(“Launching worker with command: {}. “, Utils.shellCmd(commandList)); String workerDir = ConfigUtils.workerRoot(_conf, _workerId); launchWorkerProcess(commandList, topEnvironment, logPrefix, processExitCallback, new File(workerDir)); } /* * Launch the worker process (non-blocking) * * @param command * the command to run * @param env * the environment to run the command * @param processExitcallback * a callback for when the process exits * @param logPrefix * the prefix to include in the logs * @param targetDir * the working directory to run the command in * @return true if it ran successfully, else false * @throws IOException * on any error / protected void launchWorkerProcess(List<String> command, Map<String, String> env, String logPrefix, ExitCodeCallback processExitCallback, File targetDir) throws IOException { SupervisorUtils.launchProcess(command, env, logPrefix, processExitCallback, targetDir); }这里通过mkLaunchCommand来准备创建命令然后通过SupervisorUtils.launchProcess启动worker进程mkLaunchCommandstorm-core-1.2.2-sources.jar!/org/apache/storm/daemon/supervisor/BasicContainerLauncher.java /* * Create the command to launch the worker process * @param memOnheap the on heap memory for the worker * @param stormRoot the root dist dir for the topology * @param jlp java library path for the topology * @return the command to run * @throws IOException on any error. / private List<String> mkLaunchCommand(final int memOnheap, final String stormRoot, final String jlp) throws IOException { final String javaCmd = javaCmd(“java”); final String stormOptions = ConfigUtils.concatIfNotNull(System.getProperty(“storm.options”)); final String stormConfFile = ConfigUtils.concatIfNotNull(System.getProperty(“storm.conf.file”)); final String workerTmpDir = ConfigUtils.workerTmpRoot(_conf, _workerId); List<String> classPathParams = getClassPathParams(stormRoot); List<String> commonParams = getCommonParams(); List<String> commandList = new ArrayList<>(); //Log Writer Command… commandList.add(javaCmd); commandList.addAll(classPathParams); commandList.addAll(substituteChildopts(_topoConf.get(Config.TOPOLOGY_WORKER_LOGWRITER_CHILDOPTS))); commandList.addAll(commonParams); commandList.add(“org.apache.storm.LogWriter”); //The LogWriter in turn launches the actual worker. //Worker Command… commandList.add(javaCmd); commandList.add("-server”); commandList.addAll(commonParams); commandList.addAll(substituteChildopts(_conf.get(Config.WORKER_CHILDOPTS), memOnheap)); commandList.addAll(substituteChildopts(_topoConf.get(Config.TOPOLOGY_WORKER_CHILDOPTS), memOnheap)); commandList.addAll(substituteChildopts(OR( _topoConf.get(Config.TOPOLOGY_WORKER_GC_CHILDOPTS), _conf.get(Config.WORKER_GC_CHILDOPTS)), memOnheap)); commandList.addAll(getWorkerProfilerChildOpts(memOnheap)); commandList.add("-Djava.library.path=” + jlp); commandList.add("-Dstorm.conf.file=” + stormConfFile); commandList.add("-Dstorm.options=” + stormOptions); commandList.add("-Djava.io.tmpdir=” + workerTmpDir); commandList.addAll(classPathParams); commandList.add(“org.apache.storm.daemon.worker”); commandList.add(_topologyId); commandList.add(_supervisorId); commandList.add(String.valueOf(_port)); commandList.add(_workerId); return commandList; }启动参数实例/usr/lib/jvm/java-1.8-openjdk/jre/bin/java -server -Dlogging.sensitivity=S3 -Dlogfile.name=worker.log -Dstorm.home=/apache-storm-1.2.2 -Dworkers.artifacts=/logs/workers-artifacts -Dstorm.id=DemoTopology-1-1539163962 -Dworker.id=f0f30bc3-11af-4f4f-b2dd-8cc92d8791bf -Dworker.port=6700 -Dstorm.log.dir=/logs -Dlog4j.configurationFile=/apache-storm-1.2.2/log4j2/worker.xml -DLog4jContextSelector=org.apache.logging.log4j.core.selector.BasicContextSelector -Dstorm.local.dir=/data -Xmx768m -XX:+PrintGCDetails -Xloggc:artifacts/gc.log -XX:+PrintGCDateStamps -XX:+PrintGCTimeStamps -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=10 -XX:GCLogFileSize=1M -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=artifacts/heapdump -Djava.library.path=/data/supervisor/stormdist/DemoTopology-1-1539163962/resources/Linux-amd64:/data/supervisor/stormdist/DemoTopology-1-1539163962/resources:/usr/local/lib:/opt/local/lib:/usr/lib -Dstorm.conf.file= -Dstorm.options=storm.local.hostname%3D192.168.99.100 -Djava.io.tmpdir=/data/workers/f0f30bc3-11af-4f4f-b2dd-8cc92d8791bf/tmp -cp /apache-storm-1.2.2/lib/:/apache-storm-1.2.2/extlib/*:/conf:/data/supervisor/stormdist/DemoTopology-1-1539163962/stormjar.jar org.apache.storm.daemon.worker DemoTopology-1-1539163962 8dd6dc7f-95cb-49f9-9bd1-f0d638fe6fc6 6700 f0f30bc3-11af-4f4f-b2dd-8cc92d8791bforg.apache.storm.daemon.worker"的路径为storm-core-1.2.2-sources.jar!/org/apache/storm/daemon/worker.cljSupervisorUtils.launchProcessstorm-core-1.2.2-sources.jar!/org/apache/storm/daemon/supervisor/SupervisorUtils.java /** * Launch a new process as per {@link java.lang.ProcessBuilder} with a given * callback. * @param command the command to be executed in the new process * @param environment the environment to be applied to the process. Can be * null. * @param logPrefix a prefix for log entries from the output of the process. * Can be null. * @param exitCodeCallback code to be called passing the exit code value * when the process completes * @param dir the working directory of the new process * @return the new process * @throws IOException * @see java.lang.ProcessBuilder */ public static Process launchProcess(List<String> command, Map<String,String> environment, final String logPrefix, final ExitCodeCallback exitCodeCallback, File dir) throws IOException { ProcessBuilder builder = new ProcessBuilder(command); Map<String,String> procEnv = builder.environment(); if (dir != null) { builder.directory(dir); } builder.redirectErrorStream(true); if (environment != null) { procEnv.putAll(environment); } final Process process = builder.start(); if (logPrefix != null || exitCodeCallback != null) { Utils.asyncLoop(new Callable<Object>() { public Object call() { if (logPrefix != null ) { Utils.readAndLogStream(logPrefix, process.getInputStream()); } if (exitCodeCallback != null) { try { process.waitFor(); exitCodeCallback.call(process.exitValue()); } catch (InterruptedException ie) { LOG.info("{} interrupted", logPrefix); exitCodeCallback.call(-1); } } return null; // Run only once. } }); } return process; }这里通过ProcessBuilder来启动进程小结storm的supervisor启动的时候，会创建ContainerLauncher以及根据SUPERVISOR_SLOTS_PORTS(supervisor.slots.ports)创建slotsslot线程会不断循环state，在WAITING_FOR_BLOB_LOCALIZATION的时候使用ContainerLauncher的launchContainer创建Container并launchcontainer launch的时候通过SupervisorUtils.launchProcess(使用ProcessBuilder)启动worker进程docStorm Concepts ...

聊聊storm nimbus的LeaderElector

序本文主要研究一下storm nimbus的LeaderElectorNimbusorg/apache/storm/daemon/nimbus/Nimbus.java public static void main(String[] args) throws Exception { Utils.setupDefaultUncaughtExceptionHandler(); launch(new StandaloneINimbus()); } public static Nimbus launch(INimbus inimbus) throws Exception { Map<String, Object> conf = Utils.merge(ConfigUtils.readStormConfig(), ConfigUtils.readYamlConfig(“storm-cluster-auth.yaml”, false)); boolean fixupAcl = (boolean) conf.get(DaemonConfig.STORM_NIMBUS_ZOOKEEPER_ACLS_FIXUP); boolean checkAcl = fixupAcl || (boolean) conf.get(DaemonConfig.STORM_NIMBUS_ZOOKEEPER_ACLS_CHECK); if (checkAcl) { AclEnforcement.verifyAcls(conf, fixupAcl); } return launchServer(conf, inimbus); } private static Nimbus launchServer(Map<String, Object> conf, INimbus inimbus) throws Exception { StormCommon.validateDistributedMode(conf); validatePortAvailable(conf); StormMetricsRegistry metricsRegistry = new StormMetricsRegistry(); final Nimbus nimbus = new Nimbus(conf, inimbus, metricsRegistry); nimbus.launchServer(); final ThriftServer server = new ThriftServer(conf, new Processor<>(nimbus), ThriftConnectionType.NIMBUS); metricsRegistry.startMetricsReporters(conf); Utils.addShutdownHookWithDelayedForceKill(() -> { metricsRegistry.stopMetricsReporters(); nimbus.shutdown(); server.stop(); }, 10); if (ClientAuthUtils.areWorkerTokensEnabledServer(server, conf)) { nimbus.initWorkerTokenManager(); } LOG.info(“Starting nimbus server for storm version ‘{}’”, STORM_VERSION); server.serve(); return nimbus; } public Nimbus(Map<String, Object> conf, INimbus inimbus, IStormClusterState stormClusterState, NimbusInfo hostPortInfo, BlobStore blobStore, TopoCache topoCache, ILeaderElector leaderElector, IGroupMappingServiceProvider groupMapper, StormMetricsRegistry metricsRegistry) throws Exception { //…… if (blobStore == null) { blobStore = ServerUtils.getNimbusBlobStore(conf, this.nimbusHostPortInfo, null); } this.blobStore = blobStore; if (topoCache == null) { topoCache = new TopoCache(blobStore, conf); } if (leaderElector == null) { leaderElector = Zookeeper.zkLeaderElector(conf, zkClient, blobStore, topoCache, stormClusterState, getNimbusAcls(conf), metricsRegistry); } this.leaderElector = leaderElector; this.blobStore.setLeaderElector(this.leaderElector); //…… } public void launchServer() throws Exception { try { BlobStore store = blobStore; IStormClusterState state = stormClusterState; NimbusInfo hpi = nimbusHostPortInfo; LOG.info(“Starting Nimbus with conf {}”, ConfigUtils.maskPasswords(conf)); validator.prepare(conf); //add to nimbuses state.addNimbusHost(hpi.getHost(), new NimbusSummary(hpi.getHost(), hpi.getPort(), Time.currentTimeSecs(), false, STORM_VERSION)); leaderElector.addToLeaderLockQueue(); this.blobStore.startSyncBlobs(); for (ClusterMetricsConsumerExecutor exec: clusterConsumerExceutors) { exec.prepare(); } if (isLeader()) { for (String topoId : state.activeStorms()) { transition(topoId, TopologyActions.STARTUP, null); } clusterMetricSet.setActive(true); } //…… } catch (Exception e) { if (Utils.exceptionCauseIsInstanceOf(InterruptedException.class, e)) { throw e; } if (Utils.exceptionCauseIsInstanceOf(InterruptedIOException.class, e)) { throw e; } LOG.error(“Error on initialization of nimbus”, e); Utils.exitProcess(13, “Error on initialization of nimbus”); } }Nimbus在构造器里头调用Zookeeper.zkLeaderElector创建leaderElectorlaunchServer方法调用了leaderElector.addToLeaderLockQueue()参与leader选举Zookeeper.zkLeaderElectorstorm-core-1.1.0-sources.jar!/org/apache/storm/zookeeper/Zookeeper.java public static ILeaderElector zkLeaderElector(Map conf, BlobStore blobStore) throws UnknownHostException { return _instance.zkLeaderElectorImpl(conf, blobStore); } protected ILeaderElector zkLeaderElectorImpl(Map conf, BlobStore blobStore) throws UnknownHostException { List<String> servers = (List<String>) conf.get(Config.STORM_ZOOKEEPER_SERVERS); Object port = conf.get(Config.STORM_ZOOKEEPER_PORT); CuratorFramework zk = mkClientImpl(conf, servers, port, “”, conf); String leaderLockPath = conf.get(Config.STORM_ZOOKEEPER_ROOT) + “/leader-lock”; String id = NimbusInfo.fromConf(conf).toHostPortString(); AtomicReference<LeaderLatch> leaderLatchAtomicReference = new AtomicReference<>(new LeaderLatch(zk, leaderLockPath, id)); AtomicReference<LeaderLatchListener> leaderLatchListenerAtomicReference = new AtomicReference<>(leaderLatchListenerImpl(conf, zk, blobStore, leaderLatchAtomicReference.get())); return new LeaderElectorImp(conf, servers, zk, leaderLockPath, id, leaderLatchAtomicReference, leaderLatchListenerAtomicReference, blobStore); }这里使用/leader-lock路径创建了LeaderLatch，然后使用leaderLatchListenerImpl创建了LeaderLatchListener最后使用LeaderElectorImp创建ILeaderElectorleaderLatchListenerImplstorm-core-1.1.0-sources.jar!/org/apache/storm/zookeeper/Zookeeper.java // Leader latch listener that will be invoked when we either gain or lose leadership public static LeaderLatchListener leaderLatchListenerImpl(final Map conf, final CuratorFramework zk, final BlobStore blobStore, final LeaderLatch leaderLatch) throws UnknownHostException { final String hostName = InetAddress.getLocalHost().getCanonicalHostName(); return new LeaderLatchListener() { final String STORM_JAR_SUFFIX = “-stormjar.jar”; final String STORM_CODE_SUFFIX = “-stormcode.ser”; final String STORM_CONF_SUFFIX = “-stormconf.ser”; @Override public void isLeader() { Set<String> activeTopologyIds = new TreeSet<>(Zookeeper.getChildren(zk, conf.get(Config.STORM_ZOOKEEPER_ROOT) + ClusterUtils.STORMS_SUBTREE, false)); Set<String> activeTopologyBlobKeys = populateTopologyBlobKeys(activeTopologyIds); Set<String> activeTopologyCodeKeys = filterTopologyCodeKeys(activeTopologyBlobKeys); Set<String> allLocalBlobKeys = Sets.newHashSet(blobStore.listKeys()); Set<String> allLocalTopologyBlobKeys = filterTopologyBlobKeys(allLocalBlobKeys); // this finds all active topologies blob keys from all local topology blob keys Sets.SetView<String> diffTopology = Sets.difference(activeTopologyBlobKeys, allLocalTopologyBlobKeys); LOG.info(“active-topology-blobs [{}] local-topology-blobs [{}] diff-topology-blobs [{}]”, generateJoinedString(activeTopologyIds), generateJoinedString(allLocalTopologyBlobKeys), generateJoinedString(diffTopology)); if (diffTopology.isEmpty()) { Set<String> activeTopologyDependencies = getTopologyDependencyKeys(activeTopologyCodeKeys); // this finds all dependency blob keys from active topologies from all local blob keys Sets.SetView<String> diffDependencies = Sets.difference(activeTopologyDependencies, allLocalBlobKeys); LOG.info(“active-topology-dependencies [{}] local-blobs [{}] diff-topology-dependencies [{}]”, generateJoinedString(activeTopologyDependencies), generateJoinedString(allLocalBlobKeys), generateJoinedString(diffDependencies)); if (diffDependencies.isEmpty()) { LOG.info(“Accepting leadership, all active topologies and corresponding dependencies found locally.”); } else { LOG.info(“Code for all active topologies is available locally, but some dependencies are not found locally, giving up leadership.”); closeLatch(); } } else { LOG.info(“code for all active topologies not available locally, giving up leadership.”); closeLatch(); } } @Override public void notLeader() { LOG.info("{} lost leadership.", hostName); } //…… private void closeLatch() { try { leaderLatch.close(); } catch (IOException e) { throw new RuntimeException(e); } } }; }leaderLatchListenerImpl返回一个LeaderLatchListener接口的实现类isLeader接口里头做了一些校验，即当被zookeeper选中为leader的时候，如果本地没有所有的active topologies或者本地没有所有dependencies，那么就需要调用leaderLatch.close()放弃leadershipnotLeader接口主要打印一下logLeaderElectorImporg/apache/storm/zookeeper/LeaderElectorImp.javapublic class LeaderElectorImp implements ILeaderElector { private static Logger LOG = LoggerFactory.getLogger(LeaderElectorImp.class); private final Map<String, Object> conf; private final List<String> servers; private final CuratorFramework zk; private final String leaderlockPath; private final String id; private final AtomicReference<LeaderLatch> leaderLatch; private final AtomicReference<LeaderLatchListener> leaderLatchListener; private final BlobStore blobStore; private final TopoCache tc; private final IStormClusterState clusterState; private final List<ACL> acls; private final StormMetricsRegistry metricsRegistry; public LeaderElectorImp(Map<String, Object> conf, List<String> servers, CuratorFramework zk, String leaderlockPath, String id, AtomicReference<LeaderLatch> leaderLatch, AtomicReference<LeaderLatchListener> leaderLatchListener, BlobStore blobStore, final TopoCache tc, IStormClusterState clusterState, List<ACL> acls, StormMetricsRegistry metricsRegistry) { this.conf = conf; this.servers = servers; this.zk = zk; this.leaderlockPath = leaderlockPath; this.id = id; this.leaderLatch = leaderLatch; this.leaderLatchListener = leaderLatchListener; this.blobStore = blobStore; this.tc = tc; this.clusterState = clusterState; this.acls = acls; this.metricsRegistry = metricsRegistry; } @Override public void prepare(Map<String, Object> conf) { // no-op for zookeeper implementation } @Override public void addToLeaderLockQueue() throws Exception { // if this latch is already closed, we need to create new instance. if (LeaderLatch.State.CLOSED.equals(leaderLatch.get().getState())) { leaderLatch.set(new LeaderLatch(zk, leaderlockPath)); LeaderListenerCallback callback = new LeaderListenerCallback(conf, zk, leaderLatch.get(), blobStore, tc, clusterState, acls, metricsRegistry); leaderLatchListener.set(Zookeeper.leaderLatchListenerImpl(callback)); LOG.info(“LeaderLatch was in closed state. Resetted the leaderLatch and listeners.”); } // Only if the latch is not already started we invoke start if (LeaderLatch.State.LATENT.equals(leaderLatch.get().getState())) { leaderLatch.get().addListener(leaderLatchListener.get()); leaderLatch.get().start(); LOG.info(“Queued up for leader lock.”); } else { LOG.info(“Node already in queue for leader lock.”); } } @Override // Only started latches can be closed. public void removeFromLeaderLockQueue() throws Exception { if (LeaderLatch.State.STARTED.equals(leaderLatch.get().getState())) { leaderLatch.get().close(); LOG.info(“Removed from leader lock queue.”); } else { LOG.info(“leader latch is not started so no removeFromLeaderLockQueue needed.”); } } @Override public boolean isLeader() throws Exception { return leaderLatch.get().hasLeadership(); } @Override public NimbusInfo getLeader() { try { return Zookeeper.toNimbusInfo(leaderLatch.get().getLeader()); } catch (Exception e) { throw Utils.wrapInRuntime(e); } } @Override public List<NimbusInfo> getAllNimbuses() throws Exception { List<NimbusInfo> nimbusInfos = new ArrayList<>(); Collection<Participant> participants = leaderLatch.get().getParticipants(); for (Participant participant : participants) { nimbusInfos.add(Zookeeper.toNimbusInfo(participant)); } return nimbusInfos; } @Override public void close() { //Do nothing now. }}LeaderElectorImp实现了ILeaderElector接口addToLeaderLockQueue方法检测如果latch已经closed，则重新创建一个新的，然后检测latch的状态，如果还没有start的话，则调用start参与选举之所以对closed状态的latch创建一个，主要有两个原因：一是对已经closed的latch进行方法调用会抛异常，二是被zk选举为leader，但是不满意storm的一些leader条件会放弃leadership即close掉小结storm nimbus的LeaderElector主要是基于zookeeper recipies的LeaderLatch来实现storm nimbus自定义了LeaderLatchListener，对成为leader之后的nimbus进行校验，需要本地拥有所有的active topologies以及所有dependencies，否则放弃leadershipdocHighly Available Nimbus Design ...

聊聊storm client的nimbus.seeds参数

序本文主要研究一下storm client的nimbus.seeds参数NIMBUS_SEEDSstorm-core-1.1.0-sources.jar!/org/apache/storm/Config.java /** * The host that the master server is running on, added only for backward compatibility, * the usage deprecated in favor of nimbus.seeds config. / @Deprecated @isString public static final String NIMBUS_HOST = “nimbus.host”; /* * List of seed nimbus hosts to use for leader nimbus discovery. */ @isStringList public static final String NIMBUS_SEEDS = “nimbus.seeds”;可以看到这里废除了nimbus.host参数，而nimbus.seeds参数主要用于发现nimbus leaderStormSubmitterstorm-core-1.1.0-sources.jar!/org/apache/storm/StormSubmitter.java public static void submitTopologyAs(String name, Map stormConf, StormTopology topology, SubmitOptions opts, ProgressListener progressListener, String asUser) throws AlreadyAliveException, InvalidTopologyException, AuthorizationException, IllegalArgumentException { if(!Utils.isValidConf(stormConf)) { throw new IllegalArgumentException(“Storm conf is not valid. Must be json-serializable”); } stormConf = new HashMap(stormConf); stormConf.putAll(Utils.readCommandLineOpts()); Map conf = Utils.readStormConfig(); conf.putAll(stormConf); stormConf.putAll(prepareZookeeperAuthentication(conf)); validateConfs(conf, topology); Map<String,String> passedCreds = new HashMap<>(); if (opts != null) { Credentials tmpCreds = opts.get_creds(); if (tmpCreds != null) { passedCreds = tmpCreds.get_creds(); } } Map<String,String> fullCreds = populateCredentials(conf, passedCreds); if (!fullCreds.isEmpty()) { if (opts == null) { opts = new SubmitOptions(TopologyInitialStatus.ACTIVE); } opts.set_creds(new Credentials(fullCreds)); } try { if (localNimbus!=null) { LOG.info(“Submitting topology " + name + " in local mode”); if (opts!=null) { localNimbus.submitTopologyWithOpts(name, stormConf, topology, opts); } else { // this is for backwards compatibility localNimbus.submitTopology(name, stormConf, topology); } LOG.info(“Finished submitting topology: " + name); } else { String serConf = JSONValue.toJSONString(stormConf); try (NimbusClient client = NimbusClient.getConfiguredClientAs(conf, asUser)) { if (topologyNameExists(name, client)) { throw new RuntimeException(“Topology with name " + name + " already exists on cluster”); } // Dependency uploading only makes sense for distributed mode List<String> jarsBlobKeys = Collections.emptyList(); List<String> artifactsBlobKeys; DependencyUploader uploader = new DependencyUploader(); try { uploader.init(); jarsBlobKeys = uploadDependencyJarsToBlobStore(uploader); artifactsBlobKeys = uploadDependencyArtifactsToBlobStore(uploader); } catch (Throwable e) { // remove uploaded jars blobs, not artifacts since they’re shared across the cluster uploader.deleteBlobs(jarsBlobKeys); uploader.shutdown(); throw e; } try { setDependencyBlobsToTopology(topology, jarsBlobKeys, artifactsBlobKeys); submitTopologyInDistributeMode(name, topology, opts, progressListener, asUser, conf, serConf, client); } catch (AlreadyAliveException | InvalidTopologyException | AuthorizationException e) { // remove uploaded jars blobs, not artifacts since they’re shared across the cluster // Note that we don’t handle TException to delete jars blobs // because it’s safer to leave some blobs instead of topology not running uploader.deleteBlobs(jarsBlobKeys); throw e; } finally { uploader.shutdown(); } } } } catch(TException e) { throw new RuntimeException(e); } invokeSubmitterHook(name, asUser, conf, topology); }StormSubmitter的submitTopologyAs通过NimbusClient.getConfiguredClientAs(conf, asUser)创建NimbusClientNimbusClientstorm-core-1.1.0-sources.jar!/org/apache/storm/utils/NimbusClient.java public static NimbusClient getConfiguredClientAs(Map conf, String asUser) { if (conf.containsKey(Config.STORM_DO_AS_USER)) { if (asUser != null && !asUser.isEmpty()) { LOG.warn(“You have specified a doAsUser as param {} and a doAsParam as config, config will take precedence.” , asUser, conf.get(Config.STORM_DO_AS_USER)); } asUser = (String) conf.get(Config.STORM_DO_AS_USER); } List<String> seeds; if(conf.containsKey(Config.NIMBUS_HOST)) { LOG.warn(“Using deprecated config {} for backward compatibility. Please update your storm.yaml so it only has config {}”, Config.NIMBUS_HOST, Config.NIMBUS_SEEDS); seeds = Lists.newArrayList(conf.get(Config.NIMBUS_HOST).toString()); } else { seeds = (List<String>) conf.get(Config.NIMBUS_SEEDS); } for (String host : seeds) { int port = Integer.parseInt(conf.get(Config.NIMBUS_THRIFT_PORT).toString()); NimbusSummary nimbusSummary; NimbusClient client = null; try { client = new NimbusClient(conf, host, port, null, asUser); nimbusSummary = client.getClient().getLeader(); if (nimbusSummary != null) { String leaderNimbus = nimbusSummary.get_host() + “:” + nimbusSummary.get_port(); LOG.info(“Found leader nimbus : {}”, leaderNimbus); if (nimbusSummary.get_host().equals(host) && nimbusSummary.get_port() == port) { NimbusClient ret = client; client = null; return ret; } try { return new NimbusClient(conf, nimbusSummary.get_host(), nimbusSummary.get_port(), null, asUser); } catch (TTransportException e) { throw new RuntimeException(“Failed to create a nimbus client for the leader " + leaderNimbus, e); } } } catch (Exception e) { LOG.warn(“Ignoring exception while trying to get leader nimbus info from " + host + “. will retry with a different seed host.”, e); continue; } finally { if (client != null) { client.close(); } } throw new NimbusLeaderNotFoundException(“Could not find a nimbus leader, please try " + “again after some time.”); } throw new NimbusLeaderNotFoundException( “Could not find leader nimbus from seed hosts " + seeds + “. " + “Did you specify a valid list of nimbus hosts for config " + Config.NIMBUS_SEEDS + “?”); }这里仍然兼容NIMBUS_HOST参数，如果有NIMBUS_HOST参数则从中读取seeds，没有则从NIMBUS_SEEDS参数获取之后遍历seeds，根据每个seed创建NimbusClient，然后调用client.getClient().getLeader()获取leader信息，如果获取成功，则判断leader是否当前连接的seed，如果是则直接返回，如果不是则根据leader的host和port创建新的NimbusClient返回如果nimbusSummary为null，则会抛出NimbusLeaderNotFoundException(“Could not find a nimbus leader, please try again after some time.")如果连接leader出现异常，则遍历下一个seed，进行retry操作，如果所有seed都retry失败，则跳出循环，最后抛出NimbusLeaderNotFoundException(“Could not find leader nimbus from seed hosts " + seeds + “. Did you specify a valid list of nimbus hosts for config nimbus.seeds?")小结对于storm client来说，nimbus.seeds参数用于client进行寻找nimbus leader，而nimbus.host参数已经被废弃寻找nimbus leader的过程就是挨个遍历seeds配置的host，进行连接，然后获取leader的信息，如果获取成功但是nimbusSummary为null，则抛出NimbusLeaderNotFoundException(“Could not find a nimbus leader, please try again after some time.")。如果有异常则遍历下一个seed进行retry，如果都不成功，则最后跳出循环，抛出NimbusLeaderNotFoundException(“Could not find leader nimbus from seed hosts " + seeds + “. Did you specify a valid list of nimbus hosts for config nimbus.seeds?")docSetting-up-a-Storm-cluster ...

聊聊storm的submitTopology

序本文主要研究一下storm的submitTopology提交topology日志实例2018-10-08 17:32:55.738 INFO 2870 — [ main] org.apache.storm.StormSubmitter : Generated ZooKeeper secret payload for MD5-digest: -8659577410336375158:-63518734380418553182018-10-08 17:32:55.893 INFO 2870 — [ main] org.apache.storm.utils.NimbusClient : Found leader nimbus : a391f7a04044:66272018-10-08 17:32:56.059 INFO 2870 — [ main] o.apache.storm.security.auth.AuthUtils : Got AutoCreds []2018-10-08 17:32:56.073 INFO 2870 — [ main] org.apache.storm.utils.NimbusClient : Found leader nimbus : a391f7a04044:66272018-10-08 17:32:56.123 INFO 2870 — [ main] org.apache.storm.StormSubmitter : Uploading dependencies - jars…2018-10-08 17:32:56.125 INFO 2870 — [ main] org.apache.storm.StormSubmitter : Uploading dependencies - artifacts…2018-10-08 17:32:56.125 INFO 2870 — [ main] org.apache.storm.StormSubmitter : Dependency Blob keys - jars : [] / artifacts : []2018-10-08 17:32:56.149 INFO 2870 — [ main] org.apache.storm.StormSubmitter : Uploading topology jar /tmp/storm-demo/target/storm-demo-0.0.1-SNAPSHOT.jar to assigned location: /data/nimbus/inbox/stormjar-4ead82bb-74a3-45a3-aca4-3af2f1d23998.jar2018-10-08 17:32:57.105 INFO 2870 — [ main] org.apache.storm.StormSubmitter : Successfully uploaded topology jar to assigned location: /data/nimbus/inbox/stormjar-4ead82bb-74a3-45a3-aca4-3af2f1d23998.jar2018-10-08 17:32:57.106 INFO 2870 — [ main] org.apache.storm.StormSubmitter : Submitting topology DemoTopology in distributed mode with conf {“nimbus.seeds”:[“192.168.99.100”],“storm.zookeeper.topology.auth.scheme”:“digest”,“topology.workers”:1,“storm.zookeeper.port”:2181,“nimbus.thrift.port”:6627,“storm.zookeeper.topology.auth.payload”:"-8659577410336375158:-6351873438041855318",“storm.zookeeper.servers”:[“192.168.99.100”]}2018-10-08 17:32:58.008 INFO 2870 — [ main] org.apache.storm.StormSubmitter : Finished submitting topology: DemoTopology这里可以看到这里上传到了nimbus的路径为/data/nimbus/inbox/stormjar-4ead82bb-74a3-45a3-aca4-3af2f1d23998.jarStormSubmittersubmitTopologystorm-core-1.1.0-sources.jar!/org/apache/storm/StormSubmitter.java public static void submitTopology(String name, Map stormConf, StormTopology topology) throws AlreadyAliveException, InvalidTopologyException, AuthorizationException { submitTopology(name, stormConf, topology, null, null); } public static void submitTopology(String name, Map stormConf, StormTopology topology, SubmitOptions opts, ProgressListener progressListener) throws AlreadyAliveException, InvalidTopologyException, AuthorizationException { submitTopologyAs(name, stormConf, topology, opts, progressListener, null); } public static void submitTopologyAs(String name, Map stormConf, StormTopology topology, SubmitOptions opts, ProgressListener progressListener, String asUser) throws AlreadyAliveException, InvalidTopologyException, AuthorizationException, IllegalArgumentException { if(!Utils.isValidConf(stormConf)) { throw new IllegalArgumentException(“Storm conf is not valid. Must be json-serializable”); } stormConf = new HashMap(stormConf); stormConf.putAll(Utils.readCommandLineOpts()); Map conf = Utils.readStormConfig(); conf.putAll(stormConf); stormConf.putAll(prepareZookeeperAuthentication(conf)); validateConfs(conf, topology); Map<String,String> passedCreds = new HashMap<>(); if (opts != null) { Credentials tmpCreds = opts.get_creds(); if (tmpCreds != null) { passedCreds = tmpCreds.get_creds(); } } Map<String,String> fullCreds = populateCredentials(conf, passedCreds); if (!fullCreds.isEmpty()) { if (opts == null) { opts = new SubmitOptions(TopologyInitialStatus.ACTIVE); } opts.set_creds(new Credentials(fullCreds)); } try { if (localNimbus!=null) { LOG.info(“Submitting topology " + name + " in local mode”); if (opts!=null) { localNimbus.submitTopologyWithOpts(name, stormConf, topology, opts); } else { // this is for backwards compatibility localNimbus.submitTopology(name, stormConf, topology); } LOG.info(“Finished submitting topology: " + name); } else { String serConf = JSONValue.toJSONString(stormConf); try (NimbusClient client = NimbusClient.getConfiguredClientAs(conf, asUser)) { if (topologyNameExists(name, client)) { throw new RuntimeException(“Topology with name " + name + " already exists on cluster”); } // Dependency uploading only makes sense for distributed mode List<String> jarsBlobKeys = Collections.emptyList(); List<String> artifactsBlobKeys; DependencyUploader uploader = new DependencyUploader(); try { uploader.init(); jarsBlobKeys = uploadDependencyJarsToBlobStore(uploader); artifactsBlobKeys = uploadDependencyArtifactsToBlobStore(uploader); } catch (Throwable e) { // remove uploaded jars blobs, not artifacts since they’re shared across the cluster uploader.deleteBlobs(jarsBlobKeys); uploader.shutdown(); throw e; } try { setDependencyBlobsToTopology(topology, jarsBlobKeys, artifactsBlobKeys); submitTopologyInDistributeMode(name, topology, opts, progressListener, asUser, conf, serConf, client); } catch (AlreadyAliveException | InvalidTopologyException | AuthorizationException e) { // remove uploaded jars blobs, not artifacts since they’re shared across the cluster // Note that we don’t handle TException to delete jars blobs // because it’s safer to leave some blobs instead of topology not running uploader.deleteBlobs(jarsBlobKeys); throw e; } finally { uploader.shutdown(); } } } } catch(TException e) { throw new RuntimeException(e); } invokeSubmitterHook(name, asUser, conf, topology); } private static void submitTopologyInDistributeMode(String name, StormTopology topology, SubmitOptions opts, ProgressListener progressListener, String asUser, Map conf, String serConf, NimbusClient client) throws TException { try { String jar = submitJarAs(conf, System.getProperty(“storm.jar”), progressListener, client); LOG.info(“Submitting topology {} in distributed mode with conf {}”, name, serConf); if (opts != null) { client.getClient().submitTopologyWithOpts(name, jar, serConf, topology, opts); } else { // this is for backwards compatibility client.getClient().submitTopology(name, jar, serConf, topology); } LOG.info(“Finished submitting topology: {}”, name); } catch (InvalidTopologyException e) { LOG.warn(“Topology submission exception: {}”, e.get_msg()); throw e; } catch (AlreadyAliveException e) { LOG.warn(“Topology already alive exception”, e); throw e; } } public static String submitJarAs(Map conf, String localJar, ProgressListener listener, NimbusClient client) { if (localJar == null) { throw new RuntimeException(“Must submit topologies using the ‘storm’ client script so that StormSubmitter knows which jar to upload.”); } try { String uploadLocation = client.getClient().beginFileUpload(); LOG.info(“Uploading topology jar " + localJar + " to assigned location: " + uploadLocation); BufferFileInputStream is = new BufferFileInputStream(localJar, THRIFT_CHUNK_SIZE_BYTES); long totalSize = new File(localJar).length(); if (listener != null) { listener.onStart(localJar, uploadLocation, totalSize); } long bytesUploaded = 0; while(true) { byte[] toSubmit = is.read(); bytesUploaded += toSubmit.length; if (listener != null) { listener.onProgress(localJar, uploadLocation, bytesUploaded, totalSize); } if(toSubmit.length==0) break; client.getClient().uploadChunk(uploadLocation, ByteBuffer.wrap(toSubmit)); } client.getClient().finishFileUpload(uploadLocation); if (listener != null) { listener.onCompleted(localJar, uploadLocation, totalSize); } LOG.info(“Successfully uploaded topology jar to assigned location: " + uploadLocation); return uploadLocation; } catch(Exception e) { throw new RuntimeException(e); } }主要通过submitTopologyAs方法来提交topology而submitTopologyAs调用了submitTopologyInDistributeMode，通过DependencyUploader上传依赖，最后再通过submitJarAs方法上传topology的jar包从前面的日志可以看到，上传到nimbus的路径为/data/nimbus/inbox/stormjar-4ead82bb-74a3-45a3-aca4-3af2f1d23998.jarclient.getClient().submitTopology主要是提交topology信息uploadDependencyJarsToBlobStorestorm-core-1.1.0-sources.jar!/org/apache/storm/StormSubmitter.java private static List<String> uploadDependencyJarsToBlobStore(DependencyUploader uploader) { LOG.info(“Uploading dependencies - jars…”); DependencyPropertiesParser propertiesParser = new DependencyPropertiesParser(); String depJarsProp = System.getProperty(“storm.dependency.jars”, “”); List<File> depJars = propertiesParser.parseJarsProperties(depJarsProp); try { return uploader.uploadFiles(depJars, true); } catch (Throwable e) { throw new RuntimeException(e); } }uploadDependencyArtifactsToBlobStorestorm-core-1.1.0-sources.jar!/org/apache/storm/StormSubmitter.java private static List<String> uploadDependencyArtifactsToBlobStore(DependencyUploader uploader) { LOG.info(“Uploading dependencies - artifacts…”); DependencyPropertiesParser propertiesParser = new DependencyPropertiesParser(); String depArtifactsProp = System.getProperty(“storm.dependency.artifacts”, “{}”); Map<String, File> depArtifacts = propertiesParser.parseArtifactsProperties(depArtifactsProp); try { return uploader.uploadArtifacts(depArtifacts); } catch (Throwable e) { throw new RuntimeException(e); } }DependencyUploaderstorm-core-1.1.0-sources.jar!/org/apache/storm/dependency/DependencyUploader.java public List<String> uploadFiles(List<File> dependencies, boolean cleanupIfFails) throws IOException, AuthorizationException { checkFilesExist(dependencies); List<String> keys = new ArrayList<>(dependencies.size()); try { for (File dependency : dependencies) { String fileName = dependency.getName(); String key = BlobStoreUtils.generateDependencyBlobKey(BlobStoreUtils.applyUUIDToFileName(fileName)); try { uploadDependencyToBlobStore(key, dependency); } catch (KeyAlreadyExistsException e) { // it should never happened since we apply UUID throw new RuntimeException(e); } keys.add(key); } } catch (Throwable e) { if (getBlobStore() != null && cleanupIfFails) { deleteBlobs(keys); } throw new RuntimeException(e); } return keys; } public List<String> uploadArtifacts(Map<String, File> artifacts) { checkFilesExist(artifacts.values()); List<String> keys = new ArrayList<>(artifacts.size()); try { for (Map.Entry<String, File> artifactToFile : artifacts.entrySet()) { String artifact = artifactToFile.getKey(); File dependency = artifactToFile.getValue(); String key = BlobStoreUtils.generateDependencyBlobKey(convertArtifactToJarFileName(artifact)); try { uploadDependencyToBlobStore(key, dependency); } catch (KeyAlreadyExistsException e) { // we lose the race, but it doesn’t matter } keys.add(key); } } catch (Throwable e) { throw new RuntimeException(e); } return keys; } private boolean uploadDependencyToBlobStore(String key, File dependency) throws KeyAlreadyExistsException, AuthorizationException, IOException { boolean uploadNew = false; try { // FIXME: we can filter by listKeys() with local blobstore when STORM-1986 is going to be resolved // as a workaround, we call getBlobMeta() for all keys getBlobStore().getBlobMeta(key); } catch (KeyNotFoundException e) { // TODO: do we want to add ACL here? AtomicOutputStream blob = getBlobStore() .createBlob(key, new SettableBlobMeta(new ArrayList<AccessControl>())); Files.copy(dependency.toPath(), blob); blob.close(); uploadNew = true; } return uploadNew; }uploadFiles以及uploadArtifacts方法最后都调用uploadDependencyToBlobStoreuploadDependencyToBlobStore方法将数据写入AtomicOutputStreamNimbusUploadAtomicOutputStreamstorm-core-1.1.0-sources.jar!/org/apache/storm/blobstore/NimbusBlobStore.java public class NimbusUploadAtomicOutputStream extends AtomicOutputStream { private String session; private int maxChunkSize = 4096; private String key; public NimbusUploadAtomicOutputStream(String session, int bufferSize, String key) { this.session = session; this.maxChunkSize = bufferSize; this.key = key; } @Override public void cancel() throws IOException { try { synchronized(client) { client.getClient().cancelBlobUpload(session); } } catch (TException e) { throw new RuntimeException(e); } } @Override public void write(int b) throws IOException { try { synchronized(client) { client.getClient().uploadBlobChunk(session, ByteBuffer.wrap(new byte[] {(byte)b})); } } catch (TException e) { throw new RuntimeException(e); } } @Override public void write(byte []b) throws IOException { write(b, 0, b.length); } @Override public void write(byte []b, int offset, int len) throws IOException { try { int end = offset + len; for (int realOffset = offset; realOffset < end; realOffset += maxChunkSize) { int realLen = Math.min(end - realOffset, maxChunkSize); LOG.debug(“Writing {} bytes of {} remaining”,realLen,(end-realOffset)); synchronized(client) { client.getClient().uploadBlobChunk(session, ByteBuffer.wrap(b, realOffset, realLen)); } } } catch (TException e) { throw new RuntimeException(e); } } @Override public void close() throws IOException { try { synchronized(client) { client.getClient().finishBlobUpload(session); client.getClient().createStateInZookeeper(key); } } catch (TException e) { throw new RuntimeException(e); } } }NimbusUploadAtomicOutputStream的write方法通过client.getClient().uploadBlobChunk完成数据上传send&recvstorm-core-1.1.0-sources.jar!/org/apache/storm/generated/Nimbus.java public String beginFileUpload() throws AuthorizationException, org.apache.thrift.TException { send_beginFileUpload(); return recv_beginFileUpload(); } public void send_beginFileUpload() throws org.apache.thrift.TException { beginFileUpload_args args = new beginFileUpload_args(); sendBase(“beginFileUpload”, args); } public String recv_beginFileUpload() throws AuthorizationException, org.apache.thrift.TException { beginFileUpload_result result = new beginFileUpload_result(); receiveBase(result, “beginFileUpload”); if (result.is_set_success()) { return result.success; } if (result.aze != null) { throw result.aze; } throw new org.apache.thrift.TApplicationException(org.apache.thrift.TApplicationException.MISSING_RESULT, “beginFileUpload failed: unknown result”); } public void send_finishFileUpload(String location) throws org.apache.thrift.TException { finishFileUpload_args args = new finishFileUpload_args(); args.set_location(location); sendBase(“finishFileUpload”, args); } public void uploadChunk(String location, ByteBuffer chunk) throws AuthorizationException, org.apache.thrift.TException { send_uploadChunk(location, chunk); recv_uploadChunk(); } public void send_uploadChunk(String location, ByteBuffer chunk) throws org.apache.thrift.TException { uploadChunk_args args = new uploadChunk_args(); args.set_location(location); args.set_chunk(chunk); sendBase(“uploadChunk”, args); } public void recv_uploadChunk() throws AuthorizationException, org.apache.thrift.TException { uploadChunk_result result = new uploadChunk_result(); receiveBase(result, “uploadChunk”); if (result.aze != null) { throw result.aze; } return; } public void submitTopology(String name, String uploadedJarLocation, String jsonConf, StormTopology topology) throws AlreadyAliveException, InvalidTopologyException, AuthorizationException, org.apache.thrift.TException { send_submitTopology(name, uploadedJarLocation, jsonConf, topology); recv_submitTopology(); } public void send_submitTopology(String name, String uploadedJarLocation, String jsonConf, StormTopology topology) throws org.apache.thrift.TException { submitTopology_args args = new submitTopology_args(); args.set_name(name); args.set_uploadedJarLocation(uploadedJarLocation); args.set_jsonConf(jsonConf); args.set_topology(topology); sendBase(“submitTopology”, args); } public void recv_submitTopology() throws AlreadyAliveException, InvalidTopologyException, AuthorizationException, org.apache.thrift.TException { submitTopology_result result = new submitTopology_result(); receiveBase(result, “submitTopology”); if (result.e != null) { throw result.e; } if (result.ite != null) { throw result.ite; } if (result.aze != null) { throw result.aze; } return; } public void uploadBlobChunk(String session, ByteBuffer chunk) throws AuthorizationException, org.apache.thrift.TException { send_uploadBlobChunk(session, chunk); recv_uploadBlobChunk(); } public void send_uploadBlobChunk(String session, ByteBuffer chunk) throws org.apache.thrift.TException { uploadBlobChunk_args args = new uploadBlobChunk_args(); args.set_session(session); args.set_chunk(chunk); sendBase(“uploadBlobChunk”, args); } public void recv_uploadBlobChunk() throws AuthorizationException, org.apache.thrift.TException { uploadBlobChunk_result result = new uploadBlobChunk_result(); receiveBase(result, “uploadBlobChunk”); if (result.aze != null) { throw result.aze; } return; }通过sendBase发送数据，通过receiveBase接收数据小结storm的submitTopology会先上传storm.dependency.jars指定的依赖jar，再上传storm.dependency.artifacts指定的依赖，最后再上传指定的jar包，他们都是通过远程方法sendBase发送数据以及receiveBase接收数据。docStorm 1.1.0 released ...

apache storm demo示例

从国外网站上翻译的，主要业务是创建移动电话日志分析器。场景 - 移动呼叫日志分析器移动电话及其持续时间将作为Apache Storm的输入提供，Storm将处理并分组相同呼叫者和接收者之间的呼叫及其呼叫总数。创建Spout Spout是用于数据生成的组件。基本上，spout将实现一个IRichSpout接口。“IRichSpout”界面有以下重要方法 - open - 为spout提供执行环境。执行者将运行此方法来初始化spout。 nextTuple - 通过收集器发出生成的数据。 close - spout将要关闭时调用此方法。 declareOutputFields - 声明元组的输出模式。 ack - 确认处理了特定的tuple fail - 指定一个特定的tuple不被处理并且不被重新处理。 open __open__方法的签名如下 - open(Map conf, TopologyContext context, SpoutOutputCollector collector) conf - 为此spout提供storm暴配置。 context - 提供关于topology中spout位置，其任务ID，输入和输出信息的完整信息。 collector - 使我们能够发出将由bolts处理的tuple。 nextTuple __nextTuple__方法的签名如下 - nextTuple() nextTuple（）从与ack（）和fail（）方法相同的循环周期性地调用。当没有工作要做时，它必须释放对线程的控制，以便其他方法有机会被调用。所以nextTuple的第一行检查处理是否完成。如果是这样，它应该睡眠至少一毫秒，以在返回之前减少处理器上的负载。 ...