A look at storm trident's coordinator



This article takes a close look at the coordinator machinery in storm trident.
Example
Code
@Test
public void testDebugTopologyBuild(){
FixedBatchSpout spout = new FixedBatchSpout(new Fields("user", "score"), 3,
new Values("nickt1", 4),
new Values("nickt2", 7),
new Values("nickt3", 8),
new Values("nickt4", 9),
new Values("nickt5", 7),
new Values("nickt6", 11),
new Values("nickt7", 5)
);
spout.setCycle(false);
TridentTopology topology = new TridentTopology();
Stream stream1 = topology.newStream("spout1",spout)
.each(new Fields("user", "score"), new BaseFunction() {
@Override
public void execute(TridentTuple tuple, TridentCollector collector) {
System.out.println("tuple:"+tuple);
}
},new Fields());

topology.build();
}
The spout used here is FixedBatchSpout, which is an IBatchSpout.
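To actually see the debug output, the built topology can be submitted to a local cluster. A minimal sketch, assuming storm-core 1.2.2 on the classpath (the topology name and sleep duration are illustrative, not part of the original test):

import org.apache.storm.Config;
import org.apache.storm.LocalCluster;
import org.apache.storm.utils.Utils;

Config conf = new Config();
LocalCluster cluster = new LocalCluster();
// "topology" is the TridentTopology built in the test above
cluster.submitTopology("trident-coordinator-demo", conf, topology.build());
Utils.sleep(10000); // let a few batches flow through the coordinator
cluster.shutdown();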
Topology diagram (figure omitted)

MasterBatchCoordinator
storm-1.2.2/storm-core/src/jvm/org/apache/storm/trident/topology/MasterBatchCoordinator.java
public class MasterBatchCoordinator extends BaseRichSpout {
public static final Logger LOG = LoggerFactory.getLogger(MasterBatchCoordinator.class);

public static final long INIT_TXID = 1L;

public static final String BATCH_STREAM_ID = "$batch";
public static final String COMMIT_STREAM_ID = "$commit";
public static final String SUCCESS_STREAM_ID = "$success";

private static final String CURRENT_TX = "currtx";
private static final String CURRENT_ATTEMPTS = "currattempts";

private List<TransactionalState> _states = new ArrayList();

TreeMap<Long, TransactionStatus> _activeTx = new TreeMap<Long, TransactionStatus>();
TreeMap<Long, Integer> _attemptIds;

private SpoutOutputCollector _collector;
Long _currTransaction;
int _maxTransactionActive;

List<ITridentSpout.BatchCoordinator> _coordinators = new ArrayList();

List<String> _managedSpoutIds;
List<ITridentSpout> _spouts;
WindowedTimeThrottler _throttler;

boolean _active = true;

public MasterBatchCoordinator(List<String> spoutIds, List<ITridentSpout> spouts) {
if(spoutIds.isEmpty()) {
throw new IllegalArgumentException("Must manage at least one spout");
}
_managedSpoutIds = spoutIds;
_spouts = spouts;
LOG.debug("Created {}", this);
}

public List<String> getManagedSpoutIds(){
return _managedSpoutIds;
}

@Override
public void activate() {
_active = true;
}

@Override
public void deactivate() {
_active = false;
}

@Override
public void open(Map conf, TopologyContext context, SpoutOutputCollector collector) {
_throttler = new WindowedTimeThrottler((Number)conf.get(Config.TOPOLOGY_TRIDENT_BATCH_EMIT_INTERVAL_MILLIS), 1);
for(String spoutId: _managedSpoutIds) {
_states.add(TransactionalState.newCoordinatorState(conf, spoutId));
}
_currTransaction = getStoredCurrTransaction();

_collector = collector;
Number active = (Number) conf.get(Config.TOPOLOGY_MAX_SPOUT_PENDING);
if(active==null) {
_maxTransactionActive = 1;
} else {
_maxTransactionActive = active.intValue();
}
_attemptIds = getStoredCurrAttempts(_currTransaction, _maxTransactionActive);

for(int i=0; i<_spouts.size(); i++) {
String txId = _managedSpoutIds.get(i);
_coordinators.add(_spouts.get(i).getCoordinator(txId, conf, context));
}
LOG.debug("Opened {}", this);
}

@Override
public void close() {
for(TransactionalState state: _states) {
state.close();
}
LOG.debug("Closed {}", this);
}

@Override
public void declareOutputFields(OutputFieldsDeclarer declarer) {
// in partitioned example, in case an emitter task receives a later transaction than it's emitted so far,
// when it sees the earlier txid it should know to emit nothing
declarer.declareStream(BATCH_STREAM_ID, new Fields("tx"));
declarer.declareStream(COMMIT_STREAM_ID, new Fields("tx"));
declarer.declareStream(SUCCESS_STREAM_ID, new Fields("tx"));
}

@Override
public Map<String, Object> getComponentConfiguration() {
Config ret = new Config();
ret.setMaxTaskParallelism(1);
ret.registerSerialization(TransactionAttempt.class);
return ret;
}

// ...
}

The open method first reads the batch-trigger frequency from Config.TOPOLOGY_TRIDENT_BATCH_EMIT_INTERVAL_MILLIS (topology.trident.batch.emit.interval.millis, 500 by default in defaults.yaml), then creates a WindowedTimeThrottler with a maxAmt of 1
TransactionalState is used to maintain the transactional state on zookeeper (one coordinator state per managed spout id)
It then reads Config.TOPOLOGY_MAX_SPOUT_PENDING (topology.max.spout.pending, null by default in defaults.yaml) into _maxTransactionActive, falling back to 1 when the value is null; both settings can be supplied at submit time, as sketched below
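Both knobs can be set on the submit-time Config; a small sketch (the concrete values are illustrative):

Config conf = new Config();
// how often the master coordinator is allowed to start a new batch (defaults.yaml: 500)
conf.put(Config.TOPOLOGY_TRIDENT_BATCH_EMIT_INTERVAL_MILLIS, 200);
// upper bound on concurrently active TransactionAttempts (_maxTransactionActive)
conf.setMaxSpoutPending(3);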

MasterBatchCoordinator.nextTuple
storm-1.2.2/storm-core/src/jvm/org/apache/storm/trident/topology/MasterBatchCoordinator.java
@Override
public void nextTuple() {
sync();
}

private void sync() {
// note that sometimes the tuples active may be less than max_spout_pending, e.g.
// max_spout_pending = 3
// tx 1, 2, 3 active, tx 2 is acked. there won't be a commit for tx 2 (because tx 1 isn't committed yet),
// and there won't be a batch for tx 4 because there's max_spout_pending tx active
TransactionStatus maybeCommit = _activeTx.get(_currTransaction);
if(maybeCommit!=null && maybeCommit.status == AttemptStatus.PROCESSED) {
maybeCommit.status = AttemptStatus.COMMITTING;
_collector.emit(COMMIT_STREAM_ID, new Values(maybeCommit.attempt), maybeCommit.attempt);
LOG.debug("Emitted on [stream = {}], [tx_status = {}], [{}]", COMMIT_STREAM_ID, maybeCommit, this);
}

if(_active) {
if(_activeTx.size() < _maxTransactionActive) {
Long curr = _currTransaction;
for(int i=0; i<_maxTransactionActive; i++) {
if(!_activeTx.containsKey(curr) && isReady(curr)) {
// by using a monotonically increasing attempt id, downstream tasks
// can be memory efficient by clearing out state for old attempts
// as soon as they see a higher attempt id for a transaction
Integer attemptId = _attemptIds.get(curr);
if(attemptId==null) {
attemptId = 0;
} else {
attemptId++;
}
_attemptIds.put(curr, attemptId);
for(TransactionalState state: _states) {
state.setData(CURRENT_ATTEMPTS, _attemptIds);
}

TransactionAttempt attempt = new TransactionAttempt(curr, attemptId);
final TransactionStatus newTransactionStatus = new TransactionStatus(attempt);
_activeTx.put(curr, newTransactionStatus);
_collector.emit(BATCH_STREAM_ID, new Values(attempt), attempt);
LOG.debug("Emitted on [stream = {}], [tx_attempt = {}], [tx_status = {}], [{}]", BATCH_STREAM_ID, attempt, newTransactionStatus, this);
_throttler.markEvent();
}
curr = nextTransactionId(curr);
}
}
}
}
nextTuple just calls sync, a method that is also invoked from both ack and fail. sync first checks the current transaction's status: if it has been fully processed (AttemptStatus.PROCESSED), a commit tuple is emitted on MasterBatchCoordinator.COMMIT_STREAM_ID ($commit). It then starts new TransactionAttempts only within the bounds of _maxTransactionActive and the WindowedTimeThrottler (the throttler check lives in the isReady helper, sketched below), emitting a tuple on MasterBatchCoordinator.BATCH_STREAM_ID ($batch) for each new attempt and marking a window event on the throttler.
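The isReady(curr) call is where the throttler actually gates batch creation; the helper is elided from the excerpt above, but in 1.2.2 it looks roughly like this:

private boolean isReady(long txid) {
// no new batch while the emit-interval window has already seen an event
if(_throttler.isThrottled()) return false;
// goes ahead if any managed spout's coordinator is ready
for(ITridentSpout.BatchCoordinator coord: _coordinators) {
if(coord.isReady(txid)) return true;
}
return false;
}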
MasterBatchCoordinator.ack
storm-1.2.2/storm-core/src/jvm/org/apache/storm/trident/topology/MasterBatchCoordinator.java
@Override
public void ack(Object msgId) {
TransactionAttempt tx = (TransactionAttempt) msgId;
TransactionStatus status = _activeTx.get(tx.getTransactionId());
LOG.debug("Ack. [tx_attempt = {}], [tx_status = {}], [{}]", tx, status, this);
if(status!=null && tx.equals(status.attempt)) {
if(status.status==AttemptStatus.PROCESSING) {
status.status = AttemptStatus.PROCESSED;
LOG.debug("Changed status. [tx_attempt = {}] [tx_status = {}]", tx, status);
} else if(status.status==AttemptStatus.COMMITTING) {
_activeTx.remove(tx.getTransactionId());
_attemptIds.remove(tx.getTransactionId());
_collector.emit(SUCCESS_STREAM_ID, new Values(tx));
_currTransaction = nextTransactionId(tx.getTransactionId());
for(TransactionalState state: _states) {
state.setData(CURRENT_TX, _currTransaction);
}
LOG.debug("Emitted on [stream = {}], [tx_attempt = {}], [tx_status = {}], [{}]", SUCCESS_STREAM_ID, tx, status, this);
}
sync();
}
}
ack acts according to the current transaction status: if it was AttemptStatus.PROCESSING, it is updated to AttemptStatus.PROCESSED; if it was AttemptStatus.COMMITTING, the transaction is removed from _activeTx, a tuple is emitted on MasterBatchCoordinator.SUCCESS_STREAM_ID ($success), and _currTransaction is advanced to nextTransactionId. Finally, sync is called again to trigger a new TransactionAttempt.
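nextTransactionId simply increments, so transaction ids are strictly sequential; roughly:

private Long nextTransactionId(Long id) {
return id + 1;
}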
MasterBatchCoordinator.fail
storm-1.2.2/storm-core/src/jvm/org/apache/storm/trident/topology/MasterBatchCoordinator.java
@Override
public void fail(Object msgId) {
TransactionAttempt tx = (TransactionAttempt) msgId;
TransactionStatus stored = _activeTx.remove(tx.getTransactionId());
LOG.debug("Fail. [tx_attempt = {}], [tx_status = {}], [{}]", tx, stored, this);
if(stored!=null && tx.equals(stored.attempt)) {
_activeTx.tailMap(tx.getTransactionId()).clear();
sync();
}
}
fail removes the current transaction from _activeTx, then clears every _activeTx entry whose txId is greater than the failed txId, and finally calls sync to decide whether a new TransactionAttempt should be started (note that _currTransaction is not changed here, so the TransactionAttempt that sync starts next still uses the failed _currTransaction as its txid).
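The clearing step works because TreeMap.tailMap(k) is a live view of all entries with key >= k; a quick standalone illustration of what fail does to the in-flight transactions:

TreeMap<Long, String> activeTx = new TreeMap<>();
activeTx.put(1L, "attempt-1");
activeTx.put(2L, "attempt-2");
activeTx.put(3L, "attempt-3");
// tx 2 fails: drop it together with every later in-flight transaction
activeTx.tailMap(2L).clear();
System.out.println(activeTx); // {1=attempt-1}; tx 2 (and then 3) will be re-attempted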
TridentSpoutCoordinator
storm-1.2.2/storm-core/src/jvm/org/apache/storm/trident/spout/TridentSpoutCoordinator.java
public class TridentSpoutCoordinator implements IBasicBolt {
public static final Logger LOG = LoggerFactory.getLogger(TridentSpoutCoordinator.class);
private static final String META_DIR = "meta";

ITridentSpout<Object> _spout;
ITridentSpout.BatchCoordinator<Object> _coord;
RotatingTransactionalState _state;
TransactionalState _underlyingState;
String _id;

public TridentSpoutCoordinator(String id, ITridentSpout<Object> spout) {
_spout = spout;
_id = id;
}

@Override
public void prepare(Map conf, TopologyContext context) {
_coord = _spout.getCoordinator(_id, conf, context);
_underlyingState = TransactionalState.newCoordinatorState(conf, _id);
_state = new RotatingTransactionalState(_underlyingState, META_DIR);
}

@Override
public void execute(Tuple tuple, BasicOutputCollector collector) {
TransactionAttempt attempt = (TransactionAttempt) tuple.getValue(0);

if(tuple.getSourceStreamId().equals(MasterBatchCoordinator.SUCCESS_STREAM_ID)) {
_state.cleanupBefore(attempt.getTransactionId());
_coord.success(attempt.getTransactionId());
} else {
long txid = attempt.getTransactionId();
Object prevMeta = _state.getPreviousState(txid);
Object meta = _coord.initializeTransaction(txid, prevMeta, _state.getState(txid));
_state.overrideState(txid, meta);
collector.emit(MasterBatchCoordinator.BATCH_STREAM_ID, new Values(attempt, meta));
}

}

@Override
public void cleanup() {
_coord.close();
_underlyingState.close();
}

@Override
public void declareOutputFields(OutputFieldsDeclarer declarer) {
declarer.declareStream(MasterBatchCoordinator.BATCH_STREAM_ID, new Fields("tx", "metadata"));
}

@Override
public Map<String, Object> getComponentConfiguration() {
Config ret = new Config();
ret.setMaxTaskParallelism(1);
return ret;
}
}

TridentSpoutCoordinator's execute method handles tuples differently depending on the source streamId
On MasterBatchCoordinator.SUCCESS_STREAM_ID ($success), the master has received the ack and the transaction succeeded, so the coordinator cleans up all state before that txId and then invokes the success callback of ITridentSpout.BatchCoordinator
On MasterBatchCoordinator.BATCH_STREAM_ID ($batch), a new TransactionAttempt is being started: the coordinator obtains the batch metadata from ITridentSpout.BatchCoordinator.initializeTransaction (passing in the previous batch's metadata), stores it, and emits (attempt, meta) on MasterBatchCoordinator.BATCH_STREAM_ID ($batch); that tuple is received by the downstream bolt (in this example, the TridentBoltExecutor that wraps the user spout via TridentSpoutExecutor). A minimal coordinator sketch follows
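To make the two branches concrete, here is a minimal, hypothetical ITridentSpout.BatchCoordinator showing the callbacks that TridentSpoutCoordinator drives (the class name and the offset-style metadata are illustrative assumptions, not storm code):

import org.apache.storm.trident.spout.ITridentSpout;

public class OffsetRangeCoordinator implements ITridentSpout.BatchCoordinator<Long> {
    @Override
    public Long initializeTransaction(long txid, Long prevMeta, Long currMeta) {
        // $batch branch: compute the metadata describing what batch txid should
        // read; prevMeta is the metadata produced for txid-1
        return (prevMeta == null) ? 0L : prevMeta + 100L; // e.g. next start offset
    }

    @Override
    public void success(long txid) {
        // $success branch: txid is fully processed; bookkeeping for it
        // (and for all earlier txids) can be discarded
    }

    @Override
    public boolean isReady(long txid) {
        return true; // polled by MasterBatchCoordinator before starting a batch
    }

    @Override
    public void close() {}
}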

TridentBoltExecutor
storm-1.2.2/storm-core/src/jvm/org/apache/storm/trident/topology/TridentBoltExecutor.java
public class TridentBoltExecutor implements IRichBolt {
public static final String COORD_STREAM_PREFIX = "$coord-";

public static String COORD_STREAM(String batch) {
return COORD_STREAM_PREFIX + batch;
}

RotatingMap<Object, TrackedBatch> _batches;

@Override
public void prepare(Map conf, TopologyContext context, OutputCollector collector) {
_messageTimeoutMs = context.maxTopologyMessageTimeout() * 1000L;
_lastRotate = System.currentTimeMillis();
_batches = new RotatingMap<>(2);
_context = context;
_collector = collector;
_coordCollector = new CoordinatedOutputCollector(collector);
_coordOutputCollector = new BatchOutputCollectorImpl(new OutputCollector(_coordCollector));

_coordConditions = (Map) context.getExecutorData("__coordConditions");
if(_coordConditions==null) {
_coordConditions = new HashMap<>();
for(String batchGroup: _coordSpecs.keySet()) {
CoordSpec spec = _coordSpecs.get(batchGroup);
CoordCondition cond = new CoordCondition();
cond.commitStream = spec.commitStream;
cond.expectedTaskReports = 0;
for(String comp: spec.coords.keySet()) {
CoordType ct = spec.coords.get(comp);
if(ct.equals(CoordType.single())) {
cond.expectedTaskReports+=1;
} else {
cond.expectedTaskReports+=context.getComponentTasks(comp).size();
}
}
cond.targetTasks = new HashSet<>();
for(String component: Utils.get(context.getThisTargets(),
COORD_STREAM(batchGroup),
new HashMap<String, Grouping>()).keySet()) {
cond.targetTasks.addAll(context.getComponentTasks(component));
}
_coordConditions.put(batchGroup, cond);
}
context.setExecutorData("__coordConditions", _coordConditions);
}
_bolt.prepare(conf, context, _coordOutputCollector);
}

//……

@Override
public void cleanup() {
_bolt.cleanup();
}

@Override
public void declareOutputFields(OutputFieldsDeclarer declarer) {
_bolt.declareOutputFields(declarer);
for(String batchGroup: _coordSpecs.keySet()) {
declarer.declareStream(COORD_STREAM(batchGroup), true, new Fields("id", "count"));
}
}

@Override
public Map<String, Object> getComponentConfiguration() {
Map<String, Object> ret = _bolt.getComponentConfiguration();
if(ret==null) ret = new HashMap<>();
ret.put(Config.TOPOLOGY_TICK_TUPLE_FREQ_SECS, 5);
// TODO: Need to be able to set the tick tuple time to the message timeout, ideally without parameterization
return ret;
}
}

In prepare, a CoordinatedOutputCollector is created first, then wrapped in an OutputCollector, and finally in a BatchOutputCollectorImpl, which is passed to ITridentBatchBolt.prepare; the ITridentBatchBolt implementation used here is TridentSpoutExecutor
prepare also initializes RotatingMap<Object, TrackedBatch> _batches = new RotatingMap<>(2);
The bulk of prepare builds a CoordCondition per batch group, which mainly means computing expectedTaskReports and targetTasks

TridentBoltExecutor.execute
storm-1.2.2/storm-core/src/jvm/org/apache/storm/trident/topology/TridentBoltExecutor.java
@Override
public void execute(Tuple tuple) {
if(TupleUtils.isTick(tuple)) {
long now = System.currentTimeMillis();
if(now - _lastRotate > _messageTimeoutMs) {
_batches.rotate();
_lastRotate = now;
}
return;
}
String batchGroup = _batchGroupIds.get(tuple.getSourceGlobalStreamId());
if(batchGroup==null) {
// this is so we can do things like have simple DRPC that doesn't need to use batch processing
_coordCollector.setCurrBatch(null);
_bolt.execute(null, tuple);
_collector.ack(tuple);
return;
}
IBatchID id = (IBatchID) tuple.getValue(0);
//get transaction id
//if it already exists and attempt id is greater than the attempt there

TrackedBatch tracked = (TrackedBatch) _batches.get(id.getId());
// if(_batches.size() > 10 && _context.getThisTaskIndex() == 0) {
// System.out.println("Received in " + _context.getThisComponentId() + " " + _context.getThisTaskIndex()
// + " (" + _batches.size() + ")" +
// "\ntuple: " + tuple +
// "\nwith tracked " + tracked +
// "\nwith id " + id +
// "\nwith group " + batchGroup
// + "\n");
//
// }
//System.out.println("Num tracked: " + _batches.size() + " " + _context.getThisComponentId() + " " + _context.getThisTaskIndex());

// this code here ensures that only one attempt is ever tracked for a batch, so when
// failures happen you don’t get an explosion in memory usage in the tasks
if(tracked!=null) {
if(id.getAttemptId() > tracked.attemptId) {
_batches.remove(id.getId());
tracked = null;
} else if(id.getAttemptId() < tracked.attemptId) {
// no reason to try to execute a previous attempt than we've already seen
return;
}
}

if(tracked==null) {
tracked = new TrackedBatch(new BatchInfo(batchGroup, id, _bolt.initBatchState(batchGroup, id)), _coordConditions.get(batchGroup), id.getAttemptId());
_batches.put(id.getId(), tracked);
}
_coordCollector.setCurrBatch(tracked);

//System.out.println("TRACKED: " + tracked + " " + tuple);

TupleType t = getTupleType(tuple, tracked);
if(t==TupleType.COMMIT) {
tracked.receivedCommit = true;
checkFinish(tracked, tuple, t);
} else if(t==TupleType.COORD) {
int count = tuple.getInteger(1);
tracked.reportedTasks++;
tracked.expectedTupleCount+=count;
checkFinish(tracked, tuple, t);
} else {
tracked.receivedTuples++;
boolean success = true;
try {
_bolt.execute(tracked.info, tuple);
if(tracked.condition.expectedTaskReports==0) {
success = finishBatch(tracked, tuple);
}
} catch(FailedException e) {
failBatch(tracked, e);
}
if(success) {
_collector.ack(tuple);
} else {
_collector.fail(tuple);
}
}
_coordCollector.setCurrBatch(null);
}

private TupleType getTupleType(Tuple tuple, TrackedBatch batch) {
CoordCondition cond = batch.condition;
if(cond.commitStream!=null
&& tuple.getSourceGlobalStreamId().equals(cond.commitStream)) {
return TupleType.COMMIT;
} else if(cond.expectedTaskReports > 0
&& tuple.getSourceStreamId().startsWith(COORD_STREAM_PREFIX)) {
return TupleType.COORD;
} else {
return TupleType.REGULAR;
}
}

private void failBatch(TrackedBatch tracked, FailedException e) {
if(e!=null && e instanceof ReportedFailedException) {
_collector.reportError(e);
}
tracked.failed = true;
if(tracked.delayedAck!=null) {
_collector.fail(tracked.delayedAck);
tracked.delayedAck = null;
}
}

TridentBoltExecutor's execute first checks whether the tuple is a tick tuple; if so, and if more than _messageTimeoutMs has elapsed since _lastRotate (initialized to the current time in prepare), it performs _batches.rotate(). Tick tuples arrive at the frequency of Config.TOPOLOGY_TICK_TUPLE_FREQ_SECS (topology.tick.tuple.freq.secs), which TridentBoltExecutor sets to 5 seconds; _messageTimeoutMs is context.maxTopologyMessageTimeout() * 1000L, i.e. the maximum Config.TOPOLOGY_MESSAGE_TIMEOUT_SECS (topology.message.timeout.secs, 30 by default in defaults.yaml) across the topology's components, times 1000
_batches stores TrackedBatch entries keyed by the TransactionAttempt's txId; if none exists, a new TrackedBatch is created, which calls back the _bolt's initBatchState method
The tuple is then classified as TupleType.COMMIT, TupleType.COORD, or TupleType.REGULAR. For TupleType.COMMIT, tracked.receivedCommit is set to true and checkFinish is called; for TupleType.COORD, the reportedTasks and expectedTupleCount counters are updated before calling checkFinish; for TupleType.REGULAR (the batch tuples sent by the coordinator), receivedTuples is incremented and _bolt.execute is invoked (here _bolt is TridentSpoutExecutor). When tracked.condition.expectedTaskReports == 0, finishBatch is called immediately and the batch is removed from _batches (a sketch of finishBatch follows below); a FailedException goes straight to failBatch, which reports the error; afterwards the tuple is acked or failed accordingly. Note that with downstream each operations, if only some tuples of a batch throw FailedException, the remaining tuples are still executed, and the failure only reaches the master once the whole batch has been processed and the TupleType.COORD tuple triggers checkFinish. So there is some lag: if a batch has 3 tuples and the second throws FailedException, the third is still executed; only after all tuples of the batch are done does the COORD tuple arrive and checkFinish detect the failure.
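The finishBatch call referenced above (taken immediately when expectedTaskReports == 0) is elided from the excerpt; in 1.2.2 it roughly calls the bolt's finishBatch and then reports the per-task emit counts downstream on the $coord stream:

private boolean finishBatch(TrackedBatch tracked, Tuple finishTuple) {
boolean success = true;
try {
_bolt.finishBatch(tracked.info);
String stream = COORD_STREAM(tracked.info.batchGroup);
// tell each target task how many tuples were emitted to it for this batch
for(Integer task: tracked.condition.targetTasks) {
_collector.emitDirect(task, stream, finishTuple,
new Values(tracked.info.batchId, Utils.get(tracked.taskEmittedTuples, task, 0)));
}
if(tracked.delayedAck!=null) {
_collector.ack(tracked.delayedAck);
tracked.delayedAck = null;
}
} catch(FailedException e) {
failBatch(tracked, e);
success = false;
}
_batches.remove(tracked.info.batchId.getId());
return success;
}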

TridentBoltExecutor.checkFinish
storm-1.2.2/storm-core/src/jvm/org/apache/storm/trident/topology/TridentBoltExecutor.java
private void checkFinish(TrackedBatch tracked, Tuple tuple, TupleType type) {
if(tracked.failed) {
failBatch(tracked);
_collector.fail(tuple);
return;
}
CoordCondition cond = tracked.condition;
boolean delayed = tracked.delayedAck==null &&
(cond.commitStream!=null && type==TupleType.COMMIT
|| cond.commitStream==null);
if(delayed) {
tracked.delayedAck = tuple;
}
boolean failed = false;
if(tracked.receivedCommit && tracked.reportedTasks == cond.expectedTaskReports) {
if(tracked.receivedTuples == tracked.expectedTupleCount) {
finishBatch(tracked, tuple);
} else {
//TODO: add logging that not all tuples were received
failBatch(tracked);
_collector.fail(tuple);
failed = true;
}
}

if(!delayed && !failed) {
_collector.ack(tuple);
}

}

private void failBatch(TrackedBatch tracked) {
failBatch(tracked, null);
}

private void failBatch(TrackedBatch tracked, FailedException e) {
if(e!=null && e instanceof ReportedFailedException) {
_collector.reportError(e);
}
tracked.failed = true;
if(tracked.delayedAck!=null) {
_collector.fail(tracked.delayedAck);
tracked.delayedAck = null;
}
}

TridentBoltExecutor calls checkFinish from execute whenever the tuple is of type TupleType.COMMIT or TupleType.COORD
Once _bolt.execute(tracked.info, tuple) throws a FailedException, failBatch is invoked, which marks tracked.failed as true
When checkFinish later finds tracked.failed == true, it calls _collector.fail(tuple), which in turn calls back MasterBatchCoordinator's fail method

TridentSpoutExecutor
storm-1.2.2/storm-core/src/jvm/org/apache/storm/trident/spout/TridentSpoutExecutor.java
public class TridentSpoutExecutor implements ITridentBatchBolt {
public static final String ID_FIELD = "$tx";

public static final Logger LOG = LoggerFactory.getLogger(TridentSpoutExecutor.class);

AddIdCollector _collector;
ITridentSpout<Object> _spout;
ITridentSpout.Emitter<Object> _emitter;
String _streamName;
String _txStateId;

TreeMap<Long, TransactionAttempt> _activeBatches = new TreeMap<>();

public TridentSpoutExecutor(String txStateId, String streamName, ITridentSpout<Object> spout) {
_txStateId = txStateId;
_spout = spout;
_streamName = streamName;
}

@Override
public void prepare(Map conf, TopologyContext context, BatchOutputCollector collector) {
_emitter = _spout.getEmitter(_txStateId, conf, context);
_collector = new AddIdCollector(_streamName, collector);
}

@Override
public void execute(BatchInfo info, Tuple input) {
// there won't be a BatchInfo for the success stream
TransactionAttempt attempt = (TransactionAttempt) input.getValue(0);
if(input.getSourceStreamId().equals(MasterBatchCoordinator.COMMIT_STREAM_ID)) {
if(attempt.equals(_activeBatches.get(attempt.getTransactionId()))) {
((ICommitterTridentSpout.Emitter) _emitter).commit(attempt);
_activeBatches.remove(attempt.getTransactionId());
} else {
throw new FailedException("Received commit for different transaction attempt");
}
} else if(input.getSourceStreamId().equals(MasterBatchCoordinator.SUCCESS_STREAM_ID)) {
// valid to delete before what's been committed since
// those batches will never be accessed again
_activeBatches.headMap(attempt.getTransactionId()).clear();
_emitter.success(attempt);
} else {
_collector.setBatch(info.batchId);
_emitter.emitBatch(attempt, input.getValue(1), _collector);
_activeBatches.put(attempt.getTransactionId(), attempt);
}
}

@Override
public void cleanup() {
_emitter.close();
}

@Override
public void declareOutputFields(OutputFieldsDeclarer declarer) {
List<String> fields = new ArrayList<>(_spout.getOutputFields().toList());
fields.add(0, ID_FIELD);
declarer.declareStream(_streamName, new Fields(fields));
}

@Override
public Map<String, Object> getComponentConfiguration() {
return _spout.getComponentConfiguration();
}

@Override
public void finishBatch(BatchInfo batchInfo) {
}

@Override
public Object initBatchState(String batchGroup, Object batchId) {
return null;
}
}

The BatchOutputCollector used by TridentSpoutExecutor is the one constructed in TridentBoltExecutor's prepare, wrapped in several layers: first a CoordinatedOutputCollector, then an OutputCollector, finally a BatchOutputCollectorImpl. The essential wrapper is CoordinatedOutputCollector, which counts the tuples emitted to each taskId; in this executor's prepare the collector is wrapped once more into an AddIdCollector, which simply adds the batchId (i.e. the TransactionAttempt) to each tuple
TridentSpoutExecutor's ITridentSpout is the BatchSpoutExecutor wrapping the user's original spout (the original spout here is an IBatchSpout, so it is adapted to ITridentSpout via BatchSpoutExecutor). Its execute method branches on the stream type: for MasterBatchCoordinator.COMMIT_STREAM_ID ($commit) from the master, it calls the emitter's commit method to commit the current TransactionAttempt (this example has no commit step) and removes that tx from _activeBatches; for MasterBatchCoordinator.SUCCESS_STREAM_ID ($success) from the master, it first removes every TransactionAttempt in _activeBatches with a smaller txId, then calls the emitter's success method to mark the TransactionAttempt successful, which calls back the ack method of the original spout (IBatchSpout)
A tuple on neither MasterBatchCoordinator.COMMIT_STREAM_ID ($commit) nor MasterBatchCoordinator.SUCCESS_STREAM_ID ($success) is the message that starts a batch: the executor sets the batchId, calls the emitter's emitBatch to send the data (the batchId passed along is the TransactionAttempt's txId), and puts the TransactionAttempt into _activeBatches (a batch corresponds to a TransactionAttempt); see the emitter sketch below
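Mirroring the coordinator sketch earlier, a minimal, hypothetical ITridentSpout.Emitter makes these branches concrete (the class name and offset metadata are illustrative assumptions; a real emitter reads an external source using the coordinator's metadata so a batch can be replayed identically):

import org.apache.storm.trident.operation.TridentCollector;
import org.apache.storm.trident.spout.ITridentSpout;
import org.apache.storm.trident.topology.TransactionAttempt;
import org.apache.storm.tuple.Values;

public class OffsetRangeEmitter implements ITridentSpout.Emitter<Long> {
    @Override
    public void emitBatch(TransactionAttempt tx, Long coordinatorMeta, TridentCollector collector) {
        // batch branch: emit exactly the tuples described by coordinatorMeta,
        // so a replay of this TransactionAttempt produces the same batch
        for (long offset = coordinatorMeta; offset < coordinatorMeta + 100L; offset++) {
            collector.emit(new Values(offset));
        }
    }

    @Override
    public void success(TransactionAttempt tx) {
        // $success branch: the batch is fully processed; release its resources
    }

    @Override
    public void close() {}
}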

FixedBatchSpout
storm-1.2.2/storm-core/src/jvm/org/apache/storm/trident/testing/FixedBatchSpout.java
public class FixedBatchSpout implements IBatchSpout {

Fields fields;
List<Object>[] outputs;
int maxBatchSize;
HashMap<Long, List<List<Object>>> batches = new HashMap<Long, List<List<Object>>>();

public FixedBatchSpout(Fields fields, int maxBatchSize, List<Object>... outputs) {
this.fields = fields;
this.outputs = outputs;
this.maxBatchSize = maxBatchSize;
}

int index = 0;
boolean cycle = false;

public void setCycle(boolean cycle) {
this.cycle = cycle;
}

@Override
public void open(Map conf, TopologyContext context) {
index = 0;
}

@Override
public void emitBatch(long batchId, TridentCollector collector) {
List<List<Object>> batch = this.batches.get(batchId);
if(batch == null){
batch = new ArrayList<List<Object>>();
if(index>=outputs.length && cycle) {
index = 0;
}
for(int i=0; index < outputs.length && i < maxBatchSize; index++, i++) {
batch.add(outputs[index]);
}
this.batches.put(batchId, batch);
}
for(List<Object> list : batch){
collector.emit(list);
}
}

@Override
public void ack(long batchId) {
this.batches.remove(batchId);
}

@Override
public void close() {
}

@Override
public Map<String, Object> getComponentConfiguration() {
Config conf = new Config();
conf.setMaxTaskParallelism(1);
return conf;
}

@Override
public Fields getOutputFields() {
return fields;
}

}
The spout users work with is an IBatchSpout; it caches the tuple data for each batchId, which implements transactional spout semantics, as the fragment below demonstrates
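The transactional semantics are easy to observe in isolation: emitting the same batchId twice yields identical tuples until the batch is acked. A quick illustrative fragment, assuming the two-method TridentCollector interface of 1.2.2 (the println collector exists only for this demo):

import java.util.HashMap;
import java.util.List;
import org.apache.storm.trident.operation.TridentCollector;
import org.apache.storm.trident.testing.FixedBatchSpout;
import org.apache.storm.tuple.Fields;
import org.apache.storm.tuple.Values;

FixedBatchSpout spout = new FixedBatchSpout(new Fields("word"), 2,
        new Values("a"), new Values("b"), new Values("c"));
spout.open(new HashMap(), null); // conf and context are unused by FixedBatchSpout

TridentCollector printer = new TridentCollector() {
    @Override public void emit(List<Object> values) { System.out.println(values); }
    @Override public void reportError(Throwable t) { t.printStackTrace(); }
};

spout.emitBatch(1L, printer); // [a], [b]
spout.emitBatch(1L, printer); // replay: [a], [b] again, served from the cache
spout.ack(1L);                // cache entry for batch 1 is dropped
spout.emitBatch(2L, printer); // [c]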
TridentTopology.newStream
storm-1.2.2/storm-core/src/jvm/org/apache/storm/trident/TridentTopology.java
public Stream newStream(String txId, IRichSpout spout) {
return newStream(txId, new RichSpoutBatchExecutor(spout));
}

public Stream newStream(String txId, IBatchSpout spout) {
Node n = new SpoutNode(getUniqueStreamId(), spout.getOutputFields(), txId, spout, SpoutNode.SpoutType.BATCH);
return addNode(n);
}

public Stream newStream(String txId, ITridentSpout spout) {
Node n = new SpoutNode(getUniqueStreamId(), spout.getOutputFields(), txId, spout, SpoutNode.SpoutType.BATCH);
return addNode(n);
}

public Stream newStream(String txId, IPartitionedTridentSpout spout) {
return newStream(txId, new PartitionedTridentSpoutExecutor(spout));
}

public Stream newStream(String txId, IOpaquePartitionedTridentSpout spout) {
return newStream(txId, new OpaquePartitionedTridentSpoutExecutor(spout));
}

public Stream newStream(String txId, ITridentDataSource dataSource) {
if (dataSource instanceof IBatchSpout) {
return newStream(txId, (IBatchSpout) dataSource);
} else if (dataSource instanceof ITridentSpout) {
return newStream(txId, (ITridentSpout) dataSource);
} else if (dataSource instanceof IPartitionedTridentSpout) {
return newStream(txId, (IPartitionedTridentSpout) dataSource);
} else if (dataSource instanceof IOpaquePartitionedTridentSpout) {
return newStream(txId, (IOpaquePartitionedTridentSpout) dataSource);
} else {
throw new UnsupportedOperationException("Unsupported stream");
}
}

In TridentTopology.newStream, users can directly use an IBatchSpout-style spout; the benefit is that TridentTopology wraps it with BatchSpoutExecutor into an ITridentSpout at build time (saving users from implementing the ITridentSpout interfaces and hiding the trident-spout plumbing, so users of plain topologies can get started with trident topologies quickly)
BatchSpoutExecutor implements the ITridentSpout interface, adapting IBatchSpout to ITridentSpout; its coordinator is EmptyCoordinator and its emitter is BatchSpoutEmitter (abridged source below)
If the spout passed to TridentTopology.newStream is an IPartitionedTridentSpout, newStream internally wraps it with PartitionedTridentSpoutExecutor into an ITridentSpout; an IOpaquePartitionedTridentSpout is wrapped with OpaquePartitionedTridentSpoutExecutor
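For reference, the BatchSpoutExecutor adapter is small; abridged from the 1.2.2 source, its coordinator carries no metadata and its emitter simply delegates to the wrapped IBatchSpout:

public class BatchSpoutExecutor implements ITridentSpout<Object> {
    public static class EmptyCoordinator implements BatchCoordinator<Object> {
        @Override
        public Object initializeTransaction(long txid, Object prevMetadata, Object currMetadata) {
            return null; // a plain IBatchSpout needs no per-batch metadata
        }
        @Override public void success(long txid) {}
        @Override public boolean isReady(long txid) { return true; }
        @Override public void close() {}
    }

    public class BatchSpoutEmitter implements Emitter<Object> {
        @Override
        public void emitBatch(TransactionAttempt tx, Object coordinatorMeta, TridentCollector collector) {
            _spout.emitBatch(tx.getTransactionId(), collector);
        }
        @Override
        public void success(TransactionAttempt tx) {
            _spout.ack(tx.getTransactionId()); // IBatchSpout.ack is batch-scoped
        }
        @Override
        public void close() {
            _spout.close();
        }
    }
    // ... getCoordinator/getEmitter hand out the two classes above
}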

Summary

In newStream (and build), TridentTopology adapts every ITridentDataSource that is not already an ITridentSpout: IBatchSpout (in build), IPartitionedTridentSpout (in newStream), and IOpaquePartitionedTridentSpout (in newStream) are adapted to ITridentSpout using BatchSpoutExecutor, PartitionedTridentSpoutExecutor, and OpaquePartitionedTridentSpoutExecutor respectively. (When TridentTopologyBuilder builds the topology, each ITridentSpout is first wrapped in a TridentSpoutExecutor, then in a TridentBoltExecutor, and thereby turned into a bolt; the only real spout of the whole TridentTopology is MasterBatchCoordinator. So an IBatchSpout is first adapted to ITridentSpout by BatchSpoutExecutor and then wrapped into a bolt via TridentSpoutExecutor and TridentBoltExecutor.)
IBatchSpout's ack is batch-scoped, i.e. per TransactionAttempt, and there is no fail method. If emitBatch throws a FailedException, TridentBoltExecutor calls failBatch (the tuples of a batch are all executed before checkFinish is triggered), performing reportError and marking the TrackedBatch as failed; later, when checkFinish finds tracked.failed == true, it calls _collector.fail(tuple), which calls back MasterBatchCoordinator's fail method
MasterBatchCoordinator's fail removes the current TransactionAttempt from _activeTx, also dropping all entries with a txId greater than the failed one, and then calls sync to continue issuing TransactionAttempts (note that _currTransaction is not changed, so retries resume from the failed txId; only ack advances _currTransaction to nextTransactionId)
TridentBoltExecutor's execute uses tick tuples to check whether the time since the last rotate exceeds _messageTimeoutMs (the maximum Config.TOPOLOGY_MESSAGE_TIMEOUT_SECS among the components, times 1000 to convert seconds to milliseconds); if so it rotates, and the last bucket of _batches is dropped. With tick tuples every 5 seconds and Config.TOPOLOGY_MESSAGE_TIMEOUT_SECS at 30, _messageTimeoutMs is 30*1000, so roughly every 5 seconds it checks whether more than 30 seconds have passed since the last rotate; if so it rotates and discards the TrackedBatch entries in the last bucket, effectively resetting timed-out TrackedBatch state
MasterBatchCoordinator's fail can be triggered in several ways: a downstream component may actively throw a FailedException, which triggers the master's fail and retries the TransactionAttempt; or a downstream component may take longer than Config.TOPOLOGY_MESSAGE_TIMEOUT_SECS (topology.message.timeout.secs, 30 by default in defaults.yaml) to process a tuple, in which case the tuple times out and the master's fail is triggered, again causing the TransactionAttempt to be retried. There is currently no limit on the number of attempts, which needs attention in production: as soon as a single tuple of a batchId fails, all tuples of that batchId are re-emitted, so if downstream is not prepared for this, a batch whose leading tuples succeeded while later ones failed will have its successful tuples reprocessed over and over (avoiding the problem of a failed batch being partly processed and partly not requires using Trident State).

doc

Trident Spouts
Trident State
A look at building the storm TridentTopology (聊聊 storm TridentTopology 的构建)
