Preface
This article takes a look at the FreshCollector used by Storm's WindowTridentProcessor.
Example
    TridentTopology topology = new TridentTopology();
    topology.newStream("spout1", spout)
            .partitionBy(new Fields("user"))
            .window(windowConfig, windowsStoreFactory, new Fields("user", "score"),
                    new UserCountAggregator(), new Fields("aggData"))
            .parallelismHint(1)
            .each(new Fields("aggData"), new PrintEachFunc(), new Fields());

- In this example, the window operation is followed by an each operation.
WindowTridentProcessor
storm-core-1.2.2-sources.jar!/org/apache/storm/trident/windowing/WindowTridentProcessor.java
    public class WindowTridentProcessor implements TridentProcessor {

        private FreshCollector collector;

        //......

        public void prepare(Map stormConf, TopologyContext context, TridentContext tridentContext) {
            this.topologyContext = context;
            List<TridentTuple.Factory> parents = tridentContext.getParentTupleFactories();
            if (parents.size() != 1) {
                throw new RuntimeException("Aggregation related operation can only have one parent");
            }

            Long maxTuplesCacheSize = getWindowTuplesCacheSize(stormConf);

            this.tridentContext = tridentContext;
            collector = new FreshCollector(tridentContext);
            projection = new TridentTupleView.ProjectionFactory(parents.get(0), inputFields);

            windowStore = windowStoreFactory.create(stormConf);
            windowTaskId = windowId + WindowsStore.KEY_SEPARATOR + topologyContext.getThisTaskId() + WindowsStore.KEY_SEPARATOR;
            windowTriggerInprocessId = getWindowTriggerInprocessIdPrefix(windowTaskId);

            tridentWindowManager = storeTuplesInStore ?
                    new StoreBasedTridentWindowManager(windowConfig, windowTaskId, windowStore, aggregator,
                            tridentContext.getDelegateCollector(), maxTuplesCacheSize, inputFields)
                    : new InMemoryTridentWindowManager(windowConfig, windowTaskId, windowStore, aggregator,
                            tridentContext.getDelegateCollector());

            tridentWindowManager.prepare();
        }

        public void finishBatch(ProcessorContext processorContext) {

            Object batchId = processorContext.batchId;
            Object batchTxnId = getBatchTxnId(batchId);

            LOG.debug("Received finishBatch of : [{}] ", batchId);
            // get all the tuples in a batch and add it to trident-window-manager
            List<TridentTuple> tuples = (List<TridentTuple>) processorContext.state[tridentContext.getStateIndex()];
            tridentWindowManager.addTuplesBatch(batchId, tuples);

            List<Integer> pendingTriggerIds = null;
            List<String> triggerKeys = new ArrayList<>();
            Iterable<Object> triggerValues = null;

            if (retriedAttempt(batchId)) {
                pendingTriggerIds = (List<Integer>) windowStore.get(inprocessTriggerKey(batchTxnId));
                if (pendingTriggerIds != null) {
                    for (Integer pendingTriggerId : pendingTriggerIds) {
                        triggerKeys.add(triggerKey(pendingTriggerId));
                    }
                    triggerValues = windowStore.get(triggerKeys);
                }
            }

            // if there are no trigger values in earlier attempts or this is a new batch, emit pending triggers.
            if(triggerValues == null) {
                pendingTriggerIds = new ArrayList<>();
                Queue<StoreBasedTridentWindowManager.TriggerResult> pendingTriggers = tridentWindowManager.getPendingTriggers();
                LOG.debug("pending triggers at batch: [{}] and triggers.size: [{}] ", batchId, pendingTriggers.size());
                try {
                    Iterator<StoreBasedTridentWindowManager.TriggerResult> pendingTriggersIter = pendingTriggers.iterator();
                    List<Object> values = new ArrayList<>();
                    StoreBasedTridentWindowManager.TriggerResult triggerResult = null;
                    while (pendingTriggersIter.hasNext()) {
                        triggerResult = pendingTriggersIter.next();
                        for (List<Object> aggregatedResult : triggerResult.result) {
                            String triggerKey = triggerKey(triggerResult.id);
                            triggerKeys.add(triggerKey);
                            values.add(aggregatedResult);
                            pendingTriggerIds.add(triggerResult.id);
                        }
                        pendingTriggersIter.remove();
                    }
                    triggerValues = values;
                } finally {
                    // store inprocess triggers of a batch in store for batch retries for any failures
                    if (!pendingTriggerIds.isEmpty()) {
                        windowStore.put(inprocessTriggerKey(batchTxnId), pendingTriggerIds);
                    }
                }
            }

            collector.setContext(processorContext);
            int i = 0;
            for (Object resultValue : triggerValues) {
                collector.emit(new ConsList(new TriggerInfo(windowTaskId, pendingTriggerIds.get(i++)), (List<Object>) resultValue));
            }
            collector.setContext(null);
        }
    }
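- WindowTridentProcessor creates a FreshCollector in prepare.
- In finishBatch, it calls FreshCollector.emit to pass the window's aggregated results downstream.
- Each emitted value list is a ConsList, an AbstractList implementation made up of a first element of type Object and a List<Object> named _elems. Here first is the TriggerInfo and _elems holds the aggregated values.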
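To make the shape of the emitted list concrete, here is a minimal sketch of a ConsList-like view. It is not Storm's class: the name ConsListSketch and the string used in place of TriggerInfo are assumptions for illustration. It only shows how a first element and an existing list can be exposed as one list without copying.

```java
import java.util.AbstractList;
import java.util.Arrays;
import java.util.List;

// Simplified stand-in for ConsList (hypothetical name): a read-only list view
// that prepends one element to an existing list without copying it.
class ConsListSketch extends AbstractList<Object> {
    private final Object first;
    private final List<Object> elems;

    ConsListSketch(Object first, List<Object> elems) {
        this.first = first;
        this.elems = elems;
    }

    @Override
    public Object get(int i) {
        // Index 0 is the prepended element; every other index is shifted by one.
        return i == 0 ? first : elems.get(i - 1);
    }

    @Override
    public int size() {
        return elems.size() + 1;
    }

    public static void main(String[] args) {
        // "triggerInfo" is a placeholder for the TriggerInfo object.
        List<Object> aggregated = Arrays.<Object>asList(42L);
        List<Object> emitted = new ConsListSketch("triggerInfo", aggregated);
        System.out.println(emitted); // prints [triggerInfo, 42]
    }
}
```

Because the view only overrides get and size, it stays unmodifiable, which is enough for a value list that is built once and handed to emit.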
FreshCollector
storm-core-1.2.2-sources.jar!/org/apache/storm/trident/planner/processor/FreshCollector.java
    public class FreshCollector implements TridentCollector {
        FreshOutputFactory _factory;
        TridentContext _triContext;
        ProcessorContext context;

        public FreshCollector(TridentContext context) {
            _triContext = context;
            _factory = new FreshOutputFactory(context.getSelfOutputFields());
        }

        public void setContext(ProcessorContext pc) {
            this.context = pc;
        }

        @Override
        public void emit(List<Object> values) {
            TridentTuple toEmit = _factory.create(values);
            for(TupleReceiver r: _triContext.getReceivers()) {
                r.execute(context, _triContext.getOutStreamId(), toEmit);
            }
        }

        @Override
        public void reportError(Throwable t) {
            _triContext.getDelegateCollector().reportError(t);
        }

        public Factory getOutputFactory() {
            return _factory;
        }
    }
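- In its constructor, FreshCollector builds a FreshOutputFactory from the context's selfOutputFields (the first field is always _task_info, followed by the functionFields the user passed to the window method).
- The emit method first uses the FreshOutputFactory to build a TridentTupleView laid out according to those output fields, then fetches the TupleReceivers and calls each receiver's execute method with that TridentTupleView.
- The TupleReceivers here are a ProjectedProcessor and a PartitionPersistProcessor.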
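The fan-out in emit can be sketched without any Storm classes. In the sketch below, Receiver and Collector are hypothetical stand-ins for TupleReceiver and FreshCollector, the batch context is reduced to a string, and the two lambdas merely play the roles of the projection and persist receivers.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class FanOutSketch {
    // Stand-in for TupleReceiver: anything that can accept an emitted tuple.
    interface Receiver {
        void execute(String batchContext, String streamId, List<Object> tuple);
    }

    // Stand-in for FreshCollector: keeps the current batch context and
    // forwards every emitted tuple to all downstream receivers.
    static class Collector {
        private final List<Receiver> receivers;
        private final String outStreamId;
        private String context;

        Collector(List<Receiver> receivers, String outStreamId) {
            this.receivers = receivers;
            this.outStreamId = outStreamId;
        }

        void setContext(String context) {
            this.context = context;
        }

        void emit(List<Object> values) {
            for (Receiver r : receivers) {
                r.execute(context, outStreamId, values);
            }
        }
    }

    // Emits one tuple to two receivers and returns what each of them saw.
    static List<String> run() {
        List<String> log = new ArrayList<>();
        Receiver projected = (ctx, stream, t) -> log.add("project " + ctx + " " + t);
        Receiver persist = (ctx, stream, t) -> log.add("persist " + ctx + " " + t);
        Collector collector = new Collector(Arrays.asList(projected, persist), "s1");
        collector.setContext("batch-1");   // set before emitting, as finishBatch does
        collector.emit(Arrays.<Object>asList("triggerInfo", 42L));
        collector.setContext(null);        // cleared once the batch is done
        return log;
    }

    public static void main(String[] args) {
        run().forEach(System.out::println);
    }
}
```

Every receiver gets the same tuple object and the same context, so the collector itself holds no per-receiver state.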
TridentTupleView.FreshOutputFactory
storm-core-1.2.2-sources.jar!/org/apache/storm/trident/tuple/TridentTupleView.java
    public static class FreshOutputFactory implements Factory {
        Map<String, ValuePointer> _fieldIndex;
        ValuePointer[] _index;

        public FreshOutputFactory(Fields selfFields) {
            _fieldIndex = new HashMap<>();
            for(int i=0; i<selfFields.size(); i++) {
                String field = selfFields.get(i);
                _fieldIndex.put(field, new ValuePointer(0, i, field));
            }
            _index = ValuePointer.buildIndex(selfFields, _fieldIndex);
        }

        public TridentTuple create(List<Object> selfVals) {
            return new TridentTupleView(PersistentVector.EMPTY.cons(selfVals), _index, _fieldIndex);
        }

        @Override
        public Map<String, ValuePointer> getFieldIndex() {
            return _fieldIndex;
        }

        @Override
        public int numDelegates() {
            return 1;
        }

        @Override
        public List<String> getOutputFields() {
            return indexToFieldsList(_index);
        }
    }
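- FreshOutputFactory is a static nested class of TridentTupleView; its constructor mainly computes index and fieldIndex.
- fieldIndex is a map whose key is the field name and whose value is a ValuePointer recording the delegateIndex (always 0 here), the index, and the field name. The first field is _task_info, with index 0; the remaining fields are the functionFields the user passed to the window method.
- The create method builds a TridentTupleView, whose constructor takes an IPersistentVector as its first argument, the index as its second, and the fieldIndex as its third.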
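The following is a simplified model of that layout, assuming plain Java lists in place of IPersistentVector. Pointer and TupleView are hypothetical stand-ins for ValuePointer and TridentTupleView; the point is only that a field lookup goes through a (delegateIndex, index) pair rather than through a copied map of values.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class FreshFactorySketch {
    // Stand-in for ValuePointer: which delegate list, and which slot inside it.
    static final class Pointer {
        final int delegateIndex;
        final int index;
        final String field;

        Pointer(int delegateIndex, int index, String field) {
            this.delegateIndex = delegateIndex;
            this.index = index;
            this.field = field;
        }
    }

    // Stand-in for TridentTupleView: values live in delegate lists and are
    // reached through pointers.
    static final class TupleView {
        final List<List<Object>> delegates;
        final Map<String, Pointer> fieldIndex;

        TupleView(List<List<Object>> delegates, Map<String, Pointer> fieldIndex) {
            this.delegates = delegates;
            this.fieldIndex = fieldIndex;
        }

        Object getValueByField(String field) {
            Pointer p = fieldIndex.get(field);
            return delegates.get(p.delegateIndex).get(p.index);
        }
    }

    static Map<String, Pointer> freshFieldIndex(List<String> selfFields) {
        Map<String, Pointer> fieldIndex = new HashMap<>();
        for (int i = 0; i < selfFields.size(); i++) {
            // A fresh tuple has exactly one delegate, so delegateIndex is always 0.
            fieldIndex.put(selfFields.get(i), new Pointer(0, i, selfFields.get(i)));
        }
        return fieldIndex;
    }

    static TupleView create(Map<String, Pointer> fieldIndex, List<Object> selfVals) {
        // The emitted value list becomes the single delegate of the new view.
        return new TupleView(Collections.singletonList(selfVals), fieldIndex);
    }

    public static void main(String[] args) {
        Map<String, Pointer> fieldIndex = freshFieldIndex(Arrays.asList("_task_info", "aggData"));
        TupleView tuple = create(fieldIndex, Arrays.<Object>asList("triggerInfo", 42L));
        System.out.println(tuple.getValueByField("aggData")); // prints 42
    }
}
```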
ValuePointer
storm-core-1.2.2-sources.jar!/org/apache/storm/trident/tuple/ValuePointer.java
    public class ValuePointer {
        public static Map<String, ValuePointer> buildFieldIndex(ValuePointer[] pointers) {
            Map<String, ValuePointer> ret = new HashMap<String, ValuePointer>();
            for(ValuePointer ptr: pointers) {
                ret.put(ptr.field, ptr);
            }
            return ret;
        }

        public static ValuePointer[] buildIndex(Fields fieldsOrder, Map<String, ValuePointer> pointers) {
            if(fieldsOrder.size()!=pointers.size()) {
                throw new IllegalArgumentException("Fields order must be same length as pointers map");
            }
            ValuePointer[] ret = new ValuePointer[pointers.size()];
            for(int i=0; i<fieldsOrder.size(); i++) {
                ret[i] = pointers.get(fieldsOrder.get(i));
            }
            return ret;
        }

        public int delegateIndex;
        protected int index;
        protected String field;

        public ValuePointer(int delegateIndex, int index, String field) {
            this.delegateIndex = delegateIndex;
            this.index = index;
            this.field = field;
        }

        @Override
        public String toString() {
            return ToStringBuilder.reflectionToString(this);
        }
    }
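- buildIndex returns the ValuePointers as an array ordered by the given fields; when called from FreshOutputFactory, that order is the order of selfOutputFields.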
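A HashMap has no defined iteration order, so the array returned by buildIndex is what preserves the field order (getOutputFields, for instance, is derived from it). A generic sketch of the same idea, with strings standing in for ValuePointer objects as an assumption for brevity:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class BuildIndexSketch {
    // Orders the map's values by the given field order; P stands in for ValuePointer.
    static <P> List<P> buildIndex(List<String> fieldsOrder, Map<String, P> pointers) {
        if (fieldsOrder.size() != pointers.size()) {
            throw new IllegalArgumentException("Fields order must be same length as pointers map");
        }
        List<P> ret = new ArrayList<>(fieldsOrder.size());
        for (String field : fieldsOrder) {
            ret.add(pointers.get(field));
        }
        return ret;
    }

    public static void main(String[] args) {
        Map<String, String> pointers = new HashMap<>();
        pointers.put("aggData", "ptr(0,1)");
        pointers.put("_task_info", "ptr(0,0)");
        // Insertion order into the map does not matter; the field list decides.
        System.out.println(buildIndex(Arrays.asList("_task_info", "aggData"), pointers));
        // prints [ptr(0,0), ptr(0,1)]
    }
}
```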
ProjectedProcessor
storm-core-1.2.2-sources.jar!/org/apache/storm/trident/planner/processor/ProjectedProcessor.java
    public class ProjectedProcessor implements TridentProcessor {
        Fields _projectFields;
        ProjectionFactory _factory;
        TridentContext _context;

        public ProjectedProcessor(Fields projectFields) {
            _projectFields = projectFields;
        }

        @Override
        public void prepare(Map conf, TopologyContext context, TridentContext tridentContext) {
            if(tridentContext.getParentTupleFactories().size()!=1) {
                throw new RuntimeException("Projection processor can only have one parent");
            }
            _context = tridentContext;
            _factory = new ProjectionFactory(tridentContext.getParentTupleFactories().get(0), _projectFields);
        }

        @Override
        public void cleanup() {
        }

        @Override
        public void startBatch(ProcessorContext processorContext) {
        }

        @Override
        public void execute(ProcessorContext processorContext, String streamId, TridentTuple tuple) {
            TridentTuple toEmit = _factory.create(tuple);
            for(TupleReceiver r: _context.getReceivers()) {
                r.execute(processorContext, _context.getOutStreamId(), toEmit);
            }
        }

        @Override
        public void finishBatch(ProcessorContext processorContext) {
        }

        @Override
        public Factory getOutputFactory() {
            return _factory;
        }
    }
- In prepare, ProjectedProcessor creates a ProjectionFactory. Its _projectFields are the functionFields passed to the window method. It also takes the parent's single Factory via tridentContext.getParentTupleFactories().get(0); because the tuples come from FreshCollector, that factory is a TridentTupleView.FreshOutputFactory.
- In execute, it first calls ProjectionFactory.create to project the incoming TridentTupleView; toEmit is a TridentTupleView restricted to the functionFields passed to the window method.
- execute then calls execute on each of _context.getReceivers(), passing toEmit along. These receivers are whatever processors follow the window operation, for example EachProcessor.
TridentTupleView.ProjectionFactory
storm-core-1.2.2-sources.jar!/org/apache/storm/trident/tuple/TridentTupleView.java
    public static class ProjectionFactory implements Factory {
        Map<String, ValuePointer> _fieldIndex;
        ValuePointer[] _index;
        Factory _parent;

        public ProjectionFactory(Factory parent, Fields projectFields) {
            _parent = parent;
            if(projectFields==null) projectFields = new Fields();
            Map<String, ValuePointer> parentFieldIndex = parent.getFieldIndex();
            _fieldIndex = new HashMap<>();
            for(String f: projectFields) {
                _fieldIndex.put(f, parentFieldIndex.get(f));
            }
            _index = ValuePointer.buildIndex(projectFields, _fieldIndex);
        }

        public TridentTuple create(TridentTuple parent) {
            if(_index.length==0) return EMPTY_TUPLE;
            else return new TridentTupleView(((TridentTupleView)parent)._delegates, _index, _fieldIndex);
        }

        @Override
        public Map<String, ValuePointer> getFieldIndex() {
            return _fieldIndex;
        }

        @Override
        public int numDelegates() {
            return _parent.numDelegates();
        }

        @Override
        public List<String> getOutputFields() {
            return indexToFieldsList(_index);
        }
    }
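- ProjectionFactory is a static nested class of TridentTupleView. Its constructor builds index and fieldIndex from projectFields, reusing the parent's ValuePointers, so that create can produce a TridentTupleView exposing only the requested fields over the parent's _delegates.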
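A projection therefore copies no values: it shares the parent's delegates and only narrows the field index. The sketch below models that with plain lists and int[] pairs of {delegateIndex, index}; View, project, and sample are hypothetical names, not Storm API.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class ProjectionSketch {
    // Stand-in for TridentTupleView: shared delegates plus a field index
    // mapping field name -> {delegateIndex, index}.
    static final class View {
        final List<List<Object>> delegates;
        final Map<String, int[]> fieldIndex;

        View(List<List<Object>> delegates, Map<String, int[]> fieldIndex) {
            this.delegates = delegates;
            this.fieldIndex = fieldIndex;
        }

        boolean has(String field) {
            return fieldIndex.containsKey(field);
        }

        Object get(String field) {
            int[] p = fieldIndex.get(field);
            return delegates.get(p[0]).get(p[1]);
        }
    }

    // Keeps the parent's delegates as-is and copies only the requested pointers.
    static View project(View parent, List<String> projectFields) {
        Map<String, int[]> fieldIndex = new HashMap<>();
        for (String f : projectFields) {
            fieldIndex.put(f, parent.fieldIndex.get(f));
        }
        return new View(parent.delegates, fieldIndex);
    }

    // A tuple shaped like the one FreshCollector emits in the example topology.
    static View sample() {
        Map<String, int[]> fieldIndex = new HashMap<>();
        fieldIndex.put("_task_info", new int[] {0, 0});
        fieldIndex.put("aggData", new int[] {0, 1});
        List<Object> values = Arrays.<Object>asList("triggerInfo", 42L);
        return new View(Collections.singletonList(values), fieldIndex);
    }

    public static void main(String[] args) {
        View projected = project(sample(), Arrays.asList("aggData"));
        System.out.println(projected.get("aggData") + " " + projected.has("_task_info"));
        // prints 42 false
    }
}
```

In this model the _task_info value is still physically present in the shared delegate; it is simply no longer reachable through the projected view's field index.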
EachProcessor
storm-core-1.2.2-sources.jar!/org/apache/storm/trident/planner/processor/EachProcessor.java
    public class EachProcessor implements TridentProcessor {
        Function _function;
        TridentContext _context;
        AppendCollector _collector;
        Fields _inputFields;
        ProjectionFactory _projection;

        public EachProcessor(Fields inputFields, Function function) {
            _function = function;
            _inputFields = inputFields;
        }

        @Override
        public void prepare(Map conf, TopologyContext context, TridentContext tridentContext) {
            List<Factory> parents = tridentContext.getParentTupleFactories();
            if(parents.size()!=1) {
                throw new RuntimeException("Each operation can only have one parent");
            }
            _context = tridentContext;
            _collector = new AppendCollector(tridentContext);
            _projection = new ProjectionFactory(parents.get(0), _inputFields);
            _function.prepare(conf, new TridentOperationContext(context, _projection));
        }

        @Override
        public void cleanup() {
            _function.cleanup();
        }

        @Override
        public void execute(ProcessorContext processorContext, String streamId, TridentTuple tuple) {
            _collector.setContext(processorContext, tuple);
            _function.execute(_projection.create(tuple), _collector);
        }

        @Override
        public void startBatch(ProcessorContext processorContext) {
        }

        @Override
        public void finishBatch(ProcessorContext processorContext) {
        }

        @Override
        public Factory getOutputFactory() {
            return _collector.getOutputFactory();
        }
    }
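- EachProcessor's execute method first calls _collector.setContext with both the processorContext and the current input tuple, then calls _function.execute.
- The function's input is _projection.create(tuple), which projects the tuple down to the inputFields declared for the function.
- The collector passed to _function is an AppendCollector.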
AppendCollector
storm-core-1.2.2-sources.jar!/org/apache/storm/trident/planner/processor/AppendCollector.java
    public class AppendCollector implements TridentCollector {
        OperationOutputFactory _factory;
        TridentContext _triContext;
        TridentTuple tuple;
        ProcessorContext context;

        public AppendCollector(TridentContext context) {
            _triContext = context;
            _factory = new OperationOutputFactory(context.getParentTupleFactories().get(0), context.getSelfOutputFields());
        }

        public void setContext(ProcessorContext pc, TridentTuple t) {
            this.context = pc;
            this.tuple = t;
        }

        @Override
        public void emit(List<Object> values) {
            TridentTuple toEmit = _factory.create((TridentTupleView) tuple, values);
            for(TupleReceiver r: _triContext.getReceivers()) {
                r.execute(context, _triContext.getOutStreamId(), toEmit);
            }
        }

        @Override
        public void reportError(Throwable t) {
            _triContext.getDelegateCollector().reportError(t);
        }

        public Factory getOutputFactory() {
            return _factory;
        }
    }
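- In its constructor, AppendCollector creates an OperationOutputFactory from the parent's factory and the selfOutputFields. Its emit method uses that factory to build the output tuple from the current input tuple plus the values the function emitted, then calls execute on each of _triContext.getReceivers(). If nothing follows the each operation, _triContext.getReceivers() is empty.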
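OperationOutputFactory itself is not shown above, so the following is only a sketch of what "parent tuple plus emitted values" could look like under one assumption: the function's output is added as an extra delegate after the parent's, and the new fields point into it. The field name doubled and all method names are hypothetical; in the example topology the each call declares no output fields, so nothing would actually be appended there.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class AppendSketch {
    // Extends the parent's field index (field -> {delegateIndex, index}) with
    // pointers for the operation's own output fields.
    static Map<String, int[]> outputIndex(Map<String, int[]> parentIndex,
                                          int parentDelegates,
                                          List<String> selfFields) {
        Map<String, int[]> fieldIndex = new LinkedHashMap<>(parentIndex);
        for (int i = 0; i < selfFields.size(); i++) {
            // New fields point into the delegate that is appended last.
            fieldIndex.put(selfFields.get(i), new int[] {parentDelegates, i});
        }
        return fieldIndex;
    }

    // Builds the delegates of the output tuple: the parent's value lists are
    // reused, and the function's output becomes one more delegate.
    static List<List<Object>> append(List<List<Object>> parentDelegates, List<Object> selfVals) {
        List<List<Object>> out = new ArrayList<>(parentDelegates);
        out.add(selfVals);
        return out;
    }

    static Object read(List<List<Object>> delegates, Map<String, int[]> fieldIndex, String field) {
        int[] p = fieldIndex.get(field);
        return delegates.get(p[0]).get(p[1]);
    }

    public static void main(String[] args) {
        Map<String, int[]> parentIndex = new LinkedHashMap<>();
        parentIndex.put("aggData", new int[] {0, 1});
        List<List<Object>> parentDelegates =
                Collections.singletonList(Arrays.<Object>asList("triggerInfo", 21));

        Map<String, int[]> fieldIndex =
                outputIndex(parentIndex, parentDelegates.size(), Arrays.asList("doubled"));
        List<List<Object>> delegates = append(parentDelegates, Arrays.<Object>asList(42));

        System.out.println(read(delegates, fieldIndex, "aggData") + " "
                + read(delegates, fieldIndex, "doubled")); // prints 21 42
    }
}
```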
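Summary
- WindowTridentProcessor uses a FreshCollector. In finishBatch, it pulls the pendingTriggers produced by the window from the TridentWindowManager (each entry is removed from pendingTriggers once pulled). These carry the data accumulated by the window, and FreshCollector emits them: the first value is the TriggerInfo, followed by the values aggregated for the window.
- FreshCollector's emit method first uses TridentTupleView.FreshOutputFactory to build a TridentTupleView from selfOutputFields (the first field is always _task_info, followed by the functionFields the user passed to the window method), then calls execute on each of _triContext.getReceivers().
- One of those receivers is a ProjectedProcessor, which re-projects the TridentTupleView onto the functionFields passed to the window method. Its execute method resembles FreshCollector.emit: it builds a TridentTupleView with the required fields, then calls execute on each of its own receivers (for example EachProcessor.execute).
- EachProcessor uses an AppendCollector. Its emit method also resembles FreshCollector.emit: it builds a TridentTupleView through its factory (here an OperationOutputFactory, from the input tuple plus the emitted values), then calls execute on each of _triContext.getReceivers().
- FreshCollector.emit, ProjectedProcessor.execute, and AppendCollector.emit all follow the same pattern: use a Factory to build a TridentTupleView with the required fields, then call execute on each of the context's receivers. When a context has no receivers, the tuple stops propagating.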
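The drain-and-remember behavior of finishBatch described in the first summary point can be sketched as follows. This is a deliberately simplified model, not the Storm implementation: two in-memory maps stand in for the WindowsStore, a replay is detected just by finding trigger IDs already recorded for the transaction ID (the real code first checks retriedAttempt), and all names are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Queue;

class FinishBatchSketch {
    // Stand-ins for the window store: trigger results, and the trigger ids
    // that each batch transaction picked up.
    private final Map<Integer, Object> triggerValues = new HashMap<>();
    private final Map<Long, List<Integer>> inprocess = new HashMap<>();
    // Trigger ids that fired but have not been emitted by any batch yet.
    private final Queue<Integer> pendingTriggers = new ArrayDeque<>();

    // Called when a window fires: keep the result and queue its id.
    void onTrigger(int triggerId, Object aggregated) {
        triggerValues.put(triggerId, aggregated);
        pendingTriggers.add(triggerId);
    }

    // Returns the values this batch attempt would emit.
    List<Object> finishBatch(long txnId) {
        List<Integer> ids = inprocess.get(txnId);
        if (ids == null) {
            // New batch: drain the queue and record which triggers it took,
            // so a replay of the same transaction emits the same results.
            ids = new ArrayList<>();
            Integer id;
            while ((id = pendingTriggers.poll()) != null) {
                ids.add(id);
            }
            if (!ids.isEmpty()) {
                inprocess.put(txnId, ids);
            }
        }
        List<Object> emitted = new ArrayList<>();
        for (Integer id : ids) {
            emitted.add(triggerValues.get(id));
        }
        return emitted;
    }

    public static void main(String[] args) {
        FinishBatchSketch processor = new FinishBatchSketch();
        processor.onTrigger(1, "a");
        processor.onTrigger(2, "b");
        System.out.println(processor.finishBatch(10L)); // prints [a, b]
        processor.onTrigger(3, "c");
        System.out.println(processor.finishBatch(10L)); // replay, prints [a, b]
        System.out.println(processor.finishBatch(11L)); // prints [c]
    }
}
```

Recording the IDs per transaction is what keeps a replayed batch from stealing triggers that fired after its first attempt; those stay queued for the next batch.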