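Here is a simple Storm topology that runs in local mode. It can be used to observe the order in which the components of a Storm topology are called.
Any storm-0.9.x release should run it. In an IDE such as IntelliJ IDEA, create a Maven project and add the following dependency to pom.xml: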
<dependency>
    <groupId>org.apache.storm</groupId>
    <artifactId>storm-core</artifactId>
    <version>0.9.3</version>
    <scope>compile</scope>
</dependency>
import backtype.storm.Config;
import backtype.storm.LocalCluster;
import backtype.storm.StormSubmitter;
import backtype.storm.spout.SpoutOutputCollector;
import backtype.storm.task.OutputCollector;
import backtype.storm.task.TopologyContext;
import backtype.storm.topology.OutputFieldsDeclarer;
import backtype.storm.topology.TopologyBuilder;
import backtype.storm.topology.base.BaseRichBolt;
import backtype.storm.topology.base.BaseRichSpout;
import backtype.storm.tuple.Fields;
import backtype.storm.tuple.Tuple;
import backtype.storm.tuple.Values;
import backtype.storm.utils.Utils;
import java.util.Map;
import java.util.Random;
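/**
 * This is a basic example of a Storm topology.
 *
 * Question: when a field of a bolt appears to be shared by several threads,
 * does Storm synchronize access to it? For example, with the parallelism of
 * ExclamationBolt set to 13, the output of execute() shows each value of the
 * field i printed about 13 times.
 *
 * Conclusion: no explicit synchronization is needed for such a field. Each of
 * the 13 tasks works on its own deserialized copy of the bolt, and a task's
 * execute() is called from a single executor thread, so i is never actually
 * shared between threads.
 */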
public class TestCountTopology {
    public static class ExclamationBolt extends BaseRichBolt {
        private static final long serialVersionUID = 6618890434446310020L;

        OutputCollector _collector;
        TopologyContext _context;
        int i = 0; // Is this field shared by several threads, and does Storm synchronize it?

        @Override
        public void prepare(Map conf, TopologyContext context,
                OutputCollector collector) {
            _collector = collector;
            System.out.println("prepare in BOLT is called!");
            _context = context;
        }

        @Override
        public void execute(Tuple tuple) {
            System.out.println("execute in BOLT is called! " + i);
            System.err.println("Worker_port: " + _context.getThisWorkerPort()
                    + " tasks: " + _context.getThisWorkerTasks()
                    + " task_id: " + _context.getThisTaskId() + " BOLT receive: "
                    + new Values(tuple.getString(0)) + " i: " + i);
            inc();
            // Emit anchored to the input tuple, then acknowledge the input.
            _collector.emit(tuple, new Values(tuple.getString(0) + "!!!"));
            _collector.ack(tuple);
        }

        @Override
        public void declareOutputFields(OutputFieldsDeclarer declarer) {
            declarer.declare(new Fields("word"));
        }

        private void inc() {
            i++;
            System.err.println("task_id: " + _context.getThisTaskId() + " inc() was called");
        }
    }
    public static class WordSpout extends BaseRichSpout {
        private static final long serialVersionUID = 6962615351294880911L;

        SpoutOutputCollector _collector;
        int i = 0;

        @Override
        public void open(Map conf, TopologyContext context,
                SpoutOutputCollector collector) {
            _collector = collector;
            System.err.println("open in SPOUT is called!");
        }

        @Override
        public void nextTuple() {
            System.out.println("nextTuple in SPOUT is called! " + i);
            Utils.sleep(100);
            final String[] words = new String[] { "nathan", "mike", "jackson",
                    "golda", "bertels" };
            final Random rand = new Random();
            final String word = words[rand.nextInt(words.length)];
            // Emitted without a message ID, so Storm does not track this tuple.
            _collector.emit(new Values(word));
            System.out.println("SPOUT emit: " + new Values(word) + " i: " + i);
            i++;
        }

        @Override
        public void declareOutputFields(OutputFieldsDeclarer declarer) {
            declarer.declare(new Fields("word"));
            System.out.println("declareOutputFields in SPOUT is called!");
        }

        // ack() and fail() are only called for tuples emitted with a message ID,
        // which nextTuple() above does not supply.
        @Override
        public void ack(Object msgId) {
            System.out.println("ack in SPOUT is called!");
        }

        @Override
        public void fail(Object msgId) {
            System.out.println("fail in SPOUT is called!");
        }

        @Override
        public void close() {
            System.out.println("close in SPOUT is called!");
        }
    }
    public static void main(String[] args) throws Exception {
        TopologyBuilder builder = new TopologyBuilder();
        // builder.setSpout("word", new TestWordSpout(), 10);
        builder.setSpout("word", new WordSpout(), 1);
        builder.setBolt("exclaim1", new ExclamationBolt(), 13).shuffleGrouping("word");
        // builder.setBolt("exclaim2", new ExclamationBolt(), 3).shuffleGrouping("exclaim1");

        Config conf = new Config();
        conf.setDebug(false); // TOPOLOGY_DEBUG
        conf.setNumWorkers(2);

        if (args != null && args.length > 0) {
            // Cluster mode: the first argument is the topology name.
            conf.setNumWorkers(3); // TOPOLOGY_WORKERS
            StormSubmitter.submitTopologyWithProgressBar(args[0], conf,
                    builder.createTopology());
        } else {
            // Local mode: run for 10 seconds, then stop.
            LocalCluster cluster = new LocalCluster();
            cluster.submitTopology("test", conf, builder.createTopology());
            Utils.sleep(10000);
            /*
             * To avoid the file deletion error, comment out the
             * cluster.killTopology() and cluster.shutdown() calls below.
             */
            cluster.killTopology("test");
            cluster.shutdown();
        }
    }
}
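The topology consists of two components. The nextTuple method of WordSpout is called repeatedly; each call picks a random word from the words string array and emits it to the stream. ExclamationBolt reads that word from the stream, appends three exclamation marks "!!!", and emits the result.
For example, if WordSpout produces the word "nathan", ExclamationBolt produces "nathan!!!".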
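In the code above, the parallelism of ExclamationBolt is set to 13, so Storm starts 13 executor threads for it. Each thread repeatedly calls the bolt's execute method, which reads the field i and then increments it by 1.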
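The threading model can be sketched without Storm. The following is a minimal model, not Storm code: it assumes that every executor thread works on its own bolt instance, which is how Storm runs tasks. `FakeBolt`, `ThreadConfinementSketch`, and `runTasks` are hypothetical names made up for this illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a bolt: one mutable counter, no synchronization.
class FakeBolt {
    int i = 0;

    void execute(String word) {
        i++; // safe only because a single thread ever touches this instance
    }
}

class ThreadConfinementSketch {

    // Starts `tasks` threads, each owning its own FakeBolt, and returns the
    // final counter of every instance.
    static int[] runTasks(int tasks, int tuplesPerTask) throws InterruptedException {
        FakeBolt[] bolts = new FakeBolt[tasks];
        List<Thread> threads = new ArrayList<>();
        for (int t = 0; t < tasks; t++) {
            final FakeBolt bolt = new FakeBolt(); // one instance per task, never shared
            bolts[t] = bolt;
            Thread thread = new Thread(() -> {
                for (int n = 0; n < tuplesPerTask; n++) {
                    bolt.execute("nathan");
                }
            });
            threads.add(thread);
            thread.start();
        }
        for (Thread thread : threads) {
            thread.join(); // join() also makes each thread's writes visible here
        }
        int[] counts = new int[tasks];
        for (int t = 0; t < tasks; t++) {
            counts[t] = bolts[t].i;
        }
        return counts;
    }

    public static void main(String[] args) throws InterruptedException {
        int[] counts = runTasks(13, 1000);
        for (int t = 0; t < counts.length; t++) {
            System.out.println("task " + t + " final i: " + counts[t]);
        }
    }
}
```

Every task ends with i equal to 1000 although nothing is synchronized, because no two threads ever touch the same FakeBolt. If the 13 threads incremented one shared instance instead, updates could be lost.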
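Q: This field looks as if it is read and modified by several threads. Do we need to synchronize it ourselves?
A: No. Running the topology shows that the execute method of ExclamationBolt prints each value of i about 13 times before the next value appears, once per task. This is not because Storm locks the field. The bolt object is serialized when the topology is submitted, each of the 13 tasks deserializes its own copy, and each copy's execute method is called from a single thread, so i is never shared. State that really is shared inside a worker process, such as a static field, would still need explicit synchronization.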
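The observed output, each value of i appearing 13 times, can be reproduced in a small model. This is only a sketch under two assumptions: standard Java serialization stands in for the way Storm ships the bolt to its tasks, and a strict round-robin stands in for shuffle grouping, which in reality distributes tuples randomly but evenly. `CopyPerTaskSketch`, `CountingBolt`, `copyOf`, and `run` are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

class CopyPerTaskSketch {

    // Hypothetical stand-in for ExclamationBolt: reports the value of i it saw.
    static class CountingBolt implements Serializable {
        private static final long serialVersionUID = 1L;
        int i = 0;

        int execute(String word) {
            int seen = i;
            i++;
            return seen;
        }
    }

    // Makes an independent copy by serializing and deserializing the object.
    static CountingBolt copyOf(CountingBolt original)
            throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(original);
        }
        try (ObjectInputStream in = new ObjectInputStream(
                new ByteArrayInputStream(bytes.toByteArray()))) {
            return (CountingBolt) in.readObject();
        }
    }

    // Deals `tuples` tuples round-robin to `tasks` copies and returns the
    // value of i that each call to execute() observed.
    static List<Integer> run(int tasks, int tuples)
            throws IOException, ClassNotFoundException {
        CountingBolt template = new CountingBolt();
        List<CountingBolt> copies = new ArrayList<>();
        for (int t = 0; t < tasks; t++) {
            copies.add(copyOf(template)); // every task gets its own copy
        }
        List<Integer> seen = new ArrayList<>();
        for (int n = 0; n < tuples; n++) {
            seen.add(copies.get(n % tasks).execute("nathan"));
        }
        return seen;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(run(13, 39));
    }
}
```

With 13 copies and 39 tuples the list holds thirteen 0s, then thirteen 1s, then thirteen 2s. The pattern comes from 13 independent counters advancing at the same pace, not from a lock around one shared counter.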