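First, set up the environment, which is simple: download the Flink distribution and extract it.
Then run bin/start-cluster.sh to start Flink. Despite the word "cluster" in the script name, the default configuration starts Flink in local mode. Once it is running, open localhost:8081 in a browser to reach the Dashboard.
The next step is to write and run the example program from the given code. The examples directory of the distribution also ships prebuilt example jars that can be run directly. The source of SocketWindowWordCount is as follows: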
import org.apache.flink.api.common.functions.FlatMapFunction;
import org.apache.flink.api.common.functions.ReduceFunction;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.windowing.time.Time;
import org.apache.flink.util.Collector;

public class SocketWindowWordCount {

    public static void main(String[] args) throws Exception {
        final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // get input data by connecting to the socket
        DataStream<String> text = env.socketTextStream("localhost", 9000, "\n");

        // parse the data, group it, window it, and aggregate the counts
        DataStream<WordWithCount> windowCounts = text
            .flatMap(new FlatMapFunction<String, WordWithCount>() {
                @Override
                public void flatMap(String value, Collector<WordWithCount> out) {
                    for (String word : value.split("\\s")) {
                        out.collect(new WordWithCount(word, 1L));
                    }
                }
            })
            .keyBy("word")
            // 5-second windows that slide forward every 1 second
            .timeWindow(Time.seconds(5), Time.seconds(1))
            .reduce(new ReduceFunction<WordWithCount>() {
                @Override
                public WordWithCount reduce(WordWithCount a, WordWithCount b) {
                    return new WordWithCount(a.word, a.count + b.count);
                }
            });

        // print the results with a single thread, rather than in parallel
        windowCounts.print().setParallelism(1);

        env.execute("Socket Window WordCount");
    }

    // Data type for words with count
    public static class WordWithCount {
        public String word;
        public long count;

        public WordWithCount() {}

        public WordWithCount(String word, long count) {
            this.word = word;
            this.count = count;
        }

        @Override
        public String toString() {
            return word + " : " + count;
        }
    }
}
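To make the flatMap, keyBy, and reduce steps concrete, here is a minimal plain-Java sketch of what they compute for the lines that fall into a single window. It does not use Flink at all; the class name WordCountSketch and the helper count are hypothetical names introduced only for this illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class WordCountSketch {

    // Mirrors flatMap + keyBy + reduce for the lines of one window (illustration only, no Flink).
    static Map<String, Long> count(String... lines) {
        Map<String, Long> counts = new LinkedHashMap<>();
        for (String line : lines) {
            // same split regex as the Flink job
            for (String word : line.split("\\s")) {
                // merging with Long::sum plays the role of reduce(a, b)
                counts.merge(word, 1L, Long::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // prints {hello=2, world=1, flink=2}
        System.out.println(count("hello world flink hello flink"));
    }
}
```

The map key stands in for keyBy("word"): every word gets its own running total, just as every key gets its own reduce state in the job.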
Writing a Flink program requires the relevant dependencies (using Flink 1.9 built for Scala 2.12 as an example):
<dependencies>
    <dependency>
        <groupId>org.apache.flink</groupId>
        <artifactId>flink-java</artifactId>
        <version>1.9.1</version>
        <scope>provided</scope>
    </dependency>
    <dependency>
        <groupId>org.apache.flink</groupId>
        <artifactId>flink-streaming-java_2.12</artifactId>
        <version>1.9.1</version>
        <scope>provided</scope>
    </dependency>
</dependencies>
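Next comes packaging. The packaging plugin must be configured as well, otherwise running the jar fails with an error saying the main class cannot be found (configuration copied from the official code):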
<build>
    <plugins>
        <!-- self-contained jars for each example -->
        <plugin>
            <groupId>org.apache.maven.plugins</groupId>
            <artifactId>maven-jar-plugin</artifactId>
            <version>2.4</version><!--$NO-MVN-MAN-VER$-->
            <executions>
                <!-- Default Execution -->
                <execution>
                    <id>default</id>
                    <phase>package</phase>
                    <goals>
                        <goal>test-jar</goal>
                    </goals>
                </execution>
                <execution>
                    <id>SocketWindowWordCount</id>
                    <phase>package</phase>
                    <goals>
                        <goal>jar</goal>
                    </goals>
                    <configuration>
                        <classifier>SocketWindowWordCount</classifier>
                        <archive>
                            <manifestEntries>
                                <!-- replace your.package with the package of your own main class -->
                                <program-class>your.package.WordCount</program-class>
                            </manifestEntries>
                        </archive>
                        <includes>
                            <include>your/package/WordCount.class</include>
                            <include>your/package/WordCount$*.class</include>
                        </includes>
                    </configuration>
                </execution>
            </executions>
        </plugin>
    </plugins>
</build>
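In the official examples, streaming code lives under the streaming directory and batch code under the batch directory. Flink itself does not require this layout, but the <include> paths in the plugin configuration must match the actual package path of your classes.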
Then run the packaged flinkdemo-1.0-RELEASE-SocketWindowWordCount.jar; here I copied it into the Flink root directory:
bin/flink run flinkdemo-1.0-RELEASE-SocketWindowWordCount.jar
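In another terminal, run nc -l 9000 (start it before submitting the job, because the job connects to this port when it starts), then type a few words, for example: hello world flink hello flink. The Dashboard now shows the new job running.
In a new terminal window, look at log/flink-<your-username>-taskexecutor-0-<your-hostname>.out under the Flink directory; it contains output like the following: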
root@Yhc-Surface:~/flink-1.9.1/log# tail -f flink-root-taskexecutor-0-Yhc-Surface.out
flink : 2
flink : 2
world : 1
hello : 2
world : 1
flink : 2
hello : 2
hello : 2
world : 1
flink : 2
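The output shows that hello and flink each appeared twice and world once, which matches the input.
Pressing Ctrl+C in the nc -l 9000 window also ends the example program, and its status in the Dashboard changes to FINISHED.
You can also upload the jar and run it from "Submit New Job" at the bottom of the Dashboard's left sidebar.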
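The same counts appear several times in the log because the job uses a 5-second window that slides every 1 second, so each input line falls into several overlapping windows and is reported once per window. The following plain-Java sketch illustrates that arithmetic only; it is not Flink's implementation, and it assumes windows aligned to multiples of the slide with no offset and non-negative timestamps. The names SlidingWindowSketch and windowStarts are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class SlidingWindowSketch {

    // Start times (ms) of every window of length sizeMs, sliding by slideMs,
    // that contains timestampMs. Assumes window starts are multiples of slideMs.
    static List<Long> windowStarts(long timestampMs, long sizeMs, long slideMs) {
        List<Long> starts = new ArrayList<>();
        long lastStart = timestampMs - (timestampMs % slideMs);
        for (long start = lastStart; start > timestampMs - sizeMs; start -= slideMs) {
            starts.add(start);
        }
        return starts;
    }

    public static void main(String[] args) {
        // A line typed at t = 12.3 s, with a 5 s window sliding every 1 s.
        List<Long> starts = windowStarts(12_300L, 5_000L, 1_000L);
        // prints: 5 windows, starting at [12000, 11000, 10000, 9000, 8000]
        System.out.println(starts.size() + " windows, starting at " + starts);
    }
}
```

Under these assumptions a single line of input contributes to five windows, so each word's count can be printed up to five times, roughly one second apart.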