Learning Flink (1): The SocketWindowWordCount Example

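References: the official documentation and the official example code

First, set up the environment. This is simple: download the Flink distribution and unpack it.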

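Then run bin/start-cluster.sh to start Flink. Although the script name says "cluster", the default configuration starts Flink in local mode. Once it is running, open localhost:8081 in a browser to reach the Dashboard.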

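Next, write the example program from the given code and run it. The examples directory of the distribution already contains prebuilt example jars, which can be run directly. The source of SocketWindowWordCount is as follows: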

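import org.apache.flink.api.common.functions.FlatMapFunction;
import org.apache.flink.api.common.functions.ReduceFunction;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.windowing.time.Time;
import org.apache.flink.util.Collector;

public class SocketWindowWordCount {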

    public static void main(String[] args) throws Exception {

        final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // get input data by connecting to the socket
        DataStream<String> text = env.socketTextStream("localhost", 9000, "\n");

        // parse the data, group it, window it, and aggregate the counts
        DataStream<WordWithCount> windowCounts = text
            .flatMap(new FlatMapFunction<String, WordWithCount>() {
                @Override
                public void flatMap(String value, Collector<WordWithCount> out) {
                    for (String word : value.split("\\s")) {
                        out.collect(new WordWithCount(word, 1L));
                    }
                }
            })
            .keyBy("word")
            .timeWindow(Time.seconds(5), Time.seconds(1))
            .reduce(new ReduceFunction<WordWithCount>() {
                @Override
                public WordWithCount reduce(WordWithCount a, WordWithCount b) {
                    return new WordWithCount(a.word, a.count + b.count);
                }
            });

        // print the results with a single thread, rather than in parallel
        windowCounts.print().setParallelism(1);

        env.execute("Socket Window WordCount");
    }

    // Data type for words with count
    public static class WordWithCount {

        public String word;
        public long count;

        public WordWithCount() {}

        public WordWithCount(String word, long count) {
            this.word = word;
            this.count = count;
        }

        @Override
        public String toString() {
            return word + " : " + count;
        }
    }
}
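
To make the flatMap, keyBy and reduce steps concrete without a running cluster, here is a minimal plain-Java sketch of the same counting logic applied to an in-memory list of lines. It does not use Flink at all, it leaves windowing out on purpose, and the names WordCountSketch and count are hypothetical, made up for this illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of Flink: mimics flatMap + keyBy("word") + reduce
// on in-memory lines instead of a socket stream, with no windowing.
final class WordCountSketch {

    static Map<String, Long> count(String... lines) {
        Map<String, Long> counts = new LinkedHashMap<>();
        for (String line : lines) {
            // Same tokenization as the example: split on single whitespace characters.
            for (String word : line.split("\\s")) {
                // keyBy + reduce: add 1 to the running count kept for this word.
                counts.merge(word, 1L, Long::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(count("hello world flink hello flink"));
    }
}
```

In the real job the running count is kept per window rather than for the whole input, which is why the same word can be reported more than once.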

Writing a Flink program requires the relevant dependencies (using Flink 1.9 built against Scala 2.12 as an example):

    <dependencies>
        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-java</artifactId>
            <version>1.9.1</version>
            <scope>provided</scope>
        </dependency>
        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-streaming-java_2.12</artifactId>
            <version>1.9.1</version>
            <scope>provided</scope>
        </dependency>
    </dependencies>

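The next step is packaging. The jar plugin has to be configured as well, otherwise running the jar fails because the main class cannot be found (this configuration is copied from the official code):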

    <build>
        <plugins>
            <!-- self-contained jars for each example -->
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-jar-plugin</artifactId>
                <version>2.4</version><!--$NO-MVN-MAN-VER$-->
                <executions>
                    <!-- Default Execution -->
                    <execution>
                        <id>default</id>
                        <phase>package</phase>
                        <goals>
                            <goal>test-jar</goal>
                        </goals>
                    </execution>

                    <execution>
                        <id>SocketWindowWordCount</id>
                        <phase>package</phase>
                        <goals>
                            <goal>jar</goal>
                        </goals>
                        <configuration>
                            <classifier>SocketWindowWordCount</classifier>

                            <archive>
                                <manifestEntries>
                                    <program-class>your.package.SocketWindowWordCount</program-class>
                                </manifestEntries>
                            </archive>

                            <includes>
                                <include>your/package/SocketWindowWordCount.class</include>
                                <include>your/package/SocketWindowWordCount$*.class</include>
                            </includes>
                            </includes>
                        </configuration>
                    </execution>
                </executions>
            </plugin>
        </plugins>
    </build>
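
The second include pattern in the configuration above, the one ending in $*.class, is there because the compiler writes nested and anonymous classes to separate class files whose names contain a $. The following plain-Java sketch prints such names; JarIncludeSketch and its members are hypothetical names used only for this illustration.

```java
// Hypothetical class, not part of the example project: shows the binary names
// the compiler gives to nested and anonymous classes.
final class JarIncludeSketch {

    // Nested static class, comparable to WordWithCount in the example.
    static class Nested {}

    // Anonymous inner class, comparable to the FlatMapFunction and ReduceFunction
    // instances created inline in the example.
    static final Runnable ANONYMOUS = new Runnable() {
        @Override
        public void run() {}
    };

    public static void main(String[] args) {
        // Each of these names corresponds to one <name>.class file on disk.
        System.out.println(JarIncludeSketch.class.getName());
        System.out.println(Nested.class.getName());
        System.out.println(ANONYMOUS.getClass().getName());
    }
}
```

If only the outer class file were included in the jar, the nested and anonymous classes would be missing at run time.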

Streaming code must be placed under the streaming directory, and batch code under the batch directory.

Then run the packaged flinkdemo-1.0-RELEASE-SocketWindowWordCount.jar. Here I copied it into the Flink root directory:

bin/flink run flinkdemo-1.0-RELEASE-SocketWindowWordCount.jar

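In another terminal, run nc -l 9000. It should already be listening when the job is submitted, because the job connects to port 9000 as soon as it starts. Then type a few words, for example: hello world flink hello flink. The Dashboard now shows the new job running.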

In a new terminal window, look at the file log/flink-<your username>-taskexecutor-0-<your hostname>.out under the Flink directory. It contains output like this:

root@Yhc-Surface:~/flink-1.9.1/log# tail -f flink-root-taskexecutor-0-Yhc-Surface.out
flink : 2
flink : 2
world : 1
hello : 2
world : 1
flink : 2
hello : 2
hello : 2
world : 1
flink : 2

The output shows that hello and flink each appeared twice and world once, which matches the input.
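
Each word is printed several times because timeWindow(Time.seconds(5), Time.seconds(1)) creates 5-second windows that start every second, so a single input line falls into several overlapping windows and each of them emits its own count. The sketch below computes which windows contain a given timestamp. It is a simplified illustration that assumes non-negative timestamps and no window offset, not Flink's actual implementation, and SlidingWindowSketch and windowStarts are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not Flink code: lists the sliding windows an element belongs to.
final class SlidingWindowSketch {

    // Returns the start times (in ms) of every window of length sizeMs, started
    // every slideMs, that contains timestampMs. Assumes timestampMs >= 0 and no offset.
    static List<Long> windowStarts(long timestampMs, long sizeMs, long slideMs) {
        List<Long> starts = new ArrayList<>();
        // Latest window start that is not after the timestamp.
        long lastStart = timestampMs - (timestampMs % slideMs);
        // Walk backwards while the window [start, start + sizeMs) still covers the timestamp.
        for (long start = lastStart; start > timestampMs - sizeMs; start -= slideMs) {
            starts.add(start);
        }
        return starts;
    }

    public static void main(String[] args) {
        // An element arriving at 7.3 s with a 5 s window sliding every 1 s.
        System.out.println(windowStarts(7300L, 5000L, 1000L));
    }
}
```

With these settings every element belongs to five windows, so a word can show up in up to five consecutive results.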

Pressing Ctrl+C in the nc -l 9000 window also makes the example program exit, and its status in the Dashboard changes to FINISHED.

The jar can also be uploaded and run from "Submit New Job" at the bottom of the Dashboard's left sidebar.
