Setting Up a Flink Development Environment in IDEA and Running WordCount

 

1. Create a Maven project and configure the Java SDK and Scala SDK, as shown in the figure:

JDK 1.8 and Scala 2.12 are used here.

 

2. Add the pom dependencies

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>

    <groupId>com.leboop</groupId>
    <artifactId>com.leboop.www</artifactId>
    <version>1.0-SNAPSHOT</version>
    <properties>
        <scala.version>2.12</scala.version>
        <flink.version>1.9.3</flink.version>
    </properties>

    <dependencies>
        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-scala_${scala.version}</artifactId>
            <version>${flink.version}</version>
            <scope>compile</scope>
        </dependency>
        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-streaming-scala_${scala.version}</artifactId>
            <version>${flink.version}</version>
            <scope>compile</scope>
        </dependency>
    </dependencies>
</project>

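The scala.version property holds the Scala binary version (2.12), which is used as the suffix of the Flink artifact IDs; the Flink version is 1.9.3.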

 

3. BatchWordCount

The batch WordCount program is as follows:

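package wordcount

// Batch (DataSet) API: brings in ExecutionEnvironment, DataSet and the
// implicit TypeInformation required by flatMap/map.
import org.apache.flink.api.scala._

object BatchWordCount {

  def main(args: Array[String]): Unit = {
    val env = ExecutionEnvironment.getExecutionEnvironment

    val filePath = "G:\\idea_workspace\\myflink\\src\\main\\resources\\word.txt"
    // Read the input file line by line
    val text: DataSet[String] = env.readTextFile(filePath)
    // Split each line into lowercase words, drop empty tokens,
    // then group by the word (field 0) and sum the counts (field 1)
    val counts = text.flatMap(_.toLowerCase().split("\\W+").filter(_.nonEmpty))
      .map((_, 1)).groupBy(0).sum(1)

    // In the DataSet API, print() triggers execution itself
    counts.print()
  }
}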

The program reads the file word.txt and counts how often each word occurs. The contents of word.txt are:

hello world
hello java
hello scala

The output is as follows (the order of the lines may vary between runs):

(scala,1)
(world,1)
(hello,3)
(java,1)
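
To make the pipeline easier to follow, here is a minimal sketch of the same logic using only plain Scala collections, with no Flink involved. The object name LocalWordCount and its helper methods are hypothetical names introduced for illustration; they are not part of the original project or of the Flink API.

```scala
object LocalWordCount {
  // Same tokenization as the Flink job: lowercase, split on non-word
  // characters, and drop empty tokens.
  def tokenize(line: String): Seq[String] =
    line.toLowerCase.split("\\W+").filter(_.nonEmpty).toSeq

  // Group identical words and sum a count of 1 per occurrence,
  // mirroring map((_, 1)).groupBy(0).sum(1) in the batch job.
  def countWords(lines: Seq[String]): Map[String, Int] =
    lines.flatMap(tokenize)
      .map((_, 1))
      .groupBy(_._1)
      .map { case (word, pairs) => (word, pairs.map(_._2).sum) }

  def main(args: Array[String]): Unit = {
    val lines = Seq("hello world", "hello java", "hello scala")
    // Sorted only to make the printed order deterministic
    countWords(lines).toSeq.sortBy(_._1).foreach(println)
  }
}
```

Running main on the three lines of word.txt yields the same four (word, count) pairs as the Flink job, here sorted alphabetically by word.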

 

4. StreamingWordCount

The streaming version of the word count is as follows:

package wordcount

import org.apache.flink.streaming.api.scala._

/**
  * Created by leboop on 2020/5/19.
  */
object StreamingWordCount {
  def main(args: Array[String]): Unit = {
    val env = StreamExecutionEnvironment.getExecutionEnvironment

    val filePath = "G:\\idea_workspace\\myflink\\src\\main\\resources\\word.txt"
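    // Read the input file line by line as a stream
    val text: DataStream[String] = env.readTextFile(filePath)
    // Split into lowercase words, drop empty tokens, key by the word (field 0)
    // and keep a running sum of the counts (field 1)
    val counts = text.flatMap(_.toLowerCase().split("\\W+").filter(_.nonEmpty))
      .map((_, 1)).keyBy(0).sum(1)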

    counts.print()
    env.execute("Streaming Count")
  }
}

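The output is as follows (the N> prefix is the index of the parallel subtask that printed the record, and every incoming word emits an updated running count):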

3> (hello,1)
1> (scala,1)
5> (world,1)
3> (hello,2)
3> (hello,3)
2> (java,1)
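
The reason hello appears three times, with counts 1, 2 and 3, is that the keyed sum emits an updated total for every record rather than a single final result. The following is a minimal sketch of that behaviour in plain Scala, without Flink; RunningWordCount and runningCounts are hypothetical names used only for this illustration, and the sketch ignores parallelism, so it shows the per-word sequence of counts but not the interleaving across subtasks.

```scala
object RunningWordCount {
  // For each incoming word, emit (word, countSoFar), the way a keyed
  // running sum emits an updated total for every record it receives.
  def runningCounts(words: Seq[String]): Seq[(String, Int)] = {
    val initial = (Map.empty[String, Int], Vector.empty[(String, Int)])
    val (_, emitted) = words.foldLeft(initial) {
      case ((state, out), word) =>
        val updated = state.getOrElse(word, 0) + 1
        // Store the new total for this word and emit it downstream
        (state.updated(word, updated), out :+ (word -> updated))
    }
    emitted
  }

  def main(args: Array[String]): Unit = {
    val words = Seq("hello", "world", "hello", "java", "hello", "scala")
    runningCounts(words).foreach(println)
  }
}
```

Each word's counts come out in increasing order, which matches the (hello,1), (hello,2), (hello,3) sequence printed by subtask 3 above.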

 

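To count words arriving on a socket instead of reading them from a file, use the following code (it reuses the object name StreamingWordCount, so it replaces the file-based version rather than coexisting with it in the same package):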

package wordcount

import org.apache.flink.streaming.api.scala._

/**
  * Created by leboop on 2020/5/19.
  */
object StreamingWordCount {
  def main(args: Array[String]): Unit = {
    val env = StreamExecutionEnvironment.getExecutionEnvironment

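    // Connect to the socket and read newline-delimited text from it
    val text: DataStream[String] = env.socketTextStream("192.168.128.111", 6666)
    // Split into lowercase words, drop empty tokens, key by the word (field 0)
    // and keep a running sum of the counts (field 1)
    val counts = text.flatMap(_.toLowerCase().split("\\W+").filter(_.nonEmpty))
      .map((_, 1)).keyBy(0).sum(1)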

    counts.print()
    env.execute("Streaming Count")
  }
}

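On host 192.168.128.111, run the following command to listen on port 6666:

nc -lk 6666

Then start the program, which connects to that port and counts the words typed into nc, as shown in the figure: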

 

5. Differences between batch and stream

(1) Different execution environments

batch:

val env = ExecutionEnvironment.getExecutionEnvironment

stream:

val env = StreamExecutionEnvironment.getExecutionEnvironment

(2) Different ways of starting the job

streaming:

env.execute("Streaming Count")

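batch:

Simply run the program. In the DataSet API, print() triggers execution itself, so no env.execute() call is needed.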
