1. Create a Maven project and configure the Java SDK and Scala SDK (see figure).
JDK 1.8 and Scala 2.12 are used here.
2. Add the pom dependencies
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <groupId>com.leboop</groupId>
    <artifactId>com.leboop.www</artifactId>
    <version>1.0-SNAPSHOT</version>
    <properties>
        <scala.version>2.12</scala.version>
        <flink.version>1.9.3</flink.version>
    </properties>
    <dependencies>
        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-scala_${scala.version}</artifactId>
            <version>${flink.version}</version>
            <scope>compile</scope>
        </dependency>
        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-streaming-scala_${scala.version}</artifactId>
            <version>${flink.version}</version>
            <scope>compile</scope>
        </dependency>
    </dependencies>
</project>
The Scala and Flink versions are 2.12 and 1.9.3, respectively.
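The two properties are substituted into the artifact IDs and versions when Maven reads the pom. As a rough illustration of that substitution only (this is not how Maven is implemented; PomPropertySketch and resolve are names made up for this sketch), the resolved coordinates can be computed like this:

```scala
object PomPropertySketch {
  // Property values copied from the <properties> block of the pom above.
  val props: Map[String, String] = Map(
    "scala.version" -> "2.12",
    "flink.version" -> "1.9.3"
  )

  // Matches placeholders of the form ${name}.
  private val placeholder = """\$\{([^}]+)\}""".r

  // Replace each ${name} with its value; unknown names are left untouched.
  def resolve(template: String): String =
    placeholder.replaceAllIn(template, m =>
      java.util.regex.Matcher.quoteReplacement(props.getOrElse(m.group(1), m.matched)))

  def main(args: Array[String]): Unit = {
    println(resolve("flink-scala_${scala.version}:${flink.version}"))
    println(resolve("flink-streaming-scala_${scala.version}:${flink.version}"))
  }
}
```

With the values above, the two dependencies resolve to flink-scala_2.12 and flink-streaming-scala_2.12, both at version 1.9.3.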
3. BatchWordCount
The batch WordCount program is as follows:
package wordcount

// The batch API and its implicit TypeInformation live in org.apache.flink.api.scala
import org.apache.flink.api.scala._

object BatchWordCount {
  def main(args: Array[String]): Unit = {
    val env = ExecutionEnvironment.getExecutionEnvironment
    val filePath = "G:\\idea_workspace\\myflink\\src\\main\\resources\\word.txt"
    // get input data
    val text: DataSet[String] = env.readTextFile(filePath)
    // split lines into lowercase words, drop empty tokens, then count per word
    val counts = text.flatMap(_.toLowerCase().split("\\W+"))
      .filter(_.nonEmpty)
      .map((_, 1)).groupBy(0).sum(1)
    // print() triggers execution of the batch job
    counts.print()
  }
}
The program reads word.txt and counts how often each word occurs. The contents of word.txt are:
hello world
hello java
hello scala
The output is as follows:
(scala,1)
(world,1)
(hello,3)
(java,1)
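For readers new to the operator chain, the same split, pair, group and sum steps can be sketched on an ordinary Scala collection, with no Flink involved. This is only an in-memory analogy of the job above; LocalWordCountSketch and count are names made up for this sketch.

```scala
object LocalWordCountSketch {
  // Same steps as the Flink job, applied to an in-memory Seq instead of a DataSet.
  def count(lines: Seq[String]): Map[String, Int] =
    lines
      .flatMap(_.toLowerCase.split("\\W+")) // split each line into lowercase words
      .filter(_.nonEmpty)                   // drop empty tokens
      .map((_, 1))                          // pair every word with a count of 1
      .groupBy(_._1)                        // group the pairs by word
      .map { case (word, pairs) => (word, pairs.map(_._2).sum) } // sum the counts

  def main(args: Array[String]): Unit = {
    // The three lines of word.txt shown above.
    val lines = Seq("hello world", "hello java", "hello scala")
    count(lines).foreach(println)
  }
}
```

Run on the three lines of word.txt, it yields the same four pairs as the Flink output, in no particular order.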
4. StreamingWordCount
The same word count using stream processing:
package wordcount

import org.apache.flink.streaming.api.scala._

/**
  * Created by leboop on 2020/5/19.
  */
object StreamingWordCount {
  def main(args: Array[String]): Unit = {
    val env = StreamExecutionEnvironment.getExecutionEnvironment
    val filePath = "G:\\idea_workspace\\myflink\\src\\main\\resources\\word.txt"
    // get input data
    val text: DataStream[String] = env.readTextFile(filePath)
    // split lines into lowercase words, drop empty tokens, keep a running count per word
    val counts = text.flatMap(_.toLowerCase().split("\\W+"))
      .filter(_.nonEmpty)
      .map((_, 1)).keyBy(0).sum(1)
    counts.print()
    // a streaming job only starts when execute() is called
    env.execute("Streaming Count")
  }
}
The output is as follows:
3> (hello,1)
1> (scala,1)
5> (world,1)
3> (hello,2)
3> (hello,3)
2> (java,1)
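Unlike the batch job, the streaming job emits an updated count for every incoming record, which is why hello appears three times above. The number before > is the index of the parallel subtask that printed the record, so the interleaving can differ between runs. A minimal plain-Scala sketch of such a running count, single-threaded and without Flink (RunningCountSketch and runningCounts are names made up for this sketch):

```scala
object RunningCountSketch {
  // Emit one (word, countSoFar) pair per input word,
  // similar to what keyBy(0).sum(1) emits per record.
  def runningCounts(words: Seq[String]): Seq[(String, Int)] = {
    val start = (Map.empty[String, Int], Vector.empty[(String, Int)])
    val (_, emitted) = words.foldLeft(start) {
      case ((state, out), word) =>
        val updated = state.getOrElse(word, 0) + 1 // per-word state, like keyed state
        (state.updated(word, updated), out :+ (word -> updated))
    }
    emitted
  }

  def main(args: Array[String]): Unit = {
    val words = Seq("hello", "world", "hello", "java", "hello", "scala")
    runningCounts(words).foreach(println)
  }
}
```

Each input word produces exactly one output pair, so six words give six lines, with the count for hello growing from 1 to 3.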
To count words arriving on a socket instead, use the following code:
package wordcount

import org.apache.flink.streaming.api.scala._

/**
  * Created by leboop on 2020/5/19.
  */
object StreamingWordCount {
  def main(args: Array[String]): Unit = {
    val env = StreamExecutionEnvironment.getExecutionEnvironment
    // get input data: connect to the socket opened on the remote host
    val text: DataStream[String] = env.socketTextStream("192.168.128.111", 6666)
    // split lines into lowercase words, drop empty tokens, keep a running count per word
    val counts = text.flatMap(_.toLowerCase().split("\\W+"))
      .filter(_.nonEmpty)
      .map((_, 1)).keyBy(0).sum(1)
    counts.print()
    env.execute("Streaming Count")
  }
}
Run the following command on host 192.168.128.111 to open the port:
nc -lk 6666
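Start nc first, then start the program, which connects to that port. Lines typed into the nc session are counted as they arrive (see figure).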
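If nc is not available on the host (on Windows, for example), a small stand-in server can be sketched in Scala. This is an assumption-laden sketch, not part of the original setup: LineServerSketch and serveOnce are made-up names, it serves a single client, and the host passed to socketTextStream would have to be the machine this runs on. Once the lines are sent the connection is closed, so the streaming job is expected to finish rather than keep waiting.

```scala
import java.net.ServerSocket

object LineServerSketch {
  // Accept a single client, send it each line terminated by '\n', then close the connection.
  def serveOnce(server: ServerSocket, lines: Seq[String]): Unit = {
    val client = server.accept() // blocks until a client (e.g. the Flink job) connects
    try {
      val out = client.getOutputStream
      lines.foreach(line => out.write((line + "\n").getBytes("UTF-8")))
      out.flush()
    } finally client.close()
  }

  def main(args: Array[String]): Unit = {
    val server = new ServerSocket(6666) // same port as in the nc command
    try serveOnce(server, Seq("hello world", "hello java", "hello scala"))
    finally server.close()
  }
}
```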
5. Differences between batch and stream
(1) Different environments
batch:
val env = ExecutionEnvironment.getExecutionEnvironment
stream:
val env = StreamExecutionEnvironment.getExecutionEnvironment
(2) Different ways of triggering execution
streaming:
env.execute("Streaming Count")
batch:
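Just run the program. counts.print() triggers execution itself, so no explicit env.execute() call is needed.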