Chapter 1: Introduction to Apache Flink - Quick start setup

Quick start setup

Now that we understand the details of Flink's architecture and its process model, it's time to get started with a quick setup and try things out on our own. Flink works on both Windows and Linux machines.
The very first thing we need to do is to download Flink's binaries. Flink can be downloaded from the Flink download page at http://flink.apache.org/downloads.html.
On the download page, you will see multiple options, as shown in the following screenshot:

In order to install Flink, you don't need to have Hadoop installed. But in case you need to connect to Hadoop using Flink, then you need to download the exact binary that is compatible with the Hadoop version you have.

As I have the latest version of Hadoop, 2.7.0, installed, I am going to download the Flink binary compatible with Hadoop 2.7.0 and built on Scala 2.11. Here is the direct download link: http://www-us.apache.org/dist/flink/flink-1.1.4/flink-1.1.4-bin-hadoop27-scala_2.11.tgz

Pre-requisite

Flink needs Java to be installed first. So before you start, please make sure Java is installed. I have JDK 1.8 installed on my machine:

D:\>java -version
java version "1.8.0_92"
Java(TM) SE Runtime Environment (build 1.8.0_92-b14)
Java HotSpot(TM) 64-Bit Server VM (build 25.92-b14, mixed mode)

Installing on Windows

Flink is very easy to install: just extract the compressed file and store it in the desired location.
Once extracted, go to the folder and execute start-local.bat:

>cd flink-1.1.4
>bin\start-local.bat

And you will see that the local instance of Flink has started. You can also check the web UI at http://localhost:8081/:
You can stop the Flink process by pressing Ctrl + C.

Installing on Linux

Similar to Windows, installing Flink on Linux machines is very easy. We need to download the binary, place it in a specific folder, extract it, and finish:

$ sudo tar -xzf flink-1.1.4-bin-hadoop27-scala_2.11.tgz
$ cd flink-1.1.4
$ bin/start-local.sh

As in Windows, please make sure Java is installed on the machine.
Now we are all set to submit a Flink job. To stop the local Flink instance on Linux, execute the following command:

$ bin/stop-local.sh

Cluster setup

Setting up a Flink cluster is very simple as well. Those who have a background in installing a Hadoop cluster will be able to relate to these steps very easily. In order to set up the cluster, let's assume we have four Linux machines with us, each having a moderate configuration; at least two cores and 4 GB RAM per machine would be a good option to get started. The very first thing we need to do is to choose the cluster design. As we have four machines, we will use one machine as the Job Manager and the other three machines as the Task Managers:

SSH configurations

In order to set up the cluster, we first need to set up passwordless SSH connections from the Job Manager machine to the Task Managers. The following steps need to be performed on the Job Manager machine to create an SSH key and copy it to authorized_keys:

ssh-keygen

This will generate the public and private keys in the /home/flinkuser/.ssh folder. Now copy the public key to the Task Manager machine and perform the following steps on the Task Manager to allow a passwordless connection from the Job Manager:

sudo mkdir -p /home/flinkuser/.ssh
sudo touch /home/flinkuser/.ssh/authorized_keys
sudo sh -c "cat id_rsa.pub >> /home/flinkuser/.ssh/authorized_keys"

Make sure the keys have restricted access by executing the following commands:

sudo chmod 700 /home/flinkuser/.ssh
sudo chmod 600 /home/flinkuser/.ssh/authorized_keys

Now you can test the passwordless SSH connection from the Job Manager machine:

sudo ssh <task-manager-1> 
sudo ssh <task-manager-2> 
sudo ssh <task-manager-3>

If you are using any cloud service instances for the installation, please make sure that root login is enabled over SSH. In order to do this, you need to log in to each machine and open the file /etc/ssh/sshd_config. Then change the PermitRootLogin value to yes. Once you save the file, restart the SSH service by executing the command:

sudo service sshd restart
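One way to script that sshd_config edit is with sed. The following is a sketch only: it operates on a scratch copy of the file so it is safe to run anywhere; on a real machine you would point the same sed command at /etc/ssh/sshd_config and then restart sshd as shown above.

```shell
# Work on a scratch copy so this sketch never touches the real sshd_config.
tmpconf=$(mktemp)
printf '#PermitRootLogin prohibit-password\nPort 22\n' > "$tmpconf"

# Uncomment the PermitRootLogin directive (if commented) and force it to "yes".
sed -i 's/^#\?PermitRootLogin.*/PermitRootLogin yes/' "$tmpconf"

grep PermitRootLogin "$tmpconf"   # now shows: PermitRootLogin yes
```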

Java installation

Next, we need to install Java on each machine. The following commands will help you install Java on RedHat/CentOS-based UNIX machines:

wget --no-check-certificate --no-cookies --header "Cookie: oraclelicense=accept-securebackup-cookie" http://download.oracle.com/otn-pub/java/jdk/8u92-b14/jdk-8u92-linux-x64.rpm
sudo rpm -ivh jdk-8u92-linux-x64.rpm

Next, we need to set up the JAVA_HOME environment variable so that Java is available to access from everywhere. Create a java.sh file:

sudo vi /etc/profile.d/java.sh

Add the following content to it and save it:

#!/bin/bash
JAVA_HOME=/usr/java/jdk1.8.0_92
PATH=$JAVA_HOME/bin:$PATH
export PATH JAVA_HOME 
export CLASSPATH=.

Make the file executable and source it:

sudo chmod +x /etc/profile.d/java.sh 
source /etc/profile.d/java.sh

You can now check whether Java is installed properly:

$ java -version
java version "1.8.0_92"
Java(TM) SE Runtime Environment (build 1.8.0_92-b14)
Java HotSpot(TM) 64-Bit Server VM (build 25.92-b14, mixed mode)

Repeat these installation steps on all the Job Manager and Task Manager machines.

Flink installation

Once the SSH and Java installation is done, we need to download the Flink binaries and extract them into a specific folder. Please make a note that the installation directory should be the same on all nodes.
So let's get started:

cd /usr/local
sudo wget http://www-eu.apache.org/dist/flink/flink-1.1.4/flink-1.1.4-bin-hadoop27-scala_2.11.tgz
sudo tar -xzf flink-1.1.4-bin-hadoop27-scala_2.11.tgz
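Since the installation directory must be identical everywhere, the download-and-extract step has to be repeated on every node. A loop like the following can drive that. This is a sketch only: the hostnames are placeholders, and the scp/ssh commands are echoed rather than executed so it runs without a live cluster.

```shell
# Placeholder Task Manager hostnames; substitute your real machines.
NODES="task-manager-1 task-manager-2 task-manager-3"
ARCHIVE=flink-1.1.4-bin-hadoop27-scala_2.11.tgz

for node in $NODES; do
  # In a real run, drop the echo to actually copy and extract on each node.
  echo "scp $ARCHIVE $node:/usr/local/"
  echo "ssh $node 'cd /usr/local && sudo tar -xzf $ARCHIVE'"
done
```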

Now that the Flink binaries are in place, we need to do some related configuration.

Configurations

Flink's configuration is simple. We need to tune a few parameters and we are all set. Most of the configurations are the same for the Job Manager node and the Task Manager nodes. All configuration is done in the conf/flink-conf.yaml file.
The following is a configuration file for a Job Manager node:

jobmanager.rpc.address: localhost
jobmanager.rpc.port: 6123
jobmanager.heap.mb: 256
taskmanager.heap.mb: 512
taskmanager.numberOfTaskSlots: 1

You may want to change the memory configurations for the Job Manager and Task Managers based on your node configurations. For the Task Managers, jobmanager.rpc.address should be populated with the correct Job Manager hostname or IP address.

So for all Task Managers, the configuration file should look like the following:

jobmanager.rpc.address: <jobmanager-ip-or-host>
jobmanager.rpc.port: 6123
jobmanager.heap.mb: 256
taskmanager.heap.mb: 512
taskmanager.numberOfTaskSlots: 1

We need to add the JAVA_HOME details in this file so that Flink knows exactly where to look for the Java binaries:

export JAVA_HOME=/usr/java/jdk1.8.0_92

We also need to add the slave node details in the conf/slaves file, with each node on a separate line. Here is how a sample conf/slaves file should look:

<task-manager-1>
<task-manager-2>
<task-manager-3>
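The two configuration files above can also be generated from a small script. This sketch writes a sample flink-conf.yaml and slaves file into a scratch directory; the hostnames are placeholders, and in a real setup the target would be the conf/ folder of the Flink installation on each node.

```shell
# Scratch directory standing in for Flink's conf/ folder.
confdir=$(mktemp -d)

# Task Manager configuration: point rpc.address at the Job Manager host.
cat > "$confdir/flink-conf.yaml" <<'EOF'
jobmanager.rpc.address: jobmanager-host
jobmanager.rpc.port: 6123
jobmanager.heap.mb: 256
taskmanager.heap.mb: 512
taskmanager.numberOfTaskSlots: 1
EOF

# One Task Manager hostname per line, as conf/slaves expects.
cat > "$confdir/slaves" <<'EOF'
task-manager-1
task-manager-2
task-manager-3
EOF

grep 'jobmanager.rpc.address' "$confdir/flink-conf.yaml"
wc -l < "$confdir/slaves"   # three slave entries
```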

Starting daemons

Now the only thing left is starting the Flink processes. We can start each process separately on the individual nodes, or we can execute the start-cluster.sh command to start the required processes on each node:

bin/start-cluster.sh

If all the configurations are good, then you will see that the cluster is up and running. You can check the web UI at http://<job-manager-ip>:8081/. The following are some snapshots of the Flink web UI.

You can click on the Job Manager link to get the following view:

Similarly, you can check out the Task Managers view as follows:

Adding additional Job/Task Managers

Flink provides you with the facility to add additional instances of the Job Manager and Task Managers to the running cluster. Before you start the daemon, please make sure that you have followed the steps given previously.
To add an additional Job Manager to the existing cluster, execute the following command:

sudo bin/jobmanager.sh start cluster

Similarly, we need to execute the following command to add an additional Task Manager:

sudo bin/taskmanager.sh start cluster

Stopping daemons and cluster

Once the job execution is completed, you will want to shut down the cluster. The following commands are used for that.
To stop the complete cluster in one go:

sudo bin/stop-cluster.sh

To stop the individual Job Manager:

sudo bin/jobmanager.sh stop cluster

To stop the individual Task Manager:

sudo bin/taskmanager.sh stop cluster

Running sample application

Flink binaries come with a sample application which can be used as it is. Let's start with a very simple application, word count. Here we are going to try a streaming application which reads data from the netcat server on a specific port.
So let's get started. First, start the netcat server on port 9000 by executing the following command:

nc -l 9000

Now the netcat server will start listening on port 9000, so whatever you type on the command prompt will be sent to Flink for processing.
Next, we need to start the Flink sample program to listen to the netcat server. The following is the command:

bin/flink run examples/streaming/SocketTextStreamWordCount.jar --hostname localhost --port 9000
08/06/2016 10:32:40 Job execution switched to status RUNNING
08/06/2016 10:32:40 Source: Socket Stream -> Flat Map(1/1) switched to SCHEDULED
08/06/2016 10:32:40 Source: Socket Stream -> Flat Map(1/1) switched to DEPLOYING
08/06/2016 10:32:40 Keyed Aggregation -> Sink: Unnamed(1/1) switched to SCHEDULED
08/06/2016 10:32:40 Keyed Aggregation -> Sink: Unnamed(1/1) switched to DEPLOYING
08/06/2016 10:32:40 Source: Socket Stream -> Flat Map(1/1) switched to RUNNING
08/06/2016 10:32:40 Keyed Aggregation -> Sink: Unnamed(1/1) switched to RUNNING
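To get a feel for what this job computes, the same per-word counts can be reproduced for a fixed piece of input with standard Unix tools. This only mimics the aggregation step, not the streaming behaviour:

```shell
# Lower-case the input, split it into one word per line, then count duplicates.
printf 'hi Hello\nHello World\n' \
  | tr '[:upper:]' '[:lower:]' \
  | tr -s ' ' '\n' \
  | sort | uniq -c | sort -rn
# The top line of the output is the most frequent word: hello, seen twice.
```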

This will start the Flink job execution. Now you can type something on the netcat console and Flink will process it. For example, type the following on the netcat server:

$ nc -l 9000
hi Hello
Hello World
This distribution includes cryptographic software.
The country in which you currently reside may have restrictions on the
import, possession, use, and/or re-export to another country, of
encryption software. BEFORE using any encryption software, please check
your country's laws, regulations and policies concerning the import,
possession, or use, and re-export of encryption software, to see if this
is permitted. See <http://www.wassenaar.org/> for more information.

You can verify the output in the logs:

$ tail -f flink-*-taskmanager-*-flink-instance-*.out
==> flink-root-taskmanager-0-flink-instance-1.out <==
(see, 2)
(http, 1)
(www, 1)
(wassenaar, 1)
(org, 1)
(for, 1)
(more, 1)
(information, 1)
(hello, 1)
(world, 1)

==> flink-root-taskmanager-1-flink-instance-1.out <==
(is, 1)
(permitted, 1)
(see, 2)
(http, 1)
(www, 1)
(wassenaar, 1)
(org, 1)
(for, 1)
(more, 1)
(information, 1)

==> flink-root-taskmanager-2-flink-instance-1.out <==
(hello, 1)
(worlds, 1)
(hi, 1)
(how, 1)
(are, 1)
(you, 1)
(how, 2)
(is, 1)
(it, 1)
(going, 1)

You can also check out the Flink web UI to see how your job is performing. The following screenshot shows the data flow plan for the execution:

Here, for the job execution, Flink has two operators. The first is the source operator, which reads data from the socket stream. The second is the transformation operator, which aggregates the counts of words. We can also look at the timeline of the job execution:

Summary

In this chapter, we talked about how Flink started as a university project and then became a full-fledged enterprise-ready data processing platform. We looked at the details of Flink's architecture and how its process model works. We also learnt how to run Flink in local and cluster modes.
In the next chapter, we are going to learn about Flink's Streaming API, look at its details, and see how we can use that API to solve our data stream processing problems.
