Setting up a big data processing cluster (Hadoop + Spark + HBase + Hive + ZooKeeper + Kafka + Flume) on CentOS 7, step by step


2. Cluster planning

Prepare three virtual machines and set their IP addresses and hostnames. In a typical three-node cluster the hosts are named master, slave1 and slave2 to make the master/worker roles explicit; here I simply name them hp1, hp2 and hp3, with hp1 acting as the master node. When building the cluster, keep the component versions compatible with one another; the versions used below (JDK 8u221, Hadoop 2.7.7, Scala 2.12.2, Spark 2.4.4, ZooKeeper 3.4.14, HBase 2.1.7, MySQL 5.7.27, Hive 2.3.6, Flume 1.8.0, Kafka 2.1.1) were chosen with reference to other people's write-ups.
To keep the installation quick and simple, I downloaded the installation tarballs, then uploaded and extracted them on the machines. Download address: https://mirrors.cnnic.cn/apache/
This mirror carries installation packages for essentially every version of every Apache open-source project, so download whatever you need.

3. Virtual machine preparation and basic configuration

3.1 Change the hostname and map IP addresses to hostnames

3.1.1 Change the hostname

Edit the /etc/hostname file

vi /etc/hostname

Replace localhost with the hostname you want to use

3.1.2 Map IP addresses to hostnames

Edit the /etc/hosts file

vi /etc/hosts

Add the IP-address-to-hostname mappings

192.168.5.142 hp1
192.168.5.143 hp2
192.168.5.144 hp3
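
The hostname change and the hosts mapping have to be applied on all three machines. A minimal sketch, assuming the hostnames and IPs above (hostnamectl is the CentOS 7 equivalent of editing /etc/hostname; scp will prompt for the root password until passwordless SSH is set up in 3.4):

hostnamectl set-hostname hp1          # run the matching command on each node
scp /etc/hosts root@hp2:/etc/hosts    # push the same hosts file to the other nodes
scp /etc/hosts root@hp3:/etc/hosts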

3.2 Disable the firewall

Stop the firewall and prevent it from starting at boot. CentOS 7 uses firewalld by default:
Stop the firewall:
systemctl stop firewalld
Disable it at boot:
systemctl disable firewalld

If the older iptables service is in use instead (the iptables-services package), the equivalent commands are:
Stop the firewall:
service iptables stop
Start the firewall:
service iptables start
Restart the firewall:
service iptables restart
Permanently disable the firewall:
chkconfig iptables off

3.3 Set up time synchronization

3.3.1 Configure NTP on hp1

1. Install the ntp service

yum -y install ntp    # run on the hp1 node

2. Configure the NTP server

vi /etc/ntp.conf      # comment out the server X.centos.pool.ntp.org iburst lines; on hp1 add your own upstream NTP server, and on the other two nodes point the NTP server at the master (hp1)

# For more information about this file, see the man pages
# ntp.conf(5), ntp_acc(5), ntp_auth(5), ntp_clock(5), ntp_misc(5), ntp_mon(5).

driftfile /var/lib/ntp/drift

# Permit time synchronization with our time source, but do not
# permit the source to query or modify the service on this system.
restrict default nomodify notrap nopeer noquery

# Permit all access over the loopback interface.  This could
# be tightened as well, but to do so would effect some of
# the administrative functions.
restrict 127.0.0.1
restrict ::1

# Hosts on local network are less restricted.
#restrict 192.168.1.0 mask 255.255.255.0 nomodify notrap

# Use public servers from the pool.ntp.org project.
# Please consider joining the pool (http://www.pool.ntp.org/join.html).
#server 0.centos.pool.ntp.org iburst
#server 1.centos.pool.ntp.org iburst
#server 2.centos.pool.ntp.org iburst
#server 3.centos.pool.ntp.org iburst
server ntp4.aliyun.com iburst

3. Point the NTP service on the other two hosts at hp1
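
A minimal sketch of the client-side change on hp2 and hp3 (the exact ntp.conf layout may differ slightly):

vi /etc/ntp.conf
# comment out the server X.centos.pool.ntp.org iburst lines and add:
server hp1 iburst

Then start ntpd on all three nodes, enable it at boot, and verify synchronization:

systemctl start ntpd
systemctl enable ntpd
ntpq -p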

3.4 Set up passwordless SSH between the hosts

Run the following on every node:

1. Generate the key pair: ssh-keygen -t rsa (just press Enter at every prompt)
2. Copy the public key to every node, including the local machine: ssh-copy-id -i <hostname>
3. Test SSH access: ssh root@<hostname>
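
As a minimal sketch, the key generation and distribution can be run on each node like this (you will be prompted for the root password once per host):

ssh-keygen -t rsa                                  # accept the defaults
for host in hp1 hp2 hp3; do
  ssh-copy-id -i ~/.ssh/id_rsa.pub root@$host
done
ssh root@hp2 hostname                              # should log in without a password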

3.5 Install the JDK

3.5.1 Remove the bundled OpenJDK

1. List the installed JDK packages

rpm -qa | grep java
tzdata-java-2018e-3.el7.noarch
java-1.8.0-openjdk-1.8.0.181-7.b13.el7.x86_64
python-javapackages-3.4.1-11.el7.noarch
java-1.8.0-openjdk-headless-1.8.0.181-7.b13.el7.x86_64
javapackages-tools-3.4.1-11.el7.noarch

2. Remove the installed OpenJDK packages

rpm -e --nodeps <package-name>
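
A minimal sketch that removes all of the OpenJDK-related packages found by the query above in one pass (double-check the package list before running it):

rpm -qa | grep -E 'openjdk|tzdata-java' | xargs -r rpm -e --nodeps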

3.5.2 Upload and extract the installation package

1. Create /opt/package as the upload directory for installation packages and /opt/java as the JDK installation directory

[root@hp1 opt]# mkdir package
[root@hp1 opt]# mkdir java

2. Upload the downloaded JDK package to /opt/package, then extract it and move it to /opt/java

[root@hp1 package]# tar -zxvf jdk-8u221-linux-x64.tar.gz
[root@hp1 package]# mv jdk1.8.0_221 /opt/java/jdk1.8

3.5.3 Configure environment variables

1. Edit the environment variables

vi /etc/profile

Add the following

export JAVA_HOME=/opt/java/jdk1.8
export PATH=.:${JAVA_HOME}/bin:$PATH

Reload the profile so the change takes effect

source /etc/profile

3.5.4 Verify the Java installation

[root@hp1 /]# java -version
java version "1.8.0_221"
Java(TM) SE Runtime Environment (build 1.8.0_221-b11)
Java HotSpot(TM) 64-Bit Server VM (build 25.221-b11, mixed mode)

4. Hadoop

4.1 Install Hadoop

4.1.1 Upload and extract the installation package

1. Create the Hadoop installation directory /opt/hadoop

[root@hp1 opt]# mkdir hadoop

2. Upload the downloaded Hadoop package to /opt/package, then extract it and move it to /opt/hadoop

[root@hp1 package]# tar -zxvf hadoop-2.7.7.tar.gz
[root@hp1 package]# mv hadoop-2.7.7 /opt/hadoop/

4.1.2 Configure environment variables

1. Edit the environment variables

vi /etc/profile

Add the following

export HADOOP_HOME=/opt/hadoop/hadoop-2.7.7
export PATH=.:${JAVA_HOME}/bin:${HADOOP_HOME}/bin:$PATH

Note: the export PATH= line only appends Hadoop's ${HADOOP_HOME}/bin to the existing PATH; the other cluster components installed later are handled the same way.

Reload the profile so the change takes effect

source /etc/profile

4.1.3 Edit the configuration files

Go to /opt/hadoop/hadoop-2.7.7/etc/hadoop and modify
core-site.xml, hadoop-env.sh, hdfs-site.xml, mapred-site.xml, yarn-site.xml and slaves

cd /opt/hadoop/hadoop-2.7.7/etc/hadoop

1. Edit core-site.xml

vi core-site.xml

Add the following inside the <configuration> ... </configuration> element:
 <property>
        <name>fs.default.name</name>
        <value>hdfs://hp1:9000</value>
    </property>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/home/hadoop/tmp</value>
    </property>
    <property>
        <name>io.file.buffer.size</name>
        <value>131072</value>
    </property>

  • fs.default.name (the HDFS entry point, i.e. which machine hosts the NameNode) => hostname:port
  • hadoop.tmp.dir (base temporary directory for data/name/secondary; optional if the namenode/datanode directories are configured explicitly) => tmp directory
  • io.file.buffer.size (stream I/O buffer size, in bytes) => 131072 (128 KB)

2. Edit hadoop-env.sh

vi hadoop-env.sh
# change export JAVA_HOME=${JAVA_HOME} to point at the actual JDK directory
export JAVA_HOME=/opt/java/jdk1.8

3. Edit hdfs-site.xml

vi hdfs-site.xml
# add the following inside <configuration> ... </configuration>
        <!-- Where the NameNode keeps its data, i.e. the metadata describing the files stored in HDFS -->
        <property>
                <name>dfs.namenode.name.dir</name>
                <value>/opt/hadoop/hadoop-2.7.7/dfs/name</value>
        </property>
 
        <!-- Where the DataNode keeps its data, i.e. the directory that holds the blocks -->
        <property>
                <name>dfs.datanode.data.dir</name>
                <value>/opt/hadoop/hadoop-2.7.7/dfs/data</value>
        </property>
    
        <!-- HDFS replication factor: the number of redundant copies kept for each block of an uploaded file -->
        <property>
                <name>dfs.replication</name>
                <value>2</value>
        </property>
        
        <!-- HTTP address of the secondary NameNode -->
        <property>
                <name>dfs.secondary.http.address</name>
                <value>hp1:50090</value>
        </property>
	<property>
		<!-- Enable the HDFS web interface. The default port is 50070; usually left at the default -->
		<name>dfs.webhdfs.enabled</name>
		<value>true</value>
	</property>

4. Edit mapred-site.xml
If mapred-site.xml does not exist, copy mapred-site.xml.template and rename it

cp mapred-site.xml.template mapred-site.xml

Edit mapred-site.xml

vi mapred-site.xml
# add the following inside <configuration> ... </configuration>

	<property>
	<!-- Run MapReduce on YARN; in Hadoop 2 MapReduce runs on top of the YARN resource manager -->
		<name>mapreduce.framework.name</name>
		<value>yarn</value>
	</property>
	<!-- JobHistory Server ============================================================== -->
    <!-- MapReduce JobHistory Server address, default port 10020 -->
    <property>
        <name>mapreduce.jobhistory.address</name>
        <value>hp1:10020</value>
    </property>
    <!-- MapReduce JobHistory Server web UI address, default port 19888 -->
    <property>
        <name>mapreduce.jobhistory.webapp.address</name>
        <value>hp1:19888</value>
    </property>

5. Edit yarn-site.xml

vi yarn-site.xml
# add the following inside <configuration> ... </configuration>
<property>
        <name>yarn.resourcemanager.hostname</name>
        <value>hp1</value>
   </property> 
   <property>
        <!-- IPC address of the ResourceManager -->
        <name>yarn.resourcemanager.address</name>
        <value>${yarn.resourcemanager.hostname}:8032</value>
   </property>
   <property>
        <!-- IPC address of the ResourceManager scheduler -->
        <name>yarn.resourcemanager.scheduler.address</name>
        <value>${yarn.resourcemanager.hostname}:8030</value>
   </property>
   <property>
        <!-- HTTP address of the ResourceManager web UI -->
        <name>yarn.resourcemanager.webapp.address</name>
        <value>${yarn.resourcemanager.hostname}:8088</value>
   </property>
   <property>
        <description>The https address of the RM web application.</description>
        <name>yarn.resourcemanager.webapp.https.address</name>
        <value>${yarn.resourcemanager.hostname}:8090</value>
   </property>
   <property>
        <!-- IPC address on which the ResourceManager listens to the NodeManagers (resource tracker) -->
        <name>yarn.resourcemanager.resource-tracker.address</name>
        <value>${yarn.resourcemanager.hostname}:8031</value>
   </property>
   <property>
        <!-- IPC address of the ResourceManager admin interface -->
        <name>yarn.resourcemanager.admin.address</name>
        <value>${yarn.resourcemanager.hostname}:8033</value>
   </property>
   <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
   </property>
   <property>
        <name>yarn.scheduler.maximum-allocation-mb</name>
        <value>2048</value>
        <description>Maximum memory (MB) a single container can request; the default is 8192</description>
   </property>
   <property>
        <!-- Ratio of virtual memory to physical memory allowed for containers; virtual memory use may exceed the allocation by this factor. The default is 2.1 -->
        <name>yarn.nodemanager.vmem-pmem-ratio</name>
        <value>2.1</value>
   </property>
   <property>
        <name>yarn.nodemanager.resource.memory-mb</name>
        <value>2048</value>
</property>
   <property>
        <name>yarn.nodemanager.vmem-check-enabled</name>
        <value>false</value>
</property>

6. Edit slaves

vi slaves

The default content is localhost; for a single-node setup it can stay unchanged.
We have two worker nodes, so replace localhost with their hostnames

hp2
hp3

4.1.4 Sync the master (hp1) configuration to the workers (hp2, hp3)

1. Copy the environment variables

scp /etc/profile  root@hp2:/etc/
scp /etc/profile  root@hp3:/etc/  

2. Copy the Hadoop installation and configuration

scp  -r /opt/hadoop  root@hp2:/opt/
scp  -r /opt/hadoop  root@hp3:/opt/

4.2 Start Hadoop

1. Format HDFS (note: if the Hadoop configuration files are changed later, HDFS must be formatted again)

cd /opt/hadoop/hadoop-2.7.7/bin

In the /opt/hadoop/hadoop-2.7.7/bin directory run

./hdfs  namenode  -format

After formatting completes:
2. Start Hadoop (this only needs to be run on the master hp1; the workers are started automatically)
Switch to the /opt/hadoop/hadoop-2.7.7/sbin directory

cd /opt/hadoop/hadoop-2.7.7/sbin

Run the start command

./start-all.sh
[root@hp1 sbin]# ./start-all.sh
This script is Deprecated. Instead use start-dfs.sh and start-yarn.sh
Starting namenodes on [hp1]
hp1: starting namenode, logging to /opt/hadoop/hadoop-2.7.7/logs/hadoop-root-namenode-hp1.out
hp2: starting datanode, logging to /opt/hadoop/hadoop-2.7.7/logs/hadoop-root-datanode-hp2.out
hp3: starting datanode, logging to /opt/hadoop/hadoop-2.7.7/logs/hadoop-root-datanode-hp3.out
Starting secondary namenodes [hp1]
hp1: starting secondarynamenode, logging to /opt/hadoop/hadoop-2.7.7/logs/hadoop-root-secondarynamenode-hp1.out
starting yarn daemons
starting resourcemanager, logging to /opt/hadoop/hadoop-2.7.7/logs/yarn-root-resourcemanager-hp1.out
hp2: starting nodemanager, logging to /opt/hadoop/hadoop-2.7.7/logs/yarn-root-nodemanager-hp2.out
hp3: starting nodemanager, logging to /opt/hadoop/hadoop-2.7.7/logs/yarn-root-nodemanager-hp3.out

Startup succeeded.
Run jps to see which services are running on each node.
Master node hp1:

[root@hp1 sbin]# jps
25505 SecondaryNameNode
25654 ResourceManager
25304 NameNode
25950 Jps

Worker nodes hp2 and hp3:

[root@hp2 hadoop]# jps
15364 NodeManager
15492 Jps
15259 DataNode
[root@hp3 hadoop]# jps
15222 DataNode
15335 NodeManager
15463 Jps

Check in the web UIs whether Hadoop started successfully:
IP:50070 (HDFS) and IP:8088 (YARN)
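
You can also verify from the command line with the Hadoop tools that are already on the PATH; a minimal sketch:

hdfs dfsadmin -report    # should list two live DataNodes (hp2 and hp3)
yarn node -list          # should list two running NodeManagers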

4.3 Stop Hadoop

Run the stop command

[root@hp1 sbin]# ./stop-all.sh
This script is Deprecated. Instead use stop-dfs.sh and stop-yarn.sh
Stopping namenodes on [hp1]
hp1: stopping namenode
hp3: stopping datanode
hp2: stopping datanode
Stopping secondary namenodes [hp1]
hp1: stopping secondarynamenode
stopping yarn daemons
stopping resourcemanager
hp3: stopping nodemanager
hp2: stopping nodemanager
no proxyserver to stop

Hadoop installed successfully!

5. Spark

5.1 Install Scala

5.1.1 Upload and extract the installation package

Upload the downloaded package to /opt/package, then extract it and rename it into /opt/scala

tar -zxvf scala-2.12.2.tgz
mv scala-2.12.2 /opt/scala/scala-2.12

5.1.2 Configure environment variables

Edit the /etc/profile file

vi /etc/profile

Add

export SCALA_HOME=/opt/scala/scala-2.12

Change

export PATH=.:${JAVA_HOME}/bin:${HADOOP_HOME}/bin:${SCALA_HOME}/bin:$PATH

Reload the environment and check that Scala was installed successfully

[root@hp1 scala-2.12]# source /etc/profile
[root@hp1 scala-2.12]# scala -version
Scala code runner version 2.12.2 -- Copyright 2002-2017, LAMP/EPFL and Lightbend, Inc.

Scala installed successfully.

5.2 Install Spark

5.2.1 Upload and extract the installation package

Upload the downloaded package to /opt/package, then extract it and rename it into /opt/spark

tar -zxvf spark-2.4.4-bin-hadoop2.7.tgz
mv spark-2.4.4-bin-hadoop2.7 /opt/spark/spark-2.4.4

5.2.2 Configure environment variables

Edit the /etc/profile file

vi /etc/profile

Add

export SPARK_HOME=/opt/spark/spark-2.4.4

Change

export PATH=.:${JAVA_HOME}/bin:${HADOOP_HOME}/bin:${SCALA_HOME}/bin:${SPARK_HOME}/bin:$PATH

Reload the environment

source /etc/profile

5.2.3 Edit the configuration files

Switch to the configuration directory

cd /opt/spark/spark-2.4.4/conf

1. Edit spark-env.sh
If spark-env.sh does not exist, copy spark-env.sh.template and rename it

cp spark-env.sh.template spark-env.sh

Add the following to spark-env.sh

export SCALA_HOME=/opt/scala/scala-2.12
export JAVA_HOME=/opt/java/jdk1.8
export HADOOP_HOME=/opt/hadoop/hadoop-2.7.7
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export SPARK_HOME=/opt/spark/spark-2.4.4
export SPARK_MASTER_IP=hp1       
export SPARK_EXECUTOR_MEMORY=2G

SPARK_MASTER_IP is the Spark master node's address; SPARK_EXECUTOR_MEMORY is the executor memory.
Adjust the paths above to your actual installation paths.
2. Edit slaves
If the slaves file does not exist, copy slaves.template and rename it

cp slaves.template  slaves

Edit the new slaves file:
replace the original localhost with the worker hostnames (not needed on a single node)

hp2
hp3

5.2.4 Sync the master (hp1) configuration to the workers (hp2, hp3)

1. Sync Scala

scp -r /opt/scala root@hp2:/opt/
scp -r /opt/scala root@hp3:/opt/

2. Sync Spark

scp -r /opt/spark root@hp2:/opt/
scp -r /opt/spark root@hp3:/opt/

3. Sync the environment variables

scp /etc/profile root@hp2:/etc/
scp /etc/profile root@hp3:/etc/

4. Reload the environment on hp2 and hp3

source /etc/profile

5.3 Start Spark

Prerequisite: Hadoop is already running.
Switch to /opt/spark/spark-2.4.4/sbin

cd /opt/spark/spark-2.4.4/sbin

Run the start command

start-all.sh
[root@hp1 sbin]# start-all.sh
starting org.apache.spark.deploy.master.Master, logging to /opt/spark/spark-2.4.4/logs/spark-root-org.apache.spark.deploy.master.Master-1-hp1.out
hp3: starting org.apache.spark.deploy.worker.Worker, logging to /opt/spark/spark-2.4.4/logs/spark-root-org.apache.spark.deploy.worker.Worker-1-hp3.out
hp2: starting org.apache.spark.deploy.worker.Worker, logging to /opt/spark/spark-2.4.4/logs/spark-root-org.apache.spark.deploy.worker.Worker-1-hp2.out

Check the startup result in the web UI at IP:8080.
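
As an optional command-line smoke test, you can submit the bundled SparkPi example to the standalone master; a minimal sketch, assuming the default master port 7077 and the installation path used above:

/opt/spark/spark-2.4.4/bin/spark-submit \
  --master spark://hp1:7077 \
  --class org.apache.spark.examples.SparkPi \
  /opt/spark/spark-2.4.4/examples/jars/spark-examples_*.jar 100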

5.4 Stop Spark

stop-all.sh

Spark installed successfully!

6. ZooKeeper

6.1 Install ZooKeeper

6.1.1 Upload and extract the installation package

Upload the downloaded package to /opt/package, then extract it and rename it into /opt/zookeeper

tar -zxvf zookeeper-3.4.14.tar.gz
mv zookeeper-3.4.14 /opt/zookeeper/zookeeper-3.4

6.1.2 Configure environment variables

Edit the /etc/profile file

vi /etc/profile

Add

export ZOOKEEPER_HOME=/opt/zookeeper/zookeeper-3.4

Change

export PATH=.:${JAVA_HOME}/bin:${HADOOP_HOME}/bin:${SCALA_HOME}/bin:${SPARK_HOME}/bin:${ZOOKEEPER_HOME}/bin:$PATH

Reload the environment

source /etc/profile

6.1.3 Edit the configuration files

1. Create the following directories on every node of the cluster

mkdir /opt/zookeeper/data
mkdir /opt/zookeeper/dataLog

2. Create a file named myid under /opt/zookeeper/data and put the value 1 in it

touch myid

Note: after syncing the configuration from the master to the workers, remember to change the content of myid on each node: hp2's myid should contain 2 and hp3's should contain 3, for example:
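
echo 1 > /opt/zookeeper/data/myid   # on hp1
echo 2 > /opt/zookeeper/data/myid   # on hp2, after the sync in 6.1.4
echo 3 > /opt/zookeeper/data/myid   # on hp3, after the sync in 6.1.4
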
3. Edit the zoo.cfg file
In the /opt/zookeeper/zookeeper-3.4/conf directory, look for zoo.cfg; if it does not exist, copy zoo_sample.cfg and rename it

cp zoo_sample.cfg zoo.cfg

In zoo.cfg, comment out "dataDir=/tmp/zookeeper" and add the following

dataDir=/opt/zookeeper/data
dataLogDir=/opt/zookeeper/dataLog
server.1=hp1:2888:3888
server.2=hp2:2888:3888
server.3=hp3:2888:3888

The resulting zoo.cfg looks like this

# The number of milliseconds of each tick
tickTime=2000
# The number of ticks that the initial
# synchronization phase can take
initLimit=10
# The number of ticks that can pass between
# sending a request and getting an acknowledgement
syncLimit=5
# the directory where the snapshot is stored.
# do not use /tmp for storage, /tmp here is just
# example sakes.
# dataDir=/tmp/zookeeper
# the port at which the clients will connect
clientPort=2181
# the maximum number of client connections.
# increase this if you need to handle more clients
#maxClientCnxns=60
#
# Be sure to read the maintenance section of the
# administrator guide before turning on autopurge.
#
# http://zookeeper.apache.org/doc/current/zookeeperAdmin.html#sc_maintenance
#
# The number of snapshots to retain in dataDir
#autopurge.snapRetainCount=3
# Purge task interval in hours
# Set to "0" to disable auto purge feature
#autopurge.purgeInterval=1

dataDir=/opt/zookeeper/data
dataLogDir=/opt/zookeeper/dataLog
server.1=hp1:2888:3888
server.2=hp2:2888:3888
server.3=hp3:2888:3888

Parameter notes:

  • tickTime: the heartbeat interval, in milliseconds, between client and server and between servers; the default is 2000 ms. Heartbeats are used to monitor whether a machine is alive and to pace follower/leader communication; by default the session timeout is twice the heartbeat interval.
  • initLimit: the maximum number of heartbeats (tickTime intervals) that the followers (F) may take to establish the initial connection to the leader (L).
  • syncLimit: the maximum number of heartbeats allowed between a request and its acknowledgement in follower/leader communication.
  • clientPort: the port clients use to connect to the ZooKeeper server; ZooKeeper listens on this port for client requests. The default is 2181.
  • dataDir: the directory that stores the myid file, snapshots, version information and the server's unique ID.
  • dataLogDir: the directory for the transaction logs.
  • server.1=hp1:2888:3888: cluster membership follows the format server.N=YYY:A:B, where N is the server number (the value in myid),
    YYY is the server address, A is the port for follower-to-leader communication (default 2888), and B is the election port (default 3888).

6.1.4 Sync the master (hp1) configuration to the workers (hp2, hp3)

1. Sync the environment variables

scp /etc/profile root@hp2:/etc/profile
scp /etc/profile root@hp3:/etc/profile

Make the environment take effect (run on hp2 and hp3)

source /etc/profile

2. Sync the installation and configuration

scp -r /opt/zookeeper root@hp2:/opt/
scp -r /opt/zookeeper root@hp3:/opt/

After syncing, edit the myid file on hp2 and hp3 as described above

6.2 Start ZooKeeper

ZooKeeper relies on leader election, so the start command has to be run on every server.
Switch to the /opt/zookeeper/zookeeper-3.4/bin directory

cd /opt/zookeeper/zookeeper-3.4/bin

Run the start command

[root@hp1 bin]# zkServer.sh start
ZooKeeper JMX enabled by default
Using config: /opt/zookeeper/zookeeper-3.4/bin/../conf/zoo.cfg
Starting zookeeper ... STARTED

Once ZooKeeper has been started on all servers, check its status (with only one server started, the status command reports it as not running)

[root@hp1 bin]# zkServer.sh status
ZooKeeper JMX enabled by default
Using config: /opt/zookeeper/zookeeper-3.4/bin/../conf/zoo.cfg
Mode: follower

[root@hp2 bin]# zkServer.sh status
ZooKeeper JMX enabled by default
Using config: /opt/zookeeper/zookeeper-3.4/bin/../conf/zoo.cfg
Mode: leader

[root@hp3 bin]# zkServer.sh status
ZooKeeper JMX enabled by default
Using config: /opt/zookeeper/zookeeper-3.4/bin/../conf/zoo.cfg
Mode: follower
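
As an optional connectivity check, you can open a client session against any node with the bundled CLI; a minimal sketch:

zkCli.sh -server hp1:2181
# inside the client shell:
ls /
quit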

6.3 Stop ZooKeeper

Switch to the /opt/zookeeper/zookeeper-3.4/bin directory
and run

zkServer.sh stop
[root@hp2 bin]# zkServer.sh stop
ZooKeeper JMX enabled by default
Using config: /opt/zookeeper/zookeeper-3.4/bin/../conf/zoo.cfg
Stopping zookeeper ... STOPPED

Note: when the server currently holding the leader role is stopped, the remaining ZooKeeper servers elect a new leader.

ZooKeeper installed successfully!

7. HBase

7.1 Install HBase

7.1.1 Upload and extract the installation package

Upload the downloaded package to /opt/package, then extract it and rename it into /opt/hbase

tar -zxvf hbase-2.1.7-bin.tar.gz
mv hbase-2.1.7 /opt/hbase/hbase-2.1

7.1.2 Configure environment variables

Edit the /etc/profile file

vi /etc/profile

Add

export HBASE_HOME=/opt/hbase/hbase-2.1

Change

export PATH=.:${JAVA_HOME}/bin:${HADOOP_HOME}/bin:${SCALA_HOME}/bin:${SPARK_HOME}/bin:${ZOOKEEPER_HOME}/bin:${HBASE_HOME}/bin:$PATH

Reload the environment

source /etc/profile

7.1.3 Edit the configuration files

Switch to the /opt/hbase/hbase-2.1/conf directory.
1. Edit hbase-env.sh
Add the following

export JAVA_HOME=/opt/java/jdk1.8
export HADOOP_HOME=/opt/hadoop/hadoop-2.7.7
export HBASE_HOME=/opt/hbase/hbase-2.1
export HBASE_CLASSPATH=/opt/hadoop/hadoop-2.7.7/etc/hadoop
export HBASE_PID_DIR=/root/hbase/pids
export HBASE_MANAGES_ZK=false

"export HBASE_MANAGES_ZK=false"表示是否使用Hbase自帶的Zookeeper,此處我們設置爲false,使用我們安裝的Zookeeper,其他配置根據自己實際情況設置

2. Edit hbase-site.xml

vi hbase-site.xml

Add the following inside <configuration> ... </configuration>

<property>
 <name>hbase.rootdir</name>
 <value>hdfs://hp1:9000/hbase</value>
 <description>The directory shared byregion servers.</description>
</property>
 <!-- ZooKeeper client port used by HBase -->
<property>
 <name>hbase.zookeeper.property.clientPort</name>
 <value>2181</value>
</property>
<!-- ZooKeeper session timeout -->
<property>
 <name>zookeeper.session.timeout</name>
 <value>120000</value>
</property>
<!-- tolerate clock skew between servers -->
<property>
<name>hbase.master.maxclockskew</name>
<value>150000</value>
</property>
<!-- ZooKeeper quorum hosts -->
<property>
 <name>hbase.zookeeper.quorum</name>
 <value>hp1,hp2,hp3</value>
</property>
<!-- local temporary directory -->
<property>
 <name>hbase.tmp.dir</name>
 <value>/opt/hbase/tmp</value>
</property>
<!-- true = distributed mode -->
<property>
 <name>hbase.cluster.distributed</name>
 <value>true</value>
</property>
  <!-- the HMaster address -->
  <property>
    <name>hbase.master</name>
    <value>hp1:60000</value>
  </property>

Note: hbase.rootdir is the directory shared by the region servers, used to persist HBase data. hbase.cluster.distributed selects the run mode: false is standalone, true is distributed; when false, HBase and ZooKeeper run in the same JVM.

3. Edit regionservers

vi regionservers

Replace localhost with hp2 and hp3; like Hadoop's slaves file, this defines the master/worker relationship.
The file should read:

hp2
hp3

7.1.4 Sync the master (hp1) configuration to the workers (hp2, hp3)

1. Sync the environment variables

scp /etc/profile root@hp2:/etc/profile
scp /etc/profile root@hp3:/etc/profile

Make the environment take effect (run on hp2 and hp3)

source /etc/profile

2. Sync the installation and configuration

scp -r /opt/hbase root@hp2:/opt/
scp -r /opt/hbase root@hp3:/opt/

7.2 Start HBase

After Hadoop and ZooKeeper have been started successfully,
switch to /opt/hbase/hbase-2.1/bin

cd /opt/hbase/hbase-2.1/bin

Run the start command

start-hbase.sh
[root@hp1 bin]# start-hbase.sh
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/hadoop/hadoop-2.7.7/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/hbase/hbase-2.1/lib/client-facing-thirdparty/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
running master, logging to /opt/hbase/hbase-2.1/logs/hbase-root-master-hp1.out
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/hadoop/hadoop-2.7.7/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/hbase/hbase-2.1/lib/client-facing-thirdparty/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
hp3: running regionserver, logging to /opt/hbase/hbase-2.1/logs/hbase-root-regionserver-hp3.out
hp2: running regionserver, logging to /opt/hbase/hbase-2.1/logs/hbase-root-regionserver-hp2.out

Use jps to check what is running on each server

[root@hp1 bin]# jps
20480 NameNode
20832 ResourceManager
21122 QuorumPeerMain
20677 SecondaryNameNode
21400 Jps
21295 HMaster
##########################
[root@hp2 bin]# jps
18080 NodeManager
18346 HRegionServer
18268 QuorumPeerMain
17997 DataNode
18590 Jps
##########################
[root@hp3 bin]# jps
18049 NodeManager
18534 Jps
18314 HRegionServer
18236 QuorumPeerMain
17965 DataNode

Check whether HBase started successfully in the web UI at the server address on port 16010.
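
As an optional smoke test from the HBase shell (the table and column family names below are made up purely for illustration):

hbase shell
# inside the shell:
create 'smoke_test', 'cf'
put 'smoke_test', 'row1', 'cf:msg', 'hello'
scan 'smoke_test'
disable 'smoke_test'
drop 'smoke_test'
exit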

7.3 Stop HBase

Run the following in the /opt/hbase/hbase-2.1/bin directory:

stop-hbase.sh

HBase installed successfully!

8. MySQL

8.1 Install MySQL

8.1.1 Check for and remove the bundled MariaDB

Check whether MariaDB is installed:

[root@hp1 /]# rpm -qa |grep mariadb
mariadb-libs-5.5.60-1.el7_5.x86_64

Remove MariaDB:

rpm -e mariadb-libs-5.5.60-1.el7_5.x86_64 --nodeps

8.1.2 Upload and extract the installation package

Upload the downloaded package to /opt/package, then extract it and rename it into /opt/mysql

tar -zxvf mysql-5.7.27-linux-glibc2.12-x86_64.tar.gz
mv mysql-5.7.27-linux-glibc2.12-x86_64 /opt/mysql/mysql-5.7

8.1.3 Configure environment variables

Edit the /etc/profile file

vi /etc/profile

Add

export MYSQL_HOME=/opt/mysql/mysql-5.7

Change

export PATH=.:${JAVA_HOME}/bin:${HADOOP_HOME}/bin:${SCALA_HOME}/bin:${SPARK_HOME}/bin:${ZOOKEEPER_HOME}/bin:${HBASE_HOME}/bin:${MYSQL_HOME}/bin:$PATH

Reload the environment

source /etc/profile

8.1.4 Edit the configuration file

1. Create the my.cnf file
Create a new my.cnf file under /etc

vi /etc/my.cnf

Add the following:

[client]
# default character set for the mysql client
default-character-set=utf8
[mysqld]
# skip privilege-table checks (leave commented out unless needed)
#skip-grant-tables
skip-name-resolve
# listen on port 3306
port = 3306
# MySQL installation directory
basedir=/opt/mysql/mysql-5.7
# MySQL data directory
datadir=/opt/mysql/mysql-5.7/data
# maximum number of connections
max_connections=200
# server character set (the default would be the 8-bit latin1)
character-set-server=utf8
# default storage engine for new tables
default-storage-engine=INNODB
lower_case_table_names=1
max_allowed_packet=16M

2. Create the data directory
The /opt/mysql/mysql-5.7/data directory referenced in the configuration must be created by hand

mkdir /opt/mysql/mysql-5.7/data

3. Create the mysql user and group

[root@hp1 local]# cd /opt/mysql/mysql-5.7/
[root@hp1 mysql-5.7]# groupadd mysql
[root@hp1 mysql-5.7]# useradd -r -g mysql mysql
[root@hp1 mysql-5.7]# chown -R mysql:mysql /opt/mysql/mysql-5.7/

4. Initialize the MySQL database
Go to the /opt/mysql/mysql-5.7/bin directory:

cd /opt/mysql/mysql-5.7/bin

Run the following command (note down the temporary password it generates):

./mysqld --initialize --user=mysql --basedir=/opt/mysql/mysql-5.7 --datadir=/opt/mysql/mysql-5.7/data

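The temporary root password appears near the end of the --initialize output, on a line reading "A temporary password is generated for root@localhost: ...". If that console output has been lost and an error log is configured (the path below is only an assumption; adjust it to your log-error setting), it can usually be recovered with:

grep 'temporary password' /opt/mysql/mysql-5.7/data/*.err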

8.2 Start MySQL

Because MySQL was not installed under its default path, the /opt/mysql/mysql-5.7/support-files/mysql.server script must be edited to set:

basedir=/opt/mysql/mysql-5.7    # the MySQL installation directory
datadir=/opt/mysql/mysql-5.7/data # the data directory

# Default value, in seconds, afterwhich the script should timeout waiting
# for server start.
# Value here is overriden by value in my.cnf.
# 0 means don't wait at all
# Negative numbers mean to wait indefinitely
service_startup_timeout=900

# Lock directory for RedHat / SuSE.
lockdir='/var/lock/subsys'
lock_file_path="$lockdir/mysql"

# The following variables are only set for letting mysql.server find things.

# Set some defaults
mysqld_pid_file_path=
if test -z "$basedir"
then
  basedir=/usr/local/mysql
  bindir=/usr/local/mysql/bin
  if test -z "$datadir"
  then
    datadir=/usr/local/mysql/data
  fi
  sbindir=/usr/local/mysql/bin
  libexecdir=/usr/local/mysql/bin
else
  bindir="$basedir/bin"
  if test -z "$datadir"
  then
    datadir="$basedir/data"
  fi
  sbindir="$basedir/sbin"
  libexecdir="$basedir/libexec"
fi

Copy the mysql.server file to /etc/init.d and rename it, which makes configuring start-at-boot easier later

[root@hp1 mysql-5.7]# cp -a ./support-files/mysql.server  /etc/init.d/mysqld

Run the start command:

[root@hp1 support-files]# mysql.server start
Starting MySQL SUCCESS!

If you get the error "mysqld_safe A mysqld process already exists",
run the following command, kill the existing mysqld process, and then run the start command again:

ps aux |grep mysqld

Log in to MySQL with the following command;
the password is the temporary one generated during initialization

[root@hp1 mysql]# mysql -uroot -p
Enter password: 
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 2
Server version: 5.7.27

Copyright (c) 2000, 2019, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

Change the root password:

mysql> set password for 'root'@'localhost'=password('root');
Query OK, 0 rows affected, 1 warning (0.01 sec)
mysql> exit;
Bye

Exit, log back in, and grant remote connection privileges:

mysql> use mysql;
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Database changed
mysql> update user set host='%' where user='root';    # allow connections from hosts at any IP
Query OK, 1 row affected (0.01 sec)
Rows matched: 1  Changed: 1  Warnings: 0

mysql> GRANT ALL PRIVILEGES ON *.* TO 'root'@'%' IDENTIFIED BY 'root' WITH GRANT OPTION; # grant remote hosts access to MySQL
Query OK, 0 rows affected, 1 warning (0.00 sec)   


mysql> flush privileges;   # reload MySQL's privilege tables
Query OK, 0 rows affected (0.00 sec)

8.3 Enable start at boot

[root@hp1 mysql-5.7]# chkconfig --add mysqld
[root@hp1 mysql-5.7]# chkconfig mysqld on

If mysqld is shown as "on" for runlevels 3, 4 and 5 (chkconfig --list mysqld), start at boot has been configured successfully.

9. Hive

9.1 Install Hive

9.1.1 Upload and extract the installation package

Upload the downloaded package to /opt/package, then extract it and rename it into /opt/hive

tar -zxvf apache-hive-2.3.6-bin.tar.gz
mv apache-hive-2.3.6-bin /opt/hive/hive-2.3

9.1.2 Configure environment variables

Edit the /etc/profile file

vi /etc/profile

Add

export HIVE_HOME=/opt/hive/hive-2.3

Change

export PATH=.:${JAVA_HOME}/bin:${HADOOP_HOME}/bin:${SCALA_HOME}/bin:${SPARK_HOME}/bin:${ZOOKEEPER_HOME}/bin:${HBASE_HOME}/bin:${MYSQL_HOME}/bin:${HIVE_HOME}/bin:$PATH

Reload the environment

source /etc/profile

9.1.3 Edit the configuration files

1. Edit hive-env.sh
If hive-env.sh does not exist, copy /opt/hive/hive-2.3/conf/hive-env.sh.template and rename it

cp hive-env.sh.template hive-env.sh

Add the following to hive-env.sh:

JAVA_HOME=/opt/java/jdk1.8
HADOOP_HOME=/opt/hadoop/hadoop-2.7.7
HIVE_HOME=/opt/hive/hive-2.3
export HIVE_CONF_DIR=$HIVE_HOME/conf
#export HIVE_AUX_JARS_PATH=$SPARK_HOME/lib/spark-assembly-1.6.0-hadoop2.6.0.jar
export CLASSPATH=$CLASSPATH:$JAVA_HOME/lib:$HADOOP_HOME/lib:$HIVE_HOME/lib
#export HADOOP_OPTS="-Dorg.xerial.snappy.tempdir=/tmp -Dorg.xerial.snappy.lib.name=libsnappyjava.jnilib $HADOOP_OPTS"

2. Edit hive-site.xml
If hive-site.xml does not exist, copy hive-default.xml.template and rename it

cp hive-default.xml.template hive-site.xml

Before editing hive-site.xml, a few directories need to be created.
Create the directories in HDFS:

[root@hp1 sbin]# hdfs dfs -mkdir -p /user/hive/warehouse
[root@hp1 sbin]# hdfs dfs -chmod -R 777 /user/hive/warehouse
[root@hp1 sbin]# hdfs dfs -mkdir -p /tmp/hive
[root@hp1 sbin]# hdfs dfs -chmod -R 777 /tmp/hive
[root@hp1 sbin]# hdfs dfs -ls /
Found 3 items
drwxr-xr-x   - root supergroup          0 2019-12-09 10:49 /hbase
drwxr-xr-x   - root supergroup          0 2019-12-10 10:48 /tmp
drwxr-xr-x   - root supergroup          0 2019-12-10 10:47 /user
[root@hp1 sbin]# hdfs dfs -ls /tmp/
Found 1 items
drwxrwxrwx   - root supergroup          0 2019-12-10 10:48 /tmp/hive
[root@hp1 sbin]# hdfs dfs -ls /user/
Found 1 items
drwxr-xr-x   - root supergroup          0 2019-12-10 10:47 /user/hive
[root@hp1 sbin]# hdfs dfs -ls /user/hive/
Found 1 items
drwxrwxrwx   - root supergroup          0 2019-12-10 10:47 /user/hive/warehouse

Create the /opt/hive/hive-2.3/tmp directory:

[root@hp1 hive-2.3]# mkdir tmp
[root@hp1 hive-2.3]# chmod -R 777 tmp

Copy the uploaded MySQL JDBC driver into /opt/hive/hive-2.3/lib:

cp /opt/package/mysql-connector-java-5.1.48-bin.jar /opt/hive/hive-2.3/lib/

Edit hive-site.xml as follows:

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?><!--
   Licensed to the Apache Software Foundation (ASF) under one or more
   contributor license agreements.  See the NOTICE file distributed with
   this work for additional information regarding copyright ownership.
   The ASF licenses this file to You under the Apache License, Version 2.0
   (the "License"); you may not use this file except in compliance with
   the License.  You may obtain a copy of the License at

       http://www.apache.org/licenses/LICENSE-2.0

   Unless required by applicable law or agreed to in writing, software
   distributed under the License is distributed on an "AS IS" BASIS,
   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
   See the License for the specific language governing permissions and
   limitations under the License.
--><configuration>
  <!-- WARNING!!! This file is auto generated for documentation purposes ONLY! -->
  <!-- WARNING!!! Any changes you make to this file will be ignored by Hive.   -->
  <!-- WARNING!!! You must make your changes in hive-site.xml instead.         -->
  <!-- Hive Execution Parameters -->
   <!-- Hive's directories in HDFS -->
   <property>
    <name>hive.metastore.warehouse.dir</name>
    <value>/user/hive/warehouse</value>
    <description>location of default database for the warehouse</description>
  </property>
  <property>
    <name>hive.exec.scratchdir</name>
    <value>/tmp/hive</value>
    <description>HDFS root scratch dir for Hive jobs which gets created with write all (733) permission. </description>
  </property>
  <!-- Hive's local scratch directory -->
    <property>
    <name>hive.exec.local.scratchdir</name>
    <value>/opt/hive/hive-2.3/tmp/root</value>
    <description>Local scratch space for Hive jobs</description>
  </property>
  <!-- use MySQL for the metastore; MySQL connection settings -->
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
    <description>Driver class name for a JDBC metastore</description>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://hp1:3306/hive?createDatabaseIfNotExist=true&amp;characterEncoding=UTF-8&amp;useSSL=false&amp;</value>
    <description>
      JDBC connect string for a JDBC metastore.
      To use SSL to encrypt/authenticate the connection, provide database-specific SSL flag in the connection URL.
      For example, jdbc:postgresql://myhost/db?ssl=true for postgres database.
    </description>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>root</value>
    <description>Username to use against metastore database</description>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>root</value>
    <description>password to use against metastore database</description>
  </property>
</configuration>

9.2 Start Hive

Before starting Hive, HDFS and MySQL must already be running.
1. Initialize the Hive metastore schema
Run the initialization command in the /opt/hive/hive-2.3/bin directory:

[root@hp1 bin]# schematool -initSchema -dbType mysql
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/hive/hive-2.3/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/hadoop/hadoop-2.7.7/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Metastore connection URL:	 jdbc:mysql://hp1:3306/hive?createDatabaseIfNotExist=true&characterEncoding=UTF-8&useSSL=false&
Metastore Connection Driver :	 com.mysql.jdbc.Driver
Metastore connection User:	 root
Starting metastore schema initialization to 2.3.0
Initialization script hive-schema-2.3.0.mysql.sql
Initialization script completed
schemaTool completed

2. Start Hive
Run:

[root@hp1 bin]# hive 
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/hive/hive-2.3/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/hadoop/hadoop-2.7.7/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]

Logging initialized using configuration in jar:file:/opt/hive/hive-2.3/lib/hive-common-2.3.6.jar!/hive-log4j2.properties Async: true
Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
hive> 

3. Test

hive> create database user_test;
OK
Time taken: 4.32 seconds
hive> use user_test;
OK
hive> create table test_hive(id int, name string)
    > row format delimited fields terminated by '\t'
    > stored as textfile;
OK
Time taken: 0.676 seconds
hive> show tables;
OK
test_hive
Time taken: 0.11 seconds, Fetched: 1 row(s)
hive> desc test_hive;
OK
id                  	int                 	                    
name                	string              	                    
Time taken: 0.084 seconds, Fetched: 2 row(s)
hive> load data local inpath '/lc/test_db.txt' into table test_hive;
Loading data to table user_test.test_hive
OK
Time taken: 1.783 seconds
hive> select * from test_hive;
1	張三 
2	李四
3	王五
4	王麻子
5	張武

Test passed!

9.3 Stop Hive

hive> exit;

10. Flume

10.1 Install Flume

10.1.1 Upload and extract the installation package

Upload the downloaded package to /opt/package, then extract it and rename it into /opt/flume

tar -zxvf apache-flume-1.8.0-bin.tar.gz
mv apache-flume-1.8.0-bin  /opt/flume/flume-1.8

So that Flume can talk to Hadoop, copy the following 6 jars from /opt/hadoop/hadoop-2.7.7/share/hadoop/common and /opt/hadoop/hadoop-2.7.7/share/hadoop/hdfs into /opt/flume/flume-1.8/lib:

#hadoop-common-2.7.7.jar
cp /opt/hadoop/hadoop-2.7.7/share/hadoop/common/hadoop-common-2.7.7.jar /opt/flume/flume-1.8/lib/
#commons-configuration-1.6.jar
cp  /opt/hadoop/hadoop-2.7.7/share/hadoop/common/lib/commons-configuration-1.6.jar /opt/flume/flume-1.8/lib/
#commons-io-2.4.jar
cp  /opt/hadoop/hadoop-2.7.7/share/hadoop/common/lib/commons-io-2.4.jar /opt/flume/flume-1.8/lib/
#hadoop-auth-2.7.7.jar
cp /opt/hadoop/hadoop-2.7.7/share/hadoop/common/lib/hadoop-auth-2.7.7.jar /opt/flume/flume-1.8/lib/
#hadoop-hdfs-2.7.7.jar
cp  /opt/hadoop/hadoop-2.7.7/share/hadoop/hdfs/hadoop-hdfs-2.7.7.jar  /opt/flume/flume-1.8/lib/
#htrace-core-3.1.0-incubating.jar
cp /opt/hadoop/hadoop-2.7.7/share/hadoop/hdfs/lib/htrace-core-3.1.0-incubating.jar /opt/flume/flume-1.8/lib/

10.1.2 Configure environment variables

Edit the /etc/profile file

vi /etc/profile

Add

export FLUME_HOME=/opt/flume/flume-1.8

Change

export PATH=.:${JAVA_HOME}/bin:${HADOOP_HOME}/bin:${SCALA_HOME}/bin:${SPARK_HOME}/bin:${ZOOKEEPER_HOME}/bin:${HBASE_HOME}/bin:${MYSQL_HOME}/bin:${HIVE_HOME}/bin:${FLUME_HOME}/bin:$PATH

Reload the environment

source /etc/profile

10.1.3 Edit the configuration files

In the /opt/flume/flume-1.8/conf/ directory:
1. Edit flume-env.sh
If flume-env.sh does not exist, copy flume-env.sh.template and rename it

cp flume-env.sh.template flume-env.sh
vi flume-env.sh

Add to flume-env.sh:

export JAVA_HOME=/opt/java/jdk1.8

10.2 Test

Read a local file into HDFS.
Create the file flume-file-hdfs.conf in the /opt/flume directory

[root@hp1 flume]# touch flume-file-hdfs.conf

Edit flume-file-hdfs.conf and add the following:

# Name the components on this agent
a2.sources = r2
a2.sinks = k2
a2.channels = c2

# Describe/configure the source
a2.sources.r2.type = exec
# file to tail; here I point it at the Hive log
a2.sources.r2.command = tail -F /tmp/root/hive.log
# shell used to run the command
a2.sources.r2.shell = /bin/bash -c

# Describe the sink
a2.sinks.k2.type = hdfs
# HDFS directory the tailed data is written to
a2.sinks.k2.hdfs.path = hdfs://hp1:9000/flume/%Y%m%d/%H
# prefix for the uploaded files
a2.sinks.k2.hdfs.filePrefix = logs-
# whether to roll directories based on time
a2.sinks.k2.hdfs.round = true
# how many time units before a new directory is created
a2.sinks.k2.hdfs.roundValue = 1
# the time unit used for rolling directories
a2.sinks.k2.hdfs.roundUnit = hour
# whether to use the local timestamp
a2.sinks.k2.hdfs.useLocalTimeStamp = true
# number of events to accumulate before flushing to HDFS
a2.sinks.k2.hdfs.batchSize = 1000
# file type; compression is supported
a2.sinks.k2.hdfs.fileType = DataStream
# how often (seconds) a new file is rolled
a2.sinks.k2.hdfs.rollInterval = 600
# roll the file once it reaches this size (bytes)
a2.sinks.k2.hdfs.rollSize = 134217700
# 0 = never roll based on the number of events
a2.sinks.k2.hdfs.rollCount = 0
# minimum block replication
a2.sinks.k2.hdfs.minBlockReplicas = 1

# Use a channel which buffers events in memory
a2.channels.c2.type = memory
a2.channels.c2.capacity = 1000
a2.channels.c2.transactionCapacity = 100

# Bind the source and sink to the channel
a2.sources.r2.channels = c2
a2.sinks.k2.channel = c2

With Hadoop already running, execute:

[root@hp1 flume]# /opt/flume/flume-1.8/bin/flume-ng agent --conf /opt/flume/flume-1.8/conf/ --name a2 --conf-file /opt/flume/flume-file-hdfs.conf

The output looks like this:

[root@hp1 flume]# /opt/flume/flume-1.8/bin/flume-ng agent --conf /opt/flume/flume-1.8/conf/ --name a2 --conf-file /opt/flume/flume-file-hdfs.conf
Info: Sourcing environment configuration script /opt/flume/flume-1.8/conf/flume-env.sh
Info: Including Hadoop libraries found via (/opt/hadoop/hadoop-2.7.7/bin/hadoop) for HDFS access
Info: Including HBASE libraries found via (/opt/hbase/hbase-2.1/bin/hbase) for HBASE access
錯誤: 找不到或無法加載主類 org.apache.flume.tools.GetJavaProperty
錯誤: 找不到或無法加載主類 org.apache.hadoop.hbase.util.GetJavaProperty
Info: Including Hive libraries found via (/opt/hive/hive-2.3) for Hive access
+ exec /opt/java/jdk1.8/bin/java -Xmx20m -cp '/opt/flume/flume-1.8/conf:/opt/flume/flume-1.8/lib/*:/opt/hadoop/hadoop-2.7.7/etc/hadoop:/opt/hadoop/hadoop-2.7.7/share/hadoop/common/lib/*:/opt/hadoop/hadoop-2.7.7/share/hadoop/common/*:/opt/hadoop/hadoop-2.7.7/share/hadoop/hdfs:/opt/hadoop/hadoop-2.7.7/share/hadoop/hdfs/lib/*:/opt/hadoop/hadoop-2.7.7/share/hadoop/hdfs/*:/opt/hadoop/hadoop-2.7.7/share/hadoop/yarn/lib/*:/opt/hadoop/hadoop-2.7.7/share/hadoop/yarn/*:/opt/hadoop/hadoop-2.7.7/share/hadoop/mapreduce/lib/*:/opt/hadoop/hadoop-2.7.7/share/hadoop/mapreduce/*:/opt/hadoop/hadoop-2.7.7/contrib/capacity-scheduler/*.jar:/opt/hbase/hbase-2.1/conf:/opt/java/jdk1.8/lib/tools.jar:/opt/hbase/hbase-2.1:/opt/hbase/hbase-2.1/lib/shaded-clients/hbase-shaded-client-byo-hadoop-2.1.7.jar:/opt/hbase/hbase-2.1/lib/client-facing-thirdparty/audience-annotations-0.5.0.jar:/opt/hbase/hbase-2.1/lib/client-facing-thirdparty/commons-logging-1.2.jar:/opt/hbase/hbase-2.1/lib/client-facing-thirdparty/findbugs-annotations-1.3.9-1.jar:/opt/hbase/hbase-2.1/lib/client-facing-thirdparty/htrace-core4-4.2.0-incubating.jar:/opt/hbase/hbase-2.1/lib/client-facing-thirdparty/log4j-1.2.17.jar:/opt/hbase/hbase-2.1/lib/client-facing-thirdparty/slf4j-api-1.7.25.jar:/opt/hadoop/hadoop-2.7.7/etc/hadoop:/opt/hadoop/hadoop-2.7.7/share/hadoop/common/lib/*:/opt/hadoop/hadoop-2.7.7/share/hadoop/common/*:/opt/hadoop/hadoop-2.7.7/share/hadoop/hdfs:/opt/hadoop/hadoop-2.7.7/share/hadoop/hdfs/lib/*:/opt/hadoop/hadoop-2.7.7/share/hadoop/hdfs/*:/opt/hadoop/hadoop-2.7.7/share/hadoop/yarn/lib/*:/opt/hadoop/hadoop-2.7.7/share/hadoop/yarn/*:/opt/hadoop/hadoop-2.7.7/share/hadoop/mapreduce/lib/*:/opt/hadoop/hadoop-2.7.7/share/hadoop/mapreduce/*:/opt/hadoop/hadoop-2.7.7/contrib/capacity-scheduler/*.jar:/opt/hadoop/hadoop-2.7.7/etc/hadoop:/opt/hbase/hbase-2.1/conf:/opt/hive/hive-2.3/lib/*' -Djava.library.path=:/opt/hadoop/hadoop-2.7.7/lib/native org.apache.flume.node.Application --name a2 --conf-file /opt/flume/flume-file-hdfs.conf
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/flume/flume-1.8/lib/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/hadoop/hadoop-2.7.7/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/hive/hive-2.3/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.

Open another terminal on the server and start Hive:

[root@hp1 ~]# hive
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/hive/hive-2.3/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/hadoop/hadoop-2.7.7/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]

Logging initialized using configuration in jar:file:/opt/hive/hive-2.3/lib/hive-common-2.3.6.jar!/hive-log4j2.properties Async: true
Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
hive> 

Check in the HDFS web UI whether files are being written; once the uploaded files appear, Flume is installed successfully!
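
You can also check from the command line; a minimal sketch (the date-based subdirectory reflects when the agent was running):

hdfs dfs -ls -R /flume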

11. Kafka

11.1 Install Kafka

11.1.1 Upload and extract the installation package

Upload the downloaded package to /opt/package, then extract it and rename it into /opt/kafka

tar -zxvf kafka_2.12-2.1.1.tgz
mv kafka_2.12-2.1.1 /opt/kafka/kafka-2.12

11.1.2 Configure environment variables

Edit the /etc/profile file

vi /etc/profile

Add

export KAFKA_HOME=/opt/kafka/kafka-2.12

Change

export PATH=.:${JAVA_HOME}/bin:${HADOOP_HOME}/bin:${SCALA_HOME}/bin:${SPARK_HOME}/bin:${ZOOKEEPER_HOME}/bin:${HBASE_HOME}/bin:${MYSQL_HOME}/bin:${HIVE_HOME}/bin:${FLUME_HOME}/bin:${KAFKA_HOME}/bin:$PATH

Reload the environment

source /etc/profile

11.1.3 Edit the configuration file

Go to the /opt/kafka/kafka-2.12/config directory
and edit the server.properties file:

# change 0 to 1; the id must be unique on every node
broker.id=1
# change localhost:2181 to the following
zookeeper.connect=hp1:2181,hp2:2181,hp3:2181

Note: after the file is copied to the worker nodes, change broker.id to 2 and 3 respectively (see the sketch in 11.1.4).
An annotated walkthrough of the configuration file:

# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements.  See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License.  You may obtain a copy of the License at
#
#    http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# see kafka.server.KafkaConfig for additional details and defaults

############################# Server Basics #############################

# The id of the broker. This must be set to a unique integer for each broker.
# globally unique broker id; must not be repeated
broker.id=1

############################# Socket Server Settings #############################

# The address the socket server listens on. It will get the value returned from
# java.net.InetAddress.getCanonicalHostName() if not configured.
#   FORMAT:
#     listeners = listener_name://host_name:port
#   EXAMPLE:
#     listeners = PLAINTEXT://your.host.name:9092
#listeners=PLAINTEXT://:9092

# Hostname and port the broker will advertise to producers and consumers. If not set,
# it uses the value for "listeners" if configured.  Otherwise, it will use the value
# returned from java.net.InetAddress.getCanonicalHostName().
#advertised.listeners=PLAINTEXT://your.host.name:9092

# Maps listener names to security protocols, the default is for them to be the same. See the config documentation for more details
#listener.security.protocol.map=PLAINTEXT:PLAINTEXT,SSL:SSL,SASL_PLAINTEXT:SASL_PLAINTEXT,SASL_SSL:SASL_SSL

# The number of threads that the server uses for receiving requests from the network and sending responses to the network
# number of threads handling network requests
num.network.threads=3

# The number of threads that the server uses for processing requests, which may include disk I/O
# number of threads handling disk I/O
num.io.threads=8

# The send buffer (SO_SNDBUF) used by the socket server
# send buffer size of the server socket
socket.send.buffer.bytes=102400

# The receive buffer (SO_RCVBUF) used by the socket server
# receive buffer size of the server socket
socket.receive.buffer.bytes=102400

# The maximum size of a request that the socket server will accept (protection against OOM)
# maximum request size accepted by the server socket
socket.request.max.bytes=104857600


############################# Log Basics #############################

# A comma separated list of directories under which to store log files
# directory where Kafka stores its message logs
log.dirs=/tmp/kafka-logs

# The default number of log partitions per topic. More partitions allow greater
# parallelism for consumption, but this will also result in more files across
# the brokers.
# default number of partitions per topic on this broker
num.partitions=1

# The number of threads per data directory to be used for log recovery at startup and flushing at shutdown.
# This value is recommended to be increased for installations with data dirs located in RAID array.
# number of threads used for log recovery and cleanup
num.recovery.threads.per.data.dir=1

############################# Internal Topic Settings  #############################
# The replication factor for the group metadata internal topics "__consumer_offsets" and "__transaction_state"
# For anything other than development testing, a value greater than 1 is recommended for to ensure availability such as 3.
offsets.topic.replication.factor=1
transaction.state.log.replication.factor=1
transaction.state.log.min.isr=1

############################# Log Flush Policy #############################

# Messages are immediately written to the filesystem but by default we only fsync() to sync
# the OS cache lazily. The following configurations control the flush of data to disk.
# There are a few important trade-offs here:
#    1. Durability: Unflushed data may be lost if you are not using replication.
#    2. Latency: Very large flush intervals may lead to latency spikes when the flush does occur as there will be a lot of data to flush.
#    3. Throughput: The flush is generally the most expensive operation, and a small flush interval may lead to excessive seeks.
# The settings below allow one to configure the flush policy to flush data after a period of time or
# every N messages (or both). This can be done globally and overridden on a per-topic basis.

# The number of messages to accept before forcing a flush of data to disk
# flush to disk once this many messages have accumulated in the partition buffer
#log.flush.interval.messages=10000


############################# Log Retention Policy #############################

# The following configurations control the disposal of log segments. The policy can
# be set to delete segments after a period of time, or after a given size has accumulated.
# A segment will be deleted whenever *either* of these criteria are met. Deletion always happens
# from the end of the log.

# The minimum age of a log file to be eligible for deletion due to age
# maximum time a segment file is retained before it is deleted
log.retention.hours=168

# A size-based retention policy for logs. Segments are pruned from the log unless the remaining
# segments drop below log.retention.bytes. Functions independently of log.retention.hours.
#log.retention.bytes=1073741824

# The maximum size of a log segment file. When this size is reached a new log segment will be created.
# size of each log segment; the default is 1 GB
log.segment.bytes=1073741824

# The interval at which log segments are checked to see if they can be deleted according
# to the retention policies
# how often segment sizes are checked against the retention policy
log.retention.check.interval.ms=300000

############################# Zookeeper #############################

# Zookeeper connection string (see zookeeper docs for details).
# This is a comma separated host:port pairs, each corresponding to a zk
# server. e.g. "127.0.0.1:3000,127.0.0.1:3001,127.0.0.1:3002".
# You can also append an optional chroot string to the urls to specify the
# root directory for all kafka znodes.
# brokers use ZooKeeper to store metadata
zookeeper.connect=hp1:2181,hp2:2181,hp3:2181

# Timeout in ms for connecting to zookeeper
# ZooKeeper connection timeout
zookeeper.connection.timeout.ms=6000


############################# Group Coordinator Settings #############################

# The following configuration specifies the time, in milliseconds, that the GroupCoordinator will delay the initial consumer rebalance.
# The rebalance will be further delayed by the value of group.initial.rebalance.delay.ms as new members join the group, up to a maximum of max.poll.interval.ms.
# The default value for this is 3 seconds.
# We override this to 0 here as it makes for a better out-of-the-box experience for development and testing.
# However, in production environments the default value of 3 seconds is more suitable as this will help to avoid unnecessary, and potentially expensive, rebalances during application startup.
group.initial.rebalance.delay.ms=0

11.1.4 Sync the master (hp1) configuration to the workers (hp2, hp3)

1. Sync the environment variables

scp /etc/profile root@hp2:/etc/profile
scp /etc/profile root@hp3:/etc/profile

Make the environment take effect (run on hp2 and hp3)

source /etc/profile

2. Sync the installation and configuration

scp -r /opt/kafka root@hp2:/opt/
scp -r /opt/kafka root@hp3:/opt/

Remember to change broker.id in /opt/kafka/kafka-2.12/config/server.properties on each worker node.
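
A minimal sketch for doing that remotely from hp1, assuming the synced file still contains broker.id=1:

ssh root@hp2 "sed -i 's/^broker.id=1/broker.id=2/' /opt/kafka/kafka-2.12/config/server.properties"
ssh root@hp3 "sed -i 's/^broker.id=1/broker.id=3/' /opt/kafka/kafka-2.12/config/server.properties"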

11.2 Start Kafka

1. Start ZooKeeper on all nodes first

#hp1
[root@hp1 bin]# zkServer.sh start
ZooKeeper JMX enabled by default
Using config: /opt/zookeeper/zookeeper-3.4/bin/../conf/zoo.cfg
Starting zookeeper ... STARTED
#hp2
[root@hp2 /]# zkServer.sh start
ZooKeeper JMX enabled by default
Using config: /opt/zookeeper/zookeeper-3.4/bin/../conf/zoo.cfg
Starting zookeeper ... STARTED
#hp3
[root@hp3 ~]# vi /opt/kafka/kafka-2.12/config/server.properties
[root@hp3 ~]# zkServer.sh start
ZooKeeper JMX enabled by default
Using config: /opt/zookeeper/zookeeper-3.4/bin/../conf/zoo.cfg
Starting zookeeper ... STARTED

2. Start Kafka in the background on all nodes.
Run the following in the /opt/kafka/kafka-2.12/ directory:

./bin/kafka-server-start.sh -daemon config/server.properties &

Afterwards, check with jps

#hp1
[root@hp1 kafka-2.12]# jps
23762 NameNode
26916 Jps
26854 Kafka
24504 Application
24123 ResourceManager
23964 SecondaryNameNode
26447 QuorumPeerMain
[2]-  完成                  ./bin/kafka-server-start.sh -daemon config/server.properties
#hp2
[root@hp2 kafka-2.12]# jps
23088 Jps
21031 DataNode
23067 Kafka
21132 NodeManager
22685 QuorumPeerMain
[1]+  完成                  ./bin/kafka-server-start.sh -daemon config/server.properties
#hp3
[root@hp3 kafka-2.12]# ./bin/kafka-server-start.sh -daemon config/server.properties &
[1] 22781
[root@hp3 kafka-2.12]# jps
21027 DataNode
22675 QuorumPeerMain
23053 Kafka
21134 NodeManager
23070 Jps
[1]+  完成                  ./bin/kafka-server-start.sh -daemon config/server.properties

11.3 Test

1. Create a topic named test on hp1

[root@hp1 kafka-2.12]# ./bin/kafka-topics.sh --create --zookeeper hp1:2181,hp2:2181,hp3:2181 --replication-factor 3 --partitions 3 --topic test
Created topic "test"

2. List the created topics on hp1, hp2 and hp3

[root@hp1 kafka-2.12]# ./bin/kafka-topics.sh --list --zookeeper localhost:2181
test
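
Optionally, you can also inspect where the topic's partitions and replicas were placed; a minimal sketch:

./bin/kafka-topics.sh --describe --zookeeper hp1:2181 --topic test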

3. Start a console producer on hp1

[root@hp1 kafka-2.12]# ./bin/kafka-console-producer.sh --broker-list hp1:9092,hp2:9092,hp3:9092 --topic test

Type a few messages.
4. Start a console consumer on another node

[root@hp2 kafka-2.12]# ./bin/kafka-console-consumer.sh --bootstrap-server hp1:9092,hp2:9092,hp3:9092 --from-beginning --topic test

The messages typed on hp1 appear in the consumer.

11.4 Stop Kafka

Kafka must be stopped before ZooKeeper.
Run the following in the /opt/kafka/kafka-2.12/bin directory:

kafka-server-stop.sh

Kafka installed successfully!

12. References

Reference: https://blog.csdn.net/qazwsxpcm/article/details/78937820
Reference: https://blog.csdn.net/qq_40343117/article/details/100121774
