Oozie -- Using Oozie


KEY                             Meaning

nameNode                        HDFS (NameNode) address

jobTracker                      jobTracker (ResourceManager) address

queueName                       Oozie queue (default: default)

examplesRoot                    Global directory (default: examples)

oozie.use.system.libpath        Whether to use the Oozie system share library (true/false)

oozie.libpath                   Location of the user lib directory

oozie.wf.application.path       HDFS path of the Oozie workflow (the directory containing workflow.xml)

user.name                       Current user

oozie.coord.application.path    Path of coordinator.xml (omit if not used)

oozie.bundle.application.path   Path of bundle.xml (omit if not used)
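As a reference, a minimal sketch of how these keys typically combine in a job.properties file (the host name and paths here are illustrative; the concrete values used in this walkthrough appear in the examples below):

nameNode=hdfs://node01:8020
jobTracker=node01:8032
queueName=default
examplesRoot=oozie_works
oozie.use.system.libpath=true
oozie.wf.application.path=${nameNode}/user/${user.name}/${examplesRoot}/shell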

 

Using Oozie to schedule a shell script

After Oozie is installed, we need to verify that it works end to end. The official distribution ships with a set of example jobs, and we can use them to exercise Oozie's scheduling.

Step 1: Extract the official scheduling examples

Oozie ships with a variety of example jobs that we can use as templates, so first extract them:

cd /export/servers/oozie-4.1.0-cdh5.14.0

tar -zxf oozie-examples.tar.gz

Step 2: Create a working directory

Create an Oozie working directory anywhere you like; from now on, all job configuration files go into this directory.

Here we create the working directory directly under the Oozie installation directory:

cd /export/servers/oozie-4.1.0-cdh5.14.0

mkdir oozie_works

Step 3: Copy the task template into the working directory

With the task template and working directory ready, copy the shell task template into the Oozie working directory:

cd /export/servers/oozie-4.1.0-cdh5.14.0

cp -r examples/apps/shell/ oozie_works/

 

Step 4: Prepare an arbitrary shell script

cd /export/servers/oozie-4.1.0-cdh5.14.0

vim oozie_works/shell/hello.sh

Note: the script must live under the shell directory inside the Oozie working directory.

#!/bin/bash

echo "hello world" >> /export/servers/hello_oozie.txt

 

Step 5: Modify the configuration files in the template

Modify job.properties:

cd /export/servers/oozie-4.1.0-cdh5.14.0/oozie_works/shell

vim job.properties

nameNode=hdfs://node01:8020

jobTracker=node01:8032

queueName=default

examplesRoot=oozie_works

oozie.wf.application.path=${nameNode}/user/${user.name}/${examplesRoot}/shell

EXEC=hello.sh

Modify workflow.xml (note: because the action's <ok> transition goes straight to end, the check-output decision node below is never reached; it is left over from the official template, which compares the captured my_output value):

vim workflow.xml
<workflow-app xmlns="uri:oozie:workflow:0.4" name="shell-wf">

<start to="shell-node"/>

<action name="shell-node">

    <shell xmlns="uri:oozie:shell-action:0.2">

        <job-tracker>${jobTracker}</job-tracker>

        <name-node>${nameNode}</name-node>

        <configuration>

            <property>

                <name>mapred.job.queue.name</name>

                <value>${queueName}</value>

            </property>

        </configuration>

        <exec>${EXEC}</exec>

        <!-- <argument>my_output=Hello Oozie</argument> -->

        <file>/user/root/oozie_works/shell/${EXEC}#${EXEC}</file>



        <capture-output/>

    </shell>

    <ok to="end"/>

    <error to="fail"/>

</action>

<decision name="check-output">

    <switch>

        <case to="end">

            ${wf:actionData('shell-node')['my_output'] eq 'Hello Oozie'}

        </case>

        <default to="fail-output"/>

    </switch>

</decision>

<kill name="fail">

    <message>Shell action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>

</kill>

<kill name="fail-output">

    <message>Incorrect output, expected [Hello Oozie] but was [${wf:actionData('shell-node')['my_output']}]</message>

</kill>

<end name="end"/>

</workflow-app>

 

Step 6: Upload the scheduled job to HDFS

Note: the upload target is the HDFS directory /user/root, because Hadoop was started as the root user; if Hadoop was started as a different user, upload to /user/<that user> instead.

cd /export/servers/oozie-4.1.0-cdh5.14.0

hdfs dfs -put oozie_works/ /user/root
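A quick sanity check that the upload landed where the workflow expects it (the path assumes the root user, as above):

hdfs dfs -ls /user/root/oozie_works/shell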

 

Step 7: Run the scheduled job

Run the job with the Oozie CLI:

cd /export/servers/oozie-4.1.0-cdh5.14.0

bin/oozie job -oozie http://bd001:11000/oozie -config oozie_works/shell/job.properties  -run
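The run command prints a workflow job ID; you can also poll the job state from the CLI (the job ID below is a placeholder for the one printed by -run):

bin/oozie job -oozie http://bd001:11000/oozie -info <job-id>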

The monitoring UI shows that the job completed successfully.

Checking port 19888 on Hadoop (the JobHistory server), you will see that Oozie launched an MR job to execute the shell script.

 

Using Oozie to schedule Hive

Step 1: Copy the Hive example template

cd /export/servers/oozie-4.1.0-cdh5.14.0

cp -ra examples/apps/hive2/ oozie_works/

 

Step 2: Edit the Hive template

Tasks here are submitted through HiveServer2, so make sure the HiveServer2 service is running:

hive --service hiveserver2 &

hive --service metastore  &
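Optionally verify that HiveServer2 is reachable before submitting, for example with beeline (the JDBC URL matches the jdbcURL configured below):

beeline -u jdbc:hive2://bd001:10000/default -e "show databases;"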

 

Modify job.properties:

cd /export/servers/oozie-4.1.0-cdh5.14.0/oozie_works/hive2

vim job.properties
nameNode=hdfs://bd001:8020

jobTracker=bd001:8032

queueName=default

jdbcURL=jdbc:hive2://bd001:10000/default

examplesRoot=oozie_works



oozie.use.system.libpath=true

# HDFS path where the job files are uploaded; effectively /user/root/oozie_works/hive2

oozie.wf.application.path=${nameNode}/user/${user.name}/${examplesRoot}/hive2

Modify workflow.xml:

vim workflow.xml
<?xml version="1.0" encoding="UTF-8"?>

<workflow-app xmlns="uri:oozie:workflow:0.5" name="hive2-wf">

    <start to="hive2-node"/>



    <action name="hive2-node">

        <hive2 xmlns="uri:oozie:hive2-action:0.1">

            <job-tracker>${jobTracker}</job-tracker>

            <name-node>${nameNode}</name-node>

            <prepare>

                <delete path="${nameNode}/user/${wf:user()}/${examplesRoot}/output-data/hive2"/>

                <mkdir path="${nameNode}/user/${wf:user()}/${examplesRoot}/output-data"/>

            </prepare>

            <configuration>

                <property>

                    <name>mapred.job.queue.name</name>

                    <value>${queueName}</value>

                </property>

            </configuration>

            <jdbc-url>${jdbcURL}</jdbc-url>

            <script>script.q</script>

            <param>INPUT=/user/${wf:user()}/${examplesRoot}/input-data/table</param>

            <param>OUTPUT=/user/${wf:user()}/${examplesRoot}/output-data/hive2</param>

        </hive2>

        <ok to="end"/>

        <error to="fail"/>

    </action>



    <kill name="fail">

        <message>Hive2 (Beeline) action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>

    </kill>

    <end name="end"/>

</workflow-app>

 

Edit the Hive SQL file:

vim script.q
DROP TABLE IF EXISTS test;

CREATE EXTERNAL TABLE default.test (a INT) STORED AS TEXTFILE LOCATION '${INPUT}';

INSERT INTO test VALUES (10);

INSERT INTO test VALUES (20);

INSERT INTO test VALUES (30);

Step 3: Upload the job files to HDFS

cd /export/servers/oozie-4.1.0-cdh5.14.0/oozie_works

hdfs dfs -put hive2/ /user/root/oozie_works/

Step 4: Run the Oozie job

cd /export/servers/oozie-4.1.0-cdh5.14.0

bin/oozie job -oozie http://bd001:11000/oozie -config oozie_works/hive2/job.properties  -run

 

Step 5: Check the result
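The original shows the result as screenshots. As a minimal sketch of what to check: the Oozie web UI on port 11000 shows the workflow state, and the rows the workflow inserted can be queried through beeline (assuming the jdbcURL from job.properties):

beeline -u jdbc:hive2://bd001:10000/default -e "select * from default.test;"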

 

 

Using Oozie to schedule an MR job

Step 1: Prepare input data for the MR job

Here we schedule the execution of an MR program through Oozie. The program can be one you wrote yourself or one bundled with Hadoop; here we use Hadoop's built-in wordcount example.

Prepare the following data and upload it to the HDFS path /oozie/input:

hdfs dfs -mkdir -p /oozie/input

vim wordcount.txt
hello   world   hadoop

spark   hive    hadoop

Upload the data to the corresponding HDFS directory:

hdfs dfs -put wordcount.txt /oozie/input

 

Step 2: Run the official example (outside Oozie)

hadoop jar /export/servers/hadoop-2.6.0-cdh5.14.0/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0-cdh5.14.0.jar   wordcount  /oozie/input/  /oozie/output
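If the job succeeds, the word counts are written to /oozie/output; part-r-00000 is the conventional name of the reducer output file:

hdfs dfs -cat /oozie/output/part-r-00000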

Step 3: Prepare the resources to schedule

Put everything the job needs into one folder: the jar, job.properties, and workflow.xml.

Copy the MR task template:

cd /export/servers/oozie-4.1.0-cdh5.14.0

cp -ra examples/apps/map-reduce/   oozie_works/

 

Delete the jar that ships in the template's lib directory:

cd /export/servers/oozie-4.1.0-cdh5.14.0/oozie_works/map-reduce/lib

rm -rf oozie-examples-4.1.0-cdh5.14.0.jar

 

Step 4: Copy our jar into the corresponding directory

From the deletion in the previous step we can see that the jars to be scheduled live under

/export/servers/oozie-4.1.0-cdh5.14.0/oozie_works/map-reduce/lib, so we simply place the jar we want to schedule in that same directory:

cp /export/servers/hadoop-2.6.0-cdh5.14.0/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0-cdh5.14.0.jar /export/servers/oozie-4.1.0-cdh5.14.0/oozie_works/map-reduce/lib/
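The workflow.xml below references WordCount's inner mapper and reducer classes; as a sanity check, you can confirm they are present in the copied jar with the standard jar tool:

jar tf /export/servers/oozie-4.1.0-cdh5.14.0/oozie_works/map-reduce/lib/hadoop-mapreduce-examples-2.6.0-cdh5.14.0.jar | grep WordCount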

Step 5: Modify the configuration files

Modify job.properties:

cd /export/servers/oozie-4.1.0-cdh5.14.0/oozie_works/map-reduce

vim job.properties

nameNode=hdfs://node01:8020

jobTracker=node01:8032

queueName=default

examplesRoot=oozie_works



oozie.wf.application.path=${nameNode}/user/${user.name}/${examplesRoot}/map-reduce/workflow.xml

outputDir=/oozie/output

inputdir=/oozie/input

Modify workflow.xml:

cd /export/servers/oozie-4.1.0-cdh5.14.0/oozie_works/map-reduce

vim workflow.xml
<?xml version="1.0" encoding="UTF-8"?>

<!--

  Licensed to the Apache Software Foundation (ASF) under one

  or more contributor license agreements.  See the NOTICE file

  distributed with this work for additional information

  regarding copyright ownership.  The ASF licenses this file

  to you under the Apache License, Version 2.0 (the

  "License"); you may not use this file except in compliance

  with the License.  You may obtain a copy of the License at

  

       http://www.apache.org/licenses/LICENSE-2.0

  

  Unless required by applicable law or agreed to in writing, software

  distributed under the License is distributed on an "AS IS" BASIS,

  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.

  See the License for the specific language governing permissions and

  limitations under the License.

-->

<workflow-app xmlns="uri:oozie:workflow:0.5" name="map-reduce-wf">

    <start to="mr-node"/>

    <action name="mr-node">

        <map-reduce>

            <job-tracker>${jobTracker}</job-tracker>

            <name-node>${nameNode}</name-node>

            <prepare>

                <delete path="${nameNode}/${outputDir}"/>

            </prepare>

            <configuration>

                <property>

                    <name>mapred.job.queue.name</name>

                    <value>${queueName}</value>

                </property>

<!--  

                <property>

                    <name>mapred.mapper.class</name>

                    <value>org.apache.oozie.example.SampleMapper</value>

                </property>

                <property>

                    <name>mapred.reducer.class</name>

                    <value>org.apache.oozie.example.SampleReducer</value>

                </property>

                <property>

                    <name>mapred.map.tasks</name>

                    <value>1</value>

                </property>

                <property>

                    <name>mapred.input.dir</name>

                    <value>/user/${wf:user()}/${examplesRoot}/input-data/text</value>

                </property>

                <property>

                    <name>mapred.output.dir</name>

                    <value>/user/${wf:user()}/${examplesRoot}/output-data/${outputDir}</value>

                </property>

-->



   <!-- 開啓使用新的API來進行配置 -->

                <property>

                    <name>mapred.mapper.new-api</name>

                    <value>true</value>

                </property>



                <property>

                    <name>mapred.reducer.new-api</name>

                    <value>true</value>

                </property>



                <!-- 指定MR的輸出key的類型 -->

                <property>

                    <name>mapreduce.job.output.key.class</name>

                    <value>org.apache.hadoop.io.Text</value>

                </property>



                <!-- 指定MR的輸出的value的類型-->

                <property>

                    <name>mapreduce.job.output.value.class</name>

                    <value>org.apache.hadoop.io.IntWritable</value>

                </property>



                <!-- 指定輸入路徑 -->

                <property>

                    <name>mapred.input.dir</name>

                    <value>${nameNode}/${inputdir}</value>

                </property>



                <!-- 指定輸出路徑 -->

                <property>

                    <name>mapred.output.dir</name>

                    <value>${nameNode}/${outputDir}</value>

                </property>



                <!-- 指定執行的map類 -->

                <property>

                    <name>mapreduce.job.map.class</name>

                    <value>org.apache.hadoop.examples.WordCount$TokenizerMapper</value>

                </property>



                <!-- 指定執行的reduce類 -->

                <property>

                    <name>mapreduce.job.reduce.class</name>

                    <value>org.apache.hadoop.examples.WordCount$IntSumReducer</value>

                </property>

<!--  配置map task的個數 -->

                <property>

                    <name>mapred.map.tasks</name>

                    <value>1</value>

                </property>



            </configuration>

        </map-reduce>

        <ok to="end"/>

        <error to="fail"/>

    </action>

    <kill name="fail">

        <message>Map/Reduce failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>

    </kill>

    <end name="end"/>

</workflow-app>

Step 6: Upload the job to the corresponding HDFS directory

cd /export/servers/oozie-4.1.0-cdh5.14.0/oozie_works

hdfs dfs -put map-reduce/ /user/root/oozie_works/

 

Step 7: Run the scheduled job

Run the job, then check the result through Oozie's web UI on port 11000:

cd /export/servers/oozie-4.1.0-cdh5.14.0

bin/oozie job -oozie http://bd001:11000/oozie -config oozie_works/map-reduce/job.properties -run

 

 

Chaining Oozie tasks

In real work there are usually multiple tasks to run, often with one task's output serving as the next task's input, so we configure multiple actions in workflow.xml to express the dependencies between tasks.

Requirement: first run a shell script, then an MR program, and finally a Hive program.

Step 1: Prepare the working directory

cd /export/servers/oozie-4.1.0-cdh5.14.0/oozie_works

mkdir -p sereval-actions

 

Step 2: Prepare the scheduling files

Chain the earlier Hive, shell, and MR jobs into a single workflow; gather the resource files:

cd /export/servers/oozie-4.1.0-cdh5.14.0/oozie_works

cp hive2/script.q    sereval-actions/

cp shell/hello.sh    sereval-actions/

cp -ra map-reduce/lib    sereval-actions/

 

Step 3: Write the scheduling configuration

cd /export/servers/oozie-4.1.0-cdh5.14.0/oozie_works/sereval-actions

Create the workflow.xml configuration file and edit it:

vim workflow.xml
<workflow-app xmlns="uri:oozie:workflow:0.4" name="shell-wf">

<start to="shell-node"/>

<action name="shell-node">

    <shell xmlns="uri:oozie:shell-action:0.2">

        <job-tracker>${jobTracker}</job-tracker>

        <name-node>${nameNode}</name-node>

        <configuration>

            <property>

                <name>mapred.job.queue.name</name>

                <value>${queueName}</value>

            </property>

        </configuration>

        <exec>${EXEC}</exec>

        <!-- <argument>my_output=Hello Oozie</argument> -->

        <file>/user/root/oozie_works/sereval-actions/${EXEC}#${EXEC}</file>



        <capture-output/>

    </shell>

    <ok to="mr-node"/>

    <error to="mr-node"/>

</action>









<action name="mr-node">

        <map-reduce>

            <job-tracker>${jobTracker}</job-tracker>

            <name-node>${nameNode}</name-node>

            <prepare>

                <delete path="${nameNode}/${outputDir}"/>

            </prepare>

            <configuration>

                <property>

                    <name>mapred.job.queue.name</name>

                    <value>${queueName}</value>

                </property>

<!--  

                <property>

                    <name>mapred.mapper.class</name>

                    <value>org.apache.oozie.example.SampleMapper</value>

                </property>

                <property>

                    <name>mapred.reducer.class</name>

                    <value>org.apache.oozie.example.SampleReducer</value>

                </property>

                <property>

                    <name>mapred.map.tasks</name>

                    <value>1</value>

                </property>

                <property>

                    <name>mapred.input.dir</name>

                    <value>/user/${wf:user()}/${examplesRoot}/input-data/text</value>

                </property>

                <property>

                    <name>mapred.output.dir</name>

                    <value>/user/${wf:user()}/${examplesRoot}/output-data/${outputDir}</value>

                </property>

-->



   <!-- 開啓使用新的API來進行配置 -->

                <property>

                    <name>mapred.mapper.new-api</name>

                    <value>true</value>

                </property>



                <property>

                    <name>mapred.reducer.new-api</name>

                    <value>true</value>

                </property>



                <!-- 指定MR的輸出key的類型 -->

                <property>

                    <name>mapreduce.job.output.key.class</name>

                    <value>org.apache.hadoop.io.Text</value>

                </property>



                <!-- 指定MR的輸出的value的類型-->

                <property>

                    <name>mapreduce.job.output.value.class</name>

                    <value>org.apache.hadoop.io.IntWritable</value>

                </property>



                <!-- 指定輸入路徑 -->

                <property>

                    <name>mapred.input.dir</name>

                    <value>${nameNode}/${inputdir}</value>

                </property>



                <!-- 指定輸出路徑 -->

                <property>

                    <name>mapred.output.dir</name>

                    <value>${nameNode}/${outputDir}</value>

                </property>



                <!-- 指定執行的map類 -->

                <property>

                    <name>mapreduce.job.map.class</name>

                    <value>org.apache.hadoop.examples.WordCount$TokenizerMapper</value>

                </property>



                <!-- 指定執行的reduce類 -->

                <property>

                    <name>mapreduce.job.reduce.class</name>

                    <value>org.apache.hadoop.examples.WordCount$IntSumReducer</value>

                </property>

<!--  配置map task的個數 -->

                <property>

                    <name>mapred.map.tasks</name>

                    <value>1</value>

                </property>



            </configuration>

        </map-reduce>

        <ok to="hive2-node"/>

        <error to="fail"/>

    </action>













 <action name="hive2-node">

        <hive2 xmlns="uri:oozie:hive2-action:0.1">

            <job-tracker>${jobTracker}</job-tracker>

            <name-node>${nameNode}</name-node>

            <prepare>

                <delete path="${nameNode}/user/${wf:user()}/${examplesRoot}/output-data/hive2"/>

                <mkdir path="${nameNode}/user/${wf:user()}/${examplesRoot}/output-data"/>

            </prepare>

            <configuration>

                <property>

                    <name>mapred.job.queue.name</name>

                    <value>${queueName}</value>

                </property>

            </configuration>

            <jdbc-url>${jdbcURL}</jdbc-url>

            <script>script.q</script>

            <param>INPUT=/user/${wf:user()}/${examplesRoot}/input-data/table</param>

            <param>OUTPUT=/user/${wf:user()}/${examplesRoot}/output-data/hive2</param>

        </hive2>

        <ok to="end"/>

        <error to="fail"/>

    </action>

<decision name="check-output">

    <switch>

        <case to="end">

            ${wf:actionData('shell-node')['my_output'] eq 'Hello Oozie'}

        </case>

        <default to="fail-output"/>

    </switch>

</decision>

<kill name="fail">

    <message>Shell action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>

</kill>

<kill name="fail-output">

    <message>Incorrect output, expected [Hello Oozie] but was [${wf:actionData('shell-node')['my_output']}]</message>

</kill>

<end name="end"/>

</workflow-app>

 

Create the job.properties file:

cd /export/servers/oozie-4.1.0-cdh5.14.0/oozie_works/sereval-actions

vim  job.properties
nameNode=hdfs://bd001:8020

jobTracker=bd001:8032

queueName=default

examplesRoot=oozie_works

EXEC=hello.sh

outputDir=/oozie/output

inputdir=/oozie/input

jdbcURL=jdbc:hive2://bd001:10000/default

oozie.use.system.libpath=true

# HDFS path where the job files are uploaded; effectively /user/root/oozie_works/sereval-actions

oozie.wf.application.path=${nameNode}/user/${user.name}/${examplesRoot}/sereval-actions/workflow.xml

 

Step 4: Upload the resource folder to the corresponding HDFS path

cd /export/servers/oozie-4.1.0-cdh5.14.0/oozie_works/

hdfs dfs -put sereval-actions/ /user/root/oozie_works/

Step 5: Run the scheduled job

cd /export/servers/oozie-4.1.0-cdh5.14.0/

bin/oozie job -oozie http://bd001:11000/oozie -config oozie_works/sereval-actions/job.properties -run

 

Setting up scheduled (cron) jobs in Oozie

Step 1: Copy the cron job template

cd /export/servers/oozie-4.1.0-cdh5.14.0

cp -r examples/apps/cron oozie_works/cron-job

 

Step 2: Copy the hello.sh script

cd /export/servers/oozie-4.1.0-cdh5.14.0/oozie_works

cp shell/hello.sh  cron-job/

 

Step 3: Modify the configuration files

Modify job.properties:

cd /export/servers/oozie-4.1.0-cdh5.14.0/oozie_works/cron-job

vim job.properties
nameNode=hdfs://node01:8020

jobTracker=node01:8032

queueName=default

examplesRoot=oozie_works



oozie.coord.application.path=${nameNode}/user/${user.name}/${examplesRoot}/cron-job/coordinator.xml

start=2018-08-22T19:20+0800

end=2019-08-22T19:20+0800

EXEC=hello.sh

workflowAppUri=${nameNode}/user/${user.name}/${examplesRoot}/cron-job/workflow.xml

Modify coordinator.xml:

vim coordinator.xml
<!--

oozie的frequency 可以支持很多表達式,其中可以通過定時每分,或者每小時,或者每天,或者每月進行執行,也支持可以通過與linux的crontab表達式類似的寫法來進行定時任務的執行

例如frequency 也可以寫成以下方式

frequency="10 9 * * *"  每天上午的09:10:00開始執行任務

frequency="0 1 * * *"  每天凌晨的01:00開始執行任務

 -->

<coordinator-app name="cron-job" frequency="${coord:minutes(1)}" start="${start}" end="${end}" timezone="GMT+0800"

                 xmlns="uri:oozie:coordinator:0.4">

        <action>

        <workflow>

            <app-path>${workflowAppUri}</app-path>

            <configuration>

                <property>

                    <name>jobTracker</name>

                    <value>${jobTracker}</value>

                </property>

                <property>

                    <name>nameNode</name>

                    <value>${nameNode}</value>

                </property>

                <property>

                    <name>queueName</name>

                    <value>${queueName}</value>

                </property>

            </configuration>

        </workflow>

    </action>

</coordinator-app>

Modify workflow.xml:

vim workflow.xml
<workflow-app xmlns="uri:oozie:workflow:0.5" name="one-op-wf">

    <start to="action1"/>

    <action name="action1">

    <shell xmlns="uri:oozie:shell-action:0.2">

        <job-tracker>${jobTracker}</job-tracker>

        <name-node>${nameNode}</name-node>

        <configuration>

            <property>

                <name>mapred.job.queue.name</name>

                <value>${queueName}</value>

            </property>

        </configuration>

        <exec>${EXEC}</exec>

        <!-- <argument>my_output=Hello Oozie</argument> -->

        <file>/user/root/oozie_works/cron-job/${EXEC}#${EXEC}</file>



        <capture-output/>

    </shell>

    <ok to="end"/>

    <error to="end"/>

</action>

    <end name="end"/>

</workflow-app>

 

Step 4: Upload to the corresponding HDFS path

cd /export/servers/oozie-4.1.0-cdh5.14.0/oozie_works

hdfs dfs -put cron-job/ /user/root/oozie_works/

Step 5: Run the cron job

cd /export/servers/oozie-4.1.0-cdh5.14.0

bin/oozie job -oozie http://node03:11000/oozie -config oozie_works/cron-job/job.properties -run
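A coordinator job keeps triggering until its end time is reached; if needed, it can be listed and stopped from the CLI (the coordinator job ID below is a placeholder):

bin/oozie jobs -oozie http://node03:11000/oozie -jobtype coordinator
bin/oozie job -oozie http://node03:11000/oozie -kill <coord-job-id>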

 
