YARN Capacity Scheduler in Practice

Overview

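The Capacity Scheduler is one of the more widely used scheduling modes in production, so today I'll try it out on a single machine. I'm using Hadoop 3.1.2 and Spark 2.4.3.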

Configuration

  • yarn-site.xml
    The key setting here is the yarn.resourcemanager.scheduler.class property.
<configuration>
    <property>
        <name>yarn.resourcemanager.hostname</name>
        <value>dc-sit-225</value>
    </property>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <property>  
        <name>yarn.resourcemanager.scheduler.class</name>  
        <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler</value>  
    </property>
    <property>
        <name>yarn.resourcemanager.webapp.address</name>
        <value>0.0.0.0:8081</value>
    </property>
    <property>
        <name>yarn.nodemanager.resource.memory-mb</name>
        <value>18432</value>
        <description>Memory available on each node, in MB. The default is 8 GB; raised to 18 GB here.</description>
    </property>
    <property>
        <name>yarn.scheduler.minimum-allocation-mb</name>
        <value>1024</value>
        <description>Minimum memory a single task can request. The default is 1024 MB.</description>
    </property>
    <property>
        <name>yarn.scheduler.maximum-allocation-mb</name>
        <value>16384</value>
        <description>Maximum memory a single task can request. The default is 8192 MB.</description>
    </property>
</configuration>
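
To make the two allocation limits above more concrete, here is a minimal sketch of how a memory request is typically normalized: rounded up to a multiple of the minimum allocation, and refused if it exceeds the maximum. This is a simplified model written for illustration, not YARN's actual code; the function name is hypothetical and the real behaviour can vary with the scheduler, the resource calculator, and the Hadoop version.

```python
import math

# Values taken from the yarn-site.xml above.
MIN_ALLOCATION_MB = 1024    # yarn.scheduler.minimum-allocation-mb
MAX_ALLOCATION_MB = 16384   # yarn.scheduler.maximum-allocation-mb


def normalize_request_mb(requested_mb: int) -> int:
    """Simplified model: round a request up to a multiple of the minimum.

    Hypothetical helper for illustration only; it is not part of any
    Hadoop API.
    """
    if requested_mb > MAX_ALLOCATION_MB:
        # In this model, requests above the maximum are refused outright.
        raise ValueError(
            f"{requested_mb} MB exceeds the maximum allocation "
            f"of {MAX_ALLOCATION_MB} MB"
        )
    # At least one allocation unit is always granted.
    units = max(1, math.ceil(requested_mb / MIN_ALLOCATION_MB))
    return units * MIN_ALLOCATION_MB


# A 1 GB executor plus 384 MB of overhead does not fit in one unit,
# so under this model it occupies two.
print(normalize_request_mb(1024 + 384))
```

Under this model, raising yarn.scheduler.minimum-allocation-mb makes every container coarser, which is worth keeping in mind on an 18 GB node.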

  • capacity-scheduler.xml
    We configure three queues: default, api, and dev.
<?xml version="1.0" encoding="UTF-8"?>
<configuration> 
    <property> 
        <name>yarn.scheduler.capacity.root.queues</name>  
        <value>default,api,dev</value> 
    </property>  
    <property> 
        <name>yarn.scheduler.capacity.root.capacity</name>  
        <value>100</value> 
    </property>  
    <property> 
        <name>yarn.scheduler.capacity.root.acl_administer_queue</name>  
        <value>root</value> 
    </property>  
    <property> 
        <name>yarn.scheduler.capacity.root.acl_submit_applications</name>  
        <value>root</value> 
    </property>
    <property> 
        <name>yarn.scheduler.capacity.root.default.capacity</name> 
        <value>30</value> 
    </property>  
    <property> 
        <name>yarn.scheduler.capacity.root.default.maximum-capacity</name>  
        <value>35</value> 
    </property>  
    <property> 
        <name>yarn.scheduler.capacity.root.api.capacity</name>  
        <value>45</value> 
    </property>  
    <property> 
        <name>yarn.scheduler.capacity.root.api.maximum-capacity</name>  
        <value>50</value> 
    </property>  
    <property> 
        <name>yarn.scheduler.capacity.root.dev.capacity</name>  
        <value>25</value> 
    </property>  
    <property> 
        <name>yarn.scheduler.capacity.root.dev.maximum-capacity</name>  
        <value>30</value> 
    </property>  
    <property> 
        <name>yarn.scheduler.capacity.root.api.acl_administer_queue</name>  
        <value>root,hadoop1</value> 
    </property>  
    <property> 
        <name>yarn.scheduler.capacity.root.api.acl_submit_applications</name>  
        <value>root,hadoop1</value> 
    </property>  
    <property> 
        <name>yarn.scheduler.capacity.root.dev.acl_administer_queue</name>  
        <value>root,hadoop2</value> 
    </property>  
    <property> 
        <name>yarn.scheduler.capacity.root.dev.acl_submit_applications</name>  
        <value>root,hadoop2</value> 
    </property>  
    <property> 
        <name>yarn.scheduler.capacity.resource-calculator</name>  
        <value>org.apache.hadoop.yarn.util.resource.DominantResourceCalculator</value> 
    </property>  
</configuration>
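
Since the capacities of the queues under a parent are expected to add up to 100, a quick sanity check before restarting the ResourceManager can save a round trip through the logs. The sketch below parses a trimmed copy of the configuration above and checks the sum, plus that each maximum-capacity is not below its capacity. The helper names are hypothetical, and the checks are a small subset of what the scheduler itself validates.

```python
import xml.etree.ElementTree as ET

PREFIX = "yarn.scheduler.capacity.root"

# Trimmed copy of the capacity-scheduler.xml shown above (ACLs omitted).
SAMPLE_XML = """<configuration>
  <property><name>yarn.scheduler.capacity.root.queues</name><value>default,api,dev</value></property>
  <property><name>yarn.scheduler.capacity.root.default.capacity</name><value>30</value></property>
  <property><name>yarn.scheduler.capacity.root.default.maximum-capacity</name><value>35</value></property>
  <property><name>yarn.scheduler.capacity.root.api.capacity</name><value>45</value></property>
  <property><name>yarn.scheduler.capacity.root.api.maximum-capacity</name><value>50</value></property>
  <property><name>yarn.scheduler.capacity.root.dev.capacity</name><value>25</value></property>
  <property><name>yarn.scheduler.capacity.root.dev.maximum-capacity</name><value>30</value></property>
</configuration>"""


def load_properties(xml_text):
    """Return the <property> entries of a Hadoop-style config as a dict."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.iter("property"):
        name = (prop.findtext("name") or "").strip()
        value = (prop.findtext("value") or "").strip()
        props[name] = value
    return props


def check_queues(props):
    """Return a list of human-readable problems (empty if none found)."""
    problems = []
    queues = [q.strip() for q in props[f"{PREFIX}.queues"].split(",")]
    total = 0.0
    for queue in queues:
        capacity = float(props[f"{PREFIX}.{queue}.capacity"])
        # Assumption: an unset maximum-capacity is treated as 100 here.
        maximum = float(props.get(f"{PREFIX}.{queue}.maximum-capacity", 100))
        total += capacity
        if maximum < capacity:
            problems.append(
                f"{queue}: maximum-capacity {maximum} is below capacity {capacity}"
            )
    if abs(total - 100.0) > 1e-6:
        problems.append(f"child capacities sum to {total}, expected 100")
    return problems


# An empty list means neither check found a problem.
print(check_queues(load_properties(SAMPLE_XML)))
```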

  • sbin/start-yarn.sh and sbin/stop-yarn.sh
    If you start YARN as root, add the following at the top of both files:
YARN_RESOURCEMANAGER_USER=root
HADOOP_SECURE_DN_USER=yarn
YARN_NODEMANAGER_USER=root

Testing

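Start YARN with sbin/start-yarn.sh, then use jps to check that ResourceManager and NodeManager are both running. If either is missing, check the corresponding log. Next, open the YARN web UI; the default port is 8088, but I configured 8081 here. Click Scheduler: if root shows three child queues, the configuration has been loaded correctly.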
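After that, you can use spark-shell's --queue option to submit to a specific YARN queue. Note that the yarn-client master used here is deprecated, as the warning in the output says; --master yarn --deploy-mode client is the current form.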

[hadoop1@dc-sit-225 spark-2.4.3]$ bin/spark-shell --master yarn-client --queue dev
Warning: Master yarn-client is deprecated since 2.0. Please use master "yarn" with specified deploy mode instead.
2020-04-01 16:20:13,878 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
2020-04-01 16:20:18,334 WARN yarn.Client: Neither spark.yarn.jars nor spark.yarn.archive is set, falling back to uploading libraries under SPARK_HOME.
Spark context Web UI available at http://dc-sit-225:4040
Spark context available as 'sc' (master = yarn, app id = application_1585729093592_0002).
Spark session available as 'spark'.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.4.3
      /_/
         
Using Scala version 2.12.8 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_171)
Type in expressions to have them evaluated.
Type :help for more information.

scala> :quit
[hadoop1@dc-sit-225 spark-2.4.3]$

While the shell is running, the UI shows an application in the dev queue, which confirms that everything works.

Exceptions

Here are the main exceptions I ran into along the way.

  • Exception 1
2020-04-01 15:09:25,370 INFO org.apache.hadoop.conf.Configuration: found resource capacity-scheduler.xml at file:/data/server/hadoop-3.1.2/etc/hadoop/capacity-scheduler.xml
2020-04-01 15:09:25,380 ERROR org.apache.hadoop.conf.Configuration: error parsing conf java.io.BufferedInputStream@7ceb3185
com.ctc.wstx.exc.WstxParsingException: Illegal processing instruction target ("xml"); xml (case insensitive) is reserved by the specs.
 at [row,col {unknown-source}]: [2,5]
        at com.ctc.wstx.sr.StreamScanner.constructWfcException(StreamScanner.java:621)
        at com.ctc.wstx.sr.StreamScanner.throwParseError(StreamScanner.java:491)
        at com.ctc.wstx.sr.BasicStreamReader.readPIPrimary(BasicStreamReader.java:4019)
        at com.ctc.wstx.sr.BasicStreamReader.nextFromProlog(BasicStreamReader.java:2141)
        at com.ctc.wstx.sr.BasicStreamReader.next(BasicStreamReader.java:1181)
        at org.apache.hadoop.conf.Configuration$Parser.parseNext(Configuration.java:3277)
        at org.apache.hadoop.conf.Configuration$Parser.parse(Configuration.java:3071)
        at org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:2964)
        at org.apache.hadoop.conf.Configuration.loadResources(Configuration.java:2930)
        at org.apache.hadoop.conf.Configuration.getProps(Configuration.java:2805)
        at org.apache.hadoop.conf.Configuration.<init>(Configuration.java:822)
        at org.apache.hadoop.yarn.server.resourcemanager.reservation.ReservationSchedulerConfiguration.<init>(ReservationSchedulerConfiguration.java:64)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacitySchedulerConfiguration.<init>(CapacitySchedulerConfiguration.java:374)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.conf.FileBasedCSConfigurationProvider.loadConfiguration(FileBasedCSConfigurationProvider.java:60)

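This happened because I had copied the configuration from the web, and stray whitespace (odd space encodings or tab characters) probably came along with it. The parser reports the problem at row 2, column 5, which suggests that the <?xml ...?> declaration was no longer at the very start of the file; an XML declaration is only valid there. After I removed the extra whitespace from every line and reformatted the file by hand, it loaded normally.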
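
A check like the following can catch this kind of problem before the ResourceManager does. It is a minimal sketch with a hypothetical helper name: it looks for anything other than an optional UTF-8 byte order mark in front of the XML declaration, which is the situation the parser error above points to.

```python
import codecs


def find_declaration_problem(raw: bytes):
    """Describe what precedes the XML declaration, or return None if clean.

    Hypothetical helper for illustration. A leading UTF-8 byte order mark
    is skipped, since XML parsers generally accept it.
    """
    if raw.startswith(codecs.BOM_UTF8):
        raw = raw[len(codecs.BOM_UTF8):]
    index = raw.find(b"<?xml")
    if index <= 0:
        # -1: no declaration at all (allowed); 0: declaration at the start.
        return None
    return f"{index} byte(s) before the XML declaration: {raw[:index]!r}"


good = b'<?xml version="1.0" encoding="UTF-8"?>\n<configuration/>'
bad = b'\n<?xml version="1.0" encoding="UTF-8"?>\n<configuration/>'

print(find_declaration_problem(good))
print(find_declaration_problem(bad))
```

To check a real file, read it in binary mode, for example with open(path, "rb").read(), so that invisible leading bytes are not hidden by text decoding.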

  • Exception 2
2020-04-01 16:14:03,503 ERROR org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices: Failed to initialize spark2_shuffle
java.lang.RuntimeException: No class defined for spark2_shuffle
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices.serviceInit(AuxServices.java:274)
        at org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
        at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:108)
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.serviceInit(ContainerManagerImpl.java:318)
        at org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
        at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:108)
        at org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:477)
        at org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
        at org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:933)
        at org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:1013)

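This happened because I had also listed spark2_shuffle in the yarn.nodemanager.aux-services property; I remembered needing it with earlier Hadoop versions. The error says that no implementation class was defined for that service: each name listed in yarn.nodemanager.aux-services needs a matching yarn.nodemanager.aux-services.<name>.class property. For this test, mapreduce_shuffle alone seems to be enough.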
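
The rule behind this error can be checked mechanically. The sketch below lists every auxiliary service that has no matching class property in a yarn-site.xml. The helper name is hypothetical, the sample value mapreduce_shuffle,spark2_shuffle is a reconstruction of my broken setting rather than a copy of it, and the table of built-in defaults assumes that yarn-default.xml already supplies the class for mapreduce_shuffle.

```python
import xml.etree.ElementTree as ET

AUX_SERVICES_KEY = "yarn.nodemanager.aux-services"

# Assumption: yarn-default.xml ships a class for mapreduce_shuffle,
# so it does not need to be repeated in yarn-site.xml.
DEFAULT_CLASSES = {"mapreduce_shuffle": "org.apache.hadoop.mapred.ShuffleHandler"}

# Reconstructed example of the misconfiguration, for illustration only.
SAMPLE_YARN_SITE = """<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle,spark2_shuffle</value>
  </property>
</configuration>"""


def missing_aux_service_classes(yarn_site_xml: str) -> list:
    """Return aux-service names that have no '.class' property defined."""
    root = ET.fromstring(yarn_site_xml)
    props = {}
    for prop in root.iter("property"):
        name = (prop.findtext("name") or "").strip()
        props[name] = (prop.findtext("value") or "").strip()

    services = [
        s.strip() for s in props.get(AUX_SERVICES_KEY, "").split(",") if s.strip()
    ]
    return [
        s
        for s in services
        if f"{AUX_SERVICES_KEY}.{s}.class" not in props and s not in DEFAULT_CLASSES
    ]


print(missing_aux_service_classes(SAMPLE_YARN_SITE))
```

Any name this reports either needs its class property (and the jar that provides the class on the NodeManager classpath) or should be removed from the list.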

  • Exception 3
Warning: Master yarn-client is deprecated since 2.0. Please use master "yarn" with specified deploy mode instead.
2020-04-01 16:18:32,393 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
2020-04-01 16:18:36,924 WARN yarn.Client: Neither spark.yarn.jars nor spark.yarn.archive is set, falling back to uploading libraries under SPARK_HOME.

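After submitting to YARN, spark-shell hung at this point, whereas Spark's own master worked fine, and neither the ResourceManager log nor the Spark master log showed any errors. After some digging we found that the NodeManager had never started properly, so YARN had no node on which to place the job. Running yarn node -all -list showed a total of 0 nodes. The lesson: it really is worth testing after every step. The NodeManager failed to start because of the spark2_shuffle setting described in Exception 2.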

Conclusion

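The setup above defines multiple queues and submits to a specific one, which provides resource isolation between queues. Isolation at the user level, however, still requires an authentication component such as Kerberos. Note also that queue ACLs are only enforced when yarn.acl.enable is set to true, which the yarn-site.xml above does not do; that is presumably why hadoop1 could submit to the dev queue in the test.
Configuration reference: https://www.cnblogs.com/xiaodf/p/6266201.html#221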
