All About Spark on YARN: Part 1

Original

蜡笔小吴

2018-09-04 10:42

1. How are resources used by a Spark application running on YARN?

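Ignoring Spark's dynamic resource allocation, the resources of a Spark application fall into two parts: the driver and the executors. Client mode and cluster mode are described separately below.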

Client mode:

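The Spark driver starts on the local machine, while the YARN Application Master starts on some node in the cluster. Because the driver JVM is already running by the time application code executes, the driver's resources must be set at launch time (on the command line or in the default properties file). The AM only handles resource management, i.e. requesting executor containers from YARN.

Driver resources (the driver is a local JVM process that does not run in a container, so its CPU usage cannot be isolated):

--driver-memory (or spark.driver.memory)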

AM resources:

spark.yarn.am.cores

spark.yarn.am.memory

spark.yarn.am.memoryOverhead

Executor resources:

spark.executor.cores

spark.executor.memory

spark.yarn.executor.memoryOverhead

spark.executor.instances

Therefore, the resources used by one Spark application are:

cores = spark.yarn.am.cores + spark.executor.cores * spark.executor.instances

memory = spark.yarn.am.memory + spark.yarn.am.memoryOverhead + (spark.executor.memory + spark.yarn.executor.memoryOverhead) * spark.executor.instances + --driver-memory
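The client-mode formula can be expressed as a small Scala function. This is a sketch under stated assumptions: the object, case class, and field names are hypothetical, all memory values are in MB, and when an overhead is not set explicitly it falls back to max(384 MB, 10% of the heap), which is the Spark 2.x default on YARN. It also ignores YARN rounding each container up to the scheduler's allocation step, so the real reservation can be somewhat larger.

```scala
// Hypothetical model of the client-mode formula; all memory values in MB.
object ClientModeResources {
  // Assumed Spark 2.x default when a memoryOverhead setting is unset:
  // max(384 MB, 10% of the JVM heap).
  def defaultOverhead(heapMb: Long): Long = math.max(384L, (heapMb * 0.10).toLong)

  final case class ClientModeConf(
      driverMemoryMb: Long,             // --driver-memory (local JVM, not a YARN container)
      amCores: Int,                     // spark.yarn.am.cores
      amMemoryMb: Long,                 // spark.yarn.am.memory
      amOverheadMb: Option[Long],       // spark.yarn.am.memoryOverhead (None = default)
      executorCores: Int,               // spark.executor.cores
      executorMemoryMb: Long,           // spark.executor.memory
      executorOverheadMb: Option[Long], // spark.yarn.executor.memoryOverhead (None = default)
      executorInstances: Int            // spark.executor.instances
  )

  // Cores: the driver runs locally, so only the AM and the executors are counted.
  def totalCores(c: ClientModeConf): Int =
    c.amCores + c.executorCores * c.executorInstances

  // Memory: AM container + all executor containers + the local driver heap.
  def totalMemoryMb(c: ClientModeConf): Long = {
    val amOverhead = c.amOverheadMb.getOrElse(defaultOverhead(c.amMemoryMb))
    val execOverhead = c.executorOverheadMb.getOrElse(defaultOverhead(c.executorMemoryMb))
    c.amMemoryMb + amOverhead +
      (c.executorMemoryMb + execOverhead) * c.executorInstances +
      c.driverMemoryMb
  }
}
```

For example, a 2048 MB local driver, a 512 MB AM with default overhead (384 MB), and ten executors with 4 cores and 4096 MB each (default overhead 409 MB) add up to 41 cores and 47994 MB under this model.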

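Cluster mode:

The Spark driver and the YARN Application Master run in the same JVM, so the driver's resource parameters also control the AM's resources. Setting spark.yarn.submit.waitAppCompletion to false makes the Spark client (which runs in a local JVM) exit as soon as the application is submitted, so its resource usage is not considered below:

Driver (AM) resources: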

spark.driver.cores

spark.driver.memory

spark.yarn.driver.memoryOverhead

Executor resources:

spark.executor.cores

spark.executor.memory

spark.yarn.executor.memoryOverhead

spark.executor.instances

Therefore, the resources used by one Spark application are:

cores = spark.driver.cores + spark.executor.cores * spark.executor.instances

memory = spark.driver.memory + spark.yarn.driver.memoryOverhead + (spark.executor.memory + spark.yarn.executor.memoryOverhead) * spark.executor.instances
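The cluster-mode formula differs only in that the driver and AM share one container and there is no local driver term. Below is a matching Scala sketch; again the names are hypothetical, memory is in MB, and an unset overhead is assumed to default to max(384 MB, 10% of the heap) as in Spark 2.x.

```scala
// Hypothetical model of the cluster-mode formula; all memory values in MB.
object ClusterModeResources {
  // Assumed Spark 2.x default overhead: max(384 MB, 10% of the JVM heap).
  def defaultOverhead(heapMb: Long): Long = math.max(384L, (heapMb * 0.10).toLong)

  final case class ClusterModeConf(
      driverCores: Int,                 // spark.driver.cores (driver and AM share one container)
      driverMemoryMb: Long,             // spark.driver.memory
      driverOverheadMb: Option[Long],   // spark.yarn.driver.memoryOverhead (None = default)
      executorCores: Int,               // spark.executor.cores
      executorMemoryMb: Long,           // spark.executor.memory
      executorOverheadMb: Option[Long], // spark.yarn.executor.memoryOverhead (None = default)
      executorInstances: Int            // spark.executor.instances
  )

  def totalCores(c: ClusterModeConf): Int =
    c.driverCores + c.executorCores * c.executorInstances

  // The submitting client is assumed to exit early
  // (spark.yarn.submit.waitAppCompletion=false), so it is not counted.
  def totalMemoryMb(c: ClusterModeConf): Long = {
    val driverOverhead = c.driverOverheadMb.getOrElse(defaultOverhead(c.driverMemoryMb))
    val execOverhead = c.executorOverheadMb.getOrElse(defaultOverhead(c.executorMemoryMb))
    c.driverMemoryMb + driverOverhead +
      (c.executorMemoryMb + execOverhead) * c.executorInstances
  }
}
```

With a 2-core, 4096 MB driver (default overhead 409 MB) and ten 4-core, 4096 MB executors with an explicit 1024 MB overhead, this model gives 42 cores and 55705 MB.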

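To sum up:

In client mode, the AM and the executors run in YARN containers; in cluster mode, the AM (sharing a JVM with the Spark driver) and the executors run in YARN containers. In both cases, the processes running in containers benefit from YARN's container resource isolation.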

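2. What are the resource limits for a program running in a YARN container?

First, the resources a program running in a YARN container can use are bounded by the container limits:

Memory per container: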

yarn.scheduler.minimum-allocation-mb

yarn.scheduler.maximum-allocation-mb

vcores per container:

yarn.scheduler.minimum-allocation-vcores

yarn.scheduler.maximum-allocation-vcores
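To see how these bounds apply to a single memory request, here is a simplified Scala sketch. It assumes behavior resembling the CapacityScheduler's default: a request above the maximum is rejected, and any other request is raised to at least the minimum and rounded up to a multiple of it. Other schedulers (for example the FairScheduler, which uses its own increment setting) round differently, and the object and function names are hypothetical.

```scala
// Simplified, hypothetical model of checking a container memory request against
// yarn.scheduler.minimum-allocation-mb and yarn.scheduler.maximum-allocation-mb.
object ContainerRequest {
  // Returns Some(grantedMb) if the request is accepted, None if it exceeds the maximum.
  def normalizeMemoryMb(requestMb: Long, minAllocMb: Long, maxAllocMb: Long): Option[Long] = {
    require(minAllocMb > 0 && maxAllocMb >= minAllocMb, "invalid scheduler bounds")
    if (requestMb > maxAllocMb) None
    else {
      val atLeastMin = math.max(requestMb, minAllocMb)
      // Round up to the next multiple of the minimum allocation.
      val rounded = ((atLeastMin + minAllocMb - 1) / minAllocMb) * minAllocMb
      // Never grant more than the configured maximum.
      Some(math.min(rounded, maxAllocMb))
    }
  }
}
```

Under these assumptions, an executor asking for 4096 MB heap plus 409 MB overhead (4505 MB) on a cluster with a 1024 MB minimum and 8192 MB maximum would get a 5120 MB container.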

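PS:

The resources on each physical node that the NodeManager can manage are limited by:

Total container memory:

yarn.nodemanager.resource.memory-mb

Total container vcores:

yarn.nodemanager.resource.cpu-vcores (this does not cap the CPU that processes managed by YARN actually use; it only declares how many vcores on this node the RM scheduler may allocate to containers)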

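These values must be at least as large as the resources requested by a single container; otherwise no container fits on the node.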
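A rough way to relate the per-node totals to container size is to count how many identical containers fit on one NodeManager. The Scala sketch below is hypothetical and ignores other applications and reserved resources. The `countVcores` flag reflects that the CapacityScheduler's default DefaultResourceCalculator schedules on memory only, while the DominantResourceCalculator also takes vcores into account.

```scala
// Hypothetical helper: upper bound on how many identical containers one
// NodeManager can host, given yarn.nodemanager.resource.memory-mb and
// yarn.nodemanager.resource.cpu-vcores.
object NodeCapacity {
  def maxContainers(nmMemoryMb: Long, nmVcores: Int,
                    containerMemoryMb: Long, containerVcores: Int,
                    countVcores: Boolean): Long = {
    require(containerMemoryMb > 0 && containerVcores > 0, "container size must be positive")
    val byMemory = nmMemoryMb / containerMemoryMb
    if (countVcores) math.min(byMemory, (nmVcores / containerVcores).toLong)
    else byMemory // memory-only scheduling
  }
}
```

A result of 0 means the node's totals are smaller than a single container, which is exactly the situation the requirement above rules out.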

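Second, a program running in a YARN container is also bounded by its own parameters. For example, a Spark process may require at least 512 MB of memory and be allocated 1 core.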

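3. How are the cores allocated to an executor used?

// Number of tasks one executor can run at the same time (Scala, from the Spark source)
private val tasksPerExecutor = conf.getInt("spark.executor.cores", 1) / conf.getInt("spark.task.cpus", 1)

An executor divides all of its cores among the tasks running on it, according to how many cores each task needs (spark.task.cpus). Because this is integer division, an executor runs at most spark.executor.cores / spark.task.cpus tasks at a time, rounded down.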
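The division above extends naturally to the whole application. In this Scala sketch (the object and function names are hypothetical), the per-executor slot count uses the same integer division as the Spark line, and the application-wide concurrency is that count times the number of executors.

```scala
// Hypothetical sketch of task slots per executor and for the whole application.
object TaskSlots {
  // Same integer division as tasksPerExecutor: leftover cores are not used for tasks.
  def tasksPerExecutor(executorCores: Int, taskCpus: Int): Int = {
    require(taskCpus > 0, "spark.task.cpus must be positive")
    executorCores / taskCpus
  }

  // Maximum number of tasks the application can run at the same time.
  def concurrentTasks(executorInstances: Int, executorCores: Int, taskCpus: Int): Int =
    executorInstances * tasksPerExecutor(executorCores, taskCpus)
}
```

For instance, with spark.executor.cores = 5 and spark.task.cpus = 2, each executor runs 2 tasks at once and one core is left over; ten such executors give 20 concurrent tasks.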

4. What does memory usage inside a container roughly look like?
