A Brief Analysis of Broadcast

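Three kinds of objects are involved: BroadcastManager, BroadcastFactory, and Broadcast.
- BroadcastManager is responsible for the global management of broadcasts.
- BroadcastFactory is responsible for creating and removing broadcasts.
- Broadcast represents one actual broadcast operation.

BroadcastManager wraps BroadcastFactory and manages its whole life cycle, from initialization to stop.
- During initialization, it creates a TorrentBroadcastFactory and sets initialized to true.
- At run time, it calls the factory's newBroadcast and unbroadcast methods to control the broadcasting of variables. Each broadcast gets a unique, increasing ID taken from nextBroadcastId.
- Finally, it is also responsible for shutting down the BroadcastFactory.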

Broadcast

How TorrentBroadcast works

/**
 * A BitTorrent-like implementation of [[org.apache.spark.broadcast.Broadcast]].
 *
 * The mechanism is as follows:
 *
 * The driver divides the serialized object into small chunks and
 * stores those chunks in the BlockManager of the driver.
 *
 * On each executor, the executor first attempts to fetch the object from its BlockManager. If
 * it does not exist, it then uses remote fetches to fetch the small chunks from the driver and/or
 * other executors if available. Once it gets the chunks, it puts the chunks in its own
 * BlockManager, ready for other executors to fetch from.
 *
 * This prevents the driver from being the bottleneck in sending out multiple copies of the
 * broadcast data (one per executor).
 *
 * When initialized, TorrentBroadcast objects read SparkEnv.get.conf.
 *
 * @param obj object to broadcast
 * @param id A unique identifier for the broadcast variable.
 */

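The mechanism is as follows:
- The driver divides the serialized object into small chunks and stores them in the driver's BlockManager.
- On each executor, the executor first tries to fetch the broadcast object from its own BlockManager. If it is not there, the executor uses remote fetches to get the chunks from the driver and/or other executors (if available). Once it has the chunks, it puts them in its own BlockManager, ready for other executors to fetch.

This comment explains the principle behind TorrentBroadcast. The key point is that it relies on the distributed structure of BlockManager to store and fetch the data chunks.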

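1. The driver splits the serialized object (the value) into many blocks and stores them in the driver's BlockManager.

2. On the executor side, the executor first tries to get the blocks of the broadcast variable from its own BlockManager. If a block is not there, it fetches the block remotely from the driver and/or other executors. When an executor has fetched a block, it puts the block into its own BlockManager so that other executors can fetch it.

3. This prevents the broadcast data from being copied only from the driver. With many copies to make (one per executor), the driver would otherwise easily become the bottleneck.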

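The driver splits the data into blocks and stores each one as a block in the driver's BlockManager. Every executor tries to fetch all of the blocks and assembles them into the broadcast variable.
To "fetch a block", an executor first looks in its own BlockManager; if the block is not there, it fetches it from another BlockManager.
At first the driver is the only source of these blocks. As the BlockManagers on the executors fetch different blocks from the driver (TorrentBroadcast deliberately avoids having every executor fetch the blocks in the same order), the number of sources grows. Each executor may then fetch a block from any one of several sources, including the BlockManagers of the driver and of other executors. This spreads the traffic more evenly across the cluster instead of leaving the driver as the only source.
That is the principle. The implementation of TorrentBroadcast has many interesting details that are worth a closer look.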
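
As a rough illustration of the chunking idea, here is a minimal sketch in plain Scala. It is not Spark's code: ChunkSketch, blockify, and fetchInRandomOrder are names made up for this example, and a simple array lookup stands in for a real block fetch. It only shows that a payload cut into fixed-size blocks can be collected in any order and still be reassembled by index.

```scala
import scala.util.Random

// Hypothetical sketch, not Spark's implementation.
object ChunkSketch {
  // Split a serialized payload into fixed-size blocks; the last block may be shorter.
  def blockify(bytes: Array[Byte], blockSize: Int): Vector[Array[Byte]] = {
    require(blockSize > 0, "blockSize must be positive")
    bytes.grouped(blockSize).toVector
  }

  // Visit the block indices in a shuffled order, as a reader might do to avoid
  // asking for the same block as everyone else, then reassemble by index.
  def fetchInRandomOrder(blocks: Vector[Array[Byte]], rng: Random): Array[Byte] = {
    val fetched = new Array[Array[Byte]](blocks.length)
    for (i <- rng.shuffle(blocks.indices.toList)) {
      fetched(i) = blocks(i) // stands in for a local or remote block fetch
    }
    fetched.flatten
  }
}
```

Because each block is written back at its own index, the shuffled fetch order does not affect the reassembled result.
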
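Broadcasting means sending data from one node to all the other nodes.
- Driver side:
  - The driver first serializes the data into a byteArray and then cuts it into data blocks of BLOCK_SIZE (set by spark.broadcast.blockSize, 4 MB by default).
  - After splitting, it stores the block information (called the meta information) in the driver's own blockManager with StorageLevel MEMORY_AND_DISK (memory plus disk).
  - At the same time it notifies the driver's blockManagerMaster that the meta information has been stored.
  - Notifying blockManagerMaster is an important step: blockManagerMaster is reachable from the driver and from all executors, so information registered with it becomes global.
  - It then stores each data block in the driver's blockManager, again with StorageLevel MEMORY_AND_DISK, and again notifies blockManagerMaster that the blocks have been stored. At this point the driver's work is done.
- Executor side:
  - After receiving a serialized task, the executor deserializes it. This also deserializes the broadcast variable contained in the task, whose type is TorrentBroadcast, which means calling TorrentBroadcast.readBroadcastBlock().
  - It first asks the blockManager of its own executor whether it already holds the data. If so, the data is read directly from the local blockManager.
  - Otherwise, it goes through the local blockManager to contact the driver's blockManagerMaster and obtain the meta information of the data blocks. With that information, the BT (BitTorrent-style) process begins.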
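
The executor-side lookup can be sketched as a toy simulation. Everything here is an assumption made for illustration: BlockStore is a hypothetical in-memory stand-in for one node's blockManager, fetchBlock is an invented helper, and the peer list is passed in directly where a real system would consult blockManagerMaster. The block ID string is only an example.

```scala
import scala.collection.mutable

// Hypothetical sketch, not Spark's implementation.
object FetchSketch {
  // In-memory stand-in for one node's block storage: block id -> bytes.
  final class BlockStore(val name: String) {
    private val blocks = mutable.Map.empty[String, Array[Byte]]
    def put(id: String, data: Array[Byte]): Unit = { blocks.update(id, data) }
    def get(id: String): Option[Array[Byte]] = blocks.get(id)
  }

  // Look in the local store first. Otherwise take the block from the first
  // peer that has it and cache it locally, so this node becomes a source too.
  def fetchBlock(id: String, local: BlockStore, peers: Seq[BlockStore]): Option[Array[Byte]] =
    local.get(id).orElse {
      val remote = peers.iterator.map(_.get(id)).collectFirst { case Some(data) => data }
      remote.foreach(data => local.put(id, data))
      remote
    }
}
```

After one executor has fetched a block from the driver, a second executor can obtain the same block from the first one, which is the effect described above.
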

BroadcastManager

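BroadcastManager stores configuration information and serialized RDDs, jobs, ShuffleDependency objects, and similar information locally. For fault tolerance, this information may also be replicated to other nodes. BroadcastManager is created as follows.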

    // BroadcastManager manages Broadcast; it is created in SparkEnv
    val broadcastManager = new BroadcastManager(isDriver, conf, securityManager)

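Besides the three member properties defined by its constructor, BroadcastManager has three more internal members:

initialized : whether BroadcastManager has finished initializing.
broadcastFactory : the broadcast factory instance.
nextBroadcastId : the broadcast ID for the next broadcast object, of type AtomicLong.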

BroadcastManager calls its own initialize method while it is being constructed. Once initialize has finished, BroadcastManager is ready for use.

  // Called by SparkContext or Executor before using Broadcast
  private def initialize() {
    synchronized {
      if (!initialized) {
        broadcastFactory = new TorrentBroadcastFactory
        broadcastFactory.initialize(isDriver, conf, securityManager)
        initialized = true
      }
    }
  }

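Notes on the code above:

initialize first checks whether BroadcastManager has already been initialized, which guarantees that it is initialized only once.
It creates a TorrentBroadcastFactory as the broadcast factory instance of BroadcastManager and then calls that factory's initialize method to initialize it.
Finally, it marks BroadcastManager itself as initialized.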
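
The initialize-once guard can be tried out in isolation. The sketch below is an assumption-laden toy, not Spark code: Manager and setupRuns are hypothetical names, and a counter stands in for creating and initializing the factory. It shows that a synchronized check of a flag lets the setup step run only once, even when several threads call initialize.

```scala
// Hypothetical sketch, not Spark's implementation.
object InitOnceSketch {
  final class Manager {
    private var initialized = false
    private var setupRuns = 0

    // Only the first caller performs the setup; later callers see the flag and return.
    def initialize(): Unit = synchronized {
      if (!initialized) {
        setupRuns += 1 // stands in for creating and initializing the factory
        initialized = true
      }
    }

    // Number of times the setup step actually ran.
    def runs: Int = synchronized(setupRuns)
  }
}
```
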

The three methods of BroadcastManager

  def stop() {
    broadcastFactory.stop()
  }

  def newBroadcast[T: ClassTag](value_ : T, isLocal: Boolean): Broadcast[T] = {
    broadcastFactory.newBroadcast[T](value_, isLocal, nextBroadcastId.getAndIncrement())
  }

  def unbroadcast(id: Long, removeFromDriver: Boolean, blocking: Boolean) {
    broadcastFactory.unbroadcast(id, removeFromDriver, blocking)
  }

Each of the three methods of BroadcastManager delegates to the corresponding method of TorrentBroadcastFactory.
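
To see the delegation and the ID counter without Spark on the classpath, here is a small self-contained sketch. Handle, Factory, Manager, and SimpleFactory are simplified, hypothetical stand-ins for Broadcast, BroadcastFactory, BroadcastManager, and TorrentBroadcastFactory; their signatures are reduced for the example and are not the real ones.

```scala
import java.util.concurrent.atomic.AtomicLong

// Hypothetical sketch, not Spark's implementation.
object DelegationSketch {
  // Simplified stand-in for a broadcast variable.
  final case class Handle[T](id: Long, value: T)

  // Simplified stand-in for the factory interface.
  trait Factory {
    def newBroadcast[T](value: T, id: Long): Handle[T]
  }

  // The manager owns the ID counter and forwards the actual work to the factory.
  final class Manager(factory: Factory) {
    private val nextBroadcastId = new AtomicLong(0)

    // getAndIncrement returns the current ID and advances the counter atomically.
    def newBroadcast[T](value: T): Handle[T] =
      factory.newBroadcast(value, nextBroadcastId.getAndIncrement())
  }

  object SimpleFactory extends Factory {
    def newBroadcast[T](value: T, id: Long): Handle[T] = Handle(id, value)
  }
}
```

Keeping the counter in the manager rather than in the factory means the IDs stay unique regardless of which factory implementation is plugged in.
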

BroadcastFactory

BroadcastFactory is a factory class that is initialized in BroadcastManager. At present TorrentBroadcastFactory is its only implementation.

BroadcastFactory is declared as a member variable of BroadcastManager:

    private var broadcastFactory: BroadcastFactory = null

It is initialized as a TorrentBroadcastFactory in BroadcastManager#initialize(); see the initialize() method of BroadcastManager:

  private def initialize() {
    ...
        broadcastFactory = new TorrentBroadcastFactory
    ...
  }

The BroadcastFactory trait has four methods, which do the following:

Initialize (initialize)
Broadcast a new variable (newBroadcast)
Remove an existing variable (unbroadcast)
Shut down the BroadcastFactory (stop)

private[spark] trait BroadcastFactory {

  def initialize(isDriver: Boolean, conf: SparkConf, securityMgr: SecurityManager): Unit

  /**
   * Creates a new broadcast variable.
   *
   * @param value value to broadcast
   * @param isLocal whether we are in local mode (single JVM process)
   * @param id unique id representing this broadcast variable
   */
  def newBroadcast[T: ClassTag](value: T, isLocal: Boolean, id: Long): Broadcast[T]

  def unbroadcast(id: Long, removeFromDriver: Boolean, blocking: Boolean): Unit

  def stop(): Unit
}