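Spark version: 2.0.0

1. Concepts

Spark is a distributed system, so it relies heavily on network communication and remote procedure calls (RPC). Before 1.6, Spark implemented RPC on top of Akka; because of Akka compatibility problems it was eventually dropped in favor of Netty. This article describes how the Netty-based RPC service works.

The previous article covered the master startup process but said little about rpcEnv, so this one starts from the point where that article created the rpcEnv and explains how Spark services communicate.

2. RPC implementation

2.1 RPC server implementation

The master startup contains the line val rpcEnv = RpcEnv.create(SYSTEM_NAME, host, port, conf, securityMgr), which creates the rpcEnv. Let's look at what this call actually does.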

RpcEnv.scala
------------------
  
  def create(
      name: String,
      host: String,
      port: Int,
      conf: SparkConf,
      securityManager: SecurityManager,
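      // clientMode defaults to false: the side that starts the service is always the server
      clientMode: Boolean = false): RpcEnv = {
    // Wrap the settings in an RpcEnvConfig object
    val config = RpcEnvConfig(conf, name, host, port, securityManager, clientMode)
    // Use the Netty-based RpcEnv factory (factory pattern); creating it via reflection would make this easier to extend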
    new NettyRpcEnvFactory().create(config)
  }

The key line above is new NettyRpcEnvFactory().create(config).

NettyRpcEnv.scala
--------------------


  // Create the RpcEnv object
  def create(config: RpcEnvConfig): RpcEnv = {
    val sparkConf = config.conf
    // Use JavaSerializerInstance in multiple threads is safe. However, if we plan to support
    // KryoSerializer in future, we have to use ThreadLocal to store SerializerInstance
    // Serializer; a better approach would be to create it via reflection
    val javaSerializerInstance =
      new JavaSerializer(sparkConf).newInstance().asInstanceOf[JavaSerializerInstance]
    // Create the NettyRpcEnv 【1】
    val nettyEnv =
      new NettyRpcEnv(sparkConf, javaSerializerInstance, config.host, config.securityManager)
    // On the server side, the service has to be started 【2】
    if (!config.clientMode) {
      // Start the service on the given port
      val startNettyRpcEnv: Int => (NettyRpcEnv, Int) = { actualPort =>
        nettyEnv.startServer(actualPort)
        (nettyEnv, nettyEnv.address.port)
      }
      try {
        // Start the service
        Utils.startServiceOnPort(config.port, startNettyRpcEnv, sparkConf, config.name)._1
      } catch {
        case NonFatal(e) =>
          nettyEnv.shutdown()
          throw e
      }
    }
    nettyEnv
  }
}

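This code is critical and has two main parts:
【1】Create the NettyRpcEnv
【2】Start the server

Each part is covered in turn below.
【1】

NettyRpcEnv.scala
-------------------------

When a NettyRpcEnv object is created, these are the main fields to note:
  // (1) Convert the SparkConf into a TransportConf (transport configuration object) via SparkTransportConf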
  private[netty] val transportConf = SparkTransportConf.fromSparkConf(
    conf.clone.set("spark.rpc.io.numConnectionsPerPeer", "1"),
    "rpc",
    conf.getInt("spark.rpc.io.threads", 0))
  
  // (2) Dispatches messages
  private val dispatcher: Dispatcher = new Dispatcher(this)

  // (3) Handles data streams
  private val streamManager = new NettyStreamManager(this)

  // (4) Transport context
  private val transportContext = new TransportContext(transportConf,
    new NettyRpcHandler(dispatcher, this, streamManager))

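(1) SparkTransportConf builds the configuration object dedicated to the transport layer.
(2) As described in the master startup article, the dispatcher's registerRpcEndpoint method registers an endpoint and records it in two main fields, endpoints and endpointRefs. During registration the inbox also enqueues an OnStart message, which triggers the call to endpoint.onStart.
(3) NettyStreamManager serves data streams such as files, JARs, and directories.
(4) The transportContext object mainly holds the following fields:

  private final TransportConf conf; // transport configuration
  private final RpcHandler rpcHandler; // message handler, e.g. turns a byte stream into a RequestMessage object

  private final MessageEncoder encoder; // encodes messages
  private final MessageDecoder decoder; // decodes messages

【2】If config.clientMode == false, nettyEnv.startServer(actualPort) is called to start the server. (Obtaining actualPort involves some special handling: if the requested port is already in use, another port is tried.)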

NettyRpcEnv.scala
----------------------

  /**
    * Start the server.
    * @param port port to listen on
    */
  def startServer(port: Int): Unit = {
    val bootstraps: java.util.List[TransportServerBootstrap] =
      if (securityManager.isAuthenticationEnabled()) {
        java.util.Arrays.asList(new SaslServerBootstrap(transportConf, securityManager))
      } else {
        java.util.Collections.emptyList()
      }
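    // Start the transport server
    server = transportContext.createServer(host, port, bootstraps)
    // Register the RpcEndpointVerifier endpoint; registration works as for the master endpoint, so it is not analyzed further here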
    dispatcher.registerRpcEndpoint(
      RpcEndpointVerifier.NAME, new RpcEndpointVerifier(this, dispatcher))
  }
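
The port fallback that happens before startServer is reached (the job of Utils.startServiceOnPort in the earlier listing) can be sketched roughly as follows. This is a simplified illustration, not Spark's code: PortRetrySketch, ServiceStarter, and the maxRetries parameter are hypothetical names chosen for the example, and the real method contains more logic than shown here.

```java
import java.net.BindException;

final class PortRetrySketch {
    /** Hypothetical stand-in for the start function passed in by the caller: returns the bound port. */
    interface ServiceStarter {
        int start(int port) throws BindException;
    }

    /** Tries startPort, startPort + 1, ... until the service binds or the retries run out. */
    static int startServiceOnPort(int startPort, int maxRetries, ServiceStarter starter) {
        BindException last = null;
        for (int offset = 0; offset <= maxRetries; offset++) {
            // Port 0 means "let the OS choose", so it is never incremented.
            int tryPort = (startPort == 0) ? 0 : startPort + offset;
            try {
                return starter.start(tryPort);
            } catch (BindException e) {
                last = e; // port in use: move on to the next candidate
            }
        }
        throw new IllegalStateException(
            "Failed to bind after " + maxRetries + " retries starting at port " + startPort, last);
    }
}
```

With a starter that reports ports 7077 and 7078 as taken, startServiceOnPort(7077, 16, starter) ends up on 7079. Passing the start logic in as a function keeps the retry loop independent of what is actually being started.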

In startServer, transportContext.createServer(host, port, bootstraps) ends up calling new TransportServer(this, host, port, rpcHandler, bootstraps), so let's look at how TransportServer is instantiated:

  public TransportServer(
      TransportContext context,
      String hostToBind,
      int portToBind,
      RpcHandler appRpcHandler,
      List<TransportServerBootstrap> bootstraps) {
    this.context = context;
    this.conf = context.getConf();
    this.appRpcHandler = appRpcHandler;
    this.bootstraps = Lists.newArrayList(Preconditions.checkNotNull(bootstraps));

    try {
      // Initialize the Netty server
      init(hostToBind, portToBind);
    } catch (RuntimeException e) {
      JavaUtils.closeQuietly(this);
      throw e;
    }
  }

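Initializing the Netty server involves many steps, but most of them are standard Netty server setup and are not covered here. The one part worth highlighting is this: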

  private void init(String hostToBind, int portToBind) {
  ......
    // Add the message handlers
    bootstrap.childHandler(new ChannelInitializer<SocketChannel>() {
      @Override
      protected void initChannel(SocketChannel ch) throws Exception {
        // From the earlier call chain, appRpcHandler here is the NettyRpcHandler object
        RpcHandler rpcHandler = appRpcHandler;
        // Wrap rpcHandler in additional layers such as SASL authentication (decorator pattern)
        for (TransportServerBootstrap bootstrap : bootstraps) {
          rpcHandler = bootstrap.doBootstrap(ch, rpcHandler);
        }
        // Initialize the channel pipeline (the handler runs after the MessageDecoder)
        context.initializePipeline(ch, rpcHandler);
      }
    });
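
The loop over bootstraps wraps one handler inside another. A minimal sketch of that decorator idea follows; Handler, wrapAll, and requireToken are hypothetical names invented for this example and are far simpler than Spark's RpcHandler and TransportServerBootstrap.

```java
import java.util.List;
import java.util.function.UnaryOperator;

final class HandlerWrappingSketch {
    /** Hypothetical, much-reduced stand-in for an RPC handler. */
    interface Handler {
        String receive(String message);
    }

    /** Applies each bootstrap in order; every bootstrap returns a handler wrapping the previous one. */
    static Handler wrapAll(Handler base, List<UnaryOperator<Handler>> bootstraps) {
        Handler handler = base;
        for (UnaryOperator<Handler> bootstrap : bootstraps) {
            handler = bootstrap.apply(handler);
        }
        return handler;
    }

    /** Example bootstrap: rejects messages lacking the token, otherwise delegates to the wrapped handler. */
    static UnaryOperator<Handler> requireToken(String token) {
        return delegate -> message -> {
            if (!message.startsWith(token + ":")) {
                return "rejected";
            }
            return delegate.receive(message.substring(token.length() + 1));
        };
    }
}
```

The caller only ever sees a Handler, so authentication or any other layer can be added or removed without touching the innermost handler.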

Next, look at context.initializePipeline, which adds the handlers to channel.pipeline():

TransportContext.java
--------------------------


  public TransportChannelHandler initializePipeline(
      SocketChannel channel,
      RpcHandler channelRpcHandler) {
    try {
      TransportChannelHandler channelHandler = createChannelHandler(channel, channelRpcHandler);
      channel.pipeline()
        .addLast("encoder", encoder)
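        // Handles TCP packet coalescing and splitting (framing)
        .addLast(TransportFrameDecoder.HANDLER_NAME, NettyUtils.createFrameDecoder())
        // Decodes messages
        .addLast("decoder", decoder)
        // When a connection has been idle (no reads or writes) for too long, an IdleStateEvent is fired.
        // It can be handled by overriding userEventTriggered in a ChannelInboundHandler,
        // which is why TransportChannelHandler implements userEventTriggered.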
        .addLast("idleStateHandler", new IdleStateHandler(0, 0, conf.connectionTimeoutMs() / 1000))
        // NOTE: Chunks are currently guaranteed to be returned in the order of request, but this
        // would require more logic to guarantee if this were not part of the same event loop.
        .addLast("handler", channelHandler);
      return channelHandler;
    } catch (RuntimeException e) {
      logger.error("Error while initializing Netty pipeline", e);
      throw e;
    }
  }
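
The framing problem that the frame decoder solves (several messages arriving in one read, or one message split across reads) can be illustrated with a small length-prefixed decoder. This is a sketch under stated assumptions, not TransportFrameDecoder itself: FrameDecoderSketch is a hypothetical class, and the 8-byte prefix that counts only the payload is a format chosen for the example.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

final class FrameDecoderSketch {
    private static final int LENGTH_SIZE = 8; // assumed prefix size for this example
    private byte[] pending = new byte[0];     // bytes received but not yet forming a full frame

    /** Prepends the payload length so the receiver can find the frame boundary. */
    static byte[] encode(byte[] payload) {
        return ByteBuffer.allocate(LENGTH_SIZE + payload.length)
            .putLong(payload.length)
            .put(payload)
            .array();
    }

    /** Feeds one network read; returns every complete frame now available. */
    List<byte[]> feed(byte[] chunk) {
        byte[] joined = new byte[pending.length + chunk.length];
        System.arraycopy(pending, 0, joined, 0, pending.length);
        System.arraycopy(chunk, 0, joined, pending.length, chunk.length);
        ByteBuffer buf = ByteBuffer.wrap(joined);
        List<byte[]> frames = new ArrayList<>();
        while (buf.remaining() >= LENGTH_SIZE) {
            buf.mark();
            long length = buf.getLong();
            if (buf.remaining() < length) {
                buf.reset(); // payload incomplete: wait for more bytes
                break;
            }
            byte[] frame = new byte[(int) length];
            buf.get(frame);
            frames.add(frame);
        }
        pending = new byte[buf.remaining()];
        buf.get(pending);
        return frames;
    }
}
```

Feeding the decoder a read that ends in the middle of a length prefix returns no frames; the leftover bytes are kept and combined with the next read.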

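The handlers added to channel.pipeline() form the following chain, in the order they are added (IdleStateHandler omitted):
MessageEncoder -> TransportFrameDecoder -> MessageDecoder -> TransportChannelHandler
To make this easier to follow, here are the three most important handlers: MessageEncoder, MessageDecoder, and TransportChannelHandler.

(1) MessageEncoder: as its encode method shows, the encoder takes the type, data, and other fields of a Message object and encodes them into bytes.

(2) MessageDecoder: the reverse of MessageEncoder; it turns bytes back into a Message object.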

  private Message decode(Message.Type msgType, ByteBuf in) {
    switch (msgType) {
      case ChunkFetchRequest:
        return ChunkFetchRequest.decode(in);

      case ChunkFetchSuccess:
        return ChunkFetchSuccess.decode(in);
    ......

(3)TransportChannelHandler:

  @Override
  public void channelRead0(ChannelHandlerContext ctx, Message request) throws Exception {
    // Distinguish request messages from response messages
    if (request instanceof RequestMessage) {
      requestHandler.handle((RequestMessage) request); // see TransportRequestHandler.handle
    } else {
      responseHandler.handle((ResponseMessage) request); // see TransportResponseHandler.handle
    }
  }

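While handling messages, TransportChannelHandler deals with three main kinds of message: RPC messages, ChunkFetch messages, and Stream messages.

RPC messages cover everything sent for RPC operations in Spark. They are usually small and mostly carry control information.
ChunkFetch messages cover everything sent for data fetches in Spark; they carry shuffle data and RDD block data.
Stream messages are simple: they are mainly used to send JARs and files from the driver to executors.

The focus here is on how RPC messages are handled.

(1) Handling RequestMessage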

TransportRequestHandler.java
------------------------
  
  /**
   * Process an RPC request
   * @param req
   */
  private void processRpcRequest(final RpcRequest req) {
    try {
      // rpcHandler is the NettyRpcHandler
      // Core logic: rpcHandler.receive
      rpcHandler.receive(reverseClient, req.body().nioByteBuffer(), new RpcResponseCallback() {
        @Override
        public void onSuccess(ByteBuffer response) {
          respond(new RpcResponse(req.requestId, new NioManagedBuffer(response)));
        }

        @Override
        public void onFailure(Throwable e) {
          respond(new RpcFailure(req.requestId, Throwables.getStackTraceAsString(e)));
        }
      });
    } catch (Exception e) {
      logger.error("Error while invoking RpcHandler#receive() on RPC id " + req.requestId, e);
      respond(new RpcFailure(req.requestId, Throwables.getStackTraceAsString(e)));
    } finally {
      req.body().release();
    }
  }
  
NettyRpcEnv.scala (class NettyRpcHandler)
---------------------------

  // Receive and handle a request
 override def receive(
      client: TransportClient,
      message: ByteBuffer,
      callback: RpcResponseCallback): Unit = {
    // ByteBuffer => RequestMessage
    val messageToDispatch = internalReceive(client, message)
    // Dispatch the request; as described earlier, it ends up in receivers, which triggers the endpoint's handling
    dispatcher.postRemoteMessage(messageToDispatch, callback)
  }

(2) Handling ResponseMessage

 public void handle(ResponseMessage message) throws Exception {
 ....
else if (message instanceof RpcResponse) {
      // Handle an RpcResponse
      RpcResponse resp = (RpcResponse) message;
      RpcResponseCallback listener = outstandingRpcs.get(resp.requestId);
      if (listener == null) {
        logger.warn("Ignoring response for RPC {} from {} ({} bytes) since it is not outstanding",
          resp.requestId, remoteAddress, resp.body().size());
      } else {
        outstandingRpcs.remove(resp.requestId);
        try {
          // Report success through the listener
          listener.onSuccess(resp.body().nioByteBuffer());
        } finally {
          resp.body().release();
        }
      }
    } else if (message instanceof RpcFailure) {
      // Handle an RpcFailure
      RpcFailure resp = (RpcFailure) message;
      RpcResponseCallback listener = outstandingRpcs.get(resp.requestId);
      if (listener == null) {
        logger.warn("Ignoring response for RPC {} from {} ({}) since it is not outstanding",
          resp.requestId, remoteAddress, resp.errorString);
      } else {
        outstandingRpcs.remove(resp.requestId);
        // Report failure through the listener
        listener.onFailure(new RuntimeException(resp.errorString));
      }
    }
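
The outstandingRpcs lookup in the listing above matches each response to the callback registered when the request was sent. A minimal sketch of that correlation pattern is shown below. OutstandingRpcsSketch and its methods are hypothetical, the callback is reduced to a Consumer<String>, and the sequential request ids are an assumption made for the example.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Consumer;

final class OutstandingRpcsSketch {
    private final Map<Long, Consumer<String>> outstandingRpcs = new ConcurrentHashMap<>();
    private final AtomicLong nextId = new AtomicLong(); // assumed id scheme for this example

    /** Registers the callback before the request would be written to the channel; returns the id. */
    long sendRpc(Consumer<String> callback) {
        long requestId = nextId.incrementAndGet();
        outstandingRpcs.put(requestId, callback);
        return requestId;
    }

    /** Called when a response arrives; returns false if no request with that id is outstanding. */
    boolean handleResponse(long requestId, String body) {
        Consumer<String> callback = outstandingRpcs.remove(requestId);
        if (callback == null) {
            return false; // late or duplicate response: ignore it
        }
        callback.accept(body);
        return true;
    }

    int outstandingCount() {
        return outstandingRpcs.size();
    }
}
```

Removing the entry before invoking the callback means a duplicate response for the same id is ignored rather than delivered twice.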

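2.2 RPC client implementation

The master startup analysis looked at this line, which sends a request to the Master endpoint to get the bound ports: val portsResponse = masterEndpoint.askWithRetry[BoundPortsResponse](BoundPortsRequest). It obtains a reference to the master endpoint and calls the master's service, that is, it requests data as a client of the master. As covered last time, there are two cases. In the first, remoteAddr == address (the local machine), and the requestMessage is simply added to the inbox, which is straightforward. This section covers the second case: what happens when the server is remote?

First an RpcOutboxMessage object is created and added to the outbox. If a connection to the remote server already exists locally, the request is sent over it directly.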

NettyRpcEnv.scala
---------------------


        // Wrap the RPC request
        val rpcMessage = RpcOutboxMessage(serialize(message),
          onFailure,
          (client, response) => onSuccess(deserialize[Any](client, response)))
        // Hand the message to the outbox of the receiver
        postToOutbox(message.receiver, rpcMessage)
    
  /**
    * Add an outgoing message to the outbox.
    * @param receiver reference to the receiving endpoint
    * @param message  the message to send
    */
  private def postToOutbox(receiver: NettyRpcEndpointRef, message: OutboxMessage): Unit = {
    if (receiver.client != null) {
      // If there is already a connection to the receiver, send directly. The first time, receiver.client == null
      message.sendWith(receiver.client)
    } else {
      require(receiver.address != null,
        "Cannot send message to client endpoint with no listen address.")
      // One outbox per remote address
      val targetOutbox = {
        // Check whether an outbox for this address already exists
        val outbox = outboxes.get(receiver.address)
        if (outbox == null) {
          // No outbox yet, so create one
          val newOutbox = new Outbox(this, receiver.address)
          val oldOutbox = outboxes.putIfAbsent(receiver.address, newOutbox)
          if (oldOutbox == null) {
            newOutbox
          } else {
            oldOutbox
          }
        } else {
          outbox
        }
      }
      if (stopped.get) {
        // It's possible that we put `targetOutbox` after stopping. So we need to clean it.
        // Remove it from the outbox map and stop it
        outboxes.remove(receiver.address)
        targetOutbox.stop()
      } else {
        // Send the message to the receiver
        targetOutbox.send(message)
      }
    }
  }

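The code for targetOutbox.send is simple: it checks whether the outbox has already been stopped. If so, the message is dropped; otherwise it is queued and the outbox is drained.

Outbox.scala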
--------------------------

 def send(message: OutboxMessage): Unit = {
    val dropped = synchronized {
      if (stopped) {
        true
      } else {
        // Add the message to the outbox message queue
        messages.add(message)
        false
      }
    }
    if (dropped) {
      message.onFailure(new SparkException("Message is dropped because Outbox is stopped"))
    } else {
      // Process the messages in the outbox
      drainOutbox()
    }
  }

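The real message handling starts in drainOutbox. Here is how it works: if no client exists yet, one has to be created (this is the key point); if one already exists, all pending messages are sent to the remote server.

Outbox.scala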
----------------------


 private def drainOutbox(): Unit = {
    var message: OutboxMessage = null
    synchronized {
      if (stopped) {
        return
      }
      if (connectFuture != null) {
        // A connection attempt is already in progress, so return
        // We are connecting to the remote address, so just exit
        return
      }
      if (client == null) {
        // There is no connect task but client is null, so we need to launch the connect task.
        launchConnectTask()
        return
      }
      if (draining) {
        // There is some thread draining, so just exit
        return
      }
      message = messages.poll()
      if (message == null) {
        return
      }
      // Mark the outbox as draining
      draining = true
    }
    // Keep consuming until the outbox queue is empty
    while (true) {
      try {
        val _client = synchronized { client }
        if (_client != null) {
          // Send to the receiver
          message.sendWith(_client)
        } else {
          assert(stopped == true)
        }
      } catch {
        case NonFatal(e) =>
          handleNetworkFailure(e)
          return
      }
      synchronized {
        if (stopped) {
          return
        }
        // Fetch the next message
        message = messages.poll()
        if (message == null) {
          // No more messages, so return
          draining = false
          return
        }
      }
    }
  }
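
The essential behavior of send and drainOutbox, queueing messages until a connection exists and then flushing them in order, can be sketched in a much simpler form. This is an illustration under assumptions, not the Outbox implementation: OutboxSketch is a hypothetical class, one lock replaces the draining flag, connection setup is reduced to an onConnected call, and a list stands in for the network.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

final class OutboxSketch {
    private final Queue<String> messages = new ArrayDeque<>();
    private final List<String> sent = new ArrayList<>(); // stands in for the network
    private boolean connected = false;
    private boolean stopped = false;

    /** Queues the message, then tries to drain; returns false if the outbox is stopped. */
    synchronized boolean send(String message) {
        if (stopped) {
            return false; // a stopped outbox drops the message
        }
        messages.add(message);
        drain();
        return true;
    }

    /** Called once the connection is established: flushes whatever was queued meanwhile. */
    synchronized void onConnected() {
        connected = true;
        drain();
    }

    synchronized void stop() {
        stopped = true;
        messages.clear();
    }

    synchronized List<String> sentMessages() {
        return new ArrayList<>(sent);
    }

    // Only called while holding the lock.
    private void drain() {
        if (stopped || !connected) {
            return; // nothing can be sent yet; messages stay queued
        }
        String message;
        while ((message = messages.poll()) != null) {
            sent.add(message);
        }
    }
}
```

Messages sent before onConnected stay in the queue and go out in their original order once the connection is ready, which is the same guarantee the draining loop provides.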

Since the focus here is the RPC client, launchConnectTask deserves a closer look.

Outbox.scala
----------------------


 private def launchConnectTask(): Unit = {
    connectFuture = nettyEnv.clientConnectionExecutor.submit(new Callable[Unit] {

      override def call(): Unit = {
        try {
          // Create a client connected to address
          val _client = nettyEnv.createClient(address)
          outbox.synchronized {
            client = _client
            if (stopped) {
              closeClient()
            }
          }
        } catch {
          case ie: InterruptedException =>
            // exit
            return
          case NonFatal(e) =>
            outbox.synchronized { connectFuture = null }
            handleNetworkFailure(e)
            return
        }
        outbox.synchronized { connectFuture = null }
        // It's possible that no thread is draining now. If we don't drain here, we cannot send the
        // messages until the next message arrives.
        // Once the client is created, drain again
        drainOutbox()
      }
    })
  }

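The key call above is nettyEnv.createClient(address). The next part is more involved. nettyEnv.createClient ends up calling TransportClientFactory.createClient, which first checks whether a cached connection to the remote server exists. If so, it is returned directly; if not, one is created with TransportClientFactory.createClient(resolvedAddress). The creation logic is very similar to the server side, so the comments should be enough.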

TransportClientFactory.java
-----------------------------

 public TransportClient createClient(String remoteHost, int remotePort) throws IOException {
    // Wrap host and port in an InetSocketAddress
    final InetSocketAddress unresolvedAddress =
      InetSocketAddress.createUnresolved(remoteHost, remotePort);
    // Check whether the connection pool already has connections to this remote server
    ClientPool clientPool = connectionPool.get(unresolvedAddress);
    if (clientPool == null) {
      // Create the connection pool
      connectionPool.putIfAbsent(unresolvedAddress, new ClientPool(numConnectionsPerPeer));
      clientPool = connectionPool.get(unresolvedAddress);
    }
    // Pick a random connection from the cached pool
    int clientIndex = rand.nextInt(numConnectionsPerPeer);
    TransportClient cachedClient = clientPool.clients[clientIndex];
    // If the connection is still active
    if (cachedClient != null && cachedClient.isActive()) {
      // Get the TransportChannelHandler from the pipeline
      TransportChannelHandler handler = cachedClient.getChannel().pipeline()
        .get(TransportChannelHandler.class);
      synchronized (handler) {
        // Update the time of the last request
        handler.getResponseHandler().updateTimeOfLastRequest();
      }

      if (cachedClient.isActive()) {
        logger.trace("Returning cached connection to {}: {}",
          cachedClient.getSocketAddress(), cachedClient);
        return cachedClient;
      }
    }
    // No usable cached connection to this remote server, so create a new one
    final long preResolveHost = System.nanoTime();
    final InetSocketAddress resolvedAddress = new InetSocketAddress(remoteHost, remotePort);
    final long hostResolveTimeMs = (System.nanoTime() - preResolveHost) / 1000000;
    // Check whether DNS resolution was slow (the threshold should eventually be configurable)
    if (hostResolveTimeMs > 2000) {
      logger.warn("DNS resolution for {} took {} ms", resolvedAddress, hostResolveTimeMs);
    } else {
      logger.trace("DNS resolution for {} took {} ms", resolvedAddress, hostResolveTimeMs);
    }

    // Update the client at the randomly chosen slot
    synchronized (clientPool.locks[clientIndex]) {
      cachedClient = clientPool.clients[clientIndex];

      if (cachedClient != null) {
        if (cachedClient.isActive()) {
          logger.trace("Returning cached connection to {}: {}", resolvedAddress, cachedClient);
          return cachedClient;
        } else {
          logger.info("Found inactive connection to {}, creating a new one.", resolvedAddress);
        }
      }
      // Create the remote connection and store it in the chosen slot of the pool
      clientPool.clients[clientIndex] = createClient(resolvedAddress);
      return clientPool.clients[clientIndex];
    }
  }
  
  /**
   * The code that actually creates the RPC client
   */
 private TransportClient createClient(InetSocketAddress address) throws IOException {
    logger.debug("Creating new connection to " + address);

    Bootstrap bootstrap = new Bootstrap();
    bootstrap.group(workerGroup)
      .channel(socketChannelClass)
      .option(ChannelOption.TCP_NODELAY, true)
      .option(ChannelOption.SO_KEEPALIVE, true)
      .option(ChannelOption.CONNECT_TIMEOUT_MILLIS, conf.connectionTimeoutMs())
      .option(ChannelOption.ALLOCATOR, pooledAllocator);

    final AtomicReference<TransportClient> clientRef = new AtomicReference<>();
    final AtomicReference<Channel> channelRef = new AtomicReference<>();
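    // Set the channel initializer; it ends up calling the same pipeline setup as the server
    bootstrap.handler(new ChannelInitializer<SocketChannel>() {
      @Override
      public void initChannel(SocketChannel ch) {
        TransportChannelHandler clientHandler = context.initializePipeline(ch);
        // Save the newest client reference so it can be used outside the anonymous inner class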
        clientRef.set(clientHandler.getClient());
        channelRef.set(ch);
      }
    });

    long preConnect = System.nanoTime();
    ChannelFuture cf = bootstrap.connect(address);
    if (!cf.awaitUninterruptibly(conf.connectionTimeoutMs())) {
      throw new IOException(
        String.format("Connecting to %s timed out (%s ms)", address, conf.connectionTimeoutMs()));
    } else if (cf.cause() != null) {
      throw new IOException(String.format("Failed to connect to %s", address), cf.cause());
    }

    TransportClient client = clientRef.get();
    Channel channel = channelRef.get();
    assert client != null : "Channel future completed successfully with null client";

    long preBootstrap = System.nanoTime();
    logger.debug("Connection to {} successful, running bootstraps...", address);
    // Apply each client bootstrap in turn to layer on extra processing (decorator-style)
    try {
      for (TransportClientBootstrap clientBootstrap : clientBootstraps) {
        clientBootstrap.doBootstrap(client, channel);
      }
    } catch (Exception e) { 
      long bootstrapTimeMs = (System.nanoTime() - preBootstrap) / 1000000;
      logger.error("Exception while bootstrapping client after " + bootstrapTimeMs + " ms", e);
      client.close();
      throw Throwables.propagate(e);
    }
    long postBootstrap = System.nanoTime();

    logger.info("Successfully created connection to {} after {} ms ({} ms spent in bootstraps)",
      address, (postBootstrap - preConnect) / 1000000, (postBootstrap - preBootstrap) / 1000000);

    return client;
  }
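
The caching logic in createClient(remoteHost, remotePort) boils down to a fixed number of slots per remote address, a random slot choice, and a locked re-check before reconnecting. A stripped-down sketch follows. ClientPoolSketch, its nested Client class, and the connectionsCreated counter are hypothetical and exist only for illustration; no real network connection is opened.

```java
import java.util.Random;
import java.util.concurrent.atomic.AtomicInteger;

final class ClientPoolSketch {
    /** Hypothetical stand-in for a transport client. */
    static final class Client {
        volatile boolean active = true;
    }

    private final Client[] clients;
    private final Object[] locks;
    private final Random rand = new Random();
    final AtomicInteger connectionsCreated = new AtomicInteger();

    ClientPoolSketch(int numConnectionsPerPeer) {
        clients = new Client[numConnectionsPerPeer];
        locks = new Object[numConnectionsPerPeer];
        for (int i = 0; i < locks.length; i++) {
            locks[i] = new Object();
        }
    }

    /** Returns an active cached client from a random slot, creating one if the slot is empty or stale. */
    Client getOrCreate() {
        int index = rand.nextInt(clients.length);
        Client cached = clients[index];
        if (cached != null && cached.active) {
            return cached; // fast path without locking
        }
        synchronized (locks[index]) {
            cached = clients[index]; // re-check: another thread may have refilled this slot
            if (cached != null && cached.active) {
                return cached;
            }
            clients[index] = connect();
            return clients[index];
        }
    }

    private Client connect() {
        connectionsCreated.incrementAndGet(); // a real pool would open a network connection here
        return new Client();
    }
}
```

One lock per slot means a slow connection attempt for one slot does not block callers that land on another. For RPC, the NettyRpcEnv listing earlier sets spark.rpc.io.numConnectionsPerPeer to 1, so the pool has a single slot per remote address.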

Reference: https://www.cnblogs.com/xia520pi/p/8693966.html
