Spark Source Code Analysis: RPC Internals

Spark version: 2.0.0

1. Concepts

Spark is a distributed system, so it involves a great deal of network communication and remote procedure calls (RPC). Before 1.6 Spark implemented this with Akka, but because of Akka compatibility issues it was eventually dropped in favor of Netty. This article describes how the Netty-based RPC service works.

The previous article covered the Master startup process but said very little about rpcEnv, so starting from the point where that article created the rpcEnv, this article explains how Spark services communicate with each other.

2. RPC implementation

2.1 RPC server side

Master startup contains this line: val rpcEnv = RpcEnv.create(SYSTEM_NAME, host, port, conf, securityMgr), which creates the RpcEnv. Let's dig into what this code actually does.

RpcEnv.scala
------------------
  
  def create(
      name: String,
      host: String,
      port: Int,
      conf: SparkConf,
      securityManager: SecurityManager,
      // clientMode = false, because the side that starts the service is always the server
      clientMode: Boolean = false): RpcEnv = {
    // Wrap the RpcEnv settings into a config object
    val config = RpcEnvConfig(conf, name, host, port, securityManager, clientMode)
    // Use the Netty-based RpcEnv factory (factory pattern); for easier extensibility the factory could also be resolved via reflection (see the sketch after this snippet)
    new NettyRpcEnvFactory().create(config)
  }

The core of the snippet above is: new NettyRpcEnvFactory().create(config)
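As the comment in the snippet notes, a more extensible variant would resolve the factory class via reflection instead of hard-coding it. Below is a minimal sketch of that idea, reusing the conf and config values from the create method above; "spark.rpc.factory" is a made-up key, not a real Spark setting, and the rpc classes are private[spark], so this is illustration only.

  // Hedged sketch: look the factory class up by name instead of new-ing it directly
  val factoryClassName = conf.get("spark.rpc.factory",
    "org.apache.spark.rpc.netty.NettyRpcEnvFactory")
  val factory = Class.forName(factoryClassName)
    .newInstance()
    .asInstanceOf[RpcEnvFactory]
  factory.create(config)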

NettyRpcEnv.scala
--------------------


// Create the RpcEnv instance
  def create(config: RpcEnvConfig): RpcEnv = {
    val sparkConf = config.conf
    // Use JavaSerializerInstance in multiple threads is safe. However, if we plan to support
    // KryoSerializer in future, we have to use ThreadLocal to store SerializerInstance
    // Serializer; a more flexible approach would be to create it via reflection
    val javaSerializerInstance =
      new JavaSerializer(sparkConf).newInstance().asInstanceOf[JavaSerializerInstance]
    // Create the NettyRpcEnv 【1】
    val nettyEnv =
      new NettyRpcEnv(sparkConf, javaSerializerInstance, config.host, config.securityManager)
    // If this is the server side, the service needs to be started 【2】
    if (!config.clientMode) {
      // Start the service on the given port
      val startNettyRpcEnv: Int => (NettyRpcEnv, Int) = { actualPort =>
        nettyEnv.startServer(actualPort)
        (nettyEnv, nettyEnv.address.port)
      }
      try {
        // Start the service
        Utils.startServiceOnPort(config.port, startNettyRpcEnv, sparkConf, config.name)._1
      } catch {
        case NonFatal(e) =>
          nettyEnv.shutdown()
          throw e
      }
    }
    nettyEnv
  }
}

This code is crucial, so it is split into two main parts:
【1】 Creating the NettyRpcEnv
【2】 Starting the server

These two parts are covered in turn:
【1】

NettyRpcEnv.scala
-------------------------

When the NettyRpcEnv object is created, the following key fields deserve attention:
  // (1) Convert the SparkConf into a SparkTransportConf (transport configuration object)
  private[netty] val transportConf = SparkTransportConf.fromSparkConf(
    conf.clone.set("spark.rpc.io.numConnectionsPerPeer", "1"),
    "rpc",
    conf.getInt("spark.rpc.io.threads", 0))
  
  // (2) Dispatches messages to endpoints
  private val dispatcher: Dispatcher = new Dispatcher(this)
  
  // (3) Handles data streams
  private val streamManager = new NettyStreamManager(this)
  
  // (4) Transport context
  private val transportContext = new TransportContext(transportConf,
    new NettyRpcHandler(dispatcher, this, streamManager))  

(1) SparkTransportConf is the configuration object dedicated to the transport layer.
(2) As mentioned in the Master startup article, the dispatcher's registerRpcEndpoint method registers an endpoint and records it in the endpoints and endpointRefs maps; at registration time an OnStart message is also put into the endpoint's inbox, which triggers the call to endpoint.onStart (see the sketch after this list).
(3) NettyStreamManager is dedicated to serving files, jars, directories and other data streams.
(4) The transportContext object mainly contains the following fields:

  private final TransportConf conf; // transport configuration
  private final RpcHandler rpcHandler; // message handling object, e.g. turning a byte stream into a RequestMessage

  private final MessageEncoder encoder; // message encoder
  private final MessageDecoder decoder; // message decoder
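
To make the dispatcher/endpoint relationship concrete, here is a minimal sketch of how an endpoint is defined and registered, assuming an rpcEnv like the one created above. EchoEndpoint and its messages are made up for illustration, and since the org.apache.spark.rpc API is private[spark], code like this only compiles inside Spark's own packages (the Master endpoint is the real-world example):

  import org.apache.spark.rpc.{RpcCallContext, RpcEndpoint, RpcEnv}

  // Hypothetical endpoint used only to illustrate the lifecycle described above
  class EchoEndpoint(override val rpcEnv: RpcEnv) extends RpcEndpoint {
    // Triggered by the OnStart message that the inbox enqueues at registration time
    override def onStart(): Unit = println("EchoEndpoint started")

    // Handles ask-style messages; context.reply routes the answer back to the caller
    override def receiveAndReply(context: RpcCallContext): PartialFunction[Any, Unit] = {
      case msg: String => context.reply(s"echo: $msg")
    }
  }

  // setupEndpoint delegates to dispatcher.registerRpcEndpoint and returns an RpcEndpointRef
  val echoRef = rpcEnv.setupEndpoint("echo", new EchoEndpoint(rpcEnv))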

【2】 If config.clientMode == false, nettyEnv.startServer(actualPort) is called to start the server side. (Obtaining actualPort involves some special handling: if the requested port is already in use, a new port is tried.)
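
The port handling lives in Utils.startServiceOnPort. Roughly speaking (a simplified sketch, not the actual implementation), it tries to bind to the requested port and, on a bind failure, retries on successive ports until an attempt succeeds or the retry budget (spark.port.maxRetries, typically 16) is exhausted:

  // Simplified sketch of the retry behaviour of Utils.startServiceOnPort
  def startServiceOnPortSketch[T](startPort: Int,
                                  startService: Int => (T, Int),
                                  maxRetries: Int = 16): (T, Int) = {
    var lastError: Throwable = null
    for (offset <- 0 to maxRetries) {
      // port 0 means "let the OS pick", so there is nothing to increment in that case
      val tryPort = if (startPort == 0) startPort else startPort + offset
      try {
        return startService(tryPort) // returns (service, actualBoundPort)
      } catch {
        case e: java.net.BindException => lastError = e // port taken, try the next one
      }
    }
    throw new java.io.IOException(s"Failed to bind a port starting at $startPort", lastError)
  }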

NettyRpcEnv.scala
----------------------

 /**
    * Start the server
    * @param port the port to listen on
    */
  def startServer(port: Int): Unit = {
    val bootstraps: java.util.List[TransportServerBootstrap] =
      if (securityManager.isAuthenticationEnabled()) {
        java.util.Arrays.asList(new SaslServerBootstrap(transportConf, securityManager))
      } else {
        java.util.Collections.emptyList()
      }
    // Start the transport server
    server = transportContext.createServer(host, port, bootstraps)
    // Register the verifier endpoint (compare with the Master endpoint); not analyzed further here
    dispatcher.registerRpcEndpoint(
      RpcEndpointVerifier.NAME, new RpcEndpointVerifier(this, dispatcher))
  }
  

Let's start with transportContext.createServer(host, port, bootstraps). It eventually calls new TransportServer(this, host, port, rpcHandler, bootstraps), so let's look at how TransportServer is instantiated:

  public TransportServer(
      TransportContext context,
      String hostToBind,
      int portToBind,
      RpcHandler appRpcHandler,
      List<TransportServerBootstrap> bootstraps) {
    this.context = context;
    this.conf = context.getConf();
    this.appRpcHandler = appRpcHandler;
    this.bootstraps = Lists.newArrayList(Preconditions.checkNotNull(bootstraps));

    try {
      // Initialize the Netty server
      init(hostToBind, portToBind);
    } catch (RuntimeException e) {
      JavaUtils.closeQuietly(this);
      throw e;
    }
  }
  

Initializing the Netty server involves many steps, but they are the standard boilerplate for creating a Netty server, so they are not covered in detail. The one piece worth highlighting is this:

  private void init(String hostToBind, int portToBind) {
  ......
    // Add the message handlers
    bootstrap.childHandler(new ChannelInitializer<SocketChannel>() {
      @Override
      protected void initChannel(SocketChannel ch) throws Exception {
      // From the earlier call chain we know that appRpcHandler here is the NettyRpcHandler instance
        RpcHandler rpcHandler = appRpcHandler;
        // Wrap the rpcHandler with additional layers, e.g. SASL authentication (decorator pattern, illustrated after this block)
        for (TransportServerBootstrap bootstrap : bootstraps) {
          rpcHandler = bootstrap.doBootstrap(ch, rpcHandler);
        }
         // Initialize the channel pipeline handlers (the channelHandler runs after the MessageDecoder)
        context.initializePipeline(ch, rpcHandler);
      }
    });
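
The bootstrap loop above is a textbook decorator: each TransportServerBootstrap returns a new RpcHandler that wraps the previous one (SaslServerBootstrap, for instance, adds SASL authentication in front of the real handler). Here is a minimal illustration of the same idea; LoggingRpcHandler is hypothetical, not part of Spark:

  import java.nio.ByteBuffer
  import org.apache.spark.network.client.{RpcResponseCallback, TransportClient}
  import org.apache.spark.network.server.{RpcHandler, StreamManager}

  // Hypothetical wrapper that adds behaviour before delegating to the wrapped handler
  class LoggingRpcHandler(delegate: RpcHandler) extends RpcHandler {
    override def receive(client: TransportClient, message: ByteBuffer,
                         callback: RpcResponseCallback): Unit = {
      println(s"rpc message of ${message.remaining()} bytes") // extra behaviour added by the wrapper
      delegate.receive(client, message, callback)             // then hand off to the wrapped handler
    }
    override def getStreamManager(): StreamManager = delegate.getStreamManager()
  }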

Next, look at context.initializePipeline, the method that adds the handlers to channel.pipeline():

TransportContext.java
--------------------------


  public TransportChannelHandler initializePipeline(
      SocketChannel channel,
      RpcHandler channelRpcHandler) {
    try {
      TransportChannelHandler channelHandler = createChannelHandler(channel, channelRpcHandler);
      channel.pipeline()
        .addLast("encoder", encoder)
              // Handles TCP packet splitting and coalescing (framing)
        .addLast(TransportFrameDecoder.HANDLER_NAME, NettyUtils.createFrameDecoder())
              // Message decoding
        .addLast("decoder", decoder)
              // When the connection has been idle (no reads or writes) for too long, an IdleStateEvent is fired; a ChannelInboundHandler can handle it by overriding userEventTriggered.
              // That is why TransportChannelHandler overrides userEventTriggered.
              // 所以TransportChannelHandler中添加了userEventTrigged方法
        .addLast("idleStateHandler", new IdleStateHandler(0, 0, conf.connectionTimeoutMs() / 1000))
        // NOTE: Chunks are currently guaranteed to be returned in the order of request, but this
        // would require more logic to guarantee if this were not part of the same event loop.
        .addLast("handler", channelHandler);
      return channelHandler;
    } catch (RuntimeException e) {
      logger.error("Error while initializing Netty pipeline", e);
      throw e;
    }
  }

The order in which the pipeline processes messages can be summarized as follows:

  inbound:  TransportFrameDecoder → MessageDecoder → TransportChannelHandler
  outbound: MessageEncoder

(IdleStateHandler handling omitted)
To make this easier to follow, three of the most important handlers are introduced: MessageEncoder, MessageDecoder and TransportChannelHandler.

(1) MessageEncoder: as its encode method shows, the encoder extracts the type, body and other fields from a Message object and encodes them into a byte array.

(2) MessageDecoder: the reverse of MessageEncoder; it turns the incoming bytes back into a Message object.

  private Message decode(Message.Type msgType, ByteBuf in) {
    switch (msgType) {
      case ChunkFetchRequest:
        return ChunkFetchRequest.decode(in);

      case ChunkFetchSuccess:
        return ChunkFetchSuccess.decode(in);
    ......

(3) TransportChannelHandler:

  @Override
  public void channelRead0(ChannelHandlerContext ctx, Message request) throws Exception {
   // distinguish the message type
    if (request instanceof RequestMessage) {
      requestHandler.handle((RequestMessage) request); // see TransportRequestHandler.handle for the details
    } else {
      responseHandler.handle((ResponseMessage) request); // see TransportResponseHandler.handle for the details
    }
  }

While handling messages, TransportChannelHandler distinguishes three main message categories: RPC messages, ChunkFetch messages, and Stream messages.

  1. RPC messages represent everything Spark needs to send for RPC operations; these are usually small control messages.
  2. ChunkFetch messages represent data-fetch operations; they are used to transfer shuffle data and RDD block data.
  3. Stream messages are simple; they are mainly used to transfer jars, files, etc. from the driver to the executors.

The focus here is on how RPC messages are handled.

(1) RequestMessage handling

TransportRequestHandler.java
------------------------
  
  /**
   * Handle an RPC request
   * @param req
   */
  private void processRpcRequest(final RpcRequest req) {
    try {
      // rpcHandler=NettyRpcHandler
      // core logic: rpcHandler.receive
      rpcHandler.receive(reverseClient, req.body().nioByteBuffer(), new RpcResponseCallback() {
        @Override
        public void onSuccess(ByteBuffer response) {
          respond(new RpcResponse(req.requestId, new NioManagedBuffer(response)));
        }

        @Override
        public void onFailure(Throwable e) {
          respond(new RpcFailure(req.requestId, Throwables.getStackTraceAsString(e)));
        }
      });
    } catch (Exception e) {
      logger.error("Error while invoking RpcHandler#receive() on RPC id " + req.requestId, e);
      respond(new RpcFailure(req.requestId, Throwables.getStackTraceAsString(e)));
    } finally {
      req.body().release();
    }
  }
  
NettyRpcHandler.scala
---------------------------

 // Receive and handle a request
 override def receive(
      client: TransportClient,
      message: ByteBuffer,
      callback: RpcResponseCallback): Unit = {
    // ByteBuffer => requestMessage
    val messageToDispatch = internalReceive(client, message)
    // Dispatch the request; as described earlier, it ends up in the receivers queue, which triggers the endpoint's message handling
    dispatcher.postRemoteMessage(messageToDispatch, callback)
  }
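
dispatcher.postRemoteMessage has only been described in passing, so here is a condensed, self-contained model of the flow it triggers (stand-in types, not the real Dispatcher/Inbox source): the message is appended to the target endpoint's inbox, the endpoint is queued as having pending work, and a MessageLoop thread later drains the inbox and invokes the endpoint's handler.

  import java.util.concurrent.{ConcurrentHashMap, LinkedBlockingQueue}

  object DispatcherModel {
    // Stand-in for EndpointData + Inbox: a per-endpoint queue of pending messages
    final class Inbox(val name: String) {
      val messages = new LinkedBlockingQueue[String]()
      def post(m: String): Unit = messages.put(m)
      def process(): Unit = {
        var m = messages.poll()
        while (m != null) { println(s"[$name] handling: $m"); m = messages.poll() }
      }
    }

    private val endpoints = new ConcurrentHashMap[String, Inbox]()
    private val receivers = new LinkedBlockingQueue[Inbox]() // endpoints with pending work

    def register(name: String): Unit = { endpoints.put(name, new Inbox(name)) }

    // Roughly what postRemoteMessage/postMessage do: enqueue, then mark the endpoint runnable
    def postMessage(endpointName: String, message: String): Unit = {
      val inbox = endpoints.get(endpointName)
      inbox.post(message)
      receivers.put(inbox)
    }

    // One iteration of a MessageLoop thread: take an endpoint with work and drain its inbox
    def messageLoopOnce(): Unit = receivers.take().process()
  }

For example, register("master"), then postMessage("master", "BoundPortsRequest"), then messageLoopOnce() prints the handled message, mirroring how a remote request eventually reaches the Master endpoint's receiveAndReply.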

(2) ResponseMessage handling

 public void handle(ResponseMessage message) throws Exception {
 ....
else if (message instanceof RpcResponse) {
        // handle RpcResponse
      RpcResponse resp = (RpcResponse) message;
      RpcResponseCallback listener = outstandingRpcs.get(resp.requestId);
      if (listener == null) {
        logger.warn("Ignoring response for RPC {} from {} ({} bytes) since it is not outstanding",
          resp.requestId, remoteAddress, resp.body().size());
      } else {
        outstandingRpcs.remove(resp.requestId);
        try {
            // deliver the success result through the listener
          listener.onSuccess(resp.body().nioByteBuffer());
        } finally {
          resp.body().release();
        }
      }
    } else if (message instanceof RpcFailure) {
        // handle RpcFailure
      RpcFailure resp = (RpcFailure) message;
      RpcResponseCallback listener = outstandingRpcs.get(resp.requestId);
      if (listener == null) {
        logger.warn("Ignoring response for RPC {} from {} ({}) since it is not outstanding",
          resp.requestId, remoteAddress, resp.errorString);
      } else {
        outstandingRpcs.remove(resp.requestId);
        // deliver the failure through the listener
        listener.onFailure(new RuntimeException(resp.errorString));
      }
    } 
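
A natural question is how the listener got into outstandingRpcs in the first place. That happens on the sending side: before writing the request to the channel, TransportClient.sendRpc registers the callback under a freshly generated requestId, which is what lets the RpcResponse/RpcFailure branches above correlate replies. A condensed Scala sketch of that logic (the real method is Java and also attaches a write-failure listener, omitted here):

  import java.nio.ByteBuffer
  import java.util.UUID
  import io.netty.channel.Channel
  import org.apache.spark.network.buffer.NioManagedBuffer
  import org.apache.spark.network.client.{RpcResponseCallback, TransportResponseHandler}
  import org.apache.spark.network.protocol.RpcRequest

  // Simplified sketch of TransportClient.sendRpc: register the callback first, then send
  def sendRpcSketch(channel: Channel,
                    handler: TransportResponseHandler,
                    message: ByteBuffer,
                    callback: RpcResponseCallback): Long = {
    val requestId = math.abs(UUID.randomUUID().getLeastSignificantBits)
    handler.addRpcRequest(requestId, callback) // this is what populates outstandingRpcs
    channel.writeAndFlush(new RpcRequest(requestId, new NioManagedBuffer(message)))
    requestId
  }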

2.2 RPC client side

In the earlier Master startup analysis we examined the line val portsResponse = masterEndpoint.askWithRetry[BoundPortsResponse](BoundPortsRequest), which sends a request to the Master's RPC endpoint to obtain the bound ports; in other words, it requests data from the Master as a client. Last time we saw that a request is handled in one of two ways. The first, where remoteAddr == address (the local machine), simply puts the RequestMessage into the local inbox, which is straightforward. Now for the second case: what happens when the target is a remote server?
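
Before diving into the remote path, here is a reminder of what the calling side looks like (a minimal sketch; masterHost, masterPort, conf and securityMgr are assumed to be in scope, and since the rpc package is private[spark] this only compiles inside Spark itself):

  import org.apache.spark.rpc.{RpcAddress, RpcEnv}
  import org.apache.spark.deploy.DeployMessages.{BoundPortsRequest, BoundPortsResponse}

  // Client-mode RpcEnv: no server is started, it only sends requests
  val clientEnv = RpcEnv.create("driverClient", "localhost", 0, conf, securityMgr,
    clientMode = true)
  // Obtain a reference to the remote "Master" endpoint registered on the server side
  val masterEndpoint = clientEnv.setupEndpointRef(RpcAddress(masterHost, masterPort), "Master")
  // Typed ask: serialize BoundPortsRequest, send it, and wait for a BoundPortsResponse
  val portsResponse = masterEndpoint.askWithRetry[BoundPortsResponse](BoundPortsRequest)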

First an RpcOutboxMessage is created and added to the outbox; if a connection to the remote server already exists locally, the message is sent on it directly.

NettyRpcEnv.scala
---------------------


        // wrap the request into an RpcOutboxMessage
        val rpcMessage = RpcOutboxMessage(serialize(message),
          onFailure,
          (client, response) => onSuccess(deserialize[Any](client, response)))
        // hand the message to the outbox of the target receiver
        postToOutbox(message.receiver, rpcMessage)
    
 /**
    * Post an outgoing message to the outbox
    * @param receiver
    * @param message
    */
  private def postToOutbox(receiver: NettyRpcEndpointRef, message: OutboxMessage): Unit = {
    if (receiver.client != null) {
      // If a connection to the receiver already exists, send the data directly; on the first call receiver.client is null
      message.sendWith(receiver.client)
    } else {
      require(receiver.address != null,
        "Cannot send message to client endpoint with no listen address.")
        // each remote address has its own outbox
      val targetOutbox = {
        // check whether an outbox already exists for this client's address
        val outbox = outboxes.get(receiver.address)
        if (outbox == null) {
          // no outbox for this address yet, so create one
          val newOutbox = new Outbox(this, receiver.address)
          val oldOutbox = outboxes.putIfAbsent(receiver.address, newOutbox)
          if (oldOutbox == null) {
            newOutbox
          } else {
            oldOutbox
          }
        } else {
          outbox
        }
      }
      if (stopped.get) {
        // It's possible that we put `targetOutbox` after stopping. So we need to clean it.
        // remove it from the outboxes map and stop it
        outboxes.remove(receiver.address)
        targetOutbox.stop()
      } else {
        // send the message to the receiver
        targetOutbox.send(message)
      }
    }
  }

The targetOutbox.send code is straightforward: it simply checks whether the outbox has already been stopped.

Outbox.scala
--------------------------

 def send(message: OutboxMessage): Unit = {
    val dropped = synchronized {
      if (stopped) {
        true
      } else {
        // add the message to the outbox's message queue
        messages.add(message)
        false
      }
    }
    if (dropped) {
      message.onFailure(new SparkException("Message is dropped because Outbox is stopped"))
    } else {
      // drain the outbox messages
      drainOutbox()
    }
  }

The real message-sending logic starts in drainOutbox, so let's look at how it works: if the client does not exist yet, it has to be created (the important case); if it already exists, all currently queued messages are sent to the remote server.

Outbox.scala
----------------------


 private def drainOutbox(): Unit = {
    var message: OutboxMessage = null
    synchronized {
      if (stopped) {
        return
      }
      if (connectFuture != null) {
        // a connect task is already in progress, so just return
        // We are connecting to the remote address, so just exit
        return
      }
      if (client == null) {
        // There is no connect task but client is null, so we need to launch the connect task.
        launchConnectTask()
        return
      }
      if (draining) {
        // There is some thread draining, so just exit
        return
      }
      message = messages.poll()
      if (message == null) {
        return
      }
      // mark this thread as draining
      draining = true
    }
    // keep consuming until the outbox queue is empty
    while (true) {
      try {
        val _client = synchronized { client }
        if (_client != null) {
          // send to the receiver
          message.sendWith(_client)
        } else {
          assert(stopped == true)
        }
      } catch {
        case NonFatal(e) =>
          handleNetworkFailure(e)
          return
      }
      synchronized {
        if (stopped) {
          return
        }
        // fetch the next message
        message = messages.poll()
        if (message == null) {
          // no more messages, stop draining and return
          draining = false
          return
        }
      }
    }
  }

Since we are exploring the RPC client side, the launchConnectTask method deserves special attention.

Outbox.scala
----------------------


 private def launchConnectTask(): Unit = {
    connectFuture = nettyEnv.clientConnectionExecutor.submit(new Callable[Unit] {

      override def call(): Unit = {
        try {
          // create a client connected to `address`
          val _client = nettyEnv.createClient(address)
          outbox.synchronized {
            client = _client
            if (stopped) {
              closeClient()
            }
          }
        } catch {
          case ie: InterruptedException =>
            // exit
            return
          case NonFatal(e) =>
            outbox.synchronized { connectFuture = null }
            handleNetworkFailure(e)
            return
        }
        outbox.synchronized { connectFuture = null }
        // It's possible that no thread is draining now. If we don't drain here, we cannot send the
        // messages until the next message arrives.
        // once the client is created, drain the outbox again
        drainOutbox()
      }
    })
  }

The core call above is nettyEnv.createClient(address). The next part is a bit more involved, so bear with me: nettyEnv.createClient eventually calls TransportClientFactory.createClient. That method first checks whether a cached connection to the remote server already exists; if so, it is returned directly, otherwise a new one is created via createClient(resolvedAddress). The creation logic is very similar to the server side, so the comments should be enough.

TransportClientFactory.java
-----------------------------

 public TransportClient createClient(String remoteHost, int remotePort) throws IOException {
    // wrap host and port into an (unresolved) InetSocketAddress
    final InetSocketAddress unresolvedAddress =
      InetSocketAddress.createUnresolved(remoteHost, remotePort);
    // check whether a connection pool already exists for this remote server
    ClientPool clientPool = connectionPool.get(unresolvedAddress);
    if (clientPool == null) {
      // create the connection pool
      connectionPool.putIfAbsent(unresolvedAddress, new ClientPool(numConnectionsPerPeer));
      clientPool = connectionPool.get(unresolvedAddress);
    }
    // pick a random connection from the cached pool
    int clientIndex = rand.nextInt(numConnectionsPerPeer);
    TransportClient cachedClient = clientPool.clients[clientIndex];
    // if the cached connection is still active
    if (cachedClient != null && cachedClient.isActive()) {
      // get the TransportChannelHandler from the pipeline
      TransportChannelHandler handler = cachedClient.getChannel().pipeline()
        .get(TransportChannelHandler.class);
      synchronized (handler) {
        // update the time of the last request
        handler.getResponseHandler().updateTimeOfLastRequest();
      }

      if (cachedClient.isActive()) {
        logger.trace("Returning cached connection to {}: {}",
          cachedClient.getSocketAddress(), cachedClient);
        return cachedClient;
      }
    }
    // if the pool has no usable connection to this remote server, a new one must be created
    final long preResolveHost = System.nanoTime();
    final InetSocketAddress resolvedAddress = new InetSocketAddress(remoteHost, remotePort);
    final long hostResolveTimeMs = (System.nanoTime() - preResolveHost) / 1000000;
    // warn if DNS resolution took too long (this threshold could eventually be made configurable)
    if (hostResolveTimeMs > 2000) {
      logger.warn("DNS resolution for {} took {} ms", resolvedAddress, hostResolveTimeMs);
    } else {
      logger.trace("DNS resolution for {} took {} ms", resolvedAddress, hostResolveTimeMs);
    }

    // update the client at the randomly chosen slot in the pool
    synchronized (clientPool.locks[clientIndex]) {
      cachedClient = clientPool.clients[clientIndex];

      if (cachedClient != null) {
        if (cachedClient.isActive()) {
          logger.trace("Returning cached connection to {}: {}", resolvedAddress, cachedClient);
          return cachedClient;
        } else {
          logger.info("Found inactive connection to {}, creating a new one.", resolvedAddress);
        }
      }
      // create the remote connection and store it in the corresponding pool slot
      clientPool.clients[clientIndex] = createClient(resolvedAddress);
      return clientPool.clients[clientIndex];
    }
  }
  
  /**
   * The code that actually creates the RPC client
   */
 private TransportClient createClient(InetSocketAddress address) throws IOException {
    logger.debug("Creating new connection to " + address);

    Bootstrap bootstrap = new Bootstrap();
    bootstrap.group(workerGroup)
      .channel(socketChannelClass)
      .option(ChannelOption.TCP_NODELAY, true)
      .option(ChannelOption.SO_KEEPALIVE, true)
      .option(ChannelOption.CONNECT_TIMEOUT_MILLIS, conf.connectionTimeoutMs())
      .option(ChannelOption.ALLOCATOR, pooledAllocator);

    final AtomicReference<TransportClient> clientRef = new AtomicReference<>();
    final AtomicReference<Channel> channelRef = new AtomicReference<>();
    // add the channel initializer; it ends up calling the same initializePipeline as the server side
    bootstrap.handler(new ChannelInitializer<SocketChannel>() {
      @Override
      public void initChannel(SocketChannel ch) {
        TransportChannelHandler clientHandler = context.initializePipeline(ch);
        // store the new client/channel references so they can be used outside the anonymous inner class
        clientRef.set(clientHandler.getClient());
        channelRef.set(ch);
      }
    });

    long preConnect = System.nanoTime();
    ChannelFuture cf = bootstrap.connect(address);
    if (!cf.awaitUninterruptibly(conf.connectionTimeoutMs())) {
      throw new IOException(
        String.format("Connecting to %s timed out (%s ms)", address, conf.connectionTimeoutMs()));
    } else if (cf.cause() != null) {
      throw new IOException(String.format("Failed to connect to %s", address), cf.cause());
    }

    TransportClient client = clientRef.get();
    Channel channel = channelRef.get();
    assert client != null : "Channel future completed successfully with null client";

    long preBootstrap = System.nanoTime();
    logger.debug("Connection to {} successful, running bootstraps...", address);
    // apply the client bootstraps (decorator pattern) to layer extra behaviour
    try {
      for (TransportClientBootstrap clientBootstrap : clientBootstraps) {
        clientBootstrap.doBootstrap(client, channel);
      }
    } catch (Exception e) { 
      long bootstrapTimeMs = (System.nanoTime() - preBootstrap) / 1000000;
      logger.error("Exception while bootstrapping client after " + bootstrapTimeMs + " ms", e);
      client.close();
      throw Throwables.propagate(e);
    }
    long postBootstrap = System.nanoTime();

    logger.info("Successfully created connection to {} after {} ms ({} ms spent in bootstraps)",
      address, (postBootstrap - preConnect) / 1000000, (postBootstrap - preBootstrap) / 1000000);

    return client;
  }

Reference: https://www.cnblogs.com/xia520pi/p/8693966.html
