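Spark version: 2.0.0
1. Concepts
Spark is a distributed system, so it relies heavily on network communication and remote procedure calls (RPC). Before 1.6 Spark implemented RPC on top of Akka; because of Akka compatibility problems, Akka was eventually dropped in favor of Netty. This article describes how the Netty-based RPC service works.
The previous article covered the Master startup process but said little about rpcEnv, so this one starts from the point where that article created the rpcEnv and explains how Spark services communicate.
2. RPC implementation
2.1 RPC server side
The Master startup code contains this line: val rpcEnv = RpcEnv.create(SYSTEM_NAME, host, port, conf, securityMgr)
It creates the rpcEnv. Let's look at what this call actually does.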
RpcEnv.scala
------------------
def create(
name: String,
host: String,
port: Int,
conf: SparkConf,
securityManager: SecurityManager,
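// clientMode defaults to false, because whoever starts a service is the server side
clientMode: Boolean = false): RpcEnv = {
// Wrap the arguments in an RpcEnv config object
val config = RpcEnvConfig(conf, name, host, port, securityManager, clientMode)
// Use the Netty-based RpcEnv factory (factory pattern); for extensibility the factory could also be created via reflection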
new NettyRpcEnvFactory().create(config)
}
The key call above is new NettyRpcEnvFactory().create(config); the factory lives in NettyRpcEnv.scala.
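The comment in the code above notes that the factory could be created via reflection to make it easier to extend. A minimal Java sketch of that idea follows. The names Env, EnvFactory and DemoNettyEnvFactory are hypothetical stand-ins invented for this example, not Spark types.

```java
final class ReflectiveFactoryDemo {
    /** Instantiate a factory from its class name, so the implementation can be chosen by configuration. */
    static EnvFactory loadFactory(String className) {
        try {
            Class<?> clazz = Class.forName(className);
            return (EnvFactory) clazz.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException("Cannot create factory " + className, e);
        }
    }

    public static void main(String[] args) {
        EnvFactory factory = loadFactory(DemoNettyEnvFactory.class.getName());
        System.out.println(factory.create("localhost", 7077).describe());
    }
}

/** Hypothetical stand-in for RpcEnv. */
interface Env {
    String describe();
}

/** Hypothetical stand-in for RpcEnvFactory. */
interface EnvFactory {
    Env create(String host, int port);
}

/** Hypothetical factory in the spirit of NettyRpcEnvFactory. */
class DemoNettyEnvFactory implements EnvFactory {
    @Override
    public Env create(String host, int port) {
        return () -> "netty://" + host + ":" + port;
    }
}
```

With this shape the class name could come from a configuration value, at the cost of losing compile-time checking of the factory type.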
NettyRpcEnv.scala
--------------------
// Create the RpcEnv object
def create(config: RpcEnvConfig): RpcEnv = {
val sparkConf = config.conf
// Use JavaSerializerInstance in multiple threads is safe. However, if we plan to support
// KryoSerializer in future, we have to use ThreadLocal to store SerializerInstance
// Serializer; a better approach would be to create it via reflection
val javaSerializerInstance =
new JavaSerializer(sparkConf).newInstance().asInstanceOf[JavaSerializerInstance]
// Create the NettyRpcEnv 【1】
val nettyEnv =
new NettyRpcEnv(sparkConf, javaSerializerInstance, config.host, config.securityManager)
// On the server side, the service has to be started 【2】
if (!config.clientMode) {
// Start the server on the given port
val startNettyRpcEnv: Int => (NettyRpcEnv, Int) = { actualPort =>
nettyEnv.startServer(actualPort)
(nettyEnv, nettyEnv.address.port)
}
try {
// Start the service
Utils.startServiceOnPort(config.port, startNettyRpcEnv, sparkConf, config.name)._1
} catch {
case NonFatal(e) =>
nettyEnv.shutdown()
throw e
}
}
nettyEnv
}
}
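This code is critical and has two main parts:
【1】 Create the NettyRpcEnv
【2】 Start the server
We look at each in turn.
【1】
NettyRpcEnv.scala
-------------------------
When a NettyRpcEnv is constructed, the following fields are the ones to watch:
// (1) Convert the SparkConf into a TransportConf (the transport configuration object)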
private[netty] val transportConf = SparkTransportConf.fromSparkConf(
conf.clone.set("spark.rpc.io.numConnectionsPerPeer", "1"),
"rpc",
conf.getInt("spark.rpc.io.threads", 0))
// (2) Dispatches messages
private val dispatcher: Dispatcher = new Dispatcher(this)
// (3) Serves data streams
private val streamManager = new NettyStreamManager(this)
// (4) Transport context
private val transportContext = new TransportContext(transportConf,
new NettyRpcHandler(dispatcher, this, streamManager))
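(1) SparkTransportConf builds the TransportConf, the configuration object used by the transport layer.
(2) As described in the Master startup article, the dispatcher's registerRpcEndpoint method registers an endpoint and records it in two main fields, endpoints and endpointRefs. During registration the inbox also enqueues an OnStart message, which triggers the call to endpoint.onStart.
(3) NettyStreamManager serves data streams such as files, jars and directories.
(4) The transportContext object mainly holds the following fields:
private final TransportConf conf; // transport configuration
private final RpcHandler rpcHandler; // message handler, e.g. turns the byte stream into a RequestMessage object
private final MessageEncoder encoder; // message encoder
private final MessageDecoder decoder; // message decoder
【2】 If config.clientMode == false, nettyEnv.startServer(actualPort) is called to start the server. (Obtaining actualPort involves some special handling: if the requested port is already in use, a new port is tried.)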
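The port fallback mentioned above can be sketched in plain Java. This is a simplified illustration, not the code of Utils.startServiceOnPort: the retry limit of 16 and the wrap-around formula are assumptions made for the example, and PortRetryDemo is a hypothetical class.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.BindException;
import java.net.ServerSocket;

final class PortRetryDemo {
    /**
     * Try startPort first; if it is taken, try the following ports, for at most
     * maxRetries extra attempts. For start ports of 1024 or above the candidates
     * wrap around inside 1024..65535 (assumed policy).
     */
    static ServerSocket startOnPort(int startPort, int maxRetries) {
        for (int offset = 0; offset <= maxRetries; offset++) {
            // Port 0 means "any free port", so it is retried unchanged.
            int tryPort = (startPort == 0)
                ? 0
                : ((startPort + offset - 1024) % (65536 - 1024)) + 1024;
            try {
                return new ServerSocket(tryPort);
            } catch (BindException e) {
                // Port already in use: fall through to the next candidate.
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
        throw new IllegalStateException(
            "No free port after " + maxRetries + " retries from " + startPort);
    }

    /** Occupy a port, request the same port again, and report whether a different one was bound. */
    static boolean fallsBackWhenBusy() {
        try (ServerSocket busy = new ServerSocket(0);
             ServerSocket server = startOnPort(busy.getLocalPort(), 16)) {
            return server.getLocalPort() != busy.getLocalPort();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("fell back to another port: " + fallsBackWhenBusy());
    }
}
```

The caller has to read the bound port back from the returned socket, which is why the Scala code above returns nettyEnv.address.port rather than the requested port.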
NettyRpcEnv.scala
----------------------
/**
* Start the server
* @param port server port
*/
def startServer(port: Int): Unit = {
val bootstraps: java.util.List[TransportServerBootstrap] =
if (securityManager.isAuthenticationEnabled()) {
java.util.Arrays.asList(new SaslServerBootstrap(transportConf, securityManager))
} else {
java.util.Collections.emptyList()
}
// Start the transport server
server = transportContext.createServer(host, port, bootstraps)
// Register the verifier endpoint; it works like the Master endpoint, so it is not analyzed further here
dispatcher.registerRpcEndpoint(
RpcEndpointVerifier.NAME, new RpcEndpointVerifier(this, dispatcher))
}
First consider transportContext.createServer(host, port, bootstraps) in the code above.
It ends up calling new TransportServer(this, host, port, rpcHandler, bootstraps),
so let's look at how a TransportServer is constructed:
public TransportServer(
TransportContext context,
String hostToBind,
int portToBind,
RpcHandler appRpcHandler,
List<TransportServerBootstrap> bootstraps) {
this.context = context;
this.conf = context.getConf();
this.appRpcHandler = appRpcHandler;
this.bootstraps = Lists.newArrayList(Preconditions.checkNotNull(bootstraps));
try {
// Initialize the Netty server
init(hostToBind, portToBind);
} catch (RuntimeException e) {
JavaUtils.closeQuietly(this);
throw e;
}
}
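Initializing the Netty server involves many steps, but most of them are the boilerplate needed to create any Netty server, so they are skipped here. The one part worth highlighting is this: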
private void init(String hostToBind, int portToBind) {
......
// Add the message handlers
bootstrap.childHandler(new ChannelInitializer<SocketChannel>() {
@Override
protected void initChannel(SocketChannel ch) throws Exception {
// Given the call chain above, appRpcHandler here is the NettyRpcHandler object
RpcHandler rpcHandler = appRpcHandler;
// Wrap rpcHandler in further layers, e.g. SASL authentication (decorator pattern)
for (TransportServerBootstrap bootstrap : bootstraps) {
rpcHandler = bootstrap.doBootstrap(ch, rpcHandler);
}
// Initialize the channel pipeline (the handler runs after the MessageDecoder)
context.initializePipeline(ch, rpcHandler);
}
});
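The wrapping loop in initChannel is an instance of the decorator pattern. Here is a minimal sketch of the same loop with hypothetical Handler and ServerBootstrap interfaces that only imitate the roles of RpcHandler and TransportServerBootstrap. The "auth" and "logging" layers are made up for illustration.

```java
import java.util.Arrays;
import java.util.List;

final class HandlerWrappingDemo {
    /** Hypothetical stand-in for RpcHandler: turns a request into a reply. */
    interface Handler {
        String receive(String request);
    }

    /** Hypothetical stand-in for TransportServerBootstrap: returns a handler wrapping the given one. */
    interface ServerBootstrap {
        Handler doBootstrap(Handler inner);
    }

    /** Same shape as the loop in initChannel: each bootstrap decorates the handler built so far. */
    static Handler wrap(Handler base, List<ServerBootstrap> bootstraps) {
        Handler handler = base;
        for (ServerBootstrap bootstrap : bootstraps) {
            handler = bootstrap.doBootstrap(handler);
        }
        return handler;
    }

    static String demo() {
        Handler echo = request -> "echo(" + request + ")";
        // Made-up auth layer: only requests starting with "token:" reach the inner handler.
        ServerBootstrap auth = inner -> request ->
            request.startsWith("token:") ? inner.receive(request.substring(6)) : "rejected";
        // Made-up logging layer that marks every reply.
        ServerBootstrap logging = inner -> request -> "logged[" + inner.receive(request) + "]";
        Handler handler = wrap(echo, Arrays.asList(auth, logging));
        return handler.receive("token:ping") + " " + handler.receive("ping");
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

The last bootstrap in the list ends up as the outermost layer, so it sees each request first.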
Next, look at context.initializePipeline.
This method adds the handlers to channel.pipeline().
TransportContext.java
--------------------------
public TransportChannelHandler initializePipeline(
SocketChannel channel,
RpcHandler channelRpcHandler) {
try {
TransportChannelHandler channelHandler = createChannelHandler(channel, channelRpcHandler);
channel.pipeline()
.addLast("encoder", encoder)
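// Handles TCP packet coalescing and fragmentation (sticky and half packets)
.addLast(TransportFrameDecoder.HANDLER_NAME, NettyUtils.createFrameDecoder())
// Message decoding
.addLast("decoder", decoder)
// When a connection has been idle (no reads or writes) for too long, an IdleStateEvent is fired; a ChannelInboundHandler can handle it by overriding userEventTriggered.
// That is why TransportChannelHandler overrides userEventTriggered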
.addLast("idleStateHandler", new IdleStateHandler(0, 0, conf.connectionTimeoutMs() / 1000))
// NOTE: Chunks are currently guaranteed to be returned in the order of request, but this
// would require more logic to guarantee if this were not part of the same event loop.
.addLast("handler", channelHandler);
return channelHandler;
} catch (RuntimeException e) {
logger.error("Error while initializing Netty pipeline", e);
throw e;
}
}
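The frame decoder in the pipeline above is what deals with sticky and half packets. The usual technique is a length prefix, sketched below in plain Java. The sketch assumes an 8-byte length field that counts itself. It only illustrates the idea and is not TransportFrameDecoder's code. FrameDecoderDemo is a hypothetical class.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class FrameDecoderDemo {
    private static final int LENGTH_SIZE = 8; // assumed: 8-byte length prefix that counts itself

    private ByteBuffer pending = ByteBuffer.allocate(0); // unread bytes carried over between reads

    /** Prefix a payload with its total frame length. */
    static byte[] encode(String payload) {
        byte[] body = payload.getBytes(StandardCharsets.UTF_8);
        return ByteBuffer.allocate(LENGTH_SIZE + body.length)
            .putLong(LENGTH_SIZE + body.length)
            .put(body)
            .array();
    }

    /** Feed an arbitrary chunk of bytes; return every frame that is now complete. */
    List<String> feed(byte[] chunk) {
        ByteBuffer merged = ByteBuffer.allocate(pending.remaining() + chunk.length);
        merged.put(pending).put(chunk).flip();
        List<String> frames = new ArrayList<>();
        while (merged.remaining() >= LENGTH_SIZE) {
            long frameLength = merged.getLong(merged.position()); // peek without consuming
            if (merged.remaining() < frameLength) {
                break; // half packet: wait for more bytes
            }
            merged.position(merged.position() + LENGTH_SIZE);
            byte[] body = new byte[(int) frameLength - LENGTH_SIZE];
            merged.get(body);
            frames.add(new String(body, StandardCharsets.UTF_8));
        }
        pending = merged; // keep whatever is left for the next read
        return frames;
    }

    /** Send two frames back to back, but deliver them split at an arbitrary byte offset. */
    static List<String> demo() {
        byte[] first = encode("hello");
        byte[] second = encode("world");
        byte[] stream = new byte[first.length + second.length];
        System.arraycopy(first, 0, stream, 0, first.length);
        System.arraycopy(second, 0, stream, first.length, second.length);
        FrameDecoderDemo decoder = new FrameDecoderDemo();
        List<String> out = new ArrayList<>();
        out.addAll(decoder.feed(Arrays.copyOfRange(stream, 0, 20)));             // one and a half frames
        out.addAll(decoder.feed(Arrays.copyOfRange(stream, 20, stream.length))); // the rest
        return out;
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Because frames are cut by length rather than by read boundaries, the later decoder always sees exactly one whole message at a time.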
The handlers added to channel.pipeline() can be pictured as a chain.
[Figure: handler chain in channel.pipeline(); IdleStateHandler omitted]
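To make this easier to follow, here are the three most important handlers: MessageEncoder, MessageDecoder and TransportChannelHandler.
(1) MessageEncoder: as its encode method shows, the encoder takes the type, data and other fields of a Message object and encodes them into bytes.
(2) MessageDecoder: the inverse of MessageEncoder; it turns bytes back into a Message object.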
private Message decode(Message.Type msgType, ByteBuf in) {
switch (msgType) {
case ChunkFetchRequest:
return ChunkFetchRequest.decode(in);
case ChunkFetchSuccess:
return ChunkFetchSuccess.decode(in);
......
(3) TransportChannelHandler:
@Override
public void channelRead0(ChannelHandlerContext ctx, Message request) throws Exception {
// Distinguish the message kind
if (request instanceof RequestMessage) {
requestHandler.handle((RequestMessage) request); // see TransportRequestHandler.handle
} else {
responseHandler.handle((ResponseMessage) request); // see TransportResponseHandler.handle
}
}
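Handlers (1) to (3) can be summarized in one small sketch: the encoder writes a type tag in front of the data, the decoder reads the tag back, and the channel handler routes by message kind. The type ids and wire layout below are made up for the example and do not match Spark's Message.Type values. MessageCodecDemo is a hypothetical class.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

final class MessageCodecDemo {
    /** Hypothetical message types; the one-byte ids are invented for this sketch. */
    enum Type {
        RPC_REQUEST(0, true), RPC_RESPONSE(1, false), RPC_FAILURE(2, false);

        final byte id;
        final boolean isRequest;

        Type(int id, boolean isRequest) {
            this.id = (byte) id;
            this.isRequest = isRequest;
        }

        static Type decode(byte id) {
            for (Type type : values()) {
                if (type.id == id) {
                    return type;
                }
            }
            throw new IllegalArgumentException("Unknown message type: " + id);
        }
    }

    /** Encoder side: write the type tag, the request id, then the body. */
    static byte[] encode(Type type, long requestId, String body) {
        byte[] bytes = body.getBytes(StandardCharsets.UTF_8);
        return ByteBuffer.allocate(1 + 8 + bytes.length)
            .put(type.id)
            .putLong(requestId)
            .put(bytes)
            .array();
    }

    /** Decoder plus channel handler: read the tag first, then route by message kind. */
    static String route(byte[] wire) {
        ByteBuffer in = ByteBuffer.wrap(wire);
        Type type = Type.decode(in.get());
        long requestId = in.getLong();
        byte[] bytes = new byte[in.remaining()];
        in.get(bytes);
        String target = type.isRequest ? "requestHandler" : "responseHandler";
        return target + " <- " + type + " #" + requestId + " "
            + new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(route(encode(Type.RPC_REQUEST, 7L, "ping")));
        System.out.println(route(encode(Type.RPC_FAILURE, 7L, "boom")));
    }
}
```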
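While handling messages, TransportChannelHandler's request and response handlers distinguish three kinds of messages: RPC messages, ChunkFetch messages and Stream messages.
- RPC messages represent everything Spark sends for RPC operations; they are usually small, mostly control messages
- ChunkFetch messages represent everything Spark sends when pulling data; they carry shuffle data and RDD block data
- Stream messages are simple; they mainly carry jars and files from the driver to executors
The focus here is on how RPC messages are handled.
(1) RequestMessage handling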
TransportRequestHandler.java
------------------------
/**
* Process an RPC request
* @param req the RPC request
*/
private void processRpcRequest(final RpcRequest req) {
try {
// rpcHandler is the NettyRpcHandler
// Core logic: rpcHandler.receive
rpcHandler.receive(reverseClient, req.body().nioByteBuffer(), new RpcResponseCallback() {
@Override
public void onSuccess(ByteBuffer response) {
respond(new RpcResponse(req.requestId, new NioManagedBuffer(response)));
}
@Override
public void onFailure(Throwable e) {
respond(new RpcFailure(req.requestId, Throwables.getStackTraceAsString(e)));
}
});
} catch (Exception e) {
logger.error("Error while invoking RpcHandler#receive() on RPC id " + req.requestId, e);
respond(new RpcFailure(req.requestId, Throwables.getStackTraceAsString(e)));
} finally {
req.body().release();
}
}
NettyRpcHandler (in NettyRpcEnv.scala)
---------------------------
// Receive and handle a request
override def receive(
client: TransportClient,
message: ByteBuffer,
callback: RpcResponseCallback): Unit = {
// ByteBuffer => RequestMessage
val messageToDispatch = internalReceive(client, message)
// Dispatch the request; as described earlier, it ends up in receivers, which triggers the endpoint's request handling
dispatcher.postRemoteMessage(messageToDispatch, callback)
}
(2) ResponseMessage handling
public void handle(ResponseMessage message) throws Exception {
....
else if (message instanceof RpcResponse) {
// Handle an RpcResponse
RpcResponse resp = (RpcResponse) message;
RpcResponseCallback listener = outstandingRpcs.get(resp.requestId);
if (listener == null) {
logger.warn("Ignoring response for RPC {} from {} ({} bytes) since it is not outstanding",
resp.requestId, remoteAddress, resp.body().size());
} else {
outstandingRpcs.remove(resp.requestId);
try {
// Report success through the listener
listener.onSuccess(resp.body().nioByteBuffer());
} finally {
resp.body().release();
}
}
} else if (message instanceof RpcFailure) {
// Handle an RpcFailure
RpcFailure resp = (RpcFailure) message;
RpcResponseCallback listener = outstandingRpcs.get(resp.requestId);
if (listener == null) {
logger.warn("Ignoring response for RPC {} from {} ({}) since it is not outstanding",
resp.requestId, remoteAddress, resp.errorString);
} else {
outstandingRpcs.remove(resp.requestId);
// Report failure through the listener
listener.onFailure(new RuntimeException(resp.errorString));
}
}
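Both halves of the RPC flow hinge on the request id: the sender stores a callback under the id, and the response handler looks it up again when an RpcResponse or RpcFailure arrives. A minimal sketch of that bookkeeping follows, with a hypothetical Callback interface in place of RpcResponseCallback. Unlike the code above, it uses a single remove call instead of get followed by remove.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

final class OutstandingRpcDemo {
    /** Hypothetical stand-in for RpcResponseCallback. */
    interface Callback {
        void onSuccess(String response);
        void onFailure(Throwable e);
    }

    private final Map<Long, Callback> outstandingRpcs = new ConcurrentHashMap<>();
    private final AtomicLong nextId = new AtomicLong();

    /** Sender side: remember the callback under a fresh request id before the request goes out. */
    long send(Callback callback) {
        long requestId = nextId.incrementAndGet();
        outstandingRpcs.put(requestId, callback);
        return requestId;
    }

    /** Receiver side: complete and forget the matching callback; returns false if none is outstanding. */
    boolean handleResponse(long requestId, String body, String error) {
        Callback listener = outstandingRpcs.remove(requestId);
        if (listener == null) {
            return false; // not outstanding: the response is ignored
        }
        if (error == null) {
            listener.onSuccess(body);
        } else {
            listener.onFailure(new RuntimeException(error));
        }
        return true;
    }

    static List<String> demo() {
        List<String> log = new ArrayList<>();
        Callback callback = new Callback() {
            @Override public void onSuccess(String response) { log.add("ok:" + response); }
            @Override public void onFailure(Throwable e) { log.add("failed:" + e.getMessage()); }
        };
        OutstandingRpcDemo client = new OutstandingRpcDemo();
        long first = client.send(callback);
        long second = client.send(callback);
        client.handleResponse(first, "pong", null);
        client.handleResponse(second, null, "boom");
        if (!client.handleResponse(first, "late", null)) {
            log.add("ignored"); // a second response for the same id finds no callback
        }
        return log;
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```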
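2.2 RPC client side
The Master startup analysis looked at this line:
// Ask the Master's endpoint for the ports it is bound to
val portsResponse = masterEndpoint.askWithRetry[BoundPortsResponse](BoundPortsRequest)
It uses the reference to the master endpoint to call the Master's service, that is, it requests data as a client of the Master. The previous article covered the first of two cases: when remoteAddr == address (the local case), the RequestMessage is simply added to the inbox, which is straightforward. Now for the second case: what happens when the server is remote?
An RpcOutboxMessage is created first and added to the outbox; if a connection to the remote server already exists, the request is sent over it directly.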
NettyRpcEnv.scala
---------------------
// Wrap the RPC request
val rpcMessage = RpcOutboxMessage(serialize(message),
onFailure,
(client, response) => onSuccess(deserialize[Any](client, response)))
// Hand the message to the outbox
postToOutbox(message.receiver, rpcMessage)
/**
* Add an outgoing message to the outbox
* @param receiver reference to the receiving endpoint
* @param message the message to send
*/
private def postToOutbox(receiver: NettyRpcEndpointRef, message: OutboxMessage): Unit = {
if (receiver.client != null) {
// A connection to the receiver already exists, so send directly. On the first call receiver.client is null
message.sendWith(receiver.client)
} else {
require(receiver.address != null,
"Cannot send message to client endpoint with no listen address.")
// One outbox per remote address
val targetOutbox = {
// Check whether an outbox is already stored for this address
val outbox = outboxes.get(receiver.address)
if (outbox == null) {
// No outbox yet, so create one
val newOutbox = new Outbox(this, receiver.address)
val oldOutbox = outboxes.putIfAbsent(receiver.address, newOutbox)
if (oldOutbox == null) {
newOutbox
} else {
oldOutbox
}
} else {
outbox
}
}
if (stopped.get) {
// It's possible that we put `targetOutbox` after stopping. So we need to clean it.
// Remove it from the outbox map and stop it
outboxes.remove(receiver.address)
targetOutbox.stop()
} else {
// Send the message to the receiver
targetOutbox.send(message)
}
}
}
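The outbox lookup in postToOutbox is the classic get-then-putIfAbsent idiom for a concurrent map. A minimal Java sketch follows, with a hypothetical Box class in place of Outbox and a plain string as the address.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

final class OutboxRegistryDemo {
    /** Hypothetical stand-in for Outbox. */
    static final class Box {
        final String address;

        Box(String address) {
            this.address = address;
        }
    }

    private final ConcurrentMap<String, Box> outboxes = new ConcurrentHashMap<>();

    /** Return the single outbox for an address, creating it on first use. */
    Box outboxFor(String address) {
        Box existing = outboxes.get(address);
        if (existing != null) {
            return existing;
        }
        Box created = new Box(address);
        // If another thread registered an outbox in the meantime, use theirs and drop ours.
        Box raced = outboxes.putIfAbsent(address, created);
        return raced == null ? created : raced;
    }

    public static void main(String[] args) {
        OutboxRegistryDemo registry = new OutboxRegistryDemo();
        Box first = registry.outboxFor("host-a:7077");
        System.out.println(first == registry.outboxFor("host-a:7077")); // same instance is reused
    }
}
```

In Java 8 and later, computeIfAbsent expresses the same thing in one call. The explicit form is shown because it matches the Scala code step by step.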
The code of targetOutbox.send is simple: it checks whether this outbox has been stopped and, if not, queues the message and drains the outbox.
Outbox.scala
--------------------------
def send(message: OutboxMessage): Unit = {
val dropped = synchronized {
if (stopped) {
true
} else {
// Add the message to the outbox's message queue
messages.add(message)
false
}
}
if (dropped) {
message.onFailure(new SparkException("Message is dropped because Outbox is stopped"))
} else {
// Process the outbox messages
drainOutbox()
}
}
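The real message handling starts in drainOutbox. Its logic: if no client exists yet, one has to be created (the important part); if one already exists, all queued messages are sent to the remote server.
Outbox.scala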
----------------------
private def drainOutbox(): Unit = {
var message: OutboxMessage = null
synchronized {
if (stopped) {
return
}
if (connectFuture != null) {
// A connection attempt is already in progress, so return
// We are connecting to the remote address, so just exit
return
}
if (client == null) {
// There is no connect task but client is null, so we need to launch the connect task.
launchConnectTask()
return
}
if (draining) {
// There is some thread draining, so just exit
return
}
message = messages.poll()
if (message == null) {
return
}
// Mark that draining is in progress
draining = true
}
// Keep consuming until the outbox queue is empty
while (true) {
try {
val _client = synchronized { client }
if (_client != null) {
// Send to the receiver
message.sendWith(_client)
} else {
assert(stopped == true)
}
} catch {
case NonFatal(e) =>
handleNetworkFailure(e)
return
}
synchronized {
if (stopped) {
return
}
// Fetch the next message
message = messages.poll()
if (message == null) {
// No more messages, so return
draining = false
return
}
}
}
}
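The draining flag in drainOutbox lets any number of threads enqueue while exactly one of them sends, and the send itself happens outside the lock. A stripped-down Java sketch of just that part follows, leaving out the client, stop and failure handling. DrainDemo is a hypothetical class, and adding to a list stands in for the network write.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

final class DrainDemo {
    private final Queue<String> messages = new ArrayDeque<>();
    private final List<String> delivered = new ArrayList<>();
    private boolean draining = false;

    void send(String message) {
        synchronized (this) {
            messages.add(message);
        }
        drain();
    }

    private void drain() {
        String message;
        synchronized (this) {
            if (draining) {
                return; // another thread is draining and will pick up our message
            }
            message = messages.poll();
            if (message == null) {
                return;
            }
            draining = true;
        }
        while (true) {
            delivered.add(message); // stands in for the network write, done outside the lock
            synchronized (this) {
                message = messages.poll();
                if (message == null) {
                    draining = false;
                    return;
                }
            }
        }
    }

    /** Snapshot of what has been delivered; meant to be called once the senders have finished. */
    synchronized List<String> delivered() {
        return new ArrayList<>(delivered);
    }

    /** Start several sender threads and count how many messages were delivered in total. */
    static int deliveredByConcurrentSenders(int threads, int perThread) {
        DrainDemo outbox = new DrainDemo();
        List<Thread> senders = new ArrayList<>();
        for (int t = 0; t < threads; t++) {
            Thread sender = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    outbox.send("m" + i);
                }
            });
            senders.add(sender);
            sender.start();
        }
        try {
            for (Thread sender : senders) {
                sender.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return outbox.delivered().size();
    }

    public static void main(String[] args) {
        System.out.println(deliveredByConcurrentSenders(4, 100));
    }
}
```

No message is stranded: a sender that finds draining set has already queued its message under the lock, and the active drainer only clears the flag after seeing an empty queue under the same lock.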
Since we are studying the RPC client, the launchConnectTask method deserves the closest look.
Outbox.scala
----------------------
private def launchConnectTask(): Unit = {
connectFuture = nettyEnv.clientConnectionExecutor.submit(new Callable[Unit] {
override def call(): Unit = {
try {
// Create a client connected to address
val _client = nettyEnv.createClient(address)
outbox.synchronized {
client = _client
if (stopped) {
closeClient()
}
}
} catch {
case ie: InterruptedException =>
// exit
return
case NonFatal(e) =>
outbox.synchronized { connectFuture = null }
handleNetworkFailure(e)
return
}
outbox.synchronized { connectFuture = null }
// It's possible that no thread is draining now. If we don't drain here, we cannot send the
// messages until the next message arrives.
// Once the connection is established, drain again
drainOutbox()
}
})
}
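The interplay between launchConnectTask and drainOutbox comes down to this: messages are buffered while a connect task runs on another thread, and the task drains the buffer once the connection exists. Below is a simplified Java sketch of that hand-off. LazyConnectDemo is a hypothetical class, the "connection" is just a boolean, and for brevity it delivers while holding the lock, which the real code avoids.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

final class LazyConnectDemo {
    private final ExecutorService connectExecutor = Executors.newSingleThreadExecutor();
    private final Queue<String> messages = new ArrayDeque<>();
    private final List<String> wire = new ArrayList<>(); // what the "remote side" received
    private Future<?> connectFuture;                      // non-null while a connect is in flight
    private boolean connected;

    synchronized void send(String message) {
        messages.add(message);
        drain();
    }

    private synchronized void drain() {
        if (connectFuture != null) {
            return; // connecting: the connect task drains when it finishes
        }
        if (!connected) {
            launchConnectTask();
            return;
        }
        String message;
        while ((message = messages.poll()) != null) {
            wire.add(message);
        }
    }

    /** Called with the lock held, so the task cannot clear connectFuture before it is assigned. */
    private void launchConnectTask() {
        connectFuture = connectExecutor.submit(() -> {
            synchronized (this) {
                connected = true; // a real client would open the connection here
                connectFuture = null;
            }
            drain(); // nobody else may be draining, so flush what was buffered
        });
    }

    static List<String> demo() {
        LazyConnectDemo outbox = new LazyConnectDemo();
        outbox.send("a");
        outbox.send("b");
        outbox.send("c");
        outbox.connectExecutor.shutdown();
        try {
            outbox.connectExecutor.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        synchronized (outbox) {
            return new ArrayList<>(outbox.wire);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Whichever thread wins the race, the messages leave in queue order, because every drain runs under the same lock.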
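The key call above is nettyEnv.createClient(address).
The next part is more involved. nettyEnv.createClient ends up calling TransportClientFactory.createClient, which first checks whether a cached connection to the remote server exists. If so it is returned directly; otherwise a new one is created with TransportClientFactory.createClient(resolvedAddress). The creation logic closely mirrors the server side, so the comments should be enough to follow it.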
TransportClientFactory.java
-----------------------------
public TransportClient createClient(String remoteHost, int remotePort) throws IOException {
// Wrap host and port in an (unresolved) InetSocketAddress
final InetSocketAddress unresolvedAddress =
InetSocketAddress.createUnresolved(remoteHost, remotePort);
// Check whether the connection pool already has connections to this remote server
ClientPool clientPool = connectionPool.get(unresolvedAddress);
if (clientPool == null) {
// Create the connection pool
connectionPool.putIfAbsent(unresolvedAddress, new ClientPool(numConnectionsPerPeer));
clientPool = connectionPool.get(unresolvedAddress);
}
// Pick a random connection from the cached pool
int clientIndex = rand.nextInt(numConnectionsPerPeer);
TransportClient cachedClient = clientPool.clients[clientIndex];
// If the connection is still active
if (cachedClient != null && cachedClient.isActive()) {
// Get the TransportChannelHandler from the channel pipeline
TransportChannelHandler handler = cachedClient.getChannel().pipeline()
.get(TransportChannelHandler.class);
synchronized (handler) {
// Update the time of the last request
handler.getResponseHandler().updateTimeOfLastRequest();
}
if (cachedClient.isActive()) {
logger.trace("Returning cached connection to {}: {}",
cachedClient.getSocketAddress(), cachedClient);
return cachedClient;
}
}
// No usable cached connection to this remote server in that slot, so a new one has to be created
final long preResolveHost = System.nanoTime();
final InetSocketAddress resolvedAddress = new InetSocketAddress(remoteHost, remotePort);
final long hostResolveTimeMs = (System.nanoTime() - preResolveHost) / 1000000;
// Check whether DNS resolution was slow (the threshold should eventually be configurable)
if (hostResolveTimeMs > 2000) {
logger.warn("DNS resolution for {} took {} ms", resolvedAddress, hostResolveTimeMs);
} else {
logger.trace("DNS resolution for {} took {} ms", resolvedAddress, hostResolveTimeMs);
}
// Update the client at the randomly chosen slot
synchronized (clientPool.locks[clientIndex]) {
cachedClient = clientPool.clients[clientIndex];
if (cachedClient != null) {
if (cachedClient.isActive()) {
logger.trace("Returning cached connection to {}: {}", resolvedAddress, cachedClient);
return cachedClient;
} else {
logger.info("Found inactive connection to {}, creating a new one.", resolvedAddress);
}
}
// Create the remote connection and store it in that slot of the pool
clientPool.clients[clientIndex] = createClient(resolvedAddress);
return clientPool.clients[clientIndex];
}
}
/**
* The code that actually creates the RPC client
*/
private TransportClient createClient(InetSocketAddress address) throws IOException {
logger.debug("Creating new connection to " + address);
Bootstrap bootstrap = new Bootstrap();
bootstrap.group(workerGroup)
.channel(socketChannelClass)
.option(ChannelOption.TCP_NODELAY, true)
.option(ChannelOption.SO_KEEPALIVE, true)
.option(ChannelOption.CONNECT_TIMEOUT_MILLIS, conf.connectionTimeoutMs())
.option(ChannelOption.ALLOCATOR, pooledAllocator);
final AtomicReference<TransportClient> clientRef = new AtomicReference<>();
final AtomicReference<Channel> channelRef = new AtomicReference<>();
// Install the handlers; this ends up calling the same pipeline setup as the server side
bootstrap.handler(new ChannelInitializer<SocketChannel>() {
@Override
public void initChannel(SocketChannel ch) {
TransportChannelHandler clientHandler = context.initializePipeline(ch);
// Publish the client and channel references so they can be used outside the anonymous inner class
clientRef.set(clientHandler.getClient());
channelRef.set(ch);
}
});
long preConnect = System.nanoTime();
ChannelFuture cf = bootstrap.connect(address);
if (!cf.awaitUninterruptibly(conf.connectionTimeoutMs())) {
throw new IOException(
String.format("Connecting to %s timed out (%s ms)", address, conf.connectionTimeoutMs()));
} else if (cf.cause() != null) {
throw new IOException(String.format("Failed to connect to %s", address), cf.cause());
}
TransportClient client = clientRef.get();
Channel channel = channelRef.get();
assert client != null : "Channel future completed successfully with null client";
long preBootstrap = System.nanoTime();
logger.debug("Connection to {} successful, running bootstraps...", address);
// Run the client bootstraps (e.g. SASL authentication) on the new connection
try {
for (TransportClientBootstrap clientBootstrap : clientBootstraps) {
clientBootstrap.doBootstrap(client, channel);
}
} catch (Exception e) {
long bootstrapTimeMs = (System.nanoTime() - preBootstrap) / 1000000;
logger.error("Exception while bootstrapping client after " + bootstrapTimeMs + " ms", e);
client.close();
throw Throwables.propagate(e);
}
long postBootstrap = System.nanoTime();
logger.info("Successfully created connection to {} after {} ms ({} ms spent in bootstraps)",
address, (postBootstrap - preConnect) / 1000000, (postBootstrap - preBootstrap) / 1000000);
return client;
}
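The caching logic in createClient(remoteHost, remotePort) can be reduced to a small sketch: one pool per address, a random slot, a lock-free fast path, and a re-check under the slot's lock before a new connection is created. ClientPoolDemo, Client and Pool below are hypothetical classes. A boolean stands in for isActive(), and computeIfAbsent replaces the putIfAbsent/get pair for brevity.

```java
import java.util.Random;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

final class ClientPoolDemo {
    /** Hypothetical stand-in for TransportClient. */
    static final class Client {
        final String address;
        volatile boolean active = true;

        Client(String address) {
            this.address = address;
        }
    }

    /** Fixed number of slots per remote address, each with its own lock. */
    static final class Pool {
        final Client[] clients;
        final Object[] locks;

        Pool(int size) {
            clients = new Client[size];
            locks = new Object[size];
            for (int i = 0; i < size; i++) {
                locks[i] = new Object();
            }
        }
    }

    private final ConcurrentHashMap<String, Pool> connectionPool = new ConcurrentHashMap<>();
    private final Random rand = new Random();
    private final int numConnectionsPerPeer;
    final AtomicInteger connectionsOpened = new AtomicInteger();

    ClientPoolDemo(int numConnectionsPerPeer) {
        this.numConnectionsPerPeer = numConnectionsPerPeer;
    }

    Client createClient(String address) {
        Pool pool = connectionPool.computeIfAbsent(address, a -> new Pool(numConnectionsPerPeer));
        int index = rand.nextInt(numConnectionsPerPeer);
        Client cached = pool.clients[index];
        if (cached != null && cached.active) {
            return cached; // fast path: reuse without taking the lock
        }
        synchronized (pool.locks[index]) {
            cached = pool.clients[index]; // re-check: another thread may have reconnected already
            if (cached != null && cached.active) {
                return cached;
            }
            connectionsOpened.incrementAndGet(); // stands in for opening a real connection
            pool.clients[index] = new Client(address);
            return pool.clients[index];
        }
    }

    public static void main(String[] args) {
        ClientPoolDemo factory = new ClientPoolDemo(1);
        Client first = factory.createClient("host-a:7077");
        System.out.println(first == factory.createClient("host-a:7077")); // cached client reused
    }
}
```

With one slot per peer, which is what the RPC transport configuration earlier in the article sets through spark.rpc.io.numConnectionsPerPeer, every call for the same address lands on the same connection until it becomes inactive.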
Reference: https://www.cnblogs.com/xia520pi/p/8693966.html