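Problem Description
While following the official RocketMQ documentation to start a local verification environment, I ran into a timeout error.
Local OS: CentOS Linux release 8.5.2111
First, change into the RocketMQ installation directory, for example ~/opt/rocketmq-all-5.2.0-bin-release.
Run the following command to start the NameServer: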
$ sh bin/mqnamesrv
The command runs slowly, but it eventually reports that the NameServer started successfully, with the following output:
Java HotSpot(TM) 64-Bit Server VM warning: Using the DefNew young collector with the CMS collector is deprecated and will likely be removed in a future release
Java HotSpot(TM) 64-Bit Server VM warning: UseCMSCompactAtFullCollection is deprecated and will likely be removed in a future release.
The Name Server boot success. serializeType=JSON, address 0.0.0.0:9876
The jps command also shows the corresponding process:
$ jps
13730 NamesrvStartup
Run the following command to start the Broker + Proxy:
$ sh bin/mqbroker -n localhost:9876 --enable-proxy
This command takes a very long time, roughly 90 seconds, before it prints the following log:
Sat Feb 24 19:48:03 CST 2024 rocketmq-proxy startup successfully
The ~/logs/rocketmqlogs/proxy.log file also shows that the broker started successfully:
2024-02-24 19:47:53 INFO main - The broker[broker-a, 192.168.88.135:10911] boot success. serializeType=JSON and name server is localhost:9876
Note: broker-a in the log is the brokerName parameter configured in the broker.conf file, as shown below:
brokerClusterName = DefaultCluster
# brokerName as configured by default
brokerName = broker-a
brokerId = 0
deleteWhen = 04
fileReservedTime = 48
brokerRole = ASYNC_MASTER
flushDiskType = ASYNC_FLUSH
Run jps again to confirm that the corresponding processes have started:
$ jps
jps
13730 NamesrvStartup
14410 ProxyStartup
Everything appears normal, and neither ~/logs/rocketmqlogs/namesrv.log nor ~/logs/rocketmqlogs/proxy.log shows any obvious exception.
However, creating a Topic fails:
$ sh bin/mqadmin updatetopic -n localhost:9876 -t TestTopic -c DefaultCluster
After about 40 seconds, the command prints the following error:
org.apache.rocketmq.tools.command.SubCommandException: UpdateTopicSubCommand command failed
at org.apache.rocketmq.tools.command.topic.UpdateTopicSubCommand.execute(UpdateTopicSubCommand.java:198)
at org.apache.rocketmq.tools.command.MQAdminStartup.main0(MQAdminStartup.java:164)
at org.apache.rocketmq.tools.command.MQAdminStartup.main(MQAdminStartup.java:114)
Caused by: org.apache.rocketmq.remoting.exception.RemotingTimeoutException: invokeSync call the addr[127.0.0.1:9876] timeout
at org.apache.rocketmq.remoting.netty.NettyRemotingClient.invokeSync(NettyRemotingClient.java:549)
at org.apache.rocketmq.client.impl.MQClientAPIImpl.getBrokerClusterInfo(MQClientAPIImpl.java:1961)
at org.apache.rocketmq.tools.admin.DefaultMQAdminExtImpl.examineBrokerClusterInfo(DefaultMQAdminExtImpl.java:577)
at org.apache.rocketmq.tools.admin.DefaultMQAdminExt.examineBrokerClusterInfo(DefaultMQAdminExt.java:318)
at org.apache.rocketmq.tools.command.CommandUtil.fetchMasterAddrByClusterName(CommandUtil.java:94)
at org.apache.rocketmq.tools.command.topic.UpdateTopicSubCommand.execute(UpdateTopicSubCommand.java:171)
... 2 more
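The error message suggests that 127.0.0.1:9876 cannot be reached, but I verified that the address is definitely reachable, and the error persisted after several more attempts.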
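The reachability check mentioned above is not shown in the original. A minimal sketch of one way to do it in Java follows; the class name PortCheck and the 3000 ms connect timeout are assumptions chosen for illustration. It only opens a plain TCP connection, so a true result says that the port accepts connections and nothing about RocketMQ itself.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical helper, not part of RocketMQ.
final class PortCheck {
    /** Returns true if a TCP connection to host:port is established within timeoutMillis. */
    static boolean canConnect(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // Covers connection refused, unreachable host, and connect timeout.
            return false;
        }
    }

    public static void main(String[] args) {
        long start = System.nanoTime();
        boolean ok = canConnect("127.0.0.1", 9876, 3000);
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println("127.0.0.1:9876 reachable=" + ok + " (" + elapsedMs + " ms)");
    }
}
```

If this reports the port as reachable within a few milliseconds while mqadmin still times out, the delay is more likely on the client side than in the network path.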
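So I switched to a Windows machine to continue testing. Oddly, everything worked on Windows, and I noticed that when RocketMQ starts on Windows, the brokerName is the host name, as in this log:
# zhangsan is the host name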
The broker[zhangsan, 20.5.133.188:10911] boot success. serializeType=JSON and name server is localhost:9876
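That raised a question: could the cause be that the host name was not mapped to 127.0.0.1 in the /etc/hosts file on CentOS?
After adding that mapping, everything did indeed work.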
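The hypothesis can also be checked from the JVM's point of view. The sketch below, with the illustrative class name HostNameProbe, times InetAddress.getLocalHost(), which resolves the machine's own host name. Whether this is the exact lookup involved in the slowdown described here is an assumption, not something the original confirms; the probe only shows whether the host name resolves and how long that takes.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

// Hypothetical diagnostic, not part of RocketMQ or Netty.
final class HostNameProbe {
    /** Resolves the local host name and returns the result together with the elapsed time. */
    static String probe() {
        long start = System.nanoTime();
        String result;
        try {
            InetAddress local = InetAddress.getLocalHost();
            result = local.getHostName() + " -> " + local.getHostAddress();
        } catch (UnknownHostException e) {
            // Thrown when the host name cannot be resolved to an address.
            result = "unresolved (" + e.getMessage() + ")";
        }
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        return result + " in " + elapsedMs + " ms";
    }

    public static void main(String[] args) {
        System.out.println(probe());
    }
}
```

Running it before and after editing /etc/hosts gives a rough comparison of how long host name resolution takes on the machine.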
Tracing the Cause
Tracing the RocketMQ source code from the error log shows that the error comes from a timeout check in the NettyRemotingClient.invokeSync() method.
@Override
public RemotingCommand invokeSync(String addr, final RemotingCommand request, long timeoutMillis)
    throws InterruptedException, RemotingConnectException, RemotingSendRequestException, RemotingTimeoutException {
    long beginStartTime = System.currentTimeMillis();
    final Channel channel = this.getAndCreateChannel(addr);
    String channelRemoteAddr = RemotingHelper.parseChannelRemoteAddr(channel);
    if (channel != null && channel.isActive()) {
        long left = timeoutMillis; // the default timeout is 5000 ms
        try {
            long costTime = System.currentTimeMillis() - beginStartTime;
            left -= costTime;
            if (left <= 0) { // throw immediately once more than 5 s have already elapsed
                throw new RemotingTimeoutException("invokeSync call the addr[" + channelRemoteAddr + "] timeout");
            }
            RemotingCommand response = this.invokeSyncImpl(channel, request, left);
            updateChannelLastResponseTime(addr);
            return response;
        }
        // remaining code omitted...
    }
    // remaining code omitted...
}
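Because the exception comes from this elapsed-time check, the log message alone suggests that 127.0.0.1:9876 is unreachable, when in fact the address is reachable. The timer starts before getAndCreateChannel() is called, so time spent creating the channel counts against the timeout before any request is sent.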
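The budget arithmetic can be illustrated with a small stand-alone sketch. It is a simplification rather than RocketMQ code: the class and method names are hypothetical, the 5000 ms value is the default stated in the code comment above, and 10144 ms is the instantiation time measured later in this article.

```java
// Hypothetical illustration of the "left -= costTime" check; not RocketMQ code.
final class TimeoutBudgetSketch {
    /** Remaining budget after subtracting the time already spent. */
    static long remaining(long timeoutMillis, long costMillis) {
        return timeoutMillis - costMillis;
    }

    /** True when the budget is used up before the request can be sent. */
    static boolean timesOutBeforeSend(long timeoutMillis, long channelSetupMillis) {
        return remaining(timeoutMillis, channelSetupMillis) <= 0;
    }

    public static void main(String[] args) {
        long timeoutMillis = 5000;       // default timeout stated in the article
        long channelSetupMillis = 10144; // instantiation time measured in the article
        // prints "remaining = -5144 ms"
        System.out.println("remaining = " + remaining(timeoutMillis, channelSetupMillis) + " ms");
        // prints "times out before send = true"
        System.out.println("times out before send = "
                + timesOutBeforeSend(timeoutMillis, channelSetupMillis));
    }
}
```

With these numbers the budget is already negative when the check runs, so the exception is thrown even though the connection itself would succeed.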
Further tracing showed that Netty's ReflectiveChannelFactory.newChannel() method was the slow step, taking about 10 seconds.
@Override
public T newChannel() {
    try {
        // constructor belongs to NioSocketChannel.class,
        // so this call instantiates a NioSocketChannel object via reflection
        T t = constructor.newInstance();
        return t;
    } catch (Throwable t) {
        throw new ChannelException("Unable to create Channel from class " + constructor.getDeclaringClass(), t);
    }
}
The verification code is as follows:
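import io.netty.channel.socket.nio.NioSocketChannel;
import java.lang.reflect.Constructor;

// Run inside a method that declares "throws Exception", with Netty on the classpath
long start = System.currentTimeMillis();
Constructor<NioSocketChannel> constructor = NioSocketChannel.class.getConstructor();
constructor.newInstance();
System.out.println(String.format("%s ms", System.currentTimeMillis() - start));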
Output after running it:
10144 ms
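Oddly, once the host name is explicitly mapped to 127.0.0.1 in the /etc/hosts file, the call becomes very fast.
I do not yet understand the underlying reason: why would instantiating a NioSocketChannel object via reflection depend on the mapping between the host name and 127.0.0.1?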
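One generic way to investigate this, sketched here under assumptions, is to sample the stack of the thread while the slow call is still running and see which frames it is waiting in. The helper name StackSampler, the thread name, and the sleeping stand-in task are all hypothetical. To examine the real case, the stand-in would have to be replaced with the NioSocketChannel instantiation on a classpath that includes Netty.

```java
// Hypothetical diagnostic helper; not part of RocketMQ or Netty.
final class StackSampler {
    /** Runs task on a worker thread and returns that thread's stack trace after delayMillis. */
    static StackTraceElement[] sample(Runnable task, long delayMillis) throws InterruptedException {
        Thread worker = new Thread(task, "slow-call-probe");
        worker.setDaemon(true); // do not keep the JVM alive for the probe
        worker.start();
        Thread.sleep(delayMillis);
        StackTraceElement[] frames = worker.getStackTrace();
        worker.interrupt(); // ask the task to stop if it is still blocked
        return frames;
    }

    public static void main(String[] args) throws InterruptedException {
        // Stand-in for the slow call; replace with the code under investigation.
        Runnable slowCall = () -> {
            try {
                Thread.sleep(5000);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };
        for (StackTraceElement frame : sample(slowCall, 500)) {
            System.out.println("  at " + frame);
        }
    }
}
```

The JDK's jstack tool gives the same kind of information for an already running process. The frames printed during the 10-second pause should show which call is actually blocking.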
[References]
Starting RocketMQ on Windows