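Background: our business system runs several scheduled tasks that call the banks at fixed intervals, and the banks in turn send a large volume of requests to the business system every day. All of this traffic goes through MINA. In the test environment the number of open handles (file descriptors) climbed past 10,000 within a day or two of deployment, and in production the system would fall over after roughly a week without a restart. At first it was unclear what was driving the growth: this is an old system that has been through many upgrades and contains a lot of concurrent, multithreaded code. As a stopgap we set up a scheduled job that restarted the production system every week.
Note: the business system and the banks do not talk to each other directly. A forwarding platform written in C relays the messages in both directions. The flow is: business system (request) --> forwarding platform (forward) --> bank (response) --> forwarding platform (forward) --> business system receives the response. Requests initiated by a bank reach the business system the same way.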
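First, list the number of open handles per process, sorted from largest to smallest, one process per line:
lsof -n|awk '{print $2}'|sort|uniq -c|sort -nr|more
The first column is the number of open handles and the second column is the process ID.
Then list every file that a single process has open:
lsof -p <pid>
That output is too long to read in a terminal, so redirect it to a log file and inspect it there:
lsof -p <pid> > openfiles.log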
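To get a quick overview of openfiles.log before reading it line by line, the rows can be grouped by file type. The following is only a sketch: the class name LsofTypeSummary is made up for illustration, and it assumes the default lsof column layout (COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME) with a command name that contains no spaces.

```java
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class LsofTypeSummary {

    /** Counts lsof rows by their TYPE column (5th field), skipping the header row. */
    static Map<String, Integer> countByType(List<String> lsofLines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lsofLines) {
            String[] fields = line.trim().split("\\s+");
            // Skip blank or short lines and the "COMMAND PID USER ..." header
            if (fields.length < 5 || "COMMAND".equals(fields[0])) {
                continue;
            }
            counts.merge(fields[4], 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) throws Exception {
        String file = args.length > 0 ? args[0] : "openfiles.log";
        List<String> lines = Files.readAllLines(Paths.get(file));
        countByType(lines).forEach((type, n) -> System.out.println(n + "\t" + type));
    }
}
```

A large count for socket types such as IPv4, IPv6 or sock, compared with REG (regular files), is the kind of pattern that points at unreleased connections.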
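The log showed a large number of socket connections that had never been released, which pointed at the communication between the business system and the banks. The cause: the test environment is connected to several banks, and some banks only bring up their test endpoints while they are actively testing. The business system connects to the forwarding platform, not to the bank, so as a client its connection always succeeds. The platform, however, cannot forward the request, so no response ever comes back. The forwarding platform does have a timeout of its own, but the business system, acting as the client, set no read timeout. Each such request therefore left the client waiting with its connection still open, and the handle count grew quickly.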
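The effect of a missing read timeout can be reproduced without MINA. The sketch below uses plain java.net sockets, not the MINA API used by the business system, and the class name ReadTimeoutDemo is invented for illustration. A local server socket takes the connection but never writes anything, which stands in for a forwarding platform that cannot reach the bank. With setSoTimeout the read gives up and the socket is closed; with no timeout configured the same read would block indefinitely and keep its handle.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

public class ReadTimeoutDemo {

    /** Connects to a server that never answers; returns true if the read timed out. */
    static boolean readTimesOut(int timeoutMillis) throws IOException {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        // The listening socket completes the TCP handshake but never sends data
        try (ServerSocket silentServer = new ServerSocket(0, 1, loopback);
             Socket client = new Socket(loopback, silentServer.getLocalPort())) {
            client.setSoTimeout(timeoutMillis); // read timeout; 0 would mean wait forever
            try {
                client.getInputStream().read();
                return false; // data or end-of-stream arrived, no timeout
            } catch (SocketTimeoutException e) {
                return true;  // no response in time
            }
        } // try-with-resources closes both sockets and releases their handles
    }

    public static void main(String[] args) throws IOException {
        System.out.println("timed out: " + readTimesOut(500));
    }
}
```

The fix in the client code follows the same idea: bound the wait for the response, and release the connection when the wait expires.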
Below is the code for the business system acting as client and as server, with the modified parts marked.
Client code:
package com.fortunes.hmfms.network.client;
import java.net.InetSocketAddress;
import java.util.concurrent.TimeUnit;
import org.apache.mina.core.RuntimeIoException;
import org.apache.mina.core.future.ConnectFuture;
import org.apache.mina.core.future.ReadFuture;
import org.apache.mina.core.future.WriteFuture;
import org.apache.mina.core.service.IoHandlerAdapter;
import org.apache.mina.core.session.IoSession;
import org.apache.mina.filter.codec.ProtocolCodecFilter;
import org.apache.mina.filter.logging.LoggingFilter;
import org.apache.mina.transport.socket.SocketConnector;
import org.apache.mina.transport.socket.nio.NioSocketConnector;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import com.fortunes.Message;
import com.fortunes.hmfms.network.codec.MessageCodecFactory;
import com.fortunes.hmfms.network.model.XmlEntity;
public class Client extends IoHandlerAdapter{
    final Logger logger = LoggerFactory.getLogger("ROOT");

    public static final int CONNECT_TIMEOUT = 3000;
    public static final String RETURN_VALUE = "returnValue";

    private InetSocketAddress serverAddress;
    private SocketConnector connector;
    private IoSession session;
    private MessageReceivedCallback messageReceivedCallback;

    public Client() {
        connector = new NioSocketConnector();
        connector.getFilterChain().addLast("logger", new LoggingFilter());
        connector.getFilterChain().addLast("codec",
                new ProtocolCodecFilter(new MessageCodecFactory()));
        //connector.setHandler(new Client());
        connector.setHandler(this);
        // Set the connect timeout. add by czp 20181207
        connector.setConnectTimeoutMillis(CONNECT_TIMEOUT*2);
    }

    public boolean isConnected() {
        return (session != null && session.isConnected());
    }

    public void connect(InetSocketAddress serverAddress){
        setServerAddress(serverAddress);
        connect();
    }

    public void reConnectIfNecessary(){
        if(!isConnected()){
            logger.info("Connection lost, reconnecting");
            connect();
        }
    }

    public void connect() {
        ConnectFuture connectFuture = getConnector().connect(getServerAddress());
        connectFuture.awaitUninterruptibly(CONNECT_TIMEOUT);
        //add by czp 20181207
        if (connectFuture.isDone()) {
            if (!connectFuture.isConnected()) { // the attempt finished without connecting
                logger.info("Connection failed");
                // Without this dispose, "too many open files" is thrown after the system
                // has been running for a while, and no new connection can be made
                getConnector().dispose();
            }
        }
        if(connectFuture.isConnected()){
            try {
                session = connectFuture.getSession();
                // Required so that session.read() can be used in sendRequest
                session.getConfig().setUseReadOperation(true);
                logger.info("Connected to {}, local address: {}",session.getRemoteAddress(),session.getLocalAddress());
            } catch (RuntimeIoException e) {
                logger.info("Connection failed",e);
            }
        }else{
            connectFuture.cancel();
            getConnector().dispose();
            logger.info("Connection failed");
        }
    }

    public XmlEntity sendRequest(Message message,MessageReceivedCallback callback){
        /*setMessageReceivedCallback(callback);
        WriteFuture writeFuture = session.write(message);
        writeFuture.awaitUninterruptibly();
        ReadFuture readFuture = session.read();
        readFuture.awaitUninterruptibly();
        return callback.process(this, session, (Message)readFuture.getMessage());
        */
        //change by czp 20181207: fixes the rapid handle growth when the bank does not respond
        Message resp=null;
        try {
            setMessageReceivedCallback(callback);
            WriteFuture writeFuture = session.write(message);
            writeFuture.awaitUninterruptibly();
            ReadFuture readFuture = session.read();
            // Wait for the response, but only for a bounded time
            if(readFuture.awaitUninterruptibly(CONNECT_TIMEOUT*2, TimeUnit.MILLISECONDS)){
                resp=(Message)readFuture.getMessage();
                //return callback.process(this, session, (Message)readFuture.getMessage());
            }else{
                logger.info("Timed out reading the server response, server: "+readFuture.getSession().getServiceAddress());
                if(session != null){
                    // Closing an IoSession is asynchronous: true closes immediately,
                    // false closes after all pending writes have been flushed.
                    // Closing the session only closes the TCP connection, not the client itself.
                    session.getService().dispose();
                    session.close(false);
                    // When the client connects, the OS allocates file handles for it. They must be
                    // released when the request fails, otherwise they leak. Once the total exceeds
                    // the system limit (ulimit -n), "java.io.IOException: Too many open files" is
                    // thrown, new connections cannot be created, and the server goes down.
                    connector.dispose();
                    logger.info("Timed out reading the server response; client resources released");
                }
            }
        } catch (Exception e) {
            // Pass the exception itself so that the stack trace is logged
            logger.info("Exception in Client.sendRequest", e);
        }
        return callback.process(this, session, resp);
    }

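    public void close(){
        if(isConnected()){
            // Closing an IoSession is asynchronous: true closes immediately,
            // false closes after all pending writes have been flushed.
            // Closing the session only closes the TCP connection, not the client itself.
            session.getService().dispose();//add by czp 20181207
            session.close(false);
            connector.dispose();
            logger.info("Client closed the connection\n");
        }
    }
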
    @Override
    public void messageReceived(IoSession session, Object message)
            throws Exception {
        logger.info("Message received from "+session.getRemoteAddress()+":\n{} - local address: {}",message,session.getLocalAddress());
    }

    @Override
    public void messageSent(IoSession session, Object message) throws Exception {
        logger.info("Message sent to "+session.getRemoteAddress()+":\n{}",message);
        logger.info("Message sent");
    }

    @Override
    public void sessionClosed(IoSession session) throws Exception {
        session.getService().dispose();//add by czp 20181207
        session.close(false);//add by czp 20181207
        logger.info("Connection to {} closed - local address: {}",session.getRemoteAddress(),session.getLocalAddress());
    }

    @Override
    public void exceptionCaught(IoSession session, Throwable cause)
            throws Exception {
        session.close(true);
        // The Throwable is passed as the last argument so that the stack trace is logged
        logger.info("Communication error", cause);
    }

    public void setMessageReceivedCallback(MessageReceivedCallback messageReceivedCallback) {
        this.messageReceivedCallback = messageReceivedCallback;
    }

    public MessageReceivedCallback getMessageReceivedCallback() {
        return messageReceivedCallback;
    }

    public void setConnector(SocketConnector connector) {
        this.connector = connector;
    }

    public SocketConnector getConnector() {
        return connector;
    }

    public void setServerAddress(InetSocketAddress serverAddress) {
        this.serverAddress = serverAddress;
    }

    public InetSocketAddress getServerAddress() {
        return serverAddress;
    }
}
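Pay particular attention to the session.getService().dispose(); and session.close(false); calls added to sessionClosed, which release the handles when the session is closed. This matters: each Client creates its own NioSocketConnector, and if that connector is never disposed, the file handles it opened stay held even after the session itself has closed.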
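Why would handles outlive the session? As far as I understand MINA's NIO transport, a connector holds java.nio Selector objects of its own, and on Linux an open Selector occupies file descriptors until it is closed. The sketch below does not use MINA at all; it only shows that behaviour of Selector in isolation. The class name SelectorFdDemo is made up, and the descriptor count comes from com.sun.management.UnixOperatingSystemMXBean, which is only available on Unix-like JVMs (the helper returns -1 elsewhere).

```java
import java.io.IOException;
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;
import java.nio.channels.Selector;
import java.util.ArrayList;
import java.util.List;

public class SelectorFdDemo {

    /** Open file descriptors of this JVM, or -1 if the platform does not expose the count. */
    static long openFds() {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        if (os instanceof com.sun.management.UnixOperatingSystemMXBean) {
            return ((com.sun.management.UnixOperatingSystemMXBean) os).getOpenFileDescriptorCount();
        }
        return -1;
    }

    /** Returns {before, whileOpen, afterClose} descriptor counts around opening selectors. */
    static long[] measure(int selectorCount) throws IOException {
        long before = openFds();
        long whileOpen = -1;
        List<Selector> selectors = new ArrayList<>();
        try {
            for (int i = 0; i < selectorCount; i++) {
                selectors.add(Selector.open()); // each selector takes descriptors from the OS
            }
            whileOpen = openFds();
        } finally {
            for (Selector s : selectors) {
                s.close(); // the descriptors are only returned here
            }
        }
        return new long[] { before, whileOpen, openFds() };
    }

    public static void main(String[] args) throws IOException {
        long[] r = measure(50);
        System.out.println("before=" + r[0] + " open=" + r[1] + " closed=" + r[2]);
    }
}
```

If this reading is right, a Client that is created per request and never disposed would leave its selectors open in the same way, which matches the growth seen in lsof.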
Below is MessageHandler, the code for the business system acting as server (the server code was not modified):
import org.apache.mina.core.service.IoHandlerAdapter;
import org.apache.mina.core.session.IoSession;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import com.fortunes.Message;
import com.fortunes.hmfms.network.model.XmlEntity;
public class MessageHandler extends IoHandlerAdapter {
    final Logger logger = LoggerFactory.getLogger("ROOT");

    @Override
    public void messageReceived(IoSession session, Object o){
        // Note: bankCode and XML_ERROR are not defined anywhere in this excerpt
        logger.info("mina socket: request received from remote client {}, bank code: {}",session.getRemoteAddress(),bankCode);
        Message requestMessage = (Message)o;
        XmlEntity responseXml = null;
        XmlEntity requestXml = XmlEntity.parse(requestMessage.getContents());
        if(requestXml == null){
            responseXml = XmlEntity.create().createDefaultRequest(XML_ERROR);
            responseXml.setResponseCodeAndMsg("000001", "Failed to parse the XML message; please check the message format");
            // No write here: the response is written once at the end of the method
        }else{
            try {
                // business logic
            } catch (NumberFormatException e) {
                logger.info("Exception in message interface", e);
                responseXml = XmlEntity.create().createDefaultResponse(requestXml);
                responseXml.setResponseCodeAndMsg("000001", "System error; please check the input data: "+e.getMessage());
            } catch (Exception e) {
                logger.info("Exception in message interface", e);
                responseXml = XmlEntity.create().createDefaultResponse(requestXml);
                responseXml.setResponseCodeAndMsg("000001", "System error; please check the input data and try again later");
            }
        }
        session.write(Message.createDefaultMessage(responseXml.buildAsBytes()));
    }

    @Override
    public void exceptionCaught(IoSession session, Throwable cause)throws Exception {
        session.close(true);
        logger.info("Exception in communication handler", cause);
    }

    @Override
    public void sessionClosed(IoSession session) throws Exception {
        session.close(false);
        // logger.info("Connection closed", session.getRemoteAddress());
        logger.info("Connection closed");
    }
}
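After these changes were deployed and tested in the test environment, the rapid handle growth no longer occurred, and the handle count stayed at the level seen right after deployment.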