1. Symptoms
After the file server had been running for a while, a large number of file downloads failed.
2. Locating the problem
The server logs contained a large number of connection pool exceptions:
org.apache.http.conn.ConnectionPoolTimeoutException: Timeout waiting for connection from pool
As a result, downloading files from OBS failed.
Checking the server's network state showed a large number of connections in the CLOSE_WAIT state:
[root@host]# netstat -an|awk '/tcp/ {print $6}'|sort|uniq -c
800 CLOSE_WAIT
23 ESTABLISHED
25 LISTEN
2 TIME_WAIT
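On Linux the same tally can be taken without netstat by reading /proc/net/tcp (and /proc/net/tcp6), where the fourth column holds each socket's state as a hex code (08 is CLOSE_WAIT, 0A is LISTEN). The following is a minimal sketch assuming a Linux host; the class name TcpStateCounter is hypothetical and not part of the original FileServer code.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class TcpStateCounter {
    // Index = hex "st" value in /proc/net/tcp (e.g. 0x08 -> CLOSE_WAIT, 0x0A -> LISTEN).
    private static final String[] STATE_NAMES = {
        "UNKNOWN", "ESTABLISHED", "SYN_SENT", "SYN_RECV", "FIN_WAIT1", "FIN_WAIT2",
        "TIME_WAIT", "CLOSE", "CLOSE_WAIT", "LAST_ACK", "LISTEN", "CLOSING"
    };

    /** Counts sockets per TCP state from the lines of /proc/net/tcp or /proc/net/tcp6. */
    public static Map<String, Integer> count(List<String> lines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            String[] cols = line.trim().split("\\s+");
            // Data rows start with "N:"; the header row starts with "sl" and is skipped.
            if (cols.length < 4 || !cols[0].endsWith(":")) {
                continue;
            }
            int code = Integer.parseInt(cols[3], 16);
            String name = code > 0 && code < STATE_NAMES.length ? STATE_NAMES[code] : "UNKNOWN";
            counts.merge(name, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) throws IOException {
        Map<String, Integer> total = new TreeMap<>();
        for (String file : new String[] {"/proc/net/tcp", "/proc/net/tcp6"}) {
            Path path = Paths.get(file);
            if (Files.exists(path)) {
                count(Files.readAllLines(path)).forEach((state, n) -> total.merge(state, n, Integer::sum));
            }
        }
        // Same "count state" layout as `netstat ... | uniq -c`
        total.forEach((state, n) -> System.out.println(n + " " + state));
    }
}
```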
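Looking into why CLOSE_WAIT piles up: it is usually caused by a passively closed connection not being handled properly, i.e. the peer has closed its end but the local application never closes its own socket.
For example, suppose server A requests a file from Apache on server B. Normally, if the request succeeds, server A actively closes the connection after fetching the resource; this is an active close, and the connection on A's side ends up in TIME_WAIT. But what if something goes wrong? Suppose the requested resource does not exist on server B. Then B initiates the close and A is the passive closer. If A does not release the connection after it has been passively closed, the connection stays in CLOSE_WAIT.
Applying this to our case: FileServer acts as the client, requesting files from OBS. If a file does not exist on OBS, OBS initiates the close; if FileServer then never closes the connection on its side, we would expect exactly this problem.
The logs do confirm a large number of requests for files that do not exist: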
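The passive-close case can be observed with plain sockets on the loopback interface: "server B" accepts a connection and closes it immediately, and "server A" reads end-of-stream but has not yet closed its own socket. This is a minimal sketch using java.net rather than HttpClient and OBS; the class and method names are hypothetical.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;

public class PassiveCloseDemo {

    /**
     * Returns true if the client (A) has received the peer's FIN (read() == -1)
     * while its own socket is still open: the state the kernel reports as
     * CLOSE_WAIT until A calls close().
     */
    static boolean peerClosedButWeDidNot() throws IOException {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (ServerSocket serverB = new ServerSocket(0, 1, loopback);
             Socket clientA = new Socket(loopback, serverB.getLocalPort())) {
            // B closes first without sending anything: B is the active closer
            serverB.accept().close();
            // Returns -1 once A has received B's FIN
            int firstByte = clientA.getInputStream().read();
            // Until A closes, `netstat -an` would list A's side of this connection as CLOSE_WAIT
            return firstByte == -1 && !clientA.isClosed();
        } // try-with-resources closes clientA here, which moves it out of CLOSE_WAIT
    }

    public static void main(String[] args) throws IOException {
        System.out.println("peer closed, our socket still open: " + peerClosedButWeDidNot());
    }
}
```

If the program never closed clientA (for example because an exception skipped the close), the connection would remain in CLOSE_WAIT for as long as the process kept the socket.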
[root@host]# cat fileserver.log.2019-02-27* |grep 'get response from nsp is:404' |wc -l
881
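The connection pool configuration in the code sets the maximum number of HTTP connections to 800, and the number of CLOSE_WAIT connections had reached exactly 800. This supports the hypothesis: every connection in the pool was stuck, so new requests timed out waiting for a free one.
Digging further, this is the code that fetches the file: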
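Why 800 stuck connections turn into "Timeout waiting for connection from pool" can be simulated with a semaphore standing in for the pool: each leaked lease permanently removes one permit. This is a minimal sketch under stated assumptions: BoundedPool and PoolExhaustionDemo are hypothetical, every request is treated as a not-found response, and the real HttpClient throws ConnectionPoolTimeoutException where this sketch throws IllegalStateException with the same message.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

public class PoolExhaustionDemo {

    /** Hypothetical bounded pool: one permit per connection, like a pool with maxTotal connections. */
    static final class BoundedPool {
        private final Semaphore permits;

        BoundedPool(int maxTotal) {
            this.permits = new Semaphore(maxTotal);
        }

        /** Leases a connection, giving up after timeoutMs (the role of a connection-request timeout). */
        void lease(long timeoutMs) throws InterruptedException {
            if (!permits.tryAcquire(timeoutMs, TimeUnit.MILLISECONDS)) {
                throw new IllegalStateException("Timeout waiting for connection from pool");
            }
        }

        void release() {
            permits.release();
        }
    }

    /**
     * Sends `requests` simulated downloads and returns how many obtained a connection.
     * If leak is true, no lease is ever returned, as when an error response is never closed.
     */
    static int successfulLeases(int maxTotal, int requests, boolean leak) throws InterruptedException {
        BoundedPool pool = new BoundedPool(maxTotal);
        int ok = 0;
        for (int i = 0; i < requests; i++) {
            try {
                pool.lease(5);
            } catch (IllegalStateException timeout) {
                continue; // this download fails, like the failures in the log
            }
            ok++;
            if (!leak) {
                pool.release(); // response closed: the connection goes back to the pool
            }
        }
        return ok;
    }

    public static void main(String[] args) throws InterruptedException {
        // 881 not-found responses (the count from the log) against an 800-connection pool
        System.out.println("leaking:   " + successfulLeases(800, 881, true));
        System.out.println("releasing: " + successfulLeases(800, 881, false));
    }
}
```

With leaking enabled, only the first 800 requests get a connection and every later one times out; when each lease is released, all 881 succeed.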
private static CloseableHttpResponse getNSPResponse(String url, Integer fileType, String nspUrl) throws CException
{
    HttpGet httpRequest = buildHttpGetUriRequest(url, fileType, nspUrl);
    CloseableHttpResponse responseGet = null;
    try
    {
        responseGet = HttpUtil.doGet(httpRequest);
    }
    catch (IOException e)
    {
        logDebugger.error("getNSPResponse.url = " + url + ",Send to nsp doGet is IOException:", e);
        throw new CException(ResultCodeConstants.SEND_NSP_IO_EXCEPTION, "Send to nsp doGet is IOException");
    }
    if (responseGet == null || responseGet.getStatusLine() == null)
    {
        logDebugger.error("getNSPResponse.url = " + url + ",Get response from nsp is null");
        throw new CException(ResultCodeConstants.GET_RESP_FROM_NSP_NUll, "Get response from nsp is null");
    }
    int statusCode = responseGet.getStatusLine().getStatusCode();
    logDebugger.info("getNSPResponse.url = " + url + ",get response from nsp is:{}", statusCode);
    if (HttpStatus.SC_OK != statusCode)
    {
        int errorCode = ResultCodeConstants.RESOPNE_FROM_NSP_PREFIX * 1000 + statusCode;
        logDebugger.error("getNSPResponse.url = " + url + ",get response from nsp is exception, ExceptionCode = " + errorCode);
        // BUG: responseGet is not closed before throwing (e.g. on a 404),
        // so its pooled connection is never released
        throw new CException(errorCode,
            "getNSPResponse.url = " + url + ",get response from nsp is exception, ExceptionCode = " + errorCode);
    }
    return responseGet;
}
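Sure enough, the code has a bug: when the requested file does not exist, it throws an exception directly without closing the response, so the connection is never released.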
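The leak can be isolated with a stand-in for the response object that counts how many instances are still open. This is a minimal sketch, assuming a hypothetical FakeResponse in place of CloseableHttpResponse; getResponse mirrors the shape of the method above: it returns the response on 200 and throws otherwise, without closing it.

```java
import java.io.Closeable;

public class ResponseLeakDemo {
    /** Responses opened but not yet closed (each one would hold a pooled connection). */
    static int open = 0;

    /** Hypothetical stand-in for CloseableHttpResponse. */
    static final class FakeResponse implements Closeable {
        final int statusCode;
        private boolean closed;

        FakeResponse(int statusCode) {
            this.statusCode = statusCode;
            open++;
        }

        @Override
        public void close() {
            if (!closed) {
                closed = true;
                open--;
            }
        }
    }

    /** Same shape as the original getNSPResponse: throws on non-200 without closing. */
    static FakeResponse getResponse(int status) {
        FakeResponse rsp = new FakeResponse(status);
        if (rsp.statusCode != 200) {
            // BUG: rsp is neither returned nor closed, so the caller can never release it
            throw new IllegalStateException("get response is exception, status = " + status);
        }
        return rsp; // on success the caller is responsible for closing it
    }

    public static void main(String[] args) {
        for (int i = 0; i < 5; i++) {
            try {
                getResponse(404);
            } catch (IllegalStateException e) {
                // the caller only sees the exception, not the response
            }
        }
        System.out.println("leaked responses: " + open);
    }
}
```

Each 404 leaves one response open, because the only reference to it is lost when the exception is thrown; no caller-side change can fix that, which is why the fix has to go inside the method.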
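3. Reproducing the problem
Reproduce the problem in the development environment by repeatedly downloading a file that does not exist on OBS: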
public static void main(String[] args) throws IOException {
    String url = "http://lfappdevfile01.hwcloudtest.cn:18085/FileServer/getFile/app/000/000/375/0900086000000000375.20190227145423.92492623218927803776262971924930:20190510161531:2500:9692682CC19232CA6DE605D340C269D7E11CBAFC1B67D7F3E476030E679D5EC4.jpg";
    // 1000 requests for a non-existent file: more than the pool's 800 connections
    for (int i = 0; i < 1000; i++)
    {
        HttpRequest.downLoadFromUrl(url, "test" + i + ".jpg", "D:\\test\\test\\");
    }
    System.out.println("Download complete");
}
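Comparing the connection states before and after running the test shows that the problem was reproduced as expected.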
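4. Fixing the code
Once the cause was known, we modified the code and tested it: downloads worked normally and no large build-up of CLOSE_WAIT connections appeared.
The environment has since run for some time without the problem recurring. In the fixed version, the response body is read inside the method and the response is closed in a finally block on every path: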
private static JSONObject getNSPResponse(String url, Integer fileType, String nspUrl) throws CException
{
    HttpGet httpRequest = buildHttpGetUriRequest(url, fileType, nspUrl);
    CloseableHttpResponse responseGet = null;
    try
    {
        responseGet = HttpUtil.doGet(httpRequest);
        if (responseGet == null || responseGet.getStatusLine() == null)
        {
            logDebugger.error("getNSPResponse.url = " + url + ",Get response from nsp is null");
            throw new CException(ResultCodeConstants.GET_RESP_FROM_NSP_NUll, "Get response from nsp is null");
        }
        int statusCode = responseGet.getStatusLine().getStatusCode();
        logDebugger.info("getNSPResponse.url = " + url + ",get response from nsp is:{}", statusCode);
        if (HttpStatus.SC_OK != statusCode)
        {
            int errorCode = ResultCodeConstants.RESOPNE_FROM_NSP_PREFIX * 1000 + statusCode;
            logDebugger.error("getNSPResponse.url = " + url + ",get response from nsp is exception, ExceptionCode = " + errorCode);
            // Safe now: the finally block below still closes responseGet
            throw new CException(errorCode,
                "getNSPResponse.url = " + url + ",get response from nsp is exception, ExceptionCode = " + errorCode);
        }
        // Read the body while the response is still open; callers no longer receive the raw response
        JSONObject rspStruct = getResponse(responseGet);
        logDebugger.info(
            "NSPServiceClient.executeGet,rspStruct.getString(url)=" + rspStruct.getString("url") + ",status = "
                + responseGet.getStatusLine().getStatusCode() + " from nsp.");
        return rspStruct;
    }
    catch (IOException e)
    {
        logDebugger.error("getNSPResponse.url = " + url + ",Send to nsp doGet is IOException:", e);
        throw new CException(ResultCodeConstants.SEND_NSP_IO_EXCEPTION, "Send to nsp doGet is IOException");
    }
    finally
    {
        // Runs on every exit path (success, non-200, IOException); closeQuietly ignores null,
        // so the connection's slot in the pool is always freed
        IOUtils.closeQuietly(responseGet);
    }
}
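An alternative to the explicit finally block is Java 7's try-with-resources, which calls close() on every exit path, including when an exception is thrown. This is a minimal sketch, assuming hypothetical FakeResponse and doGet stand-ins for CloseableHttpResponse and HttpUtil.doGet; the body string is made up for illustration.

```java
import java.io.Closeable;

public class TryWithResourcesFix {
    /** Responses currently holding a (simulated) pooled connection. */
    static int open = 0;

    /** Hypothetical stand-in for CloseableHttpResponse. */
    static final class FakeResponse implements Closeable {
        final int statusCode;
        final String body;
        private boolean closed;

        FakeResponse(int statusCode, String body) {
            this.statusCode = statusCode;
            this.body = body;
            open++;
        }

        @Override
        public void close() {
            if (!closed) {
                closed = true;
                open--;
            }
        }
    }

    /** Hypothetical stand-in for HttpUtil.doGet. */
    static FakeResponse doGet(int status) {
        return new FakeResponse(status, status == 200 ? "{\"url\":\"file.jpg\"}" : "");
    }

    /** Reads and returns the body; the response is closed on every path. */
    static String getBody(int status) {
        try (FakeResponse rsp = doGet(status)) {
            if (rsp.statusCode != 200) {
                throw new IllegalStateException("status " + rsp.statusCode); // rsp is still closed
            }
            return rsp.body; // read everything needed before close() runs
        }
    }

    public static void main(String[] args) {
        for (int status : new int[] {200, 404, 404, 200, 404}) {
            try {
                getBody(status);
            } catch (IllegalStateException e) {
                // expected for 404
            }
        }
        System.out.println("responses still open: " + open);
    }
}
```

Because CloseableHttpResponse implements Closeable, the same pattern applies to the real type: declaring the response in the try header removes the need for the null-initialized variable and the separate closeQuietly call.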