IOCP使用時常見的幾個錯誤

在使用IOCP時,最重要的幾個API就是GetQueueCompeltionStatus、WSARecv、WSASend,數據的I/O及其完成狀態通過這幾個接口獲取並進行後續處理。

GetQueueCompeltionStatus attempts to dequeue an I/O completion packet from the specified I/O completion port. If there is no completion packet queued, the function waits for a pending I/O operation associated with the completion port to complete.

						BOOL WINAPI GetQueuedCompletionStatus(
  __in   HANDLE CompletionPort,
  __out  LPDWORD lpNumberOfBytes,
  __out  PULONG_PTR lpCompletionKey,
  __out  LPOVERLAPPED *lpOverlapped,
  __in   DWORD dwMilliseconds
);
				

If the function dequeues a completion packet for a successful I/O operation from the completion port, the return value is nonzero. The function stores information in the variables pointed to by the lpNumberOfByteslpCompletionKey, and lpOverlapped parameters.

除了關心這個API的in & out(這是MSDN開頭的幾行就可以告訴我們的)之外,我們更加關心不同的return & out意味着什麼,因爲由於各種已知或未知的原因,我們的程序並不總是有正確的return & out。

If *lpOverlapped is NULL and the function does not dequeue a completion packet from the completion port, the return value is zero. The function does not store information in the variables pointed to by the lpNumberOfBytes and lpCompletionKey parameters. To get extended error information, call GetLastError. If the function did not dequeue a completion packet because the wait timed out, GetLastError returns WAIT_TIMEOUT.

假設我們指定dwMilliseconds爲INFINITE。

這裏常見的幾個錯誤有:

WSA_OPERATION_ABORTED (995): Overlapped operation aborted.

由於線程退出或應用程序請求,已放棄I/O 操作。

MSDN: An overlapped operation was canceled due to the closure of the socket, or the execution of the SIO_FLUSH command in WSAIoctl. Note that this error is returned by the operating system, so the error number may change in future releases of Windows.

成因分析:這個錯誤一般是由於peer socket被closesocket或者WSACleanup關閉後,針對這些socket的pending overlapped I/O operation被中止。

解決方案:針對socket,一般應該先調用shutdown禁止I/O操作後再調用closesocket關閉。

嚴重程度輕微易處理

WSAENOTSOCK (10038): Socket operation on nonsocket.

MSDN: An operation was attempted on something that is not a socket. Either the socket handle parameter did not reference a valid socket, or forselect, a member of an fd_set was not valid.

成因分析:在一個非套接字上嘗試了一個操作。

使用closesocket關閉socket之後,針對該invalid socket的任何操作都會獲得該錯誤。

解決方案:如果是多線程存在對同一socket的操作,要保證對socket的I/O操作邏輯上的順序,做好socket的graceful disconnect。

嚴重程度輕微易處理

WSAECONNRESET (10054): Connection reset by peer.

遠程主機強迫關閉了一個現有的連接。

MSDN: An existing connection was forcibly closed by the remote host. This normally results if the peer application on the remote host is suddenly stopped, the host is rebooted, the host or remote network interface is disabled, or the remote host uses a hard close (see setsockopt for more information on the SO_LINGER option on the remote socket). This error may also result if a connection was broken due to keep-alive activity detecting a failure while one or more operations are in progress. Operations that were in progress fail with WSAENETRESET. Subsequent operations fail with WSAECONNRESET.

成因分析:在使用WSAAccpet、WSARecv、WSASend等接口時,如果peer application突然中止(原因如上所述),往其對應的socket上投遞的operations將會失敗。

解決方案:如果是對方主機或程序意外中止,那就只有各安天命了。但如果這程序是你寫的,而你只是hard close,那就由不得別人了。至少,你要知道這樣的錯誤已經出現了,就不要再費勁的繼續投遞或等待了。

嚴重程度輕微易處理

WSAECONNREFUSED (10061): Connection refused.

由於目標機器積極拒絕,無法連接。

MSDN: No connection could be made because the target computer actively refused it. This usually results from trying to connect to a service that is inactive on the foreign host—that is, one with no server application running.

成因分析:在使用connect或WSAConnect時,服務器沒有運行或者服務器的監聽隊列已滿;在使用WSAAccept時,客戶端的連接請求被condition function拒絕。

解決方案:Call connect or WSAConnect again for the same socket. 等待服務器開啓、監聽空閒或查看被拒絕的原因。是不是長的醜或者錢沒給夠,要不就是服務器拒絕接受天價薪酬自主創業去了?

嚴重程度輕微易處理

WSAENOBUFS (10055): No buffer space available.

由於系統緩衝區空間不足或列隊已滿,不能執行套接字上的操作。

MSDN: An operation on a socket could not be performed because the system lacked sufficient buffer space or because a queue was full.

成因分析:這個錯誤是我查看錯誤日誌後,最在意的一個錯誤。因爲服務器對於消息收發有明確限制,如果緩衝區不足應該早就處理了,不可能待到send/recv失敗啊。而且這個錯誤在之前的版本中幾乎沒有出現過。這也是這篇文章的主要內容。像connect和accept因爲緩衝區空間不足都可以理解,而且危險不高,但如果send/recv造成擁堵並惡性循環下去,麻煩就大了,至少說明之前的驗證邏輯有疏漏。

WSASend失敗的原因是:The Windows Sockets provider reports a buffer deadlock. 這裏提到的是buffer deadlock,顯然是由於多線程I/O投遞不當引起的。

解決方案:在消息收發前,對最大掛起的消息總的數量和容量進行檢驗和控制。

嚴重程度嚴重

本文主要參考MSDN

************* 說明 *************

Fox只是對自己關心的幾個錯誤和API參照MSDN進行分析,不提供額外幫助。

發表評論
所有評論
還沒有人評論,想成為第一個評論的人麼? 請在上方評論欄輸入並且點擊發布.
相關文章