Yesterday the client kept printing the following as soon as it was started:
Tue Jul 17 16:06:21 2012 us=390000 Attempting to establish TCP connection with 192.168.1.86:10443 [nonblock]
Tue Jul 17 16:06:21 2012 us=390000 TCP: connect to 192.168.1.86:10443 failed, will try again in 5 seconds: Operation would block (WSAEWOULDBLOCK)
Tue Jul 17 16:06:26 2012 us=390000 TCP: connect to 192.168.1.86:10443 failed, will try again in 5 seconds: Operation would block (WSAEWOULDBLOCK)
My first suspicion was the client, but its configuration showed nothing unusual, so I had to work from the server side instead.
The server log showed:
Tue Jul 17 10:48:20 2012 us=189690 192.168.1.189:52252 SIGUSR1[soft,connection-reset] received, client-instance restarting
Tue Jul 17 10:48:25 2012 us=186285 192.168.1.189:52254 Connection reset, restarting [0]
Tue Jul 17 10:48:25 2012 us=186317 192.168.1.189:52254 SIGUSR1[soft,connection-reset] received, client-instance restarting
Tue Jul 17 10:48:30 2012 us=188313 192.168.1.189:52256 Connection reset, restarting [0]
Tue Jul 17 10:48:30 2012 us=188344 192.168.1.189:52256 SIGUSR1[soft,connection-reset] received, client-instance restarting
Tue Jul 17 10:48:35 2012 us=188504 192.168.1.189:52258 Connection reset, restarting [0]
Tue Jul 17 10:48:35 2012 us=188536 192.168.1.189:52258 SIGUSR1[soft,connection-reset] received, client-instance restarting
Tue Jul 17 10:48:45 2012 us=188543 192.168.1.189:52265 Connection reset, restarting [0]
Tue Jul 17 10:48:45 2012 us=188576 192.168.1.189:52265 SIGUSR1[soft,connection-reset] received, client-instance restarting
Tue Jul 17 10:48:50 2012 us=187213 192.168.1.189:52270 Connection reset, restarting [0]
Tue Jul 17 10:48:50 2012 us=187244 192.168.1.189:52270 SIGUSR1[soft,connection-reset] received, client-instance restarting
Tue Jul 17 10:48:55 2012 us=185208 192.168.1.189:52287 Connection reset, restarting [0]
Tue Jul 17 10:48:55 2012 us=185239 192.168.1.189:52287 SIGUSR1[soft,connection-reset] received, client-instance restarting
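It looked as though the client was dropping the connection by itself. A packet capture also showed that the client sent a SYN packet and then simply moved on to a new port.
After changing the settings, I found that the client connected normally over UDP.
After a day of reading the configuration documentation, I switched to the official standard client and everything worked. I nearly fainted.
Comparing against the official client's log revealed: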
Tue Jul 17 16:13:32 2012 us=250000 Attempting to establish TCP connection with 192.168.1.86:10443
Tue Jul 17 16:13:32 2012 us=250000 TCP connection established with 192.168.1.86:10443
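The official client uses a blocking connect (its log line has no [nonblock] tag).
The code in open*** that handles the connection is in socket.c: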
int
open***_connect (socket_descriptor_t sd,
                 struct open***_sockaddr *remote,
                 int connect_timeout,
                 volatile int *signal_received)
{
  int status = 0;
#ifdef CONNECT_NONBLOCK
  set_nonblock (sd);
  status = connect (sd, (struct sockaddr *) &remote->sa, sizeof (remote->sa));
  if (status)
    status = open***_errno_socket ();
  if (status == EINPROGRESS)
    {
      while (true)
        {
          fd_set writes;
          struct timeval tv;
          FD_ZERO (&writes);
          FD_SET (sd, &writes);
          tv.tv_sec = 0;
          tv.tv_usec = 0;
          status = select (sd + 1, NULL, &writes, NULL, &tv);
          if (signal_received)
            {
              get_signal (signal_received);
              if (*signal_received)
                {
                  status = 0;
                  break;
                }
            }
          if (status < 0)
            {
              status = open***_errno_socket ();
              break;
            }
          if (status <= 0)
            {
              if (--connect_timeout < 0)
                {
                  status = ETIMEDOUT;
                  break;
                }
              open***_sleep (1);
              continue;
            }
          /* got it */
          {
            int val = 0;
            socklen_t len;
            len = sizeof (val);
            if (getsockopt (sd, SOL_SOCKET, SO_ERROR, (void *) &val, &len) == 0
                && len == sizeof (val))
              status = val;
            else
              status = open***_errno_socket ();
            break;
          }
        }
    }
#else
  status = connect (sd, (struct sockaddr *) &remote->sa, sizeof (remote->sa));
  if (status)
    status = open***_errno_socket ();
#endif
  return status;
}
The CONNECT_NONBLOCK macro is defined in syshead.h:
/*
* Is non-blocking connect() supported?
*/
#if defined(HAVE_GETSOCKOPT) && defined(SOL_SOCKET) && defined(SO_ERROR) && defined(EINPROGRESS) && defined(ETIMEDOUT)
#define CONNECT_NONBLOCK
#endif
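This file had not been modified, so it was probably something in the build environment settings that caused CONNECT_NONBLOCK to be defined, which in turn made open***_connect take the non-blocking path.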
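One way to see which build ends up with the non-blocking path is to test the same macros in a small translation unit compiled with the same headers and flags. The following is only a sketch: nonblock_macros_present is a hypothetical name, and HAVE_GETSOCKOPT is left out because it normally comes from the project's own build configuration rather than from system headers.

```c
#include <errno.h>
#ifdef _WIN32
#include <winsock2.h>
#else
#include <sys/socket.h>
#endif

/* Hypothetical diagnostic (not from the original source).
 * Returns 1 if every system macro tested by the syshead.h condition
 * is visible in this build, 0 otherwise. HAVE_GETSOCKOPT is not
 * checked here because it is set by the build configuration. */
static int nonblock_macros_present(void)
{
#if defined(SOL_SOCKET) && defined(SO_ERROR) \
    && defined(EINPROGRESS) && defined(ETIMEDOUT)
    return 1;
#else
    return 0;
#endif
}
```

If this returns 1 in one build environment and 0 in another, the difference in headers would explain why only one of the two clients uses the non-blocking connect.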
This turned out to be a bug in open***; its author does not seem very familiar with Windows programming.
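set_nonblock (sd); on Windows this is what puts the socket into non-blocking mode (sockets are blocking by default), which is why the connect below returns immediately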
status = connect (sd, (struct sockaddr *) &remote->sa, sizeof (remote->sa));
if (status)
status = open***_errno_socket ();
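A return value of 10035 (WSAEWOULDBLOCK) is normal here: the non-blocking connect has started but has not yet completed.
if (status == EINPROGRESS) is the mistake: EINPROGRESS is 115 /* Operation now in progress */, which never equals 10035, so the wait loop is never entered.
The fix is to change the test to:
if (status == WSAEWOULDBLOCK || status == EINPROGRESS)
So far this works.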
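The same check can be kept in one place so that callers do not have to remember the per-platform error code. This is a minimal sketch, not the project's actual fix: connect_in_progress is a hypothetical helper name, and it assumes the caller passes the code obtained from WSAGetLastError() on Windows or errno elsewhere.

```c
#include <errno.h>
#ifdef _WIN32
#include <winsock2.h>
#endif

/* Hypothetical helper (not from the original source).
 * Returns 1 if err means "non-blocking connect started, not yet done",
 * 0 for any other error code. */
static int connect_in_progress(int err)
{
#ifdef _WIN32
    /* Winsock reports a pending non-blocking connect as WSAEWOULDBLOCK. */
    if (err == WSAEWOULDBLOCK)
        return 1;
#endif
#ifdef EINPROGRESS
    /* POSIX systems report it as EINPROGRESS. */
    if (err == EINPROGRESS)
        return 1;
#endif
    return 0;
}
```

With such a helper, the condition before the wait loop would read if (connect_in_progress (status)), and the platform difference stays out of the connect logic itself.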