open*** bug

Yesterday, every time I ran the client, the following kept showing up in its log:
Tue Jul 17 16:06:21 2012 us=390000 Attempting to establish TCP connection with 192.168.1.86:10443 [nonblock]
Tue Jul 17 16:06:21 2012 us=390000 TCP: connect to 192.168.1.86:10443 failed, will try again in 5 seconds: Operation would block (WSAEWOULDBLOCK)
Tue Jul 17 16:06:26 2012 us=390000 TCP: connect to 192.168.1.86:10443 failed, will try again in 5 seconds: Operation would block (WSAEWOULDBLOCK)
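My first suspicion was a client-side problem, but the client configuration showed nothing unusual, so I had to work from the server side instead.
Here is what the server log showed: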
Tue Jul 17 10:48:20 2012 us=189690 192.168.1.189:52252 SIGUSR1[soft,connection-reset] received, client-instance restarting
Tue Jul 17 10:48:25 2012 us=186285 192.168.1.189:52254 Connection reset, restarting [0]
Tue Jul 17 10:48:25 2012 us=186317 192.168.1.189:52254 SIGUSR1[soft,connection-reset] received, client-instance restarting
Tue Jul 17 10:48:30 2012 us=188313 192.168.1.189:52256 Connection reset, restarting [0]
Tue Jul 17 10:48:30 2012 us=188344 192.168.1.189:52256 SIGUSR1[soft,connection-reset] received, client-instance restarting
Tue Jul 17 10:48:35 2012 us=188504 192.168.1.189:52258 Connection reset, restarting [0]
Tue Jul 17 10:48:35 2012 us=188536 192.168.1.189:52258 SIGUSR1[soft,connection-reset] received, client-instance restarting
Tue Jul 17 10:48:45 2012 us=188543 192.168.1.189:52265 Connection reset, restarting [0]
Tue Jul 17 10:48:45 2012 us=188576 192.168.1.189:52265 SIGUSR1[soft,connection-reset] received, client-instance restarting
Tue Jul 17 10:48:50 2012 us=187213 192.168.1.189:52270 Connection reset, restarting [0]
Tue Jul 17 10:48:50 2012 us=187244 192.168.1.189:52270 SIGUSR1[soft,connection-reset] received, client-instance restarting
Tue Jul 17 10:48:55 2012 us=185208 192.168.1.189:52287 Connection reset, restarting [0]
Tue Jul 17 10:48:55 2012 us=185239 192.168.1.189:52287 SIGUSR1[soft,connection-reset] received, client-instance restarting
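It looks as if the client was giving up on its own. A packet capture showed the same thing: the client sent a SYN packet and then switched to a new port.
After changing the settings, I found that the client connected normally over UDP.
After a whole day of reading the configuration documentation, I swapped in the official stock client and everything just worked, which floored me.
Comparing the logs with those of the official client showed the difference: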
Tue Jul 17 16:13:32 2012 us=250000 Attempting to establish TCP connection with 192.168.1.86:10443
Tue Jul 17 16:13:32 2012 us=250000 TCP connection established with 192.168.1.86:10443
The official client uses a blocking connect (note the missing [nonblock] tag).
The code in open*** that handles the connection is in socket.c:
int
open***_connect (socket_descriptor_t sd,
   struct open***_sockaddr *remote,
   int connect_timeout,
   volatile int *signal_received)
{
  int status = 0;

#ifdef CONNECT_NONBLOCK
  set_nonblock (sd);
  status = connect (sd, (struct sockaddr *) &remote->sa, sizeof (remote->sa));
  if (status)
    status = open***_errno_socket ();
  if (status == EINPROGRESS )
    {
      while (true)
        {
          fd_set writes;
          struct timeval tv;

          FD_ZERO (&writes);
          FD_SET (sd, &writes);
          tv.tv_sec = 0;
          tv.tv_usec = 0;

          status = select (sd + 1, NULL, &writes, NULL, &tv);

          if (signal_received)
            {
              get_signal (signal_received);
              if (*signal_received)
                {
                  status = 0;
                  break;
                }
            }
          if (status < 0)
            {
              status = open***_errno_socket ();
              break;
            }
          if (status <= 0)
            {
              if (--connect_timeout < 0)
                {
                  status = ETIMEDOUT;
                  break;
                }
              open***_sleep (1);
              continue;
            }

          /* got it */
          {
            int val = 0;
            socklen_t len;

            len = sizeof (val);
            if (getsockopt (sd, SOL_SOCKET, SO_ERROR, (void *) &val, &len) == 0
                && len == sizeof (val))
              status = val;
            else
              status = open***_errno_socket ();
            break;
          }
        }
    }
#else
  status = connect (sd, (struct sockaddr *) &remote->sa, sizeof (remote->sa));
  if (status)
    status = open***_errno_socket ();
#endif

  return status;
}
The CONNECT_NONBLOCK macro is defined in syshead.h:
/*
 * Is non-blocking connect() supported?
 */

#if defined(HAVE_GETSOCKOPT) && defined(SOL_SOCKET) && defined(SO_ERROR) && defined(EINPROGRESS) && defined(ETIMEDOUT)
#define CONNECT_NONBLOCK
#endif
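This file has never been modified, so it is probably something in the build environment that causes CONNECT_NONBLOCK to be defined, which in turn makes open***_connect take the non-blocking path.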
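One way to confirm which path a given toolchain selects is to evaluate the same preprocessor condition in a tiny test file. The sketch below is an assumption-laden illustration: the function name connect_nonblock_selected is invented, and HAVE_GETSOCKOPT is left out of the condition because it comes from the project's own configure step rather than from system headers.

```c
#include <errno.h>          /* EINPROGRESS, ETIMEDOUT, if the platform defines them */
#ifdef _WIN32
#include <winsock2.h>       /* SOL_SOCKET, SO_ERROR on Windows */
#else
#include <sys/socket.h>     /* SOL_SOCKET, SO_ERROR on POSIX systems */
#endif

/* Hypothetical helper: returns 1 if the headers visible to this
   compiler would enable the non-blocking connect path, 0 otherwise.
   HAVE_GETSOCKOPT is deliberately omitted (see the note above). */
static int
connect_nonblock_selected (void)
{
#if defined(SOL_SOCKET) && defined(SO_ERROR) && defined(EINPROGRESS) && defined(ETIMEDOUT)
  return 1;
#else
  return 0;
#endif
}
```

Calling this from a small main and printing the result, once per compiler or SDK, shows whether a header change alone is enough to flip the build onto the non-blocking path.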
This turned out to be a bug in open***; the author does not seem very familiar with Windows programming.
  set_nonblock (sd);  /* on Windows this line looks redundant to me; nonblock should not need to be set here */
  status = connect (sd, (struct sockaddr *) &remote->sa, sizeof (remote->sa));
  if (status)
    status = open***_errno_socket ();
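Getting 10035 (WSAEWOULDBLOCK) back here is normal,
but the next check, if (status == EINPROGRESS), is then wrong: EINPROGRESS is 115 /* Operation now in progress */, so the code never enters the loop.
A small change fixes it: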
  if (status == WSAEWOULDBLOCK || status == EINPROGRESS  )

So far it works.
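The same check can be pulled out into a small helper so that the condition reads the same on every platform. This is only a sketch: connect_is_pending is a made-up name, not an open*** function, and the fallback definition of WSAEWOULDBLOCK (10035, the value seen in the log above) exists solely so the file also compiles where the Winsock headers are unavailable.

```c
#include <errno.h>
#include <stdbool.h>
#ifdef _WIN32
#include <winsock2.h>       /* real definition of WSAEWOULDBLOCK */
#endif

#ifndef WSAEWOULDBLOCK
/* Assumption: Winsock's value, supplied only for non-Windows builds. */
#define WSAEWOULDBLOCK 10035
#endif

/* Hypothetical helper: true when a non-blocking connect() has been
   started but has not finished yet, on either POSIX or Winsock. */
static bool
connect_is_pending (int err)
{
  return err == EINPROGRESS || err == WSAEWOULDBLOCK;
}
```

With such a helper the patched line would read if (connect_is_pending (status)), and any other error code still falls straight through to the caller as before.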

