A Strange TCP Connection Problem

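I recently helped a colleague track down a strange TCP connection problem. The root cause turned out to be a bug in how the program handled a socket descriptor when forking.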

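The symptom: two processes, A and B, are connected over a TCP socket. After running for an unpredictable length of time, A reports that B has disconnected. Yet B is still running, has never exited, and its TCP connection is still in the ESTABLISHED state.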

The log from process A:

2012-01-07T20:18:51.502162Z [warn] multiaccept.cpp:371: Component B is GONE

Checking the TCP connections of process B (PID 17076):

[root@ucs106-3 ha-cm1]# netstat -anpt|grep 17076

tcp 0 0 70.0.0.15:34702 70.0.0.15:7400 ESTABLISHED 17076/B

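Looking further turned up something even stranger: B's socket was connected to a different process, C, which is one of the child processes forked by A. On top of that, a large amount of unread data had piled up in the receive queue on C's end of the connection (Recv-Q is 99884 in the output below). We spent a long time checking the configuration, but the configuration files were not at fault.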

[root@ucs106-3 ha-cm1]# netstat -anpt|grep 34702

tcp 0 0 70.0.0.15:34702 70.0.0.15:7400 ESTABLISHED 17076/B

tcp 99884 0 70.0.0.15:7400 70.0.0.15:34702 ESTABLISHED 17089/C

In the end, the problem turned out to be in the following code.

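void _on_accept(...)
{
    // Accept the incoming connection from B.
    fd = accept(m->fd, (struct sockaddr*)&sa, (socklen_t*)&sa_size);

    // Switch the new socket to non-blocking mode.
    flags = fcntl(fd, F_GETFL, 0);
    flags |= O_NONBLOCK;
    fcntl(fd, F_SETFL, flags);

    // Note: FD_CLOEXEC is never set on fd.
    ...
    // fork C later
}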

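The cause is that fd is created before the new child process is forked. fork gives the child a copy of every descriptor the parent has open, and because the FD_CLOEXEC flag was not set, that copy stays open across exec as well, so the child ends up referring to the same socket. The kernel only tears down a TCP connection once the last descriptor referring to the socket is closed. So even after parent A has closed its connection to B, child C still holds a "phantom" connection to B: B sees ESTABLISHED, and nobody reads the data it sends.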
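The sharing described above can be reproduced in a few lines. The sketch below is an illustration, not the original program: it uses an AF_UNIX socketpair in place of the TCP connection (the reference counting on the socket works the same way), and the function name, the helper pipe, and the 200 ms timeout are all made up for the example.

```c
#include <poll.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical demo. Returns 1 if the peer saw no EOF while a forked child
 * still held the inherited descriptor, and did see EOF once the child exited.
 * Returns 0 if either check fails, -1 on setup errors.
 */
int inherited_fd_keeps_connection_open(void)
{
    int sv[2];   /* sv[0] stands in for B's end, sv[1] for the fd A accepted */
    int gate[2]; /* pipe whose only job is to keep the child alive */
    char c;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0 || pipe(gate) < 0)
        return -1;

    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        /* Child "C": inherits sv[1] and simply never closes it. */
        close(sv[0]);
        close(gate[1]);
        if (read(gate[0], &c, 1) < 0) /* returns 0 once the parent closes gate[1] */
            _exit(1);
        _exit(0);
    }

    close(gate[0]);
    close(sv[1]); /* Parent "A" closes its copy of the connection. */

    /* B's end: poll() timing out means no EOF has arrived. */
    struct pollfd p = { .fd = sv[0], .events = POLLIN };
    int open_before = (poll(&p, 1, 200) == 0);

    close(gate[1]); /* Let the child exit, dropping the last reference. */
    waitpid(pid, NULL, 0);

    /* Now the peer is really gone, so read() reports EOF (returns 0). */
    int eof_after = (read(sv[0], &c, 1) == 0);

    close(sv[0]);
    return open_before && eof_after;
}
```

As long as the child holds its copy, B's end looks perfectly healthy, which matches the ESTABLISHED state seen in netstat.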

The /proc file system shows this as well. Suppose A's PID is 7177 and C's is 7232. The result looks like this:

[root@xcp-core-base3 fd]# pwd

/proc/7177/fd

[root@xcp-core-base3 fd]# ls -l|grep socket

lrwx------ 1 root root 64 Jan 16 01:08 10 -> socket:[47153468]

lrwx------ 1 root root 64 Jan 16 01:08 13 -> socket:[47153473]

lrwx------ 1 root root 64 Jan 16 01:08 14 -> socket:[47153481]

lrwx------ 1 root root 64 Jan 16 01:08 15 -> socket:[47153488]

...

[root@xcp-core-base3 fd]# cd /proc/7232/fd

[root@xcp-core-base3 fd]# pwd

/proc/7232/fd

[root@xcp-core-base3 fd]# ls -l|grep socket

lrwx------ 1 root root 64 Jan 16 01:08 13 -> socket:[47153473]

lrwx------ 1 root root 64 Jan 16 01:08 14 -> socket:[47153481]

lrwx------ 1 root root 64 Jan 16 01:08 15 -> socket:[47153488]

...
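The same comparison can be done from code. The following is a rough, Linux-only sketch under my own assumptions: the helper names are hypothetical, and it creates an unconnected TCP socket just to have something to inspect. It forks, has the child read its own /proc/self/fd entry, and checks that parent and child see the same socket:[inode] target.

```c
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: read the target of /proc/self/fd/<fd>,
 * for example "socket:[47153473]". Returns 0 on success, -1 on error. */
static int fd_link_target(int fd, char *buf, size_t size)
{
    char path[64];
    snprintf(path, sizeof path, "/proc/self/fd/%d", fd);
    ssize_t n = readlink(path, buf, size - 1);
    if (n < 0)
        return -1;
    buf[n] = '\0'; /* readlink() does not terminate the string */
    return 0;
}

/* Returns 1 if parent and forked child see the same socket inode behind fd,
 * 0 if they do not, -1 on setup errors. */
int child_shares_socket_inode(void)
{
    char mine[64] = "", theirs[64] = "";
    int report[2]; /* pipe the child uses to send back what it sees */
    int fd = socket(AF_INET, SOCK_STREAM, 0);

    if (fd < 0 || pipe(report) < 0)
        return -1;

    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        /* Child: look up its inherited copy of fd and report the target. */
        char buf[64] = "";
        if (fd_link_target(fd, buf, sizeof buf) == 0 &&
            write(report[1], buf, strlen(buf)) < 0)
            _exit(1);
        _exit(0);
    }

    close(report[1]);
    /* theirs is zero-initialised, so it stays NUL-terminated. */
    if (read(report[0], theirs, sizeof theirs - 1) < 0)
        theirs[0] = '\0';
    close(report[0]);
    waitpid(pid, NULL, 0);

    int same = fd_link_target(fd, mine, sizeof mine) == 0
            && strncmp(mine, "socket:[", 8) == 0
            && strcmp(mine, theirs) == 0;
    close(fd);
    return same;
}
```

Matching inode numbers across two PIDs, as in the listings above, are the sign that both processes hold the same socket.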

With the cause identified, the fix takes just one line of code:

// Set the FD_CLOEXEC bit so exec'd processes don't get this  
// socket handle automatically.  
fcntl(fd, F_SETFD, FD_CLOEXEC);
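One small refinement, offered as a sketch rather than as what the original program does: F_SETFD replaces the whole set of descriptor flags, so the usual defensive pattern is to read the current flags first and OR in FD_CLOEXEC. The helper name set_cloexec is hypothetical.

```c
#include <fcntl.h>

/* Hypothetical helper: set FD_CLOEXEC while keeping any other descriptor
 * flags intact. Returns 0 on success, -1 on error (errno set by fcntl). */
int set_cloexec(int fd)
{
    int flags = fcntl(fd, F_GETFD);
    if (flags < 0)
        return -1;
    if (fcntl(fd, F_SETFD, flags | FD_CLOEXEC) < 0)
        return -1;
    return 0;
}
```

Calling set_cloexec(fd) right after accept would take the place of the one-line fix. Two caveats apply. FD_CLOEXEC only takes effect when the child calls exec; a child that forks and keeps running the same program still has to close the descriptor itself. And on Linux, accept4 with SOCK_CLOEXEC can set the flag at accept time, which closes the window between accept and fcntl in multi-threaded programs.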
