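I have recently been looking at TCP connections in Go. The topic touches on a lot of scattered knowledge, so I am jotting down notes piecemeal for now and will organize them later.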
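Theory
I previously wrote an article on the TCP three-way handshake and data transfer; it is worth reading alongside this one.
In Go, TCP and UDP are both wrapped by the net package, and the functions called underneath are Linux system calls. Here we focus on the backlog parameter of the listen function in TCP.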
(Figure from https://blog.csdn.net/ordeder/article/details/21551567#commentBox)
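From the figure above we can see:
- The kernel maintains two queues for TCP connections between client and server: the incomplete connection queue and the completed connection queue.
- When the client sends a SYN to the server, the conn is added to the incomplete connection queue.
- When the server sends SYN+ACK to the client, the state of the conn in the incomplete connection queue becomes SYN_RCVD.
- When the client sends its ACK to the server, the server moves the conn from the incomplete connection queue to the completed connection queue, provided that queue is not full.
- The accept function takes no part in the three-way handshake.
- Calling accept takes one connection from the completed connection queue and removes it from the queue.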
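The point that accept is not part of the handshake can be checked with a small Go sketch. It is only an illustration under assumptions: the helper name dialBeforeAccept is hypothetical, and it uses an ephemeral loopback port rather than the port used later in this post. The dial is expected to succeed before Accept is ever called, because the kernel finishes the handshake on its own.

```go
package main

import (
	"fmt"
	"net"
	"time"
)

// dialBeforeAccept (hypothetical helper) listens on an ephemeral loopback
// port, dials it, and only afterwards calls Accept. It reports whether the
// dial had already succeeded before Accept was called.
func dialBeforeAccept() (bool, error) {
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return false, err
	}
	defer ln.Close()

	// Accept has not been called yet; the kernel completes the handshake.
	client, err := net.DialTimeout("tcp", ln.Addr().String(), 2*time.Second)
	if err != nil {
		return false, err
	}
	defer client.Close()

	// Accept now only removes the finished connection from the queue.
	server, err := ln.Accept()
	if err != nil {
		return true, err
	}
	defer server.Close()

	// Both ends should describe the same connection.
	if server.RemoteAddr().String() != client.LocalAddr().String() {
		return true, fmt.Errorf("address mismatch: %v vs %v",
			server.RemoteAddr(), client.LocalAddr())
	}
	return true, nil
}

func main() {
	ok, err := dialBeforeAccept()
	fmt.Println("dial succeeded before Accept:", ok, "error:", err)
}
```

The connection returned by Accept has the client's local address as its remote address, which is consistent with Accept simply handing over a connection the kernel had already established.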
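Note: the figure says "the sum of the two queues does not exceed backlog". This is questionable, and it does not match my test results either. According to man listen on a 3.10.0 kernel, backlog is the length of the queue of sockets that have completed the handshake and are waiting for the application to accept them, not the number of incomplete connections. The maximum length of the incomplete connection queue is set by tcp_max_syn_backlog.
If backlog is small, many connections arrive at the same time, and the server does not call accept promptly, the client's connect call may block. The tests below use both C and Go: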
Test environment:
[root@localhost tcptest]# uname -a
Linux localhost.localdomain 3.10.0-957.el7.x86_64 #1 SMP Thu Nov 8 23:39:32 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
[root@localhost tcptest]# cat /etc/redhat-release
CentOS Linux release 7.6.1810 (Core)
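Before testing, it helps to know some of the default network settings on Linux:
- /proc/sys/net/core/somaxconn: the upper limit for the listen backlog (default 128). A backlog passed to listen that is larger than this value is silently capped to it. In Go, the backlog passed to listen is the value read from this file.
- /proc/sys/net/ipv4/tcp_syn_retries: number of SYN retries, default 6.
- /proc/sys/net/ipv4/tcp_synack_retries: number of SYN+ACK retries, default 5.
- /proc/sys/net/ipv4/tcp_max_syn_backlog: the tcp_max_syn_backlog mentioned above, that is, the size of the incomplete connection queue (default 128).
- /proc/sys/net/ipv4/tcp_abort_on_overflow: default 0. If set to 1, when the server's completed connection queue is full and a new connection cannot be added, the server immediately sends an RST to the client to declare the connection failed, and the client gets errno=104 (Connection reset by peer). If set to 0, when the completed queue is full the new connection still cannot be added, but the server retransmits SYN+ACK; the client assumes its previous ACK was lost and retransmits it. While the queue stays full the server keeps retransmitting SYN+ACK, until either the connection is successfully added to the completed queue or the number of SYN+ACK retransmissions reaches tcp_synack_retries, at which point the server sends an RST to declare the connection failed and the client gets errno=110 (Connection timed out).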
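To check the current values of the settings listed above before running a test, they can be read straight from /proc/sys. The following is a minimal sketch: the helper names parseSysctlInt and readSysctlInt are hypothetical, and it assumes a Linux host and Go 1.16 or later (for os.ReadFile).

```go
package main

import (
	"fmt"
	"os"
	"strconv"
	"strings"
)

// parseSysctlInt parses the contents of a single-value /proc/sys file,
// which is an integer followed by a newline.
func parseSysctlInt(raw string) (int, error) {
	return strconv.Atoi(strings.TrimSpace(raw))
}

// readSysctlInt reads one integer setting, e.g. "net/core/somaxconn".
func readSysctlInt(name string) (int, error) {
	data, err := os.ReadFile("/proc/sys/" + name)
	if err != nil {
		return 0, err
	}
	return parseSysctlInt(string(data))
}

func main() {
	names := []string{
		"net/core/somaxconn",
		"net/ipv4/tcp_syn_retries",
		"net/ipv4/tcp_synack_retries",
		"net/ipv4/tcp_max_syn_backlog",
		"net/ipv4/tcp_abort_on_overflow",
	}
	for _, name := range names {
		v, err := readSysctlInt(name)
		if err != nil {
			// Not Linux, or the setting does not exist on this kernel.
			fmt.Printf("%-32s unavailable: %v\n", name, err)
			continue
		}
		fmt.Printf("%-32s %d\n", name, v)
	}
}
```

Reading the files directly avoids depending on the sysctl binary, which is handy when the check is part of a test program.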
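To change these values temporarily, run:
sysctl -w net.core.somaxconn=2048
sysctl -w net.ipv4.tcp_syn_retries=7
sysctl -w net.ipv4.tcp_synack_retries=6
sysctl -w net.ipv4.tcp_abort_on_overflow=0
sysctl -w net.ipv4.tcp_max_syn_backlog=65535
To make the changes permanent, add the same settings to /etc/sysctl.conf and run sysctl -p; values written under /proc/sys do not survive a reboot.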
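Test
The goal is to show that with a small backlog, many client requests, and accept not being called promptly, the remaining connections fail. To make the failure immediate, tcp_abort_on_overflow is set to 1.
C version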
//client.c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <strings.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <arpa/inet.h>
#include <unistd.h>
#include <pthread.h>
#include <time.h>
#include <errno.h>
#define SER "127.0.0.1"
#define PORT 8888
#define MAX 30
typedef struct sockaddr SA;
typedef struct sockaddr_in SA_IN;

// Format the current local time into buf. Uses localtime_r and a
// caller-supplied buffer so that it is safe to call from several threads.
void getDateTime(char* buf, size_t len)
{
    time_t rawtime;
    struct tm ltime;
    time(&rawtime);
    localtime_r(&rawtime, &ltime);
    strftime(buf, len, "%Y-%m-%d %H:%M:%S", &ltime);
}

// Intended to catch ECONNREFUSED (errno 111) from a separate thread.
// It never fires: errno is thread-local, so this thread cannot see the
// errno set by connect() in another thread.
void* loopprint(void* ptr)
{
    while(1){
        if(errno == ECONNREFUSED)
            printf("errno:%d %s ",errno,strerror(errno));
    }
}

// Connect to the server once, report the result, and keep the socket open.
void* dosomething(void *ptr)
{
    int sockfd;
    SA_IN server;
    int flags=1;
    int cnt = *(int *)ptr;
    char nowtime[20];

    sockfd=socket(AF_INET,SOCK_STREAM,0);
    bzero(&server,sizeof(SA_IN));
    server.sin_family=AF_INET;
    server.sin_addr.s_addr=inet_addr(SER);
    server.sin_port=htons(PORT);
    if(setsockopt(sockfd,SOL_SOCKET,SO_REUSEADDR,&flags,sizeof(int)) ==-1)
        printf("setsockopt sockfd\n");
    if(connect(sockfd,(SA *)&server,sizeof(SA_IN)) == -1){
        int err = errno; // save errno before any other call can change it
        getDateTime(nowtime, sizeof(nowtime));
        printf("errno:%d %s %s thread:%d\n",err,strerror(err),nowtime,cnt);
        close(sockfd);
        return NULL;
    }
    getDateTime(nowtime, sizeof(nowtime));
    printf("connect ok %s thread:%d\n",nowtime,cnt);
    while(1)
        sleep(1); // keep the connection open without burning CPU
    return NULL; // not reached
}

int main()
{
    pthread_t pt[MAX],ptt;
    int ret[MAX];
    int param[MAX];
    int i;
    // An attempt to catch ECONNREFUSED; it never caught anything
    //pthread_create(&ptt, NULL, loopprint, NULL);
    for(i=0;i < MAX;i++){
        sleep(1); // the conclusion is the same with or without this delay
        param[i] = i;
        ret[i] = pthread_create(&pt[i], NULL, dosomething, &param[i]);
    }
    for(i=0;i < MAX;i++){
        if(ret[i] != 0){
            printf("thread[%d] create fail\n",i);
        }
    }
    for(i=0;i < MAX;i++){
        if(ret[i] == 0)
            pthread_join(pt[i], NULL);
    }
    return 0;
}
//server.c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <strings.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <arpa/inet.h>
#include <unistd.h>
#define SER "127.0.0.1"
#define PORT 8888
typedef struct sockaddr SA;
typedef struct sockaddr_in SA_IN;

int main(int argc,char **argv)
{
    int sockfd;
    SA_IN server;
    int flags=1;

    sockfd=socket(AF_INET,SOCK_STREAM,0);
    bzero(&server,sizeof(SA_IN));
    server.sin_family=AF_INET;
    server.sin_addr.s_addr=inet_addr(SER);
    server.sin_port=htons(PORT);
    if(setsockopt(sockfd,SOL_SOCKET,SO_REUSEADDR,&flags,sizeof(int)) ==-1)
        printf("setsockopt sockfd fail\n");
    if(bind(sockfd,(SA *)&server,sizeof(SA_IN)) == -1){
        printf("bind sockfd fail\n");
        return 1;
    }
    if(listen(sockfd,20) == -1){ // backlog = 20
        printf("listen fail\n");
        return 1;
    }
    // accept is deliberately never called, so the completed connection queue fills up
    //struct sockaddr_in clnt_addr;
    //socklen_t clnt_addr_size = sizeof(clnt_addr);
    //int clnt_sock = accept(sockfd, (struct sockaddr*)&clnt_addr, &clnt_addr_size);
    while(1)
        sleep(1); // idle without burning CPU
    close(sockfd); // not reached
    return 0;
}
Output:
connect ok 2019-11-13 14:30:06 thread:0
connect ok 2019-11-13 14:30:07 thread:1
connect ok 2019-11-13 14:30:08 thread:2
connect ok 2019-11-13 14:30:09 thread:3
connect ok 2019-11-13 14:30:10 thread:4
connect ok 2019-11-13 14:30:11 thread:5
connect ok 2019-11-13 14:30:12 thread:6
connect ok 2019-11-13 14:30:13 thread:7
connect ok 2019-11-13 14:30:14 thread:8
connect ok 2019-11-13 14:30:15 thread:9
connect ok 2019-11-13 14:30:16 thread:10
connect ok 2019-11-13 14:30:17 thread:11
connect ok 2019-11-13 14:30:18 thread:12
connect ok 2019-11-13 14:30:19 thread:13
connect ok 2019-11-13 14:30:20 thread:14
connect ok 2019-11-13 14:30:21 thread:15
connect ok 2019-11-13 14:30:22 thread:16
connect ok 2019-11-13 14:30:23 thread:17
connect ok 2019-11-13 14:30:24 thread:18
connect ok 2019-11-13 14:30:25 thread:19
connect ok 2019-11-13 14:30:26 thread:20
errno:104 Connection reset by peer 2019-11-13 14:30:28 thread:21
errno:104 Connection reset by peer 2019-11-13 14:30:29 thread:22
errno:104 Connection reset by peer 2019-11-13 14:30:29 thread:23
errno:104 Connection reset by peer 2019-11-13 14:30:31 thread:24
errno:104 Connection reset by peer 2019-11-13 14:30:32 thread:25
errno:104 Connection reset by peer 2019-11-13 14:30:33 thread:26
errno:104 Connection reset by peer 2019-11-13 14:30:33 thread:27
errno:104 Connection reset by peer 2019-11-13 14:30:35 thread:28
errno:104 Connection reset by peer 2019-11-13 14:30:36 thread:29
The listen call above sets backlog to 20, yet 21 connections succeed; the rest fail immediately.
Go version
First, note that Go chooses the backlog size for us:
func maxListenerBacklog() int {
	fd, err := open("/proc/sys/net/core/somaxconn")
	if err != nil {
		return syscall.SOMAXCONN
	}
	defer fd.close()
	l, ok := fd.readLine()
	if !ok {
		return syscall.SOMAXCONN
	}
	f := getFields(l)
	n, _, ok := dtoi(f[0])
	if n == 0 || !ok {
		return syscall.SOMAXCONN
	}
	// Linux stores the backlog in a uint16.
	// Truncate number to avoid wrapping.
	// See issue 5030.
	if n > 1<<16-1 {
		n = 1<<16 - 1
	}
	return n
}
[root@localhost tcptest-go]# cat /proc/sys/net/core/somaxconn
128
To change this value, change the value in the somaxconn file directly. Here we set it to 20.
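//client.go
package main

import (
	"log"
	"net"
	"time"
)

// establishConn dials the server once, logs the result, and then holds the
// connection open without doing any I/O.
func establishConn(i int) {
	//conn, err := net.Dial("tcp", ":8888")
	conn, err := net.DialTimeout("tcp", ":8888", 10000*time.Second)
	if err != nil {
		log.Printf("%d: dial error: %s", i, err)
		return
	}
	defer conn.Close()
	log.Println(i, ":connect to server ok")
	for {
		time.Sleep(time.Second)
	}
}

func main() {
	// Start 30 concurrent connection attempts.
	for i := 1; i <= 30; i++ {
		go establishConn(i)
	}
	time.Sleep(time.Second * 10000)
}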
//server.go
package main

import (
	"log"
	"net"
	"time"
)

func main() {
	l, err := net.Listen("tcp", ":8888")
	if err != nil {
		log.Println("error listen:", err)
		return
	}
	defer l.Close()
	log.Println("listen ok")
	// Accept is deliberately never called, so the completed connection queue fills up.
	//var i int
	for {
		time.Sleep(time.Second * 100000)
		//log.Printf("%d: accept a new connection\n",i)
		//if _, err := l.Accept(); err != nil {
		//	log.Println("accept error:", err)
		//	break
		//}
		//i++
		//log.Printf("%d: accept a new connection\n", i)
	}
}
Output:
2019/11/13 15:44:30 30: dial error: dial tcp :8888: connect: connection reset by peer
2019/11/13 15:44:30 29: dial error: dial tcp :8888: connect: connection reset by peer
2019/11/13 15:44:30 28: dial error: dial tcp :8888: connect: connection reset by peer
2019/11/13 15:44:30 27: dial error: dial tcp :8888: connect: connection reset by peer
2019/11/13 15:44:30 26: dial error: dial tcp :8888: connect: connection reset by peer
2019/11/13 15:44:30 25: dial error: dial tcp :8888: connect: connection reset by peer
2019/11/13 15:44:30 24: dial error: dial tcp :8888: connect: connection reset by peer
2019/11/13 15:44:30 23: dial error: dial tcp :8888: connect: connection reset by peer
2019/11/13 15:44:30 22: dial error: dial tcp :8888: connect: connection reset by peer
2019/11/13 15:44:30 21 :connect to server ok
2019/11/13 15:44:30 20 :connect to server ok
2019/11/13 15:44:30 19 :connect to server ok
2019/11/13 15:44:30 18 :connect to server ok
2019/11/13 15:44:30 17 :connect to server ok
2019/11/13 15:44:30 16 :connect to server ok
2019/11/13 15:44:30 15 :connect to server ok
2019/11/13 15:44:30 14 :connect to server ok
2019/11/13 15:44:30 13 :connect to server ok
2019/11/13 15:44:30 12 :connect to server ok
2019/11/13 15:44:30 11 :connect to server ok
2019/11/13 15:44:30 10 :connect to server ok
2019/11/13 15:44:30 9 :connect to server ok
2019/11/13 15:44:30 8 :connect to server ok
2019/11/13 15:44:30 7 :connect to server ok
2019/11/13 15:44:30 6 :connect to server ok
2019/11/13 15:44:30 5 :connect to server ok
2019/11/13 15:44:30 4 :connect to server ok
2019/11/13 15:44:30 3 :connect to server ok
2019/11/13 15:44:30 2 :connect to server ok
2019/11/13 15:44:30 1 :connect to server ok
Here too the listen backlog is 20, yet 21 connections succeed and the rest fail immediately.
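Summary
- Completed connection queue size = min(backlog, somaxconn).
- In practice, setting somaxconn to a reasonable size and calling accept promptly is enough for typical high-concurrency workloads.
- Open question: in the examples above backlog is 20 (I also tried 10, and 10+1 connections succeeded). Why do backlog+1 connections succeed? One likely explanation is that the kernel treats the queue as full only when its length is strictly greater than backlog, which leaves room for one extra connection.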