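Symptoms:
1. Server load, network I/O, and disk I/O all looked normal.
2. /var/log/secure showed no errors.
3. In /var/log/maillog, at 10:48 dovecot reported that it could not connect to the redis service, with around 500 errors like these: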
Aug 23 10:48:46 kkmail dovecot: auth-worker(1813): Error: redis: connect(127.0.0.1, 10035) failed: Connection refused
Aug 23 10:48:46 kkmail dovecot: auth-worker(1813): Error: dict([email protected],10.0.45.214): Failed to lookup key shared/passdb/10.0.45.214/pop3/[email protected]
Aug 23 10:48:46 kkmail dovecot: auth-worker(1818): Error: redis: connect(127.0.0.1, 10035) failed: Connection refused
Aug 23 10:48:46 kkmail dovecot: auth-worker(1818): Error: dict([email protected],10.0.55.234): Failed to lookup key shared/passdb/10.0.55.234/pop3/[email protected]
Aug 23 10:48:46 kkmail dovecot: auth-worker(1813): Error: redis: connect(127.0.0.1, 10035) failed: Connection refused
Aug 23 10:48:46 kkmail dovecot: auth-worker(1813): Error: dict([email protected],10.0.154.104): Failed to lookup key shared/passdb/10.0.154.104/pop3/[email protected]
Aug 23 10:48:46 kkmail dovecot: auth-worker(1813): Error: redis: connect(127.0.0.1, 10035) failed: Connection refused
Aug 23 10:48:46 kkmail dovecot: auth-worker(1813): Error: dict([email protected],10.0.33.115): Failed to lookup key shared/passdb/10.0.33.115/imap/[email protected]
Aug 23 10:48:46 kkmail dovecot: auth-worker(1813): Error: redis: connect(127.0.0.1, 10035) failed: Connection refused
Aug 23 10:48:46 kkmail dovecot: auth-worker(1813): Error: dict([email protected],172.24.1.66): Failed to lookup key shared/passdb/172.24.1.66/imap/[email protected]
Aug 23 10:48:46 kkmail dovecot: auth-worker(1813): Error: redis: connect(127.0.0.1, 10035) failed: Connection refused
4. Right after that, the following error was logged over and over, the server hung, and SSH logins no longer got through:
Aug 23 10:56:11 kkmail dovecot: anvil: Error: net_accept() failed: Too many open files
Aug 23 10:56:11 kkmail dovecot: config: Error: net_accept() failed: Too many open files
5. netstat showed many IMAP and POP3 connections in the CLOSE_WAIT state (when I have time I want to look into what can cause this).
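On the CLOSE_WAIT point in item 5: a socket is in CLOSE_WAIT when the remote peer has closed its side of the connection and the local process has not yet called close() on it, so every such connection still holds a file descriptor. Whether that is what exhausted the descriptors here is only a guess. A minimal sketch for summarising the states, assuming `netstat -ant` style output on stdin; the function names are hypothetical:

```shell
#!/bin/sh
# Sketch only: function names are hypothetical, input is `netstat -ant` output.

# Print "<count> <state>" for every TCP state, highest count first.
tcp_state_summary() {
    awk '$1 ~ /^tcp/ { count[$6]++ }
         END { for (s in count) print count[s], s }' | sort -rn
}

# Print "<count> <local port>" for connections stuck in CLOSE_WAIT.
close_wait_by_port() {
    awk '$1 ~ /^tcp/ && $6 == "CLOSE_WAIT" {
             n = split($4, parts, ":")   # local port is the last ":" field
             count[parts[n]]++
         }
         END { for (p in count) print count[p], p }' | sort -rn
}

# Example:
#   netstat -ant | tcp_state_summary
#   netstat -ant | close_wait_by_port
```

If ports 110 and 143 dominate the second listing, the POP3 and IMAP services are the ones not closing their sockets.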
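Resolution steps:
1. If you suspect that Linux has run out of file handles, use the following commands to find out which processes are holding them.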
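lsof -n|awk '{print $2}'|sort|uniq -c|sort -nr|more
(the first column is the count, the second column is the process ID)
lsof -n|awk 'NR>1 {print $2}'|sort|uniq -c|awk '{sum += $1} END {print sum}'
(the total across all processes; NR>1 skips the lsof header line. This total is worth monitoring.)
Note: I did not check this at the time.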
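2. Raise the maximum number of open files.
(1) For the settings in limits.conf to take effect, pam_limits.so must be loaded by the PAM configuration. Check that /etc/pam.d/login contains:
session required /lib64/security/pam_limits.so
(2) Raise the Linux limits.
Fix: lift the system limits on the maximum number of processes and the maximum number of open files: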
vi /etc/security/limits.conf
* soft nproc 655350
* hard nproc 655350
* soft nofile 655350
* hard nofile 655350
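3. Reboot the server; otherwise some of the handles that are still held will not be released. After the reboot, verify with ulimit -n.
4. vi /etc/rsyslog.conf showed that the maillog log was being forwarded to another server; I removed that rule.
5. The server can be reached remotely again, and mail is sent and received normally and is no longer slow.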
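Two caveats on steps 1 to 3, with a rough sketch for each. First, lsof -n also prints rows that are not open descriptors (cwd, txt, mem and so on), so its per-process counts can overstate the real usage; counting the entries under /proc/<pid>/fd is one alternative. Second, limits.conf is applied by pam_limits to PAM sessions, and ulimit -n only reports the limit of the current shell, so a daemon started at boot does not necessarily pick up the new value; the limit a running process actually has can be read from /proc/<pid>/limits. The function names below are hypothetical, and the optional proc-root argument exists only to make the first function easy to try against a test directory.

```shell
#!/bin/sh
# Sketch only: function names are hypothetical, not part of any tool.

# Print "<open fds> <pid> <command>" per process, highest count first.
# $1 is the proc root (defaults to /proc); run as root to see every process.
fd_count_by_pid() {
    root=${1:-/proc}
    for dir in "$root"/[0-9]*; do
        [ -d "$dir/fd" ] || continue
        pid=${dir##*/}
        n=$(( $(ls "$dir/fd" 2>/dev/null | wc -l) ))
        [ "$n" -gt 0 ] || continue
        comm=$(cat "$dir/comm" 2>/dev/null)
        printf '%s %s %s\n' "$n" "$pid" "$comm"
    done | sort -rn
}

# Print "<soft> <hard>" from the "Max open files" row of a limits file.
nofile_limits() {
    awk '/^Max open files/ { print $4, $5 }' "$1"
}

# Example:
#   fd_count_by_pid | head
#   cat /proc/sys/fs/file-nr    # allocated, unused, system-wide maximum
#   for pid in $(pgrep dovecot); do nofile_limits "/proc/$pid/limits"; done
```

If the dovecot processes still show a low soft limit after the reboot, the limit has to be raised where the service itself is started rather than in limits.conf.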
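Summary:
1. A redis crash may have caused the "Error: net_accept() failed: Too many open files" failure, or the "Too many open files" failure may have caused the redis crash.
2. I do not actually know what triggered it. If anyone does, please leave me a comment. Thanks.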
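One way to narrow down which failure came first is to compare when each error first appears in the log and how often it occurs per minute. In the excerpts above, the redis errors are stamped 10:48:46 and the "Too many open files" errors 10:56:11, which hints at an order but does not prove it. A minimal sketch, assuming syslog-style lines on stdin; the function names are hypothetical:

```shell
#!/bin/sh
# Sketch only: function names are hypothetical, input is syslog-style text.

# Print the timestamp of the first line containing the fixed string $1.
first_seen() {
    awk -v pat="$1" 'index($0, pat) { print $1, $2, $3; exit }'
}

# Print "<count> <Mon> <DD> <HH:MM>" for lines containing the fixed string $1.
errors_per_minute() {
    awk -v pat="$1" 'index($0, pat) { print $1, $2, substr($3, 1, 5) }' |
        sort | uniq -c | awk '{ print $1, $2, $3, $4 }'
}

# Example:
#   first_seen 'redis: connect(' < /var/log/maillog
#   first_seen 'Too many open files' < /var/log/maillog
#   errors_per_minute 'Too many open files' < /var/log/maillog
```

Running the same checks against the redis log, if one is kept, would show whether redis stopped before or after 10:48.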