Daemon Processes and Zombie Processes

http://blog.csdn.net/russell_tao/article/details/7090033

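The first commercial service I maintained, back in 2004, created its daemon with two forks. Recently I have seen many forum posts and several Unix books arguing that a single fork is enough to produce a daemon. Both views have their reasons, but what exactly is the extra fork for?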

In the kernel a process is a task. The kernel structure that describes a task, task_struct, has two relevant members:

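struct task_struct {
	volatile long state;	/* TASK_RUNNING, TASK_INTERRUPTIBLE, ... */
	int exit_state;		/* EXIT_ZOMBIE or EXIT_DEAD once the task has exited */
	...
};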

The values they can take are defined in include/linux/sched.h:

#define TASK_RUNNING 0
#define TASK_INTERRUPTIBLE 1
#define TASK_UNINTERRUPTIBLE 2
#define __TASK_STOPPED 4
#define __TASK_TRACED 8
/* in tsk->exit_state */
#define EXIT_ZOMBIE 16
#define EXIT_DEAD 32
/* in tsk->state again */
#define TASK_DEAD 64
#define TASK_WAKEKILL 128
#define TASK_WAKING 256
#define TASK_STATE_MAX 512

Besides the familiar running, interruptible, uninterruptible and stopped states, a process can also be in the ZOMBIE state (EXIT_ZOMBIE, stored in exit_state). What does that state mean?

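It exists because every Linux process belongs to a tree. The root of the tree is init, the process started at the end of system initialization. Its pid is 1, and every other process descends from it. Every process except init has a parent, and that parent is responsible for creating it (fork) and for reclaiming its resources after it exits (wait4). The tree is also robust. If a process is still running when its parent exits, it does not become an orphan, because Linux makes init adopt it as its new parent. This is where daemons come from: one of the requirements for a daemon is that init be its parent.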

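When a process terminates, exit first releases its resources, such as open files, and the process then enters the ZOMBIE state. Its parent is expected to call wait4 to reclaim the task_struct that remains. What if the parent never calls wait4? Then nobody reclaims that task_struct. The only way out is for the parent itself to exit, so that init adopts the zombie and calls wait4 to free the process descriptor. As long as the parent keeps running, the zombie stays in the system. No signal sent with kill can remove it, because the process is already dead. This is dangerous. Each zombie still holds a pid and a process-table entry, so a server that accumulates enough of them can run out of pids and become unable to start new processes.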

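Now let us look at the kernel code. On exit, a process runs sys_exit (a C program gets there when main returns and exit() is called). sys_exit calls do_exit. do_exit first releases the resources the process used and then calls exit_notify. exit_notify hands the exiting process's children over to init, puts the process itself into the ZOMBIE state, and finally notifies its parent: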

kernel/exit.c

static void exit_notify(struct task_struct *tsk)
{
	int state;
	struct task_struct *t;
	struct list_head ptrace_dead, *_p, *_n;

	if (signal_pending(tsk) && !tsk->signal->group_exit
	    && !thread_group_empty(tsk)) {
		/*
		 * This occurs when there was a race between our exit
		 * syscall and a group signal choosing us as the one to
		 * wake up. It could be that we are the only thread
		 * alerted to check for pending signals, but another thread
		 * should be woken now to take the signal since we will not.
		 * Now we'll wake all the threads in the group just to make
		 * sure someone gets all the pending signals.
		 */
		read_lock(&tasklist_lock);
		spin_lock_irq(&tsk->sighand->siglock);
		for (t = next_thread(tsk); t != tsk; t = next_thread(t))
			if (!signal_pending(t) && !(t->flags & PF_EXITING)) {
				recalc_sigpending_tsk(t);
				if (signal_pending(t))
					signal_wake_up(t, 0);
			}
		spin_unlock_irq(&tsk->sighand->siglock);
		read_unlock(&tasklist_lock);
	}

	write_lock_irq(&tasklist_lock);

	/*
	 * This does two things:
	 *
	 * A. Make init inherit all the child processes
	 * B. Check to see if any process groups have become orphaned
	 * as a result of our exiting, and if they have any stopped
	 * jobs, send them a SIGHUP and then a SIGCONT. (POSIX 3.2.2.2)
	 */

	INIT_LIST_HEAD(&ptrace_dead);
	forget_original_parent(tsk, &ptrace_dead);
	BUG_ON(!list_empty(&tsk->children));
	BUG_ON(!list_empty(&tsk->ptrace_children));

	/*
	 * Check to see if any process groups have become orphaned
	 * as a result of our exiting, and if they have any stopped
	 * jobs, send them a SIGHUP and then a SIGCONT. (POSIX 3.2.2.2)
	 *
	 * Case i: Our father is in a different pgrp than we are
	 * and we were the only connection outside, so our pgrp
	 * is about to become orphaned.
	 */

	t = tsk->real_parent;

	if ((process_group(t) != process_group(tsk)) &&
	    (t->signal->session == tsk->signal->session) &&
	    will_become_orphaned_pgrp(process_group(tsk), tsk) &&
	    has_stopped_jobs(process_group(tsk))) {
		__kill_pg_info(SIGHUP, (void *)1, process_group(tsk));
		__kill_pg_info(SIGCONT, (void *)1, process_group(tsk));
	}

	/* Let father know we died
	 *
	 * Thread signals are configurable, but you aren't going to use
	 * that to send signals to arbitary processes.
	 * That stops right now.
	 *
	 * If the parent exec id doesn't match the exec id we saved
	 * when we started then we know the parent has changed security
	 * domain.
	 *
	 * If our self_exec id doesn't match our parent_exec_id then
	 * we have changed execution domain as these two values started
	 * the same after a fork.
	 *
	 */

	if (tsk->exit_signal != SIGCHLD && tsk->exit_signal != -1 &&
	    ( tsk->parent_exec_id != t->self_exec_id ||
	      tsk->self_exec_id != tsk->parent_exec_id)
	    && !capable(CAP_KILL))
		tsk->exit_signal = SIGCHLD;

	/* If something other than our normal parent is ptracing us, then
	 * send it a SIGCHLD instead of honoring exit_signal. exit_signal
	 * only has special meaning to our real parent.
	 */
	if (tsk->exit_signal != -1 && thread_group_empty(tsk)) {
		int signal = tsk->parent == tsk->real_parent ? tsk->exit_signal : SIGCHLD;
		do_notify_parent(tsk, signal);
	} else if (tsk->ptrace) {
		do_notify_parent(tsk, SIGCHLD);
	}

	state = EXIT_ZOMBIE;
	if (tsk->exit_signal == -1 && tsk->ptrace == 0)
		state = EXIT_DEAD;
	tsk->exit_state = state;

	/*
	 * Clear these here so that update_process_times() won't try to deliver
	 * itimer, profile or rlimit signals to this task while it is in late exit.
	 */
	tsk->it_virt_value = 0;
	tsk->it_prof_value = 0;

	write_unlock_irq(&tasklist_lock);

	list_for_each_safe(_p, _n, &ptrace_dead) {
		list_del_init(_p);
		t = list_entry(_p,struct task_struct,ptrace_list);
		release_task(t);
	}

	/* If the process is dead, release it - nobody will wait for it */
	if (state == EXIT_DEAD)
		release_task(tsk);

	/* PF_DEAD causes final put_task_struct after we schedule. */
	preempt_disable();
	tsk->flags |= PF_DEAD;
}

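The comments in this kernel code are quite thorough. Note forget_original_parent in particular. It reparents all of the exiting process's children and hands them over to init.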

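Now back to the question of why a daemon forks twice. The assumption is that the parent that creates the daemon still has work of its own to do; creating the daemon is not its only purpose in life. If the parent forks only once, the daemon is its direct child. Suppose the parent then carries on with other work and blocks, so it neither exits nor calls wait4, while the daemon keeps running. If the daemon later exits for any reason, normal or abnormal, it becomes a ZOMBIE and stays one for as long as the parent keeps running.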

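What if we fork twice? The parent forks a son, the son forks a grandchild that becomes the daemon, and the son then exits immediately. The parent reaps the son (the wait returns almost at once, because the son exits right away), and init adopts the daemon. From then on, whatever the parent does and however long it blocks, it has nothing to do with the daemon. When the daemon eventually exits, init reaps it. That is why a double-forked daemon is safe: it rules out zombies.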
