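1 Overview:
Preliminary notes:
This analysis of wget covers argument parsing only. It does not cover wget's recursive and non-recursive downloads, which later articles will analyze in turn. The options analyzed here are tries(t) timeout(T) no-clobber quiet(q) recursive(r) help(h) version(V) append-output(a) execute(e) no(n) clobber, where the letter in parentheses is the wget short option and the name before it is the long option.
When wget downloads a file or page, the user can change its behavior through options. For example, to see wget's debug output and the HTTP packets, run wget --debug www.baidu.com/index.html.
The sample download URL used throughout is the Baidu search page (http://www.baidu.com/index.html), and the different option types are analyzed against it as a starting point for further study. wget supports both long and short options; for example, debug output is -d as a short option and --debug as a long option.
wget has a global struct options opt; which stores the values the user sets through options and so changes wget's behavior. This article mainly explains how the arguments the user types are turned into members of opt.
The wget version analyzed is 1.13, with gcc 3.4.5 and Linux kernel 2.6.9_5-9-0-0.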
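2 Detailed code walkthrough:
2.1 Data structures
To convert the configuration and fill in struct options opt, wget uses two tables plus a long option array and a short option array.
Command-line table:
struct cmdline_option option_data[]
This table holds the long and short options wget supports, together with their attributes.
Command table that sets opt:
commands
This table is used to set the members of opt according to the arguments.
Long options:
struct option long_options[2 * countof (option_data) + 1]
Short options:
char short_options[128]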
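To make the command-line table more concrete, here is a minimal sketch of what one row might look like. The field names follow the members used later in this walkthrough (long_name, short_name, type, data, argtype). Everything prefixed with demo_ or DEMO_, the three sample rows and the enum are assumptions for illustration, not a copy of wget's real table.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified option kinds; wget defines more of them. */
enum demo_type { DEMO_OPT_VALUE, DEMO_OPT_BOOLEAN };

/* One row of the command-line table: names plus how to handle the option. */
struct demo_cmdline_option {
  const char *long_name;   /* e.g. "tries" */
  char short_name;         /* e.g. 't', or '\0' if there is none */
  enum demo_type type;     /* decides how the parsing loop treats the option */
  const char *data;        /* name of the command that sets the opt member */
  int argtype;             /* unused by the two kinds in this sketch: -1 */
};

static const struct demo_cmdline_option demo_option_data[] = {
  { "quiet",   'q', DEMO_OPT_BOOLEAN, "quiet",   -1 },
  { "timeout", 'T', DEMO_OPT_VALUE,   "timeout", -1 },
  { "tries",   't', DEMO_OPT_VALUE,   "tries",   -1 },
};

#define DEMO_COUNTOF(a) (sizeof (a) / sizeof ((a)[0]))

/* Return the table index of LONG_NAME, or -1 if it is not present. */
static int
demo_find_long (const char *long_name)
{
  size_t i;
  assert (long_name != NULL);
  for (i = 0; i < DEMO_COUNTOF (demo_option_data); i++)
    if (strcmp (demo_option_data[i].long_name, long_name) == 0)
      return (int) i;
  return -1;
}
```

The index returned by demo_find_long plays the same role as the val field that the long option array stores for each entry.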
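2.2 Argument parsing flow
main first chooses which time function to use depending on the platform. The blog covers monotonic time and wall time elsewhere, so that part is not analyzed here.
2.2.1 defaults();
main then calls defaults, which mainly gives the global opt its default values (the code is long, so only part of it is shown).
/* src/init.c */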
/* Reset the variables to default values.  */
void
defaults (void)
{
  char *tmp;
  /* Most of the default values are 0 (and 0.0, NULL, and false).
     Just reset everything, and fill in the non-zero values.  Note
     that initializing pointers to NULL this way is technically
     illegal, but porting Wget to a machine where NULL is not all-zero
     bit pattern will be the least of the implementors' worries.  */
  xzero (opt);
  opt.cookies = true;
  opt.verbose = -1;
  opt.ntry = 20;
  opt.reclevel = 5;
  opt.add_hostdir = true;
  opt.netrc = true;
  opt.ftp_glob = true;
  /* ... the rest of the function is omitted in this excerpt ... */
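The "zero everything, then set only the non-zero defaults" pattern can be tried out in isolation. The following is a minimal sketch under assumptions: struct demo_options is a hypothetical cut-down stand-in for struct options, and memset plays the role of xzero.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical cut-down version of the options structure. */
struct demo_options {
  bool cookies;
  int verbose;
  int ntry;
  int reclevel;
  bool quiet;
  char *lfilename;
};

/* Zero everything, then fill in only the non-zero defaults. */
static void
demo_defaults (struct demo_options *o)
{
  assert (o != NULL);
  memset (o, 0, sizeof *o);   /* plays the role of xzero (opt) */
  o->cookies = true;
  o->verbose = -1;
  o->ntry = 20;
  o->reclevel = 5;
}
```

Members that are not mentioned, such as quiet and lfilename here, end up false and NULL on the usual platforms where NULL is the all-zero bit pattern, which is exactly the caveat the comment in defaults points out.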
2.2.2 init_switches()
The function is simple; I have added some comments of my own (the // lines).
static void
init_switches (void)
{
  // p points into the short option array
  char *p = short_options;
  size_t i, o = 0;
  // walk over every option
  for (i = 0; i < countof (option_data); i++)
    {
      struct cmdline_option *opt = &option_data[i];
      struct option *longopt;
      // an entry without a long name is skipped
      if (!opt->long_name)
        /* The option is disabled.  */
        continue;
      // longopt points at the next free slot of long_options
      longopt = &long_options[o++];
      // the long option's name points at opt's long_name
      longopt->name = opt->long_name;
      // val holds the index into option_data, so the entry can be found from the long option
      longopt->val = i;
      if (opt->short_name)
        {
          // if there is a short option, store its character in short_options
          *p++ = opt->short_name;
          // optmap maps the short option character to its index in long_options
          optmap[opt->short_name - 32] = longopt - long_options;
        }
      switch (opt->type)
        {
        case OPT_VALUE:
          // the option requires a value
          longopt->has_arg = required_argument;
          // a short option that requires a value is followed by ':'
          if (opt->short_name)
            *p++ = ':';
          break;
        case OPT_BOOLEAN:
          /* Boolean (on/off) options must support both --option=off
             and --no-option; see the note below.  */
          /* Specify an optional argument for long options, so that
             --option=off works the same as --no-option, for
             compatibility with pre-1.10 Wget.  However, don't specify
             optional arguments short-option booleans because they
             prevent combining of short options.  */
          longopt->has_arg = optional_argument;
          /* For Boolean options, add the "--no-FOO" variant, which is
             identical to "--foo", except it has opposite meaning and
             it doesn't allow an argument.  */
          longopt = &long_options[o++];
          longopt->name = no_prefix (opt->long_name);
          longopt->has_arg = no_argument;
          /* Mask the value so we'll be able to recognize that we're
             dealing with the false value.  */
          // OR the negation marker bit into the index
          longopt->val = i | BOOLEAN_NEG_MARKER;
          break;
        default:
          // other types: has_arg is taken from the option's argtype
          assert (opt->argtype != -1);
          longopt->has_arg = opt->argtype;
          if (opt->short_name)
            {
              if (longopt->has_arg == required_argument)
                *p++ = ':';
              /* Don't handle optional_argument */
            }
        }
    }
  /* Terminate short_options.  */
  *p = '\0';
  /* No need for xzero(long_options[o]) because its storage is static
     and it will be zeroed by default.  */
  assert (o <= countof (long_options));
}
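The way the short option string is assembled can be sketched separately. This is a simplified illustration, not wget's code: struct demo_opt, the DEMO_ kinds and demo_build_short_options are hypothetical names, and only the value and boolean cases are modelled.

```c
#include <assert.h>
#include <stddef.h>

enum demo_kind { DEMO_VALUE, DEMO_BOOLEAN };

struct demo_opt {
  const char *long_name;
  char short_name;          /* '\0' for a long-only option */
  enum demo_kind kind;
};

/* Build a getopt-style short option string from TABLE into BUF.
   Value options get a trailing ':'; boolean ones do not. */
static void
demo_build_short_options (const struct demo_opt *table, size_t n,
                          char *buf, size_t bufsize)
{
  char *p = buf;
  size_t i;
  assert (table != NULL && buf != NULL);
  assert (bufsize >= 2 * n + 1);    /* worst case: every option is "x:" */
  for (i = 0; i < n; i++)
    {
      if (!table[i].short_name)
        continue;                   /* nothing to add for long-only options */
      *p++ = table[i].short_name;
      if (table[i].kind == DEMO_VALUE)
        *p++ = ':';
    }
  *p = '\0';
}
```

For a table holding no-clobber (no short form), quiet (q) and tries (t, takes a value), the resulting string is "qt:", which is the form getopt_long expects as its third argument.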
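Worked example (long option append-output, short option a):
Tracing long_options and short_options with gdb,
part of long_options looks like this:
name(append-output) has_arg(1) val(2)
val == 2 is the index of this long option's attributes in option_data.
The character 'a' has ASCII value 97, so its index in optmap is 97 - 32 = 65.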
In this way a short option leads to the index of its long option, and that long option's val is the index into option_data.
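The two-step lookup described in this example can be sketched as follows. The tables are hypothetical miniatures: only the append-output row with val 2 and the 'a' - 32 = 65 arithmetic come from the text, and the other row and all demo_ names are assumptions.

```c
#include <assert.h>

/* Hypothetical miniature long option table: name plus option_data index. */
struct demo_long { const char *name; int val; };

static struct demo_long demo_long_options[] = {
  { "accept", 0 },
  { "append-output", 2 },   /* val 2, as in the gdb output */
};

/* Maps a printable character (ASCII 32..127) to a long option index. */
static unsigned char demo_optmap[96];

/* Record that short option C maps to slot LONG_INDEX of the long table. */
static void
demo_map_short (char c, int long_index)
{
  assert (c >= 32 && c < 127);
  assert (long_index >= 0 && long_index < 2);
  demo_optmap[c - 32] = (unsigned char) long_index;
}

/* Resolve short option C to its option_data index via the long table. */
static int
demo_short_to_data_index (char c)
{
  assert (c >= 32 && c < 127);
  return demo_long_options[demo_optmap[c - 32]].val;
}
```

After demo_map_short ('a', 1), slot 65 of demo_optmap holds 1, and demo_short_to_data_index ('a') follows that slot to the append-output row and returns its val.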
2.2.3 main sets opt
  while ((ret = getopt_long (argc, argv,
                             short_options, long_options, &longindex)) != -1)
    {
      int val;
      struct cmdline_option *opt;
      /* If LONGINDEX is unchanged, it means RET is referring a short
         option.  */
      if (longindex == -1)
        {
          if (ret == '?')
            {
              print_usage (0);
              printf ("\n");
              printf (_("Try `%s --help' for more options.\n"), exec_name);
              exit (2);
            }
          /* Find the short option character in the mapping.  */
          longindex = optmap[ret - 32];
        }
      val = long_options[longindex].val;
      /* Use the retrieved value to locate the option in the
         option_data array, and to see if we're dealing with the
         negated "--no-FOO" variant of the boolean option "--foo".  */
      opt = &option_data[val & ~BOOLEAN_NEG_MARKER];
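The excerpt above is the part of main that handles argc and argv.
After the getopt_long call, longindex == -1 means the user typed a short option, and optmap[ret - 32] gives that short option's index in the long option array. The long option's val then locates the option in option_data. If the user typed a long option, longindex is already set and val is used directly.
val = long_options[longindex].val;
Fetch the option_data entry for this option:
opt = &option_data[val & ~BOOLEAN_NEG_MARKER];
With the option's position in option_data found, the code below starts setting the global opt.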
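The longindex == -1 test relies on getopt_long leaving longindex untouched for short options, which is why the loop resets it after every iteration. Here is a minimal sketch of that behaviour using the real getopt_long from <getopt.h>; the two-entry table and demo_count_forms are hypothetical.

```c
#include <assert.h>
#include <getopt.h>
#include <stddef.h>

/* Hypothetical table: --tries/-t takes a value, --quiet/-q does not. */
static const struct option demo_longopts[] = {
  { "tries", required_argument, NULL, 0 },
  { "quiet", no_argument,       NULL, 1 },
  { NULL, 0, NULL, 0 }
};

/* Count how many options were given in long form and in short form. */
static void
demo_count_forms (int argc, char **argv, int *n_long, int *n_short)
{
  int ret, longindex = -1;
  assert (argv != NULL && n_long != NULL && n_short != NULL);
  *n_long = *n_short = 0;
  optind = 1;                       /* start scanning at argv[1] */
  while ((ret = getopt_long (argc, argv, "t:q", demo_longopts,
                             &longindex)) != -1)
    {
      if (ret == '?')
        break;                      /* unknown option: stop in this sketch */
      if (longindex == -1)
        ++*n_short;                 /* longindex was left untouched */
      else
        ++*n_long;
      longindex = -1;               /* reset before the next option */
    }
}
```

Without the reset at the end of the loop body, a short option that follows a long one would still see the previous longindex and be mistaken for that long option.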
The following options are analyzed, grouped by option type:
OPT_VALUE tries(t) timeout(T)
OPT_BOOLEAN no-clobber quiet(q) recursive(r)
OPT_FUNCALL help(h) version(V)
OPT__APPEND_OUTPUT append-output(a)
OPT__EXECUTE execute(e)
OPT__NO no(n)
OPT__PARENT|OPT__CLOBBER clobber
Code:
      switch (opt->type)
        {
        case OPT_VALUE:
          setoptval (opt->data, optarg, opt->long_name);
          break;
        case OPT_BOOLEAN:
          if (optarg)
            /* The user has specified a value -- use it. */
            setoptval (opt->data, optarg, opt->long_name);
          else
            {
              /* NEG is true for `--no-FOO' style boolean options. */
              bool neg = !!(val & BOOLEAN_NEG_MARKER);
              setoptval (opt->data, neg ? "0" : "1", opt->long_name);
            }
          break;
        case OPT_FUNCALL:
          {
            void (*func) (void) = (void (*) (void)) opt->data;
            func ();
          }
          break;
        case OPT__APPEND_OUTPUT:
          setoptval ("logfile", optarg, opt->long_name);
          append_to_log = true;
          break;
        case OPT__EXECUTE:
          run_command (optarg);
          break;
        case OPT__NO:
          {
            /* We support real --no-FOO flags now, but keep these
               short options for convenience and backward
               compatibility.  */
            char *p;
            for (p = optarg; p && *p; p++)
              switch (*p)
                {
                case 'v':
                  setoptval ("verbose", "0", opt->long_name);
                  break;
                case 'H':
                  setoptval ("addhostdir", "0", opt->long_name);
                  break;
                case 'd':
                  setoptval ("dirstruct", "0", opt->long_name);
                  break;
                case 'c':
                  setoptval ("noclobber", "1", opt->long_name);
                  break;
                case 'p':
                  setoptval ("noparent", "1", opt->long_name);
                  break;
                default:
                  fprintf (stderr, _("%s: illegal option -- `-n%c'\n"),
                           exec_name, *p);
                  print_usage (1);
                  fprintf (stderr, "\n");
                  fprintf (stderr, _("Try `%s --help' for more options.\n"),
                           exec_name);
                  exit (1);
                }
            break;
          }
        case OPT__PARENT:
        case OPT__CLOBBER:
          {
            /* The wgetrc commands are named noparent and noclobber,
               so we must revert the meaning of the cmdline options
               before passing the value to setoptval.  */
            bool flag = true;
            if (optarg)
              flag = (*optarg == '1' || c_tolower (*optarg) == 'y'
                      || (c_tolower (optarg[0]) == 'o'
                          && c_tolower (optarg[1]) == 'n'));
            setoptval (opt->type == OPT__PARENT ? "noparent" : "noclobber",
                       flag ? "0" : "1", opt->long_name);
            break;
          }
        case OPT__DONT_REMOVE_LISTING:
          setoptval ("removelisting", "0", opt->long_name);
          break;
        }
      longindex = -1;
    }
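The marker trick used for boolean options, OR-ing a flag bit into val and masking it off again, can be sketched on its own. The value 1024 for DEMO_NEG_MARKER is an assumption of this sketch: any bit that no real table index uses would work, and the demo_ functions are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumed marker value for this sketch; it only has to be a bit that
   no real table index uses. */
#define DEMO_NEG_MARKER 1024

/* Encode the val stored for "--no-FOO", given the index of "--foo". */
static int
demo_encode_negated (int index)
{
  assert (index >= 0 && index < DEMO_NEG_MARKER);
  return index | DEMO_NEG_MARKER;
}

/* Split VAL back into the table index and the negation flag, and return
   the string that would be handed to the setter: "0" for --no-FOO. */
static const char *
demo_decode (int val, int *index, bool *neg)
{
  assert (index != NULL && neg != NULL);
  *index = val & ~DEMO_NEG_MARKER;
  *neg = !!(val & DEMO_NEG_MARKER);
  return *neg ? "0" : "1";
}
```

Both --foo and --no-foo therefore resolve to the same table row, and only the marker bit decides whether "1" or "0" is passed on.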
Option type OPT_VALUE (t, T)
setoptval (opt->data, optarg, opt->long_name)
-> setval_internal (command_by_name (opt->data), "--" + opt->long_name, optarg)
Here command_by_name (opt->data) uses binary search to find the index of data in commands.
Code:
static int
command_by_name (const char *cmdname)
{
  /* Use binary search for speed.  Wget has ~100 commands, which
     guarantees a worst case performance of 7 string comparisons.  */
  int lo = 0, hi = countof (commands) - 1;
  while (lo <= hi)
    {
      int mid = (lo + hi) >> 1;
      int cmp = strcasecmp (cmdname, commands[mid].name);
      if (cmp < 0)
        hi = mid - 1;
      else if (cmp > 0)
        lo = mid + 1;
      else
        return mid;
    }
  return -1;
}
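The same search can be tried on a small table. This is a minimal sketch under assumptions: the six names in demo_commands are a hypothetical subset, and the only requirement carried over from command_by_name is that the table stays sorted.

```c
#include <assert.h>
#include <stddef.h>
#include <strings.h>   /* strcasecmp (POSIX) */

/* Hypothetical command names; the table must stay sorted for the search. */
static const char *const demo_commands[] = {
  "logfile", "noclobber", "quiet", "timeout", "tries", "verbose"
};

/* Case-insensitive binary search; returns the index or -1. */
static int
demo_command_by_name (const char *name)
{
  int lo = 0;
  int hi = (int) (sizeof demo_commands / sizeof demo_commands[0]) - 1;
  assert (name != NULL);
  while (lo <= hi)
    {
      int mid = lo + (hi - lo) / 2;
      int cmp = strcasecmp (name, demo_commands[mid]);
      if (cmp < 0)
        hi = mid - 1;
      else if (cmp > 0)
        lo = mid + 1;
      else
        return mid;
    }
  return -1;
}
```

Because the comparison is case-insensitive, "TRIES" and "tries" find the same entry. A name that is missing from the table yields -1, which the caller has to treat as an unknown command.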
setval_internal (comind, "--" + opt->long_name, optarg)
-> commands[comind].action ("--" + opt->long_name, optarg, commands[comind].place);
For example, the commands entry for tries is:
{ "tries", &opt.ntry, cmd_number_inf },
This calls cmd_number_inf ("--tries", optarg, &opt.ntry),
which in effect sets opt.ntry = strtoul (optarg, NULL, 10).
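The dispatch through a table of (name, place, action) rows can be sketched as below. This is an illustration under assumptions, not wget's implementation: demo_opt, demo_cmd_table and the demo_ functions are hypothetical, the parsing uses strtol, and treating "inf" as 0 (meaning unlimited) is an assumption of this sketch.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stdlib.h>
#include <strings.h>

/* Hypothetical stand-in for the global options structure. */
static struct { int ntry; } demo_opt;

/* Setter for a non-negative decimal number; "inf" is stored as 0,
   which this sketch treats as "unlimited". */
static bool
demo_cmd_number_inf (const char *com, const char *val, void *place)
{
  char *end;
  long n;
  (void) com;                       /* would be used in an error message */
  assert (val != NULL && place != NULL);
  if (strcasecmp (val, "inf") == 0)
    {
      *(int *) place = 0;
      return true;
    }
  n = strtol (val, &end, 10);
  if (end == val || *end != '\0' || n < 0 || n > INT_MAX)
    return false;                   /* not a clean decimal number */
  *(int *) place = (int) n;
  return true;
}

/* One table row: command name, where to store, how to parse. */
static const struct {
  const char *name;
  void *place;
  bool (*action) (const char *, const char *, void *);
} demo_cmd_table[] = {
  { "tries", &demo_opt.ntry, demo_cmd_number_inf },
};

/* Dispatch through the table via the row's function pointer. */
static bool
demo_setval (int comind, const char *com, const char *val)
{
  assert (comind >= 0 && comind < 1);
  return demo_cmd_table[comind].action (com, val,
                                        demo_cmd_table[comind].place);
}
```

The point of the design is that the parsing loop never touches opt members directly. Each row carries both the address to write to and the function that knows how to parse a value of that kind.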
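Option type OPT_BOOLEAN
Much the same as OPT_VALUE, so it is skipped here.
Option type OPT_FUNCALL
-h and -V
The function stored in opt->data is called.
If the user passes -h or -V, print_help or print_version is called; this is skipped here.
Option type OPT__APPEND_OUTPUT
setoptval ("logfile", optarg, opt->long_name); // similar to OPT_VALUE, skipped.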
Option type OPT__EXECUTE
Option -e
run_command (optarg)
Here optarg has the form key=value. The function parses out the key and the value, for example logfile=logfile.txt,
and then calls setval_internal (comind, com, val) to set opt.
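Splitting a key=value string of this kind can be sketched as follows. demo_split_command is a hypothetical helper, not wget's parser, and unlike a real parser it does not trim whitespace or normalise the key.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Split "key=value" from LINE into KEY and VAL (both NUL-terminated).
   Returns false if there is no '=', the key is empty, or a buffer is
   too small.  This sketch does not trim whitespace. */
static bool
demo_split_command (const char *line, char *key, size_t keysize,
                    char *val, size_t valsize)
{
  const char *eq;
  size_t keylen, vallen;
  assert (line != NULL && key != NULL && val != NULL);
  eq = strchr (line, '=');
  if (eq == NULL || eq == line)
    return false;
  keylen = (size_t) (eq - line);
  vallen = strlen (eq + 1);
  if (keylen >= keysize || vallen >= valsize)
    return false;
  memcpy (key, line, keylen);
  key[keylen] = '\0';
  memcpy (val, eq + 1, vallen + 1);   /* includes the terminating NUL */
  return true;
}
```

Once the key and value are separated, the key is looked up in the command table and the value is handed to that command's setter, the same path the other option types take.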
The option types OPT__NO, OPT__PARENT, OPT__CLOBBER and OPT__DONT_REMOVE_LISTING are all much alike, so they are skipped here.
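One detail of OPT__PARENT and OPT__CLOBBER is still worth a short sketch: the stored commands are the negative forms noparent and noclobber, so the command-line flag has to be inverted. The demo_ functions below are hypothetical and use the standard tolower where the code shown earlier uses c_tolower.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/* Interpret the optional argument of --clobber / --parent.
   No argument means "on"; otherwise accept "1", "y..." or "on". */
static bool
demo_flag_from_optarg (const char *optarg_value)
{
  if (optarg_value == NULL)
    return true;
  return optarg_value[0] == '1'
         || tolower ((unsigned char) optarg_value[0]) == 'y'
         || (tolower ((unsigned char) optarg_value[0]) == 'o'
             && tolower ((unsigned char) optarg_value[1]) == 'n');
}

/* Work out which negative command to set and with which value:
   the value is the inverse of the command-line flag. */
static void
demo_translate (bool is_parent, const char *optarg_value,
                const char **command, const char **value)
{
  assert (command != NULL && value != NULL);
  *command = is_parent ? "noparent" : "noclobber";
  *value = demo_flag_from_optarg (optarg_value) ? "0" : "1";
}
```

So a plain --clobber becomes noclobber with value "0", while --clobber=off becomes noclobber with value "1".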
That concludes this article.
2014/4/12 Leek in beijing