wget: Argument Parsing

1 Overview:

Preliminary notes:

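This article covers only wget's argument parsing. It does not cover recursive or non-recursive downloading, which later articles will analyse. The options examined here are tries(t), timeout(T), no-clobber, quiet(q), recursive(r), help(h), version(V), append-output(a), execute(e), no(n) and clobber. The name in parentheses is the short option and the name before it is the long option.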

 

While wget downloads a file or page, the user can change its behaviour through options. For example, to see wget's debug output and the HTTP packets, run wget --debug www.baidu.com/index.html

 

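The URL used throughout is the Baidu search page (http://www.baidu.com/index.html). It serves as a sample for examining the different option types and as a starting point for further study. wget supports both long and short options; for example, debug output is enabled with the short option -d or the long option --debug.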

 

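wget keeps a global struct options opt; that stores the values set by the user's options and thereby changes wget's behaviour. This article explains how the arguments the user types are turned into members of opt.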

 

The version of wget analysed is 1.13, with gcc 3.4.5 and Linux kernel 2.6.9_5-9-0-0.

2 Detailed code walkthrough:

2.1 Data structures

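To turn the user's configuration into values in struct options opt, wget uses two tables plus the long-option and short-option arrays.

Command-line table:

struct cmdline_option option_data[]

This table holds the long and short options wget supports, together with their attributes.

Command table used to set opt:

commands

This table maps each option to the member of opt that it sets.

Long options:

struct option long_options[2 * countof (option_data) + 1]

Short options:

char short_options[128]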
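To make the link between the two tables concrete, here is a minimal sketch. The struct layouts are assumptions modelled on the field names used in this article (long_name, short_name, type and data on one side; name and place on the other). Every demo_ name is hypothetical, and the real definitions in wget have more fields and many more rows.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define countof(a) (sizeof (a) / sizeof ((a)[0]))

/* Hypothetical, cut-down version of the global options struct. */
struct demo_options { int ntry; bool quiet; };
static struct demo_options demo_opt;

enum demo_type { DEMO_OPT_VALUE, DEMO_OPT_BOOLEAN };

/* Command-line table row: what the user may type. */
struct demo_cmdline_option {
  const char *long_name;
  char short_name;
  enum demo_type type;
  const char *data;        /* name of the row in the command table */
};

/* Command table row: which member of demo_opt the command sets. */
struct demo_command {
  const char *name;
  void *place;
};

static const struct demo_cmdline_option demo_option_data[] = {
  { "quiet", 'q', DEMO_OPT_BOOLEAN, "quiet" },
  { "tries", 't', DEMO_OPT_VALUE,   "tries" },
};

static const struct demo_command demo_commands[] = {
  { "quiet", &demo_opt.quiet },
  { "tries", &demo_opt.ntry },
};

/* Follow the chain: option row -> command row -> member of demo_opt.
   Linear search keeps the sketch short.  */
static void *
demo_place_for (const char *long_name)
{
  size_t i, j;
  for (i = 0; i < countof (demo_option_data); i++)
    if (strcmp (demo_option_data[i].long_name, long_name) == 0)
      for (j = 0; j < countof (demo_commands); j++)
        if (strcmp (demo_commands[j].name, demo_option_data[i].data) == 0)
          return demo_commands[j].place;
  return NULL;
}
```

Calling demo_place_for ("tries") returns the address of demo_opt.ntry, which is the same indirection the rest of this article traces through the real tables.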

2.2 Argument parsing flow


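main first selects the timing function to use for the current platform. The blog has a separate discussion of monotonic time and wall time, so that is not analysed here.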

2.2.1 defaults();

main then calls defaults, which assigns default values to the global opt (the code is long, so only part of it is shown).
//#######################src/init.c
/* Reset the variables to default values.  */
  void
  defaults (void)
  {
    char *tmp;
  
    /* Most of the default values are 0 (and 0.0, NULL, and false).
       Just reset everything, and fill in the non-zero values.  Note
       that initializing pointers to NULL this way is technically
       illegal, but porting Wget to a machine where NULL is not all-zero
       bit pattern will be the least of the implementors' worries.  */
    xzero (opt);
  
    opt.cookies = true;
    opt.verbose = -1;
    opt.ntry = 20;
    opt.reclevel = 5;
    opt.add_hostdir = true;
    opt.netrc = true;
    opt.ftp_glob = true;
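The pattern in defaults is "zero everything, then set only the non-zero defaults". A minimal sketch of that pattern follows. The demo_xzero macro is an assumption about what xzero expands to (a memset over the whole object), and the struct is a hypothetical subset of the real one.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Assumed equivalent of wget's xzero: clear every byte of the object. */
#define demo_xzero(x) memset (&(x), '\0', sizeof (x))

/* Hypothetical subset of the options struct. */
struct demo_options {
  bool cookies;
  int verbose;
  int ntry;
  int reclevel;
  char *lfilename;   /* illustrative pointer member, default NULL */
};

static struct demo_options demo_opt;

static void
demo_defaults (void)
{
  demo_xzero (demo_opt);      /* everything becomes 0, false or NULL */
  demo_opt.cookies = true;    /* then only non-zero defaults are set */
  demo_opt.verbose = -1;
  demo_opt.ntry = 20;
  demo_opt.reclevel = 5;
}
```

Because the whole struct is cleared first, calling demo_defaults again later also discards anything set in between, which is why it is run once before the command line is parsed.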

2.2.2 init_switches()

The function is simple; I have added some comments.

static void
  init_switches (void)
  {
    //p points into the short-option array
    char *p = short_options;
    size_t i, o = 0;
    //walk through every option
    for (i = 0; i < countof (option_data); i++)
      {
        struct cmdline_option *opt = &option_data[i];
        struct option *longopt;
  
        //skip entries that have no long option
        if (!opt->long_name)
          /* The option is disabled. */
          continue;
  
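        //longopt points to the next free slot in long_options
        longopt = &long_options[o++];
        //the long option's name points to opt's long_name
        longopt->name = opt->long_name;
        //val holds the index into option_data, so the entry can be found from the long option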
        longopt->val = i;
        if (opt->short_name)
         {
            //if there is a short option, store its character in short_options
            *p++ = opt->short_name;
            //optmap maps the short option character to its index in long_options
            optmap[opt->short_name - 32] = longopt - long_options;
          }
        switch (opt->type)
          {
          case OPT_VALUE:
            //the option requires a value
            longopt->has_arg = required_argument;
            //a short option that requires a value is followed by ':'
            if (opt->short_name)
              *p++ = ':';
            break;
          case OPT_BOOLEAN:
            /* Boolean (on/off) options must support both --option=off and --no-option; see the note below. */
            /* Specify an optional argument for long options, so that
               --option=off works the same as --no-option, for
               compatibility with pre-1.10 Wget.  However, don't specify
               optional arguments short-option booleans because they
               prevent combining of short options.  */
            longopt->has_arg = optional_argument;
            /* For Boolean options, add the "--no-FOO" variant, which is
               identical to "--foo", except it has opposite meaning and
               it doesn't allow an argument.  */
            longopt = &long_options[o++];
            longopt->name = no_prefix (opt->long_name);
            longopt->has_arg = no_argument;
            /* Mask the value so we'll be able to recognize that we're
               dealing with the false value.  */
            //OR the negation marker into the index to flag the --no-FOO form
            longopt->val = i | BOOLEAN_NEG_MARKER;
            break;
          default:
            //other types: has_arg is taken from the option's argtype
            assert (opt->argtype != -1);
            longopt->has_arg = opt->argtype;
            if (opt->short_name)
              {
                if (longopt->has_arg == required_argument)
                  *p++ = ':';
                /* Don't handle optional_argument */
              }
          }                                                                                                                                                         
      }
    /* Terminate short_options. */
    *p = '\0';
    /* No need for xzero(long_options[o]) because its storage is static
       and it will be zeroed by default.  */
    assert (o <= countof (long_options));
  }
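The short-option part of init_switches can be isolated into a small sketch. The table below is hypothetical and stores has_arg directly instead of deriving it from the option type, but the rule is the one shown above: append the character, then append ':' when a value is required.

```c
#include <assert.h>
#include <getopt.h>
#include <stddef.h>
#include <string.h>

#define countof(a) (sizeof (a) / sizeof ((a)[0]))

/* Hypothetical, reduced option row. */
struct demo_option {
  const char *long_name;
  char short_name;     /* 0 when the option has no short form */
  int has_arg;         /* no_argument or required_argument */
};

static const struct demo_option demo_option_data[] = {
  { "append-output", 'a', required_argument },
  { "no-clobber",    0,   no_argument },
  { "quiet",         'q', no_argument },
  { "tries",         't', required_argument },
};

static char demo_short_options[32];

/* Build the optstring that getopt_long expects. */
static void
demo_init_switches (void)
{
  char *p = demo_short_options;
  size_t i;
  for (i = 0; i < countof (demo_option_data); i++)
    {
      const struct demo_option *o = &demo_option_data[i];
      if (!o->short_name)
        continue;                 /* long-only option */
      *p++ = o->short_name;
      if (o->has_arg == required_argument)
        *p++ = ':';               /* getopt syntax for "takes a value" */
    }
  *p = '\0';
}
```

For this table the resulting string is "a:qt:", which reads as: -a takes a value, -q does not, -t takes a value.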

Worked example (long option append-output, short option a):

Use gdb to inspect long_options and short_options.

An excerpt from long_options:


name(append-output) has_arg(1) val(2)

val == 2 is the index of this long option's attributes in option_data.


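The character 'a' has ASCII value 97, so its index in optmap is 97 - 32 = 65.

A short option can therefore be mapped to the index of its long option, and that long option's val is the index into option_data.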
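The optmap arithmetic can be sketched on its own. The array size of 96 (printable ASCII 32..127) is an assumption, and the long-option index 3 used in the usage note is purely illustrative, since the value seen in gdb is not reproduced in this article.

```c
#include <assert.h>

/* One slot per printable ASCII character, offset by 32 (assumed size). */
static unsigned char demo_optmap[96];

/* Record that SHORT_NAME corresponds to entry LONG_INDEX of long_options. */
static void
demo_map_short (char short_name, int long_index)
{
  demo_optmap[short_name - 32] = (unsigned char) long_index;
}

/* RET is what getopt_long returned for a short option. */
static int
demo_lookup_short (int ret)
{
  return demo_optmap[ret - 32];
}
```

After demo_map_short ('a', 3), the slot used is 'a' - 32 = 65 and demo_lookup_short ('a') gives back 3.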

2.2.3 Setting opt in main

 while ((ret = getopt_long (argc, argv,
                               short_options, long_options, &longindex)) != -1)
      {
        int val;
        struct cmdline_option *opt;
  
        /* If LONGINDEX is unchanged, it means RET is referring a short
           option.  */
        if (longindex == -1)
          {
            if (ret == '?')
              {
                print_usage (0);
                printf ("\n");
                printf (_("Try `%s --help' for more options.\n"), exec_name);
                exit (2);
              }
            /* Find the short option character in the mapping.  */
            longindex = optmap[ret - 32];                                                                                                                           
          }
        val = long_options[longindex].val;
  
        /* Use the retrieved value to locate the option in the
           option_data array, and to see if we're dealing with the
           negated "--no-FOO" variant of the boolean option "--foo".  */
        opt = &option_data[val & ~BOOLEAN_NEG_MARKER];

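The above is the part of main that processes argc and argv.

It calls the getopt_long API. If longindex == -1, the user typed a short option, and optmap[ret - 32] gives that option's index in the long-option array. The val of that long option then locates the option in option_data. If the user typed a long option, getopt_long has already set longindex, so val is read directly.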

val = long_options[longindex].val;

Get this option's entry in option_data:

opt = &option_data[val &~BOOLEAN_NEG_MARKER];
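The masking step packs two facts into one integer: the table index and whether the user typed the --no-FOO form. A minimal sketch follows; the marker value 1024 is an assumption, and any single bit larger than the number of table entries would behave the same way.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed marker bit; it must be larger than any valid table index. */
#define DEMO_NEG_MARKER 1024

/* Pack the option_data index and the negation flag into one value. */
static int
demo_encode (int index, bool negated)
{
  return negated ? (index | DEMO_NEG_MARKER) : index;
}

/* Recover the plain index, as in option_data[val & ~BOOLEAN_NEG_MARKER]. */
static int
demo_index (int val)
{
  return val & ~DEMO_NEG_MARKER;
}

/* Recover the flag, as in !!(val & BOOLEAN_NEG_MARKER). */
static bool
demo_is_negated (int val)
{
  return (val & DEMO_NEG_MARKER) != 0;
}
```

With index 7, the negated form is stored as 7 | 1024 = 1031, and demo_index returns 7 for both the plain and the negated value.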

 

With the option's position in option_data known, the code goes on to set the global opt.

The following options are analysed by type:

Option type                   Options
OPT_VALUE                     tries(t) timeout(T)
OPT_BOOLEAN                   no-clobber quiet(q) recursive(r)
OPT_FUNCALL                   help(h) version(V)
OPT__APPEND_OUTPUT            append-output(a)
OPT__EXECUTE                  execute(e)
OPT__NO                       no(n)
OPT__PARENT | OPT__CLOBBER    clobber

Code:

      switch (opt->type)                                                                                                                                          
         {
         case OPT_VALUE:
           setoptval (opt->data, optarg, opt->long_name);
           break;
         case OPT_BOOLEAN:
           if (optarg)
             /* The user has specified a value -- use it. */
             setoptval (opt->data, optarg, opt->long_name);
           else
             {
               /* NEG is true for `--no-FOO' style boolean options. */
               bool neg = !!(val & BOOLEAN_NEG_MARKER);
               setoptval (opt->data, neg ? "0" : "1", opt->long_name);
             }
           break;
         case OPT_FUNCALL:
           {
             void (*func) (void) = (void (*) (void)) opt->data;
             func ();
           }
           break;
         case OPT__APPEND_OUTPUT:
           setoptval ("logfile", optarg, opt->long_name);
           append_to_log = true;
           break;
         case OPT__EXECUTE:
           run_command (optarg);
           break;
         case OPT__NO:
           {
             /* We support real --no-FOO flags now, but keep these
                short options for convenience and backward
                compatibility.  */
             char *p;
             for (p = optarg; p && *p; p++)
               switch (*p)
                 {
                 case 'v':
                   setoptval ("verbose", "0", opt->long_name);
                   break;
                 case 'H':
                   setoptval ("addhostdir", "0", opt->long_name);
                   break;
                 case 'd':
                   setoptval ("dirstruct", "0", opt->long_name);
                   break;
                 case 'c':
                   setoptval ("noclobber", "1", opt->long_name);
                   break;
                 case 'p':
                   setoptval ("noparent", "1", opt->long_name);
                   break;
                 default:
                   fprintf (stderr, _("%s: illegal option -- `-n%c'\n"),
                            exec_name, *p);
                   print_usage (1);
                   fprintf (stderr, "\n");
                   fprintf (stderr, _("Try `%s --help' for more options.\n"),                                                                                      
                            exec_name);
                   exit (1);
                 }
             break;
           }
         case OPT__PARENT:
         case OPT__CLOBBER:
           {
             /* The wgetrc commands are named noparent and noclobber,
                so we must revert the meaning of the cmdline options
                before passing the value to setoptval.  */
             bool flag = true;
             if (optarg)
               flag = (*optarg == '1' || c_tolower (*optarg) == 'y'
                       || (c_tolower (optarg[0]) == 'o'
                           && c_tolower (optarg[1]) == 'n'));
             setoptval (opt->type == OPT__PARENT ? "noparent" : "noclobber",
                        flag ? "0" : "1", opt->long_name);
             break;
           }
         case OPT__DONT_REMOVE_LISTING:
           setoptval ("removelisting", "0", opt->long_name);
           break;
         }
 
       longindex = -1;
     }
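For readers who have not used getopt_long, here is a stripped-down loop of the same shape. It is a simplified sketch and not wget's approach: it lets getopt_long return a character code for each long option instead of going through optmap and a table index. The option set, the 'Q' code for --no-quiet and the demo_ names are all hypothetical.

```c
#include <assert.h>
#include <getopt.h>
#include <stdbool.h>
#include <stdlib.h>

struct demo_result { int tries; bool quiet; };

/* Parse ARGV into OUT.  Returns 0 on success, -1 on an unknown option. */
static int
demo_parse (int argc, char **argv, struct demo_result *out)
{
  static const struct option longopts[] = {
    { "tries",    required_argument, NULL, 't' },
    { "quiet",    no_argument,       NULL, 'q' },
    { "no-quiet", no_argument,       NULL, 'Q' },
    { NULL, 0, NULL, 0 }
  };
  int ret;

  optind = 1;                      /* start scanning at argv[1] */
  while ((ret = getopt_long (argc, argv, "t:q", longopts, NULL)) != -1)
    switch (ret)
      {
      case 't':
        out->tries = atoi (optarg);   /* value option */
        break;
      case 'q':
        out->quiet = true;            /* boolean option */
        break;
      case 'Q':
        out->quiet = false;           /* negated boolean */
        break;
      default:
        return -1;                    /* '?' from getopt_long */
      }
  return 0;
}
```

Options are applied in the order they appear, so for "-q --tries=3 --no-quiet" the later --no-quiet overrides the earlier -q.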

Option type OPT_VALUE (t, T)

setoptval (opt->data, optarg, opt->long_name)

         -> setval_internal (command_by_name (opt->data), "--" + opt->long_name, optarg)

command_by_name (opt->data) uses binary search to find the index of data in the commands table.

Code:

 static int
 command_by_name (const char *cmdname)
 {
   /* Use binary search for speed.  Wget has ~100 commands, which
      guarantees a worst case performance of 7 string comparisons.  */
   int lo = 0, hi = countof (commands) - 1;                                                                                                                        
 
   while (lo <= hi)
     {
       int mid = (lo + hi) >> 1;
       int cmp = strcasecmp (cmdname, commands[mid].name);
       if (cmp < 0)
         hi = mid - 1;
       else if (cmp > 0)
         lo = mid + 1;
       else
         return mid;
     }
   return -1;
 }
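The same search can be tried against a small table. The seven names below are a hypothetical subset chosen for illustration. The one hard requirement, which also applies to the real commands table, is that the names are kept in sorted order, otherwise binary search silently misses entries.

```c
#include <assert.h>
#include <stddef.h>
#include <strings.h>

#define countof(a) (sizeof (a) / sizeof ((a)[0]))

/* Hypothetical subset of command names; MUST stay sorted. */
static const char *const demo_commands[] = {
  "accept", "logfile", "noclobber", "quiet", "timeout", "tries", "verbose",
};

/* Case-insensitive binary search; returns the index or -1. */
static int
demo_command_by_name (const char *cmdname)
{
  int lo = 0, hi = (int) countof (demo_commands) - 1;

  while (lo <= hi)
    {
      int mid = (lo + hi) >> 1;
      int cmp = strcasecmp (cmdname, demo_commands[mid]);
      if (cmp < 0)
        hi = mid - 1;
      else if (cmp > 0)
        lo = mid + 1;
      else
        return mid;
    }
  return -1;
}
```

demo_command_by_name ("tries") returns 5, and because the comparison is strcasecmp, "TRIES" finds the same entry.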

setval_internal (comind, "--" + opt->long_name, optarg)

         -> commands[comind].action ("--" + opt->long_name, optarg, commands[comind].place);

For example, the commands entry for tries is:

{ "tries",            &opt.ntry,              cmd_number_inf },

This calls cmd_number_inf ("--tries", optarg, &opt.ntry).

In effect the function sets opt.ntry = strtoul (optarg, NULL, 10).
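The dispatch through the action pointer can be sketched as follows. This is not wget's cmd_number_inf: the parsing uses strtol, and treating "inf" as 0 (meaning unlimited) is an assumption based on the function's name. The demo_ identifiers are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <strings.h>

struct demo_options { int ntry; };
static struct demo_options demo_opt;

/* Action: store a non-negative number, or 0 for "inf" (assumed meaning). */
static bool
demo_cmd_number_inf (const char *com, const char *val, void *place)
{
  char *end;
  long n;

  (void) com;                       /* would be used in error messages */
  if (strcasecmp (val, "inf") == 0)
    {
      *(int *) place = 0;
      return true;
    }
  n = strtol (val, &end, 10);
  if (end == val || *end != '\0' || n < 0)
    return false;                   /* not a plain non-negative number */
  *(int *) place = (int) n;
  return true;
}

/* One row of the command table: name, target member, action. */
struct demo_command {
  const char *name;
  void *place;
  bool (*action) (const char *com, const char *val, void *place);
};

static const struct demo_command demo_tries =
  { "tries", &demo_opt.ntry, demo_cmd_number_inf };

/* Same shape as commands[comind].action (com, val, commands[comind].place). */
static bool
demo_setval (const struct demo_command *c, const char *com, const char *val)
{
  return c->action (com, val, c->place);
}
```

The caller never touches demo_opt.ntry directly; it only knows the row, and the row knows both where the value lives and how to parse it.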

Option type OPT_BOOLEAN

Much the same as OPT_VALUE, so it is skipped here.

Option type OPT_FUNCALL

-h and -V

The function stored in opt->data is called.

If the user passes -h or -V, print_help or print_version is called; the details are skipped here.

Option type OPT__APPEND_OUTPUT

setoptval ("logfile", optarg, opt->long_name); // similar to OPT_VALUE, skipped.

Option type OPT__EXECUTE

Option -e

run_command (optarg)

Here optarg has the form key=value, and this function extracts the key and the value, for example logfile=logfile.txt

It then calls setval_internal (comind, com, val) to set opt.
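The key=value split can be sketched in a few lines. This is an illustration of the idea only; whatever extra normalisation wget's own parser performs is left out, and demo_split_command is a hypothetical helper.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Split "key=value" into caller-provided buffers.
   Returns false if '=' is missing, the key is empty,
   or either part does not fit.  */
static bool
demo_split_command (const char *line,
                    char *key, size_t keysz,
                    char *val, size_t valsz)
{
  const char *eq = strchr (line, '=');
  size_t klen, vlen;

  if (!eq || eq == line)
    return false;
  klen = (size_t) (eq - line);
  vlen = strlen (eq + 1);
  if (klen >= keysz || vlen >= valsz)
    return false;

  memcpy (key, line, klen);
  key[klen] = '\0';
  memcpy (val, eq + 1, vlen + 1);   /* includes the terminating NUL */
  return true;
}
```

For "logfile=logfile.txt" the key is "logfile" and the value is "logfile.txt". The key would then go through the command-name lookup and the value to that command's action.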

The option types OPT__NO, OPT__PARENT, OPT__CLOBBER and OPT__DONT_REMOVE_LISTING all work in much the same way, so they are skipped here.
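One detail of the OPT__PARENT / OPT__CLOBBER case is easy to misread: the command-line option is positive (--clobber) while the stored command is negative (noclobber), so the value is inverted before it is stored. The sketch below lifts that test out of the switch shown earlier; demo_noclobber_value is a hypothetical name and the standard tolower replaces c_tolower.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Given the argument of --clobber (or NULL when none was given),
   return the string that would be stored for "noclobber".  */
static const char *
demo_noclobber_value (const char *arg)
{
  bool flag = true;                 /* bare --clobber means "on" */

  if (arg)
    flag = (*arg == '1'
            || tolower ((unsigned char) *arg) == 'y'
            || (tolower ((unsigned char) arg[0]) == 'o'
                && tolower ((unsigned char) arg[1]) == 'n'));

  /* Inverted on purpose: clobber on  -> noclobber "0",
                          clobber off -> noclobber "1".  */
  return flag ? "0" : "1";
}
```

So --clobber and --clobber=on both store "0", while --clobber=off stores "1".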

 

That concludes this article.

2014/4/12     Leek  in beijing


