Integrating Nutch 1.3 with Solr 3.4 in Eclipse: Log Output from a Run

Nutch 1.3 and Solr 3.4 were integrated and deployed successfully in Eclipse.


The run arguments used in Eclipse were:

crawl urls -solr http://localhost:8080/l-nutch-solr -depth 3 -topN 10
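To make the arguments easier to read, here is a minimal Python sketch (not Nutch code) that splits them into named options and lists the phases a crawl of that depth goes through. The phase names are copied from the log in this post. The function names `parse_crawl_args` and `planned_phases` are hypothetical, and the sketch assumes the leading `crawl` is the command name rather than an argument, since the log reports rootUrlDir = urls.

```python
def parse_crawl_args(argv):
    """Split the crawl arguments into the seed directory and named options."""
    # Assumption: a leading "crawl" token is the command name, not the seed dir.
    if argv and argv[0] == "crawl":
        argv = argv[1:]
    options = {"urls_dir": None, "solr": None, "depth": None, "topN": None}
    i = 0
    while i < len(argv):
        arg = argv[i]
        if arg in ("-solr", "-depth", "-topN"):
            key = arg.lstrip("-")
            value = argv[i + 1]
            # -solr takes a URL; -depth and -topN take integers.
            options[key] = value if key == "solr" else int(value)
            i += 2
        else:
            # The bare argument is the directory holding the seed URL list.
            options["urls_dir"] = arg
            i += 1
    return options


def planned_phases(depth):
    """List the phases in the order they appear in the log for a given depth."""
    phases = ["Injector"]
    for _ in range(depth):
        phases += ["Generator", "Fetcher", "ParseSegment", "CrawlDb update"]
    phases += ["LinkDb", "SolrIndexer", "SolrDeleteDuplicates"]
    return phases


args = "crawl urls -solr http://localhost:8080/l-nutch-solr -depth 3 -topN 10".split()
opts = parse_crawl_args(args)
print(opts)
for phase in planned_phases(opts["depth"]):
    print(phase)
```

With -depth 3, the generate/fetch/parse/update cycle repeats three times, which matches the three segments created in the log. -topN 10 caps each round at 10 URLs, which is why the QueueFeeder never reports more than 10 records.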


Log output during the run:

crawl started in: crawl-20111107123624
rootUrlDir = urls
threads = 10
depth = 3
solrUrl=http://localhost:8080/solr/
topN = 10
Injector: starting at 2011-11-07 12:36:25
Injector: crawlDb: crawl-20111107123624/crawldb
Injector: urlDir: urls
Injector: Converting injected urls to crawl db entries.
Injector: Merging injected urls into crawl db.
Injector: finished at 2011-11-07 12:36:30, elapsed: 00:00:05
Generator: starting at 2011-11-07 12:36:30
Generator: Selecting best-scoring urls due for fetch.
Generator: filtering: true
Generator: normalizing: true
Generator: topN: 10
Generator: jobtracker is 'local', generating exactly one partition.
Generator: Partitioning selected urls for politeness.
Generator: segment: crawl-20111107123624/segments/20111107123633
Generator: finished at 2011-11-07 12:36:35, elapsed: 00:00:04
Fetcher: Your 'http.agent.name' value should be listed first in 'http.robots.agents' property.
Fetcher: starting at 2011-11-07 12:36:35
Fetcher: segment: crawl-20111107123624/segments/20111107123633
Fetcher: threads: 10
QueueFeeder finished: total 1 records + hit by time limit :0
fetching http://www.amazon.cn/
-finishing thread FetcherThread, activeThreads=7
-finishing thread FetcherThread, activeThreads=7
-finishing thread FetcherThread, activeThreads=7
-finishing thread FetcherThread, activeThreads=6
-finishing thread FetcherThread, activeThreads=5
-finishing thread FetcherThread, activeThreads=4
-finishing thread FetcherThread, activeThreads=3
-finishing thread FetcherThread, activeThreads=1
-finishing thread FetcherThread, activeThreads=2
-activeThreads=1, spinWaiting=0, fetchQueues.totalSize=0
-finishing thread FetcherThread, activeThreads=0
-activeThreads=0, spinWaiting=0, fetchQueues.totalSize=0
-activeThreads=0
Fetcher: finished at 2011-11-07 12:36:39, elapsed: 00:00:04
ParseSegment: starting at 2011-11-07 12:36:39
ParseSegment: segment: crawl-20111107123624/segments/20111107123633
ParseSegment: finished at 2011-11-07 12:36:42, elapsed: 00:00:02
CrawlDb update: starting at 2011-11-07 12:36:42
CrawlDb update: db: crawl-20111107123624/crawldb
CrawlDb update: segments: [crawl-20111107123624/segments/20111107123633]
CrawlDb update: additions allowed: true
CrawlDb update: URL normalizing: true
CrawlDb update: URL filtering: true
CrawlDb update: Merging segment data into db.
CrawlDb update: finished at 2011-11-07 12:36:44, elapsed: 00:00:01
Generator: starting at 2011-11-07 12:36:44
Generator: Selecting best-scoring urls due for fetch.
Generator: filtering: true
Generator: normalizing: true
Generator: topN: 10
Generator: jobtracker is 'local', generating exactly one partition.
Generator: Partitioning selected urls for politeness.
Generator: segment: crawl-20111107123624/segments/20111107123646
Generator: finished at 2011-11-07 12:36:48, elapsed: 00:00:04
Fetcher: Your 'http.agent.name' value should be listed first in 'http.robots.agents' property.
Fetcher: starting at 2011-11-07 12:36:48
Fetcher: segment: crawl-20111107123624/segments/20111107123646
Fetcher: threads: 10
QueueFeeder finished: total 10 records + hit by time limit :0
fetching http://www.amazon.cn/%E4%B8%89%E6%98%9FS5838-3G%E6%89%8B%E6%9C%BA/dp/B005KP4AFG?_encoding=UTF8&s=electronics
-activeThreads=10, spinWaiting=10, fetchQueues.totalSize=9
-activeThreads=10, spinWaiting=10, fetchQueues.totalSize=9
-activeThreads=10, spinWaiting=10, fetchQueues.totalSize=9
-activeThreads=10, spinWaiting=10, fetchQueues.totalSize=9
-activeThreads=10, spinWaiting=10, fetchQueues.totalSize=9
fetching http://www.amazon.cn/Lenovo-%E8%81%94%E6%83%B3-Lepad-A1%E5%B9%B3%E6%9D%BF%E7%94%B5%E8%84%91/dp/B005OPL41A?_encoding=UTF8&s=electronics
-activeThreads=10, spinWaiting=10, fetchQueues.totalSize=8
-activeThreads=10, spinWaiting=10, fetchQueues.totalSize=8
-activeThreads=10, spinWaiting=10, fetchQueues.totalSize=8
-activeThreads=10, spinWaiting=10, fetchQueues.totalSize=8
-activeThreads=10, spinWaiting=10, fetchQueues.totalSize=8
fetching http://www.amazon.cn/b?ie=UTF8&node=79553071
-activeThreads=10, spinWaiting=9, fetchQueues.totalSize=7
-activeThreads=10, spinWaiting=9, fetchQueues.totalSize=7
-activeThreads=10, spinWaiting=9, fetchQueues.totalSize=7
-activeThreads=10, spinWaiting=10, fetchQueues.totalSize=7
-activeThreads=10, spinWaiting=10, fetchQueues.totalSize=7
-activeThreads=10, spinWaiting=10, fetchQueues.totalSize=7
-activeThreads=10, spinWaiting=10, fetchQueues.totalSize=7
-activeThreads=10, spinWaiting=10, fetchQueues.totalSize=7
fetching http://www.amazon.cn/%E5%B0%8F%E5%AE%B6%E7%94%B5/b?ie=UTF8&node=814224051
-activeThreads=10, spinWaiting=9, fetchQueues.totalSize=6
-activeThreads=10, spinWaiting=10, fetchQueues.totalSize=6
-activeThreads=10, spinWaiting=10, fetchQueues.totalSize=6
-activeThreads=10, spinWaiting=10, fetchQueues.totalSize=6
-activeThreads=10, spinWaiting=10, fetchQueues.totalSize=6
-activeThreads=10, spinWaiting=10, fetchQueues.totalSize=6
fetching http://www.amazon.cn/Lenovo-%E8%81%94%E6%83%B3-IdeaPad-Y470N-%E7%AC%94%E8%AE%B0%E6%9C%AC%E7%94%B5%E8%84%91/dp/B005LT2VIE?_encoding=UTF8&s=electronics
-activeThreads=10, spinWaiting=10, fetchQueues.totalSize=5
-activeThreads=10, spinWaiting=10, fetchQueues.totalSize=5
-activeThreads=10, spinWaiting=10, fetchQueues.totalSize=5
-activeThreads=10, spinWaiting=10, fetchQueues.totalSize=5
-activeThreads=10, spinWaiting=10, fetchQueues.totalSize=5
fetching http://www.amazon.cn/ThinkPad-E40-0579-A22-14-0%E8%8B%B1%E5%AF%B8%E7%AC%94%E8%AE%B0%E6%9C%AC%E7%94%B5%E8%84%91-%E9%80%81%E5%8E%9F%E8%A3%85%E5%8C%85/dp/B005LFRMVY?_encoding=UTF8&s=electronics
-activeThreads=10, spinWaiting=10, fetchQueues.totalSize=4
* queue: http://www.amazon.cn
  maxThreads    = 1
  inProgress    = 0
  crawlDelay    = 5000
  minCrawlDelay = 0
  nextFetchTime = 1320640644496
  now           = 1320640639907
  0. http://www.amazon.cn/%E5%A4%A7%E5%AE%B6%E7%94%B5/b?ie=UTF8&node=80207071
  1. http://www.amazon.cn/Lenovo-%E8%81%94%E6%83%B3-Lepad-A1%E5%B9%B3%E6%9D%BF%E7%94%B5%E8%84%91/dp/B005PSMVOK?_encoding=UTF8&s=electronics
  2. http://www.amazon.cn/Lenovo-%E8%81%94%E6%83%B3-Lepad-A1%E5%B9%B3%E6%9D%BF%E7%94%B5%E8%84%91/dp/B005PSMV54?_encoding=UTF8&s=electronics
  3. http://www.amazon.cn/%E7%94%B5%E8%A7%86-%E9%9F%B3%E5%93%8D/b?ie=UTF8&node=874259051
-activeThreads=10, spinWaiting=9, fetchQueues.totalSize=4
* queue: http://www.amazon.cn
  maxThreads    = 1
  inProgress    = 0
  crawlDelay    = 5000
  minCrawlDelay = 0
  nextFetchTime = 1320640644496
  now           = 1320640640909
  0. http://www.amazon.cn/%E5%A4%A7%E5%AE%B6%E7%94%B5/b?ie=UTF8&node=80207071
  1. http://www.amazon.cn/Lenovo-%E8%81%94%E6%83%B3-Lepad-A1%E5%B9%B3%E6%9D%BF%E7%94%B5%E8%84%91/dp/B005PSMVOK?_encoding=UTF8&s=electronics
  2. http://www.amazon.cn/Lenovo-%E8%81%94%E6%83%B3-Lepad-A1%E5%B9%B3%E6%9D%BF%E7%94%B5%E8%84%91/dp/B005PSMV54?_encoding=UTF8&s=electronics
  3. http://www.amazon.cn/%E7%94%B5%E8%A7%86-%E9%9F%B3%E5%93%8D/b?ie=UTF8&node=874259051
-activeThreads=10, spinWaiting=10, fetchQueues.totalSize=4
* queue: http://www.amazon.cn
  maxThreads    = 1
  inProgress    = 0
  crawlDelay    = 5000
  minCrawlDelay = 0
  nextFetchTime = 1320640644496
  now           = 1320640641910
  0. http://www.amazon.cn/%E5%A4%A7%E5%AE%B6%E7%94%B5/b?ie=UTF8&node=80207071
  1. http://www.amazon.cn/Lenovo-%E8%81%94%E6%83%B3-Lepad-A1%E5%B9%B3%E6%9D%BF%E7%94%B5%E8%84%91/dp/B005PSMVOK?_encoding=UTF8&s=electronics
  2. http://www.amazon.cn/Lenovo-%E8%81%94%E6%83%B3-Lepad-A1%E5%B9%B3%E6%9D%BF%E7%94%B5%E8%84%91/dp/B005PSMV54?_encoding=UTF8&s=electronics
  3. http://www.amazon.cn/%E7%94%B5%E8%A7%86-%E9%9F%B3%E5%93%8D/b?ie=UTF8&node=874259051
-activeThreads=10, spinWaiting=10, fetchQueues.totalSize=4
* queue: http://www.amazon.cn
  maxThreads    = 1
  inProgress    = 0
  crawlDelay    = 5000
  minCrawlDelay = 0
  nextFetchTime = 1320640644496
  now           = 1320640642911
  0. http://www.amazon.cn/%E5%A4%A7%E5%AE%B6%E7%94%B5/b?ie=UTF8&node=80207071
  1. http://www.amazon.cn/Lenovo-%E8%81%94%E6%83%B3-Lepad-A1%E5%B9%B3%E6%9D%BF%E7%94%B5%E8%84%91/dp/B005PSMVOK?_encoding=UTF8&s=electronics
  2. http://www.amazon.cn/Lenovo-%E8%81%94%E6%83%B3-Lepad-A1%E5%B9%B3%E6%9D%BF%E7%94%B5%E8%84%91/dp/B005PSMV54?_encoding=UTF8&s=electronics
  3. http://www.amazon.cn/%E7%94%B5%E8%A7%86-%E9%9F%B3%E5%93%8D/b?ie=UTF8&node=874259051
-activeThreads=10, spinWaiting=10, fetchQueues.totalSize=4
* queue: http://www.amazon.cn
  maxThreads    = 1
  inProgress    = 0
  crawlDelay    = 5000
  minCrawlDelay = 0
  nextFetchTime = 1320640644496
  now           = 1320640643912
  0. http://www.amazon.cn/%E5%A4%A7%E5%AE%B6%E7%94%B5/b?ie=UTF8&node=80207071
  1. http://www.amazon.cn/Lenovo-%E8%81%94%E6%83%B3-Lepad-A1%E5%B9%B3%E6%9D%BF%E7%94%B5%E8%84%91/dp/B005PSMVOK?_encoding=UTF8&s=electronics
  2. http://www.amazon.cn/Lenovo-%E8%81%94%E6%83%B3-Lepad-A1%E5%B9%B3%E6%9D%BF%E7%94%B5%E8%84%91/dp/B005PSMV54?_encoding=UTF8&s=electronics
  3. http://www.amazon.cn/%E7%94%B5%E8%A7%86-%E9%9F%B3%E5%93%8D/b?ie=UTF8&node=874259051
fetching http://www.amazon.cn/%E5%A4%A7%E5%AE%B6%E7%94%B5/b?ie=UTF8&node=80207071
-activeThreads=10, spinWaiting=9, fetchQueues.totalSize=3
* queue: http://www.amazon.cn
  maxThreads    = 1
  inProgress    = 1
  crawlDelay    = 5000
  minCrawlDelay = 0
  nextFetchTime = 1320640644496
  now           = 1320640644913
  0. http://www.amazon.cn/Lenovo-%E8%81%94%E6%83%B3-Lepad-A1%E5%B9%B3%E6%9D%BF%E7%94%B5%E8%84%91/dp/B005PSMVOK?_encoding=UTF8&s=electronics
  1. http://www.amazon.cn/Lenovo-%E8%81%94%E6%83%B3-Lepad-A1%E5%B9%B3%E6%9D%BF%E7%94%B5%E8%84%91/dp/B005PSMV54?_encoding=UTF8&s=electronics
  2. http://www.amazon.cn/%E7%94%B5%E8%A7%86-%E9%9F%B3%E5%93%8D/b?ie=UTF8&node=874259051
-activeThreads=10, spinWaiting=10, fetchQueues.totalSize=3
* queue: http://www.amazon.cn
  maxThreads    = 1
  inProgress    = 0
  crawlDelay    = 5000
  minCrawlDelay = 0
  nextFetchTime = 1320640650546
  now           = 1320640645914
  0. http://www.amazon.cn/Lenovo-%E8%81%94%E6%83%B3-Lepad-A1%E5%B9%B3%E6%9D%BF%E7%94%B5%E8%84%91/dp/B005PSMVOK?_encoding=UTF8&s=electronics
  1. http://www.amazon.cn/Lenovo-%E8%81%94%E6%83%B3-Lepad-A1%E5%B9%B3%E6%9D%BF%E7%94%B5%E8%84%91/dp/B005PSMV54?_encoding=UTF8&s=electronics
  2. http://www.amazon.cn/%E7%94%B5%E8%A7%86-%E9%9F%B3%E5%93%8D/b?ie=UTF8&node=874259051
-activeThreads=10, spinWaiting=10, fetchQueues.totalSize=3
* queue: http://www.amazon.cn
  maxThreads    = 1
  inProgress    = 0
  crawlDelay    = 5000
  minCrawlDelay = 0
  nextFetchTime = 1320640650546
  now           = 1320640646915
  0. http://www.amazon.cn/Lenovo-%E8%81%94%E6%83%B3-Lepad-A1%E5%B9%B3%E6%9D%BF%E7%94%B5%E8%84%91/dp/B005PSMVOK?_encoding=UTF8&s=electronics
  1. http://www.amazon.cn/Lenovo-%E8%81%94%E6%83%B3-Lepad-A1%E5%B9%B3%E6%9D%BF%E7%94%B5%E8%84%91/dp/B005PSMV54?_encoding=UTF8&s=electronics
  2. http://www.amazon.cn/%E7%94%B5%E8%A7%86-%E9%9F%B3%E5%93%8D/b?ie=UTF8&node=874259051
-activeThreads=10, spinWaiting=10, fetchQueues.totalSize=3
* queue: http://www.amazon.cn
  maxThreads    = 1
  inProgress    = 0
  crawlDelay    = 5000
  minCrawlDelay = 0
  nextFetchTime = 1320640650546
  now           = 1320640647916
  0. http://www.amazon.cn/Lenovo-%E8%81%94%E6%83%B3-Lepad-A1%E5%B9%B3%E6%9D%BF%E7%94%B5%E8%84%91/dp/B005PSMVOK?_encoding=UTF8&s=electronics
  1. http://www.amazon.cn/Lenovo-%E8%81%94%E6%83%B3-Lepad-A1%E5%B9%B3%E6%9D%BF%E7%94%B5%E8%84%91/dp/B005PSMV54?_encoding=UTF8&s=electronics
  2. http://www.amazon.cn/%E7%94%B5%E8%A7%86-%E9%9F%B3%E5%93%8D/b?ie=UTF8&node=874259051
-activeThreads=10, spinWaiting=10, fetchQueues.totalSize=3
* queue: http://www.amazon.cn
  maxThreads    = 1
  inProgress    = 0
  crawlDelay    = 5000
  minCrawlDelay = 0
  nextFetchTime = 1320640650546
  now           = 1320640648918
  0. http://www.amazon.cn/Lenovo-%E8%81%94%E6%83%B3-Lepad-A1%E5%B9%B3%E6%9D%BF%E7%94%B5%E8%84%91/dp/B005PSMVOK?_encoding=UTF8&s=electronics
  1. http://www.amazon.cn/Lenovo-%E8%81%94%E6%83%B3-Lepad-A1%E5%B9%B3%E6%9D%BF%E7%94%B5%E8%84%91/dp/B005PSMV54?_encoding=UTF8&s=electronics
  2. http://www.amazon.cn/%E7%94%B5%E8%A7%86-%E9%9F%B3%E5%93%8D/b?ie=UTF8&node=874259051
-activeThreads=10, spinWaiting=10, fetchQueues.totalSize=3
* queue: http://www.amazon.cn
  maxThreads    = 1
  inProgress    = 0
  crawlDelay    = 5000
  minCrawlDelay = 0
  nextFetchTime = 1320640650546
  now           = 1320640649919
  0. http://www.amazon.cn/Lenovo-%E8%81%94%E6%83%B3-Lepad-A1%E5%B9%B3%E6%9D%BF%E7%94%B5%E8%84%91/dp/B005PSMVOK?_encoding=UTF8&s=electronics
  1. http://www.amazon.cn/Lenovo-%E8%81%94%E6%83%B3-Lepad-A1%E5%B9%B3%E6%9D%BF%E7%94%B5%E8%84%91/dp/B005PSMV54?_encoding=UTF8&s=electronics
  2. http://www.amazon.cn/%E7%94%B5%E8%A7%86-%E9%9F%B3%E5%93%8D/b?ie=UTF8&node=874259051
fetching http://www.amazon.cn/Lenovo-%E8%81%94%E6%83%B3-Lepad-A1%E5%B9%B3%E6%9D%BF%E7%94%B5%E8%84%91/dp/B005PSMVOK?_encoding=UTF8&s=electronics
-activeThreads=10, spinWaiting=10, fetchQueues.totalSize=2
* queue: http://www.amazon.cn
  maxThreads    = 1
  inProgress    = 0
  crawlDelay    = 5000
  minCrawlDelay = 0
  nextFetchTime = 1320640655698
  now           = 1320640650919
  0. http://www.amazon.cn/Lenovo-%E8%81%94%E6%83%B3-Lepad-A1%E5%B9%B3%E6%9D%BF%E7%94%B5%E8%84%91/dp/B005PSMV54?_encoding=UTF8&s=electronics
  1. http://www.amazon.cn/%E7%94%B5%E8%A7%86-%E9%9F%B3%E5%93%8D/b?ie=UTF8&node=874259051
-activeThreads=10, spinWaiting=10, fetchQueues.totalSize=2
* queue: http://www.amazon.cn
  maxThreads    = 1
  inProgress    = 0
  crawlDelay    = 5000
  minCrawlDelay = 0
  nextFetchTime = 1320640655698
  now           = 1320640651921
  0. http://www.amazon.cn/Lenovo-%E8%81%94%E6%83%B3-Lepad-A1%E5%B9%B3%E6%9D%BF%E7%94%B5%E8%84%91/dp/B005PSMV54?_encoding=UTF8&s=electronics
  1. http://www.amazon.cn/%E7%94%B5%E8%A7%86-%E9%9F%B3%E5%93%8D/b?ie=UTF8&node=874259051
-activeThreads=10, spinWaiting=10, fetchQueues.totalSize=2
* queue: http://www.amazon.cn
  maxThreads    = 1
  inProgress    = 0
  crawlDelay    = 5000
  minCrawlDelay = 0
  nextFetchTime = 1320640655698
  now           = 1320640652923
  0. http://www.amazon.cn/Lenovo-%E8%81%94%E6%83%B3-Lepad-A1%E5%B9%B3%E6%9D%BF%E7%94%B5%E8%84%91/dp/B005PSMV54?_encoding=UTF8&s=electronics
  1. http://www.amazon.cn/%E7%94%B5%E8%A7%86-%E9%9F%B3%E5%93%8D/b?ie=UTF8&node=874259051
-activeThreads=10, spinWaiting=10, fetchQueues.totalSize=2
* queue: http://www.amazon.cn
  maxThreads    = 1
  inProgress    = 0
  crawlDelay    = 5000
  minCrawlDelay = 0
  nextFetchTime = 1320640655698
  now           = 1320640653924
  0. http://www.amazon.cn/Lenovo-%E8%81%94%E6%83%B3-Lepad-A1%E5%B9%B3%E6%9D%BF%E7%94%B5%E8%84%91/dp/B005PSMV54?_encoding=UTF8&s=electronics
  1. http://www.amazon.cn/%E7%94%B5%E8%A7%86-%E9%9F%B3%E5%93%8D/b?ie=UTF8&node=874259051
-activeThreads=10, spinWaiting=10, fetchQueues.totalSize=2
* queue: http://www.amazon.cn
  maxThreads    = 1
  inProgress    = 0
  crawlDelay    = 5000
  minCrawlDelay = 0
  nextFetchTime = 1320640655698
  now           = 1320640654925
  0. http://www.amazon.cn/Lenovo-%E8%81%94%E6%83%B3-Lepad-A1%E5%B9%B3%E6%9D%BF%E7%94%B5%E8%84%91/dp/B005PSMV54?_encoding=UTF8&s=electronics
  1. http://www.amazon.cn/%E7%94%B5%E8%A7%86-%E9%9F%B3%E5%93%8D/b?ie=UTF8&node=874259051
fetching http://www.amazon.cn/Lenovo-%E8%81%94%E6%83%B3-Lepad-A1%E5%B9%B3%E6%9D%BF%E7%94%B5%E8%84%91/dp/B005PSMV54?_encoding=UTF8&s=electronics
-activeThreads=10, spinWaiting=10, fetchQueues.totalSize=1
* queue: http://www.amazon.cn
  maxThreads    = 1
  inProgress    = 0
  crawlDelay    = 5000
  minCrawlDelay = 0
  nextFetchTime = 1320640660855
  now           = 1320640655926
  0. http://www.amazon.cn/%E7%94%B5%E8%A7%86-%E9%9F%B3%E5%93%8D/b?ie=UTF8&node=874259051
-activeThreads=10, spinWaiting=10, fetchQueues.totalSize=1
* queue: http://www.amazon.cn
  maxThreads    = 1
  inProgress    = 0
  crawlDelay    = 5000
  minCrawlDelay = 0
  nextFetchTime = 1320640660855
  now           = 1320640656927
  0. http://www.amazon.cn/%E7%94%B5%E8%A7%86-%E9%9F%B3%E5%93%8D/b?ie=UTF8&node=874259051
-activeThreads=10, spinWaiting=10, fetchQueues.totalSize=1
* queue: http://www.amazon.cn
  maxThreads    = 1
  inProgress    = 0
  crawlDelay    = 5000
  minCrawlDelay = 0
  nextFetchTime = 1320640660855
  now           = 1320640657928
  0. http://www.amazon.cn/%E7%94%B5%E8%A7%86-%E9%9F%B3%E5%93%8D/b?ie=UTF8&node=874259051
-activeThreads=10, spinWaiting=10, fetchQueues.totalSize=1
* queue: http://www.amazon.cn
  maxThreads    = 1
  inProgress    = 0
  crawlDelay    = 5000
  minCrawlDelay = 0
  nextFetchTime = 1320640660855
  now           = 1320640658929
  0. http://www.amazon.cn/%E7%94%B5%E8%A7%86-%E9%9F%B3%E5%93%8D/b?ie=UTF8&node=874259051
-activeThreads=10, spinWaiting=10, fetchQueues.totalSize=1
* queue: http://www.amazon.cn
  maxThreads    = 1
  inProgress    = 0
  crawlDelay    = 5000
  minCrawlDelay = 0
  nextFetchTime = 1320640660855
  now           = 1320640659930
  0. http://www.amazon.cn/%E7%94%B5%E8%A7%86-%E9%9F%B3%E5%93%8D/b?ie=UTF8&node=874259051
fetching http://www.amazon.cn/%E7%94%B5%E8%A7%86-%E9%9F%B3%E5%93%8D/b?ie=UTF8&node=874259051
-finishing thread FetcherThread, activeThreads=9
-finishing thread FetcherThread, activeThreads=8
-finishing thread FetcherThread, activeThreads=7
-finishing thread FetcherThread, activeThreads=6
-finishing thread FetcherThread, activeThreads=5
-finishing thread FetcherThread, activeThreads=4
-finishing thread FetcherThread, activeThreads=3
-finishing thread FetcherThread, activeThreads=2
-finishing thread FetcherThread, activeThreads=1
-activeThreads=1, spinWaiting=0, fetchQueues.totalSize=0
-activeThreads=1, spinWaiting=0, fetchQueues.totalSize=0
-finishing thread FetcherThread, activeThreads=0
-activeThreads=0, spinWaiting=0, fetchQueues.totalSize=0
-activeThreads=0
Fetcher: finished at 2011-11-07 12:37:43, elapsed: 00:00:55
ParseSegment: starting at 2011-11-07 12:37:43
ParseSegment: segment: crawl-20111107123624/segments/20111107123646
ParseSegment: finished at 2011-11-07 12:37:45, elapsed: 00:00:01
CrawlDb update: starting at 2011-11-07 12:37:45
CrawlDb update: db: crawl-20111107123624/crawldb
CrawlDb update: segments: [crawl-20111107123624/segments/20111107123646]
CrawlDb update: additions allowed: true
CrawlDb update: URL normalizing: true
CrawlDb update: URL filtering: true
CrawlDb update: Merging segment data into db.
CrawlDb update: finished at 2011-11-07 12:37:47, elapsed: 00:00:01
Generator: starting at 2011-11-07 12:37:47
Generator: Selecting best-scoring urls due for fetch.
Generator: filtering: true
Generator: normalizing: true
Generator: topN: 10
Generator: jobtracker is 'local', generating exactly one partition.
Generator: Partitioning selected urls for politeness.
Generator: segment: crawl-20111107123624/segments/20111107123749
Generator: finished at 2011-11-07 12:37:51, elapsed: 00:00:04
Fetcher: Your 'http.agent.name' value should be listed first in 'http.robots.agents' property.
Fetcher: starting at 2011-11-07 12:37:51
Fetcher: segment: crawl-20111107123624/segments/20111107123749
Fetcher: threads: 10
QueueFeeder finished: total 10 records + hit by time limit :0
fetching http://www.amazon.cn/%E8%81%94%E6%83%B3-P90W-WCDMA-%E6%95%B0%E5%AD%97%E7%A7%BB%E5%8A%A8%E7%94%B5%E8%AF%9D%E6%9C%BA-THINK%E9%BB%91/dp/B005GZ0I5G?_encoding=UTF8&s=electronics
fetching http://g-ec4.images-amazon.com/images/G/28/x-locale/common/transparent-pixel._V192562247_.gif
-activeThreads=10, spinWaiting=8, fetchQueues.totalSize=8
-activeThreads=10, spinWaiting=9, fetchQueues.totalSize=8
-activeThreads=10, spinWaiting=10, fetchQueues.totalSize=8
-activeThreads=10, spinWaiting=10, fetchQueues.totalSize=8
-activeThreads=10, spinWaiting=10, fetchQueues.totalSize=8
fetching http://www.amazon.cn/%E8%81%94%E6%83%B3-P90W-WCDMA-%E6%95%B0%E5%AD%97%E7%A7%BB%E5%8A%A8%E7%94%B5%E8%AF%9D%E6%9C%BA-%E7%84%89%E7%B2%89/dp/B005GZ0IC4?_encoding=UTF8&s=electronics
-activeThreads=10, spinWaiting=10, fetchQueues.totalSize=7
-activeThreads=10, spinWaiting=10, fetchQueues.totalSize=7
-activeThreads=10, spinWaiting=10, fetchQueues.totalSize=7
-activeThreads=10, spinWaiting=10, fetchQueues.totalSize=7
-activeThreads=10, spinWaiting=10, fetchQueues.totalSize=7
fetching http://www.amazon.cn/gp/yourstore/home
fetching http://www.amazon.cn/gp/css/homepage.html
fetching http://www.amazon.cn/%E6%89%8B%E8%A1%A8-%E6%97%B6%E9%92%9F/b?ie=UTF8&node=1953164051
-activeThreads=10, spinWaiting=8, fetchQueues.totalSize=4
* queue: http://www.amazon.cn
  maxThreads    = 1
  inProgress    = 1
  crawlDelay    = 5000
  minCrawlDelay = 0
  nextFetchTime = 1320640683363
  now           = 1320640684037
  0. http://www.amazon.cn/gp/registry/wishlist
  1. http://www.amazon.cn/Lenovo-%E8%81%94%E6%83%B3-A60-WCDMA-GSM-3G%E6%89%8B%E6%9C%BA/dp/B005GZ0IZG?_encoding=UTF8&s=electronics
  2. http://www.amazon.cn/%E8%AF%BA%E5%9F%BA%E4%BA%9AN9-%E5%85%A8%E8%A7%A6%E5%B1%8F3G%E6%99%BA%E8%83%BD%E6%89%8B%E6%9C%BA-%E5%85%A8%E6%96%B0MEEGO%E6%93%8D%E4%BD%9C%E7%B3%BB%E7%BB%9F/dp/B005MWLXU2?_encoding=UTF8&s=electronics
  3. http://www.amazon.cn/gp/help/customer/display.html
-activeThreads=10, spinWaiting=10, fetchQueues.totalSize=4
* queue: http://www.amazon.cn
  maxThreads    = 1
  inProgress    = 0
  crawlDelay    = 5000
  minCrawlDelay = 0
  nextFetchTime = 1320640689186
  now           = 1320640685037
  0. http://www.amazon.cn/gp/registry/wishlist
  1. http://www.amazon.cn/Lenovo-%E8%81%94%E6%83%B3-A60-WCDMA-GSM-3G%E6%89%8B%E6%9C%BA/dp/B005GZ0IZG?_encoding=UTF8&s=electronics
  2. http://www.amazon.cn/%E8%AF%BA%E5%9F%BA%E4%BA%9AN9-%E5%85%A8%E8%A7%A6%E5%B1%8F3G%E6%99%BA%E8%83%BD%E6%89%8B%E6%9C%BA-%E5%85%A8%E6%96%B0MEEGO%E6%93%8D%E4%BD%9C%E7%B3%BB%E7%BB%9F/dp/B005MWLXU2?_encoding=UTF8&s=electronics
  3. http://www.amazon.cn/gp/help/customer/display.html
-activeThreads=10, spinWaiting=10, fetchQueues.totalSize=4
* queue: http://www.amazon.cn
  maxThreads    = 1
  inProgress    = 0
  crawlDelay    = 5000
  minCrawlDelay = 0
  nextFetchTime = 1320640689186
  now           = 1320640686039
  0. http://www.amazon.cn/gp/registry/wishlist
  1. http://www.amazon.cn/Lenovo-%E8%81%94%E6%83%B3-A60-WCDMA-GSM-3G%E6%89%8B%E6%9C%BA/dp/B005GZ0IZG?_encoding=UTF8&s=electronics
  2. http://www.amazon.cn/%E8%AF%BA%E5%9F%BA%E4%BA%9AN9-%E5%85%A8%E8%A7%A6%E5%B1%8F3G%E6%99%BA%E8%83%BD%E6%89%8B%E6%9C%BA-%E5%85%A8%E6%96%B0MEEGO%E6%93%8D%E4%BD%9C%E7%B3%BB%E7%BB%9F/dp/B005MWLXU2?_encoding=UTF8&s=electronics
  3. http://www.amazon.cn/gp/help/customer/display.html
-activeThreads=10, spinWaiting=10, fetchQueues.totalSize=4
* queue: http://www.amazon.cn
  maxThreads    = 1
  inProgress    = 0
  crawlDelay    = 5000
  minCrawlDelay = 0
  nextFetchTime = 1320640689186
  now           = 1320640687043
  0. http://www.amazon.cn/gp/registry/wishlist
  1. http://www.amazon.cn/Lenovo-%E8%81%94%E6%83%B3-A60-WCDMA-GSM-3G%E6%89%8B%E6%9C%BA/dp/B005GZ0IZG?_encoding=UTF8&s=electronics
  2. http://www.amazon.cn/%E8%AF%BA%E5%9F%BA%E4%BA%9AN9-%E5%85%A8%E8%A7%A6%E5%B1%8F3G%E6%99%BA%E8%83%BD%E6%89%8B%E6%9C%BA-%E5%85%A8%E6%96%B0MEEGO%E6%93%8D%E4%BD%9C%E7%B3%BB%E7%BB%9F/dp/B005MWLXU2?_encoding=UTF8&s=electronics
  3. http://www.amazon.cn/gp/help/customer/display.html
-activeThreads=10, spinWaiting=10, fetchQueues.totalSize=4
* queue: http://www.amazon.cn
  maxThreads    = 1
  inProgress    = 0
  crawlDelay    = 5000
  minCrawlDelay = 0
  nextFetchTime = 1320640689186
  now           = 1320640688044
  0. http://www.amazon.cn/gp/registry/wishlist
  1. http://www.amazon.cn/Lenovo-%E8%81%94%E6%83%B3-A60-WCDMA-GSM-3G%E6%89%8B%E6%9C%BA/dp/B005GZ0IZG?_encoding=UTF8&s=electronics
  2. http://www.amazon.cn/%E8%AF%BA%E5%9F%BA%E4%BA%9AN9-%E5%85%A8%E8%A7%A6%E5%B1%8F3G%E6%99%BA%E8%83%BD%E6%89%8B%E6%9C%BA-%E5%85%A8%E6%96%B0MEEGO%E6%93%8D%E4%BD%9C%E7%B3%BB%E7%BB%9F/dp/B005MWLXU2?_encoding=UTF8&s=electronics
  3. http://www.amazon.cn/gp/help/customer/display.html
-activeThreads=10, spinWaiting=10, fetchQueues.totalSize=4
* queue: http://www.amazon.cn
  maxThreads    = 1
  inProgress    = 0
  crawlDelay    = 5000
  minCrawlDelay = 0
  nextFetchTime = 1320640689186
  now           = 1320640689045
  0. http://www.amazon.cn/gp/registry/wishlist
  1. http://www.amazon.cn/Lenovo-%E8%81%94%E6%83%B3-A60-WCDMA-GSM-3G%E6%89%8B%E6%9C%BA/dp/B005GZ0IZG?_encoding=UTF8&s=electronics
  2. http://www.amazon.cn/%E8%AF%BA%E5%9F%BA%E4%BA%9AN9-%E5%85%A8%E8%A7%A6%E5%B1%8F3G%E6%99%BA%E8%83%BD%E6%89%8B%E6%9C%BA-%E5%85%A8%E6%96%B0MEEGO%E6%93%8D%E4%BD%9C%E7%B3%BB%E7%BB%9F/dp/B005MWLXU2?_encoding=UTF8&s=electronics
  3. http://www.amazon.cn/gp/help/customer/display.html
fetching http://www.amazon.cn/gp/registry/wishlist
-activeThreads=10, spinWaiting=9, fetchQueues.totalSize=3
* queue: http://www.amazon.cn
  maxThreads    = 1
  inProgress    = 1
  crawlDelay    = 5000
  minCrawlDelay = 0
  nextFetchTime = 1320640689186
  now           = 1320640690047
  0. http://www.amazon.cn/Lenovo-%E8%81%94%E6%83%B3-A60-WCDMA-GSM-3G%E6%89%8B%E6%9C%BA/dp/B005GZ0IZG?_encoding=UTF8&s=electronics
  1. http://www.amazon.cn/%E8%AF%BA%E5%9F%BA%E4%BA%9AN9-%E5%85%A8%E8%A7%A6%E5%B1%8F3G%E6%99%BA%E8%83%BD%E6%89%8B%E6%9C%BA-%E5%85%A8%E6%96%B0MEEGO%E6%93%8D%E4%BD%9C%E7%B3%BB%E7%BB%9F/dp/B005MWLXU2?_encoding=UTF8&s=electronics
  2. http://www.amazon.cn/gp/help/customer/display.html
-activeThreads=10, spinWaiting=10, fetchQueues.totalSize=3
* queue: http://www.amazon.cn
  maxThreads    = 1
  inProgress    = 0
  crawlDelay    = 5000
  minCrawlDelay = 0
  nextFetchTime = 1320640695079
  now           = 1320640691048
  0. http://www.amazon.cn/Lenovo-%E8%81%94%E6%83%B3-A60-WCDMA-GSM-3G%E6%89%8B%E6%9C%BA/dp/B005GZ0IZG?_encoding=UTF8&s=electronics
  1. http://www.amazon.cn/%E8%AF%BA%E5%9F%BA%E4%BA%9AN9-%E5%85%A8%E8%A7%A6%E5%B1%8F3G%E6%99%BA%E8%83%BD%E6%89%8B%E6%9C%BA-%E5%85%A8%E6%96%B0MEEGO%E6%93%8D%E4%BD%9C%E7%B3%BB%E7%BB%9F/dp/B005MWLXU2?_encoding=UTF8&s=electronics
  2. http://www.amazon.cn/gp/help/customer/display.html
-activeThreads=10, spinWaiting=10, fetchQueues.totalSize=3
* queue: http://www.amazon.cn
  maxThreads    = 1
  inProgress    = 0
  crawlDelay    = 5000
  minCrawlDelay = 0
  nextFetchTime = 1320640695079
  now           = 1320640692049
  0. http://www.amazon.cn/Lenovo-%E8%81%94%E6%83%B3-A60-WCDMA-GSM-3G%E6%89%8B%E6%9C%BA/dp/B005GZ0IZG?_encoding=UTF8&s=electronics
  1. http://www.amazon.cn/%E8%AF%BA%E5%9F%BA%E4%BA%9AN9-%E5%85%A8%E8%A7%A6%E5%B1%8F3G%E6%99%BA%E8%83%BD%E6%89%8B%E6%9C%BA-%E5%85%A8%E6%96%B0MEEGO%E6%93%8D%E4%BD%9C%E7%B3%BB%E7%BB%9F/dp/B005MWLXU2?_encoding=UTF8&s=electronics
  2. http://www.amazon.cn/gp/help/customer/display.html
-activeThreads=10, spinWaiting=10, fetchQueues.totalSize=3
* queue: http://www.amazon.cn
  maxThreads    = 1
  inProgress    = 0
  crawlDelay    = 5000
  minCrawlDelay = 0
  nextFetchTime = 1320640695079
  now           = 1320640693049
  0. http://www.amazon.cn/Lenovo-%E8%81%94%E6%83%B3-A60-WCDMA-GSM-3G%E6%89%8B%E6%9C%BA/dp/B005GZ0IZG?_encoding=UTF8&s=electronics
  1. http://www.amazon.cn/%E8%AF%BA%E5%9F%BA%E4%BA%9AN9-%E5%85%A8%E8%A7%A6%E5%B1%8F3G%E6%99%BA%E8%83%BD%E6%89%8B%E6%9C%BA-%E5%85%A8%E6%96%B0MEEGO%E6%93%8D%E4%BD%9C%E7%B3%BB%E7%BB%9F/dp/B005MWLXU2?_encoding=UTF8&s=electronics
  2. http://www.amazon.cn/gp/help/customer/display.html
-activeThreads=10, spinWaiting=10, fetchQueues.totalSize=3
* queue: http://www.amazon.cn
  maxThreads    = 1
  inProgress    = 0
  crawlDelay    = 5000
  minCrawlDelay = 0
  nextFetchTime = 1320640695079
  now           = 1320640694051
  0. http://www.amazon.cn/Lenovo-%E8%81%94%E6%83%B3-A60-WCDMA-GSM-3G%E6%89%8B%E6%9C%BA/dp/B005GZ0IZG?_encoding=UTF8&s=electronics
  1. http://www.amazon.cn/%E8%AF%BA%E5%9F%BA%E4%BA%9AN9-%E5%85%A8%E8%A7%A6%E5%B1%8F3G%E6%99%BA%E8%83%BD%E6%89%8B%E6%9C%BA-%E5%85%A8%E6%96%B0MEEGO%E6%93%8D%E4%BD%9C%E7%B3%BB%E7%BB%9F/dp/B005MWLXU2?_encoding=UTF8&s=electronics
  2. http://www.amazon.cn/gp/help/customer/display.html
-activeThreads=10, spinWaiting=10, fetchQueues.totalSize=3
* queue: http://www.amazon.cn
  maxThreads    = 1
  inProgress    = 0
  crawlDelay    = 5000
  minCrawlDelay = 0
  nextFetchTime = 1320640695079
  now           = 1320640695053
  0. http://www.amazon.cn/Lenovo-%E8%81%94%E6%83%B3-A60-WCDMA-GSM-3G%E6%89%8B%E6%9C%BA/dp/B005GZ0IZG?_encoding=UTF8&s=electronics
  1. http://www.amazon.cn/%E8%AF%BA%E5%9F%BA%E4%BA%9AN9-%E5%85%A8%E8%A7%A6%E5%B1%8F3G%E6%99%BA%E8%83%BD%E6%89%8B%E6%9C%BA-%E5%85%A8%E6%96%B0MEEGO%E6%93%8D%E4%BD%9C%E7%B3%BB%E7%BB%9F/dp/B005MWLXU2?_encoding=UTF8&s=electronics
  2. http://www.amazon.cn/gp/help/customer/display.html
fetching http://www.amazon.cn/Lenovo-%E8%81%94%E6%83%B3-A60-WCDMA-GSM-3G%E6%89%8B%E6%9C%BA/dp/B005GZ0IZG?_encoding=UTF8&s=electronics
-activeThreads=10, spinWaiting=10, fetchQueues.totalSize=2
* queue: http://www.amazon.cn
  maxThreads    = 1
  inProgress    = 0
  crawlDelay    = 5000
  minCrawlDelay = 0
  nextFetchTime = 1320640700231
  now           = 1320640696053
  0. http://www.amazon.cn/%E8%AF%BA%E5%9F%BA%E4%BA%9AN9-%E5%85%A8%E8%A7%A6%E5%B1%8F3G%E6%99%BA%E8%83%BD%E6%89%8B%E6%9C%BA-%E5%85%A8%E6%96%B0MEEGO%E6%93%8D%E4%BD%9C%E7%B3%BB%E7%BB%9F/dp/B005MWLXU2?_encoding=UTF8&s=electronics
  1. http://www.amazon.cn/gp/help/customer/display.html
-activeThreads=10, spinWaiting=10, fetchQueues.totalSize=2
* queue: http://www.amazon.cn
  maxThreads    = 1
  inProgress    = 0
  crawlDelay    = 5000
  minCrawlDelay = 0
  nextFetchTime = 1320640700231
  now           = 1320640697054
  0. http://www.amazon.cn/%E8%AF%BA%E5%9F%BA%E4%BA%9AN9-%E5%85%A8%E8%A7%A6%E5%B1%8F3G%E6%99%BA%E8%83%BD%E6%89%8B%E6%9C%BA-%E5%85%A8%E6%96%B0MEEGO%E6%93%8D%E4%BD%9C%E7%B3%BB%E7%BB%9F/dp/B005MWLXU2?_encoding=UTF8&s=electronics
  1. http://www.amazon.cn/gp/help/customer/display.html
-activeThreads=10, spinWaiting=10, fetchQueues.totalSize=2
* queue: http://www.amazon.cn
  maxThreads    = 1
  inProgress    = 0
  crawlDelay    = 5000
  minCrawlDelay = 0
  nextFetchTime = 1320640700231
  now           = 1320640698056
  0. http://www.amazon.cn/%E8%AF%BA%E5%9F%BA%E4%BA%9AN9-%E5%85%A8%E8%A7%A6%E5%B1%8F3G%E6%99%BA%E8%83%BD%E6%89%8B%E6%9C%BA-%E5%85%A8%E6%96%B0MEEGO%E6%93%8D%E4%BD%9C%E7%B3%BB%E7%BB%9F/dp/B005MWLXU2?_encoding=UTF8&s=electronics
  1. http://www.amazon.cn/gp/help/customer/display.html
-activeThreads=10, spinWaiting=10, fetchQueues.totalSize=2
* queue: http://www.amazon.cn
  maxThreads    = 1
  inProgress    = 0
  crawlDelay    = 5000
  minCrawlDelay = 0
  nextFetchTime = 1320640700231
  now           = 1320640699057
  0. http://www.amazon.cn/%E8%AF%BA%E5%9F%BA%E4%BA%9AN9-%E5%85%A8%E8%A7%A6%E5%B1%8F3G%E6%99%BA%E8%83%BD%E6%89%8B%E6%9C%BA-%E5%85%A8%E6%96%B0MEEGO%E6%93%8D%E4%BD%9C%E7%B3%BB%E7%BB%9F/dp/B005MWLXU2?_encoding=UTF8&s=electronics
  1. http://www.amazon.cn/gp/help/customer/display.html
-activeThreads=10, spinWaiting=10, fetchQueues.totalSize=2
* queue: http://www.amazon.cn
  maxThreads    = 1
  inProgress    = 0
  crawlDelay    = 5000
  minCrawlDelay = 0
  nextFetchTime = 1320640700231
  now           = 1320640700058
  0. http://www.amazon.cn/%E8%AF%BA%E5%9F%BA%E4%BA%9AN9-%E5%85%A8%E8%A7%A6%E5%B1%8F3G%E6%99%BA%E8%83%BD%E6%89%8B%E6%9C%BA-%E5%85%A8%E6%96%B0MEEGO%E6%93%8D%E4%BD%9C%E7%B3%BB%E7%BB%9F/dp/B005MWLXU2?_encoding=UTF8&s=electronics
  1. http://www.amazon.cn/gp/help/customer/display.html
fetching http://www.amazon.cn/%E8%AF%BA%E5%9F%BA%E4%BA%9AN9-%E5%85%A8%E8%A7%A6%E5%B1%8F3G%E6%99%BA%E8%83%BD%E6%89%8B%E6%9C%BA-%E5%85%A8%E6%96%B0MEEGO%E6%93%8D%E4%BD%9C%E7%B3%BB%E7%BB%9F/dp/B005MWLXU2?_encoding=UTF8&s=electronics
-activeThreads=10, spinWaiting=10, fetchQueues.totalSize=1
* queue: http://www.amazon.cn
  maxThreads    = 1
  inProgress    = 0
  crawlDelay    = 5000
  minCrawlDelay = 0
  nextFetchTime = 1320640705384
  now           = 1320640701058
  0. http://www.amazon.cn/gp/help/customer/display.html
-activeThreads=10, spinWaiting=10, fetchQueues.totalSize=1
* queue: http://www.amazon.cn
  maxThreads    = 1
  inProgress    = 0
  crawlDelay    = 5000
  minCrawlDelay = 0
  nextFetchTime = 1320640705384
  now           = 1320640702060
  0. http://www.amazon.cn/gp/help/customer/display.html
-activeThreads=10, spinWaiting=10, fetchQueues.totalSize=1
* queue: http://www.amazon.cn
  maxThreads    = 1
  inProgress    = 0
  crawlDelay    = 5000
  minCrawlDelay = 0
  nextFetchTime = 1320640705384
  now           = 1320640703060
  0. http://www.amazon.cn/gp/help/customer/display.html
-activeThreads=10, spinWaiting=10, fetchQueues.totalSize=1
* queue: http://www.amazon.cn
  maxThreads    = 1
  inProgress    = 0
  crawlDelay    = 5000
  minCrawlDelay = 0
  nextFetchTime = 1320640705384
  now           = 1320640704061
  0. http://www.amazon.cn/gp/help/customer/display.html
-activeThreads=10, spinWaiting=10, fetchQueues.totalSize=1
* queue: http://www.amazon.cn
  maxThreads    = 1
  inProgress    = 0
  crawlDelay    = 5000
  minCrawlDelay = 0
  nextFetchTime = 1320640705384
  now           = 1320640705063
  0. http://www.amazon.cn/gp/help/customer/display.html
fetching http://www.amazon.cn/gp/help/customer/display.html
-finishing thread FetcherThread, activeThreads=8
-finishing thread FetcherThread, activeThreads=8
-finishing thread FetcherThread, activeThreads=7
-finishing thread FetcherThread, activeThreads=6
-finishing thread FetcherThread, activeThreads=5
-finishing thread FetcherThread, activeThreads=4
-finishing thread FetcherThread, activeThreads=3
-finishing thread FetcherThread, activeThreads=2
-finishing thread FetcherThread, activeThreads=1
-finishing thread FetcherThread, activeThreads=0
-activeThreads=0, spinWaiting=0, fetchQueues.totalSize=0
-activeThreads=0
Fetcher: finished at 2011-11-07 12:38:26, elapsed: 00:00:35
ParseSegment: starting at 2011-11-07 12:38:26
ParseSegment: segment: crawl-20111107123624/segments/20111107123749
Error parsing: http://g-ec4.images-amazon.com/images/G/28/x-locale/common/transparent-pixel._V192562247_.gif: failed(2,0): Can't retrieve Tika parser for mime-type image/gif
ParseSegment: finished at 2011-11-07 12:38:28, elapsed: 00:00:01
CrawlDb update: starting at 2011-11-07 12:38:28
CrawlDb update: db: crawl-20111107123624/crawldb
CrawlDb update: segments: [crawl-20111107123624/segments/20111107123749]
CrawlDb update: additions allowed: true
CrawlDb update: URL normalizing: true
CrawlDb update: URL filtering: true
CrawlDb update: Merging segment data into db.
CrawlDb update: finished at 2011-11-07 12:38:30, elapsed: 00:00:01
LinkDb: starting at 2011-11-07 12:38:30
LinkDb: linkdb: crawl-20111107123624/linkdb
LinkDb: URL normalize: true
LinkDb: URL filter: true
LinkDb: adding segment: file:/E:/Workspaces/workspace1/L-nutch/crawl-20111107123624/segments/20111107123633
LinkDb: adding segment: file:/E:/Workspaces/workspace1/L-nutch/crawl-20111107123624/segments/20111107123646
LinkDb: adding segment: file:/E:/Workspaces/workspace1/L-nutch/crawl-20111107123624/segments/20111107123749
LinkDb: finished at 2011-11-07 12:38:32, elapsed: 00:00:01
SolrIndexer: starting at 2011-11-07 12:38:32
SolrIndexer: finished at 2011-11-07 12:38:37, elapsed: 00:00:05
SolrDeleteDuplicates: starting at 2011-11-07 12:38:37
SolrDeleteDuplicates: Solr url: http://localhost:8080/solr/
SolrDeleteDuplicates: finished at 2011-11-07 12:38:39, elapsed: 00:00:01
crawl finished: crawl-20111107123624
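The repeated "* queue" dumps in the log explain why the second and third fetch rounds take much longer than the first: the amazon.cn queue has maxThreads = 1 and crawlDelay = 5000, so the other threads spin while waiting. The following Python sketch reads the numeric fields of one dump and computes how long the queue still has to wait. It is only an illustration: `parse_queue_dump` and `remaining_wait_ms` are hypothetical helpers, and it assumes nextFetchTime and now are timestamps in milliseconds, which is what the values appear to be.

```python
def parse_queue_dump(lines):
    """Collect the 'name = number' fields from one '* queue' dump."""
    fields = {}
    for line in lines:
        key, sep, value = line.partition("=")
        # Queued URL lines also contain '=', so keep only purely numeric values.
        if sep and value.strip().isdigit():
            fields[key.strip()] = int(value.strip())
    return fields


def remaining_wait_ms(fields):
    """Milliseconds until the queue may be fetched again (0 if already due)."""
    return max(0, fields["nextFetchTime"] - fields["now"])


# Lines copied from the first queue dump in the log above.
dump = [
    "* queue: http://www.amazon.cn",
    "  maxThreads    = 1",
    "  inProgress    = 0",
    "  crawlDelay    = 5000",
    "  minCrawlDelay = 0",
    "  nextFetchTime = 1320640644496",
    "  now           = 1320640639907",
    "  0. http://www.amazon.cn/%E5%A4%A7%E5%AE%B6%E7%94%B5/b?ie=UTF8&node=80207071",
]

fields = parse_queue_dump(dump)
print(remaining_wait_ms(fields))  # 4589
```

In that dump the queue still has about 4.6 seconds to wait, so all ten threads report spinWaiting. Once now passes nextFetchTime, the next "fetching" line appears and nextFetchTime moves forward by roughly another 5 seconds.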

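Crawl data model

1. CrawlDB: stores information about every URL, including its fetch schedule, fetch status, page signature (fingerprint), and metadata.

2. LinkDB: stores, for each URL, its incoming anchor links and their anchor text.

3. Segment: holds the raw page content, the parsed pages, metadata, outlinks, and the text used for indexing.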
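These three stores correspond to the paths that appear in the log: crawldb, linkdb, and one directory per fetch round under segments. The Python sketch below only rebuilds those paths from the crawl directory name. `crawl_layout` and `segment_time` are hypothetical helpers, and reading the segment name as a yyyyMMddHHmmss timestamp is an assumption based on how the names line up with the log times.

```python
from datetime import datetime
from posixpath import join


def crawl_layout(crawl_dir, segment_names):
    """Return the crawldb, linkdb, and segment paths for one crawl directory."""
    return {
        "crawldb": join(crawl_dir, "crawldb"),
        "linkdb": join(crawl_dir, "linkdb"),
        "segments": [join(crawl_dir, "segments", name) for name in segment_names],
    }


def segment_time(segment_name):
    """Interpret a segment name as a timestamp (assumed format: yyyyMMddHHmmss)."""
    return datetime.strptime(segment_name, "%Y%m%d%H%M%S")


# Names copied from the log: one crawl directory and its three segments.
layout = crawl_layout(
    "crawl-20111107123624",
    ["20111107123633", "20111107123646", "20111107123749"],
)
print(layout["crawldb"])
print(layout["linkdb"])
for path in layout["segments"]:
    print(path)
print(segment_time("20111107123633"))
```

Each generate/fetch/parse round writes its own segment, while the CrawlDB and LinkDB are shared across rounds and updated from those segments.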




Reference: http://blog.csdn.net/amuseme_lu/article/details/5993916
