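1. Install ant (search online for installation instructions)
The official Nutch 2.x releases are distributed as source only; prebuilt binaries are no longer provided, so you have to compile Nutch yourself.
2. Download nutch 2.2.1
Building nutch 2.3.1 kept hanging at the network check, so I switched to building version 2.2.1 instead.
Download URL: http://archive.apache.org/dist/nutch/2.2.1/
Extract it to a directory of your choice: tar -xzvf apache-nutch-2.2.1-src.tar.gz -C /usr/local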
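The extract step can be wrapped in a small helper. This is a minimal sketch, assuming the downloaded archive is gzip-compressed; extract_nutch is a hypothetical helper name, not something shipped with Nutch.

```shell
# Sketch: unpack a Nutch source tarball into a target directory.
# extract_nutch is a hypothetical helper, not part of Nutch.
extract_nutch() {
  local tarball="$1"   # path to the downloaded .tar.gz archive
  local dest="$2"      # directory to extract into, e.g. /usr/local
  mkdir -p "$dest" || return 1
  # -x extract, -z gunzip, -f archive file, -C switch to $dest before extracting
  tar -xzf "$tarball" -C "$dest"
}
```

For example: extract_nutch apache-nutch-2.2.1-src.tar.gz /usr/local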
3. Use MySQL as the nutch storage backend
Edit ${NUTCH_HOME}/ivy/ivy.xml and uncomment the following two dependencies:
<dependency org="mysql" name="mysql-connector-java" rev="5.1.18" conf="*->default"/>
<dependency org="org.apache.gora" name="gora-sql" rev="0.1.1-incubating" conf="*->default" />
Then change:
<dependency org="org.apache.gora" name="gora-core" rev="0.3" conf="*->default"/>
to:
<dependency org="org.apache.gora" name="gora-core" rev="0.2.1" conf="*->default"/>
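The gora-core revision change can also be scripted. The sketch below assumes the dependency sits on a single line of ivy.xml and is not commented out; set_gora_core_rev is a hypothetical helper name. A backup copy is kept next to the file with a .bak suffix.

```shell
# Sketch: set the rev attribute of the gora-core dependency in ivy.xml.
# set_gora_core_rev is a hypothetical helper, not part of Nutch or Ivy.
set_gora_core_rev() {
  local ivy_xml="$1"   # e.g. ${NUTCH_HOME}/ivy/ivy.xml
  local rev="$2"       # e.g. 0.2.1
  # Only lines mentioning name="gora-core" are touched; other dependencies stay as they are.
  sed -i.bak -E "/name=\"gora-core\"/ s/rev=\"[^\"]*\"/rev=\"${rev}\"/" "$ivy_xml"
}
```

For example: set_gora_core_rev "${NUTCH_HOME}/ivy/ivy.xml" 0.2.1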
4. Database connection settings
Edit ${NUTCH_HOME}/conf/gora.properties: comment out the default database connection settings and add the following (replace xxxx with your MySQL user name and password):
###############################
# Default MySQL properties    #
###############################
gora.sqlstore.jdbc.driver=com.mysql.jdbc.Driver
gora.sqlstore.jdbc.url=jdbc:mysql://localhost:3306/nutch?createDatabaseIfNotExist=true
gora.sqlstore.jdbc.user=xxxx
gora.sqlstore.jdbc.password=xxxx
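Before building, it can help to confirm that all four JDBC settings are active in gora.properties. The following is a rough sketch, assuming each setting starts at the beginning of its line; check_gora_props is a hypothetical helper name.

```shell
# Sketch: verify that the four gora.sqlstore.jdbc.* keys are set and not commented out.
# check_gora_props is a hypothetical helper, not part of Nutch or Gora.
check_gora_props() {
  local props="$1"   # e.g. ${NUTCH_HOME}/conf/gora.properties
  local key
  local missing=0
  for key in driver url user password; do
    # A commented-out line starts with '#', so it does not match the anchored pattern.
    if ! grep -q "^gora\.sqlstore\.jdbc\.${key}=" "$props"; then
      echo "missing: gora.sqlstore.jdbc.${key}" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

For example: check_gora_props "${NUTCH_HOME}/conf/gora.properties" && echo "gora.properties looks complete"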
5. Edit the ${NUTCH_HOME}/conf/nutch-site.xml configuration file
Replace the contents of nutch-site.xml with the following:
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>http.agent.name</name>
    <value>YourNutchSpider</value>
  </property>
  <property>
    <name>http.accept.language</name>
    <value>ja-jp, en-us,en-gb,en;q=0.7,*;q=0.3</value>
    <description>Value of the Accept-Language request header field. This allows selecting non-English language as default one to retrieve. It is a useful setting for search engines build for certain national group. </description>
  </property>
  <property>
    <name>storage.data.store.class</name>
    <value>org.apache.gora.sql.store.SqlStore</value>
    <description>The Gora DataStore class for storing and retrieving data. Currently the following stores are available:. </description>
  </property>
  <property>
    <name>parser.character.encoding.default</name>
    <value>utf-8</value>
    <description>The character encoding to fall back to when no other information is available</description>
  </property>
  <property>
    <name>generate.batch.id</name>
    <value>*</value>
  </property>
</configuration>
6. Build with ant
Change to the apache-nutch-2.2.1 root directory and run the ant command.
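The build step can be guarded with two quick checks so that a wrong directory or a missing ant install fails early. This is a minimal sketch; build_nutch is a hypothetical helper name and the return codes are arbitrary choices made for this example.

```shell
# Sketch: run ant from the Nutch root directory after basic sanity checks.
# build_nutch is a hypothetical helper, not part of Nutch.
build_nutch() {
  local home="$1"   # Nutch root directory, e.g. /usr/local/apache-nutch-2.2.1
  if [ ! -f "$home/build.xml" ]; then
    echo "no build.xml under $home" >&2
    return 1
  fi
  if ! command -v ant >/dev/null 2>&1; then
    echo "ant not found on PATH" >&2
    return 2
  fi
  # Run in a subshell so the caller's working directory is left unchanged.
  # With no arguments, ant runs the default target declared in build.xml.
  (cd "$home" && ant)
}
```

For example: build_nutch /usr/local/apache-nutch-2.2.1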
Problems encountered
- If the build reports:
Could not load definitions from resource org/sonar/ant/antlib.xml. It could not be found.
download sonar-ant-task-2.2.jar from http://repo2.maven.org/maven2/org/codehaus/sonar-plugins/sonar-ant-task/2.2/sonar-ant-task-2.2.jar
copy it into the ${NUTCH_HOME}/lib directory, and add the following to ${NUTCH_HOME}/build.xml:
<taskdef uri="antlib:org.sonar.ant" resource="org/sonar/ant/antlib.xml">
<classpath><fileset dir="./lib" includes="sonar*.jar" /></classpath>
</taskdef>
- Build ends with BUILD FAILED
If a dependency problem (the one above or any other) causes BUILD FAILED, changing the Maven central repository URL can fix it.
In ${NUTCH_HOME}/ivy/ivysettings.xml, find:
<property name="repo.maven.org" value="http://repo1.maven.org/maven2/" override="false"/>
and change value to another central repository URL:
http://repo2.maven.org/maven2/ (this one worked reliably)
http://repository.sonatype.org/content/groups/public/
http://central.maven.org/maven2/
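Switching the repository URL is a one-line change, so it can be scripted when trying several mirrors. The sketch below assumes the repo.maven.org property sits on a single line of ivysettings.xml; set_maven_repo is a hypothetical helper name. A .bak backup is written next to the file.

```shell
# Sketch: point the repo.maven.org property in ivysettings.xml at another repository URL.
# set_maven_repo is a hypothetical helper, not part of Nutch or Ivy.
set_maven_repo() {
  local settings="$1"   # e.g. ${NUTCH_HOME}/ivy/ivysettings.xml
  local url="$2"        # repository base URL to use
  # '|' is the sed delimiter because the URL itself contains '/'.
  sed -i.bak -E "/name=\"repo.maven.org\"/ s|value=\"[^\"]*\"|value=\"${url}\"|" "$settings"
}
```

For example: set_maven_repo "${NUTCH_HOME}/ivy/ivysettings.xml" http://repo2.maven.org/maven2/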
- Build hangs
If the output stays stuck at:
resolve-default:
[ivy:resolve] :: Apache Ivy 2.3.0 - 20130110142753 :: http://ant.apache.org/ivy/ ::
[ivy:resolve] :: loading settings :: file = /opt/apache-nutch-2.3.1/ivy/ivysettings.xml
wait patiently for two minutes. If there is still no progress, rerun ant, preferably on a fast and stable network connection.
- If the build reports:
You probably access the destination server through a proxy server that is not well configured.
rerun ant.
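Since several of these problems are fixed simply by rerunning ant, a small retry wrapper can save some typing. This is a generic sketch, not something from the Nutch tooling; retry is a hypothetical helper name and the attempt count is up to you.

```shell
# Sketch: rerun a command until it succeeds or the attempt limit is reached.
# retry is a hypothetical helper, not part of Nutch or Ant.
retry() {
  local attempts="$1"   # maximum number of tries
  shift                 # the remaining arguments are the command to run
  local i=1
  while [ "$i" -le "$attempts" ]; do
    if "$@"; then
      return 0
    fi
    echo "attempt $i of $attempts failed" >&2
    i=$((i + 1))
  done
  return 1
}
```

For example: retry 3 ant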
- If the output shows:
[ivy:resolve] WARN: ::::::::::::::::::::::::::::::::::::::::::::::
[ivy:resolve] WARN: :: UNRESOLVED DEPENDENCIES ::
[ivy:resolve] WARN: ::::::::::::::::::::::::::::::::::::::::::::::
[ivy:resolve] WARN: :: commons-httpclient#commons-httpclient;3.1: configuration not found in commons-httpclient#commons-httpclient;3.1: 'master'. It was required from org.apache.nutch#nutch;working@damao-TravelMate-P256-MG default
[ivy:resolve] WARN: :: log4j#log4j;1.2.15: configuration not found in log4j#log4j;1.2.15: 'master'. It was required from org.apache.nutch#nutch;working@damao-TravelMate-P256-MG default
[ivy:resolve] WARN: ::::::::::::::::::::::::::::::::::::::::::::::
[ivy:resolve] WARN: ::::::::::::::::::::::::::::::::::::::::::::::
[ivy:resolve] WARN: :: FAILED DOWNLOADS ::
[ivy:resolve] WARN: :: ^ see resolution messages for details ^ ::
[ivy:resolve] WARN: ::::::::::::::::::::::::::::::::::::::::::::::
[ivy:resolve] WARN: :: org.mortbay.jetty#jetty;6.1.26!jetty.zip
[ivy:resolve] WARN: ::::::::::::::::::::::::::::::::::::::::::::::
[ivy:resolve] report for org.apache.nutch#nutch;working@damao-TravelMate-P256-MG default produced in /root/.ivy2/cache/org.apache.nutch-nutch-default.xml
[ivy:resolve] resolve done (2940ms resolve - 4576ms download)
[ivy:resolve]
[ivy:resolve] :: problems summary ::
[ivy:resolve] :::: WARNINGS
[ivy:resolve] [FAILED ] org.mortbay.jetty#jetty;6.1.26!jetty.zip: (0ms)
[ivy:resolve] ==== local: tried
[ivy:resolve] /root/.ivy2/local/org.mortbay.jetty/jetty/6.1.26/zips/jetty.zip
[ivy:resolve] ==== maven2: tried
[ivy:resolve] http://central.maven.org/maven2/org/mortbay/jetty/jetty/6.1.26/jetty-6.1.26.zip
[ivy:resolve] ==== sonatype: tried
[ivy:resolve] http://oss.sonatype.org/content/repositories/releases/org/mortbay/jetty/jetty/6.1.26/jetty-6.1.26.zip
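For the FAILED DOWNLOADS case above, rerunning ant is usually enough. If the package really is missing from the Maven central repository, download it manually and place it at
/root/.ivy2/local/org.mortbay.jetty/jetty/6.1.26/zips/jetty.zip (the exact path is given in the "==== local: tried" part of the error output above).
For the UNRESOLVED DEPENDENCIES case above, first check whether the package is already in the local download cache, located at /root/.ivy2/cache or /home/<username>/.ivy2/cache.
If the package is in the cache, delete it and rerun ant.
If it is not in the cache, edit ${NUTCH_HOME}/ivy/ivy.xml. Searching for commons-httpclient shows that its conf attribute is set to master:
<dependency org="commons-httpclient" name="commons-httpclient" rev="3.1" conf="*->master" />
Change the conf attribute to default, that is, conf="*->default".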
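Both fixes for UNRESOLVED DEPENDENCIES can be sketched as small helpers. This assumes the default Ivy cache layout of <cache>/<organisation>/<module> and a dependency declared on a single line of ivy.xml; purge_ivy_module and use_default_conf are hypothetical helper names.

```shell
# Sketch: two helpers for the UNRESOLVED DEPENDENCIES case.
# Both names are hypothetical; neither is part of Nutch or Ivy.

# Remove one module from the Ivy cache so that the next ant run downloads it again.
purge_ivy_module() {
  local cache="$1"   # e.g. /root/.ivy2/cache
  local org="$2"     # e.g. commons-httpclient
  local name="$3"    # e.g. commons-httpclient
  # ${var:?} aborts if any argument is empty, so rm never runs on a bare cache path.
  rm -rf "${cache:?}/${org:?}/${name:?}"
}

# Change conf="*->master" to conf="*->default" for one named dependency in ivy.xml.
use_default_conf() {
  local ivy_xml="$1"   # e.g. ${NUTCH_HOME}/ivy/ivy.xml
  local name="$2"      # module name to match, e.g. commons-httpclient
  sed -i.bak -E "/name=\"${name}\"/ s/conf=\"\*->master\"/conf=\"*->default\"/" "$ivy_xml"
}
```

For example: use_default_conf "${NUTCH_HOME}/ivy/ivy.xml" commons-httpclient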
Reference article: http://blog.csdn.net/u010317005/article/details/51090175