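Installing Nutch 2.x on Ubuntu

1. Install Ant (installation instructions are easy to find online; on Ubuntu it is available as the ant package)

The official 2.x releases are currently distributed as source only. Precompiled builds are no longer provided, so you have to compile Nutch yourself.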

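2. Download Nutch 2.2.1

When I tried to compile Nutch 2.3.1, the build kept hanging at the network check, so I switched to compiling version 2.2.1 instead.

Download URL: http://archive.apache.org/dist/nutch/2.2.1/

Extract the archive into a directory of your choice, for example: tar -xvf apache-nutch-2.2.1-src.tar.gz -C /usr/local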

3. Use MySQL as the Nutch storage backend

Edit ${NUTCH_HOME}/ivy/ivy.xml and uncomment the following two dependencies:

 <dependency org="mysql" name="mysql-connector-java" rev="5.1.18" conf="*->default"/> 
<dependency org="org.apache.gora" name="gora-sql" rev="0.1.1-incubating" conf="*->default" />
Then downgrade gora-core, because gora-sql 0.1.1-incubating does not work with gora-core 0.3. Change:

<dependency org="org.apache.gora" name="gora-core" rev="0.3" conf="*->default"/>
to:
<dependency org="org.apache.gora" name="gora-core" rev="0.2.1" conf="*->default"/>
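
The gora-core downgrade can also be scripted. The following is a minimal sketch, assuming GNU sed (the default on Ubuntu). So that it can be tried safely, it works on a scratch file containing only the relevant line; IVY_XML is a hypothetical variable name that you would point at ${NUTCH_HOME}/ivy/ivy.xml for a real build. Uncommenting the two dependencies is left as a manual step, since it depends on how the comments are laid out in your copy of the file.

```shell
# Scratch copy for demonstration; for a real build set
# IVY_XML="${NUTCH_HOME}/ivy/ivy.xml" instead.
IVY_XML="$(mktemp)"
cat > "$IVY_XML" <<'EOF'
<dependency org="org.apache.gora" name="gora-core" rev="0.3" conf="*->default"/>
EOF

# Rewrite rev="0.3" to rev="0.2.1", but only on the gora-core line.
sed -i '/name="gora-core"/ s/rev="0\.3"/rev="0.2.1"/' "$IVY_XML"

# Print the line to confirm the edit.
grep 'name="gora-core"' "$IVY_XML"
```

Restricting the substitution to the line that names gora-core keeps other dependencies that happen to use rev="0.3" untouched.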

4. Configure the database connection

Edit ${NUTCH_HOME}/conf/gora.properties: comment out the default datastore connection settings and add the following:

###############################  
# Default MySQL properties    #  
###############################  
gora.sqlstore.jdbc.driver=com.mysql.jdbc.Driver  
gora.sqlstore.jdbc.url=jdbc:mysql://localhost:3306/nutch?createDatabaseIfNotExist=true  
gora.sqlstore.jdbc.user=xxxx
gora.sqlstore.jdbc.password=xxxx

Replace xxxx with your MySQL user name and password respectively.
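
Before building, it can help to confirm that gora.properties really contains all four MySQL settings. The following is a minimal sketch: check_gora_props is a hypothetical helper, not part of Nutch or Gora, and it only checks that the keys are present and not commented out, not that the values are valid.

```shell
# check_gora_props FILE
# Prints each missing gora.sqlstore.jdbc.* key and returns non-zero
# if any of the four keys is absent or commented out.
check_gora_props() {
  props_file=$1
  missing=0
  for key in driver url user password; do
    if ! grep -q "^gora\.sqlstore\.jdbc\.${key}=" "$props_file"; then
      echo "missing: gora.sqlstore.jdbc.${key}"
      missing=1
    fi
  done
  return "$missing"
}

# Usage against a real checkout (path as used in this article):
#   check_gora_props "${NUTCH_HOME}/conf/gora.properties"
```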

5. Edit the ${NUTCH_HOME}/conf/nutch-site.xml configuration file

Replace the contents of nutch-site.xml with the following:

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<configuration>

<property>  
	<name>http.agent.name</name>  
	<value>YourNutchSpider</value>  
</property>  


<property>  
	<name>http.accept.language</name>  
	<value>ja-jp, en-us,en-gb,en;q=0.7,*;q=0.3</value>  
	<description>Value of the Accept-Language request header field.  
		This allows selecting non-English language as default one to retrieve.  
		It is a useful setting for search engines build for certain national group.  
	</description>  
</property>

<property>  
	<name>storage.data.store.class</name>  
	<value>org.apache.gora.sql.store.SqlStore</value>  
	<description>The Gora DataStore class for storing and retrieving data.  
		Currently the following stores are available:.  
	</description>  
</property>
   
<property>  
	<name>parser.character.encoding.default</name>  
	<value>utf-8</value>  
	<description>The character encoding to fall back to when no other information  
	is available</description>  
</property>
  
<property>  
	<name>generate.batch.id</name>  
	<value>*</value>  
</property>  
</configuration>

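6. Build with Ant

Change to the apache-nutch-2.2.1 root directory and run the ant command.

Problems encountered

  • If the following appears during compilation: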

Could not load definitions from resource org/sonar/ant/antlib.xml. It could not be found. 

download sonar-ant-task-2.2.jar from http://repo2.maven.org/maven2/org/codehaus/sonar-plugins/sonar-ant-task/2.2/sonar-ant-task-2.2.jar
Copy it into the ${NUTCH_HOME}/lib directory, then edit ${NUTCH_HOME}/build.xml. Below the line

<taskdef uri="antlib:org.sonar.ant" resource="org/sonar/ant/antlib.xml">
add
<classpath><fileset dir="./lib" includes="sonar*.jar" /></classpath>
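
The build.xml change can be scripted as well. This is a minimal sketch assuming GNU sed. BUILD_XML is a hypothetical variable, and the scratch file below contains only a bare taskdef element so that the commands can be tried safely before pointing them at ${NUTCH_HOME}/build.xml.

```shell
# Scratch copy for demonstration; for a real build set
# BUILD_XML="${NUTCH_HOME}/build.xml" instead.
BUILD_XML="$(mktemp)"
cat > "$BUILD_XML" <<'EOF'
<taskdef uri="antlib:org.sonar.ant" resource="org/sonar/ant/antlib.xml">
</taskdef>
EOF

# Append the classpath line right after the opening taskdef tag,
# unless an earlier run has already added it.
if ! grep -q 'includes="sonar\*\.jar"' "$BUILD_XML"; then
  sed -i '/<taskdef uri="antlib:org.sonar.ant"/a\
<classpath><fileset dir="./lib" includes="sonar*.jar" /></classpath>' "$BUILD_XML"
fi

cat "$BUILD_XML"
```

The grep guard makes the script safe to run more than once, since a second run would otherwise insert a duplicate classpath element.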

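  • BUILD FAILED during compilation

If BUILD FAILED is caused by the problem above or by other dependency issues, pointing Ivy at a different Maven central repository URL can resolve it.

In ${NUTCH_HOME}/ivy/ivysettings.xml, find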

<property name="repo.maven.org"  
    value="http://repo1.maven.org/maven2/"  
    override="false"/>

and change value to one of the other central repository URLs:

http://repo2.maven.org/maven2/ (this one proved reliable)

http://repository.sonatype.org/content/groups/public/

http://central.maven.org/maven2/
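
Switching the repository URL is a one-line substitution. The sketch below assumes GNU sed and works on a scratch copy of the property element; IVY_SETTINGS and NEW_REPO are hypothetical variable names. Note that Maven Central has accepted only HTTPS connections since January 2020, so on a current network an https:// URL such as https://repo1.maven.org/maven2/ may be needed in place of the http:// mirrors listed here (an assumption beyond the list above).

```shell
# Scratch copy for demonstration; for a real build set
# IVY_SETTINGS="${NUTCH_HOME}/ivy/ivysettings.xml" instead.
IVY_SETTINGS="$(mktemp)"
cat > "$IVY_SETTINGS" <<'EOF'
<property name="repo.maven.org"
    value="http://repo1.maven.org/maven2/"
    override="false"/>
EOF

# Mirror to switch to: the one this article found reliable.
NEW_REPO="http://repo2.maven.org/maven2/"

# Replace only the default repo1 URL. '|' is used as the sed delimiter
# because the URLs themselves contain slashes.
sed -i "s|value=\"http://repo1\.maven\.org/maven2/\"|value=\"${NEW_REPO}\"|" "$IVY_SETTINGS"

# Print the value line to confirm the edit.
grep 'value=' "$IVY_SETTINGS"
```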

  • Build hangs

If the build stays stuck at the following output:

resolve-default:  
[ivy:resolve] :: Apache Ivy 2.3.0 - 20130110142753 :: http://ant.apache.org/ivy/ ::  
[ivy:resolve] :: loading settings :: file = /opt/apache-nutch-2.3.1/ivy/ivysettings.xml 

wait patiently for two minutes. If it still has not moved, run ant again, preferably on a fast and stable network connection.
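
Since several of these problems are cured by simply running ant again, a small retry wrapper can save some typing. This is a minimal sketch: retry and RETRY_DELAY are hypothetical names, not part of Ant or Nutch. A build that hangs never returns a failure, so for that case the wrapper is best combined with the coreutils timeout command, as in the usage comment; the 10-minute limit there is an arbitrary choice.

```shell
# retry MAX CMD [ARGS...]
# Runs CMD until it succeeds, at most MAX times, pausing between attempts.
# RETRY_DELAY (seconds, default 5) is a hypothetical knob for this sketch.
retry() {
  max_attempts=$1
  shift
  attempt=1
  while ! "$@"; do
    if [ "$attempt" -ge "$max_attempts" ]; then
      echo "giving up after $attempt attempts: $*" >&2
      return 1
    fi
    attempt=$((attempt + 1))
    echo "command failed, starting attempt $attempt of $max_attempts" >&2
    sleep "${RETRY_DELAY:-5}"
  done
}

# Usage from the Nutch root directory:
#   retry 3 ant
# To also abort and retry a hung build after 10 minutes:
#   retry 3 timeout 600 ant
```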

  • If the following appears during compilation:

You probably access the destination server through a proxy server that is not well configured.
run ant again.

If the following appears:

[ivy:resolve] WARN: 	::::::::::::::::::::::::::::::::::::::::::::::
[ivy:resolve] WARN: 	::          UNRESOLVED DEPENDENCIES         ::
[ivy:resolve] WARN: 	::::::::::::::::::::::::::::::::::::::::::::::
[ivy:resolve] WARN: 	:: commons-httpclient#commons-httpclient;3.1: configuration not found in commons-httpclient#commons-httpclient;3.1: 'master'. It was required from org.apache.nutch#nutch;working@damao-TravelMate-P256-MG default
[ivy:resolve] WARN: 	:: log4j#log4j;1.2.15: configuration not found in log4j#log4j;1.2.15: 'master'. It was required from org.apache.nutch#nutch;working@damao-TravelMate-P256-MG default
[ivy:resolve] WARN: 	::::::::::::::::::::::::::::::::::::::::::::::
[ivy:resolve] WARN: 	::::::::::::::::::::::::::::::::::::::::::::::
[ivy:resolve] WARN: 	::              FAILED DOWNLOADS            ::
[ivy:resolve] WARN: 	:: ^ see resolution messages for details  ^ ::
[ivy:resolve] WARN: 	::::::::::::::::::::::::::::::::::::::::::::::
[ivy:resolve] WARN: 	:: org.mortbay.jetty#jetty;6.1.26!jetty.zip
[ivy:resolve] WARN: 	::::::::::::::::::::::::::::::::::::::::::::::
[ivy:resolve] 	report for org.apache.nutch#nutch;working@damao-TravelMate-P256-MG default produced in /root/.ivy2/cache/org.apache.nutch-nutch-default.xml
[ivy:resolve] 	resolve done (2940ms resolve - 4576ms download)
[ivy:resolve] 
[ivy:resolve] :: problems summary ::
[ivy:resolve] :::: WARNINGS
[ivy:resolve] 		[FAILED     ] org.mortbay.jetty#jetty;6.1.26!jetty.zip:  (0ms)
[ivy:resolve] 	==== local: tried
[ivy:resolve] 	  /root/.ivy2/local/org.mortbay.jetty/jetty/6.1.26/zips/jetty.zip
[ivy:resolve] 	==== maven2: tried
[ivy:resolve] 	  http://central.maven.org/maven2/org/mortbay/jetty/jetty/6.1.26/jetty-6.1.26.zip
[ivy:resolve] 	==== sonatype: tried
[ivy:resolve] 	  http://oss.sonatype.org/content/repositories/releases/org/mortbay/jetty/jetty/6.1.26/jetty-6.1.26.zip

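If you see FAILED DOWNLOADS as above, simply run ant again.

If the Maven central repository really does not contain the package, download it manually and place it at
/root/.ivy2/local/org.mortbay.jetty/jetty/6.1.26/zips/jetty.zip (for the exact path, see the ==== local: tried part of the error output above).

If you see UNRESOLVED DEPENDENCIES as above, first check whether the package is already in the local Ivy cache, which is located under /root/.ivy2/cache or /home/<username>/.ivy2/cache.

If the package is already in the cache, delete it and run ant again.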
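
Deleting a cached package can be wrapped in a small helper. This is a sketch under two assumptions: the Ivy cache uses its usual <organisation>/<module> directory layout, and purge_ivy_module is a hypothetical function name, not an Ivy command. The ${...:?} guards make the function stop instead of deleting the whole cache if an argument is missing.

```shell
# Root of the Ivy cache. The article mentions /root/.ivy2/cache and
# /home/<username>/.ivy2/cache, which are both $HOME/.ivy2/cache.
IVY_CACHE="${IVY_CACHE:-$HOME/.ivy2/cache}"

# purge_ivy_module ORG MODULE
# Removes one cached module so that Ivy downloads it again.
purge_ivy_module() {
  module_dir="${IVY_CACHE:?IVY_CACHE is not set}/${1:?organisation required}/${2:?module required}"
  if [ -d "$module_dir" ]; then
    rm -rf "$module_dir"
    echo "removed $module_dir"
  else
    echo "not cached: $module_dir"
  fi
}

# Usage, followed by a fresh build:
#   purge_ivy_module commons-httpclient commons-httpclient
#   ant
```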

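If the package is not in the cache, edit ${NUTCH_HOME}/ivy/ivy.xml. Searching for commons-httpclient shows that the dependency maps to the master configuration in its conf attribute:

<dependency org="commons-httpclient" name="commons-httpclient"
      rev="3.1" conf="*->master" />
Change master to default, so that the attribute reads conf="*->default".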
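
Because the dependency element spans two lines, a plain one-line substitution keyed on the package name would miss the conf attribute. The sketch below, assuming GNU sed, joins the lines of that one element before editing it. IVY_XML is a hypothetical variable, and the scratch file holds sample lines only: the commons-httpclient element shown above plus a log4j line to show that other dependencies are left alone.

```shell
# Scratch copy for demonstration; for a real build set
# IVY_XML="${NUTCH_HOME}/ivy/ivy.xml" instead.
IVY_XML="$(mktemp)"
cat > "$IVY_XML" <<'EOF'
<dependency org="commons-httpclient" name="commons-httpclient"
      rev="3.1" conf="*->master" />
<dependency org="log4j" name="log4j" rev="1.2.15" conf="*->master" />
EOF

# Starting at the line that names commons-httpclient, keep appending
# lines (N) until the closing "/>" is in the pattern space, then
# rewrite the conf mapping for that element only.
sed -i '/name="commons-httpclient"/{
:join
/\/>/!{
N
b join
}
s/conf="\*->master"/conf="*->default"/
}' "$IVY_XML"

cat "$IVY_XML"
```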


Reference article: http://blog.csdn.net/u010317005/article/details/51090175



