Building Hadoop from the append branch source

Hadoop version            HBase version   Compatible?
0.20.2 release            0.90.2          NO
0.20-append               0.90.2          YES
0.21.0 release            0.90.2          NO
0.22.x (in development)   0.90.2          NO

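As the table above shows, HBase 0.90.2 is not compatible with the stock Hadoop 0.20.2 release. It will run, but in a production environment it can lose data.

For example, the HBase web UI shows the following warning: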

You are currently running the HMaster without HDFS append support enabled. This may result in data loss. Please see the HBase wiki for details.

As of today, Hadoop 0.20.2 is the latest stable release of Apache Hadoop that is marked as ready for production (neither 0.21 nor 0.22 are).

Unfortunately, Hadoop 0.20.2 release is not compatible with the latest stable version of HBase: if you run HBase on top of Hadoop 0.20.2, you risk to lose data! Hence HBase users are required to build their own Hadoop 0.20.x version if they want to run HBase on a production cluster of Hadoop. In this article, I describe how to build such a production-ready version of Hadoop 0.20.x that is compatible with HBase 0.90.2.

The official HBase book for 0.90.2 says the same:

This version of HBase will only run on Hadoop 0.20.x. It will not run on hadoop 0.21.x (nor 0.22.x). HBase will lose data unless it is running on an HDFS that has a durable sync. Currently only the branch-0.20-append branch has this attribute [1]. No official releases have been made from this branch up to now so you will have to build your own Hadoop from the tip of this branch. Check it out using this url, branch-0.20-append. Scroll down in the Hadoop How To Release to the section Build Requirements for instruction on how to build Hadoop.

Or rather than build your own, you could use Cloudera's CDH3. CDH has the 0.20-append patches needed to add a durable sync (CDH3 betas will suffice; b2, b3, or b4).


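This article therefore describes how to build Hadoop's append branch and integrate the result into a stock Hadoop release installation.

First, install git (a version control tool similar to svn).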

$ apt-get install git

Use git to fetch the source code and create a local repository. The download takes quite a while.

$ git clone git://git.apache.org/hadoop-common.git

Enter the repository:
$ cd hadoop-common

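The freshly cloned working tree shows only the latest Hadoop trunk code. In fact git has already fetched every branch, so you need to switch to the append branch by hand: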

$ git checkout -t origin/branch-0.20-append
Branch branch-0.20-append set up to track remote branch branch-0.20-append from origin.
Switched to a new branch 'branch-0.20-append'

This switches the working tree to the append branch.
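
To check which branch is currently checked out without running another checkout, you can ask git for the name HEAD points to. This is a minimal sketch: the helper name `current_branch` and the example path are my own choices for illustration, not part of git or Hadoop.

```shell
# Print the name of the branch that HEAD points to in the given repository.
# current_branch is a hypothetical helper name used only for illustration.
current_branch() {
    git -C "$1" symbolic-ref --short HEAD
}

# Usage (the path is an assumption; use wherever you cloned hadoop-common):
# current_branch "$HOME/hadoop-common"
```

After the checkout above, the command should report branch-0.20-append.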

With the branch checked out, we can prepare the build.

First, create build.properties in the hadoop-common directory with the following content:

resolvers=internal
# version: any version string you choose (this example stands for Taotaosou distributed file system)
version=0.20-taotaosou-dfs
project.version=${version}
hadoop.version=${version}
hadoop-core.version=${version}
hadoop-hdfs.version=${version}
hadoop-mapred.version=${version}
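
The file above can also be generated from the shell, which helps when you rebuild with a different version string. The following is a sketch only: `write_build_properties` is a hypothetical helper name, and the directory and version passed to it are up to you.

```shell
# Write build.properties into a Hadoop checkout.
# Arguments: $1 = checkout directory, $2 = version string of your choice.
# write_build_properties is a hypothetical helper name.
write_build_properties() {
    {
        echo "resolvers=internal"
        echo "version=$2"
        # The heredoc delimiter is quoted so that ${version} is written
        # literally and left for Ant to expand at build time.
        cat <<'EOF'
project.version=${version}
hadoop.version=${version}
hadoop-core.version=${version}
hadoop-hdfs.version=${version}
hadoop-mapred.version=${version}
EOF
    } > "$1/build.properties"
}

# Usage, run from inside the hadoop-common checkout:
# write_build_properties . 0.20-taotaosou-dfs
```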

Still in the hadoop-common directory, confirm one last time that the append branch is checked out:

$ git checkout branch-0.20-append

The contents of the directory now look completely different from trunk, because the working tree is on the append branch.

Next comes the build. Install ant first:

$ apt-get install ant

Start the build. It takes a while to complete (about 4 minutes):

$ ant mvn-install

Note: if you need to run this command again, clear the generated files first:

rm -rf $HOME/.m2/repository 

Then, in the hadoop-common directory, run:

$ ant clean-cache

Once the build finishes, you can move on to testing:

# Optional: run the full test suite or just the core test suite
$ ant test
$ ant test-core

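The first command runs the full test suite; the second runs only the core tests.

ant test takes a very long time: about 10 hours on a non-server machine.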


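Where do the built jars end up? They are installed into the local Maven repository. (The listing below comes from a build whose version was set to 0.20-append-for-hbase; your file names will carry the version you set in build.properties.)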

$ find $HOME/.m2/repository -name "hadoop-*.jar"

.../repository/org/apache/hadoop/hadoop-examples/0.20-append-for-hbase/hadoop-examples-0.20-append-for-hbase.jar
.../repository/org/apache/hadoop/hadoop-test/0.20-append-for-hbase/hadoop-test-0.20-append-for-hbase.jar
.../repository/org/apache/hadoop/hadoop-tools/0.20-append-for-hbase/hadoop-tools-0.20-append-for-hbase.jar
.../repository/org/apache/hadoop/hadoop-streaming/0.20-append-for-hbase/hadoop-streaming-0.20-append-for-hbase.jar
.../repository/org/apache/hadoop/hadoop-core/0.20-append-for-hbase/hadoop-core-0.20-append-for-hbase.jar

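Next, replace the old jars with the new ones (this assumes you already have a Hadoop 0.20.2 release installation set up):

1. Replace the old Hadoop jars.

2. Replace the Hadoop jar in HBase's lib directory.

Note that the jars must be renamed when you replace them.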

The Hadoop 0.20.2 release names its jars hadoop-VERSION-PACKAGE.jar, for example hadoop-0.20.2-examples.jar.

The newly built jars are named hadoop-PACKAGE-VERSION.jar, for example hadoop-examples-0.20-append-for-hbase.jar.

So rename them as follows:

hadoop-examples-0.20-append-for-hbase.jar  --> hadoop-0.20-append-for-hbase-examples.jar
hadoop-test-0.20-append-for-hbase.jar      --> hadoop-0.20-append-for-hbase-test.jar
hadoop-tools-0.20-append-for-hbase.jar     --> hadoop-0.20-append-for-hbase-tools.jar
hadoop-streaming-0.20-append-for-hbase.jar --> hadoop-0.20-append-for-hbase-streaming.jar
hadoop-core-0.20-append-for-hbase.jar      --> hadoop-0.20-append-for-hbase-core.jar
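
The renaming above is mechanical, so it can be scripted. The sketch below makes a few assumptions: the helper names `release_style_name` and `copy_renamed_jars` are hypothetical, the jars are copied rather than moved so the Maven repository stays intact, and the destination directory is whatever your Hadoop installation uses.

```shell
# Convert hadoop-PACKAGE-VERSION.jar into hadoop-VERSION-PACKAGE.jar.
# Arguments: $1 = jar file name, $2 = version string used for the build.
# release_style_name is a hypothetical helper name.
release_style_name() {
    name=${1%.jar}           # strip the .jar extension
    name=${name#hadoop-}     # strip the leading hadoop-
    package=${name%-"$2"}    # strip the trailing -VERSION, leaving PACKAGE
    echo "hadoop-$2-$package.jar"
}

# Copy every built jar of the given version found under a directory tree
# into a destination directory, using the release-style name.
# Arguments: $1 = directory to search, $2 = destination, $3 = version.
copy_renamed_jars() {
    find "$1" -name "hadoop-*-$3.jar" | while IFS= read -r jar; do
        cp "$jar" "$2/$(release_style_name "$(basename "$jar")" "$3")"
    done
}

# Usage (the destination is an assumption about your installation):
# copy_renamed_jars "$HOME/.m2/repository" "$HADOOP_HOME" 0.20-append-for-hbase
```

Remember to remove or move aside the original 0.20.2 jars so that two versions of the same package are not on the classpath together.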

HBase, by contrast, uses the hadoop-PACKAGE-VERSION.jar convention, so jars copied into $HBASE_HOME/lib need no renaming and keep their original names.
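
The HBase side can be sketched in the same way. The assumptions here are mine: that the Hadoop jar bundled with HBase matches the glob hadoop-core-*.jar (check your own lib directory), that you want the old jar backed up rather than deleted, and the helper name `swap_hbase_hadoop_jar` is hypothetical.

```shell
# Replace the Hadoop core jar bundled with HBase by a freshly built one.
# Arguments: $1 = path to the new hadoop-core jar,
#            $2 = HBase lib directory,
#            $3 = backup directory for the jar being replaced.
# The hadoop-core-*.jar glob is an assumption about the bundled jar's name.
# swap_hbase_hadoop_jar is a hypothetical helper name.
swap_hbase_hadoop_jar() {
    mkdir -p "$3"
    for old in "$2"/hadoop-core-*.jar; do
        [ -e "$old" ] || continue    # nothing matched the glob
        mv "$old" "$3/"              # back up instead of deleting
    done
    cp "$1" "$2/"                    # the new jar keeps its original name
}

# Usage (all three paths are assumptions about your layout):
# swap_hbase_hadoop_jar /path/to/hadoop-core-0.20-append-for-hbase.jar \
#     "$HBASE_HOME/lib" "$HOME/hbase-lib-backup"
```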


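Once this is done, the newly built jars are ready to use.


During testing, however, you may run into some test failures.

For example, TestFileAppend4 always fails.

Fortunately, this does not mean the build is unusable. You may hit other failures as well, but after checking with the HBase mailing list it turned out that these are expected.

So don't worry if you see failures, even if they annoy you as much as they annoyed me.

That's all for now.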



