MooseFS distributed file system

There are many distributed file systems on the market, with new ones appearing all the time. The main ones are:

mogileFS: a key-value style meta-file system. It does not support FUSE, so applications must go through its API. It is mainly used on the web to serve huge numbers of small images, and is considerably more efficient than MooseFS at that job.

fastDFS: a key-value file system developed in China as an improvement on mogileFS. It likewise does not support FUSE, and offers better performance than mogileFS.

mooseFS: supports FUSE and is relatively lightweight. It has a single point of dependency on the master server and comparatively modest performance, but it has a large user base in China.

glusterFS: supports FUSE; a much heavier system than mooseFS.

ceph: supports FUSE, and its client has been merged into the linux-2.6.34 kernel, which means you can choose Ceph as a file system just as you would ext3/ReiserFS. It is fully distributed with no single point of dependency, written in C, and performs well. However, it is built on the immature btrfs, and is itself far from mature.

lustre: an enterprise-grade product from Oracle; very large, with deep dependencies on the kernel and ext3.

NFS: the veteran network file system. I don't know it in detail, but NFS has seen little development in recent years, so it was not an option for me.

I originally planned to use mogileFS, because it has the most users and my needs are mainly web-oriented.

But after studying its API, I found that a key-value file system has no directory structure: you cannot list all the files in a subdirectory, and you cannot operate on it like a local file system. Everything has to go through an API, which is very annoying.

mogileFS probably works this way either under the influence of memcached, another famous product from the same team with its listen-port-plus-API model, or because FUSE had not yet become popular when mogileFS was first designed.

In short, I was determined to find a distributed file system that supports FUSE, which left me choosing among mooseFS, glusterFS, and ceph. Technically, ceph is clearly the best: written in C, merged into the linux-2.6.34 kernel, and built on btrfs, it promises high performance, and its multi-master architecture completely removes the single point of dependency and so delivers high availability. But ceph is far too immature; the btrfs it relies on is itself immature, and its official website explicitly warns against using ceph in production.

Moreover, few people in China use it, and among Linux distributions, Ubuntu 10.04 ships kernel 2.6.32, so it still cannot use ceph directly.

glusterFS is better suited to very large deployments and has a relatively poor reputation, so I ruled it out as well.

In the end I chose mooseFS, whose strengths and weaknesses are equally obvious. It has a single point of dependency and its master is very memory-hungry, but for my requirements mooseFS is more than sufficient. Many people in China run mooseFS, and quite a few use it in production, which reinforced my choice.

The plan is to use one high-performance server (dual Xeon 5500, 24 GB RAM) as the master and two HP DL360 G4s (six 146 GB SCSI disks each) as chunk servers, building a distributed file system with a redundancy (goal) of 2 and exporting it to every server in the web tier.




I. Introduction to MooseFS:

MooseFS is a fault-tolerant, network-based distributed file system.

Distinctive features of MooseFS:

*High reliability: data can be stored in several copies on different machines.
*Capacity can be expanded dynamically by adding new machines or disks.
*Deleted files are retained for a configurable period of time.
*File snapshots: a copy consistent with the whole original file, even while the original is being accessed or written.

II. MooseFS architecture:

It comprises four types of machines:

*Managing server (master server)
*Data servers (chunk servers)
*Metadata backup servers (metalogger server)
*Client

III. Supported platforms:

*Linux (Linux 2.6.14 and up have FUSE support included in the official kernel)
*FreeBSD
*NetBSD
*OpenSolaris
*MacOS X

IV. Environment:

Managing server (master server):                    OS: CentOS 5.4          IP: 192.168.2.241
Metadata backup server (metalogger server):         OS: CentOS 5.4          IP: 192.168.2.242
Data servers (chunk servers):                       OS: CentOS 5.4          IP: 192.168.2.243
                                                    OS: CentOS 5.4          IP: 192.168.2.244
Client (mfsmount):                                  OS: Ubuntu 9.10         IP: 192.168.2.66

V. Installation and configuration

1. On the master server:

*Download and install

wget  http://moosefs.com/tl_files/mfscode/mfs-1.6.13.tar.gz

groupadd mfs
useradd mfs -g mfs
tar -zxvf mfs-1.6.13.tar.gz
cd mfs-1.6.13
./configure --prefix=/usr/local/mfs --disable-mfschunkserver --disable-mfsmount --with-default-group=mfs --with-default-user=mfs
make && make install
*Adjust the configuration:

cd  /usr/local/mfs/etc
 
mv mfsexports.cfg.dist mfsexports.cfg
mv mfsmaster.cfg.dist mfsmaster.cfg

cd /usr/local/mfs/var/mfs

mv metadata.mfs.empty metadata.mfs

(Without this rename, startup fails with the following error:)

if this is new instalation then rename metadata.mfs.empty as metadata.mfs
init: file system manager failed !!! error occured during initialization - exiting
*Edit mfsexports.cfg as follows (to allow 192.168.2.66 to mount):

#*            /    ro
#192.168.1.0/24        /    rw
#192.168.1.0/24        /    rw,alldirs,maproot=0,password=passcode
#10.0.0.0-10.0.0.5    /test    rw,maproot=nobody,password=test
#*            .    rw
192.168.2.66            /    rw,alldirs,maproot=0
*Add to /etc/hosts:

192.168.2.241          mfsmaster
*I left the other configuration files at their defaults

*Start the master

/usr/local/mfs/sbin/mfsmaster  start
*Stop the master

/usr/local/mfs/sbin/mfsmaster  stop

*Start the CGI monitor (web status interface, default port 9425):

/usr/local/mfs/sbin/mfscgiserv

*Stop the CGI monitor: there is no stop argument in this version, so kill the mfscgiserv process.
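
A quick sanity check after starting everything (a sketch; it assumes the net-tools package is installed and the standard MooseFS ports, 9419-9421 for the master and 9425 for the CGI monitor):

netstat -ntlp | grep mfs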

2. On the metalogger server

*Download and install:

wget  http://moosefs.com/tl_files/mfscode/mfs-1.6.13.tar.gz

groupadd mfs
useradd mfs -g mfs
tar -zxvf mfs-1.6.13.tar.gz
cd mfs-1.6.13
./configure --prefix=/usr/local/mfs --disable-mfschunkserver --disable-mfsmount --with-default-group=mfs --with-default-user=mfs
make && make install
*Adjust the configuration:

cd  /usr/local/mfs/etc
mv mfsmetalogger.cfg.dist    mfsmetalogger.cfg
*Add to /etc/hosts:

192.168.2.241          mfsmaster
*Everything else keeps the default configuration

*Start the metalogger:

/usr/local/mfs/sbin/mfsmetalogger start
*Stop the metalogger:

/usr/local/mfs/sbin/mfsmetalogger stop
Note: the metalogger connects to port 9419 on the master, so make sure your firewall allows it (I simply disabled iptables during testing). When the master fails and needs to be rebuilt, you can copy metadata.mfs.back and the most recent changelog from the metalogger server; both are required.
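If you would rather keep iptables running than disable it, a rule along these lines (a sketch; adjust the source network and chain to your own setup) opens all three master ports on the CentOS master:

iptables -A INPUT -p tcp -s 192.168.2.0/24 --dport 9419:9421 -j ACCEPT
service iptables save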

3. On the chunk servers

*Download and install:

wget  http://moosefs.com/tl_files/mfscode/mfs-1.6.13.tar.gz

groupadd  mfs
useradd mfs -g mfs
tar -zxvf mfs-1.6.13.tar.gz
cd mfs-1.6.13
./configure --prefix=/usr/local/mfs --disable-mfsmaster --disable-mfsmount --with-default-user=mfs --with-default-group=mfs
make && make install
*Adjust the configuration:

cd  /usr/local/mfs/etc
mv  mfschunkserver.cfg.dist         mfschunkserver.cfg
mv  mfshdd.cfg.dist                 mfshdd.cfg
*Edit mfshdd.cfg to contain:

/store-data
Note: mfshdd.cfg lists the locations MooseFS may use for storage; here it is my /store-data partition. Delete the unused example entries, and if you have more partitions you can add them one per line, as in the sketch below.
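For example, a chunk server with a second dedicated partition (the /store-data2 path is hypothetical) would simply list both:

/store-data
/store-data2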

*Add to /etc/hosts:

192.168.2.241          mfsmaster
I left mfschunkserver.cfg at its default configuration here.

*Change the ownership of /store-data

chown -R mfs:mfs /store-data
*Start the chunk server:

/usr/local/mfs/sbin/mfschunkserver start
*Stop the chunk server

/usr/local/mfs/sbin/mfschunkserver stop
The other chunk server is configured in exactly the same way, so I won't repeat the steps.

4. Client configuration:

FUSE is required; project page: http://fuse.sourceforge.net/

On Ubuntu this requires libfuse-dev

*Download and install

*Install FUSE

Since my client runs Ubuntu 9.10, I installed FUSE directly with apt-get install libfuse-dev.

On non-Ubuntu systems you can build it from source:

wget  http://cdnetworks-kr-2.dl.sourceforge.net/project/fuse/fuse-2.X/2.8.3/fuse-2.8.3.tar.gz

tar -zxvf fuse-2.8.3.tar.gz
cd fuse-2.8.3
./configure --prefix=/usr/local/fuse
make
make install
*Install MooseFS

wget  http://moosefs.com/tl_files/mfscode/mfs-1.6.13.tar.gz

groupadd  mfs
useradd mfs -g mfs
tar -zxvf mfs-1.6.13.tar.gz
cd mfs-1.6.13
./configure --prefix=/usr/local/mfs --disable-mfsmaster --disable-mfschunkserver --enable-mfsmount --with-default-user=mfs --with-default-group=mfs
make && make install
*Mount MooseFS (first add "192.168.2.241 mfsmaster" to /etc/hosts on the client as well, since mfsmount resolves the -H name)

mkdir -p /media/mfs
/usr/local/mfs/bin/mfsmount -H mfsmaster /media/mfs/
mfsmaster accepted connection with parameters:read-write,restricted_ip;root mapped to root:root
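
To remount automatically after a reboot, one simple option (a sketch; Ubuntu runs /etc/rc.local at the end of boot) is to append the mount command there:

/usr/local/mfs/bin/mfsmount /media/mfs -H mfsmaster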
*Check the mount:

df -h | grep mfs

mfsmaster:9421  1.8T  139G  1.6T   9%  /media/mfs

*Mount the MFSMETA file system (needed later for trash recovery):

mkdir /mnt/mfsmeta
/usr/local/mfs/bin/mfsmount -m /mnt/mfsmeta/ -H mfsmaster

*Unmount a mounted file system

The standard Linux umount command is all you need, for example:

umount /media/mfs


If you see the following:

umount /media/mfs
umount: /media/mfs: device is busy

it means a process on this client is still using the file system. Find out which program it is (for example with fuser -m /media/mfs) and exit it; it is best not to force the unmount.

*Specific operation commands on the client

*Set the number of copies (goal) kept for each file.

/usr/local/mfs/bin/mfssetgoal -r 3 /media/mfs
Note: -r applies the setting recursively.

*Check it:

/usr/local/mfs/bin/mfsgetgoal /media/mfs/ubuntu-9.10-server-i386.iso
 
/media/mfs/ubuntu-9.10-server-i386.iso: 3
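To see where those copies physically live, mfsfileinfo (installed alongside the other client tools) lists each chunk of a file and the chunk servers holding it; for example:

/usr/local/mfs/bin/mfsfileinfo /media/mfs/ubuntu-9.10-server-i386.iso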
*Set the trash time for deleted files to 600 seconds (10 minutes)

/usr/local/mfs/bin/mfssettrashtime -r 600 /media/mfs
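The matching query tool reads the value back, which is handy for confirming the recursive change took effect:

/usr/local/mfs/bin/mfsgettrashtime /media/mfs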
*For more usage, see:

http://www.moosefs.com/reference-guide.html#operations-specific-for-moosefs

VI. Failure recovery test

To simulate a total failure, I deleted /usr/local/mfs and rebooted the machine; at this point the whole file system is obviously unusable. Reinstall MooseFS and configure it exactly as before.

Copy metadata_ml.mfs.back and the most recent changelog, here changelog_ml.30.mfs, from the metadata backup server to the freshly installed master, then run the restore command:

/usr/local/mfs/sbin/mfsmetarestore -m metadata_ml.mfs.back -o metadata.mfs changelog_ml.30.mfs

loading objects (files,directories,etc.) ... ok
 
loading names ... ok
 
loading deletion timestamps ... ok
 
checking filesystem consistency ... ok
 
loading chunks data ... ok
 
connecting files and chunks ... ok
 
applying changes from file: changelog_ml.30.mfs
 
meta data version: 4633
 
4765: version mismatch
Then run the following:

/usr/local/mfs/sbin/mfsmetarestore  -a

file 'metadata.mfs.back' not found - will try 'metadata_ml.mfs.back' instead
 
loading objects (files,directories,etc.) ... ok
 
loading names ... ok
 
loading deletion timestamps ... ok
 
checking filesystem consistency ... ok
 
loading chunks data ... ok
 
connecting files and chunks ... ok
 
applying changes from file: /usr/local/mfs/var/mfs/changelog_ml.30.mfs
 
meta data version: 4633
 
4765: version mismatch
Finally, start the master and check how completely the file system has recovered.



File recovery
Removed files may be accessed through a separately mounted MFSMETA file system. In particular it contains directories /trash (containing information about deleted files that are still being stored) and /trash/undel (designed for retrieving files). Only the administrator has access to MFSMETA (user with uid 0, usually root).

 

$ mfssettrashtime 3600 /mnt/mfs-test/test1

/mnt/mfs-test/test1: 3600

$ rm /mnt/mfs-test/test1

$ ls /mnt/mfs-test/test1

ls: /mnt/mfs-test/test1: No such file or directory

 

The name of the file that is still visible in the "trash" directory consists of an 8-digit hexadecimal i-node number and a path to the file relative to the mounting point with characters / replaced with the | character. If such a name exceeds the limits of the operating system (usually 255 characters), the initial part of the path is deleted.

 

The path under which the file will be restored (relative to the mount point) can be read or changed by reading from or writing to this special file:

# ls -l /mnt/mfs-test-meta/trash/*test1

-rw-r--r-- 1 user users 1 2007-08-09 15:23 /mnt/mfs-test-meta/trash/00013BC7|test1

# cat '/mnt/mfs-test-meta/trash/00013BC7|test1'

test1

# echo 'test/test2' > '/mnt/mfs-test-meta/trash/00013BC7|test1'

# cat '/mnt/mfs-test-meta/trash/00013BC7|test1'

test/test2

 

Moving this file to the trash/undel subdirectory restores the original file in the proper MooseFS file system, at the path set as described above, or at the original path if it was not changed.

# mv '/mnt/mfs-test-meta/trash/00013BC7|test1' /mnt/mfs-test-meta/trash/undel/

 

Note: if a new file with the same path already exists, restoring the file will not succeed.

Similarly, the file cannot be restored under a different filename.

MooseFS cluster start-up and shutdown order
MOOSEFS MAINTENANCE
 
Starting MooseFS cluster
 
The safest way to start MooseFS (avoiding any read or write errors, inaccessible data or similar problems) is to run the following commands in this sequence:

start mfsmaster process
start all mfschunkserver processes
start mfsmetalogger processes (if configured)
when all chunkservers are connected to the MooseFS master, the filesystem can be mounted on any number of clients using mfsmount (you can check that all chunkservers are connected via the master logs or the CGI monitor); a scripted version is sketched below.
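
A minimal start-all sketch for this article's environment (it assumes the /usr/local/mfs prefix used above and passwordless SSH from the master to the other hosts):

/usr/local/mfs/sbin/mfsmaster start
ssh 192.168.2.243 /usr/local/mfs/sbin/mfschunkserver start
ssh 192.168.2.244 /usr/local/mfs/sbin/mfschunkserver start
ssh 192.168.2.242 /usr/local/mfs/sbin/mfsmetalogger start
# once the chunk servers show up in the CGI monitor, mount on each client:
/usr/local/mfs/bin/mfsmount /media/mfs -H mfsmaster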
 
Stopping MooseFS cluster
 
To safely stop MooseFS:

unmount MooseFS on all clients (using the umount command or an equivalent)
stop chunkserver processes with the mfschunkserver stop command
stop metalogger processes with the mfsmetalogger stop command
stop master process with the mfsmaster stop command.
 
Maintenance of MooseFS chunkservers
 
Provided that there are no files with a goal lower than 2 and no under-goal files (which can be checked with the mfsgetgoal -r and mfsdirinfo commands), it is possible to stop or restart a single chunkserver at any time. When you need to stop or restart another chunkserver afterwards, make sure the previous one has reconnected and there are no under-goal chunks.
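For example, with this article's mount point, a pre-maintenance check could look like this (mfsgetgoal -r reports the goals, mfsdirinfo summarizes the tree):

/usr/local/mfs/bin/mfsgetgoal -r /media/mfs
/usr/local/mfs/bin/mfsdirinfo /media/mfs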

 
MooseFS metadata backups
 
There are two general parts of metadata:

main metadata file (metadata.mfs, named metadata.mfs.back when the mfsmaster is running), synchronized each hour
metadata changelogs (changelog.*.mfs), stored for last N hours (configured by BACK_LOGS setting)
The main metadata file needs regular backups, with the frequency depending on how many hourly changelogs are stored. Metadata changelogs should be automatically replicated in real time. Since MooseFS 1.6.5, both tasks are handled by the mfsmetalogger daemon, though an extra off-host copy does not hurt, as sketched below.
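
As an extra safeguard beyond the metalogger, an hourly cron job on the master can keep dated copies of the metadata file (a sketch; /backup is a hypothetical destination, and the % is escaped as crontab requires):

0 * * * * cp /usr/local/mfs/var/mfs/metadata.mfs.back /backup/metadata.mfs.back.$(date +\%H)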

 
MooseFS master recovery
 
In case of an mfsmaster crash (due to e.g. a host or power failure), the last metadata changelog needs to be merged into the main metadata file. This can be done with the mfsmetarestore utility; the simplest way to use it is:

$ mfsmetarestore -a

If the master data are stored in a location other than the one specified during MooseFS compilation, the actual path needs to be given with the -d option, e.g.:

$ mfsmetarestore -a -d /storage/mfsmaster

 
MooseFS master recovery from a backup
 
In order to restore the master host from a backup:

install mfsmaster in the normal way
configure it with the same settings (e.g. by retrieving the mfsmaster.cfg file from the backup)
retrieve the metadata.mfs.back file from the backup or a metalogger host and place it in the mfsmaster data directory
copy the last metadata changelogs from any metalogger that was running just before the master failure into the mfsmaster data directory
merge the metadata changelogs using the mfsmetarestore command as described above - either with mfsmetarestore -a, or by specifying the actual file names with the non-automatic mfsmetarestore syntax, e.g.
$ mfsmetarestore -m metadata.mfs.back -o metadata.mfs changelog.*.mfs

 

Please also read the mini-howto on preparing a fail-proof solution for an outage of the master server. That document presents a solution using CARP, in which a metalogger takes over the role of the failed master server.

