Running Python on a Hadoop Platform Built on CentOS

Original

2019-08-31 05:56

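Before we begin:
Configuration on a server differs from that on a virtual machine. Because the server has both a public and a private network address, /etc/hosts needs to be changed as follows.
Running ifconfig shows the private IP, which differs from the public IP (106.15.224.113).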

Run vim /etc/hosts and make the following changes:
106.15.224.113 localhost
172.19.246.119 Centos6

Once these basics are in place, we can get started.

1. Write the Python code in PyCharm
hdfs_map.py

import sys

# Split each input line into a list of words
def read_input(file):
    for line in file:
        yield line.split()

def main():
    data = read_input(sys.stdin)
    for words in data:
        for word in words:
            # Emit "<word><TAB>1" for every word
            print("%s%s%d" % (word, '\t', 1))

if __name__ == '__main__':
    main()
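
To see what the mapper emits without piping anything through stdin, the same splitting logic can be run against an in-memory stream. This is only a minimal sketch: map_stream is a hypothetical helper name that does not exist in the original script, and io.StringIO stands in for sys.stdin.

```python
import io

def read_input(file):
    # Same splitting logic as hdfs_map.py: one list of words per line.
    for line in file:
        yield line.split()

def map_stream(file):
    # Hypothetical helper: collect the mapper's output lines instead of printing them.
    return ["%s\t%d" % (word, 1) for words in read_input(file) for word in words]

# io.StringIO stands in for sys.stdin here.
sample = io.StringIO("a b c\nb c\n")
for out_line in map_stream(sample):
    print(out_line)
```

Every word becomes its own "word<TAB>1" line, and blank lines produce no output at all.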

hdfs_reduce.py

import sys
from operator import itemgetter
from itertools import groupby

# Parse each mapper output line into a [word, count] pair
def read_mapper_output(file, separator='\t'):
    for line in file:
        yield line.rstrip().split(separator, 1)

def main():
    data = read_mapper_output(sys.stdin)
    # groupby only merges consecutive equal keys, so the input must be sorted by word
    for current_word, group in groupby(data, itemgetter(0)):
        total_count = sum(int(count) for _, count in group)
        print('%s%s%d' % (current_word, '\t', total_count))

if __name__ == '__main__':
    main()
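
The reducer depends on itertools.groupby, which only merges consecutive items that share a key. That is why its input has to arrive sorted by word. The sketch below illustrates the difference; count_groups is a hypothetical helper that mirrors the summing logic of hdfs_reduce.py on an in-memory list.

```python
from itertools import groupby
from operator import itemgetter

def count_groups(pairs):
    # Sum the counts of each run of consecutive identical keys, as hdfs_reduce.py does.
    return [(key, sum(int(count) for _, count in group))
            for key, group in groupby(pairs, itemgetter(0))]

unsorted_pairs = [("b", "1"), ("a", "1"), ("b", "1")]

print(count_groups(unsorted_pairs))          # "b" shows up as two separate groups
print(count_groups(sorted(unsorted_pairs)))  # sorted input gives one total per word
```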

2. Upload the code to the server: create the directory /opt/python and place the files there

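Download a book into that directory, saving it as pg20417.txt (the file name the upload step expects)
wget -O pg20417.txt http://www.gutenberg.org/ebooks/20417.txt.utf-8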

Change the file permissions
chmod 777 /opt/python/hdfs_map.py
chmod 777 /opt/python/hdfs_reduce.py

[Figure: the file directory on CentOS]

3. Start Hadoop

Run the following command in the sbin directory of your Hadoop installation:

./start-all.sh

Check that the daemons have started

jps
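
On a typical single-node setup, start-all.sh is expected to bring up NameNode, DataNode, SecondaryNameNode, ResourceManager and NodeManager. The sketch below checks a captured jps listing against that set. It assumes jps prints one "<pid> <name>" pair per line; missing_daemons is a hypothetical helper, and the sample listing and its PIDs are made up for illustration.

```python
# Daemons assumed to be expected on a single-node installation.
EXPECTED = {"NameNode", "DataNode", "SecondaryNameNode",
            "ResourceManager", "NodeManager"}

def missing_daemons(jps_output):
    # Collect the process names from lines of the form "<pid> <name>".
    running = set()
    for line in jps_output.splitlines():
        parts = line.split()
        if len(parts) >= 2:
            running.add(parts[1])
    return sorted(EXPECTED - running)

# Made-up sample listing, not real output from the server.
sample = """\
2101 NameNode
2234 DataNode
2890 Jps
"""
print(missing_daemons(sample))
```

An empty list means every expected daemon appears in the listing.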

Verify that the scripts run

echo "a b c"|python3 /opt/python/hdfs_map.py

echo "a a b d c b c c c"|python3 /opt/python/hdfs_map.py |sort -k1,1|python3 /opt/python/hdfs_reduce.py
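
The map, sort and reduce stages of this shell pipeline can also be simulated in a single Python process, which is convenient for quick checks without the scripts on disk. This is a rough sketch only: local_wordcount is a hypothetical function, and it sorts by Python string order rather than by the locale rules that sort -k1,1 uses.

```python
from itertools import groupby
from operator import itemgetter

def local_wordcount(text):
    # Map: emit a (word, 1) pair for every whitespace-separated word.
    pairs = [(word, 1) for line in text.splitlines() for word in line.split()]
    # Shuffle: sort by word, the role `sort -k1,1` plays in the shell pipeline.
    pairs.sort(key=itemgetter(0))
    # Reduce: sum the counts of each run of identical words.
    return [(word, sum(count for _, count in group))
            for word, group in groupby(pairs, itemgetter(0))]

for word, total in local_wordcount("a a b d c b c c c"):
    print("%s\t%d" % (word, total))
```

For the test string above this yields a 2, b 2, c 4 and d 1, one tab-separated pair per line.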

4. Run the Python program on the actual Hadoop cluster
Create the required directories
hdfs dfs -mkdir /user
hdfs dfs -mkdir /user/input

Upload the local file to the HDFS directory
hdfs dfs -put /opt/python/pg20417.txt /user/input

Run the Python program on Hadoop
/home/hadoop/hadoop3.2/bin/hadoop jar /home/hadoop/hadoop3.2/share/hadoop/tools/lib/hadoop-streaming-3.2.0.jar -files "/opt/python/hdfs_map.py,/opt/python/hdfs_reduce.py" -input /user/input/*.txt -output /user/output -mapper "/root/Py37/bin/python3 /opt/python/hdfs_map.py" -reducer "/root/Py37/bin/python3 /opt/python/hdfs_reduce.py"
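
Because this command is long and sensitive to quoting, it can help to assemble it as an argument list in Python. The sketch below only rebuilds the same command from the paths used in this post; build_streaming_command is a hypothetical helper, and nothing is submitted unless the list is passed to subprocess.run.

```python
import shlex

def build_streaming_command(hadoop_home, python_bin, script_dir,
                            input_path, output_path):
    # Each list element is passed as exactly one argument, so no shell quoting is needed.
    mapper = script_dir + "/hdfs_map.py"
    reducer = script_dir + "/hdfs_reduce.py"
    return [
        hadoop_home + "/bin/hadoop", "jar",
        hadoop_home + "/share/hadoop/tools/lib/hadoop-streaming-3.2.0.jar",
        "-files", mapper + "," + reducer,
        "-input", input_path,
        "-output", output_path,
        "-mapper", python_bin + " " + mapper,
        "-reducer", python_bin + " " + reducer,
    ]

cmd = build_streaming_command(
    "/home/hadoop/hadoop3.2", "/root/Py37/bin/python3",
    "/opt/python", "/user/input/*.txt", "/user/output")

# Print a shell-quoted form for inspection.
print(" ".join(shlex.quote(arg) for arg in cmd))
# To actually submit the job: subprocess.run(cmd, check=True)
```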

[Figure: screenshot of a successful run]

Check the files in the output directory
hdfs dfs -ls /user/output

The last file in the listing is the one we need. Let's look at its contents:
hdfs dfs -cat /user/output/part-00000
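
Since each line of part-00000 has the form "word<TAB>count", the result is easy to post-process. As a rough sketch, the hypothetical top_words helper below picks the most frequent words from such lines. The sample counts are made up and do not come from the actual job output.

```python
def top_words(lines, n=3):
    # Each line of part-00000 has the form "<word>\t<count>".
    counts = []
    for line in lines:
        word, _, count = line.rstrip("\n").partition("\t")
        if count:
            counts.append((word, int(count)))
    # Highest count first; ties are broken alphabetically.
    counts.sort(key=lambda item: (-item[1], item[0]))
    return counts[:n]

# Made-up sample lines, not real job output.
sample = ["the\t120\n", "of\t75\n", "science\t12\n", "and\t75\n"]
print(top_words(sample, n=2))
```

In practice the lines could come from sys.stdin, with the output of hdfs dfs -cat /user/output/part-00000 piped into the script.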

A little progress every day, and a little more happiness too.
