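I. Downloads
Software to prepare:
spark-1.0.0-bin-hadoop1.tgz Download: spark1.0.0
scala-2.10.4.tgz Download: Scala 2.10.4
hadoop-1.2.1-bin.tar.gz Download: hadoop-1.2.1-bin.tar.gz
jdk-7u60-linux-i586.tar.gz Download: get it from the official site; any 1.7.x release works
II. Installation steps
For the hadoop-1.2.1 installation steps, see: http://my.oschina.net/dataRunner/blog/292584
1. Extract the archives: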
    tar -zxvf scala-2.10.4.tgz
    mv scala-2.10.4 scala

    tar -zxvf spark-1.0.0-bin-hadoop1.tgz
    mv spark-1.0.0-bin-hadoop1 spark
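The extract-and-rename pair above can be wrapped in a small helper so that re-running it does not clobber an existing directory. This is only a sketch: `unpack_as` is a hypothetical name, and it assumes each archive contains a single top-level directory with no leading `./`.

```shell
#!/bin/sh
# unpack_as ARCHIVE TARGET: extract a .tgz and rename its top-level
# directory to TARGET. Hypothetical helper, not part of Spark or Scala.
unpack_as() {
    archive=$1
    target=$2
    if [ -e "$target" ]; then
        echo "$target already exists, skipping"
        return 0
    fi
    # The first entry of the listing gives the top-level directory name.
    top=$(tar -tzf "$archive" | sed -n '1s|/.*||p')
    tar -zxf "$archive"
    # Rename only when the names differ.
    [ "$top" = "$target" ] || mv "$top" "$target"
}

# Example usage (commented out; archive names are the ones from this post):
# unpack_as scala-2.10.4.tgz scala
# unpack_as spark-1.0.0-bin-hadoop1.tgz spark
```

Run it from the directory that holds the archives, which is /home/big_data in this post.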
2. Configure the environment variables:
    vim /etc/profile
    # append the following at the end of the file

    export HADOOP_HOME_WARN_SUPPRESS=1
    export JAVA_HOME=/home/big_data/jdk
    export JRE_HOME=${JAVA_HOME}/jre
    export CLASS_PATH=.:${JAVA_HOME}/lib:${JRE_HOME}/lib
    export HADOOP_HOME=/home/big_data/hadoop
    export HIVE_HOME=/home/big_data/hive
    export SCALA_HOME=/home/big_data/scala
    export SPARK_HOME=/home/big_data/spark
    export PATH=.:$SPARK_HOME/bin:$SCALA_HOME/bin:$HIVE_HOME/bin:$HADOOP_HOME/bin:$JAVA_HOME/bin:$PATH
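After editing /etc/profile and reloading it (for example with `source /etc/profile`), it can be worth checking that each variable really points at a directory that exists. A minimal sketch, where `check_home` is a hypothetical helper and the variable list is taken from the exports above:

```shell
#!/bin/sh
# check_home NAME: report whether the variable NAME is set and names an
# existing directory. Hypothetical helper, not part of Spark or Hadoop.
check_home() {
    name=$1
    # Indirect lookup of the variable whose name is in $name (POSIX sh).
    eval "dir=\${$name:-}"
    if [ -z "$dir" ]; then
        echo "$name: not set"
        return 1
    elif [ ! -d "$dir" ]; then
        echo "$name: $dir does not exist"
        return 1
    fi
    echo "$name: ok ($dir)"
}

for v in JAVA_HOME SCALA_HOME SPARK_HOME HADOOP_HOME; do
    check_home "$v" || true
done
```

A "not set" or "does not exist" line usually means a typo in the path or that the profile has not been reloaded in the current shell.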
3. Edit Spark's spark-env.sh file
    cd spark/conf
    cp spark-env.sh.template spark-env.sh

    vim spark-env.sh
    # append the following at the end of the file

    export JAVA_HOME=/home/big_data/jdk
    export SCALA_HOME=/home/big_data/scala
    export SPARK_MASTER_IP=192.168.80.100
    export SPARK_WORKER_MEMORY=200m
    export HADOOP_CONF_DIR=/home/big_data/hadoop/conf
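If you would rather script this step than edit the file in vim, the lines can be appended idempotently, so running the setup twice does not duplicate them. A sketch under that assumption; `append_once` is a hypothetical helper name:

```shell
#!/bin/sh
# append_once FILE LINE: add LINE to FILE unless an identical line is
# already present. Hypothetical helper for illustration.
append_once() {
    file=$1
    line=$2
    # -x matches whole lines, -F treats LINE as a fixed string, -q is quiet.
    grep -qxF -- "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}

# Example usage (commented out; values are the ones from this post):
# append_once spark-env.sh 'export SPARK_MASTER_IP=192.168.80.100'
# append_once spark-env.sh 'export SPARK_WORKER_MEMORY=200m'
```

The same helper would work for the /etc/profile exports in step 2.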
And that's it, the configuration is done! (It really is this simple. Plenty of people know how to do it, but too few share it.)
III. Testing
For the hadoop-1.2.1 test steps, see: http://my.oschina.net/dataRunner/blog/292584
1. Verify Scala
    [root@master ~]# scala -version
    Scala code runner version 2.10.4 -- Copyright 2002-2013, LAMP/EPFL
    [root@master ~]#
    [root@master big_data]# scala
    Welcome to Scala version 2.10.4 (Java HotSpot(TM) Client VM, Java 1.7.0_60).
    Type in expressions to have them evaluated.
    Type :help for more information.

    scala> 1+1
    res0: Int = 2

    scala> :q
2. Verify Spark (start HDFS first with Hadoop's start-dfs.sh)
    [root@master big_data]# cd spark
    [root@master spark]# sbin/start-all.sh

    (Alternatively, start the daemons separately:
    [root@master spark]$ sbin/start-master.sh
    The master UI is then available at http://master:8080/
    [root@master spark]$ sbin/start-slaves.sh spark://master:7077
    The worker UI is then available at http://master:8081/
    )

    [root@master ~]# jps
    4629 NameNode            (Hadoop)
    5007 Master              (Spark)
    6150 Jps
    4832 SecondaryNameNode   (Hadoop)
    5107 Worker              (Spark)
    4734 DataNode            (Hadoop)

    The master UI is available at http://192.168.80.100:8080/

    [root@master big_data]# spark-shell
    Spark assembly has been built with Hive, including Datanucleus jars on classpath
    14/07/20 21:41:04 INFO spark.SecurityManager: Changing view acls to: root
    14/07/20 21:41:04 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root)
    14/07/20 21:41:04 INFO spark.HttpServer: Starting HTTP Server
    14/07/20 21:41:05 INFO server.Server: jetty-8.y.z-SNAPSHOT
    14/07/20 21:41:05 INFO server.AbstractConnector: Started [email protected]:43343
    Welcome to
          ____              __
         / __/__  ___ _____/ /__
        _\ \/ _ \/ _ `/ __/  '_/
       /___/ .__/\_,_/_/ /_/\_\   version 1.0.0
          /_/

    Using Scala version 2.10.4 (Java HotSpot(TM) Client VM, Java 1.7.0_60)
    ...

    scala>

    The application UI is available at http://192.168.80.100:4040/

    (Upload any file containing some English words to HDFS as /input.)

    scala> val file=sc.textFile("hdfs://master:9000/input")
    14/07/20 21:51:05 INFO storage.MemoryStore: ensureFreeSpace(608) called with curMem=31527, maxMem=311387750
    14/07/20 21:51:05 INFO storage.MemoryStore: Block broadcast_1 stored as values to memory (estimated size 608.0 B, free 296.9 MB)
    file: org.apache.spark.rdd.RDD[String] = MappedRDD[5] at textFile at <console>:12

    scala> val count=file.flatMap(line=>line.split(" ")).map(word=>(word,1)).reduceByKey(_+_)
    14/07/20 21:51:14 INFO mapred.FileInputFormat: Total input paths to process : 1
    count: org.apache.spark.rdd.RDD[(String, Int)] = MapPartitionsRDD[10] at reduceByKey at <console>:14

    scala> count.collect()
    14/07/20 21:51:48 INFO spark.SparkContext: Job finished: collect at <console>:17, took 2.482381535 s
    res0: Array[(String, Int)] = Array((previously-registered,1), (this,3), (Spark,1), (it,3), (original,1), (than,1), (its,1), (previously,1), (have,2), (upon,1), (order,2), (whenever,1), (it’s,1), (could,3), (Configuration,1), (Master's,1), (SPARK_DAEMON_JAVA_OPTS,1), (This,2), (which,2), (applications,2), (register,,1), (doing,1), (for,3), (just,2), (used,1), (any,1), (go,1), ((equivalent,1), (Master,4), (killing,1), (time,1), (availability,,1), (stop-master.sh,1), (process.,1), (Future,1), (node,1), (the,9), (Workers,1), (however,,1), (up,2), (Details,1), (not,3), (recovered,1), (process,1), (enable,3), (spark-env,1), (enough,1), (can,4), (if,3), (While,2), (provided,1), (be,5), (mode.,1), (minute,1), (When,1), (all,2), (written,1), (store,1), (enter,1), (then,1), (as,1), (officially,1)...

    scala> count.saveAsTextFile("hdfs://master:9000/output")
    (The result is saved under the /output directory on HDFS.)

    scala> :q
    Stopping spark context.

    [root@master ~]# hadoop fs -ls /
    Found 3 items
    drwxr-xr-x   - root supergroup          0 2014-07-18 21:10 /home
    -rw-r--r--   1 root supergroup       1722 2014-07-18 06:18 /input
    drwxr-xr-x   - root supergroup          0 2014-07-20 21:53 /output
    [root@master ~]#
    [root@master ~]# hadoop fs -cat /output/p*
    ...
    (mount,1)
    (production-level,1)
    (recovery).,1)
    (Workers/applications,1)
    (perspective.,1)
    (so,2)
    (and,1)
    (ZooKeeper,2)
    (System,1)
    (needs,1)
    (property Meaning,1)
    (solution,1)
    (seems,1)
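To sanity-check the Spark result, roughly the same word count can be computed with standard Unix tools and compared against the saved output. This is a sketch, not an exact equivalent: `word_count` is a hypothetical helper, it splits on single spaces like `line.split(" ")` but drops empty tokens, and it prints pairs in sorted order, whereas Spark's output order is not sorted.

```shell
#!/bin/sh
# word_count: read text on stdin, print one "(word,count)" line per distinct
# space-separated token, sorted by word. Hypothetical helper for illustration.
word_count() {
    tr ' ' '\n' |
        awk 'length > 0' |
        LC_ALL=C sort |
        uniq -c |
        awk '{ c = $1; sub(/^ *[0-9]+ /, ""); print "(" $0 "," c ")" }'
}

# Example usage (commented out; needs the HDFS paths used in this post):
# hadoop fs -cat /input | word_count
```

Sorting the lines from `hadoop fs -cat /output/p*` the same way makes the two listings directly comparable.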
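That's it: installation and testing are complete, and this beginner tutorial ends here!
Go ahead and enjoy the moment: Spark turns out to be this simple. (This is a pseudo-distributed setup, intended for learning.)
If you like this spirit of sharing, please join us.
IV. About us
Author of this article: 江中煉, a member of 數據的開拓者
QQ group: 248087140
Motto:
Be excellent in the field you are good at,
and help a group of beginners become excellent too;
others will then respect you from the bottom of their hearts.
-- Click to join us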