Reference: http://blog.csdn.net/gamer_gyt/article/details/52045663
Hadoop 2.7.3 in YARN mode has already been set up.
Hadoop install directory: /home/hadoop/hadoop-2.7.3
Java directory: /home/java/jdk1.8.0_102
Nodes:
10.0.0.172 master172
10.0.0.171 slave171
10.0.0.185 slave185
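These IP/hostname pairs are assumed to be mapped in /etc/hosts on all three machines (a minimal sketch; adjust to your own network):
$ cat /etc/hosts
10.0.0.172 master172
10.0.0.171 slave171
10.0.0.185 slave185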
Install Scala
Download: http://www.scala-lang.org/download/2.11.8.html
Download scala-2.11.8.tgz, put it under /home/hadoop/, and unpack it:
$ cd /home/hadoop/
$ tar -zxvf scala-2.11.8.tgz
Then edit /etc/profile:
$ vi /etc/profile
Add:
export SCALA_HOME=/home/hadoop/scala-2.11.8
export PATH=$PATH:$SCALA_HOME/bin
$ source /etc/profile
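A quick check that Scala is now on the PATH:
$ scala -version    # should report version 2.11.8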
Download Spark
Download spark-2.0.0-bin-hadoop2.7.tgz to /home/hadoop/:
$ cd /home/hadoop/
$ tar -zxvf spark-2.0.0-bin-hadoop2.7.tgz
$ mv spark-2.0.0-bin-hadoop2.7 spark
Configure Spark environment variables
$ vi /etc/profile
Add:
export SPARK_HOME=/home/hadoop/spark
export PATH=$SPARK_HOME/bin:$PATH
$ source /etc/profile
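A quick sanity check that the Spark binaries resolve (spark-submit prints a version banner):
$ spark-submit --version    # should report Spark 2.0.0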
Configure Spark
$ cd /home/hadoop/spark/conf
$ cp spark-env.sh.template spark-env.sh
$ vi spark-env.sh
Add:
export SCALA_HOME=/home/hadoop/scala-2.11.8
export JAVA_HOME=/home/java/jdk1.8.0_102
export SPARK_MASTER_IP=10.0.0.172
export SPARK_WORKER_MEMORY=1g
export HADOOP_CONF_DIR=/home/hadoop/hadoop-2.7.3/etc/hadoop
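Note: starting with Spark 2.0 the standalone scripts treat SPARK_MASTER_IP as deprecated; if start-master.sh prints a warning about it, the equivalent newer setting is:
export SPARK_MASTER_HOST=10.0.0.172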
Configure slaves
$ cp slaves.template slaves
$ vi slaves
Add:
10.0.0.171
10.0.0.185
Distribute to the slave nodes:
$ scp -r scala-2.11.8 10.0.0.171:/home/hadoop/
$ scp -r scala-2.11.8 10.0.0.185:/home/hadoop/
$ scp -r spark 10.0.0.171:/home/hadoop/
$ scp -r spark 10.0.0.185:/home/hadoop/
Set up the /etc/profile file on each slave in the same way.
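With the files and environment in place, the standalone cluster can be started from the master (a sketch, assuming passwordless SSH to the slaves is already configured, as it is for Hadoop):
$ cd /home/hadoop/spark
$ sbin/start-all.sh
$ jps    # expect a Master process on master172; jps on the slaves should show a Worker
The master Web UI at http://10.0.0.172:8080 should list both workers.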
Run a test
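The word-count test below reads from hdfs://10.0.0.172:9000/input/*; if that directory does not exist yet, upload some text files first (a sketch; judging from the sample output further down, the Hadoop configuration XMLs were used):
$ /home/hadoop/hadoop-2.7.3/bin/hdfs dfs -mkdir -p /input
$ /home/hadoop/hadoop-2.7.3/bin/hdfs dfs -put /home/hadoop/hadoop-2.7.3/etc/hadoop/*.xml /input/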
Open spark-shell:
$ cd /home/hadoop/spark
$ bin/spark-shell
Enter the following in the opened spark-shell; each input gets some feedback from the shell:
scala> val file=sc.textFile("hdfs://10.0.0.172:9000/input/*")
file: org.apache.spark.rdd.RDD[String] = hdfs://10.0.0.172:9000/input/* MapPartitionsRDD[5] at textFile at <console>:24
scala> val count=file.flatMap(line => line.split(" ")).map(word => (word,1)).reduceByKey(_+_)
count: org.apache.spark.rdd.RDD[(String, Int)] = ShuffledRDD[8] at reduceByKey at <console>:26
scala> count.collect()
res0: Array[(String, Int)] = Array(((default),,1), (JNs,1), (Software,1), (Unless,9), (endpoint.,1), (user?,1), (<name>security.applicationclient.protocol.acl</name>,1), (start,1), (number,5), (getKeyVersion,1), (ApplicationHistoryProtocol,,1), (type,1), (with,28), (State,1), (RefreshUserMappingsProtocol.,1), (JavaKeyStoreProvider,,1), (ACL,,2), (inter-datanode,1), (at,11), ((root,1), (ApplicationClientProtocol,,1), (ResourceCalculator,1), (hot-reloaded,1), (keys,1), (<name>mapreduce.jobhistory.address</name>,1), (History,1), (implementation,1), (<name>security.namenode.protocol.acl</name>,1), (setup,1), (" ",19), (<configuration>,9), (Server,1), (allowed.</description>,18), (BASIS,,9), (datanodes,1), (file.,10), (resources,2), (stored.,1), (<name>mapreduce.jobhistory.webapp.address</na...
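To keep the result instead of just printing it, the RDD can be written back to HDFS (the /output/wordcount path here is only an example):
scala> count.saveAsTextFile("hdfs://10.0.0.172:9000/output/wordcount")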
Note: in Hadoop, only the NameNode exposes port 9000; DataNodes do not expose it, so make sure the address in the HDFS URL points at the NameNode.
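The NameNode address and port come from fs.defaultFS in core-site.xml; to double-check what your cluster actually uses:
$ grep -A1 fs.defaultFS /home/hadoop/hadoop-2.7.3/etc/hadoop/core-site.xml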