This post follows the previous one on deployment and covers testing of that deployment.
Testing
- Spark-shell test
./spark-shell
...
scala> val days = List("Sunday","Monday","Tuesday","Wednesday","Thursday","Friday","Saturday")
days: List[String] = List(Sunday, Monday, Tuesday, Wednesday, Thursday, Friday, Saturday)
scala> val daysRDD = sc.parallelize(days)
daysRDD: org.apache.spark.rdd.RDD[String] = ParallelCollectionRDD[0] at parallelize at <console>:14
scala> daysRDD.count()
res0: Long = 7
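As a further sanity check, any other transformation can be run on the same RDD; the continuation below is only an illustration (the filter predicate is my own choice, not from the original session):
scala> daysRDD.filter(_.startsWith("S")).collect()
res1: Array[String] = Array(Sunday, Saturday)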
- Script test
- Local mode
./bin/run-example org.apache.spark.examples.SparkPi 2 spark://localhost:7077
./bin/run-example SparkPi 10 --master local[2]
- Standalone mode
[Note] Adjust 127.0.0.1 and the path to the *.jar for your environment.
./spark-submit --class org.apache.spark.examples.SparkPi --master spark://127.0.0.1:7077 ../lib/spark-examples-1.4.0-hadoop2.6.0.jar 100
- YARN test (cluster mode and client mode)
[Note] Adjust the path to the *.jar for your environment.
./spark-submit --class org.apache.spark.examples.SparkPi --master yarn-cluster ../lib/spark-examples*.jar 10
Check the job at http://localhost:8088/ (localhost can be the server's address).
./spark-submit --class org.apache.spark.examples.SparkPi --master yarn-client ../lib/spark-examples*.jar 10
- Visit localhost:8088 to see the result.
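In yarn-cluster mode the driver runs inside the YARN ApplicationMaster, so the "Pi is roughly ..." line does not appear on the submitting console. One way to retrieve it, assuming YARN log aggregation is enabled and using the application ID shown on the 8088 page (the ID below is a placeholder), is:
./bin/yarn logs -applicationId application_XXXXXXXXXXXXX_XXXX | grep "Pi is roughly"   # run from the hadoop directory; replace the placeholder with the real application ID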
- Data test
- Shell script (generates the test data)
getNum(){
    c=1
    while [[ $c -le 5000000 ]]
    do
        echo $(($RANDOM/500))
        ((c++))
    done
}
for i in `seq 30`
do
    getNum >> ${i}.txt &
    # getNum
done
wait
echo "------------------DONE-----------------"
cat [0-9]*.txt > num.txt
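A possible way to run it (the file name gen_num.sh is just an illustrative choice): the script starts 30 background writers, waits for them, and concatenates everything into num.txt, which should end up with 30 x 5,000,000 = 150,000,000 lines.
bash gen_num.sh        # gen_num.sh is a hypothetical name for the script above
wc -l num.txt          # sanity check: expect 150000000 lines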
- Create the HDFS directory (the hdfs executable lives under hadoop/bin; the HDFS root is hdfs://localhost:9000).
Run: ./bin/hdfs dfs -mkdir -p /user/hadoop/datatest
- Write the data generated by the script into the HDFS directory just created
Run: ./bin/hdfs dfs -put /root/num.txt /user/hadoop/datatest
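As an optional check (not part of the original steps), the upload can be verified with:
./bin/hdfs dfs -ls /user/hadoop/datatest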
- Scala test code:
Run: spark/bin/spark-shell
In the spark-shell, run the word count against the data on HDFS:
scala> val file = sc.textFile("hdfs://localhost:9000/user/hadoop/datatest/num.txt")
scala> val count = file.flatMap(line => line.split(" ")).map(word => (word, 1)).reduceByKey(_+_)
scala> count.sortBy(_._2).map(x => x._1 + "\t" + x._2).saveAsTextFile("hdfs://localhost:9000/user/hadoop/datatest/numCount")
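Once the job finishes, the output can also be read back from the same spark-shell session; this is an optional check not in the original steps:
scala> sc.textFile("hdfs://localhost:9000/user/hadoop/datatest/numCount").take(10).foreach(println)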
View the sorted result with the Hadoop CLI (run from hadoop/bin/):
./hadoop fs -cat hdfs://localhost:9000/user/hadoop/datatest/numCount/p* | sort -k2n
The test results are as follows: