1. Project setup
  1. A local catalog can be used (for the Maven archetype selection).
  2. Create a Maven project and search "scala-a" to find the Scala archetype; net.alchim31.maven works. It is best to comment out the test section of the pom and delete the generated test code.
  3. The Scala Library container (the Scala version the IDE attaches) can be changed if needed.
  4. Spark can be run locally on Win7 (running Hadoop/Spark locally on Windows typically also needs winutils.exe and HADOOP_HOME set).
  5. Declare spark-core and any other Spark dependencies you use in the pom with scope set to provided; they are only needed for Maven compilation and packaging (for local runs, importing the Spark jars via a user library is enough). Use maven-assembly-plugin to bundle the remaining, non-Spark dependencies; see the pom sketch below.
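A minimal pom sketch of what item 5 describes, assuming Spark 1.5.0 (to match clientSparkVersion in item 6) and the Scala 2.10 artifact suffix; adapt versions to your cluster:

    <dependencies>
      <!-- provided: on the compile classpath, but not bundled (the cluster supplies Spark) -->
      <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-core_2.10</artifactId>
        <version>1.5.0</version>
        <scope>provided</scope>
      </dependency>
    </dependencies>
    <build>
      <plugins>
        <!-- bundles the remaining (non-provided) dependencies into one fat jar -->
        <plugin>
          <groupId>org.apache.maven.plugins</groupId>
          <artifactId>maven-assembly-plugin</artifactId>
          <configuration>
            <descriptorRefs>
              <descriptorRef>jar-with-dependencies</descriptorRef>
            </descriptorRefs>
          </configuration>
          <executions>
            <execution>
              <phase>package</phase>
              <goals><goal>single</goal></goals>
            </execution>
          </executions>
        </plugin>
      </plugins>
    </build>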
  6. Jobs can be submitted through the Spark REST API (https://www.nitendragautam.com/bigdata/submit-apache-spark-job-with-rest-api/); see the curl sketch after item 7.
    1. URL: http://spark-cluster-ip:6066/v1/submissions/create
    2. Method: POST
    3. Content-Type: application/json;charset=UTF-8
    4. Body:
       {
         "clientSparkVersion": "1.5.0",
         "appArgs": ["2018-07", "2018-07"],
         "mainClass": "class-package",
         "appResource": "hdfs-jar-path",
         "action": "CreateSubmissionRequest",
         "sparkProperties": {
           "spark.jars": "hdfs-jar-path",
           "spark.eventLog.enabled": "true",
           "spark.eventLog.dir": "hdfs://m0.unidata:9000/history",
           "spark.app.name": "custom-name",
           "spark.submit.deployMode": "cluster",
           "spark.master": "spark://m0.unidata:6066",
           "spark.driver.supervise": "false",
           "spark.executor.memory": "10g",
           "spark.executor.cores": "5"
         },
         "environmentVariables": { "SPARK_ENV_LOADED": "1" }
       }
  7. Check the running status of a Spark job:
    http://spark-cluster-ip:6066/v1/submissions/status/driver-yyyymmddhhmmss-XXXX
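A shell sketch tying items 6 and 7 together; the host, main class, jar path, and driver id are the placeholders from the notes, and the JSON body is abridged (use the full body from item 6):

    # submit the job
    curl -X POST http://spark-cluster-ip:6066/v1/submissions/create \
         -H "Content-Type: application/json;charset=UTF-8" \
         -d '{"action":"CreateSubmissionRequest","clientSparkVersion":"1.5.0",
              "mainClass":"class-package","appResource":"hdfs-jar-path",
              "appArgs":["2018-07","2018-07"],
              "sparkProperties":{"spark.master":"spark://m0.unidata:6066",
                                 "spark.submit.deployMode":"cluster",
                                 "spark.jars":"hdfs-jar-path"},
              "environmentVariables":{"SPARK_ENV_LOADED":"1"}}'

    # the response contains a submissionId of the form driver-yyyymmddhhmmss-XXXX;
    # use it to poll the driver state
    curl http://spark-cluster-ip:6066/v1/submissions/status/driver-yyyymmddhhmmss-XXXX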
2. To upload the Spark jar file to HDFS, use an hdfs explorer tool or upload it programmatically; it is best not to use FTP (the jar tends to arrive incomplete). A CLI sketch is below.
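A quick CLI alternative (the target directory and jar name are placeholders):

    ./hdfs dfs -mkdir -p /spark/jars
    ./hdfs dfs -put -f my-job-jar-with-dependencies.jar /spark/jars/
    ./hdfs dfs -ls /spark/jars    # verify the size matches the local jar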
3. Swapping the NameNode and restarting HDFS
  1. Update core-site.xml, the excludes file, hdfs-site.xml, and slaves.
  2. Reformat: hadoop namenode -format (this wipes the HDFS metadata).
  3. If jps shows no DataNode process afterwards, see the fix sketched below.
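A sketch of the usual fix, assuming typical dfs.namenode.name.dir / dfs.datanode.data.dir locations (check hdfs-site.xml for the real paths):

    # reformatting gives the NameNode a new clusterID, so DataNodes refuse to start;
    # the DataNode log then reports "Incompatible clusterIDs"
    tail -n 50 $HADOOP_HOME/logs/hadoop-*-datanode-*.log

    # either copy the NameNode's clusterID into the DataNode's VERSION file...
    cat /data/hadoop/dfs/name/current/VERSION   # on the NameNode: note clusterID=...
    vi /data/hadoop/dfs/data/current/VERSION    # on the DataNode: set the same clusterID

    # ...or, on a test cluster only (destroys block data), wipe the data dir and restart
    rm -rf /data/hadoop/dfs/data/*
    ./hadoop-daemon.sh start datanode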
4. Changing the HDFS replication factor
  1. Set the replication factor of a path to 1:
    ./hdfs dfs -setrep -R 1 filepath
  2. ./hdfs dfs -setrep -w 1 /    (-w waits until re-replication finishes)
  3. Check replication factors: ./hdfs dfs -ls / (for files, the second column of the listing is the replication factor)
  4. File health check: ./hdfs fsck /
5. Starting a single DataNode
  1. On that machine, run from the sbin directory under the Hadoop installation:
    ./hadoop-daemon.sh start datanode
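To confirm the daemon actually came up:

    jps                        # a DataNode process should now be listed
    ./hdfs dfsadmin -report    # run from $HADOOP_HOME/bin; the node should appear as a live DataNode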
6. Deleting files on HDFS
  1. ./hdfs dfs -rmr -skipTrash file_path (-rmr is deprecated on newer releases; use ./hdfs dfs -rm -r -skipTrash file_path there)