The YARN-based MapReduce cluster setup follows the official documentation: https://hadoop.apache.org/docs/r2.6.5/hadoop-yarn/hadoop-yarn-site/ResourceManagerHA.html
Background
MapReduce
For MapReduce basics, see this article: https://blog.csdn.net/luzhensmart/article/details/90202313
Yarn
For YARN basics, see this article: https://www.jianshu.com/p/3f406cf438be
Server Preparation
The environment used in this post builds on the previous article [Hadoop Study Notes 3: High-Availability Cluster Setup (Hadoop 2.x)] https://www.jianshu.com/p/666ff9bbf784. The server plan for the YARN-based MapReduce cluster is shown in the figure below.
I. Passwordless Login
The two ResourceManager nodes sit at the same level as the NameNodes of the HA cluster: a failover between active and standby may occur at any time, so the two nodes must be able to log in to each other without a password. Per the plan above, configure passwordless login between node03 and node04.
On node03, in the .ssh directory:
ssh-keygen -t dsa -P '' -f ./id_dsa
cat id_dsa.pub >> authorized_keys
scp id_dsa.pub node04:`pwd`/node03.pub
On node04, in the .ssh directory:
cat node03.pub >> authorized_keys
ssh-keygen -t dsa -P '' -f ./id_dsa
cat id_dsa.pub >> authorized_keys
scp id_dsa.pub node03:`pwd`/node04.pub
Back on node03, in the .ssh directory:
cat node04.pub >> authorized_keys
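To verify the keys took effect, each node should now reach the other without a password prompt; a quick check:
# Run on node03: should print node04's hostname with no password prompt
ssh node04 hostname
# And the reverse on node04:
ssh node03 hostname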
II. Configuration
1. mapred-site.xml
# Rename the template shipped with Hadoop
mv mapred-site.xml.template mapred-site.xml
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
# The setting above follows the YARN section of the official Single Node Cluster guide
2. yarn-site.xml
# Hand the shuffle phase over to YARN as an auxiliary service
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
# The shuffle setting follows the official Single Node Cluster guide
# Below is the minimal ResourceManager HA configuration given in the official docs
<property>
<name>yarn.resourcemanager.ha.enabled</name>
<value>true</value>
</property>
<property>
<name>yarn.resourcemanager.cluster-id</name>
<value>cluster1</value>
</property>
<property>
<name>yarn.resourcemanager.ha.rm-ids</name>
<value>rm1,rm2</value>
</property>
<property>
<name>yarn.resourcemanager.hostname.rm1</name>
<value>node03</value>
</property>
<property>
<name>yarn.resourcemanager.hostname.rm2</name>
<value>node04</value>
</property>
<property>
<name>yarn.resourcemanager.zk-address</name>
<value>node02:2181,node03:2181,node04:2181</value>
</property>
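Once both ResourceManagers are running (section III below), either one can be asked which is active; a quick check using the rm1/rm2 ids defined above:
yarn rmadmin -getServiceState rm1
yarn rmadmin -getServiceState rm2
# One should report "active", the other "standby"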
3. Distribute the two files to node02, node03, and node04
scp mapred-site.xml yarn-site.xml node02:`pwd`
scp mapred-site.xml yarn-site.xml node03:`pwd`
scp mapred-site.xml yarn-site.xml node04:`pwd`
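With more worker nodes this gets repetitive; a small loop does the same thing, assuming it is run from the Hadoop conf directory holding the two files:
for n in node02 node03 node04; do
scp mapred-site.xml yarn-site.xml ${n}:`pwd`
done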
III. Start the Cluster
1. Start ZooKeeper
# Run on node02, node03, and node04
zkServer.sh start
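To confirm the ensemble formed, check each node's role; one should report leader and the others follower:
zkServer.sh status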
2. Start HDFS
# Run on node01
start-dfs.sh
# Note: do not use the start-all.sh script
# If nn1 and nn2 did not come up, start them manually on node01 and node02:
hadoop-daemon.sh start namenode
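To confirm which NameNode is active, query the nn1/nn2 service ids from the HA setup in the previous post:
hdfs haadmin -getServiceState nn1
hdfs haadmin -getServiceState nn2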
3. Start YARN
# Start the NodeManagers
start-yarn.sh
# On node03 and node04, start the ResourceManagers
yarn-daemon.sh start resourcemanager
Visit in a browser: node03:8088 and node04:8088 (the standby ResourceManager's web UI redirects to the active one)
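A quick sanity check is jps on each node; roughly:
# On node03 and node04: expect a ResourceManager process
jps | grep ResourceManager
# On the worker nodes: expect a NodeManager process
jps | grep NodeManager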
IV. Stop the Cluster
# Run on node01
stop-dfs.sh
# Stop the NodeManagers from node01
stop-yarn.sh
# Stop the ResourceManagers on node03 and node04
yarn-daemon.sh stop resourcemanager
# Stop ZooKeeper on node02, node03, and node04
zkServer.sh stop
V. Test the Compute Capability
# test.txt was uploaded back in [Hadoop Study Notes 2: Fully-Distributed Setup (Hadoop 1.x)]
hadoop jar hadoop-mapreduce-examples-2.6.5.jar wordcount test.txt /wordcount
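Note that the job fails if the output directory already exists; to rerun the example, remove it first:
hdfs dfs -rm -r /wordcount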
Job progress can be tracked in the browser at:
http://node04:8088/cluster
19/12/01 04:40:46 INFO client.ConfiguredRMFailoverProxyProvider: Failing over to rm2
19/12/01 04:40:47 INFO input.FileInputFormat: Total input paths to process : 1
19/12/01 04:40:47 INFO mapreduce.JobSubmitter: number of splits:2
19/12/01 04:40:49 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1575143975107_0004
19/12/01 04:40:49 INFO impl.YarnClientImpl: Submitted application application_1575143975107_0004
19/12/01 04:40:50 INFO mapreduce.Job: The url to track the job: http://node04:8088/proxy/application_1575143975107_0004/
19/12/01 04:40:50 INFO mapreduce.Job: Running job: job_1575143975107_0004
19/12/01 04:41:49 INFO mapreduce.Job: Job job_1575143975107_0004 running in uber mode : false
19/12/01 04:41:49 INFO mapreduce.Job: map 0% reduce 0%
19/12/01 04:43:30 INFO mapreduce.Job: map 33% reduce 0%
19/12/01 04:43:31 INFO mapreduce.Job: map 50% reduce 0%
19/12/01 04:46:44 INFO mapreduce.Job: map 50% reduce 17%
19/12/01 04:46:46 INFO mapreduce.Job: map 100% reduce 17%
19/12/01 04:46:50 INFO mapreduce.Job: map 100% reduce 43%
19/12/01 04:46:53 INFO mapreduce.Job: map 100% reduce 100%
19/12/01 04:46:55 INFO mapreduce.Job: Job job_1575143975107_0004 completed successfully
View the result:
hdfs dfs -cat /wordcount/part-r-00000
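Besides part-r-00000, a successful job also leaves a _SUCCESS marker in the output directory; list everything with:
hdfs dfs -ls /wordcount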