1. Prepare 3 client machines (firewall disabled, static IPs, hostnames set)
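A minimal prep sketch, run as root on each machine (hadoop102 shown). This assumes CentOS 7; the ifcfg-ens33 file name depends on your VM's network interface:
[root@hadoop102 ~]$ systemctl stop firewalld
[root@hadoop102 ~]$ systemctl disable firewalld
[root@hadoop102 ~]$ hostnamectl set-hostname hadoop102
[root@hadoop102 ~]$ vi /etc/sysconfig/network-scripts/ifcfg-ens33   # BOOTPROTO=static plus IPADDR/GATEWAY/DNS1
[root@hadoop102 ~]$ vi /etc/hosts   # map every cluster IP to its hostname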
2. Distribute Java and Hadoop
Distribute the /opt/module directory and the /etc/profile config file from hadoop101 to hadoop102/103/104.
Switch to the root account when distributing /etc/profile.
[atguigu@hadoop101 opt]$ scp -r hadoop101:/opt/module hadoop102:/opt/
[atguigu@hadoop101 opt]$ scp -r hadoop101:/opt/module hadoop103:/opt/
[atguigu@hadoop101 opt]$ scp -r hadoop101:/opt/module hadoop104:/opt/
[atguigu@hadoop101 opt]$ su - root
[root@hadoop101 ~]$ scp -r hadoop101:/etc/profile hadoop102:/etc/
[root@hadoop101 ~]$ scp -r hadoop101:/etc/profile hadoop103:/etc/
[root@hadoop101 ~]$ scp -r hadoop101:/etc/profile hadoop104:/etc/
Source the profile on each machine so the Java environment variables take effect:
[atguigu@hadoop102 ~]$ source /etc/profile
[atguigu@hadoop103 ~]$ source /etc/profile
[atguigu@hadoop104 ~]$ source /etc/profile
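To confirm the distribution worked, each machine should now report the same versions:
[atguigu@hadoop102 ~]$ java -version
[atguigu@hadoop102 ~]$ hadoop version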
3. Cluster plan (minimal layout), as implied by the configs below:
hadoop102: NameNode, DataNode, NodeManager
hadoop103: ResourceManager, DataNode, NodeManager
hadoop104: SecondaryNameNode, DataNode, NodeManager
4. Edit the configuration files
4.1 Core config file: core-site.xml
[atguigu@hadoop102 hadoop-2.7.2]$ vi etc/hadoop/core-site.xml
<!-- Address of the HDFS NameNode -->
<property>
    <name>fs.defaultFS</name>
    <value>hdfs://hadoop102:9000</value>
</property>
<!-- Directory for files Hadoop generates at runtime -->
<property>
    <name>hadoop.tmp.dir</name>
    <value>/opt/module/hadoop-2.7.2/data/tmp</value>
</property>
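A quick check that the value was picked up (hdfs getconf reads core-site.xml); it should print hdfs://hadoop102:9000:
[atguigu@hadoop102 hadoop-2.7.2]$ hdfs getconf -confKey fs.defaultFS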
4.2 HDFS config files
[atguigu@hadoop102 hadoop-2.7.2]$ vi etc/hadoop/hadoop-env.sh
export JAVA_HOME=/opt/module/jdk1.8.0_144
[atguigu@hadoop102 hadoop-2.7.2]$ vi etc/hadoop/hdfs-site.xml
<!-- Number of block replicas -->
<property>
    <name>dfs.replication</name>
    <value>3</value>
</property>
<!-- Host of the SecondaryNameNode -->
<property>
    <name>dfs.namenode.secondary.http-address</name>
    <value>hadoop104:50090</value>
</property>
4.3 YARN config files
[atguigu@hadoop102 hadoop-2.7.2]$ vi etc/hadoop/yarn-env.sh
export JAVA_HOME=/opt/module/jdk1.8.0_144
[atguigu@hadoop102 hadoop-2.7.2]$ vi etc/hadoop/yarn-site.xml
<!-- How reducers fetch data -->
<property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
</property>
<!-- Host of the YARN ResourceManager -->
<property>
    <name>yarn.resourcemanager.hostname</name>
    <value>hadoop103</value>
</property>
4.4 MapReduce config files
[atguigu@hadoop102 hadoop-2.7.2]$ vi etc/hadoop/mapred-env.sh
export JAVA_HOME=/opt/module/jdk1.8.0_144
[atguigu@hadoop102 hadoop-2.7.2]$ cp etc/hadoop/mapred-site.xml.template etc/hadoop/mapred-site.xml
[atguigu@hadoop102 hadoop-2.7.2]$ vi etc/hadoop/mapred-site.xml
<!-- Run MapReduce on YARN -->
<property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
</property>
5. Distribute the configuration files
[atguigu@hadoop102 hadoop-2.7.2]$ scp -r hadoop102:/opt/module/hadoop-2.7.2/etc/hadoop/ hadoop103:/opt/module/hadoop-2.7.2/etc/
[atguigu@hadoop102 hadoop-2.7.2]$ scp -r hadoop102:/opt/module/hadoop-2.7.2/etc/hadoop/ hadoop104:/opt/module/hadoop-2.7.2/etc/
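When the configs change repeatedly, a small loop saves retyping; a sketch assuming rsync is installed on every node (plain scp in the same loop works too):
[atguigu@hadoop102 hadoop-2.7.2]$ for host in hadoop103 hadoop104; do
>   rsync -av /opt/module/hadoop-2.7.2/etc/hadoop/ $host:/opt/module/hadoop-2.7.2/etc/hadoop/
> done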
6. Start the cluster one daemon at a time
6.0 If this is the cluster's first start, format the NameNode
[atguigu@hadoop102 hadoop-2.7.2]$ hdfs namenode -format
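Note: formatting more than once causes a clusterID mismatch between the NameNode and the DataNodes. If you ever need to reformat, stop all daemons first and clear the data and log directories on every node, along these lines:
[atguigu@hadoop102 hadoop-2.7.2]$ rm -rf data/ logs/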
6.1 Start the NameNode on hadoop102
[atguigu@hadoop102 hadoop-2.7.2]$ hadoop-daemon.sh start namenode
6.2 Start a DataNode on each of the three nodes
[atguigu@hadoop102 hadoop-2.7.2]$ hadoop-daemon.sh start datanode
[atguigu@hadoop102 hadoop-2.7.2]$ jps
15761 Jps
15609 NameNode
15690 DataNode
[atguigu@hadoop103 hadoop-2.7.2]$ hadoop-daemon.sh start datanode
[atguigu@hadoop103 hadoop-2.7.2]$ jps
15250 DataNode
15321 Jps
[atguigu@hadoop104 hadoop-2.7.2]$ hadoop-daemon.sh start datanode
[atguigu@hadoop104 hadoop-2.7.2]$ jps
15253 DataNode
15324 Jps
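Besides jps, the NameNode's own report confirms that all three DataNodes registered (look for "Live datanodes (3)"):
[atguigu@hadoop102 hadoop-2.7.2]$ hdfs dfsadmin -report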
6.3 Verify in the HDFS web UI (http://hadoop102:50070)
7. Configure passwordless SSH
Run the following on each of the 3 machines (ssh-keygen creates ~/.ssh if it does not exist; press Enter through the prompts):
$ ssh-keygen -t rsa
$ ssh-copy-id hadoop102
$ ssh-copy-id hadoop103
$ ssh-copy-id hadoop104
Verify from each of the 3 machines; if none of these prompts for a password, it worked:
$ ssh hadoop102
$ exit
$ ssh hadoop103
$ exit
$ ssh hadoop104
$ exit
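An equivalent one-liner from each machine; if it prints the three hostnames without ever asking for a password, passwordless ssh is in place:
$ for host in hadoop102 hadoop103 hadoop104; do ssh $host hostname; done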
8. Start the cluster as a whole
8.1 List the worker nodes in the slaves file (it drives both DataNode and NodeManager startup; no trailing spaces or blank lines are allowed)
[atguigu@hadoop102 hadoop-2.7.2]$ vi etc/hadoop/slaves
hadoop102
hadoop103
hadoop104
Distribute it to the other two VMs:
[atguigu@hadoop102 hadoop-2.7.2]$ scp -r hadoop102:/opt/module/hadoop-2.7.2/etc/hadoop/slaves hadoop103:/opt/module/hadoop-2.7.2/etc/hadoop/
[atguigu@hadoop102 hadoop-2.7.2]$ scp -r hadoop102:/opt/module/hadoop-2.7.2/etc/hadoop/slaves hadoop104:/opt/module/hadoop-2.7.2/etc/hadoop/
8.2 Start HDFS and YARN cluster-wide
Start HDFS on hadoop102 (the NameNode):
[atguigu@hadoop102 hadoop-2.7.2]$ start-dfs.sh
Start YARN on hadoop103; start-yarn.sh must be run on the ResourceManager node:
[atguigu@hadoop103 hadoop-2.7.2]$ start-yarn.sh
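If startup succeeded, jps on each node should show the daemons from the cluster plan (PIDs will differ):
hadoop102: NameNode, DataNode, NodeManager
hadoop103: ResourceManager, DataNode, NodeManager
hadoop104: SecondaryNameNode, DataNode, NodeManager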
8.3 Check the web UIs
HDFS NameNode: http://hadoop102:50070
YARN ResourceManager: http://hadoop103:8088
9. Run wordcount
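The put below assumes a local wcinput directory already exists under hadoop-2.7.2; a minimal sketch to create one (the file name wc.input and its contents are placeholders, chosen to match the counts shown below):
[atguigu@hadoop102 hadoop-2.7.2]$ mkdir wcinput
[atguigu@hadoop102 hadoop-2.7.2]$ echo "hadoop yarn hadoop mapreduce atguigu atguigu hadoop hdfs" > wcinput/wc.input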
[atguigu@hadoop102 hadoop-2.7.2]$ hadoop fs -put wcinput /
[atguigu@hadoop102 hadoop-2.7.2]$ hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.2.jar wordcount /wcinput /wcoutput
[atguigu@hadoop102 hadoop-2.7.2]$ hadoop fs -cat /wcoutput/*
atguigu 2
hadoop 3
hdfs 1
mapreduce 1
yarn 1
10. Enable log aggregation
Log aggregation solves this problem: on a large cluster, when a YARN job fails you cannot easily tell which node caused the failure. With log aggregation, logs from every node are collected onto the history server and can be browsed through the YARN web UI.
10.0 Stop the Hadoop cluster
[atguigu@hadoop103 hadoop-2.7.2]$ stop-yarn.sh
[atguigu@hadoop102 hadoop-2.7.2]$ stop-dfs.sh
10.1 Configure the history server
[atguigu@hadoop102 hadoop-2.7.2]$ vi etc/hadoop/mapred-site.xml
Add the following to mapred-site.xml:
<!-- History server RPC address -->
<property>
    <name>mapreduce.jobhistory.address</name>
    <value>hadoop104:10020</value>
</property>
<!-- History server web UI address -->
<property>
    <name>mapreduce.jobhistory.webapp.address</name>
    <value>hadoop104:19888</value>
</property>
[atguigu@hadoop102 hadoop-2.7.2]$ vi etc/hadoop/yarn-site.xml
Add the following to yarn-site.xml:
<!-- Enable log aggregation -->
<property>
    <name>yarn.log-aggregation-enable</name>
    <value>true</value>
</property>
<!-- Keep aggregated logs for 7 days (604800 seconds) -->
<property>
    <name>yarn.log-aggregation.retain-seconds</name>
    <value>604800</value>
</property>
10.2 Sync the config files to the other nodes
[atguigu@hadoop102 hadoop-2.7.2]$ scp -r hadoop102:/opt/module/hadoop-2.7.2/etc/hadoop hadoop103:/opt/module/hadoop-2.7.2/etc
[atguigu@hadoop102 hadoop-2.7.2]$ scp -r hadoop102:/opt/module/hadoop-2.7.2/etc/hadoop hadoop104:/opt/module/hadoop-2.7.2/etc
10.3 Restart the Hadoop cluster and start the history server
Start HDFS on hadoop102 (the NameNode):
[atguigu@hadoop102 hadoop-2.7.2]$ start-dfs.sh
Start YARN on hadoop103 (the ResourceManager node):
[atguigu@hadoop103 hadoop-2.7.2]$ start-yarn.sh
Start the history server on hadoop104:
[atguigu@hadoop104 hadoop-2.7.2]$ mr-jobhistory-daemon.sh start historyserver
hadoop104 now shows the JobHistoryServer process:
[atguigu@hadoop104 hadoop-2.7.2]$ jps
8097 DataNode
8276 NodeManager
8424 JobHistoryServer
8472 Jps
8201 SecondaryNameNode
10.4 Re-run the wordcount job and check the history and logs (the output path must not already exist, so /historytest is used instead of /wcoutput)
[atguigu@hadoop102 hadoop-2.7.2]$ hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.2.jar wordcount /wcinput /historytest
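The job history should now appear at http://hadoop104:19888 (or via the job's History link on http://hadoop103:8088), and aggregated logs can also be fetched from the command line; the application ID below is a placeholder for the one shown in the ResourceManager UI:
[atguigu@hadoop102 hadoop-2.7.2]$ yarn logs -applicationId application_XXXXXXXXXXXXX_XXXX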
The log page still wouldn't open, though. It turned out I hadn't configured the hosts file on my PC, so the browser couldn't resolve the cluster hostnames. Ha, no cutting corners!
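For reference, the PC-side hosts file (C:\Windows\System32\drivers\etc\hosts on Windows, /etc/hosts on Linux/macOS) needs the same hostname mappings as the cluster; the IPs below are placeholders for your VMs' static addresses:
192.168.1.102 hadoop102
192.168.1.103 hadoop103
192.168.1.104 hadoop104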