Environment Overview
hadoop001:192.168.199.102
hadoop002:192.168.199.103
hadoop003:192.168.199.104
Hostname configuration:
sudo vi /etc/sysconfig/network
NETWORKING=yes
HOSTNAME=hadoop001
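To apply the new hostname without a reboot (assuming a CentOS 6-style system, which is what /etc/sysconfig/network implies), it can also be set for the current session and then verified:
sudo hostname hadoop001
hostname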
Hostname-to-IP mappings: sudo vi /etc/hosts
192.168.199.102 hadoop001
192.168.199.103 hadoop002
192.168.199.104 hadoop003
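A quick check from any of the three nodes confirms the mappings resolve:
ping -c 1 hadoop002
ping -c 1 hadoop003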
Role assignment per node:
hadoop001: NameNode / DataNode / ResourceManager / NodeManager
hadoop002: DataNode / NodeManager
hadoop003: DataNode / NodeManager
Prerequisites
1) Passwordless SSH login
Run on every machine: ssh-keygen -t rsa
2) With hadoop001 as the primary machine, copy its public key to every node:
ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop001
ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop002
ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop003
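Each of the following should now print the remote hostname without asking for a password:
ssh hadoop002 hostname
ssh hadoop003 hostname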
3) JDK installation
On hadoop001, extract the JDK tarball and add JAVA_HOME to the system environment variables.
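As a sketch, the ~/.bash_profile entries would look like this (the JDK path matches the one used in hadoop-env.sh below; adjust to the actual install location):
export JAVA_HOME=/usr/hadoop/app/jdk1.7.0_79
export PATH=$JAVA_HOME/bin:$PATH
Verify with: java -version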
Cluster Setup
1) Hadoop environment configuration
Hadoop installation
On hadoop001, extract the Hadoop tarball and add HADOOP_HOME to the system environment variables.
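For example (the path assumes the CDH tarball location used in the final run command; adjust as needed):
export HADOOP_HOME=/usr/hadoop/hadoop-2.6.0-cdh5.7.0
export PATH=$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH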
Configure hadoop-env.sh (this and the following config files live under $HADOOP_HOME/etc/hadoop/):
export JAVA_HOME=/usr/hadoop/app/jdk1.7.0_79
Configure core-site.xml:
<property>
<name>fs.default.name</name>
<value>hdfs://hadoop001:8020</value>
</property>
Configure hdfs-site.xml:
<property>
<name>dfs.namenode.name.dir</name>
<value>/home/hadoop/app/tmp/dfs/name</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>/home/hadoop/app/tmp/dfs/data</value>
</property>
Configure yarn-site.xml:
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.log-aggregation-enable</name>
<value>true</value>
</property>
<!-- The only difference between fully distributed and pseudo-distributed mode is the property below -->
<property>
<name>yarn.resourcemanager.hostname</name>
<value>hadoop001</value>
</property>
Configure mapred-site.xml:
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
Configure the slaves file ($HADOOP_HOME/etc/hadoop/slaves) with all cluster nodes; this is another point where the fully distributed setup differs from pseudo-distributed:
hadoop001
hadoop002
hadoop003
2) Distribute the Hadoop installation and configuration to the other nodes:
scp -r ~/app hadoop@hadoop002:~/
scp -r ~/app hadoop@hadoop003:~/
scp ~/.bash_profile hadoop@hadoop002:~/
scp ~/.bash_profile hadoop@hadoop003:~/
On hadoop002 and hadoop003, apply the profile changes: source ~/.bash_profile
3) Format the NameNode; this only needs to be run on the master node:
hdfs namenode -format
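On success the output should contain a line similar to the following (using the name dir configured in hdfs-site.xml above):
Storage directory /home/hadoop/app/tmp/dfs/name has been successfully formatted.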
4) Start the cluster; run this on hadoop001 only:
sbin/start-all.sh
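Note that start-all.sh is deprecated in Hadoop 2.x; the equivalent is to start HDFS and YARN separately:
sbin/start-dfs.sh
sbin/start-yarn.sh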
5) Verification
Method 1: jps
hadoop001 should show 5 processes:
SecondaryNameNode
DataNode
NodeManager
NameNode
ResourceManager
hadoop002 should show 2 processes:
NodeManager
DataNode
hadoop003 should show 2 processes:
NodeManager
DataNode
Method 2: Web UI
http://hadoop001:50070 (NameNode UI)
http://hadoop002:50075 (DataNode UI)
http://hadoop003:50075 (DataNode UI)
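The YARN ResourceManager also serves a web UI on hadoop001, by default on port 8088, where the three registered NodeManagers should appear:
http://hadoop001:8088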
6) Stop the cluster
sbin/stop-all.sh
Cluster Usage
HDFS
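A few basic commands to exercise HDFS from hadoop001 (standard hadoop fs usage; /test is just an illustrative path):
hadoop fs -mkdir -p /test
hadoop fs -put /etc/hosts /test/
hadoop fs -ls /test
hadoop fs -text /test/hosts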
YARN
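To exercise YARN end to end, the bundled MapReduce examples jar can be submitted (the jar name below assumes the standard CDH 5.7.0 layout under $HADOOP_HOME/share/hadoop/mapreduce):
hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0-cdh5.7.0.jar pi 2 3
A successful run prints an estimate of Pi and shows up on the ResourceManager UI.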
Running a hands-on Hadoop project on the cluster
Upload the input data and the application jar to the hadoop001 master node and run the job there.
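For example (a sketch assuming the jar was built locally; /xxx stands for the HDFS input directory used in the run command below):
scp bigdata-1.1.1.jar hadoop@hadoop001:/usr/hadoop/hadoop-2.6.0-cdh5.7.0/lib/
hadoop fs -mkdir -p /xxx
hadoop fs -put test.txt /xxx/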
Run the job with:
hadoop jar /usr/hadoop/hadoop-2.6.0-cdh5.7.0/lib/bigdata-1.1.1.jar com.sc.mapreduce.WordCountApp hdfs://hadoop001:8020/xxx/test.txt hdfs://hadoop001:8020/xxx/output/wc
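When the job finishes, the word counts can be inspected with (part file names follow the standard MapReduce output convention):
hadoop fs -ls hdfs://hadoop001:8020/xxx/output/wc
hadoop fs -text hdfs://hadoop001:8020/xxx/output/wc/part*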