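1 Configure the Java environment
- Check the installed Java version
rpm -qa|grep java
- Find the Java installation path
which java
- Set JAVA_HOME in hadoop-env.sh: copy the output of the previous command, paste it as the value of JAVA_HOME, and delete everything from /bin onward (if the path is a symlink, resolve it first with readlink -f)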
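The manual copy-and-trim step above can be sketched in shell. This is a minimal sketch, not the only way to do it: the helper name java_home_from_bin is hypothetical, and readlink -f is assumed to be available (GNU coreutils).

```shell
# Strip the trailing /bin/java from a path to get the JDK root directory.
java_home_from_bin() {
    printf '%s\n' "${1%/bin/java}"
}

if command -v java >/dev/null 2>&1; then
    # readlink -f follows symlinks such as /usr/bin/java to the real binary.
    java_bin=$(readlink -f "$(command -v java)")
    # Print the line to paste into hadoop-env.sh.
    echo "export JAVA_HOME=$(java_home_from_bin "$java_bin")"
fi
```

The printed export line is what goes into hadoop-env.sh; review it before pasting, since the resolved path depends on how Java was installed.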
2 Configure the four Hadoop modules: common, hdfs, yarn, mapreduce
- Configure common: core-site.xml
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost:8020</value>
</property>
<!-- configure temp directory -->
<property>
<name>hadoop.tmp.dir</name>
<value>/opt/module/hadoop-3.1.2/data/tmp</value>
</property>
</configuration>
- Configure hdfs: hdfs-site.xml
Set the replication factor
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
3 Start the file system
- Format the file system
bin/hdfs namenode -format
- Start the NameNode, DataNode, and SecondaryNameNode
sbin/start-dfs.sh
jps # list the running processes
9809 SecondaryNameNode
3752 DataNode
4171 Jps
3647 NameNode
netstat -ntlp # show the ports the processes are listening on
Proto Recv-Q Send-Q Local Address Foreign Address State PID/Program name
tcp 0 0 0.0.0.0:9864 0.0.0.0:* LISTEN 3752/java
tcp 0 0 0.0.0.0:9866 0.0.0.0:* LISTEN 3752/java
tcp 0 0 0.0.0.0:9867 0.0.0.0:* LISTEN 3752/java
tcp 0 0 0.0.0.0:9868 0.0.0.0:* LISTEN 3922/java
tcp 0 0 0.0.0.0:9870 0.0.0.0:* LISTEN 3647/java
tcp 0 0 0.0.0.0:111 0.0.0.0:* LISTEN -
tcp 0 0 192.168.1.8:8020 0.0.0.0:* LISTEN 3647/java
tcp 0 0 0.0.0.0:22 0.0.0.0:* LISTEN -
tcp 0 0 127.0.0.1:25 0.0.0.0:* LISTEN -
tcp 0 0 127.0.0.1:33530 0.0.0.0:* LISTEN 3752/java
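To read the netstat output above, it helps to map each port to the daemon that owns it. The sketch below lists the Hadoop 3 default ports together with the 8020 RPC port set in core-site.xml; the function name hadoop_port_role is hypothetical, and the mapping only holds if the defaults have not been overridden.

```shell
# Map a listening port to its Hadoop 3 role (default port assignments).
hadoop_port_role() {
    case "$1" in
        8020) echo "NameNode RPC (fs.defaultFS)" ;;
        9870) echo "NameNode web UI" ;;
        9864) echo "DataNode web UI" ;;
        9866) echo "DataNode data transfer" ;;
        9867) echo "DataNode IPC" ;;
        9868) echo "SecondaryNameNode web UI" ;;
        8088) echo "ResourceManager web UI" ;;
        *)    echo "unknown" ;;
    esac
}

# Print the role of each HDFS port seen in the netstat listing.
for port in 8020 9864 9866 9867 9868 9870; do
    printf '%s\t%s\n' "$port" "$(hadoop_port_role "$port")"
done
```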
- Check the NameNode status
http://ip:9870/
- Create the user's HDFS home directory
$ bin/hdfs dfs -mkdir /user
$ bin/hdfs dfs -mkdir /user/<username>
- Copy the input files into the distributed file system
$ bin/hdfs dfs -put etc/hadoop input
- Run the example
$ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.1.2.jar grep input output 'dfs[a-z.]+'
- Fetch the output and view it
$ bin/hdfs dfs -get output output
$ cat output/*
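For intuition about what the grep example computes, here is a rough local equivalent: it counts every match of the regular expression and lists the matches by descending count. It is only a sketch with ordinary shell tools, not the MapReduce job itself; the function name local_grep_count is hypothetical, and it reads only the *.xml files rather than the whole etc/hadoop directory.

```shell
# local_grep_count REGEX FILE...: count each distinct match of REGEX,
# printing "count<TAB>match" with the most frequent match first.
local_grep_count() {
    regex=$1
    shift
    grep -ohE "$regex" "$@" | sort | uniq -c | sort -rn |
        awk '{ print $1 "\t" $2 }'
}

# Run against the local config files if this is a Hadoop install directory.
if [ -d etc/hadoop ]; then
    local_grep_count 'dfs[a-z.]+' etc/hadoop/*.xml
fi
```

Comparing this local result with cat output/* is a quick sanity check that the job read the expected input.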
- View the results in the browser
4 Start YARN
- Configure yarn: yarn-site.xml
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.env-whitelist</name>
<value>JAVA_HOME,HADOOP_COMMON_HOME,HADOOP_HDFS_HOME,HADOOP_CONF_DIR,CLASSPATH_PREPEND_DISTCACHE,HADOOP_YARN_HOME,HADOOP_MAPRED_HOME</value>
</property>
- Configure mapreduce: mapred-site.xml
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>mapreduce.application.classpath</name>
<value>$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/*:$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/lib/*</value>
</property>
- Start YARN
sbin/start-yarn.sh
jps # list the running processes
3922 SecondaryNameNode
6276 Jps
3752 DataNode
6056 ResourceManager
3647 NameNode
6191 NodeManager
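Rather than reading the jps listing by eye, the check for the five expected daemons can be scripted. The sketch below is one possible approach; the function name missing_daemons is hypothetical, and it assumes jps prints one "PID Name" pair per line as in the listing above.

```shell
# Read jps output on stdin and print the name of every expected daemon
# that does not appear in it (no output means all five are running).
missing_daemons() {
    jps_output=$(cat)
    for daemon in NameNode DataNode SecondaryNameNode ResourceManager NodeManager; do
        # Compare against the whole second field so that NameNode does not
        # match SecondaryNameNode.
        if ! printf '%s\n' "$jps_output" |
            awk -v d="$daemon" '$2 == d { found = 1 } END { exit !found }'; then
            printf '%s\n' "$daemon"
        fi
    done
}

# Check the live processes when jps is on the PATH.
if command -v jps >/dev/null 2>&1; then
    jps | missing_daemons
fi
```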
- View the cluster management UI
http://192.168.1.8:8088/cluster
- Run the MapReduce example
Error encountered:
Error:/bin/bash: /bin/java: No such file or directory
/bin/java does not exist, so create it as a symbolic link:
sudo ln -s /opt/module/jdk1.8.0_162/bin/java /bin/java
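The symlink fix can be made safe to repeat by creating the link only when nothing is already at the destination. This is a small sketch; the helper name ensure_link is hypothetical, and writing to /bin requires root privileges.

```shell
# ensure_link TARGET LINK: create LINK pointing at TARGET only if LINK is absent.
ensure_link() {
    # -L also catches a dangling symlink, which -e alone would miss.
    if [ -e "$2" ] || [ -L "$2" ]; then
        echo "$2 already exists, leaving it unchanged"
    else
        ln -s "$1" "$2"
    fi
}

# Example call (run as root), using the JDK path from the command above:
# ensure_link /opt/module/jdk1.8.0_162/bin/java /bin/java
```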
- Shut down
sbin/stop-dfs.sh
sbin/stop-yarn.sh
5 Configuration
- Specify the NameNode host: core-site.xml
<property>
<name>fs.defaultFS</name>
<value>hdfs://hadoop-yarn.cloudyhadoop.com:8020</value>
</property>
- Specify the DataNode hosts: workers
- Specify the SecondaryNameNode host: hdfs-site.xml
<property>
<name>dfs.namenode.secondary.http-address</name>
<value>0.0.0.0:9868</value>
<description>
The secondary namenode http server address and port.
</description>
</property>
- Specify the ResourceManager hostname: yarn-site.xml
<property>
<description>The hostname of the RM.</description>
<name>yarn.resourcemanager.hostname</name>
<value>0.0.0.0</value>
</property>
- Specify the NodeManager hostname: yarn-site.xml
<property>
<description>The hostname of the NM.</description>
<name>yarn.nodemanager.hostname</name>
<value>0.0.0.0</value>
</property>
- Specify the history server host: mapred-site.xml
<property>
<name>mapreduce.jobhistory.admin.address</name>
<value>0.0.0.0:10033</value>
<description>The address of the History server admin interface.</description>
</property>