1. References
2. Getting started
Our cluster plan is as follows:
| | hadoop01 [172.173.16.10] | hadoop02 [172.173.16.11] | hadoop03 [172.173.16.12] |
|---|---|---|---|
| HDFS | NameNode, DataNode | DataNode | SecondaryNameNode, DataNode |
| YARN | NodeManager | ResourceManager, NodeManager | NodeManager |
| PORT | 22, 9000, 50070 | 22 | 22 |
Preparing the image
- Pull it from Docker Hub
docker pull caiserkaiser/centos-ssh
Create a custom network
docker network create -d bridge --subnet "172.173.16.0/24" --gateway "172.173.16.1" datastore_net
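An optional sanity check that the subnet and gateway took effect:
# should show the 172.173.16.0/24 subnet and the 172.173.16.1 gateway
docker network inspect datastore_net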
Start a container
docker run -it -d --network datastore_net --ip 172.173.16.10 --name hadoop01 caiser/centos-ssh:7.8
Download and configure Hadoop
Download Hadoop
- Copy the downloaded archive into the container
docker cp ~/Downloads/hadoop-2.7.2.tar.gz c446857be6c0:/opt/envs
- Configure the Hadoop environment variables
a. Extract the archive
tar -zxvf hadoop-2.7.2.tar.gz
b. Edit /etc/profile
vi /etc/profile
c. Add the following
export HADOOP_HOME=/opt/envs/hadoop-2.7.2
export PATH=$PATH:$HADOOP_HOME/sbin
export PATH=$PATH:$HADOOP_HOME/bin
d. Apply it
source /etc/profile
e. Append it to ~/.bashrc (Hadoop's start scripts log in over SSH with non-login shells, which read ~/.bashrc rather than /etc/profile)
cat /etc/profile >> ~/.bashrc
f. Apply it
source ~/.bashrc
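A quick, optional check that the variables are set and the binaries are on the PATH:
# should print /opt/envs/hadoop-2.7.2 and the Hadoop version banner
echo $HADOOP_HOME
hadoop version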
Hadoop HDFS configuration
core-site.xml
- Back it up
cp /opt/envs/hadoop-2.7.2/etc/hadoop/core-site.xml /opt/envs/hadoop-2.7.2/etc/hadoop/core-site.xml.bak
- Edit core-site.xml
vi /opt/envs/hadoop-2.7.2/etc/hadoop/core-site.xml
- Configure it according to the plan above
<!-- Address of the NameNode in HDFS -->
<property>
    <name>fs.defaultFS</name>
    <value>hdfs://hadoop01:9000</value>
</property>
<!-- Storage directory for temporary files generated by Hadoop at runtime -->
<property>
    <name>hadoop.tmp.dir</name>
    <value>/opt/datas/tmp</value>
</property>
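Note that hadoop.tmp.dir points at /opt/datas/tmp, which does not exist in this image yet. Formatting the NameNode should create it, but pre-creating the directory is a cheap way to rule out permission surprises (optional):
mkdir -p /opt/datas/tmp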
hadoop-env.sh
- Back it up
cp /opt/envs/hadoop-2.7.2/etc/hadoop/hadoop-env.sh /opt/envs/hadoop-2.7.2/etc/hadoop/hadoop-env.sh.bak
- Edit hadoop-env.sh
vi /opt/envs/hadoop-2.7.2/etc/hadoop/hadoop-env.sh
- Set JAVA_HOME
a. Find the following two lines
# The java implementation to use.
export JAVA_HOME=${JAVA_HOME}
b. Then replace them with
export JAVA_HOME=/opt/envs/jdk1.8.0_251
hdfs-site.xml
- Back it up
cp /opt/envs/hadoop-2.7.2/etc/hadoop/hdfs-site.xml /opt/envs/hadoop-2.7.2/etc/hadoop/hdfs-site.xml.bak
- Edit hdfs-site.xml
vi /opt/envs/hadoop-2.7.2/etc/hadoop/hdfs-site.xml
- Configure it according to the plan
<!-- Number of block replicas -->
<property>
    <name>dfs.replication</name>
    <value>3</value>
</property>
<!-- Address of the Hadoop Secondary NameNode -->
<property>
    <name>dfs.namenode.secondary.http-address</name>
    <value>hadoop03:50090</value>
</property>
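With dfs.replication set to 3 and exactly 3 DataNodes in the plan, every block ends up on all three nodes. Once HDFS is running (after start-dfs.sh below), you can verify this with fsck:
# reports blocks, the replication factor, and any missing replicas
hdfs fsck / -files -blocks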
Hadoop YARN configuration
yarn-site.xml
- Back it up
cp /opt/envs/hadoop-2.7.2/etc/hadoop/yarn-site.xml /opt/envs/hadoop-2.7.2/etc/hadoop/yarn-site.xml.bak
- Edit yarn-site.xml
vi /opt/envs/hadoop-2.7.2/etc/hadoop/yarn-site.xml
- Configure it according to the plan above
<!-- How reducers fetch data -->
<property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
</property>
<!-- Address of the ResourceManager in YARN -->
<property>
    <name>yarn.resourcemanager.hostname</name>
    <value>hadoop02</value>
</property>
yarn-env.sh
- Back it up
cp /opt/envs/hadoop-2.7.2/etc/hadoop/yarn-env.sh /opt/envs/hadoop-2.7.2/etc/hadoop/yarn-env.sh.bak
- Edit yarn-env.sh
vi /opt/envs/hadoop-2.7.2/etc/hadoop/yarn-env.sh
- Set JAVA_HOME
a. Find the following two lines
# some Java parameters
# export JAVA_HOME=/home/y/libexec/jdk1.6.0/
b. Then replace them with
export JAVA_HOME=/opt/envs/jdk1.8.0_251
Hadoop MapReduce configuration
mapred-site.xml
- Create it from the bundled template (there is no mapred-site.xml by default)
cp /opt/envs/hadoop-2.7.2/etc/hadoop/mapred-site.xml.template /opt/envs/hadoop-2.7.2/etc/hadoop/mapred-site.xml
- Edit mapred-site.xml
vi /opt/envs/hadoop-2.7.2/etc/hadoop/mapred-site.xml
- Configure it according to the plan above
<!-- Run MapReduce on YARN -->
<property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
</property>
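Once both HDFS and YARN are up (see the start-up steps below), the bundled pi example is a quick way to confirm that MapReduce jobs really run on YARN:
# 2 map tasks, 10 samples each; the job should appear in the ResourceManager UI at hadoop02:8088
hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.2.jar pi 2 10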
mapred-env.sh
- Back it up
cp /opt/envs/hadoop-2.7.2/etc/hadoop/mapred-env.sh /opt/envs/hadoop-2.7.2/etc/hadoop/mapred-env.sh.bak
- Edit mapred-env.sh
vi /opt/envs/hadoop-2.7.2/etc/hadoop/mapred-env.sh
- Set JAVA_HOME
a. Find the following line
# export JAVA_HOME=/home/y/libexec/jdk1.6.0/
b. Then replace it with
export JAVA_HOME=/opt/envs/jdk1.8.0_251
Hadoop node configuration
- Edit /etc/hosts and add the following hostnames
172.173.16.10 hadoop01
172.173.16.11 hadoop02
172.173.16.12 hadoop03
- Edit slaves
vi /opt/envs/hadoop-2.7.2/etc/hadoop/slaves
- List the nodes (no trailing spaces and no blank lines; see the quick check after this list)
hadoop01
hadoop02
hadoop03
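An optional way to confirm that the names resolve and the slaves file is clean:
# each name should resolve to the planned IP
getent hosts hadoop01 hadoop02 hadoop03
# cat -A marks each line end with $, making trailing spaces and blank lines visible
cat -A /opt/envs/hadoop-2.7.2/etc/hadoop/slaves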
Install which
Admittedly, this bit me later when formatting the NameNode, so it is best to install it now; otherwise you will have to install it on every machine afterwards.
yum install which
Save the container as an image and remove it
docker commit c446857be6c0 caiser/hadoop:2.7.2
docker rm c446857be6c0
Start the containers
docker run -it -d --network datastore_net --ip 172.173.16.10 --name hadoop01 caiser/hadoop:2.7.2 /bin/bash
docker run -it -d --network datastore_net --ip 172.173.16.11 --name hadoop02 caiser/hadoop:2.7.2 /bin/bash
docker run -it -d --network datastore_net --ip 172.173.16.12 --name hadoop03 caiser/hadoop:2.7.2 /bin/bash
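An optional check that all three containers are up with the planned IPs:
# prints each container name with its IP on datastore_net
docker inspect -f '{{.Name}} {{range .NetworkSettings.Networks}}{{.IPAddress}}{{end}}' hadoop01 hadoop02 hadoop03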
Configure passwordless SSH login
- Enter the container
docker exec -it hadoop01 /bin/bash
- Generate a key pair under ~/.ssh
ssh-keygen -t rsa
- Copy the key to hadoop01, hadoop02, and hadoop03
a. If the ssh service is not running in the three containers (check with ps -ef | grep ssh), start it in each one first
/usr/sbin/sshd -D &
b. Copy the key
ssh-copy-id hadoop01
ssh-copy-id hadoop02
ssh-copy-id hadoop03
Then repeat steps 1-3 above on hadoop02 and hadoop03 (a quick verification follows below).
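Once all three nodes have exchanged keys, logins between them should no longer prompt for a password; a quick check from any container:
# each command should print the remote hostname without asking for a password
for h in hadoop01 hadoop02 hadoop03; do ssh $h hostname; done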
Start HDFS
Start it on hadoop01.
Since we already added sbin to the PATH, we can run the commands directly.
- Format the NameNode
hdfs namenode -format
- Start it
start-dfs.sh
Output like the following means it worked:
Starting namenodes on [hadoop01]
hadoop01: starting namenode, logging to /opt/envs/hadoop-2.7.2/logs/hadoop-root-namenode-3118b3248ebd.out
hadoop03: starting datanode, logging to /opt/envs/hadoop-2.7.2/logs/hadoop-root-datanode-777368e252cd.out
hadoop02: starting datanode, logging to /opt/envs/hadoop-2.7.2/logs/hadoop-root-datanode-8a20f3cf05a1.out
hadoop01: starting datanode, logging to /opt/envs/hadoop-2.7.2/logs/hadoop-root-datanode-3118b3248ebd.out
Starting secondary namenodes [hadoop03]
hadoop03: starting secondarynamenode, logging to /opt/envs/hadoop-2.7.2/logs/hadoop-root-secondarynamenode-777368e252cd.out
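You can double-check that the running daemons match the plan, either with jps on each container or with a cluster report from the NameNode:
# on each container: jps should list exactly the daemons planned for that node
jps
# on hadoop01: should report 3 live DataNodes
hdfs dfsadmin -report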
Start YARN
Start it on hadoop02, since that is where the ResourceManager is planned to run (start-yarn.sh starts the ResourceManager on the local machine).
Since we already added sbin to the PATH, we can run the command directly.
- Start it
start-yarn.sh
Output like the following means it worked:
starting yarn daemons
starting resourcemanager, logging to /opt/envs/hadoop-2.7.2/logs/yarn-root-resourcemanager-777368e252cd.out
hadoop01: starting nodemanager, logging to /opt/envs/hadoop-2.7.2/logs/yarn-root-nodemanager-3118b3248ebd.out
hadoop02: starting nodemanager, logging to /opt/envs/hadoop-2.7.2/logs/yarn-root-nodemanager-8a20f3cf05a1.out
hadoop03: starting nodemanager, logging to /opt/envs/hadoop-2.7.2/logs/yarn-root-nodemanager-777368e252cd.out
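To confirm that all three NodeManagers registered with the ResourceManager, run this on any node (the ResourceManager web UI at hadoop02:8088 shows the same information):
# should list 3 nodes in RUNNING state
yarn node -list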