假设我们有3台虚拟机,主机名分别是master、slave1和slave2。
这3台虚拟机的Hadoop的HA集群部署如下:
master | slave1 | slave2 |
---|---|---|
Zookeeper | Zookeeper | Zookeeper |
JournalNode | JournalNode | JournalNode |
NodeManager | NodeManager | NodeManager |
DataNode | DataNode | DataNode |
NameNode | NameNode | none |
zkfc | zkfc | none |
ResourceManager | none |
ResourceManager |
集群的启动流程如下:
- 分别在 3 台机器上启动 Zookeeper
[root@master ~]# zkServer.sh start
ZooKeeper JMX enabled by default
Using config: /opt/zookeeper-3.4.11/bin/../conf/zoo.cfg
Starting zookeeper ... STARTED
[root@slave1 ~]# zkServer.sh start
ZooKeeper JMX enabled by default
Using config: /opt/zookeeper-3.4.11/bin/../conf/zoo.cfg
Starting zookeeper ... STARTED
[root@slave2 ~]# zkServer.sh start
ZooKeeper JMX enabled by default
Using config: /opt/zookeeper-3.4.11/bin/../conf/zoo.cfg
Starting zookeeper ... STARTED
- 启动Zookeeper之后,可以分别在3台机器上使用如下命令查看Zookeeper的启动状态:
[root@master ~]# zkServer.sh status
ZooKeeper JMX enabled by default
Using config: /opt/zookeeper-3.4.11/bin/../conf/zoo.cfg
Mode: follower
[root@slave1 ~]# zkServer.sh status
ZooKeeper JMX enabled by default
Using config: /opt/zookeeper-3.4.11/bin/../conf/zoo.cfg
Mode: leader
[root@slave2 ~]# zkServer.sh status
ZooKeeper JMX enabled by default
Using config: /opt/zookeeper-3.4.11/bin/../conf/zoo.cfg
Mode: follower
- 在master机器上启动HDFS:
[root@master ~]# start-dfs.sh
- 在master机器上启动YARN:
[root@master ~]# start-yarn.sh
- 在slave2机器上单独启动一个ResourceManager:
[root@slave2 ~]# yarn-daemon.sh start resourcemanager
- 最后,分别在3台机器上使用jps命令查看进程,确定是否启动成功:
[root@master ~]# jps
35872 DataNode
28868 QuorumPeerMain
37799 NodeManager
36216 JournalNode
35641 NameNode
36521 DFSZKFailoverController
37625 ResourceManager
40138 Jps
[root@slave1 ~]# jps
35266 JournalNode
35028 DataNode
28213 QuorumPeerMain
38773 Jps
36650 NodeManager
34843 NameNode
35517 DFSZKFailoverController
[root@slave2 ~]# jps
28689 QuorumPeerMain
38953 Jps
36874 NodeManager
38812 ResourceManager
35614 JournalNode
35343 DataNode
如果某一个NameNode进程挂掉了的话,就使用如下命令单独启动一个NameNode:
hadoop-daemon.sh start namenode
下面是停止Hadoop的HA集群的流程:
- 在master机器上停止HDFS:
[root@master ~]# stop-dfs.sh
- 在master机器上停止YARN:
[root@master ~]# stop-yarn.sh
- 在slave2机器上单独停止ResourceManager:
[root@slave2 ~]# yarn-daemon.sh stop resourcemanager
- 分别在3台机器上停止Zookeeper:
[root@master ~]# zkServer.sh stop
[root@slave1 ~]# zkServer.sh stop
[root@slave2 ~]# zkServer.sh stop
- 最后,分别在3台机器上使用jps命令查看进程,确定有关进程是否停止成功。
[root@master ~]# jps
45307 Jps
[root@slave1]# jps
40713 Jps
[root@slave2 ~]# jps
42108 Jps