Prerequisites
- Software versions: hadoop-2.7.7, hbase-2.1.5, jdk1.8.0_192, zookeeper-3.4.10, apache-phoenix-5.0.0-HBase-2.0-bin.tar.gz
- At least 3 CentOS servers, with hostnames hadoop0001, hadoop0002, and hadoop0003
- All software is installed under /home/hadoop/app, owned by the hadoop user
- Configure hosts on every server
[hadoop@hadoop0001 ~]$ sudo vim /etc/hosts
The hosts file contains:
# 127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
# ::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
10.2.1.102 hadoop0001
10.2.1.103 hadoop0002
10.2.1.104 hadoop0003
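The same three entries have to be present on every server, so it can help to append them with a small script instead of editing by hand. The following is a minimal sketch, not part of the original procedure: the function name add_host_entry is hypothetical, and the demonstration writes to a temporary file rather than /etc/hosts.

```shell
# Append "IP hostname" to a hosts-style file unless the hostname is already listed.
# add_host_entry is a hypothetical helper, not a standard tool.
add_host_entry() {
  hosts_target="$1"; host_ip="$2"; host_name="$3"
  # -w matches the hostname as a whole word; -F treats it as a fixed string.
  if ! grep -qwF -- "$host_name" "$hosts_target" 2>/dev/null; then
    printf '%s %s\n' "$host_ip" "$host_name" >> "$hosts_target"
  fi
}

# Demonstration against a scratch file; point it at /etc/hosts (as root) for real use.
hosts_file="$(mktemp)"
add_host_entry "$hosts_file" 10.2.1.102 hadoop0001
add_host_entry "$hosts_file" 10.2.1.103 hadoop0002
add_host_entry "$hosts_file" 10.2.1.104 hadoop0003
add_host_entry "$hosts_file" 10.2.1.102 hadoop0001   # repeated call adds nothing
cat "$hosts_file"
```

Because the helper skips hostnames that are already present, it can be rerun safely after adding a node.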
- Passwordless SSH login (optional, but without it Hadoop prompts for a password every time it starts)
On hadoop0001, run:
[hadoop@hadoop0001 ~]$ ssh-keygen -t rsa -P "" # press Enter at every prompt
[hadoop@hadoop0001 ~]$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
[hadoop@hadoop0001 ~]$ ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop@hadoop0002
[hadoop@hadoop0001 ~]$ ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop@hadoop0003
On hadoop0002, run:
[hadoop@hadoop0002 ~]$ ssh-keygen -t rsa -P "" # press Enter at every prompt
[hadoop@hadoop0002 ~]$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
[hadoop@hadoop0002 ~]$ ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop@hadoop0001
[hadoop@hadoop0002 ~]$ ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop@hadoop0003
On hadoop0003, run:
[hadoop@hadoop0003 ~]$ ssh-keygen -t rsa -P "" # press Enter at every prompt
[hadoop@hadoop0003 ~]$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
[hadoop@hadoop0003 ~]$ ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop@hadoop0001
[hadoop@hadoop0003 ~]$ ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop@hadoop0002
Verify passwordless login
[hadoop@hadoop0001 ~]$ ssh localhost
Last login: Fri Jan 4 13:45:54 2019 # this output means passwordless login works
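Distributing the public key is the same command repeated for every remote host, which a loop expresses compactly. The sketch below only prints the commands (a dry run) so that they can be reviewed first; the helper name print_copy_id_cmds is hypothetical, and the printed ssh-copy-id commands assume the OpenSSH client tools are installed.

```shell
# Print one ssh-copy-id command for every cluster host except the current one.
# print_copy_id_cmds is a hypothetical helper; it does not touch the network.
# Usage: print_copy_id_cmds <this-host> <host>...
print_copy_id_cmds() {
  self="$1"; shift
  for host in "$@"; do
    if [ "$host" != "$self" ]; then
      echo "ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop@$host"
    fi
  done
}

print_copy_id_cmds hadoop0001 hadoop0001 hadoop0002 hadoop0003
```

Running the same call on each node with its own hostname as the first argument lists the commands for that node.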
- JDK installation
JDK version:
Linux: jdk-8u192-linux-x64.tar.gz
JDK environment variables:
# in the user's home directory
[hadoop@hadoop0001 ~]$ vim .bashrc
Add the following:
export JAVA_HOME=/home/hadoop/app/jdk1.8.0_192
export CLASSPATH=.:$JAVA_HOME/lib/tools.jar:$JAVA_HOME/lib/dt.jar
export PATH=$JAVA_HOME/bin:$HOME/bin:$HOME/.local/bin:$PATH
Then apply the environment variables:
# in the user's home directory
[hadoop@hadoop0001 ~]$ . .bashrc
Verify the JDK:
[hadoop@hadoop0001 ~]$ java -version
java version "1.8.0_192"
Java(TM) SE Runtime Environment (build 1.8.0_192-b12)
Java HotSpot(TM) 64-Bit Server VM (build 25.192-b12, mixed mode)
Copy the JDK and the environment settings from hadoop0001 to the other servers
[hadoop@hadoop0001 app]$ scp -r jdk1.8.0_192 hadoop@hadoop0002:~/app/
[hadoop@hadoop0001 app]$ scp -r jdk1.8.0_192 hadoop@hadoop0003:~/app/
[hadoop@hadoop0001 ~]$ scp .bashrc hadoop@hadoop0002:~/
[hadoop@hadoop0001 ~]$ scp .bashrc hadoop@hadoop0003:~/
- NTP service setup
Install ntp on every server
[hadoop@hadoop0001 ~]$ sudo yum install -y ntp
Configure ntp on hadoop0001
[hadoop@hadoop0001 ~]$ sudo vim /etc/ntp.conf
Add the following settings:
restrict 10.2.1.0 mask 255.255.255.0 nomodify notrap
logfile /var/log/ntpd.log
server ntp1.aliyun.com
server ntp2.aliyun.com
server ntp3.aliyun.com
server 127.127.1.0
fudge 127.127.1.0 stratum 10
Complete configuration file (ntp.conf):
# For more information about this file, see the man pages
# ntp.conf(5), ntp_acc(5), ntp_auth(5), ntp_clock(5), ntp_misc(5), ntp_mon(5).
driftfile /var/lib/ntp/drift
logfile /var/log/ntpd.log
# Permit time synchronization with our time source, but do not
# permit the source to query or modify the service on this system.
restrict default nomodify notrap nopeer noquery
# Permit all access over the loopback interface. This could
# be tightened as well, but to do so would effect some of
# the administrative functions.
restrict 127.0.0.1
restrict ::1
# Hosts on local network are less restricted.
#restrict 192.168.1.0 mask 255.255.255.0 nomodify notrap
restrict 10.2.1.0 mask 255.255.255.0 nomodify notrap
# Use public servers from the pool.ntp.org project.
# Please consider joining the pool (http://www.pool.ntp.org/join.html).
#server 0.centos.pool.ntp.org iburst
#server 1.centos.pool.ntp.org iburst
#server 2.centos.pool.ntp.org iburst
#server 3.centos.pool.ntp.org iburst
server ntp1.aliyun.com
server ntp2.aliyun.com
server ntp3.aliyun.com
server 127.127.1.0
fudge 127.127.1.0 stratum 10
#broadcast 192.168.1.255 autokey # broadcast server
#broadcastclient # broadcast client
#broadcast 224.0.1.1 autokey # multicast server
#multicastclient 224.0.1.1 # multicast client
#manycastserver 239.255.254.254 # manycast server
#manycastclient 239.255.254.254 autokey # manycast client
# Enable public key cryptography.
#crypto
includefile /etc/ntp/crypto/pw
# Key file containing the keys and key identifiers used when operating
# with symmetric key cryptography.
keys /etc/ntp/keys
# Specify the key identifiers which are trusted.
#trustedkey 4 8 42
# Specify the key identifier to use with the ntpdc utility.
#requestkey 8
# Specify the key identifier to use with the ntpq utility.
#controlkey 8
# Enable writing of statistics records.
#statistics clockstats cryptostats loopstats peerstats
# Disable the monitoring facility to prevent amplification attacks using ntpdc
# monlist command when default restrict does not include the noquery flag. See
# CVE-2013-5211 for more details.
# Note: Monitoring will not be disabled with the limited restriction flag.
disable monitor
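For other time servers, see https://www.pool.ntp.org/zone/asia
Synchronize the time once by hand:
[hadoop@hadoop0001 ~]$ sudo ntpdate -u ntp1.aliyun.com
16 Jul 16:46:39 ntpdate[12700]: adjust time server 120.25.115.20 offset -0.002546 sec
Start the time service:
[hadoop@hadoop0001 ~]$ sudo systemctl start ntpd
Enable the time service at boot:
[hadoop@hadoop0001 ~]$ sudo systemctl enable ntpd
Configure the ntp client on hadoop0002 and hadoop0003
Add the following line to /etc/ntp.conf, then start and enable ntpd as on hadoop0001:
server hadoop0001
Check whether ntp is synchronized
The following output means it is not yet synchronized: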
[root@hadoop0002 ~]# ntpstat
unsynchronised
time server re-starting
polling server every 8 s
The following output means it is synchronized:
[root@hadoop0001 ~]# ntpstat
synchronised to NTP server (120.25.115.20) at stratum 3
time correct to within 976 ms
polling server every 64 s
Note: synchronization takes about 10 minutes
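Because "unsynchronised" contains the word "synchronised", a plain substring match on the ntpstat output gives the wrong answer. A minimal sketch of a stricter check follows; the function name ntp_is_synced is hypothetical, and it reads the text on standard input so that it can be tried against the sample outputs shown above.

```shell
# Succeed only if a line of the ntpstat-style text on stdin begins with "synchronised to".
# ntp_is_synced is a hypothetical helper.
ntp_is_synced() {
  grep -q '^synchronised to'
}

# Tried here against the "not synchronized" sample; on a real host use: ntpstat | ntp_is_synced
if printf 'unsynchronised\n  time server re-starting\n' | ntp_is_synced; then
  echo "synced"
else
  echo "not synced yet"
fi
```

Wrapping the real call in a loop with a sleep between attempts would turn this into a wait-until-synced step.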
Hadoop installation
Download Hadoop
wget http://mirror.bit.edu.cn/apache/hadoop/common/hadoop-2.7.7/hadoop-2.7.7.tar.gz
Extract Hadoop
tar -zxvf hadoop-2.7.7.tar.gz
Configure hadoop-env.sh
# adjust to your workload
export HADOOP_HEAPSIZE=1024
Configure mapred-env.sh
export JAVA_HOME=${JAVA_HOME}
Configure yarn-env.sh
# adjust to your workload
JAVA_HEAP_MAX=-Xmx512m
YARN_HEAPSIZE=1024
Configure core-site.xml
<!-- HDFS NameNode address and port -->
<property>
<name>fs.defaultFS</name>
<value>hdfs://hadoop0001:8020</value>
</property>
<!-- Hadoop temporary data directory -->
<property>
<name>hadoop.tmp.dir</name>
<value>/home/hadoop/app/hadoop-2.7.7/data</value>
</property>
<property>
<name>fs.trash.interval</name>
<value>14400</value>
</property>
Configure yarn-site.xml
<property>
<name>yarn.resourcemanager.hostname</name>
<value>hadoop0001</value>
<description>Address of the YARN ResourceManager</description>
</property>
<property>
<name>yarn.log-aggregation-enable</name>
<value>true</value>
<description>Enable log aggregation</description>
</property>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
<description>How reducers fetch data</description>
</property>
<property>
<name>yarn.log-aggregation.retain-seconds</name>
<value>604800</value>
<description>Keep aggregated logs for 7 days</description>
</property>
<property>
<name>yarn.nodemanager.pmem-check-enabled</name>
<value>false</value>
</property>
<property>
<name>yarn.nodemanager.vmem-check-enabled</name>
<value>false</value>
</property>
<property>
<name>yarn.nodemanager.resource.memory-mb</name>
<value>15000</value>
<description>Memory available on each node, in MB</description>
</property>
<property>
<name>yarn.scheduler.minimum-allocation-mb</name>
<value>100</value>
<description>Minimum memory a single task can request; the default is 1024 MB</description>
</property>
<property>
<name>yarn.scheduler.maximum-allocation-mb</name>
<value>15000</value>
<description>Maximum memory a single task can request; the default is 8192 MB</description>
</property>
<property>
<name>yarn.nodemanager.resource.cpu-vcores</name>
<value>2</value>
<description>Total number of virtual CPU cores available to the NodeManager</description>
</property>
<property>
<name>yarn.scheduler.minimum-allocation-vcores</name>
<value>1</value>
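<description>Minimum number of virtual CPU cores a single task can request. For example, with a value of 1, each task of a MapReduce job requests at least 1 virtual CPU core</description>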
</property>
<property>
<name>yarn.scheduler.maximum-allocation-vcores</name>
<value>4</value>
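<description>Maximum number of virtual CPU cores a single task can request. For example, with a value of 4, each task of a MapReduce job can request at most 4 virtual CPU cores</description>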
</property>
<property>
<name>yarn.resourcemanager.scheduler.class</name>
<value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler</value>
</property>
<property>
<name>yarn.scheduler.fair.preemption</name>
<value>true</value>
</property>
<property>
<name>yarn.scheduler.fair.preemption.cluster-utilization-threshold</name>
<value>0.8</value>
</property>
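The memory and vcore settings in yarn-site.xml are related: the minimum allocation should not exceed the maximum, and a request larger than what a NodeManager offers cannot be placed on that node. The following sketch checks those relationships for a set of values. The function name check_yarn_limits is hypothetical, and the rule is a general consistency check written for this guide, not something YARN runs in this form.

```shell
# Report whether min <= max <= node capacity holds for one resource.
# check_yarn_limits is a hypothetical helper.
# Usage: check_yarn_limits <label> <min-allocation> <max-allocation> <node-capacity>
check_yarn_limits() {
  label="$1"; min="$2"; max="$3"; node="$4"
  if [ "$min" -gt "$max" ]; then
    echo "$label: minimum allocation $min exceeds maximum allocation $max"
    return 1
  fi
  if [ "$max" -gt "$node" ]; then
    echo "$label: maximum allocation $max exceeds node capacity $node"
    return 1
  fi
  echo "$label: ok"
}

# Values copied from the yarn-site.xml in this guide.
check_yarn_limits memory-mb 100 15000 15000
check_yarn_limits vcores 1 4 2 || true
```

With these values the vcores line is flagged, because yarn.scheduler.maximum-allocation-vcores (4) is larger than yarn.nodemanager.resource.cpu-vcores (2). Whether to raise one or lower the other depends on the hardware.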
Configure hdfs-site.xml
<!-- number of HDFS block replicas -->
<property>
<name>dfs.replication</name>
<value>3</value>
</property>
<!-- where HDFS stores the fsimage
<property>
<name>dfs.namenode.name.dir</name>
<value>file:/home/hadoop/app/hadoop-2.7.7/data/hdfs/name</value>
</property>
-->
<!-- where HDFS stores data blocks
<property>
<name>dfs.datanode.data.dir</name>
<value>file:/home/hadoop/app/hadoop-2.7.7/data/hdfs/data</value>
</property>
-->
<property>
<name>dfs.namenode.secondary.http-address</name>
<value>hadoop0001:50090</value>
</property>
<property>
<name>dfs.namenode.http-address</name>
<value>hadoop0001:50070</value>
</property>
<property>
<name>dfs.permissions.enabled</name>
<value>false</value>
</property>
Configure mapred-site.xml
<!-- JobHistory server address -->
<property>
<name>mapreduce.jobhistory.address</name>
<value>hadoop0001:10020</value>
</property>
<!-- JobHistory server web UI address -->
<property>
<name>mapreduce.jobhistory.webapp.address</name>
<value>hadoop0001:19888</value>
</property>
<!-- run MapReduce on YARN -->
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
Configure slaves (etc/hadoop/slaves under /home/hadoop/app/hadoop-2.7.7)
hadoop0001
hadoop0002
hadoop0003
Configure the Hadoop environment variables
In .bashrc in the user's home directory
# added by Hadoop installer
export HADOOP_HOME=/home/hadoop/app/hadoop-2.7.7
export HADOOP_INSTALL=$HADOOP_HOME
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib:$HADOOP_COMMON_LIB_NATIVE_DIR"
export PATH=$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin
export CLASSPATH=$CLASSPATH:$HADOOP_HOME/lib
Apply the environment:
. .bashrc
Send the configured Hadoop to the other servers
[hadoop@hadoop0001 app]$ scp -r hadoop-2.7.7 hadoop@hadoop0002:~/app/
[hadoop@hadoop0001 app]$ scp -r hadoop-2.7.7 hadoop@hadoop0003:~/app/
[hadoop@hadoop0001 ~]$ scp .bashrc hadoop@hadoop0002:~/
[hadoop@hadoop0001 ~]$ scp .bashrc hadoop@hadoop0003:~/
Format the NameNode on the master node
hdfs namenode -format
Start the Hadoop cluster
# The cluster is up when jps shows NameNode and SecondaryNameNode on the master node and DataNode on the other machines
start-all.sh
Stop the cluster
stop-all.sh
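Checking the daemon list by eye on three machines is easy to get wrong. A minimal sketch of an automated check follows. It reads jps-style output (one "PID Name" pair per line) on standard input, the helper name missing_daemons is hypothetical, and the listing in the demonstration is made up for illustration.

```shell
# Print every expected daemon name that is absent from jps-style output on stdin.
# missing_daemons is a hypothetical helper; pass the expected names as arguments.
missing_daemons() {
  listing="$(cat)"
  for name in "$@"; do
    # Compare the second column exactly, so NameNode does not match SecondaryNameNode.
    if ! printf '%s\n' "$listing" | awk -v n="$name" '$2 == n { found = 1 } END { exit !found }'; then
      echo "$name"
    fi
  done
}

# Illustrative listing only; on a real node use: jps | missing_daemons NameNode SecondaryNameNode
printf '2101 SecondaryNameNode\n2350 DataNode\n2790 Jps\n' | missing_daemons NameNode SecondaryNameNode DataNode
```

An empty result means every expected daemon was found. The same helper can be reused later with HMaster and HRegionServer.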
ZooKeeper distributed cluster setup
Download ZooKeeper
wget https://archive.apache.org/dist/zookeeper/zookeeper-3.4.10/zookeeper-3.4.10.tar.gz
Extract ZooKeeper
tar -zxvf zookeeper-3.4.10.tar.gz
Configure zoo.cfg
cp zoo_sample.cfg zoo.cfg
vim zoo.cfg
With the following contents:
# The number of milliseconds of each tick
tickTime=2000
# The number of ticks that the initial
# synchronization phase can take
initLimit=20
# The number of ticks that can pass between
# sending a request and getting an acknowledgement
syncLimit=10
# the directory where the snapshot is stored.
# do not use /tmp for storage, /tmp here is just
# example sakes.
dataDir=/home/hadoop/app/zookeeper-3.4.10/data
dataLogDir=/home/hadoop/app/zookeeper-3.4.10/logs
# the port at which the clients will connect
clientPort=2181
# the maximum number of client connections.
# increase this if you need to handle more clients
#maxClientCnxns=60
#
# Be sure to read the maintenance section of the
# administrator guide before turning on autopurge.
#
# http://zookeeper.apache.org/doc/current/zookeeperAdmin.html#sc_maintenance
#
# The number of snapshots to retain in dataDir
#autopurge.snapRetainCount=3
# Purge task interval in hours
# Set to "0" to disable auto purge feature
#autopurge.purgeInterval=1
server.1=hadoop0001:2888:3888
server.2=hadoop0002:2888:3888
server.3=hadoop0003:2888:3888
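Create the data and logs directories under the ZooKeeper root
mkdir data
mkdir logs
Create myid in the data directory
vim myid
With the content:
1
Configure the ZooKeeper environment variables
In .bashrc in the user's home directory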
# added by zookeeper installer
export ZOOKEEPER_HOME=/home/hadoop/app/zookeeper-3.4.10
export CLASSPATH=$CLASSPATH:$ZOOKEEPER_HOME/lib
export PATH=$PATH:$ZOOKEEPER_HOME/bin
Send the configured ZooKeeper to the other machines
[hadoop@hadoop0001 app]$ scp -r zookeeper-3.4.10 hadoop@hadoop0002:~/app/
[hadoop@hadoop0001 app]$ scp -r zookeeper-3.4.10 hadoop@hadoop0003:~/app/
[hadoop@hadoop0001 ~]$ scp .bashrc hadoop@hadoop0002:~/
[hadoop@hadoop0001 ~]$ scp .bashrc hadoop@hadoop0003:~/
Change myid on the other machines
Set myid to 2 on hadoop0002 and to 3 on hadoop0003, matching the server.N entries in zoo.cfg; every machine's myid must be unique within the cluster
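Each node's myid has to equal the N of its own server.N line in zoo.cfg. The sketch below derives that number from the configuration instead of typing it by hand. myid_for_host is a hypothetical helper, the demonstration uses a temporary copy of the three server lines rather than the real zoo.cfg, and it assumes hostnames without regular-expression metacharacters such as dots.

```shell
# Print the N of the "server.N=host:port:port" line whose host matches.
# myid_for_host is a hypothetical helper. Usage: myid_for_host <zoo.cfg> <hostname>
myid_for_host() {
  cfg="$1"; host="$2"
  sed -n "s/^server\.\([0-9][0-9]*\)=$host:.*/\1/p" "$cfg"
}

# Scratch copy of the server lines from this guide.
zoo_cfg="$(mktemp)"
cat > "$zoo_cfg" <<'EOF'
server.1=hadoop0001:2888:3888
server.2=hadoop0002:2888:3888
server.3=hadoop0003:2888:3888
EOF

myid_for_host "$zoo_cfg" hadoop0002
# On a real node (assuming `hostname` prints the name used in zoo.cfg):
#   myid_for_host "$ZOOKEEPER_HOME/conf/zoo.cfg" "$(hostname)" > "$ZOOKEEPER_HOME/data/myid"
```

An empty result means the host has no server.N entry, which is worth checking before starting the service.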
Start the ZooKeeper service
Run on every machine:
zkServer.sh start
Check the ZooKeeper status
zkServer.sh status
HBase HA distributed cluster setup
Download HBase
wget http://mirror.bit.edu.cn/apache/hbase/2.1.5/hbase-2.1.5-bin.tar.gz
Extract HBase
tar -zxvf hbase-2.1.5-bin.tar.gz
Configure hbase-site.xml
<property>
<name>hbase.rootdir</name>
<value>hdfs://hadoop0001:8020/hbase</value>
</property>
<property>
<name>hbase.cluster.distributed</name>
<value>true</value>
</property>
<!-- Changed after 0.98: earlier versions had no .port property, and the default port was 60000 -->
<property>
<name>hbase.master.port</name>
<value>16000</value>
</property>
<property>
<name>hbase.zookeeper.quorum</name>
<value>hadoop0001,hadoop0002,hadoop0003</value>
</property>
<property>
<name>hbase.regionserver.restart.on.zk.expire</name>
<value>true</value>
</property>
<property>
<name>hbase.coprocessor.abortonerror</name>
<value>false</value>
</property>
<property>
<name>hbase.zookeeper.property.dataDir</name>
<value>/home/hadoop/app/zookeeper-3.4.10/data</value>
</property>
<property>
<name>hbase.unsafe.stream.capability.enforce</name>
<value>false</value>
<description>
Controls whether HBase will check for stream capabilities (hflush/hsync).
Disable this if you intend to run on LocalFileSystem, denoted by a rootdir
with the 'file://' scheme, but be mindful of the NOTE below.
WARNING: Setting this to false blinds you to potential data loss and
inconsistent system state in the event of process and/or node failures. If
HBase is complaining of an inability to use hsync or hflush it's most
likely not a false positive.
</description>
</property>
Configure regionservers
Add the following to the regionservers file in the conf directory under the HBase root:
# hostnames, as defined in hosts
hadoop0001
hadoop0002
hadoop0003
Configure the HBase environment variables
In .bashrc in the user's home directory
# added by hbase installer
export HBASE_HOME=/home/hadoop/app/hbase-2.1.5
export CLASSPATH=$CLASSPATH:$HBASE_HOME/lib
export PATH=$PATH:$HBASE_HOME/bin
Send the configured HBase to the other machines
[hadoop@hadoop0001 app]$ scp -r hbase-2.1.5 hadoop@hadoop0002:~/app/
[hadoop@hadoop0001 app]$ scp -r hbase-2.1.5 hadoop@hadoop0003:~/app/
[hadoop@hadoop0001 ~]$ scp .bashrc hadoop@hadoop0002:~/
[hadoop@hadoop0001 ~]$ scp .bashrc hadoop@hadoop0003:~/
Configure backup-masters (standby master nodes)
Add the following to the backup-masters file in the conf directory under the HBase root:
# standby master nodes; more than one may be listed
hadoop0002
Start the HBase cluster
start-hbase.sh
Note: the HBase cluster is up when HMaster and HRegionServer appear on the master node (HRegionServer may be absent there, which is normal) and on the standby node, and HRegionServer appears on the remaining nodes.
Stop the HBase cluster
stop-hbase.sh
Phoenix cluster installation
Download Phoenix
wget http://mirror.bit.edu.cn/apache/phoenix/apache-phoenix-5.0.0-HBase-2.0/bin/apache-phoenix-5.0.0-HBase-2.0-bin.tar.gz
Extract Phoenix
tar -zxvf apache-phoenix-5.0.0-HBase-2.0-bin.tar.gz
Copy the following jars into the lib directory under the HBase root on every node, then restart HBase so that they are loaded
[hadoop@hadoop0001 apache-phoenix-5.0.0-HBase-2.0-bin]$ cp phoenix-5.0.0-HBase-2.0-queryserver.jar ~/app/hbase-2.1.5/lib/
[hadoop@hadoop0001 apache-phoenix-5.0.0-HBase-2.0-bin]$ scp phoenix-5.0.0-HBase-2.0-queryserver.jar hadoop@hadoop0002:~/app/hbase-2.1.5/lib/
[hadoop@hadoop0001 apache-phoenix-5.0.0-HBase-2.0-bin]$ scp phoenix-5.0.0-HBase-2.0-queryserver.jar hadoop@hadoop0003:~/app/hbase-2.1.5/lib/
[hadoop@hadoop0001 apache-phoenix-5.0.0-HBase-2.0-bin]$ cp phoenix-5.0.0-HBase-2.0-server.jar ~/app/hbase-2.1.5/lib/
[hadoop@hadoop0001 apache-phoenix-5.0.0-HBase-2.0-bin]$ scp phoenix-5.0.0-HBase-2.0-server.jar hadoop@hadoop0002:~/app/hbase-2.1.5/lib/
[hadoop@hadoop0001 apache-phoenix-5.0.0-HBase-2.0-bin]$ scp phoenix-5.0.0-HBase-2.0-server.jar hadoop@hadoop0003:~/app/hbase-2.1.5/lib/
[hadoop@hadoop0001 apache-phoenix-5.0.0-HBase-2.0-bin]$ cp phoenix-core-5.0.0-HBase-2.0.jar ~/app/hbase-2.1.5/lib/
[hadoop@hadoop0001 apache-phoenix-5.0.0-HBase-2.0-bin]$ scp phoenix-core-5.0.0-HBase-2.0.jar hadoop@hadoop0002:~/app/hbase-2.1.5/lib/
[hadoop@hadoop0001 apache-phoenix-5.0.0-HBase-2.0-bin]$ scp phoenix-core-5.0.0-HBase-2.0.jar hadoop@hadoop0003:~/app/hbase-2.1.5/lib/
Configure the Phoenix environment variables (no need to copy them to the other nodes)
# added by phoenix installer
export PHOENIX_HOME=/home/hadoop/app/apache-phoenix-5.0.0-HBase-2.0-bin
export CLASSPATH=$CLASSPATH:$PHOENIX_HOME
export PATH=$PATH:$PHOENIX_HOME/bin
Start the Phoenix Query Server
queryserver.py start
Stop the Phoenix Query Server
queryserver.py stop
Connect to the Phoenix Query Server
sqlline-thin.py hadoop0001:8765
Client JDBC connection (jdbcUrl)
jdbc:phoenix:thin:url=http://10.2.1.102:8765?doAs=alice