This post is a detailed walkthrough of installing a Hadoop 2.8 distributed cluster on CentOS 7, then deploying HBase on top of it, with a brief demonstration of basic usage.
The installation environment is as follows:
- CPU: six cores
- Memory: 8GB
- OS: CentOS 7.6
- hadoop-2.8.5, hbase-1.4.9
- jdk1.8.0_171
- Server hostnames: myserver01, myserver02, myserver03
Hadoop Distributed Installation
Software Preparation
Create two directories, /u01/soft and /u01/app: downloaded packages go in /u01/soft, and the extracted files go in /u01/app.
[root@myserver01 /]#mkdir -p /u01/soft
[root@myserver01 /]#mkdir -p /u01/app
[root@myserver01 /]# cd /u01
[root@myserver01 u01]# ls
app soft
[root@myserver01 u01]#cd soft
[root@myserver01 soft]# wget https://archive.apache.org/dist/hadoop/common/hadoop-2.8.5/hadoop-2.8.5.tar.gz
[root@myserver01 soft]#cd /u01/app
[root@myserver01 app]#tar -xzvf /u01/soft/hadoop-2.8.5.tar.gz
[root@myserver01 app]# ls
hadoop-2.8.5 jdk1.8.0_171
Operating System Configuration
- Configure hosts
[root@myserver01 app]# cat /etc/hosts
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
192.16.9.32 myserver02
192.16.9.30 myserver01
192.16.9.33 myserver03
- Configure the hadoop user
Add a hadoop user on all three nodes; the password set here is hadoop.
[root@myserver01 ~]# adduser hadoop
[root@myserver01 app]# passwd hadoop
Change the owner of the extracted hadoop files:
[root@myserver01 app]# chown -R hadoop:hadoop /u01/app/hadoop-2.8.5
[root@myserver02 ~]# adduser hadoop
[root@myserver02 ~]# passwd hadoop
[root@myserver03 ~]# adduser hadoop
[root@myserver03 ~]# passwd hadoop
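If you would rather script this step, a minimal sketch is shown below (assumptions: root can SSH to the other nodes, and CentOS's passwd --stdin is available):
for h in myserver02 myserver03; do
  ssh root@$h 'adduser hadoop && echo hadoop | passwd --stdin hadoop'
done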
On myserver01, configure passwordless SSH access to the other two machines:
[root@myserver01 app]#su - hadoop
[hadoop@myserver01 hadoop]$ ssh-keygen -t rsa
Generating public/private rsa key pair.
Enter file in which to save the key (/home/hadoop/.ssh/id_rsa):
Created directory '/home/hadoop/.ssh'.
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /home/hadoop/.ssh/id_rsa.
Your public key has been saved in /home/hadoop/.ssh/id_rsa.pub.
The key fingerprint is:
SHA256:FZK+l6YeAaIUzeKuhletlT38UTlThUzMdtteHHkngAs hadoop@myserver01
The key's randomart image is:
+---[RSA 2048]----+
| .o .....*.oo|
| ..o E... Oo+|
| .... .. ... + +*|
| ... . .... = .+|
| .. . +S. o o ..|
| . . + =.= .|
|.. . o .* . |
|o.. . ... |
|.. .. |
+----[SHA256]-----+
[hadoop@myserver01 ~]$ ssh-copy-id myserver02
[hadoop@myserver01 ~]$ ssh-copy-id myserver03
[hadoop@myserver01 ~]$ ssh-copy-id myserver01
[hadoop@myserver01 ~]$ ssh myserver01 date
Tue Feb 19 03:33:40 EST 2019
[hadoop@myserver01 ~]$ ssh myserver02 date
Tue Feb 19 03:33:43 EST 2019
[hadoop@myserver01 ~]$ ssh myserver03 date
Tue Feb 19 03:33:46 EST 2019
Configure JAVA_HOME for the hadoop user on all three nodes:
[hadoop@myserver01 hadoop]$ cat /home/hadoop/.bash_profile
# .bash_profile
# Get the aliases and functions
if [ -f ~/.bashrc ]; then
. ~/.bashrc
fi
# User specific environment and startup programs
JAVA_HOME=/u01/app/jdk1.8.0_171
export JAVA_HOME
PATH=$PATH:$HOME/.local/bin:$HOME/bin:$JAVA_HOME/bin
export PATH
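A quick sanity check that the environment is picked up (run as the hadoop user on each node):
source ~/.bash_profile
echo $JAVA_HOME
java -version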
Configuring Hadoop
- Configure hadoop-env.sh
This mainly means adding JAVA_HOME:
[hadoop@myserver01 hadoop]$cd /u01/app/hadoop-2.8.5/etc/hadoop
[hadoop@myserver01 hadoop]$ pwd
/u01/app/hadoop-2.8.5/etc/hadoop
[hadoop@myserver01 hadoop]$ vi hadoop-env.sh
# The java implementation to use.
export JAVA_HOME=/u01/app/jdk1.8.0_171
- Modify core-site.xml
First create the /apps/hdptmp directory on every node:
[root@myserver01 app]# mkdir -p /apps/hdptmp
[root@myserver01 app]# chown -R hadoop:hadoop /apps/hdptmp
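The same directory must exist on myserver02 and myserver03 as well; a sketch using root SSH (the same access used for scp later):
for h in myserver02 myserver03; do
  ssh root@$h 'mkdir -p /apps/hdptmp && chown -R hadoop:hadoop /apps/hdptmp'
done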
Then edit the core-site.xml file:
[hadoop@myserver01 hadoop]$ vi core-site.xml
<configuration>
<property>
<name>hadoop.tmp.dir</name>
<value>/apps/hdptmp</value><!-- create the hdptmp directory under /apps first -->
</property><!-- file system properties -->
<property>
<name>fs.defaultFS</name>
<value>hdfs://myserver01:9500</value>
</property>
</configuration>
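Once the file is saved, hadoop can echo the value back, which catches XML typos early (hdfs getconf ships with the standard distribution):
/u01/app/hadoop-2.8.5/bin/hdfs getconf -confKey fs.defaultFS
This should print hdfs://myserver01:9500 if the XML is well-formed.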
- Modify mapred-site.xml
[hadoop@myserver01 hadoop]$ vi mapred-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
- Configure yarn-site.xml
[hadoop@myserver01 hadoop]$ vi yarn-site.xml
<?xml version="1.0"?>
<configuration>
<!-- Site specific YARN configuration properties -->
<property>
<name>yarn.resourcemanager.hostname</name>
<value>myserver01</value>
</property>
<!-- how MapReduce obtains data during the shuffle phase -->
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
</configuration>
- Add the slaves file
[hadoop@myserver01 hadoop]$ vi slaves
myserver01
myserver02
myserver03
- Copy the configured hadoop directory to the other two nodes
[root@myserver01 app]#scp -r hadoop-2.8.5 root@myserver02:/u01/app/
[root@myserver01 app]#scp -r hadoop-2.8.5 root@myserver03:/u01/app/
Then change the owner of hadoop-2.8.5 to hadoop on myserver02 and myserver03, for example:
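A one-liner per node does the job (a sketch, assuming the root SSH access already used for scp):
ssh root@myserver02 'chown -R hadoop:hadoop /u01/app/hadoop-2.8.5'
ssh root@myserver03 'chown -R hadoop:hadoop /u01/app/hadoop-2.8.5'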
- Format HDFS
[hadoop@myserver01 ~]$ /u01/app/hadoop-2.8.5/bin/hadoop namenode -format
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.
19/02/19 03:42:21 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG: user = hadoop
STARTUP_MSG: host = myserver01/172.16.9.30
STARTUP_MSG: args = [-format]
STARTUP_MSG: version = 2.8.5
- Start hadoop
Start HDFS.
If you are prompted for any node's password at this point, the SSH trust between the nodes was not set up correctly.
[hadoop@myserver01 ~]$ /u01/app/hadoop-2.8.5/sbin/start-dfs.sh
Now check the processes on myserver01 with jps:
[hadoop@myserver01 ~]$ jps
6160 Jps
5608 DataNode
5775 SecondaryNameNode
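Before moving on, hdfs can also report which datanodes have joined (dfsadmin is a standard hdfs subcommand):
/u01/app/hadoop-2.8.5/bin/hdfs dfsadmin -report
Three live datanodes in the report means every node has registered with the namenode.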
Start YARN:
/u01/app/hadoop-2.8.5/sbin/start-yarn.sh
[hadoop@myserver01 ~]$ jps
6823 Jps
5608 DataNode
6316 ResourceManager
5775 SecondaryNameNode
6623 NodeManager
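Similarly, yarn can list the NodeManagers that have registered with the ResourceManager:
/u01/app/hadoop-2.8.5/bin/yarn node -list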
Of course, you can also stop or start all processes with a single command:
[hadoop@myserver01 ~]$ /u01/app/hadoop-2.8.5/sbin/stop-all.sh
This script is Deprecated. Instead use stop-dfs.sh and stop-yarn.sh
Stopping namenodes on [myserver01]
myserver01: stopping namenode
myserver02: stopping datanode
myserver03: stopping datanode
myserver01: stopping datanode
Stopping secondary namenodes [0.0.0.0]
0.0.0.0: stopping secondarynamenode
stopping yarn daemons
stopping resourcemanager
myserver02: no nodemanager to stop
myserver03: no nodemanager to stop
myserver01: stopping nodemanager
myserver01: nodemanager did not stop gracefully after 5 seconds: killing with kill -9
no proxyserver to stop
[hadoop@myserver01 ~]$ /u01/app/hadoop-2.8.5/sbin/start-all.sh
This script is Deprecated. Instead use start-dfs.sh and start-yarn.sh
Starting namenodes on [myserver01]
myserver01: starting namenode, logging to /u01/app/hadoop-2.8.5/logs/hadoop-hadoop-namenode-myserver01.out
myserver03: starting datanode, logging to /u01/app/hadoop-2.8.5/logs/hadoop-hadoop-datanode-myserver03.out
myserver01: starting datanode, logging to /u01/app/hadoop-2.8.5/logs/hadoop-hadoop-datanode-myserver01.out
myserver02: starting datanode, logging to /u01/app/hadoop-2.8.5/logs/hadoop-hadoop-datanode-myserver02.out
Starting secondary namenodes [0.0.0.0]
0.0.0.0: starting secondarynamenode, logging to /u01/app/hadoop-2.8.5/logs/hadoop-hadoop-secondarynamenode-myserver01.out
starting yarn daemons
starting resourcemanager, logging to /u01/app/hadoop-2.8.5/logs/yarn-hadoop-resourcemanager-myserver01.out
myserver02: starting nodemanager, logging to /u01/app/hadoop-2.8.5/logs/yarn-hadoop-nodemanager-myserver02.out
myserver03: starting nodemanager, logging to /u01/app/hadoop-2.8.5/logs/yarn-hadoop-nodemanager-myserver03.out
myserver01: starting nodemanager, logging to /u01/app/hadoop-2.8.5/logs/yarn-hadoop-nodemanager-myserver01.out
Checking Hadoop
jps was used above to check the hadoop processes; once hadoop is running, there are also web pages for viewing cluster information.
Access the NameNode web UI:
http://192.16.9.30:50070/dfshealth.html#tab-overview
(screenshot)
Access the DataNode web UI:
http://192.16.9.30:50075/
(screenshot)
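If a page does not load, a quick check from the shell tells you whether the ports are listening at all (plain curl, nothing hadoop-specific):
curl -s -o /dev/null -w '%{http_code}\n' http://myserver01:50070/
curl -s -o /dev/null -w '%{http_code}\n' http://myserver01:50075/
An HTTP 200 from each means the web UIs are up.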
Basic HDFS Usage
Copy local files to HDFS:
[hadoop@myserver01 ~]$ cd /u01/app/hadoop-2.8.5/
[hadoop@myserver01 hadoop-2.8.5]$ /u01/app/hadoop-2.8.5/bin/hdfs dfs -copyFromLocal LICENSE.txt /
[hadoop@myserver01 hadoop-2.8.5]$ /u01/app/hadoop-2.8.5/bin/hdfs dfs -copyFromLocal README.txt /
[hadoop@myserver01 hadoop-2.8.5]$ /u01/app/hadoop-2.8.5/bin/hdfs dfs -ls /
Found 2 items
-rw-r--r-- 3 hadoop supergroup 99253 2019-02-20 04:46 /LICENSE.txt
-rw-r--r-- 3 hadoop supergroup 1366 2019-02-20 04:47 /README.txt
View the content of a file in HDFS:
[hadoop@myserver01 hadoop-2.8.5]$ /u01/app/hadoop-2.8.5/bin/hdfs dfs -cat /LICENSE.txt
Create a directory in HDFS:
[hadoop@myserver01 hadoop-2.8.5]$ /u01/app/hadoop-2.8.5/bin/hdfs dfs -mkdir /gjwtttt
Delete a file from HDFS:
[hadoop@myserver01 hadoop-2.8.5]$ /u01/app/hadoop-2.8.5/bin/hdfs dfs -rm /LICENSE.txt
Deleted /LICENSE.txt
Broadly speaking, this works much like a regular Linux filesystem; a few more everyday commands are sketched below.
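For completeness, the reverse copy and a disk-usage query (both standard hdfs dfs subcommands):
/u01/app/hadoop-2.8.5/bin/hdfs dfs -copyToLocal /README.txt /tmp/
/u01/app/hadoop-2.8.5/bin/hdfs dfs -du -h /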
Installing HBase
Prepare the Software
[root@myserver01 /]# cd /u01/soft
Download the software:
[root@myserver01 soft]# wget http://mirror.bit.edu.cn/apache/hbase/1.4.9/hbase-1.4.9-bin.tar.gz
Extract the software:
[root@myserver01 soft]# cd /u01/app
[root@myserver01 app]# tar -xzvf /u01/soft/hbase-1.4.9-bin.tar.gz
Change the file ownership:
[root@myserver01 app]#chown -R hadoop:hadoop hbase-1.4.9
Copy hbase to the other two nodes:
[root@myserver01 app]#scp -r hbase-1.4.9 root@myserver02:/u01/app/
[root@myserver01 app]#scp -r hbase-1.4.9 root@myserver03:/u01/app/
Then change the owner of hbase-1.4.9 to hadoop on myserver02 and myserver03, just as was done for hadoop above.
Configuring HBase
All three nodes need the hbase configuration; alternatively, configure one node first and then copy the files to the others (see the scp sketch after the hbase-site.xml listing).
- Modify hbase-env.sh on all three nodes
[root@myserver01 app]# su - hadoop
[hadoop@myserver03 ~]$ cd /u01/app/hbase-1.4.9/
[hadoop@myserver03 hbase-1.4.9]$ vi conf/hbase-env.sh
export JAVA_HOME=/u01/app/jdk1.8.0_171
- Modify hbase-site.xml on all three nodes
[hadoop@myserver01 conf]$ vi hbase-site.xml
<configuration>
<property>
<name>hbase.rootdir</name>
<value>hdfs://myserver01:9500/hbase</value>
</property>
<property>
<name>hbase.cluster.distributed</name>
<value>true</value>
</property>
<property>
<name>hbase.zookeeper.quorum</name>
<value>myserver01,myserver02,myserver03</value>
</property>
</configuration>
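If you configured only myserver01, pushing the two changed files to the other nodes is enough (a sketch; the hadoop user's passwordless SSH set up earlier makes this non-interactive):
for h in myserver02 myserver03; do
  scp /u01/app/hbase-1.4.9/conf/hbase-env.sh /u01/app/hbase-1.4.9/conf/hbase-site.xml hadoop@$h:/u01/app/hbase-1.4.9/conf/
done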
- Start hbase
[hadoop@myserver01 hbase-1.4.9]$ bin/start-hbase.sh
[hadoop@myserver01 hbase-1.4.9]$ jps
11424 ResourceManager
11537 NodeManager
11250 SecondaryNameNode
12274 HQuorumPeer
12418 HRegionServer
12339 HMaster
11077 DataNode
10969 NameNode
12718 Jps
The hbase processes are now visible.
You can also check HDFS for the hbase directory:
[hadoop@myserver01 hadoop-2.8.5]$ ./bin/hdfs dfs -ls /
Found 3 items
-rw-r--r-- 3 hadoop supergroup 1366 2019-02-20 04:47 /README.txt
drwxr-xr-x - hadoop supergroup 0 2019-02-20 04:49 /gjwtttt
drwxr-xr-x - hadoop supergroup 0 2019-02-19 05:00 /hbase
Basic HBase Usage
Here hbase shell is used for a quick demonstration: create a table, insert a row, and scan it.
[hadoop@myserver01 hbase-1.4.9]$ ./bin/hbase shell
2019-02-19 04:59:48,625 WARN [main] util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
HBase Shell
Use "help" to get list of supported commands.
Use "exit" to quit this interactive shell.
Version 1.4.9, rd625b212e46d01cb17db9ac2e9e927fdb201afa1, Wed Dec 5 11:54:10 PST 2018
hbase(main):001:0>
hbase(main):001:0> create 'test','cf'
0 row(s) in 2.1080 seconds
=> Hbase::Table - test
hbase(main):002:0> list
TABLE
test
1 row(s) in 0.0460 seconds
=> ["test"]
hbase(main):003:0> desc test
ArgumentError: wrong number of arguments (0 for 2)
hbase(main):004:0> desc 'test'
Table test is ENABLED
test
COLUMN FAMILIES DESCRIPTION
{NAME => 'cf', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY => 'false', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCOD
ING => 'NONE', TTL => 'FOREVER', COMPRESSION => 'NONE', MIN_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE => '65536', RE
PLICATION_SCOPE => '0'}
1 row(s) in 0.0830 seconds
hbase(main):005:0> put 'test','row1','cf:name','wangziqiang'
0 row(s) in 0.2470 seconds
hbase(main):006:0> scan 'test'
ROW COLUMN+CELL
row1 column=cf:name, timestamp=1550570481753, value=wangziqiang
1 row(s) in 0.1400 seconds
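To round out the demo: reading a single row back, deleting a cell, and dropping the table (all standard hbase shell commands; note that a table must be disabled before it can be dropped):
get 'test','row1'
delete 'test','row1','cf:name'
disable 'test'
drop 'test'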