1. Hive log location
[hadoop@hadoop001 ~]$ cd $HIVE_HOME/conf
[hadoop@hadoop001 conf]$ cat hive-log4j.properties.template
The template shows the log file is hive.log under /tmp/hadoop (hive.log.dir defaults to ${java.io.tmpdir}/${user.name})
[hadoop@hadoop001 ~]$ cd /tmp/hadoop
Since /tmp on Linux may be cleaned out periodically, change the default log directory:
[hadoop@hadoop001 ~]$ cd $HIVE_HOME/conf
[hadoop@hadoop001 conf]$ cp hive-log4j.properties.template hive-log4j.properties
[hadoop@hadoop001 conf]$ vi hive-log4j.properties
hive.log.dir=/home/hadoop/app/tmp/
hive.log.file=hive.log
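After restarting the Hive CLI, the log should appear under the new directory; a quick sanity check (the path comes from the settings above):
[hadoop@hadoop001 ~]$ tail /home/hadoop/app/tmp/hive.log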
2. DDL
DDL describes the structure of Hive data: create, drop, alter...
Hive's actual data is stored on HDFS; its metadata is stored in MySQL
1) Hive's data abstraction hierarchy (see the path sketch below):
A Hive database lives on HDFS as a top-level directory
A Hive table also lives on HDFS, necessarily under the directory of the database it belongs to
A Hive partition is stored under its table's directory
A Hive bucket corresponds to a concrete file
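As a sketch, with a hypothetical database mydb, table t, and partition column dt, the layout on HDFS looks like:
/user/hive/warehouse/mydb.db                           <- database
/user/hive/warehouse/mydb.db/t                         <- table
/user/hive/warehouse/mydb.db/t/dt=2018-06-17           <- partition
/user/hive/warehouse/mydb.db/t/dt=2018-06-17/000000_0  <- bucket/data file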
hive> create database hive;
OK
Time taken: 3.778 seconds
hive> show databases;
OK
default
hive
Time taken: 1.46 seconds, Fetched: 2 row(s)
[hadoop@hadoop001 ~]$ hdfs dfs -ls /user/hive/warehouse
18/06/17 09:53:56 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
drwxr-xr-x - hadoop supergroup 0 2018-06-17 09:47 /user/hive/warehouse/hive.db
a) Note the WARN message:
On a 64-bit platform you can simply silence it via log4j: add log4j.logger.org.apache.hadoop.util.NativeCodeLoader=ERROR to $HADOOP_HOME/etc/hadoop/log4j.properties
[hadoop@hadoop001 ~]$ cd $HADOOP_HOME/etc/hadoop
[hadoop@hadoop001 hadoop]$ vi log4j.properties
log4j.logger.org.apache.hadoop.util.NativeCodeLoader=ERROR
[hadoop@hadoop001 hadoop]$ hdfs dfs -ls /user/hive/warehouse
Found 3 items
drwxr-xr-x - hadoop supergroup 0 2018-06-17 09:47 /user/hive/warehouse/hive.db
The WARN message is gone
b) default is Hive's built-in database; it is stored at /user/hive/warehouse/
hive> desc database default;
OK
default Default Hive database hdfs://192.168.137.141:9000/user/hive/warehouse public ROLE
To change the default warehouse path, edit hive-site.xml (usually left unchanged):
[hadoop@hadoop001 ~]$ cd $HIVE_HOME/conf
[hadoop@hadoop001 conf]$ vi hive-site.xml
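The property controlling the warehouse path is hive.metastore.warehouse.dir; a minimal sketch of the entry (the value shown is just the default):
<property>
  <name>hive.metastore.warehouse.dir</name>
  <value>/user/hive/warehouse</value>
</property>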
2) Create Database
Official syntax:
CREATE (DATABASE|SCHEMA) [IF NOT EXISTS] database_name
[COMMENT database_comment]
[LOCATION hdfs_path]
[WITH DBPROPERTIES (property_name=property_value,...)];
The | in DATABASE|SCHEMA means either keyword may be used;
COMMENT adds a description for the database;
IF NOT EXISTS creates the database only if it does not already exist (be sure to add this in production);
LOCATION hdfs_path specifies where the new database is stored; without it the default is under /user/hive/warehouse/;
WITH DBPROPERTIES attaches key-value properties to the database, e.g. 'creator'='ruoze','date'='2018-06-17'
a) IF NOT EXISTS
hive>
> create database hive;
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. Database hive already exists
Database hive already exists, so creating it again fails. With IF NOT EXISTS added:
hive> create database IF NOT EXISTS hive;
OK
Time taken: 0.119 seconds
With IF NOT EXISTS no error is raised; be sure to add it in production
b) LOCATION hdfs_path
hive> create database hive3 LOCATION '/zh';
hive> use hive3;
hive> create table c(id int);
This stores database hive3 at /zh and creates table c inside it
Check in HDFS:
[hadoop@hadoop001 ~]$ hadoop fs -ls /
drwx-wx-wx - hadoop supergroup 0 2018-06-13 22:09 /tmp
drwxr-xr-x - hadoop supergroup 0 2018-06-16 15:53 /user
drwxr-xr-x - hadoop supergroup 0 2018-06-17 10:43 /zh
[hadoop@hadoop001 ~]$ hadoop fs -ls /zh
drwxr-xr-x - hadoop supergroup 0 2018-06-17 10:43 /zh/c
c) COMMENT & WITH DBPROPERTIES
hive> create database hive2_ruozedata comment 'this is ruozedata 03 test database' with dbproperties('creator'='ruoze','date'='2018-06-17');
hive> desc database extended hive2_ruozedata;
OK
hive2_ruozedata this is ruozedata 03 test database hdfs://192.168.137.141:9000/user/hive/warehouse/hive2_ruozedata.db hadoop USER {date=2018-06-17, creator=ruoze}
3) Drop Database
DROP (DATABASE|SCHEMA) [IF EXISTS] database_name [RESTRICT|CASCADE];
RESTRICT: the default; the drop fails if the database still contains tables;
CASCADE: cascading delete (when the database still contains tables, drop the tables first and then the database)
hive> drop database hive2;
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. InvalidOperationException(message:Database hive2 is not empty. One or more tables exist.)
hive2 still contains table b, so it cannot be dropped directly; all of its tables must be dropped first
hive> use hive2;
OK
hive> show tables;
OK
b
hive> drop table b;
OK
hive> drop database hive2;
OK
hive> drop database hive2 CASCADE; (force-drops hive2 together with its tables; should not be used in production)
4) Alter Database
ALTER (DATABASE|SCHEMA) database_name SET DBPROPERTIES (property_name=property_value, ...); -- (Note: SCHEMA added in Hive 0.14.0)
ALTER (DATABASE|SCHEMA) database_name SET OWNER [USER|ROLE] user_or_role; -- (Note: Hive 0.13.0 and later; SCHEMA added in Hive 0.14.0)
ALTER (DATABASE|SCHEMA) database_name SET LOCATION hdfs_path; -- (Note: Hive 2.2.1, 2.4.0 and later)
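For example, a hedged sketch against the hive2_ruozedata database created above (the property key edited-by is made up for illustration):
hive> ALTER DATABASE hive2_ruozedata SET DBPROPERTIES ('edited-by'='ruoze');
hive> desc database extended hive2_ruozedata;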
5) Basic data types
a)int、bigint、float、double、decimal、boolean、string、date、timestamp
date/timestamp and boolean are rarely used; in production it is recommended to store everything as string
b) Delimiters (see the sketch below)
Row: \n
Column: \001 (^A), customizable
Usually the column delimiter is specified directly at table-creation time, e.g. \t
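A minimal sketch of declaring both delimiters explicitly at table-creation time (table and column names are made up):
hive> create table t_demo (id int, name string)
    > row format delimited
    > fields terminated by '\t'
    > lines terminated by '\n';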
6) Create Table
Official syntax:
CREATE [TEMPORARY] [EXTERNAL] TABLE [IF NOT EXISTS] [db_name.]table_name -- (Note: TEMPORARY available in Hive 0.14.0 and later)
[(col_name data_type [COMMENT col_comment], ... [constraint_specification])]
[COMMENT table_comment]
[PARTITIONED BY (col_name data_type [COMMENT col_comment], ...)]
[CLUSTERED BY (col_name, col_name, ...) [SORTED BY (col_name [ASC|DESC], ...)] INTO num_buckets BUCKETS]
[SKEWED BY (col_name, col_name, ...) -- (Note: Available in Hive 0.10.0 and later)]
ON ((col_value, col_value, ...), (col_value, col_value, ...), ...)
[STORED AS DIRECTORIES]
[
[ROW FORMAT row_format]
[STORED AS file_format]
| STORED BY 'storage.handler.class.name' [WITH SERDEPROPERTIES (...)] -- (Note: Available in Hive 0.6.0 and later)
]
[LOCATION hdfs_path]
[TBLPROPERTIES (property_name=property_value, ...)] -- (Note: Available in Hive 0.6.0 and later)
[AS select_statement]; -- (Note: Available in Hive 0.5.0 and later; not supported for external tables)
CREATE [TEMPORARY] [EXTERNAL] TABLE [IF NOT EXISTS] [db_name.]table_name
LIKE existing_table_or_view_name
[LOCATION hdfs_path];
hive> create table ruozedata.ruozedata_person  (or: use ruozedata; then create table ruozedata_person)
> (id int comment 'this is id',name string comment 'this is name')
> comment 'this is ruozedata_person'
> row format delimited
> fields terminated by '\t';
hive> use ruozedata;
hive> show tables;
OK
ruozedata_person
hive> desc formatted ruozedata_person;
## Supplement: Loading files into tables
LOAD DATA [LOCAL] INPATH 'filepath' [OVERWRITE] INTO TABLE tablename [PARTITION (partcol1=val1, partcol2=val2 ...)]
LOCAL: load data from the local file system into the Hive table
without LOCAL: load data from HDFS into the Hive table
OVERWRITE: overwrite the table's existing data when loading (production usually uses OVERWRITE)
without OVERWRITE: append (see the contrast sketch below)
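A sketch of the contrast (the table name tablename and the paths are placeholders; note that loading from HDFS moves the source file into the table directory, while LOCAL copies it):
hive> LOAD DATA LOCAL INPATH '/home/hadoop/data/emp.txt' OVERWRITE INTO TABLE tablename;
hive> LOAD DATA INPATH '/data/emp.txt' INTO TABLE tablename;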
[hadoop@hadoop001 ~]$ cd data
[hadoop@hadoop001 data]$ rz
rz waiting to receive.
Starting zmodem transfer. Press Ctrl+C to cancel.
Transferring dept.txt...
100% 79 bytes 79 bytes/sec 00:00:01 0 Errors
Transferring emp.txt...
100% 700 bytes 700 bytes/sec 00:00:01 0 Errors
[hadoop@hadoop001 data]$ pwd
/home/hadoop/data
hive> create table ruozedata_emp
> (empno int, ename string, job string, mgr int, hiredate string, salary double, comm double, deptno int)
> ROW FORMAT DELIMITED
> FIELDS TERMINATED BY '\t';
hive> LOAD DATA LOCAL INPATH '/home/hadoop/data/emp.txt' OVERWRITE INTO TABLE ruozedata_emp;
[hadoop@hadoop001 data]$ hdfs dfs -ls hdfs://192.168.137.141:9000/user/hive/warehouse/ruozedata.db/ruozedata_emp
[hadoop@hadoop001 data]$ hdfs dfs -text hdfs://192.168.137.141:9000/user/hive/warehouse/ruozedata.db/ruozedata_emp/emp.txt
hive> select * from ruozedata_emp;
hive> CREATE table ruozedata_emp2 as select * from ruozedata_emp;
Copies the data of ruozedata_emp into ruozedata_emp2
(* copies every column into ruozedata_emp2; to copy only some columns, replace * with a column list, as sketched below)
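For instance, a sketch copying only three columns (the target table name ruozedata_emp_sub is made up):
hive> create table ruozedata_emp_sub as select empno, ename, deptno from ruozedata_emp;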
hive> CREATE table ruozedata_emp3 like ruozedata_emp;
Copies only the table structure, not the data
7) Alter Table
ALTER TABLE table_name RENAME TO new_table_name; -- renames the table
hive> ALTER TABLE ruozedata_emp3 RENAME TO ruozedata_emp3_new;
8) Truncate Table
TRUNCATE TABLE table_name [PARTITION partition_spec];
Removes all rows from the table (or a partition of it) while keeping the table structure
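A quick sketch against the copy table created earlier, so the original data stays safe:
hive> TRUNCATE TABLE ruozedata_emp2;
hive> select * from ruozedata_emp2;
The select now returns no rows, but the table definition remains.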
9) Show the CREATE statement for a table
hive> show create table ruozedata_emp;
10) Tables default to MANAGED_TABLE (managed, a.k.a. internal)
The other kind is EXTERNAL_TABLE (external)
Managed table: DROP deletes both the data on HDFS and the metadata in MySQL
External table: DROP deletes only the metadata in MySQL; the data on HDFS is left intact
The following demonstrates the difference between the two:
a) First, create a managed table ruozedata_emp_managed
hive> create table ruozedata_emp_managed as select * from ruozedata_emp;
hive> desc formatted ruozedata_emp_managed;
b) Go into MySQL and check where this table's metadata lives
mysql> show databases;
mysql> use ruozedata_basic03;
mysql> show tables;
mysql> select * from tbls\G
c) Then check where this table lives on HDFS
[hadoop@hadoop001 ~]$ hdfs dfs -ls hdfs://192.168.137.141:9000/user/hive/warehouse/ruozedata.db/ruozedata_emp_managed
[hadoop@hadoop001 ~]$ hdfs dfs -text hdfs://192.168.137.141:9000/user/hive/warehouse/ruozedata.db/ruozedata_emp_managed/000000_0
d) Now drop the table ruozedata_emp_managed
hive> drop table ruozedata_emp_managed;
e) Check whether the table still exists on HDFS
[hadoop@hadoop001 ~]$ hdfs dfs -ls hdfs://192.168.137.141:9000/user/hive/warehouse/ruozedata.db/ruozedata_emp_managed
ls: `hdfs://192.168.137.141:9000/user/hive/warehouse/ruozedata.db/ruozedata_emp_managed': No such file or directory
The table's directory is gone from HDFS
f) Check MySQL again
mysql> select * from tbls\G
The ruozedata_emp_managed entry is gone as well
g) Create an external table
hive> create EXTERNAL table ruozedata_emp_external
> (empno int, ename string, job string, mgr int, hiredate string, salary double, comm double, deptno int)
> ROW FORMAT DELIMITED
> FIELDS TERMINATED BY '\t'
> LOCATION "/zh/external/emp";
[Official docs: The EXTERNAL keyword lets you create a table and provide a LOCATION so that Hive does not use a default location for this table. In other words, you specify the storage path yourself when creating an external table]
h) Copy the data in emp.txt into ruozedata_emp_external
[hadoop@hadoop001 ~]$ cd data
[hadoop@hadoop001 data]$ hdfs dfs -put emp.txt hdfs://192.168.137.141:9000/zh/external/emp
[hadoop@hadoop001 data]$ hdfs dfs -ls hdfs://192.168.137.141:9000/zh/external/emp
hive> select * from ruozedata_emp_external;
i) Check ruozedata_emp_external in MySQL
mysql> select * from tbls\G
j) Drop the table ruozedata_emp_external
hive> drop table ruozedata_emp_external;
k) Check MySQL again for ruozedata_emp_external
mysql> select * from tbls\G
The ruozedata_emp_external entry is gone
l) Check the table's data on HDFS
[hadoop@hadoop001 data]$ hdfs dfs -ls hdfs://192.168.137.141:9000/zh/external/emp
The data files are still there: dropping an external table removes only the metadata and leaves the HDFS data intact
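Because the data files survive the drop, re-creating the external table with the same schema and LOCATION makes the old data queryable again; a sketch:
hive> create EXTERNAL table ruozedata_emp_external
    > (empno int, ename string, job string, mgr int, hiredate string, salary double, comm double, deptno int)
    > ROW FORMAT DELIMITED
    > FIELDS TERMINATED BY '\t'
    > LOCATION "/zh/external/emp";
hive> select * from ruozedata_emp_external;
The old rows reappear without reloading anything.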