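Summary of Commonly Used Hive Functions

Contents:
1. Relational operators
2. Arithmetic operators
3. Logical operators
4. Complex data types: array, map, struct
5. Accessing complex types
6. Size functions for complex types
7. Constructors for complex types: map, struct, array
8. Type conversion functions
9. Date functions
10. Numeric functions
11. Conditional functions
12. String functions
13. Miscellaneous functions
14. Aggregate functions (UDAF)
15. Frequently used functions

List Hive's built-in functions:

show functions;

Show how a particular function is used:

-- show the usage of the coalesce function
desc function extended coalesce;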

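1. Relational operators:

Equality: =
Syntax: A = B
Operand types: all primitive types
Description: TRUE if expression A equals expression B, otherwise FALSE. If A or B is NULL, the result is NULL.

Examples:
select * from person where 1=1;
select * from person where 1=2;

NULL-safe equality: <=>

Syntax: A <=> B
Operand types: all primitive types
Description: for non-NULL operands the result is the same as =.
Note: unlike =, the result is never NULL. It is TRUE if both A and B are NULL, and FALSE if only one of them is NULL.

Examples:
select * from person where 1<=>1;
select * from person where 1<=>2;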
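
The difference between = and <=> only shows up when an operand is NULL. A minimal sketch, assuming a Hive version that supports <=> and SELECT without a FROM clause:

```sql
-- "=" propagates NULL, "<=>" always returns TRUE or FALSE
select null = null;    -- NULL
select null <=> null;  -- true
select 1 <=> null;     -- false
```

This is why <=> is convenient in join conditions on columns that may contain NULL.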

Inequality: <> and !=
Syntax: A <> B, A != B
Operand types: all primitive types
Description: NULL if A or B is NULL; TRUE if A is not equal to B; otherwise FALSE.

Examples:
select * from person where 1<>2;
select * from person where 1<>1;
select * from person where null<>null; -- no rows returned
select * from person where 1 != 1;
select * from person where 1 != 2;
select * from person where null != null; -- no rows returned

Less than: <
Syntax: A < B
Operand types: all primitive types
Description: NULL if A or B is NULL; TRUE if A is less than B; otherwise FALSE.

Examples:
select * from person where 1<2; -- returns rows
select * from person where 2<1; -- no rows returned
select * from person where null<null; -- no rows returned

Less than or equal to: <=
Syntax: A <= B
Operand types: all primitive types
Description: NULL if A or B is NULL; TRUE if A is less than or equal to B; otherwise FALSE.

Examples:
select * from person where 1<= 2; -- returns rows
select * from person where 2<= 1; -- no rows returned
select * from person where null<=null; -- no rows returned

Greater than: >
Syntax: A > B
Operand types: all primitive types
Description: NULL if A or B is NULL; TRUE if A is greater than B; otherwise FALSE.

Examples:
select * from person where 1> 2; -- no rows returned
select * from person where 2 >1; -- returns rows
select * from person where null>null; -- no rows returned

Greater than or equal to: >=
Syntax: A >= B
Operand types: all primitive types
Description: NULL if A or B is NULL; TRUE if A is greater than or equal to B; otherwise FALSE.

Examples:
select * from person where 1>= 2; -- no rows returned
select * from person where 2 >=1; -- returns rows
select * from person where 1>=1; -- returns rows
select * from person where null>= null; -- no rows returned

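Range comparison
NULL check: IS NULL
Syntax: A IS NULL
Operand types: all types
Description: TRUE if the value of A is NULL, otherwise FALSE.

Examples:
select * from person where 1 is null; -- no rows returned
select * from person where null is null; -- returns rows

Non-NULL check: IS NOT NULL
Syntax: A IS NOT NULL
Operand types: all types
Description: FALSE if the value of A is NULL, otherwise TRUE.

Examples:
select * from person where 1 IS NOT NULL; -- returns rows
select * from person where null IS NOT NULL; -- no rows returned

Pattern match: LIKE
Syntax: A LIKE B
Operand types: strings
Description: NULL if A or B is NULL; TRUE if string A matches the SQL pattern B; otherwise FALSE. In B, the character "_" matches any single character and "%" matches any number of characters. LIKE patterns are simple SQL wildcards, not regular expressions.

Example:
select 1 from person where 'football' like 'foot%';

Java regular-expression match: RLIKE
Syntax: A RLIKE B
Operand types: strings
Description: NULL if A or B is NULL; TRUE if any substring of A matches the Java regular expression B; otherwise FALSE.

Examples:
select 1 from person where '123456' rlike '^\\d+$'; -- checks whether a string consists only of digits
select 1 from person where '12aa456' rlike '^\\d+$';

REGEXP operator: REGEXP
Syntax: A REGEXP B
Operand types: strings
Description: same as RLIKE.

Example:
select 1 from person where 'footbar' REGEXP '^f.*r$'; -- returns rows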
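
LIKE and RLIKE are easy to confuse, so here is a short side-by-side sketch. It assumes SELECT without a FROM clause is available; the literals are only illustrative.

```sql
-- LIKE must match the whole string; RLIKE succeeds if any substring matches
select 'football' like 'foot';     -- false: without a wildcard the whole string must equal 'foot'
select 'football' like 'foot%';    -- true
select 'football' rlike 'foot';    -- true: the substring 'foot' matches
select 'football' rlike '^foot$';  -- false: the anchors force a full-string match
```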

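2. Arithmetic operators:

Addition: +
Syntax: A + B
Operand types: all numeric types
Description: returns the sum of A and B. The result type is the smallest common parent type of the types of A and B (see the type hierarchy of the data types). For example, int + int normally yields int, while int + double normally yields double.

Example:
select 1+2 from person;

Subtraction: -
Syntax: A - B
Operand types: all numeric types
Description: returns A minus B. The result type is the smallest common parent type of the types of A and B. For example, int - int normally yields int, while int - double normally yields double.

Examples:
select 5-3 from person;
select 5.2-3 from person;

Multiplication: *
Syntax: A * B
Operand types: all numeric types
Description: returns the product of A and B. The result type is the smallest common parent type of the types of A and B. Note that if the product exceeds the range of the default result type, an operand must be cast to a wider numeric type.

Examples:
select 5*3 from person;
select 5.2*3 from person;

Division: /
Syntax: A / B
Operand types: all numeric types
Description: returns A divided by B. The result type is double.

Examples:
select 5/3 from person;
select 6.0/3 from person;
select 6/3 from person;

Modulo: %
Syntax: A % B
Operand types: all numeric types
Description: returns the remainder of A divided by B. The result type is the smallest common parent type of the types of A and B.

Example:
select 41 % 5 from person;

Bitwise AND: &
Syntax: A & B
Operand types: all numeric types
Description: returns the bitwise AND of A and B. The result type is the smallest common parent type of the types of A and B.

Example:
select 4 & 8 from person; -- 0, because binary 0100 and 1000 share no set bit

Bitwise OR: |
Syntax: A | B
Operand types: all numeric types
Description: returns the bitwise OR of A and B. The result type is the smallest common parent type of the types of A and B.

Example:
select 4 | 8 from person; -- 12, because 0100 OR 1000 = 1100

Bitwise XOR: ^
Syntax: A ^ B
Operand types: all numeric types
Description: returns the bitwise XOR of A and B. The result type is the smallest common parent type of the types of A and B.

Example:
select 4 ^ 8 from person; -- 12, because the two values differ in exactly the bits 1100

Bitwise NOT: ~
Syntax: ~A
Operand types: all numeric types
Description: returns the bitwise complement of A. The result type is the type of A.

Examples:
select ~6 ; -- -7
select 6 ;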

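3. Logical operators:

Logical AND: AND, &&
Syntax: A AND B
Operand types: boolean
Description: TRUE if both A and B are TRUE; FALSE if either is FALSE. If one operand is NULL and the other is not FALSE, the result is NULL.

Examples:
select 1 from person where 1=1 and 2=2;
select 1 from person where 1=1 and 2<2;

Logical OR: OR
Syntax: A OR B
Operand types: boolean
Description: TRUE if A is TRUE, B is TRUE, or both are TRUE. FALSE OR NULL is NULL. Otherwise FALSE.

Examples:
select 1 from person  where 1=2 or 2<1;
select 1 from person  where 1=2 or 2>1;

Logical NOT: NOT
Syntax: NOT A
Operand types: boolean
Description: TRUE if A is FALSE; NULL if A is NULL; otherwise FALSE.

Example:
select 1 from person  where not 1=2;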
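
Because a WHERE clause keeps only rows whose condition is TRUE, a NULL condition behaves like FALSE when filtering. The following sketch reuses the person table from this document to show how NULL interacts with NOT, OR and AND:

```sql
-- NOT NULL is NULL, so the filter rejects every row
select 1 from person where not (null = 1);
-- NULL OR TRUE is TRUE, so rows are returned
select 1 from person where null = 1 or 1 = 1;
-- NULL AND TRUE is NULL, so no rows are returned
select 1 from person where null = 1 and 1 = 1;
```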

4. Complex data types: array, map, struct

Besides the common types TINYINT, SMALLINT, INT, BIGINT, BOOLEAN, FLOAT, DOUBLE, STRING, BINARY, TIMESTAMP, DECIMAL, DATE, VARCHAR and CHAR, Hive also supports complex data types (array, map, struct, union).

1. Using array
2. Using map
3. Using struct

Reference article: "Hive复合数据类型array,map,struct的使用"

1. Using array

Array type: a sequence of elements that all have the same data type.

Sample data array.txt: name and work locations

Huangbo beijing,shanghai,tianjin,Hangzhou
Xuzheng tianjin,chengdu,wuhan 
Wangbaoqiang    wuhan,shenyang,jilin

Create the table; the location column has an array type

create table person(name string,location array<string>) row format delimited fields terminated by "\t" collection items terminated by ",";

Load the data into the table

load data local inpath '/home/study/array.txt' into table person;

Some queries

select * from person;

-- array element access: A[n]
-- operand types: A is an array, n is an int
-- description: returns the element at index n of array A. Indexes start at 0. For example, if A is the array ['foo', 'bar'], A[0] returns 'foo' and A[1] returns 'bar'
select name,location[0],size(location ) from person;

select name from person  where array_contains(location ,'beijing');

select location[3],location[4] from person;

2. Using map

MAP: a map holds key->value pairs, and elements are accessed by key. For example, if "userlist" is a map whose keys are usernames and whose values are passwords, then userlist['username'] returns the password of that user.

Reference article: "Hive中复杂数据类型Map常用方法介绍"

Sample data map.txt: name and exam scores

huangbo yuwen:80,shuxue:89,yingyu:95
xuzheng yuwen:70,shuxue:65,yingyu:81
wangbaoqiang    yuwen:75,shuxue:100,yingyu:75

Create the table

create table score(name string, scores map<string,int>) row format delimited fields terminated by '\t' collection items terminated by ',' map keys terminated by ':';

desc formatted score;

Load the data into the table

load data local inpath '/home/study/map.txt' into table score;

Some queries

select * from score;

select name from score; 

select scores from score; 
-- map element access: M[key]
-- syntax: M[key]
-- operand types: M is a map, key is one of the map's keys

The size(Map) function returns the number of key-value pairs in a map (see section 6).
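
As a small illustration of map access on the score table defined above, the sketch below looks up individual keys and lists keys and values with the built-in map_keys and map_values functions. The key 'lishi' is a hypothetical subject that is not in the sample data, used only to show that a missing key yields NULL.

```sql
-- look up one subject per student; a key that is absent from the map yields NULL
select name, scores['yuwen'] as yuwen, scores['lishi'] as lishi from score;

-- list the keys, the values, and the number of entries of each map
select name, map_keys(scores), map_values(scores), size(scores) from score;
```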

3. Using struct

Sample data structtable.txt: student ID, course and score

1   english,80
2   math,89
3   chinese,95

Create the table

create table structtable(id int,course struct<name:string,score:int>) row format delimited fields terminated by '\t' collection items terminated by ',';

Load the data into the table

load data local inpath '/home/study/structtable.txt' into table structtable;

Some queries

select * from structtable;
select id from structtable;
select course from structtable;
select t.course.name from structtable t;
select t.course.score from structtable t;

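5. Accessing complex types

1. Array element access: A[n]
Syntax: A[n]
Operand types: A is an array, n is an int
Description: returns the element at index n of array A. Indexes start at 0. For example, if A is the array ['foo', 'bar'], A[0] returns 'foo' and A[1] returns 'bar'.

Example:
select location[0],location[1],location[2] from person;

2. Map element access: M[key]
Syntax: M[key]
Operand types: M is a map, key is one of the map's keys
Description: returns the value stored in map M under the given key. For example, if M is the map {'f' -> 'foo', 'b' -> 'bar', 'all' -> 'foobar'}, M['all'] returns 'foobar'.

Example:
select s.scores['shuxue'] from score s;

3. Struct field access: S.x
Syntax: S.x
Operand types: S is a struct
Description: returns field x of struct S. For example, for the struct foobar {int foo, int bar}, foobar.foo returns the foo field.

Example:
select t.course.score from structtable t;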

6. Size functions for complex types

1. Map size function: size(Map<K,V>)
Syntax: size(Map<K,V>)
Return type: int
Description: returns the number of entries in the map.

Examples:
select size(map('100','tom','101','mary'));
select size(scores) from score ;

2. Array size function: size(Array<T>)
Syntax: size(Array<T>)
Return type: int
Description: returns the number of elements in the array.

Examples:
select size(array('100','101','102','103'));
select size(location) from person;

3. size() cannot be applied to a struct.

7. Constructors for complex types: map, struct, array

Map constructor: map
Syntax: map (key1, value1, key2, value2,...)
Description: builds a map from the given key/value pairs.

Examples:
select map('100','tom','200','mary');
select map('yuwen',77,'shuxue',99);

Struct constructor: struct
Syntax: struct(val1, val2, val3,...)
Description: builds a struct from the given arguments.

Example:
select struct('tom','mary','tim');

Array constructor: array
Syntax: array(val1, val2,...)
Description: builds an array from the given arguments.

Example:
select array("tom","mary","tim");

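8. Type conversion functions

1. Conversion to binary: binary
Only string, char, varchar or binary values can be converted to the binary data type.

Example:
select binary('3');

2. Explicit conversion between primitive types: cast
The CAST function explicitly converts an expression of one data type to another data type. Its argument consists of the source expression and the target data type, separated by the keyword AS.
Syntax: CAST (expression AS data_type)

Examples:
select  cast(123 as string);
select cast(345 AS double);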
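
One behaviour worth knowing is that a cast which cannot be performed does not raise an error; it yields NULL. A minimal sketch with illustrative literals:

```sql
-- a string that looks like a number converts cleanly
select cast('123' as int);   -- 123
-- a string that is not a number yields NULL instead of failing the query
select cast('abc' as int);   -- NULL
```

Checking the result with IS NULL is therefore a simple way to find values that could not be converted.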

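9. Date functions

UNIX timestamp to date string: from_unixtime
Syntax: from_unixtime(bigint unixtime[, string format])
Return type: string
Description: converts a UNIX timestamp (the number of seconds since 1970-01-01 00:00:00 UTC) to a time string in the current time zone. The epoch 1970-01-01 00:00:00 Greenwich Mean Time corresponds to 1970-01-01 08:00:00 Beijing time.

Example:
SELECT from_unixtime(1602034999, 'yyyy-MM-dd');

Current UNIX timestamp: unix_timestamp
Syntax: unix_timestamp()
Return type: bigint
Description: returns the current UNIX timestamp in seconds.

Example:
SELECT UNIX_TIMESTAMP();

Date string to UNIX timestamp: unix_timestamp
Syntax: unix_timestamp(string date)
Return type: bigint
Description: converts a date string in the format "yyyy-MM-dd HH:mm:ss" to a UNIX timestamp. Returns 0 if the conversion fails.

Example:
select  unix_timestamp('2015-09-07 02:46:43');  -- converts the given datetime string to a timestamp

Date string in a given format to UNIX timestamp: unix_timestamp
Syntax: unix_timestamp(string date, string pattern)
Return type: bigint
Description: converts a date string in the format given by pattern to a UNIX timestamp. Returns 0 if the conversion fails.

Examples:
select unix_timestamp('20111207 13:01:03','yyyyMMdd HH:mm:ss');
select unix_timestamp('20111207','yyyyMMdd');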
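
A common use of these two functions together is reformatting a date string: parse it with unix_timestamp and print it again with from_unixtime. A minimal sketch using the literal from the example above:

```sql
-- parse 'yyyyMMdd', then render as 'yyyy-MM-dd'
select from_unixtime(unix_timestamp('20111207', 'yyyyMMdd'), 'yyyy-MM-dd');  -- 2011-12-07
```

Both calls use the same session time zone, so the round trip does not shift the date.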

Datetime to date: to_date
Syntax: to_date(string timestamp)
Return type: string
Description: returns the date part of a datetime value.

Example:
select to_date('2018-12-08 10:03:01'); -- 2018-12-08, the date part of the datetime

Year of a date: year
Syntax: year(string date)
Return type: int
Description: returns the year of the date.

Examples:
select year('2018-12-08 10:03:01'); -- 2018
select year('2018-12-08'); -- 2018

Month of a date: month
Syntax: month (string date)
Return type: int
Description: returns the month of the date.

Examples:
select month('2018-12-08 10:03:01'); -- 12
select month('2018-12-08'); -- 12

Day of a date: day
Syntax: day (string date)
Return type: int
Description: returns the day of the month of the date.

Examples:
select day('2018-12-08 10:03:01'); -- 8
select day('2018-12-08'); -- 8

Hour of a datetime: hour
Syntax: hour (string date)
Return type: int
Description: returns the hour of the datetime.

Example:
select hour('2018-12-08 10:03:01'); -- 10

Minute of a datetime: minute
Syntax: minute (string date)
Return type: int
Description: returns the minute of the datetime.

Example:
select minute('2018-12-08 10:03:01'); -- 3

Second of a datetime: second
Syntax: second (string date)
Return type: int
Description: returns the second of the datetime.

Example:
select second('2018-12-08 10:03:01'); -- 1

Week of the year: weekofyear
Syntax: weekofyear (string date)
Return type: int
Description: returns the week number of the date within its year.

Example:
select weekofyear('2018-01-08 10:03:01'); -- the week of the year the date falls in

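Date difference: datediff
Syntax: datediff(string enddate, string startdate)
Return type: int
Description: returns the number of days from startdate to enddate (enddate minus startdate).

Example:
select datediff('2019-07-02','2019-07-23'),datediff('2020-07-02','2019-07-23');
-- -21 and 345: the first date minus the second date, in days

Add days to a date: date_add
Syntax: date_add(string startdate, int days)
Return type: string
Description: returns the date that is days days after startdate.

Example:
select date_add('2019-07-02', 22); -- 2019-07-24, the given date plus 22 days

Subtract days from a date: date_sub
Syntax: date_sub (string startdate, int days)
Return type: string
Description: returns the date that is days days before startdate.

Example:
select date_sub('2019-07-12',10); -- 2019-07-02, the given date minus 10 days

Current timestamp: current_timestamp

select current_timestamp; -- returns the current date and time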

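10. Numeric functions

Rounding: round
Syntax: round(double a)
Return type: BIGINT
Description: returns the integer value nearest to the double a (rounding half up).

Example:
select round(2.6); -- 3.0, rounded to the nearest integer

Rounding to a given precision: round
Syntax: round(double a, int d)
Return type: DOUBLE
Description: returns a rounded to d decimal places. A negative d rounds to the left of the decimal point.

Examples:
select round(1.23454,2); -- 1.23, rounded to two decimal places
select round(1213232,-2); -- 1213200

Round down: floor
Syntax: floor(double a)
Return type: BIGINT
Description: returns the largest integer that is less than or equal to a.

Examples:
select  floor(1.3) ; -- 1
select  floor(1.99) ; -- 1
select  floor(-1.3) ; -- -2
select  floor(-1.99) ; -- -2

Round up: ceil
Syntax: ceil(double a)
Return type: BIGINT
Description: returns the smallest integer that is greater than or equal to a.

Examples:
select  ceil(1.0)  ; -- 1
select  ceil(1.0001) ; -- 2
select  ceil(1.99) ; -- 2
select  ceil(1.29)  ; -- 2
select  ceil(-1.3)  ; -- -1

Round up: ceiling
Syntax: ceiling(double a)
Return type: BIGINT
Description: same as ceil.

Examples:
select  ceiling(1.0); -- 1
select  ceiling(1.0001); -- 2
select  ceiling(1.99); -- 2
select  ceiling(1.29); -- 2
select  ceiling(-1.3) ; -- -1

Random number: rand
Syntax: rand(), rand(int seed)
Return type: double
Description: returns a random number between 0 and 1. If a seed is given, the generated sequence of random numbers is reproducible.

Examples:
select rand(); -- a random double between 0 and 1
select rand(3); -- seeded, so the sequence of random numbers is reproducible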

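Exponential function: exp
Syntax: exp(double a)
Return type: double
Description: returns e raised to the power a, where e is the base of the natural logarithm.

Example:
select exp(2);

Here e is approximately 2.71828, so exp(2) is e squared, about 7.389.

Base-10 logarithm: log10
Syntax: log10(double a)
Return type: double
Description: returns the base-10 logarithm of a.

Examples:
select  log10(35);
select  log10(100);

log10(a) is the power to which 10 must be raised to obtain a; for example, log10(100) is 2 because 10 squared is 100.

Base-2 logarithm: log2
Syntax: log2(double a)
Return type: double
Description: returns the base-2 logarithm of a.

Example:
select  log2(8);

Logarithm: log
Syntax: log(double base, double a)
Return type: double
Description: returns the logarithm of a to the given base.

Examples:
select log(10,100); -- logarithm of 100 to base 10
select log(100); -- with the base omitted, the natural logarithm is returned

Power function: pow
Syntax: pow(double a, double p)
Return type: double
Description: returns a raised to the power p.

Example: select pow(2,3); -- 2 to the power 3

Power function: power
Syntax: power(double a, double p)
Return type: double
Description: returns a raised to the power p; same as pow.

Example: select power(2,4) ;

Square root: sqrt
Syntax: sqrt(double a)
Return type: double
Description: returns the square root of a.

Example: select sqrt(16); -- the square root of 16

Binary representation: bin
Syntax: bin(BIGINT a)
Return type: string
Description: returns the binary representation of a.

Example: select bin(8); -- 1000

Hexadecimal representation: hex
Syntax: hex(BIGINT a)
Return type: string
Description: if the argument is an integer, returns its hexadecimal representation; if the argument is a string, returns the hexadecimal representation of the string's characters.

Example: select hex(30); -- 1E

Inverse of hex: unhex
Syntax: unhex(string a)
Return type: string (binary in Hive 0.12.0 and later)
Description: returns the value that the hexadecimal string represents.

Example:
select unhex('616263');

unhex is the inverse of hex: each pair of hexadecimal digits is converted back into one byte, so '616263' corresponds to the bytes of 'abc'.

Base conversion: conv
Syntax: conv(BIGINT num, int from_base, int to_base)
Return type: string
Description: converts the number num from base from_base to base to_base.

Example:
select conv(18,10,4); -- 102, converts 18 from base 10 to base 4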

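Absolute value: abs
Syntax: abs(double a), abs(int a)
Return type: double, int
Description: returns the absolute value of a.

Example:
select abs(-3.9);

Positive modulo: pmod
Syntax: pmod(int a, int b), pmod(double a, double b)
Return type: int, double
Description: returns the positive remainder of a divided by b.

Example:
select pmod(9,2);

Sine: sin
Syntax: sin(double a)
Return type: double
Description: returns the sine of a.

Example:
select sin(0);

Arcsine: asin
Syntax: asin(double a)
Return type: double
Description: returns the arcsine of a.

Example:
select asin(1);

Cosine: cos
Syntax: cos(double a)
Return type: double
Description: returns the cosine of a.

Example:
select cos(0);

Arccosine: acos
Syntax: acos(double a)
Return type: double
Description: returns the arccosine of a.

Example:
select acos(1);

positive function: positive
Syntax: positive(int a), positive(double a)
Return type: int, double
Description: returns a.

Example:
select positive(10);

negative function: negative
Syntax: negative(int a), negative(double a)
Return type: int, double
Description: returns -a.

Example:
select negative(5);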

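11. Conditional functions

If function: if
Syntax: if(boolean testCondition, T valueTrue, T valueFalseOrNull)
Return type: T
Description: returns valueTrue when testCondition is TRUE; otherwise returns valueFalseOrNull.

Examples:
select if(1=2,100,200); -- 200
select if(1=1,100,200); -- 100

First non-NULL value: COALESCE
Syntax: COALESCE(T v1, T v2,...)
Return type: T
Description: returns the first non-NULL argument; returns NULL if all arguments are NULL.

Examples:
select COALESCE(null,null,null) ; -- NULL
select COALESCE(null,'100','50') ; -- 100

nvl function: NULL replacement. It takes exactly two arguments.
If expr1 is NULL, expr2 is returned; otherwise expr1 is returned. expr1 and expr2 must have the same data type.

select nvl('asc','asd'),nvl(null,'123'),nvl('123',null),nvl(null,null);

Conditional expression: CASE
Syntax: CASE a WHEN b THEN c [WHEN d THEN e]* [ELSE f] END
Return type: T
Description: returns c if a equals b; returns e if a equals d; otherwise returns f.

Examples:
select case 100 when 50 then 'tom' when 100 then 'mary' else 'tim' end; -- mary
select case 200 when 50 then 'tom' when 100 then 'mary' else 'tim' end; -- tim

Conditional expression: CASE
Syntax: CASE WHEN a THEN b [WHEN c THEN d]* [ELSE e] END
Return type: T
Description: returns b if a is TRUE; returns d if c is TRUE; otherwise returns e. The first branch whose condition is TRUE wins.

Examples:
select case when 1=2 then 'tom' when 2=2 then 'mary' else 'tim' end; -- mary
select case when 1=1 then 'tom' when 2=2 then 'mary' else 'tim' end; -- tom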

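12. String functions

ASCII code of a character: ascii
Syntax: ascii(string str)
Return type: int
Description: returns the ASCII code of the first character of str.

Example:
select ascii('abcde'); -- 97

Base64 string
String concatenation: concat
Syntax: concat(string A, string B...)
Return type: string
Description: returns the concatenation of the input strings; any number of input strings is supported.

Examples:
select concat('abc','def','gh');
select concat('abc','-','def','-','gh');

String concatenation with a separator: concat_ws
Syntax: concat_ws(string SEP, string A, string B...)
Return type: string
Description: returns the concatenation of the input strings, with SEP placed between them.

Example:
select concat_ws('-','abc','def','gh') ;

Array-to-string function: concat_ws
Number formatting function: format_number
Substring function: substr, substring
Syntax: substr(string A, int start), substring(string A, int start)
Return type: string
Description: returns the part of string A from position start to the end. Positions start at 1.

Examples:
select substr('abcde',3) ;
select substring('abcde',3);

Substring function: substr, substring
Syntax: substr(string A, int start, int len), substring(string A, int start, int len)
Return type: string
Description: returns the part of string A that starts at position start and has length len. A negative start counts from the end of the string.

Examples:
select substr('abcde',3,2); -- cd
select substring('abcde',3,2);
select substring('abcde',-2,2); -- de

String search function: instr
Returns the position of a substring within a string.

Example:
select instr('abc','b'); -- 2

String length: length
Syntax: length(string A)
Return type: int
Description: returns the length of string A.

Example:
select length('abcedfg'); -- 7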

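String search function: locate
String formatting function: printf
String-to-map function: str_to_map
Base64 decoding function: unbase64(string str)
Uppercase conversion: upper, ucase
Syntax: upper(string A), ucase(string A)
Return type: string
Description: returns string A converted to uppercase.

Examples:
select upper('abSEd');
select ucase('abSEd');

Lowercase conversion: lower, lcase
Syntax: lower(string A), lcase(string A)
Return type: string
Description: returns string A converted to lowercase.

Examples:
select lower('abSEd');
select lcase('abSEd');

Trim spaces: trim
Syntax: trim(string A)
Return type: string
Description: removes the spaces at both ends of the string.

Example:
select trim(' abc ');

Trim leading spaces: ltrim
Syntax: ltrim(string A)
Return type: string
Description: removes the spaces at the left end of the string.

Example:
select ltrim(' abc ');

Trim trailing spaces: rtrim
Syntax: rtrim(string A)
Return type: string
Description: removes the spaces at the right end of the string.

Example:
select rtrim(' abc ');

Regular-expression replacement: regexp_replace
Syntax: regexp_replace(string A, string B, string C)
Return type: string
Description: replaces every part of string A that matches the Java regular expression B with C. Note that escape characters are needed in some cases. The function is similar to regexp_replace in Oracle.

Example:
select regexp_replace('foobar', 'oo|ar', ''); -- fb

Regular-expression extraction: regexp_extract
Syntax: regexp_extract(string subject, string pattern, int index)
Return type: string
Description: matches subject against the regular expression pattern and returns the text captured by the group numbered index. Index 0 returns the whole match.

Example:
select regexp_extract('foothebar', 'foo(.*?)(bar)', 1); -- the
-- group 1 is (.*?), the text between 'foo' and 'bar'; group 2 is (bar)

URL parsing: parse_url
Syntax: parse_url(string urlString, string partToExtract [, string keyToExtract])
Return type: string
Description: returns the specified part of the URL. Valid values for partToExtract are HOST, PATH, QUERY, REF, PROTOCOL, AUTHORITY, FILE and USERINFO.

Example:
select parse_url('http://facebook.com/path1/p.php?k1=v1&k2=v2#Ref1', 'HOST'); -- facebook.com

JSON parsing: get_json_object
Syntax: get_json_object(string json_string, string path)
Return type: string
Description: parses the JSON string json_string and returns the content selected by path. Returns NULL if the input JSON string is invalid.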
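
The entry above has no example, so here is a minimal sketch. The JSON literal and its field names are made up for illustration; the path uses the $ root, dot notation for child fields and [n] for array elements.

```sql
-- top-level field
select get_json_object('{"name":"tom","scores":{"math":89},"tags":["a","b"]}', '$.name');         -- tom
-- nested field
select get_json_object('{"name":"tom","scores":{"math":89},"tags":["a","b"]}', '$.scores.math');  -- 89
-- array element, zero-based
select get_json_object('{"name":"tom","scores":{"math":89},"tags":["a","b"]}', '$.tags[0]');      -- a
```

The result is always a string, so numeric values usually need a cast before arithmetic.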

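String of spaces: space
Syntax: space(int n)
Return type: string
Description: returns a string consisting of n spaces.

Examples:
select space(10);
select length(space(10)); -- 10

Repeat a string: repeat
Syntax: repeat(string str, int n)
Return type: string
Description: returns str repeated n times.

Example:
select repeat('abc',5); -- abcabcabcabcabc

Left padding: lpad
Syntax: lpad(string str, int len, string pad)
Return type: string
Description: pads str on the left with pad until its length is len.

Example:
select lpad('abc',10,'td'); -- tdtdtdtabc
-- note: unlike in GP and Oracle, the pad argument cannot be omitted

Right padding: rpad
Syntax: rpad(string str, int len, string pad)
Return type: string
Description: pads str on the right with pad until its length is len.

Example:
select rpad('abc',10,'td'); -- abctdtdtdt

Split a string: split
Syntax: split(string str, string pat)
Return type: array
Description: splits str around matches of pat, which is a regular expression, and returns the resulting array of strings.

Example:
select split('abtcdtef','t'); -- ["ab","cd","ef"]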
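
Because the second argument of split is treated as a regular expression, separators that are regex metacharacters need escaping. A short sketch with illustrative literals:

```sql
-- '.' is a regex metacharacter, so it is escaped to split on a literal dot
select split('a.b.c', '\\.');        -- ["a","b","c"]
-- a character class splits on either separator
select split('a,b;c', '[,;]');       -- ["a","b","c"]
-- the result is an array, so elements can be indexed directly (zero-based)
select split('abtcdtef', 't')[1];    -- cd
```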

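Search in a comma-separated list: find_in_set
Syntax: find_in_set(string str, string strList)
Return type: int
Description: returns the position of the first occurrence of str in strList, where strList is a comma-separated string. Returns 0 if str is not found.

Examples:
select find_in_set('ab','ef,ab,de'); -- 2
select find_in_set('at','ef,ab,de') ; -- 0

Tokenizing function: sentences
Splits the string into sentences and each sentence into words, and returns an array of sentences in which each sentence is an array of words.

Examples:
select sentences('Hello there! How are you?');
select sentences('Hello there How are you?');

Top-K most frequent word sequences after tokenizing
Top-K most frequent word sequences that occur together with given words after tokenizing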

13. Miscellaneous functions

Call a Java method: java_method
Call a Java method: reflect
Hash value of a string: hash
XPath functions for parsing XML
Reference article: "Hive常用函数 -- 混合函数和XPath 解析 XML 函数"
xpath
Syntax: xpath(string xmlstr,string xpath_expression)
Return type: array
Description: returns an array of the results in the XML string that match the expression.

select xpath('<a><b>b1</b><b>b2</b><c>c1</c></a>','a/b/text()');
-- ["b1","b2"]

xpath_string
Syntax: xpath_string(string xmlstr,string xpath_expression)
Return type: string
Description: by default, returns the value of the first node in the XML string that matches the expression.

SELECT xpath_string('<a><b>b1</b><b>b2</b></a>', '//b'); -- b1

-- select which matching node is returned
SELECT xpath_string('<a><b>b1</b><b>b2</b></a>', '//b[2]'); -- b2

xpath_boolean
Syntax: xpath_boolean (string xmlstr,string xpath_expression)
Return type: boolean
Description: returns whether the XML string matches the XPath expression.

SELECT xpath_boolean ('<a><b>b</b></a>', 'a/b'); -- true

xpath_short, xpath_int, xpath_long
Syntax: xpath_short (string xmlstr,string xpath_expression)
xpath_int (string xmlstr,string xpath_expression)
xpath_long (string xmlstr,string xpath_expression)
Return type: int
Description: returns the value obtained by evaluating the XPath expression against the XML string, or 0 if there is no match.

xpath_float, xpath_double, xpath_number
Syntax: xpath_float (string xmlstr,string xpath_expression)
xpath_double (string xmlstr,string xpath_expression)
xpath_number (string xmlstr,string xpath_expression)
Return type: number
Description: returns the value obtained by evaluating the XPath expression against the XML string, or 0 if there is no match.

select xpath_double('<a><b>10.5</b><c>11.2</c></a>','sum(a/*)');
-- 21.7

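14. Aggregate functions (UDAF)

Count: count
Syntax: count(*), count(expr), count(DISTINCT expr[, expr...])
Return type: bigint
Description: count(*) returns the number of retrieved rows, including rows that contain NULL values; count(expr) returns the number of rows in which expr is not NULL; count(DISTINCT expr[, expr...]) returns the number of distinct non-NULL values of the given expressions.

Sum: sum
Syntax: sum(col), sum(DISTINCT col)
Return type: double
Description: sum(col) returns the sum of col over the result set; sum(DISTINCT col) returns the sum of the distinct values of col.

Average: avg
Minimum: min
Maximum: max
Population variance: var_pop
Computes the population variance of a numeric column.

select  var_pop(age) from student;

Sample variance: var_samp
Computes the sample variance of a numeric column.

select  var_samp(age) from student;

Population standard deviation: stddev_pop
Computes the population standard deviation of a numeric column.

select  STDDEV_POP(age) from student;

Sample standard deviation: stddev_samp

select  stddev_samp(age) from student;

Median and percentiles: percentile
Reference article: "hive 分位数函数 percentile(col, p)"
percentile takes the column and a percentile p between 0 and 1; p = 0.5 gives the median.

select  percentile(age, 0.5) from student;

Approximate percentiles: percentile_approx

select  percentile_approx(age,0.95) from student;
-- the approximate age at the 95th percentile, that is, the value below which about 95% of the ages fall when sorted in ascending order

Approximate median: percentile_approx

select  percentile_approx(age,0.5) from student;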
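
Both percentile functions also accept an array of percentiles, which returns several quantiles in one pass. The sketch below assumes the student table used in this section has an integer age column; percentile requires an integral column, while percentile_approx also works on doubles.

```sql
-- exact quartiles of an integer column, returned as an array of doubles
select percentile(age, array(0.25, 0.5, 0.75)) from student;

-- approximate quartiles; the optional third argument trades memory for accuracy
select percentile_approx(age, array(0.25, 0.5, 0.75), 1000) from student;
```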

Histogram: histogram_numeric
Syntax: histogram_numeric(col, b)
Return type: array<struct{'x','y'}>
Description: computes a histogram of col using b bins. In each returned struct, x is the center of a bin and y is its height.

Example:
select histogram_numeric(age, 5) from student;

Collect distinct values: collect_set

Example 1:
select age,concat_ws('-',collect_set(department)) id,collect_set(department) id2,concat_ws('-',collect_set(cast(id as string))) from student group by age;

Example 2:
-- age is converted to a string with cast(age as string)
select  concat_ws('-',collect_set(cast(age as string))),collect_set(cast(age as string)) from student;

Collect values without removing duplicates: collect_list

Example:
select age,concat_ws('-',collect_list(department)) id,concat_ws('-',collect_list(cast(id as string))) from student group by age;

16. Table-generating functions (UDTF)

Split an array into multiple rows: explode
Split a map into multiple rows: explode

select  explode(scores)  from score;
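
To make the two forms of explode concrete, the sketch below uses the person and score tables defined earlier in this document. The column aliases (city, subject, score) are illustrative names, not part of the original.

```sql
-- exploding an array produces one row per element
select explode(location) as city from person;

-- exploding a map produces one row per entry, with a key column and a value column
select explode(scores) as (subject, score) from score;
```

When used directly in the select list like this, explode cannot be combined with other columns; LATERAL VIEW is the usual way around that restriction.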

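15. Frequently used functions

1. Coalesce
First non-NULL value: COALESCE
Syntax: COALESCE(T v1, T v2,...)
Return type: T
Description: returns the first non-NULL argument; returns NULL if all arguments are NULL.

2. Explode

select  explode(scores)  from score;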

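3. lateral view
lateral view is used together with UDTFs such as explode, often applied to the array produced by split. It turns one input row into multiple rows, and the expanded rows can then be aggregated.
Reference article: "hive中的 lateral view"
Reference article: "hive函数之~hive当中的lateral view 与 explode"

Data file pageAds.txt

front_page  1,2,3
contact_page    3,4,5

Create the table

-- a simple example: the table pageAds has two columns, pageid (string) and adid_list, the list of ad IDs, which are comma-separated in the data file
create table pageAds(pageid string,adid_list array<int>) row format delimited fields terminated by "\t" collection items terminated by ",";

Load the data

load data local inpath '/home/study/pageAds.txt' into table pageAds;

The goal is to count how many times each ad ID appears across all pages.
First, split out the ad IDs:

select  *  from  pageAds ;

SELECT pageid, adid FROM pageAds LATERAL VIEW explode(adid_list) adTable AS adid;

Then aggregate:

SELECT adid, count(1) 
    FROM pageAds LATERAL VIEW explode(adid_list) adTable AS adid
GROUP BY adid;

4. grouping sets
Reference article: "hive中grouping sets的使用"
Reference article: "Hive SQL grouping sets 用法"

grouping sets is a convenient way to write several GROUP BY aggregations in a single SQL statement.
GROUPING SETS: aggregates by each of the listed combinations of dimensions; equivalent to a UNION ALL of the GROUP BY results for each combination.
GROUPING__ID: a virtual column that indicates which grouping set a result row belongs to.
CUBE: aggregates by every combination of the GROUP BY dimensions.
ROLLUP: a subset of CUBE that aggregates hierarchically, starting from the leftmost dimension.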
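
As a rough sketch of how these clauses look in practice, the queries below assume the student table used earlier in this document, with department and age columns. The alias cnt is illustrative, and the numeric values of GROUPING__ID differ between Hive versions, so they are not shown.

```sql
-- one statement, three aggregation levels:
-- per (department, age), per department, and the grand total
select department, age, count(*) as cnt, grouping__id
from student
group by department, age
grouping sets ((department, age), (department), ());

-- rollup: (department, age), (department), ()
select department, age, count(*) as cnt
from student
group by department, age with rollup;

-- cube: every combination of the two dimensions
select department, age, count(*) as cnt
from student
group by department, age with cube;
```

In the result, a column that is not part of the current grouping set is returned as NULL, which is why GROUPING__ID is useful for telling the aggregation levels apart.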