When working with Hive we frequently compute aggregate statistics. The basic tool is the GROUP BY clause, but we often need to aggregate the same data along several dimensions at once, and that is what Hive's enhanced aggregation syntax is for.
This article explains the meaning and usage of ROLLUP, GROUPING SETS, and CUBE, and walks through their rules with a concrete case.
Background for the statistics:
We have salary records for employees across several companies and departments, and we need to total the salaries along multiple dimensions.
We will look at the three constructs one by one, starting with GROUPING SETS, then ROLLUP, then CUBE. For reference, a plausible reconstruction of the sample table and its data follows.
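The DDL and data below are a sketch inferred from the queries, EXPLAIN plans, and result sets shown later in this article; the column types come from the plans, while the storage format and the INSERT statement are assumptions:
-- Hypothetical reconstruction of the sample table; the column types match
-- the EXPLAIN output below, while STORED AS ORC is an assumption.
CREATE TABLE IF NOT EXISTS datacube_salary_org
(
company_name STRING
,dep_name STRING
,user_id BIGINT
,salary DECIMAL(10,2)
)
PARTITIONED BY (pt STRING)
STORED AS ORC;
-- Sample rows inferred from the fully grouped (grouping__id = 7) result rows below.
INSERT INTO TABLE datacube_salary_org PARTITION (pt = '20200407')
VALUES
('s.zh', 'enginer', 1, 28000.00)
,('s.zh', 'enginer', 2, 26000.00)
,('s.zh', 'tester', 3, 20000.00)
,('x.qx', 'finance', 4, 13400.00)
,('x.qx', 'finance', 5, 24500.00)
,('x.qx', 'finance', 6, 13000.00)
,('x.qx', 'kiccp', 7, 8600.00)
;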
GROUPING SETS
GROUPING SETS is a clause attached to GROUP BY that lets you list several grouping combinations in a single statement. You can think of it as several GROUP BY queries whose results are combined with UNION ALL. The examples below will make this concrete.
The general syntax of GROUPING SETS is:
SELECT
a,b...,f
FROM test_table
GROUP BY
a,b...,f
GROUPING SETS ((?,...,?),(xxx),(yyy))
GROUPING SETS may list any number of combinations, where each (?,...,?) is any non-repeating subset of the GROUP BY columns a through f.
A concrete example:
SELECT
a,b,SUM(c)
FROM test_table
GROUP BY
a,b
GROUPING SETS ((a),(a,b),())
is equivalent to:
SELECT
a
,NULL
,SUM(c)
FROM test_table
GROUP BY
a
UNION ALL
SELECT
a
,b
,SUM(c)
FROM test_table
GROUP BY
a
,b
UNION ALL
SELECT
NULL
,NULL
,SUM(c)
FROM test_table
;
Now a practical case: we want to total employee salaries per company and overall, but in a single statement.
How do we write that SQL?
The SQL:
SELECT
grouping__id
,company_name
,dep_name
,user_id
,SUM(salary) AS total_salary
FROM datacube_salary_org
WHERE pt = '20200407'
GROUP BY
company_name
,dep_name
,user_id
GROUPING SETS ((company_name), ())
ORDER BY
grouping__id
;
The output:
INFO : MapReduce Jobs Launched:
INFO : Stage-Stage-1: Map: 1 Reduce: 1 Cumulative CPU: 4.13 sec HDFS Read: 11666 HDFS Write: 175 SUCCESS
INFO : Stage-Stage-2: Map: 1 Reduce: 1 Cumulative CPU: 3.73 sec HDFS Read: 7060 HDFS Write: 188 SUCCESS
INFO : Total MapReduce CPU Time Spent: 7 seconds 860 msec
INFO : Completed executing command(queryId=hive_20200408032038_18e04047-b8c0-4d07-a5de-00ccbc7cb4cc); Time taken: 51.459 seconds
INFO : OK
+---------------+---------------+-----------+----------+---------------+
| grouping__id | company_name | dep_name | user_id | total_salary |
+---------------+---------------+-----------+----------+---------------+
| 0 | NULL | NULL | NULL | 133500.00 |
| 1 | x.qx | NULL | NULL | 59500.00 |
| 1 | s.zh | NULL | NULL | 74000.00 |
+---------------+---------------+-----------+----------+---------------+
Rows with grouping__id 0 hold the overall salary total, while rows with grouping__id 1 hold the per-company totals (how this id is computed is explained in the GROUPING__ID discussion further below).
We just said that GROUPING SETS is equivalent to GROUP BY plus UNION ALL, but do the two really execute the same way? Let's check with EXPLAIN.
Below are two queries whose results are identical. SQL 1, the UNION ALL form:
EXPLAIN
SELECT
*
FROM
(
SELECT
0 AS mark
,NULL
,NULL
,NULL
,SUM(salary) AS total_salary
FROM datacube_salary_org
WHERE pt = '20200407'
UNION ALL
SELECT
1 AS mark
,company_name
,NULL
,NULL
,SUM(salary) AS total_salary
FROM datacube_salary_org
WHERE pt = '20200407'
GROUP BY
company_name
) tmp
ORDER BY mark
;
SQL 2, the GROUPING SETS form:
EXPLAIN
SELECT
grouping__id
,company_name
,dep_name
,user_id
,SUM(salary) AS total_salary
FROM datacube_salary_org
WHERE pt = '20200407'
GROUP BY
company_name
,dep_name
,user_id
GROUPING SETS ((company_name), ())
ORDER BY
grouping__id
;
Here are the EXPLAIN results for the two queries.
The UNION ALL query. Note: each additional aggregation branch in a UNION ALL increases the job count.
STAGE DEPENDENCIES:
Stage-1 is a root stage
Stage-2 depends on stages: Stage-1, Stage-3
Stage-3 is a root stage
Stage-0 depends on stages: Stage-2
STAGE PLANS:
Stage: Stage-1
Map Reduce
Map Operator Tree:
TableScan
alias: datacube_salary_org
filterExpr: (pt = '20200407') (type: boolean)
Statistics: Num rows: 7 Data size: 340 Basic stats: COMPLETE Column stats: NONE
Select Operator
expressions: salary (type: decimal(10,2))
outputColumnNames: salary
Statistics: Num rows: 7 Data size: 340 Basic stats: COMPLETE Column stats: NONE
Group By Operator
aggregations: sum(salary)
mode: hash
outputColumnNames: _col0
Statistics: Num rows: 1 Data size: 112 Basic stats: COMPLETE Column stats: NONE
Reduce Output Operator
sort order:
Statistics: Num rows: 1 Data size: 112 Basic stats: COMPLETE Column stats: NONE
value expressions: _col0 (type: decimal(20,2))
Reduce Operator Tree:
Group By Operator
aggregations: sum(VALUE._col0)
mode: mergepartial
outputColumnNames: _col0
Statistics: Num rows: 1 Data size: 112 Basic stats: COMPLETE Column stats: NONE
Select Operator
expressions: 0 (type: int), null (type: string), _col0 (type: decimal(20,2))
outputColumnNames: _col0, _col1, _col4
Statistics: Num rows: 1 Data size: 112 Basic stats: COMPLETE Column stats: NONE
File Output Operator
compressed: false
table:
input format: org.apache.hadoop.mapred.SequenceFileInputFormat
output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
serde: org.apache.hadoop.hive.serde2.lazybinary.LazyBinarySerDe
Stage: Stage-2
Map Reduce
Map Operator Tree:
TableScan
Union
Statistics: Num rows: 4 Data size: 257 Basic stats: COMPLETE Column stats: NONE
Select Operator
expressions: _col0 (type: int), _col1 (type: string), _col4 (type: decimal(20,2))
outputColumnNames: _col0, _col1, _col4
Statistics: Num rows: 4 Data size: 257 Basic stats: COMPLETE Column stats: NONE
Reduce Output Operator
key expressions: _col0 (type: int)
sort order: +
Statistics: Num rows: 4 Data size: 257 Basic stats: COMPLETE Column stats: NONE
value expressions: _col1 (type: string), _col4 (type: decimal(20,2))
TableScan
Union
Statistics: Num rows: 4 Data size: 257 Basic stats: COMPLETE Column stats: NONE
Select Operator
expressions: _col0 (type: int), _col1 (type: string), _col4 (type: decimal(20,2))
outputColumnNames: _col0, _col1, _col4
Statistics: Num rows: 4 Data size: 257 Basic stats: COMPLETE Column stats: NONE
Reduce Output Operator
key expressions: _col0 (type: int)
sort order: +
Statistics: Num rows: 4 Data size: 257 Basic stats: COMPLETE Column stats: NONE
value expressions: _col1 (type: string), _col4 (type: decimal(20,2))
Reduce Operator Tree:
Select Operator
expressions: KEY.reducesinkkey0 (type: int), VALUE._col0 (type: string), null (type: void), null (type: void), VALUE._col3 (type: decimal(20,2))
outputColumnNames: _col0, _col1, _col2, _col3, _col4
Statistics: Num rows: 4 Data size: 257 Basic stats: COMPLETE Column stats: NONE
File Output Operator
compressed: false
Statistics: Num rows: 4 Data size: 257 Basic stats: COMPLETE Column stats: NONE
table:
input format: org.apache.hadoop.mapred.SequenceFileInputFormat
output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
Stage: Stage-3
Map Reduce
Map Operator Tree:
TableScan
alias: datacube_salary_org
filterExpr: (pt = '20200407') (type: boolean)
Statistics: Num rows: 7 Data size: 340 Basic stats: COMPLETE Column stats: NONE
Select Operator
expressions: company_name (type: string), salary (type: decimal(10,2))
outputColumnNames: company_name, salary
Statistics: Num rows: 7 Data size: 340 Basic stats: COMPLETE Column stats: NONE
Group By Operator
aggregations: sum(salary)
keys: company_name (type: string)
mode: hash
outputColumnNames: _col0, _col1
Statistics: Num rows: 7 Data size: 340 Basic stats: COMPLETE Column stats: NONE
Reduce Output Operator
key expressions: _col0 (type: string)
sort order: +
Map-reduce partition columns: _col0 (type: string)
Statistics: Num rows: 7 Data size: 340 Basic stats: COMPLETE Column stats: NONE
value expressions: _col1 (type: decimal(20,2))
Reduce Operator Tree:
Group By Operator
aggregations: sum(VALUE._col0)
keys: KEY._col0 (type: string)
mode: mergepartial
outputColumnNames: _col0, _col1
Statistics: Num rows: 3 Data size: 145 Basic stats: COMPLETE Column stats: NONE
Select Operator
expressions: 1 (type: int), _col0 (type: string), _col1 (type: decimal(20,2))
outputColumnNames: _col0, _col1, _col4
Statistics: Num rows: 3 Data size: 145 Basic stats: COMPLETE Column stats: NONE
File Output Operator
compressed: false
table:
input format: org.apache.hadoop.mapred.SequenceFileInputFormat
output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
serde: org.apache.hadoop.hive.serde2.lazybinary.LazyBinarySerDe
Stage: Stage-0
Fetch Operator
limit: -1
Processor Tree:
ListSink
The GROUPING SETS query:
STAGE DEPENDENCIES:
Stage-1 is a root stage
Stage-2 depends on stages: Stage-1
Stage-0 depends on stages: Stage-2
STAGE PLANS:
Stage: Stage-1
Map Reduce
Map Operator Tree:
TableScan
alias: datacube_salary_org
filterExpr: (pt = '20200407') (type: boolean)
Statistics: Num rows: 7 Data size: 340 Basic stats: COMPLETE Column stats: NONE
Select Operator
expressions: company_name (type: string), dep_name (type: string), user_id (type: bigint), salary (type: decimal(10,2))
outputColumnNames: company_name, dep_name, user_id, salary
Statistics: Num rows: 7 Data size: 340 Basic stats: COMPLETE Column stats: NONE
Group By Operator
aggregations: sum(salary)
keys: company_name (type: string), dep_name (type: string), user_id (type: bigint), 0 (type: int)
mode: hash
outputColumnNames: _col0, _col1, _col2, _col3, _col4
Statistics: Num rows: 14 Data size: 680 Basic stats: COMPLETE Column stats: NONE
Reduce Output Operator
key expressions: _col0 (type: string), _col1 (type: string), _col2 (type: bigint), _col3 (type: int)
sort order: ++++
Map-reduce partition columns: _col0 (type: string), _col1 (type: string), _col2 (type: bigint), _col3 (type: int)
Statistics: Num rows: 14 Data size: 680 Basic stats: COMPLETE Column stats: NONE
value expressions: _col4 (type: decimal(20,2))
Reduce Operator Tree:
Group By Operator
aggregations: sum(VALUE._col0)
keys: KEY._col0 (type: string), KEY._col1 (type: string), KEY._col2 (type: bigint), KEY._col3 (type: int)
mode: mergepartial
outputColumnNames: _col0, _col1, _col2, _col3, _col4
Statistics: Num rows: 7 Data size: 340 Basic stats: COMPLETE Column stats: NONE
Select Operator
expressions: _col3 (type: int), _col0 (type: string), _col1 (type: string), _col2 (type: bigint), _col4 (type: decimal(20,2))
outputColumnNames: _col0, _col1, _col2, _col3, _col4
Statistics: Num rows: 7 Data size: 340 Basic stats: COMPLETE Column stats: NONE
File Output Operator
compressed: false
table:
input format: org.apache.hadoop.mapred.SequenceFileInputFormat
output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
serde: org.apache.hadoop.hive.serde2.lazybinary.LazyBinarySerDe
Stage: Stage-2
Map Reduce
Map Operator Tree:
TableScan
Reduce Output Operator
key expressions: _col0 (type: int)
sort order: +
Statistics: Num rows: 7 Data size: 340 Basic stats: COMPLETE Column stats: NONE
value expressions: _col1 (type: string), _col2 (type: string), _col3 (type: bigint), _col4 (type: decimal(20,2))
Reduce Operator Tree:
Select Operator
expressions: KEY.reducesinkkey0 (type: int), VALUE._col0 (type: string), VALUE._col1 (type: string), VALUE._col2 (type: bigint), VALUE._col3 (type: decimal(20,2))
outputColumnNames: _col0, _col1, _col2, _col3, _col4
Statistics: Num rows: 7 Data size: 340 Basic stats: COMPLETE Column stats: NONE
File Output Operator
compressed: false
Statistics: Num rows: 7 Data size: 340 Basic stats: COMPLETE Column stats: NONE
table:
input format: org.apache.hadoop.mapred.SequenceFileInputFormat
output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
Stage: Stage-0
Fetch Operator
limit: -1
Processor Tree:
ListSink
Comparing the two plans, the GROUPING SETS version produces fewer stages: three (Stage-1, Stage-2, Stage-0) against four for the UNION ALL form, and it scans the table only once.
In practice the GROUPING SETS form also runs faster than GROUP BY with UNION ALL.
The measured execution times:
GROUPING SETS
INFO : MapReduce Jobs Launched:
INFO : Stage-Stage-1: Map: 1 Reduce: 1 Cumulative CPU: 3.62 sec HDFS Read: 11666 HDFS Write: 175 SUCCESS
INFO : Stage-Stage-2: Map: 1 Reduce: 1 Cumulative CPU: 3.51 sec HDFS Read: 7060 HDFS Write: 188 SUCCESS
INFO : Total MapReduce CPU Time Spent: 7 seconds 130 msec
INFO : Completed executing command(queryId=hive_20200408045412_4ab9e09f-436e-4433-9a1f-a03d5b32ef3e); Time taken: 49.676 seconds
INFO : OK
+---------------+---------------+-----------+----------+---------------+
| grouping__id | company_name | dep_name | user_id | total_salary |
+---------------+---------------+-----------+----------+---------------+
| 0 | NULL | NULL | NULL | 133500.00 |
| 1 | x.qx | NULL | NULL | 59500.00 |
| 1 | s.zh | NULL | NULL | 74000.00 |
+---------------+---------------+-----------+----------+---------------+
GROUP BY with UNION ALL
INFO : MapReduce Jobs Launched:
INFO : Stage-Stage-1: Map: 1 Reduce: 1 Cumulative CPU: 3.7 sec HDFS Read: 10541 HDFS Write: 119 SUCCESS
INFO : Stage-Stage-3: Map: 1 Reduce: 1 Cumulative CPU: 4.34 sec HDFS Read: 10919 HDFS Write: 152 SUCCESS
INFO : Stage-Stage-2: Map: 2 Reduce: 1 Cumulative CPU: 5.08 sec HDFS Read: 12932 HDFS Write: 188 SUCCESS
INFO : Total MapReduce CPU Time Spent: 13 seconds 120 msec
INFO : Completed executing command(queryId=hive_20200408045141_2033cbf6-a457-4bdb-aaec-65900b386972); Time taken: 84.365 seconds
INFO : OK
+-----------+----------+----------+----------+-------------------+
| tmp.mark | tmp._c1 | tmp._c2 | tmp._c3 | tmp.total_salary |
+-----------+----------+----------+----------+-------------------+
| 0 | NULL | NULL | NULL | 133500.00 |
| 1 | x.qx | NULL | NULL | 59500.00 |
| 1 | s.zh | NULL | NULL | 74000.00 |
+-----------+----------+----------+----------+-------------------+
GROUPING SETS finished in 49.7 seconds with 7.13 seconds of total MapReduce CPU time, versus 84.4 seconds and 13.12 seconds for the UNION ALL version, so GROUPING SETS is clearly more efficient here.
ROLLUP
ROLLUP rolls the grouping up from right to left: it starts from the full column list and drops the rightmost column one level at a time, down to the grand total. That is, it produces only the prefix combinations of the listed columns; for WITH ROLLUP over a, b, c, d (written left to right, 1 = column kept) the only patterns are 1111, 1110, 1100, 1000, 0000.
So:
SELECT
grouping__id
,company_name
,dep_name
,user_id
,SUM(salary) AS total_salary
FROM datacube_salary_org
WHERE pt = '20200407'
GROUP BY
company_name
,dep_name
,user_id
WITH ROLLUP
ORDER BY
grouping__id
;
is equivalent to
SELECT
grouping__id
,company_name
,dep_name
,user_id
,SUM(salary) AS total_salary
FROM datacube_salary_org
WHERE pt = '20200407'
GROUP BY
company_name
,dep_name
,user_id
GROUPING SETS ((company_name, dep_name, user_id), (company_name, dep_name), (company_name),())
ORDER BY
grouping__id
;
which is in turn equivalent to
SELECT
company_name
,dep_name
,user_id
,SUM(salary) AS total_salary
FROM datacube_salary_org
WHERE pt = '20200407'
GROUP BY
company_name
,dep_name
,user_id
UNION ALL
SELECT
company_name
,dep_name
,NULL
,SUM(salary) AS total_salary
FROM datacube_salary_org
WHERE pt = '20200407'
GROUP BY
company_name
,dep_name
UNION ALL
SELECT
company_name
,NULL
,NULL
,SUM(salary) AS total_salary
FROM datacube_salary_org
WHERE pt = '20200407'
GROUP BY
company_name
UNION ALL
SELECT
NULL
,NULL
,NULL
,SUM(salary) AS total_salary
FROM datacube_salary_org
WHERE pt = '20200407'
;
The result:
INFO : MapReduce Jobs Launched:
INFO : Stage-Stage-1: Map: 1 Reduce: 1 Cumulative CPU: 4.21 sec HDFS Read: 11674 HDFS Write: 563 SUCCESS
INFO : Stage-Stage-2: Map: 1 Reduce: 1 Cumulative CPU: 3.91 sec HDFS Read: 7448 HDFS Write: 602 SUCCESS
INFO : Total MapReduce CPU Time Spent: 8 seconds 120 msec
INFO : Completed executing command(queryId=hive_20200408052638_740f42b9-6f08-49a6-8123-9a77aedc6b19); Time taken: 50.563 seconds
INFO : OK
+---------------+---------------+-----------+----------+---------------+
| grouping__id | company_name | dep_name | user_id | total_salary |
+---------------+---------------+-----------+----------+---------------+
| 0 | NULL | NULL | NULL | 133500.00 |
| 1 | s.zh | NULL | NULL | 74000.00 |
| 1 | x.qx | NULL | NULL | 59500.00 |
| 3 | s.zh | tester | NULL | 20000.00 |
| 3 | x.qx | kiccp | NULL | 8600.00 |
| 3 | x.qx | finance | NULL | 50900.00 |
| 3 | s.zh | enginer | NULL | 54000.00 |
| 7 | x.qx | kiccp | 7 | 8600.00 |
| 7 | x.qx | finance | 6 | 13000.00 |
| 7 | x.qx | finance | 5 | 24500.00 |
| 7 | x.qx | finance | 4 | 13400.00 |
| 7 | s.zh | enginer | 2 | 26000.00 |
| 7 | s.zh | enginer | 1 | 28000.00 |
| 7 | s.zh | tester | 3 | 20000.00 |
+---------------+---------------+-----------+----------+---------------+
Let's briefly go over how GROUPING__ID is computed, using the ROLLUP query again:
SELECT
grouping__id
,company_name
,dep_name
,user_id
,SUM(salary) AS total_salary
FROM datacube_salary_org
WHERE pt = '20200407'
GROUP BY
company_name
,dep_name
,user_id
WITH ROLLUP
ORDER BY
grouping__id
;
The GROUP BY here has three columns, company_name, dep_name, and user_id, which we can view as a 3-bit binary number with company_name as the lowest bit and user_id as the highest. In the Hive version used here, a column's bit is 1 when that column participates in the row's grouping set and 0 when it has been rolled up to NULL, and GROUPING__ID is the decimal value of the bit vector. (Be aware that Hive 2.3.0 and later invert this encoding to follow the SQL standard, where a set bit means the column was aggregated away, so the ids you see depend on your Hive version.)
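Putting this together for WITH ROLLUP over (company_name, dep_name, user_id), the four grouping sets map to ids as follows (the legacy encoding, matching the output above):
grouping set                       | user_id | dep_name | company_name | binary | grouping__id
(company_name, dep_name, user_id)  |    1    |    1     |      1       |  111   |      7
(company_name, dep_name)           |    0    |    1     |      1       |  011   |      3
(company_name)                     |    0    |    0     |      1       |  001   |      1
()                                 |    0    |    0     |      0       |  000   |      0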
1) The full-grouping subset of the ROLLUP
SELECT
company_name
,dep_name
,user_id
,SUM(salary) AS total_salary
FROM datacube_salary_org
WHERE pt = '20200407'
GROUP BY
company_name
,dep_name
,user_id
The corresponding rows from the ROLLUP output:
+---------------+---------------+-----------+----------+---------------+
| grouping__id | company_name | dep_name | user_id | total_salary |
+---------------+---------------+-----------+----------+---------------+
| 7 | x.qx | kiccp | 7 | 8600.00 |
| 7 | x.qx | finance | 6 | 13000.00 |
| 7 | x.qx | finance | 5 | 24500.00 |
| 7 | x.qx | finance | 4 | 13400.00 |
| 7 | s.zh | enginer | 2 | 26000.00 |
| 7 | s.zh | enginer | 1 | 28000.00 |
| 7 | s.zh | tester | 3 | 20000.00 |
+---------------+---------------+-----------+----------+---------------+
All three GROUP BY columns are present in this grouping set, so the bit vector is 111 and GROUPING__ID is 7.
2) The (company_name, dep_name) subset of the ROLLUP
SELECT
company_name
,dep_name
,NULL
,SUM(salary) AS total_salary
FROM datacube_salary_org
WHERE pt = '20200407'
GROUP BY
company_name
,dep_name
The corresponding rows from the ROLLUP output:
+---------------+---------------+-----------+----------+---------------+
| grouping__id | company_name | dep_name | user_id | total_salary |
+---------------+---------------+-----------+----------+---------------+
| 3 | s.zh | tester | NULL | 20000.00 |
| 3 | x.qx | kiccp | NULL | 8600.00 |
| 3 | x.qx | finance | NULL | 50900.00 |
| 3 | s.zh | enginer | NULL | 54000.00 |
+---------------+---------------+-----------+----------+---------------+
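Here user_id has been rolled away, so the bit vector is 011 and GROUPING__ID is 3.
As a side note, on Hive 2.3.0 and later you can call the standard GROUPING() function per column instead of decoding GROUPING__ID by hand. A minimal sketch, assuming a 2.3.0+ cluster where GROUPING(col) returns 0 when the column participates in the row's grouping set and 1 when it has been aggregated away:
SELECT
company_name
,dep_name
,GROUPING(company_name) AS g_company -- 0: grouped by company_name; 1: rolled up
,GROUPING(dep_name) AS g_dep         -- 0: grouped by dep_name; 1: rolled up
,SUM(salary) AS total_salary
FROM datacube_salary_org
WHERE pt = '20200407'
GROUP BY
company_name
,dep_name
GROUPING SETS ((company_name, dep_name), (company_name), ())
;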
CUBE
CUBE (the data cube operator) aggregates over every combination of the selected columns. CUBE(a,b,c) first groups by (a,b,c), then by (a,b), (a,c), (a), (b,c), (b), (c), and finally over the whole table; that is, all 2^n combinations of the n columns. With our three columns that yields 2^3 = 8 grouping sets:
SELECT
grouping__id
,company_name
,dep_name
,user_id
,SUM(salary) AS total_salary
FROM datacube_salary_org
WHERE pt = '20200407'
GROUP BY
company_name
,dep_name
,user_id
WITH CUBE
ORDER BY
grouping__id
;
is equivalent to the UNION ALL of all eight groupings, beginning with (company_name, dep_name, user_id):
SELECT
company_name
,dep_name
,user_id
,SUM(salary) AS total_salary
FROM datacube_salary_org
WHERE pt = '20200407'
GROUP BY
company_name
,dep_name
,user_id
UNION ALL
SELECT
company_name
,dep_name
,NULL
,SUM(salary) AS total_salary
FROM datacube_salary_org
WHERE pt = '20200407'
GROUP BY
company_name
,dep_name
UNION ALL
SELECT
company_name
,NULL
,user_id
,SUM(salary) AS total_salary
FROM datacube_salary_org
WHERE pt = '20200407'
GROUP BY
company_name
,user_id
UNION ALL
SELECT
company_name
,NULL
,NULL
,SUM(salary) AS total_salary
FROM datacube_salary_org
WHERE pt = '20200407'
GROUP BY
company_name
UNION ALL
SELECT
NULL
,dep_name
,user_id
,SUM(salary) AS total_salary
FROM datacube_salary_org
WHERE pt = '20200407'
GROUP BY
dep_name
,user_id
UNION ALL
SELECT
NULL
,dep_name
,NULL
,SUM(salary) AS total_salary
FROM datacube_salary_org
WHERE pt = '20200407'
GROUP BY
dep_name
UNION ALL
SELECT
NULL
,NULL
,user_id
,SUM(salary) AS total_salary
FROM datacube_salary_org
WHERE pt = '20200407'
GROUP BY
user_id
UNION ALL
SELECT
NULL
,NULL
,NULL
,SUM(salary) AS total_salary
FROM datacube_salary_org
WHERE pt = '20200407'
;
The result:
+---------------+---------------+-----------+----------+---------------+
| grouping__id | company_name | dep_name | user_id | total_salary |
+---------------+---------------+-----------+----------+---------------+
| 0 | NULL | NULL | NULL | 133500.00 |
| 1 | s.zh | NULL | NULL | 74000.00 |
| 1 | x.qx | NULL | NULL | 59500.00 |
| 2 | NULL | finance | NULL | 50900.00 |
| 2 | NULL | kiccp | NULL | 8600.00 |
| 2 | NULL | tester | NULL | 20000.00 |
| 2 | NULL | enginer | NULL | 54000.00 |
| 3 | s.zh | tester | NULL | 20000.00 |
| 3 | s.zh | enginer | NULL | 54000.00 |
| 3 | x.qx | kiccp | NULL | 8600.00 |
| 3 | x.qx | finance | NULL | 50900.00 |
| 4 | NULL | NULL | 7 | 8600.00 |
| 4 | NULL | NULL | 5 | 24500.00 |
| 4 | NULL | NULL | 4 | 13400.00 |
| 4 | NULL | NULL | 3 | 20000.00 |
| 4 | NULL | NULL | 2 | 26000.00 |
| 4 | NULL | NULL | 1 | 28000.00 |
| 4 | NULL | NULL | 6 | 13000.00 |
| 5 | s.zh | NULL | 2 | 26000.00 |
| 5 | s.zh | NULL | 3 | 20000.00 |
| 5 | x.qx | NULL | 5 | 24500.00 |
| 5 | x.qx | NULL | 6 | 13000.00 |
| 5 | s.zh | NULL | 1 | 28000.00 |
| 5 | x.qx | NULL | 7 | 8600.00 |
| 5 | x.qx | NULL | 4 | 13400.00 |
| 6 | NULL | enginer | 1 | 28000.00 |
| 6 | NULL | finance | 4 | 13400.00 |
| 6 | NULL | tester | 3 | 20000.00 |
| 6 | NULL | finance | 5 | 24500.00 |
| 6 | NULL | kiccp | 7 | 8600.00 |
| 6 | NULL | enginer | 2 | 26000.00 |
| 6 | NULL | finance | 6 | 13000.00 |
| 7 | x.qx | finance | 5 | 24500.00 |
| 7 | x.qx | finance | 4 | 13400.00 |
| 7 | x.qx | kiccp | 7 | 8600.00 |
| 7 | s.zh | tester | 3 | 20000.00 |
| 7 | s.zh | enginer | 2 | 26000.00 |
| 7 | s.zh | enginer | 1 | 28000.00 |
| 7 | x.qx | finance | 6 | 13000.00 |
+---------------+---------------+-----------+----------+---------------+
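The ids read directly against the bit encoding described earlier. For example, grouping__id 5 is binary 101: company_name (bit 0) and user_id (bit 2) are present while dep_name is rolled up to NULL. Likewise grouping__id 6 is binary 110: dep_name and user_id are present while company_name is rolled up.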
To summarize:
grouping sets: GROUP BY exactly the column combinations you list
with rollup: GROUP BY the combinations obtained by dropping the listed columns from right to left
with cube: GROUP BY every combination of the listed columns
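One closing sketch: when you need combinations that ROLLUP's right-to-left prefixes do not cover, and you do not want CUBE's full 2^n explosion, GROUPING SETS states them directly. A hypothetical variation on the queries above, computing only the (company_name, dep_name) and (company_name, user_id) totals in a single pass:
SELECT
grouping__id
,company_name
,dep_name
,user_id
,SUM(salary) AS total_salary
FROM datacube_salary_org
WHERE pt = '20200407'
GROUP BY
company_name
,dep_name
,user_id
GROUPING SETS ((company_name, dep_name), (company_name, user_id))
ORDER BY
grouping__id
;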
————————————————
Copyright notice: this is an original article by CSDN blogger 「高达一号」, released under the CC 4.0 BY-SA license. Please include the original source link and this notice when reposting.
Original link: https://blog.csdn.net/u010003835/article/details/105353510