数仓实战05:数仓搭建-DWS层

1.业务术语

1)用户
用户以设备为判断标准,在移动统计中,每个独立设备认为是一个独立用户。Android 系统根据 IMEI 号,IOS 系统根据 OpenUDID 来标识一个独立用户,每部手机一个用户。
2)新增用户
首次联网使用应用的用户。如果一个用户首次打开某 APP,那这个用户定义为新增用 户;卸载再安装的设备,不会被算作一次新增。新增用户包括日新增用户、周新增用户、月新增用户。
3)活跃用户
打开应用的用户即为活跃用户,不考虑用户的使用情况。每天一台设备打开多次会被计为一个活跃用户。
4)周(月)活跃用户
某个自然周(月)内启动过应用的用户,该周(月)内的多次启动只记一个活跃用户。
5)月活跃率
月活跃用户与截止到该月累计的用户总和之间的比例。
6)沉默用户
用户仅在安装当天(次日)启动一次,后续时间无再启动行为。该指标可以反映新增用 户质量和用户与 APP 的匹配程度。
7)版本分布
不同版本的周内各天新增用户数,活跃用户数和启动次数。利于判断 APP 各个版本之 间的优劣和用户行为习惯。
8)本周回流用户
上周未启动过应用,本周启动了应用的用户。
9)连续 n 周活跃用户
连续 n 周,每周至少启动一次。
10)忠诚用户
连续活跃 5 周以上的用户
11)连续活跃用户
连续 2 周及以上活跃的用户
12)近期流失用户
连续 n(2<= n <= 4)周没有启动应用的用户。(第 n+1 周没有启动过)
13)留存用户
某段时间内的新增用户,经过一段时间后,仍然使用应用的被认作是留存用户;这部分 用户占当时新增用户的比例即是留存率。
例如,5 月份新增用户 200,这 200 人在 6 月份启动过应用的有 100 人,7 月份启动过 应用的有 80 人,8 月份启动过应用的有 50 人;则 5 月份新增用户一个月后的留存率是 50%, 二个月后的留存率是 40%,三个月后的留存率是 25%。
14)用户新鲜度
每天启动应用的新老用户比例,即新增用户数占活跃用户数的比例。
15)单次使用时长
每次启动使用的时间长度。
16)日使用时长
累计一天内的使用时间长度。
17)启动次数计算标准
IOS 平台应用退到后台就算一次独立的启动;Android 平台我们规定,两次启动之间的 间隔小于 30 秒,被计算一次启动。用户在使用过程中,若因收发短信或接电话等退出应用 30 秒又再次返回应用中,那这两次行为应该是延续而非独立的,所以可以被算作一次使用 行为,即一次启动。业内大多使用 30 秒这个标准,但用户还是可以自定义此时间间隔。

2.系统函数

2.1 collect_set 函数
1)创建原数据表

hive (gmall)> drop table if exists stud; 
create table stud (name string, area string, course string, score int);

2)向原数据表中插入数据

hive (gmall) > INSERT INTO TABLE stud
VALUES
    ('zhang3', 'bj', 'math', 88);

INSERT INTO TABLE stud
VALUES
    ('li4', 'bj', 'math', 99);

INSERT INTO TABLE stud
VALUES
    ('wang5', 'sh', 'chinese', 92);

INSERT INTO TABLE stud
VALUES
    ('zhao6', 'sh', 'chinese', 54);

INSERT INTO TABLE stud
VALUES
    ('tian7', 'bj', 'chinese', 91);

3)查询表中数据

hive (gmall)> select * from stud; 

输出:

stud.name stud.area stud.course stud.score
zhang3 bj math 88 
li4 bj math 99 
wang5 sh chinese 92 
zhao6 sh chinese 54 
tian7 bj chinese 91

4)把同一分组的不同行的数据聚合成一个集合

hive (gmall) > SELECT
    course,
    collect_set (area),
    avg(score)
FROM
    stud
GROUP BY
    course;

输出:

chinese ["sh","bj"] 79.0 
math ["bj"] 93.5

5) 用下标可以取某一个

hive (gmall)> select course, collect_set(area)[0], 
avg(score) from stud group by course;
chinese sh 79.0 math bj 93.5

2.2 nvl函数
1)基本语法
NVL(表达式 1,表达式 2)
如果表达式 1 为空值,NVL 返回值为表达式 2 的值,否则返回表达式 1 的值。 该函 数的目的是把一个空值(null)转换成一个实际的值。其表达式的值可以是数字型、字符型 和日期型。但是表达式 1 和表达式 2 的数据类型必须为同一个类型。

2.3 日期处理函数
1)date_format 函数(根据格式整理日期)

hive (gmall)> select date_format('2020-03-10','yyyy-MM');

2020-03

2)date_add 函数(加减日期)

hive (gmall)> select date_add('2020-03-10',-1); 
2020-03-09 
hive (gmall)> select date_add('2020-03-10',1); 
2020-03-11

3)next_day 函数
(1)取当前天的下一个周一

hive (gmall)> select next_day('2020-03-12','MO'); 
2020-03-16

说明:星期一到星期日的英文(Monday,Tuesday、Wednesday、Thursday、Friday、Saturday、Sunday)
(2)取当前周的周一

hive (gmall)> select date_add(next_day('2020-03-12','MO'),-7); 
2020-03-11

4)last_day 函数(求当月最后一天日期)

hive (gmall)> select last_day('2020-03-10'); 
2020-03-31
3.DWS 层(用户行为)

3.1 每日设备行为
每日设备行为,主要按照设备 id 统计。


1)建表语句

hive (gmall) > DROP TABLE
IF EXISTS dws_uv_detail_daycount;

CREATE external TABLE dws_uv_detail_daycount (
    `mid_id` string COMMENT '设备唯一标识',
    `user_id` string COMMENT '用户标识',
    `version_code` string COMMENT '程序版本号',
    `version_name` string COMMENT '程序版本名',
    `lang` string COMMENT '系统语言',
    `source` string COMMENT '渠道号',
    `os` string COMMENT '安卓系统版本',
    `area` string COMMENT '区域',
    `model` string COMMENT '手机型号',
    `brand` string COMMENT '手机品牌',
    `sdk_version` string COMMENT 'sdkVersion',
    `gmail` string COMMENT 'gmail',
    `height_width` string COMMENT '屏幕宽高',
    `app_time` string COMMENT '客户端日志产生时的时间',
    `network` string COMMENT '网络模式',
    `lng` string COMMENT '经度',
    `lat` string COMMENT '纬度',
    `login_count` BIGINT COMMENT '活跃次数'
) partitioned BY (dt string) stored AS parquet location '/warehouse/gmall/dws/dws_uv_detail_daycount';

2)数据装载

hive (gmall) > 
INSERT overwrite TABLE dws_uv_detail_daycount PARTITION (dt = '2020-03-10') SELECT
    mid_id,
    concat_ws('|', collect_set(user_id)) user_id,
    concat_ws(
        '|',
        collect_set (version_code)
    ) version_code,
    concat_ws(
        '|',
        collect_set (version_name)
    ) version_name,
    concat_ws('|', collect_set(lang)) lang,
    concat_ws('|', collect_set(source)) source,
    concat_ws('|', collect_set(os)) os,
    concat_ws('|', collect_set(area)) area,
    concat_ws('|', collect_set(model)) model,
    concat_ws('|', collect_set(brand)) brand,
    concat_ws(
        '|',
        collect_set (sdk_version)
    ) sdk_version,
    concat_ws('|', collect_set(gmail)) gmail,
    concat_ws(
        '|',
        collect_set (height_width)
    ) height_width,
    concat_ws('|', collect_set(app_time)) app_time,
    concat_ws('|', collect_set(network)) network,
    concat_ws('|', collect_set(lng)) lng,
    concat_ws('|', collect_set(lat)) lat,
    count(*) login_count
FROM
    dwd_start_log
WHERE
    dt = '2020-03-10'
GROUP BY
    mid_id;

3)查询加载结果

hive (gmall)> 
select * from dws_uv_detail_daycount 
where dt='2020-03-10';
4.DWS层(业务)

DWS 层的宽表字段,是站在不同维度的视角去看事实表。重点关注事实表的度量值。


4.1 每日会员行为
1)建表语句

hive (gmall) > DROP TABLE
IF EXISTS dws_user_action_daycount;

CREATE external TABLE dws_user_action_daycount (
    user_id string COMMENT '用户 id',
    login_count BIGINT COMMENT '登录次数',
    cart_count BIGINT COMMENT '加入购物车次数',
    cart_amount DOUBLE COMMENT '加入购物车金额',
    order_count BIGINT COMMENT '下单次数',
    order_amount DECIMAL (16, 2) COMMENT '下单金额',
    payment_count BIGINT COMMENT '支付次数',
    payment_amount DECIMAL (16, 2) COMMENT '支付金额'
) COMMENT '每日用户行为' PARTITIONED BY (`dt` string) stored AS parquet location '/warehouse/gmall/dws/dws_user_action_daycount/' tblproperties (
    "parquet.compression" = "lzo"
);

2)数据装载

hive (gmall) > 
with tmp_login as (
 select 
  user_id,
  count(*) login_count 
 from dwd_start_log
 where dt='2020-12-25'
 group by user_id
),
tmp_cart as (
 select 
  user_id,
  count(*) cart_count,
  sum(cart_price*sku_num)cart_amount
 from dwd_fact_cart_info
 where dt='2020-12-27'
 group by user_id
),
tmp_order as (
 select
  user_id,
  count(*) order_count,
  sum(final_total_amount) order_amount
 from dwd_fact_order_info
 where dt='2020-12-27'
 group by user_id
),
tmp_pament as (
 select
  user_id,
  count(*) payment_count,
  sum(payment_amount)payment_amount
 from dwd_fact_payment_info
 where dt='2020-12-27'
 group by user_id
)

insert overwrite table dws_user_action_daycount
partition(dt='2020-12-27')
select 
 user_actions.user_id,
 sum(user_actions.login_count),
 sum(user_actions.cart_count),
 sum(user_actions.cart_amount),
 sum(user_actions.order_count),
 sum(user_actions.order_amount),
 sum(user_actions.payment_count),
 sum(user_actions.payment_amount)
from 
(
 select
  user_id,
  login_count,
  0 cart_count,
  0 cart_amount,
  0 order_count,
  0 order_amount,
  0 payment_count,
  0 payment_amount
 from tmp_login
 union all
  select
   user_id,
   0 login_count,
   cart_count,
   cart_amount,
   0 order_count,
   0 order_amount,
   0 payment_count,
   0 payment_amount
  from tmp_cart
  union all
   select 
    user_id,
    0 login_count,
    0 cart_count,
    0 cart_amount,
    order_count,
    order_amount,
    0 payment_count,
    0 payment_amount
   from tmp_order
   union all
    select 
     user_id,
     0 login_count,
     0 cart_count,
     0 cart_amount,
     0 order_count,
     0 order_amount,
     payment_count,
     payment_amount
   from tmp_pament
)user_actions
group by user_id;

select *from dws_user_action_daycount


3)查询加载结果

hive (gmall)> 
select * from dws_user_action_daycount 
where dt='2020-03-10';

3.2 每日商品行为
1)建表语句

hive (gmall) > DROP TABLE
IF EXISTS dws_sku_action_daycount;

CREATE external TABLE dws_sku_action_daycount (
    sku_id string COMMENT 'sku_id',
    order_count BIGINT COMMENT '被下单次数',
    order_num BIGINT COMMENT '被下单件数',
    order_amount DECIMAL (16, 2) COMMENT '被下单金额',
    payment_count BIGINT COMMENT '被支付次数',
    payment_num BIGINT COMMENT '被支付件数',
    payment_amount DECIMAL (16, 2) COMMENT '被支付金额',
    refund_count BIGINT COMMENT '被退款次数',
    refund_num BIGINT COMMENT '被退款件数',
    refund_amount DECIMAL (16, 2) COMMENT '被退款金额',
    cart_count BIGINT COMMENT '被加入购物车次数',
    cart_num BIGINT COMMENT '被加入购物车件数',
    favor_count BIGINT COMMENT '被收藏次数',
    appraise_good_count BIGINT COMMENT '好评数',
    appraise_mid_count BIGINT COMMENT '中评数',
    appraise_bad_count BIGINT COMMENT '差评数',
    appraise_default_count BIGINT COMMENT '默认评价数'
) COMMENT '每日商品行为' PARTITIONED BY (`dt` string) stored AS parquet location '/warehouse/gmall/dws/dws_sku_action_daycount/' tblproperties (
    "parquet.compression" = "lzo"
);

2)数据装载
注意:如果是 23 点 59 下单,支付日期跨天。需要从订单详情里面取出支付时间是今天,订单时间是昨天或者今天的订单。

hive (gmall) > 
with
tmp_order as
(
 select 
  sku_id,
  count(*) order_count,
  sum(sku_num) order_num,
  sum(total_amount) order_amount
 from dwd_fact_order_detail
 where dt = '2020-12-27'
 GROUP BY sku_id
),
tmp_payment AS (
SELECT
 sku_id,
 count(*) payment_count,
 sum(sku_num) payment_num,
 sum(total_amount) payment_amount
FROM
 dwd_fact_order_detail
WHERE
 dt = '2020-12-27'
AND order_id IN (
 SELECT
  id
 FROM
  dwd_fact_order_info
 WHERE
(
dt = '2020-12-27'
OR dt = date_add('2020-12-27' ,- 1)
)
AND date_format(payment_time, 'yyyy-MM-dd') = '2020-12-27'
)
GROUP BY
sku_id
),
tmp_refund AS (
  SELECT
   sku_id,
   count(*) refund_count,
       sum(refund_num) refund_num,
       sum(refund_amount) refund_amount
    FROM
        dwd_fact_order_refund_info
    WHERE
        dt = '2020-12-27'
    GROUP BY
        sku_id
),
tmp_cart AS (
SELECT
sku_id,
count(*) cart_count,
sum(sku_num) cart_num
FROM
dwd_fact_cart_info
WHERE
dt = '2020-12-27'
AND date_format(create_time, 'yyyy-MM-dd') = '2020-12-27'
GROUP BY
sku_id
),
tmp_favor AS (
SELECT
sku_id,
count(*) favor_count
FROM
dwd_fact_favor_info
GROUP BY
sku_id
),
tmp_appraise AS (
SELECT
sku_id,
sum(IF(appraise = '1201', 1, 0)) appraise_good_count,
sum(IF(appraise = '1202', 1, 0)) appraise_mid_count,
sum(IF(appraise = '1203', 1, 0)) appraise_bad_count,
sum(IF(appraise = '1204', 1, 0)) appraise_default_count
FROM
dwd_fact_comment_info
WHERE
dt = '2020-12-27'
GROUP BY
sku_id
) INSERT overwrite TABLE dws_sku_action_daycount PARTITION (dt = '2020-12-27') SELECT
sku_id,
sum(order_count),
sum(order_num),
sum(order_amount),
sum(payment_count),
sum(payment_num),
sum(payment_amount),
sum(refund_count),
sum(refund_num),
sum(refund_amount),
sum(cart_count),
sum(cart_num),
sum(favor_count),
sum(appraise_good_count),
sum(appraise_mid_count),
sum(appraise_bad_count),
sum(appraise_default_count)
FROM
(
SELECT
sku_id,
order_count,
order_num,
order_amount,
0 payment_count,
0 payment_num,
0 payment_amount,
0 refund_count,
0 refund_num,
0 refund_amount,
0 cart_count,
0 cart_num,
0 favor_count,
0 appraise_good_count,
0 appraise_mid_count,
0 appraise_bad_count,
0 appraise_default_count
FROM
tmp_order
UNION ALL
SELECT
sku_id,
0 order_count,
0 order_num,
0 order_amount,
payment_count,
payment_num,
payment_amount,
0 refund_count,
0 refund_num,
0 refund_amount,
0 cart_count,
0 cart_num,
0 favor_count,
0 appraise_good_count,
0 appraise_mid_count,
0 appraise_bad_count,
0 appraise_default_count
FROM
tmp_payment
UNION ALL
SELECT
sku_id,
0 order_count,
0 order_num,
0 order_amount,
0 payment_count,
0 payment_num,
0 payment_amount,
refund_count,
refund_num,
refund_amount,
0 cart_count,
0 cart_num,
0 favor_count,
0 appraise_good_count,
0 appraise_mid_count,
0 appraise_bad_count,
0 appraise_default_count
FROM
tmp_refund
UNION ALL
SELECT
sku_id,
0 order_count,
0 order_num,
0 order_amount,
0 payment_count,
0 payment_num,
0 payment_amount,
0 refund_count,
0 refund_num,
0 refund_amount,
cart_count,
cart_num,
0 favor_count,
0 appraise_good_count,
0 appraise_mid_count,
0 appraise_bad_count,
0 appraise_default_count
FROM
tmp_cart
UNION ALL
SELECT
sku_id,
0 order_count,
0 order_num,
0 order_amount,
0 payment_count,
0 payment_num,
0 payment_amount,
0 refund_count,
0 refund_num,
0 refund_amount,
0 cart_count,
0 cart_num,
favor_count,
0 appraise_good_count,
0 appraise_mid_count,
0 appraise_bad_count,
0 appraise_default_count
FROM
tmp_favor
UNION ALL
SELECT
sku_id,
0 order_count,
0 order_num,
0 order_amount,
0 payment_count,
0 payment_num,
0 payment_amount,
0 refund_count,
0 refund_num,
0 refund_amount,
0 cart_count,
0 cart_num,
0 favor_count,
appraise_good_count,
appraise_mid_count,
appraise_bad_count,
appraise_default_count
FROM
tmp_appraise
) tmp
GROUP BY
sku_id;

3)查询加载结果
hive (gmall)>
select * from dws_sku_action_daycount where dt='2020-03-10';

3.3 每日优惠券统计(预留)

1)建表语句

hive (gmall) > DROP TABLE
IF EXISTS dws_coupon_use_daycount;

CREATE external TABLE dws_coupon_use_daycount (
    `coupon_id` string COMMENT '优惠券 ID',
    `coupon_name` string COMMENT '购物券名称',
    `coupon_type` string COMMENT '购物券类型 1 现金券 2 折扣券 3 满减券 4 满件打折券',
    `condition_amount` string COMMENT '满额数',
    `condition_num` string COMMENT '满件数',
    `activity_id` string COMMENT '活动编号',
    `benefit_amount` string COMMENT '减金额',
    `benefit_discount` string COMMENT '折扣',
    `create_time` string COMMENT '创建时间',
    `range_type` string COMMENT '范围类型 1、商品 2、品类 3、品牌',
    `spu_id` string COMMENT '商品 id',
    `tm_id` string COMMENT '品牌 id',
    `category3_id` string COMMENT '品类 id',
    `limit_num` string COMMENT '最多领用次数',
    `get_count` BIGINT COMMENT '领用次数',
    `using_count` BIGINT COMMENT '使用(下单)次数',
    `used_count` BIGINT COMMENT '使用(支付)次数'
) COMMENT '每日优惠券统计' PARTITIONED BY (`dt` string) stored AS parquet location '/warehouse/gmall/dws/dws_coupon_use_daycount/' tblproperties (
    "parquet.compression" = "lzo"
);

2)数据装载

hive (gmall) > INSERT overwrite TABLE dws_coupon_use_daycount PARTITION (dt = '2020-03-10') SELECT
    cu.coupon_id,
    ci.coupon_name,
    ci.coupon_type,
    ci.condition_amount,
    ci.condition_num,
    ci.activity_id,
    ci.benefit_amount,
    ci.benefit_discount,
    ci.create_time,
    ci.range_type,
    ci.spu_id,
    ci.tm_id,
    ci.category3_id,
    ci.limit_num,
    cu.get_count,
    cu.using_count,
    cu.used_count
FROM
    (
        SELECT
            coupon_id,
            sum(

                IF (
                    date_format(get_time, 'yyyy-MM-dd') = '2020-03-10',
                    1,
                    0
                )
            ) get_count,
            sum(

                IF (
                    date_format(using_time, 'yyyy-MM-dd') = '2020-03-10',
                    1,
                    0
                )
            ) using_count,
            sum(

                IF (
                    date_format(used_time, 'yyyy-MM-dd') = '2020-03-10',
                    1,
                    0
                )
            ) used_count
        FROM
            dwd_fact_coupon_use
        WHERE
            dt = '2020-03-10'
        GROUP BY
            coupon_id
    ) cu
LEFT JOIN (
    SELECT
        *
    FROM
        dwd_dim_coupon_info
    WHERE
        dt = '2020-03-10'
) ci ON cu.coupon_id = ci.id;

3)查询加载结果
hive (gmall)>
select * from dws_coupon_use_daycount where dt='2020-03-10';

3.4 每日活动统计(预留)

1)建表语句

hive (gmall) > DROP TABLE
IF EXISTS dws_activity_info_daycount;

CREATE external TABLE dws_activity_info_daycount (
    `id` string COMMENT '编号',
    `activity_name` string COMMENT '活动名称',
    `activity_type` string COMMENT '活动类型',
    `start_time` string COMMENT '开始时间',
    `end_time` string COMMENT '结束时间',
    `create_time` string COMMENT '创建时间',
    `order_count` BIGINT COMMENT '下单次数',
    `payment_count` BIGINT COMMENT '支付次数'
) COMMENT '购物车信息表' PARTITIONED BY (`dt` string) ROW format delimited FIELDS TERMINATED BY '\t' location '/warehouse/gmall/dws/dws_activity_info_daycount/' tblproperties (
    "parquet.compression" = "lzo"
);

2)数据装载

hive (gmall) > INSERT overwrite TABLE dws_activity_info_daycount PARTITION (dt = '2020-03-10') SELECT
    oi.activity_id,
    ai.activity_name,
    ai.activity_type,
    ai.start_time,
    ai.end_time,
    ai.create_time,
    oi.order_count,
    oi.payment_count
FROM
    (
        SELECT
            activity_id,
            sum(

                IF (
                    date_format(create_time, 'yyyy-MM-dd') = '2020-03-10',
                    1,
                    0
                )
            ) order_count,
            sum(

                IF (
                    date_format(payment_time, 'yyyy-MM-dd') = '2020-03-10',
                    1,
                    0
                )
            ) payment_count
        FROM
            dwd_fact_order_info
        WHERE
            (
                dt = '2020-03-10'
                OR dt = date_add('2020-03-10' ,- 1)
            )
        AND activity_id IS NOT NULL
        GROUP BY
            activity_id
    ) oi
JOIN (
    SELECT
        *
    FROM
        dwd_dim_activity_info
    WHERE
        dt = '2020-03-10'
) ai ON oi.activity_id = ai.id;

3)查询加载结果
hive (gmall)>
select * from dws_activity_info_daycount
where dt='2020-03-10';

3.5 每日购买行为

1)建表语句

hive (gmall) > DROP TABLE
IF EXISTS dws_sale_detail_daycount;

CREATE external TABLE dws_sale_detail_daycount (
    user_id string COMMENT '用户 id',
    sku_id string COMMENT '商品 id',
    user_gender string COMMENT '用户性别',
    user_age string COMMENT '用户年龄',
    user_level string COMMENT '用户等级',
    order_price DECIMAL (10, 2) COMMENT '商品价格',
    sku_name string COMMENT '商品名称',
    sku_tm_id string COMMENT '品牌 id',
    sku_category3_id string COMMENT '商品三级品类 id',
    sku_category2_id string COMMENT '商品二级品类 id',
    sku_category1_id string COMMENT '商品一级品类 id',
    sku_category3_name string COMMENT '商品三级品类名称',
    sku_category2_name string COMMENT '商品二级品类名称',
    sku_category1_name string COMMENT '商品一级品类名称',
    spu_id string COMMENT '商品 spu',
    sku_num INT COMMENT '购买个数',
    order_count BIGINT COMMENT '当日下单单数',
    order_amount DECIMAL (16, 2) COMMENT '当日下单金额'
) COMMENT '每日购买行为' PARTITIONED BY (`dt` string) stored AS parquet location '/warehouse/gmall/dws/dws_sale_detail_daycount/' tblproperties (
    "parquet.compression" = "lzo"
);

2)数据装载

hive (gmall) > INSERT overwrite TABLE dws_sale_detail_daycount PARTITION (dt = '2020-03-10') SELECT
    op.user_id,
    op.sku_id,
    ui.gender,
    months_between ('2020-03-10', ui.birthday) / 12 age,
    ui.user_level,
    si.price,
    si.sku_name,
    si.tm_id,
    si.category3_id,
    si.category2_id,
    si.category1_id,
    si.category3_name,
    si.category2_name,
    si.category1_name,
    si.spu_id,
    op.sku_num,
    op.order_count,
    op.order_amount
FROM
    (
        SELECT
            user_id,
            sku_id,
            sum(sku_num) sku_num,
            count(*) order_count,
            sum(total_amount) order_amount
        FROM
            dwd_fact_order_detail
        WHERE
            dt = '2020-03-10'
        GROUP BY
            user_id,
            sku_id
    ) op
JOIN (
    SELECT
        *
    FROM
        dwd_dim_user_info_his
    WHERE
        end_date = '9999-99-99'
) ui ON op.user_id = ui.id
JOIN (
    SELECT
        *
    FROM
        dwd_dim_sku_info
    WHERE
        dt = '2020-03-10'
) si ON op.sku_id = si.id;

3)查询加载结果
hive (gmall)> select * from dws_sale_detail_daycount
where dt='2020-03-10';

5.DWS 层数据导入脚本

1)在/home/atguigu/bin 目录下创建脚本 dwd_to_dws.sh

[atguigu@hadoop102 bin]$ vim dwd_to_dws.sh

在脚本中填写如下内容

#!/bin/bash 
APP=gmall hive=/opt/module/hive/bin/hive 
# 如果是输入的日期按照取输入日期;如果没输入日期取当前时间的前一天 
if [ -n "$1" ] ;
then 
do
_date=$1 
else
do
_date=`date -d "-1 day" +%F` 
fisql="
INSERT overwrite TABLE $ { APP }.dws_uv_detail_daycount PARTITION (dt = '$do_date') SELECT
    mid_id,
    concat_ws('|', collect_set(user_id)) user_id,
    concat_ws(
        '|',
        collect_set (version_code)
    ) version_code,
    concat_ws(
        '|',
        collect_set (version_name)
    ) version_name,
    concat_ws('|', collect_set(lang)) lang,
    concat_ws('|', collect_set(source)) source,
    concat_ws('|', collect_set(os)) os,
    concat_ws('|', collect_set(area)) area,
    concat_ws('|', collect_set(model)) model,
    concat_ws('|', collect_set(brand)) brand,
    concat_ws(
        '|',
        collect_set (sdk_version)
    ) sdk_version,
    concat_ws('|', collect_set(gmail)) gmail,
    concat_ws(
        '|',
        collect_set (height_width)
    ) height_width,
    concat_ws('|', collect_set(app_time)) app_time,
    concat_ws('|', collect_set(network)) network,
    concat_ws('|', collect_set(lng)) lng,
    concat_ws('|', collect_set(lat)) lat,
    count(*) login_count
FROM
    $ { APP }.dwd_start_log
WHERE
    dt = '$do_date'
GROUP BY
    mid_id;

WITH tmp_login AS (
    selectuser_id,
    count(*) login_count
FROM
    $ { APP }.dwd_start_log
WHERE
    dt = '$do_date'
AND user_id IS NOT NULL
GROUP BY
    user_id
),
 tmp_cart AS (
    SELECT
        user_id,
        count(*) cart_count,
        sum(cart_price * sku_num) cart_amount
    FROM
        $ { APP }.dwd_fact_cart_info
    WHERE
        dt = '$do_date'
    AND user_id IS NOT NULL
    AND date_format(create_time, 'yyyy-MM-dd') = '$do_date'
    GROUP BY
        user_id
),
 tmp_order AS (
    SELECT
        user_id,
        count(*) order_count,
        sum(final_total_amount) order_amount
    FROM
        $ { APP }.dwd_fact_order_info
    WHERE
        dt = '$do_date'
    GROUP BY
        user_id
),
 tmp_payment AS (
    SELECT
        user_id,
        count(*) payment_count,
        sum(payment_amount) payment_amount
    FROM
        $ { APP }.dwd_fact_payment_info
    WHERE
        dt = '$do_date'
    GROUP BY
        user_id
) INSERT overwrite TABLE $ { APP }.dws_user_action_daycount PARTITION (dt = '$do_date') SELECT
    user_actions.user_id,
    sum(user_actions.login_count),
    sum(user_actions.cart_count),
    sum(user_actions.cart_amount),
    sum(user_actions.order_count),
    sum(user_actions.order_amount),
    sum(user_actions.payment_count),
    sum(
        user_actions.payment_amount
    )
FROM
    (
        SELECT
            user_id,
            login_count,
            0 cart_count,
            0 cart_amount,
            0 order_count,
            0 order_amount,
            0 payment_count,
            0 payment_amount
        FROM
            tmp_loginunion ALL SELECT
                user_id,
                0 login_count,
                cart_count,
                cart_amount,
                0 order_count,
                0 order_amount,
                0 payment_count,
                0 payment_amount
            FROM
                tmp_cart
            UNION ALL
                SELECT
                    user_id,
                    0 login_count,
                    0 cart_count,
                    0 cart_amount,
                    order_count,
                    order_amount,
                    0 payment_count,
                    0 payment_amount
                FROM
                    tmp_order
                UNION ALL
                    SELECT
                        user_id,
                        0 login_count,
                        0 cart_count,
                        0 cart_amount,
                        0 order_count,
                        0 order_amount,
                        payment_count,
                        payment_amount
                    FROM
                        tmp_payment
    ) user_actions
GROUP BY
    user_id;

WITH tmp_order AS (
    SELECT
        sku_id,
        count(*) order_count,
        sum(sku_num) order_num,
        sum(total_amount) order_amount
    FROM
        $ { APP }.dwd_fact_order_detail
    WHERE
        dt = '$do_date'
    GROUP BY
        sku_id
),
 tmp_payment AS (
    SELECT
        sku_id,
        count(*) payment_count,
        sum(sku_num) payment_num,
        sum(total_amount) payment_amount
    FROM
        $ { APP }.dwd_fact_order_detail
    WHERE
        dt = '$do_date'
    AND order_id IN (
        SELECT
            idfrom $ { APP }.dwd_fact_order_info
        WHERE
            (
                dt = '$do_date'
                OR dt = date_add('$do_date' ,- 1)
            )
        AND date_format(payment_time, 'yyyy-MM-dd') = '$do_date'
    )
    GROUP BY
        sku_id
),
 tmp_refund AS (
    SELECT
        sku_id,
        count(*) refund_count,
        sum(refund_num) refund_num,
        sum(refund_amount) refund_amount
    FROM
        $ { APP }.dwd_fact_order_refund_info
    WHERE
        dt = '$do_date'
    GROUP BY
        sku_id
),
 tmp_cart AS (
    SELECT
        sku_id,
        count(*) cart_count,
        sum(sku_num) cart_num
    FROM
        $ { APP }.dwd_fact_cart_info
    WHERE
        dt = '$do_date'
    AND date_format(create_time, 'yyyy-MM-dd') = '$do_date'
    GROUP BY
        sku_id
),
 tmp_favor AS (
    SELECT
        sku_id,
        count(*) favor_count
    FROM
        $ { APP }.dwd_fact_favor_info
    WHERE
        dt = '$do_date'
    AND date_format(create_time, 'yyyy-MM-dd') = '$do_date'
    GROUP BY
        sku_id
),
 tmp_appraise AS (
    SELECT
        sku_id,
        sum(IF(appraise = '1201', 1, 0)) appraise_good_count,
        sum(IF(appraise = '1202', 1, 0)) appraise_mid_count,
        sum(IF(appraise = '1203', 1, 0)) appraise_bad_count,
        sum(IF(appraise = '1204', 1, 0)) appraise_default_count
    FROM
        $ { APP }.dwd_fact_comment_info
    WHERE
        dt = '$do_date'
    GROUP BY
        sku_id
) INSERT overwrite TABLE $ { APP }.dws_sku_action_daycount PARTITION (dt = '$do_date') SELECT
    sku_id,
    sum(order_count),
    sum(order_num),
    sum(order_amount),
    sum(payment_count),
    sum(payment_num),
    sum(payment_amount),
    sum(refund_count),
    sum(refund_num),
    sum(refund_amount),
    sum(cart_count),
    sum(cart_num),
    sum(favor_count),
    sum(appraise_good_count),
    sum(appraise_mid_count),
    sum(appraise_bad_count),
    sum(appraise_default_count)
FROM
    (
        SELECT
            sku_id,
            order_count,
            order_num,
            order_amount,
            0 payment_count,
            0 payment_num,
            0 payment_amount,
            0 refund_count,
            0 refund_num,
            0 refund_amount,
            0 cart_count,
            0 cart_num,
            0 favor_count,
            0 appraise_good_count,
            0 appraise_mid_count,
            0 appraise_bad_count,
            0 appraise_default_count
        FROM
            tmp_order
        UNION ALL
            SELECT
                sku_id,
                0 order_count,
                0 order_num,
                0 order_amount,
                payment_count,
                payment_num,
                payment_amount,
                0 refund_count,
                0 refund_num,
                0 refund_amount,
                0 cart_count,
                0 cart_num,
                0 favor_count,
                0 appraise_good_count,
                0 appraise_mid_count,
                0 appraise_bad_count,
                0 appraise_default_count
            FROM
                tmp_payment
            UNION ALL
                SELECT
                    sku_id,
                    0 order_count,
                    0 order_num,
                    0 order_amount,
                    0 payment_count,
                    0 payment_num,
                    0 payment_amount,
                    refund_count,
                    refund_num,
                    refund_amount,
                    0 cart_count,
                    0 cart_num,
                    0 favor_count,
                    0 appraise_good_count,
                    0 appraise_mid_count,
                    0 appraise_bad_count,
                    0 appraise_default_count
                FROM
                    tmp_refund
                UNION ALL
                    SELECT
                        sku_id,
                        0 order_count,
                        0 order_num,
                        0 order_amount,
                        0 payment_count,
                        0 payment_num,
                        0 payment_amount,
                        0 refund_count,
                        0 refund_num,
                        0 refund_amount,
                        cart_count,
                        cart_num,
                        0 favor_count,
                        0 appraise_good_count,
                        0 appraise_mid_count,
                        0 appraise_bad_count,
                        0 appraise_default_count
                    FROM
                        tmp_cart
                    UNION ALL
                        SELECT
                            sku_id,
                            0 order_count,
                            0 order_num,
                            0 order_amount,
                            0 payment_count,
                            0 payment_num,
                            0 payment_amount,
                            0 refund_count,
                            0 refund_num,
                            0 refund_amount,
                            0 cart_count,
                            0 cart_num,
                            favor_count,
                            0 appraise_good_count,
                            0 appraise_mid_count,
                            0 appraise_bad_count,
                            0 appraise_default_count
                        FROM
                            tmp_favor
                        UNION ALL
                            SELECT
                                sku_id,
                                0 order_count,
                                0 order_num,
                                0 order_amount,
                                0 payment_count,
                                0 payment_num,
                                0 payment_amount,
                                0 refund_count,
                                0 refund_num,
                                0 refund_amount,
                                0 cart_count,
                                0 cart_num,
                                0 favor_count,
                                appraise_good_count,
                                appraise_mid_count,
                                appraise_bad_count,
                                appraise_default_count
                            FROM
                                tmp_appraise
    ) tmp
GROUP BY
    sku_id;

INSERT overwrite TABLE $ { APP }.dws_coupon_use_daycount PARTITION (dt = '$do_date') SELECT
    cu.coupon_id,
    ci.coupon_name,
    ci.coupon_type,
    ci.condition_amount,
    ci.condition_num,
    ci.activity_id,
    ci.benefit_amount,
    ci.benefit_discount,
    ci.create_time,
    ci.range_type,
    ci.spu_id,
    ci.tm_id,
    ci.category3_id,
    ci.limit_num,
    cu.get_count,
    cu.using_count,
    cu.used_count
FROM
    (
        SELECT
            coupon_id,
            sum(

                IF (
                    date_format(get_time, 'yyyy-MM-dd') = '$do_date',
                    1,
                    0
                )
            ) get_count,
            sum(

                IF (
                    date_format(using_time, 'yyyy-MM-dd') = '$do_date',
                    1,
                    0
                )
            ) using_count,
            sum(

                IF (
                    date_format(used_time, 'yyyy-MM-dd') = '$do_date',
                    1,
                    0
                )
            ) used_count
        FROM
            $ { APP }.dwd_fact_coupon_use
        WHERE
            dt = '$do_date'
        GROUP BY
            coupon_id
    ) cu
LEFT JOIN (
    SELECT
        *
    FROM
        $ { APP }.dwd_dim_coupon_info
    WHERE
        dt = '$do_date'
) ci ON cu.coupon_id = ci.id;

INSERT overwrite TABLE $ { APP }.dws_activity_info_daycount PARTITION (dt = '$do_date') SELECT
    oi.activity_id,
    ai.activity_name,
    ai.activity_type,
    ai.start_time,
    ai.end_time,
    ai.create_time,
    oi.order_count,
    oi.payment_count
FROM
    (
        SELECT
            activity_id,
            sum(

                IF (
                    date_format(create_time, 'yyyy-MM-dd') = '$do_date',
                    1,
                    0
                )
            ) order_count,
            sum(

                IF (
                    date_format(payment_time, 'yyyy-MM-dd') = '$do_date',
                    1,
                    0
                )
            ) payment_count
        FROM
            $ { APP }.dwd_fact_order_info
        WHERE
            (
                dt = '$do_date'
                OR dt = date_add('$do_date' ,- 1)
            )
        AND activity_id IS NOT nullgroup BY activity_id
    ) oi
JOIN (
    SELECT
        *
    FROM
        $ { APP }.dwd_dim_activity_info
    WHERE
        dt = '$do_date'
) ai ON oi.activity_id = ai.id;

INSERT overwrite TABLE $ { APP }.dws_sale_detail_daycount PARTITION (dt = '$do_date') SELECT
    op.user_id,
    op.sku_id,
    ui.gender,
    months_between ('$do_date', ui.birthday) / 12 age,
    ui.user_level,
    si.price,
    si.sku_name,
    si.tm_id,
    si.category3_id,
    si.category2_id,
    si.category1_id,
    si.category3_name,
    si.category2_name,
    si.category1_name,
    si.spu_id,
    op.sku_num,
    op.order_count,
    op.order_amount
FROM
    (
        SELECT
            user_id,
            sku_id,
            sum(sku_num) sku_num,
            count(*) order_count,
            sum(total_amount) order_amount
        FROM
            $ { APP }.dwd_fact_order_detail
        WHERE
            dt = '$do_date'
        GROUP BY
            user_id,
            sku_id
    ) op
JOIN (
    SELECT
        *
    FROM
        $ { APP }.dwd_dim_user_info_his
    WHERE
        end_date = '9999-99-99'
) ui ON op.user_id = ui.id
JOIN (
    SELECT
        *
    FROM
        $ { APP }.dwd_dim_sku_info
    WHERE
        dt = '$do_date'
) si ON op.sku_id = si.id;
"
$hive -e "$sql"

2)增加脚本执行权限

[atguigu@hadoop102 bin]$ chmod 777 dwd_to_dws.sh 

3)执行脚本导入数据

[atguigu@hadoop102 bin]$ dwd_to_dws.sh 2020-03-11 4)

查看导入数据
hive (gmall)>
select * from dws_uv_detail_daycount
where dt='2020-03-11';
select * from dws_user_action_daycount
where dt='2020-03-11';
select * from dws_sku_action_daycount
where dt='2020-03-11';
select * from dws_sale_detail_daycount
where dt='2020-03-11';
select * from dws_coupon_use_daycount
where dt='2020-03-11';
select * from dws_activity_info_daycount
where dt='2020-03-11';

最后编辑于
©著作权归作者所有,转载或内容合作请联系作者
  • 序言:七十年代末,一起剥皮案震惊了整个滨河市,随后出现的几起案子,更是在滨河造成了极大的恐慌,老刑警刘岩,带你破解...
    沈念sama阅读 202,723评论 5 476
  • 序言:滨河连续发生了三起死亡事件,死亡现场离奇诡异,居然都是意外死亡,警方通过查阅死者的电脑和手机,发现死者居然都...
    沈念sama阅读 85,080评论 2 379
  • 文/潘晓璐 我一进店门,熙熙楼的掌柜王于贵愁眉苦脸地迎上来,“玉大人,你说我怎么就摊上这事。” “怎么了?”我有些...
    开封第一讲书人阅读 149,604评论 0 335
  • 文/不坏的土叔 我叫张陵,是天一观的道长。 经常有香客问我,道长,这世上最难降的妖魔是什么? 我笑而不...
    开封第一讲书人阅读 54,440评论 1 273
  • 正文 为了忘掉前任,我火速办了婚礼,结果婚礼上,老公的妹妹穿的比我还像新娘。我一直安慰自己,他们只是感情好,可当我...
    茶点故事阅读 63,431评论 5 364
  • 文/花漫 我一把揭开白布。 她就那样静静地躺着,像睡着了一般。 火红的嫁衣衬着肌肤如雪。 梳的纹丝不乱的头发上,一...
    开封第一讲书人阅读 48,499评论 1 281
  • 那天,我揣着相机与录音,去河边找鬼。 笑死,一个胖子当着我的面吹牛,可吹牛的内容都是我干的。 我是一名探鬼主播,决...
    沈念sama阅读 37,893评论 3 395
  • 文/苍兰香墨 我猛地睁开眼,长吁一口气:“原来是场噩梦啊……” “哼!你这毒妇竟也来了?” 一声冷哼从身侧响起,我...
    开封第一讲书人阅读 36,541评论 0 256
  • 序言:老挝万荣一对情侣失踪,失踪者是张志新(化名)和其女友刘颖,没想到半个月后,有当地人在树林里发现了一具尸体,经...
    沈念sama阅读 40,751评论 1 296
  • 正文 独居荒郊野岭守林人离奇死亡,尸身上长有42处带血的脓包…… 初始之章·张勋 以下内容为张勋视角 年9月15日...
    茶点故事阅读 35,547评论 2 319
  • 正文 我和宋清朗相恋三年,在试婚纱的时候发现自己被绿了。 大学时的朋友给我发了我未婚夫和他白月光在一起吃饭的照片。...
    茶点故事阅读 37,619评论 1 329
  • 序言:一个原本活蹦乱跳的男人离奇死亡,死状恐怖,灵堂内的尸体忽然破棺而出,到底是诈尸还是另有隐情,我是刑警宁泽,带...
    沈念sama阅读 33,320评论 4 318
  • 正文 年R本政府宣布,位于F岛的核电站,受9级特大地震影响,放射性物质发生泄漏。R本人自食恶果不足惜,却给世界环境...
    茶点故事阅读 38,890评论 3 307
  • 文/蒙蒙 一、第九天 我趴在偏房一处隐蔽的房顶上张望。 院中可真热闹,春花似锦、人声如沸。这庄子的主人今日做“春日...
    开封第一讲书人阅读 29,896评论 0 19
  • 文/苍兰香墨 我抬头看了看天上的太阳。三九已至,却和暖如春,着一层夹袄步出监牢的瞬间,已是汗流浃背。 一阵脚步声响...
    开封第一讲书人阅读 31,137评论 1 259
  • 我被黑心中介骗来泰国打工, 没想到刚下飞机就差点儿被人妖公主榨干…… 1. 我叫王不留,地道东北人。 一个月前我还...
    沈念sama阅读 42,796评论 2 349
  • 正文 我出身青楼,却偏偏与公主长得像,于是被迫代替她去往敌国和亲。 传闻我的和亲对象是个残疾皇子,可洞房花烛夜当晚...
    茶点故事阅读 42,335评论 2 342

推荐阅读更多精彩内容