本文是对淘宝业务分析案例的汇总。本次分析从用户,商品以及平台三大方面展开,利用了漏斗分析、维度拆分、象限分层、RFM模型等分析方法,通过MySql进行数据的处理,最后利用PowerBI进行数据的可视化。
项目背景
移动互联网企业从粗放式到精细化运营管理过程中,需要结合市场、渠道、用户行为等数据分析,对用户开展有针对性的运营活动,提供个性化、差异化的运营策略,以实现运营业务指标。本项目利用sql对淘宝用户行为数据进行分析,通过用户行为分析业务问题,提供针对性的运营策略。
分析目标
本次分析希望通过淘宝的用户行为数据,了解淘宝基本情况。建立用户行为漏斗,探索需要优化的空间。同时对用户行为及商品展开多维度拆分,针对不同类别的用户及商品开展不同策略的精细化运营。
数据来源及描述:
- 数据集来自阿里云天池:https://tianchi.aliyun.com/dataset/dataDetail?dataId=649&userId=1
-
数据描述:
分析框架
数据导入:
- 将数据保存到本地;
- 进入MySql,进入要保存的database,创建表;
- 导入数据
mysql> load data local infile "C:\\Users\\MaJie\\Desktop\\tbt\\UserBehavior\\UserBehavior.csv"
-> into table user
-> fields terminated by ","
-> lines terminated by "\n";
备注:由于数据量较大,运行较慢,只取了原数据中的2000000条进行分析
mysql> create table userbehavior as select * from user limit 2000000;
数据清洗
- 查看数据缺失值:判断各列的数量是否一致
mysql> SELECT count(user_id),count(item),count(category),count(behavior),count(time) from userbehavior;
- 通过查看数据发现,time为时间戳格式,需要进行时间格式转化
mysql> alter table userbehavior add date varchar(20);
mysql> alter table userbehavior add hour varchar(20);
mysql> update userbehavior set date = from_unixtime(time,"%Y-%m-%d");
mysql> update userbehavior set hour = from_unixtime(time,"%H");
mysql> select * from userbehavior limit 2;
+---------+---------+----------+----------+------------+------------+------+
| user_id | item | category | behavior | time | date | hour |
+---------+---------+----------+----------+------------+------------+------+
| 1 | 2268318 | 2520377 | pv | 1511544070 | 2017-11-25 | 01 |
| 1 | 2333346 | 2520771 | pv | 1511561733 | 2017-11-25 | 06 |
+---------+---------+----------+----------+------------+------------+------+
- 通过查看发现数据主要聚集在11月25日到12-03日之间,取此区间,删除其他数据
# 查看时间聚集情况
select date, count(*) from userbehavior group by date;
# 删除其余数据
DELETE FROM userbehavior
WHERE date < '2017-11-25' or date > '2017-12-03';
mysql> select date, count(*) from userbehavior group by date;
+------------+----------+
| date | count(*) |
+------------+----------+
| 2017-11-25 | 207782 |
| 2017-11-26 | 213387 |
| 2017-11-27 | 198661 |
| 2017-11-28 | 197818 |
| 2017-11-29 | 202267 |
| 2017-11-30 | 208777 |
| 2017-12-01 | 219657 |
| 2017-12-02 | 273945 |
| 2017-12-03 | 276715 |
+------------+----------+
数据分析
1. 常规指标情况
1.1 PV,UV,用户平均访问量(PV/UV)
mysql> select
-> sum(case when behavior = "pv" then 1 else 0 end) as PV,
-> (select count(*) from (select user_id from userbehavior group by user_id) as g) as UV,
-> (sum(case when behavior = "pv" then 1 else 0 end) / (select count(*) from (select user_id from userbehavior group by user_id) as g)) as 'PV/UV'
-> from userbehavior;
+---------+-------+---------+
| PV | UV | PV/UV |
+---------+-------+---------+
| 1790225 | 19544 | 91.5997 |
+---------+-------+---------+
通过统计可知,在2017-11-25 到 2017-12-03之间:
访问量PV:1790225
访问用户数UV:19544
用户平均访问量:约92
1.2 日平均访问量DAU:
mysql> select date, count(user_id) as DAU from (select date,user_id from userbehavior group by date, user_id) as g group by date;
+------------+-------+
| date | DAU |
+------------+-------+
| 2017-11-25 | 13927 |
| 2017-11-26 | 14215 |
| 2017-11-27 | 14024 |
| 2017-11-28 | 14037 |
| 2017-11-29 | 14247 |
| 2017-11-30 | 14502 |
| 2017-12-01 | 14685 |
| 2017-12-02 | 19169 |
| 2017-12-03 | 19166 |
+------------+-------+
日活跃用户数整体上升,在12/02与12/03开始增幅较大,可能与双十二活动推广有关。
1.3 复购率
复购率:在某时间窗口内重复消费用户(消费两次及以上的用户)在总消费用户中占比
- 单日复购率
mysql> select date,
-> count(user_id) as "购买总人数",
-> sum(case when buy_num_day > 1 then 1 else 0 end) as "单日复购人数",
-> (sum(case when buy_num_day > 1 then 1 else 0 end) / count(user_id)) as "复购率"
-> from(
-> select date, user_id, count(*) as buy_num_day
-> from userbehavior
-> where behavior = "buy"
-> group by date, user_id) as g
-> group by date;
+------------+-----------------+--------------------+-----------+
| date | 购买总人数 | 单日复购人数 | 复购率 |
+------------+-----------------+--------------------+-----------+
| 2017-11-25 | 2626 | 747 | 0.2845 |
| 2017-11-26 | 2635 | 792 | 0.3006 |
| 2017-11-27 | 2882 | 866 | 0.3005 |
| 2017-11-28 | 2741 | 849 | 0.3097 |
| 2017-11-29 | 2868 | 880 | 0.3068 |
| 2017-11-30 | 2949 | 872 | 0.2957 |
| 2017-12-01 | 2811 | 827 | 0.2942 |
| 2017-12-02 | 3396 | 952 | 0.2803 |
| 2017-12-03 | 3517 | 1001 | 0.2846 |
+------------+-----------------+--------------------+-----------+
- 整体复购率
mysql> select
-> sum(case when buy_num > 1 then 1 else 0 end) as "整体复购人数",
-> count(user_id) as "购买总人数",
-> sum(case when buy_num > 1 then 1 else 0 end)/count(user_id) as "复购率"
-> from(
-> select *,count(behavior) as buy_num
-> from userbehavior
-> where behavior = 'buy'
-> group by user_id) as g;
+--------------------+-----------------+-----------+
| 整体复购人数 | 购买总人数 | 复购率 |
+--------------------+-----------------+-----------+
| 8795 | 13330 | 0.6598 |
+--------------------+-----------------+-----------+
从日复购率情况来看,日复购率在29%左右稳定波动,较为稳定。
从整体复购率来看,复购率高达约66%,表名淘宝的用户忠诚度较高
1.4 跳失率
跳失率:仅仅访问了单个页面的用户占全部访问用户的百分比,或者指从首页离开网站的用户占所有访问用户的百分比。即用户的behavior为1。
跳失率是衡量网站\APP内容质量的重要标准。
mysql> select count(user_id) as "访问一次页面的用户数"
-> from(
-> select user_id
-> from userbehavior
-> group by user_id
-> having count(behavior) = 1) as g;
+--------------------------------+
| 访问一次页面的用户数 |
+--------------------------------+
| 1 |
+--------------------------------+
# 查看下该用户的访问情况
select * from userbehavior where user_id = (select user_id from userbehavior group by user_id having count(behavior) = 1);
+---------+---------+----------+----------+------------+------------+------+
| user_id | item | category | behavior | time | date | hour |
+---------+---------+----------+----------+------------+------------+------+
| 152435 | 4895956 | 2465336 | pv | 1511567056 | 2017-11-25 | 07 |
+---------+---------+----------+----------+------------+------------+------+
9天时间跳失率仅为1,表明淘宝在产品与内容的运营上表现优秀,有足够的用户吸引力,确保用户可以长时间驻留选择。
2. 用户行为访问漏斗
常规转化流程大致为:
访问量(pv) ==> 加购物车数(cart) ==> 购买数(buy)
- 流量( PV)转化漏斗
select behavior, count(*) from userbehavior group by behavior;
+----------+----------+
| behavior | count(*) |
+----------+----------+
| buy | 40243 |
| cart | 111015 |
| fav | 57526 |
| pv | 1790225 |
+----------+----------+
- 用户(UV)转化漏斗
mysql> select behavior, count(user_id)
-> from (select behavior, user_id from userbehavior group by behavior, user_id) as g
-> group by behavior;
+----------+----------------+
| behavior | count(user_id) |
+----------+----------------+
| buy | 13330 |
| cart | 14672 |
| fav | 7854 |
| pv | 19463 |
+----------+----------------+
- 从PV转化来看:从浏览PV到加入购物车的转化率仅为6.2%,用户需要花费大量的精力在挑选商品上
- 从UV转化来看:从浏览UV到购买UV转化率达到68.5%,表名用户的实际购买需求较高,且商品种类能够满足用户的购买需求。
- 结合PV及UV的转化:用户能够在淘宝购买到满足自己需求的商品,但需要花费大量精力在挑选商品上。平台在商品推荐及搜索匹配环节有待提高。
- 改善建议:建议平台在商品推荐及搜索匹配等几个入口,根据用户基础及行为信息进行算法优化,提高商品推荐准确率,降低用户寻找商品的成本。
3. 用户行为分析——时间维度拆分
3.1 用户行为分析——按天拆解:
查看用户每天各行为的情况
用户数,访问量,加购量,收藏量,购买量
mysql> select aa.date, bb.用户数, aa.访问量, aa.加购量, aa.收藏量, aa.购买量
-> from(
-> select date,
-> sum(case when behavior = 'pv' then 1 else 0 end) as '访问量',
-> sum(case when behavior = 'cart' then 1 else 0 end) as '加购量',
-> sum(case when behavior = 'fav' then 1 else 0 end) as '收藏量',
-> sum(case when behavior = 'buy' then 1 else 0 end) as '购买量'
-> from userbehavior
-> group by date) as aa
-> inner join(
-> select date, count(*) as '用户数'
-> from (select date, user_id from userbehavior group by date, user_id) as g
-> group by date) as bb
-> on aa.date = bb.date;
+------------+-----------+-----------+-----------+-----------+-----------+
| date | 用户数 | 访问量 | 加购量 | 收藏量 | 购买量 |
+------------+-----------+-----------+-----------+-----------+-----------+
| 2017-11-25 | 13927 | 186480 | 11405 | 5926 | 3971 |
| 2017-11-26 | 14215 | 191557 | 11560 | 6277 | 3993 |
| 2017-11-27 | 14024 | 177388 | 10959 | 5827 | 4487 |
| 2017-11-28 | 14037 | 177040 | 10826 | 5727 | 4225 |
| 2017-11-29 | 14247 | 181006 | 10856 | 5921 | 4484 |
| 2017-11-30 | 14502 | 187171 | 11292 | 5802 | 4512 |
| 2017-12-01 | 14685 | 196714 | 12572 | 6057 | 4314 |
| 2017-12-02 | 19169 | 245281 | 15791 | 7849 | 5024 |
| 2017-12-03 | 19166 | 247588 | 15754 | 8140 | 5233 |
+------------+-----------+-----------+-----------+-----------+-----------+
从12/02与12/03,各项数据指标明显提高,推测与淘宝双12预热活动相关。
3.2 用户行为分析——按小时拆解:
mysql> select aa.hour, bb.用户数, aa.访问量, aa.加购量, aa.收藏量, aa.购买量
-> from(
-> select hour,
-> sum(case when behavior = 'pv' then 1 else 0 end) as '访问量',
-> sum(case when behavior = 'cart' then 1 else 0 end) as '加购量',
-> sum(case when behavior = 'fav' then 1 else 0 end) as '收藏量',
-> sum(case when behavior = 'buy' then 1 else 0 end) as '购买量'
-> from userbehavior
-> group by hour) as aa
-> inner join(
-> select hour, count(*) as '用户数'
-> from (select hour, user_id from userbehavior group by hour, user_id) as g
-> group by hour) as bb
-> on aa.hour = bb.hour;
+------+-----------+-----------+-----------+-----------+-----------+
| hour | 用户数 | 访问量 | 加购量 | 收藏量 | 购买量 |
+------+-----------+-----------+-----------+-----------+-----------+
| 00 | 6338 | 60816 | 3723 | 2030 | 1167 |
| 01 | 3249 | 27836 | 1756 | 884 | 484 |
| 02 | 1932 | 15475 | 925 | 597 | 284 |
| 03 | 1385 | 11355 | 750 | 366 | 129 |
| 04 | 1223 | 9515 | 586 | 306 | 156 |
| 05 | 1617 | 10848 | 777 | 406 | 151 |
| 06 | 3442 | 22997 | 1577 | 863 | 354 |
| 07 | 6400 | 43886 | 2849 | 1382 | 735 |
| 08 | 8802 | 60974 | 3839 | 1976 | 1249 |
| 09 | 10367 | 74294 | 4602 | 2531 | 1884 |
| 10 | 11482 | 86088 | 5293 | 2932 | 2484 |
| 11 | 11681 | 84907 | 5350 | 2967 | 2512 |
| 12 | 11819 | 86032 | 5219 | 2848 | 2430 |
| 13 | 12050 | 94554 | 5541 | 3099 | 2573 |
| 14 | 11731 | 91500 | 5355 | 2802 | 2332 |
| 15 | 11978 | 95059 | 5616 | 3045 | 2474 |
| 16 | 11870 | 92677 | 5499 | 2898 | 2345 |
| 17 | 11541 | 83124 | 5344 | 2862 | 2069 |
| 18 | 11492 | 86114 | 4988 | 2607 | 1868 |
| 19 | 12424 | 108553 | 6368 | 3269 | 2326 |
| 20 | 13130 | 131941 | 7911 | 3909 | 2686 |
| 21 | 13441 | 149720 | 9340 | 4372 | 2799 |
| 22 | 12895 | 149017 | 9692 | 4718 | 2766 |
| 23 | 10187 | 112943 | 8115 | 3857 | 1986 |
+------+-----------+-----------+-----------+-----------+-----------+
从一天内不同时刻各项数据指标来看,从5点到10点,各项指标处于上升阶段,10点到18点各项指标持续稳定在较高的水平,从19点到22点到达高峰,之后开始下降。与用户的作息规律基本吻合。
在活动运营时,在10-22点大概率会获得更好的效果,尤其是在19-22点之间,处于用户最活跃的时间段,适合各种活动营销的开展。
4. 用户行为分析——商品维度拆解
4.1 商品浏览量排行榜Top10
mysql> select item, count(*) as '浏览量'
-> from userbehavior
-> where behavior = 'pv'
-> group by item
-> order by count(*) desc
-> limit 10;
+---------+-----------+
| item | 浏览量 |
+---------+-----------+
| 812879 | 377 |
| 3845720 | 320 |
| 2331370 | 301 |
| 138964 | 294 |
| 2032668 | 293 |
| 3708121 | 258 |
| 1535294 | 256 |
| 3031354 | 252 |
| 987143 | 246 |
| 3371523 | 238 |
+---------+-----------+
4.2 商品销量排行榜Top10
mysql> select item, count(*) as '购买量'
-> from userbehavior
-> where behavior = 'buy'
-> group by item
-> order by count(*) desc
-> limit 10;
+---------+-----------+
| item | 购买量 |
+---------+-----------+
| 3122135 | 35 |
| 3237415 | 17 |
| 1910706 | 16 |
| 2560262 | 16 |
| 3031354 | 15 |
| 2964774 | 15 |
| 1034594 | 14 |
| 4157341 | 14 |
| 705557 | 14 |
| 4574184 | 13 |
+---------+-----------+
4.3 浏览量Top10中购买量Top10的占比
mysql> select aa.item, aa.浏览量, bb.购买量
-> from(
-> select item, count(*) as '浏览量'
-> from userbehavior
-> where behavior = 'pv'
-> group by item
-> order by count(*) desc
-> limit 10) as aa
-> inner join(
-> select item, count(*) as '购买量'
-> from userbehavior
-> where behavior = 'buy'
-> group by item
-> order by count(*) desc
-> limit 10) as bb
-> on aa.item = bb.item;
+---------+-----------+-----------+
| item | 浏览量 | 购买量 |
+---------+-----------+-----------+
| 3031354 | 265 | 15 |
+---------+-----------+-----------+
浏览量Top10的商品,只有1件在销量Top10,商品推荐算法不太准确,导致用户浏览的商品都不是自己满意的商品,需要对商品推荐算法再优化。
4.4 四象限法分类优化——商品浏览数与销量的关系
这里根据商品的浏览量与购买量将商品划分为四个象限,实际划分标准需要根据实际业务区划分。这里根据二八法则对两个维度进行划分。
mysql> select aa.item, aa.浏览量, bb.购买量
-> from(
-> select item, count(*) as '浏览量'
-> from userbehavior
-> where behavior = 'pv'
-> group by item
-> order by count(*) desc) as aa
-> left join(
-> select item, count(*) as '购买量'
-> from userbehavior
-> where behavior = 'buy'
-> group by item
-> order by count(*) desc) as bb
-> on aa.item = bb.item;
+---------+-----------+-----------+
| item | 浏览量 | 购买量 |
+---------+-----------+-----------+
| 812879 | 377 | 2 |
| 3845720 | 324 | 1 |
| 2032668 | 301 | 3 |
| 138964 | 296 | 3 |
| 2331370 | 281 | 3 |
| ...... | ...... | ...... |
+---------+-----------+-----------+
- 第一象限:商品浏览量与销量都高。这部分商品市场需求大,可以满足用户需求,可作为平台的重点推送产品
- 第二象限:商品浏览量低,销量高。一方面可能此部分商品的特征明显,竞品较少,用户购买目标明确;另一方面可能此部分商品市场需求大,但平台对此部分商品的曝光度不足。对此部分商品,平台在算法推荐上,结合用户特征加强此部分商品的曝光。
- 第三象限:商品浏览量低,销量低。一方面可能商品本身没有市场,另一方面也可能和曝光不足有关。对此部分商品从市场与运营两方面判断,如果市场空间不足则降低此部分商品数量,如果由于曝光度不够导致销量低,则需要根据商品特征,加强推广力度。
- 第四象限:浏览量高但销量低。一方面,商品推荐可能不合理导致无法满足用户需求。另一方面,商品本身存在缺陷,导致用户无法接受直接放弃。
5. 基于RFM用户分层模型分析用户行为
数据集只涉及最近消费时间间隔R与消费频率F,消费金额M数据集未涉及,这里不对M分层,只通过RF进行用户行为细分。这里定义当前日期为2017/12/03。5.1 R维度分析
# 获取时间间隔的范围区间
mysql> select datediff('2017/12/03',max(date)) as min_datediff, datediff('2017/12/03',min(date)) as max_datediff from userbehavior;
+--------------+--------------+
| min_datediff | max_datediff |
+--------------+--------------+
| 0 | 8 |
+--------------+--------------+
可知数据集的时间间隔在区间[0:8]天,这里我们定义[0:3]为'1',[4:8]为'0'。(实际工作中看情况定义)
mysql> select user_id, diff_time as R,
-> case when diff_time between 0 and 3 then '1'
-> when diff_time between 4 and 8 then '0'
-> end as R_num
-> from(
-> select user_id, datediff('2017-12-03',max(date)) as 'diff_time'
-> from userbehavior
-> where behavior = 'buy'
-> group by user_id) as g;
5.2 F维度分析
select min(b_num), max(b_num)
from (select user_id, count(*) as b_num from userbehavior where behavior = 'buy' group by user_id) as gg
+------------+------------+
| min(b_num) | max(b_num) |
+------------+------------+
| 1 | 72 |
+------------+------------+
通过查看,用户购买次数最少为1次,最多为72次。这里我们将36次作为标准,定义够阿米次数在[1:14]为0,[15:72]为1。
mysql> select user_id, b_num as F,
-> case when b_num between 1 and 14 then '1' else '0' end as F_num
-> from(select user_id, count(*) as b_num from userbehavior where behavior = 'buy' group by user_id) as gg;
5.3 RF维度分析
- 将RF合并
select ra.user_id, concat(R_num, F_num) as RF
from(
select user_id, diff_time as R,
case when diff_time between 0 and 3 then '1'
when diff_time between 4 and 8 then '0'
end as R_num
from(
select user_id, datediff('2017-12-03',max(date)) as 'diff_time'
from userbehavior
where behavior = 'buy'
group by user_id) as g) as ra
inner join(
select user_id, b_num as F,
case when b_num between 1 and 14 then '1' else '0' end as F_num
from(select user_id, count(*) as b_num from userbehavior where behavior = 'buy' group by user_id) as gg) as fa
on ra.user_id = fa.user_id
- 为用户分层
select user_id,
case when RF = '11' then '价值客户'
when RF = '01' then '保持客户'
when RF = '10' then '发展客户'
when RF = '00' then '挽留客户' end as '用户分层'
from (
select ra.user_id, concat(R_num, F_num) as RF
from(
select user_id, diff_time as R,
case when diff_time between 0 and 3 then '1'
when diff_time between 4 and 8 then '0'
end as R_num
from(
select user_id, datediff('2017-12-03',max(date)) as 'diff_time'
from userbehavior
where behavior = 'buy'
group by user_id) as g) as ra
inner join(
select user_id, b_num as F,
case when b_num between 1 and 14 then '1' else '0' end as F_num
from(select user_id, count(*) as b_num from userbehavior where behavior = 'buy' group by user_id) as gg) as fa
on ra.user_id = fa.user_id) as user_rf
- 不同类型客户的应对策略
- 价值客户(11):最近消费时间近和消费频次都很高,提供VIP服务
- 保持客户(01):最近消费时间较远,但消费频次很高,说明这是个一段时间没来的忠诚客户,我们需要主动和他保持联系。
- 发展客户(10):最近消费时间较近,但频次不高,忠诚度不高,很有潜力的用户,必须重点发展。
- 挽留客户(00):最近消费时间较远、消费频次不高,可能是将要流失或者已经要流失的用户,应当给予挽留措施。
6. 探究淘宝商业模式满足“二八定律”还是“长尾理论”
1. 提出假设:淘宝商业模式满足“二八定律”
2. 分析过程
如果淘宝的商业模式满足“二八定律”,那么在淘宝,20%的商品将占有总销量的百分之80%
mysql> select item, count(*) as item_num
-> from userbehavior
-> where behavior = 'buy'
-> group by item
-> order by item_num desc
-> limit 10;
+---------+----------+
| item | item_num |
+---------+----------+
| 3122135 | 35 |
| 3237415 | 17 |
| 1910706 | 16 |
| 2560262 | 16 |
| 2964774 | 15 |
| 3031354 | 15 |
| 1034594 | 14 |
| 4157341 | 14 |
| 705557 | 14 |
| 3189426 | 13 |
+---------+----------+
根据查询结果,可以看到,销量最多的商品也只卖出35件,其余商品卖出的更少。因此以淘宝为代表的电商平台,其商品售卖主要是依靠长尾商品的累计效应,并非爆款商品的带动。
7. 总结
- 从平台基本指标来看:
- PV/UV/DAU及日复购率均维持稳定,跳失率极少,9日的整体复购率高达66%。表名当前淘宝在用户,平台,商品的运营稳定且对用户有足够的吸引力,同时从9日复购率来看,用户的忠诚度较高
- 从漏斗转化模型来看,UV转化率高达68.5%,而PV的转化率仅2.2%。用户本身有足够的购买需求且平台也能满足用户的购买需求,但用户需要花费大量的成本在寻找合适的商品上。因平台在商品推荐以及搜索匹配等方面需要进行优化,能够通过用户偏好准确推送相应的商品
- 用户行为拆分看:
- 按天来看,在12/02日开始各项指标均有明显上升,可能在12/02日展开了双十二购物街的活动预热,同时效果明显;
- 从一天中的各个时刻来看,用户在10:00 - 22:00活跃度较高,各项活动运营是个在此时间窗口中展开,尤其在19:00 - 22:00达到峰值。
- 此外,通过RFM模型对用户进行分组,在后续的用户运营中,对不同的群组采取不同的运营方式,以最大化收益。
- 商品拆分来看:
- 通过从商品浏览量与销售量两个维度进行象限分析,根据不同象限商品的特征,采取不同的策略,提升转化率
- 通过对“二八定律” 以及 “长尾理论” 的探究,发现淘宝的商业模式满足长尾理论。因此在后续的运营中,平台可以尝试丰富商品类型,以满足各式各样用户的需求,但也需要权衡成本,确保收益最大化。