本笔记将利用sql语言构建RFM模型，将会有两种办法对用户进行分类。
第一种方法是基于有明确业务指标计算RFM分值。
第二种是按二八定律设定阀值。

首先看看RFM模型是什么？

R值:Rencency(最近一次消费) 指的是用户在店铺最近一次购买时间距离分析点的时间间隔;
F值:Frequency(消费频率) 指的是是用户在固定时间内的购买次数;
M值:Monetary(消费金额) 指的是一段时间(通常是1年)内的消费金额;

image.png

根据三个值的高低之分，会得出8种类型的客户；

一般每个指标都会有1，2，3，4，5分的分值标准，此指标一般根据具体业务需求进行设置；
如：

image.png

然后根据以上标准对用户进行打分，并会求得各指标均值，进行比较，大于均值为高，少于均值为低。

image.png

接下来按照此标准用sql执行。

方法一

第一步

首先我们导入相关数据，并去重数据放进新表 temp_trade；
由于时间关系，以导入如下数据，期间利用
SET date_time = STR_TO_DATE(time,'%Y-%m-%d %H');
set dates=date(date_time);
这两个函数对原表（红框）日期进行处理；

image.png

create table o_retailers_trade_user
(
user_id int (9),
item_id int (9),
behavior_type int (1), -- 用户行为类型(1-曝光;2-购买;3-加入购物⻋;4-加入收藏夹。)
user_geohash varchar (14), 
item_category int (5),  -- 品类ID
time varchar (13) -- 用户发生行为的时间
);

-- 日期时间数据处理 增加新列date_time、dates
ALTER TABLE o_retailers_trade_user
ADD COLUMN date_time datetime null;
UPDATE o_retailers_trade_user 
SET date_time = STR_TO_DATE(time,'%Y-%m-%d %H');

alter table o_retailers_trade_user add column dates char(10) null;
update o_retailers_trade_user
set dates=date(date_time);
desc o_retailers_trade_user;

再检查一下关键字段有无缺失值

SELECT COUNT(user_id)  ,COUNT(item_id) ,COUNT(item_category) ,COUNT(behavior_type)  ,COUNT(time) 
FROM  o_retailers_trade_user;

查询后得出并无缺失。

image.png

再检查一下用户行为是否有1、2、3、4以外的异常值；

SELECT behavior_type FROM o_retailers_trade_user

WHERE behavior_type NOT IN (1,2,3,4);

查询结果无异常值；

-- 建新表，放进去重后的数据

create table temp_trade like o_retailers_trade_user;
insert into temp_trade select distinct * from o_retailers_trade_user;

#####**第二步**
**对R值进行计算**
R值定义:Rencency(最近一次消费) 指的是用户在店铺最近一次购买时间距离分析点的时间间隔; 

1. 查询每用户ID最近一次购买时间，即求时间最大值，排序后见到时间最大值为12月18日，接下来将假设项目在12月19日进行，来求与最近一次购买时间的差值，即下面的相隔天数。

SELECT user_id , max(dates) AS 最近一次消费时间
FROM
temp_trade
WHERE behavior_type='2'
GROUP BY user_id
ORDER BY 最近一次消费时间 desc

查询结果如下：
![image.png](https://upload-images.jianshu.io/upload_images/22277555-bda29525c0eeaba1.png?imageMogr2/auto-orient/strip%7CimageView2/2/w/1240)

2. 利用case when 函数得出每个相差天数对应的分值；
为方便后面对表调用，创建了视图；

CREATE VIEW r_clevel AS
SELECT user_id , 最近一次消费时间 , DATEDIFF('2019-12-19',最近一次消费时间) AS 相差天数,
(CASE
WHEN DATEDIFF('2019-12-19',最近一次消费时间)<=2 THEN 5
WHEN DATEDIFF('2019-12-19',最近一次消费时间)<=4 THEN 4
WHEN DATEDIFF('2019-12-19',最近一次消费时间)<=6 THEN 3
WHEN DATEDIFF('2019-12-19',最近一次消费时间)<=8 THEN 2
ELSE
1 END )AS R分值
FROM
(
SELECT user_id , max(dates) AS 最近一次消费时间
FROM
temp_trade
WHERE behavior_type='2'
GROUP BY user_id
ORDER BY 最近一次消费时间 desc
)a


视图如下（部分截图）：
![image.png](https://upload-images.jianshu.io/upload_images/22277555-b3fccfa2b13c4359.png?imageMogr2/auto-orient/strip%7CimageView2/2/w/1240)


#####**第三步**
**对F值进行计算**
F值定义：Frequency(消费频率) 指的是是用户在固定时间内的购买次数; 

1. 首先求出每个用户的购买频次，即需用到count函数，对用户行为类型2购物进行计数；

SELECT user_id , COUNT(user_id) AS 购买频次
FROM
temp_trade
WHERE behavior_type='2'
GROUP BY user_id
ORDER BY 购买频次 desc

查询结果如下：
![image.png](https://upload-images.jianshu.io/upload_images/22277555-beb3286ed68f4459.png?imageMogr2/auto-orient/strip%7CimageView2/2/w/1240)

2. 利用case when 函数得出每个用户购买次数对应的分值；
为方便后面对表调用，也创建了视图；

CREATE VIEW f_clevel AS
SELECT user_id , 购买频次 ,
(CASE
WHEN 购买频次<=2 THEN 1
WHEN 购买频次<=4 THEN 2
WHEN 购买频次<=6 THEN 3
WHEN 购买频次<=8 THEN 4
ELSE 5 END )AS F分值

FROM
(
SELECT user_id , COUNT(user_id) AS 购买频次
FROM
temp_trade
WHERE behavior_type='2'
GROUP BY user_id
)a

视图如下（部分截图）：
![image.png](https://upload-images.jianshu.io/upload_images/22277555-c7b1da1a1b5908d3.png?imageMogr2/auto-orient/strip%7CimageView2/2/w/1240)



#####**第四步**
**求R、F的均值**

-- 1.R平均值
SELECT avg(R分值) as 'r_avg' FROM r_clevel;
-- 2.F平均值
select avg(F分值) as 'f_avg' from f_clevel;


计算结果：
![image.png](https://upload-images.jianshu.io/upload_images/22277555-a6cc2a912289bfe2.png?imageMogr2/auto-orient/strip%7CimageView2/2/w/1240)

![image.png](https://upload-images.jianshu.io/upload_images/22277555-54ed1b9e52ee7f29.png?imageMogr2/auto-orient/strip%7CimageView2/2/w/1240)

#####**第五步**
**对用户进行等级划分**
由于该数据没有M值，故只建立了4个分类，若有M值可得8个分类；
继续利用case when 函数计算，最后创建视图RFM_table

create view RFM_table
as
select a.*,b.分值,
(case
when a.分值>2.5515 and b.分值>2.2606 then '重要高价值客户' when a.分值<2.5515 and b.分值>2.2606 then '重要唤回客户'
when a.分值>2.5515 and b.分值<2.2606 then '重要深耕客户' when a.分值<2.5515 and b.分值<2.2606 then '重要挽留客户' END
) as user_class
from r_clevel a, f_clevel b
where a.user_id=b.user_id;


查询结果如下：
![image.png](https://upload-images.jianshu.io/upload_images/22277555-dac958fb9f9129e1.png?imageMogr2/auto-orient/strip%7CimageView2/2/w/1240)


#####**第六步**
**统计各用户等级数量**

SELECT user_class , COUNT(user_class)AS 数量
FROM
RFM_table
GROUP BY user_class

查询结果如下：
![image.png](https://upload-images.jianshu.io/upload_images/22277555-4473934ae2193520.png?imageMogr2/auto-orient/strip%7CimageView2/2/w/1240)

到此，整个方法一就完成了。


#方法二
由于方法一有时给人感觉分值指标有点拍脑袋想出来的，这里可以常用用二八法则来确定一个阀值，即在前20%的用户属于高分值用户，后面的为低分值。
#####**第一步**
**统计本次表中带有购买行为的用户数量，并找出在前20%位置的找出R值的阀值**
1. 求得20%的用户数是165x20%=33人

SELECT COUNT(DISTINCT user_id) AS 购买用户数
FROM
temp_trade
WHERE behavior_type='2'

![image.png](https://upload-images.jianshu.io/upload_images/22277555-0700a0f69cac20db.png?imageMogr2/auto-orient/strip%7CimageView2/2/w/1240)
2. 利用limit函数，找到排第33个用户的相差天数，作为阀值。

SELECT
相差天数
FROM
(
SELECT user_id , 最近一次消费时间 , DATEDIFF('2019-12-19',最近一次消费时间) AS 相差天数
FROM
(
SELECT user_id , max(dates) AS 最近一次消费时间
FROM
temp_trade
WHERE behavior_type='2'
GROUP BY user_id
ORDER BY 最近一次消费时间 desc
)a
ORDER BY 相差天数 DESC
)b
LIMIT 32,1


查询结果：
![image.png](https://upload-images.jianshu.io/upload_images/22277555-9d30530ae9cd300f.png?imageMogr2/auto-orient/strip%7CimageView2/2/w/1240)

#####**第二步**
**找出F值的阀值**

SELECT
购买频次
FROM
(
SELECT user_id , 购买频次

FROM
(
SELECT user_id , COUNT(user_id) AS 购买频次
FROM
temp_trade
WHERE behavior_type='2'
GROUP BY user_id
ORDER BY 购买频次 DESC
)a
)b
LIMIT 32,1

查询结果：
![image.png](https://upload-images.jianshu.io/upload_images/22277555-5703c4c82674328e.png?imageMogr2/auto-orient/strip%7CimageView2/2/w/1240)


得到R、F两个维度分类阈值分别为：19、7；

#####**第三步**
**在原表中求得用户的相差天数及购买频次**
创建视图RF_TABLE

CREATE VIEW RF_TABLE AS
SELECT user_id , 最近一次消费时间 , DATEDIFF('2019-12-19',最近一次消费时间) AS 相差天数,购买频次
FROM
(
SELECT user_id , max(dates) AS 最近一次消费时间 , COUNT(user_id) AS 购买频次
FROM
temp_trade
WHERE behavior_type='2'
GROUP BY user_id)a

查询结果：
![image.png](https://upload-images.jianshu.io/upload_images/22277555-78545cebf74f7e17.png?imageMogr2/auto-orient/strip%7CimageView2/2/w/1240)


#####**第四步**
**对用户进行等级划分**

select user_id,
(case
when 相差天数<=19 and 购买频次>=7 then '重要高价值客户' when 相差天数>19 and 购买频次>=7 then '重要唤回客户'
when 相差天数<=19 and 购买频次<7 then '重要深耕客户' when 相差天数>19 and 购买频次<7 then '重要挽留客户' END
) as user_class
from RF_TABLE ;

查询结果：
![image.png](https://upload-images.jianshu.io/upload_images/22277555-6ecae95dab046dd0.png?imageMogr2/auto-orient/strip%7CimageView2/2/w/1240)



#####**第五步**
**统计各用户等级数量**

SELECT user_class , COUNT(user_class)AS 数量
FROM
(
select user_id,
(case
when 相差天数<=19 and 购买频次>=7 then '重要高价值客户' when 相差天数>19 and 购买频次>=7 then '重要唤回客户'
when 相差天数<=19 and 购买频次<7 then '重要深耕客户' when 相差天数>19 and 购买频次<7 then '重要挽留客户' END
) as user_class
from RF_TABLE
) a
GROUP BY user_class

查询结果：
![image.png](https://upload-images.jianshu.io/upload_images/22277555-ef049b7330636eeb.png?imageMogr2/auto-orient/strip%7CimageView2/2/w/1240)



再次对比方法一结果：
查询结果如下：
![image.png](https://upload-images.jianshu.io/upload_images/22277555-4473934ae2193520.png?imageMogr2/auto-orient/strip%7CimageView2/2/w/1240)

        发现方法二缺少了最高价值客户数据，原因应在于在此数据中，没有用户的的R值和F值同时处于前20%而导致的，而数据量较少也是其中一个原因。

说明两种不同方法，得出的结果会有较大的出入。

SQL建立RFM模型指标的两种方法对比

SQL建立RFM模型指标的两种方法对比

方法一

第一步