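I. Background of the analysis report:
In December 2020, the business team lead needs to report to management on bicycle sales for November 2020. The report provides data support for fine-grained operations and helps pinpoint the target customer groups.
II. Purpose of the analysis:
1. The company's main concern is how to set its sales strategy and adjust its product mix so that it keeps growing quickly, earns more revenue, and captures more market share.
2. By continuously monitoring and analyzing bicycle sales across the whole company, the report tracks the state and trend of bicycle sales and gives the client a basis for drawing up, adjusting, and reviewing the sales strategy and for improving the product mix.
III. Analysis approach:
a. Overall view: overall bicycle sales performance from January to November 2020
b. Regional view: November sales volume by region, and November sales volume for the top 10 cities
c. Product view: November sales volume by product category, and November sales volume by individual product
d. Best-selling products: November top 10 products by sales volume, and November top 10 products by sales growth
e. Customer view: November customer age distribution and product preferences per age band, and November male/female customer ratio and product preferences
IV. Analysis process:
1. Overall bicycle sales performance:
Read the data
import pandas as pd
import sqlalchemy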
engine = sqlalchemy.create_engine(
sql_cmd = 'select * from dw_customer_order'
gather_customer_order = pd.read_sql(con=engine,sql=sql_cmd)
gather_customer_order.head()
Check the data types
The data has no null values. To make the analysis easier, add a year-month column for month-level analysis
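The type and null checks mentioned here can be sketched as follows. This is a minimal illustration on a hypothetical four-column sample (the rows and values are made up; only the column names come from the report), not the output for the real dw_customer_order table.

```python
import pandas as pd

# Hypothetical stand-in for a few rows of dw_customer_order (values are made up)
sample = pd.DataFrame({
    "create_date": ["2020-10-03", "2020-11-15", "2020-11-20"],
    "cplb_zw": ["自行车", "自行车", "配件"],
    "order_num": [1, 1, 2],
    "sum_amount": [1200.0, 2300.0, 80.0],
})

dtypes = sample.dtypes               # data type of each column
null_counts = sample.isnull().sum()  # number of missing values per column

print(dtypes)
print(null_counts)
```

If every entry of null_counts is 0, no rows need to be dropped or filled before grouping.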
Process the data and keep only the part that is needed
# Convert the create_date column to datetime
gather_customer_order['create_date'] = pd.to_datetime(gather_customer_order['create_date'])
# Add the create_year_month column, used for month-level analysis
gather_customer_order['create_year_month'] = gather_customer_order['create_date'].dt.strftime('%Y-%m')
# Keep only rows whose product category (cplb_zw) is '自行车' (bicycle)
gather_customer_order = gather_customer_order.loc[gather_customer_order['cplb_zw'] == '自行车']
Overall bicycle sales volume
# Monthly order quantity and sales amount: group by month and sum order_num and sum_amount
overall_sales_performance = gather_customer_order.groupby('create_year_month').agg({'order_num':'sum','sum_amount':'sum'}).\
sort_index(ascending = False).reset_index()
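Use the diff function to compute the month-over-month change
diff subtracts the previous row from the current row. Because the rows are sorted by date in descending order, each difference is the earlier month minus the later month. It is divided by the earlier month's value and then by -1 to flip the sign, and the list is shifted up one position so that each value lines up with the later month
# Month-over-month change in monthly bicycle order quantity, to see the trend over the last year
# Month-over-month compares this month with the previous one, e.g. sales for 2019-02 against sales for 2019-01
order_num_diff = list((overall_sales_performance.order_num.diff()/overall_sales_performance.order_num)/-1)
order_num_diff.pop(0) # Drop the first element of the list
order_num_diff.append(0) # Append 0 to the end of the list (the earliest month has no previous month)
# Add the month-over-month values as a DataFrame column
overall_sales_performance = pd.concat([overall_sales_performance,pd.DataFrame({'order_num_diff':order_num_diff})],axis=1)
overall_sales_performance
sum_amount_diff = list((overall_sales_performance.sum_amount.diff()/overall_sales_performance.sum_amount)/-1)
sum_amount_diff.pop(0) # Drop the first element of the list
sum_amount_diff.append(0) # Append 0 to the end of the list
# Add the month-over-month values as a DataFrame column
overall_sales_performance = pd.concat([overall_sales_performance,pd.DataFrame({'sum_amount_diff':sum_amount_diff})],axis=1)
# Rename the columns: order_diff for the order quantity change, amount_diff for the sales amount change
# Sort by date in ascending order
overall_sales_performance = overall_sales_performance.rename(columns = {'order_num_diff':'order_diff','sum_amount_diff':'amount_diff'}).\
sort_values(by='create_year_month',ascending=True)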
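As a cross-check on the diff-based steps, the same month-over-month figures can be obtained with pct_change on data sorted in ascending date order. The sketch below uses made-up monthly totals, so the numbers are illustrative only.

```python
import pandas as pd

# Made-up monthly totals, for illustration only
monthly = pd.DataFrame({
    "create_year_month": ["2020-11", "2020-10", "2020-09"],
    "order_num": [99, 110, 100],
})

# Sort ascending so that "previous row" means "previous month"
monthly = monthly.sort_values("create_year_month").reset_index(drop=True)

# pct_change = (current - previous) / previous; the first month has no
# previous month, so its NaN is replaced by 0
monthly["order_diff"] = monthly["order_num"].pct_change().fillna(0)
print(monthly)
```

With ascending order there is no need to flip the sign or shift the list by one position.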
Store the processed data in the database
engine = sqlalchemy.create_engine('mysql+pymysql://
overall_sales_performance.to_sql('pt_overall_sale_performance',con=engine,if_exists = 'append',index=False)
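One thing to keep in mind with if_exists='append' is that rerunning the script adds the same rows again. The sketch below illustrates the difference between 'append' and 'replace'. It assumes an in-memory SQLite database in place of the MySQL engine and a hypothetical table name demo_table, so it only demonstrates the to_sql behavior, not the real target table.

```python
import sqlite3
import pandas as pd

# Assumption: in-memory SQLite stands in for the MySQL database
conn = sqlite3.connect(":memory:")
result = pd.DataFrame({
    "create_year_month": ["2020-10", "2020-11"],
    "order_num": [110, 99],
})

# Writing twice with 'append' keeps both copies of the rows
result.to_sql("demo_table", con=conn, if_exists="append", index=False)
result.to_sql("demo_table", con=conn, if_exists="append", index=False)
rows_after_append = pd.read_sql("select count(*) as n from demo_table", conn)["n"].iloc[0]

# 'replace' drops the existing table and writes the rows once
result.to_sql("demo_table", con=conn, if_exists="replace", index=False)
rows_after_replace = pd.read_sql("select count(*) as n from demo_table", conn)["n"].iloc[0]

print(rows_after_append, rows_after_replace)
conn.close()
```

Which mode fits depends on whether the table is meant to accumulate history or to hold only the latest result.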
2. Regional bicycle sales performance, November 2020
# Keep bicycle data for October and November
gather_customer_order_10_11 = gather_customer_order[gather_customer_order['create_year_month'].isin(['2020-10','2020-11'])]
# Group by region and month; sum order quantity and sales amount
gather_customer_order_10_11_group= \
gather_customer_order_10_11.groupby(['chinese_territory','create_year_month']).agg({'order_num':'sum','sum_amount':'sum'}).reset_index()
Compute the November month-over-month change
# pct_change() is the relative change between the current element and the previous one.
# Within each region the rows are ordered October, November, so the November row holds the month-over-month change
region_group = gather_customer_order_10_11_group.groupby('chinese_territory')
gather_customer_order_10_11_group['order_diff'] = region_group['order_num'].pct_change().fillna(0)
gather_customer_order_10_11_group['amount_diff'] = region_group['sum_amount'].pct_change().fillna(0)
# Bicycle sales quantity, sales amount, and month-over-month change by region for October and November
gather_customer_order_10_11_group.head()
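Another way to look at the same regional comparison is to pivot the months into columns and compute the change directly. The following is a sketch with made-up figures and placeholder region names ("Region A", "Region B"), not the report's actual data.

```python
import pandas as pd

# Made-up regional totals; the region names are placeholders
grouped = pd.DataFrame({
    "chinese_territory": ["Region A", "Region A", "Region B", "Region B"],
    "create_year_month": ["2020-10", "2020-11", "2020-10", "2020-11"],
    "order_num": [200, 220, 50, 40],
})

# One row per region, one column per month
wide = grouped.pivot(index="chinese_territory", columns="create_year_month", values="order_num")

# Month-over-month change: (November - October) / October
wide["order_diff"] = (wide["2020-11"] - wide["2020-10"]) / wide["2020-10"]
print(wide)
```

The wide layout gives one row per region, which can be easier to read than the long table when only two months are compared.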
Store in the database
engine = sqlalchemy.create_engine(
gather_customer_order_10_11_group.to_sql('pt_bicy_november_territory',con=engine,if_exists = 'append',index=False)
3. Month-over-month change for the top 10 cities by bicycle sales volume in November
# Keep bicycle transactions for November
gather_customer_order_11 = gather_customer_order_10_11.loc[gather_customer_order_10_11['create_year_month'] == '2020-11']
gather_customer_order_city_11=gather_customer_order_11.groupby(['chinese_city']).agg({'order_num':'sum'}).reset_index()
# Top 10 cities by bicycle sales quantity in November
gather_customer_order_city_head = gather_customer_order_city_11.sort_values(by = 'order_num',ascending = False).head(10)
# Keep October and November bicycle sales for the top 10 cities
gather_customer_order_10_11_head = gather_customer_order_10_11[gather_customer_order_10_11['chinese_city'].isin(list(gather_customer_order_city_head['chinese_city']))]
# Group by city and month; sum sales quantity and sales amount for the top 10 cities
gather_customer_order_city_10_11 = gather_customer_order_10_11_head.groupby(['chinese_city','create_year_month']).agg({'order_num':'sum','sum_amount':'sum'}).reset_index()
# Month-over-month change for the top 10 cities (rows are ordered October, November within each city)
city_group = gather_customer_order_city_10_11.groupby('chinese_city')
gather_customer_order_city_10_11['order_diff'] = city_group['order_num'].pct_change().fillna(0)
gather_customer_order_city_10_11['amount_diff'] = city_group['sum_amount'].pct_change().fillna(0)
gather_customer_order_city_10_11.head(5)
Store in the database
engine = sqlalchemy.create_engine(
gather_customer_order_city_10_11.to_sql('pt_bicy_november_october_city',con=engine,if_exists = 'append',index=False)
4. Market segment performance
Overall market performance
# Total bicycle sales quantity per month
gather_customer_order_group_month = gather_customer_order.groupby('create_year_month').order_num.sum().reset_index()
# Merge the bicycle sales table with the monthly totals table using pd.merge
order_num_proportion = pd.merge(gather_customer_order, gather_customer_order_group_month, on='create_year_month')
# Each row's sales quantity as a share of that month's total bicycle sales
order_num_proportion['order_proportion'] = order_num_proportion['order_num_x']/order_num_proportion['order_num_y']
# Rename order_num_y to sum_month_order: monthly bicycle sales quantity
order_num_proportion = order_num_proportion.rename(columns = {'order_num_y':'sum_month_order'})
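The merge-based share calculation above can also be written with groupby and transform, which attaches the monthly total to every row without creating the _x/_y columns. A minimal sketch on made-up order rows:

```python
import pandas as pd

# Made-up order rows, for illustration only
orders = pd.DataFrame({
    "create_year_month": ["2020-10", "2020-10", "2020-11", "2020-11", "2020-11"],
    "order_num": [1, 3, 2, 2, 4],
})

# transform('sum') returns the monthly total aligned to each original row
orders["sum_month_order"] = orders.groupby("create_year_month")["order_num"].transform("sum")

# Share of each row in its month's total
orders["order_proportion"] = orders["order_num"] / orders["sum_month_order"]
print(orders)
```

Both approaches should give the same proportions; transform simply skips the intermediate totals table.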
Segment performance: road / mountain / touring bicycles
Road bicycle segment sales
gather_customer_order_road = gather_customer_order[gather_customer_order['cpzl_zw'] == '公路自行车']
# Sales quantity of each road bicycle model
gather_customer_order_road_month = gather_customer_order_road.groupby(by = ['create_year_month','product_name']).order_num.sum().reset_index()
gather_customer_order_road_month['cpzl_zw'] = '公路自行车'
# Total road bicycle sales quantity per month
gather_customer_order_road_month_sum = gather_customer_order_road_month.groupby('create_year_month').order_num.sum().reset_index()
# Merge gather_customer_order_road_month with the monthly totals, used to compute each model's share
gather_customer_order_road_month = pd.merge(gather_customer_order_road_month,gather_customer_order_road_month_sum,on='create_year_month')
Mountain bicycle segment sales
gather_customer_order_Mountain = gather_customer_order[gather_customer_order['cpzl_zw'] == '山地自行车']
# Sales quantity of each mountain bicycle model
gather_customer_order_Mountain_month = gather_customer_order_Mountain.groupby(by = ['create_year_month','product_name']).order_num.sum().reset_index()
gather_customer_order_Mountain_month['cpzl_zw'] = '山地自行车'
# Total mountain bicycle sales quantity per month
gather_customer_order_Mountain_month_sum = gather_customer_order_Mountain_month.groupby('create_year_month').order_num.sum().reset_index()
# Merge gather_customer_order_Mountain_month with the monthly totals,
# used to compute each model's share
gather_customer_order_Mountain_month = pd.merge(gather_customer_order_Mountain_month,gather_customer_order_Mountain_month_sum,on='create_year_month')
Touring bicycle segment sales are computed in the same way as above, giving gather_customer_order_tour_month
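Since the touring step is not shown, here is one way it might look, written as a reusable helper that repeats the road and mountain steps for any segment. The helper name segment_monthly_share, the segment value '旅游自行车', and the toy rows are assumptions made for illustration; the real cpzl_zw value should be checked against the data.

```python
import pandas as pd

def segment_monthly_share(orders: pd.DataFrame, segment: str) -> pd.DataFrame:
    """Per-model monthly sales of one segment, merged with the segment's monthly total."""
    seg = orders[orders["cpzl_zw"] == segment]
    # Sales quantity of each model per month
    by_product = seg.groupby(["create_year_month", "product_name"])["order_num"].sum().reset_index()
    by_product["cpzl_zw"] = segment
    # Total sales quantity of the segment per month
    by_month = by_product.groupby("create_year_month")["order_num"].sum().reset_index()
    # After the merge, order_num_x is the model's quantity and order_num_y the segment total
    return pd.merge(by_product, by_month, on="create_year_month")

# Toy rows; quantities are made up and '旅游自行车' is an assumed segment value
toy = pd.DataFrame({
    "create_year_month": ["2020-11", "2020-11", "2020-11"],
    "cpzl_zw": ["旅游自行车", "旅游自行车", "公路自行车"],
    "product_name": ["Touring-1000 Yellow", "Touring-3000 Yellow", "Road-150-Red"],
    "order_num": [3, 1, 5],
})
gather_customer_order_tour_month = segment_monthly_share(toy, "旅游自行车")
print(gather_customer_order_tour_month)
```

The same helper could be called with '公路自行车' and '山地自行车' so that all three tables share one code path.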
Finally, combine the three tables
# Combine the monthly sales of mountain, touring, and road bicycles
gather_customer_order_month = pd.concat([gather_customer_order_road_month,gather_customer_order_Mountain_month,gather_customer_order_tour_month],ignore_index=True)
# Each model's share of the monthly sales of its bicycle type
gather_customer_order_month['order_num_proportio'] = gather_customer_order_month['order_num_x']/gather_customer_order_month['order_num_y']
# order_month_product: the product's sales quantity for the month
# sum_order_month: total sales quantity of that bicycle type for the month
gather_customer_order_month.rename(columns = {'order_num_x':'order_month_product','order_num_y':'sum_order_month'},inplace = True)
Store in the database
engine = sqlalchemy.create_engine("mysql+pymysql://root:root@localhost:3306/datafrog05_adventure")
gather_customer_order_month.to_sql('gather_customer_order_month_4',con = engine,if_exists='replace', index=False)
Compute the November 2020 month-over-month change
# To compute the November change, first keep October and November data
gather_customer_order_month_10_11 = gather_customer_order_month[gather_customer_order_month.create_year_month.isin(['2020-10','2020-11'])]
# Sort the October and November rows by product and month
gather_customer_order_month_10_11 = gather_customer_order_month_10_11.sort_values(by = ['product_name','create_year_month'])
# Month-over-month change in sales quantity for each product
gather_customer_order_month_10_11['order_num_diff'] = gather_customer_order_month_10_11.groupby('product_name')['order_month_product'].pct_change().fillna(0)
# Keep the November rows
gather_customer_order_month_11 = gather_customer_order_month_10_11[gather_customer_order_month_10_11['create_year_month'] == '2020-11']
Compute cumulative product sales from January to November 2020
# Keep bicycle data for January to November 2020
gather_customer_order_month_1_11 = gather_customer_order_month[gather_customer_order_month['create_year_month'].isin(['2020-01','2020-02','2020-03','2020-04','2020-05','2020-06','2020-07','2020-08','2020-09','2020-10','2020-11'])]
# Cumulative sales quantity per product for January to November 2020
gather_customer_order_month_1_11_sum = gather_customer_order_month_1_11.groupby(by = 'product_name').order_month_product.sum().reset_index()
# Rename to sum_order_1_11: cumulative product sales for January to November
gather_customer_order_month_1_11_sum = gather_customer_order_month_1_11_sum.rename(columns = {'order_month_product':'sum_order_1_11'})
Product sales quantity, month-over-month change, and cumulative sales for November 2020
# Merge the two tables on the shared product_name column
gather_customer_order_month_11 = pd.merge(gather_customer_order_month_11,gather_customer_order_month_1_11_sum,on = 'product_name')
Store in the database
engine = sqlalchemy.create_engine(
gather_customer_order_month_11.to_sql('pt_bicycle_product_sales_order_month_11',con=engine)
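5. User behavior analysis
Read the data with a filter condition in the query so that only the needed rows are selected, which speeds up the read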
engine = sqlalchemy.create_engine(
sql_cmd = "select customer_key,birth_date,gender,marital_status from ods_customer where create_date < '2020-12-1'"
df_CUSTOMER = pd.read_sql(con=engine,sql=sql_cmd)
Merge the sales order table with the customer table
sales_customer_order_11=pd.merge(df_CUSTOMER,df_sales_orders_11,on='customer_key',how='inner')
sales_customer_order_11.head(3)
Extract each customer's birth year, used to compute age
customer_birth_year = sales_customer_order_11['birth_date'].str.split('-',expand = True).rename(columns = {0:'birth_year'}).\
drop(labels = [1,2],axis = 1)
sales_customer_order_11 = pd.concat([sales_customer_order_11,customer_birth_year],axis = 1)
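If birth_date is stored as 'YYYY-MM-DD' text, the birth year can also be taken from a parsed datetime instead of splitting the string. This is a sketch on two made-up dates, and it assumes every value parses as a date.

```python
import pandas as pd

# Made-up birth dates in the 'YYYY-MM-DD' text format
customers = pd.DataFrame({"birth_date": ["1984-05-17", "1979-12-01"]})

# Parse the text as dates and read the year as an integer
customers["birth_year"] = pd.to_datetime(customers["birth_date"]).dt.year

# Age relative to 2019, the reference year used in this analysis
customers["customer_age"] = 2019 - customers["birth_year"]
print(customers)
```

The .dt.year accessor returns integers directly, so no separate astype('int') step is needed.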
5.1 Customer age analysis
Compute customer age
# Convert the birth year to int
sales_customer_order_11['birth_year'] = sales_customer_order_11['birth_year'].astype('int')
# Compute customer age relative to 2019
sales_customer_order_11['customer_age'] = 2019 - sales_customer_order_11['birth_year']
Age bands
# Add the 'age_level' band column; right=False makes each bin left-closed, so that "30-34" covers ages 30 through 34
sales_customer_order_11['age_level'] = pd.cut(sales_customer_order_11['customer_age'], [30,35,40,45,50,55,60,65], right=False, labels=["30-34","35-39","40-44","45-49","50-54","55-59","60-64"])
sales_customer_order_11.head()
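The band labels read as left-closed ranges ("30-34" means ages 30 through 34), which corresponds to right=False in pd.cut. The sketch below compares the default setting with right=False on a few made-up ages, to show where the boundary ages end up in each case.

```python
import pandas as pd

ages = pd.Series([30, 34, 35, 64])  # made-up ages on and around the bin edges
bins = [30, 35, 40, 45, 50, 55, 60, 65]
labels = ["30-34", "35-39", "40-44", "45-49", "50-54", "55-59", "60-64"]

# Default right=True: bins are (30, 35], (35, 40], ...
right_closed = pd.cut(ages, bins, labels=labels)

# right=False: bins are [30, 35), [35, 40), ...
left_closed = pd.cut(ages, bins, labels=labels, right=False)

print(pd.DataFrame({"age": ages, "right=True": right_closed, "right=False": left_closed}))
```

With the default setting, age 30 falls outside every bin and age 35 is labeled "30-34"; with right=False both land in the band their label describes.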
Keep only the bicycle orders and compute the age ratio
# .copy() avoids modifying a view of sales_customer_order_11 when columns are added below
df_customer_order_bycle = sales_customer_order_11.loc[sales_customer_order_11['cplb_zw'] == '自行车'].copy()
df_customer_order_bycle['age_level_rate'] = 1 / len(df_customer_order_bycle)
Band the ages a second time
df_customer_order_bycle['age_level2'] = pd.cut(df_customer_order_bycle['customer_age'],bins=[0,30,40,120],right=False,labels=['<=29','30-39','>=40'])
5.2 Customer gender
# Group by gender
gender_count = df_customer_order_bycle.groupby(by = 'gender').cplb_zw.count().reset_index()
# Count for each age band
age_level2_count = df_customer_order_bycle.groupby(by = 'age_level2').sales_order_key.count().reset_index()
# Join age_level2_count, the count for each age band
df_customer_order_bycle = pd.merge(df_customer_order_bycle,age_level2_count,on = 'age_level2').rename(columns = {'sales_order_key_y':'age_level2_count'})
df_customer_order_bycle['age_level2_rate'] = 1/df_customer_order_bycle['age_level2_count']
# Join gender_count, the count for each gender
df_customer_order_bycle = pd.merge(df_customer_order_bycle,gender_count,on = 'gender').rename(columns = {'cplb_zw_y':'gender_count'})
df_customer_order_bycle['gender_rate'] = 1/df_customer_order_bycle['gender_count']
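The per-row rate columns (1 divided by the group count) are weights: summing them over any sub-group gives that sub-group's share within the gender. A small sketch on made-up rows shows the idea; the gender codes 'M' and 'F' are assumed values.

```python
import pandas as pd

# Made-up orders; 'M'/'F' are assumed gender codes
orders = pd.DataFrame({
    "gender": ["M", "M", "M", "F"],
    "cpzl_zw": ["公路自行车", "公路自行车", "山地自行车", "公路自行车"],
})

# Each row weighs 1 / (number of rows with the same gender)
orders["gender_rate"] = 1 / orders.groupby("gender")["gender"].transform("count")

# Summing the weights per gender and bicycle type gives the type's share within that gender
share = orders.groupby(["gender", "cpzl_zw"])["gender_rate"].sum()
print(share)
```

This is why the rate columns can be summed directly in a reporting tool to obtain percentages.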
Store in the database
engine = sqlalchemy.create_engine('mysql+pymysql://user:password@host/database')
df_customer_order_bycle.to_sql('pt_user_behavior_november',con = engine,if_exists='replace', index=False)
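6.1 Top 10 products by sales volume in November: sales quantity and month-over-month change
Find the top 10 products
# Compute sales quantity per product; \ is the line-continuation character
# Sort by sales volume in descending order and take the top 10 products
customer_order_11_top10 = gather_customer_order_11.groupby(by = 'product_name').order_num.count().reset_index().\
sort_values(by = 'order_num',ascending = False).head(10)
customer_order_11_top10.head()
Compute sales quantity and month-over-month change for the top 10
# Names of the top 10 products by sales volume
list(customer_order_11_top10['product_name'])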
customer_order_month_10_11 = gather_customer_order_month_10_11[['create_year_month','product_name','order_month_product','cpzl_zw','order_num_diff']]
customer_order_month_10_11 = customer_order_month_10_11[customer_order_month_10_11['product_name'].\
isin(list(customer_order_11_top10['product_name']))]
# The category label '本月TOP10销量' means "top 10 by sales volume this month"
customer_order_month_10_11['category'] = '本月TOP10销量'
customer_order_month_10_11.head()
6.2 Top 10 products by growth rate in November: sales quantity and month-over-month change
customer_order_month_11 = gather_customer_order_month_10_11.loc[gather_customer_order_month_10_11['create_year_month'] == '2020-11'].\
sort_values(by = 'order_num_diff',ascending = False).head(10)
customer_order_month_11_top10_seep = gather_customer_order_month_10_11.loc[gather_customer_order_month_10_11['product_name'].\
isin(list(customer_order_month_11['product_name']))]
customer_order_month_11_top10_seep = customer_order_month_11_top10_seep[['create_year_month','product_name','order_month_product','cpzl_zw','order_num_diff']]
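# The category label '本月TOP10增速' means "top 10 by growth rate this month"
customer_order_month_11_top10_seep['category'] = '本月TOP10增速'
Combine the top 10 by volume table customer_order_month_10_11 with the top 10 by growth table customer_order_month_11_top10_seep
# axis = 0 concatenates along rows; axis = 1 concatenates along columns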
hot_products_11 = pd.concat([customer_order_month_10_11,customer_order_month_11_top10_seep],axis = 0)
hot_products_11.tail()
Store in the database
engine = sqlalchemy.create_engine('mysql+pymysql://user:password@host/database')
hot_products_11.to_sql('pt_hot_products_novembe',con = engine,if_exists='replace', index=False)
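Summary of the visualizations
1. Overall sales
(1) Overall bicycle sales volume
Over the last 12 months, sales volume peaked in January at 15527 units; September and December had the highest month-over-month growth, each up 2% on the preceding month
(2) Overall bicycle sales amount
Over the last 12 months, the sales amount peaked in January at 29.5 million; the highest month-over-month growth was in October, up 3% on September. Sales volume and sales amount follow the same trend
2. Regional sales analysis
(1) Month-over-month growth by region
The Central South region had the highest overall sales volume, and the Southwest region grew fastest.
(2) Top 10 cities
Jinan and Zhuzhou had the highest sales volume; Dongguan grew fastest, at 31%
3. Product sales analysis
Segment sales volume
(1) Overall market
Road bicycles consistently account for about half of unit sales and touring bicycles sell the least, so consumers prefer road bicycles
(2) Road bicycles
In November, total road bicycle sales were stable compared with October. Among the models, Road-150-Red, Road-750-Black, and Road-550-W Yellow had the largest shares of sales and are the most popular with consumers
(3) Touring bicycles
In November, every touring model except Touring-3000 Yellow rose month over month. Touring-100 Blue and Touring-1000 Yellow had the largest shares of the sales amount and are the most favored by consumers
(4) Mountain bicycles
In November, every mountain model except Mountain-200 Silver rose month over month. Mountain-200 Black and Mountain-200 Silver hold the largest market shares and are the most popular with consumers
5. User behavior analysis
(1) Age
By age, customers aged 35-39 make up the largest share of buyers at 31%, and the share falls as age increases; road bicycles are also the most popular type in every age band
(2) Gender
By gender, male and female consumers differ little; road bicycles are popular with both, followed by mountain bicycles
6. Best-seller analysis
(1) November top 10 products by sales volume
In November, Mountain-200 Black sold the most, at 671 units
(2) November top 10 products by sales growth
In November, Road-650 Black grew fastest, up 46% on October.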