Cluster Analysis: Customer Segmentation

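Clustering is an unsupervised learning technique. Here we use the k-means algorithm to segment customers and look at how the resulting segments can inform marketing strategy in practice.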
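To make the idea behind k-means concrete, here is a minimal NumPy sketch of the assign-and-update loop (Lloyd's algorithm). It is an illustration only, not scikit-learn's implementation: the function name `kmeans_sketch`, the plain random initialisation (instead of k-means++) and the toy data are all assumptions made for this example.

```python
import numpy as np

def kmeans_sketch(X, k, n_iter=100, seed=0):
    """Plain Lloyd's algorithm: random initial centroids, then alternate assign/update."""
    rng = np.random.default_rng(seed)
    # Pick k distinct rows as initial centroids (plain random init, not k-means++).
    centroids = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        # Assignment step: squared distance from every point to every centroid.
        dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # Update step: move each centroid to the mean of its points;
        # keep the old centroid if a cluster ends up empty.
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break  # centroids stopped moving, so the assignment is stable
        centroids = new_centroids
    return labels, centroids

# Toy data (made up): two well-separated groups of (income, spending score) pairs.
X_toy = np.array([[15, 80], [16, 77], [17, 76],
                  [80, 10], [85, 15], [88, 12]], dtype=float)
toy_labels, toy_centroids = kmeans_sketch(X_toy, k=2)
print(toy_labels)
print(toy_centroids)
```

The cluster numbers themselves are arbitrary; only the grouping matters. The rest of this post relies on `sklearn.cluster.KMeans`, which adds k-means++ initialisation and multiple restarts on top of this loop.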

1. Importing the data

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.ticker as ticker
import seaborn as sns
from sklearn.cluster import KMeans

data = pd.read_csv('./Mall_Customers.csv')

2. Data exploration


data.head()


	CustomerID	Gender	Age	Annual Income (k$)	Spending Score (1-100)
0	1	Male	19	15	39
1	2	Male	21	15	81
2	3	Female	20	16	6
3	4	Female	23	16	77
4	5	Female	31	17	40



data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 200 entries, 0 to 199
Data columns (total 5 columns):
 #   Column                  Non-Null Count  Dtype 
---  ------                  --------------  ----- 
 0   CustomerID              200 non-null    int64 
 1   Gender                  200 non-null    object
 2   Age                     200 non-null    int64 
 3   Annual Income (k$)      200 non-null    int64 
 4   Spending Score (1-100)  200 non-null    int64 
dtypes: int64(4), object(1)
memory usage: 7.9+ KB



data.isnull().any()

CustomerID                False
Gender                    False
Age                       False
Annual Income (k$)        False
Spending Score (1-100)    False
dtype: bool



data.describe()


	CustomerID	Age	Annual Income (k$)	Spending Score (1-100)
count	200.000000	200.000000	200.000000	200.000000
mean	100.500000	38.850000	60.560000	50.200000
std	57.879185	13.969007	26.264721	25.823522
min	1.000000	18.000000	15.000000	1.000000
25%	50.750000	28.750000	41.500000	34.750000
50%	100.500000	36.000000	61.500000	50.000000
75%	150.250000	49.000000	78.000000	73.000000
max	200.000000	70.000000	137.000000	99.000000



data[['Gender','CustomerID']].groupby('Gender').count()


	CustomerID
Gender	
Female	112
Male	88




gender = data['Gender'].value_counts()
labels = gender.index  # take the labels from the counts so they always match the slice order
colors = ['c', 'coral']
explode = [0, 0.05]
plt.figure(figsize=(8,8))
plt.title('Total of customers by gender', fontsize = 16, fontweight='bold') 
plt.pie(gender, colors = colors, autopct = '%1.0f%%', labels = labels, explode = explode, startangle=90, textprops={'fontsize': 16})
plt.savefig('Total of customers by gender.png', bbox_inches = 'tight')
plt.show()

[Figure: pie chart of customer share by gender]



plt.figure(figsize=(16,6))
plt.subplot(1,2,1)
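# distplot is deprecated since seaborn 0.11; histplot with kde=True is the replacement
sns.histplot(data['Spending Score (1-100)'], kde=True, color = 'green')
plt.title('Distribution of Spending Score')
plt.subplot(1,2,2)
sns.histplot(data['Annual Income (k$)'], kde=True, color = 'green')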
plt.title('Distribution of Annual Income (k$)')
plt.show()

[Figure: distributions of Spending Score and Annual Income (k$)]



sns.pairplot(data=data[['Spending Score (1-100)','Annual Income (k$)','Age']], diag_kind="kde")
plt.savefig('Distribution.png', bbox_inches = 'tight')

[Figure: pair plot of Spending Score, Annual Income and Age]



plt.figure(figsize=(8,6))
plt.title('Annual Income vs Spending Score', fontsize = 16, fontweight='bold')  
plt.scatter(data['Annual Income (k$)'], data['Spending Score (1-100)'], color = 'indianred', edgecolors = 'crimson')
plt.xlabel('Annual Income', fontsize = 14)
plt.ylabel('Spending Score', fontsize = 14)
plt.savefig('Annual Income vs Spending Score.png', bbox_inches = 'tight')
plt.show()

[Figure: scatter plot of Annual Income vs Spending Score]

3. Model development



X1_Matrix = data.iloc[:, [2,4]].values # Age & Spending Score
X2_Matrix = data.iloc[:, [3,4]].values # Annual Income & Spending Score



inertias_1 = []
for i in range(1,20):
    kmeans = KMeans(n_clusters=i, init='k-means++',  max_iter=300, n_init=10,random_state=0)
    kmeans.fit(X1_Matrix)
    inertia = kmeans.inertia_
    inertias_1.append(inertia)
    print('For n_cluster =', i, 'The inertia is:', inertia)

For n_cluster = 1 The inertia is: 171535.5
For n_cluster = 2 The inertia is: 75949.15601023017
For n_cluster = 3 The inertia is: 45840.67661610867
For n_cluster = 4 The inertia is: 28165.58356662934
For n_cluster = 5 The inertia is: 23830.24505228459
For n_cluster = 6 The inertia is: 19502.407839362204
For n_cluster = 7 The inertia is: 15523.684014328752
For n_cluster = 8 The inertia is: 13020.084512948222
For n_cluster = 9 The inertia is: 11517.231348351697
For n_cluster = 10 The inertia is: 10299.698359250398
For n_cluster = 11 The inertia is: 9404.802904325206
For n_cluster = 12 The inertia is: 8659.542579270144
For n_cluster = 13 The inertia is: 7896.277200074606
For n_cluster = 14 The inertia is: 7223.8088214073505
For n_cluster = 15 The inertia is: 6691.75644045497
For n_cluster = 16 The inertia is: 6160.592835350923
For n_cluster = 17 The inertia is: 5552.953625949214
For n_cluster = 18 The inertia is: 5356.265766259883
For n_cluster = 19 The inertia is: 4869.198509239299
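The inertia printed above is the sum of squared distances from each point to the centroid of the cluster it is assigned to. The following sketch recomputes that quantity by hand and compares it with `KMeans.inertia_`. The data here is synthetic (random points standing in for the Age and Spending Score columns), and the names `X_demo` and `manual_inertia` are made up for the example.

```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic stand-in for the (Age, Spending Score) matrix; the values are made up.
rng = np.random.default_rng(0)
X_demo = np.vstack([
    rng.normal(loc=[25, 80], scale=3, size=(30, 2)),
    rng.normal(loc=[50, 40], scale=3, size=(30, 2)),
])

km = KMeans(n_clusters=2, init='k-means++', n_init=10, random_state=0).fit(X_demo)

# Inertia by hand: squared distance from each point to its assigned centroid, summed.
assigned_centroids = km.cluster_centers_[km.labels_]
manual_inertia = ((X_demo - assigned_centroids) ** 2).sum()

print(manual_inertia, km.inertia_)
```

Because inertia can only shrink as K grows, its absolute value says little on its own; what matters is where the decrease levels off.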



# Creating the figure
figure = plt.figure(1, figsize=(15,6), dpi=300)
plt.plot(np.arange(1,20), inertias_1, alpha=0.8, marker='o')
plt.xlabel("K")
plt.ylabel("Inertia ")

Text(0, 0.5, 'Inertia ')

[Figure: inertia vs K for Age and Spending Score]
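Reading the elbow off an inertia curve is somewhat subjective. One common complementary check, which the original analysis does not use, is the silhouette score from `sklearn.metrics.silhouette_score`. The sketch below runs it on synthetic blobs with five known centres (the centres and the names `X_demo`, `scores` and `best_k` are assumptions for illustration); on the mall data the picture may well be less clear-cut.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Synthetic data: five tight, well-separated blobs at hand-picked centres.
centers = [[0, 0], [10, 0], [0, 10], [10, 10], [5, 5]]
X_demo, _ = make_blobs(n_samples=200, centers=centers, cluster_std=0.5, random_state=0)

# Silhouette needs at least 2 clusters, so start the scan at K=2.
scores = {}
for k in range(2, 9):
    labels_k = KMeans(n_clusters=k, init='k-means++', n_init=10,
                      random_state=0).fit_predict(X_demo)
    scores[k] = silhouette_score(X_demo, labels_k)

best_k = max(scores, key=scores.get)
for k, score in scores.items():
    print('K =', k, 'silhouette =', round(score, 3))
print('Best K by silhouette:', best_k)
```

A higher silhouette means points sit closer to their own cluster than to the nearest other cluster, so a peak in this score can back up the K suggested by the elbow.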



Kmeans = KMeans(n_clusters=5, init='k-means++',  max_iter=300, n_init=10,random_state=0)
labels = Kmeans.fit_predict(X1_Matrix)
centroids1 = Kmeans.cluster_centers_ 
# the centroid points in each cluster
# Visualizing the 5 clusters
plt.scatter(x=X1_Matrix[labels==0, 0], y=X1_Matrix[labels==0, 1], s=20, c='red', marker='o')
plt.scatter(x=X1_Matrix[labels==1, 0], y=X1_Matrix[labels==1, 1], s=20, c='blue', marker='^')
plt.scatter(x=X1_Matrix[labels==2, 0], y=X1_Matrix[labels==2, 1], s=20, c='grey', marker='s')
plt.scatter(x=X1_Matrix[labels==3, 0], y=X1_Matrix[labels==3, 1], s=20, c='orange', marker='p')
plt.scatter(x=X1_Matrix[labels==4, 0], y=X1_Matrix[labels==4, 1], s=20, c='green', marker='*')
#Visualizing every centroids in different cluster.
plt.scatter(x=centroids1[:,0], y=centroids1[:,1], s=300, alpha=0.8, marker='+', label='Centroids')
#Style Setting
plt.title("Cluster Of Customers", fontsize=20)
plt.xlabel("Age")
plt.ylabel("Spending Score (1-100)")
plt.legend(loc=0)

<matplotlib.legend.Legend at 0x228401f81c8>

[Figure: the 5 customer clusters by Age and Spending Score, with centroids]



pd.Series(labels).value_counts()

0    57
1    41
2    37
3    34
4    31
dtype: int64
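Cluster sizes alone do not say who is in each cluster. A simple follow-up is to attach the labels to the table and summarise each group. The sketch below does this on a tiny made-up frame; the `Cluster` column and the `demo` and `profile` names are hypothetical. With the real data, the equivalent would be to assign the `labels` array to a new column of `data` first.

```python
import pandas as pd

# Tiny made-up frame mimicking the dataset's columns, plus a hypothetical 'Cluster' column.
demo = pd.DataFrame({
    'Age': [22, 25, 60, 65, 40, 42],
    'Spending Score (1-100)': [85, 90, 20, 15, 50, 55],
    'Cluster': [0, 0, 1, 1, 2, 2],
})

# One row per cluster: how many customers, and their average age and spending score.
profile = demo.groupby('Cluster').agg(
    customers=('Age', 'size'),
    mean_age=('Age', 'mean'),
    mean_score=('Spending Score (1-100)', 'mean'),
)
print(profile)
```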



inertias_2 = []
for i in range(1,8):
    kmeans = KMeans(n_clusters=i, init='k-means++',  max_iter=300, n_init=10,random_state=1)
    kmeans.fit(X2_Matrix)
    inertia = kmeans.inertia_
    inertias_2.append(inertia)
    print('For n_cluster =', i, 'The inertia is:', inertia)

For n_cluster = 1 The inertia is: 269981.28
For n_cluster = 2 The inertia is: 181363.59595959596
For n_cluster = 3 The inertia is: 106348.37306211118
For n_cluster = 4 The inertia is: 73679.78903948834
For n_cluster = 5 The inertia is: 44448.45544793371
For n_cluster = 6 The inertia is: 37233.81451071001
For n_cluster = 7 The inertia is: 30227.606513152015



# Elbow plot for the Annual Income / Spending Score features
figure = plt.figure(1, figsize=(15,6), dpi=80)
plt.plot(np.arange(1,8), inertias_2, alpha=0.8, marker='o')
plt.xlabel("K")
plt.ylabel("Inertia ")
# The inertia drops much more slowly after K=5, so fit the final model with 5 clusters
Kmeans = KMeans(n_clusters=5, init='k-means++',  max_iter=300, n_init=10,random_state=1)
labels = Kmeans.fit_predict(X2_Matrix)
centroids2 = Kmeans.cluster_centers_

[Figure: inertia vs K for Annual Income and Spending Score]



# the centroid points in each cluster
# Visualizing the 5 clusters
# Both axes come from X2_Matrix (column 0: Annual Income, column 1: Spending Score)
plt.scatter(x=X2_Matrix[labels==0, 0], y=X2_Matrix[labels==0, 1], s=20, c='red', marker='o')
plt.scatter(x=X2_Matrix[labels==1, 0], y=X2_Matrix[labels==1, 1], s=20, c='blue', marker='^')
plt.scatter(x=X2_Matrix[labels==2, 0], y=X2_Matrix[labels==2, 1], s=20, c='grey', marker='s')
plt.scatter(x=X2_Matrix[labels==3, 0], y=X2_Matrix[labels==3, 1], s=20, c='orange', marker='p')
plt.scatter(x=X2_Matrix[labels==4, 0], y=X2_Matrix[labels==4, 1], s=20, c='green', marker='*')
#Visualizing every centroids in different cluster.
plt.scatter(x=centroids2[:,0], y=centroids2[:,1], s=300, alpha=0.8, marker='+', label='Centroids')
#Style Setting
plt.title("Cluster Of Customers", fontsize=20)
plt.xlabel("Annual Income (k$)")
plt.ylabel("Spending Score (1-100)")
plt.legend(loc=7)

<matplotlib.legend.Legend at 0x22840569d88>

[Figure: the 5 customer clusters by Annual Income and Spending Score, with centroids]

4. Summary

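The clustering results show:
By age and spending score, customers fall into 5 groups. One group of young customers has notably high spending scores and deserves particular attention.
By annual income and spending score, customers also fall into 5 groups: high income with low spending, high income with high spending, medium income with medium spending, low income with low spending, and low income with high spending. Each group can be targeted with its own marketing strategy.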
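The segment names above can be derived from the cluster centroids rather than read off the plot by eye. Here is a minimal sketch, assuming hand-picked cut-offs for "low" and "high". The function names, the cut-off values and the example centroids are all made up for illustration; they are not the fitted values from this analysis.

```python
def band(value, low_cut, high_cut):
    """Classify a value as 'low', 'medium' or 'high' relative to two cut-offs."""
    if value < low_cut:
        return 'low'
    if value > high_cut:
        return 'high'
    return 'medium'

def name_segments(centroids, income_cuts=(40, 70), score_cuts=(40, 60)):
    """Turn (annual income, spending score) centroids into readable segment names."""
    names = []
    for income, score in centroids:
        income_band = band(income, *income_cuts)
        score_band = band(score, *score_cuts)
        names.append(f'{income_band} income, {score_band} spending')
    return names

# Hypothetical centroids (annual income in k$, spending score), not the fitted ones.
example_centroids = [(26, 20), (26, 79), (55, 50), (87, 18), (86, 82)]
segment_names = name_segments(example_centroids)
for centroid, name in zip(example_centroids, segment_names):
    print(centroid, '->', name)
```

With the fitted model, the rows of `centroids2` could be passed in place of `example_centroids`. Since the cluster numbering is arbitrary, naming by centroid position keeps the names stable across runs.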
