Reading the data
First, import the pandas and numpy libraries and read the data.
import pandas as pd
import numpy as np
detail = pd.read_csv('detail.csv', index_col=0, encoding='gbk')  # the file contains Chinese text encoded as GBK
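The read_csv call above depends on detail.csv, which is not included here. The following self-contained sketch builds a tiny GBK-encoded CSV in memory to show what index_col=0 and encoding='gbk' do. The dishes_name column and all values are hypothetical; only detail_id, counts and amounts come from the text.

```python
import io

import pandas as pd

# Hypothetical CSV content: the first column is the index, and one column holds Chinese text
csv_text = "detail_id,dishes_name,counts,amounts\n2956,麻辣小龙虾,1,49\n2958,凉拌菠菜,1,48\n"
raw_bytes = csv_text.encode("gbk")  # encode as GBK, as the original file is assumed to be

# index_col=0 makes detail_id the row index; encoding='gbk' decodes the Chinese text correctly
detail = pd.read_csv(io.BytesIO(raw_bytes), index_col=0, encoding="gbk")
print(detail)
```

Reading the same bytes with the default UTF-8 encoding would fail with a decode error, which is why the encoding has to be stated explicitly.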
Custom min-max normalization function
def minmaxscale(data):
    data = (data - data.min()) / (data.max() - data.min())
    return data
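## Apply min-max normalization to the sales volume (counts) and price (amounts) of the dish order table
data1 = minmaxscale(detail['counts'])
data2 = minmaxscale(detail['amounts'])
data3 = pd.concat([data1, data2], axis=1)
print('Sales volume and price before min-max normalization:\n',
      detail[['counts', 'amounts']].head())
print('Sales volume and price after min-max normalization:\n', data3.head())
Output:
Sales volume and price before min-max normalization:
           counts  amounts
detail_id
2956            1       49
2958            1       48
2961            1       30
2966            1       25
2968            1       13
Sales volume and price after min-max normalization:
           counts   amounts
detail_id
2956          0.0  0.271186
2958          0.0  0.265537
2961          0.0  0.163842
2966          0.0  0.135593
2968          0.0  0.067797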
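The custom function is applied to one column at a time and the results are combined with pd.concat. Because data.min() and data.max() return per-column values when data is a DataFrame, the same formula also works on several columns at once. One caveat is that a constant column (max equal to min) gives 0/0 and therefore NaN. The sketch below shows both points on a small hypothetical DataFrame; minmaxscale_safe, which maps a constant column to 0 instead, is a hypothetical variant and not part of the original code.

```python
import pandas as pd

def minmaxscale(data):
    # Same formula as the text: (x - min) / (max - min), computed per column for a DataFrame
    return (data - data.min()) / (data.max() - data.min())

def minmaxscale_safe(data):
    # Hypothetical variant: a constant column (max == min) maps to 0 instead of NaN
    data_min = data.min()
    value_range = data.max() - data_min
    if isinstance(value_range, pd.Series):
        value_range = value_range.replace(0, 1)  # per-column guard for a DataFrame
    elif value_range == 0:
        value_range = 1  # scalar guard for a single Series
    return (data - data_min) / value_range

# Hypothetical data; 'flag' is a constant column
df = pd.DataFrame({"counts": [1, 1, 2, 3],
                   "amounts": [10, 20, 30, 50],
                   "flag": [5, 5, 5, 5]})

# Passing the whole DataFrame normalizes every column at once, no concat needed
print(minmaxscale(df[["counts", "amounts"]]))

print(minmaxscale(df["flag"]))  # 0 / 0 gives NaN in every row
scaled = minmaxscale_safe(df)
print(scaled)
```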
Min-max normalization can also be performed with the minmax_scale function in sklearn's preprocessing module:
from sklearn import preprocessing
preprocessing.minmax_scale(detail['amounts'])
Output:
array([0.27118644, 0.26553672, 0.16384181, ..., 0.21468927, 0.03389831,
0.14689266])
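preprocessing.minmax_scale also accepts a feature_range=(a, b) argument (default (0, 1)) that maps the data onto [a, b] instead of [0, 1]. The underlying formula is x' = a + (x - min)(b - a) / (max - min). The NumPy sketch below implements that formula directly rather than calling sklearn; the function name minmax_to_range and the sample values are hypothetical.

```python
import numpy as np

def minmax_to_range(x, a=0.0, b=1.0):
    """Map x linearly so that min(x) goes to a and max(x) goes to b."""
    x = np.asarray(x, dtype=float)
    x_min, x_max = x.min(), x.max()
    return a + (x - x_min) * (b - a) / (x_max - x_min)

amounts = np.array([10, 20, 30, 50])    # hypothetical prices
print(minmax_to_range(amounts))         # default target range [0, 1]
print(minmax_to_range(amounts, -1, 1))  # target range [-1, 1]
```

With a = 0 and b = 1 the formula reduces to the minmaxscale function defined earlier.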
Custom z-score (standard deviation) standardization function
def StandardScaler(data):
    data = (data - data.mean()) / data.std()
    return data
## Apply z-score standardization to the sales volume (counts) and price (amounts) of the dish order table
data4 = StandardScaler(detail['counts'])
data5 = StandardScaler(detail['amounts'])
data6 = pd.concat([data4, data5], axis=1)
print('Sales volume and price before z-score standardization:\n',
      detail[['counts', 'amounts']].head())
print('Sales volume and price after z-score standardization:\n', data6.head())
Output:
Sales volume and price before z-score standardization:
           counts  amounts
detail_id
2956            1       49
2958            1       48
2961            1       30
2966            1       25
2968            1       13
Sales volume and price after z-score standardization:
              counts    amounts
detail_id
2956       -0.177571   0.116671
2958       -0.177571   0.088751
2961       -0.177571  -0.413826
2966       -0.177571  -0.553431
2968       -0.177571  -0.888482
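A defining property of z-score standardization is that the transformed column has mean 0 and standard deviation 1. The sketch below checks this on hypothetical values. It repeats the formula of the text's StandardScaler under the hypothetical snake_case name standard_scale, so it is not confused with sklearn.preprocessing.StandardScaler, which is a class rather than a function.

```python
import pandas as pd

def standard_scale(data):
    # Same formula as the text: (x - mean) / std, with pandas' default ddof=1
    return (data - data.mean()) / data.std()

amounts = pd.Series([10, 20, 30, 40, 50], dtype=float)  # hypothetical prices
z = standard_scale(amounts)
print(z.round(4).tolist())
print(z.mean(), z.std())  # mean is 0 and std is 1, up to floating-point rounding
```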
Z-score standardization can also be performed with the scale function in sklearn's preprocessing module:
from sklearn import preprocessing
preprocessing.scale(detail['amounts'])
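Output:
array([ 0.11667727,  0.08875496, -0.41384669, ..., -0.16254587,
       -1.05605991, -0.49761363])
These values differ slightly from the custom function's output (0.11667727 here vs. 0.116671 for the first row above) because pandas' Series.std() computes the sample standard deviation (ddof=1) by default, whereas preprocessing.scale divides by the population standard deviation (ddof=0).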
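Choosing ddof=1 (pandas' default for Series.std()) or ddof=0 (NumPy's default for np.std, and the convention preprocessing.scale follows) only rescales every z-score by the constant factor sqrt(n / (n - 1)). The sketch below illustrates this on the five amounts values printed earlier. Because it uses only those five rows, its z-scores are not the same as the ones computed over the full table.

```python
import numpy as np
import pandas as pd

amounts = pd.Series([49, 48, 30, 25, 13], dtype=float)  # the five 'amounts' values printed above

z_sample = (amounts - amounts.mean()) / amounts.std()            # pandas default, ddof=1
z_population = (amounts - amounts.mean()) / amounts.std(ddof=0)  # population std, ddof=0

values = amounts.to_numpy()
z_numpy = (values - values.mean()) / values.std()  # NumPy's std defaults to ddof=0

print(pd.DataFrame({"ddof=1": z_sample, "ddof=0": z_population}))
```

For a large table the factor sqrt(n / (n - 1)) is very close to 1, which is why the two standardized outputs in the text differ only slightly.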