<section class="output_wrapper" ><h2 ><span >Reading the data</span></h2><p >First, import the pandas and numpy libraries, then read the data.</p><pre ><code class="python language-python hljs" ><span class="hljs-keyword" >import</span> pandas <span class="hljs-keyword" >as</span> pd<br /><span class="hljs-keyword" >import</span> numpy <span class="hljs-keyword" >as</span> np<br />detail = pd.read_csv(<span class="hljs-string" >'detail.csv'</span>, index_col=<span class="hljs-number" >0</span>, encoding=<span class="hljs-string" >'gbk'</span>)  <span class="hljs-comment" ># GBK encoding, because the file contains Chinese text</span><br /></code></pre>
<h2 ><span >Custom min-max normalization function</span></h2><p >Min-max normalization maps each value x to (x - min) / (max - min), where min and max are taken over the column, so the results lie in [0, 1].</p><pre ><code class="hljs lua" >def minmaxscale(data):<br />    # Min-max normalization: rescale the values linearly onto [0, 1]<br />    data = (data - data.<span class="hljs-built_in" >min</span>()) / (data.<span class="hljs-built_in" >max</span>() - data.<span class="hljs-built_in" >min</span>())<br />    <span class="hljs-keyword" >return</span> data<br /><br /># Min-max normalize the sales volume (counts) and selling price (amounts) of the dish order table<br />data1 = minmaxscale(detail[<span class="hljs-string" >'counts'</span>])<br />data2 = minmaxscale(detail[<span class="hljs-string" >'amounts'</span>])<br />data3 = pd.<span class="hljs-built_in" >concat</span>([data1, data2], axis=<span class="hljs-number" >1</span>)<br /><span class="hljs-built_in" >print</span>(<span class="hljs-string" >'Sales volume and price before min-max normalization:\n'</span>,<br />      detail[[<span class="hljs-string" >'counts'</span>, <span class="hljs-string" >'amounts'</span>]].head())<br /><span class="hljs-built_in" >print</span>(<span class="hljs-string" >'Sales volume and price after min-max normalization:\n'</span>, data3.head())<br /></code></pre>
<p >Output:</p><pre ><code class="hljs css" >Sales volume and price before min-max normalization:<br />            <span class="hljs-selector-tag" >counts</span>  <span class="hljs-selector-tag" >amounts</span><br /><span class="hljs-selector-tag" >detail_id</span><br />2956            1       49<br />2958            1       48<br />2961            1       30<br />2966            1       25<br />2968            1       13<br />Sales volume and price after min-max normalization:<br />            <span class="hljs-selector-tag" >counts</span>   <span class="hljs-selector-tag" >amounts</span><br /><span class="hljs-selector-tag" >detail_id</span><br />2956          0<span class="hljs-selector-class" >.0</span>  0<span class="hljs-selector-class" >.271186</span><br />2958          0<span class="hljs-selector-class" >.0</span>  0<span class="hljs-selector-class" >.265537</span><br />2961          0<span class="hljs-selector-class" >.0</span>  0<span class="hljs-selector-class" >.163842</span><br />2966          0<span class="hljs-selector-class" >.0</span>  0<span class="hljs-selector-class" >.135593</span><br />2968          0<span class="hljs-selector-class" >.0</span>  0<span class="hljs-selector-class" >.067797</span><br /></code></pre>
<h2 ><span >Alternatively, use the minmax_scale function in the sklearn library</span></h2><pre ><code class="hljs coffeescript" ><span class="hljs-keyword" >from</span> sklearn <span class="hljs-keyword" >import</span> preprocessing<br />preprocessing.minmax_scale(detail[<span class="hljs-string" >'amounts'</span>])<br /></code></pre>
<p >Output:</p><pre ><code class="hljs delphi" ><span class="hljs-keyword" >Out</span>[<span class="hljs-number" >141</span>]: <br /><span class="hljs-keyword" >array</span>([<span class="hljs-number" >0.27118644</span>, <span class="hljs-number" >0.26553672</span>, <span class="hljs-number" >0.16384181</span>, ..., <span class="hljs-number" >0.21468927</span>, <span class="hljs-number" >0.03389831</span>,<br />       <span class="hljs-number" >0.14689266</span>])<br /></code></pre>
<p >These values agree with the amounts column produced by the custom function (0.271186, 0.265537, 0.163842, ...).</p>
<h2 ><span >Custom standard-deviation (z-score) standardization function</span></h2><p >Standard-deviation standardization maps each value x to (x - mean) / std, so the transformed column has mean 0 and standard deviation 1.</p><pre ><code class="hljs kotlin" >def StandardScaler(data):<br />    # Z-score standardization: subtract the mean, divide by the standard deviation<br />    data = (data - data.mean()) / data.std()<br />    <span class="hljs-keyword" >return</span> data<br /><br /># Standardize the sales volume (counts) and selling price (amounts) of the dish order table<br />data4 = StandardScaler(detail[<span class="hljs-string" >'counts'</span>])<br />data5 = StandardScaler(detail[<span class="hljs-string" >'amounts'</span>])<br />data6 = pd.concat([data4, data5], axis=<span class="hljs-number" >1</span>)<br />print(<span class="hljs-string" >'Sales volume and price before z-score standardization:\n'</span>,<br />      detail[[<span class="hljs-string" >'counts'</span>, <span class="hljs-string" >'amounts'</span>]].head())<br />print(<span class="hljs-string" >'Sales volume and price after z-score standardization:\n'</span>, data6.head())<br /></code></pre>
<p >Output:</p><pre ><code class="hljs css" >Sales volume and price before z-score standardization:<br />            <span class="hljs-selector-tag" >counts</span>  <span class="hljs-selector-tag" >amounts</span><br /><span class="hljs-selector-tag" >detail_id</span><br />2956            1       49<br />2958            1       48<br />2961            1       30<br />2966            1       25<br />2968            1       13<br />Sales volume and price after z-score standardization:<br />               <span class="hljs-selector-tag" >counts</span>    <span class="hljs-selector-tag" >amounts</span><br /><span class="hljs-selector-tag" >detail_id</span><br />2956       <span class="hljs-selector-tag" >-0</span><span class="hljs-selector-class" >.177571</span>   0<span class="hljs-selector-class" >.116671</span><br />2958       <span class="hljs-selector-tag" >-0</span><span class="hljs-selector-class" >.177571</span>   0<span class="hljs-selector-class" >.088751</span><br />2961       <span class="hljs-selector-tag" >-0</span><span class="hljs-selector-class" >.177571</span>  <span class="hljs-selector-tag" >-0</span><span class="hljs-selector-class" >.413826</span><br />2966       <span class="hljs-selector-tag" >-0</span><span class="hljs-selector-class" >.177571</span>  <span class="hljs-selector-tag" >-0</span><span class="hljs-selector-class" >.553431</span><br />2968       <span class="hljs-selector-tag" >-0</span><span class="hljs-selector-class" >.177571</span>  <span class="hljs-selector-tag" >-0</span><span class="hljs-selector-class" >.888482</span><br /></code></pre>
<h2 ><span >Alternatively, use the scale function in the sklearn library</span></h2><pre ><code class="hljs coffeescript" ><span class="hljs-keyword" >from</span> sklearn <span class="hljs-keyword" >import</span> preprocessing<br />preprocessing.scale(detail[<span class="hljs-string" >'amounts'</span>])<br /></code></pre>
<p >Output:</p><pre ><code class="hljs delphi" ><span class="hljs-keyword" >Out</span>[<span class="hljs-number" >143</span>]: <br /><span class="hljs-keyword" >array</span>([ <span class="hljs-number" >0.11667727</span>,  <span class="hljs-number" >0.08875496</span>, -<span class="hljs-number" >0.41384669</span>, ..., -<span class="hljs-number" >0.16254587</span>,<br />       -<span class="hljs-number" >1.05605991</span>, -<span class="hljs-number" >0.49761363</span>])<br /></code></pre>
<p >These values differ very slightly from those of the custom function (for example 0.11667727 versus 0.116671). The reason is that pandas' Series.std() computes the sample standard deviation (ddof=1) by default, whereas scale divides by the population standard deviation (ddof=0); replacing data.std() with data.std(ddof=0) in the custom function makes the two results match.</p></section><p><br /></p>
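<p >The custom minmaxscale function has to be called once per column, and for a constant column (max equal to min) it computes 0/0 and returns NaN. A minimal sketch that normalizes every column of a DataFrame in one call and maps constant columns to 0.0 instead follows; the name minmax_frame and the toy data are hypothetical, and the constant-column rule is an assumption modeled on how sklearn's MinMaxScaler treats zero-range features, not part of the original code.</p>

```python
import pandas as pd


def minmax_frame(df: pd.DataFrame) -> pd.DataFrame:
    """Min-max normalize every column of ``df`` onto [0, 1].

    A constant column (max == min) would make the plain formula divide 0 by 0
    and return NaN; here its range is replaced by 1 so the column maps to 0.0.
    """
    col_min = df.min()                    # per-column minimum (a Series)
    col_range = df.max() - col_min        # per-column max - min
    safe_range = col_range.replace(0, 1)  # avoid 0/0 for constant columns
    # DataFrame - Series aligns on column labels, so each column uses its own min/range
    return (df - col_min) / safe_range


# Hypothetical toy data standing in for detail[['counts', 'amounts']]
toy = pd.DataFrame({"counts": [1, 1, 1, 1], "amounts": [49, 48, 30, 13]})
scaled = minmax_frame(toy)
print(scaled)
```

<p >Since neither counts nor amounts is constant in the real table, minmax_frame(detail[['counts', 'amounts']]) should reproduce the values in data3.</p>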
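<p >The small mismatch between the custom StandardScaler function and preprocessing.scale comes from the ddof (delta degrees of freedom) used for the standard deviation. The sketch below makes that choice an explicit parameter; the name zscore is hypothetical, and it uses only the first five amounts values from the output above, so its z-scores differ from the full-column numbers. It checks that ddof=0 matches a direct NumPy computation and that the two conventions differ by the constant factor sqrt(n / (n - 1)).</p>

```python
import numpy as np
import pandas as pd


def zscore(s: pd.Series, ddof: int = 1) -> pd.Series:
    """Z-score standardize a Series: subtract the mean, divide by the std.

    ddof=1 (the pandas default) uses the sample standard deviation;
    ddof=0 uses the population standard deviation.
    """
    return (s - s.mean()) / s.std(ddof=ddof)


# Toy data: the first five 'amounts' values shown in the output above
prices = pd.Series([49.0, 48.0, 30.0, 25.0, 13.0])

z_sample = zscore(prices, ddof=1)      # same convention as the custom StandardScaler
z_population = zscore(prices, ddof=0)  # same convention as preprocessing.scale

# Cross-check the population version against a plain NumPy computation
values = prices.to_numpy()
manual = (values - values.mean()) / values.std()  # np.std defaults to ddof=0
print(np.allclose(z_population.to_numpy(), manual))  # True

# The two conventions differ only by the constant factor sqrt(n / (n - 1))
n = len(prices)
ratio = z_population / z_sample
print(np.allclose(ratio, np.sqrt(n / (n - 1))))  # True
```

<p >On the full detail table n is large, so sqrt(n / (n - 1)) is very close to 1, which is why the custom and sklearn results agree to about four decimal places.</p>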
Data Standardization with Python
© Copyright belongs to the author. For reprints or content collaboration, please contact the author.