Preface
- I've recently been taking an NLP course. The code below comes mostly from my NLP course assignments (largely modeled on my instructor's examples) and is written in Python. If you're interested, you can find it on my GitHub: https://github.com/LiuPineapple/Learning-NLP/tree/master/Assignments/lesson-02
- The author's knowledge is limited; if you find any mistakes in this article, corrections are welcome! If anything here infringes your rights, please contact the author to have it removed.
Machine Learning -- Gradient Descent
What is machine learning? Different people may give different definitions. My understanding is: use algorithms to let a machine learn from data, and thereby obtain a model that performs better than one designed by hand at tasks such as classification and prediction.
Here we take the Boston house-price prediction problem as a simple hands-on exercise in machine learning.
from sklearn.datasets import load_boston  # note: load_boston was deprecated in scikit-learn 1.0 and removed in 1.2
data = load_boston()
X, y = data['data'], data['target']
X[1]
array([2.7310e-02, 0.0000e+00, 7.0700e+00, 0.0000e+00, 4.6900e-01,
       6.4210e+00, 7.8900e+01, 4.9671e+00, 2.0000e+00, 2.4200e+02,
       1.7800e+01, 3.9690e+02, 9.1400e+00])
len(y)
506
len(X[:, 0])
506
X_rm = X[:, 5]
Things worth noting in the code above:
- y holds the prices of the different houses, and X holds each house's feature variables, such as size, crime rate, and so on. As you can see, we use data from 506 houses in total.
- For simplicity, we study only the relationship between the 6th feature of X (index 5, the average number of rooms per dwelling, RM) and the house price, so we pull that column's values across all houses out into X_rm.
We assume a linear relationship between the independent variable and the dependent variable, i.e. $y = kx + b$, where $k$ and $b$ are unknown parameters, and define a price() function to compute $y$ from a given input and parameter values. Our task is to find parameter values such that, for a given $x$, the prediction $\hat{y}$ produced by the formula above is as close as possible to the true value $y$. If we can find reasonably good parameter values, we have a chance of getting fairly accurate predictions.
How, then, do we measure the gap between the predictions and the true values? We use the mean squared error:
$$loss = \frac{1}{n}\sum_{i=1}^{n}(y_i - \hat{y}_i)^2$$
def price(rm, k, b):
    """f(x) = k * x + b"""
    return k * rm + b

def loss(y, y_hat):  # mean squared error, to evaluate the performance
    return sum((y_i - y_hat_i)**2 for y_i, y_hat_i in zip(list(y), list(y_hat))) / len(list(y))

# The loss function can also be defined more simply with numpy
import numpy as np

def loss(y, y_hat):
    e = np.array(y) - np.array(y_hat)  # vector of errors; for a 1-D array, e.T is just e
    return (e @ e.T) / len(y)
Things worth noting in the code above:
- The Python 3 zip() function pairs up the elements of several iterables: https://www.runoob.com/python3/python3-func-zip.html (see the short example after this list).
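To make the pairing concrete, here is a tiny sanity check with made-up numbers (my own example, not from the Boston data):

y_true = [3.0, 2.0, 4.0]
y_pred = [2.5, 2.0, 5.0]
print(list(zip(y_true, y_pred)))  # [(3.0, 2.5), (2.0, 2.0), (4.0, 5.0)]
print(loss(y_true, y_pred))       # (0.5**2 + 0.0**2 + 1.0**2) / 3 ≈ 0.4167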
Our task is therefore to find parameter values that make the loss as small as possible. Following the machine-learning approach, we first generate random values for $k$ and $b$, and then let the program adjust them automatically, driven by the data, until a fixed number of iterations is reached or the loss drops below some threshold.
Gradient Descent
Note that $x$ and $y$ are fixed values, so the loss is really a function of the variables $k$ and $b$. Its partial derivatives with respect to $k$ and $b$ are
$$\frac{\partial loss}{\partial k} = -\frac{2}{n}\sum_{i=1}^{n}(y_i - \hat{y}_i)x_i, \qquad \frac{\partial loss}{\partial b} = -\frac{2}{n}\sum_{i=1}^{n}(y_i - \hat{y}_i)$$
and the corresponding code is shown below:
def partial_k(x, y, y_hat):  # partial derivative of the loss with respect to k
    n = len(y)
    gradient = 0
    for x_i, y_i, y_hat_i in zip(list(x), list(y), list(y_hat)):
        gradient += (y_i - y_hat_i) * x_i
    return -2 / n * gradient

def partial_b(x, y, y_hat):  # partial derivative of the loss with respect to b
    n = len(y)
    gradient = 0
    for y_i, y_hat_i in zip(list(y), list(y_hat)):
        gradient += (y_i - y_hat_i)
    return -2 / n * gradient
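For reference, the same two gradients can also be written in vectorized form with numpy. This is my own sketch, not part of the course code, and it assumes numpy has been imported as np:

def partial_k_vec(x, y, y_hat):
    # -2/n * sum((y_i - y_hat_i) * x_i), written as a dot product
    return -2 / len(y) * np.dot(np.array(y) - np.array(y_hat), np.array(x))

def partial_b_vec(y, y_hat):
    # -2/n * sum(y_i - y_hat_i)
    return -2 / len(y) * np.sum(np.array(y) - np.array(y_hat))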
After randomly initializing $k$ and $b$, we compute the loss and its partial derivatives with respect to $k$ and $b$. In general, randomly chosen values give a fairly large loss, so how should $k$ and $b$ change to keep making the loss smaller? The partial derivatives give us the direction of change: we define a positive learning rate $\alpha$, and after computing the partial derivatives we update $k$ and $b$ as follows:
$$k \leftarrow k - \alpha\frac{\partial loss}{\partial k}, \qquad b \leftarrow b - \alpha\frac{\partial loss}{\partial b}$$
With the new $k$ and $b$ we compute the loss again. If the new loss is smaller than the previous one, it becomes the smallest loss seen so far, and the new $k$ and $b$ are better parameter values than the old ones. We then repeat the process until a fixed number of iterations is reached or the loss drops below some threshold. Note that $k$ and $b$ must be updated simultaneously: do not update $k$ first and then use the updated $k$ to compute the partial derivative with respect to $b$ (see the small illustration below). The code follows:
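As a minimal illustration of the simultaneous update (my own sketch, reusing the names defined in the training loop below), both gradients are computed from the same old $k$ and $b$ before either parameter is touched:

# Correct: compute both gradients from the current (old) k and b first
y_hat = [price(r, current_k, current_b) for r in X_rm]
k_grad = partial_k(X_rm, y, y_hat)
b_grad = partial_b(X_rm, y, y_hat)
# ...then update both parameters in one simultaneous step
current_k, current_b = current_k - learning_rate * k_grad, current_b - learning_rate * b_grad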
import random

trying_times = 2000
min_loss = float('inf')
current_k = random.random() * 200 - 100  # random initial values in [-100, 100)
current_b = random.random() * 200 - 100
learning_rate = 1e-04

for i in range(trying_times):
    price_by_k_and_b = [price(r, current_k, current_b) for r in X_rm]
    current_loss = loss(y, price_by_k_and_b)
    if current_loss < min_loss:  # performance became better
        min_loss = current_loss
        best_k, best_b = current_k, current_b  # remember the best parameters seen so far
        if i % 50 == 0:
            print('When time is : {}, get best_k: {} best_b: {}, and the loss is: {}'.format(i, best_k, best_b, min_loss))
    k_gradient = partial_k(X_rm, y, price_by_k_and_b)
    b_gradient = partial_b(X_rm, y, price_by_k_and_b)
    current_k = current_k + (-1 * k_gradient) * learning_rate
    current_b = current_b + (-1 * b_gradient) * learning_rate
Things worth noting in the code above:
- Python represents positive and negative infinity as float("inf") and float("-inf"). Adding to or multiplying inf still yields inf, and dividing any number other than inf by inf yields 0 (see the snippet after this list).
- The Python random() function: https://www.runoob.com/python/func-number-random.html. Be careful to distinguish random in the random module from random in the numpy module.
- The Python format() string method: https://www.runoob.com/python/att-string-format.html
- 1e-04 is scientific notation for $1 \times 10^{-4}$, i.e. 0.0001.
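A quick check of these rules about inf (a snippet of my own, not from the assignment):

print(float('inf') + 1)              # inf
print(float('inf') * 2)              # inf
print(42 / float('inf'))             # 0.0
print(float('inf') / float('inf'))   # nan: inf divided by inf is undefined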
The final output is shown below:
When time is : 0, get best_k: 11.431551629413757 best_b: -49.52403584539048, and the loss is: 575.5349822522099
When time is : 50, get best_k: 11.431551629413757 best_b: -49.52403584539048, and the loss is: 277.9378161169662
When time is : 100, get best_k: 11.431551629413757 best_b: -49.52403584539048, and the loss is: 147.24895628021088
When time is : 150, get best_k: 11.431551629413757 best_b: -49.52403584539048, and the loss is: 89.8572545975801
When time is : 200, get best_k: 11.431551629413757 best_b: -49.52403584539048, and the loss is: 64.65372567052019
When time is : 250, get best_k: 11.431551629413757 best_b: -49.52403584539048, and the loss is: 53.58551239815359
When time is : 300, get best_k: 11.431551629413757 best_b: -49.52403584539048, and the loss is: 48.72477014152337
When time is : 350, get best_k: 11.431551629413757 best_b: -49.52403584539048, and the loss is: 46.59001559478237
When time is : 400, get best_k: 11.431551629413757 best_b: -49.52403584539048, and the loss is: 45.65236839246802
When time is : 450, get best_k: 11.431551629413757 best_b: -49.52403584539048, and the loss is: 45.24042644341104
When time is : 500, get best_k: 11.431551629413757 best_b: -49.52403584539048, and the loss is: 45.059346031766644
When time is : 550, get best_k: 11.431551629413757 best_b: -49.52403584539048, and the loss is: 44.97964764306714
When time is : 600, get best_k: 11.431551629413757 best_b: -49.52403584539048, and the loss is: 44.94447083305862
When time is : 650, get best_k: 11.431551629413757 best_b: -49.52403584539048, and the loss is: 44.928845550418174
When time is : 700, get best_k: 11.431551629413757 best_b: -49.52403584539048, and the loss is: 44.921806290539294
When time is : 750, get best_k: 11.431551629413757 best_b: -49.52403584539048, and the loss is: 44.918537593098634
When time is : 800, get best_k: 11.431551629413757 best_b: -49.52403584539048, and the loss is: 44.91692476670531
When time is : 850, get best_k: 11.431551629413757 best_b: -49.52403584539048, and the loss is: 44.91603915253814
When time is : 900, get best_k: 11.431551629413757 best_b: -49.52403584539048, and the loss is: 44.91547293354079
When time is : 950, get best_k: 11.431551629413757 best_b: -49.52403584539048, and the loss is: 44.91504701836891
When time is : 1000, get best_k: 11.431551629413757 best_b: -49.52403584539048, and the loss is: 44.914682759718445
When time is : 1050, get best_k: 11.431551629413757 best_b: -49.52403584539048, and the loss is: 44.91434561990997
When time is : 1100, get best_k: 11.431551629413757 best_b: -49.52403584539048, and the loss is: 44.9140204318406
When time is : 1150, get best_k: 11.431551629413757 best_b: -49.52403584539048, and the loss is: 44.91370053492356
When time is : 1200, get best_k: 11.431551629413757 best_b: -49.52403584539048, and the loss is: 44.91338300417686
When time is : 1250, get best_k: 11.431551629413757 best_b: -49.52403584539048, and the loss is: 44.91306655509527
When time is : 1300, get best_k: 11.431551629413757 best_b: -49.52403584539048, and the loss is: 44.912750623583214
When time is : 1350, get best_k: 11.431551629413757 best_b: -49.52403584539048, and the loss is: 44.912434961909526
When time is : 1400, get best_k: 11.431551629413757 best_b: -49.52403584539048, and the loss is: 44.91211946127419
When time is : 1450, get best_k: 11.431551629413757 best_b: -49.52403584539048, and the loss is: 44.91180407388745
When time is : 1500, get best_k: 11.431551629413757 best_b: -49.52403584539048, and the loss is: 44.9114887787528
When time is : 1550, get best_k: 11.431551629413757 best_b: -49.52403584539048, and the loss is: 44.9111735666393
When time is : 1600, get best_k: 11.431551629413757 best_b: -49.52403584539048, and the loss is: 44.91085843348287
When time is : 1650, get best_k: 11.431551629413757 best_b: -49.52403584539048, and the loss is: 44.91054337748873
When time is : 1700, get best_k: 11.431551629413757 best_b: -49.52403584539048, and the loss is: 44.910228397858496
When time is : 1750, get best_k: 11.431551629413757 best_b: -49.52403584539048, and the loss is: 44.9099134942312
When time is : 1800, get best_k: 11.431551629413757 best_b: -49.52403584539048, and the loss is: 44.909598666438264
When time is : 1850, get best_k: 11.431551629413757 best_b: -49.52403584539048, and the loss is: 44.90928391439542
When time is : 1900, get best_k: 11.431551629413757 best_b: -49.52403584539048, and the loss is: 44.90896923805536
When time is : 1950, get best_k: 11.431551629413757 best_b: -49.52403584539048, and the loss is: 44.908654637387244
And with that, a simple machine-learning model trained by gradient descent is complete! Of course, many questions remain, such as how to choose the initial values and the learning rate; those are topics we will explore later.
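As a closing sketch (my own addition, assuming the training loop above has finished so that best_k and best_b hold the fitted parameters), the model can now be used to make a prediction:

# Predict the price of a hypothetical house whose average number of rooms is 6.5
rm = 6.5
print('Predicted price for rm = {}: {:.2f}'.format(rm, price(rm, best_k, best_b)))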
Finally, you are welcome to visit my GitHub for more code: https://github.com/LiuPineapple
You are also welcome to visit my Jianshu homepage for more articles: https://www.jianshu.com/u/31e8349bd083