Preface
- I've recently been taking an NLP course. The code below comes mostly from my NLP course assignments (largely modeled on my instructor's examples) and is written in Python. If you're interested, you can find it on my GitHub: https://github.com/LiuPineapple/Learning-NLP/tree/master/Assignments/lesson-02
- The author's knowledge is limited; if you find any mistakes in this article, corrections are welcome! If anything here infringes your rights, please contact the author to have it removed.
Machine Learning -- Gradient Descent
What is machine learning? Different people may give different definitions. My understanding is: use algorithms to let a machine learn from data, and thereby obtain a model that performs better than one designed by hand at tasks such as classification and prediction.
Here we take the Boston house-price prediction problem as a simple hands-on exercise in machine learning.
from sklearn.datasets import load_boston  # note: load_boston was deprecated in scikit-learn 1.0 and removed in 1.2
data = load_boston()
X, y = data['data'], data['target']
X[1]
array([2.7310e-02, 0.0000e+00, 7.0700e+00, 0.0000e+00, 4.6900e-01,
       6.4210e+00, 7.8900e+01, 4.9671e+00, 2.0000e+00, 2.4200e+02,
       1.7800e+01, 3.9690e+02, 9.1400e+00])
len(y)
506
len(X[:, 0])
506
X_rm = X[:, 5]
Things worth noting in the code above:
- y holds the prices of the different houses, and X holds each house's feature variables, such as size, crime rate, and so on. As you can see, we use data from 506 houses in total.
- For simplicity, we study only the relationship between the 6th feature of X (index 5, the average number of rooms per dwelling, RM) and the house price, so we pull that column's values across all houses out into X_rm.
We assume a linear relationship between the independent variable and the dependent variable, i.e. $y = kx + b$, where $k$ and $b$ are unknown parameters, and define a price() function to compute $y$ from a given input and parameter values. Our task is to find parameter values such that, for a given $x$, the prediction $\hat{y}$ produced by the formula above is as close as possible to the true value $y$. If we can find reasonably good parameter values, we have a chance of getting fairly accurate predictions.
How, then, do we measure the gap between the predictions and the true values? We use the mean squared error:
$$loss = \frac{1}{n}\sum_{i=1}^{n}(y_i - \hat{y}_i)^2$$
def price(rm, k, b):
    """f(x) = k * x + b"""
    return k * rm + b

def loss(y, y_hat):  # mean squared error, to evaluate the performance
    return sum((y_i - y_hat_i)**2 for y_i, y_hat_i in zip(list(y), list(y_hat))) / len(list(y))

# The loss function can also be defined more simply with numpy
import numpy as np

def loss(y, y_hat):
    e = np.array(y) - np.array(y_hat)  # vector of errors; for a 1-D array, e.T is just e
    return (e @ e.T) / len(y)
Things worth noting in the code above:
- The Python 3 zip() function pairs up the elements of several iterables: https://www.runoob.com/python3/python3-func-zip.html (see the short example after this list).
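To make the pairing concrete, here is a tiny sanity check with made-up numbers (my own example, not from the Boston data):

y_true = [3.0, 2.0, 4.0]
y_pred = [2.5, 2.0, 5.0]
print(list(zip(y_true, y_pred)))  # [(3.0, 2.5), (2.0, 2.0), (4.0, 5.0)]
print(loss(y_true, y_pred))       # (0.5**2 + 0.0**2 + 1.0**2) / 3 ≈ 0.4167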
Our task is therefore to find parameter values that make the loss as small as possible. Following the machine-learning approach, we first generate random values for $k$ and $b$, and then let the program adjust them automatically, driven by the data, until a fixed number of iterations is reached or the loss drops below some threshold.
Gradient Descent
Note that $x$ and $y$ are fixed values, so the loss is really a function of the variables $k$ and $b$. Its partial derivatives with respect to $k$ and $b$ are
$$\frac{\partial loss}{\partial k} = -\frac{2}{n}\sum_{i=1}^{n}(y_i - \hat{y}_i)x_i, \qquad \frac{\partial loss}{\partial b} = -\frac{2}{n}\sum_{i=1}^{n}(y_i - \hat{y}_i)$$
and the corresponding code is shown below:
def partial_k(x, y, y_hat):  # partial derivative of the loss with respect to k
    n = len(y)
    gradient = 0
    for x_i, y_i, y_hat_i in zip(list(x), list(y), list(y_hat)):
        gradient += (y_i - y_hat_i) * x_i
    return -2 / n * gradient

def partial_b(x, y, y_hat):  # partial derivative of the loss with respect to b
    n = len(y)
    gradient = 0
    for y_i, y_hat_i in zip(list(y), list(y_hat)):
        gradient += (y_i - y_hat_i)
    return -2 / n * gradient
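For reference, the same two gradients can also be written in vectorized form with numpy. This is my own sketch, not part of the course code, and it assumes numpy has been imported as np:

def partial_k_vec(x, y, y_hat):
    # -2/n * sum((y_i - y_hat_i) * x_i), written as a dot product
    return -2 / len(y) * np.dot(np.array(y) - np.array(y_hat), np.array(x))

def partial_b_vec(y, y_hat):
    # -2/n * sum(y_i - y_hat_i)
    return -2 / len(y) * np.sum(np.array(y) - np.array(y_hat))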
After randomly initializing $k$ and $b$, we compute the loss and its partial derivatives with respect to $k$ and $b$. In general, randomly chosen values give a fairly large loss, so how should $k$ and $b$ change to keep making the loss smaller? The partial derivatives give us the direction of change: we define a positive learning rate $\alpha$, and after computing the partial derivatives we update $k$ and $b$ as follows:
$$k \leftarrow k - \alpha\frac{\partial loss}{\partial k}, \qquad b \leftarrow b - \alpha\frac{\partial loss}{\partial b}$$
With the new $k$ and $b$ we compute the loss again. If the new loss is smaller than the previous one, it becomes the smallest loss seen so far, and the new $k$ and $b$ are better parameter values than the old ones. We then repeat the process until a fixed number of iterations is reached or the loss drops below some threshold. Note that $k$ and $b$ must be updated simultaneously: do not update $k$ first and then use the updated $k$ to compute the partial derivative with respect to $b$ (see the small illustration below). The code follows:
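As a minimal illustration of the simultaneous update (my own sketch, reusing the names defined in the training loop below), both gradients are computed from the same old $k$ and $b$ before either parameter is touched:

# Correct: compute both gradients from the current (old) k and b first
y_hat = [price(r, current_k, current_b) for r in X_rm]
k_grad = partial_k(X_rm, y, y_hat)
b_grad = partial_b(X_rm, y, y_hat)
# ...then update both parameters in one simultaneous step
current_k, current_b = current_k - learning_rate * k_grad, current_b - learning_rate * b_grad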
import random

trying_times = 2000
min_loss = float('inf')
current_k = random.random() * 200 - 100  # random initial values in [-100, 100)
current_b = random.random() * 200 - 100
learning_rate = 1e-04

for i in range(trying_times):
    price_by_k_and_b = [price(r, current_k, current_b) for r in X_rm]
    current_loss = loss(y, price_by_k_and_b)
    if current_loss < min_loss:  # performance became better
        min_loss = current_loss
        best_k, best_b = current_k, current_b  # remember the best parameters seen so far
        if i % 50 == 0:
            print('When time is : {}, get best_k: {} best_b: {}, and the loss is: {}'.format(i, best_k, best_b, min_loss))
    k_gradient = partial_k(X_rm, y, price_by_k_and_b)
    b_gradient = partial_b(X_rm, y, price_by_k_and_b)
    current_k = current_k + (-1 * k_gradient) * learning_rate
    current_b = current_b + (-1 * b_gradient) * learning_rate
Things worth noting in the code above:
- Python represents positive and negative infinity as float("inf") and float("-inf"). Adding to or multiplying inf still yields inf, and dividing any number other than inf by inf yields 0 (see the snippet after this list).
- The Python random() function: https://www.runoob.com/python/func-number-random.html. Be careful to distinguish random in the random module from random in the numpy module.
- The Python format() string method: https://www.runoob.com/python/att-string-format.html
- 1e-04 is scientific notation for $1 \times 10^{-4}$, i.e. 0.0001.
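A quick check of these rules about inf (a snippet of my own, not from the assignment):

print(float('inf') + 1)              # inf
print(float('inf') * 2)              # inf
print(42 / float('inf'))             # 0.0
print(float('inf') / float('inf'))   # nan: inf divided by inf is undefined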
The final output is shown below:
When time is : 0, get best_k: 11.431551629413757 best_b: -49.52403584539048, and the loss is: 575.5349822522099
When time is : 50, get best_k: 11.431551629413757 best_b: -49.52403584539048, and the loss is: 277.9378161169662
When time is : 100, get best_k: 11.431551629413757 best_b: -49.52403584539048, and the loss is: 147.24895628021088
When time is : 150, get best_k: 11.431551629413757 best_b: -49.52403584539048, and the loss is: 89.8572545975801
When time is : 200, get best_k: 11.431551629413757 best_b: -49.52403584539048, and the loss is: 64.65372567052019
When time is : 250, get best_k: 11.431551629413757 best_b: -49.52403584539048, and the loss is: 53.58551239815359
When time is : 300, get best_k: 11.431551629413757 best_b: -49.52403584539048, and the loss is: 48.72477014152337
When time is : 350, get best_k: 11.431551629413757 best_b: -49.52403584539048, and the loss is: 46.59001559478237
When time is : 400, get best_k: 11.431551629413757 best_b: -49.52403584539048, and the loss is: 45.65236839246802
When time is : 450, get best_k: 11.431551629413757 best_b: -49.52403584539048, and the loss is: 45.24042644341104
When time is : 500, get best_k: 11.431551629413757 best_b: -49.52403584539048, and the loss is: 45.059346031766644
When time is : 550, get best_k: 11.431551629413757 best_b: -49.52403584539048, and the loss is: 44.97964764306714
When time is : 600, get best_k: 11.431551629413757 best_b: -49.52403584539048, and the loss is: 44.94447083305862
When time is : 650, get best_k: 11.431551629413757 best_b: -49.52403584539048, and the loss is: 44.928845550418174
When time is : 700, get best_k: 11.431551629413757 best_b: -49.52403584539048, and the loss is: 44.921806290539294
When time is : 750, get best_k: 11.431551629413757 best_b: -49.52403584539048, and the loss is: 44.918537593098634
When time is : 800, get best_k: 11.431551629413757 best_b: -49.52403584539048, and the loss is: 44.91692476670531
When time is : 850, get best_k: 11.431551629413757 best_b: -49.52403584539048, and the loss is: 44.91603915253814
When time is : 900, get best_k: 11.431551629413757 best_b: -49.52403584539048, and the loss is: 44.91547293354079
When time is : 950, get best_k: 11.431551629413757 best_b: -49.52403584539048, and the loss is: 44.91504701836891
When time is : 1000, get best_k: 11.431551629413757 best_b: -49.52403584539048, and the loss is: 44.914682759718445
When time is : 1050, get best_k: 11.431551629413757 best_b: -49.52403584539048, and the loss is: 44.91434561990997
When time is : 1100, get best_k: 11.431551629413757 best_b: -49.52403584539048, and the loss is: 44.9140204318406
When time is : 1150, get best_k: 11.431551629413757 best_b: -49.52403584539048, and the loss is: 44.91370053492356
When time is : 1200, get best_k: 11.431551629413757 best_b: -49.52403584539048, and the loss is: 44.91338300417686
When time is : 1250, get best_k: 11.431551629413757 best_b: -49.52403584539048, and the loss is: 44.91306655509527
When time is : 1300, get best_k: 11.431551629413757 best_b: -49.52403584539048, and the loss is: 44.912750623583214
When time is : 1350, get best_k: 11.431551629413757 best_b: -49.52403584539048, and the loss is: 44.912434961909526
When time is : 1400, get best_k: 11.431551629413757 best_b: -49.52403584539048, and the loss is: 44.91211946127419
When time is : 1450, get best_k: 11.431551629413757 best_b: -49.52403584539048, and the loss is: 44.91180407388745
When time is : 1500, get best_k: 11.431551629413757 best_b: -49.52403584539048, and the loss is: 44.9114887787528
When time is : 1550, get best_k: 11.431551629413757 best_b: -49.52403584539048, and the loss is: 44.9111735666393
When time is : 1600, get best_k: 11.431551629413757 best_b: -49.52403584539048, and the loss is: 44.91085843348287
When time is : 1650, get best_k: 11.431551629413757 best_b: -49.52403584539048, and the loss is: 44.91054337748873
When time is : 1700, get best_k: 11.431551629413757 best_b: -49.52403584539048, and the loss is: 44.910228397858496
When time is : 1750, get best_k: 11.431551629413757 best_b: -49.52403584539048, and the loss is: 44.9099134942312
When time is : 1800, get best_k: 11.431551629413757 best_b: -49.52403584539048, and the loss is: 44.909598666438264
When time is : 1850, get best_k: 11.431551629413757 best_b: -49.52403584539048, and the loss is: 44.90928391439542
When time is : 1900, get best_k: 11.431551629413757 best_b: -49.52403584539048, and the loss is: 44.90896923805536
When time is : 1950, get best_k: 11.431551629413757 best_b: -49.52403584539048, and the loss is: 44.908654637387244
And with that, a simple machine-learning model trained by gradient descent is complete! Of course, many questions remain, such as how to choose the initial values and the learning rate; those are topics we will explore later.
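As a closing sketch (my own addition, assuming the training loop above has finished so that best_k and best_b hold the fitted parameters), the model can now be used to make a prediction:

# Predict the price of a hypothetical house whose average number of rooms is 6.5
rm = 6.5
print('Predicted price for rm = {}: {:.2f}'.format(rm, price(rm, best_k, best_b)))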
Finally, you are welcome to visit my GitHub for more code: https://github.com/LiuPineapple
You are also welcome to visit my Jianshu homepage for more articles: https://www.jianshu.com/u/31e8349bd083