Neural Networks: Learning (1)

Cost Function

In the neural network model we introduce some new notation:

  • L: the total number of layers in the network;
  • s_l: the number of activation units in layer l (not counting the bias unit);
  • s_L: the number of activation units in the output layer;
  • K: the number of output classes.

Recall that in regularized logistic regression the cost function J(θ) is:
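$$J(\theta) = -\frac{1}{m}\sum_{i=1}^{m}\Big[y^{(i)}\log h_\theta(x^{(i)}) + (1 - y^{(i)})\log\big(1 - h_\theta(x^{(i)})\big)\Big] + \frac{\lambda}{2m}\sum_{j=1}^{n}\theta_j^2$$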

In logistic regression there is only a single output variable y, but in a neural network the output is a K-dimensional vector, so the cost function J(Θ) is rewritten as:
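$$J(\Theta) = -\frac{1}{m}\sum_{i=1}^{m}\sum_{k=1}^{K}\Big[y_k^{(i)}\log\big(h_\Theta(x^{(i)})\big)_k + (1 - y_k^{(i)})\log\Big(1 - \big(h_\Theta(x^{(i)})\big)_k\Big)\Big] + \frac{\lambda}{2m}\sum_{l=1}^{L-1}\sum_{i=1}^{s_l}\sum_{j=1}^{s_{l+1}}\big(\Theta_{j,i}^{(l)}\big)^2$$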

Supplementary Notes
Cost Function

Let's first define a few variables that we will need to use:

  • L = total number of layers in the network
  • s_l = number of units (not counting the bias unit) in layer l
  • K = number of output units/classes

Recall that in neural networks, we may have many output nodes. We denote hΘ(x)k as being a hypothesis that results in the kth output. Our cost function for neural networks is going to be a generalization of the one we used for logistic regression. Recall that the cost function for regularized logistic regression was:
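$$J(\theta) = -\frac{1}{m}\sum_{i=1}^{m}\Big[y^{(i)}\log h_\theta(x^{(i)}) + (1 - y^{(i)})\log\big(1 - h_\theta(x^{(i)})\big)\Big] + \frac{\lambda}{2m}\sum_{j=1}^{n}\theta_j^2$$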

For neural networks, it is going to be slightly more complicated:
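$$J(\Theta) = -\frac{1}{m}\sum_{i=1}^{m}\sum_{k=1}^{K}\Big[y_k^{(i)}\log\big(h_\Theta(x^{(i)})\big)_k + (1 - y_k^{(i)})\log\Big(1 - \big(h_\Theta(x^{(i)})\big)_k\Big)\Big] + \frac{\lambda}{2m}\sum_{l=1}^{L-1}\sum_{i=1}^{s_l}\sum_{j=1}^{s_{l+1}}\big(\Theta_{j,i}^{(l)}\big)^2$$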

We have added a few nested summations to account for our multiple output nodes. In the first part of the equation, before the square brackets, we have an additional nested summation that loops through the number of output nodes.

In the regularization part, after the square brackets, we must account for multiple theta matrices. The number of columns in our current theta matrix is equal to the number of nodes in our current layer (including the bias unit). The number of rows in our current theta matrix is equal to the number of nodes in the next layer (excluding the bias unit). As before with logistic regression, we square every term.

Note:

  • the double sum simply adds up the logistic regression costs calculated for each cell in the output layer
  • the triple sum simply adds up the squares of all the individual Θs in the entire network
  • the i in the triple sum does not refer to training example i
Backpropagation Algorithm

To compute hΘ(x) we use forward propagation, working layer by layer from the input layer up to the output layer.

Now, to compute the partial derivatives:
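$$\frac{\partial}{\partial \Theta_{ij}^{(l)}} J(\Theta)$$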

we use the backpropagation algorithm, computing the errors layer by layer starting from the output layer (the error here is the difference between an activation unit's prediction a_k^(l) and the actual value y_k, for k = 1, …, K) and working backwards until we reach the second layer. The first layer is the input layer, whose values come directly from the training set, so it has no error term.

Suppose for now that the training set contains only a single example and that the neural network is the four-layer model shown in the figure below:

Following the backpropagation algorithm, we first compute the error at the output layer. Denoting the error by δ, the expression is:
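$$\delta^{(4)} = a^{(4)} - y$$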

We then use this error δ^(4) to compute the error of the third layer:
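$$\delta^{(3)} = \big(\Theta^{(3)}\big)^{T}\,\delta^{(4)} \;.*\; g'\big(z^{(3)}\big)$$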

Here g'(z^(l)) follows from the gradient-descent derivation in Logistic Regression (II); taking the derivative of the sigmoid gives:
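$$g'\big(z^{(l)}\big) = a^{(l)} \;.*\; \big(1 - a^{(l)}\big)$$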

Finally, we use δ^(3) to compute the error of the second layer:
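$$\delta^{(2)} = \big(\Theta^{(2)}\big)^{T}\,\delta^{(3)} \;.*\; g'\big(z^{(2)}\big)$$

There is no δ^(1), since the first layer is the input layer and has no error.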

From these errors we can derive the partial derivatives of the cost function J(Θ) (for this single example, ignoring regularization):
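$$\frac{\partial}{\partial \Theta_{ij}^{(l)}} J(\Theta) = a_j^{(l)}\,\delta_i^{(l+1)}$$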

If we now account for regularization and the entire training set, we use Δ_ij^(l) to denote the error-accumulator matrix, and the procedure is as follows:
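  1. Set Δ_ij^(l) := 0 for all (l, i, j).
  2. For each training example t = 1, …, m: set a^(1) := x^(t); forward-propagate to compute a^(l) for l = 2, 3, …, L; compute δ^(L) = a^(L) − y^(t) and back-propagate δ^(L−1), …, δ^(2); then accumulate Δ_ij^(l) := Δ_ij^(l) + a_j^(l) δ_i^(l+1), i.e. Δ^(l) := Δ^(l) + δ^(l+1)(a^(l))^T.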

Once these steps are complete we have the accumulator matrix Δ_ij^(l).

We can then compute the partial derivatives of the cost function as follows:
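$$D_{ij}^{(l)} := \frac{1}{m}\Big(\Delta_{ij}^{(l)} + \lambda\,\Theta_{ij}^{(l)}\Big) \quad \text{if } j \neq 0$$

$$D_{ij}^{(l)} := \frac{1}{m}\Delta_{ij}^{(l)} \quad \text{if } j = 0$$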

Finally, we obtain:
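$$\frac{\partial}{\partial \Theta_{ij}^{(l)}} J(\Theta) = D_{ij}^{(l)}$$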

With this expression we can run gradient descent or any other advanced optimization algorithm.

Supplementary Notes
Backpropagation Algorithm

"Backpropagation" is neural-network terminology for minimizing our cost function, just like what we were doing with gradient descent in logistic and linear regression. Our goal is to compute:

That is, we want to minimize our cost function J using an optimal set of parameters in theta. In this section we'll look at the equations we use to compute the partial derivative of J(Θ):
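$$\frac{\partial}{\partial \Theta_{i,j}^{(l)}} J(\Theta)$$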

To do so, we use the following algorithm:

Backpropagation Algorithm

Given training set {(x(1),y(1))⋯(x(m),y(m))}

  • Set Δi,j(l) := 0 for all (l,i,j), (hence you end up having a matrix full of zeros)

For training example t = 1 to m:

  1. Set a(1) := x(t)
  2. Perform forward propagation to compute a(l) for l = 2, 3, …, L
  3. Using y(t), compute δ(L) = a(L) − y(t)

Where L is our total number of layers and a(L) is the vector of outputs of the activation units for the last layer. So our "error values" for the last layer are simply the differences of our actual results in the last layer and the correct outputs in y. To get the delta values of the layers before the last layer, we can use an equation that steps us back from right to left:

  4. Compute δ(L−1), δ(L−2), …, δ(2) using δ(l) = ((Θ(l))T δ(l+1)) .∗ a(l) .∗ (1 − a(l))

The delta values of layer l are calculated by multiplying the delta values in the next layer with the theta matrix of layer l. We then element-wise multiply that with a function called g', or g-prime, which is the derivative of the activation function g evaluated with the input values given by z(l).

The g-prime derivative terms can also be written out as:
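$$g'\big(z^{(l)}\big) = a^{(l)} \;.*\; \big(1 - a^{(l)}\big)$$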

  5. Δi,j(l) := Δi,j(l) + aj(l) δi(l+1), or with vectorization, Δ(l) := Δ(l) + δ(l+1)(a(l))T

Hence we update our new Δ matrix.
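$$D_{i,j}^{(l)} := \frac{1}{m}\Big(\Delta_{i,j}^{(l)} + \lambda\,\Theta_{i,j}^{(l)}\Big), \quad \text{if } j \neq 0$$

$$D_{i,j}^{(l)} := \frac{1}{m}\Delta_{i,j}^{(l)}, \quad \text{if } j = 0$$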

The capital-delta matrix D is used as an "accumulator" to add up our values as we go along and eventually compute our partial derivatives. Thus we get:
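$$\frac{\partial}{\partial \Theta_{i,j}^{(l)}} J(\Theta) = D_{i,j}^{(l)}$$

To make the loop concrete, here is a minimal NumPy sketch of the accumulation step for a three-layer network with sigmoid activations. The variable names (Theta1, Theta2, lambda_) and the one-hot label matrix Y are illustrative assumptions, not code from the course:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def backprop_gradients(Theta1, Theta2, X, Y, lambda_):
    """Accumulate Delta over all m examples and return the gradients D1, D2.

    Theta1: (s2, s1+1) weights mapping layer 1 -> layer 2
    Theta2: (K,  s2+1) weights mapping layer 2 -> output layer
    X: (m, s1) inputs; Y: (m, K) one-hot labels; lambda_: regularization strength
    """
    m = X.shape[0]
    Delta1 = np.zeros_like(Theta1)
    Delta2 = np.zeros_like(Theta2)

    for t in range(m):
        # Forward propagation (a1 and a2 include the bias unit)
        a1 = np.concatenate(([1.0], X[t]))
        z2 = Theta1 @ a1
        h2 = sigmoid(z2)
        a2 = np.concatenate(([1.0], h2))
        a3 = sigmoid(Theta2 @ a2)

        # Backpropagate the errors
        delta3 = a3 - Y[t]                                # delta(L) = a(L) - y(t)
        delta2 = (Theta2.T @ delta3)[1:] * h2 * (1 - h2)  # drop the bias row

        # Accumulate: Delta(l) := Delta(l) + delta(l+1) * a(l)^T
        Delta2 += np.outer(delta3, a2)
        Delta1 += np.outer(delta2, a1)

    # D(l) = (1/m) * Delta(l), with regularization added for the non-bias columns
    D1, D2 = Delta1 / m, Delta2 / m
    D1[:, 1:] += (lambda_ / m) * Theta1[:, 1:]
    D2[:, 1:] += (lambda_ / m) * Theta2[:, 1:]
    return D1, D2
```

The returned D1 and D2 are the partial derivatives a gradient-descent step would use, e.g. Theta1 -= alpha * D1.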

Backpropagation Intuition

Recall that the cost function for a neural network is:
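$$J(\Theta) = -\frac{1}{m}\sum_{i=1}^{m}\sum_{k=1}^{K}\Big[y_k^{(i)}\log\big(h_\Theta(x^{(i)})\big)_k + (1 - y_k^{(i)})\log\Big(1 - \big(h_\Theta(x^{(i)})\big)_k\Big)\Big] + \frac{\lambda}{2m}\sum_{l=1}^{L-1}\sum_{i=1}^{s_l}\sum_{j=1}^{s_{l+1}}\big(\Theta_{j,i}^{(l)}\big)^2$$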

If we consider simple non-multiclass classification (k = 1) and disregard regularization, the cost is computed with:
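$$\mathrm{cost}(t) = y^{(t)}\,\log\big(h_\Theta(x^{(t)})\big) + \big(1 - y^{(t)}\big)\log\big(1 - h_\Theta(x^{(t)})\big)$$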

Intuitively, δj(l) is the "error" for aj(l) (unit j in layer l). More formally, the delta values are actually the derivative of the cost function:
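$$\delta_j^{(l)} = \frac{\partial}{\partial z_j^{(l)}}\,\mathrm{cost}(t)$$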

Recall that our derivative is the slope of a line tangent to the cost function, so the steeper the slope the more incorrect we are. Let us consider the following neural network below and see how we could calculate some δj(l):

In the image above, to calculate δ_2^(2), we multiply the weights Θ_12^(2) and Θ_22^(2) by their respective δ values found to the right of each edge. So we get δ_2^(2) = Θ_12^(2) δ_1^(3) + Θ_22^(2) δ_2^(3). To calculate every single possible δ_j^(l), we could start from the right of our diagram. We can think of our edges as our Θ_ij. Going from right to left, to calculate the value of δ_j^(l), you can just take the overall sum of each weight times the δ it is coming from. Hence, another example would be δ_2^(3) = Θ_12^(3) δ_1^(4).
