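Regularization

The Problem of Overfitting

As shown in the figure above, the first model fits the dataset with univariate linear regression, but the fit is poor; we call this underfitting, or high bias. The second model fits the dataset with a quadratic polynomial, and the fit is about right; we call this "just right". The third model fits the dataset with a fourth-degree polynomial. It fits the training data very well, but the curve swings up and down and is hard to use for predicting new data; we call this overfitting, or high variance.

Logistic regression models show the same behavior, as in the figure below:

By the same analysis as for linear regression, the first model underfits, the second is about right, and the third overfits.

Now let's look at the definition of overfitting:

If the dataset has many features and we fit it with a high-degree polynomial, the hypothesis may appear to fit every example in the dataset very well, yet perform poorly on new data. That is, it generalizes poorly (generalization is the ability of a hypothesis to apply to new examples). We call this overfitting.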
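To make the three cases concrete, here is a minimal Python sketch that fits polynomials of degree 1, 2 and 4 to five points and compares the error on the training set with the error on one new point. The data points, the held-out point, and the helper names (`solve`, `fit_polynomial`, `mse`) are all made up for illustration; they are not from the original figures.

```python
# Sketch: training error vs. error on a new point for degree 1, 2 and 4 fits.
# All data below is invented for illustration.

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def fit_polynomial(xs, ys, degree):
    """Least-squares fit via the normal equation (X^T X) theta = X^T y."""
    X = [[x ** p for p in range(degree + 1)] for x in xs]
    k = degree + 1
    XtX = [[sum(row[a] * row[b] for row in X) for b in range(k)] for a in range(k)]
    Xty = [sum(row[a] * y for row, y in zip(X, ys)) for a in range(k)]
    return solve(XtX, Xty)

def predict(theta, x):
    return sum(t * x ** p for p, t in enumerate(theta))

def mse(theta, xs, ys):
    return sum((predict(theta, x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

xs = [0.0, 0.25, 0.5, 0.75, 1.0]      # training inputs
ys = [0.1, 0.9, 1.2, 1.6, 1.5]        # training targets
new_x, new_y = [1.25], [1.4]          # one unseen example

train_error, new_error = {}, {}
for d in (1, 2, 4):
    theta = fit_polynomial(xs, ys, d)
    train_error[d] = mse(theta, xs, ys)
    new_error[d] = mse(theta, new_x, new_y)
    print(f"degree {d}: train MSE = {train_error[d]:.4f}, new-point MSE = {new_error[d]:.4f}")
```

On this toy data the degree-4 polynomial passes through every training point, yet it is the worst of the three on the unseen point, which is the pattern the definition above describes.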

Question:
Consider the medical diagnosis problem of classifying tumors as malignant or benign. If a hypothesis h_θ(x) has overfit the training set, it means that:
A. It makes accurate predictions for examples in the training set and generalizes well to make accurate predictions on new, previously unseen examples.
B. It does not make accurate predictions for examples in the training set, but it does generalize well to make accurate predictions on new, previously unseen examples.
C. It makes accurate predictions for examples in the training set, but it does not generalize well to make accurate predictions on new, previously unseen examples.
D. It does not make accurate predictions for examples in the training set and does not generalize well to make accurate predictions on new, previously unseen examples.

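By the definition of overfitting, C is the correct answer.

There are the following ways to address overfitting:

Reduce the number of features:
- Select the features to keep manually
- Use a model selection algorithm to select features automatically
Regularization: keep all the features, but reduce the values of the parameters θ_j

Supplementary Notes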

The Problem of Overfitting

Consider the problem of predicting y from x ∈ R. The leftmost figure below shows the result of fitting y = θ₀ + θ₁x to a dataset. We see that the data doesn't really lie on a straight line, and so the fit is not very good.

Underfitting, or high bias, is when the form of our hypothesis function h maps poorly to the trend of the data. It is usually caused by a function that is too simple or uses too few features. At the other extreme, overfitting, or high variance, is caused by a hypothesis function that fits the available data but does not generalize well to predict new data. It is usually caused by a complicated function that creates a lot of unnecessary curves and angles unrelated to the data.

This terminology is applied to both linear and logistic regression. There are two main options to address the issue of overfitting:

Reduce the number of features:
- Manually select which features to keep.
- Use a model selection algorithm (studied later in the course).
Regularization:
- Keep all the features, but reduce the magnitude of parameters θ_j.
- Regularization works well when we have a lot of slightly useful features.

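Cost Function

If the hypothesis is h_θ(x) = θ₀ + θ₁x + θ₂x² + θ₃x³ + θ₄x⁴, it overfits the dataset shown in the figure below.

Now suppose every feature x is important, so we cannot discard any of them. To address this, we use regularization to shrink the values of the parameters θ_j.

To do so, we modify the cost function J(θ) as shown in the figure below: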
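The figure with the modified cost function is not available here. Based on the description in the supplementary notes (two extra terms that inflate the cost of θ₃ and θ₄), it has the following form. The weight 1000 is an assumed, illustrative large constant; any sufficiently large value plays the same role.

```latex
\min_{\theta}\; \frac{1}{2m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)^2
  + 1000\,\theta_3^2 + 1000\,\theta_4^2
```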

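When we then use gradient descent or another advanced algorithm to find the θ that minimizes J(θ), θ₃ and θ₄ have less influence on predictions for new data than before. Why?

Because with regularization, minimizing J(θ) drives θ₃ and θ₄ very close to 0. The hypothesis can then be approximated as h_θ(x) = θ₀ + θ₁x + θ₂x².

If a dataset has a great many features x and each of them matters, we can avoid overfitting by modifying the cost function J(θ) to: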
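The formula image is missing at this point. The regularized cost function, written out in the notation used elsewhere in these notes (m training examples, n features, θ₀ excluded from the penalty), is:

```latex
J(\theta) = \frac{1}{2m}\left[\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)^2
  + \lambda\sum_{j=1}^{n}\theta_j^2\right]
```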

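Here λ is called the regularization parameter. This method is therefore called regularization.

Note: θ₀ is not penalized here.

The regularization parameter λ must also be chosen with care. If it is too large, θ₁, θ₂, θ₃ and θ₄ are all driven very close to 0, and the hypothesis reduces to approximately h_θ(x) = θ₀.

The result is shown by the red line in the figure, and this is underfitting.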
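The effect of a very large λ can be seen in a small sketch. To keep it short, this assumes a simplified setting with a single feature (not the fourth-degree example above), where the regularized least-squares solution has a closed form. The data and the function name `fit_one_feature` are hypothetical.

```python
# Sketch: how theta1 shrinks toward 0 as lambda grows (single-feature case).
# Minimizes sum((theta0 + theta1*x - y)^2) + lam * theta1^2; the 1/(2m) factor
# in J(theta) does not change where the minimum is.

def fit_one_feature(xs, ys, lam):
    m = len(xs)
    x_mean = sum(xs) / m
    y_mean = sum(ys) / m
    s_xy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    s_xx = sum((x - x_mean) ** 2 for x in xs)
    theta1 = s_xy / (s_xx + lam)       # the penalty only enlarges the denominator
    theta0 = y_mean - theta1 * x_mean  # theta0 is not penalized
    return theta0, theta1

xs = [1.0, 2.0, 3.0, 4.0]   # made-up inputs
ys = [1.1, 1.9, 3.2, 3.8]   # made-up targets

for lam in (0.0, 1.0, 100.0, 1e6):
    theta0, theta1 = fit_one_feature(xs, ys, lam)
    print(f"lambda={lam:g}: theta0={theta0:.4f}, theta1={theta1:.4f}")
```

As λ grows, θ₁ approaches 0 and θ₀ approaches the mean of the targets, so the fitted line flattens into a constant, which is the underfitting case described above.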

Supplementary Notes

Cost Function

If we have overfitting from our hypothesis function, we can reduce the weight that some of the terms in our function carry by increasing their cost.

Say we wanted to make the following function more quadratic: θ₀ + θ₁x + θ₂x² + θ₃x³ + θ₄x⁴

We'll want to eliminate the influence of θ₃x³ and θ₄x⁴. Without actually getting rid of these features or changing the form of our hypothesis, we can instead modify our cost function:

We've added two extra terms at the end to inflate the cost of θ₃ and θ₄. Now, in order for the cost function to get close to zero, we will have to reduce the values of θ₃ and θ₄ to near zero. This will in turn greatly reduce the values of θ₃x³ and θ₄x⁴ in our hypothesis function. As a result, we see that the new hypothesis (depicted by the pink curve) looks like a quadratic function but fits the data better due to the extra small terms θ₃x³ and θ₄x⁴.

We could also regularize all of our theta parameters in a single summation as:

The λ, or lambda, is the regularization parameter. It determines how much the costs of our theta parameters are inflated.

Using the above cost function with the extra summation, we can smooth the output of our hypothesis function to reduce overfitting. If lambda is chosen to be too large, it may smooth out the function too much and cause underfitting.

Regularized Linear Regression

The regularized cost function J(θ) is:

We now use gradient descent and the normal equation, both covered earlier, to find the θ that minimizes J(θ).
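The formula image is missing here; in the notation of these notes it is J(θ) = (1/(2m))·[Σᵢ (h_θ(x⁽ⁱ⁾) − y⁽ⁱ⁾)² + λ·Σⱼ₌₁ⁿ θⱼ²]. A minimal Python sketch of that computation follows. The function name `regularized_cost` and the toy data are assumptions for illustration, and the first column of `X` is assumed to be x₀ = 1.

```python
# Sketch of the regularized linear regression cost J(theta).

def regularized_cost(theta, X, y, lam):
    m = len(y)
    sq_err = 0.0
    for row, target in zip(X, y):
        h = sum(t * x for t, x in zip(theta, row))   # h_theta(x) = theta^T x
        sq_err += (h - target) ** 2
    penalty = lam * sum(t ** 2 for t in theta[1:])   # theta[0] is not penalized
    return (sq_err + penalty) / (2 * m)

X = [[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]   # toy data, x0 = 1 in the first column
y = [1.0, 2.0, 3.0]
theta = [0.0, 1.0]                          # fits the toy data exactly

cost_no_reg = regularized_cost(theta, X, y, lam=0.0)
cost_reg = regularized_cost(theta, X, y, lam=3.0)
print(cost_no_reg, cost_reg)   # 0.0 0.5
```

With a perfect fit the squared-error part is 0, so the second value is entirely the penalty: 3·1² / (2·3) = 0.5.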

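Gradient Descent

Because regularization leaves θ₀ untouched, the gradient descent update is:

For j = 1, 2, 3, ..., n, the update can be rewritten as:

Here 1 − αλ/m < 1 always holds, because α, λ and m are all positive.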
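The update rule was an image in the original. Written out, with θ₀ handled separately because it is not penalized:

```latex
\begin{aligned}
&\text{Repeat } \{ \\
&\quad \theta_0 := \theta_0 - \alpha\,\frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)x_0^{(i)} \\
&\quad \theta_j := \theta_j - \alpha\left[\frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)x_j^{(i)} + \frac{\lambda}{m}\,\theta_j\right]
  \qquad j \in \{1, 2, \dots, n\} \\
&\}
\end{aligned}
```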
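Collecting the two θ_j terms of the regularized update gives the rewritten form (the original image is missing):

```latex
\theta_j := \theta_j\left(1 - \alpha\frac{\lambda}{m}\right)
  - \alpha\,\frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)x_j^{(i)}
```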
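The rewritten update can be sketched in Python as follows: each step first shrinks θ_j by the factor (1 − αλ/m) and then applies the usual gradient step. The function names, learning rate, iteration count and toy data are assumptions chosen for illustration.

```python
# Sketch of gradient descent for regularized linear regression.

def gradient_descent_step(theta, X, y, alpha, lam):
    m = len(y)
    errors = [sum(t * x for t, x in zip(theta, row)) - target
              for row, target in zip(X, y)]
    new_theta = []
    for j in range(len(theta)):
        grad = sum(e * row[j] for e, row in zip(errors, X)) / m
        if j == 0:
            new_theta.append(theta[j] - alpha * grad)            # no shrinkage for theta0
        else:
            shrink = 1 - alpha * lam / m                          # slightly below 1
            new_theta.append(theta[j] * shrink - alpha * grad)
    return new_theta

def train(X, y, alpha, lam, iterations):
    theta = [0.0] * len(X[0])
    for _ in range(iterations):
        theta = gradient_descent_step(theta, X, y, alpha, lam)
    return theta

X = [[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]   # toy data, x0 = 1 in the first column
y = [1.0, 2.0, 3.0]

theta_plain = train(X, y, alpha=0.1, lam=0.0, iterations=5000)
theta_reg = train(X, y, alpha=0.1, lam=10.0, iterations=5000)
print(theta_plain, theta_reg)
```

On this toy data the unregularized run recovers the exact line (θ₁ close to 1), while λ = 10 pulls θ₁ down to about 1/6 and lets θ₀ absorb the difference.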

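Normal Equation

The regularized normal equation is:

where L is an (n+1)×(n+1) matrix.

When the number of examples m is smaller than the number of features n, X^TX is non-invertible (singular). In Octave, pinv() still returns its pseudo-inverse, but inv() cannot compute an inverse.

Note: when m equals n, X^TX may also be non-invertible (singular).

With a regularization parameter λ > 0, X^TX + λ·L is invertible even when m ≤ n makes X^TX itself non-invertible, so inv() can be used.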
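The formula image is missing here. As described in the supplementary notes, the extra term λ·L is added inside the parentheses of the ordinary normal equation, and L has a 0 in the top-left corner and 1's on the rest of the diagonal:

```latex
\theta = \left(X^{T}X + \lambda L\right)^{-1}X^{T}y,
\qquad
L = \begin{bmatrix}
0 &   &        &   \\
  & 1 &        &   \\
  &   & \ddots &   \\
  &   &        & 1
\end{bmatrix}
```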
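The invertibility claim can be checked with a small Python sketch: with m = 2 examples and n = 3 features, solving the unregularized system fails, while the same system with λ = 1 has a solution. This uses a hand-written Gaussian elimination rather than Octave's inv() or pinv(); the function names, the tolerance and the data are assumptions for illustration.

```python
# Sketch: X^T X is singular when m < n, but X^T X + lam * L is solvable for lam > 0.

def solve(A, b, tol=1e-9):
    """Gaussian elimination with partial pivoting; raises if A looks singular."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < tol:
            raise ValueError("matrix is singular (or nearly so)")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def regularized_normal_equation(X, y, lam):
    """Solve (X^T X + lam * L) theta = X^T y with L = diag(0, 1, ..., 1)."""
    k = len(X[0])
    A = [[sum(row[a] * row[b] for row in X) for b in range(k)] for a in range(k)]
    for j in range(1, k):          # skip index 0: theta0 is not penalized
        A[j][j] += lam
    rhs = [sum(row[a] * t for row, t in zip(X, y)) for a in range(k)]
    return solve(A, rhs)

# m = 2 examples, n = 3 features (plus the x0 = 1 column): fewer examples than features
X = [[1.0, 1.0, 2.0, 3.0],
     [1.0, 2.0, 1.0, 0.0]]
y = [1.0, 2.0]

try:
    regularized_normal_equation(X, y, lam=0.0)
    singular_without_reg = False
except ValueError:
    singular_without_reg = True

theta = regularized_normal_equation(X, y, lam=1.0)
print(singular_without_reg, theta)
```

Solving the linear system directly, instead of forming an explicit inverse, is a design choice of this sketch; the conclusion about invertibility is the same.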

Supplementary Notes

Regularized Linear Regression

We can apply regularization to both linear regression and logistic regression. We will approach linear regression first.

Gradient Descent

We will modify our gradient descent function to separate out θ₀ from the rest of the parameters because we do not want to penalize θ₀.

Normal Equation

Now let's approach regularization using the alternate method of the non-iterative normal equation.

To add in regularization, the equation is the same as our original, except that we add another term inside the parentheses:

L is a matrix with 0 at the top left and 1's down the diagonal, with 0's everywhere else. It should have dimension (n+1)×(n+1). Intuitively, this is the identity matrix (though we are not including x₀), multiplied with a single real number λ.

Recall that if m < n, then X^TX is non-invertible. However, when we add the term λ⋅L with λ > 0, X^TX + λ⋅L becomes invertible.

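Regularized Logistic Regression

The cost function J(θ) of the regularized logistic regression model is:

Gradient Descent

The update rule has the same form as for regularized linear regression, except that here h_θ(x) = g(θ^T x), where g is the sigmoid function.

Advanced Optimization Algorithms

First, create a file costFunction.m and write the function in it as shown in the figure below:

Then call fminunc() in Octave as described in the earlier post "逻辑回归（二）" (Logistic Regression, Part 2); see that post for the details.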
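The formula image is missing here. The regularized cost is the usual logistic regression cost with the same penalty term added at the end, again leaving θ₀ out of the sum:

```latex
J(\theta) = -\frac{1}{m}\sum_{i=1}^{m}\left[y^{(i)}\log h_\theta(x^{(i)})
  + \left(1 - y^{(i)}\right)\log\left(1 - h_\theta(x^{(i)})\right)\right]
  + \frac{\lambda}{2m}\sum_{j=1}^{n}\theta_j^2
```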
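The Octave code from the figure is not available, so what follows is not the author's costFunction.m. It is a Python sketch of the same idea under stated assumptions: a function that returns both the regularized cost and its gradient, which is the pair of values an optimizer such as fminunc() works with. The name `cost_function` and the toy data are hypothetical.

```python
# Sketch: cost and gradient for regularized logistic regression.
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def cost_function(theta, X, y, lam):
    """Return (J, grad); theta[0] is excluded from the penalty."""
    m = len(y)
    h = [sigmoid(sum(t * x for t, x in zip(theta, row))) for row in X]
    J = -sum(yi * math.log(hi) + (1 - yi) * math.log(1 - hi)
             for hi, yi in zip(h, y)) / m
    J += lam / (2 * m) * sum(t ** 2 for t in theta[1:])
    grad = []
    for j in range(len(theta)):
        g = sum((hi - yi) * row[j] for hi, yi, row in zip(h, y, X)) / m
        if j > 0:
            g += lam / m * theta[j]      # regularization term for j >= 1 only
        grad.append(g)
    return J, grad

X = [[1.0, 1.0], [1.0, 2.0], [1.0, 3.0], [1.0, 4.0]]   # toy data, x0 = 1
y = [0, 0, 1, 1]

J0, g0 = cost_function([0.0, 0.0], X, y, lam=1.0)
print(J0, g0)
```

At θ = 0 every prediction is 0.5, so the cost equals ln 2 and the penalty contributes nothing; this is a convenient sanity check before handing the function to an optimizer.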

Supplementary Notes

Regularized Logistic Regression

We can regularize logistic regression in much the same way as we regularize linear regression. As a result, we can avoid overfitting. The following image shows how the regularized function, displayed by the pink line, is less likely to overfit than the non-regularized function represented by the blue line:

Cost Function

Recall that our cost function for logistic regression was:

We can regularize this equation by adding a term to the end:

Last edited: 2017.12.10 03:10:24
