Advice for Applying Machine Learning (Part 2)

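Distinguishing Bias from Variance

High bias and high variance are, in essence, the underfitting and overfitting problems of a learning model.

To tell which of the two a model suffers from, we usually plot a chart like the following and compare the errors: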

High bias: an underfitting problem

  • Jtrain(Θ) is large
  • JCV(Θ) ≈ Jtrain(Θ)

High variance: an overfitting problem

  • Jtrain(Θ) is small
  • JCV(Θ) >> Jtrain(Θ)
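
The two checklists above can be turned into a rough rule of thumb. The sketch below is only an illustration: the function name `diagnose` and the thresholds `target_error` and `gap_ratio` are assumptions introduced here, not something the course prescribes, and sensible values depend on the problem.

```python
def diagnose(j_train, j_cv, target_error, gap_ratio=2.0):
    """Rough bias/variance label from training and cross validation error.

    target_error: largest training error we would still call "small" (assumed).
    gap_ratio:    how many times larger than j_train the CV error must be
                  before we call the gap "much greater" (assumed).
    """
    train_is_large = j_train > target_error
    gap_is_large = j_cv > gap_ratio * j_train

    if train_is_large and not gap_is_large:
        # J_train is large and J_cv is close to J_train.
        return "high bias (underfitting)"
    if not train_is_large and gap_is_large:
        # J_train is small and J_cv >> J_train.
        return "high variance (overfitting)"
    return "unclear or mixed"


print(diagnose(j_train=0.90, j_cv=1.00, target_error=0.2))
print(diagnose(j_train=0.05, j_cv=0.80, target_error=0.2))
```

In practice the two errors are usually read off a plot rather than thresholded, so this helper is best treated as a sanity check.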
Supplementary notes
Diagnosing Bias vs. Variance

In this section we examine the relationship between the degree of the polynomial d and the underfitting or overfitting of our hypothesis.

  • We need to distinguish whether bias or variance is the problem contributing to bad predictions.
  • High bias is underfitting and high variance is overfitting. Ideally, we need to find a golden mean between these two.

The training error will tend to decrease as we increase the degree d of the polynomial.

At the same time, the cross validation error will tend to decrease as we increase d up to a point, and then it will increase as d is increased, forming a convex curve.

High bias (underfitting): both Jtrain(Θ) and JCV(Θ) will be high. Also, JCV(Θ)≈Jtrain(Θ).

High variance (overfitting): Jtrain(Θ) will be low and JCV(Θ) will be much greater than Jtrain(Θ).
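
As a minimal sketch of this diagnosis, the code below fits polynomials of increasing degree d by ordinary least squares and records Jtrain(Θ) and JCV(Θ) for each. The synthetic data, the helper names (`poly_features`, `cost`, `degree_sweep`) and the train/CV split are assumptions made for illustration only.

```python
import numpy as np


def poly_features(x, degree):
    """Design matrix with columns 1, x, x^2, ..., x^degree."""
    return np.vander(x, degree + 1, increasing=True)


def cost(theta, X, y):
    """Unregularized squared-error cost: 1/(2m) * sum of squared residuals."""
    residual = X @ theta - y
    return float(residual @ residual) / (2 * len(y))


def degree_sweep(x_train, y_train, x_cv, y_cv, degrees):
    """Return a list of (d, J_train, J_cv), one entry per polynomial degree."""
    results = []
    for d in degrees:
        X_train = poly_features(x_train, d)
        # Least-squares fit on the training set only.
        theta, *_ = np.linalg.lstsq(X_train, y_train, rcond=None)
        j_train = cost(theta, X_train, y_train)
        j_cv = cost(theta, poly_features(x_cv, d), y_cv)
        results.append((d, j_train, j_cv))
    return results


# Synthetic data, invented for this example only.
rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, 60)
y = np.sin(3 * x) + 0.1 * rng.standard_normal(60)
x_train, y_train = x[:30], y[:30]
x_cv, y_cv = x[30:], y[30:]

results = degree_sweep(x_train, y_train, x_cv, y_cv, range(1, 11))
for d, j_train, j_cv in results:
    print(f"d={d:2d}  J_train={j_train:.4f}  J_cv={j_cv:.4f}")
```

Reading the printed table from top to bottom gives the two curves described above: the training error can only go down as d grows, while the cross validation error is typically lowest at some intermediate degree.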

This is summarized in the figure below:

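Bias and Variance with Regularization

When training a model, we usually use regularization to avoid overfitting. The regularization parameter λ, however, has to be chosen with care.

Previously, when choosing λ, we only considered the single-variable case. Now we consider how to choose λ for a polynomial model.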

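For example, suppose we apply regularization to some polynomial model and consider the candidate values λ = 0, 0.01, 0.02, 0.04, 0.08, 0.16, 0.32, 0.64, 1.28, 2.56, 5.12, 10. We want to find the best value of λ.

First, we split the data set into three parts: a training set, a cross validation set, and a test set.

Then, for each of the candidate values λ = 0, 0.01, 0.02, 0.04, 0.08, 0.16, 0.32, 0.64, 1.28, 2.56, 5.12, 10, we compute Jtrain(θ) and JCV(θ).

Finally, we take the value of λ for which JCV(θ) is smallest and evaluate that model on the test set to obtain Jtest(θ).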

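In the figure, suppose JCV(θ) is smallest at λ = 0.08.

To make this easier to follow, and to make the best value of λ easier to find, we can plot the following figure:

Supplementary notes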
Regularization and Bias/Variance

In the figure above, we see that as λ increases, our fit becomes more rigid. On the other hand, as λ approaches 0, we tend to overfit the data. So how do we choose our parameter λ to get it 'just right'? In order to choose the model and the regularization term λ, we need to:

  1. Create a list of lambdas (i.e. λ∈{0,0.01,0.02,0.04,0.08,0.16,0.32,0.64,1.28,2.56,5.12,10.24}).
  2. Create a set of models with different degrees or any other variants.
  3. Iterate through the λs and for each λ go through all the models to learn some Θ.
  4. Compute the cross validation error JCV(Θ) using the learned Θ (trained with λ), evaluating JCV(Θ) itself without the regularization term (i.e. with λ = 0).
  5. Select the best combo that produces the lowest error on the cross validation set.
  6. Using the best combo Θ and λ, apply it on Jtest(Θ) to see if it has a good generalization of the problem.
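
The six steps above can be sketched as follows for regularized polynomial regression. This is an illustration under stated assumptions: the data are synthetic, `DEGREES` is a hypothetical set of candidate models, the helper names are invented here, and Θ is obtained from the regularized normal equation rather than an iterative optimizer.

```python
import numpy as np

LAMBDAS = [0, 0.01, 0.02, 0.04, 0.08, 0.16, 0.32, 0.64, 1.28, 2.56, 5.12, 10.24]
DEGREES = [1, 2, 4, 8]  # hypothetical set of candidate models (step 2)


def poly_features(x, degree):
    """Design matrix with columns 1, x, ..., x^degree."""
    return np.vander(x, degree + 1, increasing=True)


def cost(theta, X, y):
    """Unregularized squared-error cost: 1/(2m) * sum of squared residuals."""
    residual = X @ theta - y
    return float(residual @ residual) / (2 * len(y))


def fit_regularized(X, y, lam):
    """Regularized least squares; the intercept theta_0 is not penalized."""
    penalty = lam * np.eye(X.shape[1])
    penalty[0, 0] = 0.0
    return np.linalg.solve(X.T @ X + penalty, X.T @ y)


def select_model(x_train, y_train, x_cv, y_cv, lambdas, degrees):
    """Steps 3 to 5: try every (lambda, degree) combo, keep the lowest J_cv."""
    best = None
    for lam in lambdas:
        for degree in degrees:
            theta = fit_regularized(poly_features(x_train, degree), y_train, lam)
            # J_cv is evaluated without the regularization term (step 4).
            j_cv = cost(theta, poly_features(x_cv, degree), y_cv)
            if best is None or j_cv < best["j_cv"]:
                best = {"lam": lam, "degree": degree, "theta": theta, "j_cv": j_cv}
    return best


# Synthetic data, invented for this example only.
rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, 100)
y = np.sin(3 * x) + 0.2 * rng.standard_normal(100)
x_train, y_train = x[:60], y[:60]
x_cv, y_cv = x[60:80], y[60:80]
x_test, y_test = x[80:], y[80:]

best = select_model(x_train, y_train, x_cv, y_cv, LAMBDAS, DEGREES)
# Step 6: report the error of the chosen combo on the held-out test set.
j_test = cost(best["theta"], poly_features(x_test, best["degree"]), y_test)
print(best["lam"], best["degree"], best["j_cv"], j_test)
```

The test set is touched only once, after λ and the degree have been fixed, so Jtest(Θ) remains an honest estimate of generalization error.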
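Learning Curves

Plotting learning curves helps us check whether a learning algorithm is working properly. A learning curve plots the training error and the cross validation error as functions of the number of training examples m.

In the figure above, the hypothesis is hθ(x) = θ0 + θ1x + θ2x^2, and regularization is not considered. When m = 1, the hypothesis fits the training set perfectly, so Jtrain(θ) = 0, but it generalizes poorly to the cross validation set, so JCV(θ) is large. When m = 2, the hypothesis still fits the training set well and Jtrain(θ) increases slightly, but it still generalizes poorly, and JCV(θ) is only slightly smaller than before. And so on. When m is large enough, Jtrain(θ) rises to a certain value and then levels off, JCV(θ) falls to a certain value and then levels off, and the two values end up very close to each other.

Therefore, when a learning algorithm has high bias, adding more training examples does not help.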

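In the figure above, the hypothesis is hθ(x) = θ0 + θ1x + θ2x^2 + ... + θ100x^100, with regularization and a very small value of λ. When m = 5, the hypothesis fits the training set well, so Jtrain(θ) is small, but it generalizes poorly, so JCV(θ) is large. When m = 12, the hypothesis still fits the training set well, but Jtrain(θ) is slightly larger and JCV(θ) is slightly smaller. And so on. As m grows large, Jtrain(θ) gradually increases and JCV(θ) gradually decreases.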

Therefore, when a learning algorithm has high variance, adding more training examples is likely to help.

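Note: as m grows large, Jtrain(θ) gradually increases and JCV(θ) gradually decreases; the video does not make clear whether the two curves eventually meet.

Supplementary notes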
Learning Curves

Training an algorithm on very few data points (such as 1, 2 or 3) will easily yield 0 error, because we can always find a quadratic curve that passes exactly through that many points. Hence:

  • As the training set gets larger, the error for a quadratic function increases.
  • The error value will plateau out after a certain m, or training set size.

Experiencing high bias:

Low training set size: causes Jtrain(Θ) to be low and JCV(Θ) to be high.

Large training set size: causes both Jtrain(Θ) and JCV(Θ) to be high with Jtrain(Θ)≈JCV(Θ).

If a learning algorithm is suffering from high bias, getting more training data will not (by itself) help much.

Experiencing high variance:

Low training set size: Jtrain(Θ) will be low and JCV(Θ) will be high.

Large training set size: Jtrain(Θ) increases with training set size and JCV(Θ) continues to decrease without leveling off. Also, Jtrain(Θ) < JCV(Θ) but the difference between them remains significant.

If a learning algorithm is suffering from high variance, getting more training data is likely to help.
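
A learning curve can be computed by refitting the model on the first m training examples for increasing m. The sketch below illustrates the high-bias case under assumptions made here: a quadratic hypothesis fit (without regularization) to synthetic sinusoidal data that it cannot represent well. The helper names and the data are invented for illustration.

```python
import numpy as np


def poly_features(x, degree):
    """Design matrix with columns 1, x, ..., x^degree."""
    return np.vander(x, degree + 1, increasing=True)


def cost(theta, X, y):
    """Unregularized squared-error cost: 1/(2m) * sum of squared residuals."""
    residual = X @ theta - y
    return float(residual @ residual) / (2 * len(y))


def fit(X, y):
    """Unregularized least squares; lstsq also copes with m < number of parameters."""
    theta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return theta


def learning_curve(X_train, y_train, X_cv, y_cv, sizes):
    """Return a list of (m, J_train, J_cv), one entry per training set size m."""
    curve = []
    for m in sizes:
        theta = fit(X_train[:m], y_train[:m])
        j_train = cost(theta, X_train[:m], y_train[:m])  # only the m examples used
        j_cv = cost(theta, X_cv, y_cv)                   # always the full CV set
        curve.append((m, j_train, j_cv))
    return curve


# Synthetic data, invented for this example only.
rng = np.random.default_rng(1)
x = rng.uniform(-1, 1, 100)
y = np.sin(3 * x) + 0.1 * rng.standard_normal(100)
X = poly_features(x, 2)  # quadratic hypothesis

curve = learning_curve(X[:50], y[:50], X[50:], y[50:], range(1, 51))
for m, j_train, j_cv in curve[::7]:
    print(f"m={m:2d}  J_train={j_train:.4f}  J_cv={j_cv:.4f}")
```

Note that Jtrain(Θ) is measured on the same m examples used for fitting, whereas JCV(Θ) is always measured on the whole cross validation set. With m of 1, 2 or 3 the quadratic passes through every training point, which is why the training error starts at zero.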

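Deciding What to Do Next

At the beginning of Advice for Applying Machine Learning (Part 1), we listed the following possible remedies for high prediction error:

  • Get more training examples
  • Try a smaller set of features
  • Try getting additional features
  • Try adding polynomial features
  • Try decreasing the regularization parameter λ
  • Try increasing the regularization parameter λ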

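Having examined each of these remedies, we reached the following conclusions:

  • Get more training examples: suited to high variance (overfitting)
  • Try a smaller set of features: suited to high variance (overfitting)
  • Try getting additional features: suited to high bias (underfitting)
  • Try adding polynomial features: suited to high bias (underfitting)
  • Try decreasing the regularization parameter λ: suited to high bias (underfitting)
  • Try increasing the regularization parameter λ: suited to high variance (overfitting)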

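For neural networks, a "small" model is prone to high bias (underfitting), but has the advantage of being computationally cheap. A "large" model (one with many hidden units or several hidden layers) is prone to high variance (overfitting) and is computationally expensive. In general, though, the larger a regularized neural network is, the better it performs.

We usually choose a neural network with a single hidden layer. In some cases, however, a single hidden layer is not the best choice. We can therefore split the data set into a training set, a cross validation set, and a test set, train networks with different numbers of hidden layers, and pick the one with the smallest JCV(Θ).

Supplementary notes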
Deciding What to Do Next Revisited

Our decision process can be broken down as follows:

  • Getting more training examples: Fixes high variance
  • Trying smaller sets of features: Fixes high variance
  • Adding features: Fixes high bias
  • Adding polynomial features: Fixes high bias
  • Decreasing λ: Fixes high bias
  • Increasing λ: Fixes high variance.
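
The list above amounts to a lookup from diagnosis to candidate remedies. A minimal sketch, where the names `REMEDIES` and `suggest` are hypothetical and introduced only for this example:

```python
# Remedies grouped by the problem they address, as listed above.
REMEDIES = {
    "high variance": [
        "get more training examples",
        "try a smaller set of features",
        "increase lambda",
    ],
    "high bias": [
        "add features",
        "add polynomial features",
        "decrease lambda",
    ],
}


def suggest(diagnosis):
    """Return the candidate remedies for a diagnosis, or an empty list if unknown."""
    return REMEDIES.get(diagnosis, [])


print(suggest("high bias"))
print(suggest("high variance"))
```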

Diagnosing Neural Networks

  • A neural network with fewer parameters is prone to underfitting. It is also computationally cheaper.
  • A large neural network with more parameters is prone to overfitting. It is also computationally expensive. In this case you can use regularization (increase λ) to address the overfitting.

Using a single hidden layer is a good starting default. You can train neural networks with different numbers of hidden layers, compare them on your cross validation set, and then select the one that performs best.

Model Complexity Effects:

  • Lower-order polynomials (low model complexity) have high bias and low variance. In this case, the model fits poorly consistently.
  • Higher-order polynomials (high model complexity) fit the training data extremely well and the test data extremely poorly. These have low bias on the training data, but very high variance.
  • In reality, we would want to choose a model somewhere in between, that can generalize well but also fits the data reasonably well.