Machine Learning: Summary of Supervised Learning Models V1.1

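After studying Andrew Ng's "Machine Learning" and the courses "Machine Learning Foundations" and "Machine Learning Techniques" by Hsuan-Tien Lin of National Taiwan University, I summarize below the supervised learning models covered in those courses.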

Classification

Classification refers to the problem of assigning an input to a class.

PLA

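Definition

PLA = Perceptron Learning Algorithm, a classification method. PLA usually comes in two variants: Naive PLA and Pocket PLA. The perceptron itself is a binary linear classifier.

Applicable conditions

Binary linear classification.

How to use

Comparison and further notes

The idea behind Naive PLA is simple: keep correcting the weight vector W until W classifies every data point correctly. A major problem with Naive PLA is that if the data are noisy and cannot be separated perfectly, the algorithm never terminates. For noisy data we can therefore only hope to find the weight vector that makes the fewest errors, and finding that optimum is an NP-hard problem.

Pocket PLA is a greedy approximation algorithm similar to Naive PLA. It replaces sequential iteration with random iteration: whenever it finds a misclassified point, it corrects the weights. During this process it keeps the weight vector that has made the fewest errors so far.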
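The Pocket PLA loop described above can be sketched as follows. This is a minimal illustration, not the course's reference implementation: the function names, the `max_updates` cap, the toy data, and the convention that a score of 0 counts as -1 are all assumptions made for this example. Each input carries a leading 1.0 so that the bias is part of the weight vector.

```python
import random

def predict(w, x):
    """Perceptron output: sign of the inner product (a score of 0 is treated as -1)."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) > 0 else -1

def count_errors(w, X, y):
    """Number of training points that w misclassifies."""
    return sum(1 for xi, yi in zip(X, y) if predict(w, xi) != yi)

def pocket_pla(X, y, max_updates=200, seed=0):
    """Pocket PLA: random corrections, remembering the best weights seen so far."""
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    best_w, best_err = list(w), count_errors(w, X, y)
    for _ in range(max_updates):
        wrong = [i for i in range(len(X)) if predict(w, X[i]) != y[i]]
        if not wrong:          # perfectly separated: Naive PLA would stop here too
            break
        i = rng.choice(wrong)  # pick a random misclassified point
        w = [wj + y[i] * xj for wj, xj in zip(w, X[i])]  # PLA correction step
        err = count_errors(w, X, y)
        if err < best_err:     # keep the better vector "in the pocket"
            best_w, best_err = list(w), err
    return best_w, best_err

# Toy data (hypothetical): the last point is noise lying between two +1 points,
# so no straight line can classify all five points correctly.
X_noisy = [[1.0, 2.0, 2.0], [1.0, 3.0, 1.0], [1.0, -1.0, -2.0],
           [1.0, -2.0, -1.0], [1.0, 2.5, 1.5]]
y_noisy = [1, 1, -1, -1, -1]
best_w, best_err = pocket_pla(X_noisy, y_noisy)
```

On this noisy set Naive PLA would loop forever, whereas the pocket version stops after `max_updates` corrections and returns a vector with a single error, which is the best any line can do here.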

Regression

Regression versus Classification:
Classification trees have dependent variables that are categorical and unordered. Regression trees have dependent variables that are continuous values or ordered whole values. Regression means to predict the output value using training data. Classification means to group the output into a class.

When it comes to how to figure out which is a classification problem and which is a regression problem, an easy way to think about it is to ask yourself if you are trying to predict which class (or category) something belongs to or are you trying to predict a value.

Predicting a class is classification (ham/spam, image of a cat/not an image of a cat, etc.). Predicting a value (a number) is regression (housing prices, tomorrow's temperature, etc.). Classification can be built on top of regression.

Linear Regression

Definition

Linear Regression.png

In statistics, linear regression is an approach for modeling the relationship between a scalar dependent variable y and one or more explanatory variables (or independent variables) denoted X. The case of one explanatory variable is called simple linear regression.

Applicable conditions

The data in the training set follow a roughly linear relationship, and the predicted output is a number.

How to use
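As a rough illustration of use, here is a minimal sketch of simple linear regression (one explanatory variable) fitted with the closed-form least-squares solution. The function name and the toy data are assumptions for this example; the courses also cover gradient descent and the multi-variable case, which are not shown here.

```python
def fit_simple_linear_regression(xs, ys):
    """Least-squares fit of y = slope * x + intercept for one explanatory variable."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance-like and variance-like sums around the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Toy data (hypothetical) lying exactly on y = 2x + 1.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]
slope, intercept = fit_simple_linear_regression(xs, ys)
prediction = slope * 5.0 + intercept  # predicted value for a new input x = 5
```

For these points the fit recovers slope 2 and intercept 1, so the prediction for x = 5 is 11.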

Logistic Regression

Definition

Logistic Regression.png

Logistic regression is the appropriate regression analysis to conduct when the dependent variable is dichotomous (binary). Like all regression analyses, the logistic regression is a predictive analysis. Logistic regression is used to describe data and to explain the relationship between one dependent binary variable and one or more nominal, ordinal, interval or ratio-level independent variables.

Applicable conditions

The predicted output is a class label.

How to use
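A minimal sketch of binary logistic regression trained with batch gradient descent on the cross-entropy loss. The learning rate, the number of epochs, the function names, and the toy data are assumptions chosen for this example rather than values taken from the courses.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(x, w, b):
    """Estimated probability that x belongs to class 1."""
    return sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))

def fit_logistic_regression(X, y, lr=0.5, epochs=5000):
    """Batch gradient descent on the mean cross-entropy loss; y holds 0/1 labels."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = predict_proba(xi, w, b) - yi  # derivative of the loss w.r.t. the score
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

# Toy data (hypothetical): one feature, class 1 for larger values.
X = [[0.5], [1.0], [1.5], [3.0], [3.5], [4.0]]
y = [0, 0, 0, 1, 1, 1]
w, b = fit_logistic_regression(X, y)
labels = [1 if predict_proba(x, w, b) > 0.5 else 0 for x in X]
```

The model outputs a probability; thresholding it at 0.5 turns it into a class label, which is how the classifier is built on top of a regression-style model.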

Comparison and further notes

Difference between linear regression and logistic regression:
In linear regression, the outcome (dependent variable) is continuous. It can have any one of an infinite number of possible values. In logistic regression, the outcome (dependent variable) has only a limited number of possible values. Logistic regression is used when the response variable is categorical in nature.

Generative Learning algorithms

Consider a classification problem in which we want to learn to distinguish between elephants (y = 1) and dogs (y = 0), based on some features of an animal. Given a training set, an algorithm like logistic regression or the perceptron algorithm (basically) tries to find a straight line—that is, a decision boundary—that separates the elephants and dogs. Then, to classify a new animal as either an elephant or a dog, it checks on which side of the decision boundary it falls, and makes its prediction accordingly.

Gaussian Discriminant Analysis model(GDA)

Definition

GDA.png

GDA is a method for data classification commonly used when the data can be approximated with a Normal distribution. You need a training set, i.e. a collection of data that has already been classified. These data are used to train the classifier and obtain a discriminant function that tells you which class a data point most probably belongs to.

Applicable conditions

Training data can be approximated with a Normal distribution.

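How to use

Comparison and further notes

Difference between GDA and Logistic Regression:
Gaussian discriminant analysis makes a strong assumption, whereas logistic regression makes a weak assumption. See Andrew Ng's lecture notes 2, page 6 of 14.

Regression models are discriminative models: they estimate the probability of the outcome directly from the feature values. For example, to decide whether an animal is a goat or a sheep, the discriminative approach first learns a model from historical data, then extracts the animal's features and predicts the probability that it is a goat and the probability that it is a sheep. Alternatively, we can first learn a goat model from the features of goats, and then a sheep model from the features of sheep. We then extract the features of the animal in question, evaluate them under the goat model and under the sheep model, and pick whichever model gives the larger probability.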
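The "one model per class" idea can be sketched with a one-dimensional GDA. This is a simplification made for illustration: a single real-valued feature, a variance shared by both classes, and hypothetical toy data; the multivariate version in the lecture notes uses mean vectors and a shared covariance matrix instead.

```python
import math

def fit_gda_1d(xs, ys):
    """Estimate the class prior, one mean per class, and a shared variance."""
    n = len(xs)
    phi = sum(ys) / n                          # P(y = 1)
    x0 = [x for x, y in zip(xs, ys) if y == 0]
    x1 = [x for x, y in zip(xs, ys) if y == 1]
    mu0, mu1 = sum(x0) / len(x0), sum(x1) / len(x1)
    var = sum((x - (mu1 if y == 1 else mu0)) ** 2 for x, y in zip(xs, ys)) / n
    return phi, mu0, mu1, var

def gaussian_pdf(x, mu, var):
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def predict_gda_1d(x, params):
    """Compare p(x | y) p(y) under each class model and return the larger one."""
    phi, mu0, mu1, var = params
    score1 = gaussian_pdf(x, mu1, var) * phi
    score0 = gaussian_pdf(x, mu0, var) * (1 - phi)
    return 1 if score1 > score0 else 0

# Toy data (hypothetical): y = 0 for goats, y = 1 for sheep, one measured feature.
xs = [1.0, 1.2, 0.8, 3.0, 3.2, 2.8]
ys = [0, 0, 0, 1, 1, 1]
params = fit_gda_1d(xs, ys)
```

Unlike logistic regression, nothing here fits a decision boundary directly; the boundary is implied by where the two class models give equal scores.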

Naive Bayes

Definition

Naive Bayes.png

It is a classification technique based on Bayes' Theorem with an assumption of independence among predictors. In simple terms, a Naive Bayes classifier assumes that the presence of a particular feature in a class is unrelated to the presence of any other feature.

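Applicable conditions

The training set data xi are discrete-valued.

How to use

It assumes that keywords are unrelated to one another, and is commonly used for classification problems such as spam filtering.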
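A minimal sketch of a Naive Bayes spam filter over word presence. The function names and the tiny corpus are hypothetical, and the add-one (Laplace) smoothing is an assumption added here so that a word never seen in one class does not force a zero probability.

```python
import math

def train_naive_bayes(docs, labels):
    """Estimate P(class) and P(word present | class) with add-one smoothing."""
    vocab = sorted({word for doc in docs for word in doc})
    prior, cond = {}, {}
    for c in sorted(set(labels)):
        class_docs = [set(doc) for doc, label in zip(docs, labels) if label == c]
        prior[c] = len(class_docs) / len(docs)
        cond[c] = {word: (sum(word in doc for doc in class_docs) + 1) / (len(class_docs) + 2)
                   for word in vocab}
    return vocab, prior, cond

def predict_naive_bayes(doc, model):
    """Pick the class with the largest log posterior, treating words as independent."""
    vocab, prior, cond = model
    words = set(doc)
    scores = {}
    for c in prior:
        score = math.log(prior[c])
        for word in vocab:
            p = cond[c][word]
            score += math.log(p if word in words else 1 - p)
        scores[c] = score
    return max(scores, key=scores.get)

# Tiny hypothetical corpus: each document is a list of keywords.
docs = [["win", "money", "now"], ["cheap", "money"],
        ["meeting", "tomorrow"], ["lunch", "tomorrow", "now"]]
labels = ["spam", "spam", "ham", "ham"]
model = train_naive_bayes(docs, labels)
```

The independence assumption shows up in `predict_naive_bayes`: the per-word log probabilities are simply added, with no term for interactions between keywords.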

Comparison and further notes

  1. GDA vs. Naive Bayes
    In GDA, the feature vectors x are continuous, real-valued vectors. Naive Bayes is a different learning algorithm in which the xi's are discrete-valued.

  2. Logistic regression vs. Naive Bayes
    Logistic regression is a discriminative classifier: it models the posterior P(class|x) directly from the data, or learns a direct map from inputs x to the class labels.
    In contrast, Naive Bayes, like discriminant analysis, is a generative classifier: it learns a model of the joint probability P(x, class) and makes its predictions using Bayes' rule.

Support Vector Machine (SVM)

SVM is a supervised machine learning algorithm which can be used for classification or regression problems. It uses a technique called the kernel trick to transform your data, and then based on these transformations it finds an optimal boundary between the possible outputs. Simply put, it does some extremely complex data transformations, then figures out how to separate your data based on the labels or outputs you've defined.

When it comes to computing the SVM classifier, there are three approaches: primal, dual and kernel.

Linear SVM

Margin: If the training data are linearly separable, we can select two parallel hyperplanes that separate the two classes of data, so that the distance between them is as large as possible. The region bounded by these two hyperplanes is called the "margin", and the maximum-margin hyperplane is the hyperplane that lies halfway between them.
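To make the margin concrete, the sketch below assumes the usual normalization in which the two parallel hyperplanes are w·x + b = 1 and w·x + b = -1, so that the distance between them is 2/||w||. The weights, bias, and two data points are hypothetical values chosen by hand, not the output of an SVM solver.

```python
import math

def margin_width(w):
    """Distance between the hyperplanes w.x + b = 1 and w.x + b = -1."""
    return 2.0 / math.sqrt(sum(wj * wj for wj in w))

def satisfies_margin(w, b, X, y):
    """Hard-margin constraint: every point has y * (w.x + b) >= 1."""
    return all(yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b) >= 1 - 1e-12
               for xi, yi in zip(X, y))

# Hand-picked example (hypothetical): one point per class, both on the margin.
w, b = [0.5, 0.5], -1.0
X = [[2.0, 2.0], [0.0, 0.0]]
y = [1, -1]
feasible = satisfies_margin(w, b, X, y)
width = margin_width(w)
```

Here the width equals the distance between the two points, 2·sqrt(2), because both lie exactly on the margin hyperplanes; maximizing the margin amounts to minimizing ||w|| subject to the constraint checked in `satisfies_margin`.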

Hard and soft margin:……

Non-linear SVM

The idea is to gain linear separability by mapping the data to a higher-dimensional space.
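A small sketch of that idea on XOR-style data, which no straight line in two dimensions can separate. The feature map (adding the product x1·x2 as a third coordinate) and the separating plane are assumptions picked by hand for this example; a real SVM would obtain the same effect implicitly through a kernel.

```python
def feature_map(x):
    """Map a 2-D point to 3-D by appending the product of its coordinates."""
    return [x[0], x[1], x[0] * x[1]]

def linear_classify(w, b, z):
    return 1 if sum(wj * zj for wj, zj in zip(w, z)) + b > 0 else -1

# XOR-style data: points with equal signs are +1, points with opposite signs are -1.
X = [[1.0, 1.0], [-1.0, -1.0], [1.0, -1.0], [-1.0, 1.0]]
y = [1, 1, -1, -1]

# In the mapped space the plane z3 = 0 separates the classes.
w, b = [0.0, 0.0, 1.0], 0.0
predictions = [linear_classify(w, b, feature_map(x)) for x in X]
```

After the mapping, a linear classifier in three dimensions reproduces all four labels, even though the original two-dimensional problem is not linearly separable.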

AdaBoost(Adaptive Boosting)

Definition

See Hsuan-Tien Lin, Chapters 7-8.

AdaBoost, short for "Adaptive Boosting", is a machine learning algorithm. It can be used in conjunction with many other types of learning algorithms to improve their performance. The output of the other learning algorithms ("weak learners") is combined into a weighted sum that represents the final output of the boosted classifier. AdaBoost is adaptive in the sense that subsequent weak learners are tweaked in favor of those instances misclassified by previous classifiers.

AdaBoost is sensitive to noisy data and outliers. In some problems it can be less susceptible to the overfitting problem than other learning algorithms. The individual learners can be weak, but as long as the performance of each one is slightly better than random guessing (e.g., their error rate is smaller than 0.5 for binary classification), the final model can be proven to converge to a strong learner.

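Usage

Derivation outline

AdaBoost is essentially a form of aggregation.
uniform blending or linear blending -> Bagging (Bootstrap Aggregation: resampling from the given D) -> boosting (focus on key examples, i.e. wrong predictions) -> re-weighting to obtain different g -> adaptive boosting algorithm (scale up incorrect examples -> different hypotheses)

Notes:
blending: aggregate after getting gt
learning: aggregate while getting gt

Bagging (Bootstrap Aggregation): obtain different g from the same data set.
Bootstrapping: resampling from the given D, i.e. re-sample N examples from D uniformly with replacement (draw examples one at a time, putting each one back).

AdaBoost:
scaling factor = sqrt((1 - e) / e), where e is the weighted error rate of gt. The weights u of the examples that gt misclassifies are multiplied by this factor and the weights of correctly classified examples are divided by it, so misclassified examples matter more when the next g is trained. When forming G, gt votes with weight ln(sqrt((1 - e) / e)), so a gt with a smaller error gets a larger vote.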
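The re-weighting scheme can be sketched with one-dimensional decision stumps as the weak learners. Everything named here is an assumption for illustration: the stump family, the helper names, the clipping of the error rate, and the ten-point toy data set. The scaling factor used is sqrt((1 - e) / e) with e the weighted error rate, as in Hsuan-Tien Lin's formulation.

```python
import math

def stump_predict(x, s, theta):
    """Decision stump: output s when x > theta, otherwise -s."""
    return s if x > theta else -s

def train_stump(xs, ys, u):
    """Return (weighted error, s, theta) of the best stump under example weights u."""
    values = sorted(set(xs))
    thresholds = [values[0] - 1.0] + [(a + b) / 2 for a, b in zip(values, values[1:])]
    best = None
    for theta in thresholds:
        for s in (1, -1):
            err = sum(ui for x, y, ui in zip(xs, ys, u) if stump_predict(x, s, theta) != y)
            if best is None or err < best[0]:
                best = (err, s, theta)
    return best

def adaboost(xs, ys, rounds):
    u = [1.0 / len(xs)] * len(xs)          # start from uniform example weights
    ensemble = []
    for _ in range(rounds):
        err, s, theta = train_stump(xs, ys, u)
        eps = min(max(err / sum(u), 1e-10), 1 - 1e-10)  # weighted error rate, clipped
        scale = math.sqrt((1 - eps) / eps)
        # Scale up the weights of mistakes, scale down the weights of correct examples.
        u = [ui * scale if stump_predict(x, s, theta) != y else ui / scale
             for x, y, ui in zip(xs, ys, u)]
        ensemble.append((math.log(scale), s, theta))    # vote alpha_t = ln(scale)
    return ensemble

def adaboost_predict(x, ensemble):
    total = sum(alpha * stump_predict(x, s, theta) for alpha, s, theta in ensemble)
    return 1 if total > 0 else -1

# Toy data (hypothetical): no single stump gets all ten labels right.
xs = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
ys = [1, 1, 1, -1, -1, -1, 1, 1, 1, -1]
ensemble = adaboost(xs, ys, rounds=3)
predictions = [adaboost_predict(x, ensemble) for x in xs]
```

On this data set the best single stump misclassifies three points, while the weighted vote of three stumps classifies all ten training points correctly.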

Decision Tree

Definition

Decision tree learning uses a decision tree as a predictive model to go from observations about an item (represented in the branches) to conclusions about the item's target value (represented in the leaves).
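To illustrate how a tree goes from observations to a conclusion, here is a hand-written tree stored as nested dictionaries. The features, branch values, and leaf labels are hypothetical, and the tree is written by hand rather than learned; learning algorithms such as C&RT choose the splits from data.

```python
# Internal nodes test one feature; branches are keyed by the observed value;
# leaves are plain strings holding the target value.
tree = {
    "feature": "outlook",
    "branches": {
        "sunny": {
            "feature": "windy",
            "branches": {True: "stay in", False: "play"},
        },
        "rainy": "stay in",
    },
}

def tree_predict(node, item):
    """Follow the branch matching each observation until a leaf is reached."""
    while isinstance(node, dict):
        node = node["branches"][item[node["feature"]]]
    return node

decision = tree_predict(tree, {"outlook": "sunny", "windy": False})
```

For this item the walk visits the `outlook` node, follows the `sunny` branch to the `windy` node, and ends at the leaf `play`.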

Random Forest

For more, see Hsuan-Tien Lin, "Machine Learning Techniques", Chapter 10.

Definition

Random forests or random decision forests are an ensemble learning method for classification, regression and other tasks, that operate by constructing a multitude of decision trees at training time and outputting the class that is the mode of the classes (classification) or mean prediction (regression) of the individual trees. Random decision forests correct for decision trees' habit of overfitting to their training set.

Random Forest = bagging + decision tree.

Usage

Derivation outline

Out-of-bag (OOB) error, also called the out-of-bag estimate, is a method of measuring the prediction error of random forests, boosted decision trees, and other machine learning models that use bootstrap aggregating to sub-sample the data used for training. Eoob is the self-validation of bagging/RF.
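The sketch below shows where out-of-bag examples come from. When N examples are drawn with replacement from a set of size N, each example is missed with probability (1 - 1/N)^N, which approaches 1/e (about 36.8%) for large N. The helper names, N = 10000, and the fixed seed are assumptions for this illustration.

```python
import math
import random

def bootstrap_indices(n, rng):
    """Draw n indices uniformly with replacement, as bagging does for each tree."""
    return [rng.randrange(n) for _ in range(n)]

def out_of_bag_indices(n, sampled):
    """Indices never drawn: these examples can validate the tree trained on `sampled`."""
    chosen = set(sampled)
    return [i for i in range(n) if i not in chosen]

rng = random.Random(0)
n = 10000
sampled = bootstrap_indices(n, rng)
oob = out_of_bag_indices(n, sampled)
oob_fraction = len(oob) / n
expected_fraction = (1 - 1 / n) ** n  # close to exp(-1)
```

Because roughly a third of the data is left out of every bootstrap sample, each example can be scored using only the trees that never saw it, and averaging those errors gives Eoob without a separate validation set.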

OutOfBag.png

Feature Selection

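The permutation method shuffles the values of one feature, recombines the shuffled values with the original values of the other features, and checks whether shuffling that feature has a significant effect on overall performance. If it does, the feature is important. See the figure below:

Permutation.png

In fact, as the figure below shows, feature selection for RF is done with permutation + OOB.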

image.png
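A minimal sketch of permutation importance. To keep it short, it departs from the random-forest setting in two ways that are assumptions of this example: the "model" is a fixed hypothetical predictor rather than a trained forest, and the error is measured on a small validation set rather than on out-of-bag examples.

```python
import random

def mse(model, X, y):
    return sum((model(row) - target) ** 2 for row, target in zip(X, y)) / len(X)

def permutation_importance(model, X, y, feature, repeats=10, seed=0):
    """Average increase in error after shuffling one feature column."""
    rng = random.Random(seed)
    baseline = mse(model, X, y)
    increases = []
    for _ in range(repeats):
        column = [row[feature] for row in X]
        rng.shuffle(column)  # break the link between this feature and the target
        X_perm = [row[:feature] + [value] + row[feature + 1:]
                  for row, value in zip(X, column)]
        increases.append(mse(model, X_perm, y) - baseline)
    return sum(increases) / repeats

# Hypothetical data: the target depends on feature 0 only; feature 1 is irrelevant.
X_val = [[float(i), float((7 * i) % 11)] for i in range(50)]
y_val = [3.0 * row[0] + 0.1 * ((i % 3) - 1) for i, row in enumerate(X_val)]
model = lambda row: 3.0 * row[0]  # stand-in for an already trained model

importance = [permutation_importance(model, X_val, y_val, f) for f in range(2)]
```

Shuffling feature 0 raises the error sharply, while shuffling feature 1 leaves it unchanged because the stand-in model never reads that column, so feature 0 is ranked as the important one.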

Gradient boosted decision tree

Definition

Gradient boosting is a machine learning technique for regression and classification problems, which produces a prediction model in the form of an ensemble of weak prediction models, typically decision trees.

How to use

Derivation

Gradient boosted decision tree.png
image.png

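Gradient Boosted Decision Tree - GBDT
Overall idea: sn is the value predicted from the data xn and gt, yn is the true value, and yn - sn is the residual. We take the inputs x together with the residuals y - s as a new data set, and use a new gt (still a decision tree) to split the data and predict again, repeating until the residual approaches 0, i.e. until the predicted value is very close to the true value.

  1. A is the underlying regression algorithm, which uses squared error; we choose a C&RT decision tree to produce our gt. Put simply, A = C&RT, and its output is gt.
  2. After the tree has split the data, at is the slope of a one-variable linear regression fitted on the pairs (gt(xn), yn - sn); this is where the regression algorithm comes in. Here gt(xn) is the output of the decision tree gt for xn, and yn - sn is the residual.
  3. s (score) = s + at*gt(xn), where s is the current predicted value, accumulated from the linear regression steps and X.

Then take the difference between this predicted value and the true yn to obtain the new residual.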
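The residual-fitting loop can be sketched as follows. As a simplification for this example, a depth-1 regression tree (a stump on a single feature) stands in for C&RT; the function names and toy data are hypothetical. The step size at is computed as the one-variable least-squares slope of the residuals on gt(xn).

```python
def fit_regression_stump(xs, residuals):
    """Depth-1 regression tree: choose the split with the lowest squared error."""
    values = sorted(set(xs))
    best = None
    for a, b in zip(values, values[1:]):
        theta = (a + b) / 2
        left = [r for x, r in zip(xs, residuals) if x <= theta]
        right = [r for x, r in zip(xs, residuals) if x > theta]
        left_mean, right_mean = sum(left) / len(left), sum(right) / len(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, theta, left_mean, right_mean)
    _, theta, left_mean, right_mean = best
    return lambda x: left_mean if x <= theta else right_mean

def gbdt_fit(xs, ys, rounds):
    s = [0.0] * len(xs)                    # current scores s_n
    model = []
    for _ in range(rounds):
        residuals = [y - sn for y, sn in zip(ys, s)]
        g = fit_regression_stump(xs, residuals)
        gx = [g(x) for x in xs]
        denom = sum(v * v for v in gx)
        # One-variable linear regression of the residuals on g_t(x_n).
        alpha = sum(v * r for v, r in zip(gx, residuals)) / denom if denom > 0 else 0.0
        s = [sn + alpha * v for sn, v in zip(s, gx)]
        model.append((alpha, g))
    return model

def gbdt_predict(x, model):
    return sum(alpha * g(x) for alpha, g in model)

# Toy regression data (hypothetical): a step from about 1 to about 3.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [1.0, 1.2, 0.9, 3.1, 2.9, 3.0]
model = gbdt_fit(xs, ys, rounds=5)
train_mse = sum((gbdt_predict(x, model) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)
```

Each round can only lower (or keep) the training squared error, because choosing at = 0 is always allowed and at is picked to minimize that error along gt.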

GBDT.png

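This article does not cover neural networks for now.

References

References used in the text are linked inline in the original post. Other references and suggested further reading are listed below: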

  1. An Introduction to Gradient Descent and Linear Regression
  2. Gradient Descent For Machine Learning
  3. How to select kernel for SVM
  4. An idiot's guide to Support vector machines(SVMs)