English notes:
Machine Learning
- Grew out of work in AI
- New capability for computers
Examples:
- Database mining
Large data sets from growth of automation/web
E.g. web click data, medical records, biology, engineering,
Natural Language Processing (NLP), Computer Vision.
- Self-customizing programs
E.g. autonomous helicopter, handwriting recognition, most of Natural Language Processing (NLP), Computer Vision.
- Understanding human learning (brain, real AI)
What is machine learning
Definition:
Two definitions of Machine Learning are offered. Arthur Samuel described it as: "the field
of study that gives computers the ability to learn without being explicitly programmed." This
is an older, informal definition.
Tom Mitchell provides a more modern definition: "A computer program is said to learn from
experience E with respect to some class of tasks T and performance measure P, if its
performance at tasks in T, as measured by P, improves with experience E."
Example: playing checkers.
E = the experience of playing many games of checkers
T = the task of playing checkers.
P = the probability that the program will win the next game.
In general, any machine learning problem can be assigned to one of two broad
classifications:
Supervised learning and Unsupervised learning.
Supervised Learning
In supervised learning, we are given a data set and already know what the correct output
should look like, on the assumption that there is a relationship between the input and the output.
Supervised learning problems are categorized into "regression" and "classification" problems.
In a regression problem, we are trying to predict results within a continuous output, meaning
that we are trying to map input variables to some continuous function. In a classification
problem, we are instead trying to predict results in a discrete output. In other words, we are
trying to map input variables into discrete categories.
Example 1:
Given data about the size of houses on the real estate market, try to predict their price. Price
as a function of size is a continuous output, so this is a regression problem.
We could turn this example into a classification problem by instead making our output about
whether the house "sells for more or less than the asking price." Here we are classifying the
houses based on price into two discrete categories.
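Example 1 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the training data is made up, a straight-line (least-squares) fit is one possible choice of model, and the names fit_line, predict_price and sells_above_asking are hypothetical helpers that do not come from the notes.

```python
def fit_line(sizes, prices):
    """Least-squares fit of price = slope * size + intercept."""
    n = len(sizes)
    mean_x = sum(sizes) / n
    mean_y = sum(prices) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(sizes, prices))
    sxx = sum((x - mean_x) ** 2 for x in sizes)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up training data: size in square feet, price in thousands of dollars.
sizes = [1000, 1500, 2000, 2500, 3000]
prices = [200, 290, 410, 500, 600]

slope, intercept = fit_line(sizes, prices)


def predict_price(size):
    """Regression: the output is a continuous value."""
    return slope * size + intercept


def sells_above_asking(sale_price, asking_price):
    """Classification variant: the output is one of two discrete categories."""
    return sale_price > asking_price


print(predict_price(1800))
print(sells_above_asking(predict_price(1800), asking_price=350))
```

The same input data serves both problems. Only the kind of output changes: a number for regression, a yes/no label for classification.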
Example 2:
(a) Regression - Given a picture of a person, we have to predict their age on the basis of
that picture.
(b) Classification - Given a patient with a tumor, we have to predict whether the tumor is
malignant or benign.
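A minimal sketch of the classification case in Example 2(b), assuming a single input feature (tumor size) and a logistic regression model trained by gradient descent. The notes do not prescribe a model, so this is one possible choice. The data, the learning rate, the number of epochs and the helper names are all made up for illustration.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(xs, ys, lr=0.1, epochs=5000):
    """Fit p(malignant | size) = sigmoid(w * size + b) by batch gradient descent."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Prediction error for every training example.
        errors = [sigmoid(w * x + b) - y for x, y in zip(xs, ys)]
        grad_w = sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = sum(errors) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# Made-up labeled data: tumor size in cm, label 1 = malignant, 0 = benign.
sizes = [1.0, 1.5, 2.0, 2.5, 3.5, 4.0, 4.5, 5.0]
labels = [0, 0, 0, 0, 1, 1, 1, 1]

w, b = train_logistic(sizes, labels)


def predict(size):
    """Return a discrete label together with the estimated probability."""
    p = sigmoid(w * size + b)
    return ("malignant" if p >= 0.5 else "benign"), p


print(predict(1.0))
print(predict(5.0))
```

The model produces a probability, but the final answer is one of two discrete categories, which is what makes this a classification problem.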
Unsupervised Learning
Unsupervised learning allows us to approach problems with little or no idea what our results
should look like. We can derive structure from data where we don't necessarily know the
effect of the variables.
We can derive this structure by clustering the data based on relationships among the variables
in the data.
With unsupervised learning there is no feedback based on the prediction results.
Example:
Clustering: Take a collection of 1,000,000 different genes, and find a way to automatically
group these genes into groups that are somehow similar or related by different variables,
such as lifespan, location, roles, and so on.
Non-clustering: The "Cocktail Party Algorithm" allows you to find structure in a chaotic
environment (i.e. identifying individual voices and music from a mesh of sounds at a cocktail
party).
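The clustering example can be sketched with k-means, one common clustering algorithm (the notes do not name a specific one). The assumptions here: each gene is described by two made-up numeric features, the number of groups k is chosen by hand, and the first k points are used as starting centroids so the run is deterministic.

```python
def kmeans(points, k, iterations=20):
    """Plain k-means: alternate between assigning points and moving centroids."""
    # Assumption: the first k points serve as the initial centroids.
    centroids = [list(p) for p in points[:k]]
    assignments = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        for i, p in enumerate(points):
            distances = [sum((a - c) ** 2 for a, c in zip(p, centroid))
                         for centroid in centroids]
            assignments[i] = distances.index(min(distances))
        # Update step: each centroid moves to the mean of its members.
        for j in range(k):
            members = [p for p, a in zip(points, assignments) if a == j]
            if members:
                centroids[j] = [sum(dim) / len(members) for dim in zip(*members)]
    return assignments, centroids


# Made-up unlabeled data: two numeric features per gene.
genes = [(1.0, 1.2), (8.0, 8.1), (0.8, 1.0), (1.1, 0.9), (8.2, 7.9), (7.9, 8.3)]

assignments, centroids = kmeans(genes, k=2)
print(assignments)
print(centroids)
```

No labels are supplied at any point. The groups come only from the distances between the data points, so there is no feedback telling the algorithm whether a grouping is "correct".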
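Chinese notes:
Introduction to Machine Learning
Origin: machine learning grew out of work in AI (artificial intelligence) and is a new capability for computers.
Examples of machine learning applications:
1. Data mining
Data sources: large data sets produced by automation and the web, e.g. web click data, medical records, biology, engineering, natural language processing, computer vision.
2. Self-customizing programs
Drones (autonomous helicopters), handwriting recognition, computer vision, most of natural language processing.
3. Learning about human behavior
E.g. simulating the human brain, real AI.
One way to think about it: all machine learning comes down to
Supervised learning: roughly, a person teaches the machine how to do something.
Unsupervised learning: the machine learns to do something on its own.
Others: reinforcement learning, recommender systems.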
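Machine learning falls into two broad types:
1. Supervised learning
2. Unsupervised learning
Machine learning means gaining experience by working on a task, then continuing to learn from that experience over many iterations, until the program meets some performance standard and can perform classification or regression.
Regression problem: predicting the trend of the data by fitting a curve. The example here fits the relationship between house price and house area.
Classification problem: a machine learning algorithm assigns the data to different classes, each class representing a group.
Classification example:
A machine learning algorithm uses tumor size to estimate the probability that a tumor is benign or malignant, and the diagnosis is based on that probability.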
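Difference between supervised and unsupervised learning:
Supervised learning: every data point comes labeled with the class it belongs to.
Unsupervised learning: the data points are unlabeled, and the machine learning algorithm has to group them by itself.
Supervised learning: divided into regression problems (continuous output) and classification problems (discrete output).
Unsupervised learning: clustering problems.
Examples:
1. House price versus area
Supervised learning, regression problem.
2. House price versus whether the house sells
Supervised learning, classification problem.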
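Unsupervised learning:
Lets us supply only the data, without knowing which class each data point belongs to.
Main problems:
Clustering problems:
Examples:
1. Computing clusters. A machine learning algorithm works out which computers tend to work together, so that the data center can run more efficiently.
2. Friend circles. Friends are grouped into smaller social circles based on how often they email each other and interact on social media.
3. Market segmentation. A machine learning algorithm divides market data into different segments, so that the market can be served more efficiently.
4. Galaxy grouping. A machine learning algorithm groups stars into galaxies, which has contributed to theories of galaxy formation.
Non-clustering problems:
Cocktail party algorithm:
Detects and separates the data belonging to different sources in a complex environment.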