[Translation] Python machine learning for beginners: a four-step resource roundup

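Translated from: Four steps to master machine learning with python (including free books & resources), from LernPython. The article itself is poor, but its resource roundup is decent, so this post serves as a bookmark for now. I plan to translate some of the better books and papers later, and anyone is welcome to join in.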

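To understand and study machine learning, you first need to master Python or R, both of which are languages similar to C, Java, and PHP (translator: they are very different, actually). Python and R are both relatively young (translator: I don't follow, Python is hardly young) and more high-level, with no need to understand the low-level details (translator: ?), so both are easy to learn. What makes Python stronger is that it can handle more kinds of problems, such as machine learning, algorithms, and images, whereas R is limited to data processing and analysis. Python has a broader range of applications, such as the backend framework Django (translator: the original reads 'Hosting websites: Jango'), natural language processing (translator: the original reads 'natural language proecssing'; the author is careless, this is NLP), and website API access. Python is also more like C (translator: nonsense), so it is very popular now.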

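The Russian author's original article contains quite a few errors. I have corrected them according to my own understanding, for reference only. Grammar and wording errors are fixed directly; where the original author's content is wrong, I keep it as written and comment on it in parentheses.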

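Four steps for beginners doing machine learning with Python

  • Learn the Python basics; there are books, MOOCs, and videos.
  • To process data, you need to know some modules, such as Pandas, NumPy, Matplotlib, and Natural Language Processing.
  • Next you have to collect data, either through an API or by scraping websites directly. Web scraping module: BeautifulSoup (translator: this should be Scrapy; BS is an HTML/XML parser). The collected data is what we use to train algorithms.
  • The final step is to learn the relevant ML algorithms and the Scikit-learn tool.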

1. Learn Python

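The simplest, most direct way to learn Python is to sign up for an account on Codecademy and learn the basics there. LearnPythonTheHardWay is a classic site recommended by many programmers. Byte of Python is well worth studying. The Python community also maintains a list of Python learning resources for beginners. Think Python, a book published by O'Reilly, is available as a free download. Finally, Introduction to Python for Econometrics, Statistics and Data Analysis also covers much of the Python basics.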

2. Import modules

The most important modules and tools for machine learning are NumPy, Pandas, Matplotlib, and IPython. The book Data Analysis with Open Source Tools touches on all of them. Introduction to Python for Econometrics, Statistics and Data Analysis, mentioned above, covers them as well. There is also the book Python for Data Analysis: Data Wrangling with Pandas, NumPy, and IPython. Some free resources are listed below:

3. Scrape and mine data

Once you have mastered the Python basics, the next thing is to learn how to collect data, that is, web scraping. Sites such as Twitter and LinkedIn provide APIs for retrieving text data. Some good books on the subject: Mining the Social Web (free) and Web Scraping with Python: Collecting Data from the Modern Web.
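As a rough illustration of what "scraping" means in practice, here is a minimal sketch that pulls every link out of an HTML page. It deliberately uses only the standard library's html.parser rather than BeautifulSoup or Scrapy, and the page content (SAMPLE_HTML) and the LinkCollector class are made up for this example; a real scraper would first download the page over HTTP.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect (href, text) pairs for every <a> tag that has an href."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None   # href of the <a> tag currently open, if any
        self._text = []     # text fragments seen inside that tag

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


# Hypothetical page content; a real scraper would download this first.
SAMPLE_HTML = """
<html><body>
  <h1>Reading list</h1>
  <a href="/books/think-python">Think Python</a>
  <a href="/books/byte-of-python">Byte of <b>Python</b></a>
</body></html>
"""

parser = LinkCollector()
parser.feed(SAMPLE_HTML)
for href, text in parser.links:
    print(href, "->", text)
```

Dedicated libraries handle malformed markup, encodings, and crawling far better than this; the sketch only shows the basic idea of turning a page into structured records.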

Finally, this text data has to be turned into numerical data with NLP techniques: Natural Language Processing with Python. Images and video call for computer vision (CV) techniques; some good resources: Programming Computer Vision with Python (free), Programming Computer Vision with Python: Tools and algorithms for analyzing images, and Practical Python and OpenCV.
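One simple way to turn text into numbers is a bag-of-words count vector: build a vocabulary from all documents, then count how often each vocabulary word appears in each document. The sketch below is only an illustration using the standard library; the tokenize and bag_of_words helpers and the sample sentences are invented for this example, and real NLP toolkits do considerably more (stemming, stop words, weighting).

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def bag_of_words(documents):
    """Return (vocabulary, count vectors) for a list of text documents."""
    tokenized = [tokenize(doc) for doc in documents]
    # The vocabulary is the sorted set of all distinct tokens.
    vocabulary = sorted({tok for doc in tokenized for tok in doc})
    vectors = []
    for doc in tokenized:
        counts = Counter(doc)
        # One count per vocabulary word, in vocabulary order.
        vectors.append([counts[word] for word in vocabulary])
    return vocabulary, vectors


docs = [
    "Python is easy to learn",
    "Machine learning with Python",
    "Learn machine learning",
]
vocab, vectors = bag_of_words(docs)
print(vocab)
for row in vectors:
    print(row)
```

Each document becomes a fixed-length list of integers, which is the kind of input a learning algorithm can work with.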

Some examples of Python scrapers:

4. Machine learning

Machine learning can be divided into four parts: classification, clustering, regression, and dimensionality reduction.
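To make the four categories concrete, here is a small sketch that runs one Scikit-learn estimator per category on the iris dataset bundled with the library. The choice of estimators (logistic regression, k-means, linear regression, PCA) and of the dataset is mine, not the original article's, and it assumes scikit-learn is installed.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression, LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)  # 150 samples, 4 features, 3 classes

# Classification: predict the species label from the four measurements.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)
clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
accuracy = clf.score(X_test, y_test)

# Clustering: group the samples without looking at the labels.
cluster_ids = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

# Regression: predict petal width (column 3) from the other three columns.
reg = LinearRegression().fit(X[:, :3], X[:, 3])
r2 = reg.score(X[:, :3], X[:, 3])

# Dimensionality reduction: project the four features onto two components.
X_2d = PCA(n_components=2).fit_transform(X)

print(f"classification accuracy: {accuracy:.2f}")
print(f"number of clusters found: {len(set(cluster_ids))}")
print(f"regression R^2 (training data): {r2:.2f}")
print(f"reduced shape: {X_2d.shape}")
```

Classification and regression are supervised (they use known targets), while clustering and dimensionality reduction work on the features alone.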


Machine learning in Python

The Scikit-learn website has many guides; here are some others:

Books:

Machine learning blogs and courses

Online courses: Collection of links. MOOCs: machine learning, Data Analyst Nanodegree.
Here are some blogs.

Machine learning theory

Books:

There is also: Watch 15 hours theory of machine learning!

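The more I read, the less I felt like translating; the article really has little substance, so I will simply list the resources. Below is the reading list recommended by Lin Dahua (林达华), an MIT PhD and a leading ML researcher.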

Machine Learning

Pattern Recognition and Machine Learning

By Christopher M. Bishop
A new treatment of classic machine learning topics, such as classification, regression, and time series analysis from a Bayesian perspective. It is a must-read for people who intend to perform research on Bayesian learning and probabilistic inference.

Graphical Models, Exponential Families, and Variational Inference

By Martin J. Wainwright and Michael I. Jordan
It is a comprehensive and brilliant presentation of three closely related subjects: graphical models, exponential families, and variational inference. This is the best manuscript that I have ever read on this subject. Strongly recommended to everyone interested in graphical models. The connections between various inference algorithms and convex optimization are clearly explained. Note: a pdf version of this book is freely available online.

Big Data: A Revolution That Will Transform How We Live, Work, and Think

By Viktor Mayer-Schönberger and Kenneth Cukier
A short but insightful manuscript that will motivate you to rethink how we should face the explosive growth of data in the new century.

Statistical Pattern Recognition (2nd/3rd Edition)

By Andrew R. Webb, and Keith D. Copsey
A well written book on pattern recognition for beginners. It covers basic topics in this field, including discriminant analysis, decision trees, feature selection, and clustering -- all are basic knowledge that researchers in machine learning or pattern recognition should understand.

Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond

By Bernhard Schölkopf and Alexander J. Smola
A comprehensive and in-depth treatment of kernel methods and support vector machines. It not only clearly develops the mathematical foundation, namely the reproducing kernel Hilbert space, but also gives a lot of practical guidance (e.g. how to choose or design kernels).

Mathematics

Topology (2nd Edition)

By James Munkres
A classic on topology for beginners. It provides a clear introduction to important concepts in general topology, such as continuity, connectedness, compactness, and metric spaces, which are the fundamentals that you have to grasp before embarking on more advanced subjects such as real analysis.

Introductory Functional Analysis with Applications

By Erwin Kreyszig
It is a very well written book on functional analysis that I would like to recommend to everyone who would like to study this subject for the first time. Starting from simple notions such as metrics and norms, the book gradually unfolds the beauty of functional analysis, exposing important topics including Banach spaces, Hilbert spaces, and spectral theory with a reasonable depth and breadth. Most important concepts needed in machine learning are covered by this book. The exercises are of great help in reinforcing your understanding.

Real Analysis and Probability (Cambridge Studies in Advanced Mathematics)

By R. M. Dudley
This is a dense text that combines real analysis and modern probability theory in 500+ pages. What I like about this book is its treatment that emphasizes the interplay between real analysis and probability theory. Also, the exposition of measure theory based on semi-rings gives deep insight into the algebraic structure of measures.

Convex Optimization

By Stephen Boyd, and Lieven Vandenberghe
A classic on convex optimization. Everyone I knew who had read this book liked it. The presentation style is very comfortable and inspiring, and it assumes only minimal prerequisites in linear algebra and calculus. Strongly recommended for any beginner in optimization. Note: the pdf of this book is freely available on Prof. Boyd's website.

Nonlinear Programming (2nd Edition)

By Dimitri P. Bertsekas
A thorough treatment of nonlinear optimization. It covers gradient-based techniques, Lagrange multiplier theory, and convex programming. Part of this book overlaps with Boyd's. Overall, it goes deeper and takes more effort to read.

Introduction to Smooth Manifolds

By John M. Lee
This is the book that I used to learn differential geometry and Lie group theory. It provides a detailed introduction to the basics of modern differential geometry -- manifolds, tangent spaces, and vector bundles. The connections between manifold theory and Lie group theory are also clearly explained. It also covers de Rham cohomology and Lie algebras, where the reader is invited to discover the beauty of linking geometry with algebra.

Modern Graph Theory

By Béla Bollobás
It is a modern treatment of this classical theory, which emphasizes the connections with other mathematical subjects -- for example, random walks and electrical networks. I found some of the messages conveyed by this book enlightening for my research on machine learning methods.

Probability Theory: A Comprehensive Course (Universitext)

By Achim Klenke
This book offers complete coverage of modern probability theory -- not only including traditional topics, such as measure theory, independence, and convergence theorems, but also introducing topics that are typically found in textbooks on stochastic processes, such as martingales, Markov chains, Brownian motion, Poisson processes, and stochastic differential equations. It is recommended as the main textbook on probability theory.

A First Course in Stochastic Processes (2nd Edition)

By Samuel Karlin, and Howard M. Taylor
A classic textbook on stochastic processes, which I think is particularly suitable for beginners without much background in measure theory. It provides complete coverage of many important stochastic processes in an intuitive way. Its development of Markov processes and renewal processes is enlightening.

Poisson Processes (Oxford Studies in Probability)

By J. F. C. Kingman
If you are interested in Bayesian nonparametrics, this is the book that you should definitely check out. This manuscript provides an unparalleled introduction to random point processes, including Poisson and Cox processes, and their deep theoretical connections with complete randomness.

Programming

Structure and Interpretation of Computer Programs (2nd Edition)

By Harold Abelson, Gerald Jay Sussman, and Julie Sussman
Timeless classic that must be read by all computer science majors. While some topics and the use of Scheme as the teaching language seem odd at first glance, the presentation of fundamental concepts such as abstraction, recursion, and modularity is so beautiful and insightful that you will never experience the like elsewhere.

Thinking in C++: Introduction to Standard C++ (2nd Edition)

By Bruce Eckel
While it is kind of old (written in 2000), I still recommend this book to all beginners learning C++. The thoughts underlying object-oriented programming are very clearly explained. It also provides comprehensive coverage of C++ at a well-tuned pace.

Effective C++: 55 Specific Ways to Improve Your Programs and Designs (3rd Edition)

By Scott Meyers
The Effective C++ series by Scott Meyers is a must for anyone who is serious about C++ programming. The items (rules) listed in this book convey the author's deep understanding of both C++ itself and modern software engineering principles. This edition reflects the latest updates in C++ development, including generic programming and the use of the TR1 library.

Advanced C++ Metaprogramming

By Davide Di Gennaro
Like it or hate it, meta-programming has played an increasingly important role in modern C++ development. If you asked what the key aspect is that distinguishes C++ from all other languages, I would say it is the unparalleled generic programming capability based on C++ templates. This book summarizes the advances in metaprogramming over the past decade. I believe it will take the place of Alexandrescu's "Modern C++ Design" (the book that introduced the Loki library) to become the bible for C++ meta-programming.

Introduction to Algorithms (2nd/3rd Edition)

By Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, and Clifford Stein
If you know nothing about algorithms, you will never understand computer science. This book is definitely a classic on algorithms and data structures that everyone who is serious about computer science must read. The contents of this book range from elementary topics such as classic sorting algorithms and hash tables to advanced topics such as maximum flow, linear programming, and computational geometry. It is a book for everyone. Every time I read it, I learn something new.

Design Patterns: Elements of Reusable Object-Oriented Software

By Erich Gamma, Richard Helm, Ralph Johnson, and John Vlissides
Textbooks on C++, Java, or other languages typically use toy examples (animals, students, etc) to illustrate the concept of OOP. This way, however, does not reflect the full strength of object oriented programming. This book, which has been widely acknowledged as a classic in software engineering, shows you, via compelling examples distilled from real world projects, how specific OOP patterns can vastly improve your code's reusability and extensibility.

Structured Parallel Programming: Patterns for Efficient Computation

By Michael McCool, James Reinders, and Arch Robison
Recent trends in hardware advancement have shifted from increasing CPU frequencies to increasing the number of cores. A significant implication of this change is that "the free lunch has come to an end" -- you have to explicitly parallelize your code in order to benefit from the latest progress in CPUs/GPUs. This book summarizes common patterns used in parallel programming, such as mapping, reduction, and pipelining -- all are very useful in writing parallel code.

Introduction to High Performance Computing for Scientists and Engineers

By Georg Hager and Gerhard Wellein
This book covers important topics that you should know when developing high performance computing programs. In particular, it introduces SIMD, memory hierarchies, OpenMP, and MPI. With this knowledge in mind, you will understand the factors that might influence the run-time performance of your code.

CUDA Programming: A Developer's Guide to Parallel Computing with GPUs

By Shane Cook
This book provides in-depth coverage of important aspects of CUDA programming -- a programming technique that can unleash the unparalleled power of GPU computation. With CUDA and an affordable GPU card, you can run in a matter of minutes a data analysis program that might otherwise require multiple servers running for hours.
