Machine Learning in a Week


Getting into machine learning (ml) can seem like an unachievable task from the outside.


And it definitely can be, if you attack it from the wrong end.


However, after dedicating one week to learning the basics of the subject, I found it to be much more accessible than I anticipated.


This article is intended to give others who’re interested in getting into ml a roadmap for getting started, drawing on the experience I gained during my intro week.




Before my machine learning week, I had been reading about the subject for a while, and had gone through half of Andrew Ng’s course on Coursera and a few other theoretical courses. So I had a tiny bit of conceptual understanding of ml, though I was completely unable to transfer any of my knowledge into code. This is what I wanted to change.


I wanted to be able to solve problems with ml by the end of the week, even though this meant skipping a lot of fundamentals and going for a top-down approach instead of bottom-up.


After asking for advice on Hacker News, I came to the conclusion that Python’s Scikit Learn module was the best starting point. This module gives you a wealth of algorithms to choose from, reducing the actual machine learning to a few lines of code.
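
To give a rough idea of what "a few lines of code" can look like, here is a minimal sketch. The iris toy dataset and the k-nearest-neighbors classifier are assumptions picked for illustration; they are not what I used during the week.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Load a small toy dataset that ships with scikit-learn (an illustrative choice).
X, y = load_iris(return_X_y=True)

# Hold out a quarter of the samples to check the model on unseen data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# The "actual machine learning": create a model, fit it, score it.
model = KNeighborsClassifier(n_neighbors=5)
model.fit(X_train, y_train)
accuracy = model.score(X_test, y_test)
print(f"Test accuracy: {accuracy:.2f}")
```

Swapping in another algorithm usually means changing only the line that creates the model, since scikit-learn estimators share the same fit and score interface.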


Monday: Learning some practicalities


I started off the week by looking for video tutorials which involved Scikit learn. I finally landed on Sentdex’s tutorial on how to use ml for investing in stocks, which gave me the necessary knowledge to move on to the next step.


The good thing about the Sentdex tutorials is that the instructor takes you through all the steps of gathering the data. As you go along, you realize that fetching and cleaning up the data can be much more time consuming than doing the actual machine learning. So the ability to write scripts that scrape data from files or crawl the web is an essential skill for aspiring machine learning geeks.


I re-watched several of the videos later on whenever I got stuck on a problem, and I’d recommend you do the same.


However, if you already know how to scrape data from websites, this tutorial might not be the perfect fit, as a lot of the videos revolve around data fetching. In that case, Udacity’s Intro to Machine Learning might be a better place to start.


Tuesday: Applying it to a real problem


On Tuesday, I wanted to see if I could use what I had learned to solve an actual problem. As another developer in my coding cooperative was working on Bank of England’s data visualization competition, I teamed up with him to check out the datasets the bank has released. The most interesting data was their household surveys. This is an annual survey the bank performs on a few thousand households, regarding money-related subjects.


The problem we decided to solve was the following:


Given a person’s education level, age and income, can the computer predict their gender?


I played around with the dataset, spent a few hours cleaning up the data, and used the Scikit Learn map to find a suitable algorithm for the problem.


We ended up with a success rate of around 63%, which isn’t impressive at all. But the machine did at least manage to guess a little better than flipping a coin, which would have given a success rate of 50%.
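
A comparison like this, model accuracy against a naive baseline, could be sketched roughly as follows. Everything here is an assumption for illustration: the data is synthetic (it is not the Bank of England survey), the label is made up with only a weak link to income, and logistic regression stands in for whichever algorithm the map suggests. `DummyClassifier` plays the role of the coin flip by always predicting the most frequent class.

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n = 2000

# Synthetic stand-in for the survey columns (hypothetical values and scales).
education = rng.integers(0, 5, size=n)       # ordinal education level 0..4
age = rng.integers(18, 80, size=n)           # age in years
income = rng.normal(30000, 10000, size=n)    # annual income
X = np.column_stack([education, age, income])

# Made-up binary label that depends only weakly on income, plus noise,
# so the features carry just a little signal.
noise = rng.normal(0, 1, size=n)
y = (0.5 * (income - 30000) / 10000 + noise > 0).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Baseline: always predict the most common class seen in training.
baseline = DummyClassifier(strategy="most_frequent").fit(X_train, y_train)
baseline_acc = baseline.score(X_test, y_test)

# Model: scale the features, then fit a logistic regression classifier.
model = make_pipeline(StandardScaler(), LogisticRegression())
model.fit(X_train, y_train)
model_acc = model.score(X_test, y_test)

print(f"Baseline accuracy: {baseline_acc:.2f}")
print(f"Model accuracy:    {model_acc:.2f}")
```

Reporting the baseline next to the model makes it easy to tell whether a number like 63% reflects real signal or just the class balance.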


Seeing results is like fuel to your motivation, so I’d recommend doing this yourself once you have a basic grasp of how to use Scikit Learn.


It’s a pivotal moment when you realize that you can start using ml to solve real-life problems.


Wednesday: From the ground up


After playing around with various Scikit Learn modules, I decided to try and write a linear regression algorithm from the ground up.


I wanted to do this because I felt (and still feel) that I really don’t understand what’s happening under the hood.


Luckily, the Coursera course goes into detail on how a few of the algorithms work, which was of great use at this point. More specifically, it describes the underlying concepts of using linear regression with gradient descent.
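
For reference, linear regression with gradient descent can be sketched in a few lines of Python. This is a minimal illustration under my own assumptions (a single feature, a mean squared error cost, a hand-picked learning rate and iteration count, and the hypothetical function name `fit_linear_regression`), not the implementation I wrote that day.

```python
import numpy as np


def fit_linear_regression(x, y, learning_rate=0.05, n_iterations=2000):
    """Fit y = w * x + b (approximately) by batch gradient descent."""
    w, b = 0.0, 0.0
    n = len(x)
    for _ in range(n_iterations):
        # Prediction error of the current line on every sample.
        error = (w * x + b) - y
        # Gradients of the cost (1 / (2n)) * sum(error ** 2) w.r.t. w and b.
        grad_w = (error * x).sum() / n
        grad_b = error.sum() / n
        # Step downhill along each gradient.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


# Toy data generated from y = 3x + 2, so the fit should approach w = 3, b = 2.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 3.0 * x + 2.0
w, b = fit_linear_regression(x, y)
print(f"w = {w:.3f}, b = {b:.3f}")
```

If the learning rate is set too high, the updates overshoot and the parameters diverge instead of settling, which is worth trying once to see what the algorithm is doing.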


This has definitely been the most effective learning technique, as it forces you to understand the steps that are going on ‘under the hood’. I strongly recommend doing this at some point.


I plan to rewrite my own implementations of more complex algorithms as I go along, but I prefer doing this after I’ve played around with the respective algorithms in Scikit Learn.


Thursday: Start competing


On Thursday, I started doing Kaggle’s introductory tutorials. Kaggle is a platform for machine learning competitions, where you can submit solutions to problems released by companies or organizations.


I recommend trying out Kaggle once you have a little bit of theoretical and practical understanding of machine learning. You’ll need this in order to start using Kaggle. Otherwise, it will be more frustrating than rewarding.


The Bag of Words tutorial guides you through every step you need to take in order to enter a submission to a competition, and also gives you a brief and exciting introduction to natural language processing (NLP). I ended the tutorial with a much higher interest in NLP than I had when entering it.
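
The bag-of-words idea itself is simple: each text becomes a vector of word counts, ignoring word order. Here is a minimal sketch using Scikit Learn’s `CountVectorizer`; the two example sentences are made up for illustration and are not taken from the Kaggle tutorial.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Two made-up "reviews" standing in for real text data.
reviews = [
    "a great movie with a great cast",
    "a boring movie",
]

# Learn the vocabulary and count how often each word occurs per review.
# The default tokenizer drops single-character tokens such as "a".
vectorizer = CountVectorizer()
counts = vectorizer.fit_transform(reviews)

# vocabulary_ maps each word to its column index; sort words by that index.
vocabulary = sorted(vectorizer.vocabulary_, key=vectorizer.vocabulary_.get)
print(vocabulary)
print(counts.toarray())
```

Each row of the resulting matrix describes one review and each column one word, so the matrix can be fed to a classifier like any other numeric feature table.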


Friday: Back to school


On Friday, I continued working on the Kaggle tutorials, and also started Udacity’s Intro to Machine Learning. I’m currently halfway through, and find it quite enjoyable.


It’s a lot easier than the Coursera course, as it doesn’t go into depth on the algorithms. But it’s also more practical, as it teaches you Scikit Learn, which is a whole lot easier to apply to the real world than writing algorithms from the ground up in Octave, as you do in the Coursera course.


The road ahead


Doing this for a week hasn’t just been great fun; it has also made me more aware of how useful machine learning is in society. The more I learn about it, the more areas I see where it can be used to solve problems.


If you’re interested in getting into machine learning, I strongly recommend setting aside a few days or evenings and simply diving into it.


Choose a top-down approach if you’re not ready for the heavy stuff, and get into problem solving as quickly as possible.


Good luck


