Make Your Own Neural Network

Introduction

  • I will have failed if I haven’t shown you how school level mathematics and simple computer recipes can be incredibly powerful - by making our own artificial intelligence mimicking the learning ability of human brains.

Part 1 - How They Work

  • A human may find it hard to do large sums very quickly but the process of doing it doesn’t require much intelligence at all.
    We can process the quite large amount of information that an image contains, and very successfully recognise what’s in it. This kind of task isn’t easy for computers - in fact it’s incredibly difficult.
  • When we don’t know exactly how something works we can try to estimate it with a model which includes parameters which we can adjust. If we didn’t know how to convert kilometres to miles, we might use a linear function as a model, with an adjustable gradient.
    A good way of refining these models is to adjust the parameters based on how wrong the model is compared to known true examples.
    Build a model with adjustable parameters → guess initial parameter values → correct the parameters based on the error against known true examples (the bigger the error, the bigger the correction) → repeat until the error is small enough (see the sketch below).
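    A minimal sketch of this loop in Python (not from the book), using the kilometres-to-miles model; the training pair, initial guess and learning rate are illustrative assumptions:
# refine the adjustable gradient of the model: miles = gradient * kilometres
kilometres = 100.0
true_miles = 62.14            # a known true example, for illustration
gradient = 0.5                # initial guess at the parameter
learning_rate = 0.1           # moderates each correction (see below)
for i in range(10):
    error = true_miles - gradient * kilometres
    # the bigger the error, the bigger the correction
    gradient += learning_rate * error / kilometres
    print(i, round(gradient, 4), round(error, 4))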

  • The way a neural network learns says something about perfectionism and the fear of making mistakes: nothing in the world is perfect, and errors tell us how far we are from being right.

  • Visualising data is often very helpful for getting a better understanding of training data, a feel for it, which isn’t easy to get just by looking at a list or table of numbers.

  • We want to use the error to inform the required change in a parameter.

  • We moderate the updates.
    This way we move in the direction that the training example suggests, but do so slightly cautiously, keeping some of the previous value which was arrived at through potentially many previous training iterations.
    The moderation can dampen the impact of those errors or noise.
    The moderating factor is often called a learning rate.
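    A minimal sketch of a moderated update (illustrative names, not from the book):
# keep most of the old value, moving only a fraction of the way
# towards the value this training example suggests
def moderated_update(old_value, suggested_value, learning_rate=0.3):
    return old_value + learning_rate * (suggested_value - old_value)

print(moderated_update(0.5, 0.7))   # 0.56: a cautious step towards 0.7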

  • The learning rate used in neural networks is a reminder that pushing too hard when learning wipes out what was learned before, and that a mistake shouldn’t be overcorrected.

  • Traditional computers processed data very much sequentially, and in pretty exact concrete terms. There is no fuzziness or ambiguity about their cold hard calculations. Animal brains, on the other hand, although apparently running at much slower rhythms, seemed to process signals in parallel, and fuzziness was a feature of their computation.

  • Observations suggest that neurons don’t react readily, but instead suppress the input until it has grown so large that it triggers an output. You can think of this as a threshold that must be reached before any output is produced.

  • The sigmoid function is much easier to do calculations with than other S-shaped functions.
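    A minimal sketch of the sigmoid, y = 1 / (1 + e^{-x}):
import numpy

def sigmoid(x):
    # S-shaped squashing function mapping any input into (0, 1)
    return 1.0 / (1.0 + numpy.exp(-x))

print(sigmoid(0.0))    # 0.5, the centre of the S-shape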

  • Interestingly, if only one of the several inputs is large and the rest small, this may be enough to fire the neuron. What’s more, the neuron can fire if some of the inputs are individually almost, but not quite, large enough because when combined the signal is large enough to overcome the threshold. In an intuitive way, this gives you a sense of the more sophisticated, and in a sense fuzzy, calculations that such neurons can do.

  • It is the weights that do the learning in a neural network, as they are iteratively refined to give better and better results.

  • The many calculations needed to feed a signal forward through a neural network can be expressed as matrix multiplication.
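    A minimal sketch of one feed-forward step as a matrix multiplication; the 3x3 weight matrix and input vector are illustrative:
import numpy

# W is the (hidden x input) weight matrix
W = numpy.array([[0.9, 0.3, 0.4],
                 [0.2, 0.8, 0.2],
                 [0.1, 0.5, 0.6]])
inputs = numpy.array([0.9, 0.1, 0.8])

# X = W . I gives the combined signal into each hidden node;
# the sigmoid activation then squashes it into the node's output
hidden_inputs = numpy.dot(W, inputs)
hidden_outputs = 1.0 / (1.0 + numpy.exp(-hidden_inputs))
print(hidden_outputs)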

  • We’re using the weights in two ways. Firstly, we use the weights to propagate signals forward from the input to the output layers of a neural network. Secondly, we use the weights to propagate the error backwards from the output into the network. This second use is called backpropagation.

  • Trying to vectorise the process: Being able to express a lot of calculations in matrix form makes it more concise to write down, and also allows computers to do all that work much more efficiently.

  • A matrix approach to propagating the errors back:
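    A minimal sketch, assuming the simplified form in which the normalising fractions are dropped and the transposed weight matrix splits the output errors back across the hidden nodes; the numbers are illustrative:
import numpy

# weights between the hidden and output layers (output x hidden)
W_ho = numpy.array([[2.0, 3.0],
                    [1.0, 4.0]])
errors_output = numpy.array([0.8, 0.5])

# errors_hidden = W_ho^T . errors_output
errors_hidden = numpy.dot(W_ho.T, errors_output)
print(errors_hidden)    # [2.1  4.4]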

  • Gradient descent is a really good way of working out the minimum of a function.
    To avoid ending up in the wrong valley, or function minimum, we train neural networks several times starting from different starting link weights.
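    A minimal sketch of gradient descent on a simple illustrative function, y = (x - 3)^2, restarted from several random starting points to stand in for different starting link weights:
import random

def slope(x):
    # gradient of the illustrative error function y = (x - 3)**2
    return 2.0 * (x - 3.0)

for _ in range(3):
    x = random.uniform(-10.0, 10.0)   # a fresh random starting point
    for _ in range(100):
        x -= 0.1 * slope(x)           # step downhill, moderated by 0.1
    print(round(x, 4))                # each run ends near the minimum at 3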

  • The final answer describing the slope of the error function, E = (target - actual)^2, with respect to a weight w_{jk}, so we can adjust that weight:
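    Reconstructed from the part-by-part description in the next bullet:

    \frac{\partial E}{\partial w_{jk}} = -(t_k - o_k) \cdot \text{sigmoid}\Big(\sum_j w_{jk} \cdot o_j\Big) \cdot \Big(1 - \text{sigmoid}\Big(\sum_j w_{jk} \cdot o_j\Big)\Big) \cdot o_j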

  • This is the key to training neural networks.
    It’s worth a second look at each part. The first part is simply the (target - actual) error. The sum expression inside the sigmoids is the signal into the final layer node, before the activation squashing function is applied. The last part is the output from the previous hidden layer node j.

  • The slope of the error function for the weights between the input and hidden layers:
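    Following the same pattern, with the backpropagated error e_j taking the place of the output error, and o_i the output from input node i (reconstructed to mirror the formula above):

    \frac{\partial E}{\partial w_{ij}} = -e_j \cdot \text{sigmoid}\Big(\sum_i w_{ij} \cdot o_i\Big) \cdot \Big(1 - \text{sigmoid}\Big(\sum_i w_{ij} \cdot o_i\Big)\Big) \cdot o_i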

  • The updated weight w_{jk} is the old weight adjusted by the negative of the error slope, moderated by a learning rate \alpha:
    In other words: adjust each weight according to the gradient of the squared error with respect to that weight.
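    Written out:

    w_{jk}^{new} = w_{jk}^{old} - \alpha \cdot \frac{\partial E}{\partial w_{jk}}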
  • A very flat activation function is problematic because we use the gradient to learn new weights.
    To avoid saturating a neural network, we should try to keep the inputs small.
    We shouldn’t make the inputs too small either, because the gradient also depends on the incoming signal (o_j).
    A good recommendation is to rescale inputs into the range 0.0 to 1.0. Some will add a small offset to the inputs, like 0.01.
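    A minimal sketch of this rescaling, assuming raw values in the range 0 to 255 (like greyscale pixels), which is an illustrative choice:
import numpy

raw = numpy.array([0.0, 128.0, 255.0])    # illustrative raw values

# squeeze into 0.01 .. 1.00: scale down to 0.99, then add the small
# 0.01 offset so no input is exactly zero (a zero o_j kills the update)
scaled = (raw / 255.0 * 0.99) + 0.01
print(scaled)    # [0.01  0.5069...  1.0]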

  • The weights are initialised by randomly sampling from a range that is roughly the inverse of the square root of the number of links into a node. So if each node has 3 incoming links, the initial weights should be in the range ±1/√3 ≈ ±0.577. If each node has 100 incoming links, the weights should be in the range ±1/√100 = ±0.1.
    This is sampling from a normal distribution with mean zero and a standard deviation which is the inverse of the square root of the number of links into a node.
    This assumes quite a few things which may not be true, such as a particular activation function (the rule of thumb was derived for alternatives like tanh()) and a specific distribution of the input signals.
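    A minimal sketch of this initialisation with numpy; the layer sizes are illustrative, and the (hidden x input) shape matches the feed-forward sketch earlier:
import numpy

input_nodes, hidden_nodes = 100, 30

# normal distribution: mean 0.0, standard deviation 1/sqrt(incoming links)
wih = numpy.random.normal(0.0, pow(input_nodes, -0.5),
                          (hidden_nodes, input_nodes))
print(wih.shape, round(float(wih.std()), 3))   # std should be near 0.1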

Part 2 - DIY with Python

  • Let’s sketch out what a neural network class should look like. We know it should have at least three functions:
    initialisation - to set the number of input, hidden and output nodes
    train - refine the weights after being given a training set example to learn from
    query - give an answer from the output nodes after being given an input
# neural network class definition
class neuralNetwork:

    # initialise the neural network
    def __init__(self):
        pass

    # train the neural network
    def train(self):
        pass

    # query the neural network
    def query(self):
        pass
  • Good programmers, computer scientists and mathematicians try to create general code rather than specific code whenever they can.

  • A good technique is to start small and grow the code, finding and fixing problems along the way:

    # initialise the neural network
    def __init__(self, inputnodes, hiddennodes, outputnodes, learningrate):
        # set number of nodes in each input, hidden, output layer
        self.inodes = inputnodes
        self.hnodes = hiddennodes
        self.onodes = outputnodes
        # learning rate
        self.lr = learningrate

# number of input, hidden and output nodes
input_nodes = 3
hidden_nodes = 3
output_nodes = 3
# learning rate is 0.3
learning_rate = 0.3
# create instance of neural network
n = neuralNetwork(input_nodes, hidden_nodes, output_nodes, learning_rate)
  • Neural networks should find features or patterns in the input which can be expressed in a shorter form than the input itself. So by choosing a value smaller than the number of inputs, we force the network to try to summarise the key features. However if we choose too few hidden layer nodes, then we restrict the ability of the network to find sufficient features or patterns. We’d be taking away its ability to express its own understanding of the training data.

  • There isn’t a perfect method for choosing how many hidden nodes there should be for a problem. Indeed there isn’t a perfect method for choosing the number of hidden layers either. The best approaches, for now, are to experiment until you find a good configuration for the problem you’re trying to solve.

  • Overfitting is something to beware of across many different kinds of machine learning, not just neural networks.
    A neural network is only one kind of machine learning.
    Over-learning reduces openness to new things and makes you rigid.

  • Neural network learning is a random process at heart; it can sometimes not work so well, and sometimes work really badly.

  • Do the testing experiment many times for each combination of learning rate and epochs to minimise the effect of the randomness that is inherent in gradient descent.

  • The hidden layers are where the learning happens. Actually, it’s the link weights before and after the hidden nodes that do the learning.
    You can’t learn more than the network’s learning capacity, but you can change the network’s shape to increase that capacity.

  • Question: how would the code need to change to allow setting the number of hidden layers and the number of nodes in each hidden layer? A possible direction is sketched below.
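    One possible direction, a minimal sketch only (not from the book; the function name and sizes are illustrative): keep a list of hidden layer sizes and build one weight matrix per pair of adjacent layers.
import numpy

def make_weight_matrices(input_nodes, hidden_layers, output_nodes):
    # hidden_layers is a list such as [100, 50]: one entry per hidden layer
    sizes = [input_nodes] + hidden_layers + [output_nodes]
    # one weight matrix between each pair of adjacent layers,
    # initialised as before: normal(0, 1/sqrt(incoming links))
    return [numpy.random.normal(0.0, pow(sizes[i], -0.5),
                                (sizes[i + 1], sizes[i]))
            for i in range(len(sizes) - 1)]

weights = make_weight_matrices(784, [100, 50], 10)
print([w.shape for w in weights])   # [(100, 784), (50, 100), (10, 50)]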
