The linear regression algorithm
We begin our exploration of machine learning techniques with the linear regression algorithm. Our goal is to build a model that predicts the values of a dependent variable from the values of one or more independent variables.
The relationship between these two variables is linear; that is, if y is the dependent variable and x the independent one, then the linear relationship between the two variables will look like this: y = Ax + b.
The linear regression algorithm adapts to a great variety of situations; thanks to its versatility, it is used extensively in the applied sciences, for example, in biology and economics.
Furthermore, implementing this algorithm allows us to introduce, in a clear and understandable way, two important concepts of machine learning: the cost function and the gradient descent algorithm.
Data model
The first crucial step is to build our data model. We mentioned earlier that the relationship between our variables is linear, that is: y = Ax + b, where A and b are constants. To test our algorithm, we need data points in a two-dimensional space.
We start by importing the Python library NumPy:
import numpy as np
Then we define the number of points we want to draw:
number_of_points = 500
We initialize the following two lists:
x_point = []
y_point = []
These lists will contain the generated points.
We then set the two constants that will appear in the linear relation of y with x:
a = 0.22
b = 0.78
Via NumPy's random.normal function, we generate 500 random points around the regression line y = 0.22x + 0.78:
for i in range(number_of_points):
    # x is drawn from a Gaussian centered at 0 with standard deviation 0.5
    x = np.random.normal(0.0, 0.5)
    # y follows the linear relation, plus a small Gaussian noise term
    y = a*x + b + np.random.normal(0.0, 0.1)
    x_point.append([x])
    y_point.append([y])
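As a side note, the same cloud of points can be generated without an explicit loop; this vectorized variant is our own sketch, not part of the original walk-through:

# Hypothetical vectorized alternative: draw all the points at once.
# size=(number_of_points, 1) reproduces the column-vector layout of
# x_point and y_point built by the loop above.
x_point = np.random.normal(0.0, 0.5, size=(number_of_points, 1))
y_point = a * x_point + b + np.random.normal(0.0, 0.1, size=(number_of_points, 1))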
Finally, we view the generated points with matplotlib:
import matplotlib.pyplot as plt
plt.plot(x_point,y_point, 'o', label='Input Data')
plt.legend()
plt.show()
Cost functions and gradient descent
The machine learning algorithm that we want to implement with TensorFlow must predict values of y as a function of the x data, according to our data model. The linear regression algorithm will determine the values of the constants A and b (fixed for our data model), which are the true unknowns of the problem.
The first step is to import the TensorFlow library:
import tensorflow as tf
Then we define the unknowns A and B, using the TensorFlow tf.Variable class (we call the second variable B, with an uppercase letter, so that it does not overwrite the data-model constant b):
A = tf.Variable(tf.random_uniform([1], -1.0, 1.0))
The unknown factor A is initialized with a random value between -1 and 1, while the variable B is initially set to zero:
B = tf.Variable(tf.zeros([1]))
We then write the linear relationship that binds y to x:
y = A * x_point + B
Now we introduce the cost function: it takes the pair of parameter values A and B to be determined and returns a value that estimates how good those parameters are. In this example, our cost function is the mean squared error:
cost_function = tf.reduce_mean(tf.square(y - y_point))
It provides an estimate of the variability of the measures, or more precisely, of the dispersion of the values around the average value; a small value of this function corresponds to a better estimate of the unknown parameters A and B.
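To make this definition concrete, here is the same quantity written in plain NumPy; the mse helper is a hypothetical illustration of ours, not part of the chapter's code:

import numpy as np

def mse(A_val, B_val, x_point, y_point):
    # mean of the squared residuals between predictions and data
    y_pred = A_val * np.asarray(x_point) + B_val
    return np.mean(np.square(y_pred - np.asarray(y_point)))

# Example: the generating constants should give a small error (close to
# the noise variance 0.1**2), while a bad guess gives a large one.
print(mse(0.22, 0.78, x_point, y_point))
print(mse(-1.0, 0.0, x_point, y_point))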
To minimize cost_function, we use the gradient descent optimization algorithm. Given a mathematical function of several variables, gradient descent allows us to find a local minimum of this function. The technique is as follows:
Evaluate, at an arbitrary first point of the function's domain, the function itself and its gradient. The gradient indicates the direction of steepest ascent, so its opposite points towards a minimum.
Select a second point in the direction opposite to the gradient. If the function at this second point has a value lower than the value calculated at the first point, the descent can continue (a minimal sketch of these two steps follows).
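As a concrete illustration of these two steps, here is a minimal gradient descent sketch written in plain NumPy for our mean squared error cost. The gradient formulas are derived by hand, and the gradient_descent function is our own illustration, not the TensorFlow code used in this chapter:

import numpy as np

def gradient_descent(x, y, learning_rate=0.5, steps=20):
    # arbitrary first point in the (A, B) parameter space
    A_val, B_val = 0.0, 0.0
    x = np.asarray(x).ravel()
    y = np.asarray(y).ravel()
    for _ in range(steps):
        # residuals of the current line A_val*x + B_val
        error = A_val * x + B_val - y
        grad_A = 2.0 * np.mean(error * x)  # d(MSE)/dA
        grad_B = 2.0 * np.mean(error)      # d(MSE)/dB
        # step against the gradient, towards lower cost
        A_val -= learning_rate * grad_A
        B_val -= learning_rate * grad_B
    return A_val, B_val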
You can refer to the following figure for a visual explanation of the algorithm:
We also remark that gradient descent finds only a local minimum of the function. However, it can also be used in the search for a global minimum: once a local minimum has been found, we randomly choose a new starting point and repeat the process many times. If the number of minima of the function is limited and the number of attempts is very high, then there is a good chance that sooner or later the global minimum will be identified.
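A minimal sketch of this multi-start strategy, applied to a toy one-dimensional function (our own illustration; f below is chosen arbitrarily because it has two distinct local minima):

import numpy as np

def f(x):
    return x**4 - 3*x**2 + x

def grad_f(x):
    return 4*x**3 - 6*x + 1

best_x, best_val = None, float('inf')
for _ in range(20):
    x = np.random.uniform(-2.0, 2.0)   # random starting point
    for _ in range(200):
        x -= 0.01 * grad_f(x)          # plain gradient descent step
    if f(x) < best_val:                # keep the lowest minimum found
        best_x, best_val = x, f(x)
print("approximate global minimum at x =", best_x)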
Using TensorFlow, the application of this algorithm is very simple. The instructions are as follows:
optimizer = tf.train.GradientDescentOptimizer(0.5)
Here, 0.5 is the learning rate of the algorithm.
The learning rate determines how fast or slow we move towards the optimal weights. If it is very large, we may overshoot the optimal solution, and if it is too small, we will need too many iterations to converge to the best values.
An intermediate value (0.5) is provided here, but it must be tuned in order to improve the performance of the entire procedure.
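As a rough experiment (using the hypothetical gradient_descent sketch from earlier, not the TensorFlow code), one can observe both failure modes directly:

# Try a too-small, a reasonable, and a too-large learning rate.
# With 0.05 convergence is slow, with 0.5 it is fast, and with 1.5
# the iterates overshoot and diverge on this cost surface.
for lr in (0.05, 0.5, 1.5):
    A_est, B_est = gradient_descent(x_point, y_point, learning_rate=lr)
    print(lr, A_est, B_est)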
We define train as the result of applying the optimizer to cost_function, through the optimizer's minimize function:
train = optimizer.minimize(cost_function)
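For the record, minimize is a shorthand that combines two steps of the standard TF 1.x optimizer API; the explicit form below is equivalent:

# Equivalent explicit form: compute the gradients of the cost with
# respect to the trainable variables, then apply one descent step.
grads_and_vars = optimizer.compute_gradients(cost_function)
train = optimizer.apply_gradients(grads_and_vars)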
Testing the model
Now we can test the gradient descent algorithm on the data model we created earlier. As usual, we have to initialize all the variables:
model = tf.initialize_all_variables()
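Note that in later TensorFlow 1.x releases, tf.initialize_all_variables() is deprecated; its drop-in replacement is:

model = tf.global_variables_initializer()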
We then build our iteration (20 computation steps), which allows us to determine the best values of A and B, that is, the values defining the line that best fits the data model. We instantiate the evaluation graph:
with tf.Session() as session:
We perform the simulation on our model:
    session.run(model)
    for step in range(0,21):
For each iteration, we execute the optimization step:
        session.run(train)
Every five steps, we plot our pattern of dots:
        if (step % 5) == 0:
            plt.plot(x_point, y_point, 'o',
                     label='step = {}'.format(step))
And the fitted straight line is drawn by the following command:
            plt.plot(x_point,
                     session.run(A) *
                     x_point +
                     session.run(B))
            plt.legend()
            plt.show()
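Once the loop finishes (still inside the session context), it can be instructive to print the fitted values and compare them with the generating constants a = 0.22 and b = 0.78; this print statement is our addition, not part of the original listing:

    # Our addition: report the final estimates of A and B.
    print("A =", session.run(A)[0], "B =", session.run(B)[0])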
The following figure shows the convergence of the implemented algorithm:
Linear regression: start of the computation (step = 0)
After just five steps, we can already see (in the next figure) a substantial improvement in the fit of the line:
The following (and final) figure shows the definitive result after 20 steps. We can see the efficiency of the algorithm used, with the straight line running almost perfectly through the cloud of points.
Finally, to consolidate our understanding, we report the complete code:
import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf

# Data model: points scattered around the line y = 0.22x + 0.78
number_of_points = 500
x_point = []
y_point = []
a = 0.22
b = 0.78
for i in range(number_of_points):
    x = np.random.normal(0.0, 0.5)
    y = a*x + b + np.random.normal(0.0, 0.1)
    x_point.append([x])
    y_point.append([y])
plt.plot(x_point, y_point, 'o', label='Input Data')
plt.legend()
plt.show()

# Unknowns of the regression problem
A = tf.Variable(tf.random_uniform([1], -1.0, 1.0))
B = tf.Variable(tf.zeros([1]))
y = A * x_point + B

# Cost function and gradient descent optimizer
cost_function = tf.reduce_mean(tf.square(y - y_point))
optimizer = tf.train.GradientDescentOptimizer(0.5)
train = optimizer.minimize(cost_function)

model = tf.initialize_all_variables()
with tf.Session() as session:
    session.run(model)
    for step in range(0, 21):
        session.run(train)
        if (step % 5) == 0:
            plt.plot(x_point, y_point, 'o',
                     label='step = {}'.format(step))
            plt.plot(x_point,
                     session.run(A) *
                     x_point +
                     session.run(B))
            plt.legend()
            plt.show()
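As an optional cross-check (our addition, not part of the original listing), NumPy's closed-form least-squares fit should land near the same values as the gradient descent result:

# Compare with the closed-form least-squares solution; both should be
# close to the generating constants a = 0.22 and b = 0.78.
x_data = np.asarray(x_point).ravel()
y_data = np.asarray(y_point).ravel()
slope, intercept = np.polyfit(x_data, y_data, 1)
print("least squares: A =", slope, ", B =", intercept)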