A1. Explain how the moving average method uses n observations to smooth time series data. What would be the difference in using n = 3 compared to n = 20?
The moving average method uses the mean of the last n observations to predict the next observation (t+1) or to smooth the series: ŷ(t+1) = (y_t + y_(t-1) + … + y_(t-n+1)) / n.
There are two differences between using n = 3 and n = 20: 1. The larger n is, the more the data are smoothed. That is to say, we see less fluctuation in the data, but we also lose more of the features of the individual data points. 2. When n is an odd number, a centred moving average of order n is symmetric around an observation, so we can use it directly (or choose an uncentred moving average). When n is an even number, the centre falls between two observations, so a weighted centred moving average is needed; alternatively, we can simply use the uncentred formula above.
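As a quick illustration, here is a minimal sketch in R comparing the two window sizes (the simulated series y and the use of stats::filter are assumptions for illustration, not part of the exam data):
# compare a 3-point and a 20-point moving average on simulated data
set.seed(1)
y <- sin(seq(0, 6 * pi, length.out = 200)) + rnorm(200, sd = 0.4)
ma3 <- stats::filter(y, rep(1/3, 3), sides = 2)    # light smoothing, follows the data closely
ma20 <- stats::filter(y, rep(1/20, 20), sides = 2) # heavy smoothing, loses local detail
plot(y, type = "l", col = "grey")
lines(ma3, col = "blue")
lines(ma20, col = "red")
The n = 3 line tracks short-term fluctuations, while the n = 20 line is much flatter, illustrating the trade-off described above.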
A2. Describe how simulated annealing works. Explain how the temperature variable and greed works in your answer.
The algorithm mimics the cooling of metallic solids from the liquid phase, which increases the size of the crystals to make the metal harder and reduces the number of defects. The initial heat applied to the material forces its atoms to move freely in random directions (the stochastic nature of the algorithm). As the cooling process occurs, the atoms' energy slowly decreases, resulting in a new formation.
The temperature is first set high, so the atoms can walk randomly (the stochastic nature of the algorithm). When the algorithm starts, worse solutions that it encounters are accepted with a high probability; as the temperature gradually decreases, the probability of accepting a worse solution also decreases, until the search eventually stops jumping out of the current local optimum. Such an algorithm reduces the probability of getting stuck while the temperature is still high and increases the chance of finding the global optimum. Throughout the process, the objective function seeks the maximum solution; the temperature is just one parameter, used to compute the probability of accepting a neighbouring solution.
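A minimal sketch of this acceptance rule in R (the objective function f, the neighbourhood step size and the geometric cooling rate are all assumptions for illustration):
# simulated annealing sketch for a hypothetical function to maximise
f <- function(x) -(x^2 - 4)^2 + x     # two peaks near ±2; global maximum near x = 2
x <- runif(1, -5, 5)                  # random starting solution
temp <- 10                            # high initial temperature
while (temp > 1e-3) {
  cand <- x + rnorm(1, sd = 0.5)      # random neighbouring solution
  delta <- f(cand) - f(x)
  # always accept improvements; accept worse moves with probability exp(delta/temp)
  if (delta > 0 || runif(1) < exp(delta / temp)) x <- cand
  temp <- temp * 0.99                 # gradually lower the temperature
}
x                                     # approximate maximiser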
A3. Explain what deep learning is and give examples of how it is being used.
Deep learning is a subfield of machine learning that uses artificial neural networks with many layers to learn representations of data directly from examples. Popular packages include TensorFlow and Keras.
Convolutional Neural Networks (CNNs) are one family of deep learning models. CNNs were first used to solve problems in computer vision and pattern recognition, and they have subsequently been shown to be effective for NLP (Natural Language Processing), achieving excellent results in many NLP tasks.
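As an illustration, here is a minimal sketch of a small CNN defined with the keras package in R (the layer sizes and the 28x28 greyscale input shape are arbitrary assumptions):
# a tiny CNN for 28x28 greyscale images, sketched with keras
library(keras)
model <- keras_model_sequential() %>%
  layer_conv_2d(filters = 16, kernel_size = c(3, 3), activation = "relu",
                input_shape = c(28, 28, 1)) %>%    # convolution extracts local features
  layer_max_pooling_2d(pool_size = c(2, 2)) %>%    # pooling downsamples the feature maps
  layer_flatten() %>%
  layer_dense(units = 10, activation = "softmax")  # classify into 10 classes
model %>% compile(optimizer = "adam",
                  loss = "categorical_crossentropy",
                  metrics = "accuracy")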
A4. List the conditional probabilities for the following Bayesian network:
P(E|B): the probability that E occurs given that B has occurred.
P(C|A,B): the probability that C occurs given that both A and B have occurred.
P(D|C): the probability that D occurs given that C has occurred.
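Assuming the network has only the edges implied by these conditionals (B→E, A→C, B→C, C→D, with A and B as root nodes), the joint distribution factorises as:
P(A, B, C, D, E) = P(A) × P(B) × P(E|B) × P(C|A,B) × P(D|C)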
B1. Your manager has created a function to model a business process and wants to use a genetic algorithm to identify the minimum solution.
a. Describe how genetic algorithms work. Explain the crossover and
mutation stages in your answer giving examples with binary
encoding.
The general procedure of a Genetic Algorithm is as follows:
1. Define an end condition (time or number of iterations).
2. Generate a random population of chromosomes.
3. Evaluate fitness of each chromosome in the population.
4. Create a new population by repeating the following steps until a new population is complete:
• Select two parent chromosomes from the population according to their fitness.
• Crossover, also called recombination, is a genetic operator used to combine the genetic information of two parents to generate new offspring stochastically. There are a number of techniques for the crossover stage; with binary encoding the most common is single-point crossover: select a random cut point and form a new offspring by merging one side of the cut point of parent A with the other side of the cut point of parent B, e.g. A: 10001|011 and B: 01101|110 produce the offspring 10001110 (see the sketch after this list).
• Randomly mutate the offspring. The mutation stage consists of a small alteration to the new offspring, for example 10001110 mutates to 10101110. The probability of this occurring to each individual bit of the chromosome is set by the decision-maker or analyst; generally the probability is fixed below 0.1 (<10%). The level of the mutation probability contributes to the stochastic nature of the algorithm.
• Place the offspring into the population.
5. Evaluate fitness of each chromosome in the population.
6. If the end condition is met, return the best solution(s) in the current
population.
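A minimal sketch of the crossover and mutation operators in R (the helper functions crossover and mutate are hypothetical illustrations, not functions from the GA package):
# single-point crossover and bit-flip mutation with binary encoding
crossover <- function(a, b) {
  cut <- sample(1:(length(a) - 1), 1)  # random cut point
  c(a[1:cut], b[(cut + 1):length(b)])  # left part of A + right part of B
}
mutate <- function(chrom, pmut = 0.1) {
  flip <- runif(length(chrom)) < pmut  # each bit flips with probability pmut
  chrom[flip] <- 1 - chrom[flip]
  chrom
}
A <- c(1, 0, 0, 0, 1, 0, 1, 1)         # 10001011
B <- c(0, 1, 1, 0, 1, 1, 1, 0)         # 01101110
child <- mutate(crossover(A, B))       # e.g. 10001110 before mutation if cut = 5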
b. Solve the function below with a genetic algorithm using the ga
function in the GA package in R. Use a real-valued optimisation
type and set the minimum input parameters as c(-10, -10) and the
maximum input parameters as c(10, 10).
Include the code used, a plot of the fitness value throughout the GA
generations, the summary output of the function and an explanation
of the summary output.
CODE:
# Define the business task function; ga() maximises the fitness,
# so we return the negated value in order to find the minimum
B1 <- function(x)
{
  val <- sum(x^4 - 16 * x^2 + 5 * x)
  return(-val / 2)
}
# load the GA package
library("GA")
# fit the B1 function into ga model and set parameters
GA <- ga(type = "real-valued", fitness = B1, lower = c(-10, -10), upper = c(10, 10))
# look at the results
summary(GA)
plot(GA)
GA results:
Iterations = 100
Fitness function value = 78.2173
Solution =
x1 x2
[1,] -2.821962 -2.916912
The summary of the GA results shows that after 100 iterations, the GA found the largest fitness value, 78.2173, at x1 = -2.82 and x2 = -2.92. Since we reversed the sign of the function for the GA, reversing the sign of this fitness value gives the minimum of the original function: -78.22.
c. Repeat the ga optimisation with the following custom parameters:
popSize = 100
pcrossover = 0.9
pmutation = 0.2
maxiter = 500
Include the code used and the summary output of the function.
Describe what these custom parameters are used for and how they
have affected the result in comparison to your previous answer.
CODE:
# fit the B1 function into ga model and modify parameters
GA <- ga(type = "real-valued", fitness = B1, lower = c(-10, -10), upper = c(10, 10), popSize = 100, pcrossover = 0.9, pmutation = 0.2, maxiter = 500)
# look at the results
summary(GA)
plot(GA)
GA results:
Iterations = 500
Fitness function value = 78.33233
Solution =
x1 x2
[1,] -2.903774 -2.903451
EXPLANATION:
popSize = the population size.
pcrossover = the probability of crossover between pairs of chromosomes. Typically this is large; the default is 0.8, here it is 0.9.
pmutation = the probability of mutation in a parent chromosome. Mutation usually occurs with a small probability; the default is 0.1. Here it is 0.2, which allows more mutation in the population and widens the search.
maxiter = the maximum number of iterations to run before the GA search is stopped. Here it is 500, more iterations than before, which gives the search more time to find a better result.
Compared with the previous result, this time we find a better solution: the optimal fitness value increased by about 0.12 (from 78.2173 to 78.3323), because the larger population size, higher mutation rate and extra iterations increase the probability of finding a better solution.
B2. Iveco has approached your consultancy company asking you to help them forecast the number of 35S12 vans sold in the UK in the next year. They have provided you with the quarterly time series sales data
from Q3 2008 to Q3 2017 (B2.csv) for the Iveco Daily 35S12 van.
a. Using the read.csv, ts and plot functions in R, import the data,
create a time series object then plot the time series object.
From looking at this plot, what can you say about the trend and
seasonality of the data? Include the plot in your answer.
Code:
# load the data set
data <- read.csv("2018B2.csv")
View(data)
# create a time series object
iveco <- ts(data$Sales, start = c(2008, 3), end = c(2017, 3), frequency = 4)
# see the plot of the time series data
plot(iveco)
From this plot, we can see that the trend rises until about 2011 and then drops dramatically. We can hardly see any seasonality in the data; if there is any, it is quite small.
b. Using the plot and stl functions in R, decompose the data with loess (additive) decomposition and explain what is shown in the plot.
Explain what the bars to the right of the plot represent.
# load the "forecast" package (stl itself comes from base R's stats package)
library("forecast")
# decompose the data with loess, setting the seasonal window to periodic
lo <- stl(iveco, s.window = "periodic")
# take a look at the results
lo$time.series
plot(lo)
The bars to the right indicate relative scale: each bar spans the same number of data units, so a large bar on the seasonal panel shows that the seasonal variation is relatively small compared to the data and the trend.
In this plot, the top panel shows the time series of the Iveco sales data. The panels below show the seasonal, trend and remainder components of the decomposition; because we use the additive method here, the sum of these three components equals the original series. The trend component accounts for the largest part of the data and the seasonal component is very small.
c. Using the ets() function in the forecast package in R, predict future sales using exponential smoothing for the next year (4 observations only). Set alpha so that you give more weight to more recent observations. Include an image of the forecast in your answer.
# use ets() to predict the sales of the next year; setting alpha to a large
# value puts more weight on recent observations
# (pass the ts object itself: subsetting with [1:37] would drop the ts attributes)
fit <- ets(iveco, model = "ZZZ", alpha = 0.9)
# forecast the next 4 quarters; the forecast plot includes the historical series
pre <- forecast(fit, h = 4)
plot(pre)
B3. HR has approached you to help them study your company’s
employees. They have provided you with a dataset (B3.csv) with the
following 6 columns about 14,999 employees:
satisfaction_level: Satisfaction Level
last_evaluation: Last evaluation
number_project: Number of projects
average_montly_hours: Average monthly hours
time_spend_company: Time spent at the company
Work_accidents: Number of accidents the employee
has had at work
a. Describe the differences between Principal Components Analysis
(PCA) and Exploratory Factor Analysis (EFA).
PCA is a technique for reducing the linear correlation among variables: it transforms them into a smaller set of uncorrelated components that retain as much of the variance as possible. EFA, in contrast, is a method for finding the underlying latent factors that cause the observed variables.
b. Using read.csv and the corrgram function from the corrgram
package, import the data and create a correlogram plot of the 6
measurements of the employees. Discuss the suitability of the data
for PCA and include an image of the plot in your answer.
CODE:
# import the data
pdata <- read.csv("2018B3.csv")
View(pdata)
# load corrgram package
library("corrgram")
corrgram(pdata)
The PCA method is suitable for reducing a large number of correlated variables. In the corrgram plot, blue means two variables are positively correlated, while red means negatively correlated; a darker colour means the correlation is stronger. last_evaluation, number_project and average_montly_hours are highly positively correlated, while satisfaction_level and number_project are highly negatively correlated. We can therefore use the PCA method to reduce these correlated variables.
c. Using the plot and prcomp functions in R, plot a scree plot and
describe how you can use this plot to identify the number of
components to use in principal components analysis. Include the
scree plot in your answer.
code:
# fit the data into PCA, scaling the variables to unit variance
hrpca <- prcomp(pdata, scale. = TRUE)
plot(hrpca, type = "lines", main = "Scree plot")
A scree plot displays how much variation each principal component captures from the data. We can use the following rules to choose the number of components:
• Kaiser's rule: retain components with variances (eigenvalues) over 1.
• The elbow rule: retain the components before the point where the curve flattens out.
• Proportion of variance: the selected PCs should together describe at least 80% of the variance.
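A quick sketch of checking Kaiser's rule numerically, reusing the hrpca object fitted above:
# component variances (eigenvalues); Kaiser's rule keeps those above 1
eig <- hrpca$sdev^2
which(eig > 1)
abline(h = 1, lty = 2)  # reference line at 1 on the scree plot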
d. Using the prcomp function in R, use principal components analysis
on the data. Include and describe the results of the analysis.
Discuss the loadings and how appropriate it would be to use two
components.
code:
hrpca
summary(hrpca)
Result:
Importance of components:
PC1 PC2 PC3 PC4 PC5
Standard deviation 1.353 1.0534 1.0000 0.9362 0.7968
Proportion of Variance 0.305 0.1849 0.1667 0.1461 0.1058
Cumulative Proportion 0.305 0.4899 0.6566 0.8027 0.9085
Rotation (n x k) = (6 x 6):
PC1 PC2 PC3
satisfaction_level 0.08693115 -0.82848859 0.08271569
last_evaluation -0.50728391 -0.36995575 0.01296449
number_project -0.57900111 0.11114716 -0.03199330
average_montly_hours -0.54922118 -0.12501818 -0.00810438
time_spend_company -0.31310859 0.38036651 0.03235213
Work_accidents -0.01352139 0.06385507 0.99541656
PC4 PC5 PC6
satisfaction_level -0.37912166 0.272273055 -0.285204994
last_evaluation -0.04769970 -0.714195147 0.305414036
number_project 0.20810048 0.005747078 -0.779770228
average_montly_hours 0.25387813 0.635654862 0.462763070
time_spend_company -0.86109751 0.107569645 0.056369470
Work_accidents 0.06886704 -0.011459253 -0.003404899
The results show that:
PC1 and PC2 together explain only about 49% of the total variance. The selected PCs should be able to describe at least 80% of the variance, so two components are not enough.
The loadings show that last_evaluation, number_project and average_montly_hours carry the largest weights on PC1, while satisfaction_level weighs most heavily on PC2.