A1. Explain how the moving average method uses n observations to smooth time series data. What would be the difference in using n = 3 compared to n = 20?
The moving average method uses the mean of the last n observations to predict the next observation (t+1) or to smooth the series: ŷ(t+1) = (y_t + y_(t-1) + … + y_(t-n+1)) / n.
There are two differences between using n = 3 and n = 20: 1. The larger n is, the more the data are smoothed. That is to say, we see less fluctuation in the data, but we also lose more of the features of the individual data points. 2. When n is an odd number, a centred moving average of order n is symmetric around an observation, so we can use it directly (or choose an uncentred moving average). When n is an even number, the centre falls between two observations, so a weighted centred moving average is needed; alternatively, we can simply use the uncentred formula above.
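As a quick illustration, here is a minimal sketch in R comparing the two window sizes (the simulated series y and the use of stats::filter are assumptions for illustration, not part of the exam data):
# compare a 3-point and a 20-point moving average on simulated data
set.seed(1)
y <- sin(seq(0, 6 * pi, length.out = 200)) + rnorm(200, sd = 0.4)
ma3 <- stats::filter(y, rep(1/3, 3), sides = 2)    # light smoothing, follows the data closely
ma20 <- stats::filter(y, rep(1/20, 20), sides = 2) # heavy smoothing, loses local detail
plot(y, type = "l", col = "grey")
lines(ma3, col = "blue")
lines(ma20, col = "red")
The n = 3 line tracks short-term fluctuations, while the n = 20 line is much flatter, illustrating the trade-off described above.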
A2. Describe how simulated annealing works. Explain how the temperature variable and greed works in your answer.
The algorithm mimics the cooling of metallic solids from the liquid phase, which increases the size of the crystals to make the metal harder and reduces the number of defects. The initial heat applied to the material forces its atoms to move freely in random directions (the stochastic nature of the algorithm). As the cooling process occurs, the atoms' energy slowly decreases, resulting in a new formation.
The temperature is first set high, so the atoms can walk randomly (the stochastic nature of the algorithm). When the algorithm starts, worse solutions that it encounters are accepted with a high probability; as the temperature gradually decreases, the probability of accepting a worse solution also decreases, until the search eventually stops jumping out of the current local optimum. Such an algorithm reduces the probability of getting stuck while the temperature is still high and increases the chance of finding the global optimum. Throughout the process, the objective function seeks the maximum solution; the temperature is just one parameter, used to compute the probability of accepting a neighbouring solution.
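A minimal sketch of this acceptance rule in R (the objective function f, the neighbourhood step size and the geometric cooling rate are all assumptions for illustration):
# simulated annealing sketch for a hypothetical function to maximise
f <- function(x) -(x^2 - 4)^2 + x     # two peaks near ±2; global maximum near x = 2
x <- runif(1, -5, 5)                  # random starting solution
temp <- 10                            # high initial temperature
while (temp > 1e-3) {
  cand <- x + rnorm(1, sd = 0.5)      # random neighbouring solution
  delta <- f(cand) - f(x)
  # always accept improvements; accept worse moves with probability exp(delta/temp)
  if (delta > 0 || runif(1) < exp(delta / temp)) x <- cand
  temp <- temp * 0.99                 # gradually lower the temperature
}
x                                     # approximate maximiser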
A3. Explain what deep learning is and give examples of how it is being used.
Deep learning is a subfield of machine learning that uses artificial neural networks with many layers to learn representations of data directly from examples. Popular packages include TensorFlow and Keras.
Convolutional Neural Networks (CNNs) are one family of deep learning models. CNNs were first used to solve problems in computer vision and pattern recognition, and they have subsequently been shown to be effective for NLP (Natural Language Processing), achieving excellent results in many NLP tasks.
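As an illustration, here is a minimal sketch of a small CNN defined with the keras package in R (the layer sizes and the 28x28 greyscale input shape are arbitrary assumptions):
# a tiny CNN for 28x28 greyscale images, sketched with keras
library(keras)
model <- keras_model_sequential() %>%
  layer_conv_2d(filters = 16, kernel_size = c(3, 3), activation = "relu",
                input_shape = c(28, 28, 1)) %>%    # convolution extracts local features
  layer_max_pooling_2d(pool_size = c(2, 2)) %>%    # pooling downsamples the feature maps
  layer_flatten() %>%
  layer_dense(units = 10, activation = "softmax")  # classify into 10 classes
model %>% compile(optimizer = "adam",
                  loss = "categorical_crossentropy",
                  metrics = "accuracy")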
A4. List the conditional probabilities for the following Bayesian network:
P(E|B): the probability that E occurs given that B has occurred.
P(C|A,B): the probability that C occurs given that both A and B have occurred.
P(D|C): the probability that D occurs given that C has occurred.
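Assuming the network has only the edges implied by these conditionals (B→E, A→C, B→C, C→D, with A and B as root nodes), the joint distribution factorises as:
P(A, B, C, D, E) = P(A) × P(B) × P(E|B) × P(C|A,B) × P(D|C)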
B1. Your manager has created a function to model a business process and wants to use a genetic algorithm to identify the minimum solution.
a. Describe how genetic algorithms work. Explain the crossover and
mutation stages in your answer giving examples with binary
encoding.
The general procedure of a Genetic Algorithm is as follows:
1. Define an end condition (time or number of iterations).
2. Generate a random population of chromosomes.
3. Evaluate fitness of each chromosome in the population.
4. Create a new population by repeating the following steps until a new population is complete:
• Select two parent chromosomes from the population according to their fitness.
• Crossover, also called recombination, is a genetic operator used to combine the genetic information of two parents to generate new offspring stochastically. There are a number of techniques for the crossover stage; with binary encoding the most common is single-point crossover: select a random cut point and form a new offspring by merging one side of the cut point of parent A with the other side of the cut point of parent B, e.g. A: 10001|011 and B: 01101|110 produce the offspring 10001110 (see the sketch after this list).
• Randomly mutate the offspring. The mutation stage consists of a small alteration to the new offspring, for example 10001110 mutates to 10101110. The probability of this occurring to each individual bit of the chromosome is set by the decision-maker or analyst; generally the probability is fixed below 0.1 (<10%). The level of the mutation probability contributes to the stochastic nature of the algorithm.
• Place the offspring into the population.
5. Evaluate fitness of each chromosome in the population.
6. If the end condition is met, return the best solution(s) in the current
population.
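A minimal sketch of the crossover and mutation operators in R (the helper functions crossover and mutate are hypothetical illustrations, not functions from the GA package):
# single-point crossover and bit-flip mutation with binary encoding
crossover <- function(a, b) {
  cut <- sample(1:(length(a) - 1), 1)  # random cut point
  c(a[1:cut], b[(cut + 1):length(b)])  # left part of A + right part of B
}
mutate <- function(chrom, pmut = 0.1) {
  flip <- runif(length(chrom)) < pmut  # each bit flips with probability pmut
  chrom[flip] <- 1 - chrom[flip]
  chrom
}
A <- c(1, 0, 0, 0, 1, 0, 1, 1)         # 10001011
B <- c(0, 1, 1, 0, 1, 1, 1, 0)         # 01101110
child <- mutate(crossover(A, B))       # e.g. 10001110 before mutation if cut = 5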
b. Solve the function below with a genetic algorithm using the ga
function in the GA package in R. Use a real-valued optimisation
type and set the minimum input parameters as c(-10, -10) and the
maximum input parameters as c(10, 10).
Include the code used, a plot of the fitness value throughout the GA
generations, the summary output of the function and an explanation
of the summary output.
CODE:
# Define the business task function; ga() maximises the fitness,
# so we return the negated value in order to find the minimum
B1 <- function(x)
{
  val <- sum(x^4 - 16 * x^2 + 5 * x)
  return(-val / 2)
}
# load the GA package
library("GA")
# fit the B1 function into ga model and set parameters
GA <- ga(type = "real-valued", fitness = B1, lower = c(-10, -10), upper = c(10, 10))
# look at the results
summary(GA)
plot(GA)
GA results:
Iterations = 100
Fitness function value = 78.2173
Solution =
x1 x2
[1,] -2.821962 -2.916912
The summary of the GA results shows that after 100 iterations, the GA found the largest fitness value, 78.2173, at x1 = -2.82 and x2 = -2.92. Since we reversed the sign of the function for the GA, reversing the sign of this fitness value gives the minimum of the original function: -78.22.
c. Repeat the ga optimisation with the following custom parameters:
popSize = 100
pcrossover = 0.9
pmutation = 0.2
maxiter = 500
Include the code used and the summary output of the function.
Describe what these custom parameters are used for and how they
have affected the result in comparison to your previous answer.
CODE:
# fit the B1 function into ga model and modify parameters
GA <- ga(type = "real-valued", fitness = B1, lower = c(-10, -10), upper = c(10, 10), popSize = 100, pcrossover = 0.9, pmutation = 0.2, maxiter = 500)
# look at the results
summary(GA)
plot(GA)
GA results:
Iterations = 500
Fitness function value = 78.33233
Solution =
x1 x2
[1,] -2.903774 -2.903451
EXPLANATION:
popSize = the population size.
pcrossover = the probability of crossover between pairs of chromosomes. Typically this is large; the default is 0.8, here it is 0.9.
pmutation = the probability of mutation in a parent chromosome. Mutation usually occurs with a small probability; the default is 0.1. Here it is 0.2, which allows more mutation in the population and widens the search.
maxiter = the maximum number of iterations to run before the GA search is stopped. Here it is 500, more iterations than before, which gives the search more time to find a better result.
Compared with the previous result, this time we find a better solution: the optimal fitness value increased by about 0.12 (from 78.2173 to 78.3323), because the larger population size, higher mutation rate and extra iterations increase the probability of finding a better solution.
B2. Iveco has approached your consultancy company asking you to help them forecast the number of 35S12 vans sold in the UK in the next year. They have provided you with the quarterly time series sales data
from Q3 2008 to Q3 2017 (B2.csv) for the Iveco Daily 35S12 van.
a. Using the read.csv, ts and plot functions in R, import the data,
create a time series object then plot the time series object.
From looking at this plot, what can you say about the trend and
seasonality of the data? Include the plot in your answer.
Code:
# load the data set
data <- read.csv("2018B2.csv")
View(data)
# create a time series object
iveco <- ts(data$Sales, start = c(2008, 3), end = c(2017, 3), frequency = 4)
# see the plot of the time series data
plot(iveco)
From this plot, we can see that the trend rises until about 2011 and then drops dramatically. We can hardly see any seasonality in the data; if there is any, it is quite small.
b. Using the plot and stl functions in R, decompose the data with loess (additive) decomposition and explain what is shown in the plot.
Explain what the bars to the right of the plot represent.
# load the "forecast" package (stl itself comes from base R's stats package)
library("forecast")
# decompose the data with loess, setting the seasonal window to periodic
lo <- stl(iveco, s.window = "periodic")
# take a look at the results
lo$time.series
plot(lo)
The bars to the right indicate relative scale: each bar spans the same number of data units, so a large bar on the seasonal panel shows that the seasonal variation is relatively small compared to the data and the trend.
In this plot, the top panel shows the time series of the Iveco sales data. The panels below show the seasonal, trend and remainder components of the decomposition; because we use the additive method here, the sum of these three components equals the original series. The trend component accounts for the largest part of the data and the seasonal component is very small.
c. Using the ets() function in the forecast package in R, predict future sales using exponential smoothing for the next year (4 observations only). Set alpha so that you give more weight to more recent observations. Include an image of the forecast in your answer.
# use ets() to predict the sales of the next year; setting alpha to a large
# value puts more weight on recent observations
# (pass the ts object itself: subsetting with [1:37] would drop the ts attributes)
fit <- ets(iveco, model = "ZZZ", alpha = 0.9)
# forecast the next 4 quarters; the forecast plot includes the historical series
pre <- forecast(fit, h = 4)
plot(pre)
B3. HR has approached you to help them study your company’s
employees. They have provided you with a dataset (B3.csv) with the
following 6 columns about 14,999 employees:
satisfaction_level: Satisfaction Level
last_evaluation: Last evaluation
number_project: Number of projects
average_montly_hours: Average monthly hours
time_spend_company: Time spent at the company
Work_accidents: Number of accidents the employee
has had at work
a. Describe the differences between Principal Components Analysis
(PCA) and Exploratory Factor Analysis (EFA).
PCA is a technique for reducing the linear correlation among variables: it transforms them into a smaller set of uncorrelated components that retain as much of the variance as possible. EFA, in contrast, is a method for finding the underlying latent factors that cause the observed variables.
b. Using read.csv and the corrgram function from the corrgram
package, import the data and create a correlogram plot of the 6
measurements of the employees. Discuss the suitability of the data
for PCA and include an image of the plot in your answer.
CODE:
# import the data
pdata <- read.csv("2018B3.csv")
View(pdata)
# load corrgram package
library("corrgram")
corrgram(pdata)
The PCA method is suitable for reducing a large number of correlated variables. In the corrgram plot, blue means two variables are positively correlated, while red means negatively correlated; a darker colour means the correlation is stronger. last_evaluation, number_project and average_montly_hours are highly positively correlated, while satisfaction_level and number_project are highly negatively correlated. We can therefore use the PCA method to reduce these correlated variables.
c. Using the plot and prcomp functions in R, plot a scree plot and
describe how you can use this plot to identify the number of
components to use in principal components analysis. Include the
scree plot in your answer.
code:
# fit the data into PCA, scaling the variables to unit variance
hrpca <- prcomp(pdata, scale. = TRUE)
plot(hrpca, type = "lines", main = "Scree plot")
A scree plot displays how much variation each principal component captures from the data. We can use the following rules to choose the number of components:
• Kaiser's rule: retain components with variances (eigenvalues) over 1.
• The elbow rule: retain the components before the point where the curve flattens out.
• Proportion of variance: the selected PCs should together describe at least 80% of the variance.
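A quick sketch of checking Kaiser's rule numerically, reusing the hrpca object fitted above:
# component variances (eigenvalues); Kaiser's rule keeps those above 1
eig <- hrpca$sdev^2
which(eig > 1)
abline(h = 1, lty = 2)  # reference line at 1 on the scree plot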
d. Using the prcomp function in R, use principal components analysis
on the data. Include and describe the results of the analysis.
Discuss the loadings and how appropriate it would be to use two
components.
code:
hrpca
summary(hrpca)
Result:
Importance of components:
PC1 PC2 PC3 PC4 PC5
Standard deviation 1.353 1.0534 1.0000 0.9362 0.7968
Proportion of Variance 0.305 0.1849 0.1667 0.1461 0.1058
Cumulative Proportion 0.305 0.4899 0.6566 0.8027 0.9085
Rotation (n x k) = (6 x 6):
PC1 PC2 PC3
satisfaction_level 0.08693115 -0.82848859 0.08271569
last_evaluation -0.50728391 -0.36995575 0.01296449
number_project -0.57900111 0.11114716 -0.03199330
average_montly_hours -0.54922118 -0.12501818 -0.00810438
time_spend_company -0.31310859 0.38036651 0.03235213
Work_accidents -0.01352139 0.06385507 0.99541656
PC4 PC5 PC6
satisfaction_level -0.37912166 0.272273055 -0.285204994
last_evaluation -0.04769970 -0.714195147 0.305414036
number_project 0.20810048 0.005747078 -0.779770228
average_montly_hours 0.25387813 0.635654862 0.462763070
time_spend_company -0.86109751 0.107569645 0.056369470
Work_accidents 0.06886704 -0.011459253 -0.003404899
The results show that:
PC1 and PC2 together explain only about 49% of the total variance. The selected PCs should be able to describe at least 80% of the variance, so two components are not enough.
The loadings show that last_evaluation, number_project and average_montly_hours carry the largest weights on PC1, while satisfaction_level weighs most heavily on PC2.