Q: What is cross-validation? How do you do it right?
A:
Cross-validation is a technique for evaluating a predictive model and estimating how accurately it will perform in practice. The original sample is partitioned into a training set, used to fit the model, and a validation set, used to evaluate it.
K-fold CV Steps:
1. Split the dataset into a training dataset and a test dataset;
2. Set the test dataset aside and partition the training dataset into K roughly equal subsets (folds);
3. For k = 1, 2, ..., K, fit the model on the other K-1 folds and calculate the validation error on the k-th fold, so that each fold serves as the validation set exactly once;
4. Average the K validation errors and take the result as the estimate of model performance;
5. When comparing several candidate models, select the one with the lowest average validation error and retrain it on the whole training dataset; the test dataset set aside in step 1 is used only for the final evaluation of the selected model.
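The steps above can be sketched in plain Python as follows. This is only an illustration under stated assumptions: the toy data, the two candidate models (`fit_mean`, `fit_line`), and all helper names are made up for the example and are not part of the original answer. In practice a library such as scikit-learn would normally handle the splitting.

```python
import random

def k_fold_indices(n, k, seed=0):
    """Shuffle indices 0..n-1 and split them into k nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def fit_mean(xs, ys):
    """Hypothetical baseline model: always predict the mean of the training targets."""
    m = sum(ys) / len(ys)
    return lambda x: m

def fit_line(xs, ys):
    """Hypothetical candidate model: least-squares fit of y = a*x + b."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx if sxx else 0.0
    b = my - a * mx
    return lambda x: a * x + b

def mse(model, xs, ys):
    """Mean squared prediction error of a fitted model on (xs, ys)."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def cross_val_error(fit, xs, ys, k=5, seed=0):
    """Steps 2-4: average validation error over k folds."""
    folds = k_fold_indices(len(xs), k, seed)
    errors = []
    for i, val_idx in enumerate(folds):
        # Train on every fold except the i-th, validate on the i-th.
        train_idx = [j for f_i, f in enumerate(folds) if f_i != i for j in f]
        model = fit([xs[j] for j in train_idx], [ys[j] for j in train_idx])
        errors.append(mse(model, [xs[j] for j in val_idx], [ys[j] for j in val_idx]))
    return sum(errors) / k

# Toy data (made up for illustration): y = 2x + 1 plus a little noise.
rng = random.Random(42)
xs = [float(i) for i in range(40)]
ys = [2 * x + 1 + rng.gauss(0, 1) for x in xs]

# Step 1: hold out a test set before any cross-validation.
order = list(range(len(xs)))
rng.shuffle(order)
test_idx, train_idx = order[:8], order[8:]
x_train, y_train = [xs[i] for i in train_idx], [ys[i] for i in train_idx]
x_test, y_test = [xs[i] for i in test_idx], [ys[i] for i in test_idx]

# Steps 2-4: estimate each candidate's error by 5-fold cross-validation.
candidates = {"mean": fit_mean, "line": fit_line}
cv_errors = {name: cross_val_error(fit, x_train, y_train, k=5)
             for name, fit in candidates.items()}

# Step 5: pick the best candidate, refit on all training data, then test once.
best = min(cv_errors, key=cv_errors.get)
final_model = candidates[best](x_train, y_train)
print(best, cv_errors[best], mse(final_model, x_test, y_test))
```

The test set is touched only on the last line, after model selection is finished, so the reported test error is not biased by the selection step.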
Interview questions are from DataAppLab (Wechat: Datalaus)
Jun.28th, 2017 Seattle