Check the following things when training any type of deep neural network:
- Make sure the data used to calculate training accuracy is identical to the data used to train your NN. A mismatch sounds weird, but it can happen in practice, especially with images, if you don't keep track of what is happening. For example, you might train on random patches of images and then calculate training accuracy on a different set of random patches of the same images. It is easy to forget that although the images are the same, the patches are randomly selected each time.
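One way to rule this out is to make the patch sampling reproducible, so that training accuracy is measured on exactly the patches the network saw. Here is a minimal sketch of that idea; `sample_patch_coords`, the image size, and the patch size are all hypothetical names and values I made up for illustration, not part of any library.

```python
import random

def sample_patch_coords(image_size, patch_size, n_patches, seed):
    """Return top-left (row, col) corners of n_patches square patches."""
    rng = random.Random(seed)  # local generator, so the global RNG state is untouched
    max_offset = image_size - patch_size  # largest corner that keeps the patch inside the image
    return [(rng.randint(0, max_offset), rng.randint(0, max_offset))
            for _ in range(n_patches)]

# Patches used for training.
train_coords = sample_patch_coords(256, 32, 5, seed=0)

# Re-using the same seed reproduces the same patches for measuring training accuracy.
eval_coords_same = sample_patch_coords(256, 32, 5, seed=0)

# A different seed will very likely pick different patches of the same image.
eval_coords_other = sample_patch_coords(256, 32, 5, seed=1)

print(train_coords == eval_coords_same)  # True
```

If your pipeline draws fresh patches on every call without a fixed seed, the "training accuracy" you report is really accuracy on unseen crops of the training images.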
More than the values of train and val accuracy, I would be concerned about what you said: "i'm copy pasting a random epoch but all are roughly the same". They shouldn't all be the same. Accuracy usually differs from epoch to epoch, because the network is learning and constantly updating its weights. If accuracy goes up, that usually means the network is approaching a minimum of the loss function. If it stays flat, the network is probably not learning.
I think you should be more concerned about the low training accuracy itself than about training accuracy being lower than validation accuracy.
- Do all the sanity checks given here. Read the entire article if possible; it's very good.
Make sure you are pre-processing correctly. For example, make sure that the mean over the entire training data is zero. For the test data, subtract the mean vector of the training data from each test instance. Don't subtract the test data's own mean, since you wouldn't know the mean of the test data at runtime.
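The mean-subtraction step can be sketched with NumPy as follows. The arrays `X_train` and `X_test` are randomly generated placeholders standing in for your own data, and the shapes are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: 100 training and 20 test instances with 8 features each.
X_train = rng.normal(loc=5.0, scale=2.0, size=(100, 8))
X_test = rng.normal(loc=5.0, scale=2.0, size=(20, 8))

# Compute the per-feature mean vector on the training data only.
train_mean = X_train.mean(axis=0)

# Subtract the same training mean from both splits.
X_train_centered = X_train - train_mean
X_test_centered = X_test - train_mean

# The centered training data has zero mean (up to floating-point error).
print(np.allclose(X_train_centered.mean(axis=0), 0.0))  # True
```

Note that the centered test data will generally not have exactly zero mean, and that is expected: it is shifted by the training statistics, which are the only ones available at runtime.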
- Check that your loss at the very start of the first epoch makes sense. For example, in a 10-class classification problem with a softmax cross-entropy loss and no regularization, the starting loss should be about -ln(0.1) ≈ 2.302 (given here).
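The expected starting value is easy to compute for any number of classes. This small sketch assumes a cross-entropy loss and an untrained network that assigns roughly equal probability to every class; `expected_initial_loss` is just an illustrative helper name.

```python
import math

def expected_initial_loss(num_classes):
    """Cross-entropy when every class gets probability 1 / num_classes."""
    return -math.log(1.0 / num_classes)

print(round(expected_initial_loss(10), 4))  # 2.3026
```

If the loss you see on the first few batches is far from this number, I would suspect the initialization, the loss function, or the labels before anything else.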
- Again, from here: overfit a tiny subset of the data and make sure you can achieve zero cost. Full details are in the link.
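To show what the overfitting check looks like, here is a rough sketch that uses plain softmax regression in NumPy as a stand-in for a real network. The data is random and the sizes, learning rate, and step count are assumptions I picked for illustration. With your own model you would instead train on a small slice of your real training set.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_features, n_classes = 20, 50, 3

X = rng.normal(size=(n_samples, n_features))    # tiny hypothetical subset
y = rng.integers(0, n_classes, size=n_samples)  # random labels are fine for this check
Y = np.eye(n_classes)[y]                        # one-hot targets

W = np.zeros((n_features, n_classes))
b = np.zeros(n_classes)

def forward(X, W, b):
    """Softmax class probabilities for each row of X."""
    logits = X @ W + b
    logits -= logits.max(axis=1, keepdims=True)  # subtract row max for numerical stability
    exp = np.exp(logits)
    return exp / exp.sum(axis=1, keepdims=True)

lr = 0.1
for step in range(2000):
    probs = forward(X, W, b)
    grad = (probs - Y) / n_samples  # gradient of mean cross-entropy w.r.t. logits
    W -= lr * (X.T @ grad)
    b -= lr * grad.sum(axis=0)

probs = forward(X, W, b)
loss = -np.log(probs[np.arange(n_samples), y]).mean()
accuracy = (probs.argmax(axis=1) == y).mean()
print(f"loss={loss:.4f}, accuracy={accuracy:.2f}")
```

The loss here only approaches zero rather than reaching it exactly, but the point is the same: a model with enough capacity should be able to memorize a handful of samples. If yours can't, something in the model or the training loop is broken.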
If nothing works, just train and test on the same data and see if you can get 90%+ accuracy. If you can't, examine your network more closely, for example by looking at individual layer outputs (described in the Keras FAQ).
I am sure you will find the fault somewhere if you follow all these steps for debugging a neural network.