A quick complete tutorial to save and restore Tensorflow models
Introduction to the saved files
A Tensorflow model has two main kinds of files:
- meta graph
This is a protocol buffer which saves the complete Tensorflow graph; i.e. all variables, operations, collections etc. This file has .meta extension.
- checkpoint file
This is a binary file which contains all the values of the weights, biases, gradients and all the other variables saved. This file has an extension .ckpt. However, Tensorflow has changed this from version 0.11. Now, instead of single .ckpt file, we have two files:
mymodel.data-00000-of-00001
mymodel.index
The .data file is the file that contains our training variables and we shall go after it.
Along with this, Tensorflow also has a file named checkpoint which simply keeps a record of latest checkpoint files saved.
So, a saved model produces a set of files like the ones described above.
Note: the graph (.meta) file may be absent, but the checkpoint file must be present; otherwise
tf.train.get_checkpoint_state
or tf.train.latest_checkpoint
will not be able to find the saved files.
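Because the checkpoint file is just a small text file, you can see what tf.train.latest_checkpoint relies on by writing and parsing one by hand. The field names below are the real ones Tensorflow writes, but the parser is a simplified illustration, not the library's actual implementation (the real function also resolves relative paths), and the helper name latest_checkpoint here is our own:

```python
import os
import re
import tempfile

ckpt_dir = tempfile.mkdtemp()

# This is what Tensorflow writes into the `checkpoint` file after saver.save(...)
contents = (
    'model_checkpoint_path: "my_test_model-1000"\n'
    'all_model_checkpoint_paths: "my_test_model-1000"\n'
)
with open(os.path.join(ckpt_dir, 'checkpoint'), 'w') as f:
    f.write(contents)

# tf.train.latest_checkpoint essentially reads this field back
def latest_checkpoint(directory):
    with open(os.path.join(directory, 'checkpoint')) as f:
        match = re.search(r'model_checkpoint_path: "([^"]+)"', f.read())
    return match.group(1) if match else None

print(latest_checkpoint(ckpt_dir))  # my_test_model-1000
```

Deleting or corrupting this file is exactly why the lookup functions above fail even when the .data and .index files are still on disk.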
Code for saving a model
import tensorflow as tf
w1 = tf.Variable(tf.random_normal(shape=[2]), name='w1')
w2 = tf.Variable(tf.random_normal(shape=[5]), name='w2')
saver = tf.train.Saver()
sess = tf.Session()
sess.run(tf.global_variables_initializer())
saver.save(sess, 'my_test_model')
# This will save following files in Tensorflow v >= 0.11
# my_test_model.data-00000-of-00001
# my_test_model.index
# my_test_model.meta
# checkpoint
If we are saving the model after 1000 iterations, we shall call save by passing the step count:
saver.save(sess, 'my_test_model', global_step=1000)
This will just append ‘-1000’ to the model name and following files will be created:
my_test_model-1000.index
my_test_model-1000.meta
my_test_model-1000.data-00000-of-00001
checkpoint
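Putting this together, a training loop that saves a numbered checkpoint every 1000 steps might look like the sketch below. The tf.compat.v1 fallback and the temporary checkpoint directory are assumptions added so the snippet also runs on newer Tensorflow installs; the actual training op is elided:

```python
import os
import tempfile

import tensorflow as tf

# On Tensorflow 2.x the graph-mode API lives under tf.compat.v1 (assumption
# for newer installs; on 1.x this branch is skipped)
if not hasattr(tf, "Session"):
    tf = tf.compat.v1
    tf.disable_eager_execution()

ckpt_dir = tempfile.mkdtemp()

w1 = tf.Variable(tf.random_normal(shape=[2]), name='w1')
saver = tf.train.Saver(max_to_keep=4)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for step in range(1, 3001):
        # ... run your training op here ...
        if step % 1000 == 0:
            # appends '-<step>' to the filenames, e.g. my_test_model-1000
            saver.save(sess, os.path.join(ckpt_dir, 'my_test_model'),
                       global_step=step)

print(tf.train.latest_checkpoint(ckpt_dir))  # .../my_test_model-3000
```

After three saves the checkpoint file points at my_test_model-3000, and max_to_keep would start deleting the oldest files once a fifth numbered checkpoint appeared.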
If you want to keep only the 4 latest models, and want to save one model after every 2 hours of training, you can use max_to_keep and keep_checkpoint_every_n_hours like this:
#saves a model every 2 hours and maximum 4 latest models are saved.
saver = tf.train.Saver(max_to_keep=4, keep_checkpoint_every_n_hours=2)
Importing a pre-trained model
If you want to use someone else’s pre-trained model for fine-tuning, there are two things you need to do:
- Create the network
You can create the network by writing python code to create each and every layer manually, as in the original model. However, if you think about it, we had saved the network in the .meta file, which we can use to recreate the network using the tf.train.import_meta_graph() function like this:
saver = tf.train.import_meta_graph('my_test_model-1000.meta')
Remember, import_meta_graph appends the network defined in .meta file to the current graph. So, this will create the graph/network for you but we still need to load the value of the parameters that we had trained on this graph.
- Load the parameters
We can restore the parameters of the network by calling restore on this saver which is an instance of tf.train.Saver() class.
with tf.Session() as sess:
    new_saver = tf.train.import_meta_graph('my_test_model-1000.meta')
    new_saver.restore(sess, tf.train.latest_checkpoint('./'))