Keras Chinese docs:
Keras English docs:
https://tensorflow.google.cn/api_docs/python/tf/keras/layers/Conv2D
WDL on Zhihu (a review of Google's classic CTR-prediction model WDL):
https://zhuanlan.zhihu.com/p/100898327
LSTM ("Dive into Deep Learning"):
Course: https://zh.d2l.ai/chapter_recurrent-neural-networks/lstm.html
GitHub: https://github.com/d2l-ai/d2l-zh
https://zhuanlan.zhihu.com/p/32085405
https://blog.csdn.net/hfutdog/article/details/96479716
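The LSTM equations covered in the d2l chapter can be sketched as a single forward step in NumPy (the names and the stacked-weight layout are my assumptions; the gate math follows the standard formulation):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h, c, W, b):
    """One LSTM step; W maps [x; h] to the stacked i, f, g, o pre-activations."""
    z = W @ np.concatenate([x, h]) + b
    H = h.size
    i = sigmoid(z[0:H])        # input gate: how much new information to let in
    f = sigmoid(z[H:2*H])      # forget gate: how much old cell state to keep
    g = np.tanh(z[2*H:3*H])    # candidate cell state
    o = sigmoid(z[3*H:4*H])    # output gate: how much of the cell to expose
    c_new = f * c + i * g
    h_new = o * np.tanh(c_new)
    return h_new, c_new

rng = np.random.default_rng(0)
D, H = 3, 4                                   # input and hidden sizes (arbitrary)
W = rng.normal(scale=0.1, size=(4 * H, D + H))
b = np.zeros(4 * H)
h, c = lstm_step(rng.normal(size=D), np.zeros(H), np.zeros(H), W, b)
```

Since h = o * tanh(c) with o in (0, 1), every element of the hidden state stays strictly inside (-1, 1).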
TensorFlow programming
1. replica_device_setter: parameter-to-device assignment
With multiple ps nodes, how should variable storage and updates be split among them? The tf.train.replica_device_setter API answers this.
It handles worker/ps placement for between-graph replication.
if FLAGS.job_name == "ps":
    server.join()  # ps hosts only block here, serving variables to workers
elif FLAGS.job_name == "worker":
    with tf.device(tf.train.replica_device_setter(
            worker_device="/job:worker/task:%d" % FLAGS.task_index,
            cluster=cluster)):
        build_graph()  # placeholder: build the model here; variables land on
                       # ps tasks (round-robin by default), ops on this worker
https://zhuanlan.zhihu.com/p/90234576
2. MonitoredTrainingSession
Handles variable initialization, resuming training from an existing checkpoint, and saving summaries, logs, and checkpoints.
with tf.train.MonitoredTrainingSession(master=server.target,
                                       is_chief=(FLAGS.task_index == 0),
                                       checkpoint_dir="./checkpoint_dir",
                                       hooks=hooks) as mon_sess:
    while not mon_sess.should_stop():
        # mon_sess.run handles AbortedError in case of a preempted PS.
        _, ls, step = mon_sess.run([train_op, loss, global_step])
        if step % 100 == 0:
            print("Train step %d, loss: %f" % (step, ls))
https://zhuanlan.zhihu.com/p/91608555
3. checkpoint
Renaming variables in a TensorFlow checkpoint:
https://zhuanlan.zhihu.com/p/33153473
tf.contrib.framework.list_variables(checkpoint_dir)
var = tf.contrib.framework.load_variable(checkpoint_dir, var_name)
tf.contrib.framework.get_variables_to_restore()
tf.trainable_variables()
The tf.train.NewCheckpointReader class mentioned in tensorflow/python/tools/inspect_checkpoint.py:
- tf.train.NewCheckpointReader
This approach needs no model; the checkpoint files alone suffice.
reader = tf.train.NewCheckpointReader(checkpoint_path)
variable_map = reader.get_variable_to_shape_map()
for var_name in variable_map:
    print(reader.get_tensor(var_name))
4. reduce_sum: the axis argument collapses that axis
https://www.zhihu.com/question/51325408
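tf.reduce_sum shares its axis semantics with NumPy's np.sum, so the "collapsing" behavior can be sketched without TensorFlow:

```python
import numpy as np

x = np.array([[1, 2, 3],
              [4, 5, 6]])                    # shape (2, 3)

# Summing over axis=0 collapses the rows: result has shape (3,).
col_sums = np.sum(x, axis=0)                 # [5, 7, 9]
# Summing over axis=1 collapses the columns: result has shape (2,).
row_sums = np.sum(x, axis=1)                 # [6, 15]
# keepdims=True keeps the collapsed axis around with size 1.
kept = np.sum(x, axis=1, keepdims=True)      # shape (2, 1)
```

The axis you pass is the axis that disappears from the result, not the one that survives.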
5. Variable reuse: variable_scope and get_variable()
https://blog.csdn.net/Jerr__y/article/details/70809528
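The reuse rules of get_variable can be mimicked with a toy pure-Python registry (an analogy only, not how TF implements scopes): creating an existing name fails unless reuse is requested, and reuse returns the same underlying object.

```python
class VariableScope:
    """Toy analogy of tf.variable_scope + tf.get_variable reuse rules."""
    def __init__(self):
        self._vars = {}

    def get_variable(self, name, initializer, reuse=False):
        if name in self._vars:
            if not reuse:
                raise ValueError("Variable %s already exists; set reuse=True" % name)
            return self._vars[name]        # reuse: hand back the existing variable
        if reuse:
            raise ValueError("Variable %s does not exist but reuse=True" % name)
        self._vars[name] = initializer()   # first call: create the variable
        return self._vars[name]

scope = VariableScope()
w1 = scope.get_variable("w", initializer=lambda: [0.0] * 3)
w2 = scope.get_variable("w", initializer=lambda: [1.0] * 3, reuse=True)
assert w1 is w2  # both callers share the same underlying variable
```

This is why two towers of a network can share weights: they call get_variable with the same name under a reusing scope.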
6. What dropout does: prevent overfitting
https://zhuanlan.zhihu.com/p/38200980
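A minimal NumPy sketch of inverted dropout, assuming the usual formulation where surviving units are rescaled by 1/keep_prob so the expected activation is unchanged and inference needs no adjustment:

```python
import numpy as np

def dropout(x, keep_prob, rng):
    """Inverted dropout: zero each unit with prob 1-keep_prob, rescale the rest."""
    mask = rng.random(x.shape) < keep_prob
    return np.where(mask, x / keep_prob, 0.0)

x = np.ones((4, 5))
y = dropout(x, keep_prob=0.8, rng=np.random.default_rng(0))
# Every surviving unit becomes 1/0.8 = 1.25; the dropped ones are exactly 0.
```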
7. Activation functions
https://zhuanlan.zhihu.com/p/172254089
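The common activations compared in the link above can be written down directly in NumPy:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))      # squashes to (0, 1); sigmoid(0) = 0.5

def relu(x):
    return np.maximum(0.0, x)            # zero for negatives, identity for positives

x = np.array([-2.0, 0.0, 2.0])
s, r, t = sigmoid(x), relu(x), np.tanh(x)  # tanh squashes to (-1, 1), tanh(0) = 0
```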
8. Cross entropy
https://zhuanlan.zhihu.com/p/63731947?group_id=1112146751385022464
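A NumPy sketch of softmax cross-entropy for a single example (with the standard max-subtraction for numerical stability; names are mine):

```python
import numpy as np

def softmax_cross_entropy(logits, label):
    """Cross-entropy of one example: -log softmax(logits)[label].

    Subtracting the max shifts the logits without changing the softmax,
    which avoids overflow in exp.
    """
    shifted = logits - np.max(logits)
    log_probs = shifted - np.log(np.sum(np.exp(shifted)))
    return -log_probs[label]

# With uniform logits over n classes the loss is exactly log(n):
loss = softmax_cross_entropy(np.zeros(4), label=2)
```

Raising the logit of the correct class lowers the loss, which is what gradient descent exploits.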
9. tf.shape(x) vs. x.get_shape().as_list()
https://blog.csdn.net/m0_37393514/article/details/82226754
tf.shape(x) returns a Tensor (the dynamic shape), so it must be evaluated inside session.run().
x.get_shape() returns a static TensorShape, so call as_list() to get a plain Python list.
10. Backpropagation
https://blog.csdn.net/u014313009/article/details/51039334
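The chain rule behind backpropagation can be verified numerically on a single sigmoid unit (a toy sketch under my own naming; the finite-difference check is the standard way to validate an analytic gradient):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(w, x, y):
    """One sigmoid unit with squared-error loss: L = (sigmoid(w.x) - y)^2 / 2."""
    a = sigmoid(np.dot(w, x))
    return 0.5 * (a - y) ** 2

def backward(w, x, y):
    """Analytic gradient dL/dw via the chain rule."""
    a = sigmoid(np.dot(w, x))
    delta = (a - y) * a * (1.0 - a)   # dL/da * da/dz for the sigmoid
    return delta * x                  # dz/dw = x

w = np.array([0.5, -0.3])
x = np.array([1.0, 2.0])
y = 1.0

# Central finite difference per weight, compared against the chain-rule gradient:
eps = 1e-6
num = np.array([(forward(w + eps * e, x, y) - forward(w - eps * e, x, y)) / (2 * eps)
                for e in np.eye(2)])
assert np.allclose(backward(w, x, y), num, atol=1e-8)
```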
11. attention
https://zhuanlan.zhihu.com/p/47063917
https://zhuanlan.zhihu.com/p/47282410
https://blog.csdn.net/qq_43331398/article/details/103192522
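Scaled dot-product attention from the links above, sketched in NumPy (the shapes and names are my assumptions; the formula is softmax(QK^T / sqrt(d_k)) V):

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """softmax(Q K^T / sqrt(d_k)) V, as in "Attention Is All You Need"."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                  # one score per (query, key)
    scores -= scores.max(axis=-1, keepdims=True)     # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)   # each row is a distribution
    return weights @ V, weights

rng = np.random.default_rng(0)
Q = rng.normal(size=(3, 4))   # 3 queries of dim 4
K = rng.normal(size=(5, 4))   # 5 keys of dim 4
V = rng.normal(size=(5, 2))   # 5 values of dim 2
out, attn = scaled_dot_product_attention(Q, K, V)
# out: one value-sized vector per query; attn rows are weights over the 5 keys.
```

The sqrt(d_k) scaling keeps the dot products from growing with dimension, which would otherwise push the softmax into regions with tiny gradients.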
12. transformer
https://zhuanlan.zhihu.com/p/44121378
https://arxiv.org/pdf/1706.03762.pdf
https://blog.csdn.net/longxinchen_ml/article/details/86533005
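One self-contained piece of the Transformer, the sinusoidal positional encoding, can be sketched in NumPy (assuming an even d_model, as in the paper's formulation):

```python
import numpy as np

def positional_encoding(max_len, d_model):
    """Sinusoidal encodings from the Transformer paper:
    PE[pos, 2i]   = sin(pos / 10000^(2i/d_model))
    PE[pos, 2i+1] = cos(pos / 10000^(2i/d_model))
    """
    pos = np.arange(max_len)[:, None]                    # (max_len, 1)
    i = np.arange(d_model // 2)[None, :]                 # (1, d_model/2)
    angles = pos / np.power(10000.0, 2 * i / d_model)
    pe = np.zeros((max_len, d_model))
    pe[:, 0::2] = np.sin(angles)                         # even slots: sin
    pe[:, 1::2] = np.cos(angles)                         # odd slots: cos
    return pe

pe = positional_encoding(50, 16)
# Row 0 is sin(0)=0 in the even slots and cos(0)=1 in the odd slots.
```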
13. TRAINABLE_VARIABLES
- TRAINABLE_VARIABLES is the collection of variables (training parameters) that should be modified when minimizing the loss.
- UPDATE_OPS is a collection of ops (operations performed when the graph runs, like multiplication or ReLU), not variables. Specifically, it maintains a list of ops that need to run before each training step; pair it with tf.control_dependencies so the update ops execute first, and only then the train op.