The difference between GradientTape and implicit_gradients

While learning TensorFlow, I came across two gradient-computation functions, GradientTape and implicit_gradients, and wondered how they differ. I eventually found the answer on Stack Overflow.

There are 4 ways to automatically compute gradients when eager execution is enabled (actually, they also work in graph mode):
tf.GradientTape context records computations so that you can call tape.gradient() to get the gradients of any tensor computed while recording with regards to any trainable variable.
tfe.gradients_function() takes a function (say f()) and returns a gradient function (say fg()) that can compute the gradients of the outputs of f() with regards to the parameters of f() (or a subset of them).
tfe.implicit_gradients() is very similar but fg() computes the gradients of the outputs of f() with regards to all trainable variables these outputs depend on.
tfe.implicit_value_and_gradients() is almost identical but fg() also returns the output of the function f().

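In short, there are four ways to compute gradients when eager execution is enabled.
tf.GradientTape records every operation executed inside its context; calling tape.gradient() afterwards returns the gradient of any tensor computed in that context with respect to any trainable variable.
tfe.gradients_function() takes a function f() and returns a gradient function fg() that computes the gradients of f()'s outputs with respect to f()'s parameters (or a subset of them).
tfe.implicit_gradients() is very similar to the previous one, but fg() computes the gradients with respect to all trainable variables that the outputs depend on.
tfe.implicit_value_and_gradients() is almost the same as the previous one, but fg() also returns the output of f().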
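The key distinction in the list above is what the gradient is taken with respect to: the function's parameters, or the variables the function reads. The sketch below illustrates only that idea in plain Python, using central finite differences in place of automatic differentiation. The helper names numeric_gradients_function and numeric_implicit_gradients and the variables dictionary are hypothetical, invented for this illustration; they are not TensorFlow APIs and their return shapes are a simplification.

```python
# Toy illustration only: finite differences stand in for autodiff.
# None of these helpers are TensorFlow APIs; all names are hypothetical.

variables = {"w1": 2.0, "w2": 3.0}  # plays the role of the trainable variables


def weighted_sum(x1, x2):
    return variables["w1"] * x1 + variables["w2"] * x2


def numeric_gradients_function(f, eps=1e-6):
    """Return fg(*args): derivatives of f with respect to its parameters."""
    def fg(*args):
        grads = []
        for i in range(len(args)):
            up, down = list(args), list(args)
            up[i] += eps
            down[i] -= eps
            grads.append((f(*up) - f(*down)) / (2 * eps))
        return grads
    return fg


def numeric_implicit_gradients(f, eps=1e-6):
    """Return fg(*args): derivatives of f with respect to the variables."""
    def fg(*args):
        grads = {}
        for name in variables:
            original = variables[name]
            variables[name] = original + eps
            up = f(*args)
            variables[name] = original - eps
            down = f(*args)
            variables[name] = original  # restore the variable
            grads[name] = (up - down) / (2 * eps)
        return grads
    return fg


param_grads = numeric_gradients_function(weighted_sum)(5.0, 7.0)
var_grads = numeric_implicit_gradients(weighted_sum)(5.0, 7.0)
print(param_grads)  # approximately [2.0, 3.0], i.e. the values of w1 and w2
print(var_grads)    # approximately w1: 5.0, w2: 7.0, i.e. the values of x1 and x2
```

With respect to the parameters x1 and x2, the derivatives are the weights; with respect to the variables w1 and w2, they are the inputs.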

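In TF 2.0, only tf.GradientTape appears to have been kept (tf.contrib, which hosted the tfe functions, was removed), so it is the recommended way to compute gradients.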

Example

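# Requires TensorFlow 1.x: tf.contrib and tf.enable_eager_execution()
# are not available in TF 2.x.
import tensorflow as tf
import tensorflow.contrib.eager as tfe
tf.enable_eager_execution()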

w1 = tfe.Variable(2.0)
w2 = tfe.Variable(3.0)

def weighted_sum(x1, x2):
    return w1 * x1 + w2 * x2

s = weighted_sum(5., 7.)
print(s.numpy()) # 31


with tf.GradientTape() as tape:
    s = weighted_sum(5., 7.)

[w1_grad] = tape.gradient(s, [w1])
print(w1_grad.numpy()) # 5.0 = gradient of s with regards to w1 = x1

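Inside a GradientTape context, every operation is recorded, and afterwards you can compute the gradient of any tensor computed in that context with respect to any trainable variable. For example, the code above computes s inside a GradientTape context and then computes the gradient of s with respect to w1. Since s = w1 * x1 + w2 * x2, the gradient of s with respect to w1 is x1, which is 5.0 here.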
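To make the "record, then differentiate" idea concrete, here is a toy reverse-mode tape in plain Python. It is a sketch under stated assumptions, not how TensorFlow implements GradientTape: the names Var, ToyTape, mul and add are hypothetical, and only scalar multiplication and addition are supported.

```python
# Toy reverse-mode tape for illustration only. Var, ToyTape, mul and add are
# hypothetical names and are unrelated to TensorFlow's real implementation.

class Var:
    def __init__(self, value):
        self.value = value


class ToyTape:
    def __init__(self):
        # Each entry: (output, [(input, d_output/d_input), ...]), in run order.
        self.ops = []

    def record(self, out, inputs):
        self.ops.append((out, inputs))

    def gradient(self, target, sources):
        grads = {id(target): 1.0}  # d(target)/d(target) = 1
        # Replay the recording backwards, applying the chain rule.
        for out, inputs in reversed(self.ops):
            upstream = grads.get(id(out), 0.0)
            for node, local in inputs:
                grads[id(node)] = grads.get(id(node), 0.0) + upstream * local
        return [grads.get(id(src), 0.0) for src in sources]


def mul(tape, a, b):
    out = Var(a.value * b.value)
    tape.record(out, [(a, b.value), (b, a.value)])
    return out


def add(tape, a, b):
    out = Var(a.value + b.value)
    tape.record(out, [(a, 1.0), (b, 1.0)])
    return out


w1, w2 = Var(2.0), Var(3.0)
x1, x2 = Var(5.0), Var(7.0)

tape = ToyTape()
s = add(tape, mul(tape, w1, x1), mul(tape, w2, x2))

print(s.value)                     # 31.0
print(tape.gradient(s, [w1, w2]))  # [5.0, 7.0], i.e. x1 and x2
```

Because the tape stores every recorded operation, the same recording can be asked for the gradient of s with respect to any value that took part in the computation, which mirrors the behaviour described above.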

Original Stack Overflow question:
What's the difference between GradientTape, implicit_gradients?
