In conventional deep learning, training updates the network weights based on the input data. Image style transfer (IST) does the opposite: the network parameters are frozen, and it is the input image that gets updated by gradient descent.
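To make that concrete, here is a minimal, standalone sketch of "optimize the pixels, not the weights" (assuming TF2 eager mode; it is not part of the pipeline below, which uses the Keras backend API, and its loss is only a dummy placeholder):
# Standalone sketch, not part of this notebook's pipeline (dummy loss for illustration only).
import tensorflow as tf
from tensorflow.keras.applications import vgg19

model = vgg19.VGG19(include_top=False, weights="imagenet")
model.trainable = False                                    # the network parameters stay fixed

image = tf.Variable(tf.random.uniform((1, 300, 300, 3)))   # the image itself is the trainable quantity

with tf.GradientTape() as tape:
    features = model(image)
    loss = tf.reduce_mean(tf.square(features))             # placeholder loss, just to have something to minimize

grad = tape.gradient(loss, image)                          # d(loss)/d(image), not d(loss)/d(weights)
image.assign_sub(0.01 * grad)                              # one gradient step applied to the pixels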
1. We take the content and style images and resize them to the same shape.
# Imports
import numpy as np
from PIL import Image
import requests
from io import BytesIO
import tensorflow as tf
import keras.backend as K
from keras.models import Model
from keras.preprocessing.image import load_img, img_to_array, array_to_img
# load_img loads an image into PIL format; img_to_array converts a PIL Image instance to a NumPy array.
from keras.applications import vgg19
from scipy.optimize import fmin_l_bfgs_b
# Hyperparams
ITERATIONS = 2
CHANNELS = 3
IMAGE_WIDTH = 300
IMAGE_HEIGHT = 300
CONTENT_WEIGHT = 0.02
STYLE_WEIGHT = 4.5
TOTAL_VARIATION_WEIGHT = 0.9950
TOTAL_VARIATION_LOSS_FACTOR = 1.25
content_image_path = "input.png"
style_image_path = "style.png"
output_image_path = "output.png"
combined_image_path = "combined.png"
content_url_path = "http://y0.ifengimg.com/e42a62d6e58775df/2015/0813/re_55cc3b76ddef0.jpg"
style_url_path = "http://5b0988e595225.cdn.sohucs.com/images/20181020/9b226e94fad640a9b711c99e5985853c.jpeg"
#Input visualization
# PIL.Image.open reads channels in RGB order, whereas OpenCV's cv2.imread returns BGR order; an image loaded with cv2.imread and displayed as RGB therefore looks bluer.
input_image = Image.open(BytesIO(requests.get(content_url_path).content))
input_image = input_image.resize((IMAGE_WIDTH, IMAGE_HEIGHT))
input_image.save(content_image_path)
input_image
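(As a hypothetical aside, since OpenCV is not used anywhere in this notebook: if the image were loaded with cv2, the channel order would have to be flipped before treating it as RGB.)
# Hypothetical aside only: cv2 is not part of this pipeline.
import cv2
bgr = cv2.imread("input.png")   # OpenCV returns an H x W x 3 array in BGR order
rgb = bgr[:, :, ::-1]           # reverse the channel axis: BGR -> RGB
# equivalently: rgb = cv2.cvtColor(bgr, cv2.COLOR_BGR2RGB)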
# Style visualization
style_image = Image.open(BytesIO(requests.get(style_url_path).content))
style_image = style_image.resize((IMAGE_WIDTH, IMAGE_HEIGHT))
style_image.save(style_image_path)
style_image
def preprocess_image(image):
    # For a local file one could instead do:
    # img = load_img(image_path, target_size=(IMAGE_HEIGHT, IMAGE_WIDTH))  # loads an image into PIL format
    # Here the image was already downloaded from the web and resized above, so it arrives as a PIL image.
    img = img_to_array(image)  # converts the PIL Image instance to a NumPy array
    img = np.expand_dims(img, axis=0)  # (IMAGE_HEIGHT, IMAGE_WIDTH, 3) -> (1, IMAGE_HEIGHT, IMAGE_WIDTH, 3)
    img = vgg19.preprocess_input(img)
    # The channels are converted from RGB to BGR, then each channel is zero-centered
    # with respect to the ImageNet dataset, without scaling.
    return img
input_image_array = preprocess_image(input_image)
style_image_array = preprocess_image(style_image)
2. We load a pre-trained CNN (VGG19).
# Build the model
input_image = K.variable(input_image_array)
style_image = K.variable(style_image_array)
combination_image = K.placeholder((1, IMAGE_HEIGHT, IMAGE_WIDTH, 3))
# input_tensor has shape (3, IMAGE_HEIGHT, IMAGE_WIDTH, 3), i.e. (3, 300, 300, 3): content, style and combination stacked along the batch axis.
input_tensor = K.concatenate([input_image, style_image, combination_image], axis=0)
model = vgg19.VGG19(input_tensor=input_tensor, include_top=False)
model.summary()
3. Then we frame the task as an optimization problem in which we minimize (see the combined objective after this list):
- content loss (distance between the input and output images - we strive to preserve the content)
- style loss (distance between the style and output images - we strive to apply a new style)
- total variation loss (regularization - spatial smoothness to denoise the output image)
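Putting the three terms together with the weights defined in the hyperparameters above (the symbols α, β, γ are just shorthand here for CONTENT_WEIGHT, STYLE_WEIGHT and TOTAL_VARIATION_WEIGHT), the objective minimized over the generated image x is:

$$\mathcal{L}_{total}(p, a, x) = \alpha\,\mathcal{L}_{content}(p, x) + \beta\,\mathcal{L}_{style}(a, x) + \gamma\,\mathcal{L}_{TV}(x)$$

where p is the content image and a is the style image. The total variation term is a regularizer commonly added on top of the content + style formulation of Gatys et al.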
The goal of the content loss is to ensure that the generated image x still preserves the "global" content of the content image p. To achieve this, the content loss at a chosen layer L is defined as the squared error between the feature representations of p and x at that layer:

$$\mathcal{L}_{content}(p, x, L) = \frac{1}{2}\sum_{i,j}\left(F^{L}_{ij} - P^{L}_{ij}\right)^{2}$$

F and P are matrices with N rows and M columns; F contains the features of x at layer L, and P contains the features of p at layer L.
def content_loss(content, combination):
    return K.sum(K.square(combination - content))
layers = dict([(layer.name, layer.output) for layer in model.layers])
content_layer = "block2_conv2"
layer_features = layers[content_layer]  # (3, 150, 150, 128)
content_image_features = layer_features[0, :, :, :]    # content image features, (150, 150, 128)
combination_features = layer_features[2, :, :, :]      # combination image features, (150, 150, 128)
loss = K.variable(0.)
loss = loss + CONTENT_WEIGHT*content_loss(content_image_features, combination_features)
The style loss has to preserve the style characteristics of the style image a. Rather than comparing feature representations directly, the authors compare the Gram matrices of selected layers (the Gram matrix is a common tool in texture synthesis), defined as:

$$G^{L}_{ij} = \sum_{k} F^{L}_{ik} F^{L}_{jk}$$

The Gram matrix is a square matrix containing the dot products between every pair of vectorized filters in layer L, so it can be seen as an (un-normalized) correlation matrix of the filters of layer L.

The style loss for a given layer L can then be defined as:

$$E_{L} = \frac{1}{4 N^{2} M^{2}} \sum_{i,j}\left(G^{L}_{ij} - A^{L}_{ij}\right)^{2}$$

where A is the Gram matrix of the style image a and G is the Gram matrix of the generated image x. N is the number of filters in layer L (CHANNELS in the code below) and M is the number of spatial elements in the feature maps of layer L (height times width).

In most convolutional networks such as VGG, the receptive field of the ascending layers grows larger and larger. As the receptive field grows, larger-scale features of the input image are preserved. For that reason we should pick several layers for style transfer, combining local and global style qualities. To blend these layers smoothly, we assign each layer a weight w_L and define the overall style loss as:

$$\mathcal{L}_{style}(a, x) = \sum_{L} w_{L}\, E_{L}$$
Concretely, the Gram matrix is obtained by taking a layer's filtered feature maps, flattening them and multiplying the resulting matrix by its own transpose; each entry is the correlation between the responses of two filters. For example, suppose one filter in a layer specializes in detecting pointed, spire-like shapes, another detects black, a third detects round shapes, and a fourth detects golden yellow. If we compute the Gram matrix of a van Gogh painting, which correlations come out large? "Pointed" and "black" always appear together, so their correlation is high; likewise "round" and "golden yellow" appear together, so their correlation is high as well. During style transfer we then look for the same kind of matches in the content image, rendering pointed things black and round things golden yellow. If we accept the assumption that "the artistic style of an image is the way its basic shapes and colours are combined", it is only natural that the Gram matrix can represent artistic style.
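To make the "filter correlation" reading concrete, here is a tiny NumPy illustration with made-up numbers (it is not part of the pipeline; the real computation is done symbolically by gram_matrix below):
# Toy illustration of a Gram matrix (made-up values, not part of the pipeline).
import numpy as np

# Pretend a layer has 3 filters and a 2x2 spatial grid: feature maps of shape (2, 2, 3).
feature_maps = np.array([[[1., 0., 2.], [0., 1., 0.]],
                         [[1., 0., 2.], [0., 1., 0.]]])

# One row per filter, each row the flattened responses of that filter: shape (3, 4).
F = feature_maps.reshape(-1, 3).T

# Gram matrix: dot products between every pair of flattened filter responses.
G = F @ F.T
print(G)
# G[0, 2] is the largest off-diagonal entry because filters 0 and 2 fire at the same
# locations, i.e. their responses are correlated -- exactly the signal the style loss matches.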
For the style representation, the first convolution of each block is used to compute the loss (in the original paper); the author argues this yields smoother texture features: using only low-level feature maps gives images that are detailed but rough, while higher layers contain more content information and lose some texture information, yet their textures are smoother.
def gram_matrix(x):
    # K.batch_flatten flattens an nD tensor to 2D by collapsing all dimensions after the first.
    features = K.batch_flatten(K.permute_dimensions(x, (2, 0, 1)))  # (300, 300, 64) -> (64, 300, 300) -> (64, 90000)
    gram = K.dot(features, K.transpose(features))  # (64, 64)
    return gram
def compute_style_loss(style, combination):
    style = gram_matrix(style)              # (channels, channels), e.g. (64, 64) for block1
    combination = gram_matrix(combination)  # (channels, channels)
    size = IMAGE_HEIGHT * IMAGE_WIDTH
    return K.sum(K.square(style - combination)) / (4. * (CHANNELS ** 2) * (size ** 2))
style_layers = ["block1_conv2", "block2_conv2", "block3_conv3", "block4_conv3", "block5_conv3"]
# Output shapes: (3, 300, 300, 64), (3, 150, 150, 128), (3, 75, 75, 256), (3, 37, 37, 512), (3, 18, 18, 512)
LAYER_WEIGHT = (STYLE_WEIGHT / len(style_layers))
for layer_name in style_layers:
    style_features = layers[layer_name][1, :, :, :]        # style image features, e.g. (300, 300, 64) for block1_conv2
    combination_features = layers[layer_name][2, :, :, :]  # combination image features
    style_loss = compute_style_loss(style_features, combination_features)
    loss = loss + LAYER_WEIGHT * style_loss
The total variation loss makes the generated image locally smooth (a spatial regularizer that denoises the output):

$$\mathcal{L}_{TV}(x) = \sum_{i,j}\left[\left(x_{i,j} - x_{i+1,j}\right)^{2} + \left(x_{i,j} - x_{i,j+1}\right)^{2}\right]^{1.25}$$

where the exponent 1.25 is TOTAL_VARIATION_LOSS_FACTOR.
def total_variation_loss(x):
    a = K.square(x[:, :IMAGE_HEIGHT-1, :IMAGE_WIDTH-1, :] - x[:, 1:, :IMAGE_WIDTH-1, :])
    b = K.square(x[:, :IMAGE_HEIGHT-1, :IMAGE_WIDTH-1, :] - x[:, :IMAGE_HEIGHT-1, 1:, :])
    return K.sum(K.pow(a + b, TOTAL_VARIATION_LOSS_FACTOR))
loss = loss+TOTAL_VARIATION_WEIGHT*total_variation_loss(combination_image)
4. Finally, we set up the gradients and optimize with the L-BFGS algorithm.
To start modifying the generated image so that it minimizes the loss function, we have to define two more functions with SciPy and the Keras backend: one that computes the total loss and one that computes the gradients. Their results are fed into a SciPy optimization routine as the objective function and the gradient function respectively. Here we use the L-BFGS algorithm (limited-memory BFGS).

For the content image and the style image we extract the feature representations used to build P and A (one per selected style layer), and all style layers are given the same weight. In practice, results usually become convincing after more than 500 L-BFGS iterations (note that this notebook only runs ITERATIONS = 2).
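As a reminder of the SciPy interface (a toy problem, unrelated to the style-transfer graph): fmin_l_bfgs_b takes an objective callable, a flat starting vector, and a gradient callable via fprime, and returns the updated vector, the final loss and an info dict.
# Toy example of the scipy interface used below: minimize f(v) = sum((v - 3)^2), gradient 2 * (v - 3).
import numpy as np
from scipy.optimize import fmin_l_bfgs_b

def f(v):
    return np.sum((v - 3.0) ** 2)

def grad_f(v):
    return 2.0 * (v - 3.0)

v0 = np.zeros(4)
v_opt, f_min, info = fmin_l_bfgs_b(f, v0, fprime=grad_f, maxfun=20)
print(v_opt)   # close to [3. 3. 3. 3.]
print(f_min)   # close to 0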
outputs = [loss]
# with tf.GradientTape() as gtape:
# #grads = gtape.gradient(african_elephant_output, last_conv_layer.output)
# outputs = outputs + gtape.gradient(loss, combination_image)
outputs = outputs + K.gradients(loss, combination_image)
def evaluate_loss_and_gradients(x):
    x = x.reshape((1, IMAGE_HEIGHT, IMAGE_WIDTH, CHANNELS))
    outs = K.function([combination_image], outputs)([x])
    loss = outs[0]
    gradients = outs[1].flatten().astype("float64")  # SciPy expects a flat float64 gradient vector
    return loss, gradients
class Evaluator:
    # fmin_l_bfgs_b queries the loss and the gradients through two separate callables,
    # so both are computed in one pass here and the gradients are cached for the second call.
    def loss(self, x):
        loss, gradients = evaluate_loss_and_gradients(x)
        self._gradients = gradients
        return loss

    def gradients(self, x):
        return self._gradients
evaluator = Evaluator()
x = np.random.uniform(0, 255, (1, IMAGE_HEIGHT, IMAGE_WIDTH, 3)) - 128.  # random initialization of the output image
for i in range(ITERATIONS):
    # fmin_l_bfgs_b is a SciPy function: the first argument is the objective (loss) function,
    # the second is the current input vector, fprime computes the gradient of the objective,
    # and maxfun caps the number of function evaluations. The first return value is the updated x
    # (fed back in on the next iteration), the second is the loss value.
    x, loss, info = fmin_l_bfgs_b(evaluator.loss, x.flatten(), fprime=evaluator.gradients, maxfun=20)
    print("Iteration %d completed with loss %d" % (i, loss))
def deprocess_image(img):
    # Inverse of preprocess_image / vgg19.preprocess_input
    img = img.reshape((IMAGE_HEIGHT, IMAGE_WIDTH, CHANNELS))
    print(img.shape)
    # Add back the ImageNet per-channel means (BGR order)
    img[:, :, 0] += 103.939
    img[:, :, 1] += 116.779
    img[:, :, 2] += 123.68
    # BGR -> RGB
    img = img[:, :, ::-1]
    img = np.clip(img, 0, 255).astype('uint8')
    img = array_to_img(img)  # equivalently Image.fromarray(img)
    return img
output_image = deprocess_image(x)
output_image.save(output_image_path)
output_image
# Visualizing combined results
combined = Image.new("RGB", (IMAGE_WIDTH*3, IMAGE_HEIGHT))
x_offset = 0
for image in map(Image.open, [content_image_path, style_image_path, output_image_path]):
    combined.paste(image, (x_offset, 0))
    x_offset = x_offset + IMAGE_WIDTH
combined.save(combined_image_path)
combined
https://github.com/dingtom/python/blob/master/Style_Transfer.ipynb