A Powerful Tool for Image Inpainting

Image Inpainting for Irregular Holes Using Partial Convolutions

https://arxiv.org/pdf/1804.07723.pdf

Next to this paper, Adobe's Contextual Attention looks rather underwhelming:

https://arxiv.org/pdf/1801.07892.pdf

Existing deep learning based image inpainting methods use a standard convolutional network over the corrupted image, with convolutional filter responses conditioned on both the valid pixels and the substitute values in the masked holes (typically the mean value). This often leads to artifacts such as color discrepancy and blurriness.

Post-processing is usually used to reduce such artifacts, but it is expensive and may fail. We propose the use of partial convolutions, where the convolution is masked and renormalized to be conditioned only on valid pixels.

We further include a mechanism to automatically generate an updated mask for the next layer as part of the forward pass. Our model outperforms other methods for irregular masks. We show qualitative and quantitative comparisons with other methods to validate our approach.

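Image inpainting, the task of filling in holes in an image, can be used in many applications. For example, it can be used in image editing to remove unwanted image content while filling the resulting space with plausible imagery. Previous deep learning approaches have focused on rectangular regions located around the center of the image, and often rely on expensive post-processing. The goal of this work is to propose a model for image inpainting that operates robustly on irregular hole patterns (see Figure 1) and produces semantically meaningful predictions that blend smoothly with the rest of the image, without the need for any additional post-processing or blending operation.

Recent image inpainting approaches that do not use deep learning fill in the hole using image statistics from the remaining image. PatchMatch [3], one of the state-of-the-art methods, iteratively searches for the best-fitting patches to fill in the holes. While this approach generally produces smooth results, it is limited by the available image statistics and has no concept of visual semantics. For example, in Figure 2(b), PatchMatch was able to smoothly fill in the missing components of the painting using image patches from the surrounding shadow and wall, but a semantically aware approach would use patches from the painting instead.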

Deep neural networks, which learn semantic priors and meaningful hidden representations in an end-to-end fashion, have been used in recent image inpainting work. These networks apply convolutional filters to images in which the removed content has been replaced with a fixed value.

As a result, these approaches suffer from dependence on the initial hole values, which often manifests itself as a lack of texture in the hole regions, obvious color contrasts, or artificial edge responses surrounding the hole. Examples using a U-Net architecture with typical convolutional layers and various hole value initializations can be seen in Figure 2(e) and 2(f). (For both, training and testing share the same initialization scheme.) Conditioning the output on the hole values ultimately results in various types of visual artifacts that necessitate expensive post-processing. For example, Iizuka et al. [1] use fast marching [4] and Poisson image blending [5], while Yu et al. [2] employ a follow-up refinement network to refine their raw network predictions.

However, these refinements cannot resolve all the artifacts, as shown in Figure 2(c) and 2(d). Our work aims to achieve well-incorporated hole predictions that are independent of the hole initialization values and need no additional post-processing.

Another limitation of many recent approaches is the focus on rectangular holes, often assumed to be centered in the image. We find these limitations may lead to overfitting to rectangular holes, and ultimately limit the utility of these models in applications. Pathak et al. [6] and Yang et al. [7] assume 64×64 square holes at the center of a 128×128 image. Iizuka et al. [1] and Yu et al. [2] remove the centered-hole assumption and can handle irregularly shaped holes, but do not perform an extensive quantitative analysis on a large number of images with irregular masks (51 test images in [8]). In order to focus on the more practical irregular-hole use case, we collect a large benchmark of images with irregular masks of varying sizes. In our analysis, we look at the effects of not just the size of the hole, but also whether the holes are in contact with the image border.
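The two mask properties mentioned above (hole size and contact with the image border) are easy to measure on a binary mask. The sketch below is only an illustration: the helper names `hole_ratio` and `touches_border` are made up here, and it assumes the convention that 1 marks a valid pixel and 0 marks a hole.

```python
def hole_ratio(mask):
    """Fraction of pixels that are holes (0) in a 2-D list mask (assumed: 1 = valid, 0 = hole)."""
    total = sum(len(row) for row in mask)
    holes = sum(row.count(0) for row in mask)
    return holes / total


def touches_border(mask):
    """True if any hole pixel lies on the outermost rows or columns of the mask."""
    top_bottom = mask[0] + mask[-1]
    sides = [row[0] for row in mask] + [row[-1] for row in mask]
    return 0 in top_bottom or 0 in sides


# Hypothetical 4x4 masks: one with a hole in the middle, one with a hole in a corner.
centered = [
    [1, 1, 1, 1],
    [1, 0, 0, 1],
    [1, 1, 1, 1],
    [1, 1, 1, 1],
]
corner = [
    [0, 0, 1, 1],
    [1, 1, 1, 1],
    [1, 1, 1, 1],
    [1, 1, 1, 1],
]

print(hole_ratio(centered), touches_border(centered))
print(hole_ratio(corner), touches_border(corner))
```

Both masks have the same hole ratio, so a benchmark that only bins by hole size would treat them alike; the border check is what separates them.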

To properly handle irregular masks, we propose the use of a Partial Convolutional Layer, comprising a masked and re-normalized convolution operation followed by a mask-update step. The concept of a masked and re-normalized convolution is also referred to as segmentation-aware convolution in [9] for the image segmentation task; however, they did not make modifications to the input mask. Our use of partial convolutions is such that, given a binary mask, our convolutional results depend only on the non-hole regions at every layer. Our main extension is the automatic mask update step, which removes any masking where the partial convolution was able to operate on an unmasked value. Given sufficient layers of successive updates, even the largest masked holes will eventually shrink away, leaving only valid responses in the feature map. The partial convolutional layer ultimately makes our model agnostic to placeholder hole values.

(This reminds me of Mask R-CNN.)
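To make the claim "agnostic to placeholder hole values" concrete, here is a minimal single-channel sketch of a masked, re-normalized convolution with a mask update. It is not the authors' implementation: the function name `partial_conv2d`, the stride-1 / no-padding setup, and the choice of scaling by window size over valid count are assumptions made for illustration.

```python
def partial_conv2d(image, mask, weights, bias=0.0):
    """Single-channel partial convolution sketch (stride 1, no padding).

    image, mask: 2-D lists of equal shape; mask is 1 for valid pixels, 0 for holes.
    weights: k x k kernel as a 2-D list.
    Returns (output, updated_mask).
    """
    k = len(weights)
    window_size = k * k  # number of positions in one window
    out_h = len(image) - k + 1
    out_w = len(image[0]) - k + 1
    output, new_mask = [], []
    for i in range(out_h):
        out_row, mask_row = [], []
        for j in range(out_w):
            acc = 0.0
            valid = 0
            for di in range(k):
                for dj in range(k):
                    m = mask[i + di][j + dj]
                    # Hole pixels are multiplied by 0, so their value never matters.
                    acc += weights[di][dj] * image[i + di][j + dj] * m
                    valid += m
            if valid > 0:
                # Re-normalize by the share of valid inputs (assumed scaling convention).
                out_row.append(acc * window_size / valid + bias)
                mask_row.append(1)  # mask update: at least one valid input was seen
            else:
                out_row.append(0.0)
                mask_row.append(0)  # window saw only holes; stays masked
        output.append(out_row)
        new_mask.append(mask_row)
    return output, new_mask


def make_image(fill):
    """Hypothetical 7x7 test image: constant 2.0 with a 3x3 hole filled with `fill`."""
    img = [[2.0] * 7 for _ in range(7)]
    for r in range(2, 5):
        for c in range(2, 5):
            img[r][c] = fill
    return img


mask = [[0 if 2 <= r <= 4 and 2 <= c <= 4 else 1 for c in range(7)] for r in range(7)]
kernel = [[1.0 / 9.0] * 3 for _ in range(3)]

out_zero, mask_zero = partial_conv2d(make_image(0.0), mask, kernel)
out_big, mask_big = partial_conv2d(make_image(255.0), mask, kernel)

print(out_zero == out_big)  # placeholder value in the hole does not change the output
print(mask_zero[2])         # only the window lying fully inside the hole stays masked
```

With a plain convolution the two runs would differ wherever a window overlaps the hole; here the outputs match exactly, and the updated mask has a smaller hole than the input mask.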

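In summary, we make the following contributions:

- We propose the use of partial convolutions with an automatic mask update step to achieve state-of-the-art image inpainting.

- While previous works fail to achieve good inpainting results with skip links in a U-Net [10] with typical convolutions, we demonstrate that substituting the convolutional layers with partial convolutions and mask updates achieves state-of-the-art inpainting results.

- To the best of our knowledge, we are the first to demonstrate the efficacy of training image inpainting models on irregularly shaped holes.

- We propose a large irregular mask dataset, which will be released publicly to facilitate future efforts in training and evaluating inpainting models.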

3 Approach

Our proposed model uses stacked partial convolution operations and mask updating steps to perform image inpainting. We first define our convolution and mask update mechanism, then discuss model architecture and loss functions.

3.1 Partial Convolutional Layer

For brevity, we refer to our partial convolution operation and mask update function jointly as the Partial Convolutional Layer. Let W be the weights of the convolution filter and b the corresponding bias. X are the feature values (pixel values) for the current convolution (sliding) window and M is the corresponding binary mask. The partial convolution at every location, similarly defined in [9], is expressed as:
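The formula itself did not survive in these notes. The block below is a reconstruction from the definitions of W, b, X and M above and from the "masked and re-normalized" description, so treat it as a sketch rather than a verbatim quote. In particular, the scaling factor is written here as sum(1)/sum(M), where 1 is an all-ones tensor with the same shape as M; versions of the paper appear to differ on whether the numerator is sum(1) or simply 1. The second equation is the mask update applied after each partial convolution.

```latex
x' =
\begin{cases}
W^{T}(X \odot M)\,\dfrac{\mathrm{sum}(\mathbf{1})}{\mathrm{sum}(M)} + b, & \text{if } \mathrm{sum}(M) > 0 \\[6pt]
0, & \text{otherwise}
\end{cases}
\qquad
m' =
\begin{cases}
1, & \text{if } \mathrm{sum}(M) > 0 \\
0, & \text{otherwise}
\end{cases}
```

Here \odot is element-wise multiplication. The output depends only on the unmasked inputs, and a location becomes valid in the next layer's mask as soon as its window contains at least one valid input.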

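Related Work

Non-learning methods:

1) Propagate appearance information from neighboring pixels to the target region using some mechanism, such as a distance field. These methods can only handle small holes; large holes lead to over-smoothing or artifacts.

2) Patch-based: iteratively search for matching patches in the non-hole regions.

PatchMatch speeds up the search, but it is still too slow.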

Deep learning methods

Initialize the holes with constant placeholder values ---> CNN ---> post-process

1) Context Encoders + propagating the texture information from non-hole regions to fill the hole regions as post-processing.

2) A blurry initial hole-filling result is used as the input to a network, then replaced with patches from the closest non-hole regions in the feature space.

3) Both global and local discriminators; Poisson blending as a post-process.

4) Contextual attention layers.

Methods that ignore the mask placeholder values (as we do):

1) Search for the closest encoding to the corrupted image in a latent space, which is then used to condition the output of a hole-filling generator.

2) The network needs no external dataset training and can rely on the structure of the generative network itself to complete the corrupted image.

Drawbacks:

1) Requires a different set of hyperparameters for every image.

2) Applies several iterations to achieve good results.

3) Not able to use skip links, which are known to produce detailed outputs (we can).

4) With standard convolutional layers, the raw features of noise or wrong hole initialization values in the encoder stage will propagate to the decoder stage.

Our work:

- Makes extensive use of a masked or reweighted convolution operation, which allows us to condition the output only on valid inputs.

- Single forward pass.

Previous examples of masked convolution operations:

- Soft attention mask for semantic segmentation.

- PixelCNN (conditions the next pixel only on previously synthesized pixels).

What is partial convolution?

A special case of the normalized convolution.

Our contribution to partial convolution:

- Further update the input mask for the next layer based on where the partial convolution was able to make a valid response.

Our model:

- Encoder-decoder architecture.

- With sufficient receptive fields such that the mask is fully valid before it enters the decoder half.

- This simplifies the decoding process.
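The point about receptive fields can be checked with a small simulation: apply the mask update repeatedly and count how many layers it takes before no hole remains. This is a simplified sketch, assuming stride-1 layers with a 3×3 window and zero padding; the function names are made up here, and the real encoder (with strides and larger kernels) would close holes faster.

```python
def update_mask(mask, k=3):
    """One mask-update step: a pixel becomes valid if any pixel in its k x k window is valid.

    Assumes stride 1 and same-size output; positions outside the image count as invalid.
    """
    h, w = len(mask), len(mask[0])
    r = k // 2
    new = [[0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            for di in range(-r, r + 1):
                for dj in range(-r, r + 1):
                    y, x = i + di, j + dj
                    if 0 <= y < h and 0 <= x < w and mask[y][x] == 1:
                        new[i][j] = 1
    return new


def layers_until_valid(mask, k=3):
    """Number of successive mask updates needed until the mask contains no holes."""
    if not any(1 in row for row in mask):
        raise ValueError("mask has no valid pixel, so the hole can never shrink")
    steps = 0
    while any(0 in row for row in mask):
        mask = update_mask(mask, k)
        steps += 1
    return steps


# Hypothetical 9x9 mask with a 5x5 hole in the middle (1 = valid, 0 = hole).
mask = [[0 if 2 <= r <= 6 and 2 <= c <= 6 else 1 for c in range(9)] for r in range(9)]
print(layers_until_valid(mask))  # 3: the hole shrinks 5x5 -> 3x3 -> 1x1 -> gone
```

Each 3×3 update peels one pixel off every side of the hole, which is why the encoder needs enough layers (or stride) for the largest expected hole to vanish before decoding starts.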

Section 3 Model

Our proposed model uses stacked partial convolution operations and mask updating steps to perform image inpainting.

3.1 define our convolution and mask update mechanism,

3.2 model architecture

Omitted; see the original paper.

3.3 loss functions.
