CE-Net: Context Encoder Network for 2D Medical Image Segmentation

2019．6

arXiv：https://arxiv.org/abs/1903.02740

github：https://github.com/Guzaiwang/CE-Net

Abstract

Medical image segmentation is an important step in medical image analysis. With the rapid development of convolutional neural networks in image processing, deep learning has been used for medical image segmentation, such as optic disc segmentation, blood vessel detection, lung segmentation, cell segmentation, etc. Previously, U-Net based approaches have been proposed. However, the consecutive pooling and strided convolutional operations lead to the loss of some spatial information. In this paper, we propose a context encoder network (referred to as CE-Net) to capture more high-level information and preserve spatial information for 2D medical image segmentation. CE-Net contains three major components: a feature encoder module, a context extractor and a feature decoder module. We use a pretrained ResNet block as the fixed feature extractor. The context extractor module is formed by a newly proposed dense atrous convolution (DAC) block and a residual multi-kernel pooling (RMP) block. We applied the proposed CE-Net to different 2D medical image segmentation tasks. Comprehensive results show that the proposed method outperforms the original U-Net method and other state-of-the-art methods for optic disc segmentation, vessel detection, lung segmentation, cell contour segmentation and retinal optical coherence tomography layer segmentation.

INTRODUCTION

Medical image segmentation is often an important step in medical image analysis, such as optic disc segmentation [1], [2], [3] and blood vessel detection [4], [5], [6], [7], [8] in retinal images, cell segmentation [9], [10], [11] in electron microscopic (EM) recordings, lung segmentation [12], [13], [14], [15], [16] and brain segmentation [17], [18], [19], [20], [21], [22] in computed tomography (CT) and magnetic resonance imaging (MRI). Previous approaches to medical image segmentation are often based on edge detection and template matching [15]. For example, circular or elliptical Hough transforms are used in optic disc segmentation [23], [3]. Template matching is also used for spleen segmentation in MRI sequence images [24] and ventricular segmentation in brain CT images [22].

Deformable models are also proposed for medical image segmentation. The shape-based method using level sets [25] has been proposed for two-dimensional segmentation of cardiac MRI images and three-dimensional segmentation of prostate MRI images. In addition, a level set-based deformable model is adopted for kidney segmentation from abdominal CT images [26]. The deformable model has also been integrated with the Gibbs prior models for segmenting the boundaries of organs [27], with an evolutionary algorithm and a statistical shape model to segment the liver [16] from CT volumes. In optic disc segmentation, different deformable models have also been proposed and adopted, such as mathematical morphology, global elliptical model, local deformable model [28], and modified active shape model [29].

Learning-based approaches have been proposed to segment medical images as well. Aganj et al. [30] proposed a local center of mass based method for unsupervised image segmentation in X-ray and MRI images. Kanimozhi et al. [31] applied the stationary wavelet transform to obtain feature vectors, and a self-organizing map is adopted to handle these feature vectors for unsupervised MRI image segmentation. Tong et al. [32] combined dictionary learning and sparse coding to segment multiple organs in abdominal CT images. Pixel classification based approaches [33], [1] are also learning-based approaches, which train classifiers on pixels using pre-annotated data. However, it is not easy to select the pixels and extract features to train the classifier from the large number of pixels. Cheng et al. [1] used a superpixel strategy to reduce the number of pixels and performed optic disc and cup segmentation using superpixel classification. Tian et al. [34] adopted a superpixel-based graph cut method to segment 3D prostate MRI images. In [35], a superpixel learning based method is integrated with restricted regions of shape constraints to segment the lung from CT images.

The drawback of these methods lies in their reliance on hand-crafted features to obtain the segmentation results. On the one hand, it is difficult to design representative features for different applications. On the other hand, features designed to work well for one type of image often fail on another type. Therefore, a general approach to feature extraction is lacking.

With the development of convolutional neural network (CNN) in image and video processing [36] and medical image analysis [37], [38], automatic feature learning algorithms using deep learning have emerged as feasible approaches for medical image segmentation. Deep learning based segmentation methods are pixel-classification based learning approaches. Different from traditional pixel or superpixel classification approaches which often use hand-crafted features, deep learning approaches learn the features and overcome the limitation of hand-crafted features.

Earlier deep learning approaches for medical image segmentation are mostly based on image patches. Ciresan et al. [39] proposed to segment neuronal membranes in microscopy images based on patches and a sliding window strategy. Then, Kamnitsas et al. [40] employed a multi-scale 3D CNN architecture with a fully connected conditional random field (CRF) for boosting patch based brain lesion segmentation. This patch based solution has two main drawbacks: redundant computation caused by the sliding window, and the inability to learn global features.

With the emergence of the end-to-end fully convolutional network (FCN) [41], Ronneberger et al. [10] proposed the U-shaped network (U-Net) framework for biomedical image segmentation. U-Net has shown promising results on the segmentation of neuronal structures in electron microscopic recordings and cell segmentation in light microscopic images. It has become a popular neural network architecture for biomedical image segmentation tasks [42], [43], [44], [45]. Sevastopolsky et al. [43] applied U-Net to directly segment the optic disc and optic cup in retinal fundus images for glaucoma diagnosis. Roy et al. [44] used a similar network for retinal layer segmentation in optical coherence tomography (OCT) images. Norman et al. [42] used U-Net to segment cartilage and meniscus from knee MRI data. U-Net has also been applied to directly segment the lung from CT images [45].

Many variations of U-Net have been proposed for different medical image segmentation tasks. Fu et al. [4] adopted the CRF to gather the multi-stage feature maps for boosting vessel detection performance. Later, a modified U-Net framework (called M-Net) [2] was proposed for joint optic disc and cup segmentation by adding multi-scale inputs and deep supervision to the U-Net architecture. Deep supervision mainly introduces an extra loss function associated with the middle-stage features. Based on deep supervision, Chen et al. [46] proposed VoxResNet to segment volumetric brain images, and Dou et al. [47] proposed a 3D deeply supervised network (3D DSN) to automatically segment the liver in CT volumes.

[46] H. Chen, Q. Dou, L. Yu, and P.-A. Heng, “VoxResNet: Deep voxelwise residual networks for volumetric brain segmentation,” arXiv preprint arXiv:1608.05895, 2016.

[47] Q. Dou, H. Chen, Y. Jin, L. Yu, J. Qin, and P.-A. Heng, “3D deeply supervised network for automatic liver segmentation from CT volumes,” in International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer, 2016, pp. 149–157.

To enhance the feature learning ability of U-Net, some new modules have been proposed to replace the original blocks. Stefanos et al. [48] proposed a branch residual U-network (BRU-net) to segment pathological OCT retinal layers for age-related macular degeneration diagnosis. BRU-net relies on residual connections and dilated convolutions to enhance the final OCT retinal layer segmentation. Gibson et al. [49] introduced dense connections in each encoder block to automatically segment multiple organs on abdominal CT. Kumar et al. [21] proposed InfiNet for infant brain MRI segmentation. Besides the above achievements in U-Net based medical image segmentation, some researchers have also modified U-Net for general image segmentation. Peng et al. [50] proposed a novel global convolutional network to improve semantic segmentation. Lin et al. [51] proposed a multi-path refinement network, which contains residual convolution units, multi-resolution fusion and chained residual pooling. Zhao et al. [52] adopted spatial pyramid pooling to gather the extracted feature maps to improve semantic segmentation performance.

A common limitation of U-Net and its variations is that the consecutive pooling operations or strided convolutions reduce the feature resolution in order to learn increasingly abstract feature representations. Although this invariance is beneficial for classification or object detection tasks, it often impedes dense prediction tasks, which require detailed spatial information. Intuitively, maintaining high-resolution feature maps at the middle stages can boost segmentation performance. However, it increases the size of the feature maps, which slows training and makes optimization harder. Therefore, there is a trade-off between accelerating training and maintaining high resolution. Generally, U-Net structures can be considered Encoder-Decoder architectures. The Encoder aims to reduce the spatial dimension of the feature maps gradually and capture more high-level semantic features. The Decoder aims to recover the object details and spatial dimension. Therefore, it is natural to capture more high-level features in the encoder and preserve more spatial information in the decoder to improve the performance of image segmentation.

Motivated by the above discussions and by the Inception-ResNet structures [53], [54], which make the neural network wider and deeper, we propose a novel dense atrous convolution (DAC) block that employs atrous convolution. The original U-Net architecture captures multi-scale features in a limited scaling range by adopting consecutive 3×3 convolution and pooling operations in the encoding path. Our proposed DAC block can capture wider and deeper semantic features by using four cascade branches with multi-scale atrous convolutions. In this module, the residual connection is utilized to prevent the gradient from vanishing. In addition, we also propose a residual multi-kernel pooling (RMP) block motivated by spatial pyramid pooling [55]. The RMP block further encodes the multi-scale context features of the object extracted from the DAC module by employing pooling operations of various sizes, without extra learned weights. In summary, the DAC block is proposed to extract enriched feature representations with multi-scale atrous convolutions, followed by the RMP block, which gathers further context information with multi-scale pooling operations. Integrating the newly proposed DAC block and RMP block with the backbone encoder-decoder structure, we propose a novel context encoder network named CE-Net. It relies on the DAC block and the RMP block to get more abstract features and preserve more spatial information to boost the performance of medical image segmentation.

The main contributions of this work are summarized as follows:

1) We propose a DAC block and RMP block to capture more high-level features and preserve more spatial information.

2) We integrate the proposed DAC block and RMP block with encoder-decoder structure for medical image segmentation.

3) We apply the proposed method in different tasks including optic disc segmentation, retinal vessel detection, lung segmentation, cell contour segmentation and retinal OCT layer segmentation. Results show that the proposed method outperforms the state-of-the-art methods in these different tasks.

The remainder of this paper is organized as follows. Section II introduces the proposed method in detail. Section III presents the experimental results and discussions. In Section IV, we draw some conclusions.

METHOD

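Fig. 1. Illustration of the proposed CE-Net. First, the images are fed into a feature encoder module, where ResNet-34 blocks pretrained on ImageNet replace the original U-Net encoder blocks. The context extractor is proposed to generate more high-level semantic feature maps. It contains a dense atrous convolution (DAC) block and a residual multi-kernel pooling (RMP) block. Finally, the extracted features are fed into the feature decoder module. In this paper, we adopt a decoder block to enlarge the feature size, replacing the original up-sampling operation. The decoder block contains 1×1 convolution and 3×3 deconvolution operations. Based on the skip connections and the decoder block, we obtain the mask as the segmentation prediction map.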

The proposed CE-Net consists of three major parts: the feature encoder module, the context extractor module, and the feature decoder module, as shown in Fig. 1.

A. Feature Encoder Module

In the U-Net architecture, each encoder block contains two convolution layers and one max pooling layer. In the proposed method, we replace these blocks with the pretrained ResNet-34 [53] in the feature encoder module, retaining the first four feature extracting blocks without the average pooling layer and the fully connected layers. Compared with the original block, ResNet adds a shortcut mechanism to avoid vanishing gradients and accelerate network convergence, as shown in Fig. 1(b). For convenience, we refer to the modified U-Net with pretrained ResNet as the backbone approach.
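
To make the downsampling in the encoder concrete, the following is a minimal bookkeeping sketch, not the authors' code. It tracks the feature-map shape through the standard ResNet-34 stages using plain Python. The 448×448 input size and the helper name `encoder_shapes` are assumptions chosen for illustration, and the integer division is only exact when the input side is divisible by 32.

```python
# (stage name, output channels, stride applied by the stage) for ResNet-34
# with the average pooling and fully connected layers removed.
RESNET34_STAGES = [
    ("stem conv 7x7", 64, 2),
    ("stem max pool", 64, 2),
    ("block 1", 64, 1),
    ("block 2", 128, 2),
    ("block 3", 256, 2),
    ("block 4", 512, 2),
]


def encoder_shapes(height, width):
    """Return (stage, channels, height, width) after each encoder stage.

    Hypothetical helper: assumes height and width are divisible by 32.
    """
    shapes = []
    for name, channels, stride in RESNET34_STAGES:
        height //= stride
        width //= stride
        shapes.append((name, channels, height, width))
    return shapes


# Assumed 448x448 input, for illustration only.
for stage in encoder_shapes(448, 448):
    print(stage)
```

Under these assumptions the last block yields a 512-channel map at 1/32 of the input resolution, which is the map handed to the context extractor.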

B. Context Extractor Module

The context extractor module is a newly proposed module, consisting of the DAC block and the RMP block. This module extracts context semantic information and generates more high-level feature maps.

1) Atrous convolution: In semantic segmentation and object detection tasks, deep convolutional layers have been shown to be effective in extracting feature representations for images. However, the pooling layers lead to the loss of semantic information in images. To overcome this limitation, atrous convolution is adopted for dense segmentation [56].

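Fig. 3. Illustration of the dense atrous convolution block. It contains four cascade branches with a gradually increasing number of atrous convolutions, from 1 to 1, 3, and 5, so that the receptive fields of the branches are 3, 7, 9, and 19, respectively. Therefore, the network can extract features at different scales.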

Atrous convolution was originally proposed for the efficient computation of the wavelet transform. Mathematically, the atrous convolution of a signal is computed as follows (written in one dimension; for two-dimensional signals, i and k range over 2D positions):

$$y[i] = \sum_{k} x[i + r \cdot k]\, w[k] \qquad (1)$$

where the convolution of the input feature map x and a filter w yields the output y, and the atrous rate r corresponds to the stride with which we sample the input signal. It is equivalent to convolving the input x with upsampled filters produced by inserting r − 1 zeros between two consecutive filter values along each spatial dimension (hence the name atrous convolution, from the French "à trous", meaning "with holes"). Standard convolution is the special case of rate r = 1, and atrous convolution allows us to adaptively modify the filter's field of view by changing the rate value. See Fig. 2 for an illustration.
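
The equivalence between sampling the input with stride r and convolving with a zero-filled filter can be checked with a small sketch. The code below is an illustrative 1D version in plain Python, not the paper's implementation. The function names `atrous_conv1d` and `insert_zeros` and the sample signal are assumptions made for this example, and only positions where the filter fully fits are computed.

```python
def atrous_conv1d(x, w, rate):
    """Compute y[i] = sum_k x[i + rate * k] * w[k] where the filter fits."""
    span = rate * (len(w) - 1) + 1  # field of view of the dilated filter
    return [
        sum(x[i + rate * k] * w[k] for k in range(len(w)))
        for i in range(len(x) - span + 1)
    ]


def insert_zeros(w, rate):
    """Upsample a filter by inserting rate - 1 zeros between consecutive taps."""
    out = []
    for k, value in enumerate(w):
        out.append(value)
        if k < len(w) - 1:
            out.extend([0] * (rate - 1))
    return out


x = [1, 2, 3, 4, 5, 6, 7, 8]   # toy input signal
w = [1, 0, -1]                 # toy 3-tap filter

dilated = atrous_conv1d(x, w, rate=2)                  # sample input with stride 2
dense = atrous_conv1d(x, insert_zeros(w, 2), rate=1)   # zero-filled filter, rate 1
print(dilated == dense)
```

With rate 2 the 3-tap filter covers 5 input samples, which is how the field of view grows without adding weights.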

2) Dense Atrous Convolution module: Inception [54] and ResNet [53] are two classical and representative architectures in deep learning. Inception-series structures adopt different receptive fields to widen the architecture. In contrast, ResNet employs a shortcut connection mechanism to avoid exploding and vanishing gradients, which allowed neural networks to reach up to thousands of layers for the first time. The Inception-ResNet [54] block, which combines Inception and ResNet, inherits the advantages of both approaches and has become a baseline approach in the field of deep CNNs.

Inception-ResNet-v2

Motivated by the Inception-ResNet-V2 block and atrous convolution, we propose the dense atrous convolution (DAC) block to encode the high-level semantic feature maps. As shown in Fig. 3, the atrous convolutions are stacked in cascade mode. In this case, DAC has four cascade branches with a gradually increasing number of atrous convolutions, from 1 to 1, 3, and 5, so that the receptive fields of the branches are 3, 7, 9, and 19, respectively. It employs different receptive fields, similar to Inception structures. In each atrous branch, we apply one 1×1 convolution for rectified linear activation. Finally, we directly add the original features to the other features, like the shortcut mechanism in ResNet. Since the proposed block looks like a densely connected block, we name it the dense atrous convolution block. Very often, convolution with a large receptive field can extract and generate more abstract features for large objects, while convolution with a small receptive field is better for small objects. By combining atrous convolutions of different atrous rates, the DAC block is able to extract features for objects of various sizes.
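
The receptive fields 3, 7, 9, and 19 quoted above can be reproduced with a short calculation. For stride-1 layers, each layer with kernel size k and atrous rate r enlarges the receptive field by (k − 1)·r. The branch layouts below are an assumption chosen to be consistent with those four numbers (3×3 atrous convolutions with rates 1, 3, and 5, plus a 1×1 convolution), so treat this as a sketch of the arithmetic and not as a specification of the block.

```python
def receptive_field(layers):
    """Receptive field of a stack of stride-1 layers given as (kernel, rate)."""
    rf = 1
    for kernel, rate in layers:
        rf += (kernel - 1) * rate
    return rf


# Assumed branch layouts as (kernel size, atrous rate) pairs.
DAC_BRANCHES = [
    [(3, 1)],
    [(3, 3), (1, 1)],
    [(3, 1), (3, 3), (1, 1)],
    [(3, 1), (3, 3), (3, 5), (1, 1)],
]

fields = [receptive_field(branch) for branch in DAC_BRANCHES]
print(fields)
```

The 1×1 convolutions contribute nothing to the receptive field since (1 − 1)·r = 0; they only mix channels.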

3) Residual Multi-kernel pooling: A challenge in segmentation is the large variation of object size in medical images. For example, a tumor in the middle or late stage can be much larger than one in the early stage. In this paper, we propose residual multi-kernel pooling to address this problem, which mainly relies on multiple effective fields of view to detect objects of different sizes.

The size of the receptive field roughly determines how much context information we can use. The general max pooling operation employs only a single pooling kernel, such as 2×2. As illustrated in Fig. 4, the proposed RMP encodes global context information with four receptive fields of different sizes: 2×2, 3×3, 5×5 and 6×6. The four-level outputs contain feature maps of various sizes. To reduce the dimension of the weights and the computational cost, we use a 1×1 convolution after each level of pooling. It reduces the dimension of the feature maps to 1/N of the original dimension, where N represents the number of channels in the original feature maps. Then we upsample the low-dimension feature maps via bilinear interpolation to obtain features of the same size as the original feature map. Finally, we concatenate the original features with the upsampled feature maps.
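
The shape bookkeeping of the RMP block can be sketched as follows. This is an illustration under stated assumptions and not the authors' code: the pooling stride is assumed to equal the kernel size with no padding, and the 512×14×14 input is an assumed example. The reduction to one channel per branch follows from the 1/N channel reduction described above.

```python
def pooled_size(size, kernel):
    """Output length of max pooling with stride == kernel and no padding."""
    return (size - kernel) // kernel + 1


def rmp_shapes(channels, height, width, kernels=(2, 3, 5, 6)):
    """Return per-branch shapes after pooling + 1x1 conv, and the final shape.

    Each branch keeps 1 channel (1/N of the N input channels); after bilinear
    upsampling back to (height, width) the branches are concatenated with the
    original features.
    """
    branches = [
        (1, pooled_size(height, k), pooled_size(width, k)) for k in kernels
    ]
    out_channels = channels + len(kernels)
    return branches, (out_channels, height, width)


# Assumed 512 x 14 x 14 input feature map, for illustration only.
branches, output = rmp_shapes(512, 14, 14)
print(branches)
print(output)
```

Because only four single-channel maps are appended, the block adds context from four scales at a very small cost in channels.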

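Fig. 4. Illustration of the residual multi-kernel pooling (RMP) strategy. The proposed RMP gathers context information with four pooling kernels of different sizes. The features are then fed into a 1×1 convolution to reduce the dimension of the feature maps. Finally, the upsampled features are concatenated with the original features.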

C. Feature Decoder Module

The feature decoder module is adopted to restore the high-level semantic features extracted from the feature encoder module and the context extractor module. The skip connections carry detailed information from the encoder to the decoder to remedy the information loss due to consecutive pooling and strided convolution operations. Similar to [48], we adopt an efficient block to enhance the decoding performance. Simple upscaling and deconvolution are two common decoder operations in U-shaped networks. The upscaling operation increases the image size with linear interpolation, while deconvolution (also called transposed convolution) employs a convolution operation to enlarge the image. Intuitively, the transposed convolution can learn a self-adaptive mapping to restore features with more detailed information. Therefore, we choose to use the transposed convolution to restore the higher-resolution features in the decoder. As illustrated in Fig. 1(c), the decoder block consists of a 1×1 convolution, a 3×3 transposed convolution and a 1×1 convolution, applied consecutively. Based on the skip connections and the decoder block, the feature decoder module outputs a mask of the same size as the original input.
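
How a 3×3 transposed convolution enlarges a feature map can be seen from the usual output-size formula, out = (in − 1)·stride − 2·padding + kernel + output_padding. The sketch below applies it to one decoder block. The stride of 2, padding of 1, output padding of 1, and the reduction to a quarter of the input channels in the middle of the block are all assumptions made for illustration; the text above only fixes the 1×1, 3×3 transposed, 1×1 ordering.

```python
def transposed_conv_size(size, kernel=3, stride=2, padding=1, output_padding=1):
    """Output length of a transposed convolution (dilation 1)."""
    return (size - 1) * stride - 2 * padding + kernel + output_padding


def decoder_block_shape(in_channels, out_channels, height, width):
    """Return (layer, channels, height, width) after each decoder-block layer.

    Hypothetical layout: the first 1x1 conv reduces channels to a quarter
    (an assumption), the 3x3 transposed conv doubles the resolution, and the
    last 1x1 conv sets the output channels.
    """
    mid = in_channels // 4
    up_h = transposed_conv_size(height)
    up_w = transposed_conv_size(width)
    return [
        ("1x1 conv", mid, height, width),
        ("3x3 transposed conv", mid, up_h, up_w),
        ("1x1 conv", out_channels, up_h, up_w),
    ]


# Assumed 512-channel 14x14 input, for illustration only.
for step in decoder_block_shape(512, 256, 14, 14):
    print(step)
```

With these settings every decoder block doubles the resolution, so the output can be brought back to the input size after matching the encoder's downsampling steps.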

D. Loss Function

Our framework is an end-to-end deep learning system. As illustrated in Fig. 1, we need to train the proposed method to predict whether each pixel is foreground or background, which is a pixel-wise classification problem. The most common loss function for this is the cross entropy loss.

However, the objects in medical images, such as the optic disc and retinal vessels, often occupy a small region of the image. The cross entropy loss is not optimal for such tasks. In this paper, we use the Dice coefficient loss function [57], [58] to replace the common cross entropy loss. The comparison experiments and discussions are presented in the following section. The Dice coefficient is a measure of overlap widely used to assess segmentation performance when ground truth is available, as in Equation (2):

$$L_{dice} = 1 - \sum_{k}^{K} \frac{2\,\omega_k \sum_{i}^{N} p(k,i)\, g(k,i)}{\sum_{i}^{N} p^2(k,i) + \sum_{i}^{N} g^2(k,i)} \qquad (2)$$

where N is the number of pixels, p(k,i) ∈ [0, 1] and g(k,i) ∈ {0, 1} denote the predicted probability and the ground truth label for class k, respectively. K is the number of classes, and ω_k are the class weights, with Σ_k ω_k = 1. In our paper, we set ω_k = 1/K empirically. The final loss function is defined as:

$$L_{loss} = L_{dice} + L_{reg} \qquad (3)$$

where Lreg represents the regularization loss (also called weight decay) [59] used to avoid overfitting.
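
The loss described above can be sketched in plain Python as follows. This is a minimal illustration and not the authors' training code. It assumes the weighted Dice form with squared terms in the denominator and class weights of 1/K; the small `eps` term (to avoid division by zero for an empty class), the L2 form of the regularization term, and the `weight_decay` value are assumptions added for this example.

```python
def dice_loss(probs, labels, eps=1e-7):
    """Weighted Dice loss with class weights 1/K.

    probs[k][i]  -- predicted probability of class k at pixel i, in [0, 1]
    labels[k][i] -- ground truth for class k at pixel i, 0 or 1
    eps          -- assumed stabilizer, not part of the description above
    """
    num_classes = len(probs)
    weight = 1.0 / num_classes
    overlap = 0.0
    for p_k, g_k in zip(probs, labels):
        intersection = sum(p * g for p, g in zip(p_k, g_k))
        denominator = sum(p * p for p in p_k) + sum(g * g for g in g_k)
        overlap += 2.0 * weight * intersection / (denominator + eps)
    return 1.0 - overlap


def total_loss(probs, labels, weights, weight_decay=1e-4):
    """Dice loss plus an assumed L2 weight-decay term over model weights."""
    l_reg = weight_decay * sum(w * w for w in weights)
    return dice_loss(probs, labels) + l_reg


# Toy example: 2 classes, 4 pixels, small foreground region.
labels = [[1, 0, 0, 0], [0, 1, 1, 1]]
wrong = [[0, 1, 1, 1], [1, 0, 0, 0]]
print(dice_loss(labels, labels))   # close to 0 for a perfect prediction
print(dice_loss(wrong, labels))    # 1.0 when there is no overlap at all
```

Because each class is normalized by its own size, a small foreground region contributes as much to the loss as a large background, which is the motivation given above for preferring Dice over cross entropy.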

To evaluate the performance of CE-Net, we apply the proposed method to five different medical image segmentation tasks: optic disc segmentation, retinal vessel detection, lung segmentation, cell contour segmentation and retinal OCT layer segmentation.
