[Paper Translation] Multi-label Classification with Partial Annotations using Class-aware Selective Loss

Multi-label Classification with Partial Annotations using Class-aware Selective Loss

https://arxiv.org/pdf/2110.10955.pdf

Abstract

Large-scale multi-label classification datasets are commonly, and perhaps inevitably, partially annotated. That is, only a small subset of labels are annotated per sample. Different methods for handling the missing labels induce different properties on the model and impact its accuracy. In this work, we analyze the partial labeling problem, then propose a solution based on two key ideas. First, un-annotated labels should be treated selectively according to two probability quantities: the class distribution in the overall dataset and the specific label likelihood for a given data sample. We propose to estimate the class distribution using a dedicated temporary model, and we show its improved efficiency over a naïve estimation computed using the dataset’s partial annotations. Second, during the training of the target model, we emphasize the contribution of annotated labels over originally un-annotated labels by using a dedicated asymmetric loss. With our novel approach, we achieve state-of-the-art results on OpenImages dataset (e.g. reaching 87.3 mAP on V6). In addition, experiments conducted on LVIS and simulated-COCO demonstrate the effectiveness of our approach. Code is available at https://github.com/Alibaba-MIIL/PartialLabelingCSL.

1. Introduction

Recently, remarkable progress has been made in multi-label classification [4, 7, 16, 29]. Dedicated loss functions were proposed in [2, 27], as well as transformer-based approaches [5, 16, 19]. In many common cases, such as [6, 8, 11, 14, 15], as the amounts of samples and labels in the data increase, it becomes impractical to fully annotate each image. For example, the OpenImages dataset [15] consists of 9 million training images and has 9,600 classes. An exhaustive annotation process would require annotating more than 86 billion labels. As a result, partially labeled data is inevitable in realistic large-scale multi-label classification tasks. A partially labeled image is annotated with a subset of positive labels and a subset of negative labels, while the remaining un-annotated labels are considered unknown (Figure 1). Typically, the majority of the labels are un-annotated. For example, on average, a picture in OpenImages is annotated with only 7 labels. Thus, the question of how to treat the numerous un-annotated labels may have a considerable impact on the learning process.

Figure 1. Challenges in partial annotation. (1) “Lip” and “Yellow” are clearly present in the left image but were not annotated as positive labels. The middle and right images are annotated with “Yellow” and “Lip” respectively, while not being dominant labels in those images. (2) The deficiency of positive annotations is a key challenge: classes that frequently appear in images (e.g. “Black”, “Lip”) may be annotated much less compared with infrequent classes (“Flower”, “Guitar”). (3) Most labels are un-annotated. How to exploit a temporary model’s predictions for the un-annotated labels when training a target model?

The basic training mode for handling the un-annotated labels is simply to ignore their contribution in the loss function, as proposed in [6]. We denote this mode as Ignore. While ignoring the un-annotated labels is a reasonable choice, it may lead to a poor decision boundary as it exploits only a fraction of the data, see Figure 2(b). Moreover, in a typical multi-label dataset, the probability of a label being negative is very high. Consequently, treating the un-annotated labels as negative may improve the discriminative power as it enables the exploitation of the entire data [14]. However, this training mode, denoted as Negative, has two main drawbacks: adding label noise to the training, and inducing a high imbalance between negative and positive samples [2]. This mode is illustrated in Figure 2(c).

Figure 2. Illustration of training modes for handling partial labeling. (a) In a partially labeled dataset, only a portion of the samples are annotated for a given class. (b) Ignore mode exploits only a subset of the samples, which may lead to a limited decision boundary. (c) Negative mode treats all un-annotated labels as negatives. It may produce a suboptimal decision boundary, as it adds noise from un-annotated positive labels. Also, annotated and un-annotated negative samples contribute similarly to the optimization process. (d) Our approach aims at mitigating these drawbacks by predicting the probability of a label being present in the image.

While treating the un-annotated labels as negative can be useful for many classes, it may significantly harm the learning of labels that tend to appear frequently in the images despite not being sufficiently annotated. For example, color classes are labeled in only a small number of samples in OpenImages [15], e.g. class “Black” is annotated in 1,688 samples, which is only 0.02% of the samples, while they are probably present in most of the images (see an example in Figure 1). Consequently, such classes are trained with many wrong negative samples. Thus, it would be worthwhile to first identify the frequent classes in the data and treat them accordingly. While in fully annotated multi-label datasets (e.g. MS-COCO [18]) the class frequencies can be directly inferred by counting the number of their annotations, in partially annotated datasets it is not straightforward. Counting the number of positive annotations per class is misleading, as the numbers are usually not proportional to the true class frequencies. In OpenImages, presumably infrequent classes like “Boat” and “Snow” are annotated in more than 100,000 samples, while frequent classes such as colors are annotated in only ∼1,500 images. Therefore, the class distribution needs to be estimated from the data.

In this paper, we propose a Selective approach that aims at mitigating the weaknesses raised by the primary training modes (Figure 2). In particular, we select one of the primary modes (Ignore or Negative) for each label individually by utilizing two probabilistic conditions, termed the label likelihood and the label prior. The label likelihood quantifies the probability of a label being present in a specific image. The label prior represents the probability of a label being present in the data. To acquire a reliable label prior, we propose a method for estimating the class distribution. To that end, we train a classification model using the Ignore mode and evaluate it on a representative dataset. Then, when training the final model, to handle the high negative-positive imbalance, we adopt the asymmetric loss [2], which enables focusing on the hard samples while at the same time controlling the impact of the positive and negative samples. We further suggest decoupling the focusing levels of the annotated and un-annotated terms in the loss to emphasize the contributions from the annotated negative samples.

Extensive experiments were conducted on three datasets: OpenImages [15] (V3 and V6) and LVIS [8] which are partially annotated datasets with 9,600 and 1,203 classes, respectively. In addition, we simulated partially annotated versions of the MS-COCO [18] for exploring and verifying our approach. Results and comparisons demonstrate the effectiveness of our proposed scheme. Specifically, on OpenImages (V6) we achieve a state-of-the-art result of 87.34% mAP score. The contributions of the paper can be summarized as follows:
• Introducing a novel selective scheme for handling partially labeled data that treats each un-annotated label separately based on two probabilistic quantities: label likelihood and label prior. Our approach outperforms previous methods on several partially labeled benchmarks.
• We identify a key challenge in partially labeled data, regarding the inaccuracy of calculating the class distribution using the annotations, and offer an effective approach for estimating the class distribution from the data.
• A partial asymmetric loss is proposed to dynamically control the impact of the annotated and un-annotated negative samples.

2. Related Work

Several methods have been proposed to tackle the partial labeling challenge. [6] offered a partial binary cross-entropy (CE) loss that weights each sample according to the proportion of known labels, where the un-annotated labels are simply ignored in the loss computation. In [14], it was proposed to also involve the un-annotated labels in the loss, treating them as negative while smoothing their contribution by incorporating a temperature parameter in the sigmoid function. An interactive approach was presented in [11], whose loss is composed of cross-entropy for the annotated labels and a smoothness term as a regularization. A curriculum learning strategy was also used in [6] to complete the missing labels. Instead of using the same training mode for all classes, in this paper we propose adjusting the training mode, either as Ignore or Negative, for each class individually, relying on probability-based conditions. Also, we introduce a key challenge in partial labeling, concerning the inability to infer the class distribution directly from the number of annotations, and suggest an estimation procedure to handle this.

Other methods were proposed in [26, 28, 30] to cope with partial labels, for example by low-rank empirical risk minimization [30] or by learning structured semantic correlations [28]. However, they are not scalable to large datasets, and their optimization procedures are not well adapted to deep neural networks.

Positive Unlabeled (PU) is also related to partial labeling [1, 9, 12]. The difference is that PU learning approaches use only positive and un-annotated labels without any negative annotations.

3. Learning from Partial Annotations

3.1. Problem Formulation

Given a partially annotated multi-label dataset with C classes, each sample \mathbf{x} \in \mathcal{X} corresponds to a specific image and is annotated by a label vector \mathbf{y}=\left\{y_{c}\right\}_{c=1}^{C}, where y_{c} \in \left\{-1,0,1\right\} denotes whether the class c is present in the image (‘1’), absent (‘−1’) or unknown (‘0’). For a given image, we denote the sets of positive and negative labels as \mathcal{P}_{\mathbf{x}}=\left\{c \mid y_{c}=1\right\} and \mathcal{N}_{\mathbf{x}}=\left\{c \mid y_{c}=-1\right\}, respectively. The set of un-annotated labels is denoted by \mathcal{U}_{\mathbf{x}}=\left\{c \mid y_{c}=0\right\}. Note that typically, \left|\mathcal{P}_{\mathbf{x}} \cup \mathcal{N}_{\mathbf{x}}\right| \ll\left|\mathcal{U}_{\mathbf{x}}\right|. A general form of the partially annotated multi-label classification loss can be defined as follows,
\mathcal{L}(\mathbf{x})=\sum_{c \in \mathcal{P}_{\mathbf{x}}} \mathcal{L}_{c}^{+}(\mathbf{x})+\sum_{c \in \mathcal{N}_{\mathbf{x}}} \mathcal{L}_{c}^{-}(\mathbf{x})+\sum_{c \in \mathcal{U}_{\mathbf{x}}} \mathcal{L}_{c}^{u}(\mathbf{x}) (1)
where \mathcal{L}_{c}^{+}(\mathbf{x}), \mathcal{L}_{c}^{-}(\mathbf{x}) and \mathcal{L}_{c}^{u}(\mathbf{x}) are the loss terms of the positive, negative and un-annotated labels for sample \mathbf{x}, respectively. Given a set of N labeled samples \left\{\mathbf{x}_{i}, \mathbf{y}_{i}\right\}_{i=1}^{N}, our goal is to train a neural-network model f(\mathbf{x} ; \boldsymbol{\theta}), parametrized by \boldsymbol{\theta}, to predict the presence or absence of each class given an input image. We denote by \mathbf{p}=\left\{p_{c}\right\}_{c=1}^{C} the class prediction vector computed by the model: p_{c}=\sigma\left(z_{c}\right), where \sigma(\cdot) is the sigmoid function and z_{c} is the output logit corresponding to class c.

For example, applying the binary CE loss while considering only the annotated labels is defined by setting the loss terms as \mathcal{L}_{c}^{+}(\mathbf{x})=\log(p_{c}), \mathcal{L}_{c}^{-}(\mathbf{x})=\log(1-p_{c}) and \mathcal{L}_{c}^{u}(\mathbf{x})=0.
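
To make the formulation concrete, here is a minimal PyTorch-style sketch of this partial BCE loss (the Ignore treatment of un-annotated labels); the function name and tensor layout are illustrative assumptions, not the paper's implementation.

```python
import torch

def partial_bce_loss(logits, y):
    """Binary cross-entropy over annotated labels only (un-annotated ignored).

    logits: (batch, C) raw outputs z_c.
    y:      (batch, C) labels in {-1, 0, 1}; 0 marks un-annotated entries.
    """
    p = torch.sigmoid(logits)
    eps = 1e-8
    pos_term = (y == 1).float() * torch.log(p + eps)       # L_c^+ = log(p_c)
    neg_term = (y == -1).float() * torch.log(1 - p + eps)  # L_c^- = log(1 - p_c)
    # Un-annotated labels (y == 0) contribute nothing: L_c^u = 0.
    # The sign is flipped so that minimizing the loss maximizes the log terms.
    return -(pos_term + neg_term).sum(dim=1).mean()
```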

3.2. How to Treat the Un-annotated Labels?

Typically, the number of un-annotated labels is much higher than the annotated ones. Therefore, the question of how to treat the un-annotated labels may have a considerable impact on the learning process. Herein, we will first define the two primary training modes and detail their strengths and limitations. Then, in light of these insights, we will propose a class aware mechanism which may better handle the un-annotated labels.

Mode Ignore. The basic scheme for handling the un-annotated labels is simply to ignore them, as suggested in [6]. In this mode we set \mathcal{L}_{c}^{u}(\mathbf{x})=0. This way, the training data is not contaminated with wrong annotations. However, its drawback is that only a subset of the data is used. For example, in the OpenImages dataset, the number of samples with either positive or negative annotations for the class “Cat” is only ∼0.9% of the training data. This may lead to a sub-optimal classification boundary when the annotated negative labels do not sufficiently cover the space of the negative class. See the illustration in Figure 2(b).

Mode Negative. In typical multi-label datasets, the chance of a specific label appearing in an image is very low. For example, in the fully annotated MS-COCO dataset [18], a label is annotated as negative with a probability of ∼0.96. Based on this prior assumption, a reasonable choice would be to treat the un-annotated labels as negative, i.e. setting \mathcal{L}_{c}^{u}(\mathbf{x})=\mathcal{L}_{c}^{-}(\mathbf{x}). This working mode was also suggested in [14]. While this mode enables the utilization of the entire dataset, it suffers from two main limitations. First, it may wrongly annotate positive labels as negative, adding label noise to the training. Second, this mode inherently triggers a high imbalance between negative and positive samples. Balancing them, for example by down-weighting the contribution of the negative samples, may diminish the impact of the valuable annotated negative samples. These weaknesses are illustrated in Figure 2(c).
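
The two primary modes differ only in how the entries with y_c = 0 enter the loss. A small hedged sketch (the helper name is ours, not from the paper) makes the distinction explicit; the returned mask can then multiply a per-label binary cross-entropy before reduction.

```python
import torch

def targets_and_mask(y, mode="ignore"):
    """Map partial labels y in {-1, 0, 1} to BCE targets and a loss mask.

    mode="ignore":   un-annotated labels are masked out of the loss.
    mode="negative": un-annotated labels are treated as negatives.
    """
    targets = (y == 1).float()           # 1 for annotated positives, else 0
    if mode == "ignore":
        mask = (y != 0).float()          # only annotated labels contribute
    elif mode == "negative":
        mask = torch.ones_like(targets)  # every label contributes
    else:
        raise ValueError(f"unknown mode: {mode}")
    return targets, mask
```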

The question of which mode to choose has no unequivocal answer. It depends on various conditions and may have its origin in the annotation scheme used. In section 5.1, we will show that different partial annotation procedures can lead to favoring different loss modes (see Figure 6). Moreover, as discussed in the next section, the chosen mode can influence each class differently, depending on the class presence frequency in the data and the number of available annotations.

Figure 3. Proposed approach. First, a class distribution estimation phase is performed to obtain a reliable label prior using a temporary network trained with the Ignore mode. Then, the target model is trained using a Selective approach which assigns a Negative or Ignore mode for each label based on its estimated prior and likelihood.

3.3. Class Distribution in Partial Annotation

As aforementioned, in multi-label datasets the majority of labels are present in only a small fraction of the data. For example, in MS-COCO, 89% of the classes appear in less than 5% of the data. Thus, treating all un-annotated labels as negative may improve the discriminative power for many classes, as more real negative samples are involved in the training, while the added label noise is negligible. However, this may significantly harm the learning of classes whose number of positive annotations in the dataset is much lower than the actual number of samples they appear in. Consider the case of the class “Person” in MS-COCO. It is present in 55% of the data (45,200 samples). Now, suppose that only a subset of 1,000 positive annotations is available, and the rest are switched to negative. It means that during the training, most of the prediction errors are due to wrong annotations. In this case, the optimization will be degraded and the network confidence will decay considerably. Hence, it will be beneficial to first identify the frequent labels and handle them differently in the loss.

3.3.1 Positive Annotations Deficiency

To identify the frequent labels, we need to reliably acquire their distribution in the data. While in fully annotated datasets it can be easily obtained by counting the number of annotations per class and normalizing by the total number of samples, in partially annotated datasets it is not straightforward. While one may suggest counting the number of positive annotations for each class, the resulting numbers are misleading and are usually not proportional to the true class frequencies. For example, in OpenImages (V6), we found that many common and general classes which are frequently present in images are labeled with very few positive annotations. General classes such as “Daytime”, “Event” or “Design” are labeled in only 1,709, 1,517 and 1,394 images (out of 9M), respectively. Color classes which massively appear in images are also rarely annotated. The “Black” and “White” classes are labeled in only 1,688 and 1,497 images, respectively. We may assume that classes such as “Daytime” or “White” are present in much more than 0.02% of the samples. Similarly, in the LVIS dataset, the classes “Person” and “Shirt” are annotated in only 1,928 and 1,942 samples, respectively, while they practically appear in many more images (note that in MS-COCO, which shares the same images with LVIS, the class “Person” appears in 55% of the samples).

Note that the labels are not necessarily annotated according to their dominance in the image. In Figure 1, we show examples of three images and the corresponding annotations of the classes “Lip” and “Yellow”. As can be seen, the left image was annotated with neither “Lip” nor “Yellow”, although these labels are present and dominant in it. Also, “Lip” is annotated in only 1,121 images, which is highly deficient in view of the fact that the class “Human face” is annotated in 327,899 images.

According to the above-mentioned observations, the number of positive annotations cannot be used to measure the class frequencies in partially labeled datasets. In section 4.2, we will propose a simple yet effective approach for estimating the class distribution from the data.

4. Proposed Approach

In this section we will present our method which aims at mitigating the issues raised in training partially annotated data. An overview of the proposed approach is summarized in Figure 3.

To mitigate the high negative-positive imbalance problem, we adopt the asymmetric loss (ASL) proposed in [2] as the base loss for the multi-label classification task. It enables dynamically focusing on the hard samples while at the same time controlling the contribution propagated from the positive and negative samples. First, let us denote the basic term of the focal loss [17] for a given class c by:
L_{F}{\left(p_{c}, \gamma\right)} = \left(1 - p_{c}\right)^{\gamma} \log{p_{c}} (2)
where \gamma is the focusing parameter, which adjusts the decay rate of the easy samples. Then, we define the partially annotated loss as follows,
L\left(\mathbf{x}\right) = \sum_{c \in \mathcal{P}_{\mathbf{x}}} L_{F}\left(p_{c}, \gamma^{+}\right) + \sum_{c \in \mathcal{N}_{\mathbf{x}}} L_{F}\left(1 - p_{c}, \gamma^{-}\right) + \sum_{c \in \mathcal{U}_{\mathbf{x}}} w_{c} L_{F}\left(1 - p_{c}, \gamma^{u}\right) (3)
where \gamma^{+}, \gamma^{-} and \gamma^{u} are the focusing parameters for the positive, negative and un-annotated labels, respectively. w_{c} is the selectivity parameter, introduced in section 4.1. We usually set \gamma^{+} < \gamma^{-} to decay the positive term at a lower rate than the negative one, because the positive samples are infrequent compared to the negative samples. In addition, since for a given class the annotated negative samples are verified ground truth, we are interested in preserving their contribution to the loss. Therefore, we suggest decoupling the focusing parameter of the annotated negative labels from the un-annotated one, allowing us to set a lower decay rate for the annotated negative labels: \gamma^{-} < \gamma^{u}. This way, the impact of the annotated negative samples on establishing the classification boundary for each class is higher (see Figure 2(d)). We term this form of asymmetric loss Partial-ASL (P-ASL).
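
The following is a sketch of P-ASL as written in equations (2)-(3), with default focusing parameters taken from the hyper-parameters reported in appendix A for OpenImages (\gamma^{+} = 1, \gamma^{-} = 2, \gamma^{u} = 7); the official implementation in the linked repository may differ in details such as probability clamping or margins, so treat this as an illustrative sketch only.

```python
import torch

def partial_asl(logits, y, w, gamma_pos=1.0, gamma_neg=2.0, gamma_unann=7.0):
    """Partial asymmetric loss (P-ASL) sketch of equations (2)-(3).

    logits: (batch, C) raw outputs, y: (batch, C) labels in {-1, 0, 1},
    w:      (batch, C) selectivity weights w_c from equation (9).
    """
    p = torch.sigmoid(logits)
    eps = 1e-8
    # L_F(p, gamma) = (1 - p)^gamma * log(p), applied to each label group:
    pos = (y == 1).float() * (1 - p).pow(gamma_pos) * torch.log(p + eps)
    neg = (y == -1).float() * p.pow(gamma_neg) * torch.log(1 - p + eps)
    unann = (y == 0).float() * w * p.pow(gamma_unann) * torch.log(1 - p + eps)
    # Negated so that minimizing the loss maximizes the log-likelihood terms.
    return -(pos + neg + unann).sum(dim=1).mean()
```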

4.1. Class-aware Selective Loss

As described in section 3.1, both the Ignore and Negative modes rely on inadequate assumptions about the partial annotation problem. In this section, we propose a selective approach for adjusting the mode per individual class. The core idea is to examine the probability of each un-annotated label being present in a given sample \mathbf{x}. Un-annotated labels that are suspected to be positive will be ignored. The others will be treated as negative.

For that purpose, we define two probabilistic values, the label likelihood and the label prior, and detail their usage in the following section. These two quantities are complementary to each other. The label likelihood makes it possible to dynamically ignore the loss contribution of a label in a given image by inspecting its visual content. The label prior extracts useful information about the estimated class frequencies in the data and uses it regardless of the specific image content.

Label likelihood. Defined as the conditional probability of an un-annotated label c being positive given the image and the model parameters, i.e.
P\left(y_{c} = 1 \mid \mathbf{x}; \theta\right), \quad \forall c \in \mathcal{U}_{\mathbf{x}} (4)
It can be simply estimated by the network predictions \left\{p_{c}\right\}_{c \in \mathcal{U}_{\mathbf{x}}} throughout the training. A high p_{c} may imply that the un-annotated label c appears in the image, and treating it as negative may lead to an error. Accordingly, the label c should be ignored. In practice, we allow the K un-annotated labels with the top prediction values to be ignored, i.e.
\Omega_{L} = \left\{c \in \mathcal{U}_{\mathbf{x}} \mid c \in \text{TopK}\left(\left\{p_{c}\right\}\right)\right\} (5)
where the \text{TopK}\left(\cdot\right) operator returns the indices of the top K elements of the input vector. The algorithm scheme is illustrated in Figure 4. Note that this implementation enables us to “walk” on a continuous scale between the Negative and Ignore modes. Setting K = 0 corresponds to the Negative mode, as no un-annotated label is ignored. K = C is equivalent to the Ignore mode, as all un-annotated labels are ignored.
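
A possible implementation of the likelihood set \Omega_{L} of equation (5), assuming batched predictions and labels (the helper name is ours, not from the paper):

```python
import torch

def top_likelihood_set(p, y, K):
    """Boolean mask of Omega_L: the K un-annotated labels with the highest
    predicted probabilities in each sample (equation (5)).

    p: (batch, C) predictions p_c, y: (batch, C) labels in {-1, 0, 1}.
    """
    unannotated = (y == 0)
    # Push annotated labels to -inf so they never enter the top-K selection.
    masked_p = p.masked_fill(~unannotated, float("-inf"))
    topk_idx = masked_p.topk(K, dim=1).indices
    omega_l = torch.zeros_like(p).scatter_(1, topk_idx, 1.0).bool()
    # Guard against samples that have fewer than K un-annotated labels.
    return omega_l & unannotated
```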

Label prior. Defined as the probability of a label c being present in an image. It can also be viewed as the actual label presence frequency in the data. We are interested in the label prior for the un-annotated labels,
P\left(y_{c} = 1\right), \quad \forall c \in \mathcal{U}_{\mathbf{x}} (6)
According to section 3.3, the label prior should be estimated from the data, as the class distribution is hidden in partially annotated datasets. In the next section (4.2), we will introduce the scheme for estimating the label prior. Meanwhile, let us denote by \hat{P}_{r}\left(c\right) the label prior estimator for class c. We are interested in disabling the loss contribution of labels with high prior values. These labels are formally defined by the following set,
\Omega_{P} = \left\{c \in \mathcal{U}_{\mathbf{x}} \mid \hat{P}_{r}\left(c\right) > \eta\right\} (7)
where \eta \in \left[0, 1\right] represents the minimum fraction of the data determining a label to be ignored.

Finally, we denote the set of labels whose loss contributions are ignored as the union of the two previously computed sets,
\Omega_{\text{Ignore}} = \Omega_{L} \cup \Omega_{P} (8)
Accordingly, we set the parameter w_{c} in equation (3) as follows,
w_{c} = \begin{cases} 0 & c \in \Omega_{\text{Ignore}} \\ 1 & c \notin \Omega_{\text{Ignore}}\end{cases} (9)
Note that we have explored other alternatives for implementing the label prior in the loss function. In particular, in appendix B we compare a soft method that integrates the label prior by setting w_{c}=\exp(-\alpha \hat{P}_{r}(c)), \forall c \notin \Omega_{L}, and show that using a hard decision mechanism, as proposed in equation (9), produces better results.
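
Putting the two conditions together, a sketch of equations (7)-(9) could look as follows; it reuses the top_likelihood_set helper sketched above, and the defaults K = 200 and \eta = 0.05 follow the hyper-parameters in appendix A. Only the w_c values at un-annotated positions matter in equation (3); annotated labels keep w_c = 1 here, which is harmless because their loss terms do not use w_c.

```python
import torch

def selectivity_weights(p, y, prior, K=200, eta=0.05):
    """Compute w_c of equation (9): zero for labels in Omega_L U Omega_P.

    p:     (batch, C) current predictions.
    y:     (batch, C) labels in {-1, 0, 1}.
    prior: (C,) estimated label prior P_r(c).
    """
    unann = (y == 0)
    omega_l = top_likelihood_set(p, y, K)          # per-image likelihood set
    omega_p = unann & (prior > eta).unsqueeze(0)   # dataset-level prior set
    omega_ignore = omega_l | omega_p               # equation (8)
    return (~omega_ignore).float()                 # w_c = 0 if ignored, else 1
```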

4.2. Estimating the Class Distribution

We aim at estimating the class distribution in a representative dataset \mathcal{X}. For that, we first need to assess the presence of each class in every image in the data, i.e. we would like to first approximate the probability of a class c being present in an image \mathbf{x} \in \mathcal{X}: P(y_{c} = 1 \mid \mathbf{x}). To that end, we propose training a model parametrized by \theta for predicting each class in a given image, i.e. P(y_{c} = 1 \mid \mathbf{x}; \theta). Afterwards, the model is applied on the sample set \mathcal{X} (e.g. the training data). The label prior can then be estimated by calculating the expectation,
P\left(y_{c}=1; \theta\right) = \frac{1}{\left|\mathcal{X}\right|} \sum_{\mathbf{x} \in \mathcal{X}} P\left(y_{c}=1 \mid \mathbf{x}; \theta\right) (10)
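
A sketch of this estimation step, assuming a PyTorch data loader that yields image batches (the labels are not needed for the expectation itself); names and arguments are illustrative assumptions.

```python
import torch

@torch.no_grad()
def estimate_label_prior(model, loader, num_classes, device="cuda"):
    """Estimate P_r(c) by averaging the temporary model's predicted
    probabilities over a representative set of images (equation (10))."""
    model.eval()
    prior_sum = torch.zeros(num_classes, device=device)
    n_samples = 0
    for images, _ in loader:                 # annotations are ignored here
        probs = torch.sigmoid(model(images.to(device)))
        prior_sum += probs.sum(dim=0)
        n_samples += images.size(0)
    return prior_sum / n_samples             # (C,) estimated class frequencies
```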

For estimating the label prior, we train the model in the Ignore mode. While the discriminative power of the Negative mode may be stronger for the majority of the labels, it may fail to provide reliable prediction values for frequent classes with a small number of positive annotations. Propagating an abundance of gradient errors from wrong negative annotations will decay the expected returned predictions for those classes and will fail to approximate P(y_{c} = 1 \mid \mathbf{x}). Consequently, our suggested estimation of the class distribution is given by,
\hat{P}_{r}\left(c\right) = P\left(y_{c} = 1; \theta_{\text{Ignore}}\right) (11)
where \theta_{\text{Ignore}} denotes the model parameters trained in the Ignore mode. In section 5.2, we will empirically show the effectiveness of the Ignore mode in ranking the class frequencies and the inapplicability of the Negative mode for this purpose. To qualitatively show the estimation effectiveness, we present in Figure 5 the top 20 frequent classes in OpenImages (V6) as estimated by our proposed procedure. Note that all the top classes are commonly present in images, such as colors (“White”, “Black”, “Blue”, etc.) or general classes such as “Photograph”, “Light”, “Daytime” or “Line”. In appendix D, we show the next top 60 estimated classes. Also, in appendix E, we provide the top 20 estimated frequent classes for the LVIS dataset.

5. Experimental Study

In this section, we will experimentally demonstrate the insights discussed in the previous sections. We will mainly utilize the fully annotated MS-COCO dataset [18] to validate and demonstrate the effectiveness of our approach by simulating partial annotation under specific case studies. The evaluation metric used in the experiments is the mean average precision (mAP). Training details are provided in appendix A.

5.1. Impact of Annotation Schemes

As aforementioned in section 3.2, the scheme used for annotating the dataset can substantially affect the learning process. Specifically, the choice of how to treat the un-annotated labels is highly influenced by the annotation scheme. To demonstrate that, we simulate two partial annotation schemes on the original fully annotated MS-COCO dataset [18]. MS-COCO includes 80 classes, 82,081 training samples, and 40,137 validation samples, following the 2014 split. The two simulated annotation schemes are detailed as follows:
Fixed per class (FPC). For each class, we randomly sample a fixed number of positive annotations, denoted by N_{s}, and the same number of negative annotations. The rest of the annotations are dropped.
Random per annotation (RPA). We omit each annotation with probability p. Note that this simulation preserves the true class distribution of the data.
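
Both schemes can be simulated directly on a fully annotated label matrix (values in {-1, 1}); the following sketch is ours and uses hypothetical function names.

```python
import torch

def simulate_fpc(y_full, n_s, generator=None):
    """Fixed per class: keep n_s random positive and n_s random negative
    annotations per class and drop (set to 0) all the rest."""
    y = torch.zeros_like(y_full)
    for c in range(y_full.size(1)):
        for sign in (1, -1):
            idx = (y_full[:, c] == sign).nonzero(as_tuple=True)[0]
            keep = idx[torch.randperm(idx.numel(), generator=generator)[:n_s]]
            y[keep, c] = sign
    return y

def simulate_rpa(y_full, p, generator=None):
    """Random per annotation: drop each annotation independently with
    probability p, which preserves the true class distribution."""
    drop = torch.rand(y_full.shape, generator=generator) < p
    return torch.where(drop, torch.zeros_like(y_full), y_full)
```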

In Figure 6, we show the results obtained using each of the simulation schemes for each primary mode (Ignore and Negative) while varying the N_{s} and p values. As can be seen, while in RPA (Figure 6(a)) the Ignore mode consistently shows better results, in FPC (Figure 6(b)) the Negative mode is superior. Note that as we keep more of the annotated labels (by either increasing N_{s} or decreasing p), the gap between the two training modes is reduced, catching the maximal result. The phenomena observed in the two simulated case studies are also relevant to real practical procedures for partially annotating multi-label datasets. While in the FPC simulation the class distribution completely vanishes and cannot be inferred from the number of positive annotations (N_{s} for c = 1, ..., C), the RPA scheme preserves the class distribution.

Figure 6. Impact of annotation schemes. mAP results obtained using the RPA (a) and the FPC (b) simulation schemes for each primary mode. While in RPA, Ignore mode consistently shows better results, in FPC, the Negative mode is superior.
Figure 7. Spearman correlation between the true class distribution and the estimated distribution. Unlike the Negative mode, training a model using Ignore mode is well suited for estimating the class distribution.

5.2. Estimating the Label Prior

To demonstrate the estimation quality of the class distribution obtained by the approach proposed in section 4.2, we follow the FPC simulation scheme applied on the MS-COCO dataset (as described in section 5.1), where a constant number of 1,000 annotations remained for each class. Because MS-COCO is a fully annotated dataset, we can compare the estimated class distribution (i.e. the label prior) to the true class distribution inferred from the original number of annotations. In particular, we measure the similarity between the original class frequencies and the estimated ones using the Spearman correlation test. In Figure 7, we show the Spearman correlation scores while varying the number of top-ranked classes. We also show the results obtained with the Negative mode as a reference. Specifically, the Spearman correlation computed over all the 80 classes with the estimator obtained using the Ignore mode is 0.81, demonstrating the estimator’s effectiveness. In the next section, we will show how it benefits the overall classification results. Also, in appendix C we present the top frequent classes measured by our estimator and compare them to those obtained by the original class frequencies in MS-COCO.
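
The comparison itself can be reproduced with a few lines; below is a hedged sketch using scipy (function name and arguments are ours, and how exactly the top-ranked classes are selected for Figure 7 is an assumption).

```python
import numpy as np
from scipy.stats import spearmanr

def prior_rank_correlation(true_freq, est_freq, top_k=None):
    """Spearman rank correlation between true and estimated class frequencies.
    If top_k is given, restrict the test to the top_k most frequent classes."""
    true_freq, est_freq = np.asarray(true_freq), np.asarray(est_freq)
    if top_k is not None:
        keep = np.argsort(-true_freq)[:top_k]
        true_freq, est_freq = true_freq[keep], est_freq[keep]
    rho, _ = spearmanr(true_freq, est_freq)
    return rho
```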

Table 1. OpenImages (V6) results. The Selective approach with P-ASL improves both mAP(C) and mAP(O) scores.
Table 2. OpenImages (V6) results for different backbones. Using TResNet-L model we achieve top results on OpenImages V6.

6. Benchmark Results

In this section, we will report our main results on the partially annotated multi-label datasets: OpenImages [15] and LVIS [8]. The results on the MS-COCO dataset are presented in appendix C. We will present a comparison to previous methods which handle partial annotations, among other baseline approaches in multi-label classification. The evaluation metric used in the experiments is the mean average precision (mAP). In particular, we report the standard per-class mAP, denoted as mAP(C), and the overall mAP, denoted as mAP(O), which considers the number of samples in each class. The training details and the loss hyper-parameters used are provided in appendix A.
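
For reference, one common way to compute these two metrics is macro versus micro averaging of average precision; whether the benchmarks' mAP(O) is exactly the micro average is an assumption on our part, so treat this sketch accordingly.

```python
from sklearn.metrics import average_precision_score

def map_scores(y_true, y_score):
    """Per-class mAP(C) (macro) and one possible reading of mAP(O) (micro).

    y_true:  (N, C) binary ground truth, y_score: (N, C) predicted scores.
    """
    map_c = average_precision_score(y_true, y_score, average="macro")
    map_o = average_precision_score(y_true, y_score, average="micro")
    return map_c, map_o
```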

6.1. OpenImages V6

OpenImages V6 is a large-scale multi-label dataset [15], consisting of 9 million training images, 41,620 validation samples, and 125,456 test samples. It is a partially annotated dataset with 9,600 trainable classes. In Table 1, we present the mAP results obtained by our proposed Selective method and compare them to other approaches. Interestingly, the Ignore mode produces better results than the Negative mode, as OpenImages contains many under-annotated frequent classes such as colors and other general classes (see Figure 5). Using the Negative mode adds massive label noise and harms the learning of many common classes. In Table 2, we present results for different network architectures. Specifically, using TResNet-L [23], we achieve a state-of-the-art result of 87.34 mAP.

To show the impact of decoupling the focusing parameters of the annotated and un-annotated loss terms in P-ASL, as proposed in equation (3), we varied the negative focusing parameter \gamma^{-} while fixing \gamma^{u} = 7. The results are presented in Figure 8. The case of \gamma^{-} = 7 represents the standard ASL [2]. As can be seen, the mAP score increases as we lower \gamma^{-} down to 2. It indicates that lowering the decay rate for the annotated negative term boosts the contribution of the annotated negative samples to the loss.

Table 3. Results for OpenImages (V3). Comparing the mAP score obtained using our Selective approach to previous multi-label classification methods.
Figure 8. Impact of decoupling the focusing parameters. We set the un-annotated focusing to γu = 7 and varied the annotated negative focusing γ−.
Figure 9. Ablation study of the Selective approach components. mAP results are shown for different numbers of top likelihood labels, K. We show results for the case of using only the likelihood condition ΩL, and with both conditions ΩL ∪ ΩP .

In Figure 9, we show the mAP scores while varying the number of top likelihood classes, K, as defined in equation (5). Note that setting K = 0 is equivalent to using the Negative mode. Training with a high enough K becomes similar to training using the Ignore mode. The highest mAP results are obtained with both the likelihood and prior conditions.

Table 4. Results for LVIS. The Selective approach with P-ASL improves both mAP(C) and mAP(O) scores. Also, it provides the top result for the frequent class “Person”.

6.2. OpenImages V3

To be compatible with previously published results, we used OpenImages V3, which contains 5,000 trainable classes. We follow the comparison setting described in [11]. Also, for a fair comparison, we used the ResNet-101 [10] backbone, pre-trained on the ImageNet dataset. In Table 3, we show the mAP score results obtained using previous approaches and compare them to our Selective method. As shown, our method significantly outperforms previous approaches that deal with partial annotation in a multi-label setting.

6.3. LVIS

LVIS is a partially labeled dataset originally annotated for object detection and image segmentation, which was adopted as a multi-label classification benchmark. It consists of 100,170 images for training and 19,822 images for testing, and it contains 1,203 classes. In Table 4, we present a comparison between different approaches on the LVIS dataset. As can be seen, in this case the Negative mode is better than the Ignore mode. This can be related to the fact that most of the labels correspond to specific objects which do not appear frequently in the images. The most frequent class is “Person”; therefore, we also added its average precision to Table 4. Note that the Ignore model learns the class “Person” better than the one trained with the Negative mode. Using the P-ASL with the Selective mode, we were able to obtain superior mAP results as well as the top average precision even for the most frequent class, “Person”.

7. Conclusion

In this paper, we presented a novel technique for handling partially labeled data in multi-label classification. We observed that whether to ignore the un-annotated labels in the loss or to treat them as negative should be determined individually for each class. We proposed a selective mechanism that uses the label likelihood, computed throughout the training, and the label prior, which is obtained by estimating the class distribution from the data. The un-annotated labels are further softened via a partial asymmetric loss. Extensive experimental analysis shows that our proposed approach outperforms previous methods on partially labeled datasets, including OpenImages, LVIS, and simulated-COCO.

References

[1] Jessa Bekker and Jesse Davis. Learning from positive and unlabeled data: A survey. CoRR, abs/1811.04820, 2018.
[2] Emanuel Ben-Baruch, Tal Ridnik, Nadav Zamir, Asaf Noy, Itamar Friedman, Matan Protter, and Lihi Zelnik-Manor. Asymmetric loss for multi-label classification. arXiv preprint arXiv:2009.14119, 2020.
[3] Han Cai, Chuang Gan, Tianzhe Wang, Zhekai Zhang, and Song Han. Once for all: Train one network and specialize it for efficient deployment. In International Conference on Learning Representations, 2020.
[4] Zhao-Min Chen, Xiu-Shen Wei, Peng Wang, and Yanwen Guo. Multi-label image recognition with graph convolutional networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 5177–5186, 2019.
[5] Xing Cheng, Hezheng Lin, Xiangyu Wu, Fan Yang, Dong Shen, Zhongyuan Wang, Nian Shi, and Honglin Liu. Mltr: Multi-label classification with transformer, 2021.
[6] Thibaut Durand, Nazanin Mehrasa, and Greg Mori. Learning a deep convnet for multi-label classification with partial labels. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 647–657, 2019.
[7] Bin-Bin Gao and Hong-Yu Zhou. Multi-label image recognition with multi-class attentional regions. arXiv preprint arXiv:2007.01755, 2020.
[8] Agrim Gupta, Piotr Dollár, and Ross Girshick. Lvis: A dataset for large vocabulary instance segmentation, 2019.
[9] Zayd Hammoudeh and Daniel Lowd. Learning from positive and unlabeled data with arbitrary positive shift, 2020.
[10] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 770–778, 2016.
[11] D. Huynh and E. Elhamifar. Interactive multi-label CNN learning with partial labels. IEEE Conference on Computer Vision and Pattern Recognition, 2020.
[12] Liwei Jiang, Dan Li, Qisheng Wang, Shuai Wang, and Songtao Wang. Improving positive unlabeled learning: Practical AUL estimation and new training method for extremely imbalanced data sets, 2020.
[13] Diederik P. Kingma and Jimmy Ba. Adam: A method for stochastic optimization, 2017.
[14] Kaustav Kundu and Joseph Tighe. Exploiting weakly supervised visual patterns to learn from partial annotations. In NeurIPS, 2020.
[15] Alina Kuznetsova, Hassan Rom, Neil Alldrin, Jasper Uijlings, Ivan Krasin, Jordi Pont-Tuset, Shahab Kamali, Stefan Popov, Matteo Malloci, Alexander Kolesnikov, et al. The open images dataset v4. International Journal of Computer Vision, 128(7):1956–1981, Mar 2020.
[16] Jack Lanchantin, Tianlu Wang, Vicente Ordonez, and Yanjun Qi. General multi-label image classification with transformers, 2020.
[17] Tsung-Yi Lin, Priya Goyal, Ross B. Girshick, Kaiming He, and Piotr Dollár. Focal loss for dense object detection. CoRR, abs/1708.02002, 2017.
[18] Tsung-Yi Lin, Michael Maire, Serge Belongie, Lubomir Bourdev, Ross Girshick, James Hays, Pietro Perona, Deva Ramanan, C. Lawrence Zitnick, and Piotr Dollár. Microsoft coco: Common objects in context, 2014.
[19] Shilong Liu, Lei Zhang, Xiao Yang, Hang Su, and Jun Zhu. Query2label: A simple transformer way to multi-label classification, 2021.
[20] Ilya Loshchilov and Frank Hutter. Decoupled weight decay regularization. arXiv preprint arXiv:1711.05101, 2017.
[21] Ishan Misra, C. Lawrence Zitnick, Margaret Mitchell, and Ross Girshick. Seeing through the human reporting bias: Visual classifiers from noisy human-centric labels, 2016.
[22] Tal Ridnik, Emanuel Ben-Baruch, Asaf Noy, and Lihi Zelnik-Manor. Imagenet-21k pretraining for the masses, 2021.
[23] Tal Ridnik, Hussam Lawen, Asaf Noy, Emanuel Ben Baruch, Gilad Sharir, and Itamar Friedman. Tresnet: High performance gpu-dedicated architecture. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, pages 1400–1409, 2021.
[24] Leslie N. Smith. A disciplined approach to neural network hyper-parameters: Part 1 – learning rate, batch size, momentum, and weight decay, 2018.
[25] Jiang Wang, Yi Yang, Junhua Mao, Zhiheng Huang, Chang Huang, and Wei Xu. Cnn-rnn: A unified framework for multi-label image classification, 2016.
[26] Baoyuan Wu, Fan Jia, Wei Liu, Bernard Ghanem, and Siwei Lyu. Multi-label learning with missing labels using mixed dependency graphs, 2018.
[27] Tong Wu, Qingqiu Huang, Ziwei Liu, Yu Wang, and Dahua Lin. Distribution-balanced loss for multi-label classification in long-tailed datasets, 2020.
[28] Hao Yang, Joey Tianyi Zhou, and Jianfei Cai. Improving multi-label learning with missing labels by structured semantic correlations, 2016.
[29] Renchun You, Zhiyao Guo, Lei Cui, Xiang Long, Yingze Bao, and Shilei Wen. Cross-modality attention with semantic graph embedding for multi-label classification. In AAAI, pages 12709–12716, 2020.
[30] Hsiang-Fu Yu, Prateek Jain, Purushottam Kar, and Inderjit S. Dhillon. Large-scale multi-label learning with missing labels, 2013.

Appendices

A. Training Details

Unless stated otherwise, all experiments were conducted with the following training configuration. As a default, we used the TResNet-M model [23], pre-trained on the ImageNet-21k dataset [22]. The model was fine-tuned using the Adam optimizer [13] and a 1-cycle cosine annealing policy [24] with a maximal learning rate of 2e-4 for training OpenImages and MS-COCO, and 6e-4 for training LVIS. We used true weight decay [20] of 3e-4 and standard ImageNet augmentations. For a fair comparison to previously published results on OpenImages V3, we also trained a ResNet-101 model, pre-trained on ImageNet.
In the OpenImages experiments we used the following hyper-parameters: \eta = 0.05, K = 200, \gamma^{u} = 7, \gamma^{-} = 2 and \gamma^{+} = 1. In LVIS we used: \gamma^{u} = 1, \gamma^{-} = 0 and \gamma^{+} = 0.

B. Soft Label Prior

Herein, we will explore a soft alternative for integrating the label prior in the loss. We follow equation (3) and define the un-annotated weights by
w_{c} = \exp \left(-\alpha \hat{P}_{r}\left(c\right)\right) (12)
where \alpha is the decay factor. In Table 5, we compare the soft label prior to the configuration used in section 4.1.
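
A minimal sketch of this soft weighting, with \alpha = 10 as in Table 5 (the function name is ours):

```python
import torch

def soft_prior_weights(prior, alpha=10.0):
    """Soft alternative of equation (12): w_c = exp(-alpha * P_r(c)).
    Frequent classes are exponentially down-weighted instead of hard-ignored."""
    return torch.exp(-alpha * prior)
```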

Table 5. OpenImages (V6) results using soft label prior. We used α = 10.

As the soft label prior produced lower mAP(C) results, we did not use it in our experiments.

C. Results on MS-COCO

In this section, we will present the results obtained on a partially annotated version of MS-COCO, based on the fixed per class (FPC) simulation scheme. Note that in this experiment, the class distribution measured by the number of annotations is no longer meaningful, as all classes have the same number of annotations. The mAP results, as well as the average precision (AP) scores for the class “Person”, are presented in Figure 10. The Negative mode produces higher mAP (computed over all the classes) compared to the Ignore mode. However, as the frequent class “Person” is present in most of the images, the Negative mode is inferior for it, especially in the case of a small number of annotations. Using the Selective approach, top results can be achieved for both the mAP and the person AP. In Figure 11, we show the top 10 frequent classes obtained using our procedure for estimating the class distribution, as described in section 4.2, and compare them to those obtained using the original class frequencies in MS-COCO. As can be seen, most of the frequent classes measured by the original distribution are also highly ranked by our estimator.

Figure 10. Results on MS-COCO (FPC).
Figure 11. Class frequency estimation in MS-COCO. Top frequent classes measured by (a) original class distribution and (b) estimated class distribution. The estimated top 10 frequent classes are included in the original top classes.

D. Frequent classes in OpenImages

We add more results of the class distribution estimated by our approach (detailed in section 4.2) for the OpenImages dataset. See Figure 12.

E. Frequent classes in LVIS

In Figure 13, we plot the top frequent classes in LVIS, obtained by our estimator detailed in section 4.2. Also in LVIS, it can be seen that the estimated most frequent classes are related to common objects such as “Person”, “Shirt”, “Trousers”, “Shoe”, etc.

Figure 12. Estimating the class distribution in OpenImages. Additional top 60 frequent classes as estimated by our approach.
Figure 13. Estimating the class distribution in LVIS. Top 20 frequent classes estimated by the Ignore model.