1 Preface
- https://arxiv.org/pdf/2207.12978.pdf
- ECCV2022
- task: large-scale Multiple Object Tracking (MOT)
2 Introduction
MOT task: estimate the trajectory of objects in a video sequence.
Limitation 1: common MOT benchmarks [16,32,11] only consider tracking objects from a few pre-defined categories, e.g., pedestrian and car, and existing MOT methods do not perform well on a large number of categories.
Limitation 2: the MOT evaluation metrics can be refined further.
Current MOT models and metrics are mainly designed for single-category multiple-object tracking. When extended to large-scale multi-category MOT, methods simply detect and classify each object, then associate only objects that share the same label. This relies heavily on the classification results.
Thus, when classification is inaccurate, as it often is in large-scale multi-category MOT, both the existing models and the evaluation metrics need to be improved.
This paper:
To expand tracking to a more general scenario, we propose that classification should be disentangled from tracking, in both evaluation and model design, for multi-category MOT.
- a new metric, Track Every Thing Accuracy (TETA);
- a new model, Track Every Thing tracker (TETer).
Experiments:
Evaluated on two large-scale multi-category tracking datasets, TAO and BDD100K.
3 Tracking-Every-Thing Metric
3.1 Limitations for Large-scale MOT Evaluation
How to handle classification: (1) Associating objects only when they share the same label depends on correct classification results. (2) The most naive alternative, ignoring the classification results altogether, lets the head classes of a long-tailed dataset dominate the evaluation.
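The head-class effect in point (2) can be seen with a small numeric sketch. The data, class names, and both helper functions below are hypothetical and only illustrate the arithmetic; they are not part of any benchmark toolkit.

```python
from collections import defaultdict


def pooled_accuracy(records):
    """Fraction of correct results over all objects, ignoring class labels."""
    return sum(ok for _, ok in records) / len(records)


def class_averaged_accuracy(records):
    """Mean of per-class accuracies, so every class counts equally."""
    per_class = defaultdict(list)
    for cls, ok in records:
        per_class[cls].append(ok)
    per_class_acc = [sum(v) / len(v) for v in per_class.values()]
    return sum(per_class_acc) / len(per_class_acc)


# Hypothetical toy data: 98 head-class objects tracked correctly,
# 2 rare-class objects both tracked incorrectly.
records = [("car", True)] * 98 + [("unicycle", False)] * 2

print(pooled_accuracy(records))          # 0.98
print(class_averaged_accuracy(records))  # 0.5
```

The pooled score looks excellent even though the rare class fails completely, which is why simply dropping the class labels from evaluation is not a satisfying fix.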
Incomplete annotations: large-scale datasets are not exhaustively annotated, so how can we identify and penalize false positive (FP) predictions?
3.2 Tracking-Every-Thing Accuracy (TETA)
TETA consists of three parts, each evaluating a different aspect of tracking:
- a localization score
- an association score
- a classification score
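To avoid wrongly penalizing correct predictions of objects that were never annotated, predictions that are not assigned to any cluster (each cluster is anchored by a ground-truth box) are ignored during evaluation.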
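A minimal sketch of the idea follows. It assumes that a prediction joins the cluster of the ground-truth box it overlaps most, provided the IoU reaches a threshold, and that the final TETA is the arithmetic mean of the three scores. The function names, the box format `(x1, y1, x2, y2)`, and the threshold value are illustrative assumptions, not the official evaluation code.

```python
def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def assign_to_clusters(gt_boxes, pred_boxes, thr=0.5):
    """Map each prediction index to a ground-truth index, or None if ignored."""
    assignment = {}
    for p, pred in enumerate(pred_boxes):
        best, best_iou = None, thr
        for g, gt in enumerate(gt_boxes):
            overlap = iou(pred, gt)
            if overlap >= best_iou:
                best, best_iou = g, overlap
        # None means the prediction matches no annotated object, so it is
        # skipped instead of being counted as a false positive.
        assignment[p] = best
    return assignment


def teta(loc_a, assoc_a, cls_a):
    """Assumed combination: arithmetic mean of the three sub-scores."""
    return (loc_a + assoc_a + cls_a) / 3.0


gt = [(0, 0, 10, 10)]
preds = [(1, 1, 10, 10), (50, 50, 60, 60)]
print(assign_to_clusters(gt, preds))  # {0: 0, 1: None}
print(teta(0.5, 0.25, 0.75))          # 0.5
```

The second prediction lies far from every annotation, so it is left out of the score rather than penalized, which matches the motivation of not punishing detections of unannotated objects.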
4 Tracking-Every-Thing Tracker
framework:
4.1 Class-agnostic Localization
The paper's analysis shows that the bottleneck of the detection model lies in classification rather than localization.
Thus, the paper first performs class-agnostic localization.
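One common way to make detector post-processing class-agnostic is to run non-maximum suppression over all boxes together, without consulting the predicted class. The sketch below illustrates that general technique only; it is an assumption that the scores are class-independent confidences, and it is not confirmed to be the exact procedure used in the paper.

```python
def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def class_agnostic_nms(boxes, scores, iou_thr=0.5):
    """Greedy NMS over all boxes at once; class labels are never used."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep a box only if it does not overlap an already kept box too much.
        if all(iou(boxes[i], boxes[k]) <= iou_thr for k in keep):
            keep.append(i)
    return keep


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]  # hypothetical class-independent confidences
print(class_agnostic_nms(boxes, scores))  # [0, 2]
```

Because the class label plays no role, a misclassified box cannot survive merely by belonging to a different class than its overlapping neighbor.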
4.2 Associating Everything
- common cues: location, appearance, and class
- motion (location) patterns are irregular (x)
- class labels are unreliable because many objects are not predefined (x)
- objects in different classes usually have different appearances, so appearance is selected as the main cue
Instead of using class information as a "hard" prior, it is used in a "soft" way through contrastive learning.
Once the class exemplar matching (CEM) embeddings are learned, association is done by comparing their similarities.
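Similarity-based association can be sketched as follows, assuming each existing track and each new detection already has an embedding vector. Cosine similarity, the greedy one-to-one matching, the `min_sim` threshold, and the toy embeddings are all illustrative choices; the matching rule used in the paper may differ.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu > 0 and nv > 0 else 0.0


def associate(track_embs, det_embs, min_sim=0.5):
    """Greedy one-to-one matching; returns {detection index: track id or None}."""
    pairs = sorted(
        (
            (cosine(t, d), tid, di)
            for tid, t in track_embs.items()
            for di, d in enumerate(det_embs)
        ),
        reverse=True,
    )
    matches = {di: None for di in range(len(det_embs))}
    used_tracks = set()
    for sim, tid, di in pairs:
        if sim < min_sim:
            break  # remaining pairs are even less similar
        if tid in used_tracks or matches[di] is not None:
            continue
        matches[di] = tid
        used_tracks.add(tid)
    return matches


# Hypothetical 2-D embeddings keyed by integer track id.
tracks = {7: [1.0, 0.0], 8: [0.0, 1.0]}
detections = [[0.1, 0.9], [0.9, 0.1], [-1.0, -1.0]]
print(associate(tracks, detections))  # {0: 8, 1: 7, 2: None}
```

A detection left with `None` resembles no existing track closely enough, so a tracker would typically start a new track for it.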