Monocular Depth Estimation using Diffusion Models
Feb 2023
Saurabh Saxena, Abhishek Kar, Mohammad Norouzi, David J. Fleet
[Google Research]
https://arxiv.org/abs/2302.14816
https://depth-gen.github.io
We formulate monocular depth estimation using denoising diffusion models, inspired by their recent successes in high-fidelity image generation. To that end, we introduce innovations to address problems arising from noisy, incomplete depth maps in the training data, including step-unrolled denoising diffusion, an L1 loss, and depth infilling during training. To cope with the limited availability of data for supervised training, we leverage pre-training on self-supervised image-to-image translation tasks. Despite the simplicity of the approach, with a generic loss and architecture, our DepthGen model achieves SOTA performance on the indoor NYU dataset and near-SOTA results on the outdoor KITTI dataset. Further, with a multimodal posterior, DepthGen naturally represents depth ambiguity (e.g., from transparent surfaces), and its zero-shot performance, combined with depth imputation, enables a simple but effective text-to-3D pipeline.
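
To make the training-time innovations named in the abstract concrete (step-unrolled denoising diffusion, an L1 loss, and depth infilling), here is a minimal PyTorch-style sketch of one training step. All names (`model`, `alpha_bar`, `infill_depth`, `unroll_prob`) and the mean-value infilling scheme are illustrative assumptions, not the paper's released implementation.

```python
# Hypothetical sketch of a DepthGen-style training step. `model(z_t, rgb, t)`
# is assumed to predict the clean depth map (x0 parameterization).
import torch
import torch.nn.functional as F

def infill_depth(depth, valid_mask):
    """Fill missing depth pixels with the mean of valid pixels.
    A stand-in for the paper's infilling; the exact scheme is an assumption."""
    fill_value = (depth * valid_mask).sum() / valid_mask.sum().clamp(min=1)
    return torch.where(valid_mask.bool(), depth, fill_value)

def training_step(model, rgb, depth, valid_mask, alpha_bar, unroll_prob=0.5):
    b = depth.shape[0]
    t = torch.randint(0, alpha_bar.shape[0], (b,), device=depth.device)
    a = alpha_bar[t].view(b, 1, 1, 1)

    # Depth infilling: complete the incomplete ground truth before diffusing it.
    depth_filled = infill_depth(depth, valid_mask)

    noise = torch.randn_like(depth_filled)
    z_t = a.sqrt() * depth_filled + (1 - a).sqrt() * noise

    # Step-unrolled denoising: with some probability, rebuild z_t from a
    # (no-grad) model prediction so training sees latents like those at test time.
    if torch.rand(()) < unroll_prob:
        with torch.no_grad():
            depth_hat = model(z_t, rgb, t)
        z_t = a.sqrt() * depth_hat + (1 - a).sqrt() * torch.randn_like(noise)

    pred = model(z_t, rgb, t)

    # L1 loss, computed only where ground-truth depth is valid.
    loss = (F.l1_loss(pred, depth_filled, reduction='none') * valid_mask).sum()
    return loss / valid_mask.sum().clamp(min=1)
```

The unrolled step is meant to reduce the mismatch between training latents built from (infilled, noisy) ground truth and the latents the model produces at inference time; the 0.5 unroll probability here is an arbitrary placeholder.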
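
For the depth-imputation idea behind the text-to-3D pipeline, a similarly hedged sketch of replacement-based imputation during sampling: pixels with known depth are re-noised to the current step and overwrite the latent, while unknown pixels are denoised by the model. The simplified re-noise-from-x0 sampler and all names are assumptions for illustration, not the authors' sampler.

```python
# Hypothetical replacement-style depth imputation at sampling time.
@torch.no_grad()
def sample_with_imputation(model, rgb, known_depth, known_mask, betas):
    alphas = 1.0 - betas
    alpha_bar = torch.cumprod(alphas, dim=0)
    z = torch.randn_like(known_depth)
    for t in reversed(range(betas.shape[0])):
        tb = torch.full((z.shape[0],), t, device=z.device, dtype=torch.long)
        a_bar = alpha_bar[t]
        # Overwrite known pixels with a noised copy of the known depth
        # at the current noise level (replacement-based imputation).
        z_known = a_bar.sqrt() * known_depth + (1 - a_bar).sqrt() * torch.randn_like(z)
        z = torch.where(known_mask.bool(), z_known, z)

        depth_hat = model(z, rgb, tb)  # predict clean depth (x0)
        if t > 0:
            a_prev = alpha_bar[t - 1]
            z = a_prev.sqrt() * depth_hat + (1 - a_prev).sqrt() * torch.randn_like(z)
        else:
            z = depth_hat
    return z
```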