We still have to respect the fact that a point cloud is just a set of points and is therefore invariant to permutations of its members, which necessitates certain symmetrizations in the network computation. Invariance to rigid motions also needs to be considered.
Our PointNet is a unified architecture that directly takes point clouds as input and outputs either class labels for the entire input or per-point segment/part labels for each point of the input. The basic architecture of our network is surprisingly simple, as in the initial stages each point is processed identically and independently. In the basic setting each point is represented by just its three coordinates (x, y, z). Additional dimensions may be added by computing normals and other local or global features.
Key to our approach is the use of a single symmetric function, max pooling. Effectively the network learns a set of optimization functions/criteria that select interesting or informative points of the point cloud and encode the reason for their selection. The final fully connected layers of the network aggregate these learnt optimal values into the global descriptor for the entire shape as mentioned above (shape classification) or are used to predict per-point labels (shape segmentation).
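The effect of combining an identical per-point function with max pooling can be sketched in a few lines. This is a toy illustration under stated assumptions, not the paper's network: the names `per_point_feature` and `global_feature` are hypothetical, the "weights" are fixed random numbers standing in for learned parameters, and a single ReLU layer stands in for the per-point processing.

```python
import random


def per_point_feature(point, weights):
    """Apply the same small function to one point, independently of all others."""
    x, y, z = point
    # Each output channel is a ReLU of a linear combination of the coordinates.
    return [max(0.0, wx * x + wy * y + wz * z + b) for wx, wy, wz, b in weights]


def global_feature(points, weights):
    """Max-pool every feature channel over all points (a symmetric function)."""
    features = [per_point_feature(p, weights) for p in points]
    return [max(channel) for channel in zip(*features)]


# Hypothetical fixed weights standing in for learned parameters (8 channels).
rng = random.Random(0)
weights = [tuple(rng.uniform(-1, 1) for _ in range(4)) for _ in range(8)]
cloud = [tuple(rng.uniform(-1, 1) for _ in range(3)) for _ in range(100)]

# Present the same points in a different order.
shuffled = cloud[:]
rng.shuffle(shuffled)

feature_a = global_feature(cloud, weights)
feature_b = global_feature(shuffled, weights)
print(feature_a == feature_b)  # the order of the points does not matter
```

Because each point is processed on its own and the maximum of a set of values does not depend on the order in which they are listed, the pooled descriptor is identical for any permutation of the input.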
Rigid or affine transformations are easy to apply to our input format, as each point transforms independently. Thus we can add a data-dependent spatial transformer network that attempts to canonicalize the data before the PointNet processes them, so as to further improve the results.
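To make "each point transforms independently" concrete, here is a minimal sketch that applies one affine map to every point of a cloud. The matrix is chosen by hand (a rotation about the z axis); in the approach described above it would instead be predicted from the data by the spatial transformer network, which this sketch does not attempt to model. The helper names `rotation_z`, `transform_point`, and `transform_cloud` are hypothetical.

```python
import math


def rotation_z(angle):
    """3x3 rotation matrix about the z axis (angle in radians)."""
    c, s = math.cos(angle), math.sin(angle)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def transform_point(point, matrix, translation=(0.0, 0.0, 0.0)):
    """Apply an affine map to a single point; no other point is needed."""
    return tuple(
        sum(matrix[i][j] * point[j] for j in range(3)) + translation[i]
        for i in range(3)
    )


def transform_cloud(points, matrix, translation=(0.0, 0.0, 0.0)):
    """Transform a whole cloud by transforming each point on its own."""
    return [transform_point(p, matrix, translation) for p in points]


cloud = [(1.0, 0.0, 0.0), (0.0, 2.0, 0.0), (0.0, 0.0, 3.0)]

# Rotate the cloud by 90 degrees, then undo the rotation again.
rotated = transform_cloud(cloud, rotation_z(math.pi / 2))
restored = transform_cloud(rotated, rotation_z(-math.pi / 2))
```

Undoing a known rotation, as in the last line, is the idea behind canonicalization: if a suitable transform can be estimated for an input, applying it brings the cloud into a standard pose before further processing.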
We provide both a theoretical analysis and an experimental evaluation of our approach. We show that our network can approximate any set function that is continuous. More interestingly, it turns out that our network learns to summarize an input point cloud by a sparse set of key points, which roughly corresponds to the skeleton of objects according to visualization. The theoretical analysis provides an understanding of why our PointNet is highly robust to small perturbations of input points as well as to corruption through point insertion (outliers) or deletion (missing data). On a number of benchmark datasets ranging from shape classification and part segmentation to scene segmentation, we experimentally compare our PointNet with state-of-the-art approaches based upon multi-view and volumetric representations. Under a unified architecture, not only is our PointNet much faster, but it also exhibits strong performance on par with or even better than the state of the art.
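One way to see why a max-pooled descriptor depends only on a sparse subset of points, and why deleting the other points leaves it unchanged, is the following toy sketch. It is an illustration under assumptions rather than the paper's analysis: the per-point function is a single random ReLU layer, and `critical_indices` is a hypothetical helper that records, for each feature channel, one point attaining the maximum.

```python
import random


def per_point_feature(point, weights):
    """Same per-point function for every point (toy stand-in for a learned one)."""
    x, y, z = point
    return [max(0.0, wx * x + wy * y + wz * z + b) for wx, wy, wz, b in weights]


def global_feature(points, weights):
    """Channel-wise maximum over all points."""
    features = [per_point_feature(p, weights) for p in points]
    return [max(channel) for channel in zip(*features)]


def critical_indices(points, weights):
    """Indices of points that attain the maximum in at least one channel."""
    features = [per_point_feature(p, weights) for p in points]
    winners = set()
    for k in range(len(weights)):
        # One point achieving the maximum of channel k.
        winners.add(max(range(len(points)), key=lambda i: features[i][k]))
    return sorted(winners)


# Hypothetical fixed weights (8 channels) and a random cloud of 200 points.
rng = random.Random(1)
weights = [tuple(rng.uniform(-1, 1) for _ in range(4)) for _ in range(8)]
cloud = [tuple(rng.uniform(-1, 1) for _ in range(3)) for _ in range(200)]

critical = critical_indices(cloud, weights)
subset = [cloud[i] for i in critical]

# Keeping only the selected points reproduces the full descriptor.
same = global_feature(subset, weights) == global_feature(cloud, weights)
print(len(critical), same)
```

With 8 channels at most 8 points are selected, however large the cloud is, so every other point can be deleted without changing the pooled descriptor. Likewise, an inserted point alters the descriptor only if it exceeds the current maximum in some channel.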
The key contributions of our work are as follows:
• We design a novel deep net architecture suitable for consuming unordered point sets in 3D;
• We show how such a net can be trained to perform 3D shape classification, shape part segmentation and scene semantic parsing tasks;
• We provide thorough empirical and theoretical analysis on the stability and efficiency of our method;
• We illustrate the 3D features computed by the selected neurons in the net and develop intuitive explanations for its performance.
The problem of processing unordered sets by neural nets is a very general and fundamental problem – we expect that our ideas can be transferred to other domains as well.