Learning Object Bounding Boxes for 3D Instance Segmentation on Point Clouds

1 Introduction
 
The framework directly regresses 3D bounding boxes for all instances in a point cloud, while simultaneously predicting a point-level mask for each instance.
 
It consists of a backbone network followed by two parallel network branches for
1) bounding box regression and
2) point mask prediction.
 
3D-BoNet is single-stage, anchor-free and end-to-end trainable.
It does not require any post-processing steps such as non-maximum suppression, feature sampling, clustering or voting.
 
 
Core problems on 3D geometric data such as point clouds include semantic segmentation, object detection and instance segmentation.
Point clouds are inherently unordered, unstructured and non-uniform.
 
SGPN learns to group per-point features through a similarity matrix.
Similarly, ASIS, JSIS3D, MASC and 3D-BEVIS apply the same per-point feature grouping pipeline to segment 3D instances.
Mo et al. formulate instance segmentation as a per-point feature classification problem in PartNet.
The methods above do not explicitly detect object boundaries, and they require a post-processing step such as mean-shift clustering [6] to obtain the final instance labels, which is computationally heavy.
 
 
The paper proposes a framework for 3D instance segmentation in which objects are loosely but uniquely detected in a single forward stage using efficient MLPs, and each instance is then precisely segmented by a simple point-level binary classifier. To this end, we introduce a new bounding box prediction module together with a series of carefully designed loss functions to directly learn object boundaries.
 
 
It first uses an existing backbone network to extract a local feature vector for each point and a global feature vector for the whole input point cloud.
The backbone is followed by two branches:
1) instance-level bounding box prediction
2) point-level mask prediction for instance segmentation.
 
The bounding box prediction branch aims to predict a bounding box for each instance without relying on predefined spatial anchors or a region proposal network.
Learning instance boxes involves two critical issues:
1) the number of instances is variable, i.e., from 1 to many;
2) there is no fixed order among the instances.
 
This box prediction branch simply takes the global feature vector as input and directly outputs a large, fixed number of bounding boxes together with confidence scores. The scores indicate whether each box contains a valid instance. To supervise the network, we design a novel bounding box association layer followed by a multi-criteria loss function.
Given a set of ground-truth instances, we need to determine which of the predicted boxes best fit them. We formulate this association process as an optimal assignment problem and solve it with an existing solver.
After the boxes have been optimally associated, our multi-criteria loss function not only minimizes the Euclidean distance between paired boxes, but also maximizes the coverage of valid points inside the predicted boxes.
 
The purpose of the point mask prediction branch is to classify whether each point inside of a bounding box belongs to the valid instance or the background.
 
 

The framework differs from existing 3D instance segmentation approaches in three ways:

1) Compared with the proposal-free pipeline, our method segments instances with high objectness by explicitly learning 3D object boundaries.
2) Compared with the widely used proposal-based approaches, our framework does not require expensive and dense proposals.
3) Our framework is remarkably efficient, since the instance-level masks are learnt in a single forward pass without requiring any post-processing steps.
Key contributions:
1) We propose a new framework for instance segmentation on 3D point clouds. The framework is single-stage, anchor-free and end-to-end trainable, without requiring any post-processing steps.
2) We design a novel bounding box association layer followed by a multi-criteria loss function to supervise the box prediction branch.
3) We demonstrate significant improvement over baselines and provide intuition behind our design choices through extensive ablation studies.
 
 
 
2 3D-BoNet
 
2.1 Overview
 
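Input: a point cloud $P \in \mathbb{R}^{N \times k_0}$ with N points in total, where $k_0$ is the number of channels, such as the location and color of each point.
 
The backbone extracts local features $F_l \in \mathbb{R}^{N \times k}$ and a global feature $F_g \in \mathbb{R}^{1 \times k}$, where k is the length of the feature vectors.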
 
 
During training, the predicted bounding boxes B and the ground truth boxes are fed into a box association layer.
The output of the association layer is a list of association indices A.
The indices reorganize the predicted boxes, such that each ground truth box is paired with a unique predicted box for subsequent loss calculation.
The predicted bounding box scores are also reordered accordingly before calculating loss.
The reordered predicted bounding boxes are then fed into the multi-criteria loss function.
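
The reordering step can be sketched as follows. This is a minimal NumPy illustration, not the authors' implementation: it assumes the association result is given as an index array `assoc` holding, for each ground truth box, the index of its paired predicted box. The function name `reorder_by_association` is hypothetical.

```python
import numpy as np

def reorder_by_association(pred_boxes, pred_scores, assoc):
    """Move the T matched predictions to the front, in ground-truth order.

    pred_boxes:  (H, 2, 3) predicted boxes
    pred_scores: (H,) predicted confidence scores
    assoc:       (T,) index of the predicted box paired with each ground truth box
    """
    H = pred_boxes.shape[0]
    assoc = np.asarray(assoc)
    # Unmatched predictions follow the matched ones, in ascending index order.
    rest = np.setdiff1d(np.arange(H), assoc)
    order = np.concatenate([assoc, rest])
    return pred_boxes[order], pred_scores[order]

boxes = np.arange(4 * 2 * 3, dtype=float).reshape(4, 2, 3)
scores = np.array([0.1, 0.9, 0.2, 0.8])
new_boxes, new_scores = reorder_by_association(boxes, scores, assoc=[3, 1])
```

After this step, the first T rows line up one-to-one with the ground truth boxes, so the box and score losses can be computed row by row.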
 
This loss function aims to not only minimize the Euclidean distance between each ground truth box and the associated predicted box, but also maximize the coverage of valid points inside of each predicted box.
Both the bounding box association layer and the multi-criteria loss function are used only during network training.
 
To predict a point-level binary mask for each instance, every predicted box, together with the local and global features F_l and F_g, is fed into the point mask prediction branch.
 
 
2.2 Bounding Box Prediction
 
 
Bounding Box Encoding:
 
Neural Layers: 
H is a predefined, fixed number that sets the maximum number of bounding boxes the network is expected to predict.
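
To make the output shapes concrete, here is a rough sketch of a box branch that maps a global feature vector to H boxes and H scores. It is only a toy: a single linear layer stands in for the real MLPs, the weights `w_box` and `w_score` are hypothetical, and it assumes each box is encoded by its min and max vertices with a sigmoid on the scores.

```python
import numpy as np

def predict_boxes(global_feat, w_box, w_score, H):
    """Toy stand-in for the box branch: one linear layer per output head.

    global_feat: (k,) global feature vector
    w_box:       (k, H * 6) weights of the hypothetical box head
    w_score:     (k, H) weights of the hypothetical score head
    """
    raw = (global_feat @ w_box).reshape(H, 2, 3)
    # Order the two vertices per axis so that vertex 0 is the min corner
    # and vertex 1 is the max corner (assumed min-max encoding).
    boxes = np.stack([raw.min(axis=1), raw.max(axis=1)], axis=1)
    # Sigmoid squashes the scores into (0, 1).
    scores = 1.0 / (1.0 + np.exp(-(global_feat @ w_score)))
    return boxes, scores

rng = np.random.default_rng(0)
k, H = 16, 5
boxes, scores = predict_boxes(rng.normal(size=k),
                              rng.normal(size=(k, H * 6)),
                              rng.normal(size=(k, H)), H)
```

Whatever the input scene, the output always has H rows; the scores are what tell valid boxes apart from unused ones.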
 
Bounding Box Association Layer:
Optimal Association Formulation:
A: a boolean association matrix, where A_{i,j} = 1 if the i-th predicted box is assigned to the j-th ground truth box and 0 otherwise. A is also called the association index.
C: the association cost matrix, where C_{i,j} is the cost of assigning the i-th predicted box to the j-th ground truth box.
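
With these definitions, the association can be written as a standard linear assignment problem. The formulation below is a sketch consistent with the notes, assuming T ground truth boxes and H predicted boxes with T ≤ H: every ground truth box gets exactly one predicted box, and every predicted box is used at most once.

```latex
\mathbf{A} = \arg\min_{\mathbf{A}} \sum_{i=1}^{H} \sum_{j=1}^{T} C_{i,j} A_{i,j}
\quad \text{subject to} \quad
\sum_{i=1}^{H} A_{i,j} = 1, \quad
\sum_{j=1}^{T} A_{i,j} \le 1, \quad
A_{i,j} \in \{0, 1\}
```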
 
Association Matrix Calculation:
1. Euclidean Distance between Vertices
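
A possible way to compute this criterion for all box pairs at once is sketched below. It assumes boxes are stored as min and max vertices and that the cost is the mean squared difference over the six vertex coordinates; the function name is hypothetical.

```python
import numpy as np

def vertex_distance_cost(pred_boxes, gt_boxes):
    """Mean squared distance between box vertices for every (pred, gt) pair.

    pred_boxes: (H, 2, 3), gt_boxes: (T, 2, 3); returns an (H, T) cost matrix.
    """
    diff = pred_boxes[:, None, :, :] - gt_boxes[None, :, :, :]  # (H, T, 2, 3)
    # Average over the 6 coordinates (2 vertices x 3 axes) of each pair.
    return (diff ** 2).reshape(diff.shape[0], diff.shape[1], -1).mean(axis=-1)

pred = np.array([[[0, 0, 0], [1, 1, 1]],
                 [[2, 2, 2], [3, 3, 3]]], dtype=float)
gt = np.array([[[0, 0, 0], [1, 1, 1]]], dtype=float)
cost = vertex_distance_cost(pred, gt)
```

Here the first predicted box coincides with the ground truth box and costs 0, while the second is shifted by 2 along every axis and costs 4.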
 
2. Soft Intersection-over-Union on Points
The deeper the corresponding point is inside of the box, the higher the value. The farther away the point is outside, the smaller the value.
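
The behavior described above can be illustrated with a small sketch. This is not the paper's exact formula: it assumes axis-aligned boxes given as min and max vertices, uses a sigmoid of the signed distance to the nearest face as the point-in-box probability, and introduces a hypothetical `steepness` parameter. The cost is taken as the negative soft IoU so that better overlap means lower cost.

```python
import numpy as np

def soft_point_in_box(points, box, steepness=10.0):
    """Soft probability that each point lies inside an axis-aligned box.

    points: (N, 3); box: (2, 3) as [min vertex, max vertex].
    `steepness` is a hypothetical temperature for the sigmoid.
    """
    # Signed margin to the nearest face: positive inside, negative outside.
    margin = np.minimum(points - box[0], box[1] - points).min(axis=1)
    return 1.0 / (1.0 + np.exp(-steepness * margin))

def soft_iou_cost(q_pred, q_gt):
    """Negative soft IoU between predicted and ground truth point memberships."""
    inter = np.sum(q_pred * q_gt)
    union = np.sum(q_pred) + np.sum(q_gt) - inter
    return -inter / union

points = np.array([[0.5, 0.5, 0.5], [0.9, 0.5, 0.5], [2.0, 0.5, 0.5]])
box = np.array([[0.0, 0.0, 0.0], [1.0, 1.0, 1.0]])
q = soft_point_in_box(points, box)
```

The point at the box center gets the highest value, the point near a face a lower one, and the point outside a value close to 0. Because the probabilities are smooth in the box coordinates, the criterion stays differentiable.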
 
 
3. Cross-Entropy Score
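
One way to read this criterion is as a binary cross-entropy between the soft point-in-box probabilities of a predicted box and the 0/1 memberships of a ground truth box. The sketch below assumes that reading; the function name and the `eps` clipping are illustrative choices.

```python
import numpy as np

def cross_entropy_cost(q_pred, q_gt, eps=1e-8):
    """Mean binary cross-entropy between soft predicted and ground truth memberships."""
    q_pred = np.clip(q_pred, eps, 1.0 - eps)  # avoid log(0)
    return -np.mean(q_gt * np.log(q_pred) + (1.0 - q_gt) * np.log(1.0 - q_pred))

q_gt = np.array([1.0, 1.0, 0.0, 0.0])
good = cross_entropy_cost(np.array([0.9, 0.8, 0.1, 0.2]), q_gt)
bad = cross_entropy_cost(np.array([0.2, 0.1, 0.9, 0.8]), q_gt)
```

A box that confidently covers the right points (`good`) gets a lower cost than one that covers the wrong points (`bad`).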
 
Criterion (1) guarantees the geometric boundaries of the learnt boxes, while criteria (2) and (3) maximize the coverage of valid points and overcome the non-uniformity of point clouds.
 
 
Loss Functions
 
Multi-criteria Loss for Box Prediction:
 
Loss for Box Score Prediction:
After being reordered by the association index A, the ground truth scores for the first T scores are all '1', and '0' for the remaining H - T invalid scores. Cross-entropy loss is used for this binary classification task:
This loss function rewards the correctly predicted bounding boxes, while implicitly penalizing the cases where multiple similar boxes are regressed for a single instance.
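
The score loss can be sketched as an ordinary binary cross-entropy over the H reordered scores. The code assumes the scores have already been reordered so that the first T entries belong to matched boxes; the function name is hypothetical.

```python
import numpy as np

def box_score_loss(scores, T, eps=1e-8):
    """Binary cross-entropy over H reordered scores; the first T are positives."""
    scores = np.clip(np.asarray(scores, dtype=float), eps, 1.0 - eps)
    targets = np.zeros_like(scores)
    targets[:T] = 1.0
    return -np.mean(targets * np.log(scores) + (1.0 - targets) * np.log(1.0 - scores))

confident = box_score_loss([0.9, 0.9, 0.1, 0.1], T=2)
duplicate = box_score_loss([0.9, 0.9, 0.9, 0.1], T=2)
```

In the second call a third, unmatched box also claims a high score, as a duplicate detection would, and the loss rises accordingly.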
 
 
2.3 Point Mask Prediction
 
 
Neural Layers:
Sigmoid is used as the last activation function.
 
Loss Function:
Because instance points and background points are imbalanced in number, focal loss [29] with default hyper-parameters is used instead of the standard cross-entropy loss to optimize this branch. Only the T valid paired masks are used for the loss.
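
For reference, a binary focal loss can be sketched as below. The values `alpha=0.25` and `gamma=2.0` are assumed defaults from common focal loss implementations, not values confirmed by these notes, and the loss would be applied only to the T masks paired with ground truth instances.

```python
import numpy as np

def focal_loss(p, y, alpha=0.25, gamma=2.0, eps=1e-8):
    """Binary focal loss averaged over points.

    p: predicted foreground probabilities (after sigmoid); y: 0/1 labels.
    alpha and gamma are assumed defaults, not values confirmed by these notes.
    """
    p = np.clip(p, eps, 1.0 - eps)
    p_t = np.where(y == 1, p, 1.0 - p)              # probability of the true class
    alpha_t = np.where(y == 1, alpha, 1.0 - alpha)  # class-balancing weight
    return np.mean(-alpha_t * (1.0 - p_t) ** gamma * np.log(p_t))

y = np.array([1, 0, 0, 0])
easy = focal_loss(np.array([0.95, 0.05, 0.05, 0.05]), y)
hard = focal_loss(np.array([0.30, 0.60, 0.05, 0.05]), y)
```

The factor `(1 - p_t) ** gamma` shrinks the contribution of points that are already classified well, so the many easy background points do not dominate the gradient.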
 
 
2.4 End-to-End Implementation
Backbone: PointNet++
Optimizer: Adam
The initial learning rate is divided by 2 every 20 epochs.
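
The decay rule can be written as a one-line step schedule. In this sketch `initial_lr` is left as a parameter, and the value 1.0 used below is only a placeholder so that the output reads as a multiplier.

```python
def learning_rate(epoch, initial_lr, decay_every=20, factor=0.5):
    """Step decay: multiply the rate by `factor` once every `decay_every` epochs."""
    return initial_lr * factor ** (epoch // decay_every)

# initial_lr=1.0 is a placeholder so the output reads as a multiplier.
schedule = [learning_rate(e, initial_lr=1.0) for e in (0, 19, 20, 40, 60)]
```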
 
 
 

The Hungarian algorithm is used to solve the optimal association problem above.
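
To show what the solver returns, the sketch below finds the optimal association by exhaustive search. This brute-force routine is a stand-in for the Hungarian algorithm, not an implementation of it: it gives the same optimum but is only practical for a handful of boxes. In practice a polynomial-time solver such as `scipy.optimize.linear_sum_assignment` would be the natural choice. The function name and the cost values are made up for illustration.

```python
from itertools import permutations

import numpy as np

def associate_brute_force(cost):
    """Exhaustively find the cheapest one-to-one assignment.

    cost: (H, T) matrix with T <= H. Returns, for each ground truth box j,
    the index of the predicted box assigned to it, plus the total cost.
    """
    H, T = cost.shape
    best_total, best_assign = np.inf, None
    # Try every ordered choice of T distinct predicted boxes.
    for assign in permutations(range(H), T):
        total = sum(cost[i, j] for j, i in enumerate(assign))
        if total < best_total:
            best_total, best_assign = total, assign
    return list(best_assign), float(best_total)

cost = np.array([[4.0, 1.0],
                 [2.0, 0.5],
                 [3.0, 6.0]])
assign, total = associate_brute_force(cost)
```

In this example predicted box 1 is the cheapest choice for both ground truth boxes, so a greedy per-column pick would reuse it. The optimal assignment instead pairs ground truth box 0 with predicted box 1 and ground truth box 1 with predicted box 0.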

 
 
 