
Chaitanya Mitash

Ph.D. Student, Computer Science
Rutgers University


A Self-supervised Learning System for Object Detection using Physics Simulation and Multi-view Pose Estimation (arXiv)

Chaitanya Mitash, Kostas E. Bekris and Abdeslam Boularias



Abstract: Impressive progress has been achieved in object detection with the use of deep learning. Nevertheless, such tools typically require a large amount of training data and significant manual effort for labeling objects. This limits their applicability in robotics, where it is necessary to scale solutions to a large number of objects and a variety of conditions. This work proposes a fully autonomous process to train a Convolutional Neural Network (CNN) for object detection and pose estimation in robotic setups. The application involves detection of objects placed in clutter and in tight environments, such as a shelf. In particular, given access to 3D object models, several aspects of the environment are simulated and the models are placed in physically realistic poses with respect to their environment to generate a labeled synthetic dataset. To further improve object detection, the network self-trains over real images that are labeled using a robust multi-view pose estimation process. The proposed training process is evaluated on several existing datasets and on a dataset that we collected with a Motoman robotic manipulator. Results show that the proposed process outperforms popular training processes relying on synthetic data generation and manual annotation.


Code

The code for physics-aware scene simulation can be found here: physics-scene-rendering-vision
The code for 6D pose estimation and the self-learning pipeline can be found here: PHYSIM_6DPose

Some examples of scenes rendered using our simulation


The scenes are accompanied by bounding-box files, each containing a list of "object label, tl_x, tl_y, br_x, br_y" entries (the top-left and bottom-right corners of each box), with occlusion handling.
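As a rough illustration of how such a bounding-box file might be read, here is a minimal Python sketch. It assumes one comma-separated entry per line in the order listed above, with numeric pixel coordinates; the exact file layout is not specified here, so the delimiter, the `BoundingBox` class, the `parse_bounding_boxes` helper, and the sample labels are all hypothetical and should be checked against the files produced by the simulation code.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class BoundingBox:
    """One entry: object label plus top-left and bottom-right corners."""
    label: str
    tl_x: float
    tl_y: float
    br_x: float
    br_y: float

    @property
    def width(self) -> float:
        return self.br_x - self.tl_x

    @property
    def height(self) -> float:
        return self.br_y - self.tl_y


def parse_bounding_boxes(text: str) -> List[BoundingBox]:
    """Parse lines of the assumed form 'label, tl_x, tl_y, br_x, br_y'."""
    boxes = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # Split from the right so the last four fields are the coordinates,
        # even if the label itself happens to contain a comma.
        parts = [p.strip() for p in line.rsplit(",", 4)]
        if len(parts) != 5:
            raise ValueError(f"line {line_no}: expected 5 fields, got {len(parts)}")
        label, *coords = parts
        tl_x, tl_y, br_x, br_y = (float(c) for c in coords)
        boxes.append(BoundingBox(label, tl_x, tl_y, br_x, br_y))
    return boxes


if __name__ == "__main__":
    # Hypothetical sample content; real files may differ.
    sample = "object_a, 10, 20, 110, 220\nobject_b, 5.5, 6, 50, 60\n"
    for box in parse_bounding_boxes(sample):
        print(box.label, box.width, box.height)
```

In practice the text would come from reading one of the bounding-box files shipped with a rendered scene, for example via `open(path).read()`.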



BibTeX

@article{physim,
title={A Self-supervised Learning System for Object Detection using Physics Simulation and Multi-view Pose Estimation},
author={Mitash, Chaitanya and Bekris, Kostas and Boularias, Abdeslam},
journal={arXiv:1703.03347},
year={2017}
}