Ph.D Student, Computer Science
Abstract: Impressive progress has been achieved in object detection with the use of deep learning. Nevertheless, such tools typically require a large amount of training data and significant manual effort for labeling objects. This limits their applicability in robotics, where it is necessary to scale solutions to a large number of objects and a variety of conditions. This work proposes a fully autonomous process to train a Convolutional Neural Network (CNN) for object detection and pose estimation in robotic setups. The application involves detection of objects placed in a clutter and in tight environments, such as a shelf. In particular, given access to 3D object models, several aspects of the environment are simulated and the models are placed in physically realistic poses with respect to their environment to generate a labeled synthetic dataset. To further improve object detection, the network self-trains over real images that are labeled using a robust multi-view pose estimation process. The proposed training process is evaluated on several existing datasets and on a dataset that we collected with a Motoman robotic manipulator. Results show that the proposed process outperforms popular training processes relying on synthetic data generation and manual annotation.
the scenes are accompanied with bounding box files that contain a list of "object label, tl_x, tl_y, br_x, br_y" with occlusion handling.