Monocular Camera 2.5D Object Detection for Autonomous Systems at Ridecell
Tuesday, Oct 4, 2022
Object detection with a monocular camera is extremely important for the automotive industry, because LiDAR data is not only expensive to obtain but also extremely difficult to get labelled. Previous works have tried to remove the dependency on LiDAR, but only for inference; they still needed LiDAR data during training. Our work requires no LiDAR data annotations. The major advancement over previous works, however, lies in the architecture. Previously, 3D detections were obtained by stacking two separate deep learning networks: a 2D object detection network, followed by a projection of its detections to Bird’s Eye View (BEV) with depth taken from a depth prediction network. The presented approach instead combines the two networks into a single feed-forward pass, with a common backbone network that branches out at the heads. Having two different heads on a common backbone lets backpropagation learn weights that mutually improve the two tasks of 2D object detection and depth prediction simultaneously, giving better and faster output than previous works.
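To make the shared-backbone idea concrete, here is a minimal, framework-free sketch. It is not the Ridecell network: the class `SharedBackboneModel`, the function `joint_loss`, the layer sizes, the `depth_weight` parameter, and the target values are all hypothetical and chosen only for illustration. It shows a single forward pass in which the backbone runs once and feeds both a 2D box head and a depth head, and a combined loss through which both tasks would influence the shared weights during training.

```python
import random


def linear(weights, inputs):
    """Multiply a weight matrix (list of rows) by an input vector."""
    return [sum(w * x for w, x in zip(row, inputs)) for row in weights]


def relu(values):
    """Element-wise ReLU activation."""
    return [max(0.0, v) for v in values]


def init_weights(rows, cols, rng):
    """Small random weight matrix; a stand-in for learned parameters."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


class SharedBackboneModel:
    """Toy model (hypothetical): one shared feature extractor, two task heads."""

    def __init__(self, in_dim=8, feat_dim=16, seed=0):
        rng = random.Random(seed)
        self.backbone = init_weights(feat_dim, in_dim, rng)  # shared by both tasks
        self.box_head = init_weights(4, feat_dim, rng)       # predicts (x, y, w, h)
        self.depth_head = init_weights(1, feat_dim, rng)     # predicts one depth value
        self.backbone_calls = 0

    def forward(self, pixels):
        # The backbone runs once per input; both heads read the same features.
        self.backbone_calls += 1
        features = relu(linear(self.backbone, pixels))
        box = linear(self.box_head, features)
        depth = linear(self.depth_head, features)[0]
        return box, depth


def joint_loss(box, depth, target_box, target_depth, depth_weight=1.0):
    """Weighted sum of both task losses (the weighting scheme is an assumption).

    Because both terms depend on the shared features, gradients from either
    task would reach the backbone weights during backpropagation.
    """
    box_loss = sum((p - t) ** 2 for p, t in zip(box, target_box)) / len(box)
    depth_loss = (depth - target_depth) ** 2
    return box_loss + depth_weight * depth_loss


model = SharedBackboneModel()
pixels = [0.1 * i for i in range(8)]  # made-up input vector standing in for an image
box, depth = model.forward(pixels)
loss = joint_loss(box, depth, target_box=[0.5, 0.5, 0.2, 0.3], target_depth=12.0)
```

In a stacked two-network pipeline, the image would pass through two separate feature extractors. Here `backbone_calls` increases by one per input, which is where the speed benefit of a single feed-forward pass would come from.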