Presenter: Zoe Kotti
Date: 15 January 2021
We introduce the region-based convolutional neural network (R-CNN) family of machine learning models, which is widely used in computer vision for object detection. In particular, we focus on R-FCN, a region-based, fully convolutional network for accurate and efficient object detection. In contrast to previous region-based detectors such as Fast and Faster R-CNN, which apply a costly per-region subnetwork hundreds of times, R-FCN is fully convolutional, with almost all computation shared across the entire image. To achieve this, the authors propose position-sensitive score maps, which address the dilemma between translation invariance in image classification and translation variance in object detection. The method can therefore naturally adopt fully convolutional image classifier backbones, such as the latest Residual Networks (ResNets), for object detection. The authors report competitive results on the PASCAL VOC datasets (e.g., 83.6% mAP on the 2007 set) with the 101-layer ResNet. These results are achieved at a test-time speed of 170 ms per image, 2.5-20x faster than the Faster R-CNN counterpart. Code is publicly available at: https://github.com/daijifeng001/r-fcn.
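
To make the idea of position-sensitive score maps more concrete, the following is a minimal pure-Python sketch of position-sensitive RoI pooling followed by average voting. It is not taken from the authors' code. The function name ps_roi_pool, the channel layout (bin-major, then class), the integer RoI format (x0, y0, w, h) in feature-map coordinates, and the toy input are all assumptions made for illustration. The key point it shows is that each of the k x k spatial bins of a region reads only from its own dedicated group of score maps, so the per-region work reduces to cheap pooling with no learned layers.

```python
import math


def ps_roi_pool(score_maps, roi, k, num_classes):
    """Position-sensitive RoI pooling with average voting (illustrative sketch).

    score_maps:  list of k*k*num_classes 2-D maps, each a list of rows.
                 Assumed layout: channel = (i * k + j) * num_classes + c,
                 where (i, j) is the bin row/column and c is the class.
    roi:         (x0, y0, w, h), integers in feature-map coordinates.
    num_classes: number of categories including background (C + 1).
    Returns one voted score per class.
    """
    x0, y0, w, h = roi
    scores = [0.0] * num_classes
    for i in range(k):  # bin row
        y_start = y0 + math.floor(i * h / k)
        y_end = y0 + math.ceil((i + 1) * h / k)
        for j in range(k):  # bin column
            x_start = x0 + math.floor(j * w / k)
            x_end = x0 + math.ceil((j + 1) * w / k)
            n = (y_end - y_start) * (x_end - x_start)
            for c in range(num_classes):
                # Bin (i, j) pools only from its own score map for class c.
                fmap = score_maps[(i * k + j) * num_classes + c]
                total = sum(
                    fmap[y][x]
                    for y in range(y_start, y_end)
                    for x in range(x_start, x_end)
                )
                scores[c] += total / n  # average pooling inside the bin
    # Vote: average the k*k bin responses for each class.
    return [s / (k * k) for s in scores]


# Toy input: k = 2, two classes, eight constant 4x4 maps whose value
# equals their channel index, and an RoI covering the whole map.
k, num_classes = 2, 2
score_maps = [[[float(ch)] * 4 for _ in range(4)]
              for ch in range(k * k * num_classes)]
scores = ps_roi_pool(score_maps, (0, 0, 4, 4), k, num_classes)
print(scores)  # [3.0, 4.0]
```

In this toy run, class 0 averages channels 0, 2, 4 and 6, and class 1 averages channels 1, 3, 5 and 7. In a full detector the voted scores would typically be passed through a softmax over classes, which is omitted here.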