Unmanned Aerial Vehicles (UAVs) are an essential component in the realization of Industry 4.0. With drones improving industrial safety and efficiency in utilities, construction, and communication, there is an urgent need for drone-based intelligent applications. In this paper, we develop a unified framework to simultaneously detect and count vehicles in drone images. We first analyze why state-of-the-art detectors fail in highly dense drone scenes, which motivates the design choices that follow. We then propose an effective loss, tailored to scale-adaptive anchor generation, that pushes the anchors towards matching the ground-truth boxes as closely as possible. Inspired by attention mechanisms in the human visual system, we maximize the mutual information between object classes and features by combining bottom-up cues with a top-down attention mechanism designed for feature extraction. Finally, we build a counting layer with a regularization constraint on the number of vehicles. Extensive experiments demonstrate the effectiveness of our approach: on both tasks, our method achieves state-of-the-art results on all four challenging datasets, reducing error by a larger margin than previous methods.
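The counting regularization described above can be illustrated with a minimal sketch: the detection loss is augmented with a penalty on the mismatch between the predicted and ground-truth vehicle counts. The function name, the L1 form of the penalty, and the weight `lam` are assumptions for illustration, not the paper's exact formulation.

```python
def counting_regularized_loss(det_loss, pred_count, gt_count, lam=0.1):
    """Add a count-mismatch penalty to a detection loss.

    Hypothetical sketch: an L1 penalty on |pred_count - gt_count|,
    weighted by lam, stands in for the paper's counting regularizer.
    """
    count_penalty = abs(pred_count - gt_count)
    return det_loss + lam * count_penalty
```

In practice such a term is computed per image and back-propagated jointly with the localization and classification losses, so the detector is discouraged from over- or under-detecting in dense scenes.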
Overview of our approach. We propose a unified fully convolutional neural network to localize and count vehicles in drone-based images. It consists of three parts: a base network, a feature extractor, and a vehicle detector. We introduce a scale-adaptive strategy to generate suitable anchor boxes. A circular flow is embedded in the feature extractor by combining bottom-up cues with top-down attention. Finally, we build the counting layer and introduce a counting regularization term into the original loss.
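One common way to realize a scale-adaptive anchor strategy is to derive anchor shapes from the statistics of the ground-truth boxes themselves, e.g. by clustering their widths and heights (as popularized by YOLOv2). The sketch below assumes that approach; the paper's exact strategy may differ, and the function name and parameters are illustrative.

```python
import numpy as np

def scale_adaptive_anchors(gt_wh, k=3, iters=20, seed=0):
    """Cluster ground-truth (w, h) pairs into k anchor shapes.

    A minimal k-means sketch over box dimensions, used here to
    illustrate data-driven anchor generation for small, dense objects.
    """
    rng = np.random.default_rng(seed)
    # initialize anchors from k distinct ground-truth boxes
    anchors = gt_wh[rng.choice(len(gt_wh), k, replace=False)].astype(float)
    for _ in range(iters):
        # assign each box to the nearest anchor (Euclidean in (w, h) space)
        d = np.linalg.norm(gt_wh[:, None, :] - anchors[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # move each anchor to the mean of its assigned boxes
        for j in range(k):
            if np.any(labels == j):
                anchors[j] = gt_wh[labels == j].mean(axis=0)
    # return anchors sorted by area, small to large
    return anchors[np.argsort(anchors.prod(axis=1))]
```

Because vehicles seen from a drone are small and fairly uniform in scale, anchors fitted to the dataset in this way match the ground truth far better than generic multi-scale defaults.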