Automatic adaptation of a generic pedestrian detector to a specific traffic scene

Authors:
Meng Wang; Xiaogang Wang
Affiliations:
Dept. of Electron. Eng., Chinese Univ. of Hong Kong, Hong Kong, China; Dept. of Electron. Eng., Chinese Univ. of Hong Kong, Hong Kong, China
Venue:
CVPR '11 Proceedings of the 2011 IEEE Conference on Computer Vision and Pattern Recognition
Year:
2011

Abstract

In recent years, significant progress has been made in learning generic pedestrian detectors from large-scale, manually labeled training sets. However, when a generic pedestrian detector is applied to a specific scene where the test data do not match the training data because of variations in viewpoint, resolution, illumination, and background, its accuracy may decrease greatly. In this paper, we propose a new framework for adapting a pre-trained generic pedestrian detector to a specific traffic scene by automatically selecting both confident positive and confident negative examples from the target scene and using them to retrain the detector iteratively. An important feature of the proposed framework is that it uses models of vehicle and pedestrian paths, learned without supervision, together with multiple other cues such as location, size, appearance, and motion to select new training samples. Information about scene structure increases the reliability of the selected samples and is complementary to the appearance-based detector, yet it has not been well explored in previous studies. To further improve the reliability of the selected samples, outliers are removed through multiple hierarchical clustering steps. The effectiveness of the different cues and clustering steps is evaluated through experiments. The proposed approach significantly improves the accuracy of the generic pedestrian detector and also outperforms a scene-specific detector retrained using background subtraction. Its results are comparable to those of a detector trained using a large number of manually labeled frames from the target scene.
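
The iterative selection-and-retraining loop described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a toy nearest-centroid scorer in place of a real pedestrian detector and a single boolean flag (`on_pedestrian_path`) in place of the learned path models and other cues, and it omits the hierarchical clustering step for outlier removal. All names (`Candidate`, `train_centroid_detector`, `select_confident`, `adapt`) and the thresholds `hi` and `lo` are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

Feature = Sequence[float]
ScoreFn = Callable[[Feature], float]


@dataclass
class Candidate:
    feature: Feature            # appearance feature of one detection window
    on_pedestrian_path: bool    # hypothetical stand-in for the scene-structure cue


def train_centroid_detector(pos: List[Feature], neg: List[Feature]) -> ScoreFn:
    """Toy stand-in for detector training: nearest-centroid score in [0, 1]."""
    def mean(rows: List[Feature]) -> List[float]:
        return [sum(col) / len(rows) for col in zip(*rows)]

    mp, mn = mean(pos), mean(neg)

    def score(x: Feature) -> float:
        dp, dn = math.dist(x, mp), math.dist(x, mn)
        # Closer to the positive centroid gives a score nearer to 1.
        return dn / (dp + dn) if dp + dn > 0 else 0.5

    return score


def select_confident(
    candidates: List[Candidate], score_fn: ScoreFn, hi: float, lo: float
) -> Tuple[List[Candidate], List[Candidate]]:
    """Keep only samples where the detector score and the scene cue agree."""
    positives, negatives = [], []
    for c in candidates:
        s = score_fn(c.feature)
        if s >= hi and c.on_pedestrian_path:
            positives.append(c)
        elif s <= lo and not c.on_pedestrian_path:
            negatives.append(c)
    return positives, negatives


def adapt(
    score_fn: ScoreFn,
    candidates: List[Candidate],
    source_pos: List[Feature],
    source_neg: List[Feature],
    rounds: int = 3,
    hi: float = 0.8,
    lo: float = 0.2,
) -> ScoreFn:
    """Iteratively retrain on source data plus confident target-scene samples."""
    for _ in range(rounds):
        new_pos, new_neg = select_confident(candidates, score_fn, hi, lo)
        if not new_pos and not new_neg:
            break  # nothing confident enough to add
        pos = list(source_pos) + [c.feature for c in new_pos]
        neg = list(source_neg) + [c.feature for c in new_neg]
        score_fn = train_centroid_detector(pos, neg)
    return score_fn
```

In this sketch a window is accepted as a new positive only when a high detector score coincides with a supporting scene cue, and as a new negative only when both point the other way. This mirrors the abstract's point that scene structure is complementary to appearance: a high-scoring window that lies off the pedestrian path is simply left out of the retraining set.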