Fast feature selection and training for AdaBoost-based concept detection with large scale datasets

  • Authors:
  • Shi Chen;Jinqiao Wang;Yang Liu;Changsheng Xu;Hanqing Lu

  • Affiliations:
  • National Lab of Pattern Recognition, Institute of Automation, CAS, Beijing, China (all authors)

  • Venue:
  • Proceedings of the International Conference on Multimedia
  • Year:
  • 2010

Abstract

AdaBoost has proven to be a successful statistical learning method for concept detection, with strong discrimination and generalization performance. However, training a concept detector with boosting is computationally expensive, especially on large scale datasets. The bottleneck of the training phase is selecting the best learner among a massive pool of candidates. Traditional approaches for selecting a weak classifier run in O(NT) time, with N examples and T learners. In this paper, we treat best-learner selection as a Nearest Neighbor Search problem in function space instead of feature space. With the help of the Locality Sensitive Hashing (LSH) algorithm, the search for the best learner can be sped up to O(NL) time, where L is the number of buckets in LSH. In our experiments, L (~600) is much smaller than T (~500,000). In addition, by studying the distribution of weak learners and candidate query points, we present an efficient method that partitions the weak learner points and the feasible region of query points as uniformly as possible, which achieves significant improvement in both recall and precision over the random projection used in the traditional LSH algorithm. Experimental results show that our method significantly reduces training time while remaining comparable in performance with state-of-the-art methods.
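The core idea above, viewing each weak learner as a point in function space (its vector of responses on the N training examples) and hashing those points so that only one bucket is scanned per boosting round, can be illustrated with a minimal sketch. This is not the paper's implementation: the data, the sim-hash scheme, and all names here are hypothetical stand-ins, and the paper's contribution of a more uniform partition than plain random projection is not reproduced.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setting (hypothetical sizes): each weak learner is represented by its
# response vector on N training examples, so picking the best learner becomes
# a nearest-neighbor search among T points in this "function space".
N, T, K = 64, 1000, 12            # examples, weak learners, hash bits

learners = np.sign(rng.standard_normal((T, N)))  # T learner response vectors
query = np.sign(rng.standard_normal(N))          # weighted-label query point

# Plain random-projection LSH (the baseline the paper improves on):
# K random hyperplanes map each vector to a K-bit bucket key.
planes = rng.standard_normal((K, N))

def bucket(v):
    """Hash a vector to a K-bit bucket key via random hyperplanes."""
    bits = planes @ v > 0
    return bits.tobytes()

# Index all weak learners into buckets once, before boosting.
table = {}
for i, h in enumerate(learners):
    table.setdefault(bucket(h), []).append(i)

# Per boosting round: scan only the query's bucket instead of all T learners.
cand = table.get(bucket(query), [])
if not cand:                      # fall back to a full scan on an empty bucket
    cand = list(range(T))
best = max(cand, key=lambda i: learners[i] @ query)
print(f"scanned {len(cand)} of {T} learners; best candidate index = {best}")
```

The speedup comes from the candidate scan being bounded by the bucket population rather than T; the trade-off is that the bucket's best learner is only approximately the global best, which is why the paper's uniform partitioning of buckets matters for recall and precision.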