Unsupervised data pruning for clustering of noisy data. Knowledge-Based Systems.
Avoiding Boosting Overfitting by Removing Confusing Samples. ECML '07: Proceedings of the 18th European Conference on Machine Learning.
Active Learning Using a Constructive Neural Network Algorithm. ICANN '08: Proceedings of the 18th International Conference on Artificial Neural Networks, Part II.
Improving object detection by removing noisy samples from training sets. MIR '08: Proceedings of the 1st ACM International Conference on Multimedia Information Retrieval.
Computers in Biology and Medicine.
Learning assignment order of instances for the constrained K-means clustering algorithm. IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics.
A team of continuous-action learning automata for noise-tolerant learning of half-spaces. IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics.
Edited AdaBoost by weighted kNN. Neurocomputing.
RANSAC-based training data selection for emotion recognition from spontaneous speech. Proceedings of the 3rd International Workshop on Affective Interaction in Natural Environments.
Face verification using indirect neighbourhood components analysis. ISVC '10: Proceedings of the 6th International Conference on Advances in Visual Computing, Part II.
Learning Multi-modal Similarity. The Journal of Machine Learning Research.
Unsupervised video surveillance. ACCV '10: Proceedings of the 2010 International Conference on Computer Vision, Part I.
COST '10: Proceedings of the 2010 International Conference on Analysis of Verbal and Nonverbal Communication and Enactment.
International Journal of Multimedia Data Engineering & Management.
The C-loss function for pattern classification. Pattern Recognition.
Training datasets for learning of object categories are often contaminated or imperfect. We explore an approach that automatically identifies examples that are noisy or troublesome for learning and excludes them from the training set. The problem is relevant to learning in semi-supervised or unsupervised settings, as well as to learning when the training data is contaminated with wrongly labeled examples or contains correctly labeled but hard-to-learn examples. We propose a fully automatic mechanism for noise cleaning, called "data pruning", and demonstrate its success on learning of human faces. It is not assumed that the data or the noise can be modeled or that additional training examples are available. Our experiments show that data pruning can improve generalization performance for algorithms with varying robustness to noise. It outperforms methods with regularization properties and is superior to commonly applied aggregation methods such as bagging.
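To make the idea concrete, here is a minimal sketch of pruning by consensus, not the paper's actual mechanism: train an ensemble of weak learners (decision stumps on 1-D data, an assumption chosen for brevity) on bootstrap samples, then discard any training point that is misclassified in most of its out-of-bag evaluations. The function names (`stump_fit`, `prune`) and the out-of-bag error threshold are illustrative choices, not from the source.

```python
import random

def stump_fit(xs, ys):
    # Pick the threshold/polarity pair minimizing training error on 1-D data.
    best = (None, 1, 1.0)  # (threshold, polarity, error)
    for t in sorted(set(xs)):
        for pol in (1, -1):
            preds = [pol if x >= t else -pol for x in xs]
            err = sum(p != y for p, y in zip(preds, ys)) / len(ys)
            if err < best[2]:
                best = (t, pol, err)
    return best[0], best[1]

def stump_predict(model, x):
    t, pol = model
    return pol if x >= t else -pol

def prune(xs, ys, n_models=50, max_oob_err=0.5, seed=0):
    """Return indices of training points to keep.

    A point is pruned when the ensemble, trained on bootstrap samples
    that exclude it, misclassifies it more than max_oob_err of the time
    (a hypothetical pruning criterion for illustration).
    """
    rng = random.Random(seed)
    n = len(xs)
    wrong, seen = [0] * n, [0] * n
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]
        model = stump_fit([xs[i] for i in idx], [ys[i] for i in idx])
        for i in set(range(n)) - set(idx):  # out-of-bag points
            seen[i] += 1
            wrong[i] += stump_predict(model, xs[i]) != ys[i]
    return [i for i in range(n)
            if seen[i] == 0 or wrong[i] / seen[i] <= max_oob_err]
```

For example, given negatives clustered near 0, positives near 1, and one point mislabeled as positive at x = 0.3, `prune` drops the mislabeled point while keeping the clean ones; the learner is then retrained on the kept indices only.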