Full border identification for reduction of training sets

Authors:
Guichong Li;Nathalie Japkowicz;Trevor J. Stocki;R. Kurt Ungar
Affiliations:
Computer Science of University of Ottawa;Computer Science of University of Ottawa;Radiation Protection Bureau, Health Canada, Ottawa, ON, Canada;Radiation Protection Bureau, Health Canada, Ottawa, ON, Canada
Venue:
Canadian AI'08: Proceedings of the Canadian Society for Computational Studies of Intelligence, 21st Conference on Advances in Artificial Intelligence
Year:
2008

Abstract

Border identification (BI) was previously proposed to help learning systems focus on the most relevant portion of the training set and thereby improve learning accuracy. This paper argues that the traditional BI implementation suffers from a serious limitation: it can identify only partial borders. To address this limitation, the paper proposes a new BI method, Progressive Border Sampling (PBS), which borrows ideas from recent research on Progressive Sampling. PBS progressively learns optimal borders from the entire training set in two steps: first, it identifies a full border, thus avoiding the limitation of the traditional BI method; second, it increments the size of that border until it converges to an optimal sample, which is smaller than the original training set. Because PBS identifies the full border, it is expected to discover samples closer to optimal than traditional BI does. Experimental results on 30 selected benchmark datasets from the UCI repository show that, in the context of classification, PBS is indeed more successful than traditional BI at reducing the size of the training sets and optimizing accuracy.
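The abstract does not spell out how borders are located or how convergence is measured, so the following is only a rough sketch of the general idea, not the authors' PBS algorithm. It assumes that a border point is the nearest opposite-class neighbour of some training point, and that convergence can be judged by the accuracy of a 1-nearest-neighbour classifier on the full training set. The function names (`border_pass`, `progressive_border_sample`) and the tolerance `tol` are hypothetical and chosen for illustration.

```python
import math


def nearest_opposite(i, points, labels, candidates):
    """Return the candidate nearest to point i with a different label, or None."""
    best, best_dist = None, math.inf
    for j in sorted(candidates):
        if labels[j] != labels[i]:
            d = math.dist(points[i], points[j])
            if d < best_dist:
                best, best_dist = j, d
    return best


def border_pass(points, labels, remaining):
    """Assumed border rule: each remaining point nominates its nearest
    opposite-class neighbour among the remaining points."""
    border = set()
    for i in remaining:
        j = nearest_opposite(i, points, labels, remaining)
        if j is not None:
            border.add(j)
    return border


def one_nn_accuracy(points, labels, sample):
    """Accuracy on the full set of a 1-NN classifier that stores only `sample`."""
    correct = 0
    for i in range(len(points)):
        j = min(sample, key=lambda s: math.dist(points[i], points[s]))
        correct += labels[j] == labels[i]
    return correct / len(points)


def progressive_border_sample(points, labels, tol=1e-9):
    """Grow the border sample pass by pass; stop once accuracy no longer improves."""
    remaining = set(range(len(points)))
    sample, best_acc = set(), -1.0
    while remaining:
        new = border_pass(points, labels, remaining)
        if not new:  # only one class left, so no further border exists
            break
        candidate = sample | new
        acc = one_nn_accuracy(points, labels, sorted(candidate))
        if acc <= best_acc + tol:  # assumed convergence test
            break
        sample, best_acc = candidate, acc
        remaining -= new
    return sorted(sample), best_acc


# Toy data: two well-separated classes on a line.
points = [(0, 0), (1, 0), (2, 0), (3, 0), (7, 0), (8, 0), (9, 0), (10, 0)]
labels = [0, 0, 0, 0, 1, 1, 1, 1]
sample, acc = progressive_border_sample(points, labels)
print(sample, acc)
```

On this toy data the first pass already selects the two points facing each other across the class gap, and a second pass adds nothing to accuracy, so the loop stops with a sample much smaller than the training set. A real implementation would likely use a held-out set or a different learner to judge convergence; the choice made here is purely for compactness.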