In many real-world problems, such as defense and security applications, data mining algorithms have access to massive amounts of data, and mining all of it is prohibitive due to time and memory constraints. Finding the smallest training set that achieves the same accuracy as the entire available dataset therefore remains an important research question. Progressive sampling randomly selects a small initial sample and grows it according to a geometric or arithmetic series until the error converges, with the sampling schedule fixed a priori. In this paper, we explore sampling schedules that adapt to the dataset under consideration. We develop a general approach, based on the Chernoff inequality, for determining how many instances are required at each iteration for convergence. We evaluate our approach with neural networks on two real-world problems where data is abundant: face recognition and fingerprint recognition. Our empirical results show that our dynamic approach is faster and uses far fewer examples than existing methods. However, the Chernoff bound requires the samples at each iteration to be independent of one another; future work will aim to remove this limitation, which should further improve performance.
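To make the idea concrete, the sketch below shows a geometric progressive-sampling loop whose initial sample size is derived from a Chernoff/Hoeffding-style bound. It is a minimal illustration, not the authors' actual algorithm: the bound used (n >= ln(2/delta) / (2*epsilon^2)), the convergence test (error improvement below epsilon), and the `train_and_eval` callback are all simplifying assumptions introduced here.

```python
import math
import random

def chernoff_sample_size(epsilon, delta):
    """Sample size n such that an empirical mean of n i.i.d. draws lies
    within epsilon of the true mean with probability >= 1 - delta
    (Hoeffding/Chernoff form: n >= ln(2/delta) / (2 * epsilon**2))."""
    return math.ceil(math.log(2.0 / delta) / (2.0 * epsilon ** 2))

def progressive_sample(data, train_and_eval, epsilon=0.05, delta=0.05, growth=2.0):
    """Geometric progressive sampling: grow the sample until the error
    estimate stops improving by more than epsilon.

    `train_and_eval(sample)` is a hypothetical callback that trains a
    model on `sample` and returns its estimated error."""
    n = chernoff_sample_size(epsilon, delta)   # initial size from the bound
    prev_err = float("inf")
    while n <= len(data):
        sample = random.sample(data, n)        # fresh sample each round
        err = train_and_eval(sample)
        if prev_err - err <= epsilon:          # error has converged
            return sample, err
        prev_err = err
        n = int(n * growth)                    # geometric schedule
    # Fall back to the full dataset if convergence was never reached.
    return data, train_and_eval(data)
```

Note that drawing a fresh random sample each round mirrors the independence assumption the abstract mentions: reusing earlier samples would invalidate the Chernoff bound.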