Overlap pattern synthesis with an efficient nearest neighbor classifier

Authors:
P. Viswanath; Narasimha Murty; Shalabh Bhatnagar
Affiliations:
Department of Computer Science and Automation, Indian Institute of Science, Bangalore 560 012, India (all three authors)
Venue:
Pattern Recognition
Year:
2005

Abstract

The nearest neighbor (NN) classifier is the most popular non-parametric classifier. It is simple, has no design phase, and performs well. Important factors affecting the efficiency and performance of the NN classifier are (i) the memory required to store the training set, (ii) the classification time required to search for the nearest neighbor of a given test pattern, and (iii) the curse of dimensionality, because of which the number of training patterns needed to achieve a given classification accuracy becomes prohibitively large when the data are high dimensional. In this paper, we propose novel techniques that improve the performance of the NN classifier while also reducing its computational burden. These techniques are based on: (i) overlap-based pattern synthesis, which can generate more artificial patterns than there are input patterns and can thus reduce the effect of the curse of dimensionality; (ii) a compact representation of the training set, called the overlap pattern graph (OLP-graph), which can be built incrementally in a single scan of the training set; and (iii) an efficient NN classifier, called OLP-NNC, which works directly with the OLP-graph and performs overlap-based pattern synthesis implicitly. Experimental results comparing some of the relevant classifiers are given. The proposed schemes are suitable for applications that deal with large, high-dimensional datasets, such as those in data mining.
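The abstract names overlap-based pattern synthesis and an NN classifier that performs the synthesis implicitly, but does not spell out the construction. The following is a minimal Python sketch under a simplifying assumption of our own, not the paper's OLP-graph: the features are split into three consecutive blocks A, B, C, and two same-class patterns that agree on the middle block B are recombined by joining the head (A part) of one with the tail (C part) of the other. The names `cuts`, `group_by_overlap`, `synthesize`, `nn_explicit`, and `nn_implicit` are hypothetical. The second classifier illustrates why implicit synthesis can be cheap: squared Euclidean distance is additive over disjoint feature blocks, so the nearest synthetic pattern in a group is found by minimizing over heads and over tails separately, rather than over every head-tail pair.

```python
from collections import defaultdict
from itertools import product
from typing import Dict, List, Optional, Set, Tuple

Pattern = Tuple[int, ...]
# Hypothetical: (i, j) splits a pattern into blocks A = p[:i], B = p[i:j], C = p[j:].
Cuts = Tuple[int, int]
Groups = Dict[Pattern, Tuple[Set[Pattern], Set[Pattern]]]


def split(p: Pattern, cuts: Cuts) -> Tuple[Pattern, Pattern, Pattern]:
    i, j = cuts
    return p[:i], p[i:j], p[j:]


def group_by_overlap(patterns: List[Pattern], cuts: Cuts) -> Groups:
    """Map each middle-block value B to the heads (A parts) and tails (C parts) seen with it."""
    groups: Groups = defaultdict(lambda: (set(), set()))
    for p in patterns:
        a, b, c = split(p, cuts)
        heads, tails = groups[b]
        heads.add(a)
        tails.add(c)
    return groups


def synthesize(patterns: List[Pattern], cuts: Cuts) -> Set[Pattern]:
    """Explicit synthesis (simplified assumption): join every head with every tail sharing B."""
    out: Set[Pattern] = set()
    for b, (heads, tails) in group_by_overlap(patterns, cuts).items():
        for a, c in product(heads, tails):
            out.add(a + b + c)
    return out


def sq_dist(u: Pattern, v: Pattern) -> int:
    """Squared Euclidean distance between two equal-length patterns."""
    return sum((x - y) ** 2 for x, y in zip(u, v))


def nn_explicit(train: Dict[str, List[Pattern]], q: Pattern,
                cuts: Cuts) -> Optional[Tuple[int, str]]:
    """1-NN over the explicitly enumerated synthetic set of each class."""
    best: Optional[Tuple[int, str]] = None
    for label, pats in train.items():
        for s in synthesize(pats, cuts):
            d = sq_dist(q, s)
            if best is None or d < best[0]:
                best = (d, label)
    return best


def nn_implicit(train: Dict[str, List[Pattern]], q: Pattern,
                cuts: Cuts) -> Optional[Tuple[int, str]]:
    """Same nearest distance without enumerating synthetic patterns: because the
    distance is a sum over blocks, min over (a, c) pairs = min over a + min over c."""
    qa, qb, qc = split(q, cuts)
    best: Optional[Tuple[int, str]] = None
    for label, pats in train.items():
        for b, (heads, tails) in group_by_overlap(pats, cuts).items():
            d = (min(sq_dist(qa, a) for a in heads)
                 + sq_dist(qb, b)
                 + min(sq_dist(qc, c) for c in tails))
            if best is None or d < best[0]:
                best = (d, label)
    return best


if __name__ == "__main__":
    train = {
        "x": [(1, 1, 0, 0, 1, 1), (0, 0, 0, 0, 0, 0)],
        "y": [(1, 0, 1, 1, 0, 1), (0, 1, 1, 1, 1, 0)],
    }
    cuts = (2, 4)
    print(len(synthesize(train["x"], cuts)))  # 4 patterns from 2 inputs
    q = (1, 1, 0, 0, 0, 0)  # not a training pattern, but a synthetic "x" pattern
    print(nn_explicit(train, q, cuts), nn_implicit(train, q, cuts))  # (0, 'x') (0, 'x')
```

For the query (1, 1, 0, 0, 0, 0), which is not in the training set, both classifiers find an exact synthetic match in class `x`, whereas the nearest original training pattern is at squared distance 2. For a group with h heads and t tails, the explicit route scores h·t synthetic patterns while the implicit route computes only about h + t block distances, which is the kind of saving that working on a compact structure instead of the expanded set can offer.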