On Visualization and Aggregation of Nearest Neighbor Classifiers

Authors:
Anil K. Ghosh; Probal Chaudhuri; C. A. Murthy
Venue:
IEEE Transactions on Pattern Analysis and Machine Intelligence
Year:
2005

Abstract

Nearest neighbor classification is one of the simplest and most popular methods for statistical pattern recognition. A major issue in k-nearest neighbor classification is how to find an optimal value of the neighborhood parameter k. In practice, this value is generally estimated by cross-validation. However, the ideal value of k in a classification problem depends not only on the entire data set but also on the specific observation to be classified. Instead of using a single value of k, this paper studies the results of a finite sequence of classifiers indexed by k. Along with the usual posterior probability estimates, a new measure, called the Bayesian measure of strength, is proposed and investigated as a measure of the evidence for different classes. The results of these classifiers and their corresponding estimated misclassification probabilities are displayed visually using shaded strips. These plots provide an effective visualization of the evidence in favor of different classes when a given data point is to be classified. We also propose a simple weighted averaging technique that aggregates the results of different nearest neighbor classifiers to arrive at the final decision. Based on the analysis of several benchmark data sets, the proposed method is found to perform better than a classifier that uses a single value of k.
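The abstract contrasts two approaches: picking a single k by cross-validation, and aggregating the posterior estimates of a whole sequence of k-NN classifiers by weighted averaging. The following is a minimal Python sketch of both, under stated assumptions: Euclidean distance, leave-one-out cross-validation, and weights w_k = 1 - (leave-one-out error for k). These weights are a hypothetical choice made only for illustration. The paper's own weighting scheme and its Bayesian measure of strength are not reproduced here, and the function names (`knn_posteriors`, `loo_error_rates`, `aggregate_predict`) are likewise hypothetical.

```python
import math
from collections import Counter


def knn_posteriors(train_X, train_y, x, k_values, exclude_index=None):
    """Usual k-NN posterior estimates: for each k, the fraction of the
    k nearest training points (Euclidean distance) carrying each label."""
    # Pair each training point's distance to x with its label; optionally skip
    # one index so the same routine serves leave-one-out evaluation.
    neighbours = sorted(
        (math.dist(p, x), label)
        for i, (p, label) in enumerate(zip(train_X, train_y))
        if i != exclude_index
    )
    labels = sorted(set(train_y))
    posteriors = {}
    for k in k_values:
        counts = Counter(label for _, label in neighbours[:k])
        posteriors[k] = {c: counts[c] / k for c in labels}
    return posteriors


def loo_error_rates(train_X, train_y, k_values):
    """Leave-one-out cross-validation error of the k-NN majority rule, per k."""
    errors = {k: 0 for k in k_values}
    for i, (x, y) in enumerate(zip(train_X, train_y)):
        post = knn_posteriors(train_X, train_y, x, k_values, exclude_index=i)
        for k in k_values:
            # Majority vote; ties go to the first label in sorted order.
            if max(post[k], key=post[k].get) != y:
                errors[k] += 1
    return {k: errors[k] / len(train_y) for k in k_values}


def aggregate_predict(train_X, train_y, x, k_values):
    """Weighted average of the posteriors over k, then pick the top class.
    HYPOTHETICAL weights: w_k = 1 - (leave-one-out error for k)."""
    err = loo_error_rates(train_X, train_y, k_values)
    weights = {k: 1.0 - err[k] for k in k_values}
    total = sum(weights.values())
    if total == 0:  # every k misclassifies everything: fall back to equal weights
        weights = {k: 1.0 for k in k_values}
        total = float(len(k_values))
    post = knn_posteriors(train_X, train_y, x, k_values)
    combined = {
        c: sum(weights[k] * post[k][c] for k in k_values) / total
        for c in sorted(set(train_y))
    }
    return max(combined, key=combined.get), combined


# Toy data: two well-separated clusters in the plane.
train_X = [(0, 0), (0, 1), (1, 0), (1, 1), (5, 5), (5, 6), (6, 5), (6, 6)]
train_y = ["a", "a", "a", "a", "b", "b", "b", "b"]
k_values = [1, 3, 5]

errors = loo_error_rates(train_X, train_y, k_values)
best_k = min(errors, key=errors.get)  # conventional single-k choice by cross-validation
label, combined = aggregate_predict(train_X, train_y, (0.5, 0.5), k_values)
print(best_k, label, combined)  # label == "a"
```

Each per-k posterior vector sums to 1 and the weights are normalized, so the combined scores also sum to 1 and can be read as aggregated class-probability estimates. Putting all the weight on the cross-validated best k recovers the conventional single-k rule as a special case.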