Weighted Instance Typicality Search (WITS): A nearest neighbor data reduction algorithm

Authors:
Brent D. Morring; Tony R. Martinez
Affiliations:
Computer Science Department, Brigham Young University, Provo, UT 84602, USA. E-mail: morringb@axon.cs.byu.edu; E-mail: martinez@cs.byu.edu
Venue:
Intelligent Data Analysis
Year:
2004

Abstract

Two disadvantages of the standard nearest neighbor algorithm are that (1) it must store all the instances of the training set, creating a large memory footprint, and (2) it must search all the instances of the training set to predict the classification of a new query point, making it slow at run time. Much work has been done to remedy these shortcomings. Data reduction algorithms address both issues by storing and using only a portion of the available instances. This paper presents a new algorithm, WITS (Weighted-Instance Typicality Search), and a modified version, Clustered-WITS (C-WITS), designed to address these issues. WITS is an incremental data reduction algorithm with O(n^2) complexity, where n is the training set size. WITS uses the concept of Typicality in conjunction with Instance-Weighting to produce minimal nearest neighbor solutions. WITS and C-WITS are compared to three other state-of-the-art data reduction algorithms on ten real-world datasets. WITS achieved the highest average accuracy, showed fewer catastrophic failures, and stored an average of 71% fewer instances than DROP-5, the next most competitive algorithm in terms of accuracy and catastrophic failures. The C-WITS algorithm provides a user-defined parameter that gives the user control over the balance between training time and accuracy. This modification makes C-WITS more suitable for large problems, the very problems data reduction algorithms are designed for. On two large problems (10,992 and 20,000 instances), C-WITS stores only a small fraction of the instances (0.88% and 1.95% of the training data) while maintaining generalization accuracies comparable to the best accuracies reported for these problems.
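The abstract names the two ingredients of WITS, Typicality and Instance-Weighting, without defining them. The following is a minimal NumPy sketch that makes those ideas concrete under loudly stated assumptions. It is not the published WITS or C-WITS procedure. The assumptions are as follows. Typicality is taken to be mean similarity to same-class instances divided by mean similarity to other-class instances, with similarity = 1 - d / d_max. A stored prototype's weight divides its distance to a query. The subset is seeded with the most typical instance of each class and then repeatedly given the most typical misclassified instance, with a weight chosen from a hypothetical `candidate_weights` grid. The helper names `typicality`, `predict`, and `reduce_instances` are illustrative. Only the pairwise-distance step is O(n^2). The search loop adds further cost, so this sketch does not reproduce the paper's complexity bound.

```python
import numpy as np


def pairwise_dist(A, B):
    """Euclidean distances between every row of A and every row of B."""
    return np.sqrt(((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1))


def typicality(X, y):
    """Assumed typicality: mean same-class similarity / mean other-class
    similarity, with similarity = 1 - d / d_max. All pairwise distances are
    computed, so this step is O(n^2) in the number of instances n."""
    D = pairwise_dist(X, X)
    sim = 1.0 - D / max(D.max(), 1e-12)
    same = y[:, None] == y[None, :]
    np.fill_diagonal(same, False)  # ignore each instance's similarity to itself
    other = y[:, None] != y[None, :]
    intra = (sim * same).sum(axis=1) / np.maximum(same.sum(axis=1), 1)
    extra = (sim * other).sum(axis=1) / np.maximum(other.sum(axis=1), 1)
    return intra / np.maximum(extra, 1e-12)


def predict(P, py, pw, X):
    """Hypothetical weighted 1-NN: each prototype's distance is divided by its
    weight, so a heavier prototype claims a larger region of input space."""
    return py[(pairwise_dist(X, P) / pw[None, :]).argmin(axis=1)]


def reduce_instances(X, y, candidate_weights=(0.5, 1.0, 2.0, 4.0)):
    """Hypothetical typicality-guided incremental reduction (not the published WITS).
    Returns stored indices, their weights, and training accuracy of the subset."""
    typ = typicality(X, y)
    # Seed with the most typical instance of each class, each with weight 1.
    S = [int(np.flatnonzero(y == c)[typ[y == c].argmax()]) for c in np.unique(y)]
    W = [1.0] * len(S)
    acc = np.mean(predict(X[S], y[S], np.array(W), X) == y)
    while True:
        wrong = np.flatnonzero(predict(X[S], y[S], np.array(W), X) != y)
        if wrong.size == 0:
            break  # every training instance is already classified correctly
        cand = int(wrong[typ[wrong].argmax()])  # most typical misclassified instance
        # Try each candidate weight; keep the one giving the best training accuracy.
        trials = [
            (np.mean(predict(X[S + [cand]], y[S + [cand]], np.array(W + [w]), X) == y), w)
            for w in candidate_weights
        ]
        best_acc, best_w = max(trials)
        if best_acc <= acc:
            break  # stop once adding an instance no longer improves accuracy
        S.append(cand)
        W.append(best_w)
        acc = best_acc
    return np.array(S), np.array(W), acc


if __name__ == "__main__":
    X = np.array([[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]], dtype=float)
    y = np.array([0, 0, 0, 1, 1, 1])
    S, W, acc = reduce_instances(X, y)
    print(len(S), acc)  # one prototype per class suffices for these two clusters
```

Because accuracy must strictly increase for an instance to be added, the loop adds at most n instances and always terminates. The stopping rule and the weight grid are design choices of this sketch. In the paper, by contrast, C-WITS exposes its own user-defined parameter for trading training time against accuracy.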