Noisy data elimination using mutual k-nearest neighbor for classification mining

Authors:
Huawen Liu; Shichao Zhang
Affiliations:
Department of Computer Science, Zhejiang Normal University, China and Key Laboratory of Symbol Computation and Knowledge Engineering of Ministry of Education, China; College of Computer Science and Information Technology, Guangxi Normal University, China and Faculty of Engineering and Information Technology, University of Technology, Sydney, Australia
Venue:
Journal of Systems and Software
Year:
2012

Abstract

The k-nearest neighbor (kNN) algorithm is an effective and powerful lazy learning algorithm, despite being easy to implement. However, its performance relies heavily on the quality of the training data. In many complex real-world applications, noise from various sources is prevalent in large-scale databases, and eliminating such anomalies to improve data quality remains a challenge. To alleviate this problem, in this paper we propose a new anomaly removal and learning algorithm within the kNN framework. The primary characteristic of our method is that it uses mutual nearest neighbors, rather than k nearest neighbors, as the evidence both for removing anomalies and for predicting the class labels of unseen instances. Two instances are mutual nearest neighbors when each is among the other's k nearest neighbors. The advantage is that pseudo nearest neighbors (neighbors of an instance that do not, in turn, count that instance among their own k nearest neighbors) can be identified and excluded from the prediction process, so the final learning result is more credible. An extensive comparative experimental analysis on UCI datasets provides empirical evidence of the effectiveness of the proposed method in enhancing the performance of the kNN rule.
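The abstract does not spell out the exact procedure, so the following Python code is only a minimal sketch of the general mutual-nearest-neighbor idea under stated assumptions. It assumes Euclidean distance and breaks distance ties by index. It treats a training instance as noise if it has no mutual neighbors or if its own label is not among the most frequent labels of its mutual neighbors, and it falls back to a plain kNN vote when a query has no mutual neighbors. The function names `knn_indices`, `mutual_knn`, `remove_noise`, and `predict` are hypothetical, and these filtering and voting rules are illustrative assumptions, not the authors' exact algorithm.

```python
import math
from collections import Counter
from typing import List, Sequence, Set, Tuple

Point = Tuple[float, ...]


def knn_indices(points: Sequence[Point], q: Point, k: int, exclude: int = -1) -> List[int]:
    """Indices of the k training points closest to q (Euclidean distance).

    Ties in distance are broken by index so results are deterministic;
    `exclude` lets a training point skip itself.
    """
    candidates = (i for i in range(len(points)) if i != exclude)
    return sorted(candidates, key=lambda i: (math.dist(points[i], q), i))[:k]


def mutual_knn(points: Sequence[Point], k: int) -> List[Set[int]]:
    """For each training point i, the set of its mutual k-nearest neighbors:
    j qualifies only if j is in kNN(i) AND i is in kNN(j)."""
    knn = [set(knn_indices(points, p, k, exclude=i)) for i, p in enumerate(points)]
    return [{j for j in knn[i] if i in knn[j]} for i in range(len(points))]


def remove_noise(points: Sequence[Point], labels: Sequence[str], k: int):
    """Assumed filtering rule (hypothetical): drop an instance that has no
    mutual neighbors, or whose own label is not among the most frequent labels
    of its mutual neighbors. Returns kept points and labels in original order."""
    keep = []
    for i, nbrs in enumerate(mutual_knn(points, k)):
        if not nbrs:
            continue  # isolated: no mutual neighbor supports this instance
        votes = Counter(labels[j] for j in nbrs)
        if votes[labels[i]] == max(votes.values()):
            keep.append(i)
    return [points[i] for i in keep], [labels[i] for i in keep]


def predict(points: Sequence[Point], labels: Sequence[str], q: Point, k: int) -> str:
    """Majority vote over the mutual nearest neighbors of an unseen query q.

    A candidate j (one of q's k nearest training points) counts as mutual if
    fewer than k other training points are strictly closer to j than q is,
    i.e. q would rank among j's k nearest neighbors (ties favor q). The other
    candidates are pseudo neighbors and are ignored. If no candidate is
    mutual, fall back to the plain kNN vote (an assumption).
    """
    candidates = knn_indices(points, q, k)
    mutual = []
    for j in candidates:
        d_q = math.dist(points[j], q)
        closer = sum(1 for i, p in enumerate(points)
                     if i != j and math.dist(points[j], p) < d_q)
        if closer < k:
            mutual.append(j)
    voters = mutual or candidates
    return Counter(labels[j] for j in voters).most_common(1)[0][0]


if __name__ == "__main__":
    # Two clusters; (0.5, 0.5) is labeled "b" but sits inside cluster "a".
    pts = [(0, 0), (0, 1), (1, 0), (1, 1), (5, 5), (5, 6), (6, 5), (6, 6), (0.5, 0.5)]
    lbl = ["a"] * 4 + ["b"] * 5
    clean_pts, clean_lbl = remove_noise(pts, lbl, k=3)
    print(len(clean_pts))  # 8: the mislabeled point is removed
    print(predict(clean_pts, clean_lbl, (0.2, 0.3), k=3))  # a
```

In this toy data the mislabeled point (0.5, 0.5) has mutual neighbors that all carry label "a", so the assumed filter drops it. The data also shows a pseudo neighbor: (0.5, 0.5) is among the 3 nearest neighbors of (1, 1), but (1, 1) is not among the 3 nearest neighbors of (0.5, 0.5), so the pair is not mutual. The brute-force distance scans keep the sketch short; a spatial index would be the natural choice for large datasets.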