Noisy data elimination using mutual k-nearest neighbor for classification mining

  • Authors:
  • Huawen Liu; Shichao Zhang

  • Affiliations:
  • Department of Computer Science, Zhejiang Normal University, China and Key Laboratory of Symbol Computation and Knowledge Engineering of Ministry of Education, China; College of Computer Science and Information Technology, Guangxi Normal University, China and Faculty of Engineering and Information Technology, University of Technology, Sydney, Australia

  • Venue:
  • Journal of Systems and Software
  • Year:
  • 2012

Abstract

k nearest neighbor (kNN) is an effective and powerful lazy learning algorithm, and it is also easy to implement. However, its performance relies heavily on the quality of the training data. In many complex real-world applications, noise from various sources is often prevalent in large-scale databases. How to eliminate anomalies and improve data quality remains a challenge. To alleviate this problem, in this paper we propose a new anomaly removal and learning algorithm within the kNN framework. The primary characteristic of our method is that the evidence for removing anomalies and for predicting the class labels of unseen instances comes from mutual nearest neighbors rather than k nearest neighbors. The advantage is that pseudo nearest neighbors can be identified and excluded from the prediction process. Consequently, the final learning result is more credible. An extensive comparative experimental analysis carried out on UCI datasets provides empirical evidence of the effectiveness of the proposed method in enhancing the performance of the kNN rule.
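
The mutual-neighbor idea described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation; the abstract does not give the algorithmic details, so the following choices are assumptions made for illustration only: Euclidean distance, removal of any training instance that has no mutual neighbor, majority voting among mutual neighbors, and returning `None` when a query has no mutual neighbor. The function names (`mutual_mask`, `remove_noise`, `predict`) are hypothetical.

```python
import numpy as np
from collections import Counter


def pairwise_dist(X):
    """Euclidean distances between all rows of X; self-distances set to inf."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)
    return d


def mutual_mask(X, k):
    """Boolean matrix M where M[i, j] is True iff i and j are in each
    other's k-nearest-neighbor lists (i.e. they are mutual neighbors)."""
    n = len(X)
    nn = np.argsort(pairwise_dist(X), axis=1)[:, :k]  # k nearest per row
    is_nn = np.zeros((n, n), dtype=bool)
    is_nn[np.arange(n)[:, None], nn] = True           # is_nn[i, j]: j in kNN(i)
    return is_nn & is_nn.T


def remove_noise(X, y, k):
    """Assumption: an instance with no mutual neighbor is treated as noise."""
    keep = mutual_mask(X, k).any(axis=1)
    return X[keep], y[keep]


def predict(X, y, x, k):
    """Vote among the mutual neighbors of query x (assumed voting rule).

    A training point t counts as mutual if it is among x's k nearest
    training points and x lies within t's own k-th neighbor distance.
    """
    d_x = np.linalg.norm(X - x, axis=1)
    candidates = np.argsort(d_x)[:k]
    # Distance from each training point to its k-th nearest training point.
    radius = np.sort(pairwise_dist(X), axis=1)[:, k - 1]
    mutual = [i for i in candidates if d_x[i] <= radius[i]]
    if not mutual:
        return None  # assumption: abstain when no mutual neighbor exists
    return Counter(y[mutual].tolist()).most_common(1)[0][0]
```

Under these assumptions, an isolated point far from every cluster appears in no other point's neighbor list, so `remove_noise` drops it, while a query placed inside a cluster is labeled by the cluster members that would also count it among their own k nearest neighbors. The sketch requires `k` to be smaller than the number of training instances.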