Identifying Noise in an Attribute of Interest

  • Authors:
  • Taghi M. Khoshgoftaar; Jason Van Hulse

  • Affiliations:
  • Florida Atlantic University; Florida Atlantic University

  • Venue:
  • ICMLA '05 Proceedings of the Fourth International Conference on Machine Learning and Applications
  • Year:
  • 2005

Abstract

One of the most significant issues facing the data mining community is low-quality data. Real-world datasets are often inundated with various types of data integrity issues, particularly noisy data. In response to the difficulties created by low-quality data, we propose a novel technique to detect noisy instances relative to an Attribute of Interest (AOI). The user can designate any attribute in the dataset as the AOI, and the technique outputs a noise ranking of instances relative to the chosen attribute. The procedure can be repeated for any number of user-specified attributes of interest. The case study described in this work demonstrates how our technique can be used to detect class noise, which occurs when errors are present in the class (dependent) variable. In this scenario, the class is designated as the AOI, and the technique produces an instance noise ranking relative to the class. We compare our technique to the well-known ensemble and classification filters, which were previously proposed for class noise detection. The results of this study demonstrate the effectiveness of our approach and show that our procedure is a useful tool for improving data quality.
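The abstract does not describe the internals of the AOI procedure, so the sketch below does not attempt to reproduce it. It instead illustrates the kind of baseline the abstract compares against: a cross-validated ensemble filter with the class treated as the attribute of interest. Each instance is held out, several classifiers trained on the remaining folds predict its class, and instances are ranked by how many classifiers disagree with the recorded label. A single classifier gives a classification filter; majority and consensus voting over several give the usual ensemble filter variants. The classifiers (1-NN, 3-NN, nearest centroid), the modulo-based 5-fold split, the toy data, and every function name here are illustrative assumptions, not details taken from the paper.

```python
from collections import Counter
from typing import Callable, Dict, List, Sequence, Tuple

Row = Tuple[float, ...]
# Assumed interface: a classifier takes training rows, their labels, and one query row.
Classifier = Callable[[List[Row], List[str], Row], str]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two numeric rows."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def knn(k: int) -> Classifier:
    """Build a k-nearest-neighbour classifier (majority vote among the k closest rows)."""
    def predict(train_x: List[Row], train_y: List[str], x: Row) -> str:
        nearest = sorted(zip((sq_dist(r, x) for r in train_x), train_y))[:k]
        return Counter(label for _, label in nearest).most_common(1)[0][0]
    return predict


def nearest_centroid(train_x: List[Row], train_y: List[str], x: Row) -> str:
    """Assign the label whose per-class mean row is closest to x."""
    centroids: Dict[str, List[float]] = {}
    for label in set(train_y):
        rows = [r for r, y in zip(train_x, train_y) if y == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return min(centroids, key=lambda lab: sq_dist(centroids[lab], x))


def ensemble_noise_ranking(
    xs: List[Row], ys: List[str], classifiers: List[Classifier], n_folds: int = 5
) -> Tuple[List[int], List[int]]:
    """Rank instances by how many held-out classifiers disagree with their class label."""
    n = len(xs)
    errors = [0] * n
    for fold in range(n_folds):
        # Deterministic split (an assumption): instance i belongs to fold i % n_folds.
        train_idx = [i for i in range(n) if i % n_folds != fold]
        train_x = [xs[i] for i in train_idx]
        train_y = [ys[i] for i in train_idx]
        for i in range(fold, n, n_folds):
            errors[i] += sum(clf(train_x, train_y, xs[i]) != ys[i] for clf in classifiers)
    # Most suspicious instances first; ties broken by index so the output is deterministic.
    ranking = sorted(range(n), key=lambda i: (-errors[i], i))
    return ranking, errors


# Hypothetical toy data: two separated clusters; instance 10 lies in the "a" cluster
# but carries the label "b", i.e. it has injected class noise.
xs: List[Row] = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5),
                 (10, 10), (10, 11), (11, 10), (11, 11), (10.5, 10.5),
                 (0.2, 0.8)]
ys = ["a"] * 5 + ["b"] * 5 + ["b"]
classifiers = [knn(1), knn(3), nearest_centroid]

ranking, errors = ensemble_noise_ranking(xs, ys, classifiers)
majority = [i for i in range(len(xs)) if errors[i] > len(classifiers) / 2]
consensus = [i for i in range(len(xs)) if errors[i] == len(classifiers)]
print(ranking[:3], majority, consensus)  # [10, 1, 4] [10] [10]
```

In this toy run the mislabelled instance is flagged by all three classifiers, while instances 1 and 4 each receive one vote because the noisy point is their single nearest neighbour. This shows why voting across several learners, rather than trusting one classifier, is the usual motivation for ensemble filtering.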