Data sets and data quality in software engineering
Proceedings of the 4th international workshop on Predictor models in software engineering
A novel classification algorithm to noise data
ICSI'12 Proceedings of the Third international conference on Advances in Swarm Intelligence - Volume Part II
Data quality in empirical software engineering: a targeted review
Proceedings of the 17th International Conference on Evaluation and Assessment in Software Engineering
Hi-index | 0.00 |
One of the most significant issues facing the data mining community is that of low-quality data. Realworld datasets are often inundated with various types of data integrity issues, particularly noisy data. In response to the difficulties created by low-quality data, we propose a novel technique to detect noisy instances relative to an Attribute of Interest (AOI). Any attribute in the dataset can be defined by the user as the attribute of interest. A noise ranking of instances relative to the chosen attribute is output. This approach can be iterated for any number of user-specified attributes of interest. The case study described in this work demonstrates how our technique may be used to detect class noise, which occurs when errors are present in the class or dependent variable. In this scenario the class is declared to be the Attribute of Interest and an instance noise ranking relative to the class is provided. Our technique is compared to the well-known ensemble and classification filters which have been previously proposed for class noise detection. The results of this study demonstrate the effectiveness of our approach and show that our procedure is a useful tool for improving data quality.