The pairwise attribute noise detection algorithm
Knowledge and Information Systems - Special Issue on Mining Low-Quality Data
The quality of data is an important issue in any domain-specific data mining and knowledge discovery initiative. The validity of solutions produced by data-driven algorithms can be diminished if the data being analyzed are of low quality. Data quality is often characterized in terms of the noise present in a given dataset, which can include noisy attributes or labeling errors. Hence, tools for improving the quality of data are important to the data mining analyst. We present a comprehensive empirical investigation of our new technique for ranking the attributes of a dataset from most to least noisy. Once the noisy attributes are identified, specific treatments can be applied depending on how the data are to be used. In a classification setting, for example, if the class label is determined to contain the most noise, processes to cleanse this important attribute may be undertaken. Independent variables or predictors that have a low correlation with the class attribute and appear noisy may be eliminated from the analysis. Several case studies using both real-world and synthetic datasets are presented. Noise detection performance is evaluated by injecting noise into multiple attributes at different noise levels. The empirical results demonstrate that our technique provides an accurate and useful ranking of the noisy attributes in a given dataset.
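The evaluation strategy described above, injecting noise into selected attributes and checking that they rank highest, can be illustrated with a minimal sketch. This is not the paper's exact formulation: the variance-ratio score, the quantile binning, and all function names here are assumptions chosen to convey the pairwise idea, namely that an attribute whose values are poorly explained by partitions induced by the other attributes is scored as relatively noisy.

```python
import numpy as np

def attribute_noise_scores(X, n_bins=5):
    """Score each attribute's noisiness (simplified sketch, not the
    paper's algorithm). For each attribute j, partition the rows by
    quantile bins of every other attribute k and compare the pooled
    within-partition variance of j to its overall variance; a ratio
    near 1 means j is weakly explained by its pairings, i.e. noisy."""
    n, m = X.shape
    scores = np.zeros(m)
    for j in range(m):
        total_var = X[:, j].var()
        if total_var == 0:
            continue  # constant attribute: nothing to score
        ratios = []
        for k in range(m):
            if k == j:
                continue
            # discretize attribute k into quantile bins
            edges = np.quantile(X[:, k], np.linspace(0, 1, n_bins + 1)[1:-1])
            part = np.digitize(X[:, k], edges)
            within = 0.0
            for b in np.unique(part):
                vals = X[part == b, j]
                within += len(vals) * vals.var()
            ratios.append((within / n) / total_var)
        scores[j] = np.mean(ratios)
    return scores

# Noise-injection check: four attributes share a common signal; extra
# noise is injected into attribute 2, which should then rank most noisy.
rng = np.random.default_rng(0)
base = rng.normal(size=(500, 1))
X = np.hstack([base + 0.1 * rng.normal(size=(500, 1)) for _ in range(4)])
X[:, 2] += rng.normal(size=500)  # inject attribute noise
scores = attribute_noise_scores(X)
ranking = np.argsort(scores)[::-1]  # attributes from most to least noisy
```

Because the clean attributes track the shared signal, partitioning by any one of them sharply reduces the within-partition variance of the others, while the injected noise in attribute 2 survives every partitioning and drives its ratio up.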