The performance of a classification model is invariably affected by the characteristics of the measurement data on which it is built. If the quality of the data is generally poor, the classification model will perform poorly. The number of noisy instances in a given dataset is a good reflection of the data's quality, so detecting and removing noisy instances improves the quality of the data and, consequently, the performance of the classification model. This study presents an attractive, user-friendly approach for detecting data noise based on Boolean rules generated from the measurement data. The approach follows a simple, replicable procedure that analyzes the rules to detect mislabeled instances in the training dataset. Such instances are treated as data noise and removed to obtain a clean dataset. A case study of a software measurement dataset with known noisy instances demonstrates the effectiveness of our approach. The dataset is obtained from a NASA software project developed for real-time predictions based on simulations. Empirically, the proposed approach is highly effective at detecting noise in this dataset: it detected 100% of the known noisy instances. The proposed approach is compared with noise filtering based on five classification filters and with an ensemble filter of five classifiers. We also demonstrate that the proposed approach shows excellent promise in detecting noisy instances in six independent, real-world software measurement datasets whose noisy instances are unknown.
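The abstract does not specify how the Boolean rules are generated or analyzed. The Python sketch below shows one plausible rule-based mislabel detector under stated assumptions; it is not the authors' algorithm. It assumes that each metric is binarized at its median and that each distinct Boolean pattern shared by at least `min_support` modules forms a rule predicting that group's majority class. A module is flagged as noise when its label contradicts a rule whose confidence is at least `min_confidence`. The metric names (`loc`, `cc`), the class labels (`fp` for fault-prone, `nfp` for not fault-prone), the cutoffs, and the toy data are all hypothetical.

```python
from collections import Counter, defaultdict
from statistics import median


def booleanize(rows, metrics):
    """Map each metric to True if it exceeds that metric's median.

    Using the median as the threshold is an assumption for illustration.
    """
    thresholds = {m: median(r[m] for r in rows) for m in metrics}
    return [tuple(r[m] > thresholds[m] for m in metrics) for r in rows]


def detect_noise(rows, labels, metrics, min_support=3, min_confidence=0.8):
    """Return sorted indices of instances whose label contradicts a confident rule.

    Hypothetical rule form: "if a module has Boolean pattern P, predict the
    majority class among training modules with pattern P".
    """
    patterns = booleanize(rows, metrics)
    groups = defaultdict(list)
    for i, p in enumerate(patterns):
        groups[p].append(i)

    noisy = []
    for idxs in groups.values():
        if len(idxs) < min_support:
            continue  # too few modules to trust a rule for this pattern
        counts = Counter(labels[i] for i in idxs)
        rule_label, hits = counts.most_common(1)[0]
        if hits / len(idxs) < min_confidence:
            continue  # rule is not confident enough to overrule a label
        noisy.extend(i for i in idxs if labels[i] != rule_label)
    return sorted(noisy)


# Hypothetical toy data: lines of code (loc) and cyclomatic complexity (cc).
rows = [
    {"loc": 10, "cc": 1}, {"loc": 12, "cc": 2}, {"loc": 15, "cc": 1},
    {"loc": 11, "cc": 2}, {"loc": 14, "cc": 1},
    {"loc": 200, "cc": 20}, {"loc": 250, "cc": 25}, {"loc": 300, "cc": 30},
    {"loc": 220, "cc": 22}, {"loc": 280, "cc": 28},
]
# Index 3 (small module labeled fp) and index 8 (large module labeled nfp)
# are deliberately inconsistent with their neighbors.
labels = ["nfp", "nfp", "nfp", "fp", "nfp", "fp", "fp", "fp", "nfp", "fp"]

print(detect_noise(rows, labels, ["loc", "cc"]))  # → [3, 8]
```

The flagged instances would then be removed to form the cleaned training set. In a sketch like this, the binarization threshold and the support and confidence cutoffs determine how aggressively the filter discards instances.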