The accuracy of machine learners is affected by the quality of the data they are induced on. In this paper, the quality of the training dataset is improved by removing instances that the Partitioning Filter detects as noisy. The fit dataset is first split into subsets, and a different base learner is induced on each split. Their predictions are then combined so that an instance is flagged as noisy if it is misclassified by a certain number of base learners. Two versions of the Partitioning Filter are used: the Multiple-Partitioning Filter and the Iterative-Partitioning Filter. The number of instances removed by the filters is tuned through the filter's voting scheme and the number of iterations. The primary aim of this study is to compare the predictive performance of the final models built on the filtered and unfiltered training datasets. A case study of software measurement data from a high-assurance software project is performed. It is shown that models built on the filtered fit datasets and evaluated on a noisy test dataset generally outperform those built on the noisy (unfiltered) fit dataset. However, the predictive performance of models based on certain aggressive filters is degraded by the presence of noise in the evaluation dataset.
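The filtering scheme described above can be sketched in code. The following is a minimal, hypothetical illustration of a majority-vote Partitioning Filter, not the paper's exact procedure: the choice of scikit-learn decision trees as base learners, and the `n_partitions` and `min_votes` parameters, are assumptions made for the example.

```python
import numpy as np
from sklearn.model_selection import KFold
from sklearn.tree import DecisionTreeClassifier

def partitioning_filter(X, y, n_partitions=5, min_votes=3, seed=0):
    """Flag an instance as noisy when at least `min_votes` of the
    base learners (each induced on one partition of the fit data)
    misclassify it; return the filtered dataset and the noise mask."""
    X, y = np.asarray(X), np.asarray(y)
    misclassify_counts = np.zeros(len(y), dtype=int)
    # Each KFold "test" fold serves as one disjoint partition.
    kf = KFold(n_splits=n_partitions, shuffle=True, random_state=seed)
    for _, part_idx in kf.split(X):
        learner = DecisionTreeClassifier(random_state=seed)
        learner.fit(X[part_idx], y[part_idx])
        # Every base learner votes on every instance in the fit dataset.
        misclassify_counts += (learner.predict(X) != y).astype(int)
    noisy = misclassify_counts >= min_votes
    return X[~noisy], y[~noisy], noisy
```

Lowering `min_votes` (towards a consensus of few learners) makes the filter more aggressive, removing more instances; the Iterative-Partitioning Filter would additionally re-run this procedure on the filtered output until few or no instances are flagged.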