We present two new noise filtering techniques which improve the quality of training datasets by removing data points that are likely to be noisy. In addition, a new measure called 'efficiency paired comparison' is introduced to simplify the comparison between two filters. The filtering techniques are based on a partitioning approach: the training dataset is first split into subsets, and base learners are induced on each of these subsets. The predictions are then combined so that an instance in the training data is identified as noisy if it is misclassified by a certain number of base learners. The first technique, the multiple-partitioning filter, combines several classifiers induced on each subset. The second technique, the iterative-partitioning filter, uses only one base learner but goes through multiple filtering iterations. The amount of noise removed by the techniques is varied by tuning either the filtering level or the number of iterations. Empirical studies using software measurement data from a high-assurance software project assess the efficiency of the two noise filtering approaches. The empirical results suggest that using several base classifiers, as well as performing several iterations with a conservative filtering scheme, can improve the efficiency of the filtering technique.
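The sketch below illustrates the general partitioning idea described in the abstract: split the training data into subsets, induce a base learner on each subset, and flag an instance as noisy if it is misclassified by at least a chosen number of base learners, optionally repeating the process over several iterations. It is a minimal illustration only; the choice of scikit-learn decision trees as base learners, the majority-style default filtering level, and the stopping rule are assumptions, not the authors' exact procedure.

```python
# Minimal sketch of a partitioning-based noise filter (illustrative assumptions,
# not the paper's exact algorithm or parameter settings).
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def partitioning_filter(X, y, n_subsets=5, filtering_level=None, random_state=0):
    """Flag an instance as noisy if at least `filtering_level` base learners
    (one induced per data subset) misclassify it. Defaults to a majority vote."""
    rng = np.random.default_rng(random_state)
    subsets = np.array_split(rng.permutation(len(y)), n_subsets)
    if filtering_level is None:
        filtering_level = n_subsets // 2 + 1  # conservative majority scheme (assumed)

    # Induce one base learner on each subset of the training data.
    learners = [DecisionTreeClassifier(random_state=random_state).fit(X[s], y[s])
                for s in subsets]

    # Count how many base learners misclassify each training instance.
    misclassified = np.zeros(len(y), dtype=int)
    for clf in learners:
        misclassified += (clf.predict(X) != y).astype(int)
    return misclassified >= filtering_level  # boolean mask of likely noisy instances

def iterative_partitioning_filter(X, y, n_iterations=3, **kwargs):
    """Apply the filter repeatedly, removing flagged instances each round."""
    keep = np.arange(len(y))
    for _ in range(n_iterations):
        noisy = partitioning_filter(X[keep], y[keep], **kwargs)
        if not noisy.any():
            break
        keep = keep[~noisy]
    return keep  # indices of instances retained as likely clean
```

In this reading, the filtering level plays the role described in the abstract: raising it makes the scheme more conservative (fewer instances removed per pass), while additional iterations of the iterative variant remove noise more gradually.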