A substantial number of noisy instances in a dataset may adversely affect the hypothesis learnt from that data. Removing noisy instances before constructing a classifier has been shown to improve the classification ability of a learner on new data. This paper introduces a novel technique for identifying observations with class noise in a dataset using frequent itemsets. Each instance in the dataset is assigned a NoiseFactor, which indicates the relative likelihood that it contains class noise. A frequent itemset is a set of instances that share common attribute values and that contains at least as many instances as a user-defined minimum support threshold. Consequently, the set of frequent itemsets carries information about the structure of, and the dependence among, the attributes. Each frequent itemset is assigned a class based on the proportion of its instances that belong to each class. Instances contained in itemsets in which a large proportion of instances belong to the other class are identified as noisy. The proposed technique is analyzed in numerous case studies using real-world software measurement datasets with either inherent or injected noise. It is compared with two well-known techniques for identifying class noise: the Classification Filter and the Ensemble Filter. The results demonstrate that the new algorithm is very effective at identifying instances with class noise.
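The idea described above can be illustrated with a small sketch. The abstract does not give the formula for the NoiseFactor, so the scoring rule below is an assumption made purely for illustration: an instance's score is the fraction of the frequent itemsets covering it whose majority class differs from the instance's own label. The function name `noise_factors`, the parameters `min_support` and `max_len`, and the brute-force enumeration of itemsets (instead of a dedicated frequent-itemset miner such as Apriori) are likewise hypothetical choices, not the paper's method.

```python
from collections import Counter, defaultdict
from itertools import combinations


def noise_factors(rows, labels, min_support=3, max_len=2):
    """Score each instance by how often the frequent itemsets that cover it
    are dominated by a class other than the instance's own label.

    NOTE: this scoring rule is an illustrative assumption; it is not the
    NoiseFactor definition from the paper, which the abstract does not state.
    """
    # Map each itemset, a tuple of (attribute_index, value) pairs, to the
    # indices of the instances that contain it. Brute-force enumeration is
    # only practical for small data and short itemsets.
    cover = defaultdict(list)
    for i, row in enumerate(rows):
        items = list(enumerate(row))
        for k in range(1, max_len + 1):
            for itemset in combinations(items, k):
                cover[itemset].append(i)

    disagree = [0] * len(rows)  # covering itemsets whose class != label
    total = [0] * len(rows)     # all frequent itemsets covering the instance
    for members in cover.values():
        if len(members) < min_support:
            continue  # below the minimum support threshold: not frequent
        # Assign the itemset the class held by most of its instances.
        majority, _ = Counter(labels[i] for i in members).most_common(1)[0]
        for i in members:
            total[i] += 1
            if labels[i] != majority:
                disagree[i] += 1

    # Instances covered by no frequent itemset get a score of 0.0.
    return [d / t if t else 0.0 for d, t in zip(disagree, total)]


# Toy data: two discretized metrics per module; "fp" = fault-prone.
rows = [("high", "high")] * 4 + [("low", "low")] * 3
labels = ["fp", "fp", "fp", "nfp", "nfp", "nfp", "nfp"]
print(noise_factors(rows, labels))  # index 3 scores 1.0, all others 0.0
```

In this toy example the fourth module shares its attribute values with three fault-prone modules but carries the opposite label, so every frequent itemset covering it disagrees with its class. A practical filter would then rank instances by score and remove or inspect those above a chosen cutoff; that cutoff is also left unspecified here.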