Analyzing the quality of data prior to constructing data mining models is emerging as an important issue. Algorithms for identifying noise in a given data set can provide a good measure of data quality. Considerable attention has been devoted to detecting class noise, or labeling errors. In contrast, limited research has been devoted to detecting instances with attribute noise, in part due to the difficulty of the problem. We present a novel approach for detecting instances with attribute noise and demonstrate its usefulness with case studies on two different real-world software measurement data sets. Our approach, called the Pairwise Attribute Noise Detection Algorithm (PANDA), is compared with a nearest-neighbor, distance-based outlier detection technique (denoted DM) investigated in related literature. Since what constitutes noise is domain-specific, our case studies use a software engineering expert to inspect the instances identified by the two approaches and determine whether they actually contain noise. In these case studies, PANDA provides better noise detection performance than the DM algorithm.
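To make the idea of a nearest-neighbor, distance-based outlier detector concrete, the following is a minimal sketch of one common formulation: each instance is scored by the Euclidean distance to its k-th nearest neighbor, and instances are ranked from most to least suspicious. This is an illustration only. The abstract does not specify how DM is defined, and the sketch is not PANDA; the function names, the choice of `k`, and the toy data are all assumptions made for the example.

```python
import math


def kth_neighbor_distance(points, k):
    """For each point, return the Euclidean distance to its k-th nearest neighbor.

    Assumes len(points) > k. This brute-force version is O(n^2 log n).
    """
    scores = []
    for i, p in enumerate(points):
        # Distances from p to every other point, smallest first.
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(dists[k - 1])
    return scores


def rank_outliers(points, k=2):
    """Return point indices ordered from most to least outlying."""
    scores = kth_neighbor_distance(points, k)
    return sorted(range(len(points)), key=lambda i: scores[i], reverse=True)


# Hypothetical toy data: four nearby points and one distant point.
data = [(1.0, 1.0), (1.1, 0.9), (0.9, 1.2), (1.0, 1.1), (8.0, 9.0)]
ranking = rank_outliers(data, k=2)
print(ranking[0])  # index of the most outlying instance
```

In practice, attributes measured on different scales would typically be normalized before computing distances, since otherwise a single wide-ranging attribute can dominate the ranking. As in the case studies, the top-ranked instances would then be handed to a domain expert for inspection.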