Real-world data is never perfect and often suffers from corruptions (noise) that may affect interpretations of the data, models built from the data, and decisions made based on the data. Noise can reduce system performance in terms of classification accuracy, the time needed to build a classifier, and the size of the classifier. Accordingly, most existing learning algorithms integrate various approaches to improve their ability to learn in noisy environments, but noise can still have a serious negative impact. A more reasonable solution might be to employ preprocessing mechanisms that handle noisy instances before a learner is formed. Unfortunately, little research has systematically explored the impact of noise, especially from the noise-handling point of view. This gap has made various noise-processing techniques less significant, particularly when dealing with noise introduced in attributes. In this paper, we present a systematic evaluation of the effect of noise in machine learning. Instead of adopting a unified theory of noise to evaluate its impact, we divide noise into two categories, class noise and attribute noise, and analyze their impact on system performance separately. Because class noise has been widely addressed in existing research, we concentrate on attribute noise. We investigate the relationship between attribute noise and classification accuracy, the impact of noise in different attributes, and possible solutions for handling attribute noise. Our conclusions can guide interested readers in enhancing data quality by designing noise-handling mechanisms.
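To make the distinction between class noise and attribute noise concrete, the following is a minimal sketch of how each kind might be injected into a small nominal dataset. The abstract does not describe the paper's noise model, so the uniform random replacement scheme, the function names `inject_class_noise` and `inject_attribute_noise`, and the toy data are all assumptions made purely for illustration.

```python
import random


def inject_class_noise(labels, classes, rate, rng):
    """Class noise (assumed scheme): with probability `rate`, replace a
    label with a different class chosen uniformly at random."""
    noisy = []
    for label in labels:
        if rng.random() < rate:
            alternatives = [c for c in classes if c != label]
            noisy.append(rng.choice(alternatives))
        else:
            noisy.append(label)
    return noisy


def inject_attribute_noise(rows, attr_index, domain, rate, rng):
    """Attribute noise (assumed scheme): with probability `rate`, replace
    the value of one attribute with a different value from its domain.
    Other attributes and the class label are left untouched."""
    noisy = []
    for row in rows:
        row = list(row)  # copy so the clean data is not modified
        if rng.random() < rate:
            alternatives = [v for v in domain if v != row[attr_index]]
            row[attr_index] = rng.choice(alternatives)
        noisy.append(row)
    return noisy


# Hypothetical toy data: two nominal attributes and a binary class.
rows = [["sunny", "hot"], ["rainy", "cool"], ["sunny", "cool"], ["rainy", "hot"]]
labels = ["no", "yes", "yes", "no"]

rng = random.Random(0)  # fixed seed so a run is repeatable
noisy_rows = inject_attribute_noise(
    rows, 0, ["sunny", "rainy", "overcast"], 0.5, rng
)
noisy_labels = inject_class_noise(labels, ["yes", "no"], 0.5, rng)
```

Corrupting one attribute at a time, as `attr_index` allows here, is one way to study the impact of noise in different attributes: train the same learner on each corrupted copy and compare its accuracy against a model trained on the clean data.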