Given a noisy dataset, locating erroneous instances and attributes, and ranking suspicious instances by their impact on system performance, is an interesting and important research problem. In this paper we present an Error Detection and Impact-sensitive instance Ranking (EDIR) mechanism to address it. Given a noisy dataset D, we first train a benchmark classifier T from D. Instances that T cannot classify effectively are treated as suspicious and forwarded to a subset S. For each attribute Ai, we swap the roles of Ai and the class label C to train a classifier APi that predicts Ai. Given an instance Ik in S, we use APi together with the benchmark classifier T to locate the erroneous value of each attribute Ai. To rank the instances in S quantitatively, we define an impact measure based on the Information-gain Ratio (IR): we calculate IRi between attribute Ai and C, and use IRi as the impact-sensitive weight of Ai. The sum of the impact-sensitive weights of all located erroneous attributes of Ik gives its total impact value. Experimental results demonstrate the effectiveness of our strategies.
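The impact-sensitive ranking step can be illustrated with a minimal Python sketch. It is not the paper's implementation. It assumes purely categorical attributes, uses the standard gain-ratio definition (information gain divided by split information), and takes the set of located erroneous attributes per instance as given. The names `gain_ratio`, `rank_by_impact`, and `suspected`, and the toy data, are hypothetical and exist only for illustration.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of a sequence of categorical values."""
    total = len(values)
    return -sum((n / total) * log2(n / total) for n in Counter(values).values())


def gain_ratio(column, labels):
    """Information-gain ratio between one categorical attribute and the class."""
    total = len(labels)
    # Group class labels by attribute value to get the conditional entropy.
    groups = {}
    for value, label in zip(column, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    split_info = entropy(column)
    if split_info == 0:  # attribute takes a single value, so it carries no information
        return 0.0
    return (entropy(labels) - conditional) / split_info


def rank_by_impact(rows, labels, suspected):
    """Rank suspicious instances by the summed gain ratio of their flagged attributes.

    rows      -- list of tuples of categorical attribute values
    labels    -- class label for each row
    suspected -- dict mapping row index -> set of attribute indices flagged as
                 erroneous (assumed to come from an earlier error-location step)
    """
    n_attrs = len(rows[0])
    # One impact-sensitive weight per attribute.
    weights = [gain_ratio([r[i] for r in rows], labels) for i in range(n_attrs)]
    # Total impact of an instance = sum of the weights of its flagged attributes.
    impact = {k: sum(weights[i] for i in attrs) for k, attrs in suspected.items()}
    ranking = sorted(impact, key=impact.get, reverse=True)
    return ranking, impact, weights


# Toy data (hypothetical): attribute 0 determines the class, attribute 1 only partly.
rows = [("a", "x"), ("a", "x"), ("b", "x"), ("b", "y")]
labels = ["p", "p", "n", "n"]
suspected = {0: {1}, 2: {0}, 3: {0, 1}}

ranking, impact, weights = rank_by_impact(rows, labels, suspected)
print(ranking)
```

In this toy run, instance 3 ranks first because both of its attributes are flagged, and instance 2 outranks instance 0 because its flagged attribute is the more class-relevant one. The error-location step that would populate `suspected` is deliberately left out of the sketch.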