Real world data is never as perfect as we would like it to be and can often suffer from corruptions that may impact interpretations of the data, models created from the data, and decisions made based on the data. One approach to this problem is to identify and remove records that contain corruptions. Unfortunately, if only certain fields in a record have been corrupted then usable, uncorrupted data will be lost. In this paper we present LENS, an approach for identifying corrupted fields and using the remaining non-corrupted fields for subsequent modeling and analysis. Our approach uses the data to learn a probabilistic model containing three components: a generative model of the clean records, a generative model of the noise values, and a probabilistic model of the corruption process. We provide an algorithm for the unsupervised discovery of such models and empirically evaluate both its performance at detecting corrupted fields and, as one example application, the resulting improvement this gives to a classifier.
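To make the three-component idea concrete, the following is a minimal illustrative sketch and not the LENS algorithm itself. It assumes, purely for illustration, that fields are independent, that clean values of each field follow a Gaussian, that noise values are uniform over a wide range, and that every field is corrupted with the same fixed prior probability. All names (`corruption_posterior`, `flag_corrupted_fields`, `field_models`) and all parameter values are hypothetical, and the parameters are fixed by hand rather than learned from the data as the abstract describes.

```python
import math


def normal_pdf(x, mean, std):
    """Density of a Gaussian; stands in for the clean-value model (assumption)."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def corruption_posterior(x, clean_mean, clean_std, noise_low, noise_high, p_corrupt):
    """P(field is corrupted | observed value x), computed with Bayes' rule.

    clean model : Gaussian(clean_mean, clean_std)      (assumed)
    noise model : Uniform(noise_low, noise_high)       (assumed)
    corruption  : fixed prior probability p_corrupt    (assumed)
    """
    clean_lik = normal_pdf(x, clean_mean, clean_std)
    in_range = noise_low <= x <= noise_high
    noise_lik = 1.0 / (noise_high - noise_low) if in_range else 0.0
    numerator = p_corrupt * noise_lik
    denominator = numerator + (1.0 - p_corrupt) * clean_lik
    return numerator / denominator if denominator > 0.0 else 0.0


def flag_corrupted_fields(record, field_models, p_corrupt=0.05, threshold=0.5):
    """Return the names of fields whose corruption posterior exceeds threshold."""
    flagged = []
    for name, value in record.items():
        mean, std, low, high = field_models[name]
        if corruption_posterior(value, mean, std, low, high, p_corrupt) > threshold:
            flagged.append(name)
    return flagged


# Hypothetical per-field parameters: (clean mean, clean std, noise low, noise high).
field_models = {
    "age": (40.0, 10.0, 0.0, 1000.0),
    "temperature": (37.0, 1.0, 0.0, 1000.0),
}
record = {"age": 34.0, "temperature": 250.0}

corrupted = flag_corrupted_fields(record, field_models)
# Keep the remaining fields for downstream modeling instead of dropping the record.
usable = {k: v for k, v in record.items() if k not in corrupted}
print(corrupted, usable)
```

In this toy record only the implausible `temperature` value is flagged, so the `age` field remains available for later analysis, which is the record-level loss the abstract aims to avoid.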