High data quality is important for every application. Inaccurate or inadequate data can lead to inappropriate assumptions, misleading results, bias, and ultimately poor policy and decision making. Finding errors and cleaning data are time-consuming tasks. This paper presents a framework for automatically detecting unusual and erroneous data values in datasets. The main idea is to generate association rules with very high confidence and then to identify the cases that are exceptions to these rules. Experimental results show that the proposed framework successfully identifies erroneous values in large datasets.
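The main idea can be illustrated with a minimal sketch. This is not the paper's framework: it assumes purely categorical data, considers only rules with a single attribute on each side, and uses hypothetical names and thresholds (`find_rule_exceptions`, `min_support`, `min_confidence`, and the sample data are all invented for illustration). A rule "A = a implies C = c" is kept when it holds in at least `min_confidence` of the rows where A = a, but not in all of them; the rows that break such a rule are reported as candidate errors.

```python
from collections import Counter
from itertools import permutations


def find_rule_exceptions(rows, min_support=3, min_confidence=0.9):
    """Flag rows that violate high-confidence single-antecedent rules.

    rows: list of dicts mapping column name -> categorical value.
    Returns a list of tuples
        (row_index, (antecedent_col, value), (consequent_col, expected), actual).
    Thresholds are illustrative assumptions, not values from the paper.
    """
    columns = sorted(rows[0].keys()) if rows else []
    exceptions = []
    # Try every ordered pair of columns as (antecedent, consequent).
    for a_col, c_col in permutations(columns, 2):
        antecedent_counts = Counter(row[a_col] for row in rows)
        pair_counts = Counter((row[a_col], row[c_col]) for row in rows)
        for (a_val, c_val), n_pair in pair_counts.items():
            confidence = n_pair / antecedent_counts[a_val]
            # Keep rules that hold almost always, but not always:
            # a rule with confidence 1.0 has no exceptions to report.
            if n_pair >= min_support and min_confidence <= confidence < 1.0:
                for i, row in enumerate(rows):
                    if row[a_col] == a_val and row[c_col] != c_val:
                        exceptions.append(
                            (i, (a_col, a_val), (c_col, c_val), row[c_col])
                        )
    return exceptions


# Hypothetical sample data: the last row contradicts the rule
# "city = Paris implies country = France", which holds in 19 of 20 cases.
sample_rows = (
    [{"city": "Paris", "country": "France"}] * 19
    + [{"city": "Berlin", "country": "Germany"}] * 5
    + [{"city": "Paris", "country": "Germany"}]
)

for exception in find_rule_exceptions(sample_rows):
    print(exception)
```

Excluding rules with confidence exactly 1.0 is a design choice of this sketch: such rules have no exceptions, so they cannot point to a suspicious value. A practical system would also need multi-attribute antecedents and discretization of numeric columns, which are omitted here.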