Determining the relative accuracy of attributes

Authors:
Yang Cao; Wenfei Fan; Wenyuan Yu
Affiliations:
School of Informatics, University of Edinburgh / Big Data Research Center and SKLSDE Lab, Beihang University, Beijing, China; School of Informatics, University of Edinburgh / Big Data Research Center and SKLSDE Lab, Beihang University, Edinburgh, United Kingdom; School of Informatics, University of Edinburgh, Edinburgh, United Kingdom
Venue:
Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data
Year:
2013

Abstract

The relative accuracy problem is to determine, given tuples t1 and t2 that refer to the same entity e, whether t1[A] is more accurate than t2[A], i.e., whether t1[A] is closer than t2[A] to the true value of the A attribute of e. This has been a longstanding issue for data quality, and it is challenging when the true values of e are unknown. This paper proposes a model for determining relative accuracy. (1) We introduce a class of accuracy rules and an inference system with a chase procedure to deduce relative accuracy. (2) We identify and study several fundamental problems for relative accuracy. Given a set I_e of tuples pertaining to the same entity e and a set of accuracy rules, these problems are to decide whether the chase process terminates, whether it is Church-Rosser (i.e., every terminating chase sequence yields the same result), and whether it leads to a unique target tuple t_e composed of the most accurate values from I_e for all the attributes of e. (3) We propose a framework for inferring accurate values with user interaction. (4) We provide algorithms underlying the framework to find the unique target tuple t_e whenever possible; when there is not enough information to decide a complete t_e, we compute the top-k candidate targets based on a preference model. (5) Using real-life and synthetic data, we experimentally verify the effectiveness and efficiency of our method.
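The interplay of rules, chase, target tuple and top-k candidates can be illustrated with a toy sketch. It is not the paper's accuracy-rule language or algorithm. The encoding of order facts as (A, i, j) triples, the rule functions newer_status, max_year and status_implies_job, the sample instance ie, and the frequency-based score in top_k_targets are all hypothetical. They were chosen only to show three things: rules firing repeatedly until a fixpoint is reached, a target value being read off when an attribute has a single most accurate value, and complete candidate targets being ranked when it does not.

```python
import heapq
from collections import Counter
from itertools import product
from typing import Callable, Dict, List, Optional, Set, Tuple

Row = Dict[str, object]
# Hypothetical encoding: (A, i, j) means t_j[A] is at least as accurate as t_i[A].
Fact = Tuple[str, int, int]
Rule = Callable[[List[Row], Set[Fact]], Set[Fact]]


def chase(ie: List[Row], rules: List[Rule]) -> Set[Fact]:
    """Apply the rules plus transitivity until no new order fact is derived."""
    facts: Set[Fact] = set()
    while True:
        new: Set[Fact] = set()
        for rule in rules:
            new |= rule(ie, facts)
        # Transitivity: t_i[A] <= t_j[A] and t_j[A] <= t_k[A] imply t_i[A] <= t_k[A].
        for (a, i, j) in facts:
            for (b, j2, k) in facts:
                if a == b and j == j2:
                    new.add((a, i, k))
        if new <= facts:  # fixpoint: nothing new was derived
            return facts
        facts |= new


def target_tuple(ie: List[Row], facts: Set[Fact]) -> Dict[str, Optional[object]]:
    """Per attribute, the single value at least as accurate as all others, else None."""
    n = len(ie)
    target: Dict[str, Optional[object]] = {}
    for a in ie[0]:
        winners = {ie[j][a] for j in range(n)
                   if all(ie[i][a] == ie[j][a] or (a, i, j) in facts for i in range(n))}
        target[a] = winners.pop() if len(winners) == 1 else None
    return target


def top_k_targets(ie: List[Row], facts: Set[Fact], k: int) -> List[Tuple[int, Row]]:
    """Rank complete candidates; the frequency-product score is a hypothetical preference."""
    n = len(ie)
    options = {}
    for a in ie[0]:
        counts = Counter(row[a] for row in ie)
        # Keep only values not known to be less accurate than some different value.
        cands = {ie[j][a] for j in range(n)
                 if not any((a, j, i) in facts and ie[i][a] != ie[j][a] for i in range(n))}
        options[a] = [(v, counts[v]) for v in cands]
    attrs = list(options)
    scored = []
    for combo in product(*(options[a] for a in attrs)):
        score = 1
        for _, count in combo:
            score *= count
        scored.append((score, {a: v for a, (v, _) in zip(attrs, combo)}))
    return heapq.nlargest(k, scored, key=lambda s: s[0])


def newer_status(ie: List[Row], facts: Set[Fact]) -> Set[Fact]:
    # Hypothetical rule: a tuple with a later 'year' has a more accurate 'status'.
    return {("status", i, j) for i in range(len(ie)) for j in range(len(ie))
            if ie[i]["year"] < ie[j]["year"]}


def max_year(ie: List[Row], facts: Set[Fact]) -> Set[Fact]:
    # Hypothetical rule: a larger 'year' is at least as accurate as a smaller one.
    return {("year", i, j) for i in range(len(ie)) for j in range(len(ie))
            if ie[i]["year"] <= ie[j]["year"]}


def status_implies_job(ie: List[Row], facts: Set[Fact]) -> Set[Fact]:
    # Hypothetical rule: a more accurate, different 'status' also makes 'job' more accurate.
    return {("job", i, j) for (a, i, j) in facts
            if a == "status" and ie[i]["status"] != ie[j]["status"]}


# Hypothetical entity instance I_e: three tuples describing the same person.
ie = [
    {"name": "Mary Smith", "status": "single", "job": "nurse", "year": 2008},
    {"name": "Mary Smith", "status": "married", "job": "doctor", "year": 2012},
    {"name": "M. Smith", "status": "married", "job": "doctor", "year": 2010},
]
facts = chase(ie, [newer_status, max_year, status_implies_job])
print(target_tuple(ie, facts))
# -> {'name': None, 'status': 'married', 'job': 'doctor', 'year': 2012}
print(top_k_targets(ie, facts, k=1))
# -> [(8, {'name': 'Mary Smith', 'status': 'married', 'job': 'doctor', 'year': 2012})]
```

The job facts appear only in the second round, after the status facts they depend on have been derived. This chaining is what the chase captures. No rule mentions name, so the sketch cannot choose between "Mary Smith" and "M. Smith". That is the situation in which the framework would ask the user or fall back to ranked candidates. In this toy setting the loop always terminates, because the facts are drawn from a finite set (attributes × n × n). The paper studies termination and the Church-Rosser property for its own, richer rule class.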