The relative accuracy problem is to determine, given tuples t1 and t2 that refer to the same entity e, whether t1[A] is more accurate than t2[A], i.e., whether t1[A] is closer to the true value of the A attribute of e than t2[A]. This has been a longstanding issue for data quality, and it is challenging when the true values of e are unknown. This paper proposes a model for determining relative accuracy. (1) We introduce a class of accuracy rules and an inference system with a chase procedure to deduce relative accuracy. (2) We identify and study several fundamental problems for relative accuracy. Given a set Ie of tuples pertaining to the same entity e and a set of accuracy rules, these problems are to decide whether the chase process terminates, is Church-Rosser, and leads to a unique target tuple te composed of the most accurate values from Ie for all the attributes of e. (3) We propose a framework for inferring accurate values with user interaction. (4) We provide algorithms underlying the framework to find the unique target tuple te whenever possible; when there is not enough information to determine a complete te, we compute top-k candidate targets based on a preference model. (5) Using real-life and synthetic data, we experimentally verify the effectiveness and efficiency of our method.
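
To make the idea of deducing a target tuple from pairwise accuracy judgments concrete, the following is a minimal Python sketch. It is a much-simplified stand-in, not the paper's rule language or chase procedure: the rule format (a function returning the attributes on which one tuple is judged more accurate), the function names `newer_version_wins` and `deduce_target`, and the toy records are all assumptions made for illustration.

```python
from itertools import permutations


def newer_version_wins(t1, t2):
    """Hypothetical accuracy rule: a higher 'version' makes t1's email more accurate."""
    return ["email"] if t1["version"] > t2["version"] else []


def deduce_target(tuples, rules, attributes):
    """Apply rules to every ordered pair of tuples, close the resulting
    'more accurate than' relation under transitivity, and keep a value
    only when it is the single value that beats every other candidate."""
    # better[a] holds pairs (v, w): value v is deduced more accurate than w
    better = {a: set() for a in attributes}
    for t1, t2 in permutations(tuples, 2):
        for rule in rules:
            for a in rule(t1, t2):
                if a in better and t1[a] != t2[a]:
                    better[a].add((t1[a], t2[a]))

    # Naive fixpoint computation of the transitive closure per attribute
    for a in attributes:
        changed = True
        while changed:
            changed = False
            for x, y in list(better[a]):
                for y2, z in list(better[a]):
                    if y == y2 and x != z and (x, z) not in better[a]:
                        better[a].add((x, z))
                        changed = True

    target = {}
    for a in attributes:
        values = {t[a] for t in tuples}
        winners = [
            v for v in values
            if all((v, w) in better[a] for w in values if w != v)
        ]
        # None marks an attribute the rules cannot decide (or decide inconsistently)
        target[a] = winners[0] if len(winners) == 1 else None
    return target


# Toy records for one entity (made-up data, for illustration only)
records = [
    {"version": 1, "email": "a@old.example", "phone": "555-0100"},
    {"version": 2, "email": "a@new.example", "phone": "555-0199"},
]
target = deduce_target(records, [newer_version_wins], ["email", "phone"])
print(target)
```

In this sketch the email is decided by the rule, while the phone number stays undetermined because no rule compares the two values. An attribute left as `None` corresponds loosely to the situation the abstract describes, where a complete target tuple cannot be deduced and candidate targets or user input would be needed.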