In practical data integration systems, it is common for the data sources being integrated to provide conflicting information about the same entity. Consequently, a major challenge for data integration is to derive the most complete and accurate integrated records from diverse and sometimes conflicting sources. We term this challenge the truth finding problem. We observe that some sources are generally more reliable than others, and therefore a good model of source quality is the key to solving the truth finding problem. In this work, we propose a probabilistic graphical model that can automatically infer true records and source quality without any supervision. In contrast to previous methods, our principled approach leverages a generative process of two types of errors (false positive and false negative) by modeling two different aspects of source quality. In so doing, ours is also the first approach designed to merge multi-valued attribute types. Our method is scalable due to an efficient sampling-based inference algorithm that needs very few iterations in practice and enjoys linear time complexity, with an even faster incremental variant. Experiments on two real-world datasets show that our new method outperforms existing state-of-the-art approaches to the truth finding problem.
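To make the idea of two-sided source quality concrete, the following is a minimal sketch, not the model proposed in the paper. It assumes each source's sensitivity (the chance it lists a fact that is true) and false positive rate (the chance it lists a fact that is false) are already known, and that sources err independently. The paper instead infers these quantities jointly with the truth, without supervision. The function name `posterior_true`, the source names, and all numbers are hypothetical.

```python
from math import prod


def posterior_true(claims, sensitivity, false_positive_rate, prior=0.5):
    """Posterior probability that one candidate fact is true.

    claims maps a source name to True (the source lists the fact) or
    False (the source covers the entity but omits the fact).
    Assumption: sources make errors independently of each other.
    """
    # Likelihood of the observed claims if the fact is true:
    # listing it is a true positive, omitting it is a false negative.
    like_true = prod(
        sensitivity[s] if listed else 1.0 - sensitivity[s]
        for s, listed in claims.items()
    )
    # Likelihood of the observed claims if the fact is false:
    # listing it is a false positive, omitting it is a true negative.
    like_false = prod(
        false_positive_rate[s] if listed else 1.0 - false_positive_rate[s]
        for s, listed in claims.items()
    )
    numerator = prior * like_true
    return numerator / (numerator + (1.0 - prior) * like_false)


# Hypothetical multi-valued attribute: the authors of a book.
# Source "A" lists many authors but is sometimes wrong;
# source "B" often lists only some authors but is rarely wrong.
sensitivity = {"A": 0.9, "B": 0.4}
false_positive_rate = {"A": 0.3, "B": 0.02}

only_b = posterior_true({"A": False, "B": True}, sensitivity, false_positive_rate)
only_a = posterior_true({"A": True, "B": False}, sensitivity, false_positive_rate)
print(f"listed by B only: {only_b:.3f}")
print(f"listed by A only: {only_a:.3f}")
```

Under these assumed numbers, a fact listed only by the precise but incomplete source B ends up more credible than one listed only by the complete but noisy source A. A single accuracy score per source cannot express that distinction, which is why modeling false positives and false negatives separately matters for multi-valued attributes.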