Modern information management applications often require integrating data from a variety of data sources, some of which may copy or buy data from other sources. When these data sources model a dynamically changing world (e.g., people's contact information changes over time, restaurants open and go out of business), sources often provide out-of-date data. Errors can also creep into data when sources are updated often. Given out-of-date and erroneous data provided by different, possibly dependent, sources, it is challenging for data integration systems to provide the true values. Straightforward ways to resolve such inconsistencies (e.g., voting) may lead to noisy results, often with detrimental consequences. In this paper, we study the problem of finding true values and determining the copying relationship between sources, when the update history of the sources is known. We model the quality of sources over time by their coverage, exactness and freshness. Based on these measures, we conduct a probabilistic analysis. First, we develop a Hidden Markov Model that decides whether a source is a copier of another source and identifies the specific moments at which it copies. Second, we develop a Bayesian model that aggregates information from the sources to decide the true value for a data item, and the evolution of the true values over time. Experimental results on both real-world and synthetic data show high accuracy and scalability of our techniques.
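To make the concern about voting concrete, here is a minimal sketch with toy data. It is not the paper's Hidden Markov Model or Bayesian model; it only illustrates how a naive majority vote over source histories can pick an out-of-date value when stale sources (one of them a copier) outnumber a fresh one. The source names, timestamps, values, and the helper functions `value_at` and `vote` are all hypothetical.

```python
from collections import Counter

# Hypothetical update histories: source -> list of (time, value) updates,
# sorted by time. All names and values are made up for illustration.
histories = {
    "S1": [(2000, "Main St"), (2006, "Oak Ave")],  # fresh: records the change
    "S2": [(2001, "Main St")],                     # stale: never updated again
    "S3": [(2002, "Main St")],                     # assumed to copy S2, also stale
}

def value_at(history, t):
    """Return the latest value a source provided at or before time t, or None."""
    current = None
    for when, value in history:
        if when <= t:
            current = value
    return current

def vote(histories, t):
    """Naive majority vote over the values the sources provide at time t."""
    values = [value_at(h, t) for h in histories.values()]
    values = [v for v in values if v is not None]
    if not values:
        return None
    return Counter(values).most_common(1)[0][0]

if __name__ == "__main__":
    # S1 reports the new address, but the two stale sources outvote it.
    print(vote(histories, 2007))
```

In this toy setup the vote at time 2007 returns the old address, because S2 and S3 count as two independent votes even though S3 adds no independent evidence. Weighting sources by measures such as freshness, and discounting values that were copied, is the kind of correction the abstract argues for.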