Several techniques have been developed to extract and integrate data from web sources. However, web data are inherently imprecise and uncertain. This paper addresses the problem of characterizing the uncertainty of data extracted from a number of inaccurate sources. We develop a probabilistic model that computes both a probability distribution over the extracted values and the accuracy of the sources. The model accounts for sources that copy their contents from other sources, and handles the misleading consensus that copiers produce. We extend the models previously proposed in the literature by working on several attributes at a time, which better leverages all the available evidence. We also report the results of several experiments on both synthetic and real-life data to show the effectiveness of the proposed approach.
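
To make the idea of jointly estimating value probabilities and source accuracy concrete, here is a minimal sketch in Python. It is not the model developed in the paper. It assumes a single attribute per object and fully independent sources (so it ignores copiers), and it uses a generic accuracy-weighted Bayesian vote. The function name `estimate`, the parameters `n_false`, `init_accuracy`, and `iterations`, and the sample data are all hypothetical choices made for illustration.

```python
from collections import defaultdict
from math import exp, log


def estimate(claims, n_false=10, init_accuracy=0.8, iterations=20):
    """Alternate between value probabilities and source accuracies.

    claims: dict mapping (source, object) -> claimed value.
    n_false: assumed number of wrong values per object (illustrative).
    Returns (probs, accuracy), where probs[object][value] is a probability
    over the values observed for that object.
    """
    sources = {s for s, _ in claims}
    accuracy = {s: init_accuracy for s in sources}

    # Group the claims by object: object -> {source: value}.
    by_object = defaultdict(dict)
    for (s, o), v in claims.items():
        by_object[o][s] = v

    probs = {}
    for _ in range(iterations):
        # Step 1: score each observed value by the accuracy of its supporters.
        for o, votes in by_object.items():
            scores = defaultdict(float)
            for s, v in votes.items():
                a = accuracy[s]
                scores[v] += log(n_false * a / (1.0 - a))
            # Normalize with a max shift for numerical stability.
            top = max(scores.values())
            weights = {v: exp(sc - top) for v, sc in scores.items()}
            total = sum(weights.values())
            probs[o] = {v: w / total for v, w in weights.items()}

        # Step 2: a source's accuracy is the mean probability of its claims,
        # clamped away from 0 and 1 so the log above stays finite.
        for s in sources:
            ps = [probs[o][v] for (s2, o), v in claims.items() if s2 == s]
            accuracy[s] = min(max(sum(ps) / len(ps), 0.01), 0.99)

    return probs, accuracy


if __name__ == "__main__":
    # Hypothetical data: three sources reporting a value for two objects.
    sample = {
        ("A", "x"): "1", ("B", "x"): "1", ("C", "x"): "2",
        ("A", "y"): "3", ("B", "y"): "3", ("C", "y"): "3",
    }
    value_probs, source_accuracy = estimate(sample)
    print(value_probs)
    print(source_accuracy)
```

Because this sketch treats every source as independent, a group of copiers repeating the same wrong value would reinforce one another, which is exactly the misleading consensus that the model in the paper is designed to handle.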