Accessing online information from various data sources has become a necessary part of everyday life. Unfortunately, such information is not always trustworthy: sources vary widely in quality and often provide inaccurate or conflicting information. Existing approaches attack this problem with unsupervised learning methods that infer the confidence of each data value and the trustworthiness of each source from one another, on the assumption that values provided by more sources are more likely to be accurate. However, because false values can spread widely through copying among sources, and out-of-date data often overwhelm up-to-date data, such bootstrapping methods are often ineffective. In this paper we propose a semi-supervised approach that finds true values with the help of ground truth data. Such ground truth data, even in very small amounts, can greatly help identify trustworthy data sources. Unlike existing studies that only provide iterative algorithms, we derive the optimal solution to our problem and provide an iterative algorithm that converges to it. Experiments show that our method achieves higher accuracy than existing approaches and that it can be applied to very large data sets when implemented with MapReduce.
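To make the contrast between pure bootstrapping and ground-truth seeding concrete, the following is a minimal illustrative sketch. It is not the algorithm derived in the paper: it is a simple trust-weighted voting loop, and the function names (`estimate_trust`, `find_truths`), the smoothing prior of 0.5, and the toy claims are all assumptions made for illustration. Source trust is first estimated from the labeled items alone, then values and trust are re-estimated in turn until the chosen values stop changing.

```python
from collections import defaultdict


def estimate_trust(claims, truths, prior=0.5):
    """Smoothed fraction of each source's claims that agree with `truths`.

    Only items present in `truths` are counted; a source with no such
    items falls back to `prior`. (Hypothetical helper, not from the paper.)
    """
    correct, total = defaultdict(float), defaultdict(float)
    for source, item, value in claims:
        total[source] += 0.0  # register the source even if nothing is labeled
        if item in truths:
            total[source] += 1.0
            correct[source] += float(truths[item] == value)
    return {s: (correct[s] + prior) / (total[s] + 1.0) for s in total}


def find_truths(claims, ground_truth, max_iters=20):
    """Trust-weighted voting seeded by ground truth (illustrative only)."""
    trust = estimate_trust(claims, ground_truth)  # seed from labeled items
    truths = dict(ground_truth)
    for _ in range(max_iters):
        # Score each candidate value by the summed trust of its sources.
        scores = defaultdict(lambda: defaultdict(float))
        for source, item, value in claims:
            scores[item][value] += trust[source]
        new_truths = {item: max(vals, key=vals.get)
                      for item, vals in scores.items()}
        new_truths.update(ground_truth)  # labeled items stay fixed
        if new_truths == truths:
            break
        truths = new_truths
        trust = estimate_trust(claims, truths)
    return truths, trust


# Toy data (assumed): source A is accurate; B and C share the same wrong values.
claims = [
    ("A", "x1", "1"), ("A", "x2", "2"), ("A", "x3", "3"),
    ("B", "x1", "9"), ("B", "x2", "8"), ("B", "x3", "7"),
    ("C", "x1", "9"), ("C", "x2", "8"), ("C", "x3", "7"),
]

unsupervised, _ = find_truths(claims, ground_truth={})
seeded, seeded_trust = find_truths(claims, ground_truth={"x1": "1"})
```

In this toy setting the unlabeled run follows the majority formed by the two agreeing sources, while a single labeled item is enough to shift trust toward source A and change the values chosen for the other two items. Real data would need a more careful trust model and explicit handling of ties.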