Many data management applications, such as setting up Web portals, managing enterprise data, managing community data, and sharing scientific data, require integrating data from multiple sources. Each source provides a set of values, and different sources often provide conflicting values. To present quality data to users, data integration systems must resolve these conflicts and discover the true values. Typically, we expect a true value to be provided by more sources than any particular false one, so we can take the value provided by the majority of the sources as the truth. Unfortunately, a false value can spread through copying, which makes truth discovery extremely tricky. In this paper, we consider how to find true values from conflicting information when there is a large number of sources, some of which may copy from others. We present a novel approach that considers dependence between data sources in truth discovery. Intuitively, if two data sources provide a large number of common values and many of these values are rarely provided by other sources (e.g., particular false values), it is very likely that one copies from the other. We apply Bayesian analysis to decide dependence between sources and design an algorithm that iteratively detects dependence and discovers truth from conflicting information. We also extend our model to consider the accuracy of data sources and the similarity between values. Our experiments on synthetic and real-world data show that our algorithm can significantly improve the accuracy of truth discovery and scales to a large number of data sources.
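The intuition in the abstract can be illustrated with a minimal Python sketch. It is not the paper's Bayesian model: the copy detector below is a crude stand-in heuristic that groups two sources when they share at least `min_shared` values that few other sources provide, and the vote then counts each suspected copy group once. All names (`vote`, `suspected_copy_groups`, `max_outside`, `min_shared`) and the toy data are hypothetical and chosen only for illustration.

```python
from collections import Counter, defaultdict
from itertools import combinations


def vote(claims, group_of=None):
    """Pick, per item, the value backed by the most voters.

    claims: {source: {item: value}}.
    group_of: optional {source: group id}; sources in the same group
    count as a single voter for any value they share.
    """
    group_of = group_of or {}
    voters = defaultdict(lambda: defaultdict(set))
    for source, values in claims.items():
        for item, value in values.items():
            voters[item][value].add(group_of.get(source, source))
    return {
        item: max(by_value, key=lambda v: len(by_value[v]))
        for item, by_value in voters.items()
    }


def suspected_copy_groups(claims, max_outside=1, min_shared=2):
    """Crude heuristic (an assumption, not the paper's Bayesian analysis).

    Two sources are grouped when they agree on at least `min_shared`
    values that at most `max_outside` other sources also provide.
    """
    support = defaultdict(Counter)
    for values in claims.values():
        for item, value in values.items():
            support[item][value] += 1

    group_of = {source: source for source in claims}
    for a, b in combinations(sorted(claims), 2):
        shared_rare = sum(
            1
            for item, value in claims[a].items()
            if claims[b].get(item) == value
            and support[item][value] - 2 <= max_outside
        )
        if shared_rare >= min_shared:
            # Merge b's group into a's group.
            old, new = group_of[b], group_of[a]
            for source in group_of:
                if group_of[source] == old:
                    group_of[source] = new
    return group_of


# Toy data: the true values are a1, b1, c1, d1.
# S2 and S3 each make one independent error; S4 is wrong on b and c,
# and S5 and S6 copy S4 verbatim.
copied = {"a": "a1", "b": "b9", "c": "c9", "d": "d1"}
claims = {
    "S1": {"a": "a1", "b": "b1", "c": "c1", "d": "d1"},
    "S2": {"a": "a1", "b": "b1", "c": "c2", "d": "d1"},
    "S3": {"a": "a1", "b": "b3", "c": "c1", "d": "d1"},
    "S4": dict(copied),
    "S5": dict(copied),
    "S6": dict(copied),
}

naive = vote(claims)
groups = suspected_copy_groups(claims)
adjusted = vote(claims, groups)

print(naive)     # {'a': 'a1', 'b': 'b9', 'c': 'c9', 'd': 'd1'}
print(adjusted)  # {'a': 'a1', 'b': 'b1', 'c': 'c1', 'd': 'd1'}
```

On this toy input, plain majority voting adopts the copied false values for `b` and `c`, while counting the suspected copy group once recovers the true values. The thresholds are tuned to the example; a fixed cutoff like this cannot tell shared true values from shared false ones in general, which is the gap the abstract's iterative, accuracy-aware analysis is meant to address.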