Global detection of complex copying relationships between sources
Proceedings of the VLDB Endowment
Web technologies have enabled data sharing between sources but have also simplified copying (and often publishing without proper attribution). Copying relationships can be complex: some sources copy from multiple sources on different subsets of data, some co-copy from the same source, and some transitively copy from another. Understanding such copying relationships is desirable both for business purposes and for improving many key components of data integration, such as resolving conflicts across sources, reconciling distinct references to the same real-world entity, and efficiently answering queries over multiple sources. Recent work has studied how to detect copying between a pair of sources, but these techniques can fall short in the presence of complex copying relationships. In this paper, we describe techniques that discover global copying relationships among a set of structured sources. Toward this goal, we make two contributions. First, we propose a global detection algorithm that identifies co-copying and transitive copying, returning only source pairs with direct copying. Second, global detection requires accurate decisions on copying direction; we significantly improve on previous techniques here by considering various types of evidence for copying and the correlation of copying across different data items. Experimental results on real-world and synthetic data show the high effectiveness and efficiency of our techniques.
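The abstract does not spell out the global algorithm. The minimal Python sketch below is a deliberately simplified illustration of what "returning only source pairs with direct copying" means for the transitive case. It takes directed (copier, original) edges from a hypothetical pairwise detector and drops any edge already implied by a longer chain, which is a transitive reduction. The function name `direct_copying_edges` is hypothetical, and the sketch assumes the copying graph is acyclic. It is not the paper's probabilistic method: it ignores co-copying, value-level evidence, and copying direction. It would also discard a genuine direct edge that happens to run parallel to a chain.

```python
def direct_copying_edges(edges):
    """Hypothetical sketch: keep only copying edges not implied by a longer chain.

    edges: set of (copier, original) pairs, e.g. from a pairwise detector.
    Assumes the copying graph is a DAG; this is a plain transitive
    reduction, not the paper's probabilistic global detection.
    """
    # Build adjacency: copier -> set of sources it (apparently) copies from.
    succ = {}
    for copier, original in edges:
        succ.setdefault(copier, set()).add(original)
        succ.setdefault(original, set())

    def implied_by_chain(start, target):
        # True if target is reachable from start via a path of length >= 2,
        # i.e. through some intermediate source other than target itself.
        stack = [n for n in succ[start] if n != target]
        seen = set(stack)
        while stack:
            node = stack.pop()
            if node == target:
                return True
            for nxt in succ[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        return False

    # An edge is kept as "direct" only if no longer chain explains it.
    return {(a, b) for a, b in edges if not implied_by_chain(a, b)}


if __name__ == "__main__":
    # S3 copies S2, S2 copies S1; a pairwise detector may also flag S3 -> S1.
    pairwise = {("S3", "S2"), ("S2", "S1"), ("S3", "S1")}
    print(sorted(direct_copying_edges(pairwise)))
    # [('S2', 'S1'), ('S3', 'S2')]
```

In the usage example, the transitive edge S3 → S1 is dropped because the chain S3 → S2 → S1 already explains it. Telling co-copying apart from direct copying (two sources that both copy a third) cannot be done from graph structure alone. It requires the kind of value-level evidence the abstract refers to.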