The ongoing trend towards open data embraced by the Semantic Web has begun to produce a large number of data sources. These sources are published using RDF vocabularies, and their graph topology makes it possible to navigate through the data. This paper presents LinksB2N, an algorithm that discovers information overlaps in RDF data repositories and integrates, without human intervention, data sets that partially share the same domain. LinksB2N identifies equivalent RDF resources across data sets with varying degrees of confidence. The algorithm relies on a novel approach that uses clustering techniques to analyze the distribution of unique objects that contain overlapping information in different data graphs. We illustrate our contribution in the context of the Market Blended Insight project by applying LinksB2N to data sets on the order of hundreds of millions of RDF triples that contain information relevant to business-to-business (B2B) marketing analysis.
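The intuition behind matching resources through "unique objects" can be illustrated with a small sketch. It is not the LinksB2N algorithm, and it does not reproduce the paper's clustering step. It only shows the simpler idea that an object value carried by exactly one subject in each of two graphs is evidence that those two subjects denote the same real-world entity. The following are all assumptions made for illustration: the graph representation (lists of string triples), the function names `unique_objects` and `candidate_links`, the normalization (trim whitespace and lowercase), and the Dice-coefficient confidence score.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Set, Tuple

# Hypothetical representation: an RDF graph as a list of (subject, predicate, object) strings.
Triple = Tuple[str, str, str]


def unique_objects(triples: List[Triple]) -> Dict[str, str]:
    """Map each normalized object value to its subject, keeping only values
    carried by exactly one subject in this graph.
    Assumed normalization: strip surrounding whitespace and lowercase."""
    subjects_by_value: Dict[str, Set[str]] = defaultdict(set)
    for subject, _predicate, obj in triples:
        subjects_by_value[obj.strip().lower()].add(subject)
    return {
        value: next(iter(subjects))
        for value, subjects in subjects_by_value.items()
        if len(subjects) == 1  # values shared by several subjects do not discriminate
    }


def candidate_links(g1: List[Triple], g2: List[Triple]) -> Dict[Tuple[str, str], float]:
    """Pair subjects of g1 and g2 that share unique object values.
    Confidence is an assumed Dice coefficient: 2 * shared / (n1 + n2), where n1 and n2
    count each subject's unique values in its own graph."""
    u1, u2 = unique_objects(g1), unique_objects(g2)
    n1, n2 = Counter(u1.values()), Counter(u2.values())
    shared: Counter = Counter()
    for value, s1 in u1.items():
        s2 = u2.get(value)
        if s2 is not None:
            shared[(s1, s2)] += 1
    return {(s1, s2): 2 * k / (n1[s1] + n2[s2]) for (s1, s2), k in shared.items()}


if __name__ == "__main__":
    # Toy, made-up data: two graphs describing overlapping companies.
    g1 = [
        ("a:acme", "a:name", "ACME Ltd"),
        ("a:acme", "a:postcode", "AB1 2CD"),
        ("a:beta", "a:name", "Beta plc"),
        ("a:beta", "a:postcode", "AB1 2CD"),  # shared by two subjects, so ignored
        ("a:beta", "a:phone", "555 0100"),
    ]
    g2 = [
        ("b:1", "b:label", "acme ltd"),
        ("b:2", "b:label", "Beta PLC"),
        ("b:2", "b:tel", "555 0100"),
        ("b:2", "b:zip", "AB1 2CD"),
    ]
    for pair, confidence in candidate_links(g1, g2).items():
        print(pair, round(confidence, 2))
```

On the toy data, `("a:acme", "b:1")` scores 1.0 and `("a:beta", "b:2")` scores 0.8, because `b:2` carries a third unique value (the postcode) that has no unique counterpart in the first graph. A real system at the scale described above would also need blocking or indexing rather than in-memory dictionaries, and a principled way to turn scores into confidence levels. The paper addresses that with clustering, which this sketch does not attempt.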