LinksB2N: Automatic Data Integration for the Semantic Web

  • Authors:
  • Manuel Salvadores; Gianluca Correndo; Bene Rodriguez-Castro; Nicholas Gibbins; John Darlington; Nigel R. Shadbolt

  • Affiliations:
  • Intelligence, Agents, Multimedia (IAM) Group, School of Electronics and Computer Science, University of Southampton, UK (all authors)

  • Venue:
  • OTM '09 Proceedings of the Confederated International Conferences, CoopIS, DOA, IS, and ODBASE 2009 on On the Move to Meaningful Internet Systems: Part II
  • Year:
  • 2009

Abstract

The ongoing trend towards open data embraced by the Semantic Web has started to produce a large number of data sources. These data sources are published using RDF vocabularies, and their graph topology makes it possible to navigate through the data. This paper presents LinksB2N, an algorithm that discovers information overlaps in RDF data repositories and integrates data sets that partially share the same domain, with no human intervention. LinksB2N identifies equivalent RDF resources from different data sets with several degrees of confidence. The algorithm relies on a novel approach that uses clustering techniques to analyze the distribution of unique objects that contain overlapping information in different data graphs. Our contribution is illustrated in the context of the Market Blended Insight project by applying the LinksB2N algorithm to data sets on the order of hundreds of millions of RDF triples containing relevant information in the domain of business-to-business (B2B) marketing analysis.
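
The abstract does not give the algorithm's details, so the following is only a rough illustration of the general idea it describes, not the LinksB2N algorithm itself. It assumes that predicates whose object values are mostly unique are the useful ones for matching, separates them from low-uniqueness predicates with a tiny one-dimensional two-means split (a stand-in for whatever clustering the paper uses), and links subjects from two data sets that share a value on such predicates. All function names, the toy triples, and the confidence score (the lower of the two uniqueness ratios) are hypothetical.

```python
from collections import defaultdict

def uniqueness(triples):
    """Per predicate: distinct object values divided by number of triples."""
    objs = defaultdict(list)
    for _, p, o in triples:
        objs[p].append(o)
    return {p: len(set(v)) / len(v) for p, v in objs.items()}

def split_two_clusters(values, iters=20):
    """Tiny 1-D two-means; returns the midpoint between the two cluster centres."""
    lo, hi = min(values), max(values)
    for _ in range(iters):
        mid = (lo + hi) / 2
        low = [v for v in values if v <= mid]
        high = [v for v in values if v > mid]
        if not low or not high:
            break
        lo, hi = sum(low) / len(low), sum(high) / len(high)
    return (lo + hi) / 2

def candidate_links(a, b):
    """Map (subject_a, subject_b) to a confidence score in [0, 1]."""
    ua, ub = uniqueness(a), uniqueness(b)
    cut = split_two_clusters(list(ua.values()) + list(ub.values()))
    # Index data set b by object value, keeping only high-uniqueness predicates.
    index = defaultdict(list)
    for s, p, o in b:
        if ub[p] > cut:
            index[o].append((s, p))
    links = {}
    for s, p, o in a:
        if ua[p] > cut:
            for s2, p2 in index.get(o, []):
                score = min(ua[p], ub[p2])  # assumed confidence measure
                links[(s, s2)] = max(links.get((s, s2), 0.0), score)
    return links

# Hypothetical toy data: (subject, predicate, object) triples from two sources.
A = [
    ("a:1", "a:postcode", "SO17 1BJ"), ("a:1", "a:country", "UK"),
    ("a:2", "a:postcode", "EC1A 1BB"), ("a:2", "a:country", "UK"),
    ("a:3", "a:postcode", "M1 1AE"), ("a:3", "a:country", "UK"),
]
B = [
    ("b:x", "b:zip", "SO17 1BJ"), ("b:x", "b:nation", "UK"),
    ("b:y", "b:zip", "EC1A 1BB"), ("b:y", "b:nation", "UK"),
    ("b:z", "b:zip", "G1 1AA"), ("b:z", "b:nation", "UK"),
]

links = candidate_links(A, B)
```

In this toy data the country predicates repeat the same value on every resource, so they fall in the low-uniqueness cluster and are ignored. Only the postcode-like predicates contribute candidate links.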