Information discovery in loosely integrated data

  • Authors:
  • Heasoo Hwang;Andrey Balmin;Hamid Pirahesh;Berthold Reinwald

  • Affiliations:
  • IBM Almaden Research Center, San Jose, CA;IBM Almaden Research Center, San Jose, CA;IBM Almaden Research Center, San Jose, CA;IBM Almaden Research Center, San Jose, CA

  • Venue:
  • Proceedings of the 2007 ACM SIGMOD international conference on Management of data
  • Year:
  • 2007

Quantified Score

Hi-index 0.00

Visualization

Abstract

We model heterogeneous data sources with cross references, such as those crawled on the (enterprise) web, as a labeled graph with data objects as typed nodes and references or links as edges. Given the labeled data graph, we introduce flexible and efficient querying capabilities that go beyond existing capabilities by additionally discovering meaningful relationships between objects that satisfy keyword and/or structured query filters. We introduce the relationship search operator that exploits the link structure between data objects to rank objects related to the result of a filter. We implement the search operator using the ObjectRank [1] algorithm that uses the random surfer model. We study several alternatives for constructing summary graphs for query results that consist of individual and aggregate nodes that are somehow linked to qualifying result nodes. Some of the summary graphs are useful for presenting query results to the user, while others could be used to evaluate subsequent queries efficiently without considering all the nodes and links in the original data graph.