Link prediction for annotation graphs using graph summarization

  • Authors: Andreas Thor; Philip Anderson; Louiqa Raschid; Saket Navlakha; Barna Saha; Samir Khuller; Xiao-Ning Zhang

  • Affiliations: University of Maryland; University of Maryland; University of Maryland; University of Maryland; University of Maryland; University of Maryland; St. Bonaventure University

  • Venue: ISWC'11 Proceedings of the 10th international conference on The semantic web - Volume Part I
  • Year: 2011

Abstract

Annotation graph datasets are a natural representation of scientific knowledge. They are common in the life sciences, where genes or proteins are annotated with controlled vocabulary terms (CV terms) from ontologies. The W3C Linking Open Data (LOD) initiative and Semantic Web technologies are playing a leading role in making such datasets widely available. Scientists can mine these datasets to discover patterns of annotation. While ontology alignment and integration across datasets have been explored in the context of the Semantic Web, no existing approach mines such patterns in annotation graph datasets. In this paper, we propose a novel approach for link prediction, a preliminary task for discovering more complex patterns. Our prediction is based on a complementary methodology of graph summarization (GS) and dense subgraphs (DSG). GS can exploit and summarize knowledge captured within the ontologies and in the annotation patterns. DSG uses the ontology structure, in particular the distance between CV terms, to filter the graph and find promising subgraphs. We develop a scoring function based on multiple heuristics to rank the predictions. We perform an extensive evaluation on Arabidopsis thaliana genes.
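
The general intuition behind predicting links from dense regions of an annotation graph can be illustrated with a toy sketch. This is not the paper's GS or DSG algorithm or its scoring function: the function name `predict_links`, the `min_density` threshold, the density-based score, and the gene and term labels are all hypothetical, chosen only to show how a nearly complete block of gene-to-CV-term edges suggests its missing edges as candidate annotations.

```python
from itertools import product


def predict_links(annotations, gene_group, term_group, min_density=0.5):
    """Toy predictor (illustrative assumption, not the paper's method).

    Treats gene_group x term_group as one block of a bipartite annotation
    graph. If the block is dense enough, every absent gene-term edge in it
    is returned as a prediction, scored by the block density.
    """
    pairs = list(product(sorted(gene_group), sorted(term_group)))
    present = [(g, t) for g, t in pairs if t in annotations.get(g, set())]
    density = len(present) / len(pairs) if pairs else 0.0
    if density < min_density:
        return []  # block too sparse to support any prediction
    return [(g, t, density) for g, t in pairs
            if t not in annotations.get(g, set())]


# Hypothetical annotations: each gene maps to the CV terms annotating it.
annotations = {
    "geneA": {"term1", "term2", "term3"},
    "geneB": {"term1", "term2", "term3"},
    "geneC": {"term1", "term3"},
}

predictions = predict_links(
    annotations,
    gene_group={"geneA", "geneB", "geneC"},
    term_group={"term1", "term2", "term3"},
)
for gene, term, score in predictions:
    print(gene, term, round(score, 3))
```

In this toy block, eight of the nine possible gene-term edges are present, so the single missing edge (geneC, term2) is returned as the candidate link. A fuller treatment along the lines of the abstract would also use ontology distance between CV terms to decide which terms belong in the same block, which this sketch takes as given.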