Summarization graph indexing: beyond frequent structure-based approach

  • Authors:
  • Lei Zou;Lei Chen;Huaming Zhang;Yansheng Lu;Qiang Lou

  • Affiliations:
  • Huazhong University of Science and Technology, Wuhan, China;Hong Kong of Science and Technology, Hong Kong, China;The University of Alabama in Huntsville, Huntsville, AL;Hong Kong of Science and Technology, Hong Kong, China;The Temple University

  • Venue:
  • DASFAA'08 Proceedings of the 13th international conference on Database systems for advanced applications
  • Year:
  • 2008

Quantified Score

Hi-index 0.00

Visualization

Abstract

Graph is an important data structure to model complex structural data, such as chemical compounds, proteins, and XML documents. Among many graph data-based applications, sub-graph search is a key problem, which is defined as given a query Q, retrieving all graphs containing Q as a sub-graph in the graph database. Most existing sub-graph search methods try to filter out false positives (graphs that are not possible in the results) as many as possible by indexing some frequent sub-structures in graph database, such as [20, 22, 4, 23]. However, due to ignoring the relationships between sub-structures, these methods still admit a high percentage of false positives. In this paper, we propose a novel concept, Summarization Graph, which is a complete graph and captures most topology information of the original graph, such as sub-structures and their relationships. Based on Summarization Graphs, we convert the filtering problem into retrieving objects with set-valued attributes. Moreover, we build an efficient signature file-based index to improve the filtering process. We prove theoretically that the pruning power of our method is larger than existing structure-based approaches. Finally, we show by extensive experimental study on real and synthetic data sets that the size of candidate set generated by Summarization Graph-based approach is only about 50% of that left by existing graph indexing methods, and the total response time of our method is reduced 2-10 times.