CP-index: on the efficient indexing of large graphs

  • Authors:
  • Yan Xie; Philip S. Yu

  • Affiliations:
  • University of Illinois at Chicago, Chicago, IL, USA; University of Illinois at Chicago, Chicago, IL, USA

  • Venue:
  • Proceedings of the 20th ACM International Conference on Information and Knowledge Management
  • Year:
  • 2011

Abstract

Graph search, i.e., finding all graphs in a database D that contain the query graph q, is a classical primitive prevalent in various graph database applications. An abundance of past studies has been devoted to this topic; however, the recent emergence of large information networks poses new challenges to the research community. Most traditional graph search schemes rely on graph-feature-based indexing, but the index construction step, which often involves frequent subgraph mining, becomes a bottleneck for large graphs because of its high computational complexity. Although several methods have been proposed to relieve this mining bottleneck, such as summarization of database graphs, the frequent subgraphs thus generated as indexing features remain unsatisfactory: the feature set is in general both deficient for the large-graph scenario and burdened with many redundant features. Furthermore, the large size of the graphs makes it too easy for a small feature to be contained in many of them, which severely reduces its selectivity and pruning power. Motivated by these issues, in this paper we propose a novel index, CP-Index (Contact Preservation), for efficient indexing of large graphs. To overcome the low selectivity issue, we reap further pruning opportunities by leveraging each feature's location information in the database graphs. Specifically, we look at how features touch each other in the query and check whether this contact pattern is preserved in the target graphs. Then, to tackle the deficiency and redundancy problems associated with features, we introduce new feature generation and selection methods, such as dual feature generation and size-increasing bootstrapping feature selection, to complete our design. Experimental results show that CP-Index is much more effective in indexing large graphs.
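
The contact-preservation idea can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not define "contact" precisely, so the code assumes that two features are in contact when some embedding of one shares a vertex with some embedding of the other. It also assumes that feature embeddings (sets of vertex ids) are already available for the query and for every database graph. The names `in_contact`, `contact_pairs`, and `filter_candidates` are hypothetical.

```python
from itertools import combinations


def in_contact(emb_a, emb_b):
    """Assumed definition: two embeddings touch if they share a vertex."""
    return not emb_a.isdisjoint(emb_b)


def contact_pairs(embeddings):
    """Return the feature pairs that have at least one touching pair of embeddings.

    embeddings: dict mapping feature id -> list of frozensets of vertex ids.
    """
    pairs = set()
    for f, g in combinations(sorted(embeddings), 2):
        if any(in_contact(a, b) for a in embeddings[f] for b in embeddings[g]):
            pairs.add((f, g))
    return pairs


def filter_candidates(query_embeddings, database_embeddings):
    """Keep graphs that contain every query feature and preserve every query contact."""
    required_features = {f for f, embs in query_embeddings.items() if embs}
    required_contacts = contact_pairs(query_embeddings)
    candidates = []
    for graph_id, embs in database_embeddings.items():
        present = {f for f, e in embs.items() if e}
        if not required_features <= present:
            continue  # classical feature-containment pruning
        if required_contacts <= contact_pairs(embs):
            candidates.append(graph_id)  # contact pattern preserved
    return candidates


# Toy data (hypothetical): features f1 and f2 touch at vertex 1 in the query.
query = {"f1": [frozenset({0, 1})], "f2": [frozenset({1, 2})]}
database = {
    "G1": {"f1": [frozenset({10, 11})], "f2": [frozenset({11, 12})]},  # touch
    "G2": {"f1": [frozenset({1, 2})], "f2": [frozenset({7, 8})]},      # apart
    "G3": {"f1": [frozenset({3, 4})]},                                 # f2 missing
}
kept = filter_candidates(query, database)
print(kept)  # prints ['G1']
```

In this toy example, G2 contains both features and would survive plain feature-based filtering, yet it is pruned because its embeddings of f1 and f2 never touch. Under the shared-vertex assumption the filter only discards graphs that cannot contain q, so the surviving candidates would still need a subgraph isomorphism check.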