CP-index: on the efficient indexing of large graphs

Authors:
Yan Xie;Philip S. Yu
Affiliations:
University of Illinois at Chicago, chicago, IL, USA;University of Illinois at Chicago, Chicago, IL, USA
Venue:
Proceedings of the 20th ACM international conference on Information and knowledge management
Year:
2011

Citing 27
Cited 0

Complete Mining of Frequent Patterns from Graphs: Mining Graph Data

Machine Learning
Frequent Subgraph Discovery

ICDM '01 Proceedings of the 2001 IEEE International Conference on Data Mining
Sampling Large Databases for Association Rules

VLDB '96 Proceedings of the 22th International Conference on Very Large Data Bases
gSpan: Graph-Based Substructure Pattern Mining

ICDM '02 Proceedings of the 2002 IEEE International Conference on Data Mining
Graph indexing: a frequent structure-based approach

SIGMOD '04 Proceedings of the 2004 ACM SIGMOD international conference on Management of data
On mining cross-graph quasi-cliques

Proceedings of the eleventh ACM SIGKDD international conference on Knowledge discovery in data mining
Bidirectional expansion for keyword search on graph databases

VLDB '05 Proceedings of the 31st international conference on Very large data bases
Finding Frequent Patterns in a Large Sparse Graph*

Data Mining and Knowledge Discovery
To randomize or not to randomize: space optimal summaries for hyperlink analysis

Proceedings of the 15th international conference on World Wide Web
NeMoFinder: dissecting genome-wide protein-protein interactions with meso-scale network motifs

Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining
XSKETCH synopses for XML data graphs

ACM Transactions on Database Systems (TODS)
Fg-index: towards verification-free query processing on graph databases

Proceedings of the 2007 ACM SIGMOD international conference on Management of data
Mining specifications of malicious behavior

Proceedings of the the 6th joint meeting of the European software engineering conference and the ACM SIGSOFT symposium on The foundations of software engineering
Graph indexing: tree + delta

VLDB '07 Proceedings of the 33rd international conference on Very large data bases
A novel spectral coding in a large graph database

EDBT '08 Proceedings of the 11th international conference on Extending database technology: Advances in database technology
Fast computing reachability labelings for large graphs with high compression rate

EDBT '08 Proceedings of the 11th international conference on Extending database technology: Advances in database technology
Graphs-at-a-time: query language and access methods for graph databases

Proceedings of the 2008 ACM SIGMOD international conference on Management of data
CSV: visualizing and mining cohesive subgraphs

Proceedings of the 2008 ACM SIGMOD international conference on Management of data
ORIGAMI: Mining Representative Orthogonal Graph Patterns

ICDM '07 Proceedings of the 2007 Seventh IEEE International Conference on Data Mining
Taming verification hardness: an efficient algorithm for testing subgraph isomorphism

Proceedings of the VLDB Endowment
Efficient Correlation Search from Graph Databases

IEEE Transactions on Knowledge and Data Engineering
GADDI: distance index based subgraph matching in biological networks

Proceedings of the 12th International Conference on Extending Database Technology: Advances in Database Technology
Fast Graph Pattern Matching

ICDE '08 Proceedings of the 2008 IEEE 24th International Conference on Data Engineering
Mining graph patterns efficiently via randomized summaries

Proceedings of the VLDB Endowment
Graph pattern matching: from intractable to polynomial time

Proceedings of the VLDB Endowment
On graph query optimization in large networks

Proceedings of the VLDB Endowment
iGraph: a framework for comparisons of disk-based graph indexing techniques

Proceedings of the VLDB Endowment

Quantified Score

Hi-index	0.00

Visualization

Abstract

Graph search, i.e., finding all graphs in a database D that contain the query graph q, is a classical primitive prevalent in various graph database applications. In the past, there has been an abundance of studies devoting to this topic; however, with the recent emergence of large information networks, it places new challenges to the research community. Most of the traditional graph search schemes utilize the strategy of graph feature based indexing, whereas the index construction step that often involves frequent subgraph mining becomes a bottleneck for large graphs due to the high computational complexity. Although there have been several methods proposed to solve this mining bottleneck such as summarization of database graphs, the frequent subgraphs thus generated as indexing features are still unsatisfactory because the feature set is in general not only inadequate or deficient for the large graph scenario, but also with many redundant features. Furthermore, the large size of the graphs makes it too easy for a small feature to be contained in many of them, severely impacting its selectivity and pruning power. Motivated by all the above issues we identify, in this paper we propose a novel CP-Index (Contact Preservation) for efficient indexing of large graphs. To overcome the low selectivity issue, we reap further pruning opportunities by leveraging each feature's location information in the database graphs. Specifically, we look at how features are touching upon each other in the query, and check whether this contact pattern is preserved in the target graphs. Then, to tackle the deficiency and redundancy problems associated with features, new feature generation and selection methods such as dual feature generation and size-increasing bootstrapping feature selection are introduced to complete our design. Experiment results show that CP-Index is much more effective in indexing large graphs.