Graph search, i.e., finding all graphs in a database D that contain a query graph q, is a classical primitive in many graph database applications. Many studies have been devoted to this topic; however, the recent emergence of large information networks poses new challenges to the research community. Most traditional graph search schemes rely on graph-feature-based indexing, but the index construction step, which often involves frequent subgraph mining, becomes a bottleneck for large graphs because of its high computational complexity. Although several methods have been proposed to relieve this mining bottleneck, such as summarization of the database graphs, the frequent subgraphs they generate are still unsatisfactory as indexing features: the feature set is generally inadequate for the large-graph scenario and also contains many redundant features. Furthermore, the large size of the graphs makes it too easy for a small feature to be contained in many of them, which severely reduces its selectivity and pruning power. Motivated by these issues, in this paper we propose a novel index, CP-Index (Contact Preservation), for efficient indexing of large graphs. To overcome low selectivity, we gain further pruning opportunities by leveraging each feature's location information in the database graphs. Specifically, we look at how features touch each other in the query and check whether this contact pattern is preserved in the target graphs. Then, to tackle the deficiency and redundancy of features, we introduce new feature generation and selection methods, such as dual feature generation and size-increasing bootstrapping feature selection, to complete our design. Experimental results show that CP-Index is much more effective in indexing large graphs.
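The contact-preservation idea can be illustrated with a minimal sketch. This is not the paper's algorithm. It rests on assumptions made only for illustration: each feature occurrence is represented by the set of vertex ids it covers, two occurrences "touch" when they share at least one vertex, and the function names `touching_pairs` and `passes_contact_filter` are hypothetical. Under these assumptions, a target graph can be pruned if it contains all query features but some pair of features that touch in the query never touch in the target.

```python
from itertools import combinations

def touching_pairs(occurrences):
    """Return the set of feature-id pairs that touch somewhere in a graph.

    `occurrences` maps a feature id to a list of vertex-id sets, one set
    per occurrence of that feature (an assumed, simplified representation).
    Two occurrences are treated as touching when they share a vertex.
    """
    items = [(fid, frozenset(vertices))
             for fid, occs in occurrences.items()
             for vertices in occs]
    pairs = set()
    for (f1, v1), (f2, v2) in combinations(items, 2):
        if v1 & v2:
            pairs.add(frozenset((f1, f2)))
    return pairs

def passes_contact_filter(query_occ, target_occ):
    """Keep the target as a candidate only if it contains every query
    feature and every contact seen in the query also appears in it."""
    if not set(query_occ) <= set(target_occ):
        return False  # plain feature-containment filter
    return touching_pairs(query_occ) <= touching_pairs(target_occ)

# Hypothetical data: features "A" and "B" share vertex 2 in the query.
query_occ = {"A": [{1, 2}], "B": [{2, 3}]}
# Target 1 has an occurrence of "A" and one of "B" sharing vertex 11.
target_touching = {"A": [{10, 11}], "B": [{11, 12}, {20, 21}]}
# Target 2 contains both features, but they never share a vertex.
target_apart = {"A": [{10, 11}], "B": [{20, 21}]}

print(passes_contact_filter(query_occ, target_touching))
print(passes_contact_filter(query_occ, target_apart))
```

The second target survives a plain feature-containment check yet is rejected here, which is the kind of extra pruning that location information makes possible. Like any filter of this type, it only narrows the candidate set; surviving graphs would still need a subgraph isomorphism test.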