Scalable mining of large disk-based graph databases

Authors:
Chen Wang;Wei Wang;Jian Pei;Yongtai Zhu;Baile Shi
Affiliations:
Fudan University, China;Fudan University, China;State University of New York at Buffalo, NY & Simon Fraser University, Canada;Fudan University, China;Fudan University, China
Venue:
Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining
Year:
2004

Abstract

Mining frequent structural patterns from graph databases is an interesting problem with broad applications. Most previous studies focus on effectively pruning unfruitful search subspaces, but few address mining on large, disk-based databases. Because many graph databases in real applications cannot be held in main memory, scalable mining of large, disk-based graph databases remains a challenging problem. In this paper, we develop an effective index structure, ADI (for adjacency index), to support mining various graph patterns over large databases that cannot be held in main memory. The index is simple and efficient to build. Moreover, the new index structure can be easily adopted in various existing graph pattern mining algorithms. As an example, we adapt the well-known gSpan algorithm to use the ADI structure. The experimental results show that the new index structure enables scalable graph pattern mining over large databases. In one set of experiments, the new disk-based method can mine graph databases with one million graphs, while the original gSpan algorithm can only handle databases of up to 300 thousand graphs. Moreover, our new method is faster than gSpan when both can run in main memory.
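
The abstract does not describe the internal layout of ADI, so the following is only a rough illustration of the general idea of indexing a graph database by its labeled edges, not the authors' structure. It assumes a hypothetical toy format in which each graph is a pair of vertex labels and `(u, v, edge_label)` triples, and it keeps everything in memory for brevity; the names `edge_key`, `build_edge_index`, and `frequent_edges` are invented for this sketch. A disk-based index would presumably store the per-edge graph lists in blocks on disk rather than in Python sets.

```python
from collections import defaultdict


def edge_key(label_u, edge_label, label_v):
    """Canonical key for an undirected labeled edge (endpoint labels sorted)."""
    a, b = sorted((label_u, label_v))
    return (a, edge_label, b)


def build_edge_index(graphs):
    """Map each distinct labeled edge to the set of graph ids that contain it.

    `graphs` is a list of (vertex_labels, edges) pairs, where `edges` holds
    (u, v, edge_label) triples over vertex positions. This format is an
    assumption made for the sketch, not something taken from the paper.
    """
    index = defaultdict(set)
    for gid, (vertex_labels, edges) in enumerate(graphs):
        for u, v, edge_label in edges:
            key = edge_key(vertex_labels[u], edge_label, vertex_labels[v])
            index[key].add(gid)
    return index


def frequent_edges(index, min_support):
    """Return edges whose support (number of containing graphs) meets the threshold."""
    return {key: len(gids) for key, gids in index.items() if len(gids) >= min_support}


# Three small molecule-like graphs used only as example data.
graphs = [
    (["C", "C", "O"], [(0, 1, "single"), (1, 2, "double")]),
    (["C", "O"], [(0, 1, "double")]),
    (["C", "C", "N"], [(0, 1, "single"), (1, 2, "single")]),
]

index = build_edge_index(graphs)
result = frequent_edges(index, min_support=2)
print(result)
```

With such an index, a pattern-growth miner can look up which graphs contain a given edge instead of scanning every graph in the database, which is the kind of access pattern that matters once the database no longer fits in memory.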