An iterative MapReduce approach to frequent subgraph mining in biological datasets

Authors:
Steven Hill;Bismita Srichandan;Rajshekhar Sunderraman
Affiliations:
University of Maryland, College Park;Georgia State University;Georgia State University
Venue:
Proceedings of the ACM Conference on Bioinformatics, Computational Biology and Biomedicine
Year:
2012


Abstract

Frequent subgraph mining has attracted considerable attention in areas such as bioinformatics, web data mining, and social network analysis. Many promising main-memory techniques exist, but they scale poorly because main memory is the bottleneck. For massive datasets, traditional database systems such as relational and object databases are also inefficient, because frequent subgraph mining is computationally intensive. Since Google introduced the MapReduce framework, a few researchers have applied the MapReduce model to mining frequent substructures in a single graph. In this paper, we propose an iterative approach based on the MapReduce programming model that scales frequent subgraph mining to a set of labeled graphs (transaction graphs). We tested our method on both real and synthetic datasets. To the best of our knowledge, this is the first attempt to mine frequent subgraphs from transaction graphs using the MapReduce model.
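
As a rough illustration of the transaction-graph setting described in the abstract, the following minimal sketch simulates one map/reduce round in plain Python. It counts, for each single labeled edge, the number of graphs in the set that contain it, and keeps those meeting a minimum support. This is not the authors' algorithm: the toy data in `GRAPHS`, the function names `map_phase`, `reduce_phase`, and `frequent_edges`, and the choice of single edges as the first-round patterns are all assumptions made for illustration. An iterative scheme would presumably extend the surviving patterns by one edge in each later round, which is not shown here.

```python
from collections import defaultdict

# Hypothetical toy input: graph id -> (vertex labels, labeled undirected edges).
GRAPHS = {
    "g1": ({1: "C", 2: "O", 3: "N"}, [(1, 2, "single"), (1, 3, "double")]),
    "g2": ({1: "C", 2: "O"}, [(1, 2, "single")]),
    "g3": ({1: "C", 2: "N", 3: "O"}, [(1, 2, "double"), (1, 3, "single")]),
}


def map_phase(graph_id, graph):
    """Emit (pattern, graph_id) for each labeled edge of one transaction graph."""
    labels, edges = graph
    for u, v, edge_label in edges:
        # Sort the endpoint labels so an undirected edge has one canonical key.
        a, b = sorted((labels[u], labels[v]))
        yield (a, edge_label, b), graph_id


def reduce_phase(pattern, graph_ids, min_support):
    """Support is the number of distinct graphs that contain the pattern."""
    support = len(set(graph_ids))
    return (pattern, support) if support >= min_support else None


def frequent_edges(graphs, min_support):
    """Simulate one map/shuffle/reduce round in memory (assumed driver)."""
    grouped = defaultdict(list)  # shuffle step: group mapper output by key
    for graph_id, graph in graphs.items():
        for key, value in map_phase(graph_id, graph):
            grouped[key].append(value)

    result = {}
    for pattern, graph_ids in grouped.items():
        kept = reduce_phase(pattern, graph_ids, min_support)
        if kept is not None:
            result[kept[0]] = kept[1]
    return result


if __name__ == "__main__":
    print(frequent_edges(GRAPHS, min_support=2))
```

Support is counted per graph rather than per occurrence, which is what distinguishes mining over a set of labeled graphs from the single-graph setting mentioned earlier in the abstract.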