High performance subgraph mining in molecular compounds

Authors:
Giuseppe Di Fatta;Michael R. Berthold
Affiliations:
Dept. of Computer and Information Science, University of Konstanz, Konstanz, Germany;Dept. of Computer and Information Science, University of Konstanz, Konstanz, Germany
Venue:
HPCC'05 Proceedings of the First International Conference on High Performance Computing and Communications
Year:
2005

Abstract

Structured data represented as graphs arises in several fields of science, and the growing amount of available data makes distributed graph mining techniques particularly relevant. In this paper, we present a distributed approach to the frequent subgraph mining problem for discovering interesting patterns in molecular compounds. The problem is characterized by a highly irregular search tree for which no reliable workload prediction is available. We describe the three main aspects of the proposed distributed algorithm: a dynamic partitioning of the search space, a distribution process based on a peer-to-peer communication framework, and a novel receiver-initiated load-balancing algorithm. The effectiveness of the distributed method has been evaluated on the well-known National Cancer Institute’s HIV-screening dataset, where the approach attains close-to-linear speedup in a network of workstations.
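
The abstract does not spell out the algorithm, so the following is only a toy, single-process sketch of the general idea of receiver-initiated load balancing over an irregular search tree. It is not the authors' method. The tree generator `children`, the round-based scheduler `simulate`, the ring-order polling, and the "donate the older half" rule are all assumptions made for illustration. An idle worker (the receiver) asks its peers for work, and a busy peer splits its pending search nodes on demand, which is one simple form of dynamic partitioning.

```python
from collections import deque

def children(node, max_depth=8):
    """Hypothetical irregular search tree: fan-out depends on the node itself."""
    depth, key = node
    if depth >= max_depth:
        return []
    fanout = (key * 7 + depth) % 4  # 0 to 3 children, so subtree sizes vary widely
    return [(depth + 1, key * 4 + i + 1) for i in range(fanout)]

def count_sequential(root):
    """Reference result: plain depth-first traversal by a single worker."""
    stack, visited = [root], 0
    while stack:
        node = stack.pop()
        visited += 1
        stack.extend(children(node))
    return visited

def simulate(root, n_workers=4):
    """Round-based simulation; each busy worker expands one node per round."""
    queues = [deque() for _ in range(n_workers)]
    queues[0].append(root)  # all work starts on worker 0
    visited = [0] * n_workers
    rounds = 0
    while any(queues):
        rounds += 1
        for w in range(n_workers):
            if not queues[w]:
                # Receiver-initiated: the idle worker polls its peers in ring order.
                for step in range(1, n_workers):
                    donor = queues[(w + step) % n_workers]
                    if len(donor) > 1:
                        # The donor gives away the older half of its pending nodes
                        # and always keeps at least one for itself.
                        for _ in range(len(donor) // 2):
                            queues[w].append(donor.popleft())
                        break
                continue  # the transfer costs the receiver this round
            node = queues[w].pop()  # depth-first: expand the newest node
            visited[w] += 1
            queues[w].extend(children(node))
    return visited, rounds

root = (0, 1)
visited, rounds = simulate(root)
print("nodes per worker:", visited)
print("rounds:", rounds, "sequential node count:", count_sequential(root))
```

Donating the oldest pending nodes is a common heuristic, since nodes nearer the root tend to carry larger subtrees, but in an irregular tree this is not guaranteed. That uncertainty is why the sketch redistributes work on demand rather than relying on an up-front partition.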