Genome-Scale Computational Approaches to Memory-Intensive Applications in Systems Biology

Authors:
Yun Zhang;Faisal N. Abu-Khzam;Nicole E. Baldwin;Elissa J. Chesler;Michael A. Langston;Nagiza F. Samatova
Affiliations:
University of Tennessee, Knoxville;Lebanese American University, Chouran, Beirut;Oak Ridge National Laboratory, TN;University of Tennessee, Memphis;University of Tennessee, Knoxville;Oak Ridge National Laboratory
Venue:
SC '05 Proceedings of the 2005 ACM/IEEE conference on Supercomputing
Year:
2005

Abstract

Graph-theoretical approaches to biological network analysis have proven effective for small networks but are computationally infeasible for comprehensive, genome-scale, systems-level elucidation of these networks. The difficulty lies in the NP-hard nature of many global systems biology problems, which in practice translates to exponential (or worse) run times for finding exact optimal solutions. Moreover, these problems, especially those of an enumerative flavor, are often memory-intensive and must share very large data sets effectively across many processors. For example, the enumeration of maximal cliques, a core component of gene expression network analysis, cis-regulatory motif finding, and the study of quantitative trait loci for high-throughput molecular phenotypes, can produce as many as 3^(n/3) maximal cliques for a graph with n vertices. The memory required to store those cliques reaches terabyte scales even for modest-sized genomes. Emerging hardware architectures with ultra-large globally addressable memory, such as the SGI Altix and Cray X1, seem well suited to these types of data-intensive problems in systems biology. This paper presents a novel framework that provides exact, parallel, and scalable solutions for various graph-theoretical approaches to genome-scale elucidation of biological networks. The framework takes advantage of these large-memory architectures by creating globally addressable bitmap memory indices with potentially high compression rates, fast bitwise-logical operations, and a reduced search space. Augmented with recent theoretical advances based on fixed-parameter tractability, the framework delivers computationally feasible performance on genome-scale combinatorial problems in systems biology.
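
To make the bitmap idea concrete, the following is a minimal single-process sketch, not the framework described in the abstract. It assumes each vertex's neighborhood is stored as a bitmap (here a Python integer), so that intersecting candidate sets is a single bitwise AND. The enumeration itself is the classic Bron-Kerbosch procedure with pivoting, swapped in purely for illustration. All function names (adjacency_bitmaps, bits, maximal_cliques, moon_moser_edges) are hypothetical. The test graph is the complete multipartite graph with parts of size 3, which attains the 3^(n/3) bound on the number of maximal cliques.

```python
def adjacency_bitmaps(n, edges):
    """Return adj, where bit v of adj[u] is set iff u and v are adjacent."""
    adj = [0] * n
    for u, v in edges:
        if u != v:  # ignore self-loops
            adj[u] |= 1 << v
            adj[v] |= 1 << u
    return adj


def bits(mask):
    """Yield the indices of the set bits in mask, lowest first."""
    while mask:
        low = mask & -mask           # isolate the lowest set bit
        yield low.bit_length() - 1
        mask ^= low


def maximal_cliques(adj):
    """Enumerate all maximal cliques (Bron-Kerbosch with pivoting).

    r, p, x are bitmaps: the current clique, the candidates that can still
    extend it, and the vertices already tried (used to reject non-maximal
    cliques). Every set intersection is one bitwise AND.
    """
    n = len(adj)
    found = []

    def expand(r, p, x):
        if p == 0 and x == 0:
            found.append(list(bits(r)))  # r cannot be extended: maximal
            return
        # Pivot on the vertex with the most neighbors among the candidates.
        pivot = max(bits(p | x), key=lambda u: bin(p & adj[u]).count("1"))
        for v in bits(p & ~adj[pivot]):
            vbit = 1 << v
            expand(r | vbit, p & adj[v], x & adj[v])
            p &= ~vbit               # v has been fully explored
            x |= vbit

    expand(0, (1 << n) - 1, 0)
    return found


def moon_moser_edges(n):
    """Edges of the complete multipartite graph with n/3 parts of size 3."""
    return [(u, v) for u in range(n) for v in range(u + 1, n)
            if u // 3 != v // 3]


cliques = maximal_cliques(adjacency_bitmaps(9, moon_moser_edges(9)))
print(len(cliques))  # 27, i.e. 3^(9/3)
```

Even at n = 9 the output already holds 27 cliques, and the count triples with every three added vertices on this family of graphs, which is why storage rather than arithmetic becomes the limiting resource for enumerative problems of this kind.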