Grid Approach to Embarrassingly Parallel CPU-Intensive Bioinformatics Problems

Authors:
Heinz Stockinger;Marco Pagni;Lorenzo Cerutti;Laurent Falquet
Affiliations:
Swiss Institute of Bioinformatics, Vital-IT, Switzerland;Swiss Institute of Bioinformatics, Vital-IT, Switzerland;Swiss Institute of Bioinformatics, Vital-IT, Switzerland;Swiss Institute of Bioinformatics, Vital-IT, Switzerland
Venue:
E-SCIENCE '06 Proceedings of the Second IEEE International Conference on e-Science and Grid Computing
Year:
2006

Abstract

Bioinformatics algorithms such as sequence alignment methods based on profile-HMMs (Hidden Markov Models) are popular but CPU-intensive. When large amounts of data are processed, a single computer often runs for many hours or even days. High-performance infrastructures such as clusters or computational Grids can speed up the process by distributing the workload to remote nodes and running parts of it in parallel. However, biologists often do not have access to such hardware systems. We therefore propose a new system that uses a modern Grid approach to optimise an embarrassingly parallel problem. Provided that a powerful, worldwide distributed Grid infrastructure is available, we achieve speedups of at least two orders of magnitude. For large-scale problems, our method can outperform algorithms designed for mid-size clusters, even when the additional latencies imposed by Grid infrastructures are taken into account.
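The following Python sketch illustrates the general scatter/gather pattern behind an embarrassingly parallel search. It is not the authors' system. A sequence database is split into independent chunks, each chunk is scored without reference to any other, and the surviving hits are merged. The record format, the threshold, and `score_sequence` are hypothetical placeholders; a real work unit would run a profile-HMM search tool on its chunk. The `estimated_speedup` helper uses a deliberately simple model, S = T / (T/N + L), where T is the sequential runtime, N the number of workers, and L a fixed overhead standing in for Grid submission and scheduling latency. This model is an assumption for illustration, not taken from the paper.

```python
"""Sketch: chunk a sequence database, score chunks independently, merge hits.

Assumptions (not from the paper): records are (id, residues) pairs, and
score_sequence is a hypothetical stand-in for a profile-HMM search tool.
"""
from typing import Callable, List, Sequence, Tuple

Record = Tuple[str, str]   # (sequence id, residue string)
Hit = Tuple[str, float]    # (sequence id, score)


def split_into_chunks(records: Sequence[Record], n_chunks: int) -> List[List[Record]]:
    """Split records into at most n_chunks contiguous, near-equal chunks."""
    if n_chunks < 1:
        raise ValueError("n_chunks must be >= 1")
    size, extra = divmod(len(records), n_chunks)
    chunks: List[List[Record]] = []
    start = 0
    for i in range(n_chunks):
        # The first `extra` chunks take one additional record each.
        end = start + size + (1 if i < extra else 0)
        if end > start:  # skip empty chunks when records < n_chunks
            chunks.append(list(records[start:end]))
        start = end
    return chunks


def score_sequence(residues: str) -> float:
    """Hypothetical placeholder score; a real job would run a profile-HMM search."""
    return residues.count("W") + 0.5 * residues.count("C")


def process_chunk(chunk: List[Record]) -> List[Hit]:
    """One independent work unit: it needs no data from any other chunk."""
    return [(seq_id, score_sequence(res)) for seq_id, res in chunk]


def run_embarrassingly_parallel(
    records: Sequence[Record],
    n_chunks: int,
    threshold: float,
    map_fn: Callable = map,
) -> List[Hit]:
    """Scatter chunks through map_fn, then gather and filter the hits.

    map_fn defaults to the built-in (sequential) map; passing
    ProcessPoolExecutor().map runs chunks in parallel on one machine.
    On a Grid, each chunk would instead be submitted as a separate job.
    """
    chunks = split_into_chunks(records, n_chunks)
    hits: List[Hit] = []
    for partial in map_fn(process_chunk, chunks):
        hits.extend(h for h in partial if h[1] >= threshold)
    # Best score first; ties broken by sequence id for a stable result.
    return sorted(hits, key=lambda h: (-h[1], h[0]))


def estimated_speedup(t_sequential: float, n_workers: int, overhead: float) -> float:
    """Simple model S = T / (T / N + L) with a fixed per-run overhead L."""
    return t_sequential / (t_sequential / n_workers + overhead)
```

Under this toy model, a 1000-hour sequential run spread over 1000 workers with 9 hours of fixed overhead gives S = 1000 / (1 + 9) = 100, about two orders of magnitude. A smaller problem with the same overhead gains far less. This is consistent with the abstract's point that Grid latency pays off mainly for large-scale problems.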