Autotuned parallel I/O for highly scalable biosequence analysis

Authors:
Haihang You (National Institute for Computational Science, Oak Ridge, TN)
Bhanu Rekapalli (National Institute for Computational Science, Oak Ridge, TN)
Qing Liu (National Institute for Computational Science, Oak Ridge, TN)
Shirley Moore (University of Tennessee, Knoxville, TN)
Venue:
Proceedings of the 2011 TeraGrid Conference: Extreme Digital Discovery
Year:
2011

Abstract

In recent years, the rate of genomic sequence generation has increased dramatically due to significant advances in sequencing technology. Genomic data is accumulating at an exponential rate in databases around the world, and rapid analysis techniques will enhance knowledge discovery in medicine and biotechnology. Analysis of such growing sequence databases demands tremendous computational power that can only be provided by massively parallel computers. Improving the performance and scalability of bioinformatics tools thus becomes a critical step in the quest to transform ever-growing raw genomic data into biological knowledge. In this paper we describe an efficient parallel implementation of a profile hidden Markov model (profile HMM) code used for protein domain identification, along with auto-tuned parallel I/O optimization. Experimental results show linear speedup with increasing numbers of computing cores on a supercomputer, allowing domain identification for millions of proteins in a few minutes using hundreds of thousands of computing cores.