Semantics-based distributed I/O for mpiBLAST

Authors:
Pavan Balaji;Wu-chun Feng;Jeremy Archuleta;Heshan Lin;Rajkumar Kettimuthu;Rajeev Thakur;Xiaosong Ma
Affiliations:
Argonne National Laboratory, Argonne, IL, USA;Virginia Tech, Blacksburg, VA, USA;Virginia Tech, Blacksburg, VA, USA;North Carolina State University, Raleigh, NC, USA;Argonne National Laboratory, Argonne, IL, USA;Argonne National Laboratory, Argonne, IL, USA;North Carolina State University, Raleigh, NC, USA
Venue:
Proceedings of the 13th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming
Year:
2008

Abstract

BLAST is a widely used software toolkit for genomic sequence search. mpiBLAST is a freely available, open-source parallelization of BLAST that uses database segmentation so that different worker processes can search distinct segments of the database in parallel. After searching, the workers write their output to a filesystem. While mpiBLAST has been shown to achieve high performance on clusters with fast local filesystems, its I/O processing remains a scalability concern, especially on systems with limited I/O capabilities, such as distributed filesystems spread across a wide-area network. We therefore present ParaMEDIC, a novel environment that uses application-specific semantic information to compress I/O data and improve performance in distributed environments. Specifically, for mpiBLAST, ParaMEDIC partitions worker processes into compute workers and I/O workers. Instead of writing their output directly to the filesystem, compute workers process it using semantic knowledge about the application to generate metadata, and write only that metadata to the filesystem. I/O workers, which physically reside closer to the actual storage, then process this metadata to re-create the actual output and write it to the filesystem. This approach allows ParaMEDIC to reduce I/O time, accelerating mpiBLAST by as much as 25-fold.
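
The compute-worker / I/O-worker split described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: `search` is a toy substring match rather than BLAST, the metadata is assumed to be just the IDs of the matching sequences, and `compute_worker` and `io_worker` are hypothetical names, not part of mpiBLAST or ParaMEDIC. The sketch only shows the general idea that a small, semantically meaningful summary can stand in for the full output and be expanded again near the storage.

```python
# Toy stand-in for a sequence search; this is NOT BLAST or the mpiBLAST API.
def search(query, database):
    """Return one report line per database sequence that contains the query."""
    return [
        f"{seq_id}\t{seq.find(query)}\t{seq}"
        for seq_id, seq in sorted(database.items())
        if query in seq
    ]


def compute_worker(query, database):
    """Search the full database but emit only compact metadata.

    Assumption: the metadata is the list of matching sequence IDs.
    """
    full_output = search(query, database)
    return [line.split("\t")[0] for line in full_output]


def io_worker(query, database, metadata):
    """Re-create the full output by searching only the sequences named in the metadata."""
    subset = {seq_id: database[seq_id] for seq_id in metadata}
    return search(query, subset)


# Hypothetical example data.
database = {
    "seq1": "ACGTACGTGG",
    "seq2": "TTTTCCCCAA",
    "seq3": "GGACGTTTAA",
}
query = "ACGT"

metadata = compute_worker(query, database)        # small; sent across the slow link
recreated = io_worker(query, database, metadata)  # expanded close to the storage
direct = search(query, database)                  # what a worker would have written directly
```

In this toy setting the metadata is much smaller than the report it stands for, and the I/O worker only has to search the sequences the metadata names. The cost is some repeated computation on the I/O side in exchange for less data crossing the slow link.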