A case study of parallel I/O for biological sequence search on Linux clusters

  • Authors:
  • Yifeng Zhu; Hong Jiang; Xiao Qin; David Swanson

  • Affiliations:
  • Department of Electrical and Computer Engineering, University of Maine, Orono, ME, USA; Department of Computer Science and Engineering, University of Nebraska-Lincoln, NE, USA; Department of Computer Science, New Mexico Institute of Mining and Technology, Socorro, NM, USA; Department of Computer Science and Engineering, University of Nebraska-Lincoln, NE, USA

  • Venue:
  • International Journal of High Performance Computing and Networking
  • Year:
  • 2004

Abstract

In this work, we investigate parallel I/O efficiency in parallelised BLAST, the most popular tool for similarity search in biological databases, and implement two variations of it that incorporate the PVFS and CEFT-PVFS parallel I/O facilities. Our goal is to study the performance gain from parallel I/O under the constraint of different numbers of commodity storage devices in a Linux cluster. We also evaluate two read-performance optimisation techniques employed in CEFT-PVFS: (1) doubling the degree of parallelism, which is shown to give read performance comparable to that of PVFS when both systems have the same number of servers; and (2) skipping hot-spot nodes, which can reduce the performance penalty when I/O workloads are highly imbalanced. I/O resource contention among multiple applications running in the same cluster can degrade the performance of the original parallel BLAST and of the PVFS version by up to 10-fold and 21-fold, respectively, whereas the CEFT-PVFS version, which can skip hot-spot nodes, suffers only a two-fold degradation.
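The two read optimisations named above can be illustrated with a minimal Python sketch. It assumes, beyond what the abstract states, that CEFT-PVFS keeps a mirrored copy of each data stripe on a second group of servers and that a per-server load figure is available to the client. The names `Server`, `plan_reads`, and `hot_threshold` are hypothetical and are not part of the PVFS or CEFT-PVFS interfaces. Alternating successive stripes of each primary/mirror pair spreads a read over twice as many servers, and a copy whose server exceeds the load threshold is skipped in favour of its mirror.

```python
from dataclasses import dataclass


@dataclass
class Server:
    name: str
    load: float  # hypothetical load metric, e.g. recent I/O queue depth


def plan_reads(num_stripes, primary, mirror, hot_threshold):
    """Assign each stripe to one server (hypothetical sketch).

    Assumes stripe i is stored on primary[i % n] and mirrored on mirror[i % n].
    Returns a list of (stripe_index, server_name) pairs.
    """
    n = len(primary)
    if n != len(mirror):
        raise ValueError("primary and mirror groups must be the same size")
    plan = []
    for i in range(num_stripes):
        p, m = primary[i % n], mirror[i % n]
        # Doubling parallelism: successive stripes of the same pair alternate
        # between the primary and mirror copies, so all 2n servers are used.
        preferred, other = (p, m) if (i // n) % 2 == 0 else (m, p)
        # Hot-spot skipping: if the preferred server is overloaded and its
        # mirror is not, read this stripe from the mirror instead.
        if preferred.load > hot_threshold and other.load <= hot_threshold:
            preferred = other
        plan.append((i, preferred.name))
    return plan


if __name__ == "__main__":
    primary = [Server("p0", 0.0), Server("p1", 0.0)]
    mirror = [Server("m0", 0.0), Server("m1", 0.0)]
    print(plan_reads(4, primary, mirror, hot_threshold=1.0))
    primary[0].load = 5.0  # p0 becomes a hot spot
    print(plan_reads(4, primary, mirror, hot_threshold=1.0))
```

With balanced loads, the four stripes land on `p0`, `p1`, `m0`, and `m1`; once `p0` is marked hot, its stripe is redirected to `m0`. A real system would measure load dynamically and could also split individual stripes, so this only shows the shape of the policy.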