Sequence analysis on a 216-processor Beowulf cluster

  • Authors:
  • Katerina Michalickova;Moyez Dharsee;Christopher W. V. Hogue

  • Affiliations:
  • Dept. of Biochemistry, University of Toronto and Samuel Lunenfeld Research Institute, Toronto, ON, Canada;Samuel Lunenfeld Research Institute, Toronto, ON, Canada;Dept. of Biochemistry, University of Toronto and Samuel Lunenfeld Research Institute, Toronto, ON, Canada

  • Venue:
  • ALS'00 Proceedings of the 4th annual Linux Showcase & Conference - Volume 4
  • Year:
  • 2000

Abstract

In this work we describe the implementation of a 216-processor Beowulf cluster with switched gigabit Ethernet networking. The design uses an 8-CPU high-performance midrange computer with 8 gigabit ports as the cluster head, which limits I/O contention. We have been developing application software for bioinformatics research in protein folding, as well as the MoBiDiCK system for managing cluster applications, which is extensible to general-purpose distributed computing. In addition to the cluster architecture, we present a new cluster application for bioinformatics: MOBLAST, a variant of the BLAST family of sequence comparison programs. MOBLAST performs the BLAST algorithm in an exhaustive manner, avoiding its initial heuristic approach to finding hits. This effectively slows BLAST down to approach the speed of other comprehensive search methods such as Smith-Waterman alignment, so MOBLAST requires a sizeable cluster to run. We describe the development of MOBLAST and its use in building an exhaustive M×N database of alignments, where M is the set of protein sequences with known 3-D structures and N is the set of all protein sequences. This M×N database of protein alignments will facilitate further research in protein folding, the ultimate aim of our work with Beowulf cluster technology. Furthermore, we describe a general algorithm for partitioning M×N problems and implement it in the MoBiDiCK computing model.
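
The idea of partitioning an M×N comparison problem can be illustrated with a minimal sketch. This is not the authors' algorithm or the MoBiDiCK implementation, whose details the abstract does not give. It assumes a simple fixed-size rectangular tiling with round-robin assignment, and the names `partition_mxn`, `assign_round_robin`, `block_m`, `block_n`, and `workers` are hypothetical.

```python
from itertools import product


def partition_mxn(m, n, block_m, block_n):
    """Yield (row_start, row_stop, col_start, col_stop) tiles covering an m x n grid.

    Rows stand for the M query sequences and columns for the N database
    sequences. Stops are exclusive; edge tiles may be smaller than the
    requested block size.
    """
    if block_m <= 0 or block_n <= 0:
        raise ValueError("block sizes must be positive")
    for i, j in product(range(0, m, block_m), range(0, n, block_n)):
        yield (i, min(i + block_m, m), j, min(j + block_n, n))


def assign_round_robin(tiles, workers):
    """Distribute tiles over `workers` task lists in round-robin order (an assumed policy)."""
    if workers <= 0:
        raise ValueError("workers must be positive")
    tasks = [[] for _ in range(workers)]
    for index, tile in enumerate(tiles):
        tasks[index % workers].append(tile)
    return tasks


if __name__ == "__main__":
    # Hypothetical sizes chosen only for illustration.
    tiles = list(partition_mxn(m=5, n=7, block_m=2, block_n=3))
    for worker_id, task_list in enumerate(assign_round_robin(tiles, workers=4)):
        print(worker_id, task_list)
```

Each tile is an independent unit of work, so every pairwise comparison in the M×N grid is covered exactly once regardless of how tiles are handed out to nodes.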